FLUX.2 Inference: Sub-Second Image Generation on NVIDIA & AMD
Deploy image generation at scale, 4x faster than PyTorch.
The most capable FLUX model, compiled and optimized with MAX for sub-second image generation. Reduced computational overhead, portable across NVIDIA and AMD GPUs, and seamless scaling on Modular Cloud.
FLUX.2 at a glance:
- Generation time: <1s
- vs torch.compile: 4.1x
- Per image on AMD: $0.009
- GPU vendors: 2
Generate images and videos in seconds
Test out our image generation quality, then request access to a full account. Video generation coming soon. Request a demo to get priority access.
Compiled, not wrapped
Modular’s engine (MAX) compiles the full diffusion pipeline (DiT, VAE, text encoder, scheduler) into a single fused execution graph through MLIR. No PyTorch runtime. No ComfyUI shim. Every denoising step is optimized together.
| Resolution | MAX on B200 | torch.compile on B200 | Speedup on B200 | MAX on MI355X | Speedup on MI355X |
|---|---|---|---|---|---|
| 1024 x 1024 | 3.3s | 13.3s | 4.1x | 3.4s | 3.8x |
| 768 x 1360 | 2.8s | 10.7s | 3.8x | 3.0s | 4.0x |
| 1360 x 768 | 2.4s | 8.2s | 3.4x | 2.5s | 3.6x |
5.5x total cost advantage on AMD
4.1x from compiled performance. 1.33x from lower hardware cost. The economics stack.
- Lowest cost: $0.009 / image on MI355X with sub-second generation (MAX on AMD MI355X, 1024 x 1024)
- $0.011 / image on B200 with sub-second generation (MAX on NVIDIA B200, 1024 x 1024)
| Provider / Setup | Cost per image (1024 x 1024) | vs. Modular on AMD |
|---|---|---|
| Nano Banana Pro | $0.134 | 15x more expensive |
| fal.ai (FLUX.2 Dev) | $0.012 | 1.3x more expensive |
| MAX on B200 | $0.011 | 1.2x more expensive |
| MAX on MI355X | $0.009 | Reference |
fal.ai pricing: $0.012/megapixel for FLUX.2 (dev) (1 MP = 1024 x 1024). Source: fal.ai/models/fal-ai/flux2. Modular pricing reflects compute cost with a BFL commercial license.
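The multipliers in the table, and the "5.5x total cost advantage" claim, follow directly from the per-image costs above. A quick sanity check (the table rounds 14.9x up to 15x):

```python
# Reproduce the "vs. Modular on AMD" multipliers from the per-image
# costs in the table above. Reference: MAX on AMD MI355X.
reference = 0.009  # $/image

costs = {
    "Nano Banana Pro": 0.134,
    "fal.ai (FLUX.2 Dev)": 0.012,
    "MAX on B200": 0.011,
}

for provider, cost in costs.items():
    print(f"{provider}: {cost / reference:.1f}x more expensive")

# The stacked AMD advantage: compiled speedup x hardware-cost delta.
print(f"Total AMD advantage: {4.1 * 1.33:.1f}x")  # -> 5.5x
```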
Same quality. 4x the speed.
Identical model weights. Identical output quality. The performance comes from compilation, not approximation.

- MAX: ~1.0s (4.1x faster), 1024 x 1024, 50 steps (top image)
- Diffusers with torch.compile: ~4s (1x baseline), 1024 x 1024, 50 steps (bottom image)
Same prompt, same model weights. Quality tolerance is configurable: trade a fraction of perceptual quality for sub-second generation.
The only FLUX.2 endpoint on NVIDIA and AMD
Every other FLUX.2 provider is locked to NVIDIA. MAX compiles diffusion models natively for both vendors from the same container.
- Cost arbitrage
AMD spot pricing runs 25-40% lower. Shift batch generation workloads without changing a line of code.
- Supply flexibility
GPU availability fluctuates. Two vendors means you’re never capacity-constrained by one.
- Future-proof
MAX targets hardware at the instruction level. No CUDA dependency. New silicon support ships in days.
Full pipeline, one compiled graph
Diffusion inference isn’t a single model call; it’s dozens of denoising steps plus decoding, encoding, and scheduling. Modular’s engine (MAX) compiles all of it together.
- Text Encoder: prompt embedding
- DiT: denoising x N steps
- Scheduler: step optimization
- VAE Decode: latent > pixel
- Image: <1s total
Other platforms run each stage separately through PyTorch, with memory round-trips between them. MAX eliminates that overhead by fusing the entire pipeline into one MLIR-compiled graph. Built with custom Mojo kernels, not a PyTorch wrapper.
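The stage ordering above can be sketched in plain Python. This is a conceptual toy, not Modular's implementation: the function names mirror the pipeline diagram, and the bodies are dummy placeholders standing in for the real encoder, DiT, scheduler, and VAE.

```python
# Toy sketch of the diffusion pipeline stages that MAX fuses into one
# compiled graph. All bodies are placeholders, not real model math.

def text_encode(prompt: str) -> list[float]:
    """Text Encoder: prompt -> embedding (stand-in: character codes)."""
    return [float(ord(c)) for c in prompt]

def denoise_step(latent: list[float], embedding: list[float], t: int) -> list[float]:
    """One DiT denoising step at timestep t (stand-in: simple decay)."""
    scale = 1.0 - 1.0 / (t + 2)
    return [x * scale for x in latent]

def vae_decode(latent: list[float]) -> list[int]:
    """VAE Decode: latent -> pixels (stand-in: clamp to 0..255)."""
    return [min(255, max(0, int(abs(x)))) for x in latent]

def generate(prompt: str, steps: int = 50) -> list[int]:
    embedding = text_encode(prompt)        # Text Encoder
    latent = [128.0] * 16                  # initial noise (toy size)
    for t in range(steps):                 # DiT x N steps, Scheduler order
        latent = denoise_step(latent, embedding, t)
    return vae_decode(latent)              # VAE Decode -> Image

pixels = generate("a red fox", steps=50)
print(len(pixels))  # 16 toy "pixels"
```

Run stage-by-stage through PyTorch, each arrow above is a memory round-trip; fused into one graph, the intermediate tensors never leave the device.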
OpenAI-compatible.
Drop-in replacement.
Change the base URL. That’s it.
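The "drop-in" claim maps to the OpenAI images API shape. A minimal sketch using only the Python standard library; the base URL and model id below are placeholders, not Modular's actual endpoint values:

```python
import json
import urllib.request

# Placeholders: substitute your actual Modular endpoint and API key.
BASE_URL = "https://example-modular-endpoint.invalid/v1"
API_KEY = "YOUR_API_KEY"

# Same request shape as OpenAI's POST /v1/images/generations.
payload = {
    "model": "flux.2",  # placeholder model id
    "prompt": "a red fox in the snow, golden hour",
    "size": "1024x1024",
    "n": 1,
}

request = urllib.request.Request(
    f"{BASE_URL}/images/generations",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# urllib.request.urlopen(request) would send it. Switching providers
# means changing only BASE_URL; the payload and endpoint path stay put.
print(request.full_url)
```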
Model details

- Developer: Black Forest Labs
- Precision: BF16
- API: OpenAI compatible
- Hardware: NVIDIA, AMD, Apple Silicon
- License: developer
- Parameters: 32B
- Modality: Text > Image, Image Editing
- Deployment: Shared, Dedicated, Self-Hosted
- Container: <700MB (90% smaller than vLLM)
FAQ
How fast is FLUX.2 on Modular?
Sub-second at 1024x1024 on NVIDIA B200. That’s 4.1x faster than torch.compile on the same hardware. On AMD MI355X, generation time is within 4% of B200.
How much does FLUX.2 cost on Modular?
Compute cost starts at $0.001/image on AMD MI355X and $0.002/image on NVIDIA B200. Volume pricing is available for dedicated endpoints.
Can FLUX.2 run on AMD GPUs?
Yes. Modular is the only inference platform that runs FLUX.2 natively on both NVIDIA and AMD from the same container. No code changes required.
Is the API OpenAI-compatible?
Yes. Drop-in replacement with the same images generation endpoint format. Switch from any provider by changing the base URL.
Does Modular support LoRA fine-tuned models?
Yes. Bring your own LoRA weights and MAX compiles and serves them with the same performance advantage.
How does this compare to vLLM for image generation?
vLLM and SGLang don’t support diffusion models; they’re LLM inference engines. MAX serves image models natively alongside LLMs in the same container and API.
Start using FLUX.2 with Modular
We'll show you Modular's benchmarks on workloads similar to yours.
Thank you for your submission.
Your report has been received and is being reviewed by the Sales team. A member from our team will reach out to you shortly.
Thank you,
Modular Sales Team