
FLUX.2 Inference: Sub-Second Image Generation on NVIDIA & AMD

Generate images at scale, 4x faster than PyTorch.

The most capable FLUX model, compiled and optimized with MAX for sub-second image generation. Reduced computational overhead, portable across NVIDIA and AMD GPUs, and seamless scaling on Modular Cloud.

FLUX.2 at a glance:

  • Generation time: <1s
  • vs torch.compile: 4.1x
  • Cost per image on AMD: $0.009
  • GPU vendors: 2

Generate images and videos in seconds

Test out our image generation quality, then request access to a full account. Video generation coming soon. Request a demo to get priority access.

Request API Token
Performance

Compiled, not wrapped

Modular’s engine (MAX) compiles the full diffusion pipeline (DiT, VAE, text encoder, scheduler) into a single fused execution graph through MLIR. No PyTorch runtime. No ComfyUI shim. Every denoising step is optimized together.

| Resolution | MAX (B200) | torch.compile (B200) | Speedup (B200) | MAX (MI355X) | Speedup (MI355X) |
| --- | --- | --- | --- | --- | --- |
| 1024 x 1024 | 3.3s | 13.3s | 4.1x | 3.4s | 3.8x |
| 768 x 1360 | 2.8s | 10.7s | 3.8x | 3.0s | 4.0x |
| 1360 x 768 | 2.4s | 8.2s | 3.4x | 2.5s | 3.6x |

Cost per image

5.5x total cost advantage on AMD

4.1x from compiled performance. 1.33x from lower hardware cost. The economics stack.
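The stacked-multiplier math can be checked in a few lines. The 4.1x speedup and ~1.33x hardware discount come from the figures above; the hourly GPU rate in the example is purely hypothetical:

```python
# Back-of-envelope cost math. The 4.1x speedup and ~1.33x hardware
# discount are from the page; the $9/hr rate below is a made-up example.
def cost_per_image(gpu_hourly_usd: float, seconds_per_image: float) -> float:
    """Cost of one image given an hourly GPU rate and per-image latency."""
    return gpu_hourly_usd * seconds_per_image / 3600

speedup = 4.1       # compiled performance vs torch.compile
hw_discount = 1.33  # cheaper AMD hardware
print(round(speedup * hw_discount, 1))          # -> 5.5 total advantage

# Example: a hypothetical $9/hr GPU producing an image every 3.6s
print(round(cost_per_image(9.0, 3.6), 3))       # -> 0.009
```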

  • Lowest cost: $0.009 / image
    MI355X with sub-second generation

    MAX on AMD MI355X - 1024 x 1024

  • $0.011 / image
    B200 with sub-second generation

    MAX on NVIDIA B200 - 1024 x 1024

| Provider / Setup | Cost per image (1024 x 1024) | vs. Modular on AMD |
| --- | --- | --- |
| Nano Banana Pro | $0.134 | 15x more expensive |
| fal.ai (FLUX.2 Dev) | $0.012 | 1.3x more expensive |
| MAX on B200 | $0.011 | 1.2x more expensive |
| MAX on MI355X | $0.009 | Reference |

fal.ai pricing: $0.012/megapixel for FLUX.2 (dev) (1MP - 1024 x 1024) Source: fal.ai/models/fal-ai/flux2. Modular pricing reflects compute cost with BFL commercial license.

Quality

Same quality. 4x the speed.

Identical model weights. Identical output quality. The performance comes from compilation, not approximation.

[Side-by-side image comparison: MAX (~1.0s, 4.1x faster) vs. Diffusers with torch.compile (~4s, 1x baseline), both at 1024 x 1024 with 50 steps.]

Example prompts:
  • "A small red propeller plane banking sharply between massive jungle trees in a bright anime style, with midday sun illuminating lush green foliage and waterfalls cascading in the background."
  • "Cartoon astronaut giving a thumbs-up with a dark reflective visor and white suit."

Same prompt, same model weights. Quality tolerance is configurable — trade fractions of perceptual quality for sub-second generation.

Hardware

The only FLUX.2 endpoint on NVIDIA and AMD

Every other FLUX.2 provider is locked to NVIDIA.  MAX compiles diffusion models natively for both vendors from the same container.

  • Cost arbitrage

    AMD spot pricing runs 25-40% lower. Shift batch generation workloads without changing a line of code.

  • Supply flexibility

    GPU availability fluctuates.  Two vendors means you’re never capacity-constrained by one.

  • Future-proof

    MAX targets hardware at the instruction level. No CUDA dependency. New silicon support ships in days.

Architecture

Full pipeline, one compiled graph

Diffusion inference isn’t a single model call, it’s dozens of denoising steps plus decoding, encoding, and scheduling.  Modular’s engine (MAX) compiles all of it together.

  • Text Encoder
    Prompt embedding
  • DiT
    Denoising x N steps
  • Scheduler
    Step optimization
  • VAE Decode
    Latent > Pixel
  • Image
    <1s total

Other platforms run each stage separately through PyTorch, with memory round-trips between them.  MAX eliminates that overhead by fusing the entire pipeline into one MLIR-compiled graph.  Built with custom Mojo kernels, not a PyTorch wrapper.
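As a mental model, the fused stages can be sketched in plain Python. Every function here is a toy stand-in, not the MAX API; the real pipeline operates on tensors, but the control flow is the same:

```python
# Toy sketch of the diffusion pipeline stages that MAX compiles into one
# graph. All functions are hypothetical stand-ins, not the MAX API.
def text_encoder(prompt):
    """Prompt -> embedding (here: a few character codes)."""
    return [float(ord(c)) for c in prompt[:4]]

def dit_step(latent, emb, t):
    """One DiT denoising step: move the latent toward the conditioning."""
    return [x * 0.9 + e * 0.01 for x, e in zip(latent, emb)]

def vae_decode(latent):
    """Latent -> pixels, clamped to [0, 1]."""
    return [max(0.0, min(1.0, x / 100)) for x in latent]

def generate(prompt, steps=50):
    emb = text_encoder(prompt)
    latent = [0.5] * len(emb)      # initial noise (fixed for the sketch)
    for t in range(steps):         # the scheduler drives N denoising steps
        latent = dit_step(latent, emb, t)
    return vae_decode(latent)

pixels = generate("red plane", steps=4)
```

Fusing means the encoder output, every `dit_step`, and the decode run as one compiled graph, instead of separate framework calls with memory round-trips between stages.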

Integration

OpenAI-compatible.
Drop-in replacement.

Change the base URL.  That’s it.

  from openai import OpenAI
  
  client = OpenAI(
      base_url="https://api.modular.com",
      api_key="<YOUR_KEY_HERE>",
  )
  
  response = client.images.generate(
      model="black-forest-labs/flux-2-dev",
      prompt="Product photo, white background, soft lighting",
      n=1,
      size="1024x1024",
  )
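Handling the response works the same way as with OpenAI's image API. Assuming the endpoint returns a base64 payload in the OpenAI response shape (`response.data[0].b64_json` — an assumption here; check your actual response), saving the image is a short helper:

```python
import base64

# Decode a base64 image payload and write it to disk.
# The OpenAI-style `b64_json` field is assumed; verify the actual
# response shape for your endpoint before relying on it.
def save_image(b64_payload: str, path: str) -> int:
    raw = base64.b64decode(b64_payload)
    with open(path, "wb") as f:
        f.write(raw)
    return len(raw)  # bytes written

# e.g. save_image(response.data[0].b64_json, "flux_out.png")
```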
Specs

Model details

    • Developer
      Black Forest Labs
    • Precision
      BF16
    • API
      OpenAI compatible
    • Hardware
      NVIDIA, AMD, Apple Silicon
    • License
      developer
    • Parameters
      32B
    • Modality
      Text > Image, Image Editing
    • Deployment
      Shared, Dedicated, Self Hosted
    • Container
      <700MB (90% smaller than vLLM)

FAQ

  • How fast is FLUX.2 on Modular?

    Sub-second at 1024x1024 on NVIDIA B200. That’s 4.1x faster than torch.compile on the same hardware. On AMD MI355X, generation time is within 4% of B200.

  • How much does FLUX.2 cost on Modular?

    Compute cost starts at $0.001/image on AMD MI355X and $0.002/image on NVIDIA B200. Volume pricing is available for dedicated endpoints.

  • Can FLUX.2 run on AMD GPUs?

    Yes. Modular is the only inference platform that runs FLUX.2 natively on both NVIDIA and AMD from the same container. No code changes required.

  • Is the API OpenAI-compatible?

    Yes. It’s a drop-in replacement using the same images.generate endpoint format. Switch from any provider by changing the base URL.

  • Does Modular support LoRA fine-tuned models?

    Yes. Bring your own LoRA weights and MAX compiles and serves them with the same performance advantage.

  • How does this compare to vLLM for image generation?

    vLLM and SGLang don’t support diffusion models since they’re LLM inference engines. MAX serves image models natively alongside LLMs in the same container and API.

Start using FLUX.2 with Modular

  • Custom demo of FLUX.2

    We'll show you Modular's benchmarks on workloads similar to yours.