
Editions that work for everyone.
Scale as you grow.

  • Free Forever

    Self Hosted

    The full power of MAX and Mojo - free for all developers. One container, under 700MB, runs on NVIDIA, AMD, and Apple Silicon. Deploy anywhere you have hardware.

    • SOTA inference performance on any GPU vendor

    • Run AI models and pipelines on any hardware we support.

• Deploy MAX and Mojo yourself - container under 700MB

    • Custom kernels in Mojo for novel architectures

• Community support through Discord and GitHub

    For engineers who want full control, any GPU, and no upfront cost.  
    Join our open-source community.

  • PAY PER TOKEN / MINUTE

    Our Cloud

    Model Endpoints on NVIDIA and AMD GPUs with forward-deployed Modular engineers optimizing your workloads. Not just infrastructure - engineering.

    • Always-on compute with SOTA inference performance

    • Shared & Dedicated Endpoints

    • Usage metrics and observability

    • Lowest cost endpoints to maximize ROI for the most demanding workloads

• SOC 2 Type I certified (Type II in progress)

    • Forward-deployed engineers tuning your deployment

    For teams that want speed, reliability, and a dedicated engineer - without managing infrastructure.

  • PAY PER MINUTE

    Your Cloud

    Built on our production-hardened BYOC infrastructure. Your VPC, your cloud credits, your compliance policies - with Modular engineers and our control plane inside.

• Everything in Dedicated Endpoints, plus:

    • Deployment in your cloud or on-premise

    • Data never leaves your VPC

    • Performance optimization of your specific pipelines and workloads

    • Custom APIs

• SOC 2 Type I certified (Type II in progress)

    • Forward-deployed engineers tuning your deployment

    For enterprises needing compliance, control, GPU flexibility, and hands-on engineering.

Compare deployment options

| | Self-Hosted | Our Cloud | Your Cloud |
| --- | --- | --- | --- |
| Support | Active community and fast responses on Discord, Discourse, and GitHub | Dedicated support, engineering team, standard and custom SLAs/SLOs | Dedicated support, engineering team, standard and custom SLAs/SLOs |
| Models | Hundreds of models in our model repo; view top performers | Top performers available for dedicated endpoints; custom model deployment | Top performers available for dedicated endpoints; custom model deployment |
| AI Skills | Use our open AI skills to easily write models or optimize code | Our engineers can help train your team & migrate your workloads | Our engineers can help train your team & migrate your workloads |
| Platform access | Deploy MAX and Mojo yourself, anywhere you want; build with open source | Access the Modular Platform with a console for deploying, scaling, and managing your AI endpoints | Access the Modular Platform with a console for deploying, scaling, and managing your AI endpoints |
| Scalability | Scale on your own with the MAX container | Auto-scaling, scale to zero, burst capacity | Auto-scaling, proven at Fortune 500 scale |
| Deployment location | Self-deployed, anywhere | Our cloud | Your cloud or hybrid |
| Compute hardware | NVIDIA, AMD, Apple Silicon, and more, on hardware you own | NVIDIA & AMD GPUs in our cloud; more hardware types coming soon | NVIDIA & AMD GPUs; Intel, AMD & ARM CPUs; deployed in your cloud |
| Custom kernels | Your engineers write custom kernels for your workloads | Modular engineers tune kernels for your workloads | Modular engineers write custom kernels for your workloads |
| Forward-deployed engineers | Available with support plan | Included | Included, working in your environment |
| Security & compliance | SOC 2 Type I certified | SOC 2 Type I certified (Type II in progress) | SOC 2 Type I certified (Type II in progress) |
| Billing & pricing | Free | Per token (shared); per minute (dedicated) | Per minute deployed; use your AWS/GCP/Azure credits and commits |

License

Enterprise Contract

FAQ

  • Which models can I run on Modular?

    With Modular, you can run the latest open-source models, fine-tuned variants or your own custom ones. Our Self-Hosted Open-Source Community Edition is free and runs on NVIDIA, AMD, and Apple Silicon. For managed deployments, Our Cloud offers shared and dedicated endpoints with forward-deployed engineers optimizing your workloads.

  • What are Shared & Dedicated Endpoints?

Modular runs all the latest open models on shared endpoints (shared GPUs, billed per token) or dedicated endpoints (dedicated GPUs, billed per minute) on our cloud infrastructure. We can also deploy on a dedicated basis in your own compute environment (Your Cloud). Feel free to reach out to us if you have questions.

  • Which GPUs are available on Modular?

We support the full range of NVIDIA and AMD GPUs - from datacenter GPUs like the NVIDIA B200 and AMD MI355X to consumer accelerators like the NVIDIA RTX series - and deliver state-of-the-art, industry-leading performance across the board.

  • Can I get started with Modular Community Edition easily?

    Yes. The Self-Hosted Community Edition is completely free and open source. Install via Docker, PIP, UV, PIXI, or Conda - the container is under 700MB and runs on any GPU we support. You can be serving models in minutes.

  • Is Modular hosted infrastructure secure?

    Yes. Modular is SOC 2 Type I certified and independently audited, with SOC 2 Type II certification on the way. Our Cloud and Your Cloud editions both include enterprise-grade security. With Your Cloud (BYOC), data never leaves your VPC.

  • How does Modular integrate with our existing infrastructure?

    Modular integrates seamlessly into your stack. Our Cloud endpoints are fully compatible with the OpenAI API standard - swap in with a single line change. For custom kernels, Mojo interoperates directly with C++, CUDA, and ROCm. Every paid tier includes forward-deployed engineers to help with migration and optimization.
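To make the "single line change" concrete, here is a minimal sketch of an OpenAI-format chat-completions request built with only the Python standard library. The endpoint URL, API key, and model name below are placeholders, not real Modular values - with an existing OpenAI client, the same idea applies: you swap the base URL (and key) and leave the payload untouched.

```python
import json
import urllib.request

# Placeholder values for illustration - substitute the endpoint URL,
# API key, and model name from your own Modular console.
BASE_URL = "https://your-endpoint.example.com/v1"
API_KEY = "YOUR_API_KEY"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a request in the OpenAI chat-completions wire format."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_chat_request("my-model", "Hello!")
# Sending it with urllib.request.urlopen(req) would return an
# OpenAI-style JSON response containing a "choices" list.
```

Because only `BASE_URL` changes between providers, code written against the OpenAI API continues to work unmodified against a compatible endpoint.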

  • What level of customer support do you offer?

    Support varies by edition. The free Self-Hosted Community Edition is backed by an active Discord and GitHub community. Our Cloud and Your Cloud editions include dedicated support via email, Slack, and video calls - plus forward-deployed engineers who tune your deployments directly.

  • Do I pay for idle time on Modular's hosted endpoints?

    Pricing depends on your edition. Our Cloud charges per token or per minute - you pay for what you use. Your Cloud (BYOC) is billed per minute of reserved GPU capacity for guaranteed low-latency availability. The Self-Hosted Community Edition is free forever with no usage fees.
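A back-of-envelope comparison shows how the two billing models differ. The rates below are hypothetical, chosen only for illustration - they are not Modular's published prices.

```python
# HYPOTHETICAL rates for illustration only - not Modular's real prices.
PRICE_PER_1M_TOKENS = 0.50   # $ per million tokens, shared endpoint (assumed)
PRICE_PER_MINUTE = 0.10      # $ per GPU-minute, dedicated endpoint (assumed)

def shared_cost(tokens: int) -> float:
    """Shared endpoints: pay only for tokens processed; idle time is free."""
    return tokens / 1_000_000 * PRICE_PER_1M_TOKENS

def dedicated_cost(minutes: float) -> float:
    """Dedicated endpoints: pay for reserved capacity, busy or idle."""
    return minutes * PRICE_PER_MINUTE

# 20M tokens of bursty traffic vs. a GPU reserved for a full day:
burst = shared_cost(20_000_000)     # 10.0
reserved = dedicated_cost(24 * 60)  # ~144.0
```

The trade-off: shared billing tracks actual usage, while a dedicated reservation costs more at low utilization but guarantees capacity and low latency.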

  • Do you offer volume discounts on compute?

    Yes. We offer committed-use and volume pricing for Our Cloud and Your Cloud editions. Every paid tier also includes forward-deployed engineers who actively optimize your workloads — not just infrastructure, but hands-on engineering support.

  • Can I host Modular in my own cloud or on-premises?

    Yes. The free Self-Hosted Community Edition lets you deploy on your own hardware immediately — one container, under 700MB, on any GPU we support. For production-hardened BYOC deployments, Your Cloud runs in your VPC on AWS, GCP, Azure or OCI with your cloud credits, your compliance policies, and Modular engineers and our control plane inside.

Build the future of AI with Modular

  • Sign up today

    Sign up to our Cloud Platform today to get started.

    Sign Up
  • Browse open models

    Browse our model catalog, or deploy your own custom model

    Browse models
