
Editions that work for everyone. Scale as you grow.

Self Hosted
The full power of MAX and Mojo - free for all developers. One container, under 700MB, runs on NVIDIA, AMD, and Apple Silicon. Deploy anywhere you have hardware.
SOTA inference performance on any GPU vendor
Run AI models and pipelines on any hardware we support.
Deploy MAX and Mojo yourself - container under 700MB
Custom kernels in Mojo for novel architectures
Community support through Discord and GitHub
For engineers who want full control, any GPU, and no upfront cost.
Join our open-source community.
Our Cloud
Model Endpoints on NVIDIA and AMD GPUs with forward-deployed Modular engineers optimizing your workloads. Not just infrastructure - engineering.
Always-on compute with SOTA inference performance
Shared & Dedicated Endpoints
Usage metrics and observability
Lowest cost endpoints to maximize ROI for the most demanding workloads
SOC 2 Type I certified (Type II in progress)
Forward-deployed engineers tuning your deployment
For teams that want speed, reliability, and a dedicated engineer - without managing infrastructure.

Your Cloud
Built on our production-hardened BYOC infrastructure. Your VPC, your cloud credits, your compliance policies - with Modular engineers and our control plane inside.
Everything in Dedicated Endpoint, plus:
Deployment in your cloud or on-premises
Data never leaves your VPC
Performance optimization of your specific pipelines and workloads
Custom APIs
SOC 2 Type I certified (Type II in progress)
Forward-deployed engineers tuning your deployment
For enterprises needing compliance, control, GPU flexibility, and hands-on engineering.
Compare deployment options
| | Self-Hosted | Our Cloud | Your Cloud |
|---|---|---|---|
| Support | Active community and fast responses in Discord, Discourse, GitHub | Dedicated support, engineering team, standard and custom SLAs/SLOs | Dedicated support, engineering team, standard and custom SLAs/SLOs |
| Models | Hundreds of models in our model repo; view top performers | Top performers available for dedicated endpoints; custom model deployment | Top performers available for dedicated endpoints; custom model deployment |
| AI Skills | Use our open AI skills to easily write models or optimize code | Our engineers can help train your team & migrate your workloads | Our engineers can help train your team & migrate your workloads |
| Platform access | Deploy MAX and Mojo yourself, anywhere you want. Build with open source | Access the Modular Platform with a console for deploying, scaling, and managing your AI endpoints | Access the Modular Platform with a console for deploying, scaling, and managing your AI endpoints |
| Scalability | Scale on your own with the MAX container | Auto-scaling, scale to zero, burst capacity | Auto-scaling, proven at Fortune 500 scale |
| Deployment location | Self-deployed, anywhere | Our cloud | Your cloud or hybrid |
| Compute hardware | NVIDIA, AMD, Apple Silicon & more on hardware you own | NVIDIA & AMD GPUs in our cloud; more hardware types coming soon | NVIDIA & AMD GPUs; Intel, AMD & ARM CPUs, deployed in your cloud |
| Custom kernels | Your engineers write custom kernels for your workloads | Modular engineers tune kernels for your workloads | Modular engineers write custom kernels for your workloads |
| Forward Deployed Engineers | Available with support plan | Included | Included; working in your environment |
| Security & Compliance | SOC 2 Type I certified | SOC 2 Type I certified (Type II in progress) | SOC 2 Type I certified (Type II in progress) |
| Billing & Pricing | Free | Per token (shared); per minute (dedicated) | Per minute deployed; use your AWS/GCP/Azure credits and commits |
| Enterprise Contract | | | |
FAQ
Which models can I run on Modular?
With Modular, you can run the latest open-source models, fine-tuned variants, or your own custom models. Our Self-Hosted Open-Source Community Edition is free and runs on NVIDIA, AMD, and Apple Silicon. For managed deployments, Our Cloud offers shared and dedicated endpoints with forward-deployed engineers optimizing your workloads.
What are Shared & Dedicated Endpoints?
Modular runs all the latest open models on shared endpoints (shared GPUs, billed per token) or dedicated endpoints (dedicated GPUs, billed per minute) on our cloud infrastructure. We can also deploy into your compute environment on a dedicated basis (Your Cloud). Feel free to reach out to us if you have questions.
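To make the two billing models concrete, here is a small sketch comparing them for one workload. The rates and workload figures below are made-up assumptions for illustration only, not Modular's actual pricing:

```python
# Hypothetical cost comparison of shared (per-token) vs. dedicated (per-minute)
# billing. All rates below are illustrative assumptions, NOT Modular's pricing.

def shared_cost(tokens: int, price_per_million_tokens: float) -> float:
    """Shared endpoint: pay only for the tokens you process."""
    return tokens / 1_000_000 * price_per_million_tokens

def dedicated_cost(minutes: float, price_per_minute: float) -> float:
    """Dedicated endpoint: pay for reserved GPU time, regardless of traffic."""
    return minutes * price_per_minute

# Example workload: 50M tokens/day, served around the clock (1440 minutes).
shared = shared_cost(50_000_000, price_per_million_tokens=0.50)  # 25.0
dedicated = dedicated_cost(1440, price_per_minute=0.05)          # roughly 72

print(f"shared: ${shared:.2f}/day, dedicated: ${dedicated:.2f}/day")
```

At this (assumed) volume, per-token billing is cheaper; as sustained throughput grows, reserved dedicated capacity becomes the better deal, which is the usual break-even trade-off between the two models.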
Which GPUs are available on Modular?
We support a wide range of NVIDIA & AMD GPUs - from datacenter parts like the NVIDIA B200 and AMD MI355X to consumer accelerators like NVIDIA RTX cards. Our platform delivers state-of-the-art performance across the board, with industry-leading results on the latest NVIDIA B200s and AMD MI355Xs.
Can I get started with Modular Community Edition easily?
Yes. The Self-Hosted Community Edition is completely free and open source. Install via Docker, PIP, UV, PIXI, or Conda - the container is under 700MB and runs on any GPU we support. You can be serving models in minutes.
Is Modular hosted infrastructure secure?
Yes. Modular is SOC 2 Type I certified and independently audited, with SOC 2 Type II certification on the way. Our Cloud and Your Cloud editions both include enterprise-grade security. With Your Cloud (BYOC), data never leaves your VPC.
How does Modular integrate with our existing infrastructure?
Modular integrates seamlessly into your stack. Our Cloud endpoints are fully compatible with the OpenAI API standard - swap in with a single line change. For custom kernels, Mojo interoperates directly with C++, CUDA, and ROCm. Every paid tier includes forward-deployed engineers to help with migration and optimization.
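As a sketch of what that single-line swap looks like: an OpenAI-compatible endpoint accepts the standard `/chat/completions` request shape, so only the base URL (and key) changes. The URL and model name below are placeholders, not real values, and the example uses only the Python standard library:

```python
import json
import urllib.request

# Placeholder values - substitute your real endpoint URL, API key, and model.
BASE_URL = "https://example-endpoint.invalid/v1"  # the one-line change
MODEL = "your-model-name"

def build_chat_request(prompt: str) -> dict:
    """Build a standard OpenAI-style chat completion payload."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }

def send(payload: dict, api_key: str) -> dict:
    """POST the payload to the OpenAI-compatible /chat/completions route."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(json.dumps(build_chat_request("Hello!"), indent=2))
```

Because the request and response shapes follow the OpenAI API standard, existing OpenAI SDK clients also work unchanged once pointed at the new base URL.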
What level of customer support do you offer?
Support varies by edition. The free Self-Hosted Community Edition is backed by an active Discord and GitHub community. Our Cloud and Your Cloud editions include dedicated support via email, Slack, and video calls - plus forward-deployed engineers who tune your deployments directly.
Do I pay for idle time on Modular's hosted endpoints?
Pricing depends on your edition. Our Cloud charges per token or per minute - you pay for what you use. Your Cloud (BYOC) is billed per minute of reserved GPU capacity for guaranteed low-latency availability. The Self-Hosted Community Edition is free forever with no usage fees.
Do you offer volume discounts on compute?
Yes. We offer committed-use and volume pricing for Our Cloud and Your Cloud editions. Every paid tier also includes forward-deployed engineers who actively optimize your workloads — not just infrastructure, but hands-on engineering support.
Can I host Modular in my own cloud or on-premises?
Yes. The free Self-Hosted Community Edition lets you deploy on your own hardware immediately — one container, under 700MB, on any GPU we support. For production-hardened BYOC deployments, Your Cloud runs in your VPC on AWS, GCP, Azure or OCI with your cloud credits, your compliance policies, and Modular engineers and our control plane inside.

Sign up today
Sign up for our Cloud Platform today to get started.
Sign Up
Browse open models
Browse our model catalog, or deploy your own custom model
Browse models