
Editions that work for everyone. Scale as you grow.
Free Forever
Community
An open AI platform powered by MAX and Mojo - free for every developer. Build, scale, and deploy AI on any hardware with a single framework.
SOTA GenAI serving performance.
Supports the latest AI models across the latest AI hardware.
Deploy MAX and Mojo yourself in any cloud environment.
Open source, with a vibrant community of developers.
Community support through Discord and GitHub.
For engineers who want full control, customization, and no upfront cost.
PAY PER GPU HOUR
Batch API Endpoint
Fully managed batch API endpoints at 85% lower cost than competitors, delivering fast turnaround with the latest AI models.
Asynchronous large-scale batch inference endpoints
Support for the latest AI models: Qwen3, InternVL, GPT-OSS
Lowest-cost endpoints to maximize ROI
Turn around large batches in hours to days
SOC 2 Type I certified and independently audited
For teams that want faster, more accurate, and less expensive batch inference at scale.
PAY PER GPU HOUR
Dedicated Endpoint
Fully managed, dedicated API endpoints for low-latency online inference, with resilient, high-availability support for the latest AI models.
Distributed, large-scale online inference endpoints.
Support for the latest AI models: Qwen3, InternVL, GPT-OSS
Highest-performance endpoints to maximize ROI
Resilient, high-availability, large-scale services
SOC 2 Type I certified and independently audited
Terms and Conditions
For teams that want faster, more accurate, and less expensive online inference at scale.
Custom pricing
Enterprise
We partner with enterprises on advanced deployments, whether you need full data control, run your compute on CSPs or Neoclouds, or prefer a hybrid approach. Let's talk.
Everything in Dedicated Endpoint, plus:
Deployment in your cloud or on-premise environment.
Optimization of your custom pipelines and workloads.
Hybrid deployments designed for data sovereignty.
Tailored and flexible SLAs and SLOs for enterprise needs.
Terms and Conditions
For enterprises needing control, or hybrid cloud/on-prem setups.
| | Community | Batch API Endpoint | Dedicated Endpoint | Enterprise |
|---|---|---|---|---|
| Support | Active community and fast responses on Discord, Discourse, and GitHub | Dedicated customer support, support from the engineering team, standard SLA/SLO guarantees | Dedicated customer support, support from the engineering team, standard SLA/SLO guarantees | Dedicated customer support, support from the engineering team, roadmap prioritization, standard SLA/SLO guarantees |
| Models | Anything in our model repo; view top performers first | See the list of models available for the batch endpoint | Our team will bring up any model, or help you customize a model | Our team will bring up any model, or help you customize a model |
| Platform access | Deploy MAX and Mojo yourself anywhere you want; build with open source | Access the Modular Platform with a fully managed dedicated endpoint; receive usage metrics | Access the Modular Platform with a fully managed dedicated endpoint; receive usage metrics | Access the Modular Platform with a console for deploying, scaling, and managing your GenAI applications |
| Deployment location | Self-deployed | Our cloud | Our cloud | Hybrid, customizable |
| Compute hardware | Your hardware; see compatibility in builds | Our hardware | Our hardware | Hybrid, customizable |
| Scalability | Scale on your own with the MAX container | Highly flexible with tailored scalability | Highly flexible with tailored scalability | Highly flexible with tailored scalability |
| Security & compliance | SOC 2 Type I certified | SOC 2 Type I certified | SOC 2 Type I certified | SOC 2 Type I certified |
| License | Read the community license | | | |
FAQ
Which models can I run on Modular?
With Modular, you can run the latest open-source models or your own custom builds. Choose to host on your infrastructure or leverage ours – we provide multiple product editions for full deployment flexibility. Check out our latest models on builds.modular.com.
Which GPUs are available on Modular?
We support the full spectrum of GPUs – from NVIDIA and AMD datacenter hardware to consumer accelerators like NVIDIA RTX and Apple Silicon. Our platform delivers exceptional, state-of-the-art performance across the board, with industry-leading results on the latest NVIDIA B200s and AMD MI355Xs. Get in touch to learn more.
Can I get started with Modular Community Edition easily?
Yes, the Modular Community Edition is completely free and open source. You can download our Docker containers, or install it easily via pip, uv, pixi, Conda, and more.
Is Modular hosted infrastructure secure?
Yes, Modular is SOC 2 Type I certified and independently audited, with SOC 2 Type II certification on the way. Reach out to us if you have any questions.
How does Modular integrate with our existing infrastructure?
Modular integrates seamlessly into your applications – MAX and our hosted endpoints are fully compatible with the OpenAI API standard. For custom kernels, Mojo interoperates directly with C++, CUDA, and ROCm, making it simple to migrate existing model codebases to the Modular Platform. And if you need hands-on help, we offer dedicated forward-deployed engineering support. Reach out to our team to explore the best path for you.
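Because the endpoints follow the OpenAI API standard, any OpenAI-compatible client can talk to them. Here is a minimal sketch using only the Python standard library; the base URL, API key, and model name below are placeholders, not real Modular values:

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    """Assemble an OpenAI-compatible /v1/chat/completions request."""
    url = f"{base_url.rstrip('/')}/v1/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url, data=json.dumps(body).encode(), headers=headers, method="POST"
    )


# Placeholder values -- substitute your real endpoint URL, key, and model.
req = build_chat_request(
    "https://example-endpoint.invalid", "YOUR_API_KEY", "some-model", "Hello!"
)
# Sending it is the usual pattern (requires a live endpoint):
#   with urllib.request.urlopen(req) as resp:
#       reply = json.load(resp)["choices"][0]["message"]["content"]
```

The same request shape works with the official `openai` Python client by pointing its `base_url` at your endpoint, since the wire format is identical.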
What level of customer support do you offer?
Our world-class customer support varies by software edition. For paid editions (Batch, Dedicated, and Enterprise), we offer email, Slack, and Zoom/Google Hangouts support. We can also offer dedicated forward-deployed engineering support. Just reach out to our team to discuss what's best for you.