
Editions that work for everyone.
Scale as you grow.

  • Free Forever

    Self Hosted

    The full power of MAX and Mojo - free for all developers. One container, under 700MB, runs on NVIDIA, AMD, and Apple Silicon. Deploy anywhere you have hardware.

    • SOTA inference performance on any GPU vendor

    • Run AI models and pipelines on any hardware we support.

• Deploy MAX and Mojo yourself - container under 700MB

    • Custom kernels in Mojo for novel architectures

• Community support through Discord and GitHub

    For engineers who want full control, any GPU, and no upfront cost.  
    Join our open-source community.

  • PAY PER TOKEN / MINUTE

    Our Cloud

    Model Endpoints on NVIDIA and AMD GPUs with forward-deployed Modular engineers optimizing your workloads. Not just infrastructure - engineering.

    • Always-on compute with SOTA inference performance

    • Shared & Dedicated Endpoints

    • Usage metrics and observability

    • Lowest cost endpoints to maximize ROI for the most demanding workloads

• SOC 2 Type I certified (Type II in progress)

    • Forward-deployed engineers tuning your deployment

    For teams that want speed, reliability, and a dedicated engineer - without managing infrastructure.

  • PAY PER MINUTE

    Your Cloud

    Built on our production-hardened BYOC infrastructure. Your VPC, your cloud credits, your compliance policies - with Modular engineers and our control plane inside.

• Everything in Dedicated Endpoints, plus:

    • Deployment in your cloud or on-premise

    • Data never leaves your VPC

    • Performance optimization of your specific pipelines and workloads

    • Custom APIs

• SOC 2 Type I certified (Type II in progress)

    • Forward-deployed engineers tuning your deployment

    For enterprises needing compliance, control, GPU flexibility, and hands-on engineering.

Compare deployment options

| | Self-Hosted | Our Cloud | Your Cloud |
| --- | --- | --- | --- |
| Support | Active community and fast responses on Discord, Discourse, and GitHub | Dedicated support, engineering team, standard and custom SLAs/SLOs | Dedicated support, engineering team, standard and custom SLAs/SLOs |
| Models | Hundreds of models in our model repo; view top performers | Top performers available for dedicated endpoints; custom model deployment | Top performers available for dedicated endpoints; custom model deployment |
| AI Skills | Use our open AI skills to easily write models or optimize code | Our engineers can help train your team & migrate your workloads | Our engineers can help train your team & migrate your workloads |
| Platform access | Deploy MAX and Mojo yourself, anywhere you want; build with open source | Access the Modular Platform with a console for deploying, scaling, and managing your AI endpoints | Access the Modular Platform with a console for deploying, scaling, and managing your AI endpoints |
| Scalability | Scale on your own with the MAX container | Auto-scaling, scale to zero, burst capacity | Auto-scaling, proven at Fortune 500 scale |
| Deployment location | Self-deployed, anywhere | Our cloud | Your cloud or hybrid |
| Compute hardware | NVIDIA, AMD, Apple Silicon, and more, on hardware you own | NVIDIA & AMD GPUs in our cloud; more hardware types coming soon | NVIDIA & AMD GPUs; Intel, AMD & ARM CPUs; deployed in your cloud |
| Custom kernels | Your engineers write custom kernels for your workloads | Modular engineers tune kernels for your workloads | Modular engineers write custom kernels for your workloads |
| Forward-deployed engineers | Available with support plan | Included | Included, working in your environment |
| Security & compliance | SOC 2 Type I certified | SOC 2 Type I certified (Type II in progress) | SOC 2 Type I certified (Type II in progress) |
| Billing & pricing | Free | Per token (shared); per minute (dedicated) | Per minute deployed; use your AWS/GCP/Azure credits and commits |

License

Enterprise Contract

FAQ

  • Which models can I run on Modular?

    With Modular, you can run the latest open-source models, fine-tuned variants or your own custom ones. Our Self-Hosted Open-Source Community Edition is free and runs on NVIDIA, AMD, and Apple Silicon. For managed deployments, Our Cloud offers shared and dedicated endpoints with forward-deployed engineers optimizing your workloads.

  • What are Shared & Dedicated Endpoints?

Modular runs all the latest open models on shared endpoints (shared GPUs, billed per token) or dedicated endpoints (dedicated GPUs, billed per minute) on our cloud infrastructure. We can also deploy on a dedicated basis in your own compute environment (Your Cloud). Feel free to reach out to us if you have questions.

  • Which GPUs are available on Modular?

We support the full range of NVIDIA and AMD GPUs - from datacenter GPUs like the NVIDIA B200 and AMD MI355X to consumer accelerators like the NVIDIA RTX series - and deliver state-of-the-art, industry-leading performance across the board.

  • Can I get started with Modular Community Edition easily?

    Yes. The Self-Hosted Community Edition is completely free and open source. Install via Docker, PIP, UV, PIXI, or Conda - the container is under 700MB and runs on any GPU we support. You can be serving models in minutes.

  • Is Modular hosted infrastructure secure?

    Yes. Modular is SOC 2 Type I certified and independently audited, with SOC 2 Type II certification on the way. Our Cloud and Your Cloud editions both include enterprise-grade security. With Your Cloud (BYOC), data never leaves your VPC.

  • How does Modular integrate with our existing infrastructure?

    Modular integrates seamlessly into your stack. Our Cloud endpoints are fully compatible with the OpenAI API standard - swap in with a single line change. For custom kernels, Mojo interoperates directly with C++, CUDA, and ROCm. Every paid tier includes forward-deployed engineers to help with migration and optimization.
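To make the "single line change" concrete, here is a minimal sketch of an OpenAI-format chat-completions request built with only the Python standard library. The endpoint URL, API key, and model name below are placeholders, not real Modular values - with an existing OpenAI client, the same idea applies: you swap the base URL (and key) and leave the payload untouched.

```python
import json
import urllib.request

# Placeholder values for illustration - substitute the endpoint URL,
# API key, and model name from your own Modular console.
BASE_URL = "https://your-endpoint.example.com/v1"
API_KEY = "YOUR_API_KEY"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a request in the OpenAI chat-completions wire format."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_chat_request("my-model", "Hello!")
# Sending it with urllib.request.urlopen(req) would return an
# OpenAI-style JSON response containing a "choices" list.
```

Because only `BASE_URL` changes between providers, code written against the OpenAI API continues to work unmodified against a compatible endpoint.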

  • What level of customer support do you offer?

    Support varies by edition. The free Self-Hosted Community Edition is backed by an active Discord and GitHub community. Our Cloud and Your Cloud editions include dedicated support via email, Slack, and video calls - plus forward-deployed engineers who tune your deployments directly.

  • Do I pay for idle time on Modular's hosted endpoints?

    Pricing depends on your edition. Our Cloud charges per token or per minute - you pay for what you use. Your Cloud (BYOC) is billed per minute of reserved GPU capacity for guaranteed low-latency availability. The Self-Hosted Community Edition is free forever with no usage fees.
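A back-of-envelope comparison shows how the two billing models differ. The rates below are hypothetical, chosen only for illustration - they are not Modular's published prices.

```python
# HYPOTHETICAL rates for illustration only - not Modular's real prices.
PRICE_PER_1M_TOKENS = 0.50   # $ per million tokens, shared endpoint (assumed)
PRICE_PER_MINUTE = 0.10      # $ per GPU-minute, dedicated endpoint (assumed)

def shared_cost(tokens: int) -> float:
    """Shared endpoints: pay only for tokens processed; idle time is free."""
    return tokens / 1_000_000 * PRICE_PER_1M_TOKENS

def dedicated_cost(minutes: float) -> float:
    """Dedicated endpoints: pay for reserved capacity, busy or idle."""
    return minutes * PRICE_PER_MINUTE

# 20M tokens of bursty traffic vs. a GPU reserved for a full day:
burst = shared_cost(20_000_000)     # 10.0
reserved = dedicated_cost(24 * 60)  # ~144.0
```

The trade-off: shared billing tracks actual usage, while a dedicated reservation costs more at low utilization but guarantees capacity and low latency.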

  • Do you offer volume discounts on compute?

    Yes. We offer committed-use and volume pricing for Our Cloud and Your Cloud editions. Every paid tier also includes forward-deployed engineers who actively optimize your workloads — not just infrastructure, but hands-on engineering support.

  • Can I host Modular in my own cloud or on-premises?

    Yes. The free Self-Hosted Community Edition lets you deploy on your own hardware immediately — one container, under 700MB, on any GPU we support. For production-hardened BYOC deployments, Your Cloud runs in your VPC on AWS, GCP, Azure or OCI with your cloud credits, your compliance policies, and Modular engineers and our control plane inside.

Build the future of AI with Modular

  • Sign up today

    Sign up to our Cloud Platform today to get started.

    Sign Up
  • Browse open models

    Browse our model catalog, or deploy your own custom model

    Browse models
