Editions that work for everyone.
Scale as you grow.

  • Free Forever

    Community

An open AI platform powered by MAX and Mojo, free for every developer. Build, scale, and deploy AI on any hardware with a single framework.

    • State-of-the-art GenAI serving performance

    • Runs the latest AI models on the latest AI hardware

    • Deploy MAX and Mojo yourself in any cloud environment

    • Open source, with a vibrant community of developers

    • Community support through Discord and GitHub

    For engineers who want full control, customization, and no upfront cost.

  • Pay per GPU hour

    Batch API Endpoint

    Fully managed batch API endpoints at 85% lower cost than competitors, delivering fast completion times on the latest AI models.

    • Asynchronous, large-scale batch inference endpoints (see the sketch after the pricing list)

    • Supports the latest AI models, including Qwen3, InternVL, and GPT-OSS

    • Lowest-cost endpoints to maximize ROI

    • Turn around large batches in hours to days

    • SOC 2 Type I certified and independently audited

    For teams that want faster, more accurate, and less expensive batch inference at scale.

  • Pay per GPU hour

    Dedicated Endpoint

    Fully managed, dedicated API endpoints for low-latency online inference, with resilient, high-availability support for the latest AI models.

    • Distributed, large-scale online inference endpoints (see the streaming sketch after the pricing list)

    • Supports the latest AI models, including Qwen3, InternVL, and GPT-OSS

    • Highest-performance endpoints to maximize ROI

    • Resilient, high-availability, large-scale services

    • SOC 2 Type I certified and independently audited

    • Terms and Conditions

    For teams that want faster, more accurate, and less expensive online inference at scale.

  • Custom pricing

    Enterprise

    We partner with enterprises on advanced deployments, whether you need full data control, want to run your compute on CSPs or neoclouds, or prefer a hybrid approach. Let's talk.

    Everything in Dedicated Endpoint, plus:

    • Deployment in your cloud or on-premises environment

    • Optimization of your custom pipelines and workloads

    • Hybrid deployments designed for data sovereignty

    • Tailored, flexible SLAs and SLOs for enterprise needs

    • Terms and Conditions

    For enterprises that need full control or hybrid cloud/on-prem setups.
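To make the batch workflow concrete, here is a minimal sketch of submitting an asynchronous batch job in Python. It assumes the batch endpoint follows the OpenAI-compatible Batch API shape (upload a JSONL file of requests, then create a batch job); the base URL, API key, and model name are placeholder assumptions, not confirmed values.

```python
import json
from openai import OpenAI

# Placeholder endpoint and key: take the real values from your Modular console.
client = OpenAI(base_url="https://batch.example-modular.com/v1", api_key="YOUR_API_KEY")

# One JSONL line per request; "custom_id" lets you match results to inputs later.
requests = [
    {
        "custom_id": f"req-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "Qwen3-8B",  # hypothetical name from the supported-model list
            "messages": [{"role": "user", "content": f"Summarize document {i}."}],
        },
    }
    for i in range(3)
]
with open("batch_input.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")

# Upload the file, then create the batch; large jobs complete in hours to days.
batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)  # poll later with client.batches.retrieve(batch.id)
```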
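For the Dedicated Endpoint, a low-latency online call looks like an ordinary OpenAI-style chat completion; streaming delivers the first tokens as soon as they are generated. The endpoint URL, API key, and model name below are again illustrative assumptions.

```python
from openai import OpenAI

# Placeholder endpoint details; a real dedicated-endpoint URL comes from your console.
client = OpenAI(base_url="https://your-endpoint.example.com/v1", api_key="YOUR_API_KEY")

# Stream the response so the first tokens arrive with minimal latency.
stream = client.chat.completions.create(
    model="Qwen3-8B",  # hypothetical: use whichever model your endpoint serves
    messages=[{"role": "user", "content": "Explain batch vs. streaming inference."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```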

Compare editions:

Support
  • Community: Active community and fast responses on Discord, Discourse, and GitHub
  • Batch API Endpoint: Dedicated customer support, support from the engineering team, and standard SLA/SLO guarantees
  • Dedicated Endpoint: Dedicated customer support, support from the engineering team, and standard SLA/SLO guarantees
  • Enterprise: Dedicated customer support, support from the engineering team, roadmap prioritization, and standard SLA/SLO guarantees

Models
  • Community: Anything in our model repo; view top performers first
  • Batch API Endpoint: See the list of models available for the batch endpoint
  • Dedicated Endpoint: Our team will bring up any model, or help you customize one
  • Enterprise: Our team will bring up any model, or help you customize one

Platform access
  • Community: Deploy MAX and Mojo yourself anywhere you want; build with open source
  • Batch API Endpoint: Access the Modular Platform through a fully managed endpoint, with usage metrics
  • Dedicated Endpoint: Access the Modular Platform through a fully managed dedicated endpoint, with usage metrics
  • Enterprise: Access the Modular Platform with a console for deploying, scaling, and managing your GenAI applications

Deployment location
  • Community: Self-deployed
  • Batch API Endpoint: Our cloud
  • Dedicated Endpoint: Our cloud
  • Enterprise: Hybrid, customizable

Compute hardware
  • Community: Your hardware; see compatibility on builds.modular.com
  • Batch API Endpoint: Our hardware
  • Dedicated Endpoint: Our hardware
  • Enterprise: Hybrid, customizable

Scalability
  • Community: Scale on your own with the MAX container
  • Batch API Endpoint: Highly flexible, with tailored scalability
  • Dedicated Endpoint: Highly flexible, with tailored scalability
  • Enterprise: Highly flexible, with tailored scalability

Security & compliance
  • All editions: SOC 2 Type I certified

License
  • Community: Read the community license

~70% faster than vanilla vLLM

"Our collaboration with Modular is a glimpse into the future of accessible AI infrastructure. Our API now returns the first 2 seconds of synthesized audio on average ~70% faster compared to vanilla vLLM based implementation, at just 200ms for 2 second chunks. This allowed us to serve more QPS with lower latency and eventually offer the API at a ~60% lower price than would have been possible without using Modular’s stack."

Igor Poletaev

Chief Science Officer, Inworld

Slashed our inference costs by 80%

"Modular’s team is world class. Their stack slashed our inference costs by 80%, letting our customer dramatically scale up. They’re fast, reliable, and real engineers who take things seriously. We’re excited to partner with them to bring down prices for everyone, to let AI bring about wide prosperity."

Evan Conrad

CEO, San Francisco Compute

Confidently deploy our solution across NVIDIA and AMD

"Modular allows Qwerky to write our optimized code and confidently deploy our solution across NVIDIA and AMD solutions without the massive overhead of re-writing native code for each system."

Evan Owen

CTO, Qwerky AI

MAX Platform supercharges this mission

"At AWS we are focused on powering the future of AI by providing the largest enterprises and fastest-growing startups with services that lower their costs and enable them to move faster. The MAX Platform supercharges this mission for our millions of AWS customers, helping them bring the newest GenAI innovations and traditional AI use cases to market faster."

Bratin Saha

VP of Machine Learning & AI Services, AWS

Supercharging and scaling

"Developers everywhere are helping their companies adopt and implement generative AI applications that are customized with the knowledge and needs of their business. Adding full-stack NVIDIA accelerated computing support to the MAX platform brings the world’s leading AI infrastructure to Modular’s broad developer ecosystem, supercharging and scaling the work that is fundamental to companies’ business transformation."

Dave Salvator

Director, AI and Cloud, NVIDIA

Build, optimize, and scale AI systems on AMD

"We're truly in a golden age of AI, and at AMD we're proud to deliver world-class compute for the next generation of large-scale inference and training workloads… We also know that great hardware alone is not enough. We've invested deeply in open software with ROCm, empowering developers and researchers with the tools they need to build, optimize, and scale AI systems on AMD. This is why we are excited to partner with Modular… and we’re thrilled that we can empower developers and researchers to build the future of AI."

Vamsi Boppana

SVP of AI, AMD


FAQ

  • Which models can I run on Modular?

    With Modular, you can run the latest open-source models or your own custom builds. Choose to host on your infrastructure or leverage ours – we provide multiple product editions for full deployment flexibility. Check out our latest models on builds.modular.com.

  • Which GPUs are available on Modular?

    We support the full spectrum of GPUs – from NVIDIA and AMD datacenter hardware to consumer accelerators like NVIDIA RTX and Apple Silicon. Our platform delivers exceptional, state-of-the-art performance across the board, with industry-leading results on the latest NVIDIA B200s and AMD MI355Xs. Get in touch to learn more.

  • Can I get started with Modular Community Edition easily?

    Yes, the Modular Community Edition is completely free and open source. You can download our Docker containers, or install it easily via pip, uv, pixi, Conda, and more.
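    As a rough sketch of the getting-started flow, the snippet below assumes the Python package is named `modular` and that `max serve` exposes an OpenAI-compatible server on localhost port 8000; check the official install instructions for the exact commands and flags.

```python
# Shell steps (assumed command names; verify against the official docs):
#   pip install modular
#   max serve --model <model-id-from-builds.modular.com>

import json
import urllib.request

# Once the local server is up, ask it which models it is serving.
with urllib.request.urlopen("http://localhost:8000/v1/models") as resp:
    models = json.load(resp)

for entry in models.get("data", []):
    print(entry["id"])
```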

  • Is Modular hosted infrastructure secure?

    Yes, Modular is SOC 2 Type I certified and independently audited, with SOC 2 Type II certification on the way. Reach out to us if you have any questions.

  • How does Modular integrate with our existing infrastructure?

    Modular integrates seamlessly into your applications – MAX and our hosted endpoints are fully compatible with the OpenAI API standard. For custom kernels, Mojo interoperates directly with C++, CUDA, and ROCm, making it simple to migrate existing model codebases to the Modular Platform. And if you need hands-on help, we offer dedicated forward-deployed engineering support. Reach out to our team to explore the best path for you.
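    Because the endpoints speak the OpenAI API standard, pointing an existing openai-python application at Modular is typically just a change to the client's base_url and api_key; the calling code stays untouched. The URL, key, and model name below are placeholders, not confirmed values.

```python
from openai import OpenAI

# Before: client = OpenAI()  # pointed at api.openai.com
# After: point the same, unmodified calling code at a MAX server
# or a Modular-hosted endpoint (placeholder values shown).
client = OpenAI(
    base_url="http://localhost:8000/v1",  # or your hosted endpoint URL
    api_key="EMPTY",  # assumption: a local server may not need a real key
)

resp = client.chat.completions.create(
    model="Qwen3-8B",  # hypothetical model name
    messages=[{"role": "user", "content": "Hello from an unmodified OpenAI client!"}],
)
print(resp.choices[0].message.content)
```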

  • What level of customer support do you offer?

    Customer support varies by software edition. For paid editions (Batch, Dedicated, and Enterprise), we offer email, Slack, and Zoom/Google Hangouts support, as well as dedicated forward-deployed engineering support. Just reach out to our team to discuss what's best for you.