Editions that work for everyone.
Develop and scale with us.
Community-first. Always free.
Enterprise-grade deployments as you grow.
FREE FOREVER
Modular Community Edition
An open AI platform with the power of MAX & Mojo, free for ALL developers.
SoTA inference performance for LLMs
Run AI models and pipelines on any CPU and on NVIDIA GPUs
Deploy MAX & Mojo yourself anywhere
Build with open source & join the community
Community support through Discord and GitHub
Modular Community License
“I tried MAX builds last night, impressive indeed. I couldn't believe what I was seeing... performance is insane.”

@drdude
FAQ
What is MAX? What is Mojo?
MAX is a high-performance inference framework that enables you to serve LLMs, write high-performance inference graphs, and develop custom, scalable test-time inference pipelines across image, audio, video, and text.
Mojo is a Pythonic language for blazing-fast CPU+GPU execution without CUDA. Optionally use it with MAX for insanely fast AI inference.
Can you explain the community license in simple terms? I’m not a lawyer.
You can use Mojo and MAX for any non-production or non-commercial use, however you’d like.
Additionally, you can use Mojo and MAX for free for commercial purposes. We simply ask that you email usage@modular.com to let us know, and allow us to put your company logo on our website so we can demonstrate the impact of our work. Such usage is unlimited on CPUs and NVIDIA hardware, and extends to up to 8 discrete GPUs from other vendors - though MAX doesn't support any other vendors today.
Can I use MAX for free on a 512-GPU NVIDIA cluster?
Yes, for any purpose.
If you deploy MAX & Mojo for commercial, production use, we ask that you email usage@modular.com to let us know, and allow us to put your company logo on our website so we can demonstrate the impact of our work. Such usage is unlimited on CPUs and NVIDIA hardware, and on up to 8 discrete GPUs from other vendors - though MAX doesn't support any other vendors today.
Can I use Mojo and MAX on a laptop or desktop PC?
Yes - most laptops and desktops have a CPU and 8 or fewer discrete GPUs, and you can use MAX for anything on such machines. Additionally, you can use Mojo and MAX for free for any non-production or non-commercial use case!
Please just talk to us if you want to use MAX with more than 8 non-NVIDIA accelerators in a PC for a commercial purpose.
Can I use MAX on Apple GPUs?
Yes - from a licensing perspective, Apple GPUs show up as a single device in the computer and are therefore counted as a single GPU, so you can use them without limit for both personal and commercial use cases.
Keep in mind that MAX does not support Apple GPUs today, but this is something we may explore in the future.
How does Modular define a "device"?
Modular software is licensed on a discrete physical per-device basis: any discrete physical device accessible by the software counts as a single device.
The Community Edition lets you scale limitlessly for non-production, non-commercial use cases. For production, commercial use you can scale to unlimited CPU and NVIDIA hardware deployments, and can utilize up to 8 devices of other types before we require you to contact us for licensing.
What happens if I build on MAX & Mojo, and Modular disappears?
The Modular Community License is intentionally structured so Modular can’t “rug pull” the software.
If for some reason Modular decides to stop updating the SDK, you can continue using the existing versions under a perpetual license. This is much more liberal than licenses for comparable software like CUDA.
Why is Modular Community Edition limited to 8 GPUs from other vendors?
We would like to scale to support a wide variety of hardware from multiple vendors over time, but AI software is difficult to build and expensive for a startup like ours to scale, support, and qualify in our releases.
MAX is free for use on all hardware it officially supports today, and we will continue to evaluate the right approach on a case-by-case basis over time as we expand support for more hardware.
If you’re interested in MAX supporting hardware from other vendors, please reach out to them and encourage them to work with us.
PAY PER GPU
Modular Self-Hosted Enterprise
Our self-hosted enterprise platform, managed on your own Kubernetes infrastructure and supported by our world-class AI team.
Enabled for simple self-hosting in your cloud or data center.
Modular K8s multi-node inference scaling technology
High-volume distributed inference scaling
Enterprise MAX AI Framework for NVIDIA & AMD GPUs
Per-GPU pricing & dedicated enterprise support
MAX Enterprise License
"The usage experience is outstanding. The CLI is concise ... MAX is optimal for AI model scenarios requiring low latency and high request throughput."

AI Engineer
@ Fortune 10 company
Deploy with us today
FAQ
What is the Self-Hosted Modular Enterprise Edition?
Modular Enterprise Edition is a highly scalable inference platform that enables you to serve production-grade inference across thousands of GPUs, supporting all of the latest LLMs at enormous scale for your enterprise.
It is multi-cloud, supports hardware from multiple GPU vendors, enables multi-model routing, and includes the latest GenAI optimizations to tune performance directly for your workload and traffic. It is perfect for on-premise or advanced AI infrastructure teams.
What support do you provide for Enterprise Edition?
You will have access to the world's best AI infrastructure team, who can work with you directly to build out your AI infrastructure stack. We offer dedicated Slack channels, direct support, and incredible response times to ensure that our infrastructure meets your enterprise needs.
Why use the Self-hosted Enterprise Edition vs Community?
The Self-Hosted Enterprise Edition features robust, enterprise-grade AI inference infrastructure that enables you to scale across GPU vendors and delivers SOTA performance for the largest workloads. It can help you directly manage your AI infrastructure for GenAI at unprecedented scale, delivering larger TCO wins directly for your business. You also get advanced support for MAX and Mojo directly from our team.
Contact us right now to discuss how we can help.
How does your pricing work for Self-hosted Enterprise Edition?
We charge on a per-GPU basis for our Enterprise Edition, and you can talk to us to discuss how we can tailor it to your specific needs.
Contact for pricing
BYOC Enterprise
Fully managed inference platform to support the largest deployments needed by your enterprise with expert guidance at every step.
Fully managed Modular Platform in your own cloud
Modular K8s multi-node inference scaling technology
Distributed inference scaling at tens of thousands of tokens/s
Enterprise MAX AI Framework for NVIDIA & AMD GPUs
Per-GPU pricing & dedicated enterprise support
MAX Enterprise License
Talk to sales about self-hosting
Start building with Modular
Easy ways to get started
Get started guide
With just a few commands, you can install MAX as a conda package and deploy a GenAI model on a local endpoint.
Browse open source models
Copy, customize, and deploy. Get your GenAI app up and running FAST with total control over every layer.
Find Examples
Follow step-by-step recipes to build agents, chatbots, and more with MAX.
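Once a GenAI model is deployed on a local endpoint as described above, clients typically talk to it over an OpenAI-style chat API. The sketch below is purely illustrative and makes assumptions: the endpoint URL, port, and model identifier are hypothetical placeholders, not official Modular values - check the get started guide for the exact commands and names your deployment uses.

```python
import json

# Hypothetical sketch of an OpenAI-style chat-completions request to a
# locally served model. ENDPOINT and the model ID are assumptions for
# illustration only.
ENDPOINT = "http://localhost:8000/v1/chat/completions"  # assumed local endpoint

payload = {
    "model": "my-local-model",  # hypothetical model identifier
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
}

# Serialize the request body; POST this to ENDPOINT with any HTTP client.
body = json.dumps(payload).encode("utf-8")
print(json.loads(body)["model"])  # -> my-local-model
```

Because the endpoint follows OpenAI-style conventions, existing OpenAI-compatible client libraries can usually be pointed at the local URL without code changes.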