Deploy fast, scalable GenAI inference now
The MAX platform enables you to easily serve SOTA GenAI models across NVIDIA and AMD GPUs.
SOTA performance for GenAI workloads on open LLMs
Reduced TCO
Get SLA support directly from the world’s best AI engineering team
💸
From RAG to Agents
From RAG applications to next generation agents, we support them all.
👍
Simple Deployment
MAX can be set up and running in under 3 minutes, and scales easily from local to cloud.
⚙️
Iterate quickly from your laptop
Develop, test, and deploy in a unified environment that eliminates inconsistencies and accelerates your time to production.
OR
Deploy to any cloud VPC or Kubernetes, using the same codebase
Deploy to any cloud provider with ease, ensuring flexibility and scalability without having to reconfigure for different environments.
A portable GPU software stack that gives you options
Easy to switch
Have a PoC working with a closed, proprietary model and now you're ready to own your AI stack? That's what MAX does best!
Use any open source model
Run Llama 3.1 with MAX now, or use another open source model by following the tutorial linked below.
Run on GPU or CPU
Get great performance and utilization across all your instances.
Deploy to any cloud
Deploy yourself anywhere, or book time with our support team to help you configure your business-critical applications and connect them to your inference pipelines.
Own, control, and secure your AI future
Get off the endpoint
Gain full control over performance, security, and optimization.
Manage your data privacy & compliance
Get peace of mind with MAX. Own your ML pipelines and avoid sending your proprietary data to external sources.
Own your IP
Control every layer of your stack. Get your weights from anywhere. Customize down to the kernel if needed.
FAQ
How do I use MAX?
The Modular Accelerated Xecution (MAX) platform is a unified set of APIs and tools that simplifies the process of building and deploying your own high-performance AI endpoint. To get started with MAX, either locally or via a Docker container, just Install MAX or follow one of our tutorials, such as Deploying Llama 3 on GPU with MAX Serve.
What does MAX replace?
We created MAX to solve the fragmented and confusing array of AI tools that plagues the industry today. Our unified toolkit is designed to help the world build high-performance AI pipelines and deploy them to any hardware, removing the need for vendor-specific libraries. You can read more in our blog post.
How much do I have to pay to use MAX?
MAX is a free and permissive AI inference framework that enables developers and enterprises to develop and deploy AI inference workloads on any hardware type, in any type of environment (including production). We offer MAX Enterprise for organizations seeking enterprise support, and you can read more on our Pricing Page.
Is MAX compatible with my current stack?
Almost certainly. MAX is built without vendor-specific hardware libraries, enabling it to scale effortlessly across a wide range of CPUs and GPUs. It integrates tightly with AI ecosystem tools such as Python, PyTorch, and Hugging Face, and has a fully extensible API surface. MAX Serve is available as a ready-to-deploy container and provides an OpenAI-compatible API endpoint, so it deploys easily with Docker and Kubernetes. Read more here.
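Because MAX Serve exposes an OpenAI-compatible endpoint, any OpenAI-style client code can talk to it unchanged. The sketch below, using only the Python standard library, shows the shape of such a request; the base URL and model name are assumptions for illustration — substitute whatever your own deployment uses.

```python
# Minimal sketch of querying a locally running MAX Serve instance through
# its OpenAI-compatible chat completions endpoint. BASE_URL and MODEL are
# assumptions -- replace them with your deployment's address and model id.
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumed local MAX Serve address
MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # hypothetical model id

def build_chat_request(prompt: str) -> dict:
    """Assemble an OpenAI-style chat completion payload."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

def send_chat(prompt: str) -> str:
    """POST the payload to the endpoint and return the completion text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Inspect the request payload without needing a running server:
print(json.dumps(build_chat_request("Why is the sky blue?"), indent=2))
```

With a MAX Serve container running locally, calling `send_chat("Why is the sky blue?")` would return the model's reply; the same payload works with the official `openai` Python client pointed at the same base URL.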
What models does MAX currently support?
MAX supports model formats provided by Hugging Face, PyTorch, ONNX, and MAX Graphs (our model format). We have a fully integrated LLM serving and execution stack that provides SOTA performance out of the box. You can read more about the models here.
Free to start. Scale as you grow.
MAX is FREE for anyone to self-manage. Looking for enterprise solutions and dedicated support? Book a demo or reach out to our sales team.
Developer Approved 👍
“Max installation on Mac M2 and running llama3 in (q6_k and q4_k) was a breeze! Thank you Modular team!”
“Mojo is Python++. It will be, when complete, a strict superset of the Python language. But it also has additional functionality so we can write high performance code that takes advantage of modern accelerators.”
“Tired of the two language problem. I have one foot in the ML world and one foot in the geospatial world, and both struggle with the "two-language" problem. Having Mojo - as one language all the way through would be awesome.”
“Mojo can replace the C programs too. It works across the stack. It’s not glue code. It’s the whole ecosystem.”
“What @modular is doing with Mojo and the MaxPlatform is a completely different ballgame.”
“I am focusing my time to help advance @Modular. I may be starting from scratch but I feel it’s what I need to do to contribute to #AI for the next generation.”
“Mojo and the MAX Graph API are the surest bet for longterm multi-arch future-substrate NN compilation”
“A few weeks ago, I started learning Mojo 🔥 and MAX. Mojo has the potential to take over AI development. It's Python++. Simple to learn, and extremely fast.”
“Mojo destroys Python in speed. 12x faster without even trying. The future is bright!”
“It’s fast which is awesome. And it’s easy. It’s not CUDA programming...easy to optimize.”
“I tried MAX builds last night, impressive indeed. I couldn't believe what I was seeing... performance is insane.”
“Mojo gives me the feeling of superpowers. I did not expect it to outperform a well-known solution like llama.cpp.”
“C is known for being as fast as assembly, but when we implemented the same logic on Mojo and used some of the out-of-the-box features, it showed a huge increase in performance... It was amazing.”
“I'm excited, you're excited, everyone is excited to see what's new in Mojo and MAX and the amazing achievements of the team at Modular.”
“I'm very excited to see this coming together and what it represents, not just for MAX, but my hope for what it could also mean for the broader ecosystem that mojo could interact with.”
“It worked like a charm, with impressive speed. Now my version is about twice as fast as Julia's (7 ms vs. 12 ms for a 10 million vector; 7 ms on the playground. I guess on my computer, it might be even faster). Amazing.”
“The more I benchmark, the more impressed I am with the MAX Engine.”