Fast and unified GenAI for enterprise

Save money on inference using any model, hardware, & cloud

SOTA performance on GenAI workloads

[Benchmark highlights: 3765 output throughput (req/s); 52ms; 7.1ms; 4.1k]

Experience MAX’s performance firsthand

Replicate our benchmarks.

Run MLPerf benchmarks on any compatible model without writing any code.

Read Blog

Request performance data

Looking for more specific performance data? Our sales team can provide exact data for any use case.

Contact Us

Develop locally, deploy globally. With the same code.

Iterate faster from your laptop

Don’t waste time and money deploying to the cloud. Validate your proof of concept right from your laptop.

Try MAX

Minimal dependencies make life easy

MAX's small container lets you deploy easily to NVIDIA and AMD GPUs with minimal dependencies, saving you time and simplifying deployment.

Get Started

Deploy to any cloud VPC or Kubernetes

Start creating your own defensible IP and take control of your data privacy and compliance.

Read Tutorial

Integrates with PyTorch. Scales to the Max.

Bring any PyTorch model

Execute a wide range of models with seamless PyTorch eager & torch.compile integrations.

Get started
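
To illustrate, an eager-mode PyTorch model can be handed to torch.compile unchanged. A minimal sketch with a stand-in toy model (the module and shapes are illustrative, not a MAX-specific API):

import torch

# A stand-in toy model; any eager-mode PyTorch module follows the same path.
class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))

model = TinyModel()

# The standard torch.compile entry point; no model changes required.
compiled = torch.compile(model)

print(compiled(torch.randn(2, 16)).shape)  # torch.Size([2, 4])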

Serve optimized MAX models

Browse models designed specifically for even better performance with MAX.

View Models

Serve PyTorch LLMs from Hugging Face

MAX Serve’s native Hugging Face model support enables you to rapidly develop, test, and deploy any PyTorch LLM.

Free yourself from lock-in. Multi-cloud. Multi-hardware.

Avoid lock-in. Choose freely.

MAX gives you more flexibility and scalability, enabling seamless deployment across different cloud providers or on-premises systems while optimizing performance and cost.

OpenAI compatible endpoint

Quickly integrate existing applications and workflows without needing to rewrite code or learn new APIs.
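
Because the endpoint speaks the OpenAI API, existing client code only needs a new base URL. A minimal sketch, assuming a MAX endpoint served locally on port 8000 (the URL and model id are placeholders for your own deployment):

from openai import OpenAI

# Point the standard OpenAI client at your MAX endpoint.
# base_url and model are placeholders; substitute your deployment's values.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model id
    messages=[{"role": "user", "content": "Summarize MAX in one sentence."}],
)
print(response.choices[0].message.content)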

Scale your workloads

Handle increasing or fluctuating demands in processing AI tasks, ensuring optimal performance and cost-effectiveness.

Out-of-the-box performance & utilization

Get immediate performance wins with torch.compile interoperability and MAX’s custom stack & backend.

Control, secure, & own all your AI.

Own your IP

Control every layer of your stack. Get your weights from anywhere. Customize down to the kernel if needed.

Manage your data privacy & compliance

Get peace of mind with MAX. Own your ML pipelines and avoid sending your proprietary data to external sources.

Own your AI endpoint

Unify your AI infrastructure and own your endpoint for seamless performance and better control.

Get started with MAX. Deploy in minutes.

Install and start running LLMs in 3 steps

Install MAX with just 3 terminal commands, then run any of our optimized models with a single command.

01    Install package manager

$ curl -ssL https://magic.modular.com | bash

02    Clone the MAX repo

$ git clone https://github.com/modularml/max && cd max/pipelines/python

03    Run a model

$ magic run llama3 --prompt "What is the meaning of life?"

View Docs

Develop with Python APIs

Use what you know. Python integrations let you interoperate with your existing workloads and offload onto MAX where it matters.
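
As a sketch of what that offload can look like (assuming a MAX Engine Python API shaped like InferenceSession.load/execute; names and paths here are illustrative, so check the MAX docs for the current API):

import numpy as np
from max import engine  # assumed import path; see the MAX docs

# Load a model into a MAX inference session; the path is a placeholder.
session = engine.InferenceSession()
model = session.load("path/to/exported_model")

# Pre- and post-processing stay in your existing Python workload...
batch = np.random.rand(1, 16).astype(np.float32)

# ...while the hot inference step is offloaded onto MAX.
outputs = model.execute(batch)  # exact call signature may differ
print(outputs)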

Streamlined AI deployment

Simplify your infrastructure, optimization, and integration processes so you can leverage more AI with fewer technical hurdles.

Use your existing use cases & tools

Use the MAX APIs to build, optimize, and deploy everything from a single model to complex GenAI pipelines, on CPUs or GPUs.

Where MAX sits in your stack

The MAX inference engine sits inside your preferred cloud provider and gives you SOTA performance on NVIDIA and AMD GPUs.

Read More

What developers are saying about MAX

“I'm excited, you're excited, everyone is excited to see what's new in Mojo and MAX and the amazing achievements of the team at Modular.”

Eprahim

“Max installation on Mac M2 and running llama3 in (q6_k and q4_k) was a breeze! Thank you Modular team!”

NL

“The Community is incredible and so supportive. It’s awesome to be part of.”

benny.n

“I am focusing my time to help advance @Modular. I may be starting from scratch but I feel it’s what I need to do to contribute to #AI for the next generation.”

mytechnotalent

“What @modular is doing with Mojo and the MaxPlatform is a completely different ballgame.”

scrumtuous

“Mojo and the MAX Graph API are the surest bet for longterm multi-arch future-substrate NN compilation”

pagilgukey

“I'm very excited to see this coming together and what it represents, not just for MAX, but my hope for what it could also mean for the broader ecosystem that mojo could interact with.”

strangemonad

“I tried MAX builds last night, impressive indeed. I couldn't believe what I was seeing... performance is insane.”

drdude81

“The more I benchmark, the more impressed I am with the MAX Engine.”

justin_76273
