Deploy fast, scalable GenAI inference now
The MAX platform enables you to easily serve SOTA GenAI models across NVIDIA and AMD GPUs.
SOTA performance for GenAI workloads on open LLMs
Reduced TCO
Get SLA support directly from the world’s best AI engineering team
💸
From RAG to Agents
From RAG applications to next generation agents, we support them all.
👍
Simple Deployment
MAX can be set up and running in under 3 minutes, and scales easily from local to cloud.
⚙️
Iterate quickly from your laptop
Develop, test, and deploy in a unified environment that eliminates inconsistencies and accelerates your time to production.
OR
Deploy to any cloud VPC or Kubernetes, using the same codebase
Deploy to any cloud provider with ease, ensuring flexibility and scalability without having to reconfigure for different environments.
A portable GPU software stack that gives you options
Easy to switch
Have a PoC working with a closed, proprietary model and now you're ready to own your AI stack? That's what MAX does best!
Use any open source model
Run Llama 3.1 with MAX now, or use another open source model by following the tutorial linked below.
Run on GPU or CPU
Get great performance and utilization across all your instances.
Deploy to any cloud
Deploy yourself anywhere, or book time with our support team to help you configure your business-critical applications and connect them to your inference pipelines.
Own, control, and secure your AI future
Get off the endpoint
Gain full control over performance, security, and optimization.
Manage your data privacy & compliance
Get peace of mind with MAX. Own your ML pipelines and avoid sending your proprietary data to external sources.
Own your IP
Control every layer of your stack. Get your weights from anywhere. Customize down to the kernel if needed.
FAQ
How do I use MAX?
The Modular Accelerated Xecution (MAX) platform is a unified set of APIs and tools that simplifies the process of building and deploying your own high-performance AI endpoint. To get started with MAX, either locally or via a Docker container, just Install MAX or follow one of our tutorials, such as Deploying Llama 3 on GPU with MAX Serve.
What does MAX replace?
We created MAX to solve the fragmented and confusing array of AI tools that plagues the industry today. Our unified toolkit is designed to help the world build high-performance AI pipelines and deploy them to any hardware, removing the need for vendor-specific libraries. You can read more in our blog post.
How much do I have to pay to use MAX?
MAX is a free and permissive AI inference framework that enables developers and enterprises to develop and deploy AI inference workloads on any hardware type, in any type of environment (including production). We offer MAX Enterprise for organizations seeking enterprise support, and you can read more on our Pricing Page.
Is MAX compatible with my current stack?
Almost certainly. MAX is built without vendor-specific hardware libraries, enabling it to scale effortlessly across a wide range of CPUs and GPUs. It integrates tightly with AI ecosystem tools such as Python, PyTorch, and Hugging Face, and has a fully extensible API surface. MAX Serve is available as a ready-to-deploy container and provides an OpenAI-compatible API endpoint, so it deploys easily with Docker and Kubernetes. Read more here.
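Because MAX Serve exposes an OpenAI-compatible endpoint, any OpenAI-style client code can talk to it unchanged. The sketch below, using only the Python standard library, shows the shape of such a request; the base URL and model name are assumptions for illustration — substitute whatever your own deployment uses.

```python
# Minimal sketch of querying a locally running MAX Serve instance through
# its OpenAI-compatible chat completions endpoint. BASE_URL and MODEL are
# assumptions -- replace them with your deployment's address and model id.
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumed local MAX Serve address
MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # hypothetical model id

def build_chat_request(prompt: str) -> dict:
    """Assemble an OpenAI-style chat completion payload."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

def send_chat(prompt: str) -> str:
    """POST the payload to the endpoint and return the completion text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Inspect the request payload without needing a running server:
print(json.dumps(build_chat_request("Why is the sky blue?"), indent=2))
```

With a MAX Serve container running locally, calling `send_chat("Why is the sky blue?")` would return the model's reply; the same payload works with the official `openai` Python client pointed at the same base URL.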
What models does MAX currently support?
MAX supports model formats provided by Hugging Face, PyTorch, ONNX, and MAX Graphs (our model format). We have a fully integrated LLM serving and execution stack that provides SOTA performance out of the box. You can read more about the models here.
Free to start. Scale as you grow.
MAX is FREE for anyone to self-manage. Looking for enterprise solutions and dedicated support? Book a demo or reach out to our sales team.
Developer Approved 👍
“Max installation on Mac M2 and running llama3 in (q6_k and q4_k) was a breeze! Thank you Modular team!”
“Mojo is Python++. It will be, when complete, a strict superset of the Python language. But it also has additional functionality so we can write high performance code that takes advantage of modern accelerators.”
“Tired of the two language problem. I have one foot in the ML world and one foot in the geospatial world, and both struggle with the "two-language" problem. Having Mojo - as one language all the way through would be awesome.”
“Mojo can replace the C programs too. It works across the stack. It’s not glue code. It’s the whole ecosystem.”
“What @modular is doing with Mojo and the MaxPlatform is a completely different ballgame.”
“I am focusing my time to help advance @Modular. I may be starting from scratch but I feel it’s what I need to do to contribute to #AI for the next generation.”
“Mojo and the MAX Graph API are the surest bet for longterm multi-arch future-substrate NN compilation”
“A few weeks ago, I started learning Mojo 🔥 and MAX. Mojo has the potential to take over AI development. It's Python++. Simple to learn, and extremely fast.”
“Mojo destroys Python in speed. 12x faster without even trying. The future is bright!”
“It’s fast which is awesome. And it’s easy. It’s not CUDA programming...easy to optimize.”
“I tried MAX builds last night, impressive indeed. I couldn't believe what I was seeing... performance is insane.”
“Mojo gives me the feeling of superpowers. I did not expect it to outperform a well-known solution like llama.cpp.”
“C is known for being as fast as assembly, but when we implemented the same logic on Mojo and used some of the out-of-the-box features, it showed a huge increase in performance... It was amazing.”
“I'm excited, you're excited, everyone is excited to see what's new in Mojo and MAX and the amazing achievements of the team at Modular.”
“I'm very excited to see this coming together and what it represents, not just for MAX, but my hope for what it could also mean for the broader ecosystem that mojo could interact with.”
“It worked like a charm, with impressive speed. Now my version is about twice as fast as Julia's (7 ms vs. 12 ms for a 10 million vector; 7 ms on the playground. I guess on my computer, it might be even faster). Amazing.”
“The more I benchmark, the more impressed I am with the MAX Engine.”