Blog


Democratizing AI Compute Series

Go behind the scenes of the AI industry with Chris Lattner

News / Engineering

The Five Eras of KVCache

vLLM, SGLang, TensorRT-LLM, and MAX Serve are all built on top of increasingly sophisticated KVCache management. This blog explores the evolution and role of the KVCache in these inference engines.

February 5, 2026 / Brian Zhang

News / Engineering

Achieving State-of-the-Art Performance on AMD MI355 — in Just 14 Days

In late August, AMD and TensorWave reached out to collaborate on a presentation for AMD’s Media Tech Day—they asked if we could demo MAX on AMD Instinct™ MI355 on September 16th. There was just one problem: no one at Modular had access to an MI355.

October 17, 2025 / Tracy Sharpe, Anand Pratap Singh, Prince Jain, Abdul Dakkak

News / Engineering

Exploring Metaprogramming in Mojo

May 27, 2025 / Brian Grenier

News / Engineering

Agentic Building Blocks: Creating AI Agents with MAX Serve and OpenAI Function Calling

January 30, 2025 / Ehsan M. Kermani

News / Engineering

Use MAX with Open WebUI for RAG and Web Search

Learn how quickly MAX and Open WebUI get you up and running with RAG, web search, and Llama 3.1 on GPU.

January 23, 2025 / Bill Welense

News / Engineering

Hands-on with Mojo 24.6

Mojo 24.6 introduces key improvements in argument conventions, memory management, and reference tracking, enhancing code clarity and safety with features like 'mut' for mutable arguments, 'origins' for references, and new collection types.

January 21, 2025 / Ehsan M. Kermani

News / Engineering

Evaluating Llama Guard with MAX 24.6 and Hugging Face

Imagine unlocking a world of open innovation while ensuring secure, reliable, and enterprise-ready Gen AI deployments—MAX 24.6 enables enterprise AI teams to seamlessly run a vast range of cutting-edge AI models from Hugging Face on NVIDIA GPUs.

December 19, 2024 / Bill Welense

News / Engineering

Build a Continuous Chat Interface with Llama 3 and MAX Serve

December 17, 2024 / Ehsan M. Kermani

News / Engineering

MAX GPU: State-of-the-Art Throughput on a New GenAI Platform

Measuring state-of-the-art GPU performance on Modular's MAX 24.6, compared to vLLM.

December 17, 2024 / Max Hutchinson, Tyler Kenney

News / Engineering

Understanding SIMD: Infinite Complexity of Trivial Problems

A deep dive into the complexities of optimizing code for SIMD instruction sets across multiple platforms.

October 25, 2024 / Ash Vardanian

  • Series

    Democratizing Compute Series

    Go behind the scenes of the AI industry in this blog series by Chris Lattner. Trace the evolution of AI compute, dissect its current challenges, and discover how Modular is raising the bar with the world’s most open inference stack.

    11 part series

  • Series

    Matrix Multiplication on Blackwell

    Learn how to write a high-performance GPU kernel on Blackwell that offers performance competitive to that of NVIDIA's cuBLAS implementation while leveraging Mojo's special features to make the kernel as simple as possible.

    4 part series


Build the future of AI with Modular

View Editions
  • Get started guide

    Install MAX with a few commands and deploy a GenAI model locally.

    Read Guide
  • Browse open models

    500+ models, many optimized for lightning-fast performance

    Browse models