Between a global hackathon, a major release, and standout community projects, last month was full of progress across the Modular ecosystem!

Modular Platform 25.4 launched on June 18th, alongside the announcement of our official partnership with AMD, bringing full support for AMD Instinct™ MI300X and MI325X GPUs. You can now deploy the same container across both AMD and NVIDIA hardware with no code changes, no vendor lock-in, and no additional configuration!

Highlights from 25.4 include up to 53% better throughput on prefill-heavy BF16 workloads across Llama 3.1, Gemma 3, Mistral, and other state-of-the-art language models. The release also added support for AMD MI300/325, RDNA3/4, and NVIDIA RTX 2060–5090, along with expanded model coverage.

June also united builders from around the world for Modular Hack Weekend, where developers created everything from Fast Fourier Transform implementations and GPU-accelerated quantum simulators to high-performance bioinformatics libraries. We introduced Mammoth, our new system for scaling GenAI inference across any GPU, and rolled out new ways to integrate Mojo kernels directly into Python workflows. The community pushed the boundaries of kernel design, explored breakthroughs in scientific computing, and continued expanding what’s possible with Mojo and MAX.

Let’s take a look at everything the Modular universe made possible last month.

Blogs, Tutorials, and Videos

Developers from across the AI and systems programming communities recently came together for Modular Hack Weekend: a global, virtual hackathon focused on GPU programming and model implementation with Mojo and MAX.
- To kick off the hackathon, we hosted a GPU Programming Workshop, both in-person at our office in Los Altos, California, and virtually via livestream. Check out all the talk recordings:
  - Chris Lattner and Tim Davis, co-founders of Modular
  - Chuan Li, founding team member and Chief Scientific Officer at Lambda
  - Bin Bao, Software Engineer and torch.compile Tech Lead at Meta
  - Jared Roesch, cofounder of OctoAI and Distinguished Engineer at NVIDIA
- Watch the hackathon highlight reel and check out the recap blog post.
- Explore the winning projects and all the submitted projects.
We dropped a series of exciting announcements in our video premiere:
- Modular Platform is now generally available on AMD Instinct™ MI300X and MI325 GPUs! Benchmarks show up to 53% better throughput on prefill-heavy BF16 workflows. Together with AMD, we’re combining best-in-class compute with developer-friendly software. Check out the full blog post.
- Meet Mammoth: our new Kubernetes-native system for scaling GenAI inference across any GPU. Deploy Hugging Face models across AMD and NVIDIA from a single container, with no manual configuration. Join the public preview.
- Mojo in Python: you can now drop Mojo kernels directly into your Python workflows. Available today in nightly builds, and backed by 450k+ lines of open source Mojo kernel code. Start here.
Now on YouTube: Chris Lattner's full talk from AMD AdvancingAI 2025! Learn how Mojo brings together Python’s simplicity and C++ performance to power a next-gen AI software stack. Plus, catch the post-talk Q&A with Chris.
Chris Lattner joined the Latent Space podcast to share an inside look at the history of Modular and Mojo, and the future of GPU programming.
Our June community meeting featured two in-depth presentations on how Mojo is being applied in scientific computing:
- Bioinformatics with Mojo: Seth walked us through ish, a high-performance, index-free alignment tool built in Mojo. He shared insights on SIMD optimizations, GPU acceleration, and benchmarking against C++ libraries like Parasail.
- Particle Physics with Mojo: Photon shared how Mojo is helping streamline complex particle physics simulations. He introduced two open-source libraries, newmojo and hepjo, and discussed porting a C++/Python research pipeline to Mojo with promising performance gains.
Simon Veitner published a deep dive on crafting a blazing-fast matrix transpose kernel for NVIDIA Hopper. He covers TMA, swizzling, thread coarsening, and shows how far you can push performance using pure Mojo.
- If you want to understand how to set up descriptors and move data efficiently, start with Simon's previous post.
- New to Mojo? Start from the beginning. Simon’s intro post shows how to write your first GPU kernel using vector addition in Mojo’s Pythonic syntax.
- If you're looking to push Mojo even further, Simon also walked through using custom PTX instructions in Mojo for advanced GPU control.
We released our comic series, GPU Whisperers, that perfectly captures the beautiful chaos of living through the GenAI revolution! 🧑‍🚀
Vincent Warmerdam shared an excellent writeup on calling Mojo from Python.
Modular is now available on the Amazon Web Services (AWS) Marketplace! 500+ Pre-Optimized Models, with an OpenAI API Compatible endpoint, ready for you to run across NVIDIA B200, H200, H100, A100, A10, L40 and L4 GPUs, with intelligent batching and memory management.
Modular Tech Talks is an exclusive series featuring internal presentations from our engineering team, explaining the inner workings of the Modular technology stack. In our most recent edition, Kyle Caverly gives a tour of the MAX Pipelines architecture, covering its major interfaces and how they enable the Modular team to rapidly bring up state of the art models with high-performance features like KV Cache optimization and speculative decoding.
Vibe coding your next Mojo masterpiece? You’re in luck: check out our guide on using AI coding assistants like Cursor and Copilot to build faster with Mojo and MAX.
Our recent Democratizing AI Compute series by Chris Lattner offers a clear perspective on the challenges shaping the future of AI infra, and you can now explore the full series in one convenient place! 🔖 Bookmark for later or subscribe to the RSS feed.
- In the latest installment, Chris Lattner plots a flight plan across the Modular stack: Mojo the molten inner world, MAX the mighty gas giant, all orbiting in the Mammoth cluster.
Building high-performance AI infrastructure doesn’t have to take months. Inworld proved that by launching a state-of-the-art speech pipeline into production in under 8 weeks with Modular. Their blog post explains how they used MAX and Mojo to run on NVIDIA Blackwell GPUs, meet real-time latency targets that were 70% faster than using the latest vLLM.

Awesome MAX + Mojo

forfudan created an online book on Mojo called “Mojo Miji - A Guide to Mojo Programming Language from A Pythonista’s Perspective”.
TilliFe started a notebook series on training transformer neural networks in Nabla, a framework for differentiable programming in Mojo.
HammadHAB built a simple, lightweight INI file parser in Mojo.
26 community members shared their Modular Hack Weekend projects!

Open-Source Contributions

‍If you’ve recently had your first PR merged, message Caroline Frasca in the forum to claim your epic Modular swag! Check out the recently merged contributions from our amazing community members:

Ivo-Balbaert [1]
simveit [1][2][3]
soraros [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23][24][25][26][27][28][29][30][31][32][33][34][35][36][37][38][39][40][41]
martinvuyk [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15]
christoph-schlumpf [1]
bgreni [1][2]
gabrieldemarmiesse [1]
sstadick [1][2][3][4]
bgreni [1][2][3]
hardikkgupta [1][2][3]
sibarras [1]
msaelices [1][2]
mzaks [1][2]
zsiegel92 [1]
winding-lines [1]
samufi [1]

Modverse #49: Modular Platform 25.4, Modular 🤝 AMD, and Modular Hack Weekend

Blogs, Tutorials, and Videos

Awesome MAX + Mojo

Open-Source Contributions

Next blog post:

Modverse #49: Modular Platform 25.4, Modular 🤝 AMD, and Modular Hack Weekend

Blogs, Tutorials, and Videos

Awesome MAX + Mojo

Open-Source Contributions

Sign up for our newsletter

Next blog post: