April 8, 2025

Democratizing AI Compute, Part 8: What about the MLIR compiler infrastructure?

Chris Lattner

By 2018, AI software had a system fragmentation problem. TensorFlow, PyTorch, JAX, Glow, ONNX, TensorFlow-Lite, XLA, TVM—the list kept growing, and each framework invented its own tangled web of “AI graphs” with different “ops.” The ecosystem was splintering into silos, each racing to optimize for different hardware while reinventing the same ideas with subtle variations. Complexity was exploding, and something had to give.

At the time, I was helping scale Google’s TPUs (and several other internal ASICs) in support of TensorFlow, and it was clear we couldn’t keep reinventing compiler infrastructure from scratch for every project. We needed a better foundation. Fortunately, I had years of experience building LLVM—and Jeff Dean as my manager. Jeff, a legendary engineer and a compiler PhD himself, saw the same problem.

In a 1:1 conversation, Jeff said something like:

“Hey Chris, I agree we have a compiler problem. Why don’t you go build a new compiler to unify this mess?”

And so, MLIR was born—a modular, extensible compiler infrastructure designed to bring order to the chaos. It provided a foundation that could scale across hardware platforms, software frameworks, and the rapidly evolving needs of machine learning, aiming to unify these systems and provide a technology platform that could harmonize compute from many different hardware makers.

But unification is hard. What started as a technical project quickly turned into a battleground: open-source governance, corporate rivalries, and competing visions all collided. What could have been a straightforward engineering win became something much more complicated.

Today, MLIR is embedded in nearly every major AI stack—including CUDA—yet it still hasn’t delivered on the dream of democratized AI compute.

This is the story of MLIR: how it started, what it changed, and the power struggles along the way.

You're reading Part 8 of Modular’s “Democratizing AI Compute” series.

MLIR, the Origin Story

Modern AI systems rely on complex graphs of operations—matrix multiplications, convolutions, attention mechanisms, and more—all strung together into computational pipelines. Optimizing and transforming these graphs efficiently requires a solid compiler foundation, as discussed in part 6.

But in 2018, most AI frameworks were reinventing compiler technology—and often doing it poorly. Basic techniques like Static Single Assignment (SSA) were missing from many. Each framework had its own ad-hoc graph system, bolted together with hacks that didn’t scale. The result was a fragmented, inefficient ecosystem, riddled with duplication.

I knew we needed a better approach, so I pulled four like-minded folks into a small room at Google. We spent days white-boarding, sketching out what a modern, scalable compiler infrastructure for AI might look like. Our central question: Could we build a unified representation that could support every AI framework, every hardware backend, and every kind of optimization—from algebraic simplification to polyhedral analysis?

Circa 2018: Yours truly and four colleagues gather in front of a whiteboard to brainstorm a next-generation compiler

The breakthrough idea we created is now known as MLIR dialects—a way to cleanly separate domain-specific concerns from the core infrastructure of a compiler. Rather than forcing every user to adopt a rigid, one-size-fits-all intermediate representation (as LLVM and other compilers do), MLIR would let compiler engineers define their own representations—custom ops, types, and semantics—tailored to their domain.

Aside: I’m not diving deep on how MLIR works in this post. If you’re curious, check out the original technical keynote or one of the many tutorials online.

At the time, this was a radical departure from how most compilers were built. Traditional infrastructures were monolithic—forcing all frontends and passes into a single, rigid model. But MLIR embraced heterogeneity from day one. It let multiple levels of abstraction coexist, transform, and interoperate seamlessly.

That modularity was the key. Instead of reimplementing the same infrastructure over and over, MLIR gave developers a shared foundation—whether they were working with TensorFlow graphs, PyTorch IR, or custom TPU ops. It made it possible to build specialized compilers without starting from scratch, and it enabled true composability across the AI compiler stack.

MLIR wasn’t just another compiler: It was a framework for building many compilers.
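To make the dialect idea concrete, here is a minimal, illustrative sketch of MLIR’s textual IR (my own toy example, not taken from the original announcement, and exact op spellings vary across MLIR versions): ops from the upstream func, arith, and linalg dialects coexist in a single function, each namespaced by its dialect.

```mlir
// A toy function mixing ops from several upstream MLIR dialects.
// Each op is prefixed by its dialect (func., arith., linalg.), so
// different levels of abstraction coexist in one module and can be
// lowered independently by whichever backend consumes them.
func.func @matmul(%a: tensor<4x8xf32>, %b: tensor<8x4xf32>,
                  %init: tensor<4x4xf32>) -> tensor<4x4xf32> {
  // arith: a plain scalar constant, used to zero the accumulator.
  %zero = arith.constant 0.0 : f32
  // linalg: high-level structured ops describing the computation.
  %acc = linalg.fill ins(%zero : f32)
                     outs(%init : tensor<4x4xf32>) -> tensor<4x4xf32>
  %mm  = linalg.matmul ins(%a, %b : tensor<4x8xf32>, tensor<8x4xf32>)
                       outs(%acc : tensor<4x4xf32>) -> tensor<4x4xf32>
  func.return %mm : tensor<4x4xf32>
}
```

A hardware team could then add its own dialect for device-specific ops and progressively lower this IR toward it, reusing the shared parsing, verification, and pass infrastructure along the way.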

MLIR Growth Within Google and Beyond

MLIR began as a research project inside Google Brain, with a focused team trying to rethink how AI compilers should work. My team was heads-down on the fundamentals: designing the IR, implementing transformations, and validating that the core ideas actually worked. Meanwhile, Google’s open culture and MLIR’s modular design made it easy for others to pick it up and experiment. Before long, MLIR took on a life of its own.

Across Google, teams working on custom ASICs saw the potential. MLIR gave them a structured way to express and optimize hardware-specific operations. Application-focused teams started using it for mobile AI, and the TensorFlow team brought MLIR into TensorFlow Lite. Even individual researchers, fascinated by MLIR’s flexibility, began using it to prototype novel compiler techniques.

What followed was a mini-explosion of use cases. Every new application brought fresh feedback, often while we were still deep in iteration mode. Crucially, this validated our dialect-first approach—proving that MLIR could scale across wildly different domains, from edge devices to datacenter accelerators. Eventually, we reached a tipping point: MLIR was becoming a critical piece of infrastructure across many projects.

Many of us wanted MLIR to reach its full potential—to go beyond Google’s first-party use cases.

Above: a well-known meme within the MLIR community (Credit: Mehdi Amini)

So we took the leap: we open-sourced MLIR and contributed it to the LLVM Foundation, making it available for the entire industry. To support adoption, we organized regular “open design meetings,” where external contributors could participate in MLIR’s evolution and benefit from the engineering investment behind it. This open collaboration helped catalyze MLIR’s global momentum, especially among compiler developers hungry for a modern infrastructure.

With this as fuel, MLIR took off: it is now the foundation for many major AI projects, including OpenXLA, Triton, and even parts of CUDA itself. It’s also powering compilers in quantum computing, hardware design (via CIRCT), and many other domains. Companies around the world—from scrappy startups to hyperscalers—started building their next-generation compilers using MLIR. Much of MLIR’s early growth and success was directly attributable to Google’s leadership and open approach—something I think the industry still under-appreciates.

Yet for all that success, the grand vision remained out of reach. The ecosystem is still fractured. CUDA still reigns supreme. The dream of truly democratized AI compute remains just that—a dream.

So what happened? Why did MLIR succeed technically, but fail to break the CUDA lock-in?

To understand that, we need to talk about the politics, power struggles, and compromises that shaped MLIR’s evolution.

The Race to Claim an End-to-end AI Solution

From the outset, MLIR was conceived as general-purpose compiler infrastructure—a framework designed to enable domain-specific compilers. The goal was flexibility and modularity—MLIR was never just about Machine Learning. In fact, the “ML” in MLIR stood for everything but Machine Learning (yep, compiler jokes are nerdy!). However, the AI community was hungry for something more: an end-to-end compiler that could map TensorFlow or PyTorch models cleanly onto a broad range of hardware.

The race was on to build the first end-to-end MLIR-based AI solution

As MLIR gained traction, teams inside and outside Google began racing to build an end-to-end AI solution on top of it. Other projects—OpenXLA, TritonLang, and many others—adopted MLIR as an implementation detail to strengthen their own stacks. This raised a question: everyone wanted to be the next-gen AI stack—so who would get there first?

The race was on. Years later, we know the unfortunate answer: nobody.

MLIR’s AI Dialect Explosion

Contributing MLIR to the LLVM Foundation supercharged adoption. It gave companies a shared foundation—and compiler engineers a chance to prove serious impact inside their organizations. The LLVM Foundation helps with oversight and legal matters, but doesn’t intervene in technical design. For that, the community is left to self-organize.

Engineers across the industry, led by Google, started contributing AI-specific dialects—including arith, linalg, and tensor—providing some of the bits and pieces needed to build a modern AI compiler stack. It started with Google research teams who had early access to MLIR—but the precedent was set: many “potentially useful” contributions were upstreamed, with limited governance in place to allow project leaders to say “no” in a principled way.

Unfortunately, this explosion happened very early in MLIR’s design, and many design decisions in these dialects weren’t ideal for the evolving requirements of GenAI. For example, much of this early work was directed towards improving TensorFlow and building OpenXLA, so these dialects weren’t designed with first-class PyTorch and GenAI support (as we discussed earlier in this series).

While many of these efforts hit their original goals, the world changed around them.

Competitive “Coopetition” Strikes Back

For a variety of reasons, almost all of the early MLIR developers (including myself) moved on from Google, with many of them ending up at hardware companies.  This spread of MLIR knowledge was a positive outcome—it meant that the technology would grow far and wide—but it also brought new challenges.

The problem? MLIR’s success scattered its core developers across the industry. Former allies and colleagues—now at competing companies—began building proprietary AI stacks on top of shared MLIR dialects. What began as open collaboration soon collided with commercial competition. With a lack of central coordination, communication between these teams broke down. Competing priorities created tension, and the once-unified vision for MLIR began to splinter.

MLIR's identity crisis: Machine learning solution or compiler framework?

MLIR now faces an identity crisis: Is it a general-purpose compiler framework for any domain—or an AI solution? Today, it remains unmatched as general-purpose, reusable infrastructure, powering everything from hardware design to quantum computing. On the other hand, the built-in AI-related dialects are contested and incomplete—but still critical to many open and proprietary downstream stacks.

It started to feel a lot like OpenCL all over again: no reference stack, competing hardware vendors, and a very polite battlefield—just like the old Khronos committee.

A New Hope: Improved MLIR Governance

The tensions have simmered for years—and they're deeply felt across the broader LLVM and MLIR communities.

Fortunately, there’s a new hope: LLVM is a meritocratic community with a long track record of aligning engineers—even when their companies are at war in the market. The MLIR community is filled with amazing engineers who have poured years of heart and soul into the project to work through these challenges, and progress is now happening!

MLIR now has a new Area Team to help guide its evolution, along with a new organizational structure, charter, and governance group. The charter defines separate area groups: MLIR Core (the domain-independent infrastructure) and the dialects (like the machine learning-specific pieces). I am extremely thankful to everyone who is spending time to improve MLIR and work through these issues—such work has a profound impact on everyone building on the ecosystem as well as on downstream users.

If I could have one wish, it would be for “MLIR” to unambiguously refer to the domain-independent compiler infrastructure, and for these dialects to get a new, different name (perhaps “TensorIR”?). This would reduce confusion about what “MLIR” actually is!

Lessons learned from MLIR

The biggest lesson I learned from MLIR is how scaling too early—before the core foundations are fully settled—can cause lasting problems. The early explosion of interest and contribution was exciting, but it also meant that many design decisions were made in parallel, without clear guidance or alignment. We got “many things fast” at the expense of getting “something great at each level,” and then fell prey to Hyrum's Law.

This also reinforced a management lesson I’ve learned in other places: when you have too many smart engineers running ahead in different directions, it’s hard to steer the ship later—even if the ship is made of beautifully designed IR. In this case, while I remain influential in the LLVM/MLIR community, I learned that influence is no match for the paycheck from an employer, which guides a contributor to get their work into the tree so they can move on to the next bug fix or project.

Another lesson is about infrastructure with ambition. My goal for MLIR was to unify compiler implementations—and it succeeded beyond my hopes. But I also encouraged and catalyzed others to aim beyond that, fueled by a shared optimism that community-led projects could move the world forward. That didn’t work out, and it reinforced a lesson I’ve seen across other industry-impactful projects I’ve helped build—LLVM, Clang, Swift, and “MLIR Core.” I learned, more than ever, that small teams are best at aligning on a vision of success and driving it forward. Only once a project’s identity is firmly established does it make sense to scale it to a broader community.

MLIR has many dialects, but several are contested or incomplete.

Following the tradition of my last three blog posts, I’ll evaluate the MLIR AI dialects against the wishlist of features for a next-generation AI solution. Here’s my best take:

  • “Provide a reference implementation”: While MLIR is excellent for general-purpose compiler infrastructure, it does not include an end-to-end solution that can be used directly for AI workloads, just useful building blocks with “some assembly required”. 👎
  • “Have strong leadership and vision”: MLIR AI dialects lacked clear leadership early on, with contributions often driven by individuals or different teams, resulting in fragmented direction and confusion over their core identity. While strong leadership is emerging, the issue remains unresolved. 👎
  • “Run with top performance on the industry leader’s hardware”: While MLIR Core provides a strong foundation for optimization, I’m not aware of any downstream implementations built on the MLIR AI dialects that match CUDA’s performance for GenAI LLMs on NVIDIA GPUs (including Triton and cuTile, which leave 15-20% of performance on the table). 👎
  • “Evolve rapidly”: MLIR’s pace of evolution has been impressive, with contributions flooding in from a broad community. The flexibility of its design has allowed for rapid adaptation to new use cases and domains. 👍
  • “Cultivate developer love”: MLIR has certainly won the hearts of compiler engineers and system researchers, offering a flexible and powerful toolkit for building custom compilers. 👍  However, AI developers, especially those in the machine learning community, have found the learning curve steep and the integration with existing ML frameworks to be less seamless. 👎
  • “Build an open community”: MLIR has built a truly open and thriving community. Regular design meetings, open contributions, and cross-company collaboration have helped it gain broad adoption and investment from many industry players. 👍👍
  • “Avoid fragmentation”: This is where MLIR has struggled the most. The early explosion of dialects and contributions, combined with a lack of strong central governance, led to fragmentation in downstream systems. The vision for a unified approach to AI compilation was difficult to maintain as competing projects moved in different directions. 👎👎👎

Ultimately, as we discussed before, this is a wildly unfair way to measure “MLIR Core” as a compiler-building toolkit—MLIR is widely used by dozens of systems and has certainly succeeded in its original mission. The success of MLIR’s AI dialects is best measured through their impact on the countless downstream AI implementations that build on them—I’m just not sure how to measure that.

Why do HW companies struggle to build AI software?

At this point in the series, a pattern has emerged: whether it’s OpenCL/OneAPI, TVM/XLA, MLIR, or some other well-meaning acronym, we’ve seen powerful attempts to build unifying AI infrastructure—but none have delivered a solution that developers love. Projects fragment, promises fade, and users of alternate hardware are left with tools that don’t “just work”.

The hard truth is this: only one company has ever truly figured this out, and that’s NVIDIA. CUDA isn’t just infrastructure—it’s a strategy, backed by tight vertical integration, application engineers on the ground, and a relentless focus on real-world performance. It’s not open and it’s not pretty—but it works great for NVIDIA, even if the innovator’s dilemma is alive and well in Santa Clara.

So, why can’t other hardware companies pull this off? Why do the industry’s smartest people, backed by billions in funding, keep producing software no one wants to use? When you’re competing against an entrenched, vertically integrated leader, the deck is stacked against you—and the incentives of the industry and the organizations within it shape the outcome:

“Show me the incentive and I'll show you the outcome.”
– Charlie Munger

We’ll dive deeper into that next time—and until then, let no dialect go uncanonicalized! 🛠

-Chris

Chris Lattner, Co-Founder & CEO

Distinguished Leader who founded and scaled critical infrastructure including LLVM, Clang, MLIR, Cloud TPUs and the Swift programming language. Chris built AI and core systems at multiple world leading technology companies including Apple, Google, SiFive and Tesla.

clattner@modular.com