DeepSeek’s recent breakthrough has upended assumptions about AI’s compute demands, showing that better hardware utilization can dramatically reduce the need for expensive GPUs. This has sent shockwaves through the industry–not just because DeepSeek’s open-source model rivals proprietary offerings from OpenAI and Meta, but because it challenges the belief that AI supremacy is gated by compute scale. Big Tech is scrambling–not only to compete with DeepSeek’s model, but to defend the notion that massive infrastructure is essential to stay ahead.
For years, leading AI companies have insisted that only those with vast compute resources can drive cutting-edge research, reinforcing the idea that it is “hopeless to catch up” unless you have billions of dollars to spend on infrastructure. But DeepSeek’s success tells a different story: novel ideas can unlock efficiency breakthroughs that accelerate AI, and smaller, highly focused teams can challenge industry giants–and even level the playing field.
We believe DeepSeek’s efficiency breakthrough signals a coming surge in demand for AI applications. If AI is to continue advancing, we must drive down the Total Cost of Ownership (TCO)–by expanding access to alternative hardware, maximizing efficiency on existing systems, and accelerating software innovation. Otherwise, we risk a future where AI’s benefits are bottlenecked–either by hardware shortages or by developers struggling to effectively utilize the diverse hardware that is available.
This isn’t just an abstract problem–it's a challenge I’ve spent my career working to solve.
My passion for compute + developer efficiency

I’ve spent the past 25 years working to unlock computing power for the world. I founded and led the development of LLVM, a compiler infrastructure that opened CPUs to new applications of compiler technology. Today, LLVM is the foundation for performance-oriented programming languages like C++, Rust, Swift and more. It powers nearly all iOS and Android apps, as well as the infrastructure behind major internet services from Google and Meta.
This work paved the way for several key innovations I led at Apple, including the creation of OpenCL, an early accelerator framework now widely adopted across the industry; the rebuild of Apple’s CPU and GPU software stack using LLVM; and the development of the Swift programming language. These experiences reinforced my belief in the power of shared infrastructure, the importance of co-designing hardware and software, and how intuitive, developer-friendly tools unlock the full potential of advanced hardware.
Falling in love with AI

In 2017, I became fascinated by AI’s potential and joined Google to lead software development for the TPU platform. At the time, the hardware was ready, but the software wasn’t functional. Over the next two and a half years, through intense team effort, we launched TPUs in Google Cloud, scaled them to ExaFLOPS of compute, and built a research platform that enabled breakthroughs like Attention Is All You Need and BERT.
Yet, this journey revealed deeper troubles in AI software. Despite TPUs' success, they remain only semi-compatible with AI frameworks like PyTorch–an issue Google overcomes with vast economic and research resources. A common customer question was, “Can TPUs run arbitrary AI models out of the box?” The hard truth? No–because we didn’t have CUDA, the de facto standard for AI development.
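To make “semi-compatible” concrete, here is a minimal sketch (a toy example, not code from that era) of what this looks like for a PyTorch user: CUDA support ships in stock PyTorch, while TPUs require the separate torch_xla package and its lazy tracing and compilation path. The tiny model and shapes below are placeholders.

```python
import torch
import torch.nn as nn

# A toy model standing in for an arbitrary user workload.
model = nn.Linear(128, 10)
x = torch.randn(32, 128)

# NVIDIA GPUs: CUDA support is built into stock PyTorch.
if torch.cuda.is_available():
    out = model.to("cuda")(x.to("cuda"))

# TPUs: requires the separate torch_xla package, which traces ops lazily and
# compiles them through XLA; ops XLA can't handle fall back to the CPU.
try:
    import torch_xla.core.xla_model as xm
    device = xm.xla_device()
    out = model.to(device)(x.to(device))
    xm.mark_step()  # materialize the lazily traced computation
except ImportError:
    pass  # torch_xla (and a TPU) not available in this environment
```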
I’m not one to shy away from tackling major industry problems: my recent work has been the creation of next-generation technologies to scale into this new era of hardware and accelerators. This includes the MLIR compiler framework (now widely adopted for AI compilers across the industry). The Modular team has also spent the last three years building something special–but we’ll share more about that later, when the time is right.
How do GPUs and next-generation compute move forward?

Because of my background and relationships across the industry, I’m often asked about the future of compute. Today, countless groups are innovating in hardware (fueled in part by NVIDIA’s soaring market cap), while many software teams are adopting MLIR to enable new architectures. At the same time, senior leaders are questioning why–despite massive investments–the AI software problem remains unsolved. The challenge isn’t a lack of motivation or resources. So why does the industry feel stuck?
I don’t believe we are stuck. But we do face difficult, foundational problems.
To move forward, we need to better understand the underlying industry dynamics. Compute is a deeply technical field, evolving rapidly, and filled with jargon, codenames, and press releases designed to make every new product sound revolutionary. Many people try to cut through the noise to see the forest for the trees, but to truly understand where we’re going, we need to examine the roots–the fundamental building blocks that hold everything together.
This post is the first in a multipart series where we’ll help answer these critical questions in a straightforward, accessible way:
🧐 What exactly is CUDA?
🎯 Why has CUDA been so successful?
⚖️ Is CUDA any good?
❓ Why do other hardware makers struggle to provide comparable AI software?
⚡ Why haven’t existing technologies like Triton or OneAPI or OpenCL solved this?
🚀 How can we as an industry move forward?

I hope this series sparks meaningful discussions and raises the level of understanding around these complex issues. The rapid advancements in AI–like DeepSeek’s recent breakthroughs–remind us that software and algorithmic innovation are still driving forces. A deep understanding of low-level hardware continues to unlock “10x” breakthroughs.
AI is advancing at an unprecedented pace–but there’s still so much left to unlock. Together, we can break it down, challenge assumptions, and push the industry forward. Let’s dive in!
-Chris
What’s next?

Learn more about the MAX Platform and the Mojo programming language, and join us in building the next wave of AI innovation.