The world's fastest unified matrix multiplication

April 20, 2023

Abdul Dakkak

AI Compiler Engineer

Chad Jarvis

AI Performance Engineer

Eric Johnson

Product Lead

Hengjie Wang

AI Performance Engineer

Ian Tramble

AI Performance Engineer

"Matmul", a microcosm of AI performance

In our previous blog post, we described why AI needs to solve its compute fragmentation problem to reach its full potential and how matrix multiplication ("matmul") exemplifies why this remains an unsolved problem. In this post, we describe Modular’s approach to solving this problem and its game-changing benefits, including a new standard in state-of-the-art (SOTA) performance on CPU as compared to existing solutions.

Before we get there, however, let’s recap where existing implementations fall short and why building a generalizable solution is so difficult. Remember, the AI industry today is bound by hardware performance and memory capacity. The result has been a plethora of diverse, parallel hardware architectures, each backed by highly optimized kernel libraries from its hardware vendor.

The problem for AI developers is that these kernel libraries are monolithic “point solutions” that each support only a small subset of the industry's hardware and use cases. They are often written in assembly to maximize performance, but as a result, they sacrifice composability, hackability, and portability to multiple hardware architectures. And they are large in code size due to their need to specialize on specific shapes and data types.

A novel approach

Capitalizing on years of experience building AI infrastructure that has scaled to billions of users, Modular has developed a novel approach to solving this industry-wide problem. To do so, we rethought the entire stack from first principles and built something that is truly differentiated in the industry today.

Instead of following a traditional approach of writing hard-coded kernels or a matmul compiler, we built a much more general and extensible technology that combines the best features of both approaches. This technology enables kernel authors to quickly develop high-performance kernels that span shapes, layouts, data types, and hardware architectures. Our event on May 2 (you should tune in!) and a future blog post will talk more about how our technology works, while this post focuses on the benefits and contributions of our approach.

Unification starts with a single source of truth

If you dig into the source code for libraries such as OneDNN, you find many implementations of matmul – each hard-coded and specialized for a different use case. You’ll find one for each data type (FP16, FP32, FP64, Int8), for various memory layouts (transposed or non-transposed), for special aspect ratios (square or tall-and-skinny), for different instruction set features, and more.

These fragmented point solutions make it difficult for engineers to improve the library for all possible use cases because there is too much code. This also leads to problems for users because these libraries take up a lot of disk space, swelling containers and distributions. For example, OneDNN is 51MB, MKL is 14MB, and cuBLAS is 150MB.

Modular combines what would typically be many bespoke hardware-specific implementations into a “Single Source of Truth.” As a result, expert kernel authors can build a single composable, extensible, and portable code base across architectures and use cases. And this approach enables rapid reuse of patterns and code, applicability to optimized sub-variants of problems, and easy adoption of exotic hardware features in special cases. The Modular implementation of matrix multiplication is typically less than 100 KB of machine code, which makes it practical to use in many different use cases, including mobile, web, and IoT.

Performance portability

Implementing a performant matmul for any individual chip is challenging, as we discussed in the previous post. The challenge is compounded, however, by the adoption of heterogeneous hardware in the AI industry, including various flavors of CPUs, GPUs, TPUs, and so much more. Yet, today’s kernel libraries only natively support a very limited number of target architectures.

For example, while OneDNN supports ARM cores, it is implemented as a wrapper around ARM’s own hardware-specific software library, the ARM Compute Library (ACL). Meanwhile, OneDNN is not optimized for AMD, which has its own fork of OneDNN called ZenDNN that leverages the AMD Optimizing CPU Libraries (AOCL). Mobile is another can of worms, where libraries such as Google’s Ruy are often used.

All these bespoke libraries pose a significant problem for AI frameworks and, ultimately, for the users of those frameworks. Frameworks get fragmented with different variants and forks, and users must mix and match different versions with different bugs and tradeoffs. This can introduce a big gap between theoretical performance and achieved performance because users often don’t know (or don’t want to know!) about this level of software.

At Modular, we love all the world’s hardware, and the generality of our approach extends well to many kinds of architectures. This allows us to provide a unified solution that defragments the framework software above the kernels. This also makes it much faster to implement high-performance support for new hardware types, with a comparably tiny engineering team and significantly less cost.

Dynamism

Some systems use kernels that are compiled “just in time” (JIT) or “ahead of time” (AOT) by advanced AI compilers, including Google’s XLA, Apache TVM, and OneDNN Graph. These compilers generate kernels specialized for specific matrix sizes, which reduces the code size of the distribution but still requires many kernels to exist at execution time. Other libraries, such as MKL, special-case the individual matrix sizes used by popular models directly in their kernel library.

These challenges have become even more problematic given the rise of dynamically shaped models like BERT (and countless large language models, segmentation models, object detectors, and so on!), which need to work on inputs (text or images) of arbitrary size. The challenge is that the system only knows the input size at inference time, not at model training or compilation time. Systems based on static shapes require padding the input or swapping between many versions of a model specialized for different sizes, often leading to low performance, large code size, and model management frustration. Some systems try to solve this with JIT code generation, but this introduces problems with unpredictably long tail latencies.
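
For instance, a static-shape system handling variable-length token sequences has to pad every request up to a fixed bucket size and spend compute on the padding, while a dynamic-shape system can simply run the matmul at the actual sequence length. A minimal NumPy sketch of the difference (the sizes are hypothetical):

import numpy as np

hidden = 768
max_seq_len = 512                       # static bucket the model was compiled for
tokens = np.random.rand(37, hidden)     # a request with only 37 tokens
weights = np.random.rand(hidden, hidden)

# Static-shape path: pad up to the bucket and run a 512x768x768 matmul,
# even though most of the rows are just zeros.
padded = np.zeros((max_seq_len, hidden))
padded[: tokens.shape[0]] = tokens
out_static = padded @ weights

# Dynamic-shape path: run only the 37x768x768 matmul that is actually needed.
out_dynamic = tokens @ weights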

The Modular approach completely eliminates these problems by fully supporting dynamism. Modular’s matrix multiplication and other kernels are fully dynamic shape friendly (without JIT or AOT specialization) and support other forms of dynamism (e.g., irregular control flow, unusual data types, etc.) that many existing systems struggle with. This delivers a much simpler and more predictable system overall.

Composability

In a neural network, a matmul is seldom performed in isolation. Typically, there are other operations (e.g., activation functions or elementwise operations) that are done before and after it. It is well known that “fusing” the code for these other operations into the matmul can produce significant performance benefits by improving memory locality and reducing dispatch overhead. There are two common approaches to address this problem – providing a limited number of pre-fused special cases or providing a domain-specific compiler to perform fusion.
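
Before comparing these approaches, it helps to see what fusion buys at the kernel level. The pure-Python schematic below (an illustration, not how any production kernel is written) shows that an unfused matmul followed by ReLU writes the full MxN intermediate to memory and then reads it back, while the fused version applies the activation while each output element is still at hand:

def matmul_then_relu(A, B):
    # Unfused: materialize the full MxN intermediate, then make a second pass over it.
    M, K, N = len(A), len(B), len(B[0])
    C = [[sum(A[i][k] * B[k][j] for k in range(K)) for j in range(N)] for i in range(M)]
    return [[max(c, 0.0) for c in row] for row in C]

def fused_matmul_relu(A, B):
    # Fused: apply the activation as each output element is produced, so the
    # intermediate never makes a round trip through memory.
    M, K, N = len(A), len(B), len(B[0])
    return [[max(sum(A[i][k] * B[k][j] for k in range(K)), 0.0)
             for j in range(N)] for i in range(M)]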

The first approach is the best known and is widely used by both TensorFlow and PyTorch, as well as many other specialized frameworks (ONNXRuntime, TFLite, TensorRT, etc.). This approach is very powerful and flexible because researchers and domain experts who are not compiler engineers can extend the system. The challenge is that there are a vast number of potential combinations of operators (this is one of the reasons why TensorFlow and PyTorch have thousands of kernels!). Hand-fusing these operators further exacerbates the code size and maintainability problems discussed above.

The second approach provides a different point in the tradeoff space – AI compilers like OneDNN Graph, XLA, and NVFuser provide a wide range of kernel fusions without having to special-case them all. Unfortunately, they force you to choose from a small fixed operator set without extensibility. Also, while novel fusions can provide great benefits, these compilers often don’t meet the performance of traditional human-authored fused kernel libraries.

The Modular approach provides both benefits – it supports generalized fusions with a wide range of operators without having to manually write and maintain variants. More importantly, the Modular approach allows generality and extensibility without having to recompile the system and without having to be a compiler engineer. We think it will enable major new research avenues and applications by experts who may not know compiler internals.

Unparalleled performance

While flexibility, generality, and usability sound great, they aren’t worth anything if they come at the expense of performance. Performance costs directly drive operational costs, and all businesses want to be more efficient. We’re excited to share some of our early results on CPU (GPUs are coming soon!), even though they are just the beginning for the Modular system, and we have a lot of work left to do.

For our analysis, we decided to look at a range of comparable systems available today in AWS, specifically Intel Skylake (c5.4xlarge), AMD Zen 2 (c5a.4xlarge), and Amazon Graviton 2 (c6g.4xlarge). This covers two completely different instruction sets (Intel and AMD are x86-64, Graviton is ARM AArch64) with three major vector designs (AVX-512, AVX2, and NEON, respectively) at three different vector lengths (512, 256, and 128 bits).

We measure the Modular approach against the best-known SOTA libraries on the corresponding systems – MKL and OneDNN on Intel, AOCL on AMD (the underlying library for ZenDNN), and ACL and Ruy on ARM. We also include Eigen because it is a widely used kernel library that has been ported to many architectures. We use the latest version of each of these at the time of writing – specifically, we use MKL v2023.1.0, OneDNN v2023.1.0, Eigen v3.4, AOCL v4.0, ACL v23.02.1, and Ruy (pulled from main #363f252).

Methodology

For our evaluation, we followed the same benchmarking methodology as Google Benchmark, where each benchmark is first warmed up and then repeatedly run until 2 seconds have elapsed. For libraries that require extra setup, we perform the setup outside the main benchmarking loop. To avoid interference and improve stability, we ensure that each benchmark invocation starts with a cold cache, and we disable hyperthreading. While the Modular implementation does support pre-packing, not all of the libraries we evaluate support it, so to maintain fairness we do not benchmark our implementation with pre-packing enabled. We note that we have benchmarked our pre-packed implementation against the libraries that do support pre-packing, and it is competitive with them.
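
As a rough sketch of this methodology (not our actual harness; only the warm-up, the 2-second budget, and GFLOP/s reporting are taken from the description above, and the cold-cache and threading controls are omitted), the timing loop looks something like this:

import time
import numpy as np

def benchmark_matmul(M, N, K, min_time_s=2.0):
    # Left-hand side is MxK, right-hand side is KxN (see the shape convention below).
    A = np.random.rand(M, K).astype(np.float32)
    B = np.random.rand(K, N).astype(np.float32)

    A @ B  # warm-up run, excluded from timing

    iters = 0
    start = time.perf_counter()
    while time.perf_counter() - start < min_time_s:
        A @ B
        iters += 1
    elapsed = time.perf_counter() - start

    # A matmul performs 2*M*N*K floating-point operations (a multiply and an add
    # per inner-product term), so throughput in GFLOP/s is:
    return 2.0 * M * N * K * iters / elapsed / 1e9

print(benchmark_matmul(128, 768, 768))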

As we discussed in our previous post, matrix multiplication is used for a wide variety of use cases. For this study, we decided to measure ourselves against the most important shapes in the AI industry, which are most likely to have been optimized by existing libraries. As such, we selected matrix shapes mined from popular AI models such as BERT (with sequence lengths of 128 and 256), GPT, and DLRM. The shapes listed are in the MxNxK form, where the left-hand side operand of matmul has a size of MxK, and the right-hand side has a size of KxN. The shapes are ordered by their importance to the end-to-end model execution.
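
To make the convention concrete, here is a minimal NumPy example (the shape is a hypothetical BERT-like layer, with M=128 tokens and a 768-wide hidden dimension):

import numpy as np

# MxNxK form: M = 128, N = 768, K = 768.
M, N, K = 128, 768, 768

A = np.random.rand(M, K).astype(np.float32)  # left-hand side operand, MxK
B = np.random.rand(K, N).astype(np.float32)  # right-hand side operand, KxN

C = A @ B          # performs 2*M*N*K floating-point operations
assert C.shape == (M, N)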

Finally, while there are many interesting data types like Int4, FP8, and bfloat16, we wanted to keep things simple and comparable. The Modular system can, of course, support any and all types out there, but for this analysis, we focus on FP32, which is widely used and tuned by each implementation we reference.

Performance results

With this in mind, we start by looking at the single-threaded performance on the Intel Skylake (c5.4xlarge) system. Single-threaded performance doesn’t utilize the entire chip but helps normalize results (i.e., removing higher-order factors like multi-processing, false sharing, NUMA issues, etc.). Beyond that, it forms the foundation of multi-threaded performance and is essential for certain use cases in mobile and game engines.

The figure below shows the performance in GigaFLOPs per second (GFLOP/s) of the Modular approach and other SOTA implementations for the Intel system – MKL, OneDNN, and Eigen. The Modular matmul implementation achieves performance that is on par with or better than existing SOTA solutions. In fact, we are roughly 1.5 times faster than OneDNN on this Intel system.

While the single-threaded performance is a useful datapoint, full multi-threaded performance is what typically matters for server use cases. This also puts much more stress on the machine, as a full peak implementation can run into limitations like peak FLOP/s, DRAM bandwidth, cache utilization, etc. Below we show Modular’s performance against these systems – Modular is 1.46 times faster than OneDNN, which is a remarkable achievement given the generality and other benefits we discussed before.

While strong results on one hardware platform are important, a fundamental value proposition of the Modular implementation is that a single source of truth can deliver high performance across a wide range of hardware. Let’s look at Modular’s performance on AMD hardware, when compared to the AOCL library (which is the SOTA on AMD and is the backbone of ZenDNN) and the OneDNN library we saw above.

Looking at the graph below, you can see that the performance of OneDNN does not translate to AMD hardware, and while the AOCL library provides significant uplifts, the Modular approach is approximately 1.6 times faster than SOTA on the AMD system.

We also performed the same experiment on the Amazon Graviton 2 system, this time including the Ruy and ACL libraries. Ruy is the library used by edge frameworks such as TensorFlow Lite, and ACL is the backbone of the OneDNN support for ARM.

Even though ACL and Ruy are both competitive on ARM, the Modular implementation achieves significantly better performance on average – 1.8 times better than ACL and 1.2 times better than Ruy.

In addition to comparing different implementations of matrix multiplication on a given system, it is also interesting to cross-compare the absolute performance of these different systems. These are very complicated machines with a lot of moving parts, but we can look at things at a coarse grain. The Intel system benefits from having 512-bit long vectors instead of shorter 256- or 128-bit vectors. The Graviton 2 system performs well despite a shorter 128-bit vector because it has 16 physical cores, compared to the 8 physical cores on the Intel and AMD systems.
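
As a rough back-of-the-envelope model (illustrative only: the pipe counts and clock speeds below are assumptions about these microarchitectures, not measurements from this post), peak FP32 throughput scales roughly as cores × FP32 lanes per vector × FMA pipes × 2 FLOPs per FMA × clock frequency, which helps explain why Graviton 2’s core-count advantage offsets much of its narrower vector width:

def peak_gflops(cores, vector_bits, fma_pipes, ghz):
    # FP32 lanes per vector register, times 2 FLOPs per fused multiply-add.
    lanes = vector_bits // 32
    return cores * lanes * fma_pipes * 2 * ghz

# Assumed configurations (hypothetical pipe counts and sustained clock speeds).
print(peak_gflops(cores=8,  vector_bits=512, fma_pipes=2, ghz=3.0))   # Skylake-like (AVX-512)
print(peak_gflops(cores=8,  vector_bits=256, fma_pipes=2, ghz=3.3))   # Zen 2-like (AVX2)
print(peak_gflops(cores=16, vector_bits=128, fma_pipes=2, ghz=2.5))   # Graviton 2-like (NEON)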

Kernel fusion aware

Finally, to demonstrate the composability of the implementation, we will look at how matmul can be fused with other operations. We want to compare against common operations that other implementations have highly tuned, so we use a “fully connected” (FC) block, defined by the equation "activation(matmul(A, B)+bias)," where activation is an activation function (we use ReLU below). Libraries such as OneDNN have fused paths for the FC block, and to make the results fair, we only compare against libraries that provide a way to define the FC block in a fusible fashion. Therefore, in this analysis, we compare against OneDNN and Eigen, since they provide a way to express the fusion patterns. For consistency of presentation, we use the same shapes and three hardware configurations as above.
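
For reference, the FC block being benchmarked computes the following (a NumPy sketch of the mathematical definition above, not of any library’s fused kernel):

import numpy as np

def fc_block(A, B, bias):
    # Fully connected block: activation(matmul(A, B) + bias), with ReLU as the activation.
    return np.maximum(A @ B + bias, 0.0)

M, N, K = 128, 768, 768
A = np.random.rand(M, K).astype(np.float32)
B = np.random.rand(K, N).astype(np.float32)
bias = np.random.rand(N).astype(np.float32)
out = fc_block(A, B, bias)  # shape MxN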

Below we can see the performance of the FC block on the Intel Skylake architecture. Despite the generality and flexibility of the Modular approach, it sets a new SOTA, outperforming OneDNN by 1.45 times and Eigen by 1.8 times on average.

We see similar strong results on the AMD system, where the Modular approach delivers a 2.1 times performance advantage over OneDNN and 2.3 times performance improvement over Eigen.

The Amazon Graviton 2 system is a significantly different architecture, and the Modular stack is not as tuned as it is for X86-64. Still, even here, we can see that Modular delivers a 1.3 times performance improvement over Eigen and 1.1 times over Ruy. OneDNN/ACL does not provide a fused FC layer for ARM systems.

As we can see from the data above, kernel fusion can provide significant performance uplifts when implemented right, and the benefits become even more significant as the fusion region grows. Modular’s approach was built to embrace fusion from the beginning, which allows it to support a very general set of fusions (i.e., far beyond elementwise operations, and not limited to matmul). We think that delivering a single-source-of-truth implementation that is performance portable, dynamic, and composable is a key contribution that will enable new research and production use cases.

What’s next

While we are excited about our performance results, the most important thing about them is that we can achieve them without compromising on our original goals, and that they are just the beginning! The Modular matmul has a single source of truth and supports many different architectures, dynamic shapes, and extensible fusions. Beyond delivering many “today” benefits to our users, the generality of our architecture allows us to radically simplify the stack above, produce a more predictable user experience, and enable rapid bring-up of new hardware in a way that people haven’t experienced before.

It is also worthwhile to emphasize that today’s existing SOTA implementations are the product of decades of research and development by many incredibly talented engineers. Modular has a talented but comparatively small team and has been able to deliver strong results quickly because of key technology advances in our stack. But even more than that, it’s also the result of a first-principles rethink and a willingness to truly invest in the “rebuild from the bottom up” approach that Modular was founded on.

This is all part of our broader vision to make AI usable by anyone, anywhere, and to enable AI to truly impact the world in a more meaningful and useful way. By creating novel approaches to AI infrastructure, we imagine a world where we can help the entire industry develop and deploy AI systems faster, more efficiently, and more safely, ultimately making AI more accessible to the whole world. We hope to empower the entire hardware industry to build new and novel compute architectures, driving hardware innovation forward through software.

While we show matmul performance in this blog post, we have applied the same methodology across the stack. Tune in to the launch to see how these improvements translate to end-to-end performance, and to learn about the technology that enables them. We are excited to explain more about how it works soon. Sign up for our upcoming May 2, 2023 launch event at Modular.com to learn more. Additionally, Modular is growing its exceptional team – if you are interested in building and driving forward the state of the art of AI infrastructure, please check out the openings on our careers page.
