One language, any hardware. Systems-level performance. Pythonic syntax.

Mojo unifies high-level AI development with low-level systems programming. Write once, deploy everywhere - from CPUs to GPUs - without vendor lock-in.

Mojo highlights

  # GPU kernel: element-wise add, one thread per element
  fn add(out: &mut LayoutTensor, a: &LayoutTensor, b: &LayoutTensor):
      i = global_idx.x
      if i < size:
          out[i] = a[i] + b[i]

  # Python interop: square a Python-owned int64 array in place, SIMD-width elements at a time
  def mojo_square_array(array_obj: PythonObject):
      alias simd_width = simdwidthof[DType.int64]()
      ptr = array_obj.ctypes.data.unsafe_get_as_pointer[DType.int64]()
      @parameter
      fn pow[width: Int](i: Int):
          elem = ptr.load[width=width](i)
          ptr.store[width=width](i, elem * elem)

  # Custom op: pick the CPU or GPU implementation at compile time
  struct VectorAddition:
      @staticmethod
      def execute[target: StaticString](
          out: OutputTensor[rank=1],
          lhs: InputTensor[dtype = out.dtype, rank = out.rank],
          rhs: InputTensor[dtype = out.dtype, rank = out.rank]
      ):
          @parameter
          if target == "cpu":
              vector_addition_cpu(out, lhs, rhs)
          elif target == "gpu":
              vector_addition_gpu(out, lhs, rhs)
          else:
              raise Error("No known target:", target)

Why we built Mojo

  • Vendor lock-in is expensive

    You're forced to choose: NVIDIA's CUDA, AMD's ROCm, or Intel's oneAPI. Rewrite everything when you switch vendors. Your code becomes a hostage to hardware politics.

  • The two-language tax

    Prototype in Python. Rewrite in C++ for production. Debug across language boundaries. Your team splits into 'researchers' and 'engineers' - neither can work on the full stack.

  • Python hits a wall

    Python is 1000x too slow for production AI. The GIL blocks true parallelism. Can't access GPUs directly. Every optimization means dropping into C extensions. Simplicity becomes a liability at scale.

  • Toolchain chaos

    PyTorch for training. TensorRT for inference. vLLM for serving. Each tool has its own bugs, limitations, and learning curve. Integration nightmares multiply with every component.

  • Memory bugs in production

    C++ gives you footguns by default. Race conditions in parallel code. Memory leaks that OOM your servers. Segfaults in production at 3 AM.

  • Developer experience ignored

    30-minute build times. Cryptic template errors. Debuggers that can't inspect GPU state. Profilers that lie about performance. Modern developers deserve tools that accelerate, not frustrate.

Why should I use Mojo?

  • Easier

    GPU Programming Made Easy

    Traditionally, writing custom GPU code means diving into CUDA, managing memory, and compiling separate device code. Mojo simplifies the whole experience while unlocking top-tier performance on NVIDIA and AMD GPUs.

      # GPU-specific coordinates for MMA tile processing
      @parameter
      for n_mma in range(num_n_mmas):
          alias mma_id = n_mma * num_m_mmas + m_mma
          var mask_frag_row = mask_warp_row + m_mma * MMA_M
          var mask_frag_col = mask_warp_col + n_mma * MMA_N
          @parameter
          if is_nvidia_gpu():
              mask_frag_row += lane // (MMA_N // p_frag_simdwidth)
              mask_frag_col += (lane * p_frag_simdwidth) % MMA_N
          elif is_amd_gpu():
              mask_frag_row += (lane // MMA_N) * p_frag_simdwidth
              mask_frag_col += lane % MMA_N
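
    To make the vendor branch above concrete, here is a minimal plain-Python sketch of the same index arithmetic (Python rather than Mojo, so it runs anywhere). The function name fragment_offset and the tile parameters (MMA_N = 8 with a SIMD width of 2 and 32 lanes for the NVIDIA branch, MMA_N = 16 with a SIMD width of 4 and 64 lanes for the AMD branch) are assumptions chosen for illustration; they do not come from the kernel itself.

```python
def fragment_offset(lane, vendor, mma_n, simd_width):
    """Return the (row, col) offset a lane adds to its fragment origin.

    Mirrors the two branches of the Mojo snippet; `vendor` stands in for
    the compile-time is_nvidia_gpu() / is_amd_gpu() checks.
    """
    if vendor == "nvidia":
        row = lane // (mma_n // simd_width)
        col = (lane * simd_width) % mma_n
    elif vendor == "amd":
        row = (lane // mma_n) * simd_width
        col = lane % mma_n
    else:
        raise ValueError(f"unknown vendor: {vendor}")
    return row, col


# Assumed tile parameters, chosen only for illustration.
nvidia = [fragment_offset(lane, "nvidia", mma_n=8, simd_width=2) for lane in range(32)]
amd = [fragment_offset(lane, "amd", mma_n=16, simd_width=4) for lane in range(64)]

print(nvidia[:5])
print(amd[14:18])
```

    With these assumed parameters, consecutive NVIDIA lanes step across a row two columns at a time, while AMD lanes fill one row of 16 columns before moving down four rows.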
    
  • Performant

    Bare metal performance on any GPU

    Get raw GPU performance without complex toolchains. Mojo makes it easy to write high-performance kernels with intuitive syntax, zero boilerplate, and native support for NVIDIA, AMD, and more.

      # Using low level warp GPU instructions ergonomically
      
      @parameter
      for i in range(K):
          var reduced = top_k_sram[tid]
          alias limit = log2_floor(WARP_SIZE)
      
          @parameter
          for j in reversed(range(limit)):
              alias offset = 1 << j
              var shuffled = TopKElement(
                  warp.shuffle_down(reduced.idx, offset),
                  warp.shuffle_down(reduced.val, offset),
              )
              reduced = max(reduced, shuffled)
      
          barrier()
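
    The inner loop above is a tree reduction: at each step every lane compares its value with the one held `offset` lanes further along, and the offset halves until lane 0 holds the maximum. A minimal plain-Python simulation follows. It assumes a 32-lane warp, (value, index) tuples in place of TopKElement, and a shuffle_down helper that returns a lane's own value when the source lane is out of range. These names and semantics are illustrative assumptions, not the Mojo API, and the outer top-K loop and barrier are left out.

```python
WARP_SIZE = 32  # assumed warp width


def shuffle_down(values, offset):
    """Simulate a warp shuffle-down: lane i reads the value held by lane i + offset.

    Lanes whose source would fall outside the warp keep their own value
    (an assumption made for this sketch).
    """
    n = len(values)
    return [values[i + offset] if i + offset < n else values[i] for i in range(n)]


def warp_max(values):
    """Reduce one warp's values to their maximum; the result ends up in lane 0."""
    assert len(values) == WARP_SIZE
    lanes = list(values)
    limit = WARP_SIZE.bit_length() - 1  # log2_floor(32) == 5
    for j in reversed(range(limit)):  # offsets 16, 8, 4, 2, 1
        offset = 1 << j
        shuffled = shuffle_down(lanes, offset)
        lanes = [max(own, other) for own, other in zip(lanes, shuffled)]
    return lanes[0]


# (value, index) pairs; tuples compare by value first, then by index.
elements = [((i * 7) % 32, i) for i in range(WARP_SIZE)]
best = warp_max(elements)
print(best)
```

    Five shuffle steps are enough for 32 lanes because each step halves the number of lanes that still hold a candidate maximum.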
    
  • Interoperable

    Use Mojo to extend Python

    Mojo interoperates natively with Python so you can speed up bottlenecks without rewriting everything. Start with one function and scale as needed. Mojo fits into your codebase.

      if __name__ == "__main__":
          # Calling into a Mojo `passthrough` function from Python:
          result = hello_mojo.passthrough("Hello")
          print(result)
    
      fn passthrough(value: PythonObject) raises -> PythonObject:
          """A very basic function illustrating passing values to and from Mojo."""
          return value + " world from Mojo"
    
  • Community

    Build with us in the open to create the future of AI

    Mojo has 750K+ lines of open-source code and an active community of 50K+ members. We're actively working to open even more to build a transparent, developer-first foundation for the future of AI infrastructure.

  • MOJO + MAX

    Write GPU Kernels with MAX

    Traditionally, writing custom GPU code means diving into CUDA, managing memory, and compiling separate device code. Mojo simplifies the whole experience while unlocking top-tier performance on NVIDIA and AMD GPUs.

      # Define a custom GPU subtraction kernel
      
      @compiler.register("mo.sub")
      struct Sub:
          @staticmethod
          fn execute[target: StaticString, _trace_name: StaticString](
              z: FusedOutputTensor,
              x: FusedInputTensor,
              y: FusedInputTensor,
              ctx: DeviceContextPtr,
          ) capturing raises:
              @parameter
              @always_inline
              fn func[width: Int](idx: IndexList[z.rank]) -> SIMD[z.dtype, width]:
                  var lhs = rebind[SIMD[z.dtype, width]](x._fused_load[width](idx))
                  var rhs = rebind[SIMD[z.dtype, width]](y._fused_load[width](idx))
                  return lhs - rhs
      
              foreach[
                  func,
                  target=target,
                  _trace_name=_trace_name,
              ](z, ctx)
    
  • Interoperable

    Powering Breakthroughs in Production AI

    Top AI teams use Mojo to turn ideas into optimized, low-level GPU code. From Inworld’s custom logic to Qwerky’s memory-efficient Mamba, Mojo delivers where performance meets creativity.

  • Performant

    World-Class Tools, Out of the Box

    Mojo ships with a great VS Code debugger and works with dev tools like Cursor and Claude. Mojo makes modern dev workflows feel seamless.

Mojo learns from

    • What Mojo keeps from C++

      • Zero cost abstractions

      • Metaprogramming power

        Turing complete: can build a compiler in templates

      • Low level hardware control

        Inline asm, intrinsics, zero dependencies

      • Unified host/device language

    • What Mojo improves about C++

      • Slow compile times

      • Template error messages

      • Limited metaprogramming

        ...and that templates != normal code

      • Not MLIR-native

    • What Mojo keeps from Python

      • Minimal boilerplate

      • Easy-to-read syntax

      • Interoperability with the massive Python ecosystem

    • What Mojo improves about Python

      • Performance

      • Memory usage

      • Device portability

    • What Mojo keeps from Rust

      • Memory safety through borrow checker

      • Systems language performance

    • What Mojo improves about Rust

      • More flexible ownership semantics

      • Easier to learn

      • More readable syntax

    • What Mojo keeps from Zig

      • Compile-time metaprogramming

      • Systems language performance

    • What Mojo improves about Zig

      • Memory safety

      • More readable syntax

Get started with Mojo

  • Start using Mojo

    ( FREE )

    Install Mojo and get up and running in minutes. A simple install, familiar tooling, and clear docs make it easy to start writing code immediately.

  • Easy ways to get started

    Not sure where to start? The examples below give you a few simple entry points into Mojo.

    • Mojo Manual

      Write a simple GPU program and learn the basics.

    • GPU Puzzles

      Practice GPU programming with guided puzzles.

    • Python Interoperability

      Read and write Mojo using familiar Python syntax.

Popular Mojo Tech Talks

  • Next-Gen GPU Programming

    1:15:56

  • Kernel Programming and Mojo

    52:51

  • GPU Programming Workshop

    11:36

“Mojo has Python feel, systems speed. Clean syntax, blazing performance.”

Explore the world of high-performance computing through an illustrated comic. A fresh, fun take—whether you're new or experienced.

Read the comic

Developer Approved

very excited

strangemonad

“I'm very excited to see this coming together and what it represents, not just for MAX, but my hope for what it could also mean for the broader ecosystem that mojo could interact with.”

actually flies on the GPU

Sanika

"after wrestling with CUDA drivers for years, it felt surprisingly… smooth. No, really: for once I wasn’t battling obscure libstdc++ errors at midnight or re-compiling kernels to coax out speed. Instead, I got a peek at writing almost-Pythonic code that compiles down to something that actually flies on the GPU."

easy to optimize

dorjeduck

“It’s fast which is awesome. And it’s easy. It’s not CUDA programming...easy to optimize.”

amazing achievements

Eprahim

“I'm excited, you're excited, everyone is excited to see what's new in Mojo and MAX and the amazing achievements of the team at Modular.”

The future is bright!

mytechnotalent

“Mojo destroys Python in speed. 12x faster without even trying. The future is bright!”

one language all the way through

fnands

“Tired of the two language problem. I have one foot in the ML world and one foot in the geospatial world, and both struggle with the 'two-language' problem. Having Mojo as one language all the way through would be awesome.”

Community is incredible

benny.n

“The Community is incredible and so supportive. It’s awesome to be part of.”

impressive speed

Adalseno

"It worked like a charm, with impressive speed. Now my version is about twice as fast as Julia's (7 ms vs. 12 ms for a 10 million vector; 7 ms on the playground. I guess on my computer, it might be even faster). Amazing."

impressed

justin_76273

“The more I benchmark, the more impressed I am with the MAX Engine.”

pure iteration power

Jayesh

"This is about unlocking freedom for devs like me, no more vendor traps or rewrites, just pure iteration power. As someone working on challenging ML problems, this is a big thing."

performance is insane

drdude81

“I tried MAX builds last night, impressive indeed. I couldn't believe what I was seeing... performance is insane.”

huge increase in performance

Aydyn

"C is known for being as fast as assembly, but when we implemented the same logic on Mojo and used some of the out-of-the-box features, it showed a huge increase in performance... It was amazing."

potential to take over

svpino

“A few weeks ago, I started learning Mojo 🔥 and MAX. Mojo has the potential to take over AI development. It's Python++. Simple to learn, and extremely fast.”

feeling of superpowers

Aydyn

"Mojo gives me the feeling of superpowers. I did not expect it to outperform a well-known solution like llama.cpp."

surest bet for longterm

pagilgukey

“Mojo and the MAX Graph API are the surest bet for longterm multi-arch future-substrate NN compilation”

completely different ballgame

scrumtuous

“What @modular is doing with Mojo and the MaxPlatform is a completely different ballgame.”

works across the stack

scrumtuous

“Mojo can replace the C programs too. It works across the stack. It’s not glue code. It’s the whole ecosystem.”

12x faster without even trying

svpino

“Mojo destroys Python in speed. 12x faster without even trying. The future is bright!”

was a breeze!

NL

“Max installation on Mac M2 and running llama3 in (q6_k and q4_k) was a breeze! Thank you Modular team!”

high performance code

jeremyphoward

"Mojo is Python++. It will be, when complete, a strict superset of the Python language. But it also has additional functionality so we can write high performance code that takes advantage of modern accelerators."