February 12, 2024

Mojo vs. Rust: what are the differences?

Jack Clayton

Mojo: our goal

Mojo is built on MLIR, the latest compiler technology and an evolution of LLVM, which Rust lowers to. This enables programmers to write code optimized for different CPU architectures, and to use the same ergonomic programming model to compile and run native GPU kernels. Mojo's goal as a language is to meet Python developers where they are, and allow them to learn some new tricks to optimize their code to the performance limits of any hardware. Mojo's main advantage is that it takes the ergonomic and memory safety advancements from languages like Rust and lets you use the same proven concepts when programming GPUs.

Is Mojo faster than x language?

A common question when users first join the Discord is "How much faster is Mojo than x language?" There are a lot of considerations surrounding any benchmark implementation, so you can't use any one benchmark to say x language is faster than y language. A better question is "How much overhead does Mojo introduce, compared to x?" A major goal for Mojo is to allow you to push your hardware to the limits of physics, while remaining ergonomic and familiar to Python developers.

Compared to a dynamic language like Python, compiled languages allow you to remove unnecessary CPU instructions such as those for allocating objects on the heap, reference counting, and periodic garbage collection. Mojo takes lessons learned and best practices from languages like C++, Rust, Swift, and Zig to provide direct access to the machine without these kinds of overheads.

Mojo vs. Rust

Mojo and Rust both allow you to optimize at a lower level, but in Rust for example you can still wrap everything in Arc, Mutex, Box etc. to avoid fights with the borrow checker at the cost of performance. If you’re writing application code this might not have any significant impact, but if you’re writing a library or performance sensitive code, that overhead can add up quickly. It's up to the programmer how much they care about reducing overhead and optimizing performance.
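To make that tradeoff concrete, here is a minimal Rust sketch. The function names `sum_borrowed` and `sum_shared` are illustrative only. Both functions compute the same result, but the second one goes through shared ownership and a lock, which is the kind of overhead described above.

```rust
use std::sync::{Arc, Mutex};

// Borrowing: no allocation, no reference count, no lock.
fn sum_borrowed(values: &[u64]) -> u64 {
    values.iter().sum()
}

// Shared ownership: each access acquires the lock, and each Arc::clone
// updates an atomic reference count.
fn sum_shared(values: &Arc<Mutex<Vec<u64>>>) -> u64 {
    values.lock().unwrap().iter().sum()
}

fn main() {
    let data = vec![1, 2, 3];
    println!("{}", sum_borrowed(&data));

    let shared = Arc::new(Mutex::new(data));
    let handle = Arc::clone(&shared); // bumps the atomic reference count
    println!("{}", sum_shared(&handle));
}
```

For a one-off call the difference is negligible; in a hot loop inside a library, the locking and reference counting are work that the borrowed version never does.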

Mojo has made some design decisions to improve the ergonomics and performance tradeoffs:

Reduced memcpy with borrow by default

When a new user is learning Rust, one of the first pitfalls they run into is that function arguments default to taking an object by moving it. This means when you pass something into a function and try to reuse it, you get a compiler error:

Rust
fn bar(foo: String) {
    println!("{foo}");
}
fn main() {
    let foo = String::from("bar");
    bar(foo);
    dbg!(foo);
}
Output
5 |     let foo = String::from("bar");
  |         --- move occurs because `foo` has type `String`, which does not implement the `Copy` trait
6 |     bar(foo);
  |         --- value moved here
7 |     dbg!(foo);
  |     ^^^^^^^^^ value used here after move

The line with dbg! fails to compile, because you've moved foo into the bar function and so can't reuse it. In Rust, a move can also mean a memcpy of the String's pointer, size, and capacity. The memcpy can be optimized away by LLVM in many cases, but this doesn't always occur and is hard to predict unless you know how the Rust/LLVM compiler works.
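As a rough illustration of what a move copies, the following Rust sketch checks the size of a String handle and confirms that the heap buffer itself is not copied. The helper name `heap_buffer_survives_move` is made up for this example, and the three-word size reflects current implementations rather than a documented layout guarantee.

```rust
use std::mem::size_of;

// Returns true if the heap buffer address is unchanged by a move.
fn heap_buffer_survives_move() -> bool {
    let foo = String::from("bar");
    let before = foo.as_ptr();
    // Move: only the (pointer, length, capacity) triple is copied,
    // the characters on the heap stay where they are.
    let moved = foo;
    before == moved.as_ptr()
}

fn main() {
    println!("String handle: {} bytes", size_of::<String>());
    println!("same heap buffer after move: {}", heap_buffer_survives_move());
}
```

So the potential memcpy is small for a String, but the same rule applies to any type, including large structs that are moved by value.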

The way you solve this is by making foo a &str or other similar borrowed type which String can automatically dereference to:

Rust
fn bar(foo: &str) {
    println!("{foo}");
}
fn main() {
    let foo = String::from("foo");
    bar(&foo);
    dbg!(foo);
}
Output
foo

Mojo simplifies this concept for the standard use case:

Mojo
# foo is an immutable reference by default
fn bar(foo: String):
    pass

fn main():
    var foo = String("foo")
    bar(foo)
    print(foo)
Output
foo

Mojo arguments are borrowed by default. Not only is this much gentler when learning Mojo compared to Rust, it's also more efficient because there is no potential implicit memcpy. If you want to get closer to Rust behavior, you can change the argument to owned:

Mojo
fn bar(owned foo: String):
    foo += "bar"

fn main():
    var foo = String("foo")
    bar(foo)
    print(foo)
Output
foo

This still works! Because String implements a copy constructor, it's able to be moved into bar and leave behind a copy. Under the hood this is still passing by reference for maximum efficiency, and it'll only create a copy if foo is mutated.
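For comparison, the closest Rust counterpart is to clone explicitly at the call site. This is only a sketch of the analogous pattern; unlike the Mojo behavior described above, clone in Rust always copies the string eagerly, whether or not the callee mutates it. Here bar returns the mutated value purely so the effect is visible.

```rust
// Takes ownership of its argument, so mutating it is allowed.
fn bar(mut foo: String) -> String {
    foo += "bar";
    foo
}

fn main() {
    let foo = String::from("foo");
    // The caller has to ask for the copy explicitly.
    let changed = bar(foo.clone());
    // The original is untouched and still usable.
    println!("{foo} {changed}");
}
```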

To fully opt into the Rust default of moving an object and losing ownership, you need to use the ^ transfer operator:

Mojo
fn bar(owned foo: String):
    foo += "bar"  # Ok to mutate a uniquely owned value

fn main():
    var foo = String("foo")
    bar(foo^)
    print(foo)  # error: foo is uninit because it was transferred above

Now you finally get a compiler error for trying to use foo after a move. You have to work much harder to fight the borrow checker in Mojo! This is the better default behavior: not only is it more efficient, it also doesn't roadblock engineers from dynamic programming backgrounds like Python. They still get the behavior they expect by default, with the best performance possible.

No Pin requirement

In Rust, a self-referential struct that points to its own member can become invalid if the object moves, as it'll be pointing to the old location in memory. This creates a complexity spike, particularly in parts of async Rust where futures need to be self-referential and store state, so you must wrap Self with Pin to guarantee it's not going to move. In Mojo, when moving an object that has an address, you can still update any self-referential fields. So self.foo will continue to point correctly to the location of the object in memory, even in async contexts.
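The Rust side of this can be sketched as follows. The type `SelfRef` and its methods are hypothetical names for illustration. To keep the example safe, the struct records the address of its own field as an integer instead of dereferencing a pointer, so a stale address is only ever compared, never followed.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct SelfRef {
    data: u64,
    // Address of `data`, recorded after construction.
    data_addr: usize,
    // Opts out of Unpin, so a pinned value cannot be moved in safe code.
    _pin: PhantomPinned,
}

impl SelfRef {
    fn new(data: u64) -> Self {
        SelfRef { data, data_addr: 0, _pin: PhantomPinned }
    }

    fn record_addr(&mut self) {
        self.data_addr = &self.data as *const u64 as usize;
    }

    // True while the recorded address still matches where `data` lives.
    fn is_consistent(&self) -> bool {
        self.data_addr == &self.data as *const u64 as usize
    }
}

// Moving the value from the stack to the heap leaves the recorded address stale.
fn moved_to_heap_is_stale() -> bool {
    let mut on_stack = SelfRef::new(7);
    on_stack.record_addr();
    let on_heap = Box::new(on_stack); // move to a new location
    !on_heap.is_consistent()
}

// Recording the address once the value is at its final location, then
// pinning it, keeps the self-reference valid.
fn pinned_stays_consistent() -> bool {
    let mut boxed = Box::new(SelfRef::new(7));
    boxed.record_addr();
    let pinned: Pin<Box<SelfRef>> = Box::into_pin(boxed);
    pinned.is_consistent()
}

fn main() {
    println!("stale after move: {}", moved_to_heap_is_stale());
    println!("consistent when pinned: {}", pinned_stays_consistent());
}
```

Real self-referential futures store actual pointers rather than integers, which is why the guarantee provided by Pin matters and why the surrounding API is hard to use.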

There is a nice blog titled pin and suffering that takes you on a journey of a Rustacean 🦀 working through the implications of Pin. These are complexities that a Mojician 🪄 will never encounter.

Update: check out this blog by one of the core Rust async contributors on how Pin came to be: https://without.boats/blog/pin and how they're planning to improve it: https://without.boats/blog/pinned-places.

Built on state-of-the-art compiler technology

Rust was started in 2006 and Swift was started in 2010, and both are primarily built on top of LLVM IR. Mojo started in 2022 and builds on MLIR, which is a more modern “next generation” compiler stack than the LLVM IR approach that Rust uses. There is a history here: our CEO Chris Lattner started LLVM in college in Dec 2000 and learned a lot from its evolution and development over the years. He then led the development of MLIR at Google to support their TPU and other AI accelerator projects, taking that learning from LLVM IR to build the next step forward, as described in this talk from 2019.

Mojo is the first programming language to take advantage of all the advances in MLIR: to produce more optimized CPU code, to support GPUs and other accelerators, and to compile much faster than Rust. This is an advantage that no other language currently provides, and it's why AI and compiler nerds are excited about Mojo 🔥. They can build their fancy abstractions for exotic hardware, while less specialized engineers can take advantage of them with Pythonic syntax.

Great SIMD ergonomics

CPUs have special registers and instructions to process multiple pieces of data at the same time, known as SIMD (Single Instruction, Multiple Data). But writing this code has historically been ugly and difficult. These special instructions have been around for many years, but most code is still not optimized for them. When someone works through the complexities and writes a portable SIMD-optimized algorithm, it blows the competition out of the water, for example simd_json.

Mojo's primitives are natively designed to be SIMD-first: UInt8 is actually a SIMD[DType.uint8, 1], which is a SIMD of 1 element. There is no performance overhead to represent it this way, but it allows the programmer to easily use it for SIMD optimizations. For example, you can split up text into 64-byte blocks and represent each one as a SIMD[DType.uint8, 64], then compare it to a single newline character in order to find the index of every newline. On a machine whose SIMD registers can operate on 512 bits of data at the same time, this can improve the performance of those operations by up to 64x!
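The block-wise newline search can be sketched in Rust as below. This is not Mojo's SIMD type and it uses no explicit SIMD intrinsics: it is plain scalar code arranged in 64-byte blocks, with one bit per byte in a u64 mask, which is the shape a SIMD compare would produce. The names `BLOCK`, `newline_mask`, and `newline_indices` are assumptions for this example.

```rust
const BLOCK: usize = 64;

// Builds a bitmask with bit i set when block[i] is a newline.
// A 64-lane SIMD compare would produce the same mask in one step.
fn newline_mask(block: &[u8]) -> u64 {
    debug_assert!(block.len() <= BLOCK);
    let mut mask = 0u64;
    for (i, &byte) in block.iter().enumerate() {
        mask |= ((byte == b'\n') as u64) << i;
    }
    mask
}

// Returns the index of every newline in the text.
fn newline_indices(text: &[u8]) -> Vec<usize> {
    let mut indices = Vec::new();
    for (block_no, block) in text.chunks(BLOCK).enumerate() {
        let mut mask = newline_mask(block);
        while mask != 0 {
            let bit = mask.trailing_zeros() as usize;
            indices.push(block_no * BLOCK + bit);
            mask &= mask - 1; // clear the lowest set bit
        }
    }
    indices
}

fn main() {
    println!("{:?}", newline_indices(b"a\nbc\n"));
}
```

Whether the inner loop is actually vectorized here is up to the compiler, which is the point the article makes about expressing the SIMD intent directly.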

A simpler example: if you have a SIMD[DType.float64, 8](2, 4, 6, 8, 16, 32, 64, 128), you can simply multiply it by a Float64(2), improving performance by up to 8x compared to multiplying each element individually on machines with registers wide enough to hold all eight values.
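For reference, the element-wise result of that multiplication looks like this in a small Rust sketch. The function `scale` is a made-up helper, and it relies on the compiler to vectorize the loop rather than on an explicit SIMD type.

```rust
// Multiplies each of the eight lanes by the same scalar.
fn scale(values: [f64; 8], factor: f64) -> [f64; 8] {
    values.map(|v| v * factor)
}

fn main() {
    let values = [2.0, 4.0, 6.0, 8.0, 16.0, 32.0, 64.0, 128.0];
    println!("{:?}", scale(values, 2.0));
}
```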

LLVM (and therefore Rust) has automatic vectorization optimization passes, but they’ll never be able to reach the same level of performance as the programmer expressing exactly what they intended, because LLVM cannot change memory layout or other important details for SIMD. Mojo has been built from the ground up to take advantage of SIMD, and writing SIMD optimizations feels very close to writing normal code.
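One example of a memory layout decision that an optimizer cannot make for you is the choice between an array of structs and a struct of arrays. In this Rust sketch (all type and function names are illustrative), the second layout stores every x value contiguously, which is the form SIMD loads prefer, but the programmer has to choose it.

```rust
// Array of structs: x and y values are interleaved in memory.
#[derive(Clone, Copy)]
struct PointAos {
    x: f32,
    y: f32,
}

// Struct of arrays: all x values are contiguous, as are all y values.
struct PointsSoa {
    xs: Vec<f32>,
    ys: Vec<f32>,
}

fn to_soa(points: &[PointAos]) -> PointsSoa {
    PointsSoa {
        xs: points.iter().map(|p| p.x).collect(),
        ys: points.iter().map(|p| p.y).collect(),
    }
}

// Has to skip over every y value while reading.
fn sum_x_aos(points: &[PointAos]) -> f32 {
    points.iter().map(|p| p.x).sum()
}

// Reads one dense slice of floats.
fn sum_x_soa(points: &PointsSoa) -> f32 {
    points.xs.iter().sum()
}

fn main() {
    let points = [PointAos { x: 1.0, y: 9.0 }, PointAos { x: 2.0, y: 9.0 }];
    let soa = to_soa(&points);
    println!("{} {} ({} ys)", sum_x_aos(&points), sum_x_soa(&soa), soa.ys.len());
}
```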

Eager Destruction

Rust was inspired by RAII (Resource Acquisition Is Initialization) from C++, which means that once the object goes out of scope, the application developer doesn't have to worry about freeing the memory, because the programming language takes care of it. This is a really nice paradigm: you get the ergonomics of a dynamic language without the performance drawback of a garbage collector.

Mojo takes this one step further: instead of waiting until the end of the scope, it frees the memory on last use of the object. This is advantageous in the field of AI, where freeing an object early can mean deallocating a GPU tensor earlier, therefore fitting a larger model in GPU RAM. This is a unique advantage for Mojo, where the programmer gets the best possible outcome without having to think about it. The Rust borrow checker originally extended the lifetime of everything to the end of its scope to match the destructor behavior, which had some confusing consequences for users. Rust added features to simplify this for developers with Non-Lexical Lifetimes. Due to Mojo’s eager destruction, we get these simplifications for free, and it aligns with how objects are actually destroyed so we don’t have confusing edge cases.
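The Rust behavior described here can be observed with a small sketch. The `Tracked` type and both function names are hypothetical. The destructor records when it runs, showing that a value is dropped at the end of its scope rather than at its last use, unless the programmer calls drop explicitly.

```rust
use std::cell::RefCell;

// Appends a message to a shared log when dropped.
struct Tracked<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

// Default Rust behavior: the destructor runs at the end of the scope.
fn scope_end_order() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let a = Tracked { name: "a", log: &log };
        let _len = a.name.len(); // last use of `a`
        log.borrow_mut().push("work after last use".to_string());
    } // `a` is dropped here
    log.into_inner()
}

// Manual approximation of eager destruction: drop right after the last use.
fn early_drop_order() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let a = Tracked { name: "a", log: &log };
        let _len = a.name.len(); // last use of `a`
        drop(a);
        log.borrow_mut().push("work after last use".to_string());
    }
    log.into_inner()
}

fn main() {
    println!("{:?}", scope_end_order());
    println!("{:?}", early_drop_order());
}
```

The second function is what a Rust programmer has to write by hand to release a resource as early as Mojo does by default.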

Another piece of overhead is the way that Drop works in Rust. It tracks if an object should be dropped at runtime, with Drop Flags. Rust can optimize these away in some cases, but Mojo defines them away categorically to eliminate the overhead in all cases.
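A conditional move is the typical situation where Rust needs a drop flag. In this sketch (the names `Noisy`, `consume`, and `drops_for` are made up), the value may or may not have been moved by the time the scope ends, so the compiled code has to remember at runtime whether the destructor still needs to run.

```rust
use std::cell::Cell;

// Counts how many times a destructor has run.
struct Noisy<'a>(&'a Cell<u32>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

// Takes ownership, so the value is dropped when this function returns.
fn consume(_value: Noisy<'_>) {}

fn drops_for(take_early: bool) -> u32 {
    let count = Cell::new(0);
    {
        let value = Noisy(&count);
        if take_early {
            consume(value); // `value` is moved only on this path
        }
        // At the end of this scope, `value` must be dropped only if it was
        // not moved above, which is tracked with a hidden runtime flag.
    }
    count.get()
}

fn main() {
    println!("{} {}", drops_for(true), drops_for(false));
}
```

Either way the destructor runs exactly once; the flag is the bookkeeping that makes that possible.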

Tail Call Optimization (TCO)

Because Mojo has eager destruction, MLIR and LLVM are able to perform tail call optimizations more effectively. This example compares a recursive function with a heap allocated dynamic vector in both languages. Note that this is just a simple example with as few lines of code as possible to demonstrate the difference.

First run cargo new rust and edit ./rust/src/main.rs to look like this:

./rust/src/main.rs
fn recursive(x: usize) {
    if x == 0 {
        return;
    }
    let mut stuff = Vec::with_capacity(x);
    for i in 0..x {
        stuff.push(i);
    }
    recursive(x - 1)
}

fn main() {
    recursive(50_000);
}

Then run:

Bash
cd rust
cargo build --release
cd target/release
hyperfine ./rust

These results are on an M2 Mac:

Output
Benchmark 1: ./rust
  Time (mean ± σ):     2.119 s ± 0.031 s    [User: 1.183 s, System: 0.785 s]
  Range (min … max):   2.081 s … 2.172 s    10 runs

You can run the Mojo version from a single file in the same folder; call it mojo.mojo:

./mojo.mojo
fn recursive(x: Int):
    if x == 0:
        return
    var stuff = List[Int](x)
    for i in range(x):
        stuff.append(i)
    recursive(x - 1)

fn main():
    recursive(50_000)

Then run:

Bash
mojo build mojo.mojo
hyperfine ./mojo

The compiler must ensure that destructors are called at the appropriate time, which for Rust is when a value goes out of scope. In the recursive function, the Vec has a destructor that needs to be run after each function call. This means the function's stack frame can't just be discarded or overwritten, as is required for tail call optimization. Because Mojo destructs eagerly it doesn't have this limitation, and is able to optimize for TCO more efficiently with heap allocated objects.
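The effect of the pending destructor can be made visible with a small Rust sketch that counts how many buffers are alive at once. Everything here (`Stats`, `Buf`, the two recursive functions, `peak_live`) is a hypothetical harness, not the benchmark above. Note that Rust does not guarantee tail call optimization in either variant; the sketch only shows when the memory is released.

```rust
use std::cell::Cell;

// Tracks the number of live buffers and the highest value seen.
struct Stats {
    live: Cell<usize>,
    peak: Cell<usize>,
}

struct Buf<'a> {
    _data: Vec<usize>,
    stats: &'a Stats,
}

impl<'a> Buf<'a> {
    fn new(len: usize, stats: &'a Stats) -> Self {
        let live = stats.live.get() + 1;
        stats.live.set(live);
        stats.peak.set(stats.peak.get().max(live));
        Buf { _data: (0..len).collect(), stats }
    }
}

impl Drop for Buf<'_> {
    fn drop(&mut self) {
        self.stats.live.set(self.stats.live.get() - 1);
    }
}

// Default behavior: the buffer is dropped after the recursive call returns,
// so every level of the recursion keeps its buffer alive.
fn recurse_scope_drop(x: usize, stats: &Stats) {
    if x == 0 {
        return;
    }
    let _buf = Buf::new(x, stats);
    recurse_scope_drop(x - 1, stats)
}

// Dropping before the call mimics eager destruction: one live buffer at a time.
fn recurse_early_drop(x: usize, stats: &Stats) {
    if x == 0 {
        return;
    }
    let buf = Buf::new(x, stats);
    drop(buf);
    recurse_early_drop(x - 1, stats)
}

fn peak_live(depth: usize, early: bool) -> usize {
    let stats = Stats { live: Cell::new(0), peak: Cell::new(0) };
    if early {
        recurse_early_drop(depth, &stats);
    } else {
        recurse_scope_drop(depth, &stats);
    }
    stats.peak.get()
}

fn main() {
    println!("scope-end drop: {} live buffers at peak", peak_live(100, false));
    println!("early drop: {} live buffer at peak", peak_live(100, true));
}
```

With the default scope-end drop, peak memory grows with the recursion depth, which matches the shape of the difference reported in the measurements.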

You can get more insight into this behavior by profiling the two programs with valgrind --tool=massif. I switched to a Linux cloud instance to run this experiment, which sent the Rust mean time to 9.067 s with 10 GB peak allocated memory, and Mojo to 1.189 s with 1.5 MB peak allocated memory! As previously noted, memory is an important resource in AI applications, and eager destruction ensures the programmer gets optimal behavior without having to think about it.

You can try running the above benchmarks yourself. If you don't have Mojo 🔥 yet, you can install it here!

Conclusion

We all love Rust at Modular and are inspired by it: the tooling is great, and it currently has some of the best high-level ergonomics of any systems programming language. But it has some problems in the field of AI, such as slow compile times, and the lack of motivation for AI researchers to learn a much more difficult language from scratch. We also love Python/C++/Rust/Swift/Julia etc., but after over a decade of the industry hill climbing these technologies, we believe that the fresh start that Mojo embodies is the only way to make a dent in these age-old problems.

Mojo already has optimal performance for systems engineers, and we're working on all the dynamic features that Python programmers expect. If you're curious about GPU programming, memory safety, or an easier-to-use language with memory safety and high performance, you can give Mojo a try here. We've also built MAX as the killer app to show the world what Mojo can do!

We'd love to see you in the Mojo community, here are some links to get you started:

Jack Clayton

AI Developer Advocate

Jack started his career optimizing autonomous truck software for leading mining companies, including BHP and Caterpillar. Most recently he was designing computer vision software, putting AI inference pipelines into production for IDVerse. He is enormously passionate about the developer community, having been a Rust, Go, Python and C++ developer for over a decade. Jack enjoys making complicated topics simple and fun to learn, and he’s dedicated to teaching the world about Mojo 🔥.