November 20, 2023

Implementing NumPy style matrix slicing in Mojo🔥

Python libraries such as NumPy and Pandas are so popular because they make engineers and data scientists productive. The first time I used NumPy, I was struck, not by its vast array of supported math functions, but by the ease with which I could manipulate arrays and matrices and, in particular, slicing rows, columns, or submatrices from large tabular datasets. 

Slicing is a very fundamental operation in data science. Take, for example, a large matrix with rows representing individual customers, and columns representing customer attributes such as demographics, buying preferences, etc. To calculate basic summary statistics across specific rows you’ll need to slice specific columns and calculate the statistics. To calculate similarity measures across customer attributes (say to cluster them), you’ll need to efficiently slice rows and calculate pairwise similarity measures. You may also want to slice specific columns to remove unwanted columns from your dataset. 

In this blog post, I’ll show how you can implement the NumPy style slicing feature in Mojo from scratch. Through this example, you’ll show you how to:

  1. Create your custom Matrix data structure that supports NumPy style slicing operations
  2. Use linear and strided memory access for row and column slicing
  3. Handle NumPy style slicing edge-cases (e.g. negative index [-2:,:] and open index [:,4])
  4. Implement vectorized and parallelized slicing of a submatrix 

The code example I use in this blog post is available on the Mojo repository on GitHub.

Before we proceed: The purpose of this blog post is to demonstrate how to use Mojo standard library features like slicing, vectorization, parallelization, and strided memory access so you can learn how to use them in your applications. The goal is not to build a feature-complete NumPy replacement or to write the most performant Mojo code. Both of which are good exercises for the reader.

Before we jump into the implementation details, let’s take a look at our final result. We’ll build a Matrix data structure that can perform slice operations.

Mojo

let mat = Matrix(8,5)
mat.print()
    

Output:

Bash

[[0.0850   0.8916   0.1896   0.3980   0.7435]
 [0.5603   0.8095   0.5117   0.9950   0.9666]
 [0.4260   0.6529   0.9615   0.8579   0.2940]
 [0.4146   0.5148   0.7897   0.5442   0.0936]
 [0.4322   0.8449   0.7728   0.1918   0.7803]
 [0.1813   0.5791   0.3141   0.4119   0.9923]
 [0.1639   0.3348   0.0762   0.1745   0.0372]
 [0.4674   0.6741   0.0667   0.3897   0.1653]]
  Matrix: 8 x 5 , DType: float32
    

We can slice the matrix to extract elements at rows 2,3,4 and columns last 3 columns:

Mojo

mat[2:4,-3:].print()
    

Output:

Bash

[[0.9615   0.8579   0.2940]
 [0.7897   0.5442   0.0936]]      
  Matrix: 2 x 3 , DType: float32
    

Verify that the elements in the sliced submatrix matches with the elements in original matrix

Now, let's try some advanced slicing:

Mojo

mat[1::2,::2].print()
    

Output:

Bash

[[0.5603   0.5117   0.9666]
 [0.4146   0.7897   0.0936]
 [0.1813   0.3141   0.9923]
 [0.4674   0.0667   0.1653]]
  Matrix: 4 x 3 , DType: float32
    

Verify that the elements in the sliced submatrix matches with the elements in original matrix

Creating a custom Matrix data structure with slicing support

Let’s start with a basic Matrix struct in Mojo. A struct in Mojo is similar to a class in Python, but Mojo structs are static and compile-time bound. Our Matrix structure has the following functions:

  1. Initialize the Matrix: __init__()
  2. Enable copying the sliced matrix/vector to a new variable: __copyinit__()
  3. Handle slice edge cases: _adjust_slice_()
  4. 4 x overloaded __getitem__() method to support indexing, row-only slice, column-only slice, and combination of row and column slicing
  5. Convenience print() function to visualize the matrix

The skeleton of our matrix structure looks like this:

Mojo

struct Matrix[dtype: DType = DType.float32]:
    var dim0: Int
    var dim1: Int
    var _data: DTypePointer[dtype]
    ...
    
    fn __init__(inout self, *dims: Int):
    ...
        
    fn __copyinit__(inout self, other: Self):
    ...
    
    fn _adjust_slice_(self, inout span: slice, dim: Int):
    ...
    
    fn __getitem__(self, x: Int, y: Int) -> SIMD[dtype,1]:
    ...
    
    fn __getitem__(self, owned row_slice: slice, col: Int) -> Self:
    ...
    
    fn __getitem__(self, row: Int, owned col_slice: slice) -> Self:
    ...
    
    fn __getitem__(self, owned row_slice: slice, owned col_slice: slice) -> Self:
    ...
    
    fn print(self, prec: Int=4)->None:
    ...
    

The three concepts we’ll be spending time on are:

  1. The struct variables that define a Matrix: Its dimensions dim0 (rows) and dim1 (columns) and its data which we allocate and access using DTypePointer 
  2. The _adjust_slice_() function that handles slice edge cases
  3. The __getitem__() function that performs all the slicing magic

Adjusting slice expression to handle edge cases

In Python and in Mojo __getitem__() is a special type of function that enables indexing on an object. For example when you want to access the element at row=0 and column=3 you can do: matrix[0,3], which is equivalent to: arr.__getitem__(0,3). In our example __getitem__() is overloaded to support various combinations of indexing with integers and slices and here we’ll discuss the following __getitem__() that supports slice variables

Mojo

fn __getitem__(self, owned row_slice: slice, owned col_slice: slice) -> Self:
    

The has two arguments row_slice and col_slice of type slice. When you use a slice expression such as [:,3:4], a slice object is generated and passed to __getitem__() which is why we have row_slice and col_slice variables with type slice. The slice object lets you access the requested slice items and the length of the slice, but it doesn’t handle edge cases. If the slice expression has negative numbers or exceeds the number of rows or columns in the matrix, we have to adjust the slice to handle those cases. This is what the _adjust_slice_() function does and is the first function we call on the row_slice and col_slice variables.

Here's a screenshot of __getitem__() highlighting the calls to _adjust_slice_() functions to handle slice edge cases:

Let’s take a closer look at _adjust_slice_()

Mojo

    fn _adjust_slice_(self, inout span: slice, dim: Int):
        if span.start < 0:
            span.start = dim + span.start
        if not span._has_end():
            span.end = dim
        elif span.end < 0:
            span.end = dim + span.end
        if span.end > dim:
            span.end = dim
        if span.end < span.start:
            span.start = 0
            span.end = 0
    

The _adjust_slice_() function supports row and column slice adjustments and works as follows. We check

  1. If the start is negative and we subtract it from the dimension size
  2. If the slice expression has no end, i.e. uses : and we assign an end
  3. If the end is negative, subtract it from the dimension size
  4. If the end exceeds the dimension size, fix it to the dimension size
  5. If the end is less than the start, make the slice invalid

After we adjust for the slice, we’re ready to start slicing the matrix. 

Slicing rows and columns with linear and strided memory access

First, let’s go over our memory access strategy. Matrix struct includes the raw data stored in memory and accessible using the pointer _data, and it includes dimension information stored in variables dim0 for row dimension and dim1 for column dimension. Together they contain the complete information to describe a matrix.

Accessing all elements in a row

Row access is relatively straightforward. We get the pointer to the first element defined by (row number to slice) * (number of columns). The following line of code gets you the pointer to the first element:

Once you have the pointer to the data and know the number of elements to load (=number of columns in the matrix), you can use _data.simd_load to load simd_width chunks of data. Things get a little more involved when you want to load non-sequential data as we’ll see next.

Accessing all elements in a column

Column data are not stored sequentially in memory, but they are spaced equally apart. Just like before, we’ll get the pointer to the first element:

The spacing between each element in the column is also called stride length and is equal to the number of columns in the above example. To load data in strided fashion you can use strided_load(src_ptr, stride).

Accessing arbitrary row and column slice

Now, things start to get more interesting. Here we have a combination of row and column slices: [1::2,::2]. For each row we have to:

  1. Get the pointer to the first element
  2. Get the data using strided_load with column slice as the stride length

In the above example, to access the elements: [[4,6][12,14]] we follow the two steps, as I’ve also illustrated below:

Since each row can be computed independently we can parallelize across rows to get speedup!

All of this comes together in the __getitem__() function, as you can see with annotations below: 

Now it’s your turn!

Head over to GitHub to access the Jupyter Notebooks with the Matrix implementation we discussed in this blog post and test it with different matrix sizes and extend it for your application.

Conclusion

One of the benefits of using Mojo is that it doesn’t suffer from the two language problem. Most high-performance Python libraries are thin Python wrappers over complex C and C++ implementations and NumPy is no exception. This makes understanding what's under the hood difficult for a Python programmer, and harder to extend if you are not familiar with both languages. Mojo looks and reads like Python and offers lower-level features to manage memory, add types etc. You don’t have to use two separate languages to write performant code, you can do it all in Mojo! I hope you enjoyed this walkthrough, all the code shared in this blog post is available as a Jupyter Notebook on GitHub.

Don't forget to register for Modular’s first-ever annual conference, ModCon, happening on Dec 4th in San Francisco. Register now if you want to meet the team in person, get access to workshops and panels, epic swag, and even more surprises! We're also taking Mojo project submissions for ModCon Contest where you could win prizes and be invited to present on stage! Submit now!

Until next time! 🔥

Shashank Prasanna
,
AI Developer Advocate

Shashank Prasanna

AI Developer Advocate

Shashank is an engineer, educator and doodler. He writes and talks about machine learning, specialized machine learning hardware (AI Accelerators) and AI Infrastructure in the cloud. He previously worked at Meta, AWS, NVIDIA, MathWorks (MATLAB) and Oracle in developer relations and marketing, product management, and software development roles and hold an M.S. in electrical engineering.