The world’s fastest inference engine. Accelerate your AI deployment.

The MAX Engine executes all of your TensorFlow and PyTorch models with no rewriting or conversion required. Bring your model as-is and deploy it anywhere, across server and edge, with unparalleled usability and performance.

MAX: Modular Accelerated Xecution


MAX Engine is everything you need to deploy low-latency, high-throughput inference pipelines into production. Consolidate the bespoke AI toolchains you are using and simplify your AI deployment by orders of magnitude.

Support all your generative and traditional AI use cases

MAX provides drop-in compatibility with any model from any framework, with support for all framework operations, quantized types, dynamic shapes, and your custom operations.
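As a minimal sketch of what that drop-in experience looks like with the MAX Engine Python API, you load the model file in its native format and execute it directly; the model path, input name, and shape below are placeholders for your own model:

```python
import numpy as np
from max import engine

# Load a model in its native format (e.g. a TensorFlow SavedModel or
# TorchScript file); no conversion step. "my_model" is a placeholder path.
session = engine.InferenceSession()
model = session.load("my_model")

# Execute with NumPy inputs; the input name and shape depend on your model.
outputs = model.execute(input=np.zeros((1, 3, 224, 224), dtype=np.float32))
print(outputs)
```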

Sign up now

Train in any framework,
deploy anywhere

[Diagram: models from any framework flow through the MAX Engine and deploy to cloud & on-prem, server & edge.]

Graph APIs

Get low-level control over the engine with minimal external dependencies and direct programmability over hardware, with high-level abstractions plus the ability to drop down to lower levels when you need to.
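As a rough illustration of the graph-building style this enables, the sketch below constructs a tiny compute graph; the module paths, types, and op names are illustrative assumptions, not a definitive rendering of the Graph API surface:

```python
# Illustrative sketch only: module paths, types, and op names are assumptions.
from max.graph import Graph, TensorType, ops
from max.dtype import DType

# Build a tiny graph computing y = relu(x + x) over a fixed-shape float32 input.
with Graph("add_relu", input_types=[TensorType(DType.float32, (2, 4))]) as graph:
    x = graph.inputs[0]
    graph.output(ops.relu(ops.add(x, x)))
```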

Deploy directly to cloud

MAX is free to download to your local machine for development and experimentation, and can be deployed to production via our BYOC (bring-your-own-cloud) SaaS offering.

Maximize performance, minimize costs

Reduce latency, increase throughput, and improve resource efficiency across CPUs, GPUs, and accelerators. Productionize larger models and significantly reduce your computing costs.

Explore our performance
Framework         Throughput   Cost
TensorFlow        17 QPS       $0.89
PyTorch           28 QPS       $0.54
Modular Engine    125 QPS      $0.12

* Model: DLRM RMC1 · Instance: AWS c6g.4xlarge (Graviton2) · Batch size: 1
MAX Engine speedup over TensorFlow and PyTorch, by model family and compute type:

Model Family          vs TensorFlow              vs PyTorch
                      Intel   AMD    ARM         Intel   AMD    ARM
Language Models       3x      3.2x   5.3x        1.4x    2.1x   4x
Recommender Models    6.5x    5x     7.5x        1.1x    1.2x   4.3x
Vision Models         2.1x    2.2x   1.7x        1.5x    1.5x   1.3x

Compute types: Intel (c5.4xlarge) · AMD (c5a.4xlarge) · ARM (c6g.4xlarge)

Works with your existing AI libraries and tools

Modular is designed to drop into your existing workflows and use cases. Our tools are... well... modular. They integrate with industry-standard infrastructure and open-source tools to minimize migration cost.

Contact Sales

01.

Easily integrate the engine into your own custom server image or use Modular's off-the-shelf NVIDIA Triton and TensorFlow-Serving builds.
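For instance, once a model is served behind Triton, clients call it with the standard tritonclient library and nothing MAX-specific on their side; the URL, model name, and tensor names below are placeholders:

```python
import numpy as np
import tritonclient.http as httpclient

# Standard Triton HTTP client pointed at a placeholder server address.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Placeholder tensor name, shape, and dtype; match your deployed model's config.
infer_input = httpclient.InferInput("input", [1, 3, 224, 224], "FP32")
infer_input.set_data_from_numpy(np.zeros((1, 3, 224, 224), dtype=np.float32))

result = client.infer(model_name="resnet50", inputs=[infer_input])
print(result.as_numpy("output"))
```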

02.

Deploy the engine on-prem, in your own VPC on any major cloud provider, or get up and running more quickly with our hosted solutions.

03.

The MAX Engine works with industry-standard open-source tooling like Prometheus and Grafana, integrating seamlessly into your existing monitoring stack.
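As one sketch of that integration, you could export inference latency with the standard prometheus_client library and chart it in Grafana; the metric name, port, and run_inference wrapper are illustrative choices, not part of MAX:

```python
from prometheus_client import Histogram, start_http_server

# Illustrative metric and port; Prometheus scrapes http://localhost:8001/metrics.
INFERENCE_LATENCY = Histogram(
    "inference_latency_seconds", "End-to-end model inference latency"
)
start_http_server(8001)

def run_inference(model, **inputs):
    # Time each call so latency shows up as a Prometheus histogram.
    with INFERENCE_LATENCY.time():
        return model.execute(**inputs)
```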

MAX Engine works with the rest of the suite

The Modular MAX Engine can be used in combination with MAX Serving and is extensible with Mojo 🔥, the fastest and most portable programming language for your AI applications.

Our engine integrates with the rest of our suite of MAX products, while being usable on its own.

Ready to try a preview?

Sign up right now and try the MAX Engine yourself.

Read the MAX Engine docs