Tensors and dynamic Neural Networks in Mojo

Infermo is a Mojo library that provides two high-level features:

  • Tensor computation
  • Deep neural networks built on a tape-based autograd system

Mojo currently operates on CPU only; GPU support will come soon! Infermo is still a proof of concept. If you encounter any bugs, feel free to create an issue or a PR. Thank you for your contribution. :)

Available Operators

The operators listed below are methods of the Module class, which orchestrates both forward and backward computations. Each operator accepts one or two Tensor objects as input. All binary operators accept differently shaped Tensors via broadcasting.

  • matmul: Performs matrix multiplication of two tensors.
  • conv_2d: Applies a 2D convolution over an input signal composed of several input planes.
  • max_pool_2d: Applies a 2D max pooling over an input signal composed of several input planes.
  • sum: Computes the sum of all elements in the input tensor.
  • softmax: Applies a softmax function along the last dimension.
  • mse: Calculates the mean squared error between each element in the input x and target.
  • ce: Computes cross entropy loss, often used for classification problems.
  • reshape: Returns a tensor with the same data and number of elements as input, but with the specified shape.
  • transpose: Transposes a Tensor along the last two dimensions.
  • mean: Computes the mean value along a list of dimensions. (TODO: backward)
  • variance: Computes the variance value along a list of dimensions. (TODO: backward)
  • std: Computes the standard deviation along a list of dimensions. (TODO: backward)
  • mul: Performs element-wise multiplication of two tensors.
  • add: Performs element-wise addition of two tensors.
  • sub: Performs element-wise subtraction of two tensors.
  • div: Performs element-wise division of two tensors.
  • sqrt: Computes the element-wise square root of the input.
  • abs: Computes the absolute value of each element in input.
  • pow: Raises each element to a power, given either as a second Tensor (element-wise) or as a single number.
  • exp2: Computes 2 raised to the power of each element in input.
  • exp: Computes exponential of each element in input.
  • log2: Computes logarithm base 2 of each element in input.
  • log: Computes natural logarithm ln(x) of each element in input.
  • sin, cos, tan, asin, acos, atan, sinh, cosh, tanh: Element-wise trigonometric and hyperbolic functions.
  • relu: Applies the rectified linear unit function element-wise.
  • copy: Performs a deep copy of the input Tensor.
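
The broadcasting mentioned for binary operators can be illustrated with a short sketch. It is plain Python, not Mojo, and does not call Infermo. It assumes the familiar NumPy-style rule (shapes are aligned from the right, and each pair of dimensions must be equal or one of them must be 1), which this README does not spell out. `broadcast_shape` is a hypothetical helper name.

```python
def broadcast_shape(a, b):
    """Return the broadcast result shape of two shapes (NumPy-style rule, assumed)."""
    result = []
    # Walk both shapes from the trailing dimension backwards.
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1  # missing leading dims count as 1
        db = b[-i] if i <= len(b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        result.append(max(da, db))
    return tuple(reversed(result))


print(broadcast_shape((64, 10), (10,)))       # (64, 10)
print(broadcast_shape((2, 5, 3), (1, 5, 3)))  # (2, 5, 3)
```

Under this rule, adding a bias of shape (10) to a batch of shape (64, 10) needs no explicit reshape.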

Advanced Operators

  • linear: This operator represents a dense layer of neurons.
  • mlp: Similar to the linear operator, but specifically tailored for use within a transformer block.
  • conv2d: Executes a convolution operation with a specified tensor and adds a bias if necessary.
  • transformer_block, embed, unembed, pos_embed: These are the fundamental building blocks of a Transformer model.
  • DataLoader: A utility for handling data. It reads, initializes, and loads data from a given .txt file. (TODO: dataset splitting, read from csv)
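
To make the idea of a dense layer concrete, here is a minimal sketch of what such a layer conventionally computes: activation(x · W + b). It is plain Python with nested lists, not Infermo code. The weight layout (inputs × neurons) and the helper name `dense_forward` are assumptions made for illustration only.

```python
def dense_forward(x, w, b, activation="relu"):
    """x: batch x in_features, w: in_features x num_neurons, b: num_neurons."""
    out = []
    for row in x:
        y = []
        for j in range(len(b)):
            # weighted sum of the inputs for neuron j, plus its bias
            z = sum(row[k] * w[k][j] for k in range(len(row))) + b[j]
            y.append(max(z, 0.0) if activation == "relu" else z)
        out.append(y)
    return out


x = [[1.0, 2.0]]                # one sample with two features
w = [[1.0, -1.0], [0.5, 0.5]]   # two inputs, two neurons
b = [0.0, -1.0]
print(dense_forward(x, w, b))                     # [[2.0, 0.0]]
print(dense_forward(x, w, b, activation="none"))  # [[2.0, -1.0]]
```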

Example Code

Train a Neural Network on the MNIST dataset

Import the necessary parts from Infermo

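# one_hot and DataLoader are used below and are assumed to be exported by infermo as well
from infermo import Module, Tensor, shape, linear, max, accuracy, one_hot, DataLoader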

Define the Model architecture (a simple MLP with ReLU activations and biases)

struct Model:
    var nn: Module
    var input: Tensor
    var true_vals: Tensor
    var logits: Tensor
    var loss: Tensor
    var avg_acc: Float32

    fn __init__(inout self):
        self.input = Tensor(shape(64,784))
        self.input.requires_grad = False
        self.true_vals = Tensor(shape(64,10))
        self.true_vals.requires_grad = False
        self.nn = Module()
        self.avg_acc = 0

        # define Model architecture
        var x = linear(self.nn,self.input, num_neurons=64, add_bias=True, activation='relu')
        for i in range(2):
            x = linear(self.nn,x, num_neurons=64, add_bias=True, activation='relu')
        x = linear(self.nn,x, num_neurons=10, add_bias=True, activation='none')
        self.logits = self.nn.softmax(x)
        self.loss = self.nn.ce(self.true_vals,self.logits)

    fn forward(inout self, _input: DTypePointer[DType.float32], _true_vals: DTypePointer[DType.float32]) -> Tensor:

        # fill the input and true_vals Tensors with their data
        self.nn.Tensors[0].set_data(_input) # bug!
        self.true_vals.set_data(_true_vals)

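        # one forward pass through the network
        # (assumption: Module.forward evaluates the graph up to the given Tensor)
        self.nn.forward(self.logits)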

        # some additional ops, not necessary for the training, just for showing the accuracy
        let one_hots = one_hot(self.logits)
        self.avg_acc = accuracy(one_hots,self.true_vals)

        return self.logits

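    fn backward(inout self):
        # assumption: Module.backward propagates gradients starting from the loss Tensor
        self.nn.backward(self.loss)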

    fn step(inout self):
        self.nn.optimize('sgd_momentum', lr = 0.0001, momentum = 0.9)
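
The model above chains `softmax` and `ce` and reports an accuracy. As a rough illustration of what these quantities conventionally mean, here is a plain-Python sketch of the standard definitions. It is not Infermo's implementation: the averaging over the batch, the small `eps` inside the logarithm and the helper names are assumptions.

```python
import math


def softmax(row):
    """Softmax over one row of logits."""
    m = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(targets, probs, eps=1e-12):
    """Mean over the batch of -sum(t * log(p)); eps guards against log(0)."""
    losses = [
        -sum(t * math.log(p + eps) for t, p in zip(ts, ps))
        for ts, ps in zip(targets, probs)
    ]
    return sum(losses) / len(losses)


def batch_accuracy(targets, probs):
    """Fraction of samples whose most probable class matches the one-hot target."""
    hits = sum(ts.index(max(ts)) == ps.index(max(ps)) for ts, ps in zip(targets, probs))
    return hits / len(targets)


logits = [[2.0, 1.0, 0.1], [0.0, 0.0, 3.0]]
targets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
probs = [softmax(row) for row in logits]
print(cross_entropy(targets, probs))
print(batch_accuracy(targets, probs))  # 0.5
```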

Read in the MNIST dataset from a file, initialize the Model and define the number of epochs, then let it train on a randomly generated batch of data.

fn main() raises:

    # init
    var dl = DataLoader('./datasets/mnist.txt')
    var model = Model()

    let num_epochs = 1000
    var loss_sum: Float32 = 0
    var avg_acc: Float32 = 0
    let every = 100

    for epoch in range(1,num_epochs+1):
        # load a batch of images into the Model
        let inputs = dl.load(
            start=1, # start column of the image data in the dataset; further arguments are not shown
        )
        # load the labels for the images (one-hot encoded from 0 to 9)
        let labels = dl.one_hot(
            index=0, # column of the labels in the dataset; further arguments are not shown
        )
        let logits = model.forward(inputs,labels)
        model.backward()
        model.step()

        loss_sum +=
        avg_acc += model.avg_acc
        if epoch % every == 0:
            print("Epoch", epoch,", AvgLoss =", loss_sum / every, ", AvgAccuracy =", avg_acc / every)
            loss_sum = 0
            avg_acc = 0
            # logits.print_data()
            # model.true_vals.print_data()
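
The `step` method of the Model selects the 'sgd_momentum' optimizer with a learning rate and a momentum value. The sketch below shows one common formulation of that update rule in plain Python. Whether Infermo uses exactly this variant is not stated here, so treat the formula and the helper name `sgd_momentum_step` as assumptions.

```python
def sgd_momentum_step(params, grads, velocity, lr=0.0001, momentum=0.9):
    """Update params in place: v = momentum * v - lr * g, then p = p + v."""
    for i in range(len(params)):
        velocity[i] = momentum * velocity[i] - lr * grads[i]
        params[i] += velocity[i]


# Toy problem: minimize f(p) = p^2, whose gradient is 2p.
params = [1.0]
velocity = [0.0]
for _ in range(200):
    grads = [2.0 * p for p in params]
    sgd_momentum_step(params, grads, velocity, lr=0.1, momentum=0.9)
print(abs(params[0]) < 1e-2)  # True
```

The velocity term accumulates past gradients, which smooths the updates compared to plain gradient descent.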

Simple Example

If that was a bit too much, here is a simpler example: a basic matrix multiplication of two tensors and the computation of their gradients.

from infermo import Module, Tensor, shape

fn main():
    # init
    var nn = Module()
    var A = Tensor(shape(2,5,3))
    var B = Tensor(shape(1,3,4))

    # specify tensor entries

    # perform computation
    var C = nn.matmul(A,B)
    var D = nn.sum(C) # reduce to a scalar, since gradients can only be computed from a scalar value

    # print result of matrix multiplication

    # compute gradients of A and B
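
For reference, the gradients of D = sum(A · B) can be worked out by hand for the shapes used above, (2,5,3) and (1,3,4), with B broadcast over the batch dimension. The sketch below is plain Python, not Infermo code. The fill values 2.0 and 3.0 and all helper names are arbitrary choices made for illustration.

```python
def batched_matmul(A, B):
    """A: batch x n x k, B: 1 x k x m (broadcast over the batch) -> batch x n x m."""
    Bm = B[0]
    return [
        [[sum(row[k] * Bm[k][j] for k in range(len(Bm))) for j in range(len(Bm[0]))]
         for row in mat]
        for mat in A
    ]


def total(C):
    """Sum of all elements of a 3-D nested list."""
    return sum(v for mat in C for row in mat for v in row)


def grads_of_sum_matmul(A, B):
    """Gradients of D = sum(A @ B) with respect to A and B."""
    Bm = B[0]
    # dD/dA[b][i][k] = sum_j B[k][j], identical for every b and i
    row_sums_B = [sum(Bm[k]) for k in range(len(Bm))]
    grad_A = [[list(row_sums_B) for _ in mat] for mat in A]
    # dD/dB[0][k][j] = sum over batch b and rows i of A[b][i][k], identical for every j
    col_sums_A = [sum(row[k] for mat in A for row in mat) for k in range(len(Bm))]
    grad_B = [[[col_sums_A[k]] * len(Bm[0]) for k in range(len(Bm))]]
    return grad_A, grad_B


A = [[[2.0] * 3 for _ in range(5)] for _ in range(2)]  # shape (2,5,3)
B = [[[3.0] * 4 for _ in range(3)]]                    # shape (1,3,4)
C = batched_matmul(A, B)                               # shape (2,5,4)
grad_A, grad_B = grads_of_sum_matmul(A, B)
print(total(C))         # 720.0
print(grad_A[0][0][0])  # 12.0
print(grad_B[0][0][0])  # 20.0
```

Because B is shared across the batch, its gradient sums the contributions from both batch entries.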


Make sure you have installed and configured the latest version of Mojo in your environment.

Clone the repository

git clone

Navigate to the cloned repository

cd Infermo

Once this is set up, you can directly try out one of the tests, e.g. the MNIST training setup, with the following command:

mojo train_MNIST.mojo
