Tensors and Dynamic Neural Networks in Mojo
Infermo is a Mojo library that provides two high-level features:

- multidimensional Tensor computation
- dynamic neural networks with automatic differentiation, built around a Module class that orchestrates the forward and backward passes
Mojo currently operates on CPU only; GPU support will come soon! Infermo is still a proof of concept, so if you encounter any bugs, feel free to create an issue or a PR. Thank you for your contribution. :)
The operators listed below are methods of the Module class, which orchestrates both the forward and the backward computation. Each operator accepts one or two Tensor objects as input. All binary operators accept differently shaped Tensors via broadcasting, as the sketch below shows.
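For example, a Tensor with a leading dimension of 1 can be combined with a whole batch. The following is a minimal sketch, using only the matmul method that also appears in the examples further down; the shapes are purely illustrative:

from infermo import Module, Tensor, shape

fn main():
    var nn = Module()

    # a batch of four 2x3 matrices...
    var A = Tensor(shape(4,2,3))
    # ...and a single 3x2 matrix: its leading dimension of 1 is broadcast to 4
    var B = Tensor(shape(1,3,2))
    A.fill(1)
    B.fill(1)

    # C has shape(4,2,2); B was implicitly expanded along the batch dimension
    var C = nn.matmul(A, B)
    nn.forward(C)
    C.print_data()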
Import the necessary parts from Infermo:
from infermo import Module, Tensor, shape, linear, one_hot, accuracy, DataLoader
Define the Model architecture (a simple MLP with ReLU activations and biases):
struct Model:
    var nn: Module
    var input: Tensor
    var true_vals: Tensor
    var logits: Tensor
    var loss: Tensor
    var avg_acc: Float32

    fn __init__(inout self):
        self.input = Tensor(shape(64,784))
        self.input.requires_grad = False
        self.true_vals = Tensor(shape(64,10))
        self.true_vals.requires_grad = False
        self.nn = Module()
        self.avg_acc = 0

        # define the Model architecture: three hidden linear layers with ReLU, then a linear output layer
        var x = linear(self.nn, self.input, num_neurons=64, add_bias=True, activation='relu')
        for i in range(2):
            x = linear(self.nn, x, num_neurons=64, add_bias=True, activation='relu')
        x = linear(self.nn, x, num_neurons=10, add_bias=True, activation='none')

        # softmax over the 10 classes, then cross-entropy against the true labels
        self.logits = self.nn.softmax(x)
        self.loss = self.nn.ce(self.true_vals, self.logits)
    @always_inline
    fn forward(inout self, _input: DTypePointer[DType.float32], _true_vals: DTypePointer[DType.float32]) -> Tensor:
        # fill the input and true_vals Tensors with their data
        # (workaround: the input Tensor is addressed via the Module's Tensor list,
        # since calling self.input.set_data(_input) directly does not take effect yet)
        self.nn.Tensors[0].set_data(_input)
        self.true_vals.set_data(_true_vals)

        # one forward pass through the network
        self.nn.forward(self.logits)

        # additional ops, not necessary for the training, just for reporting the accuracy
        let one_hots = one_hot(self.logits)
        self.avg_acc = accuracy(one_hots, self.true_vals)

        return self.logits
    @always_inline
    fn backward(inout self):
        self.nn.backward(self.loss)

    @always_inline
    fn step(inout self):
        self.nn.optimize('sgd_momentum', lr = 0.0001, momentum = 0.9)
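With forward, backward, and step kept separate, one training iteration is simply a forward pass, a backward pass through the same graph, and an SGD-with-momentum update of the trainable Tensors; the training loop below does exactly that.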
Read in the MNIST dataset from a file, initialize the Model, and define the number of epochs; then train it on batches drawn from the dataset:
fn main() raises:
    # init
    var dl = DataLoader('./datasets/mnist.txt')
    var model = Model()
    let num_epochs = 1000
    var loss_sum: Float32 = 0
    var avg_acc: Float32 = 0
    let every = 100

    for epoch in range(1, num_epochs+1):
        # load a batch of images into the Model
        let inputs = dl.load(
            batch_size=64,
            start=1, # regarding the columns of the dataset
            end=785,
            scalingFactor=Float32(1)/Float32(255)
        )
        # load the labels for the images (one-hot encoded from 0 to 9)
        let labels = dl.one_hot(
            batch_size=64,
            index=0, # regarding the column of the labels in the dataset
            ndims=10
        )

        # one training iteration: forward, backward, optimizer step
        let logits = model.forward(inputs, labels)
        model.backward()
        model.step()

        # track the running loss and accuracy
        loss_sum += model.loss.data.load(0)
        avg_acc += model.avg_acc
        if epoch % every == 0:
            print("Epoch", epoch, ", AvgLoss =", loss_sum / every, ", AvgAccuracy =", avg_acc / every)
            loss_sum = 0
            avg_acc = 0
            # logits.print_data()
            # model.true_vals.print_data()
If that was a bit too much, here is a simpler example: a plain matrix multiplication of two Tensors and the computation of their respective gradients:
from infermo import Module, Tensor, shape

fn main():
    # init
    var nn = Module()
    var A = Tensor(shape(2,5,3))
    var B = Tensor(shape(1,3,4))

    # specify the Tensor entries
    A.fill(2)
    B.fill(3)

    # perform the computation
    var C = nn.matmul(A,B)
    var D = nn.sum(C) # reduce to a scalar, since the gradient can only be computed for a scalar value
    nn.forward(D) # run the graph up to D (this also computes C)

    # print the result of the matrix multiplication
    C.print_data()

    # compute the gradients of A and B
    nn.backward(D)
    A.print_grad()
    B.print_grad()
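As a sanity check on the numbers: every entry of C is a dot product of a row of twos with a column of threes over an inner dimension of 3, i.e. 2·3·3 = 18. Assuming standard matmul and broadcasting gradient semantics, every entry of A.grad should be the sum of a row of B (3·4 = 12), and every entry of B.grad should accumulate over the broadcast batch dimension (2) and the 5 rows of A, each contributing an entry of 2, giving 2·5·2 = 20.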
Make sure you have installed and configured the latest version of Mojo in your environment.
Clone the repository:
git clone https://github.com/TilliFe/Infermo.git
Navigate to the cloned repository:
cd Infermo
Once this is set up, you can directly try out one of the tests, e.g. the MNIST training setup, with the following command:
mojo train_MNIST.mojo