A Simulation Framework for Memristive Deep Learning Systems
- Added the `random_crossbar_init` argument to `memtorch.bh.Crossbar`. If true, crossbars are initialized to random device conductances between 1/Ron and 1/Roff.
- Added `CUDA_device_idx` to `setup.py` to allow users to specify the CUDA device to use when installing MemTorch from source.
- Updates to `memtorch.bh.nonideality.FiniteConductanceStates`.
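For intuition, random crossbar initialization amounts to drawing each device conductance uniformly between 1/Roff and 1/Ron (since Ron < Roff, the lower bound is 1/Roff). A minimal self-contained sketch — the function name and resistance values below are illustrative assumptions, not MemTorch's API:

```python
import random

# Illustrative sketch: seed a crossbar with random conductances
# between 1/R_off and 1/R_on (R_on < R_off, so 1/R_off < 1/R_on).
def init_random_crossbar(rows, cols, r_on=1.0e4, r_off=1.0e6, seed=0):
    rng = random.Random(seed)
    g_min, g_max = 1.0 / r_off, 1.0 / r_on
    return [[rng.uniform(g_min, g_max) for _ in range(cols)]
            for _ in range(rows)]

crossbar = init_random_crossbar(4, 8)
assert all(1.0 / 1.0e6 <= g <= 1.0 / 1.0e4
           for row in crossbar for g in row)
```

In the real library the equivalent behaviour is toggled by passing `random_crossbar_init` to `memtorch.bh.Crossbar`.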
- Modified the `naive_program` routine for crossbar programming; the maximum number of crossbar programming iterations is now configurable.
- Updates to `memtorch.bh.Crossbar`.
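Conceptually, naive crossbar programming applies pulses to each device until it reaches its target conductance or an iteration budget is exhausted; the configurable cap bounds worst-case programming time. A hedged sketch of that loop — the function name and update rule are hypothetical, not the actual `naive_program` implementation:

```python
# Illustrative capped programming loop: nudge a device conductance g
# toward g_target one "pulse" at a time, up to max_iterations pulses.
def program_device(g, g_target, step=0.25, rel_tol=0.01, max_iterations=100):
    for iteration in range(max_iterations):
        if abs(g - g_target) <= rel_tol * g_target:
            return g, iteration          # converged within tolerance
        g += step * (g_target - g)       # one programming pulse
    return g, max_iterations             # iteration budget exhausted

g_final, iters = program_device(g=1.0e-6, g_target=1.0e-4)
assert abs(g_final - 1.0e-4) <= 0.01 * 1.0e-4
```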
- Updated the version of PyTorch used to build Python wheels from 1.9.0 to 1.10.0.
- Added the `groups` argument for convolutional layers.
- Modified `memtorch.mn.module.patch_model` and `memtorch.bh.nonideality.apply_nonidealities` to fix a semantic error in `Tutorial.ipynb`.
- Updates to `Exemplar_Simulations.ipynb`.
- Updates to `memtorch.bh.nonideality.NonIdeality` and `memtorch.mn.Module`.
- Reduced the number of workers used in `memtorch.utils` from 2 to 1.
- Added support for `torch.nn.Sequential` containers.
- Updated `README.md`.
- Added a `set_cuda_malloc_heap_size` routine to patched `torch.mn` modules. *Note: it is strongly suggested to set `cuda_malloc_heap_size` manually using `m.set_cuda_malloc_heap_size` when simulating source and line resistances using CUDA bindings.*
- Updates to `memtorch.bh.nonideality.NonIdeality` and `memtorch.mn.Module`.
- Updated the ReadTheDocs documentation.
- Transitioned from Gitter to GitHub Discussions for general discussion.
- Added `memtorch.bh.memristor.Data_Driven2021`.
- Added C++ bindings for `gen_tiles`.
- Exposed `use_bindings`.
- Added unit tests for `use_bindings`.
- Added the `exemptAssignees` tag to `scale.yml`.
- Added `memtorch.map.Input` to encapsulate customizable input scaling methods.
- Added the `force_scale` input argument to the default scaling method, to specify whether inputs are force scaled if they do not exceed `max_input_voltage`.
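The intent of a `force_scale`-style flag can be illustrated with a small sketch (a hypothetical helper, not the actual `memtorch.map.Input` scaling method): when forcing is disabled, inputs whose peak magnitude already fits within `max_input_voltage` pass through unscaled; otherwise they are rescaled so the peak maps to the maximum voltage.

```python
# Hypothetical peak-based input scaling with a force flag; illustrative
# only, not MemTorch's default scaling routine.
def scale_inputs(inputs, max_input_voltage, force_scale=True):
    peak = max(abs(x) for x in inputs)
    if peak == 0.0 or (not force_scale and peak <= max_input_voltage):
        return list(inputs)  # already within range; leave unscaled
    return [x * max_input_voltage / peak for x in inputs]

# Within range and force_scale=False: passed through untouched.
assert scale_inputs([0.1, -0.2], 0.3, force_scale=False) == [0.1, -0.2]
# Out of range: peak magnitude 2.0 is mapped to 0.3 V.
assert scale_inputs([1.0, -2.0], 0.3)[1] == -0.3
```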
- Added C++ bindings for `tiled_inference`.
- Modularized `tile_inference` for all layer types.
- Fixed the error raised by `memtorch.map.Parameter` when `p_l` is defined.
- Fixes to `memtorch.cpp.gen_tiles`.
- Added C++ and CUDA bindings for `memtorch.bh.crossbar.Tile.tile_matmul`.
Using an NVIDIA GeForce GTX 1080, a tile shape of (25, 25), and two tensors of size (500, 500), the runtime of `tile_matmul` without quantization support is reduced by 2.45x and 5.48x for CPU-bound and GPU-bound operation, respectively. With an ADC resolution of 4 bits and an overflow rate of 0.0, the runtime of `tile_matmul` with quantization support is reduced by 2.30x and 105.27x for CPU-bound and GPU-bound operation, respectively.
| Implementation | Runtime Without Quantization Support (s) | Runtime With Quantization Support (s) |
|---|---|---|
| Pure Python (Previous) | 6.917784 | 27.099764 |
| C++ (CPU-bound) | 2.822265 | 11.736974 |
| CUDA (GPU-bound) | 1.262861 | 0.2574267 |
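For intuition, the tiled product benchmarked above decomposes a large matrix multiplication into products of small sub-blocks, each sized to fit one crossbar tile. A pure-Python sketch of the scheme (illustrative only — the benchmarked bindings are C++/CUDA, and this helper is not the library's `tile_matmul`):

```python
# Pure-Python tiled matrix multiply: C = A @ B accumulated tile by tile,
# where each (tile x tile) sub-block could map onto one crossbar tile.
def tile_matmul_sketch(a, b, tile):
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):                # iterate over tile rows of A
        for k0 in range(0, k, tile):            # shared (inner) tile dimension
            for j0 in range(0, n, tile):        # tile columns of B
                for i in range(i0, min(i0 + tile, m)):
                    for kk in range(k0, min(k0 + tile, k)):
                        a_ik = a[i][kk]
                        for j in range(j0, min(j0 + tile, n)):
                            c[i][j] += a_ik * b[kk][j]
    return c

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
assert tile_matmul_sketch(a, b, tile=1) == [[19.0, 22.0], [43.0, 50.0]]
```

The result is independent of the tile size; tiling only changes the traversal order, which is what makes per-tile crossbar mapping (and per-tile quantization) possible.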
- Added Eigen integration with C++ and CUDA bindings.
- Modularized the C++ and CUDA `quantize` bindings.
- Enhanced `naive_program` and added additional input arguments to dictate logic for stuck devices.
- Fixes to `naive_program`.
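As a rough sketch of what ADC-style quantization at a given resolution involves (a hypothetical helper, not the library's `quantize` bindings): values are clipped to the representable range, then snapped to the nearest of 2^bits uniformly spaced levels.

```python
# Hypothetical uniform quantizer; illustrative only, not MemTorch's
# C++/CUDA quantize bindings.
def quantize(values, bits, v_min, v_max):
    levels = 2 ** bits - 1                # steps between v_min and v_max
    step = (v_max - v_min) / levels
    out = []
    for v in values:
        v = min(max(v, v_min), v_max)     # clip out-of-range (overflow) values
        out.append(v_min + round((v - v_min) / step) * step)
    return out

quantized = quantize([0.0, 0.49, 1.2], bits=2, v_min=0.0, v_max=1.0)
assert quantized[0] == 0.0 and quantized[2] == 1.0
```

With `bits=2` there are four levels (0, 1/3, 2/3, 1), so 0.49 snaps to 1/3 and the out-of-range 1.2 clips to 1.0.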
- Added `codecov` integration.
- Updates to `torch.distributions` usage.
- Updates to `memtorch.mn` modules.
- Added `cibuildwheel` integration to automatically generate build wheels.
- Fixes to `maxtasksperchild` usage.
- Fixes to `set_conductance`.
- Fixes to `apply_cycle_variability`.
- Fixed the `reg.coef_` and `reg.intercept_` extraction process for N-dimensional arrays.