A Simulation Framework for Memristive Deep Learning Systems
- Added the `random_crossbar_init` argument to `memtorch.bh.Crossbar`. If true, crossbars are initialized to random device conductances between 1/Ron and 1/Roff.
- Added `CUDA_device_idx` to `setup.py` to allow users to specify the CUDA device to use when installing MemTorch from source.
- Updates to `memtorch.bh.nonideality.FiniteConductanceStates`.
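For intuition, random crossbar initialization amounts to drawing each device conductance uniformly between 1/Roff and 1/Ron (since Ron < Roff, the lower bound is 1/Roff). A minimal self-contained sketch — the function name and resistance values below are illustrative assumptions, not MemTorch's API:

```python
import random

# Illustrative sketch: seed a crossbar with random conductances
# between 1/R_off and 1/R_on (R_on < R_off, so 1/R_off < 1/R_on).
def init_random_crossbar(rows, cols, r_on=1.0e4, r_off=1.0e6, seed=0):
    rng = random.Random(seed)
    g_min, g_max = 1.0 / r_off, 1.0 / r_on
    return [[rng.uniform(g_min, g_max) for _ in range(cols)]
            for _ in range(rows)]

crossbar = init_random_crossbar(4, 8)
assert all(1.0 / 1.0e6 <= g <= 1.0 / 1.0e4
           for row in crossbar for g in row)
```

In the real library the equivalent behaviour is toggled by passing `random_crossbar_init` to `memtorch.bh.Crossbar`.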
- Modified the `naive_program` routine for crossbar programming; the maximum number of crossbar programming iterations is now configurable.
- Updates to `memtorch.bh.Crossbar`.
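Conceptually, naive crossbar programming applies pulses to each device until it reaches its target conductance or an iteration budget is exhausted; the configurable cap bounds worst-case programming time. A hedged sketch of that loop — the function name and update rule are hypothetical, not the actual `naive_program` implementation:

```python
# Illustrative capped programming loop: nudge a device conductance g
# toward g_target one "pulse" at a time, up to max_iterations pulses.
def program_device(g, g_target, step=0.25, rel_tol=0.01, max_iterations=100):
    for iteration in range(max_iterations):
        if abs(g - g_target) <= rel_tol * g_target:
            return g, iteration          # converged within tolerance
        g += step * (g_target - g)       # one programming pulse
    return g, max_iterations             # iteration budget exhausted

g_final, iters = program_device(g=1.0e-6, g_target=1.0e-4)
assert abs(g_final - 1.0e-4) <= 0.01 * 1.0e-4
```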
- Updated the version of PyTorch used to build Python wheels from 1.9.0 to 1.10.0.
- Added the `groups` argument for convolutional layers.
- Modified `memtorch.mn.module.patch_model` and `memtorch.bh.nonideality.apply_nonidealities` to fix a semantic error in `Tutorial.ipynb`.
- Updates to `Exemplar_Simulations.ipynb`.
- Updates to `memtorch.bh.nonideality.NonIdeality` and `memtorch.mn.Module`.
- Reduced the number of workers used in `memtorch.utils` from 2 to 1.
- Added support for `torch.nn.Sequential` containers.
- Updated `README.md`.
- Added a `set_cuda_malloc_heap_size` routine to patched `torch.mn` modules. *Note: it is strongly suggested to set `cuda_malloc_heap_size` manually using `m.set_cuda_malloc_heap_size` when simulating source and line resistances using CUDA bindings.*
- Updates to `memtorch.bh.nonideality.NonIdeality` and `memtorch.mn.Module`.
- Updated the ReadTheDocs documentation.
- Transitioned from Gitter to GitHub Discussions for general discussion.
- Added `memtorch.bh.memristor.Data_Driven2021`.
- Added C++ bindings for `gen_tiles`.
- Exposed `use_bindings`.
- Added unit tests for `use_bindings`.
- Added the `exemptAssignees` tag to `scale.yml`.
- Added `memtorch.map.Input` to encapsulate customizable input scaling methods.
- Added the `force_scale` input argument to the default scaling method, to specify whether inputs are force scaled if they do not exceed `max_input_voltage`.
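The intent of a `force_scale`-style flag can be illustrated with a small sketch (a hypothetical helper, not the actual `memtorch.map.Input` scaling method): when forcing is disabled, inputs whose peak magnitude already fits within `max_input_voltage` pass through unscaled; otherwise they are rescaled so the peak maps to the maximum voltage.

```python
# Hypothetical peak-based input scaling with a force flag; illustrative
# only, not MemTorch's default scaling routine.
def scale_inputs(inputs, max_input_voltage, force_scale=True):
    peak = max(abs(x) for x in inputs)
    if peak == 0.0 or (not force_scale and peak <= max_input_voltage):
        return list(inputs)  # already within range; leave unscaled
    return [x * max_input_voltage / peak for x in inputs]

# Within range and force_scale=False: passed through untouched.
assert scale_inputs([0.1, -0.2], 0.3, force_scale=False) == [0.1, -0.2]
# Out of range: peak magnitude 2.0 is mapped to 0.3 V.
assert scale_inputs([1.0, -2.0], 0.3)[1] == -0.3
```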
- Added C++ bindings for `tiled_inference`.
- Modularized `tile_inference` for all layer types.
- Fixed the error raised by `memtorch.map.Parameter` when `p_l` is defined.
- Fixes to `memtorch.cpp.gen_tiles`.
- Added C++ and CUDA bindings for `memtorch.bh.crossbar.Tile.tile_matmul`.
Using an NVIDIA GeForce GTX 1080, a tile shape of (25, 25), and two tensors of size (500, 500), the runtime of `tile_matmul` without quantization support is reduced by 2.45x and 5.48x for CPU-bound and GPU-bound operation, respectively. With an ADC resolution of 4 bits and an overflow rate of 0.0, the runtime of `tile_matmul` with quantization support is reduced by 2.30x and 105.27x for CPU-bound and GPU-bound operation, respectively.
| Implementation | Runtime Without Quantization Support (s) | Runtime With Quantization Support (s) |
|---|---|---|
| Pure Python (Previous) | 6.917784 | 27.099764 |
| C++ (CPU-bound) | 2.822265 | 11.736974 |
| CUDA (GPU-bound) | 1.262861 | 0.2574267 |
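For intuition, the tiled product benchmarked above decomposes a large matrix multiplication into products of small sub-blocks, each sized to fit one crossbar tile. A pure-Python sketch of the scheme (illustrative only — the benchmarked bindings are C++/CUDA, and this helper is not the library's `tile_matmul`):

```python
# Pure-Python tiled matrix multiply: C = A @ B accumulated tile by tile,
# where each (tile x tile) sub-block could map onto one crossbar tile.
def tile_matmul_sketch(a, b, tile):
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):                # iterate over tile rows of A
        for k0 in range(0, k, tile):            # shared (inner) tile dimension
            for j0 in range(0, n, tile):        # tile columns of B
                for i in range(i0, min(i0 + tile, m)):
                    for kk in range(k0, min(k0 + tile, k)):
                        a_ik = a[i][kk]
                        for j in range(j0, min(j0 + tile, n)):
                            c[i][j] += a_ik * b[kk][j]
    return c

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
assert tile_matmul_sketch(a, b, tile=1) == [[19.0, 22.0], [43.0, 50.0]]
```

The result is independent of the tile size; tiling only changes the traversal order, which is what makes per-tile crossbar mapping (and per-tile quantization) possible.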
- Added Eigen integration with C++ and CUDA bindings.
- Modularized the C++ and CUDA `quantize` bindings.
- Enhanced `naive_program` and added additional input arguments to dictate logic for stuck devices.
- Fixes to `naive_program`.
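As a rough sketch of what ADC-style quantization at a given resolution involves (a hypothetical helper, not the library's `quantize` bindings): values are clipped to the representable range, then snapped to the nearest of 2^bits uniformly spaced levels.

```python
# Hypothetical uniform quantizer; illustrative only, not MemTorch's
# C++/CUDA quantize bindings.
def quantize(values, bits, v_min, v_max):
    levels = 2 ** bits - 1                # steps between v_min and v_max
    step = (v_max - v_min) / levels
    out = []
    for v in values:
        v = min(max(v, v_min), v_max)     # clip out-of-range (overflow) values
        out.append(v_min + round((v - v_min) / step) * step)
    return out

quantized = quantize([0.0, 0.49, 1.2], bits=2, v_min=0.0, v_max=1.0)
assert quantized[0] == 0.0 and quantized[2] == 1.0
```

With `bits=2` there are four levels (0, 1/3, 2/3, 1), so 0.49 snaps to 1/3 and the out-of-range 1.2 clips to 1.0.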
- Added `codecov` integration.
- Updates to `torch.distributions` usage.
- Updates to `memtorch.mn` modules.
- Added `cibuildwheel` integration to automatically generate build wheels.
- Fixes to `maxtasksperchild` usage.
- Fixes to `set_conductance`.
- Fixes to `apply_cycle_variability`.
- Fixed the `reg.coef_` and `reg.intercept_` extraction process for N-dimensional arrays.