
MLX: An array framework for Apple silicon

v0.12.0

2 weeks ago

Highlights

  • Faster quantized matmul

Core

  • mx.synchronize to wait for computation dispatched with mx.async_eval (sketch below)
  • mx.radians and mx.degrees
  • mx.metal.clear_cache to return to the OS the memory held by MLX as a cache for future allocations
  • Change quantization to always represent 0 exactly (relevant issue)
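
A minimal sketch of how the new synchronization and cache-control calls fit together; the matmul workload is only illustrative:

    import mlx.core as mx

    a = mx.random.normal((1024, 1024))
    b = mx.random.normal((1024, 1024))

    c = a @ b
    mx.async_eval(c)        # dispatch the computation without blocking
    deg = mx.degrees(mx.radians(mx.array(90.0)))   # other work can proceed meanwhile
    mx.synchronize()        # block until the dispatched work has finished

    mx.metal.clear_cache()  # hand MLX's cached buffers back to the OS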

Bugfixes

  • Fixed quantization of a block with all 0s that produced NaNs
  • Fixed the len field in the buffer protocol implementation

v0.11.0

3 weeks ago

Core

  • mx.block_masked_mm for block-level sparse matrix multiplication (sketch below)
  • Shared events for synchronization and asynchronous evaluation
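
A sketch of block-level sparse matmul with mx.block_masked_mm; the 64x64 block size and the mask_out keyword are assumptions about the API:

    import mlx.core as mx

    a = mx.random.normal((256, 64))
    b = mx.random.normal((64, 256))

    # Boolean mask over 64x64 blocks of the 256x256 output, i.e. shape (4, 4)
    block_mask = mx.random.uniform(shape=(4, 4)) > 0.5

    out = mx.block_masked_mm(a, b, block_size=64, mask_out=block_mask)
    print(out.shape)  # (256, 256); masked-out blocks are zero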

NN

  • nn.QuantizedEmbedding layer
  • nn.quantize for quantizing modules (sketch below)
  • gelu_approx uses tanh for consistency with PyTorch
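
A sketch of quantizing a small module with nn.quantize; the group size and bit width shown are the assumed defaults, and TinyLM is just a stand-in model:

    import mlx.nn as nn

    class TinyLM(nn.Module):
        def __init__(self, vocab_size=1000, dims=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dims)
            self.out = nn.Linear(dims, vocab_size)

        def __call__(self, x):
            return self.out(self.embed(x))

    model = TinyLM()
    # Swaps supported layers (e.g. Linear, Embedding) for quantized equivalents in place;
    # nn.QuantizedEmbedding can also be constructed directly.
    nn.quantize(model, group_size=64, bits=4)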

v0.10.0

1 month ago

Highlights

  • Improvements for LLM generation
    • Reshapeless quant matmul/matvec
    • mx.async_eval
    • Async command encoding

Core

  • Slightly faster reshapeless quantized gemms
  • Option for precise softmax (sketch below)
  • mx.metal.start_capture and mx.metal.stop_capture for GPU debug/profile
  • mx.expm1
  • mx.std
  • mx.meshgrid
  • CPU-only mx.random.multivariate_normal
  • mx.cumsum (and other scans) for bfloat
  • Async command encoder with explicit barriers / dependency management
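
A sketch exercising a few of the new ops; the precise keyword on mx.softmax is an assumption about the option's spelling:

    import mlx.core as mx

    x = mx.random.normal((8, 4096)).astype(mx.float16)

    # Softmax with higher-precision accumulation (assumed keyword)
    y = mx.softmax(x, axis=-1, precise=True)

    print(mx.std(x, axis=-1).shape)           # per-row standard deviation
    print(mx.expm1(mx.array(1e-4)))           # numerically stable exp(x) - 1
    xs, ys = mx.meshgrid(mx.arange(3), mx.arange(4))
    print(xs.shape, ys.shape)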

NN

  • nn.Upsample supports bicubic interpolation

Misc

  • Updated MLX Extension to work with nanobind

Bugfixes

  • Fix buffer donation in softmax and fast ops
  • Fix bug in layer norm vjp
  • Fix bug initializing from lists with scalars
  • Fix bug in indexing
  • Fix CPU compilation bug
  • Fix multi-output compilation bug
  • Fix stack overflow issues in eval and array destruction

v0.9.0

1 month ago

Highlights:

  • Fast partial RoPE (used by Phi-2)
  • Fast gradients for RoPE, RMSNorm, and LayerNorm

Core

  • More overhead reductions
  • Partial fast RoPE (speeds up Phi-2)
  • Better buffer donation for copy
  • Type hierarchy and issubdtype (sketch below)
  • Fast VJPs for RoPE, RMSNorm, and LayerNorm
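
A small sketch of the dtype hierarchy queries; the category names mirror NumPy and are assumed here:

    import mlx.core as mx

    print(mx.issubdtype(mx.float16, mx.floating))  # True
    print(mx.issubdtype(mx.int32, mx.integer))     # True
    print(mx.issubdtype(mx.float32, mx.integer))   # False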

NN

  • Module.set_dtype (sketch below)
  • Chaining in nn.Module (model.freeze().update(…))
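
A sketch of the new Module helpers; nn.Sequential is used only as a stand-in model and the update call is illustrative:

    import mlx.core as mx
    import mlx.nn as nn

    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 8))

    # Cast the module's floating-point parameters
    model.set_dtype(mx.float16)

    # Module methods return the module, so calls can be chained
    model.freeze().update(model.parameters())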

Bugfixes

  • Fix set item bugs
  • Fix scatter vjp
  • Check shape integer overflow on array construction
  • Fix bug with module attributes
  • Fix two bugs for odd shaped QMV
  • Fix GPU sort for large sizes
  • Fix bug in negative padding for convolutions
  • Fix bug in multi-stream race condition for graph evaluation
  • Fix random normal generation for half precision

v0.8.0

1 month ago

Optimizers

  • Set minimum value in cosine decay scheduler

Bugfixes

  • Fix bug in multi-dimensional reduction

v0.7.0

1 month ago

Highlights

  • Perf improvements for attention ops:
    • No-copy broadcast matmul (benchmarks)
    • Fewer copies in reshape

Core

  • Faster broadcast + gemm
  • mx.linalg.svd (CPU only; sketch below)
  • Fewer copies in reshape
  • Faster small reductions
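
A sketch of the CPU-only SVD:

    import mlx.core as mx

    a = mx.random.normal((64, 32))

    # SVD currently runs on the CPU stream only
    U, S, Vt = mx.linalg.svd(a, stream=mx.cpu)
    print(U.shape, S.shape, Vt.shape)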

NN

  • nn.RNN, nn.LSTM, nn.GRU
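
A minimal sketch of one of the new recurrent layers; the positional (input size, hidden size) constructor arguments and the (batch, time, features) input layout are assumptions:

    import mlx.core as mx
    import mlx.nn as nn

    rnn = nn.GRU(32, 64)                # assumed: (input size, hidden size)

    x = mx.random.normal((8, 16, 32))   # (batch, time, features)
    h = rnn(x)                          # hidden state at every time step
    print(h.shape)                      # expected (8, 16, 64)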

Bugfixes

  • Fix bug in depth traversal ordering
  • Fix two edge case bugs in compilation
  • Fix bug with modules with dictionaries of weights
  • Fix bug with scatter which broke MOE training
  • Fix bug with compilation kernel collision

v0.6.0

2 months ago

Highlights:

  • Faster quantized matrix-vector multiplies
  • mx.fast.scaled_dot_product_attention fused op

Core

  • Memory allocation API improvements
  • Faster GPU reductions for smaller sizes (between 2 and 7x)
  • mx.fast.scaled_dot_product_attention fused op (sketch below)
  • Faster quantized matrix-vector multiplications
  • Pickle support for mx.array
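
A sketch of the fused attention op; the (batch, heads, sequence, head_dim) layout is assumed:

    import mlx.core as mx

    B, H, L, D = 1, 8, 128, 64
    q = mx.random.normal((B, H, L, D))
    k = mx.random.normal((B, H, L, D))
    v = mx.random.normal((B, H, L, D))

    out = mx.fast.scaled_dot_product_attention(q, k, v, scale=D ** -0.5)
    print(out.shape)  # (1, 8, 128, 64)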

NN

  • Dilation on convolution layers

Bugfixes

  • Fix mx.topk
  • Fix reshape for zero sizes

v0.5.0

2 months ago

Highlights:

  • Faster convolutions.
    • Up to 14x faster for some common sizes.
    • See benchmarks

Core

  • mx.where properly handles inf
  • Faster and more general convolutions
    • Input and kernel dilation
    • Asymmetric padding
    • Support for cross-correlation and convolution
  • atleast_{1,2,3}d accept any number of arrays

NN

  • nn.Upsample layer (sketch below)
    • Supports nearest neighbor and linear interpolation
    • Any number of dimensions
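
A sketch of the new layer, assuming MLX's channels-last (NHWC) convention for image-like inputs:

    import mlx.core as mx
    import mlx.nn as nn

    x = mx.random.normal((1, 8, 8, 3))              # (batch, height, width, channels)
    up = nn.Upsample(scale_factor=2, mode="linear")
    print(up(x).shape)                              # expected (1, 16, 16, 3)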

Optimizers

  • Linear schedule and schedule joiner (sketch below):
    • Use for e.g. linear warmup + cosine decay
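
A sketch of joining a linear warmup with a cosine decay; the helper argument orders are assumptions:

    import mlx.optimizers as optim

    warmup = optim.linear_schedule(0.0, 1e-3, 100)  # 0 -> 1e-3 over 100 steps
    cosine = optim.cosine_decay(1e-3, 1000)         # then decay over 1000 steps
    schedule = optim.join_schedules([warmup, cosine], [100])

    optimizer = optim.Adam(learning_rate=schedule)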

Bugfixes

  • arange throws on inf inputs
  • Fix CMake build with MLX
  • Fix logsumexp inf edge case
  • Fix grad of power w.r.t. the exponent edge case
  • Fix compile with inf constants
  • Fix bug with temporaries in convolution

v0.4.0

2 months ago

Highlights:

  • Partial shapeless compilation
    • Default shapeless compilation for all activations
    • Can be more than 5x faster than uncompiled versions
  • CPU kernel fusion

Core

  • CPU compilation
  • Shapeless compilation for some cases (sketch below)
    • mx.compile(function, shapeless=True)
  • Up to 10x faster scatter: benchmarks
  • mx.atleast_1d, mx.atleast_2d, mx.atleast_3d
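
A sketch of shapeless compilation; silu here is just an illustrative element-wise function:

    import mlx.core as mx

    def silu(x):
        return x * mx.sigmoid(x)

    # Compiled once and reused across input shapes without retracing
    fast_silu = mx.compile(silu, shapeless=True)

    print(fast_silu(mx.random.normal((16, 128))).shape)
    print(fast_silu(mx.random.normal((4, 32, 32))).shape)
    print(mx.atleast_2d(mx.array(1.0)).shape)  # (1, 1)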

Bugfixes

  • Fix bug in tolist with bfloat16 and float16
  • Fix bug in argmax on M3

v0.3.0

2 months ago

Highlights:

  • mx.fast subpackage
  • Custom mx.fast.rope up to 20x faster

Core

  • Support metadata with safetensors (sketch below)
  • Up to 5x faster scatter and 30% faster gather
  • 40% faster bfloat16 quantized matrix-vector multiplies
  • mx.fast subpackage with a fast RoPE
  • Context manager mx.stream to set the default device
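
A sketch of the default-stream context manager and of saving safetensors with metadata; the metadata keyword name and the file name are assumptions:

    import mlx.core as mx

    a = mx.random.normal((512, 512))

    # Ops inside the context run on the CPU stream by default
    with mx.stream(mx.cpu):
        b = a @ a

    mx.save_safetensors("weights.safetensors", {"a": a}, metadata={"framework": "mlx"})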

NN

  • Average and Max pooling layers for 1D and 2D inputs
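
A sketch of the new pooling layers on channels-last input; the kernel_size/stride arguments are assumptions:

    import mlx.core as mx
    import mlx.nn as nn

    x = mx.random.normal((1, 8, 8, 4))                       # (batch, height, width, channels)
    print(nn.MaxPool2d(kernel_size=2, stride=2)(x).shape)    # expected (1, 4, 4, 4)
    print(nn.AvgPool2d(kernel_size=2, stride=2)(x).shape)    # expected (1, 4, 4, 4)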

Optimizers

  • Support schedulers for e.g. learning rates
  • A few basic schedulers:
    • optimizers.step_decay
    • optimizers.cosine_decay
    • optimizers.exponential_decay
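
A sketch of passing one of the basic schedulers straight to an optimizer's learning rate; the argument order (initial value, decay rate, step size) is assumed:

    import mlx.optimizers as optim

    # Halve the learning rate every 1000 steps, starting from 1e-2
    lr = optim.step_decay(1e-2, 0.5, 1000)
    optimizer = optim.SGD(learning_rate=lr)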

Bugfixes

  • Fix bug in remainder with negative numerators and integers
  • Fix bug with slicing into softmax
  • Fix quantized matmuls with sizes that are not multiples of 32