MLX: An array framework for Apple silicon
mx.synchronize to wait for computation dispatched with mx.async_eval
mx.radians and mx.degrees
mx.metal.clear_cache to return to the OS the memory held by MLX as a cache for future allocations
len field in the buffer protocol implementation
mx.async_eval
mx.metal.start_capture and mx.metal.stop_capture for GPU debug/profile
mx.expm1
mx.std
mx.meshgrid
mx.random.multivariate_normal
mx.cumsum (and other scans) for bfloat
nn.upsample support bicubic interpolation
Module.set_dtype
nn.Module (model.freeze().update(…))
mx.fast.rms_norm and mx.fast.layer_norm
__setitem__ (e.g. a[...] = b)
mx.inverse, CPU only
mx.matmul and mx.addmm
mx.fast.rms_norm, token generation benchmark
mx.fast.layer_norm, token generation benchmark
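For readers unfamiliar with the two normalizations named above, the following is a minimal plain-Python sketch of what RMS norm and layer norm conventionally compute over a single vector. It does not call MLX and is not the library's implementation; the function names, the list-based inputs, and the default eps value are assumptions chosen for illustration only.

```python
import math


def rms_norm(x, weight, eps=1e-5):
    # Divide by the root mean square of x, then apply a per-element weight.
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def layer_norm(x, weight, bias, eps=1e-5):
    # Subtract the mean, divide by the standard deviation, then scale and shift.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std * w + b for v, w, b in zip(x, weight, bias)]


x = [1.0, 2.0, 3.0, 4.0]
ones = [1.0] * len(x)
zeros = [0.0] * len(x)

rms_out = rms_norm(x, ones)          # mean of squares is close to 1
ln_out = layer_norm(x, ones, zeros)  # mean is close to 0, variance close to 1
```

A fused kernel would compute the same quantity in a single pass on the device instead of as separate elementwise and reduction steps.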
mx.linalg.svd (CPU only)
nn.RNN, nn.LSTM, nn.GRU
mx.fast.scaled_dot_product_attention fused op
mx.fast.scaled_dot_product_attention fused op
mx.array
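As a reference for the attention op mentioned above, here is a minimal plain-Python sketch of standard scaled dot-product attention, softmax(scale * Q K^T) V, for a single head without masking. It is an unfused illustration only, not MLX code; the `attention` and `softmax` helpers and the nested-list layout are hypothetical names and choices made for this example.

```python
import math


def softmax(row):
    # Subtract the maximum first for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, scale):
    # q: [L_q][D], k: [L_k][D], v: [L_k][D_v], all as nested lists.
    out = []
    for q_row in q:
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        weights = softmax(scores)
        out.append(
            [sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))]
        )
    return out


q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]

# Each output row is a weighted average of the rows of v.
attn_out = attention(q, k, v, scale=1.0 / math.sqrt(2))
```

A fused op evaluates the score, softmax, and weighted-sum steps together, which avoids materializing the full score matrix as a separate array.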
mx.topk
mx.where properly handles inf
atleast_{1,2,3}d accept any number of arrays
nn.Upsample layer
arange throws on inf inputs
logsumexp inf edge case
inf constants
mx.compile(function, shapeless=True)
mx.atleast_1d, mx.atleast_2d, mx.atleast_3d
tolist with bfloat16 and float16
argmax on M3
mx.fast subpackage
mx.fast.rope up to 20x faster
safetensors
bfloat16 quantized matrix-vector multiplies
mx.fast subpackage with a fast RoPE
mx.stream to set the default device
optimizers.step_decay
optimizers.cosine_decay
optimizers.exponential_decay
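The three learning-rate schedules listed above can be sketched in plain Python as follows. The formulas are the conventional definitions of step, cosine, and exponential decay and are an assumption about what the MLX schedulers compute, not a copy of their source; the parameter names (init, decay_rate, step_size, decay_steps) are likewise chosen for illustration.

```python
import math


def step_decay(init, decay_rate, step_size):
    # Multiply the rate by decay_rate once every step_size steps.
    return lambda step: init * decay_rate ** (step // step_size)


def exponential_decay(init, decay_rate):
    # Multiply the rate by decay_rate at every step.
    return lambda step: init * decay_rate ** step


def cosine_decay(init, decay_steps):
    # Follow half a cosine wave from init down to 0 over decay_steps steps.
    def schedule(step):
        s = min(step, decay_steps)
        return init * 0.5 * (1.0 + math.cos(math.pi * s / decay_steps))

    return schedule


lr_step = step_decay(0.1, 0.5, 10)
lr_exp = exponential_decay(1.0, 0.9)
lr_cos = cosine_decay(0.1, 1000)

# Sample each schedule at a few steps.
samples = [(s, lr_step(s), lr_exp(s), lr_cos(s)) for s in (0, 10, 25, 1000)]
```

Each schedule is a function of the step count, so an optimizer that accepts a callable learning rate could evaluate it once per update.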