Self-contained Machine Learning and Natural Language Processing library in Go
With this release we introduce breaking changes that bring significant improvements to the project's structure, API and performance.
It would be difficult and confusing to list every single API change. Instead, the following sections will broadly describe the most relevant changes, arranged by topic.
Until this release, the project was essentially a monorepo in disguise: the core packages for handling matrices and computational graphs were accompanied by many model implementations (from the very simple up to the most sophisticated ones) and commands (model management utilities and servers).
We now prefer to keep only the core components of spaGO in this repository, enriched with an (opinionated) set of popular models and functionalities. Bigger sub-packages and related commands have been moved to separate repositories. The moved content most notably includes code related to Transformers and Flair. Please refer to the section Projects using Spago in the README for an updated list of references to the separate projects (note: some of them are still works in progress). If you have the feeling that something big is missing from spaGO, chances are it was moved to one of these separate projects: just have a look there first.
The arrangement of packages has been simplified: there is no longer any need to distinguish between `cmd` and `pkg`; all the main subpackages are located in the project's root path. Similarly, many packages previously nested under `pkg/ml` can now be found at root level too.
The minimum required Go version is 1.18, primarily needed for the introduction of type parameters (generics).
Thanks to the creation of separate projects, discussed above, and further refactoring, the main set of required dependencies is limited to the ones needed for testing. Only the subpackage `embeddings/store/diskstore` requires something more, so we defined it as an opt-in submodule with its own dependencies.
Instead of separate packages `mat32` and `mat64`, there is now a single unified package `mat`. Many parts of the implementation make use of type parameters (generics); however, the package's public API makes rather narrow use of them. In particular, we abstained from adding type parameters to widely used types, such as the `Matrix` interface. Where suitable, we simply favor `float64` values, the de facto preferred floating-point type in Go (just think of Go's `math` package). For other situations, we introduced a new subpackage `mat/float`. It provides simple types, holding either `float32` or `float64` values, as scalars or slices, and makes it easy to convert values between different precisions, all without making explicit use of generics. This design prevents the excessive spreading of type arguments to the many other types that need to manipulate matrices, both from other spaGO packages and from your own code.
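To illustrate the idea behind `mat/float`, here is a self-contained sketch (hypothetical names, not spaGO's actual implementation): generics do the work internally, while the exported interface hides the type parameter and converts between precisions on demand.

```go
package main

import "fmt"

// Slice is a minimal analogue of the kind of type mat/float provides
// (hypothetical sketch, not spaGO's actual implementation): it holds either
// float32 or float64 values and converts between precisions on demand, so
// callers never deal with type parameters directly.
type Slice interface {
	F32() []float32
	F64() []float64
}

type slice32 []float32
type slice64 []float64

func (s slice32) F32() []float32 { return s }
func (s slice32) F64() []float64 {
	out := make([]float64, len(s))
	for i, v := range s {
		out[i] = float64(v)
	}
	return out
}

func (s slice64) F64() []float64 { return s }
func (s slice64) F32() []float32 {
	out := make([]float32, len(s))
	for i, v := range s {
		out[i] = float32(v)
	}
	return out
}

// Make uses a type parameter internally, but the returned Slice hides it.
func Make[T float32 | float64](values []T) Slice {
	switch v := any(values).(type) {
	case []float32:
		return slice32(v)
	case []float64:
		return slice64(v)
	}
	panic("unreachable")
}

func main() {
	s := Make([]float32{1.5, 2.5})
	fmt.Println(s.F64()) // converted to float64 without generics at the call site
}
```

Callers receive a plain interface value, so type arguments never leak into the surrounding code.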
- `mat.Matrix` is the primary interface for matrices and vectors throughout the project.
- `mat.Dense` is the concrete implementation for a dense matrix. Unlike the interface, it has a type argument to distinguish between `float32` and `float64`.
- New matrices are created with the `mat.New***` functions (`NewDense`, `NewVecDense`, ...). Here you must choose which data type to use, specifying it as a type parameter (unless implicit).
- To derive a matrix from an existing one, prefer the `New***` methods on the matrix instance itself, rather than their top-level function counterparts.
- The package `ag`
now implicitly works in "define-by-run" mode only.
It's way more performant compared to the previous releases, and there would be no significant advantage in re-using a pre-defined graph ("define-and-run").
- There is no `Graph` anymore! At least, not as a first-class citizen: an implicit "virtual" graph is progressively formed each time an operation over some nodes is applied. The virtual graph can be observed by simply walking the tree of operations. Most methods of the former Graph are now simple functions in the `ag` package.
- Graph structures are recycled via `sync.Pool`. The function `ag.ReleaseGraph` operates on the virtual graph described above, usually starting from the given output nodes.
- Whenever a new Operator is created (with any operation function from `ag`
, such as `Add`, `Prod`, etc.), the related Function's `Forward` procedure is performed on a new goroutine. Nevertheless, it's always safe to ask for the Operator's `Value` without worries: if it's called too soon, the call will block until the result is computed, and only then return the value.
- The maximum number of concurrent goroutines is limited according to the `GOMAXPROCS`
variable.
- To propagate gradients, call `ag.Backward`
or `ag.BackwardMany`, specifying the output node (or nodes) of your computation (such as loss values, in traditional scenarios). The backward functions traverse the virtual graph and propagate the gradients, leveraging concurrency and using goroutines and locks in a way very similar to the forward procedure. The backward functions lock and wait until the whole gradient propagation is complete before returning.
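The "block until computed" behavior described above can be sketched with plain channels (a simplified illustration, hypothetical types, not spaGO's actual Operator code): the forward computation runs on its own goroutine, and `Value` blocks until the result is ready.

```go
package main

import "fmt"

// asyncOp is a simplified sketch (hypothetical type, not spaGO's actual
// Operator) of the pattern described above: the forward computation runs on
// its own goroutine, and Value blocks until the result has been produced.
type asyncOp struct {
	done  chan struct{}
	value float64
}

func newAsyncOp(forward func() float64) *asyncOp {
	op := &asyncOp{done: make(chan struct{})}
	go func() {
		op.value = forward() // compute concurrently...
		close(op.done)       // ...then signal completion
	}()
	return op
}

// Value blocks until the forward goroutine is done, then returns the result.
func (op *asyncOp) Value() float64 {
	<-op.done
	return op.value
}

func main() {
	a := newAsyncOp(func() float64 { return 2 * 3 })
	b := newAsyncOp(func() float64 { return a.Value() + 1 }) // waits for a
	fmt.Println(b.Value()) // prints 7
}
```

A dependent operator can simply call `Value` on its operands; the closed channel guarantees it never observes an incomplete result.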
- The locking mechanism implemented in the nodes' `Grad` methods will still prevent troubles in case your own code reads the gradients concurrently (that would be very uncommon).
- Truncated backpropagation through time is supported via `ag.TimeStepHandler`
,
and related functions, such as `NodeTimeStep`. For performing truncated backpropagation, we provide the functions `ag.BackwardT` and `ag.BackwardManyT`: they work like the normal backpropagation functions described above, only additionally requiring a time-step handler and the desired number of back steps.
- Variables are created with `ag.Var`
, which accepts a Matrix value and creates a new node-variable with gradient accumulation disabled by default. To enable gradient propagation, or to set an explicit name (useful for model params or constants), you can use the Variable's chainable methods `WithGrad` and `WithName`. As a shortcut to create a scalar-matrix variable you can use `ag.Scalar`.
- The new package `ag/encoding`
provides generic structures and functions to obtain
a sort of view of a virtual graph, with the goal of facilitating the
encoding/marshaling of a graph in various formats.
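The kind of "view" such encoders build boils down to walking the operands of each node. The following is a hypothetical, self-contained sketch of that traversal (stand-in types; the real structures live in `ag` and `ag/encoding`):

```go
package main

import "fmt"

// Node is a minimal stand-in for a node of the implicit graph (hypothetical
// types, not spaGO's actual API): variables have no operands, while operators
// reference the nodes they were applied to.
type Node interface {
	Label() string
	Operands() []Node
}

type variable struct{ name string }

func (v variable) Label() string    { return v.name }
func (v variable) Operands() []Node { return nil }

type operator struct {
	name string
	args []Node
}

func (o operator) Label() string    { return o.name }
func (o operator) Operands() []Node { return o.args }

// walk visits one "edge" per operand relation, which is essentially all an
// encoder needs in order to emit, e.g., Graphviz DOT from the virtual graph.
func walk(n Node, visit func(from, to Node)) {
	for _, operand := range n.Operands() {
		visit(n, operand)
		walk(operand, visit)
	}
}

func main() {
	x := variable{"x"}
	y := variable{"y"}
	sum := operator{name: "add", args: []Node{x, y}}
	walk(sum, func(from, to Node) {
		fmt.Printf("%s -> %s\n", from.Label(), to.Label())
	})
}
```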
The package `ag/encoding/dot` is a rewriting of the former `pkg/ml/graphviz`, which uses the `ag/encoding` structures to represent a virtual graph in Graphviz DOT format.
- The package `nn`
provides types and functions for defining and handling models. Its subpackages are implementations of the most common models. The set of built-in models has been notably revised, moving some of them to separate projects, as previously explained.
- The `Model`
interface has been extremely simplified: it only requires the special empty struct `Module` to be embedded in a model type. This is necessary only to distinguish an actual model from any other struct, and is especially useful for parameter traversal and similar operations.
- Since the Graph is gone from `ag`
, the models clearly don't need
to hold a reference to it anymore. Similarly, there is no need for any other
model-specific field, like the ones available from the former BaseModel
.
This implies the elimination of some seldom-used properties.
Notable examples are the "processing mode" (from the old Graph) and the time
step (from the old BaseModel).
In situations where a removed value or feature is still needed, we suggest either reintroducing the missing elements on the models that need them, or extracting them into separate types and functions. An example of extracted behavior is the handling of time steps, already mentioned in the previous section.
- The recurrent models under `nn/recurrent/...`
 have been simplified: each model has a single-step forward function (usually called `Next`) that accepts a previous state and returns a new one.
- We removed the `Stack`
Model, in favor of a new simple function `nn.Forward`, which operates on a slice of `StandardModel` interfaces, connecting outputs to inputs sequentially for each module.
- We introduced `nn.Buffer`
: it's a Node implementation that does
not require gradients, but can be serialized just like any other
parameter. This is useful, for example, to store constants, to track the mean
and std in batch norm layers, etc.
As a shortcut to create a Buffer with a scalar-matrix value you can use `nn.Const`.
- Model parameters can be traversed with `ForEachParam` and `ForEachParamStrict`.
Furthermore, the new interface `ParamsTraverser` allows traversing a model's parameters that are not automatically discovered via reflection by the traversal functions. If a model implements this interface, its `TraverseParams` function will take precedence over the regular parameters visit.
- The new function `nn.Apply`
, which visits all sub-models of any Model.
Typical usages of this function include parameter initialization.
- The new `Store`
interface, defined in
package embeddings/store
, only requires an implementation to provide a handful of read/write functions for key/value pairs. Both keys and values are just slices of bytes.
For example, in a typical scenario involving word embeddings, a key might
be a string
word converted to []byte
, and the value the byte-marshaled
representation of a vector (or a more complex struct also holding
other properties).
- Stores are obtained from a `Repository`
, another interface
defined in embeddings/store
. A Repository is simply a provider for Stores,
where each Store is identified by a string
name.
For example, if we are going to use a relational database for storing
embeddings data, the Repository might establish the connection to the
database, whereas each Store might identify a separate table by name,
used for reading/writing data.
- The package `embeddings/store/diskstore`
is a Go submodule that stores data
on disk, using BadgerDB; this is comparable to the implementation
from previous releases.
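A byte-oriented store of this kind is easy to sketch; the following volatile in-memory version (hypothetical `Put`/`Get` names, not the actual `Store` interface) shows the idea behind implementations such as memstore:

```go
package main

import "fmt"

// KVStore is a bare-bones volatile key/value store (hypothetical Put/Get
// names, not spaGO's actual Store interface): keys and values are plain
// byte slices, exactly as described above.
type KVStore struct {
	data map[string][]byte
}

func NewKVStore() *KVStore {
	return &KVStore{data: make(map[string][]byte)}
}

func (s *KVStore) Put(key, value []byte) {
	s.data[string(key)] = value
}

func (s *KVStore) Get(key []byte) ([]byte, bool) {
	v, ok := s.data[string(key)]
	return v, ok
}

func main() {
	store := NewKVStore()
	// In a word-embeddings scenario, the key is the word itself and the value
	// would be the byte-marshaled vector (arbitrary bytes here).
	store.Put([]byte("hello"), []byte{0x40, 0x20, 0x00, 0x00})
	v, ok := store.Get([]byte("hello"))
	fmt.Println(ok, v)
}
```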
The package embeddings/store/memstore
is a simple volatile in-memory
implementation; among other usages, it might be especially convenient for
testing.
- The package `embeddings`
implements the main embeddings Model
.
One Model can read and write data to a single Store, obtained from a
Repository by the configured name.
The model delegates to the embeddings Store the responsibility to actually
store the data; for this reason, the Store value on a Model is prevented
from being serialized (this is done with the utility type
embeddings/store.PreventStoreMarshaling
).
- The Model takes the type of its store `Key` as a type argument.
- The type `Embedding`
represents a single embedding value that can be handled
by a Model. It satisfies the interface nn.Param
, allowing seamless
integration with operations involving any other model. Under the hood,
the implementation takes care of reading/writing data against a
Store, efficiently handling marshaling/unmarshaling and preventing
race conditions. The Value
and the Payload
(if any) are read/written
against the Store; the Grad
is only kept in memory. All properties
of different Embedding
instances for the same key are kept
synchronized upon changes.
- The model's implementation of `TraverseParams`
allows these parameters to be discovered and
seen as if they were any other regular type of parameter. This is
especially important for operations such as embeddings optimization.
- Shared embeddings use a `Shared`
structure
that prevents binary marshaling.
- Gradient descent optimizers are still implemented in package `gd`, with minor API changes.
- We got rid of the package `pkg/utils`
. Some of its content was related
to functionalities now moved to separate projects. Any remaining useful code
has been refactored and moved to more appropriate places.
- New package `ml/ag/encoding/dot`
, for simple serialization of a Graph to
DOT (Graphviz) format.
- New package `ml/nn/sgu`, implementing a Spatial Gating Unit (SGU) model.
- New package `ml/nn/conv1x1`, implementing a simple 1-dimensional 1-sized-kernel convolution model.
- New package `ml/nn/gmlp`, implementing a gMLP
model.
- `ml/nn/activation/Model.Forward` now simply returns the input as it is if the activation function is the identity.
- `ml/losses.WeightedCrossEntropy()`
- `ml/losses.FocalLoss()`
- `ml/losses.WeightedFocalLoss()`
- `nlp/sequencelabeler.LoadModel()` (it replaces `Load()` and `LoadEmbeddings()`)
- `nlp/charlm.LoadModel()`
- `nlp/transformers/bert.Model.PredictMLM()`
- `nlp/transformers/bart/tasks` package
- `nlp/transformers/bert.Model.Vectorize()`
- `ml/ag.Graph.Nodes()` and `ml/ag.Nodes()`
- `ml/nn.Model.Close()`
- `ml/nn.ReifyForTraining()` and `ml/nn.ReifyForInference()`
- `ml/ag.Graph.Backward()` now panics if it is executed with nodes belonging to different graphs.
- The new `ml/graphviz`
package allows exporting a Graph to Graphviz
DOT format. To make it possible,
we introduced a new go-mod dependency: gographviz.
- New functions `Graph.NewVariableWithName()`
and `Graph.NewScalarWithName()` to create named Variables, and get the name of a Variable with `Variable.Name()`.
- The `UnaryElementwise`
functions provided by the package `ag/fn` have been promoted to separate dedicated structs. This improves debuggability, and you can get appropriate function names when using reflection. Here is the full list of the modified functions: `Tan`, `Tanh`, `Sigmoid`, `HardSigmoid`, `HardTanh`, `ReLU`, `Softsign`, `Cos`, `Sin`, `Exp`, `Log`, `Neg`, `Reciprocal`, `Abs`, `Mish`, `GELU`, `Sqrt`, `Swish`.
For the same reason, a dedicated `Square` function is introduced, replacing `Prod` with both operands set to the same value.
- The `ml/ag` types `Operator`, `Variable`, and `Wrapper` are now public.
- `ml/nn.Reify()`
now expects Graph and Processing Mode arguments, instead of a `Context` object (removed).
- `ml/nn.BaseModel` has been modified, replacing the field `Ctx Context` with a direct reference to the model's Graph and the Processing Mode (fields `G` and `ProcessingMode`).
- `nlp/sequencelabeler`
, `nlp/transformers/bert`, and `nlp/transformers/bart` (regenerated with `protoc-gen-go` v1.26.0 and `protoc` v3.16.0).
- `nlp/sequencelabeler.Load()`
and `LoadEmbeddings()` (now replaced by `nlp/sequencelabeler.LoadModel()`)
- `ml/nn.Context` (see related changes on `Reify()` and `BaseModel`)
- `nlp.charlm.flair_converter.go` to import Flair character language models.
- `nlp.transformer.generation`
algorithms: `Generator.getTopKScoredTokens()`, `Generator.updateTokensScores()`.
- `mat32.Dense.Mul` when doing Matrix-Vector multiplication.
- `math32` functions using chewxy/math32 functions.
- `ag.Graph` efficiency: `ag.Graph.groupNodesByHeight()`, `sync.pool` to reduce allocations of graph's operators.
- `nlp.transformer.generation` package.
- `nlp.tokenizers.sentencepiece` package.
- `nlp.transformers.bart.head.conditionalgeneration` package.
- `nn.Closer` interface (e.g. `embeddings.Model` needs to close the underlying key-value store).
- `pe.SinusoidalPositionalEncoder` (this implementation replaces unused `pe.PositionalEncoder` and `pe.AxialPositionalEncoder`).
- `fn.NewSwish` renamed into `fn.NewSwishB`, as this was the Swish variant with trainable parameters (B).
- `ag.GetOpName`
to match operator names in lower-case.
- `pe.PositionalEncoder` and related functions.
- `pe.AxialPositionalEncoder` and related functions.
- `nn.ScaledDotProductAttention`.
- `ReleaseMatrix` added to packages `mat32` and `mat64`.
- New methods added to the `Matrix` interface, from `mat32` and `mat64`: `Minimum`, `Maximum`, `MulT`, `Inverse`, `DoNonZero`. However, the implementation for sparse matrices is not available yet (it always panics).
- `Matrix` interface values are now preferred over specific `Dense` or `Sparse` matrices, also avoiding unnecessary type casts. Relevant changes to the public API are listed below.
- The `mat(32|64).Stack` function's arguments and returned value are now `Matrix` interfaces, instead of explicit `Dense` matrices.
- `Dense.Minimum` and `Dense.Maximum`, from packages `mat32` and `mat64`, return a `Matrix` interface, instead of a specific `Dense` type.
- `fofe.EncodeDense`, `fofe.Encode`, and `fofe.BiEncode` return slices of `Matrix` values, instead of `Dense` or `Sparse`.
- The `z` argument of the function `fofe.Decode` is of type `Matrix`, instead of `Dense`.
- The `ml.optimizers.de` (Differential Evolution optimizer) API was changed to handle `Matrix` values, instead of specific `Dense` matrices. Changes include: `Member.TargetVector`, `Member.DonorVector`, `ScoredVector.Vector`, the `vector` argument of the `NewMember` function, and the `solution` argument of the `score` and `validate` functions passed to `NewOptimizer`.
- `PositionalEncoder.Cache` and `AxialPositionalEncoder.Cache` are slices of `Matrix`, instead of slices of `Dense`.
- `AxialPositionalEncoder.EncodingAt` returns a `Matrix` value, instead of `Dense`.
- `nn.DumpParamsVector` returns a `Matrix` value, instead of `Dense`.
- The `vector` argument of the function `nn.LoadParamsVector` is a `Matrix`, instead of `Dense`.
- The `value` argument of the method `embeddings.Model.SetEmbedding` is of type `Matrix`, instead of `Dense`.
- `evolvingembeddings.WordVectorPair.Vector` is a `Matrix`, instead of `Dense`.