FastText Versions

Library for fast text representation and classification.

v0.9.2

4 years ago

We are happy to announce the release of version 0.9.2.

WebAssembly

We are excited to release fastText bindings for WebAssembly. Classification tasks are widely used in web applications and we believe giving access to the complete fastText API from the browser will notably help our community to build nice tools. See our documentation to learn more.

Autotune: automatic hyperparameter optimization

Finding the best hyperparameters is crucial for building efficient models. However, searching for the best hyperparameters manually is difficult. This release includes the autotune feature, which automatically finds the best hyperparameters for your dataset. You can find more information on how to use it here.
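
As a rough sketch of how this looks from the Python module: autotune is switched on by passing a validation file to train_supervised. The file paths and the helper names autotune_args and train_with_autotune are placeholders invented for this illustration, and fasttext is imported lazily so the argument helper can be inspected without the library installed.

```python
def autotune_args(train_path, valid_path, duration_s=300):
    """Build the keyword arguments that enable autotune in train_supervised."""
    return {
        "input": train_path,                   # training file, one example per line
        "autotuneValidationFile": valid_path,  # held-out file used to score candidates
        "autotuneDuration": duration_s,        # search budget in seconds
    }


def train_with_autotune(train_path, valid_path, duration_s=300):
    """Train a supervised model and let autotune pick the hyperparameters."""
    import fasttext  # lazy import: only needed when training actually runs

    return fasttext.train_supervised(
        **autotune_args(train_path, valid_path, duration_s)
    )
```

A call such as train_with_autotune("cooking.train", "cooking.valid", 600) would then spend about ten minutes searching before returning the best model found.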

Python

fastText loves Python. In this release, we have:

several bug fixes for prediction functions
nearest neighbors and analogies for Python
a memory leak fix
website tutorials with Python examples
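
In the Python module, nearest neighbors and analogies are exposed as model.get_nearest_neighbors(word, k) and model.get_analogies(wordA, wordB, wordC, k). To illustrate the idea behind a nearest-neighbor query without needing a trained model, here is a minimal sketch that ranks words by cosine similarity. The toy vectors and function names are made up for this example, and this is not the library's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(vectors, query, k=2):
    """Return the k (similarity, word) pairs closest to `query`."""
    scored = [
        (cosine(vectors[query], vec), word)
        for word, vec in vectors.items()
        if word != query  # the query word itself is excluded
    ]
    scored.sort(reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional vectors, for illustration only.
TOY_VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "apple": [0.1, 0.2, 0.95],
}
```

The (similarity, word) ordering of each pair mirrors what get_nearest_neighbors returns.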

The autotune feature is fully integrated with our Python API. This allows us to have a more stable autotune optimization loop from Python and to synchronize the best hyperparameters with the _FastText model object.

Pre-trained models tool

We release two helper scripts:

download_model.py to automatically download pre-trained vectors from our website
reduce_model.py to reduce the word-vectors' size using PCA.

They can also be used directly from our Python API.
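
A minimal sketch of the Python route, assuming the fasttext.util helpers download_model and reduce_model and the cc.<lang>.<dim>.bin file naming used for the pre-trained vectors. The function names model_filename and download_and_reduce are hypothetical, and the download is several gigabytes, so the import and network access are kept inside the function.

```python
def model_filename(lang, dim):
    """File name convention of the pre-trained vectors, e.g. cc.en.300.bin."""
    return "cc.{}.{}.bin".format(lang, dim)


def download_and_reduce(lang="en", target_dim=100):
    """Download 300-dimensional vectors, shrink them, and save the result."""
    import fasttext  # lazy imports: only needed when the function is called
    import fasttext.util

    # Fetch the model into the current directory unless it is already there.
    fasttext.util.download_model(lang, if_exists="ignore")
    model = fasttext.load_model(model_filename(lang, 300))

    # Reduce the vector dimension in place, then write the smaller model.
    fasttext.util.reduce_model(model, target_dim)
    out_path = model_filename(lang, target_dim)
    model.save_model(out_path)
    return out_path
```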

More metrics

When you test a trained model, you can now have more detailed results for the precision/recall metrics of a specific label or all labels.
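
These per-label results are available through the test-label command and, in Python, through model.test_label(path). To make the definitions concrete, the sketch below computes per-label precision and recall from gold and predicted label sets. The function name and the toy data are illustrative and are not part of fastText.

```python
from collections import Counter


def per_label_metrics(gold, predicted):
    """Compute precision and recall per label.

    `gold` and `predicted` are parallel lists of label sets, one per example.
    """
    true_pos, n_pred, n_gold = Counter(), Counter(), Counter()
    for gold_labels, pred_labels in zip(gold, predicted):
        for label in pred_labels:
            n_pred[label] += 1
            if label in gold_labels:
                true_pos[label] += 1
        for label in gold_labels:
            n_gold[label] += 1

    metrics = {}
    for label in set(n_pred) | set(n_gold):
        # A label that was never predicted has undefined precision (nan);
        # likewise recall is nan for a label that never appears in the gold data.
        precision = true_pos[label] / n_pred[label] if n_pred[label] else float("nan")
        recall = true_pos[label] / n_gold[label] if n_gold[label] else float("nan")
        metrics[label] = {"precision": precision, "recall": recall}
    return metrics
```

For example, with gold = [{"a"}, {"a", "b"}, {"b"}] and every prediction equal to {"a"}, label "a" gets precision 2/3 and recall 1.0, while label "b" gets recall 0.0 and an undefined precision.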

Paper source code

This release contains the source code of the unsupervised multilingual alignment paper.

Community feedback and contributions

We want to thank our community for giving us feedback on Facebook and on GitHub.

v0.9.1

4 years ago

We are happy to announce the release of version 0.9.1.

New release of Python module

The main goal of this release is to merge two existing Python modules: the official fastText module, which was available on our GitHub repository, and the unofficial fasttext module, which was available on pypi.org.

You can find an overview of the new API here, and more insight in our blog post.

Refactoring

This version includes a massive rewrite of internal classes. Training and testing are now split into three different classes: Model, which takes care of the computational aspect; Loss, which handles the loss and applies gradients to the output matrix; and State, which is responsible for holding the model's state inside each thread.

That makes the code more straightforward to read and also gives a smaller memory footprint, because the data needed for loss computation is now held only once, whereas before there was one copy for each thread.
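
The division of responsibilities can be pictured with a toy Python analogue. This is only a sketch of the idea: the real classes are C++, the method names and the single-label logistic loss below are simplifications chosen for brevity, and nothing here is the library's actual code.

```python
import math


class Loss:
    """Created once and shared: owns the output matrix and updates it."""

    def __init__(self, output_matrix):
        self.wo = output_matrix  # one row per label

    def forward(self, target, state, lr):
        row = self.wo[target]
        score = sum(w * h for w, h in zip(row, state.hidden))
        prob = 1.0 / (1.0 + math.exp(-score))
        alpha = lr * (1.0 - prob)
        for i, h in enumerate(state.hidden):
            state.grad[i] += alpha * row[i]  # gradient w.r.t. the hidden vector
            row[i] += alpha * h              # gradient step on the output row
        return -math.log(prob)


class State:
    """Per-thread scratch space: hidden vector, gradient, running loss."""

    def __init__(self, dim):
        self.hidden = [0.0] * dim
        self.grad = [0.0] * dim
        self.loss_sum = 0.0
        self.n_examples = 0


class Model:
    """Computation only: averages input rows and delegates to the loss."""

    def __init__(self, input_matrix, loss):
        self.wi = input_matrix
        self.loss = loss

    def update(self, input_ids, target, lr, state):
        dim = len(state.hidden)
        n = len(input_ids)
        state.hidden = [sum(self.wi[i][d] for i in input_ids) / n for d in range(dim)]
        state.grad = [0.0] * dim
        state.loss_sum += self.loss.forward(target, state, lr)
        state.n_examples += 1
        for i in input_ids:  # push the hidden-layer gradient back to the inputs
            for d in range(dim):
                self.wi[i][d] += state.grad[d] / n
```

In this picture each training thread would own one State, while a single Model and a single Loss are shared, which is where the memory saving comes from.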

Misc

Compilation fixes for recent versions of Mac OS X.
Better Unicode handling:
- on_unicode_error argument that helps handle the Unicode issues some datasets can cause
- bug fix for the different behaviour of pybind11's py::str class between Python 2 and Python 3
Script for unsupervised alignment.
Public file hosting changed from AWS to fbaipublicfiles.
We added a Code of Conduct file.
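
To show what an option like on_unicode_error controls, the sketch below decodes a byte string that is not valid UTF-8 under different error policies. It assumes the argument takes the same values as Python's standard decoding error handlers ("strict", "replace", "ignore"). The helper decode_token is hypothetical and only mimics the behaviour with the standard library.

```python
def decode_token(raw, on_unicode_error="strict"):
    """Decode UTF-8 bytes using the given error policy."""
    return raw.decode("utf-8", errors=on_unicode_error)


# "café" encoded as Latin-1: the final byte is not valid UTF-8.
BAD_BYTES = b"caf\xe9"

replaced = decode_token(BAD_BYTES, "replace")  # invalid byte becomes U+FFFD
ignored = decode_token(BAD_BYTES, "ignore")    # invalid byte is dropped

try:
    decode_token(BAD_BYTES)  # default "strict" policy raises
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

With the "strict" default a single malformed token raises an error, so passing "replace" is a common way to keep going on noisy datasets.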

Thank you!

As always, we want to thank you for your help and your precious feedback, which helps make this project better.

v0.2.0

5 years ago

We are happy to announce the change of the license from BSD+patents to MIT and the release of fastText 0.2.0.

The main purpose of this release is to set a beta C++ API of the FastText class. The class now behaves as a computational library: we moved the display and some usage error handling outside of it (mainly to main.cc and fasttext_pybind.cc). It is still compatible with older versions of the class, but some methods are now marked as deprecated and will probably be removed in the next release.

In this respect, we also introduce official support for Python. The Python binding of fastText is a client of the FastText class.

Here is a short summary of the 104 commits since 0.1.0:

New:

Introduction of the “OneVsAll” loss function for multi-label classification, which corresponds to the sum of binary cross-entropy computed independently for each label. This new loss can be used with the -loss ova or -loss one-vs-all command line option ( 8850c51b972ed68642a15c17fbcd4dd58766291d ).
Computation of the precision and recall metrics for each label ( be1e597cb67c069ba9940ff241d9aad38ccd37da ).
Removed printing functions from FastText class ( 256032b87522cdebc4850c99b204b81b3255cb2a ).
Better default for number of threads ( 501b9b1e4543fd2de55e4a621a9924ce7d2b5b17 ).
Python support ( f10ec1faea1605d40fdb79fe472cc2204f3d584c ).
More tests for circleci/python ( eb9703a4a7ed0f7559d6f341cc8e5d166d5e4d88, 97fcde80ea107ca52d3d778a083564619175039c, 1de0624bfaff02d91fd265f331c07a4a0a7bb857 ).
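
The OneVsAll loss listed above, a sum of independent binary cross-entropies, can be written out in a few lines. This is an illustrative sketch of the formula only. The function names are made up, and it takes raw per-label scores rather than anything produced by fastText.

```python
import math


def sigmoid(x):
    """Logistic function mapping a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def one_vs_all_loss(scores, positive_labels):
    """Sum of binary cross-entropy terms, one per label.

    `scores` holds one raw score per label index; `positive_labels` is the
    set of label indices that apply to the example.
    """
    total = 0.0
    for label, score in enumerate(scores):
        prob = sigmoid(score)
        if label in positive_labels:
            total += -math.log(prob)        # label should be predicted
        else:
            total += -math.log(1.0 - prob)  # label should not be predicted
    return total
```

Because each label contributes its own term, an example can carry several positive labels at once, which is what makes this loss suited to multi-label classification.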

Bug fixes:

Normalize buffer vector in analogy queries.
Typo fixes and clarifications on website.
Fixes for Python install issues: setup.py OS X compiler flags, pybind11 include.
Fix: getSubwords for EOS.
Fix: ETA time.
Fix: division by 0 in word analogy evaluation.
Fix for the infinite loop on ARM CPUs.

Operations:

We released more pre-trained vectors ( 92bc7d230959e2a94125fbe7d3b05257effb1111, 5bf8b4c615b6308d76ad39a5a50fa6c4174113ea ).

Worth noting:

We added circleci build badges to the README.md.
We modified the style to be in compliance with Facebook C++ style.
We added a coverage option to the Makefile and setup.py so that builds can be instrumented for measuring test coverage.

Thank you fastText community!

We want to thank you all for being a part of this community and sharing your passion with us. Some of these improvements would not have been possible without your help.

v0.1.0

6 years ago