Alink Versions

Alink is a machine learning algorithm platform based on Flink, developed by the PAI team of Alibaba's computing platform.

v1.5.1

2 years ago
  1. Improve the performance of the deep learning (DL) module.
  2. Resolve many issues on the Windows platform.
  3. Add an incremental training mode for LR, Softmax, etc.
  4. Improve the performance of graph-based random walk algorithms.
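Incremental training continues from a previously trained model instead of refitting from scratch. As a library-free sketch of the idea for logistic regression (illustrative only, not Alink's implementation; all names are made up):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_update(weights, batch, lr=0.1):
    """One SGD pass over a mini-batch; `weights` may come from a prior model."""
    for features, label in batch:
        pred = sigmoid(sum(w * x for w, x in zip(weights, features)))
        grad = pred - label
        weights = [w - lr * grad * x for w, x in zip(weights, features)]
    return weights

# Initial training on the first batch of data...
w = sgd_update([0.0, 0.0], [([1.0, 0.0], 1), ([0.0, 1.0], 0)])
# ...then incremental training warm-starts from the previous weights,
# so earlier batches remain reflected in the model.
w = sgd_update(w, [([1.0, 1.0], 1)])
```

The release's incremental mode follows the same warm-start idea at the operator level: the existing model is taken as the starting point for further training.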

v1.5.0

2 years ago
  1. Add time series algorithms:
    • Prophet model #176
    • AutoArima, Arima, HoltWinters, AutoGarch
    • LSTNet, DeepAR
  2. Add a deep learning module (Linux and macOS on Intel chips).
  3. Add MTable and Tensor.
  4. Add a resource plugin.
  5. Improve usage of PyFlink in PyAlink, close #178.

v1.4.0

2 years ago
  1. Adapt to Flink 1.13.
  2. Fix some bugs.
  3. Add some feature engineering methods.
  4. Refine the documents of BatchOp/StreamOp.
  5. Add Java demos.

v1.3.2

3 years ago

Release note:

  1. Fix SLF4J load error when running the Java example #109
  2. Slightly optimize one-hot encoding #112
  3. Add quoting of MySQL column names #159
  4. Fix some errors (decimal exception and invalid partition set) #162
  5. Fix partition overwrite in Hive.
  6. Upgrade the Flink version from 1.12.0 to 1.12.1.

v1.3.1

3 years ago
  1. Adapt to Flink 1.12.
  2. Add Kafka plugin.
  3. Add S3 file system.
  4. Add ODPS catalog.
  5. Fix Poisson regression and add GLM model info.
  6. Support multiple files in the pipeline loader and local predictor loader.
  7. Use the legacy serializer for compatibility with the old Ak format.
  8. Change the vector type to CompositeType and the sparse vector to a POJO type.
  9. Remove REGEXP_REPLACE in the SQL selector for Flink 1.12.

v1.3.0

3 years ago
  1. Add more model info batch ops and support printing model info in pipeline models.
  2. Add a recommendation module.
    • Supported recommenders:
      • ALS
      • Factorization Machines
      • ItemCF
      • UserCF
    • Other supported functions for the recommendation module:
      • Leave-k-object-out
      • Leave-top-k-object-out
      • Ranking evaluation
      • Multi-label evaluation
  3. Add online learning algorithms.
    • FTRL model filter
  4. Add a series of similarity algorithms.
    • VectorNearestNeighbor
    • TextSimilarity
    • TextNearestNeighbor
    • TextApproxNearestNeighbor
    • StringSimilarity
    • StringNearestNeighbor
    • StringApproxNearestNeighbor
  5. Add DocWordCountBatchOp, KeywordsExtractionBatchOp, TfidfBatchOp, WordCountBatchOp
  6. Add KNN
  7. Add GeoKMeans, Streaming Kmeans
  8. Add model selection algorithms.
    • RandomSearchCV
    • RandomSearchTVSplit
  9. Add plugin mechanism for file systems and catalogs; add catalogs for Hive, MySQL, Derby and SQLite.
  10. PyAlink:
    • Align with new functionalities in Java side, including new operators, catalog, plugin mechanism, and so on;
    • For Flink version 1.9, PyAlink now depends on PyFlink directly, enabling flink run and table-related operations.
  11. Fix some issues, optimize performance and add more parameters in linear and tree model
  12. Add test utils module and optimize performance of unit tests.
  13. Remove the db module.
  14. Refine the save/load in pipeline and pipeline model. Use Ak as the default format for save/load.
  15. Support loading a LocalPredictor from an Ak file saved on a file system. This avoids a collect when loading the LocalPredictor. See #78, #79.
  16. Add multi-threading in all mappers.
  17. Optimize memory usage of batch prediction.
  18. Add pseudoInverse in the matrix utilities.
  19. Support sparse vectors without an explicit size.
  20. Fix a sequencing issue when calling linkFrom on model info batch ops.
  21. Optimize the format of lazy print.
  22. Add Stopwatch and TimeSpan
  23. Add serialVersionUID in all serializable classes.
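The FTRL item above refers to the well-known FTRL-proximal online learning algorithm: per-coordinate adaptive learning rates with L1/L2 regularization, which keeps the model sparse while training on a stream. A minimal, self-contained sketch of the update rule (illustrative only, not Alink's implementation):

```python
import math

class FTRL:
    """Minimal FTRL-proximal logistic regression (per-coordinate update)."""
    def __init__(self, dim, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * dim   # accumulated adjusted gradients
        self.n = [0.0] * dim   # accumulated squared gradients

    def weights(self):
        w = []
        for zi, ni in zip(self.z, self.n):
            if abs(zi) <= self.l1:
                w.append(0.0)  # L1 threshold keeps the weight exactly zero
            else:
                sign = -1.0 if zi < 0 else 1.0
                w.append(-(zi - sign * self.l1) /
                         ((self.beta + math.sqrt(ni)) / self.alpha + self.l2))
        return w

    def update(self, x, y):
        """One online step on example (x, y); returns the prediction."""
        w = self.weights()
        p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
        for i, xi in enumerate(x):
            g = (p - y) * xi
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w[i]
            self.n[i] += g * g
        return p

# Stream positive examples; the weight on the active feature turns positive.
model = FTRL(dim=1, alpha=0.5, l1=0.01, l2=0.0)
for _ in range(50):
    model.update([1.0], 1)
```

In Alink the same algorithm runs as streaming train/predict operators; the sketch only shows the core math.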

v1.2.0

3 years ago
  1. Adapt for Flink 1.11

    • Flink API calls (#129), Hive connectors (#130) and the Kafka connector (#129) are adapted for Flink 1.11.
    • Adjust FilePath of FileSystem for Flink 1.11 #131
  2. Add Factorization Machines classification and regression #115

  3. Support Lazy APIs for higher user interactivity and richer information. Lazy APIs enable intermediate outputs of the ML pipeline to be printed, collected, and post-processed alongside the main data flow. Such intermediate outputs include ML model and training information, evaluation metrics, data statistics, etc.

    • Supported in PyAlink
    • Support Lazy APIs for BatchOperators and related methods in EstimatorBase/TransformerBase #116
    • Add model information:
      • Linear model #118 #132
      • Tree model #125
      • PCA #117
      • ChisqSelector #117
      • VectorChisqSelector #117
      • KMeans #120
      • BisectingKMeans #120
      • NaiveBayes #122
      • Lda #122
      • GaussianMixture #120
      • OneHotEncoder #120
      • QuantileDiscretizer #120
      • MinMaxScaler #122
      • VectorMinMaxScaler #122
      • MaxAbsScaler #122
      • VectorMaxAbsScaler #122
      • StandardScaler #122
      • VectorStandardScaler #122
    • Add training information:
      • word2vec #125
    • Add statistics:
      • Correlation #117
      • Summary #117
    • Add EvaluationMetrics #124
  4. Add FileSystem APIs. #126 Using the FileSystem APIs, users can work with files on different file systems through a unified and friendly interface. Common operations include exists, isDir, list, read and write. Supported file systems are:

    • HDFS
    • OSS
    • Local
  5. Add Ak source/sink and Csv source/sink supporting the new FileSystem APIs. #126 Ak is a file format that stores data together with its schema and can be written to a file system, providing a compressed, tabular data representation. The supported APIs are shown in the table below:

    |            | HDFS | OSS | Local |
    | ---------- | ---- | --- | ----- |
    | Ak source  | ✔️   | ✔️  | ✔️    |
    | Ak sink    | ✔️   | ✔️  | ✔️    |
    | Csv source | ✔️   | ✔️  | ✔️    |
    | Csv sink   | ✔️   | ✔️  | ✔️    |
  6. Support EqualWidthDiscretizer. #123

  7. Feature Enhancements and API unification in Clustering. #121

  8. Refine code of QuantileDiscretizer and OneHotEncoder #111

  9. Fix predict stream op in alspredictstreamop.md #104
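The Lazy APIs in item 3 work by registering observers that only fire when the job executes, so a single pass over the data can serve printing, collecting, and post-processing at once. A minimal, library-free sketch of that mechanism (illustrative names, not PyAlink's actual API):

```python
class LazyOp:
    """Registers callbacks that fire only when the pipeline executes."""
    def __init__(self, compute):
        self._compute = compute
        self._callbacks = []

    def lazy_collect(self, callback):
        self._callbacks.append(callback)  # nothing runs yet
        return self

    def lazy_print(self, title):
        return self.lazy_collect(lambda rows: print(title, rows))

    def execute(self):
        rows = self._compute()            # single evaluation of the job
        for cb in self._callbacks:
            cb(rows)                      # every observer sees the same result
        return rows

op = LazyOp(lambda: [1, 2, 3])
collected = []
op.lazy_collect(collected.extend).lazy_print("result:")
op.execute()   # both the collect and the print fire here
```

Deferring the side effects is what lets one job execution feed many attached outputs instead of re-running the pipeline per request.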

v1.1.2

3 years ago
  1. Add transformers among the formats Vector, CSV, JSON, KV, Columns and Triple #93
    • Support AnyToAny transformation
    • Unified transformation params for ease of use
  2. Support SQL select statements in the Pipeline and LocalPredictor #61
    • Support Flink planner built-in functions on individual rows: comparison, logical, arithmetic, string, temporal, conditional, type conversion, hash, etc.
    • Add alink_shaded/shaded_protobuf_java to support usage of native Calcite
  3. Support Hive source and sink #96
    • Support batch/stream source and sink for Hive
    • Support table partitions
    • Simplify the Hive jar dependencies
    • Support multiple versions: 2.0, 2.1, 2.2, 2.3, 3.0
  4. Fix PyAlink starting and UDF issues on Windows #76, #77
  5. Support BigInteger type in MySql source #86
  6. Add open and close in mapper. #92
  7. Add open function in SegmentMapper and StopwordsRemoverMapper #94
  8. Unify HandleInvalid Params #95
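The format transformers in item 1 are essentially row-wise string parsers: each pair of formats gets a forward and inverse mapping over a row. A minimal sketch of the KV case and its inverse (delimiters and function names are assumptions for illustration, not Alink's API):

```python
def kv_to_columns(kv, cols, col_delim=",", kv_delim=":"):
    """Parse a 'k:v,k:v' string into a tuple ordered by `cols`."""
    pairs = dict(item.split(kv_delim, 1) for item in kv.split(col_delim))
    return tuple(pairs.get(c) for c in cols)

def columns_to_kv(row, cols, col_delim=",", kv_delim=":"):
    """Inverse transform: column values back to a KV string."""
    return col_delim.join(f"{c}{kv_delim}{v}" for c, v in zip(cols, row))

row = kv_to_columns("f0:1.0,f1:2.0", ["f0", "f1"])   # -> ("1.0", "2.0")
```

Composing any such forward mapping with another's inverse is what makes the AnyToAny transformation possible from a small set of per-format parsers.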

v1.1.1

4 years ago

Enhancements & New Features

  1. Optimize conversion between operators and DataFrames
  2. Auto-detect localIp when using useRemoteEnv
  3. Add enum type parameters #65
    • Adapt enum type params in quantile, distance and decision tree #67
    • Change linear model training params to enum #71
    • Add enum parameters to Kafka, StringIndexer and Join #72
    • Adapt enum type params in PCA, chi-square test, GLM and correlation #73
  4. Support window group-by in stream operators #68
  5. Add operators to parse strings in CSV, JSON and KV formats into columns #70
  6. Tokenizer supports string splitting on multiple spaces #69
  7. Make the error message clear when selected columns are not found #66
  8. Add an FTRL example #64

Fix & Refinements

  1. Fix dill version conflict
  2. Fix ALSExample error #33
  3. Fix bug of HasVectorSize alias #56
  4. Fix MySQL source error when using the collect method #45

v1.1.0

4 years ago

Enhancements & New Features

API change

  • Modify Naive Bayes algorithm as a text classifier. #47
  • Modify and enhance the parameters and models in QuantileDiscretizer, OneHotEncoder and Bucketizer. #48

Documentation

  • Update data links in docs and codes. #28
  • Update PyAlink install instructions. #8

Fix & Refinements

  • Fix the problem in the LDA online method and refine comments in FeatureLabelUtil. #29
  • Fix the bug that initial data of KMeansAssignCluster is not cleared. #31
  • Fix the int overflow bug in reading large CSV files, and add test cases for CsvFileInputSplit. See #27.
  • Clean up some code. #15
  • Remove a redundant test case whose data source is inaccessible. See #28.
  • Fix the NPE in PCA. See #42.

PyPI support

  • Support PyAlink installation via `pip install pyalink`

Maven Dependencies

Alink is now synchronized to the Maven Central repository, so it can easily be added to Maven projects. Pick the dependency set that matches your Flink version (1.10 or 1.9):

<!-- For Flink 1.10 -->
<dependency>
    <groupId>com.alibaba.alink</groupId>
    <artifactId>alink_core_flink-1.10_2.11</artifactId>
    <version>1.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-scala_2.11</artifactId>
    <version>1.10.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner_2.11</artifactId>
    <version>1.10.0</version>
</dependency>

<!-- For Flink 1.9 -->
<dependency>
    <groupId>com.alibaba.alink</groupId>
    <artifactId>alink_core_flink-1.9_2.11</artifactId>
    <version>1.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-scala_2.11</artifactId>
    <version>1.9.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner_2.11</artifactId>
    <version>1.9.0</version>
</dependency>