Alink Versions

Alink is a machine learning algorithm platform based on Flink, developed by the PAI team of Alibaba's computing platform.

v1.5.1

2 years ago
  1. Improve the performance of the deep learning (DL) module.
  2. Resolve many issues on the Windows platform.
  3. Add an incremental training mode for LR, Softmax, etc.
  4. Improve the performance of graph-based random walk algorithms.
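Incremental training continues from a previously trained model instead of refitting from scratch. As a library-free sketch of the idea for logistic regression (illustrative only, not Alink's implementation; all names are made up):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_update(weights, batch, lr=0.1):
    """One SGD pass over a mini-batch; `weights` may come from a prior model."""
    for features, label in batch:
        pred = sigmoid(sum(w * x for w, x in zip(weights, features)))
        grad = pred - label
        weights = [w - lr * grad * x for w, x in zip(weights, features)]
    return weights

# Initial training on the first batch of data...
w = sgd_update([0.0, 0.0], [([1.0, 0.0], 1), ([0.0, 1.0], 0)])
# ...then incremental training warm-starts from the previous weights,
# so earlier batches remain reflected in the model.
w = sgd_update(w, [([1.0, 1.0], 1)])
```

The release's incremental mode follows the same warm-start idea at the operator level: the existing model is taken as the starting point for further training.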

v1.5.0

2 years ago
  1. Add time series algorithms:
    • Prophet model #176
    • AutoArima, Arima, HoltWinters, AutoGarch
    • LSTNet, DeepAR
  2. Add a deep learning module (Linux and macOS on Intel chips).
  3. Add MTable and Tensor.
  4. Add a resource plugin.
  5. Improve usage of PyFlink in PyAlink, close #178.

v1.4.0

2 years ago
  1. Adapt to Flink 1.13.
  2. Fix some bugs.
  3. Add some feature engineering methods.
  4. Refine the documents of BatchOp/StreamOp.
  5. Add Java demos.

v1.3.2

3 years ago

Release note:

  1. Fix SLF4J load error when running the Java example #109
  2. Slightly optimize one-hot encoding #112
  3. Add quoting of MySQL column names #159
  4. Fix some errors (decimal exception and invalid partition set) #162
  5. Fix partition overwrite in Hive.
  6. Upgrade the Flink version from 1.12.0 to 1.12.1.

v1.3.1

3 years ago
  1. Adapt to Flink 1.12.
  2. Add Kafka plugin.
  3. Add S3 file system.
  4. Add ODPS catalog.
  5. Fix Poisson regression and add GLM model info.
  6. Support multiple files in the pipeline loader and local predictor loader.
  7. Use the legacy serializer for compatibility with the old Ak format.
  8. Change the vector type to CompositeType and the sparse vector to a POJO type.
  9. Remove REGEXP_REPLACE in the SQL selector for Flink 1.12.

v1.3.0

3 years ago
  1. Add more model info batch ops and support printing model info in pipeline models.
  2. Add a recommendation module.
    • Supported recommenders:
      • ALS
      • Factorization Machines
      • ItemCF
      • UserCF
    • Other supported functions for the recommendation module:
      • Leave-k-object-out
      • Leave-top-k-object-out
      • Ranking evaluation
      • Multi-label evaluation
  3. Add online learning algorithms.
    • FTRL model filter
  4. Add a series of similarity algorithms.
    • VectorNearestNeighbor
    • TextSimilarity
    • TextNearestNeighbor
    • TextApproxNearestNeighbor
    • StringSimilarity
    • StringNearestNeighbor
    • StringApproxNearestNeighbor
  5. Add DocWordCountBatchOp, KeywordsExtractionBatchOp, TfidfBatchOp, WordCountBatchOp
  6. Add KNN
  7. Add GeoKMeans, Streaming Kmeans
  8. Add model selection algorithms.
    • RandomSearchCV
    • RandomSearchTVSplit
  9. Add plugin mechanism for file systems and catalogs; add catalogs for Hive, MySQL, Derby and SQLite.
  10. PyAlink:
    • Align with new functionalities in Java side, including new operators, catalog, plugin mechanism, and so on;
    • For Flink version 1.9, PyAlink now depends on PyFlink directly, enabling flink run and table-related operations.
  11. Fix some issues, optimize performance and add more parameters in linear and tree model
  12. Add test utils module and optimize performance of unit tests.
  13. Remove the db module.
  14. Refine the save/load in pipeline and pipeline model. Use Ak as the default format for save/load.
  15. Support loading a LocalPredictor from an Ak file saved on a file system. This avoids a collect when loading the LocalPredictor. See #78, #79.
  16. Add multi-threading in all mappers.
  17. Optimize memory usage of batch prediction.
  18. Add pseudoInverse in the matrix utilities.
  19. Support sparse vectors without an explicit size.
  20. Fix a sequencing issue when calling linkFrom on model info batch ops.
  21. Optimize the format of lazy print.
  22. Add Stopwatch and TimeSpan
  23. Add serialVersionUID in all serializable classes.
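The FTRL item above refers to the well-known FTRL-proximal online learning algorithm: per-coordinate adaptive learning rates with L1/L2 regularization, which keeps the model sparse while training on a stream. A minimal, self-contained sketch of the update rule (illustrative only, not Alink's implementation):

```python
import math

class FTRL:
    """Minimal FTRL-proximal logistic regression (per-coordinate update)."""
    def __init__(self, dim, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * dim   # accumulated adjusted gradients
        self.n = [0.0] * dim   # accumulated squared gradients

    def weights(self):
        w = []
        for zi, ni in zip(self.z, self.n):
            if abs(zi) <= self.l1:
                w.append(0.0)  # L1 threshold keeps the weight exactly zero
            else:
                sign = -1.0 if zi < 0 else 1.0
                w.append(-(zi - sign * self.l1) /
                         ((self.beta + math.sqrt(ni)) / self.alpha + self.l2))
        return w

    def update(self, x, y):
        """One online step on example (x, y); returns the prediction."""
        w = self.weights()
        p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
        for i, xi in enumerate(x):
            g = (p - y) * xi
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w[i]
            self.n[i] += g * g
        return p

# Stream positive examples; the weight on the active feature turns positive.
model = FTRL(dim=1, alpha=0.5, l1=0.01, l2=0.0)
for _ in range(50):
    model.update([1.0], 1)
```

In Alink the same algorithm runs as streaming train/predict operators; the sketch only shows the core math.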

v1.2.0

3 years ago
  1. Adapt for Flink 1.11

    • Flink API calls (#129), Hive connectors (#130) and the Kafka connector (#129) are adapted for Flink 1.11.
    • Adjust FilePath of FileSystem for Flink 1.11 #131
  2. Add Factorization Machines classification and regression #115

  3. Support Lazy APIs for higher user interactivity and richer information. Lazy APIs enable intermediate outputs of the ML pipeline to be printed, collected, and post-processed alongside the main data flow. Such intermediate outputs include ML model and training information, evaluation metrics, data statistics, etc.

    • Supported in PyAlink
    • Support Lazy APIs for BatchOperators and related methods in EstimatorBase/TransformerBase #116
    • Add model information:
      • Linear model #118 #132
      • Tree model #125
      • PCA #117
      • ChisqSelector #117
      • VectorChisqSelector #117
      • KMeans #120
      • BisectingKMeans #120
      • NaiveBayes #122
      • Lda #122
      • GaussianMixture #120
      • OneHotEncoder #120
      • QuantileDiscretizer #120
      • MinMaxScaler #122
      • VectorMinMaxScaler #122
      • MaxAbsScaler #122
      • VectorMaxAbsScaler #122
      • StandardScaler #122
      • VectorStandardScaler #122
    • Add training information:
      • word2vec #125
    • Add statistics:
      • Correlation #117
      • Summary #117
    • Add EvaluationMetrics #124
  4. Add FileSystem APIs. #126 Using the FileSystem APIs, users can work with files on different file systems through a unified and friendly interface. Common operations include exists, isDir, list, read and write. Supported file systems are:

    • HDFS
    • OSS
    • Local
  5. Add Ak source/sink and Csv source/sink supporting the new FileSystem APIs. #126 Ak is a file format that stores data together with its schema and can be written to a file system, providing a compressed, tabular data representation. The supported APIs are shown in the table below:

    |            | HDFS | OSS | Local |
    | ---------- | ---- | --- | ----- |
    | Ak source  | ✔️   | ✔️  | ✔️    |
    | Ak sink    | ✔️   | ✔️  | ✔️    |
    | Csv source | ✔️   | ✔️  | ✔️    |
    | Csv sink   | ✔️   | ✔️  | ✔️    |
  6. Support EqualWidthDiscretizer. #123

  7. Feature Enhancements and API unification in Clustering. #121

  8. Refine code of QuantileDiscretizer and OneHotEncoder #111

  9. Fix predict stream op in alspredictstreamop.md #104
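The Lazy APIs in item 3 work by registering observers that only fire when the job executes, so a single pass over the data can serve printing, collecting, and post-processing at once. A minimal, library-free sketch of that mechanism (illustrative names, not PyAlink's actual API):

```python
class LazyOp:
    """Registers callbacks that fire only when the pipeline executes."""
    def __init__(self, compute):
        self._compute = compute
        self._callbacks = []

    def lazy_collect(self, callback):
        self._callbacks.append(callback)  # nothing runs yet
        return self

    def lazy_print(self, title):
        return self.lazy_collect(lambda rows: print(title, rows))

    def execute(self):
        rows = self._compute()            # single evaluation of the job
        for cb in self._callbacks:
            cb(rows)                      # every observer sees the same result
        return rows

op = LazyOp(lambda: [1, 2, 3])
collected = []
op.lazy_collect(collected.extend).lazy_print("result:")
op.execute()   # both the collect and the print fire here
```

Deferring the side effects is what lets one job execution feed many attached outputs instead of re-running the pipeline per request.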

v1.1.2

3 years ago
  1. Add transformers among the formats Vector, CSV, JSON, KV, Columns and Triple #93
    • Support AnyToAny transformation
    • Unified transformation params for ease of use
  2. Support SQL select statements in the Pipeline and LocalPredictor #61
    • Support Flink planner built-in functions on individual rows: comparison, logical, arithmetic, string, temporal, conditional, type conversion, hash, etc.
    • Add alink_shaded/shaded_protobuf_java to support usage of native Calcite
  3. Support Hive source and sink #96
    • Support batch/stream source and sink for Hive
    • Support table partitions
    • Simplify the Hive jar dependencies
    • Support multiple versions: 2.0, 2.1, 2.2, 2.3, 3.0
  4. Fix PyAlink starting and UDF issues on Windows #76, #77
  5. Support BigInteger type in MySql source #86
  6. Add open and close in mapper. #92
  7. Add open function in SegmentMapper and StopwordsRemoverMapper #94
  8. Unify HandleInvalid Params #95
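The format transformers in item 1 are essentially row-wise string parsers: each pair of formats gets a forward and inverse mapping over a row. A minimal sketch of the KV case and its inverse (delimiters and function names are assumptions for illustration, not Alink's API):

```python
def kv_to_columns(kv, cols, col_delim=",", kv_delim=":"):
    """Parse a 'k:v,k:v' string into a tuple ordered by `cols`."""
    pairs = dict(item.split(kv_delim, 1) for item in kv.split(col_delim))
    return tuple(pairs.get(c) for c in cols)

def columns_to_kv(row, cols, col_delim=",", kv_delim=":"):
    """Inverse transform: column values back to a KV string."""
    return col_delim.join(f"{c}{kv_delim}{v}" for c, v in zip(cols, row))

row = kv_to_columns("f0:1.0,f1:2.0", ["f0", "f1"])   # -> ("1.0", "2.0")
```

Composing any such forward mapping with another's inverse is what makes the AnyToAny transformation possible from a small set of per-format parsers.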

v1.1.1

4 years ago

Enhancements & New Features

  1. Optimize conversion between operators and DataFrames
  2. Auto-detect localIp when using useRemoteEnv
  3. Add enum type parameters #65
    • Adapt enum type params in quantile, distance and decision tree #67
    • Change linear model training params to enum #71
    • Add enum parameters to Kafka, StringIndexer and Join #72
    • Adapt enum type params in PCA, chi-square test, GLM and correlation #73
  4. Support window group-by in stream operators #68
  5. Add operators to parse strings in CSV, JSON and KV formats into columns #70
  6. Tokenizer supports string splitting on multiple spaces #69
  7. Make the error message clear when selected columns are not found #66
  8. Add an FTRL example #64

Fix & Refinements

  1. Fix dill version conflict
  2. Fix ALSExample error #33
  3. Fix bug of HasVectorSize alias #56
  4. Fix MySQL source error when using the collect method #45

v1.1.0

4 years ago

Enhancements & New Features

API change

  • Modify Naive Bayes algorithm as a text classifier. #47
  • Modify and enhance the parameters and models in QuantileDiscretizer, OneHotEncoder and Bucketizer. #48

Documentation

  • Update data links in docs and codes. #28
  • Update PyAlink install instructions. #8

Fix & Refinements

  • Fix the problem in the LDA online method and refine comments in FeatureLabelUtil. #29
  • Fix the bug that initial data of KMeansAssignCluster is not cleared. #31
  • Fix the int overflow bug in reading large CSV files, and add test cases for CsvFileInputSplit. See #27.
  • Clean up some code. #15
  • Remove a redundant test case whose data source is inaccessible. See #28.
  • Fix the NPE in PCA. See #42.

PyPI support

  • Support PyAlink installation via `pip install pyalink`

Maven Dependencies

Alink is now synchronized to the Maven Central repository, so it can easily be added to Maven projects. Pick the dependency set that matches your Flink version (1.10 or 1.9):

<!-- For Flink 1.10 -->
<dependency>
    <groupId>com.alibaba.alink</groupId>
    <artifactId>alink_core_flink-1.10_2.11</artifactId>
    <version>1.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-scala_2.11</artifactId>
    <version>1.10.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner_2.11</artifactId>
    <version>1.10.0</version>
</dependency>

<!-- For Flink 1.9 -->
<dependency>
    <groupId>com.alibaba.alink</groupId>
    <artifactId>alink_core_flink-1.9_2.11</artifactId>
    <version>1.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-scala_2.11</artifactId>
    <version>1.9.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner_2.11</artifactId>
    <version>1.9.0</version>
</dependency>