Flink Dynamic Storage

A unified stream and batch storage for building a streaming warehouse on Apache Flink.

What is Dynamic Storage

Flink Dynamic Storage is designed to be the best-fitting storage connector for Flink when building a streaming warehouse. To support fast updates and queries over large volumes of data, dynamic-storage uses a full LSM (Log-Structured Merge-Tree) structure. It consists of the following two components:

  • filestore: LSM plus columnar format. It manages snapshots on a DFS and needs no separate service.
  • logstore: Serves as the write-ahead log of the LSM and also provides change tracking (streaming reads). The implementation is abstract and uses Apache Kafka by default.
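
To make the LSM idea concrete, here is a minimal sketch in Python. It is not the project's implementation: the class name `TinyLsm`, the `flush_threshold` parameter, and all method names are hypothetical, and the columnar file format and DFS snapshots are not modeled. It only shows the general pattern of buffering writes, flushing them as sorted runs, and merging runs later.

```python
from bisect import bisect_left


class TinyLsm:
    """Toy LSM tree (hypothetical): a write buffer plus immutable sorted runs."""

    def __init__(self, flush_threshold=2):
        self.flush_threshold = flush_threshold
        self.memtable = {}  # newest writes, held in memory
        self.runs = []      # flushed sorted runs, oldest first

    def put(self, key, value):
        # Updates are cheap: they only touch the in-memory buffer.
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Persist the buffer as one sorted, immutable run.
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Point lookup: check the newest data first, then older runs.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            i = bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Merge all runs into one; newer values overwrite older ones.
        merged = {}
        for run in self.runs:
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []
```

Because each run is sorted, a lookup inside a run is a binary search, and compaction keeps the number of runs (and therefore the lookup cost) bounded.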

The filestore and logstore interact with and complement each other:

  • Writing: On failover, the logstore can be used to replay data into the filestore.
  • Reading: When a streaming read starts, the filestore is read first, followed by the logstore, which ensures data integrity.
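
The read and write interplay above can be sketched as follows. This is an illustration under assumptions, not the project's code: `Table`, `Snapshot`, `commit_snapshot`, and `streaming_read` are hypothetical names, a Python list stands in for the logstore, and the sketch assumes each snapshot records the log offset up to which it already contains data, which the README does not state.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    rows: dict       # key -> latest value as of this snapshot
    log_offset: int  # assumption: first log position NOT covered by the snapshot


@dataclass
class Table:
    # A plain list stands in for the logstore (Apache Kafka by default).
    log: list = field(default_factory=list)
    snapshot: Snapshot = field(default_factory=lambda: Snapshot({}, 0))

    def write(self, key, value):
        # Every change is appended to the log first (write-ahead log).
        self.log.append((key, value))

    def commit_snapshot(self):
        # Replay the log entries the last snapshot has not seen yet.
        # The same replay would rebuild filestore state after a failure.
        rows = dict(self.snapshot.rows)
        for key, value in self.log[self.snapshot.log_offset:]:
            rows[key] = value
        self.snapshot = Snapshot(rows, len(self.log))

    def streaming_read(self):
        # Phase 1: emit the snapshot (filestore).
        snap = self.snapshot
        for key, value in sorted(snap.rows.items()):
            yield ("snapshot", key, value)
        # Phase 2: continue from the log exactly where the snapshot ends,
        # so no change is skipped and none is emitted twice.
        for key, value in self.log[snap.log_offset:]:
            yield ("log", key, value)
```

The hand-off point between the two phases is what keeps the stream complete: changes written before the snapshot arrive through the snapshot, and everything after it arrives through the log.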

Dynamic-Storage supports the following kinds of queries:

  • Batch/OLAP Query: Reads a snapshot of the storage, allowing efficient queries over real-time data.
  • Streaming Query: Reads the changes of the storage with exactly-once consistency.
  • Lookup Query: The LSM structure supports point queries. If good performance is required, a certain amount of cache needs to be maintained on the query side.
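
The query-side cache mentioned for lookup queries could look like the following sketch. It is a generic LRU cache, not anything taken from the project: `CachedLookup`, `fetch`, and `capacity` are hypothetical names, and `fetch` stands in for whatever point query the storage would expose.

```python
from collections import OrderedDict


class CachedLookup:
    """Hypothetical query-side LRU cache in front of a point-lookup function."""

    def __init__(self, fetch, capacity=1024):
        self.fetch = fetch          # callable: key -> value (the real point query)
        self.capacity = capacity    # maximum number of cached keys
        self.cache = OrderedDict()  # least recently used entry comes first
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            # Cache hit: mark the key as most recently used.
            self.cache.move_to_end(key)
            return self.cache[key]
        # Cache miss: go to the storage, then remember the result.
        self.misses += 1
        value = self.fetch(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used key
        return value
```

A cache like this trades freshness for speed. Since the underlying table keeps receiving updates, a real deployment would also need some expiry or invalidation policy, which this sketch leaves out.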

Why Dynamic Storage

Dynamic-Storage is made for streaming data and is built to handle massive real-time updates. It is mainly oriented towards Flink SQL and aims to cover all Flink scenarios. The following comparisons illustrate where Dynamic-Storage fits:

  • Compared to Hive tables: Dynamic-Storage supports massive update changes in streaming data.
  • Compared to Hudi / Iceberg:
    • Dynamic-Storage uses LSM to support real-time updates at a lower cost. It is mainly oriented to real-time updates; if updates are few and infrequent, Hudi / Iceberg are the more suitable choice. Keeping data ordered also helps with data skipping during reads.
    • Dynamic-Storage provides real-time stream reading down to the millisecond level (thanks to the implementation behind the logstore).
    • Dynamic-Storage uses LSM to support point lookups.
  • Compared to ClickHouse:
    • Both use LSM/MergeTree plus columnar storage.
    • ClickHouse has an MPP architecture, whereas Dynamic-Storage is built on a DFS: it is serverless and low cost, and compute nodes share everything. Dynamic-Storage is designed for flexibility and ease of use. Without a cache on a server node, however, its OLAP performance is not as good as that of ClickHouse.

Development

If you use IntelliJ IDEA, you can refer to Flink's IntelliJ IDEA Setup.

Release

Dynamic-Storage is planned to be released with Flink 1.15, which is currently under development and not yet released.

README Source: flink-extended/flink-dynamic-storage