DingoDB Dingo Versions

A multi-modal vector database that supports upserts and vector queries using unified SQL (MySQL-Compatible) on structured and unstructured data, while meeting the requirements of high concurrency and ultra-low latency.

v0.8.0

1 month ago

Release Notes v0.8.0

Major New Features

1. Distributed Transaction

Distributed transactions provide the core ACID guarantees of a database, ensuring data integrity and reliability and broadening the range of applications DingoDB can serve.

Transaction-related interfaces are added to the Store layer/Index layer/Executor layer.
Provides garbage collection for distributed transaction data: transaction data that is completed and no longer needed is cleaned up to free storage space.
Transaction table creation: When creating a table, specify ENGINE=LSM_TXN to complete the creation.
Transaction commit methods:
- Explicit commit: Use the COMMIT command to complete the commit.
- Implicit commit: Certain SQL commands (BEGIN, START TRANSACTION, etc.) implicitly commit the current transaction.
- Auto commit: After INSERT/UPDATE/DELETE execution, the system automatically completes the commit.
Two transaction isolation levels: Read Committed and Repeatable Read.
Two transaction modes: Optimistic and Pessimistic.
Transaction locking mechanism: Provides table-level and row-level lock management. By locking tables/rows, it ensures transaction consistency and isolation, effectively avoiding data conflicts between concurrent transactions.
Deadlock detection mechanism: Supports periodic checking of lock resources and waiting relationships in the system to identify potential deadlock situations.
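
Deadlock detection of this kind is commonly modelled as a cycle search over a wait-for graph, where an edge from one transaction to another means the first is waiting for a lock the second holds. The sketch below illustrates that general technique only; it is not DingoDB's implementation, and the function name `find_deadlock` and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Optional, Set


def find_deadlock(wait_for: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle of transaction ids in the wait-for graph, or None.

    wait_for maps a transaction id to the transactions whose locks it waits on.
    (Hypothetical layout, chosen for illustration.)
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(txn: str) -> Optional[List[str]]:
        color[txn] = GREY
        path.append(txn)
        for holder in sorted(wait_for.get(txn, ())):
            state = color.get(holder, WHITE)
            if state == GREY:
                # holder is already on the path, so the path from holder onward is a cycle
                return path[path.index(holder):]
            if state == WHITE:
                cycle = visit(holder)
                if cycle is not None:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in sorted(wait_for):
        if color.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle is not None:
                return cycle
    return None
```

A periodic checker would build `wait_for` from the current lock table, call `find_deadlock`, and abort one transaction in the returned cycle; which victim to pick is a policy decision the sketch leaves open.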

2. Compute Pushdown

Refactored compute pushdown, optimizing the execution logic and improving query performance.
Supports expression pushdown: expressions are evaluated as part of the pushed-down execution to improve computational efficiency.
Supports Vector ScalarData operator pushdown: during approximate nearest neighbor vector search, scalar data is filtered so that only data meeting the specified conditions is selected.
The Python SDK introduces the Self Query feature, which filters vector data by its scalar data for queries that target specific vectors.

Product Feature Enhancements

1. Data Storage Layer

1.1 Architecture optimization

Added encapsulation for google::protobuf::Closure to facilitate request statistics and log tracking.
Refactored the RawRocksEngine class, which had become bloated, by splitting it into multiple files by functionality and adding support for multi-column family mode.
Refactored the StoreService/IndexService modules to unify the logic inside and outside the queue.
Refactored the Storage class by extracting the execution queue logic and placing it in the traffic control module.

1.2 Region Management (Merge & Split)

Optimized the region split strategy by introducing backward region splitting in addition to the existing strategy.
Added region merge functionality to the Store layer/Coordinator layer to dynamically adjust data and optimize storage space utilization.
Supported splitting in multi-column family mode, greatly improving scalability, performance, and reliability. Adopted a unified encoding format compatible with key encoding formats for distributed transactions.

1.3 Vector Indexing

For retrieval speed: added the IVF_FLAT vector index, which is based on inverted indexes and is suitable for high-dimensional sparse vector data. It provides fast retrieval and good retrieval performance.
For memory efficiency: added the IVF_PQ vector index, which is based on inverted indexes and product quantization. It is suitable for high-dimensional dense vector data and offers good search speed and low storage overhead.
For accuracy: added the BruteForce index, which is suitable for small-scale vector datasets or scenarios that require high search accuracy.
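
To make the trade-off between these index types concrete, here is a minimal pure-Python sketch of exact (brute-force) search next to an inverted-list (IVF_FLAT-style) search. It is illustrative only and not DingoDB's code: all function names are hypothetical, the centroids are supplied by the caller (a real index would train them, typically with k-means), and `nprobe` is the conventional name for the number of lists scanned.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_search(data: Dict[int, Vector], query: Vector, k: int) -> List[int]:
    """Exact search: score every stored vector and return the k nearest ids."""
    ranked = sorted(data, key=lambda vid: (l2(data[vid], query), vid))
    return ranked[:k]


def build_inverted_lists(data: Dict[int, Vector],
                         centroids: List[Vector]) -> Dict[int, List[int]]:
    """Assign each vector id to the inverted list of its nearest centroid."""
    lists: Dict[int, List[int]] = {i: [] for i in range(len(centroids))}
    for vid, vec in data.items():
        nearest = min(range(len(centroids)), key=lambda i: l2(vec, centroids[i]))
        lists[nearest].append(vid)
    return lists


def ivf_flat_search(data: Dict[int, Vector], centroids: List[Vector],
                    lists: Dict[int, List[int]], query: Vector,
                    k: int, nprobe: int) -> List[int]:
    """Approximate search: scan only the nprobe lists closest to the query."""
    probed = sorted(range(len(centroids)),
                    key=lambda i: l2(query, centroids[i]))[:nprobe]
    candidates = {vid: data[vid] for i in probed for vid in lists[i]}
    return brute_force_search(candidates, query, k)
```

With `nprobe` equal to the number of centroids the inverted-list search degenerates to the exact search; smaller values scan fewer vectors at the cost of possibly missing true neighbours. IVF_PQ additionally compresses the stored vectors, which this sketch does not attempt.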

1.4 Storage Engines

Added B+Tree engine to optimize database query performance.
Added XDP engine to achieve high-performance data processing.
Diversified storage engine support, allowing users to specify specific storage engines based on their actual business needs.

1.5 Snapshot Capability Upgrade

Upgraded VectorIndex to support multi-column family storage.
Snapshot supports multi-column family storage mode and is compatible with key encoding formats for distributed transactions.
Implemented Fake Snapshot to reduce I/O burden.
Supported BaikalDB-style save/load snapshot.

2. Executor Execution Layer

2.1 Data Types

Added Blob data type for storing binary data such as images, audio, videos, etc.

2.2 SQL Syntax

SQL layer provides batch data import and export.
Added vector distance calculation functions:
- Inner product distance: ipDistance
- Euclidean distance: l2Distance
- Cosine distance: cosineDistance
Support vector queries without functions, allowing vector queries even without vector indexes.
Chinese is supported in table creation, insertion, querying, updating, and deletion.
Distributed transaction-related parameters:
- Support transaction parameter settings at different levels: Global/Session.
- Timeout settings, supporting setting retry or blocking timeout, automatically rolling back after the timeout:
  - Lock_wait_timeout
  - Set [session | global] statement_timeout = timeout
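
For readers unfamiliar with the three distance measures named above, the following sketch gives their textbook definitions. It is not the implementation behind ipDistance, l2Distance, or cosineDistance, and the exact return conventions of those SQL functions (for example, whether the L2 value is squared, or whether the cosine function returns a similarity rather than one minus the similarity) are not stated in these notes; the Python function names are hypothetical.

```python
import math
from typing import Sequence


def _check_dims(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; larger means more similar."""
    _check_dims(a, b)
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line (L2) distance; smaller means more similar."""
    _check_dims(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus the cosine of the angle between a and b (assumed convention)."""
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - inner_product(a, b) / norm
```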

2.3 Module Refactoring

Refactored existing modules such as Store/Task/Job/Calcite/Client to support the new distributed transaction features.
Integrated the client side with the SQL execution layer to optimize the system architecture and reduce code redundancy.

3. SDK Layer

Added a C++ SDK, which allows integration tests against Dingo-store to be run independently.

4. Operations and Monitoring

Added a visual web monitoring interface that shows the real-time health status of the Store, Executor, and Coordinator components and provides cluster-wide monitoring information.

v0.7.0

7 months ago

Release Notes v0.7.0

1. Store Storage Layer

1.1 Distributed Storage

Provide the ability to manage IndexRegions, supporting dynamic creation and deletion of IndexRegions.
Add functionality for Raft Snapshot creation and installation for IndexRegions, which helps generate and load snapshot data for IndexRegions, enhancing system reliability and recovery capabilities.
Introduce the Build, Rebuild, and Load functions for VectorIndex to enable efficient creation, reconstruction, and loading of vector indexes, facilitating similarity search of vector data.
Enhance the management capability of IndexRegion for capacity expansion and contraction, enabling dynamic adjustment of index size to accommodate changes in data scale.
Support automatic splitting of VectorIndex/ScalarIndex Regions for region partitioning based on data load and distribution.
Introduce a mechanism to load indexes only on the leader (saving memory), by concentrating index loading and maintenance tasks on the leader node to reduce memory consumption on other nodes.

1.2 Vector Index

Provide the ability to manage vector indexes, including operations such as creation, deletion, and querying of vector indexes.
Offer diverse types of vector indexes, including HNSW, FLAT, IVF_FLAT, and IVF_PQ.
Support read and write operations for scalar data, enabling mixed storage and fusion analysis of multimodal data.
Enable top-N similarity search capability.
Allow precise lookup based on ID.
Provide the ability to perform batch queries based on specified offsets.
Support pre-filtering in vector search by passing a scalar key during VectorSearch operation.
Support post-filtering in vector search by passing a scalar key during VectorSearch operation.
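
The difference between pre-filtering and post-filtering can be shown with a small sketch. It assumes rows shaped as dictionaries with `id`, `vector`, and `scalar` fields and a caller-supplied predicate; these shapes and function names are hypothetical and do not reflect the VectorSearch request format.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(rows: List[Row], query: Sequence[float], k: int) -> List[Row]:
    """Return the k rows whose vectors are closest to the query."""
    return sorted(rows, key=lambda r: (squared_l2(r["vector"], query), r["id"]))[:k]


def pre_filter_search(rows: List[Row], query: Sequence[float], k: int,
                      predicate: Callable[[dict], bool]) -> List[int]:
    """Apply the scalar predicate first, then search only the survivors."""
    candidates = [r for r in rows if predicate(r["scalar"])]
    return [r["id"] for r in top_k(candidates, query, k)]


def post_filter_search(rows: List[Row], query: Sequence[float], k: int,
                       predicate: Callable[[dict], bool]) -> List[int]:
    """Search first, then drop hits that fail the scalar predicate."""
    hits = top_k(rows, query, k)
    return [r["id"] for r in hits if predicate(r["scalar"])]
```

Pre-filtering always returns up to k matching rows but has to evaluate the predicate over the whole candidate set; post-filtering searches once over everything and may return fewer than k rows when some of the nearest neighbours fail the predicate.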

1.3 Scalar Index

Support the creation of indexes on non-vector columns, providing more efficient query and retrieval capabilities for non-vector data.
Provide the ability to manage scalar indexes, including operations such as creation, deletion, and querying of scalar indexes.
Support LSM Tree-type ScalarIndex, using LSM Tree as the underlying storage structure to build ScalarIndex.

1.4 Distributed Lock

Implement the Lease mechanism for distributed locks, allowing clients to acquire, release, and maintain distributed locks by managing the lifecycle and renewal of leases.
Support MVCC (Multi-Version Concurrency Control) for key-value storage: The Coordinator stores all change records for each key-value pair and generates a globally unique revision for each change.
Provide a simple and efficient OneTimeWatch mechanism for event notification scenarios that only require triggering once.
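
The MVCC behaviour described above (every change recorded, each with a globally unique revision) can be sketched as a small in-memory store. This is an illustration of the idea only, not the Coordinator's data structure; the class and method names are hypothetical, and a delete is modelled as a tombstone value of `None`.

```python
from typing import Dict, List, Optional, Tuple


class VersionedKV:
    """Keeps every change per key, stamped with a store-wide revision."""

    def __init__(self) -> None:
        self._revision = 0
        self._history: Dict[str, List[Tuple[int, Optional[str]]]] = {}

    def put(self, key: str, value: Optional[str]) -> int:
        """Record a change and return the new global revision."""
        self._revision += 1
        self._history.setdefault(key, []).append((self._revision, value))
        return self._revision

    def delete(self, key: str) -> int:
        """Record a tombstone; older revisions of the key stay readable."""
        return self.put(key, None)

    def get(self, key: str, revision: Optional[int] = None) -> Optional[str]:
        """Return the value visible at `revision` (latest if omitted)."""
        at = self._revision if revision is None else revision
        visible: Optional[str] = None
        for rev, value in self._history.get(key, []):
            if rev > at:
                break
            visible = value
        return visible
```

Because revisions are shared across all keys, a reader holding a revision number sees a consistent snapshot of the whole store as of that change.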

2. Executor Execution Layer

2.1 Data Types

Extend the Float data type to support storing and processing high-dimensional data, as required for vector database workloads.

2.2 SQL Syntax

Extend the CREATE TABLE statement to support creating scalar tables and vector tables.
Add new vector index query functions for retrieving vector data.
Introduce functions for text and image vectorization, converting text and images into vector representations.

2.3 SQL Optimizer

Support mapping statistics to Calcite selectivity calculations to accurately estimate query costs and select the optimal execution plan, thereby improving query performance and efficiency.
Support different types of statistics: general statistics (e.g., Integer, Double, Float), cm_sketch, histograms, and Calcite's default calculation for all types.
Introduce the ANALYZE TABLE command to collect statistics information, notifying the optimizer to collect and update statistics for specified tables.
Provide a custom CostFactory to implement RelOptCost, redefine interfaces such as isLe, isLt, multiply, plus, and minus.
Rewrite Dingo TableScan cost calculation.
Modify DingoLikeScan selectivity estimateRowCount calculation method.
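
Assuming that `cm_sketch` above refers to a Count-Min sketch, the following minimal version shows how such a structure yields the frequency estimates that feed selectivity calculations. It is not DingoDB's statistics code; the class name, the default `depth` and `width`, and the use of SHA-256 for hashing are illustrative choices.

```python
import hashlib
from typing import List


class CountMinSketch:
    """Fixed-size frequency table; estimates never undercount."""

    def __init__(self, depth: int = 4, width: int = 256) -> None:
        self.depth = depth
        self.width = width
        self.total = 0
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]

    def _column(self, row: int, value: object) -> int:
        # One independent-looking hash per row, derived by salting with the row number.
        digest = hashlib.sha256(f"{row}:{value!r}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, value: object, count: int = 1) -> None:
        self.total += count
        for row in range(self.depth):
            self.table[row][self._column(row, value)] += count

    def estimate(self, value: object) -> int:
        """Minimum over rows: collisions can only inflate a cell, never shrink it."""
        return min(self.table[row][self._column(row, value)]
                   for row in range(self.depth))

    def selectivity(self, value: object) -> float:
        """Estimated fraction of rows equal to `value`."""
        return self.estimate(value) / self.total if self.total else 0.0
```

An optimizer could multiply `selectivity(value)` by the table's row count to estimate how many rows an equality predicate returns; histograms play the same role for range predicates.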

2.4 Pushdown computation

Optimize the C++ layer serialization and deserialization logic by reducing the number of deserialized columns, shortening the deserialization time.
Add serialization and deserialization for List data type.
Optimize the C++ expressions to improve computation efficiency.
Support pushdown execution plan with a prefix selection to apply the query conditions to the data source as early as possible, reducing the number of rows that need to be read and processed.

2.5 Partitioning strategy

Add a Hash-range partitioning strategy, which uses hashing to reduce data skew and distribute data evenly.
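
One common way to combine hashing with range partitioning is to hash each key into a fixed number of buckets and then assign contiguous bucket ranges to partitions. The sketch below shows that scheme under stated assumptions; the notes do not describe how DingoDB's Hash-range strategy is actually laid out, and the function names, the bucket count, and the use of SHA-256 are hypothetical.

```python
import bisect
import hashlib
from typing import List


def hash_bucket(key: str, buckets: int) -> int:
    """Map a key to a bucket in [0, buckets) using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def partition_of_bucket(bucket: int, upper_bounds: List[int]) -> int:
    """upper_bounds are sorted, exclusive limits: partition i holds buckets below upper_bounds[i]."""
    return bisect.bisect_right(upper_bounds, bucket)


def partition_for(key: str, buckets: int, upper_bounds: List[int]) -> int:
    """Hash first (to spread skewed keys), then look up the owning range."""
    return partition_of_bucket(hash_bucket(key, buckets), upper_bounds)
```

Hashing spreads sequential or skewed keys evenly across buckets, while the range layer keeps the ability to split or move a partition by adjusting `upper_bounds` without rehashing every key.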

3. SDK Layer

3.1 Python SDK

Add a Python SDK client for communication with the server.
Provide Python SDK functionality for Index operations.
Support join operations in Python SDK.
Use the pip package management tool to publish the Python SDK, improving its usability, maintainability, and portability.
DingoDB-Python supports data serialization and deserialization using Proto.

3.2 Java SDK

Provide Java SDK functionality for Index operations.
Provide distance measurement API for vector modules.
Offer partitioning strategy based on Index, distributing data to different partitions based on the range of data index values, facilitating the proper configuration of partitioned data.
DingoClient provides the ability to merge multiple partitions into one, simplifying the merging process and improving data management efficiency.
Provide an index encoding mechanism based on AutoIncrement, automatically assigning a unique identifier to each new record to ensure that each record has a unique identifier.

4. Knowledge Assistant Support

Successfully integrated with the LangChain framework.
Added support for cosine similarity queries, expanding the vector index query capabilities to include cosine similarity queries. This is useful for retrieving data such as text and images.
Added a count interface to calculate the number of records in a data collection.
Added a scan interface for scanning data collections while also satisfying scalar-based data filtering operations.

v0.6.0

10 months ago

Release Notes v0.6.0

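1 Architecture Layer

1.1 Separation of Storage and Compute

1. Compute engine (Executor): Accepts SQL over the MySQL protocol and DingoDB's own protocol, performs SQL parsing, generates logical and execution plans, and connects to the underlying Store layer.
2. Distributed storage engine (Store): An efficient distributed store written in C++. The storage layer is divided into metadata storage and data storage, and is designed to be extensible so that multiple storage engines can be plugged in, such as RocksDB, memory, and xdp-rocks.
3. Compute pushdown: To get more value from aggregation and filtering and to improve compute efficiency, the storage layer implements compute pushdown for operations such as filter, count, sum, min, and max.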

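1.2 Raft Upgrade

1. Provides a leader election mechanism that supports multi-node election.
2. Provides log replication, which ensures system reliability and effectively prevents data loss.
3. Provides a high-performance Raft implementation that uses multithreading and asynchronous I/O to improve throughput and response time.
4. Provides a snapshot mechanism for restoring the state of the state machine. Snapshots reduce the size of the log, which improves performance, and allow the state machine to be recovered quickly after a node failure.
5. Provides cluster scale-out, scale-in, and migration capabilities, making it easier to add or remove nodes without affecting the stability and consistency of the system.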

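1.3 MySQL Protocol Support at the Protocol Layer

1. Supports interactive command-line tools such as MySQL Shell for efficiently managing and operating the database; SSL encryption is also supported to keep the database secure.
2. Supports the MySQL JDBC Driver, so that Java applications can connect to and operate the database through the JDBC API.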

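1.4 Cluster Operations and Monitoring

1. Provides visual monitoring through Grafana and HTTP monitoring, covering cluster nodes (disk, CPU, I/O, etc.), tables (partitions, Regions), Regions, and Raft groups.
2. Provides multiple deployment options: single node, docker-compose, and multi-node deployment with Ansible.
3. Provides online scale-out and scale-in procedures for the cluster.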

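2 Functional Layer

2.1 Common and Basic Modules

1. Supports manually adjusting the log level, so that log verbosity can be controlled flexibly according to the scenario, reducing log file size and storage cost.
2. Optimized error codes for Store and Dingo Client.
3. Supports data serialization in C++: serialized data is parsed according to the format used during serialization and restored to the original data.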

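2.2 Raft Management and Distributed Storage

1. Provides a snapshot mechanism for restoring the state of the state machine.
2. Supports Region Split: when a Region exceeds the maximum size limit, the system automatically splits it into multiple Regions, keeping Region sizes similar, which helps scheduling decisions.
3. Supports Region Merge: when a Region shrinks because of a large number of deletions, the system merges two small adjacent Regions.
4. Optimized the Range validation rules, improving code execution efficiency and thereby shortening query time, with a significant performance gain.
5. Supports flexible configuration of the number of dingo-store service threads; users can adjust the thread count to suit their scenario.
6. Supports specifying failpoints while the service is running, which makes corner cases easier to test.
7. Supports management of Store and Region metrics.
8. Supports operator pushdown: the storage layer provides basic operators, and DingoClient acts as the bridge in between, supporting both SDK and SQL scenarios. The operation types are:
    - SUM
    - SUM0
    - COUNT
    - MAX
    - MIN
    - COUNTWITHNULL
9. Supports auto-increment IDs: when a table is created with an auto-increment column, DingoDB automatically assigns a unique integer value to each row inserted into the table. A distributed sequence generator ensures that auto-increment values are unique across the whole cluster.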
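
The pushed-down aggregation operators in item 8 differ mainly in how they treat NULL values and empty input. The sketch below spells out one plausible reading, which is an assumption rather than something these notes state: SUM, MAX, and MIN ignore NULLs and return NULL when no non-NULL value exists, SUM0 returns 0 in that case (as Calcite's `$SUM0` does), COUNT counts non-NULL values, and COUNTWITHNULL counts every row. The function name `aggregate` is hypothetical.

```python
from typing import List, Optional


def aggregate(op: str, values: List[Optional[float]]) -> Optional[float]:
    """Evaluate one aggregation operator over a column; None stands for NULL."""
    present = [v for v in values if v is not None]
    if op == "COUNT":
        return len(present)        # non-NULL values only (assumed)
    if op == "COUNTWITHNULL":
        return len(values)         # every row, NULL or not (assumed)
    if op == "SUM0":
        return sum(present)        # 0 when nothing is present (assumed)
    if op not in ("SUM", "MAX", "MIN"):
        raise ValueError(f"unknown operator: {op}")
    if not present:
        return None                # NULL result for empty or all-NULL input
    return {"SUM": sum, "MAX": max, "MIN": min}[op](present)
```

Running these operators inside the storage layer means only one value per Region travels back to the client, instead of every row in the scanned range.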

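2.3 SQL Protocol Layer

1. Refactored the Executor architecture: the Executor is responsible for computation, parsing and responding to SQL requests and other management requests from clients.
2. Compatible with the MySQL protocol.
3. Upgraded Calcite, improving execution efficiency on the SQL side.
4. Table-level metric collection.
5. Added a task response mechanism (STOP/READY/QUIT) to the network transport layer.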

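2.4 SQL Syntax Extensions

1. Extended table creation to specify the number of replicas and partitioning information, along with related auxiliary information.
2. Added Region splitting through SQL for managing data distribution, making it more flexible and easier to use.
3. Extended MySQL protocol-related syntax:
  - Viewing global/user/session variables
  - Setting global/user/session variables
  - Viewing table structure and information about specified columns
  - Viewing the creation statements of tables and users
  - Setting the mysql-driver idle timeout
  - SQL prepared statements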

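2.5 Java SDK Layer

The SDK is a set of software tools for developers. It operates on the database through dedicated API interfaces, letting developers perform database operations more flexibly and efficiently, lowering the learning curve, and greatly improving development efficiency. The DingoDB SDK layer supports the following features:

1. Connecting to the cluster through DingoDB's own API
2. Table operations (create/drop)
3. Single-record operations (get/insert/delete/update)
4. Batch operations (get/insert/delete/update)
5. Aggregation after range filtering:
    - SUM
    - SUM0
    - COUNT
    - MAX
    - MIN
    - COUNTWITHNULL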

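2.6 DevOps Layer

Visual system monitoring

Node monitoring helps users observe changes in server node status more effectively.
System process monitoring helps users detect abnormal processes promptly and respond in time.

New system operations tools

Supports multi-node deployment using the Ansible automation tool, which achieves batch deployment through batch system configuration, program deployment, and command execution.

New DBA-level system management tools

Supports leader migration: the leader of a group can be switched to another follower node in the same group, for load balancing or for restarting a machine in an emergency.
Supports Region split/merge: the system automatically splits or merges Regions to keep Region sizes similar and achieve load balancing.
Supports scaling nodes out and in: users can decide whether to add or remove nodes based on the actual data distribution, achieving load balancing.
Visual Schema/Table/Region management: monitor Schema/Table/Region information effectively through visual tools.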

dingo-v0.5.0

1 year ago

Release Note - V0.5.0

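1. SQL Features

Support fuzzy queries with the LIKE keyword
Support user authentication: create, delete, update, and query users
Support granting user privileges
Support cluster authentication
Support SQL batch insert
Optimized the Calcite function validation mechanism
Refactored error code messages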

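2. Metadata Management

Moved table-level cluster management into the executor
Deprecated the original Dingo-jraft module
Migrated the Coordinator from Dingo-jraft to Dingo-mpu
Support SQL-based queries on metadata tables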

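3. Indexes

Support creating, deleting, updating, and querying indexes to improve query performance
Support multiple index types: non-primary-key indexes and composite indexes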

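4. SDK Features

Support computation based on chained expressions, enabling aggregation, updates, and other operations after various range lookups
Support scanning and filtering on non-primary-key columns
List of metric computation features:

No	Function	Description
1	Scan	Scan data in a table
2	Get	Read data from a table
3	Filter	Filter data by condition
4	Add	Add a numeric value to a column
5	Put	Write data to a table
6	Update	Modify data in a table
7	Delete	Delete data from a table
8	DeleteRange	Delete a range of data from a table
9	Max	Maximum of the column and the input
10	Min	Minimum of the column and the input
11	Avg	Average of the column and the input
12	Sum	Sum of the column and the input
13	Count	Count the number of records
14	SortList	Sort the input values and the stored values by numeric value, ascending by default
15	DistinctList	Deduplicate the input values and the stored values, recording each repeated value only once
16	List	Return a list built from the input values and the stored values according to a condition
17	IncreaseCount	Number of increases: counts the adjacent pairs in the sequence where the value increases
18	DecreaseCount	Number of decreases: counts the adjacent pairs in the sequence where the value decreases
19	maxIncreaseCount	Maximum increase: the largest number of increases within any single run of consecutive increases in the sequence
20	maxDecreaseCount	Maximum decrease: the largest number of decreases within any single run of consecutive decreases in the sequence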
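
The last four entries of the table are easiest to understand on a concrete sequence. The sketch below encodes one reading of their descriptions, which is an assumption: the count functions tally adjacent pairs that strictly rise or fall, and the max functions return the length of the longest unbroken run of such pairs. The Python names are hypothetical and are not SDK calls.

```python
from typing import Sequence


def increase_count(seq: Sequence[float]) -> int:
    """Number of adjacent pairs where the later value is strictly larger."""
    return sum(1 for a, b in zip(seq, seq[1:]) if b > a)


def decrease_count(seq: Sequence[float]) -> int:
    """Number of adjacent pairs where the later value is strictly smaller."""
    return sum(1 for a, b in zip(seq, seq[1:]) if b < a)


def max_increase_count(seq: Sequence[float]) -> int:
    """Longest run of consecutive increases, measured in increases."""
    best = run = 0
    for a, b in zip(seq, seq[1:]):
        run = run + 1 if b > a else 0
        best = max(best, run)
    return best


def max_decrease_count(seq: Sequence[float]) -> int:
    """Longest run of consecutive decreases, measured in decreases."""
    return max_increase_count([-x for x in seq])
```

For the sequence 1, 2, 3, 2, 5, 6, 7, 8, 1 there are six increases and two decreases; the longest rising run (2 to 8) contains four increases, and no falling run is longer than one step.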

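5. Columnar Storage

Support columnar storage based on Merge Tree

6. Distributed Storage

Fixed slow release of disk space after RocksDB update/delete
Optimized Prefix Scan
Upgraded the RocksDB version
Optimized the RocksDB I/O path
Release disk space after DeleteRange executes
Made fixed RocksDB parameters configurable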

dingo-v0.4.1

1 year ago

1. SQL Features and Optimizations

1.1 SQL Features

1.1.1 Extended SQL Syntax

Support specifying a TTL through table options when creating a table
Support assigning partitions when creating a table

1.1.2 Complex Data Types

Support operations on MAP
Support operations on MultiSet
Support operations on Array

1.1.3 Support using variables in SQL statements such as INSERT, SELECT, and DELETE

1.1.4 Support a strategy to control the messages transmitted between operators in an execution plan

1.1.5 New SQL functions

No	Function Name	Description
1	pow(x,y)	The POW() function returns the value of a number raised to the power of another number
2	round(x,y)	The ROUND() function rounds a number to a specified number of decimal places
3	ceiling(x)	The CEILING() function returns the smallest integer value that is greater than or equal to a number
4	floor(x)	The FLOOR() function returns the largest integer value that is less than or equal to a number
5	mod(x,y)	The MOD() function returns the remainder of a number divided by another number
6	abs(x)	The ABS() function returns the absolute (positive) value of a number

1.2 SQL Optimizations

Optimized queries that use range filters
Optimized range scan queries
Optimized Dingo's internal type system
Optimized SQL date/time/timestamp functions

2. Key-Value Operations

2.1 Equivalence of Key-Value and SQL operations

Support table operations through the Key-Value API, such as create table and drop table
Support inserting, updating, and deleting records in a table through the Key-Value API
Support table operations through the Annotation API
Table and record operations are equivalent between the Key-Value API and SQL

2.2 Key-Value operation lists

2.2.1 Basic Key-Value Operation

No	Function Name	Description
1	put	insert or update records in a table
2	get	query records by user key
3	delete	delete records by user key

2.2.2 Numerical operations

No	Function Name	Description
1	add	add values of the same data type
2	sum	calculate the sum of columns filtered by keys
3	max	calculate the max of columns filtered by keys
4	min	calculate the min of columns filtered by keys

2.2.3 Compound operation

No	Function Name	Description
1	Operate	perform multiple operations on a single record; the operation list can contain numerical operations or basic operations
2	OperateList	perform multiple operations on a single record
3	UDF	user-defined function implemented as a Lua script

2.2.4 Collection operations

No	Type	Function Name	Description
1	read	size	get the number of elements
2	read	get_all	get all the elements of the collection
3	read	get_by_key	get all the elements of the collection that match the input key
4	read	get_by_value	get all the elements of the collection that match the input value
5	read	get_by_index_range	get all the elements of the collection within an index range
6	write	put	append an element to the end
7	write	clear	clear all the elements of the collection
8	write	remove_by_key	remove the key from the collection
9	write	remove_all_by_value	remove all records that match the value
10	write	remove_by_index	remove a record by index

2.2.5 Filter operations

DateFilter

Query records using a range filter on a Date type.

NumberRange

Query records using a range filter on a Numeric type.

StringRange

Query records using a range filter on a String type.

ValueEquals

Query records with a specified record value.

3. Storage Optimizations

3.1 Distributed Consistency Protocol

Reimplemented the Raft protocol to replace sofa-jraft
Refactored the implementation of log replication and leader election
Support new serialization for keys and values

3.2 RocksDB Improvements

RocksDB can load its configuration from files
Support TTL using user timestamps
Updated the RocksDB version and released the package about io.dingodb. on Maven Central

4. Other features

Support JDBC connection parameters such as timeout
Support EXPLAIN to view the plan of a Dingo SQL statement
Support releasing the related packages to Maven Central

No	Module	Description
1	dingo-driver-client	the JDBC driver client used by SQL
2	dingo-sdk	the key-value SDK client for key-value operations
3	dingo-rocksdb	extended features on RocksDB

dingo-v0.3.0

1 year ago

1. Semantics and Functions of SQL

1.1 New data types

Boolean
Date: default format yyyy-MM-dd
Time: default format HH:mm:ss
Timestamp: default format yyyy-MM-dd HH:mm:ss.SSS

1.2 Allow assigning a default value to a column, either a constant or an internal function

1.3 Support Join operations

Inner Join
Left Join
Right Join
Full Join
Cross Join

1.4 String functions

No	Function Name	Description
1	Concat	Adds two or more expressions together
2	Format	Formats a number to a format like "#,###,###.##", rounded to a specified number of decimal places
3	Locate	The LOCATE() function returns the position of the first occurrence of a substring in a string
4	Lower	Converts a string to lower-case
5	Lcase	Converts a string to lower-case
6	Upper	Converts a string to upper-case
7	Ucase	Converts a string to upper-case
8	Left	Extracts a number of characters from a string (starting from left)
9	Right	Extracts a number of characters from a string (starting from right)
10	Repeat	Repeats a string as many times as specified
11	Replace	Replaces all occurrences of a substring within a string, with a new substring
12	Trim	Removes leading and trailing spaces from a string
13	Ltrim	Removes leading spaces from a string
14	Rtrim	Removes trailing spaces from a string
15	Mid	Extracts a substring from a string (starting at any position)
16	Substring	Extracts a substring from a string (starting at any position)
17	Reverse	Reverses a string and returns the result

1.5 Date and time functions

No	Function Name	Description
1	Now	Returns the current date and time
2	CurrentDate	Returns the current date
3	Current_date	Returns the current date
4	CurTime	Returns the current time
5	Current_time	Returns the current time
6	Current_timestamp	Returns the current date and time
7	From_UnixTime	Converts a Unix time to a timestamp
8	Unix_Timestamp	Converts a time to a Unix timestamp
9	Date_Format	Formats a date
10	DateDiff	Returns the number of days between two date values
11	Time_Format	Formats a time by a specified format
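
The conversions in this table, together with the default timestamp format from section 1.1, can be mirrored with Python's standard library. The sketch is only an analogy for what the SQL functions compute: the Python function names are hypothetical, UTC is assumed throughout, and the argument order of `datediff` (end first, then start) follows MySQL's DATEDIFF rather than anything stated in these notes.

```python
from datetime import date, datetime, timezone


def from_unixtime(seconds: float) -> datetime:
    """Unix time (seconds since 1970-01-01 UTC) to a timestamp."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def unix_timestamp(moment: datetime) -> int:
    """Timezone-aware timestamp to whole seconds since the Unix epoch."""
    return int(moment.timestamp())


def datediff(end: str, start: str) -> int:
    """Days from `start` to `end`, both given as yyyy-MM-dd strings."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days


def format_timestamp(moment: datetime) -> str:
    """Render as yyyy-MM-dd HH:mm:ss.SSS (millisecond precision)."""
    return moment.strftime("%Y-%m-%d %H:%M:%S.") + f"{moment.microsecond // 1000:03d}"
```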

2. Management of Replicator

2.1 Management of metadata

A physical table can be split into N partitions based on data size
Management of physical table metadata such as table creation time, table status, partition strategy, and split conditions

2.2 Scheduler of partition replicator

Support multiple partition modes, such as one table with one partition or one table with multiple partitions
Support multiple split strategies, such as auto-split or manual split through the API
Support resource isolation between physical tables

2.3 Tools of partition management

Support viewing partition status, such as leader and follower
Support migrating and splitting partitions through the internal API
Support viewing partition metrics, such as read and write latency, size, and record count

3. Data access methods for DingoDB

3.1 JDBC mode

Support connecting to Dingo through JDBC

3.2 SDK client mode

Support putting, getting, and deleting records in Dingo tables
Support batch writing records to Dingo tables

3.3 Import data from external sources

Support importing data from local files in CSV or JSON format
Support importing data from Kafka in JSON or Avro format

4. Tools and Monitoring

Support monitoring the Dingo cluster with Grafana and Prometheus
Support managing the partitions of the cluster through the API
Support adjusting the log level dynamically with tools
Support deploying the cluster with Ansible or docker-compose
Added more than 1,300 automated tests
