Dingodb Dingo Versions

A multi-modal vector database that supports upserts and vector queries using unified SQL (MySQL-Compatible) on structured and unstructured data, while meeting the requirements of high concurrency and ultra-low latency.

v0.8.0

1 month ago

Release Notes v0.8.0

Major New Features

1. Distributed Transaction

Distributed transactions provide the core ACID guarantees of the database, ensuring its integrity and reliability and broadening the range of applications it can serve.

  • Transaction-related interfaces are added to the Store layer/Index layer/Executor layer.
  • Provides garbage collection for distributed transaction data: transaction data that is completed and no longer needed is cleaned up to free storage space.
  • Transaction table creation: When creating a table, specify ENGINE=LSM_TXN to complete the creation.
  • Transaction commit methods:
    • Explicit commit: Use the COMMIT command to complete the commit.
    • Implicit commit: Issuing certain SQL commands (BEGIN, START TRANSACTION, etc.) indirectly commits the current transaction.
    • Auto commit: After INSERT/UPDATE/DELETE execution, the system automatically completes the commit.
  • Transaction isolation levels: Read Committed and Repeatable Read.
  • Two transaction modes: Optimistic and Pessimistic.
  • Transaction locking mechanism: Provides table-level and row-level lock management. Locking tables or rows ensures transaction consistency and isolation and avoids data conflicts between concurrent transactions.
  • Deadlock detection mechanism: Supports periodic checking of lock resources and waiting relationships in the system to identify potential deadlock situations.
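
As a rough illustration of the deadlock detection bullet, a periodic check can be modelled as a cycle search over a wait-for graph. The sketch below is a generic textbook approach, not DingoDB's implementation; the function name `find_deadlock` and the dictionary layout are assumptions made for this example.

```python
from typing import Dict, List, Optional, Set


def find_deadlock(waits_for: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle of transaction ids in the wait-for graph, or None.

    waits_for[t] is the set of transactions holding locks that t is waiting on.
    (Hypothetical layout, chosen only for this sketch.)
    """
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on current path / finished
    color = {txn: WHITE for txn in waits_for}
    path: List[str] = []

    def visit(txn: str) -> Optional[List[str]]:
        color[txn] = GREY
        path.append(txn)
        for holder in sorted(waits_for.get(txn, ())):
            state = color.get(holder, WHITE)
            if state == GREY:              # back edge: holder is on the current path
                return path[path.index(holder):]
            if state == WHITE:
                cycle = visit(holder)
                if cycle:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in sorted(waits_for):
        if color[txn] == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None
```

A detector of this kind would typically pick one transaction from the returned cycle as the victim and roll it back; which one to pick is a policy choice that the release notes do not specify.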

2. Compute Pushdown

  • Refactored compute pushdown, optimizing code execution logic and improving data query performance.
  • Supports expression pushdown: expressions are evaluated as part of the pushed-down computation to improve computational efficiency.
  • Supports Vector ScalarData operator pushdown: during vector approximate nearest neighbor search, scalar data is filtered so that only data meeting specific conditions is selected.
  • The Python SDK introduces the Self Query feature, which filters vector data by its Scalar Data for queries that target specific vectors.

Product Feature Enhancements

1. Data Storage Layer

1.1 Architecture optimization

  • Added encapsulation for google::protobuf::Closure to facilitate request statistics and log tracking.
  • Refactored the RawRocksEngine class by splitting it into multiple files based on functionality and adding support for multi-column family mode, which addresses the bloat of the previous RawRocksEngine.
  • Refactored the StoreService/IndexService modules to unify the logic inside and outside the queue.
  • Refactored the Storage class by extracting the execution queue logic and placing it in the traffic control module.

1.2 Region Management (Merge & Split)

  • Optimized the region split strategy by introducing backward region splitting in addition to the existing strategy.
  • Added region merge functionality to the Store layer/Coordinator layer to dynamically adjust data and optimize storage space utilization.
  • Supported splitting in multi-column family mode, greatly improving scalability, performance, and reliability. Adopted a unified encoding format compatible with key encoding formats for distributed transactions.

1.3 Vector Indexing

  • For retrieval speed: added IVF_FLAT, a vector index based on inverted indexes. It is suitable for high-dimensional sparse vector data and provides fast retrieval and good retrieval performance.
  • For memory efficiency: added IVF_PQ, a vector index based on inverted indexes and product quantization. It is suitable for high-dimensional dense vector data and offers good search speed and low storage overhead.
  • For accuracy: added the BruteForce index, which is suitable for small-scale vector datasets or scenarios that require high search accuracy.
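
To make the trade-off between the inverted-index and brute-force approaches concrete, here is a minimal pure-Python sketch. It is not DingoDB code: the class `IvfFlatIndex`, its `nprobe` parameter, and the assumption that centroids are supplied up front (a real index would train them, for example with k-means) are all illustrative.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def brute_force_search(vectors: Dict[int, Vector], query: Vector, k: int) -> List[int]:
    """Exact search: compare the query against every stored vector."""
    ranked = sorted(vectors.items(), key=lambda item: (math.dist(item[1], query), item[0]))
    return [vid for vid, _ in ranked[:k]]


class IvfFlatIndex:
    """Inverted-file index sketch: each vector is stored in the list of its nearest centroid."""

    def __init__(self, centroids: List[Vector]) -> None:
        self.centroids = centroids                       # assumed to be pre-trained
        self.lists: List[Dict[int, Vector]] = [{} for _ in centroids]

    def _nearest_centroids(self, vector: Vector, n: int) -> List[int]:
        order = sorted(range(len(self.centroids)),
                       key=lambda i: math.dist(self.centroids[i], vector))
        return order[:n]

    def add(self, vid: int, vector: Vector) -> None:
        self.lists[self._nearest_centroids(vector, 1)[0]][vid] = vector

    def search(self, query: Vector, k: int, nprobe: int = 1) -> List[int]:
        """Approximate search: scan only the nprobe inverted lists closest to the query."""
        candidates: Dict[int, Vector] = {}
        for i in self._nearest_centroids(query, nprobe):
            candidates.update(self.lists[i])
        return brute_force_search(candidates, query, k)
```

Probing fewer lists scans fewer vectors but can miss true neighbours that fell into an unprobed list, which is why the exact BruteForce index remains the choice when accuracy matters most.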

1.4 Storage Engines

  • Added B+Tree engine to optimize database query performance.
  • Added XDP engine to achieve high-performance data processing.
  • Diversified storage engine support, allowing users to specify specific storage engines based on their actual business needs.

1.5 Snapshot Capability Upgrade

  • Upgraded VectorIndex to support multi-column family storage.
  • Snapshot supports multi-column family storage mode and is compatible with key encoding formats for distributed transactions.
  • Implemented Fake Snapshot to reduce I/O burden.
  • Supported BaikalDB-style save/load snapshot.

2. Executor Execution Layer

2.1 Data Types

  • Added Blob data type for storing binary data such as images, audio, videos, etc.

2.2 SQL Syntax

  • SQL layer provides batch data import and export.
  • Added vector distance calculation functions:
    • Inner product distance: ipDistance
    • Euclidean distance: l2Distance
    • Cosine distance: cosineDistance
  • Support vector queries without functions, allowing vector queries even without vector indexes.
  • Chinese is supported in table creation, insertion, querying, updating, and deletion.
  • Distributed transaction-related parameters:
    • Support transaction parameter settings at different levels: Global/Session.
    • Timeout settings, supporting setting retry or blocking timeout, automatically rolling back after the timeout:
      • Lock_wait_timeout
      • Set [session | global] statement_timeout = timeout
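
For reference, the three distance measures named above (ipDistance, l2Distance, cosineDistance) can be written out as follows. These are the standard mathematical definitions, given as a sketch: the Python function names are hypothetical, and the exact values returned by the SQL functions (for example, whether the L2 result is squared, or whether the cosine function returns similarity rather than 1 minus similarity) are not stated in the release notes.

```python
import math
from typing import Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (not squared, by assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity (assumed convention); undefined for zero vectors."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)
```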

2.3 Module Refactoring

  • Refactored existing modules such as Store/Task/Job/Calcite/Client to support the distributed transaction features of this version.
  • Integrated the client side with the SQL execution layer to optimize the system architecture and reduce code redundancy.

3. SDK Layer

  • Added a C++ SDK, which makes it possible to run integration tests against Dingo-store independently.

4. Operations and Monitoring

  • Added a visual web monitoring interface that shows the real-time health of the Store, Executor, and Coordinator components and provides cluster-wide monitoring information.

v0.7.0

7 months ago

Release Notes v0.7.0

1. Store Storage Layer

1.1 Distributed Storage

  • Provide the ability to manage IndexRegions, supporting dynamic creation and deletion of IndexRegions.
  • Add functionality for Raft Snapshot creation and installation for IndexRegions, which helps generate and load snapshot data for IndexRegions, enhancing system reliability and recovery capabilities.
  • Introduce the Build, Rebuild, and Load functions for VectorIndex to enable efficient creation, reconstruction, and loading of vector indexes, facilitating similarity search of vector data.
  • Enhance the management capability of IndexRegion for capacity expansion and contraction, enabling dynamic adjustment of index size to accommodate changes in data scale.
  • Support automatic splitting of VectorIndex/ScalarIndex Regions for region partitioning based on data load and distribution.
  • Introduce a mechanism to load indexes only on the leader (saving memory), by concentrating index loading and maintenance tasks on the leader node to reduce memory consumption on other nodes.

1.2 Vector Index

  • Provide the ability to manage vector indexes, including operations such as creation, deletion, and querying of vector indexes.
  • Offer diverse types of vector indexes, including HNSW, FLAT, IVF_FLAT, and IVF_PQ.
  • Support read and write operations for scalar data, enabling mixed storage and fusion analysis of multimodal data.
  • Enable top-N similarity search capability.
  • Allow precise lookup based on ID.
  • Provide the ability to perform batch queries based on specified offsets.
  • Support pre-filtering in vector search by passing a scalar key during VectorSearch operation.
  • Support post-filtering in vector search by passing a scalar key during VectorSearch operation.
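
The difference between the two filtering modes is easiest to see side by side. The sketch below uses a plain in-memory list rather than the DingoDB API; the row layout (`id`, `vector`, `scalar`) and the function names are assumptions for illustration only.

```python
import math
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]  # hypothetical layout: {"id": int, "vector": [...], "scalar": {...}}


def _top_k(rows: List[Row], query: Sequence[float], k: int) -> List[Row]:
    return sorted(rows, key=lambda r: (math.dist(r["vector"], query), r["id"]))[:k]


def pre_filter_search(rows: List[Row], query: Sequence[float], k: int,
                      predicate: Callable[[dict], bool]) -> List[int]:
    """Apply the scalar predicate first, then take the k nearest survivors."""
    candidates = [r for r in rows if predicate(r["scalar"])]
    return [r["id"] for r in _top_k(candidates, query, k)]


def post_filter_search(rows: List[Row], query: Sequence[float], k: int,
                       predicate: Callable[[dict], bool]) -> List[int]:
    """Take the k nearest vectors first, then drop those failing the predicate."""
    return [r["id"] for r in _top_k(rows, query, k) if predicate(r["scalar"])]
```

Pre-filtering always returns up to k matching rows but must evaluate the predicate over the whole candidate set; post-filtering reuses the plain top-k search but may return fewer than k rows when some of the nearest vectors fail the predicate.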

1.3 Scalar Index

  • Support the creation of indexes on non-vector columns, providing more efficient query and retrieval capabilities for non-vector data.
  • Provide the ability to manage scalar indexes, including operations such as creation, deletion, and querying of scalar indexes.
  • Support LSM Tree-type ScalarIndex, using LSM Tree as the underlying storage structure to build ScalarIndex.

1.4 Distributed Lock

  • Implement the Lease mechanism for distributed locks, allowing clients to acquire, release, and maintain distributed locks by managing the lifecycle and renewal of leases.
  • Support MVCC (Multi-Version Concurrency Control) for key-value storage: The Coordinator stores all change records for each key-value pair and generates a globally unique revision for each change.
  • Provide a simple and efficient OneTimeWatch mechanism for event notification scenarios that only require triggering once.
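
The MVCC bullet describes a store that keeps every change to a key and stamps each change with a globally unique revision. A minimal in-memory sketch of that idea follows; the class `MvccKv` and its method names are hypothetical and say nothing about the Coordinator's real interface or persistence.

```python
from typing import Dict, List, Optional, Tuple


class MvccKv:
    """Keeps all versions of each key; every change gets the next global revision."""

    def __init__(self) -> None:
        self._revision = 0
        # key -> list of (revision, value); value None marks a deletion
        self._history: Dict[str, List[Tuple[int, Optional[str]]]] = {}

    def put(self, key: str, value: Optional[str]) -> int:
        self._revision += 1
        self._history.setdefault(key, []).append((self._revision, value))
        return self._revision

    def delete(self, key: str) -> int:
        return self.put(key, None)  # record a tombstone instead of erasing history

    def get(self, key: str, revision: Optional[int] = None) -> Optional[str]:
        """Read the value visible at the given revision (latest when omitted)."""
        limit = self._revision if revision is None else revision
        for rev, value in reversed(self._history.get(key, [])):
            if rev <= limit:
                return value
        return None
```

Because old versions are retained, a reader can ask for the state as of any earlier revision, which is what makes watch and lease mechanisms built on top of such a store straightforward to reason about.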

2. Executor Execution Layer

2.1 Data Types

  • Extend the Float data type to support high-dimensional data storage and processing for supporting vector databases.

2.2 SQL Syntax

  • Extend the CREATE TABLE statement to support creating scalar tables and vector tables.
  • Add new vector index query functions for retrieving vector data.
  • Introduce functions for text and image vectorization, converting text and images into vector representations.

2.3 SQL Optimizer

  • Support mapping statistics to Calcite selectivity calculations to accurately estimate query costs and select the optimal execution plan, thereby improving query performance and efficiency.
  • Support different types of statistics: general statistics (e.g., Integer, Double, Float), cm_sketch, histograms, and Calcite's default calculation for all types.
  • Introduce the ANALYZE TABLE command to collect statistics information, notifying the optimizer to collect and update statistics for specified tables.
  • Provide a custom CostFactory to implement RelOptCost, redefine interfaces such as isLe, isLt, multiply, plus, and minus.
  • Rewrite Dingo TableScan cost calculation.
  • Modify the selectivity and estimateRowCount calculations for DingoLikeScan.
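
Of the statistics types listed above, cm_sketch (a count-min sketch) is the least self-explanatory. The following is a generic sketch of the data structure and of how its frequency estimate could feed an equality-predicate selectivity; the class name, the parameters `width` and `depth`, and the `estimate_row_count` helper are assumptions, and the way DingoDB actually maps statistics onto Calcite's selectivity is not described in the release notes.

```python
import hashlib
from typing import Iterator, List, Tuple


class CountMinSketch:
    """Fixed-size frequency summary; estimates never undercount, but may overcount."""

    def __init__(self, width: int = 256, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]
        self.total = 0

    def _slots(self, value: object) -> Iterator[Tuple[int, int]]:
        # One independent-looking hash per row, derived from SHA-256 for determinism.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{value!r}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, value: object, count: int = 1) -> None:
        self.total += count
        for row, col in self._slots(value):
            self.table[row][col] += count

    def estimate(self, value: object) -> int:
        return min(self.table[row][col] for row, col in self._slots(value))

    def selectivity(self, value: object) -> float:
        """Estimated fraction of rows where column = value."""
        return self.estimate(value) / self.total if self.total else 0.0


def estimate_row_count(table_rows: int, sketch: CountMinSketch, value: object) -> float:
    """Row-count estimate for an equality filter, as an optimizer might use it."""
    return table_rows * sketch.selectivity(value)
```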

2.4 Pushdown computation

  • Optimize the C++ layer serialization and deserialization logic by reducing the number of deserialized columns, shortening the deserialization time.
  • Add serialization and deserialization for List data type.
  • Optimize the C++ expressions to improve computation efficiency.
  • Support pushdown execution plan with a prefix selection to apply the query conditions to the data source as early as possible, reducing the number of rows that need to be read and processed.

2.5 Partitioning strategy

  • Add a Hash-range partitioning strategy, which uses hashing to reduce data skew and distribute data evenly.
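
One plausible reading of hash-range partitioning is sketched below: the key is first hashed into a fixed number of buckets, and the bucket numbers are then split into contiguous ranges, one per partition. This is an assumption about the general technique, not a description of DingoDB's actual scheme; the function name, the use of CRC32, and the parameters are illustrative.

```python
import bisect
import zlib
from typing import List


def hash_range_partition(key: bytes, buckets: int, range_bounds: List[int]) -> int:
    """Map a key to a partition id.

    buckets      -- number of hash buckets (hypothetical parameter)
    range_bounds -- sorted split points over bucket numbers; with bounds
                    [4, 8, 12] and 16 buckets there are four partitions
                    covering buckets 0-3, 4-7, 8-11 and 12-15.
    """
    bucket = zlib.crc32(key) % buckets          # hashing step: spreads adjacent keys
    return bisect.bisect_right(range_bounds, bucket)  # range step: bucket -> partition
```

With plain range partitioning, sequential keys such as `user0001`, `user0002`, ... would all land in the same partition; hashing first scatters them across buckets, while the range step keeps partitions easy to split or merge by moving a bound.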

3. SDK Layer

3.1 Python SDK

  • Add a Python SDK client for communication with the server.
  • Provide Python SDK functionality for Index operations.
  • Support join operations in Python SDK.
  • Use the pip package management tool to publish the Python SDK, improving its usability, maintainability, and portability.
  • DingoDB-Python supports data serialization and deserialization using Proto.

3.2 Java SDK

  • Provide Java SDK functionality for Index operations.
  • Provide distance measurement API for vector modules.
  • Offer partitioning strategy based on Index, distributing data to different partitions based on the range of data index values, facilitating the proper configuration of partitioned data.
  • DingoClient provides the ability to merge multiple partitions into one, simplifying the merging process and improving data management efficiency.
  • Provide an index encoding mechanism based on AutoIncrement, automatically assigning a unique identifier to each new record to ensure that each record has a unique identifier.

4. Knowledge Assistant Support

  • Successfully integrated with the LangChain framework.
  • Added support for cosine similarity queries, expanding the vector index query capabilities to include cosine similarity queries. This is useful for retrieving data such as text and images.
  • Added a count interface to calculate the number of records in a data collection.
  • Added a scan interface for scanning data collections while also satisfying scalar-based data filtering operations.

v0.6.0

10 months ago

Release Notes v0.6.0

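1 Architecture Layer

1.1 Separation of Storage and Compute

1. Compute engine (Executor): accepts SQL over the MySQL protocol and DingoDB's own protocol, parses it, generates logical and execution plans, and interfaces with the underlying Store.
2. Distributed storage engine (Store): an efficient distributed store written in C++. The storage layer is divided into metadata storage and data storage, and is designed to be extensible so that multiple storage engines can be added, such as Rocksdb, memory, and xdp-rocks.
3. Compute pushdown: to get more value from aggregation and filter operations and improve computational efficiency, the storage layer implements compute pushdown, supporting filter, count, sum, min, max, and similar operations.

1.2 Raft Upgrade

1. Provides a Leader election mechanism with support for multi-node elections.
2. Provides log replication, which ensures system reliability and effectively prevents data loss.
3. Provides a high-performance Raft that uses multithreading and asynchronous I/O to improve system throughput and response time.
4. Provides a Snapshot mechanism for restoring the state of the state machine. Snapshots reduce the size of the log, which improves performance, and allow the state machine to be restored quickly after a node failure.
5. Provides cluster scale-out, scale-in, and migration, making it easier to add or remove nodes without affecting the stability and consistency of the whole system.

1.3 MySQL Protocol Support in the Protocol Layer

1. Provides MySQL Shell, an interactive command-line tool for efficiently managing and operating MySQL databases; SSL encryption is also supported to keep the database secure.
2. Provides the MySQL JDBC Driver, a database connection driver that lets Java applications connect to and operate on MySQL databases through the JDBC API.

1.4 Cluster Operations and Monitoring

1. Provides visual monitoring through grafana and http monitoring, covering cluster nodes (disk, CPU, I/O, etc.), tables (partitions, Regions), Regions, and raft groups.
2. Provides multiple deployment options: single-node, docker-compose, and multi-node deployment with ansible.
3. Provides online scale-out and scale-in for the cluster.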

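2 Feature Layer

2.1 Common and Base Modules

1. Supports manually adjusting the log level, so that log verbosity can be controlled flexibly as the situation requires, reducing log file size and storage cost.
2. Optimized the error codes on the Store and Dingo Client sides.
3. Supports data serialization in C++: serialized data is parsed according to the format used at serialization time and then restored to the original data.

2.2 Raft Management and Distributed Storage

1. Provides a Snapshot mechanism for restoring the state of the state machine.
2. Supports Region Split: when a Region exceeds the maximum size limit, the system automatically splits it into multiple Regions so that Region sizes stay close to one another, which helps scheduling decisions.
3. Supports Region Merge: when a Region shrinks because of a large number of deletions, the system merges two small adjacent Regions.
4. Optimized the Range validation rules, improving code execution efficiency and shortening data query time, which significantly improves performance.
5. Supports flexible configuration of the number of dingo-store service threads; users can adjust the thread count to suit their scenario.
6. Supports specifying failpoints while the service is running, which makes it easier to test corner cases.
7. Supports management of Store and Region Metric information.
8. Supports operator compute pushdown: the storage layer provides basic operators, and DingoClient acts as the bridge in between, supporting both SDK and SQL scenarios. The operation types are:
    - SUM
    - SUM0
    - COUNT
    - MAX
    - MIN
    - COUNTWITHNULL
9. Supports Auto Increment ID: when a table with an auto-increment column is created, DingoDB automatically assigns a unique integer value to each row inserted into the table. A distributed sequence generator ensures that the values of the auto-increment column are unique across the whole cluster.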
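
The operator pushdown in item 8 can be pictured as a two-phase aggregation: each Region computes a small partial result next to the data, and the client merges the partials. The sketch below illustrates that pattern only. The names `Partial`, `aggregate_region`, and `merge` are hypothetical, and two semantics are assumed rather than taken from the release notes: SUM0 is treated like Calcite's `$SUM0` (returns 0 instead of NULL when there are no non-null inputs), and COUNTWITHNULL is treated as a row count that includes NULL values.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Partial:
    """Partial aggregate computed inside one Region (hypothetical structure)."""
    total: Optional[float] = None      # SUM over non-null values, None if there were none
    count: int = 0                     # COUNT: non-null values only
    count_with_null: int = 0           # COUNTWITHNULL: every row (assumed semantics)
    maximum: Optional[float] = None
    minimum: Optional[float] = None


def aggregate_region(values: Iterable[Optional[float]]) -> Partial:
    """Store-side step: scan one Region's column values once."""
    p = Partial()
    for v in values:
        p.count_with_null += 1
        if v is None:
            continue
        p.count += 1
        p.total = v if p.total is None else p.total + v
        p.maximum = v if p.maximum is None else max(p.maximum, v)
        p.minimum = v if p.minimum is None else min(p.minimum, v)
    return p


def merge(partials: List[Partial]) -> Partial:
    """Client-side step: combine per-Region partials into the final result."""
    out = Partial()
    for p in partials:
        out.count += p.count
        out.count_with_null += p.count_with_null
        if p.total is not None:
            out.total = p.total if out.total is None else out.total + p.total
        if p.maximum is not None:
            out.maximum = p.maximum if out.maximum is None else max(out.maximum, p.maximum)
        if p.minimum is not None:
            out.minimum = p.minimum if out.minimum is None else min(out.minimum, p.minimum)
    return out


def sum0(p: Partial) -> float:
    """SUM0 (assumed semantics): like SUM, but 0 instead of None for empty input."""
    return 0 if p.total is None else p.total
```

Only the fixed-size partials cross the network, rather than every row, which is the efficiency gain that pushdown is meant to deliver.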

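2.3 SQL Protocol Layer

1. Refactored the Executor architecture: the Executor is responsible for computation, parsing and responding to SQL requests and other management requests from the Client.
2. Compatible with the MySQL protocol.
3. Completed the Calcite upgrade, improving execution efficiency on the SQL side.
4. Table-level Metric collection.
5. Added a task response mechanism (STOP/READY/QUIT) to the network transport layer.

2.4 SQL Syntax Extensions

1. Extended table creation to specify the number of replicas and partitioning information, along with related auxiliary information.
2. Added Region splitting through SQL, making data distribution management more flexible and easier to use.
3. Extended the syntax related to the MySQL protocol:
  - View global/user/session variables
  - Set global/user/session variables
  - View table structure and information about specified columns
  - View table/user creation statements
  - Set the mysql-driver idle timeout
  - SQL prepared statements

2.5 Java SDK Layer

An SDK is a set of software tools created for developers. By operating on the database through specific API interfaces, developers can perform database operations more flexibly and efficiently, with a lower learning cost and much higher development efficiency. The DingoDB SDK layer supports the following features:

1. Connecting to the cluster through DingoDB's own API
2. Table operations (create/drop)
3. Single-record operations (get/insert/delete/update)
4. Batch operations (get/insert/delete/update)
5. Aggregations after range filtering
    - SUM
    - SUM0
    - COUNT
    - MAX
    - MIN
    - COUNTWITHNULL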

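2.6 DevOps Layer

Visual system monitoring

  • Node monitoring, helping users observe changes in server node status more effectively.
  • System process monitoring, helping users spot abnormal processes promptly so that they can respond in time.

New system operations tools

  • Supports multi-node deployment with the ansible automation tool, using its batch system configuration, program deployment, and command execution features to deploy in bulk.

New DBA-level system management tools

  • Supports Leader migration: the Leader of a group can be switched to another follower node, for load balancing or for restarting a machine in an emergency.
  • Supports Region split/merge: the system automatically splits or merges Regions so that Region sizes stay close to one another, achieving load balancing.
  • Supports node scale-out and scale-in: users can decide whether to add or remove nodes based on the actual data distribution, to balance load.
  • Visual Schema/Table/Region management: Schema/Table/Region information can be monitored effectively through a visual tool.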

dingo-v0.5.0

1 year ago

Release Note - V0.5.0

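1. SQL Features

  1. Support fuzzy queries with the like keyword
  2. Support user authentication: creating, deleting, updating, and querying users
  3. Support granting user privileges
  4. Support cluster authentication
  5. Support SQL batch insert
  6. Optimize the Calcite function validation mechanism
  7. Refactor error code information

2. Metadata Management

  1. Move table-level cluster management to the executor
  2. Deprecate the original Dingo-jraft module
  3. Migrate the Coordinator from the original Dingo-jraft to Dingo-mpu
  4. Support SQL-based queries on metadata tables

3. Indexes

  1. Support creating, deleting, updating, and querying indexes to improve query performance
  2. Support multiple index types: non-primary-key indexes and composite indexes

4. SDK Features

  1. Support computation based on chained expressions, enabling aggregation, updates, and similar operations after various range lookups
  2. Support scanning and filtering on non-primary-key columns
  3. List of metric computation functions:

| No | Function | Description |
|----|----------|-------------|
| 1 | Scan | Scan data in a table |
| 2 | Get | Read data from a table |
| 3 | Filter | Filter data by condition |
| 4 | Add | Add a numeric value to a column |
| 5 | Put | Write data to a table |
| 6 | Update | Modify data in a table |
| 7 | Delete | Delete data from a table |
| 8 | DeleteRange | Delete a range of data from a table |
| 9 | Max | Maximum of the column and the input |
| 10 | Min | Minimum of the column and the input |
| 11 | Avg | Average of the column and the input |
| 12 | Sum | Sum of the column and the input |
| 13 | Count | Count the number of records |
| 14 | SortList | Sort the input values together with the stored values by numeric value; ascending by default |
| 15 | DistinctList | Deduplicate the input values together with the stored values; each repeated value is recorded only once |
| 16 | List | Return a List result from the input values and the stored values, according to a condition |
| 17 | IncreaseCount | Increase count: the number of adjacent pairs in the sequence where the value increases |
| 18 | DecreaseCount | Decrease count: the number of adjacent pairs in the sequence where the value decreases |
| 19 | maxIncreaseCount | Maximum increase: the largest number of increases within any single run of consecutive increases in the sequence |
| 20 | maxDecreaseCount | Maximum decrease: the largest number of decreases within any single run of consecutive decreases in the sequence |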
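
The four sequence metrics at the end of the table (IncreaseCount, DecreaseCount, maxIncreaseCount, maxDecreaseCount) are easier to follow with a worked example. The sketch below follows the descriptions in the table; treating equal neighbours as neither an increase nor a decrease is an assumption, and the Python function names are not the SDK's.

```python
from typing import Callable, Sequence


def _count_steps(seq: Sequence[float], step: Callable[[float, float], bool]) -> int:
    """Number of adjacent pairs (a, b) for which step(a, b) holds."""
    return sum(1 for a, b in zip(seq, seq[1:]) if step(a, b))


def _longest_run(seq: Sequence[float], step: Callable[[float, float], bool]) -> int:
    """Largest number of consecutive adjacent pairs for which step(a, b) holds."""
    best = run = 0
    for a, b in zip(seq, seq[1:]):
        run = run + 1 if step(a, b) else 0
        best = max(best, run)
    return best


def increase_count(seq: Sequence[float]) -> int:
    return _count_steps(seq, lambda a, b: b > a)


def decrease_count(seq: Sequence[float]) -> int:
    return _count_steps(seq, lambda a, b: b < a)


def max_increase_count(seq: Sequence[float]) -> int:
    return _longest_run(seq, lambda a, b: b > a)


def max_decrease_count(seq: Sequence[float]) -> int:
    return _longest_run(seq, lambda a, b: b < a)
```

For the sequence 1, 2, 3, 2, 5, 6, 7, 8, 1 there are six increasing steps and two decreasing steps in total; the longest run of consecutive increases is 2, 5, 6, 7, 8 (four steps), and no run of decreases is longer than one step.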

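5. Columnar Storage

  1. Support columnar storage based on Merge Tree

6. Distributed Storage

  1. Fix slow release of disk space after RocksDB update/delete
  2. Optimize Prefix Scan
  3. Complete the RocksDB version upgrade
  4. Optimize the RocksDB I/O path
  5. Release disk space after DeleteRange is executed
  6. Make fixed RocksDB parameters configurable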

dingo-v0.4.1

1 year ago

1. SQL Features and Optimizations

1.1 SQL Features

1.1.1 Extended SQL Syntax

  • Support TTL through table options when creating a table
  • Support assigning partitions when creating a table

1.1.2 Complex Data Types

  • Support operations on MAP
  • Support operations on MultiSet
  • Support operations on Array

1.1.3 Support variables in SQL statements such as insert, select, and delete

1.1.4 Support a strategy to control the messages transmitted between operators in the execution plan

1.1.5 New SQL functions

| No | Function Name | Description |
|----|---------------|-------------|
| 1 | pow(x,y) | Returns the value of a number raised to the power of another number |
| 2 | round(x,y) | Rounds a number to a specified number of decimal places |
| 3 | ceiling(x) | Returns the smallest integer value that is greater than or equal to a number |
| 4 | floor(x) | Returns the largest integer value that is less than or equal to a number |
| 5 | mod(x,y) | Returns the remainder of a number divided by another number |
| 6 | abs(x) | Returns the absolute (positive) value of a number |

1.2 SQL Optimizations

  • Optimize queries that use a range filter
  • Optimize range scan queries
  • Optimize Dingo's internal type system
  • Optimize the SQL date/time/timestamp functions

2. Key-Value Operations

2.1 Equivalence of Key-Value and SQL operations

  • Support table operations through the Key-Value API, such as create table and drop table
  • Support inserting, updating, and deleting records in a table through the Key-Value API
  • Support table operations through the Annotation API
  • Table and record operations are equivalent between the Key-Value API and SQL

2.2 Operation lists about Key-Value SQL

2.2.1 Basic Key-Value Operation

| No | Function Name | Description |
|----|---------------|-------------|
| 1 | put | insert or update records in table |
| 2 | get | query records by user key |
| 3 | delete | delete records by user key |

2.2.2 Numerical operations

| No | Function Name | Description |
|----|---------------|-------------|
| 1 | add | add values of the same data type |
| 2 | sum | calculate the sum of columns filtered by keys |
| 3 | max | calculate the max of columns filtered by keys |
| 4 | min | calculate the min of columns filtered by keys |

2.2.3 Compound operation

| No | Function Name | Description |
|----|---------------|-------------|
| 1 | Operate | perform multiple operations on a single record; the operation list can contain numerical operations or basic operations |
| 2 | OperateList | perform multiple operations on a single record |
| 3 | UDF | implement a user-defined function with a LUA script |

2.2.4 Collection operations

| No | Type | Function Name | Description |
|----|------|---------------|-------------|
| 1 | read | size | get the number of elements |
| 2 | read | get_all | get all the elements of the collection |
| 3 | read | get_by_key | get all the elements of the collection by input key |
| 4 | read | get_by_value | get all the elements of the collection by input value |
| 5 | read | get_by_index_range | get all the elements of the collection by index range |
| 6 | write | put | append an element to the end |
| 7 | write | clear | clear all the elements of the collection |
| 8 | write | remove_by_key | remove the key from the collection |
| 9 | write | remove_all_by_value | remove all records that match the value |
| 10 | write | remove_by_index | remove a record by index |

2.2.5 Filter operations

  • DateFilter: query records using a range filter on the Date type.
  • NumberRange: query records using a range filter on the Numeric type.
  • StringRange: query records using a range filter on the String type.
  • ValueEquals: query records with a specified record value.

3. Storage Optimizations

3.1 Distributed Consistency Protocol

  • Refactor the raft protocol implementation to replace sofa-jraft
  • Refactor the implementation of log replication and leader election
  • Support a new serialization format for keys and values

3.2 Rocksdb Improvements

  • Rocksdb can load its configuration from files
  • Support TTL using user timestamps
  • Update the Rocksdb version and release the io.dingodb. package on maven central

4. Other features

  • Support JDBC connection parameters such as timeout
  • Support explain to view the plan of a Dingo SQL statement
  • Release the related packages to maven-central:

| No | Module | Description |
|----|--------|-------------|
| 1 | dingo-driver-client | the jdbc driver client used by sql |
| 2 | dingo-sdk | the key-value sdk client for key-value operations |
| 3 | dingo-rocksdb | extended features on rocksdb |

dingo-v0.3.0

1 year ago

1. Semantics and Functions of SQL

1.1 New data type

  • Boolean
  • Date: default format yyyy-MM-dd
  • Time: default format HH:mm:ss
  • Timestamp: default format yyyy-MM-dd HH:mm:ss.SSS

1.2 Allow assigning a default value to a column, either a constant or an internal function

1.3 Support Join operation

  • Inner Join
  • Left Join
  • Right Join
  • Full Join
  • Cross Join

1.4 String functions

| No | Function Name | Notes |
|----|---------------|-------|
| 1 | Concat | Adds two or more expressions together |
| 2 | Format | Formats a number to a format like "#,###,###.##", rounded to a specified number of decimal places |
| 3 | Locate | Returns the position of the first occurrence of a substring in a string |
| 4 | Lower | Converts a string to lower-case |
| 5 | Lcase | Converts a string to lower-case |
| 6 | Upper | Converts a string to upper-case |
| 7 | Ucase | Converts a string to upper-case |
| 8 | Left | Extracts a number of characters from a string (starting from left) |
| 9 | Right | Extracts a number of characters from a string (starting from right) |
| 10 | Repeat | Repeats a string as many times as specified |
| 11 | Replace | Replaces all occurrences of a substring within a string, with a new substring |
| 12 | Trim | Removes leading and trailing spaces from a string |
| 13 | Ltrim | Removes leading spaces from a string |
| 14 | Rtrim | Removes trailing spaces from a string |
| 15 | Mid | Extracts a substring from a string (starting at any position) |
| 16 | Substring | Extracts a substring from a string (starting at any position) |
| 17 | Reverse | Reverses a string and returns the result |

1.5 Date and Time functions

| No | Function Name | Notes |
|----|---------------|-------|
| 1 | Now | Returns the current date and time |
| 2 | CurrentDate | Returns the current date |
| 3 | Current_date | Returns the current date |
| 4 | CurTime | Returns the current time |
| 5 | Current_time | Returns the current time |
| 6 | Current_timestamp | Returns the current date and time |
| 7 | From_UnixTime | Converts a unix time to a timestamp |
| 8 | Unix_Timestamp | Converts a time to a unix timestamp |
| 9 | Date_Format | Formats a date |
| 10 | DateDiff | Returns the number of days between two date values |
| 11 | Time_Format | Formats a time by a specified format |

2. Management of Replicator

2.1 Management of metadata

  • A physical table can be split into N partitions based on data size
  • Manage physical table metadata such as table creation time, table status, partition strategy, split conditions, etc

2.2 Scheduler of partition replicator

  • Support multiple partition modes, such as One table with one partition, One table with multiple partitions
  • Support multiple split strategies, such as auto-split or manually split by API
  • Support resource isolation between physical tables

2.3 Tools of partition management

  • Support viewing partition status, such as leader, follower, etc
  • Support migrating and splitting partitions through the internal API
  • Support viewing partition metrics, such as write and read latency, size, and record count

3. The data access method for DingoDB

3.1 JDBC mode

  • Support connecting to dingo through JDBC

3.2 SDK client mode

  • Supports putting, getting, and deleting records in DingoDB tables
  • Supports batch writes of records to DingoDB tables

3.3 Import data from external sources

  • Supports importing data from local files in CSV or JSON format
  • Supports importing data from Kafka in JSON or Avro format

4. Tools and Monitoring

  • Supports monitoring the DingoDB cluster with Grafana and Prometheus
  • Supports managing the cluster's partitions through the API
  • Supports adjusting the log level dynamically with tools
  • Supports deploying the cluster with Ansible or docker-compose
  • Adds more than 1,300 new automated tests

v0.3.0

1 year ago

1. Semantics and Functions of SQL

1.1 New data type

  • Boolean
  • Date: default format yyyy-MM-dd
  • Time: default format HH:mm:ss
  • Timestamp: default format yyyy-MM-dd HH:mm:ss.SSS
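
As a rough illustration of the default formats above, the sketch below maps the Java-style patterns onto Python `strptime` directives and parses one literal of each type. The mapping table and the `parse_default` helper are assumptions made for this example; they are not part of DingoDB.

```python
from datetime import datetime

# Assumed mapping from the patterns in the release notes to Python
# strptime directives (illustrative only, not a DingoDB API).
DEFAULT_FORMATS = {
    "DATE": "%Y-%m-%d",                    # yyyy-MM-dd
    "TIME": "%H:%M:%S",                    # HH:mm:ss
    "TIMESTAMP": "%Y-%m-%d %H:%M:%S.%f",   # yyyy-MM-dd HH:mm:ss.SSS
}

def parse_default(type_name: str, text: str) -> datetime:
    """Parse text written in the default format of the given type."""
    return datetime.strptime(text, DEFAULT_FORMATS[type_name])

d = parse_default("DATE", "2022-06-30").date()
t = parse_default("TIME", "14:05:09").time()
ts = parse_default("TIMESTAMP", "2022-06-30 14:05:09.123")
print(d, t, ts)
```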

1.2 Allow assigning a default value to a column, either a constant or an internal function

1.3 Support Join operations

  • Inner Join
  • Left Join
  • Right Join
  • Full Join
  • Cross Join
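
To make the difference between the five join types concrete, here is a small in-memory model of an equality join over lists of dictionaries. It only sketches the standard SQL semantics; the `join` helper and the sample `users` and `orders` rows are hypothetical and say nothing about how DingoDB executes joins.

```python
from itertools import product

def join(left, right, key, how="inner"):
    """Join two lists of dicts on key; a missing side is represented by None."""
    if how == "cross":
        # Cross join: every left row paired with every right row, no condition.
        return list(product(left, right))
    rows = [(l, r) for l in left for r in right if l[key] == r[key]]
    if how in ("left", "full"):
        matched = {id(l) for l, _ in rows}
        rows += [(l, None) for l in left if id(l) not in matched]
    if how in ("right", "full"):
        matched = {id(r) for _, r in rows if r is not None}
        rows += [(None, r) for r in right if id(r) not in matched]
    return rows

users = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
orders = [{"id": 2, "item": "x"}, {"id": 3, "item": "y"}]

for how in ("inner", "left", "right", "full", "cross"):
    print(how, len(join(users, orders, "id", how)))
```

With this sample data only `id` 2 appears on both sides, so the inner join keeps one row, the left and right joins each add the unmatched row from their own side, and the full join keeps both unmatched rows.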

1.4 String functions

| No. | Function | Description |
| --- | --- | --- |
| 1 | Concat | Adds two or more expressions together |
| 2 | Format | Formats a number to a format like "#,###,###.##", rounded to a specified number of decimal places |
| 3 | Locate | Returns the position of the first occurrence of a substring in a string |
| 4 | Lower | Converts a string to lower-case |
| 5 | Lcase | Converts a string to lower-case |
| 6 | Upper | Converts a string to upper-case |
| 7 | Ucase | Converts a string to upper-case |
| 8 | Left | Extracts a number of characters from a string (starting from left) |
| 9 | Right | Extracts a number of characters from a string (starting from right) |
| 10 | Repeat | Repeats a string as many times as specified |
| 11 | Replace | Replaces all occurrences of a substring within a string, with a new substring |
| 12 | Trim | Removes leading and trailing spaces from a string |
| 13 | Ltrim | Removes leading spaces from a string |
| 14 | Rtrim | Removes trailing spaces from a string |
| 15 | Mid | Extracts a substring from a string (starting at any position) |
| 16 | Substring | Extracts a substring from a string (starting at any position) |
| 17 | Reverse | Reverses a string and returns the result |
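
The position-based functions are easiest to get wrong, because SQL string positions start at 1 rather than 0. The sketch below models Locate, Left, Right, and Mid in Python under the assumption that they follow the usual MySQL semantics; the helper functions are illustrative stand-ins, not DingoDB code, and negative positions are not modeled.

```python
def locate(substr: str, s: str) -> int:
    """1-based position of the first occurrence of substr in s, or 0 if absent."""
    return s.find(substr) + 1

def left(s: str, n: int) -> str:
    """The first n characters of s."""
    return s[:max(n, 0)]

def right(s: str, n: int) -> str:
    """The last n characters of s."""
    return s[-n:] if n > 0 else ""

def mid(s: str, pos: int, length: int) -> str:
    """Substring of the given length starting at 1-based position pos."""
    if pos < 1:
        raise ValueError("this sketch only models positions >= 1")
    if length < 1:
        return ""
    return s[pos - 1:pos - 1 + length]

print(locate("bar", "foobarbar"), left("DingoDB", 5), right("DingoDB", 2), mid("DingoDB", 6, 2))
```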

1.5 Date and Time functions

| No. | Function | Description |
| --- | --- | --- |
| 1 | Now | Returns the current date and time |
| 2 | CurrentDate | Returns the current date |
| 3 | Current_date | Returns the current date |
| 4 | CurTime | Returns the current time |
| 5 | Current_time | Returns the current time |
| 6 | Current_timestamp | Returns the current date and time |
| 7 | From_UnixTime | Converts a Unix timestamp to a timestamp value |
| 8 | Unix_Timestamp | Converts a date and time to a Unix timestamp |
| 9 | Date_Format | Formats a date |
| 10 | DateDiff | Returns the number of days between two date values |
| 11 | Time_Format | Formats a time by a specified format |
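
The conversion functions can be sketched in a few lines of Python. This model assumes UTC throughout and a MySQL-style argument order for DateDiff (first date minus second date); the actual time zone handling in DingoDB is not stated in these notes, and the helper names are illustrative only.

```python
from datetime import date, datetime, timezone

def unix_timestamp(dt: datetime) -> int:
    """Seconds since 1970-01-01 00:00:00, treating dt as UTC (an assumption)."""
    return int(dt.replace(tzinfo=timezone.utc).timestamp())

def from_unixtime(seconds: int) -> datetime:
    """Inverse of unix_timestamp: a naive datetime in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(tzinfo=None)

def datediff(first: date, second: date) -> int:
    """Number of days from second to first (first minus second)."""
    return (first - second).days

stamp = unix_timestamp(datetime(1970, 1, 2))
print(stamp, from_unixtime(stamp), datediff(date(2022, 3, 1), date(2022, 2, 1)))
```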

2. Replicator management

2.1 Metadata management

  • A physical table can be split into N partitions based on data size
  • Manages physical table metadata, such as creation time, table status, partition strategy, and split conditions

2.2 Partition replicator scheduler

  • Supports multiple partition modes, such as one table with one partition or one table with multiple partitions
  • Supports multiple split strategies, such as automatic split or manual split through the API
  • Supports resource isolation between physical tables
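
The idea of an automatic split can be pictured with a toy model: a table starts as one partition, and a partition that grows past a threshold is cut into two. The sketch below is purely hypothetical. It uses the number of keys as a stand-in for data size, splits at the middle key, and the `insert_key` helper and `max_keys` parameter are invented for the example; the release notes do not describe DingoDB's actual split conditions.

```python
from bisect import insort

def insert_key(partitions, key, max_keys):
    """Insert key into the range partition covering it, splitting on overflow."""
    # Pick the last partition whose smallest key is <= key (default: the first).
    idx = 0
    for i, part in enumerate(partitions):
        if part and part[0] <= key:
            idx = i
    insort(partitions[idx], key)
    part = partitions[idx]
    if len(part) > max_keys:
        # Hypothetical split condition: too many keys, so cut at the middle key.
        middle = len(part) // 2
        partitions[idx:idx + 1] = [part[:middle], part[middle:]]

partitions = [[]]  # one table with one (empty) partition
for k in range(1, 8):
    insert_key(partitions, k, max_keys=3)
print(partitions)
```

Starting from a single partition, inserting keys 1 through 7 with a limit of three keys per partition ends with three partitions that together still cover the whole key range in order.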

2.3 Partition management tools

  • Supports viewing partition status, such as leader and follower
  • Supports migrating and splitting partitions through the internal API
  • Supports viewing partition metrics, such as read and write latency, size, and record count

3. Data access methods for DingoDB

3.1 JDBC mode

  • Supports connecting to DingoDB through JDBC

3.2 SDK client mode

  • Supports putting, getting, and deleting records in DingoDB tables
  • Supports batch writes of records to DingoDB tables

3.3 Import data from external sources

  • Supports importing data from local files in CSV or JSON format
  • Supports importing data from Kafka in JSON or Avro format

4. Tools and Monitoring

  • Supports monitoring the DingoDB cluster with Grafana and Prometheus
  • Supports managing the cluster's partitions through the API
  • Supports adjusting the log level dynamically with tools
  • Supports deploying the cluster with Ansible or docker-compose
  • Adds more than 1,300 new automated tests