StarRocks Versions

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. It received InfoWorld’s 2023 BOSSIE Award for best open source software.

3.1.10

2 weeks ago

Release date: March 29, 2024

New Features

  • Primary Key tables support Size-tiered Compaction. #42474

Behavior Changes

  • When null values in JSON data are evaluated with the IS NULL operator, they are treated as NULL values, in line with SQL semantics. For example, SELECT parse_json('{"a": null}') -> 'a' IS NULL returns true (before this behavior change, it returned false). #42815
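
The statement quoted in the note can be run as-is to check which behavior a cluster exhibits. The comments only restate what the note says and are not an independent test result.

```sql
-- JSON null evaluated with IS NULL (statement taken from the note above).
-- With this behavior change: true.
-- Before this behavior change: false.
SELECT parse_json('{"a": null}') -> 'a' IS NULL;
```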

Improvements

  • When Broker Load is used to load data from ORC files that contain TIMESTAMP-type data, StarRocks supports retaining microseconds in the timestamps when converting the timestamps to match its own DATETIME data type. #42348

Bug Fixes

Fixed the following issues:

  • In shared-data mode, the garbage collection and thread eviction mechanisms for handling persistent indexes created on Primary Key tables cannot take effect on CN nodes. As a result, obsolete data cannot be deleted. #42241
  • When users query ORC files by using Hive catalogs, the query results may be incorrect because StarRocks used to read ORC files from Hive based on mapping by position. To resolve this issue, users can set the session variable orc_use_column_names to true, which makes StarRocks read ORC files from Hive based on mapping by column name. #42905
  • When LDAP authentication for the AD system is adopted, logins without passwords are allowed. #42476
  • When disk device names end with digits, the values of monitoring metrics remain 0s because the disk device names may be invalid after such digits are removed. #42741
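
As a minimal sketch of the ORC workaround described above, the session variable can be set before querying. The catalog, database, and table names are hypothetical placeholders.

```sql
-- Read ORC files from Hive by column name instead of by position.
SET orc_use_column_names = true;

-- hive_catalog.sales_db.orders_orc is a hypothetical ORC-backed Hive table.
SELECT * FROM hive_catalog.sales_db.orders_orc LIMIT 10;
```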

2.5.20

4 weeks ago

Release date: March 22, 2024

Improvements

  • replace_if_not_null supports BITMAP columns in an Aggregate table. Users can specify replace_if_not_null as the aggregate function for BITMAP columns in an Aggregate table. #42104
  • G1 Garbage Collector is used for JDK 9 and later by default. #41374
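
A possible way to use replace_if_not_null on a BITMAP column is sketched below. The table name, column names, and bucket count are hypothetical, and the sketch assumes the standard Aggregate table syntax in which the aggregate function follows the column type.

```sql
-- Hypothetical Aggregate table: the BITMAP column keeps its previous
-- value when a later load supplies NULL for it.
CREATE TABLE user_tags (
    user_id BIGINT,
    tags    BITMAP REPLACE_IF_NOT_NULL
)
AGGREGATE KEY (user_id)
DISTRIBUTED BY HASH (user_id) BUCKETS 4;

INSERT INTO user_tags VALUES (1, to_bitmap(10));
-- A NULL for tags is expected to leave the stored bitmap untouched.
INSERT INTO user_tags VALUES (1, NULL);
```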

Parameter Changes

  • The default value of the BE parameter update_compaction_size_threshold is changed from 256 MB to 64 MB to accelerate compaction. #42776

Bug Fixes

Fixed the following issues:

  • Synchronizing data using StarRocks external tables encounters the error "commit and publish txn failed". The synchronization succeeds after a retry but the same copy of data is loaded twice. #25165
  • RPC transmit resources are temporarily unavailable due to GC issues. #41636
  • array_agg() in v2.5 processes NULLs in a different way than it does in v2.3. As a result, the query result is incorrect after an upgrade from v2.3 to v2.5. #42639
  • The Sink Operator in a query unexpectedly exits, which causes BEs to crash. #38662
  • Executing the DELETE command on an Aggregate table results in a race for accessing tablet metadata, which causes BEs to crash. #42174
  • The MemTracker encounters the Use-After-Free issue during UDF calling, which causes BEs to crash. #41710
  • The unnest() function does not support aliases. #42138

3.2.4

1 month ago

Release date: March 12, 2024

New Features

  • Cloud-native Primary Key tables in shared-data clusters support Size-tiered Compaction to reduce the write I/O amplification. #41034
  • Added the date function milliseconds_diff. #38171
  • Added the session variable catalog, which specifies the catalog to which the session belongs. #41329
  • Supports setting user-defined variables in hints. #40746
  • Supports CREATE TABLE LIKE in Hive catalogs. #37685
  • Added the view information_schema.partitions_meta, which records detailed metadata of partitions. #39265
  • Added the view sys.fe_memory_usage, which records the memory usage for StarRocks. #40464
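
The new views and the new date function might be tried as follows. The view names come from the notes above. The argument order of milliseconds_diff (first argument minus second) is an assumption to verify against the function reference.

```sql
-- Inspect partition metadata and FE memory usage through the new views.
SELECT * FROM information_schema.partitions_meta LIMIT 10;
SELECT * FROM sys.fe_memory_usage;

-- Difference between two DATETIME values in milliseconds
-- (assumed to be computed as first argument minus second argument).
SELECT milliseconds_diff('2024-03-12 10:00:00.500', '2024-03-12 10:00:00.200');
```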

Behavior Changes

  • cbo_decimal_cast_string_strict is used to control how CBO converts data from the DECIMAL type to the STRING type. The default value true indicates that the logic built in v2.5.x and later versions prevails and the system implements strict conversion (namely, the system truncates the generated string and fills 0s based on the scale length). The DECIMAL type is not strictly filled in earlier versions, causing different results when comparing the DECIMAL type and the STRING type. #40619
  • The default value of the Iceberg Catalog parameter enable_iceberg_metadata_cache has been changed to false. From v3.2.1 to v3.2.3, this parameter is set to true by default, regardless of what metastore service is used. In v3.2.4 and later, if the Iceberg cluster uses AWS Glue as metastore, this parameter still defaults to true. However, if the Iceberg cluster uses another metastore service, such as Hive metastore, this parameter defaults to false. #41826
  • The user who can refresh materialized views is changed from the root user to the user who creates the materialized views. This change does not affect existing materialized views. #40670
  • By default, when comparing columns of constant and string types, StarRocks compares them as strings. Users can use the session variable cbo_eq_base_type to adjust the rule used for the comparison. For example, users can set cbo_eq_base_type to decimal, and StarRocks then compares the columns as numeric values. #40619
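
The comparison rule can be switched per session, as in this sketch. The table and column are hypothetical, and quoting the value as a string literal is an assumption.

```sql
-- Compare string columns against constants as numeric values
-- instead of as strings (the default).
SET cbo_eq_base_type = 'decimal';

-- orders.order_code is a hypothetical VARCHAR column.
SELECT * FROM orders WHERE order_code = 1001;
```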

Improvements

  • Shared-data StarRocks clusters support the Partitioned Prefix feature for S3-compatible object storage systems. When this feature is enabled, StarRocks stores the data into multiple, uniformly prefixed partitions (sub-paths) under the bucket. This improves the read and write efficiency on data files in S3-compatible object storages. #41627
  • StarRocks supports using the parameter s3_compatible_fs_list to specify which S3-compatible object storage can be accessed via AWS SDK, and supports using the parameter fallback_to_hadoop_fs_list to specify non-S3-compatible object storages that require access via HDFS Schema (this method requires the use of vendor-provided JAR packages). #41123
  • Optimized compatibility with Trino. Supports syntax conversion from the following Trino functions: current_catalog, current_schema, to_char, from_hex, to_date, to_timestamp, and index. #41217 #41319 #40803
  • Optimized the query rewrite logic of materialized views. StarRocks can rewrite queries with materialized views created upon logical views. #42173
  • Improved the efficiency of converting the STRING type to the DATETIME type by 35% to 40%. #41464
  • The agg_type of BITMAP-type columns in an Aggregate table can be set to replace_if_not_null in order to support updates only to a few columns of the table. #42034
  • Improved the Broker Load performance when loading small ORC files. #41765
  • The tables with hybrid row-column storage support Schema Change. #40851
  • The tables with hybrid row-column storage support complex types including BITMAP, HLL, JSON, ARRAY, MAP, and STRUCT. #41476
  • A new internal SQL log file is added to record log data related to statistics and materialized views. #40453

Bug Fixes

Fixed the following issues:

  • "Analyze Error" is thrown if inconsistent letter cases are assigned to the names or aliases of tables or views queried in the creation of a Hive view. #40921
  • I/O usage reaches the upper limit if persistent indexes are created on Primary Key tables. #39959
  • In shared-data clusters, primary key index directories are deleted every 5 hours. #40745
  • After users execute ALTER TABLE COMPACT by hand, the memory usage statistics for compaction operations are abnormal. #41150
  • Retries of the Publish phase may hang for Primary Key tables. #39890

3.1.9

1 month ago

Release date: March 8, 2024

New Features

  • Cloud-native Primary Key tables in shared-data clusters support Size-tiered Compaction to reduce write I/O amplification for the loading of a large number of small-sized files. #41610
  • Added the function regexp_extract_all. #42178
  • Added the view information_schema.partitions_meta, which records detailed metadata of partitions. #41101
  • Added the view sys.fe_memory_usage, which records the memory usage for StarRocks. #41083
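
A short sketch of regexp_extract_all follows. It assumes the three-argument form (string, pattern, index of the capture group to return), which should be checked against the function reference.

```sql
-- Return every match of capture group 1 as an array
-- (three-argument signature is an assumption).
SELECT regexp_extract_all('AbCdE', '([[:lower:]]+)C([[:lower:]]+)', 1);
```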

Behavior Changes

  • The logic of dynamic partitioning is changed. Now partition columns of the DATE type do not support hour-level data. Note that partition columns of the DATETIME type still support hour-level data. #40328
  • The user who can refresh materialized views is changed from the root user to the user who creates the materialized views. This change does not affect existing materialized views. #40698
  • By default, when comparing columns of constant and string types, StarRocks compares them as strings. Users can use the session variable cbo_eq_base_type to adjust the default rule used for the comparison. For example, users can set cbo_eq_base_type to decimal, and StarRocks then compares the columns as numeric values. #41712

Improvements

  • StarRocks supports using the parameter s3_compatible_fs_list to specify which S3-compatible object storage can be accessed via AWS SDK, and supports using the parameter fallback_to_hadoop_fs_list to specify non-S3-compatible object storages that require access via HDFS Schema (this method requires the use of vendor-provided JAR packages). #41612
  • The compatibility with Trino's SQL statement syntax is optimized to support converting the following functions of Trino: current_catalog, current_schema, to_char, from_hex, to_date, to_timestamp, and index. #41505 #41270 #40838
  • A new session variable cbo_materialized_view_rewrite_related_mvs_limit is added to control the maximum number of candidate materialized views allowed during query planning. The default value of this session variable is 64. This session variable helps mitigate the excessive resource consumption caused by a large number of candidate materialized views for a query during the query planning. #39829
  • The agg_type of BITMAP-type columns in an Aggregate table can be set to replace_if_not_null to support updates only to a few columns of the table. #42102
  • The session variable cbo_eq_base_type is optimized to support specifying the implicit conversion rule applied to the comparison of data that contains both string and numeric data types. By default, such data is compared as strings. #40619
  • More DATE-type data (for example, "%Y-%m-%e %H:%i") can be recognized to better support partition expressions for Iceberg tables. #40474
  • The JDBC connector supports the TIME data type. #31940
  • The path parameter in the SQL statement for creating a file external table supports wildcards (*). However, like the DATA INFILE parameter in the SQL statement for creating a Broker Load job, the path parameter supports using wildcards (*) to match at most one level of directory or file. #40844
  • A new internal SQL log file is added to record log data related to statistics and materialized views. #40682
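
If query planning is slowed down by many candidate materialized views, the limit described above could be lowered for a session. The value 16 is only an illustrative choice.

```sql
-- Consider at most 16 candidate materialized views during planning
-- (the default is 64 according to the note above).
SET cbo_materialized_view_rewrite_related_mvs_limit = 16;
```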

Bug Fixes

Fixed the following issues:

  • "Analyze Error" is thrown if inconsistent letter cases are assigned to the names or aliases of tables or views queried in the creation of a Hive view. #40921
  • I/O usage reaches the upper limit if persistent indexes are created on Primary Key tables. #39959
  • In shared-data clusters, the primary key index directory is deleted every 5 hours. #40745
  • After a table for which list partitioning is enabled is truncated or its partitions are truncated, queries based on the partitioning keys of the table return no data. #40495
  • After users execute ALTER TABLE COMPACT by hand, the memory usage statistics for compaction operations are abnormal. #41150
  • During data migration between clusters, if only some columns are updated in column mode, the destination cluster may crash. #40692
  • The SQL blacklist may not take effect if the submitted SQL statement contains multiple spaces or newline characters. #40457

2.5.19

2 months ago

Release date: February 8, 2024

New Features

  • Added a pattern-matching function: regexp_extract_all.
  • Added Bitmap value processing functions: serialize, deserialize, and serializeToString. #40162

Improvements

  • Supports automatic activation of inactive materialized views when refreshing these materialized views. #38521
  • Optimized BE log printing to prevent too many irrelevant logs. #22820 #36187
  • Supports using Hive UDFs to process and load Bitmap data into StarRocks and export Bitmap data from StarRocks to Hive. #40165 #40168
  • Added date formats yyyy-MM-ddTHH:mm and yyyy-MM-dd HH:mm to support TIMESTAMP partition fields in Apache Iceberg tables. #39986

Bug Fixes

Fixed the following issues:

  • Running a Spark Load job that has no PROPERTIES specified causes null pointer exceptions (NPEs). #38765
  • INSERT INTO SELECT occasionally encounters the error "timeout by txn manager". #36688
  • The memory consumption of PageCache exceeds the threshold specified by the BE dynamic parameter storage_page_cache_limit in certain circumstances. #37740
  • After a table is dropped and then re-created with the same table name, refreshing asynchronous materialized views created on that table fails. #38008 #38982
  • Writing data to S3 buckets using SELECT INTO occasionally encounters the error "The tablet write operation update metadata take a long time". #38443
  • Some operations during data loading may encounter "reached timeout". #36746
  • The DECIMAL type returned by SHOW CREATE TABLE is inconsistent with that specified in CREATE TABLE. #39297
  • If partition columns in external tables contain null values, queries against those tables will cause BEs to crash. #38888
  • When deleting data from a Duplicate Key table, if the condition in the WHERE clause of the DELETE statement has a leading space, the deleted data can still be queried using SELECT. #39797
  • Loading array<string> data from ORC files into StarRocks (array<json>) may cause BEs to crash. #39233
  • Querying Hive catalogs may be stuck and even expire. #39863
  • Partitions cannot be dynamically created if hour-level partitions are specified in the PARTITION BY clause. #40256
  • The error message "failed to call frontend service" is returned during loading from Apache Flink. #40710

3.2.3

2 months ago

Release date: February 10, 2024

New Features

  • [Preview] Supports hybrid row-column storage for tables. It allows better performance for high-concurrency, low-latency point lookups against Primary Key tables and partial data updates. Currently, this feature does not support modification via ALTER TABLE, changing Sort Key, and partial updates in column mode.
  • Supports backing up and restoring asynchronous materialized views.
  • Broker Load supports loading JSON-type data.
  • Supports query rewrite using asynchronous materialized views created upon views. Queries against a view can be rewritten based on materialized views that are created upon that view.
  • Supports CREATE OR REPLACE PIPE. #37658

Behavior Changes

  • Added the session variable enable_strict_order_by. When this variable is set to the default value TRUE, an error is reported for such a query pattern: Duplicate alias is used in different expressions of the query and this alias is also a sorting field in ORDER BY, for example, select distinct t1.* from tbl1 t1 order by t1.k1;. The logic is the same as that in v2.3 and earlier. When this variable is set to FALSE, a loose deduplication mechanism is used, which processes such queries as valid SQL queries. #37910
  • Added the session variable enable_materialized_view_for_insert, which controls whether materialized views rewrite the queries in INSERT INTO SELECT statements. The default value is false. #37505
  • When a single query is executed within the Pipeline framework, its memory limit is now constrained by the variable query_mem_limit instead of exec_mem_limit. Setting the value of query_mem_limit to 0 indicates no limit. #34120
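
The session variables above can be adjusted as in this sketch, which reuses the query pattern quoted in the note. tbl1 and k1 are the placeholder names from that note.

```sql
-- Accept the query pattern that strict mode rejects.
SET enable_strict_order_by = false;
select distinct t1.* from tbl1 t1 order by t1.k1;

-- Remove the per-query memory limit under the Pipeline framework
-- (0 means no limit, per the note above).
SET query_mem_limit = 0;
```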

Parameter Changes

  • Added the FE configuration item http_worker_threads_num, which specifies the number of threads for HTTP server to deal with HTTP requests. The default value is 0. If the value for this parameter is set to a negative value or 0, the actual thread number is twice the number of CPU cores. #37530
  • Added the BE configuration item lake_pk_compaction_max_input_rowsets, which controls the maximum number of input rowsets allowed in a Primary Key table compaction task in a shared-data StarRocks cluster. This helps optimize resource consumption for compaction tasks. #39611
  • Added the session variable connector_sink_compression_codec, which specifies the compression algorithm used for writing data into Hive tables or Iceberg tables, or exporting data with Files(). Valid algorithms include GZIP, BROTLI, ZSTD, and LZ4. #37912
  • Added the FE configuration item routine_load_unstable_threshold_second. #36222
  • Added the BE configuration item pindex_major_compaction_limit_per_disk to configure the maximum concurrency of compaction on a disk. This addresses the issue of uneven I/O across disks due to compaction. This issue can cause excessively high I/O for certain disks. The default value is 1. #36681
  • Added the BE configuration item enable_lazy_delta_column_compaction. The default value is true, indicating that StarRocks does not perform frequent compaction operations on delta columns. #36654
  • Added the FE configuration item default_mv_refresh_immediate, which specifies whether to immediately refresh the materialized view after the materialized view is created. The default value is true. #37093
  • Changed the default value of the FE configuration item default_mv_refresh_partition_num to 1. This indicates that when multiple partitions need to be updated during a materialized view refresh, the task will be split in batches, refreshing only one partition at a time. This helps reduce resource consumption during each refresh. #36560

Improvements

  • Added date formats yyyy-MM-ddTHH:mm and yyyy-MM-dd HH:mm to support TIMESTAMP partition fields in Apache Iceberg tables. #39986
  • Added Data Cache-related metrics to the monitoring API. #40375
  • Optimized BE log printing to prevent too many irrelevant logs. #22820 #36187
  • Added the field storage_medium to the view information_schema.be_tablets. #37070
  • Supports SET_VAR in multiple sub-queries. #36871
  • A new field LatestSourcePosition is added to the return result of SHOW ROUTINE LOAD to record the position of the latest message in each partition of the Kafka topic, helping check the latencies of data loading. #38298
  • When the string on the right side of the LIKE operator within the WHERE clause does not include % or _, the LIKE operator is converted into the = operator. #37515
  • The default retention period of trash files is changed to 1 day from the original 3 days. #37113
  • Supports collecting statistics from Iceberg tables with Partition Transform. #39907
  • The scheduling policy for Routine Load is optimized, so that slow tasks do not block the execution of the other normal tasks. #37638
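
To illustrate the LIKE optimization listed above: when the pattern contains neither % nor _, the two statements below are expected to be planned the same way. The table and column names are hypothetical.

```sql
-- No wildcard in the pattern, so LIKE can be converted into "=".
SELECT * FROM users WHERE name LIKE 'alice';

-- Equivalent form after the conversion.
SELECT * FROM users WHERE name = 'alice';
```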

Bug Fixes

Fixed the following issues:

  • The execution of ANALYZE TABLE gets stuck occasionally. #36836
  • The memory consumption by PageCache exceeds the threshold specified by the BE dynamic parameter storage_page_cache_limit in certain circumstances. #37740
  • Hive metadata in Hive catalogs is not automatically refreshed when new fields are added to Hive tables. #37549
  • In some cases, bitmap_to_string may return incorrect results due to data type overflow. #37405
  • When SELECT ... FROM ... INTO OUTFILE is executed to export data into CSV files, the error "Unmatched number of columns" is reported if the FROM clause contains multiple constants. #38045
  • In some cases, querying semi-structured data in tables may cause BEs to crash. #40208

3.1.8

2 months ago

Release date: February 6, 2024

New Features

  • StarRocks Community provides the StarRocks Cross-cluster Data Migration Tool, which supports migrating data from a shared-nothing cluster to either another shared-nothing cluster or a shared-data cluster.
  • Supports creating synchronous materialized views with the WHERE clause specified.
  • Added metrics that show memory usage of the data cache to MemTracker. #39600
  • Added an array function, array_unique_agg.
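
A synchronous materialized view with a WHERE clause might look like the sketch below. The view, table, and column names are hypothetical, and details such as whether aggregate columns need aliases should be checked against the CREATE MATERIALIZED VIEW reference.

```sql
-- Hypothetical synchronous materialized view that only
-- pre-aggregates rows matching the filter.
CREATE MATERIALIZED VIEW large_orders_mv AS
SELECT customer_id, SUM(amount)
FROM orders
WHERE amount > 100
GROUP BY customer_id;
```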

Parameter Change

  • Added a BE configuration item, lake_pk_compaction_max_input_rowsets, which controls the maximum number of input rowsets allowed in a Primary Key table compaction task in a shared-data StarRocks cluster. This helps optimize resource consumption for compaction tasks. #39611

Improvements

  • Supports ORDER BY and INDEX clauses in CTAS statements. #38886
  • Supports equality deletes on ORC-formatted Iceberg v2 tables. #37419
  • Supports setting the datacache.partition_duration property for cloud-native tables created with the list partitioning strategy. This property controls the validity period of the data cache and can be dynamically configured. #35681 #38509
  • Optimized the BE configuration item update_compaction_per_tablet_min_interval_seconds. This parameter is originally used only to control the frequency of compaction tasks on Primary Key tables. After the optimization, it can also be used to control the frequency of major compaction tasks on Primary Key table indexes. #39640
  • Parquet Reader supports converting INT32-type data in Parquet-formatted data to DATETIME-type data and storing the resulting data to StarRocks. #39808
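
Because datacache.partition_duration can be configured dynamically, it could be changed on an existing table roughly as follows. The table name is hypothetical, and the "1 MONTH" value format is an assumption.

```sql
-- Keep only the most recent month of partitions eligible for the data cache
-- (value format assumed to be "<integer> <unit>").
ALTER TABLE events SET ("datacache.partition_duration" = "1 MONTH");
```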

Bug Fixes

Fixed the following issues:

  • Using NaN (Not a Number) columns as ORDER BY columns may cause BEs to crash. #30759
  • Failure to update primary key indexes may cause the error "get_applied_rowsets failed". #27488
  • The resources occupied by compaction_state_cache are not recycled after compaction task failures. #38499
  • If partition columns in external tables contain null values, queries against those tables will cause BEs to crash. #38888
  • After a table is dropped and then re-created with the same table name, refreshing asynchronous materialized views created on that table fails. #38008
  • Refreshing asynchronous materialized views created on empty Iceberg tables fails. #24068

3.1.7

3 months ago

Release date: January 12, 2024

Behavior Change

  • Added the session variable enable_materialized_view_for_insert, which controls whether materialized views rewrite the queries in INSERT INTO SELECT statements. The default value is false. #37505
  • The FE dynamic parameter enable_new_publish_mechanism is changed to a static parameter. You must restart the FE after you modify the parameter settings. #35338
  • Added the session variable enable_strict_order_by. When this variable is set to the default value TRUE, an error is reported for such a query pattern: Duplicate alias is used in different expressions of the query and this alias is also a sorting field in ORDER BY, for example, select distinct t1.* from tbl1 t1 order by t1.k1;. The logic is the same as that in v2.3 and earlier. When this variable is set to FALSE, a loose deduplication mechanism is used, which processes such queries as valid SQL queries. #37910

Parameter Change

  • Added the FE configuration item routine_load_unstable_threshold_second. #36222
  • Added the FE configuration item http_worker_threads_num, which specifies the number of threads for HTTP server to deal with HTTP requests. The default value is 0. If the value for this parameter is set to a negative value or 0, the actual thread number is twice the number of CPU cores. #37530
  • Added the BE configuration item pindex_major_compaction_limit_per_disk to configure the maximum concurrency of compaction on a disk. This addresses the issue of uneven I/O across disks due to compaction. This issue can cause excessively high I/O for certain disks. The default value is 1. #36681
  • Added session variables transaction_read_only and tx_read_only to specify the transaction access mode, which are compatible with MySQL versions 5.7.20 and above. #37249
  • Added the FE configuration item default_mv_refresh_immediate, which specifies whether to immediately refresh the materialized view after the materialized view is created. The default value is true. #37093
  • Added a new BE configuration item lake_enable_vertical_compaction_fill_data_cache, which specifies whether to allow compaction tasks to cache data on local disks in a shared-data cluster. The default value is false. #37296

Improvements

  • INSERT INTO FILE() SELECT FROM supports reading BINARY-type data from tables and exporting the data to Parquet-formatted files in remote storage. #36797
  • Asynchronous materialized views support dynamically setting the datacache.partition_duration property, which controls the validity period of the hot data in the data cache. #35681
  • When using JDK, the default GC algorithm is G1. #37386
  • The date_trunc, adddate, and time_slice functions support setting the interval parameter to values that are accurate to the millisecond and microsecond. #36386
  • When the string on the right side of the LIKE operator within the WHERE clause does not include % or _, the LIKE operator is converted into the = operator. #37515
  • A new field LatestSourcePosition is added to the return result of SHOW ROUTINE LOAD to record the position of the latest message in each partition of the Kafka topic, helping check the latencies of data loading. #38298
  • Added a new resource group property, spill_mem_limit_threshold, to control the memory usage threshold (percentage) at which a resource group triggers the spilling of intermediate results when the system variable spill_mode is set to auto. The valid range is (0, 1). The default value is 1, indicating the threshold does not take effect. #37707
  • The result returned by the SHOW ROUTINE LOAD statement now includes the timestamps of consumption messages from each partition. #36222
  • The scheduling policy for Routine Load is optimized, so that slow tasks do not block the execution of the other normal tasks. #37638
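
The resource group property spill_mem_limit_threshold could be set as sketched below. The resource group name is hypothetical, and the sketch assumes spilling is otherwise enabled for the session.

```sql
-- The threshold only applies when spill_mode is auto.
SET spill_mode = 'auto';

-- Trigger spilling once the hypothetical group rg_reports uses
-- 80% of its memory (valid range is (0, 1); 1 disables the threshold).
ALTER RESOURCE GROUP rg_reports WITH ('spill_mem_limit_threshold' = '0.8');
```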

Bug Fixes

Fixed the following issues:

  • The execution of ANALYZE TABLE gets stuck occasionally. #36836
  • The memory consumption by PageCache exceeds the threshold specified by the BE dynamic parameter storage_page_cache_limit in certain circumstances. #37740
  • Hive metadata in Hive catalogs is not automatically refreshed when new fields are added to Hive tables. #37668
  • In some cases, bitmap_to_string may return incorrect results due to data type overflow. #37405
  • Executing the DELETE statement on an empty table returns "ERROR 1064 (HY000): Index: 0, Size: 0". #37461
  • When the FE dynamic parameter enable_sync_publish is set to TRUE, queries on data that is written after the BEs crash and then restart may fail. #37398
  • The value of the TABLE_CATALOG field in views of the StarRocks Information Schema is null. #37570
  • When SELECT ... FROM ... INTO OUTFILE is executed to export data into CSV files, the error "Unmatched number of columns" is reported if the FROM clause contains multiple constants. #38045

2.5.18

3 months ago

Release date: January 10, 2024

New Features

  • Users can set or modify session variables when they CREATE or ALTER asynchronous materialized views. #37401
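
One way this may look in practice is shown below. The materialized view name is hypothetical, and the use of a session. prefix on the property name is an assumption to confirm in the ALTER MATERIALIZED VIEW reference.

```sql
-- Give refresh tasks of a hypothetical materialized view a longer timeout
-- by attaching a session variable to it ("session." prefix assumed).
ALTER MATERIALIZED VIEW order_mv SET ("session.query_timeout" = "3600");
```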

Improvements

  • When using JDK, the default GC algorithm is G1. #37498
  • The result returned by the SHOW ROUTINE LOAD statement now includes the timestamps of consumption messages from each partition. #36222

Behavior Change

  • Added the session variable enable_materialized_view_for_insert, which controls whether materialized views rewrite the queries in INSERT INTO SELECT statements. The default value is false. #37505
  • Added the session variable enable_strict_order_by. When this variable is set to the default value TRUE, an error is reported for such a query pattern: Duplicate alias is used in different expressions of the query and this alias is also a sorting field in ORDER BY, for example, select distinct t1.* from tbl1 t1 order by t1.k1;. The logic is the same as that in v2.3 and earlier. When this variable is set to FALSE, a loose deduplication mechanism is used, which processes such queries as valid SQL queries. #37910

Parameter Change

  • Added session variables transaction_read_only and tx_read_only to specify the transaction access mode, which are compatible with MySQL versions 5.7.20 and above. #37249
  • Added the FE configuration item routine_load_unstable_threshold_second. #36222
  • Added the FE configuration item http_worker_threads_num, which specifies the number of threads for HTTP server to deal with HTTP requests. The default value is 0. If the value for this parameter is set to a negative value or 0, the actual thread number is twice the number of CPU cores. #37530
  • Added the BE configuration item pindex_major_compaction_limit_per_disk to configure the maximum concurrency of compaction on a disk. This addresses the issue of uneven I/O across disks due to compaction. This issue can cause excessively high I/O for certain disks. The default value is 1. #37695

Bug Fixes

Fixed the following issues:

  • Using NaN (Not a Number) columns as ORDER BY columns may cause BEs to crash. #30759
  • Failure to update primary key indexes may cause the error "get_applied_rowsets failed". #27488
  • Hive metadata in Hive catalogs is not automatically refreshed when new fields are added to Hive tables. #37668
  • When SELECT ... FROM ... INTO OUTFILE is executed to export data into CSV files, the error "Unmatched number of columns" is reported if the FROM clause contains multiple constants. #38045
  • In some cases, bitmap_to_string may return incorrect results due to data type overflow. #37405

3.0.9

3 months ago

Release date: January 2, 2024

New Features

  • Added the percentile_disc function. #36352
  • Added a new metric max_tablet_rowset_num for setting the maximum allowed number of rowsets. This metric helps detect possible compaction issues and thus reduces the occurrences of the error "too many versions". #36539
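
A minimal sketch of percentile_disc, assuming the two-argument form (expression, percentile between 0 and 1). The table and column names are hypothetical.

```sql
-- Median score as an actual value from the column
-- (discrete percentile, no interpolation).
SELECT percentile_disc(score, 0.5) FROM exam_results;
```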

Improvements

  • A new value option GROUP_CONCAT_LEGACY is added to the session variable sql_mode to provide compatibility with the implementation logic of the group_concat function in versions earlier than v2.5. #36150
  • When using JDK, the default GC algorithm is changed to G1. #37386
  • The be_tablets view in the information_schema database provides a new field INDEX_DISK, which records the disk usage (measured in bytes) of persistent indexes. #35615
  • Queries on MySQL external tables and the external tables within JDBC catalogs support including keywords in the WHERE clause. #35917
  • Supports updates onto the specified partitions of an automatically partitioned table. If the specified partitions do not exist, an error is returned. #34777
  • The Primary Key table size returned by the SHOW DATA statement includes the sizes of .cols files (these are files related to partial column updates and generated columns) and persistent index files. #34898
  • Optimized the performance of persistent index update when compaction is performed on all rowsets of a Primary Key table, which reduces disk read I/O. #36819
  • When the string on the right side of the LIKE operator within the WHERE clause does not include % or _, the LIKE operator is converted into the = operator. #37515
  • Optimized the logic used to compute compaction scores for Primary Key tables, thereby aligning the compaction scores for Primary Key tables within a more consistent range with the other three table types. #36534
  • The result returned by the SHOW ROUTINE LOAD statement now includes the timestamps of consumption messages from each partition. #36222
  • Optimized the performance of some Bitmap-related operations, including:
    • Optimized nested loop joins. #340804 #35003
    • Optimized the bitmap_xor function. #34069
    • Supports Copy on Write to optimize Bitmap performance and reduce memory consumption. #34047
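
The GROUP_CONCAT_LEGACY option mentioned in the first improvement could be enabled per session as sketched here. Note that assigning sql_mode this way replaces any modes already set. The table, the column, and the legacy two-argument call form are assumptions.

```sql
-- Use the pre-v2.5 group_concat implementation for this session.
SET sql_mode = 'GROUP_CONCAT_LEGACY';

-- Legacy call form assumed: expression followed by the separator.
SELECT group_concat(name, ', ') FROM users;
```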

Compatibility Changes

Behavior Change

  • Added the session variable enable_materialized_view_for_insert, which controls whether materialized views rewrite the queries in INSERT INTO SELECT statements. The default value is false. #37505
  • Changed the FE configuration item enable_new_publish_mechanism to a static parameter from a dynamic one. You must restart the FE after you modify the parameter settings. #35338
  • The default retention period of trash files is changed to 1 day from the original 3 days. #37113

Parameters

Session variables

  • Added session variable cbo_decimal_cast_string_strict, which controls how the CBO converts data from the DECIMAL type to the STRING type. If this variable is set to true, the logic built in v2.5.x and later versions prevails and the system implements strict conversion (namely, the system truncates the generated string and fills 0s based on the scale length). If this variable is set to false, the logic built in versions earlier than v2.5.x prevails and the system processes all valid digits to generate a string. The default value is true. #34208
  • Added session variables transaction_read_only and tx_read_only to specify the transaction access mode, which are compatible with MySQL versions 5.7.20 and above. #37249
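
As a rough illustration of the two modes described for cbo_decimal_cast_string_strict, the following Python sketch shows one reading of the note: strict mode truncates to the column scale and pads with zeros, while non-strict mode keeps the digits as they are. The function decimal_to_string and its parameters are hypothetical, and the non-strict branch in particular is an assumption about what "processes all valid digits" means.

```python
from decimal import Decimal, ROUND_DOWN


def decimal_to_string(value: Decimal, scale: int, strict: bool = True) -> str:
    """Convert a decimal to text under an assumed reading of the two modes.

    strict=True : truncate to `scale` fractional digits and pad with zeros.
    strict=False: emit the digits of the value as they are (assumption).
    """
    if strict:
        unit = Decimal(1).scaleb(-scale)  # e.g. scale=3 gives 0.001
        return str(value.quantize(unit, rounding=ROUND_DOWN))
    return format(value, "f")


if __name__ == "__main__":
    print(decimal_to_string(Decimal("1.5"), scale=3))                # padded to the scale
    print(decimal_to_string(Decimal("1.23456"), scale=3))            # truncated to the scale
    print(decimal_to_string(Decimal("1.5"), scale=3, strict=False))  # digits kept as-is
```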

FE configurations

  • Added the FE configuration item routine_load_unstable_threshold_second. #36222
  • Added the FE configuration item http_worker_threads_num, which specifies the number of threads that the HTTP server uses to handle HTTP requests. The default value is 0. If this parameter is set to 0 or a negative value, the actual number of threads is twice the number of CPU cores. #37530
  • Added the FE configuration item default_mv_refresh_immediate, which specifies whether to immediately refresh the materialized view after the materialized view is created. The default value is true. #37093
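
The thread-count rule stated for http_worker_threads_num can be written out as a small Python sketch. The function effective_http_worker_threads is a hypothetical name used for illustration only; it restates the rule in these notes and is not taken from the FE source code.

```python
import os
from typing import Optional


def effective_http_worker_threads(configured: int,
                                  cpu_cores: Optional[int] = None) -> int:
    """Return the thread count implied by the configured value.

    A positive value is used as-is; 0 or a negative value falls back
    to twice the number of CPU cores, as described in the release note.
    """
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    return configured if configured > 0 else 2 * cores


if __name__ == "__main__":
    print(effective_http_worker_threads(0, cpu_cores=8))    # default value 0
    print(effective_http_worker_threads(32, cpu_cores=8))   # explicit setting
```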

BE configurations

  • Added the BE configuration item enable_stream_load_verbose_log. The default value is false. With this parameter set to true, StarRocks can record the HTTP requests and responses for Stream Load jobs, making troubleshooting easier. #36113
  • Added the BE configuration item pindex_major_compaction_limit_per_disk to configure the maximum concurrency of compaction on a disk. This addresses uneven I/O across disks caused by compaction, which can lead to excessively high I/O on certain disks. The default value is 2. #36681
  • Added BE configuration items to specify the timeout duration for connecting to object storage:
    • object_storage_connect_timeout_ms: Timeout duration to establish socket connections with object storage. The default value is -1, which means that the default timeout duration of the SDK configurations is used.
    • object_storage_request_timeout_ms: Timeout duration to establish HTTP connections with object storage. The default value is -1, which means that the default timeout duration of the SDK configurations is used.
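
The -1 default of the two timeout items acts as a "use the SDK default" marker. A minimal Python sketch of that fallback follows; resolve_timeout_ms and the example SDK default of 1000 ms are hypothetical and chosen only for illustration, since these notes do not state the SDK's real defaults.

```python
def resolve_timeout_ms(configured_ms: int, sdk_default_ms: int) -> int:
    """Pick the timeout to apply.

    -1 means "not set": fall back to the SDK's own default.
    Any other value is passed through unchanged in this sketch.
    """
    return sdk_default_ms if configured_ms == -1 else configured_ms


if __name__ == "__main__":
    assumed_sdk_default = 1000  # hypothetical value, not from the SDK docs
    print(resolve_timeout_ms(-1, assumed_sdk_default))    # falls back to SDK default
    print(resolve_timeout_ms(5000, assumed_sdk_default))  # explicit setting wins
```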

Bug Fixes

Fixed the following issues:

  • In some cases, BEs may crash when a Catalog is used to read ORC external tables. #27971
  • The BEs crash if users create persistent indexes in the event of data corruption. #30841
  • BEs occasionally crash after a Bitmap index is added. #26463
  • Failures in replaying replica operations may cause FEs to crash. #32295
  • Setting the FE parameter recover_with_empty_tablet to true may cause FEs to crash. #33071
  • Queries fail during hash joins, causing BEs to crash. #32219
  • In a StarRocks shared-nothing cluster, queries against Iceberg or Hive tables may cause BEs to crash. #34682
  • The error "get_applied_rowsets failed, tablet updates is in error state: tablet:18849 actual row size changed after compaction" is returned for queries. #33246
  • Running show proc '/statistic' may cause a deadlock. #34237
  • The FE performance plunges after the FE configuration item enable_collect_query_detail_info is set to true. #35945
  • Errors may be thrown if large amounts of data are loaded into a Primary Key table with persistent index enabled. #34352
  • After StarRocks is upgraded from v2.4 or earlier to a later version, compaction scores may rise unexpectedly. #34618
  • If INFORMATION_SCHEMA is queried by using the database driver MariaDB ODBC, the CATALOG_NAME column returned in the schemata view holds only null values. #34627
  • FEs crash because of abnormal data loaded and cannot restart. #34590
  • If schema changes are being executed while a Stream Load job is in the PREPARED state, a portion of the source data to be loaded by the job is lost. #34381
  • Including two or more slashes (/) at the end of the HDFS storage path causes the backup and restore of the data from HDFS to fail. #34601
  • The partition_live_number property added by using the ALTER TABLE statement does not take effect. #34842
  • The array_distinct function occasionally causes the BEs to crash. #36377
  • Deadlocks may occur when users refresh materialized views. #35736
  • Global Runtime Filter may cause BEs to crash in certain scenarios. #35776
  • In some cases, bitmap_to_string may return an incorrect result due to data type overflow. #37405