StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. It received InfoWorld’s 2023 BOSSIE Award for best open source software.
Release date: March 29, 2024
IS NULL operator, they are considered NULL values following SQL language. For example, true is returned for SELECT parse_json('{"a": null}') -> 'a' IS NULL (before this behavior change, false is returned). #42815
Fixed the following issues:
orc_use_column_names to true, which specifies to read ORC files from Hive based on mapping by column name. #42905
Release date: March 22, 2024
replace_if_not_null supports BITMAP columns in an Aggregate table. Users can specify replace_if_not_null as the aggregate function for BITMAP columns in an Aggregate table. #42104
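The aggregation semantics of replace_if_not_null can be pictured with a small Python model. This is only an illustrative sketch: the function name, the (key, value) row layout, and the use of Python sets in place of BITMAP values are assumptions made for the example, not StarRocks code.

```python
def aggregate_replace_if_not_null(rows):
    """Fold (key, value) rows in load order; None stands in for SQL NULL."""
    result = {}
    for key, value in rows:
        if key not in result:
            result[key] = value   # first row seen for this key, even if NULL
        elif value is not None:
            result[key] = value   # a non-NULL value replaces the stored one
        # a NULL value for an existing key leaves the stored value untouched
    return result


# Hypothetical loads: sets stand in for BITMAP column values.
loads = [(1, {10, 20}), (1, None), (2, None), (2, {7}), (1, {30})]
print(aggregate_replace_if_not_null(loads))  # {1: {30}, 2: {7}}
```

Because a NULL input never overwrites an existing value, a load that supplies values for only some columns leaves the remaining columns as they were.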
update_compaction_size_threshold is changed from 256 MB to 64 MB to accelerate compaction. #42776
Fixed the following issues:
Release date: March 12, 2024
milliseconds_diff. #38171
catalog, which specifies the catalog to which the session belongs. #41329
information_schema.partitions_meta, which records detailed metadata of partitions. #39265
sys.fe_memory_usage, which records the memory usage for StarRocks. #40464
cbo_decimal_cast_string_strict is used to control how CBO converts data from the DECIMAL type to the STRING type. The default value true indicates that the logic built in v2.5.x and later versions prevails and the system implements strict conversion (namely, the system truncates the generated string and fills 0s based on the scale length). The DECIMAL type is not strictly filled in earlier versions, causing different results when comparing the DECIMAL type and the STRING type. #40619
enable_iceberg_metadata_cache has been changed to false. From v3.2.1 to v3.2.3, this parameter is set to true by default, regardless of what metastore service is used. In v3.2.4 and later, if the Iceberg cluster uses AWS Glue as metastore, this parameter still defaults to true. However, if the Iceberg cluster uses another metastore service such as Hive metastore, this parameter defaults to false. #41826
root user to the user who creates the materialized views. This change does not affect existing materialized views. #40670
cbo_eq_base_type to adjust the rule used for the comparison. For example, users can set cbo_eq_base_type to decimal, and StarRocks then compares the columns as numeric values. #40619
s3_compatible_fs_list to specify which S3-compatible object storage can be accessed via AWS SDK, and supports using the parameter fallback_to_hadoop_fs_list to specify non-S3-compatible object storages that require access via HDFS Schema (this method requires the use of vendor-provided JAR packages). #41123
agg_type of BITMAP-type columns in an Aggregate table can be set to replace_if_not_null in order to support updates only to a few columns of the table. #42034
Fixed the following issues:
Release date: March 8, 2024
regexp_extract_all. #42178
information_schema.partitions_meta, which records detailed metadata of partitions. #41101
sys.fe_memory_usage, which records the memory usage for StarRocks. #41083
root user to the user who creates the materialized views. This change does not affect existing materialized views. #40698
cbo_eq_base_type to adjust the default rule used for the comparison. For example, users can set cbo_eq_base_type to decimal, and StarRocks then compares the columns as numeric values. #41712
s3_compatible_fs_list to specify which S3-compatible object storage can be accessed via AWS SDK, and supports using the parameter fallback_to_hadoop_fs_list to specify non-S3-compatible object storage that requires access via HDFS Schema (this method requires the use of vendor-provided JAR packages). #41612
current_catalog, current_schema, to_char, from_hex, to_date, to_timestamp, and index. #41505 #41270 #40838
cbo_materialized_view_rewrite_related_mvs_limit is added to control the maximum number of candidate materialized views allowed during query planning. The default value of this session variable is 64. This session variable helps mitigate the excessive resource consumption caused by a large number of candidate materialized views for a query during the query planning. #39829
agg_type of BITMAP-type columns in an Aggregate table can be set to replace_if_not_null to support updates only to a few columns of the table. #42102
cbo_eq_base_type is optimized to support specifying the implicit conversion rule applied to the comparison of data that contains both string and numeric data types. By default, such data is compared as strings. #40619
path parameter in the SQL statement for creating a file external table supports wildcards (*). However, like the DATA INFILE parameter in the SQL statement for creating a Broker Load job, the path parameter supports using wildcards (*) to match at most one level of directory or file. #40844
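The "at most one level" rule for wildcards can be illustrated with a short Python sketch in which * never crosses a path separator. The helper names and the sample HDFS paths are hypothetical, and the matcher StarRocks actually uses may differ in details such as escaping.

```python
import re


def one_level_pattern(pattern: str):
    """Compile a path pattern in which '*' matches within a single path level."""
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    # '[^/]*' matches any run of characters that does not contain '/'.
    return re.compile("[^/]*".join(literal_parts))


def matches(pattern: str, path: str) -> bool:
    return one_level_pattern(pattern).fullmatch(path) is not None


# Hypothetical paths, for illustration only.
print(matches("hdfs://nn/data/*/part.parquet", "hdfs://nn/data/2024/part.parquet"))
print(matches("hdfs://nn/data/*/part.parquet", "hdfs://nn/data/2024/03/part.parquet"))
```

The first call reports a match because * stands for the single directory 2024. The second does not, because 2024/03 spans two directory levels.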
Fixed the following issues:
Release date: February 8, 2024
yyyy-MM-ddTHH:mm and yyyy-MM-dd HH:mm to support TIMESTAMP partition fields in Apache Iceberg tables. #39986
Fixed the following issues:
storage_page_cache_limit in certain circumstances. #37740
array<string> data from ORC files into StarRocks (array<json>) may cause BEs to crash. #39233
Release date: February 10, 2024
enable_strict_order_by. When this variable is set to the default value TRUE, an error is reported for the following query pattern: a duplicate alias is used in different expressions of the query and this alias is also a sorting field in ORDER BY (for example, select distinct t1.* from tbl1 t1 order by t1.k1;). The logic is the same as that in v2.3 and earlier. When this variable is set to FALSE, a loose deduplication mechanism is used, which processes such queries as valid SQL queries. #37910
enable_materialized_view_for_insert, which controls whether materialized views rewrite the queries in INSERT INTO SELECT statements. The default value is false. #37505
query_mem_limit instead of exec_mem_limit. Setting the value of query_mem_limit to 0 indicates no limit. #34120
http_worker_threads_num, which specifies the number of threads for HTTP server to deal with HTTP requests. The default value is 0. If the value for this parameter is set to a negative value or 0, the actual thread number is twice the number of CPU cores. #37530
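The rule for http_worker_threads_num can be written out as a few lines of Python. This is a sketch of the stated rule only: the function name is hypothetical, and the assumption that a positive value is used as-is is not spelled out in the note above.

```python
def effective_http_worker_threads(configured: int, cpu_cores: int) -> int:
    """Return the thread count implied by the configured value."""
    if configured <= 0:
        # Zero (the default) or a negative value: twice the number of CPU cores.
        return 2 * cpu_cores
    # Assumption: a positive value is taken literally.
    return configured


print(effective_http_worker_threads(0, 8))   # 16
print(effective_http_worker_threads(12, 8))  # 12
```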
lake_pk_compaction_max_input_rowsets, which controls the maximum number of input rowsets allowed in a Primary Key table compaction task in a shared-data StarRocks cluster. This helps optimize resource consumption for compaction tasks. #39611
connector_sink_compression_codec, which specifies the compression algorithm used for writing data into Hive tables or Iceberg tables, or exporting data with Files(). Valid algorithms include GZIP, BROTLI, ZSTD, and LZ4. #37912
routine_load_unstable_threshold_second. #36222
pindex_major_compaction_limit_per_disk to configure the maximum concurrency of compaction on a disk. This addresses the issue of uneven I/O across disks due to compaction. This issue can cause excessively high I/O for certain disks. The default value is 1. #36681
enable_lazy_delta_column_compaction. The default value is true, indicating that StarRocks does not perform frequent compaction operations on delta columns. #36654
default_mv_refresh_immediate, which specifies whether to immediately refresh the materialized view after the materialized view is created. The default value is true. #37093
default_mv_refresh_partition_num to 1. This indicates that when multiple partitions need to be updated during a materialized view refresh, the task will be split in batches, refreshing only one partition at a time. This helps reduce resource consumption during each refresh. #36560
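The batching described for default_mv_refresh_partition_num can be sketched in Python as splitting the list of partitions to refresh into fixed-size groups. The function name and partition names are hypothetical, and the sketch models only the batching idea, not how StarRocks schedules refresh tasks.

```python
def refresh_batches(partitions, partition_num=1):
    """Split the partitions to refresh into batches of at most partition_num."""
    if partition_num < 1:
        raise ValueError("partition_num must be at least 1 in this sketch")
    return [
        partitions[start:start + partition_num]
        for start in range(0, len(partitions), partition_num)
    ]


# With the value 1, each batch refreshes a single partition.
print(refresh_batches(["p20240101", "p20240102", "p20240103"]))
# [['p20240101'], ['p20240102'], ['p20240103']]
```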
yyyy-MM-ddTHH:mm and yyyy-MM-dd HH:mm to support TIMESTAMP partition fields in Apache Iceberg tables. #39986
storage_medium to the view information_schema.be_tablets. #37070
SET_VAR in multiple sub-queries. #36871
LatestSourcePosition is added to the return result of SHOW ROUTINE LOAD to record the position of the latest message in each partition of the Kafka topic, helping check the latencies of data loading. #38298
% or _, the LIKE operator is converted into the = operator. #37515
Fixed the following issues:
storage_page_cache_limit in certain circumstances. #37740
bitmap_to_string may return incorrect results due to data type overflow. #37405
SELECT ... FROM ... INTO OUTFILE is executed to export data into CSV files, the error "Unmatched number of columns" is reported if the FROM clause contains multiple constants. #38045
Release date: February 6, 2024
Fixed the following issues:
Release date: January 12, 2024
unnest_bitmap. #38136
enable_materialized_view_for_insert, which controls whether materialized views rewrite the queries in INSERT INTO SELECT statements. The default value is false. #37505
enable_new_publish_mechanism is changed to a static parameter. You must restart the FE after you modify the parameter settings. #35338
enable_strict_order_by. When this variable is set to the default value TRUE, an error is reported for the following query pattern: a duplicate alias is used in different expressions of the query and this alias is also a sorting field in ORDER BY (for example, select distinct t1.* from tbl1 t1 order by t1.k1;). The logic is the same as that in v2.3 and earlier. When this variable is set to FALSE, a loose deduplication mechanism is used, which processes such queries as valid SQL queries. #37910
routine_load_unstable_threshold_second. #36222
http_worker_threads_num, which specifies the number of threads for HTTP server to deal with HTTP requests. The default value is 0. If the value for this parameter is set to a negative value or 0, the actual thread number is twice the number of CPU cores. #37530
pindex_major_compaction_limit_per_disk to configure the maximum concurrency of compaction on a disk. This addresses the issue of uneven I/O across disks due to compaction. This issue can cause excessively high I/O for certain disks. The default value is 1. #36681
transaction_read_only and tx_read_only to specify the transaction access mode, which are compatible with MySQL versions 5.7.20 and above. #37249
default_mv_refresh_immediate, which specifies whether to immediately refresh the materialized view after the materialized view is created. The default value is true. #37093
lake_enable_vertical_compaction_fill_data_cache, which specifies whether to allow compaction tasks to cache data on local disks in a shared-data cluster. The default value is false. #37296
datacache.partition_duration property, which controls the validity period of the hot data in the data cache. #35681
date_trunc, adddate, and time_slice functions support setting the interval parameter to values that are accurate to the millisecond and microsecond. #36386
% or _, the LIKE operator is converted into the = operator. #37515
LatestSourcePosition is added to the return result of SHOW ROUTINE LOAD to record the position of the latest message in each partition of the Kafka topic, helping check the latencies of data loading. #38298
spill_mem_limit_threshold, to control the memory usage threshold (percentage) at which a resource group triggers the spilling of intermediate results when the system variable spill_mode is set to auto. The valid range is (0, 1). The default value is 1, indicating the threshold does not take effect. #37707
Fixed the following issues:
storage_page_cache_limit in certain circumstances. #37740
bitmap_to_string may return incorrect results due to data type overflow. #37405
enable_sync_publish is set to TRUE, queries on data that is written after the BEs crash and then restart may fail. #37398
TABLE_CATALOG field in views of the StarRocks Information Schema is null. #37570
SELECT ... FROM ... INTO OUTFILE is executed to export data into CSV files, the error "Unmatched number of columns" is reported if the FROM clause contains multiple constants. #38045
Release date: January 10, 2024
enable_materialized_view_for_insert, which controls whether materialized views rewrite the queries in INSERT INTO SELECT statements. The default value is false. #37505
enable_strict_order_by. When this variable is set to the default value TRUE, an error is reported for the following query pattern: a duplicate alias is used in different expressions of the query and this alias is also a sorting field in ORDER BY (for example, select distinct t1.* from tbl1 t1 order by t1.k1;). The logic is the same as that in v2.3 and earlier. When this variable is set to FALSE, a loose deduplication mechanism is used, which processes such queries as valid SQL queries. #37910
transaction_read_only and tx_read_only to specify the transaction access mode, which are compatible with MySQL versions 5.7.20 and above. #37249
routine_load_unstable_threshold_second. #36222
http_worker_threads_num, which specifies the number of threads for HTTP server to deal with HTTP requests. The default value is 0. If the value for this parameter is set to a negative value or 0, the actual thread number is twice the number of CPU cores. #37530
pindex_major_compaction_limit_per_disk to configure the maximum concurrency of compaction on a disk. This addresses the issue of uneven I/O across disks due to compaction. This issue can cause excessively high I/O for certain disks. The default value is 1. #37695
Fixed the following issues:
SELECT ... FROM ... INTO OUTFILE is executed to export data into CSV files, the error "Unmatched number of columns" is reported if the FROM clause contains multiple constants. #38045
bitmap_to_string may return incorrect results due to data type overflow. #37405
Release date: January 2, 2024
max_tablet_rowset_num for setting the maximum allowed number of rowsets. This metric helps detect possible compaction issues and thus reduces the occurrences of the error "too many versions". #36539
GROUP_CONCAT_LEGACY is added to the session variable sql_mode to provide compatibility with the implementation logic of the group_concat function in versions earlier than v2.5. #36150
be_tablets view in the information_schema database provides a new field INDEX_DISK, which records the disk usage (measured in bytes) of persistent indexes. #35615
% or _, the LIKE operator is converted into the = operator. #37515
enable_materialized_view_for_insert, which controls whether materialized views rewrite the queries in INSERT INTO SELECT statements. The default value is false. #37505
enable_new_publish_mechanism to a static parameter from a dynamic one. You must restart the FE after you modify the parameter settings. #35338
cbo_decimal_cast_string_strict, which controls how the CBO converts data from the DECIMAL type to the STRING type. If this variable is set to true, the logic built in v2.5.x and later versions prevails and the system implements strict conversion (namely, the system truncates the generated string and fills 0s based on the scale length). If this variable is set to false, the logic built in versions earlier than v2.5.x prevails and the system processes all valid digits to generate a string. The default value is true. #34208
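The difference between the two conversion modes can be approximated with Python's decimal module. This is a rough sketch under stated assumptions: the function name and the scale argument are hypothetical, "strict" is modeled as truncating to the column scale and padding with zeros, and "non-strict" is modeled as emitting only the significant digits. The optimizer's real behavior may differ in edge cases.

```python
from decimal import Decimal, ROUND_DOWN


def decimal_to_string(value: Decimal, scale: int, strict: bool = True) -> str:
    """Model DECIMAL-to-STRING conversion for a column with the given scale."""
    if strict:
        # Truncate extra digits and pad with zeros up to `scale` decimal places.
        quantum = Decimal(1).scaleb(-scale)
        return format(value.quantize(quantum, rounding=ROUND_DOWN), "f")
    # Assumption: the older logic keeps only the significant digits.
    return format(value.normalize(), "f")


print(decimal_to_string(Decimal("1.5"), scale=4))                    # 1.5000
print(decimal_to_string(Decimal("1.23456"), scale=4))                # 1.2345
print(decimal_to_string(Decimal("1.5000"), scale=4, strict=False))   # 1.5
```

Under this model, the strings produced by the two modes differ for the same value, which is why comparing a DECIMAL column with a STRING value can give different results depending on the setting.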
transaction_read_only and tx_read_only to specify the transaction access mode, which are compatible with MySQL versions 5.7.20 and above. #37249
routine_load_unstable_threshold_second. #36222
http_worker_threads_num, which specifies the number of threads for HTTP server to deal with HTTP requests. The default value is 0. If the value for this parameter is set to a negative value or 0, the actual thread number is twice the number of CPU cores. #37530
default_mv_refresh_immediate, which specifies whether to immediately refresh the materialized view after the materialized view is created. The default value is true. #37093
enable_stream_load_verbose_log. The default value is false. With this parameter set to true, StarRocks can record the HTTP requests and responses for Stream Load jobs, making troubleshooting easier. #36113
pindex_major_compaction_limit_per_disk to configure the maximum concurrency of compaction on a disk. This addresses the issue of uneven I/O across disks due to compaction. This issue can cause excessively high I/O for certain disks. The default value is 2. #36681
object_storage_connect_timeout_ms: Timeout duration to establish socket connections with object storage. The default value is -1, which means to use the default timeout duration of the SDK configurations.
object_storage_request_timeout_ms: Timeout duration to establish HTTP connections with object storage. The default value is -1, which means to use the default timeout duration of the SDK configurations.
Fixed the following issues:
recover_with_empty_tablet to true may cause FEs to crash. #33071
show proc '/statistic' may cause a deadlock. #34237
enable_collect_query_detail_info is set to true. #35945
INFORMATION_SCHEMA is queried by using the database driver MariaDB ODBC, the CATALOG_NAME column returned in the schemata view holds only null values. #34627
/) at the end of the HDFS storage path causes the backup and restore of the data from HDFS to fail. #34601
partition_live_number property added by using the ALTER TABLE statement does not take effect. #34842
bitmap_to_string may return incorrect results due to data type overflow. #37405