Iceberg Versions

Apache Iceberg

apache-iceberg-1.5.1

2 weeks ago

What's Changed

Full Changelog: https://github.com/apache/iceberg/compare/apache-iceberg-1.5.0...apache-iceberg-1.5.1

apache-iceberg-1.5.0

2 months ago

Apache Iceberg 1.5.0 was released on March 11, 2024. The 1.5.0 release adds a variety of new features and bug fixes.

  • API
    • Extend FileIO and add EncryptingFileIO (#9592)
    • Track partition statistics in TableMetadata (#8502)
    • Add sqlFor API to views to handle resolving a representation for a dialect (#9247)
  • Core
    • Add view support for REST catalog (#7913)
    • Add view support for JDBC catalog (#9487)
    • Add catalog type for glue, jdbc, and nessie (#9647)
    • Support Avro file encryption with AES GCM streams (#9436)
    • Add ApplyNameMapping for Avro (#9347)
    • Add StandardEncryptionManager (#9277)
    • Add REST catalog table session cache (#8920)
    • Support view metadata compression (#8552)
    • Track partition statistics in TableMetadata (#8502)
    • Enable column statistics filtering after planning (#8803)
  • Spark
    • Remove support for Spark 3.2 (#9295)
    • Support views via SQL for Spark 3.4 and 3.5 (#9423, #9421, #9343, #9513, #9582)
    • Support executor cache locality (#9563)
    • Add support for delete manifest rewrites (#9020)
    • Support encrypted output files (#9435)
    • Add Spark UI metrics from Iceberg scan metrics (#8717)
    • Parallelize reading files in add_files procedure (#9274)
    • Support file and partition delete granularity (#9384)
  • Flink
    • Remove support for Flink 1.15
    • Add support for Flink 1.18 (#9211)
    • Emit watermarks from the IcebergSource (#8553)
    • Add watermark read options (#9346)
  • Parquet
    • Support reading INT96 column in row group filter (#8988)
    • Add system config for unsafe Parquet ID fallback. (#9324)
  • Kafka-Connect
    • Initial project setup and event data structures (#8701)
    • Sink connector with data writers and converters (#9466)
  • Spec
    • Add partition stats spec (#7105)
    • Add nanosecond timestamp types (#8683)
    • Add multi-arg transform (#8579)
  • Vendor Integrations
    • AWS: Support setting description for Glue table (#9530)
    • AWS: Update S3FileIO test to run when CLIENT_FACTORY is not set (#9541)
    • AWS: Add S3 Access Grants Integration (#9385)
    • AWS: Strip trailing slash from DB URI in Glue catalog (#8870)
    • Azure: Add FileIO that supports ADLSv2 storage (#8303)
    • Azure: Make ADLSFileIO implement DelegateFileIO (#8563)
    • Nessie: Support views for NessieCatalog (#8909)
    • Nessie: Strip trailing slash for warehouse location (#9415)
    • Nessie: Infer default API version from URI (#9459)
  • Dependencies
    • Bump Nessie to 0.77.1
    • Bump ORC to 1.9.2
    • Bump Arrow to 15.0.0
    • Bump AWS Java SDK to 2.24.5
    • Bump Azure Java SDK to 1.2.20
    • Bump Google cloud libraries to 26.28.0

Note:

  1. To enable view support for JDBC catalog, configure jdbc.schema-version to V1 in catalog properties.
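As a minimal sketch, assuming the JDBC catalog is configured from Java through a plain property map, the note above translates into the properties below. The class name, method name, connection URI, warehouse path, and credentials are placeholders for illustration; only `jdbc.schema-version=V1` comes from the note itself, and `type=jdbc` relies on the catalog-type shortcut added in this release (#9647).

```java
import java.util.Map;
import java.util.TreeMap;

public class JdbcViewCatalogConfig {

  // Hypothetical helper: builds catalog properties for an Iceberg JDBC catalog
  // with view support enabled. All argument values are caller-supplied placeholders.
  public static Map<String, String> jdbcCatalogProperties(
      String jdbcUri, String warehouse, String user, String password) {
    Map<String, String> props = new TreeMap<>();
    props.put("type", "jdbc");              // catalog type shortcut (#9647)
    props.put("uri", jdbcUri);              // JDBC connection string for the backing database
    props.put("warehouse", warehouse);      // root location for table and view metadata
    props.put("jdbc.user", user);           // "jdbc."-prefixed keys are forwarded to the JDBC driver
    props.put("jdbc.password", password);
    props.put("jdbc.schema-version", "V1"); // required for view support, per the note above
    return props;
  }

  public static void main(String[] args) {
    Map<String, String> props = jdbcCatalogProperties(
        "jdbc:postgresql://localhost:5432/iceberg", // placeholder database URI
        "s3://my-bucket/warehouse",                  // placeholder warehouse location
        "iceberg",                                   // placeholder user
        "secret");                                   // placeholder password
    // With iceberg-core on the classpath, this map could be passed to
    // CatalogUtil.buildIcebergCatalog("my_catalog", props, hadoopConf).
    System.out.println("jdbc.schema-version=" + props.get("jdbc.schema-version"));
  }
}
```

The same keys can be supplied through an engine's catalog configuration instead, for example prefixed with `spark.sql.catalog.<name>.` in Spark.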

New Contributors

Full Changelog: https://github.com/apache/iceberg/compare/apache-iceberg-1.4.3...apache-iceberg-1.5.0

apache-iceberg-1.4.3

4 months ago

What's Changed

Full Changelog: https://github.com/apache/iceberg/compare/apache-iceberg-1.4.2...apache-iceberg-1.4.3

apache-iceberg-1.4.2

6 months ago

What's Changed

Full Changelog: https://github.com/apache/iceberg/compare/apache-iceberg-1.4.1...apache-iceberg-1.4.2

apache-iceberg-1.4.1

6 months ago

What's Changed

Full Changelog: https://github.com/apache/iceberg/compare/apache-iceberg-1.4.0...apache-iceberg-1.4.1

apache-iceberg-1.4.0

7 months ago
  • API
    • Implement bound expression sanitization (#8149)
    • Remove overflow checks in DefaultCounter causing performance issues (#8297)
    • Support incremental scanning with branch (#5984)
    • Add a validation API to DeleteFiles which validates files exist (#8525)
  • Core
    • Use V2 format by default in new tables (#8381)
    • Use zstd compression for Parquet by default in new tables (#8593)
    • Add strict metadata cleanup mode and enable it by default (#8397, #8599)
    • Avoid generating huge manifests during commits (#6335)
    • Add a writer for unordered position deletes (#7692)
    • Optimize DeleteFileIndex (#8157)
    • Optimize lookup in DeleteFileIndex without useful bounds (#8278)
    • Optimize split offsets handling (#8336)
    • Optimize computing user-facing state in data tasks (#8346)
    • Don't persist useless file and position bounds for deletes (#8360)
    • Don't persist counts for paths and positions in position delete files (#8590)
    • Support setting system-level properties via environment variables (#5659)
    • Add JSON parser for ContentFile and FileScanTask (#6934)
    • Add REST spec and request for commits to multiple tables (#7741)
    • Add REST API for committing changes against multiple tables (#7569)
    • Default to exponential retry strategy in REST client (#8366)
    • Support registering tables with REST session catalog (#6512)
    • Add last updated timestamp and snapshot ID to partitions metadata table (#7581)
    • Add total data size to partitions metadata table (#7920)
    • Extend ResolvingFileIO to support bulk operations (#7976)
    • Key metadata in Avro format (#6450)
    • Add AES GCM encryption stream (#3231)
    • Fix a connection leak in streaming delete filters (#8132)
    • Fix lazy snapshot loading history (#8470)
    • Fix unicode handling in HTTPClient (#8046)
    • Fix paths for unpartitioned specs in writers (#7685)
    • Fix OOM caused by Avro decoder caching (#7791)
  • Spark
    • Add support for Spark 3.5
      • Code for DELETE, UPDATE, and MERGE commands has moved to Spark, and all related extensions have been dropped from Iceberg.
      • Support for WHEN NOT MATCHED BY SOURCE clause in MERGE.
      • Column pruning in merge-on-read operations.
      • Ability to request a bigger advisory partition size for the final write to produce well-sized output files without harming the job parallelism.
    • Dropped support for Spark 3.1
    • Deprecated support for Spark 3.2
    • Support vectorized reads for merge-on-read operations in Spark 3.4 and 3.5 (#8466)
    • Increase default advisory partition size for writes in Spark 3.5 (#8660)
    • Support distributed planning in Spark 3.4 and 3.5 (#8123)
    • Support pushing down system functions by V2 filters in Spark 3.4 and 3.5 (#7886)
    • Support fanout position delta writers in Spark 3.4 and 3.5 (#7703)
    • Use fanout writers for unsorted tables by default in Spark 3.5 (#8621)
    • Support multiple shuffle partitions per file in compaction in Spark 3.4 and 3.5 (#7897)
    • Output net changes across snapshots for carryover rows in CDC (#7326)
    • Display read metrics on Spark SQL UI (#7447, #8445)
    • Adjust split size to benefit from cluster parallelism in Spark 3.4 and 3.5 (#7714)
    • Add fast_forward procedure (#8081)
    • Support filters when rewriting position deletes (#7582)
    • Support setting current snapshot with ref (#8163)
    • Make backup table name configurable during migration (#8227)
    • Add write and SQL options to override compression config (#8313)
    • Correct partition transform functions to match the spec (#8192)
    • Enable extra commit properties with metadata delete (#7649)
  • Flink
    • Add the ability to order splits based on the file sequence number (#7661)
    • Fix serialization in TableSink with anonymous object (#7866)
    • Switch to FileScanTaskParser for JSON serialization of IcebergSourceSplit (#7978)
    • Custom partitioner for bucket partitions (#7161)
    • Implement data statistics coordinator to aggregate data statistics from operator subtasks (#7360)
    • Support alter table column (#7628)
  • Parquet
    • Add encryption config to read and write builders (#2639)
    • Skip writing bloom filters for deletes (#7617)
    • Cache codecs by name and level (#8182)
    • Fix decimal data reading from ParquetAvroValueReaders (#8246)
    • Handle filters with transforms by assuming data must be scanned (#8243)
  • ORC
    • Handle filters with transforms by assuming the filter matches (#8244)
  • Vendor Integrations
    • GCP: Fix single byte read in GCSInputStream (#8071)
    • GCP: Add properties for OAuth2 and update library (#8073)
    • GCP: Add prefix and bulk operations to GCSFileIO (#8168)
    • GCP: Add bundle jar for GCP-related dependencies (#8231)
    • GCP: Add range reads to GCSInputStream (#8301)
    • AWS: Add bundle jar for AWS-related dependencies (#8261)
    • AWS: Support configuring the storage class for S3FileIO (#8154)
    • AWS: Add FileIO tracker/closer to Glue catalog (#8315)
    • AWS: Update S3 signer spec to allow an optional string body in S3SignRequest (#8361)
    • Azure: Add FileIO that supports ADLSv2 storage (#8303)
    • Azure: Make ADLSFileIO implement DelegateFileIO (#8563)
    • Nessie: Provide better commit message on table registration (#8385)
  • Dependencies
    • Bump Nessie to 0.71.0
    • Bump ORC to 1.9.1
    • Bump Arrow to 12.0.1
    • Bump AWS Java SDK to 2.20.131

apache-iceberg-1.3.1

9 months ago

What's Changed

Full Changelog: https://github.com/apache/iceberg/compare/apache-iceberg-1.3.0...apache-iceberg-1.3.1

apache-iceberg-1.3.0

11 months ago

What's Changed

New Contributors

Full Changelog: https://github.com/apache/iceberg/compare/apache-iceberg-1.2.0...apache-iceberg-1.3.0

apache-iceberg-1.2.0

1 year ago