March 2023 — Iceberg Community News

Tabular
Mar 31, 2023

Iceberg updates

Apache Iceberg 1.2.0 was released on March 20th, 2023. The 1.2.0 release adds a variety of new features and bug fixes. Here is an overview:

Core

  • Added AES GCM encryption stream spec (#5432)
  • Added support for Delta Lake to Iceberg table conversion (#6449, #6880)
  • Added support for position_deletes metadata table (#6365, #6716)
  • Added support for scan and commit metrics reporter that is pluggable through catalog (#6404, #6246, #6410)
  • Added support for branch commit for all operations (#4926, #5010)
  • Added FileIO support for ORC readers and writers (#6293)
  • Updated all actions to leverage bulk delete whenever possible (#6682)
  • Updated snapshot ID definition in Puffin spec to support statistics file reuse (#6272)
  • Added human-readable metrics information in files metadata table (#5376)

Spark

  • Added time range query support for changelog table (#6350)
  • Added changelog view procedure for v1 table (#6012)
  • Added support for storage-partitioned joins to improve read and write performance (#6371)
  • Updated default Arrow environment settings to improve read performance (#6550)
  • Added aggregate pushdown support for min, max and count to improve read performance (#6622)
  • Updated default distribution mode settings to improve write performance (#6828, #6838)
  • Updated DELETE to perform metadata-only update whenever possible to improve write performance (#6899)
  • Improved predicate pushdown support for write operations (#6636)
  • Added support for reading a branch or tag through table identifier and VERSION AS OF (a.k.a. FOR SYSTEM_VERSION AS OF) SQL syntax (#6717, #6575)
  • Added support for writing to a branch through identifier or through write-audit-publish (WAP) workflow settings (#6965, #7050)
  • Added DDL SQL extensions to create, replace and drop a branch or tag (#6638, #6637, #6752, #6807)
  • Added UDFs for years, months, days and hours transforms (#6207, #6261, #6300, #6339)
  • Added partition related stats for add_files procedure result (#6797)
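
To make the years, months, days and hours transforms from the list above concrete: the Iceberg table spec defines each one as an integer offset from the Unix epoch (1970-01-01 00:00:00 UTC). The following is a minimal Python sketch of that arithmetic only. It is not the Spark UDF implementation, the function names are illustrative, and it assumes timezone-aware timestamps.

```python
from datetime import datetime, timezone

# Unix epoch, the reference point for all four transforms.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def years(ts: datetime) -> int:
    """Whole years from 1970 (illustrative helper, not the Iceberg API)."""
    return ts.astimezone(timezone.utc).year - 1970


def months(ts: datetime) -> int:
    """Whole months from 1970-01."""
    utc = ts.astimezone(timezone.utc)
    return (utc.year - 1970) * 12 + (utc.month - 1)


def days(ts: datetime) -> int:
    """Whole days from 1970-01-01; timedelta.days floors for pre-epoch values."""
    return (ts - EPOCH).days


def hours(ts: datetime) -> int:
    """Whole hours from 1970-01-01 00:00:00 UTC, floored."""
    return int((ts - EPOCH).total_seconds() // 3600)


if __name__ == "__main__":
    # The 1.2.0 release date, taken at midnight UTC.
    release = datetime(2023, 3, 20, tzinfo=timezone.utc)
    print(years(release), months(release), days(release), hours(release))
```

Because each transform yields a plain integer, rows that fall in the same year, month, day or hour share one partition value, which is what lets a query engine prune partitions from a timestamp filter.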

Flink

  • Added support for metadata tables (#6222)
  • Added support for read options in Flink source (#5967)
  • Added support for reading and writing Avro GenericRecord (#6557, #6584)
  • Added support for reading a branch or tag and writing to a branch (#6660, #5029)
  • Added throttling support for streaming read (#6299)
  • Added support for multiple sinks for the same table in the same job (#6528)

Vendor Integrations

  • Added Snowflake catalog integration (#6428)
  • Added AWS SigV4 authentication support for REST catalog (#6951)
  • Added support for AWS S3 remote signing (#6169, #6835, #7080)
  • Updated AWS Glue catalog to skip table version archive by default (#6919)
  • Updated AWS Glue catalog to not require a warehouse location (#6586)

Dependencies

  • Upgraded ORC to 1.8.1 (#6349)
  • Upgraded Jackson to 2.14.1 (#6168)
  • Upgraded AWS SDK V2 to 2.20.18 (#7003)
  • Upgraded Nessie to 0.50.0 (#6875)

PyIceberg updates

  • Added Python support for metrics filtering (Fokko Driesprong)
  • Added Python support for startsWith (Luigi)
  • Removed Python legacy! (Python community)

More information can be found on the project site, and the installer can be found here.

Iceberg in the industry

  • Cloudera has integrated Iceberg V1 support
  • Trino has added Iceberg improvements in release 409
  • iData has Iceberg support in their Pipeline product
  • CelerData adds Iceberg integration in V3
  • Snowflake Iceberg catalog support is now available

Blogs from the community

Iceberg in the news

Keep up to date on all things Iceberg

Watch for new blog posts added to the Blogs page

See the community Contribute guide to learn how to start contributing to Iceberg

Join the Apache Iceberg workspace on Slack using the invite link

Subscribe to the Apache Iceberg mailing list

Originally published at https://tabular.io on March 31, 2023.

Tabular

Tabular is building an independent cloud-native data platform powered by the open source standard for huge analytic datasets, Apache Iceberg.