April 2023 — Iceberg Community News

Tabular
Apr 28, 2023

Iceberg updates

  • Flink: 1.17 support was added, 1.14 removed (Liwei Li)
  • Iceberg Java 1.2.0 release is out (Jack Ye)
  • Added View version and parser (Amogh)
  • Improved bit density in object storage layout (Prashant)
  • Added initial support for Spark 3.4

Apache Iceberg 1.2.1 was released on April 11th, 2023. The 1.2.1 release is a patch release to address various issues identified in the prior release. Here is an overview:

Core

  • REST: fix previous locations for refs-only load #7284
  • Parse snapshot-id as long in remove-statistics update #7235

Spark

  • Broadcast table instead of file IO in rewrite manifests #7263
  • Revert “Spark: Add “Iceberg” prefix to SparkTable name string for SparkUI” #7273

AWS

  • Performance improvements for S3 when using the Apache HTTP client #7262
  • S3 Credentials provider support in DefaultAwsClientFactory #7066

PyIceberg updates

The 0.4.0 release is being wrapped up and will bring:

  • Support for converting a query into a Ray dataset (thanks Rushan!)
  • A revamp of the documentation page (thanks Luigi!)
  • The ability to limit the number of rows returned by a query (thanks Daniel!)
  • Evaluation of the metrics to speed up queries (thanks Fokko!)
  • The ability to convert an Arrow schema to Iceberg, which fixes AWS Athena issues (thanks Rushan!)
  • Support for positional deletes (thanks Fokko!)

More information can be found on the project site, and the package is available on PyPI.
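
To give a feel for why evaluating metrics speeds up queries, here is a minimal, self-contained sketch of the general idea: per-file column bounds let a scan skip files that cannot contain matching rows. This is not PyIceberg's code. The `DataFile` class, the `might_contain` and `plan_files` helpers, and the file names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for a data file entry with per-column bounds."""
    path: str
    lower: dict  # column name -> smallest value in the file
    upper: dict  # column name -> largest value in the file


def might_contain(data_file, column, op, value):
    """Return False only when the bounds prove that no row can match."""
    lo = data_file.lower.get(column)
    hi = data_file.upper.get(column)
    if lo is None or hi is None:
        # No metrics recorded for this column: the file cannot be ruled out.
        return True
    if op == "==":
        return lo <= value <= hi
    if op == ">":
        return hi > value
    if op == ">=":
        return hi >= value
    if op == "<":
        return lo < value
    if op == "<=":
        return lo <= value
    raise ValueError(f"unsupported operator: {op}")


def plan_files(files, column, op, value):
    """Keep only the files that might contain matching rows."""
    return [f for f in files if might_contain(f, column, op, value)]


files = [
    DataFile("a.parquet", {"id": 1}, {"id": 100}),
    DataFile("b.parquet", {"id": 101}, {"id": 200}),
    DataFile("c.parquet", {}, {}),  # no metrics available
]

selected = plan_files(files, "id", ">", 150)
print([f.path for f in selected])
```

The check is deliberately conservative: a file is dropped only when its bounds rule out every row, and a file without metrics is always kept, so pruning never changes the query result.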

Iceberg in the industry

  • Google BigQuery managed Iceberg storage
  • Fivetran adds Iceberg on S3 as a destination

Blogs from the community

Iceberg in the news

Keep up to date on all things Iceberg

Watch for new blog posts added to the Blogs page

See the community Contribute guide to learn how to start contributing to Iceberg

Join the Apache Iceberg workspace on Slack using the invite link

Subscribe to the Apache Iceberg mailing list

Originally published at https://tabular.io on April 30, 2023.


Tabular

Tabular is building an independent cloud-native data platform powered by the open source standard for huge analytic datasets, Apache Iceberg.