Apache Iceberg 1.3 was released.
Spark now supports the Iceberg UUID type as a string type, ensuring compatibility with tables created using Trino. Other noteworthy updates include:
- Added support for Flink 1.17
- Added support for Spark 3.4
- Added support for Spark’s new TIMESTAMP_NTZ type, mapped to Iceberg’s timestamp without time zone type. #7553
- Removed a sort from the MERGE cardinality check (Thanks, Anton!) #7558
- Added a procedure to rewrite position delete files (Thanks, Szehon!)
- Mitigated FileIO closing problems (Thanks, Eduard!)
- Added support for JDK 17
- Dropped support for:
  - Flink 1.14
  - Spark 2.4
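The two type mappings in this release (Iceberg UUIDs surfaced to Spark as strings, and Spark’s TIMESTAMP_NTZ mapped to timestamp without time zone) are easiest to see at the value level. The following is a plain-Python illustration, not Spark or Iceberg code; the helper names are hypothetical, and it assumes, as the Iceberg spec describes, that a UUID is held as 16 big-endian bytes and a timestamp as microseconds since 1970-01-01 00:00:00.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Illustrative helpers only (hypothetical names, not part of Spark or Iceberg).
EPOCH_NAIVE = datetime(1970, 1, 1)
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MICRO = timedelta(microseconds=1)


def uuid_bytes_to_string(raw: bytes) -> str:
    """Render 16 big-endian UUID bytes as the canonical 36-character string."""
    return str(uuid.UUID(bytes=raw))


def uuid_string_to_bytes(text: str) -> bytes:
    """Parse a canonical UUID string back into its 16 big-endian bytes."""
    return uuid.UUID(text).bytes


def ntz_to_micros(value: datetime) -> int:
    """Without time zone: keep the wall-clock reading, apply no zone conversion."""
    if value.tzinfo is not None:
        raise ValueError("expected a naive datetime")
    return (value - EPOCH_NAIVE) // ONE_MICRO


def tz_to_micros(value: datetime) -> int:
    """With time zone: normalize the instant to UTC before counting microseconds."""
    if value.tzinfo is None:
        raise ValueError("expected an aware datetime")
    return (value - EPOCH_UTC) // ONE_MICRO


raw = bytes(range(16))
print(uuid_bytes_to_string(raw))  # 00010203-0405-0607-0809-0a0b0c0d0e0f

wall_clock = datetime(2023, 5, 31, 12, 0)
in_cest = wall_clock.replace(tzinfo=timezone(timedelta(hours=2)))
# Same wall-clock reading, but the zoned value is 10:00 UTC, two hours earlier.
print(ntz_to_micros(wall_clock) - tz_to_micros(in_cest))  # 7200000000
```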
Exciting times for PyIceberg. We’re wrapping up the PyIceberg 0.4.0 release, which brings:
- Support for converting Parquet schemas into Iceberg ones.
- Support for reading data using fsspec.
- Support for fetching a limited number of rows to quickly peek into a dataset.
- Improved performance with PyArrow>=12.0.0.
- Improved query performance through filter pushdown using column range metadata.
- Support for SQL-style filters, for example row_filter='passengers >= 3'.
- SigV4 support for the REST catalog.
- A complete makeover of the docs site.
- And many bugs have been fixed!
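The filter pushdown and row-limit items above both come down to skipping work using metadata. Here is a simplified, stdlib-only sketch of min/max pruning for a single `column >= value` predicate, the shape of row_filter='passengers >= 3'. It is not PyIceberg's implementation: `DataFile`, `might_match_ge`, and `scan_ge` are hypothetical names, and the per-file (lower, upper) bounds stand in for the column range metadata an Iceberg table keeps for its data files.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Row = Dict[str, int]


@dataclass
class DataFile:
    """Hypothetical stand-in for a data file plus its column range stats."""
    path: str
    bounds: Dict[str, Tuple[int, int]]  # column -> (lower, upper) from metadata
    rows: List[Row] = field(default_factory=list)  # contents, read only if not pruned


def might_match_ge(f: DataFile, column: str, value: int) -> bool:
    """Return False only when the bounds prove no row has column >= value."""
    if column not in f.bounds:
        return True  # no stats for this column: the file must be read
    _lower, upper = f.bounds[column]
    return upper >= value


def scan_ge(files: List[DataFile], column: str, value: int,
            limit: Optional[int] = None) -> Tuple[List[Row], List[str]]:
    """Return the matching rows and the paths of the files actually opened."""
    if limit is not None and limit <= 0:
        return [], []
    rows: List[Row] = []
    opened: List[str] = []
    for f in files:
        if not might_match_ge(f, column, value):
            continue  # pruned from metadata alone; the file is never opened
        opened.append(f.path)
        for row in f.rows:
            if row[column] >= value:  # bounds only prune files; rows still need checking
                rows.append(row)
                if limit is not None and len(rows) >= limit:
                    return rows, opened  # enough rows to peek at: stop early
    return rows, opened


files = [
    DataFile("a.parquet", {"passengers": (1, 2)}, [{"passengers": 1}, {"passengers": 2}]),
    DataFile("b.parquet", {"passengers": (2, 5)},
             [{"passengers": 2}, {"passengers": 3}, {"passengers": 5}]),
    DataFile("c.parquet", {"passengers": (3, 6)}, [{"passengers": 4}, {"passengers": 6}]),
]

rows, opened = scan_ge(files, "passengers", 3)
print(opened)  # ['b.parquet', 'c.parquet']
```

File a.parquet is skipped without being read because its upper bound (2) cannot satisfy passengers >= 3, and passing a limit lets the scan stop after the first file that yields enough rows.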
Iceberg in the industry
- IBM supports Iceberg as part of its watsonx AI initiative
- Starburst launches a new Iceberg page
- Starburst adds support for Tabular
Blogs from the community
- Tabular — Securing the Data Lake — Part II
- Starburst — How to migrate your Hive tables to Apache Iceberg
- Starburst — Tutorial: Using Starburst Galaxy’s materialized views with Apache Iceberg
- Tabular — Tutorial: Using Trino and Iceberg for data warehousing
- Anuj Syal — Top 5 New Data Engineering Technologies to Learn in 2023
- Marin Aglić — Learning Apache Iceberg — Storing the Catalog to Postgres
- Amazon — Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes
Iceberg in the news
- Oracle: Oracle Autonomous Data Warehouse Breaks Through the Limitations of Data Management
- The Register: Trino and dbt open source data tools snuggle closer with integrated SaaS
- CXOtoday: Cloudera Recognized as a Leader in 2023 GigaOm Radar for Data Lakes and Lakehouses
- datanami: The Semantic Layer Architecture: Where Business Intelligence is Truly Heading
- datanami: HPE Brings Analytics Together on its Data Fabric
Keep up to date on all things Iceberg
Watch for new blog posts added to the Blogs page
See the community Contribute guide to learn how to start contributing to Iceberg
Join the Apache Iceberg workspace on Slack using the invite link
Subscribe to the Apache Iceberg mailing list
Originally published at https://tabular.io on May 31, 2023.