Iceberg updates
- Java: Flink sink adds custom partitioner to better distribute traffic for bucket partitioned tables (Thanks, Sergio!)
- Java: AWS, GCP, and Azure bundles (Thanks, Bryan!)
- Java: Azure FileIO (Thanks, Bryan!)
- Java: Delete file in job planning optimizations (Thanks, Anton!)
- Java: Fixed branches with empty tables (Thanks, ConeyLiu!)
- Rust: Merged TableMetadata, including (de)serialization (Thanks, Jan!)
- Go: Schema and types (Thanks, Matt!)
PyIceberg updates
PyIceberg 0.5.0 is inbound! Although 0.4.0 was only just released, many new features have already accumulated on the main branch, including:
- Full support for schema evolution through PyIceberg
- GCS Support
- HDFS Support (through PyArrow)
- Support for GZIP compressed metadata
- Changes to make PyIceberg run in AWS Lambda
- 10x speed improvements for the Avro parsing by using Cython
- Support for the SQLCatalog (JDBC Catalog in Java)
- Moving to Pydantic v2, which offers speed improvements when parsing the metadata JSON
- Many fixes and improvements to both the code and the documentation
Make sure to subscribe to the dev mailing list so you can test and validate the release candidates, which will be announced soon.
More information can be found on the project site, and the package is available on PyPI.
Rust and Go
There has been some amazing progress on both the Rust and Go implementations. If you’re interested, make sure to star and watch the repositories.
Iceberg in the industry
- Databend — Preliminary Iceberg support added
- PuppyGraph — Adds support for Iceberg
- GlareDB 0.3.0 released with Iceberg read support
- Notable now includes PyIceberg by default
- Outerbounds now works with PyIceberg (video)
- Snowflake — Unifying Iceberg Tables on Snowflake
Blogs from the community
- InfoQ — Streaming from Apache Iceberg — Building Low-Latency and Cost-Effective Data Pipelines
- Nathan Glover — Vacuuming Amazon Athena Iceberg with AWS Step Functions
- Kestra — Apache Iceberg Crash Course for AWS users: Amazon S3, Athena & AWS Glue ❤️ Iceberg
- Dev — AWS open source newsletter, #168
- Kevin Talbert — Adopting an Open Data Lakehouse with NiFi
- Mehul Batra — Frosty Data Adventures: How a Squirrel Thrived on the Iceberg of Information
- Akshay Jain — Mastering Apache Iceberg: Optimizing Streaming and Batch Updates for Stellar Data Performance
- Vino Duraisamy — Iceberg Tables on Snowflake: Design considerations and Life of an INSERT query
- Mike Taveirne — When To Use Iceberg Tables in Snowflake
- Amazon — Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics
- Jason Hughes — The Sinking Data Warehouse: Is Apache Iceberg the Next Step?
- Thomas Cardenas — Solving the Small File Problem in Iceberg Tables
Iceberg in the news
- Computer Weekly: Inside Cloudera’s data platform strategy
- Help Net Security: OpenText Cloud Editions 23.3 helps customers interconnect and exchange insights across clouds
- The Register: AWS and IBM Netezza come out in support of Iceberg in table format face-off
- The New Stack: A Real-Time Data Platform for Player-Driven Game Experiences
- SiliconANGLE: What is a data platform?
Keep up to date on all things Iceberg
- Watch for new videos on the Iceberg YouTube Channel
- Read blog posts added to the Blogs page
- See the community Contribute guide to learn how to start contributing to Iceberg
- Join the Apache Iceberg workspace on Slack using the invite link
- Subscribe to the Apache Iceberg mailing list