
Apache Iceberg on Amazon EMR

In this exercise, you will build incremental data lakes on Amazon EMR using Apache Iceberg. You will learn about some of Iceberg's most important features: schema evolution, time travel, and scaling S3 request traffic with the Object Store File Layout.
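To make these three features concrete, here is a minimal sketch that builds the corresponding Spark SQL statements as plain Python strings, so it runs without a cluster. The catalog name `glue_catalog`, the table and column names, and the helper functions are hypothetical and are not taken from the workshop notebook. The `VERSION AS OF` / `TIMESTAMP AS OF` syntax assumes Spark 3.3 or later with the Iceberg Spark runtime.

```python
from typing import List, Optional

# Hypothetical catalog name; it must match the catalog configured in Spark.
CATALOG = "glue_catalog"


def create_table(table: str) -> str:
    """Create an Iceberg table that uses the object store file layout."""
    # write.object-storage.enabled makes Iceberg insert a hash component into
    # each data file path, spreading writes across many S3 prefixes.
    return (
        f"CREATE TABLE {CATALOG}.{table} (id bigint, name string) "
        "USING iceberg "
        "TBLPROPERTIES ('write.object-storage.enabled'='true')"
    )


def evolve_schema(table: str) -> List[str]:
    """Schema evolution: metadata-only changes, no data files are rewritten."""
    t = f"{CATALOG}.{table}"
    return [
        f"ALTER TABLE {t} ADD COLUMNS (email string)",
        # Iceberg tracks columns by ID, so a rename does not break old files.
        f"ALTER TABLE {t} RENAME COLUMN name TO full_name",
    ]


def list_snapshots(table: str) -> str:
    """Query the snapshots metadata table to find IDs usable for time travel."""
    return (
        "SELECT committed_at, snapshot_id, operation "
        f"FROM {CATALOG}.{table}.snapshots"
    )


def time_travel(table: str, snapshot_id: Optional[int] = None,
                as_of: Optional[str] = None) -> str:
    """Read an earlier state of the table by snapshot ID or by timestamp."""
    t = f"{CATALOG}.{table}"
    if snapshot_id is not None:
        return f"SELECT * FROM {t} VERSION AS OF {snapshot_id}"
    if as_of is not None:
        return f"SELECT * FROM {t} TIMESTAMP AS OF '{as_of}'"
    raise ValueError("pass either snapshot_id or as_of")


if __name__ == "__main__":
    # Print the statements in the order you would run them.
    for stmt in [create_table("db.customers"),
                 *evolve_schema("db.customers"),
                 list_snapshots("db.customers"),
                 time_travel("db.customers", as_of="2024-01-01 00:00:00")]:
        print(stmt)
```

In a PySpark cell you would pass each string to `spark.sql(...)`. The `snapshot_id` values returned by the snapshots query are the values that `VERSION AS OF` accepts.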

Apache Iceberg Features

From the JupyterLab session in your EMR Studio Workspace, open workshop-repo -> files -> notebook -> apache-iceberg-on-amazon-emr.ipynb and run all of its cells.

Detailed instructions are within the notebook.
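If you want to reproduce the setup outside this notebook, the Spark session needs an Iceberg catalog. The sketch below builds the JSON body for a sparkmagic `%%configure -f` cell that registers a catalog backed by the AWS Glue Data Catalog. It assumes the EMR cluster was created with Iceberg enabled; the catalog name `glue_catalog`, the bucket `amzn-s3-demo-bucket`, and the helper function are hypothetical placeholders.

```python
import json


def iceberg_configure_body(catalog: str, warehouse: str) -> str:
    """Return a %%configure -f JSON body registering an Iceberg Glue catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    conf = {
        # Enables Iceberg's SQL extensions (extra DDL and stored procedures).
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers the catalog name with Spark.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Tracks Iceberg tables in the AWS Glue Data Catalog.
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        # Reads and writes data files with Iceberg's S3 FileIO.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        # Root S3 location for tables created in this catalog.
        f"{prefix}.warehouse": warehouse,
    }
    return json.dumps({"conf": conf}, indent=2)


if __name__ == "__main__":
    # Paste the printed text into a notebook cell before any Spark code runs.
    print("%%configure -f")
    print(iceberg_configure_body("glue_catalog",
                                 "s3://amzn-s3-demo-bucket/iceberg/"))
```

The `-f` flag forces sparkmagic to drop the current Livy session and start a new one with this configuration, so run the cell before any other Spark code.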

When you are done, stop the session and restart the kernel.