Adobe Tech Blog

News, updates, and thoughts related to Adobe, developers, and technology.


Iceberg Series: ACID Transactions at Scale on the Data Lake in Adobe Experience Platform

Jaemi Bremner
Published in Adobe Tech Blog
16 min read · Apr 15, 2021


Data Restatement

Figure 1: High-level representation of data restatement operations

Restatement Challenges

Scale requirements

Suboptimal scheduling

Suboptimal performance

Apache Iceberg

Motivation

Tombstone Extension

dataFrame
    .option(DataSetOptions.orgId, "<customer.organization>")
    .option(DataSetOptions.batchIdsToReplace, "<customer.batch>")
    .option(DataSetOptions.batchReplacingReason, "replace")
    .save("<customer.dataSetId>")

Design Principles

MERGE ON READ

WRITE PATH

Figure 2: A human-readable format of tombstones schema
Figure 3: Tombstone internal metadata representation

READ PATH

Figure 4: Prune tombstone rows on read
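
The merge-on-read idea behind Figure 4 can be sketched in plain Python: rows whose tombstone-column value appears in the tombstone set are filtered out at read time, while the stored data stays untouched. This is only an illustration under assumptions. The row layout, the `batch_id` column, the sample values, and the `prune_tombstones` helper are hypothetical and are not the Platform's or Iceberg's API.

```python
from typing import Dict, Iterable, Iterator, Set


def prune_tombstones(
    rows: Iterable[Dict], tombstones: Set[str], column: str = "batch_id"
) -> Iterator[Dict]:
    """Yield only rows whose `column` value is not tombstoned (an anti-join)."""
    for row in rows:
        if row.get(column) not in tombstones:
            yield row


# Hypothetical rows as they would sit in data files.
rows = [
    {"batch_id": "b1", "value": 10},
    {"batch_id": "b2", "value": 20},
    {"batch_id": "b1", "value": 30},
    {"batch_id": "b3", "value": 40},
]

# Hypothetical soft-deleted batch recorded as a tombstone.
tombstones = {"b1"}

visible = list(prune_tombstones(rows, tombstones))
print(visible)
```

The filter is lazy, so a reader pays for pruning only on the rows it actually scans.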

VACUUM

Figure 5: Write optimizations for an efficient vacuum

"lower_bounds": {
  "array": [
    {
      "key": 3,
      "value": "A"
    }
  ]
},
"upper_bounds": {
  "array": [
    {
      "key": 3,
      "value": "F"
    }
  ]
}
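
The bounds above say that, for the column with field id 3, every value in the data file lies between "A" and "F". One way such statistics could help a vacuum is to skip files whose range cannot contain any tombstoned value. The sketch below assumes plain lexicographic string comparison; the `may_contain` helper and the sample tombstone values are hypothetical.

```python
from typing import Iterable


def may_contain(lower: str, upper: str, tombstones: Iterable[str]) -> bool:
    """True if any tombstone value falls inside the file's [lower, upper] range."""
    return any(lower <= t <= upper for t in tombstones)


# Bounds taken from the metadata fragment above (field id 3).
lower_bound, upper_bound = "A", "F"

# Hypothetical tombstone sets.
needs_rewrite = may_contain(lower_bound, upper_bound, {"C", "Z"})
can_skip = not may_contain(lower_bound, upper_bound, {"G", "Z"})

print(needs_rewrite, can_skip)
```

A range check can only rule files out. A file whose range overlaps a tombstone still has to be read to find out whether it holds matching rows.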

METRICS

"tombstone.file" -> "file://..."
"tombstone.metrics.added.count" -> "100"
"tombstone.metrics.deleted.count" -> "0"
"tombstone.metrics.total.count" -> "158"

"tombstone.file" -> "file://..."
"tombstone.metrics.added.count" -> "0"
"tombstone.metrics.deleted.count" -> "100"
"tombstone.metrics.total.count" -> "58"

"tombstone.vacuum.file" -> "file://..."
"tombstone.metrics.vacuum.count" -> "58"
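
Read together, the counters above look like a running total: each snapshot's total is the previous total plus the tombstones added minus the tombstones deleted. The sketch below checks that reading against the listed numbers. The update rule and the starting total of 58 (inferred as 158 - 100) are assumptions, since the source lists only the resulting values.

```python
def next_total(previous_total: int, added: int, deleted: int) -> int:
    """Running tombstone total after one snapshot (assumed update rule)."""
    return previous_total + added - deleted


starting_total = 58  # inferred, not stated in the source

# First listed snapshot: 100 added, 0 deleted.
after_add = next_total(starting_total, added=100, deleted=0)

# Second listed snapshot: 0 added, 100 deleted.
after_delete = next_total(after_add, added=0, deleted=100)

print(after_add, after_delete)
```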

PROCESSING

Figure 6: Rows unsorted by columnX
Figure 7: Rows sorted by columnX
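
The contrast between Figures 6 and 7 can be illustrated with a toy model: split rows into fixed-size files, record each file's minimum and maximum columnX value, and count how many files have a range that overlaps a tombstoned value. The file size, the sample values, and both helper functions are assumptions made for illustration.

```python
from typing import List, Tuple


def file_ranges(values: List[str], rows_per_file: int) -> List[Tuple[str, str]]:
    """Split values into consecutive files and return each file's (min, max)."""
    chunks = [
        values[i:i + rows_per_file] for i in range(0, len(values), rows_per_file)
    ]
    return [(min(chunk), max(chunk)) for chunk in chunks]


def files_touched(ranges: List[Tuple[str, str]], tombstone: str) -> int:
    """Count files whose [min, max] range could contain the tombstone value."""
    return sum(1 for low, high in ranges if low <= tombstone <= high)


# Hypothetical columnX values in arrival order.
column_x = ["C", "A", "F", "B", "E", "A", "D", "F", "B"]

unsorted_touched = files_touched(file_ranges(column_x, 3), "C")
sorted_touched = files_touched(file_ranges(sorted(column_x), 3), "C")

print(unsorted_touched, sorted_touched)
```

With unsorted rows every file spans nearly the whole value range, so all three files are candidates. After sorting, only the one file whose range covers "C" is.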

Incremental Deletes

=> SELECT COUNT(*) FROM events TIMESTAMP SINCE '2021-01-01 00:00:00';
count(1)
----------
144

=> SELECT COUNT(*) FROM events SNAPSHOT SINCE 8642949070972100074;
count(1)
----------
13608

# Pseudocode: deleted rows are the data rows that match a soft delete
# (left semi join), taken over both active and vacuumed files.
deleted_rows = (
    active_data_files.join(active_soft_deletes, "left_semi")
    .union(vacuumed_data_files.join(vacuumed_soft_deletes, "left_semi"))
)
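
The expression above relies on left semi joins, which keep the rows from the left side that have at least one match on the right side. Below is a small stand-in using plain Python lists in place of DataFrames. The `batch_id` join key, the sample rows, and the `left_semi` helper are assumptions, because the source does not name the join column.

```python
from typing import Dict, List


def left_semi(rows: List[Dict], deletes: List[Dict], key: str = "batch_id") -> List[Dict]:
    """Keep rows whose `key` value appears in `deletes` (left semi join)."""
    delete_keys = {d[key] for d in deletes}
    return [row for row in rows if row[key] in delete_keys]


# Hypothetical rows in files that are still active, and their soft deletes.
active_data = [{"batch_id": "b1", "v": 1}, {"batch_id": "b2", "v": 2}]
active_soft_deletes = [{"batch_id": "b2"}]

# Hypothetical rows in files already rewritten by vacuum, and their soft deletes.
vacuumed_data = [{"batch_id": "b3", "v": 3}, {"batch_id": "b4", "v": 4}]
vacuumed_soft_deletes = [{"batch_id": "b3"}]

# Union of both sides gives every row deleted in the window.
deleted_rows = (
    left_semi(active_data, active_soft_deletes)
    + left_semi(vacuumed_data, vacuumed_soft_deletes)
)
print(deleted_rows)
```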

Going further

Related Blogs

References


Written by Jaemi Bremner

DevX and Experience Technologist. LinkedIn: @jaemibremner Twitter: @jaeness
