Medallion Architecture: Lessons from 50+ Production Deployments

The Medallion architecture — Bronze (raw), Silver (cleaned), Gold (analytics-ready) — has become the de facto standard for enterprise data lakes built on Apache Iceberg, Delta Lake, or Hudi. After deploying it in 50+ production environments, we have learned that the architecture's success depends almost entirely on the decisions made at the boundaries between layers: what belongs in each layer, what the failure mode is when a layer falls below its quality threshold, and how to handle the inevitable schema evolution. The most common mistake is treating Bronze as a staging area and Gold as the "real" data. In production, all three layers are real, all three are queryable, and all three need reliability monitoring. Teams that monitor only the Gold layer discover problems after they have already propagated through all three transformation steps.

Bronze is your immutable record of what you received. The defining constraint: you never modify Bronze data after ingestion. If a source sends corrupted records, those corrupted records stay in Bronze — you quarantine them, you note the issue, but you do not delete or overwrite. This makes Bronze your audit trail and your time-travel checkpoint. Iceberg's append mode enforces this constraint mechanically. The 50% validation threshold for Bronze is not a sign of low standards — it is an acknowledgment that raw data from external sources is messy. The 50% threshold catches catastrophic failures (entire schema change, empty file, binary corruption) while allowing normal data variation. If your Bronze threshold is 90%, you will either miss real failures or generate constant false positive alerts as source systems evolve.

Silver is where you make trust decisions. Records that passed Bronze validation are cleaned, standardized, and enriched. Records that failed Bronze validation enter a quarantine table for manual review. The MERGE INTO operation in Iceberg Silver mode enables upserts: if a source system sends a corrected version of a record, Silver merges the correction without duplicating the original. The 75% Silver threshold catches data cleaning logic errors: a standardization function that produces nulls on unexpected input formats, a join that drops more records than expected, a deduplication step that is too aggressive. These are the failures that require a Silver engineer to investigate — not the raw data issues that belong to the source team.

Gold is the analytics contract. Nothing reaches Gold without passing Silver validation. Gold transformations are deterministic aggregations — no approximations, no sampling, no ML inference. The 90% Gold threshold is non-negotiable because Gold feeds production dashboards, ML feature stores, and operational reports. A Gold failure is a P1 incident by definition. The most important Gold layer decision is the failure mode: when Gold falls below 90%, does the job fail fast (stopping the entire pipeline) or does it complete with a quality warning? The answer depends on the downstream consumer. If Gold feeds a real-time operational dashboard, fail fast. If Gold feeds a daily batch report that can tolerate a one-day delay, complete with warning and page the team asynchronously.

Iceberg integration changes the Bronze→Silver→Gold data movement pattern significantly. Without Iceberg, each layer produces Parquet files that downstream layers read by path. With Iceberg, each layer produces a versioned table in the Glue catalog. Silver's input is not a path to Bronze Parquet — it is a Glue catalog reference: glue_catalog.{database}.bronze_{dataset}. This enables time-travel debugging (query the Bronze table at any historical snapshot), schema evolution tracking (Iceberg records every schema change), and snapshot retention policies (automatically expire Bronze snapshots older than 30 days to control storage cost). The operational lesson from 50+ deployments: the teams that treat the Medallion architecture as a technical pattern (three table types) consistently struggle. The teams that treat it as an operational discipline (three trust levels, each with defined failure modes and recovery procedures) consistently succeed.

Key Takeaways

Monitor all three layers — problems caught only at Gold have already propagated through all transformation steps
Bronze is an immutable audit trail — never modify; append-only Iceberg mode enforces this mechanically
Layer thresholds encode trust levels: Bronze 50% (raw data is messy), Silver 75% (cleaning logic errors), Gold 90% (P1 if violated)
Iceberg catalog references (not Parquet paths) enable time-travel debugging and schema evolution tracking across layers
Treat Medallion as an operational discipline (three trust levels with defined failure modes) not just a technical pattern

Medallion Architecture: Lessons from 50+ Production Deployments

Key Takeaways

Related Articles

Ready to transform how you use your data?