In broad strokes, the Medallion Architecture pattern separates your data into three layers:
- 🟤 Bronze, where you initially store raw data
- ⚪ Silver, where it is cleaned, deduplicated and format-normalised
- 🟡 Gold, where product-specific production datasets live.
In its original form, it is heavy, enterprise-y and centred around a lakehouse architecture, which is often over the top for earlier-stage projects. It does not help with complexity on its own and it may introduce unnecessary friction.
In addition to being a useful stakeholder storytelling tool, it gets the following right before you even need to think about versioning, traceability, maturity, governance etc:
- Think in terms of one-way improvements towards your destination, not in-place evolution or patching
- Control all your data early - ideally centrally; don’t rely on being able to quickly re-fetch data later
- Separate transformations from eventual serving artefacts, e.g. don’t enforce ACID and SQL database overhead for transformation work
- Split transformations into stages, modularise code, and enforce schema at the data level
- Treat data as having interfaces; establish early what you actually need from it.