Medallion Architecture for Early-Stage Startups

Medallion architecture is one of those late-stage, enterprise patterns that I've found surprisingly useful much earlier: in research projects, app projects, early-stage startups, you name it.

In broad strokes, the Medallion Architecture pattern separates your data into three layers:

  • 🟤 Bronze, where you initially store raw data
  • ⚪ Silver, where it is cleaned, deduplicated and format-normalised
  • 🟡 Gold, where product-specific production datasets live.
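
As a rough illustration of the three layers, here is a minimal Python sketch that uses plain files in a temporary directory instead of a lakehouse. The signup events, the field names and the function names (`write_bronze`, `build_silver`, `build_gold`, `run_pipeline`) are all hypothetical and chosen only for this example.

```python
import json
from collections import Counter
from pathlib import Path
from tempfile import TemporaryDirectory

# Hypothetical raw payloads, e.g. from a signup webhook (note the duplicate).
RAW_EVENTS = [
    '{"id": "1", "email": " Ada@Example.com ", "signed_up": "2024-01-05T10:00:00Z"}',
    '{"id": "1", "email": " Ada@Example.com ", "signed_up": "2024-01-05T10:00:00Z"}',
    '{"id": "2", "email": "bob@example.com", "signed_up": "2024-01-05T12:30:00Z"}',
    '{"id": "3", "email": "CY@example.com", "signed_up": "2024-01-06T09:15:00Z"}',
]


def write_bronze(raw_lines, root):
    """Bronze: store payloads exactly as received, no parsing or fixing."""
    path = root / "bronze" / "signups.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(raw_lines) + "\n", encoding="utf-8")
    return path


def build_silver(bronze_path, root):
    """Silver: parse, normalise formats and deduplicate on id."""
    by_id = {}
    for line in bronze_path.read_text(encoding="utf-8").splitlines():
        record = json.loads(line)
        by_id[record["id"]] = {
            "id": int(record["id"]),
            "email": record["email"].strip().lower(),
            "signup_date": record["signed_up"][:10],  # keep YYYY-MM-DD only
        }
    path = root / "silver" / "signups.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    lines = [json.dumps(r) for r in by_id.values()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def build_gold(silver_path, root):
    """Gold: a product-specific dataset, here signups per day."""
    lines = silver_path.read_text(encoding="utf-8").splitlines()
    per_day = Counter(json.loads(line)["signup_date"] for line in lines)
    path = root / "gold" / "signups_per_day.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(dict(sorted(per_day.items()))), encoding="utf-8")
    return path


def run_pipeline(raw_lines):
    """Run bronze -> silver -> gold and return the gold dataset."""
    with TemporaryDirectory() as tmp:
        root = Path(tmp)
        bronze = write_bronze(raw_lines, root)
        silver = build_silver(bronze, root)
        gold = build_gold(silver, root)
        return json.loads(gold.read_text(encoding="utf-8"))


if __name__ == "__main__":
    print(run_pipeline(RAW_EVENTS))
```

Each layer only reads from the one before it and writes a new artefact, so the raw bronze file is never modified and silver or gold can be rebuilt from it at any time.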

In its original form, it is heavy, enterprise-y and centred around a lakehouse architecture, which is often over the top for earlier-stage projects. Adopted wholesale, it does not reduce complexity by itself and may introduce unnecessary friction.

Even so, besides being a useful storytelling tool for stakeholders, the pattern gets the following right long before you need to think about versioning, traceability, maturity, governance, etc.:

  1. Think in terms of one-way improvements towards your destination, not in-place evolution or patching
  2. Control all your data early - ideally centrally; don’t rely on being able to quickly re-fetch data later
  3. Separate transformations from eventual serving artefacts, e.g. don’t impose ACID guarantees and SQL database overhead on transformation work
  4. Split transformations into stages, modularise code, and enforce schema at the data level
  5. Treat data as having interfaces; establish early what you actually need from it.
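
To make points 1, 4 and 5 a little more concrete, here is a small sketch in plain Python, under the assumption that records are simple dictionaries. `SILVER_SCHEMA`, `enforce_schema`, `parse`, `normalise` and `to_silver` are hypothetical names for this example, not part of any library; in practice a schema library or typed file format could play the same role.

```python
from datetime import date

# Hypothetical contract for the silver layer: field name -> required type.
SILVER_SCHEMA = {"id": int, "email": str, "signup_date": date}


def enforce_schema(record, schema):
    """Reject records with missing, extra or wrongly typed fields."""
    if set(record) != set(schema):
        raise ValueError(f"fields {sorted(record)} do not match {sorted(schema)}")
    for name, expected in schema.items():
        if not isinstance(record[name], expected):
            raise TypeError(f"{name!r} should be {expected.__name__}")
    return record


def parse(raw):
    """Stage 1: turn raw strings into typed values."""
    return {
        "id": int(raw["id"]),
        "email": raw["email"],
        "signup_date": date.fromisoformat(raw["signed_up"][:10]),
    }


def normalise(record):
    """Stage 2: format normalisation; returns a new dict, input untouched."""
    return {**record, "email": record["email"].strip().lower()}


def to_silver(raw_records):
    """Compose the stages and enforce the contract on every output row."""
    return [enforce_schema(normalise(parse(r)), SILVER_SCHEMA) for r in raw_records]


if __name__ == "__main__":
    raw = [{"id": "7", "email": " A@B.io ", "signed_up": "2024-02-01T08:00:00Z"}]
    print(to_silver(raw))
```

Every stage is a small function that returns new data instead of patching its input, and the schema check sits at the boundary between layers, so downstream code can rely on the contract without re-validating.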