When we took over the data infrastructure at a regional bank in West Africa, we inherited a collection of nightly batch jobs, scattered SQL scripts, and a data warehouse that nobody fully understood. Queries took 40+ minutes. Reports were wrong by the next morning. The analytics team had stopped trusting the data.
We rebuilt it around the Medallion Architecture over eight months. Here is what the implementation actually looked like — the decisions, the mistakes, and the parts we’d do differently.
What Medallion Architecture Is (and Isn’t)
The Medallion Architecture (also called the multi-hop architecture) organises data lake storage into three zones:
- Bronze — raw ingested data, immutable, exactly as it arrived from the source
- Silver — cleaned, validated, joined data; “enterprise truth”
- Gold — aggregated, business-specific datasets built for consumption
The value is not the layer names — it’s the discipline of separating concerns. Bronze is an audit log. Silver is where data quality lives. Gold is where business logic lives. When something is wrong in a report, you can trace it layer by layer.
Our Stack
- Sources → Fivetran / custom Python connectors
- Storage → MinIO (on-prem S3-compatible)
- Format → Delta Lake (Parquet + transaction log)
- Orchestration → Apache Airflow 2.8 (self-hosted on Kubernetes)
- Transformation → dbt Core (Silver → Gold)
- Serving → PostgreSQL (Gold materialised views → BI tools)
- BI → Apache Superset

All self-hosted. The bank had strict data residency requirements — no data could leave their on-premises data center. This ruled out managed cloud services and shaped every infrastructure decision.
Bronze Layer: Ingest Without Transformation
The cardinal rule of Bronze: land data exactly as it arrived. No type coercion, no null handling, no business logic. Everything is a string if you have to choose.
```python
from airflow import DAG
from airflow.providers.postgres.hooks.postgres import PostgresHook
from airflow.operators.python import PythonOperator
from deltalake import write_deltalake
import pandas as pd
from datetime import datetime, timedelta


def ingest_transactions(**context):
    hook = PostgresHook(postgres_conn_id="core_banking_prod")

    # Always use an extraction watermark, never full table scans
    watermark = context["data_interval_start"]
    sql = """
        SELECT *
        FROM transactions
        WHERE updated_at >= %(watermark)s
          AND updated_at < %(watermark_end)s
    """
    df = hook.get_pandas_df(sql, parameters={
        "watermark": watermark,
        "watermark_end": context["data_interval_end"],
    })

    # Preserve source types as strings — Bronze is an archive
    df = df.astype(str)
    now = datetime.utcnow()
    df["_ingested_at"] = now.isoformat()
    df["_ingested_at_date"] = now.date().isoformat()  # derived partition column
    df["_source"] = "core_banking.transactions"
    df["_pipeline_run_id"] = context["run_id"]

    write_deltalake(
        "s3://datalake/bronze/core_banking/transactions",
        df,
        mode="append",
        partition_by=["_ingested_at_date"],  # partition for efficient pruning
        storage_options={"endpoint_url": "http://minio:9000", ...},
    )


with DAG(
    "bronze_core_banking",
    schedule="@hourly",
    start_date=datetime(2025, 1, 1),
    catchup=True,  # critical for backfill capability
    default_args={"retries": 3, "retry_delay": timedelta(minutes=5)},
) as dag:
    PythonOperator(task_id="ingest_transactions", python_callable=ingest_transactions)
```

Three metadata columns we add to every Bronze record:
| Column | Purpose |
|---|---|
| `_ingested_at` | When did we receive this record? |
| `_source` | Which source system and table? |
| `_pipeline_run_id` | Which Airflow run produced this? |
These columns make debugging Silver issues dramatically faster — you can always trace a Silver anomaly back to the specific Bronze ingestion run.
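In practice, that trace is just a filter on the metadata columns. A minimal sketch with pandas standing in for the filtered Delta read (the rows and run IDs are illustrative, not real bank data):

```python
import pandas as pd

# Illustrative Bronze slice; in production this would be a filtered read of
# the Delta table at s3://datalake/bronze/core_banking/transactions.
bronze = pd.DataFrame({
    "transaction_id": ["101", "102", "103"],
    "amount": ["5000.00", "not-a-number", "250.75"],
    "_ingested_at": ["2025-03-01T06:00:00"] * 2 + ["2025-03-01T07:00:00"],
    "_source": ["core_banking.transactions"] * 3,
    "_pipeline_run_id": [
        "scheduled__2025-03-01T06:00",
        "scheduled__2025-03-01T06:00",
        "scheduled__2025-03-01T07:00",
    ],
})

def trace_run(df: pd.DataFrame, run_id: str) -> pd.DataFrame:
    """Return the raw rows a single Airflow run landed in Bronze."""
    return df[df["_pipeline_run_id"] == run_id]

# A Silver anomaly in transaction 102 points straight at the 06:00 run:
suspect = trace_run(bronze, "scheduled__2025-03-01T06:00")
print(suspect["transaction_id"].tolist())  # → ['101', '102']
```

The anomalous `amount` is visible exactly as it arrived, which is the whole point of keeping Bronze immutable.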
Silver Layer: Data Quality at Rest
Silver is where raw data becomes trustworthy data. We use dbt for the Bronze-to-Silver transformation because it brings:
- SQL-based transformations with a clear lineage graph
- Built-in testing (`not_null`, `unique`, `relationships`, custom assertions)
- Incremental model support (process only new/changed records)
```sql
-- models/silver/transactions.sql
{{ config(
    materialized='incremental',
    unique_key='transaction_id',
    on_schema_change='fail',
    tags=['silver', 'core_banking']
) }}

WITH source AS (
    SELECT *
    FROM {{ source('bronze', 'core_banking_transactions') }}
    {% if is_incremental() %}
    WHERE _ingested_at > (SELECT MAX(_ingested_at) FROM {{ this }})
    {% endif %}
),

cleaned AS (
    SELECT
        CAST(transaction_id AS BIGINT) AS transaction_id,
        CAST(account_id AS BIGINT) AS account_id,
        CAST(amount AS DECIMAL(18, 4)) AS amount_xof,  -- XOF = West African CFA franc
        UPPER(TRIM(currency_code)) AS currency_code,
        CASE
            WHEN transaction_type IN ('CR', 'CREDIT') THEN 'credit'
            WHEN transaction_type IN ('DR', 'DEBIT') THEN 'debit'
            ELSE NULL
        END AS transaction_type,
        TRY_CAST(transaction_date AS TIMESTAMP) AS transaction_date,
        _ingested_at,
        _source,
        _pipeline_run_id
    FROM source
    WHERE transaction_id IS NOT NULL
      AND account_id IS NOT NULL
      AND TRY_CAST(amount AS DECIMAL(18, 4)) IS NOT NULL
)

SELECT * FROM cleaned
```

dbt tests sit alongside the model:
```yaml
models:
  - name: transactions
    columns:
      - name: transaction_id
        tests: [not_null, unique]
      - name: account_id
        tests: [not_null]
      - name: transaction_type
        tests:
          - accepted_values:
              values: ['credit', 'debit']
      - name: amount_xof
        tests:
          - not_null
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: -10000000
              max_value: 10000000
```

When a test fails, the Silver DAG fails. The previous clean version of Silver remains intact. This is the ACID benefit of Delta Lake — failed transformations don’t corrupt existing data.
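That fail-closed behaviour is worth internalising. A toy sketch in plain Python (no Delta or dbt involved; the names are hypothetical): the published rows are swapped only when the candidate batch passes its tests.

```python
def run_tests(rows):
    """Mimic the dbt tests above: not_null + unique on transaction_id."""
    ids = [r.get("transaction_id") for r in rows]
    return all(i is not None for i in ids) and len(ids) == len(set(ids))

def publish(current, candidate):
    """Swap in the candidate only if tests pass; otherwise keep current."""
    return candidate if run_tests(candidate) else current

silver_v1 = [{"transaction_id": 1}, {"transaction_id": 2}]
bad_batch = silver_v1 + [{"transaction_id": None}]   # would fail not_null
good_batch = silver_v1 + [{"transaction_id": 3}]

assert publish(silver_v1, bad_batch) is silver_v1    # previous version intact
assert publish(silver_v1, good_batch) is good_batch  # clean batch goes live
```

Delta Lake gives you this atomic swap for free via its transaction log; the sketch only shows the contract consumers rely on.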
Gold Layer: Business Logic, Not Data Logic
Gold models are built for specific business questions. They should be opinionated and denormalised for query performance.
```sql
-- models/gold/daily_account_summary.sql
{{ config(
    materialized='table',
    tags=['gold', 'finance']
) }}

SELECT
    d.date_key,
    a.account_id,
    a.account_type,
    a.branch_code,
    SUM(CASE WHEN t.transaction_type = 'credit' THEN t.amount_xof ELSE 0 END) AS daily_credits_xof,
    SUM(CASE WHEN t.transaction_type = 'debit' THEN t.amount_xof ELSE 0 END) AS daily_debits_xof,
    COUNT(*) AS transaction_count,
    -- Running balance: the inner SUM aggregates the day's net movement,
    -- the outer SUM() OVER accumulates it across ordered days per account
    SUM(SUM(
        CASE
            WHEN t.transaction_type = 'credit' THEN t.amount_xof
            WHEN t.transaction_type = 'debit' THEN -t.amount_xof
            ELSE 0
        END
    )) OVER (
        PARTITION BY a.account_id
        ORDER BY d.date_key
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_balance_xof
FROM {{ ref('dim_date') }} d
JOIN {{ ref('transactions') }} t
    ON DATE(t.transaction_date) = d.date_key
JOIN {{ ref('accounts') }} a
    ON t.account_id = a.account_id
GROUP BY d.date_key, a.account_id, a.account_type, a.branch_code
```

What We’d Do Differently
After eight months in production, three things we’d change:
1. Schema enforcement earlier. We were too permissive in Bronze, treating everything as strings and coercing in Silver. For sources where we control the schema (our own microservices), we should enforce structure at ingestion with a schema registry. The flexibility was rarely needed; the type ambiguity caused real bugs.
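What that ingestion-time enforcement could look like, as a minimal sketch: the schema, field names, and types below are hypothetical, not the bank's actual contract. For sources we own, we can demand real types instead of landing everything as strings.

```python
# Hypothetical declared schema for one owned source.
EXPECTED_SCHEMA = {
    "transaction_id": int,
    "amount": float,
    "updated_at": str,
}

class SchemaViolation(Exception):
    """Raised when an incoming record drifts from the agreed schema."""

def enforce_schema(record: dict) -> dict:
    """Reject records with unexpected columns or types before Bronze."""
    if set(record) != set(EXPECTED_SCHEMA):
        drift = sorted(set(record) ^ set(EXPECTED_SCHEMA))
        raise SchemaViolation(f"column drift: {drift}")
    for field, expected in EXPECTED_SCHEMA.items():
        if not isinstance(record[field], expected):
            raise SchemaViolation(f"{field}: expected {expected.__name__}")
    return record

ok = enforce_schema({"transaction_id": 7, "amount": 100.0,
                     "updated_at": "2025-03-01T08:00:00"})  # passes
```

A proper schema registry (Avro/Protobuf) does this at the serialisation layer, but even a check this crude fails loudly at ingestion instead of silently in Silver.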
2. Data contracts from the start. When source teams changed column names or semantics, our Silver models broke silently (wrong data, no test failure). Data contracts formalise expectations between producers and consumers. We retrofitted them; they should have been day one.
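What we retrofitted ended up close to dbt’s model contracts (dbt Core 1.5+), which fail the build when a column’s name or type drifts. A sketch with illustrative types:

```yaml
models:
  - name: transactions
    config:
      contract:
        enforced: true
    columns:
      - name: transaction_id
        data_type: bigint
      - name: transaction_type
        data_type: varchar
      - name: amount_xof
        data_type: numeric(18,4)
```

With `enforced: true`, a renamed or retyped column breaks the build loudly at `dbt run`, rather than producing wrong data with no test failure.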
3. Observability before features. We spent months building Gold models before properly instrumenting Bronze freshness. When a source started delivering stale data, we had no alert — we found out from the analytics team. Integrate data freshness checks (dbt source freshness, Great Expectations, or elementary) before you rely on the data downstream.
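The dbt source freshness check we later added looks roughly like this (thresholds are illustrative; `_ingested_at` is cast because Bronze stores it as a string):

```yaml
# sources.yml: freshness thresholds are illustrative
sources:
  - name: bronze
    tables:
      - name: core_banking_transactions
        loaded_at_field: "CAST(_ingested_at AS TIMESTAMP)"
        freshness:
          warn_after: {count: 2, period: hour}
          error_after: {count: 6, period: hour}
```

`dbt source freshness` then fails the run when a source goes stale, so the pipeline pages us instead of the analytics team noticing first.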
The Medallion Architecture is not magic. It is a discipline: respect the layer boundaries, never skip Bronze, put your data quality tests in Silver, and let Gold be boring. The boring pipelines are the ones that run reliably at 3 AM.