ETL Data Architectures – Part 1

Explore ETL data architectures like Medallion, Lambda, and Kappa in Databricks. Learn how to choose the right architecture for your data needs.

Share this on:

Choosing the right architecture for your data needs is a crucial first step in launching your project. There are several ETL data architectures available to meet different business requirements and data complexities.

This technical blog series consists of two parts:

The first part covers commonly used data architecture in Databricks such as the Medallion Architecture, Lambda Architecture, and Kappa Architecture, which we have implemented for various clients.
The second part consists of a few more commonly used data architectures across the data industry such as Data Vault Architecture, Kimball’s Dimensional Data Warehouse, Inmon’s Corporate Information Factory, and Lakehouse Architecture.

Databricks Data Architectures

1. Medallion Architecture

The Medallion Architecture is a structured method for data processing commonly used in data lakes.

It tackles the challenges of handling vast amounts of varied data by categorizing it into increasingly refined layers.

The main goal is to enhance data quality and traceability while establishing a clear framework for data governance.

Bronze Layer

The first layer of data ingestion is the Bronze Layer.

In this layer, raw data is collected from a source that includes point-of-sale systems, online transactions, IoT devices, and other data-generating systems.

This raw data is kept in its original format within a data lake so that large volumes of diverse data types can be stored without immediate transformation.

Silver Layer

The Silver Layer is focused on cleaning and enriching the raw data gathered during the Bronze Layer. Here, the data undergoes a sequence of processes meant to enhance its quality and usefulness.

These involve removing duplicates, standardizing the format of the data, correcting errors, and appending additional contextual information, such as customer demographics or product metadata.

Gold Layer

The Gold Layer is the last layer in the Medallion Architecture, where data is aggregated & refined for specific business applications. This layer is committed to creating business-specific datasets optimized for reporting, dashboarding, and advanced analytics.

The data is very curated and sometimes tailored to suit the needs of different business units, such as finance, marketing, & operations.

2. Lambda Architecture

The Lambda Architecture is a data processing framework that effectively manages large volumes of data by combining batch and real-time processing techniques.

It meets the demand for real-time analytics while also enabling thorough analysis of historical data.

This architecture is especially useful in situations where both quick updates and extensive data processing are necessary.

Lambda Architecture consists of three primary components: the Batch Layer, the Speed Layer, and the Serving Layer.

Each component has a distinct function that ensures data is processed efficiently.

Batch Layer

The Batch Layer processes and stores vast amounts of historical data.

It performs extensive data processing operations that can handle higher latencies, like calculating long-term aggregates, trends, and patterns.

The outcome is stored in a batch view, which presents a unified view of the historical information.

Speed Layer

The Speed Layer, however, deals with real-time data streams, processing the latest data as soon as it is available. This component complements the Batch Layer to produce immediate insights and updates based on the most recent data.
Typically, the Speed Layer is built using stream processing frameworks such as Apache Storm, Flink, or Kafka Streams. The objective of the Speed Layer is to process real-time data as fast as possible while producing incremental updates that can be merged with the batch view within the Serving Layer. This will ensure that both the latest and historical data that have been processed by the Batch Layer are made available to users.

Serving Layer

The final layer is the Serving Layer, which combines both batch view and real-time view.

This layer makes it possible for queries to have access to historical as well as real-time data, allowing users to conduct complex analytical tasks.

It integrates the outputs coming from the Speed Layer and Batch Layer, bringing about the idea of the Serving Layer to facilitate better query performance.

In return, this will allow the extraction of valid information from the source, despite unknown origin.

3. Kappa Architecture

The Kappa Architecture is a simplification of the Lambda Architecture but is designed specifically for real-time data processing. It does not include batch processing but focuses solely on the processing of data streams in real-time.

This architecture is ideal for situations where there is no need to process historical data, with an emphasis on low-latency ingestion and processing of data. Kappa Architecture aims to simplify the entire data pipeline and, by providing a single stream layer for consumption, processing, and storage.

This would solve the problems relating to handling one batch layer in addition to at least one speed layer and its complexity when compared to others.

Streaming Layer

The Streaming Layer is one of the key components of the Kappa Architecture, responsible for real-time ingestion and processing of data.

It uses stream processing frameworks like Apache Kafka, and Apache Flink, to handle continuous data streams.

The main role of the Streaming Layer is to handle incoming data events as they happen, performing transformations and computations in real-time.

This approach guarantees that data is processed with minimal delay and is readily available for immediate use.

Serving Layer

The Serving Layer of the Kappa Architecture provides a real-time view of the processed data.

It holds the output from the Streaming Layer and enables real-time queries and analytics.

This layer is designed to support high query volumes with rapid response times.

The Streaming Layer and the Serving Layer are linked, ensuring that the users are always accessing the most recent data.

This ensures that real-time decision-making and analytics are possible.

Lumendata’s Architecture Selection Guidelines

At LumenData, we extensively work with the Databricks platform, implementing the above data architectures for clients. Listed below are a few guidelines for selecting correct and suitable architecture depending on use cases.

Select Medallion Architecture When:

Select Lambda Architecture When:

Select Kappa Architecture When:

Every architecture has a purpose, based on specific use cases, business needs, and technical limitations. The important thing is to understand these trade-offs and choose the right architecture that fits your needs.

About LumenData:

LumenData is a leading provider of Enterprise Data Management, Cloud and Analytics solutions and helps businesses handle data silos, discover their potential, and prepare for end-to-end digital transformation. Founded in 2008, the company is headquartered in Santa Clara, California, with locations in India.

With 150+ Technical and Functional Consultants, LumenData forms strong client partnerships to drive high-quality outcomes. Their work across multiple industries and with prestigious clients like Versant Health, Boston Consulting Group, FDA, Department of Labor, Kroger, Nissan, Autodesk, Bayer, Bausch & Lomb, Citibank, Credit Suisse, Cummins, Gilead, HP, Nintendo, PC Connection, Starbucks, University of Colorado, Weight Watchers, KAO, HealthEdge, Amylyx, Brinks, Clara Analytics, and Royal Caribbean Group, speaks to their capabilities.

For media inquiries, please contact: marketing@lumendata.com.

Authors

Ritesh Chidrewar

Senior Consultant

resources

Read our Case Studies

LumenData implements data cataloging and lineage with Snowflake and dbt for global consulting firm

Case Studies, Snowflake

LumenData Enables Comprehensive Data Cataloging & Lineage using Snowflake and dbt Systems for a Global Consulting Firm

Explore how LumenData helped to increase supplier performance & risk management and reduce time-to-insight for decision-makers.

Learn more

LumenData enables UD Trucks to improve data access and scalability using Informatica SaaS MDM

Case Studies, Informatica, Manufacturing

LumenData Helps UD Trucks to Use Informatica SaaS MDM to Enable Faster Data Access and Enhanced Scalability

Explore How LumenData Helps UD Trucks to Use Informatica SaaS MDM to Enable Faster Data Access and Enhanced Scalability

Learn more

Data modernization and intelligent reporting for a corporate travel provider

Case Studies, Travel & Hospitality

Data Modernization and Intelligent Reporting for a Leading Corporate Travel Provider

See how LumenData empowered a travel firm with data modernization, MDM upgrades, and real-time insights to boost growth and efficiency.

Learn more

ETL Data Architectures – Part 1

What You'll Learn

Databricks Data Architectures

1. Medallion Architecture

Bronze Layer

Silver Layer

Gold Layer

2. Lambda Architecture

Batch Layer

Speed Layer

Serving Layer

3. Kappa Architecture

Streaming Layer

Serving Layer

Lumendata’s Architecture Selection Guidelines

Select Medallion Architecture When:

Select Lambda Architecture When:

Select Kappa Architecture When:

About LumenData:

Authors

Read our Case Studies

Our Partners

LumenData: 2025 Informatica Global Growth Partner & Data Backbone for the Salesforce Ecosystem

Solutions

LumenData Accelerator for MDM Modernization

LumenData 360++ Extension for Supplier 360

LumenData Accelerator for Higher Ed 360

LumenData Axon to CDGC Modernization​

Informatica Reference 360 SaaS Accelerator

Life Science Accelerator for Customer360 SaaS

Migrating from Oracle DRM to Informatica R360

Salesforce Accelerator for Customer360 SaaS

SAP Accelerator for Customer360 SaaS

Salesforce Connector for Informatica MDM

Salesforce Connector for Oracle MDM Product

Reltio Integration for Salesforce

Industry Focus

Public Sector

Financial Services

Higher Education

Retail

Healthcare

Manufacturing

High Tech

Travel & Hospitality

Featured Case Studies

Services Focus

Cloud Modernization

Data Engineering & Analytics

Data Strategy & Business Value Assessment

Solutions

Industry Focus

Telemedicine

Dating Apps

Fintech

Consulting Providers

Featured Case Studies

Simplifying IT for a complex world.

Platform partnerships

Services

Business Challenges

Digital Transformation

Security

Automation

Gaining Efficiency

Industry Focus

LumenData

How can LumenData help?

LumenData Axon to CDGC Modernization

Simplifying IT
for a complex world.