Orchestrating ETL Workflows with Snowflake Tasks: A Real-World Success Story

See how LumenData used Snowflake native tasks to simplify ETL workflows, cut costs, and improve monitoring for an e-commerce client.


What You'll Learn

This blog covers a recent deployment in which we used Snowflake's native task orchestration feature to address data pipeline maintenance issues for a customer. Learn how we delivered a more streamlined, economical solution.

Client Challenges

Our customer, a growing e-commerce business, was facing multiple pain points in its existing data pipeline infrastructure.

LumenData’s Approach and Snowflake-Native Solution

After evaluating the client's requirements, we developed a custom Snowflake-native solution.

Our solution's architecture is centered on the following key components:

1. Dimensional Data Model

We refactored and enhanced the client's dimensional model into a standard star schema consisting of three dimension tables and one fact table.

(Note: all examples and models are for reference purposes only and merely resemble the client's data model.)

Customer Dimension

Stores customer profiles and attributes

Product Dimension

Contains product details and pricing information

Date Dimension

Maintains a comprehensive date hierarchy

Sales Fact

Captures transactional data connected to the dimensions
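
For reference, a minimal version of this star schema might look like the following (all table and column names are illustrative, not the client's actual model):

CREATE TABLE customer_dim (
    customer_id   INT PRIMARY KEY,
    customer_name VARCHAR,
    email         VARCHAR,
    region        VARCHAR
);

CREATE TABLE product_dim (
    product_id    INT PRIMARY KEY,
    product_name  VARCHAR,
    category      VARCHAR,
    unit_price    NUMBER(10,2)
);

CREATE TABLE date_dim (
    date_id       INT PRIMARY KEY,   -- e.g. 20240131 for 2024-01-31
    full_date     DATE,
    month_num     INT,
    quarter_num   INT,
    year_num      INT
);

CREATE TABLE sales_fact (
    sale_id       INT AUTOINCREMENT,
    customer_id   INT REFERENCES customer_dim (customer_id),
    product_id    INT REFERENCES product_dim (product_id),
    date_id       INT REFERENCES date_dim (date_id),
    quantity      INT,
    total_amount  NUMBER(12,2)
);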

2. Stored Procedures for Data Update

For each update operation, we created a dedicated stored procedure.

update_customer_proc():
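
A minimal sketch of what this procedure might look like, assuming a customer_stage staging table that feeds the customer_dim table shown above:

CREATE OR REPLACE PROCEDURE update_customer_proc()
RETURNS STRING
LANGUAGE SQL
AS
$$
BEGIN
    -- Upsert staged customer records into the customer dimension
    MERGE INTO customer_dim AS tgt
    USING customer_stage AS src
        ON tgt.customer_id = src.customer_id
    WHEN MATCHED THEN UPDATE SET
        tgt.customer_name = src.customer_name,
        tgt.email         = src.email,
        tgt.region        = src.region
    WHEN NOT MATCHED THEN INSERT
        (customer_id, customer_name, email, region)
        VALUES (src.customer_id, src.customer_name, src.email, src.region);
    RETURN 'customer_dim updated';
END;
$$;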

Similar procedures were created for the product and date dimensions, plus an aggregation procedure for the sales fact table:
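
A comparable sketch of the aggregation procedure (the procedure name and the sales_stage staging table are assumptions):

CREATE OR REPLACE PROCEDURE update_sales_fact_proc()
RETURNS STRING
LANGUAGE SQL
AS
$$
BEGIN
    -- Aggregate staged transactions to the fact table's grain and load them
    INSERT INTO sales_fact (customer_id, product_id, date_id, quantity, total_amount)
    SELECT customer_id,
           product_id,
           date_id,
           SUM(quantity),
           SUM(amount)
    FROM sales_stage
    GROUP BY customer_id, product_id, date_id;
    RETURN 'sales_fact loaded';
END;
$$;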

3. Task Orchestration with Dependencies

At the core of our solution is a hierarchical task framework that manages execution dependencies. Snowflake tasks support CRON scheduling, which makes recurring scheduling and orchestration simpler and more efficient.

For testing and demo purposes, we commented out the CRON schedule and hardcoded a 1-minute schedule for the pipeline.
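
A hedged sketch of the task definitions (the warehouse name, child task names, and CRON expression are assumptions; only root_task_test_ritesh appears in the testing commands below):

CREATE OR REPLACE TASK root_task_test_ritesh
    WAREHOUSE = etl_wh
    -- SCHEDULE = 'USING CRON 0 2 * * * UTC'   -- production schedule, commented out
    SCHEDULE = '1 MINUTE'                      -- hardcoded for testing/demo
AS
    SELECT 'pipeline start';

CREATE OR REPLACE TASK update_customer_task
    WAREHOUSE = etl_wh
    AFTER root_task_test_ritesh
AS
    CALL update_customer_proc();

CREATE OR REPLACE TASK update_product_task
    WAREHOUSE = etl_wh
    AFTER root_task_test_ritesh
AS
    CALL update_product_proc();

CREATE OR REPLACE TASK update_date_task
    WAREHOUSE = etl_wh
    AFTER root_task_test_ritesh
AS
    CALL update_date_proc();

-- The fact aggregation runs only after all three dimension updates complete
CREATE OR REPLACE TASK update_sales_fact_task
    WAREHOUSE = etl_wh
    AFTER update_customer_task, update_product_task, update_date_task
AS
    CALL update_sales_fact_proc();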

This created a directed acyclic graph (DAG) of dependencies: the root task triggers the three dimension updates in parallel, and the fact-table aggregation runs only after all of them complete.

Testing Commands

Below are the testing commands for reference:

ALTER TASK root_task_test_ritesh RESUME;    -- starts/resumes a task
ALTER TASK root_task_test_ritesh SUSPEND;   -- pauses/suspends a task

Note that tasks are created in a suspended state, and child tasks must be resumed before their root task; SYSTEM$TASK_DEPENDENTS_ENABLE can resume all tasks in a DAG with a single call.

Manual execution command for testing the workflow/pipeline:

EXECUTE TASK root_task_test_ritesh;

4. Monitoring and Debugging

This approach gave us significant advantages, including comprehensive monitoring and debugging.

Centralized Task History: Snowflake provides built-in task execution history through the INFORMATION_SCHEMA:
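
For example (the RESULT_LIMIT value is illustrative):

SELECT name,
       state,
       scheduled_time,
       completed_time,
       error_message
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(RESULT_LIMIT => 100))
ORDER BY scheduled_time DESC;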

This provides 360-degree visibility into:

Task Execution Times

Run Duration

Success/Failure Status

Error Messages

Dependencies & Triggers

All this information is available in one place without needing to correlate logs across multiple systems.

Alternatively, you can use Snowflake's Snowsight UI to check individual task runs.

Real-Time Status Monitoring: The client's team can now monitor pipeline status in real time through simple SQL queries:

SHOW TASKS;

This provides immediate visibility into:

Which tasks are currently running

Which tasks are scheduled

Which tasks are suspended due to issues

Historical Performance Analysis: Our Snowflake-native solution also helps the client analyze historical pipeline performance (see the query sketch after this list).

This includes:

Identifying tasks that are taking longer than expected

Identifying recurring patterns in failures

Optimizing warehouse sizing based on actual resource usage
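
A hedged sketch of such an analysis query; note that INFORMATION_SCHEMA.TASK_HISTORY covers roughly the last seven days, while the SNOWFLAKE.ACCOUNT_USAGE.TASK_HISTORY view offers longer retention:

SELECT name,
       COUNT(*) AS total_runs,
       SUM(IFF(state = 'FAILED', 1, 0)) AS failed_runs,
       AVG(DATEDIFF('second', query_start_time, completed_time)) AS avg_duration_sec
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(RESULT_LIMIT => 1000))
GROUP BY name
ORDER BY avg_duration_sec DESC;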

5. Further Developments

We are now building a Streamlit-based UI dashboard for self-service troubleshooting and proactive monitoring. It consolidates the details captured in the task history.

This saves the client team from manually executing SQL commands in the Snowflake environment; they can get the details directly from the Streamlit UI dashboard.

Furthermore, we are integrating email and Microsoft Teams notifications for proactive alerting on pipeline failures.

Check out our implemented solutions here:

6. Key Takeaways

About LumenData:

LumenData is a leading provider of Enterprise Data Management, Cloud and Analytics solutions and helps businesses handle data silos, discover their potential, and prepare for end-to-end digital transformation. Founded in 2008, the company is headquartered in Santa Clara, California, with locations in India. 

With 150+ Technical and Functional Consultants, LumenData forms strong client partnerships to drive high-quality outcomes. Their work across multiple industries and with prestigious clients like Versant Health, Boston Consulting Group, FDA, Department of Labor, Kroger, Nissan, Autodesk, Bayer, Bausch & Lomb, Citibank, Credit Suisse, Cummins, Gilead, HP, Nintendo, PC Connection, Starbucks, University of Colorado, Weight Watchers, KAO, HealthEdge, Amylyx, Brinks, Clara Analytics, and Royal Caribbean Group, speaks to their capabilities. 

For media inquiries, please contact: marketing@lumendata.com.

Authors

Shalu Santvana

Content Writer

Ritesh Chidrewar

Senior Consultant
