Migrating from an On-Premises Data Warehouse to Snowflake Using Informatica IDMC: A Practical Guide

Learn how to migrate from on-prem data warehouse to Snowflake using Informatica IDMC. Boost performance, cut costs, and enable AI-driven analytics.

What You'll Learn

Migrating an on-premises data warehouse to Snowflake is not just a platform change; it is an opportunity to modernize pipelines, lower operational overhead, improve query performance, and enable AI and analytics at scale. By combining Informatica Intelligent Data Management Cloud (IDMC) with Snowflake, you get an enterprise-grade migration stack: IDMC provides connectors, bulk loading accelerators, data quality, cataloging, and security pushdown that reduce migration risk and accelerate time to value. This guide outlines a practical roadmap for migrating from an on-premises data warehouse to Snowflake.

Why migrate from an on-prem data warehouse to Snowflake

Companies are switching to Snowflake for compelling reasons:

Speed & agility

Your analysts and data scientists get a single, scalable SQL environment that separates compute from storage, letting them run analytics faster.

Lower total cost of ownership

Cloud storage, on-demand compute, and better resource isolation shrink infrastructure and maintenance costs.

Simplified operations

 Snowflake eliminates the need to manage HDFS, hardware refreshes, and many of the operational tasks of an on-prem warehouse.

Modern analytics and AI readiness

Snowflake has native integrations for model serving, dynamic data sharing, and operationalizing ML workflows.

Faster time to insight

Properly executed migrations allow you to accelerate reporting and deliver new real-time or near-real-time analytics that were hard to obtain on legacy platforms.

These drivers elevate a migration project into a strategic program that changes the data platform architecture, governance, and operating model.

High-level Snowflake migration approaches

Three standard approaches are commonly used. Select the one that matches your environment's complexity, your risk tolerance, and your business needs.

Role of Informatica IDMC in a Snowflake migration

Informatica Intelligent Data Management Cloud plays several critical roles in an enterprise migration to Snowflake:

  • IDMC’s Snowflake connectors and ingestion accelerators support large-file ingestion, staged loads, and scalable, parallelized movement of data into Snowflake. This minimizes custom scripting and reduces the complexity and risk of bulk-load migration.
  • IDMC lets you build pipelines that pre-transform data before ingestion, or load the raw data and transform it inside Snowflake, supporting both ETL and ELT patterns. This is crucial for modernizing legacy ETL jobs.
  • IDMC data quality modules let you profile, cleanse, and validate data before and after migration to prevent garbage-in, garbage-out problems in Snowflake. Building data quality into the migration helps you avoid re-migrating.
  • IDMC’s Cloud Data Access Management and native pushdown of security policies to Snowflake keep access control enforcement consistent, support enterprise privacy rules, and allow governance to be tightened using Snowflake’s native controls.
  • Informatica IDMC and Snowflake quickstarts provide templates and automation that reduce the time to first load and help validate typical migration scenarios.

With IDMC, you can automate much of the Snowflake migration process, validate data, and enforce security controls at scale.

Pre-migration assessment for on-premises data warehouse to Snowflake migration

Key activities include:

Technical migration steps for moving from on-prem data warehouse to Snowflake

This section gives a prescriptive, technical sequence you can practically follow.

Step 1. Extracting data from source systems and staging

Move the data from your on-prem sources to a cloud staging area. For larger data sets, cloud object storage such as AWS S3, Azure Blob Storage, or GCP Cloud Storage is typically used. Partition files into sizes appropriate for bulk loading into Snowflake; aim for compressed files between 100 and 250 megabytes. Snowflake can then bulk load the compressed files efficiently via the COPY operation.
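As a rough illustration, the sketch below exports a large table from the legacy warehouse in chunks, compresses each chunk, and uploads it to an S3 staging prefix. It assumes pyodbc access to the source and a boto3-accessible bucket; the DSN, bucket, table, and chunk size are hypothetical and should be tuned so compressed parts land near the 100-250 MB target.

```python
import csv
import gzip
import io

import boto3   # assumed: AWS SDK for the S3 staging bucket
import pyodbc  # assumed: ODBC access to the legacy warehouse

CHUNK_ROWS = 1_000_000        # tune so compressed parts land near 100-250 MB
BUCKET = "my-staging-bucket"  # hypothetical bucket
PREFIX = "edw/orders/"        # hypothetical staging prefix

src_cur = pyodbc.connect("DSN=legacy_dw").cursor()  # hypothetical DSN
src_cur.execute("SELECT * FROM edw.orders")
s3 = boto3.client("s3")

part = 0
while True:
    rows = src_cur.fetchmany(CHUNK_ROWS)
    if not rows:
        break
    # Serialize the chunk as CSV and gzip it before uploading one part file.
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    body = gzip.compress(buf.getvalue().encode("utf-8"))
    s3.put_object(Bucket=BUCKET, Key=f"{PREFIX}orders_{part:05d}.csv.gz", Body=body)
    part += 1
```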

Step 2. Data schema conversion and mappings

Convert data types and schemas to Snowflake-compatible formats. This includes timestamp, numeric precision, and geometry types. Document the mapping of keys, columns, and constraints so you have a record you can use to improve the data model. Automation tools can accelerate schema conversion, but you should still set aside time to validate the more complex transformations.
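The sketch below illustrates the kind of type map and DDL generation you might start from; the legacy types shown are Oracle-style examples and the mappings are simplified, so precision, scale, time zones, and geometry handling still need per-column review.

```python
# Illustrative mapping from Oracle-style source types to Snowflake types.
LEGACY_TO_SNOWFLAKE = {
    "VARCHAR2": "VARCHAR",
    "NUMBER": "NUMBER(38, 0)",
    "DATE": "TIMESTAMP_NTZ",
    "TIMESTAMP WITH TIME ZONE": "TIMESTAMP_TZ",
    "CLOB": "VARCHAR",
    "SDO_GEOMETRY": "GEOGRAPHY",
}

def snowflake_ddl(table: str, columns: list[tuple[str, str]]) -> str:
    """Build CREATE TABLE DDL from (column_name, legacy_type) pairs."""
    cols = ",\n  ".join(
        f"{name} {LEGACY_TO_SNOWFLAKE.get(legacy_type, 'VARCHAR')}"
        for name, legacy_type in columns
    )
    return f"CREATE OR REPLACE TABLE {table} (\n  {cols}\n);"

print(snowflake_ddl("EDW.PUBLIC.ORDERS",
                    [("ORDER_ID", "NUMBER"), ("ORDER_TS", "DATE")]))
```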

Step 3. Data cleaning and standardizing

Use IDMC data quality workflows to standardize fields by defining business rules. Cleaning the data before it is loaded saves rework later.
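IDMC data quality rules are defined in its own design interface; the snippet below simply illustrates, in plain Python, the kind of field-standardization logic such rules typically encode (the country codes and phone formats are hypothetical examples).

```python
import re

# Plain-Python illustration of field-standardization rules; in practice these
# would be authored as business rules in an IDMC data quality workflow.
COUNTRY_ALIASES = {"USA": "US", "U.S.": "US", "UNITED STATES": "US"}

def standardize_country(value: str) -> str:
    v = value.strip().upper()
    return COUNTRY_ALIASES.get(v, v)

def standardize_phone(value: str) -> str:
    digits = re.sub(r"\D", "", value)  # keep digits only
    return digits[-10:] if len(digits) >= 10 else digits

assert standardize_country(" united states ") == "US"
assert standardize_phone("(408) 555-0199") == "4085550199"
```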

Step 4. Bulk loading to Snowflake

Use Snowflake COPY commands, Snowpipe for continuous ingestion, or Informatica IDMC mass ingestion accelerators to process loads in parallel. For the initial load, stage files in cloud storage and then COPY them into Snowflake rather than loading directly. Adjust warehouse sizes before and during loading to balance load time and cost. Depending on the use case and time sensitivity, consider transient or temporary tables for staging data.
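A minimal sketch of the staged bulk-load pattern using the Snowflake Python connector is shown below; the account, credentials, stage, bucket, and table names are placeholders, and the ON_ERROR and purge behavior should follow your own runbook.

```python
import snowflake.connector  # pip install snowflake-connector-python

# Hypothetical connection details; replace with your account, role, and credentials.
cur = snowflake.connector.connect(
    account="my_account", user="etl_user", password="***",
    warehouse="MIGRATION_WH", database="EDW", schema="PUBLIC",
).cursor()

# External stage over the S3 prefix used during extraction (names are illustrative).
cur.execute("""
    CREATE STAGE IF NOT EXISTS orders_stage
      URL = 's3://my-staging-bucket/edw/orders/'
      CREDENTIALS = (AWS_KEY_ID = '***' AWS_SECRET_KEY = '***')
      FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP FIELD_OPTIONALLY_ENCLOSED_BY = '"')
""")

# Bulk load every staged file in parallel; error handling should match your runbook.
cur.execute("""
    COPY INTO EDW.PUBLIC.ORDERS
    FROM @orders_stage
    ON_ERROR = 'ABORT_STATEMENT'
""")
print(cur.fetchall())  # per-file load results, useful for the reconciliation log
```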

Step 5. Transformations and ELT

Where possible, run transformations inside Snowflake so you can take advantage of its parallel compute. This reduces network movement and simplifies orchestration. When transformation at the source is required, document why the data is pre-transformed and make sure the logic is captured in versioned mapping artifacts in IDMC.
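For example, a raw-to-conformed transformation can run entirely on Snowflake compute with a simple CREATE TABLE AS SELECT; the tables and columns below are illustrative only.

```python
import snowflake.connector

cur = snowflake.connector.connect(
    account="my_account", user="etl_user", password="***",
    warehouse="TRANSFORM_WH", database="EDW", schema="PUBLIC",
).cursor()

# ELT pattern: land raw data first, then build conformed tables on Snowflake compute.
cur.execute("""
    CREATE OR REPLACE TABLE EDW.ANALYTICS.DAILY_ORDER_SUMMARY AS
    SELECT order_date,
           region,
           COUNT(*)       AS order_count,
           SUM(order_amt) AS total_amount
    FROM EDW.PUBLIC.ORDERS
    GROUP BY order_date, region
""")
```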

Step 6. Delta capture and continuous synchronization

During cutover, you will need to track changes that happen after the initial bulk load. Use CDC tools or the source system's built-in change-tracking methods to capture deltas. IDMC connectors frequently include the capability to push incremental updates into Snowflake efficiently using CDC patterns.
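One common way to apply captured deltas is a MERGE against the target table. The sketch below assumes a hypothetical ORDERS_DELTA landing table with an op flag marking inserts, updates, and deletes.

```python
import snowflake.connector

cur = snowflake.connector.connect(
    account="my_account", user="etl_user", password="***",
    warehouse="MIGRATION_WH", database="EDW", schema="PUBLIC",
).cursor()

# Apply a CDC delta batch (landed in ORDERS_DELTA with an 'op' flag) to the target.
cur.execute("""
    MERGE INTO EDW.PUBLIC.ORDERS AS tgt
    USING EDW.STAGING.ORDERS_DELTA AS src
      ON tgt.order_id = src.order_id
    WHEN MATCHED AND src.op = 'D' THEN DELETE
    WHEN MATCHED THEN UPDATE SET tgt.order_amt = src.order_amt,
                                 tgt.updated_at = src.updated_at
    WHEN NOT MATCHED AND src.op <> 'D' THEN
      INSERT (order_id, order_amt, updated_at)
      VALUES (src.order_id, src.order_amt, src.updated_at)
""")
```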

Step 7. Validation and reconciliation

Automate the comparison of record counts, checksums, referential integrity checks, and business KPIs between the old system and Snowflake. Create reconciliation dashboards, set up automated alerts, and run both sample validation and complete reconciliation for important datasets.
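A minimal reconciliation sketch comparing row counts between the legacy warehouse and Snowflake is shown below; it assumes the same table names exist on both sides and would be extended with checksums and business KPIs per dataset.

```python
import pyodbc               # assumed ODBC access to the legacy warehouse
import snowflake.connector

sf = snowflake.connector.connect(
    account="my_account", user="etl_user", password="***",
    warehouse="MIGRATION_WH", database="EDW", schema="PUBLIC",
).cursor()
legacy = pyodbc.connect("DSN=legacy_dw").cursor()  # hypothetical DSN

def row_counts(table: str) -> tuple[int, int]:
    """Row counts on both sides; extend with checksums and KPIs per dataset."""
    legacy.execute(f"SELECT COUNT(*) FROM {table}")
    sf.execute(f"SELECT COUNT(*) FROM {table}")
    return legacy.fetchone()[0], sf.fetchone()[0]

for table in ("EDW.PUBLIC.ORDERS", "EDW.PUBLIC.CUSTOMERS"):
    old, new = row_counts(table)
    status = "OK" if old == new else "MISMATCH"
    print(f"{table}: legacy={old} snowflake={new} [{status}]")
```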

What should be the testing, validation, and cutover strategy for Snowflake migration?

Testing is where most on-prem to Snowflake migration projects succeed or fail. We recommend a layered testing approach.

Unit tests: Each mapping or conversion should have unit tests to validate field-level transformations, null handling, and type conversion.

Integration tests: Run end-to-end pipeline tests with representative volumes. Validate outputs in downstream consumers such as dashboards, BI extracts, and APIs.

Performance tests: Run load and query performance tests, including concurrent and peak loads. Use Snowflake query profiling to identify hotspots and adjust clustering keys and warehouse sizes as appropriate.
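As a starting point, the slowest recent statements can be pulled from SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY (access to that share requires appropriate privileges); the sketch below is illustrative.

```python
import snowflake.connector

cur = snowflake.connector.connect(
    account="my_account", user="perf_analyst", password="***",
    warehouse="ADMIN_WH",
).cursor()

# Slowest statements from the last week: candidates for tuning or re-clustering.
cur.execute("""
    SELECT query_id,
           warehouse_name,
           total_elapsed_time / 1000 AS elapsed_s,
           partitions_scanned,
           partitions_total
    FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
    WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
    ORDER BY total_elapsed_time DESC
    LIMIT 20
""")
for row in cur.fetchall():
    print(row)
```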

Parallel run: Deliver both legacy and Snowflake outputs to business users in parallel for a defined business period. Compare all outputs, monitor for divergence, and have business users sign off on completion. Parallel runs are a key risk-mitigation step for mission-critical datasets.

Cutover rehearsal and go-live: Undertake at least one full dress rehearsal of the cutover, covering freeze windows, delta capture, final reconciliation, and rollback procedures. During go-live, have hypercare teams available to triage and fix issues as they arise. Document every step in the runbook and timestamp each process for audit purposes.

Post-migration: Performance, cost and operational tuning in Snowflake

After migration, it’s important to optimize both query performance and cost. Here are the main points to get you started:

Micro-partition awareness: Snowflake automatically micro-partitions your data. Consider your query filter patterns and column cardinality when choosing clustering keys to improve predicate pruning. Track clustering depth and re-cluster when necessary.
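A clustering key can be defined and its depth checked as in the sketch below; the table and key columns are hypothetical and should reflect your own filter patterns.

```python
import snowflake.connector

cur = snowflake.connector.connect(
    account="my_account", user="dba_user", password="***",
    warehouse="ADMIN_WH", database="EDW", schema="PUBLIC",
).cursor()

# Define a clustering key on a large, frequently filtered table, then check depth.
cur.execute("ALTER TABLE EDW.PUBLIC.ORDERS CLUSTER BY (order_date, region)")
cur.execute("SELECT SYSTEM$CLUSTERING_INFORMATION('EDW.PUBLIC.ORDERS')")
print(cur.fetchone()[0])  # JSON with average depth and partition-overlap statistics
```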

Right-size your virtual warehouses: Group similar workloads together into the same warehouse and use auto-suspend for cost savings. Monitor how resources are being used and use the cost dashboards to help with right-sizing.

File size optimization: Use files between 100 and 250 megabytes compressed for bulk loads, and avoid thousands of small files, which add load cost and overhead. Use transient or temporary tables to stage data before writing into permanent tables to minimize long-term storage costs, and purge intermediate objects as part of pipeline housekeeping to keep the workspace efficient.

Query tuning: Use query profiling to tune queries for performance and avoid cross-joins. Take advantage of query result caching where appropriate, especially for frequently used result sets.

Governance for runaway costs: Deploy resource monitors, query tagging, and usage alerts for all users in your Snowflake environment. Finally, educate your users on the cost of running large ad hoc queries.
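A minimal example of a monthly resource monitor attached to a warehouse, with auto-suspend enabled, might look like the sketch below; the quota, thresholds, and warehouse name are placeholders, and creating resource monitors typically requires the ACCOUNTADMIN role.

```python
import snowflake.connector

# Resource monitors are account-level objects; creating them typically needs ACCOUNTADMIN.
cur = snowflake.connector.connect(
    account="my_account", user="admin_user", password="***", role="ACCOUNTADMIN",
).cursor()

# Monthly credit quota with notification at 80% and suspension at 100% (placeholder values).
cur.execute("""
    CREATE OR REPLACE RESOURCE MONITOR migration_monthly_quota
      WITH CREDIT_QUOTA = 500
           FREQUENCY = MONTHLY
           START_TIMESTAMP = IMMEDIATELY
      TRIGGERS ON 80 PERCENT DO NOTIFY
               ON 100 PERCENT DO SUSPEND
""")

# Attach the monitor to a warehouse and enable aggressive auto-suspend.
cur.execute("""
    ALTER WAREHOUSE MIGRATION_WH
      SET RESOURCE_MONITOR = migration_monthly_quota
          AUTO_SUSPEND = 60
""")
```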

Security, governance, and compliance checklist

Ensuring security and compliance is a must.

The LumenData Advantage

LumenData utilizes a proven methodology, Informatica IDMC accelerators, and Snowflake best practices to migrate with low risk and high value. Our approach is based on the following pillars:

LumenData case study – A biopharmaceutical firm migrated to Snowflake using Informatica IDMC

A global biopharmaceutical company underwent an end-to-end migration from a hyperscaler warehousing solution to Snowflake. The objective of the project was to centralize analytics workloads, enhance the performance of query workloads, and have a single platform for advanced analytics across the enterprise. LumenData helped the company to migrate the data using Informatica IDMC. The outcomes delivered by LumenData included:

  • Completed seamless bulk ingestion and staging workflows utilizing IDMC accelerators to migrate terabytes of historical data into Snowflake with little to no downtime.
  • Automated data quality and lineage by generating transparent reconciliation reports for auditors and data stewards.
  • Improved tuning and data governance post-migration that reduced query time for critical reports, while also implementing cost controls to reduce monthly compute costs.

LumenData’s approach ensured the migration was completed on time while meeting important compliance and performance standards. Read the customer success story here.

Wrapping Up: Final checklist

Before you start

During migration

Cutover and post-migration

If you want, LumenData can conduct a technical discovery to estimate the migration effort and define a migration roadmap for you. Connect today.

About LumenData

LumenData is a leading provider of Enterprise Data Management, Cloud and Analytics solutions and helps businesses handle data silos, discover their potential, and prepare for end-to-end digital transformation. Founded in 2008, the company is headquartered in Santa Clara, California, with locations in India. 

With 150+ Technical and Functional Consultants, LumenData forms strong client partnerships to drive high-quality outcomes. Their work across multiple industries and with prestigious clients like Versant Health, Boston Consulting Group, FDA, Department of Labor, Kroger, Nissan, Autodesk, Bayer, Bausch & Lomb, Citibank, Credit Suisse, Cummins, Gilead, HP, Nintendo, PC Connection, Starbucks, University of Colorado, Weight Watchers, KAO, HealthEdge, Amylyx, Brinks, Clara Analytics, and Royal Caribbean Group, speaks to their capabilities. 

For media inquiries, please contact: marketing@lumendata.com.

Authors

Shalu Santvana

Content Writer

Sai Bharadwaja

Senior Consultant
