What You'll Learn
Migrating an on-premises data warehouse to Snowflake is not just a platform change; it is an opportunity to modernize pipelines, lower operational overhead, improve query performance, and enable AI and analytics at scale. By combining Informatica Intelligent Data Management Cloud (IDMC) with Snowflake, you get an enterprise-grade migration stack: IDMC provides connectors, bulk loading accelerators, data quality, catalog, and security pushdown that reduce migration risk and accelerate time to value. This guide outlines a practical roadmap for migrating from an on-premises data warehouse to Snowflake.
Why migrate from on-prem data warehouse to Snowflake
Companies are switching to Snowflake for compelling reasons:
Speed & agility
Your analysts and data scientists get a single, scalable SQL environment that separates compute from storage, so they can run analytics faster.
Lower total cost of ownership
Cloud storage plus on-demand compute and better resource isolation shrink infrastructure & maintenance costs.
Simplified operations
Snowflake eliminates the need to manage HDFS, hardware refreshes, and many of the operational tasks of an on-prem warehouse.
Modern analytics and AI readiness
Snowflake has native integrations for model serving, dynamic data sharing, and operationalizing ML workflows.
Faster time to insight
Properly executed migrations allow you to accelerate reporting and deliver new real-time or near-real-time analytics that were hard to obtain on legacy platforms.
These drivers elevate a migration project into a strategic program that changes data platform architecture, governance, and the operating model.
High-level Snowflake migration approaches
Three standard approaches are available. Select the one that matches your complexity, risk tolerance, and business needs.
- Lift and Shift: Copy schema and data from the on-prem warehouse to Snowflake with minimal transformation. This enables a gradual migration built around the analytic tables that already exist on the source platform. Best when speed matters most.
- Replatform and Modernize: Rework ETL into an ELT design for Snowflake, adjust schemas as needed for Snowflake micro-partitioning, and adopt cloud-native load patterns such as staged loads and Snowpipe. This is slower than lift and shift but yields materially better long-term performance and lower costs.
- Hybrid incremental migration: Move user-facing datasets and dashboards to Snowflake first, identifying the business-critical models to transfer ahead of the rest. Run the two systems in parallel for validation, then phase in loads from the other business domains. The hybrid approach balances speed against user impact while continuously capturing the benefits of the Snowflake ecosystem.
Role of Informatica IDMC in a Snowflake migration
Informatica Intelligent Data Management Cloud plays several critical roles in an enterprise migration to Snowflake:
- IDMC’s Snowflake connectors and ingestion accelerators support large-file ingestion, staged loads, and scalable, parallelized movement of data into Snowflake. This minimizes custom scripting and decreases the complexity and risk of bulk-load migration.
- IDMC lets you build pipelines that pre-transform data before ingestion or load the raw data and transform it inside Snowflake, supporting both ETL and ELT patterns. This is crucial for modernizing legacy ETL jobs.
- IDMC data quality modules let you profile, cleanse, and validate data before and after migration to prevent garbage-in, garbage-out problems in Snowflake. Building data quality into the migration helps you avoid re-migrating.
- IDMC’s Cloud Data Access Management and native pushdown of security policies to Snowflake keep access control enforcement consistent, support enterprise privacy rules, and let governance be tightened through Snowflake-native controls.
- Informatica IDMC and Snowflake quickstarts have templates and automation that lower the time to first load and validate typical migration scenarios.
With IDMC, you can automate much of the Snowflake migration, validate the results, and enforce security controls at scale.
Pre-migration assessment for on-premises data warehouse to Snowflake migration
Key activities include:
- Compile an inventory of databases, tables, pipelines, endpoints, refresh schedules, and downstream consumers.
- Prioritize high-usage data assets. Metadata scanning and enterprise catalogs can assist with this.
- Identify the most used queries, dashboards, and SLAs (service level agreements) so they can be migrated and validated first and users see immediate benefit from the migration effort.
- Run automated data profiling to quantify missing values and duplicates in the data to be migrated. This helps scope the cleanup and transformation tasks. IDMC profiling can provide standardized metrics.
- Assess stored procedures, vendor-specific SQL dialects, UDFs, and complex transformations. This should include an automated conversion assessment to ascertain the range of the migration effort and where manual conversion will be required. Use SnowConvert or other vendor conversion tools alongside SQL assessment tools to get proper insight into the conversion effort.
- Decide on the high-level migration process. Clearly define success criteria and key performance indicators.
- Document a step-by-step runbook covering full data loads, delta loads, the freeze window, and a well-tested rollback plan that can be actioned if needed. A well-documented rollback plan reduces risk if issues are identified during cutover.
Technical migration steps for moving from on-prem data warehouse to Snowflake
This section gives a prescriptive, technical sequence you can practically follow.
Step 1. Extracting data from source systems and staging
Move the data from your on-prem sources to a cloud staging area. For larger data sets, cloud object storage such as AWS S3, Azure Blob Storage, or GCP Cloud Storage is typically used. Partition the files into sizes appropriate for bulk loading into Snowflake; the goal is compressed files between 100 and 250 megabytes. Snowflake can then efficiently bulk load the compressed files via the COPY operation.
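As a rough illustration, assuming the extracts land in an S3 bucket, a file format and stage definition might look like the following (the integration, bucket, and path names are placeholders):

```sql
-- Hypothetical file format and external stage over the extract bucket.
CREATE OR REPLACE FILE FORMAT stg_csv_gz
  TYPE = CSV
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  COMPRESSION = GZIP;

CREATE OR REPLACE STAGE raw_extract_stage
  URL = 's3://my-migration-bucket/dw_extracts/'
  STORAGE_INTEGRATION = my_s3_integration
  FILE_FORMAT = stg_csv_gz;

-- Confirm the extracted files are close to the recommended 100-250 MB size
LIST @raw_extract_stage/orders/;
```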
Step 2. Data schema conversion and mappings
Convert data types and schemas to formats compatible with Snowflake, including timestamp, numeric precision, and geometry types. Document the mapping of keys, columns, and other objects so you have a reliable record and an opportunity to improve the data model. Automation tools can accelerate schema conversion, but you will still need to set aside time to validate the more complex transformations.
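For example, a legacy table might map to Snowflake types along these lines (the table, columns, and source types are illustrative, not a definitive mapping):

```sql
-- Illustrative target DDL for a legacy ORDERS table mapped to Snowflake types.
CREATE OR REPLACE TABLE analytics.orders (
  order_id      NUMBER(38,0),    -- source BIGINT / NUMBER
  order_amount  NUMBER(18,2),    -- preserve numeric precision and scale
  order_ts      TIMESTAMP_NTZ,   -- choose NTZ vs. TZ based on source semantics
  ship_location GEOGRAPHY,       -- source geometry/spatial type
  status_code   VARCHAR(20)
);
```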
Step 3. Data cleaning and standardizing
Using IDMC data quality workflows, standardize fields by defining business rules. Cleaning the data before it is loaded spares you the rework later.
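Those rules are typically authored in IDMC's data quality tooling rather than in SQL; purely as a sketch of the kind of standardization involved, an equivalent expressed over a staging table might be (all object names are assumptions):

```sql
-- Rough SQL sketch of common standardization rules (illustrative only).
SELECT
  TRIM(UPPER(country_code))            AS country_code,   -- consistent casing
  NULLIF(TRIM(email), '')              AS email,          -- empty strings to NULL
  REGEXP_REPLACE(phone, '[^0-9]', '')  AS phone_digits    -- strip formatting
FROM staging.customers_raw;
```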
Step 4. Bulk loading to Snowflake
Use Snowflake COPY commands, Snowpipe for continuous ingestion, or Informatica IDMC mass ingestion accelerators to process loads in parallel. Stage files in cloud storage and COPY them into Snowflake rather than performing the initial load directly against Snowflake. Adjust warehouse sizes before and during loading to balance load times and cost. Depending on the use case and time sensitivity, consider transient or temporary tables for staging data.
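A minimal sketch of a staged bulk load, assuming the stage and file format from the earlier step (warehouse and table names are placeholders):

```sql
-- Scale the load warehouse up for the initial bulk load, then back down.
ALTER WAREHOUSE load_wh SET WAREHOUSE_SIZE = 'LARGE';

-- Transient staging table keeps fail-safe and storage overhead low for scratch data.
CREATE OR REPLACE TRANSIENT TABLE staging.orders_stg LIKE analytics.orders;

COPY INTO staging.orders_stg
  FROM @raw_extract_stage/orders/
  FILE_FORMAT = (FORMAT_NAME = 'stg_csv_gz')
  ON_ERROR = 'ABORT_STATEMENT';

ALTER WAREHOUSE load_wh SET WAREHOUSE_SIZE = 'SMALL';
```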
Step 5. Transformations and ELT
Whenever a transformation can run inside Snowflake, it is best practice to use Snowflake compute for the parallel processing. This reduces network movement and simplifies orchestration. When transformation at the source is required, document why the data is pre-transformed and ensure the logic is captured in versioned mapping artifacts in IDMC.
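For example, a simple ELT step executed entirely on Snowflake compute might look like this (object names and business logic are illustrative):

```sql
-- Transform raw staged data into a conformed table inside Snowflake.
CREATE OR REPLACE TABLE analytics.orders AS
SELECT
  order_id,
  order_amount,
  DATE_TRUNC('day', order_ts)                        AS order_date,
  IFF(status_code IN ('C', 'CL'), 'CLOSED', 'OPEN')  AS order_status
FROM staging.orders_stg
WHERE order_id IS NOT NULL;
```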
Step 6. Delta capture and continuous synchronization
During cutover, you will need to track changes that happen after the initial bulk load. Use CDC tools or built-in source change methods to capture the deltas. IDMC connectors frequently include the capability to push incremental updates into Snowflake efficiently using CDC patterns.
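One common way to apply the captured deltas is a MERGE from a delta staging table into the target (the table names, key column, and cdc_op flag are assumptions about how the changes are delivered):

```sql
MERGE INTO analytics.orders AS tgt
USING staging.orders_delta AS src
  ON tgt.order_id = src.order_id
WHEN MATCHED AND src.cdc_op = 'D' THEN DELETE
WHEN MATCHED THEN UPDATE SET
  tgt.order_amount = src.order_amount,
  tgt.order_status = src.order_status
WHEN NOT MATCHED AND src.cdc_op <> 'D' THEN
  INSERT (order_id, order_amount, order_status)
  VALUES (src.order_id, src.order_amount, src.order_status);
```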
Step 7. Validation and reconciliation
Automate comparisons of record counts, checksums, referential integrity checks, and business KPIs between the old system and Snowflake. Create reconciliation dashboards, set up automated alerts, and implement a process where both sample validation and complete reconciliation are performed for important datasets.
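A lightweight reconciliation query might compare row counts and an aggregate hash between the migrated table and a legacy extract loaded for comparison (names are placeholders):

```sql
SELECT 'snowflake' AS side,
       COUNT(*)                          AS row_count,
       HASH_AGG(order_id, order_amount)  AS checksum
FROM analytics.orders
UNION ALL
SELECT 'legacy_extract',
       COUNT(*),
       HASH_AGG(order_id, order_amount)
FROM staging.orders_legacy_extract;
```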
What should be the testing, validation, and cutover strategy for Snowflake migration?
Testing is the point at which most on-prem-to-Snowflake migration projects succeed or fail. We recommend a layered testing approach.
Unit tests: Each mapping or conversion should have unit tests to validate field-level transformations, null handling, and type conversion.
Integration tests: Run end-to-end pipeline tests with representative volumes. Validate outputs into downstream consumers, such as dashboards, BI extracts, and APIs.
Performance tests: Run load and query performance tests, including concurrent and peak loads. Use Snowflake query profiling to identify hotspots and adjust clustering keys and warehouse sizes as appropriate.
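For example, the slowest statements from a tagged performance test run can be pulled from the account usage views (the QUERY_TAG value is an assumed convention, and ACCOUNT_USAGE data arrives with some latency):

```sql
SELECT query_id, query_text, warehouse_name,
       total_elapsed_time / 1000 AS elapsed_s,
       partitions_scanned, partitions_total
FROM snowflake.account_usage.query_history
WHERE query_tag = 'migration_perf_test'
ORDER BY total_elapsed_time DESC
LIMIT 20;
```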
Parallel run: Deliver both legacy and Snowflake outputs to business users in parallel for a defined business period. Compare all outputs, monitor for divergence, and have business users sign off on completion. Parallel runs are a key risk mitigation measure for mission-critical datasets.
Cutover rehearsal and go-live: Undertake at least one full dress rehearsal covering the freeze window, delta capture, final reconciliation, and rollback processes. During go-live, have hypercare teams available to triage and fix issues as they arise. Document every step in the runbook and timestamp every process for audit purposes.
Post-migration: Performance, cost and operational tuning in Snowflake
After migration, it’s important to optimize both query performance and cost. Here are the main points to get you started:
Micro-partition awareness: Snowflake automatically micro-partitions your data. Consider the cardinality of your query predicates when choosing clustering keys to improve predicate pruning. Additionally, track clustering depth and re-cluster when necessary.
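As a sketch, you might inspect clustering health and define a clustering key on a large, frequently filtered table (table and column names are illustrative):

```sql
-- Check how well the table is clustered on a candidate key
SELECT SYSTEM$CLUSTERING_INFORMATION('analytics.orders', '(order_date)');

-- Define (or change) the clustering key if pruning is poor
ALTER TABLE analytics.orders CLUSTER BY (order_date);
```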
Right-size your virtual warehouses: Group similar workloads together into the same warehouse and use auto-suspend for cost savings. Monitor how resources are being used and use the cost dashboards to help with right-sizing.
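A typical warehouse definition for a consolidated BI workload might look like this (the name, size, and suspend interval are assumptions to tune against your own usage dashboards):

```sql
CREATE OR REPLACE WAREHOUSE bi_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND = 60          -- suspend after 60 seconds of inactivity
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;
```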
File size optimization: Consider using files between 100 and 250 megabytes compressed when performing bulk loads. Avoid having thousands of small files that add to load cost and overhead. Use transient and temporary tables to stage data before writing into permanent tables. This will help to minimize long-term storage costs. Make sure to purge any intermediate objects as part of the pipeline housekeeping to maintain an efficient workspace.
Query tuning: Use query profiling to tune your queries for performance and try to avoid cross-joins. Also, use query result caching where appropriate, especially for your more frequently used result sets.
Governance for runaway costs: Deploy resource monitors, query tagging, and usage alerts for all users on your Snowflake environment. Finally, educate your users on the cost of running large ad hoc queries.
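Basic cost guardrails can be expressed directly in Snowflake, for example a monthly credit quota with notify and suspend thresholds plus session-level query tagging (the quota, thresholds, and names are examples only):

```sql
CREATE OR REPLACE RESOURCE MONITOR monthly_guardrail
  WITH CREDIT_QUOTA = 500
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 80 PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND;

ALTER WAREHOUSE bi_wh SET RESOURCE_MONITOR = monthly_guardrail;

-- Tag sessions so ad hoc spend can be attributed later
ALTER SESSION SET QUERY_TAG = 'adhoc_analytics';
```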
Security, governance, and compliance checklist
Ensuring security and compliance is a must.
- Check that the Snowflake account and cloud region selections comply with data residency policies.
- Check that encryption in motion and at rest has been configured properly according to the enterprise policy.
- Restrict access using roles and apply attribute-based access where appropriate. Push IDMC policies and role mappings to Snowflake to maintain consistency.
- Enable query and access logging with audit trails and anomalous access detection.
- Use masking policies for sensitive columns and tokenization where necessary (see the sketch after this list).
- Use Informatica IDMC to capture the metadata and lineage for compliance or operational troubleshooting.
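A minimal masking policy sketch: mask a sensitive column for everyone except an authorized role (the role, table, and column names are placeholders):

```sql
CREATE OR REPLACE MASKING POLICY pii_email_mask AS (val STRING) RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() IN ('PII_READER') THEN val ELSE '***MASKED***' END;

ALTER TABLE analytics.customers
  MODIFY COLUMN email SET MASKING POLICY pii_email_mask;
```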
The LumenData Advantage
LumenData utilizes a proven methodology, Informatica IDMC accelerators, and Snowflake best practices to migrate with low risk and high value. Our approach is based on the following pillars:
- Rapid assessment and prioritization: We leverage automation to inventory and prioritize assets for migration and align those priorities with business KPIs.
- Informatica IDMC-powered pipelines: LumenData builds pipelines in IDMC that accommodate bulk ingestion, CDC and transformation. We use built-in accelerators and templates to speed implementation.
- Data quality and governance: We embed data quality checks and data profiling in each migration pipeline to ensure validation is automated and auditable. Governance, metadata, catalog, and lineage tools are integrated to ensure governance is operational and incorporated.
- Performance and cost tuning: After cutover, we assist teams with performance tuning and set up micro-partition clustering keys, warehouse sizing, cost controls, and monitoring dashboards.
LumenData case study – A biopharmaceutical firm migrated to Snowflake using Informatica IDMC
A global biopharmaceutical company underwent an end-to-end migration from a hyperscaler warehousing solution to Snowflake. The objective of the project was to centralize analytics workloads, enhance the performance of query workloads, and have a single platform for advanced analytics across the enterprise. LumenData helped the company to migrate the data using Informatica IDMC. The outcomes delivered by LumenData included:
- Completed seamless bulk ingestion and staging workflows utilizing IDMC accelerators to migrate terabytes of historical data into Snowflake with little to no downtime.
- Automated data quality and lineage by generating transparent reconciliation reports for auditors and data stewards.
- Improved tuning and data governance post-migration that reduced query time for critical reports, while also implementing cost controls to reduce monthly compute costs.
LumenData’s approach ensured the migration was completed on-time while meeting important compliance and performance standards. Read the customer success story here.
Wrapping Up: Final checklist
Before you start
- Complete an inventory of datasets and dashboards and prioritize them.
- Determine the success metrics and KPIs that will be tracked.
- Determine the migration method and validate the technology choices.
During migration
- Stage files in cloud storage at recommended file sizes.
- Utilize Informatica IDMC connectors and accelerators for large ingestion operations.
- Automate data quality and reconciliation.
Cutover and post-migration
- Run parallel systems and reconcile records.
- Tune Snowflake clustering and warehouse systems.
- Deploy governance, monitoring, and budgeting.
If you want, LumenData can conduct a technical discovery to estimate the migration effort and define a migration roadmap for you. Connect today.
About LumenData
LumenData is a leading provider of Enterprise Data Management, Cloud and Analytics solutions and helps businesses handle data silos, discover their potential, and prepare for end-to-end digital transformation. Founded in 2008, the company is headquartered in Santa Clara, California, with locations in India.
With 150+ Technical and Functional Consultants, LumenData forms strong client partnerships to drive high-quality outcomes. Their work across multiple industries and with prestigious clients like Versant Health, Boston Consulting Group, FDA, Department of Labor, Kroger, Nissan, Autodesk, Bayer, Bausch & Lomb, Citibank, Credit Suisse, Cummins, Gilead, HP, Nintendo, PC Connection, Starbucks, University of Colorado, Weight Watchers, KAO, HealthEdge, Amylyx, Brinks, Clara Analytics, and Royal Caribbean Group, speaks to their capabilities.
For media inquiries, please contact: marketing@lumendata.com.
Authors
Content Writer
Senior Consultant