Support Generative AI Innovation while Maintaining Data Governance

Learn how integrating Databricks Unity Catalog with Informatica Cloud Data Governance accelerates Generative AI innovation.
February 6, 2024

The number of proposed Generative AI use cases has exploded over the past year. Innovation at this breakneck speed requires highly accurate data, yet it is often confined to small sections of an enterprise.

Companies are increasing their investments in AI: 40% of respondents in a McKinsey Global Survey say their organizations will increase their investment in AI.

Data accuracy is essential. It is most easily achieved in small, controlled settings, but an enterprise’s overall data strategy, including the governance of enterprise data, can accelerate it.

A departmental implementation of Databricks Unity Catalog is a powerful tool for managing complex data environments, particularly those involving machine learning models and transient data sources. Unity Catalog governs ML models alongside the data they use and accommodates dynamic data sources, providing the flexibility and scalability required for advanced analytics initiatives, including Generative AI applications. Once the work has been vetted at the departmental level, the data can be reintegrated into Informatica’s Intelligent Data Management Cloud (IDMC) to support other use cases.

To further accelerate innovation, data practitioners need data that is accurate, reliable, and well understood. Recent enhancements to Informatica’s Intelligent Data Management Cloud and Databricks allow Generative AI practitioners to jump-start their work with data that has been vetted for accuracy and enriched with metadata. Using the Databricks Modernization Program, data analysts can quickly convert over 90% of PowerCenter workloads to Databricks and run them via Databricks SQL. In addition, data managed in Informatica from over 300 sources can be used directly in Databricks.

CDGC Integration

While Unity Catalog effectively addresses the needs of the departmental data team, integrating its data and metadata with Informatica Cloud Data Governance and Catalog (CDGC) poses unique challenges.

CDGC, designed for enterprise-level Data Governance, may not inherently support the intricacies of managing ML model data, transient data sources, and other specialized requirements typical of departmental-level initiatives. 

By containing Generative AI work within Databricks, organizations can encourage innovation without threatening their overall Data Governance strategy. Once the output is ready to move to production, the vetted data can be incorporated back into the enterprise’s data estate with confidence in its accuracy. From there, it can be propagated throughout the enterprise via Informatica’s Intelligent Data Management Cloud.

Benefits of Integration

Despite the challenges, integrating Databricks Unity Catalog with Informatica CDGC offers numerous benefits. Organizations can achieve a holistic view of their data landscape by bridging departmental-level initiatives with enterprise-wide Data Governance. This integration enables centralized metadata management, improved data quality, enhanced compliance, and streamlined collaboration across departments. Building the organizational muscles to support these initiatives will pay off as more and more departments within an enterprise adopt AI strategies.   

Fewer than one-third of respondents in the McKinsey survey use AI in more than one function, but two-thirds expect their organizations to increase their AI investment over the next three years.

Informatica now offers IDMC customers the following Databricks Unity Catalog-validated capabilities: 

Cloud Data Integration (CDI) Ingest

CDI is now validated with Unity Catalog, enabling customers to ingest data from more than 300 data source types directly into Databricks.

Automated Personal Staging Location (PSL) Management

CDI now automatically manages Databricks personal staging locations for customers, providing a dedicated storage location for the temporary data that pipelines produce. Customer benefits of automated PSL management include data separation (keeping primary data separate from temporary data), higher performance, expanded data versioning capacity, and cost savings.

Cloud Data Integration Transformation

Unity Catalog is now fully supported for CDI transformations running natively in Databricks via Databricks SQL. Customers benefit from highly secure data integration and transformation pipelines that run natively in the Databricks cluster.

Informatica CDI-Free

Unity Catalog is now fully supported by the no-cost edition of CDI, which gives customers free ingestion from 40 popular data sources and data transformation for up to 20 million rows or ten compute hours per month.

Informatica is one of the top three fastest-growing data and AI products in the Databricks ecosystem, with 174% year-over-year growth, according to Databricks’ 2023 State of Data + AI report.

Organizations can ensure a smooth transition toward a unified data ecosystem by anticipating the integration challenges and following established best practices, such as pilot projects, continuous communication, and comprehensive training. Given Informatica’s strategic growth in the Databricks ecosystem, IT departments can adopt this integration with confidence that its value will grow over time. The integration supports the organization’s pursuit of data-driven excellence and fosters a culture of innovation while maintaining robust data governance principles.

Key Considerations

Successful integration requires careful consideration of several factors: 

1. Use Existing Infrastructure

You have already done the hard work of vetting your data for accuracy and relationships in Informatica’s Intelligent Data Management Cloud. Converting your PowerCenter pipelines, or reading directly from Informatica sources, will jump-start the work of building a successful AI product.

2. Data Security

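Coordinate data security between the two applications, so that access permissions granted in Unity Catalog stay consistent with the enterprise policies governed in CDGC and data is used correctly and securely.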

3. Collaboration

Foster collaboration between departmental data teams and enterprise data governance stakeholders to align goals and requirements.   

Best Practices

To facilitate smooth integration, organizations should follow these best practices:

1. Pilot Projects

Conducting pilot projects to validate integration workflows and identify potential challenges.

2. Continuous Communication

Maintaining open communication channels between departmental data teams and enterprise Data Governance stakeholders to address evolving needs.

3. Training and Education

Providing comprehensive training programs to familiarize stakeholders with the integrated solution and promote adoption.

Conclusion

Integrating Databricks Unity Catalog with Informatica Cloud Data Governance presents a crucial opportunity for enterprises to drive generative AI innovation while upholding stringent data governance standards. As the adoption of generative AI accelerates, the need for accurate, reliable, and well-governed data becomes paramount. Databricks Unity Catalog, deployed at a departmental level, empowers data teams with agility and scalability, particularly in managing complex data environments for advanced analytics, including generative AI applications.

When seamlessly integrated with Informatica CDGC, this localized innovation bridges departmental initiatives with enterprise-wide data governance frameworks, facilitating centralized metadata management, improved data quality, enhanced compliance, and streamlined collaboration. 

About LumenData

LumenData is a leading provider of Enterprise Data Management, Cloud, and Analytics solutions. We help businesses handle data silos, discover their potential, and prepare for end-to-end digital transformation. Founded in 2008, the company is headquartered in Santa Clara, California, with locations in India.

With 150+ Technical and Functional Consultants, LumenData forms strong client partnerships to drive high-quality outcomes. Our work across multiple industries and with prestigious clients like Versant Health, Boston Consulting Group, FDA, Department of Labor, Kroger, Nissan, Autodesk, Bayer, Bausch & Lomb, Citibank, Credit Suisse, Cummins, Gilead, HP, Nintendo, PC Connection, Starbucks, University of Colorado, Weight Watchers, KAO, HealthEdge, Amylyx, Brinks, Xylem, Clara Analytics, and Royal Caribbean Group speaks to our capabilities.

For media inquiries, please contact: marketing@lumendata.com.

Author

Andrew Crider

Andrew Crider is the Director of Analytics at LumenData. With over ten years of experience in the analytics space, he has helped multiple Fortune 500 companies get the most value out of their data innovations.