Databricks Unveils LakeFlow, Simplifying Data Ingestion, Transformation & Orchestration 


Databricks announced the launch of Databricks LakeFlow, a unified solution that streamlines all aspects of data engineering, from data ingestion to transformation and orchestration. LakeFlow enables data teams to ingest data at scale from various sources efficiently, transform it using SQL and Python, and confidently deploy and operate pipelines in production.

With LakeFlow, data teams can now easily ingest data from databases such as MySQL, Postgres, and Oracle, as well as enterprise applications like Salesforce, Dynamics, SharePoint, Workday, NetSuite, and Google Analytics. Databricks is also introducing Real Time Mode for Apache Spark, enabling ultra-low-latency stream processing.

LakeFlow automates the deployment, operation, and monitoring of pipelines at scale in production, with built-in support for CI/CD and advanced workflow features such as triggering, branching, and conditional execution. Data quality checks and health monitoring are integrated with alerting systems like PagerDuty.

LakeFlow simplifies the building and operating of production-grade data pipelines while addressing the most complex data engineering use cases, enabling even the busiest data teams to meet the growing demand for reliable data and AI.

Data engineering is crucial for democratising data and AI within businesses, but it remains a challenging and complex field. Data teams face issues such as ingesting data from siloed and proprietary systems, maintaining intricate logic for data preparation, and dealing with failures and latency spikes that can lead to operational disruptions. Existing solutions are often fragmented and incomplete, resulting in low data quality, reliability issues, high costs, and an increasing backlog of work.

LakeFlow addresses these challenges by simplifying all aspects of data engineering through a single, unified experience built on the Databricks Data Intelligence Platform. It integrates deeply with Unity Catalog for end-to-end governance and leverages serverless compute for highly efficient and scalable execution.

LakeFlow Connect provides a breadth of native, scalable connectors for databases and enterprise applications, fully integrated with Unity Catalog for robust data governance. It incorporates the low-latency, highly efficient capabilities of Arcion, which Databricks acquired in November 2023. LakeFlow Connect makes all data available for batch and real-time analysis, regardless of size, format, or location.

LakeFlow Pipelines, built on Databricks’ highly scalable Delta Live Tables technology, allows data teams to implement data transformation and ETL in SQL or Python. It introduces Real Time Mode for low-latency streaming without code changes, eliminates the need for manual orchestration, and unifies batch and stream processing. LakeFlow Pipelines simplifies even the most complex streaming and batch data transformations.
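
To make that concrete, here is a minimal sketch of what a pipeline definition can look like with the Delta Live Tables Python API that LakeFlow Pipelines builds on; the source path, table names, and data-quality rule below are illustrative assumptions, not part of the announcement.

```python
import dlt
from pyspark.sql.functions import col

# Bronze layer: incrementally ingest raw JSON files with Auto Loader
# (the source path below is hypothetical).
@dlt.table(comment="Raw events ingested incrementally")
def raw_events():
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/default/raw_events")
    )

# Silver layer: cleaned events, with a data-quality expectation that
# drops rows missing a user_id.
@dlt.table(comment="Cleaned events")
@dlt.expect_or_drop("valid_user", "user_id IS NOT NULL")
def clean_events():
    return (
        dlt.read_stream("raw_events")
        .select(col("user_id"), col("event_type"), col("event_ts").cast("timestamp"))
    )
```

Because the framework infers the dependency between the two tables and handles orchestration itself, the same declarative definition can run as batch or streaming; per the announcement, Real Time Mode applies to such pipelines without code changes.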

LakeFlow Jobs provides automated orchestration, data health monitoring, and data delivery, spanning the scheduling of notebooks and SQL queries through ML training and automatic dashboard updates. It offers enhanced control-flow capabilities and full observability to detect, diagnose, and mitigate data issues for increased pipeline reliability. LakeFlow Jobs automates deploying, orchestrating, and monitoring data pipelines in a single place.
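
As a rough illustration of this kind of orchestration, the sketch below defines a two-task job programmatically with the Databricks SDK for Python; the job name, notebook paths, and schedule are hypothetical, and compute configuration is omitted for brevity.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

# Reads credentials from the environment or a Databricks config profile.
w = WorkspaceClient()

created = w.jobs.create(
    name="nightly-etl",  # hypothetical job name
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 2 * * ?",  # run daily at 02:00
        timezone_id="UTC",
    ),
    tasks=[
        jobs.Task(
            task_key="ingest",
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/team/ingest"),
        ),
        jobs.Task(
            task_key="transform",
            # The transform task only runs after ingest succeeds.
            depends_on=[jobs.TaskDependency(task_key="ingest")],
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/team/transform"),
        ),
    ],
)
print(f"Created job {created.job_id}")
```

Expressing task dependencies this way is what lets the platform schedule, retry, and monitor each step of a pipeline as a single unit.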

“LakeFlow addresses the challenges data teams face in building and operating reliable data pipelines,” said Ali Ghodsi, CEO and co-founder of Databricks. “By simplifying all aspects of data engineering in a unified experience, LakeFlow enables data teams to efficiently meet the growing demand for reliable data and AI.”
