Data Observability and Pipelines: OpenLineage and Marquez

There’s an inherent tension at the heart of modern data infrastructure. On the one hand, it’s becoming more mission-critical every day, as companies around the world rely on it to run their businesses. On the other hand, it’s more complex, and potentially more brittle, than ever: an “assembly chain” involving multiple tools and repositories.

This tension has led to the emergence of DataOps as a distinct and very active segment. One particularly important area is known as “data lineage”. The concept is to monitor data pipelines and understand the journey of data through its various transformations and usages. This makes it possible to fix issues that happen along the way, and to get to the root of data quality, and potentially fairness, issues.
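To make the idea of tracing data back through its transformations concrete, here is a minimal sketch in Python. It is not taken from the talk or from any lineage tool: the dataset names, the `UPSTREAM` mapping and both helper functions are hypothetical, and a real lineage system would collect this graph automatically from the pipeline rather than hard-code it.

```python
from collections import deque

# Hypothetical lineage graph: each dataset maps to the datasets it is derived from.
UPSTREAM = {
    "revenue_dashboard": ["daily_revenue"],
    "daily_revenue": ["orders_clean", "fx_rates"],
    "orders_clean": ["orders_raw"],
    "orders_raw": [],
    "fx_rates": [],
}


def upstream_of(dataset, graph):
    """Return every dataset that feeds into `dataset`, nearest first (breadth-first)."""
    seen = []
    queue = deque(graph.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited through another path
        seen.append(current)
        queue.extend(graph.get(current, []))
    return seen


def root_sources(dataset, graph):
    """Return the upstream datasets that have no inputs of their own."""
    return [d for d in upstream_of(dataset, graph) if not graph.get(d)]


if __name__ == "__main__":
    # If the dashboard looks wrong, these are the datasets worth inspecting.
    print(upstream_of("revenue_dashboard", UPSTREAM))
    print(root_sources("revenue_dashboard", UPSTREAM))
```

With a graph like this, a data quality problem spotted in a downstream dataset can be walked back, step by step, to the raw inputs that may have caused it.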

Because data lineage involves many different tools, platforms and companies, it makes sense for those different parts of the ecosystem to collaborate around standard definitions. This is the concept behind OpenLineage, a cross-industry effort involving creators and contributors from key data projects (dbt, Spark, Pandas, etc.), gathered together at the initiative of the founders of Datakin, an SF startup behind the open source data lineage project Marquez (originally started at WeWork).

At our most recent Data Driven NYC, we had the pleasure of hosting Julien Le Dem, CTO of Datakin. His talk (video below) is very approachable and educational.

In addition to co-founding Datakin, Julien is a well-known open source contributor. He is the coauthor of Apache Parquet and the PMC chair of the project. He is also a committer and PMC member on Apache Arrow. Prior to Datakin, Julien was a Senior Principal Engineer at WeWork, an architect at Dremio and the tech lead for Twitter’s data processing tools, where he also obtained a two-character Twitter handle (@J_). Prior to Twitter, Julien was a principal engineer and tech lead working on content platforms at Yahoo, where he received his Hadoop initiation. He notes in his bio that “His French accent makes his talks particularly attractive.”

Posted on February 1, 2021. Categories: Big Data, Data Driven NYC
