Driving data transformations with column-aware metadata

source link: https://venturebeat.com/2022/04/06/driving-data-transformations-with-column-aware-metadata/


Product architecture is everything. If you don’t believe me, take it from Snowflake CEO Frank Slootman’s new book, Amp It Up. His most telling quote? “All of our successes at the three companies where I’ve been CEO trace back to superior architecture.”

Thoughtful, deliberate product architecture is the best friend you’ll have in this industry. It answers every fear-based question, such as “What about your competitor who’s raised $[x]?” or “Why would customers choose your product over something else?” The answer is that your product has been built from the ground up with something that’s almost impossible to replicate: a column-aware architecture.

Take the classic example of a house. Think of a smart house that is designed from the start to use geothermal energy. You bury the coils underground, build the house, and connect it to your IoT devices, so you can see real-time savings and decide how to use your energy. Now think of a house from the ’50s whose owners want to retrofit geothermal energy onto its brick masonry foundation. It’s extremely expensive, the property may not even be suitable for digging, and the house still can’t report its energy usage seamlessly.

This is the unique value of a built-from-scratch, column-aware product in the data transformations market. Much as companies reacted when Snowflake pioneered the “compute vs. storage” concept, players in the data transformations space will soon try to add column-aware metadata to their messaging. However, retrofitting column-aware metadata onto an existing platform simply won’t scale. It will result in lower accuracy, higher costs, UI pitfalls and convoluted inner workings across the application layers, ultimately leading to brittle integrations and a poor user experience.

So what exactly is column-aware metadata? It’s the ability to leverage column names and mappings to apply transformations easily within a data set. For example, when creating a Type 2 slowly changing dimension, you can easily identify and track changes in specific columns such as address, name, phone number or any other column in your table. Column-level lineage is a hard problem for organizations trying to be data-driven, and it grows with the scale of the project. Being column-aware also allows users to generate SQL from a graphical interface rather than typing it by hand in a code-driven IDE.
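To make the Type 2 example concrete, here is a minimal Python sketch of how column metadata could drive SQL generation. It is an illustration under stated assumptions, not how any particular product works: the `Column` class, the `changed_rows_sql` function, the table names and the `is_current` flag are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Column:
    """Hypothetical column metadata; not taken from any real product."""
    name: str
    is_key: bool = False         # business key used to match rows
    track_changes: bool = False  # a change here should open a new Type 2 version


def changed_rows_sql(dim: str, stage: str, columns: Sequence[Column]) -> str:
    """Build a query that finds staged rows whose tracked columns differ
    from the current dimension row. Assumes the dimension table has a
    boolean `is_current` column (an assumption made for this sketch)."""
    keys = [c.name for c in columns if c.is_key]
    tracked = [c.name for c in columns if c.track_changes]
    if not keys or not tracked:
        raise ValueError("need at least one key column and one tracked column")
    join = " AND ".join(f"s.{k} = d.{k}" for k in keys)
    # IS DISTINCT FROM treats NULLs as comparable values.
    diff = " OR ".join(f"s.{c} IS DISTINCT FROM d.{c}" for c in tracked)
    select = ", ".join(f"s.{c.name}" for c in columns)
    return (
        f"SELECT {select}\n"
        f"FROM {stage} AS s\n"
        f"JOIN {dim} AS d ON {join} AND d.is_current\n"
        f"WHERE {diff}"
    )


columns = [
    Column("customer_id", is_key=True),
    Column("name", track_changes=True),
    Column("address", track_changes=True),
    Column("signup_date"),  # carried along, but not tracked for changes
]
print(changed_rows_sql("dim_customer", "stg_customer", columns))
```

Because the tracked columns live in metadata rather than in hand-written SQL, adding phone number to the tracked set would be a one-line metadata change in this sketch.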

Benefits of a column-aware architecture include: 

  • Efficiency for everyone. A column-aware product dramatically improves productivity for every part of the data pipeline, helping democratize data from engineers and architects to the creators and consumers of final data dashboards.
  • Instantaneous impact analysis and lineage. Because the architecture understands the relationships between columns, you can instantly see the data lineage, how columns are interconnected, and what impact a change to a transformation may have.
  • An unlocked UI. Column metadata unlocks a graphical interface that gives the user an intuitive and powerful experience without compromising flexibility.
  • The lost art of data modeling. Column-aware data transformations seamlessly integrate data profiling, the logical data model, and physical model.
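
The impact-analysis benefit above can be sketched as a graph walk over column-level lineage. This is only an illustration: the `LINEAGE` mapping, its table and column names, and the `downstream` function are hypothetical.

```python
from collections import deque

# Hypothetical lineage: each column maps to the columns derived directly from it.
LINEAGE = {
    "stg_customer.address": ["dim_customer.address"],
    "stg_customer.name": ["dim_customer.name"],
    "dim_customer.address": ["rpt_sales.region"],
    "dim_customer.name": [],
    "rpt_sales.region": [],
}


def downstream(column, lineage):
    """Return every column affected by a change to `column`,
    in breadth-first order, visiting each column at most once."""
    seen = set()
    order = []
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(downstream("stg_customer.address", LINEAGE))
# ['dim_customer.address', 'rpt_sales.region']
```

The walk only works if lineage is recorded per column. With table-level lineage alone, a change to the address column would appear to affect everything downstream of the whole table.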

One other area that is often overlooked in the data transformation category is tracking the state of your data warehouse, specifically for change management. Data warehouses and data projects live in a constantly changing state. Because of that, visibility into historical changes, or in other words time-lineage, is key for data explainability and governance. State management is also critical for persistent data environments that require incremental changes or change-tracking to accommodate streams. When managing a project at scale, with thousands of tables and tens of thousands of columns across dozens of business units, being able to preview and apply changes is a must.
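
Previewing changes before applying them amounts to comparing the recorded state of the warehouse with the desired state. The Python sketch below shows one way that comparison could look. It assumes state is stored as a simple mapping of table to column types, and the `plan_changes` function and all names in it are hypothetical.

```python
def plan_changes(deployed, desired):
    """Compare two {table: {column: type}} snapshots and return the
    column-level changes needed to move from `deployed` to `desired`.
    Nothing is applied; the result is only a preview."""
    changes = []
    for table in sorted(set(deployed) | set(desired)):
        old = deployed.get(table, {})
        new = desired.get(table, {})
        for col in sorted(set(old) | set(new)):
            if col not in old:
                changes.append(("add", table, col, new[col]))
            elif col not in new:
                changes.append(("drop", table, col, old[col]))
            elif old[col] != new[col]:
                changes.append(("retype", table, col, new[col]))
    return changes


# Hypothetical snapshots of the same table at two points in time.
deployed = {"dim_customer": {"id": "INT", "phone": "VARCHAR(10)"}}
desired = {"dim_customer": {"id": "INT", "phone": "VARCHAR(20)", "email": "VARCHAR"}}

for change in plan_changes(deployed, desired):
    print(change)
# ('add', 'dim_customer', 'email', 'VARCHAR')
# ('retype', 'dim_customer', 'phone', 'VARCHAR(20)')
```

Keeping each snapshot rather than overwriting it would also give the historical view described above, since any two points in time could be compared the same way.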

Without state management and column-awareness, data teams are destined to end up with data warehouses that are disorganized, inefficient and poorly governed, and that are likely to fail as the project grows. Simply put, a column-aware architecture with state management built from the ground up is good for the environment, just like geothermal energy.

Satish Jayanthi is the cofounder and CTO of Coalesce
