5

Dremio adds new Apache Iceberg features to its data lakehouse

 1 year ago
source link: https://www.infoworld.com/article/3689948/dremio-adds-new-apache-iceberg-features-to-its-data-lakehouse.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Dremio adds new Apache Iceberg features to its data lakehouse

The new features include the ability to copy data and roll back changes in Apache Iceberg tables.

By Anirban Ghoshal

Senior Writer,

InfoWorld | Mar 2, 2023 1:29 pm PST

A user reviews data and statistical models. [analytics / analysis / tracking / monitoring / logging]

Dremio is adding new features to its data lakehouse including the ability to copy data into Apache Iceberg tables and roll back changes made to these tables.  

Apache Iceberg is an open-source table format used by Dremio to store analytic data sets.  

In order to copy data into Iceberg tables, enterprises and developers have to use the new “copy into SQL” command, the company said.

“With one command, customers can now copy data from CSV and JSON file formats stored in Amazon S3, Azure Data Lake Storage (ADLS), HDFS, and other supported data sources into Apache Iceberg tables using the columnar Parquet file format for performance,” Dremio said in an announcement Wednesday.

The copy operation is distributed across the entire, underlying lake house engine to load more data quickly, it added.

The company has also introduced a table rollback feature for enterprises, akin to a Windows system restore backup or a Mac Time Machine backup.

The tables can be backed up either to a specific time or a snapshot ID, the company said, adding that developers will have to make use of the “rollback” command to access the feature.

“The rollback feature makes it easy to revert a table back to a previous state with a single command. When rolling back a table, Dremio will create a new Apache Iceberg snapshot from the prior state and use it as the new current table state,” Dremio said.

Optimize command boosts Iceberg performance

In an effort to increase the performance of Iceberg tables, Dremio has introduced the “optimize” command to consolidate and optimize sizes of small files that are created when data manipulation commands such as insert, update, or delete are used.

“Often, customers will have many small files as a result of DML operations, which can impact read and write performance on that table and utilize excess storage,” the company said, adding that the “optimize” command can be used inside Dremio Sonar at regular intervals to maintain performance.

Dremio Sonar is a SQL engine that provides data warehousing capabilities to the company’s lakehouse.

The new features are expected to improve productivity of data engineers and system administrators while bringing utility to these class of users, said Doug Henschen, principal analyst at Constellation Research.

Dremio, which was an early proponent of Apache Iceberg tables in lakehouses, competes with the likes of Ahana and Starburst, both of which introduced support for Iceberg in 2021.

Other vendors such as Snowflake and Cloudera added support for Iceberg in 2022.

Dremio features new database, BI connectors

In addition to the new features, Dremio said that it was launching new connectors for Microsoft PowerBI, Snowflake and IBM Db2.

“Customers using Dremio and PowerBI can now use single sign-on (SSO) to access their Dremio Cloud and Dremio Software engines from PowerBI, simplifying access control and user management across their data architecture,” the company said.

The Snowflake and IBM DB2 connectors will allow enterprises to add Snowflake data warehouses and IBM DB2 databases as data sources for Dremio, it added.

This makes it easy to include data in these systems as part of the Dremio semantic layer, enabling customers to explore this data in their Dremio queries and views.

The launch of these connectors, according to Henschen, brings more plug-and-play options to analytics professionals from Dremio’s stable.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK