Google Introduces Zero-ETL Approach to Analytics on Bigtable Data Using BigQuery

source link: https://www.infoq.com/news/2022/08/bigtable-bigquery-zero-etl/?itm_source=infoq&itm_medium=popular_widget&itm_campaign=popular_content_list&itm_content=

Aug 11, 2022 2 min read

Recently, Google announced the general availability of Bigtable federated queries with BigQuery, allowing customers to query data residing in Bigtable via BigQuery faster and without moving or copying the data. According to the company, the feature is available in all Google Cloud regions with increased federated query concurrency limits, closing the longstanding gap between operational data and analytics.

BigQuery is Google Cloud's serverless, multi-cloud data warehouse that simplifies analytics by bringing data from various sources together – and Cloud Bigtable is Google Cloud's fully-managed, NoSQL database for time-sensitive transactional and analytical workloads. The latter is suitable for multiple use cases such as real-time fraud detection, recommendations, personalization, and time series. 

Previously, customers had to use ETL tools such as Dataflow or self-developed Python tools to copy data from Bigtable into BigQuery; now they can query the data directly with BigQuery SQL. With federated queries, BigQuery can access data stored in Bigtable.

To query Bigtable data, users can create an external table for a Cloud Bigtable data source by providing the Cloud Bigtable URI – which can be obtained through the Cloud Bigtable console. The URI contains the following:

  • project_id is the project containing the Cloud Bigtable instance
  • instance_id is the Cloud Bigtable instance ID
  • (Optional) app_profile is the app profile ID to use
  • table_name is the name of the table for querying
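As a rough illustration of how these parts fit together, the following Python sketch assembles a URI and wraps it in a CREATE EXTERNAL TABLE statement. The helper names, the `https://googleapis.com/bigtable/...` prefix, and the `format` and `uris` option names are assumptions based on the BigQuery documentation as commonly described, not details given in this article. A real table definition will likely also need options describing the column families, so check the documentation before relying on it.

```python
from typing import Optional


def bigtable_uri(project_id: str, instance_id: str, table_name: str,
                 app_profile: Optional[str] = None) -> str:
    """Assemble a Cloud Bigtable URI from its parts (hypothetical helper)."""
    parts = [
        # Assumed URI prefix; verify against the BigQuery documentation.
        "https://googleapis.com/bigtable",
        f"projects/{project_id}",
        f"instances/{instance_id}",
    ]
    # The app profile segment is optional and only added when provided.
    if app_profile:
        parts.append(f"appProfiles/{app_profile}")
    parts.append(f"tables/{table_name}")
    return "/".join(parts)


def external_table_ddl(dataset: str, table: str, uri: str) -> str:
    """Build a minimal CREATE EXTERNAL TABLE statement (hypothetical helper).

    Assumption: the 'CLOUD_BIGTABLE' format and 'uris' option are enough
    for a sketch; column-family settings are omitted here.
    """
    return (
        f"CREATE EXTERNAL TABLE `{dataset}.{table}`\n"
        "OPTIONS (\n"
        "  format = 'CLOUD_BIGTABLE',\n"
        f"  uris = ['{uri}']\n"
        ");"
    )


if __name__ == "__main__":
    # All identifiers below are placeholders.
    uri = bigtable_uri("my-project", "my-instance", "events")
    print(uri)
    print(external_table_ddl("my_dataset", "bigtable_events", uri))
```

The functions only build strings; submitting the statement to BigQuery is left to whichever client or console the reader already uses.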

[Image: BigQuery_my4p59r.max-800x800.jpg]
Source: https://cloud.google.com/blog/products/data-analytics/bigtable-bigquery-federation-brings-hot--cold-data-closer

Once the external table is created, users can query Bigtable like any other table in BigQuery. They can also take advantage of BigQuery features such as JDBC/ODBC drivers and connectors for popular business intelligence and data visualization tools, including Data Studio, Looker, and Tableau, as well as AutoML Tables for training machine learning models and BigQuery's Spark connector for loading data into their model development environments.
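To make "like any other table" a little more concrete, here is a minimal Python sketch that builds a preview query against such an external table. The function name, the dataset and table names, and the `rowkey` column are assumptions for illustration (the article does not describe the schema BigQuery exposes for Bigtable tables), so treat it as a starting point only.

```python
def preview_query(dataset: str, table: str, limit: int = 10) -> str:
    """Return a small SELECT statement for an external table (hypothetical helper)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        # Assumption: the Bigtable row key is exposed as a column named 'rowkey'.
        "SELECT rowkey\n"
        f"FROM `{dataset}.{table}`\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    # Placeholder names; replace with a real dataset and external table.
    print(preview_query("my_dataset", "bigtable_events"))
```

The resulting string is ordinary BigQuery SQL, so it can be run through the console, a JDBC/ODBC connection, or a BI tool in the same way as a query on a native table.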

Christian Laurer, a big data enthusiast, explains the benefits of the new approach with Bigtable's federated queries in a Medium article:

Using the new approach, you can overcome some shortcomings of the traditional ETL approach. Such as:

•    More data freshness (up-to-date insights for your business, no hours or even days old data)
•    Not paying twice for the storage of the same data (customers normally store Terabytes or even more in Bigtable)
•    Less monitoring and maintaining of the ETL pipeline

Lastly, more details on Bigtable's federated queries with BigQuery are available on the documentation page. Furthermore, querying data in Cloud Bigtable is available in all supported Cloud Bigtable zones.

About the Author

Steef-Jan Wiggers

Steef-Jan Wiggers is one of InfoQ's senior cloud editors and works as a Technical Integration Architect at HSO in The Netherlands. His current technical expertise focuses on integration platform implementations, Azure DevOps, and Azure Platform Solution Architectures. Steef-Jan is a board member of the Dutch Azure User Group, a regular speaker at conferences and user groups, and a writer for InfoQ and Serverless Notes. Furthermore, Microsoft has recognized him as a Microsoft Azure MVP for the past eleven years.
