5

How to replicate Heap data to a data warehouse using Stitch

 3 years ago
source link: https://www.stitchdata.com/blog/etl-heap-using-stitch/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

How to replicate Heap data to a data warehouse using Stitch

heap_FULL-1.png

More than a year ago we announced a partnership between Heap and Stitch. Today, let's talk about how you can use Stitch to replicate Heap data and data from the other SaaS tools and local databases into a data warehouse, from which you can create reports to surface insights and improve performance.

A data warehouse consolidates all of your data in a single location — generally a cloud-based platform nowadays. You can extract the data from your SaaS applications and on-premises databases and load it to your data warehouse using an ETL (extract, transform, load) tool such as Stitch. Once the data is available, analysts can use it to create reports.

In this post, we'll walk through the process of connecting Heap to Stitch, setting up replication to a data warehouse, and accessing the data with a business intelligence (BI) tool to create reports.

Three tiers of the data analytics architecture

A data analytics stack comprises three tiers: ETL software, data warehouse, and BI software.

Stitch provides a simple, powerful ETL service for businesses of all sizes. Signup is simple — you can be moving data from one or more sources to a data warehouse in five minutes.

The last few years have seen the emergence of cloud-native data warehouses. Because they run on cloud infrastructure that scales quickly and cost-effectively to meet performance demands, they can handle transformation using the same hardware on which the data warehouse runs.

Finally, to unlock the value of your data, you can connect a BI or data visualization tool to your data warehouse and create reports that analyze data from multiple sources, which you can share via browser-based dashboards.

Setting up a data warehouse

We'll set up our data analytics stack starting with the data warehouse. If you don't already have a data warehouse, choose one that meets your needs. If you choose Amazon Redshift, Google BigQuery, Snowflake, Azure Synapse Analytics, or one of the other destinations Stitch supports, you can follow the setup steps for your data warehouse in the Stitch documentation.

Setting up Stitch for ETL

The next step is setting up an ETL pipeline to move data from your sources to the data warehouse. Stitch makes extracting data from a source and loading it into a data warehouse easy. To get started, visit the signup page, enter your email address, then enter your name and a password.

Signup

Add an integration

Next, add Heap as an integration within Stitch. Click on the Heap icon to get started:

Pick-an-integration

Enter a name for the integration. The name will display on the Stitch dashboard for the integration and will be used to create the name in your destination. You must also enter an S3 bucket and AWS account ID – follow the documentation.

Add-an-integration

The other two settings ask how much historical data you want to replicate to your data warehouse and how often you want to replicate new data.

When you click Save, Stitch displays another screen that lets you choose what tables you want to replicate.

Tables

Each time you check a table, Stitch will display all the fields in the table, so you can choose the ones you want.

Fields

When you're done with one table, click the name of the integration in the breadcrumb bar to return to the previous screen, where you can click additional tables. When you have all you want, click Finalize Your Selections, and voilà, your integration has been added. All new fields and records of the types you've selected will be replicated to your data warehouse — but first you have to connect the data warehouse you set up to Stitch as a destination.

Add a destination

Destinations

Suppose you've chosen an Amazon Redshift data warehouse. Clicking on the Redshift icon brings you to a screen where you can enter your credentials:

Redshift-credentials

When you visit your Stitch dashboard, the status for your new integration may show as pending while Stitch schedules the initial replication job. If you refresh the screen after a few minutes the status will change to active.

From the dashboard you can also do things like adding integrations from other data sources. The Stitch documentation walks through the process for each one.

Connecting BI software to your data warehouse

Now you can connect an analytics platform to your data warehouse. If you don't already use BI software, you have dozens of analysis tools to choose from, including popular options such as Tableau, Microsoft Power BI, Google Data Studio, Looker, Chartio, and Mode.

That's all there is to it. Using an ETL tool like Stitch to move data from Heap and other sources into a data warehouse lets you employ BI tools to correlate and report on data from all of your sources.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK