
Transfer On-Premises Files to Azure SQL Database


By: Fikrat Azizov | Last Updated: 2019-05-29 | Related Tips: More > Azure

Problem

In this series of posts, I am going to explore Azure Data Factory (ADF), compare its features against SQL Server Integration Services (SSIS) and show how to use it for real-life data integration problems. In the previous post, we learned how to create a pipeline to transfer CSV files between an on-premises machine and Azure Blob Storage. In this post, I'll explain how to create ADF pipelines to transfer CSV files from Blob Storage into Azure SQL Database.

Solution

Before getting started, I recommend reading the previous tip, Transfer On-Premises Files to Azure Blob Storage.

Data Exchange Architecture

Since we have already built a pipeline to transfer CSV files from an on-premises machine into Azure Blob Storage, the next step is to create a pipeline to push these files into Azure SQL Database; see the reference architecture below:

[Diagram: reference architecture - on-premises machine to Azure Blob Storage to Azure SQL Database]

Preparing Destination Tables

ADF will not create tables in the destination database by default, so I have created a FactInternetSales table, with the same structure as my source table, in the destination Azure SQL Database (DstDb), using the script below:

CREATE TABLE [dbo].[FactInternetSales](
   [ProductKey] [int] NOT NULL,
   [OrderDateKey] [int] NOT NULL,
   [DueDateKey] [int] NOT NULL,
   [ShipDateKey] [int] NOT NULL,
   [CustomerKey] [int] NOT NULL,
   [PromotionKey] [int] NOT NULL,
   [CurrencyKey] [int] NOT NULL,
   [SalesTerritoryKey] [int] NOT NULL,
   [SalesOrderNumber] [nvarchar](20) NOT NULL,
   [SalesOrderLineNumber] [tinyint] NOT NULL,
   [RevisionNumber] [tinyint] NOT NULL,
   [OrderQuantity] [smallint] NOT NULL,
   [UnitPrice] [money] NOT NULL,
   [ExtendedAmount] [money] NOT NULL,
   [UnitPriceDiscountPct] [float] NOT NULL,
   [DiscountAmount] [float] NOT NULL,
   [ProductStandardCost] [money] NOT NULL,
   [TotalProductCost] [money] NOT NULL,
   [SalesAmount] [money] NOT NULL,
   [TaxAmt] [money] NOT NULL,
   [Freight] [money] NOT NULL,
   [CarrierTrackingNumber] [nvarchar](25) NULL,
   [CustomerPONumber] [nvarchar](25) NULL,
   [OrderDate] [datetime] NULL,
   [DueDate] [datetime] NULL,
   [ShipDate] [datetime] NULL,
 CONSTRAINT [PK_FactInternetSales_SalesOrderNumber_SalesOrderLineNumber] PRIMARY KEY CLUSTERED 
([SalesOrderNumber] ASC,[SalesOrderLineNumber] ASC)
)
GO
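
Since the Copy activity expects this table to already exist, a quick sanity check against DstDb can confirm it was created with the expected columns. This is just a generic metadata query, not part of the original tip; adjust the schema or table name if yours differ:

-- List the destination table's columns to confirm they match the source CSV structure
SELECT COLUMN_NAME, DATA_TYPE, IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo'
  AND TABLE_NAME = 'FactInternetSales'
ORDER BY ORDINAL_POSITION;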

Creating Copy Pipeline

If you have followed the previous posts, you should be able to create linked services, datasets and pipelines, so I will skip these screenshots for some of the minor steps, to save space here. Furthermore, since we have already created datasets and linked services for Blob Storage, we can use them as source components for the new pipeline.

Please follow the below steps to create the pipeline:

  • Linked services - In the previous tips we have already set up linked services for both the source and the destination, so we'll proceed to the next step.
  • Datasets - We need a Blob Storage dataset as the source and an Azure SQL Database dataset as the destination. Here are the required steps to add them. Since the source dataset for this pipeline is going to be similar to the blob storage dataset BlobSTG_DS we created in the previous post, let's select it and create a duplicate using the clone command.

[Screenshot: cloning the BlobSTG_DS dataset using the Clone command]

  • I named the new dataset BlobSTG_DS2:

[Screenshot: the cloned dataset renamed to BlobSTG_DS2]

  • To create the destination dataset, let's add a new dataset object of the Azure SQL Database type from the Factory Resources panel, assign the name FactInternetSales_DS and switch to the Connection tab.
  • Select the SqlServerLS_Dst linked service for Azure SQL Database we created earlier and select the FactInternetSales table from the drop-down list:

[Screenshot: the FactInternetSales_DS Connection tab with the SqlServerLS_Dst linked service and FactInternetSales table selected]

  • Switch to the Schema tab and import the table structure using the Import Schema button:

[Screenshot: importing the table structure on the Schema tab]

  • Pipeline and activity - The last step in this process is adding the pipeline and activity. Here are the steps:
    • Add a new Pipeline object from the Factory Resources panel and assign a name (I named it Blob_SQL_PL)
    • Expand the Move & Transform category on the Activities panel and drag/drop the Copy Data activity into the central panel
    • Select the newly added activity and assign a name (I named it FactInternetSales_AC)
    • Switch to the Source tab and select the BlobSTG_DS2 dataset we created earlier:

[Screenshot: the copy activity's Source tab with the BlobSTG_DS2 dataset selected]

  • Switch to the Sink tab, select the FactInternetSales_DS dataset we created earlier and enter a purge query for the destination table (an example follows the screenshot below):

[Screenshot: the copy activity's Sink tab with the FactInternetSales_DS dataset and the purge query]
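
The tip doesn't spell out the purge statement itself, but for a full reload like this one it is typically a single T-SQL statement that empties the destination table before the copy starts. A minimal example, assuming dbo.FactInternetSales is fully reloaded on every run:

-- Purge (pre-copy) query: empty the destination table before each full load
TRUNCATE TABLE [dbo].[FactInternetSales];
-- If TRUNCATE is not allowed (e.g., the table is referenced by foreign keys), use:
-- DELETE FROM [dbo].[FactInternetSales];

ADF runs this statement against the sink database before writing any rows, so re-running the pipeline does not produce duplicate rows.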

  • Switch to the Mapping tab and select Import Schemas:

[Screenshot: the Mapping tab after importing schemas]

  • Publishing changes - Finally, let's publish the changes using the Publish All button and check the notifications area for the deployment status.

Execution and Monitoring

Let's kick off the pipeline using the Trigger Now command under the Trigger menu:

[Screenshot: the Trigger Now command under the pipeline's Trigger menu]

Now, let's switch to the ADF Monitoring page to ensure that pipeline execution was successful:

[Screenshot: the ADF Monitoring page showing a successful run of the Blob_SQL_PL pipeline]

To verify that the data has been transferred successfully, I've opened SSMS and checked the content of the target table in the Azure SQL Database:

[Screenshot: SSMS query results showing the rows loaded into FactInternetSales]
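
If you prefer a query to browsing the table, a couple of simple checks against DstDb will confirm the load; the statements below are illustrative and only reference the table created earlier in this tip:

-- Total number of rows loaded by the pipeline
SELECT COUNT(*) AS LoadedRows
FROM [dbo].[FactInternetSales];

-- Spot-check a few of the most recent orders
SELECT TOP (10) [SalesOrderNumber], [SalesOrderLineNumber], [OrderDate], [SalesAmount]
FROM [dbo].[FactInternetSales]
ORDER BY [OrderDate] DESC;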

Conclusion

In this post, we completed building a data flow to transfer files from an on-premises machine to an Azure SQL Database. As you may have observed, the ADF pipeline building process resembles building data flow tasks for SSIS, where you create source and destination components and build mapping between them.

If you followed the previous posts, you should be able to create simple pipelines to transfer data between various file and database systems. Although these kinds of pipelines are sufficient for transferring small data sets, you will need to build more sophisticated pipelines with incremental data processing capabilities for larger tables. You may also need to automate pipeline executions to run on pre-scheduled timelines or in response to certain events. We’ll discuss all these capabilities in future posts.


About the author

Fikrat Azizov has been working with SQL Server since 2002. He is currently working as a Senior BI Consultant at BDO Canada.

