29

Parallel package execution in Azure-SSIS Runtime

 5 years ago
source link: https://www.tuicool.com/articles/hit/7Nvaemq
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

By:Koen Verbeeck |   Last Updated: 2018-11-29   |  |   Related Tips:More > Azure

Problem

I have created an SSIS project in Visual Studio. In this project, I have a master package starting multiple child packages. These child packages run in parallel. When executing my project on a local server, everything runs fine and performance is optimal. However, after migrating the project to the Azure cloud using the Azure-SSIS runtime, performance has degraded. What can be the cause of this issue?

Solution

As explained in the tips Configure an Azure SQL Server Integration Services Integration Runtime and Executing Integration Services Packages in the Azure-SSIS Integration Runtime you can easily lift and shift your existing SSIS projects to the Azure-SSIS runtime with minimal or no changes. However, to make sure performance stays optimal, some changes do have to be made to make sure packages keep running in parallel. In this tip, we’ll share a solution on how to ensure your packages use the maximum of the available resources on the Azure-SSIS cluster. If you haven’t already read the two mentioned tips, please do so as this tip will built upon the knowledge shared in those tips.

Executing Packages in Parallel in the Azure-SSIS Runtime

Test Set-up

First, we need a package performing a task that takes a little while so we can monitor the process. We’re going to use a query to generate 5 million rows and insert them into a table in an Azure SQL database. We’re using the SSIS data flow, to make sure data is transferred over the network to the cluster in the Azure-SSIS runtime where it will consume CPU and memory. The query uses a tally table to generate the rows and is based on the article Creative Solutions by Using a Number Table by Greg Larsen.

WITH
    L0   AS(SELECT 1 AS C UNION ALL SELECT 1 AS O), -- 2 rows
    L1   AS(SELECT 1 AS C FROM L0 AS A CROSS JOIN L0 AS B), -- 4 rows
    L2   AS(SELECT 1 AS C FROM L1 AS A CROSS JOIN L1 AS B), -- 16 rows
    L3   AS(SELECT 1 AS C FROM L2 AS A CROSS JOIN L2 AS B), -- 256 rows
    L4   AS(SELECT 1 AS C FROM L3 AS A CROSS JOIN L3 AS B), -- 65,536 rows
    L5   AS(SELECT 1 AS C FROM L4 AS A CROSS JOIN L4 AS B), -- 4,294,967,296 rows
    Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS N FROM L5)
SELECT TOP 5000000 N AS N1, N AS N2 FROM Nums;

This query generates two integer columns, with values ranging from 1 to 5 million. That’s 8 bytes per row, or 40 Megabytes in total.

We’re going to run three identical packages in parallel. The data flow looks like this:

yEfY3m2.png!web

The control flow simply truncates the destination table and loads the generated data into the database.

RB3iAjB.png!web

One master package kicks off the three packages in parallel using Execute Package Tasks .

vaauEzZ.png!web

Each package writes to its own destination table, so there is no concurrency and locking on a single table.

zu673eJ.png!web

After we’ve deployed the SSIS project, we can start the master package in the SSIS catalog.

RfiUjaI.png!web

In Azure Data Factory, we can monitor the IR environment to see how many resources are consumed and how many packages are running.

uYVNZvi.png!web

At the resource monitor page, we can see only one job is running, while there are actually 4 packages running (the master package and the three child packages).

iaeEfmU.png!web

We can verify if all packages are running at the same time by doing a count on the destination tables with the NOLOCK hint.

J3QBZ3I.png!web

The problem is all the child packages are being executed inside the execution of the master package. In other words, they are being treated as part of the master package, so their executions are not spread out over the cluster.

Solution to Solve Problem

Luckily, this can be fixed. Instead of using Execute Package Tasks, we using the SSIS catalog stored procedures to start the child packages. The following stored procedure can be used to simplify the T-SQL needed to create and start an execution:

CREATE PROC [dbo].[RunPackage]
    (@PackageName   VARCHAR(50)
    ,@FolderName    VARCHAR(50)
    ,@ProjectName   VARCHAR(50)
    ,@Synchronized  BIT = 1 -- run synchronously by default
    )
AS
BEGIN
    DECLARE @execution_id BIGINT;

    EXEC [SSISDB].[catalog].[create_execution]
         @package_name      = @PackageName
        ,@execution_id      = @execution_id OUTPUT
        ,@folder_name       = @FolderName
        ,@project_name      = @ProjectName
        ,@use32bitruntime   = False
        ,@reference_id      = NULL;
 
    --SELECT @execution_id;
    EXEC [SSISDB].[catalog].[set_execution_parameter_value]
         @execution_id
        ,@object_type       = 50
        ,@parameter_name    = N'SYNCHRONIZED'
        ,@parameter_value   = @Synchronized;
 
    EXEC [SSISDB].[catalog].[start_execution] @execution_id;
END

This stored procedure creates an SSIS execution, sets the Synchronized parameter and starts the execution. By default, the Synchronized parameter is set to true, which means the stored procedure will wait till the SSIS package has finished executing. A sample T-SQL script to call this stored procedure would be:

EXEC dbo.RunPackage 'CloudTest1.dtsx', 'MSSQLTips', 'MSSQLTIPS', 1;

Since Azure SQL databases don’t allow cross-database queries, this stored procedure must be created inside the SSISDB database.

Now we need to replace the Execute Package Tasks with Execute SQL Tasks . Each of those tasks will call the stored procedure to start a child package.

V3iMFzZ.png!web

Let’s deploy this master package to the catalog and execute it to monitor its behavior. When running this master package, the resource monitor now shows that every node of the IR environment is executing jobs:

7rI7jyU.png!web

Conclusion

By abandoning Execute Package Tasks and rather using the SSIS catalog stored procedures, we can optimally spread out the execution of packages over the Azure-SSIS integration runtime. This tip provides you with an easy to use stored procedure that creates and starts an SSIS package execution. Keep in mind this stored proc has to be deployed in the SSISDB database.

Next Steps

Last Updated: 2018-11-29

J7JRjaZ.png!web

q2qQNb.png!web

About the author

i6Zniau.png!web Koen Verbeeck is a BI professional, specializing in the Microsoft BI stack with a particular love for SSIS.

View all my tips

Related Resources


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK