22

Cinchoo ETL - Compare two CSV files for ADD, CHANGED or DELETED records (Master...

 2 years ago
source link: https://www.codeproject.com/Tips/5319954/Cinchoo-ETL-Compare-two-CSV-files-for-ADD-CHANGED
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

1. Introduction

ChoETL is an open source ETL (extract, transform and load) framework for .NET. It is a code based library for extracting data from multiple sources, transforming, and loading into your very own data warehouse in .NET environment. You can have data in your data warehouse in no time.

This article talks about comparing two CSV files for Add, Deleted and Changed records using Cinchoo ETL framework. It is very simple to use, with few lines of code, the comparision can be done. You can do the comparision of large files easily as the the operation is stream based, quite fast and with low memory footprint.

2. Requirement

This framework library is written in C# using .NET 4.5 / .NET Core 3.x Framework.

3. How to Use

3.1 Sample Data

Let's begin by looking into below sample CSV files. Assuming these CSV files are large in sizes, comes with different fields, may have column counts vary on them.

Listing 3.1.1. Master CSV file (master.csv)
Copy Code
ID,name
1,Danny
2,Fred
3,Sam
Listing 3.1.2. Detail file (detail.csv)
Copy Code
ID,name
1,Danny
3,Pamela
4,Fernando

After successful comparision, the expected CSV file should look like as below

Listing 3.1.3. CSV output (output.csv)
Copy Code
ID,name,Status
1,Danny,Unchanged
2,Fred,Deleted
3,Pamela,Changed
4,Fernando,New

The first thing to do is to install ChoETL.JSON /ChoETL.JSON.NETStandard nuget package. To do this, run the following command in the Package Manager Console.

.NET Framework

Copy Code
Install-Package ChoETL.JSON

.NET Core

Copy Code
Install-Package ChoETL.JSON.NETStandard

Now add ChoETL namespace to the program.

Copy Code
using ChoETL;

3.2 Comparision Operation

As input files may be large in sizes, we need to consider way to merge them efficiently. Here is an approach to adapt to merge such CSV files.

  1. First open each CSV file (master and detail CSV files), put them into variables.
  2. Then open ChoCSVWriter for writing.
  3. Finally call the Compare extension method on input CSV file streams for comparision. In here, specify the key ("ID") columns as well as compare ("name") columns. Key columns used to match the records between CSV files. Compare columns used to find if there is any change in the matched records. Compare operation yields Tuple<Master, Detail> records. 
Listing 3.2.1. Compare Master - Detail CSV files
JavaScript
Copy Code
private static void CompareCSVFiles()
{
    var r1 = ChoCSVReader.LoadText("master.csv").WithFirstLineHeader().WithMaxScanRows(1).OfType<ChoDynamicObject>();
    var r2 = ChoCSVReader.LoadText("detail.csv").WithFirstLineHeader().WithMaxScanRows(1).OfType<ChoDynamicObject>();

    using (var w = new ChoCSVWriter(Console.Out).WithFirstLineHeader())
    {
        foreach (var t in r1.Compare(r2, "ID", "name" ))
        {
            dynamic v1 = t.MasterRecord as dynamic;
            dynamic v2 = t.DetailRecord as dynamic;
            if (t.Status == CompareStatus.Unchanged || t.Status == CompareStatus.Deleted)
            {
                v1.Status = t.Status.ToString();
                w.Write(v1);
            }
            else 
            {
                v2.Status = t.Status.ToString();
                w.Write(v2);
            }
        }
    }
}

Sample fiddle: https://dotnetfiddle.net/uPR5Sq

For more information about Cinchoo ETL, please visit the other CodeProject articles:


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK