Querying large data (in the millions of rows) faster

 2 years ago
source link: https://www.codesd.com/item/query-large-in-millions-faster-data.html

I have two tables:

  • Tbl1 has 2 columns: name and state

  • Tbl2 has name and state and additional columns about the fields

I am trying to match tbl1 name and state with tbl2 name and state. I have removed all exact matches, but I see that I could match more if I accounted for misspellings and name variations by using a scalar function that compares the two names and returns an integer showing how close a match they are (the lower the number returned, the better the match).

The issue is that Tbl1 has over 2M records and Tbl2 has over 4M records, and it takes about 30 seconds just to search for one record from Tbl1 in Tbl2.

Is there some way I could arrange the data or query so the search could be completed faster?

Here’s the table structure:

CREATE TABLE Tbl1
(
    Id          INT NOT NULL IDENTITY( 1, 1 ) PRIMARY KEY,
    Name        NVARCHAR(255),
    [State]     VARCHAR(50),
    Phone       VARCHAR(50),
    DoB         SMALLDATETIME
)
GO

CREATE INDEX    tbl1_Name_indx ON dbo.Tbl1( Name )
GO
CREATE INDEX    tbl1_State_indx ON dbo.Tbl1( [State] )
GO

CREATE TABLE Tbl2
(
    Id          INT NOT NULL IDENTITY( 1, 1 ) PRIMARY KEY,
    Name        NVARCHAR(255),
    [State]     VARCHAR(50)
)
GO

CREATE INDEX    tbl2_Name_indx ON dbo.Tbl2( Name )
GO
CREATE INDEX    tbl2_State_indx ON dbo.Tbl2( [State] )
GO

Here's a sample function that I tested with to try to rule out function complexity:

CREATE FUNCTION [dbo].ScoreHowCloseOfMatch
    (
      @SearchString VARCHAR(200) ,
      @MatchString VARCHAR(200)
    )
RETURNS INT
AS
    BEGIN

        DECLARE @Result INT;
        SET     @Result = 1;
        RETURN @Result;
    END;

Here's some sample data:

INSERT INTO Tbl1
SELECT  'Bob Jones', 'WA', '555-333-2222', 'June 10, 1971'  UNION
SELECT  'Melcome T Homes', 'CA', '927-333-2222', 'June 10, 1971'  UNION
SELECT  'Janet Rengal', 'WA', '555-333-2222', 'June 10, 1971'  UNION
SELECT  'Matt Francis', 'TN', '234-333-2222', 'June 10, 1971'  UNION
SELECT  'Same Bojen', 'WA', '555-333-2222', 'June 10, 1971'  UNION
SELECT  'Frank Tonga', 'NY', '903-333-2222', 'June 10, 1971'  UNION
SELECT  'Jill Rogers', 'WA', '555-333-2222', 'June 10, 1971'  UNION
SELECT  'Tim Jackson', 'OR', '757-333-2222', 'June 10, 1971'
GO

INSERT INTO Tbl2
SELECT  'BobJonez', 'WA'  UNION
SELECT  'Malcome X', 'CA' UNION
SELECT  'Jan Regal', 'WA'
GO

Here's the query:

WITH cte as (
    SELECT  t1Id = t1.Id ,
            t1Name = t1.Name ,
            t1State = t1.[State] ,
            t2Name = t2.Name ,
            t2State = t2.[State] ,
            t1.Phone ,
            t1.DoB ,
            Score = dbo.ScoreHowCloseOfMatch(t1.Name, t2.Name)

    FROM    dbo.Tbl1 t1     -- t1 is Tbl1, the table that has Phone and DoB
    JOIN    dbo.Tbl2 t2     -- t2 is Tbl2
      ON    t2.[State] = t1.[State]
)
SELECT  *
INTO    CompareResult
FROM    cte
ORDER BY    cte.Score ASC
GO


One possibility would be to add a column holding a normalized name that is used only for matching. To build it, you would remove all whitespace, strip accents, replace first names with abbreviated first names, replace known nicknames with real names, and so on.

You could even sort each person's first and last name alphabetically, so that names entered in swapped order still match.
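
As a rough sketch of that sorting idea, a helper along these lines could lowercase a name, split it on spaces, and glue the parts back together in alphabetical order. The function name dbo.SortNameTokens is made up for illustration, and the sketch assumes SQL Server 2017 or later, since it relies on STRING_SPLIT and STRING_AGG.

```sql
-- Hypothetical helper (not part of the original post): returns the
-- space-separated parts of a name, lowercased and sorted alphabetically.
-- Assumes SQL Server 2017+ for STRING_SPLIT and STRING_AGG.
CREATE FUNCTION dbo.SortNameTokens ( @Name NVARCHAR(255) )
RETURNS NVARCHAR(255)
AS
BEGIN
    RETURN (
        SELECT  STRING_AGG(value, N' ') WITHIN GROUP (ORDER BY value)
        FROM    STRING_SPLIT(LOWER(@Name), N' ')
        WHERE   value <> N''    -- skip empty parts caused by repeated spaces
    );
END;
GO

-- Both calls should produce the same string, so swapped names compare equal.
SELECT  dbo.SortNameTokens(N'Bob Jones')  AS a,
        dbo.SortNameTokens(N'Jones Bob')  AS b;
GO
```

Calling a scalar function for every row is slow on tables this size, so it is probably better to run it once and store the result in a column than to call it inside the join.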

Then you can simply join the two tables on this normalized name column.
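
A minimal sketch of that approach could look like the following. The column name NameNorm and the index names are assumptions, and the normalization here only lowercases the name and removes spaces. Accent, abbreviation and nickname handling would need extra steps, such as a lookup table, that are not shown.

```sql
-- Add a column that is used only for matching (NameNorm is a hypothetical name).
ALTER TABLE dbo.Tbl1 ADD NameNorm NVARCHAR(255) NULL;
ALTER TABLE dbo.Tbl2 ADD NameNorm NVARCHAR(255) NULL;
GO

-- Populate it once. Only lowercasing and space removal are shown here.
UPDATE dbo.Tbl1 SET NameNorm = LOWER(REPLACE(Name, N' ', N''));
UPDATE dbo.Tbl2 SET NameNorm = LOWER(REPLACE(Name, N' ', N''));
GO

-- Index the join columns so the match becomes an indexed equality join
-- instead of a scalar function call per candidate pair.
CREATE INDEX tbl1_State_NameNorm_indx ON dbo.Tbl1 ( [State], NameNorm );
CREATE INDEX tbl2_State_NameNorm_indx ON dbo.Tbl2 ( [State], NameNorm );
GO

-- Pairs whose normalized names are equal within the same state.
SELECT  t1Id   = t1.Id ,
        t1Name = t1.Name ,
        t2Id   = t2.Id ,
        t2Name = t2.Name ,
        t1.[State]
FROM    dbo.Tbl1 t1
JOIN    dbo.Tbl2 t2
  ON    t2.[State]  = t1.[State]
 AND    t2.NameNorm = t1.NameNorm;
GO
```

This only finds rows whose normalized names are exactly equal. Anything still unmatched afterwards could be passed to the scoring function, which would then have far fewer rows to compare.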

