Query large data (millions of rows) faster

I have two tables:

Tbl1 has 2 columns: name and state
Tbl2 has name and state and additional columns about the fields

I am trying to match Tbl1 name and state with Tbl2 name and state. I have removed all exact matches, but I see that I could match more if I accounted for misspellings and name variations by using a scalar function that compares the two names and returns an integer showing how close the match is (the lower the number returned, the better the match).

The issue is that Tbl1 has over 2M records and Tbl2 has over 4M records; it takes about 30 seconds just to search for one record from Tbl1 in Tbl2.

Is there some way I could arrange the data or query so the search could be completed faster?

Here’s the table structure:

CREATE TABLE Tbl1
(
    Id          INT NOT NULL IDENTITY( 1, 1 ) PRIMARY KEY,
    Name        NVARCHAR(255),
    [State]     VARCHAR(50),
    Phone       VARCHAR(50),
    DoB         SMALLDATETIME
)
GO

CREATE INDEX    tbl1_Name_indx ON dbo.Tbl1( Name )
GO
CREATE INDEX    tbl1_State_indx ON dbo.Tbl1( [State] )
GO

CREATE TABLE Tbl2
(
    Id          INT NOT NULL IDENTITY( 1, 1 ) PRIMARY KEY,
    Name        NVARCHAR(255),
    [State]     VARCHAR(50)
)
GO

CREATE INDEX    tbl2_Name_indx ON dbo.Tbl2( Name )
GO
CREATE INDEX    tbl2_State_indx ON dbo.Tbl2( [State] )
GO

Here's a sample function that I tested with to try to rule out function complexity:

CREATE FUNCTION [dbo].ScoreHowCloseOfMatch
    (
      @SearchString VARCHAR(200) ,
      @MatchString VARCHAR(200)
    )
RETURNS INT
AS
    BEGIN

        DECLARE @Result INT;
        SET     @Result = 1;
        RETURN @Result;
    END;

Here's some sample data:

INSERT INTO Tbl1
SELECT  'Bob Jones', 'WA', '555-333-2222', 'June 10, 1971'  UNION
SELECT  'Melcome T Homes', 'CA', '927-333-2222', 'June 10, 1971'  UNION
SELECT  'Janet Rengal', 'WA', '555-333-2222', 'June 10, 1971'  UNION
SELECT  'Matt Francis', 'TN', '234-333-2222', 'June 10, 1971'  UNION
SELECT  'Same Bojen', 'WA', '555-333-2222', 'June 10, 1971'  UNION
SELECT  'Frank Tonga', 'NY', '903-333-2222', 'June 10, 1971'  UNION
SELECT  'Jill Rogers', 'WA', '555-333-2222', 'June 10, 1971'  UNION
SELECT  'Tim Jackson', 'OR', '757-333-2222', 'June 10, 1971'
GO

INSERT INTO Tbl2
SELECT  'BobJonez', 'WA'  UNION
SELECT  'Malcome X', 'CA' UNION
SELECT  'Jan Regal', 'WA'
GO

Here's the query:

WITH cte as (
    SELECT  t1Id = t1.Id ,
            t1Name = t1.Name ,
            t1State = t1.State,
            t2Name = t2.Name ,
            t2State = t2.State ,
            t2.Phone ,
            t2.DoB,
            Score = dbo.ScoreHowCloseOfMatch(t1.Name, t2.Name)

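    -- Note: the aliases are crossed relative to the table names.
    -- t2 is Tbl1 (the table that has Phone and DoB); t1 is Tbl2.
    FROM    dbo.Tbl1 t2
    JOIN    dbo.Tbl2 t1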
      ON    t1.State = t2.State
)
SELECT  *
INTO    CompareResult
FROM    cte
ORDER BY    cte.Score ASC
GO

One possibility would be to add a column holding a normalized name that is used only for matching. To build it, you would remove all whitespace, strip accents, replace first names with abbreviated first names, replace known nicknames with real names, and so on.
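As a rough sketch of such a column in T-SQL, a persisted computed column could hold the normalized value so that it can be indexed. The column name NameKey and the index names are hypothetical, and the normalization shown is deliberately partial: it only upper-cases the name and strips spaces, periods, and hyphens. Accent removal and nickname replacement would need additional logic (for example a lookup table) that is not shown here.

```sql
-- NameKey is a hypothetical column name, not part of the original schema.
-- This sketch only upper-cases the name and strips spaces, periods and hyphens;
-- accent removal and nickname replacement are left out.
ALTER TABLE dbo.Tbl1
    ADD NameKey AS CAST( UPPER( REPLACE( REPLACE( REPLACE( Name, N' ', N'' ), N'.', N'' ), N'-', N'' ) )
                         AS NVARCHAR(255) ) PERSISTED
GO
ALTER TABLE dbo.Tbl2
    ADD NameKey AS CAST( UPPER( REPLACE( REPLACE( REPLACE( Name, N' ', N'' ), N'.', N'' ), N'-', N'' ) )
                         AS NVARCHAR(255) ) PERSISTED
GO

-- Index State together with the key so a lookup can seek on both columns.
CREATE INDEX    tbl1_State_NameKey_indx ON dbo.Tbl1( [State], NameKey )
GO
CREATE INDEX    tbl2_State_NameKey_indx ON dbo.Tbl2( [State], NameKey )
GO
```

With the sample data, 'Bob Jones' would become BOBJONES and 'BobJonez' would become BOBJONEZ, so this normalization on its own removes differences in spacing, case, and punctuation but does not catch misspellings.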

You could even sort each person's first and last name alphabetically, so that names entered in swapped order still produce the same value.
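One way to sketch the alphabetical sort in T-SQL is to split the name on spaces and re-aggregate the parts in sorted order. This assumes SQL Server 2017 or later (STRING_SPLIT needs compatibility level 130, and STRING_AGG with WITHIN GROUP was added in 2017); the output column name SortedName is hypothetical.

```sql
-- Split each name on spaces, upper-case the parts, and glue them back
-- together in alphabetical order. Empty parts (from double spaces) are dropped.
SELECT  t.Id ,
        t.Name ,
        k.SortedName
FROM    dbo.Tbl1 AS t
CROSS APPLY (
    SELECT  SortedName = STRING_AGG( UPPER( s.value ), N' ' )
                         WITHIN GROUP ( ORDER BY UPPER( s.value ) )
    FROM    STRING_SPLIT( t.Name, N' ' ) AS s
    WHERE   s.value <> N''
) AS k
GO
```

Under this sketch, both 'Bob Jones' and 'Jones Bob' would come out as 'BOB JONES'. The result could be stored in a regular column populated once, since it would be too expensive to recompute for every comparison.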

Then you can simply join the two tables on this normalized name column.
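A sketch of that join, assuming both tables already carry an indexed normalized column, here called NameKey (a hypothetical name that is not in the original schema):

```sql
-- Equality join on State and the normalized name.
-- Unlike calling a scalar function for every pair of rows that share a state,
-- an equality predicate lets the optimizer use an index on ( [State], NameKey ).
SELECT  t1Id    = t1.Id ,
        t1Name  = t1.Name ,
        t2Id    = t2.Id ,
        t2Name  = t2.Name ,
        t1.[State]
FROM    dbo.Tbl1 AS t1
JOIN    dbo.Tbl2 AS t2
  ON    t2.[State] = t1.[State]
 AND    t2.NameKey = t1.NameKey
GO
```

Rows that still do not match after this step are the only ones that would need the slower scoring function.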
