Querying large data (millions of rows) faster
source link: https://www.codesd.com/item/query-large-in-millions-faster-data.html
I have two tables:
Tbl1 has 2 columns: name and state
Tbl2 has name and state and additional columns about the fields
I am trying to match Tbl1 name and state with Tbl2 name and state. I have removed all exact matches, but I see that I could match more if I accounted for misspellings and name variations by using a scalar function that compares the two names and returns an integer showing how close a match they are (the lower the number returned, the better the match).
The issue is that Tbl1 has over 2M records and Tbl2 has over 4M records, and it takes about 30 seconds just to search for one record from Tbl1 in Tbl2.
Is there some way I could arrange the data or query so the search could be completed faster?
Here’s the table structure:
CREATE TABLE Tbl1
(
Id INT NOT NULL IDENTITY( 1, 1 ) PRIMARY KEY,
Name NVARCHAR(255),
[State] VARCHAR(50),
Phone VARCHAR(50),
DoB SMALLDATETIME
)
GO
CREATE INDEX tbl1_Name_indx ON dbo.Tbl1( Name )
GO
CREATE INDEX tbl1_State_indx ON dbo.Tbl1( [State] )
GO
CREATE TABLE Tbl2
(
Id INT NOT NULL IDENTITY( 1, 1 ) PRIMARY KEY,
Name NVARCHAR(255),
[State] VARCHAR(50)
)
GO
CREATE INDEX tbl2_Name_indx ON dbo.Tbl2( Name )
GO
CREATE INDEX tbl2_State_indx ON dbo.Tbl2( [State] )
GO
Here's a sample function that I tested with to try to rule out function complexity:
CREATE FUNCTION [dbo].ScoreHowCloseOfMatch
(
@SearchString VARCHAR(200) ,
@MatchString VARCHAR(200)
)
RETURNS INT
AS
BEGIN
DECLARE @Result INT;
SET @Result = 1;
RETURN @Result;
END;
Here's some sample data:
INSERT INTO Tbl1
SELECT 'Bob Jones', 'WA', '555-333-2222', 'June 10, 1971' UNION
SELECT 'Melcome T Homes', 'CA', '927-333-2222', 'June 10, 1971' UNION
SELECT 'Janet Rengal', 'WA', '555-333-2222', 'June 10, 1971' UNION
SELECT 'Matt Francis', 'TN', '234-333-2222', 'June 10, 1971' UNION
SELECT 'Same Bojen', 'WA', '555-333-2222', 'June 10, 1971' UNION
SELECT 'Frank Tonga', 'NY', '903-333-2222', 'June 10, 1971' UNION
SELECT 'Jill Rogers', 'WA', '555-333-2222', 'June 10, 1971' UNION
SELECT 'Tim Jackson', 'OR', '757-333-2222', 'June 10, 1971'
GO
INSERT INTO Tbl2
SELECT 'BobJonez', 'WA' UNION
SELECT 'Malcome X', 'CA' UNION
SELECT 'Jan Regal', 'WA'
GO
Here's the query:
WITH cte AS (
SELECT t1Id = t1.Id ,
t1Name = t1.Name ,
t1State = t1.State ,
t2Name = t2.Name ,
t2State = t2.State ,
t1.Phone , -- Phone and DoB exist only in Tbl1
t1.DoB ,
Score = dbo.ScoreHowCloseOfMatch(t1.Name, t2.Name)
FROM dbo.Tbl1 t1
JOIN dbo.Tbl2 t2
ON t1.State = t2.State
)
SELECT *
INTO CompareResult
FROM cte
ORDER BY cte.Score ASC
GO
One possibility is to add a column holding a normalized form of the name, used only for matching. To build it, you would remove all whitespace, strip accents, replace first names with abbreviated first names, replace known nicknames with real names, and so on.
You could even sort the first name and the last name of each person alphabetically, so that a record with the two swapped still matches.
Then you can simply join the two tables on this normalized name column. That is an equality comparison an index can serve, instead of a scalar function call for every pair of rows that share a state.
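As a rough sketch of the normalized-column idea, assuming SQL Server and the table definitions from the question: the column name NameKey and the index names below are hypothetical, and the normalization is deliberately minimal (lowercase, with spaces, periods, and hyphens removed). Accent stripping, nickname replacement, and sorting of name parts are left out, so this key only catches differences in spacing, case, and punctuation. Real misspellings would still need the scoring function, ideally applied only to the rows the key did not match.

```sql
-- NameKey is a hypothetical column name, not part of the original schema.
-- The CAST keeps the key narrow enough to index.
ALTER TABLE dbo.Tbl1
    ADD NameKey AS CAST(LOWER(REPLACE(REPLACE(REPLACE(Name, ' ', ''), '.', ''), '-', '')) AS NVARCHAR(255)) PERSISTED;
GO
ALTER TABLE dbo.Tbl2
    ADD NameKey AS CAST(LOWER(REPLACE(REPLACE(REPLACE(Name, ' ', ''), '.', ''), '-', '')) AS NVARCHAR(255)) PERSISTED;
GO

-- Composite indexes so the join can seek on (State, NameKey).
CREATE INDEX tbl1_State_NameKey_indx ON dbo.Tbl1( [State], NameKey );
GO
CREATE INDEX tbl2_State_NameKey_indx ON dbo.Tbl2( [State], NameKey );
GO

-- Equality join on the normalized key: no scalar function call per row pair.
SELECT t1Id   = t1.Id,
       t1Name = t1.Name,
       t2Id   = t2.Id,
       t2Name = t2.Name,
       t1.[State]
FROM dbo.Tbl1 t1
JOIN dbo.Tbl2 t2
    ON  t2.[State] = t1.[State]
    AND t2.NameKey = t1.NameKey;
GO
```

A persisted computed column keeps the key in sync with Name automatically. A plain column filled by an UPDATE would work as well, and would leave room for normalization rules too complex for a computed expression, such as a nickname lookup table.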