Thursday, 15 September 2016

SQL Server Delete Duplicate Rows from Table

In this post, I will demonstrate techniques to delete duplicate rows from SQL Server database table.

One day I was working on one of the freelancing project and found duplicates records in some of client tables.
I have found the work around and created solution to delete duplicate rows.
Let’s create sample table and insert data into it:

CREATE TABLE tbl_DuplicateData
( 
 ID INTEGER PRIMARY KEY
 ,Name VARCHAR(150)
)
GO
 
INSERT INTO tbl_DuplicateData VALUES 
((1,'Roy'),(2,'Ashish'),(3,'Tarun'),
(4,'Rahul'),(5,'Kapoor'),
(6,'Nadun'),(7,'Roy'),(8,'Rahul'),(9,'Kapoor'),
(10,'Rahul'),(11,'Roy'))
GO

There can be multiple solution and different way to get out of this issue:
1st solution is using WITH CTE:
;WITH cte_FindDuplicateRows AS 
(
    SELECT 
       Name
       ,ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name) RowNumber
    FROM  tbl_DuplicateData
)
DELETE FROM cte_FindDuplicateRows
WHERE RowNumber > 1
GO

Another way is using SELF JOIN:
DELETE FROM A
FROM tbl_DuplicateData AS A
INNER JOIN tbl_DuplicateData AS B
  ON A.Name = B.Name AND A.ID > B.ID
GO

Check the table after deleting duplicate rows:
SELECT *FROM tbl_DuplicateData ORDER BY ID

As per requirement you can choose any of the solution and before executing please must check your query plan and choose best solution to delete duplicate rows.

No comments:

Post a Comment