The insert was quite fast, only taking 30 seconds. At a total of 4 minutes, the 2.5 billion rows were cleaned of duplicates. Compared to the 1 hour 15 minutes using the first approach, this is a huge performance boost. And having more compute nodes would result in a much shorter query time for this task.

Before extending this solution to even more rows, consider the following topics:

- Check the number of duplicates first (table #DupKeyStore). If there are too many duplicates (I would say more than 5% of the total rows), consider a CTAS operation instead of the delete/insert operation.
- The delete/insert operations require tempdb space.
- For a larger number of rows, these statements should be split into batches, with one transaction per bunch of keys. Since DELETE TOP(nnn) is not supported on PDW and SET ROWCOUNT does not work here either, a good approach is to add a "cluster column" (e.g. row_number() modulo something) to #DupKeyStore and to use this key for splitting into batches. Remember to wrap the delete and the insert into a single transaction.
- CTAS operations writing to clustered columnstore index tables perform better using a higher resource class; see this post by Stephan Köppen for details. This could also be considered when using workarounds like the one from above.

For smaller tables, the classic techniques still apply. One of the easiest ways to remove duplicate data in SQL is the DISTINCT keyword, which you can use in a SELECT statement to retrieve only unique values from a particular column:

```sql
SELECT DISTINCT email FROM contacts ORDER BY email;
```

Duplicates can be identified with GROUP BY and HAVING:

```sql
SELECT email, COUNT(email) FROM contacts GROUP BY email HAVING COUNT(email) > 1;
```

and removed with a DELETE over a self-join (assuming a unique id column to decide which copy survives):

```sql
DELETE t1
FROM contacts t1
INNER JOIN contacts t2
    ON t1.email = t2.email
   AND t1.id > t2.id; -- keeps the row with the lowest id per email
```
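The batching idea described above can be sketched as follows. This is illustrative only: the table and column names (FactSales, SalesKey, #SurvivorRows) and the batch count of 10 are assumptions, not from the post; #DupKeyStore is assumed to hold one row per duplicated key, and #SurvivorRows the single copy of each duplicated row that should survive.

```sql
-- Sketch only: FactSales, SalesKey, #SurvivorRows and the batch count
-- are hypothetical names/values for illustration.

-- 1. Add a "cluster column" for batching: row_number() modulo the batch count.
CREATE TABLE #DupKeyBatched
WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT DupKey,
       ROW_NUMBER() OVER (ORDER BY DupKey) % 10 AS cluster_key
FROM   #DupKeyStore;

-- 2. One transaction per batch: delete all copies of the batch's keys,
--    then re-insert the surviving row for each key.
--    (Repeat for cluster_key = 0..9; shown here for a single batch.)
BEGIN TRANSACTION;

DELETE FROM FactSales
WHERE  SalesKey IN (SELECT DupKey FROM #DupKeyBatched WHERE cluster_key = 0);

INSERT INTO FactSales
SELECT s.*
FROM   #SurvivorRows s
JOIN   #DupKeyBatched b ON b.DupKey = s.SalesKey
WHERE  b.cluster_key = 0;

COMMIT;
```

Keeping each delete/insert pair in its own transaction, as the post recommends, bounds the transaction log impact of any single batch.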
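For the high-duplicate case where the post recommends CTAS over delete/insert, one common shape is to rebuild the table keeping a single ranked copy per key and then swap it in with RENAME OBJECT. Again a sketch under assumptions: the table name FactSales and its columns are hypothetical, and the tie-breaking ORDER BY is illustrative.

```sql
-- Illustrative CTAS rewrite (all object and column names assumed):
-- keep one copy per SalesKey, preferring the most recent LoadDate.
CREATE TABLE FactSales_dedup
WITH (DISTRIBUTION = HASH(SalesKey), CLUSTERED COLUMNSTORE INDEX)
AS
SELECT SalesKey, Amount, LoadDate
FROM (
    SELECT SalesKey, Amount, LoadDate,
           ROW_NUMBER() OVER (PARTITION BY SalesKey
                              ORDER BY LoadDate DESC) AS rn
    FROM   FactSales
) ranked
WHERE  ranked.rn = 1;

-- Swap the deduplicated table in, then drop the old one.
RENAME OBJECT FactSales TO FactSales_old;
RENAME OBJECT FactSales_dedup TO FactSales;
DROP TABLE FactSales_old;
```

Because the CTAS writes into a clustered columnstore index table, running it under a higher resource class, as noted above, can improve rowgroup quality and load time.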