none
How to update efficiently big table field RRS feed

  • Question

  • Dear all,

    I have a table with a field named BODY which content all the content of html pages of my web site.
    In that field I need to parse a string where name contains "&amp" and replace it with "&"

    for that I use wild card LIKE '%name=%&%' to fetch the record.

    my table has a huge amount of data and update querry might take huge amount of time. Do you see a more faster way to proceed ?

    regards

    Monday, September 9, 2019 12:37 PM

Answers

All replies

  • WHILE 1 = 1
    BEGIN

       UPDATE TOP (50000) tbl SET col=<blblabala>
       WHERE.......;

       IF @@ROWCOUNT < 50000 BREAK;

    END

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Monday, September 9, 2019 12:38 PM
    Answerer
  • Oups sorry Uri did not catch it... i admit my question was fuzzy as well.

    my compte work would be as below :

    First I need to collect records wich match my wild card as below :

    SELECT ID
    
    FROM mytable
    WHERE myField LIKE '%a'


    Then for each records found I need to execute the following :

    UPDATE myTable SET myField = REPLACE(myField, 'aaa', 'bbb')

    I guess i should use CURSOR but how can I combine it with your sugestion above ?

    The final querry will be executed in Azure so that is the reason I should optimize it and make it effiicient to avoid timeout or resource consumtion

    regards

    Monday, September 9, 2019 12:51 PM
  • SELECT ID

    INTO #tmp FROM mytable
    WHERE myField LIKE '%a'

    create index my_idx on #tmp(id)


    WHILE 1 = 1
    BEGIN
       UPDATE TOP (50000) tbl SET myField = REPLACE(myField, 'aaa', 'bbb')
       FROM tbl JOIN #tmp ON tbl.ID=#tmp.ID
         

       IF @@ROWCOUNT < 50000 BREAK;

    END

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence


    Monday, September 9, 2019 1:02 PM
    Answerer
  • Thnaks for your reply.

    What about if my table contains more that 50000 records ?

    Should I get the rowCount dynamically ?

    Monday, September 9, 2019 1:07 PM
  • It is OK, the script will be ran in loop and update will be done in chunks of 50000 rows and when it is done it will exit the loop 

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Monday, September 9, 2019 1:10 PM
    Answerer
  • Thnaks for your reply.

    What about if my table contains more that 50000 records ?

    Should I get the rowCount dynamically ?

    The script updates in loops and with every batch it updates 50000 rows at a time. You may change this number while testing query.

    Do check out similar scripts for reference:

    https://www.stevefenton.co.uk/2018/06/sql-bulk-update-in-batches/

    http://www.sqlusa.com/bestpractices/largeupdate/


    Cheers
    Vaibhav
    MCSA (SQL Server 2014)

    Monday, September 9, 2019 1:14 PM
  • This would likely be better than doing 2 scans of the table.

    WHILE 1 = 1
    BEGIN
       UPDATE TOP (50000) tbl SET BODY = REPLACE(BODY,'&amp;','&')
       WHERE CHARINDEX('&amp;',BODY) > 0;
    
       IF @@ROWCOUNT < 50000 BREAK;
    END

    • Proposed as answer by Vaibhav-Chaudhari Monday, September 9, 2019 1:44 PM
    • Marked as answer by wakefun Monday, September 9, 2019 3:33 PM
    Monday, September 9, 2019 1:17 PM
    Moderator
  • hmmm ok thanks to all of you for your reply.

    I have never use this bulk update so I am confused how it works and where is the loop actually.

    The reason is that in code provide by Uri, i can see  IF @@ROWCOUNT < 50000 BREAK;so I uderstand ist so that after 50000 rows it stop the loop

    Can you please clarify ?

    Sorry for simple thing for you which is bit hard to catch for me

    regards

    Monday, September 9, 2019 1:30 PM
  • >>The reason is that in code provide by Uri, i can see  IF @@ROWCOUNT < 50000 BREAK;so I uderstand >>>ist so that after 50000 rows it stop the loop

    No, each bulk update 50000 rows, and last one let me say completes with 42050 rows and the next loop will exists as 42050  is less than 50000


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    • Marked as answer by wakefun Monday, September 9, 2019 1:42 PM
    • Unmarked as answer by wakefun Monday, September 9, 2019 1:45 PM
    Monday, September 9, 2019 1:37 PM
    Answerer
  • Thnaks Uri I understand now.

    can it work the same if I have multiple different update between or I should have only 1 update to execute at a time .

    For exemple if my update is define as below

    UPDATE mytable SET myfield = REPLACE(myfield, 'aa', 'bb') WHERE myField LIKE '%aa%';
    UPDATE mytable SET myfield = REPLACE(myfield, 'cc', 'dd') WHERE myField LIKE '%cc%';

    can it be used in the loop as below ?

    WHILE 1 = 1 BEGIN UPDATE TOP (50000) tbl SET myField = REPLACE(myField, 'aa', 'bb') FROM tbl JOIN #tmp ON tbl.ID=#tmp.ID where myfield LIKE '%aa%'
    UPDATE TOP (50000) tbl SET myField = REPLACE(myField, 'cc', 'dd')
    FROM tbl JOIN #tmp ON tbl.ID=#tmp.ID where myfield LIKE '%cc%'

    IF @@ROWCOUNT < 50000 BREAK; END


    regards


    • Edited by wakefun Monday, September 9, 2019 1:56 PM
    Monday, September 9, 2019 1:52 PM
  • I would use two different loops but  you can try the below logic for both updates

    DECLARE @rowcnt1 INT
    DECLARE @rowcnt2 INT

    WHILE 1 = 1
    BEGIN
       UPDATE TOP (50000) tbl SET myField = REPLACE(myField, 'aa', 'bb')
       FROM tbl JOIN #tmp ON tbl.ID=#tmp.ID where myfield LIKE '%aa%'
       SET @rowcnt1=@@ROWCOUNT
       UPDATE TOP (50000) tbl SET myField = REPLACE(myField, 'cc', 'dd') 
       FROM tbl JOIN #tmp ON tbl.ID=#tmp.ID where myfield LIKE '%cc%'
        SET @rowcnt2=@@ROWCOUNT
       IF @rowcnt1 < 50000  AND @rowcnt2 < 50000 BREAK;

    END


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Monday, September 9, 2019 2:01 PM
    Answerer
  • If you are modifying the same table, only do a single UPDATE.  Running multiple updates on the same table, especially a big one, is not needed.

    It is better to update multiple fields to the same value, rather than scanning the table multiple times.

       UPDATE TOP (50000) tbl SET BODY = REPLACE(REPLACE(REPLACE(BODY,'&amp;','&'),'&lt;','<'),'&gt;','>)
       WHERE CHARINDEX('&amp;',BODY) > 0
    OR CHARINDEX('&lt;',BODY) > 0
    OR CHARINDEX('&gt;',BODY) > 0;

    Monday, September 9, 2019 2:01 PM
    Moderator
  • Just to be clear.

    Each "UPDATE TOP (50000).." updates up to 50000 rows.   The "IF @@ROWCOUNT < 50000 BREAK;" means the update changed less than 50000 rows and there are no more rows to update based on your WHERE clause, so you can stop the loop.

    So if you have 100,005 rows to update, it will update 50,000, loop, update 50,000 rows, loop, update 5 rows and break.

    Monday, September 9, 2019 2:07 PM
    Moderator
  • Thank Tom for your reply,

    By the way I have been always told to avoid using WHILE 1 = 1, which can be very dangerous in getting in an infinit loop, is there a way to avoid to use it ?

    regards

    Monday, September 9, 2019 2:50 PM
  • By the way, perhaps you can also leave “&amp;” as is, because converting to “&” usually transform the HTML into a problematic or invalid one (including the case of multiple replacements). Instead of such unreliable conversion, maybe consider something else in other parts of your application.


    • Edited by Viorel_MVP Monday, September 9, 2019 3:20 PM
    Monday, September 9, 2019 3:19 PM
  • Hi,

    Also pre-plan the script for other commonly used characters and their equivalent counterparts:

    &    -->    &amp;
    <    -->    &lt;
    >    -->    &gt;
    UPDATE tbl SET BODY = REPLACE(REPLACE(REPLACE(BODY,'&amp;','&'),'&lt;','<'),'&gt;','>')
    WHERE ((CHARINDEX('&amp;',BODY) > 0) OR (CHARINDEX('&lt;',BODY) > 0) OR (CHARINDEX('&gt;',BODY) > 0));


    Please mark this as answer if it solved your issue. 
    Please vote this as helpful if it helps to solve your issue. 

    Thanks,
    Arulmouzhi

     

    Monday, September 9, 2019 8:45 PM