Speeding up search and replace in large files?
-
Monday, August 20, 2012 6:42 PM
Short version: I have HTML files, each several megabytes in size, that I need to edit. Specifically, I need to change the URLs to point to different locations. I do this via regexes because each of the links is unique and needs to be changed to a specific, almost-as-unique string.
The problem is that as the file size increases, the link replacement naturally gets slower, since every regex has that much more text to search through {note: the contents of the file are in memory and only written once all the processing is done}.
Can anyone offer some advice on how to speed up the whole routine? Here's the code:
    private static string SortThumbnails(string fileContents)
    {
        Hashtable thumbnailList = GetThumbnailHashes(fileContents);
        NameValueCollection thumbnailXsection = SQLite.RetrieveThumbnailTemplates(thumbnailList);
        info.Log("Redirecting thumbnails");
        foreach (DictionaryEntry x in thumbnailList)
        {
            //key = thumbnail, value = sha1
            #region If the current thumbnail SHA1 matches an existing value in the database, delete it and redirect the HTML
            if (thumbnailXsection.GetValues("SHA1").Contains(x.Value))
            {
                for (int index = 0; index < thumbnailXsection.GetValues("SHA1").Length; ++index)
                {
                    if ((string)x.Value == (string)thumbnailXsection.GetValues("SHA1")[index])
                    {
                        fileContents = new Regex((string)"(?<=<img src=\").*?" + x.Key + "(?=\")", RegexOptions.Multiline).Replace(fileContents, "Thumbnails\\" + thumbnailXsection.GetValues("template")[index]);
                        File.Delete(tempFolder + x.Key);
                    }
                }
            }
            #endregion
            #region If the current thumbnail SHA1 has no database matches, insert it into the database and move it to thumbnails
            else
            {
                SQLite.QueueThumbnail((string)x.Key, (string)x.Value);
                fileContents = new Regex((string)"(?<=<img src=\").*?" + x.Key + "(?=\")", RegexOptions.Multiline).Replace(fileContents, "Thumbnails\\" + (string)x.Key);
                if (File.Exists("Pages\\Thumbnails\\" + x.Key) == true)
                {
                    File.Delete(tempFolder + x.Key);
                }
                else if (File.Exists(tempFolder + x.Key))
                {
                    File.Move(tempFolder + x.Key, "Pages\\Thumbnails\\" + x.Key);
                }
            }
            #endregion
        }
        return fileContents;
    }
{it's probably a bit of a mess, I know. I'm still learning}
C# newbie, learning on the go. I will probably ask a lot of followup questions about any answers already given, so fair warning and all.
All Replies
-
Wednesday, August 22, 2012 2:35 AM (Moderator)
Hi TheQuinch,
Welcome to MSDN Forum Support.
We're doing research on this issue. It might take some time before we get back to you.
Sincerely,
Jason Wang [MSFT]
MSDN Community Support
-
Wednesday, August 22, 2012 2:45 AM
Well, don't worry about it too much. I've taken a different approach in the meantime {chunking the file up into an array, might make an in-memory database to deal with it as well as other processing stuff} so it's not as important anymore.
-
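Wednesday, August 22, 2012 7:43 AM
One thing I see is that your for loop contains a Replace that does something useful only the first time it runs for each value of x.Key; after that, the matching src attribute has normally already been rewritten, so every later pass scans the whole file for nothing. Likewise, File.Delete does nothing once the file has been deleted. You could add a break at the end of the 'if' block so the loop stops at the first matching SHA1.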
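To make the break suggestion concrete, here is a minimal, standalone sketch rather than a drop-in replacement for SortThumbnails. The class ThumbnailRedirectSketch, the Redirect helper, and its parameters are hypothetical names invented for illustration; plain arrays stand in for the NameValueCollection from the original post. The sketch also assumes it is acceptable to escape the key with Regex.Escape and to use [^"]* instead of .*? so a match cannot run past the closing quote, which is a small deviation from the posted pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ThumbnailRedirectSketch
{
    // Hypothetical helper (not from the original post): rewrites the src of the
    // <img> tag pointing at `key`, using the template of the FIRST matching hash.
    // `replaceCalls` reports how many full-text regex passes were performed.
    public static string Redirect(string html, string key, string sha1,
                                  string[] knownHashes, string[] templates,
                                  out int replaceCalls)
    {
        replaceCalls = 0;

        // Assumption of this sketch: escape the key and stop at the closing quote.
        string pattern = "(?<=<img src=\")[^\"]*" + Regex.Escape(key) + "(?=\")";

        for (int index = 0; index < knownHashes.Length; ++index)
        {
            if (knownHashes[index] == sha1)
            {
                string replacement = "Thumbnails\\" + templates[index];

                // A MatchEvaluator keeps any '$' in the template literal.
                html = Regex.Replace(html, pattern, m => replacement);
                ++replaceCalls;

                // Later duplicates of the same hash would only rescan the text
                // without finding anything left to replace, so stop here.
                break;
            }
        }
        return html;
    }

    public static void Main()
    {
        string html = "<p><img src=\"tmp/a.png\"></p>";
        string[] hashes = { "abc", "def", "abc" };          // "abc" appears twice
        string[] templates = { "t0.png", "t1.png", "t2.png" };

        int calls;
        string result = Redirect(html, "a.png", "abc", hashes, templates, out calls);

        Console.WriteLine(result); // <p><img src="Thumbnails\t0.png"></p>
        Console.WriteLine(calls);  // 1
    }
}
```

Without the break, the duplicate hash at index 2 would trigger a second pass over the whole text that finds nothing to replace; on multi-megabyte files each such wasted pass adds up.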

