Fastest way to search for a string in a list of strings, both single- and multi-threaded

Question
-
I am looking for the best way (performance-wise) to search for a string in a list. (I could be looking for hundreds of thousands of strings, or perhaps even millions.)
I would like to find the best single thread way and the best multi-thread way to do this search.
I tested three different ways in a single thread:
1) By a predicate
2) foreach
3) for i
The code is:
public class StringList : List<string>
{
    public StringList() : base() { }

    public string FindStringPredicate(string s)
    {
        return this.Find(o => o == s);
    }

    public string FindStringFori(string s)
    {
        int count = this.Count;
        for (int i = 0; i < count; i++)
        {
            if (this[i] == s)
                return this[i];
        }
        return null;
    }

    public string FindStringForEach(string s)
    {
        foreach (string s2 in this)
        {
            if (s2 == s)
                return s2;
        }
        return null;
    }
}
I created one million words and did 100,000 searches and my results came back as:
Predicate: 2 minutes 16 seconds
Foreach: 1 minute 48 seconds
For i: 1 minute 30 seconds.
Are there any other ways that could possibly be faster than the for i in a single thread?
What about a multi-threaded solution? Any idea what would be the best way to do that?
Thursday, October 14, 2010 4:52 PM
Answers
-
Hi guys, does that mean that Tom's sequential search was really, really fast?
I mean, did it report the right time for a sequential search for the last element? Or do you think it just returned the last item without a search?
…we each have more potential than we might ever presume to guess. (Blog: http://dsmyth.blogspot.com/)

No. Look closely - his timing is just the console output, not the search...
I put together a program to test all of these. I'll post it afterwards, but first, the results:
Starting run 1
LINQ-Where: Found 6efb61b5f7724ddca93d9aa32e114cea in 17 ms
LINQ-First: Found 6efb61b5f7724ddca93d9aa32e114cea in 21 ms
PLINQ     : Found 6efb61b5f7724ddca93d9aa32e114cea in 19 ms
foreach   : Found 6efb61b5f7724ddca93d9aa32e114cea in 13 ms
for       : Found 6efb61b5f7724ddca93d9aa32e114cea in 11 ms
List.Find : Found 6efb61b5f7724ddca93d9aa32e114cea in 11 ms
Binary Src: Found 6efb61b5f7724ddca93d9aa32e114cea in 2724 ms

Starting run 2
LINQ-Where: Found 6efb61b5f7724ddca93d9aa32e114cea in 17 ms
LINQ-First: Found 6efb61b5f7724ddca93d9aa32e114cea in 21 ms
PLINQ     : Found 6efb61b5f7724ddca93d9aa32e114cea in 6 ms
foreach   : Found 6efb61b5f7724ddca93d9aa32e114cea in 13 ms
for       : Found 6efb61b5f7724ddca93d9aa32e114cea in 11 ms
List.Find : Found 6efb61b5f7724ddca93d9aa32e114cea in 13 ms
Binary Src: Found 6efb61b5f7724ddca93d9aa32e114cea in 2770 ms

Starting run 3
LINQ-Where: Found 6efb61b5f7724ddca93d9aa32e114cea in 17 ms
LINQ-First: Found 6efb61b5f7724ddca93d9aa32e114cea in 21 ms
PLINQ     : Found 6efb61b5f7724ddca93d9aa32e114cea in 6 ms
foreach   : Found 6efb61b5f7724ddca93d9aa32e114cea in 13 ms
for       : Found 6efb61b5f7724ddca93d9aa32e114cea in 11 ms
List.Find : Found 6efb61b5f7724ddca93d9aa32e114cea in 11 ms
Binary Src: Found 6efb61b5f7724ddca93d9aa32e114cea in 2757 ms
As you can see, PLINQ is by far the fastest option (on my machine). I had to up the search to 1,000,000 elements in order to get a reasonable timing. On the first run, of course, the timings are somewhat skewed by JIT overhead, but after that it pares down quickly.
Of course, this is also somewhat unfair in that I'm doing the sort each iteration. If you pull the sort out, the binary search is very fast - but it does require a sorted list in order to work. The sort far overshadows the speedup unless you can pre-sort the list and not re-add elements. Also, as a comparison, I tested with a HashSet - with 1,000,000 elements it was running in under 0.5 ms; I had to go to 10 million elements and measure in ticks to see it at all. There, a HashSet search (Contains) performs at about 100,000x the speed of any of the other options ;)
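As a concrete sketch of that HashSet comparison (illustrative code only, not the benchmark harness below - the class and variable names are mine):

```csharp
using System;
using System.Collections.Generic;

class HashSetLookupDemo
{
    static void Main()
    {
        // HashSet hashes each entry, so Contains is O(1) on average
        // instead of the O(n) scan a List<string> needs.
        var set = new HashSet<string>();
        string last = null;
        for (int i = 0; i < 1000000; i++)
        {
            last = Guid.NewGuid().ToString("N");
            set.Add(last);
        }

        // Worst case for a linear scan; constant time for the set.
        Console.WriteLine(set.Contains(last));             // True
        Console.WriteLine(set.Contains("not-in-the-set")); // False
    }
}
```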
For anybody who wants to test, here's the full code - just make sure to run it in an optimized (release) build outside of VS.
namespace ConsoleApplication1
{
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;

    class Program
    {
        public static string last;

        static void Main(string[] args)
        {
            List<string> stringsOriginal = InitList();
            for (int iteration = 0; iteration < 3; ++iteration)
            {
                Console.WriteLine("Starting run {0}", iteration + 1);
                List<string> strings = new List<string>(stringsOriginal);

                Stopwatch stopwatch = new Stopwatch();
                stopwatch.Start();
                var value = (from s in strings where s == last select s).First();
                stopwatch.Stop();
                Console.WriteLine("LINQ-Where: Found {0} in {1} ms", value, stopwatch.ElapsedMilliseconds);

                stopwatch.Reset();
                stopwatch.Start();
                value = strings.First(s => s == last);
                stopwatch.Stop();
                Console.WriteLine("LINQ-First: Found {0} in {1} ms", value, stopwatch.ElapsedMilliseconds);

                stopwatch.Reset();
                strings = new List<string>(stringsOriginal);
                stopwatch.Start();
                value = strings.AsParallel().First(s => s == last);
                stopwatch.Stop();
                Console.WriteLine("PLINQ     : Found {0} in {1} ms", value, stopwatch.ElapsedMilliseconds);

                stopwatch.Reset();
                strings = new List<string>(stringsOriginal);
                stopwatch.Start();
                value = null;
                foreach (var s in strings)
                {
                    if (s == last)
                    {
                        value = s;
                        break;
                    }
                }
                stopwatch.Stop();
                Console.WriteLine("foreach   : Found {0} in {1} ms", value, stopwatch.ElapsedMilliseconds);

                strings = new List<string>(stringsOriginal);
                stopwatch.Reset();
                stopwatch.Start();
                value = null;
                for (int i = 0; i < strings.Count; ++i)
                {
                    if (strings[i] == last)
                    {
                        value = strings[i];
                        break;
                    }
                }
                stopwatch.Stop();
                Console.WriteLine("for       : Found {0} in {1} ms", value, stopwatch.ElapsedMilliseconds);

                stopwatch.Reset();
                strings = new List<string>(stringsOriginal);
                stopwatch.Start();
                value = strings.Find(s => s == last);
                stopwatch.Stop();
                Console.WriteLine("List.Find : Found {0} in {1} ms", value, stopwatch.ElapsedMilliseconds);

                stopwatch.Reset();
                strings = new List<string>(stringsOriginal);
                stopwatch.Start();
                strings.Sort();
                value = strings[strings.BinarySearch(last)];
                stopwatch.Stop();
                Console.WriteLine("Binary Src: Found {0} in {1} ms", value, stopwatch.ElapsedMilliseconds);
            }
        }

        static List<string> InitList()
        {
            List<string> strings = new List<string>(1000000);
            string value = string.Empty;
            for (int i = 0; i < 1000000; i++)
            {
                value = Guid.NewGuid().ToString("N");
                strings.Add(value);
            }
            last = value;
            return strings;
        }
    }
}
Reed Copsey, Jr. - http://reedcopsey.com
- Marked as answer by Figo Fei Monday, October 25, 2010 2:43 AM
Thursday, October 14, 2010 11:00 PM (Moderator) -
Actually, the timing was not console output - it was the actual execution of the query. The query does not execute until you iterate it - deferred execution. You will see the same setup in the MSDN performance tests. I just set up my query wrong, which was unfortunate.
I also purposely used the Where to simulate not knowing where in the collection the string was. I wanted to set it up so I would be timing the worst case - which, in a sequential search, would be the last item in the list. I suppose Last would do that, but I wanted it to be more like a real search.
In fact, this was not a particularly good example. On my system, a simple for loop is faster than using the query - I suspect there is not enough work going on to make up for the added overhead of the parallelism:
stopwatch.Start();
for (int i = 0; i < strings.Count; i++)
{
    if (strings[i] == last)
    {
        Console.WriteLine(strings[i]);
        break;
    }
}
stopwatch.Stop();
Console.WriteLine(stopwatch.ElapsedMilliseconds);

takes about 3-4 ms. The corrected query takes about 17 or 18 ms. The funny thing is that my original version ends up at about 5-6 ms - probably because it's essentially running sequentially, and the timing is in the same range as a simple foreach:

stopwatch.Start();
foreach (string s in strings)
{
    if (s == last)
    {
        Console.WriteLine(s);
        break;
    }
}
stopwatch.Stop();
Console.WriteLine(stopwatch.ElapsedMilliseconds);

Doing a List.Find is in the 3-4 ms range, like a for loop.
I'm not convinced now that this is a good candidate for a multi-threaded operation, at least not at the 100,000 item range.
Tom Shelton
- Marked as answer by Figo Fei Monday, October 25, 2010 2:44 AM
Friday, October 15, 2010 8:03 AM -
Hey folks,
First of all, thanks for sharing your code and results, Reed. It's interesting to see the comparison results of the different methods.
I prefer to use multi-threading to do two things at the same time rather than to do the same thing faster. The reason is that multi-threading rarely gives that much of a performance boost, due to all the overhead involved, and it usually just produces complex code. The complexity of the code doesn't justify the slight performance gain for me. I'm not saying it's a golden rule or a silver bullet, but for the most part one CPU should be enough for a single task.
Take this problem, for example; there are a couple of approaches. Halve the list and search each half separately with two threads? OK, but the time spent halving the list and then rejoining the results when finished adds processing that basically cancels out the performance gains - no point, and it complicates the code for no real benefit. How about two threads searching the same list? That should be fine: each thread only reads from the list, so no locks are needed. But what happens to the results, the matches? If they go into another list, then access to that result list needs to be synchronised - one thread at a time - and there's the bottleneck that cancels out any benefit. Complex code for no real benefit.
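For what it's worth, the halve-and-search approach described above might look something like this (a hypothetical sketch using .NET 4 tasks; FindSplit and the sample data are mine, and the split/join plumbing is exactly the overhead in question):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class SplitSearchDemo
{
    // Each task scans one half of the list. The list is only read,
    // so no locking is needed; combining the two results afterwards
    // is the join step discussed above.
    public static string FindSplit(List<string> items, string target)
    {
        int mid = items.Count / 2;
        var lower = Task.Factory.StartNew(
            () => items.FindIndex(0, mid, s => s == target));
        var upper = Task.Factory.StartNew(
            () => items.FindIndex(mid, items.Count - mid, s => s == target));
        Task.WaitAll(lower, upper);
        return (lower.Result >= 0 || upper.Result >= 0) ? target : null;
    }

    static void Main()
    {
        var list = new List<string> { "red", "green", "blue", "cyan" };
        Console.WriteLine(FindSplit(list, "blue") ?? "(not found)"); // blue
        Console.WriteLine(FindSplit(list, "pink") ?? "(not found)"); // (not found)
    }
}
```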
They say the software industry needs to change to use multi-core CPUs, and they are right - developers need to develop software differently. But for me the change isn't doing one thing faster; it's doing more things at the same time.
:)
…we each have more potential than we might ever presume to guess. (Blog: http://dsmyth.blogspot.com/)
Friday, October 15, 2010 10:08 AM -
Hi, no problem
How much memory do you have on that bad boy?
Post the times you get with the 160,000 searches - that seems like a lot... what's your software doing?
private static long TimeGUIDSearch_SequentialContains(string[] guids)
{
int count = 0;
string lastEntry = guids.Last();
Stopwatch watch = new Stopwatch();
watch.Restart();
if (guids.Contains(lastEntry))
count = 1;
watch.Stop();
Console.WriteLine("Sequential : Contains : GUIDs found {0} : {1} ms", count, watch.ElapsedMilliseconds);
return watch.ElapsedMilliseconds;
}

Virtually the same for parallel:
private static long TimeGUIDSearch_ParallelContains(string[] guids)
{
int count = 0;
string lastEntry = guids.Last();
Stopwatch watch = new Stopwatch();
watch.Restart();
if (guids.AsParallel().Contains(lastEntry))
count = 1;
watch.Stop();
Console.WriteLine("Parallel : Contains : GUIDs found {0} : {1} ms", count, watch.ElapsedMilliseconds);
return watch.ElapsedMilliseconds;
}
…we each have more potential than we might ever presume to guess. (Blog: http://dsmyth.blogspot.com/)
- Marked as answer by Figo Fei Monday, October 25, 2010 2:43 AM
Wednesday, October 20, 2010 5:20 PM
All replies
-
.NET 4 using PLINQ:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;

namespace ConsoleApplication2
{
    class Program
    {
        public static string last;

        static void Main(string[] args)
        {
            List<string> strings = InitList();
            Stopwatch stopwatch = new Stopwatch();
            var value = (from s in strings where s == last select s).AsParallel();
            stopwatch.Start();
            foreach (var v in value)
                Console.WriteLine(v);
            stopwatch.Stop();
            Console.WriteLine(stopwatch.Elapsed);
        }

        static List<string> InitList()
        {
            List<string> strings = new List<string>(100000);
            string value = string.Empty;
            for (int i = 0; i < 100000; i++)
            {
                value = Guid.NewGuid().ToString("N");
                strings.Add(value);
            }
            last = value;
            return strings;
        }
    }
}
100,000 unique strings - doing a worst-case search (basically, searching for the last string in the list):
Time: 00:00:00.0058429
That's on a 3-year-old dual-core processor.
Tom Shelton
Thursday, October 14, 2010 5:39 PM -
Tom,
You've got your PLINQ query set up inappropriately. This:
var value = ( from s in strings where s == last select s ).AsParallel ();
Should really be:
var value = from s in strings.AsParallel() where s == last select s;
// Or:
// var value = strings.AsParallel().Where(s => s == last);
The way you wrote it, you're doing your entire query in serial. It's effectively doing:
var value = strings.Where(s => s == last).AsParallel();
This causes strings to be enumerated sequentially using the Enumerable.Where method, then the results are converted to a ParallelQuery, which is effectively just being enumerated serially again.
Reed Copsey, Jr. - http://reedcopsey.com
Thursday, October 14, 2010 8:25 PM (Moderator) -
Predicate: 2 minutes 16 seconds
Foreach: 1 minute 48 seconds
For i: 1 minute 30 seconds.
Are there any other ways that could possibly be faster than the for i in a single thread?
What about a multi-threaded solution? Any idea what would be the best way to do that?
Your timings look suspect here. I would recommend making sure that you run this in release mode, outside of the Visual Studio test host. You'll probably find your timings change - they should be much faster.
That being said, I would use PLINQ personally to multithread this. My version would be:
public string FindStringPLinq(string s)
{
    return this.AsParallel().FirstOrDefault(str => s == str);
}
Reed Copsey, Jr. - http://reedcopsey.com
Thursday, October 14, 2010 8:44 PM (Moderator) -
LOL... you're right, duh! Sorry :)
Tom Shelton
Thursday, October 14, 2010 9:36 PM -
Hey man, how are the strings stored? Or how are they loaded into the program?
Say they are in a database: then a SQL query would be faster than reading them into memory. If they are in a file, then reading line by line sequentially will reduce the need for a lot of memory; or perhaps one big read into a StringBuilder followed by a compiled regular expression might be fast. Maybe the text changes a lot but the search pattern doesn't, or vice versa, or maybe both change all the time. If you have a large list of strings, reading them in, rather than searching, may take the longest time. What about the search - is it on a per-line basis? Looks like it, so you know... lots of things could improve performance. Perhaps map/reduce functional programming techniques could be applied; I read that F# was capable of processing large data sets in short times, but only because functional programming techniques were applied - LINQ in C#, basically.
Is searching the biggest performance bottleneck?
…we each have more potential than we might ever presume to guess. (Blog: http://dsmyth.blogspot.com/)
Thursday, October 14, 2010 9:54 PM -
Are there any other ways that could possibly be faster than the for i in a single thread?
list.Sort();
list.BinarySearch(s);
6 seconds to sort a million random GUIDs...
0.5 seconds to search 100,000 GUIDs.
gimme some slamming techno!!!!- Proposed as answer by Blair Allen Stark Thursday, October 14, 2010 10:08 PM
Thursday, October 14, 2010 9:58 PM -
Hi guys, does that mean that Tom's sequential search was really, really fast?
I mean, did it report the right time for a sequential search for the last element? Or do you think it just returned the last item without a search?
…we each have more potential than we might ever presume to guess. (Blog: http://dsmyth.blogspot.com/)
Thursday, October 14, 2010 9:58 PM -
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace TestListSearch
{
    public class Program
    {
        private static List<string> CreateStringList()
        {
            var result = new List<string>();
            for (int i = 0; i < 1000000; i++)
            {
                result.Add(Guid.NewGuid().ToString());
            }
            return result;
        }

        static void Main(string[] args)
        {
            Console.WriteLine("Building list");
            var list = CreateStringList();
            var findThese = new List<string>();
            Random rand = new Random();

            Console.WriteLine("Building Search Criteria");
            for (int i = 0; i < 99000; i++)
            {
                findThese.Add(list[rand.Next(0, list.Count)]);
            }

            // Add 1% not found
            for (int i = 0; i < 1000; i++)
            {
                findThese.Add(Guid.NewGuid().ToString());
            }

            DateTime time = DateTime.Now;
            Console.WriteLine("Preparing for Binary Search (sorting the list)");
            list.Sort();
            TimeSpan timeToSort = DateTime.Now.Subtract(time);
            Console.WriteLine("Binary Search prepared: {0}", timeToSort.ToString(@"mm\mss\.ff\s"));

            Console.WriteLine("Running Binary Search");
            findThese.ForEach(delegate(string s) { list.BinarySearch(s); });
            Console.WriteLine("BinarySearch (including time to Sort): {0}", DateTime.Now.Subtract(time).ToString(@"mm\mss\.ff\s"));

            Console.WriteLine();
            Console.WriteLine("[Press any key to exit]");
            Console.ReadKey();
        }
    }
}
gimme some slamming techno!!!!
Thursday, October 14, 2010 10:33 PM -
As you can see, PLINQ is by far the fastest option (on my machine) . . . The sort far overshadows the speed up unless you can pre-sort this and not re-add elements.
Well... let's restate this.

If the list is new every time, i.e.:

list = get a million items
item = find in list

then parallel is faster.

If the list is fairly static:

list = get a million items
findThese = get a thousand items
found = list where contains findThese

then binary search is orders of magnitude faster.
gimme some slamming techno!!!!Friday, October 15, 2010 10:36 AM -
If the list is fairly static:

list = get a million items
findThese = get a thousand items
found = list where contains findThese

then binary search is orders of magnitude faster.

gimme some slamming techno!!!!

Ahh, but if it's just checking whether an item is in the list, a HashSet<T> is faster than a binary search... Also, a HashSet would let you change the contents of the collection without a re-sort. (With List<T> and BinarySearch, any addition to or removal from the list, or a change to any element, requires a re-sort.)
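To make that maintenance difference concrete (a small sketch with made-up values, not code from the thread):

```csharp
using System;
using System.Collections.Generic;

class MembershipDemo
{
    static void Main()
    {
        // Sorted list + BinarySearch: every mutation needs a re-sort
        // (or a sorted insert) before the next lookup is valid.
        var list = new List<string> { "ant", "bee", "cat" };
        list.Add("aardvark");
        list.Sort(); // required after the add
        Console.WriteLine(list.BinarySearch("bee") >= 0); // True

        // HashSet: adds and removes need no maintenance step at all.
        var set = new HashSet<string>(list);
        set.Add("dog");
        set.Remove("ant");
        Console.WriteLine(set.Contains("dog")); // True
        Console.WriteLine(set.Contains("ant")); // False
    }
}
```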
Reed Copsey, Jr. - http://reedcopsey.com
Friday, October 15, 2010 3:26 PM (Moderator) -
Hi, thanks for all the interesting responses. I was in fact running it in debug mode. When I switched to release, I got new results:
DEBUG in VS RELEASE in VS RELEASE not in VS
------------------------------------------------------------------------------------------------------------
Predicate: 2 minutes 16 seconds 1 minute 14 seconds 58 seconds
Foreach: 1 minute 48 seconds 1 minute 37 seconds 1 minute 16 seconds
For i: 1 minute 30 seconds. 1 minute 13 seconds 48 seconds
PLINQ: 7 minutes 6 seconds 6 minutes 26 seconds
I guess I underestimated how well release could optimize code. I am surprised the Predicate improved that much in release under VS, running at the speed of a for i. Then I am surprised again that the for i beats the Predicate in release mode outside of VS.
In my test app I create all the words before searching.
In my real app I will be generating words on the fly, and I want to make sure there are no duplicates. So when I make up a new word, I want to search the list of words; if the word is found in the list, do not add it. If the word wasn't found, then add it to the list.
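For that generate-and-dedup use case, the search-then-add can collapse into one step, since HashSet<string>.Add reports whether the word was new (a minimal sketch, not the app's actual code):

```csharp
using System;
using System.Collections.Generic;

class DedupDemo
{
    static void Main()
    {
        var words = new HashSet<string>();

        // Add returns true when the word was not present yet and
        // false for a duplicate - one O(1) operation instead of a
        // linear search followed by an Add.
        Console.WriteLine(words.Add("zebra")); // True  (new word, stored)
        Console.WriteLine(words.Add("zebra")); // False (duplicate, ignored)
        Console.WriteLine(words.Count);        // 1
    }
}
```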
Why is the PLINQ statement running so slow?
I am using a statement that is very similar to:
value = strings.AsParallel().First(s => s == last);
- Edited by Lavagin Wednesday, October 20, 2010 4:46 PM aligned text better
Tuesday, October 19, 2010 6:36 PM -
Hi Lavagin,
Been doing some more benchmarking as well. Your times seem very large - are you running these on a Spectrum? :) How many test cases are you using?
I've found that the Contains() method on an unsorted array of GUIDs shows very fast times compared to the other options.
Here are my results. The machine is a pile of poo with less memory than a wristwatch and a CPU built by Fred Flintstone.
The For and Count methods find and count all GUIDs that begin with the letter 'a'; the Contains method looks for the last GUID in the array.
Parallel : 1000000 GUIDs created in 1363 ms
Parallel : For : GUIDs found 62086 : 128 ms
Parallel : Count : GUIDs found 62086 : 137 ms
Parallel : Contains : GUIDs found 1 : 39 ms
Sequential : For : GUIDs found 62086 : 207 ms
Sequential : Count : GUIDs found 62086 : 220 ms
Sequential : Contains : GUIDs found 1 : 25 ms
Parallel : 2000000 GUIDs created in 2916 ms
Parallel : For : GUIDs found 124207 : 229 ms
Parallel : Count : GUIDs found 124207 : 234 ms
Parallel : Contains : GUIDs found 1 : 47 ms
Sequential : For : GUIDs found 124207 : 386 ms
Sequential : Count : GUIDs found 124207 : 427 ms
Sequential : Contains : GUIDs found 1 : 49 ms
Parallel : 3000000 GUIDs created in 4085 ms
Parallel : For : GUIDs found 187538 : 348 ms
Parallel : Count : GUIDs found 187538 : 350 ms
Parallel : Contains : GUIDs found 1 : 70 ms
Sequential : For : GUIDs found 187538 : 575 ms
Sequential : Count : GUIDs found 187538 : 643 ms
Sequential : Contains : GUIDs found 1 : 74 ms

It's quite a good learning exercise, this; I hadn't looked into the new .NET 4 parallel processing stuff. Going to see what else is on offer.
How are you getting times of minutes?
…we each have more potential than we might ever presume to guess. (Blog: http://dsmyth.blogspot.com/)
Wednesday, October 20, 2010 9:43 AM -
In my test sample I have a million strings and I do 160,000 searches. The million strings and the 160,000 searches are exactly the same each time I run my test.
The CPU I have been testing on is a Core i7 860 at 2.8 GHz, which has 4 cores and, with hyper-threading, can run 8 threads at once.
Derek, could I see your code for your sequential and parallel Contains sections?
Wednesday, October 20, 2010 5:04 PM -