How to make my program run on multiple threads

  • Question

  • So I'm making a web scraper, but it's kind of slow, and some people pointed out that I should make my program run on multiple threads to fix the issue. What I want to do is make the same program run on 100+ threads. Can somebody help me do it? I will share the source code if necessary.

    using System;
    using System.IO;
    using System.Linq;
    using System.Net;

    class Program
    {
        private static readonly string _outputFileAvailable = "available.txt";
        private static readonly string _outputFileUnavailable = "unavailable.txt";

        static void Main(string[] args)
        {
            Console.WriteLine("Proxies Type (HTTP/SOCKS4/SOCKS5) : ");
            string proxiesType = Console.ReadLine();
            var AvailableNum = 0;
            var UnavailableNum = 0;
            Console.Title = "Cookie name checker | by BataBo | Checked: " + (AvailableNum + UnavailableNum) + " | Good:" + AvailableNum;

            // Read the proxy list and create the random generator once, not once per username.
            var plines = File.ReadAllLines("Proxies.txt");
            var prline = new Random();

            using (StreamWriter availableWriter = File.AppendText(_outputFileAvailable))
            using (StreamWriter unavailableWriter = File.AppendText(_outputFileUnavailable))
            {
                foreach (string line in File.ReadLines("Usernames.txt"))
                {
                    string url = "https://api.mojang.com/users/profiles/minecraft/" + line;

                    // Random.Next excludes its upper bound, so pass Length to keep the last proxy selectable.
                    var pline = plines[prline.Next(0, plines.Length)];
                    WebProxy proxy = new WebProxy(proxiesType + "://" + pline + "/");

                    string content;
                    using (var client = new WebClient())
                    {
                        client.Proxy = proxy;

                        client.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11";
                        client.Headers[HttpRequestHeader.ContentType] = "application/x-www-form-urlencoded";
                        client.Headers[HttpRequestHeader.Accept] = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
                        client.Headers[HttpRequestHeader.AcceptEncoding] = "gzip,deflate,sdch";
                        client.Headers[HttpRequestHeader.AcceptLanguage] = "en-GB,en-US;q=0.8,en;q=0.6";
                        client.Headers[HttpRequestHeader.AcceptCharset] = "ISO-8859-1,utf-8;q=0.7,*;q=0.3";

                        content = ScrubContent(client.DownloadString(url));
                    }

                    // An empty response body is treated as "name available".
                    if (content == "")
                    {
                        Console.ForegroundColor = ConsoleColor.Green;
                        Console.WriteLine("Name {0} is available", line);
                        availableWriter.WriteLine(line);
                        AvailableNum++;
                    }
                    else
                    {
                        Console.ForegroundColor = ConsoleColor.Red;
                        Console.WriteLine("Name {0} is not available", line);
                        unavailableWriter.WriteLine(line);
                        UnavailableNum++;
                    }
                    Console.Title = "Cookie name checker | by BataBo | Checked: " + (AvailableNum + UnavailableNum) + " | Good:" + AvailableNum;
                }
            }

            Console.WriteLine("");
            Console.ForegroundColor = ConsoleColor.Yellow;
            Console.WriteLine("====================Results====================");
            Console.ForegroundColor = ConsoleColor.Blue;
            Console.WriteLine("Checked: {0} names", AvailableNum + UnavailableNum);
            Console.ForegroundColor = ConsoleColor.Green;
            Console.WriteLine("Available names: {0} names", AvailableNum);
            Console.ForegroundColor = ConsoleColor.Red;
            Console.WriteLine("Unavailable names: {0} names", UnavailableNum);
            Console.ForegroundColor = ConsoleColor.Yellow;
            Console.WriteLine("====================Results====================");
            Console.ReadLine();
        }

        // Removes line breaks so that a body containing only newlines compares equal to "".
        static string ScrubContent(string content)
        {
            return new string(content.Where(c => c != '\n').ToArray());
        }
    }

    • Edited by BataBo Jokviu Friday, January 11, 2019 10:32 PM Added code
    Friday, January 11, 2019 5:08 PM

All replies

  • Before deciding to run the program on multiple threads, you first have to find out what the bottleneck is that makes it "kinda slow".

    • If the bottleneck is in your network bandwidth, then you will not gain anything from launching multiple threads.
    • If the bottleneck is in the response speed of the remote computers that you are scraping (presuming you are scraping from different sites), then it may be advantageous to launch a different thread for each server.
    • If the bottleneck is in your CPU speed when processing the data that you have scraped, then making the program run on 100 threads is probably a bad idea, unless you run it on a computer with 100 cores. For CPU-bound work, don't launch more threads than the number of CPU cores.
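
    One rough way to tell these cases apart is to time the waiting and the processing separately. This is only a minimal sketch: the class BottleneckProbe and the method Measure are made-up names, and the Thread.Sleep call and the string operation are placeholders for your real download and your real processing step.

    ```csharp
    using System;
    using System.Diagnostics;
    using System.Threading;

    public static class BottleneckProbe
    {
        // Runs the given action once and returns how long it took.
        public static TimeSpan Measure(Action action)
        {
            var stopwatch = Stopwatch.StartNew();
            action();
            stopwatch.Stop();
            return stopwatch.Elapsed;
        }

        public static void Main()
        {
            // Placeholder for the real request (assumption: a slow remote server or proxy).
            TimeSpan waiting = Measure(() => Thread.Sleep(200));

            // Placeholder for the real processing of the downloaded text.
            TimeSpan processing = Measure(() => "response body".ToUpperInvariant());

            Console.WriteLine("Waiting on the network: {0} ms", waiting.TotalMilliseconds);
            Console.WriteLine("Processing the response: {0} ms", processing.TotalMilliseconds);
        }
    }
    ```

    If nearly all of the time is spent waiting while the CPU stays idle, the program is not CPU-bound and the last case in the list does not apply.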

    Depending on the case, and on how your program processes its data, the way in which you launch your multiple threads will be different. For the CPU-bound case, the simplest way is probably to use a Parallel.ForEach loop that iterates over all the documents that you want to process.
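
    A minimal sketch of such a Parallel.ForEach loop could look like the following. The class ParallelDemo and the methods CountLetters and CountAllLetters are hypothetical; counting letters only stands in for whatever CPU-heavy processing you do on each document.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading;
    using System.Threading.Tasks;

    public static class ParallelDemo
    {
        // Hypothetical CPU-bound step: counts the letters in one document.
        public static int CountLetters(string document)
        {
            return document.Count(c => char.IsLetter(c));
        }

        public static int CountAllLetters(IEnumerable<string> documents)
        {
            int total = 0;

            // Cap the parallelism at the number of cores, as suggested for CPU-bound work.
            var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };

            Parallel.ForEach(documents, options, document =>
            {
                // Several iterations run at once, so the shared total is updated atomically.
                Interlocked.Add(ref total, CountLetters(document));
            });

            return total;
        }

        public static void Main()
        {
            var documents = new[] { "first page", "second page", "third page" };
            Console.WriteLine("Letters in all documents: {0}", CountAllLetters(documents));
        }
    }
    ```

    Parallel.ForEach returns only after every iteration has finished, so the total is complete when it is read.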



    Friday, January 11, 2019 9:47 PM
    Moderator
  • Dear Alberto Poblacion,

    Well, the problem isn't my CPU. The program is very simple and only uses about 1% of the CPU.

    The bottleneck is that the proxies connect to the server very slowly. I think in such a case it would be justified to use more threads.

    Thanks in advance,

    BataBo

    Friday, January 11, 2019 10:30 PM
  • Ok, in that case it may be useful to launch several queries in parallel. Whether it works or not will depend mostly on the behavior of the proxy: if it's throttling "per request", then do send several requests in parallel to improve speed. If it's throttling "per user" then don't bother, it will not go faster if you send several requests.

    There are different ways to launch threads; for example, you could use the Task Parallel Library (TPL), or you could use the ThreadPool. The basic "Thread" class can be used like this:

    using System.Threading;
    
    Thread t = new Thread(myMethod);
    t.Start();
    
    .... // do here other things
    
    t.Join(); // this waits for the thread to finish if it hasn't already
    
    ....
    
    private void myMethod()
    {
        // this will be executed in a separate thread
    }
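
    To send several queries in parallel with this class, one option is to start a handful of threads that all take work from the same queue. The following is only a sketch under assumptions: WorkerThreadsDemo, CheckName and RunWorkers are made-up names, and CheckName just sleeps to imitate a slow proxy instead of making a real request.

    ```csharp
    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Threading;

    public static class WorkerThreadsDemo
    {
        // Hypothetical stand-in for the slow request made for one name.
        public static string CheckName(string name)
        {
            Thread.Sleep(100); // imitates waiting on a slow proxy
            return name + ": checked";
        }

        public static List<string> RunWorkers(IEnumerable<string> names, int threadCount)
        {
            // Both collections are thread-safe, so the workers need no explicit lock here.
            var pending = new ConcurrentQueue<string>(names);
            var results = new ConcurrentBag<string>();
            var workers = new List<Thread>();

            for (int i = 0; i < threadCount; i++)
            {
                var worker = new Thread(() =>
                {
                    string name;
                    // Each worker keeps taking names until the queue is empty.
                    while (pending.TryDequeue(out name))
                    {
                        results.Add(CheckName(name));
                    }
                });
                worker.Start();
                workers.Add(worker);
            }

            // Wait for every worker to finish before reading the results.
            foreach (Thread worker in workers)
            {
                worker.Join();
            }

            return new List<string>(results);
        }

        public static void Main()
        {
            var names = new[] { "alice", "bob", "carol", "dave" };
            foreach (string line in RunWorkers(names, 2))
            {
                Console.WriteLine(line);
            }
        }
    }
    ```

    The order of the results is not guaranteed, because the threads finish at different times.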


    Saturday, January 12, 2019 8:16 AM
    Moderator
  • So I put my code in "private void myMethod", right?

    In case it helps, I have uploaded my code.
    Saturday, January 12, 2019 8:44 AM
  • Yes, you put in "myMethod" the code that you want to run in a separate thread.

    Be warned that multithreaded programming is very complex and there are many things to keep in mind when doing it. A couple of them are these:

    - If the code that runs in a Thread accesses any resource that is shared by other threads (for example, a variable in memory or a file on disk), then you have to take some precautions, such as applying a lock, to avoid corrupting that resource when two threads access it at the same time.

    - If you are writing a Windows desktop application, be aware that the user interface does NOT apply such locks, so it can be corrupted if you access it from two threads. Make sure that your multithreaded code does not access the screen; if it has to, you first need to marshal execution onto the main thread.

    - This locking of shared resources, if not done properly, can lead to deadlocks or contention, so you need to know and understand what you are doing. It's not a simple matter of "give me an example and I'll copy the code".

    - If it is NOT done, it leads to what are called "race conditions", where the code appears to work most of the time, but produces an unexpected error from time to time in an unpredictable way. These are horribly difficult to debug.
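
    To illustrate the first point, here is a minimal sketch of a lock protecting a counter and an output writer that several threads share. LockDemo, Gate and RunWithLock are made-up names; in a program like the one in the question, the shared resources would be the counters and the StreamWriter objects.

    ```csharp
    using System;
    using System.IO;
    using System.Threading;

    public static class LockDemo
    {
        // One private object that every thread locks on before touching shared state.
        private static readonly object Gate = new object();
        private static int _checked;

        public static int RunWithLock(int threadCount, int incrementsPerThread, TextWriter output)
        {
            _checked = 0;
            var threads = new Thread[threadCount];

            for (int i = 0; i < threadCount; i++)
            {
                threads[i] = new Thread(() =>
                {
                    for (int j = 0; j < incrementsPerThread; j++)
                    {
                        // Only one thread at a time can be inside this block.
                        lock (Gate)
                        {
                            _checked++;
                            output.WriteLine("checked so far: {0}", _checked);
                        }
                    }
                });
                threads[i].Start();
            }

            foreach (Thread thread in threads)
            {
                thread.Join();
            }

            return _checked;
        }

        public static void Main()
        {
            // TextWriter.Null discards the lines; pass a StreamWriter to write to a file.
            int total = RunWithLock(4, 1000, TextWriter.Null);
            Console.WriteLine("Final count: {0}", total);
        }
    }
    ```

    Without the lock, two threads could read the same old value of the counter and both write back the same new value, so some increments would be lost. That is a simple example of the race conditions described above.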

    • Proposed as answer by Stanly Fan Monday, January 14, 2019 5:55 AM
    Saturday, January 12, 2019 9:41 AM
    Moderator