C# - Program Stuck

  • Question

  • Hello,

    I'm trying to make a proxy scraper, and I've managed to get everything working so far.

    Description:

    You list the URLs of websites that post free proxies in a .txt file (one per line),
    and then the program uses regex to scrape the proxies from those pages.

    Problem:

    Well, it works very well if I use a small number of sources; however, if I use a large number of sources, the program gets stuck and takes several hours to finish scraping.

    There are similar tools that can handle a much larger number of sources and still don't crash or get stuck.
    How can I do that?
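One common way to keep a form responsive with many sources is sketched below, under some assumptions. The scraping loop in this post appears to run on the UI thread and to download each page with a blocking call (X.DownloadData looks like WebClient.DownloadData), so the window cannot repaint until the loop ends, and every source waits for the previous one, including sites that never answer. The sketch swaps that for HttpClient with async/await, a request timeout, and a SemaphoreSlim that caps how many downloads run at once. DownloadAllAsync, fetchFromWeb, maxParallel, reportDone, the 15-second timeout and the limit of 10 are illustrative choices, not names or values from the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// One shared HttpClient. The timeout (15 s here, an arbitrary value) makes a
// dead site fail quickly instead of holding up the whole run.
var http = new HttpClient { Timeout = TimeSpan.FromSeconds(15) };
Func<string, Task<string>> fetchFromWeb = url => http.GetStringAsync(url);

// Downloads every URL, with at most maxParallel requests in flight at once.
// fetch performs one download; reportDone receives the number of finished sources.
// Failed sources are skipped, like the empty catch block in the question.
async Task<Dictionary<string, string>> DownloadAllAsync(
    IReadOnlyList<string> urls,
    Func<string, Task<string>> fetch,
    int maxParallel,
    Action<int> reportDone)
{
    var pages = new Dictionary<string, string>();
    var gate = new SemaphoreSlim(maxParallel);
    int done = 0;

    async Task DownloadOneAsync(string url)
    {
        await gate.WaitAsync(); // wait for a free download slot
        try
        {
            string page = await fetch(url);
            lock (pages) pages[url] = page; // downloads may finish on different threads
        }
        catch (Exception)
        {
            // Unreachable site, timeout, bad URL, ...: skip this source.
        }
        finally
        {
            gate.Release();
            reportDone(Interlocked.Increment(ref done));
        }
    }

    await Task.WhenAll(urls.Select(u => DownloadOneAsync(u)));
    return pages;
}

// Hypothetical use from an async WinForms click handler (SourceData assumed to be a List<string>):
//   var pages = await DownloadAllAsync(SourceData, fetchFromWeb, 10,
//       n => flatLabel1.Text = "Source: [ " + n + "/" + SourceData.Count + " ]");
//   then run the proxy regexes over pages.Values.
```

When DownloadAllAsync is awaited from a WinForms event handler, the code after each await resumes on the UI thread, so reportDone can update flatLabel1 directly. Adding thousands of rows to dataGridView1 one at a time may also take noticeable time, so that part may be worth timing separately.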

    I'm pretty new to C#, so I know my code is very bad.

    Code:

    Load Sources:
    
    // Needs: using System.IO; using System.Windows.Forms;
    OpenFileDialog opf = new OpenFileDialog();
    opf.Filter = "Text Files|*.txt|All Files|*.*";
    opf.Title = "Source";
    if (opf.ShowDialog() == DialogResult.OK)
    {
        // Keep every non-empty line of the file as a source URL.
        foreach (string line in File.ReadAllLines(opf.FileName))
        {
            if (!string.IsNullOrWhiteSpace(line))
                SourceData.Add(line.Trim());
        }
        // The counter is simply the number of loaded sources; no counting loop is needed.
        Sourcecounter = SourceData.Count;
        flatLabel1.Text = "Source: [ 0/" + Sourcecounter + " ]";
    }
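A small sketch of a more defensive way to read the sources file, assuming SourceData is a List<string> holding one URL per source. Blank lines, lines that are not absolute http/https URLs, and repeated URLs are dropped up front, so they are never downloaded (or silently failed on) later. ParseSources is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: turns the raw lines of the sources file into a list of
// distinct absolute http/https URLs, skipping blank or malformed lines.
List<Uri> ParseSources(IEnumerable<string> lines)
{
    var result = new List<Uri>();
    var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    foreach (string raw in lines)
    {
        string line = raw.Trim();
        if (line.Length == 0)
            continue; // blank line
        if (Uri.TryCreate(line, UriKind.Absolute, out var uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps)
            && seen.Add(uri.AbsoluteUri)) // seen.Add returns false for a repeated URL
        {
            result.Add(uri);
        }
    }
    return result;
}

// In the form this could replace the two loops (needs using System.Linq), e.g.
//   SourceData.AddRange(ParseSources(File.ReadAllLines(opf.FileName)).Select(u => u.AbsoluteUri));
var sources = ParseSources(new[] { "http://example.com/list", "", "not a url",
                                   "http://example.com/list", "https://example.org/proxies.txt" });
Console.WriteLine(sources.Count); // prints 2
```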
    
    Scraping:
    
    // Needs: using System.Text; using System.Text.RegularExpressions;
    for (int i = 1; i <= SourceData.Count; i++)
    {
        flatLabel1.Text = "Source: [ " + i + "/" + Sourcecounter + " ]";
        try
        {
            // X looks like a WebClient: DownloadData blocks until the whole page has arrived.
            byte[] data = X.DownloadData(new Uri(SourceData[i - 1].ToString()));
            // Decode the page once instead of once per pattern.
            string page = Encoding.UTF8.GetString(data);

            // "ip:port" as plain text
            MatchCollection A = Regex.Matches(page, @"\d{1,3}[.]\d{1,3}[.]\d{1,3}[.]\d{1,3}[:]\d{1,6}");
            // HTML table: ip</td><td>port
            MatchCollection B = Regex.Matches(page, @"\d{1,3}[.]\d{1,3}[.]\d{1,3}[.]\d{1,3}</td><td>\d{1,4}");
            // JSON: ip","port":"port
            MatchCollection C = Regex.Matches(page, @"\d{1,3}[.]\d{1,3}[.]\d{1,3}[.]\d{1,3}"",""port"":""\d{2,6}");
            // HTML table with a class on the port cell
            MatchCollection D = Regex.Matches(page, @"\d{1,3}[.]\d{1,3}[.]\d{1,3}[.]\d{1,3}</td><td class=""column-2"">\d{2,6}");

            // Turn every match into "ip:port" and record it.
            foreach (Match m in A) AddProxy(m.Value);
            foreach (Match m in B) AddProxy(m.Value.Replace("</td><td>", ":"));
            foreach (Match m in C) AddProxy(m.Value.Replace("\",\"port\":\"", ":"));
            foreach (Match m in D) AddProxy(m.Value.Replace("</td><td class=\"column-2\">", ":"));
        }
        catch
        {
            // Any error (unreachable site, bad URL, ...) is ignored and the loop moves on.
        }
    }

    // Local function (C# 7+): adds one "ip:port" entry to the grid, the list and the counter label.
    void AddProxy(string proxy)
    {
        string[] ip = proxy.Split(':');
        dataGridView1.Rows.Add(ProxyCounter2.Count + 1, proxy, ls.getCountry(ip[0]).getName());
        ProxyCounter2.Add(proxy);
        flatLabel4.Text = "Amount: " + ProxyCounter2.Count;
    }
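The four regexes and four nearly identical loops could be folded into one pattern with named groups. This sketch assumes the pages contain only the four layouts the original patterns target. It compiles the pattern once, uses [0-9] rather than \d (in .NET, \d also matches non-ASCII digits, which int.Parse rejects), skips octets above 255 and ports outside 1 to 65535 (the original B pattern would also cut a port such as 10000 down to 1000), and uses a HashSet so a proxy listed on several sources is counted once. ExtractProxies, proxyRegex and alreadySeen are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// One pattern for the four layouts the original regexes look for:
//   1.2.3.4:8080
//   1.2.3.4</td><td>8080
//   1.2.3.4</td><td class="column-2">8080
//   1.2.3.4","port":"8080
// The lookarounds stop a match from starting or ending inside a longer number.
var proxyRegex = new Regex(
    @"(?<![0-9.])(?<ip>[0-9]{1,3}(?:\.[0-9]{1,3}){3})" +
    @"(?:[:]|</td><td>|</td><td class=""column-2"">|"",""port"":"")" +
    @"(?<port>[0-9]{1,5})(?![0-9])",
    RegexOptions.Compiled);

// Returns each valid "ip:port" in the page once, in page order.
// alreadySeen is meant to be shared across all sources, so duplicates between sites are skipped too.
List<string> ExtractProxies(string page, HashSet<string> alreadySeen)
{
    var found = new List<string>();
    foreach (Match m in proxyRegex.Matches(page))
    {
        string ip = m.Groups["ip"].Value;
        int port = int.Parse(m.Groups["port"].Value);
        // Digit counts alone let through values such as 999.1.1.1 or port 70000.
        bool validIp = Array.TrueForAll(ip.Split('.'), part => int.Parse(part) <= 255);
        if (!validIp || port < 1 || port > 65535)
            continue;
        string proxy = ip + ":" + port;
        if (alreadySeen.Add(proxy)) // false when this proxy was already found
            found.Add(proxy);
    }
    return found;
}

var seen = new HashSet<string>();
foreach (string proxy in ExtractProxies("1.2.3.4:8080 and 5.6.7.8</td><td>3128", seen))
    Console.WriteLine(proxy); // prints 1.2.3.4:8080, then 5.6.7.8:3128
```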
    
    

    Wednesday, March 21, 2018 7:28 PM

All replies

    • Edited by DanielKirtc Wednesday, March 21, 2018 7:33 PM
    Wednesday, March 21, 2018 7:32 PM
  • Hi DanielKirtc,

    Thank you for posting here.

    Regarding your question, could you provide the URLs in the .txt file? Do you mean you want to use regex to replace something on a website? If so, please tell us what you want to replace: the original URL and the new URL you want.

    If not, please provide more information about what you want to achieve.

    Best Regards,

    Wendy


    MSDN Community Support

    Thursday, March 22, 2018 7:44 AM
    Moderator