Web Page Download

  • Question

  • I'm looking for the fastest, most consistent way to download a web page. The test code below seems to suggest that WebClient wins, but I wonder whether there is another method, or properties that can be set on the others, that would do better. Wget for Windows takes 500 ms consistently for the same data.

    Code:

    Option Strict On
    Imports System.Net
    Imports System.IO
    Imports System.Text
    
    Public Class Form1
        Dim URLPart1 As String = "https://finance.yahoo.com/quote/"
        Dim URLPart2 As String = "/history?period1=1495177200&period2=1495177200&interval=1d&filter=history&frequency=1d"
        'HTTPDL Function copied from https://stackoverflow.com/questions/17337343/how-to-make-a-get-httpwebrequest-in-vb-net
        'Form with a Button(BtnGo) and a Rich text box (RTB)
        'hopefully the URL is OK for timezones other than mine (PDT)
    Private Function HTTPDL(URL As String) As String
        'The postData/encoding/ContentType lines in the copied Stack Overflow answer
        'belonged to a POST example; a GET sends no body, so they have been removed.
        Dim tempcookies As New CookieContainer
        Dim postreq As HttpWebRequest = DirectCast(WebRequest.Create(URL), HttpWebRequest)
        postreq.Method = "GET"
        postreq.KeepAlive = True
        postreq.CookieContainer = tempcookies
        postreq.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; ru; rv:1.9.2.3) Gecko/20100401 Firefox/4.0 (.NET CLR 3.5.30729)"
        postreq.Referer = "https://finance.yahoo.com"
        'Dispose the response and reader so the connection returns to the pool
        Using postresponse As HttpWebResponse = DirectCast(postreq.GetResponse, HttpWebResponse)
            Using postreqreader As New StreamReader(postresponse.GetResponseStream())
                Return postreqreader.ReadToEnd
            End Using
        End Using
    End Function
    
        Private Function WEBDL(ByVal url As String) As String
            Dim client As New WebClient
    
            Dim RetVal As String = "No Data"
            Try
                RetVal = client.DownloadString(url)
            Catch ex As Exception
                RetVal = ex.Message
            End Try
            Return RetVal
        End Function
    
        Private Function GetStringFromURL(ByVal url As String) As String
            'From Frank
            Dim retVal As String
            Dim sb As New System.Text.StringBuilder
            Try
                Dim request As WebRequest = WebRequest.Create(url)
                Using response As HttpWebResponse = DirectCast(request.GetResponse, HttpWebResponse)
                    Using dataStream As Stream = response.GetResponseStream
                        Using rdr As New StreamReader(dataStream)
                            sb.Append(rdr.ReadToEnd)
                        End Using
                    End Using
                End Using
                retVal = sb.ToString
            Catch ex As Exception
                retVal = ex.Message
            End Try
            Return retVal
        End Function
    
    
        Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles BtnGo.Click
            Dim sw As New Stopwatch
            Dim Tickers As New List(Of String) From {"^DJI", "AAPL", "IBM", "GM"}
            Dim TheURL As String = ""
            Dim Reply As String = ""
            RTB.Text = "Method Used" & vbTab & "Size" & vbTab & "Symb" & vbTab & "Time (ms.)" & vbNewLine
            For Each Ticker As String In Tickers
                sw.Reset()
                sw.Start()
                TheURL = URLPart1 & Ticker & URLPart2
                Reply = GetStringFromURL(TheURL)
                sw.Stop()
                RTB.AppendText("GetString" & vbTab & Reply.Length.ToString & vbTab & _
                                Ticker & vbTab & sw.ElapsedMilliseconds.ToString & vbNewLine)
            Next
            RTB.AppendText(vbNewLine)
            For Each Ticker As String In Tickers
                sw.Reset()
                sw.Start()
                TheURL = URLPart1 & Ticker & URLPart2
                Reply = WEBDL(TheURL)
                sw.Stop()
                RTB.AppendText("WebDload" & vbTab & Reply.Length.ToString & vbTab & _
                                Ticker & vbTab & sw.ElapsedMilliseconds.ToString & vbNewLine)
            Next
            RTB.AppendText(vbNewLine)
    
            For Each Ticker As String In Tickers
                sw.Reset()
                sw.Start()
                TheURL = URLPart1 & Ticker & URLPart2
                Reply = HTTPDL(TheURL)
                sw.Stop()
                RTB.AppendText("HTTPDLoad" & vbTab & Reply.Length.ToString & vbTab & _
                                Ticker & vbTab & sw.ElapsedMilliseconds.ToString & vbNewLine)
            Next
        End Sub
    End Class
    

    My results typically look like this:

    Method Used	Size	Symb	Time (ms.)
    GetString	421636	^DJI	1375
    GetString	430448	AAPL	547
    GetString	430639	IBM	822
    GetString	430369	GM	665
    
    WebDload	421648	^DJI	687
    WebDload	430419	AAPL	508
    WebDload	430619	IBM	1309
    WebDload	430360	GM	776
    
    HTTPDLoad	467519	^DJI	1063
    HTTPDLoad	476312	AAPL	788
    HTTPDLoad	470216	IBM	1891
    HTTPDLoad	473276	GM	1187


    Thursday, May 25, 2017 7:50 PM

Answers

  • Hi Devon_Nullman,

    WebClient is just a wrapper around HttpWebRequest. Using WebClient is potentially slightly (on the order of a few milliseconds) slower than using HttpWebRequest directly. But that "inefficiency" comes with huge benefits: it requires less code, is easier to use, and you're less likely to make a mistake when using it. Consider, for example, retrieving the text of a Web page using WebClient.

    var client = new WebClient();
    var text = client.DownloadString("http://example.com/page.html");

    Contrast that to HttpWebRequest:

    string text;
    var request = (HttpWebRequest)WebRequest.Create("http://example.com/page.html");
    using (var response = request.GetResponse())
    {
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            text = reader.ReadToEnd();
        }
    }

    Things get really interesting if you want to download and save to a file. With WebClient, it's a simple matter of calling DownloadFile. With HttpWebRequest, you have to write a read/write loop, etc. The number of ways you can make a mistake with HttpWebRequest is truly astounding.

    With HttpWebRequest, you'd have to duplicate the code above or wrap it in a method. But if you're going to wrap it in a method, why not just use WebClient, which already does that for you?
    When you consider that a request to a fast Web site will probably take on the order of 100 to 500 milliseconds, the few milliseconds of overhead that WebClient adds amount to at most a single-digit percentage of the total time.
    Use WebClient for simple things. Only use HttpWebRequest if you need the additional low-level control it offers. The speed difference between the two is irrelevant.
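
    For VB readers, the two C# snippets above translate roughly as follows (a sketch only; assumes Imports System.Net and System.IO, and omits error handling):

    ```vb
    'WebClient: one call does everything.
    Dim client As New WebClient
    Dim text As String = client.DownloadString("http://example.com/page.html")

    'HttpWebRequest: you manage the response and the reader yourself.
    Dim request As HttpWebRequest =
        DirectCast(WebRequest.Create("http://example.com/page.html"), HttpWebRequest)
    Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
        Using reader As New StreamReader(response.GetResponseStream())
            text = reader.ReadToEnd()
        End Using
    End Using
    ```

    The comparison is the same as in the C# version: one line versus a pair of Using blocks you must remember to write.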

    Best Regards,

    Cherry


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    • Marked as answer by Devon_Nullman Saturday, May 27, 2017 4:29 AM
    Friday, May 26, 2017 6:30 AM
    Moderator
  • However, the Async version does keep the application from freezing (mostly) while it runs. If you run the test and try to move the form, the normal parallel and serial versions lock the app during each test, while the async version does not.

    So that's what we want for a progress bar and the like.


    @tommy

    With regards to the Async, as you no doubt know, both the WebClient and WebRequest Classes have exposed Async methods since the early days of .NET. The WebClient's DownloadDataAsync, DownloadFileAsync, DownloadStringAsync and OpenReadAsync methods are particularly simple to work against, and that's how I've done this sort of thing in the past.

    However, recently I've been looking at the Async Await pattern that's available in the recent versions of .NET. That's what drew me to Devon's thread; a good opportunity to experiment, and a topic that works well with the walkthroughs linked to from https://docs.microsoft.com/en-us/dotnet/visual-basic/programming-guide/concepts/async/index#a-namebkmkrelatedtopicsa-related-topics-and-samples-visual-studio

    Anyway, for what it's worth, this is how I'd tackle Devon's problem given my current understanding of Async/Await. The code does the following:

    • Loads all the unique ticker names into a Queue(Of String).
    • Creates 4 tasks (in an array), each with its own WebClient instance.
    • Each task dequeues a ticker name, builds the required URL from the name, and then Awaits its WebClient's DownloadStringTaskAsync method.
    • The results are stored in a Dictionary(Of String,String), where the Key is the ticker name, and the Value is the data parsed from the downloaded web page.
    • Each task loops until there are no more tickers to download.
    • Progress is reported by the task by updating the Text of a Label.
    • When there are no more tickers in the Queue, then we know the process has finished.

    Much of the methodology is explained further in the walkthroughs I linked to earlier.

    The Dictionary is used to store the results because, as mentioned in previous posts, the order in which the results are returned does not match the order in which the tickers were placed in the Queue. However, each of the Dictionary's key/value pairs ties the ticker name (the Key) to its own data (the Value).

    As I understand it, all the user code here executes on the UI thread, so there is no need to use collections from the System.Collections.Concurrent namespace for thread safety, nor any need to Invoke to the UI thread when updating controls while reporting progress. I presume the WebClients do their downloading on their own threads.

    1000 tickers are processed in about 3.5 minutes, reporting progress or live results is easy, and the UI is not frozen.

    Edit: Meant to say: The Form requires a Button, a Label (with plenty of room to the right) and a (wide-ish) RichTextBox.

    Imports System.IO
    Imports System.Net
    
    Public Class Form1
    
    
        Private maxConnections As Integer = 4
        Private queuedTickers As Queue(Of String)
        Private dictTickerResults As Dictionary(Of String, String)
    
        Private URLPart1 As String = "https://finance.yahoo.com/quote/"
        Private URLPart2 As String = "/history?period1=1495177200&period2=1495177200&interval=1d&filter=history&frequency=1d"
    
    
        Private Async Sub Button1_Click(sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
            ServicePointManager.DefaultConnectionLimit = maxConnections
    
            Dim sw As New Stopwatch
            sw.Start()
    
    
            queuedTickers = New Queue(Of String)(File.ReadAllLines("Tickers.txt").Distinct)
            dictTickerResults = New Dictionary(Of String, String)
    
            RichTextBox1.Clear()
            Button1.Enabled = False   '   prevent re-entrance while Tasks run
            ReportProgress()
    
    
            ' defer the Tasks from starting until .ToArray is called
            Dim tasksQuery As IEnumerable(Of Task) = From count In Enumerable.Range(0, maxConnections)
                                                     Select FetchTickerData()
            Await Task.WhenAll(tasksQuery.ToArray)
    
            Button1.Enabled = True
    
    
            '   statistics and results
            sw.Stop()
            MsgBox($"{dictTickerResults.Count} tickers fetched in {sw.ElapsedMilliseconds:N0} ms")
    
            For Each kvp As KeyValuePair(Of String, String) In dictTickerResults
                Dim tickerName As String = kvp.Key
                Dim tickerData As String = kvp.Value
                RichTextBox1.AppendText(tickerName & " " & vbTab & " " & tickerData & vbNewLine)
            Next
            RichTextBox1.AppendText($"{vbNewLine}")
            RichTextBox1.AppendText($"{dictTickerResults.Count} tickers fetched in {sw.Elapsed:mm'mins, 'ss\.ff'secs'}")
            RichTextBox1.AppendText($"{vbNewLine}")
    
        End Sub
    
        Private Async Function FetchTickerData() As Task
            Using client As New WebClient
                client.Headers.Add("User-Agent", "WebClientZilla 5.0")
                client.Encoding = System.Text.Encoding.UTF8
    
                Do While queuedTickers.Count > 0
                    Dim thisTicker As String = queuedTickers.Dequeue
    
                    Try
                        Dim pageContents As String = Await client.DownloadStringTaskAsync(URLfromTicker(thisTicker))
                        dictTickerResults(thisTicker) = ParsePageContents(pageContents)
                    Catch ex As Exception
                        dictTickerResults(thisTicker) = $"No Data : ERROR: {ex.Message}"
                    End Try
    
                    ReportProgress()
                Loop
    
            End Using
        End Function
    
        Private Sub ReportProgress()
        Label1.Text = $"{queuedTickers.Count} tickers waiting to download; {dictTickerResults.Count} tickers processed"
        End Sub
    
    
        Private Function URLfromTicker(tickerName As String) As String
            Return String.Concat(URLPart1, tickerName, URLPart2)
        End Function
    
        Private Function ParsePageContents(pageContents As String) As String
    
            Dim data As String = GetTextBetween(pageContents, """prices"":[{", "}")
            If String.IsNullOrEmpty(data) Then
                data = "No Data"
            End If
    
            Return data
        End Function
    
    
        Private Function GetTextBetween(input As String, startDelimiter As String, endDelimiter As String) As String
            Dim result As String = Nothing
    
            If Not String.IsNullOrWhiteSpace(input) Then
                Dim index1 As Integer = input.IndexOf(startDelimiter)
                If index1 <> -1 Then
                    Dim index2 As Integer = input.IndexOf(endDelimiter, index1 + 1)
                    If index2 <> -1 Then
                        Dim resultLength As Integer = index2 - index1 - startDelimiter.Length
                        result = input.Substring(index1 + startDelimiter.Length, resultLength)
                    End If
                End If
            End If
    
            Return result
        End Function
    
    End Class



    • Edited by S P C Monday, May 29, 2017 4:15 PM requisites
    • Marked as answer by Devon_Nullman Monday, May 29, 2017 7:13 PM
    Monday, May 29, 2017 4:06 PM
  • If the difference between the methods is less than a second I'm not sure that one will be better than another.  Loading web pages is a function of sending the request, the web server processing the request and finally receiving the data.  A difference of a second can easily be explained away in the above processing.


    Lloyd Sheen

    What I was wondering is: is there another method that might be faster, more consistent, or both? Or are there additional settings I could use to accomplish the same goal with the methods in the example code I posted? Wget, I think, uses sockets, somehow knows which port to connect to, and performs an internal nslookup to get the proper IP address. Not something I want to tackle.

    I don't know what you mean by "A difference of a second can easily be explained away in the above processing." It's the same website every time; just the stock ticker changes.


    I did this test for fun; it gets the page every two seconds, four times in a loop for each method. You can see the results are a bit random, so unless one method is something like 20 percent faster, it's irrelevant which method you use.

    I also tried using Async, though I'm not really sure why.

    Option Strict On
    
    Imports System.IO
    Imports System.Net
    
    Public Class Form5
        Private WithEvents GoButton As New Button With {.Parent = Me, .Text = "Go",
            .Location = New Point(100, 20)}
    Private Label1 As New Label With {.Parent = Me, .Location = New Point(30, 70),
        .AutoSize = True, .Text = "Click Go to Start", .Font = New Font("arial", 10, FontStyle.Bold)}
    'Timer created in code, since this form builds its controls without the designer
    Private WithEvents Timer1 As New Timer
    
        Private siteURL, totaltotal1, totaltotal2 As String
        Private SW As New Stopwatch
        Private testnumber As Integer
        Private totalTime1, totaltime2 As Single
    
        Private Sub Form5_Load(sender As Object, e As EventArgs) Handles MyBase.Load
            Dim URLPart1 As String = "https://finance.yahoo.com/quote/"
            Dim URLPart2 As String = "/history?period1=1495177200&period2=1495177200&interval=1d&filter=history&frequency=1d"
     
            siteURL = URLPart1 & "^DJI" & URLPart2
            Timer1.Interval = 2000
    
        End Sub
    
        Private Async Function GetTextAsync() As Task(Of String)
            Using client As New System.Net.WebClient
    
                Using stream = Await client.OpenReadTaskAsync(siteURL)
                    Dim sr As StreamReader = New StreamReader(stream)
                    Dim t As String = sr.ReadToEnd
                    Return t
                End Using
            End Using
        End Function
    
        Public Function GetTextFromUrl(ByVal thisUrl As String) As String
            Dim Webreq As WebRequest = WebRequest.Create(thisUrl)
            Dim Webresp As WebResponse = Webreq.GetResponse
            Dim Webstr As Stream = Webresp.GetResponseStream
            Dim sr As StreamReader = New StreamReader(Webstr)
            Dim t As String = sr.ReadToEnd
            sr.Close()
            Webstr.Close()
            Return t
        End Function
    
        Private Async Sub Timer1_Tick(sender As Object, e As EventArgs) Handles Timer1.Tick
            Dim pageText, msg As String
            Dim max As Integer = 4
    
            If testnumber > 2 * max Then
                Timer1.Stop()
                Label1.Text =
                    "Web Request" & vbLf & totaltotal1 &
                    "   Avg: " & (totalTime1 / max).ToString & " ms" & vbLf & vbLf &
                    "Async" & vbLf & totaltotal2 &
                    "   Avg: " & (totaltime2 / max).ToString & " ms" & vbLf
                testnumber = 0
            Else
                SW.Reset()
                SW.Start()
                Select Case testnumber
                    Case 0
                        totalTime1 = 0
                        totaltime2 = 0
                        totaltotal1 = ""
                        totaltotal2 = ""
    
                    Case <= max
                        pageText = GetTextFromUrl(siteURL)
                        msg = "Web Request "
                        SW.Stop()
                        totalTime1 += SW.ElapsedMilliseconds
                        totaltotal1 &= SW.ElapsedMilliseconds.ToString & vbLf
    
                        Label1.Text = "Web Request " & testnumber.ToString & "   Total Time: " & totalTime1.ToString
                    Case <= 2 * max
                        pageText = Await GetTextAsync()
                        msg = "Async "
                        SW.Stop()
                        totaltime2 += SW.ElapsedMilliseconds
                        totaltotal2 &= SW.ElapsedMilliseconds.ToString & vbLf
                        Label1.Text = "Async " & testnumber.ToString & "   Total Time: " & totaltime2.ToString
                End Select
                testnumber += 1
    
            End If
    
        End Sub
    
        Private Sub GoButton_Click(sender As Object, e As EventArgs) Handles GoButton.Click
            Label1.Text = "Starting Test..."
            Timer1.Start()
    
        End Sub
    End Class


    Friday, May 26, 2017 5:36 AM

All replies

  • If the difference between the methods is less than a second I'm not sure that one will be better than another.  Loading web pages is a function of sending the request, the web server processing the request and finally receiving the data.  A difference of a second can easily be explained away in the above processing.

    Lloyd Sheen

    Thursday, May 25, 2017 9:44 PM
  • If the difference between the methods is less than a second I'm not sure that one will be better than another.  Loading web pages is a function of sending the request, the web server processing the request and finally receiving the data.  A difference of a second can easily be explained away in the above processing.

    Lloyd Sheen

    What I was wondering is: is there another method that might be faster, more consistent, or both? Or are there additional settings I could use to accomplish the same goal with the methods in the example code I posted? Wget, I think, uses sockets, somehow knows which port to connect to, and performs an internal nslookup to get the proper IP address. Not something I want to tackle.

    I don't know what you mean by "A difference of a second can easily be explained away in the above processing." It's the same website every time; just the stock ticker changes.

    Friday, May 26, 2017 2:21 AM
  • Devon,

    I agree with the text (I would have written almost the same) from Cherry; a pity that she shows C# rather than VB code in this forum.


    Success
    Cor


    Friday, May 26, 2017 9:37 AM
  • Devon,

    I agree with the text (I would have written almost the same) from Cherry; a pity that she shows C# rather than VB code in this forum.


    Success
    Cor



    :-)

    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Friday, May 26, 2017 11:53 AM
  • WebClient is just a wrapper around HttpWebRequest.

    I don't think that's correct. If it were, then it wouldn't handle FTP, which it does.

    Do you maybe mean WebRequest?


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Friday, May 26, 2017 12:01 PM
  • ...

    I don't know what you mean by "A difference of a second can easily be explained away in the above processing." - it's the same website every time, just the stock ticker changes.

    It means there are a LOT of intermediate factors which are beyond the control of your application.

    Just run a continuous PING of the site and see how the times vary.  Network load on every hop between you and the server can impact your response time.  The current load of the server can affect response time.  In short, you can't get a reliable test using a public webserver like this.

    You need to set up your own web server as close to home as possible (ideally on the same local network).  Your program should be the only thing hitting the server and your desktop and server should be the only active devices on the network segment.  Only by eliminating all intermediate potential hindrances can you determine which block of code performs the download the quickest - if there is truly any measurable difference at all.
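
    For the continuous ping, you don't even need to leave .NET; a minimal sketch with the System.Net.NetworkInformation.Ping class (the host name, sample count, and interval are arbitrary):

    ```vb
    Imports System.Net.NetworkInformation

    Module PingSample
        Sub Main()
            Using p As New Ping
                For i As Integer = 1 To 20
                    'Send a synchronous ICMP echo and report the round-trip time
                    Dim reply As PingReply = p.Send("finance.yahoo.com")
                    If reply.Status = IPStatus.Success Then
                        Console.WriteLine(reply.RoundtripTime & " ms")
                    Else
                        Console.WriteLine(reply.Status.ToString)
                    End If
                    System.Threading.Thread.Sleep(1000) 'one sample per second
                Next
            End Using
        End Sub
    End Module
    ```

    Running this for a few minutes shows the same spread of round-trip times that a command-line ping does, which is the variability being discussed here.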


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    Friday, May 26, 2017 1:21 PM
    Moderator
  • Devon,

    I agree with the text (I would have written almost the same) from Cherry; a pity that she shows C# rather than VB code in this forum.


    Success
    Cor


    @Cor:

    You know what to click on that post.  ;)


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    Friday, May 26, 2017 1:24 PM
    Moderator
  • Here are the results from pinging "finance.yahoo.com" a few hundred times:

    ms   hits
    16   20
    17   33
    18   31
    19   36
    20   27
    21   11
    22   10
    23   9
    24   6
    25   12
    26   3
    27   6
    28   5
    67   2
    30   2
    33   2
    33   2
    35   1
    64   1
    67   2
    67   2
    77   1
    89   1
    117  1
    I'm sure there is more to it than just a ping, lots of server database lookups, etc.
    As it is, it takes just under an hour to update 4000 + files and I guess I am not going to get any significant speed increase.
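
    For anyone who wants to reproduce a table like this from code rather than from the command line, here is a minimal sketch using System.Net.NetworkInformation.Ping - the host name and the sample count are just placeholders:

    ```vb
    Imports System.Collections.Generic
    Imports System.Linq
    Imports System.Net.NetworkInformation

    Module PingHistogram
        Sub Main()
            Dim counts As New Dictionary(Of Long, Integer)

            Using pinger As New Ping
                For i As Integer = 1 To 100 ' sample count - adjust as needed
                    Dim reply As PingReply = pinger.Send("finance.yahoo.com")
                    If reply.Status = IPStatus.Success Then
                        If counts.ContainsKey(reply.RoundtripTime) Then
                            counts(reply.RoundtripTime) += 1
                        Else
                            counts(reply.RoundtripTime) = 1
                        End If
                    End If
                Next
            End Using

            ' ms / hits, sorted by round-trip time
            Console.WriteLine("ms   hits")
            For Each kvp In counts.OrderBy(Function(p) p.Key)
                Console.WriteLine("{0,-5}{1}", kvp.Key, kvp.Value)
            Next
        End Sub
    End Module
    ```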


    • Edited by Devon_Nullman Saturday, May 27, 2017 4:27 AM increase
    Saturday, May 27, 2017 4:22 AM
  • Here are the results from pinging "finance.yahoo.com" a few hundred times:

    ms   hits
    16   20
    17   33
    18   31
    19   36
    20   27
    21   11
    22   10
    23   9
    24   6
    25   12
    26   3
    27   6
    28   5
    67   2
    30   2
    33   2
    33   2
    35   1
    64   1
    67   2
    67   2
    77   1
    89   1
    117  1
    I'm sure there is more to it than just a ping, lots of server database lookups, etc.
    As it is, it takes just under an hour to update 4000 + files and I guess I am not going to get any significant speed increase.


    Sure there are those other factors, but just look at how much the transmission time affects things and how much it can vary.

    This list of ping times kinda proves that sqlguy had the correct answer (and the first answer).  He should really have gotten the credit.


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    Saturday, May 27, 2017 12:18 PM
    Moderator
  • I'm sure there is more to it than just a ping, lots of server database lookups, etc.
    As it is, it takes just under an hour to update 4000 + files and I guess I am not going to get any significant speed increase.


    So use 4 or so WebClients in parallel and you should get that down to 15 minutes, give or take. At least until Yahoo decides it has to put a limit on the number of simultaneous connections people can make.
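
    One thing to check before trying that: by default, .NET caps the number of simultaneous HTTP connections to a single host (historically 2 for client apps), so the parallel clients may end up queuing anyway. A minimal sketch, with placeholder URLs, that raises the limit and downloads in parallel:

    ```vb
    Imports System.Collections.Concurrent
    Imports System.Net
    Imports System.Threading.Tasks

    Module FourClients
        Sub Main()
            ' Without this, .NET may open only 2 simultaneous connections to the same host
            ServicePointManager.DefaultConnectionLimit = 4

            Dim urls() As String = {"https://example.com/a", "https://example.com/b",
                                    "https://example.com/c", "https://example.com/d"}

            Dim results As New ConcurrentBag(Of String) ' thread-safe for concurrent Add

            Parallel.ForEach(urls,
                             Sub(url)
                                 Using wc As New WebClient
                                     results.Add(wc.DownloadString(url))
                                 End Using
                             End Sub)

            Console.WriteLine(results.Count & " pages downloaded")
        End Sub
    End Module
    ```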
    Saturday, May 27, 2017 2:11 PM
  • I'm sure there is more to it than just a ping, lots of server database lookups, etc.
    As it is, it takes just under an hour to update 4000 + files and I guess I am not going to get any significant speed increase.


    So use 4 or so WebClients in parallel and you should get that down to 15 minutes, give or take. At least until Yahoo decides it has to put a limit on the number of simultaneous connections people can make.

    SPC,

    That is a good idea.

    How would one do that exactly?

    Saturday, May 27, 2017 2:54 PM
  • I'm sure there is more to it than just a ping, lots of server database lookups, etc.
    As it is, it takes just under an hour to update 4000 + files and I guess I am not going to get any significant speed increase.


    So use 4 or so WebClients in parallel and you should get that down to 15 minutes, give or take. At least until Yahoo decides it has to put a limit on the number of simultaneous connections people can make.

    SPC,

    That is a good idea.

    How would one do that exactly?


    The easiest thing is probably to refactor the code to use a Parallel.For (or ForEach) over the collection of "tickers".

    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    Saturday, May 27, 2017 3:07 PM
    Moderator
  • I'm sure there is more to it than just a ping, lots of server database lookups, etc.
    As it is, it takes just under an hour to update 4000 + files and I guess I am not going to get any significant speed increase.


    So use 4 or so WebClients in parallel and you should get that down to 15 minutes, give or take. At least until Yahoo decides it has to put a limit on the number of simultaneous connections people can make.

    SPC,

    That is a good idea.

    How would one do that exactly?


    The easiest thing is probably to refactor the code to use a Parallel.For (or ForEach) over the collection of "tickers".

    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    As SPC mentions, maybe they limit how fast they respond to the same address?

    I mean, I was thinking you could just run the 4000 stock symbols through a loop, wait 0.8 secs between requests, and you get 4000 back. Assuming the sender could receive it, I started thinking that the source (Yahoo) might not send it that fast to the same address???

    But probably four at once would work?

    I will test it a bit later with a for loop if no one else has...

    Saturday, May 27, 2017 4:06 PM
  • Maybe, maybe not.  And who knows what the threshold might be.  Start with the defaults on Parallel.For and then if you run into issues, use advanced options to tweak the execution so that the number of concurrent threads is limited.
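
    The "advanced options" Reed mentions are ParallelOptions; a minimal sketch capping the number of concurrent bodies (the limit of 4 here is just an example):

    ```vb
    Imports System.Threading.Tasks

    Module ThrottledLoop
        Sub Main()
            Dim options As New ParallelOptions With {.MaxDegreeOfParallelism = 4}

            ' At most 4 loop bodies run at the same time; the rest wait their turn
            Parallel.For(0, 20, options,
                         Sub(i)
                             Console.WriteLine("processing item " & i)
                         End Sub)
        End Sub
    End Module
    ```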

    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    Saturday, May 27, 2017 4:20 PM
    Moderator
  • Maybe, maybe not.  And who knows what the threshold might be.  Start with the defaults on Parallel.For and then if you run into issues, use advanced options to tweak the execution so that the number of concurrent threads is limited.

    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"


    Well I don't know Parallel.For but may attempt it in a while.

    Since you are here, Reed, what do you think of the Async routine I showed in my example?

    You know, I don't know Async beyond using it for a timer as you showed us. However, I was wondering if I could call that Async function in my example test here 4 times with 4 different symbols from a basic For loop, and that would await four different responses? Or is that what the Parallel.For is for?

    Isn't helping download files from the internet one use for Async? So is it as easy as calling a routine four times, or does one have to do more work to have 4 independent threads awaiting a response?
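
    For what it's worth, the Async route asked about above can look like the sketch below: start one download per symbol, then await them all together with Task.WhenAll. The URL pattern is a placeholder and this is only a sketch, not a drop-in replacement for the code in this thread:

    ```vb
    Imports System.Collections.Generic
    Imports System.Net
    Imports System.Threading.Tasks

    Module AsyncQuotes
        Async Function GetPagesAsync(symbols As IEnumerable(Of String)) As Task(Of String())
            Dim tasks As New List(Of Task(Of String))

            For Each symbol As String In symbols
                Dim wc As New WebClient
                ' Each call starts the request immediately; nothing blocks here
                tasks.Add(wc.DownloadStringTaskAsync("https://finance.yahoo.com/quote/" & symbol))
            Next

            ' All requests are now in flight; await them as a group
            Return Await Task.WhenAll(tasks)
        End Function

        Sub Main()
            Dim pages() As String = GetPagesAsync({"^DJI", "AAPL", "IBM", "GM"}).Result
            Console.WriteLine(pages.Length & " responses received")
        End Sub
    End Module
    ```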

    Saturday, May 27, 2017 4:39 PM
  • Maybe, maybe not.  And who knows what the threshold might be.  Start with the defaults on Parallel.For and then if you run into issues, use advanced options to tweak the execution so that the number of concurrent threads is limited.

    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"


    Well I don't know Parallel.For but may attempt it in a while.

    Since you are here, Reed, what do you think of the Async routine I showed in my example?

    You know, I don't know Async beyond using it for a timer as you showed us. However, I was wondering if I could call that Async function in my example test here 4 times with 4 different symbols from a basic For loop, and that would await four different responses? Or is that what the Parallel.For is for?

    Isn't helping download files from the internet one use for Async? So is it as easy as calling a routine four times, or does one have to do more work to have 4 independent threads awaiting a response?

    I don't know what I'm doing (in a big way) but ... it's interesting:

    Option Strict On
    Option Explicit On
    Option Infer Off
    
    Imports System.Net
    Imports System.IO
    Imports System.Threading.Tasks
    
    Public Class Form1
        Private Sub _
            Form1_Load(ByVal sender As System.Object, _
                       ByVal e As System.EventArgs) _
                       Handles MyBase.Load
    
            Dim results() As String = RunTest()
    
            Stop
    
        End Sub
    
        Private Function RunTest() As String()
    
            Dim urlList As New List(Of String) From _
                {"http://www.msn.com/", "http://www.cnn.com/", _
                 "https://www.yahoo.com/news/", "https://news.google.com/"}
    
            Dim resultList As New List(Of String)
    
            Parallel.ForEach(urlList, _
                             Sub(currentURL) resultList.Add(GetStringFromURL(currentURL)))
    
            Return resultList.ToArray
    
        End Function
    
        Private Function GetStringFromURL(ByVal url As String) As String
    
            Dim retVal As String
            Dim sb As New System.Text.StringBuilder
    
            Try
                Dim request As WebRequest = WebRequest.Create(url)
                Using response As HttpWebResponse = DirectCast(request.GetResponse, HttpWebResponse)
                    Using dataStream As Stream = response.GetResponseStream
                        Using rdr As New StreamReader(dataStream)
                            sb.Append(rdr.ReadToEnd)
                        End Using
                    End Using
                End Using
                retVal = sb.ToString
            Catch ex As Exception
                retVal = ex.Message
            End Try
    
            Return retVal
    
        End Function
    End Class
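
    One caveat for anyone copying the block above: List(Of String).Add is not thread-safe, so concurrent Adds inside Parallel.ForEach can occasionally lose entries or corrupt the list. A hedged alternative (order is not preserved in either version) is a ConcurrentBag; the download call is stubbed out here:

    ```vb
    Imports System.Collections.Concurrent
    Imports System.Threading.Tasks

    Module SafeCollect
        Sub Main()
            Dim urlList() As String = {"http://www.msn.com/", "http://www.cnn.com/",
                                       "https://www.yahoo.com/news/", "https://news.google.com/"}

            Dim resultList As New ConcurrentBag(Of String) ' safe for concurrent Add

            ' Substitute GetStringFromURL (or similar) for the placeholder string
            Parallel.ForEach(urlList,
                             Sub(currentURL) resultList.Add("downloaded: " & currentURL))

            Console.WriteLine(resultList.Count)
        End Sub
    End Module
    ```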


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Saturday, May 27, 2017 4:46 PM
  • Plus,

    In the WUG API they let you stack calls, i.e. you call with four locations in one API request, whatever it's called. And that counts as one request even though you requested 4 cities at once. You get 500 free a day.

    And then... I forget exactly how the info comes back, and I never timed any of it.

    But my point is that maybe requests can be stacked so you request 4000 stock symbols in one request. Maybe Yahoo already has this in its API.

    Saturday, May 27, 2017 4:57 PM
  • Plus,

    In the WUG API they let you stack calls, i.e. you call with four locations in one API request, whatever it's called. And that counts as one request even though you requested 4 cities at once. You get 500 free a day.

    And then... I forget exactly how the info comes back, and I never timed any of it.

    But my point is that maybe requests can be stacked so you request 4000 stock symbols in one request. Maybe Yahoo already has this in its API.

    Tommy,

    I haven't been an active part of this (I always worry about terms of use with things like this) but would you try this?

    Option Strict On
    Option Explicit On
    Option Infer Off
    
    Imports System.Net
    Imports System.IO
    Imports System.Threading.Tasks
    
    Public Class Form1
        Private Sub _
            Form1_Load(ByVal sender As System.Object, _
                       ByVal e As System.EventArgs) _
                       Handles MyBase.Load
    
            Const URLPart1 As String = "https://finance.yahoo.com/quote/"
            Const URLPart2 As String = "/history?period1=1495177200&period2=1495177200&interval=1d&filter=history&frequency=1d"
    
            Dim Tickers As New List(Of String) From {"^DJI", "AAPL", "IBM", "GM"}
            Dim urls As New List(Of String)
    
            For Each ticker As String In Tickers
                urls.Add(URLPart1 & ticker & URLPart2)
            Next
    
            Dim sw As New Stopwatch
            sw.Start()
    
            Dim results() As String = GetYahooData(urls.ToArray)
    
            sw.Stop()
    
            Stop
    
        End Sub
    
        Private Function GetYahooData(ByVal urls() As String) As String()
    
            Dim retVal() As String = Nothing
            Dim resultList As New List(Of String)
    
            Parallel.ForEach(urls, _
                             Sub(currentURL) resultList.Add(GetStringFromURL(currentURL)))
    
            If resultList.Count > 0 Then
                retVal = resultList.ToArray
            End If
    
            Return retVal
    
        End Function
    
        'Private Function RunTest() As String()
    
        '    Dim urlList As New List(Of String) From _
        '        {"http://www.msn.com/", "http://www.cnn.com/", _
        '         "https://www.yahoo.com/news/", "https://news.google.com/"}
    
        '    Dim resultList As New List(Of String)
    
        '    Parallel.ForEach(urlList, _
        '                     Sub(currentURL) resultList.Add(GetStringFromURL(currentURL)))
    
        '    Return resultList.ToArray
    
        'End Function
    
        Private Function GetStringFromURL(ByVal url As String) As String
    
            Dim retVal As String
            Dim sb As New System.Text.StringBuilder
    
            Try
                Dim request As WebRequest = WebRequest.Create(url)
                Using response As HttpWebResponse = DirectCast(request.GetResponse, HttpWebResponse)
                    Using dataStream As Stream = response.GetResponseStream
                        Using rdr As New StreamReader(dataStream)
                            sb.Append(rdr.ReadToEnd)
                        End Using
                    End Using
                End Using
                retVal = sb.ToString
            Catch ex As Exception
                retVal = ex.Message
            End Try
    
            Return retVal
    
        End Function
    End Class


    It's getting *something* but is it the right return data?

    Like I said, it's interesting if nothing else.


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Saturday, May 27, 2017 5:24 PM

  • Did it work?

    "If at first you do succeed, try not to look surprised".


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Saturday, May 27, 2017 5:39 PM
  • Maybe, maybe not.  And who knows what the threshold might be.  Start with the defaults on Parallel.For and then if you run into issues, use advanced options to tweak the execution so that the number of concurrent threads is limited.

    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"


    Well I don't know Parallel.For but may attempt it in a while.

    Since you are here, Reed, what do you think of the Async routine I showed in my example?

    You know, I don't know Async beyond using it for a timer as you showed us. However, I was wondering if I could call that Async function in my example test here 4 times with 4 different symbols from a basic For loop, and that would await four different responses? Or is that what the Parallel.For is for?

    Isn't helping download files from the internet one use for Async? So is it as easy as calling a routine four times, or does one have to do more work to have 4 independent threads awaiting a response?

    I don't know what I'm doing (in a big way) but ... it's interesting:

    Option Strict On
    Option Explicit On
    Option Infer Off
    
    Imports System.Net
    Imports System.IO
    Imports System.Threading.Tasks
    
    Public Class Form1
        Private Sub _
            Form1_Load(ByVal sender As System.Object, _
                       ByVal e As System.EventArgs) _
                       Handles MyBase.Load
    
            Dim results() As String = RunTest()
    
            Stop
    
        End Sub
    
        Private Function RunTest() As String()
    
            Dim urlList As New List(Of String) From _
                {"http://www.msn.com/", "http://www.cnn.com/", _
                 "https://www.yahoo.com/news/", "https://news.google.com/"}
    
            Dim resultList As New List(Of String)
    
            Parallel.ForEach(urlList, _
                             Sub(currentURL) resultList.Add(GetStringFromURL(currentURL)))
    
            Return resultList.ToArray
    
        End Function
    
        Private Function GetStringFromURL(ByVal url As String) As String
    
            Dim retVal As String
            Dim sb As New System.Text.StringBuilder
    
            Try
                Dim request As WebRequest = WebRequest.Create(url)
                Using response As HttpWebResponse = DirectCast(request.GetResponse, HttpWebResponse)
                    Using dataStream As Stream = response.GetResponseStream
                        Using rdr As New StreamReader(dataStream)
                            sb.Append(rdr.ReadToEnd)
                        End Using
                    End Using
                End Using
                retVal = sb.ToString
            Catch ex As Exception
                retVal = ex.Message
            End Try
    
            Return retVal
    
        End Function
    End Class


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Cool Frank,

    Looks like a 40 percent increase?

    This is the same example I posted before with your routine(s).

    Still looking for errors....

    PS I just saw your last post example 2 and will look...

    'web page open test v2 parallel vs serial web request
    'yahoo stock price
    Option Strict On
    
    Imports System.IO
    Imports System.Net
    Public Class Form2
        Private WithEvents Timer1 As New Timer With {.Interval = 5000}
        Private WithEvents GoButton As New Button With {.Parent = Me, .Text = "Go",
        .Location = New Point(100, 20)}
        Private Label1 As New Label With {.Parent = Me, .Location = New Point(30, 70),
            .AutoSize = True, .Text = "Click Go to Start", .Font = New Font("arial", 10, FontStyle.Bold)}
    
        Private siteURL, totaltotal1, totaltotal2 As String
        Private SW As New Stopwatch
        Private testnumber As Integer
        Private totalTime1, totaltime2 As Single
        Private UrlList As New List(Of String)
    
        Private Sub Form5_Load(sender As Object, e As EventArgs) Handles MyBase.Load
            ClientSize = New Size(400, 400)
            Dim URLPart1 As String = "https://finance.yahoo.com/quote/"
            Dim URLPart2 As String = "/history?period1=1495177200&period2=1495177200&interval=1d&filter=history&frequency=1d"
    
            siteURL = URLPart1 & "^DJI" & URLPart2
    
            Dim Tickers As New List(Of String) From {"^DJI", "AAPL", "IBM", "GM"}
            For Each Ticker As String In Tickers
                UrlList.Add(URLPart1 & Ticker & URLPart2)
            Next
    
        End Sub
    
    
        Private Sub Timer1_Tick(sender As Object, e As EventArgs) Handles Timer1.Tick
            Dim max As Integer = 4
    
            If testnumber > 2 * max Then
                Timer1.Stop()
                Label1.Text =
                    "Parallel Requests (4 stocks ea)" & vbLf & totaltotal1 &
                    "   Avg/Request: " & (totalTime1 / (4 * max)).ToString & " ms" & vbLf & vbLf &
                    "Serial Request" & vbLf & totaltotal2 &
                    "   Avg/Request: " & (totaltime2 / (4 * max)).ToString & " ms" & vbLf
                testnumber = 0
            Else
                SW.Reset()
                SW.Start()
                Select Case testnumber
                    Case 0
                        totalTime1 = 0
                        totaltime2 = 0
                        totaltotal1 = ""
                        totaltotal2 = ""
                        SW.Stop()
                    Case <= max
    
                        Dim resultList As New List(Of String)
    
                        Parallel.ForEach(UrlList,
                             Sub(currentURL) resultList.Add(GetStringFromURL(currentURL)))
    
                        resultList.ToArray()
    
                        SW.Stop()
                        totalTime1 += SW.ElapsedMilliseconds
                        totaltotal1 &= SW.ElapsedMilliseconds.ToString & vbLf
    
                        Label1.Text = "Parallel Requests " & testnumber.ToString & "   Total Time: " & totalTime1.ToString
                    Case <= 2 * max
                        Dim result As String = ""
    
                        For Each thisUrl In UrlList
                            result &= GetStringFromURL(thisUrl)
    
                        Next
    
                        SW.Stop()
                        totaltime2 += SW.ElapsedMilliseconds
                        totaltotal2 &= SW.ElapsedMilliseconds.ToString & vbLf
                        Label1.Text = "Serial Requests " & testnumber.ToString & "   Total Time: " & totaltime2.ToString
                End Select
                testnumber += 1
    
            End If
    
        End Sub
    
        Private Sub GoButton_Click(sender As Object, e As EventArgs) Handles GoButton.Click
            Label1.Text = "Starting Test..."
            Timer1.Start()
    
        End Sub
    
        Private Function GetStringFromURL(ByVal url As String) As String
    
            Dim retVal As String
            Dim sb As New System.Text.StringBuilder
    
            Try
                Dim request As WebRequest = WebRequest.Create(url)
                Using response As HttpWebResponse = DirectCast(request.GetResponse, HttpWebResponse)
                    Using dataStream As Stream = response.GetResponseStream
                        Using rdr As New StreamReader(dataStream)
                            sb.Append(rdr.ReadToEnd)
                        End Using
                    End Using
                End Using
                retVal = sb.ToString
            Catch ex As Exception
                retVal = ex.Message
            End Try
    
            Return retVal
    
        End Function
    
    End Class

    Saturday, May 27, 2017 5:58 PM
  • Tommy,

    Since I don't know what I'm looking at (I don't know anything about stocks), I'll wait to see if you can validate the data as being correct, but it is encouraging.

    There's bound to be a limit somewhere, though. How can they leave it wide open this long?


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Saturday, May 27, 2017 6:04 PM
  • It's getting *something* but is it the right return data?

    Like I said, it's interesting if nothing else.


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    I added a routine to get the important data out, if it is there. It's fast and looks perfect; I just have to figure out how to apply this to getting 4000 or more items while showing progress without freezing the UI.

    Option Strict On
    Option Explicit On
    Option Infer Off
    
    Imports System.Text.RegularExpressions
    Imports System.Net
    Imports System.IO
    Imports System.Threading.Tasks
    
    Public Class Form1
        Private Sub _
            Form1_Load(ByVal sender As System.Object, _
                       ByVal e As System.EventArgs) _
                       Handles MyBase.Load
    
            Const URLPart1 As String = "https://finance.yahoo.com/quote/"
            Const URLPart2 As String = "/history?period1=1495177200&period2=1495177200&interval=1d&filter=history&frequency=1d"
    
            Dim Tickers As New List(Of String) From {"^DJI", "AAPL", "IBM", "GM"}
            Dim urls As New List(Of String)
    
            For Each ticker As String In Tickers
                urls.Add(URLPart1 & ticker & URLPart2)
            Next
    
            Dim sw As New Stopwatch
            sw.Start()
    
            Dim results() As String = GetYahooData(urls.ToArray)
            Dim Lengths As New List(Of String)
            For Each s As String In results
                Lengths.Add(TrimData(s))
            Next
    
            sw.Stop()
    
            Stop
    
        End Sub
        Private Function TrimData(DataIN As String) As String
            Dim ReturnString As String = ""
        If String.IsNullOrWhiteSpace(DataIN) Then ' covers Nothing as well as empty/whitespace
                Return "No Data"
            End If
            Dim q As String = Chr(34)
            Dim pattern As String = "\[{" & q & "date" & q & "(.*?)\}\]"
            Dim rgx As Regex = New Regex(pattern, RegexOptions.IgnoreCase)
            Dim m As Match = rgx.Match(DataIN)
            Dim Start As Integer = m.Index
            Dim Len As Integer = m.Length
            If Len < 10 Then Return ("No Data")
            ReturnString = DataIN.Substring(Start + 2, Len - 4)
            Return ReturnString
        End Function
    
        Private Function GetYahooData(ByVal urls() As String) As String()
    
            Dim retVal() As String = Nothing
            Dim resultList As New List(Of String)
    
            Parallel.ForEach(urls, Sub(currentURL) resultList.Add(GetStringFromURL(currentURL)))
    
            If resultList.Count > 0 Then
                retVal = resultList.ToArray
            End If
    
            Return retVal
    
        End Function
    
        Private Function GetStringFromURL(ByVal url As String) As String
    
            Dim retVal As String
            Dim sb As New System.Text.StringBuilder
    
            Try
                Dim request As WebRequest = WebRequest.Create(url)
                Using response As HttpWebResponse = DirectCast(request.GetResponse, HttpWebResponse)
                    Using dataStream As Stream = response.GetResponseStream
                        Using rdr As New StreamReader(dataStream)
                            sb.Append(rdr.ReadToEnd)
                        End Using
                    End Using
                End Using
                retVal = sb.ToString
            Catch ex As Exception
                retVal = ex.Message
            End Try
    
            Return retVal
    
        End Function
    End Class
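
    On the "showing progress without freezing the UI" part: one sketch is to run the batch on a worker thread via Task.Run and report back through Progress(Of Integer), which marshals its callback to the UI thread that created it. The download call is stubbed out; this assumes a WinForms form like the ones in this thread:

    ```vb
    Imports System.Threading
    Imports System.Threading.Tasks

    Public Class ProgressForm
        Inherits System.Windows.Forms.Form

        Private Async Sub StartDownloads(urls() As String)
            Dim done As Integer = 0
            ' Progress(Of T) captures the UI SynchronizationContext when constructed here
            Dim progress As IProgress(Of Integer) =
                New Progress(Of Integer)(Sub(count) Me.Text = count & " of " & urls.Length)

            Await Task.Run(Sub()
                               Parallel.ForEach(urls,
                                                Sub(url)
                                                    ' ... download url here ...
                                                    progress.Report(Interlocked.Increment(done))
                                                End Sub)
                           End Sub)

            Me.Text = "Done"
        End Sub
    End Class
    ```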
    


    Saturday, May 27, 2017 6:07 PM
  • Devon,

    I'll add that I have a six-core processor so my results might not be what you get, but it's encouraging.


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Saturday, May 27, 2017 6:11 PM

  • Did it work?

    "If at first you do succeed, try not to look surprised".


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Frank,

    I have not tried it, but yes, I think it works. Your time for 4 quotes was 2200 ms; mine was 1900.

    I have not looked at the data yet... but did before... the html I mean, just stock prices etc.

    Saturday, May 27, 2017 6:15 PM
  • Devon,

    Please tell us again what your overall goal is? What are you making?

    What are you doing with the data? How fast do you want to do it?

    Saturday, May 27, 2017 6:23 PM

  • Did it work?

    "If at first you do succeed, try not to look surprised".


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Frank,

    I have not tried it, but yes, I think it works. Your time for 4 quotes was 2200 ms; mine was 1900.

    I have not looked at the data yet... but did before... the html I mean, just stock prices etc.

    Parallelization is really interesting.

    I've experimented with PLINQ a fair amount and with Parallel.ForEach only a bit, but the concept is great. Reed knows it well; Duane does too, but I'm a veritable newbie to it. :0

    There are some situations where it's the wrong way to go, though. If Reed looks back, hopefully he'll explain, but if the result has to be put back together in a specific order, then you're better off just iterating through it.

    Case in point: look at the return (the screenshot you just showed) compared to the order they were entered in.

    It's interesting if nothing else. ;-)


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Saturday, May 27, 2017 6:25 PM

  • Parallelization is really interesting.

    I've experimented with PLINQ a fair amount and with Parallel.ForEach only a bit, but the concept is great. Reed knows it well; Duane does too, but I'm a veritable newbie to it. :0

    There are some situations where it's the wrong way to go, though. If Reed looks back, hopefully he'll explain, but if the result has to be put back together in a specific order, then you're better off just iterating through it.

    Case in point: look at the return (the screenshot you just showed) compared to the order they were entered in.

    It's interesting if nothing else. ;-)


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Yes.

    And your point about your processors has something to do with it, right? I mean if you have 6 CPUs then you can run 6 threads (or fewer), period?

    But our case is the await case, where we are not doing any processing, just awaiting a response.

    So I don't know what the Parallel.For is doing as far as those details.

    But what we are doing is just eating up the slack in our own processing: instead of awaiting each single stock quote in turn, we await four at once, sort of...


    Saturday, May 27, 2017 6:36 PM

  • Yes.

    And your point about your processors has something to do with it, right? I mean if you have 6 CPUs then you can run 6 threads (or fewer), period?

    But our case is the await case, where we are not doing any processing, just awaiting a response.

    So I don't know what the Parallel.For is doing as far as those details.

    But what we are doing is just eating up the slack in our own processing: instead of awaiting each single stock quote in turn, we await four at once, sort of...


    I won't pretend to be in the know at all - but yes, the processor count very much influences it.

    I *think* that one has to be set aside for what's running it (your program). If I'm right, then I gave it four URLs and I had five CPUs left, so it ran one on each with one left over. As for your difference in time versus mine - please keep in mind that I'm on a 32-bit O/S. ;-)

    Note the "I think" caveat please!

    As I understand it, it takes a process and it (dotNET) works out how to split up the task, how to work through it all, how to put it back together, and hand it back at the end of things.

    It's mind-blowing to me, but somehow it works it all out.


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Saturday, May 27, 2017 6:46 PM
  • Plus,

    In the WUG API they let you stack calls, i.e. you call with four locations in one API request, whatever it's called. And that counts as one request even though you requested 4 cities at once. You get 500 free a day.

    And then... I forget exactly how the info comes back, and I never timed any of it.

    But my point is that maybe requests can be stacked so you request 4000 stock symbols in one request. Maybe Yahoo already has this in its API.

    Tommy - the issue is that, at least as best I can find, Yahoo does not have an official API for stock data. A lot of workarounds and wrappers have been made, but they don't last long.

    "Yahoo! Finance provides a variety of RSS feeds on various finance news topics including top stories, most viewed stories, stories by industry and sector, as well as dynamic feeds for company and industry news based on company ticker symbol."

    A while back they had a documented, free method to obtain daily or date-range data as CSV, but they have added the requirement of a "crumb", and even with this crumb and every user-agent string I have thrown at it, it returns "Unauthorized". Also, many indexes like the Dow Jones don't let Yahoo make their data available as CSV.

    Their TOS are basically "You are free to download as much data as you need, but it cannot be redistributed or sold."

    Saturday, May 27, 2017 6:55 PM

    Tommy - the issue is that, at least as best I can find, Yahoo does not have an official API for stock data. A lot of workarounds and wrappers have been made, but they don't last long.

    "Yahoo! Finance provides a variety of RSS feeds on various finance news topics including top stories, most viewed stories, stories by industry and sector, as well as dynamic feeds for company and industry news based on company ticker symbol."

    A while back they had a documented, free method to obtain daily or date-range data as CSV, but they have added the requirement of a "crumb", and even with this crumb and every user-agent string I have thrown at it, it returns "Unauthorized". Also, many indexes like the Dow Jones don't let Yahoo make their data available as CSV.

    Their TOS are basically "You are free to download as much data as you need, but it cannot be redistributed or sold."

    Interesting indeed.

    Why not an API where they can control access if someone tries to abuse it (like hit the server too quickly - what we're doing here!)?


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Saturday, May 27, 2017 6:59 PM

    Tommy - the issue is that, at least as best I can find, Yahoo does not have an official API for stock data. A lot of workarounds and wrappers have been made, but they don't last long.

    "Yahoo! Finance provides a variety of RSS feeds on various finance news topics including top stories, most viewed stories, stories by industry and sector, as well as dynamic feeds for company and industry news based on company ticker symbol."

    A while back they had a documented, free method to obtain daily or date-range data as CSV, but they have added the requirement of a "crumb", and even with this crumb and every user-agent string I have thrown at it, it returns "Unauthorized". Also, many indexes like the Dow Jones don't let Yahoo make their data available as CSV.

    Their TOS are basically "You are free to download as much data as you need, but it cannot be redistributed or sold."

    Interesting indeed.

    Why not an API where they can control access if someone tries to abuse it (like hit the server too quickly - what we're doing here!)?


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Yahoo does not own the data... who knows what their license is to show it.

    And to get the data you pay... before the internet it cost $ to play stocks.

    So they still don't give it away.

    Saturday, May 27, 2017 7:07 PM


  • True but they have to pay for the bandwidth - it's their equipment. If the ad revenue is bypassed - and they know it is - what gives here? Are they monetizing it with metadata from our connections, maybe?

    *****

    Who knows. ;-)


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Saturday, May 27, 2017 7:21 PM


  • Yes I know what you mean.

    If the use of the data stays within personal use it's just a very small cost to Yahoo. Yahoo is a mess. I doubt they spend much time on it either way right now.

    But the stock data is copyrighted by somebody, so Yahoo can only show it under its own license with the copyright holder, whatever that is. Anything the customer can do with the web page is up to the customer to ensure is legal use.

    Now if you resold the data and made $$ then they would come a-knockin'.

    I am not a lawyer. Just my own opinion. I could be wrong. :)


    PS: Part of the stock data magic is how fast you can serve it up, so people pay big to get the fastest, most recent information.

    So our 0.8 sec per quote is very slow to the big boys.

    Saturday, May 27, 2017 7:43 PM
  • @Devon:

    I don't see where you are correlating your returned data to the actual ticker name. As Frank has already pointed out, owing to the multi-threading, the results are now being returned in a random order, not in the order that you supplied the ticker names. You may need to refactor your code to make allowances for that.

    @WhoeverIsWritingCodeForThis:

    The HTTP/1.1 specification recommends a limit on the number of concurrent connections a client opens to the same server; it's there to protect the server from congestion. In .NET the default limit is 2.

    To see the benefit from this approach, you'll need to raise that limit, which you can do via the ServicePointManager.DefaultConnectionLimit property. Best practice is to keep the value as low as possible; 4 connections seems a reasonable value to me. YMMV.
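    As an aside, the effect of that connection cap can be sketched in a few lines. This is an illustrative Python sketch only (not the thread's VB.NET code, and all names here are made up): at most CONNECTION_LIMIT workers drain a shared queue of tickers, and results are keyed by ticker so completion order doesn't matter.

    ```python
    import queue
    import threading

    # Illustrative sketch only: cap concurrent downloads at 4, which is the same
    # role ServicePointManager.DefaultConnectionLimit plays for .NET clients.
    CONNECTION_LIMIT = 4

    def fetch_all(tickers, fetch_one):
        """Drain a queue of tickers with at most CONNECTION_LIMIT workers in flight."""
        work = queue.Queue()
        for t in tickers:
            work.put(t)

        results = {}
        lock = threading.Lock()

        def worker():
            while True:
                try:
                    ticker = work.get_nowait()
                except queue.Empty:
                    return                    # no more tickers: this worker is done
                data = fetch_one(ticker)      # stand-in for the real HTTP request
                with lock:
                    results[ticker] = data    # keyed by ticker, so order is irrelevant

        threads = [threading.Thread(target=worker) for _ in range(CONNECTION_LIMIT)]
        for th in threads:
            th.start()
        for th in threads:
            th.join()
        return results
    ```

    Because only CONNECTION_LIMIT workers exist, no more than that many "requests" are ever in flight, no matter how long the ticker list is.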


    • Edited by S P C Saturday, May 27, 2017 7:57 PM common sense
    Saturday, May 27, 2017 7:53 PM
  • S P C

    I just noticed that also. I will have to find a reliable way of keeping the ticker associated with its prices, as well as a way to show progress without freezing the UI. Might not be worthwhile.

    Thanks

    Saturday, May 27, 2017 8:15 PM
  • No it doesn't matter, not unless you changed your mind:

    You're keeping all of these in a SortedList(Of TKey, TValue) where the key is DateTime. The order of getting them isn't germane here unless I'm missing it?


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Saturday, May 27, 2017 8:21 PM
  • What people really pay big for is not end-of-day data; it is near-real-time trading data. The closer you are physically to the exchange, the more it costs (it is sold at least partially by latency).

    0.8 would be fine for them if it was 0.8 nanoseconds :)

    Saturday, May 27, 2017 8:39 PM
  • Ohhhh ...

    I'm totally ignorant of any of it, but that makes sense.


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Saturday, May 27, 2017 8:48 PM
  • ...

    Isn't that one use for Async, to help download files from the internet? So is it as easy as calling a routine four times, or does one have to do more work to have 4 independent threads? Awaiting response?

    You're right that async is probably an appropriate solution since the routine is mostly I/O bound.  The Parallel.For will actually use multiple threads so it could potentially be faster if there is still some amount of additional processing after receiving the web result.

    The only thing I'm unsure of in the routine you used in your example is the async event handler for the timer... I'm not sure if there are any ramifications for doing that.  Otherwise the usage looks good.


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    Saturday, May 27, 2017 11:13 PM
    Moderator
  • Just ensure that the return value of your iterator body function contains both the input ticker name and the resulting value from the web query.
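    A minimal sketch of that idea, in Python purely for illustration (fetch_with_names is a hypothetical helper, not code from this thread): remember which future belongs to which ticker, and reassemble the pairs as results complete in whatever order they like.

    ```python
    from concurrent.futures import ThreadPoolExecutor, as_completed

    def fetch_with_names(tickers, fetch_one, max_workers=4):
        """Return {ticker: data}, pairing each result with the ticker that produced it."""
        results = {}
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # map each Future back to its input ticker
            future_to_ticker = {pool.submit(fetch_one, t): t for t in tickers}
            for fut in as_completed(future_to_ticker):
                # completion order is arbitrary, but the ticker/result pairing survives it
                results[future_to_ticker[fut]] = fut.result()
        return results
    ```

    The same shape works for any "results arrive out of order" problem: carry the input key alongside the output value instead of relying on positional order.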

    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    Saturday, May 27, 2017 11:21 PM
    Moderator
  • Well I tried Async in a serial-style loop but it is maybe a bit slower than the normal serial request. Yes, the timer is just for the test. If anyone is interested I can post the test code. It's the same as the above example in a loop.

    However, the Async version does keep the application from freezing (much) while it is running. If you run the test and move the form, the parallel and normal serial versions lock the app during each test, while the async version does not.

    So that's what we want for a progress bar, etc.


    Sunday, May 28, 2017 12:22 AM
  • @tommy

    With regards to the Async, as you no doubt know, both the WebClient and WebRequest Classes have exposed Async methods since the early days of .NET. The WebClient's DownloadDataAsync, DownloadFileAsync, DownloadStringAsync and OpenReadAsync methods are particularly simple to work against, and that's how I've done this sort of thing in the past.

    However, recently I've been looking at the Async Await pattern that's available in the recent versions of .NET. That's what drew me to Devon's thread; a good opportunity to experiment, and a topic that works well with the walkthroughs linked to from https://docs.microsoft.com/en-us/dotnet/visual-basic/programming-guide/concepts/async/index#a-namebkmkrelatedtopicsa-related-topics-and-samples-visual-studio

    Anyways, for what it's worth, this is how I'd tackle Devon's problem with my current level of understanding of Async Await. The code does the following:

    • Loads all the unique ticker names into a Queue(Of String).
    • Creates 4 tasks (in an array), each with its own WebClient instance.
    • Each task dequeues a ticker name, builds the required URL from the name, and then Awaits its WebClient's DownloadStringTaskAsync method.
    • The results are stored in a Dictionary(Of String,String), where the Key is the ticker name, and the Value is the data parsed from the downloaded web page.
    • Each task loops until there are no more tickers to download.
    • Progress is reported by the task by updating the Text of a Label.
    • When there are no more tickers in the Queue, then we know the process has finished.

    Much of the methodology is explained further in the walkthroughs I linked to earlier.

    The Dictionary is used to store the results because, as mentioned in previous posts, the order in which the results are returned does not match the order the tickers were placed in the Queue. However, each of the Dictionary's KeyValue pairs ties the ticker name (Dictionary's Key) to its own data (Dictionary's Value).

    As I understand it, all the user code here is executed on the UI thread, so there is no need to use collections from the System.Collections.Concurrent NameSpace for the sake of thread safety, nor is there any need to Invoke to the UI thread when updating Controls while reporting progress. I presume the WebClients are downloading on their own threads.

    1000 tickers are processed in about 3.5 minutes, reporting progress or live results is easy, and the UI is not frozen.

    Edit: Meant to say: The Form requires a Button, a Label (with plenty of room to the right) and a (wide-ish) RichTextBox.

    Imports System.IO
    Imports System.Net
    
    Public Class Form1
    
    
        Private maxConnections As Integer = 4
        Private queuedTickers As Queue(Of String)
        Private dictTickerResults As Dictionary(Of String, String)
    
        Private URLPart1 As String = "https://finance.yahoo.com/quote/"
        Private URLPart2 As String = "/history?period1=1495177200&period2=1495177200&interval=1d&filter=history&frequency=1d"
    
    
        Private Async Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
            ServicePointManager.DefaultConnectionLimit = maxConnections
    
            Dim sw As New Stopwatch
            sw.Start()
    
    
            queuedTickers = New Queue(Of String)(File.ReadAllLines("Tickers.txt").Distinct)
            dictTickerResults = New Dictionary(Of String, String)
    
            RichTextBox1.Clear()
            Button1.Enabled = False   '   prevent re-entrance while Tasks run
            ReportProgress()
    
    
            ' defer the Tasks from starting until .ToArray is called
            Dim tasksQuery As IEnumerable(Of Task) = From count In Enumerable.Range(0, maxConnections)
                                                     Select FetchTickerData()
            Await Task.WhenAll(tasksQuery.ToArray)
    
            Button1.Enabled = True
    
    
            '   statistics and results
            sw.Stop()
            MsgBox($"{dictTickerResults.Count} tickers fetched in {sw.ElapsedMilliseconds:N0} ms")
    
            For Each kvp As KeyValuePair(Of String, String) In dictTickerResults
                Dim tickerName As String = kvp.Key
                Dim tickerData As String = kvp.Value
                RichTextBox1.AppendText(tickerName & " " & vbTab & " " & tickerData & vbNewLine)
            Next
            RichTextBox1.AppendText($"{vbNewLine}")
            RichTextBox1.AppendText($"{dictTickerResults.Count} tickers fetched in {sw.Elapsed:mm'mins, 'ss\.ff'secs'}")
            RichTextBox1.AppendText($"{vbNewLine}")
    
        End Sub
    
        Private Async Function FetchTickerData() As Task
            Using client As New WebClient
                client.Headers.Add("User-Agent", "WebClientZilla 5.0")
                client.Encoding = System.Text.Encoding.UTF8
    
                Do While queuedTickers.Count > 0
                    Dim thisTicker As String = queuedTickers.Dequeue
    
                    Try
                        Dim pageContents As String = Await client.DownloadStringTaskAsync(URLfromTicker(thisTicker))
                        dictTickerResults(thisTicker) = ParsePageContents(pageContents)
                    Catch ex As Exception
                        dictTickerResults(thisTicker) = $"No Data : ERROR: {ex.Message}"
                    End Try
    
                    ReportProgress()
                Loop
    
            End Using
        End Function
    
        Private Sub ReportProgress()
            Label1.Text = $"{queuedTickers.Count} tickers waiting to start downloading:{dictTickerResults.Count} tickers processed"
        End Sub
    
    
        Private Function URLfromTicker(tickerName As String) As String
            Return String.Concat(URLPart1, tickerName, URLPart2)
        End Function
    
        Private Function ParsePageContents(pageContents As String) As String
    
            Dim data As String = GetTextBetween(pageContents, """prices"":[{", "}")
            If String.IsNullOrEmpty(data) Then
                data = "No Data"
            End If
    
            Return data
        End Function
    
    
        Private Function GetTextBetween(input As String, startDelimiter As String, endDelimiter As String) As String
            Dim result As String = Nothing
    
            If Not String.IsNullOrWhiteSpace(input) Then
                Dim index1 As Integer = input.IndexOf(startDelimiter)
                If index1 <> -1 Then
                    ' search after the full start delimiter, so a match inside it cannot produce a negative length
                    Dim index2 As Integer = input.IndexOf(endDelimiter, index1 + startDelimiter.Length)
                    If index2 <> -1 Then
                        Dim resultLength As Integer = index2 - index1 - startDelimiter.Length
                        result = input.Substring(index1 + startDelimiter.Length, resultLength)
                    End If
                End If
            End If
    
            Return result
        End Function
    
    End Class
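    For reference, the delimiter extraction that GetTextBetween performs can be expressed in a few lines. This is a language-neutral sketch in Python (get_text_between is a hypothetical name, not part of the thread's code):

    ```python
    def get_text_between(text, start, end):
        """Return the substring between the first `start` and the next `end`, or None."""
        i = text.find(start)
        if i == -1:
            return None
        # search after the full start delimiter, mirroring the fix noted in the VB version
        j = text.find(end, i + len(start))
        if j == -1:
            return None
        return text[i + len(start):j]
    ```

    With the Yahoo page, calling it with start='"prices":[{' and end='}' pulls out the first price record's JSON fields, exactly as ParsePageContents does above.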



    • Edited by S P C Monday, May 29, 2017 4:15 PM requisites
    • Marked as answer by Devon_Nullman Monday, May 29, 2017 7:13 PM
    Monday, May 29, 2017 4:06 PM
  • Fantastic - significantly faster than any other method I have tried.

    Thank you

    Monday, May 29, 2017 7:14 PM
  • SPC,

    Very nice!

    It will take a while for me to understand what you did.

    I am getting an average of 370 ms per stock running 10 stocks.

    Plus I don't see it locking the app at all. I can move the form freely and the status updates, etc.

    Summary from this page


         Normal Serial Loop:  850 ms

         Parallel Loop:       470 ms

         SPC Async:           370 ms

    Monday, May 29, 2017 8:29 PM
  • 50 tickers processed in 10.55 Seconds - nice....

    Monday, May 29, 2017 9:26 PM