Speed up Custom Sort

  • Question

  • I created a custom sort for stock quotes by date. The Data can arrive in different formats. Basically it is:

    A List(Of String) with Date,Open,High,Low,Close,Adj Close,Volume - The issue is that the Date comes in different formats for the Charting Software, for example:

            'Sample String 1 -   20151110,14.50,14.50,9.23,9.50,81500,9.50
            'Sample String 2 - 11/10/2015,14.50,14.50,9.23,9.50,81500,9.50

    20151110 can't be Parsed as a Date and 11/10/2015 cannot be sorted correctly as a string, so I created this Class:

    Class SComparer
        Implements IComparer(Of String)
        Private m_SortOrder As SortOrder
    
        Public Sub New(ByVal sort_order As SortOrder)
            m_SortOrder = sort_order
        End Sub
    
        Public Function Compare(x As String, y As String) As Integer Implements IComparer(Of String).Compare
            'Sample String 1 -   20151110,14.50,14.50,9.23,9.50,81500,9.50
            'Sample String 2 - 11/10/2015,14.50,14.50,9.23,9.50,81500,9.50
            Dim xcom As Integer = x.IndexOf(",") ' for my particular case
            Dim ycom As Integer = y.IndexOf(",") ' for my particular case
            If xcom = -1 Or ycom = -1 Then
                Return String.Compare(x, y)
            End If
            Dim String_x As String = x.Substring(0, xcom)
            Dim String_y As String = y.Substring(0, ycom)
            If m_SortOrder = SortOrder.Ascending Then
                If IsNumeric(String_x) And IsNumeric(String_y) Then
                    Return Val(String_x).CompareTo(Val(String_y)) ' Tried CInt and Convert.ToUInt32 and CUint
                ElseIf IsDate(String_x) And IsDate(String_y) Then
                    Return CDate(String_x).CompareTo(CDate(String_y)) 'tried convert.ToDateTime and DateTime.parse
                Else
                    Return String.Compare(String_x, String_y)
                End If
            Else ' SortOrder.Descending
                If IsNumeric(String_x) And IsNumeric(String_y) Then
                    Return Val(String_y).CompareTo(Val(String_x))
                ElseIf IsDate(String_x) And IsDate(String_y) Then
                    Return CDate(String_y).CompareTo(CDate(String_x))
                Else
                    Return String.Compare(String_y, String_x)
                End If
            End If
        End Function
    End Class

    It sorts correctly, but I'm hoping it can be sped up. A test is here.

    Edit: It is Implemented like this:

            Dim MyComp As New SComparer(SortOrder.Ascending)
            LS.Sort(MyComp) ' LS is the List(Of String) with the data
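
    Much of the cost in a comparer like this probably comes from re-parsing the date prefix on every comparison, since IsNumeric, IsDate, Val and CDate run many times per line during a sort. One possible way around that, sketched below, is to compute an integer key once per line and then sort keys and lines together with Array.Sort. The sketch assumes every line starts with either yyyyMMdd or MM/dd/yyyy (US order); the names KeySort, DateKey and SortByDate are hypothetical and not from the original code.

    ```vbnet
    Imports System
    Imports System.Collections.Generic
    Imports System.Globalization

    Module KeySort
        ' Assumed layouts of the date prefix; adjust to the real data.
        Private ReadOnly Formats As String() = {"yyyyMMdd", "MM/dd/yyyy"}

        ' Turns the text before the first comma into an integer such as 20151110.
        ' Lines without a parsable date get key 0 and therefore sort first.
        Public Function DateKey(line As String) As Integer
            Dim comma As Integer = line.IndexOf(","c)
            If comma < 0 Then Return 0
            Dim d As DateTime
            If DateTime.TryParseExact(line.Substring(0, comma).Trim(), Formats,
                                      CultureInfo.InvariantCulture,
                                      DateTimeStyles.None, d) Then
                Return d.Year * 10000 + d.Month * 100 + d.Day
            End If
            Return 0
        End Function

        ' Parses each line exactly once, then sorts keys and lines in parallel.
        Public Sub SortByDate(lines As List(Of String), ascending As Boolean)
            Dim items As String() = lines.ToArray()
            Dim keys(items.Length - 1) As Integer
            For i As Integer = 0 To items.Length - 1
                keys(i) = DateKey(items(i))
            Next
            Array.Sort(keys, items)
            If Not ascending Then Array.Reverse(items)
            lines.Clear()
            lines.AddRange(items)
        End Sub

        Sub Main()
            Dim LS As New List(Of String) From {
                "11/10/2015,14.50,14.50,9.23,9.50,81500,9.50",
                "20100414,1198.69,1210.65,1198.69,1210.65,0,1210.65",
                "02/05/2004,16.66,16.1,16.9,16.68,6288100,16.68"}
            SortByDate(LS, True)
            For Each s As String In LS
                Console.WriteLine(s)
            Next
        End Sub
    End Module
    ```

    Array.Sort is not a stable sort, so lines sharing a date may change their relative order; that should not matter if each date occurs only once per file.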

    • Edited by Devon_Nullman Saturday, May 20, 2017 8:02 PM Addition of Implementation
    Saturday, May 20, 2017 7:41 PM

Answers

  • Devon, 

    From my first days with computers I have not liked sorting with a comparer.

    Therefore, even back on mechanical machines, I tried to use a kind of tag sort.

    .NET offers that possibility as well, with the SortedList.

    I made a sample for you. Be aware that in the result I show, my DateTime setting is not USA (the day/month order cannot be recognized unless there is some other indicator).

    Public Class Form1
        Private srtlist As New SortedList(Of DateTime, String)
        Private String1 As String = "20151110, 14.5, 14.5, 9.23, 9.5, 81500, 9.5"
        Private String2 As String = "11/10/2015,14.50,14.50,9.23,9.50,81500,9.50"
        Private String3 As String = "11/10/2016,14.50,14.50,9.23,9.50,81500,9.50"
        Private String4 As String = "10/11/2015,14.50,14.50,9.23,9.50,81500,9.50"
        Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
            AddToList(String1)
            AddToList(String2)
            AddToList(String3)
            AddToList(String4)
            For Each item In srtlist.Values
                ListBox1.Items.Add(item)
            Next
        End Sub
        Private Sub AddToList(StockString As String)
            Dim arrstring = StockString.Split(","c)
            Dim dt = GetRightDateFormat(arrstring(0))
    
            If Not IsNothing(dt) Then
                Dim x As Boolean = True
                Do While x = True
                    Try
                        srtlist.Add(dt, StockString)
                        x = False
                    Catch
                        dt = dt.AddTicks(1)
                    End Try
                Loop
            End If
    
        End Sub
        Private Function GetRightDateFormat(First As String) As DateTime
            Dim dt As DateTime
            If Date.TryParse(First, dt) Then Return dt
            If Date.TryParseExact(First, "yyyyMMdd", Nothing, Nothing, dt) Then Return dt
            Return Nothing
        End Function
    End Class
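
    One caveat with the sample above: in VB, Return Nothing from a function typed As DateTime yields Date.MinValue rather than a null, so the IsNothing(dt) test is always False and an unparsable line would be keyed under 01/01/0001. A minimal sketch of one way around that, returning Nullable(Of DateTime) so the caller can check HasValue (the name TryGetDate is hypothetical):

    ```vbnet
    Imports System
    Imports System.Globalization

    Module NullableDateDemo
        ' Returns a Nullable with no value when the text is not a recognizable date.
        Public Function TryGetDate(first As String) As Nullable(Of DateTime)
            Dim dt As DateTime
            If DateTime.TryParseExact(first.Trim(), "yyyyMMdd", CultureInfo.InvariantCulture,
                                      DateTimeStyles.None, dt) Then Return dt
            ' Falls back to the current culture, as Date.TryParse does in the sample.
            If DateTime.TryParse(first, dt) Then Return dt
            Return Nothing
        End Function

        Sub Main()
            Dim good As Nullable(Of DateTime) = TryGetDate("20151110")
            Dim bad As Nullable(Of DateTime) = TryGetDate("not a date")
            Console.WriteLine(good.HasValue)
            Console.WriteLine(bad.HasValue)
        End Sub
    End Module
    ```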



    Success
    Cor





    Sunday, May 21, 2017 7:27 AM
  • Cor - actually the fact that a sorted list doesn't accept duplicate keys is perfect, because the files cannot have duplicated dates. I had to read the whole file each time to be sure the date was not already there, this lets me use a try catch to avoid dupes.

    Devon,

    You never mentioned this at all. I'm glad that you have your answer but you might want to experiment some with Cor's method.

    One part in particular is flawed. If you'll look at the code:

        Private Sub AddToList(StockString As String)
            Dim arrstring = StockString.Split(","c)
            Dim dt = GetRightDateFormat(arrstring(0))
    
            If Not IsNothing(dt) Then
                Dim x As Boolean = True
                Do While x = True
                    Try
                        srtlist.Add(dt, StockString)
                        x = False
                    Catch
                        dt = dt.AddTicks(1)
                    End Try
                Loop
            End If
    
        End Sub

    He is correctly setting the key to the converted DateTime ("dt") but "StockString" still contains the original string value that's not been converted.

    If it helps, my suggested modification:

        Private Sub AddToList(StockString As String)
            Dim arrstring = StockString.Split(","c)
            Dim dt = GetRightDateFormat(arrstring(0))
    
            If Not IsNothing(dt) Then
                Dim x As Boolean = True
                Do While x = True
                    Try
                        arrstring(0) = dt.ToShortDateString
                        srtlist.Add(dt, String.Join(",", arrstring))
                        x = False
                    Catch
                        dt = dt.AddTicks(1)
                    End Try
                Loop
            End If
    
        End Sub

    ...this lets me use a try catch to avoid dupes.

    There's a better (and faster) way:

    You can use the .ContainsKey to quickly return a Boolean value letting you know whether or not it's already in there.
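
    A minimal sketch of that ContainsKey check, assuming the same SortedList(Of DateTime, String) as in the code above (the helper name AddIfNew is hypothetical):

    ```vbnet
    Imports System
    Imports System.Collections.Generic

    Module ContainsKeyDemo
        ' Adds the line only when its date is not already present.
        ' Returns True when the line was added, False when the date was a duplicate.
        Public Function AddIfNew(list As SortedList(Of DateTime, String),
                                 key As DateTime, line As String) As Boolean
            If list.ContainsKey(key) Then Return False
            list.Add(key, line)
            Return True
        End Function

        Sub Main()
            Dim srtlist As New SortedList(Of DateTime, String)
            Dim d As New DateTime(2015, 11, 10)
            Console.WriteLine(AddIfNew(srtlist, d, "20151110,14.50,14.50,9.23,9.50,81500,9.50"))
            Console.WriteLine(AddIfNew(srtlist, d, "11/10/2015,14.50,14.50,9.23,9.50,81500,9.50"))
            Console.WriteLine(srtlist.Count)
        End Sub
    End Module
    ```

    This avoids throwing and catching an exception for every duplicate, which is usually the slower path.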


    "A problem well stated is a problem half solved.” - Charles F. Kettering


    Wednesday, May 24, 2017 8:32 PM

All replies

  • Hi

    Had a play with this. Here is my trial. The sort itself seems very quick, the initializing takes a couple of seconds. This test uses 200000 data rows and sorts on date either ASC or DESC (set in code)

    ' Form1 with DataGridView1
    Option Strict On
    Option Explicit On
    Public Class Form1
        Dim dt As New DataTable("Freddy")
        Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
            Dim s1 As String = "20152810,14.50,14.50,9.23,9.50,81500,9.50"
            Dim s2 As String = "29/10/2015,114.510,114.510,9.213,91.50,80000,19.50"
            ' 200000 rows
            Dim incomingdata As New List(Of String)
            For j As Integer = 0 To 100000
                incomingdata.Add(s1)
                incomingdata.Add(s2)
            Next
    
            ' Date,Open,High,Low,Close,Adj Close,Volume
            With dt
                .Columns.Add("#", GetType(Integer))
                .Columns.Add("Date", GetType(DateTime))
                .Columns.Add("Open", GetType(Decimal))
                .Columns.Add("High", GetType(Decimal))
                .Columns.Add("Low", GetType(Decimal))
                .Columns.Add("Close", GetType(Decimal))
                .Columns.Add("Adj Close", GetType(Integer))
                .Columns.Add("Volume", GetType(Decimal))
                Dim count As Integer = 1
                For Each s As String In incomingdata
                    Dim a() As String = Split(s, ",")
                    .Rows.Add(count, Dat(a(0)), a(1), a(2), a(3), a(4), a(5), a(6))
                    count += 1
                Next
            End With
            dt.DefaultView.Sort = "Date desc"
            DataGridView1.DataSource = dt
        End Sub
        Function Dat(s As String) As DateTime
            Dim d As Date = Now
            If Date.TryParse(s, d) Then Return d
    
            Dim year As Integer = GetInteger(s.Substring(0, 4))
            If year > 0 Then
                Dim day As Integer = GetInteger(s.Substring(4, 2))
                If day > 0 Then
                    Dim month As Integer = GetInteger(s.Substring(6, 2))
                    If month > 0 Then
                        Return DateSerial(year, month, day)
                    End If
                End If
            End If
            Return Nothing
        End Function
        Function GetInteger(s As String) As Integer
            Dim i As Integer = -1
            If Integer.TryParse(s, i) Then Return i
            Return -1
        End Function
    End Class


    Regards Les, Livingston, Scotland

    Saturday, May 20, 2017 8:54 PM
  • Devon,

    The class doesn't seem to encapsulate the data at all - or am I missing something?


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Saturday, May 20, 2017 9:33 PM
  • Les - I'm not understanding how this relates to what I'm doing.
    The stock data is kept in plain text files like this:

    19280103,17.80,17.80,17.80,17.80,0,17.80
    19280104,17.70,17.70,17.70,17.70,0,17.70
    19280105,17.60,17.60,17.60,17.60,0,17.60
    ~21,700 more rows
    20100412,1194.94,1199.20,1194.71,1196.48,0,1196.48
    20100413,1195.94,1199.04,1188.82,1197.30,0,1197.30
    20100414,1198.69,1210.65,1198.69,1210.65,0,1210.65
    
    or
    02/05/2004,16.66,16.1,16.9,16.68,6288100,16.68
    02/06/2004,16.7,16.4,17.48,17.001,544000,17.001
    02/09/2004,17.25,17.16,17.7,17.41,316700,17.41
    ~2350 more rows
    06/12/2013,6.57,6.5,6.58,6.5,20900,6.5
    06/13/2013,6.52,6.5,6.53,6.51,84800,6.51
    06/14/2013,6.51,6.51,6.51,6.51,0,6.51
    

    It gets added to every day but if a day or more gets missed, the missed day(s) must be in the proper order.

    What I am doing is reading the file, adding the latest data, sorting it by the pseudo-date and saving it all back to the same file.
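
    Since the file is already in date order, the full re-sort could in principle be skipped: a binary search finds where a missed day belongs. A minimal sketch, assuming the lines have already been normalized to the yyyyMMdd form so that the first 8 characters order correctly as text; the names InsertMissedDay, ByDatePrefix and InsertSorted are hypothetical:

    ```vbnet
    Imports System
    Imports System.Collections.Generic

    Module InsertMissedDay
        ' Compares only the first 8 characters (the yyyyMMdd prefix) of each line.
        Private ReadOnly ByDatePrefix As IComparer(Of String) =
            Comparer(Of String).Create(Function(a, b) String.CompareOrdinal(a, 0, b, 0, 8))

        ' Inserts newLine at its sorted position.
        ' Returns False (and changes nothing) if that date is already in the list.
        Public Function InsertSorted(lines As List(Of String), newLine As String) As Boolean
            Dim index As Integer = lines.BinarySearch(newLine, ByDatePrefix)
            If index >= 0 Then Return False
            ' When not found, BinarySearch returns the bitwise complement
            ' of the index at which the item should be inserted.
            lines.Insert(Not index, newLine)
            Return True
        End Function

        Sub Main()
            Dim lines As New List(Of String) From {
                "20100412,1194.94,1199.20,1194.71,1196.48,0,1196.48",
                "20100414,1198.69,1210.65,1198.69,1210.65,0,1210.65"}
            InsertSorted(lines, "20100413,1195.94,1199.04,1188.82,1197.30,0,1197.30")
            For Each s As String In lines
                Console.WriteLine(s)
            Next
        End Sub
    End Module
    ```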

    Saturday, May 20, 2017 9:47 PM
  • Devon,

    The class doesn't seem to encapsulate the data at all - or am I missing something?


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Correct, the Class is just for sorting. Would it help to make the data into a Class and then a list of that Class? It all has to wind up in a plain text file where everything is the same except the Date format, and one of the required formats is not a "real" format (YYYYMMDD). In a Class or Structure, I cannot have the date field as a date (I don't think).

    Saturday, May 20, 2017 9:51 PM
  • Hi

    OK, hope someone can help with sorting missed days.


    Regards Les, Livingston, Scotland

    Saturday, May 20, 2017 9:54 PM

  • Correct, the Class is just for sorting. Would it help to make the data into a Class and then a list of that Class? It all has to wind up in a plain text file where everything is the same except the Date format, and one of the required formats is not a "real" format (YYYYMMDD). In a Class or Structure, I cannot have the date field as a date (I don't think).

    That's up to you but IComparer isn't a part of this; You need a function that will return a valid instance of DateTime (ergo, a parser).

    Once you have that, the rest uses the comparer that's built-in for dates which is plenty fast. ;-)


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Saturday, May 20, 2017 9:56 PM
  • In a Class or Structure, I cannot have the date field as a date (I don't think)

    Why?

    The issue isn't comparison; the issue is parsing it.

    Parse it to a DateTime (a function will do that) and the rest will be easy.


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Saturday, May 20, 2017 9:58 PM
  • In a Class or Structure, I cannot have the date field as a date (I don't think)

    Why?

    The issue isn't comparison; the issue is parsing it.

    Parse it to a DateTime (a function will do that) and the rest will be easy.


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    so if the file has lines like:

    20100414,1198.69,1210.65,1198.69,1210.65,0,1210.65

    I should read the lines, turn the pseudo-dates into dates, add the one line of data with the date as a real date, then sort, then turn each one back to a pseudo-date and save all the lines ? Would I encapsulate each line into a class/Structure ? It seems like in order to sort by date, that would be mandatory. FYI the data source (Yahoo Finance) sends the date data as the number of seconds since Jan 01, 1970.

    If I kept it as a list (Of String), it would be just the opposite, right ? Turn everything into a pseudo-date and use the built in List.Sort, then either leave as is or convert all into a real date.
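
    For the Unix-style value mentioned above (seconds since Jan 01, 1970), a conversion might look like the following sketch. It assumes the value is in seconds and counted in UTC; FromUnixSeconds and ToPseudoDate are hypothetical helper names:

    ```vbnet
    Imports System
    Imports System.Globalization

    Module UnixDateDemo
        Private ReadOnly Epoch As New DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc)

        ' Converts seconds since 1970-01-01 (UTC) to a UTC DateTime.
        Public Function FromUnixSeconds(seconds As Long) As DateTime
            Return Epoch.AddSeconds(seconds)
        End Function

        ' Formats a DateTime as the yyyyMMdd pseudo-date used in the files.
        Public Function ToPseudoDate(d As DateTime) As String
            Return d.ToString("yyyyMMdd", CultureInfo.InvariantCulture)
        End Function

        Sub Main()
            ' 1447113600 is midnight UTC on November 10, 2015.
            Dim d As DateTime = FromUnixSeconds(1447113600)
            Console.WriteLine(ToPseudoDate(d))
        End Sub
    End Module
    ```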

    Saturday, May 20, 2017 10:31 PM

  • so if the file has lines like:

    20100414,1198.69,1210.65,1198.69,1210.65,0,1210.65

    I should read the lines, turn the pseudo-dates into dates, add the one line of data with the date as a real date, then sort, then turn each one back to a pseudo-date and save all the lines ? Would I encapsulate each line into a class/Structure ? It seems like in order to sort by date, that would be mandatory. FYI the data source (Yahoo Finance) sends the date data as the number of seconds since Jan 01, 1970.

    If I kept it as a list (Of String), it would be just the opposite, right ? Turn everything into a pseudo-date and use the built in List.Sort, then either leave as is or convert all into a real date.

    Unix Date -- I never did get why that's still used, but 1970 was a good year. ;-)

    *****

    IComparable/IComparer applies to a class, and when you implement it, it defines how you want to compare when "sorting" instances of that class. It's up to you, but I don't see that as an issue here. Obviously then, it has to have fields/properties to compare [to each other].

    *****

    You need a function that will always return a valid DateTime. Order them (in my mind, "them" is a collection of instances but that's up to you) based on whatever you want and then persist them directly or export them.

    I'm not seeing the full scope of your work here but you need a parsing routine:

    Option Strict On
    Option Explicit On
    Option Infer Off
    
    Public Class Form1
        Private Sub Form1_Load(sender As System.Object, _
                               e As System.EventArgs) _
                               Handles MyBase.Load
    
            'Sample String 1 -   20151110,14.50,14.50,9.23,9.50,81500,9.50
            'Sample String 2 - 11/10/2015,14.50,14.50,9.23,9.50,81500,9.50
    
            Dim example1 As Nullable(Of DateTime) = ParseTextToDate("20151110")
    
            Stop
    
            Dim example2 As Nullable(Of DateTime) = ParseTextToDate("11/10/2015")
    
            Stop
    
        End Sub
    
    
    
        Private Function ParseTextToDate(ByVal text As String) As Nullable(Of DateTime)
    
            Dim retVal As Nullable(Of DateTime)
    
            If Not String.IsNullOrWhiteSpace(text) Then
                If text.Contains("/") Then
                    retVal = CDate(text)
                    ' In the US only, otherwise pull it apart
                    ' and create a new DateTime from it
                Else
                    If text.Trim.Length = 8 Then
                        retVal = New DateTime(CInt(text.Substring(0, 4)), _
                                              CInt(text.Substring(4, 2)), _
                                              CInt(text.Substring(6)), 0, 0, 0)
                    End If
                End If
            End If
    
            Return retVal
    
        End Function
    End Class
    

    I'm making assumptions in that and be sure to test that the return value .HasValue (ergo, that it's not null), but that's the basic idea.

    Once you have it as a DateTime, the rest is easy.
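
    The "In the US only" comment in the code above points at a real risk: CDate follows the current culture, so 11/10/2015 can come back as October 11 on a non-US machine. A possible culture-independent variant, sketched here on the assumption that the slash format is always month/day/year, uses TryParseExact with explicit format strings (the module name and the Formats array are illustrative, not from the original):

    ```vbnet
    Imports System
    Imports System.Globalization

    Module InvariantParseDemo
        ' The two layouts seen in the thread, plus a variant without leading zeros.
        Private ReadOnly Formats As String() = {"yyyyMMdd", "MM/dd/yyyy", "M/d/yyyy"}

        ' Returns the parsed date, or a Nullable with no value if the text fits no format.
        Public Function ParseTextToDate(text As String) As Nullable(Of DateTime)
            Dim parsed As DateTime
            If Not String.IsNullOrWhiteSpace(text) AndAlso
               DateTime.TryParseExact(text.Trim(), Formats, CultureInfo.InvariantCulture,
                                      DateTimeStyles.None, parsed) Then
                Return parsed
            End If
            Return Nothing
        End Function

        Sub Main()
            Dim inv As CultureInfo = CultureInfo.InvariantCulture
            Console.WriteLine(ParseTextToDate("20151110").Value.ToString("yyyy-MM-dd", inv))
            Console.WriteLine(ParseTextToDate("11/10/2015").Value.ToString("yyyy-MM-dd", inv))
            Console.WriteLine(ParseTextToDate("garbage").HasValue)
        End Sub
    End Module
    ```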


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Saturday, May 20, 2017 10:44 PM
  • Devon,

    How would you set up a class to "Sort" based on a particular order you wanted them to always sort as?

    It's your class; you make the rules. In the following, I've told it first name, then last name, then age. Change it around and experiment though. ;-)

    Option Strict On
    Option Explicit On
    Option Infer Off
    
    Public Class Form1
        Private Sub Form1_Load(sender As System.Object, _
                               e As System.EventArgs) _
                               Handles MyBase.Load
    
            ''Sample String 1 -   20151110,14.50,14.50,9.23,9.50,81500,9.50
            ''Sample String 2 - 11/10/2015,14.50,14.50,9.23,9.50,81500,9.50
    
            'Dim example1 As Nullable(Of DateTime) = ParseTextToDate("20151110")
    
            'Stop
    
            'Dim example2 As Nullable(Of DateTime) = ParseTextToDate("11/10/2015")
    
            'Stop
    
            Dim ecList As New List(Of ExampleClass)
    
            ExampleClass.AddNew(ecList, "Frank", "Smith", 58)
            ExampleClass.AddNew(ecList, "Abraham", "Lincoln", 198)
    
            ecList.Sort()
    
            Stop
    
        End Sub
    
    
    
        Private Function ParseTextToDate(ByVal text As String) As Nullable(Of DateTime)
    
            Dim retVal As Nullable(Of DateTime)
    
            If Not String.IsNullOrWhiteSpace(text) Then
                If text.Contains("/") Then
                    retVal = CDate(text)
                    ' In the US only, otherwise pull it apart
                    ' and create a new DateTime from it
                Else
                    If text.Trim.Length = 8 Then
                        retVal = New DateTime(CInt(text.Substring(0, 4)), _
                                              CInt(text.Substring(4, 2)), _
                                              CInt(text.Substring(6)), 0, 0, 0)
                    End If
                End If
            End If
    
            Return retVal
    
        End Function
    End Class
    
    
    
    
    
    Public Class ExampleClass
        Implements IComparable
        Implements IComparable(Of ExampleClass)
    
        Private _firstName As String
        Private _lastName As String
        Private _age As Integer
    
        Private Sub New(ByVal fName As String, _
                        ByVal lName As String, _
                        ByVal age As Integer)
    
            If Not String.IsNullOrWhiteSpace(fName) AndAlso _
                Not String.IsNullOrWhiteSpace(lName) Then
    
                _firstName = fName.Trim
                _lastName = lName.Trim
    
                If age > 0 Then
                    _age = age
                End If
            End If
    
        End Sub
    
        Public Shared Sub _
            AddNew(ByVal list As List(Of ExampleClass), _
                   ByVal fName As String, _
                   ByVal lName As String, _
                   ByVal age As Integer)
    
            ' I'll skip all of the validation testing..
    
            list.Add(New ExampleClass(fName, lName, age))
    
        End Sub
    
        Public ReadOnly Property Age As Integer
            Get
                Return _age
            End Get
        End Property
    
        Public ReadOnly Property FirstName As String
            Get
                Return _firstName
            End Get
        End Property
    
        Public ReadOnly Property LastName As String
            Get
                Return _lastName
            End Get
        End Property
    
        Public ReadOnly Property FullName As String
            Get
                Return String.Format("{0}, {1}", _
                                     _lastName, _
                                     _firstName)
            End Get
        End Property
    
        Public Function CompareTo(ByVal obj As Object) As Integer _
            Implements IComparable.CompareTo
    
            Dim retVal As Integer = 0
    
            If obj Is Nothing Then
                ' Any instance sorts after Nothing
                retVal = 1
            Else
                Dim other As ExampleClass = TryCast(obj, ExampleClass)
    
                If other Is Nothing Then
                    Throw New ArgumentException("obj is not an instance of ExampleClass")
                End If
    
                retVal = CompareTo(other)
            End If
    
            Return retVal
    
        End Function
    
        Public Function CompareTo(ByVal other As ExampleClass) As Integer _
            Implements IComparable(Of ExampleClass).CompareTo
    
            Dim retVal As Integer = 0
    
            If other Is Nothing Then
                ' Any instance sorts after Nothing
                retVal = 1
            Else
                ' First name, then last name, then age;
                ' each later field only breaks ties in the earlier ones
                retVal = _firstName.CompareTo(other._firstName)
    
                If retVal = 0 Then
                    retVal = _lastName.CompareTo(other._lastName)
                End If
    
                If retVal = 0 Then
                    retVal = _age.CompareTo(other._age)
                End If
            End If
    
            Return retVal
    
        End Function
    End Class


    "A problem well stated is a problem half solved.” - Charles F. Kettering


    • Edited by Frank L. Smith Saturday, May 20, 2017 11:13 PM ... flubbed up part of the comparer...
    Saturday, May 20, 2017 11:09 PM
  • One more then I'll quit being a pest:

            ' Instead, let's make whatever we want using LINQ:
    
            Dim qry As Linq.IOrderedEnumerable(Of ExampleClass) = _
                From ec As ExampleClass In ecList _
                    Order By ec.FirstName, ec.LastName, ec.Age
    
            Stop
    
            ' Same result but more flexible to use (in my opinion)

    You can declare "qry" then later decide how you want it to work; IComparable carves it in stone.

    Do know that LINQ is not nearly as fast though! It's flexible but you trade that off for speed.

    For what it's worth. :)


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Saturday, May 20, 2017 11:27 PM
  • If the data can be stored in a byte array instead of series of strings - - much faster operations can be executed.

    A suggested format would be something like : -

    [size of array - 4bytes][Entry][Entry][Entry] . . . . . . .

      Format of [Entry]

            [jump to next entry - 1byte][date - 4 bytes][Open - 3bytes][Close - ?bytes][ . . ][ . . ] . . .

    This way look ups or insertions can be executed in microseconds instead of  milliseconds.


    Pride is the most destructive force in the universe

    Saturday, May 20, 2017 11:39 PM
  • One more then I'll quit being a pest:

            ' Instead, let's make whatever we want using LINQ:
    
            Dim qry As Linq.IOrderedEnumerable(Of ExampleClass) = _
                From ec As ExampleClass In ecList _
                    Order By ec.FirstName, ec.LastName, ec.Age
    
            Stop
    
            ' Same result but more flexible to use (in my opinion)

    You can declare "qry" then later decide how you want it to work; IComparable carves it in stone.

    Do know that LINQ is not nearly as fast though! It's flexible but you trade that off for speed.

    For what it's worth. :)


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    You are never a pest. Not only are you way more knowledgeable than I, you also explain things in a way that can be understood. 
    Sunday, May 21, 2017 2:43 AM
  • You are never a pest. Not only are you way more knowledgeable than I, you also explain things in a way that can be understood. 

    I don't know about that, but what underlies all of it is the CompareTo method:

    https://msdn.microsoft.com/en-us/library/ey2t2ys5(v=vs.110).aspx

    Every time it needs to sort - order - instances of the class, it calls CompareTo to figure out the relative value of one instance against another; the result is a negative number, zero, or a positive number (typically -1, 0, or +1).

    To make my earlier example class faster, make some decisions: If you only need age, only compare age and all other properties won't be considered (so they're equal).

    Faster yet: I like to use my "retVal" as I've been doing it for years, but it's an extra step in processing. Instead, return it directly - just be sure to check that every path returns something.

    *****

    In your case, if you want speed then you can't beat the native speed. If it's a DateTime instance then ordering is based on the .Ticks value, which is a Long. That's a really fast comparison.

    To sort a List(Of YourClass), then in the CompareTo method, ONLY compare DateTime and everything else is equal.

    I don't see how else you can keep them all together and get the speed you want. You can pretty easily add in persistence (I'd use binary) and/or build import/export features to the string values that you're using.

    *****

    As for me, I'd use LINQ and accept that it's not as fast as native - but that's not to say that it's not plenty fast. :)


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Sunday, May 21, 2017 8:27 AM
  • Devon, 

    From my first days with computers I have not liked sorting with a comparer.

    Therefore, even back on mechanical machines, I tried to use a kind of tag sort.

    .NET offers that possibility as well, with the SortedList.

    I made a sample for you. Be aware that in the result I show, my DateTime setting is not USA (the day/month order cannot be recognized unless there is some other indicator).

    Public Class Form1
        Private srtlist As New SortedList(Of DateTime, String)
        Private String1 As String = "20151110, 14.5, 14.5, 9.23, 9.5, 81500, 9.5"
        Private String2 As String = "11/10/2015,14.50,14.50,9.23,9.50,81500,9.50"
        Private String3 As String = "11/10/2016,14.50,14.50,9.23,9.50,81500,9.50"
        Private String4 As String = "10/11/2015,14.50,14.50,9.23,9.50,81500,9.50"
        Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
            AddToList(String1)
            AddToList(String2)
            AddToList(String3)
            AddToList(String4)
            For Each item In srtlist.Values
                ListBox1.Items.Add(item)
            Next
        End Sub
        Private Sub AddToList(StockString As String)
            Dim arrstring = StockString.Split(","c)
            Dim dt = GetRightDateFormat(arrstring(0))
            If Not IsNothing(dt) Then
                srtlist.Add(dt, StockString)
            End If
        End Sub
        Private Function GetRightDateFormat(First As String) As DateTime
            Dim dt As DateTime
            If Date.TryParse(First, dt) Then Return dt
            If Date.TryParseExact(First, "yyyymmdd", Nothing, Nothing, dt) Then Return dt
            Return Nothing
        End Function
    End Class

    I found it a nice sample myself, and therefore I've put it on our website.

    http://www.vb-tips.com/sortedlist.aspx


    Success
    Cor



    Hi Cor

    In your example, try setting some of the strings with the same date.


    Regards Les, Livingston, Scotland

    Sunday, May 21, 2017 11:50 AM
  • I tried, Les (there is a duplicate in the sample), because I also had the idea that the SortedList would not accept duplicates.

    But do you know what is really a beginner's mistake? I used the mask "yyyymmdd" (lowercase mm means minutes, not months), and therefore it did not fail.

    I will look for a better class; otherwise I will change the sample by adding a tick to the key each time it catches an error (but that is not fast, because a failed first try takes time).

    Thanks for pointing this out.

    I could not find a better class, so I changed the sample as described. I was surprised that the sorted list seemed to accept my duplicates, but it does not, and therefore I've removed the sample from our website again.


    Thanks
    Cor






    Sunday, May 21, 2017 12:08 PM

  • so if the file has lines like:

    20100414,1198.69,1210.65,1198.69,1210.65,0,1210.65

    I should read the lines, turn the pseudo-dates into dates, add the one line of data with the date as a real date, then sort, then turn each one back to a pseudo-date and save all the lines ? Would I encapsulate each line into a class/Structure ? It seems like in order to sort by date, that would be mandatory. FYI the data source (Yahoo Finance) sends the date data as the number of seconds since Jan 01, 1970.

    If I kept it as a list (Of String), it would be just the opposite, right ? Turn everything into a pseudo-date and use the built in List.Sort, then either leave as is or convert all into a real date.

    I grabbed some of what you showed here last night. I don't think these are Unix dates, but that's what I used anyway:

    20151110,14.50,14.50,9.23,9.50,81500,9.50
    11/10/2015,14.50,14.50,9.23,9.50,81500,9.50
    19280103,17.80,17.80,17.80,17.80,0,17.80
    19280104,17.70,17.70,17.70,17.70,0,17.70
    19280105,17.60,17.60,17.60,17.60,0,17.60
    20100412,1194.94,1199.20,1194.71,1196.48,0,1196.48
    20100413,1195.94,1199.04,1188.82,1197.30,0,1197.30
    20100414,1198.69,1210.65,1198.69,1210.65,0,1210.65
    02/05/2004,16.66,16.1,16.9,16.68,6288100,16.68
    02/06/2004,16.7,16.4,17.48,17.001,544000,17.001
    02/09/2004,17.25,17.16,17.7,17.41,316700,17.41
    06/12/2013,6.57,6.5,6.58,6.5,20900,6.5
    06/13/2013,6.52,6.5,6.53,6.51,84800,6.51
    06/14/2013,6.51,6.51,6.51,6.51,0,6.51

    If those aren't meant as Unix dates then the following will need to be amended:

    Option Strict On
    Option Explicit On
    Option Infer Off

    Imports System.IO
    Imports Microsoft.VisualBasic.FileIO

    Public Class StockPrices
        Implements IComparable
        Implements IComparable(Of StockPrices)

        Private _stockPriceData As IEnumerable(Of StockPrices)
        Private _date As DateTime
        Private _open As Decimal
        Private _high As Decimal
        Private _low As Decimal
        Private _close As Decimal
        Private _adjustedClose As Decimal
        Private _volume As Long

        Public Sub New(ByVal filePath As String)

            Try
                If String.IsNullOrWhiteSpace(filePath) Then
                    Throw New ArgumentException("The text file path cannot be null or empty.")
                Else
                    Dim fi As New FileInfo(filePath)

                    Dim testData As IEnumerable(Of StockPrices) = _
                        ParseTextFilePath(fi)

                    If testData IsNot Nothing Then
                        _stockPriceData = testData
                    End If
                End If

            Catch ex As Exception
                Throw
            End Try

        End Sub

        Private Sub New(ByVal dt As DateTime, _
                        ByVal open As Decimal, _
                        ByVal high As Decimal, _
                        ByVal low As Decimal, _
                        ByVal close As Decimal, _
                        ByVal adjustedClose As Decimal, _
                        ByVal volume As Long)

            _date = dt
            _open = open
            _high = high
            _low = low
            _close = close
            _adjustedClose = adjustedClose
            _volume = volume

        End Sub

        Private Function CompareTo(ByVal obj As Object) As Integer _
            Implements IComparable.CompareTo

            If obj Is Nothing Then
                Return 1
            End If

            Dim other As StockPrices = Nothing

            If TypeOf obj Is StockPrices Then
                other = DirectCast(obj, StockPrices)
            End If

            If other Is Nothing Then
                Throw New ArgumentException("obj is not a StockPrices")
            End If

            Return CompareTo(other)

        End Function

        Private Function CompareTo(ByVal other As StockPrices) As Integer _
            Implements IComparable(Of StockPrices).CompareTo

            If other Is Nothing Then
                Return 1
            End If

            Return _date.CompareTo(other._date)

        End Function

        Public Function GetSortedData() As IEnumerable(Of StockPrices)
            Return _stockPriceData
        End Function

        Public ReadOnly Property [Date] As DateTime
            Get
                Return _date
            End Get
        End Property

        Public ReadOnly Property AdjustedClose As Decimal
            Get
                Return _adjustedClose
            End Get
        End Property

        Public ReadOnly Property Close As Decimal
            Get
                Return _close
            End Get
        End Property

        Public ReadOnly Property High As Decimal
            Get
                Return _high
            End Get
        End Property

        Public ReadOnly Property Low As Decimal
            Get
                Return _low
            End Get
        End Property

        Public ReadOnly Property Open As Decimal
            Get
                Return _open
            End Get
        End Property

        Public ReadOnly Property Volume As Long
            Get
                Return _volume
            End Get
        End Property

        Public Overrides Function ToString() As String

            Dim sb As New System.Text.StringBuilder

            sb.AppendLine(_date.ToShortDateString)
            sb.AppendLine("Open: " & _open.ToString("c2"))
            sb.AppendLine("High: " & _high.ToString("c2"))
            sb.AppendLine("Low: " & _low.ToString("c2"))
            sb.AppendLine("Close: " & _close.ToString("c2"))
            sb.Append("Volume: " & _volume.ToString)

            Return sb.ToString

        End Function

        Private Function ParseTextFilePath(ByVal fi As FileInfo) As IEnumerable(Of StockPrices)

            Dim retVal As IEnumerable(Of StockPrices) = Nothing

            fi.Refresh()

            If fi.Exists Then
                Dim tempList As New List(Of StockPrices)

                Using tfp As New TextFieldParser(fi.FullName)
                    With tfp
                        .TextFieldType = FileIO.FieldType.Delimited
                        .Delimiters = New String() {","}
                    End With

                    Dim currentLineOfText() As String = Nothing

                    While Not tfp.EndOfData
                        currentLineOfText = tfp.ReadFields()

                        If currentLineOfText.Length = 7 Then
                            Dim dt As Nullable(Of DateTime)

                            If DateTime.TryParse(currentLineOfText(0), New DateTime) Then
                                dt = CDate(currentLineOfText(0))
                            Else
                                dt = GetDT_FromUnix(currentLineOfText(0))
                            End If

                            If dt.HasValue Then
                                tempList.Add(New _
                                             StockPrices(dt.Value, _
                                                         CDec(currentLineOfText(1)), _
                                                         CDec(currentLineOfText(2)), _
                                                         CDec(currentLineOfText(3)), _
                                                         CDec(currentLineOfText(4)), _
                                                         CDec(currentLineOfText(5)), _
                                                         CLng(currentLineOfText(6))))
                            End If
                        End If
                    End While

                    If tempList.Count > 0 Then
                        tempList.Sort()
                        retVal = tempList.ToArray
                    End If
                End Using
            End If

            Return retVal

        End Function

        Private Function _
            GetDT_FromUnix(ByVal unixTimeString As String) As Nullable(Of DateTime)

            Dim retVal As Nullable(Of DateTime) = Nothing

            If Not String.IsNullOrWhiteSpace(unixTimeString) Then
                Try
                    retVal = New DateTime(1970, 1, 1, 0, 0, 0).AddSeconds(CLng(unixTimeString))

                Catch ex As Exception
                    ' Nothing to do: The return value
                    ' defaults to null...
                End Try
            End If

            Return retVal

        End Function

    End Class


    Tested it here:

    Option Strict On
    Option Explicit On
    Option Infer Off

    Public Class Form1

        Private Sub _
            Form1_Load(ByVal sender As System.Object, _
                       ByVal e As System.EventArgs) _
                       Handles MyBase.Load

            Dim desktop As String = _
                Environment.GetFolderPath(Environment.SpecialFolder.Desktop)

            Dim devonFilePath As String = _
                IO.Path.Combine(desktop, "ExamplesFromDevon.txt")

            Dim sp As New StockPrices(devonFilePath)
            Dim data As IEnumerable(Of StockPrices) = sp.GetSortedData

            Stop

        End Sub

    End Class


    Did I misunderstand about the Unix date?


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Sunday, May 21, 2017 12:50 PM
  •         'Sample String 1 -   20151110,14.50,14.50,9.23,9.50,81500,9.50
            'Sample String 2 - 11/10/2015,14.50,14.50,9.23,9.50,81500,9.50

    20151110 can't be Parsed as a Date ...

    Sure it can. :)

    After getting the characters up to the first comma, check to see if they contain a forward slash... if so, parse normally.  If not, use ParseExact and pass the format string "yyyyMMdd":

    Date.ParseExact("20151110", "yyyyMMdd", Globalization.CultureInfo.InvariantCulture)

    Now you can just use a SortedList(Of Date, String) to hold the parsed date and the updated line of text (replace the date text at the beginning before adding to the list).  After you've finished filling the list you can loop back through the values and overwrite the source file.

    This is pretty much what Frank was leading you toward, just with the ParseExact method used to parse the custom date format.
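
    A minimal sketch of the approach described above, not a definitive implementation. The module and function names (QuoteDateSketch, ParseQuoteDate, SortQuoteLines) are hypothetical, and it assumes the slash format is always US-style MM/dd/yyyy, as in the sample lines. It writes every date back as yyyyMMdd; pick whichever output format the charting software expects.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization

Module QuoteDateSketch

    ' Hypothetical helper: parses the date text that precedes the first comma.
    Function ParseQuoteDate(ByVal line As String) As Date
        Dim comma As Integer = line.IndexOf(","c)
        Dim dateText As String = If(comma < 0, line, line.Substring(0, comma))

        If dateText.Contains("/") Then
            ' Assumption: slash dates are month/day/year, as in the sample data.
            Return Date.ParseExact(dateText, "MM/dd/yyyy", CultureInfo.InvariantCulture)
        End If

        Return Date.ParseExact(dateText, "yyyyMMdd", CultureInfo.InvariantCulture)
    End Function

    ' Hypothetical helper: returns the lines ordered by date, each with a yyyyMMdd prefix.
    Function SortQuoteLines(ByVal lines As IEnumerable(Of String)) As List(Of String)
        Dim sorted As New SortedList(Of Date, String)

        For Each line As String In lines
            Dim key As Date = ParseQuoteDate(line)
            Dim comma As Integer = line.IndexOf(","c)
            Dim rest As String = If(comma < 0, String.Empty, line.Substring(comma))

            ' The indexer overwrites an existing key; Add would throw on a duplicate date.
            sorted(key) = key.ToString("yyyyMMdd", CultureInfo.InvariantCulture) & rest
        Next

        Return New List(Of String)(sorted.Values)
    End Function

End Module
```

    The returned list can be passed straight to IO.File.WriteAllLines to overwrite the source file.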


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"


    Sunday, May 21, 2017 12:55 PM
    Moderator
  • ... just with the ParseExact method used to parse the custom date format.

    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"


    It's always the simple stuff that I miss!

    :)


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Sunday, May 21, 2017 12:57 PM
  • As an addendum, it had me curious about just how long my class would take on fairly large data so I built a routine to take the "devon file" and repeat it, over and over until I got to a little more than 100,000 entries:

        Private Sub CreateLargeTextFile(ByVal sourceFilePath As String, _
                                        ByVal targetFilePath As String, _
                                        ByVal numberOfEntries As Integer)

            If Not String.IsNullOrWhiteSpace(sourceFilePath) AndAlso _
                Not String.IsNullOrWhiteSpace(targetFilePath) AndAlso _
                numberOfEntries > 0 Then

                Dim sourceFI As New IO.FileInfo(sourceFilePath)
                Dim targetFI As New IO.FileInfo(targetFilePath)

                If targetFI.Exists Then
                    targetFI.Delete()
                End If

                If sourceFI.Exists Then
                    Dim sourceText As String = IO.File.ReadAllText(sourceFI.FullName)

                    If Not String.IsNullOrWhiteSpace(sourceText) Then
                        Dim sb As New System.Text.StringBuilder
                        Dim count As Integer = 0

                        Do
                            sb.AppendLine(sourceText)
                            count += 1

                            If count >= numberOfEntries Then
                                IO.File.WriteAllText(targetFI.FullName, sb.ToString)
                                Exit Do
                            End If
                        Loop
                    End If
                End If
            End If

        End Sub


    The question is: How long will it take to read that large text file, make the decision about what to do with the text that represents a date, sort it, and hand the ordered data back:

        Private Sub _
            Form1_Load(ByVal sender As System.Object, _
                       ByVal e As System.EventArgs) _
                       Handles MyBase.Load
    
            Dim desktop As String = _
                Environment.GetFolderPath(Environment.SpecialFolder.Desktop)
    
            Dim devonFilePath As String = _
                IO.Path.Combine(desktop, "ExamplesFromDevon.txt")
    
            Dim largeDevonFilePath As String = _
                IO.Path.Combine(desktop, "LargeDevonFile.txt")
    
            ' CreateLargeTextFile(devonFilePath, largeDevonFilePath, CInt(100000 / 14))
    
            Dim sw As New Stopwatch
            sw.Start()
    
            Dim sp As New StockPrices(largeDevonFilePath) _
                With {.OddDateIsUNIX = False}
    
            Dim data As IEnumerable(Of StockPrices) = sp.GetSortedData
    
            sw.Stop()
    
            Stop
    
        End Sub

    I don't think that's too bad. ;)


    "A problem well stated is a problem half solved.” - Charles F. Kettering


    • Edited by Frank L. Smith Sunday, May 21, 2017 2:11 PM ...dangerous loop!
    Sunday, May 21, 2017 2:10 PM
  • Cor - actually the fact that a sorted list doesn't accept duplicate keys is perfect, because the files cannot have duplicated dates. I had to read the whole file each time to be sure the date was not already there, this lets me use a try catch to avoid dupes.

    Tuesday, May 23, 2017 4:01 AM
  • @Devon:

    If the date part of your data is arriving from the website as a Unix time stamp, then is it your code that is converting that to the "yyyyMMdd" and "MM/dd/yyyy" string formats? Do you actually need 2 different date string formats?

    If it were me, I'd convert the incoming Unix time stamp (and any existing date strings in the "MM/dd/yyyy" format) to the "yyyyMMdd" format, because that format will sort quickly without any custom code. (You could even use the format "yyyy/MM/dd" if that's easier to read.)

    Then, as long as each line starts with the date portion of the string, your code simply becomes:

     LS = File.ReadAllLines("<filePath>").ToList
     LS.AddRange(newData)
     LS.Sort()
     File.WriteAllLines("<filePath>", LS)
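
    As a rough sketch of the conversion step suggested above, the helpers below turn a Unix time stamp or an "MM/dd/yyyy" string into a "yyyyMMdd" key. The names DateKeySketch, UnixToKey, and SlashDateToKey are hypothetical, and the sketch assumes the time stamp is in whole seconds and that the UTC calendar date is the one wanted.

```vbnet
Imports System
Imports System.Globalization

Module DateKeySketch

    Private ReadOnly UnixEpoch As New Date(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc)

    ' Hypothetical helper: Unix seconds -> "yyyyMMdd" (UTC calendar date).
    Function UnixToKey(ByVal unixSeconds As Long) As String
        Return UnixEpoch.AddSeconds(unixSeconds).ToString("yyyyMMdd", CultureInfo.InvariantCulture)
    End Function

    ' Hypothetical helper: "MM/dd/yyyy" -> "yyyyMMdd".
    ' Text that does not match the slash format is returned unchanged.
    Function SlashDateToKey(ByVal dateText As String) As String
        Dim parsed As Date

        If Date.TryParseExact(dateText, "MM/dd/yyyy", CultureInfo.InvariantCulture, _
                              DateTimeStyles.None, parsed) Then
            Return parsed.ToString("yyyyMMdd", CultureInfo.InvariantCulture)
        End If

        Return dateText
    End Function

End Module
```

    Because "yyyyMMdd" keys have a fixed width with the most significant part first, ordering them as plain strings gives the same result as ordering them as dates, which is why the plain LS.Sort() is enough.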
    

    Tuesday, May 23, 2017 11:48 AM
  • Cor - actually the fact that a sorted list doesn't accept duplicate keys is perfect, because the files cannot have duplicated dates. I had to read the whole file each time to be sure the date was not already there, this lets me use a try catch to avoid dupes.

    Devon,

    You never mentioned this at all. I'm glad that you have your answer but you might want to experiment some with Cor's method.

    One thing in particular is flawed. If you'll look at the code:

        Private Sub AddToList(StockString As String)
            Dim arrstring = StockString.Split(","c)
            Dim dt = GetRightDateFormat(arrstring(0))
    
            If Not IsNothing(dt) Then
                Dim x As Boolean = True
                Do While x = True
                    Try
                        srtlist.Add(dt, StockString)
                        x = False
                    Catch
                        dt = dt.AddTicks(1)
                    End Try
                Loop
            End If
    
        End Sub

    He is correctly setting the key to the converted DateTime ("dt") but "StockString" still contains the original string value that's not been converted.

    If it helps, my suggested modification:

        Private Sub AddToList(StockString As String)
            Dim arrstring = StockString.Split(","c)
            Dim dt = GetRightDateFormat(arrstring(0))
    
            If dt <> Date.MinValue Then ' a failed parse returns Nothing, which is Date.MinValue for a DateTime
                Dim x As Boolean = True
                Do While x = True
                    Try
                        arrstring(0) = dt.ToShortDateString
                        srtlist.Add(dt, String.Join(",", arrstring))
                        x = False
                    Catch
                        dt = dt.AddTicks(1)
                    End Try
                Loop
            End If
    
        End Sub

    ...this lets me use a try catch to avoid dupes.

    There's a better (and faster) way:

    You can use the .ContainsKey to quickly return a Boolean value letting you know whether or not it's already in there.
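
    For illustration only, a small sketch of that check. TryAddQuote and DuplicateCheckSketch are hypothetical names, and the sketch assumes that a line whose date is already present should simply be skipped.

```vbnet
Imports System
Imports System.Collections.Generic

Module DuplicateCheckSketch

    ' Hypothetical helper: adds the line only when its date key is not already present.
    ' Returns True if the line was added, False if the date was a duplicate.
    Function TryAddQuote(ByVal list As SortedList(Of Date, String), _
                         ByVal key As Date, _
                         ByVal line As String) As Boolean

        If list.ContainsKey(key) Then
            Return False
        End If

        list.Add(key, line)
        Return True

    End Function

End Module
```

    This avoids the cost of throwing and catching an exception for every duplicate.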


    "A problem well stated is a problem half solved.” - Charles F. Kettering


    Wednesday, May 24, 2017 8:32 PM
  • Thanks again. Now the slow part is the actual data download. processing and sorting and saving are less than 100 ms per item but the raw download (400K) is around a second per item, Wget for Windows takes less than half that - more on another new question later.

    Thursday, May 25, 2017 1:16 AM
  • Thanks again. Now the slow part is the actual data download. processing and sorting and saving are less than 100 ms per item but the raw download (400K) is around a second per item, Wget for Windows takes less than half that - more on another new question later.

    Sorry for the delay.

    This is OT to this thread really but if you'd care to tell me more about what your program is and does - and where the incoming data is coming from - I might can put all of this together in one place.


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Thursday, May 25, 2017 6:51 PM
  • Thanks again. Now the slow part is the actual data download. processing and sorting and saving are less than 100 ms per item but the raw download (400K) is around a second per item, Wget for Windows takes less than half that - more on another new question later.

    Sorry for the delay.

    This is OT to this thread really but if you'd care to tell me more about what your program is and does - and where the incoming data is coming from - I might can put all of this together in one place.


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Frank - I started a new thread here - I would send you the entire project if there's any way to do so non-publicly.
    • Edited by Devon_Nullman Thursday, May 25, 2017 8:03 PM Addition
    Thursday, May 25, 2017 7:52 PM
  • Frank - I started a new thread here

    Oh web scraping.

    I'm no good with that at all. Sorry that I couldn't help.


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Thursday, May 25, 2017 7:55 PM
  • Frank - I started a new thread here

    Oh web scraping.

    I'm no good with that at all. Sorry that I couldn't help.


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Not really scraping, just getting the contents. The parsing and extracting what I need is done.

    Thursday, May 25, 2017 9:17 PM