How to use filestream( ) to read text content line by line

  • Question

  • Hi,

    Can anyone give me some ideas and reference information on

    how I can read the strings line by line from a text file by using FileStream()?

    My text file contains both single-byte and double-byte characters,

    so I think I need to use FileStream to read the bytes of the characters instead of using StreamReader.

    But FileStream seems to read all lines at once.

    Thank you.

    Monday, April 1, 2019 4:34 AM

Answers


  • >Thank you for your sample code.
    >It can read the char bytes one-by-one starting initially.
    >May I know the idea of reading line is loop to read bytes array and write bytes array to files in same loop ?
    >While there are found &HA, then pass-by to to read next char bytes.

    I'm not sure what you are asking for, but I think you want to know how to
    process the whole input file rather than just the first line.

    Here's an example that reads the test4.txt file and copies it exactly to the
    file "V2testOut.txt". This is done mainly to verify the program's logic, and
    to ensure that it is processing all lines in the input.

    Note that in the file you provide - test4.txt - the last line does not end with
    a CRLF as do all of the prior lines in the file. This has been considered in
    the code below. No CRLF is added by the program to the end of this file, if
    there isn't one in the input file. Note that I have not tested the program with
    an input file that DOES end with CRLF.

    The program also creates the file "V2testSub.txt" which contains the substrings
    you specified from each line. Each substring (line) in this file ends with CRLF.

    Sub Main()
        Dim fileName = "test4.txt"
        Dim line(1000) As Byte
        Dim lb As List(Of Byte) = New List(Of Byte)
        Dim b As Byte
    
        Dim writer As BinaryWriter = New BinaryWriter(File.Open("V2testOut.txt", FileMode.Create))
        Dim writer2 As BinaryWriter = New BinaryWriter(File.Open("V2testSub.txt", FileMode.Create))
        Dim reader As BinaryReader = New BinaryReader(File.Open(fileName, FileMode.Open))
    
        Try
            While True
                b = reader.ReadByte
                lb.Add(b)
                If b = &HA And lb.Count > 1 Then
                    If lb.Item(lb.Count - 2) = &HD Then
                        ' write line to both files
                        For x As Integer = 0 To lb.Count - 1
                            writer.Write(lb.Item(x))
                        Next
    
                        For x As Integer = 0 To lb.Count - 1
                            line(x) = lb.Item(x)
                        Next
                        ' write subs line
                        writer2.Write(line, 85, 140)
                        writer2.Write(CByte(&HD))
                        writer2.Write(CByte(&HA))
    
                        ' clear list and array
                        lb.Clear()
                        Array.Clear(line, 0, line.Length)
                    End If
                End If
    
            End While
    
        Catch ex As EndOfStreamException
            ' End of File
            If lb.Count > 0 Then
                ' write line to both files
                For x As Integer = 0 To lb.Count - 1
                    writer.Write(lb.Item(x))
                Next
    
                For x As Integer = 0 To lb.Count - 1
                    line(x) = lb.Item(x)
                Next
                ' write subs line
                writer2.Write(line, 85, 140)
                writer2.Write(CByte(&HD))
                writer2.Write(CByte(&HA))
            End If
        End Try
    
        ' release all three file handles, including the input reader
        reader.Close()
        writer.Close()
        writer2.Close()
    
    End Sub
    

    As before, I have used both a list and an array just to simplify somewhat
    the coding. It could be rewritten to use just one or the other. But I've
    contributed enough code to this project. Feel free to modify the code yourself
    if you're so inclined.

    Of course, if there are any logic or syntax errors in my code I will address
    those. But I won't keep altering the code whenever you think up some new
    requirement or enhancement. It's your project so you should be doing the
    bulk of the programming yourself.

    Note that the repetition of the file writing code could be eliminated with the
    creation of a Sub to be called in both places. I leave that as an exercise for
    you. A simple copy & paste was easier for me, and as I'm not getting paid for
    this I will follow the path of least resistance.
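
    The duplicated write logic mentioned above could be moved into one helper, roughly as sketched here. The name WriteLineAndSub and its parameters are hypothetical and not from the original answer; offset 85 and length 140 are taken from the code above. Unlike the original, which always writes 140 bytes from a zero-padded array, this sketch writes fewer bytes when a line is shorter than 225 bytes.

    ```vb
    Imports System.Collections.Generic
    Imports System.IO

    Module WriteHelperSketch

        ' Hypothetical helper (not part of the original answer).
        ' Writes the raw bytes of one line to fullWriter, then writes
        ' bytes 86..225 (offset 85, length 140) plus CRLF to subWriter.
        Sub WriteLineAndSub(lb As List(Of Byte), fullWriter As BinaryWriter, subWriter As BinaryWriter)
            Dim line As Byte() = lb.ToArray()
            fullWriter.Write(line)

            ' Clamp the count so short lines do not cause an out-of-range error.
            Dim count As Integer = Math.Min(140, Math.Max(0, line.Length - 85))
            If count > 0 Then
                subWriter.Write(line, 85, count)
            End If
            subWriter.Write(CByte(&HD))
            subWriter.Write(CByte(&HA))
        End Sub

    End Module
    ```

    Both places in the loop would then call WriteLineAndSub(lb, writer, writer2), followed by lb.Clear() inside the While loop.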

    - Wayne

    • Marked as answer by koklee Saturday, April 20, 2019 3:14 PM
    Monday, April 8, 2019 7:18 PM
  • Hello,

    See if the following might work for you. Pass your file name into OpenStreamReaderWithEncoding. Then read lines as per the following code sample starting at the Do While.

    Imports System.IO
    Imports System.Text
    
    Public Module ReadLinesWithEncodingExample
        ''' <summary>
        ''' Run this
        ''' </summary>
        ''' <param name="pFileName"></param>
        ''' <returns></returns>
        Public Function OpenStreamReaderWithEncoding(pFileName As String) As StreamReader
            Dim enc As Encoding = GetFileEncoding(pFileName)
    
            Return New StreamReader(pFileName, enc)
    
        End Function
    
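        Public Function GetFileEncoding(ByVal pFileName As String) As Encoding
            Dim enc As Encoding = Encoding.Default
    
            ' *** Detect byte order mark if any - otherwise assume default
            Dim buffer(3) As Byte
            Using fs As New FileStream(pFileName, FileMode.Open, FileAccess.Read)
                fs.Read(buffer, 0, buffer.Length)
            End Using
    
            ' The UTF-32 checks come before UTF-16 because the UTF-32 LE mark
            ' (FF FE 00 00) starts with the UTF-16 LE mark (FF FE).
            If buffer(0) = &HEF AndAlso buffer(1) = &HBB AndAlso buffer(2) = &HBF Then
                enc = Encoding.UTF8
            ElseIf buffer(0) = &HFF AndAlso buffer(1) = &HFE AndAlso buffer(2) = 0 AndAlso buffer(3) = 0 Then
                enc = Encoding.UTF32                  ' UTF-32 little-endian
            ElseIf buffer(0) = 0 AndAlso buffer(1) = 0 AndAlso buffer(2) = &HFE AndAlso buffer(3) = &HFF Then
                enc = New UTF32Encoding(True, True)   ' UTF-32 big-endian
            ElseIf buffer(0) = &HFF AndAlso buffer(1) = &HFE Then
                enc = Encoding.Unicode                ' UTF-16 little-endian
            ElseIf buffer(0) = &HFE AndAlso buffer(1) = &HFF Then
                enc = Encoding.BigEndianUnicode       ' UTF-16 big-endian
            ElseIf buffer(0) = &H2B AndAlso buffer(1) = &H2F AndAlso buffer(2) = &H76 Then
                enc = Encoding.UTF7
            End If
    
            Return enc
    
        End Function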
    
    End Module
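
    The reply refers to a Do While loop that the posted sample does not include. A minimal sketch of such a loop might look like the following. The helper name ReadAllLinesFrom and the file name are assumptions for illustration; in the thread's context the reader would come from OpenStreamReaderWithEncoding rather than being built directly.

    ```vb
    Imports System.Collections.Generic
    Imports System.IO
    Imports System.Text

    Module ReadLoopSketch

        ' Hypothetical helper: reads every line from an already opened reader.
        Function ReadAllLinesFrom(reader As StreamReader) As List(Of String)
            Dim lines As New List(Of String)
            Do While Not reader.EndOfStream
                lines.Add(reader.ReadLine())
            Loop
            Return lines
        End Function

        Sub Main()
            ' Placeholder file name; the thread uses test4.txt.
            Dim fileName As String = "test4.txt"
            ' With the module above, this reader would instead be obtained
            ' from OpenStreamReaderWithEncoding(fileName).
            Using reader As New StreamReader(fileName, Encoding.Default)
                For Each line As String In ReadAllLinesFrom(reader)
                    Console.WriteLine(line.Length)
                Next
            End Using
        End Sub

    End Module
    ```

    Note that line.Length counts decoded characters, not bytes, so the result depends on the encoding the reader was opened with.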
    



    • Marked as answer by koklee Saturday, April 20, 2019 3:15 PM
    Saturday, April 20, 2019 1:34 PM
    Moderator

All replies

  • One of the easy ways to read the lines from a text file that uses multi-byte characters such as UTF-8:

       For Each line In File.ReadLines("MyFile.txt", Encoding.UTF8)
          ' . . .
       Next

     

    If this does not work, then give some details about the encoding and line-ending.



    • Edited by Viorel_MVP Monday, April 1, 2019 5:28 AM
    Monday, April 1, 2019 5:27 AM
  • Hi,

    You can read the whole file and then split the string into lines.

    Using fsRead As FileStream = New FileStream("D:\test.txt", FileMode.Open)
        Dim fsLen As Integer = CInt(fsRead.Length)
        Dim heByte As Byte() = New Byte(fsLen - 1) {}
        Dim r As Integer = fsRead.Read(heByte, 0, heByte.Length)
        Dim myStr As String = System.Text.Encoding.UTF8.GetString(heByte)
        Dim striparr As String() = myStr.Split(New String() {vbCrLf}, StringSplitOptions.None)
    End Using

    Best Regards,

    Alex


    MSDN Community Support Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Monday, April 1, 2019 6:00 AM
  • Hi Viorel, Alex,

    Many thanks for your reply.

    As I found in the MSDN documentation, ReadLines is a method of the StreamReader class.

    It does not seem able to handle the byte/character issue, since it counts the characters at fixed byte positions wrongly.

    A sample of the text file content (there are many lines) is below:

    line1:  123456789.00  ABCDE

    line2:  234567891.00 BCDEF

    Remark:

    [123456789.00] and the spaces are single-byte characters.

    [ABCDEF] are double-byte characters.

    I want to get the characters at fixed positions for further processing,

    and then write them to a new file.

    Both reading and writing the files use the "default ASCII encoding".

    I took the following link as a reference:

    https://docs.microsoft.com/en-us/dotnet/api/system.io.filestream.read?view=netframework-4.7.2

    Is this information enough?

    Thank you. 

     


    • Edited by koklee Monday, April 1, 2019 6:56 AM
    Monday, April 1, 2019 6:54 AM
  • Hi,

    You can extract substrings based on character positions in the string:

    Imports System.IO
    Imports System.Text
    
    Public Class Form1
        Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
            Dim str As String() = File.ReadAllLines("D:\test.txt", Encoding.Default)
            TextBox1.Text = str(0).Substring(0, 17)
            TextBox2.Text = str(1).Substring(0, 17)
        End Sub
    End Class
    

    Best Regards,

    Alex



    Tuesday, April 2, 2019 6:19 AM
  • Hi Alex,

    Thank you for your advice.

    This is the problem I faced when using the String array from "File.ReadAllLines".

    With your sample, I can get the correct length by using either

    str(0).Length (your sample) or

    Byte((fsSource.Length) - 1)  (from the MSDN FileStream reference code).

    But with your code I cannot get the characters at the correct fixed positions for further processing,

    e.g. get the fixed-length run of "B" and spaces from each line of the text file,
    then write it into a new text file line by line.

    I uploaded a sample file at the link below for your reference:

    https://ufile.io/8mpuq

    I think FileStream only reads all characters of the text file at once.

    How can I get the fixed-length run of "B" and spaces line by line from the text file?
    Each line ends with CRLF.

    Thanks
    Wednesday, April 3, 2019 4:26 PM

  • >so, i think it is needed to use filestream to read the bytes of characters instead of using streamreader.


    No. If you read a text file from disk, you always read Unicode characters. The time when text was stored as (ANSI) bytes is long past. (Be aware that ASCII, with its 7-bit characters, was never used on HDD.)

    All unicode characters are at least 32bit 

    https://en.wikipedia.org/wiki/Unicode

    Therefore maybe your file is not a text file and you need to use a BinaryReader.

    https://docs.microsoft.com/en-us/dotnet/api/system.io.binaryreader?view=netframework-4.7.2


    Success
    Cor



    Wednesday, April 3, 2019 6:54 PM
  • Hi,

    Do you want to find the position of the character "B" in the string?

    TextBox1.Text = str(0).IndexOf("B").ToString()

    Best Regards,

    Alex



    Thursday, April 4, 2019 1:40 AM
  • Hi Alex,

    As mentioned previously, I want to get all "B" (from the "B" start position to the "B" end position) line by line and write them into a new file.

    But I still cannot find a way to achieve this.

    Thanks.

    Thursday, April 4, 2019 2:41 PM

  • Hi Cor,

    Do you mean I should use "BinaryReader" to handle my case (an ANSI-encoded file)?

    Can it read characters from a file and write them into another file line by line?

    Thanks

    Thursday, April 4, 2019 2:46 PM

  • I'm always surprised how people reply with StreamReader code when that has not resolved this part of your problem:

    "because my text file contains single and double bytes of characters."

    That is only possible with a non-text file (that is, a binary file). But such a file has no rows (meaning lines ended by a vbCrLf, an LF or a CR).

    First check with Notepad whether you can read the file completely. If you can, but the characters are displayed wrongly, then it can be a code page (the language used in the file) or a UTF problem. That then needs a solution like the one Viorel showed with an encoding in the first reply of this thread.

    If your file is not readable in Notepad, then you can try to solve it with the BinaryReader. Be aware that files like Doc, Exe, Avi are all binary files. It is almost impossible to decode those with little knowledge about the files.

    Based on what you have written so far, I think there is little to add to what is written on the page in the link I've already given you.


    Success
    Cor

    Thursday, April 4, 2019 4:55 PM

  • Sorry, I have no idea how to solve problem until now.

    So I uploaded the results (files and a snapshot) of using the above-mentioned methods at the link below.

    I need to read the text file. The source sample "text4.txt" is at the previously mentioned link. The correct length of the string is 486 per line.

    The methods used:

    1) FileStream

    2) File.ReadAllLines

    https://ufile.io/mw9sc

    please check.  Thanks !



    • Edited by koklee Friday, April 5, 2019 4:40 PM
    Friday, April 5, 2019 11:30 AM

  • >As mentioned in previous, I want to get all "B" (from "B" start position to "B" end position) line by line and write into a new file.

    >e.g. get all fixed length of "B" with space each line from text file.

    What *exactly* does "fixed length of "B" with space" mean?

    >I want to get all "B" (from "B" start position to "B" end position)

    What *exactly* does that mean? Do you mean you want to get all characters from
    the first "B" to the last "B"? Give a clearer example.

    Where did this file come from, and how was it created?

    Since it appears to have mixed single-byte characters and double-byte characters
    there isn't likely to be any single function that can tell where one type begins
    and ends and where the other does. Computers can't read minds (yet), so there is
    no way one can "know" exactly what is in a file or record which has arbitrary
    content and formatting.

    - Wayne

    Saturday, April 6, 2019 4:29 AM

  • >So, I uploaded the result (file and snapshot) of using the above mentioned methods at the following link:
    >need to read the text file. the source text file sample "text4.txt" is loaded at previous mentioned link.  the correct length of string is 486 per line.
    >https://ufile.io/mw9sc




    I don't know how you got a length of 389.

    If I use Viorel_'s example I get a length of 412 for each line, whether I use
    Encoding.UTF8 or leave it unspecified. If I use Encoding.ASCII I get 486 as
    the length of each line.

    Imports System.IO
    Imports System.Text
    Module Module1
        Sub Main()
            Dim ls As List(Of String) = New List(Of String)
            'For Each line In File.ReadLines("test4.txt", Encoding.UTF8)
            '412
            '412
            '412
            'For Each line In File.ReadLines("test4.txt")
            '412
            '412
            '412
            For Each line In File.ReadLines("test4.txt", Encoding.ASCII)
                '486
                '486
                '486
                Console.WriteLine(line.Length)
                ls.Add(line)
            Next
            Console.WriteLine()
            For Each s In ls
                Console.WriteLine(s.Length)
            Next
            '486
            '486
            '486
        End Sub
    End Module
    
    

    You still haven't given any info about the font, codepage/encoding, locale, etc.
    that you are using.

    - Wayne

    Saturday, April 6, 2019 4:33 AM

  • Hi Wayne

    Thank you for reply.

    [[ What *exactly* does that mean? Do you mean you want to get all characters from
    the first "B" to the last "B" ]]

    Yes, I want to get all characters from the first "B" to the last "B": positions 86 to 225 in text4.txt.
    The result is the same as the uploaded text file "FileSteam.read write result.txt".

    The code I used is the MSDN reference code for FileStream.Read:
    https://docs.microsoft.com/en-us/dotnet/api/system.io.filestream.read?view=netframework-4.7.2

    I only changed the variables "numBytesRead" and "numBytesToRead".
    However, it reads all the characters into the byte array at once instead of line by line.

    Also, I found that "ReadAllLines.Length" does not give the correct number of lines for text4.txt.
    It shows just 1 line in the debugger, as it has 486 chars per line.
    With fewer characters per line, ReadAllLines.Length counts the lines correctly.
    So, does "ReadAllLines" not work properly in my case?

    Thank you.


    • Edited by koklee Saturday, April 6, 2019 7:43 AM
    Saturday, April 6, 2019 7:34 AM

  • Hi Wayne,

    The codepage should be ASCII. So the length of 486 characters per line is correct.

    How can I get the characters at the fixed positions (86 to 225) of each line for further processing, using the sample text4.txt?

    Thank you.

    Saturday, April 6, 2019 7:39 AM

  • >The codepage should be ASCII. So, the length characters per line is 486 and it is correct.
    >How can i I get the fixed length of position( 86 to 225) of characters by line by for further process as the text source sample of text4.txt ?


    The following code is intended as a rough proof of concept example only.

    It uses a BinaryReader and BinaryWriter

    It uses a List of bytes as well as a byte array by way of examples.
    The List could probably be eliminated and just the array used.

    It only processes the first line in the input file test4.txt.

    I assume that you have the experience to be able to add what is needed to
    iterate over the input file processing all lines in the same way as the first.

    It creates a file testOut.txt that contains the entire first line from the
    input file.

    It creates a file testSub.txt that contains bytes 86 to 225 of the line as
    requested. This file matches byte for byte the "FileSteam.read write result.txt"
    file in your zip.

    Imports System.IO
    
    Module Module1
    
        Sub Main()
            Dim fileName = "test4.txt"
            Dim line(1000) As Byte
            Dim lb As List(Of Byte) = New List(Of Byte)
            Dim b As Byte
    
            Dim reader As BinaryReader = New BinaryReader(File.Open(fileName, FileMode.Open))
    
            Try
                b = reader.ReadByte
                While b <> &HA
                    lb.Add(b)
                    b = reader.ReadByte
                End While
            Catch ex As EndOfStreamException
                ' Reached end of file before finding a line feed
            End Try
            reader.Close()
    
            If lb.Item(lb.Count - 1) = &HD Then
                lb.RemoveAt(lb.Count - 1)
            Else
                Console.WriteLine("Houston, we have a problem.")
                Return
            End If
    
            ' write one line
            Dim writer As BinaryWriter = New BinaryWriter(File.Open("testOut.txt", FileMode.Create))
            For x As Integer = 0 To lb.Count - 1
                writer.Write(lb.Item(x))
            Next
            writer.Close()
    
            For x As Integer = 0 To lb.Count - 1
                line(x) = lb.Item(x)
            Next
    
            ' write subs line
            Dim writer2 As BinaryWriter = New BinaryWriter(File.Open("testSub.txt", FileMode.Create))
            writer2.Write(line, 85, 140)
            writer2.Close()
    
        End Sub
    End Module
    

    - Wayne

    Saturday, April 6, 2019 11:03 AM
  • >The codepage should be ASCII. So, the length characters per line is 486 and it is correct.
    >How can i I get the fixed length of position( 86 to 225) of characters by line by for further process as the text source sample of text4.txt ?
    >Thank you.

    The code page is not ASCII. ASCII encoding does not produce byte values greater than 127, and your test file is full of values like 162 and 208.


    My best guess, based on the byte values from the file that you say should represent the character 'B', is that the text file was created using the text encoding corresponding to Microsoft's code page 950, which Microsoft calls "big5" encoding.

    You can use this encoding to read the contents of your "test4.txt" file, and you'll get back 3 lines that look like:

    311800006.50AAAAAAA          AAAAAAA          020311800000071138  1   0BBB、B、BBB、BBB、BBB、BBB、BB、BBB、BBBB、BBB、BBBBB、BBB、BBB、BBB
              BCC                                        N                            0000.00DDD                                  EEEEEEEEEE63E
                                   FFFF            GGGGGG        H                                      4

    There are 389 characters in that line, but they occupy 486 bytes in the file.
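
    The gap between character count and byte count can be checked with a few lines of code. This is only a sketch: ByteLength is a hypothetical helper, and the sample string (three ASCII digits plus two full-width letters) is made up for illustration rather than taken from test4.txt. Each full-width letter should occupy two bytes in code page 950.

    ```vb
    Imports System.Text

    Module CharVsByteCountSketch

        ' Hypothetical helper: number of bytes a string occupies in an encoding.
        Function ByteLength(s As String, enc As Encoding) As Integer
            Return enc.GetByteCount(s)
        End Function

        Sub Main()
            ' On .NET Framework code page 950 is built in. On .NET Core / .NET 5+
            ' it is assumed to need Encoding.RegisterProvider(
            ' CodePagesEncodingProvider.Instance) before this call.
            Dim big5 As Encoding = Encoding.GetEncoding(950)

            ' Illustrative text: "123" followed by full-width A and B.
            Dim sample As String = "123" & ChrW(&HFF21) & ChrW(&HFF22)

            Console.WriteLine("Characters: " & sample.Length)
            Console.WriteLine("Bytes in code page 950: " & ByteLength(sample, big5))
        End Sub

    End Module
    ```

    String.Length counts characters after decoding, so it only matches the byte count when every character is a single byte in the file's encoding.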

    You can then use conventional String manipulation methods to pull out the characters that you are interested in. Here's an example:

    Imports System.IO
    Imports System.Text
    
    Module Module1
    
        Sub Main()
            '   change next line to use the path to your own test file
            Dim textFilePath As String = "C:\Test\Test4\Test4.txt"
            Dim textEncoding As Encoding = Encoding.GetEncoding(950)
    
            '   Uncomment the next line if your Console's
            '   OutputEncoding is Not set to 950/big5 Or similar.
            '   You may also run into problems if your Console
            '   is not using the correct Font.
            'Console.OutputEncoding = Encoding.UTF8
    
    
            For Each line As String In File.ReadLines(textFilePath, textEncoding)
                Dim charsPerLine As Integer = line.Length
    
                Dim indxFirstB As Integer = line.IndexOf("B")
                Dim indxLastB As Integer = line.LastIndexOf("B")
    
                If indxFirstB <> -1 Then
                    Dim allTheBs As String = line.Substring(indxFirstB, (indxLastB - indxFirstB) + 1)
    
                    Console.WriteLine("Line: ({0} characters per line)" & vbNewLine & line, charsPerLine)
                    Console.WriteLine()
    
                    Console.Write("First B at character offset:  " & indxFirstB)
                    Console.WriteLine(" ,   Last B at character offset:  " & indxLastB)
                    Console.WriteLine()
    
                    Console.WriteLine("Just The Bs:" & vbNewLine & allTheBs)
                    Console.WriteLine()
    
                    Console.WriteLine()
                    Console.WriteLine()
                End If
            Next
    
    
            Console.Read()
        End Sub
    
    End Module

    and it will spit out something like:

    Line: (389 characters per line)
    311800006.50AAAAAAA          AAAAAAA          020311800000071138  1   0BBB、B、BBB、BBB、BBB、BBB、BB、BBB、BBBB、BBB、BBBBB、BBB、BBB、BBB
              BCC                                        N                            0000.00DDD                                  EEEEEEEEEE63E
                                   FFFF            GGGGGG        H                                      4

    First B at character offset:  71 ,   Last B at character offset:  154

    Just The Bs:
    BBB、B、BBB、BBB、BBB、BBB、BB、BBB、BBBB、BBB、BBBBB、BBB、BBB、BBB                            B


    Note that if you are going to write that string containing just the B characters to a file, then you probably will want to use encoding 950 during the write operation, otherwise the data in your new file will not match the data in the existing file (because different encodings produce different byte values for the same character).
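
    If the positions are byte positions (86 to 225, as stated earlier in the thread) rather than character offsets, one possible approach is to re-encode each decoded line with the same encoding and slice the bytes. The following is a sketch under assumptions: SliceBytes and the output name "subOut.txt" are hypothetical, and it assumes that re-encoding a line with code page 950 reproduces the original bytes, which should hold for valid input but is not verified here. A slice boundary that falls in the middle of a double-byte character would split that character.

    ```vb
    Imports System.IO
    Imports System.Text

    Module ByteSliceSketch

        ' Hypothetical helper: returns the bytes at 1-based byte positions
        ' firstPos..lastPos of a line, measured in the given encoding.
        ' Positions past the end of the line are cut off.
        Function SliceBytes(line As String, enc As Encoding, firstPos As Integer, lastPos As Integer) As Byte()
            Dim bytes As Byte() = enc.GetBytes(line)
            Dim start As Integer = Math.Min(firstPos - 1, bytes.Length)
            Dim count As Integer = Math.Max(0, Math.Min(lastPos, bytes.Length) - start)
            Dim result(count - 1) As Byte
            Array.Copy(bytes, start, result, 0, count)
            Return result
        End Function

        Sub Main()
            ' See the earlier note: code page 950 is built in on .NET Framework.
            Dim big5 As Encoding = Encoding.GetEncoding(950)

            ' "subOut.txt" is a placeholder output name.
            Using output As New FileStream("subOut.txt", FileMode.Create)
                For Each line As String In File.ReadLines("test4.txt", big5)
                    Dim part As Byte() = SliceBytes(line, big5, 86, 225)
                    output.Write(part, 0, part.Length)
                    output.WriteByte(CByte(&HD))
                    output.WriteByte(CByte(&HA))
                Next
            End Using
        End Sub

    End Module
    ```

    Writing the sliced bytes directly avoids a second encoding step, so the output bytes match what was read as long as the round trip assumption holds.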





    • Edited by S P C Saturday, April 6, 2019 1:47 PM
    Saturday, April 6, 2019 1:46 PM

  • Hi Wayne,

    Thank you for your sample code.

    It can read the character bytes one by one from the start.

    May I ask whether the idea for reading lines is to loop, reading the byte array and writing the byte array to the files in the same loop?

    When &HA is found, it then moves on to read the next character bytes.

    Thanks.

    Monday, April 8, 2019 3:59 PM
  • The code page should be ASCII, so the length of 486 characters per line is correct.

    How can I get the characters at the fixed positions 86 to 225 of each line for further processing, as in the sample text source test4.txt?

    Thank you.

    The code page is not ASCII. ASCII encoding does not produce byte values greater than 127, and your test file is full of values like 162 and 208.


    My best guess, based on the byte values from the file that you say should represent the character 'B', is that the text file was created using the text encoding corresponding to Microsoft's code page 950, which Microsoft calls "big5" encoding.

    You can use this encoding to read the contents of your "test4.txt" file, and you'll get back 3 lines that look like:

    311800006.50AAAAAAA          AAAAAAA          020311800000071138  1   0BBB、B、BBB、BBB、BBB、BBB、BB、BBB、BBBB、BBB、BBBBB、BBB、BBB、BBB
              BCC                                        N                            0000.00DDD                                  EEEEEEEEEE63E
                                   FFFF            GGGGGG        H                                      4

    There are 389 characters in that line, but they occupy 486 bytes in the file.

    You can then use conventional String manipulation methods to pull out the characters that you are interested in. Here's an example:

    Imports System.IO
    Imports System.Text
    
    Module Module1
    
        Sub Main()
            '   change next line to use the path to your own test file
            Dim textFilePath As String = "C:\Test\Test4\Test4.txt"
            Dim textEncoding As Encoding = Encoding.GetEncoding(950)
    
            '   Uncomment the next line if your Console's
            '   OutputEncoding is Not set to 950/big5 Or similar.
            '   You may also run into problems if your Console
            '   is not using the correct Font.
            'Console.OutputEncoding = Encoding.UTF8
    
    
            For Each line As String In File.ReadLines(textFilePath, textEncoding)
                Dim charsPerLine As Integer = line.Length
    
                Dim indxFirstB As Integer = line.IndexOf("B")
                Dim indxLastB As Integer = line.LastIndexOf("B")
    
                If indxFirstB <> -1 Then
                    Dim allTheBs As String = line.Substring(indxFirstB, (indxLastB - indxFirstB) + 1)
    
                    Console.WriteLine("Line: ({0} characters per line)" & vbNewLine & line, charsPerLine)
                    Console.WriteLine()
    
                    Console.Write("First B at character offset:  " & indxFirstB)
                    Console.WriteLine(" ,   Last B at character offset:  " & indxLastB)
                    Console.WriteLine()
    
                    Console.WriteLine("Just The Bs:" & vbNewLine & allTheBs)
                    Console.WriteLine()
    
                    Console.WriteLine()
                    Console.WriteLine()
                End If
            Next
    
    
            Console.Read()
        End Sub
    
    End Module

    and it will spit out something like:

    Line: (389 characters per line)
    311800006.50AAAAAAA          AAAAAAA          020311800000071138  1   0BBB、B、BBB、BBB、BBB、BBB、BB、BBB、BBBB、BBB、BBBBB、BBB、BBB、BBB
              BCC                                        N                            0000.00DDD                                  EEEEEEEEEE63E
                                   FFFF            GGGGGG        H                                      4

    First B at character offset:  71 ,   Last B at character offset:  154

    Just The Bs:
    BBB、B、BBB、BBB、BBB、BBB、BB、BBB、BBBB、BBB、BBBBB、BBB、BBB、BBB                            B


    Note that if you are going to write that string containing just the B characters to a file, then you probably will want to use encoding 950 during the write operation, otherwise the data in your new file will not match the data in the existing file (because different encodings produce different byte values for the same character).
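
    As a rough illustration of that last point, here is a minimal sketch of writing a string back out with encoding 950. The sample text, the output file name and the module name are made up for the example, and it assumes the .NET Framework on Windows, where code page 950 is available without extra setup.

    ```vb
    Imports System
    Imports System.IO
    Imports System.Text

    Module WriteBig5Sketch

        Sub Main()
            ' Assumption: .NET Framework on Windows, where code page 950 is built in.
            Dim big5 As Encoding = Encoding.GetEncoding(950)

            ' Hypothetical sample text and output path, for illustration only.
            Dim allTheBs As String = "BBB、B"
            Dim outPath As String = "JustTheBs.txt"

            ' Character count and byte count differ once double-byte characters are present.
            Console.WriteLine("Characters: " & allTheBs.Length)
            Console.WriteLine("Bytes in code page 950: " & big5.GetByteCount(allTheBs))

            ' Write with the same encoding the source file uses so the bytes match.
            File.WriteAllText(outPath, allTheBs, big5)
        End Sub

    End Module
    ```

    Writing the same string without the encoding argument would use UTF-8, so the bytes in the new file would no longer line up with the original.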





    Hi SPC,

    Many thanks for your idea for solving the problem.

    I think Wayne's code idea is better suited to my problem.

    The "B" characters are not fixed; they can change to any other set of strings.

    I would like to always get the characters at positions 86 to 225, line by line.

    This should work regardless of the combination of single-byte and double-byte characters in each line (the length is fixed at 486 bytes).

    ReadLine gives fluctuating results when lines contain different combinations of single-byte and double-byte characters.

    Monday, April 8, 2019 4:13 PM

  • Many thanks for your idea for solving the problem.

    I think Wayne's code idea is better suited to my problem.

    The "B" characters are not fixed; they can change to any other set of strings.

    I would like to always get the characters at positions 86 to 225, line by line.

    This should work regardless of the combination of single-byte and double-byte characters in each line (the length is fixed at 486 bytes).

    ReadLine gives fluctuating results when lines contain different combinations of single-byte and double-byte characters.

    Koklee,

    Be aware that very few people dare to open a link given on the Internet. Therefore, if you provide information, provide it here.

    I advised you to try opening your file in Notepad; if that does not work, you cannot read it with a text reader in any format. You never replied whether your file opens in Notepad without errors.

    It is difficult to tell you something, so let me try another way. What kind of system are you talking about? This forum is about Windows systems, which come in 16-bit, 32-bit and 64-bit versions. A string is always a string of characters, and a Char on all of those systems is 16 bits, not one byte.

    ASCII dates from the time of the 7-bit paper tape reader. It contains control codes and character codes. ASCII encoding in .NET stores each character in a single byte.

    I said in my previous message in this thread that a file can be binary. In that case it is in fact a sequence of bytes, and you are the one who has to give it structure. That cannot be done with normal text handling code; you really should handle it byte by byte.

    Are you maybe working with a Commodore computer or something like that? This forum is meant for Visual Basic on 32-bit and 64-bit Windows systems.


    Success
    Cor


    Monday, April 8, 2019 4:49 PM

  • Thank you for your sample code.

    It can read the character bytes one by one from the start.

    May I ask whether the idea for reading lines is to loop, reading the byte array and writing the byte array to the files in the same loop?

    When &HA is found, it then moves on to read the next character bytes.

    I'm not sure what you are asking for, but I think you want to know how to
    process the whole input file rather than just the first line.

    Here's an example that reads the test4.txt file and copies it exactly to the
    file "V2testOut.txt". This is done mainly to verify the program's logic, and
    to ensure that it is processing all lines in the input.

    Note that in the file you provide - test4.txt - the last line does not end with
    a CRLF as do all of the prior lines in the file. This has been considered in
    the code below. No CRLF is added by the program to the end of this file, if
    there isn't one in the input file. Note that I have not tested the program with
    an input file that DOES end with CRLF.

    The program also creates the file "V2testSub.txt" which contains the substrings
    you specified from each line. Each substring (line) in this file ends with CRLF.

    Sub Main()
        Dim fileName = "test4.txt"
        Dim line(1000) As Byte
        Dim lb As List(Of Byte) = New List(Of Byte)
        Dim b As Byte
    
        Dim writer As BinaryWriter = New BinaryWriter(File.Open("V2testOut.txt", FileMode.Create))
        Dim writer2 As BinaryWriter = New BinaryWriter(File.Open("V2testSub.txt", FileMode.Create))
        Dim reader As BinaryReader = New BinaryReader(File.Open(fileName, FileMode.Open))
    
        Try
            While True
                b = reader.ReadByte
                lb.Add(b)
                If b = &HA AndAlso lb.Count > 1 Then
                    If lb.Item(lb.Count - 2) = &HD Then
                        ' write line to both files
                        For x As Integer = 0 To lb.Count - 1
                            writer.Write(lb.Item(x))
                        Next
    
                        For x As Integer = 0 To lb.Count - 1
                            line(x) = lb.Item(x)
                        Next
                        ' write subs line
                        writer2.Write(line, 85, 140)
                        writer2.Write(CByte(&HD))
                        writer2.Write(CByte(&HA))
    
                        ' clear list and array
                        lb.Clear()
                        Array.Clear(line, 0, line.Length)
                    End If
                End If
    
            End While
    
        Catch ex As EndOfStreamException
            ' End of File
            If lb.Count > 0 Then
                ' write line to both files
                For x As Integer = 0 To lb.Count - 1
                    writer.Write(lb.Item(x))
                Next
    
                For x As Integer = 0 To lb.Count - 1
                    line(x) = lb.Item(x)
                Next
                ' write subs line
                writer2.Write(line, 85, 140)
                writer2.Write(CByte(&HD))
                writer2.Write(CByte(&HA))
            End If
        End Try
    
        reader.Close()
        writer.Close()
        writer2.Close()
    
    End Sub
    

    As before, I have used both a list and an array just to simplify somewhat
    the coding. It could be rewritten to use just one or the other. But I've
    contributed enough code to this project. Feel free to modify the code yourself
    if you're so inclined.

    Of course, if there are any logic or syntax errors in my code I will address
    those. But I won't keep altering the code whenever you think up some new
    requirement or enhancement. It's your project so you should be doing the
    bulk of the programming yourself.

    Note that the repetition of the file writing code could be eliminated with the
    creation of a Sub to be called in both places. I leave that as an exercise for
    you. A simple copy & paste was easier for me, and as I'm not getting paid for
    this I will follow the path of least resistance.
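
    A minimal sketch of the kind of Sub described above, for anyone who wants a starting point. WriteLineAndSub, its parameter names and the two output file names are hypothetical. Unlike the program above, it writes only the bytes the line actually contains rather than padding a short line with zero bytes from the array.

    ```vb
    Imports System
    Imports System.Collections.Generic
    Imports System.IO

    Module WriteLineHelperSketch

        ' Hypothetical helper: writes the whole line to fullWriter, then the
        ' byte range starting at offset (up to count bytes) plus CRLF to subWriter.
        Sub WriteLineAndSub(lb As List(Of Byte), fullWriter As BinaryWriter,
                            subWriter As BinaryWriter, offset As Integer, count As Integer)
            Dim bytes As Byte() = lb.ToArray()
            fullWriter.Write(bytes)

            ' Only copy what the line actually contains, so a short line
            ' cannot cause an out-of-range error.
            Dim available As Integer = Math.Max(0, Math.Min(count, bytes.Length - offset))
            If available > 0 Then
                subWriter.Write(bytes, offset, available)
            End If
            subWriter.Write(CByte(&HD))
            subWriter.Write(CByte(&HA))
        End Sub

        Sub Main()
            ' Tiny demonstration with the bytes "ABCD" followed by CRLF.
            Dim lb As New List(Of Byte)(New Byte() {65, 66, 67, 68, &HD, &HA})
            Using full As New BinaryWriter(File.Open("helperOut.txt", FileMode.Create)),
                  part As New BinaryWriter(File.Open("helperSub.txt", FileMode.Create))
                WriteLineAndSub(lb, full, part, 1, 2)   ' copies the bytes for "BC" plus CRLF
            End Using
        End Sub

    End Module
    ```

    In the program above, both copies of the writing code could then be replaced by a call such as WriteLineAndSub(lb, writer, writer2, 85, 140).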

    - Wayne

    • Marked as answer by koklee Saturday, April 20, 2019 3:14 PM
    Monday, April 8, 2019 7:18 PM

  • Hi Wayne,

    I appreciate your help and the code. Your code is very useful in showing me how to turn an idea into code. Thank you.

    Actually, I enjoy working on problems like this every day. I will try your suggestions.

    Also, I found that if the sample source text file is saved as either "Unicode" or "UTF-8", the output (V2testSub.txt) is totally different (the characters come from the wrong positions).

    How can I convert the source text file to ANSI encoding in VB.NET before processing the bytes?

    Many thanks!



    • Edited by koklee Saturday, April 20, 2019 12:50 PM
    Saturday, April 20, 2019 11:54 AM
  • Hello,

    See if the following might work for you. Pass your file name into OpenStreamReaderWithEncoding. Then read lines as per the following code sample starting at the Do While.

    Imports System.IO
    Imports System.Text
    
    Public Module ReadLinesWithEncodingExample
        ''' <summary>
        ''' Opens a StreamReader on the file using the encoding detected from its byte order mark.
        ''' </summary>
        ''' <param name="pFileName">Path of the text file to open.</param>
        ''' <returns>A StreamReader that the caller is responsible for disposing.</returns>
        Public Function OpenStreamReaderWithEncoding(pFileName As String) As StreamReader
            Dim enc As Encoding = GetFileEncoding(pFileName)
    
            Return New StreamReader(pFileName, enc)
    
        End Function
    
        Public Function GetFileEncoding(ByVal pFileName As String) As Encoding
            Dim enc As Encoding = Encoding.Default
    
            ' *** Detect byte order mark if any - otherwise assume default
            Dim buffer(4) As Byte
            Using file As New FileStream(pFileName, FileMode.Open, FileAccess.Read)
                file.Read(buffer, 0, 5)
            End Using
    
            If buffer(0) = &HEF AndAlso buffer(1) = &HBB AndAlso buffer(2) = &HBF Then
                enc = Encoding.UTF8
            ElseIf buffer(0) = &HFF AndAlso buffer(1) = &HFE AndAlso buffer(2) = 0 AndAlso buffer(3) = 0 Then
                enc = Encoding.UTF32                ' UTF-32 little endian (test before UTF-16 LE)
            ElseIf buffer(0) = &HFF AndAlso buffer(1) = &HFE Then
                enc = Encoding.Unicode              ' UTF-16 little endian
            ElseIf buffer(0) = &HFE AndAlso buffer(1) = &HFF Then
                enc = Encoding.BigEndianUnicode     ' UTF-16 big endian
            ElseIf buffer(0) = 0 AndAlso buffer(1) = 0 AndAlso buffer(2) = &HFE AndAlso buffer(3) = &HFF Then
                enc = New UTF32Encoding(True, True) ' UTF-32 big endian
            ElseIf buffer(0) = &H2B AndAlso buffer(1) = &H2F AndAlso buffer(2) = &H76 Then
                enc = Encoding.UTF7
            End If
    
            Return enc
    
        End Function
    
    End Module
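
    The sample does not show the Do While loop mentioned above. A minimal sketch of such a loop, which also writes each line back out in an ANSI code page, might look like the following. It uses StreamReader's built-in byte order mark detection rather than GetFileEncoding so that it stands alone. ConvertFile, the file names and the choice of code page 950 are assumptions for illustration, and it assumes the .NET Framework, where that code page is available without extra setup.

    ```vb
    Imports System.IO
    Imports System.Text

    Module ConvertToAnsiSketch

        ' Copies a text file line by line, re-encoding it on the way out.
        ' sourcePath, targetPath and targetEncoding are hypothetical parameter names.
        Sub ConvertFile(sourcePath As String, targetPath As String, targetEncoding As Encoding)
            ' The third argument (True) asks StreamReader to honour a byte order mark
            ' if present; Encoding.Default is only the fallback when there is none.
            Using reader As New StreamReader(sourcePath, Encoding.Default, True),
                  writer As New StreamWriter(targetPath, False, targetEncoding)
                Dim line As String = reader.ReadLine()
                Do While line IsNot Nothing
                    writer.WriteLine(line)
                    line = reader.ReadLine()
                Loop
            End Using
        End Sub

        Sub Main()
            ' Assumed file names; code page 950 is the encoding suggested earlier in the thread.
            ConvertFile("test4_unicode.txt", "test4_ansi.txt", Encoding.GetEncoding(950))
        End Sub

    End Module
    ```

    Note that WriteLine ends every line with CRLF, so the converted file gains a final CRLF even if the source file's last line had none.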
    



    • Marked as answer by koklee Saturday, April 20, 2019 3:15 PM
    Saturday, April 20, 2019 1:34 PM
    Moderator
  • Hi Karen,

    Thank you for your code sample.

    It can convert a Unicode text file to ANSI encoding now.

    Saturday, April 20, 2019 3:14 PM

  • Hi Cor,

    Thank you for your advice.

    Honestly, it is quite hard for me to understand how to solve the problem of fixed positions with characters of different byte lengths. It can be solved by using the byte read and write method shared by WayneAKing.

    Code samples for this are rare on the internet, even in the MSDN code reference.

    I appreciate all of your help and the direction you gave on how to solve the problem.

    Thank you!

    Saturday, April 20, 2019 3:28 PM
  • Hi SPC,

    Thanks a lot for your reply and advice. It is actually a hard topic for me.

    Saturday, April 20, 2019 3:30 PM