downloading text file

Answers

  • User-434868552 posted

    @rds80

    simply click your link:  http://www.wpc.ncep.noaa.gov/discussions/pmdspdbody.txt 

    you will see there are new line characters.

    Run this part of your code:

    System.Net.WebClient wc = new System.Net.WebClient();
    byte[] dataBuffer = wc.DownloadData("http://www.wpc.ncep.noaa.gov/discussions/pmdspdbody.txt");

    inspect dataBuffer, newline characters are present:

    Byte[] (2558 items) 
    83 104 111 114 116 32 82 97 110 103 101 32 70 111 114 101 99 97 115 116 32 68 105 115 99 117 115 115 105 111 110 
    10 <==   \n    newline 
    
    78 87 83 32 87 101 97 116 104 101 114 32 80 114 101 100 105 99 116 105 111 110 32 67 101 110 116 101 114 32 67 111 108 108 101 103 101 32 80 97 114 107 32 77 68 
    10 <==   \n    newline
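    If you would rather not eyeball the dump, a small helper can count the line-ending bytes for you. This is only a sketch; the class and method names are made up for illustration and are not part of your code.

```csharp
using System;

public static class LineEndingProbe
{
    // Counts LF (byte 10) and CR (byte 13) occurrences in a raw buffer.
    public static (int Lf, int Cr) Count(byte[] buffer)
    {
        int lf = 0, cr = 0;
        foreach (byte b in buffer)
        {
            if (b == 10) lf++;
            else if (b == 13) cr++;
        }
        return (lf, cr);
    }
}
```

    Calling LineEndingProbe.Count(dataBuffer) on a buffer that has LF bytes and no CR bytes would confirm that the feed uses newline-only line endings.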

    after:

    String download = System.Text.Encoding.UTF8.GetString(dataBuffer);

    download looks like this:

    Short Range Forecast Discussion
    NWS Weather Prediction Center College Park MD
    259 AM EST Tue Jan 05 2016
    
        et cetera

    So you see, you do have line breaks.

    Next:

    System.IO.File.WriteAllText(@"C:\Temp\file.txt", download, System.Text.Encoding.Unicode);

    creates a Unicode text file (2 x 2558) + 2 == 5118 bytes long.
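    The size arithmetic can be checked in code. This is a sketch that assumes every downloaded byte decoded to one plain ASCII character, so each character takes exactly two bytes in UTF-16; the helper name is hypothetical.

```csharp
using System.Text;

public static class Utf16Size
{
    // Estimates the bytes written when text is saved with Encoding.Unicode:
    // the encoding's preamble (byte order mark) plus the encoded characters.
    public static int FileBytes(string text)
    {
        Encoding utf16 = Encoding.Unicode;
        return utf16.GetPreamble().Length + utf16.GetByteCount(text);
    }
}
```

    For a 2558-character ASCII string this is 2 bytes of preamble plus 2 bytes per character.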

    When i open that file in NotePad, i see this (without the bolding):

    Short Range Forecast DiscussionNWS Weather Prediction Center College Park MD259 AM EST T  et cetera

    This is because NotePad does not treat a lone newline (\n, LF) as an end of line; it expects \r\n (CRLF).

    Programs like NotePad2 open the file correctly:

    Short Range Forecast Discussion
    NWS Weather Prediction Center College Park MD
    259 AM EST Tue Jan 05 2016
    
              et cetera

    One way to fix your problem is add this line before writing the output to make programs like NotePad happy:

    download = download.Replace("\n", "\r\n");

    NotePad will be happy and other programs like NotePad2 will still work as expected.

    edit:

    Note:  the Unicode file gets an extra 2 bytes at the start:  x'FFFE' ... this tells Windows this is a Unicode text file; more specifically:

    UTF-16, little endian

    see https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396#

    "Using Special Characters in Unicode ==> Using Byte Order Marks"

    end edit.

    EDIT #2:

    N.B.:  DO NOT write to the root of your drive:

    @"C:\Temp\file.txt"  // place file in directory Temp

    END EDIT #2.

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Tuesday, January 5, 2016 9:29 AM

All replies

  • User364480375 posted

    rds80, is the downloaded text document not displaying properly, or do you want to read the text document? Kindly clarify that part.

    Tuesday, January 5, 2016 5:43 AM
  • User1124521738 posted

    This is a UNIX/Linux vs Windows issue.  Windows does CRLF and *NIX does LF only.  Use something like notepad++ to open and line terminators will be honored.

    You can do a replace in the string, but test for the existence of \r\n or Environment.NewLine in the string body first, else you may end up with a mix of crlf and crcrlf in the body and no editor will like that.  I'll often replace crlf with lf and cr with lf, then take lf to crlf, just to cover all the likely combinations (Macs historically did cr only in the days before OS X).
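    A minimal sketch of that three-step replace, assuming the text is already in a string; the class and method names are made up for illustration.

```csharp
public static class NewlineNormalizer
{
    // Collapses CRLF and lone CR to LF first, then expands every LF to CRLF,
    // so input with mixed line endings comes out with CRLF only.
    public static string ToCrLf(string text)
    {
        return text
            .Replace("\r\n", "\n")
            .Replace("\r", "\n")
            .Replace("\n", "\r\n");
    }
}
```

    The order matters: converting crlf to lf first means the final step never sees an existing \r in front of a \n.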

    See also http://stackoverflow.com/questions/140926/normalize-newlines-in-c-sharp and http://www.dotnetperls.com/whitespace

    Tuesday, January 5, 2016 9:51 AM
  • User1094877758 posted

    Thanks Gerry.  I have to say this is an amazing and a very thorough answer.

    As you suggested, I have placed the file into a specific subfolder off the root C directory. 

    Tuesday, January 5, 2016 2:14 PM
  • User-434868552 posted

    @rds80

    My pleasure; you are welcome.

    i had the advantage of having looked at the contents of dataBuffer using LINQPad's .Dump method; however, as ninianne98 pointed out above, if you are not sure what types of line ending characters are causing you grief, you may need to do more analysis and possibly have additional replaces.

    Example, if your data looked something roughly like this:

    ......\n....\r\n....\n

    replacing     \n with   \r\n     would give you

    ......\r\n....\r\r\n....\r\n

    Now you've got a bigger problem.  TIMTOWTDI

    First, for  ......\n....\r\n....\n     replace    \r\n    with  \n    giving

    ......\n....\n....\n

    Next replace     \n     with    \r\n      giving

    ......\r\n....\r\n....\r\n
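    Since TIMTOWTDI, here is one more way, sketched with a regular expression instead of the two Replace calls described above. It is a different technique, not something from the original code, and the names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class NewlineRegex
{
    // Rewrites every line ending as CRLF in a single pass.
    // \r\n is listed first so an existing CRLF pair is consumed as one match
    // rather than as a CR followed by a separate LF.
    public static string ToCrLf(string text)
    {
        return Regex.Replace(text, @"\r\n|\r|\n", "\r\n");
    }
}
```

    Because each ending is matched exactly once, an existing \r\n is left as \r\n instead of growing into \r\r\n.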

    MORE  INFORMATION

    \r\r\n   i.e.,   x'0d0d0a' will drive end users towards insanity because many text editors are not prepared to handle CRCRLF

    Recommended reading:   https://en.wikipedia.org/wiki/Newline 

    EDIT:

    i use Jan Fiala's freeware programmer's editor PSPad for examining files when i need to view the data as hexadecimal.  http://pspad.com  

    END EDIT.
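    If no hex editor is handy, the first few bytes of a file can be shown as hex from code. A rough sketch only; the helper name is made up, and the bytes would come from something like System.IO.File.ReadAllBytes.

```csharp
using System;

public static class HexPeek
{
    // Formats up to 'count' leading bytes as space-separated hex pairs.
    public static string Head(byte[] data, int count)
    {
        int n = Math.Min(count, data.Length);
        // BitConverter.ToString yields "FF-FE-53"; swap the dashes for spaces.
        return BitConverter.ToString(data, 0, n).Replace("-", " ");
    }
}
```

    A CRCRLF sequence shows up as 0D 0D 0A in this form, which makes it easy to spot.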

    Tuesday, January 5, 2016 11:33 PM