A dummy XML file is failing

  • Question

  • A dummy XML file is failing. Here it is:

    That's the dummy input file. Here is the executing VB2008 code that the file is failing on:

    Imports System.IO

    Public Class Form1

        Private FileName As String = "C:\Temp\DataBase\thefile.xml"
        Dim DS As New DataSet

        Private Sub Form1_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
            Dim loc As String : Dim writing As String = "" : Dim SavdLoc As String = "" : Dim files As Integer = 0

            OpenFile(FileName, DS)

            For Count As Integer = 0 To DG1.RowCount - 2
                If Not DG1.Rows(Count).Cells(1).Value(0) = "*" Then
                    loc = DG1.Rows(Count).Cells(0).Value.ToString
                    If SavdLoc = loc Then
                        writing = DG1.Rows(Count).Cells(1).Value.ToString
                        files += 1
                    Else
                        writing = DG1.Rows(Count).Cells(0).Value.ToString
                    End If
                    SavdLoc = loc
                End If
                tb1.Text = tb1.Text + vbCrLf + writing
            Next

            AdjustIO(tb1)
        End Sub

        Private Function AdjustIO(ByVal TB As TextBox) As Size
            'This routine does nothing
            Dim foo As TextBox : Dim siz As New Size
            siz.Height = 20
            TB.Size = Me.Size
            Return TB.Size
        End Function

        Private Sub OpenFile(ByRef fileSpecification As String, ByRef dats As DataSet)
            Dim SR As New IO.StreamReader(fileSpecification, System.Text.Encoding.Unicode)
            DS.ReadXml(SR) 'fails here
            DG1.DataSource = dats
            DG1.DataMember = "FILE_LIST"
            SR.Close()
        End Sub

    End Class


    The line marked 'fails here' (DS.ReadXml in OpenFile) is the point of a silent failure, with nothing returned. A read datafile requires about an hour to process. I don't have an hour for each failure, so I am trying the failing dummy file. I've never seen the DataSet method used before; this is one such use, and no error is returned.
    Renee

    October 14, 2009 2:16

Answer

  • Renee

    This is all about the differences in the encoding with different files, as discussed in the earlier thread.  I've looked into it a little more and believe the following is correct.

    If there is no BOM then the reader assumes it is UTF-8.

    Your original file doesn't have a BOM but it uses UTF-16.

    Depending upon how you edit the file it may get saved as UTF-8 (VS) or UTF-16 (Notepad).

    It's all a bit of a mess really.

    So you need to work out for yourself what encoding is being used.

    As far as I can tell, the .NET Framework provides no general way to do this when there is no BOM.

    There doesn't appear to be a foolproof way of doing it.

    I've come up with this which works for all the files I have, although it is not exhaustive.

    BTW - I've never seen the condition where the read takes an enormously long time.

    Imports System.IO

    Public Class Form1

        Private FileName As String = "thefile.xml"
        Dim DS As New DataSet

        Private Sub Form1_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
            OpenFile(FileName)
        End Sub

        Private Sub OpenFile(ByRef Filename As String)
            ' Guess the encoding first, then read the XML with it.
            Dim Coding As System.Text.Encoding = GetEncoding(Filename)
            Dim SR As New IO.StreamReader(Filename, Coding)
            DS.ReadXml(SR)
            DG1.DataSource = DS
            DG1.DataMember = "FILE_LIST"
            SR.Close()
        End Sub

        Private Function GetEncoding(ByVal Filename As String) As System.Text.Encoding
            Dim Coding As System.Text.Encoding

            ' Read only the first two bytes of the file.
            Dim FS As New FileStream(Filename, FileMode.Open, FileAccess.Read)
            Dim Header(1) As Byte
            FS.Read(Header, 0, 2)
            FS.Close()

            ' Hex drops leading zeros, which is harmless for the three BOM patterns tested here.
            Dim HeaderString As String = Hex(Header(0)) & Hex(Header(1))

            Select Case HeaderString
                Case "FFFE"
                    ' Unicode UTF-16, little endian
                    Coding = System.Text.Encoding.Unicode
                Case "FEFF"
                    ' Unicode UTF-16, big endian
                    Coding = System.Text.Encoding.BigEndianUnicode
                Case "EFBB"
                    ' probably UTF-8
                    Coding = System.Text.Encoding.UTF8
                Case Else   ' No BOM (maybe)
                    If Header(0) = 0 Then
                        ' probably big endian
                        Coding = System.Text.Encoding.BigEndianUnicode
                    ElseIf Header(1) = 0 Then
                        ' probably little endian
                        Coding = System.Text.Encoding.Unicode
                    Else
                        ' probably UTF-8 but I wouldn't put money on it
                        Coding = System.Text.Encoding.UTF8
                    End If
            End Select

            Return Coding
        End Function

    End Class

    October 14, 2009 11:07
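
For the case where a BOM is present, a partial alternative to hand-rolled detection is the `detectEncodingFromByteOrderMarks` argument of `StreamReader`. The sketch below is only an illustration: `OpenWithBomDetection`, the module name, and the temporary test file are assumptions made for the example, and the fallback encoding is still a guess whenever the file has no BOM, so it does not cover the BOM-less UTF-16 case described above.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module BomDetectionSketch

    ' Hypothetical helper: let StreamReader honour a BOM if the file has one.
    ' fallback is used only when no BOM is found.
    Function OpenWithBomDetection(ByVal fileName As String, ByVal fallback As Encoding) As StreamReader
        Return New StreamReader(fileName, fallback, True)
    End Function

    Sub Main()
        Dim tempFile As String = Path.GetTempFileName()
        Try
            ' Encoding.Unicode writes the FF FE preamble at the start of the file.
            File.WriteAllText(tempFile, "<ROOT />", Encoding.Unicode)

            ' Deliberately pass the "wrong" fallback to show that the BOM wins.
            Using reader As StreamReader = OpenWithBomDetection(tempFile, Encoding.UTF8)
                Dim text As String = reader.ReadToEnd()
                ' CurrentEncoding reflects the detected encoding only after the first read.
                Console.WriteLine(reader.CurrentEncoding.WebName)
                Console.WriteLine(text)
            End Using
        Finally
            File.Delete(tempFile)
        End Try
    End Sub

End Module
```

Passing the file name straight to `DS.ReadXml` instead of a `StreamReader` is another option worth testing, since the XML parser then inspects the leading bytes itself; that behaviour has not been verified here against the files in this thread.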

All replies

  • WRITER_METADATA xmlns="x-schema:#VssWriterMetadataInfo" version="1.2">
      <IDENTIFICATION writerId="e8132975-6f93-4464-a53e-1050253ae220" instanceId="4998e264-4773-484c-9ec9-457c4fcb38c1" friendlyName="System Writer" usage="BOOTABLE_SYSTEM_STATE" dataSource="OTHER" majorVersion="0" minorVersion="0" /> 
      <RESTORE_METHOD method="REPLACE_AT_REBOOT_IF_CANNOT_REPLACE" writerRestore="never" rebootRequired="yes" />
    <BACKUP_LOCATIONS>
    <FILE_GROUP componentName="System Files" caption="System Files" restoreMetadata="no" notifyOnBackupComplete="no" selectable="no" selectableForRestore="no" componentFlags="0">
      <FILE_LIST path="C:\Windows\system32\CatRoot\{127D0A1D-4EF2-11D1-8608-00C04FC295EE}" filespec="*" recursive="yes" filespecBackupType="3855" /> 
      <FILE_LIST path="C:\Windows\system32\CatRoot\{F750E6C3-38EE-11D1-85E5-00C04FC295EE}" filespec="*" recursive="yes" filespecBackupType="3855" /> 
      <FILE_LIST path="C:\Windows\system32\CatRoot2\{127D0A1D-4EF2-11D1-8608-00C04FC295EE}" filespec="*" recursive="yes" filespecBackupType="3855" /> 
      <FILE_LIST path="C:\Windows\system32\CatRoot2\{F750E6C3-38EE-11D1-85E5-00C04FC295EE}" filespec="*" recursive="yes" filespecBackupType="3855" /> 
      <FILE_LIST path="C:\Windows\winsxs" filespec="*.*" recursive="yes" filespecBackupType="3855" /> 
      <FILE_LIST path="C:\Windows\servicing\packages" filespec="*.*" recursive="yes" filespecBackupType="3855" /> 
      <FILE_LIST path="C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys" filespec="*.*" recursive="yes" filespecBackupType="3855" /> 
      <FILE_LIST path="C:\Windows\System32\Microsoft\Protect" filespec="*.*" recursive="yes" filespecBackupType="3855" /> 
      <FILE_LIST path="C:\Windows\ServiceProfiles\NetworkService\AppData\Roaming\Microsoft\SoftwareProtectionPlatform" filespec="*.*" recursive="yes" filespecBackupType="3855" /> 
      <FILE_LIST path="C:\Windows\system32\DriverStore\FileRepository" filespec="*.*" recursive="yes" filespecBackupType="3855" /> 
      <FILE_LIST path="C:\Windows\assembly" filespec="*" recursive="yes" filespecBackupType="3855" /> 
      <FILE_LIST path="c:\program files (x86)\windows sidebar\gadgets\currency.gadget\images" filespec="delete_up.png" filespecBackupType="3855" /> 
      <FILE_LIST path="c:\program files (x86)\windows sidebar\gadgets\weather.gadget\images" filespec="8.png" filespecBackupType="3855" /> 
      <FILE_LIST path="c:\program files (x86)\windows sidebar\gadgets\weather.gadget\images" filespec="docked_blue_partly-cloudy.png" filespecBackupType="3855" /> 
      <FILE_LIST path="c:\program files\common files\microsoft shared\stationery" filespec="blue_gradient.jpg" filespecBackupType="3855" /> 
      </FILE_GROUP>
      </BACKUP_LOCATIONS>
      </WRITER_METADATA>
    
    October 14, 2009 2:25
  • The above is the exact file that is failing. A much longer file works but requires an hour to process, and there aren't that many hours in the day.
    The failure (marked) on ReadXml is silent and returns no error.
    Renee
    October 14, 2009 2:30
  • All right, I've placed the following in OpenFile:

    Private Sub OpenFile(ByRef fileSpecification As String, ByRef dats As DataSet)
        Dim SR As New IO.StreamReader(fileSpecification, System.Text.Encoding.Unicode)
        Try
            DS.ReadXml(SR)
        Catch ex As Exception
            MsgBox("Error connecting to database. Exception was: " & vbCrLf & ex.Message)
        End Try
        DG1.DataSource = dats
        DG1.DataMember = "FILE_LIST"
        SR.Close()
    End Sub

    I get the following error, which I wrote about the other day: "The data at the root level is invalid line 1 position 1."

    Renee

    October 14, 2009 3:17
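
A likely explanation for an error at line 1, position 1 is that the bytes are being decoded with the wrong encoding, so the parser never sees a `<` as the first character. The following is a minimal sketch of that effect, not a reproduction of the poster's file: the module name and the sample string are assumptions for illustration.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module WrongEncodingSketch

    Sub Main()
        ' The bytes a UTF-8 file would start with: 3C 57 52 ...
        Dim utf8Bytes As Byte() = Encoding.UTF8.GetBytes("<WRITER_METADATA>")

        ' Decoding them as UTF-16 little endian pairs 3C with 57 into one character.
        Dim misread As String = Encoding.Unicode.GetString(utf8Bytes)

        Console.WriteLine(AscW(misread(0)).ToString("X4"))   ' prints 573C
        Console.WriteLine(misread(0) = "<"c)                  ' prints False
    End Sub

End Module
```

Since the first decoded character is U+573C rather than `<`, an XML parser reading that text has nothing valid to treat as a root element.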
  • By the way, this suc*ky editor wrote the routine above generally correctly, only I wrote it much neater. I am totally defeated by the failures in this editor. It totally fails when driven by a Word file, and it fails without filtering it through Word.

    I URGE YOU ALL TO COMPLAIN. A GROUP EFFORT ON FAILURE REPORTING IS THE ONLY THING THAT WILL BE EFFECTIVE. THERE IS NO REASON MICROSOFT SHOULD FAIL ON THIS EXCEPT THAT THEY DON'T WANT TO PAY PEOPLE TO DO IT (but Microsoft is quite willing to take the profits on VS).

    Renee
    October 14, 2009 3:29
  • Yep. The same file that has always run runs now, without modification or error.
    October 14, 2009 3:43
  • Thank you Dave for a couple of things......

    Thank you for listening to my analysis AND thank you for your first cut at the solution. You know, the Microsoft people have had their hands in this. There is no doubt in what they read or wrote when they write and read the file, only when we read it. I will work on it today and try out your solution.

    Renee
    October 14, 2009 12:02
  • Perhaps my little application can help with this.
    http://community.visual-basic.it/Diego/archive/2007/08/31/20361.aspx
    You can translate the page from Italian to English and then download my copysourcetohtml "addin" (it's not an add-in, but it works!). That way you can convert your code from the IDE to HTML source.
    I agree there is some problem with the editor, but I'm sure the Microsoft guys are working on it.
    please, mark this as answer if it is THE answer
    ----------------
    Diego Cattaruzza
    Microsoft MVP - Visual Basic: Development
    blog: http://community.visual-basic.it/Diego
    web site: http://www.visual-basic.it
    October 14, 2009 12:33
  • Diego,
    I hope the deletion was from you. Your message was full sized and not deletable, as there were no buttons for the deletion. I finally had to use the Task Manager to delete it. And Diego, I couldn't find the file after I got there.
    Renee
    October 14, 2009 13:21
  • " A read datafile requires about an hour to process."

    Dave,

    I was clear. At least, I think I was. The read doesn't take long. The processing of it does! It took me years to learn to communicate that way. It nearly gave me heart failure to think I'd miscommunicated, but I didn't.

    Renee
    October 14, 2009 13:33
  • Renee

    OK - I missed that bit.

    The main reason for the slowness is that you are updating the textbox text with every line.

    Try this instead.  I haven't timed it accurately, but it should take about a second on your original file.  Note the addition of ToString on the first line of the loop.  Your original code won't work with Option Strict On and, interestingly, with ToString added it runs over twice as quickly.

        Dim SB As New System.Text.StringBuilder

        For Count As Integer = 0 To DG1.RowCount - 2
            If Not DG1.Rows(Count).Cells(1).Value.ToString(0) = "*" Then
                loc = DG1.Rows(Count).Cells(0).Value.ToString
                If savdloc = loc Then
                    writing = DG1.Rows(Count).Cells(1).Value.ToString
                    files += 1
                Else
                    writing = DG1.Rows(Count).Cells(0).Value.ToString
                End If
                savdloc = loc
                SB.Append(vbCrLf & writing)
            End If
        Next

        TB1.Text = SB.ToString

    • Edited by Dave299, October 15, 2009 8:56: Removed unnecessary If statement
    October 14, 2009 15:06
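
The difference between appending to a string on every iteration and collecting the text in a `StringBuilder` can be measured in isolation. This is a rough sketch under assumed conditions (5000 made-up lines, a console program, no grid or textbox involved), so the timings only indicate the trend, not what the poster's form will show.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Text
Imports Microsoft.VisualBasic

Module StringBuilderSketch

    Sub Main()
        Const LineCount As Integer = 5000

        ' Repeated concatenation: every pass copies the whole string built so far.
        Dim watch As Stopwatch = Stopwatch.StartNew()
        Dim slow As String = ""
        For i As Integer = 0 To LineCount - 1
            slow = slow & vbCrLf & "line " & i.ToString()
        Next
        watch.Stop()
        Console.WriteLine("Concatenation: " & watch.ElapsedMilliseconds.ToString() & " ms")

        ' StringBuilder: appends go into a growing buffer and the string is built once.
        watch = Stopwatch.StartNew()
        Dim sb As New StringBuilder()
        For i As Integer = 0 To LineCount - 1
            sb.Append(vbCrLf).Append("line ").Append(i)
        Next
        Dim fast As String = sb.ToString()
        watch.Stop()
        Console.WriteLine("StringBuilder: " & watch.ElapsedMilliseconds.ToString() & " ms")

        ' Both approaches produce the same text.
        Console.WriteLine(slow = fast)
    End Sub

End Module
```

Assigning to a textbox's `Text` inside the loop adds control update cost on top of the string copying, which this console sketch does not measure.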
  • I know that is the reason.
    Will get to the advanced stuff later.

    Anyway i

    Old file = UTF-16

    00000000  FF FE 3C 00 57 00 52 00-49 00 54 00 45 00 52 00  ..<.W.R.I.T.E.R.
    00000010  5F 00 4D 00 45 00 54 00-41 00 44 00 41 00 54 00  _.M.E.T.A.D.A.T.
    00000020  41 00 20 00 78 00 6D 00-6C 00 6E 00 73 00 3D 00  A. .x.m.l.n.s.=.
    00000030  22 00 78 00 2D 00 73 00-63 00 68 00 65 00 6D 00  ".x.-.s.c.h.e.m.
    00000040  61 00 3A 00 23 00 56 00-73 00 73 00 57 00 72 00  a.:.#.V.s.s.W.r.
    00000050  69 00 74 00 65 00 72 00-4D 00 65 00 74 00 61 00  i.t.e.r.M.e.t.a.
    00000060  64 00 61 00 74 00 61 00-49 00 6E 00 66 00 6F 00  d.a.t.a.I.n.f.o.
    00000070  22 00 20 00 76 00 65 00-72 00 73 00 69 00 6F 00  ". .v.e.r.s.i.o.
    00000080  6E 00 3D 00 22 00 31 00-2E 00 32 00 22 00 3E 00  n.=.".1...2.".>.

    The file that will not work = UTF-8

    00000000  3C 57 52 49 54 45 52 5F-4D 45 54 41 44 41 54 41  <WRITER_METADATA
    00000010  20 78 6D 6C 6E 73 3D 22-78 2D 73 63 68 65 6D 61   xmlns="x-schema
    00000020  3A 23 56 73 73 57 72 69-74 65 72 4D 65 74 61 64  :#VssWriterMetad
    00000030  61 74 61 49 6E 66 6F 22-20 76 65 72 73 69 6F 6E  ataInfo" version
    00000040  3D 22 31 2E 32 22 3E 3C-49 44 45 4E 54 49 46 49  ="1.2"><IDENTIFI
    00000050  43 41 54 49 4F 4E 20 77-72 69 74 65 72 49 64 3D  CATION writerId=
    00000060  22 65 38 31 33 32 39 37-35 2D 36 66 39 33 2D 34  "e8132975-6f93-4
    00000070  34 36 34 2D 61 35 33 65-2D 31 30 35 30 32 35 33  464-a53e-1050253
    00000080  61 65 32 32 30 22 20 69-6E 73 74 61 6E 63 65 49  ae220" instanceI

    Here I have it perfectly.

    Wow!!! You used a StringBuilder in that! I was going to get to that myself.

    There's a fundamental question we haven't asked, which is: how does the editor know? Or is it something as silly as the presence or absence of a BOM?

    Renee

    October 15, 2009 0:22
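
On the BOM question, the byte order mark each .NET encoding object writes can be listed with `Encoding.GetPreamble`, which makes it easy to compare against the first bytes of a hex dump. A small sketch follows; the module name and the `ToHex` helper are made up for the example.

```vbnet
Imports System
Imports System.Text

Module PreambleSketch

    ' Hypothetical helper: format a byte array as spaced hex, or "(none)" when empty.
    Function ToHex(ByVal bytes As Byte()) As String
        If bytes.Length = 0 Then Return "(none)"
        Return BitConverter.ToString(bytes).Replace("-", " ")
    End Function

    Sub Main()
        Console.WriteLine("UTF-16 LE:         " & ToHex(Encoding.Unicode.GetPreamble()))           ' FF FE
        Console.WriteLine("UTF-16 BE:         " & ToHex(Encoding.BigEndianUnicode.GetPreamble()))  ' FE FF
        Console.WriteLine("UTF-8:             " & ToHex(Encoding.UTF8.GetPreamble()))              ' EF BB BF
        ' An encoding object can also be constructed so that it emits no BOM at all.
        Console.WriteLine("UTF-16 LE, no BOM: " & ToHex(New UnicodeEncoding(False, False).GetPreamble()))  ' (none)
    End Sub

End Module
```

A file saved through the last kind of encoding object starts directly with `3C 00`, which is why a reader that relies on the BOM alone cannot tell it apart from other BOM-less files.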
  • The answer is that good editors will check and protest but finally do what you say. I had to change the file extension and then tell Word to... but I finally did it.

    00000000  3C 00 57 00 52 00 49 00-54 00 45 00 52 00 5F 00  <.W.R.I.T.E.R._.
    00000010  4D 00 45 00 54 00 41 00-44 00 41 00 54 00 41 00  M.E.T.A.D.A.T.A.
    00000020  20 00 78 00 6D 00 6C 00-6E 00 73 00 3D 00 22 00   .x.m.l.n.s.=.".
    00000030  78 00 2D 00 73 00 63 00-68 00 65 00 6D 00 61 00  x.-.s.c.h.e.m.a.
    00000040  3A 00 23 00 56 00 73 00-73 00 57 00 72 00 69 00  :.#.V.s.s.W.r.i.
    00000050  74 00 65 00 72 00 4D 00-65 00 74 00 61 00 64 00  t.e.r.M.e.t.a.d.
    00000060  61 00 74 00 61 00 49 00-6E 00 66 00 6F 00 22 00  a.t.a.I.n.f.o.".
    00000070  20 00 76 00 65 00 72 00-73 00 69 00 6F 00 6E 00   .v.e.r.s.i.o.n.
    00000080  3D 00 22 00 31 00 2E 00-32 00 22 00 3E 00 3C 00  =.".1...2.".>.<.
    00000090  49 00 44 00 45 00 4E 00-54 00 49 00 46 00 49 00  I.D.E.N.T.I.F.I.

    This is the bad file "made good".
    Renee
    October 15, 2009 0:43
  • I played around with all the code you and Dave have posted and
    cut and pasted your XML file as well.
    I cross-tested with another xml file that I have.
    My file got past the error point being caused by your file.
    Upon closer examination of your xml file, I see that it is
    missing the starting <  before the first word.
    needs:
    <WRITER
    not
    WRITER

    then your file was able to load into the datagrid.
    Job searching in L.A.
    October 15, 2009 1:53
  • Oh, I'm aware of that; it is due to a "cut and paste" error. I don't think you could have edited the file without something that converted the file to UTF-16 anyway, at which point it would have done fine.

    It loaded for you because by the time you read it, it was UTF-16.

    Renee
    October 15, 2009 3:19
  • Oh, and Dave.....
    Read the code carefully. You'll find (I think) that for each new directory, a new line appears just for the directory. The rest of the time, the file name (inside the directory above) is written. So what I am saying is that I intend to be doing exactly what I am doing in processing that file.
    Renee
    October 15, 2009 3:54