none
Analysis of multiple XML data files RRS feed

  • Question

  • I have many XML data files that I want to analyse.

     

    My plan is either to combine them into one Access database or Excel workbook (Office 2003).

     

    The XML files are complex, containing nested tables of data, but they to not have a schema.

     

    When I open them in manually in Excel, I can use the third option “Use the XML Source task pane” so that I can select the relevant data elements and drag them to a worksheet. I can repeat this on multiple worksheets to get just the data I need in separate tables (sheets). If I open the XML file with options 1 or 2 I get all the data in one worksheet in one big de-normalised table that is impossible to work with. So I planned to use a macro to open each XML file in turn, and use XPath to assign the relevant elements from the map to worksheets before copying this data to one consolidation workbook. However when testing I hit the following problem which I can only assume to be due to the complexity of the XML files:

     

    Error: -2147467259 “The XPath is not valid because either the XPath syntax is incorrect or not supported by Excel.”

     

    This error occurs at the following line:

    lstExtract.ListColumns(1).XPath.SetValue mapAutoCreated, strXPath

     

    strXPath = "/diagOutput/category/structure/caption"

    I’m pretty certain there is nothing wrong with this syntax, having experimented with other variations with the same error every time.

     

    The help file doesn’t really help as I can’t do anything to change to the schema as it was created automatically by Excel on opening the file. I have considered writing a simplified schema but this is beyond my current capability (the idea being to be impose the alternative schema by vba after opening each file).

     

    I experimented with Microsoft Access using the interface to manually import one of the XML files, and this seemed to work until I opened the tables to find they were populated with the square carriage return character and no legible characters.

     

    I’m now at a loss as to how to continue and would be very grateful for suggestions.

     

    Richard Wright

    VBA developer
    Friday, July 15, 2011 12:20 PM

Answers

  • Hi Rick

    <<I have done this manually with one file, and if this is possible programatically then I think I can get the data required.>>

    You might want to follow up on this point in an XML / xSLT forum. It should be possible for you to transform these XML files (XSLT) to include the style sheet. And it ought to be a fairly simple XSLT, since you want to add only one thing, and that somewhere near the top of the XML file, as I recall. But I'm not that steeped in the technology that I'd dare try help you (and I'm fairly busy at the moment, so no time to really research it).

    Assuming you can put together this XSLT fairly easily, you can link in the MXSML library to a VBA project. COPY the XML files in a folder. Use Dir (or whatever you like) to loop all of them and the XML DOM to open each, run the XSLT, and close. (Again, this isn't something I've got at the top of my fingers, but I know it should be "that easy" when you know which properties and methods are neeeded.)


    Cindy Meister, VSTO/Word MVP
    • Marked as answer by Bruce Song Monday, August 1, 2011 10:58 AM
    Friday, July 15, 2011 4:51 PM
    Moderator

All replies

  • Hi Rick

    From the sound of it (complexity of the XML files) I suspect Access would be the better container for the data, if you could get it in there. The "square characters" probably have something to do with the encoding of the XML file and perhaps the version of Access you're using. Is this also 2003? That version might not support Unicode or something, which could be the reason for the problem.

    Do the XML files contain any "uncommon" characters that require Unicode? Or could you try changing the encoding attribute in the xml declaration?

    Would you like me to move your question to the Access for Developers forum to see if they can confirm my guess and whether they can offer any advice?

    Are you familiar enough with XML and perhaps OLE DB that you'd consider doing a "conversion" of the XML to a *.mdb via code rather than relying on an internal converter? If yes, which programming languages are you familiar with? Only VBA?

    I've never heard many compilmentary things about using the XML tools in Excel. They seem to have major problems. But if that would be your preferred application we can move your question to the Excel for Developers forum where you'll find the application specialists.


    Cindy Meister, VSTO/Word MVP
    Friday, July 15, 2011 3:33 PM
    Moderator
  • Thank you very much Cindy

    You have made some very good points which I will look into further, though for now I can confirm there are now unususal characters as far as I can tell. The encoding in the header of each file is:

    <?xml version="1.0" encoding="ISO-8859-1"?>

    Since raising the question, I have found a stylesheet that could be very useful. The only thing is that the xml files make no reference to it so I'm thinking I will have to programatically open every file and insert the reference, save the modified file and open it in Excel with the stylesheet.

    I have done this manually with one file, and if this is possible programatically then I think I can get the data required. If not then I think I'll be following up your suggestion to move the question over to Access forum.

    Thanks again
    Rick

    Friday, July 15, 2011 4:12 PM
  • Hi Rick

    <<I have done this manually with one file, and if this is possible programatically then I think I can get the data required.>>

    You might want to follow up on this point in an XML / xSLT forum. It should be possible for you to transform these XML files (XSLT) to include the style sheet. And it ought to be a fairly simple XSLT, since you want to add only one thing, and that somewhere near the top of the XML file, as I recall. But I'm not that steeped in the technology that I'd dare try help you (and I'm fairly busy at the moment, so no time to really research it).

    Assuming you can put together this XSLT fairly easily, you can link in the MXSML library to a VBA project. COPY the XML files in a folder. Use Dir (or whatever you like) to loop all of them and the XML DOM to open each, run the XSLT, and close. (Again, this isn't something I've got at the top of my fingers, but I know it should be "that easy" when you know which properties and methods are neeeded.)


    Cindy Meister, VSTO/Word MVP
    • Marked as answer by Bruce Song Monday, August 1, 2011 10:58 AM
    Friday, July 15, 2011 4:51 PM
    Moderator
  • Thanks Cindy

    I managed to complete the job. It was a "quick-n-dirty" method admittedly but as its a one-off it didn't matter if it wasn't pretty.

    I used VBA, and having proved that it was possible to amend the xml files by inserting a standard 2nd line that referred to the xsl file, I was able to do them all as you suggest by looping through all files in the directory.

    This made the second process possible to extract data once the xml files could be opened in Excel. Again this was another VBA project.

    Thanks for your help with this and sorry I didn't close the thead off earlier.

    Regards
    Rick

    Monday, August 1, 2011 11:44 AM
  • Hi Rick

    Glad you got it solved and finished :-)


    Cindy Meister, VSTO/Word MVP
    Monday, August 1, 2011 12:30 PM
    Moderator