PDF file in XML!

  • Question

  • Hello, I have a problem!

    We have a website where users enter information. The information is stored in an XML file, which is sent to a WCF service upon user request. The WCF service reads the XML file and inserts the information into a SQL Server database. So far so good...

    The problem is that users should be able to attach external files (PDF files) to the XML, and the WCF service should "extract" the attached file(s) from the XML and insert them into a BLOB column in the SQL Server database.

    Is this possible, and if so, how?

    Regards,

    Kenbla


    Kenbla

    Tuesday, December 3, 2013 2:09 PM

Answers

  • You can base64-encode the data and insert it directly into the XML in a dedicated tag. Alternatively, you can put a reference to an external entity in the XML, in which case you will need to transfer the attachment separately.

    Sincerely yours, Eugene Mayevski

    Tuesday, December 3, 2013 2:58 PM
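
The base64 approach suggested above can be sketched roughly as follows. This is only an illustration, written in Python for brevity rather than as WCF code. The element names `Submission` and `Attachment`, the `fileName` attribute, the `Attachments` table, and the use of an in-memory SQLite database as a stand-in for SQL Server are all assumptions made for the example, not part of the original setup.

```python
import base64
import sqlite3
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def build_xml(form_fields: Dict[str, str], filename: str, pdf_bytes: bytes) -> str:
    """Client side: embed the PDF as base64 text in a dedicated element."""
    root = ET.Element("Submission")  # hypothetical root element name
    for name, value in form_fields.items():
        ET.SubElement(root, name).text = value
    # Hypothetical element/attribute names for the attachment.
    attachment = ET.SubElement(root, "Attachment", {"fileName": filename})
    attachment.text = base64.b64encode(pdf_bytes).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def extract_attachments(xml_text: str) -> List[Tuple[str, bytes]]:
    """Service side: decode every Attachment element back to raw bytes."""
    root = ET.fromstring(xml_text)
    return [
        (node.get("fileName", ""), base64.b64decode(node.text or ""))
        for node in root.iter("Attachment")
    ]


def store_attachments(conn: sqlite3.Connection,
                      attachments: List[Tuple[str, bytes]]) -> None:
    """Insert the decoded bytes into a BLOB column (SQLite stands in for SQL Server)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Attachments (FileName TEXT, Content BLOB)"
    )
    conn.executemany(
        "INSERT INTO Attachments (FileName, Content) VALUES (?, ?)", attachments
    )
    conn.commit()


if __name__ == "__main__":
    pdf = b"%PDF-1.4\n\x00\xff example payload"  # placeholder bytes, not a real PDF
    xml_text = build_xml({"Name": "Kenbla"}, "report.pdf", pdf)
    conn = sqlite3.connect(":memory:")
    store_attachments(conn, extract_attachments(xml_text))
    print(conn.execute("SELECT COUNT(*) FROM Attachments").fetchone()[0])
```

In .NET, the corresponding calls would be `Convert.ToBase64String` on the sending side and `Convert.FromBase64String` in the service. Note that base64 makes the payload about one third larger, so large attachments may also require raising the message size limits on the WCF binding.
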
  • Hi,

    PDF documents are structured very differently from XML documents. The format is essentially designed for page layout, and some structural information is usually discarded. For example, text in a PDF document gets broken into arbitrary chunks just a few characters long. Unless the PDF document has been tagged in advance, it's not possible to deduce with any certainty which fragments of text constitute a sentence or a paragraph.

    That said, if you need the PDF's content as XML, you can convert the PDF file to XML.

    To do so, please check the following articles:

    #PDF to XML conversion using .NET:
    http://stackoverflow.com/questions/6287880/pdf-to-xml-conversion-using-net

    #Convert table in PDF to XML file in C#:
    http://bytescout.com/products/developer/pdfextractorsdk/convert-pdf-to-xml

    Best Regards,
    Amy Peng

    Wednesday, December 4, 2013 9:06 AM
    Moderator
