Extra characters when reading and parsing a PDF file in VB

  • Question

  • User869176912 posted

    Hi all,

    When I open the PDF file everything looks fine, but when I read and parse that same file in code, there are suddenly a bunch of extra characters. As a result, my code can't find the specific string it is looking for.


    When I open the PDF file I see this:

    Membership ID: 1111111

    But when I read and parse each line I get this:

    MembershipMembership ID:ID: <<MemberId>>1111111

    Can someone explain why those extra characters are there, and how I can get rid of them or account for them in my code when I'm reading and parsing PDF files?

    Thank you 

    Tuesday, January 9, 2018 6:04 PM

All replies

  • User2053451246 posted

    Which library are you using to parse the PDF? If you aren't using a library, you are most likely seeing the objects the text is actually stored in.

    Tuesday, January 9, 2018 7:08 PM
  • User869176912 posted

    I'm using the Aspose library.

    Wednesday, January 10, 2018 5:27 PM
  • User303363814 posted

    Maybe the Aspose support forums would give better help on Aspose products?

    Wednesday, January 10, 2018 9:43 PM
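
On the second half of the question (how to account for the extra characters in code), one possible workaround is to clean each extracted line before searching it. The sketch below is only an illustration in Python, not an Aspose feature and not an explanation of why the duplicates appear. It assumes the noise always has the shape shown in the sample line: a repeated label and a `<<...>>` placeholder in front of the value. The function name `extract_membership_id` and both patterns are hypothetical.

```python
import re

# Assumption: placeholders look like <<MemberId>>, as in the sample line.
PLACEHOLDER = re.compile(r"<<[^<>]*>>")

# Assumption: the value is a run of digits following an "ID:" label.
ID_VALUE = re.compile(r"ID:\s*(\d+)")


def extract_membership_id(line):
    """Return the digits after 'ID:' as a string, or None if there are none."""
    # Drop template placeholders so they cannot sit between label and value.
    cleaned = PLACEHOLDER.sub("", line)
    # A duplicated label ("ID:ID:") is harmless: the first "ID:" is not
    # followed by digits, so the search moves on to the second one.
    match = ID_VALUE.search(cleaned)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_membership_id("MembershipMembership ID:ID: <<MemberId>>1111111"))
    print(extract_membership_id("Membership ID: 1111111"))
```

Matching on a tolerant pattern rather than an exact string means the same code handles both the clean and the noisy form of the line. The two regular expressions use only basic syntax, so they should carry over to the .NET `Regex` class in VB.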