HELP: Convert PDF to HTML

    Question

  • Hello, all programmers!

    I want to write a program myself that can convert the content of a PDF file to HTML, but I don't know where to start:

    - Is there a suitable ActiveX control for this?

    - Is there an existing function that can extract PDF content and add it to an HTML file?

    - Or is there anything else that a newbie like me doesn't know about?

    I would be so happy to receive any replies... Thanks a lot!

    So many things to learn in this world, hihi  : - )

    Saturday, December 23, 2006 3:52 PM

All replies

  • You need to use a PDF SDK, if one is available, and then generate HTML from its output in some way. In .NET there is no direct way to do this: PDF is an Adobe format, not a Microsoft one, and the .NET Framework itself has no built-in support for it.

    I believe there are a couple of open-source PDF projects you can look at, use, and customize to fit your application's needs.

    Maybe take a look at this:

    http://www.codeproject.com/csharp/MgPDFReader.asp

    Commercial/SDK:

    http://www.pdfonline.com/

    I'm sure there are more out there.

    Saturday, December 23, 2006 3:59 PM
    Moderator
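A minimal C# sketch of the "generate HTML from it" step might look like this. It assumes the text of each page has already been extracted by whichever PDF SDK you pick (that part is not shown), and the class and method names are hypothetical. It only HTML-encodes the text and wraps each page in its own div, so it will not reproduce tables or other layout.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

// Minimal sketch: wraps already-extracted page text in an HTML page.
// ASSUMPTION: the page strings come from some PDF SDK (not shown);
// PdfTextToHtml/BuildHtml are hypothetical names for illustration.
public static class PdfTextToHtml
{
    public static string BuildHtml(string title, IReadOnlyList<string> pageTexts)
    {
        if (pageTexts == null) throw new ArgumentNullException(nameof(pageTexts));

        var sb = new StringBuilder();
        sb.AppendLine("<!DOCTYPE html>");
        sb.AppendLine("<html>");
        sb.AppendLine("<head><meta charset=\"utf-8\"><title>"
            + WebUtility.HtmlEncode(title) + "</title></head>");
        sb.AppendLine("<body>");
        for (int i = 0; i < pageTexts.Count; i++)
        {
            // Escape <, >, & and quotes so PDF text cannot break the markup.
            string encoded = WebUtility.HtmlEncode(pageTexts[i] ?? string.Empty);
            // Keep the line breaks of the extracted text as <br> tags.
            encoded = encoded.Replace("\r\n", "\n").Replace("\n", "<br>\n");
            // One div per PDF page, numbered from 1.
            sb.AppendLine("<div class=\"page\" id=\"page-" + (i + 1) + "\">");
            sb.AppendLine(encoded);
            sb.AppendLine("</div>");
        }
        sb.AppendLine("</body>");
        sb.AppendLine("</html>");
        return sb.ToString();
    }
}
```

For example, PdfTextToHtml.BuildHtml("Hello", new[] { "a < b", "page 2" }) yields two page divs with "<" written as "&lt;". The encoding step matters because raw PDF text can contain characters such as < or & that would otherwise be read as markup.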
  • Thank you very, very much!

    I'm following your instructions! Hope to see the light... hihi!!!

    I only just posted this question, and I can't believe a reply came this fast!

    I'm really surprised and happy, because this is the first time I've posted a question on the MSDN forum. Thank you, and thanks to MSDN!

    I believe I'll get more and more help in this friendly forum!

    Saturday, December 23, 2006 4:31 PM
  • Thank you :-) We appreciate your valuable feedback and certainly hope you benefit from this place. We do try our best, and it makes us feel great that you are finding it useful. That's all that matters.
    Saturday, December 23, 2006 4:36 PM
    Moderator
  • I have searched the Internet, and I found a suitable solution for my app: convert the PDF file to an image file.

    I think this approach can give a better result than my first idea. When I used a professional program that converts PDF to HTML, it failed on a table in the PDF file: the text in each cell did not stay in its correct place :( And I think there will be no error like that when I convert the PDF content to an image (at least that's what I think, since I'm only a newbie in this subject, hihi : - p )

    Now I have the source code for my second idea, and I'm downloading the Acrobat SDK to use it in .NET.

    Here is the link to the related topic: http://www.developerfusion.co.uk/show/5091/2/

    It is about creating a thumbnail of a PDF file, but I think I can use it too, with just a small change to the code, maybe...

    But I would still appreciate lots of help from our friendly community.

    Please tell me what you think about my new solution. Is it really a better way? Oops, as for me, I think yes...

    Saturday, December 23, 2006 6:50 PM
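The image-based idea can be sketched the same way in C#. This assumes a renderer such as the Acrobat SDK has already saved each page as a file named like Hello-page1.png; that naming scheme and the class below are hypothetical. The HTML simply shows the page images in order.

```csharp
using System;
using System.Net;
using System.Text;

// Minimal sketch of the "PDF -> images -> HTML" idea.
// ASSUMPTION: each page was already rendered to "<baseName>-page<N>.png"
// by some PDF renderer (not shown); this naming scheme is hypothetical.
public static class PdfImagesToHtml
{
    public static string BuildHtml(string baseName, int pageCount)
    {
        if (pageCount < 1)
            throw new ArgumentOutOfRangeException(nameof(pageCount), "Expected at least one page.");

        var sb = new StringBuilder();
        sb.AppendLine("<!DOCTYPE html>");
        sb.AppendLine("<html>");
        sb.AppendLine("<head><meta charset=\"utf-8\"><title>"
            + WebUtility.HtmlEncode(baseName) + "</title></head>");
        sb.AppendLine("<body>");
        for (int page = 1; page <= pageCount; page++)
        {
            // Percent-encode the file name so spaces or '&' cannot break the src attribute.
            string src = Uri.EscapeDataString(baseName + "-page" + page + ".png");
            // One image per page, in page order; the alt text keeps the page number.
            sb.AppendLine("<div class=\"page\"><img src=\"" + src + "\" alt=\"Page " + page + "\"></div>");
        }
        sb.AppendLine("</body>");
        sb.AppendLine("</html>");
        return sb.ToString();
    }
}
```

This keeps the visual layout, including table cells, exactly as the renderer drew it, which avoids the misplaced-cell problem described above. The trade-off is that the text becomes part of a picture, so readers can no longer select, search, or copy it.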
  • Hello, all programmers!

    I want to write a program myself in C# that can convert a PDF file's content to HTML, or convert a PDF to an image.

    Tuesday, November 09, 2010 9:35 AM
  • Convert HTML to PDF using PDFonFly:
    1. Go to pdfonfly.com.
    2. Click the "Text/HTML to PDF" link.
    3. Click "Source" in the text editing tool and paste your HTML.
    4. Click the "Create PDF" button.

    I'd also like to share some PDF Converter Reviews with you. I hope they help anyone who has run into PDF conversion problems before.
    In this PDF Convert Column you can find articles on how to convert PDF to Word, PDF to EPUB, PDF to HTML, PDF to image, and more.

    Wednesday, March 23, 2011 12:00 PM
  • Hi QuangHuy!

    I'll try to help you :)

    You could use third-party libraries in your software.

    Using PDF Focus .Net together with RTF to HTML lets you convert PDF to HTML! I've tried PDF Focus, and the library works very well. I think the RTF to HTML step isn't too difficult, and the formatting should be preserved. So... just give it a try.

    PDF Focus .Net

    RTF to HTML

    I've prepared some sample code for you (maybe it will be helpful):

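    Step 1 - PDF to RTF
        // Open the PDF and save its content as RTF
        SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();
        f.OpenPdf(@"d:\Hello.pdf");

        if (f.PageCount > 0)
        {
            // ToWord returns an int status code
            int result = f.ToWord(@"d:\Hello.rtf");
        }

    Step 2 - RTF to HTML
        // Requires "using System.IO;" for File
        SautinSoft.RtfToHtml r = new SautinSoft.RtfToHtml();
        r.OutputFormat = SautinSoft.RtfToHtml.eOutputFormat.HTML_5;
        r.ImageStyle.IncludeImageInHtml = true;
        // ConvertString expects RTF text, not a file path, so read the file
        // (or get the RTF from a database). The @ prefix avoids the invalid
        // "\H" escape sequence in "d:\Hello.rtf".
        string rtf = File.ReadAllText(@"d:\Hello.rtf");
        string html = r.ConvertString(rtf);
        File.WriteAllText(@"d:\Hello.html", html);

        // Optional: convert the HTML back to RTF
        SautinSoft.HtmlToRtf h = new SautinSoft.HtmlToRtf();
        rtf = h.ConvertString(html);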

    Tuesday, March 20, 2012 8:59 AM
  • Hi!

    Starting in 2014, the PDF Focus .Net library provides an API to convert PDF to HTML. Let's say you want to convert a PDF file to an HTML file in C#:

    SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();
    f.OpenPdf(@"c:\Odyssey.pdf");
    f.ToHtml(@"c:\Odyssey.html");

    PDF Focus .Net is a commercial SDK; the PDF to HTML edition starts at $299.

    Cheers,
    Mx

    Tuesday, March 04, 2014 5:45 PM