Converting Docx to PDF

    Question

  • Hi,

    I need to convert DOCX documents to PDF programmatically (in C#). Is there a freeware or open-source solution that lets me do that without going through the Office COM components?

    I need to integrate this into an SOA application in which the PDF is generated by a WCF service. The service may have to generate several PDF files per request, so the process has to be reasonably quick.

    I've looked into printing the documents programmatically with a PDF printer, but I haven't found anything satisfactory yet.

    Thanks.

    Tuesday, July 20, 2010 9:02 AM

Answers

  • Replying to my own post :).

    After long hours of research on this topic, I came across Bullzip PDF Printer (http://www.bullzip.com/products/pdf/info.php). This is the best freeware programmable PDF printer that I've used so far. The printer's installer includes a .NET assembly (no COM!!) called Bullzip.PdfWriter, which makes it possible to print to PDF "silently", with no user interaction. (The assembly is visible in the .NET tab of the "Add Reference" dialog in Visual Studio!) When the PdfUtil.PrintFile() method of this assembly is called on a Word 2007 document, it opens Word automatically, launches the Print command and closes Word, all within a couple of seconds. The PDF printer can also be configured to iterate through a collection of documents. The best thing about this product is that it is very well documented, which makes it quite easy to set up.

    Here's a piece of code:

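    using System;
    using System.Configuration;
    using System.IO;
    using System.ServiceModel;
    using Bullzip.PdfWriter;

    namespace DocxGenerator.SL.WCF
    {
      public class PdfMaker
      {
        // Upper bound on the number of WaitForFile calls, so that a failed
        // print job cannot block the service call forever.
        private const int MaxWaitAttempts = 60;

        internal static byte[] PrintToPdf(string appFolder, string tempDocxFileName)
        {
          try
          {
            string tempFolder = appFolder + @"\temp";
            string tempDocxFilePath = tempFolder + @"\" + tempDocxFileName;

            // The printer name is read from the "PdfPrinter" appSettings key.
            PdfSettings pdfSettings = new PdfSettings();
            pdfSettings.PrinterName = ConfigurationManager.AppSettings["PdfPrinter"];

            // Load the base settings, redirect the output to the temp folder
            // and write the result to the printer's settings file.
            string settingsFile = pdfSettings.GetSettingsFilePath(PdfSettingsFileType.Settings);
            pdfSettings.LoadSettings(appFolder + @"\App_Data\printerSettings.ini");
            pdfSettings.SetValue("Output", tempFolder + @"\<docname>.pdf");
            pdfSettings.WriteSettings(settingsFile);

            // Print the document with its associated application (Word here).
            PdfUtil.PrintFile(tempDocxFilePath, pdfSettings.PrinterName);

            // The output file is expected to be named after the print job.
            string tempPdfFilePath = tempFolder + @"\Microsoft Word - " + tempDocxFileName + ".pdf";

            // Wait for the PDF, but give up after MaxWaitAttempts tries.
            bool fileCreated = false;
            for (int attempt = 0; attempt < MaxWaitAttempts && !fileCreated; attempt++)
            {
              fileCreated = PdfUtil.WaitForFile(tempPdfFilePath, 1000);
            }
            if (!fileCreated)
            {
              throw new TimeoutException("The PDF file was not created: " + tempPdfFilePath);
            }

            byte[] pdfBytes = File.ReadAllBytes(tempPdfFilePath);

            // Clean up both temporary files once the PDF is in memory.
            File.Delete(tempDocxFilePath);
            File.Delete(tempPdfFilePath);

            return pdfBytes;
          }
          catch (Exception ex)
          {
            // Surface the failure to the WCF client as a fault.
            throw new FaultException("WCF error!\r\n" + ex.Message);
          }
        }
      }
    }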
    
    

    • Marked as answer by ErionPC Wednesday, July 21, 2010 11:29 AM
    Wednesday, July 21, 2010 11:28 AM

All replies

  • Have you come across performance problems with this solution? The server must have Word installed, and all this opening and closing of documents is a little scary...


    http://pontodepartilha.blogspot.com
    Friday, August 20, 2010 5:30 PM
  • Just back from holidays...

    I haven't tested this solution under a production load yet, but it's not very different from the well-known Adobe Distiller. Something always has to open the document in order to print it. It's true that the server needs to have Word installed, but it only really needs Word Viewer, which is free. To prevent file access conflicts, the Word files are placed in the system's temporary folder and named using GUIDs. A different Word file is generated on every "generatePDF" method call and deleted at the end of the method's execution. What potential dangers can you see in this approach?
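
    The GUID-named temporary file idea can be sketched roughly as follows. This is only an illustration using the standard library; `TempDocx`, `NewPath` and `WithTempFile` are hypothetical names, and the conversion step is passed in as a callback rather than calling Bullzip directly.

```csharp
using System;
using System.IO;

public static class TempDocx
{
    // Builds a unique .docx path in the system temp folder ("<temp>\<guid>.docx").
    public static string NewPath()
    {
        return Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".docx");
    }

    // Writes the document to a unique temp file, runs the supplied conversion
    // callback on that path, and always deletes the temp file afterwards.
    public static T WithTempFile<T>(byte[] docxBytes, Func<string, T> convert)
    {
        string path = NewPath();
        try
        {
            File.WriteAllBytes(path, docxBytes);
            return convert(path);
        }
        finally
        {
            if (File.Exists(path))
            {
                File.Delete(path);
            }
        }
    }
}
```

    Because each call gets its own file name, two concurrent requests never touch the same file, and the `finally` block removes the file even if the conversion throws.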

    Monday, September 6, 2010 7:16 AM
  • I have searched for several hours and cannot find the required assemblies. Where are you downloading Bullzip.PdfWriter from?
    Friday, November 12, 2010 10:14 PM
  • Hi there. I'm new to this forum :).

    I'm trying to convert a docx (generated using OpenXML) to a PDF, but when I use this function with the Bullzip DLL it does not print.
    Here is a sample of the code. I'm kind of desperate because I've tried so many different approaches and nothing seems to work well.
    Thank you so much in advance.

     

    PS: Replying to chessfan, the DLLs are in the following path by default:

    C:\Program Files\Common Files\Bullzip\PDF Printer\API\Microsoft.NET

    You just have to add them as references in Visual Studio.

     

    protected void Page_Load(object sender, EventArgs e)
    {
        PrintToPdf("C:\\teste", docTemplate);
    }

    internal static byte[] PrintToPdf(string appFolder, string tempDocxFileName)
    {
        try
        {
            string tempFolder = appFolder + @"\temp";
            string tempDocxFilePath = tempFolder + @"\" + tempDocxFileName;

            PdfSettings pdfSettings = new PdfSettings();
            pdfSettings.PrinterName = ConfigurationManager.AppSettings["Bullzip PDF Printer"];

            string settingsFile = pdfSettings.GetSettingsFilePath(PdfSettingsFileType.Settings);
            pdfSettings.LoadSettings(appFolder + @"\App_Data\printerSettings.ini");
            pdfSettings.SetValue("Output", tempFolder + @"\<docname>.pdf");
            pdfSettings.WriteSettings(settingsFile);

            PdfUtil.PrintFile(tempDocxFilePath, pdfSettings.PrinterName);
            string tempPdfFilePath = tempFolder + @"\Microsoft Word - " + tempDocxFileName + ".pdf";

            bool fileCreated = false;
            while (!fileCreated)
            {
                fileCreated = PdfUtil.WaitForFile(tempPdfFilePath, 1000);
            }

            byte[] pdfBytes = File.ReadAllBytes(tempPdfFilePath);

            File.Delete(tempDocxFilePath);
            File.Delete(tempPdfFilePath);

            return pdfBytes;
        }
        catch (Exception ex)
        {
            return null;
        }
    }

    Friday, January 14, 2011 12:01 PM
  • Hi, sorry but I just happened to visit this thread by chance and realized I haven't received the emails to notify me of new posts for some reason. 

    Anyway, for what it's worth, here's the link to download the Bullzip PDF Printer.

    http://bullzip.com/products/pdf/info.php#download

    Once you install the printer, the Bullzip.PdfWriter assembly gets installed into the GAC and you can import it into the VS Solution by using "Add Reference" -> ".NET".

    Cheers

    Thursday, March 31, 2011 9:50 AM
  • Hi, same as above, sorry but I just happened to visit this thread by chance and realized I haven't received the emails to notify me of new posts for some reason.

    I've written a couple of articles on CodeProject illustrating automated DOCX report generation and the subsequent PDF printing of those reports.

    Here are the links:

    http://www.codeproject.com/KB/office/soa-docx-generation.aspx

    http://www.codeproject.com/Tips/145780/A-SOA-approach-to-dynamic-DOCX-PDF-report-generati.aspx

     

    also on my blog http://erionpc.wordpress.com

     

    Cheers


    Thursday, March 31, 2011 9:56 AM
  • Hi ErionPC, thanks for the post.

    It is very close to my situation. I followed the Bullzip approach to convert DOCX to PDF on my website. It worked when run from Visual Studio, but it did not work once I deployed it to my web server.

    I also included settings.ini in my website folder and built the following code.

    string tempDocxFilePath = Request.PhysicalApplicationPath + @"documents\sample.doc";
    PdfSettings pdfSettings = new PdfSettings();
    pdfSettings.PrinterName = @"Bullzip PDF Printer";
    string settingsFile = pdfSettings.GetSettingsFilePath(PdfSettingsFileType.Settings);
    pdfSettings.LoadSettings(Request.PhysicalApplicationPath + @"documents\settings.ini");
    pdfSettings.SetValue("Output", Request.PhysicalApplicationPath + @"documents\<smarttitle>.pdf");
    pdfSettings.WriteSettings(settingsFile);
    PdfUtil.PrintFile(tempDocxFilePath,pdfSettings.PrinterName);
    string tempPdfFilePath = Request.PhysicalApplicationPath + @"documents\sample.pdf";

    I also tried another approach, using the Office COM component directly, as described in another post of mine, but could not get a PDF that way either.


    http://social.technet.microsoft.com/Forums/en-US/officewebappssetup/thread/cfd642bb-8ba4-42bc-86e0-86f7a81c0cea

    Can anybody point me to a .NET library or freeware tool for generating the PDF that would solve this problem?
    It is taking up a lot of my time.

    Thanks in Advance.


    Monday, July 25, 2011 2:23 PM
  • Hi,

    It seems like a folder permission problem. I don't know which version of Windows you're using, but:

    for IIS 6 in XP, make sure you give write privileges to the ASP.NET and IUSR_* accounts,

    for Windows Server 2003, give write privileges to Network Service,

    for IIS 7 in Windows 7, make sure you give write privileges to IIS AppPool\DefaultAppPool (or whichever application pool your website is running in).

     

    ... obviously the Write privileges should be given for the folder where the PDF gets saved and also for the one where the settings file is generated.

     

    If it's working inside Visual Studio it means that the setup and your code are ok, you just need to configure it for IIS.

    Cheers


    Monday, July 25, 2011 2:47 PM
  • Hi ErionPC, thanks for the guidance.

    I host my website on IIS 7 on Windows Server 2008 R2.
    As you suggested:
    1. The application pool identity is set to the administrator account, and
    2. In the website's Features View, under the "IIS" section, Authentication -> ASP.NET Impersonation -> Identity to impersonate is set to the administrator account. (The same can be done in web.config with <identity impersonate="true" userName="username" password="password" />.)

    My observations:

    When it runs "PdfUtil.PrintFile(tempDocxFilePath, pdfSettings.PrinterName)", winword.exe starts and stops and the call returns true, the same as in Visual Studio, but no PDF file is generated and

    fileCreated = PdfUtil.WaitForFile(tempPdfFilePath, 1000); remains false forever.

    The application is able to access the folder: it can write the docx there with a StreamWriter object, and it can send a PDF (placed there manually as a test) to the client with Response.TransmitFile("Filepath").

    Do I need to make any additional settings in IIS? If so, please guide me briefly. Is this approach feasible for a website that has to deliver the PDF to clients accessing it from other machines on the network?

    Regards

    Tuesday, July 26, 2011 9:15 AM
  •     string settingsFile = pdfSettings.GetSettingsFilePath(PdfSettingsFileType.Settings);
        pdfSettings.LoadSettings(appFolder + @"\App_Data\printerSettings.ini");
        pdfSettings.SetValue("Output", tempFolder + @"\<docname>.pdf");
        pdfSettings.WriteSettings(settingsFile);
    
        PdfUtil.PrintFile(tempDocxFilePath, pdfSettings.PrinterName);
        string tempPdfFilePath = tempFolder + @"\Microsoft Word - " + tempDocxFileName + ".pdf";
        
        bool fileCreated = false;
        while (!fileCreated) 
        {
         fileCreated = PdfUtil.WaitForFile(tempPdfFilePath, 1000);
        }

    I deployed this on a Windows 2003 server some time ago. Unfortunately (or fortunately for the experts) there are a lot more settings in IIS7. All I can think of now is check that the IIS AppPool\DefaultAppPool has Write privileges on both "App_Data" and "temp" folders, referring to my previous example.

    Test that it works by creating a test file in both those folders from code.
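
    A minimal sketch of such a write test, using only System.IO. `FolderProbe` and `CanWrite` are hypothetical names for illustration; when this runs inside the web application, it runs under whatever identity the application pool uses.

```csharp
using System;
using System.IO;

public static class FolderProbe
{
    // Returns true if the current identity can create and delete a file in the folder.
    public static bool CanWrite(string folder)
    {
        string probe = Path.Combine(folder, "probe_" + Guid.NewGuid().ToString("N") + ".tmp");
        try
        {
            File.WriteAllText(probe, "test");
            File.Delete(probe);
            return true;
        }
        catch (UnauthorizedAccessException)
        {
            // The identity lacks write (or delete) permission on the folder.
            return false;
        }
        catch (IOException)
        {
            // Covers a missing folder and other I/O failures.
            return false;
        }
    }
}
```

    Calling `FolderProbe.CanWrite(...)` once for the "App_Data" folder and once for the "temp" folder, and logging the two results, shows quickly whether permissions are the problem.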

    Another issue could be the 32bit/64bit compilation. I think Bullzip is a win32 application, so try moving the web application to a 32 bit application pool under IIS and give the application pool Write privileges on the folders mentioned above.
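
    To see which mode the worker process is actually running in, before and after changing the application pool (in IIS 7 the pool setting is "Enable 32-Bit Applications"), something like this could be logged from the web application. `ProcessBitness` is a hypothetical helper name; `Environment.Is64BitProcess` requires .NET 4.0 or later.

```csharp
using System;

public static class ProcessBitness
{
    // Describes whether the current process runs as 32-bit or 64-bit.
    public static string Describe()
    {
        return Environment.Is64BitProcess ? "64-bit" : "32-bit";
    }
}
```

    On older framework versions, checking `IntPtr.Size` (4 for a 32-bit process, 8 for a 64-bit one) gives the same information.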

    Tuesday, July 26, 2011 9:32 AM
  • Hi ErionPC, as you suggested:
    1. The application pool identity is set to the administrator account.
    2. The website is running under the same administrator account.

    In my situation I have HTML text that should be rendered to the client as a PDF.

    For that I use a StreamWriter object to write the HTML out as a docx file. With the application pool set to administrator, the StreamWriter is able to write the docx into the folder. The docx then needs to be converted to PDF and sent to the client with Response.TransmitFile, but the PDF is not generated.

    When I run "PdfUtil.PrintFile(tempDocxFilePath, pdfSettings.PrinterName)", winword.exe starts (I can see it in Task Manager) but does not close by itself, and no PDF is generated.

    Thanks in Advance.

    Tuesday, July 26, 2011 7:24 PM
  • I don't know why but email alerting on new post doesn't work as it should on this system... I only just saw your reply.

    Anyway, I know that Bullzip has some issues when printing to pdf from html. Try printing the docx instead.

    Tuesday, December 6, 2011 10:08 AM
  • Hi Erion

    I don't have MS Word installed on my PC. How can I achieve the same thing without MS Word installed?

    Thank you..


    Dibyasingh Tripathy

    Monday, March 12, 2012 2:03 PM
  • Hello,

    you need to install Microsoft Word Viewer in order to be able to use this solution.

    Monday, March 12, 2012 2:15 PM
  • I installed Word Viewer on my Windows Server 2008 R2 machine, but when PdfUtil.PrintFile is called from my .ashx handler, I get a file association exception.

    System.ComponentModel.Win32Exception: No application is associated with the specified file for this operation at System.Diagnostics.Process.StartWithShellExecuteEx(ProcessStartInfo startInfo) at Bullzip.PdfWriter.PdfUtil.PrintFile(String fileName, String printerName)

    I checked HKEY_CLASSES_ROOT and .doc is there, as it should be. What is the problem here?
    • Proposed as answer by Peter Nejsum Monday, July 23, 2012 10:43 AM
    • Unproposed as answer by Peter Nejsum Monday, July 23, 2012 10:43 AM
    Wednesday, June 20, 2012 4:29 PM
  • for Gustavo A Borges

    You need to make sure that Word Viewer (if you don't have Ms Word) is the default program for opening .docx files.

    • Edited by ErionPC Monday, July 23, 2012 11:04 PM
    Saturday, July 21, 2012 3:37 PM
  • [spam message from Jeniffer Harbo deleted]

    for Jeniffer Harbo

    Interesting solution, but not free

    "... Free version is limited to 20 paragraphs. This limitation is enforced during reading or writing files."

    Someone mentioned printing HTML to PDF in an earlier post. Use wkhtmltopdf for that. It's pretty good.
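
    For the HTML case, here is a rough sketch of calling the tool from C#. It assumes the executable is named wkhtmltopdf, is on the PATH, and is invoked as "wkhtmltopdf <input> <output>"; `HtmlToPdf`, `BuildStartInfo` and `Convert` are hypothetical names, and the timeout handling is only one possible choice.

```csharp
using System;
using System.Diagnostics;

public static class HtmlToPdf
{
    // Builds the command line: wkhtmltopdf "<htmlPath>" "<pdfPath>".
    public static ProcessStartInfo BuildStartInfo(string htmlPath, string pdfPath)
    {
        return new ProcessStartInfo
        {
            FileName = "wkhtmltopdf", // assumption: the executable is on the PATH
            Arguments = "\"" + htmlPath + "\" \"" + pdfPath + "\"",
            UseShellExecute = false,
            CreateNoWindow = true
        };
    }

    // Runs the conversion and returns true if the tool exits with code 0 in time.
    public static bool Convert(string htmlPath, string pdfPath, int timeoutMs)
    {
        using (Process process = Process.Start(BuildStartInfo(htmlPath, pdfPath)))
        {
            if (!process.WaitForExit(timeoutMs))
            {
                process.Kill(); // do not leave a hung converter behind
                return false;
            }
            return process.ExitCode == 0;
        }
    }
}
```

    Unlike the printer-based approach, this does not depend on a file association or on Word being installed.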



    Saturday, July 21, 2012 3:39 PM
  • Hello,

    I recently had a similar problem - I wanted to print a file from a web page and return the resulting pdf file to the user through the browser. My setup is Windows Server 2008 R2, and I documented my steps as follows:

    Printing PDF files

    In order to print PDF files from Windows Server as the IIS_IUSRS user, here are a few hints on how to do it:

    1. Install Bullzip PDF Printer on the server

    • During the installation process, choose to download and install Ghostscript Lite

    2. Share the newly installed Bullzip PDF Printer:

    • Start menu -> Devices and Printers
    • Right click the Bullzip PDF Printer -> Printer Properties -> Sharing
    • Click 'Change Sharing Options' -> check 'Share this printer'

    3. Add permissions for the IIS_IUSRS user on the folders containing your source and destination files

    • If you cannot find this user, try changing the location to the top domain

    When you use Bullzip to print a file, what it actually does is it briefly opens the file in the associated editor, starts a print job and closes the editor again. For this to work properly we need two things:

    • An editor which can open the source documents
    • An available printer to send the print job to

    4. In this project we just want to open .docx documents in WordPad, so this is the file association we're going for here

    • Open the command prompt as administrator and type the following to check the FTYPE for WordPad

    ftype | findstr /I wordpad
    

    • Make sure it looks something like this

    Wordpad.Document.1="%ProgramFiles%\Windows NT\Accessories\WORDPAD.EXE" "%1"
    

    • This means that the FTYPE for WordPad is "Wordpad.Document.1". We need this below.
    • Type 'assoc .docx' to see the current file association with your source file type
    • The result should look like this:

    .docx=Wordpad.Document.1
    

    • In my case the result was "File association not found for extension .docx", so let's set it

    5. Type the following in the administrator command prompt:

    assoc .docx=Wordpad.Document.1
    

    • Now try 'assoc .docx' again - this time it should be correct.

    6. Now we need to make sure the printer is available. I followed the steps given at this site:

    • http://www.biopdf.com/guide/examples/network_sharing/
    • Note 1: If you installed the free Bullzip PDF Printer you will not have the entry 'PDF Writer - bioPDF' but rather 'Bullzip PDF Printer'...this does not make any difference.
    • Note 2: When creating the new string values, you can choose the base location of your liking, meaning 'X:\somepath\Application Data' and so on..just make sure you create those folders if they do not exist.
    • Note 3: I skipped step 6-7 as I write these settings from my code using the API. Also skipped step 8.


    That was it - you should now be able to print pdf documents from the IIS_IUSRS user.

    • Proposed as answer by Peter Nejsum Monday, July 23, 2012 10:48 AM
    Monday, July 23, 2012 10:47 AM
  • Why make things more difficult with WordPad if you can just use Word Viewer as a default program for opening .docx files (provided you don't have the full version of Ms Word)?

    I didn't need any of the things described in steps 2, 4, 5, 6.

    Monday, July 23, 2012 11:04 PM