Visual Basic .msg format document convert to .pdf document

  • Wednesday, August 08, 2012 8:09 AM
     
     

    Hi guys,

    Recently, I was tasked to convert MSG files to PDF files.

    I was trying to do this without using any 3rd party solution other than the programs included in MSOFFICE.

    This proved easy enough: convert the .msg file to HTML first, then convert the HTML file to .pdf. However, this may be very time consuming if I am trying to convert gigs of .msg files (which I will have to do).

    Is there any way of converting MSG files directly to PDF without any 3rd party app? If there isn't one, which 3rd party app will allow me to do this programmatically?

    Any help will be appreciated.

    Thank you very much.

    Regards,

    Zong


    • Edited by Zong_Zofz Wednesday, August 08, 2012 8:27 AM

All Replies

  • Thursday, August 09, 2012 6:31 AM
    Moderator
     
     

    Hi Zong_Zofz,

    Welcome to the MSDN forum.

    >>I was trying to do this without using any 3rd party solution other than the programs included in MSOFFICE.

    I’m not sure what you mean by “the programs included in MSOFFICE”. Do you mean that you don’t want to use the Office namespaces either? Without the Office namespaces, I’m afraid this will be hard to do, because very little software can open .msg files.

    If you can use the Office namespaces, a simple way is to SaveAs the .msg file to a Word file, then use Word to convert that file to PDF.

    Hope this helps.


    Mark Liu-lxf [MSFT]
    MSDN Community Support | Feedback to us

  • Thursday, August 09, 2012 8:04 PM
     
     Answered

    See if this is helpful. It needs a form with a button and a multiline TextBox, plus references to Microsoft.Office.Interop.Outlook and Microsoft.Office.Interop.Word. The file I used, "Acronis", was an HTML email that they sent me, which I saved as a .msg file.

    Option Strict On
    Imports Microsoft.Office.Interop
    Imports System.IO
    Imports Microsoft.Office.Interop.Word
    
    Public Class Form1
        Dim MasterFileName As String = "D:\Acronis"
        Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
            Dim MSWordExportFilePath As String = MasterFileName & ".pdf"
            Dim MSWordExportFormat As WdExportFormat = WdExportFormat.wdExportFormatPDF
            Dim MSWordOpenAfterExport As Boolean = False
            Dim MSWordExportOptimizeFor As WdExportOptimizeFor = WdExportOptimizeFor.wdExportOptimizeForPrint
            Dim MSWordExportRange As WdExportRange = WdExportRange.wdExportAllDocument
            Dim MSWordStartPage As Int32 = 0
            Dim MSWordEndPage As Int32 = 0
            Dim MSWordExportItem As WdExportItem = WdExportItem.wdExportDocumentContent
            Dim MSWordIncludeDocProps As Boolean = True
            Dim MSWordKeepIRM As Boolean = True
            Dim MSWordCreateBookmarks As WdExportCreateBookmarks = WdExportCreateBookmarks.wdExportCreateWordBookmarks
            Dim MSWordDocStructureTags As Boolean = True
            Dim MSWordBitmapMissingFonts As Boolean = True
            Dim MSWordUseISO19005_1 As Boolean = False
            TextBox1.Clear()
            Dim oApp As New Outlook.Application
            Dim OMItem As Outlook.MailItem
            TextBox1.AppendText("Reading the .msg file" & vbNewLine)
            OMItem = CType(oApp.CreateItemFromTemplate(MasterFileName & ".msg"), Outlook.MailItem)
            TextBox1.AppendText("Writing as HTML file" & vbNewLine)
            ' Using closes the file even if the write fails
            Using SW As New StreamWriter(MasterFileName & ".html")
                SW.Write(OMItem.HTMLBody)
            End Using
            oApp.Quit()
            OMItem = Nothing
            oApp = Nothing
    
            Dim wApp As New Word.Application
            Dim wdoc As Word.Document   ' declared only; Documents.Open below supplies the document
            TextBox1.AppendText("Reading the HTML file" & vbNewLine)
            wdoc = wApp.Documents.Open(MasterFileName & ".html")
            TextBox1.AppendText("Saving the PDF file" & vbNewLine)
            wdoc.ExportAsFixedFormat(MSWordExportFilePath, _
                MSWordExportFormat, MSWordOpenAfterExport, _
                MSWordExportOptimizeFor, MSWordExportRange, MSWordStartPage, _
                MSWordEndPage, MSWordExportItem, MSWordIncludeDocProps, _
                MSWordKeepIRM, MSWordCreateBookmarks, _
                MSWordDocStructureTags, MSWordBitmapMissingFonts, _
                MSWordUseISO19005_1)
            wdoc.Close()
            wApp.Quit()
            wdoc = Nothing
            wApp = Nothing
            TextBox1.AppendText("All done" & vbNewLine)
        End Sub
    End Class
    

  • Friday, August 10, 2012 1:57 AM
     
     

    Hi Mark,

    Sorry for my poor description. I meant that I want to use only the MSOFFICE namespaces to convert .msg to .pdf. At the moment I convert to .rtf first instead.

    • The usual steps are: pst (extraction) => (Step 1) .msg file -> (Step 2) .doc file / .rtf file -> (Step 3) .pdf file

    However, this may be very time consuming if I am trying to convert gigs of .msg files (which I will have to do).

    I am looking for a way to do this directly from a .msg file.

    • The ideal steps: pst (extraction) => (Step 1) .msg file -> (Step 2) .pdf file

    Do you have any ideas for a solution?

    Thanks and Regards,

    Zong

  • Friday, August 10, 2012 2:04 AM
     
     

    Hi Devon_Nullman,

    Thanks for your informative reply.

    However, this may be very time consuming if I am trying to convert gigs of .msg files (which I will have to do).

    I am looking for a way to do this directly from a .msg file.

    • The ideal steps: pst (extraction) => (Step 1) .msg file -> (Step 2) .pdf file

    Do you have any ideas for a solution?

    Thanks and Regards,

    Zong

  • Friday, August 10, 2012 2:20 AM
     
     

    I spent hours researching this yesterday. Other than performing a conversion like the one you are doing (1st step, 2nd step, then 3rd step to PDF format), I could only find third party software that does this. I searched C++, C# and VB.Net (since I have conversion software for C++ to VB.Net, and Telerik provides an online C# to VB.Net converter). I couldn't find anything that didn't use a three step method.

    Maybe it would be faster if you had one computer perform the first two steps and another computer perform the last two steps.


    You've taught me everything I know but not everything you know.


  • Friday, August 10, 2012 4:45 AM
     
     

    I'm confused now -

    Originally I thought you had a bunch of .msg files...

    Are you wanting to go through a pst file and extract items as .msg files?

    This is best done using Outlook Interop as well. Do you already know what folders to get the items from?

    As far as msg direct to PDF, I haven't found a way to do that.

  • Friday, August 10, 2012 8:30 AM
     
     

    Hey Monkeyboy,

    Thanks for your input. Is the 3rd party software you mentioned "foxit" or something similar? It is OK if we have to do it with 3rd party software, as long as we can do it programmatically.

    I was attempting to use the foxit printer driver to print out PDFs directly from Outlook. The problem is that the Outlook "PrintOut" method doesn't accept any arguments. I have to do this programmatically, so I cannot afford to keep pressing "yes" and changing the settings to "print to file". I also came across "Outlook Redemption", but it does not seem to have an advanced print method.

    Thanks and regards,

    Zong

  • Friday, August 10, 2012 8:35 AM
     
     

    Hey Devon,

    Sorry for confusing you. The pst extraction is done already. I was just stating the methodology I wanted.

    Right now I want to change this:

    pst (extraction which is already done) =>  (Step 1) .msg file -> (Step 2) .doc file / .rtf file -> (Step 3) .pdf file

    to this

    pst (extraction which is already done) => (Step 1) .msg file -> (Step 2) .pdf file

    I am still trying to find out how.

    Thanks and regards,

    Zong

  • Friday, August 10, 2012 9:23 AM
    Moderator
     
     Answered

    Hi Zong_zofz,

    >>pst (extraction which is already done) => (Step 1) .msg file -> (Step 2) .pdf file >>I am still trying to find out how.

    The MS Office namespaces cannot shorten the three steps to two. The reason is that only Outlook can open the .msg file, but Outlook can’t save the file as PDF. You need to add a conversion to a Word file in order to save the PDF file.

    >>However this may be very time consuming if I am trying to convert gigs of of .msg files (which I will have to do).

    Personally, I don’t think this conversion will take much time. You don’t need to open Outlook and Word, just use a few methods provided by the namespaces. It should not take long.

    Hope this makes it clear.


    Mark Liu-lxf [MSFT]
    MSDN Community Support | Feedback to us

  • Friday, August 10, 2012 11:39 PM
     
     

    I have Adobe Acrobat (full) installed and was surprised to find that it installed an Outlook "plugin" (or something) that has two options:

    Save selected message as PDF

    Create PDF from an Outlook Folder

    That being said - I still have no idea how to implement that in VB.NET

    I proposed your comment as an answer because it makes the most sense. It is fairly easy to read folders in Outlook, get each message and save it as a .msg, or just open Word, make a document using the message.Body or message.HTMLBody and save it as PDF. Several gigs of messages will take time, but that would be the case no matter what.
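
A minimal sketch of the folder-reading idea above, assuming the messages sit in the default Inbox of the current Outlook profile. The output folder `D:\Export`, the module name and the `SafeFileName` helper are hypothetical and not taken from anyone's program in this thread. It needs a reference to Microsoft.Office.Interop.Outlook.

```vbnet
Imports System
Imports System.IO
Imports Outlook = Microsoft.Office.Interop.Outlook

Module ExportFolderSketch
    ' Hypothetical helper: replaces characters that are not allowed in file names.
    Function SafeFileName(name As String) As String
        For Each c As Char In Path.GetInvalidFileNameChars()
            name = name.Replace(c, "_"c)
        Next
        Return name
    End Function

    Sub Main()
        Dim targetDir As String = "D:\Export"   ' hypothetical output folder
        Directory.CreateDirectory(targetDir)

        Dim app As New Outlook.Application()
        Dim inbox As Outlook.MAPIFolder =
            app.Session.GetDefaultFolder(Outlook.OlDefaultFolders.olFolderInbox)

        Dim index As Integer = 0
        For Each item As Object In inbox.Items
            Dim mail As Outlook.MailItem = TryCast(item, Outlook.MailItem)
            If mail Is Nothing Then Continue For   ' skip items that are not mail
            index += 1
            ' A running number keeps messages with the same subject apart
            Dim fileName As String =
                SafeFileName(String.Format("{0:D5} {1}.msg", index, mail.Subject))
            mail.SaveAs(Path.Combine(targetDir, fileName), Outlook.OlSaveAsType.olMSG)
        Next
    End Sub
End Module
```

Very long subjects can still produce paths that Windows rejects, so a real program would probably truncate the name as well.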

  • Friday, August 10, 2012 11:48 PM
     
     

    Zong

    There were various third party programs that did this, but I'm not sure any of them can be used programmatically. I will check the names for you, and you can check them out if you want.


    You've taught me everything I know but not everything you know.

  • Saturday, August 11, 2012 12:08 AM
     
     

    Zong

    I gleaned the following 8 from various threads on MSG to PDF conversion:

    1. Use Neevia Document Converter Pro. All you have to do is install it on your machine, then submit the msg files to the input folder watched by the converter. It comes with an ActiveX interface in case you need to do it programmatically. Code samples on how to submit a file via code are on their website.

    2. To convert MSG to pdf files, try this tool: http://www.pcvare.com/msg-to-pdf-converter.html . Converts emails to pdf files with excellent output.

    3. I use Total Mail Converter to convert msg to pdf. A few weeks ago I had to convert all old email to pdf for archiving, and I tried several converters. This one did the job perfectly and placed all attachments into separate folders. They say it has command line support, though I didn't use it.

    http://www.coolutils.com/TotalMailConverter


    4. Get the Birdie MSG to PDF Converter (http://www.birdiesoftware.com/eml-to-doc/). Get the software from

    http://www.birdiesoftware.com/eml-to-doc/buy.html

    5. A great tool for converting MSG and EML files to pdf is email Open View Pro. Not only can you convert MSG files to PDF, you can also extract and convert emails from PST files too! And, you don't even have to have Outlook installed to do it.

    You can check it out at: http://bitdaddys.com/emlopenviewpro.html . Be sure to check out the command line options for email to pdf conversion.

    6. Another approach is to use Redemption from www.dimastr.com.

    7. A 3rd party application called EZDetach - www.techhit.com

    8. I think Priasoft has some tools that will convert email to PDF, TIFF, PNG, etc. May help you out. www.priasoft.com



    You've taught me everything I know but not everything you know.

  • Saturday, August 11, 2012 8:01 PM
     
     

    From Outlook to PDF takes just over 1 second per file here. I converted 24 mails into 4 MB of PDFs in about 30 seconds. The files ranged from 10 KB to 100 KB, but I don't think larger files would take a lot longer. So it is not lightning fast, but what are the alternatives?

    Update: Adding 3 more "large" emails (3 to 4 MB each) added 3 seconds to the total time.

    This converts each message to either html or txt depending on its format, then uses MS Word to load it and export it as PDF, then deletes the old files.
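
The post above does not include its code. The following is only a sketch of the flow it describes, under stated assumptions, and is not the poster's program: the folder `D:\Mail`, the module name and the `ConvertMessage` helper are hypothetical, and rich-text messages are treated like plain text here.

```vbnet
Imports System
Imports System.IO
Imports System.Runtime.InteropServices
Imports Outlook = Microsoft.Office.Interop.Outlook
Imports Word = Microsoft.Office.Interop.Word

Module MsgToPdfSketch
    ' Saves one message as .html or .txt depending on its body format,
    ' exports that file to PDF through Word, then deletes the intermediate file.
    Sub ConvertMessage(mail As Outlook.MailItem, wordApp As Word.Application, basePath As String)
        Dim isHtml As Boolean = (mail.BodyFormat = Outlook.OlBodyFormat.olFormatHTML)
        Dim tempPath As String = basePath & If(isHtml, ".html", ".txt")
        mail.SaveAs(tempPath, If(isHtml, Outlook.OlSaveAsType.olHTML, Outlook.OlSaveAsType.olTXT))

        ' Open takes a ByRef Object; CObj avoids a copy-back conversion error under Option Strict On
        Dim doc As Word.Document = wordApp.Documents.Open(CObj(tempPath))
        Try
            doc.ExportAsFixedFormat(basePath & ".pdf", Word.WdExportFormat.wdExportFormatPDF)
        Finally
            doc.Close(False)   ' close without saving changes
        End Try
        File.Delete(tempPath)
    End Sub

    Sub Main()
        Dim msgDir As String = "D:\Mail"   ' hypothetical folder of .msg files
        Dim outlookApp As New Outlook.Application()
        Dim wordApp As New Word.Application()
        Try
            For Each msgPath As String In Directory.GetFiles(msgDir, "*.msg")
                Dim mail As Outlook.MailItem =
                    TryCast(outlookApp.CreateItemFromTemplate(msgPath), Outlook.MailItem)
                If mail Is Nothing Then Continue For   ' not a mail item
                ' ChangeExtension with Nothing strips ".msg" from the path
                ConvertMessage(mail, wordApp, Path.ChangeExtension(msgPath, Nothing))
                Marshal.ReleaseComObject(mail)
            Next
        Finally
            wordApp.Quit()
            ' Outlook is left running: it may be the instance the user already has open
        End Try
    End Sub
End Module
```

Both Office applications are started once and reused for every file, so the per-file cost is mainly the save, open and export calls.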


  • Monday, August 13, 2012 4:52 AM
     

    Hi Mark,

    Thanks for the feedback. Right now I think I am opening one instance of the program to do the conversion. I was trying to search for the namespace like you mentioned but all I found was Word.XMLNamespace and Word.XMLNamespaces. However this does not seem to allow me to open the document. This is the current code used to convert to pdf.

    'Word Object
    Dim wordApp As Word.Application = New Word.Application
    Dim wordDoc As Word.Document
    .
    .
    .
    ' Save the message as HTML first
    strPathPDF = strFolderPath & "\" & MItem.SenderName & " " & rTime
    MItem.SaveAs(strPathPDF & ".html", Outlook.OlSaveAsType.olHTML)
    
    ' Open the HTML file in Word and export it as PDF
    wordDoc = wordApp.Documents.Open(strPathPDF & ".html")
    wordDoc.ExportAsFixedFormat(strPathPDF & ".pdf", Word.WdExportFormat.wdExportFormatPDF)
    wordDoc.Close()
    
    ' Remove the intermediate HTML file
    System.IO.File.Delete(strPathPDF & ".html")

    What the code does right now is first open one instance of Word. Then, for each individual HTML file, it opens the file, saves it as a new PDF and closes the file. How can I make it use namespaces?
  • Monday, August 13, 2012 5:00 AM
     
     

    Hi Mr.Monkeyboy

    Thanks for the informative list of third party software. I have checked out some of them. Neevia is way out of our budget, so we will skip it. I'm not entirely sure how to use Redemption just yet. We will be trying to use it to bypass the security patch eventually.

    Regards

    Zong

     
  • Monday, August 13, 2012 5:07 AM
     
     

    Hi Devon,

    Thanks for the insight. I think what we are doing is similar to what you did for the conversion process. The timing does not seem to be much of an issue, based on the timings from your program. We will be testing our program with a large PST file (2 GB) soon. So we shall see how long it takes.

    Regards,

    Zong

  • Monday, August 13, 2012 7:19 AM
     
     
    You're quite welcome Zong.

    You've taught me everything I know but not everything you know.

  • Monday, August 13, 2012 7:28 AM
    Moderator
     
     

    Hi Zong_zofz,

    >>How can I make it such that it uses Namespaces?

    I’m sorry, I don’t quite understand what you mean. All of your code already uses only the Office namespaces, and it does not open the Word or Outlook software itself.

    >>SO we shall see how long it takes.

    I also look forward to the result of your test.

    Have a nice day.


    Mark Liu-lxf [MSFT]
    MSDN Community Support | Feedback to us

  • Tuesday, August 14, 2012 1:50 AM
     

    Hey Mark,

    The code I am using right now opens the Word one time at this line:

    Dim wordApp As Word.Application = New Word.Application

    Then I use the application to open the document as html:

    wordDoc = wordApp.Documents.Open("File.html")

    When I try and use the name space I see this:

    Dim wordNsp As Word.XMLNamespace


    Then I try to do this:

    wordDoc = wordNsp.Application.Documents.Open("File.html")


    I get an error.

    From what you are saying, I can do the conversion without actually opening Word. However, I am quite lost. Could you offer any guidance?

    Regards,

    Zong

  • Tuesday, August 14, 2012 6:21 AM
    Moderator
     
     

    Hi Zong_zofz,

    I’m confused about why you need to use Word.XMLNamespace. You should check what Word.XMLNamespace is before you use it. Please check this link: http://msdn.microsoft.com/en-us/library/microsoft.office.interop.word.xmlnamespace

    And the XMLNamespace.Application Property (Word) seems to be used in VBA, not VB.Net: http://msdn.microsoft.com/en-us/library/office/ff192056.aspx

    >>From what you are saying I can do the conversion without actually opening Word, However I am quite lost. Could you offer any guidance?

    Sorry for misleading you. I meant that you can keep the Word file from being shown to the user, which shortens the running time of the application.

    By the way, what was the result of your test of the large file conversion?

    Have a nice day.


    Mark Liu-lxf [MSFT]
    MSDN Community Support | Feedback to us
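
One way to read the advice above about not showing Word to the user is to keep a single hidden Word instance for the whole batch. The sketch below assumes a folder of .html files that were already saved from the messages. The folder `D:\Export` and the module name are hypothetical. Note that a Word process still runs in the background; only its window is hidden.

```vbnet
Imports System
Imports System.IO
Imports System.Runtime.InteropServices
Imports Word = Microsoft.Office.Interop.Word

Module HiddenWordBatchSketch
    Sub Main()
        Dim sourceDir As String = "D:\Export"   ' hypothetical folder of .html files

        ' One Word instance is created once and reused for every file
        Dim wordApp As New Word.Application()
        wordApp.Visible = False                                  ' keep the window hidden
        wordApp.DisplayAlerts = Word.WdAlertLevel.wdAlertsNone   ' suppress prompts where Word allows it
        Try
            For Each htmlPath As String In Directory.GetFiles(sourceDir, "*.html")
                ' Open takes a ByRef Object; CObj avoids a copy-back conversion error under Option Strict On
                Dim doc As Word.Document = wordApp.Documents.Open(CObj(htmlPath))
                Try
                    doc.ExportAsFixedFormat(Path.ChangeExtension(htmlPath, ".pdf"),
                                            Word.WdExportFormat.wdExportFormatPDF)
                Finally
                    doc.Close(False)              ' close without saving changes
                    Marshal.ReleaseComObject(doc)
                End Try
            Next
        Finally
            wordApp.Quit()
            Marshal.ReleaseComObject(wordApp)
        End Try
    End Sub
End Module
```

Starting Word once rather than once per file is where most of the saving would be expected to come from. Hiding the window mainly avoids screen flicker and accidental user interaction.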

  • Wednesday, August 22, 2012 6:04 AM
    Moderator
     
     

    Hi Zong_zofz,

    We haven’t heard from you for several days, so I’d like to mark my reply as the answer for now. If you have any additional questions, you can unmark the reply and post your question here.

    Sorry for any inconvenience and have a nice day.


    Mark Liu-lxf [MSFT]
    MSDN Community Support | Feedback to us

  • Thursday, August 23, 2012 1:31 AM
     
     

    Hi Mark ,

    Sorry for being missing in action these past few days.

    I really appreciate your help, guys.

    My conclusion is to continue with the three-step document format conversion. It seems there is no other way to do it, so I will delete all the unnecessary Word files after converting from HTML to PDF.

    Thank you very much.

    Best Regards,

    Zong

      

  • Friday, August 31, 2012 7:48 AM
     
     

    Hey Mark,

    I am working with Zong on this issue. We have recently completed the program. It can extract everything out of a pst; it converts each message to html and then converts that to pdf. However, we have decided not to use Word, for the following reasons:

    1. Word has the habit of occasionally closing down the whole Word process ("WordApp") when we invoke the wordDoc.Close() method. This results in an RPC failure, which causes the program to pause until the message box is closed. We even tried a workaround by putting in a check to ensure a Word process is running, but the same error kept appearing.

    2. Word persistently tries to download pictures from the internet. We tried everything, even literally taking the whole computer off the network (unplugging it), but it still did this. We encountered this problem when we converted an email which was an old newsletter from HSBC. We found that the pictures do not exist anymore, which resulted in an infinitely long "Connecting to server for information" message.

    In the end we switched to using a free program called wkhtmltopdf. It was a bit slower, but it did not persistently try to download pictures, and it could also automatically resize to fit documents to some extent.

    As for timing, a 2.62 GB pst file had its messages, tasks and journal items extracted together with all attachments. If the attachments were compressed files, they were extracted. The messages were converted to html and pdf. The file sizes ranged from 25 KB to about 15 MB each. The total time was approximately 2 h 30 min.

    Thanks for all your help, guys. If you have any questions, do post them here.

    Regards,

    Adeeb
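
A sketch of how the wkhtmltopdf step mentioned above could be driven from VB.NET through System.Diagnostics.Process. It assumes wkhtmltopdf.exe is installed at the hypothetical path shown. The function name, the paths and the 60 second time limit are illustrative and not taken from the program described in this post.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.IO

Module WkHtmlToPdfSketch
    ' Runs wkhtmltopdf on one HTML file. Returns True when the tool exits with
    ' code 0 inside the time limit and the PDF file exists afterwards.
    Function ConvertHtmlToPdf(exePath As String, htmlPath As String,
                              pdfPath As String, timeoutMs As Integer) As Boolean
        Dim psi As New ProcessStartInfo()
        psi.FileName = exePath
        ' Quote both paths so that spaces in folder or sender names survive
        psi.Arguments = String.Format("--quiet ""{0}"" ""{1}""", htmlPath, pdfPath)
        psi.UseShellExecute = False
        psi.CreateNoWindow = True

        Using proc As Process = Process.Start(psi)
            If Not proc.WaitForExit(timeoutMs) Then
                proc.Kill()          ' give up on a conversion that hangs
                Return False
            End If
            Return proc.ExitCode = 0 AndAlso File.Exists(pdfPath)
        End Using
    End Function

    Sub Main()
        ' Hypothetical paths; adjust to the local installation
        Dim ok As Boolean = ConvertHtmlToPdf(
            "C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe",
            "D:\Export\message.html", "D:\Export\message.pdf", 60000)
        Console.WriteLine(If(ok, "Converted", "Failed"))
    End Sub
End Module
```

The time limit is a design choice aimed at the kind of endless wait described for Word above: a conversion that stalls on unreachable resources is killed and reported as failed, so the batch can move on.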