How to Convert a Large Number of Word RTF Files to DOCX Strict and ODF 1.2

  • Question

  • Dear Sirs,

    How can I convert a large number of Word RTF files to DOCX Strict and ODF 1.2?

    I have a database of Word RTF files with complex formatting, and conversion fidelity is very important to me.

    That is why I have decided to use Word 2016 automation for this task.

    I am also considering renting an Azure VM, installing Word there, and running the task on it.

    The task will probably run for 4-5 days.

    My concern is how to keep such a long-running operation stable, without COM errors.

    How should I manage this situation and this task?


    Irakli Lomidze

    Monday, August 1, 2016 2:20 PM

All replies

  • >>>My concern is how to keep such a long-running operation stable, without COM errors.

    How should I manage this situation and this task?<<<

    Developers can use Automation in Microsoft Office to build custom solutions that use the capabilities and features built into the Office product. Although such programmatic development can be implemented on a client system with relative ease, a number of complications can occur if Automation takes place from server-side code such as Microsoft Active Server Pages (ASP), ASP.NET, DCOM, or a Windows NT service.

    So Microsoft does not recommend or support Word automation for this kind of requirement.

    For more information, see the Microsoft support article "Considerations for server-side Automation of Office".

    In addition, if you have any feedback for Word, please feel free to submit it to UserVoice:

    https://word.uservoice.com/

    Thanks for your understanding. 

    Tuesday, August 2, 2016 3:30 AM
  • Can you use a third-party library to do the conversion? If so, you can try Spire.Doc, a .NET Word component for processing Word files.

    Document doc = new Document();
    doc.LoadFromFile("sample.rtf", FileFormat.Rtf);
    doc.SaveToFile("RTF2Docx.docx", FileFormat.Docx);

    Tuesday, August 2, 2016 9:13 AM
  • There is no ASP and no multi-user requirement.

    This is a single-threaded operation that processes one document at a time, so it is not server-side automation.

    The server is only there temporarily, to run one large task.


    Irakli Lomidze

    Tuesday, August 2, 2016 3:44 PM
  • >>>How can I convert a large number of Word RTF files to DOCX Strict and ODF 1.2?

    You could create a VBScript that saves every RTF file in a folder as DOCX. Refer to the code below:
    ' VBScript does not know Word's named constants, so define the one used here.
    Const wdFormatXMLDocument = 12

    Dim oWord, oDoc, fso, f, file

    Set oWord = CreateObject("Word.Application")
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set f = fso.GetFolder("somefolderpath")

    For Each file In f.Files
      If Right(LCase(file.Name), 4) = ".rtf" Then
        Set oDoc = oWord.Documents.Open(file.Path)
        ' Save next to the source file, replacing the .rtf extension with .docx.
        oDoc.SaveAs Left(file.Path, Len(file.Path) - 4) & ".docx", wdFormatXMLDocument
        ' Close each document without saving again so they do not pile up in Word.
        oDoc.Close False
      End If
    Next

    oWord.Quit
    Set oWord = Nothing
    Set fso = Nothing

    >>>This is a single-threaded operation that processes one document at a time, so it is not server-side automation.

    The server is only there temporarily, to run one large task.<<<

    Sorry for misunderstanding what you meant by "task"; I thought you meant Windows Task Scheduler.

    Thanks for your understanding.
    Wednesday, August 3, 2016 9:39 AM
  • Guys!

    I'm not asking for a sample VBA script or C# interop code. (I'm not a beginner at Office automation.)

    Microsoft Word has problems with long-running operations; there are some strange DCOM errors.

    Could you explain which DCOM configuration or registry changes I should make before starting this kind of task?

    What is your existing experience with mass conversion of documents?


    Irakli Lomidze

    Wednesday, August 3, 2016 9:56 AM
  • >>>Microsoft Word has problems with long-running operations; there are some strange DCOM errors.

    Could you provide more information about this issue, for example the error message and a screenshot?

    >>>Could you explain which DCOM configuration or registry changes I should make before starting this kind of task?

    You may want to configure permissions for Word in the DCOM configuration tool. Refer to the article "Microsoft Excel or Microsoft Word does not appear in DCOM Configuration snap-in".

    Thanks for your understanding.
    Thursday, August 4, 2016 8:37 AM
  • I'm using the Word interop assemblies for the conversion, from a C# desktop app called X.

    Everything works fine as long as application X is the topmost window.

    If you switch to another app, after some time you get this error:

    InteropServices.COMException (0x800A1066): Command failed under Teamcity

    According to Internet searches, one solution is to change the DCOM configuration of the Word entry ("Microsoft Word 97-2003 Document") so that it runs as the interactive user.

    But I'm still not sure whether it will work in interactive mode for 4-5 days.

    So my question for the Microsoft team is: what should I do, and how?

    Thank you in advance.


    Irakli Lomidze


    Friday, August 5, 2016 9:26 AM
  • Jacky.W,

    This is a third-party solution. What is the fidelity of the conversion?

    Do you have real cases, or could you provide references?


    Irakli Lomidze


    Friday, August 5, 2016 9:55 AM
  • >>>InteropServices.COMException (0x800A1066): Command failed under Teamcity

    For this error, you could refer to the link below:

    http://stackoverflow.com/questions/22554347/interopservices-comexception-0x800a1066-command-failed-under-teamcity

    >>>this is 3rd party solution, What is fidelity of conversion ?

    Do you have real cases/ could you provide references ?<<<

    Could you describe in more detail what you mean by these questions? That will help us resolve your issue.

    Disclaimer: This response contains a reference to a third party World Wide Web site. Microsoft is providing this information as a convenience to you. Microsoft does not control these sites and has not tested any software or information found on these sites; therefore, Microsoft cannot make any representations regarding the quality, safety, or suitability of any software or information found there. There are inherent dangers in the use of any software found on the Internet, and Microsoft cautions you to make sure that you completely understand the risk before retrieving any software from the Internet.

    Thanks for your understanding.

    Monday, August 8, 2016 8:17 AM
  • Dear David,

    1) I'm aware of this article on Stack Overflow. Can you confirm that this is the official Microsoft answer?

    If you at Microsoft had to convert a large number of documents, would you go through this scenario?

    Please give me a real case of how you have converted legacy documents to a new format (ODF or DOCX).

    2) About third-party tools: I have replied to the Spire.Doc answer. Yes, components such as Spire.Doc and Aspose.Words look very nice, but there is no assurance of conversion fidelity.

    3) OpenOffice (LibreOffice) has options such as a headless installation on a server. Is Microsoft going to build an edition like this (e.g. the Word object model on a server)?

    4) It would be nice to have a conversion service for Microsoft documents (not only Word Automation Services, which only converts to fixed formats).

    Please help me solve my issue. If you were in my position, which approach would you choose, and which tools would you use?

    Thank you

    Best regards

     


    Irakli Lomidze

    Tuesday, August 9, 2016 8:56 AM
  • Hi Irakli Lomidze,

    >>>
    1) I'm aware of this article on Stack Overflow. Can you confirm that this is the official Microsoft answer?

    If you at Microsoft had to convert a large number of documents, would you go through this scenario?

    Please give me a real case of how you have converted legacy documents to a new format (ODF or DOCX).
    <<<

    Sorry, I am not able to confirm that this is the official Microsoft answer.

    >>>
    3) OpenOffice (LibreOffice) has options such as a headless installation on a server. Is Microsoft going to build an edition like this (e.g. the Word object model on a server)?
    <<<

    The Open XML SDK 2.5 is a collection of classes that let you create and manipulate Open XML documents – documents that adhere to the Office Open XML File Formats Standard. Because the SDK provides an application program interface that lets you manipulate Open XML documents directly, you can do so without the need for the Office client products themselves in both client and server operating environments. The SDK is designed to let you build high performance client-side or server-side solutions that perform complex operations using only a small amount of program code.

    The Open XML SDK 2.5:

    Does not replace the Microsoft Office Object Model and provides no abstraction on top of the file formats. You must still understand the structure of the file formats to use the Open XML SDK 2.5.
    Does not provide functionality to convert Open XML formats to and from other formats, such as HTML or XPS.
    Does not guarantee document validity of Open XML Formats when you use the Open XML SDK 2.5 or if you decide to manipulate the underlying XML directly.
    Does not provide application behavior such as layout functionality in Word or recalculation, data refresh, or adjustment functionalities in Excel.

    >>>Please help me to solve my issue. If you were in my position, what approach you will choice and what tools you will use to do it.

    As a workaround, you could use the Open XML SDK to embed the contents of an RTF file into a DOCX file as an altChunk (Word imports the embedded RTF when it opens the document). Refer to the code below:
    using System.IO;
    using System.Linq;
    using DocumentFormat.OpenXml;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;

    class Program
    {
        static void Main(string[] args)
        {
            CreateWordprocessingDocument("D:\\convertDocx.docx");
        }

        public static void CreateWordprocessingDocument(string filepath)
        {
            // Create a document by supplying the filepath.
            using (WordprocessingDocument wordDocument =
                WordprocessingDocument.Create(filepath, WordprocessingDocumentType.Document))
            {
                // Add a main document part.
                MainDocumentPart mainPart = wordDocument.AddMainDocumentPart();

                // Create the document structure and add some text.
                mainPart.Document = new Document();
                Body body = mainPart.Document.AppendChild(new Body());

                Paragraph para = body.AppendChild(new Paragraph());
                Run run = para.AppendChild(new Run());
                run.AppendChild(new Text("Create text in body - CreateWordprocessingDocument"));

                string altChunkId = "AltChunkId1";
                AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                        AlternativeFormatImportPartType.Rtf, altChunkId);

                // Copy the RTF file into the part byte for byte, so that no
                // characters are lost by decoding and re-encoding the text.
                using (FileStream rtfStream = File.OpenRead("D:\\Scope.rtf"))
                {
                    chunk.FeedData(rtfStream);
                }

                AltChunk altChunk = new AltChunk();
                altChunk.Id = altChunkId;

                // Embed AltChunk after the last paragraph.
                mainPart.Document.Body.InsertAfter(
                    altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());

                mainPart.Document.Save();
            }
        }
    }
    For more information, see "Welcome to the Open XML SDK 2.5 for Office".

    Thanks for your understanding.
    Wednesday, August 10, 2016 8:15 AM
  • Dear David,

    1) This is not a conversion; the tool just puts the RTF file into the ZIP package.

    2) The code is nice, but it loses some formatting, such as text box positioning and paragraph styles.

    To return to my main topic: how can I ask Microsoft to provide conversion tools for customers?

      

     


    Irakli Lomidze


    Wednesday, August 10, 2016 10:07 AM
  • >>>To return to my main topic: how can I ask Microsoft to provide conversion tools for customers?

    As far as we know, Word can open an RTF file and then save it as a DOCX file. If you want an additional tool or feature, you could submit feedback to Word UserVoice:

    https://word.uservoice.com/

    Thanks for your understanding. 
    Thursday, August 11, 2016 3:00 AM
  • Conversion between file formats is never guaranteed to achieve 100% fidelity in the conversion. For example, Microsoft has documented numerous conversion issues when converting between DOCX and ODT. See: https://support.office.com/en-us/article/Differences-between-the-OpenDocument-Text-odt-format-and-the-Word-docx-format-d9d51a92-56d1-4794-8b68-5efb57aebfdc

    Obviously, if your RTF files contain data constructs that aren't supported in the destination format, they won't be converted correctly, if at all. For more information about the RTF 1.8 specifications, see:
    http://www.microsoft.com/downloads/details.aspx?familyid=ac57de32-17f0-4b46-9e4e-467ef9bc5540. See also: https://support.microsoft.com/en-us/kb/924944

    Furthermore, because Word uses the active printer driver for document layout purposes, even the same doc or docx file can appear differently depending on what printer driver is being used - even on the same system. Even the Word version will affect the layout. Word 2010 & earlier use different justification algorithms than Word 2013 & later - and neither of these might match the justification used by whatever program was used to create the RTF files. Thus, even where 100% data fidelity is achieved, layout fidelity cannot be assured.


    Cheers
    Paul Edstein
    [MS MVP - Word]

    PS: This appears to be a resumption of your previous thread on essentially the same topic: https://social.msdn.microsoft.com/Forums/office/en-US/15872b8c-195f-4ccb-bb58-a280c1d1129f/converting-word-rtf-documents-to-docx-in-word-20132016-using-wordcnvpxycnv-rtf2foreign32?forum=worddev
    • Edited by macropodMVP Tuesday, August 16, 2016 10:21 PM PS Added
    Tuesday, August 16, 2016 5:13 AM
  • Dear Sirs,

    Yes, I opened this conversation about Word conversion again, because it is very important for us as developers.

    You must agree that installing SharePoint 2013 (Windows Server 2012 plus SQL Server plus some services) just for file conversion is too heavy.

    I also understand that Word Automation Services needs a rendering mechanism to convert to PDF and XPS, and that this requires a lot of resources.

    But conversion between input formats (DOCX, RTF, DOC) is a much simpler task, and I think it would be possible with a simple library.

    Microsoft SharePoint team, please make at least that part of the library accessible to developers without SharePoint. It might also be considered part of the Microsoft Open Specification Promise. :)

    Thank you in advance.

    Irakli Lomidze

    Friday, August 19, 2016 8:36 PM
  • To install Sharepoint 2013 (Windows 2012 Server Plus SQL Server + some Services) just for file conversion you must agree that it is too heavy.

    ... 

    But for conversion between Input format (DOCX, RTF, DOX) it is much simple task and I think possible with simple a library.

    Regardless of whether installing Sharepoint is 'too heavy', I've already given you a macro for the conversion that could have completed the task many times over by now. All it would have needed, had you said you wanted to save in the ODT format as well, is the addition of a single line of code. It seems you are fixated on a method instead of a solution.
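
    For reference, the two target formats named in the thread title correspond to the `WdSaveFormat` values 24 (`wdFormatStrictOpenXMLDocument`) and 23 (`wdFormatOpenDocumentText`). The sketch below only pairs each output path with its format value. `SaveTargets` and its `For` method are hypothetical names used for illustration and are not part of the Word object model.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (illustration only): lists the outputs to produce for one RTF file.
public static class SaveTargets
{
    // Numeric values of Word's WdSaveFormat enumeration.
    public const int WdFormatOpenDocumentText = 23;       // .odt
    public const int WdFormatStrictOpenXmlDocument = 24;  // .docx, Strict Open XML

    // Returns (target path, save format value) pairs for the given source file.
    public static List<KeyValuePair<string, int>> For(string rtfPath)
    {
        return new List<KeyValuePair<string, int>>
        {
            new KeyValuePair<string, int>(
                Path.ChangeExtension(rtfPath, ".docx"), WdFormatStrictOpenXmlDocument),
            new KeyValuePair<string, int>(
                Path.ChangeExtension(rtfPath, ".odt"), WdFormatOpenDocumentText),
        };
    }
}
```

    With the interop assemblies, each pair could be passed as the file name and file format of `Document.SaveAs2`; in VBA or VBScript the same numbers can be given to `SaveAs`. Which ODF version Word writes depends on the Word version, so the output should be checked if ODF 1.2 specifically is required.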

    Cheers
    Paul Edstein
    [MS MVP - Word]

    • Marked as answer by David_JunFeng Wednesday, August 24, 2016 2:39 PM
    • Unmarked as answer by Irakli Lomidze Thursday, August 25, 2016 10:43 AM
    Friday, August 19, 2016 11:08 PM

  • The MVP team's answers to this post look as though team members are just collecting points and answering for the sake of answering, and they are out of context.

    Paul,

    The code you provided is just a Word macro, which I definitely did not ask for.
    If you are unaware of the issues with heavy COM automation tasks, or just do not want to answer this question (e.g. because of some policy), please do not reply just for the sake of giving 'some' answer.

    AND I THINK IT IS NOT YOUR TASK TO JUDGE WHETHER THE CUSTOMER IS FIXATED ON A PROBLEM OR NOT, ESPECIALLY WHEN YOU ARE UNABLE TO PROVIDE REAL SUPPORT.

    To the MS team:

    The question is still not answered. Please explain why the Microsoft team avoids answering questions about large-scale conversion tasks.


    Irakli Lomidze

    Thursday, August 25, 2016 10:43 AM
  • Did you ever actually try the macro I posted nearly 5 months ago??? As I have said, it could have completed the task many times over by now. On my testing it could process around 7500 documents per hour. And, for what it's worth, I have successfully run macros that required 10 or more hours continuous running (to process about 12 million items in text files and analyse them), so the 'issues' you claim to have with that approach are well and truly overblown.

    Cheers
    Paul Edstein
    [MS MVP - Word]

    Thursday, August 25, 2016 12:14 PM
  • Paul,


    Yes, of course I have tested the code. In terms of performance, the result was the same as with the Word interop assemblies.

    The problem is not about having working code for a VBA macro or Word interop. Both work fine in terms of functionality.

    As far as I know, VBA is also based on COM automation.

    The problem is that unattended COM automation raises multiple COM errors. I have also found some workarounds posted on Stack Overflow, which configure the DCOM application "Microsoft Word 97-2003 Document" to run as the interactive user or as a specific user. It seems to work for 2-3 hours of testing without notable problems, but as far as I know it is not a supported scenario.
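
    One commonly used mitigation for long unattended runs, which is not an official Microsoft recommendation and does not make the scenario supported, is to restart the converter process every N documents, retry a failed document with a fresh process, and record finished files so that an interrupted run can resume. The sketch below shows only that control flow. `IConverterSession`, `BatchRunner`, the checkpoint file, and the `recycleEvery` and `maxAttempts` parameters are all hypothetical names, and the actual Word calls are left behind the interface.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical abstraction: one running converter instance (for example, one Word process).
public interface IConverterSession : IDisposable
{
    void Convert(string sourcePath);
}

public static class BatchRunner
{
    // Converts each file, recycling the session periodically and after any failure.
    // Returns the files that still failed after maxAttempts tries.
    public static List<string> Run(
        IEnumerable<string> files,
        Func<IConverterSession> startSession,
        string checkpointPath,
        int recycleEvery,
        int maxAttempts)
    {
        // Files already listed in the checkpoint were converted by an earlier run.
        var done = new HashSet<string>(
            File.Exists(checkpointPath) ? File.ReadAllLines(checkpointPath) : new string[0],
            StringComparer.OrdinalIgnoreCase);
        var failed = new List<string>();
        IConverterSession session = null;
        int convertedInSession = 0;

        try
        {
            foreach (string file in files)
            {
                if (done.Contains(file)) continue;

                bool ok = false;
                for (int attempt = 1; attempt <= maxAttempts && !ok; attempt++)
                {
                    try
                    {
                        if (session == null)
                        {
                            session = startSession();
                            convertedInSession = 0;
                        }
                        session.Convert(file);
                        convertedInSession++;
                        File.AppendAllLines(checkpointPath, new[] { file });
                        ok = true;
                    }
                    catch (Exception)
                    {
                        // The session may be in a bad state; discard it and retry with a fresh one.
                        Discard(ref session);
                    }
                }
                if (!ok) failed.Add(file);

                // Proactively restart the converter before it has handled too many documents.
                if (session != null && convertedInSession >= recycleEvery)
                    Discard(ref session);
            }
        }
        finally
        {
            Discard(ref session);
        }
        return failed;
    }

    private static void Discard(ref IConverterSession session)
    {
        if (session == null) return;
        try { session.Dispose(); }
        catch (Exception) { /* a dead process may throw on shutdown; ignore */ }
        session = null;
    }
}
```

    In a Word-based implementation, `startSession` would presumably create the `Word.Application` object, and `Dispose` would call `Quit` on it and release the COM reference. Files returned in the failed list can then be inspected by hand instead of stopping a multi-day run.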


    Additionally, I have tested document conversion with SharePoint 2013 Word Automation Services. As I already wrote, performance was 9-10 times faster than Word automation on the client side (and I would like to know why and how), even though the test server was a virtual machine running on the same developer workstation.

    It works, and if I do not find a better way, I have a working and supported scenario: converting my RTF database to DOCX using SharePoint. But my idea, the one you call "FIXATED", was simply to be able to use the conversion services as a library.

    I will try to explain why this is important from my point of view.
    Microsoft does not support the Word object model in unattended scenarios (not only on servers) and refers us to the Open XML SDK. That is more or less acceptable, but to manipulate documents as DOCX you first need to convert them to that format. The problem is that this part of the developer library is missing.
    And what I said was that using SharePoint just for a conversion task is too heavy.

    I am just trying to convince Microsoft to endorse this idea and build such a conversion library.

    And finally, you know better than I do that the UK Government, NATO, the Library of Congress, and many other big entities are pushing ODF as their main document format.
    This does not directly address my argument, but the world definitely needs such conversion libraries.

    Irakli Lomidze

    Thursday, August 25, 2016 1:37 PM
  • Hi Irakli Lomidze,

    Thanks for sharing your experience with us. I see that Eric White's blog also mentions some of the same issues.

    https://blogs.msdn.microsoft.com/ericwhite/2010/12/02/understanding-the-three-approaches-to-office-development-using-vsto/

    I think this information will help other community members when they run into similar issues.

    If you want Word to support this feature or library in the future, you could submit feedback to Word UserVoice:

    https://word.uservoice.com/

    Thanks for your understanding. 
    Friday, August 26, 2016 9:04 AM
  • Hi David_JunFeng,

    I have already submitted feedback on word.uservoice.com; I hope the Microsoft dev team will pay attention to it.

    I also noticed that there are two other UserVoice forums, officespdev.uservoice.com and sharepoint.uservoice.com.

    Do you think I should duplicate the feedback I submitted on word.uservoice.com in those forums too?

    Thank you

    Irakli Lomidze

    Friday, August 26, 2016 2:14 PM
  • Hi Irakli Lomidze,

    I think you do not need to submit duplicate feedback. Microsoft engineers will identify the root cause and resolve this issue as soon as possible.

    Your patience will be greatly appreciated.

    Thanks for your understanding.
    Tuesday, August 30, 2016 8:03 AM