How to convert .doc files to .docx in a SharePoint library programmatically

  • Question

  • Is there any way to convert .doc files to .docx in a SharePoint document library?

    I have thousands, even lakhs, of .doc files, and I need to automate converting them to .docx, whether with an automation script, a PowerShell script, or other code.

    Can someone help me with this?

    Thanks

    Gayatri

    Tuesday, March 19, 2013 5:41 AM

Answers

  • Hello Gayatri,

    You can convert files from .doc to .docx using the following options.

    Option 1 - Office File Converter

    Convert in bulk using the Office File Converter (OFC) and the Version Extraction Tool. For reference, see http://technet.microsoft.com/en-us/library/cc179019.aspx

    Option 2 - PowerShell

    See http://blogs.msdn.com/b/ericwhite/archive/2008/09/19/bulk-convert-doc-to-docx.aspx

    Convert DOC to DOCX using PowerShell

    I was tasked with taking a large number of .DOC and .RTF files and converting them to .DOCX. The files were then going to be imported into a SharePoint site. So I went out on the web looking for PowerShell scripts to accomplish this. There are plenty to choose from.

    All the examples on the web were the same with some minor modifications. Most of them followed this pattern:

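    # Load the Word interop assembly so the Wd* enum types resolve
    # (requires the Office primary interop assemblies to be installed)
    Add-Type -AssemblyName Microsoft.Office.Interop.Word
    $word = New-Object -ComObject Word.Application
    $word.Visible = $false
    $saveFormat = [Enum]::Parse([Microsoft.Office.Interop.Word.WdSaveFormat], "wdFormatDocumentDefault")

    # Get the files
    $folderpath = "c:\doclocation\*"
    $fileType = "*.doc"

    # The script block has to start on the same line as ForEach-Object
    Get-ChildItem -Path $folderpath -Include $fileType | ForEach-Object {
        $opendoc = $word.Documents.Open($_.FullName)
        # Full path without the extension; Word adds .docx when saving in this format
        $savename = $_.FullName.Substring(0, $_.FullName.LastIndexOf("."))
        $opendoc.SaveAs([ref]"$savename", [ref]$saveFormat)
        $opendoc.Close()
    }

    # Clean up
    $word.Quit()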

    After trying out several, I started converting some test documents. All went well until the files were uploaded to SharePoint. The .RTF files were fine, but even though the .DOC files were now .DOCX files, they did not allow all the functionality of .DOCX to be used.

    After investigating a little further, it turned out that when converting from .DOC to .DOCX the files are left in compatibility mode. The files are smaller, but they don't allow for things like co-authoring.
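
One way to confirm this before uploading is to read the Document.CompatibilityMode property from the Word object model (Word 2010 and later). The following is a minimal sketch, not part of the original script: the function names Get-WordCompatibilityLabel and Get-DocCompatibilityMode are hypothetical helpers, and it assumes Word is installed on the machine running it.

```powershell
# Hypothetical helper: map a WdCompatibilityMode value to a readable label.
function Get-WordCompatibilityLabel {
    param([Parameter(Mandatory=$true)][int]$Mode)
    switch ($Mode) {
        11      { 'Word 2003' }
        12      { 'Word 2007' }
        14      { 'Word 2010' }
        15      { 'Word 2013' }
        default { "Unknown ($Mode)" }
    }
}

# Hypothetical helper: open a document read-only and report its compatibility mode.
# Assumes Word 2010 or later is installed on the machine running the script.
function Get-DocCompatibilityMode {
    param([Parameter(Mandatory=$true)][string]$Path)
    $word = New-Object -ComObject Word.Application
    $word.Visible = $false
    try {
        # Arguments: FileName, ConfirmConversions = $false, ReadOnly = $true
        $doc = $word.Documents.Open($Path, $false, $true)
        $label = Get-WordCompatibilityLabel -Mode $doc.CompatibilityMode
        $doc.Close()
        return $label
    }
    finally {
        # Always shut Word down, even if opening the file failed
        $word.Quit()
    }
}
```

A file that is still in Word 97-2003 compatibility mode reports the value 11, which the helper labels 'Word 2003'.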

    So it was back to the drawing board and the web, where I found a way to turn compatibility mode off. The problem was that it required more steps, including saving and reopening the files. To use this method I had to add a compatibility mode value:

    $CompatMode = [Enum]::Parse([Microsoft.Office.Interop.Word.WdCompatibilityMode], "wdWord2010")

    And then change the code inside the {} from above to:

    {
        $opendoc = $word.Documents.Open($_.FullName)
        $savename = $_.FullName.Substring(0, $_.FullName.LastIndexOf("."))
        $opendoc.SaveAs([ref]"$savename", [ref]$saveFormat)
        $opendoc.Close()
        # Reopen the saved copy; Word added the .docx extension when saving
        $converteddoc = Get-ChildItem ($savename + ".docx")
        $opendoc = $word.Documents.Open($converteddoc.FullName)
        $opendoc.SetCompatibilityMode($CompatMode)
        $opendoc.Save()
        $opendoc.Close()
    }

    It worked, but I didn't like it. So it was back to the web again, and this time I stumbled across the real way to do it: use the document's Convert method. No one else seems to have used it in any of the examples, but it is a much cleaner way to do it than the compatibility mode setting. This is how I changed my code, and now all the files come into SharePoint as true .DOCX files.

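    # Load the Word interop assembly so the Wd* enum types resolve
    # (requires the Office primary interop assemblies to be installed)
    Add-Type -AssemblyName Microsoft.Office.Interop.Word
    $word = New-Object -ComObject Word.Application
    $word.Visible = $false
    $saveFormat = [Enum]::Parse([Microsoft.Office.Interop.Word.WdSaveFormat], "wdFormatDocumentDefault")

    # Get the files
    $folderpath = "c:\doclocation\*"
    $fileType = "*.doc"

    # The script block has to start on the same line as ForEach-Object
    Get-ChildItem -Path $folderpath -Include $fileType | ForEach-Object {
        $opendoc = $word.Documents.Open($_.FullName)
        $savename = $_.FullName.Substring(0, $_.FullName.LastIndexOf("."))
        # Convert is a method of the document, not of the Word application
        # object, so $word.Convert() fails; it upgrades the open document
        # to the newest file format and leaves compatibility mode
        $opendoc.Convert()
        $opendoc.SaveAs([ref]"$savename", [ref]$saveFormat)
        $opendoc.Close()
    }

    # Clean up
    $word.Quit()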


    Tuesday, March 19, 2013 5:54 AM
  • Hi Gayatri,

    I understand that you want to convert .doc files to .docx programmatically. Word Automation Services, available with SharePoint Server 2010, supports converting Word documents to other formats, including converting between document formats.

    For more detailed information, please refer to

    http://msdn.microsoft.com/en-us/library/office/ff181518(v=office.14).aspx
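
As a rough illustration of that approach, the sketch below queues every .doc file in one library for conversion through the ConversionJob class in Microsoft.Office.Word.Server.Conversions. It is a sketch under assumptions, not a tested deployment script: the function names Get-DocxUrl and Start-DocToDocxJob are hypothetical, the service application proxy is assumed to be named "Word Automation Services", and it has to run on a SharePoint Server 2010 machine under an account that can access the site. Conversion is asynchronous; the job is picked up later by the Word Automation Services timer job.

```powershell
# Hypothetical helper: derive the output URL by swapping a trailing .doc for .docx.
function Get-DocxUrl {
    param([Parameter(Mandatory=$true)][string]$DocUrl)
    # -replace is case-insensitive, so .DOC is handled too
    return ($DocUrl -replace '\.doc$', '.docx')
}

# Hypothetical helper: queue all .doc files in a library for conversion.
function Start-DocToDocxJob {
    param(
        [Parameter(Mandatory=$true)][string]$WebUrl,
        [Parameter(Mandatory=$true)][string]$LibraryTitle,
        # Assumed name of the Word Automation Services proxy in the farm
        [string]$ServiceName = 'Word Automation Services'
    )
    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
    [void][Reflection.Assembly]::LoadWithPartialName('Microsoft.Office.Word.Server')

    $web = Get-SPWeb $WebUrl
    try {
        $job = New-Object -TypeName Microsoft.Office.Word.Server.Conversions.ConversionJob -ArgumentList $ServiceName
        $job.UserToken = $web.CurrentUser.UserToken
        $job.Settings.OutputFormat = [Microsoft.Office.Word.Server.Conversions.SaveFormat]::Document

        $list = $web.Lists[$LibraryTitle]
        foreach ($item in $list.Items) {
            if ($item.File.Name -like '*.doc') {
                # File.Url is relative to the web, so prefix it with the web URL
                $inputUrl = $web.Url + '/' + $item.File.Url
                $job.AddFile($inputUrl, (Get-DocxUrl -DocUrl $inputUrl))
            }
        }

        $job.Start()
        # The job ID can be used later to check conversion status
        return $job.JobId
    }
    finally {
        $web.Dispose()
    }
}
```

For a library holding a very large number of files, enumerating $list.Items in one pass may be too heavy; splitting the work per folder or into batches is likely a better fit.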

    Best Regards.


    Kelly Chen
    TechNet Community Support

    Wednesday, March 20, 2013 6:17 AM

All replies

  • How do you get the Convert method? I'm stuck: $word.Convert() is reported as not a supported method, and Get-Member doesn't show it.
    Tuesday, November 12, 2013 1:31 AM