Content Migration - Doc to SharePoint

  • Question

  • Hi, I am working on a content migration project in which we are migrating Word documents (.doc) to SharePoint. Here is the approach we are following:

    • The .doc file is converted to HTML.
    • We use a tool to convert the HTML to ASPX.
    • The ASPX is pushed into SharePoint.

    I am facing a problem with images in the documents. There is no issue when an image is pasted directly into a document, because I can retrieve it from the content folder ("FileName_Files"). However, I cannot get the image if it is embedded (Insert --> Object --> select an image and check the "Display as icon" check box). When I convert the .doc file to HTML, I get one .emz file and one .gif file, which is the icon image, in the FileName_Files folder. I want to retrieve the image that is embedded.
    I have tried unzipping the .emz file with gzip, but that again gave me the icon, not the real image.
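
    For reference, the gzip step described above can be sketched in Python. An .emz file is a gzip-compressed EMF metafile, so decompressing it yields an .emf file. The function name `emz_to_emf` and the file name in the usage note are hypothetical, chosen only for illustration. As noted above, the resulting metafile is only the icon rendering, not the embedded image.

```python
import gzip
import shutil
from pathlib import Path


def emz_to_emf(emz_path, emf_path=None):
    """Decompress a gzip-compressed .emz file into a plain .emf file.

    Hypothetical helper: if no output path is given, the .emf file is
    written next to the input with the suffix changed.
    """
    emz_path = Path(emz_path)
    emf_path = Path(emf_path) if emf_path else emz_path.with_suffix(".emf")
    # Stream the decompressed bytes so large metafiles are not held in memory.
    with gzip.open(emz_path, "rb") as src, open(emf_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return emf_path
```

    A call such as `emz_to_emf("FileName_Files/image001.emz")` (an assumed file name) would write `image001.emf` into the same folder.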

    Can anyone help me with this? I just need to be able to retrieve the image.

    Thursday, May 12, 2011 5:18 AM

All replies

  • Hi Abinash Patra,

    I noticed that you have checked "Display as icon". You can try unchecking the check box to see whether you can get the real image.

    Hope the suggestion helps.

    Best Regards,


    Bruce Song [MSFT]
    MSDN Community Support | Feedback to us
    Get or Request Code Sample from Microsoft
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.

    Friday, May 13, 2011 8:01 AM
  • Hi,

    Thanks for the reply.

    I am still getting the same .emz file and am not able to retrieve the original image. Is there any way I can retrieve the original image?

     

    Regards,

    Abinash

    Monday, May 16, 2011 11:39 AM
  • Hi Abinash,

    To get the original image, you may need to insert the image directly rather than inserting it as an object. I have tested this on my side, and with that approach I can get the original image.

    Best Regards,


    Bruce Song [MSFT]

    Tuesday, May 17, 2011 9:09 AM
  • Hi,

    Yes, absolutely. If we copy and paste an image into a document, we can get it directly.

    However, if the image is embedded, we cannot get it directly, and this is the scenario I want to address. The documents being migrated are old, and some of them may contain such embedded images, hence I wanted some help with this case. Is there no other way? Do we have to present this as a constraint?

    Thanks,

    Abinash

    Wednesday, May 18, 2011 6:33 AM
  • Dear Abinash,

    How did you convert the .doc file to HTML? Did you use Word's Save As function? I used it, but I can't get the .emz file you mentioned.

    >>However if the image is embedded we cannot get it directly

    It could be by design: the Save As HTML function can't extract images from an embedded object. Images that are embedded as objects are compressed, so you only get the icon, not the original image.

    Hope you can figure this out.

    Regards,


    Be happy.
    Wednesday, May 18, 2011 9:03 AM
  • Hi,

    I converted the .doc file to .docx and then converted the .docx to HTML.
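
    Since a .docx file is a ZIP package, one possible way to look at the embedded object itself, rather than at the HTML export, is to pull the relevant parts straight out of the package. The sketch below assumes the usual package layout, in which embedded OLE objects are stored under `word/embeddings/` and preview or icon images under `word/media/`. The function name `extract_embedded_parts` and its parameters are hypothetical. The extracted `.bin` files are OLE containers, so getting the original image out of them is a further step that is not shown here.

```python
import zipfile
from pathlib import Path


def extract_embedded_parts(docx_path, out_dir,
                           prefixes=("word/embeddings/", "word/media/")):
    """Copy embedded-object and media parts out of a .docx package.

    Hypothetical helper; the default prefixes assume the usual .docx
    layout. Returns the list of files written to out_dir.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Skip directory entries; keep only parts under the chosen folders.
            if name.startswith(prefixes) and not name.endswith("/"):
                target = out_dir / Path(name).name
                target.write_bytes(archive.read(name))
                saved.append(target)
    return saved
```

    For example, `extract_embedded_parts("FileName.docx", "extracted")` (assumed names) would copy any `oleObject*.bin` and image parts it finds into the `extracted` folder.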

     

    Regards,

    Abinash

    Wednesday, May 18, 2011 9:37 AM
  • Dear Abinash,

    Did you use Word's built-in Save As method? If so, the embedded images can't be exported in their original form.

    Here is a third-party tool you can try:

    http://www.coolutils.com/DocX-to-HTML

    Hope this helps.

    Regards,


    Be happy.
    • Proposed as answer by Mike_HelpYou Wednesday, May 25, 2011 8:10 AM
    Thursday, May 19, 2011 9:47 AM