We have identified an issue with Word 2010 in our integration.
We have a tool that opens files in Word, reads some information from them, and saves the data in a database. We use MFC with the OleDispatch integration of Word. The tool opens a large number of documents (100,000 and more), and Word gets slower the longer the tool runs.
We already implemented some workarounds in the past, such as restarting the Word application every 1000 documents, and we later reduced this number to only 250 documents. This was necessary for some older versions of Word, which crashed after a while. Now we have a new concern with Word 2010, although we are not sure in which version the problem started.
We see that for small 1 KB files in Rich Text Format, Word sometimes needs 30 ms to open a document, but the next document can take up to 600 ms. We were curious about what happens in the background and started Process Monitor to check the file access. We see that for each opened file Word writes a link to that file in the folder %USERPROFILE%\Application Data\Microsoft\Office\Recent, even though our integration explicitly tells Word not to store the document in the list of recently opened files. Furthermore, we see that the index.dat file in this folder is read and written very often, which consumes a lot of I/O. There were 40,000 files in the folder, and even navigating to it in Explorer was very slow. After cleaning up this folder, the speed was back.
We have not analyzed the complete I/O yet, but we see that Word starts with some 100 MB of I/O for a batch of 250 files (1 KB each) and goes up to 1.5 GB after a couple of runs.
Does anybody know why the file links are written to this folder even if the flag for storing recent files is set to FALSE?
Can we somehow prevent the creation of these files?
Is this something new in Word 2010? We did not hear about such issues with previous versions of Word.
And why are so many files written to one folder? We learned in the past that Windows has serious performance issues with so many files in one folder and that the files should be distributed across subfolders. Why does Microsoft itself not follow this rule? And is there a cleanup mechanism that can be activated to restrict the number of links in this folder?
Many thanks for any helpful comments. Currently we run a separate thread that cleans up this folder, but this is only a hack and we are still looking for a proper solution.
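The cleanup thread mentioned above could look roughly like the following. This is only a minimal sketch in Python, not the code from our MFC tool: the function name `prune_recent_links` and the parameters `keep` and `min_age_seconds` are made up for illustration, and it assumes that removing old .LNK files from the Recent folder is harmless for Word. It does not touch index.dat.

```python
import os
import time
from pathlib import Path


def prune_recent_links(recent_dir, keep=100, min_age_seconds=60.0):
    """Delete all but the `keep` newest .lnk files in recent_dir.

    Files modified less than min_age_seconds ago are left alone, so links
    that Word may still be working with are not removed (an assumption).
    Returns the number of files deleted.
    """
    now = time.time()
    links = [p for p in Path(recent_dir).iterdir()
             if p.is_file() and p.suffix.lower() == ".lnk"]
    # Newest first, so the first `keep` entries are the ones to retain.
    links.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    deleted = 0
    for link in links[keep:]:
        if now - link.stat().st_mtime < min_age_seconds:
            continue
        try:
            link.unlink()
            deleted += 1
        except OSError:
            # The file may be locked or already gone; skip it.
            pass
    return deleted
```

Such a function would be called periodically from the cleanup thread with the path of the Recent folder.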
- Edited by Paul73D Thursday, February 02, 2012 3:29 PM Change title
I'm going to engage an additional resource to help discuss your performance issue using your tool.
Until then, have you seen the following registry keys:
Name : NoRecentDocsMenu
Name : NoRecentDocsHistory
Name : ClearRecentDocsOnExit
Name : NoInstrumentation
Sincerely, Susan Microsoft Community Support
Thank you for the first hint.
Indeed, the parameters above influence the usage of that folder. In the end, only NoRecentDocsHistory prevents the files from being written to that folder, and the parameter MaxRecentDocs limits the number of files in that folder.
BUT I am still wondering why the index.dat file keeps growing even when MaxRecentDocs is set to a specific value. The index.dat file continues to grow and slows Word down, because it looks as if the complete file is read and written by Word several times while a document is opened. We did not experience these performance issues before we started using Word 2010 (we are not sure about Word 2007, as the previous version of our application used Word 2003 only).
And even though NoRecentDocsHistory prevents the files from being written to that folder, it also prevents the user from using the Recent Documents menu in Word. The complete list is empty. We cannot use it like this, as users want to have their opened documents listed in Word's recent file list.
And now the last comment. The original question remains: why are links stored in this location even if we instruct Word not to store the file in the list of last used documents? Word does not display the file in the recent documents list, but the links on the disk remain.
Is this a bug in the Word 2010 automation?
And how can we solve the issue or instruct Word to suppress the storage of the file links as well?
I tried to reproduce the issue using VBScript code, but I could not reproduce the generation of multiple links. I am using Word 2010 (14.0.6024.1000) SP1 MSO (14.0.6112.5000) on Windows 7 Service Pack 1.
Set oApp = CreateObject("Word.Application")
For i = 1 To 50
    ' Fourth argument (AddToRecentFiles) is False
    oApp.Documents.Open "C:\Users\user-name\Desktop\Docs\Test1 - Copy (" & i & ").docx", , , False
Next
oApp.Quit
To run the above VBScript, save it as a .vbs file and double-click it. After running the script on 50 Word documents, I could see only 4 links in the %USERPROFILE%\Application Data\Microsoft\Office\Recent folder. The links for all other files were created and then deleted automatically by Word during processing.
Word is an end-user application and is optimized for desktop users interacting with it. It caches a lot of information in order to respond quickly to desktop users. Usually the cache is cleared when Word is idle (not receiving any requests from the user) or when Word is closed. This way the end user's productivity is not impacted.
The same applies to the links stored in the %USERPROFILE%\Application Data\Microsoft\Office\Recent folder. They are created for each document that is opened in Word, and excess links are deleted when Word gets a chance to process idle tasks or when the Word instance is closed. In case of an error or crash, this cache may not get deleted properly.
Please test again after updating Word to the latest service pack/build. To suppress all the links that are being created, the only options I see are either to disable the links by overriding the NoRecentDocsHistory key or to use an alternative approach (such as the Open XML SDK for Word Open XML files) to read the data of a Word document.
If you are using Word Open XML documents (docx), you can use the Open XML SDK (http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=5124, http://msdn.microsoft.com/en-US/office/ee396255.aspx) to read the content of a document without using the Word object model. This should be fast, and none of the above issues should occur.
Thank you for your response. Yes, you are right, the number of documents is reflected in the links created in the ...\Office\Recent folder. But we also see Word reading and writing the index.dat file in the same folder several times for each file processed.
In our case this file was several hundred KB in size and growing. This was the actual cause of the performance issue. Maybe Word reads and writes the complete content of the file again and again. Only after we deleted the file and reran the operation was the speed back.
Why is the index.dat file not cleaned up when the folder cleanup is performed?
Using the above VBScript code, I observed that the size of the index.dat file keeps increasing and decreasing (up to 2 KB) while Word is processing and finally settles at 1 KB when Word has finished processing. It does not get deleted, though.
Thinking that document size might be the factor, I created documents of more than 200 pages and ran the script, and I see the same behavior with respect to index.dat. I also modified the script to do some word processing, but I still could not reproduce the issue. Increasing the size of the documents does reduce performance, but it still did not increase the size of index.dat.
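Set oApp = CreateObject("Word.Application")
For i = 2 To 50
    Set oDoc = oApp.Documents.Open("C:\Users\userName\Desktop\Docs\Test1 - Copy (" & i & ").docx", , , False)
    oDoc.Range.Text = "test"
    oDoc.Close False ' assumption: discard the change; this step is not shown in the original post
Next
oApp.Quit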
- Edited by Shiv Khare - MSFT (Microsoft employee, Moderator) Friday, February 17, 2012 2:15 AM formatting
I can reproduce it very well.
You just have to extend your example to more files with different file names.
I did it for
- C:\Users\userName\Desktop\Docs\Test1 - Copy (" & i & ").docx
- C:\Users\userName\Desktop\Docs\Test2 - Copy (" & i & ").docx
- C:\Users\userName\Desktop\Docs\Test1 - Copy (" & i & ") - Copy.docx
and got an index.dat filled with all ~150 opened files.
In the header of the file I see only the last 4 files (I don't know why 4, since MaxRecentDocs is set to 100):
[misc]
test1 - Copy (47) - Copy.docx.LNK=0
test1 - Copy (48) - Copy.docx.LNK=0
test1 - Copy (49) - Copy.docx.LNK=0
test1 - Copy (50) - Copy.docx.LNK=0
This is followed by
[folders]
test1 - Copy (2).docx.LNK=0
test1 - Copy (3).docx.LNK=0
...
test1 - Copy (50).docx.LNK=0
test2 - Copy (2).docx.LNK=0
test2 - Copy (3).docx.LNK=0
...
test2 - Copy (50).docx.LNK=0
test1 - Copy (2) - Copy.docx.LNK=0
test1 - Copy (3) - Copy.docx.LNK=0
...
test1 - Copy (50) - Copy.docx.LNK=0
for all files ever opened.
You should be able to reproduce this behavior as well.
I tried with multiple files and ran the test several times, but the size of 'index.dat' still stays within 5 KB. Yes, it contains several entries, but that is internal to Word's processing.
- Edited by Shiv Khare - MSFT (Microsoft employee, Moderator) Tuesday, February 21, 2012 3:30 PM format
I have an additional datapoint which I hope could merit some additional attention for this problem...
I just encountered a system with a 16 MB index.dat file (and 65K .LNK files) in the Recent directory, where Office 2010 took more than 15 minutes to open a simple document.
To reproduce the symptoms you just need to change the above sample script to use unique file names (create copies and add a timestamp or GUID to the name). Since the index.dat is a simple .INI file, the test scenario above keeps recycling entries. Making the names unique makes the problem more obvious.
I observed that all files opened are added to both the [folders] section and the [misc] section. Because all names are unique, the [misc] section is maintained and kept at only 4 lines, but the [folders] section grows indefinitely.
Adding the docx extension to the list of Office extensions moves the file entries to the [docx] section, but Office 2010 still keeps adding the entries to the [folders] section without cleaning them up.
Also, sometimes the *.LNK file is not removed from the Recent directory, which can result in a large number of orphaned files.
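To watch the [folders] section grow, the entries per section can simply be counted. The following is a minimal Python sketch, assuming index.dat really is plain INI-style text as described above; the function name `count_index_entries` is made up for illustration, and the encoding of the real file is not something I have verified.

```python
from collections import Counter


def count_index_entries(text):
    """Return a Counter mapping section name -> number of key=value lines."""
    counts = Counter()
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and INI comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            counts[section] += 0  # register empty sections too
        elif section is not None and "=" in line:
            counts[section] += 1
    return counts
```

Running this on the file content before and after a batch should show [misc] staying small while [folders] keeps growing, if the behavior described above is present.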
To test this, I ran the following test cycle with a document containing just the '=lorem(5,5)' demo text against a 64-bit Word 2010 SP1 with all updates installed:
- Copy the document to a unique name by generating a GUID.
- Open the document read-only and with 'add to recent' set to False.
- Print the document to a file.
- Close the document.
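The first step of this cycle, creating uniquely named copies, can be sketched as follows. This is an illustrative Python sketch only (my actual test harness is not shown here); `make_unique_copies` is a hypothetical helper name, and the open, print and close steps against Word are deliberately left out.

```python
import shutil
import uuid
from pathlib import Path


def make_unique_copies(template, out_dir, count):
    """Copy `template` `count` times into out_dir under GUID-based names."""
    template = Path(template)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    copies = []
    for _ in range(count):
        # A fresh GUID per copy guarantees that no file name is ever reused,
        # so index.dat entries cannot be recycled between iterations.
        target = out_dir / f"{uuid.uuid4()}{template.suffix}"
        shutil.copyfile(template, target)
        copies.append(target)
    return copies
```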
Repeating this loop 25K times resulted in a 150 KB index.dat file and 16 .LNK files in the Recent directory that should have been removed when their records expired from the [misc] section of the index.dat file. The [folders] section contained all 25K entries. The frequency of abandoned .LNK files increased over time as the size of the index.dat file grew.
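Abandoned links of this kind can be listed by comparing the directory contents with the section that still tracks live entries. The sketch below is Python and rests on an assumption that I have not confirmed with Microsoft: a .LNK file counts as orphaned when its name no longer appears as a key in the [misc] section. The function name `find_orphaned_links` and the `section` parameter are hypothetical; if the extension is registered, the entries may live in a section such as [docx] instead.

```python
from pathlib import Path


def find_orphaned_links(recent_dir, index_text, section="misc"):
    """List .lnk file names in recent_dir that have no key in `section`."""
    live = set()
    current = None
    for raw in index_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].lower()
        elif current == section.lower() and "=" in line:
            # Keys look like "name.docx.LNK=0"; compare case-insensitively.
            live.add(line.rsplit("=", 1)[0].strip().lower())
    return sorted(p.name for p in Path(recent_dir).iterdir()
                  if p.suffix.lower() == ".lnk" and p.name.lower() not in live)
```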
Word never reported an error during this test, although its memory use grew by approximately 1.5 MB per document it opened. After 25K open-print-close cycles the memory usage had gone up linearly to 4 GB.
I can reproduce this behavior consistently on both 32 and 64-bit versions of Office 2010, for SP0, SP1 and SP1 hotfixed to the current patch level, on both Windows 2008 and Windows 2008R2. Office 2007 on these platforms does not seem to have this problem; there the files are not added to the [folders] section.
With kind regards,
Stefan ten Hoedt