We have identified an issue with Word 2010 in our integration.
We have a tool that opens files in Word, reads some information from them, and saves the data in a database. We use MFC with the OleDispatch integration of Word. The tool opens a large number of documents (100,000 or more), and Word gets slower the longer the tool runs.
We already implemented some workarounds in the past, such as restarting the Word application every 1000 documents, and later reduced this number to 250 documents. This was necessary for some older versions of Word, which crashed after a while. Now we have a new concern with Word 2010, although we are not sure in which version the problem started.
We see that for small 1 KB files in Rich Text Format, Word sometimes needs 30 ms to open a document, but the next document needs up to 600 ms. We were curious what happens in the background and started Process Monitor to check the file access. We see that, for each file, Word writes a link to the opened file in the folder %USERPROFILE%\Application Data\Microsoft\Office\Recent, even though our integration specifically tells Word not to store the document in the list of recently opened files. Furthermore, we see the index.dat file in this folder being read and written very often, consuming a lot of I/O. There were 40,000 files in the folder, and even navigating it in Explorer was very slow. After cleaning up this folder, the speed was back.
We have not analyzed the complete I/O yet, but we see that Word starts with some 100 MB of I/O for a batch of 250 files (1 KB each) and goes up to 1.5 GB after a couple of runs.
Does anybody know why the file links are written to this folder even if the flag for recent files storage is set to FALSE?
Can we somehow prevent the creation of these files?
Is this something new in Word 2010? We did not hear about such issues with previous versions of Word.
And why are so many files written to one folder? We learned in the past that Windows has big performance issues with so many files in one folder, and that the files should be distributed across subfolders. Why does Microsoft itself not follow this rule? And is there a cleanup mechanism that can be activated to restrict the number of links in this folder?
Many thanks for any helpful comments. Currently we run a separate thread that cleans up this folder, but this is only a hack and we are still looking for a proper solution.
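For readers who want to try the same stop-gap, here is a minimal sketch of such a cleanup pass in Python. It is not the code of the tool described above (that one is MFC/C++); the function name `purge_recent_links` and the age threshold are made up for illustration. It only removes the .lnk files and leaves index.dat alone.

```python
import time
from pathlib import Path


def purge_recent_links(recent_dir, max_age_seconds=0.0, now=None):
    """Delete *.lnk files in recent_dir that are at least max_age_seconds old.

    Returns the number of files removed. Other files (such as index.dat)
    are left untouched.
    """
    now = time.time() if now is None else now
    removed = 0
    for entry in Path(recent_dir).iterdir():
        # Only touch shortcut files; the extension check is case-insensitive.
        if not entry.is_file() or entry.suffix.lower() != ".lnk":
            continue
        try:
            if now - entry.stat().st_mtime >= max_age_seconds:
                entry.unlink()
                removed += 1
        except OSError:
            # The file may be locked or already gone; skip it.
            continue
    return removed
```

A caller would pass the expanded path of the Recent folder quoted above, for example `purge_recent_links(recent_path, max_age_seconds=3600)`, from a scheduled job or a background thread.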
- Edited by Paul73D Thursday, February 02, 2012 3:29 PM Change title
I'm going to engage an additional resource to help discuss the performance issue with your tool.
Until then, have you seen the following registry keys:
Name : NoRecentDocsMenu
Name : NoRecentDocsHistory
Name : ClearRecentDocsOnExit
Name : NoInstrumentation
Sincerely, Susan Microsoft Community Support
Thank you for the first hint.
Indeed, the parameters above influence the usage of that folder. In the end, only NoRecentDocsHistory prevents files from being written to that folder, and the MaxRecentDocs parameter limits the number of files in it.
BUT I am still wondering why index.dat keeps growing even when MaxRecentDocs is set to a specific value. The index.dat file continues to grow and slows Word down, as it looks like the complete file is read and written by Word several times while a document is opened. We did not experience these performance issues before we started using Word 2010 (not sure about Word 2007, as the previous version of our application used Word 2003 only).
And even though NoRecentDocsHistory prevents the files from being written to that folder, it also prevents the user from using the Recent Documents menu in Word: the complete list is empty. We cannot use it like this, as users want to have their opened documents listed in Word's recent file list.
And now the last comment. The original question remains: why are links stored in this location even if we instruct Word not to store the file in the list of last used documents? Word does not display the file in the recent document list, but the links on the disk remain.
Is this a bug in the Word 2010 automation?
And how can we solve the issue or instruct Word to suppress the storage of the file links as well?
I tried to reproduce the issue using VBScript code, but the issue of multiple links being generated did not reproduce. I am using Word 2010 (14.0.6024.1000) SP1 MSO (14.0.6112.5000) on Windows 7 Service Pack 1.
' Open 50 test documents; the fourth argument (AddToRecentFiles) is False.
Set oApp = CreateObject("Word.Application")
For i = 1 To 50
    oApp.Documents.Open "C:\Users\user-name\Desktop\Docs\Test1 - Copy (" & i & ").docx", , , False
Next
To run the above VBScript, save it as a .vbs file and double-click it. After running the script on 50 Word documents, I could only see 4 links under the %USERPROFILE%\Application Data\Microsoft\Office\Recent folder. Links for all other files were created and deleted automatically by Word during processing.
Word is an end-user application and is optimized for desktop users interacting with Word. It caches a lot of information to respond quickly to desktop users. Usually the cache is cleared when Word is idle (not getting any requests from the user) or when Word is closed. This way the end user's productivity is not impacted.
The same applies to the links stored under the %USERPROFILE%\Application Data\Microsoft\Office\Recent folder. They are created for each document that is opened in Word, and excess links are deleted when Word gets a chance to process idle tasks or when the Word instance is closed. In case of an error or crash, this cache may not get deleted properly.
Please test after updating Word to the latest service pack/build. To suppress all the links that are being created, the only options I see are to disable the links by overriding the NoRecentDocsHistory key, or to use an alternate approach (like the Open XML SDK for Word Open XML files) to read the data of the Word document.
If you are using Word Open XML documents (docx), you can use the Open XML SDK (http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=5124, http://msdn.microsoft.com/en-US/office/ee396255.aspx) to read the content of a document without using the Word object model. This should be fast, and none of the above issues should occur.
Thank you for your response. Yes, you're right, the number of documents is reflected in the links created in the ...\Office\Recent folder. But we also see Word reading and writing the index.dat file in the same folder several times for each file processed.
In our case this file was several hundred KB in size and growing. This was actually the cause of the performance issue. Maybe Word reads and writes the complete content of the file again and again. Only after we deleted the file and reran the operation was the speed back.
Why is index.dat not cleaned up when the folder cleanup is performed?
Using the above VBScript code, I observed that the size of the index.dat file keeps increasing and decreasing (up to 2 KB) while Word is processing, and finally settles at 1 KB when Word has finished processing. It doesn't get deleted, though.
Thinking that document size might be the factor, I created documents of more than 200 pages and ran the script; I see the same behavior with respect to index.dat. I also modified the script to do some word processing, but I still couldn't reproduce the issue. Increasing the size of the documents does reduce performance, but it still didn't increase the size of index.dat.
' Same test, but also modify each document after opening it.
Set oApp = CreateObject("Word.Application")
For i = 2 To 50
    Set oDoc = oApp.Documents.Open("C:\Users\userName\Desktop\Docs\Test1 - Copy (" & i & ").docx", , , False)
    oDoc.Range.Text = "test"
Next
- Edited by Shiv Khare - MSFT (Microsoft employee, Moderator) Friday, February 17, 2012 2:15 AM formatting
I can reproduce it very well.
You just have to extend your example to more files with different file names.
I did it for
- C:\Users\userName\Desktop\Docs\Test1 - Copy (" & i & ").docx
- C:\Users\userName\Desktop\Docs\Test2 - Copy (" & i & ").docx
- C:\Users\userName\Desktop\Docs\Test1 - Copy (" & i & ") - Copy.docx
and got index.dat filled with all ~150 files opened.
In the header of the file I see only the last 4 files (I don't know why 4, when MaxRecentDocs is set to 100):
[misc]
test1 - Copy (47) - Copy.docx.LNK=0
test1 - Copy (48) - Copy.docx.LNK=0
test1 - Copy (49) - Copy.docx.LNK=0
test1 - Copy (50) - Copy.docx.LNK=0
This is followed by
[folders]
test1 - Copy (2).docx.LNK=0
test1 - Copy (3).docx.LNK=0
...
test1 - Copy (50).docx.LNK=0
test2 - Copy (2).docx.LNK=0
test2 - Copy (3).docx.LNK=0
...
test2 - Copy (50).docx.LNK=0
test1 - Copy (2) - Copy.docx.LNK=0
test1 - Copy (3) - Copy.docx.LNK=0
...
test1 - Copy (50) - Copy.docx.LNK=0
for all files ever opened.
You should be able to reproduce this behavior as well.
I tried with multiple files and ran the test several times, but the size of ‘index.dat’ still remains within 5 KB. Yes, it contains several entries, but that is internal to Word processing.
- Edited by Shiv Khare - MSFT (Microsoft employee, Moderator) Tuesday, February 21, 2012 3:30 PM format
I have an additional datapoint which I hope could merit some additional attention for this problem...
I just encountered a system with a 16 MB index.dat file (and 65K .LNK files) in the Recent directory where Office 2010 took more than 15 minutes to open a simple document.
To reproduce the symptoms you just need to change the above sample script to use unique file names (create copies and add a timestamp or GUID to the name). Since the index.dat is a simple .INI file, the test scenario above keeps recycling entries. Making the names unique makes the problem more obvious.
I observed that all files opened are added to both the [folders] section and the [misc] section. Because all names are unique, the [misc] section is maintained and kept at only 4 lines, but the [folders] section grows indefinitely.
Adding the docx extension to the list of Office extensions moves the file entries to the [docx] section, but Office 2010 still keeps adding the entries to the [folders] section without cleaning them up.
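To watch this growth without opening the file by hand, the entries per section can be counted. The sketch below is only an illustration in Python: it assumes index.dat has already been read into a string (the file's encoding is not stated in this thread) and that it follows the plain `[section]` / `name=value` layout shown in the samples above. The function name is hypothetical.

```python
from collections import Counter


def count_index_sections(text):
    """Count name=value entries under each [section] of INI-style text."""
    counts = Counter()
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            counts[section] += 0  # register the section even if it stays empty
        elif section is not None and "=" in line:
            counts[section] += 1
    return dict(counts)
```

Running this before and after a batch of documents should show [misc] staying small while [folders] keeps climbing, if the behavior described here is present.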
Also, sometimes the *.LNK file is not removed from the Recent directory, which can result in a large number of orphaned files.
To test this, I ran the following test cycle with a document containing just the '=lorem(5,5)' demo text against a 64-bit Word 2010 SP1 with all updates installed:
- Copy the document to a unique name by generating a GUID.
- Open the document read-only and with 'add to recent' set to False.
- Print the document to a file.
- Close the document.
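The first step of this cycle can be scripted separately from the Word automation. The following Python sketch is not the test harness I actually used; it only shows one way to produce GUID-named copies of a template document. The function name and arguments are invented for the example, and the open/print/close steps are left to whatever automation client you use.

```python
import shutil
import uuid
from pathlib import Path


def make_unique_copies(template, target_dir, count):
    """Copy template into target_dir `count` times under GUID-based names."""
    template = Path(template)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    copies = []
    for _ in range(count):
        # uuid4() gives a random GUID; keep the template's extension.
        dest = target_dir / f"{uuid.uuid4()}{template.suffix}"
        shutil.copyfile(template, dest)
        copies.append(dest)
    return copies
```

Because every name is new, no entry in index.dat is ever reused, which is what makes the growth visible.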
Repeating this loop 25K times resulted in a 150k index.dat file, and 16 .LNK files in the Recent directory that should have been removed as their records expired from the [misc] section in the index.dat file. The [folders] section contained all 25K entries. The frequency of abandoned .LNK files increased over time as the size of the index.dat file grew.
Word never reported an error during this test, although its memory use grew by approximately 1.5Mb per document it opened. After 25K open-print-close cycles the memory usage had gone up linearly to 4Gb.
I can reproduce this behavior consistently on both 32 and 64-bit versions of Office 2010, for SP0, SP1 and SP1 hotfixed to the current patch level, on both Windows 2008 and Windows 2008R2. Office 2007 on these platforms does not seem to have this problem; there the files are not added to the [folders] section.
With kind regards,
Stefan ten Hoedt
Has anyone found a good workaround for this issue yet? I haven't been able to find any indication that Microsoft has fixed this in Word 2010 - or have I missed something? This issue is still causing serious slowness for us and I need to find a resolution.
We sent this question to Support and got it ultimately bounced to an escalation engineer. Unfortunately the examination was limited to mitigation, not resolution.
Resolution of the Support call:
1. Use OS policies:
Note: this will affect all applications and Explorer.
2. Use a login script to delete the contents of <user profile>\Application Data\Microsoft\Office\Recent when a user logs on.
3. Use the following registry entries to prevent shortcuts from being saved in <user profile>\Application Data\Microsoft\Office\Recent.
A) Delete the contents of <user profile>\Application Data\Microsoft\Office\Recent
B) Create the following registry key.
C) In the CacheSize registry key, create the following entries.
ENTRY NAME       TYPE        VALUE
OfficeFiles      REG_DWORD   0
NonOfficeFiles   REG_DWORD   0
Folders          REG_DWORD   0
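As an illustration only, the three entries can be written out as .reg file text with a few lines of Python. The full path of the CacheSize key is not spelled out in this post, so the sketch takes it as a required argument rather than guessing it; the function name is hypothetical.

```python
def build_cache_size_reg(key_path):
    """Return .reg file text that sets the three CacheSize entries to 0.

    key_path is the full registry path of the CacheSize key. It is not
    given in this thread, so the caller has to supply it.
    """
    entries = ("OfficeFiles", "NonOfficeFiles", "Folders")
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key_path}]"]
    # Each entry is a REG_DWORD with value 0.
    lines += [f'"{name}"=dword:00000000' for name in entries]
    return "\r\n".join(lines) + "\r\n"
```

The returned text would then be saved to a .reg file (regedit usually exports these as UTF-16) and imported for the account that runs Word.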
Regarding the AddToRecentFiles property: setting it to False only keeps the document out of 'Recent Documents' under the File menu; it does not stop the LNK files from being created.
Some additional notes from my own research:
- Setting the keys in 3 C) to something other than 0 does not resolve the problem.
- Making the 'recent' directory unwritable (with an explicit deny ACL) will cause Word to give up trying to log -- which also resolves the problem permanently. If I recall correctly replacing the directory with a file named 'recent' also worked.
With kind regards,
Stefan ten Hoedt
I encountered a system with 700,000 entries in the index.dat file and 65,000 LNK files in the recent directory. Opening documents on that system took minutes, and the time increased significantly with every subsequent document that was opened, printed and closed in the same Word instance. When I got access to the server I killed a WINWORD process that had already spent >45 minutes opening a trivial 1-page document.
I included these numbers with the bug report 2 years ago, and also included a reproduction scenario that grew Word's memory footprint by 1.5M with every document opened (and immediately closed again). This information was reported and forwarded to an escalation team and developers.
However ... do not expect a fix for this issue ...
The feedback I ultimately got was that I was not the first to report this specific problem, but that a fix had been rejected in the previous case due to scope/high risk/impact (my paraphrasing). The recommended solution _IS_ mitigation.
Some additional experiences I ran into during testing which are worth mentioning:
- Office 2007 does not have this specific problem.
- The number of LNK files orphaned in the recent directory seems to explode with concurrent use of Word automation (multiple processes connecting to their own Word instances under the same account and/or running in a service context).
- Make sure all Word instances are terminated before you clean out the recent directory. Otherwise running instances will happily rewrite the full index.dat file from memory.
With kind regards,
Stefan ten Hoedt
Thank you for still keeping an eye on this and keeping Microsoft support under pressure to treat it as an issue.
We have seen the same issue in Office 2013. So the problem still exists and nobody has fixed it so far.
Is it a valid solution to make the recent folder unwritable? Would this have side effects for Microsoft Word or the operating system itself?
I'm sorry to hear Office 2013 is also still affected by this problem. The good news is that the mitigation options are sufficient (for us) never to encounter the problem with Office 2013 at all.
Making the recent folder unwritable worked for us as an immediate resolution while we researched other options -- Word kept happily working. We observed the errors in ProcMon, but apparently Word -- or the underlying COM controls, see the policies mentioned in 1) -- ignores them. But we did not have to care about the impact on desktop experience/user workflow.
I did run only some of the mitigation scenarios and tests from a full desktop environment, where I did not notice any adverse effects. I did not care however about the administration of 'recent' documents as that is not relevant in our specific situation (other than that it didn't break anything obvious).
Ultimately we went for solution 3) above -- explicitly disabling the updating of the 'recent' folder by Office specifically -- as we could do that on the fly in the automation code (it only affects HKCU, where security is not an issue, it is not potentially overwritten by policies, and affects the 'smallest' number of applications) before instantiating Word. This solution had the least risk/impact/dependencies/moving parts etc. The only technical disadvantage is that the key includes the Word version number and changes for every version.
All mitigations have an effect on the 'recent' administration used by Word/Windows/Explorer. If I recall correctly, it affects the administration of recent lists shown in the startup menu and similar contexts. Depending on the type of solution you could disable more or less functionality that a user might expect to see or depend on. The fact that the lists might not update anymore could be considered disruptive by some.
IMPORTANT DISCLAIMER: I have absolutely no clue what the unwritable 'recent' folder solution will do combined with roaming profiles or freshly created/copied domain profiles etc.
We specifically only have to deal with local profiles or dedicated domain profiles that are non-roaming and never cleaned up/modified/recreated automatically. No roaming users, Terminal Server, Citrix, application containers or other trickery that might affect the location and state of user profiles in general and the 'recent' directory specifically. It works for us under those constraints. Your mileage might vary if you have different requirements/usage patterns. But odds are it will prove to be a horrible nightmare and fail in many unexpected ways if you roam outside these constraints.
With kind regards,
Stefan ten Hoedt
I just went through my old notes from this case again. The devil is in the details here...
I revoked _ALL_ access to the 'recent' directory (using a DENY ACL for the account running Word), not only WRITE access.
Due to intricacies in our software configuration and the way DCOM instantiates Word instances, this was _NOT_ the 'recent' directory in the user profile directory used by the user logged on to the desktop.
The only conclusion I can therefore give is that Word will be happy if it is not allowed to read/write the 'recent' directory.
I have no data points on how other applications (including Explorer) behave if they are not allowed to read/write the 'recent' directory. Because of the separate 'recent' directories we never encountered that configuration.
Better to use the Registry settings.
With kind regards,
Stefan ten Hoedt
Again, I really appreciate the help you guys are providing. So far we are seeing a dramatic improvement in performance just from clearing out the folder - but as expected it started filling up quickly again yesterday. We will look at changing the registry settings you mentioned to see which long-term mitigation will work for us.
It is really too bad that Microsoft doesn't have this fixed yet. But having awesome people like the two of you out there to help the rest of us is greatly appreciated!