Need PDF files to be searchable in SharePoint but to retain copy protection when the end user views them
- Hi Folks
In our environment we have a number of PDF documents that have copying and extraction set to not allowed. Basically this is to stop the end user copying the content and pasting it somewhere else. The problem is that no PDF with this setting gets its content indexed by SharePoint. That seems fair enough to me, but ideally we need the content to be searchable in SharePoint while retaining the copy protection when the end user views it.
Currently we use the Adobe IFilter 9. Do we have any options?
- Hi DMCE
How are you setting copying and extraction to not allowed? Is it specific to a user or a general rule?
Cheers
- The PDF is created in Adobe Acrobat Standard. Password security is added, and "Enable copying of text, images, and other content" is left unchecked. No password is required to open the document.
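For background: a restriction like this is stored in the PDF's standard security handler as a permissions integer (the /P entry), and per the PDF specification bit 5 (value 16) governs copying or extracting text and graphics. The sketch below only decodes a /P value you already have (for example, read with a PDF library of your choice); it does not open or parse a PDF file, and the sample values are illustrative, not taken from the documents in this thread.

```python
# Permission bits of the PDF standard security handler (/P entry).
# Positions are 1-based from the low-order bit, as in the PDF
# specification; only a subset of the defined bits is listed here.
PERMISSION_BITS = {
    3: "print",
    4: "modify contents",
    5: "copy or extract text and graphics",
    6: "add or modify annotations",
    9: "fill in form fields",
    10: "extract for accessibility",
    11: "assemble document",
    12: "high-quality print",
}


def to_unsigned32(p: int) -> int:
    """/P is stored as a signed 32-bit integer; normalize it to unsigned."""
    return p & 0xFFFFFFFF


def is_allowed(p: int, bit: int) -> bool:
    """Return True if the given 1-based permission bit is set in /P."""
    return bool(to_unsigned32(p) & (1 << (bit - 1)))


def allowed_operations(p: int) -> list[str]:
    """List the named operations that this /P value permits."""
    return [name for bit, name in PERMISSION_BITS.items() if is_allowed(p, bit)]


if __name__ == "__main__":
    # Illustrative values: -4 sets every bit except bits 1 and 2
    # (all permissions granted); clearing bit 5 models "copying disabled".
    all_granted = -4
    no_copy = all_granted & ~(1 << 4)
    print(no_copy)                   # -20
    print(is_allowed(no_copy, 5))    # False: copy/extract is denied
    print(is_allowed(no_copy, 10))   # True: accessibility extraction still set
```

The point of the sketch is that "copy protection" is a flag the reading software is expected to honour. With no open password, the content is encrypted only with the empty user password, so any conforming reader can decrypt it; whether a particular IFilter then indexes the text is a product decision, which is consistent with the differing Adobe and Foxit results reported later in this thread.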
Thanks Nitin
- I guess then no tool can extract the text out of that PDF. If it's not possible to know whether the content is text, an image, or anything else, then SharePoint can't crawl it, even with an IFilter. An IFilter can crawl the text inside PDF files, but in your case it isn't text (or at least there's no way to know that it is).
Just my opinion.
Cheers
- You could consider integrating Information Rights Management (IRM) with SharePoint instead of using password-protected PDF files. With IRM, you can create a persistent set of access controls that live with the content while keeping the content searchable in SharePoint.
Please read the following excerpt from http://msdn.microsoft.com/en-us/library/ms458245.aspx:
File Storage in Windows SharePoint Services
Because companies often have restrictions that require their files to be stored in nonencrypted formats, Windows SharePoint Services does not store files in encrypted, rights-managed file formats. However, Windows SharePoint Services calls an IRM protector to convert the stored file to an encrypted format each time a user downloads the file. Similarly, when a user uploads a rights-managed copy of a file, Windows SharePoint Services calls the appropriate IRM protector to convert that copy to a nonencrypted format before it is stored.
As a result, you don't need to create custom solutions to enable searching or archiving of document libraries where IRM is enabled. Storing the files in nonencrypted format ensures that the current Search indexing service is able to crawl content stored on the servers. Search results are already scoped to user permissions, so the user never sees search results that include content to which they do not have some level of access.
- This is a solution we would ideally like to implement, but it's not something we will be doing anytime soon, so I still have the issue that a PDF which doesn't allow copying or extraction can't be indexed. It doesn't need a password to be opened, though.
- To index the PDF, though, the search crawler needs to read the file contents, and if, as you say, "Enable copying of text" is not allowed, then the crawler cannot get to the content.
GuYuming's answer is technically correct, but IRM is a big deal to implement, so it wouldn't help you in the short term :(
/afk
- I appreciate that GuYuming's answer is valid and I'm thankful for it, but it's not really an answer to my question or problem. I spoke to Foxit, who claim their filter will read the content of this type of PDF, so I will have a look at it.
- Thanks for sharing information with us!
- I can confirm that the Foxit PDF filter can access the type of PDF described above, whereas the Adobe one can't.
- According to the Foxit PDF IFilter 1.0 for Microsoft Office SharePoint Server User Manual, it cannot read password-protected PDFs either.
- If you read my initial comments, I said the document doesn't need a password to open; it only has a password protecting its security settings. I tested this myself after Foxit confirmed that it was possible.
With version 1 of the Foxit filter, I can index PDF files that (a) don't have a password set for opening the document and (b) have copy and extraction protection enabled.
Your answer isn't an answer to my question, so again I'm going to unmark it.
Looking into this further, our current intranet uses Adobe IFilter version 6, which also appears to be able to index this type of PDF, so I'm going to test it in SharePoint and then contact Adobe to see what they say.


