Why Can't Windows Search Find Certain Things?

    General discussion

  • Since several versions of Windows, now, Windows Search hasn't really been a "search" at all.

    It's been well-researched and documented with Windows 7 that the current implementation is geared toward producing "fuzzy" results from only the things Microsoft thinks we will want to search for, in only some kinds of files.  And now we see that this hasn't been improved in Windows 8...

     

    For example, some file types are ignored, and still others have incomplete access routines.  This shows Windows Search missing a simple word "tax" in a .log file and an old Word document:

     

     

    These problems are not new.  

    My question is this:  Why isn't Windows Search something that can be relied upon?

     

    Certainly there are 3rd party tools (one called grepWin is shown above actually finding the info), but wouldn't it be better if when we typed things into the search box at the upper-right corner of Explorer that we could implicitly trust the results?

     

    -Noel

    Thursday, December 22, 2011 3:14 PM

All replies

  • By the way, to try to technically answer my own question, which I asked in the title of this thread, the following is my understanding.  I would greatly appreciate it if anyone who understands the technical details better than I do would correct anything here I've gotten wrong, or augment this info.

     

    The Search/Indexing process relies on modules specific to each file type (keyed on file extension) to retrieve search/index information from files.  I believe these are called IFilters.  There is an impressive list of IFilter modules supplied with Windows, each of which provides search engine access to the file types it understands.

    But the list does NOT cover every possible file extension - and (here's the head-shaking part) there doesn't appear to be a "catch-all" fallback strategy at all!  So, considering my examples above, there is apparently no IFilter module supplied by default that reads search/index information from .log files, and the .doc handler that IS supplied by Microsoft doesn't seem to know how to read older Microsoft Word 6.0 files.

    You might imagine that Microsoft should provide a .* fallback module that, if no other IFilter deals with a file, would just look at the text in that file, but it just doesn't seem to exist.  And of course you'd think the modules Microsoft does supply (like the .doc handler) should read all of their own formats.
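
    To make the lookup problem concrete, here is a minimal Python sketch of an extension-keyed handler table with and without a catch-all. It is only a model: real IFilters are COM components, and the names HANDLERS, plain_text_handler and extract_text are hypothetical, not part of Windows Search.

```python
import os


def plain_text_handler(data: bytes) -> str:
    """Stand-in for a plain-text filter: decode bytes, replacing undecodable ones."""
    return data.decode("utf-8", errors="replace")


# Hypothetical table: file extension -> handler. This only models the lookup,
# not how Windows actually registers or loads IFilters.
HANDLERS = {".txt": plain_text_handler}


def extract_text(filename, data, fallback=None):
    """Return searchable text, or None when no handler claims the extension."""
    ext = os.path.splitext(filename)[1].lower()
    handler = HANDLERS.get(ext, fallback)
    return handler(data) if handler else None


print(extract_text("notes.txt", b"tax return"))   # handled by the .txt entry
print(extract_text("build.log", b"tax return"))   # no entry, so nothing to search
print(extract_text("build.log", b"tax return", fallback=plain_text_handler))
```

    With no fallback, the .log file contributes nothing to a search, no matter what it contains. Passing a fallback is the ".*" behavior described above.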

    Further, the existing IFilter modules retrieve only things someone somewhere in Microsoft has deemed worthy of searching for.  Thus you might type tax in the box and get some matches because it's a plain text word, but if you type flätsəm or some other string with characters the IFilters choose to ignore it will not find any matches.  "Trash" words like "and" and "or" are also typically ignored.  THIS is why I say that Windows Search is not a search at all.

     

    Let's go a bit further...  Consider, now, that what you get out of the box by default is an indexed "search".  That means that Windows spends its time working in the background, going through the subset of files on your disk that actually have IFilters, pulling out the words they think you might want to search for and ignoring the ones that they just know you'll never search for.  But it gets worse; unless you change configuration, some parts of the file system are completely ignored by the indexing process - this is why there are settings for searching "system files", for example.
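
    The effect of indexing only "interesting" words can be sketched with a toy inverted index in Python. The noise-word list and the ASCII-only tokenizer below are assumptions made for illustration; they are not the actual rules Windows Search applies.

```python
import re
from collections import defaultdict

STOP_WORDS = {"and", "or", "the"}   # assumed noise-word list, illustration only
TOKEN = re.compile(r"[a-z0-9]+")    # assumed tokenizer: ASCII letters and digits


def build_index(docs):
    """Map each kept token to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for token in TOKEN.findall(text.lower()):
            if token not in STOP_WORDS:
                index[token].add(name)
    return index


def lookup(index, term):
    """Return the sorted document names indexed under term."""
    return sorted(index.get(term.lower(), set()))


docs = {"a.txt": "Tax and flätsəm", "b.txt": "income tax"}
index = build_index(docs)
print(lookup(index, "tax"))       # both files
print(lookup(index, "and"))       # empty: dropped as a noise word
print(lookup(index, "flätsəm"))   # empty: the tokenizer broke the word apart
```

    Anything the tokenizer discards at indexing time can never be found at query time, however the query is phrased.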

    Microsoft apparently doesn't index just EVERYTHING because, well, an index on your hard drive that contained your entire hard drive contents might actually exceed the size of your hard drive, right?

    Okay, you might say, but indexing makes my results come out faster than if I had the computer actually open all those hundreds of thousands of files.  That's good, right?  My answer is this:  Speed is great!  But what about when the index database becomes corrupted, incomplete, or just plain wrong?  You know there are actual buttons to allow you to "Rebuild Index", right?  Why do you think those are included?  People do report index corruption.  I've read threads on the Microsoft forums where people have to trigger index rebuilds regularly in order to get any utility at all from it.

     

    Thing is, most people naively expect to be able to type things into the box and find all the files that contain those things.  It may be that in real life a "search" doesn't always turn up everything being searched for.  A search for a lost pair of eyeglasses may fail.  But computers are especially good at not getting distracted or doing incomplete work.  Unless they're programmed to do so.

     

    Now keep in mind that Windows Search is becoming more and more a part of your Windows experience.

     

    Microsoft:  Please fix Windows Search for Windows 8.

     

    Thank you.

     

    -Noel

    Thursday, December 22, 2011 5:56 PM
  • Thinking that this might be related to Search Settings, I pressed Win- and typed Search.  By default this opened Search Apps, so I cursored down to Settings and hit Enter.  Then (I think) I picked one.   Anyway my Metro Control Panel is now open at Search.   Not what I want, so I do it again.   Now Search Settings is empty?  But this Control Panel "app" is still there.  How do I get rid of that and see more hits for my search?

    Somehow that eventually went away.   Anyway, by then I remembered what I was looking for: Indexing Options.   E.g. is the  *.log  extension  indexed for content?  Picking that option from my Search got  COM Surrogate  COM Surrogate has stopped working.   And then  The remote procedure call failed.

    So, ignoring Metro, let's try what I really want to do from our old standby  Administrator: Command Prompt.   WTH?  The same thing!   YMMV.

    I think this all will boil down to some horrible UI decisions producing a horrible UX.  But testers don't get to question UI decisions and are just supposed to imagine what is WAD/BAD or not.  ; ]

    My guess is that there is a workaround for you but who knows what it will look like?

     

    Robert
    ---

    Thursday, December 22, 2011 5:59 PM
  • My guess is that there is a workaround for you but who knows what it will look like?

    For me, personally, because I will be using the Desktop and avoiding Metro as much as I can, I've already pictured the workaround above - 3rd party software (e.g., grepWin as shown).  Disabling indexing entirely also seems to be effective with Windows 7. 

    When using the Desktop-centric Windows 7, these things, along with development of good habits, create a decent workaround, as I have learned not to rely on Windows Search results for critical work, and when I *do* use it disabling indexing definitely increases the probability of the results being accurate.  For that I'm willing to wait a few seconds longer for those results.

    But we see Windows Search and indexing appearing integrated into more of Windows 8.  Some stuff may just not work with indexing disabled (on my list of things to do is to try it).  Will the stuff that doesn't work be something those who care about results accuracy really need?  Who can say?  I can certainly live without indexed social networking results.  But what else relies on this broken subsystem that we don't know about yet?

    This is no small issue.

    -Noel




    • Edited by Noel Carboni Thursday, December 22, 2011 6:17 PM clarified wording
    Thursday, December 22, 2011 6:12 PM
  • The "technical" reason is indeed IFilters and deliberately ignored locations. For "most" people, it's deeply problematic if search "finds" things in TEMP or other temporary document copies (because the user opens it, saves it, then the system cleans it up and they lose work). It's also confusing if it starts throwing up log files and other system files they have no idea about when they're just trying to find "their" stuff. What's more, people want to find things inside files that aren't purely text based, and IFilters provide a solution to this, since they can effectively translate the query into something more appropriate for the type of file they're dealing with.

    If you find yourself really suffering because of this, there are widely documented ways of attaching the plain text IFilter to other file types (such as .log) or even to all file types (.*), but obviously this leads back to the situation you often saw in XP where random binary files start showing up in search simply because they happen to contain a sequence of bytes that matches what you're trying to find. And, as you say, you can add any location or even an entire drive to be indexed if you so choose (indexing the whole drive and all files won't end up bigger than the whole drive, but arbitrary operations on files you'd probably never want to find will cause them to be re-indexed again, impacting performance for very little gain).
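
    As a rough illustration of what such an association might look like, the Python sketch below only builds the text of a .reg file; it does not touch the registry. It assumes the link is the default value of a PersistentHandler key under the extension's HKEY_CLASSES_ROOT entry. The function name is hypothetical and the GUID shown is a placeholder, so copy the real value from the .txt entry on your own machine and back up the registry before importing anything.

```python
def persistent_handler_reg(extension: str, handler_guid: str) -> str:
    """Build .reg text that points an extension at a PersistentHandler GUID."""
    if not extension.startswith("."):
        extension = "." + extension
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[HKEY_CLASSES_ROOT\\{extension}\\PersistentHandler]\n"
        f'@="{handler_guid}"\n'
    )


# Placeholder only. The real value is assumed to be whatever the default value
# of HKEY_CLASSES_ROOT\.txt\PersistentHandler holds on the reader's system.
PLAIN_TEXT_HANDLER_GUID = "{00000000-0000-0000-0000-000000000000}"

print(persistent_handler_reg("log", PLAIN_TEXT_HANDLER_GUID))
```

    Even if the association takes effect, the extension may still need to be enabled in Indexing Options and the index rebuilt before existing files show up.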

    Thursday, December 22, 2011 10:30 PM
  • If I search for a certain file by name, I still prefer the good old command line, running as administrator and executing something like DIR filename /s /a from the root of the drive or from a folder in which the file potentially exists.
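
    For comparison, the same exhaustive by-name walk can be sketched in Python. This is an approximation of DIR /s /a, not an equivalent: it reports files only, and find_by_name is a hypothetical helper name.

```python
import fnmatch
import os


def find_by_name(root, pattern):
    """Yield full paths under root whose file name matches pattern, ignoring case."""
    # os.walk does not filter on the hidden or system attributes, but it
    # silently skips directories it is not allowed to read.
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if fnmatch.fnmatch(name.lower(), pattern.lower()):
                yield os.path.join(folder, name)
```

    A call such as list(find_by_name("C:\\", "filename.ext")) opens no index at all, so the result reflects what is on disk at that moment, at the cost of walking every folder.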

    Windows Search is powerful, but not really reliable, since by default system folders and hidden folders are ignored. At least with Windows 7 it can now also find files in Downloaded Program Files folder, which was an issue in Vista.

    Best greetings from Germany
    Olaf

    Thursday, December 22, 2011 10:39 PM
  • Thanks for your thoughts, Andy and Olaf.

    The "technical" reason is indeed...

    I guess the part where a system is made "idiot proof" in a way such that it becomes all but useless to "non-idiots" is a basic irritation of mine...  Perhaps I'm naive, but I believe a good system should be able to support everyone at every level of need.

    Why not call the filtered/indexed thing they developed something like "QuickLocate" or similar, and ALSO provide an exhaustive true "Search" that COULD be configured for use in the same box?  It's not like Microsoft has never provided exhaustive searching facilities.  Back in XP, before the dog...

    I would be interested in more info on the tweaks to make IFilters see into files not presently provided-for... I'll definitely do some searching.

    With indexing disabled, searching the actual files at the time of the query (still through IFilters) is already possible, and I certainly wouldn't mind waiting a little longer for more complete results.  But this doesn't really cover the invisibility of "unlikely to be searched for" strings or blind spots in existing IFilters, does it?  Or are you thinking one could swap out the whole shootin' match and just have a .* IFilter that can find everything?

    -Noel


    • Edited by Noel Carboni Friday, December 23, 2011 1:35 AM corrected misspelling
    Friday, December 23, 2011 1:34 AM
  • Windows Search is powerful


    Doesn't seem to be.   It only feigns generality and implements it badly.   E.g. a  dir/a/b/s  even beats a   name:  search  which is started ahead of it!   You can't even refine the name:  search, e.g. using a regular expression to make it equivalent to the  dir/a/b/s  search.   And you can't turn off its irritating "search while I type" feechur.   (All of the nam files on my computer must really be on a quick find list.)   Considering how poor the rest of its performance is that is maddening.   Even the WS4.0 for XP allowed that irritation to be avoided.   More annoying than XP's Search Companion's dog pawing the ground with its head down low or its wizard nodding moronically--then at least something useful was supposedly being accomplished--albeit slowly.   Plus those annoyances have been succeeded by WS overly active green bar making me think if fewer cycles were spent on it I might get my results faster.   ; ]

     

    ---

    Friday, December 23, 2011 2:43 AM
  • You can see from the image above that the string "tax" was indeed found in the .txt file, so exclusion of searching in that particular folder is not the issue in this case.  But you're right - it could be under other circumstances.

    There is no PerceivedType value at all listed under the HKEY_CLASSES_ROOT\.log key.

     

    By the way, an attempt to show the list of indexed files in the Advanced section of the Windows 8 DP Indexing Options dialog fails.  The whole dialog just exits with an 0xC0000005 Application Error in the Application event log.  But assuming Windows 8's defaults are like Windows 7's, .log would not be listed there, explaining why it's not searched.  I chose .log because a geek like me sometimes wants to search log files, and it happens .log is the file type that originally helped open my eyes to the things Windows Search will miss.

     

    I realize you might be trying to answer my question literally, d'AJ...  I am not so much worried over why .log isn't searchable, specifically  - that's explained by the missing .log entry in the list I couldn't get to show.  My point is to illustrate that the Windows Search implementation is - by definition - excluding things by a variety of factors on several levels. 

    Not what is expected of a serious computer search subsystem, in my opinion.

    And THAT is the crux of the problem.

     

    -Noel

     

     

     

    Friday, December 23, 2011 1:34 PM
  • This shows the sequence of the error I saw above...

     

     

    Here's the error that was logged:

     

     

    -Noel


    Friday, December 23, 2011 1:54 PM
  • With indexing disabled, searching the actual files at the time of the query (still through IFilters) is already possible, and I certainly wouldn't mind waiting a little longer for more complete results.  But this doesn't really cover the invisibility of "unlikely to be searched for" strings or blind spots in existing IFilters, does it?  Or are you thinking one could swap out the whole shootin' match and just have a .* IFilter that can find everything?



    The plain text search filter used by .txt files doesn't do anything particularly clever (as far as I know; it might special-case files that appear to be Unicode), so assigning it to everything (.*) ends up with things being indexed on pure binary content alone if a type-specific IFilter isn't available. And, of course, you could always de-register any type-specific IFilters that you didn't want, so it is possible to coerce the system into doing whatever you like (heck, you could write your own IFilters to handle things exactly how you'd prefer if you really wanted to).
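
    Why a text-only reader produces odd hits on binary files can be sketched in a few lines of Python. The rule used here (runs of three or more printable ASCII bytes) is an assumption for illustration and is not how the plain text filter is known to work.

```python
import re

# Assumed rule: a "word candidate" is a run of 3+ printable ASCII bytes.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{3,}")


def text_runs(data: bytes):
    """Return the printable ASCII runs a naive text reader would see in raw bytes."""
    return [run.decode("ascii") for run in PRINTABLE_RUN.findall(data)]


# A made-up binary blob that happens to contain readable fragments.
blob = b"\x00\x01MZ\x90\x00tax\x00\xff\xfesyntax error\x00"
print(text_runs(blob))
print(any("tax" in run for run in text_runs(blob)))
```

    Any binary file that happens to contain such a fragment becomes a match for "tax", which is the XP-style noise described earlier in the thread.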
    Friday, December 23, 2011 4:55 PM
  • you could always de-register any type specific IFilters that you didn't want


    @ Andy

    Can you get into that dialog that both Noel and I are crashing on?   Or is there some other way besides that dialog to do that?

     

    ---

    Friday, December 23, 2011 6:51 PM
  • Thank you for giving me some ideas to try beyond what I've already tweaked, Andy!

    By the way, the .txt filter does seem to be able to find Unicode text.  I'm not sure if that's a new development; I know I tried it a year or so ago and failed to find some Unicode text at the time.

    -Noel


    P.S., Are you thinking of any special way to add a .* IFilter?  Note the following (attempted on Windows 7):

    Friday, December 23, 2011 6:52 PM
  • d'AJ, you are being a bit vague in your statements, "You can search .log files too, but you must set it appropriately in the registry" and "The fact that Win7/8 search uses the registry"...

    There is a dialog, pictured just above, where one specifies what file extensions are associated with what IFilters.  As far as I know, an extension missing from the list shown there will not be indexed/searched, and one in that list will use the IFilter identified there.  Hence the specific problem with .log - it's missing from the default list.

    Are the registry entries created through this dialog (e.g., "PersistentHandler" keys) what you are referring-to with regard to your statements?

    --

    I would only call the search "powerful" if it worked, frankly.

    As far as the syntax to find things, there is sophistication in the Advanced Query Syntax, yes.  But no matter how complex you can make the query, if the files or strings containing the data to be found just don't make it into the search to begin with, it's kind of worthless.

    I still don't see clearly how to tie the Plain Text Filter to all file types.

    -Noel



    Friday, December 23, 2011 10:26 PM
  • Thanks for clarifying.

    We have no fundamental differences, just a difference in the way we view or express things.  Your thoughts are registry-centric, and my thoughts are UI-centric I guess.  But no matter, we ARE talking about the exact same things, and I understand fully what you mean.

    My system's basic search health is fine.  It's just the design of Windows Search that's at the core of the issue here.

    What I haven't figured out yet is what to set (in the registry) to tie in an IFilter (e.g., Plain Text Filter) that handles all file types.  I simply haven't had time to research this any further today.

    -Noel

    Saturday, December 24, 2011 3:40 AM
  • I don't care about the implementation under the covers; the simple fact is that .log was left out of the list by whoever made the list.  How it's implemented is inconsequential to that fact.

     

    It'd be troublesome having our searches populated with .log files, because they are not created by us, users. But the .c, .cpp, .h, etc... are created by us.

     

    I create .log files all the time.

    But even if I didn't, it's wrong to presume that the only things people will ever want to search for are in files they have directly created, and thus exclude all other information from any possible search.  Who thought of this?

     

    D'AJ, did I understand you correctly that on the one hand you agree with excluding .log files, yet on the other you believe it's a design flaw that the IFilters don't index/search every possible extended character?

     

    I'm just not seeing a distinction - both are arbitrary decisions made by the designers to trim things from the list of things you're allowed to search for.  Both are design flaws, plain and simple.

     

    -Noel

    Saturday, December 24, 2011 5:51 AM
  • Hello Noel,

    I suggest you send feedback on the Search issue you are seeing.

    Please submit feedback using the Windows Feedback Tool from the Connect Site associated with your Windows Developer Preview program. If you’re an MSDN subscriber, the information on how to join the Connect program is included on the download page where you installed Windows Developer Preview. There’s a link to the Connect site and an invitation code that you can click on to join using a Windows Live ID account. If you’re not an MSDN subscriber follow this limited use link to join the Connect program and then follow the steps here.

    If you are prompted for an invitation code, please enter the following key. MSDN-76H9-3CFP

    https://connect.microsoft.com/site1147/InvitationUse.aspx?ProgramID=7221&InvitationID=MSDN-76H9-3CFP

    Thanks for your help.


    Marilyn
    Saturday, December 24, 2011 3:44 PM
    Moderator
  • Thank you, Marilyn.  I've already sent feedback on this. 

    I'm sorry if it's embarrassing to Microsoft that I expose this half-baked implementation publicly here, but it's not like it's an unknown problem (there have been HUGE threads on this on the forums).  I think it deserves attention, and it's just the kind of thing the public expects to get better in a new release of Windows.

    I want to make sure people like Steven Sinofsky know this is an issue, and that basing further functionality in Windows 8 on (what I consider to be an) unfinished Search functionality is going to come back and bite him.

    Imagine how much more likely people in the know would be to upgrade to Windows 8 if Microsoft were to make the Windows Search facility capable of being exhaustive and accurate in the new OS release.  Even if it's a non-default OPTION that serious computer users can choose, it would be better than a search that's not truly a "search" and is essentially useless as it is now.

    -Noel

    Saturday, December 24, 2011 5:31 PM
  • Thanks Noel for sending Feedback. I just wanted to make sure you had sent this information along.

    Have a great Christmas.


    Marilyn
    Saturday, December 24, 2011 5:46 PM
    Moderator
  • I want to make sure people like Steven Sinofsky know this is an issue, and that basing further functionality in Windows 8 on (what I consider to be an) unfinished Search functionality is going to come back and bite him.
    I agree. Having Metro in Windows 8 is great, but not everybody is going to like it or want to use it, or even need to use it.
     
    For them, Windows 8 needs to have some non-Metro features that show a real advantage to upgrading from Windows 7. Some of the new features (e.g. Hyper-V and mounting .iso files) are a bit geeky (I love them!), but faster booting and improved search will work for everybody. Maybe the ribbon Explorer and new Task Manager will help, but they do not do a lot for me, and are certainly not "need to upgrade" features.
     

    David Wilkinson | Visual C++ MVP
    Sunday, December 25, 2011 1:23 PM
  • Amen! I've used Copernic Desktop Search for years because it actually finds things WITHIN a wide range of content and then throws up a preview pane to get to one or more of the search terms in no time. I keep wishing MS would buy Copernic, a small company that does it better than the Redmond Giant! Google Desktop Search was a distant second but Google has dropped GDS development entirely. Finding information within PDF, RTF, DOC, etc. (like Copernic) would make Windows 8 "just work" rather than "this is Windows, work around the big obstacles in your way." 

    Trust me: if you use your PC for work and home (as most people do) you just want to search and find. Windows has never managed that except for file names, which is kind of useless when you recall something from years ago (or even months ago) and you want it searched. Even .eml files, which can run into the tens of thousands per year for a single user need fast search-and-find. 

    PLEASE make this a priority! (I've said this on MSDN forums many times before and don't expect any improvements. Pretty bad when your loyal base is THAT cynical about company responsiveness). 

    MS is taking chances and the antitrust has run out so "yes, throw in PDF reader!" but also do some of the stuff that now takes other third-party software to do. 

    Wednesday, December 28, 2011 5:39 AM
  • Pretty bad when your loyal base is THAT cynical about company responsiveness 


    Good point.  We're the ones actually trying to figure out how to use this software effectively, not bash holes in it or Microsoft.

    Yet it certainly does seem like Microsoft believes their design is "right" and our input is inconsequential.

    As it is, the only two kinds of syntax I ever use any more with Windows Search are filename: and ext: (i.e., to locate files by their names or extensions).  For example:

    filename:camera

    ext:pdf

    Even then, though the syntax is supposed to be "powerful", I haven't quite figured out how to search for files/folders whose names BEGIN with camera (e.g., "Camera Raw.8bi" but not the "Digital Camera" folder).  And so I use grepWin when it matters.

    -Noel

    Wednesday, December 28, 2011 3:30 PM
  • The search is broken since Vista. I use http://www.voidtools.com/ to search for files. To search for content I use the search of my file manager.
    "A programmer is just a tool which converts caffeine into code"

    Wednesday, December 28, 2011 5:40 PM
  • Even then, though the syntax is supposed to be "powerful", I haven't quite figured out how to search for files/folders whose names BEGIN with camera (e.g., "Camera Raw.8bi" but not the "Digital Camera" folder).  And so I use grepWin when it matters.

    -Noel

    filename:~<Camera

    There's a pretty good set of docs here; frankly, Microsoft should make finding all this info on their site a lot easier.
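
    To cross-check what a "begins with" query should return, the intended rule can be written out as a small Python sketch. The helper name and the sample names are illustrative only; the first two names come from the example quoted above.

```python
def name_begins_with(names, prefix):
    """Keep the names that start with prefix, ignoring case."""
    wanted = prefix.casefold()
    return [name for name in names if name.casefold().startswith(wanted)]


names = ["Camera Raw.8bi", "Digital Camera", "camera_notes.txt"]
print(name_begins_with(names, "camera"))
```

    "Digital Camera" is excluded because the match is anchored at the start of the name, which is the behavior being asked for.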

    Thursday, December 29, 2011 5:43 PM
  • Thanks for that.  I actually looked for the syntax before posting the above...  I had run across some pages on the advanced query syntax on the Microsoft site a while back, but didn't bookmark them and as you say I couldn't easily lay hands on them again.

    You make a good point in the context of this thread:  There is a drop down that shows information when you click in the search field...  Why not make it easy to directly access Help on the syntax right from there?

    And if they ARE going to hide the technical documentation, why not provide a switch to make the old syntax the default?  THAT would at least allow people to use it intuitively.

     

    -Noel


    My new eBook: Configure The Windows 7 "To Work" Options

    Thursday, December 29, 2011 6:42 PM
  • @Noel Carboni, for whichever file's contents aren't being returned by the query (in your example, the log file containing the word "tax"), can you go to its Properties and see if the Index attribute is checked? If not, check it, search again, and see whether the file is returned in the results. Of course, the extension would also need to be added to the file types to index in the Indexing Options control panel.
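
    The same attribute can be checked in bulk from a script. The Python sketch below assumes the Properties checkbox is the inverse of the file's "not content indexed" attribute bit; the helper names are hypothetical, and the per-file attribute bits are only available from os.stat on Windows.

```python
import os
import stat

# Attribute bit exposed by Python's stat module. The Properties checkbox is
# assumed to be its inverse: checked means the bit is clear.
NOT_CONTENT_INDEXED = stat.FILE_ATTRIBUTE_NOT_CONTENT_INDEXED


def content_indexing_allowed(file_attributes: int) -> bool:
    """True when the 'not content indexed' bit is clear in the attribute bits."""
    return not (file_attributes & NOT_CONTENT_INDEXED)


def path_allows_content_indexing(path) -> bool:
    """Check a file on disk; st_file_attributes exists only on Windows."""
    attributes = getattr(os.stat(path), "st_file_attributes", None)
    if attributes is None:
        raise OSError("st_file_attributes is only available on Windows")
    return content_indexing_allowed(attributes)
```

    A cleared bit only means the file is eligible; whether its contents are actually searchable still depends on the file type having a filter.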

    But here's a cached copy of the AQS for the latest version, Windows Search 4 found using Internet Archive. Looks like some idiot redirected that link to the Windows XP page.

    • Edited by xpclient Thursday, January 05, 2012 4:11 PM
    Thursday, January 05, 2012 4:04 PM
  • Yes, in the example, all the right settings were set, including the "Index" attribute:

     

     

    The point is that, without going to extreme measures (adding entries to the filter list), Windows Search will NEVER find anything inside a .log file, simply because there's no out-of-the-box IFilter for the file type .log.  Sure, I could add .log, but what about .syq or .xyz?  This illustrates the larger problem that Windows Search is exclusive by design, not inclusive. 

     

    This problem alone could be eliminated by providing a default "everything not listed explicitly" IFilter (I referred to this as a .* filter up thread), possibly based on the Plain Text IFilter. 

    But even that does not consider the secondary issue that the various IFilters themselves don't provide everything that's part of the content of the file for indexing/searching.  The Plain Text filter does not consider some extended characters interesting for indexing/searching, and I can easily show that the content of an old Word 6 .doc file is simply not available via the Microsoft-supplied .doc filter.  Should I try to tie .doc to the Plain Text filter to work around that?

     

    The basic Windows Search design is such that not everything that's in all the files is available for searching by any combination of settings I can discern.  Thus, it is not really a Search at all - by design.

     

    Maybe this "quickly find something in the index to get this user off our back" strategy mollifies people who do nothing other than surf the web and play games, but it's not good enough for folks who expect their computer to return 100% complete, accurate results.

     

    -Noel


    Detailed how-to in my new eBook: Configure The Windows 7 "To Work" Options



    Thursday, January 05, 2012 7:05 PM
  • Oh, and by the way, if one DOES disable indexing (a good idea for a variety of reasons), EVEN THOUGH it's still possible to use Windows Search, the filter list (File Types tab) becomes unavailable entirely (this is true in Windows 7 - in Windows 8 the File Types dialog just crashes and burns no matter what you have enabled/disabled).

    That strikes me as just another clue as to the level of thinking that went into this turkey. I apologize for being blunt, but this half-baked Windows Search deserves no better.

     

    -Noel



    Thursday, January 05, 2012 7:23 PM
  • The point is that, without going to extreme measures (adding entries to the filter list), Windows Search will NEVER find anything inside a .log file, simply because there's no out-of-the-box IFilter for the file type .log.


    I think there is more wrong than that, at least in the case of  .log.   E.g. I have added .log as Textfile (e.g. for Open Action) and this association shows up in the Indexing Options as if  .log  files may be treated as  Text files (e.g. it has a Notepad icon there).   By default  .log is not indexed for content.   But if I am to believe the Folder Options  this shouldn't matter if I set them up to say I am willing to forgo speed for completeness.   Ha!  For this search

    Search Results in Local Disk (C:)    ext:.log error

    in an elevated Explorer window I get:   No items match your search.

    While waiting for that to finish I tried to learn some PowerShell and got this result from an *unelevated* Powershell ISE window:

    PS C:\>
    > ((dir -R -Fi "*.log" -Fo) | foreach-object {select-string "error" $_}).count

    110334

    More convincingly perhaps I initially ran that without the count property and got flooded with lines of output--all before the Explorer Search had returned its useless result.

    And in case that first search was too ambiguous I reran it as

    ext:.log content:error

    while still getting   No items...

    Meanwhile I had modified my PowerShell script to

    (dir -R -Fi "*.log" -Fo -Name) | foreach-object {if ((select-string "error" "$_").count -gt 0) {"$_"}}

    which I think is probably a closer approximation to the list I was expecting from WS.
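
    For readers without PowerShell, roughly the same brute-force check can be sketched in Python. This is not a translation of the script above: files_containing is a hypothetical helper, it reads files as UTF-8 with replacement (so UTF-16 logs would be missed), and it skips files it cannot open.

```python
import os


def files_containing(root, suffix, needle):
    """Return paths under root ending in suffix whose text contains needle, ignoring case."""
    hits = []
    wanted = needle.lower()
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if not name.lower().endswith(suffix.lower()):
                continue
            path = os.path.join(folder, name)
            try:
                with open(path, "r", encoding="utf-8", errors="replace") as handle:
                    if any(wanted in line.lower() for line in handle):
                        hits.append(path)
            except OSError:
                pass  # unreadable or locked file: skipped rather than reported
    return sorted(hits)
```

    Calling files_containing("C:\\", ".log", "error") lists each matching file once, which is the shape of result one would expect from ext:.log content:error.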

    Maybe I should be thanking WS for providing an incentive to improve my PS skills.   <eg>

     

    FWIW

    Robert
    ---

    Friday, January 06, 2012 2:04 AM