Indexing Service missing content when HTML contains specific markup

  • Question


  • I have a legacy classic ASP website (running in a 32-bit app pool) that uses an ADODB.Connection to query the Indexing Service (the connection string is provider=msidxs).

    The problem is specific to HTML content and to Server 2008 R2. It appears to work correctly on the 32-bit version of Windows Server 2008, and I previously used the ixsso.Query object (which no longer exists in Server 2008), which also picked up the content correctly.

    The query specifically uses

    ... AND Contains(Contents, 'Elephant')

    but does not return the HTML file.
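For anyone trying to reproduce this outside ASP, here is a minimal Python sketch of the two queries being compared: one with the Contains() clause and one without. The column names (Filename, Path), the FROM SCOPE() source, and the helper names build_query and run_query are assumptions for illustration; the other clauses of the real query are elided above and are not reproduced here. The ADO part needs Windows and the pywin32 package and has not been run against a live catalogue.

```python
def build_query(term=None, columns=("Filename", "Path")):
    """Build an Indexing Service SQL query string.

    With term=None the Contains() clause is omitted, which is the
    variant used to confirm that the file is in the catalogue at all.
    """
    sql = "SELECT " + ", ".join(columns) + " FROM SCOPE()"
    if term is not None:
        # Keep the sketch simple: refuse terms that would need escaping.
        if "'" in term or '"' in term:
            raise ValueError("quotes in the search term are not handled here")
        sql += " WHERE CONTAINS(Contents, '" + term + "')"
    return sql


def run_query(sql):
    """Run the query through ADO (assumption: Windows with pywin32 installed)."""
    import win32com.client  # imported lazily so build_query works anywhere

    conn = win32com.client.Dispatch("ADODB.Connection")
    conn.Open("provider=msidxs")  # same connection string as in the post
    try:
        rs = win32com.client.Dispatch("ADODB.Recordset")
        rs.Open(sql, conn)
        rows = []
        while not rs.EOF:
            rows.append([rs.Fields.Item(i).Value for i in range(rs.Fields.Count)])
            rs.MoveNext()
        rs.Close()
        return rows
    finally:
        conn.Close()


if __name__ == "__main__":
    print(build_query("Elephant"))  # query that misses the file
    print(build_query())            # query that finds the file
```

Comparing the rows from run_query(build_query("Elephant")) with those from run_query(build_query()) shows whether the file is in the catalogue but missing its indexed contents.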

    I boiled the problem down to a simple test. The following file is indexed and returned correctly (that is, the file appears in the results of the query using Contains()):

    <html>
    <head>
    	<title></title>
    </head>
    <body>
    	Elephant
    </body>
    </html>
    

    However, a file with the following content is not returned.

    <html>
    <head>
    	<title></title>
    	<link rel="stylesheet" type="text/css" href="/some/relative/path/file.css" />
    </head>
    <body>
    	Elephant
    </body>
    </html>
    

    I've confirmed that the file is definitely in the catalogue by removing all of the Contains() clauses, in which case it is returned. So from the Indexing Service's point of view, either the content doesn't contain the unique string "Elephant" or the content is blank.

    It seems that including the <link> tag somehow breaks the HTML parsing when the indexed contents are built. If this is true, and I'm not missing something obvious, this looks like a bug.
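As a sanity check that the markup itself is not at fault, the sketch below runs both test files through Python's standard html.parser and extracts the body text. This is not the Indexing Service's HTML filter, only a generic parser; the BodyText class and body_text helper are hypothetical names for illustration. If a generic parser sees "Elephant" in both files, the difference presumably lies in how the filter handles the <link> tag.

```python
from html.parser import HTMLParser


class BodyText(HTMLParser):
    """Collect the non-blank text that appears inside <body>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        text = data.strip()
        if self.in_body and text:
            self.chunks.append(text)


def body_text(html):
    """Return the body text of an HTML document as a single string."""
    parser = BodyText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# The two test files from the post.
PLAIN = """<html>
<head>
\t<title></title>
</head>
<body>
\tElephant
</body>
</html>
"""

WITH_LINK = """<html>
<head>
\t<title></title>
\t<link rel="stylesheet" type="text/css" href="/some/relative/path/file.css" />
</head>
<body>
\tElephant
</body>
</html>
"""

if __name__ == "__main__":
    for name, doc in (("plain", PLAIN), ("with link", WITH_LINK)):
        print(name, "->", repr(body_text(doc)))
```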

     

    Friday, September 30, 2011 5:29 PM