locked
Performance for SQL Server File Tables RRS feed

  • Question

  • Hi

    I have a file table database which has over 70,000 documents. I understand you can use full text indexing to query the documents, however if I already know the location of the document and I run the following query it keeps on going...

    SELECT stream_id
      FROM [dbo].[TBL]
      where is_directory = 0
      and file_stream.GetFileNamespacePath(1,0) = '\\doclocation\doc.docx'

    Setting statistics on I can see it is querying the table multiple times, I managed to get it to stop by adding TOP 1

    SELECT TOP 1 stream_id
      FROM [dbo].[TBL]
      where is_directory = 0
      and file_stream.GetFileNamespacePath(1,0) = '\\doclocation\doc.docx'

    But even that takes 2 seconds. Is there a better way to do this? Can I index this in some way?

    Thanks

    Sam

    Friday, September 9, 2016 10:32 AM

Answers

  • Give this query a shot. I would expect a non-clustered index seek by the primary key:

    SELECT stream_id
    FROM [dbo].[TBL]
    WHERE is_directory = 0
    	AND path_locator = GetPathLocator(N'\\doclocation\doc.docx');


    Dan Guzman, Data Platform MVP, http://www.dbdelta.com

    • Marked as answer by SG_87 Friday, September 9, 2016 1:25 PM
    Friday, September 9, 2016 12:47 PM
    Answerer
  • But even that takes 2 seconds. Is there a better way to do this? Can I index this in some way?

    If you check the execution plan, the query is probably doing a full scan.  The default file table schema includes the following indexes:

    • nonclustered on (name)
    • nonclustered primary key constraint on (path_locator)
    • nonclustered unique constraint on (stream_id)
    • nonclustered unique constraint on (parent_path_locator, name)

    Specifying the first column of one of these indexes will likely improve performance considerably.  If you don't have many files with the same name, I would expect simply adding name to the WHERE clause will be adequate:

    SELECT stream_id
    FROM [dbo].[TBL]
    WHERE is_directory = 0
        AND file_stream.GetFileNamespacePath(1,0) = N'\\doclocation\doc.docx'
        AND name = N'doc.docx';


    Dan Guzman, Data Platform MVP, http://www.dbdelta.com


    • Edited by Dan GuzmanMVP, Editor Friday, September 9, 2016 11:44 AM tring quoted text
    • Marked as answer by SG_87 Friday, September 9, 2016 12:25 PM
    Friday, September 9, 2016 11:43 AM
    Answerer

All replies

  • But even that takes 2 seconds. Is there a better way to do this? Can I index this in some way?

    If you check the execution plan, the query is probably doing a full scan.  The default file table schema includes the following indexes:

    • nonclustered on (name)
    • nonclustered primary key constraint on (path_locator)
    • nonclustered unique constraint on (stream_id)
    • nonclustered unique constraint on (parent_path_locator, name)

    Specifying the first column of one of these indexes will likely improve performance considerably.  If you don't have many files with the same name, I would expect simply adding name to the WHERE clause will be adequate:

    SELECT stream_id
    FROM [dbo].[TBL]
    WHERE is_directory = 0
        AND file_stream.GetFileNamespacePath(1,0) = N'\\doclocation\doc.docx'
        AND name = N'doc.docx';


    Dan Guzman, Data Platform MVP, http://www.dbdelta.com


    • Edited by Dan GuzmanMVP, Editor Friday, September 9, 2016 11:44 AM tring quoted text
    • Marked as answer by SG_87 Friday, September 9, 2016 12:25 PM
    Friday, September 9, 2016 11:43 AM
    Answerer
  • Hi Dan

    Thanks for the help, it still does the same thing but I can do something like this and it is a lot quicker.

    DECLARE @TBL TABLE (StreamID UNIQUEIDENTIFIER, FileLocation NVARCHAR(MAX))
    INSERT INTO @TBL
    
    SELECT stream_id, file_stream.GetFileNamespacePath(1,0) as filelocation
      FROM [dbo].[TBL]
      where is_directory = 0
      and name  = N'doc.docx'
    
     SELECT StreamID
     FROM @TBL
     WHERE filelocation = N'\\doclocation\doc.docx'

    I did try using a CTE instead but I get the same issue where it constantly queries the table.

    Thanks

    Sam


    Friday, September 9, 2016 12:06 PM
  • Give this query a shot. I would expect a non-clustered index seek by the primary key:

    SELECT stream_id
    FROM [dbo].[TBL]
    WHERE is_directory = 0
    	AND path_locator = GetPathLocator(N'\\doclocation\doc.docx');


    Dan Guzman, Data Platform MVP, http://www.dbdelta.com

    • Marked as answer by SG_87 Friday, September 9, 2016 1:25 PM
    Friday, September 9, 2016 12:47 PM
    Answerer
  • Thanks Dan that is even better.
    Friday, September 9, 2016 1:26 PM