locked
Equivalent of Materialized Views in SQL Server RRS feed

  • Question

  • Hi

    we are migrating  Oracle Database to SQL Server 2012 and is looking for solution, equivalent to Materialized Views in Oracle.

    we had tried to implement indexed View, but that was not possible, because the MVs have outer joins, subqueries, aggregate functions and also there is no column in it which can act as a Primary Key.

    Please Help to get a solution for this issue.

    Regards

    Joyasree Mondal

    Tuesday, October 14, 2014 12:49 PM

All replies

  • I believe the only SQL Server equivalent is an indexed view.  If the indexed view cannot be implemented because of the restrictions, then perhaps an actual table that gets refreshed at the appropriate interval.  Unfortunately, not every feature in every dbms has an exact equivalent in another dbms, so it seems that you may have to adjust your design. You can also try searching the forums for previous discussions.
    • Proposed as answer by Avijit Swain Monday, October 20, 2014 1:14 PM
    Tuesday, October 14, 2014 1:36 PM
  • Hello,

    The following resource may be useful:

    http://blogs.msdn.com/b/ssma/archive/2011/06/20/migrating-oracle-materialized-view-to-sql-server.aspx

    Hope this helps.

    Regards,

    Alberto Morillo
    SQLCoffee.com

    • Proposed as answer by Donghui Li Thursday, October 16, 2014 11:37 AM
    Tuesday, October 14, 2014 2:00 PM
  • Thanks For your reply.

    I went through the blog.

    but the MVs are having outer joins ,order by clause, sub-queries,aggregate function etc. 

    Regards

    Joyasree

    Monday, October 20, 2014 12:01 PM
  • Joyasree,

    sometimes the query can be rewritten to fulfill the indexed view requeriments. "Order by" clauses for example can be replaced by creating the index on the ordered columns.

    You can also consider adding computed columns in some cases.

    Generally speaking, there are many features in sql server that helps coping with differences between it and other RDBM systems. There may not be straightforward solutions in some cases, which will require you to have more in-depth knowledge, but it can be done.

    Monday, October 20, 2014 12:59 PM
  • There is no direct equivalent to Oracle MV in SQL Server.  That is an Oracle concept.

    SQL Server uses views differently than Oracle.  I would suggest testing a normal view and see if you actually need a materialized view.

    If it does not perform, your only option is to actually create a table which holds the data, either via triggers or other processes.


    Monday, October 20, 2014 1:00 PM
  • "Order by" clauses for example can be replaced by creating the index on the ordered columns.

    No no no no no. A view (indexed or otherwise), like a table, has no inherent order.  If you want the resultset of a query to be ordered, you MUST include an order by clause.

    Monday, October 20, 2014 1:16 PM
  • "Order by" clauses for example can be replaced by creating the index on the ordered columns.

    No no no no no. A view (indexed or otherwise), like a table, has no inherent order.  If you want the resultset of a query to be ordered, you MUST include an order by clause.

    See for yourself.

    You'll see that after creating a nonclustered index, as long as the scan is performed over this index, rows will be retrieved in the order they are physically stored, even if we dont explicitly order the results.

    CREATE TABLE TESTE (ID INT IDENTITY(1,1) PRIMARY KEY, COL1 VARCHAR(50), COL2 VARCHAR(50))
    INSERT INTO TESTE VALUES ('MNB', 'G'), ('A', 'T'), ('H', 'Y'), ('I', 'ASD'), ('SS', '2'), ('GF', 'K'), ('L', 'N')
    
    CREATE NONCLUSTERED INDEX IDX_TESTE ON TESTE (COL1) INCLUDE (COL2) WITH (DROP_EXISTING = ON)
    
    SELECT COL1 FROM TESTE
    
    SET IDENTITY_INSERT TESTE ON
    INSERT INTO TESTE (ID, COL1, COL2) VALUES (14, 'PD', 'O');
    INSERT INTO TESTE (ID, COL1, COL2) VALUES (13, 'FF', 'A');
    SET IDENTITY_INSERT TESTE OFF
    
    CREATE VIEW VW_TESTE WITH SCHEMABINDING
    AS
    SELECT COL1, COL2 FROM DBO.TESTE
    
    CREATE UNIQUE CLUSTERED INDEX [IDX_VWTESTE] ON [dbo].[VW_TESTE]
    (
    	[COL1] ASC
    )
    
    --DROP INDEX IDX_VWTESTE ON VW_TESTE
    
    SELECT COL1 FROM VW_TESTE

    Rows are physically stored in a logical order when you have indexes to enforce it.

    If I scan the clustered index, rows returned will be naturally sorted by the clustered index key column (ID), which is not even the order they were inserted (try inserting values with identity insert on):

    Now if I enforce the usage of the nonclustered index I created earlier, rows will be returned sorted by the nonclustered key column, because thats how rows are physically stored for that index.

    I did NOT use any kind of "order by" clause, anywhere. I just controlled the order in which rows are returned just by adding indexes ;)

    Monday, October 20, 2014 4:24 PM
  • > I just controlled the order in which rows are returned just by adding indexes ;)

    No you did not.  You just created a bug.

    Future changes to the data, statistics or the behavior of the query optimizer will eventually cause the rows to be returned in some other order. 

    Even forcing an index is not enough to guarantee the order.  There are many different ways the optimizer can use an index, which can result in different row orderings.  In particular an index can be scanned in logical order, following the key order in the leaf page forward or reverse page pointers.  Or an index can be scanned in allocation order, reading all the leaf pages in the order they appear in the space allocation maps.

    David


    David http://blogs.msdn.com/b/dbrowne/



    Monday, October 20, 2014 4:49 PM
  • > I just controlled the order in which rows are returned just by adding indexes ;)

    No you did not.  You just created a bug.

    Future changes to the data, statistics or the behavior of the query optimizer will eventually cause the rows to be returned in some other order. 

    David


    David http://blogs.msdn.com/b/dbrowne/


    SET NOCOUNT ON;
    SET IDENTITY_INSERT TESTE ON;
    DECLARE @COUNT INT = 20000
    WHILE @COUNT < 30000
    BEGIN
    INSERT INTO TESTE (ID, COL1, COL2) VALUES (@COUNT, CONVERT(VARCHAR(MAX), NEWID()), CONVERT(VARCHAR(MAX), NEWID()));
    SET @COUNT = @COUNT + 1;
    END
    SET IDENTITY_INSERT TESTE OFF

    SET NOCOUNT ON;
    SET IDENTITY_INSERT TESTE ON;
    DECLARE @COUNT INT = 10000
    WHILE @COUNT < 20000
    BEGIN
    INSERT INTO TESTE (ID, COL1, COL2) VALUES (@COUNT, CONVERT(VARCHAR(MAX), NEWID()), CONVERT(VARCHAR(MAX), NEWID()));
    SET @COUNT = @COUNT + 1;
    END
    SET IDENTITY_INSERT TESTE OFF

    EVEN if you have fragmentation at the leaf level, the higher level pages in the b-tree will ensure rows are returned in their correct order if you perform a scan.

    Monday, October 20, 2014 5:08 PM
  • Monday, October 20, 2014 5:16 PM
  • As David has indicated, you have created the same bug as the developers of the ERP product in the thread below.  No order by clause, no guarantee.  Your demonstration script vastly over-simplifies the variables that affect this situation.

    record ouput sort difference

    Monday, October 20, 2014 5:18 PM
  • That plan will return rows in order, since it's an ordered scan.  But your scan isn't guaranteed to scan in ascending order, and it's not guaranteed to start at the beginning and go to the end.  It may start in the middle and loop around the end.  This actually happens and it's called a Merry-go-round scan (http://technet.microsoft.com/en-us/library/ms191475(v=SQL.105).aspx).

    Also it's possible that you could get an unordered (allocation order) scan in certain isolation levels.

    Also it's possible that future behavior changes in the query optimizer change the plan.

    David


    David http://blogs.msdn.com/b/dbrowne/


    Monday, October 20, 2014 5:26 PM
  • As David has indicated, you have created the same bug as the developers of the ERP product in the thread below.  No order by clause, no guarantee.  Your demonstration script vastly over-simplifies the variables that affect this situation.

    record ouput sort difference

    If what you say is true, then the MERGE JOIN algorythm would probably have never been invented in the first place because it would always require a sort operation, and it wouldnt be worth the effort implementing an algorythm that would be suboptimal in the majority of situations.

    CREATE TABLE TESTE2 (ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, COL1 VARCHAR(50), COL2 VARCHAR(50))
    
    SET NOCOUNT ON;
    SET IDENTITY_INSERT TESTE2 ON;
    DECLARE @COUNT INT = 10000
    WHILE @COUNT < 20000
    BEGIN
    INSERT INTO TESTE2 (ID, COL1, COL2) VALUES (@COUNT, CONVERT(VARCHAR(MAX), NEWID()), CONVERT(VARCHAR(MAX), NEWID()));
    SET @COUNT = @COUNT + 1;
    END
    SET IDENTITY_INSERT TESTE2 OFF
    
    CREATE NONCLUSTERED INDEX IDX_TESTE2 ON TESTE2 (COL1) INCLUDE (COL2)
    
    SET IDENTITY_INSERT TESTE2 ON;
    WHILE @COUNT < 30000
    BEGIN
    INSERT INTO TESTE2 (ID, COL1, COL2) VALUES (@COUNT, CONVERT(VARCHAR(MAX), NEWID()), CONVERT(VARCHAR(MAX), NEWID()));
    SET @COUNT = @COUNT + 1;
    END
    SET IDENTITY_INSERT TESTE2 OFF

    I'm not oversimplifying anything... this is a fact. Now if you want to add more variables to the equation (such as pretty much anything that opens more than one I/O thread), its a whole other story.

    Monday, October 20, 2014 5:30 PM
  • Ok this is a very interesting subject. Lets talk about merry-go-round scanning then. What would happen at runtime, for the execution plan I posted above, if another scan in either tables involved was already underway and the optimizer thought it was cool to join the scans?
    Monday, October 20, 2014 5:42 PM
  • >What would happen at runtime, for the execution plan I posted above, if another scan in either tables involved was already underway and the optimizer thought it was cool to join the scans?

    It's a streaming plan, so the client will get the rows as they are read.  You'd see the rows in order, but starting somewhere in the middle, looping around to the beginning.  Something like:

    ID          COL1                                               COL2
    ----------- -------------------------------------------------- --------------------------------------------------
    6           04DC9E5B-0C0C-49D1-B83C-51D98AF87D5C               760F4CA9-E8A8-4DF8-8DE4-DAF67383CF5D
    7           E5080B9C-F721-4C92-B607-16795697FBEA               BBF4F239-D686-4E7E-864C-C7A2185F855D
    8           FE02E152-9E21-4F1C-AD53-8E712A021371               B7897FB4-2922-4324-AF35-5E5E2F6E7639
    9           FB28B4C4-8D1A-46C5-B4BD-65C49AE6951E               A1B0A9BC-3645-46E9-A6CC-CE7847C6E7A1
    10          37ABBB3B-CDA0-46E4-AE86-DE5D08A2D76E               2B963DA4-DF24-483E-9688-2CEA47F807C8
    11          54D7B3E8-6C26-4B0E-A463-FAA88F5F1D93               C41F37CD-3E82-47DC-9D7B-3435557ECB85
    12          515FCA82-1B62-43C2-89BD-3B7B96A4A0D7               F65757B5-39AD-46E2-ABF0-2D7316FBEF66
    13          52C7BD9D-5B0A-425B-BAB5-872703EAD9D7               ED875D29-8CE3-4510-86D6-77F661E00A63
    14          7205EAB6-E6E7-4C35-BE77-2284068CBBD1               40F7B5C3-92E6-40CB-85A6-F2EFC46F59AF
    15          E4784118-3E86-4335-B741-69CDFF89B23A               262F8C28-EC92-4BCB-9D96-BF8FFA1B78CB
    16          4F82DB7C-12FF-4747-B38E-39D4418988A7               F1BCC6E1-93DE-4700-8B59-1D70789047C2
    17          F0A03090-2ECC-448D-83F9-3A09F7779199               5E29CA72-BEEC-480A-89DE-2C52D986AB67
    18          16C6F5EC-7E69-4170-B577-E60912EA64B7               546D6664-FCE3-4D14-9DD1-83E0389839A8
    1           B2D62494-EA28-47E6-8861-2C162EC04E62               2897D64F-E5AB-49CD-A361-741CDEB96844
    2           B56F607C-5832-4A17-8FCC-499352B436B7               C54BA46A-316F-4CE3-BC45-548AF138BCD6
    3           96ABD12B-FA9E-4A2E-844F-818EE332ADCE               A4472A26-A8E8-43AD-9570-A542B0A53AD4
    4           40A780ED-B28B-400D-B3DA-11846AF9695A               767C0372-2207-4ACD-9069-1875DC80F3FF
    5           A6D6795F-02C3-472E-ABF9-43ACBC522C32               07DC5570-2FAE-44B9-AD0C-3620CFF99D0F

    David


    David http://blogs.msdn.com/b/dbrowne/


    Monday, October 20, 2014 5:57 PM
  • >What would happen at runtime, for the execution plan I posted above, if another scan in either tables involved was already underway and the optimizer thought it was cool to join the scans?

    It's a streaming plan, so the client will get the rows as they are read.  You'd see the rows in order, but starting somewhere in the middle, looping around to the beginning.  Something like:

    ID          COL1                                               COL2
    ----------- -------------------------------------------------- --------------------------------------------------
    6           04DC9E5B-0C0C-49D1-B83C-51D98AF87D5C               760F4CA9-E8A8-4DF8-8DE4-DAF67383CF5D
    7           E5080B9C-F721-4C92-B607-16795697FBEA               BBF4F239-D686-4E7E-864C-C7A2185F855D
    8           FE02E152-9E21-4F1C-AD53-8E712A021371               B7897FB4-2922-4324-AF35-5E5E2F6E7639
    9           FB28B4C4-8D1A-46C5-B4BD-65C49AE6951E               A1B0A9BC-3645-46E9-A6CC-CE7847C6E7A1
    10          37ABBB3B-CDA0-46E4-AE86-DE5D08A2D76E               2B963DA4-DF24-483E-9688-2CEA47F807C8
    11          54D7B3E8-6C26-4B0E-A463-FAA88F5F1D93               C41F37CD-3E82-47DC-9D7B-3435557ECB85
    12          515FCA82-1B62-43C2-89BD-3B7B96A4A0D7               F65757B5-39AD-46E2-ABF0-2D7316FBEF66
    13          52C7BD9D-5B0A-425B-BAB5-872703EAD9D7               ED875D29-8CE3-4510-86D6-77F661E00A63
    14          7205EAB6-E6E7-4C35-BE77-2284068CBBD1               40F7B5C3-92E6-40CB-85A6-F2EFC46F59AF
    15          E4784118-3E86-4335-B741-69CDFF89B23A               262F8C28-EC92-4BCB-9D96-BF8FFA1B78CB
    16          4F82DB7C-12FF-4747-B38E-39D4418988A7               F1BCC6E1-93DE-4700-8B59-1D70789047C2
    17          F0A03090-2ECC-448D-83F9-3A09F7779199               5E29CA72-BEEC-480A-89DE-2C52D986AB67
    18          16C6F5EC-7E69-4170-B577-E60912EA64B7               546D6664-FCE3-4D14-9DD1-83E0389839A8
    1           B2D62494-EA28-47E6-8861-2C162EC04E62               2897D64F-E5AB-49CD-A361-741CDEB96844
    2           B56F607C-5832-4A17-8FCC-499352B436B7               C54BA46A-316F-4CE3-BC45-548AF138BCD6
    3           96ABD12B-FA9E-4A2E-844F-818EE332ADCE               A4472A26-A8E8-43AD-9570-A542B0A53AD4
    4           40A780ED-B28B-400D-B3DA-11846AF9695A               767C0372-2207-4ACD-9069-1875DC80F3FF
    5           A6D6795F-02C3-472E-ABF9-43ACBC522C32               07DC5570-2FAE-44B9-AD0C-3620CFF99D0F

    David


    David http://blogs.msdn.com/b/dbrowne/


    It cant be done like that. Both scans got to retrieve rows in the order they are physically sorted. Sharing an already running scan may work if it needs to buffer data, but buffered data must still honor it's logical order or any execution plans that depend on it would simply fail or return completely unpredictable results, such as the case with the merge join above.

    Imagine if we had the above situation, in which an execution plan is enforced by either being compiled for a SP or by using a plan guide, and all of a sudden row start to be retrieved in a random order during a merge join because the engine decided to use an ongoing scan the optimizer was not aware of.

    If this was the case, in the example above, rows 1 to 5 would simply not be joined because the cursor for the other table would be at row 18 already.


    Monday, October 20, 2014 6:08 PM
  • >It cant be done like that.

    Sure it can.  If there was a MERGE JOIN downstream in the plan, then the optimizer would enforce the order.  Exactly like if there was an ORDER BY in the query.

    If you don't force the optimizer to return sorted rows, it's free to not do so.

    David


    David http://blogs.msdn.com/b/dbrowne/

    Monday, October 20, 2014 6:28 PM
  • Merry-go-round only applies for full table scans. It does not apply if you have an index. I would strongly recommend, David, with all due respect, that you read your own articles before posting them here. Add this to the fact that the article doesn't even refer to the OP's product version.

    "In SQL Server Enterprise, the advanced scan feature allows multiple tasks to share full table scans."

    How index pages are read:

    "The storage engine reads index pages serially in key order."

    Besides:

    "It does not just read each data page in sequence from page 504 to page 556 (the last page with keys in the specified range). Instead, the storage engine scans the intermediate index page and builds a list of the leaf pages that must be read. The storage engine then schedules all the reads in key order."

    This means that EVEN if they are not physically stored in order (such as in the example I gave where the nonclustered index was highly fragmented), the scheduling mechanism ensures pages are retrieved by honoring the key order, which equals to the index order.

    http://technet.microsoft.com/en-us/library/ms191475%28v=SQL.105%29.aspx

    Edit: If you still believe you are right, despite all evidence I have provided, then I would truly welcome any explanations as to how a MERGE JOIN would properly function with two inputs, presumably sorted in the same order, if the engine could just mess up with the retrieval order any time it felt like doing so.

    Monday, October 20, 2014 6:33 PM
  • >It cant be done like that.

    Sure it can.  If there was a MERGE JOIN downstream in the plan, then the optimizer would enforce the order.  Exactly like if there was an ORDER BY in the query.

    If you don't force the optimizer to return sorted rows, it's free to not do so.

    David


    David http://blogs.msdn.com/b/dbrowne/


    Ummmmm no... the optimizer cannot change an execution plan at runtime. If it didn't account for a sort operation when the plan was compiled, it won't magically add it at runtime. Read my previous post.
    Monday, October 20, 2014 6:42 PM
  • >If you still believe you are right, despite all evidence I have provided, then I would truly welcome any >explanations as to how a MERGE JOIN would properly function with two inputs, presumably sorted in the same >order, if the engine could just mess up with the retrieval order any time it felt like doing so.

    "Evidence" is irrelevant.  You have observed sorted output for a particular query on a particular table and concluded that SQL Server _must always_ return sorted output for similar queries and tables.  I assure you that the line of reasoning that leads you to this conclusion is faulty.  And in any case there is no such guarantee unless it is _documented_ in Books Online or in a KB article. 

    The behavior you observe, even if it is consistent, is subject to change in the future and should not be relied upon.  At best, it is undocumented behavior and an implementation detail.

    Again: if the query plan requires sorted input to a query operator then it will be sorted.  That's why MERGE JOIN is irrelevant here.  MERGE JOIN requires a sort, just like ORDER BY does.  So the Optimizer will always produce a plan that has sorted data (either because it's an ordered scan or because it includes a sort). 

    But a simple SELECT * FROM T, even with an INDEX hint does not logically require any sorting.  So the optimizer is free to pick a plan that doesn't guarantee sorted output. 

    David


    David http://blogs.msdn.com/b/dbrowne/


    Monday, October 20, 2014 8:38 PM
  • David... the behavior I have proven is very well documented in the very article you posted here!!!!

    http://technet.microsoft.com/en-us/library/ms191475%28v=SQL.105%29.aspx

    What else can I say in order for you to comprehend that the engine reads index pages based on the key order, and that does not require any sort operations?? This is an absurd, just read your own article lol...

    Its impressive how you can persist in your error even after everything I said here. You are just actively dismissing arguments as "irrelevant" at your leisure and this is very detrimental to a healthy discussion.

    I'm going to try to explain this to you, one more time, despite all the failed attempts so far.

    When you perform a scan over index pages, rows are retrieved based on the indexes key order.

    This means that if I create an index over a given column, and then I scan that index, results will be sorted by the column over which the index was created. This does not require any sort operations.

    Based on the example tables I've used so far, take a look at the execution plans for these three queries. The first two requires that the results be sorted by the COL1 column.

    Note that the object being scanned is an index which has COL1 as the key column, and therefore rows are ALREADY SORTED when they are retrieved, and they dismiss any need for a sorting operator. This means that the optimizer as ignored the ORDER BY clause because it knows the results are already sorted by that column, and the execution plan of the second and third queries are exactly the same.

    THERE IS ABSOLUTELY NO SORTING BEING PERFORMED HERE, EITHER AT PARSE OR RUNTIME!!

    For the third query, the optimizer could choose ANY index and yes, the order of the rows would be random as in that it would honor the order of a random index that the optimizer is free to pick. But if you ensure the usage of a specific index by using a query hint, results will ALWAYS be in the same order.

    Now take a look at these two other queries.

    The first one enforces the usage of an index for one table that does not matching the sort order of the other one, and therefore a sort operation is required. However the rows will still be retrieved in the COL1 order because the sort operation will do that for us (no ORDER BY operation was explicitly needed here either and the order is still guaranteed unless someone adds other indexes and the execution plan is recompiled).

    The second one just scans a different index in the TESTE table, and results will be ordered by the ID column instead of the COL1 one. The optimizer is free to pick any index and therefore the order is unpredictable.

    Edit: Just remember: Execution plans must respect query hints. If you enforce the employment of a specific index, it WILL use that index and it WILL respect its output order. If you still think that a scan over the IDX_TESTE index won't always return an ordered output without requiring an order by clause, then I've got nothing else to say. I'll leave the OP to decide which course of action to adopt and that's it.
    Monday, October 20, 2014 9:14 PM
  • >When you perform a scan over index pages, rows are retrieved based on the indexes key order.

    That article describes how an ordered scan works.  An ordered scan is not the only way to read all the rows from an index.  And there is no guarantee that a particular query form will always use a simple ordered scan.

    >THERE IS ABSOLUTELY NO SORTING BEING PERFORMED HERE

    Correct.  The optimizer knows how to perform an ordered scan of an index, and will usually choose an ordered scan over an un-ordered scan followed by a sort.  But the only way to _force_ it to return ordered results is with an ORDER BY clause in the outermost query.

    >But if you ensure the usage of a specific index by using a query hint, results will ALWAYS be in the same order.

    False.  They could be in reverse order (an index can also be scanned backwards), or piece-wise ordered in a merry-go-round scan.  Or it could be an allocation-order scan of the index pages.  Or it could be a parallel scan with results interleaved from multiple scanning threads, each working on a different range of pages.

    And even if none of those things _actually happen_ today, they might in a future version or service pack.  So you shouldn't rely on the order of the returned rows without an ORDER BY.  Lots of people have, and lots have regretted it later.

    David


    David http://blogs.msdn.com/b/dbrowne/







    Monday, October 20, 2014 9:22 PM
  • > Except in a merry-go-round scan or an allocation-order scan.

    Merry-go-round only applies to full table scans, not index scans. This is also documented in the article you posted.

    > But the only way to _force_ it to return ordered results is with an ORDER BY clause in the outermost query.

    Or with query hints. Still no ORDER BY.

    > They could be in reverse order (an index can also be scanned backwards),

    Depends on the execution plan. It wont ever be reverse scanned if it doesnt need to.

    > or piece-wise ordered in a merry-go-round scan.

    Merry-go-round only applies to full table scans (aka data pages, not index pages). "Piece-wise" what in the world is that in SQL Server?????

    > Or it could be an allocation-order scan of the index pages.

    This sorting is so reliable that even the optimizer doesn't sort data "just to make sure", so why would users? No, its always in the key order like I've shown here. If you think otherwise then send us reference, evidence, tests performed to reproduce this.

    > And even if none of those things _actually happen_ today, they might in a future version or service pack.  So you shouldn't rely on the order of the returned rows without an ORDER BY.  Lots of people have, and lots have regretted it later.

    Never seen it happen. It could happen in the future but then Microsoft would actually have to change an algorythm it has been improving since the product's conception. It would have to rewrite large parts of the query optimizer, change stream aggregate, merge, and other algorythms that depend on the implicit ordering that indexes offer. We are both entering the "what could happen" territory, which is dark and inhospitable, but how likely you think is this to happen?

    Monday, October 20, 2014 9:39 PM
  • The point I'm making here is that SQL Server engine can't just randomily choose the way an I/O thread reads data from disk, causing index data to be output in a random order, because many physical operators DEPEND on this order. This is why it wont deliberately stop scanning indexes based on the key order and do it some other way (I dont even know where you got all those other methods from).

    Execution plans are easy to enforce if you know how to use plan guides and/or query hints.


    Monday, October 20, 2014 9:46 PM
  • >No, its always in the key order like I've shown here.

    You are extrapolating from a single example.

    >If you think otherwise then send us reference, evidence, tests performed to reproduce this.  Here's one.  Both of these selects will return unordered results because they are doing an allocation-order scan.

    use tempdb
    drop table t
    go
    create table t(id int, constraint pk_t primary key (id))
    
    go
    insert into t(id)
    select top 1000 cast( cast( newid() as binary(4) ) as int)
    from sys.objects o, sys.columns c
    go 100
    
    
    select id
    from t with (index=pk_t, tablockx)
    
    
    select id
    from t with (index=pk_t, nolock)

    >Never seen it happen.

    I have.  ORDER BY in a view used to always return ordered results.  Then it didn't.  GROUP BY without an ORDER BY used to always return ordered results.  Then it didn't.  Many queries have change the result ordering when you get a parallel plan.

    >The point I'm making here is that SQL Server engine can't just randomily choose the way an I/O >thread reads data from disk, causing index data to be output in a random order, because many >physical operators DEPEND on this order.

    Right.  But there are different _kinds_ of scans.  An ordered scan, where SQL Server reads rows based on the logical order of the index is just one kind.  Other kinds include the parallel scan, the merry-go-round scan, the reverse scan, and the allocation-order (or unordered) scan.  The only way to force an ordered scan is with an ORDER BY clause.  An index hint is not sufficient.

    And don't take my word for it.  Take Conor Cunningham's:

    If you need order in your query results, put in an ORDER BY.  It's that simple.  Anything else is like riding in a car without a seatbelt.

    No Seatbelt - Expecting Order without ORDER BY

    David


    David http://blogs.msdn.com/b/dbrowne/





    Monday, October 20, 2014 9:58 PM
  • >use tempdb
    >drop table t
    >go
    >create table t(id int, constraint pk_t primary key (id))
    >
    >go
    >insert into t(id)
    >select top 1000 cast( cast( newid() as binary(4) ) as int)
    >from sys.objects o, sys.columns c
    >go 100
    >
    >
    >select id
    >from t with (index=pk_t, tablockx)
    >
    >
    >select id
    >from t with (index=pk_t, nolock)

    Still the same order.

    >> No, its always in the key order like I've shown here.

    > You are extrapolating from a single example.

    Actually, this information doesn't even come from my exampleS (there were quite a few so far). Its written in the article you posted here. My examples are meant to confirm that and explain why the ORDER BY clause is redundant in some cases.

    > I have.  ORDER BY in a view used to always return ordered results.  Then it didn't.

    Examples? References?

    >GROUP BY without an ORDER BY used to always return ordered results.  Then it didn't.  Many queries have >change the result ordering when you get a parallel plan.

    Now you finally admit ordered results would always be returned, even without an order by clause. We are in the right track it appears. The reason for this change of behavior is a variation in the logical operator used by the query optimizer, and not the way the engine reads pages from disk.

    In the beginning of times there was stream aggregate, an operator that would require its input to be sorted. And this is how a GROUP BY clause would ensure results were always ordered.


    But then his ugly cousin came and messed things up. He doesnt need an ordered input so yeah, order would be random. AGAIN, there are still ways of enforcing order here, through query hints, without requiring an order by clause, so this changes nothing.

    You have confirmed that for a given operator, results will always be ordered. Your own argument has, once again, proven my point.

    > Right.  But there are different _kinds_ of scans.  An ordered scan, where SQL Server reads rows based on the logical order of the index is just one kind. 

    It only won't scan in key order (by scheduling reads) if you change the transaction isolation level and hope for the worse. Because the key order scan is the best performing way of reading pages, this behavior (although possible) wasan't even observed in the example with NOLOCK/TABLOCK (known to cause allocation-order scans in larger tables) you posted above.

    > Other kinds include the parallel scan, the reverse scan,

    References?

    > the merry-go-round scan,

    Only applies to table scans, not indexes.

    http://technet.microsoft.com/en-us/library/ms191475%28v=SQL.105%29.aspx

    > And the allocation-order (or unordered) scan. 

    Only if you specify TABLOCK or NOLOCK query hints, which I haven't done. This is something that goes without saying, it's irrelevant to an indexed view scenario such as this one.

    Some people have also said it will perform an allocation-order scan if the index is too fragmented in SQL Server 2005, but as of 2012 it (probably) won't, as I have shown here. There is also no document anywhere in the internet (you are free to search for it), that would point us otherwise. If you have access to additional information, I would be more than happy if you could share it with us.

    Reference: http://sqlmag.com/database-development/quaere-verum-clustered-index-scans-part-iii

    > And don't take my word for it.  Take Conor Cunningham's:

    >If you need order in your query results, put in an ORDER BY.  It's that simple.  Anything else is like >riding in a car without a seatbelt.

    >No Seatbelt - Expecting Order without ORDER BY

    Again, with all due respect, did you read your own article? Did you read what I said earlier about having multiple I/O threads as a whole other situation? If you add new variables, OF COURSE the results will be different. Parallelism was disabled in the server I performed these tests on, but still you can control this by query hints, and you can enforce execution plans by using plan gudes.

    Its like I always say - it's always about the 0.001% of situations, the exceptions, rather than the general rule.

    Tell Mr Connor to add MAXDOP 1 to his querys OPTION clause and see if results are still unordered lol

    References:

    http://msdn.microsoft.com/en-us/library/ms190417.aspx

    http://msdn.microsoft.com/en-us/library/ms181714.aspx

    http://technet.microsoft.com/en-us/library/ms188611%28v=sql.105%29.aspx

    http://blogs.msdn.com/b/conor_cunningham_msft/archive/2008/08/27/no-seatbelt-expecting-order-without-order-by.aspx

    Imagining uncontrollable factors that would get an ordered input to be retrieved in an unordered manner is one of SQL Server's myths. There are a whole other people around who think the same way because of some experiences they had when they didn't know how to control the execution plan, and adding an order by clause is a very popular and easy shortcut to a complex problem. "In the worst cases it will just be redundant", they say. What they like to forget is that sorting has an often high cost, and it can be avoided by taking advantage of indexes.

    Tuesday, October 21, 2014 2:03 PM
  • >Imagining uncontrollable factors that would get an ordered input to be retrieved in an unordered manner is one of SQL Server's myths

    Your data changes.  Your statistics change.  The optimizer changes.  These are very real factors that can cause the order of your results to change in the future.

    You have absolutely no guarantee that any of those different scan types won't be used now or in the future for your query without an ORDER BY.  There's no documentation that guarantees that a merry-go-round scan, or a reverse scan, etc won't ever be used. 

    >sorting has an often high cost, and it can be avoided by taking advantage of indexes.

    If your data can be scanned in order from an index then ORDER BY does not increase the cost at all.  It just (correctly) requests ordered output. 

    Bottom line, again:  Ordered output for any query without an ORDER BY is an undocumented behavior subject to change.  You should not rely on ordering without ORDER BY.

    David


    David http://blogs.msdn.com/b/dbrowne/

    Tuesday, October 21, 2014 3:05 PM
  • >Imagining uncontrollable factors that would get an ordered input to be retrieved in an unordered manner is one of SQL Server's myths

    Your data changes.  Your statistics change.  The optimizer changes.  These are very real factors that can cause the order of your results to change in the future.

    You have absolutely no guarantee that any of those different scan types won't be used now or in the future for your query without an ORDER BY.  There's no documentation that guarantees that a merry-go-round scan, or a reverse scan, etc won't ever be used. 

    >sorting has an often high cost, and it can be avoided by taking advantage of indexes.

    If your data can be scanned in order from an index then ORDER BY does not increase the cost at all.  It just (correctly) requests ordered output. 

    Bottom line, again:  Ordered output for any query without an ORDER BY is an undocumented behavior subject to change.  You should not rely on ordering without ORDER BY.

    David


    David http://blogs.msdn.com/b/dbrowne/

    >Your data changes.  Your statistics change.  The optimizer changes.  These are very real factors that can >cause the order of your results to change in the future.

    Queries don't change, and their hints will override any changes that could incur due to statistics. Optimizers don't change unless you update your instance. Data changes are irrelevant, statistics are not.

    >You have absolutely no guarantee that any of those different scan types won't be used now or in the future >for your query without an ORDER BY.  There's no documentation that guarantees that a merry-go-round >scan, or a reverse scan, etc won't ever be used.

    There is no documentation that says a merry-go-round or reverse scans CAN be used for index scans, just key and allocation-order. One of microsoft's article is very clear in that merry-go-round only happens for table scans. Allocation-order is only used under artificial conditions (i.e. has to be induced) in this context.

    >If your data can be scanned in order from an index then ORDER BY does not increase the cost at all.  It just >(correctly) requests ordered output.

    The problem here is that you cannot use an order by clause in a view. My point is that, in the OP's case, you can replace the "order by" clause while migrating a materialized view from Oracle to an indexed view in SQL Server by properly indexing the base table.

    >Bottom line, again:  Ordered output for any query without an ORDER BY is an undocumented behavior >subject to change.  You should not rely on ordering without ORDER BY.

    "The storage engine reads index pages serially in key order."

    http://technet.microsoft.com/en-us/library/ms191475%28v=SQL.105%29.aspx

    This article alone disagrees with you.

    Tuesday, October 21, 2014 3:19 PM
  • >Optimizers don't change unless you update your instance.

    And you should write code that isn't going to break when you patch or upgrade your instance.  SQL code is declarative and long-lived.

    ]>The storage engine reads index pages serially in key order."

    Even so, just because the storage engine reads index pages in key order does not guarantee that the rows are returned to the client in key order.  And that documentation doesn't cover the other scan types, and it's a few versions old.  That's a long way from guaranteeing that a query with an index hint will always return data in key order.

    David


    David http://blogs.msdn.com/b/dbrowne/

    Tuesday, October 21, 2014 3:31 PM
  • >And you should write code that isn't going to break when you patch or upgrade your instance.  SQL code is >declarative and long-lived.

    Yeah, but knowing how very unlikely this feature is to change (because of reasons I wrote a few posts earlier), I wouldn't worry about that.

    >Even so, just because the storage engine reads index pages in key order does not guarantee that the rows >are returned to the client in key order.  And that documentation doesn't cover the other scan types, and it's >a few versions old.  That's a long way from guaranteeing that a query with an index hint will always return >data in key order.

    If you have just one I/O thread, rows will be returned to the client in the order they were retrieved. I don't know how deep your knowledge about this subject goes, but it goes without saying that query execution is an iterative process. It means things are done in an order. To fulfill a select operation, rows are retrieved one at a time, on demand. No two rows will be passed at the same time in the flow of data.

    And before you dabble into networks, know that if this could disrupt the display order, this would happen regardless of order by clause. I'm not sure how clients process this but it doesn't make any sense to say there are any uncontrollable factors, inside or outside SQL Server context, that could interfere with the output order.

    About the article... you're the one who posted it in the first place and now you want to discredit it? LOL!!! :D And It doesn't cover other scan types for a reason.

    You already understand that the engine can only scan indexes in a key or allocation order, and that allocation is only used under certain conditions that you have full control over. You also admitted that a group by clause would ALWAYS RETURN ORDERED RESULTS. You are just ONE step from understanding that order by clauses can be replaced by storing rows in the same order (even though they don't make a difference once this condition is met).

    If you proved me wrong, David, through evidence, articles, information, I would gladly revert all the related optimizations I've made through many years in my clients queries (who very rarely complained about inconsistent behavior with result sorting, and when they did, it was just because of a query hint someone neglected to use).

    Tuesday, October 21, 2014 4:33 PM
  • >If you have just one I/O thread, rows will be returned to the client in the order they were retrieved.

    There's no guarantee of that. 

    Also within an index page, rows aren't stored in any particular order. Where is it documented that the rows on a page are read in index key order? It's not. Again you are _guessing_ that this undocumented behavior won't change.

    >You also admitted that a group by clause would ALWAYS RETURN ORDERED RESULTS

    No.  GROUP BY queries used to return sorted results (in 6.5 IIRC).  And some developers _assumed_ that they always would.  The ordering of the results from a GROUP BY query was an _undocumented implementation detail_.  Exactly like the order of results from a query with an index hint.

    David


    David http://blogs.msdn.com/b/dbrowne/

    Tuesday, October 21, 2014 5:20 PM
  • There's no guarantee of that. 

    Also within an index page, rows aren't stored in any particular order. Where is it documented that the rows on a page are read in index key order? It's not. Again you are _guessing_ that this undocumented behavior won't change.

    Each page has something called "slot array" which ensures the logical order of rows within the page, even if they are not physically stored in the expected order. The slot array gets updated at every modification.

    Here is your document, an article by Paul Randal the almighty.

    http://www.sqlskills.com/blogs/paul/inside-the-storage-engine-anatomy-of-a-page/

    "What if the record should logicallyhave come at the end of all other records on the page, but we’ve just inserted it in the middle – doesn’t that screw things up somewhat?No, because the slot array is ordered and gets reshuffled as records are inserted and deleted from pages."

    If you think Randal's word is not enough:

    http://blogs.msdn.com/b/askjay/archive/2011/01/07/what-is-a-slot-array.aspx

    "The job of the slot array is to store the offsets where the records start in the logical order as dictated by the index – if one exists.   So reading the slot array in order and going to the offsets to obtain each record on the page will return the records in the correct order."

    No.  GROUP BY queries used to return sorted results (in 6.5 IIRC).  And some developers _assumed_ that they always would.  The ordering of the results from a GROUP BY query was an _undocumented implementation detail_.  Exactly like the order of results from a query with an index hint.

    First lets put that "undocumented implementation" aspect aside.

    http://technet.microsoft.com/en-us/library/ms191475%28v=SQL.105%29.aspx

    "The storage engine then schedules all the reads in key order."

    Now If you mean "assumed" as in that this could change between versions, yeah. But not because the engine randomily chooses a scan method, as you said it yourself a few posts earlier: "GROUP BY clauses would always return ordered results". This means that you understand the engine cannot switch between key and allocation order scans at will except when subject to artificial conditions. You are past that. Good.

    I don't know WHEN it changed, but I do know HOW... And considering how it changed, this is irrelevant to both your discussion and the OP's question. Why?

    GROUP BY behavior changed because the hash aggregate operator was implemented for *UNSORTED* inputs - prior to that, the inputs always needed to be sorted and could perform poorly. We are talking about already sorted inputs.

    Even if you consider other operations prior to the aggregate operator in the execution plan that could change the input order (the consumer side of the operator), you can still enforce the order by specifying the ORDER GROUP query hint. Ohhh but other things could change, you say. The operators that could change can be enforced by query hints and plan guides as well.

    Edit: I just noticed you said "index hint" rather than "index scan".

    "Exactly like the order of results from a query with an index hint"

    Its not even the query hint that guarantees the result order, its the index scan itself. The hint's role is to make sure the optimizer doesn't choose a random index that is not sorted in the order I want the rows to be returned.


    Wednesday, October 22, 2014 1:40 PM
  • I don't know how many ways I can say this. 

    The index hint does not guarantee a query plan using an ordered index scan. 

    Microsoft's guidance (which you are free not to follow) is to not rely on undocumented behavior and to always use an ORDER BY for sorted results.

    David


    David http://blogs.msdn.com/b/dbrowne/



    Wednesday, October 22, 2014 2:23 PM
  • > The index hint does not guarantee a query plan using an ordered index scan. 

    An index scan will always be ordered because there is no such thing as an "unordered index scan" unless you forcibly make it so, under very special conditions. The index hint forces the optimizer to scan an index. These are different concepts.

    >Microsoft's guidance (which you are free not to follow) is to not rely on undocumented behavior and to always use an ORDER BY for sorted results.

    The following is written in the "about" section of every link you posted here which contains the "guidance" you mentioned above:

    "All postings are provided "AS IS" with no warranties, and confer no rights. Any opinions expressed in this blog are solely those of the author and not official positions of Microsoft Corporation."

    Wednesday, October 22, 2014 3:02 PM
  • >The index hint forces the optimizer to scan an index.

    No it doesn't. It only forces the optimizer to use the index somehow.  And the fact that it forces the optimizer to use the index is also undocumented.  If the index is disabled your query currently fails.  But that behavior could also change to make a query not fail if the hinted index is disabled.

    How it uses the index, and whether there are any order-affecting operators in the query plan after the index is used is undocumented.

    It doesn't get any clearer than Books Online:

    "The order in which rows are returned in a result set are not guaranteed unless an ORDER BY clause is specified."

    ORDER BY Clause (Transact-SQL)

    David


    David http://blogs.msdn.com/b/dbrowne/





    Wednesday, October 22, 2014 3:17 PM
  • "No it doesn't. It only forces the optimizer to use the index somehow.  And the fact that it forces the optimizer to use the index is also undocumented.  If the index is disabled your query currently fails.  But that behavior could also change to make a query not fail if the hinted index is disabled."

    3 things are wrong here:

    1. It only forcers the optimizer to use the index somehow, but if you do not specify a predicate (like in the examples from previous posts), then it is obviously not going to perform an index seek LOL!!! No predicate = no seek (now you are going to argue that join clauses are not predicates and I'll have to explain this all over again). Even though your statement is correct, it still doesn't change anything. The point here is that the optimizer behavior can be controlled, enforced, manipulated, audited, analyzed. It is not a living being with free will, its just a piece of code. It won't bite.

    2. This also isn't true because you can include another query hint, FORCESCAN, to ensure the specified index is scanned rather than seeked, even if you have a predicate.

    3. Wether the index is enabled or not is irrelevant here. An index doesn't simply disable itself, someone has to do it. This would be the same as droping a table and then wondering why it cannot be queried anymore.

    "It doesn't get any clearer than Books Online:

    "The order in which rows are returned in a result set are not guaranteed unless an ORDER BY clause is specified.""

    Everything that is taken out of context can have a whole different meaning. Microsoft has a standard for documenting feature usage: It explains things under normal situations and then adds "notes" for exceptional cases. Unfortunately, they weren't added (yet).

    Keep giving us examples of elements that could change the order of a result set. I'll show you they are circumstancial or can be avoided:

    - Many scanning methods (No, there are only two, and one of them is only used if you forcibly make it so)
    - Recompilations can change the plan (Can be avoided by plan guides and query hints)
    - Your data changes (Irrelevant)
    - Statistics change (Irrelevant if you have plan guides, query hints, or even use KEEPFIXED PLAN)
    - Version updates (Unlikely, it would involve rewritting many parts of the optimizer and engine).
    - Client may not receive rows in order (No, or else "order by" would also be unpredictable, sorting happens in the server).
    - Rows may not be stored in order within a page (We have slot arrays for that)
    - Someone may disable the index specified in the hint (Then don't disable it)
    - Connor said parallelism can disrupt retrieval order (Use MAXDOP 1 then)
    - This behavior could change because the group by clause behavir changed 15 years ago (No, group by was INCREMENTED with additional grouping methods, which can be manually overriden).
    - There can be order-affecting operators in the query plan (Then make sure it doesn't. Its not hard).

    Wednesday, October 22, 2014 5:21 PM
  • "- Client may not receive rows in order (No, or else "order by" would also be unpredictable, sorting happens in the server)."

    Just because the data is sorted at the server does not mean there is no effort made at the client to keep it in order...  It is a network and packets will be sliced, sent, delayed, lost, resent etc.  You may receive the data from multiple different paths.

    Why would the fact that the data is sorted at the server and sent in order implies that it is also received in order?

    Typically, I get the kind of result you have for very little set of data.  As soon as I return larger amount, I see discrepancies in the ordering unless of course I specify the ordering.



    • Edited by Antoine F Sunday, October 26, 2014 11:56 PM
    Sunday, October 26, 2014 11:54 PM
  • "- Client may not receive rows in order (No, or else "order by" would also be unpredictable, sorting happens in the server)."

    Just because the data is sorted at the server does not mean there is no effort made at the client to keep it in order...  It is a network and packets will be sliced, sent, delayed, lost, resent etc.  You may receive the data from multiple different paths.

    Why would the fact that the data is sorted at the server and sent in order implies that it is also received in order?

    Typically, I get the kind of result you have for very little set of data.  As soon as I return larger amount, I see discrepancies in the ordering unless of course I specify the ordering.



    I can't speak for other client software, but in a remotely connected management studio, the results are always returned in the order they are processed at the server. This is because every result set is returned with metadata about its sort order, and a sort of implicit row number to indicate which row has to be displayed first. Most other clients will also do the same by default. Even if they don't, you can still sort them locally.

    Note that this is irrelevant to the discussion. The question here is whether an "order by" clause can be replaced by the implied logical order of a scanned index. Whether the rows get shuffled when they are transmitted over the network, regardless of query definition, is not part of the scope.

    Monday, October 27, 2014 12:58 PM
  • "- Client may not receive rows in order (No, or else "order by" would also be unpredictable, sorting happens in the server)."

    Just because the data is sorted at the server does not mean there is no effort made at the client to keep it in order...  It is a network and packets will be sliced, sent, delayed, lost, resent etc.  You may receive the data from multiple different paths.

    Why would the fact that the data is sorted at the server and sent in order implies that it is also received in order?

    Typically, I get the kind of result you have for very little set of data.  As soon as I return larger amount, I see discrepancies in the ordering unless of course I specify the ordering.



    I can't speak for other client software, but in a remotely connected management studio, the results are always returned in the order they are processed at the server. This is because every result set is returned with metadata about its sort order, and a sort of implicit row number to indicate which row has to be displayed first. Most other clients will also do the same by default. Even if they don't, you can still sort them locally.

    Note that this is irrelevant to the discussion. The question here is whether an "order by" clause can be replaced by the implied logical order of a scanned index. Whether the rows get shuffled when they are transmitted over the network, regardless of query definition, is not part of the scope.

    Without ORDER BY clause, we make no guarantees regarding the order in which the rows are returned from the SELECT statement. The semantics is very clearly defined in the ANSI SQL standard regarding how ORDER BY clause is processed & what a SELECT statement without ORDER BY clause means. SELECT statement represents a virtual table or set of rows and as such order is irrelevant or non-existent. Using ORDER BY clause is the only way to guarantee the same results in terms of order.

    So using observed results from specific SELECT statements or queries to infer behavior / semantics is cause of various bugs. In older versions of SQL Server folks did that for queries with GROUP BY clause when the only execution option was to sort & aggregate. Those applications were broken later when we added more strategies for optimizing / executing queries with GROUP BY clause & rows were returned in different order than columns in the GROUP BY clause. There are many examples like this where newer versions of SQL Server modified query behavior. Lastly, SQL Server has column store & hash indexes that inherently have no ORDER and hence the order of the rows returned is non-deterministic.

    To reiterate, we make NO guarantees on the order of rows that are returned from a SELECT statement if there is no ORDER BY clause. If you make any assumptions regarding SELECT statements without ORDER BY clause in terms of order then you do so at your own risk.

    For the original question, indexed views in SQL Server has different set of features than Oracle. You can either rewrite your query to use the supported syntax or alternatively use the scheduled refresh approach using SQLAgent for example. You could use features like CHANGE TRACKING to sync changes or other mechanisms.

    Tuesday, November 4, 2014 6:28 PM
  • It is important to remember that the select statements in the examples above employ query hints to ensure the usage of indexes which have inherent order. A select statement that is free to make any logical decisions over which objects to use is obviously going to have an unpredictable order.

    If you use an index hint, it won't ever use any other index, so results will always be ordered. There are articles documenting this behavior.


    Tuesday, November 4, 2014 7:38 PM
  • It is important to remember that the select statements in the examples above employ query hints to ensure the usage of indexes which have inherent order. A select statement that is free to make any logical decisions over which objects to use is obviously going to have an unpredictable order.

    If you use an index hint, it won't ever use any other index, so results will always be ordered. There are articles documenting this behavior.



    Could you provide those articles.  For all I know, hints are not instructions but rather exactly what the name implies, hints.  As such, SQL server can elect to ignore them.  I would love to be proved wrong about this one, it would simplify my life on some occasions.  Especially when specifying locking mechanism as hints.
    Tuesday, November 4, 2014 9:15 PM
  • It is important to remember that the select statements in the examples above employ query hints to ensure the usage of indexes which have inherent order. A select statement that is free to make any logical decisions over which objects to use is obviously going to have an unpredictable order.

    If you use an index hint, it won't ever use any other index, so results will always be ordered. There are articles documenting this behavior.



    Could you provide those articles.  For all I know, hints are not instructions but rather exactly what the name implies, hints.  As such, SQL server can elect to ignore them.  I would love to be proved wrong about this one, it would simplify my life on some occasions.  Especially when specifying locking mechanism as hints.

    http://technet.microsoft.com/en-us/library/ms191475%28v=SQL.105%29.aspx

    "The storage engine then schedules all the reads in key order."

    When I say query hints, I mean specifically that you can ensure, through its usage, that a query will always scan an index. The optimizer can be forced to perform a scan, and it can be forced to use a specific index that is sorted the way the results have to be returned.

    Group by and join operators, to name a few, can also have their behavior enforced to make sure the results order won't be changed.

    The problem here is that people FROM MICROSOFT disagree with their OWN ARTICLES about pages being read in key order during an index scan, and returning rows in the index order. In their perspective, an index scan enforced by a FORCESCAN, a MAXDOP 0 and an INDEX hint won't always return results in the same order of the index because it is written in the bloody stars.

    They were not able to logically explain their opinion through evidence or documentation, often citing irrelevant situations with different conditions from the ones I posted above as examples.

    Edit: Just read David's posts and see how he repeatedly contradicts himself with arguments that conflicts with both what he said before and the information in the links he sends.

    Edit2: I can't speak for all query hints, but the ones I listed above cannot be ignored. For instance, try to specify an index name that doesn't exist. (someone is, again, going to say that for SQL Server 6 some hints could be ignored and that this is undocumented behavior and could change in 30 years from now when there won't be any RDBMS anymore and that the sky is filled with microsoft's colors)

    http://msdn.microsoft.com/en-us/library/ms181714.aspx

    "Query hints specify that the indicated hints should be used throughout the query. They affect all operators in the statement. If UNION is involved in the main query, only the last query involving a UNION operation can have the OPTION clause. Query hints are specified as part of the OPTION clause. If one or more query hints cause the query optimizer not to generate a valid plan, error 8622 is raised."

    Wednesday, November 5, 2014 1:18 PM
  • Hmm.

    You've now been told by Books Online, Conor Cunningham, and Umchandar Jaychandran that there is no guarantee that a query without an ORDER BY clause will return ordered results.  Period.

    No matter how many times you _observe_ ordered results from a query with an index hint, the documentation and two of SQL Product team members most responsible for writing and maintaining SQL Server recommend not relying on that behavior.

    And while you are free to use the product however you want, it should be clear by now on this thread that other people should not rely on this behavior.

    David


    David http://blogs.msdn.com/b/dbrowne/






    Wednesday, November 5, 2014 5:43 PM
  • Hmm.

    You've now been told by Books Online, Conor Cunningham, and Umchandar Jaychandran that there is no guarantee that a query without an ORDER BY clause will return ordered results.  Period.

    No matter how many times you _observe_ ordered results from a query with an index hint, the documentation and two of SQL Product team members most responsible for writing and maintaining SQL Server recommend not relying on that behavior.

    And while you are free to use the product however you want, it should be clear by now on this thread that other people should not rely on this behavior.

    David


    David http://blogs.msdn.com/b/dbrowne/






    Books online said nothing. It was a simplistic approach aimed at providing guidance to developers that would work in most situations (the same kind of perspective you said I shouldn't have).

    Conor Cunningham gave a single situation where an operation generated multiple I/O threads because of parallelism and therefore order could not be guaranteed in that case. True, but you can still disable MAXDOP and avoid that. He said nothing about not having parallelism enabled.

    Umchandar said nothing about enforcing a scan/seek on an index. He just said simple SELECT statements can't have their order assumed. Under normal conditions this is also true, but then again, this is just another simplistic approach to a little more complex subject.

    Umchandar may be from microsoft's product team, but until someone gives us an example of a situation where you cannot indirectly control the order results are returned, I will defend my position, because it is on my personal interest to contribute to a knowledge sharing community. Note that I have always posted EVIDENCE of whatever assertments I make here, as opposed to other people involved in this discussion. It amazes me how people think they are dismissed from doing the same just because of their points/employer/assumed knowledge or any other reason, because this is irrelevant to the voluntary, knowledge sharing work we perform here.

    Until someone comes with arguments other than that it is written in the stars, as to why you can't indirectly control the way results are returned to a client, I will still recommend, in every applicable optimization thread, that people transfer sorting workloads to disk to take advantage of "pre-sorted" rows.

    Thursday, November 6, 2014 1:26 PM