locked
More columnstore woes RRS feed

  • Question

  • Hi all,

    I have an issue with columnstore and the query optimiser.  I am running the following query:

    SELECT Store.StoreId, COUNT(*)
    FROM Fact.Sale
    INNER JOIN Dimension.Store ON Store.StoreId = Sale.StoreId
    GROUP BY Store.StoreId
    ORDER BY Store.StoreId
    OPTION (RECOMPILE)

    "Fact.Sale" has almost 4 million rows, while the Store table has 24.  The query ends up taking 7 seconds, which is terrible for columnstore.  Obviously, if I rewrite the query with subqueries, the query improves, but I am using SSAS and ROLAP to generate queries.

    The strange thing is, on another data source with an almost identical query and far more data, it is sub-second.  What I've cut it down to is the query optimiser and statistics.  See the estimated query plan:

    and the actual query plan:

    This is also the plan that gets executed (obviously); however, what is interesting is that the "Actual Number of Rows" and "Estimated Number of Rows" are radically off.

    In the estimated plan, you can see the columnstore scan spits out a lot of rows (3.8m in fact), but the "Parallelism (Repartition Stream)" actual estimates it spits out 1 row and then from then on in, it continues to estimate 1 row.  In the actual plan, for some reason the "Parallelism (Repartition Stream)" actual number of rows is 0, but you can see by the height of the arrows from then on in, the actual number of rows is the 3.8m (not the 1 it estimated).

    So clearly the optimiser is picking a simpler query plan because it thinks there will only be 1 row to process through the process.  Interestingly enough though, on the other data source (on the same SQL server instance), it gets all the estimations correct and adds in a "Hash match (Partial aggregate)" in batch mode which makes the query fly.

    I've update statistics, created statistics on various different columns, rebuilt the table, rebuilt the indexes, etc. but I can never seem to get the right estimations.

    Does anyone know how I can probe further into this issue and find a resolution?

    Wednesday, May 8, 2013 1:38 AM

Answers

  • Hi Adam,

    The exchange operators in this context are there to accommodate spills (hash tables do not fit in memory) that facilitate a revert of the batch-mode to a row-mode instead.  You've identified that there is a cardinality estimate issue - and I'd be curious to see the histograms (via DBCC SHOW_STATISTICS) of the indexes and stats being used for this particular query (for example - the histograms associated with StoreID).

    Let's start there.  Also if you can share the .sqlplan of the actual plan, that would be interesting to take a look at as well.

    Cheers,

    Joe

    • Proposed as answer by Fanny Liu Monday, May 13, 2013 7:09 AM
    • Marked as answer by Fanny Liu Tuesday, May 14, 2013 1:16 AM
    Thursday, May 9, 2013 1:14 PM