Have some questions regarding segments and dictionaries.
1. When I created a columnstore index for a fact table with 17496061 records, 18 segments were created for each column. Segments 0 to 15 were having the same number of rows. However segments 16 and 17 are having different number or rows with segment 16 having
less rows than segment 17. Any pointers for this behaviour?
2. How the rows of the column are distributed in each of the segments? The min and max ids columns in the
table seems to have some kind of pattern for arrangement.
3. How are the unique
values in the dictionary for a column accessed during query execution. What does the data pointer column in the
Segments are built in parallel on multiple threads. When threads hit the end of the table they might not have a full segment worth so they could build a segment that is smaller than the full ones.
If the table has a clustered B-tree, columnstore index build uses the same order as the B-tree to order rows going into the segments. This is good to allow improved segment elimination using segment min and max values during query execution, say
based on ordering on date. If the input data is a heap, then no particular order is maintained.
Your question 3 is about an internal detail that's not relevant for tuning so I'm not going to answer it, other than to say that dictionaries are pinned in memory during execution and data value IDs are used to look up into the dictionary to get actual values