locked
The Phone Book Analogy for an Index RRS feed

  • Question

  • I like to use the Phone Book analogy (with some modification) when trying to understand what an Index is. Please comment or expand on the following analogies.

    (1) Heap Analogy: If a table has no clustered index then it is a heap. That is like a world that has no phone book. There is information out there about names, addresses, and phone numbers, but it is not organized. So, if someone asks you to retrieve phone numbers for everyone with a certain last name, you would need to visit all the homes out there, collect the info and filter it down. That is like a table scan.

    (2) Clustered Index Analogy: However, once you compile a phone book, then that's like building a clustered index on a table. It is organized by last name followed by first name. Now if someone asks you to retrieve phone numbers for people with a certain last name, you don't need to visit all the homes. You just look up by the last name and all the info you want is found on in the phone book. That is a clustered index seek. Next, let's say someone asks for all phone numbers in a certain state. We can still use the phone book, but we will have to go through all the entries in it row by row. This is a clustered index scan. But at least we didn't have to visit all the homes out there (ie, do a table scan).

    (3) Non Clustered Index Analogy: Now, let's add a component to our phone book that doesn't typically exist in a real world phone book. Let's add a reference section to the back of the phone book. This reference section will have each state with the page numbers of the phone book that has the phone numbers for everyone in that state. This would be like a non clustered index (and in this case non unique). With this index in place, we don't need to go thru all the pages of the phone book (clustered index scan). We would utilize this new index to find all the pages that has the entries for everyone in a certain state. This would be like a non clustered index seek. Once the pages have been found, you need to actually go to those pages and get the phone numbers for everyone in that state. This would be like a key lookup.

    There few more analogies I would like to add to this, but let me stop here and see what you guys think.

    Wednesday, November 24, 2010 6:22 PM

Answers

  • If you have a heap of business cards, you can build a better analogy.

    Turn all the cards upside down, lay them down on a table and write a unique number on the back.

    Now you have, what SQL Server calls a heap. If you search for Jim's number you need to turn all the cards around and look if they contain the name Jim

    You have to do that for all cards on the table, hence you are doing a table scan.

     

    If you get tired of that you might add the first name to the number on the back of the card instead of the number and sort them on the table in order of that name.

    With that you have build a clustered index. If you want to know Jim's phone number now, you can just pick up Jim's card. That is called a clustered index seek.

     

    If someone asked for Mr Smith's phone number and you did not know his first name, you would have to go and access all cards again, even so they are sorted, because they are sorted by the wrong key. This now is called a clustered index scan.

    If you'd get asked for the phone number by last name often, you might end up taking a piece of paper and write all the last names down and next to each the first name. this is now a nonclustered index. Now you just go to the line that contains Mr Smith on your paper and figure out his first name (Jim). That is called an nonclustered index seek. Because you need his phone number, you now pick up Jim's card to retrieve the phone number. This is called a Key Lookup.

     

    Finally someone asks how many cards you have. you could now go and count the cards, but because you last names fit all on one page it is going to be a lot faster to just count the last names in your index. For that you are doing a nonclustered index scan.

    • Proposed as answer by Naomi N Wednesday, November 24, 2010 7:23 PM
    • Unproposed as answer by DoolinDalton Wednesday, November 24, 2010 7:59 PM
    • Marked as answer by Alex Feng (SQL) Thursday, December 2, 2010 9:23 AM
    Wednesday, November 24, 2010 6:48 PM
  • Let's go with your business card analogy, although I'm not sure why the phone book analogy isn't accurate. May be that will become clear as we engage in this discussion. And I would like to switch up the keys in your indexes a bit to make it reflect a more realistic scenario.

    (1) Heap: A stack of business cards is a heap. Anytime you want to search for anything, you need to go thru it one by one. This is a table scan.

    (2) Clustered Index: Your example builds a clustered index on the first name, but let's build a clustered index on last name followed by first name. So, you label the back of the business cards this way and order them in a rolodex. Now you have a clustered index. When someone asks for business cards for Mr Smith, it's easy to search the rolodex and find the card(s) for Mr Smith. The search would be just as easy if someone asked for Jim Smith. That is a clustered index seek. But if someone asked for just Jim, the order of the rolodex doesn't help. You need to go through all the cards and look for Jim. This is clustered index scan.

    (3) Non Clustered Index: Your next example builds a non clustered index. Let's say sometimes you need to search your business cards by some component of the business address, say, the state. For example, you want to get all business cards in 'NY'. In my phone book example, I created a reference section that has the states with the page numbers of the phone book where all entries for a given state would be found. Whether this reference section is physically attached to the phone book or is a separate book is trivial. Let's say in my phone book analogy, this reference section is a separate book. NowWould you please describe again what this non clustered index would look like in your business card example?

    Wednesday, November 24, 2010 7:59 PM
  • I had some time to think about the phone book analogy and the business card analogy and I think I see where my phone book analogy may be off.

    On the non clustered index example, in my phone book analogy, I have the phone book page number associated with the key for the non clustered index. I think this is where I was wrong. It should've been associated with the key for the clustered index.

    So, switching to the business card example, we have a rolodex of business cards (clustered index) ordered by last name and first name. Now you want to build a non clustered index on the state. So that would be like having a list of states. On this list, there would be an entry for each card with its state and the associated last name frst name.

    Wednesday, November 24, 2010 8:45 PM

All replies

  • If you have a heap of business cards, you can build a better analogy.

    Turn all the cards upside down, lay them down on a table and write a unique number on the back.

    Now you have, what SQL Server calls a heap. If you search for Jim's number you need to turn all the cards around and look if they contain the name Jim

    You have to do that for all cards on the table, hence you are doing a table scan.

     

    If you get tired of that you might add the first name to the number on the back of the card instead of the number and sort them on the table in order of that name.

    With that you have build a clustered index. If you want to know Jim's phone number now, you can just pick up Jim's card. That is called a clustered index seek.

     

    If someone asked for Mr Smith's phone number and you did not know his first name, you would have to go and access all cards again, even so they are sorted, because they are sorted by the wrong key. This now is called a clustered index scan.

    If you'd get asked for the phone number by last name often, you might end up taking a piece of paper and write all the last names down and next to each the first name. this is now a nonclustered index. Now you just go to the line that contains Mr Smith on your paper and figure out his first name (Jim). That is called an nonclustered index seek. Because you need his phone number, you now pick up Jim's card to retrieve the phone number. This is called a Key Lookup.

     

    Finally someone asks how many cards you have. you could now go and count the cards, but because you last names fit all on one page it is going to be a lot faster to just count the last names in your index. For that you are doing a nonclustered index scan.

    • Proposed as answer by Naomi N Wednesday, November 24, 2010 7:23 PM
    • Unproposed as answer by DoolinDalton Wednesday, November 24, 2010 7:59 PM
    • Marked as answer by Alex Feng (SQL) Thursday, December 2, 2010 9:23 AM
    Wednesday, November 24, 2010 6:48 PM
  • Let's go with your business card analogy, although I'm not sure why the phone book analogy isn't accurate. May be that will become clear as we engage in this discussion. And I would like to switch up the keys in your indexes a bit to make it reflect a more realistic scenario.

    (1) Heap: A stack of business cards is a heap. Anytime you want to search for anything, you need to go thru it one by one. This is a table scan.

    (2) Clustered Index: Your example builds a clustered index on the first name, but let's build a clustered index on last name followed by first name. So, you label the back of the business cards this way and order them in a rolodex. Now you have a clustered index. When someone asks for business cards for Mr Smith, it's easy to search the rolodex and find the card(s) for Mr Smith. The search would be just as easy if someone asked for Jim Smith. That is a clustered index seek. But if someone asked for just Jim, the order of the rolodex doesn't help. You need to go through all the cards and look for Jim. This is clustered index scan.

    (3) Non Clustered Index: Your next example builds a non clustered index. Let's say sometimes you need to search your business cards by some component of the business address, say, the state. For example, you want to get all business cards in 'NY'. In my phone book example, I created a reference section that has the states with the page numbers of the phone book where all entries for a given state would be found. Whether this reference section is physically attached to the phone book or is a separate book is trivial. Let's say in my phone book analogy, this reference section is a separate book. NowWould you please describe again what this non clustered index would look like in your business card example?

    Wednesday, November 24, 2010 7:59 PM
  • I had some time to think about the phone book analogy and the business card analogy and I think I see where my phone book analogy may be off.

    On the non clustered index example, in my phone book analogy, I have the phone book page number associated with the key for the non clustered index. I think this is where I was wrong. It should've been associated with the key for the clustered index.

    So, switching to the business card example, we have a rolodex of business cards (clustered index) ordered by last name and first name. Now you want to build a non clustered index on the state. So that would be like having a list of states. On this list, there would be an entry for each card with its state and the associated last name frst name.

    Wednesday, November 24, 2010 8:45 PM