locked
Index Corruption RRS feed

  • Question

  • We have been chasing a periodic index corruption issue for a few years now and we just solved it.  Even though we solved it I want to understand it and I can't find the information I am looking for.

    The problem turned out to be an issue with how indexes are processed by VfpOleDb and a C library.  The application was originally written using this C library and we have been migrating much of the database access to VfpOleDb.  It turns out that there is a new bit flag in the index that VfpOleDb was using but the C code was not.  The C code mistakenly treated the bit field as an integer and when the new bit was added it threw off all the processing.  According to http://msdn.microsoft.com/en-us/library/s8tb8f47%28VS.80%29.aspx there is a "node attribute" in the first 2 bytes of each node page.  The values according to the documentation are 1 for a root node and 2 for a leaf node.  3 for both since its a bit field.  Well we are seeing 5s and 6s in this field when FoxProOleDb writes to the index.  6 is a leaf not (2 and 4 bit set).  5 is a root node (1 and 4 bit set).  The C was was processing the index as an integer (as in nodeattribyte >= 2) and therefore treating root nodes as leaf nodes when they had the 4 bit set.  For reads it caused an error but when a write happened with the C library this confusion caused corruptions.

    What I'm trying to figure out is what this 4 bit means.  We modified the C code to ignore everything except the 1 and 2 bit but I want to understand what the 4 bit means.
    Tuesday, July 28, 2009 10:11 PM

Answers

  • The docs talk about 3 bits


    Node attributes (any of the following numeric values or their sums): 0 – index node 1 – root node 2 – leaf node

    That description "numeric values or their sums" is surely wrong, as then 1 could both mean index+root or only root, etc. I'm sure they meant value 2^0, 2^1, 2^2, bits 0-2 with values 1,2 and 4.

    If that's the case, this means:

    0: no node (possible at all?)

    1: index node

    2: root node

    3: index and root node

    4: leaf node

    5: index and leaf node

    6: root and leaf node

    7: index, root and leaf node

    0 and some of the other values may never occur, due to be impossible. I can only imagine a node being all three types, if on a table with just one record. I can't imagine a node being no type at all.

    Overall I'm unsure, the docs about file structure are not very detailed and not necessarily correct. Incorrect info also turned out about what the byte 0 of DBFs means about the DBF file type, which was just a line break issue, eg values and explanaitions were shifted. 

    The only company, who really would know is Microsoft.

    Bye, Olaf.

    Wednesday, August 15, 2012 6:41 AM

All replies

  • How did you soved it ? (if is not secrect...) I am working at a program to mach dbf files with idx files and I had suprise I found a lot of idx files corrupted (and reading idx file did not help me to establish if is corrupted or no plus I was able to mach the idx files with other dbf tables without receive a error)...If I will find something regarding your questions I will let you know ...
    dni
    Wednesday, July 29, 2009 3:58 PM
  • Your best bet is to replace the IDX files with CDX indexes. IDX files are very prone to corruption, but CDX indexes are part of the table structure. You should be able to write a fox program to go through the tables and replace all idx indexes with cdx indexes; then you can read them through vfpoledb.
    Tuesday, June 5, 2012 2:51 PM
  • This doesn't answer the question (what the bit means) nor is the solution appropriate (the C library is still being used and does not support CDX files)
    Sunday, August 5, 2012 8:11 PM
  • Can anyone answer what the 4-bit set means?

    Thanks!


    Ed Price (a.k.a User Ed), SQL Server Experience Program Manager (Blog, Twitter, Wiki)

    Tuesday, August 14, 2012 10:18 PM
  • The docs talk about 3 bits


    Node attributes (any of the following numeric values or their sums): 0 – index node 1 – root node 2 – leaf node

    That description "numeric values or their sums" is surely wrong, as then 1 could both mean index+root or only root, etc. I'm sure they meant value 2^0, 2^1, 2^2, bits 0-2 with values 1,2 and 4.

    If that's the case, this means:

    0: no node (possible at all?)

    1: index node

    2: root node

    3: index and root node

    4: leaf node

    5: index and leaf node

    6: root and leaf node

    7: index, root and leaf node

    0 and some of the other values may never occur, due to be impossible. I can only imagine a node being all three types, if on a table with just one record. I can't imagine a node being no type at all.

    Overall I'm unsure, the docs about file structure are not very detailed and not necessarily correct. Incorrect info also turned out about what the byte 0 of DBFs means about the DBF file type, which was just a line break issue, eg values and explanaitions were shifted. 

    The only company, who really would know is Microsoft.

    Bye, Olaf.

    Wednesday, August 15, 2012 6:41 AM
  • In many days of testing I had not see the value show up until today (May 1, 2014). I don't know why it's there. It seems to be a flag of some kind as when you add additional keys to the index the setting moves to the new nodes, and does not remain on all of them, but I cannot figure out its meaning.

    I've asked Calvin on his page:  http://blogs.msdn.com/b/calvin_hsia/archive/2005/01/05/347388.aspx

    The thread's so old, I doubt he'll answer though.

    Thursday, May 1, 2014 7:38 PM
  • Olaf, your answer does not align with data in the CDX. The values for the nodes are 0-index, 1-root, 2-leaf, and 3-root+leaf.  They are actual values, not bit positions.  The spurious bit remains a mystery.  Ignoring its existence by ANDing the lower node bits seems to allow indexes to function properly.

    I'm now thinking it's some kind of housekeeping marker or indicator for interior node refactoring as the number of index nodes and keys per node grow.

    Best regards,
    Rick C. Hodgin

    Thursday, April 9, 2015 6:16 PM
  • Well, as said: The only company, who really would know is Microsoft.

    I'm just trying to make sense of this quote:

    Node attributes (any of the following numeric values or their sums): a.   0 – index node b.   1 – root node c.   2 – leaf node

    How would the sum of the values encode an index leaf or index root node, or does the one rule out the other? 1 can be the sum of 0+1 or 1 only.

    Bye, Olaf.


    Olaf Doschke - TMN Systemberatung GmbH http://www.tmn-systemberatung.de

    Sunday, April 12, 2015 11:37 AM