File:  [ELWIX - Embedded LightWeight unIX -] / embedaddon / sqlite3 / src / btreeInt.h
Revision 1.1.1.1 (vendor branch): download - view: text, annotated - select for diffs - revision graph
Tue Feb 21 17:04:17 2012 UTC (12 years, 8 months ago) by misho
Branches: sqlite3, MAIN
CVS tags: v3_7_10, HEAD
sqlite3

    1: /*
    2: ** 2004 April 6
    3: **
    4: ** The author disclaims copyright to this source code.  In place of
    5: ** a legal notice, here is a blessing:
    6: **
    7: **    May you do good and not evil.
    8: **    May you find forgiveness for yourself and forgive others.
    9: **    May you share freely, never taking more than you give.
   10: **
   11: *************************************************************************
   12: ** This file implements a external (disk-based) database using BTrees.
   13: ** For a detailed discussion of BTrees, refer to
   14: **
   15: **     Donald E. Knuth, THE ART OF COMPUTER PROGRAMMING, Volume 3:
   16: **     "Sorting And Searching", pages 473-480. Addison-Wesley
   17: **     Publishing Company, Reading, Massachusetts.
   18: **
   19: ** The basic idea is that each page of the file contains N database
   20: ** entries and N+1 pointers to subpages.
   21: **
   22: **   ----------------------------------------------------------------
   23: **   |  Ptr(0) | Key(0) | Ptr(1) | Key(1) | ... | Key(N-1) | Ptr(N) |
   24: **   ----------------------------------------------------------------
   25: **
   26: ** All of the keys on the page that Ptr(0) points to have values less
   27: ** than Key(0).  All of the keys on page Ptr(1) and its subpages have
   28: ** values greater than Key(0) and less than Key(1).  All of the keys
   29: ** on Ptr(N) and its subpages have values greater than Key(N-1).  And
   30: ** so forth.
   31: **
   32: ** Finding a particular key requires reading O(log(M)) pages from the 
   33: ** disk where M is the number of entries in the tree.
   34: **
   35: ** In this implementation, a single file can hold one or more separate 
   36: ** BTrees.  Each BTree is identified by the index of its root page.  The
   37: ** key and data for any entry are combined to form the "payload".  A
   38: ** fixed amount of payload can be carried directly on the database
   39: ** page.  If the payload is larger than the preset amount then surplus
   40: ** bytes are stored on overflow pages.  The payload for an entry
   41: ** and the preceding pointer are combined to form a "Cell".  Each 
   42: ** page has a small header which contains the Ptr(N) pointer and other
   43: ** information such as the size of key and data.
   44: **
   45: ** FORMAT DETAILS
   46: **
   47: ** The file is divided into pages.  The first page is called page 1,
   48: ** the second is page 2, and so forth.  A page number of zero indicates
   49: ** "no such page".  The page size can be any power of 2 between 512 and 65536.
   50: ** Each page can be either a btree page, a freelist page, an overflow
   51: ** page, or a pointer-map page.
   52: **
   53: ** The first page is always a btree page.  The first 100 bytes of the first
   54: ** page contain a special header (the "file header") that describes the file.
   55: ** The format of the file header is as follows:
   56: **
   57: **   OFFSET   SIZE    DESCRIPTION
   58: **      0      16     Header string: "SQLite format 3\000"
   59: **     16       2     Page size in bytes.  
   60: **     18       1     File format write version
   61: **     19       1     File format read version
   62: **     20       1     Bytes of unused space at the end of each page
   63: **     21       1     Max embedded payload fraction
   64: **     22       1     Min embedded payload fraction
   65: **     23       1     Min leaf payload fraction
   66: **     24       4     File change counter
   67: **     28       4     Reserved for future use
   68: **     32       4     First freelist page
   69: **     36       4     Number of freelist pages in the file
   70: **     40      60     15 4-byte meta values passed to higher layers
   71: **
   72: **     40       4     Schema cookie
   73: **     44       4     File format of schema layer
   74: **     48       4     Size of page cache
   75: **     52       4     Largest root-page (auto/incr_vacuum)
   76: **     56       4     1=UTF-8 2=UTF16le 3=UTF16be
   77: **     60       4     User version
   78: **     64       4     Incremental vacuum mode
   79: **     68       4     unused
   80: **     72       4     unused
   81: **     76       4     unused
   82: **
   83: ** All of the integer values are big-endian (most significant byte first).
   84: **
   85: ** The file change counter is incremented when the database is changed
   86: ** This counter allows other processes to know when the file has changed
   87: ** and thus when they need to flush their cache.
   88: **
   89: ** The max embedded payload fraction is the amount of the total usable
   90: ** space in a page that can be consumed by a single cell for standard
   91: ** B-tree (non-LEAFDATA) tables.  A value of 255 means 100%.  The default
   92: ** is to limit the maximum cell size so that at least 4 cells will fit
   93: ** on one page.  Thus the default max embedded payload fraction is 64.
   94: **
   95: ** If the payload for a cell is larger than the max payload, then extra
   96: ** payload is spilled to overflow pages.  Once an overflow page is allocated,
   97: ** as many bytes as possible are moved into the overflow pages without letting
   98: ** the cell size drop below the min embedded payload fraction.
   99: **
  100: ** The min leaf payload fraction is like the min embedded payload fraction
  101: ** except that it applies to leaf nodes in a LEAFDATA tree.  The maximum
  102: ** payload fraction for a LEAFDATA tree is always 100% (or 255) and it
  103: ** not specified in the header.
  104: **
  105: ** Each btree pages is divided into three sections:  The header, the
  106: ** cell pointer array, and the cell content area.  Page 1 also has a 100-byte
  107: ** file header that occurs before the page header.
  108: **
  109: **      |----------------|
  110: **      | file header    |   100 bytes.  Page 1 only.
  111: **      |----------------|
  112: **      | page header    |   8 bytes for leaves.  12 bytes for interior nodes
  113: **      |----------------|
  114: **      | cell pointer   |   |  2 bytes per cell.  Sorted order.
  115: **      | array          |   |  Grows downward
  116: **      |                |   v
  117: **      |----------------|
  118: **      | unallocated    |
  119: **      | space          |
  120: **      |----------------|   ^  Grows upwards
  121: **      | cell content   |   |  Arbitrary order interspersed with freeblocks.
  122: **      | area           |   |  and free space fragments.
  123: **      |----------------|
  124: **
  125: ** The page headers looks like this:
  126: **
  127: **   OFFSET   SIZE     DESCRIPTION
  128: **      0       1      Flags. 1: intkey, 2: zerodata, 4: leafdata, 8: leaf
  129: **      1       2      byte offset to the first freeblock
  130: **      3       2      number of cells on this page
  131: **      5       2      first byte of the cell content area
  132: **      7       1      number of fragmented free bytes
  133: **      8       4      Right child (the Ptr(N) value).  Omitted on leaves.
  134: **
  135: ** The flags define the format of this btree page.  The leaf flag means that
  136: ** this page has no children.  The zerodata flag means that this page carries
  137: ** only keys and no data.  The intkey flag means that the key is a integer
  138: ** which is stored in the key size entry of the cell header rather than in
  139: ** the payload area.
  140: **
  141: ** The cell pointer array begins on the first byte after the page header.
  142: ** The cell pointer array contains zero or more 2-byte numbers which are
  143: ** offsets from the beginning of the page to the cell content in the cell
  144: ** content area.  The cell pointers occur in sorted order.  The system strives
  145: ** to keep free space after the last cell pointer so that new cells can
  146: ** be easily added without having to defragment the page.
  147: **
  148: ** Cell content is stored at the very end of the page and grows toward the
  149: ** beginning of the page.
  150: **
  151: ** Unused space within the cell content area is collected into a linked list of
  152: ** freeblocks.  Each freeblock is at least 4 bytes in size.  The byte offset
  153: ** to the first freeblock is given in the header.  Freeblocks occur in
  154: ** increasing order.  Because a freeblock must be at least 4 bytes in size,
  155: ** any group of 3 or fewer unused bytes in the cell content area cannot
  156: ** exist on the freeblock chain.  A group of 3 or fewer free bytes is called
  157: ** a fragment.  The total number of bytes in all fragments is recorded.
  158: ** in the page header at offset 7.
  159: **
  160: **    SIZE    DESCRIPTION
  161: **      2     Byte offset of the next freeblock
  162: **      2     Bytes in this freeblock
  163: **
  164: ** Cells are of variable length.  Cells are stored in the cell content area at
  165: ** the end of the page.  Pointers to the cells are in the cell pointer array
  166: ** that immediately follows the page header.  Cells is not necessarily
  167: ** contiguous or in order, but cell pointers are contiguous and in order.
  168: **
  169: ** Cell content makes use of variable length integers.  A variable
  170: ** length integer is 1 to 9 bytes where the lower 7 bits of each 
  171: ** byte are used.  The integer consists of all bytes that have bit 8 set and
  172: ** the first byte with bit 8 clear.  The most significant byte of the integer
  173: ** appears first.  A variable-length integer may not be more than 9 bytes long.
  174: ** As a special case, all 8 bytes of the 9th byte are used as data.  This
  175: ** allows a 64-bit integer to be encoded in 9 bytes.
  176: **
  177: **    0x00                      becomes  0x00000000
  178: **    0x7f                      becomes  0x0000007f
  179: **    0x81 0x00                 becomes  0x00000080
  180: **    0x82 0x00                 becomes  0x00000100
  181: **    0x80 0x7f                 becomes  0x0000007f
  182: **    0x8a 0x91 0xd1 0xac 0x78  becomes  0x12345678
  183: **    0x81 0x81 0x81 0x81 0x01  becomes  0x10204081
  184: **
  185: ** Variable length integers are used for rowids and to hold the number of
  186: ** bytes of key and data in a btree cell.
  187: **
  188: ** The content of a cell looks like this:
  189: **
  190: **    SIZE    DESCRIPTION
  191: **      4     Page number of the left child. Omitted if leaf flag is set.
  192: **     var    Number of bytes of data. Omitted if the zerodata flag is set.
  193: **     var    Number of bytes of key. Or the key itself if intkey flag is set.
  194: **      *     Payload
  195: **      4     First page of the overflow chain.  Omitted if no overflow
  196: **
  197: ** Overflow pages form a linked list.  Each page except the last is completely
  198: ** filled with data (pagesize - 4 bytes).  The last page can have as little
  199: ** as 1 byte of data.
  200: **
  201: **    SIZE    DESCRIPTION
  202: **      4     Page number of next overflow page
  203: **      *     Data
  204: **
  205: ** Freelist pages come in two subtypes: trunk pages and leaf pages.  The
  206: ** file header points to the first in a linked list of trunk page.  Each trunk
  207: ** page points to multiple leaf pages.  The content of a leaf page is
  208: ** unspecified.  A trunk page looks like this:
  209: **
  210: **    SIZE    DESCRIPTION
  211: **      4     Page number of next trunk page
  212: **      4     Number of leaf pointers on this page
  213: **      *     zero or more pages numbers of leaves
  214: */
  215: #include "sqliteInt.h"
  216: 
  217: 
  218: /* The following value is the maximum cell size assuming a maximum page
  219: ** size give above.
  220: */
  221: #define MX_CELL_SIZE(pBt)  ((int)(pBt->pageSize-8))
  222: 
  223: /* The maximum number of cells on a single page of the database.  This
  224: ** assumes a minimum cell size of 6 bytes  (4 bytes for the cell itself
  225: ** plus 2 bytes for the index to the cell in the page header).  Such
  226: ** small cells will be rare, but they are possible.
  227: */
  228: #define MX_CELL(pBt) ((pBt->pageSize-8)/6)
  229: 
  230: /* Forward declarations */
  231: typedef struct MemPage MemPage;
  232: typedef struct BtLock BtLock;
  233: 
  234: /*
  235: ** This is a magic string that appears at the beginning of every
  236: ** SQLite database in order to identify the file as a real database.
  237: **
  238: ** You can change this value at compile-time by specifying a
  239: ** -DSQLITE_FILE_HEADER="..." on the compiler command-line.  The
  240: ** header must be exactly 16 bytes including the zero-terminator so
  241: ** the string itself should be 15 characters long.  If you change
  242: ** the header, then your custom library will not be able to read 
  243: ** databases generated by the standard tools and the standard tools
  244: ** will not be able to read databases created by your custom library.
  245: */
  246: #ifndef SQLITE_FILE_HEADER /* 123456789 123456 */
  247: #  define SQLITE_FILE_HEADER "SQLite format 3"
  248: #endif
  249: 
  250: /*
  251: ** Page type flags.  An ORed combination of these flags appear as the
  252: ** first byte of on-disk image of every BTree page.
  253: */
  254: #define PTF_INTKEY    0x01
  255: #define PTF_ZERODATA  0x02
  256: #define PTF_LEAFDATA  0x04
  257: #define PTF_LEAF      0x08
  258: 
  259: /*
  260: ** As each page of the file is loaded into memory, an instance of the following
  261: ** structure is appended and initialized to zero.  This structure stores
  262: ** information about the page that is decoded from the raw file page.
  263: **
  264: ** The pParent field points back to the parent page.  This allows us to
  265: ** walk up the BTree from any leaf to the root.  Care must be taken to
  266: ** unref() the parent page pointer when this page is no longer referenced.
  267: ** The pageDestructor() routine handles that chore.
  268: **
  269: ** Access to all fields of this structure is controlled by the mutex
  270: ** stored in MemPage.pBt->mutex.
  271: */
  272: struct MemPage {
  273:   u8 isInit;           /* True if previously initialized. MUST BE FIRST! */
  274:   u8 nOverflow;        /* Number of overflow cell bodies in aCell[] */
  275:   u8 intKey;           /* True if intkey flag is set */
  276:   u8 leaf;             /* True if leaf flag is set */
  277:   u8 hasData;          /* True if this page stores data */
  278:   u8 hdrOffset;        /* 100 for page 1.  0 otherwise */
  279:   u8 childPtrSize;     /* 0 if leaf==1.  4 if leaf==0 */
  280:   u8 max1bytePayload;  /* min(maxLocal,127) */
  281:   u16 maxLocal;        /* Copy of BtShared.maxLocal or BtShared.maxLeaf */
  282:   u16 minLocal;        /* Copy of BtShared.minLocal or BtShared.minLeaf */
  283:   u16 cellOffset;      /* Index in aData of first cell pointer */
  284:   u16 nFree;           /* Number of free bytes on the page */
  285:   u16 nCell;           /* Number of cells on this page, local and ovfl */
  286:   u16 maskPage;        /* Mask for page offset */
  287:   struct _OvflCell {   /* Cells that will not fit on aData[] */
  288:     u8 *pCell;          /* Pointers to the body of the overflow cell */
  289:     u16 idx;            /* Insert this cell before idx-th non-overflow cell */
  290:   } aOvfl[5];
  291:   BtShared *pBt;       /* Pointer to BtShared that this page is part of */
  292:   u8 *aData;           /* Pointer to disk image of the page data */
  293:   u8 *aDataEnd;        /* One byte past the end of usable data */
  294:   u8 *aCellIdx;        /* The cell index area */
  295:   DbPage *pDbPage;     /* Pager page handle */
  296:   Pgno pgno;           /* Page number for this page */
  297: };
  298: 
  299: /*
  300: ** The in-memory image of a disk page has the auxiliary information appended
  301: ** to the end.  EXTRA_SIZE is the number of bytes of space needed to hold
  302: ** that extra information.
  303: */
  304: #define EXTRA_SIZE sizeof(MemPage)
  305: 
  306: /*
  307: ** A linked list of the following structures is stored at BtShared.pLock.
  308: ** Locks are added (or upgraded from READ_LOCK to WRITE_LOCK) when a cursor 
  309: ** is opened on the table with root page BtShared.iTable. Locks are removed
  310: ** from this list when a transaction is committed or rolled back, or when
  311: ** a btree handle is closed.
  312: */
  313: struct BtLock {
  314:   Btree *pBtree;        /* Btree handle holding this lock */
  315:   Pgno iTable;          /* Root page of table */
  316:   u8 eLock;             /* READ_LOCK or WRITE_LOCK */
  317:   BtLock *pNext;        /* Next in BtShared.pLock list */
  318: };
  319: 
  320: /* Candidate values for BtLock.eLock */
  321: #define READ_LOCK     1
  322: #define WRITE_LOCK    2
  323: 
  324: /* A Btree handle
  325: **
  326: ** A database connection contains a pointer to an instance of
  327: ** this object for every database file that it has open.  This structure
  328: ** is opaque to the database connection.  The database connection cannot
  329: ** see the internals of this structure and only deals with pointers to
  330: ** this structure.
  331: **
  332: ** For some database files, the same underlying database cache might be 
  333: ** shared between multiple connections.  In that case, each connection
  334: ** has it own instance of this object.  But each instance of this object
  335: ** points to the same BtShared object.  The database cache and the
  336: ** schema associated with the database file are all contained within
  337: ** the BtShared object.
  338: **
  339: ** All fields in this structure are accessed under sqlite3.mutex.
  340: ** The pBt pointer itself may not be changed while there exists cursors 
  341: ** in the referenced BtShared that point back to this Btree since those
  342: ** cursors have to go through this Btree to find their BtShared and
  343: ** they often do so without holding sqlite3.mutex.
  344: */
  345: struct Btree {
  346:   sqlite3 *db;       /* The database connection holding this btree */
  347:   BtShared *pBt;     /* Sharable content of this btree */
  348:   u8 inTrans;        /* TRANS_NONE, TRANS_READ or TRANS_WRITE */
  349:   u8 sharable;       /* True if we can share pBt with another db */
  350:   u8 locked;         /* True if db currently has pBt locked */
  351:   int wantToLock;    /* Number of nested calls to sqlite3BtreeEnter() */
  352:   int nBackup;       /* Number of backup operations reading this btree */
  353:   Btree *pNext;      /* List of other sharable Btrees from the same db */
  354:   Btree *pPrev;      /* Back pointer of the same list */
  355: #ifndef SQLITE_OMIT_SHARED_CACHE
  356:   BtLock lock;       /* Object used to lock page 1 */
  357: #endif
  358: };
  359: 
  360: /*
  361: ** Btree.inTrans may take one of the following values.
  362: **
  363: ** If the shared-data extension is enabled, there may be multiple users
  364: ** of the Btree structure. At most one of these may open a write transaction,
  365: ** but any number may have active read transactions.
  366: */
  367: #define TRANS_NONE  0
  368: #define TRANS_READ  1
  369: #define TRANS_WRITE 2
  370: 
  371: /*
  372: ** An instance of this object represents a single database file.
  373: ** 
  374: ** A single database file can be in use at the same time by two
  375: ** or more database connections.  When two or more connections are
  376: ** sharing the same database file, each connection has it own
  377: ** private Btree object for the file and each of those Btrees points
  378: ** to this one BtShared object.  BtShared.nRef is the number of
  379: ** connections currently sharing this database file.
  380: **
  381: ** Fields in this structure are accessed under the BtShared.mutex
  382: ** mutex, except for nRef and pNext which are accessed under the
  383: ** global SQLITE_MUTEX_STATIC_MASTER mutex.  The pPager field
  384: ** may not be modified once it is initially set as long as nRef>0.
  385: ** The pSchema field may be set once under BtShared.mutex and
  386: ** thereafter is unchanged as long as nRef>0.
  387: **
  388: ** isPending:
  389: **
  390: **   If a BtShared client fails to obtain a write-lock on a database
  391: **   table (because there exists one or more read-locks on the table),
  392: **   the shared-cache enters 'pending-lock' state and isPending is
  393: **   set to true.
  394: **
  395: **   The shared-cache leaves the 'pending lock' state when either of
  396: **   the following occur:
  397: **
  398: **     1) The current writer (BtShared.pWriter) concludes its transaction, OR
  399: **     2) The number of locks held by other connections drops to zero.
  400: **
  401: **   while in the 'pending-lock' state, no connection may start a new
  402: **   transaction.
  403: **
  404: **   This feature is included to help prevent writer-starvation.
  405: */
  406: struct BtShared {
  407:   Pager *pPager;        /* The page cache */
  408:   sqlite3 *db;          /* Database connection currently using this Btree */
  409:   BtCursor *pCursor;    /* A list of all open cursors */
  410:   MemPage *pPage1;      /* First page of the database */
  411:   u8 openFlags;         /* Flags to sqlite3BtreeOpen() */
  412: #ifndef SQLITE_OMIT_AUTOVACUUM
  413:   u8 autoVacuum;        /* True if auto-vacuum is enabled */
  414:   u8 incrVacuum;        /* True if incr-vacuum is enabled */
  415: #endif
  416:   u8 inTransaction;     /* Transaction state */
  417:   u8 max1bytePayload;   /* Maximum first byte of cell for a 1-byte payload */
  418:   u16 btsFlags;         /* Boolean parameters.  See BTS_* macros below */
  419:   u16 maxLocal;         /* Maximum local payload in non-LEAFDATA tables */
  420:   u16 minLocal;         /* Minimum local payload in non-LEAFDATA tables */
  421:   u16 maxLeaf;          /* Maximum local payload in a LEAFDATA table */
  422:   u16 minLeaf;          /* Minimum local payload in a LEAFDATA table */
  423:   u32 pageSize;         /* Total number of bytes on a page */
  424:   u32 usableSize;       /* Number of usable bytes on each page */
  425:   int nTransaction;     /* Number of open transactions (read + write) */
  426:   u32 nPage;            /* Number of pages in the database */
  427:   void *pSchema;        /* Pointer to space allocated by sqlite3BtreeSchema() */
  428:   void (*xFreeSchema)(void*);  /* Destructor for BtShared.pSchema */
  429:   sqlite3_mutex *mutex; /* Non-recursive mutex required to access this object */
  430:   Bitvec *pHasContent;  /* Set of pages moved to free-list this transaction */
  431: #ifndef SQLITE_OMIT_SHARED_CACHE
  432:   int nRef;             /* Number of references to this structure */
  433:   BtShared *pNext;      /* Next on a list of sharable BtShared structs */
  434:   BtLock *pLock;        /* List of locks held on this shared-btree struct */
  435:   Btree *pWriter;       /* Btree with currently open write transaction */
  436: #endif
  437:   u8 *pTmpSpace;        /* BtShared.pageSize bytes of space for tmp use */
  438: };
  439: 
  440: /*
  441: ** Allowed values for BtShared.btsFlags
  442: */
  443: #define BTS_READ_ONLY        0x0001   /* Underlying file is readonly */
  444: #define BTS_PAGESIZE_FIXED   0x0002   /* Page size can no longer be changed */
  445: #define BTS_SECURE_DELETE    0x0004   /* PRAGMA secure_delete is enabled */
  446: #define BTS_INITIALLY_EMPTY  0x0008   /* Database was empty at trans start */
  447: #define BTS_NO_WAL           0x0010   /* Do not open write-ahead-log files */
  448: #define BTS_EXCLUSIVE        0x0020   /* pWriter has an exclusive lock */
  449: #define BTS_PENDING          0x0040   /* Waiting for read-locks to clear */
  450: 
  451: /*
  452: ** An instance of the following structure is used to hold information
  453: ** about a cell.  The parseCellPtr() function fills in this structure
  454: ** based on information extract from the raw disk page.
  455: */
  456: typedef struct CellInfo CellInfo;
  457: struct CellInfo {
  458:   i64 nKey;      /* The key for INTKEY tables, or number of bytes in key */
  459:   u8 *pCell;     /* Pointer to the start of cell content */
  460:   u32 nData;     /* Number of bytes of data */
  461:   u32 nPayload;  /* Total amount of payload */
  462:   u16 nHeader;   /* Size of the cell content header in bytes */
  463:   u16 nLocal;    /* Amount of payload held locally */
  464:   u16 iOverflow; /* Offset to overflow page number.  Zero if no overflow */
  465:   u16 nSize;     /* Size of the cell content on the main b-tree page */
  466: };
  467: 
  468: /*
  469: ** Maximum depth of an SQLite B-Tree structure. Any B-Tree deeper than
  470: ** this will be declared corrupt. This value is calculated based on a
  471: ** maximum database size of 2^31 pages a minimum fanout of 2 for a
  472: ** root-node and 3 for all other internal nodes.
  473: **
  474: ** If a tree that appears to be taller than this is encountered, it is
  475: ** assumed that the database is corrupt.
  476: */
  477: #define BTCURSOR_MAX_DEPTH 20
  478: 
  479: /*
  480: ** A cursor is a pointer to a particular entry within a particular
  481: ** b-tree within a database file.
  482: **
  483: ** The entry is identified by its MemPage and the index in
  484: ** MemPage.aCell[] of the entry.
  485: **
  486: ** A single database file can be shared by two more database connections,
  487: ** but cursors cannot be shared.  Each cursor is associated with a
  488: ** particular database connection identified BtCursor.pBtree.db.
  489: **
  490: ** Fields in this structure are accessed under the BtShared.mutex
  491: ** found at self->pBt->mutex. 
  492: */
  493: struct BtCursor {
  494:   Btree *pBtree;            /* The Btree to which this cursor belongs */
  495:   BtShared *pBt;            /* The BtShared this cursor points to */
  496:   BtCursor *pNext, *pPrev;  /* Forms a linked list of all cursors */
  497:   struct KeyInfo *pKeyInfo; /* Argument passed to comparison function */
  498:   Pgno pgnoRoot;            /* The root page of this tree */
  499:   sqlite3_int64 cachedRowid; /* Next rowid cache.  0 means not valid */
  500:   CellInfo info;            /* A parse of the cell we are pointing at */
  501:   i64 nKey;        /* Size of pKey, or last integer key */
  502:   void *pKey;      /* Saved key that was cursor's last known position */
  503:   int skipNext;    /* Prev() is noop if negative. Next() is noop if positive */
  504:   u8 wrFlag;                /* True if writable */
  505:   u8 atLast;                /* Cursor pointing to the last entry */
  506:   u8 validNKey;             /* True if info.nKey is valid */
  507:   u8 eState;                /* One of the CURSOR_XXX constants (see below) */
  508: #ifndef SQLITE_OMIT_INCRBLOB
  509:   Pgno *aOverflow;          /* Cache of overflow page locations */
  510:   u8 isIncrblobHandle;      /* True if this cursor is an incr. io handle */
  511: #endif
  512:   i16 iPage;                            /* Index of current page in apPage */
  513:   u16 aiIdx[BTCURSOR_MAX_DEPTH];        /* Current index in apPage[i] */
  514:   MemPage *apPage[BTCURSOR_MAX_DEPTH];  /* Pages from root to current page */
  515: };
  516: 
  517: /*
  518: ** Potential values for BtCursor.eState.
  519: **
  520: ** CURSOR_VALID:
  521: **   Cursor points to a valid entry. getPayload() etc. may be called.
  522: **
  523: ** CURSOR_INVALID:
  524: **   Cursor does not point to a valid entry. This can happen (for example) 
  525: **   because the table is empty or because BtreeCursorFirst() has not been
  526: **   called.
  527: **
  528: ** CURSOR_REQUIRESEEK:
  529: **   The table that this cursor was opened on still exists, but has been 
  530: **   modified since the cursor was last used. The cursor position is saved
  531: **   in variables BtCursor.pKey and BtCursor.nKey. When a cursor is in 
  532: **   this state, restoreCursorPosition() can be called to attempt to
  533: **   seek the cursor to the saved position.
  534: **
  535: ** CURSOR_FAULT:
  536: **   A unrecoverable error (an I/O error or a malloc failure) has occurred
  537: **   on a different connection that shares the BtShared cache with this
  538: **   cursor.  The error has left the cache in an inconsistent state.
  539: **   Do nothing else with this cursor.  Any attempt to use the cursor
  540: **   should return the error code stored in BtCursor.skip
  541: */
  542: #define CURSOR_INVALID           0
  543: #define CURSOR_VALID             1
  544: #define CURSOR_REQUIRESEEK       2
  545: #define CURSOR_FAULT             3
  546: 
  547: /* 
  548: ** The database page the PENDING_BYTE occupies. This page is never used.
  549: */
  550: # define PENDING_BYTE_PAGE(pBt) PAGER_MJ_PGNO(pBt)
  551: 
  552: /*
  553: ** These macros define the location of the pointer-map entry for a 
  554: ** database page. The first argument to each is the number of usable
  555: ** bytes on each page of the database (often 1024). The second is the
  556: ** page number to look up in the pointer map.
  557: **
  558: ** PTRMAP_PAGENO returns the database page number of the pointer-map
  559: ** page that stores the required pointer. PTRMAP_PTROFFSET returns
  560: ** the offset of the requested map entry.
  561: **
  562: ** If the pgno argument passed to PTRMAP_PAGENO is a pointer-map page,
  563: ** then pgno is returned. So (pgno==PTRMAP_PAGENO(pgsz, pgno)) can be
  564: ** used to test if pgno is a pointer-map page. PTRMAP_ISPAGE implements
  565: ** this test.
  566: */
  567: #define PTRMAP_PAGENO(pBt, pgno) ptrmapPageno(pBt, pgno)
  568: #define PTRMAP_PTROFFSET(pgptrmap, pgno) (5*(pgno-pgptrmap-1))
  569: #define PTRMAP_ISPAGE(pBt, pgno) (PTRMAP_PAGENO((pBt),(pgno))==(pgno))
  570: 
  571: /*
  572: ** The pointer map is a lookup table that identifies the parent page for
  573: ** each child page in the database file.  The parent page is the page that
  574: ** contains a pointer to the child.  Every page in the database contains
  575: ** 0 or 1 parent pages.  (In this context 'database page' refers
  576: ** to any page that is not part of the pointer map itself.)  Each pointer map
  577: ** entry consists of a single byte 'type' and a 4 byte parent page number.
  578: ** The PTRMAP_XXX identifiers below are the valid types.
  579: **
  580: ** The purpose of the pointer map is to facility moving pages from one
  581: ** position in the file to another as part of autovacuum.  When a page
  582: ** is moved, the pointer in its parent must be updated to point to the
  583: ** new location.  The pointer map is used to locate the parent page quickly.
  584: **
  585: ** PTRMAP_ROOTPAGE: The database page is a root-page. The page-number is not
  586: **                  used in this case.
  587: **
  588: ** PTRMAP_FREEPAGE: The database page is an unused (free) page. The page-number 
  589: **                  is not used in this case.
  590: **
  591: ** PTRMAP_OVERFLOW1: The database page is the first page in a list of 
  592: **                   overflow pages. The page number identifies the page that
  593: **                   contains the cell with a pointer to this overflow page.
  594: **
  595: ** PTRMAP_OVERFLOW2: The database page is the second or later page in a list of
  596: **                   overflow pages. The page-number identifies the previous
  597: **                   page in the overflow page list.
  598: **
  599: ** PTRMAP_BTREE: The database page is a non-root btree page. The page number
  600: **               identifies the parent page in the btree.
  601: */
  602: #define PTRMAP_ROOTPAGE 1
  603: #define PTRMAP_FREEPAGE 2
  604: #define PTRMAP_OVERFLOW1 3
  605: #define PTRMAP_OVERFLOW2 4
  606: #define PTRMAP_BTREE 5
  607: 
  608: /* A bunch of assert() statements to check the transaction state variables
  609: ** of handle p (type Btree*) are internally consistent.
  610: */
  611: #define btreeIntegrity(p) \
  612:   assert( p->pBt->inTransaction!=TRANS_NONE || p->pBt->nTransaction==0 ); \
  613:   assert( p->pBt->inTransaction>=p->inTrans ); 
  614: 
  615: 
  616: /*
  617: ** The ISAUTOVACUUM macro is used within balance_nonroot() to determine
  618: ** if the database supports auto-vacuum or not. Because it is used
  619: ** within an expression that is an argument to another macro 
  620: ** (sqliteMallocRaw), it is not possible to use conditional compilation.
  621: ** So, this macro is defined instead.
  622: */
  623: #ifndef SQLITE_OMIT_AUTOVACUUM
  624: #define ISAUTOVACUUM (pBt->autoVacuum)
  625: #else
  626: #define ISAUTOVACUUM 0
  627: #endif
  628: 
  629: 
  630: /*
  631: ** This structure is passed around through all the sanity checking routines
  632: ** in order to keep track of some global state information.
  633: */
  634: typedef struct IntegrityCk IntegrityCk;
  635: struct IntegrityCk {
  636:   BtShared *pBt;    /* The tree being checked out */
  637:   Pager *pPager;    /* The associated pager.  Also accessible by pBt->pPager */
  638:   Pgno nPage;       /* Number of pages in the database */
  639:   int *anRef;       /* Number of times each page is referenced */
  640:   int mxErr;        /* Stop accumulating errors when this reaches zero */
  641:   int nErr;         /* Number of messages written to zErrMsg so far */
  642:   int mallocFailed; /* A memory allocation error has occurred */
  643:   StrAccum errMsg;  /* Accumulate the error message text here */
  644: };
  645: 
  646: /*
  647: ** Routines to read or write a two- and four-byte big-endian integer values.
  648: */
  649: #define get2byte(x)   ((x)[0]<<8 | (x)[1])
  650: #define put2byte(p,v) ((p)[0] = (u8)((v)>>8), (p)[1] = (u8)(v))
  651: #define get4byte sqlite3Get4byte
  652: #define put4byte sqlite3Put4byte

FreeBSD-CVSweb <freebsd-cvsweb@FreeBSD.org>