Basic Concepts of Index
- Oracle employs two primary index architectures: B-tree indexes and bitmap indexes. Variations of these main types include cluster indexes, bitmap join indexes, function-based indexes, reverse key indexes, and text indexes. The B-tree index is considered the “standard” index type.
The “Tree” in B-Tree
- A B-tree index is a tree-shaped data structure, but it is composed of database blocks rather than rows. Think of the leaf blocks of the index as the pages of a phone book.
- Each page in the phone book (leaf block in the index) contains multiple entries. These entries consist of a name (the indexed column value) and an address (ROWID) indicating the physical location of the telephone (row in the table).
- The names on each page are sorted, and when the pages are correctly ordered, they form a complete, sorted list of every name and address.
For example : To find the name Whale in this b-Tree phone book, we: => Read page 1. This tells us that page 6 starts with Red and that page 7 starts with Black. => Read page 6. This tells us that page 350 starts with yellow and that page 351 starts with Blue. => Read page 350, which is a leaf block; we find Whale's address and phone number. => That's it; 3 blocks to find a specific row in a million row table. In reality, index blocks often fit 100 or more rows, so b-Trees are typically quite shallow. I have never seen an index with more than 5 levels. SQL> select index_name, blevel+1 from user_indexes order by 2 ;
user_indexes.blevel is the number of branch levels. Always add 1 to include the leaf level; this tells us the number of blocks a unique index scan must read to reach the leaf-block. If we’re really, really, insatiably curious; try this in SQL*Plus:
SQL> accept index_name prompt "Index Name: "
SQL> alter session set tracefile_identifier='&index_name' ;
SQL> column object_id new_value object_id
SQL> select object_id from user_objects where object_type = 'INDEX' and object_name=upper('&index_name');
SQL> alter session set events 'Immediate trace name treedump level &object_id';
SQL> alter session set tracefile identifier="" ;
SQL> show parameter user_dump_dest
- Now, on the Oracle server, navigate to the directory indicated by the final
SHOW PARAMETER user_dump_dest
command to locate the trace file. The file name will include the index name. Here is an example
Now, on the Oracle server, go to the directory shown by the final SHOW PARAMETER user_dump_dest command and find the trace file - the file name will contain the index name. Here is a sample: ---- begin tree dump branch: 0x68066c8 109078216 (0: nrow: 325, level: 1) leaf: 0x68066c9 109078217 (-1: nrow: 694 rrow: 694) leaf: 0x68066ca 109078218 (0: nrow: 693 rrow: 693) ... ... leaf: 0x68069cf 109078991 (320: nrow: 763 rrow: 763) leaf: 0x68069d0 109078992 (321: nrow: 761 rrow: 761) ----- end tree dump This index has only a root branch with 323 leaf nodes. Each leaf node contains a variable number of index entries up to 807! A deeper index would be more interesting, but it would take a while to dump. "B" is for... Contrary to popular belief, b is not for binary; it's balanced. As we insert new rows into the table, new rows are inserted into index leaf blocks. When a leaf block is full, another insert will cause the block to be split into two blocks, which means an entry for the new block must be added to the parent branch-block. If the branch-block is also full, it too is split. The process propagates back up the tree until the parent of split has space for one more entry, or the root is reached. A new root is created if the root node splits. Staggeringly, this process ensures that every branch will be the same length.
Indexes have three main uses: To quickly find specific rows by avoiding a Full Table Scan We've already seen above how a Unique Scan works. Using the phone book metaphor, it's not hard to understand how a Range Scan works in much the same way to find all people named "Whale", or all of the names alphabetically between "Black" and "Blue". Range Scans can occur when we use >, <, LIKE, or BETWEEN in a WHERE clause. A range scan will find the first row in the range using the same technique as the Unique Scan, but will then keep reading the index up to the end of the range. It is OK if the range covers many blocks. To avoid a table access altogether If all we wanted to do when looking up Whale in the phone book was to find his address or phone number, the job would be done. However if we wanted to know his date of birth, we'd have to phone and ask. This takes time. If it was something that we needed all the time, like an email address, we could save time by adding it to the phone book. Oracle does the same thing. If the information is in the index, then it doesn't bother to read the table. It is a reasonably common technique to add columns to an index, not because they will be used as part of the index scan, but because they save a table access. In fact, Oracle may even perform a Fast Full Scan of an index that it cannot use in a Range or Unique scan just to avoid a table access. To avoid a sort This one is not so well known, largely because it is so poorly documented (and in many cases, unpredicatably implemented by the Optimizer as well). Oracle performs a sort for many reasons: ORDER BY, GROUP BY, DISTINCT, Set operations (eg. UNION), Sort-Merge Joins, uncorrelated IN-subqueries, Analytic Functions). If a sort operation requires rows in the same order as the index, then Oracle may read the table rows via the index. A sort operation is not necessary since the rows are returned in sorted order.
Despite all of the instances listed above where a sort is performed, I have only seen three cases where a sort is actually avoided.
1. GROUP BY :
SQL> select src_sys, sum(actl_expns_amt), count(*) from ef_actl_expns
where src_sys = 'CDW' and actl_expns_amt > 0
group by src_sys ;
-----------------------------------------------------------------------------------------
| Id | Operation | Name |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT GROUP BY NOSORT <------- | |
|* 2 | TABLE ACCESS BY GLOBAL INDEX ROWID | EF_ACTL_EXPNS |
|* 3 | INDEX RANGE SCAN | EF_AEXP_PK |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
----------------------------------------------------------------
2 - filter("ACTL_EXPNS_AMT">0)
3 - access("SRC_SYS"='CDW')
Note the NOSORT qualifier in Step 1.
2. ORDER BY : SQL> select * from ef_actl_expns where src_sys = 'CDW' and actl_expns_amt > 0 order by src_sys ---------------------------------------------------------------------------------------- | Id | Operation | Name | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | |* 1 | TABLE ACCESS BY GLOBAL INDEX ROWID | EF_ACTL_EXPNS| |* 2 | INDEX RANGE SCAN | EF_AEXP_PK | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("ACTL_EXPNS_AMT">0) 2 - access("SRC_SYS"='CDW') Note that there is no SORT operation, despite the ORDER BY clause. Compare this to the following: SQL> select * from ef_actl_expns where src_sys = 'CDW' and actl_expns_amt > 0 order by actl_expns_amt ; --------------------------------------------------------------------------------------------- | Id | Operation | Name | --------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | 1 | SORT ORDER BY | | | *2| TABLE ACCESS BY GLOBAL INDEX ROWID | EF_ACTL_EXPNS | |*3 | INDEX RANGE SCAN | EF_AEXP_PK | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("ACTL_EXPNS_AMT">0) 3 - access("SRC_SYS"='CDW')
3. DISTINCT : SQL> select distinct src_sys from ef_actl_expns where src_sys = 'CDW' and actl_expns_amt > 0 ; ----------------------------------------------------------------------------------------------- | Id | Operation | Name | ----------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | 1 | SORT UNIQUE NOSORT | | |* 2| TABLE ACCESS BY GLOBAL INDEX ROWID | EF_ACTL_EXPNS | |* 3| INDEX RANGE SCAN | EF_AEXP_PK | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("ACTL_EXPNS_AMT">0) 3 - access("SRC_SYS"='CDW') Again, note the NOSORT qualifier.
This is an extraordinary tuning technique in OLTP systems like SQL*Forms that return one page of detail at a time to the screen. A SQL with a DISTINCT, GROUP BY, or ORDER BY that uses an index to sort can return just the first page of matching rows without having to fetch the entire result set for a sort. This can be the difference between sub-second response time and several minutes or hours.
Full table Scans are not bad :
- So far, we’ve explored the benefits of indexes, but they aren’t always advantageous. Sometimes, indexes can be ineffective or even slow down queries.
- A B-tree index is beneficial only if the `WHERE` clause uses operators like `>`, `<`, `LIKE`, `IN`, or `BETWEEN`. However, B-tree indexes are not useful for `NOT` operators such as `!=`, `NOT IN`, or `NOT LIKE`. The use of joins, sub-queries, `OR` predicates, functions (including arithmetic and concatenation), and casting can introduce conditions, caveats, and complexities that fall outside the scope of this discussion. For detailed information, refer to a comprehensive SQL tuning manual.
- More intriguing and critical are the instances where an index slows down a SQL query. This scenario is particularly common in batch systems processing large data volumes.
- To illustrate the issue, let’s use a new metaphor. Picture a large deciduous tree in your front yard during Autumn, and your task is to collect all the leaves on the lawn. The fastest method (without tools like a rake or leaf-vac) would be to get on your hands and knees with a bag, working methodically across the lawn and gathering leaves as you go. This represents a Full Table Scan, selecting rows in no particular order except those closest at hand. Like picking up leaves in handfuls, a Full Table Scan reads a block from disk and caches the next few blocks, expecting them to be needed soon. You can check this behavior with the following SQL command:
SQL> show parameter db_file_multiblock_read_count;
- A Full Table Scan is the quickest way to collect all leaves. Similarly, an index (think of a virtual reality headset guiding you) is the fastest way to pick up just the smallest leaf or even the 100 smallest leaves. As the number of leaves increases, there comes a break-even point beyond which a Full Table Scan is faster. This threshold varies based on the table, index, database settings, hardware, and server load, generally falling between 1% and 10% of the table.
The main reasons for this are :
- Oracle cannot read single rows. To read a row via an index, the entire block must be read with all but one row discarded. So an index scan of 100 rows would read 100 blocks, but a FTS might read 100 rows in a single block.
- The db_file_multiblock_read_count setting described earlier means FTS requires fewer visits to the physical disk.
Even if none of these things was true, accessing the entire index and the entire table is still more IO than just accessing the table. - So what’s the lesson here? Know our data! If our query needs 50% of the rows in the table to resolve our query, an index scan just won’t help. Not only should we not bother creating or investigating the existence of an index, we should check to make sure Oracle is not already using an index. There are a number of ways to influence index usage; once again, consult a tuning manual.
- The exception to this rule – there’s always one – is when all of the columns referenced in the SQL are contained in the index. If Oracle does not have to access the table then there is no break-even point; it is generally quicker to scan the index even for 100% of the rows.