Binary Trees in SQL Server
We often work with data in SQL Server. We write SELECT, INSERT, UPDATE, DELETE etc. commands easily to fetch, manipulate the data. However, if we get to know how SQL Server Database Management System actually stores this data in its underlying file system, what methodologies it uses to perform various operations on data, it is possible for us to store the data more efficiently.
Well, one may question whether the DBMS doesn't know how to store the data efficiently? Yes, the DBMS does store the data efficiently in its own way, but it doesn't know your query requirements. It means which columns we use in a select statement, which fields you are going to update frequently, what conditions you use in your where clause in order to fetch the required data.
We often make use of indexes to provide this information to database so that the database is well prepared to store the data efficiently based on our index which in turn helps to perform DDL, DML operations on the data efficiently. While indexes are of many kinds, the two basic types of indexes are Clustered Indexes and Non Clustered Indexes. Once an index of any of these kind is built, internally SQL Server prepares a structure which is known as Binary Tree Structure, popularly known as B-Tree. SQL Server uses this tree to locate any requested row through a command.
The above shown binary tree is a model that gets created for an index that is created on a column that contains values 1 to 200. If we query a row that contains the column value '111', then sql server looks first at the root level if that value falls in the range. It then takes the path into intermediate levels where the node 101-200 exists, then further 101-150 and then further narrows down to 101-125 wherein it fetches the required row. If the index that is created is a clustered index, the leaf level nodes are nothing but the data rows themselves. So, the database directly gets the data once it reaches the leaf node. However, if the index that you have created is a non-clustered one, then the leaf nodes hold the address of the actual data or the corresponding clustered index node (if the table has one), where the column value of '111' resides in the underlying disk and then further fetches the data by pointing the read head of the disk to that particular address.
Hence it is very important to choose the columns included in either the clustered or non-clustered indexes based on frequency of inserts/update vs select queries that run on these columns.
Hope this gives the basic idea of a b-tree structure in sql server. I will publish more on various kinds of indexes and their impact in data retrievals soon.
Thank you,
Raja