An Overview of Database Indexing for Beginners

Database Indexing is the most common way known and utilized by backend developers to optimize database queries. In this article, we will be discussing about database indexing in detail.

What is a database index?

A database index allows a query to retrieve data from a database in an efficient manner. In simpler words, indexing is a way to get an unordered table into an order that will maximize efficiency when searching for a record.

Indexes can be related to specific tables and consist of one or more keys. Also, a table can have multiple indexes built from it.

When a database table is unindexed, there won’t be any clear order of the rows, thus, to fulfill any query, it will need to search through the rows linearly, that is, the query will have to search through each row to find the rows with the matching condition. As you can imagine, this isn’t ideal and can be a problem when looking inside a database table with huge amount of data.

For example, we have a table as shown below:

COMPANY_ID	UNIT	UNIT_COST
10	12	1.15
12	12	1.05
14	18	1.31
18	18	1.34
11	24	1.15
16	12	1.31
10	12	1.15
12	24	1.3
18	6	1.34
18	12	1.35
14	12	1.95
21	18	1.36
12	12	1.05
20	6	1.31
18	18	1.34
11	24	1.15
14	24	1.05

And then, we want to run a query as following:

SELECT
	company_id,
	units,
	unit_cost
FROM
	index_test
WHERE
	company_id = 18

In this particular case, the database would have to search through all 17 records in the order they appear in the table, from top to bottom, one at a time, to look for all the potential instances of company_id as 18.

This will only get more and more time consuming as the size of the table increases. How indexing can help here? Indexing can help us set up the column with the search condition on (company_id in this case) in a sorted manner to optimize the query performance.

With an index on the company_id column, the table would look like this:

COMPANY_ID	UNIT	UNIT_COST
10	12	1.15
10	12	1.15
11	24	1.15
11	24	1.15
12	12	1.05
12	24	1.3
12	12	1.05
14	18	1.31
14	12	1.95
14	24	1.05
16	12	1.31
18	18	1.34
18	6	1.34
18	12	1.35
18	18	1.34
20	6	1.31
21	18	1.36

Now, the database can simply search for company_id equal to 18 and return all the requested columns for that row, and then move to the next row. If the next row also has the company_id as 18 again, then it will also return the request columns for this row, but if the next row has the company_id as 18, the database knows that it can stop the search here, and finish the response.

This was a rather simple explanation of what database indexes are and what they can do, but there’s a lot more going on in the process. Let’s take a deeper look into how indexing works.

How does database indexing work?

In reality, the database table doesn’t reorder itself every time the query conditions alter in order to optimize the database performance but actually happens is that the index makes the database create a separate data structure which should be easily sortable.

It is important to note that when an index is created on a column in a database, it creates a data structure on that specific column and no other column is stored in this data structure. For instance, in the above example, our data structure will only contain the company_id and no other columns such as unit or unit_cost.

But a legit question pops up here - how does the database know what other fields in the table are to be returned for a query. Let’s try to understand how.

Database indexes store pointers to simply reference information for the location of the additional information in the memory. In other words, the index holds the company_id and that particular row’s address in the memory. In this example, the database index will look something like this:

COMPANY_ID	POINTER
10	_123
10	_129
11	_127
11	_138
12	_124
12	_130
12	_135
14	_125
14	_131
14	_133
16	_128
18	_126
18	_131
18	_132
18	_137
20	_136
21	_134

With this index, the query can check for the rows in the company_id column which have 18 as a value and then using the pointer, it can find the related information for that record.

What data structures are used for indexing?

Having understood what we expect from the index, let’s have a look at the common data structures that can be used for database indexing:

B-Trees

B-trees are the most often used index data structures because they are fast for lookups, deletions, and insertions. All of these operations are possible in logarithmic time and the data contained within a B-tree can be sorted easily.

Hash Tables

Hash indexes are commonly used to describe indexes that utilize hash tables. Because hash tables are particularly efficient at looking up data, queries that look for an exact match may be processed rapidly. The key in a hash index is the column value, and the value in a hash table is a reference to the table's row data.

Hash tables, on the other hand, are not ordered data structures; therefore, they may be inefficient for other types of searches.

R-Tree

R-tree is frequently used in spatial databases, usually used to index multi-dimensional information such as geographical coordinates, rectangles, polygons, etc. It is useful for searches such as "find all the coffee shops within 2 miles of my location."

Bitmap Index

Bitmap indexes are useful for columns that have a high number of occurrences of such values, i.e. columns with low selectivity. For instance, consider a column having boolean values.

When to use Indexes

Indexes are designed to increase database performance; thus, indexing can be used whenever we need to significantly improve database performance. The greater your database expands, the more probable it is that indexing will benefit you.

However, the first and foremost thing to remember is that the index takes up extra space; therefore, the larger the table, the greater the index. Every time you perform an add, remove, or update operation, the same operation will need to be executed on the index as well.

When not to use Indexes

When data is written to the database, the original table is updated first, followed by other indexes based on that table. When a write is done to the database, the indexes become inoperable until they are updated. The indexes will never be functional if the database is continually getting writes.

This is why indexes are often applied to databases in data warehouses that get new data on a planned basis (during off-peak hours) rather than production databases that may receive new writes all the time.

How to create an index?

The following code snippet shows how to create an index on a single column in a SQL database:

CREATE INDEX name_index ON Employee (Employee_Name);

If you want to create an index on multiple columns, the SQL command will look something like this:

CREATE INDEX name_index ON Employee (Employee_Name, Employee_Age);

In general, an index should be constructed on a table only if the data in the indexed column will be frequently accessed.

Conclusion

So, we discussed database indexing in detail in this article and also learned about the data structures used to implement database indexing and also when it is advisable to use indexes and otherwise.

To sum everything up, here is a quick summary:

Database Indexing can help to reduce the time of queries vastly.
Indexing includes a data structure with columns for search criteria as well as a pointer.
The pointer is the address on the memory disk of the row containing the remaining information.
To improve query performance, the index data structure (B-Tree, R-Tree, Hash Table, or a Bitmap) is sorted.
The query searches the index for the specified row; the index refers to the pointer that will discover the remainder of the information.

This is all for this article. Database Indexing is a vast and a bit complicated topic, I hope this article would be helpful in understanding the basics of the concept.

Keep reading!