Database 3: PostgreSQL Indexes, B-tree and GiST


An index is an additional database structure built with the specified columns of an existing table.

The index optimizes queries that reference its columns, which allows the query planner to avoid a full table scan. The size and speed of an index depend on its design and the type of data it holds.

The CREATE INDEX command creates an index for its specified table columns. The B-tree indexing method defines an index structure optimized for equality and range queries.

-- Indexes are most effective on large datasets
CREATE TABLE tavole ( id  SERIAL PRIMARY KEY, codice  INT );
INSERT INTO tavole (codice)
    SELECT FLOOR(RANDOM() * 10000) FROM GENERATE_SERIES(1, 10000);

-- If no method is specified, the database creates a B-tree index
CREATE INDEX indice ON tavole(codice);

-- Disable sequential scans so the query planner uses the index
-- (re-enable afterwards with SET enable_seqscan = on;)
SET enable_seqscan = off;
-- EXPLAIN ANALYZE returns the scan statistics
EXPLAIN ANALYZE SELECT * FROM tavole WHERE codice > 5000;
Sequential query scan and Bitmap index scan statistics

The query planner uses an index only if its estimated cost is lower than that of a sequential scan. The choice also depends on the query clauses (e.g., LIKE or JOIN), since some are not supported by certain index types.

An index can be implicitly created by a constraint and is automatically deleted when its associated table is dropped.

The CREATE INDEX command accepts option arguments through the WITH keyword. For a B-tree these configure the access method (e.g., fillfactor) but offer a more limited set of options than some other index types.

The placement of the CREATE INDEX command influences the effect of certain options, since some are applied differently depending on whether the index is built on existing data or filled later by INSERT operations.

The CREATE UNIQUE INDEX command creates a B-tree that enforces uniqueness on its indexed column values. Some constraints, like UNIQUE, work by implicitly creating a B-tree index; the command simply creates the same structure explicitly to enforce the same rule.
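As a minimal sketch (using a hypothetical utenti table, not part of the earlier examples), a unique index rejecting duplicate values:

```sql
-- Hypothetical table: enforcing unique emails via an explicit unique index.
CREATE TABLE utenti ( id SERIAL, email TEXT );
CREATE UNIQUE INDEX indice_email ON utenti(email);
-- Inserting a duplicate email now raises a unique-violation error.
```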

A partial index is an index that contains only a subset of its indexed column values, created by adding a WHERE clause to the CREATE INDEX command. It's applied similarly to a CHECK constraint, but instead of blocking invalid data from entering the table, it limits which rows are included in the index, while allowing all data into the table itself. The smaller index allows for faster query scans and is affected by fewer table updates.
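For example, a partial index on the tavole table created earlier might cover only the high codice values (a sketch; the cutoff value is arbitrary):

```sql
-- Only rows satisfying the WHERE clause are stored in the index.
CREATE INDEX indice_parziale ON tavole(codice) WHERE codice > 9000;
-- Queries whose condition implies codice > 9000 can use this smaller index.
```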

The pg_amop operator family: validating query operations for indexed column data types

A query involving an indexed column requires additional catalog lookups. The query planner must determine whether the query operator is compatible with the index and whether the resulting index scan is more efficient than a default sequential scan.

The system determines the index's operator family by referencing its relation name in pg_class, its properties in pg_index, and the metadata of its associated operator class in pg_opclass.

Different operator classes can share the same operator family

The planner analyzes the query's WHERE clause to extract the operator and its data types, then looks them up in the pg_operator system catalog to identify the OID of the specific operation. It then searches pg_amop for a rule that validates that operator OID for the operator family of the indexed column.

The pg_amop acts as a bridge, translating the high-level SQL operator into a specific strategy number used by the index's internal low-level code.

The strategy number represents a specific action, used by the pg_am handler function to execute the correct index operation.

The pg_am index execution plan cost and the operator family system catalogs

The query planner's execution plan combines the table statistics with the index cost estimations. pg_operator provides the oprrest function for the specific query operator; this function applies to all scan types and defines the logic used to calculate the query's selectivity from the pg_statistic data. The pg_am system catalog contains the index-specific amcostestimate handler function, which estimates the I/O cost required to retrieve the data volume implied by the resulting selectivity. The query planner compares the costs of the different scan paths and selects the most efficient one.

The pg_amop (Access Method Operator) system catalog stores all available indexing rules, linking the high-level comparison operators to their index strategies. Each rule contains metadata that describes how an index access method implements a specific query operation. The rules are organized into operator families, with operator classes being the specific subsets that bind one data type to an access method. pg_amop defines the rules for using existing indexes, while pg_amproc provides the support functions needed to build and maintain them. The pg_amop columns include:

- amopfamily: the operator family the rule belongs to.
- amopopr: the pg_operator OID of the query operator the rule applies to.
- amoplefttype / amoprighttype: the pg_type OIDs of the operator's left and right input data types.
- amopstrategy: the number that identifies the operator's concept (e.g., "equality"). It's implemented within the access method's C-level code, which is called when the execution plan runs.
- amoppurpose: the rule's role, 's' for search (used in WHERE clauses) or 'o' for ordering (used in ORDER BY clauses).
- amopsortfamily: applies only to ordering rules; it specifies the operator family that provides the sort logic. It's used by index types that rely on extensions to sort their entries.
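These rules can be inspected directly. For example, a read-only query listing the operators registered for the built-in B-tree integer_ops family (the regoperator cast is just for readability):

```sql
SELECT amopstrategy,
       amopopr::regoperator AS operator  -- human-readable operator name
FROM pg_amop
WHERE amopfamily = (
    SELECT oid FROM pg_opfamily
    WHERE opfname = 'integer_ops'
      AND opfmethod = (SELECT oid FROM pg_am WHERE amname = 'btree')
)
ORDER BY amopstrategy;
-- B-tree strategy numbers: 1 = <, 2 = <=, 3 = =, 4 = >=, 5 = >
```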

The pg_amproc (Access Method Procedure) system catalog contains the indexes' support functions (e.g., comparison functions). During the CREATE INDEX command, the database retrieves and automatically invokes the support function registered for the data type being indexed. This registry is shared among all of the database's indexes.
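The support functions for the same B-tree integer_ops family can be listed with an analogous catalog query (a sketch; the regproc cast prints the function name):

```sql
SELECT amprocnum, amproc::regproc AS support_function
FROM pg_amproc
WHERE amprocfamily = (
    SELECT oid FROM pg_opfamily
    WHERE opfname = 'integer_ops'
      AND opfmethod = (SELECT oid FROM pg_am WHERE amname = 'btree')
)
ORDER BY amprocnum;
-- Support procedure 1 for a B-tree is the comparison function (e.g. btint4cmp).
```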

The RAM usage and DMA I/O costs for the different index scan types

The query planner's choice of scan, shown in the EXPLAIN output, is based on the estimated number of matching rows:

- It selects a Sequential Scan for a high number of matching table rows, as this avoids the overhead of an index lookup across numerous pages.
- It selects an Index Scan for a precise query returning few rows, as the index overhead is lower than reading the entire table.
- It selects a Bitmap Scan for an intermediate number of matching rows.
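The three outcomes can be compared on the tavole table from earlier. These are typical results, not guarantees: the actual plans depend on the current statistics and cost settings.

```sql
EXPLAIN SELECT * FROM tavole WHERE codice = 42;    -- few rows: Index Scan
EXPLAIN SELECT * FROM tavole WHERE codice > 100;   -- most rows: Seq Scan
EXPLAIN SELECT * FROM tavole WHERE codice < 2000;  -- intermediate: Bitmap Scan
```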

The Bitmap structure summarizes a large set of matching table rows using two cooperating scans:

- The Bitmap Index Scan retrieves the TID positions of all disk pages containing the matching values and collects them into a temporary bitmap structure.
- The Bitmap Heap Scan loads the bitmap into RAM, visits the listed pages, and re-applies the original query conditions to the contained rows to find the exact matches.

The EXPLAIN output specifies the number of retrieved disk pages and whether the bitmap scan is exact or lossy:

An exact bitmap scan creates a more detailed bitmap that stores the complete TID of each index entry, which includes the tuple's specific row position within the page. A lossy bitmap scan is used when the total size of the bitmap exceeds its allocated RAM; it saves space by storing only the TID's disk page number.

The query planner chooses its bitmap based on its RAM and CPU costs: an exact bitmap uses more RAM to store a detailed map but incurs a very low CPU cost because it can retrieve each row directly; conversely, a lossy bitmap uses less RAM by creating a simpler map, but it incurs a high CPU cost because it must re-apply the query's conditions to every row on the fetched disk pages.

The DMA (Direct Memory Access) manages the cost of an I/O operation, which includes the seek cost (time to find the disk page) and the transfer cost (time to load data into RAM). The seek cost is the highest component because it's based on physical navigation, and it's the same for retrieving either a single matching row or an entire disk page. Although the query plan may include random I/O retrievals, their limited number maintains efficiency, as a single I/O operation can retrieve multiple matching entries.

The execution plan scan types and the clustering factor

The query planner includes data distribution statistics when building its execution plan. The clustering factor, stored in the pg_statistic correlation column, quantifies the relationship between the table's physical heap order and the logical index order. It measures how closely the order of the index's entry TIDs follows the physical order of the rows on disk. A high correlation indicates that consecutive index entries point to the same shared disk pages, while a low correlation indicates TIDs that point to scattered disk pages, which requires a larger number of expensive random I/O accesses.
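The correlation can be read from the pg_stats view, a readable wrapper over pg_statistic (a sketch; run ANALYZE first so the statistics exist):

```sql
ANALYZE tavole;
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'tavole';
-- correlation near 1.0 or -1.0: rows physically ordered; near 0: scattered.
```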

The pg_class catalog provides the total row count (reltuples) and total disk pages (relpages) for the table, while pg_statistic determines the selectivity of the specific query condition. The statistical correlation acts as a cost multiplier for the pg_am amcostestimate handler, which can cause the planner to choose a Bitmap Index Scan, better suited to handling multiple scattered disk page accesses.

A Sequential Scan processes all the table rows in a series of sequential I/O operations, following their insertion order. Its estimated cost is based on the CPU cost of reading each row plus the cost of reading every disk page occupied by the table. It's efficient for queries that retrieve a large percentage of the table's data (low selectivity), as the cost of reading the entire table is less than the overhead of using an index for almost every row.

The EXPLAIN ANALYZE output describes the properties of the table scan. They are divided into estimated cost values, calculated by EXPLAIN, and actual performance metrics, returned by ANALYZE after the query execution. The loops value represents the number of times the node was executed; a higher count indicates a complex join structure used for the scan. The width value represents the average byte size of the table rows used in the scan. It's a static property used for the cost estimation and is not included in the runtime metrics of the actual cost.
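The shape of such an output line can be sketched as follows. The numbers are purely illustrative, not actual results:

```sql
EXPLAIN ANALYZE SELECT * FROM tavole WHERE codice = 42;
-- Illustrative output shape (values vary with data and statistics):
-- Index Scan using indice on tavole
--   (cost=0.29..8.30 rows=1 width=8)           <- EXPLAIN estimates
--   (actual time=0.02..0.03 rows=1 loops=1)    <- ANALYZE runtime metrics
```

Note that width appears only in the estimate section: it's a static property used for cost estimation, not a runtime metric.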

For the complete explanation of the EXPLAIN and ANALYZE commands or of the database statistics, see their specific sections.

GIST INDEXING

A GiST (Generalized Search Tree) index organizes complex and non-linear data into a balanced tree. Its core algorithm acts as a flexible framework. It uses the operator class of the data type being indexed to define its internal logic and node structure.

The GIST index access method stores multi-dimensional data types, such as geometric data, images, arrays, timestamps, and full-text search strings. Its tree structure organizes the data based on shared properties specific to the indexed data type, resulting in a more complex index with slower operations.

It supports a wide range of specialized comparison operators for queries that access its indexed data types:

- Overlap (&&) and containment (<@, @>) for range values.
- Is Adjacent To (-|-): checks if two ranges touch at their boundaries without overlapping.
- Same As (~=): verifies if two geometric objects are identical.
- Distance (<->): orders results based on their proximity to a target object.
- Strict position (<<, >>): checks if an object is positioned entirely to one side of another.
- Text Match (@@): performs a full-text search on a document.
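As a minimal sketch of these operators in use (scatole is a hypothetical table of geometric BOX values, indexed with the default GiST operator class):

```sql
CREATE TABLE scatole ( id SERIAL, area BOX );
CREATE INDEX indice_aree ON scatole USING GIST (area);
-- Overlap: boxes sharing at least one point with the given box.
SELECT * FROM scatole WHERE area && BOX '((0,0),(5,5))';
-- Containment: boxes entirely inside the given box.
SELECT * FROM scatole WHERE area <@ BOX '((0,0),(10,10))';
```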

The BitmapAnd operation scan type

The GiST index framework is designed to adapt to complex, non-linear data types. It supports extensions, like the pg_trgm, which implements trigram indexing for text similarity searches.
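A trigram index can be sketched as follows (documenti is a hypothetical table; the pg_trgm extension must be installed):

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE TABLE documenti ( id SERIAL, testo TEXT );
-- gist_trgm_ops builds the GiST index over the text's trigrams.
CREATE INDEX indice_trgm ON documenti USING GIST (testo gist_trgm_ops);
-- The % operator matches rows whose trigram similarity exceeds the threshold.
SELECT * FROM documenti WHERE testo % 'postgre';
```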

The planner optimizes queries involving multiple indexed columns using their index selectivities. The index scan + filter approach is applied when one index is highly selective: the planner performs a standard index scan on the primary column and applies the remaining conditions as filters to the heap results, without using the second index. The bitmap optimization is applied when multiple indexes have meaningful selectivity: the database independently generates a bitmap for each condition and performs a BitmapAnd operation to create a final bitmap containing the intersection of the row TIDs present in both.

The EXPLAIN output for a BitmapAnd execution plan

The BitmapAnd row count is based on the query planner's selectivity estimate for the two columns, assumed independent in the absence of an extended statistics object. Its actual row count (rows=0) reflects that the returned BitmapAnd structure is not counted as table rows, leaving the actual count of retrieved rows to the Bitmap Heap Scan.
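Such a plan can be provoked on the tavole table, which already carries two indexes (the PRIMARY KEY on id and indice on codice). The resulting plan is a sketch: with strongly differing selectivities the planner may prefer a single index plus a filter instead.

```sql
EXPLAIN ANALYZE
SELECT * FROM tavole WHERE codice > 5000 AND id < 100;
-- With comparable selectivities the plan may contain:
--   Bitmap Heap Scan -> BitmapAnd -> two child Bitmap Index Scans
```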

The Bitmap Heap Scan is necessary for inherently lossy index structures like GIN and GiST. These indexes often use hashing or trigram signatures, which can produce hash collisions or false positives. The recheck ensures that the rows retrieved from the heap actually match the query criteria. The execution plan includes the I/O costs of the two child Bitmap Index Scans and of the Bitmap Heap Scan, while the BitmapAnd operation itself is performed entirely in RAM.

The GIN and GiST difference in creating BitmapAnd execution plans

GIN indexes have a higher chance of producing a BitmapAnd plan than GiST. GiST indexes are often more expensive for bulk retrieval operations like bitmaps: the balanced-tree structure requires traversing multiple branches to retrieve all the matching rows, since its predicate node conditions may be based on different data from the query values. As a result, the query planner often creates a single bitmap and applies the other query condition as a filter. The GIN index instead uses posting lists, which let the database retrieve specific values in a single I/O operation, making both the bitmaps and the BitmapAnd operation cheaper.

The GIN index is efficient for reading but slower to update than GiST and cannot enforce unique constraints. It's also not suitable for certain complex operators, like spatial overlaps or validity ranges.

DOUBLE LOSSY scan

The EXPLAIN output of a Bitmap Scan can be flagged as a 'Lossy Heap Scan' due to lossiness in either the bitmap memory management or the index logical structure.

The Bitmap Lossiness is related to work_mem management. When the number of bitmap entries exceeds available memory, the database optimizes the structure by using 'lossy' TIDs that point to the disk page containing the match, rather than the row itself. This requires more CPU due to the necessary 'Recheck Condition' applied to all rows within the disk page to determine which ones satisfy the query.
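This degradation can be observed by shrinking work_mem before running a broad bitmap query (a sketch; the exact page counts depend on the data):

```sql
SET work_mem = '64kB';  -- the minimum allowed value
EXPLAIN ANALYZE SELECT * FROM tavole WHERE codice > 1000;
-- The output may then report, e.g.:  Heap Blocks: exact=12 lossy=30
RESET work_mem;
```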

The Index Lossiness refers to the data type used for the index entries. The pg_trgm extension creates GIN/GIST indexes made of trigram values, which are inherently lossy due to their low selectivity and are designed for specific operations like fuzzy matching (%). This results in a bitmap scan that produces a high number of potential matches but does not guarantee precision, requiring an additional verification step with the actual table values.

The combined lossiness of the two data structures forces the database to report lossy heap blocks in the EXPLAIN output. The database dynamically marks TIDs as 'exact' or 'lossy' depending on the available work_mem: if it runs out of allocated memory, it switches to storing lossy page pointers while keeping the previous entries exact. The number of false positives found during data retrieval is shown in the Rows Removed by Index Recheck line.

The EXPLAIN output of the lossy Heap Blocks

The data structures involved are:


The GIN/GiST Index: it's stored on disk and uses the gist_trgm_ops operator class to build an index structure made of trigrams.

The Bitmap: A temporary work_mem data structure that exists only during query execution. It stores the TID pointers indicating the index entries that match the query condition.

The Heap Blocks: The physical disk pages where the actual table data is stored. They are accessed during the Bitmap Heap Scan to confirm the final output values.
