Database 2: SQL constraints

Constraints are declarative rules applied to PostgreSQL table columns, designed to enforce data integrity. They will return an error and prevent any data modification that violates the constraint rules.

The DEFAULT constraint automatically assigns a specified value to a column during row insertion if no explicit value is provided for that column.

//The default value needs to be compatible with the table column its in.
//It can be either a literal value or a built-in function return value.
create table multi(
  id        SERIAL primary key,
  name      VARCHAR(15) not null,
  employed  boolean default false,
  tempo     TIME default now(),
  lista     INT[] default array[12, 25]
)

insert into multi (name) values ('jeff');
select * from multi;    //{jeff, false, 16:43:35, [12, 25]}

The CHECK constraint applies a SQL boolean expression to each value in its column to ensure data validity.

Table-level constraints define conditions that include multiple columns, we declare them after all the table columns have been defined.

We can create named CHECK conditions using the CONSTRAINT keyword. These names will identify the constraint in the error messages and allow us to add or drop that specific constraint from the table columns.

We can include custom functions within a CHECK constraint to enforce complex data integrity rules.

chevron-rightValidating CHECK constraints using PL/pgSQL functions hashtag

The CHECK constraint function validates any row value being inserted or updated in the table. We can use a procedural language, such as PL/pgSQL, which allows the function to build complex validation logic using the constraint's arguments as variables. It validates data only within the single row for its specified column values and is isolated from other rows, unlike an EXCLUDE GIST constraint which is designed to compare them.

SQL uses the three-valued logic in its Boolean expressions, allowing them to return TRUE, FALSE, or UNKNOWN, often when NULL values are involved in a comparison.

chevron-rightNamed CHECK constraints, NULL conditions and pg_constraint queryhashtag

The CHECK constraint only blocks inserts for conditions explicitly returns FALSE. Unlike clauses like WHERE, which filter rows based on conditions explicitly marked as TRUE, CHECK constraints allow unknown results—such as NULL values—from INSERT operations involving missing data.

The NOT NULL keyword, when used within a CHECK condition, works as a predicate and it can be used with other logical operations. However, if NOT NULL is applied directly to a column definition, it acts as an independent constraint.

All the database CHECK constraints are stored as entries in the pg_constraint system catalog. The conbin column stores the binary tree representation of the check expression used by the SQL query executor. The OID column contains the constraint's own unique OID, while the conrelid column contains the OID of the constraint's table; this OID is shared among all constrains aplied to the same table. The collaname holds the constraints names, including the default names for unnamed constraints. The contype column contains the character code indicating the constraint's type.

The consrc column was discontinued in favor of the pg_get_constraintdef(OID, boolean) pg_catalog function. This function returns the textual representation of the CHECK condition. It requires the OID of the specific constraint and a boolean argument to indicate if the result should be more prettily formatted.

We SELECT columns from pg_constraint entries and JOIN the pg_constraint with the pg_class catalog, which contains database relations like tables. This ensures that the returned constraints also include their table properties. The JOIN condition matches the c.conrelid column with the table's OID, so as to return only the constraints contained within the table specified in the FROM condition.

pg_constrant query on unnamed check conditions

The pg_constraint includes multiple single-character codes to filter results based on a specific constraint.

The PostgreSQL UNIQUE constraint

The UNIQUE constraint ensures that each row in the table has a distinct combination of values for the specified columns.

It can be applied to a single column or to multiple columns as a table-level constraint. Each UNIQUE constraint operates independently, enforcing uniqueness only for its specified column values.

Before being added, any new constraint validates all existing rows in its specified columns.

The SQL standard syntax treats each NULL as a distinct value; this causes multiple NULL within UNIQUE columns not to violate the constraint.

The nulls NOT DISTINCT option explicitly treats NULL values as equal (non-distinct) in the UNIQUE constraint, matching how NULL is evaluated in other database systems.

chevron-rightThe UNIQUE comparison operator on nulls NOT DISTINCT columnshashtag

The nulls NOT DISTINCT option, introduced in PostgreSQL 15, changes how the NULL values are treated within an UNIQUE constraint comparison operator.

Within the SQL standard, any operation involving NULL values returns UNKNOWN, due to its three-valued logic.

The UNIQUE constraint triggers an error only when its comparison operator explicitly returns TRUE, this will allow multiple NULL entries within the UNIQUE columns.

The nulls NOT DISTINCT option changes the result for NULL comparison operations (NULL = NULL) to TRUE, which enables the UNIQUE constraint to block multiple entries containing NULLs.

The UNIQUE constraint is not designed for effective use with RANGE and ARRAY data types. It enforces uniqueness only for ranges and arrays that share the exact same order and values; it will not detect range overlaps or check for shared elements within different arrays.

A UNIQUE constraint inherently generates a B-tree index on its specified columns to enforce the uniqueness rule.

To know more about indexes check the INDEX section.

Implementing UNIQUE Constraints with UNIQUE INDEX

The UNIQUE INDEX stores and orders all specified column values. Its access method includes a function that compares each new value against existing ones, blocking the operation if it finds two equal (non-distinct) values.

A UNIQUE constraint's entry in pg_constraint includes its OID, conname, and contype, while its conbin column is set to NULL. The query executor doesn't use the conbin binary data to enforce the UNIQUE constraint. Instead, it relies on the index referenced by the conindid OID, while the conrelid OID identifies the table where the constraint applies.

The UNIQUE index, identified by the conindid OID, stores its metadata across the pg_class and pg_index.

The pg_class system catalog contains the physical and relational properties of a UNIQUE index, treating it as a storage entity. Aside from its OID, it includes:

relname: The name of the index. For UNIQUE constraints, it matches the constraint name. relkind: A character code indicating the relation's type. relpages: The estimated number of 8KB disk pages the index occupies. relhasindex: Primarily used for tables, it indicates an index associated with the relation

A UNIQUE index, whether created by a UNIQUE constraint or independently, can enforce uniqueness for specific columns through its inherent equality comparison operator.

Implementing partial UNIQUE constraints with WHERE

A partial unique index includes a WHERE clause in its definition, which filters the inserted rows and allows only a subset to be included in the index.

The uniqueness comparison operator will be aplied only to the rows included in the index. A new value will be blocked from the table only if it passes the WHERE clause and matches an existing value within the partial unique index.

All rows excluded by the WHERE partial index clause will still be inserted into the table, allowing the presence of duplicate values.

A partial unique index is designed to optimize the queries for a subset of specific values. The query planner will use the index, instead of a full table scan, if the query's WHERE clause guarantees that it will only SELECT rows contained within the partial index.

chevron-rightQuery conditions for a PARTIAL INDEX scanhashtag

A query that SELECTS the indexed columns and shares the WHERE clause with the partial index will use the index.

A query that SELECTS the indexed columns but with a different WHERE clause will not use the partial index. The query planner can't guarantee that all the SELECT columns can be fetched from the index, so it will opt for a full table scan to avoid missing rows.

A query that SELECTS different columns but matches the WHERE clause might use the partial index via a heap fetch. In this process, the query planner retrieves row identifiers (TIDs) from the index and then uses them to efficiently fetch the complete row data, including the additional columns, from the main table (the heap).

The query planner will use the partial index only if the query WHERE condition mathematically implies it as a subset of the partial index WHERE clause.

  • A query for x < 1 will use a partial index defined with a x < 2 WHERE clause, because all the query values are included in the index's condition.

  • The opposite isn't true, a query for x < 2 can't use a partial index defined with a x < 1 WHERE clause, because the query might retrieve values (like 1.5) that are outside the partial index.

The query planner is not optimized to interpret complex conditions that logically imply the use of an index's clause. This includes expressions like 'column = "true"' and 'column != "false"'.

The query planner decides whether to use an index during query planning time, not during execution. It excludes placeholder and parameter values from its index planning evaluation, as they are unknown during that stage and defined only at runtime.

The NOT NULL constraint

The NOT NULL constraint prevents a table column from containing NULL values. It's more efficient than its equivalent CHECK constraint (column IS NOT NULL). It can be applied with other constraints in any order and is enforced directly by the table without requiring an index.

Its inverse constraint, NULL, allows a column to accept NULL values. It is set by default and is mainly used for compatibility with other database systems.

The PRIMARY KEY constraint

The PRIMARY KEY constraint identifies table rows through its specified columns. It sets them as UNIQUE and NOT NULL, to avoid duplicate identifiers and missing column values respectively.

A table can have only one PRIMARY KEY, which automatically generates a B-tree index on its specified columns for efficient row retrieval.

Primary keys can be used by GUI applications or as default targets for foreign keys.

The EXCLUDE constraint

The EXCLUDE constraint uses a set of comparison rules to filter the new rows in a table. It's similar to the UNIQUE constraint, as it compares the new row to all existing table values while also using comparison operators other than equality(=), like the overlap operator(&&).

It's similar to the CHECK constraint, as it can include multiple columns and operations within a single constraint. It doesn't rely on specific comparison values; instead, it validates using its comparison operators on the existing table values. It excludes the new row if its specified column values return TRUE for all the comparison rules.

We can define a WHERE predicate condition on the EXCLUDE, which turns it into a partial constraint that only applies to a subset of the rows, making the index included in the constraint require less storage.

The index contains the specified column values from the table rows. It allows for quick, pair-to-pair comparisons with the new row's values, avoiding a slow full table scan.

chevron-rightCompatible Indexes in the EXCLUDE constraint.hashtag

The EXCLUDE constraint specifies its index's access method in the USING keyword, which defines the index structure used to perform the comparison operations.

Only a compatible index can be used with the EXCLUDE constraint. Its access method must include the amgettuple function in the API defined by the index's internal structure.

The database requires the amgettuple function to validate the values in the EXCLUDE constraint. It allows the database to sequentially compare each key of the index using the provided comparison operators.

  • The B-tree and HASH index structure allows the use of the amgettuple function in the EXCLUDE constraint. They are optimized only for equality operations, using them results in EXCLUDE being a less effective UNIQUE constraint.

  • The GIN index is designed for text searches and complex data types, with each key storing a list of its value locations. Its internal structure doesn't support the amgettuple function, which is needed for the sequential comparisons used by the EXCLUDE constraint.

  • The GiST index structure is compatible with the amgettuple function, which allows it to perform the sequential comparisons needed to validate the more complex operations in the EXCLUDE constraint.

For more details about indexes check the Index session.

Last updated