SQL Scalar table Data Types

Data types in PostgreSQL define the data format within table column.

//Data types within column definition
TEXT and CHAR(n), VARCHAR(n):        text based types, 
DATE, TIME, TIMESTAMP, and INTERVAL: include date and time types
JSON:                                for semi structured data, 
INT, REAL:                           for numeric types and
ENUM, ARRAY:                         specialized types

The TEXT, CHAR(n), and VARCHAR(n) data types store string values in table columns. TEXT allows strings of any length. While both CHAR(n) and VARCHAR(n) set a maximum n length. The CHAR(n) is also a fixed-length type, it will pad the end of the string with trailing spaces in order to mantain its specified length.

//CHAR and VARCHAR without (n) will be treated as CHAR(1) and TEXT respectively.
CREATE TABLE resorce (
    primo   CHAR(20), secondo VARCHAR(20), terzo   TEXT
);

insert into resorce(primo, secondo, terzo) 
values (' accidenti ', ' accidenti ', ' accidenti ');
//(" accidenti              "," accidenti "," accidenti ")

The || operator concatenates string values, with the return string converted to TEXT regardless of the data types being used. The substring() function returns a segment of a string, starting at the FROM index and continuing for the FOR length. The replace() function searches each string in a column for a specified substring and replaces all occurrences with a given new string.

//We can concatenate different data types.
select (text 'prima ' || ' e ' || 211 );    //TEXT "prima  e 211" 

//We can concatenate string values from columns.
SELECT nome || ' ' || titolo as new_sting FROM parole;

//The strings follow the 1-index rule, like the arrays
select substring('parola'::text from 1 for 3);      //par
SELECT substring(nome FROM 1 FOR 5) FROM parole;    //It selects from table columns

//It can either be a specific column or a stirng value, it replaces the substring
select replace('mottura'::text, 'ttu', 'lla' ); //mollara
SELECT replace(nome, 'pre', 'post') FROM parole where id=2;

The INTERVAL data type represents a time duration, which allows adding and subtracting intervals from date and time values. The standard format for an interval value is years-months days hours:minutes:seconds, with the hyphen(-) specifically separating years and months.

chevron-rightInterval convertion with the justify_interval() functionhashtag

The justify_interval function converts intervals and interval operations that exceed their time unit limit.

The normalized intervals will be more context-dependent than purely arithmetic.

  • The hours-days and days-months (30 days) overflows are more common and standardized conversions.

  • The months-years convertion isn't standardized, the number of months in a year can be context-dependants (like in accounting systems).

  • The seconds-minutes and minutes-hours follow fixed rules and don't typically require justification.

JSON and REAL

The JSON and JSONB data types store and manipulate JSON data in the table rows. JSON mantains its original plain text format, while JSONB store data in a decomposed binary format, which enhances query performance and storage efficiency.

PostgreSQL validates any JSON data inserted into a JSONB column, converting it to an internal binary format. Upon retrieval, this format is automatically converted back to a JSON literal; the binary representation is for internal use only.

The '->' operator retrieves the JSON value associated with the selected key, preserving its original type.

Different data types on JSON select
chevron-rightThe Containment @> and object key ? operatorshashtag

The @> operator checks if a JSON object contains a specified subset.

The ? operator checks if a single key exists at the top level of a JSON object and returns the matching JSON object.

The ?| operator returns rows where any string from the array exists as a top-level key in the JSON.

The &? operator returns rows where all keys from the array exist as top-level keys in the JSON.

The REAL data type stores floating-point and scientific notation values with approximate precision.

ENUM and ARRAY

The ENUM custom data type defines a set of allowed values for a table column. It cannot be defined directly in a column definition; we must CREATE TYPE it outside the table.

PostgreSQL stores ENUM values internally as small integers. The pg_enum system catalog maps the integers to theirs ENUM values during queries.

chevron-rightenum ALTER TYPE ruleshashtag

We can add new ENUM values using ALTER TYPE, but we cannot remove or rename existing values, nor change their order.

The ARRAY data type stores an ordered collection of elements of the same data type within a single column.

We define arrays columns using their data type and dimention level, we INSERT array values as {} strings or using the ARRAY constructor.

In the current PostgreSQL implementation, the database will not strictly enforce the declared array dimension at runtime; it's mainly for user documentation.

During the INSERT operation, the PostgreSQL parser infers{} array values and converts them to the target column's array data type.

Arrays literals defined outside a column definition require explicit type casting to be recognized at their specific data type, array operations only allow arrays with identical data types.

The [ : ] syntax can slice an array section, with both ends being inclusive.

PostgreSQL applies the immutable principle to its functions and operators. PostgreSQL's built-in array functions return new array values, requiring the UPDATE command to modify the corresponding table column data.

The array_append() function appends a specified element to the end of the array.

The array data type, symilar to interval, has access to the @> and && operators.

The containment operator @> checks if the first array contains all elements of the second array. It ignores duplicates, meaning ARRAY[1] and ARRAY[1, 1] are considered equivalent.

The ANY keyword functions as a quantifier, modifying a comparison operation to allow a single value to be compared to each element within an array.

chevron-rightANY query rules and differences with the IN () operatorhashtag

The ANY keyword must be placed on the right side of the comparison operator. On the left side, it is treated as a function, resulting in an invalid comparison.

Both ANY and IN compare a value against a series of values: - ANY works directly with array literals and table array columns, enabling element-wise comparisons, - IN is designed for explicit comma-separated value sets and subqueries, requiring UNNEST for table arrays.

The ALL keyword is a quantifier that compares a value to every element of an array, returning true only if all comparisons are true. It works on subqueries.

The EXISTS operator is a logical operator that requires a subquery as its operand. It is used in the WHERE clause to specify which rows the main query will process, and returns true if its subquery returns at least one row.

We correlate the outer query with the subquery by using a variable. This allows the EXISTS condition to evaluate each row of the outer query against related data in the subquery.

Both IN and JOIN can perform similar operations to EXISTS, but they require the complete evaluation of the subquery or a table join. While EXISTS stops evaluation at the first match, avoiding full table scans.

chevron-rightJOIN and IN on conditional query returns.hashtag

The JOIN operator combines rows from two tables based on a matching condition. It creates a direct relationship between columns at the same query level, unlike the EXISTS subquery correlation.

The IN operator returns the rows that match any of the values returned by its subquery.

The EXISTS predicate queries the specific columns from the main table based on the related table data, without performing a full join or processing the subquery on the entire main table.

Last updated