PostgreSQL’s Exciting features, you should know

Written by hackernoon-archives | Published 2019/03/28
Tech Story Tags: postgresql | postgresql-feature | sharding | inheritance | sql

TLDRvia the TL;DR App

PostgreSQL is a powerful,object-relational database system that extends the SQL language combined with many features that safely store and scale the most complicated data workloads.

PostgreSQL comes with many features aimed to help developers build applications, administrators to protect data integrity and build fault-tolerant environments, and manage data no matter how big or small the dataset.

Here, I pick some great PostgreSQL features you may not have taken a look at but really should, since they can help you get code into production faster, make easier and generally get things done with less code and less effort.

Inheritance

Table inheritance allows extracting a common set of columns into a parent table with children defining additional fields.

CREATE TABLE invoices (
    invoice_number   int  NOT NULL PRIMARY KEY,
    issued_on        date NOT NULL DEFAULT now()
);

CREATE TABLE government_invoices (
    department_id    text NOT NULL
) INHERITS (invoices);

This reflects the situation that all government invoices are invoices, but have an extra attribute. The “government_invoices” table above has a total of 3 columns.

Adding rows work as though the tables are independent:

INSERT INTO invoices (invoice_number) VALUES (100);

INSERT INTO government_invoices
    (invoice_number, department_id) VALUES (101, 'DOD');

But what happens when you SELECT:

SELECT * FROM government_invoices;

 invoice_number | issued_on  | department_id
----------------+------------+---------------
            101 | 2018-06-19 | DOD


SELECT * FROM invoices;

 invoice_number | issued_on
----------------+------------
            100 | 2018-06-19
            101 | 2018-06-19

There are some major reasons for using table inheritance in Postgres.

Let's say, we have some tables needed for invoices, which are created and filled each month:

invoices
    - invoices_2010_04 (inherits invoices)
    - invoices_2010_05 (inherits invoices)

So what makes the inheritance a cool feature — why is it cool to split the data?

  • PERFORMANCE: When selecting data, we SELECT * FROM invoices WHERE date BETWEEN X and Y, and Postgres only uses the tables, where it makes sense. Eg. SELECT * FROM invoices WHERE date BETWEEN ‘2010–04–01’ AND ‘2010–04–15’ only scans the table invoices_2010_04, all other tables won’t get touched!
  • Index size: We have no big fat table with a big fat index on column date. We have small tables per month, with small indexes — faster reads.
  • Maintenance: We can run vacuum full, reindex, cluster on each month table without locking all other data

Data types

PostgreSQL has a rich set of native data types available to users. Users can add new types to PostgreSQL using the CREATE TYPE command.

Besides the numeric, floating-point, string, boolean and date types you’d expect, PostgreSQL supports uuid, monetary, enumerated, geometric, binary, network address, bit string, text search, xml, json, array, composite and range types, as well as some internal types for object identification and log location. To be fair, MySQL, MariaDB and Firebird each have some of these to varying degrees, but only PostgreSQL supports all of them.

Let’s take a closer look at a couple of these:

JSON

PostgreSQL supports native JSON data type. It provides many functions and operators for manipulating JSON data.

Let’s start practicing with JSON data type.

CREATE TABLE orders (
ID serial NOT NULL PRIMARY KEY,
info json NOT NULL 
);

Insert JSON data

INSERT INTO orders (info)
VALUES(
'{ "customer": "John Doe", "items": {"product": "Bag",qty": 6}}'
);

Let’s insert multiple rows at the same time.

INSERT INTO orders (info)
VALUES
('{ "customer": "Lily Bush", "items": {"product": "Diaper","qty": 24}}'
),
('{ "customer": "Josh William", "items": {"product": "Toy Car","qty": 1}}'
),
('{ "customer": "Mary Clark", "items": {"product": "Toy Train","qty": 2}}'
);

Querying JSON data

To query JSON data, you use the SELECT statement, which is similar to querying other native data types:

SELECT info FROM orders;

Output:

info         
----------------------------------------------------------------
 { "customer": "John Doe", "items": {"product": "Bag",qty": 6}}
 { "customer": "Lily Bush", "items": {"product": "Diaper","qty": 24}}'
{ "customer": "Josh William", "items": {"product": "Toy Car","qty": 1}}
 '{ "customer": "Mary Clark", "items": {"product": "Toy Train","qty": 2}}

PostgreSQL provides two native operators -> and ->> to help you query JSON data.

  • The operator -> returns JSON object field by key.
  • The operator ->> returns JSON object field by text.

The following query uses the operator -> to get all customers in form of JSON:

SELECT info -> 'customer' AS customer
FROM orders;

And the following query uses operator ->> to get all customers in form of text:

SELECT info ->> 'customer' AS customer
FROM orders;

-> operator returns a JSON object, you can chain it with the operator ->> to retrieve a specific node. For example, the following statement returns all products sold:

SELECT
info -> 'items' ->> 'product' as product
FROM orders
ORDER BY product;

Use JSON operator in WHERE clause

We can use the JSON operators in WHERE clause to filter the returning rows. For example, to find out who bought Diaper, we use the following query:

SELECT
info ->> 'customer' AS customer
FROM orders
WHERE info -> 'items' ->> 'product' = 'Diaper'

To find out who bought two products at a time, we use the following query:

SELECT
info ->> 'customer' AS customer,
info -> 'items' ->> 'product' AS product
FROM orders
WHERE
CAST ( info -> 'items' ->> 'qty' AS INTEGER ) = 2

Array

Array plays an important role in PostgreSQL. PostgreSQL allows you to define a column to be an array of any valid data type including built-in type, user-defined type or enumerated type.

The following CREATE TABLE statement creates the contacts table with the phones column is defined as an array of text.

CREATE TABLE contacts (
id serial PRIMARY KEY,
name VARCHAR (100),
phones TEXT []
);

The phones column is a one-dimensional array that holds various phone numbers that a contact may have.

INSERT INTO contacts (name, phones)
VALUES
('John Doe',ARRAY [ '(408)-589-5846','(408)-589-5555' ]
);

Insert PostgreSQL array values

The following statement inserts a new contact into the contacts table.

INSERT INTO contacts (name, phones)
VALUES
('Lily Bush','{"(408)-589-5841"}'
),
('William Gate','{"(408)-589-5842","(408)-589-58423"}'
);

Query array data

We can use the SELECT statement to query array data as follows:

SELECT name,phones
FROM contacts;

We access array elements using the subscript within square brackets []. By default, PostgreSQL uses one-based numbering for array elements. It means the first array element starts with number 1. Suppose, we want to get the contact’s name and the first phone number, we use the following query:

SELECT name, phones [ 1 ]
FROM contacts;

We can use array element in the WHERE clause as the condition to filter the rows. For example, to find out who has the phone number (408)-589–58423 as the second phone number, we use the following query.

SELECT name
FROM  contacts
WHERE phones [ 2 ] = '(408)-589-58423';

Modifying PostgreSQL array

PostgreSQL allows you to update each element of an array or the whole array. The following statement updates the second phone number of William Gate.

UPDATE contacts
SET phones [ 2 ] = '(408)-589-5843'
WHERE
ID = 3;

hstore

The hstore module implements hstore data type for storing key-value pairs in a single value. The hstore data type is very useful in many cases, such as semi-structured data or rows with many attributes that are rarely queried.

Before working with the hstore data type, you need to enable the hstore extension.

CREATE EXTENSION hstore;

We use the CREATE TABLE statement to create the books table as follows:

CREATE TABLE books (
 id serial primary key,
 title VARCHAR (255),
 attr hstore
);

We use the INSERT statement to insert data into the hstore column as follows:

INSERT INTO books (title, attr)
VALUES
 (
 'PostgreSQL Tutorial',
 '"paperback" => "243",
  "publisher" => "postgresqltutorial.com",
  "language"  => "English",
  "ISBN-13"   => "978-1449370000",
  "weight"    => "11.2 ounces"'
 );

Postgresql hstore provides the -> operator to query the value of a specific key from an hstore column. For example, if we want to know ISBN-13 of all available books in the books table, we can use the ->operator as follows:

SELECT 
attr -> 'ISBN-13' AS isbn 
FROM 
books;

Data integrity

PostgreSQL is more reliable because it is ACID (Atomicity, Consistency, Isolation, and Durability) compliant which means queries will maintain data integrity, and return the same output without error.PostgreSQL is well-known for its rock-solid referential and transactional integrity. Primary keys, restricting and cascading foreign keys, unique constraints, not null constraints, check constraints and other data integrity features ensure only validated data is stored.

MySQL and MariaDB are doing more to be SQL standard compliant with the InnoDB/XtraDB storage engines. They now offer a STRICT option using SQL modes, which determines the data validation checks that get used; however, depending on the mode you use, invalid and sometimes silently-truncated data can be inserted or created on update. Neither of these databases currently supports check constraints and there are also a host of caveats for foreign key constraints.

Case Sensitive

PostgreSQL is case-sensitive for string comparisons.The field “Smith” is not the same as the field “smith”. Use the correct case in your query. (i.e. WHERE name=‘Smith’).PostgreSQL has a case-insensitive operator, like ILIKE

Custom Type and Functions

PostgreSQL provides robust built-in operators and functions including those that support the specialized data types but it also lets you create your own operators and functions (including aggregates) as well as custom stored procedures and triggers. let’s look at a simple example for functions.

CREATE TYPE datetext AS (

date date,

date_as_text text);

CREATE FUNCTION show_date_as_text(date)

RETURNS datetext -- this is our composite type

AS   $$ SELECT $1, to_char($1, 'FMMonth FMDD, YYYY') $$

LANGUAGE SQL;

SELECT show_date_as_text('2015-01-01');

-- Returns: (2015-01-01,"January 1, 2015")

Error Handling

Postgresql returns proper error message when erorr happens.Please see the image in below.

postgresql error message

But what returns mysql when above same sql execute.

Combining Queries

PostgreSQL provides UNION, INTERSECT and EXCEPT clauses for interactions between SELECT statements. UNION will append the results of the second SELECTstatement to those of the first. INTERSECT returns only the rows which match from both SELECT statements. EXCEPT returns only the rows from the first SELECTstatement that do not match rows from the second SELECT statement.

While MySQL, MariaDB, and Firebird all support UNION, none of them support INTERSECT or EXCEPT. However, through the use of joins in the query and the EXISTS condition, the same results can be acquired as from PostgreSQL.

Window Functions

A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function.

Similar to an aggregate function, a window function operates on a set of rows, but it does not reduce the number of rows returned by the query.

The OVER keyword, optionally used with PARTITION BY and ORDER BY, indicates that a window function is being used. Note that the WINDOWclause is not required in queries with Window functions, but it lets you create and name windows to help keep things straight.

Here is an example that shows how to compare each employee’s salary with the average salary in his or her department:

SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary;

Output :

depname  | empno | salary |          avg          
-----------+-------+--------+-----------------------
 develop   |    11 |   5200 | 5020.0000000000000000
 develop   |     7 |   4200 | 5020.0000000000000000
 develop   |     9 |   4500 | 5020.0000000000000000
 develop   |     8 |   6000 | 5020.0000000000000000
 develop   |    10 |   5200 | 5020.0000000000000000
 personnel |     5 |   3500 | 3700.0000000000000000
 personnel |     2 |   3900 | 3700.0000000000000000
 sales     |     3 |   4800 | 4866.6666666666666667
 sales     |     1 |   5000 | 4866.6666666666666667
 sales     |     4 |   4800 | 4866.6666666666666667

The first three output columns come directly from the table empsalary, and there is one output row for each row in the table. The fourth column represents an average taken across all the table rows that have the same depname value as the current row. (This actually is the same function as the non-window avg aggregate, but the OVER clause causes it to be treated as a window function and computed across the window frame.)

CHECK constraint

The following statement defines an employees table.

CREATE TABLE employees (

id serial PRIMARY KEY,

first_name VARCHAR (50),

last_name VARCHAR (50),

birth_date DATE CHECK (birth_date > '1900-01-01'),

joined_date DATE CHECK (joined_date > birth_date),

salary numeric CHECK(salary > 0)

);

The employees table has three CHECK constraints:

  • First, the birth date ( birth_date) of the employee must be greater than 01/01/1900. If you try to insert a birth date before 01/01/1900, you will receive an error message.
  • Second, the joined date ( joined_date) must be greater than the birth date ( birth_date). This check will prevent from updating invalid dates in terms of their semantic meanings.
  • Third, the salary must be greater than zero, which is obvious.

Sharding

Sharding is just another name for “horizontal partitioning” of a database. Sharding is breaking a single database into smaller, more manageable chunks, and distributing those chunks across multiple servers, in order to spread the load and maintain a high throughput

First, it helps minimizing response times for database queries.

Second, you can use more cheaper, “lower-end” machines to host your data on, instead of one big server, which might not suffice anymore.

PostgreSQL offers built-in support for the following forms of partitioning:

Range Partitioning

The table is partitioned into “ranges” defined by a key column or set of columns, with no overlap between the ranges of values assigned to different partitions. For example, one might partition by date ranges, or by ranges of identifiers for particular business objects.

List Partitioning

The table is partitioned by explicitly listing which key values appear in each partition.

Example:

CREATE TABLE measurement (
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
) PARTITION BY RANGE (logdate);

CREATE TABLE measurement_y2006m02 PARTITION OF measurement
    FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

CREATE TABLE measurement_y2006m03 PARTITION OF measurement
    FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');

PostgreSQL is becoming too much rich day by day, with built-in features and innumerable ways in which we can customize or extend it further to suit our needs.

Though I have covered a handful of capabilities that make PostgreSQL distinct from other open source SQL solutions- there are many more, I hope that this article will help you get an overview for why you might choose PostgreSQL.

Thanks for reading.


Published by HackerNoon on 2019/03/28