Try / PostgreSQL in Y minutes
This is an overview of SQL as implemented in PostgreSQL.
Make sure that you have created a database, and can connect to it using the shell (psql
) or other tools. If you haven't, see the Getting Started guide first.
Concepts · Creating tables · Populating with data · Querying · Joins · Aggregates · Updates · Deletions · Further reading
✨ This guide needs some love
The guide only covers the basics of SQL. It would be great to add some PostgreSQL specifics. If you'd like to help — please contribute!
Concepts
PostgreSQL is a relational database management system (RDBMS). That means it is a system for managing data stored in relations. A relation is essentially a mathematical term for a table.
Each table is a named collection of rows. Each row of a given table has the same set of named columns, and each column is of a specific data type. While columns have a fixed order in each row, SQL does not guarantee the order of the rows within the table in any way (although they can be explicitly sorted for display).
Tables themselves are stored inside schemas, and a collection of schemas constitutes the entire database that you can access.
Creating a new table
Create a new table by specifying the table name, along with all column names and their types:
create table weather (
city varchar(80),
temp_lo integer, -- minimum temperature on a day
temp_hi integer, -- maximum temperature on a day
prcp real, -- precipitation
date date
);
You can type this into the shell with the line breaks. The command is not terminated until the semicolon.
White space (i.e., spaces, tabs, and newlines) can be used freely in SQL commands. This means you can type the command aligned differently than above, or even all on one line.
Two dashes (--
) introduce comments. Whatever follows is ignored until the end of the line. SQL is case-insensitive for keywords and identifiers, except when identifiers are double-quoted to preserve the case (not shown above).
In the SQL command, we first specify the type of command we want to execute: create table
. Then the command parameters: table name (weather
), followed by column names and column types.
varchar(80)
specifies that the column can store arbitrary character strings up to 80 characters long.integer
columns store integer numbers (i.e., whole numbers without a decimal point).real
columns store single precision floating-point numbers (i.e., numbers with a decimal point).date
columns store dates (i.e., a combination of year, month, day). Adate
value stores only the specific day, not the time associated with that day.
PostgreSQL supports the standard SQL types integer
, smallint
, real
, double precision
, char(n)
, varchar(n)
, date
, time
, timestamp
and interval
, as well as common types and a rich set of geometric types.
PostgreSQL can be customized with an arbitrary number of user-defined data types. Consequently, type names are not keywords in the syntax, except where required to support special cases in the SQL standard.
The second example stores cities and their associated geographic location:
create table cities (
name varchar(80),
location point
);
The point
type is an example of a PostgreSQL-specific data type.
If you don't need a table anymore or want to recreate it differently, you can remove it with the drop table
command:
drop table cities;
Populating a table with rows
Use insert
statement to populate a table with rows:
insert into weather
values ('San Francisco', 46, 50, 0.25, '1994-11-27');
All data types use fairly obvious input formats. Constants that are not numeric values (e.g., text and dates) must be enclosed in single quotes (''
), as in the example. Date values are usually formatted as 'YYYY-MM-DD'
(although the date
type is actually quite flexible in what it accepts).
The point
type requires a coordinate pair as input:
insert into cities
values ('San Francisco', '(-194.0, 53.0)');
The syntax used so far requires you to remember the order of the columns. An alternative syntax allows you to list the columns explicitly:
insert into weather (city, temp_lo, temp_hi, prcp, date)
values ('San Francisco', 43, 57, 0.0, '1994-11-29');
You can list the columns in a different order if you wish or even omit some columns, e.g., if the precipitation is unknown:
insert into weather (date, city, temp_hi, temp_lo)
values ('1994-11-29', 'Hayward', 54, 37);
Many developers consider explicitly listing the columns better style than relying on the order implicitly.
You can also use copy
to load large amounts of data from CSV files. This is usually faster because the copy
command is optimized for this use case, but allows less flexibility than insert
. An example with weather.csv
would be:
copy weather from 'weather.csv';
where the file name for the source file must be available on the machine running the backend process, not the client (since the backend process reads the file directly). You can read more about the copy
command in the COPY section.
Querying a table
To retrieve data from a table, the table is queried. This is done using an SQL select
statement. The statement is divided into a select list (the part that lists the columns to be returned), a table list (the part that lists the tables from which to retrieve the data), and an optional qualification (the part that specifies any restrictions).
Retrieve all the rows of the weather
table:
select *
from weather;
┌───────────────┬─────────┬─────────┬──────┬────────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │
├───────────────┼─────────┼─────────┼──────┼────────────┤
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │
│ San Francisco │ 43 │ 57 │ 0 │ 1994-11-29 │
│ Hayward │ 37 │ 54 │ │ 1994-11-29 │
└───────────────┴─────────┴─────────┴──────┴────────────┘
(3 rows)
Here *
is a shorthand for "all columns". So the same result would be achieved with:
select city, temp_lo, temp_hi, prcp, date
from weather;
┌───────────────┬─────────┬─────────┬──────┬────────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │
├───────────────┼─────────┼─────────┼──────┼────────────┤
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │
│ San Francisco │ 43 │ 57 │ 0 │ 1994-11-29 │
│ Hayward │ 37 │ 54 │ │ 1994-11-29 │
└───────────────┴─────────┴─────────┴──────┴────────────┘
(3 rows)
You can write expressions, not just simple column references, in the select list. For example, you can do:
select city, (temp_hi+temp_lo)/2 as temp_avg, date
from weather;
┌───────────────┬──────────┬────────────┐
│ city │ temp_avg │ date │
├───────────────┼──────────┼────────────┤
│ San Francisco │ 48 │ 1994-11-27 │
│ San Francisco │ 50 │ 1994-11-29 │
│ Hayward │ 45 │ 1994-11-29 │
└───────────────┴──────────┴────────────┘
(3 rows)
Notice how the as
clause is used to relabel the output column. (The as
clause is optional.)
A query can be "qualified" by adding a where
clause that specifies which rows are wanted. The where
clause contains a Boolean (truth value) expression, and only rows for which the Boolean expression is true are returned. The usual Boolean operators (and
, or
, and not
) are allowed in the qualification.
Retrieve the weather of San Francisco on rainy days:
select *
from weather
where city = 'San Francisco' and prcp > 0.0;
┌───────────────┬─────────┬─────────┬──────┬────────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │
├───────────────┼─────────┼─────────┼──────┼────────────┤
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │
└───────────────┴─────────┴─────────┴──────┴────────────┘
(1 row)
Return the results of a query in sorted order:
select *
from weather
order by city;
┌───────────────┬─────────┬─────────┬──────┬────────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │
├───────────────┼─────────┼─────────┼──────┼────────────┤
│ Hayward │ 37 │ 54 │ │ 1994-11-29 │
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │
│ San Francisco │ 43 │ 57 │ 0 │ 1994-11-29 │
└───────────────┴─────────┴─────────┴──────┴────────────┘
(3 rows)
In this example, the sort order isn't fully specified, so you might get the San Francisco rows in either order. To get a fixed order, add another column to sort the rows with equal city
values:
select *
from weather
order by city, temp_lo;
┌───────────────┬─────────┬─────────┬──────┬────────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │
├───────────────┼─────────┼─────────┼──────┼────────────┤
│ Hayward │ 37 │ 54 │ │ 1994-11-29 │
│ San Francisco │ 43 │ 57 │ 0 │ 1994-11-29 │
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │
└───────────────┴─────────┴─────────┴──────┴────────────┘
(3 rows)
Remove the duplicate rows from the result:
select distinct city
from weather;
┌───────────────┐
│ city │
├───────────────┤
│ Hayward │
│ San Francisco │
└───────────────┘
(2 rows)
Here again, the result row ordering might vary. You can ensure consistent results by using distinct
and order by
together:
select distinct city
from weather
order by city;
┌───────────────┐
│ city │
├───────────────┤
│ Hayward │
│ San Francisco │
└───────────────┘
(2 rows)
Joins between tables
So far, our queries have only accessed one table at a time. Queries can access multiple tables at once, or access the same table in such a way that multiple rows of the table are being processed at the same time.
Queries that access multiple tables (or multiple instances of the same table) at once are called join queries. They combine rows from one table with rows from a second table, using an expression that specifies which rows are to be paired.
For example, to return all the weather records together with the location of the associated city, the database needs to compare the city
column of each row in the weather
table with the name
column of each row in the cities
table, and select the pairs of rows where these values match.
Here is the query to accomplish this:
select *
from weather join cities on city = name;
┌───────────────┬─────────┬─────────┬──────┬────────────┬───────────────┬───────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │ name │ location │
├───────────────┼─────────┼─────────┼──────┼────────────┼───────────────┼───────────┤
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │ San Francisco │ (-194,53) │
│ San Francisco │ 43 │ 57 │ 0 │ 1994-11-29 │ San Francisco │ (-194,53) │
└───────────────┴─────────┴─────────┴──────┴────────────┴───────────────┴───────────┘
(2 rows)
Note two things about the result set:
- There is no result row for the city of Hayward. This is because there is no matching entry in the
cities
table for Hayward, so the join ignores the unmatched rows in theweather
table. We will fix this shortly. - There are two columns containing the city name. This is correct because the lists of columns from the
weather
andcities
tables are concatenated. In practice, however, this is undesirable, so you will probably want to list the output columns explicitly rather than using*
:
select city, temp_lo, temp_hi, prcp, date, location
from weather join cities on city = name;
┌───────────────┬─────────┬─────────┬──────┬────────────┬───────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │ location │
├───────────────┼─────────┼─────────┼──────┼────────────┼───────────┤
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │ (-194,53) │
│ San Francisco │ 43 │ 57 │ 0 │ 1994-11-29 │ (-194,53) │
└───────────────┴─────────┴─────────┴──────┴────────────┴───────────┘
(2 rows)
Since the columns all had different names, the parser automatically found which table they belong to. If there were duplicate column names in the two tables, you'd have to qualify the column names to show which one you meant, as in:
select
weather.city, weather.temp_lo, weather.temp_hi,
weather.prcp, weather.date, cities.location
from weather join cities on weather.city = cities.name;
┌───────────────┬─────────┬─────────┬──────┬────────────┬───────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │ location │
├───────────────┼─────────┼─────────┼──────┼────────────┼───────────┤
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │ (-194,53) │
│ San Francisco │ 43 │ 57 │ 0 │ 1994-11-29 │ (-194,53) │
└───────────────┴─────────┴─────────┴──────┴────────────┴───────────┘
(2 rows)
It is generally considered good practice to qualify all column names in a join query, so that the query won't fail if a duplicate column name is later added to one of the tables.
Join queries of the kind seen so far can also be written in this alternative form:
select *
from weather, cities
where city = name;
┌───────────────┬─────────┬─────────┬──────┬────────────┬───────────────┬───────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │ name │ location │
├───────────────┼─────────┼─────────┼──────┼────────────┼───────────────┼───────────┤
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │ San Francisco │ (-194,53) │
│ San Francisco │ 43 │ 57 │ 0 │ 1994-11-29 │ San Francisco │ (-194,53) │
└───────────────┴─────────┴─────────┴──────┴────────────┴───────────────┴───────────┘
(2 rows)
This syntax predates the join/on
syntax introduced in SQL-92. The tables are simply listed in the from
clause, and the comparison expression is added to the where
clause.
The results from this older implicit syntax and the newer explicit join/on
syntax are identical. But for a reader of the query, the explicit syntax makes its meaning easier to understand: The join condition is introduced by its own keyword, whereas before the condition was mixed in with other conditions in the where
clause.
Now we will figure out how to get the Hayward records back in. What we want the query to do is to scan the weather
table and for each row to find the matching cities
row(s). If no matching row is found, we want some "empty values" to be substituted for the cities
table's columns. This kind of query is called an outer join. (The joins we have seen so far are inner joins.)
Here is the query:
select *
from weather
left outer join cities on weather.city = cities.name;
┌───────────────┬─────────┬─────────┬──────┬────────────┬───────────────┬───────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │ name │ location │
├───────────────┼─────────┼─────────┼──────┼────────────┼───────────────┼───────────┤
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │ San Francisco │ (-194,53) │
│ San Francisco │ 43 │ 57 │ 0 │ 1994-11-29 │ San Francisco │ (-194,53) │
│ Hayward │ 37 │ 54 │ │ 1994-11-29 │ │ │
└───────────────┴─────────┴─────────┴──────┴────────────┴───────────────┴───────────┘
(3 rows)
This query is called a left outer join because the table on the left of the join
operator will have each of its rows in the output at least once, while the table on the right will have only those rows that match some row of the left table. When outputting a left-table row for which there is no right-table match, empty (null
) values are substituted for the right-table columns.
There are also right outer joins and full outer joins, but we will not cover them here. See SQL join flavors for details.
We can also join a table against itself. This is called a self join. Suppose we want to find all the weather records that are within the temperature range of other weather records. So we need to compare the temp_lo
and temp_hi
columns of each weather
row to the temp_lo
and temp_hi
columns of all other weather
rows. We can do this with the following query:
select
w1.city, w1.temp_lo as low, w1.temp_hi as high,
w2.city, w2.temp_lo as low, w2.temp_hi as high
from weather w1 join weather w2
on w1.temp_lo < w2.temp_lo and w1.temp_hi > w2.temp_hi;
┌───────────────┬─────┬──────┬───────────────┬─────┬──────┐
│ city │ low │ high │ city │ low │ high │
├───────────────┼─────┼──────┼───────────────┼─────┼──────┤
│ San Francisco │ 43 │ 57 │ San Francisco │ 46 │ 50 │
│ Hayward │ 37 │ 54 │ San Francisco │ 46 │ 50 │
└───────────────┴─────┴──────┴───────────────┴─────┴──────┘
(2 rows)
Here we have relabeled the weather
table as w1
and w2
to distinguish the left and right side of the join. You can also use these kinds of aliases in other queries to save some typing, e.g.:
select *
from weather w
join cities c on w.city = c.name;
┌───────────────┬─────────┬─────────┬──────┬────────────┬───────────────┬───────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │ name │ location │
├───────────────┼─────────┼─────────┼──────┼────────────┼───────────────┼───────────┤
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │ San Francisco │ (-194,53) │
│ San Francisco │ 43 │ 57 │ 0 │ 1994-11-29 │ San Francisco │ (-194,53) │
└───────────────┴─────────┴─────────┴──────┴────────────┴───────────────┴───────────┘
(2 rows)
You will see this type of abbreviation quite often.
Aggregate functions
Like most other relational database products, PostgreSQL supports aggregate functions. An aggregate function computes a single result from multiple input rows. For example, there are aggregates to compute the count
, sum
, avg
(average), max
(maximum) and min
(minimum) over a set of rows.
Find the highest low-temperature reading anywhere:
select max(temp_lo)
from weather;
┌─────┐
│ max │
├─────┤
│ 46 │
└─────┘
(1 row)
If we wanted to know what city (or cities) that reading occurred in, we might try:
-- WRONG
select city
from weather
where temp_lo = max(temp_lo);
ERROR: aggregate functions are not allowed in WHERE
LINE 3: where temp_lo = max(temp_lo);
^
This will not work because the aggregate max
cannot be used in the where
clause. This restriction exists because the where
clause determines which rows will be included in the aggregate calculation; so obviously it has to be evaluated before aggregate functions are computed.
However, as is often the case, the query can be restated to achieve the desired result, here by using a subquery:
select city
from weather
where temp_lo = (select max(temp_lo) from weather);
┌───────────────┐
│ city │
├───────────────┤
│ San Francisco │
└───────────────┘
(1 row)
This is OK because the subquery is an independent computation that computes its own aggregate separately from what happens in the outer query.
Aggregates are also very useful in combination with group by
clauses. For example, we can get the number of readings and the maximum low temperature observed in each city:
select city, count(*), max(temp_lo)
from weather
group by city;
┌───────────────┬───────┬─────┐
│ city │ count │ max │
├───────────────┼───────┼─────┤
│ Hayward │ 1 │ 37 │
│ San Francisco │ 2 │ 46 │
└───────────────┴───────┴─────┘
(2 rows)
which gives us one output row per city. Each aggregate result is computed over the table rows matching that city.
We can filter these grouped rows using having
:
select city, count(*), max(temp_lo)
from weather
group by city
having max(temp_lo) < 40;
┌─────────┬───────┬─────┐
│ city │ count │ max │
├─────────┼───────┼─────┤
│ Hayward │ 1 │ 37 │
└─────────┴───────┴─────┘
(1 row)
which gives us the same results, but only for cities where all temp_lo
values are below 40.
Finally, if we only care about cities whose names begin with "S", we can use the like
operator:
select city, count(*), max(temp_lo)
from weather
where city like 'S%'
group by city;
┌───────────────┬───────┬─────┐
│ city │ count │ max │
├───────────────┼───────┼─────┤
│ San Francisco │ 2 │ 46 │
└───────────────┴───────┴─────┘
(1 row)
The fundamental difference between where
and having
is this:
where
selects input rows before groups and aggregates are computed (so it controls which rows go into the aggregate computation).having
selects group rows after groups and aggregates are computed.
Thus, the where
clause must not contain aggregate functions: it makes no sense to try to use an aggregate to determine which rows will be inputs to the aggregates. On the other hand, the having
clause always contains aggregate functions. (Strictly speaking, you can write a having
clause that doesn't use aggregates, but this is rarely useful. The same condition could be used more efficiently at the where
stage.)
In the previous example, we can apply the city name restriction in where
, since it needs no aggregate. This is more efficient than adding the restriction to having
, because we avoid doing the grouping and aggregate calculations for all rows that fail the where
check.
Another way to select the rows that go into an aggregate computation is to use filter
, which is a per-aggregate option:
select
city,
count(*) filter (where temp_lo < 45),
max(temp_lo)
from weather
group by city;
┌───────────────┬───────┬─────┐
│ city │ count │ max │
├───────────────┼───────┼─────┤
│ Hayward │ 1 │ 37 │
│ San Francisco │ 1 │ 46 │
└───────────────┴───────┴─────┘
(2 rows)
filter
is similar to where
, except that it only removes rows from the input of the particular aggregate function it is attached to. Here, the count
aggregate only counts rows with temp_lo
below 45; but the max
aggregate is still applied to all rows, so it still finds the reading of 46.
Updates
To update existing rows, use the update
command. Suppose you discover the temperature readings are all off by 2 degrees after November 28. You can correct the data as follows:
update weather
set temp_hi = temp_hi - 2, temp_lo = temp_lo - 2
where date > '1994-11-28';
Look at the new state of the data:
select * from weather;
┌───────────────┬─────────┬─────────┬──────┬────────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │
├───────────────┼─────────┼─────────┼──────┼────────────┤
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │
│ San Francisco │ 41 │ 55 │ 0 │ 1994-11-29 │
│ Hayward │ 35 │ 52 │ │ 1994-11-29 │
└───────────────┴─────────┴─────────┴──────┴────────────┘
(3 rows)
Deletions
To remove rows from a table, use the delete
command. Suppose you are no longer interested in the weather of Hayward. Then delete those rows from the table as follows:
delete from weather
where city = 'Hayward';
All Hayward weather records are removed:
select * from weather;
┌───────────────┬─────────┬─────────┬──────┬────────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │
├───────────────┼─────────┼─────────┼──────┼────────────┤
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │
│ San Francisco │ 41 │ 55 │ 0 │ 1994-11-29 │
│ Hayward │ 35 │ 52 │ │ 1994-11-29 │
└───────────────┴─────────┴─────────┴──────┴────────────┘
(3 rows)
Beware of delete
without where
:
delete from weather;
Without a qualification, delete
will remove all rows from the table, leaving it empty. The system will not ask for confirmation!
Further reading
For more information about SQL as implemented in PostgreSQL, see the documentation.
PostgreSQL Team + 1 others · original · CC-BY-SA-4.0 · 2024-02-15
PostgreSQL Team, Anton Zhiyanov