Select (SQL)
The SQL SELECT statement returns a result set of records from one or more tables.
It retrieves zero or more rows from one or more base tables, temporary tables, or views in a database. In most applications, SELECT is the most commonly used Data Manipulation Language (DML) command. As SQL is a non-procedural language, SELECT queries specify a result set, but do not specify how to calculate it: translating the query into an executable "query plan" is left to the database system, more specifically to the query optimizer.
The SELECT statement has many optional clauses:
- WHERE specifies which rows to retrieve.
- GROUP BY groups rows sharing a property so that an aggregate function can be applied to each group.
- HAVING selects among the groups defined by the GROUP BY clause.
- ORDER BY specifies an order in which to return the rows.
Examples
Table "T" | Query | Result | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
SELECT * FROM T; |
|
||||||||||||
|
SELECT C1 FROM T; |
|
||||||||||||
|
SELECT * FROM T WHERE C1 = 1; |
|
||||||||||||
|
SELECT * FROM T ORDER BY C1 DESC; |
|
Given a table T, the query SELECT * FROM T will result in all the elements of all the rows of the table being shown.
With the same table, the query SELECT C1 FROM T will result in the elements from the column C1 of all the rows of the table being shown. This is similar to a projection in Relational algebra, except that in the general case, the result may contain duplicate rows. This is also known as a Vertical Partition in some database terms, restricting query output to view only specified fields or columns.
With the same table, the query SELECT * FROM T WHERE C1 = 1 will result in all the elements of all the rows where the value of column C1 is '1' being shown — in Relational algebra terms, a selection will be performed, because of the WHERE clause. This is also known as a Horizontal Partition, restricting rows output by a query according to specified conditions.
Limiting result rows
Often it is convenient to indicate a maximum number of rows that are returned. This can be used for testing or to prevent consuming excessive resources if the query returns more information than expected. The approach to do this often varies per vendor.
In ISO SQL:2003, result sets may be limited by using
- cursors, or
- By introducing SQL window function to the SELECT-statement
ISO SQL:2008 introduced the FETCH FIRST clause.
ROW_NUMBER() window function
ROW_NUMBER() OVER may be used for a simple table on the returned rows, e.g. to return no more than ten rows:
SELECT * FROM ( SELECT ROW_NUMBER() OVER (ORDER BY sort_key ASC) AS row_number,COLUMNS FROM tablename ) foo WHERE row_number <= 10
ROW_NUMBER can be non-deterministic: if sort_key is not unique, each time you run the query it is possible to get different row numbers assigned to any rows where sort_key is the same. When sort_key is unique, each row will always get a unique row number.
RANK() window function
The RANK() OVER window function acts like ROW_NUMBER, but may return more than n rows in case of tie conditions, e.g. to return the top-10 youngest persons:
SELECT * FROM (SELECT RANK() OVER (ORDER BY age ASC) AS ranking, person_id, person_name, age FROM person ) AS foo WHERE ranking <= 10
The above code could return more than ten rows, e.g. if there are two people of the same age, it could return eleven rows.
FETCH FIRST clause
Since ISO SQL:2008 results limits can be specified as in the following example using the FETCH FIRST clause.
SELECT * FROM T FETCH FIRST 10 ROWS ONLY
This clause currently is supported by IBM DB2, Sybase SQL Anywhere, PostgreSQL, EffiProz and HSQLDB version 2.0.
Non-standard syntax
Result limits
Not all DBMSes support the mentioned window functions, and non-standard syntax has to be used. Below, variants of the simple limit query for different DBMSes are listed:
SELECT * FROM T LIMIT 10 OFFSET 20 | MySQL, PostgreSQL (also supports the standard, since version 8.4), SQLite, HSQLDB, H2 |
SELECT * from T WHERE ROWNUM <= 10 | Oracle (also supports the standard, since Oracle8i) |
SELECT FIRST 10 * from T | Ingres |
SELECT FIRST 10 * FROM T order by a | Informix |
SELECT SKIP 20 FIRST 10 * FROM T order by c, d | Informix (row numbers are filtered after order by is evaluated. SKIP clause was introduced in a v10.00.xC4 fixpack) |
SELECT TOP 10 * FROM T | MS SQL Server, Sybase ASE, MS Access |
SELECT TOP 10 START AT 20 * FROM T | Sybase SQL Anywhere (also supports the standard, since version 9.0.1) |
SELECT FIRST 10 SKIP 20 * FROM T | Interbase, Firebird |
SELECT * FROM T ROWS 20 TO 30 | Firebird (since version 2.1) |
SELECT * FROM T WHERE ID_T > 10 FETCH FIRST 10 ROWS ONLY | DB2 |
SELECT * FROM T WHERE ID_T > 20 FETCH FIRST 10 ROWS ONLY | DB2 (new rows are filtered after comparing with key column of table T) |
Hierarchical query
Some databases provide specialised syntax for hierarchical data.
Window function
A window function in SQL:2003 is an aggregate function applied to a partition of the result set.
For example,
sum(population) OVER( PARTITION BY city )
calculates the sum of the populations of all rows having the same city value as the current row.
Partitions are specified using the OVER clause which modifies the aggregate. Syntax:
<OVER_CLAUSE> :: = OVER ( [ PARTITION BY <expr>, ... ] [ ORDER BY <expression> ] )
The OVER clause can partition and order the result set. Ordering is used for order-relative functions such as row_number.
Query Evaluation ANSI The processing of a SELECT statement according to ANSI SQL would be the following:
select g.* from users u inner join groups g on g.Userid = u.Userid where u.LastName = 'Smith' and u.FirstName = 'John'
1) the FROM clause is evaluated, a cross join or Cartesian product is produced for the first two tables in the FROM clause resulting in a virtual table as Vtable1 2) the ON clause is evaluated for vtable1, only records which meet the join condition g.Userid = u.Userid are inserted into Vtable2 3) If an outer join is specified records which were droped from vTable2 are added into VTable 3, for instance if the above query were:
select u.* from users u left join groups g on g.Userid = u.Userid where u.LastName = 'Smith' and u.FirstName = 'John'
all users who did not belong to any groups would be added back into Vtable3
4) the WHERE clause is evaluated, in this case only group information for for user John Smith would be added to vTable4
5) the GROUP BY is evaluated, if the above query were:
select g.GroupName, count(g.*) as NumberOfMembers from users u inner join groups g on g.Userid = u.Userid group by GroupName
vTable5 would consist of members returned from vTable4 arranged by the grouping, in this case the GroupName
6) the HAVING caluse is evaluted for groups for which the HAVING clause is true and inserted into vTable6. For example:
select g.GroupName, count(g.*) as NumberOfMembers from users u inner join groups g on g.Userid = u.Userid group by GroupName having count(g.*) > 5
7) the SELECT list is evaluated and returned as Vtable 7
8) the DISTINCT clause is evaluated, duplicate rows are removed and returned as Vtable 8
9) the ORDER BY clause is evaluated, ordering the rows and returning VCursor9. This is a cursor and not a table because ANSI defines a cursor as an ordered set of rows (not relational).
From
The SQL From clause is the source of a rowset to be operated upon in a Data Manipulation Language (DML) statement. From clauses are very common, and will provide the rowset to be exposed through a Select statement, the source of values in an Update statement, and the target rows to be deleted in a Delete statement.
FROM is an SQL reserved word in the SQL standard
The FROM clause is used in conjunction with SQL statements, and takes the following general form:
SQL-DML-Statement FROM table_name WHERE predicate
The From clause can generally be anything that returns a rowset, a table, view, function, or system-provided information like the Information Schema, which is typically running proprietary commands and returning the information in a table form.
Examples
The following query returns only those rows from table mytable where the value in column mycol is greater than 100.
SELECT * FROM mytable WHERE mycol > 100
Requirement
The From clause is technically required in relational algebra and in most scenarios to be useful. However many relational dbms implementations may not require it for selecting a single value, or single row.
SELECT 3.14 AS Pi
Other systems will require a From statement with a keyword, even to select system data.
SELECT to_char(sysdate, 'Dy DD-Mon-YYYY HH24:MI:SS') AS "Current Time" FROM dual;
Where
A WHERE clause in SQL specifies that a SQL Data Manipulation Language (DML) statement should only affect rows that meet specified criteria. The criteria are expressed in the form of predicates. WHERE clauses are not mandatory clauses of SQL DML statements, but should be used to limit the number of rows affected by a SQL DML statement or returned by a query.
Overview
WHERE is an SQL reserved word.
The WHERE clause is used in conjunction with SQL DML statements, and takes the following general form:
SQL-DML-Statement FROM table_name WHERE predicate
all rows for which the predicate in the WHERE clause is True are affected (or returned) by the SQL DML statement or query. Rows for which the predicate evaluates to False or Unknown (NULL) are unaffected by the DML statement or query.
The following query returns only those rows from table mytable where the value in column mycol is greater than 100.
SELECT * FROM mytable WHERE mycol > 100
The following DELETE statement removes only those rows from table mytable where the column mycol is either NULL or has a value that is equal to 100.
DELETE FROM mytable WHERE mycol IS NULL OR mycol = 100
Predicates
Simple predicates use one of the operators =, <>, >, >=, <, <=, IN, BETWEEN, LIKE, IS NULL or IS NOT NULL.
Predicates can be enclosed in parentheses if desired. The keywords AND and OR can be used to combine two predicates into a new one. If multiple combinations are applied, parentheses can be used to group combinations to indicate the order of evaluation. Without parentheses, the AND operator has a stronger binding than OR.
The following example deletes rows from mytable where the value of mycol is greater than 100, and the value of item is equal to the string literal 'Hammer':
DELETE FROM mytable WHERE mycol > 100 AND item = 'Hammer'
IN
IN will find any values existing in a set of candidates.
SELECT ename WHERE ename IN ('value1', 'value2', ...)
All rows match the predicate if their value is one of the candidate set of values. This is the same behavior as
SELECT ename WHERE ename='value1' OR ename='value2'
except that the latter could allow comparison of several columns, which each IN clause does not. For a larger number of candidates, IN is less verbose.
BETWEEN
BETWEEN will find any values within a range.
SELECT ename WHERE ename BETWEEN 'value1' AND 'value2'
All rows match the predicate if their value is between 'value1' and 'value2', inclusive.
LIKE
LIKE will find a string fitting a certain description.
- Ending Wildcard
- Find any string that begins with the letter 'S'
SELECT ename FROM emp WHERE ename LIKE 'S%';
- Leading Wildcard
- Find any string that ends with the letter 'S'
SELECT ename FROM emp WHERE ename LIKE '%S';
- Multiple Wildcards
- Find any string that contains, anywhere, the letter 'S'
SELECT ename FROM emp WHERE ename LIKE '%S%';
- Single Character Wildcard
- Find any string that contains the letter 'A' followed by any single character followed by the letter 'E'
SELECT ename FROM emp WHERE ename LIKE '%A_E%';
SQL programmers need to be aware that the LIKE predicate typically performs a search without the normal performance benefit of indexes. Using '=', '<>', etc.. instead will increase performance. Users of the LIKE predicate should be aware that case sensitivity (e.g., 'S' versus 's') may be different based upon database product or configuration.
Order by
An ORDER BY clause in SQL specifies that a SQL SELECT statement returns a result set with the rows being sorted by the values of one or more columns. The sort criteria do not have to be included in the result set. The sort criteria can be expressions, including - but not limited to - column names, user-defined functions, arithmetic operations, or CASE expressions. The expressions are evaluated and the results are used for the sorting, i.e. the values stored in the column or the results of the function call.
ORDER BY is the only way to sort the rows in the result set. Without this clause, the relational database system may return the rows in any order. If an ordering is required, the ORDER BY must be provided in the SELECT statement sent by the application. Although some database systems allow the specification of an ORDER BY clause in subselects or view definitions, the presence there has no effect. A view is a logical relational table, and the relational model mandates that a table is a set of rows, implying no sort order whatsoever. The only exception are constructs like ORDER BY ORDER OF ... (not standardized in SQL:2003) which allow the propagation of sort criteria through nested subselects.
The SQL standard's core functionality does not explicitly define a default sort order for Nulls. With the SQL:2003 extension T611, "Elementary OLAP operations", nulls can be sorted before or after all data values by using the NULLS FIRST or NULLS LAST clauses of the ORDER BY list, respectively. Not all DBMS vendors implement this functionality, however. Vendors who do not implement this functionality may specify different treatments for Null sorting in the DBMS.
Structure ORDER BY ... DESC will order in descending order, otherwise ascending order is used.
Examples
SELECT * FROM Employees ORDER BY LastName, FirstName
This sorts by the LastName field, then by the FirstName field if LastName matches.
Having
A HAVING clause in SQL specifies that an SQL SELECT statement should only return rows where aggregate values meet the specified conditions. It was added to the SQL language because the WHERE keyword could not be used with aggregate functions.
Examples
To return a list of department IDs whose total sales exceeded $1000 on the date of January 1, 2000, along with the sum of their sales on that date:
SELECT DeptID, SUM(SaleAmount)FROM Sales WHERE SaleDate = '01-Jan-2000'GROUP BY DeptID HAVING SUM(SaleAmount) > 1000
Referring to the sample tables in the Join (SQL) example, the following query will return the list of departments who have more than 1 employee:
SELECT DepartmentName, COUNT(*)FROM employee,department WHERE employee.DepartmentID = department.DepartmentID GROUP BY DepartmentName HAVING COUNT(*)>1;
Content extracted from free documentation (Wikipedia and other sources).