In one of our previous articles, we have mentioned a few basic SQL queries to search in a database. In this article, we have described a few nested SQL queries and aggregate functions for information retrieval from a biological database.
1. To retrieve a limited number of records from a database.
>SELECT TOP number * FROM table_name;
For example, you have a table named sequences in your database having different columns such as OrgName, Sequence, OrgNumber, and so on. Since a single species may consist of multiple sequences and you want to retrieve only the top 3 sequences of that species, then use the following query.
>SELECT TOP 3 * FROM sequences;
In MySQL, the LIMIT clause does the same thing,
>SELECT column_names * FROM table_name
WHERE condition LIMIT number;
For example,
>SELECT * FROM sequences
WHERE OrgName=’Arabidopsis thaliana’
LIMIT 3;
2. To retrieve the minimum and maximum values from the selected column
To find the minimum value
>SELECT MIN(column_name) FROM table_name
WHERE condition;
For example,
>SELECT MIN(strains) AS MinimumStrains
FROM sequences;
To find the maximum value
>SELECT MAX(column_name) FROM table_name
WHERE condition;
For example,
>SELECT MAX(strains) AS MinimumStrains
FROM sequences
WHERE OrgName=’Arabidopsis thaliana’;
3. Aggregate functions in SQL
To find the number of rows matching a condition
>SELECT COUNT(column_name)
FROM table_name
WHERE condition;
For example,
>SELECT COUNT(OrgNumber)
FROM sequences;
To find the average value of a numeric column in a table
>SELECT AVG(column_name)
FROM table_name
WHERE condition;
For example,
>SELECT AVG(TotSeqs)
FROM sequences;
To find the sum of a numeric column
>SELECT SUM(column_name)
FROM table_name
WHERE condition;
For example,
>SELECT SUM(TotSeqs)
FROM sequences;
4. To search for a specific pattern in a column
>SELECT column1, column2, …
FROM table_name
WHERE column_name LIKE pattern;
There are several other wildcards that can be used with LIKE operator such as ‘*’, ‘[]’, ‘^’, etc. They can be used in combination as well. There are two wildcards mostly used with LIKE operator: ‘%’ and ‘_’.
% represents zero, one, or multiple characters, whereas, _ represents a single character. These wildcards are used before and/or after the specific characters or a pattern you are looking for.
For example, if you wish to search for sequences of a species using a few characters, then use the following query.
>SELECT Sequence, OrgName
FROM sequences
WHERE OrgName LIKE ‘%dopsis%’;
>SELECT Sequence, OrgName
FROM sequences
WHERE OrgName LIKE ‘%thaliana’;
>SELECT Sequence, OrgName
FROM sequences
WHERE OrgName LIKE ‘_ _abidop%’;
5. To specify multiple values in a WHERE clause
>SELECT column1, column2, …
FROM table_name
WHERE column_name IN (value1, value2, …);
For example,
>SELECT * FROM sequences
WHERE OrgName IN (‘Arabidopsis thaliana’, ‘Agrocybe aegerita’, ‘Homo sapiens’);
Or instead of entering values, SELECT statement can also be used as shown below,
>SELECT * FROM sequences
WHERE OrgName IN (SELECT statement);
If you wish to select records that are not present in the given values, then use the following query.
>SELECT * FROM sequences
WHERE OrgName NOT IN (‘Arabidopsis thaliana’, ‘Agrocybe aegerita’, ‘Homo sapiens’);
6. To select records within a given range in a table
>SELECT column1, column2, …
FROM table_name
WHERE column_name BETWEEN value1 AND value2;
For example,
>SELECT * FROM sequences
WHERE OrgNumber BETWEEN 102 AND 130;
Other SQL queries for complex information retrieval from a database will be described in upcoming articles.