Oracle PL/SQL: GROUP BY

I recently wrote, HERE, how I regularly see commonly asked questions appearing in forums relating to the same topic. Like with SELECT INTO from my previous posts, I also see lots of questions about the correct usage of the GROUP BY clause and so I figured that I would write up an explanation of how to properly use this and then, again, I can reference this post’s URL in response to questions on this topic!

Side Note: For those that frequent this blog.. firstly, Thankyou! and secondly, no I haven’t moved away from Cloud topics. I have lots of ideas for posts over the coming months – I just enjoyed spinning up an oracle database in my free tier account and writing some PL/SQL for a change. Incidentally it happened to serve multiple purposes 🙂

Syntax

The PL/SQL GROUP BY clause is used in SQL queries to group rows of data by a particular value or expression and as result returns one row of data per group. This means that GROUP BY can absolutely return many rows of data, but that each row represents a number of other rows in the database. The syntax expected when using PL/SQL GROUP BY is as follows:

SELECT   columns(s)
FROM     table(s)
GROUP BY column(s)

When using WHERE conditions, the GROUP BY clause must come after the WHERE clause, as follows:

SELECT   columns(s)
FROM     table(s)
WHERE    condition(s)
GROUP BY column(s);

Often, you will see that the GROUP BY clause is used in conjunction with aggregate functions or expressions. There are many aggregate functions and expressions available in the Oracle database, which you can read about in the Oracle documentation HERE, however, most often you are probably seeing aggregate functions such as MIN(), MAX(), COUNT().

GROUP BY Examples

For the examples in this post, I will use the widely known “hr schema”. The entity-relationship diagram is as below.

Basic Example

The following example uses a GROUP BY clause to get a list of unique job_id‘s of all employees in the employee database table.

SELECT   job_id
FROM     employees 
GROUP BY job_id;

An explanation of this example is as below:

I used the SELECT statement to query data from the job_id column within the employee table
I used the GROUP BY clause to group together any rows of data with the same value in the job_id column into a single returned row of data (as this is a basic example, you could actually achieve the same outcome using: SELECT DISTINCT job_id FROM employees;
As per the below, the query returned a list of unique jobs that the employee’s within my database are assigned to. The query returned 19 results – note, my database table contains 107 employee records (none of these records have a NULL value in the job_id column)

Aggregate Functions Example

Whilst the above example helps to ascertain which jobs that I have employees assigned to, it is not overly useful. Perhaps it would be better to understand how many employees I have assigned to each job type – to do so, I must use an aggregate function (COUNT()):

SELECT   job_id,
         COUNT(employee_id)
FROM     employees 
GROUP BY job_id;

An explanation of this example is as below:

I used the SELECT statement to query data from the job_id and employee_id columns within the employee table
I used the COUNT() aggregate function to count the number of employee records that apply to this “grouping” (see bullet below)
I used the GROUP BY clause to group together any rows of data with the same value in the job_id column into a single returned row of data.
As per the below, the query returned a list of unique jobs that the employee’s within my database are assigned to, in addition, it returned the count of those employees with each job_id. The query returned 19 results – note, my database table contains 107 employee records (none of these records have a NULL value in the job_id column)

To make this data more meaningful, I could also join to the jobs table to return the actual Job Title rather than an internal ID, as follows:

SELECT   jobs.job_title,
         COUNT(emp.employee_id)
FROM     employees emp 
JOIN     jobs ON emp.job_id = jobs.job_id
GROUP BY jobs.job_title;

In this example, I simply joined to the jobs table which contains more descriptive information about each job type and used the job_title column from this table in my SELECT statement instead. Note – I also had to change the GROUP BY clause. This example returned the following results:

It’s also possible to add more returned data in your SELECT statement. In the below example, I am also retrieving additional useful data; the name of the department, the city in which this department is based and the minimum/maximum/average salary within each group of employees. You can see how this data become more useful by extending the query based upon the same grouping. Note that as I’m selecting additional data columns which are not aggregate functions, I must only apply the grouping to these columns.

SELECT   jobs.job_title,
         dep.department_name,
         loc.city,
         COUNT(emp.employee_id),
         MIN(emp.salary),
         MAX(emp.salary),
         AVG(emp.salary)
FROM     employees emp 
JOIN     jobs ON emp.job_id = jobs.job_id
JOIN     departments dep ON emp.department_id = dep.department_id
JOIN     locations loc   ON dep.location_id = loc.location_id
GROUP BY jobs.job_title, dep.department_name, loc.city;

“But, you are grouping by more than 1 column.” … I hear you say! Incidentally, my query still returns 19 records. This is simply due to basic data within the tables I have. What if I have programmers in more than a single city? Below I am inserting new IT departments that have different locations

-- insert new IT department based in Tokyo
INSERT INTO departments VALUES (DEPARTMENTS_SEQ.nextval, 'IT', null, 1200); 
-- insert new IT department based in Toronto
INSERT INTO departments VALUES (DEPARTMENTS_SEQ.nextval, 'IT', null, 1800); 
-- insert new IT department based in Utrecht
INSERT INTO departments VALUES (DEPARTMENTS_SEQ.nextval, 'IT', null, 3100);

Below, I am updating the departments that employees who are programmers are assigned to:

-- assign employee 103 to department 280
UPDATE employees SET department_id = 280 WHERE employee_id = 103;
-- assign employee 104 to department 290
UPDATE employees SET department_id = 290 WHERE employee_id = 104;
-- assign employee 105 to department 300
UPDATE employees SET department_id = 300 WHERE employee_id = 105;

Now, when I run my SQL query from above, you can see that there are multiple rows returned for employees that work the same job but are based in different locations – this is because loc.city was also in my GROUP BY clause.(note, I have filtered by department_name = 'IT' for the purposes of this screenshot. A total of 22 records were returned by the query)

To further illustrate this, I have removed loc.city from my SELECT statement and GROUP BY clause. The result returns to a single row.

Expressions Example

You may also find that you need to use SQL expressions in your queries. When doing so, you may be required to use the expression itself in the GROUP BY clause. The example below is a query that counts the number of employees hired in each year:

SELECT   EXTRACT(YEAR FROM hire_date) AS hire_year,
         COUNT(employee_id)
FROM     employees
GROUP BY EXTRACT(YEAR FROM hire_date)
ORDER BY hire_year DESC;

Here you will see that I am extracting the year from the employee’s hire date and counting the number of employees that were hired in that year. As I want to group on year, I also have to use the EXTRACT() function in my GROUP BY clause. You’ll also notice that I can also use ORDER BY in this query. The results are below:

WHERE Clause Example

Finally, the following example shows the use of both the WHERE clause and the GROUP BY clause. In fact, this is similar to the query I used earlier which enabled me to only show the results for Programmers. In this example I am using a WHERE clause to define that I only want to see grouped results where the location is ‘Seattle’

SELECT   jobs.job_title,
         dep.department_name,
         loc.city,
         COUNT(emp.employee_id),
         MIN(emp.salary),
         MAX(emp.salary),
         AVG(emp.salary)
FROM     employees emp 
JOIN     jobs            ON emp.job_id        = jobs.job_id
JOIN     departments dep ON emp.department_id = dep.department_id
JOIN     locations loc   ON dep.location_id   = loc.location_id
WHERE    loc.city = 'Seattle'
GROUP BY jobs.job_title, dep.department_name, loc.city;

The results of this query are as follows:

AMY SIMPSON-GRANGE – BLOG

One thought on “Oracle PL/SQL: GROUP BY”

Leave a comment Cancel reply