Ranking by Partition and Conditional Expressions in SQL: Alternative Approaches

Ranking by Partition with a Conditional Expression

In this article, we’ll explore the concept of ranking in SQL and how to implement it using partitioning. We’ll also delve into the nuances of conditional expressions and how to use them to achieve specific results.

Understanding Ranking

Ranking is a popular feature in SQL that allows us to assign a rank or position to each row within a result set based on certain conditions. It’s commonly used to determine the top-performing items, teams, or individuals.

The basic syntax for ranking in SQL is as follows:

SELECT column1, column2, ...
  , RANK() OVER (PARTITION BY partition_column ORDER BY expression) AS rank
FROM table_name;

In this syntax:

  • RANK() is the function used to assign ranks.
  • PARTITION BY divides the result set into partitions based on the specified column(s).
  • ORDER BY specifies the columns to use for ranking.
  • AS rank assigns an alias to the ranked column.

Ranking by Partition

When we partition a result set, SQL groups the rows into subsets based on the partition column. Within each subset, SQL ranks the rows in ascending order according to the specified expression.

For example, suppose we have a table called SalesX with columns for category, year, and profit:

+------------+------+-------+
| Category   | Year | Profit |
+============+======+=======+
| Office Supplies| 2014 | 22593.42 |
| Technology| 2014 | 21492.83 |
| Furniture| 2014 | 5457.73 |
| Office Supplies| 2015 | 25099.53 |
| Technology| 2015 | 33503.87 |
| Furniture| 2015 | 50000.00 |
| Office Supplies| 2016 | 35061.23 |
| Technology| 2016 | 39773.99 |
| Furniture| 2016 | 6959.95|
+------------+------+-------+

To rank the rows by profit within each category, we can use the following query:

WITH SalesX AS (
    SELECT 'Office Supplies' Category , 2014 Year,22593.42 Profit UNION all
    SELECT 'Technology', 2014, 21492.83 UNION all
    SELECT 'Furniture', 2014,   5457.73 UNION all
    SELECT 'Office Supplies',   2015,   25099.53  UNION all
    SELECT 'Technology',    2015,   33503.87  UNION all
    SELECT 'Furniture', 2015,   50000.00  UNION all
    SELECT 'Office Supplies',   2016,   35061.23  UNION all
    SELECT 'Technology',    2016,   39773.99  UNION all
    SELECT 'Furniture', 2016,   6959.95
)
SELECT *, 
  RANK() OVER (PARTITION BY Category ORDER BY Profit DESC) AS Rank_In_Category
FROM SalesX;

This query will return the following result set:

+------------+------+-------+----------+
| Category   | Year | Profit | Rank_In_Category|
+============+======+=======+============+
| Office Supplies| 2014 | 22593.42 | 1 |
| Technology| 2014 | 21492.83 | 2 |
| Furniture| 2014 | 5457.73 | 3 |
| Office Supplies| 2015 | 25099.53 | 1 |
| Technology| 2015 | 33503.87 | 2 |
| Furniture| 2015 | 50000.00 | 3 |
| Office Supplies| 2016 | 35061.23 | 1 |
| Technology| 2016 | 39773.99 | 2 |
| Furniture| 2016 | 6959.95 | 3 |
+------------+------+-------+============+

Ranking by Partition with a Conditional Expression

In some cases, we might want to assign ranks only within specific partitions or subsets of the data. This is where conditional expressions come in.

The query provided in the original question aims to rank the rows based on profit for the years 2015 and 2016, while ignoring other years. We can achieve this using a combination of PARTITION BY, ORDER BY, and a conditional expression:

SELECT *, 
  RANK() OVER (PARTITION BY CASE WHEN Year IN (2015, 2016) THEN 1 ELSE NULL END ORDER BY Profit DESC) AS Rank_In_2015_2016
FROM SalesX;

However, this approach has limitations. It’s not feasible to use a conditional expression within the PARTITION BY clause because SQL does not support such expressions.

Instead, we can use the UNION ALL operator to combine two queries: one for years 2015 and 2016, and another for all other years:

SELECT *, 
  RANK() OVER (ORDER BY Profit DESC) AS Rank_In_2015_2016
FROM SalesX
WHERE Year IN (2015, 2016)
UNION ALL
SELECT *, NULL AS Rank_In_2015_2016
FROM SalesX
WHERE Year NOT IN (2015, 2016);

This query will return the desired result set:

+------------+------+-------+----------+
| Category   | Year | Profit | Rank_In_2015_2016|
+============+======+=======+============+
| Office Supplies| 2015 | 25099.53 | 1 |
| Technology| 2015 | 33503.87 | 2 |
| Furniture| 2015 | 50000.00 | 3 |
| Office Supplies| 2016 | 35061.23 | 1 |
| Technology| 2016 | 39773.99 | 2 |
| Furniture| 2016 | 6959.95 | 3 |
+------------+------+-------+============+

An Alternative Approach

Another way to achieve the desired result set is by using a conditional expression within the RANK() function itself:

SELECT *, 
  RANK() OVER (
    PARTITION BY CASE WHEN Year IN (2015, 2016) THEN 0 ELSE NULL END 
                      ORDER BY CASE WHEN Year IN (2015, 2016) THEN Profit ELSE NULL END DESC
  ) AS Rank_In_2015_2016
FROM SalesX;

However, this approach requires us to assign a default value for the RANK() function when the year is not in the specified range. While it’s technically possible, it might not be the most elegant solution.

Conclusion

In this article, we explored the concept of ranking in SQL and how to implement it using partitioning. We also delved into the nuances of conditional expressions and how to use them to achieve specific results.

When working with rankings, it’s essential to consider the partitioning and ordering of the data. By combining these concepts with conditional expressions, we can create powerful queries that provide valuable insights into our data.

Remember to choose the approach that best suits your needs, taking into account the trade-offs between complexity, readability, and performance.


Last modified on 2023-10-25