Ranking functions in SQL

Ranking functions introduced with SQL Server 2005 allow us to assign a number to each row returned from a query. They allow us to rank each row in respect to others in several different ways. Ranking functions can be used only with the SELECT and ORDER BY statements. They cannot be used directly in a WHERE or GROUP BY clause, but can be used in a CTE or derived table.

The syntax for ranking functions is shown as follows:

<function_name>() OVER([PARTITION BY <partition_by_list>]
ORDER BY <order_by_list>)
  • function_name: Can be one of ROW_NUMBER, RANK, DENSE_RANK, and NTILE
  • OVER: Defines the details of how the ranking should order or split the data
  • PARTITION BY: Details which data the column should use as the basis of the splits
  • ORDER BY: Details the ordering of the data
There are four ranking functions in SQL Server:
  1. ROW_NUMBER – It allows us to provide incrementing sequential integer values to the rows in the result-set of a query based on logical order that is specified in the ORDER BY subclause of the OVER clause. The ROW_NUMBER function contains the OVER clause, which the function uses to determine the numbering behavior. The ORDER BY option, which determines the order in which the function applies the numbers, must be included in the query. We have the option of starting the numbers over whenever the values of a specified column change, called partitioning, with the PARTITION BY clause.

    Example Query 1: Basic use of ROW_NUMBER()
    SELECT CustomerID, FirstName + ' ' + LastName AS Name,
    ROW_NUMBER() OVER (ORDER BY LastName, FirstName) AS Row
    FROM Sales.Customer AS c
    INNER JOIN Person.Person AS p ON c.PersonID = p.BusinessEntityID;

    Example Query 2: Using ROW_NUMBER() in a CTE
    WITH customers AS(
      SELECT CustomerID, FirstName + ' ' + LastName AS Name,
      ROW_NUMBER() OVER (ORDER BY LastName, FirstName) AS Row
      FROM Sales.Customer AS c
      INNER JOIN Person.Person AS p ON c.PersonID = p.BusinessEntityID
      )
    SELECT CustomerID, Name, Row
    FROM customers
    WHERE Row > 50
    ORDER BY Row;

    Example Query 3: Using PARTITION BY option in ROW_NUMBER()
    SELECT CustomerID, FirstName + ' ' + LastName AS Name, c.TerritoryID,
    ROW_NUMBER() OVER (PARTITION BY c.TerritoryID ORDER BY LastName, FirstName) AS Row
    FROM Sales.Customer AS c
    INNER JOIN Person.Person AS p ON c.PersonID = p.BusinessEntityID;

  2. RANKRANK assigns the same number to the duplicate rows and skips numbers not used. It assigns an ascending, nonunique ranking number to a set of rows, giving the same number to duplicate rows; numbers are skipped for the number of rows that have the same value.
    If rows 2 and 3 are duplicates, RANK will supply the values 1, 3, 3, and 4.

    Example Query 1: Using RANK
    SELECT ROW_NUMBER() OVER (PARTITION BY SUBSTRING(LastName,1,2) ORDER BY LastName) AS RowNum,
    RANK() OVER(ORDER BY SUBSTRING(LastName,1,2) ) AS Ranking, CONCAT(FirstName,' ',LastName) AS CustomerName, UnclearedBalance
    FROM CustomerDetails.Customers
    WHERE UnclearedBalance is not null
    ORDER BY Ranking

  3. DENSE_RANK - This is similar to RANK, but each row number returned will be one greater than the previous setting, no matter how many rows are the same. DENSE_RANK doesn’t skip numbers.
    If rows 2 and 3 are duplicates, DENSE_RANK will supply the values 1, 2, 2, and 3.

    Example Query:
    SELECT ROW_NUMBER() OVER (PARTITION BY SUBSTRING(LastName,1,2) ORDER BY LastName) AS RowNum,
    DENSE_RANK() OVER(ORDER BY SUBSTRING(LastName,1,2) ) AS Ranking, CONCAT(FirstName,'',LastName) AS CustomerName, UnclearedBalance
    FROM CustomerDetails.Customers
    WHERE UnclearedBalance is not null
    ORDER BY Ranking

    Note: Both RANK and DENSE_RANK are similar to the ROW_NUMBER function, but they produce the same ranking value in all rows that have the same logical ordering value.
    The difference between RANK and DENSE_RANK is that RANK indicates how many rows have a lower ordering value, whereas DENSE_RANK indicates how many distinct ordering values are lower.
    For example, a rank of 9 indicates eight rows with lower values. A dense rank of 9 indicates eight distinct lower values.

  4. NTILE – The NTILE function assigns buckets to groups of rows. It allows us to associate the rows in the result with tiles (equally sized groups of rows) by assigning a tile number to each row. This takes the rows from the query and places them into an equal (or as close to equal as possible) number of specified numbered groups, where NTILE returns the group number the row belongs to. The value in parentheses after NTILE defines the number of groups to produce, so NTILE(25) would produce 25 groups of as close a split as possible of even numbers.

    Example Query:
    SELECT NTILE(10) OVER (ORDER BY LastName) AS BatchNumber, CONCAT(FirstName,' ',LastName) AS CustomerName, UnclearedBalance
    FROM CustomerDetails.Customers
    WHERE UnclearedBalance is not null
    ORDER BY BatchNumber
The following SQL query uses all the ranking functions in a single query:
 
SELECT orderid, custid, val,
ROW_NUMBER() OVER(ORDER BY val) AS rownum,
RANK() OVER(ORDER BY val) AS rank,
DENSE_RANK() OVER(ORDER BY val) AS dense_rank,
NTILE(100) OVER(ORDER BY val) AS ntile
FROM Sales.OrderValues
ORDER BY val;

No comments:

Post a Comment