Hi Community,
In this article, I listed 5 useful SQL functions with explanations and query examples 👇🏻
These 5 functions are
- COALESCE
- RANK
- DENSE_RANK
- ROW_NUMBER
- Function to Get Running Totals
So Let us start with COALESCE function
#COALESCE
The COALESCE function evaluates a list of expressions in left-to-right order and returns the value of the first non-NULL expression. If all expressions evaluate to NULL, NULL is returned.
Following statement will return first not null value which is 'intersystems'
SELECT COALESCE(NULL, NULL, NULL,'intersystems', NULL,'sql')
SQLSQL
Let us create below table for further example
CREATE TABLE EXPENSES(
TDATE DATE NOT NULL,
EXPENSE1 NUMBER NULL,
EXPENSE2 NUMBER NULL,
EXPENSE3 NUMBER NULL,
TTYPE CHAR(30) NULL)
SQLSQL
Now let us insert some dummy data to test our function
INSERT INTO sqluser.expenses (tdate, expense1,expense2,expense3,ttype )
SELECT {d'2023-01-01'}, 500,400,NULL,'Present'
UNION ALL
SELECT {d'2023-01-01'}, NULL,50,30,'SuperMarket'
UNION ALL
SELECT {d'2023-01-01'}, NULL,NULL,30,'Clothes'
UNION ALL
SELECT {d'2023-01-02'}, NULL,50,30 ,'Present'
UNION ALL
SELECT {d'2023-01-02'}, 300,500,NULL,'SuperMarket'
UNION ALL
SELECT {d'2023-01-02'}, NULL,400,NULL,'Clothes'
UNION ALL
SELECT {d'2023-01-03'}, NULL,NULL,350 ,'Present'
UNION ALL
SELECT {d'2023-01-03'}, 500,NULL,NULL,'SuperMarket'
UNION ALL
SELECT {d'2023-01-04'}, 200,100,NULL,'Clothes'
UNION ALL
SELECT {d'2023-01-06'}, NULL,NULL,100,'SuperMarket'
UNION ALL
SELECT {d'2023-01-06'}, NULL,100,NULL,'Clothes'
SQLSQL
Select the data
Now by using COALESCE function we will retrieve first not NULL value from expense1,expense2 and expense 3 columns
SELECT TDATE,
COALESCE(EXPENSE1,EXPENSE2,EXPENSE3),
TTYPE
FROM sqluser.expenses ORDER BY 2
SQLSQL
#RANK vs DENSE_RANK vs ROW_NUMBER functions
- RANK()— assigns a ranking integer to each row within the same window frame, starting with 1. Ranking integers can include duplicate values if multiple rows contain the same value for the window function field.
- ROW_NUMBER() — assigns a unique sequential integer to each row within the same window frame, starting with 1. If multiple rows contain the same value for the window function field, each row is assigned a unique sequential integer.
- DENSE_RANK() leaves no gaps after a duplicate rank.
In SQL, there’s several ways that you can assign a rank to a row, which we’ll dive into with an example. Consider once again the same example as before, but now we want to know what is the highest expenses.
We want to know where do I spend the most money. There are different ways to do it. We can use all ROW_NUMBER()
, RANK()
and DENSE_RANK()
. We will order the previous table using all three functions and see what are the main differences between them using the following query:
Below is our query:
The main difference between al three functions is the way they deal with ties. We will further deep-dive their differences:
ROW_NUMBER()
returns a unique number for each row starting at 1. When there are ties, it arbitrarily assigns a number if a second criteria is not defined.RANK()
returns a unique number for each row starting at 1, except for when there are ties, then it will assign the same number. As well, a gap will follow a duplicate rank.DENSE_RANK()
leaves no gaps after a duplicate rank.
#Calculating Running Totals
The running total is probably one of the most useful window functions especially when you want to visualize growth. Using a window function with SUM()
, we can calculate a cumulative aggregation.
To do so, we just need to sum a variable using the aggregator SUM()
but order this function using a TDATE column.
You can observe the corresponding query as follows:
As you can observe in the table above, now we have the accumulated aggregation of the amount of money spent as the date passes by.
Conclusion
SQL is great. Functions used above might be useful when dealing data analysis, data science, and any other data-related field.
This is why you should care to keep improving your SQL skills.
Thanks
SUPER examples !
Thanks @Robert Cemper
Thank you Muhammad. We tend to forget about the power of SQL - add to this classmethods defined as [ SqlProc ] and you have amazing capabilities in IRIS
Thanks @Andre Wessels
really nice functions - thank you for highlighting them!
Thanks @Ben Spead
I'm loving the examples, however I'm a little confused by the last running total. Either I'm misunderstanding what it's trying to do or its not doing what I think it should.
The ACC_EXPENSES column seems to be only taking the first value for a given date and adding it into the running total instead of all the values for a given date. In the below image the date 1/1/23 has 3 rows, 300, 230 & 330, which total 860 as seen in the TOTAL_EXPENSES column.
The ACC_EXPENSES value for 1/2/23 should be the sum of 1/1/23, (860) and the sum of 1/2/23, (880), but instead its showing 730, which is the sum of the 2 rows I've highlighted above.
Hi @Mark Charlton
Yes, you are absolutely right.
As we are calculating running total of aggregate value so we need to add one more aggregate function in order to get correct value.
I have updated the query by adding one more SUM() function.
Thanks for highlighting it.
Regards
Waseem
Great functions that I didn't know about. Thanks!
Thanks @Marcelo Witt
Nice example of using Window Functions @Muhammad Waseem !
For those trying to test this on a not-so-recent version: this syntax was first introduced in IRIS 2021.1 and a handful (e.g.
LEAD()
,LAG()
andNTH_VALUE()
) were added in 2022.3Thanks @Benjamin De Boe
💡 This article is considered as InterSystems Data Platform Best Practice.