Finding the median in InterSystems Caché SQL
Hi,
Is there a way to find the median in InterSystems Caché SQL? I know it is not available as an aggregate function. In SQL Server I could try something like:
SELECT ( (SELECT MAX(Score) FROM (SELECT TOP 50 PERCENT Score FROM Posts ORDER BY Score) AS BottomHalf) + (SELECT MIN(Score) FROM (SELECT TOP 50 PERCENT Score FROM Posts ORDER BY Score DESC) AS TopHalf) ) / 2 AS Median
However, Caché has no PERCENT keyword either. Any suggestions?
Thanks
Comments
See Median in SQL
As of IRIS 2021.1, we allow users to create their own aggregate functions. Perhaps there's a beautiful community contribution in there? :-)
You could build something simple: stuff all values into a temporary global (whose name you pass along as the state), sort them (using $sortbegin/$sortend) while maintaining a count, and then pick out the actual median value in the FINALIZE method.
Two caveats:
- don't bother implementing a MERGE function. We don't support parallel execution just yet.
- in some query execution plans, the FINALIZE method may be called more than once (e.g. if the aggregate is used in both the SELECT list and, say, a HAVING clause). So you may want to cache your result somewhere. A PPG will do, as this happens in the last single-process mile of query processing and the calls are typically mere milliseconds apart.
We'll be removing these annoyances in a future version.
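To make the shape of that idea concrete, here is a minimal sketch in Python. It uses the standard sqlite3 module's user-defined aggregates as a stand-in, so it is not IRIS code and says nothing about the actual IRIS aggregate API. The class name Median, the in-memory list (standing in for the temporary global), the Person table, and the sample ages are all assumptions made for illustration. It collects values per row, sorts them once at the end, and caches the result so a repeated finalize call stays cheap.

```python
import sqlite3


class Median:
    """Hypothetical aggregate state: collect values, pick the middle at the end."""

    def __init__(self):
        self.values = []    # stands in for the temporary global
        self.result = None  # cached median, in case finalize runs more than once
        self.done = False

    def step(self, value):
        # Called once per row; NULLs are skipped, as SQL aggregates usually do.
        if value is not None:
            self.values.append(value)

    def finalize(self):
        if not self.done:
            ordered = sorted(self.values)
            n = len(ordered)
            if n == 0:
                self.result = None
            elif n % 2 == 1:
                self.result = ordered[n // 2]
            else:
                # Even count: average the two middle values.
                self.result = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
            self.done = True
        return self.result


conn = sqlite3.connect(":memory:")
conn.create_aggregate("median", 1, Median)  # register under the SQL name "median"
conn.execute("CREATE TABLE Person (Age INTEGER)")
conn.executemany("INSERT INTO Person (Age) VALUES (?)", [(31,), (18,), (47,), (25,)])
median_age = conn.execute("SELECT median(Age) FROM Person").fetchone()[0]
print(median_age)
```

An IRIS version would keep the same three pieces (accumulate, finalize, cached result) but hold the state in a global rather than a Python list.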
SELECT TOP 1
  AVG(main.age) AS _Average,
  MIN(main.age) AS _Min,
  CASE WHEN %vid = COUNT(main.age)/2 THEN main.age ELSE 0 END + MAX(CASE WHEN %vid = COUNT(main.age)/2 THEN main.age ELSE 0 END) AS _Median,
  MAX(main.age) AS _Max
FROM ( SELECT TOP ALL a.Age FROM Sample.Person a ORDER BY a.Age ) main
Thanks Randy!