Question
· May 20, 2020

How to select a random row from a table?

Hi community,

I need to write an SQL query  to fetch a random record from a table, that table has millions of data.

In postgresql, for example, there is a RANDOM() function to do something like that:

SELECT column FROM table
ORDER BY RANDOM()
LIMIT 1

Is it possible to do something like that in Caché?

Thanks in advance

Discussion (8)0
Log in or sign up to continue

One thing I've done to split machine learning datasets is to use an auxiliary table that maps IDs to a random number. I write a stored procedure that returns a random number. I create a table with two columns, one for the ID of the source table, and one to hold a random number. I populate the column for the source IDs:

insert into random_table (source_id)
select id from source_table

I then populate the column for the random number:

update random_table
set random_number = MySchema.MySP_Random(1E9)

Then I can select with an ORDER BY clause on the random number:

select top 10 source_id
from random_table
order by random_number, source_id

It depends on your use case whether this will be appropriate for a source table with millions of rows. It's an expensive way to select just one row.

As there is no default RANDOM() in Caché/IRIS SQL you have to write it yourself.
Create this class:

Class User.Henry
{ ClassMethod Random(maxval As %Integer = 2) As %Integer [ SqlName = RANDOM, SqlProc ]
{
 quit $random(maxval)
} }

Assuming your table has defautl ID (%Integer, MINVAL=1)

SELECT TOP 1  column FROM table
HAVING ID >= RANDOM(MAX(ID))

>= ensures that you just get an existing ID and don't drop on a deleted one

Of course, if you know in advance the highest existing ID   (e.g. maxid)  then RANDOM(maxid)  will do it.