How to select a random row from a table?

Primary tabs

Hi community,

I need to write an SQL query  to fetch a random record from a table, that table has millions of data.

In postgresql, for example, there is a RANDOM() function to do something like that:

SELECT column FROM table
ORDER BY RANDOM()
LIMIT 1

Is it possible to do something like that in Caché?

Thanks in advance

Replies

One thing I've done to split machine learning datasets is to use an auxiliary table that maps IDs to a random number. I write a stored procedure that returns a random number. I create a table with two columns, one for the ID of the source table, and one to hold a random number. I populate the column for the source IDs:

insert into random_table (source_id)
select id from source_table

I then populate the column for the random number:

update random_table
set random_number = MySchema.MySP_Random(1E9)

Then I can select with an ORDER BY clause on the random number:

select top 10 source_id
from random_table
order by random_number, source_id

It depends on your use case whether this will be appropriate for a source table with millions of rows. It's an expensive way to select just one row.

Jonathan, where do you perform the training/fit part of the AI/ML pipeline?

This is for a simple train-test split as part of testing for IntegratedML, currently in beta.

As there is no default RANDOM() in Caché/IRIS SQL you have to write it yourself.
Create this class:

Class User.Henry
{ ClassMethod Random(maxval As %Integer = 2) As %Integer [ SqlName = RANDOM, SqlProc ]
{
 quit $random(maxval)
} }

Assuming your table has defautl ID (%Integer, MINVAL=1)

SELECT TOP 1  column FROM table
HAVING ID >= RANDOM(MAX(ID))

>= ensures that you just get an existing ID and don't drop on a deleted one

Of course, if you know in advance the highest existing ID   (e.g. maxid)  then RANDOM(maxid)  will do it.

  

I'm curious to know what you're doing that you are looking for a random record in such a large dataset?  Sincerely interested in what you're doing so I can learn and not challenging the validity of your task!