Caché native random command
Hello. I have a query in the database that returns several rows, I need to get an "X" number of results randomly. I tried traditional commands from other languages but was unsuccessful. Does Caché have something like that?
Product version: Ensemble 2018.1
Hello Andre,
Take a look at these two other posts with very similar questions, maybe you can find something that works for you. Unfortunately, there doesn't seem to be a built-in random function on the SQL side (though on the Objectscript side there is $RANDOM.)
https://community.intersystems.com/post/how-select-random-row-table
https://community.intersystems.com/post/custom-query-returns-random-number-columns
I think you're looking for $RANDOM.
Here's the documentation.
https://docs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY...
You can just do "SELECT TOP X FROM ...." where X is the number of rows you're looking for. As mentioned in the above link
Kevin, the idea is good, I imagined a command like that, however, for some reason the return is always the same.
While I don't know the specifics in this case, the issue is that unpredictable isn't quite the same as random and being unpredictable is only one desired property for a random function (for starters, you might also want some sort of even distribution over the range). For example, a function that "randomly" returns either 1, 24 or 73 is unpredictable but isn't the same as a function that returns random numbers over the entire range 1-100.
I'm posting this as a potential solution. I imagine there are smarter ways of doing this. Hoping someone will jump in and show me the way 😁
I've written a classmethod that accepts 3 arguments: the lower and upper limits of the IDs to select, along with the number of records you'll want to select:
The method can be called as a custom SQL function in a subquery:
The query above will select up to 10 rows randomly from table MyTable.
Caveats: This could take a long time to run if your range from minimum to maximum %Id is large. And it's not guaranteed to return the number of rows specified as the 3rd argument (there may be deleted records, or insufficient random values generated before the upper limit for %Ids is reached). Finally, it assumes that %Id is numeric.
EDIT: And as pointed out by @Julius Kavay, @Robert Cemper has a better idea (he usually does 😉)
Some time ago I was working on the issue of a fast data generation for an arbitrary data model.
Data model was connected (object properties) and so to generation was going in stages from low-cardinality independent tables (i.e. product) to high-cardinality dependent tables (i.e. orders). Furthermore tables did not have continuous IDs, there were gaps (due to sharding in my case but even if there's no hard delete we might want to exclude soft deleted rows).
Here's the approach I used.
1. For a given class build a local with all ranges of IDs:
This method accepts class name and returns this structure:
2. Call
GetRandomInRanges
method to get random ID fromranges
local:This method guarantees that returned IDs would conform to the normal distribution.
Ranges
should be called once andGetRandomInRanges
can be called as often as needed.The simplest solution was already answered by Robert Cemper in https://community.intersystems.com/post/how-select-random-row-table. I just want to show a more "universal variant" of that solution.
First, create an SQL stored procedure
then make your query as follows:
This has following advantages:
1) works with all classes, i.e. the ID column has not to be an integer (greater 0), can be a compound column too (like part1||part2, etc)
2) by adjusting the comparison:
For testing of edge conditions you can use
With the right side of the comparison you can fine tune the distances between the returned rows.
For the dummy argument in the above function you can use an arbitrary column name, the simplest is to use ID because the ID column always exists, it's purpose is to force the SQL-Compiler to call this function for each row (thinking, the result of the Random() function is row-dependet). A comparsion like Random(100)<50 is executed just once. Roberts solution works too because he uses Random(100)<ID but this works only for tables where ID is a Integer (>0). You can verify this by just issuing a simple query
You will see (by repeatedly executing the above query) either 10 (subsequente) rows or nothing
Social networks
InterSystems resources
Log in or sign up
Log in or create a new account to continue
Log in or sign up
Log in or create a new account to continue
Log in or sign up
Log in or create a new account to continue