Recent posts:
Recent replies:

Some time ago I was working on the issue of a fast data generation for an arbitrary data model.

Data model was connected (object properties) and so to generation was going in stages from low-cardinality independent tables (i.e. product) to high-cardinality dependent tables (i.e. orders). Furthermore tables did not have continuous IDs, there were gaps (due to sharding in my case but even if there's no hard delete we might want to exclude soft deleted rows).

Here's the approach I used.

1. For a given class build a local with all ranges of IDs:

/// d ##class().Ranges()
ClassMethod Ranges(class As %Dictionary.Classname, Output ranges)
{
    kill ranges    

    set table = ##class(%CSP.UI.Portal.SQL.Home).Quoter2(##class(%DeepSee.Utils).%GetSQLTableName(class))
    
    set rs = ##class(%SQL.Statement).%ExecDirect(,"SELECT ID FROM " _ table _ " ORDER BY ID ASC")

    do rs.%Next()
    set start = rs.ID
    set end = rs.ID
    while rs.%Next() {
        if rs.ID - 1 = end {
            d $i(end)
        } else {
            set ranges($i(ranges)) = $lb(start, end-start)
            set start = rs.ID
            set end = rs.ID
        }
    }
    
    set ranges($i(ranges)) = $lb(start, end-start)
}

This method accepts class name and returns this structure:

ranges(num) = $lb(startID, length)

2. Call GetRandomInRanges method to get random ID from ranges local:

/// d ##class().GetRandomInRanges()
ClassMethod GetRandomInRanges(ByRef ranges)
{
    set sum = 0
    
    for i=1:1:ranges {
        set sum = sum + $lg(ranges(i),2)
    }
    
    set threshold = $random(sum) + 1
    
    set sum = 0
    for chunk=1:1:ranges {
        set sum = sum + $lg(ranges(chunk),2)
        quit:sum>=threshold
    }    
    
    set val = ranges(chunk)
    if $listvalid(val) {
        if $ll(val)=2 {
            set val = $lg(val, 1) + $random($lg(val, 2))
        } else {
            set:$listvalid(val) val = $lg(val, ($random($ll(val))+1))
        }
    }
    
    quit val
}

This method guarantees that returned IDs would conform to the normal distribution.

Ranges should be called once and GetRandomInRanges  can be called as often as needed.

Open Exchange applications:
Followers:
Following:
Eduard has not followed anybody yet.
Global Masters badges: