Discussion
· Apr 12

Code Golf - Maximum Coverage

There is a list of numbers from 1 to 190.

AllList="1,2,3,4,5,6,7,8,9,.....,187,188,189,190"

There is a collection of sets of these values:

List(1)="3,5,6,7,9"
List(2)="1,2,6,9"
List(3)="5,8,9"
List(4)="2,4,6,8"
List(5)="4,7,9"

What is an elegant approach in Object Script to pick the least number of list items:

  • List(1)
  • List(5)
  • List(n)

That together would cover as many numbers as possible from the AllList.

Interested in best coverage over efficiency.

Discussion (7)1
Log in or sign up to continue

Two questions.

First, is the expected answer for the above collection of five sets: 1,2,4 (where the sets 1, 2 and 4 all together cover 9 different numbers: 3,5,6,7,9,1,2,4,8)?

Second, may any of the sets contain a number, which is not included in the AllList?
If the answer is a no (what I assume), then the AllList is't needed, because all numbers are in 'AllList', hence it's enough to search for a minimum number of sets with the maximum of distinctive elements.

Hi Julius, Thanks for clarification questions.

1. Yes looking to understand how developers may approach finding sepecific sets as your example: sets 1,2,4.

2. No, the AllList is informative about possible numbers avaialble to use in the sets in a specific scenario. Following your question maybe a generic solution would be flexible for any number in search for "minimum number of sets to include maximum distinct elements".

Since you want the smallest possible number of sets that cover the maximum number of elements, one of the simplest solutions (from a programming point of view) is to check all possible combinations of K out of N sets.

However, keep in mind that combinations grow (very) quickly. If your N tends to leave the single-digit range, you may need to resort to linear programming methods.

Some values for all combinations (K out of N) to see, how they grow:
 N : 1  2  3   4   5     10      15         20          25                     50
cnt: 1  3  7  15  31  1,023  32,767  1,048,575  33,554,431  1,125,899,906,842,623

Class DC.MaxCoverage Extends %RegisteredObject
{

ClassMethod Test(case = 0)
{
	if case=4 {
		set list($i(list))="1,2,3"
		set list($i(list))=""
		set list($i(list))="1,2,3"
	}
	elseif case=3 {
		set list($i(list))="1,2,3"
		set list($i(list))="4,5,6"
		set list($i(list))="1,2,3"
		
	} elseif case=2 {
		set list($i(list))="1,2,3"
		set list($i(list))="7,8,9"
		set list($i(list))="11,12,13,14,15"
		set list($i(list))="2,8,12"
		set list($i(list))="15,16,17"
		set list($i(list))="12,17,19,21,22,23"
		set list($i(list))="11,21,31,32,33"
		set list($i(list))="34,35,36"
		
	} elseif case=1 {
		set list($i(list))="1,2,3,4"
		set list($i(list))="5,6,7,8,9,10,11,12"
		set list($i(list))="2,3,4,5,6,7"
		
	} else { // the original testcase
		set list($i(list))="3,5,6,7,9"
		set list($i(list))="1,2,6,9"
		set list($i(list))="5,8,9"
		set list($i(list))="2,4,6,8"
		set list($i(list))="4,7,9"
	}
	
	quit ..MinMax(.list)
}

/// minimum number of sets with maximum number of coverage
/// 
ClassMethod MinMax(ByRef list)
{
	// I assume, all numbers are
	// a) integers and greater then 0 (else map the numbers to integers in the toBits() method)
	// b) and all sets have less then ca. 262100 elements
	// 
	set N=list
	for i=1:1:N s set(i)=..toBits(list(i))	// convert each list into a bitstring
	
	set max=0, min=N, lst=""				// max=covered numbers, min=required sets, lst=list of combinations
	
	for k=1:1:N {									// compute all (k out of n) combinations for each k
		set s=0, i(0)=0								// --+ compute k out of n
1		set s=s+1, i(s)=i(s-1), e(s)=N-k+s			//   |
2		for i(s)=i(s)+1:1:e(s) goto 1:s<k do chk	//   | check a particular combination
3		set s=s-1 if s goto 2:i(s)<e(s),3			// --+
	}
	quit min_" set(s) covers "_max_" elements with sets: "_lst	// return the result	
	
chk	set c=i(1),v=set(i(1))									// take the first set
	for i=2:1:s s c=c_","_i(i), v=$bitlogic(v|set(i(i)))	// OR it with the next
	set v=$bitcount(v,1)
	
	if v>max { set max=v,min=s,lst=c }
	elseif v=max,s<min { set min=s,lst=c }
	elseif v=max,s=min { s lst=lst_" or "_c }
}

/// convert a list of numbers into bitstring, i.e.: 1,2,4 --> 1101
/// 
ClassMethod toBits(x)
{
	set b=""
	for i=1:1:$l(x,",")-(x="") set $bit(b,$p(x,",",i))=1
	quit b
}

/// Sum of all combinations of N elements
/// 
ClassMethod AllComb(N)
{
	s s=0
	f k=1:1:N s s=s+..Comb(k,N)
	q s
}

/// Count of combinations of K elements out of N elements
/// 
ClassMethod Comb(k, n)
{
	s c=1
	f k=1:1:k s c=c/k*n, n=n-1
	q c
}

}

Tests


USER>for i=0:1:4 write ##class(DC.MaxCoverage).Test(i),!
3 set(s) covers 9 elements with sets: 1,2,4
2 set(s) covers 12 elements with sets: 1,2
7 set(s) covers 23 elements with sets: 1,2,3,5,6,7,8
2 set(s) covers 6 elements with sets: 1,2 or 2,3
1 set(s) covers 3 elements with sets: 1 or 3

USER>

I think you can guarantee that the picked list would:

  1. Provide the fullest possible coverage of numbers
  2. Skip at least fully superfluous lists

And do it in O(2n) where n is list count (assuming lists are of similar length).

 

Before anything, zero-init a list for each number (called a number list). You'll need to do two passes over your lists.  

On a first pass, check each list value against your number list. If at least one corresponding value in a number list is a zero (meaning our current list has a number we did not encounter before), add the list to the result and increment each position in a number list that is present in a current list by 1.

In our case:

 

Numbers: 0, 0, 0, 0, 0, 0, 0, 0, 0

List(1)="3,5,6,7,9"

As Numbers(3)==0, we add List(1) to the output and modify Numbers:

Numbers: 0, 0, 1, 0, 1, 1, 1, 0, 1

 

In a similar vein, we iterate all our lists (and add all of them, actually); our Numbers after the first pass should look like this:

Numbers: 1, 2, 1, 2, 2, 3, 2, 2, 4

Lists: 1, 2, 3, 4, 5

 

Now do a second pass, only over lists added on a first pass. If every element in a list has a value >1 in a number list, remove the list and decrease the corresponding number list by 1.

 

List(1)="3,5,6,7,9"

Numbers: 1, 2, 1, 2, 2, 3, 2, 2, 4

Numbers(3)==1, so this list remains.

 

List(2)="1,2,6,9"

Numbers(1)==1, so this list remains.

 

List(3)="5,8,9"

Numbers(5)==2>1, Numbers(8)==2>1,  Numbers(5)==4>1,  so we are removing this list, new numbers:

Numbers: 1, 2, 1, 2, 1, 3, 2, 1, 3

 

List(4)="2,4,6,8"

Numbers(8)==1, so this list remains.

 

List(5)="4,7,9"

Numbers(4)==2>1, Numbers(7)==2>1,  Numbers(9)==3>1,  so we are removing this list, new numbers:

Numbers: 1, 2, 1, 1, 1, 3, 1, 1, 2

Lists: 1, 2, 4

 

This, however, does not guarantee that it's a minimum amount of lists, but entirely superfluous lists would be removed, and all possible numbers would be present (have at least one reference in a number list).

 

Another way I thought it could be resolved is by transposing the lists into numbers like this:


Number(1)=$lb(2)
Number(2)=$lb(2, 4)
Number(3)=$lb(1)
Number(4)=$lb(4, 5)
Number(5)=$lb(1, 3)
Number(6)=$lb(1, 2, 4)
Number(7)=$lb(1, 5)
Number(8)=$lb(3, 4)
Number(9)=$lb(1, 2, 3, 5)


After that is done, any number with just one reference must be picked (meaning it's present in only one list). In our case, numbers 1 and 3, resulting in picking lists 2 and 1.


All numbers in lists 1 and 2 must also be picked: 1, 2, 3, 5, 6, 7, 9


Next, we delete Numbers that we already picked, leaving us with:

Number(4)=$lb(4, 5)
Number(8)=$lb(3, 4)

From the remaining Numbers, we need to remove lists that we already picked (so 1 and 2), but in your example, they are not present anyway.


However, after this cleanup, we might encounter a situation where a number is present in only one list. In that case, the first step needs to be redone again until we arrive at a situation where no number is present only in one list, so in our case:


Number(4)=$lb(4, 5)
Number(8)=$lb(3, 4)


After that, pick a list with the largest amount of different numbers - 4 in our case and repeat from the beginning. Eventually, you'll arrive at empty Numbers local, meaning the task is complete.

my approach would be:

1. run through each list and generate a value - list index
     e.g. List(3) would result in a index ^||idx(5,3)="" ^||idx(8,3)="" ^||idx(9,3)=""
     also add the list to a still valid list
2. iterate over the index find the first value with only one entry and add that list to result list, then run through the list and remove all value index entries for values contained in the list. remove the list from the still valid list.
3. if no value only has just one list entry, pick the list with the most entries that is on the still valid list. iterate over the list and check each value against the value index, if the value is still in the index remove the value index and add list to the result list. remove the list from the still available list
4. iterate above until either value index has no more entries, or the still valid list has no more entries.
5. result list contains all lists required for maximum coverage
Hope that makes sense.

Here is the implementation:
 

ClassMethod MaxCoverage(ByRef SourceLists, Output Solution as %List) {
    /*
    1. run through each sourcelist and generate a value - list index
     e.g. List(3) would result in a index ^||idx(5,3)="" ^||idx(8,3)="" ^||idx(9,3)=""
     also add the list to a still valid list
    2. iterate over the index find the first value with only one entry and add that list to result list, then run through the list and remove all value index entries for values contained in the list. remove the list from the still valid list.
    3. if no value only has just one list entry, pick the list with the most entries that is on the still valid list. iterate over the list and check each value against the value index, if the value is still in the index remove the value index and add list to the result list. remove the list from the still available list
    4. iterate above until either value index has no more entries, or the still valid list has no more entries.
    5. result list contains all lists required for maximum coverage
    */
    kill Solution
    kill ^||lengthIdx
    kill ^||idx
    kill ^||covered
    set idx=""
    for {
        set idx=$order(SourceLists(idx))
        quit:idx=""
        set listid=0
        set stillAvailable(idx)=""
        set ^||lengthIdx($listlength(SourceLists(idx)),idx)=idx
        while $listnext(SourceLists(idx),listid,listval) {
            set ^||idx(listval,idx)=""
        }
    }


    set listid=""
    // for loop - exit when either ^||idx has no more entries or the still valid list has no more entries
    for {
        if $data(stillAvailable)=0 {
            // no more lists to process
            quit
        }
        if $data(^||idx)=0 {
            // no more values to process
            quit
        }
        // find the first value with only one entry
        set val=""
        set found=0
        for {
            quit:found=1
            set val=$order(^||idx(val))
            quit:val=""
            set listid=""
            for {
                set listid=$order(^||idx(val,listid))
                quit:listid=""
                // found a value check now if ther eis more than one entry
                if $order(^||idx(val,listid))="" {
                    // found a value with only one entry
                    set found=1
                    quit
                }
            }
        }

        if found=0 {
            // haven't found one yet so use the one with the most entries ^||lengthIdx(
            set res=$query(^||lengthIdx(""),-1,val)
            if res'="" {
                set listid=val
            } else {
                // no more entries
                // should never hit this
                quit
            }
        }

        if listid'="" {
            // got a list now process it
            // first remove the list from the available lists
            kill stillAvailable(listid)
            kill ^||lengthIdx($listlength(SourceLists(listid)),listid)
            // iterate trhough the list, check value against the value index
            set listval=0
            w !,"found listid:"_listid,!

            set ptr=0
            set added=0
            while $listnext(SourceLists(listid),ptr,listval) {
                // check if the value is still in the index
                w !,"   checking value:"_listval
                If $INCREMENT(^||covered(listval))
                if $data(^||idx(listval)) {
                    w " - found it!"
                    // remove the value from the index
                    kill ^||idx(listval)
                    // add the list to the result list
                    if added=0 {
                        set Solution=$select($get(Solution)="":$listbuild(listid),1:Solution_$listbuild(listid))
                        set added=1
                    }
                }
            }
        }
    }
}

And the execution result:

DEV>set List(1)=$lb(3,5,6,7,9),List(2)=$lb(1,2,6,9),List(3)=$lb(5,8,9),List(4)=$lb(2,4,6,8),List(5)=$lb(4,7,9)

DEV>d ##class(Custom.codegolf).MaxCoverage(.List,.res)
found listid:2

   checking value:1 - found it!
   checking value:2 - found it!
   checking value:6 - found it!
   checking value:9 - found it!
found listid:1

   checking value:3 - found it!
   checking value:5 - found it!
   checking value:6
   checking value:7 - found it!
   checking value:9
found listid:5

   checking value:4 - found it!
   checking value:7
   checking value:9
found listid:4

   checking value:2
   checking value:4
   checking value:6
   checking value:8 - found it!

DEV>zw res
res=$lb("2","1","5","4")

DEV>zw ^||lengthIdx
^||lengthIdx(3,3)=3

DEV>zw ^||covered
^||covered(1)=1
^||covered(2)=2
^||covered(3)=1
^||covered(4)=2
^||covered(5)=1
^||covered(6)=3
^||covered(7)=2
^||covered(8)=1
^||covered(9)=3

Thanks for time and many suggestions.

Have also been brainstorming what is already in the platform that I could leverage to assist speed up functionality.

Approach is to loop over extent to find the next "set record" that has the most additional numbers available that are not previously selected.

I came up with using Bit Strings instead of IRIS lists at language level.

This allows efficient bit operations via BitLogic "OR" operator.

Storing the BitStrings in a calculated property on record insert and update, mitigates recalculating a Bit String from source string when iterating over an extent each time, looking for the next best record to use.

FInally wrapped this up in a Class Query.

Class TOOT.Data.Instrument Extends %Persistent
{

Property NoteList As %String(MAXLEN = 8000, TRUNCATE = 1);

Property NoteListEmbedding As %Embedding(MODEL = "toot-v2-config", SOURCE = "NoteList");

Property NoteListBit As %Binary [ SqlComputed,SqlComputeOnChange = NoteList];

/// Calculates and stores a bit string during record insert or update
ClassMethod NoteListBitComputation(cols As %Library.PropertyHelper) As %Binary
{
    set bitString=""
    set numlist=cols.getfield("NoteList")
    set numsLen=$L(numlist,",")
    for i=1:1:numsLen {
        set val=$P(numlist,",",i)
        continue:val<6
        continue:$D(found(val))
        Set found(val)=""
        Set $Bit(bitString,val)=1
    }
    return bitString
}

/// Callable query for getting best document based on bitflags
Query BestNoteList(Top As %Integer=5, Accumulate As %Boolean=0) As %Query(ROWSPEC = "ID:%String,Count:%Integer") [SqlProc]
{
}
ClassMethod BestNoteListExecute(ByRef qHandle As %Binary, Top As %Integer=5, Accumulate As %Boolean=0) As %Status
{
    Set:Top<1 Top=5
    Set qHandle=$LB(Top,0,"")
    Quit $$$OK
}

ClassMethod BestNoteListFetch(ByRef qHandle As %Binary, ByRef Row As %List, ByRef AtEnd As %Integer = 0) As %Status 
[ PlaceAfter = BestNoteListExecute ]
{
    Set qHandle=$get(qHandle)
    Return:qHandle="" $$$OK
    Set Top=$LI(qHandle,1)
    Set Counter=$LI(qHandle,2)
    Set BitString=$LI(qHandle,3)
    Set Counter=Counter+1
    If (Counter>Top) {
        Set Row=""
        Set AtEnd=1
        quit $$$OK
    }
    Set statement=##class(%SQL.Statement).%New()
    Set tSC=statement.%PrepareClassQuery("TOOT.Data.Instrument","Extent")
    Set tResult=statement.%Execute()
    Set MaxCount=$BITCOUNT(BitString,1)
    Set MaxBitStr=""
    Set MaxId=0
    While tResult.%Next() {
        Set tmpId=tResult.%Get("ID")
        Set tmpBit=##class(TOOT.Data.Instrument).%OpenId(tmpId,0).NoteListBit
        Set tmpBit=$BITLOGIC(BitString|tmpBit)
        Set tmpCount=$BITCOUNT(tmpBit,1)
        If tmpCount>MaxCount {
            Set MaxCount=tmpCount
            Set MaxBitStr=tmpBit
            Set MaxId=tmpId
        }
    }
    Do tResult.%Close()
    If (MaxId'=0) {
        Set Row=$LB(MaxId,MaxCount)
        Set AtEnd=0
        Set $LI(qHandle,2)=Counter
        Set $LI(qHandle,3)=MaxBitStr
    } Else {
        Set Row=""
        Set $LI(qHandle,2)=Counter
        Set AtEnd=1
    }
    Return $$$OK
}

ClassMethod BestNoteListClose(ByRef qHandle As %Binary) As %Status [ PlaceAfter = BestNoteListFetch ]
{
    Set qHandle=""
    Quit $$$OK
}
}

Calling from the Management Portal:

Where ID is the Record ID and Count is the increasing coverage of bitflags with each itteration of appending a new record.

 

Temporarily added logging to the Compute Method to confirm not being called during the query running.