It depends on...

Who is sitting at the other end? A Cache/IRIS server or a third-party product?

If it is Cache/IRIS: mirroring and shadowing are the keywords to look for. If it is a third-party SQL database: how quickly (how often) do you want to push your updates? Once a day or in (nearly) real time?

I did something like that several years ago... the procedure is (just as a starting point):

Our application uses objects, so all the involved classes have a %OnAfterSave() method, something like this:

Method %OnAfterSave(insert As %Boolean) As %Status
{
   do ..addToTransfer(..%Id())
   quit $$$OK   // the callback is declared As %Status, so it has to return one
}

with some extra logic, such as not adding the record if it is already in the transfer queue. If you use SQL instead of objects, triggers are your friend.

We also have a task which creates (based on the class definition) a series of INSERT/UPDATE statements and does the transfer with the help of %SQLGatewayConnection.
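To make the idea a bit more concrete, here is a minimal sketch in Python (not ObjectScript, and not our production code). The names TransferQueue, build_insert and build_update are hypothetical; the column list stands in for whatever you would read from the class definition, and the generated statements use "?" placeholders as you would bind them through an SQL gateway.

```python
class TransferQueue:
    """Hypothetical stand-in for the bookkeeping behind addToTransfer()."""

    def __init__(self):
        # A dict keeps insertion order and guarantees unique keys.
        self._pending = {}

    def add(self, class_name, record_id):
        """Queue a record once; return False if it is already waiting."""
        key = (class_name, record_id)
        if key in self._pending:
            return False
        self._pending[key] = True
        return True

    def drain(self):
        """Return all queued (class, id) pairs and empty the queue."""
        items = list(self._pending)
        self._pending.clear()
        return items


def build_insert(table, columns):
    """Build a parameterized INSERT for the given column names."""
    placeholders = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"


def build_update(table, columns, key_column="ID"):
    """Build a parameterized UPDATE keyed on key_column (assumed to be ID)."""
    assignments = ", ".join(f"{column} = ?" for column in columns)
    return f"UPDATE {table} SET {assignments} WHERE {key_column} = ?"
```

The task would then drain the queue periodically and decide per record whether to send the INSERT or the UPDATE form.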

The simplest solution was already answered by Robert Cemper in https://community.intersystems.com/post/how-select-random-row-table. I just want to show a more "universal variant" of that solution.

First, create an SQL stored procedure

Class SP.Utils Extends %RegisteredObject
{
ClassMethod Random(number As %Integer, dummy As %String) As %Integer [SqlProc]
{
   quit $random(number) // we do not use dummy but we need it!!
}
}

then make your query as follows:

select top 10 * from whatever.table where SP.Utils_Random(100,ID)<50

This has the following advantages:

1) it works with all classes, i.e. the ID column does not have to be an integer (greater than 0); it can be a compound column too (like part1||part2, etc.)

2) by adjusting the comparison:

Random(1000,ID) < 50   // gives you greater distances between the returned rows than

Random(1000,ID) < 500  // does

For testing of edge conditions you can use

Random(1000,ID)<0    // no rows will be returned or

Random(1000,ID)<1000 // all rows will be returned

With the right side of the comparison you can fine-tune the distances between the returned rows.

For the dummy argument of the above function you can use an arbitrary column name; the simplest is ID, because the ID column always exists. Its purpose is to force the SQL compiler to call the function for each row (the compiler then assumes that the result of Random() is row-dependent). A comparison like Random(100)<50 is executed just once. Robert's solution works too, because he uses Random(100)<ID, but this works only for tables where ID is an integer (>0). You can verify this by issuing a simple query:

select top 10 * from your.table where SP.Utils_Random(100)<50

You will see (by repeatedly executing the above query) either 10 consecutive rows or nothing.
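The difference between the two variants can be simulated outside of SQL. The following is only a Python sketch of the behaviour described above, not of how the SQL engine is implemented; the function names and the rng parameter are my own.

```python
import random


def filter_per_row(rows, modulus, threshold, rng):
    """Row-dependent call: a fresh random number is drawn for every row."""
    return [row for row in rows if rng.randrange(modulus) < threshold]


def filter_once(rows, modulus, threshold, rng):
    """Row-independent call: one random number decides for all rows."""
    keep = rng.randrange(modulus) < threshold
    return list(rows) if keep else []


def top(rows, count):
    """Equivalent of SELECT TOP count."""
    return rows[:count]


if __name__ == "__main__":
    rng = random.Random()
    ids = list(range(1, 1001))
    print(top(filter_per_row(ids, 100, 50, rng), 10))  # scattered ids
    print(top(filter_once(ids, 100, 50, rng), 10))     # first 10 ids or nothing
```

Running it a few times shows the same picture as the two queries: scattered rows in the first case, and either the first ten rows or an empty result in the second.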

Just my 2 cents: CACHE.DAT and IRIS.DAT can (usually) be compressed well; the keywords are WinZip and WinRAR (I prefer WinRAR over WinZip). WinRAR, despite the "win" in its name, is also available for Linux.


An example: winzip turned a 16GB CACHE.DAT into 3.45GB,  winrar (mode=best) topped this with 2.2GB, but as always, your values will depend on your data. And mind the time you need to compress and decompress the files, which, of course will depend on your hardware...

For example (command line)

rar a -m4 -md512k -v4g <pathTo>cachetransfer <pathTo>cache.dat

will create as many compressed files as needed, each (except the last one) with a size of 4GB, with good compression (-m4) using a dictionary size of 512KB (-md512k).

Assuming that 4TB compresses to 1TB, you will get roughly 250 (*.rar) files in total, each with a size of 4GB.
When the first 4GB (rar) file is ready, start the transfer in parallel (one job does the compression and the other(s) work on the transfer; maybe you have multiple internet connections). Further, suppose you have a continuous (internet) connection of 100 Mbps between your system and the target system; then, again roughly, the job is done in 28 hours... better than transferring 4TB in a week or more (and it's easier to restart a 4GB file than a 4TB file).
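The estimate can be reproduced with a few lines of Python. This is just the arithmetic, using decimal units (1TB = 10^12 bytes); the efficiency parameter is my own assumption to account for protocol overhead and a line that is not fully saturated.

```python
TB = 10**12  # decimal terabyte, in bytes
GB = 10**9   # decimal gigabyte, in bytes


def volume_count(compressed_bytes, volume_bytes):
    """Number of archive volumes, rounding the last partial one up."""
    return -(-compressed_bytes // volume_bytes)


def transfer_hours(total_bytes, link_bits_per_second, efficiency=1.0):
    """Hours to move total_bytes over a link at the given efficiency (0..1)."""
    seconds = total_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    print(volume_count(1 * TB, 4 * GB))
    print(round(transfer_hours(1 * TB, 100 * 10**6), 1))
    print(round(transfer_hours(1 * TB, 100 * 10**6, efficiency=0.8), 1))
```

At the full line rate this gives about 22 hours for 1TB; with an assumed 80% effective throughput it comes out at just under 28 hours.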

Counterquestion, do you have an example of a 'non-ASCII' char?

Codepoints 0x00-0x7F (0-127) form the Unicode block "C0 Controls and Basic Latin", i.e. ASCII

Codepoints 0x80-0xFF (128-255) form the block "C1 Controls and Latin-1 Supplement", i.e. the upper half of Latin1

Take a look at https://www.unicode.org/charts/PDF/U0080.pdf

For example, Ä and ä are the German umlaut-A and umlaut-a, respectively:

$ascii("Ä") --> 196 and $ascii("ä") --> 228. Type in a terminal session on your system: write $char(196) --> Ä

Download and compare the above pdf with your iso-8859-1 data, there should be no difference.

If you get data as ISO-8859-1 (aka Latin1) and have a Unicode (IRIS/Cache) installation, then usually you have nothing to do (except to process the data). What do you mean by "convert the text to UTF-8"? In IRIS/Cache you have (and work with) Unicode codepoints; UTF-8 comes into play only when you export your data, but in your case the export will rather be ISO-8859-1, or do I misunderstand something?

By the way, if you return your data to your Latin1 source (as Latin1), then you have to take some precautions: because you have a Unicode installation, you could mix your Latin1 data with true Unicode data from other sources during processing!
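Both points (Latin1 bytes map one-to-one to codepoints 0-255, and not every Unicode character fits back into Latin1) can be checked with a short Python sketch. The helper names are my own; the codec names are the standard Python ones.

```python
def latin1_to_text(raw):
    """Decode ISO-8859-1 bytes; each byte becomes the codepoint of the same value."""
    return raw.decode("iso-8859-1")


def text_to_latin1(text):
    """Encode back to ISO-8859-1, or return None if a character does not fit."""
    try:
        return text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    text = latin1_to_text(bytes([196, 228]))
    print(text, [ord(ch) for ch in text])  # codepoints equal the byte values
    print(text.encode("utf-8"))            # UTF-8 only matters on export
    print(text_to_latin1("Ä"))             # fits into Latin1
    print(text_to_latin1("€"))             # U+20AC does not fit into Latin1
```

The last call is the precaution mentioned above: data picked up from a true Unicode source has to be checked (or replaced) before it goes back to a Latin1 system.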

See: https://unicode.org/charts/

Also, you may download and read:

https://www.unicode.org/versions/Unicode13.0.0/UnicodeStandard-13.0.pdf

Of course, if you don't want to check each and every write() for error or success, you can do the check just one time at the beginning

set str=##class(%Stream.FileCharacter).%New()
do str.LinkToFile("/root/my_file.txt")
set sts=str.Write("")
if 'sts { write "We have a problem",! quit }

Writing an empty string to the stream does not change the stream, but the file-opening sequence will be executed.
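The same "check once, up front" idea, shown as a Python sketch for comparison only (it is not a replacement for checking the %Status in ObjectScript). The function name is my own; note that, unlike the empty Write() above, opening in append mode creates the file if it does not exist yet.

```python
import os
import tempfile


def can_open_for_writing(path):
    """Try to open the file for appending once; report failure instead of raising."""
    try:
        with open(path, "a"):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    print(can_open_for_writing(os.path.join(folder, "my_file.txt")))
    print(can_open_for_writing(os.path.join(folder, "no_such_dir", "my_file.txt")))
```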

Checking status codes is a good starting point...

set str=##class(%Stream.FileCharacter).%New()
write str  --> 4@%Stream.FileCharacter
write $system.OBJ.DisplayError(str.LinkToFile("/root/my_file.txt")) --> 1
write $system.OBJ.DisplayError(str.WriteLine("line-1"))  --> ERROR #5005: Cannot open file '/root/my_file.txt'1

Your %Save() returns with 1 (OK), because there is nothing to save...

Note: on Linux, a standard user (like me right now) is not allowed to write to the '/root' directory.

I think, there is a "small problem" at start...

zn "%SYS",n=SL ...

will not work as expected ;-))

but what about this line with 552 chars

zn "%SYS" s n="SL" d ##class(Security.SSLConfigs).Create(n):'##class(Security.SSLConfigs).Exists(n),##class(%Net.URLParser).Parse("https://pm.community.intersystems.com/packages/zpm/latest/installer",.c) s h=##class(%Net.HttpRequest).%New(),h.Server=c("host"),h.Port=443,h.Https=1,h.SSLConfiguration=n,s=h.Get(c("path")) q:'s $System.Status.GetErrorText(s) s x=##class(%File).TempFilename("xml"),f=##class(%Stream.FileBinary).%New(),f.Filename=x d f.CopyFromAndSave(h.HttpResponse.Data) d h.%Close(),$system.OBJ.Load(x,"ck") do ##class(%File).Delete(x)

I only rewrote your line, but haven't tried to execute it.

I think your solution is an "ad hoc" solution for a particular task; anyway, I want to point out two problems.

First, the solution fails if the size of <source> is less than the size of <target>:

set source={"Name":"Joe", "Age":50 }
set target={"Name":"Joe", "Age":50, "Phone":"123-456"}

write CompareJSON(source,target) ---> 1
write CompareJSON(target,source) ---> 0

The same goes for data like:

set source={"Name":"Joe", "Age":50, "Data":[10,20] }
set target={"Name":"Joe", "Age":50, "Data":[10,20,30]}

Maybe your data do not have such cases.

A quick check could be:

if source.%Size()-target.%Size() { quit "Size-problem" }

Second, in a more "general" case, comparing lists can sometimes lead to a philosophical question:
are two lists with the same elements, but in a different sequence, equal or not?

list1: [aaa, bbb]
list2: [bbb, aaa]

The answer MAY depend on the question, what is stored in those lists?

If the lists contain, for example, some kind of instructions, then the sequence of those instructions will matter, but if those lists are just lists of, say, my hobbies, then the sequence is unimportant (unless one implies some kind of weighting, like the first entry being my preferred hobby).

To me this implies that, to compare two lists, you MAY need some extra information, something like:
dependsOnSequence = yes | no
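To illustrate both points (the size check in both directions and the sequence flag), here is a minimal sketch in Python, working on already parsed JSON (dicts and lists). It is not a rewrite of your CompareJSON method; the function name and the order_matters parameter are my own.

```python
def compare_json(source, target, order_matters=True):
    """Return True if two parsed JSON values are equal.

    order_matters controls whether arrays are compared position by position
    or as unordered collections.
    """
    if isinstance(source, dict) and isinstance(target, dict):
        # Comparing the key sets catches extra keys on either side.
        if source.keys() != target.keys():
            return False
        return all(compare_json(source[k], target[k], order_matters) for k in source)

    if isinstance(source, list) and isinstance(target, list):
        if len(source) != len(target):
            return False
        if order_matters:
            return all(compare_json(s, t, order_matters) for s, t in zip(source, target))
        # Unordered: every source element must match a distinct target element.
        remaining = list(target)
        for s in source:
            for i, t in enumerate(remaining):
                if compare_json(s, t, order_matters):
                    del remaining[i]
                    break
            else:
                return False
        return True

    # Scalars: same type and same value (so true is not equal to 1).
    return type(source) is type(target) and source == target
```

With this, the two examples above are reported as different in both directions, and ["aaa","bbb"] versus ["bbb","aaa"] is equal only when order_matters is False.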

One more hint, in the line

s tSC= ..CompareJSON(value,target.%Get(key),RefNo)  

what happens if target.%Get(key) is empty (or not a corresponding object)?

Just my 2 cents

Ok, here I am... sorry for the delay.

***********************************************

Forget it!

I'm unable to copy and paste a program (in this case a method) into this text box!

After the paste operation, some lines are joined, others are not, and the indentation is lost...

Maybe I'm just too dumb to work with the text box...

*****************************************

If you want this class, you can download it from my FTP.
ftp: ftp.kavay.at
usr: dcmember
psw: member-of-DC

I'm not sure what you want to do, but if you want to read a TIFF file byte by byte and then interpret it in some way, here is a very simple example to start with.
The method below returns the file type (gif, jpg, png or tif) based on the magic number.

/// Identify an image by its magic number
/// (only for gif,jpg,png and tif)
ClassMethod ImageType(file, ByRef err)
{
   o file:"ru":0
   i $t {
      s io=$i, err=""
      u file r x#8
      
      i x?1"GIF8"1(1"7",1"9")1"a".e { s typ="gif" }  // .e: x holds 8 bytes, the signature only 6
      elseif $e(x,1,2)=$c(255,216), $$end()=$c(255,217) { s typ="jpg" }
      elseif x=$c(137,80,78,71,13,10,26,10) { s typ="png" }
      elseif $case($e(x,1,4), $c(73,73,42,0):1, $c(77,77,0,42):1, :0) { s typ="tif" }
      else { s typ="", err="File type unknown" }
      
      c file
      u:io]"" io
   
   } else { s typ="", err="Can't open "_file }
   
   q typ

end() s t="" r:$zseek(-2,2) t#2 q t
}
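For readers who want to try the same check outside of Cache/IRIS, here is a Python sketch of the same magic-number logic. It is not a port of the method above: it works on bytes that are already in memory instead of an open file, and the function name is my own.

```python
import re


def image_type(data):
    """Identify gif, jpg, png or tif by magic number; return "" if unknown."""
    head = data[:8]
    # GIF: "GIF87a" or "GIF89a"
    if re.fullmatch(rb"GIF8[79]a", head[:6]):
        return "gif"
    # JPEG: starts with FF D8 and ends with FF D9
    if head[:2] == b"\xff\xd8" and data[-2:] == b"\xff\xd9":
        return "jpg"
    # PNG: fixed 8-byte signature
    if head == b"\x89PNG\r\n\x1a\n":
        return "png"
    # TIFF: "II*\0" (little-endian) or "MM\0*" (big-endian)
    if head[:4] in (b"II*\x00", b"MM\x00*"):
        return "tif"
    return ""
```

To use it on a file, read the content in binary mode first, e.g. open(path, "rb").read(); for large files you would read only the first 8 and the last 2 bytes, as the method above does.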

I also have a method to retrieve the image size (pixelsWidth and pixelsHeight) for the same (gif, jpg, png and tif) files. If you are working on a similar problem, I could post this method too.