Difference between %Stream.FileCharacter and %FileBinaryStream

A while back we had someone write some code to log messages, like we did in eGate, but with Ensemble.

ClassMethod LogIt(pComponent As %String, pMsgIn As %String)
{
 set vDIR="/ensemble/"_^OSUWMCInstance_"/logs/"
 set fs=##class(%Stream.FileCharacter).%New()
 set fs.Filename=vDIR_pComponent_".log"
 do fs.MoveToEnd()
 set vTM=$PIECE($ZDATETIME($HOROLOG)," ",2)
 do fs.WriteLine(vTM_" : "_pMsgIn)
 do fs.%Save()
 set fs="" // drop the last reference so the file handle is released
}

We found that the IO on this was slowing messages down, and those Operations that had a high volume of messages would fall behind. I have used %FileBinaryStream before on other items when trying to write files. What is the difference between %Stream.FileCharacter and %FileBinaryStream? Is there a difference in throughput?

Thanks

Scott Roth

The Ohio State University Wexner Medical Center


Answers

Hi Scott,

The %Stream package superseded the stream classes in the %Library package. If you look at the class documentation you will see in the descriptions that the %Library stream classes have been deprecated in favour of the %Stream variants. The only reason they still exist is to support legacy implementations.

The other difference is that one is a character stream and the other is a binary stream. As a general rule you should only write text to the character stream and non-text data (e.g. images) to the binary stream. The main reason for this is to do with Unicode characters. You may not have seen issues writing text to %FileBinaryStream, but that might well be because your text didn't need any Unicode conversion.
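A minimal sketch of that difference in ObjectScript (assuming a Unicode instance; the file paths are illustrative, not from the thread):

```objectscript
 // Character stream: characters are encoded via the stream's
 // TranslateTable on write, so multi-byte characters survive.
 set cs=##class(%Stream.FileCharacter).%New()
 set cs.Filename="/tmp/char.log"          // illustrative path
 set cs.TranslateTable="UTF8"             // encode characters as UTF-8
 do cs.WriteLine("café résumé")           // safe for Unicode text
 do cs.%Save()

 // Binary stream: bytes are written verbatim, with no character
 // conversion at all, so any encoding is entirely your problem.
 set bs=##class(%Stream.FileBinary).%New()
 set bs.Filename="/tmp/bin.log"           // illustrative path
 do bs.WriteLine("café résumé")           // raw bytes, no translation
 do bs.%Save()
```

The binary version may happen to look fine if your text is plain ASCII, which is exactly why the problem can go unnoticed until a Unicode character turns up.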

Performance wise I'm not sure there would be much in it between the two. You can access the source code of both and they both use the same underlying raw code for reading and writing to files. If you benchmarked them then I guess you would see a marginal difference, but not enough to question which one to use for best performance.
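If you did want to measure it yourself, a rough benchmark sketch along these lines would do (the class name, paths and iteration count are all illustrative):

```objectscript
/// Illustrative sketch: time 10,000 WriteLine calls through each
/// stream class using $ZHOROLOG (seconds since midnight, fractional).
ClassMethod TimeStreams()
{
 for class="%Stream.FileCharacter","%Stream.FileBinary" {
     set fs=$CLASSMETHOD(class,"%New")
     set fs.Filename="/tmp/bench_"_$TRANSLATE(class,"%.","")_".log"
     set start=$ZHOROLOG
     for i=1:1:10000 { do fs.WriteLine("benchmark line "_i) }
     do fs.%Save()                        // flush to disk before timing ends
     write class,": ",$ZHOROLOG-start," s",!
 }
}
```

I would expect the two numbers to be close enough that the choice should be driven by correctness (character vs binary data), not speed.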

I wonder how you determined that the LogIt code was the reason for messages slowing down? On the surface it should only have a small impact on message throughput. If messages are queueing up then it almost feels like this is just the first observable symptom of a wider performance issue. I guess you have monitored overall IO performance; if it's already under strain then this could be the straw that breaks the camel's back.

On a curious note, whilst you might have needed to log messages in eGate, I wonder why this would be necessary in Ensemble. Unless you are using in-memory messaging, all of your messages will be automatically logged internally, as well as being written to the transaction logs. By adding your own logging you are effectively writing the same message to disk not twice but three times. If you also have IO logging enabled on your operation then it will be four times, not to mention how many times the message was logged before it reached the operation. On top of that, if you have log trace events enabled in production then the IO overhead for just one message is going to thrash the disks more than it needs to. Multiply that across your production(s), and across how well IO is (or is not) spread over disks, and it would be easy to see how a peak flow of messages can start to queue.

Another reason I see for messages queuing (due to IO thrashing) is poor indexes elsewhere in the production. A data store that worked fast in development will now be so large that even simple lookups will hog the disks and flush out the memory cache, putting an exponential strain on everything else. Suddenly a simple bespoke logger feels like it's writing at the speed of a ZX Spectrum to a tape recorder.

Of course you may well have a highly tuned system and production, and all of this is a rambling spam from me. In which case, nine times out of ten, if I see messages queuing it's just because the downstream system can't process messages as quickly as Ensemble can send them.

Sean.

Streams support the idea of writing to them without changing the previous stream content, so you can either accept the newly changed stream value or discard it, depending on whether you call %Save. To support this, when you attach to an existing file and then append some data, you are actually making a copy of the original file and appending the data to that copy. When you %Save, we remove the original file and rename the copy, so that becomes the version you see. As you can imagine, copying a file is a potentially expensive operation, especially as the file gets large, so using a stream here is probably not what you want.

As you just want to append data and do not want file copies made, I would open the file directly in append mode (using either the Open command directly or the %File class) and write the data you wish to append, thereby avoiding the stream copy behaviour.
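For example, a hedged sketch of an append-only logger using the %File class opened in "WAS" mode (Write, Append, Stream), so no temporary copy of the log file is ever made. The path and method name are illustrative, not from the thread:

```objectscript
/// Illustrative append-only logger: opens the file in append mode
/// directly, writes one line, and closes, avoiding stream copy-on-save.
ClassMethod LogItAppend(pComponent As %String, pMsgIn As %String)
{
 set file=##class(%File).%New("/ensemble/logs/"_pComponent_".log")
 set sc=file.Open("WAS")           // W=write, A=append, S=stream mode
 if $$$ISERR(sc) quit              // could not open the file; bail out
 set vTM=$PIECE($ZDATETIME($HOROLOG)," ",2)
 do file.WriteLine(vTM_" : "_pMsgIn)
 do file.Close()                   // flush and release the file handle
}
```

Because the operating system handles the append, this stays cheap no matter how large the log file grows, whereas the stream approach gets slower as the file does.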