Question
· Feb 28

Estimate IRIS Health Connect database size based on HL7 message volumes

I'm looking for some simple heuristics to estimate the size on disk of a database based on average size of messages, number of messages per day and purge frequency. The purpose is for estimation of disk space requirements.

Clearly this is a "how long is a piece of string" question, but take an example: a simple HL7 routing production that does nothing but process HL7. It receives 10,000 HL7v2 messages per day (each approximately 1 KB on the wire) in a single service, passes them to a single router and outputs to a single operation. By what factor should you multiply the on-the-wire size of each message to approximate its size on disk?

The inbound message will generate a message header object and a message body object held in globals. Both of those will have an index global. The message content is held in a stream, which would be roughly the same size in bytes plus a small overhead. There will be a new header for each message shunted between business hosts within the production. There are also event logs. Then there are database block size and packing to consider before even thinking about filesystems!

Depending on how I do the back-of-envelope maths, I come up with something between 2x and 5x the on-the-wire bytes. I suspect the real figure is closer to 2x, since storage is probably more efficient than the 5x case assumes, but it's better to over-estimate than under-estimate.
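To make that arithmetic concrete, here is a minimal sketch of the calculation in Python. The function name `estimate_db_size_bytes`, the 30-day retention period and the reading of "1 KB" as 1024 bytes are illustrative assumptions; the 2x and 5x factors are just the guessed bounds above, not measured values.

```python
def estimate_db_size_bytes(msgs_per_day, wire_bytes_per_msg, retention_days, factor):
    """Steady-state size: messages kept between purges x wire size x overhead factor."""
    return msgs_per_day * wire_bytes_per_msg * retention_days * factor


MSGS_PER_DAY = 10_000   # from the example production above
WIRE_BYTES = 1024       # assumption: "1 KB" taken as 1024 bytes
RETENTION_DAYS = 30     # assumption: the purge keeps 30 days of messages

# Lower and upper bounds using the guessed 2x and 5x overhead factors.
low = estimate_db_size_bytes(MSGS_PER_DAY, WIRE_BYTES, RETENTION_DAYS, 2)
high = estimate_db_size_bytes(MSGS_PER_DAY, WIRE_BYTES, RETENTION_DAYS, 5)

print(f"{low / 1024**2:.0f} MiB to {high / 1024**2:.0f} MiB")
```

With these assumed inputs the range is roughly 0.6 to 1.4 GiB, so the choice of factor matters far less than the retention period once the purge window grows.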

Discussion (3)

This might not be exactly what you're looking for, as it requires an actual working interface to perform the estimation rather than just a concept or design of one, but it's worth checking out. It also takes into account the Event Log, for example, as well as Journals, which you didn't mention above but which might be important for you as well.

https://community.intersystems.com/post/ensemble-interfaces-disk-space-u...

You might also find this HL7 benchmark post by @Mark Bolinsky useful, specifically the section titled "Disk Configuration", and the related "Table 2" there.

For convenience I'm pasting this table here (but read the original post for the full context; for example, your scenario sounds more like the "T2 Workload" described there than the "T4" one):

Table 2: Disk Requirement per inbound HL7 T4 Message

Contributor            Data Requirement
Segment Data           4.5 KB
HL7 Message Object     2 KB
Message Header         1.0 KB
Routing Rule Log       0.5 KB
Transaction Journals   42 KB
Total                  50 KB
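As a rough illustration of how these per-message figures scale, the sketch below multiplies them by the 10,000 messages per day from the question. The figures are the T4 ones from the table, so for a simpler T2-style workload they should be treated as an upper-end illustration rather than a prediction. Reporting journals separately from the database contributors is an assumption made here, on the basis that journal files are usually retained on their own schedule; the names `TABLE_2_KB` and `daily_kb` are hypothetical.

```python
# Per-message disk figures in KB, copied from Table 2 (T4 workload).
TABLE_2_KB = {
    "Segment Data": 4.5,
    "HL7 Message Object": 2.0,
    "Message Header": 1.0,
    "Routing Rule Log": 0.5,
    "Transaction Journals": 42.0,
}


def daily_kb(msgs_per_day, contributors):
    """KB written per day for the chosen contributors."""
    return msgs_per_day * sum(TABLE_2_KB[name] for name in contributors)


MSGS_PER_DAY = 10_000  # volume taken from the question

# Assumption: everything except journals lands in the production database.
database_parts = [name for name in TABLE_2_KB if name != "Transaction Journals"]

db_kb = daily_kb(MSGS_PER_DAY, database_parts)
journal_kb = daily_kb(MSGS_PER_DAY, ["Transaction Journals"])
total_kb = db_kb + journal_kb

print(f"database: {db_kb / 1024:.1f} MiB/day")
print(f"journals: {journal_kb / 1024:.1f} MiB/day")
print(f"total:    {total_kb / 1024:.1f} MiB/day")
```

Multiplying the database figure by the message retention period, and the journal figure by the journal retention period, gives separate estimates for the two volumes.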