Connection reset while transferring HL7 through TCPOperation

Hello everyone :-)

We are facing what seems to be a network problem while transferring HL7 messages from Ensemble/Healthshare to a distant target through TCP/IP.

Here is the system version, in case it is useful: Cache for Windows (x86-64) 2017.2.1 (Build 801U) Wed Dec 6 2017 09:07:51 EST [HealthShare Modules:Core:14.02.2415 + Linkage Engine:15.03.9901]

Then the configuration of the operation:

NB: the high values for the Read and Response timeouts are there because of occasional "long" transfers, e.g. HL7 messages with about 600 segments, which are transferred successfully if we allow them this "long" time.

Some particular messages block the queue, and we don't understand why: they seem to be formatted the same way as other messages that passed, and they are neither the longest ones nor do they contain special characters. Removing those messages from the queue gets the flow running again, until another blocking message arrives.

At the network level, we notice resets during the transfer of the blocking messages. Below is an example, with the reset always occurring after the transfer of the 5th chunk.

In the logs of the operation, we find disconnections and reconnections (usually it works while the message in the log is "Discarding received non-HL7 data..."). The receiving party can accept up to 10 connections at a time, but it seems that everything becomes blocked as soon as a blocking message occurs:


The main question is: why does the operation not just send the message completely, instead of resetting the connection each time?

Best regards,

Mathieu

Replies

Hello Matthieu,

I suspect this is an issue with the framing. You are using MLLP framing, meaning each message is expected to be surrounded by an ASCII 11 prefix and an ASCII 28,13 suffix. In hex, that's a 0B prefix and a 1C, 0D suffix. The error you're getting implies that the ASCII 13 is never being received. I would check to make sure the ACKs have the proper framing, and that the framing you are using for your operation matches what the downstream system expects, and vice versa.
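To make the framing check concrete, here is a minimal Python sketch of the MLLP envelope described above. The function names `mllp_wrap` and `mllp_check` are made up for illustration and are not part of Ensemble/HealthShare; the idea is only to run the raw bytes of a captured ACK (for example from a network trace) through a check for the 0B prefix and the 1C 0D suffix.

```python
# Minimal sketch of MLLP framing; names are hypothetical, not an Ensemble API.
START_BLOCK = b"\x0b"    # ASCII 11 (VT), expected before each message
END_BLOCK = b"\x1c\x0d"  # ASCII 28 (FS) then ASCII 13 (CR), expected after it


def mllp_wrap(payload: bytes) -> bytes:
    """Surround an HL7 payload with the MLLP prefix and suffix."""
    return START_BLOCK + payload + END_BLOCK


def mllp_check(frame: bytes) -> list:
    """Return a list of framing problems found in a captured frame."""
    problems = []
    if not frame.startswith(START_BLOCK):
        problems.append("missing 0x0B prefix")
    if not frame.endswith(END_BLOCK):
        if frame.endswith(b"\x1c"):
            # The case suspected in this thread: the final ASCII 13 never arrives.
            problems.append("missing trailing 0x0D after 0x1C")
        else:
            problems.append("missing 0x1C 0x0D suffix")
    return problems


if __name__ == "__main__":
    good_ack = mllp_wrap(b"MSH|^~\\&|RCV\rMSA|AA|123\r")
    bad_ack = good_ack[:-1]  # same ACK with the final CR dropped
    print(mllp_check(good_ack))
    print(mllp_check(bad_ack))
```

An empty list means the frame has both delimiters; anything else points at which side of the envelope to look at on the sending or receiving system.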

Hello Vic, and thank you for your answer! I forwarded it to the receiver of the messages, and he was able to change the settings for the ACK messages that are sent back. Although I have no details about the change itself, the flow was up and running after that. It has not been long yet, so I will keep checking this flow over the next few days, but at first sight the problem seems to be solved.

Best regards :-)

Great, glad I could help and hopefully that was the answer!

Well, here is an update. Following Vic's answer, we were able to make the warnings ("Discarding received non-HL7 data...") disappear, although we were already working with MLLP on both sides (sender <--> receiver).

The flow still blocks sometimes, though. It happens with particular messages: abandoning them gets the flow running again, and the flow blocks again when those messages are put back in it. I don't think this means there is no longer a problem in the network, because the receiver is able to process those blocking messages manually. But we could at least set them aside so that they do not block the entire flow.

I may reply here once more if we find other clues (or even the solution). For now we will probably proceed as I have just described, processing blocking messages manually; setting them aside may also let us find similarities that point us to the real solution.
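One way to look for similarities among the messages set aside is to compute the same few properties for each of them and compare against messages that passed. The sketch below is only an illustration under assumptions: `profile_hl7` is a made-up helper, the chosen properties (size, segment count, non-ASCII bytes, framing bytes inside the body) are guesses at what might matter, and it assumes each message is available as raw bytes with carriage-return segment terminators.

```python
# Hypothetical helper for comparing blocked and passed HL7 messages.
def profile_hl7(raw: bytes) -> dict:
    """Summarise properties of one raw HL7 message (segments separated by CR)."""
    segments = [s for s in raw.split(b"\r") if s]
    return {
        "bytes": len(raw),
        "segments": len(segments),
        "segment_ids": sorted({s[:3].decode("ascii", "replace") for s in segments}),
        "longest_segment": max((len(s) for s in segments), default=0),
        # Bytes above 0x7F hint at encoding differences between messages.
        "non_ascii": sum(b > 0x7F for b in raw),
        # 0x0B or 0x1C inside the body could be mistaken for MLLP delimiters.
        "framing_bytes_in_body": sum(b in (0x0B, 0x1C) for b in raw),
    }


if __name__ == "__main__":
    sample = b"MSH|^~\\&|A\rPID|1\rOBX|1|TX|\x1c\r"
    for key, value in profile_hl7(sample).items():
        print(key, value)
```

Running this over both groups and comparing the dictionaries side by side would show quickly whether the blocking messages share something the others lack.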

BR :-)

If MLLP had already been in use, what change was made to prevent the "discarding received non-HL7 data" messages? Does that give any clue to what is happening with those particular messages?

Perhaps you could try enabling "Log Trace Events" which might tell you more about what step in the processing is getting stuck.

What is the process for handling these messages manually, and how does it differ from normal processing?

Those are the other kinds of things I would look at, though looking at the specific messages that trigger this issue is definitely a good step. I would definitely be interested in an update once you've figured this out.

Hello Vic, sorry for the delay. We checked and tested many things, and nothing helped us understand what was going on.

However, the entire flow has been accepted for about a week now, including the previously blocked messages, although no one has told me that they changed or restarted anything in the configuration or network parameters. So it is very strange, but we can stop thinking about this problem for now (and start working on other ones). The joys of computing...

Thank you again for your support!

Ah, while it is disappointing that we don't have a 100% understanding of what happened here, I'm glad at least the problem hasn't been recurring.