EnsAlertRequest Summary Queries timing out /

Primary tabs

Off the back of the Interface Monitoring post I had created a class that queries the Ens.AlertRequest global and returns the entries between 6pm the night before and 6am in the morning.   

I tested this build in our T&D environments and the build worked very well.

However in our production environment the query is being truncated, by what I believe to be a timeout and I get a partial query output.

In the System>SQL pages my 12 hour query times out.

I compared the Global size by running a SELECT MAX(ID) query and got a return of 60,244,962 records.  This may go some way to explain why the query is taking longer and possibly timing out.

In our T&D environments (which we don't purge), we have only 1 million and 2.5 million records.

For messages we in production we have a 90 day purge limit.

I wonder if the Ens.AlertRequest global isn't being purged and if so why it isn't as this global is considered to be a message in my mind.

Any advice please.

Regards

Stuart

 

 

 

Replies

Please find my method below which is used to drive a Business Operation.  It's activated by a schedule on a daily basis at 6am:

Method OnInit(Output pResponse As Ens.Response) As %Status
{
//set the DST variable
set DST = $SYSTEM.Util.IsDST()
// set pre-DST edited timestamps for the email header
set emailSummaryStartDAT = $ZDATE($H-1,3)_" "_(..SummaryStartTime)
set emailSummaryFinishDAT = $ZDATE($H,3)_" "_(..SummaryFinishTime)
// ..SummaryStartTime DST handling IF DST =1 adjust SummaryStartTime & SummaryFinishTime Property's H-1
IF DST = 1 {
set ..SummaryStartTime = ($P(..SummaryStartTime,":",1)-1)_":"_$P(..SummaryStartTime,":",2,3)
set ..SummaryFinishTime = ($P(..SummaryFinishTime,":",1)-1)_":"_$P(..SummaryFinishTime,":",2,3)
$$$LOGINFO("Daylight Savings is being applied to timestamps")
}
// IF the hour part of timestamp is <10 then add a leading 0 to make a full UTC timestamp
IF ($P(..SummaryStartTime,":",1)<10){
set ..SummaryStartTime = "0"_..SummaryStartTime
}
IF ($P(..SummaryFinishTime,":",1)<10){
set ..SummaryFinishTime = "0"_..SummaryFinishTime
}
//Set Search Start Date/Time of yesterday with SummaryStartTime
set SummaryStartDAT = $ZDATE($H-1,3)_" "_(..SummaryStartTime)
//Set Search Finish Date/Time for today with SummaryFinishtime
set SummaryFinishDAT = $ZDATE($H,3)_" "_(..SummaryFinishTime) //Obtain the ID of the first and last message in the search timeframe
&sql(SELECT MAX(ID),MIN(ID),COUNT(ID)INTO :LastID,:FirstID,:ErrorCount FROM Ens.AlertRequest WHERE AlertTime BETWEEN :SummaryStartDAT AND :SummaryFinishDAT) /// Log Ens.AlertRequest ID range and the no. of errors
$$$LOGINFO("AlertRequest ID's identified between: "_FirstID_" & "_LastID_" No. of Errors is: "_ErrorCount)
  //build the email subject line
  Set Subject = "Automatic alert from AIE: "_$NAMESPACE_" Email Alert Summary between:"_emailSummaryStartDAT_" & "_emailSummaryFinishDAT
  //build the email Body
  Set emailBody = "This is a summary of alerts generated by the CUH Application Integration Engine (AIE)."
  set emailBody = emailBody_"<br><br>"
  set emailBody = emailBody_"This email covers alerts between: "_emailSummaryStartDAT_" & "_emailSummaryFinishDAT
  set emailBody = emailBody_"<br><br>"
  Set emailBody = emailBody_"Namespace: "_$NAMESPACE
  Set emailBody = emailBody_"<br><br>"
  set emailBody = emailBody_"No of errors is: "_ErrorCount
  Set emailBody = emailBody_"<br><br>"
  // Check if any errors have been identified
  //If errors identified create table
  IF ErrorCount '= 0 {
  // set table header
  set errortbl = "<head><style>table, th, td { border: 0px solid black;}</style></head><table style='width:100%'><tr><th>Alert ID</th><th>AlertText</th><th>AlertTime</th><th>SourceConfigName</th> </tr>"
  //Populate the error table using sql cursor query to loop through the SQL resultset.
  &sql(DECLARE C1 CURSOR FOR SELECT ID, AlertText, AlertTime, SourceConfigName INTO :ID, :AlertText, :AlertTime, :SourceConfigName  FROM Ens.AlertRequest WHERE AlertTime BETWEEN :SummaryStartDAT AND :SummaryFinishDAT ORDER BY SourceConfigName, AlertTime ASC)
  &sql(OPEN C1)
  IF SQLCODE '= 0 {
  $$$LOGERROR("SQL Error Code: "_SQLCODE)
  }
QUIT:(SQLCODE'=0)
&sql(FETCH C1)
//Loop through Alert fields from the time period specified.
While SQLCODE = 0 {
IF DST = 1 {
set AlertHour = $P(AlertTime," ",2)
set $P(AlertHour,":",1) = ($P(AlertHour,":",1)+1)
// Check if updated time is before 10:00am if so a "0" prefix needs adding
IF ($P(AlertHour,":",1)<10){
set $P(AlertHour,":",1) = "0"_$P(AlertHour,":",1)
}
set AlertTime = ($P(AlertTime," ",1)_" "_AlertHour)
  }
Set errortbl = errortbl_"<tr> <td>"_ID_"</td> <td>"_AlertText_"</td> <td>"_AlertTime_"</td>"
set errortbl = errortbl_"<td>"_SourceConfigName_"</td></tr>"
&sql(FETCH C1)
}
&sql(CLOSE C1)
//End of HTML Table
set errortbl = errortbl_"</table>"
Set errortbl = errortbl_"<br>"
set emailBody = emailBody_errortbl}
//handling for no errors. No table is created and comment below written
ELSE {set emailBody = emailBody_"No errors have been identified between: "_emailSummaryStartDAT_" & "_emailSummaryFinishDAT_"<br><br>"}
  // Set Email Footer
  set emailBody = emailBody_"Do NOT reply to this email."
  Set emailBody = emailBody_"<br>"
Set emailBody = emailBody_"<br>Application Integration Engine (AIE) | eHospital | "
Set emailBody = emailBody_"<br>Cambridge University Hospitals NHS Foundation Trust | Addenbrookes Hospital | Box 117 |"
Set emailBody = emailBody_"<br>"
Set emailBody = emailBody_"This email is confidential, see www.cuh.org.uk/email_disclaimer.html"
  // Build new mail structure.
$$$LOGINFO("Attempting to send Alert Summary email")
IF (..From = "") {
SET From = "noreply@addenbrookes.nhs.uk"
}
IF (..To = "") {
$$$LOGWARNING("Warning - No email provided => No email sent")
}
ELSE {
SET mail = ##class(%Net.MailMessage).%New()
// Define sender.
    SET mail.From = ..From
    // Define recipients.
    // Converts String into list of string.
    DO mail.To.InsertList($lfs(..To,";"))
// Define subject.
    SET mail.Subject = Subject
//Char set to "utf-8" to allow for html manipulation
SET mail.Charset = "utf-8"
SET mail.ContentType = "text/html"
SET mail.IsHTML = 1
// Define body.
DO mail.TextData.WriteLine(emailBody)
// Check if high priority is needed.
set Priority = 1
IF Priority = 1 {
DO mail.Headers.SetAt(1,"X-Priority")
  DO mail.Headers.SetAt("High","X-MSMail-Priority")
  DO mail.Headers.SetAt("High","Importance")
}
// Build the smtp server to send the email.
SET server=##class(%Net.SMTP).%New()
SET server.smtpserver=..SMTPServer
SET server.port=..SMTPPort
SET status=server.Send(mail)
   If $$$ISERR(status) {
QUIT $$$ISERR(status)
}
$$$LOGINFO("Alert Summary Email has been sent")
}
QUIT $$$OK
} }

MAX(ID) isn't necessarily the record count. Try a "select count(*) from Ens.AlertRequest" query and see what you get. Compare that to "'select count(*) from Ens.MessageHeader where MessageBodyClassName = 'Ens.AlertRequest'"

If the numbers are in line with the MAX(ID), then my suspicion is that you either don't have "BodiesToo" checked or do have "KeepIntegrity" checked in your purge process configuration. Either of those may be keeping old Ens.AlertRequest bodies around.

Hi Jeffrey,

SELECT COUNT(ID) & SELECT COUNT (*) are timing out via the System>SQL pages.

I had a look at our purge settings and we have "BodiesToo" unchecked and "KeepIntegrity" checked.

The next way forward would be to investigate the effect of purging with the "BodiesToo" and see if this reduces the EnsAlertRequest global size.

If you have access to Caché terminal, you can run run queries that won't time out:

(the sample below assumes your namespace is "PROD"; just substitute whatever your production's namespace is for that).

USER> zn "PROD"

PROD> d $system.SQL.Shell()

SQL Command Line Shell
----------------------------------------------------

The command prefix is currently set to: <<nothing>>.
Enter q to quit, ? for help.
PROD>>SELECT COUNT(*) AS AlertCount FROM Ens.MessageHeader WHERE MessageBodyClassName = 'Ens.AlertRequest'

AlertCount
2205

PROD>>Q

PROD>

So ... if you don't have BodiesToo checked, you most likely have lots of orphaned message bodies taking up database space. And KeepIntegrity is probably retaining a lot of message headers (and associated bodies) that you don't care about anymore. There are reasons you would not want to turn KeepIntegrity off in earlier versions of Caché/Ensemble, like pre-2015 releases. If you're on a release more modern than that and you don't need to worry about messages with parent/child relationships (certain batch types, for example), you can probably turn that off.

There are a couple of articles regarding the management of orphaned bodies here on DC. Might be worthwhile to peruse them :)

Thanks Jeffrey.

I've spoken to our Caché ODBA and he concurs there are orphaned records in the Ens.Message* globals.

We're on 2017.1 currently, but have been live since Caché 2012, which may explain todays issues.

Looking at the purge tasks we have records with no timestamp, which would escape the purging process.

Our Caché ODBA is going to open a WRC to confirm the best course of action for a one off purge of these Ens.Message* globals.

Thank you so much for your advice as it has helped immensely identifying this issue.