Article · Sep 29, 2020 · 4m read

Debugging “Server Availability Error” message when loading Web Application

In the WRC, we frequently see customers contact us because their Web Gateway is unable to serve web pages. This article explains a common reason why these errors occur and describes some tools that can be used to debug the problem. The explanation focuses on the Web Gateway serving InterSystems IRIS instances, but it applies equally to the CSP Gateway serving Caché instances.

The Problem:

Attempting to load a Web Application (either a custom application or the System Management Portal) results in one of the following errors (depending on your browser):

In addition, the CSP.log file shows:

Why this happens:

To understand why this happens, we need to take a look at the architecture in which the Web Gateway operates:

When you try to load your application in a browser, the browser sends a request to your Web Server. The Web Server passes this request off to the Web Gateway. The Web Gateway then has to reach out to InterSystems IRIS to understand what to do with the request. But given that the Web Gateway lives outside of InterSystems IRIS (and might be on another machine entirely), we require that the Web Gateway process authenticate to IRIS. This is the same requirement we apply to any other new process connecting to IRIS, such as a remote ODBC connection or a simple local IRIS Terminal session.

The reason we see the above errors when loading the application is that this authentication from the Web Gateway to IRIS is failing. The Web Gateway stores in its CSP.ini file a set of credentials for each InterSystems IRIS server it connects to. Normally, these credentials are for the “CSPSystem” user, an account created by default when IRIS is installed. These credentials are then used to authenticate using the settings configured for the %Service_WebGateway service in IRIS.

To learn more about why this authentication is failing, you can use the Auditing capabilities offered by InterSystems IRIS. Given that you likely cannot use the Management Portal at this time, you can use the ^SECURITY routine in an IRIS Terminal session to configure Auditing and view the Audit Log.

First, you will need to enable Auditing if it has not already been enabled:

Next, make sure that Auditing for the %System/%Login/LoginFailure Event is enabled:
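If you prefer to script these two steps rather than walk the ^SECURITY menus, a rough sketch in the %SYS namespace might look like the following. The class and property names used here (Security.System, Security.Events, "AuditEnabled", "Enabled") are assumptions from memory; verify them against the class reference for your version before relying on them.

 ; A sketch only - run in the %SYS namespace; verify class and property names first.
 set $namespace="%SYS"

 ; Enable auditing system-wide (assumes the AuditEnabled property of Security.System)
 set props("AuditEnabled")=1
 set sc=##class(Security.System).Modify("SYSTEM",.props)

 ; Enable the %System/%Login/LoginFailure audit event (assumes Security.Events.Modify)
 kill props
 set props("Enabled")=1
 set sc=##class(Security.Events).Modify("%System","%Login","LoginFailure",.props)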

Once you’ve done that, you can reproduce the “Server Availability Error” problem. This should result in a LoginFailure Audit Event being logged, and you can examine the details for this event to find out more:
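If you would rather query the audit log than page through the ^SECURITY output, the audit database is also exposed to SQL in the %SYS namespace. A minimal sketch, assuming the %SYS.Audit table and the column names shown (check the table definition on your version):

 ; Run in the %SYS namespace; the table and column names are assumptions to verify.
 set sql = "SELECT UTCTimeStamp, Username, Description FROM %SYS.Audit"
 set sql = sql_" WHERE EventSource = '%System' AND EventType = '%Login'"
 set sql = sql_" AND Event = 'LoginFailure' ORDER BY UTCTimeStamp DESC"
 set rs = ##class(%SQL.Statement).%ExecDirect(, sql)
 do rs.%Display()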

The “Error message” section should provide more information about why we are seeing the LoginFailure. Common problems include “User CSPSystem is disabled” or “Service %Service_WebGateway is not enabled for Password authentication”, which point to changes that should be made to the IRIS security settings.

The most common problem that we see in the WRC is that authentication is failing due to “Invalid password”. This means that the CSPSystem password stored in the Web Gateway does not match the CSPSystem password stored in IRIS.

How to fix the problem:

Now that the Audit Log entry gives you a clear indication of what the mismatch is between the Web Gateway and InterSystems IRIS, you have to fix that mismatch. The IRIS-side CSPSystem credentials can be modified through the ^SECURITY menu in a Terminal session.
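If you want to script the IRIS side of the change rather than navigate the ^SECURITY menus, a sketch along the following lines should work from the %SYS namespace; Security.Users.Modify and the "Password" property are assumptions to verify against the class reference, and the password shown is only a placeholder.

 ; A sketch only - run in %SYS; verify Security.Users.Modify and the Password property.
 set props("Password")="NewCSPSystemPassword"    ; placeholder value
 set sc=##class(Security.Users).Modify("CSPSystem",.props)
 if $system.Status.IsError(sc) do $system.Status.DisplayError(sc)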

There are a few ways to modify the CSPSystem credentials stored in the Web Gateway. The easiest is to use the Web Gateway Management Portal, which is documented here: https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=GCGI_oper_config. Once you’ve loaded the Web Gateway Management Page, you can edit the credentials used to authenticate to IRIS by clicking the Server Access link and editing the Connection Security settings for the relevant server.

If you are not able to access the Web Gateway Management Page, then you are left with editing the CSP.ini file directly. The CSP.ini file should have a section for each server definition configured in the Web Gateway. You should be able to edit the “Username” and “Password” entries in the relevant section to match what is stored in InterSystems IRIS. Note that the [SYSTEM] section controls access to the Web Gateway Management Portal, not an InterSystems IRIS instance named [SYSTEM].

For example, suppose you have a definition for a server “Test” where the stored CSPSystem password is incorrect:
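An illustrative CSP.ini excerpt might look like this; the server name, address, port, and password are hypothetical, and other keys in the section are omitted:

[Test]
Ip_Address=127.0.0.1
TCP_Port=1972
Username=CSPSystem
Password=OldWrongPassword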

You can edit the CSP.ini file to have the correct plaintext password:
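Continuing the hypothetical excerpt, only the Password line needs to change, and it can simply be typed in as plain text:

[Test]
Username=CSPSystem
Password=TheRealCSPSystemPassword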

The next time you try to authenticate from the Web Gateway to InterSystems IRIS, the password will be encrypted:
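After that connection attempt, the Web Gateway rewrites the value in an encrypted form, so the plaintext no longer appears in the file; the value below is only a placeholder for whatever your Gateway writes:

[Test]
Username=CSPSystem
Password=<encrypted value written by the Web Gateway>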

Discussion · Sep 18, 2020

InterSystems Mirroring of code databases

Hello!

First of all, let me state that I am no senior InterSystems expert. 

In my organization, we have a HealthShare Health Connect setup where each namespace has one code database and one data database, which are both actively mirrored. We have two nodes in the mirror.

We had a controlled failover last night to make sure that the backup node works as intended, which it didn't. It turned out that we had only deployed code onto the primary node in several namespaces, causing errors about missing classes after the failover. So it seems that each time you deploy productions, you have to manually deploy them to both instances (the primary and the failover). That makes me wonder:

  1. What is actually mirrored when you mirror a code database?
    1. Obviously not new classes
    2. Changes to existing classes?
    3. Settings on the production adapters?
    4. Something else?
  2. How do you guys go about deploying new code?
    1. Are you utilizing some kind of automation tool to keep the mirrored nodes consistent regarding code and versions of code?
    2. Are you just manually deploying to each node, with good routines for doing it?
  3. Or do we have some kind of faulty setup which makes this not work as intended?

I don't think our setup is faulty; I think we just missed this a bunch of times, which makes me want to abstract this so that you deploy to one place and the same code gets deployed to both nodes.

An example: We have 3 environments (production, QA and test). For each of QA and prod, we receive web service requests from 2 different networks, an internal network and an external one. For each network, we have a pair of web servers running httpd with the Web Gateway. This makes 4 web server hosts for each of the production and QA environments, and in the test environment we have slimmed this down to just one pair, for a total of 10 web servers. Maintained manually, this is bound to be time-consuming and to create inconsistencies and small differences between the hosts unless you are extremely thorough. So we use Ansible. I have made a playbook and a set of configs for each environment and each network type, so each pair is treated exactly the same and the playbook is always used to deploy changes and keep consistency.

I would like to achieve something similar with deploying code to our Health Connect mirrored instances.

How do you guys do it?

Question · Sep 11, 2020

Data Driven Web Apps Questions

Hey there,

I posted a reply to the recent video https://community.intersystems.com/post/new-video-building-data-driven-web-apps#comment-132511
with a slew of questions. Wondering if someone can take a look and address my questions on that post or on this new thread.

I find many InterSystems learning resources challenging to follow, but I'm not sure if it's because my work computer is so locked down that I don't have all the right tools and permissions I need to do what the tutorial asks, or if I'm not understanding the basics of Caché/IRIS, or both!

Mike

Question · Sep 4, 2020

[Resolved] Task manager email notification

Hi, 

I created a task from the Management Portal Task Manager to use the Ens.Util.Tasks.Purge task. The task setup includes email notification settings for a completion email and an error email.

This task is giving an error and no email is generated:

<CLASS DOES NOT EXIST>zSendMail+22^%SYS.TaskSuper.1 *Security.SSLConfigs

I tested all other task types available from Ens.Util.task but all are giving the same error.

I'm not sure if this is a bug or some missing configuration in the task setup. Has anyone noticed a similar issue, or does anyone have an idea of how to fix this?


Thank you for your help.

Regards,

Mary

Article · Sep 2, 2020 · 7m read

Integrity Check: Speeding it Up or Slowing it Down

While the integrity of Caché and InterSystems IRIS databases is completely protected from the consequences of system failure, physical storage devices do fail in ways that corrupt the data they store.  For that reason, many sites choose to run regular database integrity checks, particularly in coordination with backups to validate that a given backup could be relied upon in a disaster.  Integrity check may also be acutely needed by the system administrator in response to a disaster involving storage corruption.  Integrity check must read every block of the globals being checked (if not already in buffers), and in an order dictated by the global structure. This takes substantial time, but integrity check is capable of reading as fast as the storage subsystem can sustain.  In some situations, it is desirable to run it in that manner to get results as quickly as possible.  In other situations, integrity check needs to be more conservative to avoid consuming too much of the storage subsystem’s bandwidth. 

Plan of Attack

The following outline covers most situations.  The detailed discussion in the remainder of this article provides the information needed to act on any of these, or to derive other courses of action.

  1. If using Linux and integrity check is slow, see the information below on enabling Asynchronous I/O. 
  2. If integrity check must complete as fast as possible - running in an isolated environment, or because results are needed urgently - use Multi-Process Integrity Check to check multiple globals or databases in parallel.  The number of processes times the number of concurrent asynchronous reads that each process will perform (8 by default, or 1 if using Linux with asynchronous I/O disabled) is the limit on the number of concurrent reads in flight.  Consider that the average may be half that and then compare to the capabilities of the storage subsystem.  For example, with storage striped across 20 drives and the default 8 concurrent reads per process, five or more processes may be needed to capture the full capacity of the storage subsystem (5*8/2=20).
  3. When balancing integrity check speed against its impact on production, first adjust the number of processes in the Multi-Process Integrity Check, then if needed, see the SetAsyncReadBuffers tunable.  See Isolating Integrity Check below for a longer-term solution (and for eliminating false positives).
  4. If already confined to a single process (e.g. there’s one extremely large global or other external constraints) and the speed of integrity check needs adjustment up or down, see the SetAsyncReadBuffers tunable below.

Multi-Process Integrity Check

The general solution to get an integrity check to complete faster (using system resources at a higher rate) is to divide the work among multiple parallel processes.  Some of the integrity check user interfaces and APIs do so, while others use a single process.  Assignment to processes is on a per-global basis, so checking a single global is always done by just one process (versions prior to Caché 2018.1 divided the work by database instead of by global).

The principal API for multi-process integrity check is CheckList^Integrity (see documentation for details). It collects the results in a temporary global to be displayed by Display^Integrity. The following is an example checking three databases using five processes. Omitting the database list parameter here checks all databases.

set dblist=$listbuild("/data/db1/","/data/db2/","/data/db3/")
set sc=$$CheckList^Integrity(,dblist,,,5)
do Display^Integrity()
kill ^IRIS.TempIntegrityOutput(+$job)

/* Note: evaluating 'sc' above isn't needed just to display the results, but...
   $system.Status.IsOK(sc) - ran successfully and found no errors
   $system.Status.GetErrorCodes(sc)=$$$ERRORCODE($$$IntegrityCheckErrors) // 267
                           - ran successfully, but found errors.
   Else - a problem may have prevented some portion from running, 'sc' may have
          multiple error codes, one of which may be $$$IntegrityCheckErrors. */

Using CheckList^Integrity like this is the most straightforward way to achieve the level of control that is of interest to us.  The Management Portal interface and the Integrity Check Task (built-in but not scheduled) use multiple processes, but may not offer sufficient control for our purposes.*

Other integrity check interfaces, notably the terminal user interface, ^INTEGRIT or ^Integrity, as well as Silent^Integrity, perform integrity check in a single process. These interfaces, therefore, do not complete the check as quickly as possible, but they use fewer resources.  An advantage, though, is that their results are visible, logged to a file or output to the terminal, as each global is checked, and in a well-defined order.

Asynchronous I/O

An integrity check process walks through each pointer block of a global, one at a time, validating each against the contents of the data blocks it points to.  The data blocks are read with asynchronous I/O to keep a number of read requests in flight for the storage subsystem to process, and the validation is performed as each read completes. 

On Linux only, async I/O is effective only in combination with direct I/O, which is not enabled by default until InterSystems IRIS 2020.3.  This accounts for a large number of cases where integrity check takes too long on Linux.  Fortunately, it can be enabled on Caché 2018.1, IRIS 2019.1, and later by setting wduseasyncio=1 in the [config] section of the .cpf file and restarting.  This parameter is recommended in general for I/O scalability on busy systems and has been the default on non-Linux platforms since Caché 2015.2.  Before enabling it, make sure that you’ve configured sufficient memory for the database cache (global buffers), because with direct I/O the databases will no longer be (redundantly) cached by Linux.  When not enabled, reads done by integrity check complete synchronously and it cannot utilize the storage efficiently.
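For reference, the relevant fragment of the instance's .cpf file looks like this (all other parameters omitted); remember that a restart is required for the change to take effect:

[config]
wduseasyncio=1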

On all platforms, the number of reads that an integrity check process will put in flight at one time is set to 8 by default.  If you must alter the rate at which a single integrity check process reads from disk, this parameter can be tuned: up to get a single process to complete faster, down to use less storage bandwidth.  Bear in mind that:

  • This parameter applies to each integrity check process.  When multiple processes are used, the number of processes multiplies this number of in-flight reads.  Changing the number of parallel integrity check processes has a much larger impact and is therefore usually the first thing to do.  Each process is also limited by computational time (among other things), so increasing the value of this parameter has limited benefit.
  • This only works within the storage subsystem’s capacity to process concurrent reads. Higher values have no benefit if databases are stored on a single local drive, whereas a storage array with striping across dozens of drives can process dozens of reads concurrently.

To adjust this parameter from the %SYS namespace, do SetAsyncReadBuffers^Integrity(value). To see the current value, write $$GetAsyncReadBuffers^Integrity(). The change takes effect when the next global is checked.  The setting currently does not persist through a restart of the system, though it can be added to SYSTEM^%ZSTART.
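For example, from a Terminal session in the %SYS namespace (the value 16 is purely illustrative; choose a value appropriate for your storage):

%SYS>write $$GetAsyncReadBuffers^Integrity()
8
%SYS>do SetAsyncReadBuffers^Integrity(16)
%SYS>write $$GetAsyncReadBuffers^Integrity()
16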

There is a similar parameter to control the maximum size of each read when blocks are contiguous on disk (or nearly so).  This parameter is less often needed, though systems with high storage latency or databases with larger block sizes could possibly benefit from fine tuning.  The value has units of 64KB, so a value of 1 is 64KB, 4 is 256KB, etc.  0 (the default) lets the system select, and it currently selects 1 (64KB).  The ^Integrity functions for this parameter, parallel to those mentioned above, are SetAsyncReadBufferSize and GetAsyncReadBufferSize.
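The same pattern applies here; for example, to request up to 256KB per read (a value of 4), with the caveat that the best value depends on your storage and block size:

%SYS>write $$GetAsyncReadBufferSize^Integrity()
0
%SYS>do SetAsyncReadBufferSize^Integrity(4)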

Isolating Integrity Check

Many sites run regular integrity checks directly on the production system. This is certainly the simplest to configure, but it’s not ideal.  In addition to concerns about integrity check’s impact on storage bandwidth, concurrent database update activity can sometimes lead to false positive errors (despite mitigations built into the checking algorithm).  As a result, errors reported from an integrity check run on production need to be evaluated and/or rechecked by an administrator.

Often, a better option exists.  A storage snapshot or backup image can be mounted on another host, where an isolated Caché or IRIS instance runs the integrity check.  Not only does this prevent any possibility of false positives, but if the storage is also isolated from production, integrity check can be run to fully utilize the storage bandwidth and complete much more quickly.  This approach fits well into the model where integrity check is used to validate backups; a validated backup effectively validates production as of the time the backup was made.  Cloud and virtualization platforms can also make it easier to establish a usable isolated environment from a snapshot.

 


* The Management Portal interface, the Integrity Check Task, and the IntegrityCheck method of SYS.Database select a rather large number of processes (equal to the number of CPU cores), lacking the control that’s needed in many situations. The Management Portal and the task also perform a complete recheck of any global that reported an error, in an effort to identify false positives that may have occurred due to concurrent updates. This recheck occurs above and beyond the false-positive mitigation built into the integrity check algorithms, and it may be unwanted in some situations due to the additional time it takes (the recheck runs in a single process and checks the entire global). This behavior may be changed in the future.
