
Patrick Jamieson · Jul 19 10m read


Optimizing Performance of Apache Web Server and Web Gateway

By Patrick Jamieson, M.D., Product Technical Manager, InterSystems IRIS for Health

When working with InterSystems IRIS or IRIS for Health, an external web server like Apache2 or NGINX is essential for managing HTTP workloads, especially for FHIR servers. Starting with version 2023.3, the private Apache server was removed from the installation kit (except for Community Editions and Health Connect). This article provides practical tips to test and optimize the performance of your Apache web server and InterSystems Web Gateway configuration.

📋 Why Performance Optimization Matters

Out-of-the-box configurations are rarely ideal for production workloads. Optimizing Apache and the Web Gateway can significantly reduce response times, improve throughput, and handle higher concurrency under load.

🛠️ Testing Performance Using Apache Bench

Apache Bench (ab) is a powerful benchmarking tool that can simulate high load, identify bottlenecks, and measure metrics like response time and concurrency.

Installing Apache Bench

On Debian or Ubuntu Linux, install it with:

sudo apt-get install apache2-utils

Running Tests

Start with a simple load test:

ab -n 100 -c 10 https://yourserver/endpoint

-n: Number of requests (e.g., 100)
-c: Concurrent requests (e.g., 10)

Apache Bench (ab) will now make 100 requests, with at most 10 requests running concurrently.

Here are my results against a FHIR server deployed on AWS, using the Apache2 server with default IRIS for Health and Web Gateway configuration settings.

Server Software:

Server Hostname: fhir.testintersystems.com

Document Length: 2303 bytes

Concurrency Level: 10

Time taken for tests: 1.161 seconds

Complete requests: 100

Failed requests: 0

Total transferred: 264500 bytes

HTML transferred: 230300 bytes

Requests per second: 86.11 [#/sec] (mean)

Time per request: 116.126 [ms] (mean)

Time per request: 11.613 [ms] (mean, across all concurrent requests)

Transfer rate: 222.43 [Kbytes/sec] received

*Connection Times (ms)

min mean[+/-sd] median max

Connect: 4 20 8.6 18 57

Processing: 3 87 177.9 17 705

Waiting: 3 85 178.1 14 701

Total: 19 107 180.0 38 762

*Connect: How long it takes ab to establish a TCP connection with the target server before writing the request to the connection (ctime)

Processing: How long the connection was open after being created (time - ctime)

Waiting: How long ab waits after sending the request before beginning to read a response from the connection (waittime)

Total: The time elapsed from the moment ab attempts to make the connection to the time the connection closes (time)

* https://www.datadoghq.com/blog/apachebench/
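The summary rates in the report can be recomputed from the raw counts, which helps when comparing runs by hand. The sketch below is a minimal illustration, not part of ab itself; the function name `ab_derived_metrics` is hypothetical, and the inputs are the request count, concurrency, and elapsed time from the run above.

```python
def ab_derived_metrics(complete_requests, concurrency, seconds_taken):
    """Recompute ab's summary rates from the raw counts it reports."""
    requests_per_second = complete_requests / seconds_taken
    # Mean time per request as seen by a single simulated client.
    time_per_request_ms = concurrency * seconds_taken * 1000 / complete_requests
    # Mean time per request averaged across all concurrent clients.
    time_per_request_all_ms = seconds_taken * 1000 / complete_requests
    return requests_per_second, time_per_request_ms, time_per_request_all_ms


# Values taken from the first test run: 100 requests, concurrency 10, 1.161 s.
rps, per_request, per_request_all = ab_derived_metrics(100, 10, 1.161)
print(f"{rps:.1f} req/s, {per_request:.1f} ms, {per_request_all:.2f} ms")
```

The results agree with the report to within rounding, since ab prints the elapsed time to only three decimal places.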

So far, these statistics look good. I then increased the load on the server:

ab -n 2000 -c 75 https://fhir.testintersystems.com/csp/healthshare/demo/fhir/r4/Patient/1

Now I saw:

Server Hostname: fhir.testintersystems.com

Document Length: 2303 bytes

Concurrency Level: 75

Time taken for tests: 10.811 seconds

Complete requests: 2000

Failed requests: 12

(Connect: 0, Receive: 0, Length: 12, Exceptions: 0)

Non-2xx responses: 12

Total transferred: 5265484 bytes

HTML transferred: 4583092 bytes

Requests per second: 185.00 [#/sec] (mean)

Time per request: 405.401 [ms] (mean)

Time per request: 5.405 [ms] (mean, across all concurrent requests)

Transfer rate: 475.65 [Kbytes/sec] received

Connection Times (ms)

min mean[+/-sd] median max

Connect: 5 118 63.9 120 310

Processing: 7 284 835.4 86 5397

Waiting: 3 262 838.6 65 5384

Total: 12 402 831.5 205 5417

Percentage of the requests served within a certain time (ms)

50% 205

66% 226

75% 276

80% 296

90% 379

95% 2069

98% 4195

99% 4832

100% 5417 (longest request)

Notice that at the higher concurrency level, the mean time per request rose from about 116 ms to over 400 ms. In addition, 12 of the 2,000 requests failed: ab counted them as length failures, and the same number of responses came back with a non-2xx status.
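When comparing several runs, it can help to pull the headline numbers out of the report programmatically. The following is a minimal sketch that assumes the report text has been saved or captured as a string; the function name `parse_ab_summary` and the chosen fields are illustrative, not part of ab.

```python
import re


def parse_ab_summary(report):
    """Extract a few headline numbers from ab's text report."""
    patterns = {
        "complete": r"Complete requests:\s+(\d+)",
        "failed": r"Failed requests:\s+(\d+)",
        "non_2xx": r"Non-2xx responses:\s+(\d+)",
        "rps": r"Requests per second:\s+([\d.]+)",
    }
    summary = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, report)
        # ab omits the "Non-2xx responses" line when there are none.
        summary[key] = float(match.group(1)) if match else 0.0
    return summary


# Lines copied from the second test run above.
sample = """\
Complete requests: 2000
Failed requests: 12
Non-2xx responses: 12
Requests per second: 185.00 [#/sec] (mean)
"""
summary = parse_ab_summary(sample)
failure_rate = summary["failed"] / summary["complete"] * 100
print(f"{failure_rate:.1f}% of requests failed at {summary['rps']:.0f} req/s")
```

For this run the failure rate works out to 0.6%, which is small but still a sign that the default configuration is saturating.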

These bottlenecks required me to modify specific Apache2 configuration settings. Apache2 ships with a selection of Multi-Processing Modules (MPMs), which are responsible for binding to network ports on the machine, accepting requests, and dispatching children to handle the requests. For instance, I could adjust the MaxRequestWorkers directive (named MaxClients in Apache 2.2 and earlier) or the ServerLimit directive to increase the number of concurrent connections the server can handle. I could also tweak the KeepAliveTimeout or Timeout settings to control how long connections are kept open.

⚙️ Identifying and Addressing Bottlenecks

Here are key optimization steps:

1. Modify Apache2 MPM Configuration

Edit /etc/apache2/mods-enabled/mpm_worker.conf:

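<IfModule mpm_worker_module>
    # Upper limit on the number of child processes
    ServerLimit            200
    # Child processes created at startup
    StartServers           25
    # Maximum simultaneous requests; here ServerLimit x ThreadsPerChild (200 x 25)
    MaxRequestWorkers      5000
    # Lower and upper bounds on idle threads kept across all children
    MinSpareThreads        75
    MaxSpareThreads        250
    # Worker threads created by each child process
    ThreadsPerChild        25
</IfModule>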
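With the worker MPM, MaxRequestWorkers cannot usefully exceed ServerLimit multiplied by ThreadsPerChild, and it should be a multiple of ThreadsPerChild. The sketch below checks that arithmetic before you restart Apache; the helper name `check_worker_mpm` is hypothetical, and the values are the ones from the configuration above.

```python
def check_worker_mpm(server_limit, threads_per_child, max_request_workers):
    """Return a list of warnings for a worker-MPM sizing."""
    warnings = []
    capacity = server_limit * threads_per_child
    if max_request_workers > capacity:
        warnings.append(
            f"MaxRequestWorkers {max_request_workers} exceeds "
            f"ServerLimit x ThreadsPerChild = {capacity}"
        )
    if max_request_workers % threads_per_child != 0:
        warnings.append("MaxRequestWorkers is not a multiple of ThreadsPerChild")
    return warnings


# ServerLimit 200, ThreadsPerChild 25, MaxRequestWorkers 5000
print(check_worker_mpm(200, 25, 5000))
```

An empty list means the three values are consistent with one another. It says nothing about whether the host has enough memory and CPU for 5,000 threads.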

2. Tune Web Gateway Settings

Max Server Connections: 8000
Server Response Timeout: 900 (seconds)
No Activity Timeout: 86401 (seconds)

Restart Apache after changes:

sudo service apache2 restart
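To make the Web Gateway values easier to reason about, the sketch below converts the two timeouts into hours, minutes, and seconds and compares the gateway connection ceiling with the Apache worker count. Treating "gateway limit at or above MaxRequestWorkers" as the alignment rule is an assumption made here for illustration, not a documented requirement, and the helper name `describe_seconds` is hypothetical.

```python
def describe_seconds(seconds):
    """Format a duration in seconds as hours, minutes, and seconds."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes}m {secs}s"


server_response_timeout = 900   # seconds, from the settings above
no_activity_timeout = 86401     # seconds, from the settings above
max_server_connections = 8000   # Web Gateway setting above
max_request_workers = 5000      # Apache MPM setting above

print("Server Response Timeout:", describe_seconds(server_response_timeout))
print("No Activity Timeout:", describe_seconds(no_activity_timeout))
# Assumed rule of thumb: the gateway should not cap connections below Apache's worker count.
print("Gateway covers Apache workers:", max_server_connections >= max_request_workers)
```

Seen this way, the response timeout is 15 minutes and the idle timeout is just over 24 hours, so it is worth confirming that values this long suit your own workload.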

🚀 Retesting Performance

Re-run ab with the same high-concurrency settings that exposed the problem:

ab -n 2000 -c 75 https://yourserver/endpoint

Monitor improvements in:

Requests per second
Failed requests
Connection times

I repeated the earlier ab command:
ab -n 2000 -c 75 https://fhir.testintersystems.com/csp/healthshare/demo/fhir/r4/Patient/1

I now saw the following statistics:

Server Hostname: fhir.testintersystems.com

Concurrency Level: 75

Time taken for tests: 5.975 seconds

Complete requests: 2000

Failed requests: 0

Total transferred: 5290000 bytes

HTML transferred: 4606000 bytes

Requests per second: 334.75 [#/sec] (mean)

Time per request: 224.045 [ms] (mean)

Time per request: 2.987 [ms] (mean, across all concurrent requests)

Transfer rate: 864.67 [Kbytes/sec] received

Connection Times (ms)

min mean[+/-sd] median max

Connect: 6 136 24.6 135 286

Processing: 7 84 23.4 82 171

Waiting: 3 47 18.8 45 160

Total: 13 221 28.3 219 363

Percentage of the requests served within a certain time (ms)

50% 219

66% 223

75% 227

80% 229

90% 257

95% 281

98% 296

99% 312

100% 363 (longest request)

Notice that in the retest (after performance tuning) the mean total connection time dropped from 402 ms to 221 ms, about 45%, and 99% of requests completed within 312 ms, compared to the previous 4.8 sec. Throughput rose from 185 to about 335 requests per second, and there were no failed requests.
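The before-and-after comparison is simple percentage arithmetic, sketched below with figures copied from the two reports above. The helper name `percent_change` is illustrative.

```python
def percent_change(before, after):
    """Relative change from before to after, as a percentage."""
    return (after - before) / before * 100


# Figures from the second (untuned) and third (tuned) ab runs.
print(f"mean total time: {percent_change(402, 221):+.1f}%")
print(f"99th percentile: {percent_change(4832, 312):+.1f}%")
print(f"requests/second: {percent_change(185.00, 334.75):+.1f}%")
```

This gives roughly -45.0%, -93.5%, and +80.9%, which shows that the largest gain was in tail latency rather than in the mean.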

📊 Monitor System Performance

This short post mentioned a few factors that could limit overall HTTP performance. Memory, CPU, and disk utilization could also create performance limitations. To understand these performance bottlenecks, it may be necessary to monitor system performance over 30 minutes to several hours.

Use the IRIS ^SystemPerformance utility, run from the %SYS namespace in an IRIS terminal session, to identify server-side bottlenecks:

Do ^SystemPerformance

📝 Key Takeaways

Optimize Apache’s MPM settings for higher concurrency.
Fine-tune Web Gateway configurations to align with Apache limits.
Use Apache Bench iteratively to test and validate changes.

Performance optimization is an ongoing process: continuously monitor, test, and refine your configurations to meet growing demands.

For further details, refer to IRIS Web Gateway Documentation.

Happy optimizing! 🚀

Tomas Vaverka · Jul 23

Server Response Timeout: 900
No Activity Timeout: 86401

Why were these settings chosen? Are they specific to the FHIR server?

A Server Response Timeout of 900 sec means we wait 15 minutes for the response, then give up. Is this the issue? If there are such long responses, I would rather investigate the performance of the given page or server.

As for No Activity Timeout: 86401, if you have a hiccup that creates thousands of new connections, they will stay open for another 24 hours if idle. I would prefer a short timeout (the default is 900 seconds) to free up resources sooner.
