Question
· Sep 4, 2022

System having trouble writing data to disk

Hi Guys,

I'm seeing this warning message in the SMP (attached below). We have more than enough disk space (1.5TB free), so I'm not sure where to check or what the problem could be, e.g. which database, global, or process?

 

 

Thanks 

Product version: Caché 2014.1

This is not a nice error to see in a production environment. If this is a production site, I recommend contacting the Support department. 

To find the problem, look at the console log in the mgr directory of the installation (cconsole.log in Caché, messages.log in InterSystems IRIS). There should be an error written there with more details.
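As a rough illustration of that first step, here is a minimal Python sketch that filters console log lines by severity. It assumes the usual line layout of `MM/DD/YY-HH:MM:SS:mmm (pid) severity message`, with severity 0 for informational, 1 for warning, 2 for severe and 3 for fatal. The function name `find_alerts` and the sample lines are invented for illustration, not taken from the poster's system.

```python
import re

# Assumed line layout: "MM/DD/YY-HH:MM:SS:mmm (pid) severity message".
LINE_RE = re.compile(
    r"^(?P<stamp>\d{2}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}:\d{3}) "
    r"\((?P<pid>\d+)\) (?P<severity>\d) (?P<message>.*)$"
)


def find_alerts(lines, min_severity=2):
    """Return (timestamp, severity, message) for entries at or above min_severity."""
    alerts = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # continuation lines or unexpected formats are skipped
        severity = int(match.group("severity"))
        if severity >= min_severity:
            alerts.append((match.group("stamp"), severity, match.group("message")))
    return alerts


# Invented sample lines, for illustration only.
SAMPLE = [
    "09/04/22-10:15:30:123 (1234) 0 Sample informational entry",
    "09/04/22-10:16:02:456 (1234) 2 Sample severe entry",
    "   continuation text without a timestamp",
]

for stamp, severity, message in find_alerts(SAMPLE):
    print(stamp, severity, message)
```

To run it against a real log, pass an open file object instead of `SAMPLE`; lower `min_severity` to 1 if you also want plain warnings.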

Most of the time, these errors are related to disk permissions or disk stress. Tools like ^SystemCheck and/or ^SystemPerformance (named ^Buttons and ^pButtons in Caché) will gather the information required to diagnose the problem. If you contact Support, they will probably ask for this.

These warnings suggest a performance problem, most likely CPU and disk bottlenecks.

If you are running very intensive processes, you should see these warnings go away after they finish. However, if you keep seeing them continuously, I recommend going deeper and analyzing the performance. 

Analyzing the performance takes time, and it's not always easy for non-experts. There is a great series of articles written in this community that can help with this: 

https://community.intersystems.com/post/intersystems-data-platforms-capa...

If you don't want to go through the whole series and can easily increase the system resources, that would be an easy way to test and solve the problem ;-). By increasing system resources I mean adding more CPUs and faster disks. Nowadays, with virtual machines and cloud systems, this is quite simple.

And let me insist again: if this is a production system, you may want to open a WRC problem for extra help. A disk that is too stressed will end in user/application pauses.

The "write daemon" is the process responsible for writing all new or changed data to disk. It writes first to the WIJ (write image journal) file; only then are the actual databases updated.

Sometimes, when the system is very busy with a lot of pending writes (e.g. when you rebuild a huge index or run a data migration), this error appears and then clears automatically after a few minutes.

I would monitor the activity of the disk that the WIJ file is located on (by default it's on the same disk where you installed Caché).

One solution is to move the WIJ to a different, less busy disk. This will give the "write daemon" more write capacity (you will have to restart Caché).

Yes, you also have journal files. They keep all the changes (sets, kills, transaction starts/ends) made to the DB (after the actual write to the DBs), and they also make it possible to roll back transactions.

The write daemon and the WIJ file are more about keeping the DB's physical "integrity" in case of a failure, and they come into play before the actual data is written to the DBs.

I see you are using Windows, so just look at the Windows Task Manager for the "active time" of disk D:\. If you see that it hits 100% "active time" at times, move the WIJ to a different disk. This will improve performance.
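If you would rather log the readings than watch Task Manager, the check above can be sketched in Python as follows. This is only a sketch: it assumes you have already collected a list of "active time" percentages at a fixed interval (how you collect them is up to you), and the function name `saturation_summary` and the sample readings are hypothetical.

```python
def saturation_summary(samples, threshold=100.0):
    """Summarise how often sampled disk active time reached the threshold."""
    saturated = 0      # number of samples at or above the threshold
    longest_run = 0    # longest streak of consecutive saturated samples
    current_run = 0
    for value in samples:
        if value >= threshold:
            saturated += 1
            current_run += 1
            longest_run = max(longest_run, current_run)
        else:
            current_run = 0
    share = saturated / len(samples) if samples else 0.0
    return {"saturated": saturated, "share": share, "longest_run": longest_run}


# Hypothetical readings, one per sampling interval.
readings = [40, 100, 100, 70, 100]
print(saturation_summary(readings))
```

Occasional saturated samples during a heavy job are expected; long consecutive runs are the stronger hint that the WIJ disk is the bottleneck.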

To build on Yaron's answer, our documentation is quite helpful for understanding the roles of journaling and the WIJ. The individual chapters on each go into further detail, of course, but here's a brief section describing the two:

Differences Between Journaling and Write Image Journaling

I would also echo all of Mario's guidance. To answer one of your specific questions above: the "warnvalue" is a configurable threshold for when a warning should be raised, in your case I believe the default of 75% CPU usage.

I think everything else has been covered by the other commenters.