Question
Rochdi Badis · Sep 4

System having trouble writing data to disk

Hi Guys,

I'm getting this warning message in the SMP (attached below). We have more than enough disk space (1.5TB free), so I'm not sure where to check or what the problem could be, e.g. which database, global, or process.

 

 

Thanks 

Product version: Caché 2014.1

This is not a nice error to see in a production environment. If this is a production site, I recommend contacting the Support department. 

To find the problem, you can look at the messages.log. There should be an error written with more details.

Most of the time, these errors are related to disk permissions or disk stress. Tools like ^SystemCheck and/or ^SystemPerformance will gather the required information to diagnose the problem. If you contact Support, they will probably ask for this. 

Oops! I just realized you wrote that the system is Caché and not IRIS. So, instead of messages.log, you should read cconsole.log. Instead of ^SystemCheck, it would be ^Buttons, and instead of ^SystemPerformance, it would be ^pButtons. 

The console file refers to something called Warnvalue. What's that?

I suspect that somewhere in the system a warning value has been set to 75%, and once it is reached the system starts throwing those warnings?

Thanks

You marked the wrong lines. The lines that should alert you in this case are any that mention the Write Daemon.

In your case there are many alerts that the Write Daemon completed a pass. Most likely this means that your disk is too slow. So, check the disk queue and how fast the disk is working.

And a WIJ file of more than 8 GB is a bad sign, for sure.
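For anyone who wants to pull those lines out of a long console log, here is a minimal Python sketch. It assumes nothing about the log format beyond the words "Write Daemon" appearing on the relevant lines; the function names are made up for illustration, and the path you pass in is whatever your own cconsole.log location is.

```python
from typing import Iterable, List


def write_daemon_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that mention the write daemon (case-insensitive match)."""
    return [line.rstrip("\n") for line in lines if "write daemon" in line.lower()]


def scan_file(path: str) -> List[str]:
    """Read a console log from `path` (hypothetical helper) and filter it."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(path, errors="replace") as log:
        return write_daemon_lines(log)
```

Calling `scan_file` on your cconsole.log and printing the result gives a quick view of how often the messages appear and whether they cluster around particular times of day.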

Thanks, but I don't know how to check disk queues or activity. Is there a tool or something?

Also, my understanding is that new data and changes are written to the journal files, as shown in my attached screenshot. This is the first time I've heard about the WIJ file.

@Dmitry Maslennikov, a WIJ file of 8GB means that the system wrote 8GB of cache buffers. This doesn't have to be a bad sign by itself and can be perfectly normal. 

These warnings suggest a performance problem, with CPU and disk bottlenecks. 

If you are running very intensive processes, you should see these warnings go away after they finish. However, if you keep seeing them continuously, I recommend going deeper and analyzing the performance. 

Analyzing the performance takes time, and it's not always easy for non-experts. There is a great series of articles written in this community that can help with this: 

https://community.intersystems.com/post/intersystems-data-platforms-capa...

If you don't want to go through the whole series and can easily increase the system resources, that would be an easy way to confirm and solve the problem ;-). By increasing system resources I mean adding more CPUs and faster disks. Nowadays, with virtual machines and cloud systems, this is quite simple. 

And let me insist again: if this is a production system, you may want to open a WRC problem for extra help. An overstressed disk will end in user/application pauses. 

The "write daemon" is a process that is responsible for writing all new/changed data to the disk where the WIJ (write image journal) file is located. Only then are the actual databases updated.

Sometimes, when the system is very busy with a lot of pending writes, this error appears and then clears automatically after a few minutes (e.g. when you rebuild a huge index or run a data migration process).

I would monitor the disk activity for the disk that the WIJ file is located on (by default it's on the same disk where you installed Cache).

One solution is to move the WIJ to a different, less busy disk. This will give the "write daemon" more write capacity (you will have to restart Cache).

I thought that new data and changes go into the journal files, as attached below.

But you mentioned that new/changed data is first written to the WIJ file, as below?

And yes, as you mentioned, that error comes and goes, but the file is located on a drive with enough space. Any ideas on how I can monitor the activity on that disk?

Thanks Yaron

Yes, you also have journal files. They keep all the changes (sets, kills, start/end of transactions) made to the DB (after the actual write to the DBs), and they also make it possible to roll back transactions.

The write daemon and the WIJ file are more about keeping the DB's physical "integrity" in case of a failure, and that happens before the actual data is written to the DBs.

I see you are using Windows. So just look at the "active time" of disk D:\ in the Windows Task Manager. If you see that there are times when it hits 100% "active time", then move the WIJ to a different disk. This will improve performance.
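If watching Task Manager by eye is not practical, the same check can be done on logged samples. Below is a rough Python sketch that counts how many samples in a CSV reach a threshold. It assumes a header row followed by rows of (timestamp, value), which is the kind of file a performance counter logger such as Windows typeperf can produce; which counter best matches Task Manager's "active time" is something to verify on your own machine. The function name and the default threshold are assumptions made for illustration.

```python
import csv
import io
from typing import Tuple


def busy_samples(csv_text: str, threshold: float = 100.0) -> Tuple[int, int]:
    """Count samples at or above `threshold`.

    Assumes the first row is a header and every later row is
    (timestamp, value). Rows without a numeric value are skipped.
    Returns (samples_at_or_above_threshold, total_numeric_samples).
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    busy = total = 0
    for row in reader:
        if len(row) < 2:
            continue
        try:
            value = float(row[1])
        except ValueError:
            continue  # blank or non-numeric sample
        total += 1
        if value >= threshold:
            busy += 1
    return busy, total
```

If a large share of the samples for the WIJ disk sit at the threshold while the warnings are being logged, that supports moving the WIJ to another disk; a lower threshold (say 90) may be a more forgiving test.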

To build on Yaron's answer, our documentation is quite helpful for understanding the roles of journaling and the WIJ. The individual chapters on each item of course go into further detail, but here's a brief section describing the two.

Differences Between Journaling and Write Image Journaling

I would also echo all of Mario's guidance. To answer one of your specific questions above, the "warnvalue" is a configurable threshold for when a warning should be thrown. In your case I believe it is the default of 75% CPU usage. 

I think anything else I would say has been covered by other commenters.

Hi Rochdi, what purge do you have set up for the journals, and when did you last perform a backup? How much space is available in your journal and alt-journal directories? Ian

Hi Ian,

Journal files are purged daily and I still have 1.4TB of space.

But that's actually what I'm confused about. As far as I know, new data and changes are written to the journal files in the directory below, so what is the Cache.wij journal file, which in my case is stored in ..\mgr?

 

Thanks

Sorry, I should have read all the messages. I can see you have enough disk space. Contact the WRC.