Windows write caching
(This article was reviewed in February 2021. It is still relevant to Caché-based installations and similarly applies to IRIS-based installations.)
This article discusses the Windows write caching setting, which can leave systems vulnerable to data loss or corruption in the event of a power loss or operating system crash. The setting is on by default in some Windows configurations.
Having Windows write caching enabled for a disk means that some of the things Caché (or any program) writes to that disk will not necessarily be immediately committed to durable storage (even though Caché flushes writes from the OS cache to disk at certain critical points in its write phase). If the computer loses power, whatever has been cached for that device will be lost unless the cache for that device is non-volatile or battery-backed. Caché depends on the operating system to guarantee that data is durable. In this scenario, the guarantee is broken. For Caché, that can result in database corruption or data missing from databases or journal files.
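To make the idea of flushing writes from the OS cache concrete, here is a minimal Python sketch of what an application can do on its side. It is not how Caché implements its write phase; the function name durable_write and the file name are hypothetical, chosen only for illustration. The sketch shows the application-side request only. Whether the data is truly durable once the call returns still depends on the device and its cache, which is the point of this article.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data, then ask the OS to push it to the storage device.

    Hypothetical helper for illustration; not InterSystems code.
    """
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's own buffer into the OS cache
        os.fsync(f.fileno())  # ask the OS to flush its cache for this file
        # Beyond this point the application has no further control:
        # a volatile device-level write cache can still lose the data
        # if power is lost before the device commits it.


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "example.dat")
    durable_write(target, b"critical block")
    print("wrote", os.path.getsize(target), "bytes to", target)
```

The application can only request the flush. It relies on the operating system and the storage beneath it to honor that request.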
InterSystems documents that one of the things that can break the guarantees provided by Write Image Journaling is the loss of write-back cache contents (see http://docs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY=GCDI_wij#GCDI_wij_limits ). InterSystems Worldwide Response Center’s data integrity team has investigated a number of data loss or corruption cases on Windows platforms where evidence has indicated write-back cache content was lost due to the value of this setting.
It’s worth mentioning that a disk can have a cache that makes it robust against this kind of problem. If the cache of the disk in question is non-volatile or battery-backed, writes to that disk should be safe even when this setting is on. If the storage in question is more complex than a directly connected disk, you need to understand where writes are cached in that storage infrastructure, and whether those caches are volatile or battery-backed, to assess this risk.
You can see the setting by going to Device Manager, expanding the Disk Drives section, and looking at the Properties for a given disk. The setting we're interested in is on the Policies tab.
The wording is not always identical to what you see here, and it may differ by the type of device. However, this is a common presentation of the wording, and one where Windows is making it clear that having this setting turned on exposes your system to risk of data loss if the machine loses power or crashes.
Next is an example from another disk on the same machine where the impact is a little less clear. Choosing "Better performance" here would come with the same concerns as Enable Write Caching in the other example.
In both of these examples, the setting you see selected was the default for that device; I hadn’t changed it. You can see that in the first example the default put the device at risk, but not in the second. As far as I know, there’s no universal default based on the device type or the version of Windows. In other words, this is a setting that needs to be checked on a per-device basis to know whether that device has this risk.
There are three basic ways to deal with this scenario as a system administrator. Disabling the setting is the simplest way to ensure that you're not exposed to this risk. However, it's possible that disabling the setting will have an unacceptable performance impact. If that's the case, you may prefer to leave the setting on and connect the computer to an uninterruptible power supply (UPS). Doing that offers protection against power loss causing data loss or corruption, since the UPS should give you enough time to shut down gracefully when power is lost. The last option is simply to accept the risk of data being lost when the server loses power or crashes. InterSystems recommends against this option. Consumer-grade UPS units are relatively inexpensive, and detecting and recovering from integrity problems can be time-consuming and problematic.
InterSystems recommends that you do not turn this setting on without making sure that the computer is connected to an uninterruptible power supply. If the storage is an external device, that device also needs to be connected to a UPS.
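The per-device reasoning in this article can be written down as a small checklist. The Python sketch below is only an illustration: the Disk class, its field names, and the assess function are hypothetical, and the values have to be filled in by hand after inspecting each device (for example, on the Policies tab in Device Manager). It does not query Windows.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    """Hand-entered facts about one device (all fields are hypothetical)."""
    name: str
    write_cache_enabled: bool  # the Windows write caching setting
    cache_protected: bool      # cache is non-volatile or battery-backed
    on_ups: bool               # computer (and any external storage) is on a UPS


def assess(disk: Disk) -> str:
    """Return a short verdict following the reasoning in this article."""
    if not disk.write_cache_enabled:
        return "ok: write caching is disabled"
    if disk.cache_protected:
        return "ok: cache is non-volatile or battery-backed"
    if disk.on_ups:
        return "mitigated: UPS protects against power loss"
    return "at risk: cached writes can be lost on power loss"


if __name__ == "__main__":
    disks = [
        Disk("system disk", True, False, False),
        Disk("data disk", True, False, True),
        Disk("journal disk", False, False, False),
    ]
    for d in disks:
        print(f"{d.name}: {assess(d)}")
```

Because there is no universal default, each device gets its own entry. Any disk that comes out as at risk calls for one of the options described above.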