Nov 8, 2016 · 4m read

Returning Disk Space to File Systems

This is the first article in a series on regaining disk space from Caché databases at the operating system level. This introductory article discusses Caché database growth and gives an overview of the methods you can use to return unused disk space allocated to database files back to the file system. But before we talk about returning space to the file system, let’s first review how it gets allocated in the first place.

Caché databases always have the name CACHE.DAT and can reside on practically any file system supported by the underlying operating system, including network-mounted disks (NFS). The initial size of a new Caché database is 1 MB by default, but this can be changed at creation time. You can create new databases via the browser-based Management Portal or using the ^DATABASE command line utility, as well as programmatically through an API.

As new global data is stored in an individual Caché database, naturally the free space inside that file decreases. If there are enough free blocks to hold that data (the way global data is stored inside the database file is beyond the scope of this article), then the size of the CACHE.DAT file will not change at the operating system level. However, in the case that there are no more free blocks to hold the new data, Caché will automatically expand the CACHE.DAT file by allocating more disk blocks to it from the available space on the file system. You can control the amount of disk space by which the CACHE.DAT file will be expanded or you can let Caché determine the amount dynamically, based on the current size of the database file.
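The growth behavior described above can be illustrated with a toy model. The Python sketch below is not Caché's actual allocator: the class name, the use of megabytes instead of blocks, and the fixed 10 MB expansion increment are all assumptions made for illustration, and the dynamic expansion policy is not modeled.

```python
from dataclasses import dataclass


@dataclass
class ToyDatabase:
    """Toy model of a database file; sizes are in MB for simplicity."""
    size_mb: int = 1          # file size as seen by the operating system
    used_mb: int = 0          # space occupied by global data
    expansion_mb: int = 10    # hypothetical fixed expansion increment

    def store(self, data_mb: int) -> None:
        """Store data, expanding the file only when free space runs out."""
        while self.size_mb - self.used_mb < data_mb:
            self.size_mb += self.expansion_mb
        self.used_mb += data_mb


db = ToyDatabase()
db.store(1)   # fits into the initial 1 MB, so the file size does not change
db.store(5)   # no free space left, so the file expands by one increment
print(db.size_mb, db.used_mb)
```

The first call leaves the file size untouched because the data fits into existing free space; only the second call forces an expansion.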

There are two main scenarios where this expansion could fail. The obvious one is that there simply is not enough free disk space on the underlying file system for the required database expansion. The other case is when the maximum size of a database file is limited by Caché; you can set this maximum size to be “unlimited” (i.e. only limited by the space available on the file system), or you can direct Caché not to expand a given database beyond a selected value, say 1.5 TB. Note that there are also some other conditions that can prevent database expansion, such as an architectural maximum size for a database file, which for a typical (so-called “8K block”) Caché database is 32 TB.
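The conditions above can be summarized as a simple pre-check. This is a minimal sketch, assuming you already know the current file size, the expansion size, and the configured maximum; the function name and its parameters are hypothetical and are not part of any Caché API. Only the 32 TB figure comes from the text above.

```python
import shutil
from typing import Optional

GB = 1024 ** 3
TB = 1024 ** 4
ARCHITECTURAL_MAX = 32 * TB  # limit quoted above for an 8K-block database


def expansion_blocker(current_bytes: int, expansion_bytes: int,
                      free_bytes: int,
                      max_bytes: Optional[int] = None) -> Optional[str]:
    """Return the reason an expansion would fail, or None if it can proceed.

    max_bytes=None stands for an "unlimited" maximum size.
    """
    new_size = current_bytes + expansion_bytes
    if expansion_bytes > free_bytes:
        return "not enough free space on the file system"
    if max_bytes is not None and new_size > max_bytes:
        return "configured maximum database size would be exceeded"
    if new_size > ARCHITECTURAL_MAX:
        return "architectural maximum database size would be exceeded"
    return None


# "." is a placeholder for the directory that holds the database file.
free_now = shutil.disk_usage(".").free
print(expansion_blocker(1 * TB, 10 * GB, free_now, max_bytes=int(1.5 * TB)))
```

The result of the last line depends on how much free space the file system under the placeholder path actually has.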

Depending on the nature of your application, it may be a good idea to define both the maximum size and the expansion size for your database(s). The larger the database is when first created, the more data it can hold without expanding, but the longer it takes to create. Similarly, the larger the expansion size, the fewer expansions are needed, but each one takes longer to complete. The optimal values for these parameters depend on the application and usually can only be determined after it has been live for a while. Fortunately, both values can be changed at any time during the life of a database.

So if a database can automatically grow when new space is needed, will it also automatically shrink when global data is deleted (killed) and free space inside the database increases? Unfortunately, the answer is no: once allocated, disk space is not automatically returned to the file system. But fear not, InterSystems provides tools that let you do just that when needed. There are two of them: GBLOCKCOPY and “Compact & Truncate”.

GBLOCKCOPY is a command line utility that can be used to copy global data from one database to another. While the tool provides a lot of options and can be used in many different ways, for the purpose of regaining disk space you can simply copy all the globals from the source database (that you want to shrink) into a new database file. Since only the data from the source database will be copied to the target one, the size of the new database will reflect the actual space used in the source one. After the copy, you can change your configuration to point to the new, smaller database, and delete the old, larger one. The obvious drawback of this method is that you will temporarily need additional disk space to hold the new database. Secondly, since you cannot run this utility on a live database, you will need to operate on a backup copy and afterwards apply journals from the live system. Finally, there would be a brief downtime for your application while you are switching the databases. Alternatively, you could dismount the source database and perform the database copy while your application is off-line, if that is an option.
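The disk space trade-off of the copy approach comes down to simple arithmetic. The sketch below is a rough planning aid, not something GBLOCKCOPY provides: the function name is hypothetical, and it assumes the new file ends up roughly the size of the used data and that both files live on the same file system.

```python
def copy_plan(file_mb: int, used_mb: int, free_fs_mb: int) -> dict:
    """Estimate disk usage when copying live data into a new database file."""
    return {
        "fits": used_mb <= free_fs_mb,       # room for the temporary second copy?
        "peak_mb": file_mb + used_mb,        # both files exist until the switch
        "reclaimed_mb": file_mb - used_mb,   # returned once the old file is deleted
    }


# Example figures are made up: a 500 GB file holding 120 GB of data.
plan = copy_plan(file_mb=500_000, used_mb=120_000, free_fs_mb=150_000)
print(plan)
```

In this made-up example the copy fits, but the file system briefly has to hold both the old and the new file before any space is reclaimed.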

The database Compact and Truncate operations can be run from the Management Portal or called programmatically using an API. They work in tandem and must be run in that order: compact first, then truncate. The compact operation moves unused blocks to the end of the database file, so that your free space is all in one place. The truncate operation can then return some or all of this free space to the file system by truncating the file. The beauty of this approach is that it can be done on a live system while users are accessing data through your application, and that it requires neither additional disk space nor downtime for your application. However, this functionality is only available in more recent versions of Caché.
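To see why the order matters, consider a toy model of the two steps. This Python sketch is only an illustration of the idea, not how Caché implements it: a list stands in for the database file, `None` marks a free block, and the function names and the scratch file `TOY.DAT` are made up. A real compaction also has to keep the database structure consistent while moving blocks, which is not modeled here.

```python
import os
import tempfile

BLOCK = 8192  # bytes per block, matching the "8K block" databases mentioned earlier


def compact(blocks):
    """Move used blocks to the front so that all free blocks (None) sit at the end."""
    used = [b for b in blocks if b is not None]
    return used + [None] * (len(blocks) - len(used))


def truncate(blocks):
    """Drop trailing free blocks; free blocks between used ones cannot be dropped."""
    end = len(blocks)
    while end > 0 and blocks[end - 1] is None:
        end -= 1
    return blocks[:end]


blocks = ["A", None, "B", None, None, "C", None]
truncated_only = truncate(blocks)   # only the last free block can be returned
shrunk = truncate(compact(blocks))  # all four free blocks can be returned

# The same effect on a scratch file at the operating system level.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "TOY.DAT")
    with open(path, "wb") as f:
        f.write(b"\0" * BLOCK * len(blocks))
    size_before = os.path.getsize(path)
    os.truncate(path, BLOCK * len(shrunk))
    size_after = os.path.getsize(path)

print(len(truncated_only), len(shrunk), size_before, size_after)
```

Truncating without compacting first frees only the free blocks that already happen to sit at the end of the file, which is why the two operations are used together.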

The next articles in this series will discuss the two approaches outlined above in more detail and will provide examples of how to use these tools. Later on we will cover other issues related to disk space management on your Caché system, such as database journal files.

So please stay tuned…
