
Question about database defragmentation (several TB database)

I have a database that is 2TB in size (from a LIVE system).

I would like to defragment it, but I have some questions. I took a look at the official documentation, but it does not help.

  • Is it possible to get a measurement of how fragmented the database is (e.g. a percentage)? Most operating systems are able to give an indication of how fragmented a filesystem is before any operation.
  • The database is on a LIVE server, and the IRIS service is restarted every night (around 1AM). Because of that, I can't run the defragmentation continuously, and I don't think it will be able to complete within 24 hours. If the IRIS service is stopped while the defragmentation is running, will it be handled properly, or should the defragmentation be stopped beforehand?
  • If the defragmentation is run once, then stopped, then run again and so on, will progress be made over time (thus achieving a good defragmentation at some point)?

InterSystems recommends running the defragmentation outside peak hours.
Based on what is written above, does it make sense to run the defragmentation every night (e.g. during 1AM - 6AM, outside peak hours) for several days?

Product version: IRIS 2024.1
$ZV: IRIS for Windows (x86-64) 2024.1.2 (Build 398U) Thu Oct 3 2024 14:01:59 EDT

The database is on an SSD/NVMe drive. The impact of random vs. sequential access on an SSD is smaller than on an HDD, but it is not negligible. Run a CrystalDiskMark benchmark on any SSD and you will find that random access is slower than sequential access.

This is summarized well by a CrystalDiskMark comparison posted on r/computing: SATA HDD vs SATA SSD vs SATA NVMe.

Why I want to defragment the database: I found that the length of the I/O write queue on the database drive gets quite high (up to 35). The drives holding the journals and the WIJ have a much lower maximum write queue length (it never gets higher than 2), even though the amount of data being written is the same (peaks of about 400MB/s). The difference is that database writes are random access while WIJ and journal writes are mostly sequential.

Ages ago I implemented a project named BlocksExplorer, which can be used in a variety of ways.

One of them is generating a picture of the database, visualizing its fragmentation.

I think I tried it once on a 2TB database, but that was a long time ago.

Anyway, this tool generates a picture in BMP format, where each pixel represents one block in the database.

For a 2TB database that is around 268435456 blocks (with the default 8KB block size), which gives a 16384x16384 picture; at 3 bytes per pixel the BMP will be around 768MB, quite a big one.
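A quick sanity check of that arithmetic, as a terminal sketch (assuming the default 8KB block size):

    set blocks = (2 * (2 ** 40)) / 8192   //2TB / 8KB = 268435456 blocks
    set side = $zsqr(blocks)              //square image: 16384 pixels per side
    set mb = (blocks * 3) / (2 ** 20)     //3 bytes per pixel -> 768MB
    write blocks, " blocks, ", side, "x", side, ", ~", mb, "MB BMP"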

This project is fully open source; you can modify it a bit, and if you generate the picture per global, you may be able to make a map.

I have checked your project and extracted the logic of this function (an almost 1:1 copy-paste). It works for small databases (a few GB in size): I can easily calculate a fragmentation percentage (by checking for consecutive blocks of the same globals). But for bigger databases (TB in size) it does not work, as it only enumerates a small percentage of all globals. It seems the global catalog is quite big and split across multiple blocks (it usually starts at block #3).

EDIT: there is a "Link Block" pointer to follow.
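In the script below this corresponds to reading the 4-byte pointer at offset 8 of a catalog block (taken from what the BlocksExplorer-derived code does; I'm assuming the offset is stable across block format versions):

    set nextglobal = $view(8,0,4)  //link pointer to the next catalog block, 0 if there is none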

Hi Norman,

we need to separate two areas of fragmentation:
1. Filesystem/OS-level fragmentation:
     There is nothing we can do about this except running your trusted defrag tool, if the filesystem has one and it is actually in need of defragging.
2. Database/global fragmentation:
     This is a very interesting topic. Usually nothing needs to be done for an IRIS database; IRIS is pretty good at managing global block density (refer to https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls...)

You can use the output of an integrity check to see the density of each global in the database. Both defrag and compact operations are non-destructive and can safely be interrupted, so even if they don't finish, they can just be started again and will continue on.
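For instance, the ^Integrity utility can be run interactively from the %SYS namespace (a sketch; the exact prompts and entry points may vary by version, and the same check is available from the Management Portal):

    znspace "%SYS"
    do ^Integrity  //prompts for an output file and the databases to check; the report lists block counts and density per global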

Hello Norman,

1. Build a sync mirror with the server; then you can "set no failover" on the primary and "play" with the backup. You can take IRIS down on the backup at any time, and it will re-sync when it's up again.

2. O/S defragmentation - either with an O/S tool, or by copying the DB to a new, clean disk.

3. Internal defragmentation - there are some options:
A) copy all data with GBLOCKCOPY to a new DB, with 1 process, to keep the data as sequential as possible (this will take a lot of time)
B) use the internal defragmentation tool (this needs 2x the space of that DB), then compact globals (see the sketch after this list)

4. In general, it's good practice to split indices into a different DB (and use mapping), since the data ("D" globals) usually does not get as fragmented over time. This also helps with future defragmentation, compact globals, DB "shrink" and other maintenance, since each runs on a smaller volume.
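A minimal sketch of option 3B, assuming the SYS.Database API available in recent IRIS versions (the ^DATABASE utility in %SYS exposes the same operations interactively; check your version's class reference for exact signatures, and substitute your own database path for the placeholder):

    znspace "%SYS"
    //defragment all globals in the database; requires extra free space and can safely be re-run
    set sc = ##class(SYS.Database).Defragment("D:\YOUR_DATABASE\")
    write:'sc $system.Status.GetErrorText(sc)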

I wrote a script that calculates the database fragmentation level (between 0 and 100%). The idea is to fetch all blocks, find which global each belongs to, and then count how many segments exist, one segment being a run of consecutive blocks belonging to the same global (e.g. in AAAADDBBAACCC there are 5 segments). It's based on Dmitry's BlocksExplorer open source project. The formula is as follows:

Fragmentation % = (TotalSegments - GlobalCount) / (TotalBlocks - GlobalCount)

    Blocks                  Formula            Fragmentation
    AAAAAACCBBDDD (best)    (4-4) / (13-4)     0%
    AAAADDBBAACCC           (5-4) / (13-4)     11%
    ACADDAABBAACC           (8-4) / (13-4)     44%
    ACABADADBACAC (worst)   (13-4) / (13-4)    100%
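As a quick check of the formula, here is a small helper (hypothetical, separate from the script below) that computes the percentage for such block strings:

    ClassMethod TestFragmentation(blocks As %String) As %String
    {
        set segments = 0, globals = 0, previous = ""
        for i=1:1:$length(blocks)
        {
            set g = $extract(blocks, i)
            //a new segment starts whenever the global changes
            if g '= previous set segments = segments + 1, previous = g
            //count each distinct global once
            if '$data(seen(g)) set seen(g) = "", globals = globals + 1
        }
        quit $number((segments - globals) / ($length(blocks) - globals) * 100, 2)
    }

For example, write ..TestFragmentation("ACADDAABBAACC") prints 44.44, matching the table above (up to rounding).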
    ///usage: do ..ReadBlocks("D:\YOUR_DATABASE\")
    ClassMethod ReadBlocks(path As %String)
    {
        new $namespace
        znspace "%SYS"
        //get the total amount of blocks in the database
        set db = ##class(SYS.Database).%OpenId(path)
        set totalblocks = db.Blocks
        set db = ""
        set blockcount = 0
        //open the database on channel 63 for block-level reads with the VIEW command
        open 63:"^^"_path
        //block #3 holds the start of the global catalog
        set ^TEMP("DEFRAG", "NODES", 3)=$listbuild("", 0)
        while $data(^TEMP("DEFRAG", "NODES"))=10 //any children left to visit
        {
            set blockId = ""
            for
            {		
                set blockId = $order(^TEMP("DEFRAG", "NODES", blockId),1,node)
                quit:blockId=""
                kill ^TEMP("DEFRAG", "NODES", blockId)
                
                set globalname = $lg(node,1)
                set hasLong = $lg(node,2)
                
                do:blockId'=0 ..ReadBlock(blockId, globalname, hasLong, .totalblocks, .blockcount)			
            }	
        }
        close 63
        set ^TEMP("DEFRAG","PROGRESS") = "DONE"
        do ..CalculateFragmentation()
    }
    ClassMethod ReadBlock(blockId As %String, globalname As %String, hasLong As %Boolean, ByRef totalblocks As %Integer, ByRef blockcount As %Integer)
    {
        view blockId  //read the block into the VIEW buffer
        set blockType=$view(4,0,1)  //block type byte
        
        if blockType=8  //data block
        {
            if hasLong 
            {
                for N=1:1
                {
                    set X=$VIEW(N*2,-6)
                    quit:X=""
                    set gdview=$ascii(X)
                    if $listfind($listbuild(5,7,3),gdview)  //value references big-string blocks
                    {
                        set cnt=$piece(X,",",2)
                        set blocks=$piece(X,",",4,*)
                        for i=1:1:cnt 
                        {
                            set nextBlock=$piece(X,",",3+i)
                            
                            set ^TEMP("DEFRAG","GLOBAL",nextBlock) = globalname		
                            set blockcount = blockcount + 1 //update progress
                            set ^TEMP("DEFRAG","PROGRESS") = $number(blockcount / totalblocks * 100, 2)
                        }
                    }
                }
            }
        } 
        else //block of pointers
        {		
            if blockType = 9 //catalog block
            {
                set nextglobal=$view(8,0,4)	//link pointer: large catalogs may span multiple blocks
                quit:$data(^TEMP("DEFRAG","GLOBAL",nextglobal))
                set:nextglobal'=0 ^TEMP("DEFRAG", "NODES", nextglobal) = $listbuild("",0) //queue the next catalog block
            }
            
            for N=1:1
            {
                set X=$VIEW(N-1*2+1,-6) //left-to-right evaluation: ((N-1)*2)+1
                quit:X=""
                set nextBlock=$VIEW(N*2,-5)
                if blockType=9 set globalname=X //in a catalog block the key is the global name
                set haslong=0
                if $piece($view(N*2,-6),",",1) 
                {
                    set haslong=1
                }
                
                continue:$data(^TEMP("DEFRAG","GLOBAL",nextBlock)) //already seen?
                set ^TEMP("DEFRAG", "NODES", nextBlock) = $listbuild(globalname,haslong)				
                set ^TEMP("DEFRAG","GLOBAL",nextBlock) = globalname	
                set blockcount = blockcount + 1
                set ^TEMP("DEFRAG","PROGRESS") = $number(blockcount / totalblocks * 100, 2)
            }
        }
    }
    ClassMethod CalculateFragmentation()
    {
        //blocktypes counts distinct globals (GlobalCount in the formula above)
        set segments = 0, blocks = 0, blocktypes = 0
        kill ^TEMP("DEFRAG", "UNIQUE")
        
        set previousglobal = ""
        set key = ""
        for
        {	    
            set key = $order(^TEMP("DEFRAG","GLOBAL",key),1,global)
            quit:key=""
            if global '= previousglobal
            {	   		   	
                set previousglobal = global		   		
                set segments = segments + 1
            }
            
            if '$data(^TEMP("DEFRAG", "UNIQUE", global))	
            {
                set ^TEMP("DEFRAG", "UNIQUE", global)=""
                set blocktypes = blocktypes + 1
            }
            
            set blocks = blocks + 1
        }
        write $number((segments - blocktypes) / (blocks - blocktypes) * 100, 2)
    }

    Notes:

    • Use it at your own risk. It's not supposed to write anything to the database (it performs only read operations), but I'm unfamiliar with the VIEW command and its possible caveats.
    • This might take a really long time to complete (several hours), especially if the database is huge (TB in size). Progress can be checked by reading the ^TEMP("DEFRAG","PROGRESS") node, as shown below.
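For example, from another terminal session in the same namespace:

    write ^TEMP("DEFRAG","PROGRESS")  //percentage of blocks visited so far, or "DONE" when finished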