
Question about database defragmentation (several TB database)

I have a database that is 2TB in size (from a LIVE system).

I would like to defragment it, but I have some questions. I took a look at the official documentation, but it does not help.

  • Is it possible to get a measurement of how fragmented the database is (e.g. a percentage)? Most operating systems are able to give an indication of how fragmented a filesystem is before any operation.
  • The database is on a LIVE server, and the IRIS service is restarted every night (around 1AM). Because of that, I can't run the defragmentation continuously, and I don't think it will be able to complete within 24 hours. If the IRIS service is stopped while the defragmentation is running, will it be handled properly, or should the defragmentation be stopped beforehand?
  • If the defragmentation is run once, then stopped, then run again and so on, will progress be made over time (thus achieving a good defragmentation at some point)?

InterSystems recommends running the defragmentation outside peak hours.
Based on what is written above, does it make sense to run the defragmentation every night (e.g. during 1AM - 6AM, outside peak hours) for several days?

Product version: IRIS 2024.1
$ZV: IRIS for Windows (x86-64) 2024.1.2 (Build 398U) Thu Oct 3 2024 14:01:59 EDT

The database is on an SSD/NVMe drive. The impact of random vs. sequential access on an SSD is smaller than on an HDD, but it is not negligible. Run a CrystalDiskMark benchmark on any SSD and you will find that random access is slower than sequential access.

This is summarized well by a CrystalDiskMark comparison posted on r/computing: SATA HDD vs SATA SSD vs SATA NVMe.

Why I want to defragment the database: I found that the length of the I/O write queue on the database drive gets quite high (up to 35). The drives holding the journals and the WIJ have a much lower maximum write queue length (it never gets higher than 2), even though the amount of data being written is the same (peaks of about 400MB/s). The difference is that database writes are random access while WIJ and journal writes are mostly sequential.

Ages ago I implemented a project named BlocksExplorer, which can be used in a variety of ways.

One of them is generating a picture of the database, visualizing its fragmentation.

I think I tried it once on a 2TB database, but that was a long time ago.

Anyway, this tool generates a picture in BMP format, where each pixel represents one block in the database.

For a 2TB database that is around 268435456 blocks (with the default 8KB block size), which gives a 16384x16384 picture; at 3 bytes per pixel the BMP will be around 768MB, quite a big one.
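A quick sanity check of that arithmetic, as a terminal sketch (assuming the default 8KB block size):

    set blocks = (2 * (2 ** 40)) / 8192   //2TB / 8KB = 268435456 blocks
    set side = $zsqr(blocks)              //square image: 16384 pixels per side
    set mb = (blocks * 3) / (2 ** 20)     //3 bytes per pixel -> 768MB
    write blocks, " blocks, ", side, "x", side, ", ~", mb, "MB BMP"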

This project is fully open source; you can modify it a bit, and if you generate the picture per global, you may be able to make a map.

I have checked your project and extracted the logic of this function (an almost 1:1 copy-paste). It works for small databases (a few GB in size): I can easily calculate a fragmentation percentage (by checking for consecutive blocks of the same globals). But for bigger databases (TB in size) it does not work, as it only enumerates a small percentage of all globals. It seems the global catalog is quite big and split across multiple blocks (it usually starts at block #3).

EDIT: there is a "Link Block" pointer to follow.
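In the script below this corresponds to reading the 4-byte pointer at offset 8 of a catalog block (taken from what the BlocksExplorer-derived code does; I'm assuming the offset is stable across block format versions):

    set nextglobal = $view(8,0,4)  //link pointer to the next catalog block, 0 if there is none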

Hi Norman,

we need to separate two areas of fragmentation:
1. Filesystem/OS-level fragmentation:
     There is nothing we can do about this except running your trusted defrag tool, if the filesystem has one and it is actually in need of defragging.
2. Database/global fragmentation:
     This is a very interesting topic. Usually nothing needs to be done for an IRIS database; IRIS is pretty good at managing global block density (refer to https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls...)

You can use the output of an integrity check to see the density of each global in the database. Both defrag and compact operations are non-destructive and can safely be interrupted, so even if they don't finish, they can just be started again and will continue on.
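For instance, the ^Integrity utility can be run interactively from the %SYS namespace (a sketch; the exact prompts and entry points may vary by version, and the same check is available from the Management Portal):

    znspace "%SYS"
    do ^Integrity  //prompts for an output file and the databases to check; the report lists block counts and density per global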

Hello Norman,

1. Build a sync mirror with the server; then you can "set no failover" on the primary and "play" with the backup. You can take IRIS down on the backup at any time, and it will re-sync when it's up again.

2. O/S defragmentation - either with an O/S tool, or by copying the DB to a new, clean disk.

3. Internal defragmentation - there are some options:
A) copy all data with GBLOCKCOPY to a new DB, with 1 process, to keep the data as sequential as possible (this will take a lot of time)
B) use the internal defragmentation tool (this needs 2x the space of that DB), then compact globals (see the sketch after this list)

4. In general, it's good practice to split indices into a different DB (and use mapping), since the data ("D" globals) usually does not get as fragmented over time. This also helps with future defragmentation, compact globals, DB "shrink" and other maintenance, since each runs on a smaller volume.
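A minimal sketch of option 3B, assuming the SYS.Database API available in recent IRIS versions (the ^DATABASE utility in %SYS exposes the same operations interactively; check your version's class reference for exact signatures, and substitute your own database path for the placeholder):

    znspace "%SYS"
    //defragment all globals in the database; requires extra free space and can safely be re-run
    set sc = ##class(SYS.Database).Defragment("D:\YOUR_DATABASE\")
    write:'sc $system.Status.GetErrorText(sc)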

I wrote a script that calculates the database fragmentation level (between 0 and 100%). The idea is to fetch all blocks, find which global each belongs to, and then count how many segments exist, one segment being a run of consecutive blocks belonging to the same global (e.g. in AAAADDBBAACCC there are 5 segments). It's based on Dmitry's BlocksExplorer open source project. The formula is as follows:

Fragmentation % = (TotalSegments - GlobalCount) / (TotalBlocks - GlobalCount)

    Blocks                  Formula            Fragmentation
    AAAAAACCBBDDD (best)    (4-4) / (13-4)     0%
    AAAADDBBAACCC           (5-4) / (13-4)     11%
    ACADDAABBAACC           (8-4) / (13-4)     44%
    ACABADADBACAC (worst)   (13-4) / (13-4)    100%
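As a quick check of the formula, here is a small helper (hypothetical, separate from the script below) that computes the percentage for such block strings:

    ClassMethod TestFragmentation(blocks As %String) As %String
    {
        set segments = 0, globals = 0, previous = ""
        for i=1:1:$length(blocks)
        {
            set g = $extract(blocks, i)
            //a new segment starts whenever the global changes
            if g '= previous set segments = segments + 1, previous = g
            //count each distinct global once
            if '$data(seen(g)) set seen(g) = "", globals = globals + 1
        }
        quit $number((segments - globals) / ($length(blocks) - globals) * 100, 2)
    }

For example, write ..TestFragmentation("ACADDAABBAACC") prints 44.44, matching the table above (up to rounding).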
    ///usage: do ..ReadBlocks("D:\YOUR_DATABASE\")
    ClassMethod ReadBlocks(path As %String)
    {
        new $namespace
        znspace "%SYS"
        //get the total amount of blocks in the database
        set db = ##class(SYS.Database).%OpenId(path)
        set totalblocks = db.Blocks
        set db = ""
        set blockcount = 0
        //open the database on channel 63 for block-level reads with the VIEW command
        open 63:"^^"_path
        //block #3 holds the start of the global catalog
        set ^TEMP("DEFRAG", "NODES", 3)=$listbuild("", 0)
        while $data(^TEMP("DEFRAG", "NODES"))=10 //any children left to visit
        {
            set blockId = ""
            for
            {		
                set blockId = $order(^TEMP("DEFRAG", "NODES", blockId),1,node)
                quit:blockId=""
                kill ^TEMP("DEFRAG", "NODES", blockId)
                
                set globalname = $lg(node,1)
                set hasLong = $lg(node,2)
                
                do:blockId'=0 ..ReadBlock(blockId, globalname, hasLong, .totalblocks, .blockcount)			
            }	
        }
        close 63
        set ^TEMP("DEFRAG","PROGRESS") = "DONE"
        do ..CalculateFragmentation()
    }
    ClassMethod ReadBlock(blockId As %String, globalname As %String, hasLong As %Boolean, ByRef totalblocks As %Integer, ByRef blockcount As %Integer)
    {
        view blockId  //read the block into the VIEW buffer
        set blockType=$view(4,0,1)  //block type byte
        
        if blockType=8  //data block
        {
            if hasLong 
            {
                for N=1:1
                {
                    set X=$VIEW(N*2,-6)
                    quit:X=""
                    set gdview=$ascii(X)
                    if $listfind($listbuild(5,7,3),gdview)  //value references big-string blocks
                    {
                        set cnt=$piece(X,",",2)
                        set blocks=$piece(X,",",4,*)
                        for i=1:1:cnt 
                        {
                            set nextBlock=$piece(X,",",3+i)
                            
                            set ^TEMP("DEFRAG","GLOBAL",nextBlock) = globalname		
                            set blockcount = blockcount + 1 //update progress
                            set ^TEMP("DEFRAG","PROGRESS") = $number(blockcount / totalblocks * 100, 2)
                        }
                    }
                }
            }
        } 
        else //block of pointers
        {		
            if blockType = 9 //catalog block
            {
                set nextglobal=$view(8,0,4)	//link pointer: large catalogs may span multiple blocks
                quit:$data(^TEMP("DEFRAG","GLOBAL",nextglobal))
                set:nextglobal'=0 ^TEMP("DEFRAG", "NODES", nextglobal) = $listbuild("",0) //queue the next catalog block
            }
            
            for N=1:1
            {
                set X=$VIEW(N-1*2+1,-6) //left-to-right evaluation: ((N-1)*2)+1
                quit:X=""
                set nextBlock=$VIEW(N*2,-5)
                if blockType=9 set globalname=X //in a catalog block the key is the global name
                set haslong=0
                if $piece($view(N*2,-6),",",1) 
                {
                    set haslong=1
                }
                
                continue:$data(^TEMP("DEFRAG","GLOBAL",nextBlock)) //already seen?
                set ^TEMP("DEFRAG", "NODES", nextBlock) = $listbuild(globalname,haslong)				
                set ^TEMP("DEFRAG","GLOBAL",nextBlock) = globalname	
                set blockcount = blockcount + 1
                set ^TEMP("DEFRAG","PROGRESS") = $number(blockcount / totalblocks * 100, 2)
            }
        }
    }
    ClassMethod CalculateFragmentation()
    {
        //blocktypes counts distinct globals (GlobalCount in the formula above)
        set segments = 0, blocks = 0, blocktypes = 0
        kill ^TEMP("DEFRAG", "UNIQUE")
        
        set previousglobal = ""
        set key = ""
        for
        {	    
            set key = $order(^TEMP("DEFRAG","GLOBAL",key),1,global)
            quit:key=""
            if global '= previousglobal
            {	   		   	
                set previousglobal = global		   		
                set segments = segments + 1
            }
            
            if '$data(^TEMP("DEFRAG", "UNIQUE", global))	
            {
                set ^TEMP("DEFRAG", "UNIQUE", global)=""
                set blocktypes = blocktypes + 1
            }
            
            set blocks = blocks + 1
        }
        write $number((segments - blocktypes) / (blocks - blocktypes) * 100, 2)
    }

    Notes:

    • Use it at your own risk. It's not supposed to write anything to the database (it performs only read operations), but I'm unfamiliar with the VIEW command and its possible caveats.
    • This might take a really long time to complete (several hours), especially if the database is huge (TB in size). Progress can be checked by reading the ^TEMP("DEFRAG","PROGRESS") node, as shown below.
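For example, from another terminal session in the same namespace:

    write ^TEMP("DEFRAG","PROGRESS")  //percentage of blocks visited so far, or "DONE" when finished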