Question
Steve Pisani · Aug 31, 2017

iKnow Disk space requirements

Hi, 

Has anyone ever estimated the amount of disk space consumed by the iKnow indexing process? I know this will be a rough estimate, but I imagine that would be enough for sizing purposes.

The unstructured text is in English.

thanks in advance - 

Steve


iKnow or iFind?

If iFind, then what index?

It depends on your corpus and the diversity of concepts encountered in it.

As a baseline I assume a 1:1 ratio of corpus size to index size, but the actual figure depends on many factors.

Thanks Benjamin - that's exactly what I was after.

Hi Steve,

I hadn't seen this question until just now, but I have to admit we're a bit storage-hungry with iKnow. If you generate the default full set of indices for a moderately sized domain, you'll need up to 25x the original dataset size (measured as raw text) to fit everything. This can drop to about half that (12x) if you forgo all non-essential indices, but that will prevent a number of queries from running smoothly or, in some cases, disable them completely.

For iFind, the numbers depend on the type of index. Count on factors of 2x, 7x and 15x for Basic, Semantic and Analytic indices, respectively. Of course there's a difference in functionality between all these options, and it's best to start from a set of functional requirements and then look at which particular approach covers those.

These numbers are somewhat conservative maximums and, as Eduard already suggested, you may see different (lower) numbers depending on the nature of your data. A more detailed sizing guide is available on request.
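
To make the arithmetic concrete, here is a minimal Python sketch that applies the multipliers quoted above to a raw text size. The function name, the option keys and the 10 GB example are made up for illustration and are not part of any iKnow or iFind API; the factors are the conservative maximums from this post, so real usage may well come in lower.

```python
# Rough worst-case multipliers quoted in this thread, relative to the
# raw text size of the dataset. The dictionary keys are hypothetical labels.
SIZE_FACTORS = {
    "iknow_full": 25.0,      # iKnow domain, default full set of indices
    "iknow_minimal": 12.0,   # iKnow domain, non-essential indices dropped
    "ifind_basic": 2.0,      # iFind Basic index
    "ifind_semantic": 7.0,   # iFind Semantic index
    "ifind_analytic": 15.0,  # iFind Analytic index
}


def estimate_index_size_gb(raw_text_gb: float, option: str) -> float:
    """Return a conservative upper estimate of the disk space in GB."""
    if raw_text_gb < 0:
        raise ValueError("raw_text_gb must be non-negative")
    if option not in SIZE_FACTORS:
        raise ValueError(f"unknown option: {option!r}")
    return raw_text_gb * SIZE_FACTORS[option]


if __name__ == "__main__":
    # Example: 10 GB of raw English text, compared across all options.
    for option in SIZE_FACTORS:
        print(f"{option}: up to {estimate_index_size_gb(10, option):.0f} GB")
```

Treat the output as an upper bound for capacity planning, not as a prediction.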

Thanks,
benjamin

Hi Steve,

It also depends on whether you are indexing all the sources at once or a batch per day: typically, a batch load will grow your database and leave it about 50% empty at the end, once the temporary storage has been deleted. This is not a problem if you load new sources regularly, since this empty space will be reused the next time.

If these temporary iKnow globals are mapped to cachetemp, don't forget to take that disk space into account as well, even though it will be released after a restart. This matters especially because cachetemp is installed by default on the same drive as Caché, which may have less free disk space than the drives where you put your databases.
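
As a rough illustration of the two points above, the following Python sketch splits the space needed during a one-off batch load between the database drive and the cachetemp drive. It rests on an assumption read into the "50% empty" remark, namely that the temporary storage is about as large as the final index data; the function name and the returned keys are hypothetical, and actual temporary usage will vary with the data.

```python
def batch_load_footprint_gb(index_data_gb: float, temp_in_cachetemp: bool) -> dict:
    """Estimate where disk space is needed during a one-off batch load.

    Assumption (not a documented figure): temporary storage is roughly
    equal in size to the final index data, which is why the database
    ends up about 50% empty when the temporary globals live in it.
    """
    if index_data_gb < 0:
        raise ValueError("index_data_gb must be non-negative")
    temp_gb = float(index_data_gb)  # assumed temp size, see docstring
    if temp_in_cachetemp:
        # Temporary globals land on the cachetemp drive instead.
        return {"database_gb": float(index_data_gb), "cachetemp_gb": temp_gb}
    # Temporary globals grow the database itself; the space is not
    # returned to the file system after they are deleted.
    return {"database_gb": index_data_gb + temp_gb, "cachetemp_gb": 0.0}


if __name__ == "__main__":
    print(batch_load_footprint_gb(100, temp_in_cachetemp=False))
    print(batch_load_footprint_gb(100, temp_in_cachetemp=True))
```

Either way the total is the same; the mapping only decides which drive has to have the room.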