How Tax Service, OpenStreetMap, and InterSystems IRIS
could help developers get clean addresses
Pieter Brueghel the Younger, Paying the Tax (The Tax Collector), 1640
In my previous article, we just skimmed the surface of objects. Let's continue our reconnaissance. Today's topic is a tough one. It's not quite BIG DATA, but it's still the data not easy to work with: we're talking about fairly large amounts of data. It won't all fit into RAM at once, and some of it won't even fit on the drive (not due to lack of space, but because there's a lot of junk). The name of our subject is FIAS DB: the Federal Information Address System database - the databases of addresses in Russia. The archive is 5.5 GB. And it's a compressed XML file. After extraction, it will be a full 53 GB (set aside 110 GB for extraction). And when you start to parse and convert it, that 110 GB won't be enough. There won't be enough RAM either.