The .Document table is associated to the .Aggregation table via a parent/child relationship, because there may be any number of documents associated with a single aggregation key.  There's no inefficiency here, simply necessary cardinality.  For that matter, it's not like the .Document table contains the document - it just contains a pointer to the document.

The "documents" that you are finding with your query are not documents, but are SDA record requests.  If you are looking for retrievals of IHE documents from a document repository, the event type to use is "Retrieve Document Set".  You didn't mention what version of HealthShare you are running, but in recent versions, the ATNA repository has a relationship to a child table, HS_IHE_ATNA_Repository.Document, and you can join to that.  It has a RepositoryID and DocumentID property.

Storage depends on where you store the data (object, tables), not where you store classes.  The classes that manage these all reside in the HSLIB database.

Don't know if you are asking about Health Connect or Information Exchange.  If it's the latter, this is managed by where you set up an edge gateway namespace & database to handle the HL7v2 inputs, vs. where you set up the IHE XDSb repository and namespace which stores documents that have been provided and registered.  They would all be managed by the single common registry namespace and database.  If it's the former, it's completely up to you about how to store the data.  You would store them in different globals, and your namespace can map the globals to different databases.

Most of what you are looking for is already captured in the ATNA audit repository, which in current versions lives in the Registry namespace (in future versions we'll allow that to move to its own namespace/instance).  There is a DeepSee cube associated with this, as well as a few dashboards.  You can create your own SQL queries.  We also provide a framework for creating your own reports, including a set of "data volume" reports which can run using this as well as other data, and can incrementally capture dated subsets of this for data comparisons.  This is also useful in case the repository grows to a very large size, where SQL queries become less practical.

That isn't automatically generated - it's used as a source to generate the SDA/CDA annotations.  From any edge gateway namespace, in the management portal, select the Schema Documentation menu item.  You this lets you browse the transforms.  Once you have identified an SDA property, the class documentation points out what table/column that maps to in the viewer cache.