Article
Robert Cemper · Apr 26, 2020 5m read

Materialized Views

A VIEW in SQL is basically a prepared SQL statement.
It has to be executed and assembled like any other SQL query.
MATERIALIZED VIEW means that the content is collected before hands and can be retrieved rather fast.
I saw the concept first with my favorite competitor named O* and they made a lot of noise about it.

  { favorite: because I could win every benchmark against them devil }

Depending on the individual construction of this materialized views updates and maintenance might be required.

In Caché / IRIS this exists almost since ever and we take this as normal and a given fact.
Most developers are just not aware of it and with a little bit of polishing it can be presented as an excellent feature.
In addition, any update and maintenance happen as a built-in functionality with no extra effort.


See this example:
In our famous Sample.Person class in namespace SAMPLES I have extended and rebuilt his index.

/// Define an index for <property>Name</property>.
Index NameIDX On Name [ Data = (Name, Home.State, SSN) ];

And being experienced and comfortable with the query generator you know that

SELECT ID, Name, Home_State, SSN   from Sample.Person 

will run fast across just using the index global  ^Sample.PersonI("NameIDX")
and never touch you data global.

That's basically the functionality of a Materialized View and update is implicit.

Defined as VIEW (from MgmtPortal as Studio isn't so handy) you get this class.

Class Sample.Person.NameView [ ClassType = view
                             ,
CompileAfter = Sample.Person
                             , DdlAllowed
                             ,  Not ProcedureBlock
                            
, SqlTableName = NameView
                             , ViewQuery = { SELECT ID
                                                  , Name
                                                  , Home_State
                                                 
, SSN
                                                   
from Sample.Person  
} ]
       { Parameter READONLY = 1; }

But if you want a little bit more comfort like a backlink to your data  you can map the index global itself .
So you can apply implicit JOIN syntax and have a fully functional table as here

SELECT Name, BaseClass->DOB, HomeState, SSN,"%CLASSNAME",BaseClass FROM Sample_Person.NameIDX


and here's the class definition and you have to design it manually

/// mapped index
/// Index NameIDX On Name [ Data = (Name, Home.State, SSN) ];
Class Sample.Person.NameIDX Extends %Persistent [ Final ]
{
Property IndexName As %String [ InitialExpression = "NameIDX", ReadOnly ];
Property SQLUPPERname As %String [ ReadOnly ];
Property BaseClass As Sample.Person [ ReadOnly ];

Index min On (IndexName, SQLUPPERname, BaseClass) [ IdKey ];

/// Classname of Index Source
Property %CLASSNAME As %String [ ReadOnly ];
/// Person's name.
Property Name As %String [ ReadOnly ];
/// Person's home address. This uses an embedded object.
Property HomeState As %String [ ReadOnly ];
/// Person's Social Security number. This is validated using pattern match.
Property SSN As %String(PATTERN = "3N1""-""2N1""-""4N") [ ReadOnly ];

Parameter READONLY = 1;
Parameter MANAGEDEXTENT As INTEGER = 0;

Storage Default
{
 <Data name="NameIDXDefaultData">
  <Value name="1">
    <Value>%CLASSNAME</Value>
  </Value>
  <Value name="2">
    <Value>Name</Value>
  </Value>
  <Value name="3">
    <Value>HomeState</Value>
  </Value>
  <Value name="4">
    <Value>SSN</Value>
  </Value>
 </Data>
 <DataLocation>^Sample.PersonI</DataLocation>
 <DefaultData>NameIDXDefaultData</DefaultData>
 <IdLocation>^Sample.Person.NameIDXD</IdLocation>
 <IndexLocation>^Sample.Person.NameIDXI</IndexLocation>
 <StreamLocation>^Sample.Person.NameIDXS</StreamLocation>
 <Type>%Library.CacheStorage</Type>
}

 

This is a coding example working on Caché 2018.1.3 and IRIS 2020.2 
It will not be kept in sync with new versions 
It is also NOT serviced by InterSystems Support !

 

120
2 1 3 398
Log in or sign up to continue

Thank you Robert,

Excellent article.
In fact we use this approach when we need to gain speed for queries that run on hundreds of millions records indexes and need to check few items only, so we save checking the "base" class by using [ DATA ... ]  

In addition, to gain more speed, which is so essential in huge queries, we use the (good, old) $order on the index global itself to gain more speed. This is much faster than normal SQL. 

Hi Robert, 

Thanks for this article. 

How you would go about creating a materialised view for an existing SQL view, which joins quite a lot of tables (combination of inner / left and right joins) to produce it's output? 

We have an existing view in our cache application that takes around 4 hours to produce it's output (it joins around 20/30 different tables and outputs about 300 fields). 4 hours is after having undertaken performance optimisations like adding indexes and tuning tables. Our application has a very high throughput of data.

I am working on creating a materialised version of my aforementioned view and wondering if I have to add data indexes to each of the 20 tables or if there is better way?

Will appreciate any code you might be able to share.

Regards

20/30 tables and 300 columns is quite a heavy query / view.
VIEW is only a stored SELECT. The same considerations apply. 
#1)
you should have an index for each column that appears in a JOIN ON..   on WHERE...
and that's hopefully just 1 index by table
#2) for each table the required columns should go as Data into the index
You are perfect if the query analyzer only shows access to indices.

My proposal only covered views of a single table with indices

Materializing a multi-table view as you describe it would require a hidden table with all JOINs
and no filtering WHERE clause. And you would require extra code to keep it up to date.
This is a job that the query generator does in an excellent way and with all tricks available.

I wouldn't say it's impossible. And the final query is definitely faster. 
But it may take weeks or months to get it running. Not to talk about maintenance.
Analytics/DeepSee acts in a similar way. And I can prove: it wasn't done in an afternoon session.