
Sample for Beginners with Streams in IRIS

To better understand Streams in IRIS I'll start with a short

History

In the beginning (before IRIS), there was just basic access to external devices.
The 4 commands OPEN, CLOSE, READ, WRITE still work and are documented
in detail in Introduction to I/O.
For files in particular, this is direct access to your actual file system.
You have to handle any status or other signal in your code.
Any character code conversion or similar is also up to you.

Class %Library.File aka %File offers a large collection of methods and queries 
for standard operations on directories and files.
Read and write operations are available, but the content is passed through untouched.

Now we have reached %Stream Classes.
The major difference from before is that they are oriented to the content.
In addition, streams can exceed the MAXSTRING limit of 3,641,144 characters.

Streams are typed by storage location (Global, File, Tmp, Null, Dynamic),
by content (Character, Binary), and by features such as Gzip or Compressed.
The difference between file character streams and file binary streams is
that the character stream understands that it is writing character data
and this may be subject to character set translation.
In addition, line terminators for Windows and Unix are adjusted. 
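
As a minimal sketch of the common stream interface, the lines below write two lines into a temporary character stream and read them back. %Stream.TmpCharacter is chosen here only because it needs neither a file nor a global; the variable names are illustrative.

```objectscript
 // a temporary character stream: storage = Tmp, content = Character
 set stream=##class(%Stream.TmpCharacter).%New()
 do stream.WriteLine("first line")
 do stream.WriteLine("second line")
 write "Size: ",stream.Size,!
 // go back to the start before reading
 do stream.Rewind()
 while 'stream.AtEnd {
   write stream.ReadLine(),!
 }
```

The same Write/Rewind/Read calls apply to the other storage types; only the class name changes.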

The example

There is an HTML page with an embedded table.
The exercise is to extract all rows from the red-marked table embedded
in this HTML page (most likely generated with DRUPAL) for further processing.

Preparation

  • Call the page of interest in your browser
  • with <CTRL>+S store it in a directory that can be accessed from IRIS
    • in the demo this is my local directory mapped to container as /ext
  • you now have a source for a %Stream.FileCharacter object.

Step 1

Set up your Stream object

 set file="/ext/Stream.html"
 set stream=##class(%Stream.FileCharacter).%New()
 set sc=stream.LinkToFile(file)
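
Step 1 stores the status in sc but does not look at it. A hedged variant that checks the file and the status before going on could look like the sketch below. The %File.Exists() guard and the messages are additions for illustration, not part of the demo code.

```objectscript
 set file="/ext/Stream.html"
 // extra guard (assumption: you want to stop early if the file is missing)
 if '##class(%File).Exists(file) {
   write "File not found: ",file,!
   return
 }
 set stream=##class(%Stream.FileCharacter).%New()
 set sc=stream.LinkToFile(file)
 // show the error text instead of silently continuing
 if $system.Status.IsError(sc) {
   do $system.Status.DisplayError(sc)
   return
 }
 write "Linked ",file,", size: ",stream.Size,!
```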

Step 2

Find your table

  • you can skip the HTML <head> . . . </head> part
  • also the framing around the <table> . . . </table> part is just noise
  • this also applies for the column headers <thead> . . . </thead>
  • row content starts after the <tbody> tag
  • So you search for it

Searching in streams works with logic similar to $FIND() in ObjectScript.
You call FindAt() on your stream and pass a start location and the string to find.
A negative result means the string was not found.
In addition, you can switch off case sensitivity.
Remember, this is a Character stream!

 set row=stream.FindAt(1,"<tbody",,1)   ; 4th argument 1 = ignore case
 if row<0 return '$$$OK                 ; no <tbody found: return a non-OK value
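
To try FindAt() without the HTML file, a small sketch on a temporary stream may help. The sample string and the variable names are made up for illustration; the exact positions returned are not asserted here.

```objectscript
 set s=##class(%Stream.TmpCharacter).%New()
 do s.Write("<html><TABLE><TBODY><tr><td>a</td></tr></TBODY></TABLE></html>")
 // case-insensitive search: the upper-case <TBODY is expected to be found
 set pos=s.FindAt(1,"<tbody",,1)
 write "ignore case   : ",pos,!
 // default is case-sensitive: lower-case <tbody does not occur in the sample
 set pos=s.FindAt(1,"<tbody")
 write "case-sensitive: ",pos,!
 if pos<0 write "not found",!
```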

Step 3

Now you begin to loop over the rows.
Each row is framed in HTML by <tr . . .  </tr
The method FindAt() has the nice feature that it provides not just
the location but also the remaining characters in the source buffer.
In this demo example, it always contains the full row.
Identifying the end of the table by the HTML tag </tbody is easy.

 set row=stream.FindAt(row+1,"<tr",.temp,1)   ; temp receives the buffer after the match
 set txt=$piece(temp,"</tr")                  ; keep everything up to the end of the row
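
Put together, the row loop could be sketched as follows. This is one possible arrangement, not the demo code: the end check against the position of </tbody, the counter, and the output line are assumptions added here, and it relies on the buffer holding the full row, as noted above.

```objectscript
 set file="/ext/Stream.html"
 set stream=##class(%Stream.FileCharacter).%New()
 set sc=stream.LinkToFile(file)
 set row=stream.FindAt(1,"<tbody",,1)
 if row<0 return
 // position of the closing tag; a <tr found beyond it is not part of this table
 set end=stream.FindAt(row,"</tbody",,1)
 set count=0
 for {
   set row=stream.FindAt(row+1,"<tr",.temp,1)
   // stop when no further row exists or the table has ended
   quit:(row<0)||((end>0)&&(row>end))
   set txt=$piece(temp,"</tr")
   set count=count+1
   write "row ",count,": ",$length(txt)," characters",!
 }
```

Note that $piece() is case-sensitive, so this sketch assumes the closing tags are written in lower case.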

Step 4

Next, the inner loop over the columns between <td . . . </td follows.
As you have the complete row text in your hands, this is just normal ObjectScript.
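
A minimal sketch of such a column loop, using a made-up row text instead of the real page. It assumes lower-case <td tags and no nested tables; the real content needs more cleanup.

```objectscript
 // hypothetical row text, much simpler than the real page
 set txt="<tr><td class=""a"">one</td><td>two</td></tr>"
 // piece 1 is everything before the first <td, so columns start at piece 2
 for col=2:1:$length(txt,"<td") {
   // cut the cell at its closing tag
   set cell=$piece($piece(txt,"<td",col),"</td")
   // drop the rest of the opening tag up to the first >
   set cell=$piece(cell,">",2,*)
   write "column ",col-1,": ",cell,!
 }
```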

Note 1:
In the example, the content of the columns is overloaded by DRUPAL with
a lot of formatting and other control code. So the content can only be
located by characteristic sequences, e.g. href=

Note 2:
To verify the result and to visualize it, I display the values and also
keep them in a PPG for further use if required.
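
Keeping values in a PPG (process-private global) and walking them later can be sketched like this. The name ^||rows, its subscripts (row, column), and the sample values are assumptions for illustration only.

```objectscript
 kill ^||rows
 // store sample values by row and column
 set ^||rows(1,1)="one"
 set ^||rows(1,2)="two"
 set ^||rows(2,1)="three"
 // walk all stored cells with $order
 set r=""
 for {
   set r=$order(^||rows(r)) quit:r=""
   set c=""
   for {
     set c=$order(^||rows(r,c)) quit:c=""
     write r,",",c," = ",^||rows(r,c),!
   }
 }
```

A PPG is visible only to the current process and disappears when the process ends, which suits intermediate results like these.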

Demo code available on GitHub
Enjoy trying it. 
 
