Question
· Feb 14, 2018

stream.FindAt() stops processing after a certain number of characters.

I have a 2GB CSP.log file that I need to investigate.

I'm using a %Stream.FileCharacter object to open the file and then using .FindAt() to search for a particular string.

I'm seeing FindAt() stop after processing about 49 million characters.

Here is my code:

k
s stream=##class(%Stream.FileCharacter).%New()
d stream.LinkToFile("d:\csp.log")
s x=""
s i=0
s j=0
w stream.Rewind()
w stream.AtEnd
w stream.SizeGet()
while(stream.AtEnd=0){set i=stream.FindAt(-1,"Invalid password",x)+i  set j=j+1}
w stream.AtEnd
w i
w j

Which gives the following output:

USER>k
USER>s stream=##class(%Stream.FileCharacter).%New()
USER>d stream.LinkToFile("d:\csp.log")
USER>s x=""
USER>s i=0
USER>w stream.Rewind()
1
USER>w stream.AtEnd
0
USER>w stream.SizeGet()
2491024949
USER>while(stream.AtEnd=0){set i=stream.FindAt(-1,"Invalid password",x)+i}
USER>w stream.AtEnd
1
USER>w i
49134354
USER>w j
124520

My question is: why is AtEnd being set when we are only at character 49134354? I know that the string "Invalid password" exists beyond this point.

Note: I have also inspected the file using glogg, which shows some 3 million instances of "Invalid password", whereas Caché ObjectScript finds only 124520 instances before the AtEnd flag is set.

Reading the file line by line with stream.ReadLine() gives the correct number of lines before AtEnd is set.


Your code:

set i=stream.FindAt(-1,"Invalid password",x)+i

from Docs:
 

Find the first occurrence of target in the stream starting the search at position.
It returns the position at this match starting at the beginning of the stream.
If it does not find the target string then return -1.
If position=-1 then start searching from the current location and just return the offset from the last search,
useful for searching through the entire file.
If you are doing this you should pass in tmpstr by reference in every call which is used as
a temporary location to store information being read so the next call will start where the last one left off.
If you pass caseinsensitive=1 then the search will be case insensitive rather than the default case sensitive search.

 
So your line should work like this
while(stream.AtEnd=0){set i=stream.FindAt(-1,"Invalid password",.x)+i}
-----------------------------------------------------------------^
PASS BY REFERENCE should do the trick

Robert,

Well spotted on the "pass by reference", my mistake.

I updated my code and it looks a lot better, but is still not quite giving me what I expect.

The final output now shows:

USER>w stream.SizeGet()
2491024949
USER>while(stream.AtEnd=0){set i=stream.FindAt(-1,"Invalid password",.x)+i  set j=j+1}
USER>w stream.AtEnd
1
USER>w i
2442920326
USER>w j
3205553
USER>

So it's still not quite going to the end of the file (see SizeGet() and i output above) and it is missing the last 5 "Invalid password" entries (GLOGG is showing 3205558 instances).

Let me know if there is any further information I can send that might help.

I'll keep investigating from my side as well.

Oliver.

The docs say:

If it does not find the target string then return -1

So what you get in i is the start position of the last occurrence of your search string,

which is 2491024949 - 2442920326 = 48104623 characters from the end.
That's almost the same as the 49134354 from your first run. Looks feasible.

For i to equal the file size as you expect, the LAST search string would have to start AT the end of your file, which is a contradiction.

Robert,

Many thanks for following up on this, it is appreciated.

I've had time to do some further investigations on both the file itself and the FileCharacter object.

It's still not working as I would expect. I might raise a WRC ticket on this; I can then attach the file so people can work with the same data that I am working with.


The last 14 lines of the file are at the bottom of this message; you can see that "Invalid password" appears twice.

The file is 2491024949 characters long. Counting back from the end, this puts the last "Invalid password" at about character 2491024505.

So if I run the following:
k
s stream=##class(%Stream.FileCharacter).%New()
d stream.LinkToFile("d:\csp.log")
w stream.Rewind()
w stream.MoveTo(2491024500)
set x=""
w stream.FindAt(-1,"Invalid password",.x,1)

It returns 45, which looks about right.
But why does it also set stream.AtEnd to 1?

I can also do:

k
s stream=##class(%Stream.FileCharacter).%New()
d stream.LinkToFile("d:\csp.log")
w stream.Rewind()
w stream.MoveTo(2491024500)
for i=1:1:5{w stream.ReadLine(),!}
k

Which gives the following output, where you can clearly see the Invalid password string.

USER>k
USER>s stream=##class(%Stream.FileCharacter).%New()
USER>d stream.LinkToFile("d:\csp.log")
USER>w stream.Rewind()
1
USER>w stream.MoveTo(2491024500)
1
USER>for i=1:1:5{w stream.ReadLine(),!}
tent-type: text/html
Connection: closed
 
Invalid password
>>> Time: Tue Feb 13 11:41:34 2018; RT Build: 1501.1472 (win64/iis/mod:srv=8.5); Log-Level: 0; Gateway-PID: 26860; Gateway-TID: 19448; Connection-No: ; Request-ID: 2e5d; Session-ID: yynkyKY1Kr; Remote-Addr: 10.104.16.17; Page: POST /csp/ccms_stat/ws.DatabaseQuery.cls
USER>k


LAST 14 LINES OF FILE.

Invalid password
>>> Time: Tue Feb 13 11:41:34 2018; RT Build: 1501.1472 (win64/iis/mod:srv=8.5); Log-Level: 0; Gateway-PID: 26860; Gateway-TID: 24956; Connection-No: ; Request-ID: 2e5c; Session-ID: mVINyKM1KZ; Remote-Addr: 10.104.15.55; Page: POST /csp/ccms_stat/ws.DatabaseQuery.cls
    Diagnostic
    Failed to connect - Reason: 0 (Connection successfully made but server not responding) (No Retry)
>>> Time: Tue Feb 13 11:41:34 2018; RT Build: 1501.1472 (win64/iis/mod:srv=8.5); Log-Level: 0; Gateway-PID: 26860; Gateway-TID: 19448; Connection-No: 11; Server: CCDSInstance; Cache-PID: 0; Request-ID: 2e5d
    cspTestConnection(mode=0, context=11, response_size=104): Response
    CacheSP: chd=1;
HTTP/1.0 403 Forbidden
Content-type: text/html
Connection: closed

Invalid password
>>> Time: Tue Feb 13 11:41:34 2018; RT Build: 1501.1472 (win64/iis/mod:srv=8.5); Log-Level: 0; Gateway-PID: 26860; Gateway-TID: 19448; Connection-No: ; Request-ID: 2e5d; Session-ID: yynkyKY1Kr; Remote-Addr: 10.104.16.17; Page: POST /csp/ccms_stat/ws.DatabaseQuery.cls
    Diagnostic
    Failed to connect - Reason: 0 (Connection successfully made but server not responding) (No Retry)

Oliver,

this turned out to be somewhat more tricky than expected.
Used the way you did, stream.FindAt(...) returns the size of the gap between the previous occurrence and the next one.
So you would have to add the length of your search string on each loop to get closer to your file size.

So it might be easier to do it this way:

set last=1
for  set i=stream.FindAt(last,"Invalid password") quit:i<0  set last=i+1  // move past the match just found so the next call cannot return it again

This gets closer, but because the result is the start position of the last match, it will always be smaller than the total size.

This is a side effect of how Caché finds the string: the code performs a Read() on the stream to load a chunk into memory before searching for the string.

The Read() sets the AtEnd property, but if the string being searched for appears more than once in the chunk in memory, only the first occurrence is found by that call. The remaining occurrences are still sitting in the buffer even though AtEnd is already 1.

So in order to find all occurrences of the string, you should not use AtEnd to determine when to stop.
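To make the mechanism visible, here is a sketch that does the chunked search by hand instead of calling FindAt(). It is not how %Stream implements FindAt() internally; the class Demo.ChunkSearch, the method CountMatches and the 32000-character chunk size are hypothetical choices for illustration. Because every chunk is scanned completely with $find() before AtEnd is tested, AtEnd becomes a safe loop condition here, and the last length(target)-1 characters are carried over so a match split across two reads is still counted. It counts non-overlapping, case-sensitive matches.

```objectscript
Class Demo.ChunkSearch Extends %RegisteredObject
{

/// Hypothetical helper (not part of %Stream): count every non-overlapping,
/// case-sensitive occurrence of target in a file, reading it in chunks.
ClassMethod CountMatches(file As %String, target As %String, chunkSize As %Integer = 32000) As %Integer
{
	set stream=##class(%Stream.FileCharacter).%New()
	set sc=stream.LinkToFile(file)
	if $system.Status.IsError(sc) quit -1
	set tlen=$length(target), count=0, carry=""
	// AtEnd is only tested after the chunk just read has been fully scanned
	while 'stream.AtEnd {
		set buf=carry_stream.Read(chunkSize)
		set pos=1, lastEnd=1
		// $find returns the position just past the match, or 0 when none remain
		for  set pos=$find(buf,target,pos) quit:pos=0  set count=count+1, lastEnd=pos
		// carry the last tlen-1 characters so a match split across two reads is found,
		// but never re-scan characters belonging to a match already counted
		set start=$length(buf)-tlen+2
		if start<lastEnd set start=lastEnd
		set carry=$extract(buf,start,$length(buf))
	}
	quit count
}

}
```

Usage, for example: write ##class(Demo.ChunkSearch).CountMatches("d:\csp.log","Invalid password")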

The following code will count the number of times "Invalid password" appears in the file.

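kill
set stream=##class(%Stream.FileCharacter).%New()
do stream.LinkToFile("d:\csp.log")
set x=""  // buffer passed by reference so each FindAt() call resumes where the last one stopped
set j=0   // number of occurrences found
do stream.Rewind()
while 1 {set i=stream.FindAt(-1,"Invalid password",.x) quit:i=-1  set j=j+1}  // stop when FindAt() finds no further match, not when AtEnd is set
write j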

 

Obviously you can alter the code in the while loop to perform any other action when it finds the string.
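Since ReadLine() was reported to reach the end of the file correctly, another option is to avoid FindAt() altogether and count matches line by line. This is a sketch only: the class Demo.LineSearch and the method CountByLine are hypothetical names, and it assumes no line in the log exceeds ReadLine()'s default length limit. A longer line is returned in pieces, and a match split across two pieces would be missed.

```objectscript
Class Demo.LineSearch Extends %RegisteredObject
{

/// Hypothetical helper: count case-sensitive occurrences of target line by line.
/// Assumes every line fits within ReadLine()'s default length limit.
ClassMethod CountByLine(file As %String, target As %String) As %Integer
{
	set stream=##class(%Stream.FileCharacter).%New()
	set sc=stream.LinkToFile(file)
	if $system.Status.IsError(sc) quit -1
	set count=0
	while 'stream.AtEnd {
		set line=stream.ReadLine()
		// $length(line,target) is the number of pieces delimited by target,
		// which is one more than the number of matches; empty lines are skipped
		if line'="" set count=count+$length(line,target)-1
	}
	quit count
}

}
```

Because each call to ReadLine() consumes exactly one line, AtEnd is a safe loop condition in this version.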