Hi Heloisa,
My use case is sorting, removing duplicates, and getting a record count from a file that contains JSON messages as individual rows.
I am currently planning to use pandas for this because it is really fast. Below are the steps I am following:
1) Call a Python function (the called function) from an IRIS class method (the calling function).
2) The called Python function reads the JSON file into a DataFrame.
3) Perform sorting, duplicate removal, and counting on the DataFrame.
4) Convert the DataFrame into an IRIS stream.
5) Return the stream to the calling IRIS class method.
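
For reference, steps 2 and 3 could look roughly like the sketch below. This is only an illustration, not my actual code: the function name sort_dedupe_count, the sort column "id", and the in-memory sample data are made up, and it assumes the JSON records are flat (no nested objects), since drop_duplicates needs hashable values.

```python
import io

import pandas as pd


def sort_dedupe_count(source, sort_key):
    """Read JSON-lines input, drop duplicate rows, sort, and count.

    `source` is a file path or file-like object; `sort_key` is the column
    to sort on. Both names are hypothetical, chosen for this sketch.
    """
    # lines=True tells pandas that every input line is one JSON object
    df = pd.read_json(source, lines=True)
    # Remove exact duplicate rows, then sort and renumber the index
    df = df.drop_duplicates().sort_values(sort_key).reset_index(drop=True)
    return df, len(df)


# Small in-memory sample standing in for the real file
sample = io.StringIO(
    '{"id": 2, "name": "b"}\n'
    '{"id": 1, "name": "a"}\n'
    '{"id": 2, "name": "b"}\n'
)
df, count = sort_dedupe_count(sample, "id")
```

With the real file, the path would be passed in place of the StringIO object.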
When I try to write the stream to the terminal, it comes back as a %SYS.Python object rather than an IRIS stream object.
My questions are:
1) Why is the return value a %SYS.Python object rather than an IRIS stream object?
2) Is there a better way to implement sorting, duplicate removal, and record counting for a file within IRIS?
Thanks!
Good post, thanks!
Thanks a lot @Guillaume Rongier and @Julius Kavay for the solution!!
@Guillaume Rongier
I have a couple of follow-up questions:
1) What is the buffer size in the string-to-stream operation? Is it at the record level or a number of bytes? And if each JSON record is, say, 2000 bytes, would each row be written as a separate line in the stream?
2) Are there any limitations on the string size for the line below? The file can be up to 1 GB. So effectively, as long as the instance can handle this much data in memory, it should be fine, right?
buffer = data.to_json(orient='records', lines=True)
2)
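
To make the follow-up questions concrete, here is a small sketch of what that to_json call produces and how the resulting string could be cut into fixed-size pieces. The sample data, the iter_chunks helper, and the chunk size of 16 are all made up for illustration; none of this says anything about what limits IRIS itself applies.

```python
import json

import pandas as pd

# Hypothetical two-row frame standing in for the real data
data = pd.DataFrame([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])

# One JSON object per line; the whole result is a single in-memory string
buffer = data.to_json(orient="records", lines=True)
records = [json.loads(line) for line in buffer.splitlines()]


def iter_chunks(text, chunk_size):
    """Yield consecutive slices of `text`, each at most `chunk_size` characters.

    Hypothetical helper: it cuts by character count, not by record, so a
    single JSON record can end up split across two chunks.
    """
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


chunks = list(iter_chunks(buffer, 16))
```

Joining the chunks back together gives the original string, so the line breaks between records are preserved regardless of where the cuts fall.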