Mads Lien · May 10, 2019
Update: I managed to connect to the Cache system remotely and ran some sample queries fetching data with the JDBC driver. Unfortunately it is much slower than with the Python binding; in my simple test it was nearly 20 times slower. Thanks for the suggestion anyway.
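For reference, the setup I tried was roughly along these lines, using the jaydebeapi package to load the Cache JDBC driver from Python. This is a sketch rather than the exact code: the host, port, namespace, credentials, jar path and the sample table are placeholders that need to match your own installation.

import jaydebeapi

# Placeholder connection details - adjust host, port, namespace, credentials and jar path.
conn = jaydebeapi.connect(
    "com.intersys.jdbc.CacheDriver",
    "jdbc:Cache://127.0.0.1:1972/SAMPLES",
    ["_SYSTEM", "SYS"],
    "/path/to/CacheDB.jar",
)

curs = conn.cursor()
curs.execute("SELECT TOP 10 * FROM Sample.Person")  # placeholder query
rows = curs.fetchall()
curs.close()
conn.close()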
Mads Lien · May 9, 2019
Hi David,
I am looking into using the JDBC driver. This approach will also allow the use of cursors, which, if I have understood it correctly, are not available with the Python binding. I will let you know how it goes.
Thanks.
Mads Lien · May 9, 2019
Thanks for the feedback, Sergey. We have an external developer who handles the system, so I would prefer to leave the installation as is and connect remotely.
Mads Lien · Oct 18, 2016
I think the issue is solved. I rewrote the Python code and now the process is quick. It seems that the pandas method of adding rows to a dataframe one at a time is not suited for use in loops. So instead I appended the rows from the query to a nested list and created the dataframe from that list outside the loop.
Thanks for your input on this issue.
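For anyone running into the same problem, the pattern is roughly the sketch below. It assumes a query object with the fetch([None]) call shown further down in this thread, that an empty result marks the end of the result set, and placeholder column names.

import pandas as pd

rows = []
while True:
    row = query.fetch([None])   # fetch one row from the Cache query object
    if not row:                 # assumed: an empty result signals the end of the result set
        break
    rows.append(row)            # appending to a plain Python list is cheap

# Build the dataframe once, after the loop, instead of growing it row by row.
df = pd.DataFrame(rows, columns=["col1", "col2", "col3", "col4", "col5"])

Appending to a Python list stays fast, whereas df.loc[len(df)] = row reallocates and copies the frame as it grows, which is why the per-row time kept increasing.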
Mads Lien · Oct 17, 2016
I have measured the speed now. For a dataset of 35,000 rows it starts out at 4-5 ms per row added to the dataframe and increases to about 9 ms per row by the end of the loop. For bigger data sets with millions of rows this will take a very long time. The response from the database takes less than a second and the rows contain 5 fields. I am using this method in the loop:
row = query.fetch([None])
df.loc[len(df)] = row
What do you mean by COS and how can I do this comparison?
Mads Lien · Oct 13, 2016
Thanks for the answer, Eduard.
I might be looking in the wrong place for a solution to my problem. I am working with large data sets, up to 15-20 million rows, and using the Python pandas library for data manipulation and analysis. To populate a dataframe I fetch rows from the query object one row at a time, and this is very slow. After the dataframe is populated there are no issues with speed, so I suspect that this step is the culprit. Is there a more efficient way of doing this?
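Roughly, the loop in question follows the pattern below. Treat it as a sketch: the connection string, credentials, table and column names are placeholders, and the intersys.pythonbind calls are written from my understanding of that binding rather than copied verbatim.

import intersys.pythonbind3 as pythonbind
import pandas as pd

# Illustrative connection setup - host, port, namespace and credentials are placeholders.
conn = pythonbind.connection()
conn.connect_now("localhost[1972]:SAMPLES", "_SYSTEM", "SYS", None)
database = pythonbind.database(conn)

query = pythonbind.query(database)
query.prepare("SELECT field1, field2, field3, field4, field5 FROM MyTable")  # placeholder query
query.execute()

df = pd.DataFrame(columns=["field1", "field2", "field3", "field4", "field5"])
while True:
    row = query.fetch([None])    # fetch one row at a time from the query object
    if not row:
        break
    df.loc[len(df)] = row        # grows the dataframe row by row and gets slower as it grows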