Jul 15, 2016

A fetch_all() method for the Python client class

It would be convenient to be able to fetch the whole query result in one operation instead of looping through every row. A fetch_all() method or something similar would provide this functionality I guess. Is there a reason why this is not implemented?


That's generally a questionable idea; here's why:

  • It's the slowest possible approach: when iterating over a result set, the application logic may decide to stop processing before the last row. If the whole result set is returned up front, CPU and RAM have already been spent computing rows that may never be needed
  • A very large result set can cause application-level errors
  • The result set may be larger than the amount of RAM available
  • Time to the first row is usually much shorter than time to the full result set, so the user can start working with the first row while the application fetches the rest
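The early-exit argument above can be sketched in plain Python. Here `stream_rows` and `source` are illustrative stand-ins, not the actual client API: an iterator-style interface only produces rows as they are consumed, so a consumer that stops early never pays for the rest.

```python
# Illustrative sketch only -- `stream_rows` and `source` stand in for a
# client that yields rows lazily; they are not the actual client API.
def stream_rows(cursor):
    """Yield rows one at a time so the consumer can stop early."""
    for row in cursor:
        yield row

source = iter(range(1_000_000))  # pretend this is an expensive query
collected = []
for value in stream_rows(source):
    if value >= 3:
        break  # early exit: the remaining ~1M rows are never produced
    collected.append(value)

# Only the first four values were ever pulled from the source.
assert collected == [0, 1, 2]
```

A `fetch_all()` would have materialized all million rows before the consumer could decide it only needed three.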

I have now measured the speed: for a dataset of 35,000 rows, appending to the dataframe starts at 4-5 ms per row and rises to about 9 ms by the end of the loop. For bigger datasets with millions of rows this will take a very long time. The database responds in under a second, and each row contains 5 fields. This is the code inside my loop:

row = query.fetch([None])  # fetch a single row from the query result
df.loc[len(df)] = row      # append it to the DataFrame
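On the append cost: `df.loc[len(df)] = row` can reallocate and copy the growing frame on every call, which would explain the per-row time rising over the loop. A common pandas workaround, shown here as a sketch with made-up column names and synthetic rows rather than the client's API, is to accumulate plain rows in a list and build the DataFrame once:

```python
import pandas as pd

# Made-up column names standing in for the query's five fields.
columns = ["f1", "f2", "f3", "f4", "f5"]

# Slow pattern: growing the DataFrame one row per iteration.  Each
# `df.loc[len(df)] = row` may copy the whole frame, so the total cost
# grows roughly quadratically with the row count.
df_slow = pd.DataFrame(columns=columns)
for i in range(100):
    df_slow.loc[len(df_slow)] = [i] * 5

# Faster pattern: accumulate plain Python rows, build the frame once.
rows = [[i] * 5 for i in range(100)]
df_fast = pd.DataFrame(rows, columns=columns)

assert df_slow.shape == df_fast.shape == (100, 5)
```

Appending to a Python list is amortized O(1), and a single `pd.DataFrame(rows, ...)` construction allocates the frame once, so the fetch loop should scale roughly linearly with the row count.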

What do you mean by COS and how can I do this comparison?

Thanks for the answer Eduard.

I might be looking in the wrong place for a solution to my problem. I am working with large data sets, up to 15-20 million rows, and using the Python pandas library for data manipulation and analysis. To populate a dataframe I fetch rows from the query object one row at a time, and this is very slow. Once the dataframe is populated there are no speed issues, so I suspect this population step is the culprit. Is there a more efficient way of doing this?