Astronomers’ tools

5 years ago, on December 19, 2013, the ESA launched an orbital telescope called Gaia. Learn more about the Gaia mission on the official website of the European Space Agency or in the article by Vitaly Egorov (Billion pixels for a billion stars).

However, few people know what technology the agency chose for storing and processing the data collected by Gaia. Two years before the launch, in 2011, the developers were considering a number of candidates (see “Astrostatistics and Data Mining” by Luis Manuel Sarro, Laurent Eyer, William O’Mullane, Joris De Ridder, pp. 111-112):

Comparing the technologies side-by-side produced the following results (source):

Technology Time
DB2 13min55s
PostgreSQL 8 14min50s
PostgreSQL 9 6min50s
Hadoop 3min37s
Cassandra 3min37s
Caché 2min25s

The first four will probably sound familiar even to schoolchildren. But what is Caché XEP?

10 9
1 1K

Apache Spark has rapidly become one of the most exciting technologies for big data analytics and machine learning. Spark is a general data processing engine created for use in clustered computing environments. Its heart is the Resilient Distributed Dataset (RDD) which represents a distributed, fault tolerant, collection of data that can be operated on in parallel across the nodes of a cluster. Spark is implemented using a combination of Java and Scala and so comes as a library that can run on any JVM.

11 5
1 2.7K
Article
· Nov 14, 2016 14m read
Mastering the JDBC SQL Gateway

As we all know, Caché is a great database that accomplishes lots of tasks within itself. However, what do you do when you need to access an external database? One way is to use the Caché SQL Gateway via JDBC. In this article, my goal is to answer the following questions to help you familiarize yourself with the technology and debug some common problems.

13 2
8 4.4K

I've asked a lot of questions leading up to this, so I wanted to share some of my progress.

The blue line represents the number of messages processed. The background color represents the average response time. You can see ticks for each hour (and bigger ticks for each day). Hovering over any point in the graph will show you the numbers for that period in time.

This is super useful for "at a glance" performance monitoring as well as establishing patterns in our utilization.

5 2
0 518

The Caché System Management Portal includes a robust web-based SQL query tool, but for some applications it’s more convenient to use a dedicated SQL client installed on a user’s PC.

SQuirreL SQL is a well known open source SQL client built in Java, which uses JDBC to connect to a DBMS. As such, we can configure SQuirreL to connect to Caché using the Caché JDBC driver.

9 11
1 10.8K