Article
· Mar 27 2m read

A Better data import experience for LOAD DATA

In recent versions of IRIS, a powerful new data loading command has been introduced to SQL: LOAD DATA. This feature has been highly optimized to import data into IRIS extremely fast, allowing hundreds of gigabytes of data to be inserted in seconds instead of hours or days. 

This is a very exciting improvement. However, a big problem in the data loading experience still exists. Namely, the time and hassle it takes to:

  1. Define the schema for the table in which you want to load data.
  2. Figure out the syntax for the LOAD DATA command.

I've developed a user interface that invisibly handles the CREATE TABLE step and then generates the syntax for LOAD DATA, all in a handy wizard!

At least in my case -- although I've been in the database business for decades -- I only import data a few times a year. Therefore, my CREATE TABLE skills get rusty, and it's really nice to have a tool take care of that for me. And this tool doesn't just handle syntax. It also inspects the input CSV file using a utility from the SQL Utilities library from @Benjamin De Boe to figure out the data types and their length. Then it asks a few more questions to define the syntax of the required LOAD DATA command. The app can run it, or you can just copy the syntax and tweak it yourself. 

Here's a walkthrough.

Step 1: Install the app and review the CSV file

After following the instructions to install the solution, you will have an Angular app published as a CSP application and a backend ObjectScript application that serves as the API to interface with the database.

Take a look at my sample data set (using the Rainbow CSV extension in VS Code). It has a mix of numeric, text and empty columns.

Step 2: Go to the app

You will probably find the app at http://localhost:52773/csp/dataloadapp/index.html if you use the default IRIS port and web application name.

Step 3: Specify the CSV file location

Step 4: Specify the CSV file's format

LOAD DATA needs to know some things like the column delimiter character and where to start in the file. 

Step 5: Define a destination table name, with the schema name as well

Step 6: Fine tune the field names and data types

Most of this will be filled in for you, and should be pretty accurate, but you will probably want to adjust some names or field lengths.

And that's it! Press "Load CSV" and the client-side app will make a call to the server to run a CREATE TABLE SQL command, then run LOAD DATA with the syntax shown in the black box on the right. Going into the Management Portal (or any other SQL client), you can see I now have the CSV file loaded into IRIS.

 

I must apologize in advance that there isn't much error checking yet, but this is open source so if you find this tool useful, join me in improving it on GitHub.

Discussion (4)3
Log in or sign up to continue

Thanks @Raj Singh  great input  !
I recently struggled with LOAD DATA and finally dropped it in favor
of an old-fashioned COS import for 2 reasons:

  • installation of Java was a prerequisite (might be mentioned somewhere, not identified) not a default in community containers
  • quality of data was POOR to express it politely e.g:
    • column expected to be DOUBLE   >> content was 
      • number 
      • $ number
      • some text
      • NULL
  • date with any sick non -SQL format
  • just empty lines by wild running \n

so some exception handling after validation would have been a great feature in my case
to filter out sick records.

The sick CSV is available on GitHub
 
 

Thanks for the explanations. I would like to explore more possibilities with LOAD DATA. From a data lake perspective, information generally remains as pure data and is often stored in different places. How could LOAD DATA act in situations like this?

Example 1: I have thousands of CSV files on different disks that are accessible via the network. What would be the syntax to perform the load with LOAD DATA?

Example 2: log files available on cloud storage (S3, Google Storage, etc.)
Would the routine to load these files also be through LOAD DATA?

thanks

LOAD DATA only works with files that are on the same filesystem of the IRIS instance. Raj mentions this and the fact that an import from local machine is a great next feature in OE, https://openexchange.intersystems.com/package/IRIS-Data-Loading-Client.

The problem of fetching data from many different machines is a really the seperate problem of machine to machine file transfer, which is really about protocols, network access, authentication and authorization.