How to Convert Rest APIs into Databases: Create a JDBC driver

A lot of data over the web or internally in an enterprise is available and accessible only through Rest APIs. Rest APIs do provide a secure and authenticated mechanism to access these services, but they are not very intuitive sometimes to query upon the data which is fetched through them.

For example, this url fetches the real-time stock data in the Nifty 50 index from NSE (National Stock Exchange) India. It returns a JSON response with a list of symbols in the Nifty 50 index and their price and other values.

Apologies if the stock market is something that is not of your interest, you can give it a quick read though, if curious.

If this data would have been present in a database, then we can very easily write different queries like below and query at ease:

Select open, high, low, close From Nifty50 where trdVol > 100;

Imagine if all the data through Rest APIs would have been available and hosted like a database, how easy it would have been to query them or join them with other data tables.

Another good thing with a database is that they also mostly provide a standard ODBC / JDBC driver, which can be used to connect through any language and used for analysis.

So how do we convert Rest APIs into a database…?

Apache Calcite

This is an amazing open-source project from Apache which can be used to create a JDBC driver over Rest APIs, or for that matter, Apache Calcite allows you to create a JDBC driver over any underlying data source (like Cassandra, ElasticSearch, Local File Folder, AWS S3, etc).

To begin with, this is a well-explained article from Apache Calcite, which gives a walk-through of an example to run JDBC queries over data stored in local CSV files.

Once you follow the example and set up the same over the CSV files, the JDBC config (model.json file in Calcite terms) to use Apache Calcite JDBC driver looks like this:

If we look at the config, it has a key called factory which allows fetching the schema in your own custom way.

In this case, it is defined as a custom class called CsvSchemaFactory which inherits and implements the SchemaFactory interface. The interface has a method create which creates and returns Schema.

Schema internally defines the Table and its definitions. The complete CSV example implementation can be found in this github project.

Going back to our example of NSE data - the NSE website provides data across multiple different categories, like:

Broad Market Indices

Sectoral Indices, etc

and, within each category data is provided for multiple Indexes, ETFs or Bonds, like Nifty50, Nifty Auto, Nifty FMCG, etc.

So, when we jdbcify the NSE data, the categories can be the schema and indexes / ETFs could be the tables.

+-----------------------+-----------------+
| Schema Name           | Table Name      |
+-----------------------+-----------------+
| Broad Market Indices  | Nifty 50        |
|                       | Nifty Next 50   |
| Sectoral Indices      | Nifty Auto      |
+-----------------------+-----------------+

For the implementation, we implement AbstractSchema class, which has a getTableMap method and return the Map of Table with their names.

The createTable in the above code snippet should return an implementation of the interface Table. There are multiple types of tables supported and provided in Apache Calcite, as:

The above serves the purpose as their name suggests. For our example, both Scannable or Streamable table can be created, the latter would be an interesting implementation as the data emitted by NSE APIs ticks and provides the latest prices at a particular frequency.

For now and for simplicity we would create a ScannableTable. ScannableTable implementation has to implement a method called scan, which should return:

Enumerable<java.lang.Object[]>

The returned enumerable should emit each row as an array of values, where values should be in the order of the fields defined in the table column definition. For example, if the table definition contained below fields defined in this same order:

+----------------+--------------+
| Column Name    | Column Type  |
+----------------+--------------+
| symbol         | String       |
| open           | Float        |
| high           | Float        |
+----------------+--------------+

then the scan method should return rows as:

| Infosys        | 123.24       |  128.65      |
| Britannia      | 1232.4       |  1244.01     |

wherein the first row above, “Infosys” is the value for “symbol, 123.24 for “open” and 128.65 for “high”

Table definition for the Table can be specified and returned by the getRowType method:

And that is it!!.. that covers all missing links and places where to write some more code to fetch data from NSE and jdbcify it.

The complete source code of the example can be found at this github project.

You can experiment with the code using Calcite JDBC driver and sqlline utility. Apache Calcite internally uses Apache Avatica for the JDBC driver implementation.

This is how the output looks like, when queried using !tables command in sqlline:

You can run other SQL queries over Tables, the plan for the same is generated and can be viewed if logs are enabled and setup at DEBUG level in log4j.properties file.

With that simple implementation, any rest API hosted can be used as a database, which then allows data exploration and query using a simple old SQL construct.

Also published on: https://ashishjain-ash.medium.com/how-to-create-a-jdbc-driver-over-rest-apis-1571ab156e6a