Data Access for Microservices

Written by lafernando | Published 2020/08/24
Tech Story Tags: programming | microservices | data | data-security | observability | ballerina | microservice-architecture | code-quality

TLDR A data service gives you a generalized interface to the data you’re exposing. This would be in a well-understood protocol and a known data format. Ballerina is meant to avoid boilerplate code and provides maximum agility for the developer. It contains first-class language constructs for services, endpoints, transactions, data security, and more. Here, we will be creating an HTTP service that will consume and produce. JSON via HTTP/S. The service is a service template for a CRUD data service. In the service, we can define individual resource functions.via the TL;DR App

If you want to access data in a distributed environment such as in a microservice architecture, then data services are the way to go. The idea is to create a data abstraction layer (DAL) that the rest of the system’s applications and services can share. Thus, a data service gives you a generalized interface to the data you’re exposing and provides access to it in a standard manner. This would be in a well-understood protocol and a known data format. For example, a popular approach is to use JSON via HTTP/S.
Writing a data service is not just about creating CRUD (Create, Read, Update, and Delete) operations. It’s usually considered an anti-pattern if that is all you’re doing. You should instead strive to do something more useful in your data service. This may include aspects such as data filtering, validation, transformations, and transactions. You need to do something non-trivial for the data service to be useful.
There are plenty of DSL based data integration tools that do these tasks. However, the functionality and the flexibility you need may be more than what such a tool can provide. In these situations, you would usually turn to your trusty general-purpose programming language, i.e., your Java, C#, or Node.js to get it done. But then again, most of these languages have a higher overhead when getting a basic data service up and running, and maintaining it.
Ballerina is meant to avoid this requirement of boilerplate code and provides maximum agility for the developer. Ballerina is more focused on integration scenarios. Thus, writing data services comes very much naturally to it. In supporting this, it contains first-class language constructs for services, endpoints, transactions, data security, and more.

Writing a CRUD Service

In the beginning I mentioned that just writing a CRUD service is not necessarily good. However, let’s take this as a starting point to show the basics you need to get a service up and running. We can later improve on it to add some more interesting features. Here, we will be creating an HTTP service that will consume and produce JSON.
Ballerina has support for SQL databases through JDBC, so any JDBC driver can be used to support an RDBMS. You simply have to download the JDBC driver jar and add it as a Ballerina project dependency. 
Let’s first see how we create the HTTP service for our task. In Ballerina, services and endpoints are first-class constructs in the language. A service is defined in the following way.
Listing 1: Ballerina HTTP CRUD Service Template
The above service is a Ballerina service template for a CRUD data service. In the service, we can define individual resource functions. The resource functions mentioned here are getEmployee, getEmployeeById, addEmployee, updateEmployee and deleteEmployee. The operations these resources represent are self-explanatory and they are each represented by its corresponding HTTP verb, the resource paths, and the payloads it contains.
Ballerina services have data binding support in the service resources, where its path parameters and payload can be directly mapped to the parameters in the resource. A path parameter example is shown in the employeeById resource function, where its id section in the path is mapped to the id integer value in the resource function.
Also, in the employeeInsert resource function, its body payload is mapped directly to the Employee record type. In this case, the incoming JSON is mapped to the fields of the Employee record. If needed, for any custom mappings, we can also declare the raw JSON object to be retrieved as the parameter to the resource as well. (For more information on Ballerina services, and their configuration properties, refer to the “HTTP/HTTPS” section of Ballerina by Example).

Accessing Databases

When accessing a database, we model the database as a remote endpoint. In Ballerina we interact through a client endpoint any time we access an external network entity. The language is modeled in this way to match with the concept of using a sequence diagram to describe the actors, objects, and messages passing between them.
jdbc:Client empDB = check new ("jdbc:mysql://localhost:3306/EmpDB", "root", "root");
Listing 2: Ballerina SQL Database Client Endpoint Declaration
The client endpoint is used in Ballerina by invoking the remote methods in them. A special syntax, which is an arrow notation (->), is used to access the remote methods in a client endpoint. This symbolizes the operation of doing a network call using the client.
Listing 3: Reading Records with Database Client
The code segment above shows how the database endpoint can be used to execute an SQL query with the given record type (Employee) and the arguments (id) for the query. This returns a stream with a record value type (for more information on using the database connectors, refer to the “Database” section of Ballerina by Example).
After this, the caller endpoint can be used to respond to the client with the payload. This is something unique to Ballerina. Other frameworks such as JAX-RS/WS simply return a function and if there is a problem in writing the result back to the client, we do not have a place to handle it properly. In Ballerina, we get that chance to do this, since we are sending the response back explicitly in the code.

Parameterized Queries

The SQL query shown in Listing 3 provides a parameterized query to the query remote method invocation. The parameterized query is based on raw templates in Ballerina. In this way, we can naturally provide the values of parameters in the query string itself. The query remote method accepts both parameterized queries and simple string values. 

Manipulating Data and Batch Updates

Data updates and insertions are done with the execute remote method in the JDBC client. Listing 4 shows how this is done with the operation to add new employees to the database table.
Listing 4: Writing Records with Database Client
When inserting multiple records to a database table, we can get a significant performance boost by batching multiple records at once and doing batch inserts to the database. This is possible with the use of the batchExecute remote method. Let’s add a new resource function named addEmployeeBatch to demonstrate this functionality.
Listing 5: Writing Batch Records with Database Client
As seen in Listing 5, the batch request is executed by providing an array of sql:ParameterizedQuery objects, which is generated by using raw templates in conjunction with the language integrated query features of Ballerina.

Data Security

When implementing a data service, we need to make sure we are handling the data securely, may it be masking out confidential information, data filtering, or avoiding SQL injection attacks. A common amateur mistake developers make is generating SQL queries directly by concatenating argument values into it. Let’s rewrite the content in Listing 3 using this approach.
Listing 6: Invalid SQL Query Usage with Arguments
The code stated in Listing 6 has a clear SQL injection attack vulnerability. So in typical programming languages, the developer must catch this and fix it properly. In Ballerina, the above code doesn’t even compile! The reason being the in-built taint analysis features available in the language. Here, the SQL query string parameter is marked as “untainted”. So unless the variable id is explicitly “untainted”, the SQL query string derived from the concatenation is marked as “tainted”, thus making it not compatible with the untainted value expected as the SQL query string for the query operation.
This shows that Ballerina does its best to stop developers from making any mistakes from the get-go. This is the concept of security by default followed throughout Ballerina.

Authentication/Authorization

Ballerina contains an authentication framework, which is capable of plugging in provider implementations. Some of the out-of-the-box providers in Ballerina are listed below.
Listing 7: Service with Authentication Configuration
The service definition above contains an authentication configuration along with a transport level secure endpoint, where the keystore information is given. In this configuration, the service is declaring that only users with the scope “employee-data-access” are allowed to call this service.
For more information on Ballerina authentication framework features, please read how to secure Ballerina code.

Transactions

Ballerina transaction handling is designed to make it convenient for the developer to define operations and simply mark the transaction boundary these operations belong to. After that, Ballerina will simply give an all-or-nothing guarantee on the execution of all the data operations in the marked scope.
Listing 8: Ballerina Transactions with the SQL Connector
Listing 8 shows a transaction block in action. There, the operations represent a scenario where the employee team is swapped between two employees. For this, we first read in the two employee records, swap the record data, and perform separate SQL update operations to do the swap. All these operations need to be done in a single transaction to make sure we don’t get into an inconsistent state. So here, what we need to simply do is, wrap all the SQL connector operations in a transaction block and do either a commit or rollback action accordingly.

Data Connectors

Ballerina supports relational database access directly through the standard library. The Ballerina platform provides further data connectors through Ballerina Central. Connectors such as Redis and MongoDB can be used directly. 

Observability

Ballerina supports in-built observability for its applications without any additional developer effort. We can simply enable observability for our applications, and we will instantly get the default calculated metrics and tracing information. An example dashboard which will be shown for our services is shown below in Figure 1 and Figure 2.
Figure 1: Ballerina Service Metrics
Figure 2: Ballerina SQL Metrics
For more information on Ballerina observability features, check out this article on automated observability

Demo Run

The following are some sample HTTP requests to test the data service:
curl -X POST -d "{'id':1, 'name':'Jordyn Bird','age':55, 'team': 'Sales'}" http://localhost:8080/data/employees
curl -X POST -d "{'id':2, 'name':'Emily Smith','age':45, 'team': 'Marketing'}" http://localhost:8080/data/employees
curl -X POST -d "[{'id':3, 'name':'John Doe','age':25, 'team': 'Engineering'}, {'id':4, 'name':'Jane Doe','age':25, 'team': 'Engineering'}]" http://localhost:8080/data/employees_batch
curl -X GET http://localhost:8080/data/employees
[{"id":1, "name":"Jordyn Bird", "age":55, "team":"Sales"}, {"id":2, "name":"Emily Smith", "age":45, "team":"Marketing"}, {"id":3, "name":"John Doe", "age":25, "team":"Engineering"}, {"id":4, "name":"Jane Doe", "age":25, "team":"Engineering"}]
curl -X GET http://localhost:8080/data/employees/2
{"value":{"id":2, "name":"Emily Smith", "age":45, "team":"Marketing"}}
curl -X PUT -d "{'id': 1, 'name':'Sunil Perera','age':66, 'team': 'Sales'}" http://localhost:8080/data/employees
curl -X GET http://localhost:8080/data/employees
[{"id":1, "name":"Sunil Perera", "age":66, "team":"Sales"}, {"id":2, "name":"Emily Smith", "age":45, "team":"Marketing"}, {"id":3, "name":"John Doe", "age":25, "team":"Engineering"}, {"id":4, "name":"Jane Doe", "age":25, "team":"Engineering"}]
curl -X POST http://localhost:8080/data/employee_team_swap/1/2
curl -X GET http://localhost:8080/data/employees
[{"id":1, "name":"Sunil Perera", "age":66, "team":"Marketing"}, {"id":2, "name":"Emily Smith", "age":45, "team":"Sales"}, {"id":3, "name":"John Doe", "age":25, "team":"Engineering"}, {"id":4, "name":"Jane Doe", "age":25, "team":"Engineering"}]
curl -X DELETE http://localhost:8080/data/employees/1
curl -X GET http://localhost:8080/data/employees
[{"id":2, "name":"Emily Smith", "age":45, "team":"Sales"}, {"id":3, "name":"John Doe", "age":25, "team":"Engineering"}, {"id":4, "name":"Jane Doe", "age":25, "team":"Engineering"}]

Summary

In this write-up, we show how a general data service can be implemented using Ballerina and how it gives the full power to the developer while making sure they do the right operations at the right time.
The full source code for the data services sample and the database scripts for both Oracle and MySQL can be found here.
For more information on writing microservices in Ballerina, check out the following resources:

Written by lafernando | Software architect and evangelist @ WSO2 Inc.
Published by HackerNoon on 2020/08/24