How to create a financial marketplace for 500,000 people 💸. Part II (technical)

Written by anton | Published 2018/12/02
Tech Story Tags: blockchain-technology | startup | technology | finance | business


This article is split into two parts. If you are more interested in the decisions and thinking behind the ecosystem, check out Part I of this article. Meanwhile, the story continues here with defining the core architecture.

A modern fintech bank is a marketplace with a banking license, a core platform, KYC, CRM, an API and a few core products. The products directly offered by the Fintech bank may be limited to funds holding, bank accounts, cards and a wallet for payments.

However, our task wasn’t quite typical. From the start, our main goal was to design and architect a financial platform that could scale horizontally, integrate with multiple external service providers, expose an open API and be modular enough to grow across multiple product lines and features.

That is why it was important to start with the ground principles of the platform.

Canonical API Definitions

We need a single canonical API definition that can be used across all channels, or that at least allows us to verify the correctness of the clients.

We standardized on gRPC for defining service APIs inside the “logical” data center, and use grpc-gateway to generate REST adapters that clients can call. Any client-specific features are handled in this adapter component.
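As a rough sketch of this wiring: the walletpb package and its RegisterWalletServiceHandlerFromEndpoint function stand in for whatever protoc-gen-grpc-gateway generates from our protos, and the endpoint address is a placeholder.

```go
package main

import (
	"context"
	"log"
	"net/http"

	"github.com/grpc-ecosystem/grpc-gateway/runtime"
	"google.golang.org/grpc"

	walletpb "example.com/platform/gen/wallet" // hypothetical generated package
)

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// The gateway mux translates incoming REST/JSON calls into gRPC calls.
	mux := runtime.NewServeMux()
	opts := []grpc.DialOption{grpc.WithInsecure()}

	// RegisterWalletServiceHandlerFromEndpoint is generated by protoc-gen-grpc-gateway
	// from the same .proto file that defines the canonical gRPC API.
	if err := walletpb.RegisterWalletServiceHandlerFromEndpoint(ctx, mux, "wallet-svc:9090", opts); err != nil {
		log.Fatalf("failed to register gateway: %v", err)
	}

	// The REST adapter is a plain HTTP server sitting in front of the gRPC backend.
	log.Fatal(http.ListenAndServe(":8080", mux))
}
```

Because the REST layer is generated, the proto file remains the single canonical definition and the adapter cannot drift from it.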

Horizontal scalability and minimal operating cost

We do not want to spend extra money on over-provisioned systems to cope with the load, and we do not want adding capacity to be burdensome. Everything should scale horizontally.

Therefore, we want to move everything onto Kubernetes and containerize all services, which lets us cope with the load while increasing efficiency.

We also standardized on Bigtable as our persistent store because of its cost-to-performance profile.
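As a minimal sketch of what writing to Bigtable from Go looks like (the project, instance, table, column family and row-key scheme below are placeholders, not our actual schema):

```go
package main

import (
	"context"
	"log"
	"time"

	"cloud.google.com/go/bigtable"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Project and instance names are placeholders.
	client, err := bigtable.NewClient(ctx, "my-project", "fin-platform")
	if err != nil {
		log.Fatalf("bigtable client: %v", err)
	}
	defer client.Close()

	tbl := client.Open("transactions")

	// Row keys are designed around the read pattern, e.g. userID#timestamp.
	mut := bigtable.NewMutation()
	mut.Set("tx", "amount", bigtable.Now(), []byte("42.00"))
	mut.Set("tx", "currency", bigtable.Now(), []byte("EUR"))

	if err := tbl.Apply(ctx, "user123#20181202T120000", mut); err != nil {
		log.Fatalf("apply: %v", err)
	}
}
```

Since capacity is provisioned per cluster rather than per table, adding a new use-case table like this one does not change the monthly bill.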

Logging

All code must have a standard way of logging, with support for log search. This is critically important.

All services write log lines to standard output in JSON format, which are then collected with the EFK stack (Elasticsearch + Fluentd + Kibana).
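A minimal sketch of such a logger in a Go service (the field names are illustrative); Fluentd tails container stdout and ships these lines to Elasticsearch, where Kibana can search them:

```go
package main

import (
	"encoding/json"
	"os"
	"time"
)

// logJSON writes one structured log line to stdout; the log pipeline picks it
// up from the container's stdout stream without any service-side configuration.
func logJSON(level, msg string, fields map[string]interface{}) {
	entry := map[string]interface{}{
		"ts":    time.Now().UTC().Format(time.RFC3339Nano),
		"level": level,
		"msg":   msg,
	}
	for k, v := range fields {
		entry[k] = v
	}
	_ = json.NewEncoder(os.Stdout).Encode(entry)
}

func main() {
	logJSON("info", "transfer accepted", map[string]interface{}{
		"service": "wallet",
		"user_id": "u-123",
		"amount":  "42.00",
	})
}
```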

Metrics

We need a single standard way of presenting indicators for the dashboard, as well as warnings.

Since we have already standardized on Go for development, it makes sense to expose any metrics using expvars, which can then be scraped by Datadog or Prometheus. As an added bonus, since we standardized on gRPC, it was easy enough to make sure that all gRPC services expose a standard set of metrics, such as response-time percentiles per RPC endpoint.

Developers can also easily add service-specific metrics using the same approach.
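A small sketch of the expvar approach (the counter names are illustrative): importing expvar registers a /debug/vars endpoint on the default HTTP mux, which monitoring agents (for example Datadog’s go_expvar check or a Prometheus expvar exporter) can then scrape.

```go
package main

import (
	"expvar"
	"net/http"
)

// Importing expvar registers the /debug/vars endpoint on http.DefaultServeMux,
// exposing all published variables (plus memstats) as JSON.
var (
	transfersTotal  = expvar.NewInt("transfers_total")
	transfersFailed = expvar.NewInt("transfers_failed")
)

func handleTransfer(w http.ResponseWriter, r *http.Request) {
	transfersTotal.Add(1)
	// ... business logic; on error: transfersFailed.Add(1)
	w.WriteHeader(http.StatusAccepted)
}

func main() {
	http.HandleFunc("/transfer", handleTransfer)
	// Metrics are then available at http://localhost:8080/debug/vars
	http.ListenAndServe(":8080", nil)
}
```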

API Gateway

The Gateway app should be the only service exposed to the public. It should meet the following criteria:

  • The gateway should be able to scale horizontally. Consequently, it must hold no application state;
  • The gateway must be able to combine requests and invoke microservices asynchronously;
  • The gateway must be able to limit the number of requests over a period of time (see the sketch after this list);
  • The gateway must be able to verify the authenticity of the authentication token. Traditionally, the API gateway performs authentication while the underlying microservices perform authorization for their resources, but the exact authentication approach is still open for discussion;
  • The gateway must be able to automatically import available resources from microservices. Initially we will use the Swagger format, as the most widely adopted today;
  • The gateway must be able to change (mutate) the replies of the microservices;
  • The gateway must be able to talk to the REST-to-gRPC adapter (grpc-gateway);
  • And finally: the gateway should run directly from the Docker image and be configured through environment variables. We do not want any additional repositories, extra scripts, and so on.
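As an example of the rate-limiting requirement above, here is a minimal per-instance token-bucket middleware using golang.org/x/time/rate. A real gateway would key limiters per client or per token and share state (for example in Redis); the limits below are purely illustrative.

```go
package main

import (
	"net/http"

	"golang.org/x/time/rate"
)

// rateLimit wraps a handler with a simple token-bucket limiter:
// at most 100 requests per second with bursts of up to 50.
func rateLimit(next http.Handler) http.Handler {
	limiter := rate.NewLimiter(rate.Limit(100), 50)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !limiter.Allow() {
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	http.ListenAndServe(":8080", rateLimit(mux))
}
```

Because the limiter is in-memory, it satisfies the "no application state" requirement only per instance; a cluster-wide limit needs an external store.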

GCP stack

When selecting our cloud service provider, the decision was made to use Google Cloud Platform. We were driven by three major factors: Isolation, Performance & Cost, and lastly Kubernetes out of the box.

Isolation

We knew that we wanted to experiment with a more advanced geo store and other techniques, and felt that we would definitely hit bumps early on. The best way to make sure not to impact other services in those cases was to have a new, completely separate one.

Performance & Cost

There was a plan to store and serve a lot of geo-tagged and aggregated data, and we needed a low-cost, persistent store that would lend itself well to this use case. This is where Google Bigtable really shone and made the migration more appealing to us. For example, you can stand up the default 3-node cluster, which supports around 30,000 QPS, for about $1,500 per month, including storage. Since capacity is provisioned at the cluster level and not at the table level, you don’t need to worry about wasting money over-provisioning use-case-specific tables.

Kubernetes

This one is very simple. At the moment of writing this article, AWS already provides “out of the box” Kubernetes services, so this point is less of a differentiator today. However, at the time the platform was architected, it was one of the critical factors in the decision.

This is the high-level overview of the GCP stack. All blockchain-related data is left out of scope and is not discussed here yet, whereas all data that is stored in a centralized manner falls under the current stack.

Monolith or Microservices?

In this specific case, the most efficient approach was to use microservices. The decision was based on an attempt to create a core set of services while keeping dependencies under control.

Number of services running on the platform (besides network layers)

A quick note about microservices. Microservices are independently built systems, each running in its own process and often communicating over a REST API. Representing different parts of your application, they are separately deployable and each part can be written in any language.

You can easily see how, by addressing the problems of a monolithic system, microservices have become a de facto standard for state-of-the-art software.

I strongly recommend reading Microservices (by James Lewis) and On Monoliths and Microservices if you want a deeper understanding of the key concepts behind this architectural style.

Kubernetes

Kubernetes was designed from the ground up to be the ideal platform to build, manage and orchestrate distributed applications using containers. It includes primitives for replication and service discovery as part of its core (in Mesos these are added via frameworks and require some know-how to configure properly), and the vision for Kubernetes is a system that allows enterprises to manage scalable application deployments with maximum efficiency, security and ease.

Personally, I like to describe Kubernetes as “A special kind of operating system for the cloud” — It’s an operating system which allows developers to treat an arbitrary number of machines as if they were a single, very powerful machine. In my view, K8s is to the cloud what Linux or Windows is to the computer.

In the same way that an Operating System provides developers with an abstraction layer from the hardware which makes up a computer, K8s provides developers with an abstraction layer from the computers which make up a cloud.

The more you think about this comparison, the more parallels you should notice. For example, one of the main purposes of an OS is to schedule processes such that they can efficiently share hardware resources across a single machine — Similarly, one of the main purposes of K8s is to schedule containers such that they can efficiently share computational resources across a cloud (potentially made up of multiple machines).

Once you think of K8s as being an “operating system for the cloud” the business opportunities start to materialize. We’re no longer just talking about a system to improve deployment efficiency, we’re talking about an entirely new platform for building cloud native applications.

Hybrid platform

How does it work?

There are three layers of blockchains that the Hybrid system is built upon: the Ethereum main-net, the Master internal blockchain, and country-specific blockchains. Each internal blockchain is a modified version of the Ethereum protocol, with zero transaction fees and no emission. Inside them, a token acts as the main currency, just like ETH on the main-net.

Note that no emission happens on the internal blockchains, as these all begin with the maximum available supply of tokens already existing in a ‘Master Wallet’. When tokens leave the network, they are sent to a second ‘Clearance Wallet’, which in effect ‘burns’ or removes them.

Three critical scenarios must be handled by the system: when a user creates an account; when tokens are sent from the ecosystem to the outside world; and, finally, when tokens are sent from the outside world into the hybrid ecosystem.

Creating a user account

When a user registers a new account, a service is responsible for creating two identical wallets through the blockchain APIs, re-using the same private key. In this way, a user has exactly the same address on both the ETH Mainnet and the Private Chain. The information is then returned to the Tapatybe service (responsible for holding user identities) and tied to the account.
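Because the internal chain is Ethereum-compatible, the same private key yields the same address on both networks. A minimal sketch of that property with go-ethereum’s crypto package (the key here is freshly generated purely for illustration):

```go
package main

import (
	"fmt"
	"log"

	"github.com/ethereum/go-ethereum/crypto"
)

func main() {
	// Generate one secp256k1 key for the new user.
	key, err := crypto.GenerateKey()
	if err != nil {
		log.Fatal(err)
	}

	// Ethereum addresses are derived purely from the public key,
	// so the same key gives the same address on the ETH Mainnet
	// and on the internal (Ethereum-compatible) chain.
	addr := crypto.PubkeyToAddress(key.PublicKey)
	fmt.Println("address on both chains:", addr.Hex())
}
```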

In response, the TokenRef service is called, which performs the required emission on both the ETH Mainnet wallet (by calling the contract and minting the tokens), and the Private chain: at this point, the tokens are transferred from the Master/Clearance wallet to the user’s internal wallet.

Transaction from the inside to the outside

When a user initiates a transaction, there are two possible scenarios: either the destination wallet belongs to the internal network, or it belongs to the ETH Mainnet.

As the transaction commences, a service is called that determines whether the destination address exists in the database. If it does, it simply proxies the transaction to the Blockchain API for the internal chain. If the address is indeed external, then the transaction is added to a queue that is picked up by a second service.

This service then proxies the transaction to the ETH Mainnet, broadcasting it from the transit address. Once that is complete, it calls the Blockchain API to create a transaction that ‘burns’ the tokens on the user’s internal balance, sending them to the Clearance Wallet.
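A rough sketch of that routing decision in Go (the Transfer type, WalletStore interface and service names are hypothetical illustrations, not the actual platform code):

```go
package routing

import "context"

// Transfer is a simplified view of an outgoing transaction.
type Transfer struct {
	From   string
	To     string
	Amount string
}

// WalletStore answers whether an address belongs to the internal network.
type WalletStore interface {
	IsInternal(ctx context.Context, address string) (bool, error)
}

// Router decides whether a transfer stays on the private chain or is queued
// for the worker that broadcasts to the ETH Mainnet and burns internal tokens.
type Router struct {
	store         WalletStore
	internalChain interface{ Send(ctx context.Context, t Transfer) error }
	mainnetQueue  interface{ Enqueue(ctx context.Context, t Transfer) error }
}

func (r *Router) Route(ctx context.Context, t Transfer) error {
	internal, err := r.store.IsInternal(ctx, t.To)
	if err != nil {
		return err
	}
	if internal {
		// Destination is inside the ecosystem: proxy to the internal chain API.
		return r.internalChain.Send(ctx, t)
	}
	// Destination is external: queue for the Mainnet worker, which broadcasts
	// from the transit address and then burns the user's internal tokens.
	return r.mainnetQueue.Enqueue(ctx, t)
}
```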

Transaction from the outside to the inside

Given that an external transaction can arrive at any time, a blockchain service must ‘listen’ to all of the wallets for incoming transactions. Whenever a new transaction is detected, a second service initiates a transfer of funds from the user’s mainnet address to the Transit Wallet, grouping transactions to keep costs as low as possible.

A third service is then called, which orders the internal Blockchain API to distribute the tokens from the Master Wallet to the user’s internal balance.
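A heavily simplified sketch of this deposit flow (the interfaces, type names and batching strategy here are hypothetical illustrations, not the actual platform code):

```go
package deposits

import "context"

// Deposit represents an incoming Mainnet transaction detected on a user wallet.
type Deposit struct {
	User   string
	Wallet string
	Amount string
}

// Sweeper groups detected deposits, moves them to the Transit Wallet in one
// batch, then asks the internal chain to credit each user from the Master Wallet.
type Sweeper struct {
	mainnet interface {
		// BatchTransfer sweeps funds from many user wallets to the transit
		// wallet in as few Mainnet transactions as possible, to keep fees low.
		BatchTransfer(ctx context.Context, transitWallet string, deposits []Deposit) error
	}
	internal interface {
		// CreditFromMaster moves tokens from the Master Wallet to the
		// user's internal balance.
		CreditFromMaster(ctx context.Context, user, amount string) error
	}
	transitWallet string
}

func (s *Sweeper) Sweep(ctx context.Context, deposits []Deposit) error {
	if len(deposits) == 0 {
		return nil
	}
	if err := s.mainnet.BatchTransfer(ctx, s.transitWallet, deposits); err != nil {
		return err
	}
	for _, d := range deposits {
		if err := s.internal.CreditFromMaster(ctx, d.User, d.Amount); err != nil {
			return err
		}
	}
	return nil
}
```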

How does the Hybrid system achieve scalability?

The private blockchains work in parallel, meaning that we can add as many of them as necessary to multiply Ethereum’s TPS. Of course, the presence of inter-blockchain operations implies some loss in efficiency, but these will be a small part of the overall transaction pool.

Sharing contacts in the app

If a service uses an identifier already listed in a typical contact card (phone number or email), it’s simple to quickly display which of a user’s contacts are also registered with the service and immediately make social features available to that user. This means friends don’t have to “discover” each other on a service if they already have each other as contacts.

The problem is that the simplest way to calculate the intersection of registered users and device contacts is to upload all the contacts in the address book to the service, index them, reverse index them, send the client the intersection, and subsequently notify the client when any of those contacts later register.

Bloom Filters and Encrypted Bloom Filters

There’s an entire field of study dedicated to problems like this one, known as “private information retrieval” (PIR). The simplest form of PIR is for the server to send the client the entire list of registered users, which the client can then query locally. Basically, if the client has its own copy of the entire database, it won’t leak its database queries to the server.

One can make this slightly more network efficient by transmitting the list of registered users in a bloom filter tuned for a low false positive rate. To avoid leaking the list of all registered users, it’s even possible to build a “symmetric PIR” system using “encrypted bloom filters” by doing the following:

  1. The server generates an RSA key pair which is kept private.
  2. Rather than putting every user into a bloom filter, the server puts the RSA signature of each user into the bloom filter instead.
  3. The client requests the bloom filter, which contains an RSA signature of each registered user.
  4. When the client wishes to query the local bloom filter, it constructs a “blinded” query as per David Chaum’s blind signature scheme.
  5. The client transmits the blinded query to the server.
  6. The server signs the blinded query and transmits it back to the client.
  7. The client unblinds the query to reveal the server’s RSA signature of the contact it wishes to query (a concrete sketch of this blinding round-trip follows below).
  8. The client then checks its local bloom filter for that value.

It’s also possible to compress “updates” to the bloom filter. The server just needs to calculate the XOR of the version the client has and the updated version, run that through LZMA (the input will be mostly zeros), and transmit the compressed diff to the client.
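To make steps 4–7 above concrete, here is a minimal sketch of an RSA blind-signature round trip using Go’s math/big. It skips the hashing and padding of the contact identifier that a real implementation would need, and the identifier itself is purely illustrative.

```go
package main

import (
	"crypto/rand"
	"crypto/rsa"
	"fmt"
	"log"
	"math/big"
)

func main() {
	// Server side: an RSA key pair that is kept private.
	key, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		log.Fatal(err)
	}
	n := key.N
	e := big.NewInt(int64(key.E))
	d := key.D

	// Client side: the contact identifier it wants to query (toy encoding, m < n).
	m := new(big.Int).SetBytes([]byte("+15551234567"))

	// Step 4-5: pick a random blinding factor r and send m * r^e mod n.
	// (With negligible probability r is not coprime with n; ignored in this sketch.)
	r, err := rand.Int(rand.Reader, n)
	if err != nil {
		log.Fatal(err)
	}
	blinded := new(big.Int).Mul(m, new(big.Int).Exp(r, e, n))
	blinded.Mod(blinded, n)

	// Step 6: server signs the blinded value without learning m:
	// (m * r^e)^d = m^d * r mod n.
	signedBlinded := new(big.Int).Exp(blinded, d, n)

	// Step 7: client unblinds by multiplying with r^-1, recovering m^d mod n.
	rInv := new(big.Int).ModInverse(r, n)
	sig := new(big.Int).Mul(signedBlinded, rInv)
	sig.Mod(sig, n)

	// Step 8 would look this signature up in the local bloom filter.
	// Sanity check: sig^e mod n must equal the original m.
	check := new(big.Int).Exp(sig, e, n)
	fmt.Println("signature verifies:", check.Cmp(m) == 0)
}
```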

Facial Recognition system

Facial recognition is used in two places on the platform:

  • During registration and authentication
  • During additional verification when performing transactions in the wallet

Service authentication

All API methods require simple token-based HTTP authentication. In order to authenticate, you should put the word “Token” and a key into the Authorization HTTP header, separated by whitespace:

Authorization: Token yfT8ftheVqnDLS3Q0yCiTH3E8YY_cm4p
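Calling the API with that header from Go is straightforward; the base URL and path below are placeholders, and the token is the sample value from above.

```go
package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
)

func main() {
	// Base URL and path are placeholders for the actual face-recognition API.
	req, err := http.NewRequest("GET", "https://face-api.example.com/v1/faces", nil)
	if err != nil {
		log.Fatal(err)
	}
	// Token-based auth: the word "Token", a space, then the key.
	req.Header.Set("Authorization", "Token yfT8ftheVqnDLS3Q0yCiTH3E8YY_cm4p")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, _ := ioutil.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(body))
}
```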

Common object types

**Face** — represents a human face. Note that there might be several faces in a single photo. Different photos of the same person are also considered to be different faces.

  • "id" (number): unique identifier of the face generated by the services.
  • "timestamp" (string): time of face object creation as ISO8601 string.
  • "photo_hash" (string): Hash of the original photo. Note that identical photos will always have the same hash, and different photos will most certainly have different hashes. Don't interpret this value and don't make assumptions about particular hash function used for hash calculation.
  • "x1" (number): x coordinate of the top-left corner of face's bounding box on the original photo.
  • "y1" (number): y coordinate of the top-left corner of face's bounding box on the original photo.
  • "x2" (number): x coordinate of the bottom-right corner of face's bounding box on the original photo.
  • "y2" (number): y coordinate of the bottom-right corner of face's bounding box on the original photo.
  • "meta" (string): metadata string that you can use to store any information associated with the face.
  • "galleries" (string[]): array of galleries names that have this face.
  • "photo" (string): URL of file name of a photo that had been used to create the face object.
  • "thumbnail" (string): URL of face thumbnail stored in the service cache.

Methods

**Create face** — processes the uploaded image or the provided URL, detects faces and adds the detected faces to the searchable dataset. If there are multiple faces in a photo, only the biggest face is added by default.

Optionally, you can add a custom string meta, such as a name or an ID, which uniquely identifies a person. Multiple face objects may have the same meta, but we recommend that you don’t assign the same meta to different persons. Thus, when using a person’s name as the meta, make sure that all names are unique.

A sample response for this method looks as follows:

```json
{
  "results": [
    {
      "age": 40,
      "emotions": ["neutral", "surprised"],
      "galleries": ["default", "ppl"],
      "gender": "male",
      "id": 2333,
      "meta": "Sam Berry",
      "photo_hash": "dc7ac54590729669ca869a18d92cd05e",
      "timestamp": "2016-06-13T11:06:42.075754",
      "x1": 225,
      "x2": 307,
      "y1": 345,
      "y2": 428
    }
  ]
}
```

**Detect faces** — detects faces in the provided image. You can either upload the image file as multipart/form-data or provide a URL, which the API will use to fetch the image.

A sample response for this method looks as follows:

```json
{
  "faces": [
    {
      "age": 36,
      "emotions": ["neutral", "happy"],
      "gender": "female",
      "x1": 236,
      "x2": 311,
      "y1": 345,
      "y2": 419
    }
  ],
  "orientation": 1
}
```

**Verify face** — verifies that two faces belong to the same person or, alternatively, measures the similarity between the two faces. You can choose between these two modes by setting the threshold parameter.

When a binary decision is required, the user can pass a value for the threshold parameter. We provide three preset values for the threshold: strict, medium and low, with strict aimed at minimizing the false accept rate and low being somewhat more permissive. The client can also override these presets with a fixed threshold value.

A sample response for this method looks as follows:

```json
{
  "results": [
    {
      "bbox1": {"x1": 610, "x2": 796, "y1": 157, "y2": 342},
      "bbox2": {"x1": 584, "x2": 807, "y1": 163, "y2": 386},
      "confidence": 0.9222600758075714,
      "verified": true
    }
  ],
  "verified": true
}
```

Current look of the infrastructure

Since the launch of the app, the demand on the infrastructure has increased tenfold, utilising 96 GB of memory across nodes and reaching peak traffic of hundreds of megabytes per second. The infrastructure is coping effectively with the load, with an average SLA of 99.4%. All builds and updates are rolled out automatically using a continuous delivery system that allows transparent upgrading of services.

Our current stack is based on the following technologies and languages: Java (native Android application) + native Android libraries, GoLang, Python, PostgreSQL, MySQL, RabbitMQ, Redis, MongoDB, Kubernetes, Docker, TensorFlow (for the ML behind the AI-based assistant bot), Google Cloud, Sentry, Grafana, various analytics SDKs (Firebase and others), BigQuery, Apache Zeppelin, the MQTT protocol for realtime communication, and Node.js for web services.

The development team works using the Scrum methodology with weekly sprints and includes backend and frontend developers, a DevOps engineer, a data scientist, a report and event engineer, a blockchain developer, QA engineers and an administrator/scrum master. Before June 2018, bi-weekly sprints were used; they were subsequently shortened to speed up the response to the changing environment. All releases are covered by manual smoke tests and, when needed, complete regression tests, to ensure the proper functioning of the service.

This is the infrastructure that supports 500,000 users

Further Infrastructure development

General Improvements

  • Chat objects moved to static storage on Google Cloud Storage
  • Moving away from huge database instances for individual services
  • Auto-listing vault
  • Explicit resource limits for all services, including infrastructure
  • Uniformity of helm templates, deployment scripts, etc.
  • Terraform for cloud infrastructure
  • All business services in one namespace
  • Reservation of IP addresses for domains

Reliability

  • Cluster version and OS update for the production environment
  • Infrastructure data services moved to the cloud
  • Transition to cloud solutions for RabbitMQ, Redis and the ingress controller
  • Services should be able to run as multiple replicas
  • Work on eliminating points of failure
  • Preparing for incidents, with recovery plans
  • Stress testing
  • Including checks against databases holding large amounts of data
  • Vault in HA mode
  • Multi-regional or multi-zone production cluster

Additional improvements

  • Collecting metrics about the behaviour of services and of supporting services
  • Dashboard with business metrics and latency, automatic SLI calculation
  • Transition to a new system for collecting and aggregating logs

Continuous Integration ⇒ Deployment ⇒ Delivery

  • Automation of releases and rollbacks
  • Process measurability
  • Accelerating builds and releases
  • Creating environments on demand (see GKE alpha clusters)
  • Test automation
  • Canary and blue-green deployments
