Self-Sovereign Identity, smart contracts and Web 3.0

Decentralization of user identity is one of the key parts of Web 3.0, where users control their credentials without relying on any external party. “Login with Google” must become “Login with my own, revocable identity”. This way of identifying and authorizing users is reliable and well suited to the new era of the decentralized Web, public blockchains and smart contracts.

Smart contracts are a special type of software that has recently become widespread. Over the past few years, developers have accumulated many common development patterns, approaches to solving similar problems, and reference implementations. This article is about user identification based on smart contracts, working with addresses, anti-hacking measures, administrative access to the system and other ID-related issues.

I will focus mostly on Ethereum, the most popular blockchain with smart contract support and all the important mechanisms for ensuring network security. New solutions are usually compared with Ethereum, and its algorithms have long secured billions of dollars of user crypto assets and have proved their worth.

User Identification

In any decentralized network, a user or a network node can be identified using a digital signature. A user or a node (let’s call them “participants”) creates a key pair: a secret key and a public key. Information derived from the public key then allows the participant to be identified. For example, in Ethereum-based blockchains an address (or “an account”) is 160 bits of the public key hash, i.e. to create an Ethereum address you need to:

generate a private_key/public_key keypair for use with the ECDSA scheme (Ethereum currently uses the elliptic curve secp256k1)
take the last 160 bits of the Keccak-256 hash of the public key (Keccak-256 is the original Keccak variant that SHA-3 was standardized from; the final SHA-3 standard uses different padding and produces different digests)
the resulting value is the Ethereum network address, which looks like this: 0xDC25EF3F5B8A186998338A2ADA83795FBA2D695E
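The three steps above can be sketched in pure Python using only the standard library. This is an illustration under stated assumptions, not production code: the function names (keccak256, public_key, address_from_private_key) are made up for the example, the arithmetic is not constant-time, and a real project would rely on a vetted cryptography library. Note that Python's hashlib.sha3_256 implements the final SHA-3 standard and therefore cannot be used in place of Ethereum's Keccak-256.

```python
# Illustrative sketch only: not constant-time, not audited.
P = 2**256 - 2**32 - 977  # secp256k1 field prime
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)
MASK64 = (1 << 64) - 1


def _rol64(value, shift):
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def _keccak_f1600(state):
    """Apply the 24-round Keccak-f[1600] permutation to a 200-byte state."""
    lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    lfsr = 1
    for _ in range(24):
        # theta: mix each lane with the parity of two neighbouring columns
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate lanes and move them to new positions
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], _rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear step applied along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: add the round constant, generated by a small LFSR
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return out


def keccak256(data, suffix=0x01):
    """Keccak-256 as used by Ethereum; suffix=0x06 gives standardized SHA3-256."""
    rate = 136  # bytes absorbed per permutation for a 256-bit digest
    padded = bytearray(data) + bytes(rate - len(data) % rate)
    padded[len(data)] ^= suffix
    padded[-1] ^= 0x80
    state = bytearray(200)
    for offset in range(0, len(padded), rate):
        for i in range(rate):
            state[i] ^= padded[offset + i]
        state = _keccak_f1600(state)
    return bytes(state[:32])


def _point_add(a, b):
    """Add two points on secp256k1; None represents the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1, -1, P) % P
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (slope * slope - x1 - x2) % P
    return x3, (slope * (x1 - x3) - y1) % P


def public_key(private_key):
    """Step 1: derive the public key point with double-and-add."""
    result, addend = None, G
    while private_key:
        if private_key & 1:
            result = _point_add(result, addend)
        addend = _point_add(addend, addend)
        private_key >>= 1
    return result


def address_from_private_key(private_key):
    """Steps 2 and 3: hash the 64-byte public key, keep the last 20 bytes."""
    x, y = public_key(private_key)
    digest = keccak256(x.to_bytes(32, "big") + y.to_bytes(32, "big"))
    return "0x" + digest[-20:].hex()  # lowercase, without the EIP-55 checksum casing
```

Calling address_from_private_key with an integer private key returns the lowercase hexadecimal address; the mixed-case form shown by wallets is the same value with checksum capitalization applied.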

Each time the address owner needs to confirm ownership of this address, he signs the data with his private key. For example, any cryptocurrency transaction sent from an address carries an electronic signature (together with either the public key itself or, as in Ethereum, enough information to recover it from the signature), which proves that the transaction could only have been created by someone with access to the private key.

This method of “confirming ownership of information derived from the secret key” is extremely common. One example is issuing and verifying HTTPS certificates: a certificate is, in essence, a public key bundled with additional data (a domain name, an expiration date) and signed by a certificate authority. Any task that involves a “digital signature” follows this pattern. A controlling body registers the public key of an organization along with its name, TIN, etc. The organization, in turn, sends reports electronically and signs each report with its secret key.

Blockchain as an identity system

Considering the above, a public blockchain can be regarded as a cloud service that stores addresses and related information for user identification.

All users are authenticated with their private keys, while the public keys (more precisely, their hashes) are stored “in the blockchain”, and any valid transaction in the blockchain serves as proof of ownership of a specific address.

At the same time, all the important blockchain properties, such as resistance to attacks, the impossibility of restricting access, and the complete absence of secret information, make it an ideal system for storing the information needed to identify users. A blockchain is accessible everywhere: you can get proof that an address exists and is valid from any blockchain node, in any part of the world. It is almost impossible to block all existing blockchain nodes, and any change in information about keys is publicly verifiable.

Therefore, any algorithm that requires “proof of address ownership” is a perfect fit for smart contracts. The address becomes a unique user identifier (uid) that does not require storing password hashes or other types of shared secrets. Because such a “uid database” is always available and verifiable, backups of security information become unnecessary, and the account database is easy to rebuild after a failure. In public blockchains, a server that serves verified information can be launched anywhere, without registration or cryptocurrency. It simply receives verifiable updates from other nodes through a p2p network.

For convenience, I will describe one of the authorization options using the public Ethereum blockchain network. This scheme is widely used and requires only the simplest one-click actions, similar to “login with Google” but without sending any data anywhere except to the site.

Registration

Registration in the system means sending information that links a public address in the blockchain with an identifier and any additional data. To put it simply, an entry is created in a key-value database (i.e. “in the blockchain”) where the key is the user’s address and the value is any related metadata:

key: “0x13668Ecf257cC15c381b461B9fEDaB5D451c8F7F”
value: “{ login: ‘alex’, age: 18, … }”

The bare minimum is to record the address in a contract. Here is its code:

pragma solidity 0.7.0;

contract Users {

  mapping (address => bool) users; // storage of user addresses
  event UserRegistered(address indexed user); // event, fired each time user is registered

  constructor() {
    register(); // register creator of contract as first user
  }

  function register() public {
    users[msg.sender] = true; 
    emit UserRegistered(msg.sender);
  }

}

Now, any blockchain node can get this information by requesting key data from a smart contract that stores the addresses of registered users (e.g. 0x13668Ecf257cC15c381b461B9fEDaB5D451c8F7F). Data tied to the user's address can support any external logic: KYC confirmation from a special address belonging to an external KYC provider, the user's nickname, etc. Storing a lot of data in the blockchain is expensive and pointless, so it makes sense to store only the necessary minimum. In our case it is just a bool, but usually a struct with a nickname, user roles, etc. is used.

To register an address in a smart contract, the user independently calls the register() function from his address. The main problem of such systems is that the user will have to pay for the transaction in the ETH cryptocurrency in order to register, and although there is little data and the commission is small, it is still non-zero. Therefore, onboarding users in blockchain projects is a separate pain point.

On the bright side, the user cannot be hacked by someone gaining access to the site: in blockchains everyone is responsible for their own security. In addition, registration updates are extremely rare. Having paid once, the user can use his account as much as he wants, on multiple sites, and this information will never be lost.

Authorization

To log in to the site:

the user provides the address for authorization
the site makes sure that this address is "registered"
the site prompts the user to sign a one-time string (a “challenge”)
the user signs the challenge and provides the site with a digital signature
the site verifies the signature and in case of success issues a regular JWT authorization token for the user
from this point everything is the same as in any regular web service: the user is authorized

Ethereum users must have software that signs transactions. Therefore, if the user has previously registered or used Ethereum in the browser, he already knows how to sign data with the secret key. For example, the MetaMask browser extension does this in one click: signing a challenge works the same way as sending cryptocurrency. When the site calls the web3.js functions that request a signature, a MetaMask notification pops up asking the user to sign the string. It looks like this in the browser:

The string to be signed is provided and temporarily saved by the backend. This prevents replay attacks in which a hacker intercepts the signed message and tries to resend it. The challenge and the signature are then sent to the backend. After verifying the signature, the backend removes the one-time challenge and issues an authorization JWT token for the user. Then everything works as usual. A piece of code that implements such a scheme in Django is shown below (this is a standard view using a POST request):
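As a framework-neutral illustration of the challenge lifecycle described above (not the Django view itself), a minimal sketch could look like this. The class name ChallengeLogin, the callbacks recover_address and is_registered, the challenge text and the five-minute lifetime are all assumptions made for the example. In particular, recover_address stands in for whatever library call recovers the signer's address from a signature, and is_registered stands in for a read from the contract.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time


class ChallengeLogin:
    """Issues one-time challenges and exchanges a valid signature for a JWT."""

    def __init__(self, recover_address, is_registered, jwt_secret, ttl_seconds=300):
        self._recover_address = recover_address  # (challenge, signature) -> address
        self._is_registered = is_registered      # address -> bool
        self._jwt_secret = jwt_secret             # bytes, known only to the backend
        self._ttl = ttl_seconds
        self._pending = {}                        # address -> (challenge, expires_at)

    def issue_challenge(self, address):
        address = address.lower()
        if not self._is_registered(address):
            raise PermissionError("address is not registered")
        # A fresh random nonce makes every challenge unique
        challenge = "Sign in, nonce " + secrets.token_hex(16)
        self._pending[address] = (challenge, time.time() + self._ttl)
        return challenge

    def login(self, address, signature):
        address = address.lower()
        # pop() removes the challenge before verification, so a replayed
        # signature finds nothing to match against
        challenge, expires_at = self._pending.pop(address, (None, 0.0))
        if challenge is None or time.time() > expires_at:
            raise PermissionError("no valid challenge for this address")
        if self._recover_address(challenge, signature).lower() != address:
            raise PermissionError("signature does not match the address")
        return self._make_jwt({"sub": address, "exp": int(time.time()) + 3600})

    def _make_jwt(self, claims):
        """Build a compact HS256 JWT: base64url(header).base64url(payload).signature."""
        def b64(raw):
            return base64.urlsafe_b64encode(raw).rstrip(b"=")

        header = b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
        payload = b64(json.dumps(claims).encode())
        signing_input = header + b"." + payload
        mac = hmac.new(self._jwt_secret, signing_input, hashlib.sha256).digest()
        return (signing_input + b"." + b64(mac)).decode()
```

Because the pending challenge is the only state kept on the server, a leak of this store gives an attacker nothing that could be used to sign in later.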

The code is rather simple, and the project database holds no security-critical information except for temporary challenges. Even if the SQL database leaks from the backend, user authorization is not threatened. Also, authentication can be carried out using completely different blockchains, contracts and signing software.

Do you need a blockchain?

An attentive reader may have noticed that the blockchain itself is not necessary in the scheme above: it is enough to be able to sign the data with a private key on the client side. Private key authorization is being actively developed; see, for example, https://www.w3.org/TR/webauthn-2/. We may soon see these schemes in our browsers without any blockchains. The scheme is very convenient because the service no longer needs to be responsible for user accounts or implement password recovery scenarios. In Web 3.0 these old mechanisms carry unnecessary security risks. It is much easier to let users take care of key security themselves. Those who want strong protection will use hardware keys or external signers (such as Parity Signer); for those who store passwords in a text file on the desktop, nothing will change.

However, the most common software that uses electronic signatures is cryptocurrency "wallets". They are perfect as an authentication device: it is easy to support dozens of wallet types (the site doesn’t even know what software generated the signature). Also, users are not required to hold any cryptocurrency for authorization; they use the “wallet” only to manage keys and addresses.

Identity + smart contracts

Let’s get back to blockchains. A public blockchain with smart contracts is a convenient and viable cloud for security-critical information and makes it possible to implement more complex schemes. For example, your service may require that the user's identity be confirmed by an external KYC provider, or there may have to be an always-available master public key for distributing software updates across the network (relevant for IoT). In this case, a smart contract can easily hold administrative accounts with different roles that set various attributes on user addresses, such as “passed KYC” or “COVID vaccinated”. These actions should be protected by multi-signatures so that hacking a single account is not enough to attack the whole system. Using a smart contract means that such an “API” can be reached from anywhere without authorization or extra steps, because protection is ensured by the private key. A smart contract identity system requires almost no work to secure data storage, and protecting its infrastructure is trivial. From the architectural point of view, smart contract based systems are not more complicated but much simpler than traditional ones, which means they are more reliable.
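To make the multi-signature idea concrete, here is a small in-memory model of the approval logic, written in Python rather than as a contract. The class name AttributeRegistry, the approve method and the attribute strings are hypothetical; in a real contract the sender argument would be msg.sender, which the network authenticates through the transaction signature.

```python
class AttributeRegistry:
    """In-memory model of an identity contract with M-of-N admin approval."""

    def __init__(self, admins, threshold):
        self.admins = frozenset(admins)
        if not 1 <= threshold <= len(self.admins):
            raise ValueError("threshold must be between 1 and the number of admins")
        self.threshold = threshold
        self.attributes = {}   # user address -> set of attribute names
        self._approvals = {}   # (user, attribute) -> set of approving admins

    def approve(self, sender, user, attribute):
        """Record one admin's approval; apply the change once the threshold is met.

        Returns True when this call caused the attribute to be set.
        """
        if sender not in self.admins:
            raise PermissionError("only admins may approve")
        voters = self._approvals.setdefault((user, attribute), set())
        voters.add(sender)  # a set ignores repeated approvals from the same admin
        if len(voters) < self.threshold:
            return False
        self.attributes.setdefault(user, set()).add(attribute)
        del self._approvals[(user, attribute)]
        return True
```

With a threshold of 2 out of 3 admins, a single compromised admin key can record an approval but cannot set “passed KYC” on its own. Each call touches only one user and one attribute, so the work per call does not grow with the number of registered users.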

Such functionality, of course, is not free. When implementing such a system, you must always remember that there are no free transactions in public blockchains, which means that important administrative actions in the identity system will require extra fees. Also, the specifics of smart contracts rule out functions that work with a large number of entities at once: all functions should complete in O(1), any loops should have a fixed upper bound, and the amount of stored data should be minimal.

Synchronization with blockchain

These restrictions make it harder to solve tasks such as “show a list of users who are awaiting identity confirmation” or “display all accounts with a nickname starting with the string ‘alex…’”. Any reasonably complex blockchain project must have a backend that stores and displays data that cannot be obtained directly from a blockchain node. This can be a search engine for blockchain entities or some kind of large off-chain data set (cryptographic hashes and various aggregated data are kept in smart contracts, while the data itself is stored on the backend). To organize such a backend and synchronize it with the blockchain, you should employ a synchronization service that rolls updates from the blockchain into an SQL database, aggregating the data and creating the necessary indexes along the way. This service usually uses a subscription model, i.e. it subscribes to the contract events of interest. Every time the contract storage is updated (a new registered address appears, or the data of a certain account changes), the service tracks these changes and updates its database accordingly.

In Ethereum, Substrate (Polkadot) and many other blockchains, events are an important part of contracts (see the UserRegistered event in the example above). Very often there is no need to keep information about events on the node permanently. It is easy to "reproduce" it by walking the blocks one by one and filling the backend database with the gathered data. This saves memory and node resources and avoids paying for space in the blockchain state database, so users pay much lower fees. In the example above, the synchronization service only needs to subscribe to contract events and update the list of user addresses whenever it encounters a UserRegistered event. Events are recorded in the blockchain bit for bit and cannot be changed later, so they can be trusted. Also, emitting an event is several times cheaper than writing values to the blockchain state database, and the example above could technically be done without any storage operations at all, simply by recording each registration as an event.
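A rough sketch of such a synchronization service follows, with an in-memory index standing in for the SQL database. It replays events in block order and answers the prefix query mentioned earlier. The Event and UserIndex names are hypothetical, and the nickname field is an assumption: the UserRegistered event in the contract above carries only the address.

```python
from bisect import bisect_left, insort
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    block: int          # number of the block that contains the event
    name: str           # event name, e.g. "UserRegistered"
    user: str           # address taken from the event
    nickname: str = ""  # assumed extra field, not present in the contract above


class UserIndex:
    """Off-chain index rebuilt purely from contract events."""

    def __init__(self):
        self.users = {}        # address -> nickname
        self._nicknames = []   # sorted (nickname, address) pairs for prefix search
        self.last_block = -1   # highest block applied so far

    def apply(self, event):
        if event.block < self.last_block:
            raise ValueError("events must be applied in block order")
        # Replaying an event that was already seen must not create duplicates
        if event.name == "UserRegistered" and event.user not in self.users:
            self.users[event.user] = event.nickname
            insort(self._nicknames, (event.nickname, event.user))
        self.last_block = event.block

    def search(self, prefix):
        """Return addresses whose nickname starts with the given prefix."""
        start = bisect_left(self._nicknames, (prefix, ""))
        result = []
        for nickname, address in self._nicknames[start:]:
            if not nickname.startswith(prefix):
                break
            result.append(address)
        return result
```

Since the index is derived entirely from events, it can be dropped and rebuilt at any time by replaying the chain from the block in which the contract was deployed.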

Block finalization is an extremely important factor in determining which events have actually been recorded in the blockchain. For Ethereum 1.0, which is based on proof-of-work consensus, “reliable” events are those that happened at least a few blocks ago, when it is very unlikely that the chain will be replaced by another, computationally “stronger” one in which a different transaction history "wins". In blockchains with proof-of-stake and proof-of-authority consensus, “double spend” attacks can be carried out without huge computing capacity, and chain reorganization can be caused by a network split. Therefore, you should be extremely careful about what counts as transaction finality for a given network. Most modern blockchains with proof-of-stake and proof-of-authority algorithms use deterministic finality, i.e. a finalized block will never be rolled back. Finalization usually lags behind block production (a block is produced by one computer and finalized by several), but once you have received a finalized block, you can be sure that the events and values in it are stored forever and can change only in the event of a serious logic failure in the blockchain nodes.
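In a synchronization service this usually comes down to a filter that decides which events are safe to write to the database. The sketch below assumes events are plain dictionaries with a "block" key. The function name settled_events and the default of 12 confirmations are illustrative choices, not values prescribed by any network.

```python
def settled_events(events, head_block, confirmations=12, finalized_block=None):
    """Return the events that are old enough to be treated as irreversible.

    On a network with deterministic finality, pass finalized_block and only
    events at or below it are returned. Otherwise an event is accepted once
    at least `confirmations` blocks have been built on top of its block.
    """
    if finalized_block is not None:
        limit = finalized_block
    else:
        limit = head_block - confirmations
    return [event for event in events if event["block"] <= limit]
```

Events above the limit are not lost: they are simply picked up on a later pass, once the chain head or the finalized block has advanced far enough.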

Side notes

In my opinion, identity protocols together with public blockchains will affect IT infrastructure more than cryptocurrencies will. Moving security-critical operations to the blockchain and managing public keys there is highly important. Such operations occur in electronic document flow, in IoT device access to cloud providers, in the issuance of verified credentials, and in any other area that involves a digital right of ownership over certain information. Public blockchains eliminate the need to support various types of APIs or to control access to servers, just as electric cars are free from problems with ignition, fuel and air supply.

Currently, in DeFi you can access hundreds of different projects with a single secret phrase, keep any number of “accounts” if you want, and “log in” to each project with a single click on the site, without the risk of losing your account because of an error on the site. It’s definitely the future of identity services: 100% owned by the user, with fewer security risks, and easier to implement. Self-Sovereign Identity with smart contracts and public blockchains is a very simple and reliable way to organize account infrastructure for any web project, making users alone responsible for their accounts and making web projects more decentralized.