Improve Smart Contract Security by Formal Proofs

Photo by Roman Mager on Unsplash

Vulnerabilities in smart contracts are threatening blockchain projects, developers and investors for a long time. A growing number of security teams are putting efforts in this field with various approaches to secure contracts. SECBIT Labs proposes combining the formal proofs with the traditional test and security audit. In this article, we take the ERC20 formal proof in our GitHub repository tokenlibs-with-proofs as an example to show the use of formal proofs on smart contracts. We hope the formal proofs can help to eliminate buggy contracts, and secure all aspects of smart contracts including design logic, implementation, economic system, etc.

What is Formal Proof

Formal proof is a concept under formal verification, which is defined on Wikipedia as

In the context of hardware and software systems, formal verification is the act of proving or disproving the correctness of intended algorithms underlying a system with respect to a certain formal specification or property, using formal methods of mathematics.

The verification of these systems is done by providing a formal proof on an abstract mathematical model of the system, the correspondence between the mathematical model and the nature of the system being otherwise known by construction. Examples of mathematical objects often used to model systems are: finite state machines, labelled transition systems, Petri nets, vector addition systems, timed automata, hybrid automata, process algebra, formal semantics of programming languages such as operational semantics, denotational semantics, axiomatic semantics and Hoare logic.

Essentially, formal verification is to prove properties of a mathematical model. In formal proofs of smart contracts, the model is the smart contracts and the way they are running, and properties are those desired good behaviors of smart contracts, such as no integer overflow and underflow can ever happen.

In detail, we formally model the smart contracts by the program logic, prove they agree with the given contract specification, and prove the specification can further guarantee certain high-level properties. All proofs are provided as explicit proof objects whose correctness can be checked independently of the proving process.

When talking about any formal proofs, we must keep following prerequisites in mind:

All proofs are under certain assumptions;
The correctness of proofs relies on the formal logic theory;
The final proof result and theorem indeed define the correctness and security of smart contracts.

In formal proofs of ERC20 contracts, the first point requires the correctness of fundamental definitions; the second demands a valid proof, which can be constructed and checked by a popular proving assistant Coq; the last one relies on that we state common ERC20 properties correctly.

Note: the above three points are prerequisites and the trusted computing base (TCB) of our formal proofs on smart contracts. Every security system depends on certain TCB. The smaller and the stronger the TCB is, the safer the system is.

Why Formal Proofs for Smart Contracts

Ethereum improved and enhanced smart contracts by introducing a Turing-complete smart contract language, which brings significant flexibility and expressiveness as much as the traditional programming languages. Inevitably, bugs and security issues are introduced as well, causing tremendous financial losses.

SECBIT Labs has performed several rounds of scans and analyses on all deployed token contracts on Ethereum, and recorded a total of 4,172 contracts with vulnerabilities and security risks. 101 of them are also recorded by CoinMarketCap, and a full list can be found in repo awesome-buggy-erc20-tokens by SECBIT.

After auditing numerous smart contracts, SECBIT Labs has realized that the security issues in smart contracts are much more complex and critical than the traditional software:

The trust of smart contracts comes from their immutability, i.e., once deployed no one can modify the code easily. However, it also implies that, if not prepared in advance, it’s difficult to fix bugs easily and quickly, especially when being attacked. Such fragility definitely harms the trust and value of token economy and blockchain projects.
Implementations of many deployed smart contracts are open sourced. Though it can improve the transparency and trust of contracts, it also reduces the cost for attackers to find bugs.
The design of smart contract language still has lots of limitations and flaws, making it easier to introduce bugs and security risks. A lack of experienced developers further aggravated the issue.

At present, there are two main approaches to address smart contract security issues: test and audit. Though they are necessary for avoiding common vulnerabilities, their capabilities remain limited.

Test

The security team can generate test cases manually or by automatic test utilities and use them to examine the input combinations and the execution paths of contracts as many as possible. However, it’s difficult to guarantee 100% coverage, so no issue discovered in tests does not imply bug-free contracts.

Audit

Professional security auditors inspect the implementation and the logic of contracts, and hunt for bugs and risks. The audit mostly relies on the subjective experience of auditors, so the scope and the effectiveness of audit are difficult to measure and evaluate.

Formal proof describes the programs by mathematical models, and proves their correctness and security by mathematical proofs. It is based on the strict and rigorous mathematics and logic, and therefore can cover all program behavior and strictly guarantee the correctness and security which are precisely defined. Formal proof has been applied successfully in critical areas including astronautics, high-speed railway, nuclear power and aeronautics.

Because of the fragility of smart contracts and the large amount of assets managed by them, smart contracts demand extreme security. In addition, some high-level and complex properties about economics and game theories cannot be validated easily by test and audit. Thus, formal proof is no doubt an efficient way to secure those small yet complex smart contracts.

How to Formally Prove Smart Contracts

Proving Structure

The formal proof of a smart contract involves four parts: source code, specifications, properties, and proof.

Structure of Formal Proof

Source Code: is the proving object.
Specifications: defines the expected behavior of every contract function.
Properties: defines the desired or good properties when the contract works as a whole, such as the immutability of the total amount of tokens.
Proof: reasons the contract does implement the specification and guarantee the properties.

Properties Proved

Our proof on ERC20 contract primarily proves the following properties of the contract:

nat_nooverflow_dsl_nooverflow: no harmful overflow in transfer() and transferFrom()

Lemma nat_nooverflow_dsl_nooverflow: forall (m: state -> a2v) st env msg, (_from = _to / (_from <> _to /\ (m st _to <= MAX_UINT256 - _value)))%nat -> ((from == to) || ((fun st env msg => m st (to st env msg)) <= max_uint256 - value))%dsl st env msg = otrue.
Property_totalSupply_equal_to_sum_balances: totalSupply equals the sum of all balances after executing any steps in contracts

(* Prop #1: total supply is equal to sum of balances *) Theorem Property_totalSupply_equal_to_sum_balances : forall env0 env msg ml C E C' E', create env0 msg C E -> env_step env0 env -> run env C ml C' E' -> Sum (st_balances (w_st C')) (st_totalSupply (w_st C')).
Property_totalSupply_fixed_transfer : Token sum remains unchanged after executing transfer()

(* Prop #2: total supply is fixed with transfer *) Theorem Property_totalSupply_fixed_transfer: forall env C C' E' msg to v spec preP evP postP, spec = funcspec_transfer to v (w_a C) env msg -> preP = spec_require spec -> evP = spec_events spec -> postP = spec_trans spec -> preP (w_st C) /\ evP (w_st C) E' /\ postP (w_st C) (w_st C') -> (st_totalSupply (w_st C)) = (st_totalSupply (w_st C')).
Property_totalSupply_fixed_after_initialization: Token sum remains unchanged after initialization

(* Prop #3: total supply is fixed after initialization *) Theorem Property_totalSupply_fixed_after_initialization: forall env0 env msg C E C' E', create env0 msg C E -> step env C msg C' E' -> (st_totalSupply (w_st C)) = (st_totalSupply (w_st C')).
Property_totalSupply_fixed_delegate_transfer: Token sum remains unchanged after executing transferFrom()

(* Prop #4: total supply is fixed with delegate transfer *) Theorem Property_totalSupply_fixed_delegate_transfer1: forall env C C' E' from msg to v spec, spec = funcspec_transferFrom_1 from to v (w_a C) env msg -> (spec_require spec) (w_st C) /\ (spec_events spec) (w_st C) E' /\ (spec_trans spec) (w_st C) (w_st C') -> (st_totalSupply (w_st C)) = (st_totalSupply (w_st C')).

Theorem Property_totalSupply_fixed_delegate_transfer2: forall env C C' E' from msg to v spec, spec = funcspec_transferFrom_2 from to v (w_a C) env msg -> (spec_require spec) (w_st C) /\ (spec_events spec) (w_st C) E' /\ (spec_trans spec) (w_st C) (w_st C') -> (st_totalSupply (w_st C)) = (st_totalSupply (w_st C')).
Property_from_to_balances_change: Transferring only affects balances of accounts involved, other accounts remain unchanged

(* Prop #5: only balances of from and to changed by transfer*) Theorem Property_from_to_balances_change_only: forall env C C' E' to addr msg v spec, spec = funcspec_transfer to v (w_a C) env msg -> (spec_require spec) (w_st C) /\ (spec_events spec) (w_st C) E' /\ (spec_trans spec) (w_st C) (w_st C') -> m_sender msg <> to -> m_sender msg <> addr -> to <> addr -> (st_balances (w_st C') to = (st_balances (w_st C) to) + v) /\ (st_balances (w_st C') (m_sender msg) = (st_balances (w_st C) (m_sender msg)) - v) /\ st_balances (w_st C') addr = st_balances (w_st C) addr.

Proving Process

The proved contract is a standard ERC20 contract, including six public functions: constructor, transfer(), transferFrom(), balanceOf(), approve() and allowance(). A brief proving process is shown as below.

1. Define Specification

A specification language is a formal language in computer science used during systems analysis, requirements analysis and systems design to describe a system at a much higher level than a programming language, which is used to produce the executable code for a system.

Spec.v defines the specification of the six public functions in the contract.

The execution of every function may involve multiple cases and behave differently in each case. We define one specification rule for one case, so a specification usually is composed of one or multiple rules. Take the specification of transferFrom() as an example: it consists of two rules- funspec_transferFrom_1 and funcspec_transferFrom_2, corresponding to two cases that the approved quota is smaller than 2**256-1 or equal to 2**256-1.

Using funcspec_transferFrom_1 as an example, every specification rule is composed of following parts.

spec_require specifies prerequisites that must hold before executing transferFrom in the first case.

Definition funcspec_transferFrom_1 (from: address) (to: address) (value: value) := fun (this: address) (env: env) (msg: message) => (mk_spec (fun S : state => (* require(balances[_from] >= _value); ) st_balances S from >= value /\ ( require(_from == _to || balances[_to] <= MAX_UINT256 - _value); ) ((from = to) / (from <> to /\ st_balances S to <= MAX_UINT256 - value)) /\ ( require(allowance >= _value); ) st_allowed S (from, m_sender msg) >= value /\ ( allowance < MAX_UINT256 *) st_allowed S (from, m_sender msg) < MAX_UINT256 ) ...
spec_events specifies events that must have occurred after successfully executing the function:

Definition funcspec_transferFrom_1 (from: address) (to: address) (value: value) := fun (this: address) (env: env) (msg: message) => (mk_spec (* require omitted *)
```
  (* emit Transfer(_from, _to, _value); *)      (* return True; *)      (fun S E => E = (ev_Transfer (m_sender msg) from to value) :: (ev_return _ True) :: nil)
```

where,

ev_Transfer represents the event Transfer;
ev_return is a pseudo event in specification to represent the function return and the return value.

spec_trans specifies the state transition in a successful function execution, e.g. changes in storage variables:

Definition funcspec_transferFrom_1 (from: address) (to: address) (value: value) := fun (this: address) (env: env) (msg: message) => (mk_spec (* require omitted ) ( events omitted *)

  (* State transition: *)      (fun S S' : state =>      (* Unchanged. *)        st_totalSupply S' = st_totalSupply S /\        st_name S' = st_name S /\        st_decimals S' = st_decimals S /\        st_symbol S' = st_symbol S /\      (* balances[_from] -= _value; *)        st_balances S' = (st_balances S) $+{ from <- -= value }      (* balances[_to] += _value; *)                        $+{ to <- += value } /\      (* allowed[_from][msg.sender] -= _value; *)        st_allowed S' = (st_allowed S) $+{ from, m_sender msg <- -= value }      )

which states that, corresponding entries in balances and allowed must be changed as expected, while the other storage variables are not changed.

Please refer to Spec.v for the complete definitions of all specification rules.

2. Prove Against Specifications

Before proving high-level properties upon specifications defined above, we need to ensure that Solidity source code does implement the specifications. First we represent the Solidity contract code in Coq. A domain specification language (DSL) is defined in Coq for this purpose. For example, the ERC20 function transfer() can be represented as below in DSL.v:

(* DSL representation of transfer(), generated from solidity *)   Definition transfer_dsl : Stmt :=    (@require(balances[msg.sender] >= value) ;    @require((msg.sender == to) || (balances[to] <= max_uint256 - value)) ;    @balances[msg.sender] -= value ;    @balances[to] += value ;    (@emit Transfer(msg.sender, to, value)) ;    (@return true)    ).

Then we prove that every function in DSL agrees with specifications, e.g. we prove that transfer() agrees with funcspec_transfer by the following lemma in DSL.v:

Lemma transfer_dsl_sat_spec:  forall st env msg this,    spec_require (funcspec_transfer _to _value this env msg) st ->    forall st0 result,      dsl_exec transfer_dsl st0 st env msg this nil = result ->        spec_trans (funcspec_transfer _to _value this env msg) st (ret_st result) /\        spec_events (funcspec_transfer _to _value this env msg) (ret_st result) (ret_evts result).

Please refer to DSL.v for the complete DSL definitions, DSL representations, and the specification proofs.

3. Define and Prove High-Level Properties

Specification above define the expected behavior of each execution of one function, while high-level properties define the expected behavior presented by the contract as a whole when arbitrary message calls are received. For example, the following property defined in DSL.v states the property that no token will lose regardless of which message calls are handled by the contract:

(* Prop #1: total supply is equal to sum of balances *)  Theorem Property_totalSupply_equal_to_sum_balances :   forall env0 env msg ml C E C' E',    create env0 msg C E    -> env_step env0 env    -> run env C ml C' E'    -> Sum (st_balances (w_st C')) (st_totalSupply (w_st C')).

Please refer to Spec.v for the complete definitions and proofs of all high-level properties.

Future Plans

We created the project tokenlibs-with-proofs to collect and publish our formal proofs of various token contracts.

We are going to prove more common properties to help the community to avoid vulnerabilities and risks.
We are going to prove more token contracts with more features, e.g. freezing, upgrading, authority manipulating, adding ownership.
We are researching on proving high-level properties with respect to game theories, e.g. fairness, optimistic strategy, Nash equilibrium.

We hope this project can be a helpful resource and reference for demonstrating the use of formal proofs in smart contracts. We also hope more smart contract developers and security experts can join to the research and the application of improving smart contract security by formal proofs. If you have any questions, suggestions and ideas, welcome to discuss in our Gitter channel.

Special thanks to Yi Tang from ConsenSys China who discussed the topic with us and provided feedback and comment.

Related Work

SECBIT Labs: awesome-buggy-erc20-tokens. https://github.com/sec-bit/awesome-buggy-erc20-tokens
ConsenSys: Tokens. https://github.com/ConsenSys/Tokens
ConsenSys: Smart Contract Security Best Practices. https://github.com/ConsenSys/smart-contract-best-practices
Michael Burge: DSLs for Ethereum Contracts. https://www.michaelburge.us/2018/05/15/ethereum-chess-engine.html
C. A. R. Hoare. An axiomatic basis for computer programming. Communications of the ACM, 26(1):53–56, Jan. 1983.
L. Lamport. Proving the correctness of multiprocess programs. IEEE Transactions on Software Engineering SE-3, 2 (March 1977), 125–143.
G. Necula. Proof-carrying code. In Proc.24th ACM symposium on Principles of programming languages (POPL’97). pages 106–119, New York, Jan. 1997.
Inria: The Coq Proof Assistant. https://coq.inria.fr/
Nielson, F., Nielson, H.R., Hankin, C.: Principles of program analysis. Springer (2015)
Acta Scientiarum Naturalium Universitatis Pekinensis: Logic Based Formal Verification Methods: Progress and Applications http://xbna.pku.edu.cn/html/2016-2-363.htm

SECBIT was founded by a group of cryptocurrency-enthusiasts. We are doing research on smart contract security, smart contract formal verification, crypto-protocols, compilation, contract analysis, game theory and crypto-economics.