The Best Method for Bulk Fetching ERC20 Token Balances

Written by martinkirov | Published 2023/06/08
Tech Story Tags: ethereum | blockchain-data | blockchain | web3 | blockchain-technology | data | erc20 | optimization

Fetching token balances for one or more addresses (at a specific point in time) is a super common task, regardless of what it is that you’re building.

Despite being such a common task, when I first started working with Ethereum data I found it surprisingly difficult to figure out how to do this. It isn’t exactly obvious from just scrolling through the list of available JSON-RPC endpoints.

Once you do figure it out, the other issue (even less obvious) is working out how to do this for multiple addresses across different time periods in a reasonable amount of time, without burning through all your API credits.

In this article, I am going to save you the pain I had to go through figuring out the best way to fetch token balances. I’ll go over the different methods available (and their issues), starting with the “naive approach” and ending with the best approach.

Overview:

  • The naive approach
  • Batching JSON-RPC requests
  • Using a multicall contract
  • Final remarks (link to code)

The naive approach

The naive approach is to make one HTTP request per balance using the “eth_call” JSON-RPC endpoint.

Here’s the code:

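import requests
from multicall import Call, Signature
from web3 import Web3


def fetch_token_balance_naive(wallet_address, token_address, block_number, node_provider_url, api_key):
    # Signature format: function name, input types, then output types.
    balanceof_function = "balanceOf(address)(uint256)"
    balanceof_signature = Signature(balanceof_function)
    # eth_call expects the block number as a hex string.
    # (web3.py v6+ renames toHex/toBytes to to_hex/to_bytes.)
    block_number_hex = Web3.toHex(primitive=block_number)
    # Function selector followed by the ABI-encoded wallet address.
    data = balanceof_signature.encode_data([wallet_address]).hex()
    payload = {
        "jsonrpc": "2.0",
        "method": "eth_call",
        "params": [
            {
                "to": token_address,
                "data": "0x" + data,
            },
            block_number_hex,
        ],
        "id": 1,
    }
    headers = {"Content-Type": "application/json", "Accept-Encoding": "gzip"}
    url = f"{node_provider_url}/{api_key}"
    res = requests.post(url, headers=headers, json=payload)
    res.raise_for_status()
    res_data = res.json()
    if "error" in res_data:
        raise RuntimeError(f"eth_call failed: {res_data['error']}")
    balance_encoded_hex = res_data["result"]
    balance_encoded_bytes = Web3.toBytes(hexstr=balance_encoded_hex)
    # Raw integer balance in the token's smallest unit (not adjusted for decimals).
    balance_decoded = Call.decode_output(balance_encoded_bytes, balanceof_signature, returns=None)
    return balance_decoded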

A few things are not so obvious if this is your first time using “eth_call”, so let’s quickly go over them.

  • “eth_call” is the JSON-RPC endpoint used to call smart contract functions.

  • “balanceOf” is a function that all ERC20 tokens have. You can call this function to return how much an address holds of the token in question.

  • To use “eth_call” you need to specify two parameters: “to” and “data”. The “to” parameter is the address of the smart contract you want to call a function from. In the “data” parameter you specify the function you want to call and its inputs.

  • This isn’t super user-friendly: the “data” parameter requires you to encode the function name and inputs into a hexadecimal string.

  • Similarly, the output of our call has to be decoded. To make life easier, I use the Call and Signature classes of the multicall package (see https://github.com/banteg/multicall.py) to help me with encoding and decoding.
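To make the encoding and decoding steps less mysterious, here is a minimal sketch of what happens for “balanceOf” without any helper library. It assumes the standard ABI rules: the first 4 bytes of the keccak256 hash of "balanceOf(address)" (the well-known selector 70a08231), followed by the address left-padded to 32 bytes. The function names are my own and are only for illustration.

```python
# Selector for balanceOf(address): first 4 bytes of keccak256("balanceOf(address)").
BALANCE_OF_SELECTOR = "70a08231"


def encode_balance_of(wallet_address: str) -> str:
    """Build the eth_call "data" field for balanceOf(wallet_address)."""
    address_hex = wallet_address.lower()
    if address_hex.startswith("0x"):
        address_hex = address_hex[2:]
    if len(address_hex) != 40:
        raise ValueError("expected a 20-byte hex address")
    # ABI encoding left-pads the 20-byte address to a 32-byte word.
    return "0x" + BALANCE_OF_SELECTOR + address_hex.rjust(64, "0")


def decode_uint256(result_hex: str) -> int:
    """Decode the 32-byte hex string returned by eth_call into an integer."""
    if result_hex in ("", "0x"):
        raise ValueError("empty result (is the target address a contract?)")
    return int(result_hex, 16)


# Hypothetical wallet address, used only to show the shape of the output.
example_wallet = "0x" + "ab" * 20
example_data = encode_balance_of(example_wallet)
example_balance = decode_uint256("0x" + "0" * 62 + "2a")
```

In practice the Call and Signature classes do this for you (and handle every other type), which is why the article's code uses them.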

The problem with the naive approach is that it’s super slow and expensive (in terms of API credits) if you need to fetch balances for multiple addresses, blocks, and/or tokens. For each combination of block, address, and token you need to perform a separate request.
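To get a feel for how quickly this adds up, here is a small worked example. The workload sizes are made up purely for illustration.

```python
from itertools import product

# Hypothetical workload; the sizes are illustrative only.
wallets = [f"wallet_{i}" for i in range(10)]
tokens = [f"token_{i}" for i in range(5)]
blocks = list(range(17_000_000, 17_000_030))

# The naive approach issues one eth_call per (block, wallet, token) combination.
naive_requests = list(product(blocks, wallets, tokens))
print(len(naive_requests))  # 30 * 10 * 5 = 1500 separate HTTP requests
```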

Batching requests

Batching requests alleviates some of the problems of the naive approach if you need to fetch balances for multiple blocks, addresses, and/or tokens.

In particular, it helps speed things up significantly. Instead of making multiple separate requests, batching enables you to send them all in a single HTTP request. The code for batching “eth_call” requests is as follows:

# Requires: requests, web3 (Web3) and multicall (Call, Signature).
def fetch_token_balance_batch(wallet_addresses, token_addresses, block_numbers, node_provider_url, api_key):
    balanceof_function = "balanceOf(address)(uint256)"
    balanceof_signature = Signature(balanceof_function)
    payload_list = []
    # The three input lists are parallel: item i of each list describes one balance lookup.
    for i, (wallet_address, token_address, block_number) in enumerate(
        zip(
            wallet_addresses,
            token_addresses,
            block_numbers,
        )
    ):
        block_number_hex = Web3.toHex(primitive=block_number)
        data = balanceof_signature.encode_data([wallet_address]).hex()
        payload = {
            "jsonrpc": "2.0",
            "method": "eth_call",
            "params": [
                {
                    "to": token_address,
                    "data": "0x" + data,
                },
                block_number_hex,
            ],
            # The id lets us match each response to its request.
            "id": i + 1,
        }
        payload_list.append(payload)
    headers = {"Content-Type": "application/json", "Accept-Encoding": "gzip"}
    url = f"{node_provider_url}/{api_key}"
    # Posting a list of payloads makes this a JSON-RPC batch request.
    res = requests.post(url, headers=headers, json=payload_list)
    res.raise_for_status()
    # JSON-RPC does not guarantee that batch responses keep the request order, so sort by id.
    res_data_list = sorted(res.json(), key=lambda res_data: res_data["id"])
    balances = []
    for res_data in res_data_list:
        balance_encoded_hex = res_data["result"]
        balance_encoded_bytes = Web3.toBytes(hexstr=balance_encoded_hex)
        balance_decoded = Call.decode_output(balance_encoded_bytes, balanceof_signature, returns=None)
        balances.append(balance_decoded)
    return balances

Some things to keep in mind:

  • You can’t batch an unlimited number of requests. You’re limited by the maximum response size, and maximum number of requests per second allowed by your plan. (This depends on the node provider you use.)
  • Despite being much faster than the naive approach, batching still uses the same amount of API credits, because each call in the batch is typically still counted separately. Depending on your use case this can be cost-prohibitive.
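One way to live with the batch size limit is to split the payload list into smaller batches and match each response to its request by “id”. The sketch below shows the idea with two small helpers. The helper names and the batch size are my own assumptions; the right size depends on your node provider.

```python
def chunked(payload_list, max_batch_size):
    """Split a list of JSON-RPC payloads into batches of at most max_batch_size."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return [
        payload_list[start:start + max_batch_size]
        for start in range(0, len(payload_list), max_batch_size)
    ]


def results_in_request_order(payload_batch, response_batch):
    """Match responses to requests by "id", since batch responses may be reordered."""
    by_id = {response["id"]: response for response in response_batch}
    # A response without "result" (for example an error) becomes None.
    return [by_id[payload["id"]].get("result") for payload in payload_batch]


# Stand-in payloads and responses; a real batch would hold full eth_call payloads.
payloads = [{"id": i} for i in range(1, 6)]
batches = chunked(payloads, max_batch_size=2)
shuffled_responses = [{"id": 2, "result": "0x02"}, {"id": 1, "result": "0x01"}]
ordered_results = results_in_request_order(batches[0], shuffled_responses)
```

Each batch would then be posted as its own request, ideally with a short pause in between to stay under the requests-per-second limit.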

Using a multicall contract

Multicall contracts are smart contracts that allow multiple function calls to be bundled together and executed as a single function call.

Similar to batching requests, using a multicall significantly speeds up bulk fetching balances. The other benefit: It’s a lot more cost efficient. Instead of being charged for each separate “eth_call” request, you’ll only be charged for a single request.

The code that uses the multicall contract is a bit long. To make it more readable I have broken the code up into two functions: the main function fetch_token_balance_multicall and the helper function create_multicall_payload_list.

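# Requires: requests, web3 (Web3), multicall (Call, Signature) and defaultdict.
from collections import defaultdict


def fetch_token_balance_multicall(wallet_addresses, token_addresses, block_numbers, node_provider_url, api_key):
    # Group the (token, wallet) pairs by block: we send one multicall per block.
    block_map = defaultdict(list)
    for block_number, token_address, wallet_address in zip(block_numbers, token_addresses, wallet_addresses):
        block_map[block_number].append((token_address, wallet_address))
    aggregate_function = "tryBlockAndAggregate(bool,(address,bytes)[])(uint256,uint256,(bool,bytes)[])"
    aggregate_signature = Signature(aggregate_function)
    balanceof_function = "balanceOf(address)(uint256)"
    balanceof_signature = Signature(balanceof_function)
    # Pass the signatures by keyword so they cannot be swapped by accident.
    payload_list = create_multicall_payload_list(
        block_map,
        aggregate_signature=aggregate_signature,
        balanceof_signature=balanceof_signature,
    )
    headers = {"Content-Type": "application/json", "Accept-Encoding": "gzip"}
    url = f"{node_provider_url}/{api_key}"
    res = requests.post(url, headers=headers, json=payload_list)
    res.raise_for_status()
    # Batch responses may arrive in any order; the ids follow the block_map order.
    res_data_list = sorted(res.json(), key=lambda res_data: res_data["id"])
    # Note: balances come back grouped by block (in block_map order), not in input order.
    balances = []
    for res_data in res_data_list:
        output_hex = res_data["result"]
        output_bytes = Web3.toBytes(hexstr=output_hex)
        # Decodes to (block number, block hash as an integer, [(success, return_data), ...]).
        decoded_output = Call.decode_output(output_bytes, aggregate_signature, returns=None)
        output_pairs = decoded_output[2]
        for success, balance_encoded in output_pairs:
            if not success:
                # This balanceOf call failed; keep a placeholder so positions stay aligned.
                balances.append(None)
                continue
            balance_decoded = Call.decode_output(balance_encoded, balanceof_signature, returns=None)
            balances.append(balance_decoded)
    return balances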

The fetch_token_balance_multicall logic is very similar to what we have already seen in the previous sections. All the interesting logic is contained in create_multicall_payload_list. That being said, there is still one thing worth mentioning:

  • fetch_token_balance_multicall combines both request batching and the use of a multicall contract. The request batching was implemented to enable us to fetch historical balances across multiple blocks in a single call.
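One consequence of grouping by block is that the results come back grouped by block too, which is not necessarily the order of the inputs. Here is a minimal sketch of one way to keep track of the original positions. The helper names are hypothetical and are not part of the code in this article.

```python
from collections import defaultdict


def group_by_block(wallet_addresses, token_addresses, block_numbers):
    """Group (token, wallet) pairs by block, remembering each pair's input position."""
    block_map = defaultdict(list)
    positions = defaultdict(list)
    triples = zip(wallet_addresses, token_addresses, block_numbers)
    for index, (wallet, token, block) in enumerate(triples):
        block_map[block].append((token, wallet))
        positions[block].append(index)
    return block_map, positions


def restore_input_order(balances_by_block, positions, total):
    """Place per-block balances back at the positions of the original inputs."""
    ordered = [None] * total
    for block, balances in balances_by_block.items():
        for index, balance in zip(positions[block], balances):
            ordered[index] = balance
    return ordered


# Three lookups spread over two blocks: only two multicall requests are needed.
block_map, positions = group_by_block(
    wallet_addresses=["w1", "w2", "w3"],
    token_addresses=["t1", "t2", "t1"],
    block_numbers=[100, 200, 100],
)
# Made-up balances, keyed by block in the order the multicall would return them.
balances_by_block = {100: [10, 30], 200: [20]}
ordered_balances = restore_input_order(balances_by_block, positions, total=3)
```

The number of keys in block_map is the number of “eth_call” requests you end up sending, which is where the cost saving comes from.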

Now the interesting code:

def create_multicall_payload_list(block_map, aggregate_signature, balanceof_signature):
    multicall3_address = "0xcA11bde05977b3631167028862bE2a173976CA11"
    # Helper not shown in this article; returns the long hexadecimal string described below.
    state_override_code = load_state_override_code()
    # False: a failing balanceOf call does not make the whole multicall fail.
    require_success = False
    gas_limit = 50000000
    payload_list = []
    # One eth_call payload per block; each one bundles all balanceOf calls for that block.
    for i, block_number in enumerate(block_map.keys()):
        block_number_hex = Web3.toHex(primitive=block_number)
        call_params_list = []
        for token_address, wallet_address in block_map[block_number]:
            call_params_list.append(
                {
                    "to": token_address,
                    "data": balanceof_signature.encode_data([wallet_address]),
                },
            )
        multicall_params = [
            {
                "to": multicall3_address,
                # Encode the list of (target, call data) pairs as the multicall's input.
                "data": Web3.toHex(
                    aggregate_signature.encode_data(
                        [
                            require_success,
                            [[c["to"], c["data"]] for c in call_params_list],
                        ]
                    )
                ),
            },
            block_number_hex,
        ]
        if gas_limit:
            multicall_params[0]["gas"] = Web3.toHex(primitive=gas_limit)
        if state_override_code:
            # Optional third eth_call parameter: override the code at the multicall address.
            multicall_params.append({multicall3_address: {"code": state_override_code}})
        payload = {
            "jsonrpc": "2.0",
            "method": "eth_call",
            "params": multicall_params,
            "id": i + 1,
        }
        payload_list.append(payload)
    return payload_list

The create_multicall_payload_list function creates the payload_list for a batch JSON-RPC request. For each block we create a separate payload and append it to the list.

Each payload is an “eth_call” request. The call we are making is to the tryBlockAndAggregate(bool,(address,bytes)[])(uint256,uint256,(bool,bytes)[]) function, which requires us to provide it with the list of calls we want to aggregate into a single call.

Things to note:

  • If the number of balances you’re fetching is large you should set a high value for “gas_limit”. The value of 50000000 will almost always work.
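  • “state_override_code” is a long hexadecimal string that is passed as an extra “eth_call” parameter to override the code at the multicall address. It needs to be provided in order for us to be able to fetch historical balances, including at blocks from before the multicall contract was deployed.
  • The multicall contract I am using is Multicall3 (its address is in the code above). But it’s also possible to use other multicall contracts.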
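To show where the gas limit and the state override end up in the request, here is a minimal sketch that only builds the JSON-RPC parameters and makes no network calls. It rests on two assumptions of mine: that the override code can be obtained by asking a node for the contract's bytecode with “eth_getCode” (the article does not say how the string is produced), and that your node provider accepts the third “eth_call” parameter, which not every provider does. The function names are hypothetical.

```python
MULTICALL3_ADDRESS = "0xcA11bde05977b3631167028862bE2a173976CA11"


def build_get_code_payload(block="latest", request_id=1):
    """JSON-RPC payload asking the node for the bytecode at the multicall address."""
    return {
        "jsonrpc": "2.0",
        "method": "eth_getCode",
        "params": [MULTICALL3_ADDRESS, block],
        "id": request_id,
    }


def build_eth_call_params(call_data_hex, block_number, override_code=None, gas_limit=None):
    """Assemble eth_call params: call object, block, and an optional state override."""
    call_object = {"to": MULTICALL3_ADDRESS, "data": call_data_hex}
    if gas_limit:
        call_object["gas"] = hex(gas_limit)
    params = [call_object, hex(block_number)]
    if override_code:
        # Third parameter: treat this bytecode as deployed at the address, for this call only.
        params.append({MULTICALL3_ADDRESS: {"code": override_code}})
    return params


# Placeholder call data and bytecode; real values are much longer hex strings.
example_params = build_eth_call_params(
    call_data_hex="0x1234",
    block_number=100,
    override_code="0x6080",
    gas_limit=50000000,
)
```

Without the override, the list simply has two entries (call object and block), which is the shape used in the earlier sections.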

Final remarks

All code and test cases can be found on my GitHub here: https://github.com/martkir/get-erc20-balance.

If you have any questions or want to give feedback on anything I have written, you can let me know on Twitter @martkiro.

If you’re working with onchain data you might also be interested in checking out https://www.syve.ai where we are indexing the Ethereum blockchain :)


Written by martinkirov | Data Scientist | Software Engineer | Working on indexing the blockchain at https://www.syve.ai
Published by HackerNoon on 2023/06/08