How to Generate a List of All Token Transactions Within a Wallet

Written by msokola | Published 2023/01/27
Tech Story Tags: ethereum | python | programming | tutorial | blockchain | cryptocurrency | guide | crypto

TLDRRecently I needed to prepare a list of all transactions related to one token and my company's wallet. I ended up writing a scrip that collects the information I need and have decided to share my learning via an open source script.via the TL;DR App

I was recently asked to generate a list of all token transactions undertaken within my company's wallet. Naturally, I began with Etherscan in hopes that I would be able to download and export the list of transactions — only to find that such a functionality is limited to 5,000 rows. As you can guess, the wallet I was interested in had significantly higher number of transactions. What a bummer. Which meant the only way to get the list I wanted was to write a script that collected the information I needed.

I have decided to share my learning via an open source script because I believe this problem might be common enough that other others might benefit from it. At the end of the day, all companies in the blockchain space have similar issues.

You can find the script and its source code on Github

In this article, I'm using my personal wallet to simplify examples. My personal wallet has only 17 transactions but the script on my Github works well with significantly higher amounts of transactions, so feel free to use it. I will be working on my transactions related DAI stablecoin. Let's check my wallet history on Etherscan.

As you can see, I did a bunch of transfers and a few smart contract interactions (methods such as Deposit, Withdraw, File Order, Swap Exact Token and so on). I'm mentioning the smart contract interactions because they are harder to process due to encoding, but that becomes irrelevant if you were to follow my guide.

Disclaimer

All examples in this article are written in Python. I am using Python v3.10.5 and extensively taking advantage of a library called web3.py. This library allows me to interact with Ethereum and you can install it using PyPI by typing the following command in your terminal:

$ pip install web3

If you write in JavaScript only, don't worry. You can translate my snippets to JavaScript and use the web3.js library instead. There are some differences in namespaces and naming, but the guide is largely applicable.

Let's get started!

The first thing the script needs to do is to establish connection to the Ethereum blockchain. To do so, we will need to find an Ethereum API. If you don't run your own node, you can get a free API from Infura, QuickNode or Moralis. Choose one, I'm using Infura for my personal projects (As you can see, I pasted it in line 9).

import asyncio
from web3 import Web3, AsyncHTTPProvider
from web3.eth import AsyncEth, Eth
from web3.net import AsyncNet


async def runner():
    ethereum_api_url = "https://mainnet.infura.io/v3/MY_SUPER_SECRET_TOKEN"

    async_web3 = Web3(
        AsyncHTTPProvider(ethereum_api_url),
        modules={
            "eth": (AsyncEth,),
            "net": (AsyncNet,),
        },
        middlewares=[],
    )
    web3_modules = get_default_modules()
    web3_modules["eth"] = Eth

    block = await async_web3.eth.get_block("latest")
    print("Current block:", block.number)

    # All code goes here
    await asyncio.sleep(0)


if __name__ == "__main__":
    asyncio.run(runner())

I created an asynchronous web3 provider, and I will be using it to interact with the blockchain. You might have noticed that on the snippet above I'm fetching the latest block. Even though that isn’t required to complete my task, I'm doing so to verify that my web3 provider is connected to Ethereum.

When you run this script you will see the following result:

$ python runner.py
Current block: 16436276

Now we are ready to scan the blockchain.

Hunting logs

Let's look into my wallet transaction history again, especially my very first transaction. As you can see, the first transaction is 0xe3d38...cefaa2e and it was dispatched in block 13352962. At the moment of writing this article, Ethereum mined block 16436276; it means my first transaction was executed more than 3 million blocks ago! Obviously, retrieving all transactions by querying every single block isn't really an option. Don't get me wrong - it's possible, but extremely time consuming. The whole processing would take at least few hours (or days!). It's too slow.

Let's dig a bit deeper and focus on the Logs tab on Etherscan. As you can see, this transaction has one event - a Transfer event.

Ethereum indexes logs of every transactions. This indexing means it allows to request particular logs by providing broader block ranges and smart contract addresses. The block range can be broad but the output cannot exceed 10,000 logs per call. Isn't it amazing?

Let's return back to our Transfer event. You can spot it has three topics - topic 0 is the Transfer event's signature. It tells us that all Transfer events have signature 0xddf252ad...f523b3ef. This information is crucial, we will be using it to find all transfer to our wallet but I will return to this later. Let's focus on topic 1 and 2 for now.

Most tokens on Ethereum (excluding NFTs and Ether) are ERC-20 tokens and DAI isn't any different. The easiest way to find out about topics exposed by Transfer event will be reading the ERC-20 token smart contract. Fortunately, some of them are open source and we can find their source code on Github. We will be looking into interfaces and search for the Transfer event and its definition.

// SPDX-License-Identifier: MIT
// OpenZeppelin Contracts (last updated v4.6.0) (token/ERC20/IERC20.sol)

pragma solidity ^0.8.0;

/**
 * @dev Interface of the ERC20 standard as defined in the EIP.
 */
interface IERC20 {
    /**
     * @dev Emitted when `value` tokens are moved from one account (`from`) to
     * another (`to`).
     *
     * Note that `value` may be zero.
     */
    event Transfer(address indexed from, address indexed to, uint256 value);
    
    ...
    
}

As you can see, the Transfer event has two indexed arguments - from and to - and value which isn't indexed. The indexed arguments are exposed as topics, and we will be able to request logs associated with them.

Now we know that topic 1 is the from address and topic 2 is the to address. Let's request all incoming DAI transactions to my wallet from block 13,352,962 (my first DAI transaction) to block 16,436,276 (the current block).

incoming_logs = await async_web3.eth.get_logs(
    {
        "fromBlock": 13352962,
        "toBlock": 16436276,
        "address": "0x6B175474E89094C44Da98b954EedeAC495271d0F", # DAI
        "topics": [
            "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef",
            None,
            "0x000000000000000000000000d5e73f9199e67b6ff8dface1767a1bdadf1a7242",
        ],
    }
)

When you look closer into this snippet, you might feel confused by two things. The first one is the address field. It isn't my wallet address but DAI address (smart contract). If you'd provide the wallet address there, you'd find nothing. Wallets don't implement any events with Transfer signature.

The second thing, the wallet address is prefixed with 24 zeros. It is related to EVM - the topic user 32 bytes data types and wallet addresses are stored in 20 bytes. It means I need to fill the "free" space with zeros.

If you run the snippet code, you will see the array of logs that looks like this one:

[
    AttributeDict({
        'address': '0x6B175474E89094C44Da98b954EedeAC495271d0F', 
        'blockHash': HexBytes('0xa67b8ceaeb79ec2592e161ee2efee6fba3fd329c87131d9335ccaa869cc857ec'), 
        'blockNumber': 13352962, 
        'data': '0x0000000000000000000000000000000000000000000000356ea11fcb4975c000', 
        'logIndex': 85, 
        'removed': False, 
        'topics': [
            HexBytes('0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef'), 
            HexBytes('0x00000000000000000000000021a31ee1afc51d94c2efccaa2092ad1028285549'), 
            HexBytes('0x000000000000000000000000d5e73f9199e67b6ff8dface1767a1bdadf1a7242')
        ], 
        'transactionHash': HexBytes('0xe3d380b3647abee2f8e2980d8e3bf6e9a43b00a0f4a388765585df13ecefaa2e'), 
        'transactionIndex': 41
    }), 
    ...
]

If you read it carefully, you can spot the sender and receiver of each log (topic 1 and 2). The only missing part is the transaction value, right? Actually, it is stored in the data field but it is encoded. It can be decoded by using the toInt method from the web3.py library.

Let's try it out.

from web3 import Web3

value = Web3.toInt(hexstr="0x0000000000000000000000000000000000000000000000356ea11fcb4975c000")
print(value) # 985649123680000000000

The script returned 985649123680000000000 and this number feels a bit too high, right?

Not really, DAI has 18 decimals - it means you need to divide this number by 10 to the power of 18 (10^18) to get a "human" readable number. If you do so you will find out that the value of this transaction was 985.64912368 DAI. You can confirm this number with transaction on Etherscan.

Conclusion

Processing Ethereum block-by-block can be tedious and not necessary. Often we can search logs to find events related to transactions we are looking for. As I said at the very beginning, you can find a fully-fledged script on Github with all instructions on how to run it. The article contains only the juicy parts.

If you have any question you can start a discussion on Github or ping me on Twitter. Feel free to follow me - I tweet things related to Ethereum and software engineering.

It would mean the world to me if you would share this article on your social media.

Thank you!

Also published here.


Written by msokola | Software Engineer @ stake.fish (Ethereum, DeFi, Python, JavaScript)
Published by HackerNoon on 2023/01/27