Oracles on StarkNet

Smart contract oracles protect DeFi protocols from price manipulation attacks. They are what makes these protocols safe, and they are also their most significant dependency. Choosing an oracle is therefore a critical decision in the design of a protocol. StarkNet is a ZK-rollup on Ethereum that addresses Ethereum's scaling problem in a highly efficient manner. The network was released in November 2021, and in its short life of around 10 months it has given birth to multiple projects in DeFi, NFTs, and even gaming (a full list of projects building on StarkNet can be found here). Ethereum's scaling problem can probably be solved with L2 solutions built on top of the secure Ethereum mainnet. However, the problem of getting off-chain data on-chain in a permissionless way persists.

The two largest oracles building on StarkNet are Empiric Network and the Stork Oracle Network. I talked with Jonas Nelle, co-founder of Empiric, and Vlad, founder and CEO of Stork. We discussed how their oracles will provide accurate data on-chain and the immense possibilities now unlocked by StarkNet’s cheap computation (and comparatively expensive storage).

Currently available data

Right now, both Empiric and Stork provide price data for some of the most prominent cryptocurrencies (BTC, ETH, DAI, etc.). Beyond that, Empiric Network plans to offer data such as the weather in a specific place or the outcome of a sports match. The Stork team, meanwhile, is interested in equity (stock) prices, NFTs, and other asset classes, and may add non-price feeds after those. This diversity of information will be very helpful for the ecosystem.

Since only price data is available as of this writing, I set out to do an exploratory data analysis of the data these oracles send on the StarkNet testnet. I also wanted to compare both oracles on the quality (and quantity) of that data.

Stork

Some basic facts to note are:

Stork is an oracle with on-chain price computation. This means that median prices are computed on StarkNet from the quotes of individual publishers rather than aggregated off-chain
Allowlisted publishers can publish and sign data on-chain
Two publishers are currently live on the testnet: Dexterity (a financial firm in crypto) and Stork itself
5 price tickers are available (a more up-to-date list can be found here)
The median time between price updates was 180 seconds
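Computing a median time between updates only needs the update timestamps. The sketch below assumes the updates were exported as Unix timestamps in seconds. The sample values are made up for illustration and are not taken from the Stork dataset.

```python
from statistics import median


def update_gaps(timestamps: list[int]) -> list[int]:
    """Return the gaps (in seconds) between consecutive updates."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def median_update_interval(timestamps: list[int]) -> float:
    """Median number of seconds between two consecutive price updates."""
    gaps = update_gaps(timestamps)
    if not gaps:
        raise ValueError("need at least two updates")
    return median(gaps)


# Hypothetical sample: gaps of 170, 180, 190 and 600 seconds.
sample = [0, 170, 350, 540, 1140]
print(median_update_interval(sample))  # 185.0
```

The median is used rather than the mean because a single long outage (the 600-second gap here) would otherwise dominate the result.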

Now, let’s get to the exciting part: the numbers 🤓. I analyzed the prices published for BTC and ETH on the Stork contract between July 31, 2022, and August 12, 2022.

As the graph above shows, the BTC price range over this timeframe was $2403 (i.e., no anomalous values appear in the data). However, the dataset also shows some large intervals between updates (up to 4 hours). This dataset was an offline record of the transactions sent to the network. Stork has no use case for keeping such a record, so it is not maintained carefully and the data should not be considered reliable.

I therefore also analyzed the on-chain frequency of transactions on the contract Vlad shared, which painted a more optimistic picture. The longest gap between two updates was 3000 seconds (50 minutes), and it probably coincided with downtime of the StarkNet network itself. Most gaps were between 2 and 3 minutes. A distribution plot of the gaps is shown below.
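A gap distribution like the one described above can be summarized by bucketing the gaps into fixed-width bins. The following is a minimal sketch with hypothetical gap values (not the real Stork data). The 60-second bin width and the helper names are illustrative choices.

```python
from collections import Counter


def gap_histogram(gaps: list[int], bin_width: int = 60) -> dict[int, int]:
    """Count gaps per bin; bin k covers [k * bin_width, (k + 1) * bin_width) seconds."""
    counts = Counter(gap // bin_width for gap in gaps)
    return dict(sorted(counts.items()))


def share_within(gaps: list[int], low: int, high: int) -> float:
    """Fraction of gaps with low <= gap <= high (both in seconds)."""
    return sum(low <= gap <= high for gap in gaps) / len(gaps)


# Hypothetical gaps in seconds, including one long outage.
gaps = [125, 150, 170, 180, 200, 3000]
print("max gap:", max(gaps))              # 3000
print("histogram:", gap_histogram(gaps))  # {2: 3, 3: 2, 50: 1}
print("share in 2-3 min:", share_within(gaps, 120, 180))
```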

As of this writing, the Stork team plans to add new publishers in the coming weeks. Once more publishers are live, I expect updates to become more frequent and downtimes to become shorter.

Empiric Network

The Empiric oracle seems more mature and provides 20+ price feeds, including both spot and options (full list here: https://docs.empiric.network/using-empiric/supported-assets). Its codebase has been open source since day 1, and I was told it had just successfully completed its first audit. Still, let us take a closer look at the data. A few facts first:

Allowlisted publishers can publish data on-chain (Empiric Network, CMT, Gemini, FTX, CEX, Bitstamp, Coinbase, Kraken, Binance, Bitfinex, CoinGecko)
17 price tickers are available (list here)
The median time between two price updates for the Empiric publisher was 93 seconds
The median time between two price updates when considering all publishers was less than 5 seconds
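The gap between the per-publisher median (93 seconds) and the all-publisher median (under 5 seconds) follows from merging several independent update streams. The sketch below uses three hypothetical publishers. Each one updates every 90 seconds but reacts to the market within a few seconds of the others. The timings are invented to illustrate the effect.

```python
from statistics import median


def median_gap(timestamps: list[int]) -> float:
    """Median gap (seconds) between consecutive updates."""
    ordered = sorted(timestamps)
    return median(b - a for a, b in zip(ordered, ordered[1:]))


# Hypothetical: each publisher pushes every 90 s, offset by only a few seconds.
publishers = {
    "publisher_a": list(range(0, 900, 90)),
    "publisher_b": list(range(2, 900, 90)),
    "publisher_c": list(range(4, 900, 90)),
}

for name, timestamps in publishers.items():
    print(name, median_gap(timestamps))  # 90 for each publisher

merged = [t for timestamps in publishers.values() for t in timestamps]
print("all publishers", median_gap(merged))  # 2
```

Note that a small merged median does not by itself mean the price is refreshed evenly. Here the long 86-second gaps between bursts remain.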

Analyzing Empiric’s BTC price between July 31, 2022, and August 12, 2022 gave the following results. The BTC price range was $2468, and the maximum gap between updates was 7 minutes. Most gaps were under one minute (see the stats below).

It might be surprising that some updates were less than 5 seconds apart. This is probably due to the number of publishers independently sending transactions with updated prices. Jonas, Empiric’s co-founder, mentioned during our conversation that specific mechanisms could coordinate the timing of publishers so that updates stay at small, regular intervals. Stork could implement a similar mechanism in the future once it has a larger network of publishers.

Because each Empiric price update also records the source of the data, I compared the published prices against real-time prices from the CoinGecko API at 5-minute intervals.

An example of the data retrieved from Empiric is shown below. The source is where the price actually comes from (a market maker, an AMM, etc.), while the publisher is the entity that sends the transaction on-chain.
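To make the source/publisher distinction concrete, here is a hypothetical record type and a filter over it. The field names, publisher identifiers, and prices are illustrative assumptions, not Empiric’s actual schema or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceUpdate:
    """One published price (hypothetical record; field names are illustrative)."""
    timestamp: int   # Unix time (seconds) when the transaction was sent
    price: float     # quoted price in USD
    source: str      # where the price originates (exchange, market maker, AMM, ...)
    publisher: str   # entity that signed and sent the transaction on-chain


def select(updates: list[PriceUpdate], *, publisher: str, source: str) -> list[PriceUpdate]:
    """Keep only updates sent by one publisher that quote one particular source."""
    return [u for u in updates if u.publisher == publisher and u.source == source]


# Made-up example updates.
updates = [
    PriceUpdate(1_660_000_000, 23_910.5, source="coingecko", publisher="empiric"),
    PriceUpdate(1_660_000_004, 23_912.0, source="binance", publisher="empiric"),
    PriceUpdate(1_660_000_007, 23_909.8, source="coingecko", publisher="other_publisher"),
]
print(select(updates, publisher="empiric", source="coingecko"))
```

The same source can appear under several publishers, which is why filtering on both fields matters before comparing against a reference price.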

Taking only updates with Empiric as the publisher and CoinGecko as the source, the published price differed by about -0.002% from the price returned by the CoinGecko API at 5-minute intervals over around 24 hours. I could not run a similar analysis on the Stork data. The Stork data we received contained only an aggregated price feed, with no breakdown by source.
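One way to run such a comparison is to align each reference timestamp with the latest oracle price published at or before it. You then average the signed percentage deviations. The sketch below assumes both series are lists of `(unix_seconds, price)` tuples. It is an illustration of the idea, not necessarily the method used in the linked notebook, and the sample numbers are invented.

```python
from bisect import bisect_right
from statistics import mean


def deviation_pct(oracle: list[tuple[int, float]],
                  reference: list[tuple[int, float]]) -> float:
    """Mean signed deviation (%) of oracle prices from a reference series.

    For each reference point, the latest oracle price published at or before
    that timestamp is used; reference points before the first oracle update
    are skipped.
    """
    oracle = sorted(oracle)
    times = [t for t, _ in oracle]
    deviations = []
    for ref_time, ref_price in reference:
        i = bisect_right(times, ref_time)
        if i == 0:
            continue  # no oracle price published yet at this time
        oracle_price = oracle[i - 1][1]
        deviations.append((oracle_price - ref_price) / ref_price * 100)
    return mean(deviations)  # raises StatisticsError if nothing could be aligned


# Hypothetical data: oracle updates and a reference sampled every 300 s (5 minutes).
oracle = [(0, 100.0), (290, 99.0), (610, 101.0)]
reference = [(300, 100.0), (600, 100.0), (900, 100.0)]
print(deviation_pct(oracle, reference))  # mean of -1%, -1% and +1%
```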

It is important to understand that the larger number of publishers on the Empiric network lets it publish updates more frequently. The bottleneck, however, is the block frequency of StarkNet itself. The data analyzed above is the history of transactions sent, together with the timestamp at which each was sent. Those transactions only take effect when they are included in a new block, which currently takes around 3 minutes to be created and published. As a result, the on-chain price cannot change more than once per block.
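The block-time bottleneck can be illustrated with a deliberately simplified model. The model assumes blocks close exactly every 180 seconds starting at t = 0, and that every sent transaction lands in the next block. Real StarkNet block production was not this regular, and the send times below are hypothetical.

```python
BLOCK_TIME = 180  # assumption: ~3 minutes per block, as observed at the time of writing


def inclusion_time(sent_at: int, block_time: int = BLOCK_TIME) -> int:
    """Time of the first block boundary at or after `sent_at` (simplified model)."""
    return -(-sent_at // block_time) * block_time  # integer ceiling division


def visible_updates(sent_times: list[int], block_time: int = BLOCK_TIME) -> int:
    """Number of distinct moments at which the on-chain price can actually change."""
    return len({inclusion_time(t, block_time) for t in sent_times})


# Hypothetical: publishers send an update every 5 seconds for 15 minutes.
sent = list(range(1, 901, 5))
print(len(sent), "transactions sent,", visible_updates(sent), "on-chain price changes")
```

Under this model, 180 transactions collapse into 5 block-level price changes, however many publishers are sending.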

How “permissionless” are current oracles?

As mentioned earlier in this post, Stork and Empiric only allow allowlisted publishers (mostly high-volume market makers and exchanges) to publish the latest price to their smart contracts. I am sure both teams are working on a truly permissionless way of proving the price of an asset. For now, however, centralized publishing is to be expected from projects at such a nascent stage. Empiric recently ported the Open Oracle standard in collaboration with Coinbase and OKX. This allows any user to publish price data obtained from these exchanges’ APIs, signed by the exchanges themselves. Even though the data sources remain permissioned (and multiple publishers might use the same source), these are positive steps toward a completely decentralized oracle system.

The most prominent example of a decentralized oracle network today is Chainlink. Anyone can stake some assets, spin up a node, and start sharing knowledge about the off-chain (IRL) world. However, only certain entities can publish data to the official Chainlink price feeds (here), which are used mainly by DeFi protocols. This approach makes sense: entities with large market caps have a reputation to uphold and little (if anything) to gain by publishing dishonest data on-chain. At the same time, since anyone can spin up a node, it is up to each node operator to grow their reach and build a historical reputation as a high-quality data provider.

Moreover, in both Empiric and Stork, the originator of each price update transaction is always the publisher itself (who holds the private key for its account). Any malicious attempt to skew the price reported by the oracle is therefore attributable and will result in reputation loss. Any oracle on StarkNet can also implement simple aggregation modes, such as the median of the most recent updates, to prevent a single source from skewing the data. Both Stork and Empiric employ these and similar techniques.
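A median-of-latest-quotes aggregation can be sketched as follows. This is a generic illustration, not the exact aggregation logic of Stork or Empiric. It assumes quotes arrive as `(publisher, unix_seconds, price)` tuples. The 600-second staleness window and the sample quotes are hypothetical.

```python
from statistics import median


def aggregate(quotes: list[tuple[str, int, float]], now: int, max_age: int = 600) -> float:
    """Median of each publisher's most recent quote no older than `max_age` seconds."""
    latest: dict[str, tuple[int, float]] = {}
    for publisher, ts, price in quotes:
        if now - ts > max_age:
            continue  # stale quote, ignored
        if publisher not in latest or ts > latest[publisher][0]:
            latest[publisher] = (ts, price)  # keep only the newest quote per publisher
    if not latest:
        raise ValueError("no fresh quotes")
    return median(price for _, price in latest.values())


# Hypothetical honest quotes (publisher "a" also has an older and a stale quote).
honest = [
    ("a", 1000, 1850.0), ("b", 1005, 1851.0), ("c", 1010, 1849.5),
    ("d", 990, 1850.5), ("a", 500, 1700.0), ("a", 100, 9999.0),
]
malicious = ("e", 1012, 10.0)  # one publisher reporting a wildly wrong price

print(aggregate(honest, now=1020))                # 1850.25
print(aggregate(honest + [malicious], now=1020))  # 1850.0
```

Because each publisher contributes at most one quote and the median ignores extremes, the bad quote moves the result by only $0.25 here instead of dragging it toward $10.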

Will the updates remain as frequent when moving to mainnet?

It is a big “if” to assume that updates will remain just as frequent once these oracles move to mainnet. On StarkNet, computation is very cheap while storage is relatively expensive, so storage updates may cost more than expected on mainnet. However, overwriting the same storage slot multiple times within a single block does not multiply the storage cost. Only the slot’s final value is sent to Ethereum mainnet as calldata, so the block pays for a single write. Likewise, you only pay storage fees when the value actually changes; if it ends up the same, you pay nothing. For these reasons, StarkNet may introduce rebates in the future for batched overwrites of the same storage slot.
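The "only the final value counts" behavior can be modeled as a per-block state diff. This is a simplified sketch and does not implement StarkNet’s actual fee formula. The slot names and fixed-point prices are hypothetical. The code assumes, as StarkNet does, that an unset storage slot reads as 0.

```python
def state_diff(storage_before: dict[str, int], writes: list[tuple[str, int]]) -> dict[str, int]:
    """Final value of every slot whose value changed during the block.

    Simplified model: only this diff, not each individual write, is posted to L1.
    """
    final: dict[str, int] = {}
    for slot, value in writes:
        final[slot] = value  # later writes in the block overwrite earlier ones
    return {slot: value for slot, value in final.items()
            if storage_before.get(slot, 0) != value}


# Hypothetical block: five oracle writes to two slots.
before = {"btc_usd": 23_900, "eth_usd": 1_850}
writes = [("btc_usd", 23_910), ("btc_usd", 23_905), ("btc_usd", 23_912),
          ("eth_usd", 1_851), ("eth_usd", 1_850)]
diff = state_diff(before, writes)
print(len(writes), "writes ->", diff)  # eth_usd ended unchanged, so only btc_usd is posted
```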

Computational feeds and other possibilities for future research

Because the effort required to verify a STARK proof is exponentially smaller than the computation being proven, StarkNet has the potential to scale Ethereum by orders of magnitude. This opens up new possibilities, such as live computational feeds that aggregate data provided by oracles, calculate yield curves, and possibly even model them on-chain. CurveZero, a fixed-rate lending protocol, is already working on an integration with Empiric’s data feeds to price its fixed-rate loans on StarkNet. CurveZero is currently live on testnet, and you can take out a test fixed-rate loan on its website. Similarly, projects like Yagi Finance can draw on a wide range of oracle data on StarkNet. They can aggregate those feeds to decide where to deposit their users’ liquidity for the highest possible yields. Yagi is integrating with Empiric and will use its volatility feeds.
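As an example of a computational feed, a volatility figure can be derived from a series of oracle prices. The sketch below is a textbook close-to-close realized volatility estimate. It is not Empiric’s actual volatility methodology, which this post does not describe. It assumes evenly spaced prices. The 5-minute sampling and the 24/7 annualization factor are illustrative choices.

```python
import math
from statistics import stdev


def realized_volatility(prices: list[float], periods_per_year: float) -> float:
    """Annualized realized volatility from evenly spaced prices (close-to-close)."""
    if len(prices) < 3:
        raise ValueError("need at least three prices")
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return stdev(log_returns) * math.sqrt(periods_per_year)


# Hypothetical 5-minute BTC prices; crypto trades 24/7, so 365 * 24 * 12 periods per year.
prices = [23_900.0, 23_940.0, 23_880.0, 23_910.0, 23_960.0, 23_930.0]
print(realized_volatility(prices, periods_per_year=365 * 24 * 12))
```

Using log returns makes successive returns additive over time. Annualizing by the square root of the number of periods assumes returns are roughly independent between samples.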

Yet another avenue for improvement is moving to L3s. L3s build on top of L2s, just as L2s build on top of L1s, so publishing data will become even cheaper and more accessible. Many transactions will be provably posted to the L2, verified by a verifier contract deployed there. Depending on the use case, data availability can be tuned, and recursive proofs can be employed to power the progress of this ecosystem.

Resources

Links to the Colab notebooks containing all the data analysis results mentioned in this post:

Stork evaluation: https://colab.research.google.com/drive/1-K1t4BCAXdVdliI3Dby2jyT84ulLdX44?usp=sharing
Empiric evaluation: https://colab.research.google.com/drive/1Bkct5khBkWAp-BvOZnu6VFDvMtRI1Tqz?usp=sharing
Empiric evaluation against CoinGecko: https://colab.research.google.com/drive/17-ilkX0mgWwo9tHUCVmSypb6DrWILmnp?usp=sharing