Peer Review with expert networks on the blockchain

Written by anshul.bhagi | Published 2017/08/26
Tech Story Tags: blockchain | peer-review | expert-networks | global-expertise-bank | expertise-weighted-review


Determining ‘right’ and ‘wrong’ for subjective crowd input using a global expertise bank and expertise-weighted reviews

In my last article, I wrote about the need for Social Search — using human intelligence and experience to answer questions that Google and AI cannot answer.

For a social search platform like Proffer (or any system that relies on crowd intelligence) to work and to offer high quality responses, there needs to exist a reliable system for evaluating crowd-provided answers — for categorizing answers as better or worse, right or wrong. Without a mechanism to measure ‘correctness’ of an answer, there would be no way to rank or filter the answer space. Furthermore, in a crowdsourcing platform that offers rewards or penalties to encourage responders to participate, there’d be no way to determine which responders to reward, which responders to penalize, and by how much.

Peer review (subjecting an answer to the scrutiny of other experts in the same field) is the traditional approach for assessing crowd inputs. It’s a powerful idea and widely practiced in academia and online communities, but suffers from political manipulation, a lack of granularity/specificity in the measurement of expertise, an ability for reviewers to falsely accumulate expertise, and an unnatural constraint whereby votes from skilled and unskilled reviewers have equal impact on the outcome.

In building Proffer, we sought to create a protocol for decentralized peer review that avoids the pitfalls of traditional peer review and runs on the blockchain, adapting to cover future crowdsourcing and social search use cases on a global scale.

Don’t want to read? Go through the slides for a protocol overview and example.

This article is structured as follows:

  1. Problems with peer review as implemented today
  2. Solution and Key Features: a novel peer review platform built on the concept of self-optimizing expert networks, where reviews are weighted by the reviewers’ expertise, where an answer’s ‘correctness’ is based on the cumulative skill of its supporters minus that of its detractors, and where expertise for all reviewers across all topics is stored and updated on the blockchain (global expertise bank).
  3. Walkthrough of crowdsourced peer review on Proffer, a Physics Q&A use case.
  4. Bootstrapping a global expertise bank (what happens at t=0)

Problems with Traditional Peer-Review

  1. Peer Review is politically influenced: It puts significant power in the hands of a few without offering the necessary checks and balances, or the necessary incentives to encourage unbiased, thoughtful reviews. Just ask the academic community — e.g. in this Ars Technica article, Chris Lee describes academic peer review as “utter randomness,” citing the undue influence that reviewers’ “mood, medication, and memory” have on the outcome. He proposes a decentralized peer review process for academic publications that opens up each review to a crowd of 100+ panelists and significantly improves upon the quality and efficiency of traditional peer review.
  2. Expertise is arbitrary, inaccurate, and easily manufactured. For example, anyone can endorse anyone for any skill on LinkedIn. Reddit users can accumulate ‘karma’ from one subreddit (e.g. /r/dogs) to qualify them to post on a completely unrelated subreddit (e.g. /r/ethereum). It’s hard for our computational systems to identify true expertise, and such systems can be fooled. Expertise should be measured based on actions and outcomes, not on self- or crowd-reported inputs with a single data point.
  3. Expertise is fragmented across apps, i.e. Reddit “karma” for blockchain topics doesn’t carry any weight on Stack Exchange or Quora blockchain topics. A ‘top writer’ for Artificial Intelligence on a Medium blog will see no benefits of that reputation when posting on an AI thread on Reddit. Users are forced to re-create their reputation on each new platform or app they join, and must maintain this reputation separately across the various platforms, leading to a missed opportunity for network effects and usage driven by shared reputation across apps.

Crowdsourced Peer Review: Key Features

  1. Global expertise bank: stores every user’s skill in every topic on a common decentralized ledger. It is ‘global’ as opposed to ‘fragmented’ because all dApps read/write expertise to this bank rather than maintaining their own application-specific skill stores. A global expertise bank benefits from and creates shared network effects across multiple applications — a user can switch from one math tutoring dApp to another while retaining his/her expertise in topic ‘Math’ (a minimal sketch follows this list).
  2. Skill-weighted peer reviews: A mechanism to objectively determine the correctness of a subjective answer by the cumulative expertise of those who answered or upvoted it minus that of those who downvoted it. This allows more skillful responders to have a greater influence on the correctness of a response than less skillful responders.
  3. Self-optimizing expert network: A network of experts where expertise can be gained or lost by any individual, pushing the network closer to ground-truth over time. i.e. false expertise gets filtered out as incorrect responders lose skill, and true experts rise to the top as they provide correct responses repeatedly.
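To make the first feature concrete, here is a minimal Python sketch of what a shared expertise bank’s interface could look like. The class and method names (ExpertiseBank, get_skill, adjust_skill) are illustrative assumptions, not Proffer’s actual API; the point is simply that every dApp reads and writes skill through one store keyed by (user, topic).

```python
from collections import defaultdict

class ExpertiseBank:
    """Toy in-memory stand-in for an on-chain, per-topic skill ledger."""

    def __init__(self):
        # Skill is keyed by (user, topic); unknown pairs default to 0.
        self._skill = defaultdict(float)

    def get_skill(self, user: str, topic: str) -> float:
        return self._skill[(user, topic)]

    def adjust_skill(self, user: str, topic: str, delta: float) -> None:
        # Skill is open-ended: it can be granted or taken away freely.
        self._skill[(user, topic)] += delta


# Any dApp built on the same bank sees the same expertise:
bank = ExpertiseBank()
bank.adjust_skill("alice", "Math", +10)   # earned in one math tutoring dApp
print(bank.get_skill("alice", "Math"))    # visible to every other dApp -> 10.0
```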

Crowdsourced peer review on Proffer: a Physics Q&A example

Step 1: Receive a question from Seeker, broadcast to qualified responders

Seeker asks a question that he/she wants crowd input on, e.g. “Why is the sky blue?” Seeker optionally backs the question with money (SeekerStake) that will be distributed among correct responders. This SeekerStake seeds an incentive pool called the Token Backing Pool. A new Token Backing Pool is created for each question asked on the platform.
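As a rough illustration of Step 1, here is a simplified Python model of a question and its pool. The Question and TokenBackingPool classes and their field names are hypothetical, chosen only to mirror the terms above, not taken from Proffer’s implementation.

```python
from dataclasses import dataclass, field

@dataclass
class TokenBackingPool:
    """Incentive pool created for each question; it collects every stake."""
    balance: float = 0.0
    stakes: dict = field(default_factory=dict)   # who staked how much, so stakes can be returned

    def deposit(self, user: str, amount: float) -> None:
        self.balance += amount
        self.stakes[user] = self.stakes.get(user, 0.0) + amount

@dataclass
class Question:
    seeker: str
    text: str
    topic: str
    answers: list = field(default_factory=list)
    pool: TokenBackingPool = field(default_factory=TokenBackingPool)

# The Seeker asks a question and (optionally) seeds its pool with a SeekerStake
q = Question(seeker="seeker_1", text="Why is the sky blue?", topic="Physics")
q.pool.deposit("seeker_1", 20.0)   # SeekerStake goes into the Token Backing Pool
```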

Step 2: Start peer review process. Receive answers and votes from responders

Responders, a.k.a. peer reviewers, see the question and can either respond with a new answer, or upvote/downvote answers previously submitted by other responders. Counterintuitively, responders must also contribute a stake (ResponderStake) to the Token Backing Pool as an expression of their confidence in their response; this stake is returned to them unless their answer is deemed incorrect (explained below) by the peer review protocol. The requirement of a ResponderStake disincentivizes spam and puts responders’ skin in the game.
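Continuing the same sketch, answers and votes might be recorded like this (again, the function and field names are illustrative; `q` is the Question created in the Step 1 sketch):

```python
from dataclasses import dataclass, field

@dataclass
class Answer:
    author: str
    text: str
    stake: float                                   # AnswerStake paid by the author
    upvoters: list = field(default_factory=list)
    downvoters: list = field(default_factory=list)

def submit_answer(question, author, text, stake):
    # Responders put tokens behind their answer (skin in the game)
    question.pool.deposit(author, stake)
    answer = Answer(author=author, text=text, stake=stake)
    question.answers.append(answer)
    return answer

def vote(question, voter, answer, up, stake):
    # Voting also requires a VoteStake, which discourages spam votes
    question.pool.deposit(voter, stake)
    (answer.upvoters if up else answer.downvoters).append(voter)

a1 = submit_answer(q, "Joe", "Because of Rayleigh scattering.", stake=5.0)
vote(q, "P1", a1, up=True, stake=1.0)
vote(q, "P2", a1, up=False, stake=1.0)
```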

Step 3: Compute the net ‘skill’ supporting each submitted answer

The protocol keeps track of the “net skill” backing each answer that is added to the answer space. The SkillBacking for an answer is equal to the skill of the responder who first proposed it, plus (+) the skill of all responders who upvoted it, minus (-) the skill of all responders who downvoted it. SkillBacking is a skill-weighted measure of the correctness of an answer at any given point in time.

“Skill” here refers to the very specific / granular skill of the user in the topic being addressed in the question, which in this case can be ‘Physics’ or ‘Optical Phenomenon’. The skill is read from the Global Expertise Bank and written back to it each time it’s updated.
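The computation itself is a simple sum; a sketch, reusing the toy ExpertiseBank and Answer objects from the earlier snippets:

```python
def skill_backing(answer, bank, topic):
    """Net skill supporting an answer, read live from the global expertise bank."""
    backing = bank.get_skill(answer.author, topic)                        # original responder
    backing += sum(bank.get_skill(u, topic) for u in answer.upvoters)     # plus upvoters
    backing -= sum(bank.get_skill(d, topic) for d in answer.downvoters)   # minus downvoters
    return backing
```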

In our example, five different responders (Joe, Sam, Jenny, Andrea, and Brett) contribute five new answers to the question “Why is the sky blue?”, each paying an AnswerStake into the Token Backing Pool.

Next, 15 different responders, named P1 through P15, choose to upvote or downvote the five answers already in the Answer Space, paying a VoteStake into the Token Backing Pool for each vote. Note that each vote updates the net skill backing of the answer it targets: an upvote adds the upvoter’s skill, and a downvote subtracts the downvoter’s skill.

Step 4: Use net ‘Skill Backing’ to determine which answers are Correct/Incorrect and accordingly compute payouts for each responder

After P1 through P15 have submitted their votes, the answer space has five answers, each with a net skill backing. At this point, we can either use a heuristic such as “net skill backing > zero” to determine which answers are correct, or we can present all options to the Seeker who initially asked the question and allow him/her to choose the correct answer.

The best way to determine ‘correctness’ will depend on the use case, so rather than prescribing a single judge, our protocol provides two configurable parameters: a Judge of Correctness and a Judge of Incorrectness, each of which can be set to either ‘seeker’ or ‘peers’.

Let’s assume for this example that Judge of Correctness = ‘seeker’ and Judge of Incorrectness = ‘peers’. This means that we can determine incorrect answers based on net skill backing. Answer 3, which received a net skill backing of -70, and Answer 5, with a net skill backing of -15, are therefore Incorrect.

Answer 2 is deemed Correct because we make the assumption that the Seeker will select Answer 2 as the best answer since it has the highest net skill backing. Answers that are neither Correct nor Incorrect are labeled undecided.
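These rules could be expressed as follows, assuming the configuration above (Judge of Incorrectness = ‘peers’, Judge of Correctness = ‘seeker’); the function is a sketch of the heuristic described in the text, not Proffer’s actual logic:

```python
def classify_answers(answers, bank, topic, seeker_choice=None):
    """Label each answer Correct, Incorrect, or Undecided."""
    labels = []
    for answer in answers:
        if skill_backing(answer, bank, topic) < 0:
            labels.append((answer, "Incorrect"))   # peers judge incorrectness via net skill
        elif answer is seeker_choice:
            labels.append((answer, "Correct"))     # the Seeker judges correctness
        else:
            labels.append((answer, "Undecided"))
    return labels
```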

We now use the following guidelines to compute two types of payouts — financial and skill — for each responder:

Guidelines for financial and skill payouts for each responder

Skill is an open/indefinite quantity. It can be given freely and taken freely. Performing skill payouts is therefore as easy as increasing or decreasing the responder’s skill in the Global Expertise Bank for the topic at hand.

Skill updates after Answers 3 and 5 are determined to be Incorrect and Answer 2 Correct.

Money/tokens, on the other hand, are a closed, zero-sum quantity. They are not created; they are redistributed from incorrect responders to correct responders.

The protocol first returns the money that responders with Correct responses and undecided responses had put into the Token Backing Pool at the start of the process.

The remainder in the pool after these stakes have been returned consists of the original SeekerStake, and the AnswerStake(s) and VoteStake(s) of incorrect responders. Finally, this amount can be distributed across correct responders as their reward for answering or voting correctly.

Possible outcomes in crowdsourced peer review. Incorrect responders lose what they had put in. Correct responders recover what they had put in AND get an additional reward.
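Putting the two payout types together, a settlement step might look roughly like the sketch below. The flat skill_delta, the equal split of the reward, and the treatment of voters (only authors and upvoters are classified here) are assumptions for illustration and may differ from the actual guidelines.

```python
def settle(question, labels, bank, topic, skill_delta=5.0):
    """Toy settlement: skill payouts hit the expertise bank, token payouts come from the pool."""
    pool = question.pool
    correct_users, incorrect_users = set(), set()
    for answer, label in labels:
        backers = [answer.author] + answer.upvoters
        if label == "Correct":
            correct_users.update(backers)
        elif label == "Incorrect":
            incorrect_users.update(backers)

    # 1. Skill payouts are open-ended: adjust skill directly on the expertise bank
    for user in correct_users:
        bank.adjust_skill(user, topic, +skill_delta)
    for user in incorrect_users:
        bank.adjust_skill(user, topic, -skill_delta)

    # 2. Token payouts are zero-sum: return stakes to everyone except incorrect
    #    responders (the Seeker's stake stays in the pool to seed the reward)
    for user, staked in pool.stakes.items():
        if user not in incorrect_users and user != question.seeker:
            pool.balance -= staked      # stake returned to the responder

    # What remains = SeekerStake + forfeited stakes of incorrect responders,
    # split (here, equally) among correct responders as their reward
    if not correct_users:
        return {}
    share = pool.balance / len(correct_users)
    return {user: share for user in correct_users}
```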

Bootstrapping the Global Expertise Bank: getting it right at t=0

In steady state, the self-optimizing expert network described above updates the expertise of each responder over time based on his/her current expertise, his/her response, the crowd’s current expertise, and the crowd’s responses.

Expertise of person ‘p’ at time ‘t+1’ is a function of that same person’s expertise and response at time ‘t’, and the crowd’s expertise and response at time ‘t’
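In notation (a paraphrase of the relationship just described, not a formula from the Proffer spec), writing E_p for person p’s expertise and r_p for his/her response:

```latex
E_p(t+1) = f\big(E_p(t),\; r_p(t),\; E_{\mathrm{crowd}}(t),\; r_{\mathrm{crowd}}(t)\big)
```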

As long as expertise in the system is a reliable metric at time t, we can rest assured that it will be reliable at time t+1, getting closer to ground truth / ‘actual’ expertise of all users with time.

Our peer review protocol and the global expertise bank are capable of working out of the box, with all responders starting off with Skill = zero across all topics. The law of large numbers will ensure that with enough iterations, responders who started off with no skill will have won or lost skill based on the crowd’s voting for or against them, equilibrating somewhere close to their actual skill relative to their peers.

However, to improve the rate at which this skill equilibrium is reached, and to provide the highest quality answers right from t=0, particularly for industry topics that require specialized knowledge or certification, we propose bootstrapping the expert network by manually selecting and on-boarding experts from pre-existing networks, both online and offline, formal and informal.

For example, in the field of healthcare, one could onboard physician networks and initialize skill for each physician in their respective practice (‘Cardiology’, ‘General Medicine’, etc.) based on a combination of past experience and in-person interviews.

The same can be done with practicing lawyers for topics concerning the Law, with school teachers for topics in K-12 education, with land developers and construction teams for topics concerning real estate, etc.
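Seeding could reuse the same expertise-bank interface sketched earlier; the names, topics, and initial skill values below are purely illustrative:

```python
# Illustrative bootstrap: vetted experts start with non-zero skill in their specialty
vetted_experts = {
    ("dr_patel", "Cardiology"): 80.0,
    ("dr_gomez", "General Medicine"): 65.0,
    ("ms_chen", "K-12 Education"): 70.0,
}
for (user, topic), initial_skill in vetted_experts.items():
    bank.adjust_skill(user, topic, initial_skill)
# Everyone else starts at zero and earns (or loses) skill through peer review itself
```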

Seeding an expert network through a manual process would be neither trivial nor quick, but in a world where the nature of work is already tending towards ad-hoc gigs that allow workers to monetize their free time, an expert network on the blockchain with frictionless reward payments for experts who answer correctly could be an exciting platform to be a part of. This would be especially true for Proffer’s global expertise bank, as experts would be able to leverage reputation accumulated answering questions on Proffer while using other dApps built on the same global expertise bank.

If you’re curious to learn more about Proffer, the social search protocol we’re building on top of crowdsourced peer review and self-optimizing expert networks, check out our tech spec here, and the five apps we’ve published to showcase five different use cases for Proffer on our website here.

