blue(sky)print

towards a decent future

You may have heard about the Twitter "bluesky" initiative, launched by Jack Dorsey last December "to develop an open and decentralized standard for social media." A small group of people, mostly from the dweb space, have been discussing and debating approaches, and have been asked to submit proposals. This is mine.

vision

People engaging in public conversation should find a healthy and diverse ecology of platforms, venues and interfaces that they can choose from, and still be meaningfully connected to the larger conversation. Ideas that others find useful or interesting should easily spread across the ecosystem, with a variety of mechanisms for individuals and platforms to identify and discourage harmful and/or untruthful content. Developers should be able to innovate rapidly and make available new user experiences, algorithms and models that can be used for participating in the conversation.

mission

bluesky will drive adoption of durable and open technologies, including a messaging protocol, to support a connected public conversation.

overall design: message-in-a-bottle

The design of bluesky must support rapid innovation and then rapid scaling. We should not over-specify details too early, but rather minimize dependencies and encourage decoupling of components. Conventions should emerge out of use. Thus, we start with minimal requirements.

A public conversation implies that users can view and respond to each other's messages. Primitives thus include users, messages/posts, and view contexts. It is useful to consider the system from each perspective:

User-centric: a user wishes to see notifications of messages directed towards them and replies to their messages, to discover messages and contexts of interest, and to post messages into contexts and reply to posts they discover.

Context-centric: a context server wishes to create user experiences, to render contexts that may include conversation threads and posts relating to topics, and to update conversation threads in real time as users participate.

Message-centric: what does a message want to do? As a public message, I wish to be displayed and copied easily to any context; for any user who sees me, in any context, to be able to reply or react; and for that response to propagate to every context in which I am displayed, so that my replies can be displayed with me. I would also like the user who originally posted me to be notified.

To decentralize this conversation and allow messages to flow through different providers, who display them in different contexts, it seems helpful to have an encapsulated, message-centric format to enable interaction with a message regardless of how it is encountered.

Note that "context" is a somewhat overloaded term here. It may refer to the setting in which a user sees a message, or to something that can itself be responded to by a message, e.g. a news article that receives comments.

This simple diagram raises several questions. How is a message retrieved by its address? Who stores the message and its reply list, and how is the reply list propagated throughout the conversation universe so that it becomes eventually consistent? These are answered below.

beyond send and receive

In order for users to participate meaningfully in a public conversation, they will need the ability to discover contexts and posts of interest to them, and some form of reputation system to protect against the inevitable disinformation, phishing and other attacks. In particular, namespacing and schemas within messages may support a reputation feed of signed assertions.

To offer these services in a decoupled fashion, servers and providers will require metadata about routing and interfaces, as well as a micropayment solution to coordinate with one another. Innovation in user interfaces should make room for fat clients with enhanced local context and other novel user experiences.

Examples of these elements are also included in this proposal; however it is expected that bluesky will not generally be the primary provider of them.

key capabilities

Several of these already exist; bluesky should, however, assemble them into a full ecosystem demonstration.

protocol

* encapsulated - from a small packet of data it is possible to participate in the message thread (retrieve/send)
* eventually consistent - each participant is able to retrieve full context that becomes consistent across all platforms
* transport and storage agnostic - URIs in the message may be transport-specific, but the protocol itself is not
* namespacing & structured content - messages may include complex components
* conditions-of-use - standard language to describe the conditions attached to a message, such as payment or advertising requirements
* signing - messages can be signed to verify authorship, either by including a digital signature or by a verification endpoint (the latter supports deletion; the former does not).

identity and resolvers
* resolvable identifiers: DIDs, OIDC, other URIs
* decentralized routing layer with shared public ledger
* mapping of human-readable identifiers in local context to global identifiers
* key pair control of private data with device/social backup
* possibly Handshake-based methods to avoid top-level authorities

discovery service
* search
* faceted browse (by popularity, tags, or other)
* participation in storage/relay as required - gathering messages from some platforms may imply storing them
* personalization - AI-based optimization on explicitly shared user features
* customization - user controlled customization features

reputation filters
* reporting via structured messages (may be sent privately)
* web of trust features - invited/trusted reporters may be recognized
* support blocking of individual message authors, platform providers, or network ranges
* verified claims (OIDC or other)
* 'disinformation endpoint' returning a recommendation with rich context
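The blocking feature above covers three granularities: individual authors, platform providers, and network ranges. A minimal sketch in Python, assuming authors are identified by DIDs, providers by host name, and that the IP address of the relaying server is known; the class and method names are hypothetical, and only the standard `ipaddress` module is used.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass, field


@dataclass
class BlockList:
    """Hypothetical block list covering authors, providers and network ranges."""

    authors: set[str] = field(default_factory=set)    # e.g. DIDs of blocked authors
    providers: set[str] = field(default_factory=set)  # host names of blocked platforms
    networks: list[ipaddress.IPv4Network | ipaddress.IPv6Network] = field(default_factory=list)

    def block_network(self, cidr: str) -> None:
        # ip_network() raises ValueError on malformed input or if host bits are set.
        self.networks.append(ipaddress.ip_network(cidr))

    def allows(self, author: str, provider: str, origin_ip: str) -> bool:
        """True if a message from this author, via this provider, relayed from
        origin_ip (an assumed input), passes every block rule."""
        if author in self.authors or provider in self.providers:
            return False
        addr = ipaddress.ip_address(origin_ip)
        # Membership is simply False when address and network differ in IP version.
        return not any(addr in net for net in self.networks)
```

For example, `BlockList(authors={"did:example:spammer"})` with `block_network("203.0.113.0/24")` rejects that author everywhere and anyone relayed from that range, while letting other traffic through.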

demonstration user interfaces
* presentation decoupled from messaging and aggregation
* integrate rich local context; capture intention rather than attention
* design-first, centered on non-technical users
* ease of login across UIs
* support existing user experiences

decoupled ecosystem
bluesky protocols must make it possible to decouple services, so that the components now found within a single company silo become pluggable elements that can evolve more rapidly.

Thus the purpose of interoperability and minimal requirements is not to produce a Frankenstein system or to be universally backwards compatible, but to let organic behavior and traffic determine which conventions become widely adopted. Removing barriers to communication between systems allows for survival of the fittest rather than survival of the first.

key technical elements

message envelope

The proposed bluesky message format is a subset rather than a superset of existing protocols.

The key requirement is the inclusion of a reply-at URI with every message, providing a universal method to determine how to reply to any message wherever it may be found, and how to receive updates of other replies. The actual mechanism behind the reply-at URI may vary, and may impose requirements such as shared storage.

A minimal specification for the message itself might be: each element of a message may be a simple string/URI, a dict, or a list of dicts. Dicts should (not must) include a URI for their schema. If the content element is a string, it should be interpreted as the raw message. UTF-8 is recommended. Additional elements may be present.

URIs may be of any sort, e.g. the identifier may be a did: URI or an http: URI such as Solid uses. A schema may tell the reader of a message what to expect from a URI, or this may be left implicit. A wire format still needs to be chosen; CBOR (a binary format whose data model is a superset of JSON's) is a good candidate.
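The minimal envelope rules above (a required reply-at URI; elements that are strings, dicts, or lists of dicts; dicts that should carry a schema URI) are easy to check mechanically. A sketch in Python, assuming a message is already decoded into a dict; the key spellings `reply-at` and `schema` are hypothetical, since the proposal fixes the requirement but not the field names, and the URI test is deliberately loose.

```python
from __future__ import annotations

from typing import Any
from urllib.parse import urlparse

# Hypothetical key names: the proposal requires a reply-at URI and recommends a
# schema URI on dicts, but does not fix the key spellings used here.
REPLY_AT = "reply-at"
SCHEMA = "schema"


def is_uri(value: Any) -> bool:
    """Loose check: any string with a scheme, e.g. did:, https:, ipfs:."""
    return isinstance(value, str) and bool(urlparse(value).scheme)


def validate_envelope(msg: dict[str, Any]) -> list[str]:
    """Raise ValueError on a violated 'must'; return warnings for each unmet 'should'."""
    if not is_uri(msg.get(REPLY_AT)):
        raise ValueError("every message must carry a reply-at URI")
    warnings: list[str] = []
    for key, value in msg.items():
        if isinstance(value, str):
            continue  # plain string or URI; a string 'content' is the raw message
        items = [value] if isinstance(value, dict) else value
        if not isinstance(items, list) or not all(isinstance(d, dict) for d in items):
            raise ValueError(f"element {key!r} must be a string, dict, or list of dicts")
        for d in items:
            if not is_uri(d.get(SCHEMA)):
                warnings.append(f"element {key!r} contains a dict without a schema URI")
    return warnings
```

Treating a missing schema as a warning rather than an error mirrors the "should (not must)" wording, leaving conventions free to emerge from use.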

Note that some reply-at methods may impose a requirement for shared storage, payment, or some other bar to participation. Organic behavior in the ecosystem is expected to lead, over time, to the adoption of one or a few preferred methods.

Further, because schemas are namespaced, message content may contain assertions, reports, micropayments and other expressive elements. Messages may also carry server-to-server communication.

transitional bridge
In order to have real traffic exercising the protocol, we need to receive messages from existing platforms, and a message router that decides where and how to forward them to other, possibly incompatible, platforms. This may require implementing one or more of:

* Matrix double-puppeted bridge
* JSON-LD spout and WebFinger instance for an ActivityPub interface
* shared storage layer (such as IPFS)
* Twitter API client
* OIDC logins to existing non-dweb platforms

Existing open source projects should be leveraged as much as possible, and where possible bluesky should fund existing projects to enhance interoperability rather than reinventing bridges.

eventual consistency and native stores
Eventual consistency is most efficient in an architecture designed for it, and native bluesky nodes will likely use some form of CRDT. However, it is also possible to designate a particular endpoint as the 'authority' for a given message, always pushing replies to it and refreshing from it. bluesky will develop a 'native store' (possibly a graph database such as GUN or OrbitDB) but will not require participants to use it.
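To make the CRDT idea concrete for reply lists: the simplest CRDT, a grow-only set (G-Set), already gives eventual consistency for "which replies exist", because merging is a set union. This is a sketch under that assumption, not a claim about what native nodes would actually use; the class name is hypothetical and replies are represented only by their URIs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReplySet:
    """Grow-only set (G-Set) CRDT of the reply URIs known for one message.

    merge() is a set union, which is commutative, associative and idempotent,
    so replicas that exchange state in any order converge on the same set.
    A G-Set cannot express deletion; that would need e.g. a 2P-Set or OR-Set.
    """

    message_uri: str
    replies: set[str] = field(default_factory=set)

    def add(self, reply_uri: str) -> None:
        self.replies.add(reply_uri)

    def merge(self, other: ReplySet) -> ReplySet:
        if other.message_uri != self.message_uri:
            raise ValueError("replicas describe different messages")
        return ReplySet(self.message_uri, self.replies | other.replies)
```

Two nodes that each saw different replies end up with identical sets whichever one merges first. The 'authority endpoint' alternative trades this property for simplicity: one node holds the canonical set and everyone else pushes to it and refreshes from it.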

namespace authority/mappings
To be useful, identifiers must resolve. IDX offers one promising implementation of a resolver with identity mapping and a permissionless, decentralized public ledger. Existing provider-based identifiers may be mapped to a global bluesky identifier. The Handshake protocol may be used for decentralized consensus on mappings.

Messages should be signed or have a verification endpoint. dweb users can sign individual messages easily using an interface such as MetaMask. Verification endpoints at platform providers may expire; context servers wishing to display a message may then offer secondary verification, attesting that they had seen the original verification succeed.
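The verification-endpoint option, and the secondary verification offered by context servers, can be sketched without any cryptographic library. In this Python sketch a provider vouches for a message digest, can withdraw the vouch (the deletion that an embedded signature cannot support), and a context server records when it last saw a successful check. All class and method names are hypothetical, and hashing canonical JSON stands in for whatever canonical wire encoding is eventually chosen.

```python
from __future__ import annotations

import hashlib
import json


def digest(message: dict) -> str:
    """SHA-256 over a canonical JSON encoding (an assumption; a real system would
    more likely hash a fixed canonical wire encoding such as deterministic CBOR)."""
    canonical = json.dumps(message, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class VerificationEndpoint:
    """Hypothetical provider-side registry of messages it vouches for."""

    def __init__(self) -> None:
        self._vouched: dict[str, str] = {}  # message digest -> author identifier

    def publish(self, author: str, message: dict) -> str:
        d = digest(message)
        self._vouched[d] = author
        return d

    def delete(self, message: dict) -> None:
        # Unlike an embedded signature, a vouch can be withdrawn.
        self._vouched.pop(digest(message), None)

    def verify(self, author: str, message: dict) -> bool:
        return self._vouched.get(digest(message)) == author


class ContextServer:
    """Remembers when it last saw a successful verification, so it can offer
    secondary verification after the provider's endpoint has expired."""

    def __init__(self) -> None:
        self._seen: dict[str, float] = {}  # message digest -> time of verification

    def check(self, endpoint: VerificationEndpoint, author: str, message: dict, now: float) -> bool:
        ok = endpoint.verify(author, message)
        if ok:
            self._seen[digest(message)] = now
        return ok

    def secondary_verification(self, message: dict) -> float | None:
        return self._seen.get(digest(message))
```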

In addition, bluesky will need to map to external communication schemas as well as register new vocabularies such as reputation assertions. Mapping to existing schemas may be automated, possibly by inspection of actual messages and modeling patterns.

Local-context human-readable mappings (e.g. petnames) aid usability and discoverability and, if exposed, can form the beginnings of a simple web of trust.
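A petname system maps names that mean something only to the local user onto global identifiers, and can display any identifier through the user's own names. The Python sketch below shows that mapping plus one way exposed books could seed a web of trust ("Bob calls this identity 'Alice (work)'"). Every name here is hypothetical, and global identifiers are assumed to be plain strings such as DIDs.

```python
from __future__ import annotations


class PetnameBook:
    """Local, user-chosen names for global identifiers."""

    def __init__(self) -> None:
        self._by_name: dict[str, str] = {}  # petname -> global identifier
        self._by_id: dict[str, str] = {}    # global identifier -> petname

    def assign(self, petname: str, global_id: str) -> None:
        if self._by_name.get(petname, global_id) != global_id:
            raise ValueError(f"petname {petname!r} already refers to another identity")
        old = self._by_id.get(global_id)
        if old is not None:
            del self._by_name[old]  # an identity keeps only one local name
        self._by_name[petname] = global_id
        self._by_id[global_id] = petname

    def resolve(self, petname: str) -> str | None:
        return self._by_name.get(petname)

    def name_for(self, global_id: str) -> str | None:
        return self._by_id.get(global_id)

    def display(self, global_id: str) -> str:
        # Show the local petname if there is one, else the raw global identifier.
        return self._by_id.get(global_id, global_id)


def suggested_names(global_id: str, contact_books: dict[str, PetnameBook]) -> dict[str, str]:
    """If contacts expose their books, collect the names they use for an identity."""
    suggestions: dict[str, str] = {}
    for contact, book in contact_books.items():
        name = book.name_for(global_id)
        if name is not None:
            suggestions[contact] = name
    return suggestions
```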

trusted and trustless modes
Recognition of a trusted endpoint may enable more efficient interactions, or a wider range of them, with that endpoint, just as internal company networks expose more capabilities to internal endpoints than to external ones. Trustless interactions, enabled by heavier-weight signatures and ledgers, should also be possible. For example, we may trust the Twitter API to correctly represent a user without requiring individually signed messages, while new platforms may require some form of individual digital signature.

Ocaps (object capabilities) enable the granular and revocable trust settings that will likely be needed.
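"Granular and revocable" maps onto two classic object-capability patterns: attenuation (wrap a powerful capability so the holder can do only one narrow thing) and the caretaker (hand out a forwarder whose access you can later cut off). A sketch using Python closures; the posting function and context names are hypothetical stand-ins, and a real ocap system would also need unforgeable references across the network, which closures in one process do not model.

```python
from __future__ import annotations

from collections.abc import Callable


def attenuate_post(post: Callable[[str, str], str], context: str) -> Callable[[str], str]:
    """Granular: the holder may post, but only into one named context."""
    def post_here(text: str) -> str:
        return post(context, text)
    return post_here


def make_revocable(capability: Callable[..., object]) -> tuple[Callable[..., object], Callable[[], None]]:
    """Caretaker pattern: give out the forwarder, keep the revoke function."""
    revoked = False

    def forwarder(*args: object, **kwargs: object) -> object:
        if revoked:
            raise PermissionError("capability has been revoked")
        return capability(*args, **kwargs)

    def revoke() -> None:
        nonlocal revoked
        revoked = True

    return forwarder, revoke
```

Composed, these give a trusted service the right to post into a single context, which the granting side can withdraw at any time without touching the underlying posting function.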

Federated governance in particular is needed for trusted modes, with known human or organizational responsibility for specific servers or subnets.

reputation: modeling and back-propagation of error

Granular reputation in a decentralized space is a key problem and may require some novel solutions. The end result must be an endpoint that provides a reputation score for a user or message. To get there, possible innovations might include:

- a global blockchain of credibility-staking assertions of direct knowledge, e.g. I saw this; I know who reported it; etc.
- local credibility models, including Havelaar immediate-web calculations and Iris circle analysis for external links
- encouragement of signal-rich protocol and UI features beyond 'likes', e.g. shared bookmarks
- live random 'juries' to anchor source of truth with strong back-propagation (learn from the Aragon implementation)
- manual recognition of anchors for source of truth, e.g. organizations like Snopes or DBpedia (could be customized)
- measurement of human-or-not, geolocation and other simple, provable assertions to anchor credibility
- a Wikipedia-like community of moderators with reputation scores, especially valuable for multiple-language content streams. Moderators are encouraged to debate the quality of sources.
- retroactive trust propagation - after the truth of an issue is established, retroactively adjust the credibility of sources of false reports (the Khashoggi killing is a good example)
- 'undercover hoaxes' - intentional misinformation, and tracking of the response to it, may be valuable for evaluating arbiters. Obviously this must be done carefully to avoid causing harm, possibly in cooperation with third parties.

All of the above would feed into AI models for determining reputation. Automated model retraining could include rules-based adjustments to connection strengths based on high-cost manual determinations.

In general, models should include the notion of a 'first-hand observer' versus a 'reporter at n hops' of real-world truths, and should in some cases model simple statements as having a real-world truth value that can inform the credibility of judges in more complex areas.
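Retroactive trust propagation and the observer-versus-reporter distinction can be combined in a very small model: once a claim's truth is established, nudge each reporter's credibility toward 1.0 or 0.0, with first-hand observers adjusted most and reporters at n hops adjusted less. The Python sketch below is only an illustration of that idea; the update rule, `rate`, `hop_decay` and the neutral prior of 0.5 are all assumptions, not a proposed model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    source: str      # identifier of the reporting user or outlet
    claim_id: str
    asserted: bool   # what the source claimed
    hops: int        # 0 = first-hand observer, n = reporter at n hops


def retroactive_update(credibility: dict[str, float], reports: list[Report], claim_id: str,
                       truth: bool, rate: float = 0.5, hop_decay: float = 0.5) -> dict[str, float]:
    """Return updated scores once the truth of claim_id is established.

    Each matching report moves its source's score toward 1.0 (reported correctly)
    or 0.0 (reported falsely) by a step of rate * hop_decay**hops. With rate and
    hop_decay in [0, 1], scores stay in [0, 1]. The input dict is not modified.
    """
    updated = dict(credibility)
    for r in reports:
        if r.claim_id != claim_id:
            continue
        prior = updated.get(r.source, 0.5)  # unknown sources start neutral
        target = 1.0 if r.asserted == truth else 0.0
        step = rate * hop_decay ** r.hops
        updated[r.source] = prior + step * (target - prior)
    return updated
```

For instance, a first-hand witness at 0.5 who reported correctly moves to 0.75, while an outlet at 0.8 that repeated a false report at two hops drops only to about 0.7.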

In addition, funding should be allocated for a hotline for cases in which individuals are in immediate physical danger.

customization and personalization, control of private data
A secure store, such as the 3Box or Solid implementations, for a user's social graph and personal preference data may be desired. This could be shared with trusted services (or enable the creation of a smart client) to personalize the user's experience. Existing work such as MetaMask's integration may be used to streamline this effort.

Services may need to create a depersonalized fingerprint of a user that AI models can run on, with final filtering and ordering performed only by the trusted local client on the user's device, which has full access to the user's preferences and social graph.
This bucketed or layered approach to personalization may allow acceptable tradeoffs between privacy and an intelligent personalized feed. Untrusted clients may be offered secure rendering of private data.
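The layered approach can be sketched as three functions: the device derives a coarse fingerprint, an untrusted service ranks candidates using only that fingerprint, and the trusted local client reorders them with the full private data. In this Python sketch the fingerprint is just the top-k interest tags, posts are dicts with `id`, `author` and `tags`, and the +1.0 bonus for followed authors is an arbitrary assumption; all of it is illustrative.

```python
from __future__ import annotations


def fingerprint(interests: dict[str, float], top_k: int = 3) -> frozenset[str]:
    """Depersonalized fingerprint: only the top-k interest tags, with no weights,
    identity or social graph. This is all an untrusted service would see."""
    ranked = sorted(interests, key=lambda tag: (-interests[tag], tag))
    return frozenset(ranked[:top_k])


def service_rank(posts: list[dict], fp: frozenset[str], limit: int = 10) -> list[dict]:
    """Untrusted service: coarse candidate ranking by tag overlap with the fingerprint."""
    return sorted(posts, key=lambda p: len(fp & p["tags"]), reverse=True)[:limit]


def client_rerank(candidates: list[dict], interests: dict[str, float], follows: set[str]) -> list[dict]:
    """Trusted local client: final ordering using the full private preference
    weights and social graph, which never leave the device."""
    def score(p: dict) -> float:
        affinity = sum(interests.get(t, 0.0) for t in p["tags"])
        return affinity + (1.0 if p["author"] in follows else 0.0)
    return sorted(candidates, key=score, reverse=True)
```

The service can still do the heavy lifting over a large corpus, but the final, most personal ordering depends on data it never receives.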

Users may take control of their own inbox by requiring 'stamps' with direct micropayments before they view a message, or other filter mechanisms. Stamps with conditions for redeeming may be part of the message envelope under 'conditions-of-use'.
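An inbox gate based on stamps might look like the Python sketch below: known contacts pass freely, and anyone else must attach a stamp under conditions-of-use that names this recipient, meets a minimum amount, and has not expired. The `Stamp` fields, the `admit` function and the contacts exemption are hypothetical choices; a real system would also have to confirm with the payment network that the stamp is genuine and unspent, which this sketch does not attempt.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Stamp:
    amount: int       # in a hypothetical smallest payment unit
    payable_to: str   # identifier of the recipient who may redeem it
    expires_at: float # UNIX time after which it can no longer be redeemed


def admit(message: dict, recipient: str, contacts: set[str], min_amount: int, now: float) -> bool:
    """Inbox gate: contacts pass; others need a redeemable stamp in conditions-of-use."""
    if message.get("author") in contacts:
        return True
    stamp = message.get("conditions-of-use", {}).get("stamp")
    if not isinstance(stamp, Stamp):
        return False
    # Only the stamp's declared terms are checked here, not its validity as payment.
    return stamp.payable_to == recipient and stamp.amount >= min_amount and now < stamp.expires_at
```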

payment capability
In order to become self-supporting, bluesky will need to implement server-to-server payment mechanisms, such as Hedera Hashgraph tokens. This will allow bluesky to offer valuable services to external users, such as a shared reputation service or AI platform service, that are core to many vertical business models. This work will come later in the project.

Conversely, in order to attract a large developer community, gitcoin and other payment mechanisms should be used to involve many developers globally. Micropayments by end users should also be encouraged/enabled where possible (BAT, Puma)

end user experience

Storing user experience as a replayable stream of immutable posts that the user 'experienced' - actually saw, read, wrote or otherwise interacted with - will allow a variety of slice and dice local databases and UIs to present these in useful ways.

Design pattern:

expressive language => bluesky events => UI rollup
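The middle and right of this pattern (bluesky events => UI rollup) can be sketched as an append-only, time-ordered log of what the user actually experienced, replayed into whatever view a UI needs. In the Python sketch below the event kinds and the per-post rollup are hypothetical examples of one such view; the "expressive language" front end is not modeled.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from collections.abc import Iterable, Iterator
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: float      # when the user experienced the post
    kind: str      # hypothetical kinds: "saw", "read", "wrote", "replied"
    post_uri: str


class ExperienceLog:
    """Append-only, time-ordered log of immutable experience events."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, event: Event) -> None:
        if self._events and event.ts < self._events[-1].ts:
            raise ValueError("events must be appended in time order")
        self._events.append(event)

    def replay(self, since: float = 0.0) -> Iterator[Event]:
        return (e for e in self._events if e.ts >= since)


def rollup_by_post(events: Iterable[Event]) -> dict[str, Counter[str]]:
    """One possible UI rollup: for each post, how the user interacted with it."""
    views: defaultdict[str, Counter[str]] = defaultdict(Counter)
    for e in events:
        views[e.post_uri][e.kind] += 1
    return dict(views)
```

Because the log is never rewritten, a new UI can be built later simply by writing a new rollup and replaying history through it.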

Streams of new content from external sources, optimized for the user, can be integrated as well.

Design pattern:

firehose => filters & indexes => personalized pubsub
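In miniature, this pattern means each message from the firehose is indexed once and then fanned out to every subscriber whose filter accepts it. The Python sketch below assumes messages are dicts with optional `author` and `tags` fields and that each subscriber's filter is a plain predicate; the class and method names are hypothetical, and a production system would shard the index and push deliveries asynchronously rather than keep in-memory lists.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Callable


class PersonalizedPubSub:
    """firehose => filters & indexes => personalized pubsub, in one process."""

    def __init__(self) -> None:
        self._filters: dict[str, Callable[[dict], bool]] = {}  # subscriber -> filter
        self._inboxes: dict[str, list[dict]] = {}              # subscriber -> matched messages
        self._tag_index: defaultdict[str, list[dict]] = defaultdict(list)

    def subscribe(self, user: str, keep: Callable[[dict], bool]) -> None:
        self._filters[user] = keep
        self._inboxes.setdefault(user, [])

    def ingest(self, message: dict) -> None:
        """Consume one firehose message: index it, then deliver it to matching subscribers."""
        for tag in message.get("tags", ()):
            self._tag_index[tag].append(message)
        for user, keep in self._filters.items():
            if keep(message):
                self._inboxes[user].append(message)

    def inbox(self, user: str) -> list[dict]:
        return list(self._inboxes.get(user, []))

    def by_tag(self, tag: str) -> list[dict]:
        return list(self._tag_index.get(tag, []))
```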

* bluesky will support new communication in existing user experiences - native twitter users, for example, will not be required to change their user experience in order to participate in the wider conversation
* it will also enable innovation around the user experience by allowing any platform to create a portal into the wider conversation.
* fat clients and encrypted personal data should enable a richer and more personalized user experience including rapid and structured access to personal message history.

bluesky itself will create a demonstration user experience. Candidate features (not all!) might be:

- Power dashboard leveraging local indexing of past tweets to create a rich UI - pin, organize, search, analyze my interactions
- Faceted feeds, view 'as-if' another user, image/video only
- Simple Slack-like commands, or **expressive language** using the GPT-3 beta. Voice integration. Twitter meets Trello.
- Expose reputation UI elements such as a bullshit meter or bubble detector/popper (Iris-based)
- Best responder - typeahead for all contacts + automatically using the 'best' way to reach a given contact across platforms

Many options are possible; the exact features will depend on a design process that explores users' most pressing needs and pain points. Rapid innovation should be possible because of the small number of users initially on the system, and the resulting low stakes and low traffic requirements.

Existing open source dweb apps such as Planetary should be leveraged and contributed to. Login solutions such as MetaMask will likely be essential. Preexisting work should be built on whenever possible.

Some kind of smooth integration with encrypted private messaging will be important for when those in a public conversation want to speak privately. Possibly the robust Telegram API may be used for this, or integration with P2P libraries.

integration with / adoption by existing systems

The minimal requirements to participate are already met by several existing implementations, requiring only translation to the bluesky message envelope. Standing up a bridge may however require resources, in order to meet the shared storage or replication requirements for participating. Thus the initial bluesky testbed is likely to include modified Matrix servers, ActivityPub implementations, and possibly a GUN stack.

The intention is for development to be strongly driven by early integration with existing systems carrying real traffic. In the process, the bluesky team will likely collaborate closely with existing teams, and will either develop open source libraries and bridge APIs that they can adopt or simply fund their development. bluesky may also provide resources, e.g. by running bridges.

Based on metrics of performance under load, security and other considerations, bluesky will likely choose one of the routing and delivery mechanisms as a 'native' layer; but it would be premature to make such a choice before interoperability has been achieved and metrics compared at scale.

At some point bluesky will start to charge for the bridge service which may drive existing systems to adopt libraries to interact more natively.

monetization

Advertising as a decoupled service, similar to Google's AdWords, may bring initial revenue to bluesky. Conditions-of-use may include advertising or payment requirements and methods.

Longer term, creating a pluggable reputation system from decentralized sources is eminently monetizable, across several domains aside from public communication - see long term plans, below. Supporting richer messages possibly including requests for products and signed transactions also opens the door for monetizing communication about transactions. bluesky will look for ways to monetize intentional activity beyond advertising to achieve a sustainable business model.

The ability to choose between intentional micropayments and viewing of ads may be supported at the level of the user experience provider.

anticipated challenges

* resistance to adoption by all players - each existing platform might prefer to have others simply adopt or merge with it
* privacy and security, protecting pseudonymity - there is a real chance that attackers will gather and analyze messages in order to do harm to individuals, and that this is a rare use of the network and so may not get sufficient attention. A 'hotline' to securely report such activity will be necessary to act quickly when it occurs. (and the hotline may itself be abused)
* excessive complexity - the ability to include structured content and namespaced vocabularies will invite complexity. Some kind of recognition for simple services that are gathering steam may help to guide developers into shared paths. A public forum, Zoom discussions, a regular podcast, and other 'convergence' forces may also help foster communication.
* the desired outcome is more than a 5-person team can do in a few years. To achieve the larger goals it will be necessary to motivate a developer community, who are able to realize some of their own goals and to benefit directly from the work and/or gain a level of ownership of the project. This has a risk of distraction, but may be necessary to realize the goals of the project.
* unanticipated challenges - another possibility is that we may have to reduce scope to achieve key goals.

plan for experimentation / refinement in first 6-12 months

month 1: In-depth conversations with engineering teams of existing projects about requirements for integration, in particular gathering requirements and concerns from the Twitter API team and determining the likely rollout of access to Twitter endpoints. Standing up a repo and some simple deployments. Hiring of the initial team. Creating a public presence and recruiting a community.

months 2-7: MVP - simplest possible cross-platform communication. Iteration with a small alpha user set to reach a proof of concept of communication across platforms using the native bluesky protocol internally. In parallel, work on some demonstration of value (UI or discovery). Address efficiency/scalability of the internal protocol. Develop metrics to track performance. Cultivate a developer community.

months 8-12: Public iteration with a fixed set of beta users on existing platforms. Preparation for opening more widely. Implementation of the reputation/reporting service. Alpha launch of the UI with usability testing. Develop metrics to track adoption and reporting. Report progress publicly and expand the beta to the developer community.

org structure and long term plan

The initial organization will be a traditional nonprofit that is expected to give birth to a new form of organization with the formation of a 'bluesky constitution' within two years.

The intention is for the final form of bluesky to incorporate democratic governance mechanisms, including work-weighted stake by developers, possibly through a DAO. To this end, approved contributions of developers will be tracked with the intention of giving all contributors some form of recognized stake in the new organization. Core rights of users should also be recognized, such as freedom from physical harm or the right to privacy. The mission of bluesky should be legally recognized, possibly in a B corporation. In particular, administration of the reputation service may require a governance mechanism for decision-making. Community building will be prioritized, e.g. through multi-stakeholder hackathons connected with the startup ecosystem.

bluesky will aim to have multiple sources of support by the end of five years, likely including services that provide direct value to clients. The community may choose to create spin-off vertical applications under the bluesky organization that may also provide income streams. The key requirements of, for example, Upwork or Amazon can be achieved in a decentralized manner with some decoupled centralized services (for which alternatives could be chosen). These requirements are:

- a way for the user to express what they are trying to find, e.g. a contract worker or a book to order
- a way for the user to instantly discover and browse potential providers
- a way for providers to respond to the user's expressed wish
- a public reputation score
- smart contracts or micropayment streams between participants

If these generic capacities can be provided, capability for any type of transaction can be built on top of them and should be financially rewarding. bluesky may participate in or collaborate with domain specific partners in parallel with protocol development.

The primary purpose of the bluesky organization however, will be to promote durable and open technologies that support a public conversation.

Many concepts and priorities in this proposal came out of discussions with others, in particular Chris Webber (ocaps, petnames, stamps), Matthew Hodgson (shared storage, bridges, puppets), Mark Nadal (CRDTs, the AI schema idea, Iris' algo), Eugen Rochko (importance of federation; deletion), Danny Zuckerman (MetaMask), Michael Sena (DID resolving, routing), Sarven Capadisli (importance of fat clients), Jeremie Miller (adoption behavior), Ian Preston (trust and reputation), Jay Graber (overall ecosystem), Rahul Kothari (Hedera efficiency, overall feedback), Dan Genduso (governance models, overall feedback), Tim Bray (OIDC, wire format, other specifics), Robert Schwentker (governance, community building), and Jake Brukhman (public good DAOs).