Real Time Editing is Kind of Hard

(image from Figma real-time editing post)

I was just looking through Quip’s recently released Live Apps documentation. If you are not familiar with Quip, it is a document system that baked in real-time editing, communication and collaboration from the start. It was founded by Bret Taylor, formerly CTO of Facebook and was acquired by SalesForce in 2016.

Quip started going down the compound document / universal canvas path when it added embedded spreadsheets to what was originally a pretty basic rich text canvas. The Live Apps extensibility model now allows new types of components to be created and embedded in the document canvas.

These are effectively mini web apps that can leverage the Quip SDK to support real-time editing. I have been building compound document editors since the mid-80’s so it’s interesting to see another take on this. Building on JavaScript and web technologies makes complete sense right now since there is so much tech available for your developers to build on. The fact that Quip’s document canvas (both web and mobile) is using an HTML surface also makes this a natural approach for them. Microsoft Office also used web technologies as the basis of their newest extensibility mechanism released over the last few years for similar reasons.

When you start embedding objects on a document surface, you pretty immediately get into the difference between linking and embedding. The question is whether the state of the object is directly stored with the document or whether the object just links to some state stored in some other service. The advantage of embedding is that cut/copy/paste within the document surface works naturally. This straight-forward behavior also applies if the document as a whole is copied or managed in some other way.

Linking gets complicated (in general, not just for Quip). I will avoid exploring those issues here since I want to focus on real-time editing and the functionality Quip provides for embedding state. So let’s stick with the embedding case for this post.

Quip’s whole focus is on real-time editing and collaboration, so they really wanted to make sure this was supported for their new Live Apps objects. The approach they took is to expose a set of APIs that allow an application to build up its state as a set of key-value dictionaries (“Records”) and arrays of these dictionaries (“RecordLists”). Local changes to these objects made through editing actions are synchronized in real-time between different instances of the app and this state is then preserved with the document.

The documentation does not discuss it (or at least I was not able to find anything there) but I presume they are using something like Operational Transformation (OT) in order to allow for the real-time merging of changes to the object state when multiple users are editing a single object at the same time. OT is what Google uses for their real-time APIs. These APIs came out of the Google Wave project and then ultimately became a key part of the underpinning of Google Apps. The OT protocols for the arrays and dictionaries Quip exposes are well understood (I have a pretty complete implementation that I wrote myself up on GitHub for anyone that wants a deeper look at this technology).

Real-time collaborative data APIs have had a tough road over the last few years. Apple extended CoreData to support iCloud data syncing but got whacked upside the head by unhappy developers who tried to use the APIs and ran into problems. They are now in the process of deprecating those APIs rather than enhancing them. DropBox provided a set of data APIs with functionality similar to what Quip is providing but have also since deprecated them. Google’s APIs are still available but based on traffic on StackOverflow there does not seem to be wide adoption (happy to find I am wrong about that but I don’t hear much about them).

There are certainly a range of reasons for the failure of these APIs. If “data is the new oil”, putting your data in someone else’s data store is a risky proposition. (Note that in these scenarios, the app has access to the data but the app writer does not since it is stored in an external service and gated by user authentication.) It is a platform choice that significantly limits your future choices — you have tied the heart and soul of your app (including user authentication) to Google or Dropbox or Apple’s iCloud.

I think a more basic reason for the problems is that the APIs just don’t work — and that has direct consequences for Quip’s Live Apps. That statement is a bit unfair — the APIs work as defined but don’t actually do what developers want or need. The problem is that apps are complicated — and that complication ends up intersecting with real-time merge in messy ways. The restriction to use objects and arrays is not overly constraining from a semantic modeling perspective — anyone building a web app is essentially using these primitives. The challenge is that merge happens at the level of these primitive elements and additional app-specific semantic constraints cannot be enforced when the merge happens. So although conflicting edits are properly resolved through OT at a primitive level, the application is left in a semantically inconsistent state.

These issues can be subtle — it is a bit like looking for data race bugs in multi-threaded programming. They only tend to occur when you actually have multiple users pounding on your app — which of course are exactly the most high-profile high-value times for the app developer.

Let me explore a pretty simple app to give you a flavor for the issues. Let’s say I want to build a “King of the Hill” Live App. I can add and delete names of “players” and I can specify one player as the king of the hill. The constraint is that I want to ensure that if there are one or more players, one of them is the king. Let’s look at different ways to model this and what happens with merge.

Let’s say I have a root Record that points to a RecordList of non-king players and also points to a Record representing the king. To make a new king, I insert the current king into the RecordList of players, delete the player I want to make king from this RecordList and set that player as the king in the root. At the OT level, this gets ugly. If these actions happen at the same time in two different instances, the OT resolution would insert the old king into the player list twice, and would overwrite the current king with the last arriving king in the root record, essentially losing the other new king. So I am broken in two ways — I have the same player in the list twice and I lost a player along the way. It was a perfectly fine way to model my app, but it fails abysmally at the OT level.

OK, let’s try again. Let’s just have one RecordList of players and then have a “king” boolean property that I set on one of them. Setting a new king involves clearing this property on the old king and then setting it on the new king. This fails pretty simply — if I merge two operations I end up with two records with the king property set to true.

Note that this has nothing to do with whether you can place transactional boundaries around multiple independent edits to the data structures (setting and clearing the king property in this case). The problem is that OT is not applied at this higher semantic level — it still is applied at the level of primitive data structures. The transactional functionality that Google’s real-time APIs expose is not about solving the OT merge problem but rather is just focused on ensuring that individual edits arising from one client do not arrive at other clients independently and expose intermediate inconsistent states. I assume Quip defines an implicit transaction around all edits executed sequentially by a Live App object to address this issue without having to make the exposed API more complex.

OK, one more. Instead of storing the king as an explicit record, let’s keep all the players in the record list and just store the ID of the king record in the root. So setting a new king just involves setting this ID. If two instances do this at the same time, the last instance wins and I am still semantically consistent. Oops, what about deleting players? Now if one instance sets a player as king at the same time as another instance deletes that player, I end up with a king ID pointing to a non-existent player.

I do have an approach that will work inspired by that Figma post I linked to above (essentially use arbitrary precision real numbers to specify ordering of players and have lowest ordered player be king). The other common approach is rather than modeling the explicit state, just model the operations themselves as a journal of actions. OT is trivial since I am always just appending actions to a RecordList. The app logic in the client “plays” the journal and can apply semantic corrections as necessary since it has the full history. That approach is actually similar to the approach I used in modeling a chess game in the GitHub link above. It does seem like cheating a bit. It also has the consequence that the state can grow without bound.

If I was actually building a web app with a service back end, I clearly would just define a set of actions exposed by the service and the service would serialize them and guarantee the semantics are maintained as the actions are applied. If I was trying to implement this using OT, I would define the operations and the transformations on them at a higher level of semantics than arrays and dictionaries. The key point is I have a place to write custom code to enforce custom semantics. The other key point is that trying to understand whether OT will break your application semantics is quite difficult — even for a very trivial example like this.

If my Microsoft Office experience is any guide, the reality is that most useful Live Apps will actually be links to information that is stored and accessible elsewhere (like the Salesforce record Live App example they provide in their set of sample apps) rather than complex objects with complex embedded state. Trying to actually create a universal canvas by building on these kinds of isolated extensible objects ends up running into lots of deep challenges beyond real-time editing. But that’s a topic for a different post.

[Edit and clarification: A conversation with Quip engineering clarified that they are NOT using OT for the resolution and merge of conflicting edits, but that the mechanism they are using has similar constraints — as it inherently must since the APIs have no way for a Live App developer to provide additional semantic information about interdependencies between different parts of the state tree that could influence the merge process.]