Facial Computing: A Brief History, and the Promising Future, of Personal XR

Written by ahrwhitford | Published 2023/06/15
Tech Story Tags: ai | xr | apple | augmented-reality | future-of-ai | artificial-intelligence | hackernoon-top-story | ai-applications | hackernoon-es | hackernoon-hi | hackernoon-zh | hackernoon-vi | hackernoon-fr | hackernoon-pt | hackernoon-ja

TLDR: Apple’s launch of the Vision Pro has produced mixed reactions. However, even if sales flop in Year 1, its 10-year ubiquity feels almost inevitable. In this article I explore: What needs to be true for these devices to be used the way the iPhone is used today? What happens once personal AI agents are built into personal computers? And where is there money to be made on the periphery of Apple’s hardware?

Every June, tech bros, pension portfolio managers and Twitter threadbois the world over turn their eyes to Cupertino as Apple hosts its annual Worldwide Developers Conference (WWDC, or ‘dub-dub’).

Among other things, dub-dub is best known as a launchpad for Apple’s suite of ubiquitous consumer technology: the iPhone, iPad, Apple Watch, and Apple TV have all had new models and major upgrades unveiled at the event.

This year, this stable of consumer devices is joined by an entirely new product for the first time since 2015. That new product is the Vision Pro.

There is nothing I can tell you about the Vision Pro or Apple’s vision for the future of extended reality that a quick browse of Twitter, LinkedIn or the internet at large cannot. The point of this article is instead to try to observe:

  1. how we got here;
  2. why people are building computers for your face; and
  3. the many divergent paths this may lead us down.

Content Overview

  • XR: A Brief History
  • Headsets Today: Code New Worlds
  • The Next Frontier for Spatial Computers
  • Requests for Startups

XR: A Brief History

For the uninitiated, XR refers to extended reality - a catch-all blend of augmented reality (AR, e.g. Pokémon Go), virtual reality (VR, e.g. Oculus Rift), and ‘capital R’ Reality (e.g. Google Street View). Mixed reality is another term that exists largely on the same dimension as AR and thus will mostly be ignored here.

The modern vision for XR begins around 1935. This was the year in which Stanley Weinbaum published Pygmalion’s Spectacles. In this short story, protagonist Dan Burke is disillusioned with the reality in which he finds himself. To remedy this, he puts on a pair of glasses that transport him to a world of eternal youth and happiness. The only catch is that he must obey its rules (T&Cs, in modern parlance).

Fast forward a few decades and we begin to see the first builds of XR in real environments. This begins with Morton Heilig’s ‘Sensorama’ (excluded from the timeline below), a sort of 4D cinema experience that provided the viewer with smells and a vibrating chair in addition to the sound and sight of the film.

Also excluded from the timeline below is the first military application of XR. In 1961, the Philco Headsight became the first headset with motion-tracking technology.

Though it has since been discontinued, the Oculus Rift was the first headset to gain real commercial appeal. Following Oculus, many of big tech’s household names joined the action. Curiously, most had different visions for who the primary customer base for these headsets would be.

The dominant expected use case for most was in immersive entertainment applications, more specifically in gaming. This was the path followed by Oculus, Sony through their PlayStation VR, and Meta with the Meta Quest line of products.

Others chose to focus predominantly on enterprise. Early pioneers in this segment included HTC (with the Vive) and Magic Leap. These early movers focused on things like enterprise collaboration platforms (much like some of the metaverse plays described below), emergency response training, and medical education.

Then, the metaverse hype cycle. Corporations the world over began to envision what it would look like if we took our lived environment but made it virtual. The vision was essentially Second Life, but this time with less porn and more corporate work stuff.

In the wake of COVID-19, the rise of remote work created a supposed need for collaborative digital environments. Microsoft joined the party with its own industrial metaverse. Accenture launched a ‘metaverse services’ division. Facebook changed its name to Meta and began spending $1bn per month on this vision, culminating in this brilliant presentation.

I have had fun panning the metaverse, but the hype cycle around it was an important moment in raising awareness of the arrival of what Apple would refer to this week as spatial computing. As this space evolves, there is every chance that many of these ideas come back into vogue.

Other use cases for commercial XR varied from the practical to the benign. Some of them are beginning to become commonplace in popular web back-ends; you can take your pick as to which fits which description.

Most of the existing devices listed above tried to achieve some sort of balance with the use cases they were targeting. The most successful to date by sales, the Meta Quest, did so by positioning hard as a gaming device.

Normally, this would be a cautionary tale. However, if anyone is to buck this trend it would be Apple. Apple has a broad suite of already ubiquitous products (1.6bn active iPhones, to name one) that can and will serve as secondary endpoints for Vision Pro applications.

While Apple’s inaugural demos for the Vision Pro were all hosted within working environments, it is more than likely that it will end up being a general-purpose XR device. I foresee its role as being a form of spatial augmentation for Apple’s existing ecosystem of products (e.g. Maps, FaceTime, Siri et al.) as its vision for computers shifts modalities. More on other potential applications later in this piece.

Headsets Today: Code New Worlds

Spatial Computing

Rather than ride the wave of today’s tech buzzwords, Apple used its informational market-making power to stamp its authority on a new one at WWDC.

https://twitter.com/tim_cook/status/1665806600261763072

While it sounds intuitive enough on the surface, let’s delve a little bit deeper into what this term may actually convey going forward.

Besides the obvious branding benefits of coining a buzz phrase, the use of the term ‘computing’ feels extremely deliberate. It represents a logical next step from previous eras of ‘desktop computing’ and ‘mobile computing’. Just as Apple has done with the previous generations of personal computers, it will aim to make this concept and terminology ubiquitous.

To quote Tim Cook from WWDC:

“In the future, you’ll wonder how you led your life without augmented reality”

Besides the change in hardware, how is spatial computing functionally different from its predecessors?

Arguably the most important immediate shift will be from working in 2D to working in 3D. Tasks such as visualization, workspace navigation (i.e. clicking and moving things), and collaboration will all be enabled in ways not seen under previous modalities.

In the longer term, contextual awareness and interconnectivity are more likely to prove the groundbreaking features of spatial computing.

  • Contextual awareness refers to the ability of the computer to:

    i) adapt existing recommendations and notifications to a user’s sensory context (e.g. recommendations based on location and movement patterns, automatic changes to screen presentations based on user habits, etc.) and

    ii) provide customized (possibly agentic) assistance to users as they complete tasks in extended reality.

  • Interconnectivity refers to the user’s ability to push extended reality beyond the device itself. Beyond just working and collaborating on holographic interfaces, users can control and interact with other devices in their environment. To take a boring example, users can change smart home settings from within the device (a rough sketch of what that might look like follows below).

In the longer run, interconnectivity may see Siri’s role as a virtual assistant evolve from a voice-commanded search aggregator into a living assistant that can help complete tasks from within the device based on the user’s previous habits and preferences. As the baseline capabilities of autonomous agents advance, the combination of Apple’s distribution network and the familiarity of Siri may make it the killer technology for bringing personal agents to the world en masse.
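
To make that interconnectivity example slightly more concrete, here is a rough Swift sketch of toggling the lights from a headset app. The HomeKit types used (HMHomeManager, HMServiceTypeLightbulb, HMCharacteristicTypePowerState) are the standard iOS ones; whether HomeKit is exposed this way on visionOS, and the LightSwitcher wrapper itself, are assumptions for illustration rather than anything Apple has announced.

```swift
import HomeKit

// Illustrative sketch only: drive the user's existing HomeKit accessories from a
// spatial app. The LightSwitcher class and the overall flow are hypothetical.
final class LightSwitcher: NSObject, HMHomeManagerDelegate {
    private let manager = HMHomeManager()

    override init() {
        super.init()
        manager.delegate = self   // HomeKit loads the user's homes asynchronously
    }

    // Called once HomeKit has finished loading the user's homes.
    func homeManagerDidUpdateHomes(_ manager: HMHomeManager) {
        guard let home = manager.homes.first else { return }

        // Find every power-state characteristic on every lightbulb service...
        for accessory in home.accessories {
            for service in accessory.services where service.serviceType == HMServiceTypeLightbulb {
                for characteristic in service.characteristics
                where characteristic.characteristicType == HMCharacteristicTypePowerState {
                    // ...and switch the lights on from wherever the user is standing.
                    characteristic.writeValue(true) { error in
                        if let error { print("Failed to toggle light: \(error)") }
                    }
                }
            }
        }
    }
}
```

The point is less the specific API than the shape of the interaction: the headset becomes one more controller for everything else in the room.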

Information Presentation

As mentioned above, Apple’s demos of the Vision Pro focused almost exclusively on working environments. Make no mistake, this first generation is very much geared toward information workers. How will they use it?

Firstly, there is obvious, well-trodden ground. Apple will re-do and re-design all the enterprise XR solutions we have seen before. It will provide resources for institutions to train their employees, students, and leaders, along with a bevy of familiar try-on, gaming, and cloud TV applications.

One relatively novel example showcased as part of the launch was the ‘Minority Report’-style computer interface. Instead of having to boot up a physical device every time you want to work on a file, you can access it from anywhere in the cloud in an instant.

But how does this make the core experience of personal computing itself different?

Some early ideas:

  • Unlimited monitors at any given point in time (see the sketch below)

  • Multimodal forms of interacting with information work (speech, physical movement, etc.)

  • Intelligent browser navigation through sensory (e.g. eye) tracking

  • Entirely custom 3D screen presentations automatically tailored to user preference

  • 3D data visualization (boring but important)

Each of these will present its own universe of custom ideas, taking the place of the extensions and app stores that came before it.
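
As a toy illustration of the ‘unlimited monitors’ idea, below is a minimal SwiftUI sketch that lets the user spawn as many floating workspace windows as they like. The app and view names are hypothetical; the only APIs it leans on are the standard multi-window scene tools (WindowGroup and the openWindow environment action) that SwiftUI already ships, and which visionOS lets users place freely around the room.

```swift
import SwiftUI

@main
struct SpatialWorkspaceApp: App {   // hypothetical example app
    var body: some Scene {
        // The launcher window from which new "monitors" are spawned.
        WindowGroup(id: "launcher") {
            LauncherView()
        }

        // A reusable window scene: every value passed to openWindow
        // creates another independent, freely placeable window.
        WindowGroup(id: "monitor", for: Int.self) { $index in
            Text("Virtual monitor #\(index ?? 0)")
                .font(.largeTitle)
                .padding()
        }
    }
}

struct LauncherView: View {
    @Environment(\.openWindow) private var openWindow
    @State private var count = 0

    var body: some View {
        Button("Open another monitor") {
            count += 1
            openWindow(id: "monitor", value: count)   // one more floating screen
        }
    }
}
```

The interesting design question is less how to open the windows than how the system arranges them around the user and persists that layout between sessions.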

AppleLM

This tweet from Cognosys AI founder Sulaiman Omar feels prescient. Apple does indeed have a history of sacrificing first-mover advantage for the sake of developing a better product over a longer timeframe. The Vision Pro is a perfect example of this, arriving roughly a decade after the first commercially available Oculus Rift.

Integrating language models into its headsets will open up a number of possibilities for Apple that aren’t available to incumbents without their own proprietary LLMs (note: Meta’s LLaMA may put them in the same boat - Balaji Srinivasan has published a great thread highlighting their potential role in the spatial computing race).

https://twitter.com/balajis/status/1666506719084843009

Among the possibilities that this may usher in:

  • Virtual gaming experiences trained on users’ real-life habits.

  • Immediate queries through the facial ‘browser’.

  • Agentic virtual assistants built into the headset and trained on user habits (see the sketch below).

  • Spatial recommendation engines that recommend actions at a point in time.

  • Real-time translation.

  • Environmental augmentation allowing users to re-arrange objects in their space.

  • Generative content creation that plays out immediately as the user speaks it.
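
For the ‘agentic virtual assistant’ item above, the sketch below shows the rough shape such a loop could take: a language model proposes the next action, the headset executes a matching tool, and the observation is fed back in until the model answers. Every protocol, name, and the "TOOL"/"ANSWER" convention here is invented for illustration; nothing like this exists in Apple’s SDKs today.

```swift
import Foundation

// Hypothetical building blocks: a local or remote model, plus the headset
// "tools" (calendar, smart home, spatial search, ...) it is allowed to call.
protocol LanguageModel {
    func complete(prompt: String) -> String
}

protocol HeadsetTool {
    var name: String { get }
    func run(_ request: String) -> String
}

struct HeadsetAgent {
    let model: LanguageModel
    let tools: [String: HeadsetTool]

    func handle(request: String, maxSteps: Int = 5) -> String {
        var transcript = "User: \(request)\n"
        for _ in 0..<maxSteps {
            // Ask the model what to do next, given everything seen so far.
            let decision = model.complete(prompt: transcript)

            // Assumed convention: "ANSWER <text>" ends the loop,
            // "TOOL <name> <details>" invokes a capability on the headset.
            if decision.hasPrefix("ANSWER ") {
                return String(decision.dropFirst("ANSWER ".count))
            }
            if decision.hasPrefix("TOOL "),
               let tool = tools.first(where: { decision.contains($0.key) })?.value {
                let observation = tool.run(decision)
                transcript += "\(decision)\nObservation: \(observation)\n"
            } else {
                transcript += "\(decision)\n(No matching tool found.)\n"
            }
        }
        return "Sorry, I couldn't finish that request."
    }
}
```

The personalization the bullet points describe would live in what gets folded into the prompt - the user’s habits, location, and on-screen context - rather than in the loop itself.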

But could it be done more elegantly…

The Next Frontier for Spatial Computers

Many a commentator proclaimed the death of Google Glass upon seeing it. Who would want to wear a computer on their face? Why would you opt into looking a bit like a dork? Many have noted that not a single Apple executive even tried a Vision Pro on as part of the launch.

There has to be a better reason for strapping one on than “because everyone else is”.

The key question in the long run then becomes: why would anyone want to strap an XR computer to their face when they could have one inside their brain?

The classic counter to this would be the idea of ‘reprojection’. When everyone is wearing an XR mask, we can reproject photons to make it look as though no one is. Because reprojection involves creating entirely new photons altogether, people can even make themselves look entirely different from how they actually look without the mask. This would become a killer app for catfishing.

The techno-optimist’s concept of living in augmented reality is akin to living in a perpetual lucid dream. Sleep mask when you go to bed, extended reality mask when you get out of it. Examine exhibit A below:

The current paradigm of XR relies on these exoskeletal aids for users to navigate alternative realities. The reality is that we may already be close to no longer needing these aids at all. Enter the brain-machine interface.

BMIs

In the long run, the envisioned reality is likely to trump any form of mixed reality that exists today. I use the term envisioned reality here to describe any form of environmental alteration that:

a) is completely customizable to the user’s requirements or demands, and

b) requires no wearable aids.

The most obvious conduit for envisioned reality today is brain-machine interfaces (BMIs). For a longer primer, I cannot recommend highly enough Tim Urban’s piece on ‘Wizard hats for the brain’ from as far back as 2017.

BMIs offer many of the same ‘spatial computation’ benefits as tools like the Vision Pro but without the need for strapping a weighty device to the face. Neuralink’s existing prototypes for the first commercial BMIs involve invasive implants that, understandably, make many uncomfortable (even if the reward is superhuman intelligence and memory). What about when these become less invasive, more accessible, and thus more palatable to the human public?

Extended reality abounds.


Requests for Startups

  • Better battery systems for today’s XR devices. Mentions of batteries were oddly absent from Apple’s inaugural launch of the Vision Pro. Battery life is less obvious a concern in ‘in the moment’ demos and usage than other design features like interfaces, compute, and weight (as referenced in this thread by Kyle Samani), but it is equally important when the device is used day-in, day-out.

  • Social infrastructure for an XR world. Not since first dates moved from cafés to online chat rooms has the core wiring of how humans interact with one another been so fundamentally rocked. How can behavior be moderated in a world where people can make themselves appear to be anything? (note: Apple already seems to be working on some form of proof-of-identity protocol which may protect against this). What kind of content becomes more pervasive when the range of presentation methods expands so broadly? How do people come together online?

  • Arm the Rebels: Creative Tools for XR. Second Life, Minecraft, and Roblox all created larger-than-expected businesses off the back of secondary marketplaces for digital assets. These marketplaces were enabled by creative tools baked into the very fabric of the platforms themselves.

While there are many developers today already working on designs for new experiences for headsets and metaverse-style projects, how can we make this process more accessible for the layman (more specifically, the 8-year-old Roblocker) to create worlds of their own imagination? What will be the second coming of Minecraft for the augmented reality or ‘envisioned reality’ paradigm?

  • Designs for Lived Experiences (i.e Qualia). A large part of the appeal of virtual/augmented/envisioned reality is the idea of living a different life from the one you have now. Much like Dan Burke in Pygmalion’s Spectacles, anyone will now be able to optimize their lived experiences within virtual worlds.

    In the event that such technologies do become ubiquitous, this creates a giant universal market for qualia - synthetic instances of subjective experience. The design scope is theoretically infinite. How many ways can you experience pure ecstasy? Nostalgia? The pool of rewards for those who can bring these kinds of experiences to the world through some form of extended reality is enormous.

Also underappreciated may be the development of tools for users to capture or develop these qualia themselves. Is there a market for the ‘recording’ of one’s own lived experiences to share with others? How can people engineer new experiences to take to market and compose atop other open-source qualia? How can we provide guarantees of privacy for personal qualia if they begin to be used in recommendation engines?

For the curious, check out the Qualia Research Institute to learn more.

  • Open-Source Hardware & XR Software Marketplaces. One thing that XR still has in common with other computing paradigms before it is its top-down nature. Users are at the whims of Apple for how they want to interact with their phones. If they want to protest this, they can select another of the depressingly finite options on the market. To date, composability and customizability in hardware have been extremely limited.

    But what if we were to open-source the development of headsets themselves through decentralized labs? Active participants could work to modify and iterate upon different hardware specifications to their preference. Modular designs could allow for customization at the layman's level. Just as importantly, these open-source labs would have no incentive to create closed-loop systems for software development. Developers could experiment with, ship, and deploy code that could be available to all headsets at the rate that they can build it.

Such open-source development would represent a step change in the way people interact with both software and hardware. For some inspiration, check out the great work the team at Auki Labs is already doing.

  • Private User Agents. Linked to the point above, the top-down software ecosystem of today means that the incentives built into software tend to be warped. Rather than going all out on user utility, developers are cornered by market incentives into building sub-optimal applications that aim to maximize metrics like ‘average session time’ or ‘clickthrough rates’.

Open-source development studios would have an immense opportunity to outcompete legacy tech companies by building in private, natural-language user agents that act as functional assistants to their users.

Also published here.

The lead image for this article was generated by HackerNoon's AI Image Generator via the prompt "Augmented reality headsets"


Written by ahrwhitford | Writing about big ideas in big markets. Investment Analyst @ Hansa.
Published by HackerNoon on 2023/06/15