3rdspace - Creating Spaces with Spatial Audio

Written by jeffchandler | Published 2022/02/01
Tech Story Tags: video-conferencing | dolby.io | tech-meetups | react | conferencing | spatial-audio | 3rdspace | creating-spaces-spacial-audio

TL;DR: Dolby.io’s Spatial Audio feature was the perfect solution for this use-case, offering a method of dynamically mixing participants’ audio while preserving the spatial relationship between them. In designing the format, I naturally gravitated toward a two-dimensional top-down view where participants can wander freely throughout the space, interacting with others using real-time voice and video. This format is easy to understand, as it resembles familiar physical spaces, and is highly flexible. 3rdspace recently took 3rd place in the Dolby.io Build the World Hackathon.

One thing I really miss about pre-COVID times is meetups. Mingling with colleagues and friends over pizza and drinks, listening to a presentation on some new and interesting tech, then heading down to a local spot to keep the good conversations going. Online versions of these meetups always left me feeling a bit disconnected, not being able to seek out and chat with someone I haven’t seen in a while or approach someone I’ve been wanting to meet. As I often do when I have a thought like this, I set out to try and build something. Fortunately, this idea coincided with Dolby.io’s Build the World Hackathon.

The Spatial Audio feature of Dolby.io’s Communication APIs was the perfect solution for this use-case, offering a method of dynamically mixing participants’ audio while preserving the spatial relationship between them. In designing the format, I naturally gravitated toward a two-dimensional top-down view where participants can wander freely throughout the space interacting with others using real-time voice and video. This format is easy to understand as it resembles familiar physical spaces and is highly flexible.

I named the app 3rdspace. Please go and check out the live demo here.

3rdspace recently took 3rd place in the Dolby.io Build the World Hackathon. You can check out some of the other really great submissions here.

Spatial Audio

One of the biggest challenges with this format is providing a good audio experience when many people share a space. It was clear that I would not be able to render all of the participants’ audio at the same level, nor would I want to: enabling smaller, more intimate interactions between participants in an online setting was one of my goals. My first approach to solving this problem was to adjust the volume level of each participant based on their distance from my avatar. This would at least emphasize those nearest me in the space and exclude those who are far away. It looked something like the following:

// Determine each participant's distance from me and scale their volume linearly
let deltaX = Math.abs(sub.posX - me.posX)
let deltaY = Math.abs(sub.posY - me.posY)
let distance = Math.round(Math.sqrt((deltaX * deltaX) + (deltaY * deltaY)))
// Full volume at distance 0, silent at or beyond room.range
let level = distance < room.range ? 1 - (distance / room.range) : 0
setVolume(sub, Math.floor(level * 100))

While this approach helps to enable conversations with those nearest to you, it does not provide any cues to where others are positioned in relation to you. Everyone around you sounds as though they are directly in front of you. I suppose I could have panned each participant based on their location relative to me; however, that would have required getting access to the raw audio channels and doing the mixing myself, which was much more than I wanted to get into.
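For illustration, a rough sketch of what that manual panning might have looked like (the helper name and the linear mapping are my own illustrative assumptions, not actual 3rdspace code): a pure function mapping a participant's horizontal offset to a stereo pan value, which could then drive a Web Audio `StereoPannerNode` in each participant's audio graph.

```javascript
// Illustrative sketch only: map a participant's horizontal offset from me
// to a stereo pan value in [-1, 1]. In a real implementation this value
// would be fed to a Web Audio StereoPannerNode per participant, which is
// exactly the raw-audio plumbing I wanted to avoid.
function panFor(me, sub, range) {
  const deltaX = sub.posX - me.posX
  // Clamp so participants at or beyond the audible range are hard-panned
  return Math.max(-1, Math.min(1, deltaX / range))
}
```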

Another challenge was decoding audio for more than 12 or so participants. The process of communicating and rendering state amongst a large number of participants is already a lot of work for the browser, so having to decode and mix a bunch of audio streams on top of that is not ideal. For now, I would be limited to smaller gatherings.

Dolby.io Communication APIs come to the rescue with Spatial Audio support. Dolby.io Spatial Audio would solve both of the issues described above. Rather than calculate the distance of all participants directly, I would just have to tell Dolby.io where each participant was located in the space. In return, I would get back an audio mix that perfectly places each participant in the mix based on their location relative to me. Once I got a hold of the Beta SDK, I was able to integrate and experience the benefit in just a few hours. The audio mix sounded immensely better. Each individual was easier to make out, given their unique placement in the mix. The above code now looked more like this:

  // Set this participant's coordinates; the SDK then mixes their audio
  // relative to my own listener position
  const spatialPosition = {
    x: me.posX - (me.posX - sub.posX),
    y: me.posY - (me.posY - sub.posY),
    z: 0,
  };
  VoxeetSDK.conference.setSpatialPosition(props.participant, spatialPosition);

In addition to providing a phenomenal audio mix, all of the audio is now mixed server-side, removing that burden from the browser. No longer would I be limited to a dozen guests at the party; I could now support hundreds!

The Presentation Hall allows authorized speakers to have full volume in the space and control the screen share. Other participants' volume levels are reduced to a low noise while in the hall.

Spaces

While having a bunch of people roaming around a large space mingling with one another might be great for an office Christmas party, the reality is that most physical spaces have walls, rooms, tables, and other features that shape how people interact with one another. I wanted to replicate some of these features: a small table where anyone walking by can overhear, yet the conversation happens primarily among those seated at it; a meeting room where you can see who is inside but can’t hear what is going on; or a conference hall where a panel of speakers has the microphone and the audience is limited to low noise until they are given the mic.

The Meeting Room gives everyone inside full volume, yet those outside cannot hear or see the screen share.

To support this idea, I came up with the concept of spaces. 3rdspace comprises a number of spaces, each space consisting of seats and elements along with a volume and isolation property. Seats are clickable and immediately move the individual to that location. Elements are fully styleable divs that can take the shape of an image, an embedded YouTube video, a screen share, or anything else I might want to represent. This allowed me to completely describe a room in a format similar to HTML and CSS, dynamically loaded from an API.
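The actual schema isn’t shown here, but a space definition along these lines (the field names are illustrative guesses, not the real 3rdspace format) conveys the idea of seats, elements, and the per-space audio properties:

```json
{
  "name": "Meeting Room",
  "volume": 1.0,
  "isolation": 1.0,
  "seats": [
    { "x": 120, "y": 80 },
    { "x": 160, "y": 80 }
  ],
  "elements": [
    {
      "type": "image",
      "src": "/assets/table.png",
      "style": "position: absolute; top: 60px; left: 100px; width: 200px;"
    }
  ]
}
```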

The volume and isolation properties are where it gets interesting. Isolation is what allows me to configure how much audio leaks out of a space. A meeting room, for example, would have isolation of 100%, meaning that 0% of the sound can leave the room. A table in a public sitting area might have isolation of 50%, where someone passing by can hear that conversation at 50% of full volume. A presentation hall might set isolation of 80%, such that you can barely hear what's going on from the outside and are tempted to step inside for the full experience.

The volume property determines how loud individuals are inside the room. A conversation table or meeting room would set the volume to 100% so that everyone is heard at an equal, full level. A presentation hall might set the volume to 10% so that only the presenters are at full volume while everyone in the audience is reduced to a low background noise, except perhaps for the whispers of those nearest you.

A control panel allows the room admin to configure the volume and isolation of each space.

To incorporate these ideas, I would first determine which space each participant occupies relative to me. Are we in the same room? Am I in a room while you are outside? Are you in a room while I am outside? Are we in different rooms? Since there is currently no way to adjust the volume of a participant directly, I would apply the volume and isolation properties indirectly by adjusting the position to simulate a greater distance, as shown in the snippet below.

  // Start from full volume and adjust based on our relative spaces
  let adjust = 1.0
  // Same space together
  if (user.in_room && me.in_room === user.in_room) {
    if (isSpeaker(user) || isPresenter(user)) {
      adjust = 1.0
    }
    else if (volumeBySpace[user.in_room] !== undefined) {
      adjust = adjust * volumeBySpace[user.in_room]
    }
  }
  // Different rooms: attenuate by whichever room is more isolated
  else if (me.in_room && user.in_room) {
    const myIsolation = isolationBySpace[me.in_room]
    const theirIsolation = isolationBySpace[user.in_room]
    const maxIsolation = Math.max(myIsolation, theirIsolation)
    adjust = adjust * (1.0 - maxIsolation)
  }
  // They are in a room and I am outside
  else if (user.in_room) {
    const theirIsolation = isolationBySpace[user.in_room]
    if (isSpeaker(user) || isPresenter(user)) {
      adjust = 1.0 - theirIsolation
    }
    else {
      const theirVolume = volumeBySpace[user.in_room]
      adjust = adjust * (1.0 - theirIsolation) * theirVolume
    }
  }
  // I am in a room and they are outside
  else if (me.in_room) {
    const myIsolation = isolationBySpace[me.in_room]
    const myVolume = volumeBySpace[me.in_room]
    adjust = adjust * (1.0 - myIsolation) * myVolume
  }
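The snippet above only computes the adjustment factor. One way to apply it as a simulated distance (this helper and its linear scaling rule are my own sketch, not code from 3rdspace) is to scale the participant's offset from me before the position is handed to setSpatialPosition, so a lower adjust pushes them proportionally farther away:

```javascript
// Sketch: simulate a volume adjustment by moving the participant farther away.
// Assumes roughly linear distance attenuation, so half the volume is
// approximated by doubling the distance.
function adjustedPosition(me, sub, adjust) {
  const FAR_AWAY = 100000 // effectively out of earshot
  if (adjust <= 0) {
    return { x: me.posX + FAR_AWAY, y: me.posY, z: 0 }
  }
  const scale = 1 / adjust
  return {
    x: me.posX + (sub.posX - me.posX) * scale,
    y: me.posY + (sub.posY - me.posY) * scale,
    z: 0,
  }
}
```

The returned coordinates would then be passed to VoxeetSDK.conference.setSpatialPosition in place of the participant's raw position.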

It was really exciting to participate in the beta program for Dolby.io’s Spatial Audio feature and to share ideas for new use-cases. I look forward to seeing how feedback on concepts like this one helps shape the future of the product, and what new use-cases take advantage of this really unique platform.


Written by jeffchandler | Developer, entrepreneur, dad, mountain biker.
Published by HackerNoon on 2022/02/01