Massively conferenced phone calls that mimic physical space

A comment on: The Mad Hatter’s Cocktail Party: A Social Mobile Audio Space supporting Multiple Simultaneous Conversations

Question: Imagine you have several group conversations going on via conference call, but instead of having separate calls, all the groups are lumped into a single call so that individuals can migrate in and out of conversations at will. Could an algorithm identify the different “floors” of conversation in real time, so that the volume of all conversations that a caller is not in are largely muted? That is, can an algorithm simulate the acoustics of a large meeting room, where the conversation that one hears best is the one of the group one is participating in?

Method: Create a full-duplex (i.e. you can hear while speaking) conference call using mic’d iPaq PDAs that are connected via 802.11b wireless network to a GStreamer central audio exchange server. Create a “naive” Bayesian algorithm that is trained off-line with audio files from human conversations. The conversations are recorded during a party game that forces conversational groups to split up and reform. The human trainer then segments the audio according to which group is in the audio. People give subtle audio cues in their speech about whether they are participating in the conversation, such as not interrupting the current speaker but jumping in when that speaker indicates he is finishing. The Bayesian training should enable the algorithm to pick up on these cues and attentuate the volume of incoming audio streams that aren’t in the same conversation as a particular user, and make these decisions for all users simultaneously.

Findings:

– When the system assigned floors correctly, users preferred it to having no floor assignment (i.e. no volume adjustment). This makes sense, since so many people were in the call at once that it was practically impossible to have group conversations without managing the call as if they were one, all-inclusive group.

– Users suggested a “maintain the current volume level” widget, since the system sometimes reassigned other people in the conversation to another group and this wasn’t noticed until the other person’s audio was so muted that they had trouble speaking normally to each other and thereby getting the system to notice that they were actually in the same group.

– Unlike in Fact-To-Face (FTF) interactions, it often took participants quite awhile to notice that another person had moved to another group (whether by that person’s choice or by the system’s mistake).

IMO, this could be a very useful feature in stationary and mobile conferences, but only if there is really a need for people to move silently in and out of conversations. The only situation like that that I can think of are phone-based chat rooms (which don’t exist yet AFAIK) and party lines. The need for a maintain-this-volume button is also problematic with mobile use, since pressing keys on cellphones while talking is very awkward.

Intention Perception

how do we infer what people are up to just by seeing a few of their actions?

Massively conferenced phone calls that mimic physical space