In the last installment of this series on the dimensions of communications, the topic was the control plane. This post focuses on the sensory interfaces, the mechanisms that people can use when communicating. As human beings, we have five sensory input mechanisms: hearing, sight, touch, smell, and taste. While smell and taste may have some value in certain social forms of communication, I will argue that they are of minimal value in business-oriented communications and that the value of replicating them over distance does not justify the cost. That leaves essentially three primary forms for business: audio, visual, and physical.
What this means is that I can communicate by talking (or making other sounds) so that the other party hears what I am sending, I can move or gesture so that the other party sees, and finally I can initiate a physical action that results in contact. Of these, touch is important in a face-to-face meeting (a firm handshake is well recognized as one of the critical evaluators of an individual in Western business circles), but reproducing it over distance is hard and costly, and its value is diminished. So in the end, we are left with only two sensory mechanisms for distance communications: audio and visual. In this post I will focus on the audio aspects and save video for the next post.
Beginning with sound, there is an obvious set of characteristics in the sound field that a typical person hears. First, the average human can hear frequencies from about 30 cycles per second (Hz) to about 15,000 Hz. We can actively hear across a relatively wide dynamic range and can often pick out lower-level sounds or streams based on other characteristics. 3D spatial positioning within the sound field is a critical part of our ability to differentiate sounds from each other. For example, if two people to my left are having a conversation and someone to my right makes a comment, the brain can isolate these streams and interpret them separately. Further, the brain naturally uses differences in gender, accent, tone, and other voice factors to let the listener differentiate between people (often referred to as voice colorization). Research by the US Air Force has shown dramatic increases in the isolation and identification of speakers with a combination of 2D spatial positioning and colorization. In addition, work at the University of Toronto has further detailed the value.
Communications today are built on a set of underlying mechanisms that dramatically limit how well our aural processing can extract value from an audio-based interaction. The “world standard” for telephony is a 64 kbps uncompressed audio stream with an 8-bit representation of the dynamic range and a sampling rate of 8,000 samples per second (8 kHz). Due to the nature of digital sampling, an 8 kHz rate can represent frequencies only up to 4 kHz, and the usable telephone passband is narrower still: roughly 300 Hz to 3.4 kHz, or about 3.1 kHz of bandwidth. Contrast this with the 44.1 kHz sampling of CDs, which reproduces frequencies up to about 20 kHz, and their 16-bit samples (65,536 discrete levels rather than 256).
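To make the arithmetic behind these figures concrete, here is a minimal sketch in Python. The function names are illustrative, and the CD parameters used (16-bit samples, two channels) are the standard published values rather than anything specific to this post. It relies on two facts: the raw bit rate is the sampling rate times the bits per sample times the number of channels, and sampling can only capture frequencies up to half the sampling rate (the Nyquist limit).

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Raw PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency a given sampling rate can represent."""
    return sample_rate_hz / 2


def quantization_levels(bits_per_sample):
    """Number of discrete amplitude values available per sample."""
    return 2 ** bits_per_sample


# Telephony: 8 kHz sampling, 8 bits per sample, one channel.
telephony_bps = pcm_bit_rate(8000, 8)        # 64,000 bits per second
telephony_nyquist = nyquist_limit_hz(8000)   # 4,000 Hz

# CD audio: 44.1 kHz sampling, 16 bits per sample, two channels.
cd_bps = pcm_bit_rate(44100, 16, channels=2)
cd_nyquist = nyquist_limit_hz(44100)         # 22,050 Hz

# How many times more amplitude steps 16-bit audio has than 8-bit audio.
level_ratio = quantization_levels(16) // quantization_levels(8)
```

The usable telephone passband sits below the 4 kHz limit computed here because the signal is filtered before it is sampled, and the 8-bit telephony code is not linear, so the raw ratio of levels overstates the practical gap in dynamic range somewhat.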
There are two critical aspects to improving audio and voice communications. The first is using higher-resolution sampling (both in frequency and in dynamic range) to create what is often referred to as HD voice. VoIP is much better suited to HD voice in two respects: the bandwidth needed to carry even a compressed signal is available, and it can use compression schemes whose bit rate varies dynamically with either the frequency and dynamic range of the voice stream or the absence of voice (silence suppression). Both of these factors allow an HD voice stream to be optimized when carried over a packetized network. As the first round of VoIP concludes, I believe we will see the strong emergence of HD voice and interoperable HD voice standards.
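Silence suppression is the easier of these ideas to picture in code. The sketch below is a deliberately simple illustration, not how any particular codec does it: it splits the audio into 20 ms frames and keeps only frames whose RMS energy exceeds a threshold. The function names, the frame size of 160 samples (20 ms at 8 kHz), and the threshold of 500 are all assumptions chosen for the example; real voice activity detectors are considerably more sophisticated.

```python
import math


def frame_rms(frame):
    """Root-mean-square amplitude of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def suppress_silence(samples, frame_size=160, threshold=500.0):
    """Return (frame_index, frame) pairs for frames loud enough to send.

    Frames whose RMS falls below the threshold are treated as silence
    and dropped, so no packets would be generated for them.
    A trailing partial frame is ignored in this sketch.
    """
    kept = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        if frame_rms(frame) >= threshold:
            kept.append((start // frame_size, frame))
    return kept


# Example: silence, then 20 ms of a 440 Hz tone, then silence again.
tone = [8000 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(160)]
signal = [0.0] * 160 + tone + [0.0] * 160
frames_to_send = suppress_silence(signal)
sent_indices = [index for index, _ in frames_to_send]
```

In this example only the middle frame would be transmitted, which is the point: on a packet network the silent frames cost nothing, whereas a fixed 64 kbps circuit carries them regardless.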
The second aspect is spatial positioning and voice colorization. Integrating both of these capabilities of the brain into a communications environment will dramatically alter the experience of communications. An environment that supports multiple talkers and sound sources is critical to the social control plane and to many new interactive environments. In 2008 Nortel acquired DiamondWare, the leader in 3D spatial audio and colorization, with a solution that is world class in spatial resolution as well as hugely scalable. This is the underlying voice technology in the Lenovo eLounge. I believe that these capabilities will dramatically change the landscape of communications. From collaborative and brainstorming environments to social virtual worlds, from gaming to always-on virtual workplaces, the capability to create a realistic audio field will transform the experience.
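As a rough illustration of what positioning talkers in a sound field means, the sketch below places each talker at a left-to-right angle and mixes them into a stereo pair using constant-power panning. This is a much simpler technique than true 3D spatial audio, and it is not a description of DiamondWare's technology or of any product; the function names and the angle convention (-90 degrees for hard left, +90 for hard right) are assumptions made for the example.

```python
import math


def pan_gains(azimuth_deg):
    """Constant-power (left, right) gains for an angle in [-90, 90]."""
    azimuth_deg = max(-90.0, min(90.0, azimuth_deg))
    # Map -90..+90 degrees onto 0..pi/2 so the squared gains sum to 1.
    theta = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2)
    return math.cos(theta), math.sin(theta)


def mix_talkers(talkers):
    """Mix a non-empty list of (samples, azimuth_deg) into (left, right)."""
    length = max(len(samples) for samples, _ in talkers)
    left = [0.0] * length
    right = [0.0] * length
    for samples, azimuth_deg in talkers:
        gain_left, gain_right = pan_gains(azimuth_deg)
        for i, sample in enumerate(samples):
            left[i] += gain_left * sample
            right[i] += gain_right * sample
    return left, right


# Two talkers on the left, one commenting from the right.
conversation_a = [0.5, 0.4, 0.3]
conversation_b = [0.2, 0.2, 0.2]
comment = [0.9, 0.1]
left, right = mix_talkers([
    (conversation_a, -60.0),
    (conversation_b, -30.0),
    (comment, 70.0),
])
```

Even this crude left-to-right placement gives the listener a positional cue for separating streams; a realistic audio field would add elevation, distance, and head-related filtering on top of it.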
The combination of these two factors will truly make a remote audio experience “as good as being there”. In the next post I will talk about video transformations as a sensory medium.
Other Posts in the Communications Dimensions Series:
Communications Dimensions Introduction

Enterprise Technology Categories




