UNmixing Sources, i.e., The Cocktail Party Effect ("Richard J. Fabbri" )

Subject: UNmixing Sources, i.e., The Cocktail Party Effect
From:    "Richard J. Fabbri"  <fabbri(at)NETAXIS.COM>
Date:    Mon, 19 Oct 1998 14:25:23 -0400

Dear List,

Several LISTeners have asked about the Stereo example I offered to Robert Bolia and Brian Gygi on 15 October. The following points by Robert and Brian both focus on the relative importance of "Spatial segregation" ...

- Robert's point was ... "I'm not sure that I see how an MAA of 1-2 degrees (the 10 microsecond resolution of the binaural system) is crucial to explaining the Cocktail Party Effect. Has anyone ever done any cocktail party experiments with talkers separated by as little as 1 degree?"

- Brian's further point was ... "Let's take another absurd example. You are listening to two talkers through a *single* loudspeaker. You would almost never think they were one speaker, even though the angle of separation is zero. Spatial separation is useful, but is far from the defining feature of stream segregation, for the simple fact that in highly reverberant environment, the necessary information (time of arrival, phase) can often be ambiguous."

Robert questions MAA as crucial to the Cocktail Party Effect while Brian distinguishes between Spatial & Stream segregation in his Zero-Degree "Audible Angle", 2-Source, SINGLE loudspeaker. Brian's 2-Source, SINGLE loudspeaker is the subject of the following Stereo example, which demonstrates a particular dynamic process always in action during perception.

What I'm about to describe has been auditioned using a great variety of source pairs. The most dramatic example of a source pair is a "recorded book" where ONE actor does ALL the reading and thus eliminates the question of any "special" advantage created by using two DIFFERENT voices. By "special", I refer to the usual spectral cues argument that different voices (different spectral composition) offer THAT difference as an independent method of Source segregation.

The experiment uses a "Stereo Mixer" readily available (<$100) from any Radio Shack (Tandy) store.
Using a battery-operated model, I've tested many acoustic environments by travelling to different stereo systems with two cassette players and the Stereo Mixer. I've tested a range of situations from damped to "highly reverberant environments". The test has been perceived "the same" in all environments! ... LISTeners ... please report any perceived exceptions. This test was also demonstrated during an AES paper presentation in Berlin, March 1993.

The setup is simple. Feed one source (Chapter 1 of a recorded book?) to one loudspeaker. MIX that initial source with a second source (Chapter 2?) and feed the mixture to the other loudspeaker. Of course, your particular choice of Left (loudspeaker) source vs Right (loudspeaker) source is not an issue. In fact, during the AES Berlin demo, I had the audio engineer "reverse the room" so that Left attendees could hear what Right attendees heard and vice versa.

As pointed out by Brian, if you stand in front of the 2-Source speaker you hear the mixed confusion of two voices. You eventually perceive that two voices are present as, over a period of time, you arrive at pauses in ONE voice while the OTHER voice continues. As you continue listening, you realize that even smaller pauses allow you to catch small bursts of intelligible speech from one voice or the other. However, you also realize that, as implied by Brian, you can NOT continuously FOCUS on ONE voice in the continued presence of the OTHER.

If you now move to the "normal" stereo listening position, i.e., central to both speakers, you hear a "normal", central, stereo image of the source common to Left/Right and the 2nd (mixed) source in ONLY ONE loudspeaker. This should be a bit surprising since you have just ... UNmixed ... the sound coming from the mixed (2-source) loudspeaker into ...

1) A SINGLE, perceived, Central Stereo Image.
2) A SINGLE, perceived, voice from the 2-voice loudspeaker.
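
For anyone who would rather prepare the two loudspeaker feeds digitally than with a hardware mixer, the routing described above (one source alone to one loudspeaker, the mixture of both sources to the other) can be sketched as follows. This is only an illustrative sketch, not the mixer actually used: the function name route_to_loudspeakers and the gain_a / gain_b parameters are hypothetical stand-ins for the mixer's level controls, and the sources are assumed to be equal-rate mono sample lists.

```python
def route_to_loudspeakers(source_a, source_b, gain_a=1.0, gain_b=1.0):
    """Build the two loudspeaker feeds for the demo.

    source_a: mono samples of the COMMON voice (e.g. Chapter 1).
    source_b: mono samples of the second voice (e.g. Chapter 2).
    gain_a, gain_b: hypothetical level controls for each voice
                    inside the mixture (assumed, not from the post).
    Returns (one_voice_feed, mixed_feed).
    """
    # Truncate to the shorter source so both feeds have equal length.
    n = min(len(source_a), len(source_b))
    # 1-voice loudspeaker: the common voice alone.
    one_voice_feed = [source_a[i] for i in range(n)]
    # 2-voice loudspeaker: common voice plus the second voice.
    mixed_feed = [gain_a * source_a[i] + gain_b * source_b[i] for i in range(n)]
    return one_voice_feed, mixed_feed


one_voice, mixed = route_to_loudspeakers([0.1, 0.2, 0.3], [0.5, -0.5, 0.0])
```

Swapping which feed goes Left and which goes Right corresponds to "reversing the room".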
This UNmixing effect suddenly becomes startling if you move to a (slightly) different location! Assuming you are still centrally located, pivot toward, and walk one step closer to, the ONE VOICE loudspeaker (NOT the MIXED loudspeaker). Suddenly, two things happen ...

1) The Central Stereo Image vanishes.
2) You clearly hear ONE voice emanating from ONE loudspeaker and the OTHER voice exclusively from the OTHER loudspeaker!

You can easily (and CONTINUOUSLY!) FOCUS on the voice of CHOICE just as you can during a typical Cocktail Party. However, a person standing near the MIXED speaker will still hear the MIXED voices, i.e., the mixture is still emanating but YOU perceive the INDIVIDUAL voices.

As mentioned above, this test is best performed using the voice of the SAME person to source each test voice, as "spectral segregation" is then (essentially) ruled out in this Stream Segregation demo.

A dynamic aspect of the Precedence Effect explains this perceived Source segregation (UNmixing). By stepping one foot closer to the 1-voice loudspeaker, the signal from the mixed loudspeaker is delayed approximately 1 ms. Thus, the common voice is detected as BOTH a Source AND as a 1 ms reflection of that (common) Source. But the 1 ms "reflection" arrives as ONE component of a 2-voice mixture. The Precedence Effect UNmixes that mixed arrival by SILENCING the 1 ms delayed component, which allows the SECOND component to be perceived as the ONLY sound arriving from the 2-voice loudspeaker.

This does NOT violate the second law of thermodynamics ("UNmixing") because the 1-voice component is TIME-locked to the 2-voice mixture and is "stream" segregated by the Precedence Effect. But, clearly, each loudspeaker was localized, i.e., EACH signal (1-voice and 2-voice) was detected and Binaurally processed to SILENCE one component.

An interesting verification of this dynamic UNmixing by the Precedence Effect is provided by the additional effect of Source Fusion.
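
As an aside on the "approximately 1 ms" figure quoted above: the delay follows from the extra path length divided by the speed of sound. The sketch below is a rough check only, assuming a speed of sound of 343 m/s (dry air at about 20 degrees C, a value not given in the post) and treating the step as a one-foot difference in path length between the two loudspeakers; the function name arrival_delay_ms is hypothetical.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: dry air at about 20 degrees C
METERS_PER_FOOT = 0.3048


def arrival_delay_ms(extra_path_feet, speed=SPEED_OF_SOUND_M_PER_S):
    """Delay (in ms) of a signal that travels extra_path_feet farther."""
    return extra_path_feet * METERS_PER_FOOT / speed * 1000.0


# One foot of extra path from the mixed loudspeaker.
print(round(arrival_delay_ms(1.0), 2))  # prints 0.89
```

Under these assumptions one foot of path difference gives a little under 0.9 ms, consistent with the approximate 1 ms figure.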
With Fusion, the 1 ms delayed component of the 2-voice mixture is not only Silenced: its LOUDNESS is FUSED with that of the 1-voice Source in the OTHER loudspeaker. This can easily be verified by reducing the common voice component in the mixture to zero; the EFFECT of this reduction is best heard by simply switching OFF/ON the common voice component in the mixture.

If the UNmixing is caused by the Precedence Effect then the 1 ms "reflected" component FUSES with the SINGLE voice of the OTHER loudspeaker. Thus, the ONLY audible effect of switching OFF/ON the common voice component (in the mixture) is that the SINGLE voice heard from the OTHER loudspeaker abruptly changes volume, i.e., in EITHER case (OFF/ON) you perceive only ONE voice in EACH loudspeaker but the loudness of the 1-voice loudspeaker CHANGES ... a direct result of FUSION.

The net result of the Cocktail Party Effect (CPE) is our ability to perceive the separate sources otherwise MIXED in a normal acoustic environment. Brian's case of 2 voices emanating from a single point is worst-case CPE. But also buried in Brian's point is the case of a "highly reverberant environment", which is ALSO handled by the DYNAMICS of the Precedence Effect ... Source reflections are silenced and fused with their corresponding Sources. But FUSION itself implies that the signal from EACH Source is detected so that REFLECTED energy CAN BE summed with Source energy.

I personally do not care which label we place on our ability to ... "Source Segregate". The attempt to further analyze "Segregation" into Spatial, Stream, Spectral, etc. is beside the point. I personally believe Spectral analysis without specific Source identity is worthless, as the question of WHICH frequency components go with WHICH source remains unresolved.

I personally believe Stream Analysis is INITIALLY accomplished by a 3-step SPATIAL analysis ...

1) First step by Azimuth.
2) Second step by resolving Vertical Half-Plane ambiguity.
3) Third step by Elevation.

Reflections are employed to gauge Source distance, and the combination of 3-step Spatial and Distance (via reflections) forms a source map populated by DATA Sources that are THEN analyzed for content/meaning. My AES paper (Berlin, 1993) discusses this 3-step Spatial analysis.

Rich Fabbri

McGill is running a new version of LISTSERV (1.8d on Windows NT). Information is available on the WEB at http://www.mcgill.ca/cc/listserv

This message came from the mail archive
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University