audio-visual correlation (Scott Lipscomb)

Subject: audio-visual correlation
From:    Scott Lipscomb  <lipscomb(at)UTSA.EDU>
Date:    Thu, 12 Feb 1998 13:59:39 -0600

List members:

I have been reading with interest the recent series of postings in response to Al Bregman's posting of a student query regarding the interaction of audio and visual components in film and animation. I posted a brief message myself - directly to Dr. Bregman - referencing my previous research. Several individuals have requested further information about the research projects to which I referred in that message, so in response I will post the abstracts & publication info for those two investigations. If anyone would like further information, feel free to contact me at the email address indicated below ... or we can continue this interesting forum as a group.

*********

Lipscomb, S.D. & Kendall, R.A. (1994). Perceptual judgment of the relationship between musical and visual components in film. Psychomusicology, 13, 60-98.

In this study, the authors investigate the relationship between the musical soundtrack and visual images in the motion picture experience. Five scenes were selected from a commercial motion picture along with their composer-intended musical scores. Each soundtrack was paired with every visual excerpt, resulting in a total of twenty-five audio/visual composites. In Experiment I, subjects selected the composite in which the pairing was considered the "best fit". Results indicated that the composer-intended musical score was identified as the best fit by the majority of subjects for all conditions. In Experiment II, subjects rated all twenty-five composites on semantic differential scales. A highly significant interaction between audio/visual combination and the various semantic differential scales was found. Analysis of this interaction revealed that the composer-intended combination yielded higher mean scores in response to the four adjective pairs of the Evaluative dimension. Clustering the subject responses into two factor scores (Evaluative vs. a hybrid of Activity and Potency) confirmed these high Evaluative mean scores.
In addition, the response contours of the Activity/Potency dimension remained relatively consistent, suggesting that music exercises a strong and consistent influence over the subject responses to an audio/visual composite, regardless of visual stimulus. The results corroborate previous research, indicating that a musical soundtrack can change the "meaning" of a film presentation. Comparison of the various soundtracks in music theoretical terms assisted in identifying musical elements which appeared to be relevant to specific subject ratings. These comparisons were utilized in the formulation of a model for music communication in the context of the motion picture experience.

***********

Lipscomb, S.D. (1995). Cognition of musical and visual accent structure alignment in film and animation. Unpublished (yet) dissertation, University of California, Los Angeles.

This investigation examined the relationship between musical sound and visual images in the motion picture experience. Most research in this area has dealt with associational aspects of the music and its effect on perception of still pictures or "characters" within film sequences. In contrast, the present study focused specifically on the relationship of points perceived as accented musically and visually. The following research questions were answered: 1) What are the determinants of "accent" (i.e., salient moments) in the visual and auditory fields? and 2) Is the precise alignment of auditory and visual strata necessary to ensure that an observer finds the combination effective? Three experiments were conducted using two convergent methods: a verbal attribute magnitude estimation (VAME) task and a similarity judgment task. Audio-visual (AV) stimuli increased in complexity with each experiment.
Three alignment conditions were possible between the musical sound and visual images: consonant (accents in the music occur at the same temporal rate and are perfectly aligned with accents in the visual image), out-of-phase (accents occur at the same rate, but are perceptibly misaligned), or dissonant (accents occur at different rates). Results confirmed that VAME ratings differed significantly across the three alignment conditions. Consonant combinations were rated highest, followed by out-of-phase combinations, and dissonant combinations received the lowest ratings. However, as AV stimuli became more complex (Experiment Three), consonant composites were rated less synchronized and dissonant combinations were rated more synchronized than the simple AV composites in Experiment One. Effectiveness ratings failed to distinguish between consonant and out-of-phase conditions when considering actual movie excerpts. An analysis of variance over the VAME data from all three experiments revealed that this difference between subject responses to simple animations and responses to complex film excerpts was statistically significant. A similar result was observed in the similarity scaling task. Responses to simple stimuli divided clearly on three dimensions: visual component, audio component, and alignment condition. However, as the AV composite became more complex, the third dimension appeared to represent AV congruence (i.e., appropriateness). Modifications to the proposed model of film music were suggested.

Dr. Scott D. Lipscomb
Assistant Professor, Assistant Director, Undergraduate Advisor of Record
UTSA Division of Music, AR 3.01.58A
6900 N. Loop 1604 West
San Antonio, TX 78249
(210) 458-4354 phone
(210) 458-4381 FAX
lipscomb(at)

This message came from the mail archive
maintained by:
DAn Ellis
Electrical Engineering Dept., Columbia University