ObjectTracking

From Psych 221 Image Systems Engineering
Revision as of 22:09, 10 December 2015 by imported>Projects221 (Motivation of methods)

Applying a computer vision object tracking algorithm to track musicians’ ancillary gestures

Introduction

The motivating question: Musicians' gestures at a sub-phrase level

When watching a musical performance, it is possible to notice that musicians' body movements seem closely related to their expressive intentions and to their conceptualization of musical structure, beyond the movements strictly necessary to play the instrument. In recent years, many studies have noted the correspondence of body motion at the level of the phrase (Krumhansl & Schenck, 1997; MacRitchie, Buck, & Bailey, 2013; Vines, Krumhansl, Wanderley, & Levitin, 2006; Wanderley, Vines, Middleton, McKay, & Hatch, 2005). In Vines et al. (2006), participants tracked the phrasing of clarinetists under audio-only, video-only, and audiovisual conditions; participants given only video cues could follow phrasing very well, indicating how communicative the visual channel can be for audience members. Indeed, performers used body motion to modulate the audience's perception of phrase length: in the video-only condition, they extended the perceived length of the last phrase by continuing to move their bodies. More recently, MacRitchie et al. (2013) used PCA to analyze motion-capture data from nine pianists playing two excerpts of Chopin preludes. Their findings suggested that overall motion in a performance arises from the performer's representation of the musical structure, and that repeating body-motion patterns can be observed at the level of the phrase.

These inquiries have largely focused on how musicians physically express musical structure through ancillary gestures on the timescale of the phrase. However, it is possible that ancillary gestures at a sub-phrase level also reflect musicians' conceptions of musical structure. As noted in Godoy et al. (2010), the motion of a pianist's head was cyclical at shorter timescales, such as melodic figures. Vines et al. (2006) also mention changes in performers' facial expressions at important structural moments in the score, though what these expressions signified – whether they reflected some sort of emotional change or were more related to the depiction of emphasis – is not clear. These kinds of gestures are the interest of the current study: those that reflect not task demands or the capabilities and physical constraints of subjects, but rather planned conceptual grouping and the possibility of physical expression beyond the means necessary for playing – and whether they correspond to musical structures on a level smaller than the phrase.

A perhaps complementary line of research exists in the field of linguistics, where it has been found that body motions not directly related to speech-producing actions aid listener understanding of speech. For instance, rhythmic head motion conveys linguistic information, with head movement correlating strongly with the pitch and amplitude of the talker's voice (Munhall, Jones, Callan, Kuratate, & Vatikiotis-Bateson, 2004). Further, when animations of these "talking heads" were presented in a perception task without sound, participants correctly identified more syllables when natural head motion was included in the animation than when it was eliminated or distorted. This result suggests that nonverbal gestures such as head movements play a more direct role in the perception of speech than previously known. While one might extrapolate that, from a listener's perspective, musical structure might be more 'understandable' when a visual corollary plausibly matches it, the takeaway from this study is that natural head motions arise in conjunction with sub-sentence speech, and these tendencies seem natural to both the speaker and the listener.

Motivation/Goals

With the goal of exploring how musicians express structural information on a sub-phrase level, an ideal avenue is to investigate gestural cues when performers conceive of the music as containing mostly small melodic groupings rather than long phrases. While there are multiple motion capture systems on the Stanford campus, the campus lacks a sufficient population of performers at the level of students aiming to become professional musicians, so I felt it necessary to travel off campus to recruit enough participants, perhaps at a slight expense of data quality. Evaluating the quality of the ad hoc 'optical motion capture' system I used to obtain performers' motions was the purpose of this project. In the following sections, I will describe the input data (collected directly prior to the start of term), as well as the setup I used to collect it. Then, I will describe my evaluation of the object tracking algorithm I used to track colored markers attached to the performers' bodies while they played.
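The specific tracking algorithm is described in the Methods section; as general background, a common starting point for tracking colored markers is to threshold each video frame around the marker's color and take the centroid of the matching pixels as the marker position. The sketch below illustrates that idea only – the color tolerance, the pure-NumPy thresholding, and the synthetic test frame are illustrative assumptions, not this project's actual pipeline.

```python
import numpy as np

def track_marker(frame, target_rgb, tol=30):
    """Return the (row, col) centroid of pixels within `tol` of
    `target_rgb` in every channel, or None if no pixel matches."""
    # Per-channel absolute color difference from the marker color
    diff = np.abs(frame.astype(int) - np.asarray(target_rgb))
    # Keep only pixels close to the marker color in all three channels
    mask = np.all(diff <= tol, axis=-1)
    if not mask.any():
        return None  # marker occluded or out of frame
    rows, cols = np.nonzero(mask)
    return rows.mean(), cols.mean()

# Synthetic 100x100 frame: gray background with a green 5x5 "marker"
frame = np.full((100, 100, 3), 128, dtype=np.uint8)
frame[40:45, 60:65] = (0, 200, 0)
print(track_marker(frame, (0, 200, 0)))  # centroid (42.0, 62.0)
```

Running such a detector frame by frame yields a per-marker trajectory over time; a real system additionally has to handle occlusion, lighting changes, and confusable colors, which is exactly what the evaluation in this project probes.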

Background

Methods

Results

Conclusion

References