Reliability of Behavioral Observation: Effects of Practice

Villanueva, James, Dana Wolak, Shay McManus, and Elizabeth Hill

The observation and measurement of behavior is potentially valuable, but conclusions may be subjective and variables are difficult to quantify. Techniques for increasing the reliability of observational coding include using clear definitions of behaviors, extensive training, and coding behavior from videotapes. As part of a larger study of mouse (Mus musculus) exploratory behavior, reliable measures of behaviors in an “open field” setting were created. Observers were trained and given practice in coding behaviors from videotapes using Noldus Observer XT software.

Two observers coded 5 videotapes twice each; the second trial included one additional mouse behavior. The mice were observed in an open field test, which assesses exploratory behavior in a novel environment. Two reliability measures were calculated on the observations: percent agreement in behavior type and correlation of behavior duration. Both inter-rater and test-retest reliability were compared. In addition, two behaviors, rearing and rearing against the wall, were selected for detailed analysis.
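The two reliability measures named above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the study: it assumes percent agreement is computed by comparing the two observers' behavior codes interval by interval, and that the correlation is a Pearson correlation on per-behavior durations. The function names and the example data in the comments are hypothetical and invented for illustration only.

```python
from math import sqrt


def percent_agreement(codes_a, codes_b):
    """Proportion of intervals in which both observers coded the same behavior.

    Assumes both lists cover the same intervals in the same order.
    """
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("code lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of durations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical example data (not from the study):
# behavior codes per interval for two observers
observer_a = ["rear", "walk", "groom", "rear"]
observer_b = ["rear", "walk", "rear", "rear"]
agreement = percent_agreement(observer_a, observer_b)

# total seconds per behavior as scored by each observer
durations_a = [12.0, 30.5, 8.0]
durations_b = [11.0, 32.0, 9.5]
correlation = pearson_r(durations_a, durations_b)
```

For inter-rater reliability the two inputs would come from different observers scoring the same videotape; for test-retest reliability they would come from the same observer scoring the same videotape on two occasions. The two measures can diverge because percent agreement requires the same behavior label at the same moment, whereas the duration correlation only requires that the totals rise and fall together.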

Inter-rater reliability improved with practice. The correlation between the observers started at a good level, greater than .80, and reached .90 on the fourth trial. However, percent agreement never reached the .80 level. By the fifth trial, agreement reached .71, but this level was not consistently maintained. For test-retest reliability, Observer A showed a higher correlation between the first and second videotape than Observer B. Test-retest correlation was higher than percent agreement. Similarly, correlation was higher than percent agreement when comparing measurement of rearing to rearing on the wall. Agreement for rearing on the wall was higher than for other rearing (5/12 vs. 3/12).

Results indicated good agreement after several practice sessions, and better results were obtained with behaviors that were more concretely defined. With sufficient training, observation of behavior can generate good measurement for research studies.