Baby Intuitions Benchmark (BIB): Discerning the goals, preferences and actions of others

Kanishk Gandhi, Gala Stojnic, Brenden M. Lake, Moira R. Dillon

To achieve human-like common sense about everyday life, machine learning systems must understand and reason about the goals, preferences, and actions of others. Human infants intuitively achieve such common sense by making inferences about the underlying causes of other agents' actions. Directly informed by research on infant cognition, our benchmark BIB challenges machines to achieve common sense reasoning about other agents like human infants do. As in studies on infant cognition, moreover, we use a violation of expectation paradigm in which machines must predict the plausibility of an agent's behavior given a video sequence, making this benchmark appropriate for direct validation with human infants in future studies. We show that recently proposed, deep-learning-based agency reasoning models fail to show infant-like reasoning, leaving it an open challenge.

Paper Baselines Training Data Evaluation Data

Sample Videos