This dataset addresses the photo triage problem. People often take a series of nearly redundant pictures to capture a moment or scene, but selecting the best ones to keep or share is time-consuming and labor-intensive. Our goal is to use machine learning methods to automatically learn human preferences within a series of photos taken of the same scene.

The dataset contains 15,545 unedited photos distilled from personal photo albums. The photos are organized into 5,953 series. For each series, human preferences were collected through a crowd-sourced user study. The following figure shows several example series, annotated with human preferences – the green star marks the preferred photo in each series.

Photo Triage: The photo with the green star in each series is the one preferred by the majority of people, while the percentage below each of the other photos indicates the fraction of people who would prefer that photo over the starred one in the same series.
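To make the caption's numbers concrete, here is a minimal Python sketch of how a starred photo and the per-photo percentages could be derived from pairwise votes. The input format (a list of `(photo_a, photo_b, winner)` tuples) and the function name `summarize_series` are assumptions made for illustration, not the dataset's actual file layout. Picking the starred photo by raw win count is also a simplification and may differ from how the preferences were actually aggregated.

```python
from collections import Counter


def summarize_series(votes):
    """Summarize pairwise votes for one series.

    votes: list of (photo_a, photo_b, winner) tuples (hypothetical format).
    Returns (starred, fractions), where fractions maps every other photo
    to the share of voters who preferred it over the starred photo.
    """
    wins = Counter()        # photo -> total pairwise wins
    pair_wins = Counter()   # (winner, loser) -> number of votes

    for a, b, winner in votes:
        if winner not in (a, b):
            raise ValueError(f"winner {winner!r} is not in pair ({a!r}, {b!r})")
        loser = b if winner == a else a
        wins[winner] += 1
        wins[loser] += 0    # make sure photos with no wins are still listed
        pair_wins[(winner, loser)] += 1

    # Assumption: the starred photo is the one with the most wins;
    # ties are broken by name so the result is deterministic.
    starred = min(wins, key=lambda p: (-wins[p], p))

    fractions = {}
    for photo in wins:
        if photo == starred:
            continue
        over = pair_wins[(photo, starred)]
        under = pair_wins[(starred, photo)]
        total = over + under
        # None means this photo was never compared directly with the starred one.
        fractions[photo] = over / total if total else None

    return starred, fractions


if __name__ == "__main__":
    example_votes = [
        ("a", "b", "a"), ("a", "b", "a"), ("a", "b", "b"),
        ("a", "c", "a"), ("b", "c", "b"),
    ]
    starred, fractions = summarize_series(example_votes)
    print(starred, fractions)
```

In this sketch, a fraction below 0.5 for every non-starred photo is consistent with the starred photo being the majority choice in each direct comparison.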

The dataset also includes comments from the people who expressed these preferences, describing the reasons for their choices. Here we visualize the most frequent two-word phrases from the comments, covering both positive and negative concerns:

Reasons for preferred photos

Reasons for rejected photos
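The phrase frequencies behind visualizations like the two above can be approximated by counting adjacent word pairs in the comments. The following Python sketch assumes the comments are available as a plain list of strings; the function name `top_bigrams` and the simple tokenization are illustrative choices, not a description of how the figures were actually produced.

```python
import re
from collections import Counter


def top_bigrams(comments, k=10):
    """Return the k most frequent two-word phrases as (phrase, count) pairs."""
    counts = Counter()
    for comment in comments:
        # Lowercase and keep only runs of letters and apostrophes as words.
        words = re.findall(r"[a-z']+", comment.lower())
        # Pair each word with its successor within the same comment.
        counts.update(zip(words, words[1:]))
    return [(" ".join(pair), n) for pair, n in counts.most_common(k)]


if __name__ == "__main__":
    sample = ["Too blurry", "too blurry and dark", "Better framing"]
    print(top_bigrams(sample, k=3))
```

Running the same function separately on comments about preferred photos and on comments about rejected photos would give one ranked list per group. Filtering out stop words before pairing is a common refinement that this sketch leaves out.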

This dataset is available at the download page.