Head pose estimation dataset
This page provides a dataset for head pose estimation, annotated with ground truth data. The dataset is described and referred to in the following publication:
X. Zabulis, T. Sarmis, A. A. Argyros, "3D head pose estimation from multiple distant views", British Machine Vision Conference, London, UK, 7-10 September, 2009.
Please cite the above paper if you use this dataset.
A video demonstrating the results of the method proposed in the above paper, on this ground truth dataset and on additional ones, can be found here.
The dataset was acquired in a 5x5 m room where cameras are used to visually interpret human activity. Eight cameras are mounted at the corners and at the mid-wall points of the room, viewing it in yaw steps of 45 degrees. The cameras point at the center of the floor at an average pitch of 43 degrees and are mounted 2.6 m above the floor. At the same height, a ninth camera is mounted on the ceiling, overlooking the floor. All cameras have a 66x51 degree field of view and a resolution of 960x1280 pixels. It is worth noting that this camera setup was designed to generically serve the purposes of human activity interpretation and was not optimized for the particular task of 3D head pose estimation.
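For orientation, the following minimal sketch (not part of the dataset distribution) derives nominal poses for the eight wall cameras from the figures above; the room-centered coordinate frame, the placement of cameras exactly on the room boundary, and the name nominal_wall_cameras are illustrative assumptions.

```python
import numpy as np

ROOM_HALF = 2.5   # meters; assumed: 5x5 m room centered at the origin
CAM_HEIGHT = 2.6  # meters; camera height reported in the text
CAM_PITCH = 43.0  # degrees downward, average pitch reported in the text

def nominal_wall_cameras():
    """Yield (position, yaw_deg, pitch_deg) for the 8 wall-mounted cameras."""
    for yaw_deg in range(0, 360, 45):
        yaw = np.radians(yaw_deg)
        # Intersect the viewing azimuth with the square room boundary:
        # axis-aligned directions hit mid-wall points, diagonals hit corners.
        scale = ROOM_HALF / max(abs(np.cos(yaw)), abs(np.sin(yaw)))
        position = np.array([scale * np.cos(yaw), scale * np.sin(yaw), CAM_HEIGHT])
        # Each camera looks back toward the floor center (yaw + 180 degrees).
        yield position, (yaw_deg + 180) % 360, -CAM_PITCH
```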
The dataset was collected using a mannequin's head, mounted on a tripod with 2 degrees of freedom (pitch, yaw) and marked rotation gratings. The head's center was 1.3 m above the floor, emulating the head position of a sitting person. To modulate roll, the head was unmounted and rotated by 90 degrees; thus, during this modulation, ground truth for yaw was unavailable.
The main part of the dataset samples a hemisphere of poses, consisting of seven 360-degree yaw rotations in steps of 20 degrees. In each rotation, the first and last frame image the same pose. The pitch angles of the yaw rotations were -20, 0, 20, 40, 60, 80, and 90 degrees. Four additional sequences were acquired: the first is a 360-degree yaw rotation with a 10-degree step at 20 degrees pitch; the second is a pitch rotation from 0 to 90 degrees with a 10-degree step; the third and fourth are roll rotations from -80 to +80 degrees in 10-degree steps, at pitch angles of 0 and 20 degrees, respectively. The tripod and world coordinate frames were aligned.
Except for the roll sequences, the tripod remained stationary and, thus, head centers occurred on a hemisphere.
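For reference, the pose sampling of the main part can be enumerated as in the following sketch; the iteration order and the helper name hemisphere_poses are illustrative assumptions, not an official frame index.

```python
# Seven full yaw rotations in 20-degree steps, one per pitch level, with the
# first and last frame of each rotation imaging the same pose (hence the
# inclusive endpoint).
PITCH_LEVELS = [-20, 0, 20, 40, 60, 80, 90]  # degrees, from the text

def hemisphere_poses():
    """Yield (pitch_deg, yaw_deg) pairs for the main part of the dataset."""
    for pitch in PITCH_LEVELS:
        for yaw in range(0, 361, 20):  # 0..360 inclusive: 19 samples
            yield pitch, yaw

print(len(list(hemisphere_poses())))  # 7 pitch levels x 19 yaw samples = 133
```

Counting the four additional sequences the same way (37 + 10 + 17 + 17 frames, assuming inclusive endpoints) adds 81 frames, for 214 in total, consistent with the frame-number range given below.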
The main dataset consists of:
- Original images (compensated for radial distortion)
- Foreground/background masks (as binary images)
- Camera projection matrices (units in millimeters)
- Ground truth measurements
In addition, the ground truth file, the camera calibration, and background training images are provided, as described below.
Image filenames have the following form:
F.[FrameNumber].[CameraNumber].[ImageType].png
where:
- [FrameNumber] is a 6-digit integer representing the frame number; in this dataset it ranges from 000001 to 000214
- [CameraNumber] is a 2-digit integer representing the camera number; in this dataset it ranges from 01 to 10
- [ImageType] is a two-character string denoting the type of the image: UD for original images and BG for foreground/background masks
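As an illustration, the following sketch parses filenames of this form; the regular expression simply encodes the field widths and type codes listed above, and the helper name parse_filename is an illustrative assumption.

```python
import re

FILENAME_RE = re.compile(
    r"F\.(?P<frame>\d{6})\.(?P<camera>\d{2})\.(?P<type>UD|BG)\.png"
)

def parse_filename(name):
    """Return (frame_number, camera_number, image_type) or raise ValueError."""
    match = FILENAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a dataset image filename: {name}")
    return int(match["frame"]), int(match["camera"]), match["type"]

print(parse_filename("F.000001.01.UD.png"))  # -> (1, 1, 'UD')
```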
The ground truth file (GroundTruth.txt) is a text file with one line for every frame. Each line has the following format:
Index FrameNumber Yaw_Angle Pitch_Angle Roll_Angle
This file can be downloaded here.
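The following sketch shows one way to read this file, assuming whitespace-separated numeric fields in the order given above; the helper name load_ground_truth is an illustrative assumption.

```python
def load_ground_truth(path="GroundTruth.txt"):
    """Return a dict mapping frame number -> (yaw, pitch, roll) in degrees."""
    ground_truth = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) != 5:
                continue  # skip blank or malformed lines
            _, frame, yaw, pitch, roll = fields
            ground_truth[int(frame)] = (float(yaw), float(pitch), float(roll))
    return ground_truth
```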
The calibration of the cameras is provided in the form of projection matrices and can be found here. Unit length is one millimeter.
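The following sketch illustrates how such a 3x4 projection matrix can be applied, under the conventional pinhole model where a homogeneous world point X (in millimeters) maps to pixel coordinates via x ~ P X; the example matrix is hypothetical, not taken from the calibration files.

```python
import numpy as np

def project(P, point_mm):
    """Project a 3D world point (in mm) to pixel coordinates with matrix P."""
    X = np.append(np.asarray(point_mm, dtype=float), 1.0)  # homogeneous coords
    x = P @ X
    return x[:2] / x[2]  # divide out the homogeneous scale

# Hypothetical matrix for illustration; real matrices come from the files above.
P = np.array([[1000.0, 0.0, 640.0, 0.0],
              [0.0, 1000.0, 480.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
print(project(P, [100.0, 200.0, 1300.0]))  # a point at the 1.3 m head height
```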
The background training images can be downloaded here.
The dataset is partitioned into 11 sub-datasets and packaged as RAR files, as shown in the table below. The leftmost column of the table contains the links to the RAR files.