General information about this project can be found here.

Dataset for Learning Object Affordances

Scenes and Labels

[Image: sample rooms and objects]

Our dataset contains 20 scenes from 3 categories (6 living rooms, 7 kitchens and 7 offices) and 47 objects of 19 types (such as dishware, books, fruit, lamps and computers). Scenes were downloaded from the Google 3D Warehouse and then converted to .obj format. We used the virtual scanner from the Point Cloud Library to generate a sparse point cloud for each scene. The point cloud contains surface points of the empty scene, which are used to sample human poses and object locations. Each line of the file gives the X, Y and Z coordinates of one point (in millimeters, separated by single spaces).

We also converted the scenes to the .sh3d format used by Sweet Home 3D. This free Java application lets users load and move objects, so we use it for labeling. (These files are large and are not needed for running the code.)

Label Format

For each scene, we asked 3 to 5 subjects (not associated with the project) to manually label the location and orientation of every object in the scene. Each label is a full arrangement of multiple objects in one scene. Each line of a label file describes the placement of one object, in the following format:


where X, Y and Z are in millimeters and rotation is in radians.
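As a rough illustration of reading such a line, here is a hedged Python sketch. The exact field layout is not reproduced above, so this parser ASSUMES each line reads "object_id X Y Z rotation" (an integer id, X/Y/Z in millimeters, rotation in radians); adjust the field order to match the actual label files:

```python
def parse_label_line(line):
    """Parse one object placement from a label file.

    ASSUMED layout: "object_id X Y Z rotation" -- the true field
    order must be checked against the dataset's label files.
    """
    fields = line.split()
    obj_id = int(fields[0])
    x, y, z = (float(v) for v in fields[1:4])
    rotation = float(fields[4])  # radians
    return {"id": obj_id, "x": x, "y": y, "z": z, "rotation": rotation}
```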

For each object, we labeled its type and the original rotation of its 3D model, and computed its dimensions. This information can be found in the all.conf file, in the following format:


Labels: .sh3d, .txt


Scenes and labels (512M) (README)

Objects: .obj, all.conf

Human Poses

[Image: the six human poses]

We clustered the human skeletons from the CAD-60 dataset into 6 human poses (shown above) using the K-means algorithm. The poses are defined in skl_aligned_list.txt, where each row specifies one pose in the following format:


	pose_id		=> integer from 1 to 6

	Xi, Yi, Zi 	=> location of the ith joint
	Joint number	=>
		     1 -> HEAD
		     2 -> NECK
		     3 -> TORSO
		     4 -> LEFT_SHOULDER
		     5 -> LEFT_ELBOW
		     6 -> RIGHT_SHOULDER
		     7 -> RIGHT_ELBOW
		     8 -> LEFT_HIP
		     9 -> LEFT_KNEE
		    10 -> RIGHT_HIP
		    11 -> RIGHT_KNEE
		    12 -> LEFT_HAND
		    13 -> RIGHT_HAND
		    14 -> LEFT_FOOT
		    15 -> RIGHT_FOOT
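The row format above can be parsed as follows. This sketch assumes each row is whitespace-separated, with the pose_id followed by 15 (X, Y, Z) triples in the joint order listed above (the column layout is an assumption; verify it against the README in the download):

```python
# Joint order as listed in the format specification above.
JOINT_NAMES = [
    "HEAD", "NECK", "TORSO",
    "LEFT_SHOULDER", "LEFT_ELBOW", "RIGHT_SHOULDER", "RIGHT_ELBOW",
    "LEFT_HIP", "LEFT_KNEE", "RIGHT_HIP", "RIGHT_KNEE",
    "LEFT_HAND", "RIGHT_HAND", "LEFT_FOOT", "RIGHT_FOOT",
]

def parse_pose_row(row):
    """Return (pose_id, {joint_name: (x, y, z)}) for one row of
    skl_aligned_list.txt, assuming "pose_id X1 Y1 Z1 ... X15 Y15 Z15"."""
    vals = row.split()
    pose_id = int(vals[0])
    coords = [float(v) for v in vals[1:]]
    assert len(coords) == 3 * len(JOINT_NAMES), "expected 15 joints"
    joints = {
        name: tuple(coords[3 * i: 3 * i + 3])
        for i, name in enumerate(JOINT_NAMES)
    }
    return pose_id, joints
```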

Download the six poses and the MATLAB code for loading them (README).

Object Affordances

Download the learned affordances and visualization code (README).

Scene Labeling Experiments

In our scene labeling experiments (described in [1]), we use hallucinated humans to generate human-context features. You can reproduce our experiments in the following two steps:

  • Generating features
  • You can directly download our features (README), or download our MATLAB code and run 'main_DP_cornellrgbd.m'.

  • Labeling scenes
  • Follow the instructions on the Cornell RGBD Dataset page to download and run the code.

    [1] Hallucinated Humans as the Hidden Context for Labeling 3D Scenes, Yun Jiang, Hema S. Koppula, Ashutosh Saxena. In Computer Vision and Pattern Recognition (CVPR), 2013 (oral). [PDF]