Semantic Scene Labeling for Personal Robots

We consider the problem of semantic scene labeling from RGB-D data for personal robots. Given a co-registered pair of RGB and depth images, a colored 3D point cloud can be readily obtained. We formulate a Conditional Random Field (CRF) model that incorporates relations between visual/geometric features and labels, relations between contexts, and relations between hierarchical labels.
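To illustrate how a colored point cloud follows from a co-registered RGB and depth pair, here is a minimal sketch that assumes a standard pinhole camera model. The function name `backproject`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the convention that a non-positive depth marks a missing reading are all assumptions made for illustration; they are not taken from the released code.

```python
from typing import List, Sequence, Tuple

# One colored point: (x, y, z, r, g, b). Hypothetical layout for this sketch.
ColoredPoint = Tuple[float, float, float, int, int, int]


def backproject(
    depth: Sequence[Sequence[float]],
    rgb: Sequence[Sequence[Tuple[int, int, int]]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[ColoredPoint]:
    """Back-project a co-registered depth/RGB pair into colored 3D points.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy), and that depth[v][u] and rgb[v][u] refer to the same pixel.
    """
    points: List[ColoredPoint] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # assumed convention: non-positive depth means no reading
                continue
            x = (u - cx) * z / fx  # horizontal offset scaled by depth
            y = (v - cy) * z / fy  # vertical offset scaled by depth
            r, g, b = rgb[v][u]
            points.append((x, y, z, r, g, b))
    return points


# Tiny 2x2 example with made-up values.
depth_image = [[1.0, 0.0], [2.0, 2.0]]
rgb_image = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (10, 20, 30)]]
cloud = backproject(depth_image, rgb_image, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

Each resulting point carries both geometry and color, so visual and geometric features can be computed over the same set of points before labeling.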

3D Point Cloud Labeling

[Figure panels: Original Scene, Ground Truth Labels, Predicted Labels]

Hierarchical Semantic Labeling

[Figure panels: 2D Label Mask, 3D Label Mask, Semantic Hierarchy Graph]

Popular Press

New Scientist, ACM Technews, Newswise, Zee News, News Tonight, Azo Robotics, VoiCE, iNewsOne.



Download data and code for [2,3].

Data and code for [1] to be released.


  [1] Hierarchical Semantic Labeling for Task-Relevant RGB-D Perception, Chenxia Wu, Ian Lenz, Ashutosh Saxena. In Robotics: Science and Systems (RSS), 2014. [PDF]
  [2] Contextually Guided Semantic Labeling and Search for 3D Point Clouds, Abhishek Anand, Hema S. Koppula, Thorsten Joachims, Ashutosh Saxena. In IJRR, 2012. [PDF]
  [3] Semantic Labeling of 3D Point Clouds for Indoor Scenes, Hema S. Koppula, Abhishek Anand, Thorsten Joachims, Ashutosh Saxena. In NIPS, 2011. [PDF]
  [4] Labeling 3D Scenes for Personal Assistant Robots, Hema Swetha Koppula, Abhishek Anand, Thorsten Joachims, Ashutosh Saxena. In RSS Workshop on RGB-D Cameras, 2011. [PDF]

Related Publications:

  1. 3D-Based Reasoning with Blocks, Support, and Stability, Zhaoyin Jia, Andy Gallagher, Ashutosh Saxena, Tsuhan Chen. In Computer Vision and Pattern Recognition (CVPR), 2013. [PDF]
  2. Hallucinated Humans as the Hidden Context for Labeling 3D Scenes, Yun Jiang, Ashutosh Saxena. In Computer Vision and Pattern Recognition (CVPR), 2013. [PDF, project page]


Hema Koppula (hema at), corresponding author for [2,3,4]
Chenxia Wu (chenxiawu at), corresponding author for [1]
Abhishek Anand
Gaurab Basu
Prof. Thorsten Joachims (tj at)
Prof. Ashutosh Saxena (asaxena at)

Related Projects

CCM for holistic scene understanding

RGB-D Human Activity Detection