Scene Understanding for Personal Robots

We consider the problem of high-level scene understanding for personal robots. Thanks to the availability of Kinect sensors, our robots can now easily obtain colored 3D point clouds of their environment. We perform structured prediction to label these point clouds with 17 object categories. We use a graphical model that captures various features and contextual relations, including local visual appearance and shape cues, object co-occurrence relationships, and geometric relationships. The model is trained using a maximum-margin learning approach.
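To make the idea of a graphical model with node and edge terms concrete, here is a minimal sketch of how one joint labeling of a segmented scene could be scored, and how a maximum-margin (structured hinge) term compares it with a competing labeling. It assumes a generic pairwise log-linear model over segments; the function names, array shapes, and the Hamming loss are illustrative assumptions, not the formulation or code released with this project.

```python
import numpy as np


def scene_score(labels, node_feats, edges, edge_feats, w_node, w_edge):
    """Score one joint labeling of all segments in a scene (hypothetical model).

    labels:     (n,) int sequence, one class index per segment
    node_feats: (n, d_n) array of per-segment appearance/shape features
    edges:      list of (i, j) index pairs for neighboring segments
    edge_feats: (len(edges), d_e) array of pairwise geometric features
    w_node:     (K, d_n) weights, one row per class
    w_edge:     (K, K, d_e) weights, one vector per ordered class pair
    """
    # Node term: how well each segment's own features match its label.
    score = sum(w_node[labels[i]] @ node_feats[i] for i in range(len(labels)))
    # Edge term: how compatible the labels of two neighboring segments are,
    # given their relative geometry (co-occurrence and geometric context).
    for (i, j), f in zip(edges, edge_feats):
        score += w_edge[labels[i], labels[j]] @ f
    return float(score)


def hamming_loss(y_true, y_other):
    """Number of segments on which two labelings disagree."""
    return float(np.sum(np.asarray(y_true) != np.asarray(y_other)))


def margin_violation(y_true, y_other, node_feats, edges, edge_feats, w_node, w_edge):
    """Structured hinge term: positive when y_other scores too close to y_true."""
    args = (node_feats, edges, edge_feats, w_node, w_edge)
    gap = scene_score(y_other, *args) - scene_score(y_true, *args)
    return max(0.0, hamming_loss(y_true, y_other) + gap)


# Toy scene: two segments, two classes, one edge between the segments.
node_feats = np.array([[1.0, 0.0], [0.0, 1.0]])
edges = [(0, 1)]
edge_feats = np.array([[1.0]])
w_node = np.array([[1.0, 0.0], [0.0, 1.0]])
w_edge = np.zeros((2, 2, 1))
w_edge[0, 1, 0] = 0.5  # class 0 next to class 1 is rewarded

good = scene_score([0, 1], node_feats, edges, edge_feats, w_node, w_edge)
bad = scene_score([1, 1], node_feats, edges, edge_feats, w_node, w_edge)
```

In this toy scene the labeling `[0, 1]` scores higher than `[1, 1]` because both the node term and the edge term favor it. Maximum-margin training would adjust `w_node` and `w_edge` so that the ground-truth labeling beats every competing labeling by a margin that grows with the number of mislabeled segments.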

[Figure panels: Original Scene, Ground Truth Labels, Predicted Labels]

Popular Press

New Scientist, ACM Technews, Newswise, Zee News, News Tonight, Azo Robotics, VoiCE, iNewsOne.



Download data and code.

Publications:


  1. Contextually Guided Semantic Labeling and Search for 3D Point Clouds, Abhishek Anand, Hema S. Koppula, Thorsten Joachims, Ashutosh Saxena. In IJRR, 2012. [PDF]
  2. Semantic Labeling of 3D Point Clouds for Indoor Scenes, Hema S. Koppula, Abhishek Anand, Thorsten Joachims, Ashutosh Saxena. In NIPS, 2011. [PDF]
  3. Labeling 3D scenes for Personal Assistant Robots, Hema Swetha Koppula, Abhishek Anand, Thorsten Joachims, Ashutosh Saxena. In RSS workshop on RGB-D Cameras, 2011. [PDF]

Related Publications:

  1. 3D-Based Reasoning with Blocks, Support, and Stability, Zhaoyin Jia, Andy Gallagher, Ashutosh Saxena, Tsuhan Chen. In Computer Vision and Pattern Recognition (CVPR), 2013. [PDF]
  2. Hallucinated Humans as the Hidden Context for Labeling 3D Scenes, Yun Jiang, Ashutosh Saxena. In Computer Vision and Pattern Recognition (CVPR), 2013. [PDF, project page]


Hema Koppula, hema at (Corresponding Author)
Abhishek Anand
Gaurab Basu
Prof. Thorsten Joachims, tj at
Prof. Ashutosh Saxena, asaxena at

Related Projects

CCM for holistic scene understanding

RGB-D Human Activity Detection