3D ShapeNets: A Deep Representation for Volumetric Shapes


3D shape is a crucial but heavily underutilized cue in object recognition, mostly due to the lack of a good generic shape representation. With the recent proliferation of inexpensive 2.5D depth sensors (e.g., Microsoft Kinect), it is even more urgent to have a useful 3D shape model in an object recognition pipeline. Furthermore, when recognition has low confidence, an object recognition system needs a fail-safe mode that intelligently chooses the best next view, obtaining an extra observation from another viewpoint in order to reduce the uncertainty as much as possible. To this end, we propose to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid, using a Convolutional Deep Belief Network. Our model naturally supports object recognition from 2.5D depth maps and view planning for object recognition. We construct a large-scale 3D computer graphics dataset to train our model, and conduct extensive experiments to study this new representation.
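To make the voxel-grid representation concrete, here is a minimal Python sketch of the input side only: it turns a point set into a binary occupancy grid and measures the uncertainty of a grid of per-voxel occupancy probabilities. It is not the Convolutional Deep Belief Network itself. The function names `voxelize` and `voxel_entropy`, the default grid size of 30, and the use of summed Bernoulli entropy as an uncertainty score are assumptions made for illustration, not details taken from this page.

```python
import numpy as np


def voxelize(points, grid_size=30):
    """Map an (N, 3) point array to a binary occupancy grid.

    grid_size is an illustrative assumption; any positive integer works.
    """
    points = np.asarray(points, dtype=float)
    lo = points.min(axis=0)
    # Scale uniformly by the longest side so the aspect ratio is preserved.
    extent = (points.max(axis=0) - lo).max()
    if extent == 0:
        extent = 1.0  # degenerate case: all points coincide
    idx = np.floor((points - lo) / extent * (grid_size - 1)).astype(int)
    idx = np.clip(idx, 0, grid_size - 1)
    grid = np.zeros((grid_size,) * 3, dtype=np.uint8)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return grid


def voxel_entropy(probs, eps=1e-12):
    """Sum of per-voxel Bernoulli entropies (in nats).

    Treats each voxel as an independent binary variable, which is a
    simplification chosen for this sketch.
    """
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    return float(-(p * np.log(p) + (1.0 - p) * np.log(1.0 - p)).sum())


if __name__ == "__main__":
    grid = voxelize([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
    print(grid.shape, int(grid.sum()))
    print(voxel_entropy(np.full((2, 2, 2), 0.5)))
```

A fully observed binary grid has entropy close to zero, while a grid of 0.5 probabilities is maximally uncertain. A view planner in this spirit would prefer the viewpoint expected to lower that uncertainty the most.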


Supplementary Materials


ModelNet Benchmark Leaderboard

Please email Shuran Song to add or update your results.

Wang et al. [22]: 93.8%
ECC [21]: 83.2%, 90.0%
PANORAMA-NN [20]: 90.7%, 83.5%, 91.1%, 87.4%
MVCNN-MultiRes [19]: 91.4%
FPNN [18]: 88.4%
Klokov and Lempitsky [16]: 91.8%, 94.0%
LightNet [15]: 86.90%, 93.39%
Xu and Todorovic [14]: 81.26%, 88.00%
Geometry Image [13]: 83.9%, 51.3%, 88.4%, 74.9%
Set-convolution [11]: 90%
PointNet [12]: 77.6%
3D-GAN [10]: 83.3%, 91.0%
VRN Ensemble [9]: 95.54%, 97.14%
ORION [8]: 93.8%
FusionNet [7]: 90.8%, 93.11%
Pairwise [6]: 90.7%, 92.8%
MVCNN [3]: 90.1%, 79.5%
GIFT [5]: 83.10%, 81.94%, 92.35%, 91.12%
VoxNet [2]: 83%, 92%
DeepPano [4]: 77.63%, 76.81%, 85.45%, 84.18%
3DShapeNets [1]: 77%, 49.2%, 83.5%, 68.3%

[1] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang and J. Xiao. 3D ShapeNets: A Deep Representation for Volumetric Shapes. CVPR 2015.
[2] D. Maturana and S. Scherer. VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition. IROS 2015.
[3] H. Su, S. Maji, E. Kalogerakis, E. Learned-Miller. Multi-view Convolutional Neural Networks for 3D Shape Recognition. ICCV 2015.
[4] B. Shi, S. Bai, Z. Zhou, X. Bai. DeepPano: Deep Panoramic Representation for 3-D Shape Recognition. Signal Processing Letters 2015.
[5] Song Bai, Xiang Bai, Zhichao Zhou, Zhaoxiang Zhang, Longin Jan Latecki. GIFT: A Real-time and Scalable 3D Shape Search Engine. CVPR 2016.
[6] Edward Johns, Stefan Leutenegger and Andrew J. Davison. Pairwise Decomposition of Image Sequences for Active Multi-View Recognition. CVPR 2016.
[7] Vishakh Hegde, Reza Zadeh. 3D Object Classification Using Multiple Data Representations.
[8] Nima Sedaghat, Mohammadreza Zolfaghari, Thomas Brox. Orientation-boosted Voxel Nets for 3D Object Recognition.
[9] Andrew Brock, Theodore Lim, J.M. Ritchie, Nick Weston. Generative and Discriminative Voxel Modeling with Convolutional Neural Networks.
[10] Jiajun Wu, Chengkai Zhang, Tianfan Xue, William T. Freeman, Joshua B. Tenenbaum. Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling. NIPS 2016.
[11] Siamak Ravanbakhsh, Jeff Schneider, Barnabas Poczos. Deep Learning with sets and point clouds.
[12] A. Garcia-Garcia, F. Gomez-Donoso, J. Garcia-Rodriguez, S. Orts-Escolano, M. Cazorla, J. Azorin-Lopez. PointNet: A 3D Convolutional Neural Network for Real-Time Object Class Recognition.
[13] Ayan Sinha, Jing Bai, Karthik Ramani. Deep Learning 3D Shape Surfaces Using Geometry Images. ECCV 2016.
[14] Xu Xu and Sinisa Todorovic. Beam Search for Learning a Deep Convolutional Neural Network of 3D Shapes.
[15] A Lightweight 3D Convolutional Neural Network for Real-Time 3D Object Recognition.
[16] Roman Klokov, Victor Lempitsky. Escape from Cells: Deep Kd-Networks for The Recognition of 3D Point Cloud Models.
[17] Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. CVPR 2017.
[18] Yangyan Li, Soeren Pirk, Hao Su, Charles R. Qi, and Leonidas J. Guibas. FPNN: Field Probing Neural Networks for 3D Data. NIPS 2016.
[19] Charles R. Qi, Hao Su, Matthias Niessner, Angela Dai, Mengyuan Yan, and Leonidas J. Guibas. Volumetric and Multi-View CNNs for Object Classification on 3D Data. CVPR 2016.
[20] K. Sfikas, T. Theoharis and I. Pratikakis. Exploiting the PANORAMA Representation for Convolutional Neural Network Classification and Retrieval. 3DOR 2017.
[21] Martin Simonovsky, Nikos Komodakis. Dynamic Edge-Conditioned Filters in Convolutional Neural Networks on Graphs.
[22] Chu Wang, Marcello Pelillo, Kaleem Siddiqi. Dominant Set Clustering and Pooling for Multi-View 3D Object Recognition.

Source code



This work is supported by gift funds from Intel Corporation and a Project X grant to the Princeton Vision Group, and by a hardware donation from NVIDIA Corporation. Z.W. is also partially supported by a Hong Kong RGC Fellowship. We thank Thomas Funkhouser, Derek Hoiem, Alexei A. Efros, Andrew Owens, Antonio Torralba, Siddhartha Chaudhuri, and Szymon Rusinkiewicz for valuable discussions.