Shape Completion

1. Introduction

The problem of shape completion in modern research is usually defined as a problem of point cloud completion. Point cloud is a collection of points in space. The main reason for the popularity of point cloud representation of a 3D shape lies in the way the 3D data I collected. Most of the 3D sensors like LiDAR or RGB-D represent the captured data as a point cloud. Therefore point clouds are commonly used as an input for different 3D computer vision tasks like 3D reconstruction, object detection, shape classification and shape generation.

1.1 Motivation

However, the real-world data obtained with RGB-D or LiDAR sensors is incomplete. There are multiple reasons for this problem:

Occlusions – part of a captured shape may be occluded.
Resolution issues – sensors can capture only a finite number of points.
“Environmental issues” – this includes any external influence on the shape capture process (like lighting and mechanical shake)

As it was shown in [5], incompleteness of input point clouds may result in the loss of accuracy for downstream tasks. Therefore, the shape completion (or point cloud completion) becomes the necessary preprocessing step for any real-world data.

1.2 Problem Statement

A complete point cloud could be defined as a set of points on K-dimensional space. This space can include not just the spatial coordinates of points, but also various point features. Partial point cloud can be defined as a subset of points from the complete point cloud:

Then the shape completion task could be defined as the following optimization problem:

Here we try to find the function from the space of partial point clouds to the space of complete point clouds that is best at approximating complete point cloud according to some distance metric. The 2 most popular distance metrics are Chamfer distance and Earth Mover’s distance.Chamfer distance is defined as follows.

Earth Mover’s Distance, on the other hand, finds the optimal bijection between 2 point clouds:

CD can be calculated for arbitrary point clouds, while EMD assumes that both point clouds have same sizes.

Considering the algorithms of point cloud completion there are 3 major challenges:

Point clouds are unstructured in their nature
The output of the algorithm should be invariant to input permutations and the affine transformations of the point cloud.
The output of the algorithm should be a high-resolution completed point cloud

2. Overview of methods

The methods of point cloud completion can roughly be divided into the following groups: point-based, convolution-based, graph-based, folding-based, transformer-based, GAN-based and VAE-based.

Figure 2.1: Classification of point cloud completion methods [1]

2.1 Point-based methods

Point-based methods process the point cloud as a matrix of points by utilizing stacked MLP layers. First point-based cloud completion methods were inspired by PointNet [3] network and its later improved variant PointNet++ [4].

PointNet uses stacked MLP and symmetric max-pooling function to obtain global feature. Specialized network (T-net) predicts the affine transformation matrix which ensures invariance to affine transformations.

PointNet++ uses FPS and grouping operations and PointNets united into the set abstraction layer to better extract local neighborhood information.

(a) (b)

Figure 2.2: (a) Architecture of PointNet++ [4] (b) Architecture of PointNet [3]

2.1.1 PCN [5]

Input: Incomplete point cloud

Key idea: Use PointNet get global features and 2-stage decoder to obtain completed point cloud

Algorithm:

Use PointNet to extract global feature.
Process global feature with MLP to get coarse point cloud.
Refine coarse point cloud with folding-based decoder (see 2.4)

Loss: CD+EMD

Figure 2.3: Architecture of PCN [5]

2.1.2 TopNet [6]

Input: Incomplete point cloud

Key idea: Model point cloud topology in decoder using MLPs arranged in a tree-like structure

Algorithm:

Use PointNet to extract global feature.
Starting from global feature hierarchically model smaller subsets of points using rooted tree structure. In the final layer of the tree compute the points in the completed point cloud.

Loss: CD

Figure 2.4: Architecture of TopNet decoder [6]

2.1.3 ASFM-Net [7]

Input: Incomplete point cloud

Key idea: Use siamese autoencoder to generate better detailed point clouds and further refine them with refinement unit.

Algorithm:

Use 2 PCN-like networks in siamese autoencoder to generate coarse point clouds. The first network is trained on complete point clouds and then freezed. Second network is trained on partial point clouds and uses decoder of the first network.
Merge coarse point cloud with input and use FPS to solve the non-uniform density problem.
Use folding-based decoder with 3 MLPs to generate point cloud of a better resolution.

Loss:

Completion loss: CD
Feature loss: Eucledian distance on encoder features.

(a) (b)

Figure 2.5: (a) General architecture of ASFM-Net (b) Architecture of siamese autoencoder

2.2 Convolution-based methods

CNNs achieved state-of-the-art results on many tasks related to 2D images. Therefore, many researchers tried to apply them to a point cloud completion problem. However, point clouds are unstructured in their nature, so ordinary 3D convolutions cannot be directly applied to them. There are 2 general ways to solve this problem: voxelize the point cloud into 3D voxel array and use standard 3D convolutions or, alternatively, use specialized convolutions that can work on unstructured data.

The voxelization approach has the quantization problem: because we transform our data from continuous space to discrete space we inevitably lose the fine-grained details and local information.

Alternatively, one can use special convolutions that work on unstructured point clouds. Foe example:

Pointwise convolution [8] – puts every neighborhood point into kernel cell and performs convolution.
χ-conv [9] - aggregates local features on points.
KPConv [10] - uses a combination of points as a kernel instead of pixel array.

(a) (b) (c)

Figure 2.6: Various convolutions that work on point cloud data: (a) Pointwise (b) KPConv (c) χ-conv

2.2.1 GRNet [12]

Input: Voxel array obtained by applying gridding operation on input.

Key idea: Preserve spatial information during voxelization by using gridding operation.

Algorithm:

Use gridding operation which selects the values on grid vertices based on distance to the points in adjacent voxels.
Obtain grid of coarse point cloud by using 3D CNN. Get the coarse point cloud by putting a single point into each voxel based on adjacent grid values (reverse gridding).
Sample point features as a concatenation of CNN features from adjacent grid vertices (cubic feature sampling).
Pass coarse point cloud through MLP-based decoder to obtain detailed point cloud.

Loss: L1 loss between grids.

Figure 2.7: Architecture of GRNet

2.2.2 ViPC [11]

Input: Incomplete point cloud + single view image

Key idea: Use object image as a guidance in determining the final shape

Algorithm:

Generate point cloud from image (CNN) and match it with input point cloud. Use FPS for uniform density.
Split the resulting cloud into fine part and coarse part (points from image).
Create global feature that uses 2D and 3D guidance.
Generate offsets of points in refined point cloud using CNN. For fine part points one of this offsets is set to be small.

Loss: CD+EMD

(a) (b)

Figure 2.8: ViPC: (a) Offset predictor (b) Global architecture

2.3 Graph-based methods

Graph methods view point cloud as a set of vertices and try to build a graph on this set.

The graph-based method is DGCNN [13]. It defined edge convolution operation (EdgeConv) as the aggregation of edge features (between neighbors) using symmetric function. Depending on the edge function (calculates edge features) we can have full or partial transformation invariance. PointNet and PointNet++ could be modelled by DGCNN.

2.3.1 Graph-guided deformation [14]

Input: 3 resolutions of an incomplete point cloud obtained via FPS (Farthest Point Sampling).

Key idea: Build point cloud graph and use mesh deformation methods to refine it

Algorithm:

Obtain features for each resolution using PointNet, concatenate them pass through MLP to obtain global feature. Pass it through TopNet decoder to obtain coarse point cloud.
Merge coarse point cloud and input point cloud, use FPS and modified PointNet to generate point features. Build graph on the point cloud using KNN.
Refine the point cloud graph by passing it through GCN layers with residual connections.

Loss:

Reconstruction loss: CD.
Matching loss: Huber loss between points of fine part. Ensures that fine part is not changed greatly.
Shape-preserving loss: Umbrella Laplacian loss. Ensures that mesh deformations are smooth.

Figure 2.9: Architecture of graph-guided deformation

2.3.2 PC-RGNN [15]

Input: 3 resolutions of an incomplete point cloud obtained via FPS

Key idea: Use GCN for multi-resolution prediction of complete point cloud.

Algorithm:

For each resolution generate a global graph feature via GCN.
Merge resolution features into a single global feature using MLP.
Fully-connected layers are used to generate point clouds of different resolutions.

Loss:

Multi-level completion loss: CDs for different resolutions.
Adversarial loss: Obtained via discriminator that consists of 2 GCN layers and fully-connected layers.

Figure 2.10: Architecture of PC-RGNN

2.4 Folding-based methods

These methods generate complete point cloud by deforming a 2D grid around each point in incomplete point cloud. The first algorithm to use such method was FoldingNet.

FoldingNet [14]

Input: Incomplete point cloud

Key idea: Map 2D grid into the completed point cloud based on global point cloud feature

Algorithm:

Build KNN graph. Compute local covariance matrices and concatenate with point coordinates to get point features.
Pass point features through MLP and 2 graph convolution layers to get global point cloud feature.
Concatenate global feature with grid coordinates and pass this matrix through 2 MLPs to get final point cloud.

Loss: CD

Figure 2.11: Architecture of FoldingNet

2.5 Transformer-based methods

Transformers achieved state-of-the-art results in NLP tasks. But in the last years they sterted to be used in point cloud completion tasks. Such methods view point cloud completion problem similarly to seq2seq problem in NLP. One of the first tansformer-based methods was PoinTr.

PoinTr[17]

Input: Sequence of point proxies obtained through DGCNN on selected point centers sampled using FPS concatenated with location information (coordinates passed through MLP).

Key idea: Use modified geometry-aware transformers to extract local features and coordinates of coarse point cloud.

Algorithm:

Pass the input feature sequence through the geometry-aware transformer encoder. Geometry-aware transformer uses KNN to query features of other points and aggregates them with fully-connected and max-pooling layers. This geometric feature is then concatenated with semantic feature from multi-head attention and transformed with fully-connected layer.
Extract global encoder feature via max-pooling. Use linear projection layer on it to generate center coordinates of coarse point cloud.
Concatenate global encoder feature with coordinates and pass them via MLP to generate dynamic queries for decoder. Use geometry-aware transformer decoder to predict features of coarse point cloud.
Use FoldingNet decoder for refinement.

Loss: CD

Figure 2.12: Architecture of PoinTr

2.5.1 SnowflakeNet [18]

Input: Incomplete point cloud

Key idea: Generate final point cloud iteratively by "splitting" points in input point cloud.

Algorithm:

Get global point cloud feature using PointNet++ encoder.
Generate coarse point cloud using point-wise splitting (operation that uses deconvolution to build multiple point features from a single one) followed by 2 MLP layers.
Merge coarse point cloud with input and use FPS to make the density uniform.
Generate more detailed point clouds using Snowflake Point Deconvolution (SPD) that determines the offsets of new points from old ones. SPD obtains local point features with PointNet ans passes them through skip-transformer which also receives offsets from previous SPD layer. Finally, it repeats step 2 to obtain offsets and spplies them to current point cloud.

Loss:

Completion loss: CD (for coarse cloud and SPD outputs).
Preservation loss: one-way CD that matches output point cloud to input point cloud. This loss checks that the input point cloud structure is well-preserved.

Figure 2.13: Architecture of SnowflakeNet

2.6 GAN-based methods

Originally used in 2D image generation task, GANs also became prominent in point cloud completion algorithms. One example of such algorithm is Cascaded Refinement Network.

CRN [21]

Input: Incomplete point cloud

Key idea: Use iterative refinement to generate detailed point cloud and discriminator on different resolutions to learn fine-grained completed shapes.

Algorithm:

Extract global feature using PointNet. Generate coarse point cloud with FC. Iteratively obtain higher resolutions with lifting module (MLPs arranged in contraction-expansion manner).
Use FPS to obtain patches of generated point cloud at different resolutions and pass them to MLP-based discriminator.

Loss:

Reconstruction loss: CD.
Adversarial loss: LS-GAN loss [22].

(a) (b)

Figure 2.14: Architecture of CRN: (a) Discriminator (b) Generator

2.7 VAE-based methods

VAE-based methods learn distribution over possible complete 3D shapes based on the incomplete input. That is why VAEs can output multiple alternative completed point clouds. The problem, however, lies in the dimensionality of the space of complete point clouds which is extremely large. One example of how VAE-based methods deal with that problem is AutoSDF.

AutoSDF [19]

Input: Incomplete point cloud

Key idea: Use VQ-VAE to work on low-dimensional discrete space.

Algorithm:

Split the incomplete shape into regularly organized parts (similar to voxelization) and encode them independently (to obtain pure local features without considering global shape). Form the grid.
Learn the distribution over the grids by factorizing them into a product of conditional distributions. Use randomly permuted sequence to avoid spatial ordering. Probabilities are modelled by transformer.
Map the distribution over grids into complete 3D shapes.

Loss:

Reconstruction loss: CD
VQ objective.
Commitment loss.

Figure 2.15: Architecture of AutoSDF

3. Results

The most popular datasets for evaluating shape completion algorithms are:

KITTI [23] - dataset of real RGB-D and LiDAR scans from driving cars.
ShapeNet [24] - dataset of 3D CAD (synthetic) models.
Completion3D [6] - point cloud completion benchmark proposed by TopNet creators. Derived from ShapeNet.

The claimed results of reviewed algorithms are shown in the table below (the value in brackets for ShapeNet is the output size).

Algorithm	ShapeNet (CD)	Completion3D (CD)	KITTI (Consistency)
PCN (point)	5.70 (16384)	18.22	0.01163
TopNet (point)	0.972 (2048)	14.25	0.014
ASFM-Net (point)	12.09 (4096)	6.68	0.020
ViPC (conv)	3.308 (2048)	-	-
GRNet (conv)	0.27 (16384)	10.64	0.313
Graph-guided deformation (graph)	0.60 (2048)	-	-
FoldingNet (folding)	0.7142 (16384)	19.07	1.053
PoinTr (trans)	1.09 (8192, ShapeNet-55)	-	-
SnowflakeNet (trans)	1.86 (2048)	7.60	-
CRN (gan)	8.505 (16384)
AutoSDF (vae)	1.331	-	-

4. Medical Applications

Shape completion algorithms could have the following medical applications:

Guided surgery - In [20] researchers used VAE-based shape completion algorithm to predict missing bone segments for jaw reconstruction surgery.
Implant shape prediction - In [2] researchers used convolution-based shape completion algorithm to predict the shape of cranial implant based on 3D image of fractured skull.

(a) (b)

Figure 4.1: Medical applications of shape completion algorithms: (a) Implant shape prediction (b) Guided surgery

5. Discussion & Own review

5.1 Review of methods

Methods	Data	Key operations / layers	Challenges
Point-based	Set of points	MLPs	Need to consider local relationships between points
Convolution-based	Point grid or voxel array	Convolutions (Pointwise, 3D)	Data regularization (to perform convolution operations)
Graph-based	Weighted graph	Graph convolutions	Selection of convolutional operator; Exploitation of geometric relationships
Folding-based	2D Grid	Grid deformations	Generation of detailed point clouds
Transformer-based	Sequence of points (seq2seq)	Transformer layers	Model size; Point clouds are not sequentially structured
GAN-based	Collection of real and generated point clouds	Discriminator network	Training of GAN models with high-resolution output
VAE-based	Probability distribution over 3D shapes	VAEs	Factorization of probability distribution

5.2 Review of algorithms

Almost all of the shape completion algorithms have following characteristics:

End-to-end architecture (encoder+decoder or generator+discriminator)
Use of symmetric functions to acvieve permutation invariance.
Translation and rotation invariance are achieved by predicting input transformation matrix (T-Net) or by using transformation-invariant operations (such as some graph convolutions).
High-resolution output is obtained via iterative refinement (typically 2-stage with the generation of coarse and detailed point clouds; some algorithms feature 3-stage (SnowflakeNet) and 4-stage (CRN) generation).
Use of FPS to obtain lower-resolution representations or solve non-uniform density problem (especially when merging 2 point clouds together).

The more detailed review of algorithms is presented in the following table:

Algorithm	Encoder (Generator) Architecture	Decoder (Discriminator) Architecture	Advantages	Disadvantages
PCN (point)	PointNet	MLP, Folding	No quantization because of voxelization.	Lack of details in completed shapes. Computational complexity. Cannot take into account local neighborhood relations.
TopNet (point)	PointNet	TopNet (tree-like MLPs)	Output point clouds can be arbitrary. Architecture is interpretable and based on strong mathematical basis of topology.	Scalability for large output size
ASFM-Net (point)	Similar to PCN		More advanced refinement compared to TopNet and PCN.	Training time (need to train 2 encoders, 1 is not used). Information about complete point cloud may leak into second autoencoder because of feature matchine loss.
ViPC (conv)	PointNet, CNN	CNN	Single-view image gives more information to the encoder. Algorithm uses a wide range of point cloud features.	Algorithm is dependent on the quality of point cloud generation from image which is not considered in loss. Global feature has a high dimensionality. May lack quality in local neighborhood generation due to concatenation of all local features into a global one.
GRNet (conv)	CNN	CNN, MLP	Gridding operation allows to preserve some spatial information while making input regular.	Resolution of coarse point cloud is dependent on grid size. Higher sizes may result in loss of computational effectivness because input only influences a portion of voxels.
Graph-guided deformation (graph)	PointNet	TopNet, GCN	Allows for better surface approximation because of mesh deformations	Mesh deformations cannot upsample point cloud.
PC-RGNN (graph)	GCN, MLP	MLP	Use of adversarial loss improves quality of the output. Algorithm is proved to be efficient when using it for downstream tasks (object detection)	Hierarchical decoder does not consider smoothness between patches generated by close points.
FoldingNet (folding)	GCN, MLP	Folding	Better neighborhood approximation compared to ordinary MLP decoders.	No constraints on the intermediate point cloud. Does not use local features.
PoinTr (trans)	DGCNN, MLP, Transformer	Transformer, FoldingNet	Geometry-aware transformer architecture better captures dependencies between points (not just local, but also global).	Model can be large. Size of the model could be lowered only at the cost of input resolution
SnowflakeNet (trans)	PointNet++	SPD	Algorithm considers hierarchical features between resolutions while constructing completed output (in SPD layer)	PS operation does not consider smoothness between patches generated by close points. Large model size.
CRN (gan)	PointNet	MLP	Supports iterative refinement Uses multi-resolution features in discriminator	Decoder lacks the ability to take into account local neighborhood relations.
AutoSDF (vae)	VQ-VAE		Learns the entire distribution of possible completed shapes and, therefore, can produce multiple outputs.	Factorization still has inherent order even with random permutations

6. References

[1] Fei, Ben, et al. "Comprehensive Review of Deep Learning-Based 3D Point Clouds Completion Processing and Analysis." arXiv preprint arXiv:2203.03311 (2022).

[2] Mainprize, James G., Zachary Fishman, and Michael R. Hardisty. "Shape completion by U-Net: an approach to the AutoImplant MICCAI cranial implant design challenge." Cranial implant design challenge. Springer, Cham, 2020.

[3] Qi, Charles R., et al. "Pointnet: Deep learning on point sets for 3d classification and segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.

[4] Qi, Charles Ruizhongtai, et al. "Pointnet++: Deep hierarchical feature learning on point sets in a metric space." Advances in neural information processing systems 30 (2017).

[5] Yuan, Wentao, et al. "Pcn: Point completion network." 2018 International Conference on 3D Vision (3DV). IEEE, 2018.

[6] Tchapmi, Lyne P., et al. "Topnet: Structural point cloud decoder." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.

[7] Xia, Yaqi, et al. "Asfm-net: Asymmetrical siamese feature matching network for point completion." Proceedings of the 29th ACM International Conference on Multimedia. 2021.

[8] Hua, Binh-Son, Minh-Khoi Tran, and Sai-Kit Yeung. "Pointwise convolutional neural networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.

[9] Li, Yangyan, et al. "Pointcnn: Convolution on x-transformed points." Advances in neural information processing systems 31 (2018).

[10] Thomas, Hugues, et al. "Kpconv: Flexible and deformable convolution for point clouds." Proceedings of the IEEE/CVF international conference on computer vision. 2019.

[11] Zhang, Xuancheng, et al. "View-guided point cloud completion." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.

[12] Xie, Haozhe, et al. "Grnet: Gridding residual network for dense point cloud completion." European Conference on Computer Vision. Springer, Cham, 2020.

[13] Wang, Yue, et al. "Dynamic graph cnn for learning on point clouds." Acm Transactions On Graphics (tog) 38.5 (2019): 1-12.

[14] Shi, Jieqi, et al. "Graph-guided deformation for point cloud completion." IEEE Robotics and Automation Letters 6.4 (2021): 7081-7088.

[15] Zhang, Yanan, Di Huang, and Yunhong Wang. "PC-RGNN: Point cloud completion and graph neural network for 3D object detection." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 35. No. 4. 2021.

[16] Yang, Yaoqing, et al. "Foldingnet: Point cloud auto-encoder via deep grid deformation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.

[17] Yu, Xumin, et al. "Pointr: Diverse point cloud completion with geometry-aware transformers." Proceedings of the IEEE/CVF international conference on computer vision. 2021.

[18] Xiang, Peng, et al. "Snowflakenet: Point cloud completion by snowflake point deconvolution with skip-transformer." Proceedings of the IEEE/CVF international conference on computer vision. 2021.

[19] Mittal, Paritosh, et al. "Autosdf: Shape priors for 3d completion, reconstruction and generation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

[20] Abdi, Amir H., et al. "Variational shape completion for virtual planning of jaw reconstructive surgery." International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2019.

[21] Wang, Xiaogang, Marcelo H. Ang Jr, and Gim Hee Lee. "Cascaded refinement network for point cloud completion." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.

[22] Mao, Xudong, et al. "Least squares generative adversarial networks." Proceedings of the IEEE international conference on computer vision. 2017.

[23] Geiger, Andreas, et al. "Vision meets robotics: The kitti dataset." The International Journal of Robotics Research 32.11 (2013): 1231-1237.

[24] Chang, Angel X., et al. "Shapenet: An information-rich 3d model repository." arXiv preprint arXiv:1512.03012 (2015).

Seitenhierarchie

Shape Completion

1. Introduction

1.1 Motivation

1.2 Problem Statement

2. Overview of methods

2.1 Point-based methods

2.1.1 PCN [5]

2.1.2 TopNet [6]

2.1.3 ASFM-Net [7]

2.2 Convolution-based methods

2.2.1 GRNet [12]

2.2.2 ViPC [11]

2.3 Graph-based methods

2.3.1 Graph-guided deformation [14]

2.3.2 PC-RGNN [15]

2.4 Folding-based methods

2.5 Transformer-based methods

2.5.1 SnowflakeNet [18]

2.6 GAN-based methods

2.7 VAE-based methods

3. Results

4. Medical Applications

5. Discussion & Own review

5.1 Review of methods

5.2 Review of algorithms

6. References