DeltaConv: Anisotropic Point Cloud Learning with Exterior Calculus
Abstract
Learning from 3D point cloud data has rapidly gained momentum, motivated by the success of deep learning on images and the increased availability of 3D data. In this paper, we aim to construct anisotropic convolutions that work directly on the surface derived from a point cloud. This is challenging because of the lack of a global coordinate system for tangential directions on surfaces. We introduce a new convolution operator called DeltaConv, which combines geometric operators from exterior calculus to enable the construction of anisotropic filters on point clouds. Because these operators are defined on scalar and vector fields, we separate the network into a scalar stream and a vector stream, which are connected by the operators. The vector stream enables the network to explicitly represent, evaluate, and process directional information. Our convolutions are robust and simple to implement and show improved accuracy compared to state-of-the-art approaches on several benchmarks, while also speeding up training and inference.
1 Introduction
The success of convolutional neural networks (CNNs) on images and the increasing availability of point cloud data motivate generalizing CNNs from images to 3D point clouds [27, 45, 6]. A promising way to achieve this is to design convolutions that operate directly on the surface: a nonlinear manifold without a regular grid. Such intrinsic convolutions reduce the kernel space to tangent spaces, which are 2D on surfaces. This means the convolutions can be more efficient and the search space for kernels is reduced; they naturally ignore empty space; and they are robust to rigid and non-rigid deformations [3]. Examples of intrinsic convolutions are GCN [35], PointNet++ [60], and EdgeConv [73].
Our focus is on constructing anisotropic intrinsic convolutions, convolutions that are direction-dependent. This is difficult because of the fundamental challenge that nonlinear manifolds lack a global coordinate system. In differential geometry, this problem is often approached by designing operators which are coordinate-free. We will build on this knowledge to improve learning on point clouds. As an illustration of the problem, consider a CNN on images (fig. 1, left). Because an image has a globally consistent up direction, the network can build anisotropic filters that activate the same way across the image. For example, one filter can test for vertical edges and another for horizontal edges. No matter where the edges are in the image, the filter response is consistent. In subsequent layers, the output of these filters can be combined, e.g., to find a corner. Because we do not have a global coordinate system on surfaces (fig. 1, right), one cannot build and use anisotropic filters in the same way as on images. This limits current intrinsic convolutions on point clouds. For example, GCN's [35] filters are isotropic. PointNet++ [60] uses maximum aggregation and adds relative point positions, but still applies the same weight matrix to each neighboring point.
We introduce a new way to construct intrinsic anisotropic convolutions on point clouds. Our convolutions are described in terms of geometric operators instead of kernels. The operator-based perspective is familiar from GCN [35], which uses a combinatorial Laplacian on graphs. We allow the network to combine elemental geometric operators from exterior calculus, more specifically the de Rham complex: the gradient, co-gradient, divergence, curl, Laplacian, and Hodge Laplacian. These operators are defined on scalar fields and tangential vector fields. Hence, our networks are split into two streams: one stream contains scalars and the other tangential vectors. The operators map along and between the two streams (fig. 2). The vector stream encodes feature activations and directions along the surface, allowing the network to test and relate directions in subsequent layers. Depending on the task, the network outputs scalars or vectors. We name our convolutions DeltaConv after the symbol for the derivative in exterior calculus.
To get an idea of the benefit of these operators, consider the anisotropic filters proposed by Perona and Malik [56]. The Perona–Malik filter achieves anisotropic diffusion by combining the gradient, a scaling factor, a nonlinearity, and divergence. Our convolutions have access to all these building blocks and can thus construct anisotropic filters. Additional benefits of our approach are the following: by maintaining a stream of vector features throughout the network, our convolutions can relate directional information between different points on the surface. Together with the increased expressiveness of convolutions due to anisotropy, this results in increased accuracy over isotropic convolutions, as well as state-of-the-art approaches, as we show in our experiments. Also, each operator is implemented as a sparse matrix and the combinations of operators are computed per point, which is simple and efficient.
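To make the Perona–Malik analogy concrete, the sketch below implements the filter on a 1D signal using only the building blocks named above: gradient, an edge-stopping scaling nonlinearity, and divergence. The difference stencils, step size `dt`, and constant `kappa` are illustrative choices for this toy example, not part of DeltaConv.

```python
import numpy as np

def grad(u):
    # forward difference; zero gradient at the right boundary (no flux)
    return np.diff(u, append=u[-1])

def div(v):
    # backward difference, the discrete counterpart of -grad^T up to boundary terms
    return np.diff(v, prepend=0.0)

def perona_malik_step(u, dt=0.1, kappa=0.2):
    g = grad(u)
    c = 1.0 / (1.0 + (g / kappa) ** 2)  # edge-stopping weight: small where |grad| is large
    return u + dt * div(c * g)

# a noisy step edge: diffusion smooths the flat parts but keeps the jump sharp
rng = np.random.default_rng(0)
u = np.concatenate([np.zeros(50), np.ones(50)]) + 0.05 * rng.standard_normal(100)
u0_sum = u.sum()
for _ in range(100):
    u = perona_malik_step(u)
```

Because the edge-stopping weight shuts down diffusion across strong gradients, the noise is removed while the step edge survives; an isotropic (constant-weight) Laplacian would blur both.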
Other convolutions on point clouds support anisotropy in the following ways. Many works extend 2D convolutions to 3D convolutions, instead of building intrinsic filters. Recent examples of 3D Euclidean convolutions are SSCNs [26], MinkowskiNets [10], and KPConv [69]. These approaches construct kernels in three dimensions by learning weights for kernel points on a (pseudo-)grid and only apply these kernels to points from the point cloud. We take an orthogonal direction by building intrinsic convolutions, which operate in fewer dimensions and naturally generalize to (non-)rigidly deformed shapes. Another promising solution is to map special 2D kernels to the surface using charts (local coordinate systems) [24, 12, 16, 75, 76, 57, 52]. By constraining kernels to a family of rotation- or gauge-equivariant functions, these networks can relate the output of convolutions at different points by a rotation or gauge transformation. We use geometric operators instead of applying kernels in charts. These operators were designed for nonlinear manifolds and can be applied without additional constructions. This results in a simple design with many connections to modern theory and machinery in differential geometry. To the best of our knowledge, the charting-based approaches have not yet been tested on point clouds.
In our experiments, we demonstrate that a simple architecture with only a few DeltaConv blocks can match and outperform state-of-the-art results obtained with more complex architectures. We achieve 93.8% accuracy on ModelNet40, 84.7% on the most difficult variant of ScanObjectNN, 86.9 mIoU on ShapeNet, and 99.6% on SHREC11, a dataset of non-rigidly deformed shapes. Our ablation studies show that adding the vector stream can decrease the error by up to 25% (from 90.4% to 92.8% accuracy) on ModelNet40 and up to 21% on ShapeNet (from 81.1 to 85.1 mIoU), while also speeding up inference and the backward pass.
Summarizing our main contributions:


We introduce a new approach for learning from surfaces that explicitly supports the construction of anisotropic filters and the evaluation and processing of tangential information. This is achieved by adding a stream of vector features to the usual stream of scalar features and using discrete differential operators to communicate within and between the streams.

We propose a network architecture that realizes our approach using a selected set of operators from exterior calculus. We also adapt the operators to work effectively in our networks.

We implement and evaluate the network for point clouds and propose techniques to cope with undersampled regions, noise, and missing information prevalent in point cloud learning.
2 Related work
We focus our discussion of related work on the most relevant topics. Please refer to surveys on geometric deep learning [6] and point cloud learning [27, 45] for a more comprehensive overview of this expanding field.
Point cloud networks and anisotropy
A common approach for learning on point cloud data is to learn features for each point using a multi-layer perceptron (MLP), followed by local or global aggregation. Many methods also learn features on local point pairs before maximum aggregation. The same MLP is used for each neighbor. Well-known examples are PointNet and its successor PointNet++ [59, 60]. Several follow-up works improve on speed and accuracy, for example by adding more combinations of point-pair features [89, 67, 84, 39, 48, 61, 82, 50]. Some of these point-wise MLPs explicitly encode anisotropy by splitting up the MLP for each 3D axis [38, 48]. Concepts from transformers have also made their way to point clouds [90, 86]. These networks learn attention for neighboring points based on their features. This differs from our approach, as we use spatial information to influence how neighboring features are aggregated. Nonetheless, our approach does not exclude the use of attention in the scalar or vector stream.
Pseudo-grid convolutions are a more direct translation of image convolutions to point clouds. Many of these are defined in 3D and thus support anisotropy in 3D coordinates. Several works learn a continuous kernel and apply it to local point cloud regions [47, 4, 46, 69, 78, 29, 1, 23, 81]. Others learn discrete kernels and map points in local regions to a discrete grid [31, 40, 41, 10, 26]. A group of works studies rotational equivariance in 3D space, aiming to design networks invariant to rigid point cloud transformations [19, 70, 11, 58].
Finally, graph-based approaches create a k-nearest neighbor or radius graph from the input set and apply graph convolutions [65, 73, 87, 44, 64, 17, 9, 68, 21, 72, 88, 54]. DGCNN [73] introduces the EdgeConv operator and a dynamic graph component, which reconnects the k-nearest neighbor graph inside the network. EdgeConv computes the maximum over feature differences, which allows the network to represent directions in its channels. Channel-wise directions can resemble spatial directions if spatial coordinates are provided as input, which is only the case in the first layer for DGCNN. In contrast, our convolutions support anisotropy directly in the operator.
Two-stream architectures
Architectures with two streams and vector-valued features are also used in rotation-equivariant approaches for images [77] and surface meshes [16, 76, 12, 57]. These networks constrain kernels to output complex-valued and rotation- or gauge-equivariant features, which are separated into orders of equivariance. Instead of equivariant kernels, we work with geometric operators. These are independent of the choice of coordinate systems by design. Furthermore, these approaches apply their kernels in local charts, whereas geometric operators work directly on the surface. To the best of our knowledge, we are the first to implement and evaluate a two-stream architecture on point clouds.
Geometric operators
Multiple authors use geometric operators to construct convolutions. The graph Laplacian is used in GCN [35]. Spectral networks for learning on graphs are based on the eigenpairs of the graph Laplacian [7]. Surface networks for triangle meshes [36] interleave the Laplacian with the extrinsic Dirac operator. Parametrized Differential Operators (PDOs) [34] use the gradient and divergence operators to learn from spherical signals on unstructured grids. Recently, DiffGCN [18] uses finite-difference schemes of the gradient and divergence operators for the construction of graph networks. DiffusionNet [63] uses the Laplace–Beltrami operator to learn to diffuse signals on the surface. Alongside learned diffusion, DiffusionNet uses the gradient in combination with a scalar product to compute directional features. An approach adjacent to these networks is HodgeNet [66], which learns to build operators using the structure of differential operators. Our approach extends the set of operators available to the network and maintains a stream of vector-valued features, so that directional information can be processed in deeper layers of the network. Outside of deep learning, differential operators are widely applied for the analysis of 3D shapes [13, 15].
3 Method
We construct anisotropic convolutions by learning combinations of elemental operators from exterior calculus. Because these operators are defined on scalar and vector fields, we split our network into scalar and vector features. In this section, we describe these two streams, the operators and how they are discretized, and how combinations of the operators are learned. A schematic overview of our convolutions can be found in Figure 2.
Streams
Consider a point cloud with $N$ points arranged in an $N \times 3$ matrix. Each point $i$ can be associated with additional features $\mathbf{x}_i \in \mathbb{R}^C$. Inside the network, we refer to the features in layer $l$ at point $i$ as $\mathbf{x}_i^{(l)}$. All of these features constitute the scalar stream.
The vector stream runs alongside the scalar stream. Each feature in the vector stream is a tangent vector, encoded by coefficients for an orthonormal basis in the tangent plane: $\mathbf{v}_i = \alpha_i \mathbf{e}_{i,1} + \beta_i \mathbf{e}_{i,2}$, where $\mathbf{e}_{i,1}, \mathbf{e}_{i,2}$ are two basis vectors at point $i$, and $\alpha_i, \beta_i$ are the vector coefficients. The basis vectors can be any set of vectors orthonormal to the normal and each other and are used to build the operators. The coefficients are interleaved for each point, forming the tensor of features $\mathbf{V} \in \mathbb{R}^{2N \times C}$. One channel in $\mathbf{V}$ is a column of interleaved coefficients: $[\alpha_1, \beta_1, \alpha_2, \beta_2, \ldots]^\top$. The input for the vector stream is a vector field defined at each point. In our experiments, we use the gradients of the input to the scalar stream. We will refer to the continuous counterparts of the scalar and vector features as $x$ and $v$, respectively.
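As an illustration of this encoding, the following sketch builds an orthonormal tangent basis from a point normal and converts a 3D vector to and from its two coefficients. The cross-product construction of the basis is one arbitrary choice, in line with the text's note that any orthonormal pair in the tangent plane works.

```python
import numpy as np

def tangent_basis(n):
    """Build an arbitrary orthonormal basis (e1, e2) of the plane orthogonal to unit normal n."""
    # pick any helper vector that is not parallel to n
    h = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    e1 = np.cross(n, h)
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(n, e1)
    return e1, e2

def to_coefficients(v, n):
    """Project a 3D vector onto the tangent plane and express it in the local basis."""
    e1, e2 = tangent_basis(n)
    return np.array([v @ e1, v @ e2])

def to_vector(coeffs, n):
    e1, e2 = tangent_basis(n)
    return coeffs[0] * e1 + coeffs[1] * e2

n = np.array([0.0, 0.0, 1.0])        # normal of a horizontal tangent plane
v = np.array([0.3, -0.2, 0.7])       # arbitrary 3D vector
coeffs = to_coefficients(v, n)
v_tangential = to_vector(coeffs, n)  # round trip recovers the tangential part of v
```

The round trip discards only the normal component, which is exactly what a tangential vector feature should ignore.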
3.1 Scalar to scalar: maximum aggregation
A simplified version of point-based MLPs is applied inside the scalar stream, building on PointNet++ [60] and EdgeConv [73]. We apply an MLP per point and then perform maximum aggregation over a $k$-nn neighborhood. The features in the scalar stream are computed as

$$\mathbf{x}_i^{(l+1)} = \max_{j \in \mathcal{N}_i} h_{\Theta}\left(\mathbf{x}_j^{(l)}\right), \qquad (1)$$

where $h_{\Theta}$ is a multi-layer perceptron (MLP), consisting of fully connected layers, batch normalization [33], and nonlinearities, and $\mathcal{N}_i$ is the set of $k$ nearest neighbors of point $i$. If point positions are used as input, they are centralized before maximum aggregation: $\mathbf{p}_j \mapsto \mathbf{p}_j - \mathbf{p}_i$.
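A minimal NumPy sketch of this scalar stream: a shared single-layer stand-in for the MLP applied per point, followed by a channel-wise maximum over each k-nn neighborhood. The brute-force neighbor search, layer sizes, and random weights are illustrative; the paper's version uses deeper MLPs with batch normalization and centralizes positions before aggregation.

```python
import numpy as np

rng = np.random.default_rng(1)
N, C_in, C_out, k = 128, 3, 16, 8
P = rng.standard_normal((N, 3))                 # point positions

# k-nearest-neighbor indices (brute force; a KD-tree would be used in practice)
d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)
knn = np.argsort(d2, axis=1)[:, :k]             # (N, k), includes the point itself

# one fully connected layer with ReLU stands in for the MLP h_Theta
W = rng.standard_normal((C_in, C_out)) * 0.1
b = np.zeros(C_out)

def scalar_stream(X):
    H = np.maximum(X @ W + b, 0.0)              # shared MLP applied to every point
    return H[knn].max(axis=1)                   # channel-wise max over each neighborhood

X_out = scalar_stream(P)                        # (N, C_out)
```

Because the maximum is taken channel-wise over an unordered neighbor set, the result is invariant to the ordering of neighbors, as required for point clouds.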
Comparing our convolutions to EdgeConv and PointNet++, the biggest difference is that we do not use point-pair or edge-based features inside the network. Spatial information is provided by the other operators in DeltaConv. The benefit of this change is that the aggregation step is roughly $k$ times lighter on the forward and backward pass, since the MLP is evaluated once per point instead of once per neighbor.
3.2 Scalar to vector: Gradient
The gradient and co-gradient connect the scalar stream to the vector stream. The gradient represents the largest rate of change and the direction of that change as vectors at each point, which can be used to characterize (dis)continuities. The co-gradient is a 90-degree rotation of the gradient. Combined, the gradient and co-gradient span the tangent plane, allowing the network to scale, skew, and rotate the vector features.
We construct a discrete gradient using a moving least-squares approach on neighborhoods with $k$ neighbors. This approach is used in modeling and processing for point clouds and solving differential equations on point clouds [43]. The gradient operator is represented as a sparse matrix $\mathbf{G} \in \mathbb{R}^{2N \times N}$. It takes $N$ values representing features on the points and outputs $2N$ values representing the gradient expressed in coefficients of the tangent basis of each point. The matrix is highly sparse, as it only has $k$ nonzero elements in each row. The co-gradient is a composition of the gradient with a block-diagonal sparse matrix $\mathbf{R}$, where each $2 \times 2$ block in $\mathbf{R}$ is a 90-degree rotation matrix.
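The co-gradient composition can be sketched with sparse matrices as follows. The gradient matrix here is a random stand-in with the right shape, not the moving least-squares operator from the paper; the point is the block-diagonal rotation and the composition.

```python
import numpy as np
import scipy.sparse as sp

N = 5

# 2x2 rotation by 90 degrees, repeated as a block-diagonal sparse matrix
rot = np.array([[0.0, -1.0], [1.0, 0.0]])
R = sp.block_diag([rot] * N).tocsr()   # (2N, 2N)

# toy "gradient": maps N scalar values to 2N tangent coefficients (random stand-in)
rng = np.random.default_rng(2)
G = sp.random(2 * N, N, density=0.4, random_state=2,
              data_rvs=rng.standard_normal).tocsr()

G_co = R @ G                            # co-gradient: the gradient rotated by 90 degrees
```

Rotating twice by 90 degrees negates every tangent vector, and the rotation is an isometry per point, which are useful sanity checks on the block structure.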
Point clouds typically contain undersampled regions and noise. This can be problematic for the moving least-squares procedure. Consider the example in Figure 3, a chair with thin legs. Only a few points lie along the line constituting the legs of the chair. Hence, the direction perpendicular to the line is undersampled, resulting in a volatile least-squares fit: a minor perturbation of one of the points can heavily influence the outcome (left, circled area). We add a regularization term scaled by a factor $\lambda$ to the least-squares fitting procedure, which mitigates this effect (right). This is a known technique, referred to as ridge regression or Tikhonov regularization. The full procedure and accompanying theory are outlined in the supplemental material.
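A sketch of the ridge-regularized fit for a single neighborhood, under the assumption that the gradient weights come from fitting a linear function $f_0 + \mathbf{g} \cdot (u, v)$ to neighbor values in tangent coordinates; the exact weighting scheme and regularizer value in the paper may differ.

```python
import numpy as np

def gradient_weights(uv, lam=1e-3):
    """Least-squares weights mapping k neighbor values to a tangent-plane gradient.

    uv: (k, 2) neighbor offsets in the tangent basis of the center point.
    Fits f(u, v) ~ f0 + g . (u, v); lam is the Tikhonov (ridge) regularizer.
    """
    k = uv.shape[0]
    A = np.hstack([np.ones((k, 1)), uv])                  # columns: constant, u, v
    M = np.linalg.solve(A.T @ A + lam * np.eye(3), A.T)   # regularized pseudo-inverse
    return M[1:, :]                                       # rows for u and v give the gradient

rng = np.random.default_rng(3)
uv = rng.standard_normal((12, 2))
W = gradient_weights(uv)

# the fit recovers the gradient of a linear function up to the small ridge bias
f = 2.0 * uv[:, 0] - 3.0 * uv[:, 1] + 0.5
g = W @ f

# a nearly collinear neighborhood (like a thin chair leg): the ridge term keeps
# the weights bounded where the unregularized normal equations are near-singular
uv_line = np.linspace(-1.0, 1.0, 12)[:, None] * np.array([1.0, 0.0]) \
          + 1e-6 * rng.standard_normal((12, 2))
W_line = gradient_weights(uv_line)
```

Without the `lam * np.eye(3)` term, the collinear case would invert a nearly singular matrix and produce arbitrarily large weights, which is exactly the volatility described above.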
We also argue that the gradient operator should be normalized, motivated by how information is fused in the network. If $\mathbf{G}$ exhibits diverging or converging behavior, features resulting from $\mathbf{G}$ will also diverge or converge. This is undesirable when the gradient is applied multiple times in the network. Features arising from the gradient operation would then have a different order of magnitude, which would need to be accounted for by the network weights. Therefore, we normalize $\mathbf{G}$ by the operator norm, which provides an upper bound on the scaling behavior of an operator:

$$\|\mathbf{G}\|_{op} = \sup_{\mathbf{x} \neq \mathbf{0}} \frac{\|\mathbf{G}\mathbf{x}\|}{\|\mathbf{x}\|}. \qquad (2)$$
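One way to compute this normalization is to estimate the operator (spectral) norm by power iteration on $\mathbf{G}^\top\mathbf{G}$, as sketched below. Whether the paper uses power iteration or another estimate is not specified here, so treat this as one possible implementation.

```python
import numpy as np
import scipy.sparse as sp

def operator_norm(A, iters=100, seed=0):
    """Estimate the spectral norm of A (largest singular value) by power iteration on A^T A."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[1])
    for _ in range(iters):
        x = A.T @ (A @ x)          # one step of power iteration on the Gram matrix
        x /= np.linalg.norm(x)
    return np.linalg.norm(A @ x)   # Rayleigh-quotient-style estimate of sigma_max

rng = np.random.default_rng(4)
G = sp.random(40, 20, density=0.3, random_state=4,
              data_rvs=rng.standard_normal).tocsr()

G_hat = G / operator_norm(G)       # normalized so that ||G_hat x|| <~ ||x||
```

After normalization, repeated application of the operator can no longer blow feature magnitudes up by more than a factor of about one per application, which is the point of the argument above.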
3.3 Vector to scalar: Divergence, Curl, and Norm
The vector stream connects back to the scalar stream with divergence, curl, and norm. These operators are commonly used to analyze vector fields and indicate features such as sinks, sources, vortices, and the strength of the vector field. The network can use them as building blocks for anisotropic operators.
The discrete divergence is also constructed with a moving least-squares approach, which is described in the supplement. Divergence is represented as a sparse matrix $\mathbf{D} \in \mathbb{R}^{N \times 2N}$, with $2k$ nonzero elements in each row. Curl is derived as $\mathbf{C} = \mathbf{D}\mathbf{R}^{-1}$, the divergence of the field rotated by $-90$ degrees.
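The identity behind this construction, curl as the divergence of the rotated field, can be checked numerically. The sketch below uses periodic central differences on a 2D grid as a stand-in for the point cloud operators (an assumption for illustration, not the paper's moving least-squares construction): the discrete curl of a discrete gradient vanishes, and a rotational field is flagged by curl but not by divergence.

```python
import numpy as np

# periodic central differences on a 2D grid stand in for the point cloud operators
def grad2d(f, h):
    fx = (np.roll(f, -1, axis=0) - np.roll(f, 1, axis=0)) / (2 * h)
    fy = (np.roll(f, -1, axis=1) - np.roll(f, 1, axis=1)) / (2 * h)
    return fx, fy

def div2d(vx, vy, h):
    return (np.roll(vx, -1, axis=0) - np.roll(vx, 1, axis=0)) / (2 * h) \
         + (np.roll(vy, -1, axis=1) - np.roll(vy, 1, axis=1)) / (2 * h)

def curl2d(vx, vy, h):
    # curl as the divergence of the field rotated by -90 degrees: (vx, vy) -> (vy, -vx)
    return div2d(vy, -vx, h)

n = 64
h = 2 * np.pi / n
x, y = np.meshgrid(np.arange(n) * h, np.arange(n) * h, indexing="ij")

fx, fy = grad2d(np.sin(x) * np.cos(y), h)  # a gradient field: its curl should vanish
wx, wy = -np.sin(y), np.sin(x)             # a rotational field: divergence-free, non-zero curl
```

Sources and sinks show up in divergence, vortices in curl; together with the norm they summarize a vector neighborhood as scalars that the scalar stream can process further.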
3.4 Vector to vector: Hodge Laplacian
Vector features are diffused in the vector stream using a combination of the identity and the Hodge Laplacian of $\mathbf{V}$. Applying the Hodge Laplacian to a vector field results in another vector field encoding the difference between the vector at each point and its neighbors. The Hodge Laplacian can be formulated as a combination of grad, div, curl, and the co-gradient [5]:

$$\Delta_1 = -\mathrm{grad} \circ \mathrm{div} - \mathrm{cograd} \circ \mathrm{curl}. \qquad (3)$$

In the discrete setting, we replace each operator with its discrete variant:

$$\mathbf{L}_1 = -\left(\mathbf{G}\mathbf{D} + \mathbf{R}\mathbf{G}\mathbf{C}\right). \qquad (4)$$
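To show how the discrete Hodge Laplacian can be assembled in code, here is a toy sparse sketch. The gradient is a random stand-in and the divergence is taken as the negative adjoint of the gradient, a common convention but an assumption here (in the paper both are built by separate least-squares fits), and the composition follows one common sign convention. With these choices the assembled operator is symmetric positive semidefinite, as a Laplacian should be.

```python
import numpy as np
import scipy.sparse as sp

N = 6
rng = np.random.default_rng(5)

# toy stand-ins with the right shapes; the real G and D come from least-squares fits
G = sp.random(2 * N, N, density=0.5, random_state=5,
              data_rvs=rng.standard_normal).tocsr()       # grad: N scalars -> 2N coefficients
D = (-G.T).tocsr()                                        # div as the negative adjoint of grad
R = sp.block_diag([np.array([[0.0, -1.0], [1.0, 0.0]])] * N).tocsr()  # per-point 90-degree rotation
C = D @ R.T                                               # curl: divergence of the rotated field

# discrete Hodge Laplacian on vector features: L1 = -(G D + R G C)
L1 = -(G @ D + R @ G @ C)
```

With D = -G^T, the assembly simplifies to G G^T + R G G^T R^T, which makes the symmetry and positive semidefiniteness checked below immediate.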
3.5 Fusing streams
Each of the operations outputs either scalar-valued or vector-valued features. We concatenate all the features belonging to each stream and then combine these features with parametrized functions:

$$\mathbf{x}' = h_{\Theta}\left(\left[\mathbf{x}_{\max},\ \mathbf{D}\mathbf{V},\ \mathbf{C}\mathbf{V},\ \|\mathbf{v}\|\right]\right), \qquad \mathbf{V}' = h_{\Phi}\left(\left[\mathbf{G}\mathbf{X},\ \mathbf{V},\ \mathbf{L}_1\mathbf{V}\right]\right), \qquad (5)$$

where $\mathbf{x}_{\max}$ denotes the maximum-aggregated scalar features from eq. (1) and $\|\mathbf{v}\|$ the per-point norms of the vector features. We use the prime to indicate features in layer $l + 1$. All other features are from layer $l$. $h_{\Theta}$ denotes an MLP. $h_{\Phi}$ denotes an MLP used for vectors, which first concatenates the 90-degree-rotated vectors to the input features and then applies a regular MLP. This allows the MLP to rotate, scale, and combine vector features and enriches the set of operators. For example, the rotated gradient is the co-gradient. The vector MLP can learn to combine information from local neighborhoods (through the gradient and Hodge Laplacian), as well as information from different channels (through the identity). Nonlinearities are applied to the vectors' norms, as these are invariant to the choice of basis vectors. Other options for vector nonlinearities are explored in [74, 16].
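The basis-independence of this construction can be checked directly: a single vector-MLP layer of the kind described (concatenate the 90-degree-rotated vectors, mix channels with shared weights, gate by the norm) commutes with any per-point rotation of the tangent bases. The layer sizes, the soft-threshold gate, and the weight scale below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(6)
N, C_in, C_out = 32, 4, 8

# one linear layer of a vector MLP: weights shared across points, mixing channels
W = rng.standard_normal((2 * C_in, C_out)) * 0.1

def rotate90(V):
    # V: (N, 2, C) tangent coefficients; rotate each 2-vector by 90 degrees
    return np.stack([-V[:, 1, :], V[:, 0, :]], axis=1)

def vector_layer(V):
    H = np.concatenate([V, rotate90(V)], axis=2) @ W        # (N, 2, C_out)
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    gate = np.maximum(norms - 0.05, 0.0) / (norms + 1e-12)  # ReLU-like nonlinearity on norms
    return H * gate

V = rng.standard_normal((N, 2, C_in))
out = vector_layer(V)

# changing the tangent bases rotates the coefficients per point; the layer commutes with this
theta = rng.uniform(0.0, 2 * np.pi, N)
c_, s_ = np.cos(theta), np.sin(theta)
def change_basis(V_):
    return np.stack([c_[:, None] * V_[:, 0, :] - s_[:, None] * V_[:, 1, :],
                     s_[:, None] * V_[:, 0, :] + c_[:, None] * V_[:, 1, :]], axis=1)
```

The equivariance follows because 2D rotations commute with the fixed 90-degree rotation, channel mixing acts on a different axis, and the gate depends only on rotation-invariant norms.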
3.6 Properties of our networks
The proposed structure allows DeltaConv to represent, evaluate, and process directional information. For example, it can use gradients to detect edges and their direction and use the vector stream to represent and further evaluate them. Deeper layers can evaluate patterns formed by these edges both in local neighborhoods, using divergence, curl, and the Hodge Laplacian, as well as between channels at each point, using the vector MLPs.
To represent the vector features and the operators acting on the vector features, we need to choose a basis in every tangent space. Still, the network is independent of the choice of tangent basis, as the same vectors are obtained for any choice of basis. Only their representation changes. DeltaConv inherits this property from the differential operators that it uses. For the applications we consider, the last layer of the network always consists of scalar features. Therefore the bases are only used internally. This means that a network with DeltaConv is able to learn from directional information, while being agnostic to the choice of basis vectors in tangent spaces. This is an alternative to building reference frame fields to compare results of filters along a surface [3, 53, 32], as well as to methods relying on equivariant filters to achieve such independence [16, 76].
4 Experiments
We validate our approach by comparing DeltaConv to state-of-the-art approaches for classification and segmentation on data originating from real-world scans as well as sampled CAD models. In addition, we perform ablation studies to discern the effect of different connections in the network structure. We also study the effect of our proposed method on timing and parameter counts and verify our choice to regularize and normalize the gradient matrix.
4.1 Implementation details
The architectures used in these experiments are based on DGCNN [73]. We replace each EdgeConv block with a DeltaConv block (fig. 2). We do not use the dynamic graph component, which means the networks operate at a single scale on local neighborhoods. This rather simple network setup facilitates the evaluation of the effect of the convolution operators on the performance of the network. We would like to stress this point, as many previous works introduce additional architecture changes, like squeeze-excitation blocks, skip attention, and U-Net architectures. Despite our simple network setup, DeltaConv achieves state-of-the-art results. To show what architectural optimizations mean for DeltaConv, we also test the UResNet architecture used in KPFCNN [69], but with the convolution blocks in the encoder replaced by DeltaConv blocks. In the downsampling blocks used by these networks, we pool vector features by averaging them with parallel transport [76]. More details are provided in the supplemental material. Code will be available upon publication of this paper.
Data transforms. A $k$-nn graph is computed for every shape. This graph is used for maximum aggregation in the scalar stream. It is reused to estimate normals when necessary and to construct the gradient. For each experiment, we use xyz coordinates as input to the network and augment them with a random scale and translation, similar to previous works. Some datasets require specific augmentations, which are detailed in their respective sections.
Training. The parameters in the networks are optimized with stochastic gradient descent (SGD) with momentum and weight decay. The learning rate is updated with a cosine annealing scheduler [49].
4.2 Classification
For classification, we study ModelNet40 [79], ScanObjectNN [71], and SHREC11 [42]. With these experiments, we aim to demonstrate that our networks can achieve state-of-the-art performance on a wide range of challenges: point clouds sampled from CAD models, real-world scans, and non-rigid, deformable objects.
Table 1: Classification accuracy on ModelNet40 (%).

Method  Mean Class Accuracy  Overall Accuracy
PointNet++ [60]  –  90.7
PointCNN [41]  88.1  92.2
DGCNN [73]  90.2  92.9
KPConv deform [69]  –  92.7
KPConv rigid [69]  –  92.9
DensePoint [46]  –  93.2
RSCNN [47]  –  93.6
GBNet [62]  91.0  93.8
PointTransformer [90]  90.6  93.7
PAConv [81]  –  93.6
SimpleView [25]  –  93.6
Point Voxel Transformer [86]  –  93.6
CurveNet [80]  –  93.8
DeltaNet (ours)  91.2  93.8
Table 2: Overall accuracy on ScanObjectNN variants (%).

Method  no bg  bg  t25  t25r  t50r  t50rs
3DmFV [2]  73.8  68.2  67.1  67.4  63.5  63.0
PointNet [59]  79.2  73.3  73.5  72.7  68.2  68.2
SpiderCNN [83]  79.5  77.1  78.1  77.7  73.8  73.7
PointNet++ [60]  84.3  82.3  82.7  81.4  79.1  77.9
DGCNN [73]  86.2  82.8  83.3  81.5  80.0  78.1
PointCNN [41]  85.5  86.1  83.6  82.5  78.5  78.5
BGA-PN++ [71]  –  –  –  –  –  80.2
BGA-DGCNN [71]  –  –  –  –  –  79.9
GBNet [62]  –  –  –  –  –  80.5
GDANet [82]  88.5  87.0  –  –  –  –
DRNet [61]  –  –  –  –  –  80.3
DeltaNet (ours)  89.5  89.3  89.4  87.0  85.1  84.7
ModelNet40
The ModelNet40 dataset [79] consists of 12,311 CAD models from 40 categories. 9,843 models are used for training and 2,468 models for testing. Each point cloud consists of 1,024 points sampled from the surface using a uniform sampling of 8,192 points from mesh faces and subsequent farthest point sampling (FPS). We use 20 neighbors for maximum aggregation and to construct the gradient and divergence. Ground-truth normals are used to define tangent spaces for these operators. As input to the network, we use the xyz coordinates. The classification architecture is optimized for 250 epochs. We do not use any voting procedure and list results without voting.
The results for this experiment can be found in Table 1. DeltaConv improves significantly on the most related maximum aggregation operators and is on par with or better than stateoftheart approaches.
ScanObjectNN
ScanObjectNN [71] contains 2,902 unique object instances from 15 object categories, sampled from SceneNN [30] and ScanNet [14]. The dataset is enriched by preserving or removing background points and by perturbing bounding boxes. The variant without background points is tested without any perturbations (no bg). The variant with background points is tested both without (bg) and with perturbations. Bounding boxes are translated (t), rotated (r), and scaled (s) before each shape is extracted. This means that some shapes are cut off, rotated, or scaled. t25 and t50 denote a translation by 25% and 50% of the bounding box size, respectively.
We use a modified version of the classification architecture with four convolution blocks: Conv(64, 64), Conv(64, 64), Conv(64, 64), Conv(64, 128). This setup matches the architecture used for DGCNN in [71]. Normals are estimated with 10 neighbors per point and the operators are constructed with 20 neighbors. As input, we provide the xyz positions, augmented with a random rotation around the up axis and a random scale. The network is trained for 250 epochs.
Our results are compared to those reported by the authors of ScanObjectNN [71] and other recent approaches in Table 2. We find that our approach outperforms all networks for every type of perturbation, including networks that explicitly account for background points.
Table 3: Classification accuracy on SHREC11 (%).

Method  Accuracy
GWCNN [20]  90.3 
MeshCNN [28]  91.0 
HSN [76]  96.1 
MeshWalker [37]  97.1 
PDMeshNet [51]  99.1 
HodgeNet [66]  94.7 
FC [52]  99.2 
DiffusionNet  hks (no xyz) [63]  99.5 
DeltaNet (ours)  99.6 
Table 4: Part segmentation on ShapeNet. Mean instance mIoU and per-category mIoU (%).

Method  mIoU  aero  bag  cap  car  chair  earphone  guitar  knife  lamp  laptop  motor  mug  pistol  rocket  skateboard  table
# shapes  –  2690  76  55  898  3758  69  787  392  1547  451  202  184  283  66  152  5271
PointNet++  85.1  82.4  79.0  87.7  77.3  90.8  71.8  91.0  85.9  83.7  95.3  71.6  94.1  81.3  58.7  76.4  82.6 
PointCNN  86.1  84.1  86.5  86.0  80.8  90.6  79.7  92.3  88.4  85.3  96.1  77.2  95.3  84.2  64.2  80.0  83.0 
DGCNN  85.2  84.0  83.4  86.7  77.8  90.6  74.7  91.2  87.5  82.8  95.7  66.3  94.9  81.1  63.5  74.5  82.6 
KPConv deform  86.4  84.6  86.3  87.2  81.1  91.1  77.8  92.6  88.4  82.7  96.2  78.1  95.8  85.4  69.0  82.0  83.6 
KPConv rigid  86.2  83.8  86.1  88.2  81.6  91.0  80.1  92.1  87.8  82.2  96.2  77.9  95.7  86.8  65.3  81.7  83.6 
GDANet  86.5  84.2  88.0  90.6  80.2  90.7  82.0  91.9  88.5  82.7  96.1  75.8  95.7  83.9  62.9  83.1  84.4 
PointTransformer  86.6  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –
PointVoxelTransformer  86.5  85.1  82.8  88.3  81.5  92.2  72.5  91.0  88.9  85.6  95.4  76.2  94.7  84.2  65.0  75.3  81.7 
CurveNet  86.8  85.1  84.1  89.4  80.8  91.9  75.2  91.8  88.7  86.3  96.3  72.8  95.4  82.7  59.8  78.5  84.1 
DeltaNet (ours)  86.6  84.9  82.8  89.1  81.3  91.9  79.7  92.2  88.6  85.5  96.7  77.2  95.8  83.0  61.1  77.5  83.1 
DeltaUResNet (ours)  86.9  85.3  88.1  88.6  81.4  91.8  78.4  92.0  89.3  85.6  96.1  76.4  95.9  82.7  65.0  76.6  84.1 
SHREC11
The SHREC11 dataset [42] consists of 900 non-rigidly deformed shapes, 30 from each of 30 shape classes. This experiment aims to validate the claim that our approach is well suited for non-rigid deformations. Like previous works [28, 76, 63], we train on 10 randomly selected shapes from each class and report the average over 10 runs. We sample 2,048 points from the simplified meshes used in MeshCNN's experiments [28] and use 20 neighbors and mesh normals to construct the operators. As input, we provide xyz coordinates, which are randomly rotated along each axis. We decrease the number of parameters in each convolution of the classification architecture to 32, since the dataset is much smaller than other datasets. The network is trained for 100 epochs. We find that our architecture is able to improve on state-of-the-art results (table 3), validating the effectiveness of our intrinsic approach on deformable shapes.
4.3 Segmentation
For segmentation, we evaluate our architecture on ShapeNet (part segmentation) [85]. ShapeNet consists of 16,881 shapes from 16 categories. Each shape is annotated with up to six parts, totaling 50 parts. We use the point sampling of 2,048 points provided by the authors of PointNet [59], and the train/validation/test split follows [8]. The operators are constructed with 30 neighbors and ground-truth normals to define tangent spaces. The xyz coordinates are provided as input to the network, which is trained for 200 epochs. During testing, we evaluate each shape with ten random augmentations and aggregate the results with a voting procedure. Such a voting approach is used in the most recent works that we compare with.
The results are shown in Table 4, where our approach, especially the UResNet variant, improves upon state-of-the-art approaches on the mean instance mIoU metric and in many of the shape categories. For each category, DeltaConv is either comparable to or better than other architectures and significantly better than the most related intrinsic approaches (PointNet++ and DGCNN). We also visualize the features of our simple architecture in Figure 4 to give an idea of the features derived by the network.
4.4 Ablation Studies
We aim to isolate the effect of the vector stream, validate the choices to regularize and normalize the gradient and divergence operators, and investigate the impact of our approach on timing and parameter counts of these networks.
Effectiveness of vector stream
To study the benefit of the vector stream and its effect on different types of intrinsic scalar convolutions, we set up three different scalar streams: (1) a Laplace–Beltrami operator, (2) GCN [35], and (3) maximum aggregation (eq. 1). We test three variants of each network: (1) only the scalar stream, (2) only the scalar stream with the number of parameters adjusted to match a two-stream architecture, and (3) both the scalar and vector stream.
We test each configuration on ModelNet40 and ShapeNet. For both of these tasks, we use the DGCNN base architecture. The model for ShapeNet is trained for 50 epochs to save on training time and no voting is used, which results in slightly lower numbers than listed in Table 4. The results are listed in Table 5. We find that the vector stream improves the network for each scalar stream on both tasks, reducing the error for classification as well as segmentation. For maximum aggregation on ShapeNet, the improvements are smaller, but still considerable, given the rate of progress on this dataset over the last few years. Simply increasing the number of parameters in the scalar stream does not yield the same improvement as adding the vector stream, showing that the vector-valued features are of meaningful benefit. Maximum aggregation in the scalar stream yields the highest accuracy.
Table 5: Ablation of the vector stream. ShapeNet segmentation mIoU and ModelNet40 mean class and overall accuracy (%).

Scalar Convolution  Vector Stream  Match Params  Seg mIoU  M40 mcA  M40 OA
Laplace–Beltrami  –  –  82.5  86.1  90.4
Laplace–Beltrami  –  ✓  82.5  87.1  90.6
Laplace–Beltrami  ✓  –  84.9  89.4  92.2
GCN  –  –  81.1  87.3  90.4
GCN  –  ✓  81.2  87.3  90.8
GCN  ✓  –  85.1  90.6  92.8
Max aggregation  –  –  85.7  89.2  92.2
Max aggregation  –  ✓  85.7  89.5  92.6
Max aggregation  ✓  –  86.1  91.2  93.8
Timing and parameters
In our method section, we argue that computing the gradient matrix is lightweight and that the simplified maximum aggregation operator is significantly faster than the edge-based operators in PointNet++ and DGCNN. The main bottleneck in these convolutions is maximum aggregation over each edge. In this experiment, we demonstrate this by reporting the time it takes to train and test the classification network on one batch of 32 shapes with 1,024 points each. This includes computing the k-nearest neighbor graph and constructing the gradient and divergence operators. All timings are obtained on the same machine with an NVIDIA RTX 2080Ti after 10 iterations. We implemented each method in PyTorch [55] and PyTorch Geometric [22]. The results are listed in Table 6. We find that our network increases the number of parameters by only ~13%. Our network is significantly faster than the edge-based convolution in training, inference, and the backward pass. DeltaConv with a Laplacian in the scalar stream is faster still.
Gradient regularization and normalization
In our method section, we argue that the least-squares fit for constructing the gradient and divergence should be regularized and that the operators should be normalized. In this experiment, we validate these choices. We train a model that is based entirely on our gradient operator, with a Laplace–Beltrami operator in the scalar stream, so that every spatial operator in the network is influenced by regularization and scaling. The model is trained on ModelNet40 for 50 epochs. The results are listed in Table 7. We observe a considerable difference between our approach with and without regularization, and both mean class accuracy and overall accuracy decrease when the operator is not normalized.
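To make the role of the regularizer concrete, here is a minimal sketch of a regularized least-squares gradient fit over k-NN neighborhoods. It is a simplified stand-in: it fits in full 3D rather than the tangent plane, omits the normalization step, and the names and the weight `lam` are our assumptions:

```python
import torch

def regularized_gradient(pos, f, idx, lam=1e-3):
    """Gradient of a scalar function f via a least-squares linear fit over
    each k-NN neighborhood, with Tikhonov regularization (sketch only)."""
    rel_pos = pos[idx] - pos[:, None, :]   # (N, k, 3) offsets to neighbors
    rel_f = f[idx] - f[:, None]            # (N, k) value differences
    # Solve (A^T A + lam * I) g = A^T b for every neighborhood at once
    ata = rel_pos.transpose(1, 2) @ rel_pos + lam * torch.eye(3)
    atb = rel_pos.transpose(1, 2) @ rel_f[..., None]
    return torch.linalg.solve(ata, atb).squeeze(-1)   # (N, 3) gradient per point

# Usage: f(x, y, z) = x has constant gradient (1, 0, 0)
pos = torch.randn(64, 3)
f = pos[:, 0]
idx = torch.stack([torch.randperm(64)[:12] for _ in range(64)])
grad = regularized_gradient(pos, f, idx)
```

Without the `lam` term, a nearly planar neighborhood makes A^T A singular along the normal direction and the solve becomes unstable; the regularizer damps that component instead.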
Table 6: Timing and parameter counts per batch of 32 shapes with 1,024 points each.

| Method | Data Transform | Training (ms) | Backward (ms) | Inference (ms) | Params |
|---|---|---|---|---|---|
| DeltaConv (Lapl.) | knn + grad | – | – | – | 2,036,938 |
| DeltaConv | knn + grad | – | – | – | 2,037,962 |
| EdgeConv | knn | – | – | – | 1,801,610 |
Table 7: Effect of regularization and normalization of the gradient operator on ModelNet40.

| Regularization | Normalization | Mean Class Accuracy | Overall Accuracy |
|---|---|---|---|
| – | – | 85.2 | 90.3 |
| ✓ | – | 86.6 | 90.5 |
| ✓ | ✓ | 89.4 | 92.2 |
5 Conclusion
In this paper, we proposed DeltaConv, a new intrinsic convolution operator that can better capture anisotropy. This is achieved through a combination of elemental geometric operators from exterior calculus which are weighted by a neural network. DeltaConv separates features into a scalar and vector stream and connects the streams with its operators. We demonstrate improved performance on a wide range of tasks, showing the potential of using the operators from exterior calculus in a learning setting on point clouds. We hope that this work will provide insight into the functionality and operation of neural networks for point clouds and spark more work that combines learning approaches with powerful tools from geometry processing.
Challenges and future work
We limit our study to regression tasks. While we do not think it is impossible to adapt our operators for generative tasks, it is unclear if and when the operators should be recomputed when a surface is generated. We are also interested in incorporating concepts from rotation-invariant approaches and transformers into our scalar and vector streams. Finally, it is appealing to test our approach on other surface discretizations.
References
 [1] (2018) Point convolutional neural networks by extension operators. ACM Trans. Graph. Cited by: §2.
 [2] (2018) 3DmFV: three-dimensional point cloud classification in real-time using convolutional neural networks. IEEE Robotics and Automation Letters 3 (4), pp. 3145–3152. Cited by: Table 2.
 [3] (2016) Learning shape correspondence with anisotropic convolutional neural networks. Cited by: §1, §3.6.
 [4] (2019) ConvPoint: continuous convolutions for point cloud processing. Comput. Graph. Forum 88, pp. 24–34. Cited by: §2.
 [5] (2017) Spectral processing of tangential vector fields. Computer Graphics Forum 36 (6), pp. 338–353. Cited by: §3.4.
 [6] (2017) Geometric deep learning: going beyond euclidean data. IEEE Signal Process. Mag. 34 (4), pp. 18–42. Cited by: §1, §2.
 [7] (2014) Spectral networks and locally connected networks on graphs. Cited by: §2.
 [8] (2015) ShapeNet: An Information-Rich 3D Model Repository. Technical report Technical Report arXiv:1512.03012. Cited by: §4.3.
 [9] (2020) A Hierarchical Graph Network for 3D Object Detection on Point Clouds. Cited by: §2.
 [10] (2019-06) 4D spatio-temporal ConvNets: Minkowski convolutional neural networks. In CVPR, Cited by: §1, §2.
 [11] (2018) Spherical CNNs. ICLR. Cited by: §2.
 [12] (2019) Gauge Equivariant Convolutional Networks and the Icosahedral CNN. Cited by: §1, §2.
 [13] (2013) Digital geometry processing with discrete exterior calculus. Cited by: §2.
 [14] (2017) ScanNet: richly-annotated 3D reconstructions of indoor scenes. Cited by: §4.2.
 [15] (2015) Vector field processing on triangle meshes. pp. 17:1–17:48. Cited by: §2.
 [16] (2020) Gauge Equivariant Mesh CNNs: Anisotropic convolutions on geometric graphs. External Links: 2003.05425 Cited by: §1, §2, §3.5, §3.6.
 [17] (2018) General-Purpose Deep Point Cloud Feature Extractor. Cited by: §2.
 [18] (2020) DiffGCN: Graph Convolutional Networks via Differential Operators and Algebraic Multigrid Pooling. External Links: 2006.04115 Cited by: §2.
 [19] (2017) Learning SO(3) Equivariant Representations with Spherical CNNs. Cited by: §2.
 [20] (2017) GWCNN: a metric alignment layer for deep shape analysis. Computer Graphics Forum 36 (5), pp. 49–57. Cited by: Table 3.
 [21] (2019) Hypergraph neural networks. Cited by: §2.
 [22] (2019) Fast graph representation learning with PyTorch Geometric. Cited by: §4.4.
 [23] (2018) SplineCNN: fast geometric deep learning with continuous B-spline kernels. Cited by: §2.
 [24] (2021) Geometric deep learning and equivariant neural networks. arXiv preprint arXiv:2105.13926. Cited by: §1.
 [25] (2021) Revisiting point cloud shape classification with a simple and effective baseline. ICML. Cited by: Table 1.
 [26] (2018) 3D semantic segmentation with submanifold sparse convolutional networks. CVPR. Cited by: §1, §2.
 [27] (2020) Deep Learning for 3D Point Clouds: A Survey. IEEE TPAMI, pp. 1. Cited by: §1, §2.
 [28] (2019) MeshCNN: a network with an edge. ACM Trans. Graph. 38 (4), pp. 90. Cited by: §4.2, Table 3.
 [29] (2018) Monte Carlo Convolution for Learning on Non-Uniformly Sampled Point Clouds. ACM Trans. Graph. Cited by: §2.
 [30] (2016) SceneNN: a scene meshes dataset with annotations. Cited by: §4.2.
 [31] (2018) Pointwise Convolutional Neural Networks. Cited by: §2.
 [32] (2019) TextureNet: Consistent local parametrizations for learning from highresolution signals on meshes. Cited by: §3.6.
 [33] (2015) Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. External Links: 1502.03167 Cited by: §3.1.
 [34] (2019) Spherical CNNs on Unstructured Grids. Cited by: §2.
 [35] (2017) Semi-supervised classification with graph convolutional networks. Cited by: §1, §1, §1, §2, §4.4.
 [36] (2018) Surface Networks. Vol. 3. Cited by: §2.
 [37] (2020) MeshWalker. ACM Trans. Graph. 39, pp. 1 – 13. Cited by: Table 3.
 [38] (2019) Modeling Local Geometric Structure of 3D Point Clouds Using GeoCNN. CVPR. Cited by: §2.
 [39] (2020) Going Deeper With Lean Point Networks. Cited by: §2.
 [40] (2019) Octree guided CNN with Spherical Kernels for 3D Point Clouds. Cited by: §2.
 [41] (2018) PointCNN: Convolution on X-transformed points. Cited by: §2, Table 1, Table 2.
 [42] (2011-01) SHREC ’11 track: shape retrieval on non-rigid 3D watertight meshes. pp. 79–88. Cited by: §4.2, §4.2, Table 3.
 [43] (2013) Solving partial differential equations on point clouds. SIAM J. Sci. Comput. 35. Cited by: §3.2.
 [44] (2019) Dynamic Points Agglomeration for Hierarchical Point Sets Learning. Cited by: §2.
 [45] (2019) Deep Learning on Point Clouds and Its Application: A Survey. Sensors 19 (19). External Links: ISSN 1424-8220 Cited by: §1, §2.
 [46] (2019) DensePoint: Learning densely contextual representation for efficient point cloud processing. Cited by: §2, Table 1.
 [47] (2019) Relation-Shape Convolutional Neural Network for Point Cloud Analysis. Cited by: §2, Table 1.
 [48] (2020) A closer look at local aggregation operators in point cloud analysis. In ECCV, Cited by: §2.
 [49] (2017) SGDR: stochastic gradient descent with warm restarts. Cited by: §4.1.
 [50] (2021-06) CGA-Net: category guided aggregation for point cloud semantic segmentation. In CVPR, pp. 11693–11702. Cited by: §2.
 [51] (2020) Primal-dual mesh convolutional neural networks. H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Eds.), Vol. 33, pp. 952–963. Cited by: Table 3.
 [52] (2021) Field convolutions for surface cnns. External Links: 2104.03916 Cited by: §1, Table 3.
 [53] (2017) Geometric deep learning on graphs and manifolds using mixture model CNNs. Cited by: §3.6.
 [54] (2018) 3DTI-Net: Learn Inner Transform Invariant 3D Geometry Features using Dynamic GCN. External Links: 1812.06254 Cited by: §2.
 [55] (2019) PyTorch: an imperative style, high-performance deep learning library. Cited by: §4.4.
 [56] (1990) Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence. Cited by: §1.
 [57] (2018) Multi-directional geodesic neural networks via equivariant convolution. Cited by: §1, §2.
 [58] (2019) Effective Rotation-invariant Point CNN with Spherical Harmonics kernels. Cited by: §2.
 [59] (2017) PointNet: Deep learning on point sets for 3D classification and segmentation. Cited by: §2, §4.3, Table 2.
 [60] (2017) PointNet++: Deep hierarchical feature learning on point sets in a metric space. Cited by: §1, §1, §2, §3.1, Table 1, Table 2.
 [61] (2021-01) Dense-resolution network for point cloud classification and segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 3813–3822. Cited by: §2, Table 2.
 [62] (2021) Geometric back-projection network for point cloud classification. External Links: 1911.12885 Cited by: Table 1, Table 2.
 [63] (2020) Diffusion is All You Need for Learning on Surfaces. ArXiv abs/2012.0. Cited by: §2, §4.2, Table 3.
 [64] (2018) Mining point cloud local structures by kernel correlation and graph pooling. Cited by: §2.
 [65] (2017) Dynamic edge-conditioned filters in convolutional neural networks on graphs. Cited by: §2.
 [66] (2021) HodgeNet: learning spectral geometry on triangle meshes. ACM Trans. Graph. Cited by: §2, Table 3.
 [67] (2019) SRINet: Learning Strictly Rotation-Invariant Representations for Point Cloud Classification and Segmentation. Cited by: §2.
 [68] (2018) RGCNN: Regularized graph CNN for point cloud segmentation. Cited by: §2.
 [69] (2019) KPConv: Flexible and Deformable Convolution for Point Clouds. ICCV. Cited by: §1, §2, §4.1, Table 1.
 [70] (2018) Tensor field networks: Rotation and translation-equivariant neural networks for 3D point clouds. CoRR. Cited by: §2.
 [71] (2019) Revisiting point cloud classification: a new benchmark dataset and classification model on real-world data. Cited by: §4.2, §4.2, §4.2, §4.2, Table 2.
 [72] (2018) Local spectral graph convolution for point set feature learning. Cited by: §2.
 [73] (2019) Dynamic graph CNN for learning on point clouds. ACM Trans. Graph. Cited by: §1, §2, §3.1, §4.1, Table 1, Table 2.
 [74] (2019) General E(2)-equivariant steerable CNNs. Vol. 32, pp. 14334–14345. Cited by: §3.5.
 [75] (2021) Coordinate independent convolutional networks – isometry and gauge equivariant convolutions on riemannian manifolds. External Links: 2106.06020 Cited by: §1.
 [76] (2020) CNNs on surfaces using rotation-equivariant features. ACM Trans. Graph. Cited by: §1, §2, §3.6, §4.1, §4.2, Table 3.
 [77] (2017) Harmonic networks: deep translation and rotation equivariance. In CVPR, pp. 5028–5037. Cited by: §2.
 [78] (2019) PointConv: Deep Convolutional Networks on 3D Point Clouds. Cited by: §2.
 [79] (2015) 3D ShapeNets: A deep representation for volumetric shapes. Cited by: §4.2, §4.2.
 [80] (2021-10) Walk in the cloud: learning curves for point clouds shape analysis. In ICCV, pp. 915–924. Cited by: Table 1.
 [81] (2021-06) PAConv: position adaptive convolution with dynamic kernel assembling on point clouds. In CVPR, pp. 3173–3182. Cited by: §2, Table 1.
 [82] (2021) Learning geometry-disentangled representation for complementary understanding of 3d object point cloud. External Links: 2012.10921 Cited by: §2, Table 2.
 [83] (2018) SpiderCNN: deep learning on point sets with parameterized convolutional filters. V. Ferrari, M. Hebert, C. Sminchisescu, and Y. Weiss (Eds.), Cham. External Links: ISBN 978-3-030-01237-3 Cited by: Table 2.
 [84] (2019) Modeling Point Clouds With Self-Attention and Gumbel Subset Sampling. CVPR. Cited by: §2.
 [85] (2016) A scalable active framework for region annotation in 3D shape collections. ACM Trans. Graph. External Links: ISSN 0730-0301 Cited by: §4.3.
 [86] (2021) PVT: point-voxel transformer for 3d deep learning. External Links: 2108.06076 Cited by: §2, Table 1.
 [87] (2019) Linked Dynamic Graph CNN: Learning on Point Cloud via Linking Hierarchical Features. arXiv preprint arXiv:1904.10014. Cited by: §2.
 [88] (2018) A GraphCNN for 3D point cloud classification. Cited by: §2.
 [89] (2019) PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing. Cited by: §2.
 [90] (2021-10) Point transformer. In ICCV, pp. 16259–16268. Cited by: §2, Table 1.