PointNet++ Explained (Part 2): Network Architecture Walkthrough

If you spot any mistakes, please point them out.

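The previous posts introduced the PointNet and PointNet++ networks; this one walks through their code.

1. Paper reading notes | 3D object detection: PointNet

2. Paper reading notes | 3D object detection: PointNet++

The reference GitHub project is: https://github.com/yanx27/Pointnet_Pointnet2_pytorch

This post breaks the PointNet++ network structure into parts. The earlier introduction to PointNet++ covered sampling, grouping, feature extraction (PointNet), and adaptation to non-uniform density, so the code walkthrough below follows the same order.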

Contents

1. Sampling Layer
2. Grouping Layer
3. PointNet Layer
4. MSG Module
5. Feature Propagation
6. Classification
7. Scene Segmentation
8. Part Segmentation

1. Sampling Layer

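To partition a point cloud into local regions, we first sample a set of points and then build a local region around each sampled point. There are many ways to sample points, such as random sampling and uniform sampling, which were also covered in the earlier post on Open3D operations. To make the sampled points cover the whole point cloud as well as possible, PointNet++ uses farthest point sampling (FPS). Put simply, for every remaining point we compute its distance to the nearest already-sampled point, and the point for which this minimum distance is largest becomes the next sample. This is repeated until the preset number of K points has been selected. The variant used here is distance-based farthest point sampling (D-FPS), which measures Euclidean distance in coordinate space.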

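import numpy as np


def farthest_point_sample(point, npoint):
    """
    D-FPS: distance-based farthest point sampling (NumPy, single point cloud)
    Input:
        point: pointcloud data, [N, D]; the first 3 columns are xyz
        npoint: number of samples
    Return:
        point: sampled pointcloud data, [npoint, D]
    """
    N, D = point.shape
    xyz = point[:,:3]
    centroids = np.zeros((npoint,))     # indices of the sampled points, in the order they are picked
    distance = np.ones((N,)) * 1e10     # squared distance from each point to its nearest sampled point
    farthest = np.random.randint(0, N)  # start from a randomly chosen point
    for i in range(npoint):
        centroids[i] = farthest         # record the index of the current sample
        centroid = xyz[farthest, :]
        dist = np.sum((xyz - centroid) ** 2, -1)  # squared distance from every point to the current sample
        mask = dist < distance
        distance[mask] = dist[mask]     # keep, for each point, the distance to its nearest sample
        farthest = np.argmax(distance, -1)        # the point with the largest nearest-sample distance is the next sample
    point = point[centroids.astype(np.int32)]     # gather the sampled points by index
    return point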
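
As a small sanity check of the procedure above, the following sketch runs the same loop but returns indices instead of points and takes a fixed start index so the result is reproducible. The name `fps_indices` and the `start` parameter are illustrative assumptions, not part of the referenced repository.

```python
import numpy as np

def fps_indices(xyz, npoint, start=0):
    """Hypothetical helper: farthest point sampling that returns indices.

    xyz: [N, 3] array, npoint: number of samples,
    start: index of the first sample (fixed here for reproducibility).
    """
    n = xyz.shape[0]
    chosen = np.zeros(npoint, dtype=np.int64)
    nearest = np.full(n, np.inf)          # squared distance to the nearest chosen point
    farthest = start
    for i in range(npoint):
        chosen[i] = farthest
        d = np.sum((xyz - xyz[farthest]) ** 2, axis=-1)
        nearest = np.minimum(nearest, d)  # update the nearest-sample distance of every point
        farthest = int(np.argmax(nearest))
    return chosen

pts = np.array([[0.0, 0, 0], [1.0, 0, 0], [10.0, 0, 0], [5.0, 0, 0]])
idx = fps_indices(pts, 3)
print(idx)  # [0 2 3]
```

Starting from point 0, the farthest point is index 2; after that, index 3 has the largest distance to its nearest sample, so the near-duplicate point 1 is skipped.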

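In the batched PyTorch version used by the network, farthest point sampling returns the indices of the sampled points (the NumPy version above returns the sampled points directly). The index_points function then uses these indices to gather the corresponding point positions and point features from the point set.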

# Purpose: gather the point data that corresponds to the given indices
def index_points(points, idx):
    """

    Input:
        points: input points data, [B, N, C]
        idx: sample index data, [B, S]
    Return:
        new_points:, indexed points data, [B, S, C]
    """
    device = points.device
    B = points.shape[0]
    view_shape = list(idx.shape)    # becomes (b,1) after the next line
    view_shape[1:] = [1] * (len(view_shape) - 1)
    repeat_shape = list(idx.shape)  # becomes (1,npoint) after the next line
    repeat_shape[0] = 1
    # batch_indices is required for the indexing below: [[0,0,...,0],[1,1,...,1],...,[B-1,B-1,...,B-1]]
    batch_indices = torch.arange(B, dtype=torch.long).to(device).view(view_shape).repeat(repeat_shape)  # (b,npoint)
    new_points = points[batch_indices, idx, :]  # gather the points at the given indices
    return new_points

Usage example:

new_xyz = index_points(xyz, farthest_point_sample(xyz, S))
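
The batch-index trick inside index_points can be hard to read. As a sketch in NumPy (the helper name `gather_points` is an assumption for illustration), the same gather is an advanced-indexing lookup in which a batch index array is broadcast against the point index array:

```python
import numpy as np

def gather_points(points, idx):
    """Hypothetical NumPy counterpart of index_points.

    points: [B, N, C], idx: [B, S] or [B, S, K] integer indices into the N axis.
    """
    B = points.shape[0]
    # shape [B, 1] or [B, 1, 1], so that it broadcasts against idx
    batch = np.arange(B).reshape((B,) + (1,) * (idx.ndim - 1))
    return points[batch, idx]

points = np.arange(2 * 4 * 3).reshape(2, 4, 3)   # B=2, N=4, C=3
idx = np.array([[0, 3], [2, 2]])                 # S=2 indices per batch element
out = gather_points(points, idx)
print(out.shape)   # (2, 2, 3)
```

The same helper also accepts grouped indices of shape [B, S, K], which is how the grouping layer below reuses index_points.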

2. Grouping Layer

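The previous step gave us the sampled points (the centroids). There are two ways to build a region around each centroid: 1) ball query: take a ball of radius r around the centroid and pick K points inside it, repeating points if fewer than K are available; 2) kNN: take the K nearest neighbors of the centroid directly. PointNet++ uses the radius-based ball query because experiments show it works better: a fixed radius keeps every sampled region local, so the extracted features are truly local features and generalize better.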

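# Purpose: use pairwise distances to pick up to nsample points inside the ball of each query (sampled) point
#          If a ball holds fewer than nsample points, the first point of the group is repeated;
#          if it holds more, only the first nsample are kept, in plain index order.
#          In other words, any point inside the radius qualifies, and selection is not prioritized by distance.
def query_ball_point(radius, nsample, xyz, new_xyz):
    """
    Input:
        radius: local region radius
        nsample: max sample number in local region
        xyz: all points, [B, N, 3]
        new_xyz: query points, [B, S, 3]
    Return:
        group_idx: grouped points index, [B, S, nsample]
    """
    device = xyz.device
    B, N, C = xyz.shape
    _, S, _ = new_xyz.shape
    # every group starts with the full index range 0..N-1
    group_idx = torch.arange(N, dtype=torch.long).to(device).view(1, 1, N).repeat([B, S, 1])
    sqrdists = square_distance(new_xyz, xyz)    # pairwise squared distances, [B, S, N]
    group_idx[sqrdists > radius ** 2] = N       # mark points outside the radius with the sentinel value N
    # N is larger than any valid index, so sorting pushes the sentinels to the end; then keep the first nsample entries
    group_idx = group_idx.sort(dim=-1)[0][:, :, :nsample]   # sort(...)[0] returns the sorted values (point indices), so the order is by index, not by distance
    # alternative idea: sort by distance and keep the indices of the nsample smallest
    # idx = (sqrdists > radius ** 2).sort(dim=-1)[1][:, :, :nsample]
    # group_idx = group_idx[idx]
    group_first = group_idx[:, :, 0].view(B, S, 1).repeat([1, 1, nsample])  # first point index of each group
    mask = group_idx == N   # still N means the ball held fewer than nsample points; fill with the group's first point
    group_idx[mask] = group_first[mask]     # final group indices
    return group_idx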
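
query_ball_point relies on square_distance, which is not listed in this post. The NumPy sketch below covers both, assuming that square_distance returns the pairwise squared Euclidean distances with shape [B, S, N]. It makes the padding and index-order behavior easy to verify on a toy input. The `_np` function names are illustrative.

```python
import numpy as np

def square_distance_np(src, dst):
    """Pairwise squared distances: src [B, S, 3], dst [B, N, 3] -> [B, S, N]."""
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    return (np.sum(src ** 2, -1)[:, :, None]
            + np.sum(dst ** 2, -1)[:, None, :]
            - 2.0 * src @ dst.transpose(0, 2, 1))

def query_ball_point_np(radius, nsample, xyz, new_xyz):
    B, N, _ = xyz.shape
    S = new_xyz.shape[1]
    group_idx = np.tile(np.arange(N), (B, S, 1))
    group_idx[square_distance_np(new_xyz, xyz) > radius ** 2] = N   # sentinel for "outside the ball"
    group_idx = np.sort(group_idx, axis=-1)[:, :, :nsample]          # index order, not distance order
    first = np.repeat(group_idx[:, :, :1], nsample, axis=-1)
    mask = group_idx == N
    group_idx[mask] = first[mask]                                     # pad with the first in-ball index
    return group_idx

xyz = np.array([[[0.0, 0, 0], [0.5, 0, 0], [0.1, 0, 0], [3.0, 0, 0]]])
new_xyz = xyz[:, [0, 3]]                       # use points 0 and 3 as centroids
groups = query_ball_point_np(1.0, 3, xyz, new_xyz)
print(groups)
```

For the first centroid the result is [0, 1, 2] even though point 2 is closer than point 1, which confirms the index-order selection. The second centroid has only itself inside the ball, so its group is padded to [3, 3, 3].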


3. PointNet Layer

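PointNet can already extract features from a point cloud, so it is applied to the local point cloud in each group. The feature dimensions change as follows across the three steps. The raw point cloud is N x (d+C): N points, where d is the dimension of the coordinates and C is the dimension of the point features. Sampling first selects N1 centroids; around each centroid a local region of radius r is built and K points are sampled from it, giving N1 x K x (d+C). PointNet then abstracts each local point set into a single vector, an encoding step, giving N1 x (d+C1).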
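
The dimension changes above can be traced with a toy example. The sketch below uses random arrays and a single random linear layer in place of the trained MLP, purely to show the shapes and that max pooling over the K neighbors makes the result independent of neighbor order. All sizes and names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N1, K, d, C, C1 = 4, 8, 3, 6, 16          # toy sizes (assumed)

grouped = rng.normal(size=(N1, K, d + C)) # after sampling + grouping: N1 x K x (d+C)
W = rng.normal(size=(d + C, C1))          # stand-in for the shared per-point MLP

def local_pointnet(groups):
    h = np.maximum(groups @ W, 0.0)       # same weights for every point, then ReLU
    return h.max(axis=1)                  # max pool over the K neighbors -> N1 x C1

feat = local_pointnet(grouped)
shuffled = grouped[:, rng.permutation(K), :]   # reorder the neighbors inside each group
print(feat.shape)                                   # (4, 16)
print(np.allclose(feat, local_pointnet(shuffled)))  # True
```

Keeping the d centroid coordinates next to these C1 features gives the N1 x (d+C1) output described above.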

So after sampling and grouping comes the feature extraction step.

class PointNetSetAbstraction(nn.Module):
    def __init__(self, npoint, radius, nsample, in_channel, mlp, group_all):
        super(PointNetSetAbstraction, self).__init__()
        self.npoint = npoint    # number of centroids, each with its own search ball
        self.radius = radius    # search radius
        self.nsample = nsample  # number of points sampled inside each ball
        self.mlp_convs = nn.ModuleList()
        self.mlp_bns = nn.ModuleList()
        last_channel = in_channel
        for out_channel in mlp:
            self.mlp_convs.append(nn.Conv2d(last_channel, out_channel, 1))
            self.mlp_bns.append(nn.BatchNorm2d(out_channel))
            last_channel = out_channel
        self.group_all = group_all  # global grouping (all points form a single group)

    def forward(self, xyz, points):
        """
        Input:
            xyz: input points position data, [B, C, N]
            points: input points data, [B, D, N]
        Return:
            new_xyz: sampled points position data, [B, C, S]
            new_points_concat: sample points feature data, [B, D', S]
        """
        xyz = xyz.permute(0, 2, 1)
        if points is not None:
            points = points.permute(0, 2, 1)

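        # core step: sampling + grouping
        # (b,1024,3) -> (b,npoint, nsample, channel) -> (b, channel, nsample, npoint)
        if self.group_all:
            # global extraction: all points form one group, so the returned xyz is no longer needed
            new_xyz, new_points = sample_and_group_all(xyz, points)
        else:
            # local xyz offsets are appended to the features as relative position information, so the new xyz is needed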
            new_xyz, new_points = sample_and_group(self.npoint, self.radius, self.nsample, xyz, points)
        # new_xyz: sampled points position data, [B, npoint, C]
        # new_points: sampled points data, [B, npoint, nsample, C+D]
        new_points = new_points.permute(0, 3, 2, 1) # [B, C+D, nsample,npoint]

        # (b, inchannel, nsample, npoint) -> (b,mlp,nsample,npoint)
        for i, conv in enumerate(self.mlp_convs):
            bn = self.mlp_bns[i]
            new_points = F.relu(bn(conv(new_points)))

        # (b,channel,nsample,npoint) -> (b,channel,npoint)
        new_points = torch.max(new_points, 2)[0]
        new_xyz = new_xyz.permute(0, 2, 1)
        return new_xyz, new_points

The sample_and_group and sample_and_group_all functions used here combine the sampling layer and grouping layer described above.

def sample_and_group(npoint, radius, nsample, xyz, points, returnfps=False):
    """
    Input:
        npoint:
        radius:
        nsample:
        xyz: input points position data, [B, N, 3]
        points: input points data, [B, N, D]
    Return:
        new_xyz: sampled points position data, [B, npoint, 3]
        new_points: sampled points data, [B, npoint, nsample, 3+D]
    """
    B, N, C = xyz.shape
    S = npoint
    fps_idx = farthest_point_sample(xyz, npoint)   # indices of the sampled points
    new_xyz = index_points(xyz, fps_idx)           # gather the sampled points by index: [b, npoint, 3]
    idx = query_ball_point(radius, nsample, xyz, new_xyz)   # for each sampled point, pick nsample neighbors inside the radius
    grouped_xyz = index_points(xyz, idx)  # gather the neighbor positions by index: [b, npoint, nsample, 3]
    grouped_xyz_norm = grouped_xyz - new_xyz.view(B, S, 1, C)   # convert to positions relative to the centroid

    if points is not None:
        grouped_points = index_points(points, idx)  # gather the neighbor features by index
        new_points = torch.cat([grouped_xyz_norm, grouped_points], dim=-1)  # concatenate positions and features: [B, npoint, nsample, C+D]
    else:
        new_points = grouped_xyz_norm
    if returnfps:
        return new_xyz, new_points, grouped_xyz, fps_idx
    else:
        return new_xyz, new_points


def sample_and_group_all(xyz, points):
    """
    Input:
        xyz: input points position data, [B, N, 3]
        points: input points data, [B, N, D]
    Return:
        new_xyz: sampled points position data, [B, 1, 3]  (unused placeholder)
        new_points: sampled points data, [B, 1, N, 3+D]
    """
    device = xyz.device
    B, N, C = xyz.shape
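    new_xyz = torch.zeros(B, 1, C).to(device)   # placeholder at the origin; no relative positions are built, so a real centroid is not needed
    grouped_xyz = xyz.view(B, 1, N, C)   # all points form a single group
    # local (relative) position information is not used here; it could be added with the two lines below
    # new_xyz = xyz.mean(dim=1, keepdim=True)
    # grouped_xyz = grouped_xyz - new_xyz.view(B, 1, 1, C)
    if points is not None:
        new_points = torch.cat([grouped_xyz, points.view(B, 1, N, -1)], dim=-1)    # the centroid is not subtracted, so positions are not relative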
    else:
        new_points = grouped_xyz
    return new_xyz, new_points

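In addition, before PointNet extracts local features, the centroid of each region is subtracted from the points in that region, which expresses them in coordinates relative to the centroid. Using relative coordinates together with point features lets the network capture point-to-point relations inside the local region. This is a reasonable choice: after grouping, each local region is treated as a unit of its own, so it should be described in its own local coordinate frame rather than in the global one.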
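
One concrete consequence of using relative coordinates is that the grouped input does not change when the whole cloud is translated. A short NumPy check follows, with a hand-picked centroid and neighbor set as assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
xyz = rng.normal(size=(32, 3))            # toy point cloud
centroid_idx = 5                          # pretend FPS picked this point
neighbor_idx = np.array([5, 7, 11, 20])   # pretend ball query returned these

def relative_group(cloud):
    # subtract the centroid from its neighbors, as grouped_xyz - new_xyz does
    return cloud[neighbor_idx] - cloud[centroid_idx]

shift = np.array([10.0, -3.0, 2.5])       # move the whole cloud
same = np.allclose(relative_group(xyz), relative_group(xyz + shift))
print(same)                               # True
```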

That is why relative positions appear in the implementation, concatenated with the feature information.

def sample_and_group(npoint, radius, nsample, xyz, points, returnfps=False):
    ......
    grouped_xyz_norm = grouped_xyz - new_xyz.view(B, S, 1, C)   # convert to positions relative to the centroid

    if points is not None:
        grouped_points = index_points(points, idx)  # gather the neighbor features by index
        new_points = torch.cat([grouped_xyz_norm, grouped_points], dim=-1)  # concatenate positions and features: [B, npoint, nsample, C+D]
        ......

However, when all points are placed in a single group, relative positions are not used; see the implementation of sample_and_group_all for details.

4. MSG Module

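Point cloud density is usually non-uniform: some parts are dense and others are sparse. This causes problems for the network, since features learned in dense regions may not generalize to sparse regions, and a model trained on sparse data may fail to capture fine-grained local structure. When a region is so sparse that sampling breaks up its local structure, the scale of the neighborhood should be enlarged to capture that structure properly. The network therefore needs to adapt to changes in point density, and for this purpose PointNet++ proposes the MSG (multi-scale grouping) and MRG (multi-resolution grouping) modules, as shown below: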

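[Figure: the MSG and MRG modules]
Put simply, the MSG module uses several radii around each centroid, extracts features from the local region at each radius, and concatenates the features from the different radii.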

class PointNetSetAbstractionMsg(nn.Module):
    def __init__(self, npoint, radius_list, nsample_list, in_channel, mlp_list):
        super(PointNetSetAbstractionMsg, self).__init__()
        self.npoint = npoint
        self.radius_list = radius_list
        self.nsample_list = nsample_list
        self.conv_blocks = nn.ModuleList()
        self.bn_blocks = nn.ModuleList()
        for i in range(len(mlp_list)):  # build the conv and bn layers for every scale
            convs = nn.ModuleList()
            bns = nn.ModuleList()
            last_channel = in_channel + 3
            for out_channel in mlp_list[i]:     # each scale gets its own ModuleList()
                convs.append(nn.Conv2d(last_channel, out_channel, 1))
                bns.append(nn.BatchNorm2d(out_channel))
                last_channel = out_channel
            self.conv_blocks.append(convs)
            self.bn_blocks.append(bns)

    def forward(self, xyz, points):
        """
        Input:
            xyz: input points position data, [B, C, N]
            points: input points data, [B, D, N]
        Return:
            new_xyz: sampled points position data, [B, C, S]
            new_points_concat: sample points feature data, [B, D', S]
        """
        xyz = xyz.permute(0, 2, 1)
        if points is not None:
            points = points.permute(0, 2, 1)

        B, N, C = xyz.shape
        S = self.npoint
        new_xyz = index_points(xyz, farthest_point_sample(xyz, S))    # gather the centroids from the FPS indices

        # each radius r has its own neighbor count K and its own ModuleList for feature extraction
        new_points_list = []
        for i, radius in enumerate(self.radius_list):
            K = self.nsample_list[i]    # number of neighbors for the current scale
            group_idx = query_ball_point(radius, K, xyz, new_xyz)   # group indices for the current radius r and K
            grouped_xyz = index_points(xyz, group_idx)   # gather the neighbor positions by index
            grouped_xyz -= new_xyz.view(B, S, 1, C)      # build relative position features
            if points is not None:
                grouped_points = index_points(points, group_idx)    # gather the neighbor features by index
                grouped_points = torch.cat([grouped_points, grouped_xyz], dim=-1)   # concatenate the original features with the position features
            else:   # no extra features (e.g. normals)
                grouped_points = grouped_xyz

            # each scale is encoded by its own ModuleList (still inside the loop)
            grouped_points = grouped_points.permute(0, 3, 2, 1)  # [B, D, K, S]
            for j in range(len(self.conv_blocks[i])):
                conv = self.conv_blocks[i][j]
                bn = self.bn_blocks[i][j]
                grouped_points = F.relu(bn(conv(grouped_points)))
            new_points = torch.max(grouped_points, 2)[0]  # max pooling: [B, D', S]
            new_points_list.append(new_points)

        # concatenate the features extracted at the different radii
        new_xyz = new_xyz.permute(0, 2, 1)
        new_points_concat = torch.cat(new_points_list, dim=1)   # multi-scale fusion: concatenate along the channel dimension; later layers process the result
        return new_xyz, new_points_concat
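
Stripped of the learned layers, the forward pass follows one pattern: pool each scale separately, then concatenate along the channel axis. The sketch below replaces the per-scale MLP output with random arrays (an assumption for illustration) and uses the neighbor counts and channel widths of the first classification layer shown later in this post, with a small number of centroids to keep it light.

```python
import numpy as np

rng = np.random.default_rng(0)
B, S = 2, 8                                    # toy batch size and number of centroids
nsample_list = [16, 32, 128]                   # neighbors per scale
out_channels = [64, 128, 128]                  # last MLP width per scale

pooled = []
for K, D in zip(nsample_list, out_channels):
    per_scale = rng.normal(size=(B, D, K, S))  # stand-in for the MLP output [B, D', K, S]
    pooled.append(per_scale.max(axis=2))       # max pool over the K neighbors -> [B, D', S]

fused = np.concatenate(pooled, axis=1)         # concatenate the scales along channels
print(fused.shape)                             # (2, 320, 8)
```

The fused channel count is the sum of the per-scale widths (64 + 128 + 128 = 320), regardless of how many neighbors each scale used.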

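The density-adaptive approach is also supported during training by a random dropout strategy: input points are randomly dropped with some probability, which diversifies the training samples. The strategy is used only during training and is switched off at test time; its purpose is to improve generalization and robustness. For each training point set a dropout ratio θ is drawn uniformly from [0, p] with p ≤ 1, and each point is dropped with probability θ. The paper sets p = 0.95 to avoid producing empty point sets, while the code below uses a maximum ratio of 0.875.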

This shows up in the training loop:

'''TRAINING'''
    logger.info('Start training...')
    for epoch in range(start_epoch, args.epoch):
        log_string('Epoch %d (%d/%s):' % (global_epoch + 1, epoch + 1, args.epoch))
        mean_correct = []
        classifier = classifier.train()

        scheduler.step()
        # the dataset loader only centers and normalizes the points; the remaining augmentation happens in the training loop
        for batch_id, (points, target) in tqdm(enumerate(trainDataLoader, 0), total=len(trainDataLoader), smoothing=0.9):
            optimizer.zero_grad()

            # augment the point cloud in NumPy format (random rotation is not used)
            points = points.data.numpy()
            points = provider.random_point_dropout(points)  # random dropout
            points[:, :, 0:3] = provider.random_scale_point_cloud(points[:, :, 0:3])    # random scaling
            points[:, :, 0:3] = provider.shift_point_cloud(points[:, :, 0:3])   # random global shift
            points = torch.Tensor(points)
            points = points.transpose(2, 1)
            ......

For details on the augmentation methods that appear here, see the other post: Common data augmentation methods for point clouds

# Purpose: randomly drop points from the cloud. Dropped points are overwritten with the first point, i.e. a pseudo-drop (the shape does not change)
def random_point_dropout(batch_pc, max_dropout_ratio=0.875):
    ''' batch_pc: BxNx3 '''
    for b in range(batch_pc.shape[0]):
        dropout_ratio = np.random.random()*max_dropout_ratio  # random dropout ratio in the range 0~0.875
        drop_idx = np.where(np.random.random((batch_pc.shape[1])) <= dropout_ratio)[0]  # points whose random draw falls below the ratio are dropped
        if len(drop_idx) > 0:
            batch_pc[b,drop_idx,:] = batch_pc[b,0,:]   # "dropping" means overwriting with the first point
    return batch_pc

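P.S. When PointNet++ samples k neighbors per group and fewer than k points fall inside the radius, the missing slots are likewise filled with the first point of the group, so a single point is repeated several times. The random dropout here uses the same trick.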
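
The pseudo-drop can be checked directly: after the call the array keeps its shape, and every row is either untouched or a copy of the first point. The seeded sketch below repeats the logic of random_point_dropout so that the snippet runs on its own; the toy data is an assumption.

```python
import numpy as np

def random_point_dropout(batch_pc, max_dropout_ratio=0.875):
    """Same logic as the function above, repeated to keep the snippet self-contained."""
    for b in range(batch_pc.shape[0]):
        dropout_ratio = np.random.random() * max_dropout_ratio
        drop_idx = np.where(np.random.random(batch_pc.shape[1]) <= dropout_ratio)[0]
        if len(drop_idx) > 0:
            batch_pc[b, drop_idx, :] = batch_pc[b, 0, :]
    return batch_pc

np.random.seed(0)
original = np.random.rand(2, 100, 3)
dropped = random_point_dropout(original.copy())   # copy, because the function edits in place

unchanged = np.all(dropped == original, axis=-1)           # rows left as they were
is_first = np.all(dropped == dropped[:, :1, :], axis=-1)   # rows overwritten by point 0
print(dropped.shape)                       # (2, 100, 3)
print(bool(np.all(unchanged | is_first)))  # True
```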

5. Feature Propagation

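A segmentation task needs a feature for every point of the original cloud, which is then used for per-point classification. One option is to make every point a centroid in all set abstraction levels, but that is computationally expensive. The other is to upsample the point features, much as image semantic segmentation does. Upsampling image features back to the original resolution is easy to picture; upsampling a point cloud is harder, because the points do not lie on a regular grid. The idea therefore needs to be adapted: instead of interpolating on a grid as for images, the features extracted at the coarser level are propagated back to the points of the finer level and merged with the features already there. Suppose level N1 has features of size N1xC1 and level N2 has features of size N2xC2, where N2 is the coarser level to be upsampled and N1 is the finer level. For every point in N1 we find its K nearest points in N2 (the code uses K=3). Those K points all carry C2-dimensional features, so an inverse-distance weighted average of them gives the feature C2' of the current point, which is then concatenated onto the original N1 features. After upsampling, the feature size is N1x(C1+C2'). In other words, point cloud upsampling here is an inverse-distance weighted interpolation over nearest neighbors rather than a grid-based interpolation as in images.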
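
In the notation of the PointNet++ paper, the interpolated feature of a point x is the inverse-distance weighted average of the features f_i of its k nearest neighbors x_i in the coarser level, with p = 2 and k = 3 by default:

```latex
f(x) = \frac{\sum_{i=1}^{k} w_i(x)\, f_i}{\sum_{i=1}^{k} w_i(x)},
\qquad
w_i(x) = \frac{1}{d(x, x_i)^{p}}
```

The implementation corresponds to p = 2 because it works directly on squared distances, and it adds a small constant (1e-8) to the denominator to avoid division by zero.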

The implementation of Feature Propagation is shown below:

class PointNetFeaturePropagation(nn.Module):
    def __init__(self, in_channel, mlp):
        super(PointNetFeaturePropagation, self).__init__()
        self.mlp_convs = nn.ModuleList()
        self.mlp_bns = nn.ModuleList()
        last_channel = in_channel
        for out_channel in mlp:
            self.mlp_convs.append(nn.Conv1d(last_channel, out_channel, 1))
            self.mlp_bns.append(nn.BatchNorm1d(out_channel))
            last_channel = out_channel

    def forward(self, xyz1, xyz2, points1, points2):
        """
        Input:
            xyz1: input points position data, [B, C, N]
            xyz2: sampled input points position data, [B, C, S]
            points1: input points data, [B, D, N]
            points2: input points data, [B, D, S]
        Return:
            new_points: upsampled points data, [B, D', N]
        """
        xyz1 = xyz1.permute(0, 2, 1)
        xyz2 = xyz2.permute(0, 2, 1)

        points2 = points2.permute(0, 2, 1)
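        B, N, C = xyz1.shape    # dense point set with weaker semantics
        _, S, _ = xyz2.shape    # sparse point set with stronger semantics

        # Procedure: for each point of the dense set, find its 3 nearest points in the sparse set and assign it
        # the inverse-distance weighted sum of their features.
        # Finally, concatenate the interpolated features with the dense set's own features to complete the upsampling.
        if S == 1:
            interpolated_points = points2.repeat(1, N, 1)
        else:
            dists = square_distance(xyz1, xyz2)    # squared distances from every xyz1 point to every xyz2 point: (B,N,S)
            dists, idx = dists.sort(dim=-1)        # sort ascending to find the smallest distances and their indices
            dists, idx = dists[:, :, :3], idx[:, :, :3]  # [B, N, 3]

            dist_recip = 1.0 / (dists + 1e-8)   # inverse distances, normalized below into weights: (b,N,3)
            norm = torch.sum(dist_recip, dim=2, keepdim=True)
            weight = dist_recip / norm    # weights of the 3 nearest points

            # gather the features of the sparse level (points2) by index and sum them with the inverse-distance weights
            interpolated_points = torch.sum(index_points(points2, idx) * weight.view(B, N, 3, 1), dim=2)    # (B,N,C2)

        # concatenate the original features with the upsampled features
        if points1 is not None:
            points1 = points1.permute(0, 2, 1)
            new_points = torch.cat([points1, interpolated_points], dim=-1)
        else:
            new_points = interpolated_points

        # apply the MLP layers given in the constructor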
        new_points = new_points.permute(0, 2, 1)
        for i, conv in enumerate(self.mlp_convs):
            bn = self.mlp_bns[i]
            new_points = F.relu(bn(conv(new_points)))
        return new_points
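
To see the interpolation step in isolation, here is a NumPy sketch for a single point cloud without the batch dimension. The function name `three_nn_interpolate` and the toy data are assumptions; the arithmetic follows the dists / dist_recip / weight lines above.

```python
import numpy as np

def three_nn_interpolate(xyz1, xyz2, feat2, k=3, eps=1e-8):
    """xyz1: [N, 3] dense points, xyz2: [S, 3] sparse points, feat2: [S, C] -> ([N, C], [N, k])."""
    d2 = np.sum((xyz1[:, None, :] - xyz2[None, :, :]) ** 2, axis=-1)  # squared distances [N, S]
    idx = np.argsort(d2, axis=-1)[:, :k]                              # k nearest sparse points
    near = np.take_along_axis(d2, idx, axis=-1)                       # their squared distances
    w = 1.0 / (near + eps)
    w = w / w.sum(axis=-1, keepdims=True)                             # weights sum to 1
    return np.sum(feat2[idx] * w[:, :, None], axis=1), w

xyz2 = np.array([[0.0, 0, 0], [1.0, 0, 0], [0.0, 1, 0], [5.0, 5, 5]])
feat2 = np.array([[1.0], [2.0], [3.0], [100.0]])
xyz1 = np.array([[0.0, 0, 0], [0.5, 0.5, 0]])
out, w = three_nn_interpolate(xyz1, xyz2, feat2)
print(out.round(3))
```

The first query coincides with a sparse point, so it receives essentially that point's feature (about 1.0). The second is equidistant from the three nearest sparse points, so it receives their plain average (2.0). The distant fourth point never contributes.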

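The concatenated features then pass through a unit PointNet, which is equivalent to a 1x1 convolution and extracts a fused feature. It can be understood as a grouping with one point per region, i.e. no grouping at all: every point simply goes through a shared multi-layer perceptron.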
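
The equivalence between a 1x1 convolution and a shared per-point linear layer can be written out in NumPy. The weights below are random stand-ins (an assumption), and batch normalization is left out to keep the sketch short:

```python
import numpy as np

rng = np.random.default_rng(0)
B, C_in, C_out, N = 2, 5, 7, 10
x = rng.normal(size=(B, C_in, N))       # features in [B, C, N] layout, as used by Conv1d
W = rng.normal(size=(C_out, C_in))      # kernel of a 1x1 convolution (kernel axis squeezed)
b = rng.normal(size=(C_out,))

conv = np.einsum('oc,bcn->bon', W, x) + b[None, :, None]   # 1x1 convolution over all points

# the same result, computed one point at a time with an ordinary linear layer
per_point = np.stack([x[:, :, n] @ W.T + b for n in range(N)], axis=-1)

print(np.allclose(conv, per_point))     # True
```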

6. Classification

Classification network implementation:

class get_model(nn.Module):
    def __init__(self,num_class,normal_channel=True):
        super(get_model, self).__init__()
        in_channel = 3 if normal_channel else 0
        self.normal_channel = normal_channel
        self.sa1 = PointNetSetAbstractionMsg(512, [0.1, 0.2, 0.4], [16, 32, 128], in_channel,[[32, 32, 64], [64, 64, 128], [64, 96, 128]])
        self.sa2 = PointNetSetAbstractionMsg(128, [0.2, 0.4, 0.8], [32, 64, 128], 320,[[64, 64, 128], [128, 128, 256], [128, 128, 256]])
        self.sa3 = PointNetSetAbstraction(None, None, None, 640 + 3, [256, 512, 1024], True)
        self.fc1 = nn.Linear(1024, 512)
        self.bn1 = nn.BatchNorm1d(512)
        self.drop1 = nn.Dropout(0.4)
        self.fc2 = nn.Linear(512, 256)
        self.bn2 = nn.BatchNorm1d(256)
        self.drop2 = nn.Dropout(0.5)
        self.fc3 = nn.Linear(256, num_class)

    def forward(self, xyz):
        B, _, _ = xyz.shape
        if self.normal_channel:
            norm = xyz[:, 3:, :]
            xyz = xyz[:, :3, :]
        else:
            norm = None
        # feature shapes: (b,3,1024) -> l1_points:(b,320,512) -> l2_points:(b,640,128) -> l3_points:(b,1024,1) -> (b,1024)
        l1_xyz, l1_points = self.sa1(xyz, norm)
        l2_xyz, l2_points = self.sa2(l1_xyz, l1_points)
        l3_xyz, l3_points = self.sa3(l2_xyz, l2_points)     # single-scale set abstraction over all points (group_all)
        x = l3_points.view(B, 1024)
        # (b,1024) -> (b,512) -> (b,256) -> (b,num_class)
        x = self.drop1(F.relu(self.bn1(self.fc1(x))))
        x = self.drop2(F.relu(self.bn2(self.fc2(x))))
        x = self.fc3(x)
        x = F.log_softmax(x, -1)

        return x, l3_points
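
The in_channel arguments above follow from simple bookkeeping: an MSG layer outputs the sum of the last MLP widths of its scales, and the group_all layer receives those channels plus 3 coordinates. A quick check in plain Python (the helper name `msg_out_channels` is an assumption):

```python
def msg_out_channels(mlp_list):
    """Output width of an MSG layer: sum of the last width of every scale."""
    return sum(mlp[-1] for mlp in mlp_list)

sa1_out = msg_out_channels([[32, 32, 64], [64, 64, 128], [64, 96, 128]])
sa2_out = msg_out_channels([[64, 64, 128], [128, 128, 256], [128, 128, 256]])
sa3_in = sa2_out + 3      # group_all concatenates xyz with the features

print(sa1_out, sa2_out, sa3_in)   # 320 640 643
```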

7. Scene Segmentation

Semantic segmentation network implementation:

class get_model(nn.Module):
    def __init__(self, num_classes):
        super(get_model, self).__init__()

        # each SA layer uses only two scales; the output channel count equals the sum of the last MLP widths of the two scales
        self.sa1 = PointNetSetAbstractionMsg(1024, [0.05, 0.1], [16, 32], 9, [[16, 16, 32], [32, 32, 64]])
        self.sa2 = PointNetSetAbstractionMsg(256, [0.1, 0.2], [16, 32], 32+64, [[64, 64, 128], [64, 96, 128]])
        self.sa3 = PointNetSetAbstractionMsg(64, [0.2, 0.4], [16, 32], 128+128, [[128, 196, 256], [128, 196, 256]])
        self.sa4 = PointNetSetAbstractionMsg(16, [0.4, 0.8], [16, 32], 256+256, [[256, 256, 512], [256, 384, 512]])
        self.fp4 = PointNetFeaturePropagation(512+512+256+256, [256, 256])
        self.fp3 = PointNetFeaturePropagation(128+128+256, [256, 256])
        self.fp2 = PointNetFeaturePropagation(32+64+256, [256, 128])
        self.fp1 = PointNetFeaturePropagation(128, [128, 128, 128])     # no original features are concatenated here (points1 is None)
        self.conv1 = nn.Conv1d(128, 128, 1)
        self.bn1 = nn.BatchNorm1d(128)
        self.drop1 = nn.Dropout(0.5)
        self.conv2 = nn.Conv1d(128, num_classes, 1)

    def forward(self, xyz):
        l0_points = xyz
        l0_xyz = xyz[:,:3,:]

        # downsampling path
        # l0:(b,9,N) -> l1:(b,96,1024) -> l2:(b,256,256) -> l3:(b,512,64) -> l4:(b,1024,16)
        l1_xyz, l1_points = self.sa1(l0_xyz, l0_points)
        l2_xyz, l2_points = self.sa2(l1_xyz, l1_points)
        l3_xyz, l3_points = self.sa3(l2_xyz, l2_points)
        l4_xyz, l4_points = self.sa4(l3_xyz, l3_points)

        # upsampling path
        l3_points = self.fp4(l3_xyz, l4_xyz, l3_points, l4_points)  # l3:(b,512,64) -> (b,256,64)
        l2_points = self.fp3(l2_xyz, l3_xyz, l2_points, l3_points)  # l2:(b,256,256) -> (b,256,256)
        l1_points = self.fp2(l1_xyz, l2_xyz, l1_points, l2_points)  # l1:(b,96,1024) -> (b,128,1024)
        l0_points = self.fp1(l0_xyz, l1_xyz, None, l1_points)       # l0:(b,9,N) -> (b,128,N)

        # segmentation head: (b,128,N) -> (b,128,N) -> (b,k,N) -> (b,N,k)
        x = self.drop1(F.relu(self.bn1(self.conv1(l0_points))))
        x = self.conv2(x)
        x = F.log_softmax(x, dim=1)
        x = x.permute(0, 2, 1)
        return x, l4_points
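
The in_channel values of the feature propagation layers are the skip-connection channels of the finer level plus the channels arriving from the coarser level. The numbers above can be checked in plain Python (variable names are illustrative):

```python
# output channels of each set abstraction level (sum of the two scales)
l1, l2, l3, l4 = 32 + 64, 128 + 128, 256 + 256, 512 + 512

fp4_in = l3 + l4          # skip features of l3 + features interpolated from l4
fp4_out = 256
fp3_in = l2 + fp4_out     # skip features of l2 + output of fp4
fp3_out = 256
fp2_in = l1 + fp3_out     # skip features of l1 + output of fp3
fp2_out = 128
fp1_in = fp2_out          # points1 is None at level 0, so nothing is concatenated

print(fp4_in, fp3_in, fp2_in, fp1_in)   # 1536 512 352 128
```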

8. Part Segmentation

Part segmentation network implementation:

class get_model(nn.Module):
    def __init__(self, num_classes, normal_channel=False):
        super(get_model, self).__init__()
        if normal_channel:
            additional_channel = 3
        else:
            additional_channel = 0
        self.normal_channel = normal_channel
        self.sa1 = PointNetSetAbstractionMsg(512, [0.1, 0.2, 0.4], [32, 64, 128], 3+additional_channel, [[32, 32, 64], [64, 64, 128], [64, 96, 128]])
        self.sa2 = PointNetSetAbstractionMsg(128, [0.4,0.8], [64, 128], 128+128+64, [[128, 128, 256], [128, 196, 256]])
        self.sa3 = PointNetSetAbstraction(npoint=None, radius=None, nsample=None, in_channel=512 + 3, mlp=[256, 512, 1024], group_all=True)
        self.fp3 = PointNetFeaturePropagation(in_channel=1536, mlp=[256, 256])
        self.fp2 = PointNetFeaturePropagation(in_channel=576, mlp=[256, 128])
        self.fp1 = PointNetFeaturePropagation(in_channel=150+additional_channel, mlp=[128, 128])
        self.conv1 = nn.Conv1d(128, 128, 1)
        self.bn1 = nn.BatchNorm1d(128)
        self.drop1 = nn.Dropout(0.5)
        self.conv2 = nn.Conv1d(128, num_classes, 1)

    def forward(self, xyz, cls_label):
        # Set Abstraction layers
        B,C,N = xyz.shape
        if self.normal_channel:
            l0_points = xyz
            l0_xyz = xyz[:,:3,:]
        else:
            l0_points = xyz
            l0_xyz = xyz
        l1_xyz, l1_points = self.sa1(l0_xyz, l0_points)
        l2_xyz, l2_points = self.sa2(l1_xyz, l1_points)
        l3_xyz, l3_points = self.sa3(l2_xyz, l2_points)

        # Feature Propagation layers
        l2_points = self.fp3(l2_xyz, l3_xyz, l2_points, l3_points)
        l1_points = self.fp2(l1_xyz, l2_xyz, l1_points, l2_points)
        cls_label_one_hot = cls_label.view(B,16,1).repeat(1,1,N)
        l0_points = self.fp1(l0_xyz, l1_xyz, torch.cat([cls_label_one_hot,l0_xyz,l0_points],1), l1_points)

        # FC layers
        feat = F.relu(self.bn1(self.conv1(l0_points)))
        x = self.drop1(feat)
        x = self.conv2(x)
        x = F.log_softmax(x, dim=1)
        x = x.permute(0, 2, 1)
        return x, l3_points

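Note that the Part Segmentation network also encodes the label as a one-hot vector and treats it as an extra per-point feature. That is, the category label of the whole point cloud (cls_label) is attached to the features of every point.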
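
The shape handling for the label can be sketched in NumPy. The batch size, point count and label values are toy assumptions; 16 is the number of categories hard-coded in the forward pass above.

```python
import numpy as np

B, N, num_categories = 2, 5, 16
labels = np.array([3, 7])                              # one category per point cloud
one_hot = np.eye(num_categories)[labels]               # [B, 16]
tiled = np.repeat(one_hot[:, :, None], N, axis=2)      # [B, 16, N]: same label for every point

l0_xyz = np.zeros((B, 3, N))                           # coordinates
l0_points = np.zeros((B, 3, N))                        # input features (xyz only, no normals)
points1 = np.concatenate([tiled, l0_xyz, l0_points], axis=1)

print(points1.shape)            # (2, 22, 5)
print(points1.shape[1] + 128)   # 150, the in_channel of fp1 without normals
```

The 128 added at the end is the channel count of l1_points after fp2, which fp1 interpolates and concatenates with these 22 channels.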

The PointNet code makes this a little more explicit:

# (b,2048+16) -> (b,2064,1) -> (b,2064,N) -> (b,64+128+128+512+2048+2064,N)
out_max = torch.cat([out_max,label.squeeze(1)],1)       # append the one-hot category label of the point cloud to the global feature
expand = out_max.view(-1, 2048+16, 1).repeat(1, 1, N)   # repeat the global feature N times, then concatenate it with the earlier per-point features (5 levels)
concat = torch.cat([expand, out1, out2, out3, out4, out5], 1)   # concatenate the multi-level features

References:

1. Paper reading notes | 3D object detection: PointNet

2. Paper reading notes | 3D object detection: PointNet++

3. PointNet++ Explained (Part 1): Data Augmentation Methods

4. https://github.com/yanx27/Pointnet_Pointnet2_pytorch