Understanding and interpreting complex 3D environments is a key challenge in
autonomous driving. Lidar sensors and the point clouds they record are particularly well suited
to this task, since they provide accurate 3D information about the environment. This work
presents a multimodal deep-learning approach for panoptic segmentation of 3D point
clouds. It builds upon and combines three key aspects: a multi-view architecture, temporal
feature fusion, and deep sensor fusion.