Paper
Visual Localization Using Sparse Semantic 3D Map
Motivation
- most traditional methods fail to localize the camera under a wide range of viewing-condition variations, including seasonal and illumination changes as well as weather and day-night variations
Proposed scheme
- combine image-based and structure-based localization with semantic information
- separate into three parts:
- sparse semantic 3D map
- apply off-the-shelf segmentation CNNs (DeepLabv3+ network) to all database images
- reproject all database images to 3D point cloud
- apply maximum voting to assign a label to each 3D point in the 3D point cloud
- remove dynamic objects in 3D point cloud to obtain Ms
- Ms represents a cleaner sparse semantic 3D map
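
The labeling steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the input layout (a dict from point id to the pixel labels observed at its reprojections) and the set of classes treated as dynamic are all assumptions made for the example.

```python
from collections import Counter

# Assumption: which classes count as "dynamic" is not given in the notes.
DYNAMIC_LABELS = {"person", "car", "bicycle"}

def vote_point_labels(observations):
    """Assign each 3D point the most frequent label among its observations.

    observations: dict mapping point id -> list of pixel labels collected
    from the database images in which the point is observed.
    Points without any observation are skipped.
    """
    labels = {}
    for point_id, observed in observations.items():
        if observed:
            labels[point_id] = Counter(observed).most_common(1)[0][0]
    return labels

def remove_dynamic(labels, dynamic=DYNAMIC_LABELS):
    """Drop points whose voted label is a dynamic class, giving the map Ms."""
    return {pid: lab for pid, lab in labels.items() if lab not in dynamic}

observations = {
    0: ["building", "building", "car"],
    1: ["car", "car", "road"],
    2: [],
}
point_labels = vote_point_labels(observations)
semantic_map = remove_dynamic(point_labels)
```

Here point 0 keeps the label "building", point 1 is voted "car" and is then removed as dynamic, and point 2 is skipped because it has no observations.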
- semantic score
- obtain top-k ranked database images IR for each query image IQ by NetVLAD
- for every selected IiR, find 2D-3D matches through KNN search and ratio test (blue dotted lines)
- obtain 2D-3D correspondences between IiR and Ms (green solid lines)
- obtain 3D-2D matches between Ms and IQ (red solid lines)
- apply PnP solver to recover query pose
- project all visible 3D points into IQ (visible means 3D points should be seen by IQ)
- count the number of 3D points whose semantic labels are the same as those in IQ
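
A rough sketch of the counting step above, assuming a pinhole camera with rotation R, translation t and intrinsics K given as nested lists. The notes do not say how visibility is decided, so this example approximates it as "in front of the camera and inside the image" and ignores occlusion; all names are illustrative.

```python
def project(point, R, t, K):
    """Project a 3D point into the image with a pinhole model (Xc = R X + t).

    Returns (u, v) pixel coordinates, or None if the point is behind the camera.
    """
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        return None
    u = K[0][0] * xc[0] / xc[2] + K[0][2]
    v = K[1][1] * xc[1] / xc[2] + K[1][2]
    return u, v

def semantic_score(points, point_labels, R, t, K, query_labels):
    """Count projected 3D points whose label equals the query pixel label.

    query_labels: 2D list indexed as [row][col] holding the segmentation of IQ.
    """
    height, width = len(query_labels), len(query_labels[0])
    score = 0
    for point, label in zip(points, point_labels):
        uv = project(point, R, t, K)
        if uv is None:
            continue  # behind the camera, treated as not visible
        col, row = int(round(uv[0])), int(round(uv[1]))
        inside = 0 <= row < height and 0 <= col < width
        if inside and query_labels[row][col] == label:
            score += 1
    return score

R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
t = [0, 0, 0]
K = [[1, 0, 1], [0, 1, 1], [0, 0, 1]]
query_labels = [
    ["sky", "sky", "sky"],
    ["road", "building", "tree"],
    ["road", "road", "road"],
]
points = [(0, 0, 1), (1, 0, 1), (0, 0, -1), (5, 0, 1)]
labels = ["building", "road", "building", "tree"]
score = semantic_score(points, labels, R, t, K, query_labels)
```

In this toy setup only the first point counts: the second lands on a "tree" pixel, the third is behind the camera, and the fourth projects outside the image.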
- weighted RANSAC pose estimation
- 2D-3D matches produced by the same IiR are assigned the semantic score of IiR
- normalize each score by the sum of all 2D-3D match scores
- use the normalized score as a weight p for RANSAC's sampling
- this differs from removing 2D-3D matches with lower semantic scores: low-scoring matches are kept but sampled less often
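
The weighting scheme above might look like the sketch below. It covers only the sampling part of RANSAC (no pose solver or inlier counting), and the function names, the minimal-set size and the uniform fallback for an all-zero score sum are assumptions of this example rather than details from the paper.

```python
import random

def match_weights(match_image_ids, image_scores):
    """Give each 2D-3D match the semantic score of the retrieved image that
    produced it, then normalize by the sum over all matches."""
    raw = [image_scores[image_id] for image_id in match_image_ids]
    total = sum(raw)
    if total == 0:
        # Assumption: fall back to uniform sampling when every score is zero.
        return [1.0 / len(raw)] * len(raw)
    return [score / total for score in raw]

def sample_minimal_set(weights, size, rng=random):
    """Draw `size` distinct match indices with probability proportional to weight."""
    if size > len(weights):
        raise ValueError("not enough matches for a minimal set")
    candidates = list(range(len(weights)))
    remaining = list(weights)
    chosen = []
    for _ in range(size):
        if sum(remaining) > 0:
            pos = rng.choices(range(len(candidates)), weights=remaining, k=1)[0]
        else:
            pos = rng.randrange(len(candidates))
        chosen.append(candidates.pop(pos))
        remaining.pop(pos)
    return chosen

# Matches 0 and 1 come from image "a", match 2 from "b", match 3 from "c".
weights = match_weights(["a", "a", "b", "c"], {"a": 3, "b": 1, "c": 0})
picked = sample_minimal_set(weights, 3, random.Random(0))
```

A match from an image with score 0 gets weight 0 and is not drawn while positively weighted matches remain, yet it stays in the match set and can still be counted as an inlier when a hypothesis is verified.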
- appendix
- DeepLabv3+ network (paper link)
- NetVLAD (paper link)
Experiment
- visual localization dataset RobotCar Seasons (Benchmark)
- use DeepLabv3+ network to segment all dataset images and assign a label to each 3D point by maximum voting over the pixel labels reprojected from all database images in which it is visible
- in the image retrieval step, use NetVLAD and pre-trained Pitts30K model to generate 4096-dimensional descriptor vectors for each query and database image, and normalize L2 distances of the descriptors
- retrieval number: k=30 for day conditions; k=50 for night conditions
- precision threshold: high (0.25m, 2deg); medium (0.5m, 5deg); coarse (5m, 10deg)
- comparison with related works: Dense-VLAD, NetVLAD, FAB-MAP, Active Search, CSL, Non-semantic
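
The top-k image retrieval step described in the list above can be illustrated with a small sketch. It assumes the global descriptors are already computed (short toy vectors stand in for the 4096-dimensional NetVLAD ones) and takes one reading of the normalization, namely that descriptors are L2-normalized before Euclidean distances are compared; the function names are hypothetical.

```python
import math

def l2_normalize(vector):
    """Scale a descriptor to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

def top_k(query_desc, db_descs, k):
    """Return indices of the k database descriptors closest to the query."""
    query = l2_normalize(query_desc)
    distances = []
    for index, desc in enumerate(db_descs):
        distances.append((math.dist(query, l2_normalize(desc)), index))
    distances.sort()
    return [index for _, index in distances[:k]]

db_descs = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.1], [-1.0, 0.0]]
query_desc = [3.0, 0.0]
retrieved = top_k(query_desc, db_descs, k=2)
```

With the settings from the notes, k would be 30 for day queries and 50 for night queries.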