Flow-SfM | Minsu Kwon

High-level overview of the Flow-SfM pipeline: flow-gated pair scoring, budgeted scene graph construction, conflict-free deduplication into virtual keypoints, and a standard SfM back-end with weighted triangulation.

Overview

Flow-SfM is a detector-free Structure-from-Motion (SfM) framework that leverages dense correspondences from UFM to build a more reliable and efficient multi-view reconstruction pipeline. Rather than relying only on sparse keypoints, Flow-SfM operates on dense flow and covisibility predictions to construct a budgeted scene graph and robust multi-view tracks.

The key idea is to use dense flow and symmetric confidence estimates to score image pairs, select only geometrically useful connections, and then compress millions of dense matches into a small set of virtual keypoints via conflict-free deduplication. These virtual keypoints are fed into an off-the-shelf SfM back-end (e.g., COLMAP) for pose estimation and sparse structure recovery, followed by dense triangulation. This allows Flow-SfM to handle low-texture, low-overlap, and wide-baseline image collections more robustly than traditional sparse-feature SfM.

Abstract

Classical SfM systems are built around sparse keypoints and fully connected scene graphs, which become brittle and inefficient in challenging regimes such as low-texture scenes, wide baselines, or unordered image collections. In contrast, modern dense matchers such as UFM provide sub-pixel-accurate dense correspondences and covisibility predictions across views, but integrating them into an efficient and stable SfM pipeline is non-trivial.

Flow-SfM bridges this gap with a flow-gated SfM formulation. We first obtain dense UFM correspondences and per-pixel confidence maps between candidate image pairs. A flow-gated pair scoring strategy selects only promising pairs under a global pair budget, forming a sparse but informative scene graph. For each selected pair, we cluster dense correspondences into a small number of virtual keypoints via conflict-free deduplication, which suppresses inconsistent tracks and reduces the burden on bundle adjustment. The resulting keypoints are passed to a standard SfM back-end to recover camera poses and a sparse 3D structure, followed by dense triangulation guided by the UFM confidences. On ETH3D and Tanks & Temples, Flow-SfM achieves high registration rates and strong pose accuracy in difficult settings, and provides high-quality initialization for downstream 3D Gaussian Splatting and mesh reconstruction pipelines.

Method

At a high level, Flow-SfM consists of the following components:

Dense UFM correspondences: For each candidate image pair, UFM predicts dense correspondences, covisibility, and symmetric confidence maps at sub-pixel precision.
Flow-gated pair scoring: We aggregate per-pixel confidence into a flow-gated pair score that measures how useful a pair is for SfM. A global pair budget and per-node degree constraints are imposed to construct a compact, well-connected scene graph.
Conflict-free deduplication: For each selected pair, dense matches are clustered into micro-groups and then merged into a small set of virtual keypoints. This reduces track conflicts and keeps the number of observations per image manageable.
SfM back-end integration: The virtual keypoints and their cross-view tracks are fed into a standard SfM back-end (e.g., COLMAP) to estimate camera poses and a sparse 3D structure.
Dense 3D reconstruction: Finally, we perform dense triangulation guided by UFM confidences to obtain a dense point cloud, which can be further used for 3D Gaussian Splatting or mesh extraction.
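
To make the pair scoring and scene graph steps more concrete, here is a minimal sketch; it is not the actual Flow-SfM implementation. It assumes the score of a pair is the fraction of pixels that are both covisible and confident, taken as the minimum over the two flow directions, and that pairs are picked greedily by score. The function names and the parameters `tau`, `budget`, and `max_degree` are hypothetical.

```python
import numpy as np


def directional_score(conf, covis, tau=0.5):
    """Fraction of pixels that are covisible and have confidence above tau (assumed gate)."""
    gate = (covis > 0.5) & (conf > tau)
    return float(gate.mean())


def pair_score(conf_ab, covis_ab, conf_ba, covis_ba, tau=0.5):
    """Symmetric score: a pair is only as good as its weaker flow direction (assumption)."""
    return min(directional_score(conf_ab, covis_ab, tau),
               directional_score(conf_ba, covis_ba, tau))


def select_pairs(scores, budget, max_degree):
    """Greedily keep the highest-scoring pairs under a global budget and a per-image degree cap.

    scores: dict mapping (i, j) image index pairs to a float score.
    """
    degree = {}
    selected = []
    for (i, j), _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if len(selected) >= budget:
            break
        if degree.get(i, 0) >= max_degree or degree.get(j, 0) >= max_degree:
            continue  # one endpoint already has enough edges
        selected.append((i, j))
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    return selected


if __name__ == "__main__":
    scores = {(0, 1): 0.9, (0, 2): 0.8, (1, 2): 0.7, (0, 3): 0.6, (2, 3): 0.5}
    print(select_pairs(scores, budget=3, max_degree=2))
```

A purely greedy selection like this does not guarantee a connected graph: an image whose candidate pairs all score low can end up isolated. A practical variant might first reserve part of the budget for a spanning structure and then spend the rest greedily.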

The Flow-SfM design balances robustness and efficiency: by aggressively filtering pairs and deduplicating matches before bundle adjustment, it exploits the quality of dense correspondences without a blow-up in computational cost.
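
One simple way to picture deduplication before bundle adjustment is sketched below. This is an illustrative assumption rather than the method used in Flow-SfM: each "micro-group" is taken to be a square grid cell of `cell` pixels, matches are visited in order of decreasing confidence, and a match is kept only if neither its cell in image A nor its cell in image B has already been claimed. The result is a one-to-one set of matches whose endpoints serve as virtual keypoints. The function name `deduplicate` and the parameter `cell` are hypothetical.

```python
import numpy as np


def deduplicate(xy_a, xy_b, conf, cell=8.0):
    """Return indices of matches kept after grid-based, one-to-one deduplication.

    xy_a, xy_b: (N, 2) arrays of matched pixel coordinates in images A and B.
    conf:       (N,) array of match confidences.
    cell:       grid cell size in pixels (assumed micro-group granularity).
    """
    order = np.argsort(-conf, kind="stable")  # most confident matches first
    cells_a = np.floor(xy_a / cell).astype(np.int64)
    cells_b = np.floor(xy_b / cell).astype(np.int64)

    used_a, used_b = set(), set()
    keep = []
    for idx in order:
        ca = tuple(cells_a[idx].tolist())
        cb = tuple(cells_b[idx].tolist())
        if ca in used_a or cb in used_b:
            continue  # keeping this match would give a cell two partners
        used_a.add(ca)
        used_b.add(cb)
        keep.append(int(idx))
    return np.array(keep, dtype=np.int64)


if __name__ == "__main__":
    xy_a = np.array([[1.0, 1.0], [2.0, 2.0], [10.0, 10.0], [20.0, 20.0]])
    xy_b = np.array([[5.0, 5.0], [6.0, 6.0], [5.5, 5.5], [30.0, 30.0]])
    conf = np.array([0.9, 0.8, 0.7, 0.6])
    print(deduplicate(xy_a, xy_b, conf))
```

Because every cell is claimed at most once in each image, the number of observations per image is bounded by the number of grid cells, which keeps the input to the SfM back-end small regardless of how many dense matches the matcher produces.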

Results

We evaluate Flow-SfM on several challenging multi-view benchmarks, including ETH3D, Tanks & Temples, and additional low-overlap image collections. Compared to baseline SfM pipelines built on sparse keypoints, Flow-SfM:

Achieves higher registration rates and more stable pose estimation in low-texture or repetitive regions.
Maintains competitive or improved relative rotation and translation accuracy (RRA/RTA) under a constrained pair budget.
Provides better initial poses for downstream 3D Gaussian Splatting pipelines, leading to higher-quality meshes and denser, cleaner point clouds in scenes such as Barn and Ignatius.
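
For readers unfamiliar with RRA/RTA, the sketch below shows one common way these metrics are computed; the exact protocol and thresholds used for the numbers on this page are not stated here, so treat the details as assumptions. For every image pair, the relative pose is formed from the estimated and from the ground-truth cameras, and the metric is the fraction of pairs whose angular error falls below a threshold `tau_deg`. Poses are assumed to follow the world-to-camera convention `x_cam = R @ x_world + t`, and all function names are hypothetical.

```python
import itertools

import numpy as np


def rotation_angle_deg(R):
    """Rotation angle of a 3x3 rotation matrix, in degrees."""
    cos = (np.trace(R) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def vector_angle_deg(u, v):
    """Angle between two 3-vectors, in degrees."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def relative_pose(R_i, t_i, R_j, t_j):
    """Pose of camera j relative to camera i (world-to-camera convention)."""
    R_ij = R_j @ R_i.T
    t_ij = t_j - R_ij @ t_i
    return R_ij, t_ij


def rra_rta(R_est, t_est, R_gt, t_gt, tau_deg=5.0):
    """Fraction of image pairs with rotation / translation-direction error below tau_deg."""
    rot_ok, trans_ok = [], []
    for i, j in itertools.combinations(range(len(R_est)), 2):
        Re, te = relative_pose(R_est[i], t_est[i], R_est[j], t_est[j])
        Rg, tg = relative_pose(R_gt[i], t_gt[i], R_gt[j], t_gt[j])
        rot_ok.append(rotation_angle_deg(Re @ Rg.T) < tau_deg)
        trans_ok.append(vector_angle_deg(te, tg) < tau_deg)
    return float(np.mean(rot_ok)), float(np.mean(trans_ok))
```

Because both metrics compare relative poses, they are invariant to the global rotation and translation of the reconstruction, and the translation term compares directions only, so it is also insensitive to the overall scale.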

Qualitative 3D reconstruction and quantitative pose estimation results on ETH3D and Tanks & Temples. Flow-SfM produces higher-quality reconstructions and more accurate poses than the baselines.

Relation to Flow-2DGS and 3D Gaussian Splatting

Flow-SfM is designed not only as a standalone SfM system but also as a robust front-end for neural rendering and Gaussian Splatting. In subsequent work (Flow-2DGS), we explore how dense flow-guided monocular geometry and Flow-SfM camera poses can be used to obtain metric-scale, geometrically accurate 3D reconstructions with 2D Gaussian Splatting. Flow-SfM provides the camera trajectory and an initial 3D structure, which significantly stabilizes training and improves the quality of the reconstructed meshes.

BibTeX (placeholder)

Once the paper is accepted, the BibTeX entry will be updated here.

@inproceedings{kwon202Xflowsfm,
  title     = {Flow-SfM: Flow-Gated Structure from Motion with Dense UFM Correspondences},
  author    = {Kwon, Minsu and others},
  booktitle = {...},
  year      = {202X}
}