Implement transparent database shadowing in Database::Open #4397

Open
chuong-data61 wants to merge 3 commits into colmap:main from chuong-data61:feature/database-shadowing-dev

@chuong-data61

Summary:

This PR introduces a transparent database shadowing mechanism in colmap::Database. When enabled via the COLMAP_USE_LOCAL_DATABASE=1 environment variable, COLMAP will automatically clone the database to local temporary storage (/tmp) upon opening and synchronize all changes back to the original (potentially remote) path upon destruction.

Problem:

Working directly on SQLite databases stored on network-attached storage (NAS) or high-latency filesystems often leads to:

  1. Poor Performance: SQLite's frequent small random reads/writes are highly sensitive to network latency.
  2. Reliability Issues: Intermittent network drops can cause database corruption or fatal errors during long-running SfM/MVS tasks.

Solution:

By intercepting Database::Open, we can redirect the database interaction to a fast local disk. The implementation uses a custom deleter pattern on the std::shared_ptr<Database> returned by the factory to ensure that:

  • The shadow database is properly closed and flushed before synchronization.
  • The synchronization (copy-back) is atomic relative to the COLMAP process lifecycle.
  • Cleanup of temporary files occurs automatically.
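For readers unfamiliar with the custom deleter pattern, the ordering described above (close, then copy back, then clean up) could look roughly like the sketch below. It is not the code from this PR: `FakeDatabase` and `OpenShadowed` are hypothetical stand-ins for `colmap::Database` and its factory, and the "database" is just a file that gets appended to.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <memory>
#include <string>
#include <system_error>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical stand-in for colmap::Database: it only appends text to a file.
struct FakeDatabase {
  explicit FakeDatabase(fs::path p) : path(std::move(p)) {}
  void Write(const std::string& text) const {
    std::ofstream out(path, std::ios::app);
    out << text;
  }
  fs::path path;
};

// Copies `remote` into `local_dir` and returns a handle to the local copy.
// The custom deleter runs when the last shared_ptr owner goes away.
std::shared_ptr<FakeDatabase> OpenShadowed(const fs::path& remote,
                                           const fs::path& local_dir) {
  const fs::path local = local_dir / remote.filename();
  assert(local != remote && "shadow path must differ from the source");
  fs::copy_file(remote, local, fs::copy_options::overwrite_existing);
  return std::shared_ptr<FakeDatabase>(
      new FakeDatabase(local), [remote, local](FakeDatabase* db) {
        // 1. Destroy the object first so the shadow file is closed.
        delete db;
        // 2. Copy the shadow back; use error codes because a deleter
        //    must not throw.
        std::error_code ec;
        fs::copy_file(local, remote, fs::copy_options::overwrite_existing, ec);
        if (ec) {
          // Keep the shadow file so that no work is lost.
          std::cerr << "Sync-back failed, keeping " << local << ": "
                    << ec.message() << "\n";
          return;
        }
        // 3. Remove the shadow only after a successful copy-back.
        fs::remove(local, ec);
      });
}
```

Because the deleter is stored inside the `std::shared_ptr`, callers keep using the returned handle as before and the sync-back happens when the last reference is released.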

Technical Details:

  • Opt-in Mechanism: Controlled by COLMAP_USE_LOCAL_DATABASE=1.
  • Path Handling: Uses std::filesystem for platform-independent path manipulation.
  • Robustness: Includes try-catch blocks to prevent data loss if the remote drive becomes unavailable during sync-back.
  • Transparency: No changes required to any calling code outside of the Database class.
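As an illustration of the opt-in check and path handling listed above, here is a minimal sketch. The helper names (`UseLocalDatabase`, `MakeShadowPath`, `ResolveDatabasePath`) and the `.shadow-<random>` naming scheme are assumptions made for this example and may differ from the actual change; it also uses `std::filesystem::temp_directory_path()` rather than a hard-coded `/tmp`.

```cpp
#include <cstdlib>
#include <filesystem>
#include <random>
#include <string>
#include <string_view>

namespace fs = std::filesystem;

// Returns true only when COLMAP_USE_LOCAL_DATABASE is set to exactly "1".
bool UseLocalDatabase() {
  const char* value = std::getenv("COLMAP_USE_LOCAL_DATABASE");
  return value != nullptr && std::string_view(value) == "1";
}

// Builds <temp dir>/<stem>.shadow-<random><extension> (hypothetical naming).
fs::path MakeShadowPath(const fs::path& original) {
  std::random_device rd;
  std::mt19937_64 gen(rd());
  const std::string suffix = std::to_string(gen());
  return fs::temp_directory_path() /
         (original.stem().string() + ".shadow-" + suffix +
          original.extension().string());
}

// Picks the path the database would actually be opened at.
fs::path ResolveDatabasePath(const fs::path& original) {
  return UseLocalDatabase() ? MakeShadowPath(original) : original;
}
```

With the variable unset, `ResolveDatabasePath` returns its argument unchanged, which is what keeps the feature opt-in.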

Verification:

Verified on a 120-image reconstruction pipeline. Feature extraction, matching, mapping, and dense reconstruction all successfully utilized the shadow database and correctly synchronized results back to the source NAS.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a database shadowing mechanism to improve performance on high-latency network drives. When the COLMAP_USE_LOCAL_DATABASE environment variable is set, the database is transparently copied to a local temporary directory and synchronized back to the original path upon destruction. The reviewer provided several actionable suggestions to improve the implementation, including ensuring the synchronization process is atomic to prevent file corruption, using a more robust path comparison method to detect existing temporary files, and optimizing the random number generation for shadow file naming.

Comment on lines +116 to +118
    std::filesystem::copy_file(
        local_path, path, std::filesystem::copy_options::overwrite_existing);
    std::filesystem::remove(local_path);
Contributor

high

The synchronization back to the original path is not atomic. If the process is interrupted or a network failure occurs during copy_file, the original database file may be left in a corrupted or truncated state. To ensure atomicity, copy the shadow database to a temporary file in the same directory as the target path and then use std::filesystem::rename to replace the original file.

            const std::filesystem::path temp_sync_path =
                path.parent_path() / (path.filename().string() + ".sync.tmp");
            std::filesystem::copy_file(
                local_path,
                temp_sync_path,
                std::filesystem::copy_options::overwrite_existing);
            std::filesystem::rename(temp_sync_path, path);
            std::filesystem::remove(local_path);

Author

This is a great suggestion.

Comment thread src/colmap/scene/database.cc Outdated
Comment thread src/colmap/scene/database.cc Outdated
chuong-data61 and others added 2 commits May 12, 2026 21:48
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@ahojnnes
Contributor

Thanks for your contribution. I understand the motivation. My immediate gut reaction is that this is something that is better handled by the business logic wrapping colmap, as it is highly dependent on the specific setup of each consumer?

@chuong-data61
Author

chuong-data61 commented May 12, 2026

> Thanks for your contribution. I understand the motivation. My immediate gut reaction is that this is something that is better handled by the business logic wrapping colmap, as it is highly dependent on the specific setup of each consumer?

Thanks for the comment @ahojnnes. It was surprising to me that COLMAP stopped working when it was simply deployed to process data on a network drive. Such a use case can be quite common in practice. With the current version of COLMAP, inefficient data manipulation is required to get around the limitations of SQLite. An alternative to this PR would be to switch from SQLite to PostgreSQL, which would require more code changes.

@ahojnnes
Contributor

I see. The problem is that the proposed fix here still requires an opt-in env variable to be set, so the behavior will still be surprising to anybody not setting the env variable. Setting the env variable is not really that much more convenient as compared to just copying the database in a wrapper shell script?

@chuong-data61
Author

chuong-data61 commented May 13, 2026

> I see. The problem is that the proposed fix here still requires an opt-in env variable to be set, so the behavior will still be surprising to anybody not setting the env variable. Setting the env variable is not really that much more convenient as compared to just copying the database in a wrapper shell script?

The behavior would not be surprising if COLMAP suggested setting the env variable whenever a database error occurs, and/or if the variable were documented in the COLMAP tutorial or help text. Once set up, the database update is handled automatically and does not interfere with the overall workflow. Furthermore, many users don't really know when the database is updated, and therefore when to copy it across.
