minifind is a minimal Unix find reimplementation in Rust, designed to list
directory entries as fast as possible. Filename or path matching is supported
via --name (glob) or --regex (regular expression) options, with optional
case-insensitive matching controlled by --case-insensitive. Results can be
narrowed further using --file-type to filter by entry type: b for block
device, c for character device, d for directory, p for named FIFO, f
for regular file, l for symlink, s for socket, or e for empty
file/directory. Both --name and --regex accept multiple patterns.
By default, symlinks are not followed and filesystem boundaries are not crossed. The thread count defaults to the number of available CPU cores.
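To illustrate the --file-type filter described above, here is a minimal sketch that maps the single-letter aliases to entry kinds and filters one directory level. It is not minifind's code: `Kind`, `parse_alias`, `kind_of`, and `list_matching` are hypothetical names, it uses only `std::fs::read_dir`, and it does not implement the `e` (empty) check, which would need extra work per entry.

```rust
use std::fs::{self, FileType};
use std::io;
use std::path::{Path, PathBuf};

#[cfg(unix)]
use std::os::unix::fs::FileTypeExt;

/// Hypothetical enum mirroring the values listed for --file-type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind { BlockDevice, CharDevice, Directory, Pipe, File, Symlink, Socket, Empty }

/// Map a single-letter alias (b, c, d, p, f, l, s, e) to a kind.
fn parse_alias(c: char) -> Option<Kind> {
    match c {
        'b' => Some(Kind::BlockDevice),
        'c' => Some(Kind::CharDevice),
        'd' => Some(Kind::Directory),
        'p' => Some(Kind::Pipe),
        'f' => Some(Kind::File),
        'l' => Some(Kind::Symlink),
        's' => Some(Kind::Socket),
        'e' => Some(Kind::Empty),
        _ => None,
    }
}

/// Device, FIFO, and socket checks exist only on Unix.
#[cfg(unix)]
fn special_kind(ft: FileType) -> Option<Kind> {
    if ft.is_block_device() { Some(Kind::BlockDevice) }
    else if ft.is_char_device() { Some(Kind::CharDevice) }
    else if ft.is_fifo() { Some(Kind::Pipe) }
    else if ft.is_socket() { Some(Kind::Socket) }
    else { None }
}

#[cfg(not(unix))]
fn special_kind(_ft: FileType) -> Option<Kind> { None }

/// Classify an entry from its FileType alone. Kind::Empty is never returned
/// here because emptiness cannot be read from the type.
fn kind_of(ft: FileType) -> Option<Kind> {
    if ft.is_symlink() { Some(Kind::Symlink) }
    else if ft.is_dir() { Some(Kind::Directory) }
    else if ft.is_file() { Some(Kind::File) }
    else { special_kind(ft) }
}

/// List the entries of `dir` (one level, symlinks not followed) whose kind
/// is in `wanted`, sorted by path.
fn list_matching(dir: &Path, wanted: &[Kind]) -> io::Result<Vec<PathBuf>> {
    let mut out = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        if let Some(kind) = kind_of(entry.file_type()?) {
            if wanted.contains(&kind) {
                out.push(entry.path());
            }
        }
    }
    out.sort();
    Ok(out)
}

fn main() -> io::Result<()> {
    // "dfl" corresponds to the documented default: directory, file, symlink.
    let wanted: Vec<Kind> = "dfl".chars().filter_map(parse_alias).collect();
    for path in list_matching(Path::new("."), &wanted)? {
        println!("{}", path.display());
    }
    Ok(())
}
```

According to the standard library documentation, `DirEntry::file_type` needs no extra system call on most Unix platforms, which is why type filtering of this kind can stay cheap.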
Other notable projects in this space:
- sharkdp/fd: a much more fully-featured find alternative with excellent performance
- LyonSyonII/hunt-rs: a similar high-performance-oriented tool
- BurntSushi/ripgrep: also home to the globset and ignore crates used by this project
- uutils/findutils: a Rust reimplementation of findutils intended as a drop-in replacement
minimal find reimplementation
Usage: minifind [OPTIONS] <PATH>...
Arguments:
<PATH>... Paths to check for large directories
Options:
-f, --follow-symlinks <FOLLOW_SYMLINKS> Follow symlinks [default: false] [aliases: -L] [possible values: true, false]
-o, --one-filesystem <ONE_FILESYSTEM> Do not cross mount points [default: true] [aliases: --xdev] [possible values: true, false]
-x, --threads <THREADS> Number of threads to use when calibrating and scanning [default: 20]
-d, --max-depth <MAX_DEPTH> Maximum depth to traverse
-n, --name <NAME> Base of the file name matching globbing pattern
-r, --regex <REGEX> File name (full path) matching regular expression pattern
-i, --case-insensitive <CASE_INSENSITIVE> Case-insensitive matching for globbing and regular expression patterns [default: false] [possible values: true, false]
-t, --file-type <FILE_TYPE> Filter matches by type. Also accepts 'b', 'c', 'd', 'p', 'f', 'l', 's' and 'e' aliases [default: directory file symlink]
[possible values: empty, block-device, char-device, directory, pipe, file, symlink, socket]
-h, --help Print help
-V, --version Print version

The --regex option uses Rust regex syntax,
which is similar to other engines but does not support look-around or
backreferences.
The --name option uses Unix-style glob syntax.
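The practical difference between the two options is what they are matched against: per the help text, --name applies to the base of the file name, while --regex sees the full path. The sketch below illustrates only the basename side, using a deliberately simplified wildcard matcher (`*` and `?` only) written with the standard library. It is not the globset crate the project uses, and `wildcard_match` and `name_matches` are hypothetical names.

```rust
use std::path::Path;

/// Simplified glob: `*` matches any run of characters, `?` matches exactly
/// one. Character classes, braces, and escapes are NOT handled here.
fn wildcard_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0, 0);
    // Position of the last `*` seen and the text index it was tried at.
    let mut star: Option<(usize, usize)> = None;

    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            // Remember the star and first try matching it against nothing.
            star = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == t[ti]) {
            pi += 1;
            ti += 1;
        } else if let Some((sp, st)) = star {
            // Backtrack: let the last star absorb one more character.
            pi = sp + 1;
            ti = st + 1;
            star = Some((sp, st + 1));
        } else {
            return false;
        }
    }
    // Any trailing stars can match the empty remainder.
    while pi < p.len() && p[pi] == '*' {
        pi += 1;
    }
    pi == p.len()
}

/// Apply the pattern to the final path component only, optionally ignoring
/// case by lowercasing both sides.
fn name_matches(path: &Path, pattern: &str, case_insensitive: bool) -> bool {
    let name = match path.file_name().and_then(|n| n.to_str()) {
        Some(name) => name,
        None => return false,
    };
    if case_insensitive {
        wildcard_match(&pattern.to_lowercase(), &name.to_lowercase())
    } else {
        wildcard_match(pattern, name)
    }
}

fn main() {
    let path = Path::new("/usr/src/linux/Makefile");
    println!("{}", name_matches(path, "Make*", false));
    // A pattern that only matches a parent directory does not match,
    // because only the base name is inspected.
    println!("{}", name_matches(path, "linux*", false));
}
```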
Hardware: 4-core / 8-thread Intel Xeon E5-1630 v3 @ 3.70 GHz, 48 GB RAM.
Measured with the Criterion benchmark in benches/walk.rs
over a shallow clone of the mainline Linux kernel tree (99,893 entries across
6,158 directories, ~2 GB) with a warm page cache. Both minifind (defaults)
and GNU find run as subprocesses, so each pays process-startup cost; output
is discarded for both. 100 samples each:
walk_linux_kernel/minifind time: [20.630 ms 20.710 ms 20.797 ms]
walk_linux_kernel/find time: [78.989 ms 79.237 ms 79.497 ms]
So minifind walks the tree in ~20.7 ms vs ~79.2 ms — about 3.8× faster
(≈4.8M vs ≈1.3M entries/second). Reproduce with cargo bench --bench walk
(set BENCH_WALK_DIR=/path/to/tree to benchmark an existing checkout).
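The speedup and throughput figures follow directly from the Criterion medians and the entry count quoted above. A small sketch of that arithmetic, with hypothetical helper names `speedup` and `entries_per_second`:

```rust
/// Ratio of the baseline time to the candidate time.
fn speedup(baseline_ms: f64, candidate_ms: f64) -> f64 {
    baseline_ms / candidate_ms
}

/// Entries processed per second for a run that took `elapsed_ms`.
fn entries_per_second(entries: u64, elapsed_ms: f64) -> f64 {
    entries as f64 / (elapsed_ms / 1000.0)
}

fn main() {
    // Values taken from the benchmark output above (median estimates).
    let entries = 99_893;
    let minifind_ms = 20.710;
    let find_ms = 79.237;

    println!("speedup: {:.1}x", speedup(find_ms, minifind_ms));
    println!("minifind: {:.1}M entries/s", entries_per_second(entries, minifind_ms) / 1e6);
    println!("find: {:.1}M entries/s", entries_per_second(entries, find_ms) / 1e6);
}
```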
- Parallel traversal. GNU find walks on a single thread; minifind fans out across all cores with its own work-stealing walker (one worker per core, minus one thread reserved for output), overlapping directory reads. On this 8-thread machine that accounts for most of the gap; the advantage scales with core count and shrinks toward parity on a 1–2 core host.
- Purpose-built walker. minifind uses its own walker (raw getdents64 via rustix on Unix, std::fs elsewhere) rather than a general-purpose crate, so it carries no gitignore/hidden-file bookkeeping it does not need.
- No extra stat(2). File-type filtering uses the d_type already returned by getdents(2), avoiding a per-entry stat for -type-style matching.
- Batched, lock-light output. Matched entries are streamed to a dedicated output thread in batches (amortizing channel synchronization), then written straight into a 256 KB buffered writer with one copy per entry.
- Fast allocator. mimalloc keeps the unavoidable per-entry path allocations cheap.
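The batched-output point in the list above can be sketched with the standard library alone. This is an illustration of the general pattern, not minifind's implementation: `spawn_output`, `send_in_batches`, and the `BATCH_SIZE` value are assumptions, and a `std::sync::mpsc` channel stands in for whatever channel the project actually uses. Only the 256 KB buffer size comes from the text.

```rust
use std::io::{self, BufWriter, Write};
use std::sync::mpsc;
use std::thread;

/// Hypothetical batch size; the real value is not stated in the source.
const BATCH_SIZE: usize = 256;
/// 256 KB buffered writer, as described above.
const BUF_CAPACITY: usize = 256 * 1024;

/// Start a dedicated output thread that drains batches into a buffered sink.
/// The sink is handed back on join so callers can inspect or reuse it.
fn spawn_output<W: Write + Send + 'static>(
    sink: W,
) -> (mpsc::Sender<Vec<String>>, thread::JoinHandle<io::Result<W>>) {
    let (tx, rx) = mpsc::channel::<Vec<String>>();
    let handle = thread::spawn(move || {
        let mut out = BufWriter::with_capacity(BUF_CAPACITY, sink);
        // The loop ends once every sender has been dropped.
        for batch in rx {
            for path in batch {
                out.write_all(path.as_bytes())?;
                out.write_all(b"\n")?;
            }
        }
        // Flush the buffer and return the underlying sink.
        out.into_inner().map_err(|e| e.into_error())
    });
    (tx, handle)
}

/// Worker side: group matches so that one channel send covers many entries.
fn send_in_batches(tx: &mpsc::Sender<Vec<String>>, matches: impl IntoIterator<Item = String>) {
    let mut batch = Vec::with_capacity(BATCH_SIZE);
    for m in matches {
        batch.push(m);
        if batch.len() == BATCH_SIZE {
            let full = std::mem::replace(&mut batch, Vec::with_capacity(BATCH_SIZE));
            if tx.send(full).is_err() {
                return; // the output thread has gone away
            }
        }
    }
    if !batch.is_empty() {
        let _ = tx.send(batch);
    }
}

fn main() -> io::Result<()> {
    let (tx, output) = spawn_output(io::stdout());
    let workers: Vec<_> = (0..2)
        .map(|id| {
            let tx = tx.clone();
            thread::spawn(move || {
                let matches = (0..5).map(|n| format!("worker{id}/entry{n}"));
                send_in_batches(&tx, matches);
            })
        })
        .collect();
    drop(tx); // let the output thread finish once the workers are done
    for w in workers {
        w.join().expect("worker panicked");
    }
    output.join().expect("output thread panicked")?;
    Ok(())
}
```

Sending whole batches means the channel is touched once per `BATCH_SIZE` entries rather than once per entry, which is the amortization the list item refers to.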
The warm-cache setup isolates CPU and syscall efficiency rather than disk latency; on a cold cache both tools are bound by I/O and the gap narrows.