From 818fbfd208f919e7a4fd9c827b65e5ce5372479b Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Thu, 5 Mar 2026 15:34:52 -0800 Subject: [PATCH 001/107] sideband: delay sanitizing by default to Git v3.0 The sideband sanitization patches allow ANSI color sequences through by default, preserving compatibility with pre-receive hooks that provide colored output during `git push`. Even so, there is concern that changing any default behavior in a minor release may have unforeseen consequences. To accommodate this, defer the secure-by-default behavior to Git v3.0, where breaking changes are expected. This gives users and tooling time to prepare, while committing to address CVE-2024-52005 in Git v3.0. Signed-off-by: Johannes Schindelin [jc: adjusted for the removal of 'default' value] Signed-off-by: Junio C Hamano --- Documentation/config/sideband.adoc | 12 ++++++++++-- sideband.c | 6 +++++- t/t5409-colorize-remote-messages.sh | 18 +++++++++++++----- 3 files changed, 28 insertions(+), 8 deletions(-) diff --git a/Documentation/config/sideband.adoc b/Documentation/config/sideband.adoc index 96fade7f5fee39..ddba93393ccadc 100644 --- a/Documentation/config/sideband.adoc +++ b/Documentation/config/sideband.adoc @@ -1,8 +1,16 @@ sideband.allowControlCharacters:: +ifdef::with-breaking-changes[] By default, control characters that are delivered via the sideband are masked, except ANSI color sequences. This prevents potentially - unwanted ANSI escape sequences from being sent to the terminal. Use - this config setting to override this behavior (the value can be + unwanted ANSI escape sequences from being sent to the terminal. +endif::with-breaking-changes[] +ifndef::with-breaking-changes[] + By default, no control characters delivered via the sideband + are masked. This is unsafe and will change in Git v3.* to only + allow ANSI color sequences by default, preventing potentially + unwanted ANSI escape sequences from being sent to the terminal. 
+endif::with-breaking-changes[] + Use this config setting to override this behavior (the value can be a comma-separated list of the following keywords): + -- diff --git a/sideband.c b/sideband.c index 04282a568edd90..5fb60e52bf00b2 100644 --- a/sideband.c +++ b/sideband.c @@ -34,7 +34,11 @@ static enum { ALLOW_ANSI_CURSOR_MOVEMENTS = 1<<1, ALLOW_ANSI_ERASE = 1<<2, ALLOW_ALL_CONTROL_CHARACTERS = 1<<3, - ALLOW_DEFAULT_ANSI_SEQUENCES = ALLOW_ANSI_COLOR_SEQUENCES +#ifdef WITH_BREAKING_CHANGES + ALLOW_DEFAULT_ANSI_SEQUENCES = ALLOW_ANSI_COLOR_SEQUENCES, +#else + ALLOW_DEFAULT_ANSI_SEQUENCES = ALLOW_ALL_CONTROL_CHARACTERS, +#endif } allow_control_characters = ALLOW_CONTROL_SEQUENCES_UNSET; static inline int skip_prefix_in_csv(const char *value, const char *prefix, diff --git a/t/t5409-colorize-remote-messages.sh b/t/t5409-colorize-remote-messages.sh index 3010913bb113e4..07cbc62736bd26 100755 --- a/t/t5409-colorize-remote-messages.sh +++ b/t/t5409-colorize-remote-messages.sh @@ -98,6 +98,13 @@ test_expect_success 'fallback to color.ui' ' grep "error: error" decoded ' +if test_have_prereq WITH_BREAKING_CHANGES +then + TURN_ON_SANITIZING=already.turned=on +else + TURN_ON_SANITIZING=sideband.allowControlCharacters=color +fi + test_expect_success 'disallow (color) control sequences in sideband' ' write_script .git/color-me-surprised <<-\EOF && printf "error: Have you \\033[31mread\\033[m this?\\a\\n" >&2 @@ -106,7 +113,7 @@ test_expect_success 'disallow (color) control sequences in sideband' ' test_config_global uploadPack.packObjectsHook ./color-me-surprised && test_commit need-at-least-one-commit && - git clone --no-local . throw-away 2>stderr && + git -c $TURN_ON_SANITIZING clone --no-local . 
throw-away 2>stderr && test_decode_color decoded && test_grep RED decoded && test_grep "\\^G" stderr && @@ -138,7 +145,7 @@ test_decode_csi() { }' } -test_expect_success 'control sequences in sideband allowed by default' ' +test_expect_success 'control sequences in sideband allowed by default (in Git v3.8)' ' write_script .git/color-me-surprised <<-\EOF && printf "error: \\033[31mcolor\\033[m\\033[Goverwrite\\033[Gerase\\033[K\\033?25l\\n" >&2 exec "$@" @@ -147,7 +154,7 @@ test_expect_success 'control sequences in sideband allowed by default' ' test_commit need-at-least-one-commit-at-least && rm -rf throw-away && - git clone --no-local . throw-away 2>stderr && + git -c $TURN_ON_SANITIZING clone --no-local . throw-away 2>stderr && test_decode_color color-decoded && test_decode_csi decoded && test_grep ! "CSI \\[K" decoded && @@ -175,14 +182,15 @@ test_expect_success 'allow all control sequences for a specific URL' ' test_commit one-more-please && rm -rf throw-away && - git clone --no-local . throw-away 2>stderr && + git -c $TURN_ON_SANITIZING clone --no-local . throw-away 2>stderr && test_decode_color color-decoded && test_decode_csi decoded && test_grep ! "CSI \\[K" decoded && test_grep "\\^\\[\\[K" decoded && rm -rf throw-away && - git -c "sideband.file://.allowControlCharacters=true" \ + git -c sideband.allowControlCharacters=false \ + -c "sideband.file://.allowControlCharacters=true" \ clone --no-local "file://$PWD" throw-away 2>stderr && test_decode_color color-decoded && test_decode_csi decoded && From fa1468a1f7c7765a6c7dd1faca4c9dc241d0538c Mon Sep 17 00:00:00 2001 From: Trieu Huynh Date: Tue, 7 Apr 2026 03:30:41 +0900 Subject: [PATCH 002/107] promisor-remote: fix promisor.quiet to use the correct repository fetch_objects() reads the promisor.quiet configuration from the_repository instead of the repo parameter it receives. This means that when git lazy-fetches objects for a non-main repository, eg. 
a submodule that is itself a partial clone opened via repo_submodule_init(), the submodule's own promisor.quiet setting is ignored and the superproject's setting is used instead. Fix by replacing the_repository with repo in the repo_config_get_bool() call. The practical trigger is "git grep --recurse-submodules" on a superproject where the submodule is a partial clone. Add a test where promisor.quiet is set only in a partial-clone submodule; a lazy fetch triggered by "git grep --recurse-submodules" must honor that setting. Signed-off-by: Trieu Huynh Signed-off-by: Junio C Hamano --- promisor-remote.c | 2 +- t/t0410-partial-clone.sh | 45 ++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 45 insertions(+), 2 deletions(-) diff --git a/promisor-remote.c b/promisor-remote.c index 96fa215b06a924..225260b05f8d65 100644 --- a/promisor-remote.c +++ b/promisor-remote.c @@ -46,7 +46,7 @@ static int fetch_objects(struct repository *repo, "fetch", remote_name, "--no-tags", "--no-write-fetch-head", "--recurse-submodules=no", "--filter=blob:none", "--stdin", NULL); - if (!repo_config_get_bool(the_repository, "promisor.quiet", &quiet) && quiet) + if (!repo_config_get_bool(repo, "promisor.quiet", &quiet) && quiet) strvec_push(&child.args, "--quiet"); if (start_command(&child)) die(_("promisor-remote: unable to fork off fetch subprocess")); diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh index 52e19728a3fca0..dff442da2090b5 100755 --- a/t/t0410-partial-clone.sh +++ b/t/t0410-partial-clone.sh @@ -717,7 +717,29 @@ test_expect_success 'setup for promisor.quiet tests' ' git -C server rm foo.t && git -C server commit -m remove && git -C server config uploadpack.allowanysha1inwant 1 && - git -C server config uploadpack.allowfilter 1 + git -C server config uploadpack.allowfilter 1 && + + # Setup for submodule repo test: superproject whose submodule is a + # partial clone, so that promisor.quiet is read via a non-main repo.
+ rm -rf sub-pc-src sub-pc-srv.bare super-src super-work && + git init sub-pc-src && + test_commit -C sub-pc-src initial file.txt "hello" && + + git clone --bare sub-pc-src sub-pc-srv.bare && + git -C sub-pc-srv.bare config uploadpack.allowfilter 1 && + git -C sub-pc-srv.bare config uploadpack.allowanysha1inwant 1 && + + git init super-src && + git -C super-src -c protocol.file.allow=always \ + submodule add "file://$(pwd)/sub-pc-srv.bare" sub && + git -C super-src commit -m "add submodule" && + + git -c protocol.file.allow=always clone super-src super-work && + git -C super-work -c protocol.file.allow=always \ + submodule update --init --filter=blob:none sub && + + # Allow file:// in the submodule so that lazy-fetch subprocesses work. + git -C super-work/sub config protocol.file.allow always ' test_expect_success TTY 'promisor.quiet=false shows progress messages' ' @@ -752,6 +774,27 @@ test_expect_success TTY 'promisor.quiet=unconfigured shows progress messages' ' grep "Receiving objects" err ' +test_expect_success 'promisor.quiet from submodule repo is honored' ' + rm -f pc-quiet-trace && + + # Set promisor.quiet only in the submodule, not the superproject. + git -C super-work/sub config promisor.quiet true && + + # Push a new commit+blob to the server; the blob stays missing in the + # partial-clone submodule until a lazy fetch is triggered. + test_commit -C sub-pc-src updated new-file.txt "world" && + git -C sub-pc-src push "$(pwd)/sub-pc-srv.bare" HEAD:master && + git -C super-work/sub -c protocol.file.allow=always fetch origin && + git -C super-work/sub reset --mixed origin/master && + + # grep descends into the submodule and triggers a lazy fetch for the + # missing blob; verify the fetch subprocess carries --quiet. + GIT_TRACE2_EVENT="$(pwd)/pc-quiet-trace" \ + git -C super-work grep --cached --recurse-submodules "world" \ + 2>/dev/null && + grep negotiationAlgorithm pc-quiet-trace | grep -e --quiet +' + . 
"$TEST_DIRECTORY"/lib-httpd.sh start_httpd From 6f58b42d052e7fb49e7c1ff16875fbfd5b6cb461 Mon Sep 17 00:00:00 2001 From: Kristoffer Haugsbakk Date: Mon, 13 Apr 2026 12:21:00 +0200 Subject: [PATCH 003/107] doc: interpret-trailers: stop fixating on RFC 822 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This command handles the trailers metadata format. But the command isn’t introduced as such; it is instead introduced by stating that these trailer lines look similar to RFC 822 email headers. This is overwrought; most people do not deal directly with email headers, and certainly not email RFCs. Trailers are just key–value pairs that, like email headers, use colon as the separator. The format in its simplest form is easy to describe directly without comparing it to anything else; we will do that in the upcoming commit “explain the format after the intro”. For now, let’s: • remove the first mention of email headers; • keep the second, innocuous comparison with email line folding in the middle; and • remove the now-unneeded disclaimer that trailers do not share many of the features of RFC 822 email headers—there is no invitation to speculate that trailers would follow any other email format rules since we do not compare them directly any more. *** Talking about trailers as an RFC 822/2822-like format seems to go back to the `--fixes`/`Fixes:` trailer topic,[1] the thread that precipitated this command and in turn the first trailer support in git(1) beyond adding s-o-b lines. 
† 1: https://lore.kernel.org/all/20131027071407.GA11683@leaf/ Signed-off-by: Kristoffer Haugsbakk Signed-off-by: Junio C Hamano --- Documentation/git-interpret-trailers.adoc | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/Documentation/git-interpret-trailers.adoc b/Documentation/git-interpret-trailers.adoc index 77b4f63b05cf5b..1878848ad2acb9 100644 --- a/Documentation/git-interpret-trailers.adoc +++ b/Documentation/git-interpret-trailers.adoc @@ -14,9 +14,9 @@ git interpret-trailers [--in-place] [--trim-empty] DESCRIPTION ----------- -Add or parse _trailer_ lines that look similar to RFC 822 e-mail -headers, at the end of the otherwise free-form part of a commit -message. For example, in the following commit message +Add or parse _trailer_ lines at the end of the otherwise +free-form part of a commit message. For example, in the following commit +message ------------------------------------------------ subject @@ -107,9 +107,6 @@ key: This is a very long value, with spaces and newlines in it. ------------------------------------------------ -Note that trailers do not follow (nor are they intended to follow) many of the -rules for RFC 822 headers. For example they do not follow the encoding rule. - OPTIONS ------- `--in-place`:: From abb04b0f0daa1df465ec7c71cc42265a8fa0cdf2 Mon Sep 17 00:00:00 2001 From: Kristoffer Haugsbakk Date: Mon, 13 Apr 2026 12:21:01 +0200 Subject: [PATCH 004/107] =?UTF-8?q?doc:=20interpret-trailers:=20replace=20?= =?UTF-8?q?=E2=80=9Clines=E2=80=9D=20with=20=E2=80=9Cmetadata=E2=80=9D?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit We removed the initial comparison to email headers in the previous commit. Now the introduction paragraph just says “trailer lines”, and the only hint that this is metadata/structured information is the “otherwise free-form” phrase. Let’s replace “lines” with “metadata” since that is their purpose. 
This also makes the introduction more consistent with how I chose to define trailers in the glossary:[1] “Key-value metadata”. (We will introduce “key–value” in the upcoming commit “explain the format after the intro”.) † 1: 68e3c69e (Documentation/glossary: describe "trailer", 2024-11-17) Signed-off-by: Kristoffer Haugsbakk Signed-off-by: Junio C Hamano --- Documentation/git-interpret-trailers.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/git-interpret-trailers.adoc b/Documentation/git-interpret-trailers.adoc index 1878848ad2acb9..3f60fd9b720dda 100644 --- a/Documentation/git-interpret-trailers.adoc +++ b/Documentation/git-interpret-trailers.adoc @@ -14,7 +14,7 @@ git interpret-trailers [--in-place] [--trim-empty] DESCRIPTION ----------- -Add or parse _trailer_ lines at the end of the otherwise +Add or parse trailers metadata at the end of the otherwise free-form part of a commit message. For example, in the following commit message From a35523a8398a3dcb65f258b42d323a20fb461361 Mon Sep 17 00:00:00 2001 From: Kristoffer Haugsbakk Date: Mon, 13 Apr 2026 12:21:02 +0200 Subject: [PATCH 005/107] =?UTF-8?q?doc:=20interpret-trailers:=20use=20?= =?UTF-8?q?=E2=80=9Cmetadata=E2=80=9D=20in=20Name=20as=20well?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit We now since the previous commit introduce the format as “trailers metadata”. We can replace “structured information” with “metadata” in the “Name” section to be consistent. While “structured information” does emphasize that the data is not loosely structured, we also say that this command adds to or parses this format. I don’t think that we need to emphasize that it is structured since clearly there is some structure there. Both “metadata” and “structured information” can convey the same information. But “metadata” is shorter and easier to deploy since it’s just one word. 
Signed-off-by: Kristoffer Haugsbakk Signed-off-by: Junio C Hamano --- Documentation/git-interpret-trailers.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/git-interpret-trailers.adoc b/Documentation/git-interpret-trailers.adoc index 3f60fd9b720dda..4e92c8299bb21b 100644 --- a/Documentation/git-interpret-trailers.adoc +++ b/Documentation/git-interpret-trailers.adoc @@ -3,7 +3,7 @@ git-interpret-trailers(1) NAME ---- -git-interpret-trailers - Add or parse structured information in commit messages +git-interpret-trailers - Add or parse metadata in commit messages SYNOPSIS -------- From 9fb47447e82b6c1b2a1b71b033283ba62f5f6151 Mon Sep 17 00:00:00 2001 From: Kristoffer Haugsbakk Date: Mon, 13 Apr 2026 12:21:03 +0200 Subject: [PATCH 006/107] doc: interpret-trailers: not just for commit messages MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This command doesn’t interface with commits directly. You can interpret or modify any kind of text, even though commit messages are the most relevant. The git(1) suite also isn’t restricted to only direct commit support since git-tag(1) learned `--trailer` in 066cef77 (builtin/tag: add --trailer option, 2024-05-05). Now, we already introduce the command in the “Name” section as dealing with commit messages as well. That is fine since that intro line needs to remain pretty short.
Signed-off-by: Kristoffer Haugsbakk Signed-off-by: Junio C Hamano --- Documentation/git-interpret-trailers.adoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/git-interpret-trailers.adoc b/Documentation/git-interpret-trailers.adoc index 4e92c8299bb21b..7329e710e1a6eb 100644 --- a/Documentation/git-interpret-trailers.adoc +++ b/Documentation/git-interpret-trailers.adoc @@ -15,8 +15,8 @@ git interpret-trailers [--in-place] [--trim-empty] DESCRIPTION ----------- Add or parse trailers metadata at the end of the otherwise -free-form part of a commit message. For example, in the following commit -message +free-form part of a commit message, or any other kind of text. +For example, in the following commit message ------------------------------------------------ subject From d1673e5aa0bae10d08e424f9919c4c7fe4433dd2 Mon Sep 17 00:00:00 2001 From: Kristoffer Haugsbakk Date: Mon, 13 Apr 2026 12:21:04 +0200 Subject: [PATCH 007/107] doc: interpret-trailers: explain the format after the intro MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit You need to read the entire “Description” section in order to understand the full trailer format. But there are many nuances, so that’s fine. As a starter though we have an introductory example.[1] That turns out to be crucial; the rest of this section talks about the mechanics of the command and only incidentally the format itself. Now, although the example might arguably be self-explanatory, we can add a little preamble which defines the format in its simplest form as well as define the most important terms. Note that we name the “blank line” rule since I want to use that term every time it comes up. It gets very mildly obfuscated if you call it a “blank line” in one place[2] and “empty (or whitespace-only) ...” in another one.[3] We will define the format of the *key* in the next commit. 
† 1: from d57fa7fc (doc: trailer: add more examples in DESCRIPTION, 2023-06-15) † 2: `Documentation/git-interpret-trailers.adoc:86` in 5361983c (The 22nd batch, 2026-03-27) † 3: `Documentation/git-interpret-trailers.adoc:93` in 5361983c (The 22nd batch, 2026-03-27) Suggested-by: D. Ben Knoble Signed-off-by: Kristoffer Haugsbakk Signed-off-by: Junio C Hamano --- Documentation/git-interpret-trailers.adoc | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/Documentation/git-interpret-trailers.adoc b/Documentation/git-interpret-trailers.adoc index 7329e710e1a6eb..bcd79b19bd7752 100644 --- a/Documentation/git-interpret-trailers.adoc +++ b/Documentation/git-interpret-trailers.adoc @@ -16,7 +16,12 @@ DESCRIPTION ----------- Add or parse trailers metadata at the end of the otherwise free-form part of a commit message, or any other kind of text. -For example, in the following commit message + +A _trailer_ in its simplest form is a key-value pair with a colon as a +separator. A _trailer block_ consists of one or more trailers. The +trailer block needs to be preceded by a blank line, where a _blank line_ +is either an empty or a whitespace-only line. For example, in the +following commit message ------------------------------------------------ subject From 975c9a44e305e456a72c48905a805cace521a705 Mon Sep 17 00:00:00 2001 From: Kristoffer Haugsbakk Date: Mon, 13 Apr 2026 12:21:05 +0200 Subject: [PATCH 008/107] doc: interpret-trailers: explain key format MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit A trailer key must consist of ASCII alphanumeric characters and hyphens *only*. Let’s document it explicitly instead of relying on readers being conservative and only basing their trailer keys on the documentation examples.[1] The previous commit provided us with an appropriate paragraph to describe the key format. 
† 1: Technically they would then miss out on using digits in them since all of the example keys just use letters and hyphens Reported-by: Brendan Jackman Signed-off-by: Kristoffer Haugsbakk Signed-off-by: Junio C Hamano --- Documentation/git-interpret-trailers.adoc | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/Documentation/git-interpret-trailers.adoc b/Documentation/git-interpret-trailers.adoc index bcd79b19bd7752..c35fa9c688d28f 100644 --- a/Documentation/git-interpret-trailers.adoc +++ b/Documentation/git-interpret-trailers.adoc @@ -18,7 +18,8 @@ Add or parse trailers metadata at the end of the otherwise free-form part of a commit message, or any other kind of text. A _trailer_ in its simplest form is a key-value pair with a colon as a -separator. A _trailer block_ consists of one or more trailers. The +separator. The _key_ consists of ASCII alphanumeric characters and +hyphens (`-`). A _trailer block_ consists of one or more trailers. The trailer block needs to be preceded by a blank line, where a _blank line_ is either an empty or a whitespace-only line. For example, in the following commit message From 0e701f8039aff602177db5e7ca525944506253da Mon Sep 17 00:00:00 2001 From: Kristoffer Haugsbakk Date: Mon, 13 Apr 2026 12:21:06 +0200 Subject: [PATCH 009/107] doc: interpret-trailers: add key format example All of the examples speak of the Happy Path where everything works as intended. But failure examples can also be instructive. Especially for explaining again, by example, the key format (see previous commit). This also allows us to demonstrate trailer block detection with a concrete example. 
Signed-off-by: Kristoffer Haugsbakk Signed-off-by: Junio C Hamano --- Documentation/git-interpret-trailers.adoc | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/Documentation/git-interpret-trailers.adoc b/Documentation/git-interpret-trailers.adoc index c35fa9c688d28f..f215cba4bf0dea 100644 --- a/Documentation/git-interpret-trailers.adoc +++ b/Documentation/git-interpret-trailers.adoc @@ -405,6 +405,29 @@ mv "\$1.new" "\$1" $ chmod +x .git/hooks/commit-msg ------------ +* Here we try to use three different trailer keys. But it fails + because two of them are not recognized as trailer keys. ++ +---- +$ cat msg.txt +subject + +Skapad-på: some-branch +Hash-in-v6.11: 45c12d3269fe48f22834320c782ffe86c3560f2c +Reviewed-by: Alice +$ git interpret-trailers --only-trailers Date: Mon, 13 Apr 2026 12:21:07 +0200 Subject: [PATCH 010/107] =?UTF-8?q?doc:=20interpret-trailers:=20commit=20t?= =?UTF-8?q?o=20=E2=80=9Ctrailer=20block=E2=80=9D=20term?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit We chose to introduce the term “trailer block” into the documentation a few commits ago.[1] It is used in the code though, so it is not a newly invented term. That term was useful to explain where the trailers are found (they *trail* the message). But it is also useful here, where we explain how trailers are added to existing messages, how trailer blocks are found (beyond the simple case in the introduction), and how the end of the message is found.
† 1: in commit “explain the format after the intro” Signed-off-by: Kristoffer Haugsbakk Signed-off-by: Junio C Hamano --- Documentation/git-interpret-trailers.adoc | 26 ++++++++++++----------- 1 file changed, 14 insertions(+), 12 deletions(-) diff --git a/Documentation/git-interpret-trailers.adoc b/Documentation/git-interpret-trailers.adoc index f215cba4bf0dea..b693e89fd96336 100644 --- a/Documentation/git-interpret-trailers.adoc +++ b/Documentation/git-interpret-trailers.adoc @@ -87,19 +87,21 @@ trailer.sign.key "Signed-off-by: " in your configuration, you only need to specify `--trailer="sign: foo"` on the command line instead of `--trailer="Signed-off-by: foo"`. -By default the new trailer will appear at the end of all the existing -trailers. If there is no existing trailer, the new trailer will appear -at the end of the input. A blank line will be added before the new -trailer if there isn't one already. - -Existing trailers are extracted from the input by looking for -a group of one or more lines that (i) is all trailers, or (ii) contains at -least one Git-generated or user-configured trailer and consists of at +By default the new trailer will appear at the end of the trailer block. +A trailer block will be created with only that trailer if a trailer +block does not already exist. Recall that a trailer block needs to be +preceded by a blank line, so a blank line (specifically an empty line) +will be inserted before the new trailer block in that case. + +Existing trailers are extracted from the input by looking for the +trailer block. Concretely, that is a group of one or more lines that (i) +is all trailers, or (ii) contains at least one Git-generated or +user-configured trailer and consists of at least 25% trailers. -The group must be preceded by one or more empty (or whitespace-only) lines. -The group must either be at the end of the input or be the last -non-whitespace lines before a line that starts with `---` (followed by a -space or the end of the line). 
+The trailer block is by definition at the end of the message. The end +of the message in turn is either (i) at the end of the input, or (ii) +the last non-whitespace lines before a line that starts with `---` +(followed by a space or the end of the line). When reading trailers, there can be no whitespace before or inside the __, but any number of regular space and tab characters are allowed From 4e06417fd8446f1ea7b79dc64221be57f645432e Mon Sep 17 00:00:00 2001 From: Kristoffer Haugsbakk Date: Mon, 13 Apr 2026 12:21:08 +0200 Subject: [PATCH 011/107] doc: interpret-trailers: document comment line treatment MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Comment lines have always been ignored but this is not documented. This is mostly for completeness since this is unlikely to catch anyone by surprise. But we really ought to be reasonably complete here since it’s the only documentation page that documents trailers. Signed-off-by: Kristoffer Haugsbakk Signed-off-by: Junio C Hamano --- Documentation/git-interpret-trailers.adoc | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/Documentation/git-interpret-trailers.adoc b/Documentation/git-interpret-trailers.adoc index b693e89fd96336..b42f957d66638d 100644 --- a/Documentation/git-interpret-trailers.adoc +++ b/Documentation/git-interpret-trailers.adoc @@ -103,6 +103,10 @@ of the message in turn is either (i) at the end of the input, or (ii) the last non-whitespace lines before a line that starts with `---` (followed by a space or the end of the line). +This command ignores comment lines (see `core.commentString` in +linkgit:git-config[1]). This is for use with the `prepare-commit-msg` +and `commit-msg` hooks. + When reading trailers, there can be no whitespace before or inside the __, but any number of regular space and tab characters are allowed between the __ and the separator. There can be whitespaces before,
There can be whitespaces before, From f7a69261db0f268de967919fa1b7a226571069a9 Mon Sep 17 00:00:00 2001 From: Elijah Newren Date: Tue, 21 Apr 2026 00:26:07 +0000 Subject: [PATCH 012/107] merge-ort: propagate callback errors from traverse_trees_wrapper() traverse_trees_wrapper() saves entries from a first pass through traverse_trees() and then replays them through the real callback (collect_merge_info_callback). However, the replay loop silently discards the callback return value. This means any error reported by the callback during replay -- including a future check for malformed trees -- would be ignored, allowing the merge to proceed with corrupt state. Capture the return value, stop the loop on negative (error) returns, and propagate the error to the caller. Note that the callback returns a positive mask value on success, so we normalize non-negative returns to 0 for the caller. Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano --- merge-ort.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/merge-ort.c b/merge-ort.c index 00923ce3cd749b..4b8e32209d9b3a 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -1008,18 +1008,20 @@ static int traverse_trees_wrapper(struct index_state *istate, info->traverse_path = renames->callback_data_traverse_path; info->fn = old_fn; for (i = old_offset; i < renames->callback_data_nr; ++i) { - info->fn(n, - renames->callback_data[i].mask, - renames->callback_data[i].dirmask, - renames->callback_data[i].names, - info); + ret = info->fn(n, + renames->callback_data[i].mask, + renames->callback_data[i].dirmask, + renames->callback_data[i].names, + info); + if (ret < 0) + break; } renames->callback_data_nr = old_offset; free(renames->callback_data_traverse_path); renames->callback_data_traverse_path = old_callback_data_traverse_path; info->traverse_path = NULL; - return 0; + return ret < 0 ? 
ret : 0; } static void setup_path_info(struct merge_options *opt, From 399bf79b7b76b1b408bfe68dd2dd3432c6497a67 Mon Sep 17 00:00:00 2001 From: Elijah Newren Date: Tue, 21 Apr 2026 00:26:08 +0000 Subject: [PATCH 013/107] merge-ort: drop unnecessary show_all_errors from collect_merge_info() collect_merge_info() has set info.show_all_errors = 1 since d2bc1994f363 (merge-ort: implement a very basic collect_merge_info(), 2020-12-13). This setting was copied from unpack-trees.c where it controls batching of error messages for porcelain display, but merge-ort has no such error-batching logic and never needed it. With show_all_errors set, traverse_trees() captures a negative callback return but continues processing remaining entries rather than stopping immediately. Removing the setting restores the default behavior where a negative return from collect_merge_info_callback() breaks out of the traversal loop right away, allowing a future commit to exit early when a corrupt tree is detected. Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano --- merge-ort.c | 1 - 1 file changed, 1 deletion(-) diff --git a/merge-ort.c b/merge-ort.c index 4b8e32209d9b3a..74e9636020fe40 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -1740,7 +1740,6 @@ static int collect_merge_info(struct merge_options *opt, setup_traverse_info(&info, opt->priv->toplevel_dir); info.fn = collect_merge_info_callback; info.data = opt; - info.show_all_errors = 1; if (repo_parse_tree(opt->repo, merge_base) < 0 || repo_parse_tree(opt->repo, side1) < 0 || From 426fc4f650930846728534e5e710f384708f505f Mon Sep 17 00:00:00 2001 From: Elijah Newren Date: Tue, 21 Apr 2026 00:26:09 +0000 Subject: [PATCH 014/107] merge-ort: free diff pairs queue in clear_or_reinit_internal_opts() clear_or_reinit_internal_opts() is responsible for cleaning up the various data structures in merge_options_internal. It already handles many renames-related structures (dirs_removed, dir_renames, relevant_sources, cached_pairs, deferred, etc.) 
but does not free renames->pairs[].queue. In the normal code path, resolve_and_process_renames() frees pairs[s].queue and reinitializes it with diff_queue_init() before clear_or_reinit_internal_opts() runs, so the omission is harmless. However, if collect_merge_info() encounters an error and returns early (before resolve_and_process_renames() is ever called), any diff pairs already queued by collect_rename_info()/add_pair() will have their backing array leaked. Fix this by freeing renames->pairs[].queue in the cleanup function. In the normal path the pointer is already NULL (from the earlier diff_queue_init() in resolve_and_process_renames()), so free(NULL) is a safe no-op. Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano --- merge-ort.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/merge-ort.c b/merge-ort.c index 74e9636020fe40..8f911cb63979eb 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -728,6 +728,8 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti, strintmap_clear_func(&renames->deferred[i].possible_trivial_merges); strset_clear_func(&renames->deferred[i].target_dirs); renames->deferred[i].trivial_merges_okay = 1; /* 1 == maybe */ + free(renames->pairs[i].queue); + diff_queue_init(&renames->pairs[i]); } renames->cached_pairs_valid_side = 0; renames->dir_rename_mask = 0; From 61388e3ea34663bf0d403f6510cac509cbd88811 Mon Sep 17 00:00:00 2001 From: Elijah Newren Date: Tue, 21 Apr 2026 00:26:10 +0000 Subject: [PATCH 015/107] merge-ort: abort merge when trees have duplicate entries Trees with duplicate entries are malformed; fsck reports "contains duplicate file entries" for them. merge-ort has from the beginning assumed that we would never hit such trees. It was written with the assumption that traverse_trees() calls collect_merge_info_callback() at most once per path. 
The "sanity checks" in that callback (added in d2bc1994f363 (merge-ort: implement a very basic collect_merge_info(), 2020-12-13)) verify properties of each individual call but not that invariant. The strmap_put() in setup_path_info() silently overwrites the entry from any prior call for the same path, because it assumed there would be no other path. Unfortunately, supplemental data structures for various optimizations could still be tweaked before the extra paths were overwritten, and those data structures not matching expected state could trip various assertions. Change the return type of setup_path_info() from void to int to allow us to detect this case, and abort the merge with a clear error message when it occurs. Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano --- merge-ort.c | 61 ++++++++++++++++------------ t/t6422-merge-rename-corner-cases.sh | 54 ++++++++++++++++++++++++ 2 files changed, 88 insertions(+), 27 deletions(-) diff --git a/merge-ort.c b/merge-ort.c index 8f911cb63979eb..be0829bbb781ef 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -1026,18 +1026,18 @@ static int traverse_trees_wrapper(struct index_state *istate, return ret < 0 ? 
ret : 0; } -static void setup_path_info(struct merge_options *opt, - struct string_list_item *result, - const char *current_dir_name, - int current_dir_name_len, - char *fullpath, /* we'll take over ownership */ - struct name_entry *names, - struct name_entry *merged_version, - unsigned is_null, /* boolean */ - unsigned df_conflict, /* boolean */ - unsigned filemask, - unsigned dirmask, - int resolved /* boolean */) +static int setup_path_info(struct merge_options *opt, + struct string_list_item *result, + const char *current_dir_name, + int current_dir_name_len, + char *fullpath, /* we'll take over ownership */ + struct name_entry *names, + struct name_entry *merged_version, + unsigned is_null, /* boolean */ + unsigned df_conflict, /* boolean */ + unsigned filemask, + unsigned dirmask, + int resolved /* boolean */) { /* result->util is void*, so mi is a convenience typed variable */ struct merged_info *mi; @@ -1081,9 +1081,11 @@ static void setup_path_info(struct merge_options *opt, */ mi->is_null = 1; } - strmap_put(&opt->priv->paths, fullpath, mi); + if (strmap_put(&opt->priv->paths, fullpath, mi)) + return error(_("tree has duplicate entries for '%s'"), fullpath); result->string = fullpath; result->util = mi; + return 0; } static void add_pair(struct merge_options *opt, @@ -1350,9 +1352,10 @@ static int collect_merge_info_callback(int n, */ if (side1_matches_mbase && side2_matches_mbase) { /* mbase, side1, & side2 all match; use mbase as resolution */ - setup_path_info(opt, &pi, dirname, info->pathlen, fullpath, - names, names+0, mbase_null, 0 /* df_conflict */, - filemask, dirmask, 1 /* resolved */); + if (setup_path_info(opt, &pi, dirname, info->pathlen, fullpath, + names, names+0, mbase_null, 0 /* df_conflict */, + filemask, dirmask, 1 /* resolved */)) + return -1; /* Quit traversing */ return mask; } @@ -1364,9 +1367,10 @@ static int collect_merge_info_callback(int n, */ if (sides_match && filemask == 0x07) { /* use side1 (== side2) version as resolution */ 
- setup_path_info(opt, &pi, dirname, info->pathlen, fullpath, - names, names+1, side1_null, 0, - filemask, dirmask, 1); + if (setup_path_info(opt, &pi, dirname, info->pathlen, fullpath, + names, names+1, side1_null, 0, + filemask, dirmask, 1)) + return -1; /* Quit traversing */ return mask; } @@ -1378,18 +1382,20 @@ static int collect_merge_info_callback(int n, */ if (side1_matches_mbase && filemask == 0x07) { /* use side2 version as resolution */ - setup_path_info(opt, &pi, dirname, info->pathlen, fullpath, - names, names+2, side2_null, 0, - filemask, dirmask, 1); + if (setup_path_info(opt, &pi, dirname, info->pathlen, fullpath, + names, names+2, side2_null, 0, + filemask, dirmask, 1)) + return -1; /* Quit traversing */ return mask; } /* Similar to above but swapping sides 1 and 2 */ if (side2_matches_mbase && filemask == 0x07) { /* use side1 version as resolution */ - setup_path_info(opt, &pi, dirname, info->pathlen, fullpath, - names, names+1, side1_null, 0, - filemask, dirmask, 1); + if (setup_path_info(opt, &pi, dirname, info->pathlen, fullpath, + names, names+1, side1_null, 0, + filemask, dirmask, 1)) + return -1; /* Quit traversing */ return mask; } @@ -1413,8 +1419,9 @@ static int collect_merge_info_callback(int n, * unconflict some more cases, but that comes later so all we can * do now is record the different non-null file hashes.) 
*/ - setup_path_info(opt, &pi, dirname, info->pathlen, fullpath, - names, NULL, 0, df_conflict, filemask, dirmask, 0); + if (setup_path_info(opt, &pi, dirname, info->pathlen, fullpath, + names, NULL, 0, df_conflict, filemask, dirmask, 0)) + return -1; /* Quit traversing */ ci = pi.util; VERIFY_CI(ci); diff --git a/t/t6422-merge-rename-corner-cases.sh b/t/t6422-merge-rename-corner-cases.sh index e18d5a227d54f7..81b645bb3bdc5b 100755 --- a/t/t6422-merge-rename-corner-cases.sh +++ b/t/t6422-merge-rename-corner-cases.sh @@ -1525,4 +1525,58 @@ test_expect_success 'submodule/directory preliminary conflict' ' ) ' +# Testcase: submodule/directory conflict with duplicate tree entries +# One side has a path as a gitlink (submodule). The other side replaces +# the gitlink with a directory. A third-party tool creates a tree on the +# submodule side that has *both* a gitlink and a tree entry for the same +# path (adding a file inside the submodule path ignoring that there's a +# gitlink there). collect_merge_info_callback() should detect the +# duplicate and abort rather than silently corrupting its bookkeeping. + +test_expect_success 'duplicate tree entries trigger an error' ' + test_when_finished "rm -rf duplicate-entry" && + git init duplicate-entry && + ( + cd duplicate-entry && + + # Base commit: "docs" is a gitlink (submodule) + empty_tree=$(git mktree file.txt && + git add file.txt && + git commit -m base && + + # side1: remove the gitlink, replace with a directory + git checkout -b side1 && + git rm --cached docs && + mkdir -p docs && + echo hello >docs/requirements.txt && + git add docs/requirements.txt && + git commit -m "side1: submodule to directory" && + + # side2: keep the gitlink but craft a tree that also + # contains a tree entry for "docs" (simulating a tool + # that adds files inside a submodule path without + # removing the gitlink first). 
+ git checkout main && + git checkout -b side2 && + blob_oid=$(echo world | git hash-object -w --stdin) && + docs_tree=$(printf "100644 blob %s\trequirements.txt\n" \ + "$blob_oid" | git mktree) && + cur_tree=$(git rev-parse HEAD^{tree}) && + git cat-file -p $cur_tree >tree-listing && + printf "040000 tree %s\tdocs\n" "$docs_tree" >>tree-listing && + new_tree=$(git mktree err && + test_grep "duplicate entries" err + ) +' + test_done From 60826fdeb137a61e6ae8b80d70509d2bc094f8a5 Mon Sep 17 00:00:00 2001 From: Elijah Newren Date: Tue, 21 Apr 2026 00:26:11 +0000 Subject: [PATCH 016/107] cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts verify_cache() checks that the index does not contain both "path" and "path/file" before writing a tree. It does this by comparing only adjacent entries, relying on the assumption that "path/file" would immediately follow "path" in sorted order. Unfortunately, this assumption does not always hold. For example: docs <-- submodule entry docs-internal/README.md <-- intervening entry docs/requirements.txt <-- D/F conflict, NOT adjacent to "docs" When this happens, verify_cache() silently misses the D/F conflict and write-tree produces a corrupt tree object containing duplicate entries (one for the submodule "docs" and one for the tree "docs"). I could not find any caller in current git that both allows the index to get into this state and then tries to write it out without doing other checks beyond the verify_cache() call in cache_tree_update(), but verify_cache() is documented as a safety net for preventing corrupt trees and should actually provide that guarantee. A downstream consumer that relied solely on cache_tree_update()'s internal checking via verify_cache() to prevent duplicate tree entries was bitten by the gap. Add a test that constructs a corrupt index directly (bypassing the D/F checks in add_index_entry) and verifies that write-tree now rejects it. 
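To make the sort-order gap concrete, here is a minimal, self-contained C sketch. It is not the cache-tree.c code: it models the index as a plain array of path names sorted with strcmp() (ignoring stages and the real cache_entry layout), and the helpers lower_bound(), adjacent_df_conflict() and df_conflict() are hypothetical names. It contrasts an adjacent-only check with a binary search for the "path/" prefix, which is the idea the fix relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Index of the first name that sorts >= key (strcmp order). */
static size_t lower_bound(const char *const *names, size_t nr, const char *key)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (strcmp(names[mid], key) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Adjacent-only check: does names[i + 1] start with "names[i]/"? */
static const char *adjacent_df_conflict(const char *const *names,
					size_t nr, size_t i)
{
	size_t len;

	assert(i < nr);
	if (i + 1 >= nr)
		return NULL;
	len = strlen(names[i]);
	if (!strncmp(names[i + 1], names[i], len) && names[i + 1][len] == '/')
		return names[i + 1];
	return NULL;
}

/* Gap-aware check: binary-search for "names[i]/" wherever it sorts. */
static const char *df_conflict(const char *const *names, size_t nr, size_t i)
{
	char probe[256]; /* simplification: real code uses a strbuf */
	size_t len, pos;

	assert(i < nr);
	len = strlen(names[i]);
	if (len + 2 > sizeof(probe))
		return NULL; /* sketch does not handle very long paths */
	memcpy(probe, names[i], len);
	probe[len] = '/';
	probe[len + 1] = '\0';

	pos = lower_bound(names, nr, probe);
	if (pos < nr && !strncmp(names[pos], probe, len + 1))
		return names[pos];
	return NULL;
}
```

With the entries docs, docs-internal/README.md and docs/requirements.txt, the adjacent-only check on "docs" finds nothing because '-' (0x2D) sorts before '/' (0x2F), while the binary search lands on "docs/requirements.txt".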
Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano --- cache-tree.c | 46 ++++++++++++++++++++++++-- t/meson.build | 1 + t/t0093-direct-index-write.pl | 38 ++++++++++++++++++++++ t/t0093-verify-cache-df-gap.sh | 59 ++++++++++++++++++++++++++++++++++ 4 files changed, 141 insertions(+), 3 deletions(-) create mode 100644 t/t0093-direct-index-write.pl create mode 100755 t/t0093-verify-cache-df-gap.sh diff --git a/cache-tree.c b/cache-tree.c index 7881b42aa24c80..f11844fe72020e 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -192,22 +192,62 @@ static int verify_cache(struct index_state *istate, int flags) for (i = 0; i + 1 < istate->cache_nr; i++) { /* path/file always comes after path because of the way * the cache is sorted. Also path can appear only once, - * which means conflicting one would immediately follow. + * so path/file is likely the immediately following path + * but might be separated if there is e.g. a + * path-internal/... file. */ const struct cache_entry *this_ce = istate->cache[i]; const struct cache_entry *next_ce = istate->cache[i + 1]; const char *this_name = this_ce->name; const char *next_name = next_ce->name; int this_len = ce_namelen(this_ce); + const char *conflict_name = NULL; + if (this_len < ce_namelen(next_ce) && - next_name[this_len] == '/' && + next_name[this_len] <= '/' && strncmp(this_name, next_name, this_len) == 0) { + if (next_name[this_len] == '/') { + conflict_name = next_name; + } else if (next_name[this_len] < '/') { + /* + * The immediately next entry shares our + * prefix but sorts before "path/" (e.g., + * "path-internal" between "path" and + * "path/file", since '-' (0x2D) < '/' + * (0x2F)). Binary search to find where + * "path/" would be and check for a D/F + * conflict there. 
+ */ + struct cache_entry *other; + struct strbuf probe = STRBUF_INIT; + int pos; + + strbuf_add(&probe, this_name, this_len); + strbuf_addch(&probe, '/'); + pos = index_name_pos_sparse(istate, + probe.buf, + probe.len); + strbuf_release(&probe); + + if (pos < 0) + pos = -pos - 1; + if (pos >= (int)istate->cache_nr) + continue; + other = istate->cache[pos]; + if (ce_namelen(other) > this_len && + other->name[this_len] == '/' && + !strncmp(this_name, other->name, this_len)) + conflict_name = other->name; + } + } + + if (conflict_name) { if (10 < ++funny) { fprintf(stderr, "...\n"); break; } fprintf(stderr, "You have both %s and %s\n", - this_name, next_name); + this_name, conflict_name); } } if (funny) diff --git a/t/meson.build b/t/meson.build index 7528e5cda5fef0..362177999bd342 100644 --- a/t/meson.build +++ b/t/meson.build @@ -124,6 +124,7 @@ integration_tests = [ 't0090-cache-tree.sh', 't0091-bugreport.sh', 't0092-diagnose.sh', + 't0093-verify-cache-df-gap.sh', 't0095-bloom.sh', 't0100-previous.sh', 't0101-at-syntax.sh', diff --git a/t/t0093-direct-index-write.pl b/t/t0093-direct-index-write.pl new file mode 100644 index 00000000000000..2881a3ebb21dcd --- /dev/null +++ b/t/t0093-direct-index-write.pl @@ -0,0 +1,38 @@ +#!/usr/bin/perl +# +# Build a v2 index file from entries listed on stdin. +# Each line: "octalmode hex-oid name" +# Output: binary index written to stdout. +# +# This bypasses all D/F safety checks in add_index_entry(), simulating +# what happens when code uses ADD_CACHE_JUST_APPEND to bulk-load entries. +use strict; +use warnings; +use Digest::SHA qw(sha1 sha256); + +my $hash_algo = $ENV{'GIT_DEFAULT_HASH'} || 'sha1'; +my $hash_func = $hash_algo eq 'sha256' ? \&sha256 : \&sha1; + +my @entries; +while (my $line = ) { + chomp $line; + my ($mode, $oid_hex, $name) = split(/ /, $line, 3); + push @entries, [$mode, $oid_hex, $name]; +} + +my $body = "DIRC" . 
pack("NN", 2, scalar @entries); + +for my $ent (@entries) { + my ($mode, $oid_hex, $name) = @{$ent}; + # 10 x 32-bit stat fields (zeroed), with mode in position 7 + my $stat = pack("N10", 0, 0, 0, 0, 0, 0, oct($mode), 0, 0, 0); + my $oid = pack("H*", $oid_hex); + my $flags = pack("n", length($name) & 0xFFF); + my $entry = $stat . $oid . $flags . $name . "\0"; + # Pad to 8-byte boundary + while (length($entry) % 8) { $entry .= "\0"; } + $body .= $entry; +} + +binmode STDOUT; +print $body . $hash_func->($body); diff --git a/t/t0093-verify-cache-df-gap.sh b/t/t0093-verify-cache-df-gap.sh new file mode 100755 index 00000000000000..0b6829d805269d --- /dev/null +++ b/t/t0093-verify-cache-df-gap.sh @@ -0,0 +1,59 @@ +#!/bin/sh + +test_description='verify_cache() must catch non-adjacent D/F conflicts + +Ensure that verify_cache() can complain about bad entries like: + + docs <-- submodule + docs-internal/... <-- sorts here because "-" < "/" + docs/... <-- D/F conflict with "docs" above, not adjacent + +In order to test verify_cache, we directly construct a corrupt index +(bypassing the D/F safety checks in add_index_entry) and verify that +write-tree rejects it. +' + +. ./test-lib.sh + +if ! test_have_prereq PERL +then + skip_all='skipping verify_cache D/F tests; Perl not available' + test_done +fi + +# Build a v2 index from entries on stdin, bypassing D/F checks. +# Each line: "octalmode hex-oid name" (entries must be pre-sorted). 
+build_corrupt_index () { + perl "$TEST_DIRECTORY/t0093-direct-index-write.pl" >"$1" +} + +test_expect_success 'setup objects' ' + test_commit base && + BLOB=$(git rev-parse HEAD:base.t) && + SUB_COMMIT=$(git rev-parse HEAD) +' + +test_expect_success 'adjacent D/F conflict is caught by verify_cache' ' + cat >index-entries <<-EOF && + 0160000 $SUB_COMMIT docs + 0100644 $BLOB docs/requirements.txt + EOF + build_corrupt_index .git/index err && + test_grep "You have both docs and docs/requirements.txt" err +' + +test_expect_success 'non-adjacent D/F conflict is caught by verify_cache' ' + cat >index-entries <<-EOF && + 0160000 $SUB_COMMIT docs + 0100644 $BLOB docs-internal/README.md + 0100644 $BLOB docs/requirements.txt + EOF + build_corrupt_index .git/index err && + test_grep "You have both docs and docs/requirements.txt" err +' + +test_done From eecc860d24564ae8e2c96615649e06e4d636f1aa Mon Sep 17 00:00:00 2001 From: Pablo Sabater Date: Mon, 27 Apr 2026 12:28:38 +0200 Subject: [PATCH 017/107] graph: add indentation for commits preceded by a parentless commit When a history has multiple root commits, or commits that act like roots because their parents are excluded (call these "parentless"), the graphing engine renders the commits near them one below the other, making them look related. There have been multiple earlier attempts to fix this issue: https://lore.kernel.org/git/xmqqwnwajbuj.fsf@gitster.c.googlers.com/ This happens because, in the row after a parentless commit, its column becomes empty, and the engine places the next commit in the first empty column scanning from left to right, filling the gap right below the parentless commit.
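The first-empty-column behavior described above can be illustrated with a toy model in C. This is a hypothetical simplification, not graph.c: each array slot stands for one graph column (character position 2 * i in the real output), and first_free_column() and column_after_parentless() are illustrative names. It shows why freeing a parentless commit's column immediately puts the next commit right below it, and how keeping a placeholder for one more row pushes that commit one column to the right.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model, not graph.c: cols[i] is nonzero while a line of history
 * still occupies column i (character position 2 * i in the output).
 */

/* A new commit goes into the first free column, scanning left to right. */
static int first_free_column(const int *cols, size_t ncols)
{
	size_t i;

	for (i = 0; i < ncols; i++)
		if (!cols[i])
			return (int)i;
	return -1; /* no free column in this toy model */
}

/*
 * A parentless commit was drawn in column `col`. Without a placeholder
 * its column is freed at once, so the next unrelated commit lands right
 * below it; with a placeholder the column stays occupied for one more row.
 */
static int column_after_parentless(int *cols, size_t ncols, size_t col,
				   int keep_placeholder)
{
	assert(col < ncols);
	cols[col] = keep_placeholder ? 1 : 0;
	return first_free_column(cols, ncols);
}
```

In the patch itself, the placeholder is dropped again on the following row, so the indented commit's line collapses back into the freed column.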
Keep a parentless commit's column filled for at least one more row, but render it as blank indentation, so that the next unrelated commit is drawn in the next column (columns are the even positions where edges live: 0, 2, 4). Then clear that "placeholder" column and let the unrelated commit naturally collapse into the column where the parentless commit was. Add is_placeholder to struct column to mark a column that acts as a placeholder for the padding. When a commit is parentless, add a column carrying the parentless commit's data, so that code accessing 'column->commit' does not segfault, and mark it as a placeholder. Teach the rendering functions to print a padding ' ' instead of an edge when they meet a placeholder column. Then, unless the next commit is also parentless (in which case the indentation has to keep cascading), clear the placeholder from the mapping and columns so that the graph collapses naturally. Add tests for different cases. Before this patch: * parentless A * child B * parentless B After this patch: * parentless A * child B / * parentless B Signed-off-by: Pablo Sabater Signed-off-by: Junio C Hamano --- graph.c | 115 ++++++++++++++++++++++++++++++-- t/t4215-log-skewed-merges.sh | 124 +++++++++++++++++++++++++++++++++++ 2 files changed, 233 insertions(+), 6 deletions(-) diff --git a/graph.c b/graph.c index 26f6fbf000aef5..97292df998f5e4 100644 --- a/graph.c +++ b/graph.c @@ -60,6 +60,12 @@ struct column { * index into column_colors. */ unsigned short color; + /* + * A placeholder column keeps the column of a parentless commit filled + * for one extra row, avoiding a next unrelated commit to be printed
+ */ + unsigned is_placeholder:1; }; enum graph_state { @@ -563,6 +569,7 @@ static void graph_insert_into_new_columns(struct git_graph *graph, i = graph->num_new_columns++; graph->new_columns[i].commit = commit; graph->new_columns[i].color = graph_find_commit_color(graph, commit); + graph->new_columns[i].is_placeholder = 0; } if (graph->num_parents > 1 && idx > -1 && graph->merge_layout == -1) { @@ -607,7 +614,7 @@ static void graph_update_columns(struct git_graph *graph) { struct commit_list *parent; int max_new_columns; - int i, seen_this, is_commit_in_columns; + int i, seen_this, is_commit_in_columns, is_parentless; /* * Swap graph->columns with graph->new_columns @@ -654,6 +661,26 @@ static void graph_update_columns(struct git_graph *graph) */ seen_this = 0; is_commit_in_columns = 1; + /* + * A commit is "parentless" (is a visual root that starts a new column) + * only if has no visible parents AND it's not a boundary commit. + * + * Boundary commits also have no visible parents, but they are + * NOT a visual root: + * + * 1. A boundary only appears in the output because an included commit + * is its child. Children are always above, and the renderer draws an + * edge down to the boundary from that child. Rather than starting + * a column like a visual root would do, it "inherits" its child + * column. + * + * 2. Included commit CAN'T appear below a boundary. Boundaries are + * ancestors of the exclusion point; if an included commit were an + * ancestor of the boundary it would be excluded and not rendered. + * Boundaries therefore always sink to the bottom. + */ + is_parentless = graph->num_parents == 0 && + !(graph->commit->object.flags & BOUNDARY); for (i = 0; i <= graph->num_columns; i++) { struct commit *col_commit; if (i == graph->num_columns) { @@ -688,11 +715,46 @@ static void graph_update_columns(struct git_graph *graph) * least 2, even if it has no interesting parents. * The current commit always takes up at least 2 * spaces. 
+ * + * Check for the commit to seem like a root, no parents + * rendered and that it is not a boundary commit. If so, + * add a placeholder to keep that column filled for + * at least one row. + * + * Prevents the next commit from being inserted + * just below and making the graph confusing. */ - if (graph->num_parents == 0) + if (is_parentless) { + graph_insert_into_new_columns(graph, graph->commit, i); + graph->new_columns[graph->num_new_columns - 1] + .is_placeholder = 1; + } else if (graph->num_parents == 0) { graph->width += 2; + } } else { - graph_insert_into_new_columns(graph, col_commit, -1); + if (graph->columns[i].is_placeholder) { + /* + * Keep the placeholders if the next commit is + * parentless also, making the indentation cascade. + */ + if (!seen_this && is_parentless) { + graph_insert_into_new_columns(graph, + graph->columns[i].commit, i); + graph->new_columns[graph->num_new_columns - 1] + .is_placeholder = 1; + } else if (!seen_this) { + graph->mapping[graph->width] = -1; + graph->width += 2; + } + /* + * seen_this && is_placeholder means that this + * line is the one after the indented one, the + * placeholder is no longer needed, gets + * dropped and the columns collapses naturally. 
+ */ + } else { + graph_insert_into_new_columns(graph, col_commit, -1); + } } } @@ -846,7 +908,10 @@ static void graph_output_padding_line(struct git_graph *graph, * Output a padding row, that leaves all branch lines unchanged */ for (i = 0; i < graph->num_new_columns; i++) { - graph_line_write_column(line, &graph->new_columns[i], '|'); + if (graph->new_columns[i].is_placeholder) + graph_line_write_column(line, &graph->new_columns[i], ' '); + else + graph_line_write_column(line, &graph->new_columns[i], '|'); graph_line_addch(line, ' '); } } @@ -1058,7 +1123,34 @@ static void graph_output_commit_line(struct git_graph *graph, struct graph_line graph->mapping[2 * i] < i) { graph_line_write_column(line, col, '/'); } else { - graph_line_write_column(line, col, '|'); + if (col->is_placeholder) { + /* + * When the indented commit is a merge commit, + * the placeholder column adds unwanted padding + * between the commit and its subject. + * + * * parentless commit + * * merge commit + * /| + * | * parent A + * * parent B + * ^^ unwanted padding + * + * Once the current commit has been seen, don't + * let placeholder columns to be rendered: + * + * * parentless commit + * * merge commit + * /| + * | * parent A + * * parent B + */ + if (seen_this) + continue; + graph_line_write_column(line, col, ' '); + } else { + graph_line_write_column(line, col, '|'); + } } graph_line_addch(line, ' '); } @@ -1135,7 +1227,18 @@ static void graph_output_post_merge_line(struct git_graph *graph, struct graph_l graph_line_write_column(line, col, '|'); graph_line_addch(line, ' '); } else { - graph_line_write_column(line, col, '|'); + if (col->is_placeholder) { + /* + * Same placeholder handling as in + * graph_output_commit_line(). 
+ */ + if (seen_this) + continue; + graph_line_write_column(line, col, ' '); + } else { + graph_line_write_column(line, col, '|'); + } + if (graph->merge_layout != 0 || i != graph->commit_index - 1) { if (parent_col) graph_line_write_column( diff --git a/t/t4215-log-skewed-merges.sh b/t/t4215-log-skewed-merges.sh index 28d0779a8c599e..0f6f95a6b5aff5 100755 --- a/t/t4215-log-skewed-merges.sh +++ b/t/t4215-log-skewed-merges.sh @@ -370,4 +370,128 @@ test_expect_success 'log --graph with multiple tips' ' EOF ' +test_expect_success 'log --graph with root commit' ' + git checkout --orphan 8_1 && test_commit 8_A && test_commit 8_A1 && + git checkout --orphan 8_2 && test_commit 8_B && + + check_graph 8_2 8_1 <<-\EOF + * 8_B + * 8_A1 + / + * 8_A + EOF +' + +test_expect_success 'log --graph with multiple root commits' ' + test_commit 8_B1 && + git checkout --orphan 8_3 && test_commit 8_C && + + check_graph 8_3 8_2 8_1 <<-\EOF + * 8_C + * 8_B1 + / + * 8_B + * 8_A1 + / + * 8_A + EOF +' + +test_expect_success 'log --graph commit from a two parent merge shifted' ' + git checkout --orphan 9_1 && test_commit 9_B && + git checkout --orphan 9_2 && test_commit 9_C && + git checkout 9_1 && + git merge 9_2 --allow-unrelated-histories -m 9_M && + git checkout --orphan 9_3 && + test_commit 9_A && test_commit 9_A1 && test_commit 9_A2 && + + check_graph 9_3 9_1 <<-\EOF + * 9_A2 + * 9_A1 + * 9_A + * 9_M + /| + | * 9_C + * 9_B + EOF +' + +test_expect_success 'log --graph commit from a three parent merge shifted' ' + git checkout --orphan 10_1 && test_commit 10_B && + git checkout --orphan 10_2 && test_commit 10_C && + git checkout --orphan 10_3 && test_commit 10_D && + git checkout 10_1 && + TREE=$(git write-tree) && + MERGE=$(git commit-tree $TREE -p 10_1 -p 10_2 -p 10_3 -m 10_M) && + git reset --hard $MERGE && + git checkout --orphan 10_4 && + test_commit 10_A && test_commit 10_A1 && test_commit 10_A2 && + + check_graph 10_4 10_1 <<-\EOF + * 10_A2 + * 10_A1 + * 10_A + * 10_M + /|\ + | | * 
10_D + | * 10_C + * 10_B + EOF +' + +test_expect_success 'log --graph commit from a four parent merge shifted' ' + git checkout --orphan 11_1 && test_commit 11_B && + git checkout --orphan 11_2 && test_commit 11_C && + git checkout --orphan 11_3 && test_commit 11_D && + git checkout --orphan 11_4 && test_commit 11_E && + git checkout 11_1 && + TREE=$(git write-tree) && + MERGE=$(git commit-tree $TREE -p 11_1 -p 11_2 -p 11_3 -p 11_4 -m 11_M) && + git reset --hard $MERGE && + git checkout --orphan 11_5 && + test_commit 11_A && test_commit 11_A1 && test_commit 11_A2 && + + check_graph 11_5 11_1 <<-\EOF + * 11_A2 + * 11_A1 + * 11_A + *-. 11_M + /|\ \ + | | | * 11_E + | | * 11_D + | * 11_C + * 11_B + EOF +' + +test_expect_success 'log --graph disconnected three roots cascading' ' + git checkout --orphan 12_1 && test_commit 12_D && test_commit 12_D1 && + git checkout --orphan 12_2 && test_commit 12_C && + git checkout --orphan 12_3 && test_commit 12_B && + git checkout --orphan 12_4 && test_commit 12_A && + + check_graph 12_4 12_3 12_2 12_1 <<-\EOF + * 12_A + * 12_B + * 12_C + * 12_D1 + _ / + / + / + * 12_D + EOF +' + +test_expect_success 'log --graph with excluded parent (not a root)' ' + git checkout --orphan 13_1 && test_commit 13_X && test_commit 13_Y && + git checkout --orphan 13_2 && test_commit 13_O && test_commit 13_A && + + check_graph 13_O..13_A 13_1 <<-\EOF + * 13_A + * 13_Y + / + * 13_X + EOF +' + test_done From 6ab1b3b74d02151e7570b82554e9cadebe0ea6b8 Mon Sep 17 00:00:00 2001 From: Usman Akinyemi Date: Sun, 3 May 2026 21:04:00 +0530 Subject: [PATCH 018/107] remote: fix sign-compare warnings in push_cas_option Replace `int` with `size_t` for `nr` and `alloc` in `struct push_cas_option` to avoid -Werror=sign-compare warnings when comparing against size-based values. 
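As a short illustration of what -Wsign-compare guards against, the C sketch below (function names are hypothetical) spells out the conversion that happens implicitly when an int is compared with a size_t: the usual arithmetic conversions turn the int into an unsigned type, so a negative count compares as a huge value. The cast is written out so the sketch itself compiles cleanly with -Wsign-compare.

```c
#include <stddef.h>

/*
 * Comparing an `int` with a `size_t` converts the int to size_t.
 * The explicit cast shows that implicit conversion: -1 becomes SIZE_MAX.
 */
static int int_count_below(int nr, size_t limit)
{
	return (size_t)nr < limit;
}

/* With both operands of type size_t there is no conversion to warn about. */
static int size_count_below(size_t nr, size_t limit)
{
	return nr < limit;
}
```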
Suggested-by: Junio C Hamano Signed-off-by: Usman Akinyemi Signed-off-by: Junio C Hamano --- remote.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/remote.h b/remote.h index fc052945ee451d..741d14a9fcefcd 100644 --- a/remote.h +++ b/remote.h @@ -418,8 +418,8 @@ struct push_cas_option { unsigned use_tracking:1; char *refname; } *entry; - int nr; - int alloc; + size_t nr; + size_t alloc; }; int parseopt_push_cas_option(const struct option *, const char *arg, int unset); From 3e7b9dce27b1519f6745c89fe01f0b840acddb0a Mon Sep 17 00:00:00 2001 From: Usman Akinyemi Date: Sun, 3 May 2026 21:04:01 +0530 Subject: [PATCH 019/107] remote: move remote group resolution to remote.c MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit `get_remote_group`, `add_remote_or_group`, and the `remote_group_data` struct are currently defined as static helpers inside builtin/fetch.c. They implement generic remote group resolution that is not specific to fetch — they parse `remotes.` config entries and resolve a name to either a list of group members or a single configured remote. Move them to remote.c and declare them in remote.h so that other builtins can use the same logic without duplication. Useful for the next patch. 
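The whitespace splitting that get_remote_group() applies to a group value (for example the value of remotes.all-remotes) can be shown in isolation. The C sketch below mirrors the strcspn() loop from the moved code, but, as a simplifying assumption, copies words into fixed-size buffers instead of a string_list; split_remote_group(), MAX_WORDS and MAX_WORD_LEN are hypothetical names chosen for the sketch.

```c
#include <stddef.h>
#include <string.h>

#define MAX_WORDS 16
#define MAX_WORD_LEN 64

/*
 * Split a remote group value such as "origin gitlab backup" into words,
 * using the same strcspn() loop as get_remote_group(). Returns the number
 * of words stored; overlong words and words beyond max_words are skipped
 * in this sketch.
 */
static size_t split_remote_group(const char *value,
				 char words[][MAX_WORD_LEN], size_t max_words)
{
	size_t nr = 0;

	while (*value) {
		size_t wordlen = strcspn(value, " \t\n");

		if (wordlen >= 1 && nr < max_words && wordlen < MAX_WORD_LEN) {
			memcpy(words[nr], value, wordlen);
			words[nr][wordlen] = '\0';
			nr++;
		}
		/* skip the word and, if present, one separator */
		value += wordlen + (value[wordlen] != '\0');
	}
	return nr;
}
```

For instance, "origin gitlab  backup" (with a double space) yields three words: consecutive separators produce zero-length words, which are skipped just as in the original loop.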
Suggested-by: Junio C Hamano Signed-off-by: Usman Akinyemi Signed-off-by: Junio C Hamano --- builtin/fetch.c | 42 ------------------------------------------ remote.c | 37 +++++++++++++++++++++++++++++++++++++ remote.h | 12 ++++++++++++ 3 files changed, 49 insertions(+), 42 deletions(-) diff --git a/builtin/fetch.c b/builtin/fetch.c index 8a36cf67b5f140..966cc58f730150 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -2138,48 +2138,6 @@ static int get_one_remote_for_fetch(struct remote *remote, void *priv) return 0; } -struct remote_group_data { - const char *name; - struct string_list *list; -}; - -static int get_remote_group(const char *key, const char *value, - const struct config_context *ctx UNUSED, - void *priv) -{ - struct remote_group_data *g = priv; - - if (skip_prefix(key, "remotes.", &key) && !strcmp(key, g->name)) { - /* split list by white space */ - while (*value) { - size_t wordlen = strcspn(value, " \t\n"); - - if (wordlen >= 1) - string_list_append_nodup(g->list, - xstrndup(value, wordlen)); - value += wordlen + (value[wordlen] != '\0'); - } - } - - return 0; -} - -static int add_remote_or_group(const char *name, struct string_list *list) -{ - int prev_nr = list->nr; - struct remote_group_data g; - g.name = name; g.list = list; - - repo_config(the_repository, get_remote_group, &g); - if (list->nr == prev_nr) { - struct remote *remote = remote_get(name); - if (!remote_is_configured(remote, 0)) - return 0; - string_list_append(list, remote->name); - } - return 1; -} - static void add_options_to_argv(struct strvec *argv, const struct fetch_config *config) { diff --git a/remote.c b/remote.c index 7ca2a6501b4920..3d62384792c323 100644 --- a/remote.c +++ b/remote.c @@ -2114,6 +2114,43 @@ int get_fetch_map(const struct ref *remote_refs, return 0; } +int get_remote_group(const char *key, const char *value, + const struct config_context *ctx UNUSED, + void *priv) +{ + struct remote_group_data *g = priv; + + if (skip_prefix(key, "remotes.", &key) && 
!strcmp(key, g->name)) { + /* split list by white space */ + while (*value) { + size_t wordlen = strcspn(value, " \t\n"); + + if (wordlen >= 1) + string_list_append_nodup(g->list, + xstrndup(value, wordlen)); + value += wordlen + (value[wordlen] != '\0'); + } + } + + return 0; +} + +int add_remote_or_group(const char *name, struct string_list *list) +{ + int prev_nr = list->nr; + struct remote_group_data g; + g.name = name; g.list = list; + + repo_config(the_repository, get_remote_group, &g); + if (list->nr == prev_nr) { + struct remote *remote = remote_get(name); + if (!remote_is_configured(remote, 0)) + return 0; + string_list_append(list, remote->name); + } + return 1; +} + int resolve_remote_symref(struct ref *ref, struct ref *list) { if (!ref->symref) diff --git a/remote.h b/remote.h index 741d14a9fcefcd..7915be3111daa7 100644 --- a/remote.h +++ b/remote.h @@ -347,6 +347,18 @@ int branch_has_merge_config(struct branch *branch); int branch_merge_matches(struct branch *, int n, const char *); +/* list of the remote in a group as configured */ +struct remote_group_data { + const char *name; + struct string_list *list; +}; + +int get_remote_group(const char *key, const char *value, + const struct config_context *ctx, + void *priv); + +int add_remote_or_group(const char *name, struct string_list *list); + /** * Return the fully-qualified refname of the tracking branch for `branch`. * I.e., what "branch@{upstream}" would give you. Returns NULL if no From 8ea82816652d20ac7070a8fcd60980568a8a293c Mon Sep 17 00:00:00 2001 From: Usman Akinyemi Date: Sun, 3 May 2026 21:04:02 +0530 Subject: [PATCH 020/107] push: support pushing to a remote group MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit `git fetch` accepts a remote group name (configured via `remotes.` in config) and fetches from each member remote. `git push` has no equivalent — it only accepts a single remote name. 
Teach `git push` to resolve its repository argument through `add_remote_or_group()`, which was made public in the previous patch, so that a user can push to all remotes in a group with: git push When the argument resolves to a single remote, the behaviour is identical to before. When it resolves to a group, each member remote is pushed in sequence. The group push path rebuilds the refspec list (`rs`) from scratch for each member remote so that per-remote push mappings configured via `remote..push` are resolved correctly against each specific remote. Without this, refspec entries would accumulate across iterations and each subsequent remote would receive a growing list of duplicated entries. Mirror detection (`remote->mirror`) is also evaluated per remote using a copy of the flags, so that a mirror remote in the group cannot set TRANSPORT_PUSH_FORCE on subsequent non-mirror remotes in the same group. Suggested-by: Junio C Hamano Signed-off-by: Usman Akinyemi Signed-off-by: Junio C Hamano --- Documentation/git-push.adoc | 80 ++++++++++-- builtin/push.c | 251 +++++++++++++++++++++++++++++++----- t/meson.build | 1 + t/t5566-push-group.sh | 160 +++++++++++++++++++++++ 4 files changed, 451 insertions(+), 41 deletions(-) create mode 100755 t/t5566-push-group.sh diff --git a/Documentation/git-push.adoc b/Documentation/git-push.adoc index e5ba3a67421edc..aa221c3909532e 100644 --- a/Documentation/git-push.adoc +++ b/Documentation/git-push.adoc @@ -18,17 +18,28 @@ git push [--all | --branches | --mirror | --tags] [--follow-tags] [--atomic] [-n DESCRIPTION ----------- - -Updates one or more branches, tags, or other references in a remote -repository from your local repository, and sends all necessary data -that isn't already on the remote. +Updates one or more branches, tags, or other references in one or more +remote repositories from your local repository, and sends all necessary +data that isn't already on the remote. The simplest way to push is `git push `. 
`git push origin main` will push the local `main` branch to the `main` branch on the remote named `origin`. -The `` argument defaults to the upstream for the current branch, -or `origin` if there's no configured upstream. +You can also push to multiple remotes at once by using a remote group. +A remote group is a named list of remotes configured via `remotes.` +in your git config: + + $ git config remotes.all-remotes "origin gitlab backup" + +Then `git push all-remotes` will push to `origin`, `gitlab`, and +`backup` in turn, as if you had run `git push` against each one +individually. Each remote is pushed independently using its own +push mapping configuration. The group is defined by its `remotes.` entry in +the configuration file (see linkgit:git-config[1]). + +The `` argument defaults to the upstream for the current +branch, or `origin` if there's no configured upstream. To decide which branches, tags, or other refs to push, Git uses (in order of precedence): @@ -55,8 +66,10 @@ OPTIONS __:: The "remote" repository that is the destination of a push operation. This parameter can be either a URL - (see the section <> below) or the name - of a remote (see the section <> below). + (see the section <> below), the name + of a remote (see the section <> below), + or the name of a remote group + (see the section <> below). `...`:: Specify what destination ref to update with what source object. @@ -430,6 +443,57 @@ further recursion will occur. In this case, `only` is treated as `on-demand`. include::urls-remotes.adoc[] +[[REMOTE-GROUPS]] +REMOTE GROUPS +------------- + +A remote group is a named list of remotes configured via `remotes.` +in your git config: + + $ git config remotes.all-remotes "r1 r2 r3" + +When a group name is given as the `` argument, the push is +performed to each member remote in turn. The defining principle is: + + git push all-remotes + +is exactly equivalent to: + + git push r1 + git push r2 + ...
+ git push rN + +where r1, r2, ..., rN are the members of `all-remotes`. No special +behaviour is added or removed; the group is purely a shorthand for +running the same push command against each member remote individually. + +When pushing to a group of more than one remote, Git spawns a separate +`git push` subprocess for each member remote in sequence. Each subprocess +receives the same flags and refspecs as the original invocation. This +means that per-remote push mappings configured via `remote..push` +and mirror mode (`remote..mirror`) are evaluated independently for +each remote, and a mirror remote in the group cannot affect the push +behaviour of other non-mirror remotes in the same group. + +The `--atomic` option is not supported for group pushes, because atomicity +can only be guaranteed within a single transport connection to a single +remote. Git will refuse the invocation with an error if `--atomic` is +combined with a group name. + +If any member remote fails, whether due to a push rejection (e.g. a +non-fast-forward update, a server-side hook refusing a ref) or a connection +error (e.g. the repository does not exist, authentication fails, or the +network is unreachable), Git reports the error and continues pushing to +the remaining remotes in the group. The overall exit code is non-zero if +any member push fails. + +This means the user is responsible for ensuring that the sequence of +individual pushes makes sense. If `git push r1` would fail for a given +set of options and arguments, then `git push all-remotes` will fail in +the same way when it reaches r1. The group push does not do anything +special to make a failing individual push succeed.
+ OUTPUT ------ diff --git a/builtin/push.c b/builtin/push.c index 7100ffba5da17e..6021b71d668455 100644 --- a/builtin/push.c +++ b/builtin/push.c @@ -10,6 +10,7 @@ #include "config.h" #include "environment.h" #include "gettext.h" +#include "hex.h" #include "refspec.h" #include "run-command.h" #include "remote.h" @@ -544,6 +545,123 @@ static int git_push_config(const char *k, const char *v, return git_default_config(k, v, ctx, NULL); } +static int push_multiple(struct string_list *list, + const struct string_list *push_options, + int flags, + int tags, + const char **refspecs, + int refspec_nr) +{ + int result = 0; + size_t i; + struct strvec argv = STRVEC_INIT; + + strvec_push(&argv, "push"); + + if (flags & TRANSPORT_PUSH_FORCE) + strvec_push(&argv, "--force"); + if (flags & TRANSPORT_PUSH_DRY_RUN) + strvec_push(&argv, "--dry-run"); + if (flags & TRANSPORT_PUSH_PORCELAIN) + strvec_push(&argv, "--porcelain"); + if (flags & TRANSPORT_PUSH_PRUNE) + strvec_push(&argv, "--prune"); + if (flags & TRANSPORT_PUSH_NO_HOOK) + strvec_push(&argv, "--no-verify"); + if (flags & TRANSPORT_PUSH_FOLLOW_TAGS) + strvec_push(&argv, "--follow-tags"); + if (flags & TRANSPORT_PUSH_SET_UPSTREAM) + strvec_push(&argv, "--set-upstream"); + if (flags & TRANSPORT_PUSH_FORCE_IF_INCLUDES) + strvec_push(&argv, "--force-if-includes"); + if (flags & TRANSPORT_PUSH_ALL) + strvec_push(&argv, "--all"); + if (flags & TRANSPORT_PUSH_MIRROR) + strvec_push(&argv, "--mirror"); + + if (flags & TRANSPORT_PUSH_CERT_ALWAYS) + strvec_push(&argv, "--signed=yes"); + else if (flags & TRANSPORT_PUSH_CERT_IF_ASKED) + strvec_push(&argv, "--signed=if-asked"); + if (!thin) + strvec_push(&argv, "--no-thin"); + + if (deleterefs) + strvec_push(&argv, "--delete"); + + if (receivepack) + strvec_pushf(&argv, "--receive-pack=%s", receivepack); + if (verbosity >= 2) + strvec_push(&argv, "-v"); + if (verbosity >= 1) + strvec_push(&argv, "-v"); + else if (verbosity < 0) + strvec_push(&argv, "-q"); + if (progress > 0) + 
strvec_push(&argv, "--progress"); + else if (progress == 0) + strvec_push(&argv, "--no-progress"); + + if (family == TRANSPORT_FAMILY_IPV4) + strvec_push(&argv, "--ipv4"); + else if (family == TRANSPORT_FAMILY_IPV6) + strvec_push(&argv, "--ipv6"); + + if (recurse_submodules == RECURSE_SUBMODULES_CHECK) + strvec_push(&argv, "--recurse-submodules=check"); + else if (recurse_submodules == RECURSE_SUBMODULES_ON_DEMAND) + strvec_push(&argv, "--recurse-submodules=on-demand"); + else if (recurse_submodules == RECURSE_SUBMODULES_ONLY) + strvec_push(&argv, "--recurse-submodules=only"); + else if (recurse_submodules == RECURSE_SUBMODULES_OFF) + strvec_push(&argv, "--recurse-submodules=no"); + + + if (tags) + strvec_push(&argv, "--tags"); + + for (i = 0; i < push_options->nr; i++) + strvec_pushf(&argv, "--push-option=%s", + push_options->items[i].string); + + for (i = 0; i < cas.nr; i++) { + if (cas.entry[i].use_tracking) { + strvec_pushf(&argv, "--force-with-lease=%s", + cas.entry[i].refname); + } else if (!is_null_oid(&cas.entry[i].expect)) { + strvec_pushf(&argv, "--force-with-lease=%s:%s", + cas.entry[i].refname, + oid_to_hex(&cas.entry[i].expect)); + } else { + strvec_push(&argv, "--force-with-lease"); + } + } + + for (i = 0; i < list->nr; i++) { + const char *name = list->items[i].string; + struct child_process cmd = CHILD_PROCESS_INIT; + int j; + + strvec_pushv(&cmd.args, argv.v); + strvec_push(&cmd.args, name); + + for (j = 0; j < refspec_nr; j++) + strvec_push(&cmd.args, refspecs[j]); + + if (verbosity >= 0) + printf(_("Pushing to %s\n"), name); + + cmd.git_cmd = 1; + if (run_command(&cmd)) { + error(_("could not push to %s"), name); + result = 1; + } + } + + strvec_clear(&argv); + return result; +} + int cmd_push(int argc, const char **argv, const char *prefix, @@ -552,12 +670,13 @@ int cmd_push(int argc, int flags = 0; int tags = 0; int push_cert = -1; - int rc; + int rc = 0; + int base_flags; const char *repo = NULL; /* default repository */ struct string_list 
push_options_cmdline = STRING_LIST_INIT_DUP; + struct string_list remote_group = STRING_LIST_INIT_DUP; struct string_list *push_options; const struct string_list_item *item; - struct remote *remote; struct option options[] = { OPT__VERBOSITY(&verbosity), @@ -620,39 +739,45 @@ int cmd_push(int argc, else if (recurse_submodules == RECURSE_SUBMODULES_ONLY) flags |= TRANSPORT_RECURSE_SUBMODULES_ONLY; - if (tags) - refspec_append(&rs, "refs/tags/*"); - if (argc > 0) repo = argv[0]; - remote = pushremote_get(repo); - if (!remote) { - if (repo) - die(_("bad repository '%s'"), repo); - die(_("No configured push destination.\n" - "Either specify the URL from the command-line or configure a remote repository using\n" - "\n" - " git remote add \n" - "\n" - "and then push using the remote name\n" - "\n" - " git push \n")); - } - - if (argc > 0) - set_refspecs(argv + 1, argc - 1, remote); - - if (remote->mirror) - flags |= (TRANSPORT_PUSH_MIRROR|TRANSPORT_PUSH_FORCE); - - if (flags & TRANSPORT_PUSH_ALL) { - if (argc >= 2) - die(_("--all can't be combined with refspecs")); - } - if (flags & TRANSPORT_PUSH_MIRROR) { - if (argc >= 2) - die(_("--mirror can't be combined with refspecs")); + if (repo) { + if (!add_remote_or_group(repo, &remote_group)) { + /* + * Not a configured remote name or group name. + * Try treating it as a direct URL or path, e.g. + * git push /tmp/foo.git + * git push https://github.com/user/repo.git + * pushremote_get() creates an anonymous remote + * from the URL so the loop below can handle it + * identically to a named remote. 
+ */ + struct remote *r = pushremote_get(repo); + if (!r) + die(_("bad repository '%s'"), repo); + string_list_append(&remote_group, r->name); + } + } else { + struct remote *r = pushremote_get(NULL); + if (!r) + die(_("No configured push destination.\n" + "Either specify the URL from the command-line or configure a remote repository using\n" + "\n" + " git remote add \n" + "\n" + "and then push using the remote name\n" + "\n" + " git push \n" + "\n" + "To push to multiple remotes at once, configure a remote group using\n" + "\n" + " git config remotes. \" \"\n" + "\n" + "and then push using the group name\n" + "\n" + " git push \n")); + string_list_append(&remote_group, r->name); } if (!is_empty_cas(&cas) && (flags & TRANSPORT_PUSH_FORCE_IF_INCLUDES)) @@ -662,10 +787,70 @@ int cmd_push(int argc, if (strchr(item->string, '\n')) die(_("push options must not have new line characters")); - rc = do_push(flags, push_options, remote); + if (remote_group.nr == 1) { + /* + * Single remote (the common case): run do_push() directly + * in this process without spawning a subprocess. + * + * Mirror detection and the --mirror/--all + refspec conflict + * checks are done here. rs is rebuilt so that per-remote push + * mappings (remote.NAME.push config) are resolved against the + * correct remote. inner_flags is a copy of flags, so setting + * TRANSPORT_PUSH_MIRROR|TRANSPORT_PUSH_FORCE for a mirror + * remote leaves flags itself untouched.
+ */ + base_flags = flags; + { + int inner_flags = base_flags; + struct remote *r = pushremote_get(remote_group.items[0].string); + if (!r) + die(_("no such remote or remote group: %s"), + remote_group.items[0].string); + + if (r->mirror) + inner_flags |= (TRANSPORT_PUSH_MIRROR|TRANSPORT_PUSH_FORCE); + + if (inner_flags & TRANSPORT_PUSH_ALL) { + if (argc >= 2) + die(_("--all can't be combined with refspecs")); + } + if (inner_flags & TRANSPORT_PUSH_MIRROR) { + if (argc >= 2) + die(_("--mirror can't be combined with refspecs")); + } + + refspec_clear(&rs); + rs = (struct refspec) REFSPEC_INIT_PUSH; + + if (tags) + refspec_append(&rs, "refs/tags/*"); + if (argc > 0) + set_refspecs(argv + 1, argc - 1, r); + + rc = do_push(inner_flags, push_options, r); + } + } else { + /* + * Multiple remotes: spawn one "git push []" + * subprocess per remote, sequentially. + * + * Options that only make sense for a single transport connection + * are rejected here. + */ + if (flags & TRANSPORT_PUSH_ATOMIC) + die(_("--atomic can only be used when pushing to one remote")); + + rc = push_multiple(&remote_group, push_options, flags, + tags, + argc > 1 ? argv + 1 : NULL, + argc > 1 ? 
argc - 1 : 0); + } + string_list_clear(&push_options_cmdline, 0); string_list_clear(&push_options_config, 0); + string_list_clear(&remote_group, 0); clear_cas_option(&cas); + if (rc == -1) usage_with_options(push_usage, options); else diff --git a/t/meson.build b/t/meson.build index 9b2fa4dee807d6..215df033e07e32 100644 --- a/t/meson.build +++ b/t/meson.build @@ -700,6 +700,7 @@ integration_tests = [ 't5563-simple-http-auth.sh', 't5564-http-proxy.sh', 't5565-push-multiple.sh', + 't5566-push-group.sh', 't5570-git-daemon.sh', 't5571-pre-push-hook.sh', 't5572-pull-submodule.sh', diff --git a/t/t5566-push-group.sh b/t/t5566-push-group.sh new file mode 100755 index 00000000000000..a7d59352b1c03e --- /dev/null +++ b/t/t5566-push-group.sh @@ -0,0 +1,160 @@ +#!/bin/sh + +test_description='push to remote group' + +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=default +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME + +. ./test-lib.sh + +test_expect_success 'setup' ' + for i in 1 2 3 + do + git init --bare dest-$i.git && + git -C dest-$i.git symbolic-ref HEAD refs/heads/not-a-branch || + return 1 + done && + test_tick && + git commit --allow-empty -m "initial" && + git config set remote.remote-1.url "file://$(pwd)/dest-1.git" && + git config set remote.remote-1.fetch "+refs/heads/*:refs/remotes/remote-1/*" && + git config set remote.remote-2.url "file://$(pwd)/dest-2.git" && + git config set remote.remote-2.fetch "+refs/heads/*:refs/remotes/remote-2/*" && + git config set remote.remote-3.url "file://$(pwd)/dest-3.git" && + git config set remote.remote-3.fetch "+refs/heads/*:refs/remotes/remote-3/*" && + git config set remotes.all-remotes "remote-1 remote-2 remote-3" +' + +test_expect_success 'push to remote group updates all members correctly' ' + git push all-remotes HEAD:refs/heads/main && + git rev-parse HEAD >expect && + for i in 1 2 3 + do + git -C dest-$i.git rev-parse refs/heads/main >actual || + return 1 + test_cmp expect actual || return 1 + done +' + +test_expect_success 'push 
second commit to group updates all members' ' + test_tick && + git commit --allow-empty -m "second" && + git push all-remotes HEAD:refs/heads/main && + git rev-parse HEAD >expect && + for i in 1 2 3 + do + git -C dest-$i.git rev-parse refs/heads/main >actual || + return 1 + test_cmp expect actual || return 1 + done +' + +test_expect_success 'push to single remote in group does not affect others' ' + test_tick && + git commit --allow-empty -m "third" && + git push remote-1 HEAD:refs/heads/main && + git -C dest-1.git rev-parse refs/heads/main >hash-after-1 && + git -C dest-2.git rev-parse refs/heads/main >hash-after-2 && + ! test_cmp hash-after-1 hash-after-2 +' + +test_expect_success 'mirror remote in group with refspec fails' ' + git config set remote.remote-1.mirror true && + test_must_fail git push all-remotes HEAD:refs/heads/main 2>err && + test_grep "mirror" err && + git config unset remote.remote-1.mirror +' + +test_expect_success 'push.default=current works with group push' ' + git config set push.default current && + test_tick && + git commit --allow-empty -m "fifth" && + git push all-remotes && + git config unset push.default +' + +test_expect_success '--atomic is rejected for group push' ' + test_must_fail git push --atomic all-remotes HEAD:refs/heads/main 2>err && + test_grep "atomic" err +' + +test_expect_success 'push continues past rejection to remaining remotes' ' + for i in c1 c2 c3 + do + git init --bare dest-$i.git || return 1 + done && + git config set remote.c1.url "file://$(pwd)/dest-c1.git" && + git config set remote.c2.url "file://$(pwd)/dest-c2.git" && + git config set remote.c3.url "file://$(pwd)/dest-c3.git" && + git config set remotes.continue-group "c1 c2 c3" && + + test_tick && + git commit --allow-empty -m "base for continue test" && + + # initial sync + git push continue-group HEAD:refs/heads/main && + + # advance c2 independently + git clone dest-c2.git tmp-c2 && + ( + cd tmp-c2 && + git checkout -b main origin/main && + test_commit 
c2_independent && + git push origin HEAD:refs/heads/main + ) && + rm -rf tmp-c2 && + + test_tick && + git commit --allow-empty -m "local diverging commit" && + + # push: c2 rejects, others succeed + test_must_fail git push continue-group HEAD:refs/heads/main && + + git rev-parse HEAD >expect && + git -C dest-c1.git rev-parse refs/heads/main >actual-c1 && + git -C dest-c3.git rev-parse refs/heads/main >actual-c3 && + test_cmp expect actual-c1 && + test_cmp expect actual-c3 && + + # c2 should not have the new commit + git -C dest-c2.git rev-parse refs/heads/main >actual-c2 && + ! test_cmp expect actual-c2 +' + +test_expect_success 'fatal connection error does not stop remaining remotes' ' + for i in f1 f2 f3 + do + git init --bare dest-$i.git || return 1 + done && + git config set remote.f1.url "file://$(pwd)/dest-f1.git" && + git config set remote.f2.url "file://$(pwd)/dest-f2.git" && + git config set remote.f3.url "file://$(pwd)/dest-f3.git" && + git config set remotes.fatal-group "f1 f2 f3" && + + test_tick && + git commit --allow-empty -m "base for fatal test" && + + # initial sync + git push fatal-group HEAD:refs/heads/main && + + # break f2 + git config set remote.f2.url "file:///tmp/does-not-exist-$$" && + + test_tick && + git commit --allow-empty -m "after fatal setup" && + + # overall exit code is non-zero because f2 failed + test_must_fail git push fatal-group HEAD:refs/heads/main && + + git rev-parse HEAD >expect && + + # f1 and f3 should both have the new commit — subprocesses are independent + git -C dest-f1.git rev-parse refs/heads/main >actual-f1 && + test_cmp expect actual-f1 && + git -C dest-f3.git rev-parse refs/heads/main >actual-f3 && + test_cmp expect actual-f3 && + + git config set remote.f2.url "file://$(pwd)/dest-f2.git" +' + +test_done From cdeef283bcf8529fc858cfe7d18a7522294519c4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ren=C3=A9=20Scharfe?= Date: Tue, 12 May 2026 13:56:00 +0200 Subject: [PATCH 021/107] strbuf: add strbuf_add_uint() 
MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit strbuf_addf() calls vsnprintf(3) underneath, which supports a plethora of formatting options. We can avoid its overhead in basic cases by providing specialized functions like strbuf_addstr() for strings. Add another one, strbuf_add_uint(), for unsigned integers. Prepare the number string in a temporary buffer. Make it big enough for any unsigned integer value: A decimal digit can represent ln(10)/ln(2) ≈ 3.32 bits; dividing the number of bits of uintmax_t by 3.3 and rounding up gives a sufficiently close conservative size estimate. Signed-off-by: René Scharfe Signed-off-by: Junio C Hamano --- strbuf.c | 12 ++++++++++++ strbuf.h | 6 ++++++ 2 files changed, 18 insertions(+) diff --git a/strbuf.c b/strbuf.c index 3e04addc22febb..9731ecdc1feb97 100644 --- a/strbuf.c +++ b/strbuf.c @@ -361,6 +361,18 @@ void strbuf_addf(struct strbuf *sb, const char *fmt, ...) va_end(ap); } +void strbuf_add_uint(struct strbuf *sb, uintmax_t value) +{ + char buf[DIV_ROUND_UP(bitsizeof(value) * 10, 33)]; + char *end = buf + sizeof(buf); + char *p = end; + + do + *--p = "0123456789"[value % 10]; + while (value /= 10); + strbuf_add(sb, p, end - p); +} + static void add_lines(struct strbuf *out, const char *prefix, const char *buf, size_t size, diff --git a/strbuf.h b/strbuf.h index 06e284f9cca445..1089ae687bda95 100644 --- a/strbuf.h +++ b/strbuf.h @@ -410,6 +410,12 @@ void strbuf_humanise_rate(struct strbuf *buf, off_t bytes); __attribute__((format (printf,2,3))) void strbuf_addf(struct strbuf *sb, const char *fmt, ...); + +/** + * Add an unsigned decimal number. + */ +void strbuf_add_uint(struct strbuf *sb, uintmax_t value); + /** * Add a formatted string prepended by a comment character and a * blank to the buffer. 
From 8feb5702163a32384d098e2c9ad3987928f8c447 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ren=C3=A9=20Scharfe?= Date: Tue, 12 May 2026 13:56:01 +0200 Subject: [PATCH 022/107] cat-file: use strbuf_add_uint() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Speed up printing of objectsize atoms by using the specialized function strbuf_add_uint() instead of the general-purpose function strbuf_addf(): Benchmark 1: ./git_main cat-file --batch-all-objects --batch-check='%(objectsize)' Time (mean ± σ): 751.7 ms ± 1.5 ms [User: 733.5 ms, System: 17.1 ms] Range (min … max): 750.5 ms … 755.0 ms 10 runs Benchmark 2: ./git cat-file --batch-all-objects --batch-check='%(objectsize)' Time (mean ± σ): 720.4 ms ± 0.4 ms [User: 701.9 ms, System: 16.7 ms] Range (min … max): 719.7 ms … 721.2 ms 10 runs Summary ./git cat-file --batch-all-objects --batch-check='%(objectsize)' ran 1.04 ± 0.00 times faster than ./git_main cat-file --batch-all-objects --batch-check='%(objectsize)' Benchmark 1: ./git_main cat-file --batch-all-objects --batch-check='%(objectsize:disk)' Time (mean ± σ): 404.6 ms ± 0.9 ms [User: 397.8 ms, System: 5.7 ms] Range (min … max): 403.3 ms … 405.9 ms 10 runs Benchmark 2: ./git cat-file --batch-all-objects --batch-check='%(objectsize:disk)' Time (mean ± σ): 378.3 ms ± 0.9 ms [User: 371.2 ms, System: 5.9 ms] Range (min … max): 376.8 ms … 380.2 ms 10 runs Summary ./git cat-file --batch-all-objects --batch-check='%(objectsize:disk)' ran 1.07 ± 0.00 times faster than ./git_main cat-file --batch-all-objects --batch-check='%(objectsize:disk)' Signed-off-by: René Scharfe Signed-off-by: Junio C Hamano --- builtin/cat-file.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index d9fbad535868bb..62160ca9d428eb 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -330,12 +330,12 @@ static int expand_atom(struct strbuf *sb, const char *atom, int len, if (data->mark_query) 
data->info.sizep = &data->size; else - strbuf_addf(sb, "%"PRIuMAX , (uintmax_t)data->size); + strbuf_add_uint(sb, data->size); } else if (is_atom("objectsize:disk", atom, len)) { if (data->mark_query) data->info.disk_sizep = &data->disk_size; else - strbuf_addf(sb, "%"PRIuMAX, (uintmax_t)data->disk_size); + strbuf_add_uint(sb, data->disk_size); } else if (is_atom("rest", atom, len)) { if (data->mark_query) data->split_on_whitespace = 1; From f001b4ab3942cbaff4a39662294ee7191e2dbee5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ren=C3=A9=20Scharfe?= Date: Tue, 12 May 2026 13:56:02 +0200 Subject: [PATCH 023/107] ls-files: use strbuf_add_uint() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Speed up printing of objectsize values by using the specialized function strbuf_add_uint() as well as strbuf_insert() for padding instead of the general-purpose function strbuf_addf(). Here are the numbers I get when listing files in the Linux kernel repo: Benchmark 1: ./git_main -C ../linux ls-files --format='%(objectsize)' Time (mean ± σ): 257.3 ms ± 0.4 ms [User: 197.4 ms, System: 56.7 ms] Range (min … max): 256.7 ms … 258.1 ms 11 runs Benchmark 2: ./git -C ../linux ls-files --format='%(objectsize)' Time (mean ± σ): 253.4 ms ± 0.3 ms [User: 193.6 ms, System: 56.6 ms] Range (min … max): 253.0 ms … 253.8 ms 11 runs Benchmark 3: ./git_main -C ../linux ls-files --format='%(objectsize:padded)' Time (mean ± σ): 257.9 ms ± 0.3 ms [User: 198.0 ms, System: 56.6 ms] Range (min … max): 257.3 ms … 258.5 ms 11 runs Benchmark 4: ./git -C ../linux ls-files --format='%(objectsize:padded)' Time (mean ± σ): 254.6 ms ± 1.0 ms [User: 194.6 ms, System: 56.7 ms] Range (min … max): 253.7 ms … 256.8 ms 11 runs Summary ./git -C ../linux ls-files --format='%(objectsize)' ran 1.00 ± 0.00 times faster than ./git -C ../linux ls-files --format='%(objectsize:padded)' 1.02 ± 0.00 times faster than ./git_main -C ../linux ls-files --format='%(objectsize)' 1.02 ± 0.00 times 
faster than ./git_main -C ../linux ls-files --format='%(objectsize:padded)' Signed-off-by: René Scharfe Signed-off-by: Junio C Hamano --- builtin/ls-files.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/builtin/ls-files.c b/builtin/ls-files.c index b148607f7a1468..c142ad41562794 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -250,20 +250,23 @@ static void expand_objectsize(struct repository *repo, struct strbuf *line, const struct object_id *oid, const enum object_type type, unsigned int padded) { + static const char padding[] = " "; + size_t min_len = padded ? strlen(padding) : 0; + size_t orig_len = line->len; + size_t len; + if (type == OBJ_BLOB) { unsigned long size; if (odb_read_object_info(repo->objects, oid, &size) < 0) die(_("could not get object info about '%s'"), oid_to_hex(oid)); - if (padded) - strbuf_addf(line, "%7"PRIuMAX, (uintmax_t)size); - else - strbuf_addf(line, "%"PRIuMAX, (uintmax_t)size); - } else if (padded) { - strbuf_addf(line, "%7s", "-"); + strbuf_add_uint(line, size); } else { strbuf_addstr(line, "-"); } + len = line->len - orig_len; + if (len < min_len) + strbuf_insert(line, orig_len, padding, min_len - len); } static void show_ce_fmt(struct repository *repo, const struct cache_entry *ce, From 4f87748b0d25bdc92b76e453f086204808e8be87 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ren=C3=A9=20Scharfe?= Date: Tue, 12 May 2026 13:56:03 +0200 Subject: [PATCH 024/107] ls-tree: use strbuf_add_uint() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Speed up printing of objectsize values by using the specialized function strbuf_add_uint() as well as strbuf_insert() for padding instead of the general-purpose function strbuf_addf(). 
Here are the numbers I get when listing objects in the Linux kernel repo: Benchmark 1: ./git_main -C ../linux ls-tree -r --format='%(objectsize)' HEAD Time (mean ± σ): 294.4 ms ± 0.4 ms [User: 231.5 ms, System: 59.4 ms] Range (min … max): 293.9 ms … 295.0 ms 10 runs Benchmark 2: ./git -C ../linux ls-tree -r --format='%(objectsize)' HEAD Time (mean ± σ): 291.2 ms ± 0.4 ms [User: 227.9 ms, System: 62.1 ms] Range (min … max): 290.6 ms … 292.0 ms 10 runs Benchmark 3: ./git_main -C ../linux ls-tree -r --format='%(objectsize:padded)' HEAD Time (mean ± σ): 295.3 ms ± 0.6 ms [User: 232.0 ms, System: 59.6 ms] Range (min … max): 294.3 ms … 296.3 ms 10 runs Benchmark 4: ./git -C ../linux ls-tree -r --format='%(objectsize:padded)' HEAD Time (mean ± σ): 291.9 ms ± 0.4 ms [User: 228.5 ms, System: 61.5 ms] Range (min … max): 291.2 ms … 292.3 ms 10 runs Summary ./git -C ../linux ls-tree -r --format='%(objectsize)' HEAD ran 1.00 ± 0.00 times faster than ./git -C ../linux ls-tree -r --format='%(objectsize:padded)' HEAD 1.01 ± 0.00 times faster than ./git_main -C ../linux ls-tree -r --format='%(objectsize)' HEAD 1.01 ± 0.00 times faster than ./git_main -C ../linux ls-tree -r --format='%(objectsize:padded)' HEAD Signed-off-by: René Scharfe Signed-off-by: Junio C Hamano --- builtin/ls-tree.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/builtin/ls-tree.c b/builtin/ls-tree.c index 113e4a960dc7dd..57846911ce443f 100644 --- a/builtin/ls-tree.c +++ b/builtin/ls-tree.c @@ -26,20 +26,23 @@ static const char * const ls_tree_usage[] = { static void expand_objectsize(struct strbuf *line, const struct object_id *oid, const enum object_type type, unsigned int padded) { + static const char padding[] = " "; + size_t min_len = padded ? 
strlen(padding) : 0; + size_t orig_len = line->len; + size_t len; + if (type == OBJ_BLOB) { unsigned long size; if (odb_read_object_info(the_repository->objects, oid, &size) < 0) die(_("could not get object info about '%s'"), oid_to_hex(oid)); - if (padded) - strbuf_addf(line, "%7"PRIuMAX, (uintmax_t)size); - else - strbuf_addf(line, "%"PRIuMAX, (uintmax_t)size); - } else if (padded) { - strbuf_addf(line, "%7s", "-"); + strbuf_add_uint(line, size); } else { strbuf_addstr(line, "-"); } + len = line->len - orig_len; + if (len < min_len) + strbuf_insert(line, orig_len, padding, min_len - len); } struct ls_tree_options { From 63621bcbba81a131794d510bcedfa08d9318219c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ren=C3=A9=20Scharfe?= Date: Wed, 13 May 2026 17:49:11 +0200 Subject: [PATCH 025/107] hex: add and use strbuf_add_oid_hex() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add a function for adding the full hexadecimal hash value of an object ID to a strbuf. It's thread-safe and slightly more efficient than using strbuf_addstr() with oid_to_hex() because it doesn't have to determine the length of the string or copy it from the intermediate static buffer. Add and apply a semantic patch to use it throughout the code base. 
I get a tiny speedup for git log showing a single hash per commit: Benchmark 1: ./git_main log --format=%H Time (mean ± σ): 91.2 ms ± 0.7 ms [User: 51.9 ms, System: 38.6 ms] Range (min … max): 89.8 ms … 92.6 ms 31 runs Benchmark 2: ./git log --format=%H Time (mean ± σ): 90.5 ms ± 0.7 ms [User: 51.0 ms, System: 38.8 ms] Range (min … max): 89.2 ms … 92.3 ms 32 runs Summary ./git log --format=%H ran 1.01 ± 0.01 times faster than ./git_main log --format=%H Signed-off-by: René Scharfe Signed-off-by: Junio C Hamano --- bisect.c | 2 +- builtin/bisect.c | 2 +- builtin/cat-file.c | 5 ++--- builtin/replace.c | 2 +- convert.c | 2 +- fsck.c | 2 +- hex.c | 10 ++++++++++ hex.h | 5 +++++ pretty.c | 8 ++++---- refs.c | 2 +- sequencer.c | 4 ++-- shallow.c | 2 +- tools/coccinelle/strbuf.cocci | 6 ++++++ transport-helper.c | 2 +- 14 files changed, 37 insertions(+), 17 deletions(-) diff --git a/bisect.c b/bisect.c index ef17a442e55d2c..e67226a6dcbe69 100644 --- a/bisect.c +++ b/bisect.c @@ -512,7 +512,7 @@ static char *join_oid_array_hex(struct oid_array *array, char delim) int i; for (i = 0; i < array->nr; i++) { - strbuf_addstr(&joined_hexs, oid_to_hex(array->oid + i)); + strbuf_add_oid_hex(&joined_hexs, array->oid + i); if (i + 1 < array->nr) strbuf_addch(&joined_hexs, delim); } diff --git a/builtin/bisect.c b/builtin/bisect.c index 4520e585d0677f..0f679e7af926db 100644 --- a/builtin/bisect.c +++ b/builtin/bisect.c @@ -833,7 +833,7 @@ static enum bisect_error bisect_start(struct bisect_terms *terms, int argc, if (!repo_get_oid(the_repository, head, &head_oid) && !starts_with(head, "refs/heads/")) { strbuf_reset(&start_head); - strbuf_addstr(&start_head, oid_to_hex(&head_oid)); + strbuf_add_oid_hex(&start_head, &head_oid); } else if (!repo_get_oid(the_repository, head, &head_oid) && skip_prefix(head, "refs/heads/", &head)) { strbuf_addstr(&start_head, head); diff --git a/builtin/cat-file.c b/builtin/cat-file.c index d9fbad535868bb..f015e5f415bc3f 100644 --- a/builtin/cat-file.c +++ 
b/builtin/cat-file.c @@ -320,7 +320,7 @@ static int expand_atom(struct strbuf *sb, const char *atom, int len, { if (is_atom("objectname", atom, len)) { if (!data->mark_query) - strbuf_addstr(sb, oid_to_hex(&data->oid)); + strbuf_add_oid_hex(sb, &data->oid); } else if (is_atom("objecttype", atom, len)) { if (data->mark_query) data->info.typep = &data->type; @@ -345,8 +345,7 @@ static int expand_atom(struct strbuf *sb, const char *atom, int len, if (data->mark_query) data->info.delta_base_oid = &data->delta_base_oid; else - strbuf_addstr(sb, - oid_to_hex(&data->delta_base_oid)); + strbuf_add_oid_hex(sb, &data->delta_base_oid); } else if (is_atom("objectmode", atom, len)) { if (!data->mark_query && !(S_IFINVALID == data->mode)) strbuf_addf(sb, "%06o", data->mode); diff --git a/builtin/replace.c b/builtin/replace.c index 4c62c5ab58bd0a..aed6b2c8debf86 100644 --- a/builtin/replace.c +++ b/builtin/replace.c @@ -127,7 +127,7 @@ static int for_each_replace_name(const char **argv, each_replace_name_fn fn) } strbuf_setlen(&ref, base_len); - strbuf_addstr(&ref, oid_to_hex(&oid)); + strbuf_add_oid_hex(&ref, &oid); full_hex = ref.buf + base_len; if (refs_read_ref(get_main_ref_store(the_repository), ref.buf, &oid)) { diff --git a/convert.c b/convert.c index eae36c8a5936f4..036506842c3d41 100644 --- a/convert.c +++ b/convert.c @@ -1239,7 +1239,7 @@ static int ident_to_worktree(const char *src, size_t len, /* step 4: substitute */ strbuf_addstr(buf, "Id: "); - strbuf_addstr(buf, oid_to_hex(&oid)); + strbuf_add_oid_hex(buf, &oid); strbuf_addstr(buf, " $"); } strbuf_add(buf, src, len); diff --git a/fsck.c b/fsck.c index b72200c352d663..b4ffee6a043474 100644 --- a/fsck.c +++ b/fsck.c @@ -344,7 +344,7 @@ const char *fsck_describe_object(struct fsck_options *options, buf = bufs + b; b = (b + 1) % ARRAY_SIZE(bufs); strbuf_reset(buf); - strbuf_addstr(buf, oid_to_hex(oid)); + strbuf_add_oid_hex(buf, oid); if (name) strbuf_addf(buf, " (%s)", name); diff --git a/hex.c b/hex.c index 
bc756722ca623b..f02832140d2d43 100644 --- a/hex.c +++ b/hex.c @@ -3,6 +3,7 @@ #include "git-compat-util.h" #include "hash.h" #include "hex.h" +#include "strbuf.h" static int get_hash_hex_algop(const char *hex, unsigned char *hash, const struct git_hash_algo *algop) @@ -122,3 +123,12 @@ char *oid_to_hex(const struct object_id *oid) { return hash_to_hex_algop(oid->hash, &hash_algos[oid->algo]); } + +void strbuf_add_oid_hex(struct strbuf *sb, const struct object_id *oid) +{ + const struct git_hash_algo *algop = oid->algo ? + &hash_algos[oid->algo] : the_hash_algo; + strbuf_grow(sb, algop->hexsz); + hash_to_hex_algop_r(sb->buf + sb->len, oid->hash, algop); + strbuf_setlen(sb, sb->len + algop->hexsz); +} diff --git a/hex.h b/hex.h index 1e9a65d83a4f6b..f15c7e22201cea 100644 --- a/hex.h +++ b/hex.h @@ -33,6 +33,11 @@ char *oid_to_hex_r(char *out, const struct object_id *oid); char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *); /* static buffer result! */ char *oid_to_hex(const struct object_id *oid); /* same static buffer */ +struct strbuf; + +/* Apply oid_to_hex_r() to a strbuf to append the hexadecimal hash. */ +void strbuf_add_oid_hex(struct strbuf *sb, const struct object_id *oid); + /* * Parse a 40-character hexadecimal object ID starting from hex, updating the * pointer specified by end when parsing stops. 
The resulting object ID is diff --git a/pretty.c b/pretty.c index 814803980b8d1a..268422394648fe 100644 --- a/pretty.c +++ b/pretty.c @@ -662,7 +662,7 @@ static void add_merge_info(const struct pretty_print_context *pp, if (pp->abbrev) strbuf_add_unique_abbrev(sb, oidp, pp->abbrev); else - strbuf_addstr(sb, oid_to_hex(oidp)); + strbuf_add_oid_hex(sb, oidp); parent = parent->next; } strbuf_addch(sb, '\n'); @@ -1567,7 +1567,7 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */ switch (placeholder[0]) { case 'H': /* commit hash */ strbuf_addstr(sb, diff_get_color(c->auto_color, DIFF_COMMIT)); - strbuf_addstr(sb, oid_to_hex(&commit->object.oid)); + strbuf_add_oid_hex(sb, &commit->object.oid); strbuf_addstr(sb, diff_get_color(c->auto_color, DIFF_RESET)); return 1; case 'h': /* abbreviated commit hash */ @@ -1577,7 +1577,7 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */ strbuf_addstr(sb, diff_get_color(c->auto_color, DIFF_RESET)); return 1; case 'T': /* tree hash */ - strbuf_addstr(sb, oid_to_hex(get_commit_tree_oid(commit))); + strbuf_add_oid_hex(sb, get_commit_tree_oid(commit)); return 1; case 't': /* abbreviated tree hash */ strbuf_add_unique_abbrev(sb, @@ -1588,7 +1588,7 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */ for (p = commit->parents; p; p = p->next) { if (p != commit->parents) strbuf_addch(sb, ' '); - strbuf_addstr(sb, oid_to_hex(&p->item->object.oid)); + strbuf_add_oid_hex(sb, &p->item->object.oid); } return 1; case 'p': /* abbreviated parent hashes */ diff --git a/refs.c b/refs.c index bfcb9c7ac3d38c..d5b968c28ef615 100644 --- a/refs.c +++ b/refs.c @@ -2498,7 +2498,7 @@ int refs_update_symref_extended(struct ref_store *refs, const char *ref, if (referent && refs_read_symbolic_ref(refs, ref, referent) == NOT_A_SYMREF) { struct object_id oid; if (!refs_read_ref(refs, ref, &oid)) { - strbuf_addstr(referent, oid_to_hex(&oid)); + strbuf_add_oid_hex(referent, &oid); ret = NOT_A_SYMREF; } } diff --git 
a/sequencer.c b/sequencer.c index b7d8dca47f4a58..b4df04b6724206 100644 --- a/sequencer.c +++ b/sequencer.c @@ -2223,7 +2223,7 @@ static void refer_to_commit(struct repository *r, struct strbuf *msgbuf, repo_format_commit_message(r, commit, "%h (%s, %ad)", msgbuf, &ctx); } else { - strbuf_addstr(msgbuf, oid_to_hex(&commit->object.oid)); + strbuf_add_oid_hex(msgbuf, &commit->object.oid); } } @@ -2395,7 +2395,7 @@ static int do_pick_commit(struct repository *r, if (!has_conforming_footer(&ctx->message, NULL, 0)) strbuf_addch(&ctx->message, '\n'); strbuf_addstr(&ctx->message, cherry_picked_prefix); - strbuf_addstr(&ctx->message, oid_to_hex(&commit->object.oid)); + strbuf_add_oid_hex(&ctx->message, &commit->object.oid); strbuf_addstr(&ctx->message, ")\n"); } if (!is_fixup(command)) diff --git a/shallow.c b/shallow.c index a8ad92e303d24d..b4b4e2e32a7600 100644 --- a/shallow.c +++ b/shallow.c @@ -395,7 +395,7 @@ static int write_shallow_commits_1(struct strbuf *out, int use_pack_protocol, if (!extra) return data.count; for (size_t i = 0; i < extra->nr; i++) { - strbuf_addstr(out, oid_to_hex(extra->oid + i)); + strbuf_add_oid_hex(out, extra->oid + i); strbuf_addch(out, '\n'); data.count++; } diff --git a/tools/coccinelle/strbuf.cocci b/tools/coccinelle/strbuf.cocci index f5861283297acd..667903d1d48ad8 100644 --- a/tools/coccinelle/strbuf.cocci +++ b/tools/coccinelle/strbuf.cocci @@ -78,3 +78,9 @@ struct strbuf SB; @@ - SB.buf ? 
SB.buf : "" + SB.buf + +@@ +expression SB, OID; +@@ +- strbuf_addstr(SB, oid_to_hex(OID)) ++ strbuf_add_oid_hex(SB, OID) diff --git a/transport-helper.c b/transport-helper.c index 4e5d1d914fb12a..145a0cd7e6143a 100644 --- a/transport-helper.c +++ b/transport-helper.c @@ -1053,7 +1053,7 @@ static int push_refs_with_push(struct transport *transport, if (ref->peer_ref) strbuf_addstr(&buf, ref->peer_ref->name); else - strbuf_addstr(&buf, oid_to_hex(&ref->new_oid)); + strbuf_add_oid_hex(&buf, &ref->new_oid); } strbuf_addch(&buf, ':'); strbuf_addstr(&buf, ref->name); From 74216ffe0aa02309e1fc510c0056ec6fd523898c Mon Sep 17 00:00:00 2001 From: Greg Hurrell Date: Thu, 21 May 2026 13:45:09 +0000 Subject: [PATCH 026/107] git-jump: pick a mode automatically when invoked without arguments When `git jump` is invoked with no positional arguments (and no arguments after `--stdout`) it currently prints usage and exits with status 1. But there are two situations where we can usefully infer the most valuable and likely mode that a user would want to use, and select it automatically: 1. When there are unmerged paths in the index, the user likely wants `git jump merge`. 2. When the working tree has unstaged changes, the user likely wants `git jump diff`. In this commit we teach `git jump` a new "auto" mode which detects these cases and dispatches to the corresponding mode automatically. The user can either explicitly spell out `git jump auto`, or just leave it at `git jump` (because "auto" is the default). If none of the interesting cases listed above applies, then auto mode falls back to the existing usage-and-exit behavior. 
Signed-off-by: Greg Hurrell Signed-off-by: Junio C Hamano --- contrib/git-jump/README | 12 ++++++++++++ contrib/git-jump/git-jump | 26 +++++++++++++++++++++++--- 2 files changed, 35 insertions(+), 3 deletions(-) diff --git a/contrib/git-jump/README b/contrib/git-jump/README index 3211841305fcb3..aabec4a756e9d4 100644 --- a/contrib/git-jump/README +++ b/contrib/git-jump/README @@ -75,8 +75,20 @@ git jump grep foo_bar # arbitrary grep options git jump grep -i foo_bar +# jump to places with conflict markers or whitespace errors +# (as reported by `git diff --check`) +git jump ws + # use the silver searcher for git jump grep git config jump.grepCmd "ag --column" + +# pick a mode automatically: "merge" if there are unmerged paths, +# "diff" if the worktree has unstaged changes, "ws" if there are +# whitespace problems; otherwise show usage +git jump auto + +# with no explicit mode and no args, same as "auto" +git jump -------------------------------------------------- You can use the optional argument '--stdout' to print the listing to diff --git a/contrib/git-jump/git-jump b/contrib/git-jump/git-jump index 8d1d5d79a69854..79286d811210e3 100755 --- a/contrib/git-jump/git-jump +++ b/contrib/git-jump/git-jump @@ -3,9 +3,11 @@ usage() { cat <<\EOF usage: git jump [--stdout] [] + or: git jump [--stdout] Jump to interesting elements in an editor. -The parameter is one of: +The parameter is one of the following. +With no and no , it defaults to "auto". diff: elements are diff hunks. Arguments are given to diff. @@ -16,6 +18,10 @@ grep: elements are grep hits. Arguments are given to git grep or, if ws: elements are whitespace errors. Arguments are given to diff --check. +auto: select one of the other modes based on worktree state; + "merge" if there are unmerged paths, "diff" if there are + unstaged changes, "ws" if there are whitespace errors. + If the optional argument `--stdout` is given, print the quickfix lines to standard output instead of feeding it to the editor. 
EOF @@ -82,6 +88,21 @@ mode_ws() { git diff --check "$@" } +mode_auto() { + if test "$(git rev-parse --is-inside-work-tree 2>/dev/null)" != "true"; then + usage >&2 + exit 1 + fi + if test -n "$(git ls-files -u "$@")"; then + mode_merge "$@" + elif ! git diff --quiet "$@"; then + mode_diff "$@" + else + usage >&2 + exit 1 + fi +} + use_stdout= while test $# -gt 0; do case "$1" in @@ -99,8 +120,7 @@ while test $# -gt 0; do shift done if test $# -lt 1; then - usage >&2 - exit 1 + set -- auto fi mode=$1; shift type "mode_$mode" >/dev/null 2>&1 || { usage >&2; exit 1; } From 7c9b38d267129625adeced9f66140e802c345261 Mon Sep 17 00:00:00 2001 From: Junio C Hamano Date: Mon, 25 May 2026 11:58:48 +0900 Subject: [PATCH 027/107] SubmittingPatches: proactively monitor GHCI pages Even those contributors who do not come from GGG and do not first push their changes to their repositories on GitHub with CI enabled can still monitor the CI runs triggered by integration of their topic to 'seen' and other branches to notice a breakage their topic caused to the system. Encourage them to help the project by keeping an eye on these CI runs. Signed-off-by: Junio C Hamano --- Documentation/SubmittingPatches | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches index e270ccbe85b087..ad2dce1998120f 100644 --- a/Documentation/SubmittingPatches +++ b/Documentation/SubmittingPatches @@ -792,6 +792,17 @@ relevant for debugging. Then fix the problem and push your fix to your GitHub fork. This will trigger a new CI build to ensure all tests pass. +Even if you do not use GitHub CI to test your changes, pay close +attention to new failures on the branches when the maintainer pushes +out after your topic gets merged to the 'seen' branch to make sure +that your topic is not breaking the CI, and retract your breaking +topic quickly while you fix the breakage you caused.
+ +To see maintainer's push, keep an eye on this page: + + `https://github.com/git/git/actions/workflows/main.yml?query=event%3Apush+actor%3Agitster` + + [[mua]] == MUA specific hints From b2040bfafe0f7bbbd21cf65a903d2346d602f421 Mon Sep 17 00:00:00 2001 From: Ivan Baluta Date: Tue, 26 May 2026 03:58:07 +0000 Subject: [PATCH 028/107] doc: clarify push.default=simple behavior The documentation for the 'simple' push mode currently singles out the centralized workflow, which can cause confusion about its behavior in other scenarios, such as triangular workflows. Clarify that 'simple' always pushes the current branch to a branch of the same name, but only enforces the strict upstream tracking requirement when pushing back to the same remote being pulled from. Suggested-by: Junio C Hamano Signed-off-by: Ivan Baluta Signed-off-by: Junio C Hamano --- Documentation/config/push.adoc | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/Documentation/config/push.adoc b/Documentation/config/push.adoc index d9112b22609b51..28132eedfee6c0 100644 --- a/Documentation/config/push.adoc +++ b/Documentation/config/push.adoc @@ -41,9 +41,10 @@ this is a deprecated synonym for `upstream`. `simple`;; push the current branch with the same name on the remote. + -If you are working on a centralized workflow (pushing to the same repository you -pull from, which is typically `origin`), then you need to configure an upstream -branch with the same name. +This mode requires that the remote repository to be pushed to is +known. When pushing back to the same remote you pull from, the +current branch must also have an upstream tracking branch with the +same name. + This mode is the default since Git 2.0, and is the safest option suited for beginners. 
From 96d1225ad904bf865fecc89ddfde62e1f4281c19 Mon Sep 17 00:00:00 2001 From: Zakariyah Ali Date: Tue, 26 May 2026 15:23:07 +0000 Subject: [PATCH 029/107] completion: hide dotfiles for selected path completion The completion helper for index paths uses git ls-files rather than shell filename completion. As a result, leading-dot paths such as a tracked .gitignore were offered even when the user had not started the path with ".". Hide leading-dot path components for git rm, git mv, and git ls-files when completing an empty path component. Explicit dot completion is still preserved, so git rm . can still complete .gitignore. This matches standard shell filename completion behavior, where dotfiles are hidden by default unless the user starts their input with a dot. This also resolves four TODO comments in t/t9902-completion.sh which have been present since 2013 (commit ddf07bddef9a, "completion: add file completion tests", 2013-04-27), expecting that .gitignore would not be shown when completing on an empty path component. Signed-off-by: Zakariyah Ali Signed-off-by: Junio C Hamano --- contrib/completion/git-completion.bash | 36 +++++++++++++++++--------- t/t9902-completion.sh | 10 ++----- 2 files changed, 26 insertions(+), 20 deletions(-) diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash index a8e7c6ddbfb2b1..e8f8fab125b42b 100644 --- a/contrib/completion/git-completion.bash +++ b/contrib/completion/git-completion.bash @@ -638,25 +638,33 @@ __git_ls_files_helper () } -# __git_index_files accepts 1 or 2 arguments: +# __git_index_files accepts 1 to 4 arguments: # 1: Options to pass to ls-files (required). # 2: A directory path (optional). # If provided, only files within the specified directory are listed. # Sub directories are never recursed. Path must have a trailing # slash. # 3: List only paths matching this path component (optional).
+# 4: Hide paths whose first component starts with a dot if this is +# "hide-dotfiles" and the third argument is empty (optional). __git_index_files () { - local root="$2" match="$3" + local root="$2" match="$3" hide_dotfiles="${4-}" + local hide_dotfiles_awk=0 + if [ "$hide_dotfiles" = "hide-dotfiles" ] && [ -z "$match" ]; then + hide_dotfiles_awk=1 + fi __git_ls_files_helper "$root" "$1" "${match:-?}" | - awk -F / -v pfx="${2//\\/\\\\}" '{ + awk -F / -v pfx="${2//\\/\\\\}" -v hide_dotfiles="$hide_dotfiles_awk" '{ paths[$1] = 1 } END { for (p in paths) { if (substr(p, 1, 1) != "\"") { # No special characters, easy! + if (hide_dotfiles == 1 && substr(p, 1, 1) == ".") + continue print pfx p continue } @@ -675,8 +683,10 @@ __git_index_files () # We have seen the same directory unquoted, # skip it. continue - else - print pfx p + + if (hide_dotfiles == 1 && substr(p, 1, 1) == ".") + continue + print pfx p } } function dequote(p, bs_idx, out, esc, esc_idx, dec) { @@ -721,13 +731,15 @@ __git_index_files () }' } -# __git_complete_index_file requires 1 argument: +# __git_complete_index_file accepts 1 or 2 arguments: # 1: the options to pass to ls-file +# 2: Hide paths whose first component starts with a dot if this is +# "hide-dotfiles" and the current word is empty (optional). # # The exception is --committable, which finds the files appropriate commit. __git_complete_index_file () { - local dequoted_word pfx="" cur_ + local dequoted_word pfx="" cur_ hide_dotfiles="${2-}" __git_dequote "$cur" @@ -740,7 +752,7 @@ __git_complete_index_file () cur_="$dequoted_word" esac - __gitcomp_file_direct "$(__git_index_files "$1" "$pfx" "$cur_")" + __gitcomp_file_direct "$(__git_index_files "$1" "$pfx" "$cur_" "$hide_dotfiles")" } # Lists branches from the local repository. @@ -2164,7 +2176,7 @@ _git_ls_files () # XXX ignore options like --modified and always suggest all cached # files. 
- __git_complete_index_file "--cached" + __git_complete_index_file "--cached" hide-dotfiles } _git_ls_remote () @@ -2397,9 +2409,9 @@ _git_mv () if [ $(__git_count_arguments "mv") -gt 0 ]; then # We need to show both cached and untracked files (including # empty directories) since this may not be the last argument. - __git_complete_index_file "--cached --others --directory" + __git_complete_index_file "--cached --others --directory" hide-dotfiles else - __git_complete_index_file "--cached" + __git_complete_index_file "--cached" hide-dotfiles fi } @@ -3219,7 +3231,7 @@ _git_rm () ;; esac - __git_complete_index_file "--cached" + __git_complete_index_file "--cached" hide-dotfiles } _git_shortlog () diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh index 28f61f08fb4cec..02aaf71876ea0a 100755 --- a/t/t9902-completion.sh +++ b/t/t9902-completion.sh @@ -2811,17 +2811,15 @@ test_expect_success 'complete files' ' touch untracked && - : TODO .gitignore should not be here && test_completion "git rm " <<-\EOF && - .gitignore modified EOF + test_completion "git rm ." 
".gitignore" && + test_completion "git clean " "untracked" && - : TODO .gitignore should not be here && test_completion "git mv " <<-\EOF && - .gitignore modified EOF @@ -2832,9 +2830,7 @@ test_expect_success 'complete files' ' mkdir untracked-dir && - : TODO .gitignore should not be here && test_completion "git mv modified " <<-\EOF && - .gitignore dir modified untracked @@ -2843,9 +2839,7 @@ test_expect_success 'complete files' ' test_completion "git commit " "modified" && - : TODO .gitignore should not be here && test_completion "git ls-files " <<-\EOF && - .gitignore dir modified EOF From e3959cc78c968d8f029daa48d4aadcb486da0629 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 27 May 2026 15:55:50 -0400 Subject: [PATCH 030/107] pack-bitmap: pass object position to `fill_bitmap_tree()` In the following commit, callers of `fill_bitmap_tree()` will be required to check the bit corresponding to their tree before calling that function. That change will reduce the overhead of setting up and tearing down stack frames for trees whose bits are already set. To prepare for that change, have callers pass the tree's bit position to `fill_bitmap_tree()`, which will make the next commit easier to read. In the meantime, this change has a surprising and measurable benefit during bitmap generation, particularly on very large repositories. When processing sub-trees within `fill_bitmap_tree()`, the preimage of this patch did the following:

    while (tree_entry(&desc, &entry)) {
        switch (object_type(entry.mode)) {
        case OBJ_TREE:
            if (fill_bitmap_tree(writer, bitmap,
                                 lookup_tree(writer->repo, &entry.oid)) < 0) {
                /* ... */
            }
            /* ... */
        }
    }

, first performing the object lookup via `lookup_tree()`, and then locating its bit position within the recursive call. This patch effectively reorders those two calls so that we first discover the sub-tree's bit position, *then* load its tree.
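A toy C sketch of the calling convention this patch introduces may help: the caller resolves a subtree's bit position first, and only then "loads" the subtree and recurses, passing the position down. Everything here is a hypothetical simplification, not Git's code: `struct toy_tree` and `struct toy_writer` are invented, the `pos_of[]` table stands in for `find_object_pos()`, indexing `trees[]` stands in for `lookup_tree()`, and a `uint64_t` stands in for `struct bitmap`.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_KIDS 4

/* Toy stand-in for a tree object: just the ids of its subtrees. */
struct toy_tree {
	int nr_kids;
	int kids[TOY_MAX_KIDS];
};

/* Toy stand-in for the bitmap writer. */
struct toy_writer {
	const struct toy_tree *trees;	/* trees[id], our "lookup_tree()" */
	const int *pos_of;		/* pos_of[id], or -1 if unknown */
};

/*
 * As after this patch, the caller hands in @pos, the bit position of
 * @tree, instead of the callee looking it up itself.
 */
static int toy_fill_tree(const struct toy_writer *w, uint64_t *bitmap,
			 const struct toy_tree *tree, uint32_t pos)
{
	assert(pos < 64);
	if (*bitmap & (UINT64_C(1) << pos))
		return 0;	/* already set: this subtree is covered */
	*bitmap |= UINT64_C(1) << pos;

	for (int i = 0; i < tree->nr_kids; i++) {
		int kid = tree->kids[i];
		int kid_pos = w->pos_of[kid];	/* find the position first... */

		if (kid_pos < 0)
			return -1;
		/* ...and only then "load" the subtree and recurse. */
		if (toy_fill_tree(w, bitmap, &w->trees[kid],
				  (uint32_t)kid_pos) < 0)
			return -1;
	}
	return 0;
}
```

For the diamond 0 -> {1, 2}, 1 -> {3}, 2 -> {3} with positions {5, 2, 7, 0}, filling from tree 0 sets bits 5, 2, 0 and 7; the second visit to tree 3 returns early because its bit is already set.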
By reordering these two operations, we spend fewer CPU cycles per instruction, likely due to improved CPU dependency/cache/pipeline behavior. Comparing the results of running `perf stat` before and after this commit, we have:
+--------------+-------------+-------------+-------------------+
|              | HEAD^       | HEAD        | Delta             |
+--------------+-------------+-------------+-------------------+
| elapsed      |     612.5 s |     582.4 s |   -30.1 s (-4.9%) |
| cycles       |   2,857.3 B |   2,713.3 B |  -144.0 B (-5.0%) |
| instructions |   2,413.2 B |   2,415.5 B |    +2.3 B (+0.1%) |
| CPI          |       1.184 |       1.123 |    -0.061 (-5.1%) |
+--------------+-------------+-------------+-------------------+
In a large repository with ~4.8M commits and ~37.1M tree objects, this change improves timing from ~612.5 seconds down to ~582.4 seconds, or a ~4.9% improvement. More importantly, the number of CPU cycles spent dropped off significantly as a result of this commit, lowering our cycles-per-instruction ratio by about 5.1%.
Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- pack-bitmap-write.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 1c8070f99c03ca..2d5ff8fd406db9 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -456,10 +456,10 @@ static void bitmap_builder_clear(struct bitmap_builder *bb) static int fill_bitmap_tree(struct bitmap_writer *writer, struct bitmap *bitmap, - struct tree *tree) + struct tree *tree, + uint32_t pos) { int found; - uint32_t pos; struct tree_desc desc; struct name_entry entry; @@ -467,9 +467,6 @@ static int fill_bitmap_tree(struct bitmap_writer *writer, * If our bit is already set, then there is nothing to do. Both this * tree and all of its children will be set.
*/ - pos = find_object_pos(writer, &tree->object.oid, &found); - if (!found) - return -1; if (bitmap_get(bitmap, pos)) return 0; bitmap_set(bitmap, pos); @@ -482,8 +479,12 @@ static int fill_bitmap_tree(struct bitmap_writer *writer, while (tree_entry(&desc, &entry)) { switch (object_type(entry.mode)) { case OBJ_TREE: + pos = find_object_pos(writer, &entry.oid, &found); + if (!found) + return -1; if (fill_bitmap_tree(writer, bitmap, - lookup_tree(writer->repo, &entry.oid)) < 0) + lookup_tree(writer->repo, + &entry.oid), pos) < 0) return -1; break; case OBJ_BLOB: @@ -575,8 +576,14 @@ static int fill_bitmap_commit(struct bitmap_writer *writer, } while (tree_queue->nr) { - if (fill_bitmap_tree(writer, ent->bitmap, - prio_queue_get(tree_queue)) < 0) + struct tree *t = prio_queue_get(tree_queue); + int found; + + pos = find_object_pos(writer, &t->object.oid, &found); + if (!found) + return -1; + + if (fill_bitmap_tree(writer, ent->bitmap, t, pos) < 0) return -1; } return 0; From 1760c372589af09ff0b986c57bfe0b9101275674 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 27 May 2026 15:55:53 -0400 Subject: [PATCH 031/107] pack-bitmap: check subtree bits before recursing In the previous commit, we adjusted the callers of `fill_bitmap_tree()` to pass in the bit position of the tree they wish to fill. This commit makes use of that information at the call site to avoid setting up a stack frame for fill_bitmap_tree() entirely whenever a tree's bit position is already set. 
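A toy sketch of the call-site check, under similar simplifying assumptions: the `toy_*` names are hypothetical, a tree's index doubles as its bit position, and a `uint64_t` stands in for the bitmap. The counter shows how many calls the check avoids.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_KIDS 4

/* Toy tree; to keep things short, a tree's index is also its bit position. */
struct toy_tree {
	int nr_kids;
	int kids[TOY_MAX_KIDS];
};

static int toy_calls;	/* number of times toy_fill_tree() is entered */

/* The caller guarantees that bit @pos is not yet set. */
static void toy_fill_tree(const struct toy_tree *trees, uint64_t *bitmap,
			  int pos)
{
	toy_calls++;
	assert(pos >= 0 && pos < 64);
	assert(!(*bitmap & (UINT64_C(1) << pos)));
	*bitmap |= UINT64_C(1) << pos;

	for (int i = 0; i < trees[pos].nr_kids; i++) {
		int kid = trees[pos].kids[i];

		/* Check the bit here, so a covered subtree costs no call. */
		if (*bitmap & (UINT64_C(1) << kid))
			continue;
		toy_fill_tree(trees, bitmap, kid);
	}
}
```

On the diamond 0 -> {1, 2}, 1 -> {3}, 2 -> {3}, this makes four calls; checking the bit inside the callee instead (as before this patch) would enter tree 3 a second time, for five.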
Since this is such a hot path, the avoided cost of setting up and tearing down stack frames for each noop'd call to `fill_bitmap_tree()` is significant:
+--------------+-------------+-------------+-------------------+
|              | HEAD^       | HEAD        | Delta             |
+--------------+-------------+-------------+-------------------+
| elapsed      |     582.4 s |     562.8 s |   -19.6 s (-3.4%) |
| cycles       |   2,713.3 B |   2,621.3 B |   -92.0 B (-3.4%) |
| instructions |   2,415.5 B |   2,348.9 B |   -66.6 B (-2.8%) |
| CPI          |       1.123 |       1.116 |    -0.007 (-0.7%) |
+--------------+-------------+-------------+-------------------+
In the same repository as in the previous commit, our timings dropped from ~582.4 seconds down to ~562.77 seconds. While the cycles-per-instruction ratio is basically unchanged, we execute significantly fewer instructions, and correspondingly fewer cycles.
Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- pack-bitmap-write.c | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 2d5ff8fd406db9..72610397020664 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -463,12 +463,6 @@ static int fill_bitmap_tree(struct bitmap_writer *writer, struct tree_desc desc; struct name_entry entry; - /* - * If our bit is already set, then there is nothing to do. Both this - * tree and all of its children will be set. - */ - if (bitmap_get(bitmap, pos)) - return 0; bitmap_set(bitmap, pos); if (repo_parse_tree(writer->repo, tree) < 0) @@ -482,6 +476,15 @@ static int fill_bitmap_tree(struct bitmap_writer *writer, pos = find_object_pos(writer, &entry.oid, &found); if (!found) return -1; + if (bitmap_get(bitmap, pos)) { + /* + * If our bit is already set, then there + * is nothing to do. Both this tree and + * all of its children will be set.
+ */ + break; + } + if (fill_bitmap_tree(writer, bitmap, lookup_tree(writer->repo, &entry.oid), pos) < 0) @@ -582,6 +585,14 @@ static int fill_bitmap_commit(struct bitmap_writer *writer, pos = find_object_pos(writer, &t->object.oid, &found); if (!found) return -1; + if (bitmap_get(ent->bitmap, pos)) { + /* + * If our bit is already set, then there is + * nothing to do. Both this tree and all of its + * children will be set. + */ + continue; + } if (fill_bitmap_tree(writer, ent->bitmap, t, pos) < 0) return -1; From 3ea5fe8482e44fe8636b2725edffcadc81b22161 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 27 May 2026 15:55:56 -0400 Subject: [PATCH 032/107] pack-bitmap: reuse stored selected bitmaps When `fill_bitmap_commit()` reaches an ancestor that was selected for its own bitmap and processed earlier, its object closure is already stored in `writer->bitmaps` as an EWAH bitmap. As a result, walking through that commit's tree and parents again is redundant. Teach `fill_bitmap_commit()` to notice that case. For each commit the walk reaches other than the starting commit, look for a stored selected bitmap; if one exists, OR it into the bitmap being built and skip the commit, its tree, and its parents. Building bitmaps from scratch on the same test repository from the previous commits yields a significant speed-up:
+------------------+-------------+-------------+---------------------+
|                  | HEAD^       | HEAD        | Delta               |
+------------------+-------------+-------------+---------------------+
| elapsed          |     562.8 s |     324.8 s |   -237.9 s (-42.3%) |
| cycles           |   2,621.3 B |   1,508.6 B | -1,112.7 B (-42.4%) |
| instructions     |   2,348.9 B |   1,436.6 B |   -912.3 B (-38.8%) |
| CPI              |       1.116 |       1.050 |      -0.066 (-5.9%) |
+------------------+-------------+-------------+---------------------+
In our testing repository, there are 1,261 commits selected for bitmap coverage, and 1,382 maximal commits induced as a result of that.
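A simplified model of this short-circuit, assuming commits are numbered by their bit position, trees are ignored, and a hypothetical `struct toy_store` plays the role of `writer->bitmaps`: when the walk reaches an ancestor (not the starting commit) that has a stored closure, it ORs that closure in and does not walk the ancestor's parents.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR 8
#define TOY_MAX_PARENTS 2

struct toy_commit {
	int nr_parents;
	int parents[TOY_MAX_PARENTS];
};

/* Hypothetical: stored[i] is commit i's closure when has_stored[i] is set. */
struct toy_store {
	int has_stored[TOY_NR];
	uint64_t stored[TOY_NR];
};

static int toy_visited;	/* commits whose parents were actually walked */

static uint64_t toy_fill_commit(const struct toy_commit *commits,
				const struct toy_store *store, int tip)
{
	uint64_t bitmap = 0;
	/* Each walked commit pushes at most TOY_MAX_PARENTS entries. */
	int stack[TOY_NR * TOY_MAX_PARENTS + 1];
	int nr = 0;

	stack[nr++] = tip;
	while (nr) {
		int c = stack[--nr];

		if (bitmap & (UINT64_C(1) << c))
			continue;	/* already covered */

		/* Reuse an ancestor's stored closure instead of walking it. */
		if (c != tip && store->has_stored[c]) {
			bitmap |= store->stored[c];
			continue;
		}

		toy_visited++;
		bitmap |= UINT64_C(1) << c;
		for (int i = 0; i < commits[c].nr_parents; i++) {
			assert(nr < (int)(sizeof(stack) / sizeof(*stack)));
			stack[nr++] = commits[c].parents[i];
		}
	}
	return bitmap;
}
```

On a linear history 0 <- 1 <- 2 <- 3 where commit 1's closure {0, 1} is stored, filling commit 3 yields the same bitmap {0, 1, 2, 3} but walks only commits 3 and 2.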
Of the 1,382 calls made to `fill_bitmap_commit()` (one per maximal commit), 131 of them can be short-circuited at some point during their traversal as a consequence of this change. In large repositories where the cost of filling the bitmap for any individual commit is large, being able to short-circuit even ~9.5% of the calls to `fill_bitmap_commit()` results in a significant savings. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- pack-bitmap-write.c | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 72610397020664..651ad467469f44 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -509,6 +509,9 @@ static int fill_bitmap_tree(struct bitmap_writer *writer, static int reused_bitmaps_nr; static int reused_pseudo_merge_bitmaps_nr; +static int fill_bitmap_commit_calls_nr; +static int fill_bitmap_commit_found_ancestor_nr; + static int fill_bitmap_commit(struct bitmap_writer *writer, struct bb_commit *ent, struct commit *commit, @@ -519,6 +522,9 @@ static int fill_bitmap_commit(struct bitmap_writer *writer, { int found; uint32_t pos; + + fill_bitmap_commit_calls_nr++; + if (!ent->bitmap) ent->bitmap = bitmap_new(); @@ -553,6 +559,28 @@ static int fill_bitmap_commit(struct bitmap_writer *writer, bitmap_free(remapped); } + /* + * If we encounter an ancestor for which we have already + * computed a bitmap during this build (i.e. a regular + * selected commit processed earlier in topo order), we can + * short-circuit the walk: its stored bitmap already covers + * the commit itself, its tree, and all of its ancestors. 
+ */ + if (c != commit) { + khiter_t hash_pos = kh_get_oid_map(writer->bitmaps, + c->object.oid); + if (hash_pos != kh_end(writer->bitmaps)) { + struct bitmapped_commit *stored = + kh_value(writer->bitmaps, hash_pos); + if (stored && stored->bitmap) { + fill_bitmap_commit_found_ancestor_nr++; + bitmap_or_ewah(ent->bitmap, + stored->bitmap); + continue; + } + } + } + /* * Mark ourselves and queue our tree. The commit * walk ensures we cover all parents. @@ -692,6 +720,12 @@ int bitmap_writer_build(struct bitmap_writer *writer) trace2_data_intmax("pack-bitmap-write", writer->repo, "building_bitmaps_pseudo_merge_reused", reused_pseudo_merge_bitmaps_nr); + trace2_data_intmax("pack-bitmap-write", writer->repo, + "fill_bitmap_commit_calls_nr", + fill_bitmap_commit_calls_nr); + trace2_data_intmax("pack-bitmap-write", writer->repo, + "fill_bitmap_commit_found_ancestor_nr", + fill_bitmap_commit_found_ancestor_nr); stop_progress(&writer->progress); From ece3465d44157157a03eb7cd5de955e552e7831c Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 27 May 2026 15:55:59 -0400 Subject: [PATCH 033/107] pack-bitmap: consolidate `find_object_pos()` success path Both sides of `find_object_pos()` report success in the same way: by setting the optional `found` out-parameter and returning the resolved bitmap position. Prepare for adding more bookkeeping around object-position lookups by storing the result in a local `pos` variable and sharing the success return path between the packlist and MIDX cases.
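A toy version of the shape this refactoring produces, with a hypothetical `struct toy_source` lookup standing in for both the packlist and the MIDX: each branch only computes `pos`, and the `found` bookkeeping plus the return live in a single success path. Returning 0 on a miss is this sketch's own choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup source; not Git's packlist or MIDX API. */
struct toy_source {
	const int *ids;		/* object ids; index == local position */
	size_t nr;
};

static int toy_lookup(const struct toy_source *src, int id, uint32_t *pos)
{
	assert(pos != NULL);
	if (!src)
		return 0;
	for (size_t i = 0; i < src->nr; i++) {
		if (src->ids[i] == id) {
			*pos = (uint32_t)i;
			return 1;
		}
	}
	return 0;
}

/* Each branch only computes @pos; success is reported in one place. */
static uint32_t toy_find_object_pos(const struct toy_source *pack,
				    const struct toy_source *midx,
				    uint32_t base_objects, int id, int *found)
{
	uint32_t pos;

	if (toy_lookup(pack, id, &pos)) {
		pos += base_objects;
	} else if (midx) {
		if (!toy_lookup(midx, id, &pos))
			goto missing;
	} else {
		goto missing;
	}

	if (found)
		*found = 1;
	return pos;

missing:
	if (found)
		*found = 0;
	return 0;
}
```

With the shared success path, any later bookkeeping (such as caching the resolved position) only has to be added once, just before the `return pos;`.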
Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- pack-bitmap-write.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 651ad467469f44..42ed22feacc702 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -217,6 +217,7 @@ static uint32_t find_object_pos(struct bitmap_writer *writer, const struct object_id *oid, int *found) { struct object_entry *entry; + uint32_t pos; entry = packlist_find(writer->to_pack, oid); if (entry) { @@ -224,23 +225,22 @@ static uint32_t find_object_pos(struct bitmap_writer *writer, if (writer->midx) base_objects = writer->midx->num_objects + writer->midx->num_objects_in_base; - - if (found) - *found = 1; - return oe_in_pack_pos(writer->to_pack, entry) + base_objects; + pos = oe_in_pack_pos(writer->to_pack, entry) + base_objects; } else if (writer->midx) { - uint32_t at, pos; + uint32_t at; if (!bsearch_midx(oid, writer->midx, &at)) goto missing; if (midx_to_pack_pos(writer->midx, at, &pos) < 0) goto missing; - - if (found) - *found = 1; - return pos; + } else { + goto missing; } + if (found) + *found = 1; + return pos; + missing: if (found) *found = 0; From c720bbcc53f223236220c7a879f0a0e73e5d3739 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 27 May 2026 15:56:02 -0400 Subject: [PATCH 034/107] pack-bitmap: cache object positions during fill The previous commits removed some redundant work from bitmap generation by avoiding unnecessary tree recursion and by reusing selected bitmaps that have already been computed. Even with those changes in place, there is still an extremely hot path from `fill_bitmap_commit()` and `fill_bitmap_tree()` to translate object IDs into their corresponding bit positions in order to generate their bitmaps. In a small repository, this overhead is not significant. 
However, in a very large repository (e.g., the one that we have been using as a benchmark over the past several commits with ~57M total objects), the overhead of locating object bit positions (often repeatedly) adds up significantly. Combat this by adding a small, direct-mapped cache to the bitmap writer which maps object IDs to their corresponding bit positions. Size the cache according to the number of objects being written, with fixed lower and upper bounds so small repositories do not pay for a large table and large repositories can avoid most repeated packlist and MIDX lookups. On my machine with (a somewhat outdated) GCC 15.2.0, each entry in the cache is 40 bytes wide:

    $ pahole -C bitmap_pos_cache_entry pack-bitmap-write.o
    struct bitmap_pos_cache_entry {
            struct object_id           oid;                  /*     0    36 */
            uint32_t                   pos;                  /*    36     4 */

            /* size: 40, cachelines: 1, members: 2 */
            /* last cacheline: 40 bytes */
    };

, and we will allocate up to 2^21 entries for a maximum total of 80 MiB of cache overhead. In our example repository from above and in earlier commits, this results in a ~9.4% reduction in runtime relative to the previous commit:
+------------------+-------------+-------------+---------------------+
|                  | HEAD^       | HEAD        | Delta               |
+------------------+-------------+-------------+---------------------+
| elapsed          |     324.8 s |     294.1 s |     -30.7 s (-9.4%) |
| cycles           |   1,508.6 B |   1,365.5 B |    -143.0 B (-9.5%) |
| instructions     |   1,436.6 B |   1,389.8 B |     -46.9 B (-3.3%) |
| CPI              |       1.050 |       0.983 |      -0.068 (-6.4%) |
+------------------+-------------+-------------+---------------------+
When generating bitmaps on this repository (to produce the above timings), the cache grew to its maximum size of 80 MiB, and resulted in 1.024B cache hits and 59.957M cache misses.
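A self-contained sketch of such a direct-mapped cache, under stated assumptions: a hypothetical 20-byte `struct toy_oid`; a slot index taken from the first four bytes of the hash (similar in spirit to Git's `oidhash()`); and the trick of using bit 31 of `pos` as a "slot is valid" flag, so that a zeroed (`calloc`'d) slot never produces a false hit. The size bounds mirror the patch's `BITMAP_POS_MIN_CACHE_SIZE` and `BITMAP_POS_MAX_CACHE_SIZE` (2^10 and 2^21). None of the `toy_*` names are Git's.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_OID_LEN 20
#define TOY_CACHE_MIN (1U << 10)
#define TOY_CACHE_MAX (1U << 21)
#define TOY_CACHE_VALID (1U << 31)

struct toy_oid { unsigned char hash[TOY_OID_LEN]; };

struct toy_cache_entry {
	struct toy_oid oid;
	uint32_t pos;		/* bit 31 set means "slot is valid" */
};

struct toy_cache {
	struct toy_cache_entry *slots;
	size_t nr;		/* always a power of two */
	unsigned long hits, misses;
};

/* Smallest power of two >= nr_objects, clamped to [MIN, MAX]. */
static size_t toy_cache_size(size_t nr_objects)
{
	size_t nr = TOY_CACHE_MIN;

	while (nr < nr_objects && nr < TOY_CACHE_MAX)
		nr <<= 1;
	return nr;
}

static int toy_cache_init(struct toy_cache *c, size_t nr_objects)
{
	c->nr = toy_cache_size(nr_objects);
	c->slots = calloc(c->nr, sizeof(*c->slots));
	c->hits = c->misses = 0;
	return c->slots ? 0 : -1;
}

static size_t toy_slot(const struct toy_cache *c, const struct toy_oid *oid)
{
	uint32_t h;

	memcpy(&h, oid->hash, sizeof(h));	/* first four bytes of the hash */
	return h & (c->nr - 1);
}

static int toy_cache_get(struct toy_cache *c, const struct toy_oid *oid,
			 uint32_t *pos)
{
	struct toy_cache_entry *e = &c->slots[toy_slot(c, oid)];

	if ((e->pos & TOY_CACHE_VALID) && !memcmp(&e->oid, oid, sizeof(*oid))) {
		c->hits++;
		*pos = e->pos & ~TOY_CACHE_VALID;
		return 1;
	}
	c->misses++;
	return 0;
}

static void toy_cache_put(struct toy_cache *c, const struct toy_oid *oid,
			  uint32_t pos)
{
	struct toy_cache_entry *e = &c->slots[toy_slot(c, oid)];

	assert(!(pos & TOY_CACHE_VALID));	/* positions must fit in 31 bits */
	e->oid = *oid;		/* direct-mapped: overwrite any previous entry */
	e->pos = pos | TOY_CACHE_VALID;
}
```

Because the cache is direct-mapped, storing an object ID that lands in an occupied slot simply evicts the previous entry; on a miss the caller falls back to the full lookup (the packlist or the MIDX, in the patch).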
Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- pack-bitmap-write.c | 88 ++++++++++++++++++++++++++++++++++++++++++++- pack-bitmap.h | 7 ++++ 2 files changed, 94 insertions(+), 1 deletion(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 42ed22feacc702..4b6fb07edd71c9 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -89,6 +89,7 @@ void bitmap_writer_free(struct bitmap_writer *writer) ewah_free(writer->tags); kh_destroy_oid_map(writer->bitmaps); + free(writer->pos_cache); kh_foreach_value(writer->pseudo_merge_commits, idx, free_pseudo_merge_commit_idx(idx)); @@ -213,15 +214,92 @@ void bitmap_writer_push_commit(struct bitmap_writer *writer, writer->selected_nr++; } +struct bitmap_pos_cache_entry { + struct object_id oid; + uint32_t pos; +}; + +#define BITMAP_POS_MIN_CACHE_SIZE (1U << 10) +#define BITMAP_POS_MAX_CACHE_SIZE (1U << 21) +#define BITMAP_POS_CACHE_VALID (1U << 31) + +static void bitmap_writer_init_pos_cache(struct bitmap_writer *writer) +{ + if (writer->pos_cache) + return; + + writer->pos_cache_nr = BITMAP_POS_MIN_CACHE_SIZE; + + while (writer->pos_cache_nr < writer->to_pack->nr_objects && + writer->pos_cache_nr < BITMAP_POS_MAX_CACHE_SIZE) + writer->pos_cache_nr <<= 1; + + CALLOC_ARRAY(writer->pos_cache, writer->pos_cache_nr); +} + +static size_t bitmap_writer_pos_cache_slot(struct bitmap_writer *writer, + const struct object_id *oid) +{ + return oidhash(oid) & (writer->pos_cache_nr - 1); +} + +static bool bitmap_writer_pos_cache_valid(struct bitmap_writer *writer, + size_t slot) +{ + return !!(writer->pos_cache[slot].pos & BITMAP_POS_CACHE_VALID); +} + +static int find_cached_object_pos(struct bitmap_writer *writer, + const struct object_id *oid, uint32_t *pos) +{ + size_t slot = bitmap_writer_pos_cache_slot(writer, oid); + + if (bitmap_writer_pos_cache_valid(writer, slot) && + oideq(&writer->pos_cache[slot].oid, oid)) { + writer->pos_cache_hits++; + *pos = writer->pos_cache[slot].pos & ~BITMAP_POS_CACHE_VALID; + 
return 1; + } + + writer->pos_cache_misses++; + return 0; +} + +static uint32_t store_cached_object_pos(struct bitmap_writer *writer, + const struct object_id *oid, + uint32_t pos) +{ + size_t slot; + + if (pos & BITMAP_POS_CACHE_VALID) + return pos; /* too large to cache */ + + slot = bitmap_writer_pos_cache_slot(writer, oid); + + oidcpy(&writer->pos_cache[slot].oid, oid); + writer->pos_cache[slot].pos = pos | BITMAP_POS_CACHE_VALID; + + return pos; +} + static uint32_t find_object_pos(struct bitmap_writer *writer, const struct object_id *oid, int *found) { struct object_entry *entry; uint32_t pos; + bitmap_writer_init_pos_cache(writer); + + if (find_cached_object_pos(writer, oid, &pos)) { + if (found) + *found = 1; + return pos; + } + entry = packlist_find(writer->to_pack, oid); if (entry) { uint32_t base_objects = 0; + if (writer->midx) base_objects = writer->midx->num_objects + writer->midx->num_objects_in_base; @@ -239,7 +317,7 @@ static uint32_t find_object_pos(struct bitmap_writer *writer, if (found) *found = 1; - return pos; + return store_cached_object_pos(writer, oid, pos); missing: if (found) @@ -662,6 +740,10 @@ int bitmap_writer_build(struct bitmap_writer *writer) writer->progress = start_progress(writer->repo, "Building bitmaps", writer->selected_nr); + + writer->pos_cache_hits = 0; + writer->pos_cache_misses = 0; + trace2_region_enter("pack-bitmap-write", "building_bitmaps_total", writer->repo); @@ -726,6 +808,10 @@ int bitmap_writer_build(struct bitmap_writer *writer) trace2_data_intmax("pack-bitmap-write", writer->repo, "fill_bitmap_commit_found_ancestor_nr", fill_bitmap_commit_found_ancestor_nr); + trace2_data_intmax("pack-bitmap-write", writer->repo, + "bitmap_pos_cache_hits", writer->pos_cache_hits); + trace2_data_intmax("pack-bitmap-write", writer->repo, + "bitmap_pos_cache_misses", writer->pos_cache_misses); stop_progress(&writer->progress); diff --git a/pack-bitmap.h b/pack-bitmap.h index a95e1c2d115a31..19a86554579f7c 100644 --- 
a/pack-bitmap.h +++ b/pack-bitmap.h @@ -132,6 +132,8 @@ int bitmap_has_oid_in_uninteresting(struct bitmap_index *, const struct object_i off_t get_disk_usage_from_bitmap(struct bitmap_index *, struct rev_info *); +struct bitmap_pos_cache_entry; + struct bitmap_writer { struct repository *repo; struct ewah_bitmap *commits; @@ -143,6 +145,11 @@ struct bitmap_writer { struct packing_data *to_pack; struct multi_pack_index *midx; /* if appending to a MIDX chain */ + struct bitmap_pos_cache_entry *pos_cache; + size_t pos_cache_nr; + uint64_t pos_cache_hits; + uint64_t pos_cache_misses; + struct bitmapped_commit *selected; unsigned int selected_nr, selected_alloc; From dcccd997462e2130bcc35f933285ff087454275e Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 27 May 2026 15:56:05 -0400 Subject: [PATCH 035/107] pack-bitmap: sort bitmaps before XORing Reachability bitmaps may be stored as XORs against nearby bitmaps, up to 10 away. However, when callers provide selected commits in an arbitrary order, the writer may miss good ancestor/descendant pairs and produce much larger bitmap files without changing query coverage. Sort the selected bitmaps in date order (from oldest to newest) before computing XOR offsets, leaving pseudo-merge bitmaps alone (which we will deal with separately in following commits). On the same test repository as in previous commits, this change shrank our selection of 1,261 bitmaps from ~635.46 MiB to ~176.4 MiB, a ~72.24% reduction in the on-disk size of our *.bitmap file. The time to generate the smaller bitmap file decreased by ~3.69 seconds, though this is likely mostly noise.
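Why sorting helps: an ancestor's reachability set is a subset of its descendant's, so placing selected commits in date order tends to put related bitmaps next to each other, where XORing leaves few bits. The sketch below shows the shape of that, under simplifying assumptions that are not Git's: each bitmap is a plain 64-bit set rather than an EWAH bitmap, the `selected` type and helpers are hypothetical, and "cost" is approximated by the number of set bits (Git compares the real compressed sizes). Commit dates are only a heuristic stand-in for ancestry, since clocks can be skewed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical simplified selected commit: a date and a tiny bitmap. */
struct selected {
	long long date;
	uint64_t bits;
	int xor_offset; /* 0 = stored as-is, k = XORed against entry i - k */
};

#define MAX_XOR_OFFSET 10 /* "up to 10 away", per the commit message */

static int date_cmp(const void *va, const void *vb)
{
	const struct selected *a = va, *b = vb;

	/* Compare explicitly rather than subtracting, to avoid overflow. */
	if (a->date < b->date)
		return -1;
	if (a->date > b->date)
		return 1;
	return 0;
}

static int popcount64(uint64_t x)
{
	int n = 0;

	for (; x; x &= x - 1) /* clear the lowest set bit each round */
		n++;
	return n;
}

/* Sort oldest-first, then pick the cheapest base among the previous 10. */
static void compute_xor_offsets(struct selected *s, size_t nr)
{
	qsort(s, nr, sizeof(*s), date_cmp);

	for (size_t i = 0; i < nr; i++) {
		int best_cost = popcount64(s[i].bits); /* cost proxy */

		s[i].xor_offset = 0;
		for (int k = 1; k <= MAX_XOR_OFFSET && (size_t)k <= i; k++) {
			int cost = popcount64(s[i].bits ^ s[i - k].bits);

			if (cost < best_cost) {
				best_cost = cost;
				s[i].xor_offset = k;
			}
		}
	}
}
```

Given three commits on one line of history whose bitmaps are `{A}`, `{A, B}`, and `{A, B, C}` but supplied newest-first, the sort puts them oldest-first and each bitmap after the first is then stored as a one-bit XOR against its immediate predecessor.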
Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- pack-bitmap-write.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 4b6fb07edd71c9..66282ea14b5123 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -327,11 +327,40 @@ static uint32_t find_object_pos(struct bitmap_writer *writer, return 0; } +static int bitmapped_commit_date_cmp(const void *_a, const void *_b) +{ + const struct bitmapped_commit *a = _a; + const struct bitmapped_commit *b = _b; + + if (a->commit->date < b->commit->date) + return -1; + if (a->commit->date > b->commit->date) + return 1; + return 0; +} + static void compute_xor_offsets(struct bitmap_writer *writer) { static const int MAX_XOR_OFFSET_SEARCH = 10; int i, next = 0; + int nr = bitmap_writer_nr_selected_commits(writer); + + if (nr > 1) { + QSORT(writer->selected, nr, bitmapped_commit_date_cmp); + + for (i = 0; i < nr; i++) { + struct bitmapped_commit *stored = &writer->selected[i]; + khiter_t hash_pos = kh_get_oid_map(writer->bitmaps, + stored->commit->object.oid); + + if (hash_pos == kh_end(writer->bitmaps)) + BUG("selected commit missing from bitmap map: %s", + oid_to_hex(&stored->commit->object.oid)); + + kh_value(writer->bitmaps, hash_pos) = stored; + } + } while (next < writer->selected_nr) { struct bitmapped_commit *stored = &writer->selected[next]; From b04d26607de35b88cf9c62ca11931d4f8cc4ac05 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 27 May 2026 15:56:08 -0400 Subject: [PATCH 036/107] pack-bitmap: remember pseudo-merge parents write_pseudo_merges() currently builds an array of temporary bitmaps for the parent set of each pseudo-merge, then serializes those bitmaps later while writing the extension. Move those parent bitmaps onto the corresponding bitmapped_commit entries instead. 
This keeps the on-disk output unchanged, but gives the parent bitmap the same lifetime and access pattern that later changes will use when pseudo-merge object bitmaps are built before the write step. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- pack-bitmap-write.c | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 66282ea14b5123..8200aed610135b 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -32,6 +32,7 @@ struct bitmapped_commit { struct commit *commit; struct ewah_bitmap *bitmap; struct ewah_bitmap *write_as; + struct ewah_bitmap *pseudo_merge_parents; int flags; int xor_offset; uint32_t commit_pos; @@ -102,6 +103,7 @@ void bitmap_writer_free(struct bitmap_writer *writer) if (bc->write_as != bc->bitmap) ewah_free(bc->write_as); ewah_free(bc->bitmap); + ewah_free(bc->pseudo_merge_parents); } free(writer->selected); } @@ -210,6 +212,7 @@ void bitmap_writer_push_commit(struct bitmap_writer *writer, writer->selected[writer->selected_nr].write_as = NULL; writer->selected[writer->selected_nr].flags = 0; writer->selected[writer->selected_nr].pseudo_merge = pseudo_merge; + writer->selected[writer->selected_nr].pseudo_merge_parents = NULL; writer->selected_nr++; } @@ -1004,42 +1007,47 @@ static void write_pseudo_merges(struct bitmap_writer *writer, struct hashfile *f) { struct oid_array commits = OID_ARRAY_INIT; - struct bitmap **commits_bitmap = NULL; off_t *pseudo_merge_ofs = NULL; off_t start, table_start, next_ext; uint32_t base = bitmap_writer_nr_selected_commits(writer); size_t i, j = 0; - CALLOC_ARRAY(commits_bitmap, writer->pseudo_merges_nr); CALLOC_ARRAY(pseudo_merge_ofs, writer->pseudo_merges_nr); for (i = 0; i < writer->pseudo_merges_nr; i++) { struct bitmapped_commit *merge = &writer->selected[base + i]; struct commit_list *p; + struct bitmap *parents = bitmap_new(); if (!merge->pseudo_merge) BUG("found non-pseudo merge commit at 
%"PRIuMAX, (uintmax_t)i); - commits_bitmap[i] = bitmap_new(); - for (p = merge->commit->parents; p; p = p->next) - bitmap_set(commits_bitmap[i], + bitmap_set(parents, find_object_pos(writer, &p->item->object.oid, NULL)); + + merge->pseudo_merge_parents = bitmap_to_ewah(parents); + bitmap_free(parents); } start = hashfile_total(f); for (i = 0; i < writer->pseudo_merges_nr; i++) { - struct ewah_bitmap *commits_ewah = bitmap_to_ewah(commits_bitmap[i]); + struct bitmapped_commit *merge = &writer->selected[base + i]; + + if (!merge->pseudo_merge) + BUG("found non-pseudo merge commit at %"PRIuMAX, (uintmax_t)i); + + if (!merge->pseudo_merge_parents) + BUG("missing pseudo-merge parents bitmap for commit %s", + oid_to_hex(&merge->commit->object.oid)); pseudo_merge_ofs[i] = hashfile_total(f); - dump_bitmap(f, commits_ewah); + dump_bitmap(f, merge->pseudo_merge_parents); dump_bitmap(f, writer->selected[base+i].write_as); - - ewah_free(commits_ewah); } next_ext = st_add(hashfile_total(f), @@ -1122,12 +1130,8 @@ static void write_pseudo_merges(struct bitmap_writer *writer, hashwrite_be64(f, table_start - start); hashwrite_be64(f, hashfile_total(f) - start + sizeof(uint64_t)); - for (i = 0; i < writer->pseudo_merges_nr; i++) - bitmap_free(commits_bitmap[i]); - oid_array_clear(&commits); free(pseudo_merge_ofs); - free(commits_bitmap); } static int table_cmp(const void *_va, const void *_vb, void *_data) From 49633dc88c14008f9a405f215b60994362b36d6c Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 27 May 2026 15:56:11 -0400 Subject: [PATCH 037/107] pack-bitmap: build pseudo-merge bitmaps after regular bitmaps MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When generating bitmaps, `bitmap_builder_init()` starts with an initial selection of commits to receive bitmap coverage, and then determines a set of "maximal" commits based on its input. 
Commit 089f751360f (pack-bitmap-write: build fewer intermediate bitmaps, 2020-12-08) has extensive details, but the gist is as follows: Each selected commit starts with one commit_mask bit in its "commit mask" bitmap. Then, we walk the first-parent history in topological order and OR each commit's mask into its (first) parent. Whenever that OR results in the parent having more bits set, the child is deemed to be non-maximal, and the frontier is pushed further back along the first parent history. That approach works extremely well for ordinary selected commits, whose first-parent histories often describe real sharing between the bitmaps we are going to write. It struggles, however, to efficiently generate pseudo-merge bitmaps. Unlike ordinary commits for which the above algorithm is designed, pseudo-merges don't represent any "real" commit in history, just a grouping of non-bitmapped reference tips. In that sense, their first parent is just a part of a larger set, and treating them like ordinary selected commits imposes a significant slow-down when generating bitmaps with pseudo-merges enabled. 
If we partition all non-bitmapped reference tips into eight individual pseudo-merges via the following configuration:

    [bitmapPseudoMerge "all"]
        pattern=refs/
        threshold=now
        stableSize=10000000
        maxMerges=8

then the cost of generating a bitmap from scratch rises significantly:

    +------------------+-----------------+---------------+---------------------+
    |                  | no pseudo-merge | pseudo-merges | Delta               |
    |                  |                 | (HEAD^)       |                     |
    +------------------+-----------------+---------------+---------------------+
    | elapsed          | 294.1 s         | 575.0 s       | +280.9 s (+95.5%)   |
    | cycles           | 1,365.5 B       | 2,686.9 B     | +1,321.4 B (+96.8%) |
    | instructions     | 1,389.8 B       | 2,546.6 B     | +1,156.8 B (+83.2%) |
    | CPI              | 0.983           | 1.055         | +0.073 (+7.4%)      |
    +------------------+-----------------+---------------+---------------------+

This is a particularly poor trade-off, because the time saved by these pseudo-merges during, e.g.,

    $ git rev-list --count --all --objects --use-bitmap-index

is only:

    $ hyperfine -L v true,false -n 'pseudo-merges: {v}' '
        GIT_TEST_USE_PSEUDO_MERGES={v} git.compile rev-list --count \
          --objects --all --use-bitmap-index
      '
    Benchmark 1: pseudo-merges: true
      Time (mean ± σ):      2.613 s ±  0.012 s    [User: 2.308 s, System: 0.305 s]
      Range (min … max):    2.594 s …  2.633 s    10 runs

    Benchmark 2: pseudo-merges: false
      Time (mean ± σ):     52.205 s ±  0.170 s    [User: 51.500 s, System: 0.697 s]
      Range (min … max):   51.956 s … 52.458 s    10 runs

    Summary
      pseudo-merges: true ran
       19.98 ± 0.11 times faster than pseudo-merges: false

In other words, we pay a nearly 5-minute penalty to generate pseudo-merge bitmaps, but save only ~50 seconds during traversal.

The problem stems from injecting pseudo-merges into the bitmap builder as if they were normal commits. The maximal commit selection algorithm was simply not designed for that case, and performs predictably poorly.
The only reason we reused the maximal commit selection routine for pseudo-merges alongside regular (non-pseudo-merge) commits is that we represent both as commit objects (a pseudo-merge commit is a made-up commit rather than one that actually exists in the repository's object store).

Instead, build the regular selected commit bitmaps first, considering only non-pseudo-merge commits in `bitmap_builder_init()`. Once those bitmaps have been stored, build each pseudo-merge bitmap separately and attach its parent and object bitmaps to the corresponding pseudo-merge entry before writing the extension.

This keeps the regular bitmap build shaped like the no-pseudo-merge case. The later pseudo-merge fill can still stop at stored selected ancestor bitmaps, so it does not have to rewalk each pseudo-merge closure from scratch. When an existing bitmap has the same pseudo-merge parent set, reuse and remap that whole pseudo-merge bitmap before falling back to `fill_bitmap_commit()`. This preserves the benefit of stable pseudo-merges while keeping the on-disk format and reader behavior unchanged.

As a result, the overhead of generating pseudo-merges in the above configuration is much smaller:

    +------------------+-----------------+---------------+-------------------+
    |                  | no pseudo-merge | pseudo-merges | Delta             |
    |                  |                 | (HEAD)        |                   |
    +------------------+-----------------+---------------+-------------------+
    | elapsed          | 294.1 s         | 328.4 s       | +34.3 s (+11.7%)  |
    | cycles           | 1,365.5 B       | 1,529.3 B     | +163.7 B (+12.0%) |
    | instructions     | 1,389.8 B       | 1,552.8 B     | +163.0 B (+11.7%) |
    | CPI              | 0.983           | 0.985         | +0.002 (+0.2%)    |
    +------------------+-----------------+---------------+-------------------+

Recall that at the start of this series, generating reachability bitmaps took 612.5 seconds *without* pseudo-merges.
With this commit, it is still ~46.38% *faster* to generate reachability bitmaps *with* pseudo-merges than it was to generate bitmaps without them at the beginning of this series. The changes to implement this are mostly straightforward. We exclude pseudo-merge commits from the existing bitmap generation, and walk over them in a separate pass, by either reusing an existing on-disk pseudo-merge, or passing the pseudo-merge commit itself back to the existing routine in `fill_bitmap_commit()`. (Note that the routine to build pseudo-merge bitmaps is the same both before and after this change; the only difference is that we do not let pseudo-merges participate in determining the set of maximal commits.) The only wrinkle is that `fill_bitmap_commit()` must be taught not to expect that all tree objects have been parsed; they may not be for any portion of history reachable by one or more pseudo-merge(s) but by no non-pseudo-merge commit selected for bitmapping. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- pack-bitmap-write.c | 210 ++++++++++++++++++++++++++++++++++++-------- 1 file changed, 174 insertions(+), 36 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 8200aed610135b..1bcb3f98a42518 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -446,13 +446,17 @@ static void bitmap_builder_init(struct bitmap_builder *bb, revs.topo_order = 1; revs.first_parent_only = 1; - for (i = 0; i < writer->selected_nr; i++) { + for (i = 0; i < bitmap_writer_nr_selected_commits(writer); i++) { struct bitmapped_commit *bc = &writer->selected[i]; struct bb_commit *ent = bb_data_at(&bb->data, bc->commit); + if (bc->pseudo_merge) + BUG("unexpected pseudo-merge at %"PRIuMAX, + (uintmax_t)i); + ent->selected = 1; ent->maximal = 1; - ent->pseudo_merge = bc->pseudo_merge; + ent->pseudo_merge = 0; ent->idx = i; ent->commit_mask = bitmap_new(); @@ -618,6 +622,8 @@ static int fill_bitmap_tree(struct bitmap_writer *writer, static int
reused_bitmaps_nr; static int reused_pseudo_merge_bitmaps_nr; +static int pseudo_merge_bitmap_nr; +static int pseudo_merge_bitmap_parents; static int fill_bitmap_commit_calls_nr; static int fill_bitmap_commit_found_ancestor_nr; @@ -631,8 +637,12 @@ static int fill_bitmap_commit(struct bitmap_writer *writer, const uint32_t *mapping) { int found; + int from_pseudo_merge = commit->object.flags & BITMAP_PSEUDO_MERGE; uint32_t pos; + if (ent->pseudo_merge) + BUG("unexpected pseudo-merge commit in fill_bitmap_commit()"); + fill_bitmap_commit_calls_nr++; if (!ent->bitmap) @@ -648,10 +658,7 @@ static int fill_bitmap_commit(struct bitmap_writer *writer, struct ewah_bitmap *old; struct bitmap *remapped = bitmap_new(); - if (commit->object.flags & BITMAP_PSEUDO_MERGE) - old = pseudo_merge_bitmap_for_commit(old_bitmap, c); - else - old = bitmap_for_commit(old_bitmap, c); + old = bitmap_for_commit(old_bitmap, c); /* * If this commit has an old bitmap, then translate that * bitmap and add its bits to this one. No need to walk @@ -660,10 +667,7 @@ static int fill_bitmap_commit(struct bitmap_writer *writer, if (old && !rebuild_bitmap(mapping, old, remapped)) { bitmap_or(ent->bitmap, remapped); bitmap_free(remapped); - if (commit->object.flags & BITMAP_PSEUDO_MERGE) - reused_pseudo_merge_bitmaps_nr++; - else - reused_bitmaps_nr++; + reused_bitmaps_nr++; continue; } bitmap_free(remapped); @@ -696,12 +700,32 @@ static int fill_bitmap_commit(struct bitmap_writer *writer, * walk ensures we cover all parents. */ if (!(c->object.flags & BITMAP_PSEUDO_MERGE)) { + struct tree *tree; + + if (from_pseudo_merge && !c->object.parsed) { + /* + * Commits reachable from selected + * non-pseudo-merges are already parsed + * by the regular bitmap build. + * + * However, pseudo-merge fills can also + * reach commits that were not covered + * there, so parse any such leftovers + * before reading their tree or parents. 
+ */ + if (repo_parse_commit(writer->repo, c)) + return -1; + } + pos = find_object_pos(writer, &c->object.oid, &found); if (!found) return -1; bitmap_set(ent->bitmap, pos); - prio_queue_put(tree_queue, - repo_get_commit_tree(writer->repo, c)); + + tree = repo_get_commit_tree(writer->repo, c); + if (!tree) + return -1; + prio_queue_put(tree_queue, tree); } for (p = c->parents; p; p = p->next) { @@ -738,6 +762,137 @@ static int fill_bitmap_commit(struct bitmap_writer *writer, return 0; } +static int reuse_pseudo_merge_bitmap(struct bitmap_index *old_bitmap, + const uint32_t *mapping, + struct commit *merge, + struct ewah_bitmap **out) +{ + struct ewah_bitmap *old; + struct bitmap *remapped; + + if (!old_bitmap || !mapping) + return 0; + + old = pseudo_merge_bitmap_for_commit(old_bitmap, merge); + if (!old) + return 0; + + remapped = bitmap_new(); + if (rebuild_bitmap(mapping, old, remapped) < 0) { + bitmap_free(remapped); + return 0; + } + + *out = bitmap_to_ewah(remapped); + bitmap_free(remapped); + reused_pseudo_merge_bitmaps_nr++; + return 1; +} + +static int build_pseudo_merge_bitmap(struct bitmap_writer *writer, + struct bitmap_index *old_bitmap, + const uint32_t *mapping, + struct commit *merge, + struct ewah_bitmap **out) +{ + struct bb_commit ent = { 0 }; + struct prio_queue queue = { NULL }; + struct prio_queue tree_queue = { NULL }; + unsigned parents = commit_list_count(merge->parents); + int ret; + + ent.bitmap = bitmap_new(); + + pseudo_merge_bitmap_nr++; + pseudo_merge_bitmap_parents += parents; + + if (reuse_pseudo_merge_bitmap(old_bitmap, mapping, merge, out)) { + ret = 0; + goto done; + } + + ret = fill_bitmap_commit(writer, &ent, merge, &queue, &tree_queue, + old_bitmap, mapping); + + if (!ret) + *out = bitmap_to_ewah(ent.bitmap); + +done: + bitmap_free(ent.bitmap); + clear_prio_queue(&queue); + clear_prio_queue(&tree_queue); + + return ret; +} + +static int build_pseudo_merge_bitmaps(struct bitmap_writer *writer, + struct bitmap_index *old_bitmap, 
+ const uint32_t *mapping, + int *nr_stored) +{ + size_t i = bitmap_writer_nr_selected_commits(writer); + int ret = 0; + + if (!writer->pseudo_merges_nr) + return 0; + + trace2_region_enter("pack-bitmap-write", "building_pseudo_merge_bitmaps", + writer->repo); + + for (; i < writer->selected_nr; i++) { + struct bitmapped_commit *merge = &writer->selected[i]; + struct commit_list *p; + struct bitmap *parents = bitmap_new(); + struct ewah_bitmap *objects = NULL; + + if (!merge->pseudo_merge) + BUG("found non-pseudo merge commit at %"PRIuMAX, + (uintmax_t)i); + + for (p = merge->commit->parents; p; p = p->next) { + int found; + uint32_t pos = find_object_pos(writer, + &p->item->object.oid, + &found); + if (!found) { + bitmap_free(parents); + ret = -1; + goto done; + } + bitmap_set(parents, pos); + } + + merge->pseudo_merge_parents = bitmap_to_ewah(parents); + bitmap_free(parents); + + if (build_pseudo_merge_bitmap(writer, old_bitmap, mapping, + merge->commit, &objects) < 0) { + ret = -1; + goto done; + } + merge->bitmap = objects; + + (*nr_stored)++; + display_progress(writer->progress, *nr_stored); + } + +done: + trace2_region_leave("pack-bitmap-write", "building_pseudo_merge_bitmaps", + writer->repo); + + trace2_data_intmax("pack-bitmap-write", writer->repo, + "pseudo_merge_bitmap_nr", + pseudo_merge_bitmap_nr); + trace2_data_intmax("pack-bitmap-write", writer->repo, + "building_bitmaps_pseudo_merge_reused", + reused_pseudo_merge_bitmaps_nr); + trace2_data_intmax("pack-bitmap-write", writer->repo, + "pseudo_merge_bitmap_parents", + pseudo_merge_bitmap_parents); + + return ret; +} + static void store_selected(struct bitmap_writer *writer, struct bb_commit *ent, struct commit *commit) { @@ -821,6 +976,10 @@ int bitmap_writer_build(struct bitmap_writer *writer) bitmap_free(ent->bitmap); ent->bitmap = NULL; } + if (closed && + build_pseudo_merge_bitmaps(writer, old_bitmap, mapping, + &nr_stored) < 0) + closed = 0; clear_prio_queue(&queue); clear_prio_queue(&tree_queue); 
bitmap_builder_clear(&bb); @@ -831,9 +990,6 @@ int bitmap_writer_build(struct bitmap_writer *writer) writer->repo); trace2_data_intmax("pack-bitmap-write", writer->repo, "building_bitmaps_reused", reused_bitmaps_nr); - trace2_data_intmax("pack-bitmap-write", writer->repo, - "building_bitmaps_pseudo_merge_reused", - reused_pseudo_merge_bitmaps_nr); trace2_data_intmax("pack-bitmap-write", writer->repo, "fill_bitmap_commit_calls_nr", fill_bitmap_commit_calls_nr); @@ -1015,23 +1171,6 @@ static void write_pseudo_merges(struct bitmap_writer *writer, CALLOC_ARRAY(pseudo_merge_ofs, writer->pseudo_merges_nr); - for (i = 0; i < writer->pseudo_merges_nr; i++) { - struct bitmapped_commit *merge = &writer->selected[base + i]; - struct commit_list *p; - struct bitmap *parents = bitmap_new(); - - if (!merge->pseudo_merge) - BUG("found non-pseudo merge commit at %"PRIuMAX, (uintmax_t)i); - - for (p = merge->commit->parents; p; p = p->next) - bitmap_set(parents, - find_object_pos(writer, &p->item->object.oid, - NULL)); - - merge->pseudo_merge_parents = bitmap_to_ewah(parents); - bitmap_free(parents); - } - start = hashfile_total(f); for (i = 0; i < writer->pseudo_merges_nr; i++) { @@ -1040,14 +1179,13 @@ static void write_pseudo_merges(struct bitmap_writer *writer, if (!merge->pseudo_merge) BUG("found non-pseudo merge commit at %"PRIuMAX, (uintmax_t)i); - if (!merge->pseudo_merge_parents) - BUG("missing pseudo-merge parents bitmap for commit %s", + if (!merge->pseudo_merge_parents || !merge->bitmap) + BUG("missing pseudo-merge bitmap for commit %s", oid_to_hex(&merge->commit->object.oid)); pseudo_merge_ofs[i] = hashfile_total(f); - dump_bitmap(f, merge->pseudo_merge_parents); - dump_bitmap(f, writer->selected[base+i].write_as); + dump_bitmap(f, merge->bitmap); } next_ext = st_add(hashfile_total(f), From 9f4e170dfc3bd8cdd284f1c4411b25ce1d09737f Mon Sep 17 00:00:00 2001 From: Kristofer Karlsson Date: Wed, 27 May 2026 15:50:00 +0000 Subject: [PATCH 038/107] pack-objects: call 
release_revisions() after cruft traversal enumerate_and_traverse_cruft_objects() initializes a rev_info on the stack but never calls release_revisions() afterwards. This is not visible on master but becomes a leak once the revision walking machinery uses dynamically allocated structures. Add the missing release_revisions() call. Signed-off-by: Kristofer Karlsson Signed-off-by: Junio C Hamano --- builtin/pack-objects.c | 1 + 1 file changed, 1 insertion(+) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 480cc0bd8c8d22..67025e86256cfd 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -4275,6 +4275,7 @@ static void enumerate_and_traverse_cruft_objects(struct string_list *fresh_packs traverse_commit_list(&revs, show_cruft_commit, show_cruft_object, NULL); stop_progress(&progress_state); + release_revisions(&revs); } static void read_cruft_objects(void) From d877b1af507a6aaf55e8643eb73277a30d3a800b Mon Sep 17 00:00:00 2001 From: Kristofer Karlsson Date: Wed, 27 May 2026 15:50:01 +0000 Subject: [PATCH 039/107] revision: introduce rev_walk_mode to clarify get_revision_1() get_revision_1() dispatches to different walk strategies based on a combination of rev_info flags: reflog_info, topo_walk_info, and limited. These conditions are checked in multiple places within the function -- once to select the next commit, and again to decide how to expand parents -- and the two chains must stay in sync. Extract the mode selection into a rev_walk_mode enum and a small get_walk_mode() helper, resolved once at the top of get_revision_1(). Both dispatch sites now switch on the same mode variable, making it obvious that they agree and easier to verify that all modes are handled. No functional change. 
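The refactoring above resolves the walk mode once and then switches on it at each dispatch site. A minimal model of that pattern follows; the `walk_flags` struct, `WALK_*` enumerators, and `next_commit_source()` helper are hypothetical stand-ins for the `rev_info` fields and the `get_revision_1()` logic. The precedence (reflog, then topo, then limited, else streaming) mirrors the original `if`/`else` chain.

```c
#include <assert.h>

/* Hypothetical stand-in for the rev_info fields that pick a strategy. */
struct walk_flags {
	int has_reflog_info;
	int has_topo_walk_info;
	int limited;
};

enum walk_mode {
	WALK_REFLOG,
	WALK_TOPO,
	WALK_LIMITED,
	WALK_STREAMING,
};

/* Same precedence as the original if/else chain. */
static enum walk_mode get_walk_mode(const struct walk_flags *f)
{
	if (f->has_reflog_info)
		return WALK_REFLOG;
	if (f->has_topo_walk_info)
		return WALK_TOPO;
	if (f->limited)
		return WALK_LIMITED;
	return WALK_STREAMING;
}

/*
 * One dispatch site: where the next commit comes from. There is no
 * "default:" label, so GCC/Clang's -Wswitch (enabled by -Wall) can
 * flag any enumerator a future change forgets to handle here.
 */
static const char *next_commit_source(enum walk_mode mode)
{
	switch (mode) {
	case WALK_REFLOG:
		return "reflog";
	case WALK_TOPO:
		return "topo";
	case WALK_LIMITED:
	case WALK_STREAMING:
		return "list";
	}
	return "unreachable"; /* keeps -Wreturn-type quiet */
}
```

Because every site switches on the same resolved value, the sites cannot disagree about which mode is active, which is the property the commit message is after.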
Signed-off-by: Kristofer Karlsson Signed-off-by: Junio C Hamano --- revision.c | 62 ++++++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 48 insertions(+), 14 deletions(-) diff --git a/revision.c b/revision.c index e1970b9c5d34ed..9d0fc696d09937 100644 --- a/revision.c +++ b/revision.c @@ -4327,22 +4327,48 @@ static void track_linear(struct rev_info *revs, struct commit *commit) revs->previous_parents = commit_list_copy(commit->parents); } +enum rev_walk_mode { + REV_WALK_REFLOG, + REV_WALK_TOPO, + REV_WALK_LIMITED, + REV_WALK_STREAMING, +}; + +static enum rev_walk_mode get_walk_mode(struct rev_info *revs) +{ + if (revs->reflog_info) + return REV_WALK_REFLOG; + if (revs->topo_walk_info) + return REV_WALK_TOPO; + if (revs->limited) + return REV_WALK_LIMITED; + return REV_WALK_STREAMING; +} + static struct commit *get_revision_1(struct rev_info *revs) { + enum rev_walk_mode mode = get_walk_mode(revs); + while (1) { struct commit *commit; - if (revs->reflog_info) + switch (mode) { + case REV_WALK_REFLOG: commit = next_reflog_entry(revs->reflog_info); - else if (revs->topo_walk_info) + break; + case REV_WALK_TOPO: commit = next_topo_commit(revs); - else + break; + case REV_WALK_LIMITED: + case REV_WALK_STREAMING: commit = pop_commit(&revs->commits); + break; + } if (!commit) return NULL; - if (revs->reflog_info) + if (mode == REV_WALK_REFLOG) commit->object.flags &= ~(ADDED | SEEN | SHOWN); /* @@ -4350,20 +4376,28 @@ static struct commit *get_revision_1(struct rev_info *revs) * the parents here. We also need to do the date-based limiting * that we'd otherwise have done in limit_list(). 
*/ - if (!revs->limited) { - if (revs->max_age != -1 && - comparison_date(revs, commit) < revs->max_age) - continue; + if (mode != REV_WALK_LIMITED && + revs->max_age != -1 && + comparison_date(revs, commit) < revs->max_age) + continue; - if (revs->reflog_info) - try_to_simplify_commit(revs, commit); - else if (revs->topo_walk_info) - expand_topo_walk(revs, commit); - else if (process_parents(revs, commit, &revs->commits, NULL) < 0) { + switch (mode) { + case REV_WALK_REFLOG: + try_to_simplify_commit(revs, commit); + break; + case REV_WALK_TOPO: + expand_topo_walk(revs, commit); + break; + case REV_WALK_STREAMING: + if (process_parents(revs, commit, + &revs->commits, NULL) < 0) { if (!revs->ignore_missing_links) die("Failed to traverse parents of commit %s", - oid_to_hex(&commit->object.oid)); + oid_to_hex(&commit->object.oid)); } + break; + case REV_WALK_LIMITED: + break; } switch (simplify_commit(revs, commit)) { From dd4bc01c0a8fc871a68a5027ed5ac953fa47fc6e Mon Sep 17 00:00:00 2001 From: Kristofer Karlsson Date: Wed, 27 May 2026 15:50:02 +0000 Subject: [PATCH 040/107] revision: use priority queue for non-limited streaming walks The streaming (non-limited) walk in get_revision_1() inserts newly discovered parent commits into a date-sorted queue via commit_list_insert_by_date(), which scans the linked list to find the insertion point -- O(w) per insert, where w is the width of the active walk frontier. Replace this with an O(log w) priority queue. Add a commit_queue field to rev_info alongside the existing commits linked list. The two representations are mutually exclusive: setup and external callers that need list access use the linked list, then get_revision_1() lazily drains it into the priority queue on first call. Add a REV_WALK_NO_WALK enum value to distinguish the no_walk case (which still uses the commit list) from the streaming case. The conversion function rev_info_commit_list_to_queue() is public so callers that know they will iterate can convert early. 
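To make the O(w) versus O(log w) difference concrete, here is a minimal binary max-heap keyed on commit date. It is a sketch with hypothetical types (`commit`, `commit_heap`, `heap_put()`, `heap_get()`), not Git's `prio_queue` API. Inserting sifts a new entry up at most log2(w) levels instead of scanning the list for its slot. The removed `commit_list_insert_by_date()` placed a new commit after any existing entries with the same date, so the sketch breaks date ties with an insertion counter to keep that first-in, first-out order.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal commit: only the timestamp matters for ordering. */
struct commit { long long date; };

struct heap_entry { struct commit *c; unsigned ctr; };

struct commit_heap {
	struct heap_entry *a;
	size_t nr, alloc;
	unsigned ctr; /* insertion counter, used to break date ties */
};

/* Nonzero if x should come out first: newer date, then earlier insert. */
static int before(const struct heap_entry *x, const struct heap_entry *y)
{
	if (x->c->date != y->c->date)
		return x->c->date > y->c->date;
	return x->ctr < y->ctr;
}

static void swap_entries(struct heap_entry *x, struct heap_entry *y)
{
	struct heap_entry t = *x;
	*x = *y;
	*y = t;
}

/* O(log w): append at the end, then sift up toward the root. */
static int heap_put(struct commit_heap *h, struct commit *c)
{
	size_t i;

	if (h->nr == h->alloc) {
		size_t n = h->alloc ? 2 * h->alloc : 16;
		struct heap_entry *p = realloc(h->a, n * sizeof(*p));

		if (!p)
			return -1;
		h->a = p;
		h->alloc = n;
	}
	i = h->nr++;
	h->a[i].c = c;
	h->a[i].ctr = h->ctr++;
	while (i && before(&h->a[i], &h->a[(i - 1) / 2])) {
		swap_entries(&h->a[i], &h->a[(i - 1) / 2]);
		i = (i - 1) / 2;
	}
	return 0;
}

/* O(log w): take the root, move the last entry up, then sift it down. */
static struct commit *heap_get(struct commit_heap *h)
{
	struct commit *top;
	size_t i = 0;

	if (!h->nr)
		return NULL;
	top = h->a[0].c;
	h->a[0] = h->a[--h->nr];
	for (;;) {
		size_t l = 2 * i + 1, r = l + 1, best = i;

		if (l < h->nr && before(&h->a[l], &h->a[best]))
			best = l;
		if (r < h->nr && before(&h->a[r], &h->a[best]))
			best = r;
		if (best == i)
			break;
		swap_entries(&h->a[i], &h->a[best]);
		i = best;
	}
	return top;
}
```

Popping returns the newest commit first, as the date-sorted list did. Two commits with the same date come out in the order they were inserted.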
Combined with the limit_list() priority queue change already in master, this eliminates all O(w) sorted linked-list insertion from the revision walk machinery. Signed-off-by: Kristofer Karlsson Signed-off-by: Junio C Hamano --- commit.c | 13 ------------- commit.h | 2 -- revision.c | 55 +++++++++++++++++++++++++++++------------------------- revision.h | 12 +++++++++++- 4 files changed, 41 insertions(+), 41 deletions(-) diff --git a/commit.c b/commit.c index e3e7352e69682d..5112c7b2af31b1 100644 --- a/commit.c +++ b/commit.c @@ -729,19 +729,6 @@ void commit_list_free(struct commit_list *list) pop_commit(&list); } -struct commit_list * commit_list_insert_by_date(struct commit *item, struct commit_list **list) -{ - struct commit_list **pp = list; - struct commit_list *p; - while ((p = *pp) != NULL) { - if (p->item->date < item->date) { - break; - } - pp = &p->next; - } - return commit_list_insert(item, pp); -} - static int commit_list_compare_by_date(const struct commit_list *a, const struct commit_list *b) { diff --git a/commit.h b/commit.h index 58150045afafed..385492fbb1ecc5 100644 --- a/commit.h +++ b/commit.h @@ -191,8 +191,6 @@ int commit_list_contains(struct commit *item, struct commit_list **commit_list_append(struct commit *commit, struct commit_list **next); unsigned commit_list_count(const struct commit_list *l); -struct commit_list *commit_list_insert_by_date(struct commit *item, - struct commit_list **list); void commit_list_sort_by_date(struct commit_list **list); /* Shallow copy of the input list */ diff --git a/revision.c b/revision.c index 9d0fc696d09937..4bb3b16e43acb9 100644 --- a/revision.c +++ b/revision.c @@ -1116,7 +1116,7 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit) } static int process_parents(struct rev_info *revs, struct commit *commit, - struct commit_list **list, struct prio_queue *queue) + struct prio_queue *queue) { struct commit_list *parent = commit->parents; unsigned pass_flags; @@ -1158,8 
+1158,6 @@ static int process_parents(struct rev_info *revs, struct commit *commit, if (p->object.flags & SEEN) continue; p->object.flags |= (SEEN | NOT_USER_GIVEN); - if (list) - commit_list_insert_by_date(p, list); if (queue) prio_queue_put(queue, p); if (revs->exclude_first_parent_only) @@ -1207,8 +1205,6 @@ static int process_parents(struct rev_info *revs, struct commit *commit, p->object.flags |= pass_flags | CHILD_VISITED; if (!(p->object.flags & SEEN)) { p->object.flags |= (SEEN | NOT_USER_GIVEN); - if (list) - commit_list_insert_by_date(p, list); if (queue) prio_queue_put(queue, p); } @@ -1470,7 +1466,7 @@ static int limit_list(struct rev_info *revs) if (revs->max_age != -1 && (commit->date < revs->max_age)) obj->flags |= UNINTERESTING; - if (process_parents(revs, commit, NULL, &queue) < 0) { + if (process_parents(revs, commit, &queue) < 0) { clear_prio_queue(&queue); return -1; } @@ -3257,6 +3253,7 @@ static void free_void_commit_list(void *list) void release_revisions(struct rev_info *revs) { commit_list_free(revs->commits); + clear_prio_queue(&revs->commit_queue); commit_list_free(revs->ancestry_path_bottoms); release_display_notes(&revs->notes_opt); object_array_clear(&revs->pending); @@ -3726,7 +3723,7 @@ static void explore_walk_step(struct rev_info *revs) if (revs->max_age != -1 && (c->date < revs->max_age)) c->object.flags |= UNINTERESTING; - if (process_parents(revs, c, NULL, NULL) < 0) + if (process_parents(revs, c, NULL) < 0) return; if (c->object.flags & UNINTERESTING) @@ -3902,7 +3899,7 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit) { struct commit_list *p; struct topo_walk_info *info = revs->topo_walk_info; - if (process_parents(revs, commit, NULL, NULL) < 0) { + if (process_parents(revs, commit, NULL) < 0) { if (!revs->ignore_missing_links) die("Failed to traverse parents of commit %s", oid_to_hex(&commit->object.oid)); @@ -3938,6 +3935,13 @@ static void expand_topo_walk(struct rev_info *revs, struct commit 
*commit) } } +void rev_info_commit_list_to_queue(struct rev_info *revs) +{ + while (revs->commits) + prio_queue_put(&revs->commit_queue, pop_commit(&revs->commits)); +} + + int prepare_revision_walk(struct rev_info *revs) { int i; @@ -4006,7 +4010,7 @@ static enum rewrite_result rewrite_one_1(struct rev_info *revs, for (;;) { struct commit *p = *pp; if (!revs->limited) - if (process_parents(revs, p, NULL, queue) < 0) + if (process_parents(revs, p, queue) < 0) return rewrite_one_error; if (p->object.flags & UNINTERESTING) return rewrite_one_ok; @@ -4020,27 +4024,18 @@ static enum rewrite_result rewrite_one_1(struct rev_info *revs, } } -static void merge_queue_into_list(struct prio_queue *q, struct commit_list **list) +static void merge_queue_into_prio_queue(struct prio_queue *from, + struct prio_queue *to) { - while (q->nr) { - struct commit *item = prio_queue_peek(q); - struct commit_list *p = *list; - - if (p && p->item->date >= item->date) - list = &p->next; - else { - p = commit_list_insert(item, list); - list = &p->next; /* skip newly added item */ - prio_queue_get(q); /* pop item */ - } - } + while (from->nr) + prio_queue_put(to, prio_queue_get(from)); } static enum rewrite_result rewrite_one(struct rev_info *revs, struct commit **pp) { struct prio_queue queue = { compare_commits_by_commit_date }; enum rewrite_result ret = rewrite_one_1(revs, pp, &queue); - merge_queue_into_list(&queue, &revs->commits); + merge_queue_into_prio_queue(&queue, &revs->commit_queue); clear_prio_queue(&queue); return ret; } @@ -4331,6 +4326,7 @@ enum rev_walk_mode { REV_WALK_REFLOG, REV_WALK_TOPO, REV_WALK_LIMITED, + REV_WALK_NO_WALK, REV_WALK_STREAMING, }; @@ -4342,6 +4338,8 @@ static enum rev_walk_mode get_walk_mode(struct rev_info *revs) return REV_WALK_TOPO; if (revs->limited) return REV_WALK_LIMITED; + if (revs->no_walk) + return REV_WALK_NO_WALK; return REV_WALK_STREAMING; } @@ -4349,6 +4347,9 @@ static struct commit *get_revision_1(struct rev_info *revs) { enum rev_walk_mode 
mode = get_walk_mode(revs); + if (mode == REV_WALK_STREAMING && revs->commits) + rev_info_commit_list_to_queue(revs); + while (1) { struct commit *commit; @@ -4360,9 +4361,12 @@ static struct commit *get_revision_1(struct rev_info *revs) commit = next_topo_commit(revs); break; case REV_WALK_LIMITED: - case REV_WALK_STREAMING: + case REV_WALK_NO_WALK: commit = pop_commit(&revs->commits); break; + case REV_WALK_STREAMING: + commit = prio_queue_get(&revs->commit_queue); + break; } if (!commit) @@ -4390,12 +4394,13 @@ static struct commit *get_revision_1(struct rev_info *revs) break; case REV_WALK_STREAMING: if (process_parents(revs, commit, - &revs->commits, NULL) < 0) { + &revs->commit_queue) < 0) { if (!revs->ignore_missing_links) die("Failed to traverse parents of commit %s", oid_to_hex(&commit->object.oid)); } break; + case REV_WALK_NO_WALK: case REV_WALK_LIMITED: break; } diff --git a/revision.h b/revision.h index 584f1338b5e323..04982a3d47f28f 100644 --- a/revision.h +++ b/revision.h @@ -12,6 +12,7 @@ #include "decorate.h" #include "ident.h" #include "list-objects-filter-options.h" +#include "prio-queue.h" #include "strvec.h" /** @@ -122,8 +123,14 @@ struct oidset; struct topo_walk_info; struct rev_info { - /* Starting list */ + /* + * Work queue of commits, stored as either a linked list or a + * priority queue, but never both at the same time. + * rev_info_commit_list_to_queue() converts list to queue. + */ struct commit_list *commits; + struct prio_queue commit_queue; + struct object_array pending; struct repository *repo; @@ -400,6 +407,7 @@ struct rev_info { * uninitialized. */ #define REV_INFO_INIT { \ + .commit_queue = { .compare = compare_commits_by_commit_date }, \ .abbrev = DEFAULT_ABBREV, \ .simplify_history = 1, \ .pruning.flags.recursive = 1, \ @@ -478,6 +486,8 @@ void reset_revision_walk(void); */ int prepare_revision_walk(struct rev_info *revs); +/* Drain the commits linked list into the priority queue. 
*/ +void rev_info_commit_list_to_queue(struct rev_info *revs); /** * Takes a pointer to a `rev_info` structure and iterates over it, returning a * `struct commit *` each time you call it. The end of the revision list is From 8c84e6802c0e23503bfe655dadcdc4a15de7373a Mon Sep 17 00:00:00 2001 From: Kristofer Karlsson Date: Thu, 28 May 2026 09:00:48 +0000 Subject: [PATCH 041/107] t3070: skip ls-files tests with backslash patterns on Windows MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit On Windows (MINGW), backslashes in pathspecs are silently converted to forward slashes (directory separators), which changes the glob semantics. This causes 36 test failures in t3070-wildmatch when the "via ls-files" variants test patterns containing backslash escapes (e.g. '\[ab]', '[\-_]', '[A-\\]'). The wildmatch function itself handles these patterns correctly — only the ls-files code path fails because pathspec parsing converts the backslashes before they reach the glob matcher. Skip these ls-files tests on platforms where BSLASHPSPEC is not set, which is the existing prereq that captures exactly this semantic: "backslashes in pathspec are not directory separators." 
Signed-off-by: Kristofer Karlsson Signed-off-by: Junio C Hamano --- t/t3070-wildmatch.sh | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh index 655bb1a0f21031..33941222189663 100755 --- a/t/t3070-wildmatch.sh +++ b/t/t3070-wildmatch.sh @@ -99,6 +99,13 @@ match_with_ls_files() { match_function=$4 ls_files_args=$5 + prereqs=EXPENSIVE_ON_WINDOWS + case "$pattern" in + *\\*) + prereqs="$prereqs,BSLASHPSPEC" + ;; + esac + match_stdout_stderr_cmp=" tr -d '\0' actual && test_must_be_empty actual.err && @@ -108,36 +115,36 @@ match_with_ls_files() { then if test -e .git/created_test_file then - test_expect_success EXPENSIVE_ON_WINDOWS "$match_function (via ls-files): match dies on '$pattern' '$text'" " + test_expect_success $prereqs "$match_function (via ls-files): match dies on '$pattern' '$text'" " printf '%s' '$text' >expect && test_must_fail git$ls_files_args ls-files -z -- '$pattern' " else - test_expect_failure EXPENSIVE_ON_WINDOWS "$match_function (via ls-files): match skip '$pattern' '$text'" 'false' + test_expect_failure $prereqs "$match_function (via ls-files): match skip '$pattern' '$text'" 'false' fi elif test "$match_expect" = 1 then if test -e .git/created_test_file then - test_expect_success EXPENSIVE_ON_WINDOWS "$match_function (via ls-files): match '$pattern' '$text'" " + test_expect_success $prereqs "$match_function (via ls-files): match '$pattern' '$text'" " printf '%s' '$text' >expect && git$ls_files_args ls-files -z -- '$pattern' >actual.raw 2>actual.err && $match_stdout_stderr_cmp " else - test_expect_failure EXPENSIVE_ON_WINDOWS "$match_function (via ls-files): match skip '$pattern' '$text'" 'false' + test_expect_failure $prereqs "$match_function (via ls-files): match skip '$pattern' '$text'" 'false' fi elif test "$match_expect" = 0 then if test -e .git/created_test_file then - test_expect_success EXPENSIVE_ON_WINDOWS "$match_function (via ls-files): no match '$pattern' 
'$text'" " + test_expect_success $prereqs "$match_function (via ls-files): no match '$pattern' '$text'" " >expect && git$ls_files_args ls-files -z -- '$pattern' >actual.raw 2>actual.err && $match_stdout_stderr_cmp " else - test_expect_failure EXPENSIVE_ON_WINDOWS "$match_function (via ls-files): no match skip '$pattern' '$text'" 'false' + test_expect_failure $prereqs "$match_function (via ls-files): no match skip '$pattern' '$text'" 'false' fi else test_expect_success "PANIC: Test framework error. Unknown matches value $match_expect" 'false' From 1ec041bebb46159562c4beeb2e6980284e0f9a28 Mon Sep 17 00:00:00 2001 From: Michael Montalbo Date: Thu, 28 May 2026 19:21:45 +0000 Subject: [PATCH 042/107] doc: clarify that --word-diff operates on line-level hunks The --word-diff documentation describes the output modes and word-regex mechanics but does not explain that word-diff operates within the hunks produced by the line-level diff rather than performing an independent word-stream comparison. This can surprise users when the line-level alignment causes word-level changes to appear even though the words in both files are identical. Add an implementation note explaining the two-stage relationship and that the output may change if Git acquires a different implementation in the future. Signed-off-by: Michael Montalbo Signed-off-by: Junio C Hamano --- Documentation/diff-options.adoc | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/Documentation/diff-options.adoc b/Documentation/diff-options.adoc index 9cdad6f72a0c7d..88b724b8c6dba4 100644 --- a/Documentation/diff-options.adoc +++ b/Documentation/diff-options.adoc @@ -455,6 +455,14 @@ endif::git-diff[] + Note that despite the name of the first mode, color is used to highlight the changed parts in all modes if enabled. ++ +The `--word-diff` option operates by taking the same line-by-line +diff that is produced without the option and computing +word-by-word changes within each hunk. 
This may produce a +larger diff than a dedicated word-diff tool would. If Git +acquires a different implementation in the future, the output +may change. Note that this is similar to the `--diff-algorithm` +option, which may also change the output. `--word-diff-regex=`:: Use __ to decide what a word is, instead of considering From 558057cf4f43ea3b28c5e0b1b2250cab362f1a6a Mon Sep 17 00:00:00 2001 From: Michael Montalbo Date: Thu, 28 May 2026 20:47:44 +0000 Subject: [PATCH 043/107] revision: move -L setup before output_format-to-diff derivation The line_level_traverse block sets a default DIFF_FORMAT_PATCH when no output format has been explicitly requested. This default must be visible to the "Did the user ask for any diff output?" check that derives revs->diff from revs->diffopt.output_format. Currently the -L block runs after that derivation, so revs->diff stays 0 when no explicit format is given. This does not matter yet because log_tree_commit() short-circuits into line_log_print() before consulting revs->diff, but the next commit will route -L through the normal log_tree_diff() path, which checks revs->diff. Move the block above the derivation so the default DIFF_FORMAT_PATCH is in place when revs->diff is computed. No behavior change on its own. Signed-off-by: Michael Montalbo Signed-off-by: Junio C Hamano --- revision.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/revision.c b/revision.c index 599b3a66c369ca..4a8e24bc38d572 100644 --- a/revision.c +++ b/revision.c @@ -3112,6 +3112,14 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s object_context_release(&oc); } + if (revs->line_level_traverse) { + if (want_ancestry(revs)) + revs->limited = 1; + revs->topo_order = 1; + if (!revs->diffopt.output_format) + revs->diffopt.output_format = DIFF_FORMAT_PATCH; + } + /* Did the user ask for any diff output? Run the diff! 
*/ if (revs->diffopt.output_format & ~DIFF_FORMAT_NO_OUTPUT) revs->diff = 1; @@ -3125,14 +3133,6 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s if (revs->diffopt.objfind) revs->simplify_history = 0; - if (revs->line_level_traverse) { - if (want_ancestry(revs)) - revs->limited = 1; - revs->topo_order = 1; - if (!revs->diffopt.output_format) - revs->diffopt.output_format = DIFF_FORMAT_PATCH; - } - if (revs->topo_order && !generation_numbers_enabled(the_repository)) revs->limited = 1; From 42d960748efa79a31e72cc36d983aca244dc167e Mon Sep 17 00:00:00 2001 From: Michael Montalbo Date: Thu, 28 May 2026 20:47:45 +0000 Subject: [PATCH 044/107] line-log: integrate -L output with the standard log-tree pipeline `git log -L` has bypassed log_tree_diff() and log_tree_diff_flush() since the feature was introduced, short-circuiting from log_tree_commit() directly into line_log_print(). This skips the no_free save/restore (noted in a NEEDSWORK comment added by f8781bfda3), the always_show_header fallback, show_diff_of_diff(), and diff_free() cleanup. Restructure so that -L flows through log_tree_diff() -> log_tree_diff_flush(), the same path used by the normal single-parent and merge diff codepaths: - Rename line_log_print() to line_log_queue_pairs() and strip it down to just queuing pre-computed filepairs. The show_log(), separator, diffcore_std(), and diff_flush() calls are removed since log_tree_diff_flush() handles all of those. - In log_tree_diff(), call line_log_queue_pairs() then log_tree_diff_flush(), mirroring the diff_tree_oid() + flush pattern used by the single-parent and merge codepaths. - Remove the early return in log_tree_commit() that is no longer needed now that -L output flows through log_tree_diff() and log_tree_diff_flush(); this restores no_free save/restore, always_show_header, and diff_free() cleanup. 
Because show_log() is now deferred until after diffcore_std() inside log_tree_diff_flush(), pickaxe (-S, -G, --find-object) and --diff-filter now properly suppress commits when all pairs are filtered out. The blank-line separator between commit header and diff changes slightly: the old code printed one unconditionally, while log_tree_diff_flush() only emits one for verbose headers. This matches the rest of log output. Also reject --full-diff, which is not yet supported with -L: the filepairs are pre-computed during the history walk and scoped to tracked line ranges, so there is currently no full-tree diff to fall back to for display. Update tests accordingly. Signed-off-by: Michael Montalbo Signed-off-by: Junio C Hamano --- line-log.c | 30 ++++------- line-log.h | 2 +- log-tree.c | 10 ++-- revision.c | 6 ++- t/t4211-line-log.sh | 53 ++++++++++++++----- t/t4211/sha1/expect.parallel-change-f-to-main | 1 - .../sha256/expect.parallel-change-f-to-main | 1 - 7 files changed, 60 insertions(+), 43 deletions(-) diff --git a/line-log.c b/line-log.c index 858a899cd2a61d..7ee55b05cc5077 100644 --- a/line-log.c +++ b/line-log.c @@ -13,7 +13,6 @@ #include "revision.h" #include "xdiff-interface.h" #include "strbuf.h" -#include "log-tree.h" #include "line-log.h" #include "setup.h" #include "strvec.h" @@ -1004,29 +1003,18 @@ static int process_all_files(struct line_log_data **range_out, return changed; } -int line_log_print(struct rev_info *rev, struct commit *commit) +void line_log_queue_pairs(struct rev_info *rev, struct commit *commit) { - show_log(rev); - if (!(rev->diffopt.output_format & DIFF_FORMAT_NO_OUTPUT)) { - struct line_log_data *range = lookup_line_range(rev, commit); - struct line_log_data *r; - const char *prefix = diff_line_prefix(&rev->diffopt); - - fprintf(rev->diffopt.file, "%s\n", prefix); - - for (r = range; r; r = r->next) { - if (r->pair) { - struct diff_filepair *p = - diff_filepair_dup(r->pair); - p->line_ranges = &r->ranges; - diff_q(&diff_queued_diff, 
p); - } - } + struct line_log_data *range = lookup_line_range(rev, commit); + struct line_log_data *r; - diffcore_std(&rev->diffopt); - diff_flush(&rev->diffopt); + for (r = range; r; r = r->next) { + if (r->pair) { + struct diff_filepair *p = diff_filepair_dup(r->pair); + p->line_ranges = &r->ranges; + diff_q(&diff_queued_diff, p); + } } - return 1; } static int bloom_filter_check(struct rev_info *rev, diff --git a/line-log.h b/line-log.h index 04a6ea64d3d68f..99e1755ce3d568 100644 --- a/line-log.h +++ b/line-log.h @@ -46,7 +46,7 @@ int line_log_filter(struct rev_info *rev); int line_log_process_ranges_arbitrary_commit(struct rev_info *rev, struct commit *commit); -int line_log_print(struct rev_info *rev, struct commit *commit); +void line_log_queue_pairs(struct rev_info *rev, struct commit *commit); void line_log_free(struct rev_info *rev); diff --git a/log-tree.c b/log-tree.c index 7e048701d0c5b4..88b3019293b725 100644 --- a/log-tree.c +++ b/log-tree.c @@ -1105,6 +1105,12 @@ static int log_tree_diff(struct rev_info *opt, struct commit *commit, struct log if (!all_need_diff && !opt->merges_need_diff) return 0; + if (opt->line_level_traverse) { + line_log_queue_pairs(opt, commit); + log_tree_diff_flush(opt); + return !opt->loginfo; + } + parse_commit_or_die(commit); oid = get_commit_tree_oid(commit); @@ -1179,10 +1185,6 @@ int log_tree_commit(struct rev_info *opt, struct commit *commit) opt->loginfo = &log; opt->diffopt.no_free = 1; - /* NEEDSWORK: no restoring of no_free? Why? 
*/ - if (opt->line_level_traverse) - return line_log_print(opt, commit); - if (opt->track_linear && !opt->linear && !opt->reverse_output_stage) fprintf(opt->diffopt.file, "\n%s\n", opt->break_bar); shown = log_tree_diff(opt, commit, &log); diff --git a/revision.c b/revision.c index 4a8e24bc38d572..c903f7a1b4c4c8 100644 --- a/revision.c +++ b/revision.c @@ -3179,8 +3179,10 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s die(_("the option '%s' requires '%s'"), "--grep-reflog", "--walk-reflogs"); if (revs->line_level_traverse && - (revs->diffopt.output_format & ~(DIFF_FORMAT_PATCH | DIFF_FORMAT_NO_OUTPUT))) - die(_("-L does not yet support diff formats besides -p and -s")); + (revs->full_diff || + (revs->diffopt.output_format & + ~(DIFF_FORMAT_PATCH | DIFF_FORMAT_NO_OUTPUT)))) + die(_("-L does not yet support the requested diff format")); if (revs->expand_tabs_in_log < 0) revs->expand_tabs_in_log = revs->expand_tabs_in_log_default; diff --git a/t/t4211-line-log.sh b/t/t4211-line-log.sh index aaf197d2edc4d8..e3937138a94055 100755 --- a/t/t4211-line-log.sh +++ b/t/t4211-line-log.sh @@ -368,7 +368,6 @@ test_expect_success '-L diff output includes index and new file mode' ' test_expect_success '-L with --word-diff' ' cat >expect <<-\EOF && - diff --git a/file.c b/file.c --- a/file.c +++ b/file.c @@ -377,7 +376,6 @@ test_expect_success '-L with --word-diff' ' { return [-F2;-]{+F2 + 2;+} } - diff --git a/file.c b/file.c new file mode 100644 --- /dev/null @@ -433,7 +431,6 @@ test_expect_success 'show line-log with graph' ' null_blob=$(test_oid zero | cut -c1-7) && qz_to_tab_space >expect <<-EOF && * $head_oid Modify func2() in file.c - |Z | diff --git a/file.c b/file.c | index $head_blob_old..$head_blob_new 100644 | --- a/file.c @@ -445,7 +442,6 @@ test_expect_success 'show line-log with graph' ' | + return F2 + 2; | } * $root_oid Add func1() and func2() in file.c - ZZ diff --git a/file.c b/file.c new file mode 100644 index 
$null_blob..$root_blob @@ -494,23 +490,17 @@ test_expect_success '-L --find-object does not crash with merge and rename' ' --find-object=$(git rev-parse HEAD:file) >actual ' -# Commit-level filtering with pickaxe does not yet work for -L. -# show_log() prints the commit header before diffcore_std() runs -# pickaxe, so commits cannot be suppressed even when no diff pairs -# survive filtering. Fixing this would require deferring show_log() -# until after diffcore_std(), which is a larger restructuring of the -# log-tree output pipeline. -test_expect_failure '-L -G should filter commits by pattern' ' +test_expect_success '-L -G should filter commits by pattern' ' git log --format="%s" --no-patch -L 1,1:file -G "nomatch" >actual && test_must_be_empty actual ' -test_expect_failure '-L -S should filter commits by pattern' ' +test_expect_success '-L -S should filter commits by pattern' ' git log --format="%s" --no-patch -L 1,1:file -S "nomatch" >actual && test_must_be_empty actual ' -test_expect_failure '-L --find-object should filter commits by object' ' +test_expect_success '-L --find-object should filter commits by object' ' git log --format="%s" --no-patch -L 1,1:file \ --find-object=$ZERO_OID >actual && test_must_be_empty actual @@ -711,4 +701,41 @@ test_expect_success '-L with -G filters to diff-text matches' ' grep "F2 + 2" actual ' +test_expect_success '-L with --diff-filter=M excludes root commit' ' + git checkout parent-oids && + git log -L:func2:file.c --diff-filter=M --format=%s --no-patch >actual && + # Root commit is an Add (A), not a Modify (M), so it should + # be excluded; only the modification commit remains. 
+ echo "Modify func2() in file.c" >expect && + test_cmp expect actual +' + +test_expect_success '-L with --diff-filter=A shows only root commit' ' + git checkout parent-oids && + git log -L:func2:file.c --diff-filter=A --format=%s --no-patch >actual && + echo "Add func1() and func2() in file.c" >expect && + test_cmp expect actual +' + +test_expect_success '-L with -S suppresses non-matching commits' ' + git checkout parent-oids && + git log -L:func2:file.c -S "F2 + 2" --format=%s --no-patch >actual && + # Only the commit that changes the count of "F2 + 2" should appear. + echo "Modify func2() in file.c" >expect && + test_cmp expect actual +' + +test_expect_success '--full-diff is not yet supported with -L' ' + test_must_fail git log -L1,24:b.c --full-diff 2>err && + test_grep "does not yet support" err +' + +test_expect_success '-L --oneline has no extra blank line before diff' ' + git checkout parent-oids && + git log --oneline -L:func2:file.c -1 >actual && + # Oneline header on line 1, diff starts immediately on line 2 + sed -n 2p actual >line2 && + test_grep "^diff --git" line2 +' + test_done diff --git a/t/t4211/sha1/expect.parallel-change-f-to-main b/t/t4211/sha1/expect.parallel-change-f-to-main index 65a8cc673a6fca..6d7a20103631cc 100644 --- a/t/t4211/sha1/expect.parallel-change-f-to-main +++ b/t/t4211/sha1/expect.parallel-change-f-to-main @@ -5,7 +5,6 @@ Date: Fri Apr 12 16:16:24 2013 +0200 Merge across the rename - commit 6ce3c4ff690136099bb17e1a8766b75764726ea7 Author: Thomas Rast Date: Thu Feb 28 10:49:50 2013 +0100 diff --git a/t/t4211/sha256/expect.parallel-change-f-to-main b/t/t4211/sha256/expect.parallel-change-f-to-main index 3178989253a885..c93e03bef40544 100644 --- a/t/t4211/sha256/expect.parallel-change-f-to-main +++ b/t/t4211/sha256/expect.parallel-change-f-to-main @@ -5,7 +5,6 @@ Date: Fri Apr 12 16:16:24 2013 +0200 Merge across the rename - commit 4f7a58195a92c400e28a2354328587f1ff14fb77f5cf894536f17ccbc72931b9 Author: Thomas Rast Date: Thu Feb 
28 10:49:50 2013 +0100 From 4b5d8a0163fe4e9a4ac074f407e0599ba27acf68 Mon Sep 17 00:00:00 2001 From: Michael Montalbo Date: Thu, 28 May 2026 20:47:46 +0000 Subject: [PATCH 045/107] line-log: allow non-patch diff formats with -L Now that -L flows through log_tree_diff_flush() and diff_flush(), metadata-only diff formats work because they only read filepair fields (status, mode, path, oid) already set on the pre-computed pairs. Expand the allowlist in setup_revisions() to also accept --raw, --name-only, --name-status, and --summary. Diff stat formats (--stat, --numstat, --shortstat, --dirstat) remain blocked because they call compute_diffstat() on full blob content and would show whole-file statistics rather than range-scoped ones. Signed-off-by: Michael Montalbo Signed-off-by: Junio C Hamano --- Documentation/line-range-options.adoc | 10 +++--- revision.c | 4 ++- t/t4211-line-log.sh | 47 +++++++++++++++++++++++++-- 3 files changed, 54 insertions(+), 7 deletions(-) diff --git a/Documentation/line-range-options.adoc b/Documentation/line-range-options.adoc index ecb2c79fb9bde8..72f639b5e79ea4 100644 --- a/Documentation/line-range-options.adoc +++ b/Documentation/line-range-options.adoc @@ -8,12 +8,14 @@ give zero or one positive revision arguments, and __ and __ (or __) must exist in the starting revision. You can specify this option more than once. Implies `--patch`. - Patch output can be suppressed using `--no-patch`, but other diff formats - (namely `--raw`, `--numstat`, `--shortstat`, `--dirstat`, `--summary`, - `--name-only`, `--name-status`, `--check`) are not currently implemented. + Patch output can be suppressed using `--no-patch`. + Non-patch diff formats `--raw`, `--name-only`, `--name-status`, + and `--summary` are supported. Diff stat formats + (`--stat`, `--numstat`, `--shortstat`, `--dirstat`) are not + currently implemented. 
+ Patch formatting options such as `--word-diff`, `--color-moved`, `--no-prefix`, and whitespace options (`-w`, `-b`) are supported, -as are pickaxe options (`-S`, `-G`). +as are pickaxe options (`-S`, `-G`) and `--diff-filter`. + include::line-range-format.adoc[] diff --git a/revision.c b/revision.c index c903f7a1b4c4c8..f26fc1f4d5e48e 100644 --- a/revision.c +++ b/revision.c @@ -3181,7 +3181,9 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s if (revs->line_level_traverse && (revs->full_diff || (revs->diffopt.output_format & - ~(DIFF_FORMAT_PATCH | DIFF_FORMAT_NO_OUTPUT)))) + ~(DIFF_FORMAT_PATCH | DIFF_FORMAT_NO_OUTPUT | + DIFF_FORMAT_RAW | DIFF_FORMAT_NAME | + DIFF_FORMAT_NAME_STATUS | DIFF_FORMAT_SUMMARY)))) die(_("-L does not yet support the requested diff format")); if (revs->expand_tabs_in_log < 0) diff --git a/t/t4211-line-log.sh b/t/t4211-line-log.sh index e3937138a94055..ca4eb7bbc713ef 100755 --- a/t/t4211-line-log.sh +++ b/t/t4211-line-log.sh @@ -155,8 +155,45 @@ test_expect_success '-p shows the default patch output' ' test_cmp expect actual ' -test_expect_success '--raw is forbidden' ' - test_must_fail git log -L1,24:b.c --raw +test_expect_success '--raw shows mode, oid, status and path' ' + git log -L1,24:b.c --raw --format= >actual && + test_grep "^:100644 100644 [0-9a-f]\{7\} [0-9a-f]\{7\} M b.c$" actual && + test_grep ! "^diff --git" actual && + test_grep ! "^@@" actual +' + +test_expect_success '--name-only shows path' ' + git log -L1,24:b.c --name-only --format= >actual && + test_grep "^b.c$" actual && + test_grep ! "^diff --git" actual && + test_grep ! "^@@" actual +' + +test_expect_success '--name-status shows status and path' ' + git log -L1,24:b.c --name-status --format= >actual && + test_grep "^M b.c$" actual && + test_grep ! "^diff --git" actual && + test_grep ! 
"^@@" actual +' + +test_expect_success '--stat is not yet supported with -L' ' + test_must_fail git log -L1,24:b.c --stat 2>err && + test_grep "does not yet support" err +' + +test_expect_success '--numstat is not yet supported with -L' ' + test_must_fail git log -L1,24:b.c --numstat 2>err && + test_grep "does not yet support" err +' + +test_expect_success '--shortstat is not yet supported with -L' ' + test_must_fail git log -L1,24:b.c --shortstat 2>err && + test_grep "does not yet support" err +' + +test_expect_success '--dirstat is not yet supported with -L' ' + test_must_fail git log -L1,24:b.c --dirstat 2>err && + test_grep "does not yet support" err ' test_expect_success 'setup for checking fancy rename following' ' @@ -738,4 +775,10 @@ test_expect_success '-L --oneline has no extra blank line before diff' ' test_grep "^diff --git" line2 ' +test_expect_success '--summary shows new file on root commit' ' + git checkout parent-oids && + git log -L:func2:file.c --summary --format= >actual && + test_grep "create mode 100644 file.c" actual +' + test_done From b8cda126b4e0fbfd514b26dec4ee8a1c6849abe9 Mon Sep 17 00:00:00 2001 From: Sebastien Tardif Date: Thu, 28 May 2026 02:56:54 +0000 Subject: [PATCH 046/107] daemon: fix IPv6 address corruption in lookup_hostname() getaddrinfo() is called with AF_UNSPEC hints, so it may return IPv6 results. However, the code unconditionally casts ai_addr to sockaddr_in and passes AF_INET to inet_ntop(). On IPv6-only hosts, this reads from the wrong struct offset, producing garbage IP addresses. Fix this by checking ai_family and extracting the address pointer into a local variable before calling inet_ntop() once with the correct family. Die on unexpected address families. 
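The pattern this fix adopts can be shown in isolation. The following is a minimal POSIX sketch, not daemon.c itself. The hypothetical `first_address()` resolves a numeric host with `getaddrinfo()`; `AI_NUMERICHOST` is set so the demo performs no DNS lookup. It then picks `sin_addr` or `sin6_addr` according to `ai_family`, and passes that same family to `inet_ntop()`.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: format the first address getaddrinfo() returns
 * for 'host' into 'buf'. Returns 0 on success, -1 on failure.
 */
static int first_address(const char *host, char *buf, socklen_t len)
{
	struct addrinfo hints, *ai;
	const void *addr;
	int ret = -1;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;     /* may yield IPv4 or IPv6 */
	hints.ai_flags = AI_NUMERICHOST; /* no DNS lookup for this demo */
	if (getaddrinfo(host, NULL, &hints, &ai))
		return -1;

	/* Select the address member that matches the actual family. */
	if (ai->ai_family == AF_INET)
		addr = &((struct sockaddr_in *)ai->ai_addr)->sin_addr;
	else if (ai->ai_family == AF_INET6)
		addr = &((struct sockaddr_in6 *)ai->ai_addr)->sin6_addr;
	else
		goto out; /* daemon.c dies here instead */

	/* Pass the same family to inet_ntop() that selected 'addr'. */
	if (inet_ntop(ai->ai_family, addr, buf, len))
		ret = 0;
out:
	freeaddrinfo(ai);
	return ret;
}

int main(void)
{
	char buf[INET6_ADDRSTRLEN];

	assert(first_address("127.0.0.1", buf, sizeof(buf)) == 0);
	printf("%s\n", buf); /* 127.0.0.1 */
	assert(first_address("::1", buf, sizeof(buf)) == 0);
	printf("%s\n", buf); /* ::1 */
	return 0;
}
```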
Signed-off-by: Sebastien Tardif Signed-off-by: Junio C Hamano --- daemon.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/daemon.c b/daemon.c index 0a7b1aae447912..80fa0226d89f03 100644 --- a/daemon.c +++ b/daemon.c @@ -674,9 +674,20 @@ static void lookup_hostname(struct hostinfo *hi) gai = getaddrinfo(hi->hostname.buf, NULL, &hints, &ai); if (!gai) { - struct sockaddr_in *sin_addr = (void *)ai->ai_addr; + void *addr; + + if (ai->ai_family == AF_INET) { + struct sockaddr_in *sa = (void *)ai->ai_addr; + addr = &sa->sin_addr; + } else if (ai->ai_family == AF_INET6) { + struct sockaddr_in6 *sa6 = (void *)ai->ai_addr; + addr = &sa6->sin6_addr; + } else { + die("unexpected address family: %d", + ai->ai_family); + } - inet_ntop(AF_INET, &sin_addr->sin_addr, + inet_ntop(ai->ai_family, addr, addrbuf, sizeof(addrbuf)); strbuf_addstr(&hi->ip_address, addrbuf); From 30c8fda1ab6d55d3b0129bb1686c23bf06cd5b0d Mon Sep 17 00:00:00 2001 From: Sebastien Tardif Date: Thu, 28 May 2026 02:56:55 +0000 Subject: [PATCH 047/107] daemon: fix IPv6 address truncation in ip2str() The sockaddr struct size (ai_addrlen) is passed as the output buffer size to inet_ntop(). For IPv6, sizeof(sockaddr_in6) is 28 bytes but INET6_ADDRSTRLEN is 46, so long IPv6 addresses are silently truncated. Fix this by passing sizeof(ip) instead, which is the actual size of the destination buffer. Drop the now-unused len parameter from ip2str() and update all callers. 
Signed-off-by: Sebastien Tardif Signed-off-by: Junio C Hamano --- daemon.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/daemon.c b/daemon.c index 80fa0226d89f03..103c08d868d5de 100644 --- a/daemon.c +++ b/daemon.c @@ -947,7 +947,7 @@ struct socketlist { size_t alloc; }; -static const char *ip2str(int family, struct sockaddr *sin, socklen_t len) +static const char *ip2str(int family, struct sockaddr *sin) { #ifdef NO_IPV6 static char ip[INET_ADDRSTRLEN]; @@ -958,11 +958,11 @@ static const char *ip2str(int family, struct sockaddr *sin, socklen_t len) switch (family) { #ifndef NO_IPV6 case AF_INET6: - inet_ntop(family, &((struct sockaddr_in6*)sin)->sin6_addr, ip, len); + inet_ntop(family, &((struct sockaddr_in6*)sin)->sin6_addr, ip, sizeof(ip)); break; #endif case AF_INET: - inet_ntop(family, &((struct sockaddr_in*)sin)->sin_addr, ip, len); + inet_ntop(family, &((struct sockaddr_in*)sin)->sin_addr, ip, sizeof(ip)); break; default: xsnprintf(ip, sizeof(ip), ""); @@ -1019,14 +1019,14 @@ static int setup_named_sock(char *listen_addr, int listen_port, struct socketlis if (bind(sockfd, ai->ai_addr, ai->ai_addrlen) < 0) { logerror("Could not bind to %s: %s", - ip2str(ai->ai_family, ai->ai_addr, ai->ai_addrlen), + ip2str(ai->ai_family, ai->ai_addr), strerror(errno)); close(sockfd); continue; /* not fatal */ } if (listen(sockfd, 5) < 0) { logerror("Could not listen to %s: %s", - ip2str(ai->ai_family, ai->ai_addr, ai->ai_addrlen), + ip2str(ai->ai_family, ai->ai_addr), strerror(errno)); close(sockfd); continue; /* not fatal */ @@ -1080,7 +1080,7 @@ static int setup_named_sock(char *listen_addr, int listen_port, struct socketlis if ( bind(sockfd, (struct sockaddr *)&sin, sizeof sin) < 0 ) { logerror("Could not bind to %s: %s", - ip2str(AF_INET, (struct sockaddr *)&sin, sizeof(sin)), + ip2str(AF_INET, (struct sockaddr *)&sin), strerror(errno)); close(sockfd); return 0; @@ -1088,7 +1088,7 @@ static int setup_named_sock(char *listen_addr, int 
listen_port, struct socketlis if (listen(sockfd, 5) < 0) { logerror("Could not listen to %s: %s", - ip2str(AF_INET, (struct sockaddr *)&sin, sizeof(sin)), + ip2str(AF_INET, (struct sockaddr *)&sin), strerror(errno)); close(sockfd); return 0; From 422a5bf57575a8c5d06faedfd77376501917e22c Mon Sep 17 00:00:00 2001 From: Sebastien Tardif Date: Thu, 28 May 2026 02:56:56 +0000 Subject: [PATCH 048/107] daemon: guard NULL REMOTE_PORT in execute() logging REMOTE_ADDR and REMOTE_PORT are both set by the same code path in handle(), so when the existing REMOTE_ADDR check passes, REMOTE_PORT is guaranteed to be non-NULL. Guard REMOTE_PORT as well so that a future change that breaks this invariant does not pass NULL to printf's %s, which is undefined behavior. Signed-off-by: Sebastien Tardif Signed-off-by: Junio C Hamano --- daemon.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/daemon.c b/daemon.c index 103c08d868d5de..78cca8673fdb34 100644 --- a/daemon.c +++ b/daemon.c @@ -753,7 +753,7 @@ static int execute(void) struct strvec env = STRVEC_INIT; if (addr) - loginfo("Connection from %s:%s", addr, port); + loginfo("Connection from %s:%s", addr, port ? port : "?"); set_keep_alive(0); alarm(init_timeout ? init_timeout : timeout); From 514f039c9052c23047c310f911ba8c0c2e74a1c7 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 1 Jun 2026 10:20:24 +0200 Subject: [PATCH 049/107] odb/source-loose: move loose source into "odb/" subsystem In subsequent patches we'll be turning `struct odb_source_loose` into a proper `struct odb_source`. As a first step towards this goal, move its struct out of "object-file.c" and into "odb/source-loose.c". This detaches the implementation of the loose object source from the generic object file code, following the same convention already used by the "files" and "in-memory" sources. No functional changes are intended. 
Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- Makefile | 1 + meson.build | 1 + object-file.c | 8 -------- object-file.h | 21 +-------------------- odb/source-loose.c | 10 ++++++++++ odb/source-loose.h | 34 ++++++++++++++++++++++++++++++++++ 6 files changed, 47 insertions(+), 28 deletions(-) create mode 100644 odb/source-loose.c create mode 100644 odb/source-loose.h diff --git a/Makefile b/Makefile index a43b8ee0674df8..01356235c3e11d 100644 --- a/Makefile +++ b/Makefile @@ -1217,6 +1217,7 @@ LIB_OBJS += odb.o LIB_OBJS += odb/source.o LIB_OBJS += odb/source-files.o LIB_OBJS += odb/source-inmemory.o +LIB_OBJS += odb/source-loose.o LIB_OBJS += odb/streaming.o LIB_OBJS += odb/transaction.o LIB_OBJS += oid-array.o diff --git a/meson.build b/meson.build index 664d8313295a26..c85e5988351b1f 100644 --- a/meson.build +++ b/meson.build @@ -405,6 +405,7 @@ libgit_sources = [ 'odb/source.c', 'odb/source-files.c', 'odb/source-inmemory.c', + 'odb/source-loose.c', 'odb/streaming.c', 'odb/transaction.c', 'oid-array.c', diff --git a/object-file.c b/object-file.c index 90f995d0000bf6..641bd9c0799dec 100644 --- a/object-file.c +++ b/object-file.c @@ -2205,14 +2205,6 @@ struct odb_transaction *odb_transaction_files_begin(struct odb_source *source) return &transaction->base; } -struct odb_source_loose *odb_source_loose_new(struct odb_source *source) -{ - struct odb_source_loose *loose; - CALLOC_ARRAY(loose, 1); - loose->source = source; - return loose; -} - void odb_source_loose_free(struct odb_source_loose *loose) { if (!loose) diff --git a/object-file.h b/object-file.h index 5241b8dd5c564d..1d8312cf7f9ff9 100644 --- a/object-file.h +++ b/object-file.h @@ -4,6 +4,7 @@ #include "git-zlib.h" #include "object.h" #include "odb.h" +#include "odb/source-loose.h" struct index_state; @@ -20,26 +21,6 @@ struct object_info; struct odb_read_stream; struct odb_source; -struct odb_source_loose { - struct odb_source *source; - - /* - * Used to store the results of readdir(3) 
calls when we are OK - * sacrificing accuracy due to races for speed. That includes - * object existence with OBJECT_INFO_QUICK, as well as - * our search for unique abbreviated hashes. Don't use it for tasks - * requiring greater accuracy! - * - * Be sure to call odb_load_loose_cache() before using. - */ - uint32_t subdir_seen[8]; /* 256 bits */ - struct oidtree *cache; - - /* Map between object IDs for loose objects. */ - struct loose_object_map *map; -}; - -struct odb_source_loose *odb_source_loose_new(struct odb_source *source); void odb_source_loose_free(struct odb_source_loose *loose); /* Reprepare the loose source by emptying the loose object cache. */ diff --git a/odb/source-loose.c b/odb/source-loose.c new file mode 100644 index 00000000000000..b944d2181324ce --- /dev/null +++ b/odb/source-loose.c @@ -0,0 +1,10 @@ +#include "git-compat-util.h" +#include "odb/source-loose.h" + +struct odb_source_loose *odb_source_loose_new(struct odb_source *source) +{ + struct odb_source_loose *loose; + CALLOC_ARRAY(loose, 1); + loose->source = source; + return loose; +} diff --git a/odb/source-loose.h b/odb/source-loose.h new file mode 100644 index 00000000000000..8b4bac77ea39e8 --- /dev/null +++ b/odb/source-loose.h @@ -0,0 +1,34 @@ +#ifndef ODB_SOURCE_LOOSE_H +#define ODB_SOURCE_LOOSE_H + +#include "odb/source.h" + +struct object_database; +struct oidtree; + +/* + * An object database source that stores its objects in loose format, one + * file per object. This source is part of the files source. + */ +struct odb_source_loose { + struct odb_source *source; + + /* + * Used to store the results of readdir(3) calls when we are OK + * sacrificing accuracy due to races for speed. That includes + * object existence with OBJECT_INFO_QUICK, as well as + * our search for unique abbreviated hashes. Don't use it for tasks + * requiring greater accuracy! + * + * Be sure to call odb_load_loose_cache() before using. 
+ */ + uint32_t subdir_seen[8]; /* 256 bits */ + struct oidtree *cache; + + /* Map between object IDs for loose objects. */ + struct loose_object_map *map; +}; + +struct odb_source_loose *odb_source_loose_new(struct odb_source *source); + +#endif From 1d451ba6fec076d357abf62607b97f585283030a Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 1 Jun 2026 10:20:25 +0200 Subject: [PATCH 050/107] odb/source-loose: store pointer to "files" instead of generic source The `struct odb_source_loose` holds a pointer to its owning parent source. The way that Git is currently structured, this parent is always the "files" source. In subsequent commits we're going to detangle that so that the "loose" source doesn't have any owning parent source at all so that it can be used as a completely standalone source. Detangling this mess is somewhat intricate though, and is made even more intricate because it's not always clear which kind of source one is holding at a specific point in time -- either the parent "files" source, or the child "loose" source. Make this relationship more explicit by storing a pointer to the "files" source instead of storing a pointer to a generic `struct odb_source`. This will help make subsequent steps a bit clearer. Note that this is a temporary step, only. At the end of this series we will have dropped the parent pointer completely. 
Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- object-file.c | 4 ++-- odb/source-files.c | 2 +- odb/source-loose.c | 4 ++-- odb/source-loose.h | 5 +++-- 4 files changed, 8 insertions(+), 7 deletions(-) diff --git a/object-file.c b/object-file.c index 641bd9c0799dec..7a1908bfc05cb5 100644 --- a/object-file.c +++ b/object-file.c @@ -178,7 +178,7 @@ static int open_loose_object(struct odb_source_loose *loose, static struct strbuf buf = STRBUF_INIT; int fd; - *path = odb_loose_path(loose->source, &buf, oid); + *path = odb_loose_path(&loose->files->base, &buf, oid); fd = git_open(*path); if (fd >= 0) return fd; @@ -189,7 +189,7 @@ static int open_loose_object(struct odb_source_loose *loose, static int quick_has_loose(struct odb_source_loose *loose, const struct object_id *oid) { - return !!oidtree_contains(odb_source_loose_cache(loose->source, oid), oid); + return !!oidtree_contains(odb_source_loose_cache(&loose->files->base, oid), oid); } /* diff --git a/odb/source-files.c b/odb/source-files.c index b5abd20e971e78..185cc6903e35f2 100644 --- a/odb/source-files.c +++ b/odb/source-files.c @@ -264,7 +264,7 @@ struct odb_source_files *odb_source_files_new(struct object_database *odb, CALLOC_ARRAY(files, 1); odb_source_init(&files->base, odb, ODB_SOURCE_FILES, path, local); - files->loose = odb_source_loose_new(&files->base); + files->loose = odb_source_loose_new(files); files->packed = packfile_store_new(&files->base); files->base.free = odb_source_files_free; diff --git a/odb/source-loose.c b/odb/source-loose.c index b944d2181324ce..c9e7414814814d 100644 --- a/odb/source-loose.c +++ b/odb/source-loose.c @@ -1,10 +1,10 @@ #include "git-compat-util.h" #include "odb/source-loose.h" -struct odb_source_loose *odb_source_loose_new(struct odb_source *source) +struct odb_source_loose *odb_source_loose_new(struct odb_source_files *files) { struct odb_source_loose *loose; CALLOC_ARRAY(loose, 1); - loose->source = source; + loose->files = files; return loose; } 
diff --git a/odb/source-loose.h b/odb/source-loose.h index 8b4bac77ea39e8..bf61e767c8aab4 100644 --- a/odb/source-loose.h +++ b/odb/source-loose.h @@ -3,6 +3,7 @@ #include "odb/source.h" +struct odb_source_files; struct object_database; struct oidtree; @@ -11,7 +12,7 @@ struct oidtree; * file per object. This source is part of the files source. */ struct odb_source_loose { - struct odb_source *source; + struct odb_source_files *files; /* * Used to store the results of readdir(3) calls when we are OK @@ -29,6 +30,6 @@ struct odb_source_loose { struct loose_object_map *map; }; -struct odb_source_loose *odb_source_loose_new(struct odb_source *source); +struct odb_source_loose *odb_source_loose_new(struct odb_source_files *files); #endif From ead691927b05dbbd2655db9a7183d5fcb935bf3b Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 1 Jun 2026 10:20:26 +0200 Subject: [PATCH 051/107] odb/source-loose: start converting to a proper `struct odb_source` Start converting `struct odb_source_loose` into a proper pluggable `struct odb_source` by embedding the base struct and assigning it the new `ODB_SOURCE_LOOSE` type. Furthermore, wire up lifecycle management of this source by implementing the `free` callback and taking ownership of the chdir notifications. Note that the loose source is not yet functional as a standalone `struct odb_source`, as it's missing all of the callback implementations. These will be wired up in subsequent commits. 
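The embedding-plus-downcast pattern this patch adopts can be sketched in isolation. All names below (`struct source`, `struct loose_source`, `loose_downcast()`, `source_free()`) are hypothetical, simplified stand-ins, and the `container_of` macro is a local definition built on offsetof(), not the one from git-compat-util.h. As in the patch, the downcast checks the type tag before casting and aborts on a mismatch, standing in for BUG().

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Local stand-in for container_of(); not Git's definition. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

enum source_type { SOURCE_FILES, SOURCE_LOOSE };

/* Generic base: a type tag plus lifecycle callbacks. */
struct source {
	enum source_type type;
	void (*free)(struct source *);
};

/* Concrete backend embedding the base struct. */
struct loose_source {
	struct source base;
	int cache_entries;
};

static struct loose_source *loose_downcast(struct source *s)
{
	if (s->type != SOURCE_LOOSE) {
		fprintf(stderr, "BUG: downcast of source type %d to loose\n",
			(int)s->type);
		abort();
	}
	return container_of(s, struct loose_source, base);
}

static void loose_free(struct source *s)
{
	free(loose_downcast(s));
}

static struct loose_source *loose_new(void)
{
	struct loose_source *loose = calloc(1, sizeof(*loose));
	if (!loose)
		return NULL;
	loose->base.type = SOURCE_LOOSE;
	loose->base.free = loose_free;	/* wire up the callback */
	return loose;
}

/* Generic entry point: callers only ever see the base struct. */
static void source_free(struct source *s)
{
	if (s)
		s->free(s);
}
```

Because the owner now releases its child through the generic interface (compare `odb_source_free(&files->loose->base)` in the diff), it no longer needs to know about the concrete free function of the loose backend.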
Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- object-file.c | 17 ----------------- object-file.h | 2 -- odb/source-files.c | 2 +- odb/source-loose.c | 45 +++++++++++++++++++++++++++++++++++++++++++++ odb/source-loose.h | 14 ++++++++++++++ odb/source.h | 3 +++ 6 files changed, 63 insertions(+), 20 deletions(-) diff --git a/object-file.c b/object-file.c index 7a1908bfc05cb5..977d959d333166 100644 --- a/object-file.c +++ b/object-file.c @@ -2041,14 +2041,6 @@ static struct oidtree *odb_source_loose_cache(struct odb_source *source, return files->loose->cache; } -static void odb_source_loose_clear_cache(struct odb_source_loose *loose) -{ - oidtree_clear(loose->cache); - FREE_AND_NULL(loose->cache); - memset(&loose->subdir_seen, 0, - sizeof(loose->subdir_seen)); -} - void odb_source_loose_reprepare(struct odb_source *source) { struct odb_source_files *files = odb_source_files_downcast(source); @@ -2205,15 +2197,6 @@ struct odb_transaction *odb_transaction_files_begin(struct odb_source *source) return &transaction->base; } -void odb_source_loose_free(struct odb_source_loose *loose) -{ - if (!loose) - return; - odb_source_loose_clear_cache(loose); - loose_object_map_clear(&loose->map); - free(loose); -} - struct odb_loose_read_stream { struct odb_read_stream base; git_zstream z; diff --git a/object-file.h b/object-file.h index 1d8312cf7f9ff9..02c9680980ab0f 100644 --- a/object-file.h +++ b/object-file.h @@ -21,8 +21,6 @@ struct object_info; struct odb_read_stream; struct odb_source; -void odb_source_loose_free(struct odb_source_loose *loose); - /* Reprepare the loose source by emptying the loose object cache. 
*/ void odb_source_loose_reprepare(struct odb_source *source); diff --git a/odb/source-files.c b/odb/source-files.c index 185cc6903e35f2..ccc637311b9c21 100644 --- a/odb/source-files.c +++ b/odb/source-files.c @@ -27,7 +27,7 @@ static void odb_source_files_free(struct odb_source *source) { struct odb_source_files *files = odb_source_files_downcast(source); chdir_notify_unregister(NULL, odb_source_files_reparent, files); - odb_source_loose_free(files->loose); + odb_source_free(&files->loose->base); packfile_store_free(files->packed); odb_source_release(&files->base); free(files); diff --git a/odb/source-loose.c b/odb/source-loose.c index c9e7414814814d..92e18f5adb2b89 100644 --- a/odb/source-loose.c +++ b/odb/source-loose.c @@ -1,10 +1,55 @@ #include "git-compat-util.h" +#include "abspath.h" +#include "chdir-notify.h" +#include "loose.h" +#include "odb.h" +#include "odb/source-files.h" #include "odb/source-loose.h" +#include "oidtree.h" + +void odb_source_loose_clear_cache(struct odb_source_loose *loose) +{ + oidtree_clear(loose->cache); + FREE_AND_NULL(loose->cache); + memset(&loose->subdir_seen, 0, + sizeof(loose->subdir_seen)); +} + +static void odb_source_loose_reparent(const char *name UNUSED, + const char *old_cwd, + const char *new_cwd, + void *cb_data) +{ + struct odb_source_loose *loose = cb_data; + char *path = reparent_relative_path(old_cwd, new_cwd, + loose->base.path); + free(loose->base.path); + loose->base.path = path; +} + +static void odb_source_loose_free(struct odb_source *source) +{ + struct odb_source_loose *loose = odb_source_loose_downcast(source); + odb_source_loose_clear_cache(loose); + loose_object_map_clear(&loose->map); + chdir_notify_unregister(NULL, odb_source_loose_reparent, loose); + odb_source_release(&loose->base); + free(loose); +} struct odb_source_loose *odb_source_loose_new(struct odb_source_files *files) { struct odb_source_loose *loose; + CALLOC_ARRAY(loose, 1); + odb_source_init(&loose->base, files->base.odb, 
ODB_SOURCE_LOOSE, + files->base.path, files->base.local); loose->files = files; + + loose->base.free = odb_source_loose_free; + + if (!is_absolute_path(loose->base.path)) + chdir_notify_register(NULL, odb_source_loose_reparent, loose); + return loose; } diff --git a/odb/source-loose.h b/odb/source-loose.h index bf61e767c8aab4..bd989f0728e622 100644 --- a/odb/source-loose.h +++ b/odb/source-loose.h @@ -12,6 +12,7 @@ struct oidtree; * file per object. This source is part of the files source. */ struct odb_source_loose { + struct odb_source base; struct odb_source_files *files; /* @@ -32,4 +33,17 @@ struct odb_source_loose { struct odb_source_loose *odb_source_loose_new(struct odb_source_files *files); +/* + * Cast the given object database source to the loose backend. This will cause + * a BUG in case the source doesn't use this backend. + */ +static inline struct odb_source_loose *odb_source_loose_downcast(struct odb_source *source) +{ + if (source->type != ODB_SOURCE_LOOSE) + BUG("trying to downcast source of type '%d' to loose", source->type); + return container_of(source, struct odb_source_loose, base); +} + +void odb_source_loose_clear_cache(struct odb_source_loose *loose); + #endif diff --git a/odb/source.h b/odb/source.h index 0a440884e4f0ab..8bcb67787ebafd 100644 --- a/odb/source.h +++ b/odb/source.h @@ -14,6 +14,9 @@ enum odb_source_type { /* The "files" backend that uses loose objects and packfiles. */ ODB_SOURCE_FILES, + /* The "loose" backend that uses loose objects, only. */ + ODB_SOURCE_LOOSE, + /* The "in-memory" backend that stores objects in memory. */ ODB_SOURCE_INMEMORY, }; From a2b7db9bc8d52f133fe8fcb317788d9fe8696f07 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 1 Jun 2026 10:20:27 +0200 Subject: [PATCH 052/107] odb/source-loose: wire up `reprepare()` callback Move `odb_source_loose_reprepare()` from "object-file.c" into "odb/source-loose.c" and wire it up as the `reprepare()` callback of the loose source. 
While at it, make `odb_source_loose_clear_cache()` static, as it is no longer needed outside of its file. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- object-file.c | 6 ------ object-file.h | 3 --- odb/source-files.c | 2 +- odb/source-loose.c | 9 ++++++++- odb/source-loose.h | 2 -- 5 files changed, 9 insertions(+), 13 deletions(-) diff --git a/object-file.c b/object-file.c index 977d959d333166..0f4f1e7bdc0733 100644 --- a/object-file.c +++ b/object-file.c @@ -2041,12 +2041,6 @@ static struct oidtree *odb_source_loose_cache(struct odb_source *source, return files->loose->cache; } -void odb_source_loose_reprepare(struct odb_source *source) -{ - struct odb_source_files *files = odb_source_files_downcast(source); - odb_source_loose_clear_cache(files->loose); -} - static int check_stream_oid(git_zstream *stream, const char *hdr, unsigned long size, diff --git a/object-file.h b/object-file.h index 02c9680980ab0f..420a0fff2e7d7e 100644 --- a/object-file.h +++ b/object-file.h @@ -21,9 +21,6 @@ struct object_info; struct odb_read_stream; struct odb_source; -/* Reprepare the loose source by emptying the loose object cache. 
*/ -void odb_source_loose_reprepare(struct odb_source *source); - int odb_source_loose_read_object_info(struct odb_source *source, const struct object_id *oid, struct object_info *oi, diff --git a/odb/source-files.c b/odb/source-files.c index ccc637311b9c21..10832e81e4e206 100644 --- a/odb/source-files.c +++ b/odb/source-files.c @@ -42,7 +42,7 @@ static void odb_source_files_close(struct odb_source *source) static void odb_source_files_reprepare(struct odb_source *source) { struct odb_source_files *files = odb_source_files_downcast(source); - odb_source_loose_reprepare(&files->base); + odb_source_reprepare(&files->loose->base); packfile_store_reprepare(files->packed); } diff --git a/odb/source-loose.c b/odb/source-loose.c index 92e18f5adb2b89..e0fe0d513d2532 100644 --- a/odb/source-loose.c +++ b/odb/source-loose.c @@ -7,7 +7,7 @@ #include "odb/source-loose.h" #include "oidtree.h" -void odb_source_loose_clear_cache(struct odb_source_loose *loose) +static void odb_source_loose_clear_cache(struct odb_source_loose *loose) { oidtree_clear(loose->cache); FREE_AND_NULL(loose->cache); @@ -15,6 +15,12 @@ void odb_source_loose_clear_cache(struct odb_source_loose *loose) sizeof(loose->subdir_seen)); } +static void odb_source_loose_reprepare(struct odb_source *source) +{ + struct odb_source_loose *loose = odb_source_loose_downcast(source); + odb_source_loose_clear_cache(loose); +} + static void odb_source_loose_reparent(const char *name UNUSED, const char *old_cwd, const char *new_cwd, @@ -47,6 +53,7 @@ struct odb_source_loose *odb_source_loose_new(struct odb_source_files *files) loose->files = files; loose->base.free = odb_source_loose_free; + loose->base.reprepare = odb_source_loose_reprepare; if (!is_absolute_path(loose->base.path)) chdir_notify_register(NULL, odb_source_loose_reparent, loose); diff --git a/odb/source-loose.h b/odb/source-loose.h index bd989f0728e622..4dd4fd6ce30a7e 100644 --- a/odb/source-loose.h +++ b/odb/source-loose.h @@ -44,6 +44,4 @@ static inline 
struct odb_source_loose *odb_source_loose_downcast(struct odb_sour return container_of(source, struct odb_source_loose, base); } -void odb_source_loose_clear_cache(struct odb_source_loose *loose); - #endif From 337b7fccba1cca8b7d9232b5e6e9ff53271f0398 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 1 Jun 2026 10:20:28 +0200 Subject: [PATCH 053/107] odb/source-loose: wire up `close()` callback Wire up a new `close()` callback for the loose source and call it from the "files" source via the generic `odb_source_close()` interface. The callback itself is a no-op as the loose source has no resources that need to be released on close. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- odb/source-files.c | 1 + odb/source-loose.c | 6 ++++++ 2 files changed, 7 insertions(+) diff --git a/odb/source-files.c b/odb/source-files.c index 10832e81e4e206..59e3a70d80d355 100644 --- a/odb/source-files.c +++ b/odb/source-files.c @@ -36,6 +36,7 @@ static void odb_source_files_free(struct odb_source *source) static void odb_source_files_close(struct odb_source *source) { struct odb_source_files *files = odb_source_files_downcast(source); + odb_source_close(&files->loose->base); packfile_store_close(files->packed); } diff --git a/odb/source-loose.c b/odb/source-loose.c index e0fe0d513d2532..65c1076659b8fd 100644 --- a/odb/source-loose.c +++ b/odb/source-loose.c @@ -21,6 +21,11 @@ static void odb_source_loose_reprepare(struct odb_source *source) odb_source_loose_clear_cache(loose); } +static void odb_source_loose_close(struct odb_source *source UNUSED) +{ + /* Nothing to do. 
*/ +} + static void odb_source_loose_reparent(const char *name UNUSED, const char *old_cwd, const char *new_cwd, @@ -53,6 +58,7 @@ struct odb_source_loose *odb_source_loose_new(struct odb_source_files *files) loose->files = files; loose->base.free = odb_source_loose_free; + loose->base.close = odb_source_loose_close; loose->base.reprepare = odb_source_loose_reprepare; if (!is_absolute_path(loose->base.path)) From 584338ed92735f3be768c16b53266d5bad439a7a Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 1 Jun 2026 10:20:29 +0200 Subject: [PATCH 054/107] odb/source-loose: wire up `read_object_info()` callback Move `odb_source_loose_read_object_info()` from "object-file.c" into "odb/source-loose.c" and wire it up as the `read_object_info()` callback of the loose source. Callers that previously invoked it directly now go through the generic `odb_source_read_object_info()` interface instead. The function `read_object_info_from_path()` cannot be moved along with it because it is still called by `for_each_object_wrapper_cb()`. It is therefore kept in place, but adjusted to take a loose source to clarify that it's always operating on this structure. 
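The resulting lookup order in the "files" source can be modelled roughly as follows. The packed store is consulted first, then the loose child through its generic callback, and the loose child declines a second read. All names are hypothetical and object IDs are plain ints. For brevity, both children are modelled as generic sources, whereas in this patch the packed store is still called directly.

```c
#include <stddef.h>

#define INFO_SECOND_READ (1u << 0)	/* stand-in for OBJECT_INFO_SECOND_READ */

struct source {
	/* Returns 0 if the object was found, -1 otherwise. */
	int (*read_object_info)(struct source *, int oid, unsigned flags);
};

/* Generic dispatcher, analogous to odb_source_read_object_info(). */
static int source_read_object_info(struct source *s, int oid, unsigned flags)
{
	return s->read_object_info(s, oid, flags);
}

/* A toy backend holding a fixed set of object IDs. */
struct set_source {
	struct source base;	/* first member, so the cast below is valid */
	const int *oids;
	size_t nr;
	int skip_on_second_read;	/* models the loose backend */
};

static int set_read_object_info(struct source *s, int oid, unsigned flags)
{
	struct set_source *set = (struct set_source *)s;

	if (set->skip_on_second_read && (flags & INFO_SECOND_READ))
		return -1;
	for (size_t i = 0; i < set->nr; i++)
		if (set->oids[i] == oid)
			return 0;
	return -1;
}

/* Parent source that owns a packed and a loose child. */
struct files_source {
	struct source base;
	struct source *packed;
	struct source *loose;
};

static int files_read_object_info(struct source *s, int oid, unsigned flags)
{
	struct files_source *files = (struct files_source *)s;

	if (!source_read_object_info(files->packed, oid, flags) ||
	    !source_read_object_info(files->loose, oid, flags))
		return 0;
	return -1;
}
```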
Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- object-file.c | 46 +++++++++++++--------------------------------- object-file.h | 11 ++++++----- odb/source-files.c | 2 +- odb/source-loose.c | 24 ++++++++++++++++++++++++ 4 files changed, 44 insertions(+), 39 deletions(-) diff --git a/object-file.c b/object-file.c index 0f4f1e7bdc0733..fa174512a43c75 100644 --- a/object-file.c +++ b/object-file.c @@ -396,13 +396,12 @@ static int parse_loose_header(const char *hdr, struct object_info *oi) return 0; } -static int read_object_info_from_path(struct odb_source *source, - const char *path, - const struct object_id *oid, - struct object_info *oi, - enum object_info_flags flags) +int read_object_info_from_path(struct odb_source_loose *loose, + const char *path, + const struct object_id *oid, + struct object_info *oi, + enum object_info_flags flags) { - struct odb_source_files *files = odb_source_files_downcast(source); int ret; int fd; unsigned long mapsize; @@ -425,7 +424,7 @@ static int read_object_info_from_path(struct odb_source *source, struct stat st; if ((!oi || (!oi->disk_sizep && !oi->mtimep)) && (flags & OBJECT_INFO_QUICK)) { - ret = quick_has_loose(files->loose, oid) ? 0 : -1; + ret = quick_has_loose(loose, oid) ? 
0 : -1; goto out; } @@ -532,7 +531,7 @@ static int read_object_info_from_path(struct odb_source *source, if (oi->typep == &type_scratch) oi->typep = NULL; if (oi->delta_base_oid) - oidclr(oi->delta_base_oid, source->odb->repo->hash_algo); + oidclr(oi->delta_base_oid, loose->base.odb->repo->hash_algo); if (!ret) oi->whence = OI_LOOSE; } @@ -540,26 +539,6 @@ static int read_object_info_from_path(struct odb_source *source, return ret; } -int odb_source_loose_read_object_info(struct odb_source *source, - const struct object_id *oid, - struct object_info *oi, - enum object_info_flags flags) -{ - static struct strbuf buf = STRBUF_INIT; - - /* - * The second read shouldn't cause new loose objects to show up, unless - * there was a race condition with a secondary process. We don't care - * about this case though, so we simply skip reading loose objects a - * second time. - */ - if (flags & OBJECT_INFO_SECOND_READ) - return -1; - - odb_loose_path(source, &buf, oid); - return read_object_info_from_path(source, buf.buf, oid, oi, flags); -} - static void hash_object_body(const struct git_hash_algo *algo, struct git_hash_ctx *c, const void *buf, unsigned long len, struct object_id *oid, @@ -1833,7 +1812,7 @@ int for_each_loose_file_in_source(struct odb_source *source, } struct for_each_object_wrapper_data { - struct odb_source *source; + struct odb_source_loose *loose; const struct object_info *request; odb_for_each_object_cb cb; void *cb_data; @@ -1848,7 +1827,7 @@ static int for_each_object_wrapper_cb(const struct object_id *oid, if (data->request) { struct object_info oi = *data->request; - if (read_object_info_from_path(data->source, path, oid, &oi, 0) < 0) + if (read_object_info_from_path(data->loose, path, oid, &oi, 0) < 0) return -1; return data->cb(oid, &oi, data->cb_data); @@ -1865,8 +1844,8 @@ static int for_each_prefixed_object_wrapper_cb(const struct object_id *oid, if (data->request) { struct object_info oi = *data->request; - if 
(odb_source_loose_read_object_info(data->source, - oid, &oi, 0) < 0) + if (odb_source_read_object_info(&data->loose->base, + oid, &oi, 0) < 0) return -1; return data->cb(oid, &oi, data->cb_data); @@ -1881,8 +1860,9 @@ int odb_source_loose_for_each_object(struct odb_source *source, void *cb_data, const struct odb_for_each_object_options *opts) { + struct odb_source_files *files = odb_source_files_downcast(source); struct for_each_object_wrapper_data data = { - .source = source, + .loose = files->loose, .request = request, .cb = cb, .cb_data = cb_data, diff --git a/object-file.h b/object-file.h index 420a0fff2e7d7e..8ac2832dac3439 100644 --- a/object-file.h +++ b/object-file.h @@ -21,11 +21,6 @@ struct object_info; struct odb_read_stream; struct odb_source; -int odb_source_loose_read_object_info(struct odb_source *source, - const struct object_id *oid, - struct object_info *oi, - enum object_info_flags flags); - int odb_source_loose_read_object_stream(struct odb_read_stream **out, struct odb_source *source, const struct object_id *oid); @@ -198,6 +193,12 @@ int read_loose_object(struct repository *repo, void **contents, struct object_info *oi); +int read_object_info_from_path(struct odb_source_loose *loose, + const char *path, + const struct object_id *oid, + struct object_info *oi, + enum object_info_flags flags); + struct odb_transaction; /* diff --git a/odb/source-files.c b/odb/source-files.c index 59e3a70d80d355..8d6924755ffb70 100644 --- a/odb/source-files.c +++ b/odb/source-files.c @@ -55,7 +55,7 @@ static int odb_source_files_read_object_info(struct odb_source *source, struct odb_source_files *files = odb_source_files_downcast(source); if (!packfile_store_read_object_info(files->packed, oid, oi, flags) || - !odb_source_loose_read_object_info(source, oid, oi, flags)) + !odb_source_read_object_info(&files->loose->base, oid, oi, flags)) return 0; return -1; diff --git a/odb/source-loose.c b/odb/source-loose.c index 65c1076659b8fd..50f387ecf31e38 100644 --- 
a/odb/source-loose.c +++ b/odb/source-loose.c @@ -2,10 +2,33 @@ #include "abspath.h" #include "chdir-notify.h" #include "loose.h" +#include "object-file.h" #include "odb.h" #include "odb/source-files.h" #include "odb/source-loose.h" #include "oidtree.h" +#include "strbuf.h" + +static int odb_source_loose_read_object_info(struct odb_source *source, + const struct object_id *oid, + struct object_info *oi, + enum object_info_flags flags) +{ + struct odb_source_loose *loose = odb_source_loose_downcast(source); + static struct strbuf buf = STRBUF_INIT; + + /* + * The second read shouldn't cause new loose objects to show up, unless + * there was a race condition with a secondary process. We don't care + * about this case though, so we simply skip reading loose objects a + * second time. + */ + if (flags & OBJECT_INFO_SECOND_READ) + return -1; + + odb_loose_path(source, &buf, oid); + return read_object_info_from_path(loose, buf.buf, oid, oi, flags); +} static void odb_source_loose_clear_cache(struct odb_source_loose *loose) { @@ -60,6 +83,7 @@ struct odb_source_loose *odb_source_loose_new(struct odb_source_files *files) loose->base.free = odb_source_loose_free; loose->base.close = odb_source_loose_close; loose->base.reprepare = odb_source_loose_reprepare; + loose->base.read_object_info = odb_source_loose_read_object_info; if (!is_absolute_path(loose->base.path)) chdir_notify_register(NULL, odb_source_loose_reparent, loose); From 727a935a71c29524c936520d8aba4de7098f7566 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 1 Jun 2026 10:20:30 +0200 Subject: [PATCH 055/107] odb/source-loose: wire up `read_object_stream()` callback Move `odb_source_loose_read_object_stream()` and its associated helpers from "object-file.c" into "odb/source-loose.c" and wire it up as the `read_object_stream()` callback of the loose source. 
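A loose object inflates to a header of the form "<type> <size>", then a NUL, then the payload. The read stream therefore hands out the buffered header bytes first and only then inflates the rest. The following is a simplified, hypothetical parser for that header. `parse_header()` is not Git's parse_loose_header(), and it omits overflow handling and other strictness checks the real code may perform.

```c
#include <stddef.h>
#include <string.h>

/*
 * Parse "<type> <decimal size>\0" at the start of "buf". On success, copy
 * the type into "type", store the size, and return the number of header
 * bytes consumed (including the NUL). Return -1 on a malformed header.
 */
static int parse_header(const char *buf, size_t len,
			char *type, size_t typelen, unsigned long *size)
{
	const char *sp = memchr(buf, ' ', len);
	const char *nul, *p;
	unsigned long n = 0;

	if (!sp || sp == buf || (size_t)(sp - buf) >= typelen)
		return -1;
	nul = memchr(sp + 1, '\0', len - (size_t)(sp + 1 - buf));
	if (!nul || nul == sp + 1)
		return -1;
	for (p = sp + 1; p < nul; p++) {
		if (*p < '0' || *p > '9')
			return -1;
		n = n * 10 + (unsigned long)(*p - '0');
	}
	memcpy(type, buf, (size_t)(sp - buf));
	type[sp - buf] = '\0';
	*size = n;
	return (int)(nul - buf + 1);
}
```

For the inflated bytes "blob 12", NUL, "hello world!", this returns 8, sets the type to "blob" and the size to 12, and the payload starts right after the consumed header.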
As part of the move we are also forced to expose a couple of functions from "object-file.h" that parse object headers in a somewhat-generic way, as those functions are now used by both subsystems. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- object-file.c | 200 ++------------------------------------------- object-file.h | 31 +++++-- odb/source-files.c | 2 +- odb/source-loose.c | 189 ++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 222 insertions(+), 200 deletions(-) diff --git a/object-file.c b/object-file.c index fa174512a43c75..adfb6724936452 100644 --- a/object-file.c +++ b/object-file.c @@ -164,28 +164,6 @@ int stream_object_signature(struct repository *r, return !oideq(oid, &real_oid) ? -1 : 0; } -/* - * Find "oid" as a loose object in given source, open the object and return its - * file descriptor. Returns the file descriptor on success, negative on failure. - * - * The "path" out-parameter will give the path of the object we found (if any). - * Note that it may point to static storage and is only valid until another - * call to stat_loose_object(). 
- */ -static int open_loose_object(struct odb_source_loose *loose, - const struct object_id *oid, const char **path) -{ - static struct strbuf buf = STRBUF_INIT; - int fd; - - *path = odb_loose_path(&loose->files->base, &buf, oid); - fd = git_open(*path); - if (fd >= 0) - return fd; - - return -1; -} - static int quick_has_loose(struct odb_source_loose *loose, const struct object_id *oid) { @@ -215,42 +193,11 @@ static void *map_fd(int fd, const char *path, unsigned long *size) return map; } -static void *odb_source_loose_map_object(struct odb_source *source, - const struct object_id *oid, - unsigned long *size) -{ - struct odb_source_files *files = odb_source_files_downcast(source); - const char *p; - int fd = open_loose_object(files->loose, oid, &p); - - if (fd < 0) - return NULL; - return map_fd(fd, p, size); -} - -enum unpack_loose_header_result { - ULHR_OK, - ULHR_BAD, - ULHR_TOO_LONG, -}; - -/** - * unpack_loose_header() initializes the data stream needed to unpack - * a loose object header. - * - * Returns: - * - * - ULHR_OK on success - * - ULHR_BAD on error - * - ULHR_TOO_LONG if the header was too long - * - * It will only parse up to MAX_HEADER_LEN bytes. - */ -static enum unpack_loose_header_result unpack_loose_header(git_zstream *stream, - unsigned char *map, - unsigned long mapsize, - void *buffer, - unsigned long bufsiz) +enum unpack_loose_header_result unpack_loose_header(git_zstream *stream, + unsigned char *map, + unsigned long mapsize, + void *buffer, + unsigned long bufsiz) { int status; @@ -340,7 +287,7 @@ static void *unpack_loose_rest(git_zstream *stream, * too permissive for what we want to check. So do an anal * object header parse by hand. 
*/ -static int parse_loose_header(const char *hdr, struct object_info *oi) +int parse_loose_header(const char *hdr, struct object_info *oi) { const char *type_buf = hdr; size_t size; @@ -2170,138 +2117,3 @@ struct odb_transaction *odb_transaction_files_begin(struct odb_source *source) return &transaction->base; } - -struct odb_loose_read_stream { - struct odb_read_stream base; - git_zstream z; - enum { - ODB_LOOSE_READ_STREAM_INUSE, - ODB_LOOSE_READ_STREAM_DONE, - ODB_LOOSE_READ_STREAM_ERROR, - } z_state; - void *mapped; - unsigned long mapsize; - char hdr[32]; - int hdr_avail; - int hdr_used; -}; - -static ssize_t read_istream_loose(struct odb_read_stream *_st, char *buf, size_t sz) -{ - struct odb_loose_read_stream *st = - container_of(_st, struct odb_loose_read_stream, base); - size_t total_read = 0; - - switch (st->z_state) { - case ODB_LOOSE_READ_STREAM_DONE: - return 0; - case ODB_LOOSE_READ_STREAM_ERROR: - return -1; - default: - break; - } - - if (st->hdr_used < st->hdr_avail) { - size_t to_copy = st->hdr_avail - st->hdr_used; - if (sz < to_copy) - to_copy = sz; - memcpy(buf, st->hdr + st->hdr_used, to_copy); - st->hdr_used += to_copy; - total_read += to_copy; - } - - while (total_read < sz) { - int status; - - st->z.next_out = (unsigned char *)buf + total_read; - st->z.avail_out = sz - total_read; - status = git_inflate(&st->z, Z_FINISH); - - total_read = st->z.next_out - (unsigned char *)buf; - - if (status == Z_STREAM_END) { - git_inflate_end(&st->z); - st->z_state = ODB_LOOSE_READ_STREAM_DONE; - break; - } - if (status != Z_OK && (status != Z_BUF_ERROR || total_read < sz)) { - git_inflate_end(&st->z); - st->z_state = ODB_LOOSE_READ_STREAM_ERROR; - return -1; - } - } - return total_read; -} - -static int close_istream_loose(struct odb_read_stream *_st) -{ - struct odb_loose_read_stream *st = - container_of(_st, struct odb_loose_read_stream, base); - - if (st->z_state == ODB_LOOSE_READ_STREAM_INUSE) - git_inflate_end(&st->z); - munmap(st->mapped, 
st->mapsize); - return 0; -} - -int odb_source_loose_read_object_stream(struct odb_read_stream **out, - struct odb_source *source, - const struct object_id *oid) -{ - struct object_info oi = OBJECT_INFO_INIT; - struct odb_loose_read_stream *st; - unsigned long mapsize; - unsigned long size_ul; - void *mapped; - - mapped = odb_source_loose_map_object(source, oid, &mapsize); - if (!mapped) - return -1; - - /* - * Note: we must allocate this structure early even though we may still - * fail. This is because we need to initialize the zlib stream, and it - * is not possible to copy the stream around after the fact because it - * has self-referencing pointers. - */ - CALLOC_ARRAY(st, 1); - - switch (unpack_loose_header(&st->z, mapped, mapsize, st->hdr, - sizeof(st->hdr))) { - case ULHR_OK: - break; - case ULHR_BAD: - case ULHR_TOO_LONG: - goto error; - } - - /* - * object_info.sizep is unsigned long* (32-bit on Windows), but - * st->base.size is size_t (64-bit). Use temporary variable. - * Note: loose objects >4GB would still truncate here, but such - * large loose objects are uncommon (they'd normally be packed). 
- */ - oi.sizep = &size_ul; - oi.typep = &st->base.type; - - if (parse_loose_header(st->hdr, &oi) < 0 || st->base.type < 0) - goto error; - st->base.size = size_ul; - - st->mapped = mapped; - st->mapsize = mapsize; - st->hdr_used = strlen(st->hdr) + 1; - st->hdr_avail = st->z.total_out; - st->z_state = ODB_LOOSE_READ_STREAM_INUSE; - st->base.close = close_istream_loose; - st->base.read = read_istream_loose; - - *out = &st->base; - - return 0; -error: - git_inflate_end(&st->z); - munmap(mapped, mapsize); - free(st); - return -1; -} diff --git a/object-file.h b/object-file.h index 8ac2832dac3439..d93b7ffad704b0 100644 --- a/object-file.h +++ b/object-file.h @@ -18,13 +18,8 @@ int index_fd(struct index_state *istate, struct object_id *oid, int fd, struct s int index_path(struct index_state *istate, struct object_id *oid, const char *path, struct stat *st, unsigned flags); struct object_info; -struct odb_read_stream; struct odb_source; -int odb_source_loose_read_object_stream(struct odb_read_stream **out, - struct odb_source *source, - const struct object_id *oid); - /* * Return true iff an object database source has a loose object * with the specified name. This function does not respect replace @@ -199,6 +194,32 @@ int read_object_info_from_path(struct odb_source_loose *loose, struct object_info *oi, enum object_info_flags flags); +enum unpack_loose_header_result { + ULHR_OK, + ULHR_BAD, + ULHR_TOO_LONG, +}; + +/** + * unpack_loose_header() initializes the data stream needed to unpack + * a loose object header. + * + * Returns: + * + * - ULHR_OK on success + * - ULHR_BAD on error + * - ULHR_TOO_LONG if the header was too long + * + * It will only parse up to MAX_HEADER_LEN bytes. 
+ */ +enum unpack_loose_header_result unpack_loose_header(git_zstream *stream, + unsigned char *map, + unsigned long mapsize, + void *buffer, + unsigned long bufsiz); + +int parse_loose_header(const char *hdr, struct object_info *oi); + struct odb_transaction; /* diff --git a/odb/source-files.c b/odb/source-files.c index 8d6924755ffb70..90806ddf86b662 100644 --- a/odb/source-files.c +++ b/odb/source-files.c @@ -67,7 +67,7 @@ static int odb_source_files_read_object_stream(struct odb_read_stream **out, { struct odb_source_files *files = odb_source_files_downcast(source); if (!packfile_store_read_object_stream(out, files->packed, oid) || - !odb_source_loose_read_object_stream(out, source, oid)) + !odb_source_read_object_stream(out, &files->loose->base, oid)) return 0; return -1; } diff --git a/odb/source-loose.c b/odb/source-loose.c index 50f387ecf31e38..4b82c6f316512e 100644 --- a/odb/source-loose.c +++ b/odb/source-loose.c @@ -1,11 +1,13 @@ #include "git-compat-util.h" #include "abspath.h" #include "chdir-notify.h" +#include "gettext.h" #include "loose.h" #include "object-file.h" #include "odb.h" #include "odb/source-files.h" #include "odb/source-loose.h" +#include "odb/streaming.h" #include "oidtree.h" #include "strbuf.h" @@ -30,6 +32,192 @@ static int odb_source_loose_read_object_info(struct odb_source *source, return read_object_info_from_path(loose, buf.buf, oid, oi, flags); } +/* + * Find "oid" as a loose object in given source, open the object and return its + * file descriptor. Returns the file descriptor on success, negative on failure. + * + * The "path" out-parameter will give the path of the object we found (if any). + * Note that it may point to static storage and is only valid until another + * call to open_loose_object(). 
+ */ +static int open_loose_object(struct odb_source_loose *loose, + const struct object_id *oid, const char **path) +{ + static struct strbuf buf = STRBUF_INIT; + int fd; + + *path = odb_loose_path(&loose->base, &buf, oid); + fd = git_open(*path); + if (fd >= 0) + return fd; + + return -1; +} + +static void *odb_source_loose_map_object(struct odb_source_loose *loose, + const struct object_id *oid, + unsigned long *size) +{ + const char *p; + int fd = open_loose_object(loose, oid, &p); + void *map = NULL; + struct stat st; + + if (fd < 0) + return NULL; + + if (!fstat(fd, &st)) { + *size = xsize_t(st.st_size); + if (!*size) { + /* mmap() is forbidden on empty files */ + error(_("object file %s is empty"), p); + goto out; + } + + map = xmmap(NULL, *size, PROT_READ, MAP_PRIVATE, fd, 0); + } + +out: + close(fd); + return map; +} + +struct odb_loose_read_stream { + struct odb_read_stream base; + git_zstream z; + enum { + ODB_LOOSE_READ_STREAM_INUSE, + ODB_LOOSE_READ_STREAM_DONE, + ODB_LOOSE_READ_STREAM_ERROR, + } z_state; + void *mapped; + unsigned long mapsize; + char hdr[32]; + int hdr_avail; + int hdr_used; +}; + +static ssize_t read_istream_loose(struct odb_read_stream *_st, char *buf, size_t sz) +{ + struct odb_loose_read_stream *st = + container_of(_st, struct odb_loose_read_stream, base); + size_t total_read = 0; + + switch (st->z_state) { + case ODB_LOOSE_READ_STREAM_DONE: + return 0; + case ODB_LOOSE_READ_STREAM_ERROR: + return -1; + default: + break; + } + + if (st->hdr_used < st->hdr_avail) { + size_t to_copy = st->hdr_avail - st->hdr_used; + if (sz < to_copy) + to_copy = sz; + memcpy(buf, st->hdr + st->hdr_used, to_copy); + st->hdr_used += to_copy; + total_read += to_copy; + } + + while (total_read < sz) { + int status; + + st->z.next_out = (unsigned char *)buf + total_read; + st->z.avail_out = sz - total_read; + status = git_inflate(&st->z, Z_FINISH); + + total_read = st->z.next_out - (unsigned char *)buf; + + if (status == Z_STREAM_END) { + 
git_inflate_end(&st->z); + st->z_state = ODB_LOOSE_READ_STREAM_DONE; + break; + } + if (status != Z_OK && (status != Z_BUF_ERROR || total_read < sz)) { + git_inflate_end(&st->z); + st->z_state = ODB_LOOSE_READ_STREAM_ERROR; + return -1; + } + } + return total_read; +} + +static int close_istream_loose(struct odb_read_stream *_st) +{ + struct odb_loose_read_stream *st = + container_of(_st, struct odb_loose_read_stream, base); + + if (st->z_state == ODB_LOOSE_READ_STREAM_INUSE) + git_inflate_end(&st->z); + munmap(st->mapped, st->mapsize); + return 0; +} + +static int odb_source_loose_read_object_stream(struct odb_read_stream **out, + struct odb_source *source, + const struct object_id *oid) +{ + struct odb_source_loose *loose = odb_source_loose_downcast(source); + struct object_info oi = OBJECT_INFO_INIT; + struct odb_loose_read_stream *st; + unsigned long mapsize; + unsigned long size_ul; + void *mapped; + + mapped = odb_source_loose_map_object(loose, oid, &mapsize); + if (!mapped) + return -1; + + /* + * Note: we must allocate this structure early even though we may still + * fail. This is because we need to initialize the zlib stream, and it + * is not possible to copy the stream around after the fact because it + * has self-referencing pointers. + */ + CALLOC_ARRAY(st, 1); + + switch (unpack_loose_header(&st->z, mapped, mapsize, st->hdr, + sizeof(st->hdr))) { + case ULHR_OK: + break; + case ULHR_BAD: + case ULHR_TOO_LONG: + goto error; + } + + /* + * object_info.sizep is unsigned long* (32-bit on Windows), but + * st->base.size is size_t (64-bit). Use temporary variable. + * Note: loose objects >4GB would still truncate here, but such + * large loose objects are uncommon (they'd normally be packed). 
+ */ + oi.sizep = &size_ul; + oi.typep = &st->base.type; + + if (parse_loose_header(st->hdr, &oi) < 0 || st->base.type < 0) + goto error; + st->base.size = size_ul; + + st->mapped = mapped; + st->mapsize = mapsize; + st->hdr_used = strlen(st->hdr) + 1; + st->hdr_avail = st->z.total_out; + st->z_state = ODB_LOOSE_READ_STREAM_INUSE; + st->base.close = close_istream_loose; + st->base.read = read_istream_loose; + + *out = &st->base; + + return 0; +error: + git_inflate_end(&st->z); + munmap(mapped, mapsize); + free(st); + return -1; +} + static void odb_source_loose_clear_cache(struct odb_source_loose *loose) { oidtree_clear(loose->cache); @@ -84,6 +272,7 @@ struct odb_source_loose *odb_source_loose_new(struct odb_source_files *files) loose->base.close = odb_source_loose_close; loose->base.reprepare = odb_source_loose_reprepare; loose->base.read_object_info = odb_source_loose_read_object_info; + loose->base.read_object_stream = odb_source_loose_read_object_stream; if (!is_absolute_path(loose->base.path)) chdir_notify_register(NULL, odb_source_loose_reparent, loose); From e4f1d9ba5714957389bee87dd5f9fedb69d8a764 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 1 Jun 2026 10:20:31 +0200 Subject: [PATCH 056/107] odb/source-loose: wire up `for_each_object()` callback Move `odb_source_loose_for_each_object()` and its associated helpers from "object-file.c" into "odb/source-loose.c" and wire it up as the `for_each_object()` callback of the loose source. Again, as in the preceding commit, we are forced to expose a couple of functions from "object-file.c" that are now used by both subsystems. 
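The callback wiring described above can be illustrated with a deliberately simplified, standalone C sketch. It assumes a toy model in which object IDs are plain `int`s. Every name in it (`struct source`, `struct loose_source`, `source_for_each_object()`, `loose_source_init()`) is hypothetical and is not Git's actual ODB API. In the sketch, a generic source carries function pointers, the concrete loose source embeds it as `base`, and the implementation recovers the concrete type with `container_of()`, which plays the role that `odb_source_loose_downcast()` plays in this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model; none of these names are Git's API. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

typedef int (*each_object_cb)(int oid, void *cb_data);

/* Generic source: behaviour is provided through function pointers. */
struct source {
	int (*for_each_object)(struct source *source, each_object_cb cb,
			       void *cb_data);
};

/* Concrete source: embeds the generic one as `base`. */
struct loose_source {
	struct source base;
	const int *oids;
	size_t nr;
};

static int loose_for_each_object(struct source *source, each_object_cb cb,
				 void *cb_data)
{
	/* Downcast from the generic source to the concrete loose source. */
	struct loose_source *loose =
		container_of(source, struct loose_source, base);

	for (size_t i = 0; i < loose->nr; i++) {
		int ret = cb(loose->oids[i], cb_data);
		if (ret)
			return ret; /* in this sketch, non-zero stops iteration */
	}
	return 0;
}

/* Generic entry point: callers never need to know the concrete type. */
static int source_for_each_object(struct source *source, each_object_cb cb,
				  void *cb_data)
{
	return source->for_each_object(source, cb, cb_data);
}

static void loose_source_init(struct loose_source *loose,
			      const int *oids, size_t nr)
{
	loose->oids = oids;
	loose->nr = nr;
	/* Wire up the callback, as odb_source_loose_new() does above. */
	loose->base.for_each_object = loose_for_each_object;
}

/* Example callback: counts every object it is handed. */
static int count_cb(int oid, void *cb_data)
{
	(void)oid;
	(*(unsigned long *)cb_data)++;
	return 0;
}

/* Example callback: counts objects and asks to stop at object 2. */
static int stop_at_two_cb(int oid, void *cb_data)
{
	(*(unsigned long *)cb_data)++;
	return oid == 2 ? 1 : 0;
}
```

Callers such as `odb_source_files_for_each_object()` then only need the generic entry point, which is why the patch can switch them to `odb_source_for_each_object(&files->loose->base, ...)`.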
Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- builtin/cat-file.c | 5 +- object-file.c | 299 +++------------------------------------------ object-file.h | 32 ++--- odb/source-files.c | 2 +- odb/source-loose.c | 264 +++++++++++++++++++++++++++++++++++++++ 5 files changed, 297 insertions(+), 305 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index d9fbad535868bb..2958fc53579336 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -862,8 +862,9 @@ static void batch_each_object(struct batch_options *opt, */ odb_prepare_alternates(the_repository->objects); for (source = the_repository->objects->sources; source; source = source->next) { - int ret = odb_source_loose_for_each_object(source, NULL, batch_one_object_oi, - &payload, &opts); + struct odb_source_files *files = odb_source_files_downcast(source); + int ret = odb_source_for_each_object(&files->loose->base, NULL, batch_one_object_oi, + &payload, &opts); if (ret) break; } diff --git a/object-file.c b/object-file.c index adfb6724936452..157ecad3ea204a 100644 --- a/object-file.c +++ b/object-file.c @@ -22,7 +22,6 @@ #include "odb.h" #include "odb/streaming.h" #include "odb/transaction.h" -#include "oidtree.h" #include "pack.h" #include "packfile.h" #include "path.h" @@ -31,12 +30,6 @@ #include "tempfile.h" #include "tmp-objdir.h" -/* The maximum size for an object header. */ -#define MAX_HEADER_LEN 32 - -static struct oidtree *odb_source_loose_cache(struct odb_source *source, - const struct object_id *oid); - static int get_conv_flags(unsigned flags) { if (flags & INDEX_RENORMALIZE) @@ -164,12 +157,6 @@ int stream_object_signature(struct repository *r, return !oideq(oid, &real_oid) ? -1 : 0; } -static int quick_has_loose(struct odb_source_loose *loose, - const struct object_id *oid) -{ - return !!oidtree_contains(odb_source_loose_cache(&loose->files->base, oid), oid); -} - /* * Map and close the given loose object fd. The path argument is used for * error reporting. 
@@ -227,9 +214,9 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream, return ULHR_TOO_LONG; } -static void *unpack_loose_rest(git_zstream *stream, - void *buffer, unsigned long size, - const struct object_id *oid) +void *unpack_loose_rest(git_zstream *stream, + void *buffer, unsigned long size, + const struct object_id *oid) { size_t bytes = strlen(buffer) + 1, n; unsigned char *buf = xmallocz(size); @@ -343,149 +330,6 @@ int parse_loose_header(const char *hdr, struct object_info *oi) return 0; } -int read_object_info_from_path(struct odb_source_loose *loose, - const char *path, - const struct object_id *oid, - struct object_info *oi, - enum object_info_flags flags) -{ - int ret; - int fd; - unsigned long mapsize; - void *map = NULL; - git_zstream stream, *stream_to_end = NULL; - char hdr[MAX_HEADER_LEN]; - unsigned long size_scratch; - enum object_type type_scratch; - struct stat st; - - /* - * If we don't care about type or size, then we don't - * need to look inside the object at all. Note that we - * do not optimize out the stat call, even if the - * caller doesn't care about the disk-size, since our - * return value implicitly indicates whether the - * object even exists. - */ - if (!oi || (!oi->typep && !oi->sizep && !oi->contentp)) { - struct stat st; - - if ((!oi || (!oi->disk_sizep && !oi->mtimep)) && (flags & OBJECT_INFO_QUICK)) { - ret = quick_has_loose(loose, oid) ? 
0 : -1; - goto out; - } - - if (lstat(path, &st) < 0) { - ret = -1; - goto out; - } - - if (oi) { - if (oi->disk_sizep) - *oi->disk_sizep = st.st_size; - if (oi->mtimep) - *oi->mtimep = st.st_mtime; - } - - ret = 0; - goto out; - } - - fd = git_open(path); - if (fd < 0) { - if (errno != ENOENT) - error_errno(_("unable to open loose object %s"), oid_to_hex(oid)); - ret = -1; - goto out; - } - - if (fstat(fd, &st)) { - close(fd); - ret = -1; - goto out; - } - - mapsize = xsize_t(st.st_size); - if (!mapsize) { - close(fd); - ret = error(_("object file %s is empty"), path); - goto out; - } - - map = xmmap(NULL, mapsize, PROT_READ, MAP_PRIVATE, fd, 0); - close(fd); - if (!map) { - ret = -1; - goto out; - } - - if (oi->disk_sizep) - *oi->disk_sizep = mapsize; - if (oi->mtimep) - *oi->mtimep = st.st_mtime; - - stream_to_end = &stream; - - switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr))) { - case ULHR_OK: - if (!oi->sizep) - oi->sizep = &size_scratch; - if (!oi->typep) - oi->typep = &type_scratch; - - if (parse_loose_header(hdr, oi) < 0) { - ret = error(_("unable to parse %s header"), oid_to_hex(oid)); - goto corrupt; - } - - if (*oi->typep < 0) - die(_("invalid object type")); - - if (oi->contentp) { - *oi->contentp = unpack_loose_rest(&stream, hdr, *oi->sizep, oid); - if (!*oi->contentp) { - ret = -1; - goto corrupt; - } - } - - break; - case ULHR_BAD: - ret = error(_("unable to unpack %s header"), - oid_to_hex(oid)); - goto corrupt; - case ULHR_TOO_LONG: - ret = error(_("header for %s too long, exceeds %d bytes"), - oid_to_hex(oid), MAX_HEADER_LEN); - goto corrupt; - } - - ret = 0; - -corrupt: - if (ret && (flags & OBJECT_INFO_DIE_IF_CORRUPT)) - die(_("loose object %s (stored in %s) is corrupt"), - oid_to_hex(oid), path); - -out: - if (stream_to_end) - git_inflate_end(stream_to_end); - if (map) - munmap(map, mapsize); - if (oi) { - if (oi->sizep == &size_scratch) - oi->sizep = NULL; - if (oi->typep == &type_scratch) - oi->typep = NULL; - if 
(oi->delta_base_oid) - oidclr(oi->delta_base_oid, loose->base.odb->repo->hash_algo); - if (!ret) - oi->whence = OI_LOOSE; - } - - return ret; -} - static void hash_object_body(const struct git_hash_algo *algo, struct git_hash_ctx *c, const void *buf, unsigned long len, struct object_id *oid, @@ -1667,13 +1511,13 @@ int read_pack_header(int fd, struct pack_header *header) return 0; } -static int for_each_file_in_obj_subdir(unsigned int subdir_nr, - struct strbuf *path, - const struct git_hash_algo *algop, - each_loose_object_fn obj_cb, - each_loose_cruft_fn cruft_cb, - each_loose_subdir_fn subdir_cb, - void *data) +int for_each_file_in_obj_subdir(unsigned int subdir_nr, + struct strbuf *path, + const struct git_hash_algo *algop, + each_loose_object_fn obj_cb, + each_loose_cruft_fn cruft_cb, + each_loose_subdir_fn subdir_cb, + void *data) { size_t origlen, baselen; DIR *dir; @@ -1758,78 +1602,6 @@ int for_each_loose_file_in_source(struct odb_source *source, return r; } -struct for_each_object_wrapper_data { - struct odb_source_loose *loose; - const struct object_info *request; - odb_for_each_object_cb cb; - void *cb_data; -}; - -static int for_each_object_wrapper_cb(const struct object_id *oid, - const char *path, - void *cb_data) -{ - struct for_each_object_wrapper_data *data = cb_data; - - if (data->request) { - struct object_info oi = *data->request; - - if (read_object_info_from_path(data->loose, path, oid, &oi, 0) < 0) - return -1; - - return data->cb(oid, &oi, data->cb_data); - } else { - return data->cb(oid, NULL, data->cb_data); - } -} - -static int for_each_prefixed_object_wrapper_cb(const struct object_id *oid, - void *node_data UNUSED, - void *cb_data) -{ - struct for_each_object_wrapper_data *data = cb_data; - if (data->request) { - struct object_info oi = *data->request; - - if (odb_source_read_object_info(&data->loose->base, - oid, &oi, 0) < 0) - return -1; - - return data->cb(oid, &oi, data->cb_data); - } else { - return data->cb(oid, NULL, 
data->cb_data); - } -} - -int odb_source_loose_for_each_object(struct odb_source *source, - const struct object_info *request, - odb_for_each_object_cb cb, - void *cb_data, - const struct odb_for_each_object_options *opts) -{ - struct odb_source_files *files = odb_source_files_downcast(source); - struct for_each_object_wrapper_data data = { - .loose = files->loose, - .request = request, - .cb = cb, - .cb_data = cb_data, - }; - - /* There are no loose promisor objects, so we can return immediately. */ - if ((opts->flags & ODB_FOR_EACH_OBJECT_PROMISOR_ONLY)) - return 0; - if ((opts->flags & ODB_FOR_EACH_OBJECT_LOCAL_ONLY) && !source->local) - return 0; - - if (opts->prefix) - return oidtree_each(odb_source_loose_cache(source, opts->prefix), - opts->prefix, opts->prefix_hex_len, - for_each_prefixed_object_wrapper_cb, &data); - - return for_each_loose_file_in_source(source, for_each_object_wrapper_cb, - NULL, NULL, &data); -} - static int count_loose_object(const struct object_id *oid UNUSED, struct object_info *oi UNUSED, void *payload) @@ -1843,6 +1615,7 @@ int odb_source_loose_count_objects(struct odb_source *source, enum odb_count_objects_flags flags, unsigned long *out) { + struct odb_source_files *files = odb_source_files_downcast(source); const unsigned hexsz = source->odb->repo->hash_algo->hexsz - 2; char *path = NULL; DIR *dir = NULL; @@ -1878,8 +1651,8 @@ int odb_source_loose_count_objects(struct odb_source *source, } else { struct odb_for_each_object_options opts = { 0 }; *out = 0; - ret = odb_source_loose_for_each_object(source, NULL, count_loose_object, - out, &opts); + ret = odb_source_for_each_object(&files->loose->base, NULL, count_loose_object, + out, &opts); } out: @@ -1910,6 +1683,7 @@ int odb_source_loose_find_abbrev_len(struct odb_source *source, unsigned min_len, unsigned *out) { + struct odb_source_files *files = odb_source_files_downcast(source); struct odb_for_each_object_options opts = { .prefix = oid, .prefix_hex_len = min_len, @@ -1920,54 
+1694,13 @@ int odb_source_loose_find_abbrev_len(struct odb_source *source, }; int ret; - ret = odb_source_loose_for_each_object(source, NULL, find_abbrev_len_cb, - &data, &opts); + ret = odb_source_for_each_object(&files->loose->base, NULL, find_abbrev_len_cb, + &data, &opts); *out = data.len; return ret; } -static int append_loose_object(const struct object_id *oid, - const char *path UNUSED, - void *data) -{ - oidtree_insert(data, oid, NULL); - return 0; -} - -static struct oidtree *odb_source_loose_cache(struct odb_source *source, - const struct object_id *oid) -{ - struct odb_source_files *files = odb_source_files_downcast(source); - int subdir_nr = oid->hash[0]; - struct strbuf buf = STRBUF_INIT; - size_t word_bits = bitsizeof(files->loose->subdir_seen[0]); - size_t word_index = subdir_nr / word_bits; - size_t mask = (size_t)1u << (subdir_nr % word_bits); - uint32_t *bitmap; - - if (subdir_nr < 0 || - (size_t) subdir_nr >= bitsizeof(files->loose->subdir_seen)) - BUG("subdir_nr out of range"); - - bitmap = &files->loose->subdir_seen[word_index]; - if (*bitmap & mask) - return files->loose->cache; - if (!files->loose->cache) { - ALLOC_ARRAY(files->loose->cache, 1); - oidtree_init(files->loose->cache); - } - strbuf_addstr(&buf, source->path); - for_each_file_in_obj_subdir(subdir_nr, &buf, - source->odb->repo->hash_algo, - append_loose_object, - NULL, NULL, - files->loose->cache); - *bitmap |= mask; - strbuf_release(&buf); - return files->loose->cache; -} - static int check_stream_oid(git_zstream *stream, const char *hdr, unsigned long size, diff --git a/object-file.h b/object-file.h index d93b7ffad704b0..9ee5649220931b 100644 --- a/object-file.h +++ b/object-file.h @@ -6,6 +6,9 @@ #include "odb.h" #include "odb/source-loose.h" +/* The maximum size for an object header. 
*/ +#define MAX_HEADER_LEN 32 + struct index_state; enum { @@ -85,19 +88,13 @@ int for_each_loose_file_in_source(struct odb_source *source, each_loose_cruft_fn cruft_cb, each_loose_subdir_fn subdir_cb, void *data); - -/* - * Iterate through all loose objects in the given object database source and - * invoke the callback function for each of them. If an object info request is - * given, then the object info will be read for every individual object and - * passed to the callback as if `odb_source_loose_read_object_info()` was - * called for the object. - */ -int odb_source_loose_for_each_object(struct odb_source *source, - const struct object_info *request, - odb_for_each_object_cb cb, - void *cb_data, - const struct odb_for_each_object_options *opts); +int for_each_file_in_obj_subdir(unsigned int subdir_nr, + struct strbuf *path, + const struct git_hash_algo *algop, + each_loose_object_fn obj_cb, + each_loose_cruft_fn cruft_cb, + each_loose_subdir_fn subdir_cb, + void *data); /* * Count the number of loose objects in this source. 
@@ -188,12 +185,6 @@ int read_loose_object(struct repository *repo, void **contents, struct object_info *oi); -int read_object_info_from_path(struct odb_source_loose *loose, - const char *path, - const struct object_id *oid, - struct object_info *oi, - enum object_info_flags flags); - enum unpack_loose_header_result { ULHR_OK, ULHR_BAD, @@ -217,6 +208,9 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream, unsigned long mapsize, void *buffer, unsigned long bufsiz); +void *unpack_loose_rest(git_zstream *stream, + void *buffer, unsigned long size, + const struct object_id *oid); int parse_loose_header(const char *hdr, struct object_info *oi); diff --git a/odb/source-files.c b/odb/source-files.c index 90806ddf86b662..676a641739bcbf 100644 --- a/odb/source-files.c +++ b/odb/source-files.c @@ -82,7 +82,7 @@ static int odb_source_files_for_each_object(struct odb_source *source, int ret; if (!(opts->flags & ODB_FOR_EACH_OBJECT_PROMISOR_ONLY)) { - ret = odb_source_loose_for_each_object(source, request, cb, cb_data, opts); + ret = odb_source_for_each_object(&files->loose->base, request, cb, cb_data, opts); if (ret) return ret; } diff --git a/odb/source-loose.c b/odb/source-loose.c index 4b82c6f316512e..4e8b923498b5b2 100644 --- a/odb/source-loose.c +++ b/odb/source-loose.c @@ -2,6 +2,7 @@ #include "abspath.h" #include "chdir-notify.h" #include "gettext.h" +#include "hex.h" #include "loose.h" #include "object-file.h" #include "odb.h" @@ -9,8 +10,198 @@ #include "odb/source-loose.h" #include "odb/streaming.h" #include "oidtree.h" +#include "repository.h" #include "strbuf.h" +static int append_loose_object(const struct object_id *oid, + const char *path UNUSED, + void *data) +{ + oidtree_insert(data, oid, NULL); + return 0; +} + +static struct oidtree *odb_source_loose_cache(struct odb_source_loose *loose, + const struct object_id *oid) +{ + int subdir_nr = oid->hash[0]; + struct strbuf buf = STRBUF_INIT; + size_t word_bits = 
bitsizeof(loose->subdir_seen[0]); + size_t word_index = subdir_nr / word_bits; + size_t mask = (size_t)1u << (subdir_nr % word_bits); + uint32_t *bitmap; + + if (subdir_nr < 0 || + (size_t) subdir_nr >= bitsizeof(loose->subdir_seen)) + BUG("subdir_nr out of range"); + + bitmap = &loose->subdir_seen[word_index]; + if (*bitmap & mask) + return loose->cache; + if (!loose->cache) { + ALLOC_ARRAY(loose->cache, 1); + oidtree_init(loose->cache); + } + strbuf_addstr(&buf, loose->base.path); + for_each_file_in_obj_subdir(subdir_nr, &buf, + loose->base.odb->repo->hash_algo, + append_loose_object, + NULL, NULL, + loose->cache); + *bitmap |= mask; + strbuf_release(&buf); + return loose->cache; +} + +static int quick_has_loose(struct odb_source_loose *loose, + const struct object_id *oid) +{ + return !!oidtree_contains(odb_source_loose_cache(loose, oid), oid); +} + +static int read_object_info_from_path(struct odb_source_loose *loose, + const char *path, + const struct object_id *oid, + struct object_info *oi, + enum object_info_flags flags) +{ + int ret; + int fd; + unsigned long mapsize; + void *map = NULL; + git_zstream stream, *stream_to_end = NULL; + char hdr[MAX_HEADER_LEN]; + unsigned long size_scratch; + enum object_type type_scratch; + struct stat st; + + /* + * If we don't care about type or size, then we don't + * need to look inside the object at all. Note that we + * do not optimize out the stat call, even if the + * caller doesn't care about the disk-size, since our + * return value implicitly indicates whether the + * object even exists. + */ + if (!oi || (!oi->typep && !oi->sizep && !oi->contentp)) { + struct stat st; + + if ((!oi || (!oi->disk_sizep && !oi->mtimep)) && (flags & OBJECT_INFO_QUICK)) { + ret = quick_has_loose(loose, oid) ? 
0 : -1; + goto out; + } + + if (lstat(path, &st) < 0) { + ret = -1; + goto out; + } + + if (oi) { + if (oi->disk_sizep) + *oi->disk_sizep = st.st_size; + if (oi->mtimep) + *oi->mtimep = st.st_mtime; + } + + ret = 0; + goto out; + } + + fd = git_open(path); + if (fd < 0) { + if (errno != ENOENT) + error_errno(_("unable to open loose object %s"), oid_to_hex(oid)); + ret = -1; + goto out; + } + + if (fstat(fd, &st)) { + close(fd); + ret = -1; + goto out; + } + + mapsize = xsize_t(st.st_size); + if (!mapsize) { + close(fd); + ret = error(_("object file %s is empty"), path); + goto out; + } + + map = xmmap(NULL, mapsize, PROT_READ, MAP_PRIVATE, fd, 0); + close(fd); + if (!map) { + ret = -1; + goto out; + } + + if (oi->disk_sizep) + *oi->disk_sizep = mapsize; + if (oi->mtimep) + *oi->mtimep = st.st_mtime; + + stream_to_end = &stream; + + switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr))) { + case ULHR_OK: + if (!oi->sizep) + oi->sizep = &size_scratch; + if (!oi->typep) + oi->typep = &type_scratch; + + if (parse_loose_header(hdr, oi) < 0) { + ret = error(_("unable to parse %s header"), oid_to_hex(oid)); + goto corrupt; + } + + if (*oi->typep < 0) + die(_("invalid object type")); + + if (oi->contentp) { + *oi->contentp = unpack_loose_rest(&stream, hdr, *oi->sizep, oid); + if (!*oi->contentp) { + ret = -1; + goto corrupt; + } + } + + break; + case ULHR_BAD: + ret = error(_("unable to unpack %s header"), + oid_to_hex(oid)); + goto corrupt; + case ULHR_TOO_LONG: + ret = error(_("header for %s too long, exceeds %d bytes"), + oid_to_hex(oid), MAX_HEADER_LEN); + goto corrupt; + } + + ret = 0; + +corrupt: + if (ret && (flags & OBJECT_INFO_DIE_IF_CORRUPT)) + die(_("loose object %s (stored in %s) is corrupt"), + oid_to_hex(oid), path); + +out: + if (stream_to_end) + git_inflate_end(stream_to_end); + if (map) + munmap(map, mapsize); + if (oi) { + if (oi->sizep == &size_scratch) + oi->sizep = NULL; + if (oi->typep == &type_scratch) + oi->typep = NULL; + if 
(oi->delta_base_oid) + oidclr(oi->delta_base_oid, loose->base.odb->repo->hash_algo); + if (!ret) + oi->whence = OI_LOOSE; + } + + return ret; +} + static int odb_source_loose_read_object_info(struct odb_source *source, const struct object_id *oid, struct object_info *oi, @@ -218,6 +409,78 @@ static int odb_source_loose_read_object_stream(struct odb_read_stream **out, return -1; } +struct for_each_object_wrapper_data { + struct odb_source_loose *loose; + const struct object_info *request; + odb_for_each_object_cb cb; + void *cb_data; +}; + +static int for_each_object_wrapper_cb(const struct object_id *oid, + const char *path, + void *cb_data) +{ + struct for_each_object_wrapper_data *data = cb_data; + + if (data->request) { + struct object_info oi = *data->request; + + if (read_object_info_from_path(data->loose, path, oid, &oi, 0) < 0) + return -1; + + return data->cb(oid, &oi, data->cb_data); + } else { + return data->cb(oid, NULL, data->cb_data); + } +} + +static int for_each_prefixed_object_wrapper_cb(const struct object_id *oid, + void *node_data UNUSED, + void *cb_data) +{ + struct for_each_object_wrapper_data *data = cb_data; + if (data->request) { + struct object_info oi = *data->request; + + if (odb_source_read_object_info(&data->loose->base, + oid, &oi, 0) < 0) + return -1; + + return data->cb(oid, &oi, data->cb_data); + } else { + return data->cb(oid, NULL, data->cb_data); + } +} + +static int odb_source_loose_for_each_object(struct odb_source *source, + const struct object_info *request, + odb_for_each_object_cb cb, + void *cb_data, + const struct odb_for_each_object_options *opts) +{ + struct odb_source_loose *loose = odb_source_loose_downcast(source); + struct for_each_object_wrapper_data data = { + .loose = loose, + .request = request, + .cb = cb, + .cb_data = cb_data, + }; + + /* There are no loose promisor objects, so we can return immediately. 
*/ + if ((opts->flags & ODB_FOR_EACH_OBJECT_PROMISOR_ONLY)) + return 0; + if ((opts->flags & ODB_FOR_EACH_OBJECT_LOCAL_ONLY) && !source->local) + return 0; + + if (opts->prefix) + return oidtree_each(odb_source_loose_cache(loose, opts->prefix), + opts->prefix, opts->prefix_hex_len, + for_each_prefixed_object_wrapper_cb, &data); + + return for_each_loose_file_in_source(source, for_each_object_wrapper_cb, + NULL, NULL, &data); +} + static void odb_source_loose_clear_cache(struct odb_source_loose *loose) { oidtree_clear(loose->cache); @@ -273,6 +536,7 @@ struct odb_source_loose *odb_source_loose_new(struct odb_source_files *files) loose->base.reprepare = odb_source_loose_reprepare; loose->base.read_object_info = odb_source_loose_read_object_info; loose->base.read_object_stream = odb_source_loose_read_object_stream; + loose->base.for_each_object = odb_source_loose_for_each_object; if (!is_absolute_path(loose->base.path)) chdir_notify_register(NULL, odb_source_loose_reparent, loose); From 8a6da81cc113607bdc1ac08395f6e7121cd652e9 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 1 Jun 2026 10:20:32 +0200 Subject: [PATCH 057/107] odb/source-loose: wire up `find_abbrev_len()` callback Move `odb_source_loose_find_abbrev_len()` and its associated helpers from "object-file.c" into "odb/source-loose.c" and wire it up as the `find_abbrev_len` callback of the loose source. 
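The rule applied by `find_abbrev_len_cb()` can be sketched in isolation. The sketch below assumes that object IDs are given as equal-length lowercase hex strings and that all candidate objects are available in an array. `common_prefix_hexlen()` and `find_abbrev_len()` are hypothetical stand-ins for `oid_common_prefix_hexlen()` and the loose-source iteration; they are not Git functions. Every other object that shares at least `len` leading digits with the target pushes the abbreviation to one digit past the shared prefix, while an exact match (the target itself) is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for oid_common_prefix_hexlen(): the number of
 * leading hex digits that two equal-length hex object IDs share. */
static unsigned common_prefix_hexlen(const char *a, const char *b)
{
	unsigned n = 0;

	while (a[n] && a[n] == b[n])
		n++;
	return n;
}

/*
 * Mirrors the update rule in find_abbrev_len_cb(). Unlike Git, this scans
 * the whole array instead of a prefix-restricted iteration; the
 * `common >= len` check makes that equivalent here.
 */
static unsigned find_abbrev_len(const char *target, const char **oids,
				size_t nr, unsigned min_len)
{
	size_t hexsz = strlen(target);
	unsigned len = min_len;

	for (size_t i = 0; i < nr; i++) {
		unsigned common = common_prefix_hexlen(oids[i], target);

		/* Skip the target itself; extend past any longer overlap. */
		if (common != hexsz && common >= len)
			len = common + 1;
	}
	return len;
}
```

For example, with target `abcdef` and another object `abcde0`, the shared prefix is five digits, so the result is six. The final length does not depend on iteration order: it is one more than the longest prefix shared with any other object if that prefix is at least `min_len` digits long, and `min_len` otherwise.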
Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- object-file.c | 39 --------------------------------------- object-file.h | 12 ------------ odb/source-files.c | 2 +- odb/source-loose.c | 40 ++++++++++++++++++++++++++++++++++++++++ 4 files changed, 41 insertions(+), 52 deletions(-) diff --git a/object-file.c b/object-file.c index 157ecad3ea204a..11957aa44f44fd 100644 --- a/object-file.c +++ b/object-file.c @@ -1662,45 +1662,6 @@ int odb_source_loose_count_objects(struct odb_source *source, return ret; } -struct find_abbrev_len_data { - const struct object_id *oid; - unsigned len; -}; - -static int find_abbrev_len_cb(const struct object_id *oid, - struct object_info *oi UNUSED, - void *cb_data) -{ - struct find_abbrev_len_data *data = cb_data; - unsigned len = oid_common_prefix_hexlen(oid, data->oid); - if (len != hash_algos[oid->algo].hexsz && len >= data->len) - data->len = len + 1; - return 0; -} - -int odb_source_loose_find_abbrev_len(struct odb_source *source, - const struct object_id *oid, - unsigned min_len, - unsigned *out) -{ - struct odb_source_files *files = odb_source_files_downcast(source); - struct odb_for_each_object_options opts = { - .prefix = oid, - .prefix_hex_len = min_len, - }; - struct find_abbrev_len_data data = { - .oid = oid, - .len = min_len, - }; - int ret; - - ret = odb_source_for_each_object(&files->loose->base, NULL, find_abbrev_len_cb, - &data, &opts); - *out = data.len; - - return ret; -} - static int check_stream_oid(git_zstream *stream, const char *hdr, unsigned long size, diff --git a/object-file.h b/object-file.h index 9ee5649220931b..96760db0e1cb2b 100644 --- a/object-file.h +++ b/object-file.h @@ -110,18 +110,6 @@ int odb_source_loose_count_objects(struct odb_source *source, enum odb_count_objects_flags flags, unsigned long *out); -/* - * Find the shortest unique prefix for the given object ID, where `min_len` is - * the minimum length that the prefix should have. 
- * - * Returns 0 on success, in which case the computed length will be written to - * `out`. Otherwise, a negative error code is returned. - */ -int odb_source_loose_find_abbrev_len(struct odb_source *source, - const struct object_id *oid, - unsigned min_len, - unsigned *out); - /** * format_object_header() is a thin wrapper around s xsnprintf() that * writes the initial " " part of the loose object diff --git a/odb/source-files.c b/odb/source-files.c index 676a641739bcbf..4a54b10e4af11d 100644 --- a/odb/source-files.c +++ b/odb/source-files.c @@ -136,7 +136,7 @@ static int odb_source_files_find_abbrev_len(struct odb_source *source, if (ret < 0) goto out; - ret = odb_source_loose_find_abbrev_len(source, oid, len, &len); + ret = odb_source_find_abbrev_len(&files->loose->base, oid, len, &len); if (ret < 0) goto out; diff --git a/odb/source-loose.c b/odb/source-loose.c index 4e8b923498b5b2..4b8d10bc870374 100644 --- a/odb/source-loose.c +++ b/odb/source-loose.c @@ -481,6 +481,45 @@ static int odb_source_loose_for_each_object(struct odb_source *source, NULL, NULL, &data); } +struct find_abbrev_len_data { + const struct object_id *oid; + unsigned len; +}; + +static int find_abbrev_len_cb(const struct object_id *oid, + struct object_info *oi UNUSED, + void *cb_data) +{ + struct find_abbrev_len_data *data = cb_data; + unsigned len = oid_common_prefix_hexlen(oid, data->oid); + if (len != hash_algos[oid->algo].hexsz && len >= data->len) + data->len = len + 1; + return 0; +} + +static int odb_source_loose_find_abbrev_len(struct odb_source *source, + const struct object_id *oid, + unsigned min_len, + unsigned *out) +{ + struct odb_source_loose *loose = odb_source_loose_downcast(source); + struct odb_for_each_object_options opts = { + .prefix = oid, + .prefix_hex_len = min_len, + }; + struct find_abbrev_len_data data = { + .oid = oid, + .len = min_len, + }; + int ret; + + ret = odb_source_for_each_object(&loose->base, NULL, find_abbrev_len_cb, + &data, &opts); + *out = 
data.len; + + return ret; +} + static void odb_source_loose_clear_cache(struct odb_source_loose *loose) { oidtree_clear(loose->cache); @@ -537,6 +576,7 @@ struct odb_source_loose *odb_source_loose_new(struct odb_source_files *files) loose->base.read_object_info = odb_source_loose_read_object_info; loose->base.read_object_stream = odb_source_loose_read_object_stream; loose->base.for_each_object = odb_source_loose_for_each_object; + loose->base.find_abbrev_len = odb_source_loose_find_abbrev_len; if (!is_absolute_path(loose->base.path)) chdir_notify_register(NULL, odb_source_loose_reparent, loose); From 2ade08ac2978dc1c908602c2a4d653836ecb5acb Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 1 Jun 2026 10:20:33 +0200 Subject: [PATCH 058/107] odb/source-loose: wire up `count_objects()` callback Move `odb_source_loose_count_objects()` and its associated helpers from "object-file.c" into "odb/source-loose.c" and wire it up as the `count_objects()` callback of the loose source. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- builtin/gc.c | 6 ++--- object-file.c | 60 --------------------------------------------- object-file.h | 14 ----------- odb/source-files.c | 2 +- odb/source-loose.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 65 insertions(+), 78 deletions(-) diff --git a/builtin/gc.c b/builtin/gc.c index 84a66d32404e4d..c26c93ee0fe4a3 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -466,6 +466,7 @@ static int rerere_gc_condition(struct gc_config *cfg UNUSED) static int too_many_loose_objects(int limit) { + struct odb_source_files *files = odb_source_files_downcast(the_repository->objects->sources); /* * This is weird, but stems from legacy behaviour: the GC auto * threshold was always essentially interpreted as if it was rounded up @@ -474,9 +475,8 @@ static int too_many_loose_objects(int limit) int auto_threshold = DIV_ROUND_UP(limit, 256) * 256; unsigned long loose_count; - if 
(odb_source_loose_count_objects(the_repository->objects->sources, - ODB_COUNT_OBJECTS_APPROXIMATE, - &loose_count) < 0) + if (odb_source_count_objects(&files->loose->base, ODB_COUNT_OBJECTS_APPROXIMATE, + &loose_count) < 0) return 0; return loose_count > auto_threshold; diff --git a/object-file.c b/object-file.c index 11957aa44f44fd..9b2044de3784e6 100644 --- a/object-file.c +++ b/object-file.c @@ -1602,66 +1602,6 @@ int for_each_loose_file_in_source(struct odb_source *source, return r; } -static int count_loose_object(const struct object_id *oid UNUSED, - struct object_info *oi UNUSED, - void *payload) -{ - unsigned long *count = payload; - (*count)++; - return 0; -} - -int odb_source_loose_count_objects(struct odb_source *source, - enum odb_count_objects_flags flags, - unsigned long *out) -{ - struct odb_source_files *files = odb_source_files_downcast(source); - const unsigned hexsz = source->odb->repo->hash_algo->hexsz - 2; - char *path = NULL; - DIR *dir = NULL; - int ret; - - if (flags & ODB_COUNT_OBJECTS_APPROXIMATE) { - unsigned long count = 0; - struct dirent *ent; - - path = xstrfmt("%s/17", source->path); - - dir = opendir(path); - if (!dir) { - if (errno == ENOENT) { - *out = 0; - ret = 0; - goto out; - } - - ret = error_errno("cannot open object shard '%s'", path); - goto out; - } - - while ((ent = readdir(dir)) != NULL) { - if (strspn(ent->d_name, "0123456789abcdef") != hexsz || - ent->d_name[hexsz] != '\0') - continue; - count++; - } - - *out = count * 256; - ret = 0; - } else { - struct odb_for_each_object_options opts = { 0 }; - *out = 0; - ret = odb_source_for_each_object(&files->loose->base, NULL, count_loose_object, - out, &opts); - } - -out: - if (dir) - closedir(dir); - free(path); - return ret; -} - static int check_stream_oid(git_zstream *stream, const char *hdr, unsigned long size, diff --git a/object-file.h b/object-file.h index 96760db0e1cb2b..bc72d89f548915 100644 --- a/object-file.h +++ b/object-file.h @@ -96,20 +96,6 @@ int 
for_each_file_in_obj_subdir(unsigned int subdir_nr, each_loose_subdir_fn subdir_cb, void *data); -/* - * Count the number of loose objects in this source. - * - * The object count is approximated by opening a single sharding directory for - * loose objects and scanning its contents. The result is then extrapolated by - * 256. This should generally work as a reasonable estimate given that the - * object hash is supposed to be indistinguishable from random. - * - * Returns 0 on success, a negative error code otherwise. - */ -int odb_source_loose_count_objects(struct odb_source *source, - enum odb_count_objects_flags flags, - unsigned long *out); - /** * format_object_header() is a thin wrapper around s xsnprintf() that * writes the initial " " part of the loose object diff --git a/odb/source-files.c b/odb/source-files.c index 4a54b10e4af11d..d5454e170dee66 100644 --- a/odb/source-files.c +++ b/odb/source-files.c @@ -109,7 +109,7 @@ static int odb_source_files_count_objects(struct odb_source *source, if (!(flags & ODB_COUNT_OBJECTS_APPROXIMATE)) { unsigned long loose_count; - ret = odb_source_loose_count_objects(source, flags, &loose_count); + ret = odb_source_count_objects(&files->loose->base, flags, &loose_count); if (ret < 0) goto out; diff --git a/odb/source-loose.c b/odb/source-loose.c index 4b8d10bc870374..27be066327a313 100644 --- a/odb/source-loose.c +++ b/odb/source-loose.c @@ -520,6 +520,66 @@ static int odb_source_loose_find_abbrev_len(struct odb_source *source, return ret; } +static int count_loose_object(const struct object_id *oid UNUSED, + struct object_info *oi UNUSED, + void *payload) +{ + unsigned long *count = payload; + (*count)++; + return 0; +} + +static int odb_source_loose_count_objects(struct odb_source *source, + enum odb_count_objects_flags flags, + unsigned long *out) +{ + struct odb_source_loose *loose = odb_source_loose_downcast(source); + const unsigned hexsz = source->odb->repo->hash_algo->hexsz - 2; + char *path = NULL; + DIR *dir = 
NULL; + int ret; + + if (flags & ODB_COUNT_OBJECTS_APPROXIMATE) { + unsigned long count = 0; + struct dirent *ent; + + path = xstrfmt("%s/17", source->path); + + dir = opendir(path); + if (!dir) { + if (errno == ENOENT) { + *out = 0; + ret = 0; + goto out; + } + + ret = error_errno("cannot open object shard '%s'", path); + goto out; + } + + while ((ent = readdir(dir)) != NULL) { + if (strspn(ent->d_name, "0123456789abcdef") != hexsz || + ent->d_name[hexsz] != '\0') + continue; + count++; + } + + *out = count * 256; + ret = 0; + } else { + struct odb_for_each_object_options opts = { 0 }; + *out = 0; + ret = odb_source_for_each_object(&loose->base, NULL, count_loose_object, + out, &opts); + } + +out: + if (dir) + closedir(dir); + free(path); + return ret; +} + static void odb_source_loose_clear_cache(struct odb_source_loose *loose) { oidtree_clear(loose->cache); @@ -577,6 +637,7 @@ struct odb_source_loose *odb_source_loose_new(struct odb_source_files *files) loose->base.read_object_stream = odb_source_loose_read_object_stream; loose->base.for_each_object = odb_source_loose_for_each_object; loose->base.find_abbrev_len = odb_source_loose_find_abbrev_len; + loose->base.count_objects = odb_source_loose_count_objects; if (!is_absolute_path(loose->base.path)) chdir_notify_register(NULL, odb_source_loose_reparent, loose); From 86f7ab5a1f12ecfdf51b6df0b9b014e2329944be Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 1 Jun 2026 10:20:34 +0200 Subject: [PATCH 059/107] odb/source-loose: drop `odb_source_loose_has_object()` The function `odb_source_loose_has_object()` checks whether a specific object exists as a loose object on disk by using lstat(3p). This interface is somewhat redundant, as we typically check for object existence in a generic way via `odb_source_read_object_info()`. 
In fact, these two calls are redundant in case the latter is called in a specific way: when called without an object info request and without the `OBJECT_INFO_QUICK` flag, then we will end up doing the same call to lstat(3p) in `read_object_info_from_path()`. Drop the function and adapt callers to instead use the generic interface so that its calling conventions align with that of other sources. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- builtin/pack-objects.c | 12 ++++++++---- object-file.c | 12 ++++-------- object-file.h | 8 -------- 3 files changed, 12 insertions(+), 20 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 480cc0bd8c8d22..a6be3d659f8e36 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1750,9 +1750,11 @@ static int want_object_in_pack_mtime(const struct object_id *oid, * skip the local object source. */ struct odb_source *source = the_repository->objects->sources->next; - for (; source; source = source->next) - if (odb_source_loose_has_object(source, oid)) + for (; source; source = source->next) { + struct odb_source_files *files = odb_source_files_downcast(source); + if (!odb_source_read_object_info(&files->loose->base, oid, NULL, 0)) return 0; + } } /* @@ -4135,9 +4137,11 @@ static void add_cruft_object_entry(const struct object_id *oid, enum object_type struct odb_source *source = the_repository->objects->sources; int found = 0; - for (; !found && source; source = source->next) - if (odb_source_loose_has_object(source, oid)) + for (; !found && source; source = source->next) { + struct odb_source_files *files = odb_source_files_downcast(source); + if (!odb_source_read_object_info(&files->loose->base, oid, NULL, 0)) found = 1; + } /* * If a traversed tree has a missing blob then we want diff --git a/object-file.c b/object-file.c index 9b2044de3784e6..c83136cf70024c 100644 --- a/object-file.c +++ b/object-file.c @@ -96,12 +96,6 @@ static int check_and_freshen_source(struct 
odb_source *source, return check_and_freshen_file(path.buf, freshen); } -int odb_source_loose_has_object(struct odb_source *source, - const struct object_id *oid) -{ - return check_and_freshen_source(source, oid, 0); -} - int format_object_header(char *str, size_t size, enum object_type type, size_t objsize) { @@ -1000,9 +994,11 @@ int force_object_loose(struct odb_source *source, int hdrlen; int ret; - for (struct odb_source *s = source->odb->sources; s; s = s->next) - if (odb_source_loose_has_object(s, oid)) + for (struct odb_source *s = source->odb->sources; s; s = s->next) { + struct odb_source_files *files = odb_source_files_downcast(s); + if (!odb_source_read_object_info(&files->loose->base, oid, NULL, 0)) return 0; + } oi.typep = &type; oi.sizep = &len; diff --git a/object-file.h b/object-file.h index bc72d89f548915..506ca6be40b749 100644 --- a/object-file.h +++ b/object-file.h @@ -23,14 +23,6 @@ int index_path(struct index_state *istate, struct object_id *oid, const char *pa struct object_info; struct odb_source; -/* - * Return true iff an object database source has a loose object - * with the specified name. This function does not respect replace - * references. - */ -int odb_source_loose_has_object(struct odb_source *source, - const struct object_id *oid); - int odb_source_loose_freshen_object(struct odb_source *source, const struct object_id *oid); From d8b9e8bb23ece128179ad54ed5ecbcd4bd809b1e Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 1 Jun 2026 10:20:35 +0200 Subject: [PATCH 060/107] odb/source-loose: wire up `freshen_object()` callback Move `odb_source_loose_freshen_object()` from "object-file.c" into "odb/source-loose.c" and wire it up as the `freshen_object()` callback of the loose source. As part of the move, `check_and_freshen_source()` is inlined into the callback function, as it has no other callers anymore. 
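The inlined freshen path can be approximated with plain POSIX calls. The sketch below assumes that "check and freshen" means two things:

- report whether the file exists;
- if asked to freshen, bump its timestamps to the current time with `utime(path, NULL)`.

`sketch_check_and_freshen()` is a hypothetical name. It is not the actual `check_and_freshen_file()` from "object-file.c".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <unistd.h>
#include <utime.h>

/*
 * Hypothetical sketch, not Git's code: return 1 if `path` exists and, when
 * `freshen` is set, its access and modification times were set to "now";
 * return 0 otherwise.
 */
static int sketch_check_and_freshen(const char *path, int freshen)
{
	assert(path);
	if (access(path, F_OK))
		return 0;	/* missing: nothing to check or freshen */
	if (freshen && utime(path, NULL))
		return 0;	/* present, but the timestamp update failed */
	return 1;
}
```

Touching an existing file is far cheaper than rewriting it. The loose write path in this series likewise returns early when `odb_freshen_object()` succeeds.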
Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- object-file.c | 15 --------------- object-file.h | 3 --- odb/source-files.c | 2 +- odb/source-loose.c | 9 +++++++++ 4 files changed, 10 insertions(+), 19 deletions(-) diff --git a/object-file.c b/object-file.c index c83136cf70024c..0689a4e67b156a 100644 --- a/object-file.c +++ b/object-file.c @@ -87,15 +87,6 @@ int check_and_freshen_file(const char *fn, int freshen) return 1; } -static int check_and_freshen_source(struct odb_source *source, - const struct object_id *oid, - int freshen) -{ - static struct strbuf path = STRBUF_INIT; - odb_loose_path(source, &path, oid); - return check_and_freshen_file(path.buf, freshen); -} - int format_object_header(char *str, size_t size, enum object_type type, size_t objsize) { @@ -815,12 +806,6 @@ static int write_loose_object(struct odb_source *source, FOF_SKIP_COLLISION_CHECK); } -int odb_source_loose_freshen_object(struct odb_source *source, - const struct object_id *oid) -{ - return !!check_and_freshen_source(source, oid, 1); -} - int odb_source_loose_write_stream(struct odb_source *source, struct odb_write_stream *in_stream, size_t len, struct object_id *oid) diff --git a/object-file.h b/object-file.h index 506ca6be40b749..1d90df9d98b78e 100644 --- a/object-file.h +++ b/object-file.h @@ -23,9 +23,6 @@ int index_path(struct index_state *istate, struct object_id *oid, const char *pa struct object_info; struct odb_source; -int odb_source_loose_freshen_object(struct odb_source *source, - const struct object_id *oid); - int odb_source_loose_write_object(struct odb_source *source, const void *buf, unsigned long len, enum object_type type, struct object_id *oid, diff --git a/odb/source-files.c b/odb/source-files.c index d5454e170dee66..ef548e6fe69cd0 100644 --- a/odb/source-files.c +++ b/odb/source-files.c @@ -152,7 +152,7 @@ static int odb_source_files_freshen_object(struct odb_source *source, { struct odb_source_files *files = odb_source_files_downcast(source); 
if (packfile_store_freshen_object(files->packed, oid) || - odb_source_loose_freshen_object(source, oid)) + odb_source_freshen_object(&files->loose->base, oid)) return 1; return 0; } diff --git a/odb/source-loose.c b/odb/source-loose.c index 27be066327a313..e519365d23f680 100644 --- a/odb/source-loose.c +++ b/odb/source-loose.c @@ -580,6 +580,14 @@ static int odb_source_loose_count_objects(struct odb_source *source, return ret; } +static int odb_source_loose_freshen_object(struct odb_source *source, + const struct object_id *oid) +{ + static struct strbuf path = STRBUF_INIT; + odb_loose_path(source, &path, oid); + return !!check_and_freshen_file(path.buf, 1); +} + static void odb_source_loose_clear_cache(struct odb_source_loose *loose) { oidtree_clear(loose->cache); @@ -638,6 +646,7 @@ struct odb_source_loose *odb_source_loose_new(struct odb_source_files *files) loose->base.for_each_object = odb_source_loose_for_each_object; loose->base.find_abbrev_len = odb_source_loose_find_abbrev_len; loose->base.count_objects = odb_source_loose_count_objects; + loose->base.freshen_object = odb_source_loose_freshen_object; if (!is_absolute_path(loose->base.path)) chdir_notify_register(NULL, odb_source_loose_reparent, loose); From 87588db131a5c1c33471606860951c9959bbe6ae Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 1 Jun 2026 10:20:36 +0200 Subject: [PATCH 061/107] loose: refactor object map to operate on `struct odb_source_loose` While the loose object map functions in "loose.c" accept a generic `struct odb_source *`, they always expect this to be the "files" backend. Furthermore, the subsystem doesn't even care about the "files" backend, but only uses it as a stepping stone to get to the "loose" backend. This assumption is implicit and thus not immediately obvious. Refactor the interfaces to operate on a `struct odb_source_loose` instead, which eliminates the implicit dependency and the unnecessary detour via the "files" source.
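The difference between the old and new signatures can be illustrated with simplified, hypothetical structures.

- Before the change, a function taking the generic base type had to downcast to the "files" container and then follow its `loose` pointer. This silently assumed the caller passed a "files" source.
- After the change, the parameter type states the dependency directly.

The `container_of`-style `sketch_downcast()` macro below is an assumption about how such a downcast can be implemented. It is not a copy of Git's helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the real structures. */
struct sketch_source {
	const char *path;
};

struct sketch_loose {
	struct sketch_source base;
	int map_entries;
};

struct sketch_files {
	struct sketch_source base;
	struct sketch_loose *loose;
};

/* container_of-style downcast from an embedded base to its container. */
#define sketch_downcast(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Before: generic parameter, implicitly assumed to be a "files" source. */
static int map_entries_old(struct sketch_source *source)
{
	struct sketch_files *files =
		sketch_downcast(source, struct sketch_files, base);

	return files->loose->map_entries;
}

/* After: the dependency on the loose source is part of the signature. */
static int map_entries_new(struct sketch_loose *loose)
{
	assert(loose);
	return loose->map_entries;
}
```

Passing anything other than a "files" source to `map_entries_old()` would compute a bogus pointer, and nothing in its signature warns about that. With `map_entries_new()`, the compiler flags a mismatched pointer type instead.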
Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- loose.c | 45 ++++++++++++++++++++++----------------------- loose.h | 4 ++-- object-file.c | 9 ++++++--- 3 files changed, 30 insertions(+), 28 deletions(-) diff --git a/loose.c b/loose.c index f7a3dd1a72f0fc..0b626c1b854642 100644 --- a/loose.c +++ b/loose.c @@ -46,38 +46,36 @@ static int insert_oid_pair(kh_oid_map_t *map, const struct object_id *key, const return 1; } -static int insert_loose_map(struct odb_source *source, +static int insert_loose_map(struct odb_source_loose *loose, const struct object_id *oid, const struct object_id *compat_oid) { - struct odb_source_files *files = odb_source_files_downcast(source); - struct loose_object_map *map = files->loose->map; + struct loose_object_map *map = loose->map; int inserted = 0; inserted |= insert_oid_pair(map->to_compat, oid, compat_oid); inserted |= insert_oid_pair(map->to_storage, compat_oid, oid); if (inserted) - oidtree_insert(files->loose->cache, compat_oid, NULL); + oidtree_insert(loose->cache, compat_oid, NULL); return inserted; } -static int load_one_loose_object_map(struct repository *repo, struct odb_source *source) +static int load_one_loose_object_map(struct repository *repo, struct odb_source_loose *loose) { - struct odb_source_files *files = odb_source_files_downcast(source); struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT; FILE *fp; - if (!files->loose->map) - loose_object_map_init(&files->loose->map); - if (!files->loose->cache) { - ALLOC_ARRAY(files->loose->cache, 1); - oidtree_init(files->loose->cache); + if (!loose->map) + loose_object_map_init(&loose->map); + if (!loose->cache) { + ALLOC_ARRAY(loose->cache, 1); + oidtree_init(loose->cache); } - insert_loose_map(source, repo->hash_algo->empty_tree, repo->compat_hash_algo->empty_tree); - insert_loose_map(source, repo->hash_algo->empty_blob, repo->compat_hash_algo->empty_blob); - insert_loose_map(source, repo->hash_algo->null_oid, repo->compat_hash_algo->null_oid); + 
insert_loose_map(loose, repo->hash_algo->empty_tree, repo->compat_hash_algo->empty_tree); + insert_loose_map(loose, repo->hash_algo->empty_blob, repo->compat_hash_algo->empty_blob); + insert_loose_map(loose, repo->hash_algo->null_oid, repo->compat_hash_algo->null_oid); repo_common_path_replace(repo, &path, "objects/loose-object-idx"); fp = fopen(path.buf, "rb"); @@ -97,7 +95,7 @@ static int load_one_loose_object_map(struct repository *repo, struct odb_source parse_oid_hex_algop(p, &compat_oid, &p, repo->compat_hash_algo) || p != buf.buf + buf.len) goto err; - insert_loose_map(source, &oid, &compat_oid); + insert_loose_map(loose, &oid, &compat_oid); } strbuf_release(&buf); @@ -119,7 +117,8 @@ int repo_read_loose_object_map(struct repository *repo) odb_prepare_alternates(repo->objects); for (source = repo->objects->sources; source; source = source->next) { - if (load_one_loose_object_map(repo, source) < 0) { + struct odb_source_files *files = odb_source_files_downcast(source); + if (load_one_loose_object_map(repo, files->loose) < 0) { return -1; } } @@ -171,7 +170,7 @@ int repo_write_loose_object_map(struct repository *repo) return -1; } -static int write_one_object(struct odb_source *source, +static int write_one_object(struct odb_source_loose *loose, const struct object_id *oid, const struct object_id *compat_oid) { @@ -180,7 +179,7 @@ static int write_one_object(struct odb_source *source, struct stat st; struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT; - strbuf_addf(&path, "%s/loose-object-idx", source->path); + strbuf_addf(&path, "%s/loose-object-idx", loose->base.path); hold_lock_file_for_update_timeout(&lock, path.buf, LOCK_DIE_ON_ERROR, -1); fd = open(path.buf, O_WRONLY | O_CREAT | O_APPEND, 0666); @@ -196,7 +195,7 @@ static int write_one_object(struct odb_source *source, goto errout; if (close(fd)) goto errout; - adjust_shared_perm(source->odb->repo, path.buf); + adjust_shared_perm(loose->base.odb->repo, path.buf); rollback_lock_file(&lock); 
strbuf_release(&buf); strbuf_release(&path); @@ -210,18 +209,18 @@ static int write_one_object(struct odb_source *source, return -1; } -int repo_add_loose_object_map(struct odb_source *source, +int repo_add_loose_object_map(struct odb_source_loose *loose, const struct object_id *oid, const struct object_id *compat_oid) { int inserted = 0; - if (!should_use_loose_object_map(source->odb->repo)) + if (!should_use_loose_object_map(loose->base.odb->repo)) return 0; - inserted = insert_loose_map(source, oid, compat_oid); + inserted = insert_loose_map(loose, oid, compat_oid); if (inserted) - return write_one_object(source, oid, compat_oid); + return write_one_object(loose, oid, compat_oid); return 0; } diff --git a/loose.h b/loose.h index 6af1702973c058..6c9b3f4571602f 100644 --- a/loose.h +++ b/loose.h @@ -4,7 +4,7 @@ #include "khash.h" struct repository; -struct odb_source; +struct odb_source_loose; struct loose_object_map { kh_oid_map_t *to_compat; @@ -17,7 +17,7 @@ int repo_loose_object_map_oid(struct repository *repo, const struct object_id *src, const struct git_hash_algo *dest_algo, struct object_id *dest); -int repo_add_loose_object_map(struct odb_source *source, +int repo_add_loose_object_map(struct odb_source_loose *loose, const struct object_id *oid, const struct object_id *compat_oid); int repo_read_loose_object_map(struct repository *repo); diff --git a/object-file.c b/object-file.c index 0689a4e67b156a..fe24f00d1b79bf 100644 --- a/object-file.c +++ b/object-file.c @@ -810,6 +810,7 @@ int odb_source_loose_write_stream(struct odb_source *source, struct odb_write_stream *in_stream, size_t len, struct object_id *oid) { + struct odb_source_files *files = odb_source_files_downcast(source); const struct git_hash_algo *compat = source->odb->repo->compat_hash_algo; struct object_id compat_oid; int fd, ret, err = 0, flush = 0; @@ -918,7 +919,7 @@ int odb_source_loose_write_stream(struct odb_source *source, err = finalize_object_file_flags(source->odb->repo, 
tmp_file.buf, filename.buf, FOF_SKIP_COLLISION_CHECK); if (!err && compat) - err = repo_add_loose_object_map(source, oid, &compat_oid); + err = repo_add_loose_object_map(files->loose, oid, &compat_oid); cleanup: strbuf_release(&tmp_file); strbuf_release(&filename); @@ -931,6 +932,7 @@ int odb_source_loose_write_object(struct odb_source *source, struct object_id *compat_oid_in, enum odb_write_object_flags flags) { + struct odb_source_files *files = odb_source_files_downcast(source); const struct git_hash_algo *algo = source->odb->repo->hash_algo; const struct git_hash_algo *compat = source->odb->repo->compat_hash_algo; struct object_id compat_oid; @@ -962,13 +964,14 @@ int odb_source_loose_write_object(struct odb_source *source, if (write_loose_object(source, oid, hdr, hdrlen, buf, len, 0, flags)) return -1; if (compat) - return repo_add_loose_object_map(source, oid, &compat_oid); + return repo_add_loose_object_map(files->loose, oid, &compat_oid); return 0; } int force_object_loose(struct odb_source *source, const struct object_id *oid, time_t mtime) { + struct odb_source_files *files = odb_source_files_downcast(source); const struct git_hash_algo *compat = source->odb->repo->compat_hash_algo; void *buf; unsigned long len; @@ -998,7 +1001,7 @@ int force_object_loose(struct odb_source *source, hdrlen = format_object_header(hdr, sizeof(hdr), type, len); ret = write_loose_object(source, oid, hdr, hdrlen, buf, len, mtime, 0); if (!ret && compat) - ret = repo_add_loose_object_map(source, oid, &compat_oid); + ret = repo_add_loose_object_map(files->loose, oid, &compat_oid); free(buf); return ret; From 04a6e84cbdbebadd01d939168f1c69680c174fce Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 1 Jun 2026 10:20:37 +0200 Subject: [PATCH 062/107] odb/source-loose: wire up `write_object()` callback Move `odb_source_loose_write_object()` from "object-file.c" into "odb/source-loose.c" and wire it up as the `write_object()` callback of the loose source. 
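The resulting dispatch can be sketched with a hypothetical one-callback interface. The "files" source's `write_object` implementation simply forwards to the generic dispatcher on its "loose" source. This mirrors how `odb_source_files_write_object()` now calls `odb_source_write_object(&files->loose->base, ...)`.

Everything here is simplified for illustration: the `sketch_source` type, the byte counter, and embedding `loose` by value instead of by pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-callback source interface. */
struct sketch_source {
	int (*write_object)(struct sketch_source *source,
			    const char *buf, size_t len);
	size_t written;	/* bytes accepted, for demonstration only */
};

/* Generic dispatcher, in the spirit of odb_source_write_object(). */
static int sketch_write_object(struct sketch_source *source,
			       const char *buf, size_t len)
{
	assert(source && source->write_object);
	return source->write_object(source, buf, len);
}

/* "Loose" backend: a real one would write a compressed file here. */
static int loose_write_object(struct sketch_source *source,
			      const char *buf, size_t len)
{
	(void)buf;
	source->written += len;
	return 0;
}

struct sketch_files {
	struct sketch_source base;	/* first member, so the cast below is valid */
	struct sketch_source loose;	/* embedded by value for brevity */
};

/* "Files" backend: forward the write to the loose source's callback. */
static int files_write_object(struct sketch_source *source,
			      const char *buf, size_t len)
{
	struct sketch_files *files = (struct sketch_files *)source;

	return sketch_write_object(&files->loose, buf, len);
}
```

Routing through the generic dispatcher, rather than calling the loose implementation directly, changes what the "files" source depends on. It only needs the loose source's interface, not its function names.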
As in preceding commits, this requires us to expose a couple of generic functions from "object-file.c" as they are used in both subsystems now. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- object-file.c | 58 +++++++--------------------------------------- object-file.h | 14 ++++++----- odb/source-files.c | 5 ++-- odb/source-loose.c | 44 +++++++++++++++++++++++++++++++++++ 4 files changed, 63 insertions(+), 58 deletions(-) diff --git a/object-file.c b/object-file.c index fe24f00d1b79bf..7bb5b31bcad88b 100644 --- a/object-file.c +++ b/object-file.c @@ -326,10 +326,10 @@ static void hash_object_body(const struct git_hash_algo *algo, struct git_hash_c git_hash_final_oid(oid, c); } -static void write_object_file_prepare(const struct git_hash_algo *algo, - const void *buf, unsigned long len, - enum object_type type, struct object_id *oid, - char *hdr, int *hdrlen) +void write_object_file_prepare(const struct git_hash_algo *algo, + const void *buf, unsigned long len, + enum object_type type, struct object_id *oid, + char *hdr, int *hdrlen) { struct git_hash_ctx c; @@ -746,10 +746,10 @@ static int end_loose_object_common(struct odb_source *source, return Z_OK; } -static int write_loose_object(struct odb_source *source, - const struct object_id *oid, char *hdr, - int hdrlen, const void *buf, unsigned long len, - time_t mtime, unsigned flags) +int write_loose_object(struct odb_source *source, + const struct object_id *oid, char *hdr, + int hdrlen, const void *buf, unsigned long len, + time_t mtime, unsigned flags) { int fd, ret; unsigned char compressed[4096]; @@ -926,48 +926,6 @@ int odb_source_loose_write_stream(struct odb_source *source, return err; } -int odb_source_loose_write_object(struct odb_source *source, - const void *buf, unsigned long len, - enum object_type type, struct object_id *oid, - struct object_id *compat_oid_in, - enum odb_write_object_flags flags) -{ - struct odb_source_files *files = odb_source_files_downcast(source); - const 
struct git_hash_algo *algo = source->odb->repo->hash_algo; - const struct git_hash_algo *compat = source->odb->repo->compat_hash_algo; - struct object_id compat_oid; - char hdr[MAX_HEADER_LEN]; - int hdrlen = sizeof(hdr); - - /* Generate compat_oid */ - if (compat) { - if (compat_oid_in) - oidcpy(&compat_oid, compat_oid_in); - else if (type == OBJ_BLOB) - hash_object_file(compat, buf, len, type, &compat_oid); - else { - struct strbuf converted = STRBUF_INIT; - convert_object_file(source->odb->repo, &converted, algo, compat, - buf, len, type, 0); - hash_object_file(compat, converted.buf, converted.len, - type, &compat_oid); - strbuf_release(&converted); - } - } - - /* Normally if we have it in the pack then we do not bother writing - * it out into .git/objects/??/?{38} file. - */ - write_object_file_prepare(algo, buf, len, type, oid, hdr, &hdrlen); - if (odb_freshen_object(source->odb, oid)) - return 0; - if (write_loose_object(source, oid, hdr, hdrlen, buf, len, 0, flags)) - return -1; - if (compat) - return repo_add_loose_object_map(files->loose, oid, &compat_oid); - return 0; -} - int force_object_loose(struct odb_source *source, const struct object_id *oid, time_t mtime) { diff --git a/object-file.h b/object-file.h index 1d90df9d98b78e..2b32592de1135b 100644 --- a/object-file.h +++ b/object-file.h @@ -23,12 +23,6 @@ int index_path(struct index_state *istate, struct object_id *oid, const char *pa struct object_info; struct odb_source; -int odb_source_loose_write_object(struct odb_source *source, - const void *buf, unsigned long len, - enum object_type type, struct object_id *oid, - struct object_id *compat_oid_in, - enum odb_write_object_flags flags); - int odb_source_loose_write_stream(struct odb_source *source, struct odb_write_stream *stream, size_t len, struct object_id *oid); @@ -129,6 +123,14 @@ int finalize_object_file_flags(struct repository *repo, void hash_object_file(const struct git_hash_algo *algo, const void *buf, unsigned long len, enum object_type 
type, struct object_id *oid); +void write_object_file_prepare(const struct git_hash_algo *algo, + const void *buf, unsigned long len, + enum object_type type, struct object_id *oid, + char *hdr, int *hdrlen); +int write_loose_object(struct odb_source *source, + const struct object_id *oid, char *hdr, + int hdrlen, const void *buf, unsigned long len, + time_t mtime, unsigned flags); /* Helper to check and "touch" a file */ int check_and_freshen_file(const char *fn, int freshen); diff --git a/odb/source-files.c b/odb/source-files.c index ef548e6fe69cd0..52ba04237acfd7 100644 --- a/odb/source-files.c +++ b/odb/source-files.c @@ -164,8 +164,9 @@ static int odb_source_files_write_object(struct odb_source *source, struct object_id *compat_oid, enum odb_write_object_flags flags) { - return odb_source_loose_write_object(source, buf, len, type, - oid, compat_oid, flags); + struct odb_source_files *files = odb_source_files_downcast(source); + return odb_source_write_object(&files->loose->base, buf, len, type, + oid, compat_oid, flags); } static int odb_source_files_write_object_stream(struct odb_source *source, diff --git a/odb/source-loose.c b/odb/source-loose.c index e519365d23f680..c91018109e5b68 100644 --- a/odb/source-loose.c +++ b/odb/source-loose.c @@ -5,6 +5,7 @@ #include "hex.h" #include "loose.h" #include "object-file.h" +#include "object-file-convert.h" #include "odb.h" #include "odb/source-files.h" #include "odb/source-loose.h" @@ -588,6 +589,48 @@ static int odb_source_loose_freshen_object(struct odb_source *source, return !!check_and_freshen_file(path.buf, 1); } +static int odb_source_loose_write_object(struct odb_source *source, + const void *buf, unsigned long len, + enum object_type type, struct object_id *oid, + struct object_id *compat_oid_in, + enum odb_write_object_flags flags) +{ + struct odb_source_loose *loose = odb_source_loose_downcast(source); + const struct git_hash_algo *algo = source->odb->repo->hash_algo; + const struct git_hash_algo *compat = 
source->odb->repo->compat_hash_algo; + struct object_id compat_oid; + char hdr[MAX_HEADER_LEN]; + int hdrlen = sizeof(hdr); + + /* Generate compat_oid */ + if (compat) { + if (compat_oid_in) + oidcpy(&compat_oid, compat_oid_in); + else if (type == OBJ_BLOB) + hash_object_file(compat, buf, len, type, &compat_oid); + else { + struct strbuf converted = STRBUF_INIT; + convert_object_file(source->odb->repo, &converted, algo, compat, + buf, len, type, 0); + hash_object_file(compat, converted.buf, converted.len, + type, &compat_oid); + strbuf_release(&converted); + } + } + + /* Normally if we have it in the pack then we do not bother writing + * it out into .git/objects/??/?{38} file. + */ + write_object_file_prepare(algo, buf, len, type, oid, hdr, &hdrlen); + if (odb_freshen_object(source->odb, oid)) + return 0; + if (write_loose_object(source, oid, hdr, hdrlen, buf, len, 0, flags)) + return -1; + if (compat) + return repo_add_loose_object_map(loose, oid, &compat_oid); + return 0; +} + static void odb_source_loose_clear_cache(struct odb_source_loose *loose) { oidtree_clear(loose->cache); @@ -647,6 +690,7 @@ struct odb_source_loose *odb_source_loose_new(struct odb_source_files *files) loose->base.find_abbrev_len = odb_source_loose_find_abbrev_len; loose->base.count_objects = odb_source_loose_count_objects; loose->base.freshen_object = odb_source_loose_freshen_object; + loose->base.write_object = odb_source_loose_write_object; if (!is_absolute_path(loose->base.path)) chdir_notify_register(NULL, odb_source_loose_reparent, loose); From b9906a645c38ef77643d661ac9a5a6aa31fbeaf4 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 1 Jun 2026 10:20:38 +0200 Subject: [PATCH 063/107] object-file: refactor writing objects to use loose source The "object-file" subsystem still hosts the majority of logic used to write loose objects. 
Eventually, we'll want to move this logic into "odb/source-loose.c", but this isn't yet easily possible because a lot of the writing logic is still being shared with `force_object_loose()`. We will eventually detangle this logic so that we can indeed move all of it into the "loose" source. Meanwhile though, refactor the code so that it operates on a `struct odb_source_loose` directly to already make the dependency explicit. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- http-walker.c | 3 +- http.c | 6 ++-- object-file.c | 75 +++++++++++++++++++++++----------------------- object-file.h | 6 ++-- odb/source-files.c | 3 +- odb/source-loose.c | 9 +++--- 6 files changed, 53 insertions(+), 49 deletions(-) diff --git a/http-walker.c b/http-walker.c index 1b6d496548373e..435a7265408fa4 100644 --- a/http-walker.c +++ b/http-walker.c @@ -539,8 +539,9 @@ static int fetch_object(struct walker *walker, const struct object_id *oid) } else if (!oideq(&obj_req->oid, &req->real_oid)) { ret = error("File %s has bad hash", hex); } else if (req->rename < 0) { + struct odb_source_files *files = odb_source_files_downcast(the_repository->objects->sources); struct strbuf buf = STRBUF_INIT; - odb_loose_path(the_repository->objects->sources, &buf, &req->oid); + odb_loose_path(files->loose, &buf, &req->oid); ret = error("unable to write sha1 filename %s", buf.buf); strbuf_release(&buf); } diff --git a/http.c b/http.c index ea9b16861bc3d4..3fcc0122337ba4 100644 --- a/http.c +++ b/http.c @@ -2826,6 +2826,7 @@ static size_t fwrite_sha1_file(char *ptr, size_t eltsize, size_t nmemb, struct http_object_request *new_http_object_request(const char *base_url, const struct object_id *oid) { + struct odb_source_files *files = odb_source_files_downcast(the_repository->objects->sources); char *hex = oid_to_hex(oid); struct strbuf filename = STRBUF_INIT; struct strbuf prevfile = STRBUF_INIT; @@ -2840,7 +2841,7 @@ struct http_object_request *new_http_object_request(const char *base_url, 
oidcpy(&freq->oid, oid); freq->localfile = -1; - odb_loose_path(the_repository->objects->sources, &filename, oid); + odb_loose_path(files->loose, &filename, oid); strbuf_addf(&freq->tmpfile, "%s.temp", filename.buf); strbuf_addf(&prevfile, "%s.prev", filename.buf); @@ -2966,6 +2967,7 @@ void process_http_object_request(struct http_object_request *freq) int finish_http_object_request(struct http_object_request *freq) { + struct odb_source_files *files = odb_source_files_downcast(the_repository->objects->sources); struct stat st; struct strbuf filename = STRBUF_INIT; @@ -2992,7 +2994,7 @@ int finish_http_object_request(struct http_object_request *freq) unlink_or_warn(freq->tmpfile.buf); return -1; } - odb_loose_path(the_repository->objects->sources, &filename, &freq->oid); + odb_loose_path(files->loose, &filename, &freq->oid); freq->rename = finalize_object_file(the_repository, freq->tmpfile.buf, filename.buf); strbuf_release(&filename); diff --git a/object-file.c b/object-file.c index 7bb5b31bcad88b..bce941874eb994 100644 --- a/object-file.c +++ b/object-file.c @@ -54,14 +54,14 @@ static void fill_loose_path(struct strbuf *buf, } } -const char *odb_loose_path(struct odb_source *source, +const char *odb_loose_path(struct odb_source_loose *loose, struct strbuf *buf, const struct object_id *oid) { strbuf_reset(buf); - strbuf_addstr(buf, source->path); + strbuf_addstr(buf, loose->base.path); strbuf_addch(buf, '/'); - fill_loose_path(buf, oid, source->odb->repo->hash_algo); + fill_loose_path(buf, oid, loose->base.odb->repo->hash_algo); return buf->buf; } @@ -575,14 +575,14 @@ static void flush_loose_object_transaction(struct odb_transaction_files *transac } /* Finalize a file on disk, and close it. 
*/ -static void close_loose_object(struct odb_source *source, +static void close_loose_object(struct odb_source_loose *loose, int fd, const char *filename) { - if (source->will_destroy) + if (loose->base.will_destroy) goto out; if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT)) - fsync_loose_object_transaction(source->odb->transaction, fd, filename); + fsync_loose_object_transaction(loose->base.odb->transaction, fd, filename); else if (fsync_object_files > 0) fsync_or_die(fd, filename); else @@ -651,7 +651,7 @@ static int create_tmpfile(struct repository *repo, * Returns a "fd", which should later be provided to * end_loose_object_common(). */ -static int start_loose_object_common(struct odb_source *source, +static int start_loose_object_common(struct odb_source_loose *loose, struct strbuf *tmp_file, const char *filename, unsigned flags, git_zstream *stream, @@ -659,18 +659,18 @@ static int start_loose_object_common(struct odb_source *source, struct git_hash_ctx *c, struct git_hash_ctx *compat_c, char *hdr, int hdrlen) { - const struct git_hash_algo *algo = source->odb->repo->hash_algo; - const struct git_hash_algo *compat = source->odb->repo->compat_hash_algo; + const struct git_hash_algo *algo = loose->base.odb->repo->hash_algo; + const struct git_hash_algo *compat = loose->base.odb->repo->compat_hash_algo; int fd; - fd = create_tmpfile(source->odb->repo, tmp_file, filename); + fd = create_tmpfile(loose->base.odb->repo, tmp_file, filename); if (fd < 0) { if (flags & ODB_WRITE_OBJECT_SILENT) return -1; else if (errno == EACCES) return error(_("insufficient permission for adding " "an object to repository database %s"), - source->path); + loose->base.path); else return error_errno( _("unable to create temporary file")); @@ -700,14 +700,14 @@ static int start_loose_object_common(struct odb_source *source, * Common steps for the inner git_deflate() loop for writing loose * objects. Returns what git_deflate() returns. 
*/ -static int write_loose_object_common(struct odb_source *source, +static int write_loose_object_common(struct odb_source_loose *loose, struct git_hash_ctx *c, struct git_hash_ctx *compat_c, git_zstream *stream, const int flush, unsigned char *in0, const int fd, unsigned char *compressed, const size_t compressed_len) { - const struct git_hash_algo *compat = source->odb->repo->compat_hash_algo; + const struct git_hash_algo *compat = loose->base.odb->repo->compat_hash_algo; int ret; ret = git_deflate(stream, flush ? Z_FINISH : 0); @@ -728,12 +728,12 @@ static int write_loose_object_common(struct odb_source *source, * - End the compression of zlib stream. * - Get the calculated oid to "oid". */ -static int end_loose_object_common(struct odb_source *source, +static int end_loose_object_common(struct odb_source_loose *loose, struct git_hash_ctx *c, struct git_hash_ctx *compat_c, git_zstream *stream, struct object_id *oid, struct object_id *compat_oid) { - const struct git_hash_algo *compat = source->odb->repo->compat_hash_algo; + const struct git_hash_algo *compat = loose->base.odb->repo->compat_hash_algo; int ret; ret = git_deflate_end_gently(stream); @@ -746,7 +746,7 @@ static int end_loose_object_common(struct odb_source *source, return Z_OK; } -int write_loose_object(struct odb_source *source, +int write_loose_object(struct odb_source_loose *loose, const struct object_id *oid, char *hdr, int hdrlen, const void *buf, unsigned long len, time_t mtime, unsigned flags) @@ -760,11 +760,11 @@ int write_loose_object(struct odb_source *source, static struct strbuf filename = STRBUF_INIT; if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT)) - prepare_loose_object_transaction(source->odb->transaction); + prepare_loose_object_transaction(loose->base.odb->transaction); - odb_loose_path(source, &filename, oid); + odb_loose_path(loose, &filename, oid); - fd = start_loose_object_common(source, &tmp_file, filename.buf, flags, + fd = start_loose_object_common(loose, &tmp_file, 
filename.buf, flags, &stream, compressed, sizeof(compressed), &c, NULL, hdr, hdrlen); if (fd < 0) @@ -776,14 +776,14 @@ int write_loose_object(struct odb_source *source, do { unsigned char *in0 = stream.next_in; - ret = write_loose_object_common(source, &c, NULL, &stream, 1, in0, fd, + ret = write_loose_object_common(loose, &c, NULL, &stream, 1, in0, fd, compressed, sizeof(compressed)); } while (ret == Z_OK); if (ret != Z_STREAM_END) die(_("unable to deflate new object %s (%d)"), oid_to_hex(oid), ret); - ret = end_loose_object_common(source, &c, NULL, &stream, &parano_oid, NULL); + ret = end_loose_object_common(loose, &c, NULL, &stream, &parano_oid, NULL); if (ret != Z_OK) die(_("deflateEnd on object %s failed (%d)"), oid_to_hex(oid), ret); @@ -791,7 +791,7 @@ int write_loose_object(struct odb_source *source, die(_("confused by unstable object source data for %s"), oid_to_hex(oid)); - close_loose_object(source, fd, tmp_file.buf); + close_loose_object(loose, fd, tmp_file.buf); if (mtime) { struct utimbuf utb; @@ -802,16 +802,15 @@ int write_loose_object(struct odb_source *source, warning_errno(_("failed utime() on %s"), tmp_file.buf); } - return finalize_object_file_flags(source->odb->repo, tmp_file.buf, filename.buf, + return finalize_object_file_flags(loose->base.odb->repo, tmp_file.buf, filename.buf, FOF_SKIP_COLLISION_CHECK); } -int odb_source_loose_write_stream(struct odb_source *source, +int odb_source_loose_write_stream(struct odb_source_loose *loose, struct odb_write_stream *in_stream, size_t len, struct object_id *oid) { - struct odb_source_files *files = odb_source_files_downcast(source); - const struct git_hash_algo *compat = source->odb->repo->compat_hash_algo; + const struct git_hash_algo *compat = loose->base.odb->repo->compat_hash_algo; struct object_id compat_oid; int fd, ret, err = 0, flush = 0; unsigned char compressed[4096]; @@ -825,10 +824,10 @@ int odb_source_loose_write_stream(struct odb_source *source, int hdrlen; if
(batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT)) - prepare_loose_object_transaction(source->odb->transaction); + prepare_loose_object_transaction(loose->base.odb->transaction); /* Since oid is not determined, save tmp file to odb path. */ - strbuf_addf(&filename, "%s/", source->path); + strbuf_addf(&filename, "%s/", loose->base.path); hdrlen = format_object_header(hdr, sizeof(hdr), OBJ_BLOB, len); /* @@ -839,7 +838,7 @@ int odb_source_loose_write_stream(struct odb_source *source, * - Setup zlib stream for compression. * - Start to feed header to zlib stream. */ - fd = start_loose_object_common(source, &tmp_file, filename.buf, 0, + fd = start_loose_object_common(loose, &tmp_file, filename.buf, 0, &stream, compressed, sizeof(compressed), &c, &compat_c, hdr, hdrlen); if (fd < 0) { @@ -867,7 +866,7 @@ int odb_source_loose_write_stream(struct odb_source *source, if (in_stream->is_finished) flush = 1; } - ret = write_loose_object_common(source, &c, &compat_c, &stream, flush, in0, fd, + ret = write_loose_object_common(loose, &c, &compat_c, &stream, flush, in0, fd, compressed, sizeof(compressed)); /* * Unlike write_loose_object(), we do not have the entire @@ -890,16 +889,16 @@ int odb_source_loose_write_stream(struct odb_source *source, */ if (ret != Z_STREAM_END) die(_("unable to stream deflate new object (%d)"), ret); - ret = end_loose_object_common(source, &c, &compat_c, &stream, oid, &compat_oid); + ret = end_loose_object_common(loose, &c, &compat_c, &stream, oid, &compat_oid); if (ret != Z_OK) die(_("deflateEnd on stream object failed (%d)"), ret); - close_loose_object(source, fd, tmp_file.buf); + close_loose_object(loose, fd, tmp_file.buf); - if (odb_freshen_object(source->odb, oid)) { + if (odb_freshen_object(loose->base.odb, oid)) { unlink_or_warn(tmp_file.buf); goto cleanup; } - odb_loose_path(source, &filename, oid); + odb_loose_path(loose, &filename, oid); /* We finally know the object path, and create the missing dir. 
*/ dirlen = directory_size(filename.buf); @@ -907,7 +906,7 @@ int odb_source_loose_write_stream(struct odb_source *source, struct strbuf dir = STRBUF_INIT; strbuf_add(&dir, filename.buf, dirlen); - if (safe_create_dir_in_gitdir(source->odb->repo, dir.buf) && + if (safe_create_dir_in_gitdir(loose->base.odb->repo, dir.buf) && errno != EEXIST) { err = error_errno(_("unable to create directory %s"), dir.buf); strbuf_release(&dir); @@ -916,10 +915,10 @@ int odb_source_loose_write_stream(struct odb_source *source, strbuf_release(&dir); } - err = finalize_object_file_flags(source->odb->repo, tmp_file.buf, filename.buf, + err = finalize_object_file_flags(loose->base.odb->repo, tmp_file.buf, filename.buf, FOF_SKIP_COLLISION_CHECK); if (!err && compat) - err = repo_add_loose_object_map(files->loose, oid, &compat_oid); + err = repo_add_loose_object_map(loose, oid, &compat_oid); cleanup: strbuf_release(&tmp_file); strbuf_release(&filename); @@ -957,7 +956,7 @@ int force_object_loose(struct odb_source *source, oid_to_hex(oid), compat->name); } hdrlen = format_object_header(hdr, sizeof(hdr), type, len); - ret = write_loose_object(source, oid, hdr, hdrlen, buf, len, mtime, 0); + ret = write_loose_object(files->loose, oid, hdr, hdrlen, buf, len, mtime, 0); if (!ret && compat) ret = repo_add_loose_object_map(files->loose, oid, &compat_oid); free(buf); diff --git a/object-file.h b/object-file.h index 2b32592de1135b..d30f1b10b2eb36 100644 --- a/object-file.h +++ b/object-file.h @@ -23,7 +23,7 @@ int index_path(struct index_state *istate, struct object_id *oid, const char *pa struct object_info; struct odb_source; -int odb_source_loose_write_stream(struct odb_source *source, +int odb_source_loose_write_stream(struct odb_source_loose *loose, struct odb_write_stream *stream, size_t len, struct object_id *oid); @@ -31,7 +31,7 @@ int odb_source_loose_write_stream(struct odb_source *source, * Put in `buf` the name of the file in the local object database that * would be used to store a 
loose object with the specified oid. */ -const char *odb_loose_path(struct odb_source *source, +const char *odb_loose_path(struct odb_source_loose *source, struct strbuf *buf, const struct object_id *oid); @@ -127,7 +127,7 @@ void write_object_file_prepare(const struct git_hash_algo *algo, const void *buf, unsigned long len, enum object_type type, struct object_id *oid, char *hdr, int *hdrlen); -int write_loose_object(struct odb_source *source, +int write_loose_object(struct odb_source_loose *loose, const struct object_id *oid, char *hdr, int hdrlen, const void *buf, unsigned long len, time_t mtime, unsigned flags); diff --git a/odb/source-files.c b/odb/source-files.c index 52ba04237acfd7..2ba1def776e006 100644 --- a/odb/source-files.c +++ b/odb/source-files.c @@ -174,7 +174,8 @@ static int odb_source_files_write_object_stream(struct odb_source *source, size_t len, struct object_id *oid) { - return odb_source_loose_write_stream(source, stream, len, oid); + struct odb_source_files *files = odb_source_files_downcast(source); + return odb_source_loose_write_stream(files->loose, stream, len, oid); } static int odb_source_files_begin_transaction(struct odb_source *source, diff --git a/odb/source-loose.c b/odb/source-loose.c index c91018109e5b68..da8a60dba1c04c 100644 --- a/odb/source-loose.c +++ b/odb/source-loose.c @@ -220,7 +220,7 @@ static int odb_source_loose_read_object_info(struct odb_source *source, if (flags & OBJECT_INFO_SECOND_READ) return -1; - odb_loose_path(source, &buf, oid); + odb_loose_path(loose, &buf, oid); return read_object_info_from_path(loose, buf.buf, oid, oi, flags); } @@ -238,7 +238,7 @@ static int open_loose_object(struct odb_source_loose *loose, static struct strbuf buf = STRBUF_INIT; int fd; - *path = odb_loose_path(&loose->base, &buf, oid); + *path = odb_loose_path(loose, &buf, oid); fd = git_open(*path); if (fd >= 0) return fd; @@ -584,8 +584,9 @@ static int odb_source_loose_count_objects(struct odb_source *source, static int 
odb_source_loose_freshen_object(struct odb_source *source, const struct object_id *oid) { + struct odb_source_loose *loose = odb_source_loose_downcast(source); static struct strbuf path = STRBUF_INIT; - odb_loose_path(source, &path, oid); + odb_loose_path(loose, &path, oid); return !!check_and_freshen_file(path.buf, 1); } @@ -624,7 +625,7 @@ static int odb_source_loose_write_object(struct odb_source *source, write_object_file_prepare(algo, buf, len, type, oid, hdr, &hdrlen); if (odb_freshen_object(source->odb, oid)) return 0; - if (write_loose_object(source, oid, hdr, hdrlen, buf, len, 0, flags)) + if (write_loose_object(loose, oid, hdr, hdrlen, buf, len, 0, flags)) return -1; if (compat) return repo_add_loose_object_map(loose, oid, &compat_oid); From e6a39bbe7a6bde5fb7de8d487e8f4ef928e6b751 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 1 Jun 2026 10:20:39 +0200 Subject: [PATCH 064/107] odb/source-loose: wire up `write_object_stream()` callback Wire up the `write_object_stream()` callback. Note that we don't move the implementation into "odb/source-loose.c". This is because most of the logic to write loose objects is still contained in "object-file.c", and detangling that requires us to do some refactorings as explained in the preceding commit. So for now, the implementation of writing an object stream is still located in "object-file.c". Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- object-file.h | 12 +++++++++++- odb/source-files.c | 3 ++- odb/source-loose.c | 14 ++++++++++++++ 3 files changed, 27 insertions(+), 2 deletions(-) diff --git a/object-file.h b/object-file.h index d30f1b10b2eb36..528c4e6e697f87 100644 --- a/object-file.h +++ b/object-file.h @@ -23,7 +23,17 @@ int index_path(struct index_state *istate, struct object_id *oid, const char *pa struct object_info; struct odb_source; -int odb_source_loose_write_stream(struct odb_source_loose *loose, +/* + * Write the given stream into the loose object source. 
The only difference + * from the generic implementation of this function is that we don't perform an + * object existence check here. + * + * TODO: We should stop exposing this function altogether and move it into + * "odb/source-loose.c". This requires a couple of refactorings though to make + * `force_object_loose()` generic and is thus postponed to a later point in + * time. + */ +int odb_source_loose_write_stream(struct odb_source_loose *source, struct odb_write_stream *stream, size_t len, struct object_id *oid); diff --git a/odb/source-files.c b/odb/source-files.c index 2ba1def776e006..83f8066c67dd3c 100644 --- a/odb/source-files.c +++ b/odb/source-files.c @@ -7,6 +7,7 @@ #include "odb.h" #include "odb/source.h" #include "odb/source-files.h" +#include "odb/source-loose.h" #include "packfile.h" #include "strbuf.h" #include "write-or-die.h" @@ -175,7 +176,7 @@ static int odb_source_files_write_object_stream(struct odb_source *source, struct object_id *oid) { struct odb_source_files *files = odb_source_files_downcast(source); - return odb_source_loose_write_stream(files->loose, stream, len, oid); + return odb_source_write_object_stream(&files->loose->base, stream, len, oid); } static int odb_source_files_begin_transaction(struct odb_source *source, diff --git a/odb/source-loose.c b/odb/source-loose.c index da8a60dba1c04c..e52fc289a24102 100644 --- a/odb/source-loose.c +++ b/odb/source-loose.c @@ -632,6 +632,19 @@ static int odb_source_loose_write_object(struct odb_source *source, return 0; } +static int odb_source_loose_write_object_stream(struct odb_source *source, + struct odb_write_stream *in_stream, + size_t len, + struct object_id *oid) +{ + /* + * TODO: the implementation should be moved here, see the comment on + * the called function in "object-file.h". 
+ */ + struct odb_source_loose *loose = odb_source_loose_downcast(source); + return odb_source_loose_write_stream(loose, in_stream, len, oid); +} + static void odb_source_loose_clear_cache(struct odb_source_loose *loose) { oidtree_clear(loose->cache); @@ -692,6 +705,7 @@ struct odb_source_loose *odb_source_loose_new(struct odb_source_files *files) loose->base.count_objects = odb_source_loose_count_objects; loose->base.freshen_object = odb_source_loose_freshen_object; loose->base.write_object = odb_source_loose_write_object; + loose->base.write_object_stream = odb_source_loose_write_object_stream; if (!is_absolute_path(loose->base.path)) chdir_notify_register(NULL, odb_source_loose_reparent, loose); From 87af3bb434b86805f69fae40c966d92db1bd2eae Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 1 Jun 2026 10:20:40 +0200 Subject: [PATCH 065/107] odb/source-loose: stub out remaining callbacks Stub out remaining callback functions for the "loose" backend. Note that we also stub out transactions for loose objects. In fact, we already have the infrastructure in place for those, and we could in theory implement those, as well. But there are separate efforts ongoing to polish up transactional interfaces, and doing so now would likely result in some messiness. This omission will thus be worked on in a subsequent patch series, once the dust has settled. 
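The stubbing strategy described in this message can be modeled with a minimal, self-contained C sketch. It assumes a hypothetical `struct demo_source` vtable and a hypothetical `demo_error()` helper; these are not Git's `struct odb_source` or `error()`, and the real `read_alternates` callback fills a `struct strvec` rather than a count. The point is the split: an unsupported operation fails loudly, while reading alternates is a benign no-op that reports an empty list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a backend vtable; not Git's struct odb_source. */
struct demo_source {
	int (*begin_transaction)(struct demo_source *source);
	int (*read_alternates)(struct demo_source *source, size_t *nr_out);
	int (*write_alternate)(struct demo_source *source, const char *alternate);
};

/* Hypothetical error() lookalike: print a message and return -1. */
static int demo_error(const char *msg)
{
	fprintf(stderr, "error: %s\n", msg);
	return -1;
}

/* Known omission: transactions are not implemented for this backend yet. */
static int demo_begin_transaction(struct demo_source *source)
{
	(void)source;
	return demo_error("loose source does not support transactions");
}

/* A loose-only source has no alternates, so report an empty list. */
static int demo_read_alternates(struct demo_source *source, size_t *nr_out)
{
	(void)source;
	assert(nr_out != NULL);
	*nr_out = 0;
	return 0;
}

/* Adding an alternate is not meaningful for this backend, so refuse it. */
static int demo_write_alternate(struct demo_source *source, const char *alternate)
{
	(void)source;
	(void)alternate;
	return demo_error("loose source does not support alternates");
}

/* Wire every callback so callers never dereference a NULL function pointer. */
static void demo_source_init(struct demo_source *source)
{
	source->begin_transaction = demo_begin_transaction;
	source->read_alternates = demo_read_alternates;
	source->write_alternate = demo_write_alternate;
}
```

Filling every slot, even with a stub, lets generic code call any callback unconditionally instead of checking for NULL first.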
Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- odb/source-loose.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/odb/source-loose.c b/odb/source-loose.c index e52fc289a24102..e1749413184160 100644 --- a/odb/source-loose.c +++ b/odb/source-loose.c @@ -645,6 +645,25 @@ static int odb_source_loose_write_object_stream(struct odb_source *source, return odb_source_loose_write_stream(loose, in_stream, len, oid); } +static int odb_source_loose_begin_transaction(struct odb_source *source UNUSED, + struct odb_transaction **out UNUSED) +{ + /* TODO: this is a known omission that we'll want to address eventually. */ + return error("loose source does not support transactions"); +} + +static int odb_source_loose_read_alternates(struct odb_source *source UNUSED, + struct strvec *out UNUSED) +{ + return 0; +} + +static int odb_source_loose_write_alternate(struct odb_source *source UNUSED, + const char *alternate UNUSED) +{ + return error("loose source does not support alternates"); +} + static void odb_source_loose_clear_cache(struct odb_source_loose *loose) { oidtree_clear(loose->cache); @@ -706,6 +725,9 @@ struct odb_source_loose *odb_source_loose_new(struct odb_source_files *files) loose->base.freshen_object = odb_source_loose_freshen_object; loose->base.write_object = odb_source_loose_write_object; loose->base.write_object_stream = odb_source_loose_write_object_stream; + loose->base.begin_transaction = odb_source_loose_begin_transaction; + loose->base.read_alternates = odb_source_loose_read_alternates; + loose->base.write_alternate = odb_source_loose_write_alternate; if (!is_absolute_path(loose->base.path)) chdir_notify_register(NULL, odb_source_loose_reparent, loose); From ef4778bcba323ab38d442811f851af092760b6b5 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 1 Jun 2026 10:20:41 +0200 Subject: [PATCH 066/107] odb/source-loose: drop pointer to the "files" source Now that all callbacks of the loose source operate on 
`struct odb_source_loose` directly we no longer have to reach into the "files" source at all. Drop this field and update `odb_source_loose_new()` to instead accept all parameters required to initialize itself. This ensures that the "loose" backend is a fully standalone source. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- odb/source-files.c | 2 +- odb/source-loose.c | 8 ++++---- odb/source-loose.h | 7 ++++--- 3 files changed, 9 insertions(+), 8 deletions(-) diff --git a/odb/source-files.c b/odb/source-files.c index 83f8066c67dd3c..5bdd0429225397 100644 --- a/odb/source-files.c +++ b/odb/source-files.c @@ -268,7 +268,7 @@ struct odb_source_files *odb_source_files_new(struct object_database *odb, CALLOC_ARRAY(files, 1); odb_source_init(&files->base, odb, ODB_SOURCE_FILES, path, local); - files->loose = odb_source_loose_new(files); + files->loose = odb_source_loose_new(odb, path, local); files->packed = packfile_store_new(&files->base); files->base.free = odb_source_files_free; diff --git a/odb/source-loose.c b/odb/source-loose.c index e1749413184160..7d7ea2fb842537 100644 --- a/odb/source-loose.c +++ b/odb/source-loose.c @@ -705,14 +705,14 @@ static void odb_source_loose_free(struct odb_source *source) free(loose); } -struct odb_source_loose *odb_source_loose_new(struct odb_source_files *files) +struct odb_source_loose *odb_source_loose_new(struct object_database *odb, + const char *path, + bool local) { struct odb_source_loose *loose; CALLOC_ARRAY(loose, 1); - odb_source_init(&loose->base, files->base.odb, ODB_SOURCE_LOOSE, - files->base.path, files->base.local); - loose->files = files; + odb_source_init(&loose->base, odb, ODB_SOURCE_LOOSE, path, local); loose->base.free = odb_source_loose_free; loose->base.close = odb_source_loose_close; diff --git a/odb/source-loose.h b/odb/source-loose.h index 4dd4fd6ce30a7e..6070aaf3ce6ab2 100644 --- a/odb/source-loose.h +++ b/odb/source-loose.h @@ -9,11 +9,10 @@ struct oidtree; /* * An object database 
source that stores its objects in loose format, one - * file per object. This source is part of the files source. + * file per object. */ struct odb_source_loose { struct odb_source base; - struct odb_source_files *files; /* * Used to store the results of readdir(3) calls when we are OK @@ -31,7 +30,9 @@ struct odb_source_loose { struct loose_object_map *map; }; -struct odb_source_loose *odb_source_loose_new(struct odb_source_files *files); +struct odb_source_loose *odb_source_loose_new(struct object_database *odb, + const char *path, + bool local); /* * Cast the given object database source to the loose backend. This will cause From 96ee7f1650e6096561599f069d18c052412d7506 Mon Sep 17 00:00:00 2001 From: LorenzoPegorari Date: Mon, 1 Jun 2026 15:52:01 +0200 Subject: [PATCH 067/107] http: cleanup function fetch_and_setup_pack_index() Clean up the function `fetch_and_setup_pack_index()` by removing the redundant call to the function `unlink()`. This call has not been necessary since 63aca3f7f1 (dumb-http: store downloaded pack idx as tempfile, 2024-10-25), when `fetch_pack_index()` started registering its return value (in this case `tmp_idx`) as a tempfile to be deleted at process exit.
Signed-off-by: LorenzoPegorari Signed-off-by: Junio C Hamano --- http.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/http.c b/http.c index ea9b16861bc3d4..55dd856a279a23 100644 --- a/http.c +++ b/http.c @@ -2609,9 +2609,7 @@ static int fetch_and_setup_pack_index(struct packfile_list *packs, new_pack = parse_pack_index(the_repository, sha1, tmp_idx); if (!new_pack) { - unlink(tmp_idx); free(tmp_idx); - return -1; /* parse_pack_index() already issued error message */ } From 18decad922884a69ea39c0332f7a94ce82cf99cc Mon Sep 17 00:00:00 2001 From: LorenzoPegorari Date: Mon, 1 Jun 2026 15:52:12 +0200 Subject: [PATCH 068/107] http: fix memory leak in fetch_and_setup_pack_index() Inside the function `fetch_and_setup_pack_index()`, when the pack obtained using `parse_pack_index()` fails to be verified by `verify_pack_index()`, the function returns without closing and freeing said pack. Fix this by calling `close_pack_index()` to munmap the index file for the leaking pack (which might have been mmapped by `fetch_pack_index()` or `verify_pack_index()`), and then free it, when the verification fails. Signed-off-by: LorenzoPegorari Signed-off-by: Junio C Hamano --- http.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/http.c b/http.c index 55dd856a279a23..d50a34e4460a90 100644 --- a/http.c +++ b/http.c @@ -2614,11 +2614,13 @@ static int fetch_and_setup_pack_index(struct packfile_list *packs, } ret = verify_pack_index(new_pack); - if (!ret) - close_pack_index(new_pack); + + close_pack_index(new_pack); free(tmp_idx); - if (ret) + if (ret) { + free(new_pack); return -1; + } packfile_list_prepend(packs, new_pack); return 0; From 1891707d1b8bb0ac3c47343e881fcf28ec69457a Mon Sep 17 00:00:00 2001 From: Jacob Keller Date: Mon, 1 Jun 2026 16:36:08 -0700 Subject: [PATCH 069/107] describe: fix --exclude, --match with --contains and --all git describe --contains acts as a wrapper around git name-rev. 
When operating with --contains and --all, the --match and --exclude patterns are not properly forwarded to name-rev as --refs and --exclude options, respectively. This results in the command silently discarding match and exclude requests from the user when operating in --all mode. We could check and die() if the user provides --contains, --all, and --match/--exclude. However, it's also straightforward to just pass the filters down to git name-rev. Notice that the documentation for --match and --exclude mentions the --all mode. It explains that they operate on refs with the prefix refs/tags, and additionally refs/heads and refs/remotes when using --all. Fix the describe logic to pass the patterns down with the appropriate prefixes when --all is provided. This brings the behavior in line with the documentation. Add tests to check that this works as expected. Reported-by: Tuomas Ahola Signed-off-by: Jacob Keller Signed-off-by: Junio C Hamano --- builtin/describe.c | 18 +++++++++++++++--- t/t6120-describe.sh | 22 ++++++++++++++++++++++ 2 files changed, 37 insertions(+), 3 deletions(-) diff --git a/builtin/describe.c b/builtin/describe.c index bffeed13a3cb14..62800ef15ed915 100644 --- a/builtin/describe.c +++ b/builtin/describe.c @@ -712,13 +712,25 @@ int cmd_describe(int argc, NULL); if (always) strvec_push(&args, "--always"); - if (!all) { + if (!all) strvec_push(&args, "--tags"); + + for_each_string_list_item(item, &patterns) + strvec_pushf(&args, "--refs=refs/tags/%s", item->string); + for_each_string_list_item(item, &exclude_patterns) + strvec_pushf(&args, "--exclude=refs/tags/%s", item->string); + + if (all) { for_each_string_list_item(item, &patterns) - strvec_pushf(&args, "--refs=refs/tags/%s", item->string); + strvec_pushf(&args, "--refs=refs/heads/%s", item->string); for_each_string_list_item(item, &exclude_patterns) - strvec_pushf(&args, "--exclude=refs/tags/%s", item->string); + strvec_pushf(&args, "--exclude=refs/heads/%s", item->string); + for_each_string_list_item(item,
&patterns) + strvec_pushf(&args, "--refs=refs/remotes/%s", item->string); + for_each_string_list_item(item, &exclude_patterns) + strvec_pushf(&args, "--exclude=refs/remotes/%s", item->string); } + if (argc) strvec_pushv(&args, argv); else diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh index 2c70cc561ad5f6..e5bcf537602a21 100755 --- a/t/t6120-describe.sh +++ b/t/t6120-describe.sh @@ -345,6 +345,28 @@ test_expect_success 'describe --contains and --no-match' ' test_cmp expect actual ' +test_expect_success 'describe --contains --all --match no matching commit' ' + echo "tags/A^0" >expect && + tagged_commit=$(git rev-parse "refs/tags/A^0") && + test_must_fail git describe --contains --all --match="B" $tagged_commit +' + +check_describe "tags/A^0" --contains --all --match="A" $(git rev-parse "refs/tags/A^0") + +check_describe "branch_A" --contains --all --match="branch*" $(git rev-parse "refs/tags/A^0") + +check_describe "branch_C~1" --contains --all --match="branch*" --exclude="branch_A" $(git rev-parse "refs/tags/A^0") + +check_describe "branch_A" --contains --all \ + --exclude="A" --exclude="c" --exclude="test*" --exclude="origin/remote_branch_A" \ + $(git rev-parse "refs/tags/A^0") + +check_describe "remotes/origin/remote_branch_A" --contains --all --match="origin/remote*" $(git rev-parse "refs/tags/A^0") + +check_describe "remotes/origin/remote_branch_C~1" --contains --all \ + --match="origin/remote*" --exclude="origin/remote_branch_A" \ + $(git rev-parse "refs/tags/A^0") + test_expect_success 'setup and absorb a submodule' ' test_create_repo sub1 && test_commit -C sub1 initial && From 5cd4d0d8500c6ef1b102f5cb35187a91c299f013 Mon Sep 17 00:00:00 2001 From: Harald Nordgren Date: Tue, 2 Jun 2026 07:37:58 +0000 Subject: [PATCH 070/107] config.mak.uname: avoid macOS linker warning on Xcode 16.3+ Building on macOS with Xcode 16.3 or newer emits: ld: warning: reducing alignment of section __DATA,__common from 0x8000 to 0x4000 because it exceeds segment maximum 
alignment Pass -fno-common when "ld -v" reports ld-1167 or newer, so tentative definitions of large arrays go into BSS instead of __DATA,__common. Signed-off-by: Harald Nordgren Signed-off-by: Junio C Hamano --- config.mak.uname | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/config.mak.uname b/config.mak.uname index 3c35ae33a3c0c0..32b58e7a95091e 100644 --- a/config.mak.uname +++ b/config.mak.uname @@ -160,6 +160,12 @@ ifeq ($(uname_S),Darwin) NEEDS_GOOD_LIBICONV = UnfortunatelyYes endif + # Silence Xcode 16.3+ linker warning about __DATA,__common alignment. + LD_MAJOR_VERSION = $(shell ld -v 2>&1 | sed -n 's/.*PROJECT:ld-\([0-9]*\).*/\1/p') + ifeq ($(shell test -n "$(LD_MAJOR_VERSION)" && test "$(LD_MAJOR_VERSION)" -ge 1167 && echo 1),1) + BASIC_CFLAGS += -fno-common + endif + # The builtin FSMonitor on MacOS builds upon Simple-IPC. Both require # Unix domain sockets and PThreads. ifndef NO_PTHREADS From 4018dc29eea31e4273c0f1b02effe6ee852f3898 Mon Sep 17 00:00:00 2001 From: Luna Schwalbe Date: Tue, 2 Jun 2026 10:17:36 +0200 Subject: [PATCH 071/107] doc: document and test `@` prefix for raw timestamps The Git internal date format ` ` fails to parse when the timestamp is less than 100,000,000 (fewer than 9 digits). This limit exists to avoid potential ambiguity with other date formats such as `YYYYMMDD`, especially when used with approxidate. To force the parser to interpret the value as a raw timestamp, it must be prefixed with `@` (e.g., `@0 +0000`). This behavior was introduced in 2c733fb24c10a9d7aacc51f956bf9b7881980870 (parse_date(): '@' prefix forces git-timestamp, 2012-02-02) but was never documented. Document the `@` prefix in `Documentation/date-formats.adoc` to make this behavior explicit. Also add test cases to `t/t0006-date.sh` to verify and demonstrate the difference between prefixed and unprefixed small timestamps (e.g., `@2000` vs `2000`).
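The rule described above can be modeled in a few lines of C. This is a simplified sketch, not Git's `date.c`. The helpers `is_raw_timestamp()` and `format_utc()` are hypothetical, and the model only covers the "digits, space, offset" shape. It treats a value as a raw epoch timestamp when the value carries an `@` prefix or is at least 100,000,000 (9 or more digits).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper modeling the rule from the commit message: decide
 * whether "<digits> <offset>" is read as seconds since the epoch.
 * An '@' prefix forces that reading. Without it, only values of at
 * least 100,000,000 (9 or more digits) qualify, so shorter numbers such
 * as 20070606 remain available as YYYYMMDD dates.
 * The offset after the space is not examined in this sketch.
 * Returns 1 and stores the value in *out, or returns 0.
 */
static int is_raw_timestamp(const char *s, long long *out)
{
	int forced = 0;
	size_t digits;
	long long value;

	assert(s != NULL && out != NULL);
	if (*s == '@') {
		forced = 1;
		s++;
	}
	digits = strspn(s, "0123456789");
	if (!digits || (s[digits] != ' ' && s[digits] != '\0'))
		return 0;
	value = strtoll(s, NULL, 10);
	if (!forced && value < 100000000LL)
		return 0;
	*out = value;
	return 1;
}

/* Hypothetical helper: render an epoch value in UTC, in t0006's style. */
static void format_utc(long long ts, char *buf, size_t len)
{
	time_t t = (time_t)ts;

	strftime(buf, len, "%Y-%m-%d %H:%M:%S +0000", gmtime(&t));
}
```

Under this model, `99999999 +0000` is rejected, while `@99999999 +0000` maps to 1973-03-03 09:46:39 +0000. This is the same split that the new t0006 cases check.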
Signed-off-by: Luna Schwalbe Co-authored-by: Junio C Hamano Signed-off-by: Junio C Hamano --- Documentation/date-formats.adoc | 5 +++++ t/t0006-date.sh | 11 +++++++++++ 2 files changed, 16 insertions(+) diff --git a/Documentation/date-formats.adoc b/Documentation/date-formats.adoc index e24517c496fce4..330424b2baccda 100644 --- a/Documentation/date-formats.adoc +++ b/Documentation/date-formats.adoc @@ -9,6 +9,11 @@ Git internal format:: `` is the number of seconds since the UNIX epoch. `` is a positive or negative offset from UTC. For example CET (which is 1 hour ahead of UTC) is `+0100`. ++ +It is safer to prepend the `` with `@` (e.g., +`@0 +0000`), which forces Git to interpret it as a raw timestamp. This +is required for values less than 100,000,000 (which have fewer than 9 +digits) to avoid confusion with other date formats like `YYYYMMDD`. RFC 2822:: The standard date format as described by RFC 2822, for example diff --git a/t/t0006-date.sh b/t/t0006-date.sh index 53ced36df448f1..8b4e1870bf118d 100755 --- a/t/t0006-date.sh +++ b/t/t0006-date.sh @@ -138,6 +138,13 @@ check_parse '1969-12-31 23:59:59 Z' bad check_parse '1969-12-31 23:59:59 +11' bad check_parse '1969-12-31 23:59:59 -11' bad +# pathologically small timestamps requiring `@` prefix +check_parse '@0 +0000' '1970-01-01 00:00:00 +0000' +check_parse '@99999999 +0000' '1973-03-03 09:46:39 +0000' +check_parse '99999999 +0000' bad +check_parse '@100000000 +0000' '1973-03-03 09:46:40 +0000' +check_parse '100000000 +0000' '1973-03-03 09:46:40 +0000' + REQUIRE_64BIT_TIME=HAVE_64BIT_TIME check_parse '2099-12-31 23:59:59' '2099-12-31 23:59:59 +0000' check_parse '2099-12-31 23:59:59 +00' '2099-12-31 23:59:59 +0000' @@ -195,6 +202,10 @@ check_approxidate '6AM, June 7, 2009' '2009-06-07 06:00:00' check_approxidate '2008-12-01' '2008-12-01 19:20:00' check_approxidate '2009-12-01' '2009-12-01 19:20:00' +# ambiguous raw timestamp +check_approxidate '2000 +0000' '2000-08-30 19:20:00' +check_approxidate '@2000 +0000' 
'1970-01-01 00:33:20' + check_date_format_human() { t=$(($GIT_TEST_DATE_NOW - $1)) echo "$t -> $2" >expect From 66ebad2775a1e3904f724522519a7290cb8d9709 Mon Sep 17 00:00:00 2001 From: Junio C Hamano Date: Tue, 2 Jun 2026 23:43:03 +0900 Subject: [PATCH 072/107] SubmittingPatches: separate typofixes section The existing text said something about tests (with [[tests]] to make it easier to refer to it from elsewhere) and then flowed into a different topic of typofixes, but it was unclear where the latter started. Add a similar [[typofixes]] marker to the document. Signed-off-by: Junio C Hamano --- Documentation/SubmittingPatches | 1 + 1 file changed, 1 insertion(+) diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches index d570184ec84998..dec8aea4cb0cbd 100644 --- a/Documentation/SubmittingPatches +++ b/Documentation/SubmittingPatches @@ -237,6 +237,7 @@ Do not forget to update the documentation to describe the updated behavior and make sure that the resulting documentation set formats well (try the Documentation/doc-diff script). +[[typofixes]] We currently have a liberal mixture of US and UK English norms for spelling and grammar, which is somewhat unfortunate. A huge patch that touches the files all over the place only to correct the inconsistency From bc58f1c7347a38175782b5a745443f109773a501 Mon Sep 17 00:00:00 2001 From: Junio C Hamano Date: Tue, 2 Jun 2026 23:43:04 +0900 Subject: [PATCH 073/107] SubmittingPatches: describe cover letter We talk about what a commit log message should look like, but do not give advice on writing the cover letter to sell a series to the widest possible audience.
Helped-by: Patrick Steinhardt Helped-by: Derrick Stolee Signed-off-by: Junio C Hamano --- Documentation/SubmittingPatches | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches index dec8aea4cb0cbd..df9f722bfeed8d 100644 --- a/Documentation/SubmittingPatches +++ b/Documentation/SubmittingPatches @@ -472,6 +472,30 @@ highlighted above. Only capitalize the very first letter of the trailer, i.e. favor "Signed-off-by" over "Signed-Off-By" and "Acked-by:" over "Acked-By". +[[cover-letter]] +=== Cover Letter + +The purpose of your cover letter is to sell your changes, explain what +they are about, and get your target audience interested enough to read +the patches. + +. Every code change comes with risk of regression and maintenance cost. + The cover letter should clearly communicate why the value of your + proposed change is worth applying. You can also describe how the risk + is reduced by the design choices you made while writing the patches. + +. Make sure your target audience can understand what the patches are + about and why they are needed without prior context. + +. For a second or subsequent iteration of the same topic, make sure + people who missed the earlier discussion can still understand what + the patches are about, so they can judge if the topic is worth their + time to read and comment on. + +. To help those who are familiar with earlier iterations, give a + summary of changes since the previous rounds. 
+ + [[ai]] === Use of Artificial Intelligence (AI) From 18684282df3e05db3ed9b3cdafc86c97285e4fd4 Mon Sep 17 00:00:00 2001 From: Olamide Caleb Bello Date: Tue, 2 Jun 2026 18:09:14 +0100 Subject: [PATCH 074/107] environment: move "trust_ctime" into `struct repo_config_values` MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The `core.trustctime` configuration is currently stored in the global variable `trust_ctime`, which makes it shared across repository instances in a single process. Store it instead in `repo_config_values`, where eagerly‑parsed repository configuration lives. `core.trustctime` is parsed eagerly because it is used in low‑level stat‑matching functions (`match_stat_data()`), where a lazy parse could cause unexpected fatal errors, result in a performance regression and complicate libification efforts. This preserves that behavior while tying the value to the repository from which it was read, avoiding cross‑repository state leakage and continuing the effort to reduce reliance on global configuration state. Update all references to use repo_config_values(). 
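The sketch below illustrates why per-repository storage matters by giving two in-process repositories their own `trust_ctime` value. The struct and function names are hypothetical stand-ins for `struct repo_config_values`, `repo_config_values_init()`, and the ctime check in `match_stat_data()`. They are trimmed to the single field this patch moves.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct repo_config_values (one field only). */
struct repo_config_values_sketch {
	int trust_ctime;	/* core.trustctime */
};

/* Hypothetical stand-in for a repository that owns its config values. */
struct repo_sketch {
	struct repo_config_values_sketch cfg;
};

/* Mirrors the default set in repo_config_values_init(): trust ctime. */
static void config_values_init(struct repo_config_values_sketch *cfg)
{
	cfg->trust_ctime = 1;
}

/*
 * Eager parse: store the value the moment the key is read.
 * Returns 0 if the key was handled here, 1 otherwise.
 */
static int core_config(struct repo_config_values_sketch *cfg,
		       const char *var, int bool_value)
{
	if (!strcmp(var, "core.trustctime")) {
		cfg->trust_ctime = bool_value;
		return 0;
	}
	return 1;
}

/*
 * Like the ctime check in match_stat_data() (which additionally
 * requires check_stat): skipped unless this repository trusts ctime.
 */
static int ctime_differs(const struct repo_sketch *repo,
			 unsigned int cached_ctime, unsigned int current_ctime)
{
	assert(repo != NULL);
	return repo->cfg.trust_ctime && cached_ctime != current_ctime;
}
```

With the old global, reading `core.trustctime=false` from one repository's configuration would also switch off the ctime comparison for every other repository in the same process. Here, each `repo_sketch` keeps its own value.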
Mentored-by: Christian Couder Signed-off-by: Olamide Caleb Bello Signed-off-by: Junio C Hamano --- environment.c | 4 ++-- environment.h | 2 +- statinfo.c | 6 ++++-- 3 files changed, 7 insertions(+), 5 deletions(-) diff --git a/environment.c b/environment.c index fc3ed8bb1c7a66..0a9067729e5025 100644 --- a/environment.c +++ b/environment.c @@ -42,7 +42,6 @@ static int pack_compression_seen; static int zlib_compression_seen; int trust_executable_bit = 1; -int trust_ctime = 1; int check_stat = 1; int has_symlinks = 1; int minimum_abbrev = 4, default_abbrev = -1; @@ -309,7 +308,7 @@ int git_default_core_config(const char *var, const char *value, return 0; } if (!strcmp(var, "core.trustctime")) { - trust_ctime = git_config_bool(var, value); + cfg->trust_ctime = git_config_bool(var, value); return 0; } if (!strcmp(var, "core.checkstat")) { @@ -721,4 +720,5 @@ void repo_config_values_init(struct repo_config_values *cfg) cfg->attributes_file = NULL; cfg->apply_sparse_checkout = 0; cfg->branch_track = BRANCH_TRACK_REMOTE; + cfg->trust_ctime = 1; } diff --git a/environment.h b/environment.h index 123a71cdc8d14e..64d537686eed17 100644 --- a/environment.h +++ b/environment.h @@ -91,6 +91,7 @@ struct repo_config_values { /* section "core" config values */ char *attributes_file; int apply_sparse_checkout; + int trust_ctime; /* section "branch" config values */ enum branch_track branch_track; @@ -161,7 +162,6 @@ extern char *git_work_tree_cfg; /* Environment bits from configuration mechanism */ extern int trust_executable_bit; -extern int trust_ctime; extern int check_stat; extern int has_symlinks; extern int minimum_abbrev, default_abbrev; diff --git a/statinfo.c b/statinfo.c index 30a164b0e68cf8..4fc12053f40b20 100644 --- a/statinfo.c +++ b/statinfo.c @@ -3,6 +3,7 @@ #include "git-compat-util.h" #include "environment.h" #include "statinfo.h" +#include "repository.h" /* * Munge st_size into an unsigned int. 
@@ -63,17 +64,18 @@ void fake_lstat_data(const struct stat_data *sd, struct stat *st) int match_stat_data(const struct stat_data *sd, struct stat *st) { int changed = 0; + struct repo_config_values *cfg = repo_config_values(the_repository); if (sd->sd_mtime.sec != (unsigned int)st->st_mtime) changed |= MTIME_CHANGED; - if (trust_ctime && check_stat && + if (cfg->trust_ctime && check_stat && sd->sd_ctime.sec != (unsigned int)st->st_ctime) changed |= CTIME_CHANGED; #ifdef USE_NSEC if (check_stat && sd->sd_mtime.nsec != ST_MTIME_NSEC(*st)) changed |= MTIME_CHANGED; - if (trust_ctime && check_stat && + if (cfg->trust_ctime && check_stat && sd->sd_ctime.nsec != ST_CTIME_NSEC(*st)) changed |= CTIME_CHANGED; #endif From 88505ed63711eca184409dfd949437c7e41f994e Mon Sep 17 00:00:00 2001 From: Olamide Caleb Bello Date: Tue, 2 Jun 2026 18:09:15 +0100 Subject: [PATCH 075/107] environment: move "check_stat" into `struct repo_config_values` MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The `core.checkstat` configuration is currently stored in the global variable `check_stat`, which makes it shared across repository instances within a single process. Store it instead in `repo_config_values`, where eagerly‑parsed repository configuration lives. `core.checkstat` is parsed eagerly because it controls how `match_stat_data()` and related functions decide file freshness; a lazy parse could lead to unexpected behavior or complicate libification. This preserves the existing eager‑parsing behavior while tying the value to the repository it was read from, avoiding cross‑repository state leakage, and continuing the effort to reduce reliance on global configuration state. Update all references to use `repo_config_values()`. 
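The `core.checkstat` handling that this patch rewires accepts `default` or `minimal` case-insensitively and rejects any other value. The following is a minimal sketch of that parse and of how the flag gates an owner (uid) comparison. It assumes a hypothetical one-field struct and uses a local case-insensitive compare in place of `strcasecmp()`, so that the sketch needs only standard C.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical stand-in holding only the moved field. */
struct stat_config_sketch {
	int check_stat;	/* 1 = "default", 0 = "minimal" */
};

/* Portable case-insensitive equality, used here instead of strcasecmp(). */
static int eq_ignore_case(const char *a, const char *b)
{
	while (*a && *b) {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
		a++;
		b++;
	}
	return *a == *b;
}

/* Returns 0 on success, -1 for a missing or invalid value (cfg unchanged). */
static int parse_checkstat(struct stat_config_sketch *cfg, const char *value)
{
	assert(cfg != NULL);
	if (!value) {
		fprintf(stderr, "error: core.checkstat needs a value\n");
		return -1;
	}
	if (eq_ignore_case(value, "default")) {
		cfg->check_stat = 1;
	} else if (eq_ignore_case(value, "minimal")) {
		cfg->check_stat = 0;
	} else {
		fprintf(stderr, "error: invalid value for 'core.checkstat': '%s'\n",
			value);
		return -1;
	}
	return 0;
}

/* Mirrors the `if (cfg->check_stat)` guard around the uid/gid checks. */
static int owner_changed(const struct stat_config_sketch *cfg,
			 unsigned int cached_uid, unsigned int uid)
{
	return cfg->check_stat && cached_uid != uid;
}
```

An invalid value leaves the previous setting in place. This matches the error path in `git_default_core_config()`, which returns before assigning.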
Mentored-by: Christian Couder Mentored-by: Usman Akinyemi Signed-off-by: Olamide Caleb Bello Signed-off-by: Junio C Hamano --- entry.c | 3 ++- environment.c | 6 +++--- environment.h | 2 +- statinfo.c | 10 +++++----- 4 files changed, 11 insertions(+), 10 deletions(-) diff --git a/entry.c b/entry.c index 7817aee362ed9e..c55e867d8a2bca 100644 --- a/entry.c +++ b/entry.c @@ -443,7 +443,8 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen) static void mark_colliding_entries(const struct checkout *state, struct cache_entry *ce, struct stat *st) { - int trust_ino = check_stat; + struct repo_config_values *cfg = repo_config_values(the_repository); + int trust_ino = cfg->check_stat; #if defined(GIT_WINDOWS_NATIVE) || defined(__CYGWIN__) trust_ino = 0; diff --git a/environment.c b/environment.c index 0a9067729e5025..8542ac31413d5b 100644 --- a/environment.c +++ b/environment.c @@ -42,7 +42,6 @@ static int pack_compression_seen; static int zlib_compression_seen; int trust_executable_bit = 1; -int check_stat = 1; int has_symlinks = 1; int minimum_abbrev = 4, default_abbrev = -1; int ignore_case; @@ -315,9 +314,9 @@ int git_default_core_config(const char *var, const char *value, if (!value) return config_error_nonbool(var); if (!strcasecmp(value, "default")) - check_stat = 1; + cfg->check_stat = 1; else if (!strcasecmp(value, "minimal")) - check_stat = 0; + cfg->check_stat = 0; else return error(_("invalid value for '%s': '%s'"), var, value); @@ -721,4 +720,5 @@ void repo_config_values_init(struct repo_config_values *cfg) cfg->apply_sparse_checkout = 0; cfg->branch_track = BRANCH_TRACK_REMOTE; cfg->trust_ctime = 1; + cfg->check_stat = 1; } diff --git a/environment.h b/environment.h index 64d537686eed17..1d3e2e4f230a15 100644 --- a/environment.h +++ b/environment.h @@ -92,6 +92,7 @@ struct repo_config_values { char *attributes_file; int apply_sparse_checkout; int trust_ctime; + int check_stat; /* section "branch" config values */ enum branch_track 
branch_track; @@ -162,7 +163,6 @@ extern char *git_work_tree_cfg; /* Environment bits from configuration mechanism */ extern int trust_executable_bit; -extern int check_stat; extern int has_symlinks; extern int minimum_abbrev, default_abbrev; extern int ignore_case; diff --git a/statinfo.c b/statinfo.c index 4fc12053f40b20..5e00af127d657d 100644 --- a/statinfo.c +++ b/statinfo.c @@ -68,19 +68,19 @@ int match_stat_data(const struct stat_data *sd, struct stat *st) if (sd->sd_mtime.sec != (unsigned int)st->st_mtime) changed |= MTIME_CHANGED; - if (cfg->trust_ctime && check_stat && + if (cfg->trust_ctime && cfg->check_stat && sd->sd_ctime.sec != (unsigned int)st->st_ctime) changed |= CTIME_CHANGED; #ifdef USE_NSEC - if (check_stat && sd->sd_mtime.nsec != ST_MTIME_NSEC(*st)) + if (cfg->check_stat && sd->sd_mtime.nsec != ST_MTIME_NSEC(*st)) changed |= MTIME_CHANGED; - if (cfg->trust_ctime && check_stat && + if (cfg->trust_ctime && cfg->check_stat && sd->sd_ctime.nsec != ST_CTIME_NSEC(*st)) changed |= CTIME_CHANGED; #endif - if (check_stat) { + if (cfg->check_stat) { if (sd->sd_uid != (unsigned int) st->st_uid || sd->sd_gid != (unsigned int) st->st_gid) changed |= OWNER_CHANGED; @@ -94,7 +94,7 @@ int match_stat_data(const struct stat_data *sd, struct stat *st) * clients will have different views of what "device" * the filesystem is on */ - if (check_stat && sd->sd_dev != (unsigned int) st->st_dev) + if (cfg->check_stat && sd->sd_dev != (unsigned int) st->st_dev) changed |= INODE_CHANGED; #endif From e0f86540abd22a98c9701d21d06e75fa2c8d34a0 Mon Sep 17 00:00:00 2001 From: Olamide Caleb Bello Date: Tue, 2 Jun 2026 18:09:16 +0100 Subject: [PATCH 076/107] environment: move `zlib_compression_level` into `struct repo_config_values` MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The `zlib_compression_level` configuration is currently stored in the global variable `zlib_compression_level`, which makes it shared across repository instances 
within a single process. Store it instead in `repo_config_values`, where eagerly‑parsed repository configuration lives. `zlib_compression_level` is parsed eagerly because it determines compression behaviour for objects and packs – core operations where a lazy parse could lead to unpredictable results and hinder libification. This preserves the existing eager‑parsing behavior while tying the value to the repository it was read from, avoiding cross‑repository state leakage and continuing the effort to reduce reliance on global configuration state. Update all references to use `repo_config_values()`. Mentored-by: Christian Couder Mentored-by: Usman Akinyemi Signed-off-by: Olamide Caleb Bello Signed-off-by: Junio C Hamano --- builtin/index-pack.c | 3 ++- diff.c | 3 ++- environment.c | 6 +++--- environment.h | 2 +- http-push.c | 3 ++- object-file.c | 3 ++- 6 files changed, 12 insertions(+), 8 deletions(-) diff --git a/builtin/index-pack.c b/builtin/index-pack.c index ca7784dc2c4969..3942d3e0d04a5b 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -1416,8 +1416,9 @@ static int write_compressed(struct hashfile *f, void *in, unsigned int size) git_zstream stream; int status; unsigned char outbuf[4096]; + struct repo_config_values *cfg = repo_config_values(the_repository); - git_deflate_init(&stream, zlib_compression_level); + git_deflate_init(&stream, cfg->zlib_compression_level); stream.next_in = in; stream.avail_in = size; diff --git a/diff.c b/diff.c index 397e38b41cc6fa..7d17b0bf3f7102 100644 --- a/diff.c +++ b/diff.c @@ -3589,8 +3589,9 @@ static unsigned char *deflate_it(char *data, int bound; unsigned char *deflated; git_zstream stream; + struct repo_config_values *cfg = repo_config_values(the_repository); - git_deflate_init(&stream, zlib_compression_level); + git_deflate_init(&stream, cfg->zlib_compression_level); bound = git_deflate_bound(&stream, size); deflated = xmalloc(bound); stream.next_out = deflated; diff --git a/environment.c b/environment.c 
index 8542ac31413d5b..5b0e88b65cf420 100644 --- a/environment.c +++ b/environment.c @@ -52,7 +52,6 @@ char *git_commit_encoding; char *git_log_output_encoding; char *apply_default_whitespace; char *apply_default_ignorewhitespace; -int zlib_compression_level = Z_BEST_SPEED; int pack_compression_level = Z_DEFAULT_COMPRESSION; int fsync_object_files = -1; int use_fsync = -1; @@ -377,7 +376,7 @@ int git_default_core_config(const char *var, const char *value, level = Z_DEFAULT_COMPRESSION; else if (level < 0 || level > Z_BEST_COMPRESSION) die(_("bad zlib compression level %d"), level); - zlib_compression_level = level; + cfg->zlib_compression_level = level; zlib_compression_seen = 1; return 0; } @@ -389,7 +388,7 @@ int git_default_core_config(const char *var, const char *value, else if (level < 0 || level > Z_BEST_COMPRESSION) die(_("bad zlib compression level %d"), level); if (!zlib_compression_seen) - zlib_compression_level = level; + cfg->zlib_compression_level = level; if (!pack_compression_seen) pack_compression_level = level; return 0; @@ -721,4 +720,5 @@ void repo_config_values_init(struct repo_config_values *cfg) cfg->branch_track = BRANCH_TRACK_REMOTE; cfg->trust_ctime = 1; cfg->check_stat = 1; + cfg->zlib_compression_level = Z_BEST_SPEED; } diff --git a/environment.h b/environment.h index 1d3e2e4f230a15..93201620afc302 100644 --- a/environment.h +++ b/environment.h @@ -93,6 +93,7 @@ struct repo_config_values { int apply_sparse_checkout; int trust_ctime; int check_stat; + int zlib_compression_level; /* section "branch" config values */ enum branch_track branch_track; @@ -170,7 +171,6 @@ extern int assume_unchanged; extern int warn_on_object_refname_ambiguity; extern char *apply_default_whitespace; extern char *apply_default_ignorewhitespace; -extern int zlib_compression_level; extern int pack_compression_level; extern unsigned long pack_size_limit_cfg; diff --git a/http-push.c b/http-push.c index d143fe28455623..8ac107a56e08be 100644 --- a/http-push.c +++ 
b/http-push.c @@ -369,13 +369,14 @@ static void start_put(struct transfer_request *request) int hdrlen; ssize_t size; git_zstream stream; + struct repo_config_values *cfg = repo_config_values(the_repository); unpacked = odb_read_object(the_repository->objects, &request->obj->oid, &type, &len); hdrlen = format_object_header(hdr, sizeof(hdr), type, len); /* Set it up */ - git_deflate_init(&stream, zlib_compression_level); + git_deflate_init(&stream, cfg->zlib_compression_level); size = git_deflate_bound(&stream, len + hdrlen); strbuf_grow(&request->buffer.buf, size); request->buffer.posn = 0; diff --git a/object-file.c b/object-file.c index 2acc9522df2daa..7c122ac419829a 100644 --- a/object-file.c +++ b/object-file.c @@ -906,6 +906,7 @@ static int start_loose_object_common(struct odb_source *source, const struct git_hash_algo *algo = source->odb->repo->hash_algo; const struct git_hash_algo *compat = source->odb->repo->compat_hash_algo; int fd; + struct repo_config_values *cfg = repo_config_values(the_repository); fd = create_tmpfile(source->odb->repo, tmp_file, filename); if (fd < 0) { @@ -921,7 +922,7 @@ static int start_loose_object_common(struct odb_source *source, } /* Setup zlib stream for compression */ - git_deflate_init(stream, zlib_compression_level); + git_deflate_init(stream, cfg->zlib_compression_level); stream->next_out = buf; stream->avail_out = buflen; algo->init_fn(c); From 8cd7402accec35c92d9ea8cc10b9d8e2536ef7b5 Mon Sep 17 00:00:00 2001 From: Olamide Caleb Bello Date: Tue, 2 Jun 2026 18:09:17 +0100 Subject: [PATCH 077/107] environment: move "pack_compression_level" into `struct repo_config_values` MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The `pack_compression_level` configuration is currently stored in the global variable `pack_compression_level`, which makes it shared across repository instances within a single process. 
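It helps to recall how the two compression levels interact. In `environment.c`, `core.compression` only fills in the loose-object and pack levels that were not set explicitly, and the `zlib_compression_seen` and `pack_compression_seen` flags track which ones were. The sketch below models that precedence with a hypothetical struct and a hypothetical `set_compression()` helper. It returns an error where Git would `die()`, and it defines the zlib constants locally rather than including `zlib.h`.

```c
#include <assert.h>
#include <string.h>

/* Same values as zlib's constants; defined locally to stay self-contained. */
#define SKETCH_Z_DEFAULT_COMPRESSION (-1)
#define SKETCH_Z_BEST_SPEED 1
#define SKETCH_Z_BEST_COMPRESSION 9

/* Hypothetical stand-in for the two levels plus their "seen" flags. */
struct compression_sketch {
	int zlib_compression_level;	/* loose objects */
	int pack_compression_level;	/* packfiles */
	int zlib_seen;			/* core.loosecompression was given */
	int pack_seen;			/* pack.compression was given */
};

/* Defaults from repo_config_values_init() in the series. */
static void compression_init(struct compression_sketch *c)
{
	c->zlib_compression_level = SKETCH_Z_BEST_SPEED;
	c->pack_compression_level = SKETCH_Z_DEFAULT_COMPRESSION;
	c->zlib_seen = 0;
	c->pack_seen = 0;
}

/* Returns 0 on success, -1 for an out-of-range level (Git dies instead). */
static int set_compression(struct compression_sketch *c,
			   const char *var, int level)
{
	assert(c != NULL && var != NULL);
	if (level == -1)
		level = SKETCH_Z_DEFAULT_COMPRESSION;
	else if (level < 0 || level > SKETCH_Z_BEST_COMPRESSION)
		return -1;

	if (!strcmp(var, "core.loosecompression")) {
		c->zlib_compression_level = level;
		c->zlib_seen = 1;
	} else if (!strcmp(var, "pack.compression")) {
		c->pack_compression_level = level;
		c->pack_seen = 1;
	} else if (!strcmp(var, "core.compression")) {
		/* Only fills in levels that were not set explicitly. */
		if (!c->zlib_seen)
			c->zlib_compression_level = level;
		if (!c->pack_seen)
			c->pack_compression_level = level;
	}
	return 0;
}
```

Each explicit key sets its own `seen` flag, so the result does not depend on the order in which the keys appear. `core.loosecompression` and `pack.compression` always take precedence over `core.compression`.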
Store it instead in `repo_config_values`, where eagerly‑parsed repository configuration lives. `pack_compression_level` is parsed eagerly because it influences packfile compression, a core operation where a lazy parse could cause inconsistent behavior and hamper libification. This preserves the existing eager‑parsing behavior while tying the value to the repository from which it was read, avoiding cross‑repository state leakage and continuing the effort to reduce reliance on global configuration state. Update all references to use `repo_config_values()`. Mentored-by: Christian Couder Mentored-by: Usman Akinyemi Signed-off-by: Olamide Caleb Bello Signed-off-by: Junio C Hamano --- builtin/fast-import.c | 8 +++++--- builtin/pack-objects.c | 17 ++++++++++------- environment.c | 8 +++++--- environment.h | 2 +- object-file.c | 3 ++- 5 files changed, 23 insertions(+), 15 deletions(-) diff --git a/builtin/fast-import.c b/builtin/fast-import.c index 82bc6dcc003723..070a5af3e48c92 100644 --- a/builtin/fast-import.c +++ b/builtin/fast-import.c @@ -965,6 +965,7 @@ static int store_object( unsigned long hdrlen, deltalen; struct git_hash_ctx c; git_zstream s; + struct repo_config_values *cfg = repo_config_values(the_repository); hdrlen = format_object_header((char *)hdr, sizeof(hdr), type, dat->len); @@ -1005,7 +1006,7 @@ static int store_object( } else delta = NULL; - git_deflate_init(&s, pack_compression_level); + git_deflate_init(&s, cfg->pack_compression_level); if (delta) { s.next_in = delta; s.avail_in = deltalen; @@ -1032,7 +1033,7 @@ static int store_object( if (delta) { FREE_AND_NULL(delta); - git_deflate_init(&s, pack_compression_level); + git_deflate_init(&s, cfg->pack_compression_level); s.next_in = (void *)dat->buf; s.avail_in = dat->len; s.avail_out = git_deflate_bound(&s, s.avail_in); @@ -1115,6 +1116,7 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark) struct git_hash_ctx c; git_zstream s; struct hashfile_checkpoint checkpoint; + 
struct repo_config_values *cfg = repo_config_values(the_repository); int status = Z_OK; /* Determine if we should auto-checkpoint. */ @@ -1134,7 +1136,7 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark) crc32_begin(pack_file); - git_deflate_init(&s, pack_compression_level); + git_deflate_init(&s, cfg->pack_compression_level); hdrlen = encode_in_pack_object_header(out_buf, out_sz, OBJ_BLOB, len); diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index dd2480a73d2edf..8ccbe7e17832cd 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -386,8 +386,9 @@ static unsigned long do_compress(void **pptr, unsigned long size) git_zstream stream; void *in, *out; unsigned long maxsize; + struct repo_config_values *cfg = repo_config_values(the_repository); - git_deflate_init(&stream, pack_compression_level); + git_deflate_init(&stream, cfg->pack_compression_level); maxsize = git_deflate_bound(&stream, size); in = *pptr; @@ -413,8 +414,9 @@ static unsigned long write_large_blob_data(struct odb_read_stream *st, struct ha unsigned char ibuf[1024 * 16]; unsigned char obuf[1024 * 16]; unsigned long olen = 0; + struct repo_config_values *cfg = repo_config_values(the_repository); - git_deflate_init(&stream, pack_compression_level); + git_deflate_init(&stream, cfg->pack_compression_level); for (;;) { ssize_t readlen; @@ -5003,6 +5005,7 @@ int cmd_pack_objects(int argc, struct string_list keep_pack_list = STRING_LIST_INIT_NODUP; struct list_objects_filter_options filter_options = LIST_OBJECTS_FILTER_INIT; + struct repo_config_values *cfg = repo_config_values(the_repository); struct option pack_objects_options[] = { OPT_CALLBACK_F('q', "quiet", &progress, NULL, @@ -5084,7 +5087,7 @@ int cmd_pack_objects(int argc, N_("ignore packs that have companion .keep file")), OPT_STRING_LIST(0, "keep-pack", &keep_pack_list, N_("name"), N_("ignore this pack")), - OPT_INTEGER(0, "compression", &pack_compression_level, + OPT_INTEGER(0, 
"compression", &cfg->pack_compression_level, N_("pack compression level")), OPT_BOOL(0, "keep-true-parents", &grafts_keep_true_parents, N_("do not hide commits by grafts")), @@ -5243,10 +5246,10 @@ int cmd_pack_objects(int argc, if (!reuse_object) reuse_delta = 0; - if (pack_compression_level == -1) - pack_compression_level = Z_DEFAULT_COMPRESSION; - else if (pack_compression_level < 0 || pack_compression_level > Z_BEST_COMPRESSION) - die(_("bad pack compression level %d"), pack_compression_level); + if (cfg->pack_compression_level == -1) + cfg->pack_compression_level = Z_DEFAULT_COMPRESSION; + else if (cfg->pack_compression_level < 0 || cfg->pack_compression_level > Z_BEST_COMPRESSION) + die(_("bad pack compression level %d"), cfg->pack_compression_level); if (!delta_search_threads) /* --threads=0 means autodetect */ delta_search_threads = online_cpus(); diff --git a/environment.c b/environment.c index 5b0e88b65cf420..d0d3a4b7d29e7e 100644 --- a/environment.c +++ b/environment.c @@ -52,7 +52,6 @@ char *git_commit_encoding; char *git_log_output_encoding; char *apply_default_whitespace; char *apply_default_ignorewhitespace; -int pack_compression_level = Z_DEFAULT_COMPRESSION; int fsync_object_files = -1; int use_fsync = -1; enum fsync_method fsync_method = FSYNC_METHOD_DEFAULT; @@ -390,7 +389,7 @@ int git_default_core_config(const char *var, const char *value, if (!zlib_compression_seen) cfg->zlib_compression_level = level; if (!pack_compression_seen) - pack_compression_level = level; + cfg->pack_compression_level = level; return 0; } @@ -662,6 +661,8 @@ static int git_default_attr_config(const char *var, const char *value) int git_default_config(const char *var, const char *value, const struct config_context *ctx, void *cb) { + struct repo_config_values *cfg = repo_config_values(the_repository); + if (starts_with(var, "core.")) return git_default_core_config(var, value, ctx, cb); @@ -701,7 +702,7 @@ int git_default_config(const char *var, const char *value, level = 
Z_DEFAULT_COMPRESSION; else if (level < 0 || level > Z_BEST_COMPRESSION) die(_("bad pack compression level %d"), level); - pack_compression_level = level; + cfg->pack_compression_level = level; pack_compression_seen = 1; return 0; } @@ -721,4 +722,5 @@ void repo_config_values_init(struct repo_config_values *cfg) cfg->trust_ctime = 1; cfg->check_stat = 1; cfg->zlib_compression_level = Z_BEST_SPEED; + cfg->pack_compression_level = Z_DEFAULT_COMPRESSION; } diff --git a/environment.h b/environment.h index 93201620afc302..514576b67a2741 100644 --- a/environment.h +++ b/environment.h @@ -94,6 +94,7 @@ struct repo_config_values { int trust_ctime; int check_stat; int zlib_compression_level; + int pack_compression_level; /* section "branch" config values */ enum branch_track branch_track; @@ -171,7 +172,6 @@ extern int assume_unchanged; extern int warn_on_object_refname_ambiguity; extern char *apply_default_whitespace; extern char *apply_default_ignorewhitespace; -extern int pack_compression_level; extern unsigned long pack_size_limit_cfg; extern int precomposed_unicode; diff --git a/object-file.c b/object-file.c index 7c122ac419829a..37def5cc590784 100644 --- a/object-file.c +++ b/object-file.c @@ -1437,8 +1437,9 @@ static int stream_blob_to_pack(struct transaction_packfile *state, int status = Z_OK; int write_object = (flags & INDEX_WRITE_OBJECT); off_t offset = 0; + struct repo_config_values *cfg = repo_config_values(the_repository); - git_deflate_init(&s, pack_compression_level); + git_deflate_init(&s, cfg->pack_compression_level); hdrlen = encode_in_pack_object_header(obuf, sizeof(obuf), OBJ_BLOB, size); s.next_out = obuf + hdrlen; From 6f00fc0499851d33ef6eae3f8633cb67808834aa Mon Sep 17 00:00:00 2001 From: Olamide Caleb Bello Date: Tue, 2 Jun 2026 18:09:18 +0100 Subject: [PATCH 078/107] environment: move "precomposed_unicode" into `struct repo_config_values` MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The 
`core.precomposeunicode` configuration is currently stored in the global variable `precomposed_unicode`, which makes it shared across repository instances within a single process. Store it instead in `repo_config_values`, where eagerly‑parsed repository configuration lives. `core.precomposeunicode` is parsed eagerly because it controls Unicode path normalization on macOS, a fundamental filesystem‑level behavior that many operations depend on; a lazy parse could lead to inconsistent results and hamper libification. This preserves the existing behavior while tying the value to the repository from which it was read, avoiding cross‑repository state leakage and continuing the effort to reduce reliance on global configuration state. Update all references to use `repo_config_values()`. Mentored-by: Christian Couder Mentored-by: Usman Akinyemi Signed-off-by: Olamide Caleb Bello Signed-off-by: Junio C Hamano --- compat/precompose_utf8.c | 20 +++++++++++++------- environment.c | 4 ++-- environment.h | 2 +- upload-pack.c | 3 ++- 4 files changed, 18 insertions(+), 11 deletions(-) diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c index 43b3be011439ef..0e94dbd8629805 100644 --- a/compat/precompose_utf8.c +++ b/compat/precompose_utf8.c @@ -48,16 +48,18 @@ void probe_utf8_pathname_composition(void) static const char *auml_nfc = "\xc3\xa4"; static const char *auml_nfd = "\x61\xcc\x88"; int output_fd; - if (precomposed_unicode != -1) + struct repo_config_values *cfg = repo_config_values(the_repository); + + if (cfg->precomposed_unicode != -1) return; /* We found it defined in the global config, respect it */ repo_git_path_replace(the_repository, &path, "%s", auml_nfc); output_fd = open(path.buf, O_CREAT|O_EXCL|O_RDWR, 0600); if (output_fd >= 0) { close(output_fd); repo_git_path_replace(the_repository, &path, "%s", auml_nfd); - precomposed_unicode = access(path.buf, R_OK) ? 0 : 1; + cfg->precomposed_unicode = access(path.buf, R_OK) ? 
0 : 1; repo_config_set(the_repository, "core.precomposeunicode", - precomposed_unicode ? "true" : "false"); + cfg->precomposed_unicode ? "true" : "false"); repo_git_path_replace(the_repository, &path, "%s", auml_nfc); if (unlink(path.buf)) die_errno(_("failed to unlink '%s'"), path.buf); @@ -69,14 +71,16 @@ const char *precompose_string_if_needed(const char *in) { size_t inlen; size_t outlen; + struct repo_config_values *cfg = repo_config_values(the_repository); + if (!in) return NULL; if (has_non_ascii(in, (size_t)-1, &inlen)) { iconv_t ic_prec; char *out; - if (precomposed_unicode < 0) - repo_config_get_bool(the_repository, "core.precomposeunicode", &precomposed_unicode); - if (precomposed_unicode != 1) + if (cfg->precomposed_unicode < 0) + repo_config_get_bool(the_repository, "core.precomposeunicode", &cfg->precomposed_unicode); + if (cfg->precomposed_unicode != 1) return in; ic_prec = iconv_open(repo_encoding, path_encoding); if (ic_prec == (iconv_t) -1) @@ -130,7 +134,9 @@ PREC_DIR *precompose_utf8_opendir(const char *dirname) struct dirent_prec_psx *precompose_utf8_readdir(PREC_DIR *prec_dir) { + struct repo_config_values *cfg = repo_config_values(the_repository); struct dirent *res; + res = readdir(prec_dir->dirp); if (res) { size_t namelenz = strlen(res->d_name) + 1; /* \0 */ @@ -149,7 +155,7 @@ struct dirent_prec_psx *precompose_utf8_readdir(PREC_DIR *prec_dir) prec_dir->dirent_nfc->d_ino = res->d_ino; prec_dir->dirent_nfc->d_type = res->d_type; - if ((precomposed_unicode == 1) && has_non_ascii(res->d_name, (size_t)-1, NULL)) { + if ((cfg->precomposed_unicode == 1) && has_non_ascii(res->d_name, (size_t)-1, NULL)) { if (prec_dir->ic_precompose == (iconv_t)-1) { die("iconv_open(%s,%s) failed, but needed:\n" " precomposed unicode is not supported.\n" diff --git a/environment.c b/environment.c index d0d3a4b7d29e7e..739b647ebe0ed1 100644 --- a/environment.c +++ b/environment.c @@ -72,7 +72,6 @@ enum object_creation_mode object_creation_mode = 
OBJECT_CREATION_MODE; int grafts_keep_true_parents; int core_sparse_checkout_cone; int sparse_expect_files_outside_of_patterns; -int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */ unsigned long pack_size_limit_cfg; #ifndef PROTECT_HFS_DEFAULT @@ -532,7 +531,7 @@ int git_default_core_config(const char *var, const char *value, } if (!strcmp(var, "core.precomposeunicode")) { - precomposed_unicode = git_config_bool(var, value); + cfg->precomposed_unicode = git_config_bool(var, value); return 0; } @@ -723,4 +722,5 @@ void repo_config_values_init(struct repo_config_values *cfg) cfg->check_stat = 1; cfg->zlib_compression_level = Z_BEST_SPEED; cfg->pack_compression_level = Z_DEFAULT_COMPRESSION; + cfg->precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */ } diff --git a/environment.h b/environment.h index 514576b67a2741..508cb1afbc9fda 100644 --- a/environment.h +++ b/environment.h @@ -95,6 +95,7 @@ struct repo_config_values { int check_stat; int zlib_compression_level; int pack_compression_level; + int precomposed_unicode; /* section "branch" config values */ enum branch_track branch_track; @@ -174,7 +175,6 @@ extern char *apply_default_whitespace; extern char *apply_default_ignorewhitespace; extern unsigned long pack_size_limit_cfg; -extern int precomposed_unicode; extern int protect_hfs; extern int protect_ntfs; diff --git a/upload-pack.c b/upload-pack.c index 9f6d6fe48c8c58..3a52237134ef3b 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -1336,6 +1336,7 @@ static int upload_pack_config(const char *var, const char *value, void *cb_data) { struct upload_pack_data *data = cb_data; + struct repo_config_values *cfg = repo_config_values(the_repository); if (!strcmp("uploadpack.allowtipsha1inwant", var)) { if (git_config_bool(var, value)) @@ -1366,7 +1367,7 @@ static int upload_pack_config(const char *var, const char *value, if (value) data->allow_packfile_uris = 1; } else if (!strcmp("core.precomposeunicode", var)) { - 
precomposed_unicode = git_config_bool(var, value); + cfg->precomposed_unicode = git_config_bool(var, value); } else if (!strcmp("transfer.advertisesid", var)) { data->advertise_sid = git_config_bool(var, value); } From dfa01cee1cb4d5e6c8567828370eb4785f7c33a1 Mon Sep 17 00:00:00 2001 From: Olamide Caleb Bello Date: Tue, 2 Jun 2026 18:09:19 +0100 Subject: [PATCH 079/107] environment: move "core_sparse_checkout_cone" into `struct repo_config_values` MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The `core.sparseCheckoutCone` configuration was previously stored in a global `int` variable that relied on implicit zero‑initialization, risking cross‑repository state leakage. Move it into `repo_config_values`, where eagerly‑parsed repository configuration lives, and give it an explicit default of 0 in `repo_config_values_init()`. `core.sparseCheckoutCone` is parsed eagerly because it determines the fundamental sparse‑checkout mode and is consulted very early during repository setup; a lazy parse could leave the sparse‑checkout state undefined and complicate libification. This preserves the existing behavior while tying the value to the repository from which it was read, avoiding cross‑repository state leakage and continuing the effort to reduce reliance on global configuration state. Update all references to use `repo_config_values()`.
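As a rough model of what moving a global into `struct repo_config_values` buys, the C sketch below gives two repository objects their own values. `struct toy_repo`, `struct toy_config_values`, `toy_config_values_init()` and `toy_repo_config_values()` are hypothetical stand-ins (not Git's API) for `struct repository`, `struct repo_config_values`, `repo_config_values_init()` and `repo_config_values()`; only the shape of the pattern is meant to carry over.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of struct repo_config_values. */
struct toy_config_values {
	int precomposed_unicode;       /* -1: not yet probed */
	int core_sparse_checkout_cone; /* 0 unless configured */
};

/* Hypothetical stand-in for struct repository. */
struct toy_repo {
	struct toy_config_values cfg;
};

/* Sketch of repo_config_values_init(): give every field an explicit default. */
static void toy_config_values_init(struct toy_config_values *cfg)
{
	cfg->precomposed_unicode = -1;
	cfg->core_sparse_checkout_cone = 0;
}

/* Sketch of repo_config_values(): the values belong to one repository. */
static struct toy_config_values *toy_repo_config_values(struct toy_repo *repo)
{
	assert(repo != NULL);
	return &repo->cfg;
}
```

Initializing two `struct toy_repo` objects and then setting `core_sparse_checkout_cone` through `toy_repo_config_values(&a)` leaves the second object at its default of 0, which a single process-wide `int` could not guarantee.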
Mentored-by: Christian Couder Mentored-by: Usman Akinyemi Signed-off-by: Olamide Caleb Bello Signed-off-by: Junio C Hamano --- builtin/mv.c | 2 +- builtin/sparse-checkout.c | 37 ++++++++++++++++++++++--------------- dir.c | 3 ++- environment.c | 4 ++-- environment.h | 2 +- sparse-index.c | 2 +- 6 files changed, 29 insertions(+), 21 deletions(-) diff --git a/builtin/mv.c b/builtin/mv.c index 2215d34e31f29a..ef3a326c906897 100644 --- a/builtin/mv.c +++ b/builtin/mv.c @@ -574,7 +574,7 @@ int cmd_mv(int argc, if (ignore_sparse && cfg->apply_sparse_checkout && - core_sparse_checkout_cone) { + cfg->core_sparse_checkout_cone) { /* * NEEDSWORK: we are *not* paying attention to * "out-to-out" move ( is out-of-cone and diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c index f4aa405da93760..92d017b81f9a32 100644 --- a/builtin/sparse-checkout.c +++ b/builtin/sparse-checkout.c @@ -73,7 +73,7 @@ static int sparse_checkout_list(int argc, const char **argv, const char *prefix, memset(&pl, 0, sizeof(pl)); - pl.use_cone_patterns = core_sparse_checkout_cone; + pl.use_cone_patterns = cfg->core_sparse_checkout_cone; sparse_filename = get_sparse_checkout_filename(); res = add_patterns_from_file_to_list(sparse_filename, "", 0, &pl, NULL, 0); @@ -334,6 +334,7 @@ static int write_patterns_and_update(struct repository *repo, FILE *fp; struct lock_file lk = LOCK_INIT; int result; + struct repo_config_values *cfg = repo_config_values(the_repository); sparse_filename = get_sparse_checkout_filename(); @@ -353,7 +354,7 @@ static int write_patterns_and_update(struct repository *repo, if (!fp) die_errno(_("unable to fdopen %s"), get_lock_file_path(&lk)); - if (core_sparse_checkout_cone) + if (cfg->core_sparse_checkout_cone) write_cone_to_file(fp, pl); else write_patterns_to_file(fp, pl); @@ -402,15 +403,15 @@ static enum sparse_checkout_mode update_cone_mode(int *cone_mode) { /* If not specified, use previous definition of cone mode */ if (*cone_mode == -1 && 
cfg->apply_sparse_checkout) - *cone_mode = core_sparse_checkout_cone; + *cone_mode = cfg->core_sparse_checkout_cone; /* Set cone/non-cone mode appropriately */ cfg->apply_sparse_checkout = 1; if (*cone_mode == 1 || *cone_mode == -1) { - core_sparse_checkout_cone = 1; + cfg->core_sparse_checkout_cone = 1; return MODE_CONE_PATTERNS; } - core_sparse_checkout_cone = 0; + cfg->core_sparse_checkout_cone = 0; return MODE_ALL_PATTERNS; } @@ -577,7 +578,9 @@ static void add_patterns_from_input(struct pattern_list *pl, FILE *file) { int i; - if (core_sparse_checkout_cone) { + struct repo_config_values *cfg = repo_config_values(the_repository); + + if (cfg->core_sparse_checkout_cone) { struct strbuf line = STRBUF_INIT; hashmap_init(&pl->recursive_hashmap, pl_hashmap_cmp, NULL, 0); @@ -636,13 +639,14 @@ static void add_patterns_cone_mode(int argc, const char **argv, struct pattern_entry *pe; struct hashmap_iter iter; struct pattern_list existing; + struct repo_config_values *cfg = repo_config_values(the_repository); char *sparse_filename = get_sparse_checkout_filename(); add_patterns_from_input(pl, argc, argv, use_stdin ? 
stdin : NULL); memset(&existing, 0, sizeof(existing)); - existing.use_cone_patterns = core_sparse_checkout_cone; + existing.use_cone_patterns = cfg->core_sparse_checkout_cone; if (add_patterns_from_file_to_list(sparse_filename, "", 0, &existing, NULL, 0)) @@ -690,7 +694,7 @@ static int modify_pattern_list(struct repository *repo, switch (m) { case ADD: - if (core_sparse_checkout_cone) + if (cfg->core_sparse_checkout_cone) add_patterns_cone_mode(args->nr, args->v, pl, use_stdin); else add_patterns_literal(args->nr, args->v, pl, use_stdin); @@ -723,11 +727,12 @@ static void sanitize_paths(struct repository *repo, const char *prefix, int skip_checks) { int i; + struct repo_config_values *cfg = repo_config_values(the_repository); if (!args->nr) return; - if (prefix && *prefix && core_sparse_checkout_cone) { + if (prefix && *prefix && cfg->core_sparse_checkout_cone) { /* * The args are not pathspecs, so unfortunately we * cannot imitate how cmd_add() uses parse_pathspec(). @@ -744,10 +749,10 @@ static void sanitize_paths(struct repository *repo, if (skip_checks) return; - if (prefix && *prefix && !core_sparse_checkout_cone) + if (prefix && *prefix && !cfg->core_sparse_checkout_cone) die(_("please run from the toplevel directory in non-cone mode")); - if (core_sparse_checkout_cone) { + if (cfg->core_sparse_checkout_cone) { for (i = 0; i < args->nr; i++) { if (args->v[i][0] == '/') die(_("specify directories rather than patterns (no leading slash)")); @@ -769,7 +774,7 @@ static void sanitize_paths(struct repository *repo, if (S_ISSPARSEDIR(ce->ce_mode)) continue; - if (core_sparse_checkout_cone) + if (cfg->core_sparse_checkout_cone) die(_("'%s' is not a directory; to treat it as a directory anyway, rerun with --skip-checks"), args->v[i]); else warning(_("pass a leading slash before paths such as '%s' if you want a single file (see NON-CONE PROBLEMS in the git-sparse-checkout manual)."), args->v[i]); @@ -836,6 +841,7 @@ static struct sparse_checkout_set_opts { static int 
sparse_checkout_set(int argc, const char **argv, const char *prefix, struct repository *repo) { + struct repo_config_values *cfg = repo_config_values(the_repository); int default_patterns_nr = 2; const char *default_patterns[] = {"/*", "!/*/", NULL}; @@ -873,7 +879,7 @@ static int sparse_checkout_set(int argc, const char **argv, const char *prefix, * non-cone mode, if nothing is specified, manually select just the * top-level directory (much as 'init' would do). */ - if (!core_sparse_checkout_cone && !set_opts.use_stdin && argc == 0) { + if (!cfg->core_sparse_checkout_cone && !set_opts.use_stdin && argc == 0) { for (int i = 0; i < default_patterns_nr; i++) strvec_push(&patterns, default_patterns[i]); } else { @@ -977,7 +983,7 @@ static int sparse_checkout_clean(int argc, const char **argv, setup_work_tree(); if (!cfg->apply_sparse_checkout) die(_("must be in a sparse-checkout to clean directories")); - if (!core_sparse_checkout_cone) + if (!cfg->core_sparse_checkout_cone) die(_("must be in a cone-mode sparse-checkout to clean directories")); argc = parse_options(argc, argv, prefix, @@ -1141,6 +1147,7 @@ static int sparse_checkout_check_rules(int argc, const char **argv, const char * FILE *fp; int ret; struct pattern_list pl = {0}; + struct repo_config_values *cfg = repo_config_values(the_repository); char *sparse_filename; check_rules_opts.cone_mode = -1; @@ -1152,7 +1159,7 @@ static int sparse_checkout_check_rules(int argc, const char **argv, const char * check_rules_opts.cone_mode = 1; update_cone_mode(&check_rules_opts.cone_mode); - pl.use_cone_patterns = core_sparse_checkout_cone; + pl.use_cone_patterns = cfg->core_sparse_checkout_cone; if (check_rules_opts.rules_file) { fp = xfopen(check_rules_opts.rules_file, "r"); add_patterns_from_input(&pl, argc, argv, fp); diff --git a/dir.c b/dir.c index fcb8f6dd2aa969..4f493b64c68dd8 100644 --- a/dir.c +++ b/dir.c @@ -3508,8 +3508,9 @@ int get_sparse_checkout_patterns(struct pattern_list *pl) { int res; char 
*sparse_filename = get_sparse_checkout_filename(); + struct repo_config_values *cfg = repo_config_values(the_repository); - pl->use_cone_patterns = core_sparse_checkout_cone; + pl->use_cone_patterns = cfg->core_sparse_checkout_cone; res = add_patterns_from_file_to_list(sparse_filename, "", 0, pl, NULL, 0); free(sparse_filename); diff --git a/environment.c b/environment.c index 739b647ebe0ed1..b0e873e9f5b901 100644 --- a/environment.c +++ b/environment.c @@ -70,7 +70,6 @@ enum push_default_type push_default = PUSH_DEFAULT_UNSPECIFIED; #endif enum object_creation_mode object_creation_mode = OBJECT_CREATION_MODE; int grafts_keep_true_parents; -int core_sparse_checkout_cone; int sparse_expect_files_outside_of_patterns; unsigned long pack_size_limit_cfg; @@ -526,7 +525,7 @@ int git_default_core_config(const char *var, const char *value, } if (!strcmp(var, "core.sparsecheckoutcone")) { - core_sparse_checkout_cone = git_config_bool(var, value); + cfg->core_sparse_checkout_cone = git_config_bool(var, value); return 0; } @@ -723,4 +722,5 @@ void repo_config_values_init(struct repo_config_values *cfg) cfg->zlib_compression_level = Z_BEST_SPEED; cfg->pack_compression_level = Z_DEFAULT_COMPRESSION; cfg->precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */ + cfg->core_sparse_checkout_cone = 0; } diff --git a/environment.h b/environment.h index 508cb1afbc9fda..befad9a38876e9 100644 --- a/environment.h +++ b/environment.h @@ -96,6 +96,7 @@ struct repo_config_values { int zlib_compression_level; int pack_compression_level; int precomposed_unicode; + int core_sparse_checkout_cone; /* section "branch" config values */ enum branch_track branch_track; @@ -178,7 +179,6 @@ extern unsigned long pack_size_limit_cfg; extern int protect_hfs; extern int protect_ntfs; -extern int core_sparse_checkout_cone; extern int sparse_expect_files_outside_of_patterns; enum rebase_setup_type { diff --git a/sparse-index.c b/sparse-index.c index 13629c075d06e0..53cb8d64fc9b2c 100644 --- 
a/sparse-index.c +++ b/sparse-index.c @@ -154,7 +154,7 @@ int is_sparse_index_allowed(struct index_state *istate, int flags) { struct repo_config_values *cfg = repo_config_values(the_repository); - if (!cfg->apply_sparse_checkout || !core_sparse_checkout_cone) + if (!cfg->apply_sparse_checkout || !cfg->core_sparse_checkout_cone) return 0; if (!(flags & SPARSE_INDEX_MEMORY_ONLY)) { From c8a32140a7d76565a6d14e8d068e2a9b1562ac95 Mon Sep 17 00:00:00 2001 From: Olamide Caleb Bello Date: Tue, 2 Jun 2026 18:09:20 +0100 Subject: [PATCH 080/107] environment: move "sparse_expect_files_outside_of_patterns" into `struct repo_config_values` MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The `sparse.expectFilesOutsideOfPatterns` configuration was previously stored in a global `int` variable, making it shared across repository instances and risking cross‑repository state leakage. Store it instead in `repo_config_values`, where eagerly‑parsed repository configuration lives. This option is parsed eagerly because it decides whether Git checks the working tree for files that are present despite being marked SKIP_WORKTREE, and clears that bit when it finds them; many commands rely on this behavior, and a lazy parse could cause inconsistent sparse‑checkout handling and complicate libification. This preserves the existing behavior while tying the value to the repository from which it was read, avoiding cross‑repository state leakage and continuing the effort to reduce reliance on global configuration state. Update all references to use `repo_config_values()`.
Mentored-by: Christian Couder Mentored-by: Usman Akinyemi Signed-off-by: Olamide Caleb Bello Signed-off-by: Junio C Hamano --- environment.c | 6 ++++-- environment.h | 5 +++-- sparse-index.c | 2 +- 3 files changed, 8 insertions(+), 5 deletions(-) diff --git a/environment.c b/environment.c index b0e873e9f5b901..57587ede56a1be 100644 --- a/environment.c +++ b/environment.c @@ -70,7 +70,6 @@ enum push_default_type push_default = PUSH_DEFAULT_UNSPECIFIED; #endif enum object_creation_mode object_creation_mode = OBJECT_CREATION_MODE; int grafts_keep_true_parents; -int sparse_expect_files_outside_of_patterns; unsigned long pack_size_limit_cfg; #ifndef PROTECT_HFS_DEFAULT @@ -550,8 +549,10 @@ int git_default_core_config(const char *var, const char *value, static int git_default_sparse_config(const char *var, const char *value) { + struct repo_config_values *cfg = repo_config_values(the_repository); + if (!strcmp(var, "sparse.expectfilesoutsideofpatterns")) { - sparse_expect_files_outside_of_patterns = git_config_bool(var, value); + cfg->sparse_expect_files_outside_of_patterns = git_config_bool(var, value); return 0; } @@ -723,4 +724,5 @@ void repo_config_values_init(struct repo_config_values *cfg) cfg->pack_compression_level = Z_DEFAULT_COMPRESSION; cfg->precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */ cfg->core_sparse_checkout_cone = 0; + cfg->sparse_expect_files_outside_of_patterns = 0; } diff --git a/environment.h b/environment.h index befad9a38876e9..609cdaa07fc8ca 100644 --- a/environment.h +++ b/environment.h @@ -98,6 +98,9 @@ struct repo_config_values { int precomposed_unicode; int core_sparse_checkout_cone; + /* section "sparse" config values */ + int sparse_expect_files_outside_of_patterns; + /* section "branch" config values */ enum branch_track branch_track; }; @@ -179,8 +182,6 @@ extern unsigned long pack_size_limit_cfg; extern int protect_hfs; extern int protect_ntfs; -extern int sparse_expect_files_outside_of_patterns; - enum 
rebase_setup_type { AUTOREBASE_NEVER = 0, AUTOREBASE_LOCAL, diff --git a/sparse-index.c b/sparse-index.c index 53cb8d64fc9b2c..1ed769b78d8de1 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -675,7 +675,7 @@ void clear_skip_worktree_from_present_files(struct index_state *istate) struct repo_config_values *cfg = repo_config_values(the_repository); if (!cfg->apply_sparse_checkout || - sparse_expect_files_outside_of_patterns) + cfg->sparse_expect_files_outside_of_patterns) return; if (clear_skip_worktree_from_present_files_sparse(istate)) { From 8407abf02aa310f4b8c21e0c9da925f8091564ff Mon Sep 17 00:00:00 2001 From: Olamide Caleb Bello Date: Tue, 2 Jun 2026 18:09:21 +0100 Subject: [PATCH 081/107] environment: move "warn_on_object_refname_ambiguity" into `struct repo_config_values` MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The `warn_on_object_refname_ambiguity` flag, which works alongside `core.warnAmbiguousRefs` to decide whether a full hex object name that is also a refname triggers an ambiguity warning, was previously stored in a global `int` variable, making it shared across repository instances and risking cross‑repository state leakage. Store it instead in `repo_config_values`, next to the eagerly‑parsed repository configuration. The flag is not read from the configuration itself: it defaults to 1 and is temporarily cleared by bulk readers such as `cat-file --batch` and `pack-objects`, for which the extra ref lookup would dwarf the cost of the object lookups themselves. Ambiguity warnings influence how users interpret object references in many commands; if this flag were not tied to a repository, the warnings could behave inconsistently or appear for the wrong repository, confusing users and hindering libification. This preserves the existing behavior while tying the value to the repository from which it was read, avoiding cross‑repository state leakage and continuing the effort to reduce reliance on global configuration 
state. Update all references to use `repo_config_values()`.
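The save/clear/restore idiom that bulk readers such as `cat-file --batch` apply to this flag can be sketched as follows. This is a simplified, hypothetical model: `struct toy_config_values`, `toy_lookup()` and `toy_bulk_lookup()` are invented names, and the sketch assumes the flag's only effect is to skip the extra refname lookup.

```c
#include <assert.h>

/* Hypothetical subset of struct repo_config_values. */
struct toy_config_values {
	int warn_on_object_refname_ambiguity; /* default 1 */
};

/* Counts how often the (expensive) ambiguity check actually ran. */
static int toy_checks_run;

/* Hypothetical lookup: only pays for the refname check when the flag is set. */
static void toy_lookup(struct toy_config_values *cfg)
{
	if (cfg->warn_on_object_refname_ambiguity)
		toy_checks_run++;
}

/*
 * Save/clear/restore: suppress the check for the whole loop, then put
 * back the caller's setting so later lookups behave as before.
 */
static void toy_bulk_lookup(struct toy_config_values *cfg, int n)
{
	int save_warning = cfg->warn_on_object_refname_ambiguity;

	cfg->warn_on_object_refname_ambiguity = 0;
	for (int i = 0; i < n; i++)
		toy_lookup(cfg);
	cfg->warn_on_object_refname_ambiguity = save_warning;
}
```

Because the saved value is restored afterwards, lookups made after `toy_bulk_lookup()` returns run the check again exactly as before.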
Mentored-by: Christian Couder Mentored-by: Usman Akinyemi Signed-off-by: Olamide Caleb Bello Signed-off-by: Junio C Hamano --- builtin/cat-file.c | 7 ++++--- builtin/pack-objects.c | 7 ++++--- environment.c | 2 +- environment.h | 2 +- object-name.c | 3 ++- revision.c | 7 ++++--- submodule.c | 7 ++++--- 7 files changed, 20 insertions(+), 15 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index d9fbad535868bb..cfc543018684c1 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -901,6 +901,7 @@ static int batch_objects(struct batch_options *opt) struct strbuf input = STRBUF_INIT; struct strbuf output = STRBUF_INIT; struct expand_data data = EXPAND_DATA_INIT; + struct repo_config_values *cfg = repo_config_values(the_repository); int save_warning; int retval = 0; @@ -973,8 +974,8 @@ static int batch_objects(struct batch_options *opt) * warn) ends up dwarfing the actual cost of the object lookups * themselves. We can work around it by just turning off the warning. */ - save_warning = warn_on_object_refname_ambiguity; - warn_on_object_refname_ambiguity = 0; + save_warning = cfg->warn_on_object_refname_ambiguity; + cfg->warn_on_object_refname_ambiguity = 0; if (opt->batch_mode == BATCH_MODE_QUEUE_AND_DISPATCH) { batch_objects_command(opt, &output, &data); @@ -1002,7 +1003,7 @@ static int batch_objects(struct batch_options *opt) cleanup: strbuf_release(&input); strbuf_release(&output); - warn_on_object_refname_ambiguity = save_warning; + cfg->warn_on_object_refname_ambiguity = save_warning; return retval; } diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 8ccbe7e17832cd..7df75fe91e1488 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -4788,6 +4788,7 @@ static void get_object_list(struct rev_info *revs, struct strvec *argv) struct setup_revision_opt s_r_opt = { .allow_exclude_promisor_objects = 1, }; + struct repo_config_values *cfg = repo_config_values(the_repository); char line[1000]; int flags = 0; int 
save_warning; @@ -4798,8 +4799,8 @@ static void get_object_list(struct rev_info *revs, struct strvec *argv) /* make sure shallows are read */ is_repository_shallow(the_repository); - save_warning = warn_on_object_refname_ambiguity; - warn_on_object_refname_ambiguity = 0; + save_warning = cfg->warn_on_object_refname_ambiguity; + cfg->warn_on_object_refname_ambiguity = 0; while (fgets(line, sizeof(line), stdin) != NULL) { int len = strlen(line); @@ -4827,7 +4828,7 @@ static void get_object_list(struct rev_info *revs, struct strvec *argv) die(_("bad revision '%s'"), line); } - warn_on_object_refname_ambiguity = save_warning; + cfg->warn_on_object_refname_ambiguity = save_warning; if (use_bitmap_index && !get_object_list_from_bitmap(revs)) return; diff --git a/environment.c b/environment.c index 57587ede56a1be..ba2c60103ff51c 100644 --- a/environment.c +++ b/environment.c @@ -47,7 +47,6 @@ int minimum_abbrev = 4, default_abbrev = -1; int ignore_case; int assume_unchanged; int is_bare_repository_cfg = -1; /* unspecified */ -int warn_on_object_refname_ambiguity = 1; char *git_commit_encoding; char *git_log_output_encoding; char *apply_default_whitespace; @@ -725,4 +724,5 @@ void repo_config_values_init(struct repo_config_values *cfg) cfg->precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */ cfg->core_sparse_checkout_cone = 0; cfg->sparse_expect_files_outside_of_patterns = 0; + cfg->warn_on_object_refname_ambiguity = 1; } diff --git a/environment.h b/environment.h index 609cdaa07fc8ca..1ff0a7ba8b0e82 100644 --- a/environment.h +++ b/environment.h @@ -97,6 +97,7 @@ struct repo_config_values { int pack_compression_level; int precomposed_unicode; int core_sparse_checkout_cone; + int warn_on_object_refname_ambiguity; /* section "sparse" config values */ int sparse_expect_files_outside_of_patterns; @@ -174,7 +175,6 @@ extern int has_symlinks; extern int minimum_abbrev, default_abbrev; extern int ignore_case; extern int assume_unchanged; -extern int 
warn_on_object_refname_ambiguity; extern char *apply_default_whitespace; extern char *apply_default_ignorewhitespace; extern unsigned long pack_size_limit_cfg; diff --git a/object-name.c b/object-name.c index 21dcdc4a0e7c55..319d3db01da110 100644 --- a/object-name.c +++ b/object-name.c @@ -684,11 +684,12 @@ static int get_oid_basic(struct repository *r, const char *str, int len, int refs_found = 0; int at, reflog_len, nth_prior = 0; int fatal = !(flags & GET_OID_QUIETLY); + struct repo_config_values *cfg = repo_config_values(the_repository); if (len == r->hash_algo->hexsz && !get_oid_hex(str, oid)) { if (!(flags & GET_OID_SKIP_AMBIGUITY_CHECK) && repo_settings_get_warn_ambiguous_refs(r) && - warn_on_object_refname_ambiguity) { + cfg->warn_on_object_refname_ambiguity) { refs_found = repo_dwim_ref(r, str, len, &tmp_oid, &real_ref, 0); if (refs_found > 0) { warning(warn_msg, len, str); diff --git a/revision.c b/revision.c index 599b3a66c369ca..4e7faa7eb15022 100644 --- a/revision.c +++ b/revision.c @@ -2922,9 +2922,10 @@ static void read_revisions_from_stdin(struct rev_info *revs, int seen_end_of_options = 0; int save_warning; int flags = 0; + struct repo_config_values *cfg = repo_config_values(the_repository); - save_warning = warn_on_object_refname_ambiguity; - warn_on_object_refname_ambiguity = 0; + save_warning = cfg->warn_on_object_refname_ambiguity; + cfg->warn_on_object_refname_ambiguity = 0; strbuf_init(&sb, 1000); while (strbuf_getline(&sb, stdin) != EOF) { @@ -2958,7 +2959,7 @@ static void read_revisions_from_stdin(struct rev_info *revs, read_pathspec_from_stdin(&sb, prune); strbuf_release(&sb); - warn_on_object_refname_ambiguity = save_warning; + cfg->warn_on_object_refname_ambiguity = save_warning; } static void NORETURN diagnose_missing_default(const char *def) diff --git a/submodule.c b/submodule.c index b1a0363f9d2a96..f26235bbb728ee 100644 --- a/submodule.c +++ b/submodule.c @@ -898,12 +898,13 @@ static void collect_changed_submodules(struct repository 
*r, struct setup_revision_opt s_r_opt = { .assume_dashdash = 1, }; + struct repo_config_values *cfg = repo_config_values(the_repository); - save_warning = warn_on_object_refname_ambiguity; - warn_on_object_refname_ambiguity = 0; + save_warning = cfg->warn_on_object_refname_ambiguity; + cfg->warn_on_object_refname_ambiguity = 0; repo_init_revisions(r, &rev, NULL); setup_revisions_from_strvec(argv, &rev, &s_r_opt); - warn_on_object_refname_ambiguity = save_warning; + cfg->warn_on_object_refname_ambiguity = save_warning; if (prepare_revision_walk(&rev)) die(_("revision walk setup failed")); From bb4ce23284d3605c892fdf4fe349fe8773c813d2 Mon Sep 17 00:00:00 2001 From: Mirko Faina Date: Tue, 19 May 2026 02:55:22 +0200 Subject: [PATCH 082/107] revision.c: implement --max-count-oldest "--max-count" is a commit limiting option and sets a maximum number of commits to be shown. If a user wants to see only the first N commits of the history (the oldest commits), they'd have to do something like `git log $(git rev-list HEAD | tail -n N | head -n 1)`. This is not very user-friendly. Teach get_revision() the --max-count-oldest option. Signed-off-by: Mirko Faina [jc: fixed up t4202 ] Signed-off-by: Junio C Hamano --- Documentation/rev-list-options.adoc | 5 +- revision.c | 111 +++++++++++++++++++++++++++- revision.h | 2 + t/t4202-log.sh | 40 ++++++++++ 4 files changed, 154 insertions(+), 4 deletions(-) diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc index 2d195a147456ea..e8c88d0f1c758f 100644 --- a/Documentation/rev-list-options.adoc +++ b/Documentation/rev-list-options.adoc @@ -16,7 +16,10 @@ ordering and formatting options, such as `--reverse`. `-`:: `-n `:: `--max-count=`:: - Limit the output to __ commits. + Limit the output to the first __ commits that would be shown. + +`--max-count-oldest=`:: + Limit the output to the last __ commits that would be shown. `--skip=`:: Skip __ commits before starting to show the commit output.
diff --git a/revision.c b/revision.c index 599b3a66c369ca..5d53db3152ddf0 100644 --- a/revision.c +++ b/revision.c @@ -2339,10 +2339,28 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg } if ((argcount = parse_long_opt("max-count", argv, &optarg))) { + if (revs->max_count_type == 1) + die_for_incompatible_opt2(1, "--max-count", 1, + "--max-count-oldest"); revs->max_count = parse_count(optarg); revs->no_walk = 0; + revs->max_count_type = 0; return argcount; + } else if ((argcount = parse_long_opt("max-count-oldest", argv, &optarg))) { + if (revs->max_count_type == 0 && revs->max_count != -1) + die_for_incompatible_opt2(1, "--max-count", 1, + "--max-count-oldest"); + if (revs->skip_count > 0) + die_for_incompatible_opt2(1, "--skip", 1, + "--max-count-oldest"); + revs->max_count = parse_count(optarg); + revs->no_walk = 0; + revs->max_count_type = 1; + revs->max_count_stage = 0; } else if ((argcount = parse_long_opt("skip", argv, &optarg))) { + if (revs->max_count_type == 1) + die_for_incompatible_opt2(1, "--skip", 1, + "--max-count-oldest"); revs->skip_count = parse_count(optarg); return argcount; } else if ((*arg == '-') && isdigit(arg[1])) { @@ -4521,15 +4539,91 @@ static struct commit *get_revision_internal(struct rev_info *revs) return c; } +static void retrieve_oldest_commits(struct rev_info *revs, + struct commit_list **queue) +{ + struct commit *c; + int max_count = revs->max_count; + int queuei_count = 0; + int queueo_count = 0; + struct commit_list *queueo = NULL; + struct commit_list *queuei = NULL; + struct commit_list *reversed_queue = NULL; + struct commit_list *p; + + revs->max_count = -1; + while ((c = get_revision_internal(revs))) { + /* + * We need to reset the SHOWN status, otherwise --graph breaks. + * This is safe: get_revision_internal() does not revisit + * child commits, as they have already been processed and the + * traversal only goes from child to parent.
+ * + * We do this because the --graph machinery relies on the status + * of the parents to decide how the printing will happen. + * + * We can't simply replace this instruction with a + * graph_update(), as it doesn't do the actual printing; we'd + * have to remove any commit that goes over the + * --max-count-oldest limit from revs->graph. + */ + c->object.flags &= ~(SHOWN | CHILD_SHOWN); + commit_list_insert(c, &queuei); + if (!(c->object.flags & BOUNDARY)) + queuei_count++; + while (queuei_count + queueo_count > max_count) { + if (!queueo_count) { + while ((c = pop_commit(&queuei))) { + commit_list_insert(c, &queueo); + queueo_count++; + } + queuei_count = 0; + } + c = pop_commit(&queueo); + queueo_count--; + /* We need to do this, otherwise we'll discard the + * commits that go over the --max-count-oldest limit but + * not their respective boundaries. This matters only if + * we're discarding the commit right before the boundary. + */ + for (p = c->parents; p; p = p->next) + p->item->object.flags &= ~CHILD_SHOWN; + } + } + + while ((c = pop_commit(&queueo))) + commit_list_insert(c, &reversed_queue); + while ((c = pop_commit(&queuei))) + commit_list_insert(c, &queueo); + while ((c = pop_commit(&queueo))) + commit_list_insert(c, &reversed_queue); + + while ((c = pop_commit(&reversed_queue))) + commit_list_insert(c, queue); +} + struct commit *get_revision(struct rev_info *revs) { struct commit *c; struct commit_list *reversed; + struct commit_list *queue = NULL; + struct commit_list *p; + + if (revs->max_count_type == 1 && !revs->max_count_stage) { + retrieve_oldest_commits(revs, &queue); + commit_list_free(revs->commits); + revs->commits = queue; + revs->max_count_stage = 1; + } if (revs->reverse) { reversed = NULL; - while ((c = get_revision_internal(revs))) - commit_list_insert(c, &reversed); + if (revs->max_count_type == 1) + while ((c = pop_commit(&revs->commits))) + commit_list_insert(c, &reversed); + else + while ((c = get_revision_internal(revs))) +
commit_list_insert(c, &reversed); commit_list_free(revs->commits); revs->commits = reversed; revs->reverse = 0; @@ -4543,7 +4637,18 @@ struct commit *get_revision(struct rev_info *revs) return c; } - c = get_revision_internal(revs); + if (revs->max_count_stage) { + c = pop_commit(&revs->commits); + if (c) { + c->object.flags |= SHOWN; + if (!(c->object.flags & BOUNDARY)) + for (p = c->parents; p; p = p->next) + p->item->object.flags |= CHILD_SHOWN; + } + } else { + c = get_revision_internal(revs); + } + if (c && revs->graph) graph_update(revs->graph, c); if (!c) { diff --git a/revision.h b/revision.h index 584f1338b5e323..e157463cb1f62c 100644 --- a/revision.h +++ b/revision.h @@ -309,6 +309,8 @@ struct rev_info { /* special limits */ int skip_count; int max_count; + unsigned int max_count_type:1; + unsigned int max_count_stage:1; timestamp_t max_age; timestamp_t max_age_as_filter; timestamp_t min_age; diff --git a/t/t4202-log.sh b/t/t4202-log.sh index 05cee9e41bb48d..75edb0eb38c039 100755 --- a/t/t4202-log.sh +++ b/t/t4202-log.sh @@ -1882,6 +1882,46 @@ test_expect_success 'log --graph with --name-status' ' test_cmp_graph --name-status tangle..reach ' +test_expect_success 'log --max-count-oldest=3 --oneline' ' + test_when_finished rm expect && + git log --oneline | tail -n3 >expect && + git log --oneline --max-count-oldest=3 >actual && + test_cmp expect actual +' + +test_expect_success 'log --max-count-oldest=3 --reverse --oneline' ' + test_when_finished rm expect && + git log --oneline --reverse | head -n3 >expect && + git log --oneline --max-count-oldest=3 --reverse >actual && + test_cmp expect actual +' + +test_expect_success 'log --max-count-oldest with --max-count' ' + test_when_finished rm stderr && + test_must_fail git log --max-count-oldest=3 --max-count=3 2>stderr && + test_grep "cannot be used together" stderr +' + +test_expect_success 'log --max-count-oldest with --skip' ' + test_when_finished rm stderr && + test_must_fail git log --max-count-oldest=3 
--skip=1 2>stderr && + test_grep "cannot be used together" stderr +' + +test_expect_success 'log --max-count-oldest=1000 --graph --boundary' ' + test_when_finished rm expect actual && + git log --graph --boundary >expect && + git log --max-count-oldest=1000 --graph --boundary >actual && + test_cmp expect actual +' + +test_expect_success 'log --oneline --graph --boundary --max-count-oldest=1' ' + test_when_finished rm -f actual && + git log --oneline --graph --boundary --max-count-oldest=1 \ + HEAD~1..HEAD >actual && + test_line_count = 2 actual +' + cat >expect <<-\EOF * reach | From 9b03e2790af03bebc9bc084cfc921492e6d5ca70 Mon Sep 17 00:00:00 2001 From: Harald Nordgren Date: Tue, 2 Jun 2026 18:43:27 +0000 Subject: [PATCH 083/107] config: add git_config_key_is_valid() for quiet validation Move the body of git_config_parse_key() into a static helper do_parse_config_key() that takes a "quiet" flag and treats store_key as optional. git_config_parse_key() becomes a thin wrapper. Add git_config_key_is_valid() for callers that only need to know whether a key is well-formed. Signed-off-by: Harald Nordgren Signed-off-by: Junio C Hamano --- config.c | 38 +++++++++++++++++++++++++++++--------- config.h | 2 ++ 2 files changed, 31 insertions(+), 9 deletions(-) diff --git a/config.c b/config.c index 156f2a24fa0027..7ae356ccaa4b3f 100644 --- a/config.c +++ b/config.c @@ -536,11 +536,14 @@ static inline int iskeychar(int c) * -2 if there is no section name in the key. 
* * store_key - pointer to char* which will hold a copy of the key with - * lowercase section and variable name + * lowercase section and variable name, can be NULL to skip + * allocation when only validation is needed * baselen - pointer to size_t which will hold the length of the * section + subsection part, can be NULL + * quiet - when non-zero, suppress error() reports on rejection */ -int git_config_parse_key(const char *key, char **store_key, size_t *baselen_) +static int do_parse_config_key(const char *key, char **store_key, + size_t *baselen_, int quiet) { size_t i, baselen; int dot; @@ -552,12 +555,14 @@ int git_config_parse_key(const char *key, char **store_key, size_t *baselen_) */ if (last_dot == NULL || last_dot == key) { - error(_("key does not contain a section: %s"), key); + if (!quiet) + error(_("key does not contain a section: %s"), key); return -CONFIG_NO_SECTION_OR_NAME; } if (!last_dot[1]) { - error(_("key does not contain variable name: %s"), key); + if (!quiet) + error(_("key does not contain variable name: %s"), key); return -CONFIG_NO_SECTION_OR_NAME; } @@ -568,7 +573,8 @@ int git_config_parse_key(const char *key, char **store_key, size_t *baselen_) /* * Validate the key and while at it, lower case it for matching. 
*/ - *store_key = xmallocz(strlen(key)); + if (store_key) + *store_key = xmallocz(strlen(key)); dot = 0; for (i = 0; key[i]; i++) { @@ -579,24 +585,38 @@ int git_config_parse_key(const char *key, char **store_key, size_t *baselen_) if (!dot || i > baselen) { if (!iskeychar(c) || (i == baselen + 1 && !isalpha(c))) { - error(_("invalid key: %s"), key); + if (!quiet) + error(_("invalid key: %s"), key); goto out_free_ret_1; } c = tolower(c); } else if (c == '\n') { - error(_("invalid key (newline): %s"), key); + if (!quiet) + error(_("invalid key (newline): %s"), key); goto out_free_ret_1; } - (*store_key)[i] = c; + if (store_key) + (*store_key)[i] = c; } return 0; out_free_ret_1: - FREE_AND_NULL(*store_key); + if (store_key) + FREE_AND_NULL(*store_key); return -CONFIG_INVALID_KEY; } +int git_config_parse_key(const char *key, char **store_key, size_t *baselen_) +{ + return do_parse_config_key(key, store_key, baselen_, 0); +} + +int git_config_key_is_valid(const char *key) +{ + return !do_parse_config_key(key, NULL, NULL, 1); +} + static int config_parse_pair(const char *key, const char *value, struct key_value_info *kvi, config_fn_t fn, void *data) diff --git a/config.h b/config.h index ba426a960af9f4..26a2850d15afed 100644 --- a/config.h +++ b/config.h @@ -337,6 +337,8 @@ void repo_config_set(struct repository *, const char *, const char *); int git_config_parse_key(const char *, char **, size_t *); +int git_config_key_is_valid(const char *); + /* * The following macros specify flag bits that alter the behavior * of the repo_config_set_multivar*() methods. From 03c29e2e980da7595cbade29e02616d2de2c42f8 Mon Sep 17 00:00:00 2001 From: Harald Nordgren Date: Tue, 2 Jun 2026 18:43:28 +0000 Subject: [PATCH 084/107] config: improve diagnostic for "set" with missing value "git config set pull.rebase=false" currently fails with "wrong number of arguments", and the implicit form "git config pull.rebase=false" fails with "invalid key". 
Neither points at the real problem: the value is missing. Report that directly, and when the argument has the shape "=", also suggest the split form: $ git config set pull.rebase=false error: missing value to set to the variable 'pull.rebase=false' hint: did you mean "git config set pull.rebase false"? When the prefix before "=" is not a valid key, drop the hint: $ git config set foo=bar error: missing value to set to a variable with an invalid name 'foo=bar' Signed-off-by: Harald Nordgren Signed-off-by: Junio C Hamano --- builtin/config.c | 32 ++++++++++++++++++++++++++- t/t1300-config.sh | 56 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 87 insertions(+), 1 deletion(-) diff --git a/builtin/config.c b/builtin/config.c index cf4ba0f7cc6f22..8d8ec0beead220 100644 --- a/builtin/config.c +++ b/builtin/config.c @@ -1,6 +1,7 @@ #define USE_THE_REPOSITORY_VARIABLE #include "builtin.h" #include "abspath.h" +#include "advice.h" #include "config.h" #include "color.h" #include "date.h" @@ -210,6 +211,26 @@ static void check_argc(int argc, int min, int max) exit(129); } +static NORETURN void die_missing_set_value(const char *arg) +{ + const char *last_dot = strrchr(arg, '.'); + const char *eq = last_dot ? strchr(last_dot + 1, '=') : NULL; + char *prefix = eq ? 
xstrndup(arg, eq - arg) : NULL; + + if (prefix && git_config_key_is_valid(prefix)) { + error(_("missing value to set to the variable '%s'"), arg); + advise(_("did you mean \"git config set %s %s\"?"), + prefix, eq + 1); + } else if (git_config_key_is_valid(arg)) { + error(_("missing value to set to the variable '%s'"), arg); + } else { + error(_("missing value to set to a variable with an invalid name '%s'"), + arg); + } + free(prefix); + exit(129); +} + static void show_config_origin(const struct config_display_options *opts, const struct key_value_info *kvi, struct strbuf *buf) @@ -1133,6 +1154,8 @@ static int cmd_config_set(int argc, const char **argv, const char *prefix, argc = parse_options(argc, argv, prefix, opts, builtin_config_set_usage, PARSE_OPT_STOP_AT_NON_OPTION); + if (argc == 1) + die_missing_set_value(argv[0]); check_argc(argc, 2, 2); if ((flags & CONFIG_FLAGS_FIXED_VALUE) && !value_pattern) @@ -1371,6 +1394,7 @@ static int cmd_config_actions(int argc, const char **argv, const char *prefix) }; char *value = NULL, *comment = NULL; int ret = 0; + int actions_implicit; struct key_value_info default_kvi = KVI_INIT; argc = parse_options(argc, argv, prefix, opts, @@ -1385,7 +1409,8 @@ static int cmd_config_actions(int argc, const char **argv, const char *prefix) exit(129); } - if (actions == 0) + actions_implicit = (actions == 0); + if (actions_implicit) switch (argc) { case 1: actions = ACTION_GET; break; case 2: actions = ACTION_SET; break; @@ -1394,6 +1419,11 @@ static int cmd_config_actions(int argc, const char **argv, const char *prefix) error(_("no action specified")); exit(129); } + if (actions_implicit && argc == 1) { + const char *last_dot = strrchr(argv[0], '.'); + if (last_dot && strchr(last_dot + 1, '=')) + die_missing_set_value(argv[0]); + } if (display_opts.omit_values && !(actions == ACTION_LIST || actions == ACTION_GET_REGEXP)) { error(_("--name-only is only applicable to --list or --get-regexp")); diff --git a/t/t1300-config.sh 
b/t/t1300-config.sh index 128971ee12fa6c..e53c8ecea1304e 100755 --- a/t/t1300-config.sh +++ b/t/t1300-config.sh @@ -462,6 +462,62 @@ test_expect_success 'invalid key' ' test_must_fail git config inval.2key blabla ' +test_expect_success 'set with 1 arg of "key=value": valid key suggests split form' ' + test_must_fail git config set pull.rebase=false 2>err && + test_grep "missing value to set to the variable .pull\\.rebase=false." err && + test_grep "did you mean .git config set pull\\.rebase false." err +' + +test_expect_success 'set with 1 arg of "key=value": implicit form suggests split form' ' + test_must_fail git config pull.rebase=false 2>err && + test_grep "missing value to set to the variable .pull\\.rebase=false." err && + test_grep "did you mean .git config set pull\\.rebase false." err +' + +test_expect_success 'set with 1 arg of "key=value": invalid key does not suggest split form' ' + test_must_fail git config set foo=bar 2>err && + test_grep "missing value to set to a variable with an invalid name .foo=bar." err && + test_grep ! "did you mean" err +' + +test_expect_success 'set with 1 arg: variable name starting with digit is invalid' ' + test_must_fail git config set foo.1bar=baz 2>err && + test_grep "missing value to set to a variable with an invalid name .foo\\.1bar=baz." err && + test_grep ! "did you mean" err +' + +test_expect_success 'set with 1 arg: digit-led section name is valid' ' + test_must_fail git config set 1foo.bar=baz 2>err && + test_grep "missing value to set to the variable .1foo\\.bar=baz." err && + test_grep "did you mean .git config set 1foo\\.bar baz." err +' + +test_expect_success 'set with 1 arg: subsection plus invalid variable name' ' + test_must_fail git config set foo.some.b_r=baz 2>err && + test_grep "missing value to set to a variable with an invalid name .foo\\.some\\.b_r=baz." err && + test_grep ! 
"did you mean" err +' + +test_expect_success 'set with 1 arg of valid key reports missing value' ' + test_must_fail git config set pull.rebase 2>err && + test_grep "missing value to set to the variable .pull\\.rebase." err && + test_grep ! "did you mean" err +' + +test_expect_success 'set with 2 args including "=" in invalid key does not suggest' ' + test_must_fail git config set pull.rebase=false true 2>err && + test_grep "invalid key: pull\\.rebase=false" err && + test_grep ! "did you mean" err +' + +test_expect_success '"=" inside subsection is valid' ' + test_when_finished "rm -f subsection.cfg" && + git config set -f subsection.cfg foo.bar=baz.boo qux && + echo qux >expect && + git config get -f subsection.cfg foo.bar=baz.boo >actual && + test_cmp expect actual +' + test_expect_success 'correct key' ' git config 123456.a123 987 ' From 9708b3dc95a22116c4a058b107b063da4bcf7d4a Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Thu, 4 Jun 2026 12:07:31 +0200 Subject: [PATCH 085/107] gitlab-ci: rearrange Linux jobs to match GitHub's order Rearrange the order of Linux jobs that we have defined in GitLab CI so that it matches the order on GitHub's side. This makes it easier to compare whether the list of jobs actually matches on both sides. 
Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- .gitlab-ci.yml | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml index 83ec786c5a49d0..c4eec6e7651300 100644 --- a/.gitlab-ci.yml +++ b/.gitlab-ci.yml @@ -42,15 +42,15 @@ test:linux: - jobname: linux-reftable image: ubuntu:rolling CC: clang + - jobname: linux-TEST-vars + image: ubuntu:20.04 + CC: gcc + CC_PACKAGE: gcc-8 - jobname: linux-breaking-changes image: ubuntu:20.04 CC: gcc - jobname: fedora-breaking-changes-meson image: fedora:latest - - jobname: linux-TEST-vars - image: ubuntu:20.04 - CC: gcc - CC_PACKAGE: gcc-8 - jobname: linux-leaks image: ubuntu:rolling CC: gcc @@ -60,13 +60,14 @@ test:linux: - jobname: linux-asan-ubsan image: ubuntu:rolling CC: clang + - jobname: linux-meson + image: ubuntu:rolling + CC: gcc - jobname: linux-musl-meson image: alpine:latest + # Supported until 2025-04-02. - jobname: linux32 image: i386/ubuntu:20.04 - - jobname: linux-meson - image: ubuntu:rolling - CC: gcc artifacts: paths: - t/failed-test-artifacts From f0ba41bae89ae5bb66d1b9677d26bd6d7953da34 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Thu, 4 Jun 2026 12:07:32 +0200 Subject: [PATCH 086/107] gitlab-ci: add missing Linux jobs The GitLab CI definitions are missing jobs for AlmaLinux and Debian, both of which exist in GitHub Workflows. Plug this gap. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- .gitlab-ci.yml | 6 ++++++ ci/lib.sh | 2 +- 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml index c4eec6e7651300..2b9ed44eaf24c4 100644 --- a/.gitlab-ci.yml +++ b/.gitlab-ci.yml @@ -68,6 +68,12 @@ test:linux: # Supported until 2025-04-02. - jobname: linux32 image: i386/ubuntu:20.04 + # A RHEL 8 compatible distro. Supported until 2029-05-31. + - jobname: almalinux-8 + image: almalinux:8 + # Supported until 2026-08-31. 
+ - jobname: debian-11 + image: debian:11 artifacts: paths: - t/failed-test-artifacts diff --git a/ci/lib.sh b/ci/lib.sh index 6e3799cfc3ccd5..b939110a6eefcf 100755 --- a/ci/lib.sh +++ b/ci/lib.sh @@ -254,7 +254,7 @@ then CI_OS_NAME=osx JOBS=$(nproc) ;; - *,alpine:*|*,fedora:*|*,ubuntu:*|*,i386/ubuntu:*) + *,almalinux:*|*,alpine:*|*,debian:*|*,fedora:*|*,ubuntu:*|*,i386/ubuntu:*) CI_OS_NAME=linux JOBS=$(nproc) ;; From 43a6a005c8970f13c46202e821111f9e42538b9d Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Thu, 4 Jun 2026 12:07:33 +0200 Subject: [PATCH 087/107] ci: unify Linux images across GitLab and GitHub The image for the "linux-breaking-changes" job has drifted apart across GitHub and GitLab. Adapt it to use "ubuntu:rolling" on both systems. With this change there's only one difference remaining: GitHub uses "ubuntu:focal" for the "linux32" job while GitLab uses "ubuntu:20.04". These are different names for the same image, so there is no actual difference here. Adjust GitHub to use the "20.04" tag -- this matches all the other jobs which use version numbers, and you don't have to learn Ubuntu's release names by heart. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- .github/workflows/main.yml | 2 +- .gitlab-ci.yml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index 3da5326f0ba90a..cf341d74dbff21 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -407,7 +407,7 @@ jobs: image: alpine:latest # Supported until 2025-04-02. - jobname: linux32 - image: i386/ubuntu:focal + image: i386/ubuntu:20.04 # A RHEL 8 compatible distro. Supported until 2029-05-31. 
- jobname: almalinux-8 image: almalinux:8 diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml index 2b9ed44eaf24c4..ef1c723355d153 100644 --- a/.gitlab-ci.yml +++ b/.gitlab-ci.yml @@ -47,7 +47,7 @@ test:linux: CC: gcc CC_PACKAGE: gcc-8 - jobname: linux-breaking-changes - image: ubuntu:20.04 + image: ubuntu:rolling CC: gcc - jobname: fedora-breaking-changes-meson image: fedora:latest From bf3ed750cb4479db9e8193ae1937b2723054ce48 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Thu, 4 Jun 2026 12:07:34 +0200 Subject: [PATCH 088/107] t7527: fix broken TAP output Before running the tests in t7527 we first verify whether the fsmonitor even works, which seems to depend on the actual filesystem that is in use. The verification executes outside of any prerequisite or test body, so its stdout/stderr is not being redirected. The consequence of this is that any command that prints to stdout/stderr may break the TAP specification by printing invalid lines. And in fact we already do that, as git-init(1) prints the path to the created Git repository by default. Fix this issue by moving the logic into a lazy prerequisite. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- t/t7527-builtin-fsmonitor.sh | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh index b63c162f9bac3f..d881e27466c18e 100755 --- a/t/t7527-builtin-fsmonitor.sh +++ b/t/t7527-builtin-fsmonitor.sh @@ -25,7 +25,8 @@ maybe_timeout () { "$@" fi } -verify_fsmonitor_works () { + +test_lazy_prereq FSMONITOR_WORKS ' git init test_fsmonitor_smoke || return 1 GIT_TRACE_FSMONITOR="$PWD/smoke.trace" && @@ -50,9 +51,9 @@ verify_fsmonitor_works () { ret=$? rm -rf test_fsmonitor_smoke smoke.trace return $ret -} +' -if ! verify_fsmonitor_works +if ! 
test_have_prereq FSMONITOR_WORKS then skip_all="filesystem does not deliver fsmonitor events (container/overlayfs?)" test_done From b1688db759de18a8403945090688b8cc25ba26dd Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Thu, 4 Jun 2026 12:07:35 +0200 Subject: [PATCH 089/107] t7810: turn MB_REGEX check into a lazy prereq In t7810 we verify whether the system has proper multibyte locale support by executing `test-tool regex` with a unicode character. When this check fails though we'll output an error that breaks the TAP format. Fix this issue by turning the logic into a lazy prerequisite. Reported-by: Jeff King Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- t/t7810-grep.sh | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index 1b195bee599a37..d61c4a4d73c390 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -18,8 +18,9 @@ test_invalid_grep_expression() { ' } -LC_ALL=en_US.UTF-8 test-tool regex '^.$' '¿' && - test_set_prereq MB_REGEX +test_lazy_prereq MB_REGEX ' + LC_ALL=en_US.UTF-8 test-tool regex "^.$" "¿" +' cat >hello.c < From d11968661e641ea81f4c1938ae9f73a54107dc62 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Thu, 4 Jun 2026 12:07:36 +0200 Subject: [PATCH 090/107] t/test-lib: silence EBUSY errors on Windows during test cleanup When tests have finished we clean up the trash directory via `rm -rf`. On Windows this can fail with EBUSY in cases where a process still holds some of the files open, for example when we have spawned a daemonized process that wasn't properly terminated. We thus retry several times, but every failure will result in error messages being printed, and that in turn breaks the TAP output format. One such case where this is causing issues is in t921x, which contains tests related to Scalar. Some tests spawn the fsmonitor daemon, and we never properly terminate it. 
The obvious fix would be to ensure that we never leak any processes, but that gets ugly fast. Instead, let's work around the issue by silencing error messages printed by the `rm -rf` calls. We already know to print an error when the retry loop fails, so we don't lose much. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- t/test-lib.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/t/test-lib.sh b/t/test-lib.sh index 4a7357b547e77e..d1d24c4124fd1d 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -1299,10 +1299,10 @@ test_done () { error "Tests passed but trash directory already removed before test cleanup; aborting" cd "$TRASH_DIRECTORY/.." && - rm -fr "$TRASH_DIRECTORY" || { + rm -fr "$TRASH_DIRECTORY" 2>/dev/null || { # try again in a bit sleep 5; - rm -fr "$TRASH_DIRECTORY" + rm -fr "$TRASH_DIRECTORY" 2>/dev/null } || error "Tests passed but test cleanup failed; aborting" fi From c2d2d173ae6ba4b354a36b3ba732c8a11379d6ec Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Thu, 4 Jun 2026 12:07:37 +0200 Subject: [PATCH 091/107] t/lib-git-p4: silence output when killing p4d and its watchdog When stopping the p4d watchdog process via "kill -9", the shell may print a job-control notification like: ./test-lib.sh: line 1269: 57960 Killed: 9 while true; do if test $nr_tries_left -eq 0; then kill -9 $p4d_pid; exit 1; fi; sleep 1; nr_tries_left=$(($nr_tries_left - 1)); done 2> /dev/null 4>&2 (wd: ~) This message is printed asynchronously by the shell when it reaps the process. While harmless right now, this will cause breakage once we enable strict parsing of the TAP protocol in a subsequent commit. Fix this by using `wait` so that we can synchronously reap the watchdog process and swallow the diagnostic. While at it, deduplicate the logic we have in `stop_p4d_and_watchdog ()` and `stop_and_cleanup_p4d ()`.
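At the system-call level, "reaping" a child means collecting its termination status with waitpid(2). The C sketch below is only an illustration of that idea and does not come from this series: `spawn_watchdog()` and `kill_and_reap()` are hypothetical names. It kills a stand-in watchdog child with SIGKILL and then waits for it synchronously, so the caller collects the status right away instead of leaving it to be reported later.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the watchdog: idle until killed. */
static pid_t spawn_watchdog(void)
{
	pid_t pid = fork();

	if (pid == 0) {
		for (;;)
			pause();
	}
	return pid; /* -1 if fork() failed */
}

/*
 * Hypothetical helper: SIGKILL `pid`, then block until it has been reaped.
 * Returns the terminating signal, or -1 on error.
 */
static int kill_and_reap(pid_t pid)
{
	int status;

	/* Never pass 0 or -1 to kill(); those target whole process groups. */
	if (pid <= 0)
		return -1;
	if (kill(pid, SIGKILL) < 0)
		return -1;
	/* Synchronous reap (EINTR retry omitted for brevity). */
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

After `kill_and_reap()` returns, the child no longer exists as a zombie. A second `waitpid()` on the same pid therefore fails, because nothing remains to be reported asynchronously.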
Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- t/lib-git-p4.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/t/lib-git-p4.sh b/t/lib-git-p4.sh index d22e9c684a495a..910886818768f0 100644 --- a/t/lib-git-p4.sh +++ b/t/lib-git-p4.sh @@ -65,6 +65,7 @@ pidfile="$TRASH_DIRECTORY/p4d.pid" stop_p4d_and_watchdog () { kill -9 $p4d_pid $watchdog_pid + wait $p4d_pid $watchdog_pid 2>/dev/null } # git p4 submit generates a temp file, which will @@ -174,8 +175,7 @@ retry_until_success () { } stop_and_cleanup_p4d () { - kill -9 $p4d_pid $watchdog_pid - wait $p4d_pid + stop_p4d_and_watchdog rm -rf "$db" "$cli" "$pidfile" } From 389c83025dbde15d30d0791281133bf30e45078d Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Thu, 4 Jun 2026 12:07:38 +0200 Subject: [PATCH 092/107] t: let prove fail when parsing invalid TAP output To make the result of our tests accessible we use the TAP protocol. This protocol is parsed by either prove or by Meson. Unfortunately, these two tools differ when it comes to their strictness when parsing the protocol: - Prove by default happily accepts lines not specified by the protocol. - Meson will also accept such lines, but prints a big and ugly warning message. We have fixed our test suite in the past to not print invalid TAP lines anymore via b1dc2e796e (Merge branch 'ps/meson-tap-parse', 2025-06-17). But as none of our tools perform a strict check it's still possible for broken tests to sneak back in, like for example in 362f69547f (Merge branch 'ps/t1006-tap-fix', 2025-07-16). This doesn't hurt at all when using prove, but it's quite annoying when using Meson due to the generated warnings. Unfortunately, there doesn't seem to be a portable way to make all tools complain about violations of the TAP format. The TAP 14 specification has added pragmas to the protocol that would allow us to say `pragma +strict`, and the effect of that would be to treat invalid TAP lines as a test failure. 
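As an illustration, here is a minimal C sketch of a TAP producer that begins its stream with the same two header lines this patch makes test-lib.sh print when `HARNESS_ACTIVE` is set ("TAP version 13" and "pragma +strict"), followed by one result line per test and a trailing plan. The function `emit_tap()` and the test names are hypothetical. Under strict parsing, any additional line that is not valid TAP, such as stray output from git-init(1), would be treated as an error.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical TAP producer: strict-mode header, "ok"/"not ok" lines,
 * then the "1..N" plan. Returns the number of bytes written, or -1 if
 * `buf` is too small.
 */
static int emit_tap(char *buf, size_t len, const char **names,
		    const int *passed, int n)
{
	size_t used;
	int i, ret;

	ret = snprintf(buf, len, "TAP version 13\npragma +strict\n");
	if (ret < 0 || (size_t)ret >= len)
		return -1;
	used = (size_t)ret;

	for (i = 0; i < n; i++) {
		ret = snprintf(buf + used, len - used, "%s %d - %s\n",
			       passed[i] ? "ok" : "not ok", i + 1, names[i]);
		if (ret < 0 || (size_t)ret >= len - used)
			return -1;
		used += (size_t)ret;
	}

	/* Trailing plan, as test-lib.sh emits it at the end of a run. */
	ret = snprintf(buf + used, len - used, "1..%d\n", n);
	if (ret < 0 || (size_t)ret >= len - used)
		return -1;
	used += (size_t)ret;
	return (int)used;
}
```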
But the release of TAP 14 is still rather recent, and Test-Harness, for example, only gained support for it in version 3.48, which was released in 2023. In fact, though, this pragma had already been introduced as an unofficial extension of the TAP protocol with Test-Harness 3.10, released in 2008. So while not all tools understand the pragma, at least prove has understood it for a long time. Unconditionally enable the pragma when using prove so that we'll detect tests that emit broken TAP output right away. This would have detected the issues fixed in preceding commits: $ prove t7527-builtin-fsmonitor.sh t7527-builtin-fsmonitor.sh .. All 69 subtests passed (less 6 skipped subtests: 63 okay) Test Summary Report ------------------- t7527-builtin-fsmonitor.sh (Wstat: 0 Tests: 69 Failed: 0) Parse errors: Unknown TAP token: "Initialized empty Git repository in /tmp/git/test_fsmonitor_smoke/.git/" Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- t/test-lib.sh | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/t/test-lib.sh b/t/test-lib.sh index d1d24c4124fd1d..ceefb99bff60e0 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -1532,6 +1532,12 @@ then BAIL_OUT 'You need to build test-tool; Run "make t/helper/test-tool" in the source (toplevel) directory' fi +if test -n "$HARNESS_ACTIVE" +then + say "TAP version 13" + say "pragma +strict" +fi + # Are we running this test at all? remove_trash= this_test=${0##*/} From 027e3b3d38fa7989b17bdf60501d5f1617141688 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Thu, 4 Jun 2026 09:46:25 +0200 Subject: [PATCH 093/107] t0001: plug test gaps for git-init(1) with GIT_OBJECT_DIRECTORY In subsequent commits we'll rework how we set up the repository. This is a somewhat intricate and thus fragile sequence; there are many things that can go subtly wrong, and there are lots of interesting interactions that one can discover.
One such discovered edge case was the interaction between git-init(1) and the "GIT_OBJECT_DIRECTORY" environment variable. When set, the behaviour is that the object directory should be created at the path that the variable points to. This behaviour is documented as such in its man page: If the object storage directory is specified via the GIT_OBJECT_DIRECTORY environment variable then the sha1 directories are created underneath; otherwise, the default $GIT_DIR/objects directory is used. Curiously enough though we don't seem to have any tests that exercise this directly, and thus a subsequent commit inadvertently would have broken this expectation. Plug this test gap. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- t/t0001-init.sh | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/t/t0001-init.sh b/t/t0001-init.sh index e4d32bb4d259f6..e89feca544d29c 100755 --- a/t/t0001-init.sh +++ b/t/t0001-init.sh @@ -980,4 +980,14 @@ test_expect_success 're-init reads matching includeIf.onbranch' ' test_cmp expect err ' +test_expect_success 'init honors GIT_OBJECT_DIRECTORY' ' + test_when_finished "rm -rf init-objdir custom-odb" && + mkdir custom-odb && + env GIT_OBJECT_DIRECTORY="$(pwd)/custom-odb" \ + git init init-objdir && + test_path_is_missing init-objdir/.git/objects/pack && + test_path_is_dir custom-odb/pack && + test_path_is_dir custom-odb/info +' + test_done From 452ad8db6d9155d6c7305d6045d29c49a7cc9c7c Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Thu, 4 Jun 2026 09:46:26 +0200 Subject: [PATCH 094/107] setup: drop `setup_git_env()` The `setup_git_env()` function is a trivial wrapper around `setup_git_env_internal()` and has a single call site only. Drop the function. While at it, drop stale documentation in "environment.h" that points to this function, even though it hasn't been exposed to callers outside of "setup.c" since 43ad1047a9 (setup: stop using `the_repository` in `setup_git_env()`, 2026-03-27) anymore. 
Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- environment.h | 8 +------- refs.c | 3 ++- setup.c | 7 +------ 3 files changed, 4 insertions(+), 14 deletions(-) diff --git a/environment.h b/environment.h index 9eb97b3869c9b1..ccfcf37bfb9b99 100644 --- a/environment.h +++ b/environment.h @@ -130,13 +130,6 @@ void repo_config_values_init(struct repo_config_values *cfg); * `the_repository`. We should eventually get rid of these and make the * dependency on a repository explicit: * - * - `setup_git_env()` ideally shouldn't exist as it modifies global state, - * namely the environment. The current process shouldn't ever access that - * state via envvars though, but should instead consult a `struct - * repository`. When spawning new processes, we would ideally also pass a - * `struct repository` and then set up the environment variables for the - * child process, only. - * * - `have_git_dir()` should not have to exist at all. Instead, we should * decide on whether or not we have a `struct repository`. * @@ -147,6 +140,7 @@ void repo_config_values_init(struct repo_config_values *cfg); * Please do not add new global config variables here. */ # ifdef USE_THE_REPOSITORY_VARIABLE + /* * Returns true iff we have a configured git repository (either via * setup_git_directory, or in the environment via $GIT_DIR). diff --git a/refs.c b/refs.c index 0f3355d2ee0be1..e7070eb7432db0 100644 --- a/refs.c +++ b/refs.c @@ -126,7 +126,8 @@ struct ref_namespace_info ref_namespace[] = { * points to the content of another. Unlike the other * ref namespaces, this one can be changed by the * GIT_REPLACE_REF_BASE environment variable. This - * .namespace value will be overwritten in setup_git_env(). + * .namespace value will be overwritten during repository + * setup. 
*/ .ref = "refs/replace/", .decoration = DECORATION_GRAFTED, diff --git a/setup.c b/setup.c index d723306dfe5256..252b4431172265 100644 --- a/setup.c +++ b/setup.c @@ -1074,11 +1074,6 @@ static void setup_git_env_internal(struct repository *repo, fetch_if_missing = 0; } -static void setup_git_env(struct repository *repo, const char *git_dir) -{ - setup_git_env_internal(repo, git_dir, false); -} - static void set_git_dir_1(struct repository *repo, const char *path, bool skip_initializing_odb) { xsetenv(GIT_DIR_ENVIRONMENT, path, 1); @@ -2023,7 +2018,7 @@ const char *setup_git_directory_gently(struct repository *repo, int *nongit_ok) const char *gitdir = getenv(GIT_DIR_ENVIRONMENT); if (!gitdir) gitdir = DEFAULT_GIT_DIR_ENVIRONMENT; - setup_git_env(repo, gitdir); + setup_git_env_internal(repo, gitdir, false); } if (startup_info->have_repository) { repo_set_hash_algo(repo, repo_fmt.hash_algo); From 3d884b0b5656fe012002edd6bb8f36a125e6c17e Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Thu, 4 Jun 2026 09:46:27 +0200 Subject: [PATCH 095/107] setup: deduplicate logic to apply repository format After having discovered the repository format, we then apply it to the repository so that it knows to use the proper repository extensions. The logic to apply the format is duplicated across three callsites, which makes it rather painful to add new extensions. Introduce a new function `apply_repository_format()` that takes a repo and applies a given format to it, and adapt all callsites to use it. This function is also the new caller of `verify_repository_format()` so that we can ensure that we never apply an invalid repository format. The verification we have in `read_and_verify_repository_format()` is thus now redundant and dropped. Rename `read_and_verify_repository_format()` accordingly. While at it, also rename `check_repository_format()` to clarify that it doesn't only _check_ the format, but that it also applies it.
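The "verify first, then apply" shape of `apply_repository_format()` can be sketched in isolation. This is a simplified, hypothetical model, not Git's code. The structs, the `verify_fmt()` version check, and `dup_or_null()` stand in for `struct repository_format`, `verify_repository_format()`, and `xstrdup_or_null()`. As with the new helper, the sketch copies the partial-clone string rather than taking ownership of it, so the caller can still free its own format unconditionally. It also leaves the repository untouched when verification fails.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for Git's structures. */
struct fmt {
	int version;
	int precious_objects;
	char *partial_clone;	/* owned by the fmt */
};

struct repo {
	int precious_objects;
	char *partial_clone;	/* owned by the repo */
};

/* Like xstrdup_or_null(), minus the die-on-OOM behaviour. */
static char *dup_or_null(const char *s)
{
	size_t len;
	char *copy;

	if (!s)
		return NULL;
	len = strlen(s) + 1;
	copy = malloc(len);
	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Stand-in for verify_repository_format(): only versions 0 and 1 are known. */
static int verify_fmt(const struct fmt *f, char *err, size_t errlen)
{
	if (f->version < 0 || f->version > 1) {
		snprintf(err, errlen, "unknown repository format version %d",
			 f->version);
		return -1;
	}
	return 0;
}

/*
 * Verify first and modify the (freshly initialized) repo only once the
 * format is known to be valid.
 */
static int apply_fmt(struct repo *r, const struct fmt *f,
		     char *err, size_t errlen)
{
	if (verify_fmt(f, err, errlen) < 0)
		return -1;
	r->precious_objects = f->precious_objects;
	r->partial_clone = dup_or_null(f->partial_clone);
	return 0;
}
```

With this split, every call site follows the same pattern: read the format, apply it, report the error on failure, and release the format either way.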
Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- repository.c | 31 +++++++----------- setup.c | 93 ++++++++++++++++++++++++++++------------------------ setup.h | 10 ++++++ 3 files changed, 71 insertions(+), 63 deletions(-) diff --git a/repository.c b/repository.c index db57b8308b94e7..58a13f7c4f5d85 100644 --- a/repository.c +++ b/repository.c @@ -262,8 +262,8 @@ void repo_set_worktree(struct repository *repo, const char *path) trace2_def_repo(repo); } -static int read_and_verify_repository_format(struct repository_format *format, - const char *commondir) +static int read_repository_format_from_commondir(struct repository_format *format, + const char *commondir) { int ret = 0; struct strbuf sb = STRBUF_INIT; @@ -272,11 +272,6 @@ static int read_and_verify_repository_format(struct repository_format *format, read_repository_format(format, sb.buf); strbuf_reset(&sb); - if (verify_repository_format(format, &sb) < 0) { - warning("%s", sb.buf); - ret = -1; - } - strbuf_release(&sb); return ret; } @@ -290,6 +285,8 @@ int repo_init(struct repository *repo, const char *worktree) { struct repository_format format = REPOSITORY_FORMAT_INIT; + struct strbuf err = STRBUF_INIT; + memset(repo, 0, sizeof(*repo)); initialize_repository(repo); @@ -297,21 +294,13 @@ int repo_init(struct repository *repo, if (repo_init_gitdir(repo, gitdir)) goto error; - if (read_and_verify_repository_format(&format, repo->commondir)) + if (read_repository_format_from_commondir(&format, repo->commondir)) goto error; - repo_set_hash_algo(repo, format.hash_algo); - repo_set_compat_hash_algo(repo, format.compat_hash_algo); - repo_set_ref_storage_format(repo, format.ref_storage_format, - format.ref_storage_payload); - repo->repository_format_worktree_config = format.worktree_config; - repo->repository_format_relative_worktrees = format.relative_worktrees; - repo->repository_format_precious_objects = format.precious_objects; - repo->repository_format_submodule_path_cfg = 
format.submodule_path_cfg; - - /* take ownership of format.partial_clone */ - repo->repository_format_partial_clone = format.partial_clone; - format.partial_clone = NULL; + if (apply_repository_format(repo, &format, &err) < 0) { + warning("%s", err.buf); + goto error; + } if (worktree) repo_set_worktree(repo, worktree); @@ -320,10 +309,12 @@ int repo_init(struct repository *repo, repo_read_loose_object_map(repo); clear_repository_format(&format); + strbuf_release(&err); return 0; error: clear_repository_format(&format); + strbuf_release(&err); repo_clear(repo); return -1; } diff --git a/setup.c b/setup.c index 252b4431172265..c5015923f159a9 100644 --- a/setup.c +++ b/setup.c @@ -750,8 +750,7 @@ static int check_repo_format(const char *var, const char *value, return read_worktree_config(var, value, ctx, vdata); } -static int check_repository_format_gently(struct repository *repo, - const char *gitdir, +static int check_repository_format_gently(const char *gitdir, struct repository_format *candidate, int *nongit_ok) { @@ -765,7 +764,7 @@ static int check_repository_format_gently(struct repository *repo, strbuf_release(&sb); /* - * For historical use of check_repository_format() in git-init, + * For historical use of check_and_apply_repository_format() in git-init, * we treat a missing config as a silent "ok", even when nongit_ok * is unset. 
*/ @@ -782,8 +781,6 @@ static int check_repository_format_gently(struct repository *repo, die("%s", err.buf); } - repo->repository_format_precious_objects = candidate->precious_objects; - string_list_clear(&candidate->unknown_extensions, 0); string_list_clear(&candidate->v1_only_extensions, 0); @@ -1140,7 +1137,7 @@ static const char *setup_explicit_git_dir(struct repository *repo, die(_("not a git repository: '%s'"), gitdirenv); } - if (check_repository_format_gently(repo, gitdirenv, repo_fmt, nongit_ok)) { + if (check_repository_format_gently(gitdirenv, repo_fmt, nongit_ok)) { free(gitfile); return NULL; } @@ -1217,7 +1214,7 @@ static const char *setup_discovered_git_dir(struct repository *repo, struct repository_format *repo_fmt, int *nongit_ok) { - if (check_repository_format_gently(repo, gitdir, repo_fmt, nongit_ok)) + if (check_repository_format_gently(gitdir, repo_fmt, nongit_ok)) return NULL; /* --work-tree is set without --git-dir; use discovered one */ @@ -1265,7 +1262,7 @@ static const char *setup_bare_git_dir(struct repository *repo, { int root_len; - if (check_repository_format_gently(repo, ".", repo_fmt, nongit_ok)) + if (check_repository_format_gently(".", repo_fmt, nongit_ok)) return NULL; setenv(GIT_IMPLICIT_WORK_TREE_ENVIRONMENT, "0", 1); @@ -1757,6 +1754,32 @@ enum discovery_result discover_git_directory_reason(struct strbuf *commondir, return result; } +int apply_repository_format(struct repository *repo, + const struct repository_format *format, + struct strbuf *err) +{ + if (verify_repository_format(format, err) < 0) + return -1; + + repo_set_hash_algo(repo, format->hash_algo); + repo_set_compat_hash_algo(repo, format->compat_hash_algo); + repo_set_ref_storage_format(repo, + format->ref_storage_format, + format->ref_storage_payload); + repo->repository_format_worktree_config = + format->worktree_config; + repo->repository_format_submodule_path_cfg = + format->submodule_path_cfg; + repo->repository_format_relative_worktrees = + 
format->relative_worktrees; + repo->repository_format_partial_clone = + xstrdup_or_null(format->partial_clone); + repo->repository_format_precious_objects = + format->precious_objects; + + return 0; +} + /* * Check the repository format version in the path found in repo_get_git_dir(repo), * and die if it is a version we don't understand. Generally one would @@ -1765,26 +1788,20 @@ enum discovery_result discover_git_directory_reason(struct strbuf *commondir, * * If successful and fmt is not NULL, fill fmt with data. */ -static void check_repository_format(struct repository *repo, struct repository_format *fmt) +static void check_and_apply_repository_format(struct repository *repo, + struct repository_format *fmt) { struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT; + struct strbuf err = STRBUF_INIT; + if (!fmt) fmt = &repo_fmt; - check_repository_format_gently(repo, repo_get_git_dir(repo), fmt, NULL); + + check_repository_format_gently(repo_get_git_dir(repo), fmt, NULL); + if (apply_repository_format(repo, fmt, &err) < 0) + die("%s", err.buf); startup_info->have_repository = 1; - repo_set_hash_algo(repo, fmt->hash_algo); - repo_set_compat_hash_algo(repo, fmt->compat_hash_algo); - repo_set_ref_storage_format(repo, - fmt->ref_storage_format, - fmt->ref_storage_payload); - repo->repository_format_worktree_config = - fmt->worktree_config; - repo->repository_format_submodule_path_cfg = - fmt->submodule_path_cfg; - repo->repository_format_relative_worktrees = - fmt->relative_worktrees; - repo->repository_format_partial_clone = - xstrdup_or_null(fmt->partial_clone); + clear_repository_format(&repo_fmt); } @@ -1862,7 +1879,7 @@ const char *enter_repo(struct repository *repo, const char *path, unsigned flags if (is_git_directory(".")) { set_git_dir(repo, ".", 0); - check_repository_format(repo, NULL); + check_and_apply_repository_format(repo, NULL); return path; } @@ -2020,25 +2037,15 @@ const char *setup_git_directory_gently(struct repository *repo, int *nongit_ok) 
gitdir = DEFAULT_GIT_DIR_ENVIRONMENT; setup_git_env_internal(repo, gitdir, false); } + if (startup_info->have_repository) { - repo_set_hash_algo(repo, repo_fmt.hash_algo); - repo_set_compat_hash_algo(repo, - repo_fmt.compat_hash_algo); - repo_set_ref_storage_format(repo, - repo_fmt.ref_storage_format, - repo_fmt.ref_storage_payload); - repo->repository_format_worktree_config = - repo_fmt.worktree_config; - repo->repository_format_relative_worktrees = - repo_fmt.relative_worktrees; - repo->repository_format_submodule_path_cfg = - repo_fmt.submodule_path_cfg; - /* take ownership of repo_fmt.partial_clone */ - repo->repository_format_partial_clone = - repo_fmt.partial_clone; - repo_fmt.partial_clone = NULL; - repo->repository_format_precious_objects = - repo_fmt.precious_objects; + struct strbuf err = STRBUF_INIT; + + if (apply_repository_format(repo, &repo_fmt, &err) < 0) + die("%s", err.buf); + + clear_repository_format(&repo_fmt); + strbuf_release(&err); } } /* @@ -2814,7 +2821,7 @@ int init_db(struct repository *repo, * config file, so this will not fail. What we are catching * is an attempt to reinitialize new repository with an old tool. */ - check_repository_format(repo, &repo_fmt); + check_and_apply_repository_format(repo, &repo_fmt); repository_format_configure(repo, &repo_fmt, hash, ref_storage_format); diff --git a/setup.h b/setup.h index 9409326fe47c70..efbb82fdbfc80b 100644 --- a/setup.h +++ b/setup.h @@ -221,6 +221,16 @@ void clear_repository_format(struct repository_format *format); int verify_repository_format(const struct repository_format *format, struct strbuf *err); +/* + * Apply the given repository format to the repo. This initializes extensions + * and basic data structures required for normal operation. Returns 0 on + * success, a negative error code when the format is not valid as determined by + * `verify_repository_format()`. 
+ */ +int apply_repository_format(struct repository *repo, + const struct repository_format *format, + struct strbuf *err); + const char *get_template_dir(const char *option_template); #define INIT_DB_QUIET (1 << 0) From 6a2fbab4c95b0fc317514ec7ead618b3b37e3553 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Thu, 4 Jun 2026 09:46:28 +0200 Subject: [PATCH 096/107] repository: stop initializing the object database in `repo_set_gitdir()` The function `repo_set_gitdir()` obviously sets the Git directory for a given repository. Less obviously though, the function also configures a couple of auxiliary settings. One such thing is that we create the object database in this function. This logic only happens conditionally though, as `set_git_dir()` may be called multiple times during repository setup, and we don't want to create the object database multiple times. This is somewhat tangled and hard to follow. Remove the logic from `repo_set_gitdir()` and instead initialize the object database outside of it. This leads to some duplication right now, but that duplication will be removed in a subsequent step where we will start initializing the object database as part of applying the repo's format. 
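The split of responsibilities described above can be sketched with a small C model. It is only a sketch under stated assumptions: `toy_repo`, `toy_odb`, `toy_set_gitdir()` and `toy_odb_new()` are hypothetical, simplified stand-ins rather than Git's real types. Setting the Git directory becomes a plain setter that may run several times, and the caller creates the object database once. This is why `repo_set_gitdir()` itself no longer needs a flag to suppress re-initialization.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not Git's real types or functions. */
struct toy_odb {
	char *object_dir; /* NULL: derive the location from the Git directory */
};

struct toy_repo {
	char *gitdir;
	struct toy_odb *objects;
};

/*
 * Only records the Git directory and never touches the object database,
 * so it is safe to call repeatedly (e.g. when a relative path is updated).
 */
static void toy_set_gitdir(struct toy_repo *r, const char *gitdir)
{
	char *old = r->gitdir;

	r->gitdir = strdup(gitdir); /* copy first in case gitdir aliases old */
	free(old);
}

/* Called once by the caller, after the Git directory is known. */
static struct toy_odb *toy_odb_new(const char *object_dir)
{
	struct toy_odb *odb = calloc(1, sizeof(*odb));

	odb->object_dir = object_dir ? strdup(object_dir) : NULL;
	return odb;
}

static void toy_repo_clear(struct toy_repo *r)
{
	if (r->objects)
		free(r->objects->object_dir);
	free(r->objects);
	free(r->gitdir);
	r->objects = NULL;
	r->gitdir = NULL;
}
```

In this model a caller pairs the two steps explicitly, `toy_set_gitdir(&repo, ".git"); repo.objects = toy_odb_new(NULL);`, and any later `toy_set_gitdir()` call leaves `repo.objects` untouched.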
Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- repository.c | 8 ++------ repository.h | 3 --- setup.c | 7 ++++--- 3 files changed, 6 insertions(+), 12 deletions(-) diff --git a/repository.c b/repository.c index 58a13f7c4f5d85..2c2395105fcf2d 100644 --- a/repository.c +++ b/repository.c @@ -181,12 +181,6 @@ void repo_set_gitdir(struct repository *repo, free(old_gitdir); repo_set_commondir(repo, o->commondir); - - if (!repo->objects) - repo->objects = odb_new(repo, o->object_dir, o->alternate_db); - else if (!o->skip_initializing_odb) - BUG("cannot reinitialize an already-initialized object directory"); - repo->disable_ref_updates = o->disable_ref_updates; expand_base_dir(&repo->graft_file, o->graft_file, @@ -302,6 +296,8 @@ int repo_init(struct repository *repo, goto error; } + repo->objects = odb_new(repo, NULL, NULL); + if (worktree) repo_set_worktree(repo, worktree); diff --git a/repository.h b/repository.h index c3ec0f4b790b00..36e2db26332c0e 100644 --- a/repository.h +++ b/repository.h @@ -221,12 +221,9 @@ const char *repo_get_work_tree(struct repository *repo); */ struct set_gitdir_args { const char *commondir; - const char *object_dir; const char *graft_file; const char *index_file; - const char *alternate_db; bool disable_ref_updates; - bool skip_initializing_odb; }; void repo_set_gitdir(struct repository *repo, const char *root, diff --git a/setup.c b/setup.c index c5015923f159a9..3bd3f6c5924ef3 100644 --- a/setup.c +++ b/setup.c @@ -1045,17 +1045,18 @@ static void setup_git_env_internal(struct repository *repo, struct strvec to_free = STRVEC_INIT; args.commondir = getenv_safe(&to_free, GIT_COMMON_DIR_ENVIRONMENT); - args.object_dir = getenv_safe(&to_free, DB_ENVIRONMENT); args.graft_file = getenv_safe(&to_free, GRAFT_ENVIRONMENT); args.index_file = getenv_safe(&to_free, INDEX_ENVIRONMENT); - args.alternate_db = getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT); if (getenv(GIT_QUARANTINE_ENVIRONMENT)) args.disable_ref_updates = true; - 
args.skip_initializing_odb = skip_initializing_odb; repo_set_gitdir(repo, git_dir, &args); strvec_clear(&to_free); + if (!skip_initializing_odb) + repo->objects = odb_new(repo, getenv_safe(&to_free, DB_ENVIRONMENT), + getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT)); + if (getenv(NO_REPLACE_OBJECTS_ENVIRONMENT)) disable_replace_refs(); replace_ref_base = getenv(GIT_REPLACE_REF_BASE_ENVIRONMENT); From aae4ebc895272dc7e5a9ccfc135878b55c7322d7 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Thu, 4 Jun 2026 09:46:29 +0200 Subject: [PATCH 097/107] setup: stop creating the object database in `setup_git_env()` In the preceding commit we have stopped creating the object database in `repo_set_gitdir()`. But the logic is still somewhat confusing as we still end up creating it conditionally in `setup_git_dir()`, which is called multiple times. Drop the conditional logic and instead create the object database in all places where we have discovered and configured a repository. This leads to even more duplication than we already had in the preceding commit, but an alert reader may notice that we now (almost) always call `odb_new()` directly before having called `apply_repository_format()`. The only exception to this is `setup_git_directory_gently()`, where we also call the function when _not_ applying the repository format. This will be fixed in the next commit, and once that's done we can then unify creation of the object database into `apply_repository_format()`. 
Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- setup.c | 37 ++++++++++++++++++++++++++----------- 1 file changed, 26 insertions(+), 11 deletions(-) diff --git a/setup.c b/setup.c index 3bd3f6c5924ef3..0dc9fe4565a182 100644 --- a/setup.c +++ b/setup.c @@ -1035,8 +1035,7 @@ const char *read_gitfile_gently(const char *path, int *return_error_code) } static void setup_git_env_internal(struct repository *repo, - const char *git_dir, - bool skip_initializing_odb) + const char *git_dir) { char *git_replace_ref_base; const char *shallow_file; @@ -1053,10 +1052,6 @@ static void setup_git_env_internal(struct repository *repo, repo_set_gitdir(repo, git_dir, &args); strvec_clear(&to_free); - if (!skip_initializing_odb) - repo->objects = odb_new(repo, getenv_safe(&to_free, DB_ENVIRONMENT), - getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT)); - if (getenv(NO_REPLACE_OBJECTS_ENVIRONMENT)) disable_replace_refs(); replace_ref_base = getenv(GIT_REPLACE_REF_BASE_ENVIRONMENT); @@ -1072,10 +1067,10 @@ static void setup_git_env_internal(struct repository *repo, fetch_if_missing = 0; } -static void set_git_dir_1(struct repository *repo, const char *path, bool skip_initializing_odb) +static void set_git_dir_1(struct repository *repo, const char *path) { xsetenv(GIT_DIR_ENVIRONMENT, path, 1); - setup_git_env_internal(repo, path, skip_initializing_odb); + setup_git_env_internal(repo, path); } static void update_relative_gitdir(const char *name UNUSED, @@ -1089,7 +1084,7 @@ static void update_relative_gitdir(const char *name UNUSED, trace_printf_key(&trace_setup_key, "setup: move $GIT_DIR to '%s'", path); - set_git_dir_1(repo, path, true); + set_git_dir_1(repo, path); free(path); } @@ -1102,7 +1097,7 @@ static void set_git_dir(struct repository *repo, const char *path, int make_real path = realpath.buf; } - set_git_dir_1(repo, path, false); + set_git_dir_1(repo, path); if (!is_absolute_path(path)) chdir_notify_register(NULL, update_relative_gitdir, repo); @@ -1879,8 
+1874,15 @@ const char *enter_repo(struct repository *repo, const char *path, unsigned flags } if (is_git_directory(".")) { + struct strvec to_free = STRVEC_INIT; + set_git_dir(repo, ".", 0); + repo->objects = odb_new(repo, + getenv_safe(&to_free, DB_ENVIRONMENT), + getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT)); check_and_apply_repository_format(repo, NULL); + + strvec_clear(&to_free); return path; } @@ -2032,13 +2034,19 @@ const char *setup_git_directory_gently(struct repository *repo, int *nongit_ok) startup_info->have_repository || /* GIT_DIR_EXPLICIT */ getenv(GIT_DIR_ENVIRONMENT)) { + struct strvec to_free = STRVEC_INIT; + if (!repo->gitdir) { const char *gitdir = getenv(GIT_DIR_ENVIRONMENT); if (!gitdir) gitdir = DEFAULT_GIT_DIR_ENVIRONMENT; - setup_git_env_internal(repo, gitdir, false); + setup_git_env_internal(repo, gitdir); } + repo->objects = odb_new(repo, + getenv_safe(&to_free, DB_ENVIRONMENT), + getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT)); + if (startup_info->have_repository) { struct strbuf err = STRBUF_INIT; @@ -2048,6 +2056,8 @@ const char *setup_git_directory_gently(struct repository *repo, int *nongit_ok) clear_repository_format(&repo_fmt); strbuf_release(&err); } + + strvec_clear(&to_free); } /* * Since precompose_string_if_needed() needs to look at @@ -2796,6 +2806,7 @@ int init_db(struct repository *repo, int exist_ok = flags & INIT_DB_EXIST_OK; char *original_git_dir = real_pathdup(git_dir, 1); struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT; + struct strvec to_free = STRVEC_INIT; if (real_git_dir) { struct stat st; @@ -2816,6 +2827,9 @@ int init_db(struct repository *repo, } startup_info->have_repository = 1; + repo->objects = odb_new(repo, getenv_safe(&to_free, DB_ENVIRONMENT), + getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT)); + /* * Check to see if the repository version is right. 
* Note that a newly created repository does not have @@ -2879,6 +2893,7 @@ int init_db(struct repository *repo, } clear_repository_format(&repo_fmt); + strvec_clear(&to_free); free(original_git_dir); return 0; } From d87de311ff506599ec130ba5f09a4f73e458a5ae Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Thu, 4 Jun 2026 09:46:30 +0200 Subject: [PATCH 098/107] setup: stop initializing object database without repository The function `setup_git_directory_gently()` is responsible for discovering and setting up a Git repository based on various environment variables and the current working directory. The result is thus a fully usable Git repository. One oddity of this function is that we may set up the object database even in the case where we don't have a repository, namely in the case where the `GIT_DIR` environment variable is set (the `GIT_DIR_EXPLICIT` case) but points to a non-existent repository. If so, we call `setup_git_env_internal()` with the value of the environment variable so that the repository's Git directory is configured, even if it points to a non-existent directory. Historically though, this function didn't only configure the repository, but also initialized the object database. We retained this behaviour from a preceding commit, even though it really doesn't make much sense in the first place -- there is no repository, so we don't have an object database either. There seemingly isn't much of a reason to construct the object database, as we typically won't try to read objects when we don't have a repository. There's one exception though: git-index-pack(1) may run outside of a repository, which can be used to perform consistency checks for a packfile. The code path is _almost_ working: we already know to call `parse_object_buffer()`, which can read objects without an object database being available.
And that works for all object types except for commits, because `parse_commit_buffer()` calls `parse_commit_graph()`, and that function doesn't handle the case where we don't have an object database. Fix this instance to check for the object database instead of checking for the Git directory having been initialized. With this fixed, we can now stop constructing an object database when there is no repository. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- commit-graph.c | 4 ++-- setup.c | 7 +++---- 2 files changed, 5 insertions(+), 6 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 9abe62bd5a278a..0820cf5fb83cbe 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -740,13 +740,13 @@ static struct commit_graph *prepare_commit_graph(struct repository *r) struct odb_source *source; /* - * Early return if there is no git dir or if the commit graph is + * Early return if there is no object database or if the commit graph is * disabled. * * This must come before the "already attempted?" check below, because * we want to disable even an already-loaded graph file.
*/ - if (!r->gitdir || r->commit_graph_disabled) + if (!r->objects || r->commit_graph_disabled) return NULL; if (r->objects->commit_graph_attempted) diff --git a/setup.c b/setup.c index 0dc9fe4565a182..4a8d6230b18529 100644 --- a/setup.c +++ b/setup.c @@ -2043,13 +2043,12 @@ const char *setup_git_directory_gently(struct repository *repo, int *nongit_ok) setup_git_env_internal(repo, gitdir); } - repo->objects = odb_new(repo, - getenv_safe(&to_free, DB_ENVIRONMENT), - getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT)); - if (startup_info->have_repository) { struct strbuf err = STRBUF_INIT; + repo->objects = odb_new(repo, + getenv_safe(&to_free, DB_ENVIRONMENT), + getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT)); if (apply_repository_format(repo, &repo_fmt, &err) < 0) die("%s", err.buf); From a84a9d4acdae51f58529b2596c4bd935fe9af372 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Thu, 4 Jun 2026 09:46:31 +0200 Subject: [PATCH 099/107] repository: stop reading loose object map twice on repo init When initializing a repository via `repo_init()` we end up reading the loose object map twice: - `apply_repository_format()` calls `repo_set_compat_hash_algo()`, which in turn calls `repo_read_loose_object_map()` if we have a compatibility hash configured. - `repo_init()` calls `repo_read_loose_object_map()` directly a second time. Drop the second read of the loose object map in `repo_init()`. 
Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- repository.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/repository.c b/repository.c index 2c2395105fcf2d..61dfbb8be6cd1d 100644 --- a/repository.c +++ b/repository.c @@ -301,9 +301,6 @@ int repo_init(struct repository *repo, if (worktree) repo_set_worktree(repo, worktree); - if (repo->compat_hash_algo) - repo_read_loose_object_map(repo); - clear_repository_format(&format); strbuf_release(&err); return 0; From 42b9d3dc9dfa9e733cbd6402e665ac35fce0c216 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Thu, 4 Jun 2026 09:46:32 +0200 Subject: [PATCH 100/107] setup: construct object database in `apply_repository_format()` With the preceding changes we now always construct the repository's object database before applying the repository format. Remove this duplication by constructing it in `apply_repository_format()` instead. Note that we create the object database _after_ having set up the repository's hash algorithm, but _before_ setting the compat hash algorithm. This is intentional: - Constructing the object database may require knowledge of its intended object format. - Setting up the compatibility hash requires the object database to be initialized already, because we immediately read the loose object map. The first point is sensible, the second maybe a little less so. Ideally, it should be the responsibility of the object database itself to initialize any data structures required for the compatibility hash. But this would require further changes, so this is kept as-is for now. Further note that this requires us to move handling of the environment variables GIT_OBJECT_DIRECTORY and GIT_ALTERNATE_OBJECT_DIRECTORIES into the repository format, as well. This allows the caller more flexibility around whether or not those environment variables are being honored, as we want to respect them in "setup.c", but not in "repository.c". 
Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- repository.c | 4 +--- setup.c | 45 +++++++++++++++++++++------------------------ setup.h | 10 ++++++++++ 3 files changed, 32 insertions(+), 27 deletions(-) diff --git a/repository.c b/repository.c index 61dfbb8be6cd1d..187dd471c4e607 100644 --- a/repository.c +++ b/repository.c @@ -291,13 +291,11 @@ int repo_init(struct repository *repo, if (read_repository_format_from_commondir(&format, repo->commondir)) goto error; - if (apply_repository_format(repo, &format, &err) < 0) { + if (apply_repository_format(repo, &format, 0, &err) < 0) { warning("%s", err.buf); goto error; } - repo->objects = odb_new(repo, NULL, NULL); - if (worktree) repo_set_worktree(repo, worktree); diff --git a/setup.c b/setup.c index 4a8d6230b18529..513fc88749212b 100644 --- a/setup.c +++ b/setup.c @@ -1752,12 +1752,22 @@ enum discovery_result discover_git_directory_reason(struct strbuf *commondir, int apply_repository_format(struct repository *repo, const struct repository_format *format, + enum apply_repository_format_flags flags, struct strbuf *err) { + char *object_directory = NULL, *alternate_object_directories = NULL; + if (verify_repository_format(format, err) < 0) return -1; + if (flags & APPLY_REPOSITORY_FORMAT_HONOR_ENV) { + object_directory = xstrdup_or_null(getenv(DB_ENVIRONMENT)); + alternate_object_directories = xstrdup_or_null(getenv(ALTERNATE_DB_ENVIRONMENT)); + } + repo_set_hash_algo(repo, format->hash_algo); + repo->objects = odb_new(repo, object_directory, + alternate_object_directories); repo_set_compat_hash_algo(repo, format->compat_hash_algo); repo_set_ref_storage_format(repo, format->ref_storage_format, @@ -1773,6 +1783,8 @@ int apply_repository_format(struct repository *repo, repo->repository_format_precious_objects = format->precious_objects; + free(alternate_object_directories); + free(object_directory); return 0; } @@ -1785,7 +1797,8 @@ int apply_repository_format(struct repository *repo, * If successful 
and fmt is not NULL, fill fmt with data. */ static void check_and_apply_repository_format(struct repository *repo, - struct repository_format *fmt) + struct repository_format *fmt, + enum apply_repository_format_flags flags) { struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT; struct strbuf err = STRBUF_INIT; @@ -1794,7 +1807,7 @@ static void check_and_apply_repository_format(struct repository *repo, fmt = &repo_fmt; check_repository_format_gently(repo_get_git_dir(repo), fmt, NULL); - if (apply_repository_format(repo, fmt, &err) < 0) + if (apply_repository_format(repo, fmt, flags, &err) < 0) die("%s", err.buf); startup_info->have_repository = 1; @@ -1874,15 +1887,9 @@ const char *enter_repo(struct repository *repo, const char *path, unsigned flags } if (is_git_directory(".")) { - struct strvec to_free = STRVEC_INIT; - set_git_dir(repo, ".", 0); - repo->objects = odb_new(repo, - getenv_safe(&to_free, DB_ENVIRONMENT), - getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT)); - check_and_apply_repository_format(repo, NULL); - - strvec_clear(&to_free); + check_and_apply_repository_format(repo, NULL, + APPLY_REPOSITORY_FORMAT_HONOR_ENV); return path; } @@ -2034,8 +2041,6 @@ const char *setup_git_directory_gently(struct repository *repo, int *nongit_ok) startup_info->have_repository || /* GIT_DIR_EXPLICIT */ getenv(GIT_DIR_ENVIRONMENT)) { - struct strvec to_free = STRVEC_INIT; - if (!repo->gitdir) { const char *gitdir = getenv(GIT_DIR_ENVIRONMENT); if (!gitdir) @@ -2046,17 +2051,13 @@ const char *setup_git_directory_gently(struct repository *repo, int *nongit_ok) if (startup_info->have_repository) { struct strbuf err = STRBUF_INIT; - repo->objects = odb_new(repo, - getenv_safe(&to_free, DB_ENVIRONMENT), - getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT)); - if (apply_repository_format(repo, &repo_fmt, &err) < 0) + if (apply_repository_format(repo, &repo_fmt, + APPLY_REPOSITORY_FORMAT_HONOR_ENV, &err) < 0) die("%s", err.buf); clear_repository_format(&repo_fmt); 
strbuf_release(&err); } - - strvec_clear(&to_free); } /* * Since precompose_string_if_needed() needs to look at @@ -2805,7 +2806,6 @@ int init_db(struct repository *repo, int exist_ok = flags & INIT_DB_EXIST_OK; char *original_git_dir = real_pathdup(git_dir, 1); struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT; - struct strvec to_free = STRVEC_INIT; if (real_git_dir) { struct stat st; @@ -2826,16 +2826,14 @@ int init_db(struct repository *repo, } startup_info->have_repository = 1; - repo->objects = odb_new(repo, getenv_safe(&to_free, DB_ENVIRONMENT), - getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT)); - /* * Check to see if the repository version is right. * Note that a newly created repository does not have * config file, so this will not fail. What we are catching * is an attempt to reinitialize new repository with an old tool. */ - check_and_apply_repository_format(repo, &repo_fmt); + check_and_apply_repository_format(repo, &repo_fmt, + APPLY_REPOSITORY_FORMAT_HONOR_ENV); repository_format_configure(repo, &repo_fmt, hash, ref_storage_format); @@ -2892,7 +2890,6 @@ int init_db(struct repository *repo, } clear_repository_format(&repo_fmt); - strvec_clear(&to_free); free(original_git_dir); return 0; } diff --git a/setup.h b/setup.h index efbb82fdbfc80b..19679fe78fb72f 100644 --- a/setup.h +++ b/setup.h @@ -221,6 +221,15 @@ void clear_repository_format(struct repository_format *format); int verify_repository_format(const struct repository_format *format, struct strbuf *err); +enum apply_repository_format_flags { + /* + * Honor environment variables when applying the repository format to + * the repository. For now, this only covers environment variables that + * relate to the object database. + */ + APPLY_REPOSITORY_FORMAT_HONOR_ENV = (1 << 0), +}; + /* * Apply the given repository format to the repo. This initializes extensions * and basic data structures required for normal operation. 
Returns 0 on @@ -229,6 +238,7 @@ int verify_repository_format(const struct repository_format *format, */ int apply_repository_format(struct repository *repo, const struct repository_format *format, + enum apply_repository_format_flags flags, struct strbuf *err); const char *get_template_dir(const char *option_template); From 4a1eb9304aae95ca52dff72a099e060dd6a1b8c9 Mon Sep 17 00:00:00 2001 From: Lucas Seiki Oshiro Date: Thu, 4 Jun 2026 13:34:42 -0300 Subject: [PATCH 101/107] Documentation: remove redundant 'instead' in --subject-prefix The documentation for --subject-prefix has two words "instead" in the same sentence, making it a little bit confusing to read. Change the order of the phrase to a more natural "Use [...] instead of [...]" structure. Signed-off-by: Lucas Seiki Oshiro Signed-off-by: Junio C Hamano --- Documentation/git-format-patch.adoc | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/Documentation/git-format-patch.adoc b/Documentation/git-format-patch.adoc index 5662382450289a..f7905c0f7c0322 100644 --- a/Documentation/git-format-patch.adoc +++ b/Documentation/git-format-patch.adoc @@ -221,10 +221,9 @@ populated with placeholder text. for generating the cover letter. --subject-prefix=:: - Instead of the standard '[PATCH]' prefix in the subject - line, instead use '[]'. This can be used - to name a patch series, and can be combined with the - `--numbered` option. + Use '[]' instead of the standard '[PATCH]' + prefix in the subject line. This can be used to name a patch + series, and can be combined with the `--numbered` option. 
+ The configuration variable `format.subjectPrefix` may also be used to configure a subject prefix to apply to a given repository for From c746f45476146c15791e714246651534619b74ae Mon Sep 17 00:00:00 2001 From: Kristoffer Haugsbakk Date: Fri, 5 Jun 2026 15:55:59 +0200 Subject: [PATCH 102/107] doc: link to config for git-replay(1) This config doc was added in 336ac90c (replay: add replay.refAction config option, 2025-11-06) but never included anywhere. Include it in git-replay(1) and git-config(1). Signed-off-by: Kristoffer Haugsbakk Signed-off-by: Junio C Hamano --- Documentation/config.adoc | 2 ++ Documentation/git-replay.adoc | 4 ++++ 2 files changed, 6 insertions(+) diff --git a/Documentation/config.adoc b/Documentation/config.adoc index dcea3c0c15e2a9..35fc7b4bf6ad31 100644 --- a/Documentation/config.adoc +++ b/Documentation/config.adoc @@ -511,6 +511,8 @@ include::config/remotes.adoc[] include::config/repack.adoc[] +include::config/replay.adoc[] + include::config/rerere.adoc[] include::config/revert.adoc[] diff --git a/Documentation/git-replay.adoc b/Documentation/git-replay.adoc index a32f72aead3750..f9ca2db2833a1d 100644 --- a/Documentation/git-replay.adoc +++ b/Documentation/git-replay.adoc @@ -209,6 +209,10 @@ This replays the range `aabbcc..ddeeff` onto commit `112233` and updates `refs/heads/mybranch` to point at the result. This can be useful when you want to use bare commit IDs instead of branch names. +CONFIGURATION +------------- +include::config/replay.adoc[] + GIT --- Part of the linkgit:git[1] suite From 2f169b5c22a641cad83b4be657e7265959d60dd8 Mon Sep 17 00:00:00 2001 From: Kristoffer Haugsbakk Date: Fri, 5 Jun 2026 15:56:00 +0200 Subject: [PATCH 103/107] doc: replay: improve config description MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit First of all, this unordered list for `replay.refAction` introduces a term with a colon. This is exactly what a description list is, structurally. 
Let’s be stylistically consistent and use the desc. list markup construct. Let’s also drop the harmless but unneeded indentation. We can reuse the `::` delimiter since we use an open block. But for consistency use the typical nested description list delimiter, namely `;;`. Second, let’s replace the inline-verbatim `git replay` with a link to git-replay(1), since we are naming the command. But make that conditional so that we avoid a self-link inside git-replay(1).[1] † 1: See e.g. e7b3a768 (doc: git-init: rework config item init.templateDir, 2024-03-10) for another example of avoiding self-linking Signed-off-by: Kristoffer Haugsbakk Signed-off-by: Junio C Hamano --- Documentation/config/replay.adoc | 16 ++++++++++------ Documentation/git-replay.adoc | 1 + 2 files changed, 11 insertions(+), 6 deletions(-) diff --git a/Documentation/config/replay.adoc b/Documentation/config/replay.adoc index 7d549d2f0e5195..7328da9537dc64 100644 --- a/Documentation/config/replay.adoc +++ b/Documentation/config/replay.adoc @@ -1,11 +1,15 @@ replay.refAction:: - Specifies the default mode for handling reference updates in - `git replay`. The value can be: + Specifies the default mode for handling reference updates. + The value can be: + -- - * `update`: Update refs directly using an atomic transaction (default behavior). - * `print`: Output update-ref commands for pipeline use. +`update`;; Update refs directly using an atomic transaction (default behavior). +`print`;; Output update-ref commands for pipeline use. -- + -This setting can be overridden with the `--ref-action` command-line option. -When not configured, `git replay` defaults to `update` mode. +ifdef::git-replay[] +See `--ref-action`. +endif::git-replay[] +ifndef::git-replay[] +See `--ref-action` for linkgit:git-replay[1] for details. 
+endif::git-replay[] diff --git a/Documentation/git-replay.adoc b/Documentation/git-replay.adoc index f9ca2db2833a1d..4de85088d6c4c9 100644 --- a/Documentation/git-replay.adoc +++ b/Documentation/git-replay.adoc @@ -211,6 +211,7 @@ to use bare commit IDs instead of branch names. CONFIGURATION ------------- +:git-replay: 1 include::config/replay.adoc[] GIT From 6fa17cca7c72df1c717027887749e8fb73338339 Mon Sep 17 00:00:00 2001 From: Kristoffer Haugsbakk Date: Fri, 5 Jun 2026 15:56:01 +0200 Subject: [PATCH 104/107] doc: replay: use a nested description list MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This bullet list for `--ref-action` introduces a term with a colon. This is exactly what a description list is, structurally. Let’s be stylistically consistent and use the desc. list markup construct. In short, just transform this unordered list in the same way that we did for `replay.refAction` in the previous commit. Signed-off-by: Kristoffer Haugsbakk Signed-off-by: Junio C Hamano --- Documentation/git-replay.adoc | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/Documentation/git-replay.adoc b/Documentation/git-replay.adoc index 4de85088d6c4c9..b4fe43ec687859 100644 --- a/Documentation/git-replay.adoc +++ b/Documentation/git-replay.adoc @@ -80,10 +80,10 @@ incompatible with `--contained` (which is a modifier for `--onto` only). Control how references are updated. The mode can be: + -- - * `update` (default): Update refs directly using an atomic transaction. - All refs are updated or none are (all-or-nothing behavior). - * `print`: Output update-ref commands for pipeline use. This is the - traditional behavior where output can be piped to `git update-ref --stdin`. +`update` (default);; Update refs directly using an atomic transaction. + All refs are updated or none are (all-or-nothing behavior). +`print`;; Output update-ref commands for pipeline use. 
This is the + traditional behavior where output can be piped to `git update-ref --stdin`. -- + The default mode can be configured via the `replay.refAction` configuration variable. From 60575c76a5943246fdc36a6ef036e0b6b85d4147 Mon Sep 17 00:00:00 2001 From: Kristoffer Haugsbakk Date: Fri, 5 Jun 2026 15:56:02 +0200 Subject: [PATCH 105/107] doc: replay: move “default” to the right-hand side MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This is now a description list (see previous commit) and parentheticals like this do not go on the left-hand side. Moving it to the other side makes it stand out just as much and is also more consistent with the rest of the documentation. Let’s also do the same for the `replay.refAction` description list. That makes the two desc. lists identical in the first sentence. Let’s add a comment about that for future editors. Signed-off-by: Kristoffer Haugsbakk Signed-off-by: Junio C Hamano --- Documentation/config/replay.adoc | 5 ++++- Documentation/git-replay.adoc | 5 ++++- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/Documentation/config/replay.adoc b/Documentation/config/replay.adoc index 7328da9537dc64..40d1695782affd 100644 --- a/Documentation/config/replay.adoc +++ b/Documentation/config/replay.adoc @@ -3,7 +3,10 @@ replay.refAction:: The value can be: + -- -`update`;; Update refs directly using an atomic transaction (default behavior). +//// +These use the first sentences from the description list in git-replay(1). +//// +`update`;; (default) Update refs directly using an atomic transaction. `print`;; Output update-ref commands for pipeline use.
-- + diff --git a/Documentation/git-replay.adoc b/Documentation/git-replay.adoc index b4fe43ec687859..ea4d14baddb6a9 100644 --- a/Documentation/git-replay.adoc +++ b/Documentation/git-replay.adoc @@ -80,7 +80,10 @@ incompatible with `--contained` (which is a modifier for `--onto` only). Control how references are updated. The mode can be: + -- -`update` (default);; Update refs directly using an atomic transaction. +//// +Expanded description list compared to 'replay.refAction'. +//// +`update`;; (default) Update refs directly using an atomic transaction. All refs are updated or none are (all-or-nothing behavior). `print`;; Output update-ref commands for pipeline use. This is the traditional behavior where output can be piped to `git update-ref --stdin`. From d1b72b29e993ece28ace1f7f5d587e959e26c65c Mon Sep 17 00:00:00 2001 From: Alexander Monakov Date: Fri, 5 Jun 2026 20:26:43 +0300 Subject: [PATCH 106/107] doc: fix typo in GIT_ALTERNATE_OBJECT_DIRECTORIES One file accidentally spelled GIT_ALTERNATE_OBJECT_DIRECTORIES with REPOSITORIES instead of DIRECTORIES. Fix the typo. Signed-off-by: Alexander Monakov Signed-off-by: Junio C Hamano --- Documentation/technical/hash-function-transition.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/technical/hash-function-transition.adoc b/Documentation/technical/hash-function-transition.adoc index 2359d7d106f842..241d2f763dd436 100644 --- a/Documentation/technical/hash-function-transition.adoc +++ b/Documentation/technical/hash-function-transition.adoc @@ -545,7 +545,7 @@ Alternates ~~~~~~~~~~ For the same reason, a SHA-256 repository cannot borrow objects from a SHA-1 repository using objects/info/alternates or -$GIT_ALTERNATE_OBJECT_REPOSITORIES. +$GIT_ALTERNATE_OBJECT_DIRECTORIES. git notes ~~~~~~~~~ From 45833cba14a50dd451b3949889e15bfba114bbcc Mon Sep 17 00:00:00 2001 From: Junio C Hamano Date: Mon, 8 Jun 2026 00:16:28 +0900 Subject: [PATCH 107/107] ### match next