Comparing changes
base repository: VectorArc/avp-python
base: main
head repository: VectorArc/avp-python
compare: cross_platform_research
- 14 commits
- 25 files changed
- 2 contributors
Commits on Mar 14, 2026
Add logit-guided decoding for cross-model communication
New approach: instead of compressing source model info into a single virtual token (rosetta), distribute the signal as an additive logit bias during target generation. The source model's vocabulary distribution is mapped through vocab overlap to the target vocabulary.

Implementation:
- New: rosetta/logit_guided.py: CrossModelLogitBias processor + bias computation
- Modified: huggingface.py: cross_model_method="logit_guided" option in generate()
- Modified: easy.py: pass through cross_model_method and logit_bias_alpha
- New: pipeline_logit_guided.py: benchmark pipeline for GSM8K 2-agent
- Modified: run_gsm8k_2agent.py: --mode logit_guided support
- Modified: shared/generation.py: logits_processor kwarg support
- New: test_logit_guided.py: 11 unit tests (bias shape, zero-mean, gating, scaling)

Key features:
- Confidence gating: skip bias when the target is already confident (>0.8 max prob)
- Zero-mean bias: doesn't shift the distribution center, only nudges relative preferences
- Alpha scaling: default 0.5 (conservative for cross-vocab mapping)
- Falls back gracefully for RIDGE/PROCRUSTES (no token-level mapping)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
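The three key features above (zero-mean bias, alpha scaling, confidence gating) can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in rosetta/logit_guided.py: the function names, the `vocab_map` dictionary (source token id to target token id for shared tokens), and the choice to center log-probabilities are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def compute_logit_bias(source_probs, vocab_map, target_vocab_size, alpha=0.5, floor=1e-9):
    """Map source probabilities onto target token ids and return a zero-mean bias.

    vocab_map is an assumed one-to-one dict {source_id: target_id} covering
    only the tokens both vocabularies share. Unmapped target tokens get 0.
    """
    bias = [0.0] * target_vocab_size
    mapped = {tgt: math.log(max(source_probs[src], floor)) for src, tgt in vocab_map.items()}
    if not mapped:
        return bias  # no token-level mapping available, so no bias
    mean = sum(mapped.values()) / len(mapped)
    for tgt, log_p in mapped.items():
        # Centering keeps the bias zero-mean; alpha scales its strength.
        bias[tgt] = alpha * (log_p - mean)
    return bias

def apply_bias(target_logits, bias, confidence_threshold=0.8):
    """Add the bias unless the target model is already confident."""
    if max(softmax(target_logits)) > confidence_threshold:
        return list(target_logits)  # confidence gate: leave logits untouched
    return [l + b for l, b in zip(target_logits, bias)]

source_probs = softmax([2.0, 0.5, 0.1])
vocab_map = {0: 3, 1: 1}  # hypothetical overlap between the two vocabularies
bias = compute_logit_bias(source_probs, vocab_map, target_vocab_size=4)
unsure = apply_bias([0.1, 0.2, 0.0, 0.1], bias)
confident = apply_bias([10.0, 0.0, 0.0, 0.0], bias)
```

In this toy run the uncertain target distribution is nudged toward target token 3, while the confident one passes through unchanged.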
Commit: 75d5935
Add smart routing and mid-layer injection for cross-model transfer
Smart routing: enhanced quality gate with task-type classification (math/code vs comprehension) using lexical features. Zero latency overhead. Backward compatible with the existing assess_transfer() API.

Mid-layer injection: inject projected hidden states at ~75% depth via a forward hook instead of layer-0 KV-cache priming. Based on Ramesh & Li (2501.14082) cross-model injection research.

Both features are available as cross_model_method options in HuggingFaceConnector.generate() and as benchmark pipeline modes. 47 new tests (31 smart routing + 16 mid-layer), all passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
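As a rough illustration of the two ideas in this commit, the sketch below classifies a prompt with a purely lexical heuristic and picks an injection layer at about 75% depth. Both helpers and the regular expression are hypothetical; the commit does not show which lexical features the quality gate actually uses or how the layer index is rounded.

```python
import re

# Hypothetical lexical cues for math/code prompts; the real feature set is not shown in the commit.
MATH_CODE_PATTERN = re.compile(
    r"\d+\s*[-+*/=%]\s*\d+|\bdef \w+\(|\bhow many\b|\bcalculate\b|\bsolve\b",
    re.IGNORECASE,
)

def classify_task(prompt):
    """Return 'math_code' or 'comprehension' using only a regex match (no model call)."""
    return "math_code" if MATH_CODE_PATTERN.search(prompt) else "comprehension"

def injection_layer(num_layers, depth=0.75):
    """Pick the zero-based layer index at roughly the given fractional depth."""
    return min(num_layers - 1, int(num_layers * depth))

math_label = classify_task("If Sam has 3 + 4 apples, how many does he have?")
text_label = classify_task("Who wrote the novel mentioned in the second paragraph?")
layer = injection_layer(28)
```

Because the classifier is a single regex search, it adds no model latency, which is consistent with the "zero latency overhead" claim above.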
Commit: b06a957
Add trained C2C cross-model projector (Tier 2)
Per-layer linear projections with learned sigmoid gates for cross-model latent transfer. Both source and target models are frozen; only the lightweight projector trains. Inference runs via per-layer forward hooks that additively inject projected hidden states during prefill.

New files:
- rosetta/train.py: LayerProjector, TrainConfig, train_projector()
- rosetta/trained_hooks.py: trained_multi_layer_hook context manager
- pipeline_trained.py: GSM8K benchmark pipeline for trained mode
- test_trained_projector.py: 19 tests (projector, hooks, registry, enum)

Modified:
- types.py: ProjectionMethod.TRAINED enum
- calibrate.py: layer_weights/biases/gates fields on AVPMap
- registry.py: save/load trained projection fields
- huggingface.py: cross_model_method="trained" branch + _prepare_trained_injection()
- run_gsm8k_2agent.py: "trained" mode with inline training

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
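The per-layer operation described above (a linear projection, scaled by a sigmoid gate, added to the target hidden state) can be written out for a single layer and a single position. This is a minimal sketch with plain lists; `gated_inject` and its arguments are illustrative names and are not taken from LayerProjector.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def project(weight, bias, source_hidden):
    """Linear map from source hidden size to target hidden size: W @ x + b."""
    return [sum(w * x for w, x in zip(row, source_hidden)) + b
            for row, b in zip(weight, bias)]

def gated_inject(target_hidden, source_hidden, weight, bias, gate_logit):
    """Additively inject the projected source state, scaled by sigmoid(gate_logit)."""
    gate = sigmoid(gate_logit)
    projected = project(weight, bias, source_hidden)
    return [t + gate * p for t, p in zip(target_hidden, projected)]

# Toy shapes: source hidden size 3, target hidden size 2.
weight = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0]]
bias = [0.0, 0.0]
injected = gated_inject([1.0, 1.0], [2.0, 4.0, 6.0], weight, bias, gate_logit=-3.0)
```

With a gate logit of -3.0 the gate is about 0.047, so the injected signal starts as a small perturbation of the target state.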
Commit: 8168936
Fix gradient flow through gate logits in training loop
The forward() method was calling .item() on the sigmoid gates, detaching them from the computation graph. As a result, gate logits only received gradients from L1 regularization, not from the MSE loss, so the gates couldn't learn which layers are important from the training signal.

Fix: add a return_gate_tensors parameter. Training uses True (tensor gates for gradient flow); inference uses False (float gates for speed).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
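The effect of the bug can be checked by hand on a one-dimensional toy problem. The sketch below writes out the analytic gradients with respect to a single gate logit, assuming a loss of the form (gate * projected - target)^2 plus an L1 penalty on the gate value; that exact loss form and all names are assumptions made for illustration. When the gate is treated as a constant (which is what .item() does), the MSE term contributes nothing.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gate_grads(gate_logit, projected, target, reg_weight, gate_detached):
    """Return (d MSE / d logit, d L1 / d logit) for a scalar toy problem."""
    gate = sigmoid(gate_logit)
    dgate_dlogit = gate * (1.0 - gate)
    if gate_detached:
        # The gate was converted to a Python float, so the MSE term sees a constant.
        mse_grad = 0.0
    else:
        # Chain rule through gate = sigmoid(logit).
        mse_grad = 2.0 * (gate * projected - target) * projected * dgate_dlogit
    # The L1 penalty on the (positive) gate value still reaches the logit either way.
    reg_grad = reg_weight * dgate_dlogit
    return mse_grad, reg_grad

mse_tensor, reg_tensor = gate_grads(-3.0, 1.0, 1.0, 0.01, gate_detached=False)
mse_float, reg_float = gate_grads(-3.0, 1.0, 1.0, 0.01, gate_detached=True)
```

In the detached case the only nonzero gradient is the regularizer's, which always pushes the logit down; with tensor gates the MSE term can push it back up when the layer is useful.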
Commit: 4c6626a
Fix compute_model_hash call in train_projector
train_projector was passing a ModelIdentity object to compute_model_hash instead of a config dict. It now uses model.config.to_dict() like all other call sites.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Commit: afbb008
Add NTP loss as primary training objective for C2C projector
Research shows MSE-only training optimizes geometric alignment but not downstream generation quality. This adds cross-entropy (NTP) loss through the hooked target model as the primary loss, with MSE as an auxiliary loss (0.1 weight).

Also fixes the MSE term to use unhooked reference hidden states (avoiding a circular reference) and lowers gate_init to -5.0 for less initial corruption.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
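The objective described above amounts to cross-entropy plus 0.1 times MSE. A minimal single-position sketch follows; the function names and the choice of which tensors feed the MSE term (`projected_hidden` against `reference_hidden` from an unhooked pass) are assumptions drawn from the commit text, not from the training code.

```python
import math

def cross_entropy(logits, target_id):
    """Next-token cross-entropy for one position: logsumexp(logits) - logits[target]."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target_id]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def combined_loss(hooked_logits, next_token_id, projected_hidden, reference_hidden,
                  mse_weight=0.1):
    """Primary NTP loss from the hooked pass plus a down-weighted MSE alignment term."""
    ntp = cross_entropy(hooked_logits, next_token_id)
    aux = mse(projected_hidden, reference_hidden)
    return ntp + mse_weight * aux

loss = combined_loss([0.0, 0.0], 0, [1.0, 1.0], [0.0, 0.0])
```

Here the cross-entropy is ln 2 and the MSE is 1.0, so the total is ln 2 + 0.1.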
Commit: 37a588d
Skip hidden states storage in hooked NTP forward pass
The hooked forward pass only needs logits for the NTP loss. Hidden states for the MSE auxiliary loss come from the separate unhooked reference pass. Setting output_hidden_states=False saves activation memory.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Commit: eb5dbd2
Add gate_init and gate_reg_weight passthrough to trained benchmark config
Exp 3 (NTP loss) failed due to cold-start gate collapse: gate_init=-5.0 combined with L1 regularization pushed all 28 gates to zero. These hyperparameters are now exposed so experiments can test warm-gate NTP configurations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Commit: 1f4c4a1
Set MSE-only as default training config (gate_init=-3.0, use_ntp_loss=False)
Four experiments showed that MSE-only training with gate_init=-3.0 matches NTP loss at 76% GSM8K cross-family accuracy (+6pp over rosetta) while requiring half the training compute. NTP with cold gates (-5.0) causes gate collapse to 0/28.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
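A configuration object with these defaults, plus a passthrough for benchmark overrides, might look like the sketch below. The class `TrainConfigSketch` and the helper `from_cli` are hypothetical stand-ins and do not mirror the real TrainConfig; the `gate_reg_weight` default in particular is a placeholder, since the commits do not state its value.

```python
from dataclasses import dataclass, fields, replace

@dataclass(frozen=True)
class TrainConfigSketch:
    gate_init: float = -3.0         # default stated in the commit title
    use_ntp_loss: bool = False      # MSE-only by default, per the commit title
    gate_reg_weight: float = 0.01   # placeholder value, NOT taken from the source

def from_cli(overrides):
    """Apply only recognised, non-None overrides on top of the defaults."""
    names = {f.name for f in fields(TrainConfigSketch)}
    known = {k: v for k, v in overrides.items() if k in names and v is not None}
    return replace(TrainConfigSketch(), **known)

default_cfg = from_cli({})
# A warm-gate NTP experiment; the unrelated "mode" key is ignored by the passthrough.
warm_ntp_cfg = from_cli({"use_ntp_loss": True, "gate_init": -1.0, "mode": "trained"})
```

Filtering on the dataclass fields lets a benchmark script forward its whole argument dictionary without the config rejecting unrelated keys.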
Commit: 867eb21
Add hybrid latent + selective text mode to HotpotQA rosetta pipeline
Extract the top-K tokens by attention weight from the source model's forward pass, decode them to text, re-tokenize on the target side, and prepend them as embeddings before the projected latent vector. Controlled by the hybrid_k parameter (0=disabled).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
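The selection step can be sketched as follows, assuming each token already has one aggregated attention score. How the pipeline aggregates attention across heads and layers is not described in the commit, and `top_k_tokens` is an illustrative name.

```python
def top_k_tokens(tokens, scores, k):
    """Keep the k highest-scoring tokens, returned in their original order."""
    if k <= 0:
        return []  # mirrors hybrid_k=0 meaning "disabled"
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    return [tokens[i] for i in sorted(ranked)]

tokens = ["The", "Eiffel", "Tower", "is", "in", "Paris"]
scores = [0.05, 0.30, 0.20, 0.05, 0.05, 0.35]
key_tokens = top_k_tokens(tokens, scores, k=3)
key_text = " ".join(key_tokens)
```

Restoring the original order keeps the extracted text readable when it is decoded and re-tokenized on the target side.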
Commit: faa1aa9
Hybrid v2: inject key tokens as text in Agent B prompt, not inputs_embeds
inputs_embeds injection had zero effect (the model ignores raw embeddings). Key tokens are now injected as "Key Context" in the answerer's prompt via input_ids, so they are processed through the normal embedding + positional encoding path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Commit: e71bfc1
Fix hybrid: switch SDPA to eager attention for key token extraction
SDPA silently ignores output_attentions=True, so attention weights were never returned. key_text was always None, meaning hybrid mode was effectively running pure rosetta. Temporarily switch to eager attention for the dummy forward pass when hybrid_k > 0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Commit: a1882f6
Fix hybrid: load model with eager attention at from_pretrained time
SDPA attention silently ignores output_attentions=True, so all hybrid experiment runs had no attention weights and key_text was always None. The previous runtime _attn_implementation override didn't work because HuggingFace selects the attention module class at from_pretrained time.

Fix: pass attn_implementation="eager" to from_pretrained when hybrid_k > 0. Remove the broken runtime override from pipeline_rosetta.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
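The fix can be pictured as choosing the attention implementation before the model is constructed. In the sketch below, `attention_kwargs` and `load_model` are hypothetical helpers; the only library facts relied on are that `AutoModelForCausalLM.from_pretrained` exists in transformers and accepts an `attn_implementation` argument.

```python
def attention_kwargs(hybrid_k):
    """Request eager attention only when attention weights will be needed."""
    return {"attn_implementation": "eager"} if hybrid_k > 0 else {}

def load_model(model_name, hybrid_k=0):
    """Load a causal LM, selecting the attention implementation at load time."""
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import AutoModelForCausalLM
    return AutoModelForCausalLM.from_pretrained(model_name, **attention_kwargs(hybrid_k))

plain = attention_kwargs(0)
hybrid = attention_kwargs(4)
```

Leaving the argument out when hybrid_k is 0 keeps the library's default attention backend for runs that do not need attention weights.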
Commit: aec31ec
Fix hybrid extraction: skip template tokens to avoid attention sinks
Attention-based extraction was picking up system prompt and instruction tokens instead of paragraph content. Added find_content_start() to locate the "## Paragraphs:" marker and zero out template token scores.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
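One way to implement this masking is to locate the marker in the prompt text and zero the scores of every token that starts before the marker ends. The signature of find_content_start() below is a guess, and the whitespace tokenizer is a stand-in for real tokenizer offsets.

```python
import re

def find_content_start(text, offsets, marker="## Paragraphs:"):
    """Return the index of the first token that begins after the marker.

    offsets is a list of (start, end) character spans, one per token.
    If the marker is missing, nothing is treated as template (index 0).
    """
    pos = text.find(marker)
    if pos == -1:
        return 0
    marker_end = pos + len(marker)
    for i, (start, _end) in enumerate(offsets):
        if start >= marker_end:
            return i
    return len(offsets)

def mask_template_scores(scores, content_start):
    """Zero the scores of all tokens before content_start."""
    return [0.0 if i < content_start else s for i, s in enumerate(scores)]

text = "System: answer briefly. ## Paragraphs: Paris is in France."
# Whitespace tokens with character offsets, standing in for tokenizer offsets.
offsets = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
scores = [0.3, 0.1, 0.1, 0.2, 0.1, 0.05, 0.05, 0.05, 0.05]
content_start = find_content_start(text, offsets)
masked = mask_template_scores(scores, content_start)
```

In this toy prompt the high-scoring "System:" token is zeroed, so a later top-K selection can only pick paragraph content.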
Commit: 4c3607b
GitHub could not render this comparison. To view the diff locally, run:
git diff main...cross_platform_research