Comparing changes

base repository: VectorArc/avp-python
base: main
head repository: VectorArc/avp-python
compare: cross_platform_research
  • 14 commits
  • 25 files changed
  • 2 contributors

Commits on Mar 14, 2026

  1. Add logit-guided decoding for cross-model communication

    New approach: instead of compressing source model info into a single virtual
    token (rosetta), distribute the signal as additive logit bias during target
    generation. Source model's vocabulary distribution is mapped through vocab
    overlap to target vocabulary.
    
    Implementation:
    - New: rosetta/logit_guided.py — CrossModelLogitBias processor + bias computation
    - Modified: huggingface.py — cross_model_method="logit_guided" option in generate()
    - Modified: easy.py — pass through cross_model_method and logit_bias_alpha
    - New: pipeline_logit_guided.py — benchmark pipeline for GSM8K 2-agent
    - Modified: run_gsm8k_2agent.py — --mode logit_guided support
    - Modified: shared/generation.py — logits_processor kwarg support
    - New: test_logit_guided.py — 11 unit tests (bias shape, zero-mean, gating, scaling)
    
    Key features:
    - Confidence gating: skip bias when target is already confident (>0.8 max prob)
    - Zero-mean bias: doesn't shift distribution center, only nudges relative prefs
    - Alpha scaling: default 0.5 (conservative for cross-vocab mapping)
    - Falls back gracefully for RIDGE/PROCRUSTES (no token-level mapping)
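The key features above can be sketched in plain Python. This is a minimal illustration and not the code in rosetta/logit_guided.py: the function name `cross_model_bias`, the `vocab_map` dictionary (source token id to target token id for overlapping tokens), and the use of raw probabilities rather than log-probabilities are all assumptions made for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_model_bias(source_probs, target_logits, vocab_map,
                     alpha=0.5, confidence_threshold=0.8):
    """Return an additive bias for the target logits (hypothetical helper).

    vocab_map is an assumed dict: source token id -> target token id,
    covering only tokens present in both vocabularies.
    """
    n = len(target_logits)

    # Confidence gating: leave the target alone when it is already sure.
    if max(softmax(target_logits)) > confidence_threshold:
        return [0.0] * n

    # Carry source probability mass over to the target vocabulary.
    mapped = [0.0] * n
    for src_id, tgt_id in vocab_map.items():
        mapped[tgt_id] += source_probs[src_id]

    # Zero-mean, then scale by alpha: only relative preferences change.
    mean = sum(mapped) / n
    return [alpha * (v - mean) for v in mapped]
```

With an undecided target (`[0.0, 0.0, 0.0, 0.0]`) the bias sums to zero and favours the token the source prefers; with a confident target (`[10.0, 0.0, 0.0, 0.0]`) the function returns all zeros.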
    
    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
    SStas and claude committed Mar 14, 2026
    75d5935
  2. Add smart routing and mid-layer injection for cross-model transfer

    Smart routing: Enhanced quality gate with task-type classification
    (math/code vs comprehension) using lexical features. Zero latency
    overhead. Backward compatible with existing assess_transfer() API.
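A lexical task-type check of this kind might look like the sketch below. The function name `classify_task`, the label strings, and the keyword list are hypothetical; the actual features used by the quality gate are not shown in this commit message.

```python
import re

# Assumed lexical cues: inline arithmetic or a few math/code keywords.
_MATH_CODE_PATTERN = re.compile(
    r"\d+\s*[-+*/=]\s*\d+"
    r"|\b(?:def|class|return|import|solve|calculate|equation|function)\b",
    re.IGNORECASE,
)


def classify_task(prompt: str) -> str:
    """Label a prompt as 'math_code' or 'comprehension' from surface features."""
    if _MATH_CODE_PATTERN.search(prompt):
        return "math_code"
    return "comprehension"
```

Because the check is a single regex search over the prompt, it needs no model call, which is consistent with the zero-latency claim above.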
    
    Mid-layer injection: Inject projected hidden states at ~75% depth
    via forward hook instead of layer-0 KV-cache priming. Based on
    Ramesh & Li (2501.14082) cross-model injection research.
    
    Both features available as cross_model_method options in
    HuggingFaceConnector.generate() and as benchmark pipeline modes.
    
    47 new tests (31 smart routing + 16 mid-layer), all passing.
    
    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
    SStas and claude committed Mar 14, 2026
    b06a957
  3. Add trained C2C cross-model projector (Tier 2)

    Per-layer linear projections with learned sigmoid gates for cross-model
    latent transfer. Both source and target models frozen; only the lightweight
    projector trains. Inference via per-layer forward hooks that additively
    inject projected hidden states during prefill.
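The per-layer update described above can be written out for a single layer and a single token position. This is a dependency-free sketch, not the LayerProjector in rosetta/train.py; the function name `inject` and the list-based vectors are assumptions for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(weight, vec):
    """Multiply a (target_dim x source_dim) matrix by a source_dim vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def inject(target_hidden, source_hidden, weight, bias, gate_logit):
    """Additively inject a gated linear projection of the source state.

    new_target = target + sigmoid(gate_logit) * (W @ source + b)
    """
    projected = [p + b for p, b in zip(matvec(weight, source_hidden), bias)]
    gate = sigmoid(gate_logit)
    return [t + gate * p for t, p in zip(target_hidden, projected)]
```

A strongly negative gate logit leaves the target hidden state almost untouched, so a layer can effectively opt out of the transfer.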
    
    New files:
    - rosetta/train.py: LayerProjector, TrainConfig, train_projector()
    - rosetta/trained_hooks.py: trained_multi_layer_hook context manager
    - pipeline_trained.py: GSM8K benchmark pipeline for trained mode
    - test_trained_projector.py: 19 tests (projector, hooks, registry, enum)
    
    Modified:
    - types.py: ProjectionMethod.TRAINED enum
    - calibrate.py: layer_weights/biases/gates fields on AVPMap
    - registry.py: save/load trained projection fields
    - huggingface.py: cross_model_method="trained" branch + _prepare_trained_injection()
    - run_gsm8k_2agent.py: "trained" mode with inline training
    
    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
    SStas and claude committed Mar 14, 2026
    8168936
  4. Fix gradient flow through gate logits in training loop

    The forward() method was calling .item() on sigmoid gates, detaching
    them from the computation graph. This meant gate logits only received
    gradients from L1 regularization, not from the MSE loss — so gates
    couldn't learn which layers are important from the training signal.
    
    Fix: add return_gate_tensors parameter. Training uses True (tensor
    gates for gradient flow), inference uses False (float gates for speed).
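The effect of detaching the gate can be seen in a scalar toy problem with hand-derived gradients. This is only an illustration of the failure mode: the loss form `(sigmoid(g) * p - t)**2 + lam * sigmoid(g)` and all names below are assumptions, not the repository's training loop.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def toy_loss(g, p, t, lam):
    """MSE on a gated scalar projection plus an L1 penalty on the gate."""
    s = sigmoid(g)
    return (s * p - t) ** 2 + lam * s


def grad_full(g, p, t, lam):
    """d(loss)/dg when the gate stays in the graph (MSE and L1 both contribute)."""
    s = sigmoid(g)
    return (2.0 * (s * p - t) * p + lam) * s * (1.0 - s)


def grad_detached(g, lam):
    """d(loss)/dg when the gate is treated as a constant inside the MSE term."""
    s = sigmoid(g)
    return lam * s * (1.0 - s)
```

With `g=0`, `p=1`, `t=1`, `lam=0.01`, the full gradient is negative (opening the gate reduces the loss), while the detached gradient is positive and can only push the gate shut.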
    
    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
    SStas and claude committed Mar 14, 2026
    4c6626a
  5. Fix compute_model_hash call in train_projector

    Was passing ModelIdentity object instead of config dict. Now uses
    model.config.to_dict() like all other call sites.
    
    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
    SStas and claude committed Mar 14, 2026
    afbb008
  6. Add NTP loss as primary training objective for C2C projector

    Research shows MSE-only training optimizes geometric alignment but not
    downstream generation quality. This adds cross-entropy (NTP) loss through
    the hooked target model as the primary loss, with MSE as auxiliary (0.1
    weight). Also fixes MSE to use unhooked reference hidden states (avoiding a
    circular reference) and lowers gate_init to -5.0 for less initial corruption.
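The combined objective can be sketched for a single position as cross-entropy on the hooked model's logits plus a 0.1-weighted MSE term. The helper names and list-based inputs are hypothetical; the real loss runs over batched tensors.

```python
import math


def cross_entropy(logits, target_id):
    """Negative log-likelihood of target_id under softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target_id]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def projector_loss(hooked_logits, next_token_id,
                   projected_hidden, reference_hidden, mse_weight=0.1):
    """Primary NTP loss plus auxiliary MSE against the unhooked reference states."""
    ntp = cross_entropy(hooked_logits, next_token_id)
    aux = mse(projected_hidden, reference_hidden)
    return ntp + mse_weight * aux
```

Note that `reference_hidden` is meant to come from a separate forward pass without hooks, matching the circular-reference fix described above.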
    
    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
    SStas and claude committed Mar 14, 2026
    37a588d
  7. Skip hidden states storage in hooked NTP forward pass

    The hooked forward pass only needs logits for NTP loss. Hidden states
    for MSE auxiliary come from the separate unhooked reference pass.
    Setting output_hidden_states=False saves activation memory.
    
    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
    SStas and claude committed Mar 14, 2026
    eb5dbd2
  8. Add gate_init and gate_reg_weight passthrough to trained benchmark config
    
    Exp 3 (NTP loss) failed due to cold-start gate collapse: gate_init=-5.0
    combined with L1 regularization pushed all 28 gates to zero. Now exposing
    these hyperparameters so experiments can test warm-gate NTP configurations.
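For a sense of scale, the snippet below evaluates the sigmoid gate and its local slope at a given gate_init. It assumes, as the projector description suggests, that the effective gate is `sigmoid(gate_logit)`; the helper names are illustrative.

```python
import math


def initial_gate(gate_init):
    """Effective gate value at initialization, assuming gate = sigmoid(logit)."""
    return 1.0 / (1.0 + math.exp(-gate_init))


def gate_slope(gate_init):
    """Derivative of the sigmoid at gate_init; scales every gradient reaching the logit."""
    s = initial_gate(gate_init)
    return s * (1.0 - s)


for init in (-5.0, -3.0):
    print(f"gate_init={init}: gate={initial_gate(init):.4f}, slope={gate_slope(init):.4f}")
```

At -5.0 the gate starts at roughly 0.007, versus roughly 0.047 at -3.0, and the slope shrinks by a similar factor, so a cold gate passes very little task signal back to its own logit.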
    
    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
    SStas and claude committed Mar 14, 2026
    1f4c4a1
  9. Set MSE-only as default training config (gate_init=-3.0, use_ntp_loss=False)
    
    4 experiments showed MSE-only with gate_init=-3.0 matches NTP loss at 76%
    GSM8K cross-family accuracy (+6pp over rosetta) while requiring half the
    training compute. NTP with cold gates (-5.0) causes gate collapse to 0/28.
    
    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
    SStas and claude committed Mar 14, 2026
    867eb21
  10. Add hybrid latent + selective text mode to HotpotQA rosetta pipeline

    Extract top-K tokens by attention weight from source model's forward pass,
    decode to text, re-tokenize on target side, prepend as embeddings before
    the projected latent vector. Controlled by hybrid_k parameter (0=disabled).
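The top-K selection step might look like the following sketch. It assumes one aggregated attention score per token is already available; the function name `top_k_tokens` and the choice to return tokens in their original order are illustrative assumptions.

```python
def top_k_tokens(tokens, scores, k):
    """Pick the k highest-scoring tokens, returned in their original order.

    k <= 0 disables extraction, mirroring hybrid_k=0.
    """
    if k <= 0:
        return []
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    return [tokens[i] for i in sorted(ranked)]
```

For tokens `["The", "Eiffel", "Tower", "is", "in", "Paris"]` with scores `[0.05, 0.3, 0.25, 0.02, 0.03, 0.35]` and `k=3`, this returns `["Eiffel", "Tower", "Paris"]`.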
    
    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
    SStas and claude committed Mar 14, 2026
    faa1aa9
  11. Hybrid v2: inject key tokens as text in Agent B prompt, not inputs_embeds
    
    inputs_embeds injection had zero effect (model ignores raw embeddings).
    Now inject key tokens as "Key Context" in the answerer's prompt via
    input_ids — processed through normal embedding + positional encoding path.
    
    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
    SStas and claude committed Mar 14, 2026
    e71bfc1
  12. Fix hybrid: switch SDPA to eager attention for key token extraction

    SDPA silently ignores output_attentions=True, so attention weights were
    never returned. key_text was always None, meaning hybrid mode was
    effectively running pure rosetta. Temporarily switch to eager attention
    for the dummy forward pass when hybrid_k > 0.
    
    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
    SStas and claude committed Mar 14, 2026
    a1882f6
  13. Fix hybrid: load model with eager attention at from_pretrained time

    SDPA attention silently ignores output_attentions=True, so all hybrid
    experiment runs had no attention weights — key_text was always None.
    The previous runtime _attn_implementation override didn't work because
    HuggingFace selects the attention module class at from_pretrained time.
    
    Fix: pass attn_implementation="eager" to from_pretrained when hybrid_k > 0.
    Remove the broken runtime override from pipeline_rosetta.py.
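One way to express the load-time switch is to build the from_pretrained keyword arguments from hybrid_k. The helper `model_load_kwargs` is hypothetical; `attn_implementation="eager"` is the from_pretrained argument named in this commit.

```python
def model_load_kwargs(hybrid_k: int) -> dict:
    """Extra from_pretrained kwargs; eager attention only when hybrid mode is on."""
    kwargs = {}
    if hybrid_k > 0:
        # Attention weights are needed, so request the eager implementation.
        kwargs["attn_implementation"] = "eager"
    return kwargs


# Usage sketch (requires the transformers package; model_name is a placeholder):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(model_name, **model_load_kwargs(hybrid_k))
```

Keeping the default path untouched when `hybrid_k == 0` means non-hybrid runs still load with whatever attention implementation the library selects.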
    
    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
    SStas and claude committed Mar 14, 2026
    aec31ec
  14. Fix hybrid extraction: skip template tokens to avoid attention sinks

    Attention-based extraction was picking up system prompt and instruction
    tokens instead of paragraph content. Added find_content_start() to locate
    the "## Paragraphs:" marker and zero out template token scores.
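A sketch of the masking step follows. The signature of `find_content_start` here (a list of decoded token strings in, a token index out) is an assumption, and `mask_template_scores` is a hypothetical helper name; only the marker string comes from the commit message.

```python
MARKER = "## Paragraphs:"


def find_content_start(token_texts, marker=MARKER):
    """Return the index of the first token after the marker, or 0 if it is absent.

    token_texts is assumed to be the decoded text of each prompt token, in order.
    """
    text = ""
    for i, tok in enumerate(token_texts):
        text += tok
        if marker in text:
            # Token i completes the marker, so content begins at i + 1.
            return i + 1
    return 0


def mask_template_scores(scores, content_start):
    """Zero the scores of template tokens that precede the content."""
    return [0.0 if i < content_start else s for i, s in enumerate(scores)]
```

Zeroing rather than dropping the template scores keeps the score list aligned with token positions, so a later top-K step can still index into the original token sequence.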
    
    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
    SStas and claude committed Mar 14, 2026
    4c3607b