14-stage Fusion Pipeline for LLM token compression — reversible compression, AST-aware code analysis, intelligent content routing. Zero LLM inference cost. MIT licensed.
-
Updated
Apr 1, 2026 - Python
14-stage Fusion Pipeline for LLM token compression — reversible compression, AST-aware code analysis, intelligent content routing. Zero LLM inference cost. MIT licensed.
Drop-in prompt compression for production LLM apps. Cut your token bill 40-60% without changing your code. Python SDK, LLMLingua-2, MIT.
JavaScript/TypeScript implementation of LLMLingua-2 (Experimental)
A self-improving knowledge base about LLM agent infrastructure
Python command-line tool for interacting with AI models through the OpenRouter API/Cloudflare AI Gateway, or local self-hosted Ollama. Optionally support Microsoft LLMLingua prompt token compression
Lossless-first prompt compression for JSON, YAML, CSV, and Markdown. Library, CLI, MCP server, desktop app, and browser extension.
Rolling context compression for Claude Code — never hit the context wall. Auto-compresses old messages while keeping recent context verbatim. Zero config, zero latency. Works as a Claude Code plugin.
Reverse T9 for LLMs. Free, open-source prompt compressor for your AI prompts and agents.
A curated list of strategies, tools, papers, and resources for reducing LLM token costs and improving efficiency in production.
CUTIA: compress prompts while preserving quality
A Claude Code skill that shrinks massive prompts and files using LLMLingua to save tokens.
This repository is the official implementation of Generative Context Distillation.
TOON for TYPO3 — a compact, human-readable, and token-efficient data format for AI prompts & LLM contexts. Perfect for ChatGPT, Gemini, Claude, Mistral, and OpenAI integrations (JSON ⇄ TOON).
LLMLingua-2 prompt compression hook for Claude Code — cut token usage by ~55%
Same answer, fewer tokens — KO-first LLM output-compression skill for Claude Code & Codex. A Korean-native caveman alternative, measured on real session output_tokens.
LLM judgment control layer for drift, memory loss, hallucination, and cost optimization.
Advanced token reduction and prompt optimization framework for LLMs, featuring linguistic, algorithmic, and architectural patterns.
AI-assisted context management and prompt compression toolkit for developer productivity, ADR workflows, and LLM token optimization.
Compress LLM Prompts and save 80%+ on GPT-4 in Python
PirateBao is a TypeScript/Bun agent-skill package for terse pirate-speak AI coding replies that preserve technical detail while cutting filler, with hooks, compressor CLI, OpenCode/Codex/Claude/Gemini cargo, .bao validation, npmjs gates, and token eval checks.
Add a description, image, and links to the prompt-compression topic page so that developers can more easily learn about it.
To associate your repository with the prompt-compression topic, visit your repo's landing page and select "manage topics."