Local proxy that caches LLM API calls.
Building AI agents means running the same prompts thousands of times. That burns API credits. cache-llm caches responses in SQLite and returns them in <2ms on repeat calls.
- ⚡ <2ms response time on cache hits
- 💾 SQLite-backed, zero external dependencies
- 🔌 Drop-in compatible with OpenAI SDK, LangChain, AutoGen, and any OpenAI-compatible client
- 🔒 Deterministic `sha256` hashing: the same prompt always hits the same cache entry

`npx @dinakars777/cache-llm`

Starts the proxy on http://localhost:8080 targeting https://api.openai.com.

| Flag | Description | Default |
|---|---|---|
| `-p, --port` | Port to run the proxy on | `8080` |
| `-t, --target` | Target LLM API base URL | `https://api.openai.com` |
| `-d, --db` | SQLite database file path | `./.llm-cache.db` |

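
If you want to start the proxy from a Node script (for example in test setup) rather than by hand, the flags in the table can be assembled programmatically. The following is a minimal sketch, not part of cache-llm itself: `ProxyOptions`, `buildProxyArgs`, and `startProxy` are hypothetical helper names, and it assumes `npx` is on your `PATH`.

```typescript
import { spawn, ChildProcess } from 'node:child_process';

// Hypothetical options object mirroring the three documented flags.
interface ProxyOptions {
  port?: number;   // --port
  target?: string; // --target
  db?: string;     // --db
}

// Build the argument list for `npx`, adding only the flags that were set
// so that cache-llm falls back to its own defaults for the rest.
function buildProxyArgs(opts: ProxyOptions = {}): string[] {
  const args: string[] = ['@dinakars777/cache-llm'];
  if (opts.port !== undefined) args.push('--port', String(opts.port));
  if (opts.target !== undefined) args.push('--target', opts.target);
  if (opts.db !== undefined) args.push('--db', opts.db);
  return args;
}

// Launch the proxy as a child process; the caller is responsible for
// calling .kill() on the returned process when finished.
function startProxy(opts: ProxyOptions = {}): ChildProcess {
  return spawn('npx', buildProxyArgs(opts), { stdio: 'inherit' });
}
```

For example, `startProxy({ port: 9000, db: './test-cache.db' })` would run the proxy on port 9000 with a separate cache file while keeping the default target.
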
Point your client's baseURL at the proxy:

    // OpenAI Node.js SDK
    import OpenAI from 'openai';

    const openai = new OpenAI({
      apiKey: process.env.OPENAI_API_KEY,
      baseURL: 'http://localhost:8080/v1',
    });

For LangChain, AutoGen, etc., set the base URL in the environment:

    export OPENAI_BASE_URL="http://localhost:8080/v1"

For each request, the proxy:

- Computes a `sha256` hash of the method, target URL, forwarded headers, and raw request body
- Returns the cached response instantly on a hit
- On a miss, forwards to the real API, stores the response, then returns it
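
The hit/miss flow above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual cache-llm source: `ProxiedRequest`, `cacheKey`, and `handle` are hypothetical names, the way the fields are joined before hashing is a guess, and an in-memory `Map` stands in for the SQLite table.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical shape of the request fields that feed the cache key.
interface ProxiedRequest {
  method: string;
  url: string;                     // full target URL
  headers: Record<string, string>; // only the headers that get forwarded
  body: string;                    // raw request body
}

// Derive a deterministic sha256 key. Header names are lowercased and the
// lines sorted so that header order and casing do not change the key
// (an assumption; the real proxy may canonicalize differently).
function cacheKey(req: ProxiedRequest): string {
  const headerLines = Object.entries(req.headers)
    .map(([name, value]) => `${name.toLowerCase()}:${value}`)
    .sort();
  return createHash('sha256')
    .update(req.method.toUpperCase())
    .update('\n')
    .update(req.url)
    .update('\n')
    .update(headerLines.join('\n'))
    .update('\n')
    .update(req.body)
    .digest('hex');
}

// In-memory stand-in for the SQLite-backed store.
const cache = new Map<string, string>();

// Return the cached body on a hit; on a miss, forward to the real API,
// store the response, then return it.
async function handle(
  req: ProxiedRequest,
  forward: (req: ProxiedRequest) => Promise<string>,
): Promise<{ body: string; hit: boolean }> {
  const key = cacheKey(req);
  const cached = cache.get(key);
  if (cached !== undefined) return { body: cached, hit: true };
  const body = await forward(req);
  cache.set(key, body);
  return { body, hit: false };
}
```

Because the raw body is part of the key, any change to the prompt, model, or sampling parameters in the request produces a different hash and therefore a cache miss.
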

| Package | Purpose |
|---|---|
| `better-sqlite3` | Fast local SQLite caching |
| `express` | Proxy server |
| TypeScript | Type-safe implementation |

License: MIT