kshru9/distributed-document-search

Distributed Document Search Service

Node.js REST API for multi-tenant document search with Elasticsearch, Redis caching, Redis rate limiting, structured logs, and Prometheus metrics.

Default local API base URL: http://localhost:3020

What This Project Demonstrates

  • Public health and metrics endpoints
  • Authenticated document and search APIs
  • Tenant isolation enforced on every protected request
  • Reader/writer role separation
  • Search caching and document caching in Redis
  • Rate limiting backed by Redis
  • Safe Elasticsearch query construction
  • Soft delete behavior
  • Reviewer-friendly demo and load-test scripts

Quick Start

Local Node API + Docker Dependencies

cp .env.example .env
npm install
docker compose up -d elasticsearch redis
npm start

Then check health:

curl --max-time 5 http://localhost:3020/health
curl --max-time 5 http://localhost:3020/metrics

Full Docker Compose

docker compose up --build

The API is exposed on http://localhost:3020 even though the container listens on port 3000 internally.

Requirements

The application reads the following values from .env (created from .env.example in Quick Start):

  • PORT=3020
  • ELASTICSEARCH_URL=http://localhost:9200
  • REDIS_URL=redis://localhost:6379
  • ELASTICSEARCH_INDEX_PREFIX=documents
  • LOG_LEVEL=info
  • SEARCH_CACHE_TTL_SECONDS=60
  • SEARCH_QUERY_MAX_LENGTH=200
  • SEARCH_DEFAULT_PAGE=1
  • SEARCH_DEFAULT_SIZE=10
  • SEARCH_MAX_SIZE=50
  • DOCUMENT_CACHE_TTL_SECONDS=300
  • TENANT_RATE_LIMIT_PER_MINUTE=100
  • DOCUMENT_RATE_LIMIT_PER_MINUTE=30
  • RATE_LIMIT_WINDOW_SECONDS=60
  • HEALTH_CHECK_TIMEOUT_MS=2000
  • ELASTICSEARCH_REQUEST_TIMEOUT_MS=2000
  • REDIS_CONNECT_TIMEOUT_MS=2000
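As a minimal sketch, assuming configuration is validated once at startup (this README does not say how the service parses it), the settings above can be loaded into a typed object that uses the listed values as fallbacks. `loadConfig`, `intFromEnv`, and `strFromEnv` are hypothetical names; the real service may require every key or parse values differently.

```typescript
// Hypothetical config loader for the .env keys listed above.
// Fallbacks mirror the README values; that they act as defaults is an assumption.
type Env = Record<string, string | undefined>;

interface AppConfig {
  port: number;
  elasticsearchUrl: string;
  redisUrl: string;
  indexPrefix: string;
  logLevel: string;
  searchCacheTtlSeconds: number;
  searchQueryMaxLength: number;
  searchDefaultPage: number;
  searchDefaultSize: number;
  searchMaxSize: number;
  documentCacheTtlSeconds: number;
  tenantRateLimitPerMinute: number;
  documentRateLimitPerMinute: number;
  rateLimitWindowSeconds: number;
  healthCheckTimeoutMs: number;
  elasticsearchRequestTimeoutMs: number;
  redisConnectTimeoutMs: number;
}

// Read a positive integer; use the fallback when the key is unset and
// fail fast on malformed values instead of propagating NaN.
function intFromEnv(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${key} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function strFromEnv(env: Env, key: string, fallback: string): string {
  const raw = env[key];
  return raw === undefined || raw.trim() === "" ? fallback : raw;
}

function loadConfig(env: Env): AppConfig {
  const config: AppConfig = {
    port: intFromEnv(env, "PORT", 3020),
    elasticsearchUrl: strFromEnv(env, "ELASTICSEARCH_URL", "http://localhost:9200"),
    redisUrl: strFromEnv(env, "REDIS_URL", "redis://localhost:6379"),
    indexPrefix: strFromEnv(env, "ELASTICSEARCH_INDEX_PREFIX", "documents"),
    logLevel: strFromEnv(env, "LOG_LEVEL", "info"),
    searchCacheTtlSeconds: intFromEnv(env, "SEARCH_CACHE_TTL_SECONDS", 60),
    searchQueryMaxLength: intFromEnv(env, "SEARCH_QUERY_MAX_LENGTH", 200),
    searchDefaultPage: intFromEnv(env, "SEARCH_DEFAULT_PAGE", 1),
    searchDefaultSize: intFromEnv(env, "SEARCH_DEFAULT_SIZE", 10),
    searchMaxSize: intFromEnv(env, "SEARCH_MAX_SIZE", 50),
    documentCacheTtlSeconds: intFromEnv(env, "DOCUMENT_CACHE_TTL_SECONDS", 300),
    tenantRateLimitPerMinute: intFromEnv(env, "TENANT_RATE_LIMIT_PER_MINUTE", 100),
    documentRateLimitPerMinute: intFromEnv(env, "DOCUMENT_RATE_LIMIT_PER_MINUTE", 30),
    rateLimitWindowSeconds: intFromEnv(env, "RATE_LIMIT_WINDOW_SECONDS", 60),
    healthCheckTimeoutMs: intFromEnv(env, "HEALTH_CHECK_TIMEOUT_MS", 2000),
    elasticsearchRequestTimeoutMs: intFromEnv(env, "ELASTICSEARCH_REQUEST_TIMEOUT_MS", 2000),
    redisConnectTimeoutMs: intFromEnv(env, "REDIS_CONNECT_TIMEOUT_MS", 2000),
  };
  // Cross-field check: the default page size must fit under the maximum.
  if (config.searchDefaultSize > config.searchMaxSize) {
    throw new Error("SEARCH_DEFAULT_SIZE must not exceed SEARCH_MAX_SIZE");
  }
  return config;
}
```

In a service this would be called once as `loadConfig(process.env)`, so a typo such as `PORT=30a0` fails at boot rather than surfacing later as `NaN`.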

Authentication

Protected endpoints require both headers:

Authorization: Bearer <token>
X-Tenant-Id: <tenantId>

Available prototype tokens:

  • tenant-a-reader-token
  • tenant-a-writer-token
  • tenant-b-reader-token
  • tenant-b-writer-token

Role behavior:

  • reader: GET /search, GET /documents/:id
  • writer: POST /documents, GET /search, GET /documents/:id, DELETE /documents/:id
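The header and role rules above can be sketched as one check. `authorize()` is hypothetical: the token table mirrors the prototype tokens, but the 401/403 split and the error codes are assumptions, since this README only confirms that a reader's `POST /documents` returns 403 and that tenant mismatches are rejected by the auth layer.

```typescript
// Hypothetical auth check. Token table mirrors the prototype tokens;
// status codes for unknown tokens/tenant mismatch and error codes are assumptions.
type Role = "reader" | "writer";

interface Principal {
  tenantId: string;
  role: Role;
}

// A Map avoids prototype keys such as "constructor" resolving to a value.
const TOKENS = new Map<string, Principal>([
  ["tenant-a-reader-token", { tenantId: "tenant-a", role: "reader" }],
  ["tenant-a-writer-token", { tenantId: "tenant-a", role: "writer" }],
  ["tenant-b-reader-token", { tenantId: "tenant-b", role: "reader" }],
  ["tenant-b-writer-token", { tenantId: "tenant-b", role: "writer" }],
]);

// Roles allowed per "METHOD /route-pattern", taken from the role list above.
const ALLOWED_ROLES = new Map<string, Role[]>([
  ["GET /search", ["reader", "writer"]],
  ["GET /documents/:id", ["reader", "writer"]],
  ["POST /documents", ["writer"]],
  ["DELETE /documents/:id", ["writer"]],
]);

type AuthResult =
  | { ok: true; principal: Principal }
  | { ok: false; status: 401 | 403; code: string };

// Header names are expected lower-cased, as Node's HTTP server delivers them.
function authorize(headers: Record<string, string | undefined>, route: string): AuthResult {
  const match = /^Bearer (\S+)$/.exec(headers["authorization"] ?? "");
  const principal = match ? TOKENS.get(match[1]) : undefined;
  if (principal === undefined) {
    return { ok: false, status: 401, code: "UNAUTHENTICATED" };
  }
  // Tenant isolation: X-Tenant-Id must match the tenant bound to the token.
  if (headers["x-tenant-id"] !== principal.tenantId) {
    return { ok: false, status: 403, code: "TENANT_MISMATCH" };
  }
  const allowed = ALLOWED_ROLES.get(route) ?? [];
  if (!allowed.includes(principal.role)) {
    return { ok: false, status: 403, code: "FORBIDDEN" };
  }
  return { ok: true, principal };
}
```

Binding each token to a single tenant and comparing it with `X-Tenant-Id` means a tenant-a token cannot reach tenant-b data even if the client sends `X-Tenant-Id: tenant-b`.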

API Examples

GET /health

curl -i http://localhost:3020/health

GET /metrics

curl -i http://localhost:3020/metrics

POST /documents

curl -sS -X POST http://localhost:3020/documents \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer tenant-a-writer-token" \
  -H "X-Tenant-Id: tenant-a" \
  -d '{
    "title": "Contract Renewal Policy",
    "content": "The contract renewal process begins 60 days before expiry.",
    "metadata": {
      "department": "legal",
      "category": "contracts"
    }
  }'
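A minimal sketch of how the request body above might be validated before indexing, assuming `title` and `content` are required non-empty strings and `metadata` is an optional flat object. `validateNewDocument()` and these rules are illustrative assumptions, not the project's documented contract.

```typescript
// Hypothetical validation for POST /documents. Field names come from the
// example payload; the rules themselves are assumptions.
interface NewDocument {
  title: string;
  content: string;
  metadata: Record<string, unknown>;
}

type Validation =
  | { ok: true; value: NewDocument }
  | { ok: false; errors: string[] };

function isPlainObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function validateNewDocument(body: unknown): Validation {
  if (!isPlainObject(body)) {
    return { ok: false, errors: ["body must be a JSON object"] };
  }
  const errors: string[] = [];
  const { title, content, metadata } = body;
  if (typeof title !== "string" || title.trim() === "") {
    errors.push("title must be a non-empty string");
  }
  if (typeof content !== "string" || content.trim() === "") {
    errors.push("content must be a non-empty string");
  }
  if (metadata !== undefined && !isPlainObject(metadata)) {
    errors.push("metadata must be an object when present");
  }
  // Report every problem at once so clients can fix the payload in one pass.
  if (errors.length > 0) {
    return { ok: false, errors };
  }
  return {
    ok: true,
    value: {
      title: (title as string).trim(),
      content: content as string,
      metadata: isPlainObject(metadata) ? metadata : {},
    },
  };
}
```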

GET /search

curl -sS "http://localhost:3020/search?q=contract&page=1&size=10" \
  -H "Authorization: Bearer tenant-a-reader-token" \
  -H "X-Tenant-Id: tenant-a"
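The `q`, `page`, and `size` parameters interact with the `SEARCH_*` settings. This sketch assumes an oversized `size` is rejected with `VALIDATION_ERROR` rather than clamped, and that indexed documents carry `tenantId` (mapped as a keyword), `deleted`, `title`, and `content` fields; those field names, the title boost, and the helper names are all assumptions. It shows one common way to keep Elasticsearch query construction safe: user text goes only into a `multi_match` query, which is analyzed as plain text with no Lucene syntax to inject, and the tenant filter comes from the authenticated token rather than from request input.

```typescript
// Hypothetical search parameter parsing and query-body builder.
// Field names, the title boost, and reject-over-max behavior are assumptions.
class ValidationError extends Error {
  readonly code = "VALIDATION_ERROR";
}

interface SearchLimits {
  maxQueryLength: number; // SEARCH_QUERY_MAX_LENGTH
  defaultPage: number;    // SEARCH_DEFAULT_PAGE
  defaultSize: number;    // SEARCH_DEFAULT_SIZE
  maxSize: number;        // SEARCH_MAX_SIZE
}

interface SearchParams {
  q: string;
  page: number;
  size: number;
}

function positiveInt(raw: string | undefined, fallback: number, field: string): number {
  if (raw === undefined) return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 1) throw new ValidationError(`${field} must be a positive integer`);
  return n;
}

function parseSearchParams(query: Record<string, string | undefined>, limits: SearchLimits): SearchParams {
  const q = (query["q"] ?? "").trim();
  if (q.length === 0) throw new ValidationError("q is required");
  if (q.length > limits.maxQueryLength) {
    throw new ValidationError(`q must be at most ${limits.maxQueryLength} characters`);
  }
  const page = positiveInt(query["page"], limits.defaultPage, "page");
  const size = positiveInt(query["size"], limits.defaultSize, "size");
  if (size > limits.maxSize) throw new ValidationError(`size must be at most ${limits.maxSize}`);
  return { q, page, size };
}

// tenantId must come from the authenticated principal, never from query params.
function buildSearchBody(tenantId: string, p: SearchParams) {
  return {
    from: (p.page - 1) * p.size,
    size: p.size,
    query: {
      bool: {
        // multi_match analyzes the text; it does not parse query_string syntax.
        must: [{ multi_match: { query: p.q, fields: ["title^2", "content"] } }],
        filter: [{ term: { tenantId } }],
        must_not: [{ term: { deleted: true } }],
      },
    },
  };
}
```

If the service instead stores each tenant in its own index (the `ELASTICSEARCH_INDEX_PREFIX` setting suggests prefixed index names, but the layout is not documented here), the tenant `term` filter still works as defense in depth.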

GET /documents/:id

curl -sS http://localhost:3020/documents/<id> \
  -H "Authorization: Bearer tenant-a-reader-token" \
  -H "X-Tenant-Id: tenant-a"
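The `X-Cache: MISS` then `X-Cache: HIT` behavior for this endpoint can be sketched as a read-through cache. An in-memory map with expiry stands in for Redis `SET key value EX ttl` and `GET key`; the `doc:<tenantId>:<id>` key format and the `readDocument()` helper are hypothetical, and the loader is synchronous only to keep the example short.

```typescript
// Hypothetical read-through document cache; the Map stands in for Redis.
interface CachedEntry {
  value: string;
  expiresAt: number;
}

class TtlCache {
  private readonly entries = new Map<string, CachedEntry>();
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  get(key: string): string | null {
    const entry = this.entries.get(key);
    if (entry === undefined) return null;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired, like a Redis key after its TTL
      return null;
    }
    return entry.value;
  }

  set(key: string, value: string, ttlSeconds: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }
}

interface Doc {
  id: string;
  tenantId: string;
  title: string;
  content: string;
}

// The tenant is part of the key, so tenants never share cache entries.
// A real implementation would await Redis and Elasticsearch here.
function readDocument(
  cache: TtlCache,
  tenantId: string,
  id: string,
  load: (tenantId: string, id: string) => Doc | null,
  ttlSeconds = 300, // DOCUMENT_CACHE_TTL_SECONDS
): { doc: Doc | null; xCache: "HIT" | "MISS" } {
  const key = `doc:${tenantId}:${id}`;
  const cached = cache.get(key);
  if (cached !== null) return { doc: JSON.parse(cached) as Doc, xCache: "HIT" };
  const doc = load(tenantId, id);
  if (doc !== null) cache.set(key, JSON.stringify(doc), ttlSeconds); // 404s are not cached
  return { doc, xCache: "MISS" };
}
```

Not caching misses means a document created right after a 404 becomes visible immediately instead of after the TTL.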

DELETE /documents/:id

curl -sS -X DELETE http://localhost:3020/documents/<id> \
  -H "Authorization: Bearer tenant-a-writer-token" \
  -H "X-Tenant-Id: tenant-a"
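Soft delete flags a record instead of removing it. A minimal sketch, assuming a `deletedAt` timestamp field and in-memory maps standing in for Elasticsearch and the Redis document cache (all hypothetical): the delete marks the record, evicts any cached copy, and later reads report it as missing, matching the `SOFT_DELETED` then 404 sequence in the Manual Verification Checklist. Returning `NOT_FOUND` for a repeated delete is also an assumption.

```typescript
// Hypothetical soft-delete logic; Maps stand in for Elasticsearch and Redis.
interface StoredDoc {
  id: string;
  tenantId: string;
  title: string;
  deletedAt: string | null; // null while the document is live
}

type DeleteResult = "SOFT_DELETED" | "NOT_FOUND";

const docs = new Map<string, StoredDoc>();     // key: `${tenantId}:${id}`
const docCache = new Map<string, StoredDoc>(); // stand-in for the document cache

function softDelete(tenantId: string, id: string, now: Date = new Date()): DeleteResult {
  const key = `${tenantId}:${id}`;
  const doc = docs.get(key);
  // Scoping the lookup by tenant means one tenant cannot delete another's document.
  if (doc === undefined || doc.deletedAt !== null) return "NOT_FOUND";
  doc.deletedAt = now.toISOString(); // flag, do not remove
  docCache.delete(key); // evict so a cached copy cannot keep serving it
  return "SOFT_DELETED";
}

function getDocument(tenantId: string, id: string): StoredDoc | null {
  const doc = docs.get(`${tenantId}:${id}`);
  return doc === undefined || doc.deletedAt !== null ? null : doc; // null maps to 404
}
```

Search queries would likewise need to exclude flagged documents, and the cache eviction matters: without it, a cached copy could keep serving a deleted document until its TTL expires.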

Demo Data

Seed repeatable demo content for both tenants:

npm run seed

The seed script uses the real HTTP API, prints created ids, and exits non-zero if any document fails.

Load Testing

Run a local concurrency test against search:

npm run load-test

Optional GET /documents/:id test (use an id such as one printed by npm run seed):

DOCUMENT_ID=<id> npm run load-test

Useful overrides:

  • API_BASE_URL=http://localhost:3020
  • TENANT_ID=tenant-a
  • TOKEN=tenant-a-reader-token
  • SEARCH_QUERY=contract
  • CONNECTIONS=50
  • DURATION_SECONDS=30

Load Test Rate-Limit Clarification

  • The default tenant rate limit is 100 requests/minute.
  • autocannon can exhaust that per-minute budget within seconds, even with a low CONNECTIONS value.
  • 429 responses during load tests are expected unless local limits are raised.
  • For raw latency measurement, temporarily raise:
    • TENANT_RATE_LIMIT_PER_MINUTE
    • DOCUMENT_RATE_LIMIT_PER_MINUTE
  • For abuse-protection verification, keep or lower limits and observe RATE_LIMITED responses.
  • Local load tests demonstrate methodology and baseline behavior, not 10M-document production scale.
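To see why the limit trips so quickly: autocannon keeps every connection busy, so even 10 connections answering in a few milliseconds each can issue thousands of requests per second, and a 100-request budget is spent in well under a second. The sketch below assumes a fixed-window counter (Redis `INCR` plus `EXPIRE` on a per-tenant, per-window key is the usual implementation). The algorithm, the key names, and `FixedWindowLimiter` are assumptions; this README does not describe how the limiter works.

```typescript
// Hypothetical fixed-window rate limiter; a Map stands in for Redis INCR + EXPIRE.
interface Decision {
  allowed: boolean;
  remaining: number;
  retryAfterSeconds: number;
}

class FixedWindowLimiter {
  private readonly counters = new Map<string, { windowStart: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit: number, windowSeconds: number, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowSeconds * 1000;
    this.now = now;
  }

  // key identifies the budget, e.g. "tenant:tenant-a" or "document:tenant-a".
  hit(key: string): Decision {
    const t = this.now();
    const windowStart = Math.floor(t / this.windowMs) * this.windowMs;
    let counter = this.counters.get(key);
    if (counter === undefined || counter.windowStart !== windowStart) {
      counter = { windowStart, count: 0 }; // new window: reset, like an expired Redis key
      this.counters.set(key, counter);
    }
    counter.count += 1;
    const allowed = counter.count <= this.limit;
    return {
      allowed,
      remaining: Math.max(0, this.limit - counter.count),
      // Seconds until the current window ends; a denied request maps to 429 RATE_LIMITED.
      retryAfterSeconds: allowed ? 0 : Math.ceil((windowStart + this.windowMs - t) / 1000),
    };
  }
}
```

A fixed window is simple but lets a client send up to twice the limit across a window boundary; a sliding-window or token-bucket limiter smooths that out at the cost of more Redis state.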

Testing Strategy

Quick review path:

  • Local functional checks cover /health, /metrics, auth, tenant isolation, create/search/get/delete, and soft delete.
  • npm run seed loads repeatable demo data for both tenants without destructive cleanup.
  • Cache checks verify X-Cache: MISS and X-Cache: HIT for search and document GET.
  • Rate-limit checks verify 429 RATE_LIMITED and rate_limited_total.
  • Observability checks verify structured logs, requestId, and the expected metric families.
  • npm run load-test provides a local concurrency baseline with autocannon.
  • Production-scale strategy and pass/fail criteria are documented separately in the Testing Strategy document.

Metrics

The local prototype exposes Prometheus text metrics at:

GET /metrics

Example:

curl http://localhost:3020/metrics

Custom metric families:

  • http_requests_total
  • http_request_duration_seconds
  • cache_hits_total
  • cache_misses_total
  • rate_limited_total
  • documents_indexed_total
  • documents_deleted_total
  • search_requests_total
  • search_duration_seconds

Label safety:

  • Metrics avoid raw search query text.
  • Metrics avoid document IDs.
  • Metrics avoid request IDs.
  • Metrics avoid Authorization headers or bearer tokens.
  • Metrics avoid Redis cache keys.
  • HTTP route labels use low-cardinality route patterns such as /documents/:id.
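A minimal sketch of how raw request paths can be collapsed into the low-cardinality route labels described above. The pattern table mirrors this README's endpoints; `routeLabel()` and the `unmatched` bucket are assumptions. If the service uses Express (not stated here), the matched pattern is usually available directly as `req.route.path`, which makes a regex table unnecessary for matched routes.

```typescript
// Hypothetical route-label normalizer for Prometheus metrics.
const ROUTE_PATTERNS: Array<{ pattern: RegExp; label: string }> = [
  { pattern: /^\/health$/, label: "/health" },
  { pattern: /^\/metrics$/, label: "/metrics" },
  { pattern: /^\/search$/, label: "/search" },
  { pattern: /^\/documents$/, label: "/documents" },
  { pattern: /^\/documents\/[^/]+$/, label: "/documents/:id" },
];

function routeLabel(rawUrl: string): string {
  // Drop the query string first so search text (?q=...) never becomes a label,
  // then strip trailing slashes so /documents/ and /documents share a label.
  const path = rawUrl.split("?")[0].replace(/\/+$/, "") || "/";
  for (const { pattern, label } of ROUTE_PATTERNS) {
    if (pattern.test(path)) return label;
  }
  // Collapse unknown paths (scanners, typos) into one bucket to bound cardinality.
  return "unmatched";
}
```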

Architecture

Design notes and operational assumptions are documented separately in the repository.

Repository Layout

.
├── docker-compose.yml
├── Dockerfile
├── docs/
├── scripts/
├── src/
└── package.json

Reviewer Notes

  • /health is public and should report healthy when dependencies are available.
  • /metrics is public and returns Prometheus-formatted metrics.
  • Cache behavior uses X-Cache: HIT and X-Cache: MISS.
  • Tenant mismatches are rejected by the auth layer.
  • Logs are structured and should not leak bearer tokens or document content.
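A sketch of a structured access-log line that always carries `requestId` but never records the bearer token or document content. The field names and `accessLogLine()` are assumptions; only the `Authorization` header is redacted here because it is the one this README calls out.

```typescript
// Hypothetical structured access log with header redaction.
const REDACTED_HEADERS = new Set(["authorization"]);

interface AccessLogInput {
  requestId: string;
  method: string;
  route: string; // low-cardinality pattern, e.g. /documents/:id
  status: number;
  durationMs: number;
  tenantId?: string;
  headers: Record<string, string | undefined>;
}

function accessLogLine(input: AccessLogInput): string {
  const headers: Record<string, string> = {};
  for (const [key, value] of Object.entries(input.headers)) {
    if (value === undefined) continue;
    const lower = key.toLowerCase();
    headers[lower] = REDACTED_HEADERS.has(lower) ? "[REDACTED]" : value;
  }
  // Request and response bodies are not accepted at all,
  // so document content cannot reach the log.
  return JSON.stringify({
    level: "info",
    msg: "request completed",
    requestId: input.requestId,
    method: input.method,
    route: input.route,
    status: input.status,
    durationMs: Math.round(input.durationMs),
    tenantId: input.tenantId,
    headers,
  });
}
```

A stricter variant logs only an allowlist of headers instead of redacting a denylist.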

Manual Verification Checklist

  • docker compose up --build
  • curl /health
  • curl /metrics
  • npm run seed
  • Search tenant-a seeded data
  • Search tenant-b seeded data
  • Confirm tenant-a data does not appear for tenant-b
  • Reader POST /documents returns 403
  • Writer POST /documents returns 201
  • Repeated GET /documents/:id shows X-Cache: MISS then X-Cache: HIT
  • Repeated GET /search shows X-Cache: MISS then X-Cache: HIT
  • Invalid search returns 400 VALIDATION_ERROR
  • DELETE /documents/:id returns SOFT_DELETED
  • GET deleted document returns 404
  • npm run load-test completes
  • /metrics counters change
  • Logs include requestId
  • Logs do not include Authorization headers or bearer tokens

AI Usage Note

AI tools were used to assist with architecture brainstorming, implementation scaffolding, documentation organization, and test planning. Final design decisions, code review, and validation were performed by me.
