NeuroServe is an AI inference server built on FastAPI that runs on GPU (CUDA/ROCm), CPU, and macOS MPS. It provides ready-to-use REST APIs, a modular plugin system, runtime utilities, and a unified response format, which makes it a practical foundation for AI-powered services.
🔧 Virtualenv quick guide: see docs/README_venv.md.
Detailed API reference and usage examples are available here: ➡️ API Documentation
- 🌐 REST APIs out-of-the-box with Swagger UI (/docs) & ReDoc (/redoc).
- ⚡ PyTorch integration with automatic device selection (cuda, cpu, mps, rocm).
- 🔌 Plugin system to extend functionality with custom AI models or services.
- 📊 Runtime tools for GPU info, warm-up routines, and environment inspection.
- 🧠 Built-in utilities like a toy model and model size calculator.
- 🧱 Unified JSON responses for predictable API behavior.
- 🧪 Cross-platform CI/CD (Ubuntu, Windows, macOS, Self-hosted GPU).
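The automatic device selection mentioned above could look roughly like the following. This is a minimal sketch, not NeuroServe's actual implementation: the function name `select_device` and the preference order are assumptions. It relies on the fact that ROCm builds of PyTorch expose AMD GPUs through the `torch.cuda` API, so a ROCm device is reported here as `cuda`.

```python
def select_device(preferred=None):
    """Return a device string: 'cuda', 'mps', or 'cpu' (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is nothing to accelerate.
        return "cpu"

    # ROCm builds of PyTorch report AMD GPUs through torch.cuda as well.
    has_cuda = torch.cuda.is_available()
    # Older PyTorch releases have no MPS backend, so look it up defensively.
    mps_backend = getattr(torch.backends, "mps", None)
    has_mps = bool(mps_backend is not None and mps_backend.is_available())

    available = {"cuda": has_cuda, "mps": has_mps, "cpu": True}
    # Honour an explicit preference only if that device is usable.
    if preferred is not None and available.get(preferred, False):
        return preferred
    # Assumed preference order: CUDA/ROCm, then Apple MPS, then CPU.
    for name in ("cuda", "mps", "cpu"):
        if available[name]:
            return name
    return "cpu"


print(select_device())
```

The returned string can be passed to `torch.device(...)` or to `.to(...)` on a model or tensor.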
repo-fastapi/
├─ app/ # application package
│ ├─ core/ # settings & configuration
│ │ └─ config.py # app settings (Pydantic v2)
│ ├─ routes/ # HTTP API routes
│ ├─ plugins/ # extensions / integrations
│ ├─ workflows/ # workflow definitions & orchestrators
│ └─ templates/ # Jinja templates (if used)
├─ docs/ # documentation & generated diagrams
│ ├─ ARCHITECTURE.md # main architecture report
│ ├─ architecture.mmd # Mermaid source (no fences)
│ ├─ architecture.html # browser-friendly diagram
│ ├─ architecture.png # exported PNG (if mmdc installed)
│ ├─ runtime.mmd # runtime/infra diagram
│ ├─ imports.mmd # Python import graph (if generated)
│ ├─ endpoints.md # discovered API endpoints (if generated)
│ └─ README_venv.md # virtualenv quick guide
├─ tools/ # project tooling & scripts
│ └─ build_workflows_index.py # builds docs/workflows-overview.md
├─ tests/ # test suite
│ └─ test_run.py # smoke test for app startup
├─ gen_arch.py # architecture generator script
├─ requirements.txt # runtime dependencies
├─ requirements-dev.txt # dev dependencies (ruff, pre-commit, pytest, ...)
├─ .pre-commit-config.yaml # pre-commit hooks configuration
├─ README.md # project overview & usage
└─ LICENSE # project license
For a deeper look into the internal design, modules, and flow of the system, see: ➡️ Architecture Guide
git clone https://github.com/USERNAME/gpu-server.git
cd gpu-server
python -m venv .venv
# Linux/macOS
source .venv/bin/activate
# Windows
.venv\Scripts\activate
pip install -r requirements.txt
python -m scripts.install_torch --gpu # or --cpu / --rocm
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Available endpoints:
- 🏠 Home → http://localhost:8000/
- ❤️ Health → http://localhost:8000/health
- 📚 Swagger UI → http://localhost:8000/docs
- 📘 ReDoc → http://localhost:8000/redoc
- 🧭 Env Summary → http://localhost:8000/env
- 🔌 Plugins → http://localhost:8000/plugins
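The Health endpoint listed above can also be probed from Python, for example in a deployment script. The sketch below uses only the standard library and assumes the endpoint returns the `{"status": "ok"}` JSON body shown in this README's quick test. The helper names `is_healthy_body` and `check_health` are illustrative, not part of the project.

```python
import json
import urllib.error
import urllib.request


def is_healthy_body(body: bytes) -> bool:
    """Return True if the body is JSON of the form {"status": "ok"}."""
    try:
        data = json.loads(body)
    except ValueError:
        # Covers both invalid JSON and undecodable bytes.
        return False
    return isinstance(data, dict) and data.get("status") == "ok"


def check_health(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Call GET /health and report whether the server looks healthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200 and is_healthy_body(resp.read())
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("healthy" if check_health() else "unreachable or unhealthy")
```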
Quick test:
curl http://localhost:8000/health
# {"status": "ok"}

Each plugin lives in app/plugins/<name>/ and typically includes:
manifest.json
plugin.py # Defines Plugin class inheriting AIPlugin
README.md # Documentation
API Endpoints:
- GET /plugins: list all plugins with metadata.
- POST /plugins/{name}/{task}: execute a task inside a plugin.
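As an illustration of the execute endpoint, the sketch below builds a `POST /plugins/{name}/{task}` request with the standard library. The plugin name `my_plugin`, the task `infer`, and the `{"text": ...}` payload are placeholder values, and the assumption that the request body is the raw JSON payload is not confirmed by this README. Check the Swagger UI at /docs for the real schema.

```python
import json
import urllib.request
from urllib.parse import quote


def build_plugin_request(name, task, payload, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for a plugin task."""
    # Percent-encode the path segments so unusual names cannot break the URL.
    url = f"{base_url}/plugins/{quote(name, safe='')}/{quote(task, safe='')}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_plugin_request("my_plugin", "infer", {"text": "hello"})
print(req.get_method(), req.full_url)
# prints: POST http://localhost:8000/plugins/my_plugin/infer
# To send it against a running server: urllib.request.urlopen(req)
```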
Example:
from app.plugins.base import AIPlugin

class Plugin(AIPlugin):
    name = "my_plugin"
    tasks = ["infer"]

    def load(self):
        # Load models/resources once
        ...

    def infer(self, payload: dict) -> dict:
        return {"message": "ok", "payload": payload}

A lightweight orchestration layer to chain plugins into reproducible pipelines (steps → plugin + task + payload).
All endpoints are exposed under /workflow.
- Endpoints: GET /workflow/ping, GET /workflow/presets, POST /workflow/run
- System Guide (EN): app/workflows/README.md
- Workflows Index: docs/workflows-overview.md
A full list of available workflows with their versions, tags, and step counts is maintained in the Workflows Index.
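To make the "steps → plugin + task + payload" idea concrete, here is a small in-process sketch of chaining steps. It is not the project's workflow engine. The registry of callables, the step dictionary keys, and the rule that each step receives its own payload merged over the previous step's output are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stand-in for loaded plugins: (plugin, task) -> callable.
Registry = Dict[Tuple[str, str], Callable[[dict], dict]]


def run_workflow(steps: List[Dict[str, Any]], registry: Registry) -> List[dict]:
    """Run steps in order and return the output of every step."""
    results: List[dict] = []
    previous: dict = {}
    for step in steps:
        handler = registry[(step["plugin"], step["task"])]
        # Assumption: the step's own payload overrides keys from the previous output.
        payload = {**previous, **step.get("payload", {})}
        previous = handler(payload)
        results.append(previous)
    return results


# Two toy "plugins" used only for this demonstration.
registry: Registry = {
    ("upper", "infer"): lambda p: {"text": p["text"].upper()},
    ("exclaim", "infer"): lambda p: {"text": p["text"] + "!"},
}
steps = [
    {"plugin": "upper", "task": "infer", "payload": {"text": "hello"}},
    {"plugin": "exclaim", "task": "infer"},
]
print(run_workflow(steps, registry))
# prints: [{'text': 'HELLO'}, {'text': 'HELLO!'}]
```

The real request schema for POST /workflow/run is described in app/workflows/README.md.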
A full list of available plugins with their providers, tasks, and source files is maintained in the Plugins Index.
Install dev dependencies:
pip install -r requirements-dev.txt
pre-commit install
Run tests:
pytest
Ruff (lint + format check) runs automatically via pre-commit hooks.
We enforce a clean and consistent code style using Ruff (linter, import sorter, and formatter). For full details on configuration, commands, helper scripts, and CI integration, see:
Download models in advance:
python -m scripts.prefetch_models
Models are cached in models_cache/ (see docs/LICENSES.md for licenses).
- Use uvicorn/hypercorn behind a reverse proxy (e.g., Nginx).
- Configure environment with APP_* variables instead of hardcoding.
- Enable HTTPS and configure CORS carefully in production.
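The APP_* convention can be pictured with a small standard-library sketch that collects every variable carrying the prefix. The project's own settings live in app/core/config.py (Pydantic v2), so this is only an illustration of the idea. The variable names `APP_PORT` and `APP_DEVICE` are hypothetical.

```python
import os
from typing import Mapping, Optional


def load_app_settings(
    environ: Optional[Mapping[str, str]] = None, prefix: str = "APP_"
) -> dict:
    """Collect prefixed environment variables into a lower-cased settings dict."""
    env = os.environ if environ is None else environ
    return {
        key[len(prefix):].lower(): value
        for key, value in env.items()
        if key.startswith(prefix)
    }


# Example with an explicit mapping; pass nothing to read the real environment.
settings = load_app_settings(
    {"APP_PORT": "8000", "APP_DEVICE": "cuda", "HOME": "/root"}
)
print(settings)
# prints: {'port': '8000', 'device': 'cuda'}
```

Values arrive as strings, so a real configuration layer would still need to validate and convert them.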
A complete history of changes and improvements: ➡️ CHANGELOG
Details about the initial release v0.1.0: ➡️ Release Notes v0.1.0
- Add /cuda endpoint → return detailed CUDA info.
- Add /warmup endpoint for GPU readiness.
- Provide a plugin generator CLI.
- Implement API Key / JWT authentication.
- Example plugins: translation, summarization, image classification.
- Docker support for one-click deployment.
- Benchmark suite for model inference speed.
Contributions are welcome!
- Open Issues for bugs or ideas.
- Submit Pull Requests for improvements.
- Follow style guidelines (Ruff + pre-commit).
Licensed under the MIT License — see LICENSE.
Some AI/ML models are licensed separately — see Model Licenses.