🚀 NeuroServe — GPU-Ready FastAPI AI Server

📖 Overview

NeuroServe is an AI inference server built on FastAPI that runs on GPU (CUDA/ROCm), CPU, and macOS MPS. It provides ready-to-use REST APIs, a modular plugin system, runtime utilities, and a unified response format, giving AI-powered services a consistent base to build on.

Quick Setup

🔧 Virtualenv quick guide: see docs/README_venv.md.

📚 API Documentation

Detailed API reference and usage examples are available here: ➡️ API Documentation

✨ Key Features

🌐 REST APIs out-of-the-box with Swagger UI (/docs) & ReDoc (/redoc).
⚡ PyTorch integration with automatic device selection (cuda, cpu, mps, rocm).
🔌 Plugin system to extend functionality with custom AI models or services.
📊 Runtime tools for GPU info, warm-up routines, and environment inspection.
🧠 Built-in utilities like a toy model and model size calculator.
🧱 Unified JSON responses for predictable API behavior.
🧪 Cross-platform CI/CD (Ubuntu, Windows, macOS, Self-hosted GPU).
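
The automatic device selection listed above could look roughly like the sketch below. It is an assumption about how such a helper might be written, not NeuroServe's actual code: pick_device and detect_device are hypothetical names, and the priority order (CUDA, then MPS, then CPU) is a guess. ROCm builds of PyTorch expose AMD GPUs through the same "cuda" device type, so the sketch has no separate rocm branch.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return a PyTorch device string, preferring GPU backends over CPU."""
    if cuda_available:  # also true on ROCm builds of PyTorch
        return "cuda"
    if mps_available:  # Apple Silicon (Metal Performance Shaders)
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch if it is installed; fall back to CPU otherwise."""
    try:
        import torch
    except ImportError:
        return "cpu"
    mps = getattr(torch.backends, "mps", None)  # absent on old PyTorch versions
    mps_ok = bool(mps is not None and mps.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print(detect_device())
```

Keeping the decision in a pure function (pick_device) separate from the hardware probe makes the priority order easy to unit-test on machines without a GPU.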

📂 Project Structure

repo-fastapi/
├─ app/                             # application package
│  ├─ core/                         # settings & configuration
│  │  └─ config.py                  # app settings (Pydantic v2)
│  ├─ routes/                       # HTTP API routes
│  ├─ plugins/                      # extensions / integrations
│  ├─ workflows/                    # workflow definitions & orchestrators
│  └─ templates/                    # Jinja templates (if used)
├─ docs/                            # documentation & generated diagrams
│  ├─ ARCHITECTURE.md               # main architecture report
│  ├─ architecture.mmd              # Mermaid source (no fences)
│  ├─ architecture.html             # browser-friendly diagram
│  ├─ architecture.png              # exported PNG (if mmdc installed)
│  ├─ runtime.mmd                   # runtime/infra diagram
│  ├─ imports.mmd                   # Python import graph (if generated)
│  ├─ endpoints.md                  # discovered API endpoints (if generated)
│  └─ README_venv.md                # virtualenv quick guide
├─ tools/                           # project tooling & scripts
│  └─ build_workflows_index.py      # builds docs/workflows-overview.md
├─ tests/                           # test suite
│  └─ test_run.py                   # smoke test for app startup
├─ gen_arch.py                      # architecture generator script
├─ requirements.txt                 # runtime dependencies
├─ requirements-dev.txt             # dev dependencies (ruff, pre-commit, pytest, ...)
├─ .pre-commit-config.yaml          # pre-commit hooks configuration
├─ README.md                        # project overview & usage
└─ LICENSE                          # project license

🏗️ Architecture

For a deeper look into the internal design, modules, and flow of the system, see: ➡️ Architecture Guide

⚙️ Installation

1. Clone the repository

git clone https://github.com/USERNAME/gpu-server.git
cd gpu-server

2. Create a virtual environment

python -m venv .venv
# Linux/macOS
source .venv/bin/activate
# Windows
.venv\Scripts\activate

3. Install dependencies

pip install -r requirements.txt

4. (Optional) Auto-install PyTorch

python -m scripts.install_torch --gpu    # or --cpu / --rocm

🚀 Running the Server

uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Available endpoints:

🏠 Home → http://localhost:8000/
❤️ Health → http://localhost:8000/health
📚 Swagger UI → http://localhost:8000/docs
📘 ReDoc → http://localhost:8000/redoc
🧭 Env Summary → http://localhost:8000/env
🔌 Plugins → http://localhost:8000/plugins

Quick test:

curl http://localhost:8000/health
# {"status": "ok"}
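
The same check can be scripted from Python using only the standard library. This is a minimal sketch: check_health is a hypothetical helper, and it assumes the /health endpoint returns a JSON object with a status field, as in the curl output above.

```python
import json
import urllib.request


def check_health(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers with {"status": "ok"}."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # Server unreachable, timed out, or replied with non-JSON content.
        return False
    return isinstance(body, dict) and body.get("status") == "ok"
```

Calling check_health() with no arguments targets the default address used in this README; a deployment script might poll it in a loop until the server is ready.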

🔌 Plugin System

Each plugin lives in app/plugins/<name>/ and typically includes:

manifest.json
plugin.py        # Defines Plugin class inheriting AIPlugin
README.md        # Documentation

API Endpoints:

GET /plugins — list all plugins with metadata.
POST /plugins/{name}/{task} — execute a task inside a plugin.

Example:

from app.plugins.base import AIPlugin

class Plugin(AIPlugin):
    name = "my_plugin"
    tasks = ["infer"]

    def load(self):
        # Load models/resources once
        ...

    def infer(self, payload: dict) -> dict:
        return {"message": "ok", "payload": payload}
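
To make the routing concrete, the sketch below shows one way a request to POST /plugins/{name}/{task} could be dispatched to a plugin method. It is self-contained for illustration only: the AIPlugin stand-in, the REGISTRY dict, register, and run_task are assumptions, not the project's actual app.plugins.base implementation.

```python
class AIPlugin:
    """Stand-in base class; the real one lives in app.plugins.base."""

    name: str = ""
    tasks: list[str] = []

    def load(self) -> None:
        """Hook for loading models/resources once."""


class EchoPlugin(AIPlugin):
    name = "my_plugin"
    tasks = ["infer"]

    def infer(self, payload: dict) -> dict:
        return {"message": "ok", "payload": payload}


# Hypothetical registry mapping plugin names to loaded instances.
REGISTRY: dict[str, AIPlugin] = {}


def register(plugin: AIPlugin) -> None:
    plugin.load()  # load resources once, at registration time
    REGISTRY[plugin.name] = plugin


def run_task(name: str, task: str, payload: dict) -> dict:
    """Look up the plugin, check the task is declared, then call it."""
    plugin = REGISTRY.get(name)
    if plugin is None:
        raise KeyError(f"unknown plugin: {name}")
    if task not in plugin.tasks:
        raise ValueError(f"plugin {name!r} has no task {task!r}")
    return getattr(plugin, task)(payload)


register(EchoPlugin())
print(run_task("my_plugin", "infer", {"text": "hello"}))
```

Checking the task against the declared tasks list before calling getattr keeps undeclared method names from being reachable through the URL.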

Workflow System

A lightweight orchestration layer to chain plugins into reproducible pipelines (steps → plugin + task + payload). All endpoints are exposed under /workflow.

Endpoints: GET /workflow/ping, GET /workflow/presets, POST /workflow/run
System Guide (EN): app/workflows/README.md
Workflows Index: docs/workflows-overview.md
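
As a rough illustration of the steps → plugin + task + payload idea, the sketch below runs a list of steps in order and feeds each step's output into the next step's payload. The step schema, run_workflow, and the two toy handlers are all hypothetical; the actual request format for POST /workflow/run is described in app/workflows/README.md.

```python
from typing import Callable

# Hypothetical plugin table: (plugin, task) -> callable taking and returning a dict.
Handler = Callable[[dict], dict]
PLUGINS: dict[tuple[str, str], Handler] = {
    ("text", "upper"): lambda p: {"text": p["text"].upper()},
    ("text", "exclaim"): lambda p: {"text": p["text"] + "!"},
}


def run_workflow(steps: list[dict], initial: dict) -> list[dict]:
    """Run steps in order; each result is merged into the next payload."""
    results = []
    carry = dict(initial)
    for step in steps:
        handler = PLUGINS[(step["plugin"], step["task"])]
        payload = {**carry, **step.get("payload", {})}  # step payload wins
        out = handler(payload)
        results.append(out)
        carry = {**carry, **out}
    return results


steps = [
    {"plugin": "text", "task": "upper"},
    {"plugin": "text", "task": "exclaim"},
]
print(run_workflow(steps, {"text": "hello"})[-1])  # {'text': 'HELLO!'}
```

Returning every intermediate result, rather than only the last one, makes a failed or surprising pipeline easier to inspect step by step.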

🔄 Available Workflows

A full list of available workflows with their versions, tags, and step counts is maintained in the Workflows Index.

➡️ View Workflows Index

🧩 Available Plugins

A full list of available plugins with their providers, tasks, and source files is maintained in the Plugins Index.

➡️ View Plugins Index

🧪 Development

Install dev dependencies:

pip install -r requirements-dev.txt
pre-commit install

Run tests:

pytest

Ruff (lint + format check) runs automatically via pre-commit hooks.

🧹 Code Style

We enforce a clean and consistent code style using Ruff (linter, import sorter, and formatter). For full details on configuration, commands, helper scripts, and CI integration, see:

➡️ Code Style & Linting Guide

📦 Model Management

Download models in advance:

python -m scripts.prefetch_models

Models are cached in models_cache/ (see docs/LICENSES.md for licenses).

🏭 Deployment Notes

Use uvicorn/hypercorn behind a reverse proxy (e.g., Nginx).
Configure environment with APP_* variables instead of hardcoding.
Enable HTTPS and configure CORS carefully in production.
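
For the APP_* point, a minimal standard-library sketch of reading prefixed environment variables into a typed settings object is shown below. The variable names (APP_HOST, APP_PORT, APP_CORS_ORIGINS) and the defaults are illustrative assumptions; the project's real settings are defined in app/core/config.py with Pydantic v2.

```python
import os
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Settings:
    host: str = "127.0.0.1"
    port: int = 8000
    cors_origins: list[str] = field(default_factory=list)


def load_settings(env=None, prefix: str = "APP_") -> Settings:
    """Build Settings from prefixed environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    origins = env.get(f"{prefix}CORS_ORIGINS", "")
    return Settings(
        host=env.get(f"{prefix}HOST", "127.0.0.1"),
        port=int(env.get(f"{prefix}PORT", "8000")),
        # Comma-separated list; an empty value yields an empty list.
        cors_origins=[o.strip() for o in origins.split(",") if o.strip()],
    )


settings = load_settings({"APP_PORT": "9000", "APP_CORS_ORIGINS": "https://example.com"})
print(settings.port, settings.cors_origins)
```

Passing a plain dict as env, as in the last two lines, lets the parsing be tested without touching the real process environment.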

📝 Changelog

A complete history of changes and improvements: ➡️ CHANGELOG

📦 Release Notes

Details about the initial release v0.1.0: ➡️ Release Notes v0.1.0

🗺️ Roadmap

Add /cuda endpoint → return detailed CUDA info.
Add /warmup endpoint for GPU readiness.
Provide a plugin generator CLI.
Implement API Key / JWT authentication.
Example plugins: translation, summarization, image classification.
Docker support for one-click deployment.
Benchmark suite for model inference speed.

🤝 Contributing

Contributions are welcome!

Open Issues for bugs or ideas.
Submit Pull Requests for improvements.
Follow style guidelines (Ruff + pre-commit).

📜 License

Licensed under the MIT License — see LICENSE.

📜 Model Licenses

Some AI/ML models are licensed separately — see Model Licenses.
