Aura is built for one thing: keeping readers in flow.
When you hit an unfamiliar word, historical reference, or difficult concept, traditional reading quickly turns into a context switch: put down the book, unlock your phone, search, scroll, get distracted. Aura removes that loop with a fully hands-free reading assistant.
- Stationary setup: your phone stays on a stand and acts as the system's camera and microphone.
- Voice-first interaction: wake, ask, and control features (including OCR capture) by voice.
- Seamless output: audio replies go to earbuds/headset, while short text snippets can be pushed to a smartwatch.
The long-term roadmap is wearable-first interaction (for example, chest-mounted setups), so Aura can assist both focused desk reading and outdoor learning scenarios.
Aura uses a 3-layer architecture for responsiveness, privacy, and model flexibility.
The device layer includes smartphones, smartwatches, and audio peripherals.
It handles local sensing, keyword spotting (KWS), and user interaction orchestration.
Primary stack: Flutter + Sherpa-ONNX.
The edge layer usually runs on a local GPU workstation (for example, RTX 3090).
It handles low-latency and privacy-sensitive workloads, including TLS termination (Nginx), speech recognition (STT), and neural speech synthesis (TTS).
Primary stack: FastAPI + Nginx + Faster-Whisper + CosyVoice.
The intelligence layer provides reasoning and knowledge synthesis through LLM APIs or local models.
Current default: local gemma4:31b served by Ollama. You can replace Ollama with any compatible LLM endpoint.
sequenceDiagram
autonumber
actor User
participant Client as Flutter Client
participant Nginx as Nginx Proxy
participant Server as FastAPI Server
participant STT as STT Engine
participant LLM as LLM Core (Aura)
participant TTS as TTS Engine
Note over Client: KWS listening (Sherpa-ONNX)
User ->> Client: "小爱同学" (wake word)
Client ->> Client: switch to AppMode.recording
User ->> Client: "Summarize this paragraph"
Client ->> Client: silence detected, stop recording
Note over Client, Server: Full-duplex pipeline starts
Client ->> Server: uploadAudio(PCM bytes)
Server ->> STT: speech recognition
STT -->> Server: transcript
Server -->> Client: taskId
par Text stream (SSE)
Client ->> Server: GET /text_stream
and Audio stream
Client ->> Server: GET /audio_stream
end
LLM ->> Server: first token
Server ->> Client: text token pushed immediately
LLM ->> TTS: first complete sentence
TTS ->> Server: first audio chunk
Server ->> Client: stream MP3 chunks
LLM -->> Server: [DONE]
TTS -->> Server: synthesis complete
Server -->> Client: streams closed
Client ->> Client: onPlayerComplete()
Client ->> Client: _fastResetToKws()
Aura uses dual authentication behind Nginx:
- Header token for SSE and upload APIs
- URL token for media streams
See the full network guide: Network Configuration Guide.
Run gemma4:31b locally with Ollama, or point Aura to any Ollama-compatible LLM API.
Running in Docker or on a host with an NVIDIA GPU is recommended.
- Python 3.10+
- CUDA Toolkit (11.8 or 12.1)
- Accessible Ollama API endpoint
Install dependencies:
pip install fastapi uvicorn pydantic faster-whisper httpx python-dotenv pydubCosyVoice (TTS)
Initialize submodules:
git submodule update --init --recursiveDownload pretrained CosyVoice models (for example, 0.5B) into services/cosy_voice/pretrained_models/.
For cross-lingual synthesis via inference_cross_lingual(), place a clean prompt .wav file in services/cosy_voice/assets/. Official voice examples: CosyVoice demo.
EdgeTTS APIs are also available in this repository, but EdgeTTS rate limits can break long, continuous conversations.
Faster-Whisper (STT)
The large-v3-turbo model downloads automatically on first run. Ensure internet access on the server during initialization.
SearXNG
Follow the official installation docs. If your environment requires a proxy, configure /etc/searxng/settings.yml accordingly.
If you need global search engines behind regional network restrictions, run an available proxy at 127.0.0.1:7890 (for example, using mihomo).
Create .env in gateway/:
AURA_API_KEY=your_ultra_secret_key_herecd gateway && python aura_server.pyThe app is optimized for Android audio and network-security behavior.
- Flutter SDK
~3.32.5 - Android NDK
27.0.12077973
Create .env in app/ (and keep it out of version control):
AURA_SERVER_IP=your_server_public_ip
AURA_SERVER_PORT=8443
AURA_API_KEY=your_ultra_secret_key_hereEnsure .env is included in app/pubspec.yaml:
assets:
- .envDownload KWS model files from Sherpa-ONNX releases.
Generate wake-word tokens:
sherpa-onnx-cli text2token \
--tokens assets/kws_model/tokens.txt \
--tokens-type phone+ppinyin \
--lexicon assets/kws_model/en.phone \
assets/kws_model/keywords_raw.txt keywords.txtPlace model files and keywords.txt in app/assets/kws_model/, then verify:
assets:
- assets/kws_model/After network setup, copy aura.crt to:
app/android/app/src/main/res/raw/aura_cert.crt
This prevents Android media playback from rejecting the HTTPS audio stream.
For wireless deployment to a physical Android device:
adb pair <ip_address>:<port>
adb connect <ip_address>:<another_port>
adb devicesThen run:
flutter clean
flutter pub get
flutter runFirst-run notes:
- Grant microphone permission, or KWS and voice interaction will fail.
- Some Android ROMs (for example MIUI/HyperOS) show short-lived install prompts during ADB install; missing them can trigger
installation canceled by user.

