Veo3 Avatar Sync Pipeline

Reusable audio-driven avatar video pipeline for faceless explainer channels.

The workflow is:

1. Generate your avatar expression clips once in Veo3 with a solid green background.
2. Chroma-key those clips into reusable RGBA frames.
3. Build a clip_library.json index with tags and metadata.
4. For each new voiceover: analyze audio -> select clips -> smooth transitions -> composite final video.

Once the library is ready, new videos need no further Veo generation.

Modules

chroma_key.py: green-screen masking, edge feathering, spill suppression, RGBA output.
clip_library.py: clip metadata schema, indexing, lookup, and summary.
audio_analyzer.py: segment-level RMS/pitch/speech analysis and expression-tag mapping.
transition_engine.py: best-clip selection and cached transition blend generation.
compositor.py: alpha compositing of avatar over background plus audio mux.
pipeline.py: CLI orchestration (setup, analyse, render).

Requirements

Python 3.12+
System ffmpeg on PATH (required for final audio muxing)
Python packages:
- opencv-python
- numpy
- librosa
- soundfile

Install dependencies with uv:

uv sync

If ffmpeg is missing on macOS:

brew install ffmpeg
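
To confirm ffmpeg is reachable before a long render, a small pre-flight check can help. The sketch below is not part of the pipeline; `find_tool` is a hypothetical helper name, and it only uses `shutil.which` from the standard library.

```python
import shutil
from typing import Optional


def find_tool(name: str = "ffmpeg") -> Optional[str]:
    """Return the absolute path of an executable on PATH, or None if absent."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    print(f"ffmpeg found at {path}" if path else "ffmpeg not found on PATH")
```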

Recommended project structure

.
├── audio_analyzer.py
├── chroma_key.py
├── clip_library.py
├── compositor.py
├── pipeline.py
├── transition_engine.py
├── raw_clips/
├── keyed_clips/
├── transition_cache/
└── clip_library.json

Step 1: Generate Veo3 clips (one-time)

Create 30-40 loopable expression clips with a bright solid green background and no camera movement.

Example prompt patterns:

Idle: subtle breathing, neutral pose, loopable 3-second clip.
Talk low/med/high: different energy talking gestures.
Reaction clips: excited, thinking, nod, shrug, wave, point left/right/up, celebrate.

Automated 40-scene generation from your avatar image

You can auto-generate 40 reusable raw scene clips directly from avatar.png using Google Veo:

uv run python generate_avatar_scenes.py \
  --reference avatar.png \
  --output-dir raw_clips \
  --count 40

The script writes:

raw_clips/<scene_name>.mp4
raw_clips/veo_scene_manifest.json (prompt + status + resume metadata)
raw_clips/generated_frames/<scene_name>.png (intermediate static scene image)

Auth options:

Gemini API key mode:
- set GOOGLE_API_KEY (or GEMINI_API_KEY) in your environment
Vertex AI mode:
- --use-vertex --project <gcp-project> --location us-central1
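
As an illustration of API key mode, the lookup could work roughly like this. It is a sketch only: `resolve_api_key` is a hypothetical name, and checking GOOGLE_API_KEY before GEMINI_API_KEY is an assumption, not confirmed behavior of generate_avatar_scenes.py.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty key found, or None.

    Assumption: GOOGLE_API_KEY takes precedence over GEMINI_API_KEY.
    """
    for name in ("GOOGLE_API_KEY", "GEMINI_API_KEY"):
        value = env.get(name, "").strip()
        if value:
            return value
    return None
```

Passing a plain dict instead of `os.environ` makes the lookup easy to check without touching your shell environment.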

Useful flags:

--dry-run to preview all prompts without calling Veo
--continue-on-error to keep generating even if one scene fails
--start-index 21 --count 20 to generate in batches
--no-resume to force regenerate existing scene files
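
The batching and resume flags can be pictured as a simple filter over the scene list. This is a sketch under assumptions: `scenes_to_generate` is a hypothetical function, and treating --start-index as 1-based is inferred from the `--start-index 21 --count 20` example rather than confirmed.

```python
def scenes_to_generate(scene_names, existing, start_index=1, count=None, resume=True):
    """Pick the scenes a batch run would generate.

    scene_names: ordered list of all scene names.
    existing:    set of scene names that already have an output file.
    start_index: 1-based position of the first scene in the batch (assumption).
    count:       maximum number of scenes in the batch, or None for all remaining.
    resume:      when True, skip scenes that already exist (as without --no-resume).
    """
    if start_index < 1:
        raise ValueError("start_index is 1-based and must be >= 1")
    batch = scene_names[start_index - 1:]
    if count is not None:
        batch = batch[:count]
    if resume:
        batch = [name for name in batch if name not in existing]
    return batch
```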

If your Google model rejects enhancePrompt, do not pass --enhance-prompt.

Two-stage generation (static image -> Veo animation)

The script runs in two stages:

Generate a static scene image from your reference avatar (--image-model, default gemini-2.5-flash-image)
Animate that generated image with Veo (--video-model, default veo-3.1-fast-generate-001)

Example (explicit models):

uv run python generate_avatar_scenes.py \
  --reference avatar.png \
  --output-dir raw_clips \
  --count 40 \
  --image-model gemini-2.5-flash-image \
  --video-model veo-3.1-lite-generate-preview \
  --continue-on-error

Cost-first defaults:

Image model default: gemini-2.5-flash-image (cheap and fast)
Video model default: veo-3.1-lite-generate-preview (cheaper than fast/quality)

The script also auto-falls back to cheaper available models if your chosen model is unavailable.

If a Veo model returns an empty video payload for a scene, keep auto-fallback enabled (--auto-video-fallback, on by default) so the script retries with the other available Veo models automatically.
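
The fallback behavior can be sketched as a loop that tries each model in order and treats an empty payload as a failure. Nothing here calls a real Google API: `generate_with_fallback` and the `generate` callable are hypothetical stand-ins, and the script's actual retry logic may differ.

```python
from typing import Callable, Optional, Sequence, Tuple


def generate_with_fallback(
    models: Sequence[str],
    generate: Callable[[str], Optional[bytes]],
) -> Tuple[str, bytes]:
    """Try each model in order; return (model, payload) for the first non-empty result."""
    errors = {}
    for model in models:
        try:
            payload = generate(model)
        except Exception as exc:  # an unavailable model counts as a failed attempt
            errors[model] = str(exc)
            continue
        if payload:
            return model, payload
        errors[model] = "empty video payload"
    raise RuntimeError(f"all models failed: {errors}")
```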

For full-body output, prompts are constrained to keep head-to-toe framing in both the static frame stage and the animation stage.

Step 2: Key raw clips for reuse

Place raw Veo MP4 clips in raw_clips/ and run:

uv run python pipeline.py setup --raw_clips raw_clips/ --keyed_clips keyed_clips/ --library clip_library.json

This creates keyed PNG sequences and skips clips that were already processed.

Optional chroma tuning at setup time:

uv run python pipeline.py setup \
  --raw_clips raw_clips/ \
  --keyed_clips keyed_clips/ \
  --hue-low 35 --hue-high 85 --feather 3
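
To build intuition for the hue flags, here is a pure-Python sketch of the masking decision for a single pixel. It assumes --hue-low and --hue-high use OpenCV's 0-179 hue scale, which is not confirmed; the saturation and value thresholds are hypothetical. The real chroma_key.py also handles feathering and spill suppression, which are not shown.

```python
import colorsys


def green_alpha(r, g, b, hue_low=35, hue_high=85, min_sat=0.3, min_val=0.2):
    """Return 0 (transparent) for green-screen pixels, 255 (opaque) otherwise."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    hue = h * 180  # map 0..1 to an OpenCV-style 0..180 hue scale (assumption)
    is_green = hue_low <= hue <= hue_high and s >= min_sat and v >= min_val
    return 0 if is_green else 255


def key_pixels(pixels, **thresholds):
    """Convert an iterable of (r, g, b) pixels into (r, g, b, a) pixels."""
    return [(r, g, b, green_alpha(r, g, b, **thresholds)) for r, g, b in pixels]
```

Narrowing the hue range keeps more of the avatar's yellow-green or teal details opaque, at the cost of leaving more background fringe.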

Step 3: Build clip index

After keying, map your clip names/tags in clip_library.py (default map provided), then build:

uv run python clip_library.py --build --clips_dir keyed_clips/ --raw_dir raw_clips/ --out clip_library.json

Check library summary:

uv run python clip_library.py --summary --out clip_library.json
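
Conceptually, the index maps clip names to tags so a renderer can look up a clip by expression. The sketch below uses a hypothetical schema (`tags`, `loopable`) and hypothetical clip names; the actual fields in clip_library.json may differ.

```python
# Hypothetical index entries; the real clip_library.json schema may differ.
EXAMPLE_LIBRARY = {
    "idle_01": {"tags": ["idle"], "loopable": True},
    "talk_med_01": {"tags": ["talk_med", "speech"], "loopable": True},
    "wave_01": {"tags": ["wave", "reaction"], "loopable": False},
}


def find_clip(library, tag, fallbacks=("talk_med", "idle")):
    """Return the first clip carrying `tag`, else the first clip matching a fallback tag."""
    for wanted in (tag, *fallbacks):
        for name, meta in library.items():
            if wanted in meta["tags"]:
                return name
    return None
```

For example, `find_clip(EXAMPLE_LIBRARY, "shrug")` finds no shrug clip and returns the talking clip instead.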

Step 4: Analyze voiceover (optional)

uv run python pipeline.py analyse --audio voiceover.wav --output segments.json

This writes segment timings and suggested expression tags to segments.json.
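
The core of segment-level analysis is an energy measure per time window mapped to an expression tag. The following is a minimal standard-library sketch of that idea, not the code in audio_analyzer.py (which also uses pitch and speech detection). The tag names and thresholds are hypothetical.

```python
import math


def segment_rms(samples, sample_rate, segment_seconds=1.0):
    """Split mono samples (floats in -1..1) into fixed windows and compute RMS for each."""
    size = max(1, int(sample_rate * segment_seconds))
    segments = []
    for start in range(0, len(samples), size):
        chunk = samples[start:start + size]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        segments.append({
            "start": start / sample_rate,
            "end": (start + len(chunk)) / sample_rate,
            "rms": rms,
        })
    return segments


def energy_tag(rms, silence=0.01, low=0.05, high=0.15):
    """Map an RMS level to a tag. Thresholds and tag names are illustrative."""
    if rms < silence:
        return "idle"
    if rms < low:
        return "talk_low"
    if rms < high:
        return "talk_med"
    return "talk_high"
```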

Step 5: Render a new video

uv run python pipeline.py render \
  --audio voiceover.wav \
  --background bg.mp4 \
  --output final.mp4 \
  --library clip_library.json \
  --scale 0.35 \
  --position bottom_right \
  --verbose

Without --background, the renderer uses a black fallback background.
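
To show what --scale and --position control, here is a sketch of placing and blending the avatar. It assumes --scale is a fraction of the background height, which is a guess; `overlay_rect`, `alpha_over`, the `margin` parameter, and every position name other than bottom_right are hypothetical.

```python
def overlay_rect(bg_w, bg_h, av_w, av_h, scale=0.35, position="bottom_right", margin=0):
    """Return (x, y, w, h) of the avatar on the background, preserving aspect ratio."""
    h = round(bg_h * scale)  # assumption: scale is relative to background height
    w = round(av_w * h / av_h)
    positions = {
        "bottom_right": (bg_w - w - margin, bg_h - h - margin),
        "bottom_left": (margin, bg_h - h - margin),
        "top_right": (bg_w - w - margin, margin),
        "top_left": (margin, margin),
    }
    if position not in positions:
        raise ValueError(f"unknown position: {position}")
    x, y = positions[position]
    return x, y, w, h


def alpha_over(fg, bg):
    """Blend one RGBA foreground pixel over one RGB background pixel."""
    a = fg[3] / 255
    return tuple(round(f * a + b * (1 - a)) for f, b in zip(fg[:3], bg))
```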

How transitions stay smooth

Segment-level clip selection picks the clip whose expression and energy best fit each audio segment.
Loop-safe clips can repeat with minimal seam artifacts.
Different adjacent clips get a cached 12-frame alpha transition (transition_cache/).
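
A 12-frame alpha transition can be pictured as a linear cross-fade between the last frame of one clip and the first frame of the next. This sketch works on tiny lists of RGBA tuples to stay dependency-free; `transition_frames` is a hypothetical name and the linear ramp is an assumption about how transition_engine.py blends.

```python
def transition_frames(last_frame, first_frame, n_frames=12):
    """Return n_frames intermediate frames fading from last_frame to first_frame.

    Each frame is a list of (r, g, b, a) tuples. The weights exclude 0 and 1,
    so the blend never duplicates either source frame.
    """
    frames = []
    for i in range(1, n_frames + 1):
        t = i / (n_frames + 1)  # blend weight of the incoming clip
        frames.append([
            tuple(round((1 - t) * a + t * b) for a, b in zip(pixel_a, pixel_b))
            for pixel_a, pixel_b in zip(last_frame, first_frame)
        ])
    return frames
```

Because the alpha channel is blended along with the color channels, the avatar's silhouette also morphs gradually instead of popping between poses.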

Notes

clip_library.json is your reusable asset index. Keep it versioned with your clip pack.
If a clip tag is missing, selection falls back to the nearest speech clip or to idle.
If a transition was already generated once, future renders reuse the cached sequence.
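
Transition reuse amounts to a build-once cache keyed by the clip pair. The sketch below assumes one directory per (from, to) pair inside transition_cache/; the naming scheme and the `get_or_build` helper are hypothetical, not the layout the pipeline actually uses.

```python
from pathlib import Path


def transition_dir(cache_root, from_clip, to_clip):
    """Directory that would hold the blended frames for one clip pair (hypothetical layout)."""
    return Path(cache_root) / f"{from_clip}__to__{to_clip}"


def get_or_build(cache_root, from_clip, to_clip, build):
    """Return (directory, built). Calls build(directory) only when the cache is empty."""
    target = transition_dir(cache_root, from_clip, to_clip)
    if target.is_dir() and any(target.iterdir()):
        return target, False  # cache hit: reuse existing frames
    target.mkdir(parents=True, exist_ok=True)
    build(target)
    return target, True
```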
