Flow Kit
Claude Code · Codex CLI · Gemini CLI

Turn your AI CLI into a Cinematic Video Studio

Flow Kit is an open-source agent + Chrome extension that drives Google Flow end-to-end. Reference images keep characters consistent, chained videos flow seamlessly from scene to scene, and YouTube publishing runs itself, all through /fk-* commands in your AI CLI.

MIT licensed · 34 skills · 40+ REST routes · Veo 3.1 · dual-orientation output

claude-code

# Story → YouTube in one session

> /fk-create-project "The Lost City of Petra"

> /fk-pipeline # batch refs → images → videos, retries handled

> /fk-gen-narrator # TTS narration in 600+ languages

> /fk-concat-fit-narrator # trim + xfade to narrator timing

> /fk-youtube-upload # rule-validated publish

The Real Pain of Multi-Scene AI Video

You didn't get into video to wire APIs together. Flow Kit takes the glue work off your plate.

Characters Drift Between Scenes

Every scene you generate gives you a slightly different face, outfit, and lighting. Your hero becomes a stranger by scene 6. Prompts can't carry identity; only reference images can.

Expired URLs & CAPTCHA Walls

GCS signed URLs expire in ~1 hour. Uploaded media_ids start returning 'not found'. reCAPTCHA blocks half your batch. Manually babysitting a Flow tab is not a workflow.

YouTube Publishing Is a Second Job

ffmpeg concats, xfade transitions, TTS at 48kHz, brand logo overlays, thumbnail variants, SEO tags, channel upload rules, token refresh — every step a different tool.

Flow Kit solves all three inside the worker — references per entity, auto URL recovery, one upload pipeline.

Flow Kit: Your AI Video Production Team

A Chrome extension captures the Flow bearer token and solves reCAPTCHA. A FastAPI agent orchestrates every generation. Your CLI just calls the skills.

34
AI Skills

Pre-built /fk-* workflows: refs, scene images, video chaining, TTS, music, thumbnails, YouTube upload — and a doctor that triages errors.

40+
REST Routes

FastAPI surface covering projects, videos, scenes, characters, requests (with /batch), materials, models, reviews, and flow health.

3
AI CLIs

Skills are CLI-agnostic markdown recipes. Claude Code, Codex CLI, and Gemini CLI all drive the same agent without modification.

10+
Visual Styles

Materials drive style project-wide: realistic, 3d_pixar, anime, stop_motion, minecraft, oil_painting, or register your own.

One-command setup (./setup.sh) · Dual-orientation (9:16 + 16:9) from one project · Self-healing pipeline (/fk-doctor)

34 Skills That Do the Heavy Lifting

Every skill is a battle-tested workflow recipe. Invoke /fk-* in Claude Code, Codex CLI, or Gemini CLI — same skills, same results.

Core Pipeline (5)

/fk-create-project

Create Project

Interactive wizard that sets up a project with entities (characters, locations, assets), dual-orientation videos, and scenes with chain_type ROOT/CONTINUATION.

/fk-gen-refs

Generate References

Generate one reference image per entity (portrait for characters/assets, landscape for locations). Verifies all responses return UUID media_id before proceeding.

/fk-gen-images

Generate Scene Images

Generate frame-0 images for every scene with all referenced entities applied via imageInputs. Blocks if any required ref is missing a UUID media_id.

/fk-gen-videos

Generate Videos

Animate scene images into 8s video clips via Veo 3.1 (i2v). Polls until complete. Auto-appends voice_description and 'no background music' rules to prompts.

/fk-concat

Concat Final Video

Download every scene video, normalize with ffmpeg, and concatenate into a final cut. Preserves sound effects and adds no background music by default.

Advanced Video (6)

/fk-gen-chain-videos

Chain Videos (i2v_fl)

Generate videos with start+end frame chaining for smooth transitions between CONTINUATION scenes. Uses transition_prompt to control visual smoothness.

/fk-insert-scene

Insert Scene

Insert cutaways, close-ups, or multi-angle shots into an existing chain as INSERT children. Maintains ROOT/CONTINUATION integrity.

/fk-creative-mix

Creative Mix

Analyze your story and suggest a mix of techniques for cinematic impact: chain transitions, inserts, r2v intros, and parallel multi-scene generation.

/fk-review-video

Review Video

Claude Vision quality pass before upscale: flags artifacts, continuity breaks, motion glitches, and off-brief compositions for regeneration.

/fk-review-board

Review Board

Visual scene review board for bulk feedback. Flag scenes for REGENERATE_VIDEO or REGENERATE_IMAGE with targeted notes, batched in one pass.

/fk-add-material

Material Style

Register or switch visual material: realistic, 3d_pixar, anime, stop_motion, minecraft, oil_painting, or custom. Controls both image_prompt and scene_prefix.

TTS & Narration (6)

/fk-gen-tts-template

Voice Template

Create a reusable OmniVoice anchor WAV for consistent narration. Zero-shot voice cloning across 600+ languages. Re-used per scene at generation time.

/fk-gen-narrator

Generate Narrator

Auto-write per-scene narrator_text, then run TTS using the project voice template. Skips interview scenes and preserves original audio when flagged.

/fk-gen-text-overlays

Text Overlays

Extract dates, locations, stats and callouts from narrator text and schedule timed text overlays for the final cut.

/fk-concat-fit-narrator

Concat Fit Narrator

Trim every scene video to match its narrator TTS duration with xfade cross-dissolve on chains, burn text overlays, and concat at 48kHz audio.
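The timing arithmetic behind a chain of cross-dissolves can be sketched as follows. This is a minimal illustration, not the skill's actual code: it assumes every clip boundary gets the same fade length, and the function names are hypothetical. The resulting numbers are the kind of values one would pass to the `offset` option of ffmpeg's `xfade` filter.

```python
from typing import List


def xfade_offsets(durations: List[float], fade: float) -> List[float]:
    """Start time (seconds) of each cross-dissolve when clips are chained in order.

    The k-th fade starts once k clips have played, minus the k overlaps
    already consumed (k - 1 earlier fades plus the one starting now).
    """
    if any(d <= fade for d in durations):
        raise ValueError("every clip must be longer than the fade")
    offsets: List[float] = []
    elapsed = 0.0
    for k, d in enumerate(durations[:-1], start=1):
        elapsed += d
        offsets.append(round(elapsed - k * fade, 3))
    return offsets


def total_duration(durations: List[float], fade: float) -> float:
    """Length of the final cut: each fade overlaps two clips once."""
    overlaps = max(len(durations) - 1, 0)
    return round(sum(durations) - fade * overlaps, 3)


# Hypothetical narrator durations for three scenes, 0.5 s dissolve.
print(xfade_offsets([6.0, 5.5, 7.0], 0.5))   # [5.5, 10.5]
print(total_duration([6.0, 5.5, 7.0], 0.5))  # 17.5
```

Because each dissolve overlaps two clips, the final cut is shorter than the sum of the narrator durations, which matters when scheduling text overlays against the narration.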

/fk-gen-music

Generate Music

Generate a Suno chirp-v4 soundtrack matched to video mood. Uses sunoapi.org. Only applied when project.allow_music is enabled.

/fk-import-voice

Import Voice

Register an existing WAV as a project voice template without re-recording. Handy for reusing voice anchors from prior projects.

YouTube (4)

/fk-youtube-seo

YouTube SEO

Generate niche-aware title, description, hashtags, and tag list. Validates total tag-char budget (<=500 with quote overhead) to avoid invalidTags.
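A rough sketch of how the tag budget can be counted, following the YouTube Data API rule that commas between tags count toward the limit and that a tag containing a space is counted as if wrapped in quotation marks. The function names and the drop-what-does-not-fit strategy are assumptions for illustration, not necessarily what the skill does.

```python
from typing import List

TAG_BUDGET = 500  # limit quoted above


def tag_chars(tags: List[str]) -> int:
    """Characters the tag list consumes under the API's counting rules."""
    total = 0
    for tag in tags:
        total += len(tag)
        if " " in tag:
            total += 2  # counted as if wrapped in quotation marks
    total += max(len(tags) - 1, 0)  # commas between items
    return total


def fit_tags(tags: List[str], budget: int = TAG_BUDGET) -> List[str]:
    """Keep tags in priority order, skipping any that would exceed the budget."""
    kept: List[str] = []
    for tag in tags:
        if tag_chars(kept + [tag]) <= budget:
            kept.append(tag)
    return kept


print(tag_chars(["Foo-Baz"]))  # 7
print(tag_chars(["Foo Baz"]))  # 9
```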

/fk-brand-logo

Brand Logo

Overlay channel logo intro/outro (220px or 4K badge from channel directory) on final video and thumbnails, covering the Veo watermark.

/fk-youtube-upload

YouTube Upload

Upload via YouTube Data API v3 with OAuth2. Auto-detects Shorts vs long-form, enforces per-channel max-per-day, min-gap, and avoid-hour rules.
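The three per-channel rules can be pictured as a single pre-upload check. The sketch below is an assumption about how such a check might look; `ChannelRules`, `upload_blocker`, and the default values are hypothetical and are not taken from the project's rule files.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional, Set


@dataclass
class ChannelRules:
    # Hypothetical defaults; real values would come from the channel's config.
    max_per_day: int = 3
    min_gap: timedelta = timedelta(hours=4)
    avoid_hours: Set[int] = field(default_factory=set)


def upload_blocker(
    now: datetime, history: List[datetime], rules: ChannelRules
) -> Optional[str]:
    """Return the rule that blocks an upload right now, or None if allowed."""
    if now.hour in rules.avoid_hours:
        return "avoid-hour"
    uploads_today = [t for t in history if t.date() == now.date()]
    if len(uploads_today) >= rules.max_per_day:
        return "max-per-day"
    if history and now - max(history) < rules.min_gap:
        return "min-gap"
    return None


rules = ChannelRules(max_per_day=2, min_gap=timedelta(hours=3), avoid_hours={2, 3})
noon = datetime(2025, 1, 10, 12, 0)
print(upload_blocker(noon, [datetime(2025, 1, 10, 10, 0)], rules))  # min-gap
```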

/fk-thumbnail

Thumbnails

Generate 4 AI thumbnail variants optimized for CTR with branding and text overlays. Pulls channel design rules from channel_rules.json.

Orchestration (5)

/fk-pipeline

Full Pipeline

Auto-detect current project state and run the remaining stages in order. Uses POST /api/requests/batch so the worker handles throttling (5 concurrent, 10s cooldown).
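To make the throttling numbers concrete, here is a minimal asyncio sketch of "5 concurrent, 10s cooldown". It assumes the cooldown means each slot stays occupied for a pause after its request finishes; the worker's real semantics may differ, and `run_throttled` is a hypothetical name.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def run_throttled(
    jobs: List[Callable[[], Awaitable[T]]],
    max_concurrent: int = 5,
    cooldown_s: float = 10.0,
) -> List[T]:
    """Run jobs with at most `max_concurrent` in flight; results keep job order."""
    sem = asyncio.Semaphore(max_concurrent)

    async def run_one(job: Callable[[], Awaitable[T]]) -> T:
        async with sem:
            try:
                return await job()
            finally:
                # Assumed cooldown: hold the slot before the next job may start.
                await asyncio.sleep(cooldown_s)

    return list(await asyncio.gather(*(run_one(j) for j in jobs)))


async def fake_generation() -> str:
    await asyncio.sleep(0.01)  # stands in for a real generation request
    return "done"


if __name__ == "__main__":
    print(asyncio.run(run_throttled([fake_generation] * 8, cooldown_s=0.0)))
```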

/fk-monitor

Pipeline Monitor

Poll project pipeline state, detect transitions, send desktop notifications, and optionally auto-download completed upscales.

/fk-status

Status Dashboard

Print a full project dashboard: scene progress, entity refs, pending requests, error history, and the recommended next skill to invoke.

/fk-switch-project

Switch Project

Switch the active project across the agent so subsequent skills target the right slug automatically. Reads/writes the active-project API.

/fk-dashboard

Live Dashboard

Render a live Flow Kit statusline in Claude Code so progress, failures, and retries are visible inline while you work.

Diagnostics (3)

/fk-doctor

Doctor

Auto-diagnose pipeline errors across Flow backend, Chrome extension, worker, and YouTube upload. Prescribes targeted fixes — invoke before guessing.

/fk-fix-uuids

Fix UUIDs

Find media_id values stuck in CAMS... form, extract the real UUID from fifeUrl, and backfill scenes + entities so downstream gens succeed.
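The backfill idea can be illustrated with a small helper. The exact shape of a fifeUrl is not documented here, so this sketch simply assumes the real UUID appears somewhere in the URL in canonical hyphenated form; the function names and the example URL are hypothetical.

```python
import re
from typing import Optional

UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def uuid_from_url(url: str) -> Optional[str]:
    """First UUID-shaped substring of the URL, lowercased, or None."""
    match = UUID_RE.search(url)
    return match.group(0).lower() if match else None


def repair_media_id(media_id: str, fife_url: str) -> str:
    """Keep a valid UUID as-is; otherwise try to recover one from the URL."""
    if UUID_RE.fullmatch(media_id):
        return media_id.lower()
    return uuid_from_url(fife_url) or media_id


# Hypothetical URL shape, for illustration only.
url = "https://example.invalid/image/3f2b8c1e-4d5a-4b6c-8d7e-9f0a1b2c3d4e?sig=abc"
print(repair_media_id("CAMSabc123", url))  # 3f2b8c1e-4d5a-4b6c-8d7e-9f0a1b2c3d4e
```

If no UUID can be recovered, the sketch returns the original value unchanged so the caller can flag the record instead of silently writing a guess.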

/fk-refresh-urls

Refresh URLs

Recover expired GCS signed URLs for scenes and entity refs (roughly 1h TTL). Handles the 'Requested entity was not found' recovery path.
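One way to decide when a URL needs refreshing is to read its expiry from the query string. The sketch below assumes the URLs follow the Google Cloud Storage V4 signing scheme, which carries `X-Goog-Date` and `X-Goog-Expires` parameters; whether Flow's URLs use exactly this form is an assumption, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import parse_qs, urlparse


def signed_url_expiry(url: str) -> Optional[datetime]:
    """Expiry of a V4-style signed URL, or None if the parameters are absent."""
    query = parse_qs(urlparse(url).query)
    try:
        issued = datetime.strptime(query["X-Goog-Date"][0], "%Y%m%dT%H%M%SZ")
        ttl_s = int(query["X-Goog-Expires"][0])
    except (KeyError, ValueError):
        return None
    return issued.replace(tzinfo=timezone.utc) + timedelta(seconds=ttl_s)


def needs_refresh(
    url: str, now: Optional[datetime] = None, margin_s: int = 300
) -> bool:
    """True if the URL is expired, close to expiry, or of unknown format."""
    expiry = signed_url_expiry(url)
    if expiry is None:
        return True  # unknown format: refresh to be safe
    now = now or datetime.now(timezone.utc)
    return now >= expiry - timedelta(seconds=margin_s)
```

The five-minute margin is an arbitrary illustrative value; it avoids starting a download with a URL that will expire mid-transfer.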

Utilities (2)

/fk-change-model

Change Model

Inspect and switch video/image/upscale models per project via models.json. Downgrades automatically if your account tier lacks access to the selected model.

/fk-upload-image

Upload Image

Upload a local PNG/JPG to Google Flow and receive a UUID media_id. Use for bespoke scene images, manual refs, or brand assets.

Reference (3)

/fk-camera-guide

Camera Guide

Reference for cinematic video prompts: camera angles, movements, lighting, depth of field, and shot-timing patterns for 8s clips.

/fk-thumbnail-guide

Thumbnail Guide

Design rules for CTR-optimized YouTube thumbnails: composition, color psychology, hook phrases, and required branding elements.

/fk-research

Research

Fact-check dates, names, and events before writing documentary content. Pulls citations and flags real-people bypass rules for safety filters.

From Story to YouTube in 6 Stages

Each command handles one stage. The worker batches, throttles, and retries for you — 5 concurrent requests, 10s cooldown, 5 retries with exponential backoff.
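As an illustration of "5 retries with exponential backoff", the sketch below doubles the wait after each failure. The 2-second base and 60-second cap are assumptions chosen for the example, not the worker's documented values, and the function names are hypothetical.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int = 5, base_s: float = 2.0, cap_s: float = 60.0) -> List[float]:
    """Wait before each retry: base, 2x base, 4x base, ... capped at cap_s."""
    return [min(base_s * 2 ** n, cap_s) for n in range(retries)]


def call_with_retries(
    fn: Callable[[], T],
    retries: int = 5,
    base_s: float = 2.0,
    cap_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn once, then retry up to `retries` times; re-raise the last error."""
    delays = backoff_delays(retries, base_s, cap_s)
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise
            sleep(delays[attempt])
    raise AssertionError("unreachable")


print(backoff_delays())  # [2.0, 4.0, 8.0, 16.0, 32.0]
```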

01
/fk-create-project

Define Project & Entities

Name the project, pick a material (realistic, 3d_pixar, anime…), declare every entity that should stay consistent — characters, locations, props — and write your scenes with chain_type ROOT or CONTINUATION.

One project → both 9:16 Shorts and 16:9 long-form
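A project definition of this kind could be modelled roughly as below. Only `chain_type`, `ROOT`, `CONTINUATION`, `character_names`, and `media_id` come from the text above; every class name, the remaining fields, and the rule that the first scene must be ROOT are assumptions made for illustration, not the agent's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ChainType(str, Enum):
    ROOT = "ROOT"
    CONTINUATION = "CONTINUATION"


@dataclass
class Entity:
    name: str
    kind: str  # "character", "location", or "asset"
    media_id: Optional[str] = None  # filled in once a reference image exists


@dataclass
class Scene:
    prompt: str  # describes action only; appearance comes from the refs
    chain_type: ChainType
    character_names: List[str] = field(default_factory=list)


@dataclass
class Project:
    name: str
    material: str
    entities: List[Entity] = field(default_factory=list)
    scenes: List[Scene] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return human-readable problems; an empty list means the project is consistent."""
        problems: List[str] = []
        known = {e.name for e in self.entities}
        # Assumption: a chain cannot begin with a CONTINUATION scene.
        if self.scenes and self.scenes[0].chain_type is not ChainType.ROOT:
            problems.append("first scene must be ROOT")
        for i, scene in enumerate(self.scenes):
            for name in scene.character_names:
                if name not in known:
                    problems.append(f"scene {i}: unknown entity {name!r}")
        return problems
```

Validating entity names up front catches a scene that refers to a character never declared, before any generation credits are spent.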

02
/fk-gen-refs

Generate Reference Images

One reference per entity: portrait for characters and assets, landscape for locations. The worker verifies every response returns a UUID media_id before the pipeline can continue.

Powers visual consistency across every downstream scene

03
/fk-gen-images

Compose Scene Images

Each scene image is generated with every entity listed in character_names passed as imageInputs. Scene prompts describe action only — the refs carry appearance.

Blocks automatically if any referenced entity is missing a UUID
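The gating check can be sketched in a few lines. This is an illustrative version that treats only canonical hyphenated UUID strings as valid; the function names and the sample data are hypothetical.

```python
import uuid
from typing import Dict, List, Optional


def is_uuid(value: Optional[str]) -> bool:
    """True only for canonical 8-4-4-4-12 hyphenated UUID strings."""
    if not value:
        return False
    try:
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


def missing_refs(
    character_names: List[str], media_ids: Dict[str, Optional[str]]
) -> List[str]:
    """Entities a scene needs whose reference has no usable UUID media_id."""
    return [name for name in character_names if not is_uuid(media_ids.get(name))]


# Hypothetical project state: one good ref, one non-UUID id, one not yet generated.
refs = {
    "Layla": "3f2b8c1e-4d5a-4b6c-8d7e-9f0a1b2c3d4e",
    "Temple": "CAMSabc123",
    "Camel": None,
}
print(missing_refs(["Layla", "Temple", "Camel"], refs))  # ['Temple', 'Camel']
```

A scene would proceed to generation only when the returned list is empty.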

04
/fk-gen-videos

Animate into 8s Clips

Veo 3.1 turns each scene image into an 8-second clip (i2v). Use /fk-gen-chain-videos for i2v_fl start+end frame chaining with transition_prompt between CONTINUATION scenes.

Voice description auto-appended, no background music by default

05
/fk-gen-narrator

Narrator + Overlays (optional)

Auto-write per-scene narrator_text, clone your voice template via OmniVoice (600+ languages), then extract dates, locations, and stats for timed text overlays.

Skips interview scenes, preserves original audio when flagged

06
/fk-concat-fit-narrator

Trim, Xfade & Publish

Trim every scene to narrator duration, xfade cross-dissolve on chained segments, burn overlays, apply brand logo, generate thumbnails, and upload to YouTube under your channel's rules.

Auto-detects Shorts (<61s + 9:16) vs long-form
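The detection rule stated above (under 61 seconds and a 9:16 frame) reduces to a one-line predicate. The function names are hypothetical, and the sketch requires an exact 9:16 ratio, which a real implementation might relax.

```python
def is_short(duration_s: float, width: int, height: int) -> bool:
    """True for clips under 61 s whose frame is exactly 9:16 (portrait)."""
    return duration_s < 61 and width * 16 == height * 9


def upload_kind(duration_s: float, width: int, height: int) -> str:
    return "short" if is_short(duration_s, width, height) else "long-form"


print(upload_kind(58.0, 1080, 1920))   # short
print(upload_kind(240.0, 1920, 1080))  # long-form
```

Comparing cross-products (`width * 16 == height * 9`) avoids floating-point division when checking the aspect ratio.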

A Three-Layer Architecture

Flow Kit does not reimplement Google Flow. It automates the browser session you already have, then wraps the pipeline in a local agent you can drive from any AI CLI.

  ┌──────────────────┐
  │  AI CLI (you)    │
  │  Claude · Codex  │
  │  · Gemini        │
  │                  │
  │  /fk-* skills    │
  └────────┬─────────┘
           │  REST :8100
           ▼
  ┌──────────────────┐    WebSocket :9222    ┌───────────────────────┐
  │  FastAPI Agent   │                       │  Chrome Extension     │
  │                  │  ──── commands ─────► │  (MV3 Service Worker) │
  │  • Queue worker  │  ◄─── results ─────── │                       │
  │  • Error router  │                       │  • Token capture      │
  │  • ffmpeg / TTS  │                       │  • reCAPTCHA solve    │
  │  • aiosqlite DB  │                       │  • API proxy          │
  └──────────────────┘                       │  on labs.google/fx    │
                                             └───────────────────────┘

Chrome Extension (MV3)

  • Captures the Google Flow bearer token from labs.google/fx/tools/flow
  • Solves reCAPTCHA in the background and retries up to 10× without burning the retry budget
  • Proxies every Flow API call via a service worker — agent never hits labs.google directly

FastAPI Agent + SQLite

  • REST API on :8100 for projects, videos, scenes, characters, requests, materials, reviews
  • Queue worker: 5 concurrent, 10s cooldown, 20+ error patterns routed to targeted handlers
  • aiosqlite schema with dual-orientation video/scene columns and chain_type linkage
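Routing error patterns to handlers can be pictured as an ordered table of regular expressions. The sketch below is a guess at the general shape only: the three patterns are built from error strings mentioned elsewhere on this page, while the action names, the fallback, and `route_error` itself are hypothetical.

```python
import re
from typing import List, Tuple

# (pattern, action) pairs, checked in order; first match wins.
ROUTES: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"requested entity was not found", re.IGNORECASE), "refresh_urls"),
    (re.compile(r"recaptcha", re.IGNORECASE), "retry_captcha"),
    (re.compile(r"\bCAMS"), "fix_uuids"),
]

DEFAULT_ACTION = "retry_with_backoff"  # assumed fallback for unrecognised errors


def route_error(message: str) -> str:
    """Map a raw error message to the name of the handler that should run."""
    for pattern, action in ROUTES:
        if pattern.search(message):
            return action
    return DEFAULT_ACTION


print(route_error("404: Requested entity was not found."))  # refresh_urls
print(route_error("upstream timeout"))                      # retry_with_backoff
```

Keeping the table ordered lets specific patterns take priority over broad ones, and adding a new error class means adding one row.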

AI CLI (you)

  • Claude Code loads CLAUDE.md · Codex reads AGENTS.md · Gemini reads GEMINI.md
  • Each /fk-* skill is a markdown recipe — same skills drive every CLI
  • /fk-doctor triages failures before you re-run anything

CLI-agnostic by design

Every skill is a plain markdown recipe under skills/. Any AI CLI that can read files can drive the full pipeline.

Claude Code
CLAUDE.md

Auto-loaded. Native /fk-* slash commands in .claude/commands/.

Codex CLI
AGENTS.md

Agent reads the markdown skill file for the invoked command.

Gemini CLI
GEMINI.md

Same recipe pattern — skill files drive Flow Kit the same way.

Install in Four Steps

Flow Kit is MIT-licensed. Python 3.10+, ffmpeg, and Chrome are required. Windows users should run it inside WSL.

1

Clone + one-command setup

$ git clone https://github.com/tuannguyenhoangit-droid/google-flow-agent.git

$ cd google-flow-agent

$ ./setup.sh # installs Python deps, verifies ffmpeg + Chrome

2

Load the Chrome extension

# chrome://extensions → Developer mode → Load unpacked

# Point to the ./extension/ folder in the repo

# Open https://labs.google/fx/tools/flow and sign in

3

Start the agent

$ source venv/bin/activate

$ python -m agent.main

$ curl http://127.0.0.1:8100/health

# {"status":"ok","extension_connected":true}
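The same readiness check can be scripted, for example before kicking off a long pipeline run. This sketch uses only the standard library and the response shape shown above; the function names and the 3-second timeout are assumptions.

```python
import json
import urllib.request


def parse_health(body: str) -> bool:
    """True only if the agent is up and the Chrome extension is connected."""
    try:
        data = json.loads(body)
    except ValueError:
        return False
    return (
        isinstance(data, dict)
        and data.get("status") == "ok"
        and data.get("extension_connected") is True
    )


def agent_ready(url: str = "http://127.0.0.1:8100/health", timeout_s: float = 3.0) -> bool:
    """Query the health endpoint; any network error counts as not ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return parse_health(resp.read().decode("utf-8"))
    except OSError:
        return False


if __name__ == "__main__":
    print("ready" if agent_ready() else "agent or extension not ready")
```

If `extension_connected` is false, the agent is running but the Flow tab or extension from step 2 is not attached yet.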

4

Drive it from your AI CLI

# Claude Code auto-loads CLAUDE.md. Codex / Gemini read AGENTS.md / GEMINI.md.

> /fk-create-project "My First Flow Kit Video"

> /fk-pipeline # batched refs → images → videos

MIT License · Open Source · Contributions Welcome

What Creators Are Saying

Real feedback from video creators running Flow Kit in production.

I used to spend 4-5 hours manually prompting each scene. With Flow Kit, my entire 12-scene documentary was done in under 30 minutes. The reference image system alone is the killer feature.

Alex T.

Discord

i2v_fl chaining with transition_prompt is insane. Every cut looks intentional. I haven't opened ffmpeg manually since I found Flow Kit.

Maria S.

Twitter

create-project to youtube-upload in one session. I shipped 3 videos yesterday that would have taken me a week before Flow Kit.

David L.

Discord

Dual-orientation output from one project is the game-changer. Same scenes, both 9:16 Shorts and 16:9 long-form. No extra work.

Sarah K.

Twitter

/fk-doctor saved me. First time my pipeline failed it told me exactly which extension error hit and what to do. Self-healing pipelines are the future.

James R.

Discord

OmniVoice with 600+ languages opened markets I couldn't reach before. I publish the same project in English, Vietnamese, and Spanish — same character voices, three upload chains.

Nina P.

Facebook