Use Case

Editing a multi-camera YouTube interview with AI.

How Vidiot turned a 20-minute, two-camera interview (39 minutes of raw footage across both cameras) into a finished episode of Hanley Talks AI: 20 Under 20 in a single Claude Code session.

The project

20 Under 20, Episode 2.
Matthew Chan, age 15.

A YouTube interview series featuring remarkable young people. Matthew flew from Vancouver to Hong Kong to speak at a Lunatechs event about building games with Claude Code. After his talk, we sat down and filmed the episode with two cameras and wireless mics.

Interview at Lunatechs Hong Kong

Lunatechs HK · Compass Offices

Matthew Chan (right) and host Hanley Leung filming 20 Under 20, Episode 2

Two cameras, same moment
iPhone close-up of Matthew Chan
iPhone 17 Pro Max · 1080p 30fps
DJI wide shot at Lunatechs
DJI Pocket 3 · 4K 29.97fps
What went in vs what came out
Raw footage
39 minutes of raw footage
7,217 words spoken
8.5 GB of source files
Vidiot output
27 scripted sections
13:46 final episode
graphics & overlays

How Vidiot works under the hood

You: natural language, e.g. “Edit this interview like ep 1”
→ Claude Code: AI orchestrator. Reads CLAUDE.md, calls Vidiot tools, makes editorial decisions.
→ Vidiot: native macOS engine. Whisper · Vision · Core Graphics · AVFoundation · MossFormer2 · HTML template renderer
→ Output: finished video. 1080p export with graphics, music, captions.
AI analysis

What Vidiot discovered in the raw footage.

🗣 7,217 words transcribed by Whisper
🎬 Frames extracted for analysis
👁 12 topic clusters detected
6 shot changes per camera
“The impatient doer will always do better than the overthinking high achiever.”
Matthew Chan, from the episode — selected by Vidiot as the cold open hook
Script structure

27 sections, structured by AI from the raw transcript.

Hook 1: "Keep working hard" · H1
Hook 2: "Managing a team at 12" · H2
Hook 3: "Just build" · H3
Host intro & series context · Intro
Sticker philosophy: work hard · Sticker
Emotional core: work hard even when crying · Emotion
HKU Game Jam story: 1st place · Game Jam
Webcam gesture game innovation · Webcam
Origin: Hong Kong → Vancouver · Origin
Roblox: managed a team at 12, made $5K · Roblox
Vibe coding journey: from school to passion · Vibe Code
Success = making people happy · Success
College: connections over credentials · College
Letter to future self: hopes to be in SF at 20 · Letter
Advice to young people · Advice
Advice to parents · Parents
Quick fire round · Quick Fire
Red Flag / Green Flag game · RF/GF
Subscribe CTA · CTA
Hooks · Intro · Chapters · Emotional peaks · Games · CTA
Whisper · AVFoundation · Core Graphics · Vision · MossFormer2 · Metal · Core ML · WebKit
How this episode was edited

One prompt.
Four automated stages.

What you do
“Turn this interview into a YouTube episode like ep 1”
Drop your footage folder. Type one sentence. That's it.
What Vidiot does automatically
01
Analyze
Whisper transcribes all audio. Vision detects faces and shot changes. Every clip is cataloged with word-level timestamps.
02
Structure
Claude reads the transcript and structures a narrative. Cold open, chapters, conclusion. Editorial decisions made by AI.
03
Build
Multi-cam timeline with frame-accurate cuts. Lower thirds, chapter titles, BGM with ducking, transitions. All in one JSON file.
04
Export
Validation catches errors automatically. One command renders the final video with all graphics, captions, and audio baked in.
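Stages 03 and 04 both revolve around one timeline file. The page does not publish Vidiot's timeline.json schema, so the sketch below assumes a minimal, hypothetical one (the keys `sources`, `tracks`, `in`, `out`, `start` and the file names `closeup.mov` and `wide.mp4` are placeholders) to illustrate the kind of checks an automatic validation pass might run before export: every clip references a known source, has a positive duration, and does not overlap its neighbor on the video track.

```python
import json

# A hypothetical, minimal timeline for illustration; this is NOT Vidiot's
# actual timeline.json schema. Times are in seconds.
TIMELINE_JSON = """
{
  "sources": ["closeup.mov", "wide.mp4"],
  "tracks": {
    "video": [
      {"source": "closeup.mov", "in": 12.0, "out": 20.5, "start": 0.0},
      {"source": "wide.mp4",    "in": 30.0, "out": 34.0, "start": 8.5}
    ]
  }
}
"""


def validate(timeline: dict) -> list[str]:
    """Return a list of error messages; an empty list means the timeline passed."""
    errors = []
    known = set(timeline["sources"])
    # Check clips in playback order (the indices in messages refer to that order).
    clips = sorted(timeline["tracks"]["video"], key=lambda c: c["start"])
    prev_end = None
    for i, clip in enumerate(clips):
        length = clip["out"] - clip["in"]
        if clip["source"] not in known:
            errors.append(f"clip {i}: unknown source {clip['source']!r}")
        if length <= 0:
            errors.append(f"clip {i}: out point is not after in point")
        if prev_end is not None and clip["start"] < prev_end:
            errors.append(f"clip {i}: overlaps the previous clip")
        prev_end = clip["start"] + length
    return errors


print(validate(json.loads(TIMELINE_JSON)))  # → []
```

Keeping the whole edit in one declarative file is what makes a check like this cheap: the validator only reads data and never has to touch the media.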
The actual prompt

This is what was typed to produce the episode.

One message in Claude Code. Vidiot handled the rest — footage analysis, transcription, narrative structure, multi-cam editing, graphics, and music.

Claude Code
you> I filmed an interview with Matthew Chan for ep 2 of 20 Under 20. Two cameras - iPhone close-up, DJI wide shot. Boya wireless mics on the iPhone. Turn it into a YouTube episode like ep 1.
claude> Analyzing footage in Video Source/...
IMG_0322.MOV — 20m 39s, 1080p 30fps (iPhone)
DJI_...0004_D.MP4 — 18m 36s, 4K 29.97fps (Pocket 3)
Transcribing with Whisper... 47 sentences, 12 topic clusters
Face detection... 94% of iPhone frames (center position)
Writing script.json — cold open, 4 chapters, CTA
Building timeline.json — multi-cam switching, graphics, BGM
Validating... 0 errors, 0 warnings
claude> Timeline ready. Open Vidiot to preview.

Those are the real file names from the project. The iPhone shot 20 minutes of 1080p close-up footage, and the DJI Pocket 3 captured 18 minutes of 4K wide-shot footage: 8.5 GB in total, handled in one message.

The timeline Vidiot generated for this episode

Video: iPhone · DJI · iPhone · DJI · iPhone · DJI · iPhone
Graphics: Lower third · Ch. 1 · Ch. 2 · CTA
Audio: BGM · SFX
Key: iPhone close-up · DJI wide shot · Motion graphics · Background music · Sound effects
Real Vidiot output

These frames are rendered directly by the Vidiot engine.

Impact text: KEEP WORKING HARD with snap zoom
Impact text + snap zoom (1.3x) on hook
Lower third: Matthew Chan, 15-Year-Old Builder
Lower third with scrim — auto-placed at introduction
Chapter title: 20 UNDER 20
Chapter title — inserted at topic transition
Impact text: JUST BUILD
Impact text — punched on Matthew's key quote
Stat callout: 12 years old managing a team
Stat callout — data visualized when mentioned
Quote: The impatient doer will always do better
Quote card with scrim — selected as cold open hook
Before and after

Raw footage vs Vidiot output.

Raw — 1080p source, no processing
Vidiot — snap zoom + impact text overlay
Raw — unprocessed interview footage
Vidiot — lower third + scrim overlay
Raw — Matthew talking about Roblox
Vidiot — animated stat callout rendered on frame
Full episode contact sheet — rendered by Vidiot at 30-second intervals
Episode composition
13:46 total
Hooks 0:45
Intro 1:10
Interview chapters 9:20
Emotional peaks 1:30
Games & quick fire 0:50
CTA 0:11
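The six segments account for the whole runtime: 45 + 70 + 560 + 90 + 50 + 11 = 826 seconds, which is 13:46. The same arithmetic confirms the raw-footage total, 20m 39s from the iPhone plus 18m 36s from the Pocket 3 = 39m 15s. A quick check in Python:

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'M:SS' duration string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mmss(total: int) -> str:
    """Format whole seconds back as 'M:SS'."""
    return f"{total // 60}:{total % 60:02d}"


composition = {
    "Hooks": "0:45",
    "Intro": "1:10",
    "Interview chapters": "9:20",
    "Emotional peaks": "1:30",
    "Games & quick fire": "0:50",
    "CTA": "0:11",
}

total = sum(to_seconds(d) for d in composition.values())
print(total, to_mmss(total))                               # → 826 13:46
print(to_mmss(to_seconds("20:39") + to_seconds("18:36")))  # → 39:15
```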
Everything Vidiot deployed for this episode

18 AI and native capabilities.
Zero manual configuration.

Whisper STT
Word-level transcription at 30fps
Face detection
94% of frames, center position
Shot detection
6 shot boundaries per camera
Content classification
talking_head identified
Topic clustering
12 themes extracted
Narrative structure
Cold open, 4 chapters, CTA
Quote selection
Best moments for hooks
Multi-cam switching
iPhone close-up / DJI wide
Snap zoom
1.2-1.3x on key moments
Slow zoom
Emotional peak: future self letter
Lower third
Matthew Chan name card
Chapter titles
5 animated section headers
Stat callouts
"$5K from Roblox at 12"
Quote card
Impatient doer quote overlay
Impact text
4 text pops on hooks
BGM with ducking
Auto-ducks to 0.08 under speech
Subscribe CTA
End card with channel link
Saliency framing
AI-guided crop for every clip
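A snap zoom is a fast punch-in that then holds. The list above gives only the scale range (1.2 to 1.3x); the four-frame ramp and cubic ease-out in this sketch are assumptions chosen for illustration, not Vidiot's actual curve. At scale `s`, a 1920×1080 output shows a (1920/s)×(1080/s) region of the source, so 1.3x displays roughly the central 1477×831 pixels.

```python
def snap_zoom_scale(frame: int, start_frame: int,
                    ramp_frames: int = 4, target: float = 1.3) -> float:
    """Zoom factor at `frame` for a snap zoom that starts at `start_frame`.

    1.0 before the zoom, eased toward `target` over `ramp_frames` frames,
    then held at `target`. ramp_frames and the easing curve are assumptions.
    """
    if frame <= start_frame:
        return 1.0
    t = min((frame - start_frame) / ramp_frames, 1.0)
    eased = 1 - (1 - t) ** 3  # cubic ease-out: fast start, soft landing
    return 1.0 + (target - 1.0) * eased


# Per-frame scale for a zoom that fires at frame 100 on a 30 fps timeline.
print([round(snap_zoom_scale(f, 100), 4) for f in range(99, 106)])
```

An ease-out front-loads the motion, which is what makes the zoom read as a "snap" rather than a drift; a slow zoom on an emotional beat would instead use a long, gentle ramp.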
What Vidiot handled for this episode

Every layer of the edit. Automated.

Core

Natural language editing

The entire episode was described in one Claude Code message. Vidiot wrote the complete timeline — every clip, transition, graphic, and music cue. No dragging clips on a timeline.

Vision

Multi-cam switching

Vidiot cut between the iPhone close-up on Matthew and the DJI wide two-shot. Close-up for emotional beats, wide for reactions and context. Face detection guided every cut.
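Cutting between the two cameras at the same moment means mapping one timeline time onto two different frame grids. The iPhone recorded at 30 fps and the Pocket 3 at 29.97 fps, a rate conventionally defined as exactly 30000/1001. A minimal sketch with exact fractions, assuming the two recordings are already synced and that `offset` (a hypothetical per-camera value, in seconds) gives the timeline time at which each file's first frame plays:

```python
import math
from fractions import Fraction

# Frame rates as exact rationals; 29.97 fps is really 30000/1001.
IPHONE_FPS = Fraction(30)
POCKET3_FPS = Fraction(30000, 1001)


def source_frame(timeline_time: Fraction, offset: Fraction, fps: Fraction) -> int:
    """Index of the source frame on screen at `timeline_time`.

    `offset` is the (hypothetical, already-synced) timeline time at which the
    source file's first frame plays. Flooring picks the frame already showing.
    """
    local = timeline_time - offset
    if local < 0:
        raise ValueError("timeline time precedes the start of this source")
    return math.floor(local * fps)


print(source_frame(Fraction(600), Fraction(0), IPHONE_FPS))   # → 18000
print(source_frame(Fraction(600), Fraction(0), POCKET3_FPS))  # → 17982
```

Ten minutes in, the iPhone is on frame 18000 but the Pocket 3 is on frame 17982. Treating 29.97 fps as 30 would leave the wide shot about 18 frames (roughly 0.6 s) out of sync by that point, which is why rational time is the safer representation.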

Audio

Whisper transcription

Both camera files were transcribed with word-level timestamps at 30fps. Every cut point in the episode lands precisely between words — no clipped syllables.
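Cutting between words can be sketched with per-word start and end times of the kind Whisper emits when word-level timestamps are enabled. The `Word` type, the `safe_cut` helper, and the policy of picking the frame boundary inside an inter-word gap that is nearest the requested cut are illustrative assumptions, not Vidiot's code; the 30 fps grid matches the one mentioned above.

```python
import math
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def safe_cut(words: list[Word], target: float, fps: int = 30) -> float:
    """Cut time closest to `target` that lies on a frame boundary inside a
    silent gap between two consecutive words."""
    best = None
    for a, b in zip(words, words[1:]):
        first = math.ceil(a.end * fps)    # first frame boundary after word a ends
        last = math.floor(b.start * fps)  # last frame boundary before word b starts
        if first > last:
            continue  # gap is shorter than one frame: no clean cut here
        frame = min(max(round(target * fps), first), last)
        t = frame / fps
        if best is None or abs(t - target) < abs(best - target):
            best = t
    if best is None:
        raise ValueError("no inter-word gap wide enough to cut in")
    return best


words = [Word("just", 0.00, 0.41), Word("build", 0.49, 0.96), Word("things", 1.41, 1.90)]
print(safe_cut(words, 1.0))  # → 1.0
print(safe_cut(words, 0.5))  # lands on frame 14 (about 0.467 s), inside the first gap
```

Because every candidate is clamped to lie between one word's end and the next word's start, a cut can move a little from where the editor asked for it, but it can never land mid-syllable.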

Graphics

Motion graphics

Lower third introducing Matthew Chan. Chapter titles for each topic. A CTA at the end. All placed automatically from the HTML template library based on the script structure.
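Placing graphics from the script structure amounts to mapping each section to a template and filling it in. The template markup, the section kinds (`guest_intro`, `chapter`, `cta`), the start times, and the `place_graphics` helper below are hypothetical stand-ins for Vidiot's template library; the sketch shows the idea, including HTML-escaping text so a title containing `&` or `<` cannot break the markup.

```python
from html import escape
from string import Template

# Hypothetical templates standing in for Vidiot's HTML template library.
TEMPLATES = {
    "lower_third": Template(
        '<div class="lower-third"><span class="name">$title</span>'
        '<span class="subtitle">$subtitle</span></div>'
    ),
    "chapter_title": Template('<div class="chapter"><h1>$title</h1></div>'),
    "cta": Template('<div class="cta"><p>$title</p></div>'),
}

# Which template each (hypothetical) script section kind triggers.
SECTION_TO_TEMPLATE = {"guest_intro": "lower_third", "chapter": "chapter_title", "cta": "cta"}


def place_graphics(sections: list[dict]) -> list[dict]:
    """Turn script sections into timed graphic cues with rendered HTML."""
    cues = []
    for s in sections:
        name = SECTION_TO_TEMPLATE.get(s["kind"])
        if name is None:
            continue  # e.g. hooks: no automatic graphic in this sketch
        html = TEMPLATES[name].substitute(
            title=escape(s["title"]), subtitle=escape(s.get("subtitle", ""))
        )
        cues.append({"template": name, "start": s["start"], "html": html})
    return cues


sections = [
    {"kind": "hook", "start": 0, "title": "Just build"},
    {"kind": "guest_intro", "start": 50, "title": "Matthew Chan", "subtitle": "15-Year-Old Builder"},
    {"kind": "chapter", "start": 120, "title": "Origin"},
    {"kind": "cta", "start": 815, "title": "Subscribe"},
]
for cue in place_graphics(sections):
    print(cue["start"], cue["template"])
```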

Audio

Music & sound design

Background music selected from the AI-classified library by mood and energy. Auto-ducked under speech throughout. Transition whooshes placed at every chapter break.
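Ducking can be modeled as a gain envelope on the music track. The page states the ducked level (0.08) but not its units or fade times, so this sketch assumes 0.08 is a linear amplitude multiplier (20·log10(0.08) ≈ -21.9 dB) and uses a hypothetical 0.25 s linear fade; in practice the speech intervals would come from the word-level transcript.

```python
DUCK_GAIN = 0.08  # from the capability list; assumed linear amplitude (about -21.9 dB)
RAMP = 0.25       # hypothetical fade time in seconds


def bgm_gain(t: float, speech: list[tuple[float, float]],
             duck: float = DUCK_GAIN, ramp: float = RAMP) -> float:
    """Music gain at time t, given (start, end) speech intervals in seconds.

    Full volume away from speech, `duck` during speech, with linear fades
    of length `ramp` into and out of each interval.
    """
    gain = 1.0
    for start, end in speech:
        if start <= t <= end:
            g = duck
        elif start - ramp < t < start:
            g = duck + (1 - duck) * (start - t) / ramp  # fading down before speech
        elif end < t < end + ramp:
            g = duck + (1 - duck) * (t - end) / ramp    # recovering after speech
        else:
            g = 1.0
        gain = min(gain, g)  # overlapping fades: the quieter one wins
    return gain


speech = [(1.0, 3.0), (3.4, 6.0)]
print([round(bgm_gain(t, speech), 3) for t in (0.5, 0.875, 2.0, 3.2, 7.0)])
```

Taking the minimum across intervals keeps the music from swelling back up in short pauses between sentences, since the fade out of one phrase and the fade into the next overlap.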

ML

Speech enhancement

The DJI Pocket 3 recorded audio through its built-in mic. MossFormer2 neural enhancement cleaned up the wide-shot clips to match the quality of the iPhone's Boya wireless lavs. GPU-accelerated on M1 Max.

Vision

Color & framing

Both cameras color-graded to match. The Pocket 3's 4K frame was cropped and framed for each clip. Face detection positioned Matthew consistently across every cut.
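Framing a 1080p shot out of the Pocket 3's 4K frame comes down to choosing where a 1920×1080 window sits inside the source. A minimal sketch, assuming UHD 3840×2160 source frames and a face center in normalized coordinates with a top-left origin (Apple's Vision framework reports normalized boxes with a lower-left origin, so its y value would be flipped first); the helper names and the clamping policy are illustrative, not Vidiot's:

```python
def crop_origin(face_cx: float, face_cy: float,
                src_w: int = 3840, src_h: int = 2160,
                out_w: int = 1920, out_h: int = 1080) -> tuple[int, int]:
    """Top-left (x, y) of an out_w x out_h crop centered on the face,
    clamped so the window never leaves the source frame.

    face_cx, face_cy: face center, normalized to [0, 1], top-left origin.
    """
    cx, cy = face_cx * src_w, face_cy * src_h
    x = min(max(round(cx - out_w / 2), 0), src_w - out_w)
    y = min(max(round(cy - out_h / 2), 0), src_h - out_h)
    return x, y


def flip_vision_y(y_lower_left: float) -> float:
    """Convert a normalized y with lower-left origin to top-left origin."""
    return 1.0 - y_lower_left


print(crop_origin(0.5, 0.5))   # → (960, 540): face dead center
print(crop_origin(0.05, 0.5))  # → (0, 540): clamped at the left edge
```

Because 3840×2160 is exactly twice 1920×1080 in each dimension, this crop is a 2x punch-in that needs no upscaling.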

Case study

20 Under 20, Episode 2

Matthew Chan, 15, Vancouver. Spoke at Lunatechs HK about building games with Claude Code. The interview was filmed with two cameras and edited entirely by Vidiot.

The guest

Matthew Chan, 15

8+ apps shipped in 3 months. 1st place at the Games4Change Jam at HKU, as a solo entry competing against university teams. 1.5M views across platforms.

The rig

Two cameras, two mics

iPhone 17 Pro Max for the close-up with Boya wireless lavaliers. DJI Pocket 3 for the 4K wide two-shot. Total: 8.5 GB of raw footage.

20 minutes per camera
1 prompt to edit
0 manual transcription

I describe the edit I want. Vidiot handles the transcription, the multi-cam cuts, the graphics, the music. I just focus on the story.

Hanley Leung, creator of Vidiot

See the result

Episode 1 — edited with Vidiot.

See the result.

Watch the episode that Vidiot edited. Then watch how it was built in one Claude Code session.

Watch 20 Under 20 · See Episode 1