A soundwalk,
rendered as film.

A hands-on 18-hour workshop in Python, audio classifiers, and AI video. Students field-record a place, analyse what they captured, and render it back as a short film. From their own input to their own output.

18 hours · 6 sessions · Basic Python helpful · Each student leaves with a film

From field recording
to generative film.

1

Record

Your own field audio of a real place, captured on the device you already carry.

2

Classify

Pretrained audio models tag what's there — bird, siren, rain, engine — with timestamps and confidence.

3

Map

Each sound class becomes a visual property.

4

Render

AI video generation via Replicate. Prototype on images, render on video, assemble in Python.

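# the heart of the pipeline
from transformers import pipeline
import librosa

# load the field recording as mono, resampled to the 16 kHz the model expects
y, sr = librosa.load("recording.wav", sr=16000)

# Audio Spectrogram Transformer fine-tuned on AudioSet (full Hub checkpoint id)
clf = pipeline("audio-classification",
               model="MIT/ast-finetuned-audioset-10-10-0.4593")

# top five AudioSet labels, each with a confidence score
events = clf(y, top_k=5)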

18 hours, one film.

Session 1 · Listen: Python warm-up, loading and plotting field audio with librosa (setup + theory)

Session 2 · Recognize: classifying sound events with AST and AudioSet (hands-on)

Session 3 · Structure: windowed segmentation into a clean event timeline (hands-on)

Session 4 · Map: sound classes to visual properties, the artistic core (hands-on)

Session 5 · Render & assemble: AI video via Replicate, composed in Python (hands-on)

Session 6 · Your own soundwalk: new location, your pipeline, class screening (build)

6 × 3h

Sessions. Weekly pace with practice between.

~70%

Time spent making, not watching slides.

1 film

Finished short, in each student's hands.

From sound to film, step by step.

1

Listen

Loading, plotting, and exploring real field audio with librosa. The pipeline starts with your own ears.

2

Recognize

Pretrained classifiers tag the recording. What sound, when, and how confident.
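As a rough illustration of how "what sound, when, and how confident" could be produced, the sketch below slides a fixed window over the samples and keeps the top label per window. The function name `classify_windows`, the window and hop sizes, and the stand-in classifier are assumptions made for this example; in practice the callable could be a pretrained pipeline such as the `clf` object in the snippet earlier on this page, which returns the same list of `label`/`score` dictionaries.

```python
from typing import Callable, Sequence


def classify_windows(
    samples: Sequence[float],
    sr: int,
    classify: Callable[[Sequence[float]], list],
    window_s: float = 2.0,
    hop_s: float = 1.0,
) -> list:
    """Classify fixed-length windows and attach start/end times in seconds."""
    win = int(window_s * sr)
    hop = int(hop_s * sr)
    tagged = []
    # A recording shorter than one window still yields a single (short) window.
    for start in range(0, max(len(samples) - win, 0) + 1, hop):
        chunk = samples[start:start + win]
        best = max(classify(chunk), key=lambda r: r["score"])
        tagged.append({
            "start": start / sr,
            "end": (start + len(chunk)) / sr,
            "label": best["label"],
            "score": best["score"],
        })
    return tagged


# Stand-in classifier (hypothetical): loud windows are "Siren", quiet ones "Silence".
def fake_classify(chunk):
    loud = sum(abs(x) for x in chunk) / max(len(chunk), 1) > 0.5
    return [{"label": "Siren" if loud else "Silence", "score": 0.9}]


sr = 10                             # toy sample rate to keep the example small
samples = [0.0] * 20 + [1.0] * 20   # 2 s quiet, then 2 s loud
windows = classify_windows(samples, sr, fake_classify)
```

Overlapping windows (hop shorter than window) trade extra classifier calls for finer timing; the right sizes depend on the recording and the model.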

3

Structure

Windowed segmentation and merging into a clean JSON event timeline. The scaffolding everything else builds on.
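One way the merging step might look: consecutive windows that share a label are fused into a single event, low-confidence tags are dropped, and the result is serialised as JSON. The field names (`start`, `end`, `label`, `score`), the `min_score` threshold, and the `merge_windows` helper are assumptions for illustration, not the workshop's actual schema.

```python
import json


def merge_windows(windows, min_score=0.3, max_gap_s=0.0):
    """Merge consecutive windows that share a label into single events."""
    events = []
    for w in sorted(windows, key=lambda w: w["start"]):
        if w["score"] < min_score:
            continue  # drop low-confidence tags
        last = events[-1] if events else None
        if last and last["label"] == w["label"] and w["start"] - last["end"] <= max_gap_s:
            # Same sound continues: extend the event and keep the best score.
            last["end"] = max(last["end"], w["end"])
            last["score"] = max(last["score"], w["score"])
        else:
            events.append(dict(w))  # copy so the input windows stay untouched
    return events


windows = [
    {"start": 0.0, "end": 2.0, "label": "Rain", "score": 0.8},
    {"start": 1.0, "end": 3.0, "label": "Rain", "score": 0.6},
    {"start": 2.0, "end": 4.0, "label": "Bird", "score": 0.2},
    {"start": 3.0, "end": 5.0, "label": "Siren", "score": 0.9},
]
timeline = merge_windows(windows)
timeline_json = json.dumps(timeline, indent=2)
```

Here the two overlapping "Rain" windows become one event from 0 s to 3 s, the weak "Bird" tag is discarded, and "Siren" starts a new event.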

4

Map

The artistic core. Sound classes to visual properties — never to objects.
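A small sketch of what such a mapping could look like in code. Every entry in `VISUAL_MAP`, the three property names, and the prompt wording are invented for illustration; the point is only that a label selects palette, motion, and texture, and the prompt never names the sound source itself.

```python
# Hypothetical mapping: each sound class sets visual properties,
# never a depicted object (no raindrops for "Rain", no birds for "Bird").
VISUAL_MAP = {
    "Rain":  {"palette": "cold blue-grey", "motion": "slow vertical drift", "texture": "fine speckle"},
    "Siren": {"palette": "saturated red", "motion": "fast pulsing", "texture": "hard edges"},
    "Bird":  {"palette": "warm yellow-green", "motion": "light flicker", "texture": "soft focus"},
}
DEFAULT_LOOK = {"palette": "neutral grey", "motion": "still", "texture": "smooth"}


def event_to_prompt(event):
    """Turn one timeline event into a text prompt built only from visual properties."""
    look = VISUAL_MAP.get(event["label"], DEFAULT_LOOK)
    # Classifier confidence drives how pronounced the look is.
    intensity = "subtle" if event["score"] < 0.5 else "strong"
    return (
        f"abstract field of {look['palette']} light, "
        f"{look['motion']}, {look['texture']}, {intensity} intensity"
    )


prompt = event_to_prompt({"label": "Rain", "score": 0.8})
```

Keeping the mapping in a plain dictionary makes it easy to revise between renders without touching the rest of the pipeline.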

5

Render

AI video generation via Replicate. Render an image first to evaluate prompting.
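The image-first habit can be sketched as a single function with a preview switch. This assumes the `replicate` Python client and its `replicate.run(ref, input=...)` call; the model references `owner/image-model` and `owner/video-model` are hypothetical placeholders, and the `prompt` and `duration` input keys are assumptions, since each model on Replicate defines its own input schema.

```python
def render_event(prompt, seconds, run=None, preview_only=True,
                 image_model="owner/image-model",    # hypothetical placeholder
                 video_model="owner/video-model"):   # hypothetical placeholder
    """Render a cheap still first; call the video model only when asked to."""
    if run is None:
        # Imported lazily so the function can be exercised without the client.
        import replicate  # reads REPLICATE_API_TOKEN from the environment
        run = replicate.run
    if preview_only:
        return run(image_model, input={"prompt": prompt})
    # Input keys below are assumed; check the chosen model's schema.
    return run(video_model, input={"prompt": prompt, "duration": seconds})
```

A typical loop would call `render_event(prompt, seconds)` until the still looks right, then repeat the call with `preview_only=False` for the clip.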

6

Compose

Concatenate the clips, sync to the original field recording, export. The film is finished.
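One possible way to do this from Python is to drive ffmpeg: the sketch below builds a concat list and the command that joins the clips and lays the original recording underneath. Using ffmpeg here is an assumption (the workshop may use a different tool), as are the file names and the `build_compose_plan` helper. Stream-copying the video also assumes all clips share the same codec, resolution, and frame rate.

```python
def build_compose_plan(clips, audio_path, out_path, list_path="clips.txt"):
    """Return the ffmpeg concat-list text and the command that muxes it with the audio."""
    # One "file '<path>'" line per clip, in playback order.
    list_text = "".join(f"file '{clip}'\n" for clip in clips)
    cmd = [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0", "-i", str(list_path),  # joined video clips
        "-i", str(audio_path),                                # original field recording
        "-map", "0:v:0", "-map", "1:a:0",                     # video from clips, audio from recording
        "-c:v", "copy", "-c:a", "aac",
        "-shortest",                                          # stop at the shorter of the two streams
        str(out_path),
    ]
    return list_text, cmd


list_text, cmd = build_compose_plan(
    ["clip_000.mp4", "clip_001.mp4"], "recording.wav", "film.mp4"
)
```

To produce the film, write `list_text` to `clips.txt` and run the command with `subprocess.run(cmd, check=True)`. The picture stays in sync with the sound only if each clip's duration matches the duration of its event in the timeline.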

Built for the eye and the ear.

Participants leave with

A finished generative short film built from their own field recording.
A reusable Python pipeline they can run on any future recording.
The skill to design sound-to-visual mappings that work.
Practical experience with audio classifiers and diffusion video models.

Who it's for

Film, animation, audio, and design students with basic Python.
Independent artists expanding into generative film.
Creative coders curious about audio-reactive AI workflows.

Available formats

Course: 6 weekly sessions, with practice and recording between

Custom: in-house for schools, studios, or cultural institutions

About the instructor

AI engineer with 8+ years in production LLM and ML systems. Background in affective computing research and active studio practice in generative film. Teaches through building real things. Based in Athens.
