VideoML Standard
Status: Draft (v0.1)
VideoML is an XML-based standard for creating videos using declarative markup. It treats time as a layout axis and uses the browser DOM as the canonical runtime. VideoML feels like web programming with a timeline.
What You'll Learn
- Core VideoML syntax and root element structure
- Temporal layout model (sequence, stack, duration)
- Timeline API for accessing playback state
- Lifecycle events for scene and cue transitions
- Inline JavaScript and event handlers
- Design decisions behind VideoML's approach
Canonical Root Element
Every VideoML document starts with a <vml> root element. This is the canonical format—XML files are the source of truth, and all derived outputs (MP4 videos, JSON timelines) are generated artifacts.
Required Attributes
<vml
id="intro"
title="Introduction Video"
fps="30"
width="1920"
height="1080"
>
<!-- scenes go here -->
</vml>
- id: Unique identifier for this video
- title: Human-readable name
- fps: Frames per second (24, 30, or 60 typical)
- width/height: Video dimensions in pixels
File Extension
VideoML files use the .babulus.xml extension. This disambiguates them from generic XML and signals they contain time-based video markup.
Temporal Layout Model
VideoML automatically calculates durations based on content, treating time like CSS treats space. Add a 5-second scene, and the video grows by 5 seconds—no manual duration math required.
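The sum and longest-child rules defined in the subsections below can be sketched as a small recursive function. This is only an illustration, not the VideoML runtime: the node shape (`type`, `duration`, `children`) and the name `computeDuration` are assumptions made for this example, and durations are plain numbers in seconds.

```javascript
// Hypothetical node shape for illustration: { type, duration?, children? }.
// Durations are plain numbers in seconds; this is not the real VideoML parser.
function computeDuration(node) {
  // An explicit duration is used as-is.
  if (typeof node.duration === 'number') return node.duration;

  const childDurations = (node.children || []).map(computeDuration);

  // A stack plays its children in parallel: the longest child decides.
  if (node.type === 'stack') return Math.max(0, ...childDurations);

  // Everything else is treated like a sequence in this sketch:
  // children play back-to-back, so their durations add up.
  return childDurations.reduce((sum, d) => sum + d, 0);
}

const video = {
  type: 'sequence',
  children: [
    { type: 'scene', duration: 3 },
    { type: 'scene', duration: 2 },
    {
      type: 'stack',
      children: [
        { type: 'layer', duration: 10 },
        { type: 'audio', duration: 8 },
      ],
    },
  ],
};

console.log(computeDuration(video)); // 15
```

Changing the first scene from 3 to 5 seconds would raise the result to 17 without touching any other node, which is the point of treating time as a layout axis.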
Sequence (Back-to-Back)
The <sequence> element plays children one after another. Total duration equals the sum of all child durations.
<sequence>
<scene duration="3s">Scene 1</scene> <!-- 0-3s -->
<scene duration="2s">Scene 2</scene> <!-- 3-5s -->
<scene duration="4s">Scene 3</scene> <!-- 5-9s -->
</sequence>
<!-- Total: 9 seconds -->
Stack (Parallel)
The <stack> element plays children simultaneously. Total duration equals the longest child.
<stack>
<!-- Visual layer: 10s -->
<layer duration="10s">
<title-slide title="Welcome" />
</layer>
<!-- Audio narration: 8s -->
<audio src="narration.wav" duration="8s" />
</stack>
<!-- Total: 10 seconds (longest child) -->
Automatic Duration
If you omit the duration attribute, VideoML calculates it from child elements or generated audio (TTS).
<scene>
<cue>Welcome to the tutorial.</cue>
</scene>
<!-- Duration = length of TTS audio for "Welcome to the tutorial." -->
Timeline API
Access current playback state via the global window.timeline object. This API is available in both Live Mode (real-time editing) and Export Mode (rendering).
Properties
| Property | Type | Description |
|---|---|---|
| window.timeline.frame | number | Current frame (0-indexed) |
| window.timeline.time | number | Current time in seconds |
| window.timeline.fps | number | Frames per second |
| window.timelines | Timeline[] | Array of all timelines (for multi-player sync) |
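As a minimal sketch of reading these properties, the helper below formats a timeline position as a minutes:seconds:frames timecode. `formatTimecode` is a hypothetical name, not part of the VideoML API, and the stand-in object only mimics the documented properties so the snippet can run outside a VideoML page. It also assumes an integer frame rate.

```javascript
// formatTimecode is a hypothetical helper, not part of the VideoML API.
// It reads only the documented frame and fps properties.
function formatTimecode(timeline) {
  const { frame, fps } = timeline;
  const totalSeconds = Math.floor(frame / fps);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  const frames = frame % fps; // frames elapsed within the current second
  const pad = (n) => String(n).padStart(2, '0');
  return `${pad(minutes)}:${pad(seconds)}:${pad(frames)}`;
}

// Stand-in for window.timeline so the example runs anywhere.
const fakeTimeline = { frame: 105, time: 3.5, fps: 30 };
console.log(formatTimecode(fakeTimeline)); // "00:03:15"
```

In a VideoML page you would pass `window.timeline` instead of the stand-in object.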
CSS Variables
VideoML sets CSS custom properties on the root element every frame. Use these in stylesheets for dynamic effects.
/* Access timeline state in CSS */
.progress-bar {
width: calc(100% * var(--video-time) / var(--video-duration));
}
.fade-in {
opacity: calc(var(--video-frame) / 30); /* Fade in over 30 frames */
}
JavaScript Example
// Log playback state every frame
window.addEventListener('timeline:tick', (e) => {
console.log("Frame " + window.timeline.frame + " at " + window.timeline.time + "s");
});
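// Calculate progress percentage
// totalFrames is not among the documented timeline properties, so it is set by hand here.
// 270 is an example value: a 9-second video at 30 fps.
const totalFrames = 270;
const progress = (window.timeline.frame / totalFrames) * 100;
timeline:tick fires every frame (30-60 times per second at typical frame rates). Avoid expensive operations in tick handlers.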
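One way to keep tick handlers cheap is to run the heavier work only every N frames. The sketch below relies on the `frame` field that the Lifecycle Events section lists in the `timeline:tick` event detail. `everyNFrames` is a hypothetical helper, not a VideoML API.

```javascript
// everyNFrames is a hypothetical helper: it wraps a handler so that it only
// runs when the frame number is a multiple of n.
function everyNFrames(n, handler) {
  return (event) => {
    if (event.detail.frame % n === 0) handler(event);
  };
}

// Usage in a VideoML page (window only exists in the browser).
if (typeof window !== 'undefined') {
  window.addEventListener('timeline:tick', everyNFrames(30, (e) => {
    // At 30 fps this runs about once per second instead of 30 times.
    console.log(`Reached ${e.detail.time}s`);
  }));
}
```

The wrapper still runs on every tick, but the check is a single modulo operation, so the expensive handler is skipped on most frames.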
Lifecycle Events
VideoML dispatches events when scenes and cues start/end. Listen to these events to trigger animations, update UI, or log analytics.
Event Types
| Event | Dispatched On | Detail |
|---|---|---|
| timeline:tick | <vml>, window | { frame, time, fps } |
| scene:start | <scene>, window | { sceneId, startTime } |
| scene:end | <scene>, window | { sceneId, endTime } |
| cue:start | <cue>, window | { cueId, text } |
| cue:end | <cue>, window | { cueId } |
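As a sketch of combining two of these events, the helper below pairs scene:start with scene:end to record how long each scene ran. It uses only the detail fields listed in the table. `trackSceneDurations` is a hypothetical name, and it accepts any EventTarget so it can be exercised outside a VideoML page.

```javascript
// trackSceneDurations is a hypothetical helper. It listens on any EventTarget
// (window in a VideoML page) and records how long each scene lasted, in the
// same unit the runtime uses for startTime and endTime.
function trackSceneDurations(target) {
  const startTimes = new Map();
  const durations = new Map();

  target.addEventListener('scene:start', (e) => {
    startTimes.set(e.detail.sceneId, e.detail.startTime);
  });

  target.addEventListener('scene:end', (e) => {
    const { sceneId, endTime } = e.detail;
    // Ignore an end event for a scene whose start was never seen.
    if (startTimes.has(sceneId)) {
      durations.set(sceneId, endTime - startTimes.get(sceneId));
    }
  });

  return durations; // Map of sceneId -> duration, filled in as scenes end
}
```

In a VideoML page you would call `trackSceneDurations(window)` once from a `<script>` block and read the returned map after the scenes of interest have ended.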
Example: Scene Transitions
<vml id="demo" fps="30" width="1920" height="1080">
<script>
// Log when scenes change
window.addEventListener('scene:start', (e) => {
console.log(`Scene started: ${e.detail.sceneId}`);
});
</script>
<scene id="intro" duration="3s">
<!-- scene content -->
</scene>
</vml>
Inline JavaScript
VideoML supports <script> tags and on:* event handler attributes. Scripts execute when inserted into the DOM, just like HTML.
Script Blocks
<vml id="interactive" fps="30" width="1920" height="1080">
<script>
// Initialize state
let clickCount = 0;
function handleClick() {
clickCount++;
console.log(`Clicked ${clickCount} times`);
}
</script>
<scene duration="5s">
<button on:click="handleClick()">Click Me</button>
</scene>
</vml>
Event Handler Attributes
Use on:* attributes to attach event handlers inline. The handler scope includes:
- event: The DOM event object
- target: The element that triggered the event
- timeline: The current timeline object
- root: The <vml> root element
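To make the scope concrete, the sketch below shows one way an `on:*` attribute string could be turned into a function with those four names available. This is an assumption about how a runtime might do it, using the standard `Function` constructor. `compileHandler` is a hypothetical name, and the real VideoML runtime may work differently.

```javascript
// compileHandler is hypothetical: it turns the attribute text into a function
// whose parameters are the four names the handler scope exposes.
function compileHandler(code) {
  return new Function('event', 'target', 'timeline', 'root', code);
}

// Simulate <input on:change="target.value = target.value.toUpperCase()" />
const handler = compileHandler('target.value = target.value.toUpperCase()');
const input = { value: 'hello' };     // stand-in for the <input> element
const fakeEvent = { type: 'change' }; // stand-in for the DOM event
handler(fakeEvent, input, { frame: 0 }, null);

console.log(input.value); // "HELLO"
```

Because the attribute text runs as ordinary JavaScript, the trust model described under "Why No Sandboxing?" applies to inline handlers as well.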
<button on:click="console.log(timeline.frame)">Log Frame</button>
<input on:change="target.value = target.value.toUpperCase()" />
Ignoring Subtrees
Use data-videoml-ignore="true" to exclude DOM subtrees from handler rebinding and mutation recording. This is useful for third-party widgets or performance-sensitive areas.
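A sketch of how code that walks the DOM could honor this attribute: check whether a node or any of its ancestors carries `data-videoml-ignore="true"`. `isIgnored` is a hypothetical helper, not a VideoML API, and it is run against stand-in objects here so it works without a browser. On real elements, `element.closest('[data-videoml-ignore="true"]') !== null` performs the same check.

```javascript
// isIgnored is a hypothetical helper. It returns true when the node itself or
// any ancestor has data-videoml-ignore="true".
function isIgnored(node) {
  for (let el = node; el; el = el.parentElement) {
    if (el.getAttribute('data-videoml-ignore') === 'true') return true;
  }
  return false;
}

// Stand-in nodes so the example runs without a DOM.
function fakeNode(attrs, parentElement = null) {
  return { parentElement, getAttribute: (name) => attrs[name] ?? null };
}

const wrapper = fakeNode({ 'data-videoml-ignore': 'true' });
const iframe = fakeNode({ src: 'external-widget.html' }, wrapper);
const button = fakeNode({});

console.log(isIgnored(iframe)); // true
console.log(isIgnored(button)); // false
```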
<div data-videoml-ignore="true">
<!-- This subtree is ignored by VideoML runtime -->
<iframe src="external-widget.html"></iframe>
</div>
Scenes and Layers
Scenes are the primary structural unit. Layers provide z-index stacking within scenes.
Scene Element
<scene
id="intro"
start="0s"
duration="5s"
>
<!-- scene content -->
</scene>
- id: Unique identifier
- start: Absolute start time (optional, defaults to sequential)
- duration: Scene length (optional if auto-calculated)
Layer Element
Layers stack visually using CSS z-index. Higher z-index appears on top.
<scene duration="8s">
<!-- Background layer (z-index: 0) -->
<layer style="z-index: 0">
<background-gradient colors="blue,purple" />
</layer>
<!-- Content layer (z-index: 10) -->
<layer style="z-index: 10">
<title-slide title="Hello" />
</layer>
<!-- Overlay layer (z-index: 20) -->
<layer style="z-index: 20">
<lower-third name="Jane Doe" title="CEO" />
</layer>
</scene>
Cues and Text-to-Speech
Cues contain narration text that gets converted to audio via TTS providers. The audio duration determines scene length.
Cue Element
<scene>
<cue id="intro-1" voice="en-US-Neural">
Welcome to the tutorial. Today we'll cover the basics.
</cue>
</scene>
<!-- Scene duration = TTS audio length -->
- id: Unique identifier for this cue
- voice: TTS voice name (provider-specific)
- Text content: The narration script
Multiple Cues
Multiple cues in a scene play sequentially (like a sequence).
<scene>
<cue id="line-1">First sentence.</cue>
<cue id="line-2">Second sentence.</cue>
</scene>
<!-- Total duration = sum of both audio clips -->
Narration Track
Use <narration> for voiceover that spans multiple transitions. Narration items sit on the timeline (like scenes/transitions) but render no visuals.
<transition effect="push" duration="12f" />
<narration id="layouts-voice">
<cue id="layouts">
Use one-column, two-column, three-column, and grid layouts.
</cue>
</narration>
Place <narration> just before the scenes it should align with. It will start at the next scene's start unless you provide start or duration.
Web Components
VideoML uses Web Components (custom elements) for reusable UI. Any hyphenated tag is treated as a Web Component.
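The hyphen rule can be sketched as a simple name check. `isComponentTag` is a hypothetical helper and is deliberately simpler than the full custom-element naming rules in the HTML standard. The browser-only part uses the standard `customElements.define` API, with a placeholder body for the `<my-chart>` tag that appears in this section's examples.

```javascript
// isComponentTag is a hypothetical helper: it accepts lowercase names that
// contain at least one hyphen, such as "title-slide".
function isComponentTag(tagName) {
  return /^[a-z][a-z0-9]*(-[a-z0-9]+)+$/.test(tagName);
}

console.log(isComponentTag('title-slide')); // true
console.log(isComponentTag('scene'));       // false

// In a browser, a custom component is registered with customElements.define.
// The rendering below is a placeholder, not a real chart.
if (typeof customElements !== 'undefined') {
  class MyChart extends HTMLElement {
    connectedCallback() {
      this.textContent = `Chart type: ${this.getAttribute('type')}`;
    }
  }
  customElements.define('my-chart', MyChart);
}
```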
Component Syntax
<!-- Built-in components -->
<title-slide title="Welcome" subtitle="Get Started" />
<lower-third name="Jane Doe" title="CEO" />
<code-block language="javascript">
console.log('Hello');
</code-block>
<!-- Custom components -->
<my-chart data="[1,2,3]" type="bar" />
Props Attribute
For complex data, use the props attribute with JSON.
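On the component side, the `props` string has to be parsed before it can be used. The sketch below is one defensive way to do that with `JSON.parse`. `parseProps` and its fallback behavior are assumptions for illustration, not documented VideoML behavior.

```javascript
// parseProps is a hypothetical helper. It returns the parsed props object, or
// the fallback when the attribute is missing, empty, invalid JSON, or not an object.
function parseProps(attrValue, fallback = {}) {
  if (attrValue == null || attrValue.trim() === '') return fallback;
  try {
    const parsed = JSON.parse(attrValue);
    const isPlainObject =
      parsed !== null && typeof parsed === 'object' && !Array.isArray(parsed);
    return isPlainObject ? parsed : fallback;
  } catch {
    return fallback;
  }
}

const props = parseProps('{"chartType": "line", "showLegend": true}');
console.log(props.chartType);  // "line"
console.log(props.showLegend); // true
```

Inside a custom element, the input would typically come from `this.getAttribute('props')`.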
<data-visualization
props='{
"data": [
{"month": "Jan", "sales": 100},
{"month": "Feb", "sales": 150}
],
"chartType": "line",
"showLegend": true
}'
/>
Transitions
Full transitions guide & integration test gallery
Transitions are first-class timeline items that sit between scenes. A transition is a container (like a scene) that can hold visuals and audio, and it can either overlap adjacent scenes or insert time between them.
Time values can be written in seconds or frames (e.g. 0.6s, 12f). Time expressions can reference scene(), cue(), and mark().
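Values such as `0.6s` and `12f` mix two units. As an illustration, the sketch below converts either form into a frame count at a given frame rate, assuming `s` means seconds and `f` means frames. `toFrames` is a hypothetical helper, rounding to the nearest frame is an assumption, and expressions that use scene(), cue(), or mark() are not handled.

```javascript
// toFrames is a hypothetical helper, not a VideoML API.
// It accepts plain values like "0.6s" or "12f" and returns a frame count.
function toFrames(value, fps) {
  const match = /^(\d+(?:\.\d+)?)(s|f)$/.exec(value.trim());
  if (!match) throw new Error(`Unsupported time value: ${value}`);
  const amount = Number(match[1]);
  // Frames pass through; seconds are multiplied by the frame rate.
  return match[2] === 'f' ? Math.round(amount) : Math.round(amount * fps);
}

console.log(toFrames('0.6s', 30)); // 18
console.log(toFrames('12f', 30));  // 12
```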
Transition Element
Use <transition> between scenes for crossfades, wipes, and any branded or custom transitions.
| Attribute | Type | Notes |
|---|---|---|
| id | string | Required unique id. |
| start / end / duration | time | Optional. Explicit timing for the transition window. |
| effect | string | Named transition preset (e.g. crossfade, fade, wipe). |
| ease | string | GSAP ease string (e.g. power2.inOut). |
| mode | string | overlap (default) or insert. |
| overflow | string | Visual overflow behavior: clip, extend, allow. |
| overflow-audio | string | Audio overflow behavior: clip, extend, allow. |
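The table translates naturally into a small validation step. The sketch below checks a plain object of transition attributes against the allowed values listed above and applies the documented `overlap` default. `validateTransition` and its error messages are hypothetical, not part of VideoML.

```javascript
const MODES = ['overlap', 'insert'];
const OVERFLOWS = ['clip', 'extend', 'allow'];

// validateTransition is a hypothetical helper. It returns the effective mode
// and a list of problems found in the given attributes.
function validateTransition(attrs) {
  const errors = [];
  if (!attrs.id) errors.push('id is required');

  const mode = attrs.mode ?? 'overlap'; // overlap is the documented default
  if (!MODES.includes(mode)) {
    errors.push(`mode must be one of: ${MODES.join(', ')}`);
  }

  for (const key of ['overflow', 'overflow-audio']) {
    if (key in attrs && !OVERFLOWS.includes(attrs[key])) {
      errors.push(`${key} must be one of: ${OVERFLOWS.join(', ')}`);
    }
  }
  return { mode, errors };
}

const result = validateTransition({ id: 'wipe-01', effect: 'wipe', duration: '18f' });
console.log(result.mode);          // "overlap"
console.log(result.errors.length); // 0
```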
Audio in Transitions
Transitions can include SFX and music via <sfx>, <music>, or <audio kind=...>.
<transition id="wipe-01" effect="wipe" duration="18f" ease="power2.inOut">
<sfx id="whoosh" start="0f" />
</transition>
Scene Enter/Exit (Convenience)
For quick fades, use per-scene convenience attributes. These do not crossfade; they only animate the scene itself.
<scene
id="intro"
enter="fade"
enter-duration="12f"
exit="fade"
exit-duration="12f"
>
<cue id="intro">Welcome to the video.</cue>
</scene>
Use the <transition> element for true crossfades between scenes.
Complete Example
Here's a full VideoML document demonstrating all core features:
<vml
id="product-demo"
title="Product Demo Video"
fps="30"
width="1920"
height="1080"
>
<script>
// Track scene transitions
window.addEventListener('scene:start', (e) => {
console.log(`Started: ${e.detail.sceneId}`);
});
</script>
<!-- Title scene with explicit duration -->
<scene id="title" duration="3s">
<layer>
<title-slide
title="Product Demo"
subtitle="Version 2.0"
/>
</layer>
</scene>
<!-- Narration scene with auto duration from TTS -->
<scene id="intro">
<stack>
<!-- Visual layer -->
<layer>
<content-screen title="Overview">
<ul>
<li>Fast performance</li>
<li>Easy to use</li>
<li>Secure by default</li>
</ul>
</content-screen>
</layer>
<!-- Narration track -->
<cue id="intro-narration">
Our new product offers fast performance, ease of use, and security.
</cue>
</stack>
</scene>
<!-- Code demo scene -->
<scene id="code-demo" duration="8s">
<layer>
<code-block language="javascript" filename="app.js">
import { render } from 'babulus';
render({
fps: 30,
width: 1920,
height: 1080
});
</code-block>
</layer>
</scene>
<!-- Outro with lower third -->
<scene id="outro" duration="4s">
<layer style="z-index: 0">
<background-gradient colors="#667eea,#764ba2" />
</layer>
<layer style="z-index: 10">
<lower-third
name="Learn More"
title="babulus.dev"
/>
</layer>
</scene>
</vml>
Design Decisions
Understanding the "why" behind VideoML's approach:
Why XML Over JSON?
- HTML Familiarity: Web developers already know XML/HTML syntax
- Mixed Content: XML supports text + elements naturally (JSON requires nested objects)
- Web Components: Hyphenated custom elements (<title-slide />) map directly to Web Component spec
- Tooling: XML parsers, validators, and editors are mature and widely available
Why DOM Runtime?
- Zero Translation: VideoML elements become DOM nodes directly—no virtual layer
- CSS Compatibility: Use standard CSS for styling, animations, and layout
- JavaScript Integration: Manipulate video content with familiar DOM APIs
- Browser Features: Leverage existing browser capabilities (accessibility, dev tools, extensions)
Why Temporal Layout?
- No Duration Math: Eliminate error-prone manual calculations
- Composability: Scenes and sequences nest naturally
- Flexibility: Swap a 3s scene for a 5s scene—total duration updates automatically
- Narration-Driven: TTS audio length determines timing—video adapts to script changes
Why No Sandboxing?
- Trust Model: VideoML files are source code, not untrusted user input
- Power vs Safety: Full JavaScript access enables rich interactions, at the cost of requiring security review
- Renderer Context: Videos render in isolated headless browsers anyway
- Future Work: Sandboxed mode may be added for user-generated content scenarios
Quick Reference
Core Elements
| Element | Purpose | Key Attributes |
|---|---|---|
| <vml> | Root element | id, fps, width, height |
| <scene> | Video section | id, start, duration |
| <sequence> | Back-to-back | duration (auto) |
| <stack> | Parallel | duration (auto = max) |
| <layer> | Z-index container | style |
| <cue> | TTS narration | id, voice |
Timeline API Summary
| API | Access |
|---|---|
| Current frame | window.timeline.frame |
| Current time | window.timeline.time |
| Frame rate | window.timeline.fps |
| CSS var (frame) | var(--video-frame) |
| CSS var (time) | var(--video-time) |
Event Summary
| Event | Fires When |
|---|---|
| timeline:tick | Every frame |
| scene:start | Scene begins |
| scene:end | Scene ends |
| cue:start | Cue audio starts |
| cue:end | Cue audio ends |
Scope and Non-Goals
In Scope
- XML syntax and root element definition
- Temporal layout model (sequence, stack, auto-duration)
- Timeline API specification
- Lifecycle events (tick, scene, cue)
- Inline JavaScript and event handlers
- Web Component integration
Out of Scope
- Determinism Enforcement: VideoML does not guarantee reproducible output if scripts use randomness or external state
- Sandboxed Scripting: No security isolation for <script> blocks (treat VideoML as trusted source code)
- Backward Compatibility: <video> root is deprecated; <vml> is canonical
- Animation Keyframes: Use CSS animations or JavaScript; VideoML provides timing, not animation primitives
Related Topics
- VideoML Conformance — Validation and testing
- Live VOM — Real-time editing and recording
- Components Guide — Built-in Web Components
- Rendering Overview — Generating MP4 outputs
- Glossary — VideoML terminology reference
Next Steps
- Try It: Create your first VideoML file in Code to Video
- Learn Components: Explore Layouts and Components
- Advanced: Read about Live Mode for interactive editing