VideoML Standard

Canonical docs now live at videoml.org/docs/standard.

Status: Draft (v0.1)

VideoML is an XML-based standard for creating videos using declarative markup. It treats time as a layout axis and uses the browser DOM as the canonical runtime. VideoML feels like web programming with a timeline.

What You'll Learn

Core VideoML syntax and root element structure
Temporal layout model (sequence, stack, duration)
Timeline API for accessing playback state
Lifecycle events for scene and cue transitions
Inline JavaScript and event handlers
Design decisions behind VideoML's approach

Canonical Root Element

Every VideoML document starts with a <vml> root element. This is the canonical format—XML files are the source of truth, and all derived outputs (MP4 videos, JSON timelines) are generated artifacts.

Required Attributes

<vml
  id="intro"
  title="Introduction Video"
  fps="30"
  width="1920"
  height="1080"
>
  <!-- scenes go here -->
</vml>

id: Unique identifier for this video
title: Human-readable name
fps: Frames per second (24, 30, or 60 typical)
width / height: Video dimensions in pixels

File Extension

VideoML files use the .babulus.xml extension. This disambiguates them from generic XML and signals they contain time-based video markup.

Design Decision: XML was chosen over JSON because it maps naturally to HTML/DOM, supports mixed content (text + elements), and allows Web Component syntax (<title-slide />).

Temporal Layout Model

VideoML automatically calculates durations based on content, treating time the way CSS treats space. Add a 5-second scene, and the video grows by 5 seconds, with no manual duration math required.

Sequence (Back-to-Back)

The <sequence> element plays children one after another. Total duration equals the sum of all child durations.

<sequence>
  <scene duration="3s">Scene 1</scene>  <!-- 0-3s -->
  <scene duration="2s">Scene 2</scene>  <!-- 3-5s -->
  <scene duration="4s">Scene 3</scene>  <!-- 5-9s -->
</sequence>
<!-- Total: 9 seconds -->
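The start and end times in the comments above follow from a running sum. A minimal sketch of that arithmetic in plain JavaScript, assuming durations are already resolved to seconds (the helper name `sequenceOffsets` is hypothetical and not part of VideoML):

```javascript
// Hypothetical helper, not part of VideoML: lays out children back-to-back.
function sequenceOffsets(durations) {
  let cursor = 0; // running start time in seconds
  const spans = durations.map((duration) => {
    const span = { start: cursor, end: cursor + duration };
    cursor += duration;
    return span;
  });
  return { spans, total: cursor };
}

const layout = sequenceOffsets([3, 2, 4]);
console.log(layout.spans); // three spans: 0-3, 3-5, 5-9
console.log(layout.total); // 9
```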

Stack (Parallel)

The <stack> element plays children simultaneously. Total duration equals the longest child.

<stack>
  <!-- Visual layer: 10s -->
  <layer duration="10s">
    <title-slide title="Welcome" />
  </layer>

  <!-- Audio narration: 8s -->
  <audio src="narration.wav" duration="8s" />
</stack>
<!-- Total: 10 seconds (longest child) -->
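Sequence and stack rules compose when containers nest. The sketch below is one way to express that, not the VideoML implementation. It assumes a hypothetical node shape (`type`, optional `duration` in seconds, optional `children`), lets an explicit duration win, and treats any non-stack container as sequential:

```javascript
// Hypothetical node shape: { type, duration?, children? }; durations in seconds.
function computeDuration(node) {
  if (typeof node.duration === 'number') return node.duration; // explicit wins
  const childDurations = (node.children || []).map(computeDuration);
  if (node.type === 'stack') {
    return Math.max(0, ...childDurations); // parallel: longest child
  }
  return childDurations.reduce((sum, d) => sum + d, 0); // sequential: sum
}

const tree = {
  type: 'sequence',
  children: [
    { type: 'scene', duration: 3 },
    {
      type: 'stack',
      children: [
        { type: 'layer', duration: 10 },
        { type: 'audio', duration: 8 },
      ],
    },
  ],
};
console.log(computeDuration(tree)); // 13
```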

Automatic Duration

If you omit the duration attribute, VideoML calculates it from child elements or generated audio (TTS).

<scene>
  <cue>Welcome to the tutorial.</cue>
</scene>
<!-- Duration = length of TTS audio for "Welcome to the tutorial." -->

Tip: Narration-driven videos use automatic duration. Motion-driven videos (animations, screen recordings) set explicit durations.

Timeline API

Access current playback state via the global window.timeline object. This API is available in both Live Mode (real-time editing) and Export Mode (rendering).

Properties

Property	Type	Description
`window.timeline.frame`	number	Current frame (0-indexed)
`window.timeline.time`	number	Current time in seconds
`window.timeline.fps`	number	Frames per second
`window.timelines`	Timeline[]	Array of all timelines (for multi-player sync)
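The table does not spell out how `frame` and `time` relate. A sketch under the assumption that time equals frame divided by fps, with frame 0 at 0 seconds (both function names are hypothetical):

```javascript
// Assumption (not stated by the spec): time = frame / fps, frame 0 at 0s.
function frameToTime(frame, fps) {
  return frame / fps;
}

function timeToFrame(time, fps) {
  return Math.floor(time * fps); // frame that contains this instant
}

console.log(frameToTime(90, 30)); // 3
console.log(timeToFrame(3.5, 30)); // 105
```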

CSS Variables

VideoML sets CSS custom properties on the root element every frame, such as --video-frame, --video-time, and --video-duration. Use these in stylesheets for dynamic effects.

/* Access timeline state in CSS */
.progress-bar {
  width: calc(100% * var(--video-time) / var(--video-duration));
}

.fade-in {
  opacity: calc(var(--video-frame) / 30); /* Fade in over 30 frames */
}
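When the same values are needed in script, the two CSS rules above can be written as plain functions. This is only an illustration with hypothetical names. Unlike the CSS `width` rule, the progress function clamps its result to 0..100:

```javascript
// Hypothetical equivalents of the two CSS rules above.
function progressPercent(time, duration) {
  if (duration <= 0) return 0; // avoid dividing by zero
  return Math.min(100, Math.max(0, (time / duration) * 100));
}

function fadeInOpacity(frame, fadeFrames = 30) {
  // CSS clamps opacity to the range 0..1; do the same here.
  return Math.min(1, Math.max(0, frame / fadeFrames));
}

console.log(progressPercent(4.5, 9)); // 50
console.log(fadeInOpacity(15)); // 0.5
console.log(fadeInOpacity(45)); // 1
```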

JavaScript Example

// Log playback state every frame
window.addEventListener('timeline:tick', (e) => {
  console.log("Frame " + window.timeline.frame + " at " + window.timeline.time + "s");
});

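// Calculate progress percentage.
// totalFrames is not part of the Timeline API shown here; derive it from
// your video's known length (9 seconds is only an example).
const totalFrames = 9 * window.timeline.fps;
const progress = (window.timeline.frame / totalFrames) * 100;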

Performance: timeline:tick fires every frame (24 to 60 times per second at typical frame rates). Avoid expensive operations in tick handlers.
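One way to keep tick handlers cheap is to do heavier work only on every Nth frame. The helper below is a hypothetical sketch, not a VideoML API, and it simulates the tick stream with a loop so it runs outside a browser:

```javascript
// Hypothetical helper: run `work` only on every Nth frame of a tick stream.
function everyNthFrame(n, work) {
  return function onTick(frame) {
    if (frame % n === 0) work(frame);
  };
}

// Simulate two seconds of ticks at 30 fps without a browser.
const sampled = [];
const onTick = everyNthFrame(30, (frame) => sampled.push(frame));
for (let frame = 0; frame < 60; frame++) onTick(frame);
console.log(sampled); // [0, 30]
```

In a page, the same handler could be called as `onTick(window.timeline.frame)` from inside a `timeline:tick` listener.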

Lifecycle Events

VideoML dispatches events when scenes and cues start/end. Listen to these events to trigger animations, update UI, or log analytics.

Event Types

Event	Dispatched On	Detail
`timeline:tick`	`<vml>`, `window`	`{ frame, time, fps }`
`scene:start`	`<scene>`, `window`	`{ sceneId, startTime }`
`scene:end`	`<scene>`, `window`	`{ sceneId, endTime }`
`cue:start`	`<cue>`, `window`	`{ cueId, text }`
`cue:end`	`<cue>`, `window`	`{ cueId }`

Example: Scene Transitions

<vml id="demo" title="Demo" fps="30" width="1920" height="1080">
  <script>
    // Log when scenes change
    window.addEventListener('scene:start', (e) => {
      console.log(`Scene started: ${e.detail.sceneId}`);
    });
  </script>

  <scene id="intro" duration="3s">
    <!-- scene content -->
  </scene>
</vml>
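The start and end events can be paired to measure how long each scene was on screen. The sketch below is illustrative only. It assumes `startTime` and `endTime` are in seconds, uses a hypothetical `trackSceneDurations` helper, and dispatches the events by hand on a plain `EventTarget` so it runs without VideoML:

```javascript
// Stand-in for CustomEvent so the sketch also runs in Node.js.
class DetailEvent extends Event {
  constructor(type, detail) {
    super(type);
    this.detail = detail;
  }
}

// Hypothetical helper: records how long each scene was on screen.
function trackSceneDurations(target) {
  const starts = new Map();
  const durations = new Map();
  target.addEventListener('scene:start', (e) => {
    starts.set(e.detail.sceneId, e.detail.startTime);
  });
  target.addEventListener('scene:end', (e) => {
    const start = starts.get(e.detail.sceneId);
    if (start !== undefined) {
      durations.set(e.detail.sceneId, e.detail.endTime - start);
    }
  });
  return durations;
}

// Simulated run; in a page, pass `window` and let VideoML dispatch the events.
const bus = new EventTarget();
const durations = trackSceneDurations(bus);
bus.dispatchEvent(new DetailEvent('scene:start', { sceneId: 'intro', startTime: 0 }));
bus.dispatchEvent(new DetailEvent('scene:end', { sceneId: 'intro', endTime: 3 }));
console.log(durations.get('intro')); // 3
```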

Inline JavaScript

VideoML supports <script> tags and on:* event handler attributes. Scripts execute when inserted into the DOM, just like HTML.

Script Blocks

<vml id="interactive" title="Interactive Demo" fps="30" width="1920" height="1080">
  <script>
    // Initialize state
    let clickCount = 0;

    function handleClick() {
      clickCount++;
      console.log(`Clicked ${clickCount} times`);
    }
  </script>

  <scene duration="5s">
    <button on:click="handleClick()">Click Me</button>
  </scene>
</vml>

Event Handler Attributes

Use on:* attributes to attach event handlers inline. The handler scope includes:

event: The DOM event object
target: The element that triggered the event
timeline: The current timeline object
root: The <vml> root element

<button on:click="console.log(timeline.frame)">Log Frame</button>
<input on:change="target.value = target.value.toUpperCase()" />
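To make the handler scope concrete, here is one way a runtime could expose those four names to an attribute string. This is an assumption for illustration, not VideoML's actual binding code, and `compileHandler` is a hypothetical name:

```javascript
// Assumption: the runtime exposes event, target, timeline, and root as
// local names. This is one way to do that, not VideoML's actual code.
function compileHandler(source) {
  const fn = new Function('event', 'target', 'timeline', 'root', source);
  return (event, timeline, root) => fn(event, event.target, timeline, root);
}

// Simulated input element and change event.
const input = { value: 'hello' };
const handler = compileHandler('target.value = target.value.toUpperCase()');
handler({ type: 'change', target: input }, { frame: 0 }, null);
console.log(input.value); // "HELLO"
```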

Ignoring Subtrees

Use data-videoml-ignore="true" to exclude DOM subtrees from handler rebinding and mutation recording. This is useful for third-party widgets or performance-sensitive areas.

<div data-videoml-ignore="true">
  <!-- This subtree is ignored by VideoML runtime -->
  <iframe src="external-widget.html"></iframe>
</div>
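A tool that walks the document can honor the same attribute by checking a node's ancestors. The sketch below is a hypothetical check, not the runtime's own logic. It uses small stand-in objects so it runs outside a browser:

```javascript
// Hypothetical check: is this node inside an ignored subtree?
// Works with DOM elements or any objects exposing the same two members.
function isIgnored(node) {
  for (let el = node; el; el = el.parentElement) {
    if (el.getAttribute('data-videoml-ignore') === 'true') return true;
  }
  return false;
}

// Minimal stand-ins for DOM elements so the sketch runs outside a browser.
function fakeElement(attrs, parentElement = null) {
  return { parentElement, getAttribute: (name) => attrs[name] ?? null };
}

const wrapper = fakeElement({ 'data-videoml-ignore': 'true' });
const iframe = fakeElement({ src: 'external-widget.html' }, wrapper);
const button = fakeElement({});
console.log(isIgnored(iframe)); // true
console.log(isIgnored(button)); // false
```

With real DOM elements, `el.closest('[data-videoml-ignore="true"]') !== null` gives the same answer.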

Scenes and Layers

Scenes are the primary structural unit. Layers provide z-index stacking within scenes.

Scene Element

<scene
  id="intro"
  start="0s"
  duration="5s"
>
  <!-- scene content -->
</scene>

id: Unique identifier
start: Absolute start time (optional; defaults to sequential placement after the previous item)
duration: Scene length (optional if auto-calculated)

Layer Element

Layers stack visually using CSS z-index. Higher z-index appears on top.

<scene duration="8s">
  <!-- Background layer (z-index: 0) -->
  <layer style="z-index: 0">
    <background-gradient colors="blue,purple" />
  </layer>

  <!-- Content layer (z-index: 10) -->
  <layer style="z-index: 10">
    <title-slide title="Hello" />
  </layer>

  <!-- Overlay layer (z-index: 20) -->
  <layer style="z-index: 20">
    <lower-third name="Jane Doe" title="CEO" />
  </layer>
</scene>

Cues and Text-to-Speech

Cues contain narration text that gets converted to audio via TTS providers. When a scene has no explicit duration, the audio duration determines the scene length.

Cue Element

<scene>
  <cue id="intro-1" voice="en-US-Neural">
    Welcome to the tutorial. Today we'll cover the basics.
  </cue>
</scene>
<!-- Scene duration = TTS audio length -->

id: Unique identifier for this cue
voice: TTS voice name (provider-specific)
Text content: The narration script

Multiple Cues

Multiple cues in a scene play sequentially (like a sequence).

<scene>
  <cue id="line-1">First sentence.</cue>
  <cue id="line-2">Second sentence.</cue>
</scene>
<!-- Total duration = sum of both audio clips -->

Narration Track

Use <narration> for voiceover that spans multiple transitions. Narration items sit on the timeline (like scenes/transitions) but render no visuals.

<transition id="push-01" effect="push" duration="12f" />
<narration id="layouts-voice">
  <cue id="layouts">
    Use one-column, two-column, three-column, and grid layouts.
  </cue>
</narration>

Place <narration> just before the scenes it should align with. It will start at the next scene's start unless you provide start or duration.

Web Components

VideoML uses Web Components (custom elements) for reusable UI. Any hyphenated tag is treated as a Web Component.

Component Syntax

<!-- Built-in components -->
<title-slide title="Welcome" subtitle="Get Started" />
<lower-third name="Jane Doe" title="CEO" />
<code-block language="javascript">
  console.log('Hello');
</code-block>

<!-- Custom components -->
<my-chart data="[1,2,3]" type="bar" />

Props Attribute

For complex data, use the props attribute with JSON.

<data-visualization
  props='{
    "data": [
      {"month": "Jan", "sales": 100},
      {"month": "Feb", "sales": 150}
    ],
    "chartType": "line",
    "showLegend": true
  }'
/>
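Inside a component, the attribute value is just a string that has to be parsed. A minimal sketch, assuming a hypothetical `parseProps` helper (VideoML may parse props differently):

```javascript
// Hypothetical helper: read a props attribute value as JSON.
function parseProps(attributeValue) {
  if (!attributeValue) return {}; // attribute missing or empty
  try {
    return JSON.parse(attributeValue);
  } catch (err) {
    throw new Error(`Invalid JSON in props attribute: ${err.message}`);
  }
}

const props = parseProps('{"data":[{"month":"Jan","sales":100}],"chartType":"line"}');
console.log(props.chartType); // "line"
console.log(props.data[0].sales); // 100
```

In a custom element, the input would typically come from `this.getAttribute('props')`.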

Design Decision: Web Components avoid framework lock-in. They work with vanilla JavaScript, React, Vue, or any library.

Transitions

See also: the full transitions guide and integration test gallery.

Transitions are first-class timeline items that sit between scenes. A transition is a container (like a scene) that can hold visuals and audio, and it can either overlap adjacent scenes or insert time between them.

Timing: All time values accept seconds or frames (e.g. 0.6s, 12f). Time expressions can reference scene(), cue(), and mark().
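Literal values in the two units can be normalized to frames once the frame rate is known. The parser below is a hypothetical sketch. It handles only plain literals such as 0.6s and 12f, not scene(), cue(), or mark() expressions, and it assumes fractional results round to the nearest frame:

```javascript
// Hypothetical parser for literal time values such as "0.6s" or "12f".
// It does not handle expressions like scene(), cue(), or mark().
function toFrames(value, fps) {
  const match = /^(\d+(?:\.\d+)?)(s|f)$/.exec(value.trim());
  if (!match) throw new Error(`Unsupported time value: ${value}`);
  const amount = Number(match[1]);
  // Assumption: fractional results round to the nearest whole frame.
  return match[2] === 'f' ? Math.round(amount) : Math.round(amount * fps);
}

console.log(toFrames('0.6s', 30)); // 18
console.log(toFrames('12f', 30)); // 12
```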

Transition Element

Use <transition> between scenes for crossfades, wipes, and any branded or custom transitions.

Attribute	Type	Notes
`id`	string	Required unique id.
`start` / `end` / `duration`	time	Optional. Explicit timing for the transition window.
`effect`	string	Named transition preset (e.g. `crossfade`, `fade`, `wipe`).
`ease`	string	GSAP ease string (e.g. `power2.inOut`).
`mode`	string	`overlap` (default) or `insert`.
`overflow`	string	Visual overflow behavior: `clip`, `extend`, `allow`.
`overflow-audio`	string	Audio overflow behavior: `clip`, `extend`, `allow`.

Audio in Transitions

Transitions can include SFX and music via <sfx>, <music>, or <audio kind=...>.

<transition id="wipe-01" effect="wipe" duration="18f" ease="power2.inOut">
  <sfx id="whoosh" start="0f" />
</transition>

Scene Enter/Exit (Convenience)

For quick fades, use per-scene convenience attributes. These do not crossfade; they only animate the scene itself.

<scene
  id="intro"
  enter="fade"
  enter-duration="12f"
  exit="fade"
  exit-duration="12f"
>
  <cue id="intro-cue">Welcome to the video.</cue>
</scene>

Crossfade: Use a dedicated <transition> element for true crossfades between scenes.
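For the per-scene enter and exit fades, the effect on a single scene can be pictured as an opacity curve. The sketch below assumes a linear ramp, which the spec does not confirm, and uses a hypothetical `sceneOpacity` function with all values in frames:

```javascript
// Assumption: a "fade" ramps opacity linearly; real presets may ease.
function sceneOpacity(frame, sceneFrames, enterFrames, exitFrames) {
  const fadeIn = enterFrames > 0 ? frame / enterFrames : 1;
  const fadeOut = exitFrames > 0 ? (sceneFrames - frame) / exitFrames : 1;
  return Math.max(0, Math.min(1, fadeIn, fadeOut));
}

// A 90-frame scene with 12-frame fades on both ends.
console.log(sceneOpacity(6, 90, 12, 12)); // 0.5
console.log(sceneOpacity(45, 90, 12, 12)); // 1
console.log(sceneOpacity(84, 90, 12, 12)); // 0.5
```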

Complete Example

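Here's a full VideoML document that combines the core features covered above: a script block, explicit and automatic durations, a stack, layers, and Web Components: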

<vml
  id="product-demo"
  title="Product Demo Video"
  fps="30"
  width="1920"
  height="1080"
>
  <script>
    // Track scene transitions
    window.addEventListener('scene:start', (e) => {
      console.log(`Started: ${e.detail.sceneId}`);
    });
  </script>

  <!-- Title scene with explicit duration -->
  <scene id="title" duration="3s">
    <layer>
      <title-slide
        title="Product Demo"
        subtitle="Version 2.0"
      />
    </layer>
  </scene>

  <!-- Narration scene with auto duration from TTS -->
  <scene id="intro">
    <stack>
      <!-- Visual layer -->
      <layer>
        <content-screen title="Overview">
          <ul>
            <li>Fast performance</li>
            <li>Easy to use</li>
            <li>Secure by default</li>
          </ul>
        </content-screen>
      </layer>

      <!-- Narration track -->
      <cue id="intro-narration">
        Our new product offers fast performance, ease of use, and security.
      </cue>
    </stack>
  </scene>

  <!-- Code demo scene -->
  <scene id="code-demo" duration="8s">
    <layer>
      <code-block language="javascript" filename="app.js">
import { render } from 'babulus';

render({
  fps: 30,
  width: 1920,
  height: 1080
});
      </code-block>
    </layer>
  </scene>

  <!-- Outro with lower third -->
  <scene id="outro" duration="4s">
    <layer style="z-index: 0">
      <background-gradient colors="#667eea,#764ba2" />
    </layer>
    <layer style="z-index: 10">
      <lower-third
        name="Learn More"
        title="babulus.dev"
      />
    </layer>
  </scene>
</vml>

Design Decisions

Understanding the "why" behind VideoML's approach:

Why XML Over JSON?

HTML Familiarity: Web developers already know XML/HTML syntax
Mixed Content: XML supports text + elements naturally (JSON requires nested objects)
Web Components: Hyphenated custom elements (<title-slide />) map directly to the Web Components spec
Tooling: XML parsers, validators, and editors are mature and widely available

Why DOM Runtime?

Zero Translation: VideoML elements become DOM nodes directly—no virtual layer
CSS Compatibility: Use standard CSS for styling, animations, and layout
JavaScript Integration: Manipulate video content with familiar DOM APIs
Browser Features: Leverage existing browser capabilities (accessibility, dev tools, extensions)

Why Temporal Layout?

No Duration Math: Eliminate error-prone manual calculations
Composability: Scenes and sequences nest naturally
Flexibility: Swap a 3s scene for a 5s scene—total duration updates automatically
Narration-Driven: TTS audio length determines timing—video adapts to script changes

Why No Sandboxing?

Trust Model: VideoML files are source code, not untrusted user input
Power vs Safety: Full JavaScript access enables rich interactions (at the cost of requiring security review)
Renderer Context: Videos render in isolated headless browsers anyway
Future Work: Sandboxed mode may be added for user-generated content scenarios

Quick Reference

Core Elements

Element	Purpose	Key Attributes
`<vml>`	Root element	`id`, `fps`, `width`, `height`
`<scene>`	Video section	`id`, `start`, `duration`
`<sequence>`	Back-to-back	`duration` (auto)
`<stack>`	Parallel	`duration` (auto = max)
`<layer>`	Z-index container	`style`
`<cue>`	TTS narration	`id`, `voice`

Timeline API Summary

API	Access
Current frame	`window.timeline.frame`
Current time	`window.timeline.time`
Frame rate	`window.timeline.fps`
CSS var (frame)	`var(--video-frame)`
CSS var (time)	`var(--video-time)`

Event Summary

Event	Fires When
`timeline:tick`	Every frame
`scene:start`	Scene begins
`scene:end`	Scene ends
`cue:start`	Cue audio starts
`cue:end`	Cue audio ends

Scope and Non-Goals

In Scope

XML syntax and root element definition
Temporal layout model (sequence, stack, auto-duration)
Timeline API specification
Lifecycle events (tick, scene, cue)
Inline JavaScript and event handlers
Web Component integration

Out of Scope

Determinism Enforcement: VideoML does not guarantee reproducible output if scripts use randomness or external state
Sandboxed Scripting: No security isolation for <script> blocks (treat VideoML as trusted source code)
Backward Compatibility: <video> root is deprecated; <vml> is canonical
Animation Keyframes: Use CSS animations or JavaScript—VideoML provides timing, not animation primitives
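Because determinism is left to the author, a script that needs random values can use a seeded generator instead of Math.random. The sketch below uses mulberry32, a small general-purpose PRNG. It is not part of VideoML, and the seed value is arbitrary:

```javascript
// mulberry32: a small seeded PRNG. Not part of VideoML.
function mulberry32(seed) {
  let a = seed | 0;
  return function next() {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Same seed, same sequence: two renders would see identical "random" values.
const renderA = mulberry32(42);
const renderB = mulberry32(42);
console.log(renderA() === renderB()); // true
```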

Related Topics

VideoML Conformance: Validation and testing
Live VOM: Real-time editing and recording
Components Guide: Built-in Web Components
Rendering Overview: Generating MP4 outputs
Glossary: VideoML terminology reference

Next Steps

Try It: Create your first VideoML file in Code to Video
Learn Components: Explore Layouts and Components
Advanced: Read about Live Mode for interactive editing
