AWS Polly TTS Quick Start
AWS Polly is Amazon's text-to-speech service. It's cost-effective ($4 per 1 million characters for standard voices, $16 for neural) and works well for both development and production workloads.
Prerequisites
- AWS account with Polly access
- AWS credentials configured via one of:
  - Environment variables: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`
  - AWS credentials file: `~/.aws/credentials`
  - IAM role (if running on EC2/ECS)
- Python `boto3` library installed:

```bash
pip install boto3
```
Configuration
1. Add AWS Polly to your config
Edit .babulus/config.yml:
```yaml
providers:
  aws_polly:
    region: "us-east-1"
    voice_id: "Joanna"
    engine: "standard"  # or "neural" for better quality
```

2. Set provider in your DSL
Edit your .babulus.yml file:
```yaml
voiceover:
  provider: aws-polly  # or just "aws"
  sample_rate_hz: 16000  # Required: Polly PCM only supports 8000 or 16000 Hz
```

Important: AWS Polly's PCM format only supports 8000 Hz or 16000 Hz sample rates. Use 16000 Hz for best quality.
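For reference, here is a minimal Python sketch of what a Polly request with these settings looks like when made directly through boto3, and how the raw PCM it returns can be wrapped in a WAV file. This is not Babulus's internal code: the helper names (`build_polly_request`, `pcm_to_wav`, `synthesize_to_wav`) are hypothetical. The `SynthesizeSpeech` parameters (`Text`, `TextType`, `VoiceId`, `Engine`, `OutputFormat`, `SampleRate`) are Polly's own, and Polly returns PCM as signed 16-bit, mono, little-endian samples. The validation error deliberately mirrors the message listed under Troubleshooting.

```python
import io
import wave

# Polly's PCM output format only accepts these sample rates.
POLLY_PCM_RATES = (8000, 16000)


def build_polly_request(text, voice_id="Joanna", engine="standard", sample_rate_hz=16000):
    """Build kwargs for boto3's polly.synthesize_speech (hypothetical helper)."""
    if sample_rate_hz not in POLLY_PCM_RATES:
        raise ValueError(
            f"AWS Polly PCM only supports sample_rate_hz {list(POLLY_PCM_RATES)}"
        )
    # Restricted to the two engines this guide covers.
    if engine not in ("standard", "neural"):
        raise ValueError(f"unsupported engine: {engine!r}")
    return {
        "Text": text,
        "TextType": "text",  # plain text, no SSML
        "VoiceId": voice_id,
        "Engine": engine,
        "OutputFormat": "pcm",
        "SampleRate": str(sample_rate_hz),  # Polly expects the rate as a string
    }


def pcm_to_wav(pcm_bytes, sample_rate_hz):
    """Wrap Polly's raw PCM (16-bit signed, mono, little-endian) in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate_hz)
        wav.writeframes(pcm_bytes)
    return buf.getvalue()


def synthesize_to_wav(text, region="us-east-1", **request_options):
    """Call Polly and return WAV bytes. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the pure helpers above work without it

    request = build_polly_request(text, **request_options)
    client = boto3.client("polly", region_name=region)
    response = client.synthesize_speech(**request)
    pcm = response["AudioStream"].read()
    return pcm_to_wav(pcm, int(request["SampleRate"]))
```

At 16000 Hz, one second of this PCM is 32,000 bytes (16,000 samples × 2 bytes each), which is a quick sanity check on the size of what Polly returns.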
3. Environment-Based Configuration (Recommended)
For multi-environment workflows:
```yaml
voiceover:
  provider:
    development: openai
    aws: aws  # Use Polly for AWS testing
    production: elevenlabs
  sample_rate_hz:
    development: 24000  # OpenAI
    aws: 16000  # Polly PCM maximum
    production: 44100  # ElevenLabs
```

Then generate with:

```bash
BABULUS_ENV=aws babulus generate your-video.babulus.yml
```

Available Voices
Standard Engine (Cheaper)
- English (US): Joanna, Matthew, Ivy, Kendra, Kimberly, Salli, Joey, Justin
- English (UK): Amy, Emma, Brian
- Many other languages: See AWS Polly Voices
Neural Engine (Better Quality)
- English (US): Joanna, Matthew, Ivy, Kendra, Kimberly, Salli, Joey, Justin, Kevin, Ruth, Stephen
- Requires `engine: "neural"` in config
- Costs 4x as much as standard voices (see Pricing)
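Because the two engines support different voice sets (Kevin, Ruth, and Stephen appear only in the neural list above), a pre-flight check can catch a voice/engine mismatch before any request is sent. This is a minimal sketch that hard-codes only the US English voices listed in this guide; the `voice_engine_warning` helper is hypothetical, and since the lists are not exhaustive it returns a warning rather than raising. The authoritative, current list comes from Polly's `DescribeVoices` API (`describe_voices` in boto3).

```python
# US English voices exactly as listed in this guide. Not exhaustive:
# Polly offers more voices than are listed here.
STANDARD_US = {"Joanna", "Matthew", "Ivy", "Kendra", "Kimberly", "Salli", "Joey", "Justin"}
NEURAL_US = STANDARD_US | {"Kevin", "Ruth", "Stephen"}


def voice_engine_warning(voice, engine):
    """Return a warning string if `voice` looks wrong for `engine`, else None."""
    listed = NEURAL_US if engine == "neural" else STANDARD_US
    if voice in listed:
        return None
    if engine == "standard" and voice in NEURAL_US:
        return f'{voice} is listed only for the neural engine; set engine: "neural"'
    return f"{voice} is not in this guide's US English list; confirm with DescribeVoices"


print(voice_engine_warning("Matthew", "standard"))  # None
print(voice_engine_warning("Kevin", "standard"))  # suggests switching to neural
```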
Voice Selection
You can override the voice per DSL:
```yaml
voiceover:
  provider: aws
  voice: Matthew  # Override default voice
  sample_rate_hz: 16000
```

Or use the default from config.
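The precedence described here (a `voice` set in the DSL wins; otherwise `voice_id` from `.babulus/config.yml` is used) can be expressed as a small function. This sketches the rule only and is not Babulus's implementation; `resolve_voice` is a hypothetical name, and the dicts stand in for the parsed YAML.

```python
def resolve_voice(dsl_voiceover, provider_config):
    """Pick the Polly voice: DSL `voice` first, then config `voice_id`.

    Returns None if neither is set.
    """
    voice = dsl_voiceover.get("voice")
    if voice:
        return voice
    return provider_config.get("voice_id")


config = {"region": "us-east-1", "voice_id": "Joanna", "engine": "standard"}
print(resolve_voice({"provider": "aws", "voice": "Matthew"}, config))  # Matthew
print(resolve_voice({"provider": "aws"}, config))  # Joanna
```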
Pricing
- Standard voices: $4.00 per 1 million characters ($0.004/1K chars)
- Neural voices: $16.00 per 1 million characters ($0.016/1K chars)
- Much cheaper than ElevenLabs (~$0.30/1K chars)
- Compared with OpenAI TTS ($0.015/1K chars): standard voices cost less than a third as much, while neural voices cost slightly more
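Since Polly bills per character of input text, the cost of a script can be estimated before generating anything. A minimal sketch using the prices above; `estimate_polly_cost` is a hypothetical helper, `len()` only approximates Polly's character count, and it ignores any free-tier allowance.

```python
# Polly prices from the list above, in USD per 1 million characters.
PRICE_PER_MILLION = {"standard": 4.00, "neural": 16.00}


def estimate_polly_cost(text, engine="standard"):
    """Approximate USD cost of synthesizing `text` with the given engine."""
    return len(text) * PRICE_PER_MILLION[engine] / 1_000_000


script = "x" * 50_000  # stand-in for a 50,000-character narration script
for engine in ("standard", "neural"):
    print(engine, f"${estimate_polly_cost(script, engine):.2f}")
# standard $0.20
# neural $0.80
```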
Troubleshooting
Error: "AWS Polly PCM only supports sample_rate_hz [8000, 16000]"
Solution: Set sample_rate_hz: 16000 in your voiceover config.
Error: "Unable to locate credentials"
Solutions:
- Set environment variables:

```bash
export AWS_ACCESS_KEY_ID=your_key_id
export AWS_SECRET_ACCESS_KEY=your_secret_key
```

- Or create `~/.aws/credentials`:

```ini
[default]
aws_access_key_id = your_key_id
aws_secret_access_key = your_secret_key
```

- Or use IAM roles if running on AWS infrastructure.
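When debugging this error, it can help to check which of the sources above is actually present. The sketch below uses only the standard library and inspects just the two simplest sources: the environment variables and the shared credentials file. It is a simplified diagnostic, not boto3's resolution logic, which also consults other providers (for example `AWS_PROFILE`, the config file, SSO, and instance or container roles). `find_credential_source` is a hypothetical name.

```python
import configparser
import os
from pathlib import Path


def find_credential_source(env=None, credentials_path=None, profile="default"):
    """Report which simple AWS credential source is available, or None.

    Only checks environment variables and the shared credentials file;
    boto3's real provider chain checks more sources than this.
    """
    env = os.environ if env is None else env
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment variables"

    path = Path(credentials_path or Path.home() / ".aws" / "credentials")
    if path.is_file():
        parser = configparser.ConfigParser()
        parser.read(path)
        # Section names are case-sensitive, so "[default]" is an ordinary section.
        if parser.has_option(profile, "aws_access_key_id") and parser.has_option(
            profile, "aws_secret_access_key"
        ):
            return f"credentials file ({path}, profile [{profile}])"
    return None


source = find_credential_source()
print(source or "No credentials found: set env vars or create ~/.aws/credentials")
```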
Low Audio Quality
Solution: Switch to neural engine for better quality:
```yaml
providers:
  aws_polly:
    engine: "neural"
```

Note: Neural voices cost 4x as much as standard voices but sound significantly better.
Comparison with Other Providers
| Feature | AWS Polly | OpenAI TTS | ElevenLabs |
|---|---|---|---|
| Price (per 1K chars) | $0.004-0.016 | $0.015 | ~$0.30 |
| Quality | Good (neural) / OK (standard) | Very Good | Excellent |
| Sample rates | 8000, 16000 Hz | 24000 Hz | 22050-44100 Hz |
| Voices | 60+ voices, 20+ languages | 6 voices | 1000+ voices |
| Credentials | AWS credentials | API key | API key |
| Best for | Cost-conscious production | Development/prototyping | Final production |
Example: Complete DSL
```yaml
voiceover:
  provider: aws
  voice: Matthew
  sample_rate_hz: 16000
  seed: 1337
  lead_in_seconds: 0.25

scenes:
  - id: intro
    title: "Introduction"
    cues:
      - id: welcome
        label: "Welcome"
        voice: "Welcome to our presentation. Let's get started."
```

AWS Polly Features Not (Yet) Supported
- SSML tags for prosody control
- Speech marks for word-level timing
- Lexicons for pronunciation customization
These may be added in future releases. For now, Babulus uses plain text synthesis.