End-to-end autonomous pipeline: send a text message to a Telegram bot →
AI generates a script → voiceover → captions → video render → you approve → auto-publish
to Facebook, Instagram, YouTube, and TikTok. Fully self-hosted. Zero recurring cost.
A Temporal durable workflow orchestrates every step. If any step fails, it retries automatically. If the process crashes, it resumes exactly where it left off.
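The retry-and-resume behavior can be illustrated without Temporal itself. The following is a minimal standard-library sketch of the idea, not Temporal's API: each completed step is recorded in a JSON journal, failed steps are retried with exponential backoff, and a restarted run skips whatever the journal already contains. The names `run_pipeline`, `journal_path`, `max_attempts`, and `base_delay` are hypothetical and chosen only for illustration.

```python
import json
import time
from pathlib import Path
from typing import Callable

# A step is a (name, function) pair; the function receives prior results.
Step = tuple[str, Callable[[dict], str]]


def run_pipeline(
    steps: list[Step],
    journal_path: Path,
    max_attempts: int = 3,   # assumed value, not taken from the source
    base_delay: float = 0.0,  # seconds; doubled after each failed attempt
) -> dict:
    """Run steps in order, skipping any step already recorded in the journal."""
    journal = json.loads(journal_path.read_text()) if journal_path.exists() else {}
    for name, fn in steps:
        if name in journal:
            continue  # finished before a crash or restart; reuse the result
        for attempt in range(1, max_attempts + 1):
            try:
                result = fn(journal)
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # retries exhausted; surface the failure
                time.sleep(base_delay * 2 ** (attempt - 1))
        journal[name] = result
        # Persist after every step so a restart resumes from this point.
        journal_path.write_text(json.dumps(journal))
    return journal
```

Calling `run_pipeline` a second time with the same journal file executes nothing that already succeeded. Temporal provides the same guarantee through its persisted event history rather than a local file.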
A rejection loops back to Script Generation with the accumulated feedback (up to 5 rounds).
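As a rough sketch of that rejection loop: the function below regenerates a script with all feedback collected so far and stops after five rounds. It assumes "up to 5 rounds" means five generation attempts in total; `review_loop`, `generate`, and `review` are hypothetical names, with `review` returning `None` for approval or a feedback string for rejection.

```python
from typing import Callable, Optional

MAX_ROUNDS = 5  # from the text: up to 5 rounds


def review_loop(
    generate: Callable[[list[str]], str],
    review: Callable[[str], Optional[str]],
    max_rounds: int = MAX_ROUNDS,
) -> tuple[Optional[str], int]:
    """Return (approved_script, rounds_used); script is None if never approved."""
    feedback: list[str] = []
    for round_no in range(1, max_rounds + 1):
        # Every earlier rejection note is passed to the generator again.
        script = generate(feedback)
        note = review(script)
        if note is None:
            return script, round_no
        feedback.append(note)
    return None, max_rounds
```

Passing the whole feedback list, rather than only the latest note, keeps a later draft from reintroducing a problem that an earlier round already flagged.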
Every architectural choice was deliberate. Here are the six decisions that define the system.
The entire pipeline is event-sourced via Temporal. Every step is recorded, replayable, and automatically recoverable on crash. Workflow state persists across process restarts.
Every AI call chains through local GPU → cloud free tier → paid fallback. The pipeline never fails from a single service outage.
Telegram approval uses Temporal Signals — the workflow pauses indefinitely at the approval step, then resumes when the user taps Approve/Reject. No polling. No timeouts.
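The wait-for-signal pattern can be sketched with plain `asyncio`. This is an in-memory illustration only, and unlike a Temporal Signal it does not survive a process restart. `ApprovalStep` and its methods are hypothetical names; `signal` stands in for whatever the bot's button handler would call.

```python
import asyncio
from typing import Optional


class ApprovalStep:
    """Parks a coroutine until a decision is signalled."""

    def __init__(self) -> None:
        self._decided = asyncio.Event()
        self.decision: Optional[str] = None

    def signal(self, decision: str) -> None:
        # Hypothetical entry point for the Approve/Reject button handler.
        self.decision = decision
        self._decided.set()

    async def wait(self) -> str:
        # Suspends here; there is no polling loop and no timeout.
        await self._decided.wait()
        assert self.decision is not None
        return self.decision


async def demo() -> str:
    step = ApprovalStep()
    waiter = asyncio.create_task(step.wait())
    await asyncio.sleep(0)    # let the task reach the approval step
    assert not waiter.done()  # still parked, waiting for the user
    step.signal("approve")
    return await waiter
```

Running `asyncio.run(demo())` returns the signalled decision. The waiting coroutine consumes no CPU while parked, which is the property the signal-based design relies on.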
Five automated validation gates catch issues at every stage. Each gate retries up to 3× with corrective feedback injection before alerting.
Every asset is SHA-256 fingerprinted and stored in SQLite. Trigram Jaccard similarity (threshold 0.35) catches near-duplicate scripts automatically.
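A minimal sketch of both checks, under stated assumptions: trigrams are taken over characters of lowercased, whitespace-normalized text (word trigrams would be an equally plausible reading), and a similarity at or above 0.35 counts as a near-duplicate. The table layout and the function names are hypothetical.

```python
import hashlib
import sqlite3

THRESHOLD = 0.35  # from the text


def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest used as the content address."""
    return hashlib.sha256(data).hexdigest()


def trigrams(text: str) -> set[str]:
    # Assumption: character trigrams over normalized text.
    norm = " ".join(text.lower().split())
    return {norm[i:i + 3] for i in range(len(norm) - 2)}


def jaccard(a: str, b: str) -> float:
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 1.0  # two texts with no trigrams are treated as identical
    return len(ta & tb) / len(ta | tb)


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS scripts (sha256 TEXT PRIMARY KEY, body TEXT)"
    )
    return db


def register_script(db: sqlite3.Connection, text: str) -> bool:
    """Store the script unless a near-duplicate exists; return True if stored."""
    for (existing,) in db.execute("SELECT body FROM scripts"):
        if jaccard(existing, text) >= THRESHOLD:
            return False
    db.execute(
        "INSERT INTO scripts (sha256, body) VALUES (?, ?)",
        (fingerprint(text.encode("utf-8")), text),
    )
    db.commit()
    return True
```

The linear scan over stored scripts is fine for a small library. An exact copy has similarity 1.0, so the same check also catches it without a separate hash lookup.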
The system learns from its own performance data: hook type analysis, audience profiling, trend scoring. Performance metrics feed back into scriptwriter prompts via Mem0.
Every external dependency has a cascading fallback. If the local GPU is down, cloud takes over. If cloud is unreachable, a free alternative kicks in.
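The cascade can be sketched as an ordered list of providers tried in turn. This is an illustration of the pattern, not the project's code: `call_with_fallback`, `AllProvidersFailed`, and the provider names in the usage note are hypothetical.

```python
from typing import Callable, Sequence

Provider = tuple[str, Callable[[str], str]]


class AllProvidersFailed(RuntimeError):
    """Raised only when every provider in the chain has failed."""


def call_with_fallback(providers: Sequence[Provider], prompt: str) -> tuple[str, str]:
    """Try providers in priority order; return (provider_name, response)."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:
            # Remember why this tier failed, then fall through to the next.
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

A chain such as `[("local-gpu", local_llm), ("cloud", cloud_llm), ("fallback", last_resort)]` would be passed in priority order. Returning the provider name alongside the response makes it easy to log which tier actually served each request.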
Every artifact passes through automated validation before proceeding. Each gate retries up to 3× with corrective feedback injection, then sends a Telegram alert if remediation still fails.
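One way to picture a single gate, assuming "3×" means three attempts in total: the validator's findings are turned into a corrective instruction for the next attempt, and an alert callback fires only after the last attempt fails. `run_gate`, `produce`, `validate`, and `alert` are hypothetical names; in the real system the alert would be a Telegram message.

```python
from typing import Callable, Optional

MAX_ATTEMPTS = 3  # from the text; assumed to mean three attempts in total


def run_gate(
    produce: Callable[[Optional[str]], str],
    validate: Callable[[str], list[str]],
    alert: Callable[[str], None],
    max_attempts: int = MAX_ATTEMPTS,
) -> Optional[str]:
    """Return a valid artifact, or None after alerting that the gate failed."""
    feedback: Optional[str] = None
    problems: list[str] = []
    for _ in range(max_attempts):
        artifact = produce(feedback)
        problems = validate(artifact)
        if not problems:
            return artifact
        # Corrective feedback injection: tell the producer what to fix.
        feedback = "Fix these issues: " + "; ".join(problems)
    alert(f"Gate failed after {max_attempts} attempts: " + "; ".join(problems))
    return None
```

A validator here is any function that returns a list of problems, with an empty list meaning the artifact passes, so the same loop can serve every stage of the pipeline.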
| Layer | Technology | Role |
|---|---|---|
| Orchestration | Temporal.io | Durable workflow execution, event sourcing, crash recovery |
| Bot Interface | python-telegram-bot | User interaction — briefs, approvals, analytics |
| AI - Script | GPT-4o / DeepSeek / Claude | Multi-provider LLM chain with local-first priority |
| AI - Vision | GPT-4o Vision / Llama 3.3 | Multimodal media analysis for photos/videos |
| AI - Image | ComfyUI + FLUX / DALL-E 3 | AI-generated visuals with local GPU priority |
| TTS | XTTS-v2 / ElevenLabs / edge-tts | 4-tier voice synthesis with local voice cloning |
| Music | MusicGen (Meta) / Freesound | Local generative music on GPU |
| Captions | OpenAI Whisper (local) | Word-level timestamps for karaoke-style captions |
| Video | FFmpeg | 1080×1920 compositing: backgrounds, code blocks, diagrams, captions |
| Diagrams | Mermaid CLI + Chromium | Mermaid → PNG with adaptive theming, overlaid on video |
| Code Blocks | Pygments + Pillow | VS Code-style syntax highlighting → PNG |
| Storage | Cloudinary / SQLite | Cloud hosting + local content-addressable asset DB |
| Memory | Mem0 (self-hosted) | Semantic memory for script context and learning |
| Dashboard | FastAPI + HTMX + Alpine.js | 12-page Command Center (zero build step) |
| Trends | Reddit / HN / Google Trends / GitHub / YouTube | 5-source trend discovery with scoring algorithm |
| Publishing | Facebook / Instagram / YouTube / TikTok APIs | Multi-platform auto-publish with per-platform optimization |
| Deployment | Docker Compose (6 services) | PostgreSQL, Temporal, Worker, Bot, Dashboard |
| Testing | pytest-asyncio | 8 test modules, shared fixtures, async-native |
Every component is open-source, free-tier, or self-hosted. The system delivers capabilities comparable to paid SaaS tools (Descript, Opus Clip, Repurpose.io) at zero recurring cost.
The architecture deep-dive covers Temporal workflow anatomy, rendering engine internals, intelligence layer design, deployment topology, and the complete project structure.