Architecture Deep-Dive

A technical walkthrough of how the Social Media Automation Engine works under the hood. Each section covers a distinct subsystem — from macro architecture to deployment topology.

Macro Architecture

The system is composed of 6 interconnected subsystems that form a closed feedback loop. Content performance data feeds back into trend scoring and script generation.

  TREND DISCOVERY ──→ CONTENT CALENDAR ──→ CONTENT PIPELINE ──→ MULTI-PLATFORM PUBLISHER
        ↑                                                                 │
        │                                                                 ↓
  INTELLIGENCE LAYER ←── ANALYTICS ENGINE ←───────────────────────────────┘

  ───────────────────────────────────────────────────────────────────────────────
  Each subsystem is a separate Temporal workflow running on independent schedules.
  The pipeline is event-sourced — every state transition is durable and replayable.

🔍 Trend Discovery

Scans 5 data sources (Reddit, HN, Google Trends, GitHub, YouTube) every 4 hours. Normalizes scores, deduplicates via trigram Jaccard, filters by niche, checks Mem0 for already-covered topics.
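The trigram Jaccard dedup can be sketched as follows. Function names are illustrative, not the project's actual API; the 0.35 threshold matches the data-layer default mentioned below:

```python
def trigrams(text: str) -> set[str]:
    """Character trigrams of a whitespace-normalized, lowercased title."""
    t = " ".join(text.lower().split())
    return {t[i:i + 3] for i in range(len(t) - 2)}

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of two titles' trigram sets."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def is_duplicate(title: str, seen: list[str], threshold: float = 0.35) -> bool:
    """A topic is a duplicate if any already-seen title exceeds the threshold."""
    return any(jaccard(title, s) >= threshold for s in seen)
```

Trigram overlap tolerates small wording differences ("Python 3.13 released" vs "Python 3.13 is released!") that exact-match dedup would miss.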

📅 Content Calendar

Picks top-scored topics, assigns optimal publish times based on historical engagement data. Respects per-platform cooldowns and daily production limits.

🎬 Content Pipeline

The core 11-step workflow: from script generation through video rendering to approval. Durable execution via Temporal with 5 QA gates and cascading fallback chains.

🚀 Multi-Platform Publisher

Publishes to Facebook (Graph API), Instagram (Graph API), YouTube (Data API v3), and TikTok. Per-platform metadata optimization (aspect ratio, hashtags, descriptions).

📊 Analytics Engine

Collects engagement metrics at checkpoints: 1h, 6h, 24h, 48h, 168h post-publish. Calculates viral scores and engagement rates per platform.
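The project's actual scoring formula isn't reproduced here, so the sketch below is an illustrative stand-in that shows the shape: interactions per view, weighted by early view velocity at each checkpoint:

```python
CHECKPOINT_HOURS = (1, 6, 24, 48, 168)  # collection points after publish

def engagement_rate(likes: int, comments: int, shares: int, views: int) -> float:
    """Interactions per view (0.0 before any views arrive)."""
    return (likes + comments + shares) / views if views else 0.0

def viral_score(rate: float, views: int, hours_live: float) -> float:
    """Illustrative stand-in: engagement rate weighted by view velocity."""
    return rate * (views / max(hours_live, 1.0))
```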

🧠 Intelligence Layer

Learns from performance data: hook type analysis (6 styles), content feature extraction, audience fingerprinting. Enriches scriptwriter prompts via Mem0 semantic memory.

Temporal Workflow Anatomy

The ContentPipelineWorkflow is a Temporal durable workflow with 11 activity steps, 5 QA gates, and a human approval signal. Rejection loops back to step 2 with accumulated feedback (up to 5 rounds).

01 · analyse_media

GPT-4o Vision analyses photos/videos sent to the bot. Extracts context, mood, subjects for script generation.

optional — skipped for text-only

02 · generate_script

Multi-provider LLM generates a video script. Enriched with Mem0 learned preferences and intelligence insights.

QA_SCRIPT — 3× retry with corrective feedback
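The "retry with corrective feedback" pattern behind QA_SCRIPT can be sketched generically; the callables and error messages here are hypothetical:

```python
from typing import Callable

def run_with_qa(
    generate: Callable[[list[str]], str],
    qa_check: Callable[[str], tuple[bool, list[str]]],
    max_attempts: int = 3,
) -> str:
    """Regenerate until the QA gate passes, feeding failures back as feedback."""
    feedback: list[str] = []
    for _ in range(max_attempts):
        result = generate(feedback)
        ok, issues = qa_check(result)
        if ok:
            return result
        feedback.extend(issues)  # corrective feedback enriches the next prompt
    raise RuntimeError(f"QA gate failed after {max_attempts} attempts: {feedback}")
```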

03 · send_progress

Telegram notification to user: script preview and generation progress.

04 · generate_voiceover

4-tier TTS chain: XTTS-v2 (local voice cloning) → Speaches/Kokoro → ElevenLabs → edge-tts.

QA_TTS — reverse transcription verification via Whisper
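The 4-tier chain is a cascading fallback: try each provider in priority order, collect failures, and only give up when every tier is exhausted. A minimal sketch (provider names mirror the chain above; the callables are hypothetical):

```python
from typing import Callable, Iterable

def synthesize_with_fallback(
    text: str,
    providers: Iterable[tuple[str, Callable[[str], bytes]]],
) -> tuple[str, bytes]:
    """Walk the TTS chain in priority order, returning the first success."""
    errors: list[str] = []
    for name, synth in providers:
        try:
            return name, synth(text)
        except Exception as exc:  # provider offline, quota exhausted, timeout...
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all TTS tiers failed: " + "; ".join(errors))
```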

05 · generate_captions

OpenAI Whisper (local) generates word-level timestamps. Custom ASS generator produces karaoke-style word-by-word highlighting.

QA_CAPTIONS — coverage, timing, font, spatial overlap checks
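Karaoke highlighting in ASS is driven by `\k` tags, where each tag holds its word for N centiseconds. A simplified sketch of turning Whisper word timings into one such dialogue line (the real generator also handles styling, line breaks, and overlap):

```python
def karaoke_text(words: list[tuple[str, float, float]]) -> str:
    """Render Whisper (word, start_s, end_s) timings as an ASS karaoke run."""
    parts = []
    for word, start_s, end_s in words:
        centiseconds = round((end_s - start_s) * 100)
        parts.append(f"{{\\k{centiseconds}}}{word}")
    return " ".join(parts)
```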

06 · generate_background_music

MusicGen (local GPU) → Freesound API → ElevenLabs → cached loops → FFmpeg procedural ambient.

best-effort — pipeline continues without music

07 · fetch_background_video

Pexels API with freshness-aware selection. Falls back to procedural FFmpeg animations (gradient, bokeh).

best-effort

08 · generate_image

ComfyUI + FLUX on local GPU → DALL-E 3 cloud fallback. Used for visual overlays in the video.

best-effort

09 · render_video

FFmpeg compositing: animated backgrounds, text cards, code blocks, diagrams, karaoke captions, audio mixing. 1080×1920 output.

QA_RENDER + QA_VISUAL — frame sampling, black/frozen detection

10 · upload → send_preview → wait_for_approval

Upload to Cloudinary → send preview via Telegram → Temporal Signal pauses workflow indefinitely until user taps Approve/Reject.

Temporal Signal — indefinite pause

11 · publish_multiplatform

Publish to all enabled platforms (Facebook, Instagram, YouTube, TikTok) with per-platform metadata. Log result to DB.

Rejection → loop to step 02 (up to 5 rounds)

Scheduled Workflows

Four supporting workflows run on Temporal Schedules, operating autonomously alongside the main content pipeline.

  Workflow                     Schedule    Purpose                                         Key Activities
  TrendScanWorkflow            Every 4h    Discover trending topics from 5 data sources    Scan → normalize → deduplicate → filter by niche → store to queue
  ContentSchedulerWorkflow     Every 12h   Pick top topics, assign optimal publish slots   Score ranking → time slot optimization → platform assignment
  AnalyticsCollectorWorkflow   Every 4h    Collect engagement metrics at checkpoints       Pull metrics at 1h, 6h, 24h, 48h, 168h → calculate viral scores
  IntelligenceWorkflow         Every 24h   Learn from performance data                     Hook analysis → feature extraction → audience profiling → Mem0 update

Video Rendering Engine

The render activity (909 lines) builds complex FFmpeg filter chains that compose multiple visual layers into a final 1080×1920 vertical video with synchronized audio.

🎨 Animated Backgrounds

Gradient animations with hue rotation + soft bokeh glow effects. Falls back to Pexels stock footage or solid gradients.

💻 Code Block Overlays

Pygments + Pillow render VS Code-style syntax-highlighted code to PNG. Title bar, line numbers, Monokai theme.

📊 Mermaid Diagrams

Mermaid CLI (mmdc) + Chromium renders diagrams to PNG. Adaptive theming: samples frame brightness to pick dark/light theme.
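The brightness sampling step can be sketched with Pillow: compute the mean luminance of a grabbed frame and match the Mermaid theme to the backdrop. The function name and 128 threshold are assumptions, not the project's actual values:

```python
from PIL import Image, ImageStat

def pick_diagram_theme(frame_path: str, threshold: float = 128.0) -> str:
    """Match the Mermaid theme to the sampled frame.

    Bright frame -> light "default" theme; dark frame -> "dark" theme.
    """
    with Image.open(frame_path) as img:
        luminance = ImageStat.Stat(img.convert("L")).mean[0]
    return "default" if luminance >= threshold else "dark"
```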

📝 Karaoke Captions

Custom ASS generator with word-by-word highlighting. 68px MrBeast/Hormozi style. Word-level timestamps from Whisper.

🔊 Audio Mixing

Voiceover + generated background music at calibrated levels. Ducking when speech is active. Fade in/out.
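Ducking is typically done in FFmpeg with `sidechaincompress`: the music bed is compressed whenever the voice track (fed in as the sidechain) is active. A sketch of building that filtergraph in Python; the gain and compressor settings are illustrative, not the project's calibrated values:

```python
def ducking_filter(music_gain_db: float = -12.0) -> str:
    """Build an FFmpeg filtergraph that ducks music under the voiceover.

    Assumes input 0 is the voiceover and input 1 the music bed.
    """
    return (
        # Split the voice: one copy to mix, one as the compressor sidechain
        "[0:a]asplit=2[voice][vside];"
        # Pre-attenuate the music bed
        f"[1:a]volume={music_gain_db}dB[music];"
        # Duck the music whenever the sidechain (voice) is active
        "[music][vside]sidechaincompress=threshold=0.05:ratio=8:attack=20:release=300[ducked];"
        # Mix voice over the ducked bed; stop with the first (voice) stream
        "[voice][ducked]amix=inputs=2:duration=first[aout]"
    )
```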

📐 Layout Engine

Centralized 246-line positioning logic with platform-specific safe zones (TikTok, Instagram, YouTube). Intelligent overlay coexistence.

render_activity.py — FFmpeg filter chain composition (simplified)
# Build the FFmpeg filter graph for final video composition
def build_filter_chain(ctx: RenderContext) -> str:
    filters = []

    # 1. Background: animated gradient OR Pexels stock footage
    if ctx.background_video:
        filters.append("[0:v]scale=1080:1920,setsar=1[bg]")
    else:
        filters.append(generate_gradient_background(ctx.duration))

    # 2. Overlays: code blocks, diagrams (position from layout engine)
    for overlay in ctx.overlays:
        x, y = layout_engine.position(overlay, ctx.platform)
        filters.append(f"[bg][{overlay.id}]overlay={x}:{y}:enable='...'")

    # 3. Captions: ASS subtitle burn-in with karaoke styling
    filters.append(f"subtitles={ctx.caption_file}:force_style='...'")

    # 4. Audio: mix voiceover + music with ducking
    filters.append("[voice][music]amix=inputs=2:duration=first")

    return ";".join(filters)

Data Layer

SQLite for structured data + Mem0 for semantic memory. Pydantic v2 models enforce type safety across all boundaries.

🗄️ SQLite — Content-Addressable Store

  • SHA-256 fingerprinted assets (scripts, audio, video, images)
  • Publish history with per-platform tracking
  • Engagement metrics time-series
  • Topic queue with status tracking
  • Posting schedule with slot management
  • Trigram Jaccard dedup (threshold 0.35)
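The content-addressable idea reduces to a small invariant: an asset's primary key is the SHA-256 of its bytes, so re-inserting identical content is a no-op. A minimal sketch (the table name and columns are assumptions, not the project's actual schema):

```python
import hashlib
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS assets (
    fingerprint TEXT PRIMARY KEY,   -- SHA-256 of the payload
    kind        TEXT NOT NULL,      -- script | audio | video | image
    payload     BLOB NOT NULL
)
"""

def store_asset(db: sqlite3.Connection, kind: str, payload: bytes) -> str:
    """Insert an asset keyed by content hash; identical payloads are stored once."""
    digest = hashlib.sha256(payload).hexdigest()
    db.execute(
        "INSERT OR IGNORE INTO assets (fingerprint, kind, payload) VALUES (?, ?, ?)",
        (digest, kind, payload),
    )
    return digest
```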

📐 Pydantic v2 Models

  • schemas.py — ContentBrief, ScriptResult, VoiceResult, CaptionResult, RenderResult, PipelinePayload
  • trend_models.py — TrendSource, TrendItem, TopicStatus, TopicQueueItem, NicheConfig
  • scheduler_models.py — PostingSlot, PostMetric, ScheduledItem
  • autonomy_models.py — AutonomyMode, AutopilotGuards

12-Page Web Dashboard

Built with FastAPI + Jinja2 + HTMX + Alpine.js + Tailwind CSS. Zero build step — no webpack, no npm. 2,905 lines serving 12 pages with real-time updates via HTMX polling.

  • 📊 Pipeline Monitor: Live runs, staging artifacts, output videos, log tail
  • ✂️ Video Editor: Re-render, trim, adjust captions, preview
  • ⚙️ Settings: API keys (masked), render defaults, toggles
  • 📚 Content Library: Browse assets, scripts, publishes, dedup stats
  • 🔥 Trends: Live trend scan results, niche filtering
  • 📅 Content Calendar: Scheduled posts, time slot management
  • 📈 Analytics: Engagement metrics, viral scores, platform comparison
  • 🧠 Intelligence: Learned insights (best hooks, optimal word counts)
  • 🤖 Autonomy Control: Manual → Suggest → Autopilot mode switching
  • 🎮 AI Playground: Interactive model testing (chat, TTS, STT, diarize)
  • 🚀 Publish: Manual publish trigger and platform selection
  • 💬 Telegram Bot: 12 commands + inline keyboards + freeform input

Deployment & Local GPU Stack

Docker Compose orchestrates 6 services. A separate local GPU server ("Cognitive Silo") runs all AI models — keeping the pipeline fully self-hosted.

🐳 Docker Compose Services

  • temporal-db — PostgreSQL 16 Alpine
  • temporal — auto-setup server (depends on DB healthy)
  • temporal-ui — Web UI at :8233
  • worker — Python: 55 activities + 5 workflows
  • telegram-bot — Python: Telegram polling
  • dashboard — FastAPI on :8080

🖥️ Cognitive Silo (Local GPU)

  • Ollama (:11434) — DeepSeek R1/V2, Llama 3.3 on AMD W7900
  • XTTS-v2 (:8002) — Local voice cloning TTS
  • MusicGen (:8003) — Meta's music generation model
  • ComfyUI (:8188) — FLUX image generation
  • Speaches (:8000) — Kokoro TTS, Whisper, VAD
  • Mem0 (:8080) — Semantic memory with LLM extraction

Dockerfile — worker image build (simplified)
FROM python:3.11-slim
WORKDIR /app

# System deps: FFmpeg for rendering, Node.js + Chromium for Mermaid diagrams
RUN apt-get update && apt-get install -y \
    ffmpeg fonts-dejavu \
    nodejs npm chromium

# Mermaid CLI for diagram rendering via Puppeteer
RUN npm install -g @mermaid-js/mermaid-cli

# Project source, then Python deps + Whisper for local captioning
COPY . .
RUN pip install -e . && pip install openai-whisper

Project Structure

~24,700 lines of Python across 80+ source files. Every module has a clear responsibility boundary.

sma/
├── bot/                      # Telegram bot: handlers, keyboards, 12 commands
│   └── main.py
├── config/                   # YAML config + Pydantic Settings + secrets
│   ├── config.yaml
│   └── loader.py             # Strongly typed hierarchical config
├── dashboard/                # FastAPI + Jinja2: 12 pages, 2,905 lines
│   ├── app.py
│   ├── templates/            # HTMX + Alpine.js + Tailwind
│   └── static/
├── data/                     # SQLite DB: content-addressable asset store
│   ├── db.py
│   └── models.py
├── logger/                   # structlog: dual-write JSONL + human-readable
├── models/                   # Pydantic v2: 4 schema modules
│   ├── schemas.py            # Pipeline data models
│   ├── trend_models.py       # Trend discovery models
│   ├── scheduler_models.py   # Content calendar models
│   └── autonomy_models.py    # Autonomy control models
├── pipeline/                 # Temporal workflows + activities
│   ├── workflow.py           # ContentPipelineWorkflow (11 steps)
│   ├── worker.py             # Registers 5 workflows + 55 activities
│   ├── trend_workflow.py     # TrendScanWorkflow (4h schedule)
│   ├── scheduler_workflow.py # ContentSchedulerWorkflow (12h)
│   ├── analytics_workflow.py # AnalyticsCollectorWorkflow (4h)
│   ├── intelligence_workflow.py
│   ├── layout_engine.py      # Platform-specific safe zones (246 lines)
│   └── activities/           # 27 activity modules
│       ├── ai_activity.py          # Multi-provider LLM orchestration
│       ├── tts_activity.py         # 4-tier TTS fallback chain
│       ├── caption_activity.py     # Whisper + ASS generator
│       ├── render_activity.py      # FFmpeg composition (909 lines)
│       ├── qa_activity.py          # 5 QA gates (807 lines)
│       ├── code_renderer.py        # VS Code-style code → PNG
│       ├── diagram_activity.py     # Mermaid → PNG
│       ├── facebook_activity.py    # Facebook Graph API
│       ├── instagram_activity.py   # Instagram Graph API
│       ├── youtube_activity.py     # YouTube Data API v3
│       ├── tiktok_activity.py      # TikTok API
│       └── multiplatform_activity.py
├── prompts/                  # AI system prompts for each pipeline stage
├── services/                 # Business logic services
│   ├── ai_service.py         # Multi-provider gateway
│   ├── trend_scanner.py      # 5-source scanner (504 lines)
│   ├── trend_aggregator.py   # Normalize + score + dedup (275 lines)
│   ├── intelligence.py       # Self-learning brain (603 lines)
│   ├── autonomy.py           # 3-level autonomy with safety guards
│   └── posting_optimizer.py  # Time slot optimization
├── tests/                    # pytest-asyncio: 8 test modules
│   ├── conftest.py           # Fixtures: isolated DB, seeded data (248 lines)
│   ├── test_e2e.py           # End-to-end integration (431 lines)
│   ├── test_qa.py            # All 5 QA gates (388 lines)
│   ├── test_intelligence_unit.py  # Intelligence brain unit tests (374 lines)
│   └── ...                   # + regression, diagram, integration tests
├── docker-compose.yaml       # 6 services with health checks
├── Dockerfile                # Python 3.11 + FFmpeg + Node.js + Chromium + Whisper
└── pyproject.toml            # Dependencies + project metadata

Engineering Methodology

Architecture-first engineering: spec before code. Every feature started as a plan document with alternatives considered, risks assessed, and implementation phased.

📋 7 Plan Documents

  • PLAN.md — Core architecture: Temporal, bot, pipeline
  • PLAN2.md — Asset DB: SQLite, SHA-256, dedup
  • PLAN3.md — AI Playground dashboard tab
  • PLAN4.md — Mermaid diagram overlays
  • PLAN5.md — Autonomous viral content engine
  • UPGRADE_PLAN.md — v2 rendering fixes
  • QA_PIPELINE_PLAN.md — 5 QA gates + Command Center

🧪 Testing Strategy

  • 8 test modules covering unit, integration, e2e, regression
  • pytest-asyncio with asyncio_mode = "auto"
  • Shared fixtures: isolated in-memory SQLite seeded with 10 posts + metrics
  • Mocked externals: Mem0, Telegram, AI providers
  • Tests validate QA gates, intelligence brain, diagram rendering, full pipeline flow
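The shared-fixture approach can be sketched as follows; the table layout and seed values are illustrative, not the project's actual conftest.py:

```python
import sqlite3

import pytest


def make_seeded_db() -> sqlite3.Connection:
    """In-memory SQLite seeded with 10 fake posts, mirroring the shared fixtures."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, platform TEXT, views INTEGER)"
    )
    db.executemany(
        "INSERT INTO posts (platform, views) VALUES (?, ?)",
        [("youtube" if i % 2 else "tiktok", i * 100) for i in range(1, 11)],
    )
    db.commit()
    return db


@pytest.fixture
def seeded_db():
    # Each test gets a fresh, isolated database; nothing leaks between tests
    db = make_seeded_db()
    yield db
    db.close()


def test_seed_count(seeded_db):
    assert seeded_db.execute("SELECT COUNT(*) FROM posts").fetchone()[0] == 10
```

Keeping the seeding logic in a plain helper (here `make_seeded_db`) makes it reusable outside pytest, e.g. in ad-hoc scripts.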

Autonomy Modes with Safety Guards

The system supports three autonomy levels, controllable via Telegram or dashboard:

  • MANUAL: Human triggers every pipeline run. Full control.
  • SUGGEST: System suggests topics and times. Human approves.
  • AUTOPILOT: Fully autonomous. Hard-coded rule: Telegram approval ALWAYS required before publish.

Explore Further

View the source code, read the plan documents, or go back to the project overview.
