A technical walkthrough of how the Social Media Automation Engine works under the hood. Each section covers a distinct subsystem — from macro architecture to deployment topology.
The system is composed of 6 interconnected subsystems that form a closed feedback loop. Content performance data feeds back into trend scoring and script generation.
```
TREND DISCOVERY ───→ CONTENT CALENDAR ───→ CONTENT PIPELINE ───→ MULTI-PLATFORM PUBLISHER
       ↑                                          │                         │
       │                                          │                         ↓
INTELLIGENCE LAYER ←───── ANALYTICS ENGINE ←──────┴─────────────────────────┘
```

Each subsystem is a separate Temporal workflow running on an independent schedule. The pipeline is event-sourced — every state transition is durable and replayable.
**Trend Discovery** — Scans 5 data sources (Reddit, HN, Google Trends, GitHub, YouTube) every 4 hours. Normalizes scores, deduplicates via trigram Jaccard similarity, filters by niche, and checks Mem0 for already-covered topics.
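Trigram Jaccard deduplication compares the character-trigram sets of two titles; near-duplicates share most trigrams. A minimal sketch (the 0.6 threshold is an illustrative assumption, not the engine's actual setting):

```python
def trigrams(text: str) -> set[str]:
    """Character trigrams of a lowercased, whitespace-collapsed title."""
    t = " ".join(text.lower().split())
    return {t[i:i + 3] for i in range(len(t) - 2)}

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of two trigram sets; 1.0 means identical."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def dedupe(titles: list[str], threshold: float = 0.6) -> list[str]:
    """Keep a title only if it is not too similar to any title already kept."""
    kept: list[str] = []
    for title in titles:
        if all(jaccard(title, k) < threshold for k in kept):
            kept.append(title)
    return kept
```

Trigram Jaccard is cheap, language-agnostic, and robust to small edits (punctuation, casing, word order within a phrase), which is exactly the kind of noise cross-source trend titles have.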
**Content Calendar** — Picks the top-scored topics and assigns optimal publish times based on historical engagement data. Respects per-platform cooldowns and daily production limits.
**Content Pipeline** — The core 11-step workflow, from script generation through video rendering to approval. Durable execution via Temporal, with 5 QA gates and cascading fallback chains.
**Multi-Platform Publisher** — Publishes to Facebook (Graph API), Instagram (Graph API), YouTube (Data API v3), and TikTok (API), with per-platform metadata optimization (aspect ratio, hashtags, descriptions).
**Analytics Engine** — Collects engagement metrics at checkpoints 1h, 6h, 24h, 48h, and 168h post-publish. Calculates viral scores and engagement rates per platform.
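The per-platform engagement rate can be as simple as interactions over views; the sketch below assumes that definition (the real viral-score formula is not reproduced here):

```python
from dataclasses import dataclass

@dataclass
class Metrics:
    """Engagement snapshot pulled at a checkpoint (1h, 6h, 24h, ...)."""
    views: int
    likes: int
    comments: int
    shares: int

def engagement_rate(m: Metrics) -> float:
    """Interactions per view; 0.0 when the post has no views yet."""
    if m.views == 0:
        return 0.0
    return (m.likes + m.comments + m.shares) / m.views
```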
**Intelligence Layer** — Learns from performance data: hook type analysis (6 styles), content feature extraction, audience fingerprinting. Enriches scriptwriter prompts via Mem0 semantic memory.
The ContentPipelineWorkflow is a Temporal durable workflow with 11 activity steps, 5 QA gates, and a human approval signal. Rejection loops back to step 2 with accumulated feedback (up to 5 rounds).
1. **Media analysis** — GPT-4o Vision analyses photos/videos sent to the bot, extracting context, mood, and subjects for script generation. *(optional — skipped for text-only)*
2. **Script generation** — A multi-provider LLM generates the video script, enriched with Mem0 learned preferences and intelligence insights. *(QA_SCRIPT — 3× retry with corrective feedback)*
3. **User notification** — Telegram notification to the user: script preview and generation progress.
4. **Voiceover** — 4-tier TTS chain: XTTS-v2 (local voice cloning) → Speaches/Kokoro → ElevenLabs → edge-tts. *(QA_TTS — reverse transcription verification via Whisper)*
5. **Captions** — OpenAI Whisper (local) generates word-level timestamps; a custom ASS generator produces karaoke-style word-by-word highlighting. *(QA_CAPTIONS — coverage, timing, font, spatial overlap checks)*
6. **Background music** — MusicGen (local GPU) → Freesound API → ElevenLabs → cached loops → FFmpeg procedural ambient. *(best-effort — pipeline continues without music)*
7. **Stock footage** — Pexels API with freshness-aware selection; falls back to procedural FFmpeg animations (gradient, bokeh). *(best-effort)*
8. **Image generation** — ComfyUI + FLUX on the local GPU, with DALL-E 3 as cloud fallback. Used for visual overlays in the video. *(best-effort)*
9. **Rendering** — FFmpeg compositing: animated backgrounds, text cards, code blocks, diagrams, karaoke captions, audio mixing. 1080×1920 output. *(QA_RENDER + QA_VISUAL — frame sampling, black/frozen detection)*
10. **Approval** — Upload to Cloudinary → preview sent via Telegram → a Temporal Signal pauses the workflow indefinitely until the user taps Approve/Reject. *(Temporal Signal — indefinite pause)*
11. **Publishing** — Publish to all enabled platforms (Facebook, Instagram, YouTube, TikTok) with per-platform metadata, and log the result to the DB. *(rejection → loop to step 2, up to 5 rounds)*

Four supporting workflows run on Temporal Schedules, operating autonomously alongside the main content pipeline.
| Workflow | Schedule | Purpose | Key Activities |
|---|---|---|---|
| TrendScanWorkflow | Every 4h | Discover trending topics from 5 data sources | Scan → normalize → deduplicate → filter by niche → store to queue |
| ContentSchedulerWorkflow | Every 12h | Pick top topics, assign optimal publish slots | Score ranking → time slot optimization → platform assignment |
| AnalyticsCollectorWorkflow | Every 4h | Collect engagement metrics at checkpoints | Pull metrics at 1h, 6h, 24h, 48h, 168h → calculate viral scores |
| IntelligenceWorkflow | Every 24h | Learn from performance data | Hook analysis → feature extraction → audience profiling → Mem0 update |
The render activity (909 lines) builds complex FFmpeg filter chains that compose multiple visual layers into a final 1080×1920 vertical video with synchronized audio.
**Animated backgrounds** — Gradient animations with hue rotation plus soft bokeh glow effects. Falls back to Pexels stock footage or solid gradients.
**Code cards** — Pygments + Pillow render VS Code-style syntax-highlighted code to PNG: title bar, line numbers, Monokai theme.
**Diagrams** — Mermaid CLI (mmdc) + Chromium renders diagrams to PNG. Adaptive theming: samples frame brightness to pick a dark or light theme.
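The adaptive-theming decision reduces to a luminance threshold over sampled frame pixels. A sketch, assuming Rec. 709 luminance weights and a 0.5 cutoff (both illustrative, not the project's calibrated values):

```python
def mean_luminance(pixels: list[tuple[int, int, int]]) -> float:
    """Average relative luminance (0-1) of sampled RGB pixels, Rec. 709 weights."""
    total = sum(0.2126 * r + 0.7152 * g + 0.0722 * b for r, g, b in pixels)
    return total / (255 * len(pixels))

def pick_mermaid_theme(pixels: list[tuple[int, int, int]]) -> str:
    """Bright frame -> Mermaid's light 'default' theme; dark frame -> 'dark'."""
    return "default" if mean_luminance(pixels) > 0.5 else "dark"
```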
**Karaoke captions** — Custom ASS generator with word-by-word highlighting: 68px MrBeast/Hormozi style, word-level timestamps from Whisper.
**Audio mix** — Voiceover and generated background music at calibrated levels, with ducking while speech is active and fade in/out.
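One way to express the ducking step is FFmpeg's `sidechaincompress` filter, keyed by the voiceover so the music drops whenever speech is active. A sketch of the filtergraph builder (the threshold/ratio defaults are illustrative, not the project's actual calibration):

```python
def build_ducking_filter(threshold: float = 0.05, ratio: float = 8.0) -> str:
    """Voice ([0:a]) ducks music ([1:a]): split the voice so one copy keys the
    compressor while the other is mixed into the final output."""
    return (
        "[0:a]asplit=2[voice][sidechain];"
        f"[1:a][sidechain]sidechaincompress=threshold={threshold}:ratio={ratio}[ducked];"
        "[voice][ducked]amix=inputs=2:duration=first[out]"
    )
```

The `asplit` is required because a filtergraph input label can only be consumed once: the voice stream feeds both the compressor's sidechain and the final mix.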
**Layout engine** — Centralized 246-line positioning logic with platform-specific safe zones (TikTok, Instagram, YouTube) and intelligent overlay coexistence.
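In practice, a safe zone is the region of the 1080×1920 canvas that platform UI chrome does not cover, and positioning means clamping an overlay into it. A minimal sketch (the margin values are illustrative guesses, not the engine's real numbers):

```python
# Illustrative safe-zone margins (left, top, right, bottom) in pixels on a
# 1080x1920 canvas; the real per-platform values will differ.
SAFE_ZONES = {
    "tiktok":    (60, 140, 160, 320),   # wide right rail + tall bottom UI
    "instagram": (60, 180, 60, 260),
    "youtube":   (60, 100, 60, 220),
}

def clamp_overlay(x: int, y: int, w: int, h: int, platform: str,
                  canvas: tuple[int, int] = (1080, 1920)) -> tuple[int, int]:
    """Return the nearest (x, y) that keeps a w x h overlay inside the safe zone."""
    left, top, right, bottom = SAFE_ZONES[platform]
    cw, ch = canvas
    x = max(left, min(x, cw - right - w))
    y = max(top, min(y, ch - bottom - h))
    return x, y
```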
```python
# Build the FFmpeg filter graph for final video composition
def build_filter_chain(ctx: RenderContext) -> str:
    filters = []

    # 1. Background: animated gradient OR Pexels stock footage
    if ctx.background_video:
        filters.append("[0:v]scale=1080:1920,setsar=1[bg]")
    else:
        filters.append(generate_gradient_background(ctx.duration))

    # 2. Overlays: code blocks, diagrams (position from layout engine)
    for overlay in ctx.overlays:
        x, y = layout_engine.position(overlay, ctx.platform)
        filters.append(f"[bg][{overlay.id}]overlay={x}:{y}:enable='...'")

    # 3. Captions: ASS subtitle burn-in with karaoke styling
    filters.append(f"subtitles={ctx.caption_file}:force_style='...'")

    # 4. Audio: mix voiceover + music with ducking
    filters.append("[voice][music]amix=inputs=2:duration=first")

    return ";".join(filters)
```
The system discovers content ideas autonomously and learns from its own performance to improve over time.
```
score = popularity         × 0.35
      + velocity           × 0.30
      + freshness          × 0.20
      + multi_source_bonus × 0.15
      − coverage_penalty   × 0.50
```
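The same formula as a direct translation, assuming each input is normalized to [0, 1] (the real model's field names may differ):

```python
def trend_score(popularity: float, velocity: float, freshness: float,
                multi_source_bonus: float, coverage_penalty: float) -> float:
    """Weighted trend score; inputs assumed normalized to [0, 1], so a
    fully-covered topic (penalty = 1.0) loses half the maximum score."""
    return (popularity * 0.35
            + velocity * 0.30
            + freshness * 0.20
            + multi_source_bonus * 0.15
            - coverage_penalty * 0.50)
```

Note the positive weights sum to 1.0, while the coverage penalty alone can drive the score negative, which is the point: an already-covered topic should lose to a fresh one even if it is still popular.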
Self-hosted Mem0 stores learned insights as semantic memories. The LLM scriptwriter receives relevant memories as context — creating a genuine feedback loop where each content piece is informed by the performance of all previous ones.
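Conceptually, enrichment searches Mem0 for memories relevant to the topic and prepends them to the scriptwriter's system prompt. A sketch of the assembly step, where the `search` callable stands in for the Mem0 client (not reproduced here):

```python
def enrich_prompt(base_prompt: str, topic: str, search) -> str:
    """Prepend the top memories relevant to `topic` to the scriptwriter prompt."""
    memories = search(topic, limit=5)   # e.g. a Mem0 semantic search
    if not memories:
        return base_prompt              # no learned context yet: use prompt as-is
    learned = "\n".join(f"- {m}" for m in memories)
    return f"Learned from past performance:\n{learned}\n\n{base_prompt}"
```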
SQLite for structured data + Mem0 for semantic memory. Pydantic v2 models enforce type safety across all boundaries.
Built with FastAPI + Jinja2 + HTMX + Alpine.js + Tailwind CSS. Zero build step — no webpack, no npm. 2,905 lines serving 12 pages with real-time updates via HTMX polling.
Docker Compose orchestrates 6 services. A separate local GPU server ("Cognitive Silo") runs all AI models — keeping the pipeline fully self-hosted.
```dockerfile
FROM python:3.11-slim

# System deps: FFmpeg for rendering, Node.js + Chromium for Mermaid diagrams
RUN apt-get update && apt-get install -y \
    ffmpeg fonts-dejavu \
    nodejs npm chromium

# Mermaid CLI for diagram rendering via Puppeteer
RUN npm install -g @mermaid-js/mermaid-cli

WORKDIR /app

# Python deps + Whisper for local captioning
COPY pyproject.toml .
RUN pip install -e . && pip install openai-whisper

COPY . .
```
~24,700 lines of Python across 80+ source files. Every module has a clear responsibility boundary.
```
sma/
├── bot/                        # Telegram bot: handlers, keyboards, 12 commands
│   └── main.py
├── config/                     # YAML config + Pydantic Settings + secrets
│   ├── config.yaml
│   └── loader.py               # Strongly typed hierarchical config
├── dashboard/                  # FastAPI + Jinja2: 12 pages, 2,905 lines
│   ├── app.py
│   ├── templates/              # HTMX + Alpine.js + Tailwind
│   └── static/
├── data/                       # SQLite DB: content-addressable asset store
│   ├── db.py
│   └── models.py
├── logger/                     # structlog: dual-write JSONL + human-readable
├── models/                     # Pydantic v2: 4 schema modules
│   ├── schemas.py              # Pipeline data models
│   ├── trend_models.py         # Trend discovery models
│   ├── scheduler_models.py     # Content calendar models
│   └── autonomy_models.py      # Autonomy control models
├── pipeline/                   # Temporal workflows + activities
│   ├── workflow.py             # ContentPipelineWorkflow (11 steps)
│   ├── worker.py               # Registers 5 workflows + 55 activities
│   ├── trend_workflow.py       # TrendScanWorkflow (4h schedule)
│   ├── scheduler_workflow.py   # ContentSchedulerWorkflow (12h)
│   ├── analytics_workflow.py   # AnalyticsCollectorWorkflow (4h)
│   ├── intelligence_workflow.py
│   ├── layout_engine.py        # Platform-specific safe zones (246 lines)
│   └── activities/             # 27 activity modules
│       ├── ai_activity.py      # Multi-provider LLM orchestration
│       ├── tts_activity.py     # 4-tier TTS fallback chain
│       ├── caption_activity.py # Whisper + ASS generator
│       ├── render_activity.py  # FFmpeg composition (909 lines)
│       ├── qa_activity.py      # 5 QA gates (807 lines)
│       ├── code_renderer.py    # VS Code-style code → PNG
│       ├── diagram_activity.py # Mermaid → PNG
│       ├── facebook_activity.py    # Facebook Graph API
│       ├── instagram_activity.py   # Instagram Graph API
│       ├── youtube_activity.py     # YouTube Data API v3
│       ├── tiktok_activity.py      # TikTok API
│       └── multiplatform_activity.py
├── prompts/                    # AI system prompts for each pipeline stage
├── services/                   # Business logic services
│   ├── ai_service.py           # Multi-provider gateway
│   ├── trend_scanner.py        # 5-source scanner (504 lines)
│   ├── trend_aggregator.py     # Normalize + score + dedup (275 lines)
│   ├── intelligence.py         # Self-learning brain (603 lines)
│   ├── autonomy.py             # 3-level autonomy with safety guards
│   └── posting_optimizer.py    # Time slot optimization
├── tests/                      # pytest-asyncio: 8 test modules
│   ├── conftest.py             # Fixtures: isolated DB, seeded data (248 lines)
│   ├── test_e2e.py             # End-to-end integration (431 lines)
│   ├── test_qa.py              # All 5 QA gates (388 lines)
│   ├── test_intelligence_unit.py   # Intelligence brain unit tests (374 lines)
│   └── ...                     # + regression, diagram, integration tests
├── docker-compose.yaml         # 6 services with health checks
├── Dockerfile                  # Python 3.11 + FFmpeg + Node.js + Chromium + Whisper
└── pyproject.toml              # Dependencies + project metadata
```
Architecture-first engineering: spec before code. Every feature started as a plan document with alternatives considered, risks assessed, and implementation phased.
The system supports three autonomy levels, controllable via Telegram or the dashboard.