Signal Scout

Signal Sources

Active V1 — Currently Wired (Jacob's code)

#	Source	Type	Method	Trust	Status
1	Hacker News	Community	RSS	0.65	● Active
2	Wired	Trade Press	RSS	0.85	● Active
3	VentureBeat AI	Trade Press	RSS	0.82	● Active
4	TechCrunch	Trade Press	RSS	0.75	● Active
5	MIT Tech Review	Tier-1 Research	RSS	0.88	● Active
6	AI News	Trade Press	RSS	0.80	● Active
7	arXiv Papers	Academic Research	RSS	0.88	● Active

Thought Leader Video Monitoring V1

Step	Detail
1. Curate	Maintain list of 20-50 YouTube channels (thought leaders + companies)
2. Poll	YouTube Data API for new uploads every 30 min
3. Extract	Title, description, tags, thumbnail_url, published_at, view_count
4. Score	Pass through multi-layer scoring engine
5. Queue	High-score videos (>0.70) queued for talk track generation

Channel Tier	Examples	Behavior
Tier 0 (auto-elevate)	Karpathy, Andrew Ng, Jensen Huang, Sam Altman	Score → 0.90+ automatically
Tier 1 (high trust)	IBM Research, Snowflake, Databricks, Anthropic, OpenAI	Trust floor 0.92
Tier 2 (track)	Fireship, Two Minute Papers, AI Explained	Normal scoring
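
The tier behaviors in the table above could be sketched roughly as follows. The function name `apply_channel_tier` and both constants are hypothetical, and the sketch assumes that the Tier 1 floor applies to source trust while the Tier 0 elevation applies to the signal score.

```python
TIER0_MIN_SCORE = 0.90    # Tier 0: score is auto-elevated to at least this
TIER1_TRUST_FLOOR = 0.92  # Tier 1: source trust never falls below this


def apply_channel_tier(tier: int, score: float, trust: float) -> tuple[float, float]:
    """Return (score, trust) after applying the tier behavior from the table."""
    if tier == 0:
        return max(score, TIER0_MIN_SCORE), trust
    if tier == 1:
        return score, max(trust, TIER1_TRUST_FLOOR)
    return score, trust  # Tier 2: normal scoring, nothing adjusted
```

For example, a Tier 0 upload that scores 0.40 on its title and description would still enter the queue at 0.90.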

Future: Whisper transcript extraction → full content scoring (not just title/description)

Video Description Scoring Approach

How signals are identified from video content:
• Title keywords: Match against relevance keyword library (same as RSS scoring)
• Description parsing: Extract topics, tools mentioned, company names, product launches
• Tag analysis: YouTube tags reveal creator intent and topic clustering
• Cross-reference: Same topic in video + RSS + community within 48h → cross-platform heat multiplier
• Engagement signal: View velocity in first 24h indicates topic resonance
• Upload frequency: Multiple videos on same topic = strong signal

Key insight: Video descriptions contain more context than titles. Creators write them for SEO, which reveals what the content actually covers and makes them a higher-signal input than headline-only scanning.

Signal Queue output: When a video signal enters the queue, it includes a direct link to the video for human review before talk track generation.

Video Signal Flow:
• Score description/title/tags → enters Signal Queue with video link
• Human reviews video (watch or skim)
• If approved as signal → extract full transcript (Whisper)
• Transcript fed to framework selection → generate talk track or structured output

Tier 1 V1 — V1 Expansion (implement next)

#	Source	Why	Method	Trust	Auth
8	Reddit	r/artificial, r/machinelearning, r/dataengineering	API (PRAW)	0.70	Free tier
9	GitHub Trending	What builders are actually building	Scrape	0.72	None
10	Google Trends	Search demand = real interest	API (pytrends)	0.80	None
11	YouTube Data API	Thought leader video titles + descriptions	API	Varies	API key
12	Podcast RSS	Episode titles, guests, descriptions	RSS	0.75	None

Tier 2 Future — Under Evaluation for Potential Future Implementation

Category	Source	Description	Method	Trust	Type
Social	Twitter/X API	Real-time discourse, influencer takes	API (paid)	0.60	Paid
Social	LinkedIn Feed	Enterprise sentiment, executive posts	API (limited)	0.78	Paid
Social	Threads / Bluesky	Emerging social platforms, tech early adopters	API	0.55	Free
Social	TikTok Trends	Viral content patterns, gen-Z signal	Scrape	0.50	Free
Trend	Exploding Topics	Pre-mainstream topic detection	API	0.82	Paid
Trend	Feedly Pro + AI	AI-curated feed aggregation	API	0.80	Paid
Trend	Glimpse (trend enrichment)	Trend data enrichment layer	API	0.78	Paid
Competitor	Semrush	SEO + content gap analysis	API	0.85	Paid
Competitor	Ahrefs	Backlink + content performance	API	0.85	Paid
Competitor	Crunchbase	Funding rounds, startup signals	API	0.80	Paid
Competitor	SimilarWeb	Traffic analysis, market share	API	0.78	Paid
Competitor	Owler	Company news, competitive alerts	API	0.72	Free/Paid
Audience	SparkToro	Audience intelligence, where they gather	API	0.80	Paid
Audience	LinkedIn Sales Navigator	Decision-maker activity tracking	API	0.82	Paid
Content	BuzzSumo	Top-performing content by topic	API	0.78	Paid
Content	Social Blade	Channel growth tracking	API	0.70	Free/Paid
Innovation	Product Hunt	New launches, adoption velocity	API	0.70	Free
Innovation	DEV.to	Developer community discourse	API	0.65	Free
Innovation	Patent Filings	R&D direction indicators	RSS/Scrape	0.85	Free
Enterprise	Gartner Reports	Market quadrants, hype cycles	API/Scrape	0.95	Paid
Enterprise	Forrester Research	Technology wave analysis	API/Scrape	0.93	Paid
Enterprise	McKinsey Insights	Strategy + transformation research	RSS	0.90	Free
Enterprise	CB Insights	Market maps, emerging tech	API	0.88	Paid
Enterprise	Statista	Market data, statistics	API	0.85	Paid
Deep Research	Perplexity API	AI-powered web research	API	0.88	Paid
Deep Research	Perplexity Deep Research	Multi-source deep analysis	API	0.92	Paid
Hiring	Job Posting Feeds	Demand signals by role/skill	API	0.72	Paid
Hiring	Layoffs.fyi	Market contraction signals	Scrape	0.70	Free
Hiring	Crunchbase Funding	Investment direction, growth signals	API	0.80	Paid
Community	Indie Hackers	Builder community sentiment	Scrape	0.65	Free
Community	Discord Servers	Niche community monitoring	Bot/API	0.60	Free
Alerts	Google Alerts	Keyword-triggered notifications	Email/RSS	0.65	Free
Alerts	Mention.com	Real-time brand/topic monitoring	API	0.75	Paid
Listening	Brandwatch	Enterprise social listening	API	0.85	Paid
Listening	Meltwater	Media monitoring + analytics	API	0.85	Paid
Blogs/RSS	Substack newsletters	Long-form thought leadership	RSS	0.72	Free
Blogs/RSS	Medium publications	Tech community writing	RSS	0.68	Free
Blogs/RSS	Press Releases	Official company announcements	RSS	0.70	Free
Industry	Listen Notes	Podcast search, 3M+ shows indexed	API	0.75	Paid
Industry	Conference/Event Feeds	Keynote topics, speaker signals	RSS/Scrape	0.80	Free

User Config: Add / Edit / Remove Sources

Users can manage sources at any time via the Source Management UI:
• Add RSS: Paste feed URL → validate → set tier + category + trust → save
• Add API: Select supported platform → paste API key → validate connection → configure polling
• Add Scrape: Paste target URL → set extraction rules → set frequency → save
• Edit: Change tier, trust weight, polling frequency, category
• Remove/Pause: Disable without deleting, or remove entirely

New sources enter at neutral trust and earn score through validated signals over time.

Source Connection Logic

RSS Sources:
• Paste feed URL → validate (fetch + parse) → store URL + polling interval + trust score
• Connector: HTTP GET with User-Agent header, XML/Atom parser
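
As an illustration of the parse half of the RSS connector, here is a minimal sketch using only the standard library. The function name `parse_rss_text` and the sample feed are hypothetical; the sketch handles plain RSS 2.0 `<item>` elements only (Atom feeds use namespaced `<entry>` elements and would need separate handling), and it leaves out the HTTP fetch step.

```python
import xml.etree.ElementTree as ET


def parse_rss_text(xml_text: str, source_name: str) -> list[dict]:
    """Parse RSS 2.0 XML text into items of {title, url, desc, source}."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "url": (item.findtext("link") or "").strip(),
            "desc": (item.findtext("description") or "").strip(),
            "source": source_name,
        })
    return items


# Tiny in-memory feed so the sketch runs without network access.
SAMPLE = """<rss version="2.0"><channel><title>Demo</title>
<item><title>Hello</title><link>https://example.com/a</link>
<description>First post</description></item>
</channel></rss>"""

items = parse_rss_text(SAMPLE, "Demo Feed")
```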

API Sources:
• Select platform → input API key/OAuth token → validate connection → configure endpoints + rate limits
• Reddit: PRAW library, OAuth2 app credentials
• YouTube: API key, quota management (10,000 units/day free)
• Google Trends: pytrends library, no auth needed
• Hacker News: Free API, no auth, rate-limit friendly

Scrape Sources:
• Define target URL + CSS selectors for content extraction
• Set crawl frequency + respectful delays (2-5s between requests)
• GitHub Trending: parse /trending page, extract repo name + description + stars

Connection Health:
• Each source has a status: connected / degraded / failed
• Auto-retry on failure (3 attempts, exponential backoff)
• Alert if source fails 3 consecutive polls
• Trust score decays if source consistently returns low-signal content
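
The auto-retry rule (3 attempts, exponential backoff) might look something like the sketch below. `poll_with_retry`, the 1-second base delay, and the injectable `sleep` parameter are assumptions made for illustration; the document does not specify the delay values.

```python
import time
from typing import Callable


def poll_with_retry(fetch: Callable[[], object], attempts: int = 3,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep):
    """Call fetch(); on failure retry with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller mark the poll as failed
            sleep(base_delay * 2 ** attempt)  # waits 1s, then 2s with the defaults
```

Passing `sleep` as a parameter keeps the function testable without real waiting; a caller that catches the final exception can count it toward the "3 consecutive polls" alert.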

Source Trust Ranking

Initial Trust Assignment:
• Tier-1 research/academic (MIT, arXiv): 0.85-0.95
• Trade press (Wired, TechCrunch): 0.75-0.85
• Community (HN, Reddit): 0.60-0.70
• New/unverified sources: 0.50 (neutral entry)

Trust Adjusts Over Time Based On:
• Signal-to-noise ratio: % of ingested items that score above 0.50 after multi-layer scoring
• Human validation: Signals from this source that get approved in queue vs rejected
• Engagement correlation: Do items from this source actually perform when published?
• Consistency: Regular high-quality signals vs sporadic

Trust Decay:
• Source produces 5+ consecutive items scoring below 0.40 → trust drops 0.05
• Source offline/failing for 7+ days → trust drops 0.10
• Manual override: admin can set trust floor or ceiling at any time

Trust Growth:
• 10+ approved signals from source in 30 days → trust increases 0.05
• Source signals consistently lead to published content → trust increases 0.03
• Capped at 0.95 (no source gets perfect trust)
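
The decay and growth rules above can be expressed as a single adjustment step. This is a sketch under assumptions: the function name, the keyword arguments, and the choice to apply every matching rule in one pass are all hypothetical, and the admin override is modeled as optional `floor` and `ceiling` arguments.

```python
TRUST_CAP = 0.95  # no source gets perfect trust


def adjust_trust(trust, *, low_streak=0, days_offline=0,
                 approved_30d=0, leads_to_published=False,
                 floor=0.0, ceiling=TRUST_CAP):
    """Apply the decay and growth rules once and return the new trust."""
    if low_streak >= 5:        # 5+ consecutive items scoring below 0.40
        trust -= 0.05
    if days_offline >= 7:      # offline or failing for 7+ days
        trust -= 0.10
    if approved_30d >= 10:     # 10+ approved signals in 30 days
        trust += 0.05
    if leads_to_published:     # signals consistently lead to published content
        trust += 0.03
    ceiling = min(ceiling, TRUST_CAP)  # an admin ceiling cannot exceed the cap
    return round(max(floor, min(trust, ceiling)), 2)
```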

Existing Implementation Reference

Package: src/scout/ (pip-installable Python)
RSS Parser: scoring.py → parse_rss(url, name) — fetches feed, extracts title/link/description, returns structured items
Config: default_config/sources.json — array of {url, type, trust, name}
Adding a source: Append an entry to sources.json with url, type (category), trust, and name
YouTube/Reddit/API: Extend with new connector functions in scoring.py following same pattern (return list of {title, url, desc, source})

Ingestion Methods

Pipeline 1: Automated Market Signal Ingestion V1

Automated scanning of external sources for potential signals. Runs on cron schedule (configurable). Can also be user-initiated on demand. All ingested items pass through multi-layer scoring before entering Signal Queue.

Method 1: RSS/Feed Polling

Frequency: every 30-60 min
Sources: Podcasts, TechCrunch, Substack, Medium, arXiv, Product Hunt, press releases
Format: { title, description, url, timestamp, source_name }
Implementation: Standard RSS/Atom parser, User-Agent: Scout/3.0, timeout 12s

Method 2: API Direct

Frequency: every 15-30 min
Sources: Reddit (PRAW), YouTube Data API, Google Trends, Twitter/X, Hacker News
Format: Structured JSON per platform
Implementation: Platform SDKs, rate limit management, auth token rotation

Method 3: Web Search/Scrape

Frequency: every 2-4 hours
Sources: Google Search (SerpAPI), GitHub Trending, YouTube video descriptions
Format: Extracted text + metadata
Implementation: SerpAPI or headless browser, respectful crawl delays

Pipeline 2: Manual / Direct Content Input V1

Human-initiated input. Bypasses automated scoring and Signal Queue. Routes directly to framework selection and structured output selection.

Input Types

V1 Inputs:
• Text paste: Raw text, notes → framework selection directly
• File upload: PDF, transcript, document → extract text → framework selection
• URL paste: Article, blog post → extract content → framework selection (if feasible)

V2 Inputs:
• Voice note: Audio recording → transcribe (Whisper) → framework selection
• Video demo / URL video link: Product demo recording or video URL → transcribe → extract key moments
• Topic submission: Topic/idea with no source material → research + generate

Manual Pipeline Behavior

• Bypasses automated scoring (human already validated)
• Bypasses Signal Queue (no review needed — human initiated)
• Routes directly to framework + structured output selection
• Can optionally run through scoring for prioritization if multiple manual inputs queued
• Supports team routing (assign to specific member)

Future Ingestion Methods

Under Evaluation

• Webhook listeners: Real-time push from integrated platforms
• Email parsing: Newsletter digests auto-ingested as signals
• Slack/Teams monitoring: Internal channel keyword triggers
• Calendar integration: Auto-ingest conference agendas, webinar topics
• CRM integration: Client questions/feedback as signal input
• Perplexity Deep Research: AI-driven multi-source research on demand

Signal Scoring Engine

All layers below are in V1 implementation scope.

Layer 1: Emergence Detection (weight: 0.30)

Identifies concepts in the critical 5% window: past "too early" but before mainstream saturation. Semantic clustering of related concepts in new combinations. Hype cycle positioning: pre-emergence = maximum value, peak = contextualization only.

Layer 2: Thought Leader Watchlist (override)

Tier 0 (0.97): Karpathy, Andrew Ng, Jensen Huang, Sam Altman, Demis Hassabis. Tier 1 (0.92): IBM Research, Snowflake, Databricks, Anthropic, OpenAI. A watchlist match auto-elevates the score to 0.90+ regardless of engagement and bypasses the formula entirely.

Layer 3: Question Gap Detector (weight: 0.15)

Monitors comments, Reddit threads, LinkedIn replies, conference Q&A for unanswered questions. Repeated questions across sources with no satisfying answer = talk track opportunity. "Questions precede answers; answers precede adoption."

Layer 4: Practitioner vs. Analyst Divergence

Tracks analyst publications (Gartner, Forrester, McKinsey) vs practitioner sentiment (HN, Reddit). High divergence = positioning opportunity. Example: "73% of AI projects fail" vs analyst adoption narratives.

Layer 5: Competitive Gap Intelligence

Monitors Accenture, Deloitte, McKinsey, BCG, PwC AI publications. Topics they ALL cover (add depth), NONE cover (own it), covered POORLY (outperform). Updated monthly.

Layer 6: Temporal & Calendar Intelligence

Q4: AI governance elevated (budget season). Pre-conference: innovation elevated (IBM Think, Dreamforce, NeurIPS). Nov/Dec: year-end predictions window.

Layer 7: Source Trust (weight: 0.20)

Weighted trust per source. Decays for repeated low-signal. New sources enter at neutral, earn trust through validated signals. Tracked over time.

Layer 8: Engagement Velocity (weight: 0.10)

Comment volume, share rate, reactions at ingest time. LinkedIn reactions, YouTube comments, Reddit upvotes, HN score. High engagement = real audience resonance.

Layer 9: Cross-Platform Heat (multiplier)

Same topic trending across 2+ platforms within 48h = score multiplier. Strongest signal: multiple communities discussing simultaneously.
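
One way to detect the 48-hour overlap is to slide a window over a topic's mentions and count distinct platforms. The sketch below is an assumption-laden illustration: the function names are hypothetical, topic matching is taken as already done, and the 1.25 boost is a placeholder because the document does not state the multiplier value.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=48)


def platforms_in_window(mentions: list[tuple[str, datetime]]) -> int:
    """Largest number of distinct platforms mentioning a topic in any 48h window."""
    mentions = sorted(mentions, key=lambda m: m[1])
    best = 0
    for i, (_, start) in enumerate(mentions):
        # Platforms seen from this mention up to 48h later.
        seen = {p for p, t in mentions[i:] if t - start <= WINDOW}
        best = max(best, len(seen))
    return best


def heat_multiplier(mentions, boost: float = 1.25) -> float:
    """Return the boost when 2+ platforms overlap, else 1.0 (boost value is assumed)."""
    return boost if platforms_in_window(mentions) >= 2 else 1.0
```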

Layer 10: Relevance Keywords (weight: 0.25)

Weighted keyword matching: AI/ML (0.15), Data Engineering (0.10), Enterprise (0.15), Governance (0.15), Open Source (0.10). Capped at 1.0.
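
A minimal sketch of the weighted keyword match, assuming each category contributes its weight at most once. The category weights come from the text above; the keyword lists and the name `relevance_score` are hypothetical, and plain substring matching is used for brevity.

```python
CATEGORY_WEIGHTS = {
    "ai_ml": 0.15, "data_engineering": 0.10, "enterprise": 0.15,
    "governance": 0.15, "open_source": 0.10,
}
# Hypothetical keyword lists; the real keyword library is not shown here.
CATEGORY_KEYWORDS = {
    "ai_ml": ["llm", "machine learning", "neural"],
    "data_engineering": ["pipeline", "etl", "data lake"],
    "enterprise": ["enterprise", "cio"],
    "governance": ["governance", "compliance"],
    "open_source": ["open source", "apache"],
}


def relevance_score(text: str) -> float:
    """Add a category's weight once if any of its keywords appears; cap at 1.0."""
    lowered = text.lower()
    total = sum(
        weight for cat, weight in CATEGORY_WEIGHTS.items()
        if any(kw in lowered for kw in CATEGORY_KEYWORDS[cat])
    )
    return min(round(total, 2), 1.0)
```

With only these five categories the sum tops out at 0.65, so the 1.0 cap only comes into play if more categories are configured.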

Layer 11: Noise Filter (zero-out)

Negative keywords: gaming, esports, celebrity, sports scores, recipe, fashion, diet, horoscope. Any match = score 0, never surfaced.
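
The zero-out rule could be sketched as below. The term list is taken from the text; the function name and the choice of whole-word, case-insensitive matching are assumptions (a substring match would be stricter and would also catch plurals such as "recipes").

```python
import re

NOISE_TERMS = ["gaming", "esports", "celebrity", "sports scores",
               "recipe", "fashion", "diet", "horoscope"]
# Word-boundary match so "diet" does not fire on a word like "dietary".
NOISE_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, NOISE_TERMS)) + r")\b", re.IGNORECASE
)


def apply_noise_filter(text: str, score: float) -> float:
    """Return 0.0 if any negative keyword appears, otherwise the score unchanged."""
    return 0.0 if NOISE_RE.search(text) else score
```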

Layer 12: Gartner Hype Cycle Position Detection (multiplier)

Maps signals against hype cycle phases using our own signal data to detect optimal timing.

Phases & Multipliers:
• Innovation Trigger (1.5x): Topic on GitHub/arXiv/HN but NOT mainstream press yet
• Peak of Inflated Expectations (0.7x): Everywhere simultaneously — Reddit + TechCrunch + LinkedIn + YouTube
• Trough of Disillusionment (1.2x): Drops from headlines, practitioners still building
• Slope of Enlightenment (1.3x): Steady mentions, practical how-to content increasing
• Plateau of Productivity (0.8x): Established, generic content, low engagement
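
The phase multipliers above amount to a lookup applied to the base score. In this sketch the phase keys and the function name are hypothetical, unknown phases are assumed to leave the score unchanged, and clamping the result to 1.0 is an assumption rather than something the document specifies.

```python
HYPE_MULTIPLIERS = {
    "innovation_trigger": 1.5,
    "peak_of_inflated_expectations": 0.7,
    "trough_of_disillusionment": 1.2,
    "slope_of_enlightenment": 1.3,
    "plateau_of_productivity": 0.8,
}


def apply_hype_multiplier(base_score: float, phase: str) -> float:
    """Scale a base score by its phase multiplier, clamped to 1.0."""
    return min(base_score * HYPE_MULTIPLIERS.get(phase, 1.0), 1.0)
```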

Detection (our own logic):
• Cross-platform volume analysis (where + how many sources discussing simultaneously)
• Practitioner vs mainstream coverage ratio
• Content type shift (hype articles → tutorials → case studies = maturity)
• Engagement decay curves (rapid drop = peak passed)

Future enrichment: Gartner subscription validates our position detection (Tier 2 source)

Adding New Scoring Logic

• New scoring layers can be added as signal needs evolve
• Define: name, weight (or multiplier/override), scoring criteria, data inputs
• New layers integrate into the formula or act as multipliers/overrides
• All weights remain configurable via config — no code changes needed to tune

How Layers Map to Scoring Formula (subject to adjustment)

Direct formula inputs (weighted):
• Emergence (0.30): Layer 1 (Emergence Detection) + Layer 4 (Practitioner vs Analyst Divergence)
• Relevance (0.25): Layer 10 (Keywords) + Layer 5 (Competitive Gap Intelligence)
• Authority (0.20): Layer 7 (Source Trust)
• Question Gap (0.15): Layer 3 (Question Gap Detector)
• Velocity (0.10): Layer 8 (Engagement Velocity) — social engagement, comments, shares, upvotes

Post-formula multipliers:
• Layer 6 (Temporal/Calendar) — seasonal multiplier applied after base score
• Layer 9 (Cross-Platform Heat) — multiplier when 2+ platforms discuss same topic within 48h
• Layer 12 (Gartner Hype Cycle) — position-based multiplier (0.7x to 1.5x)

Overrides (bypass formula):
• Layer 2 (Thought Leader Watchlist) — auto-elevates to 0.90+ regardless of formula
• Layer 11 (Noise Filter) — zeros out score entirely, signal never surfaces

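base_score = emergence_position × 0.30 + relevance_depth × 0.25 + source_authority × 0.20 + question_gap × 0.15 + velocity × 0.10
The temporal, cross-platform heat, and hype cycle multipliers are applied after the base score. The thought leader watchlist and the noise filter override the formula.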
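
Putting the pieces together, the formula, multipliers, and overrides might combine as in the sketch below. The weights are from the formula above; the function signature is hypothetical, and two details are assumptions because the document does not settle them: the final score is clamped to 1.0, and the noise filter takes precedence over the watchlist.

```python
WEIGHTS = {"emergence_position": 0.30, "relevance_depth": 0.25,
           "source_authority": 0.20, "question_gap": 0.15, "velocity": 0.10}


def composite_score(components: dict, multipliers=(),
                    noise: bool = False, watchlist: bool = False) -> float:
    """Weighted base score, then multipliers, with the two overrides applied."""
    if noise:                   # Layer 11: zero out, never surfaced
        return 0.0
    score = sum(components.get(name, 0.0) * w for name, w in WEIGHTS.items())
    for m in multipliers:       # Layers 6, 9, 12: applied after the base score
        score *= m
    score = min(score, 1.0)     # assumption: final scores are clamped to 1.0
    if watchlist:               # Layer 2: auto-elevate to 0.90+
        score = max(score, 0.90)
    return round(score, 4)
```

As a worked example, components of 0.8, 0.6, 0.85, 0.4, and 0.5 (in the order of the formula) give 0.24 + 0.15 + 0.17 + 0.06 + 0.05 = 0.67 before any multiplier.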

Signal Queue

V1: Human-in-the-loop review and decision point. All surfaced signals land here before framework selection.

Signal Queue

Score	Action
>0.85	Immediate alert — talk track within 24h
0.70–0.85	Enters Signal Queue for review
0.50–0.70	Enters Signal Queue — lower priority, weekly digest
0.40–0.50	Logged, not surfaced
<0.40	Filtered out entirely
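
The threshold table maps onto a simple routing function. This is a sketch: the function name and return labels are hypothetical, and the handling of exact boundary values (0.70 and 0.85 go to standard review) is an assumption, since the table does not say which band a boundary score belongs to.

```python
def route_signal(score: float) -> str:
    """Map a final score to the action in the threshold table."""
    if score < 0.40:
        return "filtered"         # filtered out entirely
    if score < 0.50:
        return "logged"           # logged, not surfaced
    if score < 0.70:
        return "queue_low"        # lower priority, weekly digest
    if score <= 0.85:
        return "queue_review"     # enters Signal Queue for review
    return "immediate_alert"      # talk track within 24h
```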

Signal Queue UI Behavior

Score Threshold Slider:
• Adjustable slider controls which signals enter the queue
• Slide down = more potential signals visible (lower threshold)
• Slide up = only highest confidence signals shown

Each Signal Card Displays:
• Score (numeric + visual indicator)
• Topic/title summary
• Source link(s) — clickable to review original content
• Tags and scoring factors breakdown
• Timestamp (when ingested)

User Actions Per Signal:
• Generate: Route to Framework Selection + Output Format Selection → produce deliverables
• Return to Queue: Keep in queue for later review (re-prioritize)
• Purge: Remove from queue, log as rejected (feeds trust scoring over time)

Generate Flow:
• Step 1 — Framework Selection (optional): How are we talking about this? (SPARK, PAS, StoryBrand, etc.) Defines the narrative structure and voice. Can be skipped if no specific framework is needed.
• Step 2 — Output Format Selection (required, multi-select): What are we producing? Users can select multiple formats simultaneously. Determines the deliverable bundle(s) fed into the creative pipeline. Always required. Example: select Talk Track + Beat Map + Shot List for a single signal to produce both a talking head video and a sizzle reel.

Output format determines deliverable bundle:
• Video: narrative, beat map, script/talking points
• Podcast: narrative, episode structure, talking points, segment breakdown
• LinkedIn/Article: narrative, written post, headline options, CTA
• Talk Track: narrative, structured talking points, hook options

Team Member / Avatar Routing Logic V2

Potential V2 implementation — route scored signals to specific team members based on vertical expertise, capacity, and assignment rules.

Framework Selection (Structured Output)

V1: 14 frameworks are already built and ready for V1 implementation. Additional frameworks can be added over time.

#	Framework	Structure	Best For
1	SPARK	Signal → Position → Argument → Reinforcement → Kicker	Primary. 60-90s video scripts, LinkedIn posts, talk tracks. Most versatile across all formats.
2	PAS	Problem → Agitation → Solution	Short-form video hooks, Instagram reels, problem-aware audiences.
3	BAB	Before → After → Bridge	Case study videos, client testimonials, before/after LinkedIn posts.
4	StoryBrand	Guide → Problem → Plan → Action → Success	Longer explainer videos, landing pages, brand positioning pieces.
5	AIDA	Attention → Interest → Desire → Action	Ad scripts, email sequences, direct-response LinkedIn posts.
6	Data-Driven	Stat → Context → Implication → Action	White papers, data-heavy LinkedIn articles, conference presentations.
7	Hot Take	Contrarian → Evidence → CTA	Viral LinkedIn posts, Twitter threads, short-form video hooks that challenge status quo.
8	Story Arc	Hook → Tension → Resolution → Lesson	Podcast episodes, longer video content, keynote structures.
9	Listicle	3-5 key points, punchy	Carousel posts, newsletter sections, quick-hit social content.
10	Trend Piece	Signal → Context → Trajectory → Implications	Thought leadership articles, industry commentary, weekly digest content.
11	Case Study	Challenge → Approach → Outcome → Lesson	Client-facing videos, website content, sales enablement materials.
12	Comparison	Option A → Option B → Verdict → Why	Tool/platform evaluation posts, buyer-stage content, advisory pieces.
13	Prediction	Current state → Forces → Predicted → Prepare	Year-end/new-year content, conference talks, long-form thought pieces.
14	Explainer	Concept → Analogy → Example → Application	Tutorial videos, onboarding content, explainer series, YouTube long-form.

Style Modes

Thought Leader — authoritative, first-person, opinionated
Educational — explain like I'm smart but new
Provocative — challenge assumptions, contrarian
Corporate — safe, professional, IBM-appropriate
Conversational — informal, peer-to-peer

Anti-Patterns (never generate)

Doom Framing (fear-based)
Fear Farming (exploiting anxiety)
Generic Hype (empty buzzwords)
Unsubstantiated claims

Adding New Frameworks

• Users can manually add new frameworks at any time
• Define: name, structure (sections/steps), description, best-use-case
• New frameworks appear in the selection UI alongside defaults
• Framework library grows over time as team develops new approaches
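
A user-defined framework needs only the four fields listed above, so a small record type plus a registry is enough to illustrate the idea. `Framework` and `FrameworkLibrary` are hypothetical names, and the validation rule (a name and at least one step are required) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Framework:
    name: str
    structure: tuple[str, ...]  # ordered sections/steps
    description: str = ""
    best_for: str = ""


class FrameworkLibrary:
    """In-memory registry; new entries sit alongside the defaults."""

    def __init__(self) -> None:
        self._items: dict[str, Framework] = {}

    def add(self, fw: Framework) -> None:
        if not fw.name or not fw.structure:
            raise ValueError("a framework needs a name and at least one step")
        self._items[fw.name] = fw

    def names(self) -> list[str]:
        return list(self._items)


library = FrameworkLibrary()
library.add(Framework("PAS", ("Problem", "Agitation", "Solution"),
                      best_for="Short-form video hooks"))
```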

Framework UI Behavior

Adding a Framework:
• Click "Add Framework" button in framework library
• Enter: name, description, structure (ordered sections/steps), best-for use cases
• Save → immediately available in framework selector

Selecting a Framework:
• Framework cards display name + structure preview + "Best For" hint
• "Best For" guides user toward the right output format
• Click to select → routes to Output Format selector

Editing/Removing:
• Edit framework structure, description, or best-for at any time
• Remove/archive frameworks no longer in use

Format Selection (Structured Output)

Production Types (User Selects) V1

User selects the production type. System generates all required structured ingredients for that format. Framework selection remains optional for any type.

#	Production Type	Structured Ingredients Produced
1	Short-Form Video (talking head)	Framework (optional), script/talk track, beat map, shot list
2	Short-Form Video (sizzle / no talking head)	Narrative, beat map, shot list, storyboard, headline/hook variants
3	Long-Form Video	Storyboard, narrative/story arc, beat map, shot list, script, segment breakdown
4	Podcast	Segments, guest prep, show notes, intro/outro, talking points
5	Social Post	Platform sub-select (LinkedIn, IG, X, TikTok) → copy, headline/hook variants, visual direction

Structured Ingredients (System Generates)

These are the deliverables produced by the system based on production type selection. Users see what will be generated before confirming.

Ingredient	Description	Used By
Talk Track / Script	Scripted narrative for speaking (VO, talking head, conversation)	Short-form TH, Long-form, Podcast
Beat Map	Word-level timestamps + overlay timing	Short-form, Long-form
Shot List	Camera specs, source types, visual direction per beat	Short-form, Long-form
Storyboard	5-8 scene narrative arc with visual direction	Sizzle, Long-form
Narrative / Story Arc	Overall story structure and flow	Sizzle, Long-form
Headline / Hook Variants	5-10 options per topic for testing	Sizzle, Social
Social Copy	Platform-optimized text (character limits, formatting, CTAs)	Social
Podcast Segments	Segment structure, guest prep, show notes, intro/outro	Podcast

Adding New Formats

• Users can add new output formats at any time
• Define: name, description, deliverable bundle (what gets produced)
• New formats appear in the selector alongside defaults
• Format library grows as production needs evolve

Format Selection UI Behavior

Selecting Formats (multi-select):
• Format cards display name + deliverable bundle preview
• User sees what they'll receive before selecting
• Select one or multiple formats simultaneously
• Confirm selection → system generates all deliverables across selected bundles
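
When several formats are selected at once, the deliverable list can be thought of as the union of each bundle. The mapping below copies the ingredients from the Production Types table (leaving out the optional framework); the dictionary keys and the function name are hypothetical, and deduplicating shared ingredients is an assumption, since the system may instead generate one copy per bundle.

```python
INGREDIENTS = {
    "short_form_talking_head": ["script/talk track", "beat map", "shot list"],
    "short_form_sizzle": ["narrative", "beat map", "shot list", "storyboard",
                          "headline/hook variants"],
    "long_form_video": ["storyboard", "narrative/story arc", "beat map",
                        "shot list", "script", "segment breakdown"],
    "podcast": ["segments", "guest prep", "show notes", "intro/outro",
                "talking points"],
    "social_post": ["copy", "headline/hook variants", "visual direction"],
}


def deliverables_for(selected: list[str]) -> list[str]:
    """Union of ingredients across selected types, first-seen order, no repeats."""
    seen: dict[str, None] = {}
    for production_type in selected:
        for ingredient in INGREDIENTS[production_type]:
            seen.setdefault(ingredient)
    return list(seen)
```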

After Generation:
• Deliverables displayed for review (editable)
• Approve → routes to creative pipeline
• Edit → modify before routing
• Regenerate → re-run with same inputs

Adding/Editing:
• Click "Add Format" to create new output type
• Edit existing format descriptions or deliverable bundles
• Archive formats no longer in use


Architecture Flow