Signal Scout

V1 Deployment Spec — Developer Handoff — v1.0

Signal Sources

Active V1 — Currently Wired (Jacob's code)

# | Source | Type | Method | Trust | Status
1 | Hacker News | Community | RSS | 0.65 | Active
2 | Wired | Trade Press | RSS | 0.85 | Active
3 | VentureBeat AI | Trade Press | RSS | 0.82 | Active
4 | TechCrunch | Trade Press | RSS | 0.75 | Active
5 | MIT Tech Review | Tier-1 Research | RSS | 0.88 | Active
6 | AI News | Trade Press | RSS | 0.80 | Active
7 | arXiv Papers | Academic Research | RSS | 0.88 | Active

Thought Leader Video Monitoring V1

Step | Detail
1. Curate | Maintain list of 20-50 YouTube channels (thought leaders + companies)
2. Poll | YouTube Data API for new uploads every 30 min
3. Extract | Title, description, tags, thumbnail_url, published_at, view_count
4. Score | Pass through multi-layer scoring engine
5. Queue | High-score videos (>0.70) queued for talk track generation

Channel Tier | Examples | Behavior
Tier 0 (auto-elevate) | Karpathy, Andrew Ng, Jensen Huang, Sam Altman | Score → 0.90+ automatically
Tier 1 (high trust) | IBM Research, Snowflake, Databricks, Anthropic, OpenAI | Trust floor 0.92
Tier 2 (track) | Fireship, Two Minute Papers, AI Explained | Normal scoring
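
The tier behavior and the queue cutoff can be sketched as below. This is a minimal illustration, not the wired implementation: `VideoSignal`, `apply_channel_tier`, and `should_queue` are hypothetical names, and reading "Trust floor 0.92" as a floor on the channel's source trust (rather than on the signal score) is an assumption.

```python
from dataclasses import dataclass

# Values from the tier table and step 5 above; the names are illustrative.
TIER0_SCORE_FLOOR = 0.90   # Tier 0: score auto-elevated to 0.90+
TIER1_TRUST_FLOOR = 0.92   # Tier 1: trust floor (assumed to apply to source trust)
QUEUE_THRESHOLD = 0.70     # videos scoring above this are queued


@dataclass
class VideoSignal:
    channel: str
    score: float  # output of the scoring engine, 0.0-1.0
    trust: float  # source trust for the channel, 0.0-1.0


def apply_channel_tier(signal: VideoSignal, tier: int) -> VideoSignal:
    """Return a copy of the signal with the tier rule applied."""
    score, trust = signal.score, signal.trust
    if tier == 0:
        score = max(score, TIER0_SCORE_FLOOR)
    elif tier == 1:
        trust = max(trust, TIER1_TRUST_FLOOR)
    # Tier 2 and anything else: normal scoring, no adjustment.
    return VideoSignal(signal.channel, score, trust)


def should_queue(signal: VideoSignal) -> bool:
    """Step 5: queue high-score videos for talk track generation."""
    return signal.score > QUEUE_THRESHOLD
```

Keeping the tier rule separate from the scoring engine means the channel list and floors can change without touching the formula.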

Future: Whisper transcript extraction → full content scoring (not just title/description)

Video Description Scoring Approach
How signals are identified from video content:
Title keywords: Match against relevance keyword library (same as RSS scoring)
Description parsing: Extract topics, tools mentioned, company names, product launches
Tag analysis: YouTube tags reveal creator intent and topic clustering
Cross-reference: Same topic in video + RSS + community within 48h → cross-platform heat multiplier
Engagement signal: View velocity in first 24h indicates topic resonance
Upload frequency: Multiple videos on same topic = strong signal
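
The cross-reference rule (same topic on several platforms within 48h) could be detected roughly as follows. This is a sketch under assumptions: topics are already normalized to a shared label, and `hot_topics` and its input shape are hypothetical, not part of the existing code.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(hours=48)  # window stated in the spec


def hot_topics(mentions, min_platforms=2):
    """Return topics seen on at least `min_platforms` distinct platforms
    within any 48h window.

    mentions: iterable of (topic, platform, timestamp) tuples.
    """
    by_topic = defaultdict(list)
    for topic, platform, ts in mentions:
        by_topic[topic].append((ts, platform))

    hot = set()
    for topic, items in by_topic.items():
        items.sort()  # chronological order
        for i, (start, _) in enumerate(items):
            # Platforms that mention the topic within 48h of this mention.
            platforms = {p for ts, p in items[i:] if ts - start <= WINDOW}
            if len(platforms) >= min_platforms:
                hot.add(topic)
                break
    return hot
```

Topics returned here would be the ones eligible for the cross-platform heat multiplier; the size of that multiplier is not specified in this document.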

Key insight: Video descriptions contain more context than titles. Creators write descriptions for SEO, which reveals what the content actually covers, so descriptions are a higher-signal input than headline-only scanning.

Signal Queue output: When a video signal enters the queue, it includes a direct link to the video for human review before talk track generation.

Video Signal Flow:
• Score description/title/tags → enters Signal Queue with video link
• Human reviews video (watch or skim)
• If approved as signal → extract full transcript (Whisper)
• Transcript fed to framework selection → generate talk track or structured output

Tier 1: V1 Expansion (implement next)

# | Source | Why | Method | Trust | Auth
8 | Reddit | r/artificial, r/machinelearning, r/dataengineering | API (PRAW) | 0.70 | Free tier
9 | GitHub Trending | What builders are actually building | Scrape | 0.72 | None
10 | Google Trends | Search demand = real interest | API | 0.80 | API key
11 | YouTube Data API | Thought leader video titles + descriptions | API | Varies | API key
12 | Podcast RSS | Episode titles, guests, descriptions | RSS | 0.75 | None

Tier 2 Future — Under Evaluation for Potential Future Implementation

Category | Source | Description | Method | Trust | Cost
Social | Twitter/X API | Real-time discourse, influencer takes | API (paid) | 0.60 | Paid
Social | LinkedIn Feed | Enterprise sentiment, executive posts | API (limited) | 0.78 | Paid
Social | Threads / Bluesky | Emerging social platforms, tech early adopters | API | 0.55 | Free
Social | TikTok Trends | Viral content patterns, gen-Z signal | Scrape | 0.50 | Free
Trend | Exploding Topics | Pre-mainstream topic detection | API | 0.82 | Paid
Trend | Feedly Pro + AI | AI-curated feed aggregation | API | 0.80 | Paid
Trend | Glimpse (trend enrichment) | Trend data enrichment layer | API | 0.78 | Paid
Competitor | Semrush | SEO + content gap analysis | API | 0.85 | Paid
Competitor | Ahrefs | Backlink + content performance | API | 0.85 | Paid
Competitor | Crunchbase | Funding rounds, startup signals | API | 0.80 | Paid
Competitor | SimilarWeb | Traffic analysis, market share | API | 0.78 | Paid
Competitor | Owler | Company news, competitive alerts | API | 0.72 | Free/Paid
Audience | SparkToro | Audience intelligence, where they gather | API | 0.80 | Paid
Audience | LinkedIn Sales Navigator | Decision-maker activity tracking | API | 0.82 | Paid
Content | BuzzSumo | Top-performing content by topic | API | 0.78 | Paid
Content | Social Blade | Channel growth tracking | API | 0.70 | Free/Paid
Innovation | Product Hunt | New launches, adoption velocity | API | 0.70 | Free
Innovation | DEV.to | Developer community discourse | API | 0.65 | Free
Innovation | Patent Filings | R&D direction indicators | RSS/Scrape | 0.85 | Free
Enterprise | Gartner Reports | Market quadrants, hype cycles | API/Scrape | 0.95 | Paid
Enterprise | Forrester Research | Technology wave analysis | API/Scrape | 0.93 | Paid
Enterprise | McKinsey Insights | Strategy + transformation research | RSS | 0.90 | Free
Enterprise | CB Insights | Market maps, emerging tech | API | 0.88 | Paid
Enterprise | Statista | Market data, statistics | API | 0.85 | Paid
Deep Research | Perplexity API | AI-powered web research | API | 0.88 | Paid
Deep Research | Perplexity Deep Research | Multi-source deep analysis | API | 0.92 | Paid
Hiring | Job Posting Feeds | Demand signals by role/skill | API | 0.72 | Paid
Hiring | Layoffs.fyi | Market contraction signals | Scrape | 0.70 | Free
Hiring | Crunchbase Funding | Investment direction, growth signals | API | 0.80 | Paid
Community | Indie Hackers | Builder community sentiment | Scrape | 0.65 | Free
Community | Discord Servers | Niche community monitoring | Bot/API | 0.60 | Free
Alerts | Google Alerts | Keyword-triggered notifications | Email/RSS | 0.65 | Free
Alerts | Mention.com | Real-time brand/topic monitoring | API | 0.75 | Paid
Listening | Brandwatch | Enterprise social listening | API | 0.85 | Paid
Listening | Meltwater | Media monitoring + analytics | API | 0.85 | Paid
Blogs/RSS | Substack newsletters | Long-form thought leadership | RSS | 0.72 | Free
Blogs/RSS | Medium publications | Tech community writing | RSS | 0.68 | Free
Blogs/RSS | Press Releases | Official company announcements | RSS | 0.70 | Free
Industry | Listen Notes | Podcast search, 3M+ shows indexed | API | 0.75 | Paid
Industry | Conference/Event Feeds | Keynote topics, speaker signals | RSS/Scrape | 0.80 | Free

User Config: Add / Edit / Remove Sources
Users can manage sources at any time via the Source Management UI:
Add RSS: Paste feed URL → validate → set tier + category + trust → save
Add API: Select supported platform → paste API key → validate connection → configure polling
Add Scrape: Paste target URL → set extraction rules → set frequency → save
Edit: Change tier, trust weight, polling frequency, category
Remove/Pause: Disable without deleting, or remove entirely

New sources enter at neutral trust (0.50) and earn trust through validated signals over time.

Source Connection Logic
RSS Sources:
• Paste feed URL → validate (fetch + parse) → store URL + polling interval + trust score
• Connector: HTTP GET with User-Agent header, XML/Atom parser
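
A minimal sketch of the RSS connector, using only the standard library. It is not the existing `parse_rss` in scoring.py: `fetch_feed` and `parse_rss_items` are hypothetical names, the output keys are assumed, and the `Scout/3.0` User-Agent and 12s timeout are the values this spec gives under RSS/Feed Polling. It handles RSS 2.0 `<item>` elements only; Atom feeds use namespaced `<entry>` elements and would need a second branch.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch_feed(url, timeout=12):
    """HTTP GET with an explicit User-Agent; returns the raw feed bytes."""
    req = urllib.request.Request(url, headers={"User-Agent": "Scout/3.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def parse_rss_items(xml_bytes, source_name):
    """Parse RSS 2.0 bytes into a list of item dicts.

    Raises xml.etree.ElementTree.ParseError on malformed XML, which can
    double as the "validate" step when a user pastes a feed URL.
    """
    root = ET.fromstring(xml_bytes)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "url": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
            "source_name": source_name,
        })
    return items
```

Typical use would be `parse_rss_items(fetch_feed(url), name)`, with the fetch wrapped in the retry logic described under Connection Health.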

API Sources:
• Select platform → input API key/OAuth token → validate connection → configure endpoints + rate limits
• Reddit: PRAW library, OAuth2 app credentials
• YouTube: API key, quota management (10,000 units/day free)
• Google Trends: pytrends library, no auth needed
• HackerNews: Free API, no auth, rate-limit friendly

Scrape Sources:
• Define target URL + CSS selectors for content extraction
• Set crawl frequency + respectful delays (2-5s between requests)
• GitHub Trending: parse /trending page, extract repo name + description + stars

Connection Health:
• Each source has a status: connected / degraded / failed
• Auto-retry on failure (3 attempts, exponential backoff)
• Alert if source fails 3 consecutive polls
• Trust score decays if source consistently returns low-signal content
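
The retry and status rules above might look like the sketch below. Assumptions are labeled: `poll_with_retry` and `SourceHealth` are hypothetical names, the 1s base delay is illustrative, and mapping one or two consecutive failed polls to "degraded" and three or more to "failed" is one possible reading of the status list.

```python
import time


def poll_with_retry(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on exception retry with exponential backoff.

    Returns (True, result) on success or (False, last_exception) after
    `attempts` failures. Delays are base_delay * 2**attempt.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return True, fetch()
        except Exception as exc:  # any failure counts as a failed attempt
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return False, last_error


class SourceHealth:
    """Tracks consecutive failed polls for one source."""

    def __init__(self, alert_after=3):
        self.alert_after = alert_after  # spec: alert after 3 consecutive failed polls
        self.consecutive_failures = 0

    def record(self, ok):
        """Record one poll outcome and return the resulting status."""
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1
        return self.status

    @property
    def status(self):
        if self.consecutive_failures == 0:
            return "connected"
        if self.consecutive_failures < self.alert_after:
            return "degraded"  # assumed meaning: failing, below alert threshold
        return "failed"
```

Passing `sleep` as a parameter keeps the backoff testable without real waiting.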
Source Trust Ranking
Initial Trust Assignment:
• Tier-1 research/academic (MIT, arXiv): 0.85-0.95
• Trade press (Wired, TechCrunch): 0.75-0.85
• Community (HN, Reddit): 0.60-0.70
• New/unverified sources: 0.50 (neutral entry)

Trust Adjusts Over Time Based On:
Signal-to-noise ratio: % of ingested items that score above 0.50 after multi-layer scoring
Human validation: Signals from this source that get approved in queue vs rejected
Engagement correlation: Do items from this source actually perform when published?
Consistency: Regular high-quality signals vs sporadic

Trust Decay:
• Source produces 5+ consecutive items scoring below 0.40 → trust drops 0.05
• Source offline/failing for 7+ days → trust drops 0.10
• Manual override: admin can set trust floor or ceiling at any time

Trust Growth:
• 10+ approved signals from source in 30 days → trust increases 0.05
• Source signals consistently lead to published content → trust increases 0.03
• Capped at 0.95 (no source gets perfect trust)
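
The decay and growth rules can be expressed as one adjustment step. This is a sketch, not the existing implementation: `adjust_trust` and its parameter names are hypothetical, and the spec does not say how often the step runs, so a single evaluation per call is assumed.

```python
TRUST_CAP = 0.95      # no source gets perfect trust
NEUTRAL_TRUST = 0.50  # entry value for new or unverified sources


def adjust_trust(trust, *, low_score_streak=0, days_offline=0,
                 approved_last_30d=0, leads_to_published=False,
                 floor=0.0, ceiling=TRUST_CAP):
    """Apply one round of the decay/growth rules and return the new trust.

    floor/ceiling model the admin manual override; the ceiling can never
    exceed the global 0.95 cap.
    """
    delta = 0.0
    if low_score_streak >= 5:    # 5+ consecutive items scoring below 0.40
        delta -= 0.05
    if days_offline >= 7:        # offline or failing for 7+ days
        delta -= 0.10
    if approved_last_30d >= 10:  # 10+ approved signals in 30 days
        delta += 0.05
    if leads_to_published:       # signals consistently lead to published content
        delta += 0.03
    ceiling = min(ceiling, TRUST_CAP)
    return round(min(max(trust + delta, floor), ceiling), 2)
```

For example, a new source at `NEUTRAL_TRUST` with ten approved signals in a month moves to 0.55, while a source at 0.93 that earns both growth bonuses stops at the 0.95 cap.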
Existing Implementation Reference
Package: src/scout/ (pip-installable Python)
RSS Parser: scoring.py → parse_rss(url, name) — fetches feed, extracts title/link/description, returns structured items
Config: default_config/sources.json — array of {url, type, trust, name}
Adding a source: Append an object to sources.json with the same fields as the existing entries (url, type, trust, name)
YouTube/Reddit/API: Extend with new connector functions in scoring.py following same pattern (return list of {title, url, desc, source})
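
Appending to sources.json could be scripted as below. This is a sketch that assumes the file is a top-level JSON array of `{url, type, trust, name}` objects as described above; `add_source` is a hypothetical helper, not part of the package, and the `type` value in the usage example is illustrative.

```python
import json
from pathlib import Path

REQUIRED_FIELDS = ("url", "type", "trust", "name")


def add_source(config_path, entry):
    """Append `entry` to the sources array and return the updated list.

    Skips the write if a source with the same URL already exists.
    """
    missing = [f for f in REQUIRED_FIELDS if f not in entry]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not 0.0 <= entry["trust"] <= 0.95:  # 0.95 is the trust cap in this spec
        raise ValueError("trust must be between 0.0 and 0.95")

    path = Path(config_path)
    sources = json.loads(path.read_text()) if path.exists() else []
    if any(s["url"] == entry["url"] for s in sources):
        return sources
    sources.append(entry)
    path.write_text(json.dumps(sources, indent=2))
    return sources
```

Usage, with illustrative values:

```python
add_source("default_config/sources.json", {
    "url": "https://example.com/feed.xml",
    "type": "rss",
    "trust": 0.50,
    "name": "Example Feed",
})
```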

Ingestion Methods

Pipeline 1: Automated Market Signal Ingestion V1

Automated scanning of external sources for potential signals. Runs on cron schedule (configurable). Can also be user-initiated on demand. All ingested items pass through multi-layer scoring before entering Signal Queue.

Method 1: RSS/Feed Polling
Frequency: every 30-60 min
Sources: Podcasts, TechCrunch, Substack, Medium, arXiv, Product Hunt, press releases
Format: { title, description, url, timestamp, source_name }
Implementation: Standard RSS/Atom parser, User-Agent: Scout/3.0, timeout 12s

Method 2: API Direct
Frequency: every 15-30 min
Sources: Reddit (PRAW), YouTube Data API, Google Trends, Twitter/X, HackerNews
Format: Structured JSON per platform
Implementation: Platform SDKs, rate limit management, auth token rotation

Method 3: Web Search/Scrape
Frequency: every 2-4 hours
Sources: Google Search (SerpAPI), GitHub Trending, YouTube video descriptions
Format: Extracted text + metadata
Implementation: SerpAPI or headless browser, respectful crawl delays

Pipeline 2: Manual / Direct Content Input V1

Human-initiated input. Bypasses automated scoring and Signal Queue. Routes directly to framework selection and structured output selection.

Input Types

V1 Inputs:
Text paste: Raw text, notes → framework selection directly
File upload: PDF, transcript, document → extract text → framework selection
URL paste: Article, blog post → extract content → framework selection (if feasible)

V2 Inputs:
Voice note: Audio recording → transcribe (Whisper) → framework selection
Video demo / URL video link: Product demo recording or video URL → transcribe → extract key moments
Topic submission: Topic/idea with no source material → research + generate

Manual Pipeline Behavior
• Bypasses automated scoring (human already validated)
• Bypasses Signal Queue (no review needed — human initiated)
• Routes directly to framework + structured output selection
• Can optionally run through scoring for prioritization if multiple manual inputs queued
• Supports team routing (assign to specific member)

Future Ingestion Methods

Under Evaluation
Webhook listeners: Real-time push from integrated platforms
Email parsing: Newsletter digests auto-ingested as signals
Slack/Teams monitoring: Internal channel keyword triggers
Calendar integration: Auto-ingest conference agendas, webinar topics
CRM integration: Client questions/feedback as signal input
Perplexity Deep Research: AI-driven multi-source research on demand

Signal Scoring Engine

All layers below are in V1 implementation scope.

Layer 1: Emergence Detection (weight: 0.30)
Identifies concepts in the critical 5% window: past "too early" but before mainstream saturation. Semantic clustering of related concepts in new combinations. Hype cycle positioning: pre-emergence = maximum value, peak = contextualization only.
Layer 2: Thought Leader Watchlist (override)
Tier 0 (0.97): Karpathy, Andrew Ng, Jensen Huang, Sam Altman, Demis Hassabis. Tier 1 (0.92): IBM Research, Snowflake, Databricks, Anthropic, OpenAI. Activation auto-elevates to 0.90+ regardless of engagement. Bypasses formula entirely.
Layer 3: Question Gap Detector (weight: 0.15)
Monitors comments, Reddit threads, LinkedIn replies, conference Q&A for unanswered questions. Repeated questions across sources with no satisfying answer = talk track opportunity. "Questions precede answers; answers precede adoption."
Layer 4: Practitioner vs. Analyst Divergence
Tracks analyst publications (Gartner, Forrester, McKinsey) vs practitioner sentiment (HN, Reddit). High divergence = positioning opportunity. Example: "73% of AI projects fail" vs analyst adoption narratives.
Layer 5: Competitive Gap Intelligence
Monitors Accenture, Deloitte, McKinsey, BCG, PwC AI publications. Topics they ALL cover (add depth), NONE cover (own it), covered POORLY (outperform). Updated monthly.
Layer 6: Temporal & Calendar Intelligence
Q4: AI governance elevated (budget season). Pre-conference: innovation elevated (IBM Think, Dreamforce, NeurIPS). Nov/Dec: year-end predictions window.
Layer 7: Source Trust (weight: 0.20)
Weighted trust per source. Decays for repeated low-signal. New sources enter at neutral, earn trust through validated signals. Tracked over time.
Layer 8: Engagement Velocity (weight: 0.10)
Comment volume, share rate, reactions at ingest time. LinkedIn reactions, YouTube comments, Reddit upvotes, HN score. High engagement = real audience resonance.
Layer 9: Cross-Platform Heat (multiplier)
Same topic trending across 2+ platforms within 48h = score multiplier. Strongest signal: multiple communities discussing simultaneously.
Layer 10: Relevance Keywords (weight: 0.25)
Weighted keyword matching: AI/ML (0.15), Data Engineering (0.10), Enterprise (0.15), Governance (0.15), Open Source (0.10). Capped at 1.0.
Layer 11: Noise Filter (zero-out)
Negative keywords: gaming, esports, celebrity, sports scores, recipe, fashion, diet, horoscope. Any match = score 0, never surfaced.
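
Layers 10 and 11 could be sketched together as below. The category weights and noise terms come from the text above; the keyword lists per category are illustrative placeholders, and treating each keyword hit as adding its category weight (capped at 1.0) is an assumption about how "weighted keyword matching" is meant to work.

```python
import re

# category: (weight from Layer 10, illustrative keywords)
KEYWORD_WEIGHTS = {
    "ai_ml": (0.15, ["llm", "machine learning", "neural network"]),
    "data_engineering": (0.10, ["data pipeline", "etl", "lakehouse"]),
    "enterprise": (0.15, ["enterprise", "cio"]),
    "governance": (0.15, ["governance", "compliance"]),
    "open_source": (0.10, ["open source", "open-source"]),
}

# Negative keywords from Layer 11.
NOISE_TERMS = ["gaming", "esports", "celebrity", "sports scores",
               "recipe", "fashion", "diet", "horoscope"]


def _has_term(term, text):
    """Whole-word match, so 'diet' does not fire on 'dietary'."""
    return re.search(r"\b" + re.escape(term) + r"\b", text) is not None


def is_noise(text):
    """Layer 11: any negative keyword means the signal is zeroed out."""
    lowered = text.lower()
    return any(_has_term(term, lowered) for term in NOISE_TERMS)


def relevance(text):
    """Layer 10: sum of category weights per keyword hit, capped at 1.0."""
    lowered = text.lower()
    total = 0.0
    for weight, keywords in KEYWORD_WEIGHTS.values():
        total += weight * sum(_has_term(k, lowered) for k in keywords)
    return min(total, 1.0)
```

Whole-word matching is a deliberate choice here: plain substring checks would zero out legitimate items whose text merely contains a noise term as part of a longer word.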
Layer 12: Gartner Hype Cycle Position Detection (multiplier)
Maps signals against hype cycle phases using our own signal data to detect optimal timing.

Phases & Multipliers:
Innovation Trigger (1.5x): Topic on GitHub/arXiv/HN but NOT mainstream press yet
Peak of Inflated Expectations (0.7x): Everywhere simultaneously — Reddit + TechCrunch + LinkedIn + YouTube
Trough of Disillusionment (1.2x): Drops from headlines, practitioners still building
Slope of Enlightenment (1.3x): Steady mentions, practical how-to content increasing
Plateau of Productivity (0.8x): Established, generic content, low engagement

Detection (our own logic):
• Cross-platform volume analysis (where + how many sources discussing simultaneously)
• Practitioner vs mainstream coverage ratio
• Content type shift (hype articles → tutorials → case studies = maturity)
• Engagement decay curves (rapid drop = peak passed)

Future enrichment: Gartner subscription validates our position detection (Tier 2 source)
Adding New Scoring Logic
• New scoring layers can be added as signal needs evolve
• Define: name, weight (or multiplier/override), scoring criteria, data inputs
• New layers integrate into the formula or act as multipliers/overrides
• All weights remain configurable via config — no code changes needed to tune
How Layers Map to Scoring Formula (subject to adjustment)
Direct formula inputs (weighted):
Emergence (0.30): Layer 1 (Emergence Detection) + Layer 4 (Practitioner vs Analyst Divergence)
Relevance (0.25): Layer 10 (Keywords) + Layer 5 (Competitive Gap Intelligence)
Authority (0.20): Layer 7 (Source Trust)
Question Gap (0.15): Layer 3 (Question Gap Detector)
Velocity (0.10): Layer 8 (Engagement Velocity) — social engagement, comments, shares, upvotes

Post-formula multipliers:
• Layer 6 (Temporal/Calendar) — seasonal multiplier applied after base score
• Layer 9 (Cross-Platform Heat) — multiplier when 2+ platforms discuss same topic within 48h
• Layer 12 (Gartner Hype Cycle) — position-based multiplier (0.7x to 1.5x)

Overrides (bypass formula):
• Layer 2 (Thought Leader Watchlist) — auto-elevates to 0.90+ regardless of formula
• Layer 11 (Noise Filter) — zeros out score entirely, signal never surfaces
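
Base formula:
score = emergence_position × 0.30 + relevance_depth × 0.25 + source_authority × 0.20 + question_gap × 0.15 + velocity × 0.10
Multipliers (Layers 6, 9, 12) are applied after the base score. The thought leader watchlist (Layer 2) and the noise filter (Layer 11) override the formula.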
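
Putting the weighted inputs, multipliers, and overrides together, a minimal sketch could look like this. The weights are the ones in the formula; clamping the result to 1.0, checking the noise filter before the watchlist, and the `final_score` signature are all assumptions.

```python
# Weights from the formula; meant to live in config so they can be tuned.
WEIGHTS = {
    "emergence_position": 0.30,
    "relevance_depth": 0.25,
    "source_authority": 0.20,
    "question_gap": 0.15,
    "velocity": 0.10,
}


def final_score(components, *, multipliers=(), watchlist_floor=None, noise=False):
    """Combine component scores (each 0.0-1.0) into one signal score.

    multipliers: post-formula factors (temporal, cross-platform heat, hype cycle).
    watchlist_floor: e.g. 0.90 when a thought leader override applies.
    noise: True when the noise filter matched; the score is zeroed out.
    """
    if noise:
        return 0.0
    score = sum(w * components.get(name, 0.0) for name, w in WEIGHTS.items())
    for m in multipliers:
        score *= m
    score = min(score, 1.0)  # assumed clamp; the spec does not state one
    if watchlist_floor is not None:
        score = max(score, watchlist_floor)
    return round(score, 4)
```

For example, components that are all 0.5 give a base score of 0.5, and a 1.5x Innovation Trigger multiplier lifts that to 0.75.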

Signal Queue

V1: Human-in-the-loop review and decision point. Scored signals that clear the queue threshold land here before framework selection.

Score | Action
>0.85 | Immediate alert; talk track within 24h
0.70–0.85 | Enters Signal Queue for review
0.50–0.70 | Enters Signal Queue at lower priority; weekly digest
0.40–0.50 | Logged, not surfaced
<0.40 | Filtered out entirely
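
The score bands can be expressed as a single routing function. This sketch reads the "logged, not surfaced" band as 0.40 to 0.50 so that it does not overlap the "filtered" band, and it treats each lower bound as inclusive; both are assumptions, as are the function name and the returned labels.

```python
def route_signal(score):
    """Map a final score (0.0-1.0) to the action for that band."""
    if score > 0.85:
        return "immediate_alert"      # talk track within 24h
    if score >= 0.70:
        return "queue_review"         # enters Signal Queue for review
    if score >= 0.50:
        return "queue_weekly_digest"  # lower priority
    if score >= 0.40:
        return "logged"               # kept for analysis, not surfaced
    return "filtered"                 # dropped entirely
```

In the UI, the score threshold slider would sit on top of this: it changes which queued signals are visible, not how they are scored.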
Signal Queue UI Behavior
Score Threshold Slider:
• Adjustable slider controls which signals enter the queue
• Slide down = more potential signals visible (lower threshold)
• Slide up = only highest confidence signals shown

Each Signal Card Displays:
• Score (numeric + visual indicator)
• Topic/title summary
• Source link(s) — clickable to review original content
• Tags and scoring factors breakdown
• Timestamp (when ingested)

User Actions Per Signal:
Generate: Route to Framework Selection + Output Format Selection → produce deliverables
Return to Queue: Keep in queue for later review (re-prioritize)
Purge: Remove from queue, log as rejected (feeds trust scoring over time)

Generate Flow:
Step 1 — Framework Selection (optional): How are we talking about this? (SPARK, PAS, StoryBrand, etc.) Defines the narrative structure and voice. Can be skipped if no specific framework is needed.
Step 2 — Output Format Selection (required, multi-select): What are we producing? Users can select multiple formats simultaneously. Determines the deliverable bundle(s) fed into the creative pipeline. Always required. Example: select Talk Track + Beat Map + Shot List for a single signal to produce both a talking head video and a sizzle reel.

Output format determines deliverable bundle:
Video: narrative, beat map, script/talking points
Podcast: narrative, episode structure, talking points, segment breakdown
LinkedIn/Article: narrative, written post, headline options, CTA
Talk Track: narrative, structured talking points, hook options

Team Member / Avatar Routing Logic V2

Potential V2 implementation — route scored signals to specific team members based on vertical expertise, capacity, and assignment rules.

Framework Selection (Structured Output)

V1: 14 frameworks are already built and ready for V1 implementation. Additional frameworks can be added over time.

# | Framework | Structure | Best For
1 | SPARK | Signal → Position → Argument → Reinforcement → Kicker | Primary. 60-90s video scripts, LinkedIn posts, talk tracks. Most versatile across all formats.
2 | PAS | Problem → Agitation → Solution | Short-form video hooks, Instagram reels, problem-aware audiences.
3 | BAB | Before → After → Bridge | Case study videos, client testimonials, before/after LinkedIn posts.
4 | StoryBrand | Guide → Problem → Plan → Action → Success | Longer explainer videos, landing pages, brand positioning pieces.
5 | AIDA | Attention → Interest → Desire → Action | Ad scripts, email sequences, direct-response LinkedIn posts.
6 | Data-Driven | Stat → Context → Implication → Action | White papers, data-heavy LinkedIn articles, conference presentations.
7 | Hot Take | Contrarian → Evidence → CTA | Viral LinkedIn posts, Twitter threads, short-form video hooks that challenge status quo.
8 | Story Arc | Hook → Tension → Resolution → Lesson | Podcast episodes, longer video content, keynote structures.
9 | Listicle | 3-5 key points, punchy | Carousel posts, newsletter sections, quick-hit social content.
10 | Trend Piece | Signal → Context → Trajectory → Implications | Thought leadership articles, industry commentary, weekly digest content.
11 | Case Study | Challenge → Approach → Outcome → Lesson | Client-facing videos, website content, sales enablement materials.
12 | Comparison | Option A → Option B → Verdict → Why | Tool/platform evaluation posts, buyer-stage content, advisory pieces.
13 | Prediction | Current state → Forces → Predicted → Prepare | Year-end/new-year content, conference talks, long-form thought pieces.
14 | Explainer | Concept → Analogy → Example → Application | Tutorial videos, onboarding content, explainer series, YouTube long-form.

Style Modes

Anti-Patterns (never generate)

Adding New Frameworks
• Users can manually add new frameworks at any time
• Define: name, structure (sections/steps), description, best-use-case
• New frameworks appear in the selection UI alongside defaults
• Framework library grows over time as team develops new approaches
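
One way to model a user-defined framework and the library it lives in. The fields mirror the "Define" bullet above; the class names, the `archived` flag, and the in-memory storage are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class Framework:
    name: str
    structure: list[str]  # ordered sections/steps
    description: str = ""
    best_for: str = ""
    archived: bool = False


class FrameworkLibrary:
    """In-memory library; a real one would persist to config or a database."""

    def __init__(self):
        self._items = {}

    def add(self, framework):
        if not framework.name or not framework.structure:
            raise ValueError("a framework needs a name and at least one step")
        self._items[framework.name] = framework

    def archive(self, name):
        self._items[name].archived = True

    def selectable(self):
        """Frameworks shown in the selection UI (archived ones are hidden)."""
        return [f for f in self._items.values() if not f.archived]
```

Defaults and user-added frameworks share the same shape, so the selector does not need to distinguish between them.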
Framework UI Behavior
Adding a Framework:
• Click "Add Framework" button in framework library
• Enter: name, description, structure (ordered sections/steps), best-for use cases
• Save → immediately available in framework selector

Selecting a Framework:
• Framework cards display name + structure preview + "Best For" hint
• "Best For" guides user toward the right output format
• Click to select → routes to Output Format selector

Editing/Removing:
• Edit framework structure, description, or best-for at any time
• Remove/archive frameworks no longer in use

Format Selection (Structured Output)

Production Types (User Selects) V1

User selects the production type. System generates all required structured ingredients for that format. Framework selection remains optional for any type.

# | Production Type | Structured Ingredients Produced
1 | Short-Form Video (talking head) | Framework (optional), script/talk track, beat map, shot list
2 | Short-Form Video (sizzle / no talking head) | Narrative, beat map, shot list, storyboard, headline/hook variants
3 | Long-Form Video | Storyboard, narrative/story arc, beat map, shot list, script, segment breakdown
4 | Podcast | Segments, guest prep, show notes, intro/outro, talking points
5 | Social Post | Platform sub-select (LinkedIn, IG, X, TikTok) → copy, headline/hook variants, visual direction

Structured Ingredients (System Generates)

These are the deliverables produced by the system based on production type selection. Users see what will be generated before confirming.

Ingredient | Description | Used By
Talk Track / Script | Scripted narrative for speaking (VO, talking head, conversation) | Short-form TH, Long-form, Podcast
Beat Map | Word-level timestamps + overlay timing | Short-form, Long-form
Shot List | Camera specs, source types, visual direction per beat | Short-form, Long-form
Storyboard | 5-8 scene narrative arc with visual direction | Sizzle, Long-form
Narrative / Story Arc | Overall story structure and flow | Sizzle, Long-form
Headline / Hook Variants | 5-10 options per topic for testing | Sizzle, Social
Social Copy | Platform-optimized text (character limits, formatting, CTAs) | Social
Podcast Segments | Segment structure, guest prep, show notes, intro/outro | Podcast
Adding New Formats
• Users can add new output formats at any time
• Define: name, description, deliverable bundle (what gets produced)
• New formats appear in the selector alongside defaults
• Format library grows as production needs evolve
Format Selection UI Behavior
Selecting Formats (multi-select):
• Format cards display name + deliverable bundle preview
• User sees what they'll receive before selecting
• Select one or multiple formats simultaneously
• Confirm selection → system generates all deliverables across selected bundles
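
Multi-select resolution might work as in this sketch: each selected production type contributes its ingredient bundle, and shared ingredients are generated once. The ingredient lists follow the Production Types table; the dictionary keys and the `ingredients_for` name are hypothetical, and deduplicating shared ingredients is an assumption.

```python
# Bundles per production type, taken from the Production Types table.
PRODUCTION_TYPES = {
    "short_form_talking_head": ["script/talk track", "beat map", "shot list"],
    "short_form_sizzle": ["narrative", "beat map", "shot list", "storyboard",
                          "headline/hook variants"],
    "long_form_video": ["storyboard", "narrative/story arc", "beat map",
                        "shot list", "script", "segment breakdown"],
    "podcast": ["segments", "guest prep", "show notes", "intro/outro",
                "talking points"],
    "social_post": ["copy", "headline/hook variants", "visual direction"],
}


def ingredients_for(selected):
    """Return the ordered, de-duplicated ingredient list for the selection."""
    if not selected:
        raise ValueError("at least one output format is required")
    result = []
    for fmt in selected:
        for ingredient in PRODUCTION_TYPES[fmt]:  # KeyError on unknown format
            if ingredient not in result:
                result.append(ingredient)
    return result
```

The same list can drive the preview step, so users see exactly what will be generated before they confirm.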

After Generation:
• Deliverables displayed for review (editable)
• Approve → routes to creative pipeline
• Edit → modify before routing
• Regenerate → re-run with same inputs

Adding/Editing:
• Click "Add Format" to create new output type
• Edit existing format descriptions or deliverable bundles
• Archive formats no longer in use

Architecture Flow

Pipeline 1: Automated
Signal Sources: RSS / API / Scrape
↓
Ingestion Layer: Cron + user-initiated
↓
Signal Scoring Engine: Multi-layer • multipliers • overrides
↓
Signal Queue: Human review • score slider • approve/purge

Pipeline 2: Manual
Human Input: Text / File / URL
↓
Bypass scoring: Signal Queue optional
↓
Route to Framework + Format Selection

↓ Converge at selection ↓

Framework Selection: (optional) How are we talking about it?
↓
Format Selection: (required, multi-select) What are we producing?
↓
Structured Output: Generated ingredient bundle
↓
Creative Pipeline: Video / Audio / Content Production