AI-powered topic intelligence for fast-moving public conversations

Turn hot topics into structured social evidence.

MindSpider is an open-source sentiment crawling system that first discovers emerging topics, then expands them into platform-level crawls for deeper discussion, reaction, and feedback analysis.

  • Open-source project site
  • Playwright-powered crawling
  • MySQL-first persistence

13
upstream discovery sources

Daily feeds across social, technical, and community surfaces.

7
deep-crawl platforms

Platform-specific passes for posts, comments, and engagement evidence.

2-stage
analysis pipeline

Broad topic extraction first, platform-level sentiment crawling second.

System Shape

Discovery to crawl, without the manual gap

README-backed

Discovery Sources

Daily signal intake

Weibo, Zhihu, Bilibili, Toutiao, GitHub, CoolApk, and adjacent feeds seed the topic graph before deeper crawling begins.

Agent Layer

AI topic extraction

Model-assisted summarization produces topic names, summaries, and keyword lists from noisy daily sources.

Crawl Queue

Keyword fan-out

The extracted topics become crawl tasks for each platform adapter, keeping downstream work tied to explicit evidence.

Platform Pass

Deep sentiment crawling

Xiaohongshu, Douyin, Kuaishou, Bilibili, Weibo, Tieba, and Zhihu are crawled with browser automation to capture comments, reactions, and discussion context.

Output

Tables + reports

Data lands in explicit tables like `daily_topics`, `topic_news_relation`, and `crawling_tasks`.

Discovery sources

The broad pass is designed to recognize momentum before you choose a crawl target.

Weibo, Zhihu, Bilibili, Toutiao, GitHub, CoolApk

Deep-crawl targets

The second pass goes deeper on the platforms where sentiment, discussion, and feedback actually live.

Xiaohongshu, Douyin, Kuaishou, Bilibili, Weibo, Tieba, Zhihu

Data outputs

What comes out is designed for operators, not just demos.

`daily_news`, `daily_topics`, `topic_news_relation`, `crawling_tasks`, and platform content tables

How It Works

A crawler pipeline shaped like an analyst workflow.

01

Discover rising topics

MindSpider pulls daily hot signals from news and community sources, then uses AI extraction to turn raw headlines into reusable topic clusters.

02

Fan out into platform crawls

Those topic clusters become structured keyword queues for deep crawls across Xiaohongshu, Douyin, Kuaishou, Bilibili, Weibo, Tieba, and Zhihu.
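As a rough illustration of the fan-out step, the sketch below expands each topic's keywords into one task per keyword and platform. The `CrawlTask` record, the `fan_out` helper, and the lowercase platform codes are hypothetical names chosen for this example; they are not taken from the MindSpider codebase.

```python
from dataclasses import dataclass
from itertools import product

# Platform names come from this page; the lowercase codes are hypothetical labels.
PLATFORMS = ["xiaohongshu", "douyin", "kuaishou", "bilibili", "weibo", "tieba", "zhihu"]


@dataclass(frozen=True)
class CrawlTask:
    """Hypothetical task record: one keyword to search on one platform."""
    topic: str
    keyword: str
    platform: str


def fan_out(topics: dict[str, list[str]], platforms: list[str] = PLATFORMS) -> list[CrawlTask]:
    """Expand each topic's keywords into one task per (keyword, platform) pair."""
    tasks = []
    for topic, keywords in topics.items():
        # dict.fromkeys drops duplicate keywords while preserving their order.
        unique = list(dict.fromkeys(k.strip() for k in keywords if k.strip()))
        for keyword, platform in product(unique, platforms):
            tasks.append(CrawlTask(topic, keyword, platform))
    return tasks


tasks = fan_out({"ev price cuts": ["ev price", "price war", "ev price"]})
print(len(tasks))  # 2 unique keywords x 7 platforms
```

Keeping the topic on every task is one simple way to keep each crawl traceable to the signal that triggered it.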

03

Persist evidence for analysis

Tasks, content, and relationships are written into MySQL-ready tables so you can review trajectories, compare platforms, and build downstream reports.

Architecture

Three lanes, one intent: make topic movement inspectable.

Daily signal intake

Broad Topic Extraction

The first lane watches public trend surfaces, normalizes source data, and asks the model layer to produce topics worth pursuing.

  • Daily news and hot-list collection
  • AI-generated topic summaries
  • Keyword lists written to durable storage
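To make the output of this lane concrete, here is a minimal sketch of turning a model response into topic records with a name, a summary, and a keyword list. The JSON shape, the `Topic` dataclass, and `parse_topics` are assumptions made for illustration; the actual prompt and response format used by MindSpider is not documented on this page.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical topic record with the three fields this page mentions."""
    name: str
    summary: str
    keywords: list[str] = field(default_factory=list)


def parse_topics(raw: str) -> list[Topic]:
    """Parse a JSON array of topic objects, skipping entries without a name."""
    topics = []
    for item in json.loads(raw):
        name = str(item.get("name", "")).strip()
        if not name:
            continue  # an unnamed topic cannot be tracked downstream
        keywords = [
            k.strip()
            for k in item.get("keywords", [])
            if isinstance(k, str) and k.strip()
        ]
        topics.append(Topic(name, str(item.get("summary", "")).strip(), keywords))
    return topics


# Assumed response shape, for illustration only.
raw = (
    '[{"name": "EV price cuts", "summary": "Automakers lower prices.", '
    '"keywords": ["ev price", " price war "]}, {"summary": "no name"}]'
)
topics = parse_topics(raw)
print(topics[0].keywords)
```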

Platform-specific evidence collection

Deep Sentiment Crawling

The second lane takes the extracted keywords and turns them into structured crawl tasks for each target platform.

  • Per-platform crawler adapters
  • Login-aware browser sessions
  • Comment, post, and interaction capture
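One way to picture per-platform adapters is a shared interface that an asyncio dispatcher calls with bounded concurrency. The sketch below uses a stub adapter that returns canned records instead of driving a browser; `PlatformAdapter`, `StubAdapter`, and `crawl_all` are hypothetical names, and the real adapters are assumed to wrap Playwright sessions rather than behave like this stub.

```python
import asyncio
from typing import Protocol


class PlatformAdapter(Protocol):
    """Hypothetical interface each platform crawler would implement."""
    name: str

    async def crawl(self, keyword: str) -> list[dict]: ...


class StubAdapter:
    """Stand-in for a browser-backed adapter; returns canned records."""

    def __init__(self, name: str) -> None:
        self.name = name

    async def crawl(self, keyword: str) -> list[dict]:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return [{"platform": self.name, "keyword": keyword,
                 "text": f"sample post about {keyword}"}]


async def crawl_all(adapters: list[PlatformAdapter], keyword: str, limit: int = 2) -> list[dict]:
    """Run every adapter for one keyword, at most `limit` at a time."""
    gate = asyncio.Semaphore(limit)

    async def run(adapter: PlatformAdapter) -> list[dict]:
        async with gate:
            return await adapter.crawl(keyword)

    # gather returns results in the same order as the adapters were passed.
    batches = await asyncio.gather(*(run(a) for a in adapters))
    return [record for batch in batches for record in batch]


adapters = [StubAdapter(n) for n in ("weibo", "zhihu", "tieba")]
records = asyncio.run(crawl_all(adapters, "ev price"))
print([r["platform"] for r in records])
```

The semaphore caps how many sessions run at once, which matters when each adapter holds a logged-in browser context.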

Tables, tasks, and replayability

Structured Output Layer

Instead of dumping text into blobs, MindSpider stores topic relations, crawl progress, and platform outputs in explicit database structures.

  • MySQL-oriented persistence
  • Task progress and status tracking
  • Reusable datasets for reports and follow-on agents
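Task status tracking can be sketched with a small table and a grouped query. The example below uses Python's built-in `sqlite3` with an in-memory database purely so it runs without a server; MindSpider itself targets MySQL. Only the table name `crawling_tasks` comes from this page, while the columns and status values are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the column layout is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE crawling_tasks (
        id INTEGER PRIMARY KEY,
        keyword TEXT NOT NULL,
        platform TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'pending'
    )
    """
)
conn.executemany(
    "INSERT INTO crawling_tasks (keyword, platform) VALUES (?, ?)",
    [("ev price", "weibo"), ("ev price", "zhihu"), ("price war", "weibo")],
)


def mark(task_id: int, status: str) -> None:
    """Record the outcome of one crawl task."""
    conn.execute("UPDATE crawling_tasks SET status = ? WHERE id = ?", (status, task_id))


mark(1, "done")
mark(2, "failed")

# Summarize progress as {status: count}.
progress = dict(conn.execute("SELECT status, COUNT(*) FROM crawling_tasks GROUP BY status"))
print(progress)
```

Because status lives in a table rather than in process memory, an interrupted run can be resumed by selecting the rows that are still pending or failed.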

Open-Source Status

MindSpider is live as a project identity, but its latest code lives inside BettaFish.

The original MindSpider repository documents the pipeline clearly and is still useful for understanding the architecture. The maintainers now position it as a module inside BettaFish for newer work.

  • Use this site as the product-facing front door for the project story.
  • Use the GitHub repository and README for current installation details.
  • Treat the self-serve onboarding flow here as intentionally in-progress.

Upstream repositories

Keep both links visible so operators can read the original README and follow the newer monorepo path without guessing.

  • Original MindSpider repository
  • BettaFish upstream module host

Feature Surface

Product language for a system that still respects the code.

AI topic extraction

Convert noisy daily news and hot lists into themes, summaries, and keyword sets that agents can keep working with.

Playwright-first crawling

Browser automation is built into the deep crawl layer, which makes dynamic pages and login-gated flows practical to crawl.

Platform-aware storage

Outputs are mapped into structured tables for notes, videos, threads, tasks, and topic relationships instead of loose export files.

Keyword queue control

The system manages topic-to-keyword fan-out so follow-up crawls stay tied to the signals that triggered them.

Open-source inspectability

Everything is visible in code: pipeline stages, database schema, platform adapters, and the operational assumptions around them.

Built for analyst handoff

The output is meant to be reviewed, queried, and reused by humans or later agents instead of dying as a one-shot scrape.

Frequently Asked Questions

Is MindSpider a hosted SaaS product today?

No. This site presents MindSpider as an open-source project and product identity. The current Start Free path is a placeholder while the guided onboarding flow is rebuilt.

What does the two-stage pipeline actually mean?

Stage one identifies promising topics from daily feeds. Stage two takes the resulting keywords and runs deeper platform crawls to gather sentiment-bearing evidence.

Which technologies shape the implementation?

The README centers Python, Playwright, MySQL, asyncio, and a DeepSeek-compatible analysis layer for topic extraction and downstream interpretation.

Can I self-host it?

Yes. The project is presented as an inspectable open-source system, so the primary path today is repository-driven setup rather than a closed hosted dashboard.

Why mention BettaFish here?

Because the upstream README now states that the latest MindSpider code is maintained as a submodule inside BettaFish. Linking both avoids sending users to stale expectations.

Start Path

Start free now, then decide how deep you want the project story to go.

Today that means a guided placeholder path, the public README, and the upstream repositories. The site flow is being rebuilt so those three pieces eventually feel like one coherent onboarding track.