WeChat Article Fetcher

290 installs · 244 stars

Summary

Pulls WeChat Official Account articles from mp.weixin.qq.com links and saves them as HTML, Markdown, or JSON metadata. Handles single articles or batch downloads, with a configurable delay between requests (3 seconds by default) to avoid triggering WeChat's anti-scraping measures. Reads image URLs from the data-src attribute because WeChat lazy-loads images, downloads them with a Referer header, organizes output by account and date, and can return just the metadata if you don't need files. Short /s/ URLs are more reliable than long URLs carrying the __biz parameter, which sometimes trigger CAPTCHAs. Built on BeautifulSoup and html2text, with a Python API if you want to integrate it into your own scripts instead of using the CLI.

Install to Claude Code

npx -y skills add wwwzhouhui/skills_collection --skill wechat-article-fetcher --agent claude-code

Installs into .claude/skills of the current project.

Files

SKILL.md

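WeChat Official Account Article Fetcher

Fetches, parses, and saves WeChat Official Account articles. Supports single and batch downloads, metadata extraction, image downloads, and Markdown conversion.

Quick Start

Fetch a single article:

python scripts/fetch_wechat_article.py "https://mp.weixin.qq.com/s/xxxxx"

Fetch multiple articles (space-separated):

python scripts/fetch_wechat_article.py "url1" "url2" "url3" --output-dir ./output

Fetch multiple articles (comma-separated):

python scripts/fetch_wechat_article.py "url1,url2,url3" --output-dir ./output

Output metadata only (no files saved):

python scripts/fetch_wechat_article.py "https://mp.weixin.qq.com/s/xxxxx" --json

Installing Dependencies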

pip install beautifulsoup4 html2text requests

Features

1. Fetch an article and save it locally

python scripts/fetch_wechat_article.py "<url>" --output-dir ./output

Output directory structure:

output/<account name>/<date>_<title>/
├── index.html    # formatted standalone HTML file
├── article.md    # Markdown version
├── meta.json     # article metadata
└── images/       # downloaded images
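
The layout above can be reproduced with a small path helper. This is only a sketch: the names `safe_name` and `article_dir`, the sanitizing rule, and the `YYYY-MM-DD` date format are assumptions made for illustration, not the script's actual implementation.

```python
import re
from pathlib import Path

# Characters treated as unsafe in file names (an assumed rule, not the script's).
_UNSAFE = re.compile(r'[\\/:*?"<>|\s]+')


def safe_name(text: str, max_len: int = 80) -> str:
    """Replace each run of unsafe characters with '_' and trim the result."""
    cleaned = _UNSAFE.sub("_", text).strip("_")
    return cleaned[:max_len] or "untitled"


def article_dir(output_dir: str, account: str, date: str, title: str) -> Path:
    """Build output/<account name>/<date>_<title>/ (hypothetical helper)."""
    return Path(output_dir) / safe_name(account) / f"{date}_{safe_name(title)}"


if __name__ == "__main__":
    print(article_dir("./output", "Example Account", "2024-01-01", "Hello: World?"))
```

Sanitizing the account name and title before joining them keeps characters such as `/` or `:` from creating unintended subdirectories or invalid file names.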

2. Extract metadata only

python scripts/fetch_wechat_article.py "<url>" --json

The returned JSON contains: title, author, account_nickname (Official Account name), description (summary), create_time (publication time), content_text (body text), content_markdown (Markdown content), cover_image (cover image), and source_url (original link).
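
If `--json` prints a single JSON object with the fields listed above (the source does not say exactly how the output is delivered, so this is an assumption), a caller could check it along these lines. `EXPECTED_FIELDS` and `parse_metadata` are hypothetical names.

```python
import json

# Field names as listed in the documentation above.
EXPECTED_FIELDS = (
    "title", "author", "account_nickname", "description", "create_time",
    "content_text", "content_markdown", "cover_image", "source_url",
)


def parse_metadata(raw: str) -> dict:
    """Parse JSON text and raise if any documented field is missing."""
    meta = json.loads(raw)
    if not isinstance(meta, dict):
        raise ValueError("expected a JSON object")
    missing = [name for name in EXPECTED_FIELDS if name not in meta]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return meta


if __name__ == "__main__":
    sample = json.dumps({name: "" for name in EXPECTED_FIELDS})
    print(sorted(parse_metadata(sample)))
```

Failing loudly on a missing field makes it easier to notice when a page was blocked or only partially parsed.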

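3. Batch-download multiple articles

Multiple links, space-separated:

python scripts/fetch_wechat_article.py "url1" "url2" "url3" --output-dir ./output

Multiple links, comma-separated:

python scripts/fetch_wechat_article.py "url1,url2,url3" --output-dir ./output

Custom download interval (default 3 seconds, to avoid triggering anti-scraping measures):

python scripts/fetch_wechat_article.py "url1" "url2" --interval 5

Articles from the same Official Account are automatically grouped under the same directory.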
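
The batch behaviour described above (accepting space- or comma-separated links and pausing between downloads) might look roughly like the following. This is a sketch, not the script's code: `split_urls` and `run_batch` are hypothetical names, and the `fetch` and `sleep` callables are injected so the loop can be tried without touching the network.

```python
import time
from typing import Callable, Iterable, List


def split_urls(args: Iterable[str]) -> List[str]:
    """Flatten space- and comma-separated arguments into one ordered list."""
    urls = []
    for arg in args:
        for url in arg.split(","):
            url = url.strip()
            if url:
                urls.append(url)
    return urls


def run_batch(urls: Iterable[str], fetch: Callable[[str], object],
              interval: float = 3.0,
              sleep: Callable[[float], None] = time.sleep) -> dict:
    """Fetch each URL in turn, sleeping `interval` seconds between requests."""
    stats = {"success": 0, "fail": 0}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(interval)  # pause before every request except the first
        try:
            fetch(url)
            stats["success"] += 1
        except Exception:  # one failed article should not stop the batch
            stats["fail"] += 1
    return stats
```

Catching errors per article lets the rest of the batch continue, and the `success` and `fail` counters mirror the keys used in the Python library section.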

4. Skip image downloads

python scripts/fetch_wechat_article.py "<url>" --no-images

5. Use as a Python library

from scripts.fetch_wechat_article import fetch_article, batch_fetch

# Fetch a single article and save it
result = fetch_article("https://mp.weixin.qq.com/s/xxxxx", output_dir="./output")
print(result['title'], result['path'])

# Fetch only the metadata for a single article
meta = fetch_article("https://mp.weixin.qq.com/s/xxxxx", json_only=True)
print(meta['title'])
print(meta['content_text'][:200])

# Batch fetch
urls = ["https://mp.weixin.qq.com/s/aaa", "https://mp.weixin.qq.com/s/bbb"]
stats = batch_fetch(urls, output_dir="./output", interval=3.0)
print(f"{stats['success']} succeeded, {stats['fail']} failed")

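Main function parameters:

url: article link (short and long links are both supported)
output_dir: directory to save into (default: ./wechat_articles)
download_img: whether to download images (default: True)
to_markdown: whether to convert to Markdown (default: True)
json_only: return only the metadata dictionary without saving files

Additional batch_fetch parameters:

urls: list of article links
interval: delay in seconds between articles (default: 3.0)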

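Notes

Prefer short links (/s/xxxxx); long links carrying the __biz parameter may trigger a CAPTCHA.
Batch downloads wait 3 seconds between articles by default; adjust this with --interval to avoid triggering WeChat's anti-scraping measures.
The script automatically sends a WeChat mobile User-Agent to get past access restrictions.
WeChat images carry their URL in the data-src attribute (not src) because they are lazy-loaded.
Downloading images requires the request header Referer: https://mp.weixin.qq.com/.
For details on the HTML structure, see references/wechat_html_structure.md.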
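
The two image notes above can be illustrated with a short sketch. It uses the standard-library `HTMLParser` rather than BeautifulSoup so that it runs without extra installs. The class and function names are hypothetical, the fallback to `src` is an assumption, and the real script may work differently.

```python
from html.parser import HTMLParser
from urllib.request import Request

# Referer value taken from the note above.
WECHAT_REFERER = "https://mp.weixin.qq.com/"


class ImageCollector(HTMLParser):
    """Collect image URLs, preferring the lazy-loading data-src attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # Assumption: fall back to src when data-src is absent.
        url = attributes.get("data-src") or attributes.get("src")
        if url:
            self.urls.append(url)


def image_urls(html: str) -> list:
    """Return image URLs in document order."""
    collector = ImageCollector()
    collector.feed(html)
    return collector.urls


def image_request(url: str) -> Request:
    """Build a request carrying the Referer header; nothing is sent here."""
    return Request(url, headers={"Referer": WECHAT_REFERER})
```

Reading `data-src` first matters because, on a lazy-loaded page, `src` may hold only a placeholder until the image scrolls into view.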

First Seen: Jun 3, 2026
