Pulls WeChat public account articles from mp.weixin.qq.com links and saves them as HTML, Markdown, or JSON metadata. Handles single articles or batch downloads, with a configurable delay between requests to avoid rate limits. Downloads images by reading the lazy-loaded data-src attribute and sending the required Referer header, organizes output by account and date, and can return just the metadata if you don't need files. The short URL format is more reliable than long URLs carrying a __biz parameter, which sometimes trigger captchas. Built on BeautifulSoup and html2text, with a Python API if you want to integrate it into your own scripts instead of using the CLI.
npx -y skills add wwwzhouhui/skills_collection --skill wechat-article-fetcher --agent claude-code
Installs into .claude/skills of the current project.
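Fetches, parses, and saves WeChat public account articles. Supports single and batch downloads, metadata extraction, image downloading, and Markdown conversion.
Fetch a single article:
python scripts/fetch_wechat_article.py "https://mp.weixin.qq.com/s/xxxxx"
Fetch several articles in one run (space-separated):
python scripts/fetch_wechat_article.py "url1" "url2" "url3" --output-dir ./output
Fetch several articles in one run (comma-separated):
python scripts/fetch_wechat_article.py "url1,url2,url3" --output-dir ./output
Output metadata only (no files are saved):
python scripts/fetch_wechat_article.py "https://mp.weixin.qq.com/s/xxxxx" --json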
pip install beautifulsoup4 html2text requests
python scripts/fetch_wechat_article.py "<url>" --output-dir ./output
Output directory structure:
output/<account name>/<date>_<title>/
├── index.html # formatted standalone HTML file
├── article.md # Markdown version
├── meta.json # article metadata
└── images/ # downloaded images
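The layout above can be sketched as a small path builder. This is an illustration only, not the script's actual code: the helper names `safe_name` and `article_dir`, the YYYY-MM-DD date format, and the set of characters replaced are all assumptions.

```python
import re
from datetime import datetime
from pathlib import Path


def safe_name(text: str, max_len: int = 80) -> str:
    """Replace characters that are unsafe in file names and trim the result."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", text).strip("_")
    return cleaned[:max_len] or "untitled"


def article_dir(root: str, account: str, published: datetime, title: str) -> Path:
    """Build <root>/<account>/<date>_<title>/ without touching the disk."""
    # Assumed date format; the source only says <date>.
    folder = f"{published:%Y-%m-%d}_{safe_name(title)}"
    return Path(root) / safe_name(account) / folder


path = article_dir("output", "Example Account", datetime(2024, 5, 1), "Hello: World?")
print(path.as_posix())  # output/Example_Account/2024-05-01_Hello_World
```

Sanitizing both the account name and the title keeps one article per folder even when titles contain punctuation that file systems reject.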
python scripts/fetch_wechat_article.py "<url>" --json
The returned JSON contains: title, author, account_nickname (public account name), description (summary), create_time (publication time), content_text (body text), content_markdown (Markdown content), cover_image (cover image), source_url (original link).
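If you consume the --json output from another program, one option is to load it into a typed record and fail early when a documented field is missing. The sketch below assumes the output is a single JSON object with exactly the field names listed above; `ArticleMeta` and `parse_meta` are hypothetical names, and the type of create_time is left open because the source does not specify its format.

```python
import json
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class ArticleMeta:
    title: str
    author: str
    account_nickname: str
    description: str
    create_time: Any  # format not specified in the source; kept as-is
    content_text: str
    content_markdown: str
    cover_image: str
    source_url: str


def parse_meta(raw: str) -> ArticleMeta:
    """Parse JSON text, raising KeyError if a documented field is missing."""
    data = json.loads(raw)
    names = [f.name for f in fields(ArticleMeta)]
    missing = [n for n in names if n not in data]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    # Ignore any extra keys so new fields do not break the parser.
    return ArticleMeta(**{n: data[n] for n in names})
```

A caller might pass the captured standard output of the CLI to `parse_meta` and then read `meta.title` or `meta.content_markdown` directly.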
Multiple links separated by spaces:
python scripts/fetch_wechat_article.py "url1" "url2" "url3" --output-dir ./output
Multiple links separated by commas:
python scripts/fetch_wechat_article.py "url1,url2,url3" --output-dir ./output
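Both input styles can be reduced to one flat list before downloading. The following is a minimal sketch of that idea, not the script's own argument handling; `normalize_urls` is a hypothetical helper, and it assumes article links never contain a literal comma.

```python
from typing import Iterable, List


def normalize_urls(args: Iterable[str]) -> List[str]:
    """Flatten space- and comma-separated URL arguments, dropping blanks."""
    urls: List[str] = []
    for arg in args:
        # Each command-line argument may itself hold several comma-separated links.
        for part in arg.split(","):
            url = part.strip()
            if url:
                urls.append(url)
    return urls


print(normalize_urls(["url1,url2", "url3"]))  # ['url1', 'url2', 'url3']
```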
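Custom download interval (default 3 seconds, to avoid triggering anti-scraping measures):
python scripts/fetch_wechat_article.py "url1" "url2" --interval 5
Articles from the same public account are automatically grouped into the same directory.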
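The interval behaviour can be pictured as a loop that pauses between articles and keeps going when one of them fails. This is a sketch under assumptions rather than the script's implementation: `batch_with_interval` is a hypothetical name, and the `fetch` and `sleep` callables are injected so the loop can be exercised without network access. The `success` and `fail` keys mirror the statistics shown in the Python API example.

```python
import time
from typing import Callable, Dict, Iterable


def batch_with_interval(
    urls: Iterable[str],
    fetch: Callable[[str], dict],
    interval: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, int]:
    """Call fetch(url) for each URL, pausing `interval` seconds between calls."""
    stats = {"success": 0, "fail": 0}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(interval)  # wait between articles, not before the first one
        try:
            fetch(url)
            stats["success"] += 1
        except Exception as exc:  # one bad article should not stop the batch
            print(f"failed: {url} ({exc})")
            stats["fail"] += 1
    return stats
```

Sleeping only between requests means a batch of N articles waits N - 1 times, so a single-article run returns without delay.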
python scripts/fetch_wechat_article.py "<url>" --no-images
from scripts.fetch_wechat_article import fetch_article, batch_fetch
# Fetch and save a single article
result = fetch_article("https://mp.weixin.qq.com/s/xxxxx", output_dir="./output")
print(result['title'], result['path'])
# Fetch only the metadata of a single article
meta = fetch_article("https://mp.weixin.qq.com/s/xxxxx", json_only=True)
print(meta['title'])
print(meta['content_text'][:200])
# Batch fetch
urls = ["https://mp.weixin.qq.com/s/aaa", "https://mp.weixin.qq.com/s/bbb"]
stats = batch_fetch(urls, output_dir="./output", interval=3.0)
print(f"Succeeded: {stats['success']}, failed: {stats['fail']}")
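Main function parameters:
url: article link (both short and long links are supported)
output_dir: directory to save into (default: ./wechat_articles)
download_img: whether to download images (default: True)
to_markdown: whether to convert to Markdown (default: True)
json_only: return only the metadata dictionary without saving files
Additional batch_fetch parameters:
urls: list of article links
interval: delay in seconds between articles (default: 3.0)

Prefer the short link format (/s/xxxxx); long links with a __biz parameter may trigger a captcha.
The delay between batch downloads can be adjusted with --interval to avoid triggering WeChat's anti-scraping measures.
WeChat images carry their URL in the data-src attribute (not src) because they are lazy-loaded.
Image downloads need the Referer: https://mp.weixin.qq.com/ request header.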
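The two image notes can be illustrated with a short, dependency-free sketch. It deliberately uses the standard library's HTMLParser and urllib instead of BeautifulSoup and requests, so it is not how the script itself does it. `ImageCollector`, `image_urls`, and `image_request` are hypothetical names, and falling back to src when data-src is absent is an assumption.

```python
from html.parser import HTMLParser
from typing import List
from urllib.request import Request

REFERER = "https://mp.weixin.qq.com/"


class ImageCollector(HTMLParser):
    """Collect image URLs, preferring the lazy-load data-src over src."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        # Assumed fallback: use src only when data-src is missing or empty.
        url = attr.get("data-src") or attr.get("src")
        if url:
            self.urls.append(url)


def image_urls(html: str) -> List[str]:
    """Return the image URLs found in an HTML fragment, in document order."""
    parser = ImageCollector()
    parser.feed(html)
    return parser.urls


def image_request(url: str) -> Request:
    """Build (but do not send) a request carrying the WeChat Referer header."""
    return Request(url, headers={"Referer": REFERER})
```

Passing the result of `image_request` to `urllib.request.urlopen` would perform the download; the sketch stops short of that so it can run offline.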