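```javascript
// Download every article of a WeChat Official Account and save each one as a
// Hexo post (Markdown with YAML front matter) plus its cover image.
const axios = require('axios')
const fs = require('fs')
const cheerio = require('cheerio')
const moment = require('moment')
const path = require('path')
const { pipeline } = require('stream/promises') // Node.js 15+

// Fill these in from a request captured while opening the account's history
// page in the WeChat client; `key` expires after a short time.
const __biz = ""
const uin = ""
const key = ""

// All requests identify themselves as the WeChat desktop client.
const ins = axios.create({
  headers: {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) MicroMessenger/6.8.0(0x16080000) MacWechat/3.2(0x13020013) Chrome/39.0.2171.95 Safari/537.36 NetType/WIFI WindowsWechat"
  }
})

// Fetch page `index` (10 messages per page) of the account's message history.
async function craw(index) {
  const url = "https://mp.weixin.qq.com/mp/profile_ext"
  const params = {
    __biz,
    uin,
    key,
    action: 'getmsg',
    f: 'json',
    offset: index * 10,
    count: '10',
    is_ok: '1',
    scene: '124',
    wxtoken: '',
    x5: '0',
  }
  const { data } = await ins.get(url, { params })
  if (!data.general_msg_list) {
    // Usually an expired key or wrong parameters.
    throw new Error(`Unexpected response: ${JSON.stringify(data)}`)
  }
  // general_msg_list arrives as a JSON-encoded string, so decode it.
  return typeof data.general_msg_list === 'string'
    ? JSON.parse(data.general_msg_list)
    : data.general_msg_list
}

// Save one article as <year>/<title>.md and its cover as images/<year>/<timestamp>.png.
async function writeArticleToFile({ app_msg_ext_info, comm_msg_info: { datetime } }) {
  // Text-only or image-only messages carry no article info; skip them.
  if (!app_msg_ext_info) return
  const { title, author } = app_msg_ext_info
  // URLs in the feed are HTML-escaped (`&amp;`).
  const content_url = app_msg_ext_info.content_url.replace(/&amp;/g, '&')
  const cover = app_msg_ext_info.cover.replace(/&amp;/g, '&')
  const imgFileName = datetime + ".png"
  const date = moment(datetime * 1000) // datetime is a Unix timestamp in seconds
  const year = String(date.year())

  // Create both output directories, including missing parents.
  const postDir = path.resolve(__dirname, year)
  const imgDir = path.resolve(__dirname, 'images', year)
  fs.mkdirSync(postDir, { recursive: true })
  fs.mkdirSync(imgDir, { recursive: true })

  const created_at = date.format('YYYY-MM-DD HH:mm:ss')
  const content = await crawArticle(content_url)

  // Download the cover image and wait until it is completely written.
  const res = await ins({ url: cover, method: "GET", responseType: "stream" })
  await pipeline(res.data, fs.createWriteStream(path.join(imgDir, imgFileName)))

  // Categories mean "WeChat articles" and "Imported", the tag means "Essays",
  // and 原链接 means "original link". JSON.stringify quotes title and author so
  // values containing ": " stay valid YAML. {% raw %} stops Hexo from treating
  // {{ }} or {% %} inside the article HTML as template tags.
  const result = `---
title: ${JSON.stringify(title)}
date: ${created_at}
layout: post
author: ${JSON.stringify(author)}
img: /source/images/${year}/${imgFileName}
categories:
  - 微信文章
  - 导入
tags: 随笔
---

[原链接](${content_url})

{% raw %}
${content}
{% endraw %}
`
  // Replace characters that are not allowed in file names.
  const fileName = title.replace(/[\\/:*?"<>|]/g, '_') + '.md'
  fs.writeFileSync(path.join(postDir, fileName), result, 'utf8')
}

// Fetch an article page and return the HTML of its body.
async function crawArticle(url) {
  const res = await ins.get(url)
  return readArticle(res.data)
}

// The article body is the element with id `js_content`.
function readArticle(html) {
  const $ = cheerio.load(html)
  return $('#js_content').html() || ''
}

// Page through the history until an empty page comes back.
async function main() {
  let page = 0
  while (true) {
    const res = await craw(page++)
    if (!res.list || res.list.length === 0) break
    for (const obj of res.list) {
      try {
        await writeArticleToFile(obj)
      } catch (e) {
        // Log the failure and the message id, then continue with the next one.
        console.log(e)
        console.log(obj.comm_msg_info.id)
      }
    }
  }
}

main().catch(console.error)
```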
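Each page returned by the `getmsg` request nests an article's metadata under `app_msg_ext_info` and its timestamp under `comm_msg_info.datetime`. The following is a minimal sketch of flattening one decoded page into plain records so the data can be inspected before anything is written to disk. It assumes only the field names the script already reads, and assumes the URLs may be HTML-escaped. The function name `toArticles` and the sample payload are hypothetical.

```javascript
// Hypothetical helper: flatten one decoded `general_msg_list` page into
// article records, using the same field names the crawler reads.
function toArticles(page) {
  const articles = []
  for (const msg of page.list || []) {
    const info = msg.app_msg_ext_info
    // Messages without article info (e.g. plain text) are skipped.
    if (!info) continue
    articles.push({
      title: info.title,
      author: info.author,
      // Undo HTML escaping of `&` in URLs (assumed to occur in the feed).
      contentUrl: String(info.content_url || '').replace(/&amp;/g, '&'),
      cover: String(info.cover || '').replace(/&amp;/g, '&'),
      // comm_msg_info.datetime is a Unix timestamp in seconds.
      date: new Date(msg.comm_msg_info.datetime * 1000),
    })
  }
  return articles
}

// Made-up sample shaped like a decoded page: one article, one text message.
const samplePage = JSON.parse(
  '{"list":[' +
    '{"comm_msg_info":{"datetime":0},"app_msg_ext_info":{"title":"Hello","author":"me",' +
    '"content_url":"https://example.com/s?a=1&amp;b=2","cover":""}},' +
    '{"comm_msg_info":{"datetime":60}}' +
  ']}'
)
console.log(toArticles(samplePage).length) // prints 1
```

Keeping this step pure makes it easy to test against a saved response. Saving only happens once the records look right.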
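The post written by `writeArticleToFile` consists of a YAML front-matter header followed by the article HTML wrapped in `{% raw %}`. The sketch below isolates that step as two pure functions so it can be checked without network access. `buildPost` and `safeFileName` are hypothetical names. Quoting values with `JSON.stringify` relies on JSON strings also being valid YAML double-quoted scalars, and the category and tag lines are left out for brevity.

```javascript
// Hypothetical: turn a title into a file name by replacing characters that
// common file systems reject; fall back to "untitled" if nothing is left.
function safeFileName(title) {
  return title.replace(/[\\/:*?"<>|]/g, '_').trim() || 'untitled'
}

// Hypothetical: render a Hexo post. Title and author are quoted with
// JSON.stringify so values containing ": " or quotes remain valid YAML.
// 原链接 means "original link", matching the crawler's output.
function buildPost({ title, author, date, img, url, html }) {
  return [
    '---',
    `title: ${JSON.stringify(title)}`,
    `date: ${date}`,
    'layout: post',
    `author: ${JSON.stringify(author)}`,
    `img: ${img}`,
    '---',
    '',
    `[原链接](${url})`,
    '',
    '{% raw %}',
    html,
    '{% endraw %}',
    '',
  ].join('\n')
}

const post = buildPost({
  title: 'Part 1: "Intro"',
  author: 'someone',
  date: '2021-01-01 00:00:00',
  img: '/source/images/2021/1609459200.png',
  url: 'https://example.com/article',
  html: '<p>Hello</p>',
})
console.log(safeFileName('Part 1: "Intro"') + '.md') // prints Part 1_ _Intro_.md
```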
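`main` walks the history by requesting offsets 0, 10, 20, and so on until a page comes back empty. One way to keep that paging logic separate from saving is an async generator. In this sketch, `pages` and the injected `fetchPage` function are hypothetical. The optional pause between requests is an assumption about avoiding rate limits; the script itself does not pause.

```javascript
// Hypothetical: yield decoded pages until one comes back empty.
// `fetchPage(index)` is expected to behave like `craw(index)`.
async function* pages(fetchPage, delayMs = 0) {
  for (let index = 0; ; index++) {
    const page = await fetchPage(index)
    if (!page || !page.list || page.list.length === 0) return
    yield page
    // Optional pause before requesting the next page.
    if (delayMs > 0) await new Promise((resolve) => setTimeout(resolve, delayMs))
  }
}

// Fake fetcher standing in for the WeChat endpoint: two pages, then an empty one.
const fakePages = [{ list: [1, 2] }, { list: [3] }, { list: [] }]
const fakeFetch = async (index) => fakePages[index]

async function collect() {
  const seen = []
  for await (const page of pages(fakeFetch)) seen.push(...page.list)
  return seen
}

collect().then((seen) => console.log(seen)) // prints [ 1, 2, 3 ]
```

With the real crawler, `for await (const page of pages(craw, 1000))` could stand in for the `while (true)` loop in `main`, with the per-message `try`/`catch` unchanged inside it.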