Files
community-crawler/crawler/package.json
Claude ae85dcbd87 Add Puppeteer support to bypass bot detection
Major changes:
- Install puppeteer, puppeteer-extra, puppeteer-extra-plugin-stealth
- Create PuppeteerFetcher class with Stealth plugin
- Update all crawlers to use Puppeteer instead of Axios
- Add browser lifecycle management (init/close)
- Update test.ts and index.ts with browser cleanup

Features:
- Real Chrome browser execution (bypasses TLS fingerprinting)
- Stealth plugin to avoid bot detection
- Headless mode for background operation
- Proper error handling and browser cleanup

Limitations:
- Requires Chrome/Chromium installation
- Higher resource usage (~200MB memory)
- Slower than Axios (browser startup time)
- Cannot test in current environment (Chrome install blocked)

Next steps:
- Test in local environment with Chrome installed
- Adjust HTML selectors based on actual page structure
- Monitor for Cloudflare blocks
2025-11-15 17:39:43 +00:00

35 lines
744 B
JSON

{
"name": "community-crawler",
"version": "1.0.0",
"description": "Korean community crawler",
"main": "dist/index.js",
"type": "module",
"scripts": {
"dev": "tsx watch src/index.ts",
"build": "tsc",
"start": "node dist/index.js",
"test": "tsx src/test.ts"
},
"keywords": [
"crawler",
"community",
"korea"
],
"author": "",
"license": "MIT",
"dependencies": {
"axios": "^1.7.9",
"cheerio": "^1.0.0",
"node-cron": "^3.0.3",
"puppeteer": "^24.30.0",
"puppeteer-extra": "^3.3.6",
"puppeteer-extra-plugin-stealth": "^2.11.2"
},
"devDependencies": {
"@types/node": "^22.10.2",
"@types/node-cron": "^3.0.11",
"tsx": "^4.19.2",
"typescript": "^5.7.2"
}
}