Add Puppeteer support to bypass bot detection

Major changes:
- Install puppeteer, puppeteer-extra, puppeteer-extra-plugin-stealth
- Create PuppeteerFetcher class with Stealth plugin
- Update all crawlers to use Puppeteer instead of Axios
- Add browser lifecycle management (init/close)
- Update test.ts and index.ts with browser cleanup
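
The init/close lifecycle listed above could look roughly like the sketch below. It is not the code from this commit: the `PageLike`, `BrowserLike` and `Launcher` types, the `fetchHtml` method name and the 30-second timeout are assumptions made for illustration. The launcher is injected so the class can be exercised without a Chrome install.

```typescript
// Minimal structural types (hypothetical) so the sketch has no Puppeteer dependency.
// Puppeteer's Browser and Page objects should satisfy these shapes.
interface PageLike {
  goto(url: string, options?: { waitUntil?: "domcontentloaded"; timeout?: number }): Promise<unknown>;
  content(): Promise<string>;
  close(): Promise<void>;
}

interface BrowserLike {
  newPage(): Promise<PageLike>;
  close(): Promise<void>;
}

type Launcher = () => Promise<BrowserLike>;

class PuppeteerFetcher {
  private browser: BrowserLike | null = null;
  private readonly launch: Launcher;

  constructor(launch: Launcher) {
    this.launch = launch;
  }

  // Start the browser once; repeated calls reuse the running instance.
  async init(): Promise<void> {
    if (this.browser === null) {
      this.browser = await this.launch();
    }
  }

  // Load a URL in a fresh page and return the rendered HTML.
  async fetchHtml(url: string): Promise<string> {
    if (this.browser === null) {
      throw new Error("init() must be called before fetchHtml()");
    }
    const page = await this.browser.newPage();
    try {
      await page.goto(url, { waitUntil: "domcontentloaded", timeout: 30_000 });
      return await page.content();
    } finally {
      // Close the page even when navigation fails, so pages do not pile up.
      await page.close();
    }
  }

  // Shut the browser down; safe to call more than once.
  async close(): Promise<void> {
    const browser = this.browser;
    this.browser = null;
    if (browser !== null) {
      await browser.close();
    }
  }
}
```

With the packages added in this commit, the launcher would presumably be wired up as `puppeteer.use(StealthPlugin())` followed by `new PuppeteerFetcher(() => puppeteer.launch({ headless: true }))`, where `puppeteer` is the default export of `puppeteer-extra` and `StealthPlugin` the default export of `puppeteer-extra-plugin-stealth`.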

Features:
- Runs a real Chrome browser, so requests carry a genuine browser TLS fingerprint
- Stealth plugin masks common headless-browser signals to reduce bot detection
- Headless mode for background operation
- Proper error handling and browser cleanup
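
One way to guarantee the browser cleanup mentioned above is a try/finally wrapper around the crawl. This is a sketch under assumptions, not the code in index.ts or test.ts: `ManagedFetcher` and `withFetcher` are hypothetical names, and `close()` is assumed to be safe to call even if `init()` failed partway.

```typescript
// Hypothetical shape of anything with an init/close lifecycle.
interface ManagedFetcher {
  init(): Promise<void>;
  close(): Promise<void>;
}

// Run a task with an initialized fetcher and always close it afterwards,
// whether the task resolves or throws.
async function withFetcher<F extends ManagedFetcher, T>(
  fetcher: F,
  task: (fetcher: F) => Promise<T>,
): Promise<T> {
  try {
    await fetcher.init();
    return await task(fetcher);
  } finally {
    // Runs on success and on error, so no browser process is left behind.
    await fetcher.close();
  }
}
```

The error from the task still propagates to the caller after cleanup, so the entry point can log it and set a non-zero exit code.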

Limitations:
- Requires Chrome/Chromium installation
- Higher resource usage (~200MB memory)
- Slower than Axios (browser startup time)
- Untested so far: Chrome cannot be installed in the current environment

Next steps:
- Test in local environment with Chrome installed
- Adjust HTML selectors based on actual page structure
- Monitor for Cloudflare blocks
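
Monitoring for Cloudflare blocks could start with a simple heuristic like the sketch below. The `ResponseSnapshot` type and `looksLikeCloudflareChallenge` function are hypothetical, and the signals checked (a `cf-mitigated: challenge` header, or a 403/503 status combined with a "Just a moment..." or "Attention Required!" title) are assumptions about what a blocked response tends to look like, not a complete detector.

```typescript
// Hypothetical snapshot of a response, e.g. built from a page load.
interface ResponseSnapshot {
  status: number;
  title: string;
  headers: Record<string, string>; // header names assumed to be lower-case
}

// Heuristic only: flags responses that resemble a Cloudflare challenge or block page.
function looksLikeCloudflareChallenge(snapshot: ResponseSnapshot): boolean {
  const mitigated = (snapshot.headers["cf-mitigated"] ?? "").toLowerCase() === "challenge";
  const blockedStatus = snapshot.status === 403 || snapshot.status === 503;
  const challengeTitle = /just a moment|attention required/i.test(snapshot.title);
  return mitigated || (blockedStatus && challengeTitle);
}
```

A crawler could count how often this returns true per site and log a warning once the rate climbs, which gives an early signal that the stealth setup has stopped working.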
Claude
2025-11-15 17:39:43 +00:00
parent d62867e0cb
commit ae85dcbd87
9 changed files with 1864 additions and 13 deletions

@@ -10,13 +10,20 @@
     "start": "node dist/index.js",
     "test": "tsx src/test.ts"
   },
-  "keywords": ["crawler", "community", "korea"],
+  "keywords": [
+    "crawler",
+    "community",
+    "korea"
+  ],
   "author": "",
   "license": "MIT",
   "dependencies": {
     "axios": "^1.7.9",
     "cheerio": "^1.0.0",
-    "node-cron": "^3.0.3"
+    "node-cron": "^3.0.3",
+    "puppeteer": "^24.30.0",
+    "puppeteer-extra": "^3.3.6",
+    "puppeteer-extra-plugin-stealth": "^2.11.2"
   },
   "devDependencies": {
     "@types/node": "^22.10.2",