Add Puppeteer support to bypass bot detection

Major changes:
- Install puppeteer, puppeteer-extra, puppeteer-extra-plugin-stealth
- Create PuppeteerFetcher class with Stealth plugin
- Update all crawlers to use Puppeteer instead of Axios
- Add browser lifecycle management (init/close)
- Update test.ts and index.ts with browser cleanup

Features:
- Real Chrome browser execution (bypasses TLS fingerprinting)
- Stealth plugin to avoid bot detection
- Headless mode for background operation
- Proper error handling and browser cleanup

Limitations:
- Requires Chrome/Chromium installation
- Higher resource usage (~200MB memory)
- Slower than Axios (browser startup time)
- Cannot test in current environment (Chrome install blocked)

Next steps:
- Test in local environment with Chrome installed
- Adjust HTML selectors based on actual page structure
- Monitor for Cloudflare blocks
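
The init/close lifecycle described in the message could look roughly like the sketch below. This is not the commit's actual PuppeteerFetcher: the interfaces `PageLike` and `BrowserLike`, the `Launcher` type, and the method names `init`, `fetchHtml`, and `close` are assumptions made for illustration. The launcher is injected so the class can be exercised without Chrome installed, which matters given the limitation noted above.

```typescript
// Hypothetical sketch of a fetcher with explicit browser lifecycle.
// All names here are illustrative assumptions, not the commit's code.

// Minimal subset of a Puppeteer page that this sketch relies on.
interface PageLike {
  goto(
    url: string,
    options?: { waitUntil?: "load" | "domcontentloaded"; timeout?: number },
  ): Promise<unknown>;
  content(): Promise<string>;
  close(): Promise<void>;
}

// Minimal subset of a Puppeteer browser that this sketch relies on.
interface BrowserLike {
  newPage(): Promise<PageLike>;
  close(): Promise<void>;
}

type Launcher = () => Promise<BrowserLike>;

class PuppeteerFetcher {
  private browser: BrowserLike | null = null;
  private readonly launch: Launcher;

  constructor(launch: Launcher) {
    this.launch = launch;
  }

  // Launch the browser on first use; later calls reuse the same instance.
  async init(): Promise<BrowserLike> {
    if (this.browser === null) {
      this.browser = await this.launch();
    }
    return this.browser;
  }

  // Load a URL in a fresh page and return the rendered HTML.
  // The page is closed even if navigation throws.
  async fetchHtml(url: string): Promise<string> {
    const browser = await this.init();
    const page = await browser.newPage();
    try {
      await page.goto(url, { waitUntil: "domcontentloaded", timeout: 30_000 });
      return await page.content();
    } finally {
      await page.close();
    }
  }

  // Close the browser if one is open; safe to call more than once.
  async close(): Promise<void> {
    const browser = this.browser;
    this.browser = null;
    if (browser !== null) {
      await browser.close();
    }
  }
}

// Possible wiring with the real packages (untested here, Chrome required):
//   import puppeteer from "puppeteer-extra";
//   import StealthPlugin from "puppeteer-extra-plugin-stealth";
//   puppeteer.use(StealthPlugin());
//   const fetcher = new PuppeteerFetcher(() => puppeteer.launch({ headless: true }));
```

A caller would typically wrap its crawl in `try { ... } finally { await fetcher.close(); }` so that the Chrome process does not outlive the script, which is the kind of cleanup the message mentions for test.ts and index.ts.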
@@ -10,13 +10,20 @@
     "start": "node dist/index.js",
     "test": "tsx src/test.ts"
   },
-  "keywords": ["crawler", "community", "korea"],
+  "keywords": [
+    "crawler",
+    "community",
+    "korea"
+  ],
   "author": "",
   "license": "MIT",
   "dependencies": {
     "axios": "^1.7.9",
     "cheerio": "^1.0.0",
-    "node-cron": "^3.0.3"
+    "node-cron": "^3.0.3",
+    "puppeteer": "^24.30.0",
+    "puppeteer-extra": "^3.3.6",
+    "puppeteer-extra-plugin-stealth": "^2.11.2"
   },
   "devDependencies": {
     "@types/node": "^22.10.2",