Major changes: - Install puppeteer, puppeteer-extra, puppeteer-extra-plugin-stealth - Create PuppeteerFetcher class with Stealth plugin - Update all crawlers to use Puppeteer instead of Axios - Add browser lifecycle management (init/close) - Update test.ts and index.ts with browser cleanup Features: - Real Chrome browser execution (bypasses TLS fingerprinting) - Stealth plugin to avoid bot detection - Headless mode for background operation - Proper error handling and browser cleanup Limitations: - Requires Chrome/Chromium installation - Higher resource usage (~200MB memory) - Slower than Axios (browser startup time) - Cannot test in current environment (Chrome install blocked) Next steps: - Test in local environment with Chrome installed - Adjust HTML selectors based on actual page structure - Monitor for Cloudflare blocks
35 lines
744 B
JSON
35 lines
744 B
JSON
{
|
|
"name": "community-crawler",
|
|
"version": "1.0.0",
|
|
"description": "Korean community crawler",
|
|
"main": "dist/index.js",
|
|
"type": "module",
|
|
"scripts": {
|
|
"dev": "tsx watch src/index.ts",
|
|
"build": "tsc",
|
|
"start": "node dist/index.js",
|
|
"test": "tsx src/test.ts"
|
|
},
|
|
"keywords": [
|
|
"crawler",
|
|
"community",
|
|
"korea"
|
|
],
|
|
"author": "",
|
|
"license": "MIT",
|
|
"dependencies": {
|
|
"axios": "^1.7.9",
|
|
"cheerio": "^1.0.0",
|
|
"node-cron": "^3.0.3",
|
|
"puppeteer": "^24.30.0",
|
|
"puppeteer-extra": "^3.3.6",
|
|
"puppeteer-extra-plugin-stealth": "^2.11.2"
|
|
},
|
|
"devDependencies": {
|
|
"@types/node": "^22.10.2",
|
|
"@types/node-cron": "^3.0.11",
|
|
"tsx": "^4.19.2",
|
|
"typescript": "^5.7.2"
|
|
}
|
|
}
|