- Create crawler project structure - Implement base crawler class with safety features - Add crawlers for Ruliweb, Arcalive, DCInside - Implement utilities: fetcher (with retry logic), logger - Configure crawling settings (3s delay, max 20 posts/board) - Add test script and scheduler (30min intervals) Safety measures: - 3 second delay between requests - Exponential backoff retry logic - Respect robots.txt (DCInside disabled) - User-Agent and proper headers Current status: - Structure complete - Both Ruliweb and Arcalive return 403 (bot detection) - Need to decide: Puppeteer, switch targets, or use mock data
17 lines
355 B
JSON
17 lines
355 B
JSON
{
|
|
"compilerOptions": {
|
|
"target": "ES2022",
|
|
"module": "ESNext",
|
|
"moduleResolution": "node",
|
|
"esModuleInterop": true,
|
|
"strict": true,
|
|
"skipLibCheck": true,
|
|
"outDir": "./dist",
|
|
"rootDir": "./src",
|
|
"resolveJsonModule": true,
|
|
"declaration": true
|
|
},
|
|
"include": ["src/**/*"],
|
|
"exclude": ["node_modules", "dist"]
|
|
}
|