community-crawler/crawler/tsconfig.json
Claude c5ef580534 Add crawler implementation (Node.js + TypeScript)
- Create crawler project structure
- Implement base crawler class with safety features
- Add crawlers for Ruliweb, Arcalive, DCInside
- Implement utilities: fetcher (with retry logic), logger
- Configure crawling settings (3s delay, max 20 posts/board)
- Add test script and scheduler (30min intervals)
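The settings listed above (3s delay, max 20 posts/board, 30min schedule) could be collected in one typed config object. This is only a sketch: the commit does not show the real config file, so the names `CrawlerConfig`, `SiteConfig`, `enabledSites`, and `limitPosts` are hypothetical, and the site identifiers are assumed spellings.

```typescript
// Hypothetical shape; the actual config module is not shown in this commit.
interface SiteConfig {
  name: string;
  enabled: boolean;
}

interface CrawlerConfig {
  requestDelayMs: number;          // delay between requests
  maxPostsPerBoard: number;        // cap on posts collected per board
  scheduleIntervalMinutes: number; // how often the scheduler runs
  sites: SiteConfig[];
}

// Values taken from the commit message; field names are assumptions.
const config: CrawlerConfig = {
  requestDelayMs: 3000,
  maxPostsPerBoard: 20,
  scheduleIntervalMinutes: 30,
  sites: [
    { name: "ruliweb", enabled: true },
    { name: "arcalive", enabled: true },
    { name: "dcinside", enabled: false }, // disabled to respect robots.txt
  ],
};

// Names of the sites the crawler is allowed to visit.
function enabledSites(cfg: CrawlerConfig): string[] {
  return cfg.sites.filter((site) => site.enabled).map((site) => site.name);
}

// Truncate a board's post list to the configured maximum.
function limitPosts<T>(posts: T[], cfg: CrawlerConfig): T[] {
  return posts.slice(0, cfg.maxPostsPerBoard);
}
```

Keeping the limits in one object means the crawlers and the scheduler read the same values instead of each hard-coding its own.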

Safety measures:
- 3 second delay between requests
- Exponential backoff retry logic
- Respect robots.txt (DCInside disabled)
- Send a User-Agent and proper request headers
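The retry measure above can be sketched as a small wrapper. This is a minimal illustration, not the fetcher from this commit: `withRetry`, `backoffDelay`, and `sleep` are hypothetical names, and the 1s base delay, 30s cap, and 3 retries are assumed values (only the 3 second request delay comes from the commit message).

```typescript
// Fixed pause between successive requests, as stated in the commit message.
const REQUEST_DELAY_MS = 3000;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Wait time before retry number `attempt` (0-based): base, 2x base, 4x base, ...
// capped at maxMs. The base and cap defaults are assumptions.
function backoffDelay(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Run `task`, retrying on failure with exponentially growing waits.
// `wait` is injectable so the waiting can be replaced in tests.
async function withRetry<T>(
  task: () => Promise<T>,
  maxRetries = 3,
  baseMs = 1000,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt === maxRetries) break; // out of retries
      await wait(backoffDelay(attempt, baseMs));
    }
  }
  throw lastError;
}

// Visit each URL in turn, pausing REQUEST_DELAY_MS between requests.
// `fetchPage` stands in for whatever function performs the HTTP request.
async function crawlSequentially(
  urls: string[],
  fetchPage: (url: string) => Promise<string>,
): Promise<string[]> {
  const pages: string[] = [];
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) await sleep(REQUEST_DELAY_MS);
    pages.push(await withRetry(() => fetchPage(urls[i])));
  }
  return pages;
}
```

Retrying blindly does not help with a persistent 403, so a real fetcher would probably retry only on transient failures such as timeouts or 5xx responses.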

Current status:
- Structure complete
- Ruliweb and Arcalive both return HTTP 403 (likely bot detection)
- Still to decide: use Puppeteer, switch targets, or use mock data
2025-11-15 17:18:09 +00:00

17 lines
355 B
JSON

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "ESNext",
    "moduleResolution": "node",
    "esModuleInterop": true,
    "strict": true,
    "skipLibCheck": true,
    "outDir": "./dist",
    "rootDir": "./src",
    "resolveJsonModule": true,
    "declaration": true
  },
  "include": ["src/**/*"],
  "exclude": ["node_modules", "dist"]
}