Add crawler implementation (Node.js + TypeScript)
- Create crawler project structure - Implement base crawler class with safety features - Add crawlers for Ruliweb, Arcalive, DCInside - Implement utilities: fetcher (with retry logic), logger - Configure crawling settings (3s delay, max 20 posts/board) - Add test script and scheduler (30min intervals) Safety measures: - 3 second delay between requests - Exponential backoff retry logic - Respect robots.txt (DCInside disabled) - User-Agent and proper headers Current status: - Structure complete - Both Ruliweb and Arcalive return 403 (bot detection) - Need to decide: Puppeteer, switch targets, or use mock data
This commit is contained in: