AI Scraper Engineer (Hybrid)
We're hiring an engineer to design and run server-side scraping systems, including custom scrapers, that stay reliable as targets change and defenses tighten. You'll work closely with leadership (in English only) and help our stack evolve toward AI-orchestrated scraping (e.g. GPT-style models generating or coordinating scripts, proxy choices, and runs).
What you'll do
- Build and maintain production-grade scrapers in Python (and supporting tooling), from HTTP/API clients to headless browser flows where needed.
- Reverse-engineer sites and mobile/web apps: discover internal/private APIs, headers, auth flows, pagination, and rate limits; document findings for the team.
- Operate across the full scraping stack: proxies (residential/datacenter/mobile rotation), TLS/JA3 fingerprinting, browser fingerprints, cookies/sessions, and CAPTCHA/mitigation strategies where ethical and legal.
- Use Playwright (or equivalent) for high-defense targets where pure HTTP isn't enough; keep runs stable, observable, and cost-aware.
- Collaborate remotely in fluent English (written and spoken) on specs, incidents, and architecture discussions with our team.
- Use AI coding assistants (e.g. Cursor, Codex, ChatGPT-class tools) as a normal part of your workflow, shipping fixes and features fast while preserving code quality, security, and performance.
- Lay groundwork for generative-AI orchestration: prompts/tooling that propose or adjust scraper logic, schedules, proxy usage, and failure recovery (within compliance boundaries).
What we're looking for
Must-have
- 2+ years of professional experience building scrapers, crawlers, or data-ingestion pipelines against real, changing websites.
- Strong Python for networking, async I/O, parsing (HTML/JSON), error handling, retries, and structured logging.
- Practical grasp of HTTP/HTTPS, REST/JSON (and common variations), cookies, redirects, CORS (from a client perspective), and basic TLS implications for scraping.
- Hands-on experience with proxies, IP rotation, session stickiness, and anti-bot concepts (fingerprints, headers, behavioral signals).
- Playwright or Selenium-class automation for difficult sites.
- Comfortable debugging with DevTools, mitmproxy/Charles-class tools, HAR captures, and reading minified JS when tracing API calls.
- Professional English for daily collaboration with English-only stakeholders.
- Openness to a hybrid arrangement: remote initially, with the expectation of working on-site, in person with the team, later (details to align during hiring).
Nice-to-have
- Experience with queue/workers (Celery, RQ, Dramatiq), containers, and basic observability (metrics, tracing, alerting).
- Familiarity with LLM APIs, prompt design, and safe patterns for AI-generated code (review, tests, sandbox runs).
- Exposure to classification/extraction (rules + ML) for messy HTML or unstructured text.
- Understanding of legal and ethical boundaries (robots.txt, ToS, regional rules)—we expect judgment and compliance-minded design.
How we work
- Hybrid model: remote to start; later, a combination of remote and in-person work, as agreed.
- We value engineers who iterate quickly with AI tools but still own outcomes: correctness, edge cases, security (secrets, customer data), and operational stability.
To apply
- Submit your application here: https://link.lntech.ai/h-ase
- Include a short note on two scraping projects you've shipped (stack, defenses faced, how you kept them running), plus anything relevant on Playwright, proxy/fingerprint setups, or AI-assisted development.