AI Scraper Engineer

LN Technologies

Surabaya, Indonesia

2-4 years of experience


Job Description

AI Scraper Engineer (Hybrid)

We're hiring an engineer to design and run server-side scraping systems, including custom scrapers, that stay reliable as targets change and defenses tighten. You'll work closely with leadership (in English only) and help the team evolve toward AI-orchestrated scraping (e.g., GPT-style models that generate or coordinate scripts, proxy choices, and runs).

What you'll do

Build and maintain production-grade scrapers in Python (and supporting tooling), from HTTP/API clients to headless browser flows where needed.
Reverse-engineer sites and mobile/web apps: discover internal/private APIs, headers, auth flows, pagination, and rate limits; document findings for the team.
Work across the full scraping stack: proxies (residential/datacenter/mobile rotation), TLS/JA3, browser fingerprints, cookies/sessions, and CAPTCHA handling or other mitigation strategies where ethical and legal.
Use Playwright (or equivalent) for high-defense targets where pure HTTP isn't enough; keep runs stable, observable, and cost-aware.
Collaborate remotely in fluent English (written and spoken) with our team on specs, incidents, and architecture discussions.
Use AI coding assistants (e.g., Cursor, Codex, ChatGPT-class tools) as a normal part of your workflow, shipping fixes and features quickly while preserving code quality, security, and performance.
Lay the groundwork for generative-AI orchestration: prompts and tooling that propose or adjust scraper logic, schedules, proxy usage, and failure recovery (within compliance boundaries).

What we're looking for

Must-have

2+ years professional experience building scrapers, crawlers, or data-ingestion pipelines against real, changing websites.
Strong Python for networking, async I/O, parsing (HTML/JSON), error handling, retries, and structured logging.
Practical grasp of HTTP/HTTPS, REST/JSON (and common variations), cookies, redirects, CORS (from a client perspective), and basic TLS implications for scraping.
Hands-on experience with proxies, IP rotation, session stickiness, and anti-bot concepts (fingerprints, headers, behavioral signals).
Experience with Playwright or Selenium-class browser automation for difficult sites.
Comfortable debugging with DevTools, mitmproxy/Charles-class tools, HAR captures, and reading minified JS when tracing API calls.
Professional English for daily collaboration with English-only stakeholders.
Openness to a hybrid arrangement: remote initially, with the expectation of working on-site in person with the team later (details to be agreed during hiring).

Nice-to-have

Experience with task queues and workers (Celery, RQ, Dramatiq), containers, and basic observability (metrics, tracing, alerting).
Familiarity with LLM APIs, prompt design, and safe patterns for AI-generated code (review, tests, sandbox runs).
Exposure to classification/extraction (rules + ML) for messy HTML or unstructured text.
Understanding of legal and ethical boundaries (robots.txt, ToS, regional rules); we expect sound judgment and compliance-minded design.

How we work

Hybrid model: remote to start, later a combination of remote and in-person work as agreed.
We value engineers who iterate quickly with AI tools but still own outcomes: correctness, edge cases, security (secrets, customer data), and operational stability.

To apply:

Submit your application here: https://link.lntech.ai/h-ase
Include a short note on two scraping projects you've shipped (stack, defenses faced, how you kept them running), plus anything relevant about Playwright, proxy/fingerprint setups, or AI-assisted development.