Company Overview:
Our client is an innovative AgTech company specializing in managing and validating field data within complex, smallholder-heavy agri-commodity supply chains such as cocoa, coffee, palm oil, rubber, and soy. Their flagship SaaS product verifies supply chain field data in real time to ensure seamless compliance with global regulatory frameworks. With a clear mission to improve dataset veracity and build transparent, inclusive supply chains, they are currently seeking a hands-on Data Engineer to join their expanding team in Indonesia. This role offers a four-day work week and a fully remote work arrangement.
Key Responsibilities:
- Model, structure, and script automated data flows to process raw information into meaningful, high-value outputs.
- Write robust Python scripts to acquire and process data from diverse academic, government, or commercial sources.
- Develop internal tools and use technologies such as Apache Airflow, AWS, and Google Cloud to scale and optimize geospatial ETL tasks.
- Maintain the team's Git repository of ETL scripts, strictly following version control best practices to keep code reproducible and well organized.
- Optimize data ingestion processes to guarantee reliable, timely, and scalable delivery of large-scale raster and vector datasets.
- Monitor ingestion pipelines with logging, alerting, and metrics to quickly detect failures or data inconsistencies.
- Maintain high-quality metadata, internal data catalogues, manuals, dictionaries, and glossaries across global projects.
- Drive continuous operational efficiency by proactively identifying data pipeline bottlenecks and implementing solutions.
Requirements:
- Minimum of 3 years of experience in a relevant data engineering role.
- Deep expertise in building, optimizing, and maintaining scalable ETL data pipelines for complex geospatial datasets using modern orchestration tools like Apache Airflow.
- Proficiency in cloud platforms (specifically AWS) with hands-on experience using Infrastructure as Code (IaC) tools like Terraform.
- Strong Python skills, along with experience using data engineering libraries and tools such as Pandas, Pydantic, Rasterio, or Tippecanoe.
- Solid understanding of data governance, Git version control, technical documentation, and rigorous data quality assurance.
- Strong problem-solving and analytical skills, and the ability to effectively organize data from multiple sources.
- Self-driven, proactive, and accountable, as well as a highly collaborative team player.
- Excellent English communication skills to collaborate seamlessly with international stakeholders and cross-border teams across the Netherlands, Lithuania, and Ghana.
Nice-to-Have:
- Familiarity with GIS scripting (GeoPandas, PostGIS) and GIS software environments like QGIS.
- Demonstrated personal projects or professional case studies involving forest monitoring or land-use analysis.
- Hands-on experience with specialized data version control tools such as DVC.
- Prior experience working within a fast-growing start-up environment.