
  • Posted 19 hours ago

Job Description

Role Description

  • Develop and plan work activities to ensure timely and structured project execution.
  • Design and build streaming pipelines for real-time data processing.
  • Develop and maintain data marts to support reporting and dashboard requirements.
  • Collect, sort, clean, and manage data to ensure accuracy and reliability.
  • Perform data analysis and present insights and findings to relevant teams.
  • Manage data transfer processes (including tally data) from transactional databases to the data warehouse.
  • Administer and manage server job schedulers to ensure smooth data processing operations.
  • Collaborate with cross-functional teams to identify and resolve technical issues on both backend and frontend systems.
  • Prepare and submit monthly work progress reports.
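The streaming-pipeline responsibility above can be illustrated with a minimal sketch; the event shape, field names, and running-total aggregation are illustrative assumptions, not part of the role:

```python
from collections import defaultdict

# Toy event stream: in production this would come from a broker
# such as Kafka; the field names here are illustrative assumptions.
events = [
    {"store": "A", "amount": 10.0},
    {"store": "B", "amount": 5.0},
    {"store": "A", "amount": 2.5},
]

def stream(source):
    """Yield events one at a time, as a streaming consumer would."""
    for event in source:
        yield event

def running_totals(event_stream):
    """Maintain per-store running totals as events arrive."""
    totals = defaultdict(float)
    for event in event_stream:
        totals[event["store"]] += event["amount"]
        yield dict(totals)  # snapshot after each event

snapshots = list(running_totals(stream(events)))
print(snapshots[-1])  # final per-store totals
```

In a real pipeline the generator would be replaced by a consumer reading from a message queue, with state checkpointed rather than held in memory.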

Qualifications

  • Minimum of 5 years of experience in a Data Engineer role
  • Proficient in Python for data processing & automation.
  • Advanced SQL skills (subqueries; window functions such as ROW_NUMBER, RANK, and PARTITION BY; indexing; EXPLAIN plans; query tuning; stored procedures; triggers; views; materialized views; constraints; query optimization)
  • Strong hands-on experience with MySQL, PostgreSQL, Oracle, and Snowflake; handling structured and unstructured data
  • Experience with distributed data storage systems (partitioned data storage, sorting keys, SerDes, data replication, caching and persistence)
  • Experience with distributed data processing systems (partitioning, predicate pushdown, sorting by partition, managing shuffle block sizes, window functions, leveraging all cores and memory available in the cluster to improve concurrency)
  • Experience with stream data processing (real-time, stream, and batch processing)
  • Experienced in using tools such as:
  1. Data pipelines and automation (Airbyte)
  2. Data ingestion via message queues
  3. Data wrangling operations (Pandas, NumPy, re)
  4. Data scraping (requests, BeautifulSoup, lxml, Scrapy)
  5. Interacting with external APIs and other data sources; logging
  6. Parallel processing libraries (Dask, multiprocessing)
  7. Data engineering tools (Apache Kafka, Apache Airflow)
  8. Cloud-native technologies (serverless computing, virtual instances, containerization with Docker, orchestration with Kubernetes)
  9. Data visualization (QlikView, Tableau)
  • Full work-from-office (WFO) in South Jakarta, Mega Kuningan
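The window-function skills listed above (ROW_NUMBER, PARTITION BY) can be sketched with Python's built-in sqlite3 module; the table and column names are invented for illustration, and window functions require SQLite 3.25 or newer:

```python
import sqlite3

# Minimal sketch of ROW_NUMBER over a partition; the `sales`
# table and its columns are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('east', 100), ('east', 300), ('west', 200);
""")
rows = conn.execute("""
    SELECT region, amount,
           ROW_NUMBER() OVER (
               PARTITION BY region ORDER BY amount DESC
           ) AS rank_in_region
    FROM sales
    ORDER BY region, rank_in_region
""").fetchall()
for row in rows:
    print(row)
conn.close()
```

The same pattern (rank within a group, then filter on the rank) underlies common warehouse tasks such as deduplication and top-N-per-category reports.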

More Info


Job ID: 143822769
