Analyze, maintain, and optimize existing Python scripts to ensure seamless data pipeline automation and system reliability.
Design and implement scalable processes to transform unstructured or semi-structured data like JSON, XML, and logs into structured tabular formats.
Develop and manage end-to-end ETL and CDC workflows using tools such as SSIS, Talend, Informatica, or Azure Data Factory.
Implement event-based streaming and real-time data processing using Debezium, Kafka, Flink, and Spark.
Manage and query diverse database environments, including PostgreSQL, Oracle, SQL Server, NoSQL (MongoDB), and graph databases (TigerGraph/GSQL).
Oversee data warehouse architecture and manage job scheduling and orchestration across cloud (GCP/BigQuery/Azure) and on-premises systems.
Collaborate with cross-functional teams to translate complex data requirements into technical solutions while troubleshooting and resolving pipeline issues.
Qualifications:
1-3 years of professional work experience with a heavy specialization in Data Engineering. Hands-on experience with SQL, focusing on performance tuning, complex joins, and data granularity.
Strong proficiency in Python, particularly the ability to read, refactor, and maintain existing codebases.
Proven expertise in ETL tools (Kettle, Pentaho, Informatica) and modern data transformation frameworks like dbt (data build tool).
Knowledge of Big Data tools such as Hive, Impala, and cloud-native solutions like BigQuery and Azure Data Factory.
Familiarity with REST APIs, basic programming principles, and data visualization tools like Power BI.
Proactive mindset with an eagerness to learn and adapt to emerging technologies.