Design and implement frameworks for data quality validation, including rules, thresholds, and metrics to ensure high-quality data.
Automate data quality monitoring and anomaly detection using tools or custom scripts.
Collaborate with data stewards to resolve data quality issues and improve processes.
Develop and maintain centralized metadata repositories that keep metadata accurate and up to date.
Automate metadata extraction, data lineage tracking, and validation processes.
Integrate metadata management solutions with tools like Informatica EDC, Axon, or similar platforms.
Work closely with the Data Governance team to implement data governance policies and data standards across the data lifecycle.
Create and maintain pipelines to enable data lineage, audit trails, and compliance reporting.
Support the adoption of governance tools and frameworks to ensure consistent data usage.
Build robust, scalable, and secure ETL/ELT pipelines for structured and unstructured data.
Optimize data ingestion, transformation, and storage for hybrid (on-premises and cloud) environments.
Ensure data pipeline reliability and performance through monitoring and testing.
Work closely with data stewards, data analysts, and data governance teams to align engineering efforts with business needs.
Mentor junior team members on best practices in data engineering and governance.
5-10 years of experience in data engineering, with a focus on data quality, metadata, data lineage, and data governance.
Strong programming skills in Python, SQL, or Scala; experience with modern data frameworks such as PySpark and dbt (Data Build Tool) is a plus.
Expertise in ETL/ELT tools like Informatica Intelligent Data Management Cloud (IDMC), Talend, Apache NiFi, or similar cloud-native solutions.
Proficiency with metadata management platforms such as Informatica EDC, Collibra, or Alation, including automation of metadata ingestion, classification, and lineage mapping.
Hands-on experience with data quality tools (e.g., Informatica Data Engineering Quality (DEQ), Collibra Data Quality & Observability, Great Expectations) and custom validation scripting.
Strong knowledge of data governance frameworks and tools (e.g., Informatica Axon, Collibra Governance, Alation Data Governance).
Experience with cloud data platforms and databases (e.g., Snowflake, Databricks, Oracle, SQL Server), as well as data lake/lakehouse architectures.
Familiarity with multi-cloud and hybrid environments (e.g., AWS, Azure, Google Cloud Platform) and their native data services.
Metadata automation (auto-tagging, semantic enrichment, and data catalog population).
Anomaly detection in pipelines and datasets using AI-driven observability tools.
Data quality improvement through AI-based rules generation, pattern recognition, and automated remediation suggestions.
Strong problem-solving and analytical abilities.
Excellent communication and collaboration skills.
Ability to mentor and guide junior team members.
Job skills required: Python, Compliance, ETL
Job skills preferred: Apache, SQL, Scala