Job Title: Junior Data Engineer
Job Summary:
We are looking for a Junior Data Engineer to design, develop, and maintain data pipelines, ETL processes, and cloud-based data solutions to support data analytics and business intelligence applications. The ideal candidate should have hands-on experience with SQL, Python, data modeling, big data technologies, and cloud platforms. This role involves working with structured and unstructured data, optimizing data workflows, and collaborating with data scientists and analysts to enable data-driven decision-making.
Key Responsibilities:
- Design, develop, and optimize data pipelines for collecting, transforming, and storing large-scale data
- Implement and manage ETL (Extract, Transform, Load) processes to move data between databases and data warehouses
- Write efficient SQL queries for data transformation, aggregation, and reporting
- Ensure data quality, validation, and integrity by implementing best practices for data governance
- Work with cloud-based data storage and processing solutions (AWS, Google Cloud, Azure)
- Optimize and automate data workflows using tools like Apache Airflow, Prefect, or Luigi (see the example Airflow sketch after this list)
- Develop and maintain scalable data infrastructure to support analytics and machine learning applications
- Monitor and troubleshoot data pipelines to ensure high availability and performance
- Collaborate with data scientists and business analysts to ensure seamless data accessibility and integration
- Document data architecture, transformations, and best practices
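For illustration, a daily pipeline of this kind is often orchestrated with Apache Airflow. The sketch below is a minimal example only, assuming a recent Airflow 2.x with the TaskFlow API; the order data, the target name analytics.daily_orders, and the aggregation logic are invented placeholders, not a prescribed implementation.

from datetime import datetime

from airflow.decorators import dag, task

# Hypothetical example: extract daily orders, aggregate them per country,
# and load the result into a warehouse table. All names are placeholders.

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False, tags=["etl"])
def daily_orders_etl():

    @task
    def extract() -> list[dict]:
        # A real pipeline would query the source database (e.g. via a hook);
        # sample rows keep this sketch self-contained.
        return [
            {"order_id": 1, "amount": 120.0, "country": "DE"},
            {"order_id": 2, "amount": 80.0, "country": "FR"},
        ]

    @task
    def transform(rows: list[dict]) -> list[dict]:
        # Aggregate order amounts per country (pure Python for brevity).
        totals: dict[str, float] = {}
        for row in rows:
            totals[row["country"]] = totals.get(row["country"], 0.0) + row["amount"]
        return [{"country": c, "total_amount": t} for c, t in totals.items()]

    @task
    def load(rows: list[dict]) -> None:
        # A real task would write to the warehouse (e.g. Redshift or BigQuery);
        # printing stands in for the load step here.
        for row in rows:
            print(f"loading {row} into analytics.daily_orders")

    load(transform(extract()))

daily_orders_etl()

Retries, alerting, and data quality checks would normally be layered onto such a DAG; they are omitted to keep the sketch short.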
Skills and Knowledge Required:
- Proficiency in SQL and database management (PostgreSQL, MySQL, SQL Server, Oracle)
- Experience with Python or Scala for data processing and automation
- Familiarity with ETL tools (Apache NiFi, Talend, Informatica, dbt, SSIS)
- Knowledge of data modeling techniques (dimensional modeling, star schema, normalization)
- Understanding of data warehousing solutions (Google BigQuery, AWS Redshift, Snowflake, Azure Synapse)
- Experience with cloud-based data storage and processing (AWS S3, Google Cloud Storage, Azure Data Lake)
- Familiarity with big data frameworks such as Apache Spark, Hadoop, or Hive (nice to have; see the PySpark sketch after this list)
- Experience with workflow orchestration tools (Apache Airflow, Prefect, Luigi)
- Basic knowledge of DevOps and CI/CD for data pipelines (Git, Docker, Terraform)
- Good problem-solving and debugging skills
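As a rough illustration of the Spark-related skill above, the PySpark sketch below aggregates raw click events into daily counts; the bucket paths, column names, and output location are hypothetical placeholders rather than part of this role's actual stack.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical example: aggregate raw click events into daily counts per page.
# Paths and column names are placeholders.

spark = SparkSession.builder.appName("daily_click_aggregation").getOrCreate()

events = spark.read.parquet("s3a://example-bucket/raw/clicks/")

daily_counts = (
    events
    .withColumn("event_date", F.to_date("event_timestamp"))
    .groupBy("event_date", "page_id")
    .agg(
        F.count("*").alias("click_count"),
        F.countDistinct("user_id").alias("unique_users"),
    )
)

# Partitioning the output by date lets downstream queries prune partitions.
(
    daily_counts.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3a://example-bucket/curated/daily_click_counts/")
)

spark.stop()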
Educational Qualifications:
- Bachelor’s degree in Computer Science, Data Science, Information Technology, or a related field
- Certifications in SQL, Cloud Data Engineering (AWS/GCP/Azure), or ETL Tools are a plus
Experience:
- 1-2 years of experience in data engineering, database management, or ETL development
- Hands-on experience with SQL optimization, ETL pipelines, and cloud-based data solutions
Key Focus Areas:
- Data Pipeline Development & Optimization
- ETL Process Automation & Data Integration
- Cloud Data Engineering & Big Data Processing
- Database Performance Tuning & Data Modeling (see the star schema sketch below)
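As a small illustration of the data modeling focus, the sketch below builds a toy star schema (one fact table, two dimensions) in an in-memory SQLite database and runs the kind of join a BI report would issue; the schema and data are invented for illustration only.

import sqlite3

# Hypothetical example: a toy star schema with a sales fact table and two
# dimension tables, queried the way a reporting tool might.

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
    CREATE TABLE fact_sales (
        date_key INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity INTEGER,
        revenue REAL
    );
""")

cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                [(20240101, "2024-01-01", "2024-01"), (20240102, "2024-01-02", "2024-01")])
cur.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware")])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                [(20240101, 1, 3, 30.0), (20240102, 2, 1, 25.0)])

# Typical reporting query: revenue per month and category.
for row in cur.execute("""
    SELECT d.month, p.category, SUM(f.revenue) AS total_revenue
    FROM fact_sales f
    JOIN dim_date d ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY d.month, p.category
"""):
    print(row)

conn.close()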
Tools and Technologies:
- Programming Languages: SQL, Python, Scala (optional)
- Databases & Warehouses: PostgreSQL, MySQL, SQL Server, Oracle, Google BigQuery, Snowflake
- ETL & Data Pipeline Tools: Apache NiFi, Talend, Informatica, dbt, SSIS
- Big Data & Workflow Orchestration: Apache Spark, Hadoop, Hive, Airflow
- Cloud Data Services: AWS RDS, Google Cloud SQL, Azure Data Factory, AWS Glue (see the extract-and-load sketch after this list)
- Version Control & DevOps: Git, Terraform, Docker
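To make the toolchain above concrete, the sketch below shows one common extract-and-land pattern: dumping a PostgreSQL table to CSV and uploading it to S3 with boto3. The host, credentials, bucket, and table names are hypothetical placeholders; equivalent patterns apply to Google Cloud Storage and Azure Data Lake.

import csv
import io

import boto3
import psycopg2

# Hypothetical example: export one day of orders from PostgreSQL to CSV in
# memory, then upload the file to S3. All connection details are placeholders.

conn = psycopg2.connect(
    host="db.example.internal",
    dbname="sales",
    user="etl_user",
    password="change-me",  # in practice, read from a secrets manager
)

buffer = io.StringIO()
writer = csv.writer(buffer)

with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT order_id, amount, country FROM orders WHERE order_date = %s",
        ("2024-01-01",),
    )
    writer.writerow([col.name for col in cur.description])  # header row
    writer.writerows(cur.fetchall())

conn.close()

boto3.client("s3").put_object(
    Bucket="example-data-lake",
    Key="raw/orders/2024-01-01.csv",
    Body=buffer.getvalue().encode("utf-8"),
)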
Other Requirements:
- Ability to analyze and optimize data processing workflows
- Strong attention to detail and ability to ensure data accuracy
- Passion for working with big data and cloud-based solutions
- Excellent communication and teamwork skills