DevOps / SRE Engineer

10-15 Lakhs

7-10 Years

Delhi-New Delhi

Engineering

Vacancies - 1

Amazon ECS / EKS, AWS CloudFront, AWS Fargate, + 24 more

Job Description – DevOps / SRE Engineer

Experience: 5–6 Years

CTC: ₹7.5 LPA – ₹12 LPA

Employment Type: Full-Time

About the Role

We are looking for a skilled and proactive DevOps / Site Reliability Engineer (SRE) to manage and optimize the platform infrastructure for a multi-service production environment on AWS. The ideal candidate should have strong expertise in Infrastructure-as-Code (IaC), CI/CD automation, gateway-level traffic management, cloud security, and observability practices.

This role requires hands-on ownership of AWS infrastructure, deployment pipelines, monitoring systems, and production reliability.

Key Responsibilities

AWS Infrastructure Management

  • Manage and maintain AWS infrastructure including:

    • AWS Fargate / ECS

    • Application Load Balancer (ALB)

    • CloudFront

    • VPC

    • IAM

    • Route 53

    • ACM

    • AWS Secrets Manager

  • Ensure the infrastructure is scalable, secure, reliable, and performant.

Infrastructure as Code (IaC)

  • Develop and maintain reusable Terraform modules.

  • Handle Terraform state management, drift detection, and multi-account deployments.

  • Automate infrastructure provisioning and environment setup.

CI/CD & Deployment Automation

  • Build and manage CI/CD pipelines using GitHub Actions or equivalent tools.

  • Implement:

    • Immutable deployments

    • Automated image scanning

    • Secure secrets handling

    • Signed artifacts and reproducible builds

  • Improve deployment reliability and release efficiency.

Gateway & Traffic Management

  • Configure and manage Envoy Gateway or equivalent proxy solutions.

  • Implement routing rules, traffic splitting, and gateway observability.

  • Support production-grade traffic management strategies.

Observability & Reliability Engineering

  • Implement structured logging, monitoring, and alerting systems.

  • Define and track:

    • SLOs

    • Error budgets

    • p99 latency metrics

    • RED (rate, errors, duration) / USE (utilization, saturation, errors) metrics

  • Maintain operational runbooks and incident response processes.
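
As a rough illustration of how an SLO and its error budget relate, here is a minimal sketch in Python. The function names and the 99.9% target are hypothetical and chosen only for this example; they are not taken from the team's actual tooling or targets.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window.

    slo_target is a fraction, e.g. 0.999 for "99.9% available" (assumed value).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, good_requests: int, total_requests: int) -> float:
    """Fraction of a request-based error budget still unspent.

    Returns 1.0 when no requests have failed, 0.0 when the budget is exactly
    used up, and a negative number when the SLO has been breached.
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    allowed_failures = (1.0 - slo_target) * total_requests
    actual_failures = total_requests - good_requests
    return 1.0 - actual_failures / allowed_failures


if __name__ == "__main__":
    # A 99.9% SLO over 30 days leaves roughly 43 minutes of allowed downtime.
    print(round(error_budget_minutes(0.999, 30), 1))
    # 500 failures out of 1,000,000 requests spends about half the budget.
    print(round(budget_remaining(0.999, 999_500, 1_000_000), 2))
```

A time-based budget suits availability targets, while the request-based form maps more directly onto RED-style error counts; which one applies depends on how the SLO is defined.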

Security & Compliance

  • Ensure production credentials and secrets are securely managed.

  • Support WAF, DDoS protection, and compliance-related security practices.

  • Follow cloud security best practices and governance standards.

Collaboration & Operations

  • Work closely with development and product teams to improve system reliability and deployment workflows.

  • Participate in troubleshooting, root-cause analysis, and performance optimization.

Required Skills & Qualifications

  • 5+ years of experience managing production AWS environments (Fargate, ECS, or EKS).

  • Strong hands-on experience with Terraform:

    • Modules

    • State management

    • Drift handling

    • Multi-account architecture

  • Experience with Envoy Proxy or equivalent technologies (Nginx, HAProxy, Istio).

  • Strong expertise in CI/CD pipelines using GitHub Actions or similar tools.

  • Good understanding of:

    • Observability

    • Monitoring

    • SRE principles

    • Alerting strategies

  • Experience with containerized environments and deployment workflows.

  • Strong troubleshooting and automation skills.

Preferred Skills

  • Experience with:

    • PostgreSQL backup & recovery

    • Point-in-time recovery (PITR)

    • Failover drills

    • Connection pool sizing

  • Knowledge of FinOps and AWS cost optimization.

  • Experience with traffic splitting strategies across regions or tenants.

  • Familiarity with cloud security standards and compliance practices.

© 2026 Erekrut HR Automation Solutions Pvt Ltd. All Rights Reserved.