Senior Site Reliability Engineer

ViziRecruiter, LLC.
Chicago, IL

Introduction

Ahold Delhaize USA, a division of global food retailer Ahold Delhaize, is part of the U.S. family of brands, which also includes five leading omnichannel grocery brands – Food Lion, Giant Food, The GIANT Company, Hannaford and Stop & Shop. Ahold Delhaize USA associates support the brands with a wide range of services, including Finance, Legal, Sustainability, Commercial, Digital and E-commerce, Technology and more.

Overview

The Site Reliability Engineer (SRE) III is responsible for ensuring the scalability, reliability, and performance of production systems through automation, observability, incident response, and infrastructure engineering. This role involves designing and implementing robust operational processes and tooling to support highly available, fault-tolerant systems in a cloud-native environment. The SRE III collaborates closely with engineering squads, product teams, and stakeholders to embed reliability best practices across the software delivery lifecycle. The role includes ownership of system uptime, service level objectives (SLOs), and operational excellence, along with mentoring junior engineers and leading cross-functional initiatives that improve system resilience.

Applicants must be currently authorized to work in the United States on a full-time basis.

Our flexible/hybrid work schedule includes 3 in-person days at our Chicago office and 2 remote days.

Responsibilities

  • Design and implement infrastructure solutions that ensure system availability, scalability, and reliability across cloud-native environments such as AKS and Kubernetes.
  • Develop automation for provisioning, deployment, configuration, monitoring, and incident remediation using tools such as Terraform, ArgoCD, and GitHub Actions.
  • Collaborate with engineering teams to define and track service level objectives (SLOs) and service level indicators (SLIs).
  • Build and manage microservices-based platforms leveraging Spring Boot, Java, Tomcat, and Redis.
  • Monitor production environments using Datadog and proactively address performance and reliability issues.
  • Perform root cause analysis and lead post-incident reviews to drive continual improvement.
  • Manage CI/CD pipelines and deployment automation using GitHub, Docker, and container orchestration technologies.
  • Create and maintain infrastructure as code (IaC) using Terraform, with deployment pipelines integrated into GitOps workflows.
  • Lead and support operational readiness reviews, game days, chaos engineering practices, and failure mode analysis.
  • Build scalable observability and alerting frameworks with Datadog.
  • Implement resilient, asynchronous architectures using Kafka for event-driven services.
  • Reduce operational toil through self-healing automation and proactive system tuning.
  • Troubleshoot Linux-based environments such as Ubuntu and optimize them for reliability.
  • Provide on-call support and ensure 24/7/365 system reliability for mission-critical applications.
  • Collaborate with the security team to enforce secure operational practices and cloud compliance.
  • Mentor junior engineers and contribute to documentation, technical design, and knowledge-sharing across the organization.

Requirements

  • Bachelor's Degree in Computer Science, Information Systems, or a related technical field; equivalent training, certifications, or experience will be considered.
  • 5+ years of experience in a Site Reliability Engineering, DevOps, or Java programming role.
  • Experience managing production-grade systems and services on AKS/Kubernetes in distributed environments.
  • Proficiency in programming and scripting languages such as Python, Java, Bash, or Go.
  • Proven experience with Spring Boot, Tomcat, Redis, and microservices architecture.
  • Hands‑on experience managing Linux environments, particularly Ubuntu.
  • Proficiency with observability stacks and performance monitoring using Datadog, Prometheus, and ELK.
  • Deep understanding of containerization and orchestration using Docker, Kubernetes, and ArgoCD.
  • Experience managing event‑driven systems using Kafka.
  • Expertise in IaC and automation using Terraform and GitHub Actions.
  • Familiarity with networking concepts, DNS, load balancing, and cloud infrastructure (AWS, Azure, or GCP).
  • Strong analytical, debugging, and problem‑solving skills.
  • Excellent verbal and written communication skills and the ability to collaborate effectively across teams.

Salary Range: $125,040 - $187,560

Actual compensation offered to a candidate may vary based on their unique qualifications and experience, internal equity, and market conditions. Final compensation decisions will be made in accordance with company policies and applicable laws.

Posted 2026-01-15
