Job Description:
Job Title: DevOps Lead (10+ Years Experience) – DCF L35
We are seeking a highly motivated and experienced DevOps Lead with 10+ years of relevant experience across DevOps and SRE functions to design, build, optimize, manage, and support the 6+ business and enterprise applications in our “Dentsu.Connect Creative” portfolio and suite of apps.
The role requires end-to-end ownership of our CI/CD pipelines, release management infrastructure, release and deployment processes, GitHub Actions workflows, build/release and deployment automation, and quality and automation pipelines, as well as environment and infrastructure operations, management, and troubleshooting across our digital app footprint and cloud estate on Azure, using containers and Azure Kubernetes Service (AKS).
As a DevOps Lead, you will be responsible for building, evolving, automating, and managing our release management functions, with a focus on containerization, container management and orchestration, service routing and service mesh management, and Infrastructure as Code (IaC). You will lead the implementation of DevOps and CI/CD best practices and of deployment and release automation, enabling our development teams to deliver applications quickly and reliably.
You will act as the “bridge” between our Application Dev teams and the central Global DevOps function within our Dentsu Data and Technology Practice, driving the adoption of centralized capabilities and best practices across all our Creative Apps.
Key Responsibilities:
- Package, Deploy and Manage Apps on Kubernetes: Manage the packaging and deployment of enterprise workloads and apps on scaled Kubernetes clusters.
- Design and Optimize GitHub Actions and CI/CD Workflows: Build, optimize, and orchestrate GitHub Actions workflows, and manage end-to-end CI/CD pipelines, including code build, packaging, quality test automation, release, and deployment across non-production and production environments.
- Container Orchestration: Orchestrate container workloads in non-production and production environments, including traffic management and service routing.
- Service Mesh: Configure and manage virtual service routing via service mesh frameworks and stacks such as Istio.
- API Gateways: Orchestrate and route traffic across the application stack, from the front-end UI layer through API and federation gateways down to microservices.
- App Management across MERN Stack, Python Apps and AI-Powered Apps: Manage end-to-end deployment and release management of all our apps on the MERN stack (JavaScript + TypeScript), as well as our rapidly growing suite of Python full-stack apps and AI-powered/AI-enabled apps that use AI services. Good prior experience with both the MERN stack and the Python stack is required.
- GitHub Management and GitOps: Manage large GitHub Enterprise orgs with multiple app repositories, driving best practices around repository management, environment variables and secrets management, and Git branching and branch management. Strong experience with trunk-based development processes and feature flagging techniques is expected.
- Support Operations and App Development Teams: Assist development and operations teams in building, deploying and managing cloud applications using containerized services, including day-to-day troubleshooting support.
- Cloud Operations: Manage and operate cloud platforms on Azure Kubernetes Service (AKS) to ensure performance, scalability, reliability, and security for enterprise applications and strategic products. Manage and optimize cloud services including AKS, Container Apps, Azure PaaS (App Services), Azure Functions / Azure Serverless, Azure Storage, MongoDB Atlas, Azure Redis, etc.
- Infrastructure as Code (IaC): Implement IaC solutions using tools such as Terraform, CloudFormation, ARM Templates, etc., to automate provisioning.
- Monitoring and Performance: Implement and enhance monitoring, alerting, and logging solutions using Azure Monitor, Prometheus, Grafana, Loki, and other tools to ensure infrastructure health.
- Security Scanning and Compliance: Ensure that cloud infrastructure and application code comply with security best practices, including RBAC and code and container security scanning.
- Cost Optimization: Develop and implement strategies to optimize cloud costs and reduce operational expenses.
- Network Management: Manage network configurations and network access controls such as VPCs, subnets, private links, load balancers, etc.
- CI/CD Pipelines: Improve and optimize CI/CD pipelines using GitHub Actions, Azure DevOps, and/or other CI/CD tools for continuous integration and delivery.
- Technical Guidance: Provide technical guidance and support on cloud architecture, best practices, and cloud-based solutions.
- Stakeholder Collaboration: Collaborate with senior stakeholders from across our Product Engineering team, Quality Engineering team, central Global DevOps and SRE organization, Cloud Services organization, and Security organization to align on and optimize our continuous delivery capabilities.
Key Requirements:
- Deep understanding of the Azure cloud platform, Azure Kubernetes Service, Azure Functions / Serverless, and all core Azure services across compute, networking and storage.
- Ability to define and design an end-to-end app deployment architecture, comprising all application, networking, and security components, with considerations for high availability.
- Ability to explain the rationale for design and deployment choices, present options, and defend the chosen architecture in discussions with senior engineering leads and tech stakeholders.
- Lifecycle Management for MERN Full Stack Apps and Python Apps: Ability to configure and manage the end-to-end application lifecycle for MERN Full Stack Apps and Python Apps.
- Microsoft Entra ID (formerly Azure Active Directory) and Security Management: Manage Microsoft Entra users and groups (create, manage properties, licenses, external users, SSPR). Understanding of how Azure integrates with IdP and SSO solutions such as Okta.
- Azure Resource Access Management: Manage built-in Azure roles, assign roles at different scopes, and interpret access assignments.
- Azure Storage: Configure storage access (firewalls, VNets, SAS tokens, stored access policies, access keys, identity-based access for Files), manage storage accounts, manage data (Storage Explorer, AzCopy), configure Azure Files and Blob Storage (file shares, containers, storage tiers, snapshots, soft delete, lifecycle management, versioning).
- Azure Compute: Automate resource deployment (ARM templates, Terraform), provision and manage containers (Azure Container Registry, Container Instances, Container Apps, scaling), create and configure App Service (plans, scaling, certificates, TLS, custom DNS, backup, networking, deployment slots).
- Azure Virtual Networking: Configure and manage virtual networks (subnets, peering, public IPs, routes), secure access (NSGs, application security groups, Azure Bastion, service/private endpoints), configure name resolution and load balancing (Azure DNS, load balancers).
- Azure Resource Monitoring and Maintenance: Monitor resources (metrics, logs, alerts, action groups, alert processing rules), configure monitoring of VMs, storage, and networks (Azure Monitor Insights), use Azure Network Watcher and Connection Monitor.
- Backup and Recovery: Create Recovery Services/Backup vaults, configure backup policies, perform backup and restore operations, configure Azure Site Recovery, perform failovers, configure and interpret reports and alerts.
- Extensive experience with Azure Kubernetes Service (AKS) and Azure Container Registry (ACR).
- Extensive experience with Service Routing and Service Mesh frameworks such as Istio.
- Extensive experience with Application Lifecycle Management (ALM), Monitoring, and Observability tools.
Required Soft Skills:
- Ability to collaborate across multiple teams and stakeholders in multiple time zones.
- Ability to work across multiple Application Development teams and multiple workstreams.
- Strong abilities in planning, orchestration, governance, and process orientation, applied to release management workflows.
- Strong written and verbal communication skills, to collaborate across domains, disciplines, and teams with different backgrounds.
- Ability to coordinate directly with Senior Technical Leads and Senior Stakeholders across Central DevOps and SRE functions in UK and US geographies.
Preferred Qualifications:
- Certifications: Azure Administrator (AZ-104), Azure DevOps Engineer Expert, or Certified Kubernetes Administrator (CKA) certification is highly desirable.
- Container Orchestration: Experience with other container orchestration platforms (e.g., Docker Swarm) in addition to AKS.
- Monitoring Tools: Familiarity with monitoring and observability platforms like Prometheus, Grafana, Loki, and other cloud-native monitoring solutions.
- Scripting: Proficiency in scripting languages (e.g., Python, Bash, PowerShell) for routine automation tasks.
- Agile Experience: Experience working in an Agile/Scrum environment.
- AI/ML Workloads (Nice-to-have): Exposure to deploying or managing AI-powered or Gen-AI-enabled applications using Azure AI Services, Azure OpenAI, or comparable platforms.
Educational Background:
Bachelor’s degree in Computer Science, Engineering, or a related field
Location:
DGS India - Bengaluru - Manyata N1 Block
Brand:
Merkle
Time Type:
Full time
Contract Type:
Permanent