EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multinational teams, contribute to a wide range of innovative projects that deliver creative, cutting-edge solutions, and have the opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
We are seeking a highly experienced Senior AIOps Solution Architect with exceptional expertise in Gen-AI-enabled Cloud Engineering, Observability, Operational Intelligence, and AI-driven automation. The ideal candidate will bring 10+ years of enterprise-level architecture experience, with a focus on building innovative Gen-AI-enabled platforms, data-driven automation frameworks, and enterprise-grade AIOps solutions to advance operational efficiency.
Responsibilities
- Design and deliver scalable Gen-AI-powered AIOps solutions for large enterprise platforms to improve MTTR, achieve automated incident resolution, and drive operational excellence
- Architect and implement Gen-AI and LLM Engineering solutions using tools such as Amazon Bedrock, Azure OpenAI, Vertex AI, Anthropic, and LangChain
- Develop and optimize MLOps pipelines and model deployment workflows leveraging SageMaker and Azure ML, along with clustering, topic modeling, and anomaly detection techniques
- Implement RAG, Vector DBs, and advanced semantic search across platforms using pgvector, Elasticsearch, and Bedrock Knowledge Bases
- Create and automate solutions for Cloud Platforms and Infrastructure with AWS, Azure, GCP, Terraform, CloudFormation, and Helm, alongside Python and Shell Scripting
- Lead Kubernetes-based container orchestration and DevSecOps initiatives, including CI/CD pipelines, Istio, and KEDA deployment strategies
- Design and integrate serverless and cloud-native architectures using API Gateway, Lambda, Step Functions, DynamoDB, S3, and Kinesis
- Implement end-to-end Observability solutions using Datadog, OpenTelemetry, Dynatrace, New Relic, Splunk, Moogsoft, and BigPanda
- Ensure seamless ITSM and ServiceNow integration for AI-driven operations and automation
- Work with ITSM tools like ServiceNow, Jira Service Management, and ManageEngine to streamline incident management workflows
- Provide thought leadership in AIOps, automation, and AI-powered operational intelligence to leadership and engineering teams
Requirements
- 19+ years of overall IT experience
- 10+ years of professional experience in Enterprise Cloud, Infrastructure Engineering, SRE, Automation, and Architecture roles
- Proven track record of delivering Gen-AI-powered AIOps solutions in production environments, driving efficiencies like MTTR improvement and operational automation
- Expertise in Gen-AI and LLM Engineering tools such as Amazon Bedrock, Azure OpenAI, Vertex AI, Anthropic, LangChain, and Bedrock Agents
- Proficiency in RAG, Vector Databases, and semantic search solutions like pgvector, Elasticsearch, and Bedrock Knowledge Bases
- Background in MLOps, model development, and machine learning using SageMaker and Azure ML, including clustering, topic modeling, and anomaly detection techniques
- Skills in cloud engineering and automation technologies, including AWS, Azure, GCP, Terraform, CloudFormation, Helm, Python, and Shell Scripting
- Capability to design and operate Kubernetes-based infrastructure, CI/CD pipelines, security automation, Istio, and KEDA
- Familiarity with serverless computing and cloud-native tools like API Gateway, Lambda, Step Functions, DynamoDB, S3, and Kinesis
- Knowledge of Observability platforms such as Datadog, OpenTelemetry, Dynatrace, New Relic, Splunk, Moogsoft, and BigPanda
- Understanding of ITSM platforms, including ServiceNow, Jira Service Management, and ManageEngine
- Demonstrated AI and Machine Learning expertise in areas like anomaly detection, GenAI implementation, and agentic AI solutions
- Ability to communicate effectively in both written and spoken English (B2 level or higher)
Nice to have
- Experience leading AIOps/Cloud Practices or platform engineering organizations
- Certifications in AWS ML, Cloud Architecture, or AI Leadership
We offer
- Opportunity to work on technical challenges with impact across geographies
- Vast opportunities for self-development: online university, global knowledge-sharing opportunities, and learning through external certifications
- Opportunity to share your ideas on international platforms
- Sponsored Tech Talks & Hackathons
- Unlimited access to LinkedIn Learning solutions
- Possibility to relocate to any EPAM office for short and long-term projects
- Focused individual development
- Benefit package:
  - Health benefits
  - Retirement benefits
  - Paid time off
  - Flexible benefits
- Forums to explore passions beyond work (CSR, photography, painting, sports, etc.)