Noida, UP Engineer Java Architecture Jobs – 144 Open Jobs Today

Principal Site Reliability Engineer
Optum —Noida, Uttar Pradesh
- Full-time
- Azure
- Computer Science
- Java
- Health insurance
3h
SDE -2 ( Python & Gen AI )
Adobe —Noida, Uttar Pradesh
- Full-time
- Azure
- Java
- Databases
17h
Principal Software Engineer
Microsoft —Noida, Uttar Pradesh
- Full-time
- Computer Science
- Windows
- Java
14h
Performance Testing Engineer
E Hook Systems —West District, Delhi
- Java
- Analysis skills
- JavaScript
Sr Developer - Software
CorroHealth Infotech Private Limited —Noida, Uttar Pradesh
- Full-time
- Azure
- DevOps
- Java
Sr.Devops Engineer-215033
Knack Consulting Services Pvt Ltd —Noida, Uttar Pradesh
- Azure
- Oracle
- DevOps
Senior Engineer, Devops (AWS Automation)
BOLD LLC —Noida, Uttar Pradesh
- Azure
- DevOps
- Java
- Health insurance
- Paid time off
AI Solution Architect I
HCLTech —Gautam Budh Nagar, Uttar Pradesh
- Azure
- DevOps
- English
2d
Lead Developer
360 DC Group —Noida, Uttar Pradesh
- Azure
- DevOps
- Java
Software Engineer Java Angular
Optum —Noida, Uttar Pradesh
- Full-time
- Azure
- Java
- AWS
- Health insurance
3h
Senior Software Engineer JAVA Angular
Optum —Noida, Uttar Pradesh
- Full-time
- Azure
- Java
- AWS
- Health insurance
5h
Senior Full Stack Engineer
Optum —Noida, Uttar Pradesh
- Full-time
- Computer Science
- Java
- Bachelor's degree
- Health insurance
3h

Principal Site Reliability Engineer

Optum - Noida, Uttar Pradesh

Job details

Full-time
3 hours ago

Benefits

Health insurance

Qualifications

BCS
CI/CD
Cloud infrastructure
Azure
Go
Law
Computer Science
CSS
React
Kubernetes
Ansible
Relational databases
Software deployment
Enterprise Software
C#
UNIX
Microsoft SQL Server
Git
Build automation
Performance testing
.NET
Java
Master's degree
Databases
AWS
C++
Incident response
C
Bachelor's degree
JavaScript
PostgreSQL
Terraform
Continuous integration
Splunk
Perl
REST
Mentoring
Scripting
Software development
Technical writing
GitHub
APIs
ServiceNow
Agile
UI
Linux
Kafka
Unit testing
Data science
Root cause analysis
Data visualization
AI
Leadership
Jenkins
TypeScript
GitLab
Python
PowerShell
Shell Scripting
HTML
MySQL
Information Technology

Full job description

Optum is a global organization that delivers care, aided by technology, to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by inclusion, talented peers, comprehensive benefits and career development opportunities. Come make an impact on the communities we serve as you help us advance health optimization on a global scale. Join us to start Caring. Connecting. Growing together.

Primary Responsibilities:

Define and own the SRE, AI-enabled operations, and observability strategy for the assigned portfolio, aligned with organizational goals and focused on improving reliability, stability, security, scalability, supportability, resilience, automation, and operational excellence across all digital properties
Act as a senior technical leader who bridges Site Reliability Engineering, software engineering, IT operations, AI engineering, observability platform engineering, cloud/platform teams, data engineering, and business technology leadership
Provide technical leadership, mentorship, and strategic guidance to senior and mid-level SREs, platform engineers, AI implementation teams, observability engineers, and cross-functional technology teams
Foster a culture of engineering excellence, continuous learning, proactive reliability, automation-first operations, production ownership, and operational discipline
Collaborate with engineering, security, architecture, product, data platform, cloud, infrastructure, operations, and business leaders to integrate reliability, observability, AI-enabled automation, and operational best practices into products and platforms from design through deployment
Report to senior stakeholders and CIO-level leaders on critical paths, operational risks, reliability posture, production readiness, mitigation plans, technology debt, AI adoption opportunities, and strategic SRE initiatives
Define, govern, and continuously improve enterprise reliability standards, including SLAs, SLIs, SLOs, error budgets, operational risk scoring, production readiness criteria, resilience scorecards, and service health models
Lead the reliability and peak season readiness initiatives by owning the assessment framework, collaborating with application teams, identifying reliability gaps, and driving critical applications toward 99.999% availability from a resiliency, availability, and reliability perspective
Architect and govern enterprise-grade monitoring, alerting, and observability standards across lines of business using platforms such as Splunk, Dynatrace, Grafana, DataDog, OpenTelemetry, ServiceNow, cloud-native monitoring tools, and next-generation observability platforms
Drive the transition from static dashboards and tool-specific monitoring to unified, intelligent, business-impact-aligned observability that provides visibility into application health, infrastructure health, customer experience, service reliability, operational risk, and business impact
Lead the design and implementation of a modern enterprise observability dashboard and intelligence platform using technologies such as React, JavaScript, TypeScript, REST APIs, Snowflake, Kafka Streaming, cloud-native services, and enterprise data platforms
Partner with data engineering and platform teams to design scalable data models, telemetry pipelines, event streams, API integrations, and analytical capabilities using Snowflake, relational databases, Kafka, streaming platforms, and observability data sources
Integrate real-time and near-real-time telemetry from logs, metrics, traces, events, alerts, incidents, change records, service metadata, cloud platforms, infrastructure platforms, and business systems
Ensure the observability dashboard supports service health views, dependency mapping, role-based views, SLO tracking, alert correlation, incident insights, customer impact analysis, capacity trends, executive reporting, and AI-assisted recommendations
Lead the strategy, design, and implementation of AI-enabled observability, AIOps, and intelligent automation capabilities to transform incident management from reactive to proactive, predictive, and increasingly autonomous
Drive implementation of AI and GenAI capabilities for incident triage, impact assessment, log analysis, anomaly detection, event correlation, root cause analysis, knowledge retrieval, runbook recommendation, production readiness validation, and automated remediation
Partner with engineering and platform teams to integrate LLM-based triage, Agentic AI workflows, AI-powered observability, and automated remediation into SRE workflows, on-call processes, incident response, and operational support models
Identify practical AI implementation opportunities that reduce alert noise, accelerate root cause analysis, reduce manual toil, improve developer productivity, and deliver measurable improvements in MTTD, MTTA, MTTR, and MTBI
Work with security, architecture, data governance, and platform teams to ensure AI-enabled solutions are implemented securely, responsibly, explainably, and in alignment with enterprise standards
Analyze and model system dependencies across applications, APIs, infrastructure, databases, cloud services, message streams, third-party integrations, and business-critical workflows
Conduct risk and threat modeling for operational scenarios including natural disasters, cloud region failures, cyberattacks, infrastructure failures, software defects, data pipeline failures, dependency failures, and peak-volume business events
Design and implement resilience patterns such as automated failover, geo-redundancy, circuit breakers, bulkheads, throttling, graceful degradation, blue-green deployments, canary deployments, automated rollback, and self-healing automation
Lead chaos engineering strategy and execution to proactively identify failure modes, validate system resilience, and improve recovery readiness
Provide technical leadership across hybrid hosting environments including Unix, Linux, Windows, Azure, AWS, GCP, private cloud, Kubernetes, containers, serverless platforms, and enterprise hosting platforms
Partner with infrastructure, cloud, network, security, and application teams to ensure platforms are reliable, scalable, secure, observable, resilient, cost-efficient, and supportable
Lead technology transformation efforts including cloud migration strategy, HCP assessment and adoption, platform modernization, containerization, serverless architecture, open source and inner source adoption, and automation-led operations
Guide teams on modern technology trends, emerging AI capabilities, evolving observability practices, changing cloud/platform technologies, and new engineering patterns that can improve reliability and operational effectiveness
Own and drive automation strategy to eliminate manual toil by designing scalable automation frameworks for runbooks, incident response, change validation, operational support, reporting, remediation, and self-service operations
Define and track toil metrics, automation coverage, operational efficiency metrics, incident trends, reliability improvement outcomes, and continuous improvement opportunities
Build automation-first operational models using scripting, APIs, workflow automation, CI/CD integration, AI-assisted workflows, and reusable engineering patterns
Improve operational tooling and frameworks by evaluating, selecting, standardizing, and governing tools across the SRE, observability, AI operations, and platform engineering portfolio
Act as a senior gatekeeper for production changes by establishing change governance processes, operational risk scoring, AI-assisted readiness validation, rollback validation, and release reliability standards
Lead incident response for P1 and P2 incidents, including war room facilitation, executive communication, technical triage, impact assessment, recovery coordination, root cause analysis, and post-incident review processes
Respond to platform emergencies, alerts, and escalations from customer support, business operations, application teams, and technology partners while ensuring root cause is addressed and corrective actions are implemented
Leverage ServiceNow and ITSM processes for incident, problem, change, knowledge, configuration, and service management at enterprise scale
Participate in and lead the on-call rotation, setting the standard for on-call excellence, operational readiness, knowledge sharing, escalation management, and continuous improvement
Create and maintain architectural diagrams, flow diagrams, runbooks, operational playbooks, executive-level reports, service health documentation, dashboard documentation, and AI-enabled operational process documentation

Comply with the terms and conditions of the employment contract, company policies and procedures, and any and all directives (such as, but not limited to, transfer and/or re-assignment to different work locations, change in teams and/or work shifts, policies regarding flexibility of work benefits and/or work environment, alternative work arrangements, and other decisions that may arise due to the changing business environment). The Company may adopt, vary or rescind these policies and directives in its absolute discretion and without any limitation (implied or otherwise) on its ability to do so.

Required Qualifications:

Bachelor's degree in Computer Science, Information Technology, Engineering, Data Science, or a related field preferred
15+ years of overall experience in the IT industry across software development, infrastructure, operations, platform engineering, cloud engineering, production support, or enterprise technology delivery
9+ years of hands-on experience in Site Reliability Engineering, Platform Engineering, Production Engineering, DevOps, Cloud Operations, or a similar role with demonstrated leadership in driving reliability at enterprise scale
9+ years of experience designing, implementing, and governing monitoring, alerting, and observability architectures for cloud, hybrid, and enterprise software solutions using tools such as Splunk, Dynatrace, DataDog, Grafana, OpenTelemetry, ServiceNow, or similar platforms
7+ years of coding or scripting experience with two or more of the following: Java, Python, Go, JavaScript, TypeScript, C#, C/C++, Perl, PowerShell, Shell scripting, Mainframe technologies, or similar languages
5+ years of experience building, designing, integrating, and programmatically consuming REST APIs at scale
2+ years of experience mentoring and providing technical leadership to SRE engineers, software engineers, platform engineers, observability engineers, AI engineers, or cross-functional technology teams
Solid hands-on experience implementing SRE practices across large-scale enterprise applications, including SLAs, SLIs, SLOs, error budgets, monitoring, alerting, incident response, capacity planning, performance engineering, resilience engineering, and production readiness
Demonstrated experience defining, managing, and operationalizing SLAs, SLIs, SLOs, error budgets, and reliability metrics as operational standards
Proven practical experience with AI implementation, AIOps, AI-enabled observability, intelligent incident detection, event correlation, anomaly detection, automated response, or LLM-based triage
Experience identifying AI use cases, designing implementation patterns, integrating AI capabilities into operational workflows, and measuring business or operational outcomes
Experience building or supporting observability dashboards, operational intelligence platforms, service health portals, or executive reporting solutions
Experience with modern front-end or dashboard development technologies such as React, JavaScript, TypeScript, HTML, CSS, REST APIs, UI components, and data visualization frameworks
Experience working with data platforms such as Snowflake, SQL Server, PostgreSQL, MySQL, or similar relational, analytical, or operational data stores
Experience with streaming or event-driven platforms such as Kafka, Kafka Streams, event hubs, message queues, or similar technologies
Experience integrating observability data from logs, metrics, traces, events, alerts, incidents, changes, service metadata, infrastructure platforms, cloud platforms, and business systems
Experience with automation and deployment tools such as Terraform, Ansible, Jenkins, GitHub Actions, GitLab CI/CD, Azure DevOps, Argo CD, Helm, Kubernetes operators, or similar tools
Experience with programmatic interaction with relational databases and data-driven operational decision-making
Experience leading incident response for P1/P2 production incidents, including war room facilitation, executive stakeholder communication, root cause analysis, and post-incident review processes
Experience leveraging ServiceNow or similar ITSM platforms for incident, problem, change, knowledge, configuration, and service management processes
Experience in health care, insurance, financial services, government programs, regulated environments, or large-scale enterprise technology operations
Solid understanding of hybrid hosting and infrastructure platforms including Unix, Linux, Windows, Azure, AWS, GCP, private cloud, containers, Kubernetes, serverless platforms, and enterprise hosting platforms
Familiarity with GenAI, Agentic AI, LLM-based assistants, AI copilots, prompt engineering, semantic search, RAG patterns, vector databases, model integration, AI governance, and responsible AI practices
Proven track record of planning, supporting, or improving 99.999% availability for critical applications in production environments
Proven solid architectural understanding of engineering fundamentals including unit testing, performance testing, chaos engineering, code reviews, telemetry, Agile, DevOps, CI/CD, security, API design, and production readiness
Proven deep expertise in CI/CD pipelines, containerization, serverless architecture, public cloud, private cloud, application observability, messaging, streaming architecture, and platform automation
Demonstrated ability to guide technical priorities, conduct design reviews, influence architecture decisions, define engineering standards, and drive adoption of modern technology practices
Proven ability to evaluate emerging technologies, understand changing technology dynamics, guide teams on adoption strategy, and translate new technology capabilities into practical enterprise implementation plans
Proven ability to communicate effectively with technical and non-technical, globally distributed audiences, including presenting to senior leadership and CIO-level stakeholders on reliability posture, AI initiatives, operational risk, and strategic technology direction
Proven solid technical writing skills, including creating architectural diagrams, flow diagrams, runbooks, end-user documentation, operational playbooks, executive-level reports, and technology strategy documents
Flexibility to support 24x7 operations through shift-based, on-call, and rotational support models

Preferred Qualification:

AI Dojo certification Level 1, Level 2, and Level 3

At UnitedHealth Group, our mission is to help people live healthier lives and make the health system work better for everyone. We believe everyone - of every race, gender, sexuality, age, location and income - deserves the opportunity to live their healthiest life. Today, however, there are still far too many barriers to good health which are disproportionately experienced by people of color, historically marginalized groups and those with lower incomes. We are committed to mitigating our impact on the environment and enabling and delivering equitable care that addresses health disparities and improves health outcomes - an enterprise priority reflected in our mission.
