Summary:
- Implements and operates monitoring, logging, alerting, and dashboards.
- Works closely with developers on "build & run" topics.
- Monitors and coordinates security-related topics (vulnerabilities, findings) and stability-related topics (query patterns, indexing).
- Acts as first responder for production incidents.

Mandatory Skills (in order of importance):
- Monitoring & alerting (CloudWatch: logs, metrics, dashboards, alarms)
- Distributed tracing (AWS X-Ray, Lambda Insights)
- Incident management (root-cause analysis, runbook authoring)
- Database performance analysis (MongoDB, PostgreSQL)
- Security operations (Wiz.io)
- AWS services (Lambda, S3, SQS, IAM, VPC)
- Bash/shell scripting & automation

Advantageous Skills:
- GitHub Actions, AWS CodePipeline (CI/CD understanding)
- Docker, ECS Fargate (container health, troubleshooting)
- AWS Cost Explorer, resource tagging
- Release coordination (hotfix processes, rollback procedures)
- BMW SCP.apps (Integrate) platform operational knowledge
- TypeScript (reading application code for