OBSERVABILITY ANALYST - (PH-181)

Kastech Software Solutions Group


**Responsibilities**:

- Design and implement comprehensive observability strategies and architectures for AWS cloud environments, including metrics, logs, and distributed tracing.
- Develop custom dashboards and alerts to monitor key performance indicators (KPIs) and overall system health.
- Automate the deployment and management of observability infrastructure using Infrastructure as Code (IaC) tools.
- Work closely with development, operations, and engineering teams to understand their observability needs and provide effective solutions.
- Participate in incident resolution, providing observability data and analysis to identify root causes and facilitate recovery.
- Implement and manage observability solutions for containerized environments orchestrated with Elastic Kubernetes Service (EKS).
- Evaluate and recommend new observability tools and technologies to enhance our capabilities.
- Document observability configurations, processes, and best practices.
- Train and support other teams in the use of observability tools and techniques.
- Stay up to date on the latest trends and best practices in observability and cloud technologies.

**Requirements**:

- **Cloud Knowledge and Experience (AWS)**:
  - Proven experience (minimum 3 to 5 years) working with the Amazon Web Services (AWS) cloud platform.
  - In-depth knowledge of AWS services relevant to observability, such as CloudWatch (Logs, Metrics, Alarms), X-Ray, and potentially others like AWS Observability Service.
- **Infrastructure as Code (IaC)**:
  - Practical experience deploying and managing infrastructure using Infrastructure as Code (IaC) tools such as Terraform or similar.
  - Ability to write, maintain, and improve IaC code to automate the creation and configuration of observability infrastructure.
- **Elastic Kubernetes Service (EKS)**:
  - Deep understanding of Kubernetes concepts and how Kubernetes interacts with AWS.
  - Hands-on experience configuring observability tools for Kubernetes environments, such as Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), Jaeger, etc., within EKS.
- **General Observability Experience**:
  - Solid understanding of observability principles and best practices (metrics, logs, distributed tracing).
  - Experience with various observability and monitoring tools.
  - Ability to develop effective dashboards and alerts based on observability data.
  - Ability to analyze observability data to identify performance and availability issues.
- **Additional Technical Skills**:
  - Ability to develop scripts and automate tasks using languages such as Python, Bash, etc.
  - Knowledge of Linux operating systems.
  - Familiarity with Agile and DevOps methodologies.
- **Interpersonal Skills**:
  - Strong problem-solving skills and the ability to analyze complex data.
  - Excellent communication and collaboration skills.
  - Ability to work independently and as part of a team.

**Nice to have**:

- Relevant AWS certifications (e.g., AWS Certified DevOps Engineer - Professional).
- Experience with other container orchestration platforms (e.g., vanilla Kubernetes).
- Knowledge of Site Reliability Engineering (SRE) principles.
- Experience implementing Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
