Key roles and responsibilities

Our main tasks include ensuring the stability, availability, and performance of our systems and services. We collaborate with development and operations teams to understand their requirements for the observability and monitoring of their services.

- We work closely with our development and operations teams to design, implement, and maintain monitoring, logging, and tracing solutions that enable us to identify and resolve issues quickly.
- We champion the observability practice within each team, acting as a fellow squad member.
- We pursue near-zero manual intervention in system administration and upgrades, and we develop automated incident responses.
- We design, implement, and maintain monitoring, logging, and tracing solutions using tools such as Prometheus, Grafana, the ELK Stack, and Jaeger.
- We align monitoring processes with the underlying business processes they support.
- We ensure that observability and monitoring systems provide actionable insights that can be used to improve system stability, performance, and availability.
- We troubleshoot and resolve issues in the observability systems themselves, including data quality, system availability, and performance.
- We continuously evaluate new and emerging observability technologies and tools, and recommend them for adoption where appropriate.

Required skills and qualifications

Essential skills include:

- Solid knowledge of public clouds: AWS, Azure, and GCP.
- Experience with tools such as Dynatrace (required), Netcool, Devo, NetScout, Prometheus, Grafana, el...