Job Summary

The Observability Engineer will design, implement, and maintain observability solutions for complex systems and applications. This role requires a solid understanding of monitoring and observability practices, as well as expertise in the tools and technologies used to collect and analyze performance, logging, and metrics data.

Responsibilities

- Monitoring Setup and Configuration: Configure monitoring tools to gather data from various systems, applications, and network components. Define metrics, configure data collection agents, and ensure proper connectivity and access.
- Alert Management: Monitor alerts, perform triage to identify critical issues, analyze alert patterns, and configure escalation workflows to ensure timely response and resolution.
- Performance Analysis and Troubleshooting: Use tool features to analyze metrics, logs, and traces. Conduct root cause analysis, troubleshoot issues, and identify areas for optimization.
- Incident Response: Collaborate across teams to respond to incidents quickly, handling triage, communication, and coordination with stakeholders. Participate in post-incident reviews to identify improvements.
- Dashboard and Visualization: Develop and maintain dashboards and visualizations that offer a consolidated view of system health and performance. Customize dashboards based on specific business and operational requirements.
- Capacity Planning and Scalability: Monitor resource utilization and trends to forecast capacity needs. Collaborate on resource planning and provisioning to support scalability and optimal performance.
- Tool Administration and Maintenance: Perform routine administration tasks for observability tools, including user management, access control, and system upgrades. Monitor the health and availability of these tools.
- Documentation and Knowledge Sharing: Document configurations, troubleshooting steps, and best practices. Contribute to knowledge bases and share insights with the team.
- Tool Integration and Automation: Integrate observability tools with other systems, including ticketing and incident management platforms. Automate monitoring configurations and reporting to improve efficiency.
- Continuous Improvement and Research: Stay current on observability trends, research new tools and methods, and continuously improve monitoring setups to align with best practices.
- Other duties as assigned.

Skills and Experience

- Bachelor's degree in computer science or a related technical field preferred.
- 5+ years of experience in software engineering or IT with a focus on monitoring, alerting, and analysis.
- Proficiency in application, cloud infrastructure, and monitoring tool administration.
- Hands-on experience with SolarWinds, Elasticsearch (AWS OpenSearch), and similar tools (e.g., Splunk).
- Experience with APM tools such as AppDynamics or alternatives like Dynatrace and New Relic.
- Proficiency in scripting languages and data formats (Python, PowerShell, and JSON preferred).
- Strong understanding of web services and CI/CD pipelines.
- Ability to thrive in a fast-paced environment, with excellent problem-solving skills, adaptability, and teamwork.
- Knowledge of Infrastructure as Code (IaC), particularly CDK and Terraform, is highly desirable.
- Passion for DevOps, application/API monitoring, automation, and reliability.

About Auxis

This is a full-time position with a Monday-Friday work schedule, with some schedule variations as needed, including an on-call coverage rotation. Occasional night or weekend work may be required for special projects. This position is 100% work from office.