Software Engineer (Observability & Monitoring) - West Des Moines, IA

Full Time • Washington DC
Job Description:
 
Overview:
 
·      Seeking an experienced Observability and Monitoring Engineer to build and mature our enterprise-wide monitoring, logging, alerting, and observability capabilities across our AWS-based technology stack.
 
·      This role will define the strategy, architecture, implementation standards, and dashboards that enable proactive detection, faster troubleshooting, and data-driven insights across applications, infrastructure, operating systems, databases, file transfers, and batch processes.
 
·      The ideal candidate has hands-on engineering expertise, strong architecture skills, and the ability to unify multiple monitoring solutions into a cohesive observability framework.
 
Responsibilities:
 
·      You will establish standards for logs, metrics, traces, event correlation, and alerting across multiple environments.
 
·      You will build centralized dashboards and alerting policies that provide unified visibility across: applications & services, operating systems, AWS services (EC2, RDS, Lambda, S3, CloudWatch, CloudTrail, etc.), databases (MS SQL Server, PostgreSQL, etc.), file transfer systems (SFTP, managed transfer tools), batch jobs and scheduled processes.
 
·      You will create actionable and noise-free alerting thresholds, escalation policies, and runbooks.
 
·      You will integrate existing tools (Dynatrace, Graylog, Splunk, SolarWinds, Zabbix) into a cohesive ecosystem.
 
·      You will rationalize tool usage and recommend consolidation or modernization where appropriate.
 
·      You will manage the lifecycle, configuration, tuning, and health of monitoring and logging platforms, automate monitoring deployments using IaC (CloudFormation) and CI/CD pipelines, and develop reusable templates/standards so teams can onboard new applications quickly.
 
·      You will build self-service dashboards and reporting for technical/business stakeholders, and create documentation for monitoring standards, dashboard naming conventions, logging schemas, and alert configuration guidelines.
 
·      You will define SLOs/SLIs and reliability KPIs for critical services.
 
·      You will partner with scrum teams, infrastructure, and security teams to reduce MTTR and improve system reliability, and participate in incident resolution, root cause analysis, and problem management.
 
·      You will provide technical leadership/mentoring to team members and consult on architecture decisions and best practices.
 
·      You will develop/maintain system documentation and participate in project planning and technical strategy sessions.
 
Qualifications:
 
·      Bachelor's degree in Computer Science or related field
 
·      5+ years of experience implementing monitoring and observability using Dynatrace
 
·      Hands-on experience with monitoring/logging tools such as Zabbix, Graylog, Splunk, SolarWinds, or equivalents
 
·      5+ years of hands-on experience with AWS services and architecture
 
·      Deep understanding of metrics, logs, traces, distributed tracing, and event correlation
 
·      Experience building dashboards and KPIs for application, infrastructure, and database layers
 
·      Strong scripting/automation skills (Python, Bash, PowerShell) and familiarity with Terraform or CloudFormation
 
·      Strong understanding of network monitoring, performance tuning, and systems architecture
 
·      Familiarity with ITIL incident/problem management processes
 
·      Proficiency with AI tools and using them responsibly in improving observability preferred
 
·      Experience with container orchestration and microservices architecture preferred
 
·      Experience with AWS OpenTelemetry, Prometheus, Grafana, or similar tools preferred
 
 
 
Required Technical Skills:
 
•                     AWS Services (EC2, RDS, S3, Lambda, ECS/EKS, etc.)
 
•                     Configuration Management (Ansible, Puppet, Chef)
 
•                     Monitoring Tools (Dynatrace, CloudWatch, Zabbix, SolarWinds, Graylog, etc.)
 
•                     CI/CD Tools (Jenkins, QuickBuild, Bitbucket)
 
•                     Scripting Languages (Python, PowerShell, Bash)
 
•                     Database Management (MS SQL Server, PostgreSQL)
 
•                     Infrastructure as Code (Terraform, CloudFormation)
 
•                     Container Technologies (Docker, Kubernetes)
 
Compensation: $45.00 per hour



