Product & Engineering 

Site Reliability Engineer

Washington, D.C.
Full-Time

Job Description

Site Reliability Engineers (SREs) are responsible for keeping all Leverege production systems running smoothly and meeting SLAs for customer projects. A good SRE applies sound engineering principles to improve automation for all deployments, manages monitoring and alerting systems, and stays ahead of potential issues arising from scale, security vulnerabilities, and infrastructure decisions. Experience with cloud-native software (e.g., Docker, Kubernetes) and general knowledge of networking and distributed systems are a must.

Leverege powers multiple large-scale, business-critical solutions for Fortune 500 companies, with over a million devices already connected to the platform. We are a key technology partner of both Google and AWS and currently power the largest Low-Power Wide-Area Network (LPWAN) IoT solution in North America. Leverege stays up to date with the latest technologies to improve the scale and reliability of our IoT platform and to let developers continually build new IoT solutions.

Responsibilities

  • Manage OpsGenie rotation and respond to incidents to meet SLAs for our customers
  • Run the infrastructure with Terraform, Kubernetes, and Helm on GCP and AWS
  • Improve monitoring and alerting systems to catch incidents and reduce false positives
  • Apply SRE best practices when documenting and improving infrastructure
  • Build internal tools to manage multiple customer projects
  • Debug production issues across the entire tech stack (e.g., VMs, containers, cloud, front end)
  • Grow the CI/CD pipeline at Leverege
  • Design, build, and maintain core infrastructure pieces 

Requirements

  • 2+ years of experience in a fast-paced professional setting (startup experience is a plus)
  • 2+ years of experience with any of the cloud providers (AWS and GCP are a plus)
  • 1+ years of experience with Docker and Kubernetes
  • Familiarity with cloud-native tools such as Prometheus, Grafana, Helm, ChartMuseum, and Istio
  • Strong programming skills and experience building tools using Bash, Python, JavaScript, Ruby, and/or Go
  • Strong debugging skills on distributed systems

Apply to Leverege
