Site Reliability Engineer (Remote)

Mattermost provides an open source, enterprise-grade messaging platform that allows teams at the world’s leading organizations to collaborate securely and privately anywhere. Our server is downloaded more than 10,000 times per month, and our customers include Intel, Samsung, Affirm, the US Department of Defense and more. Our private cloud solutions offer secure, configurable, highly scalable messaging across web, phone and PC with archiving, search, and deep integrations with hundreds of SaaS and on-premises technologies. Headquartered in Palo Alto, California, our company serves customers around the world with a distributed organization spanning the globe.

We value high impact work, ownership, self-awareness and being focused on customer success. If these values match who you are, we hope you’ll learn more about working at Mattermost and come talk to us!

We are looking for an engineer with demonstrated experience in software development and Kubernetes-based infrastructure. The role focuses on ensuring the reliability and scalability of Mattermost’s new SaaS offering by building tools, deploying infrastructure and automating operations in Kubernetes.


Responsibilities

  • Build services and tools to ensure the stability of Mattermost’s SaaS offering
  • Define infrastructure in code with Terraform and other tools
  • Write thoughtful and high-quality code in Go
  • Follow our engineering best practices, and ensure alignment with our Leadership Principles
  • Develop services to handle automatic recovery from incidents and disasters
  • Automate incident or disaster simulations to identify blind spots
  • Set technical vision and innovate to be on the forefront of self-healing SaaS services
  • Implement, maintain and tune monitoring and alerting systems
  • Deploy applications to and manage Kubernetes clusters
  • Participate in our on-call rotation to respond to incidents and resolve problems


Requirements

  • Bachelor’s degree in Computer Science or a related field, or significant professional DevOps or SRE experience
  • 2+ years of previous experience as a developer or SRE with operational responsibilities
  • Strong experience running reliable, high-scale applications with Kubernetes in production
  • Strong skills and experience working with infrastructure as code tools, such as Terraform
  • Familiarity with container systems such as Kubernetes & Docker
  • Solid programming skills and experience with, or an ability to quickly become proficient in, Go
  • Ability and willingness to be on-call
  • Experience with SRE and DevOps methodologies


Nice to have

  • Experience with distributed application systems using HTTP, WebSockets, RPC, pub/sub, etc. at scale
  • Open source contributions to related projects
  • Knowledge of Grafana and Prometheus
  • Comfortable with GitHub, Jira, Jenkins, CircleCI
  • Experience working in open source communities
