Full Time Site Reliability Engineer at Buildkite (Remote, Australia/New Zealand)

Company: Buildkite
Geo: Remote, Australia/New Zealand
Head Office: None
Level: Senior
WFH: Remote, from home, or we'll sponsor a co-working space if you prefer
Type: Full Time
Salary Range: $155,000 – $180,000 AUD, $160,000 – $185,000 NZD
Closing Date: Open until the role is filled!

We’ve got interesting challenges scaling Rails with big Postgres databases. If this sounds like your sort of fun, please keep reading, or see the listing on our website for more details: Site Reliability Engineer

About the company

At Buildkite we build tools to help the best software teams stay happy and productive. We've rethought how CI/CD should work and have built a platform that is fast, reliable, and secure, and that scales to the needs of the most demanding high-growth tech companies, including Shopify, Pinterest, Wayfair, Cruise, PagerDuty, CultureAmp, and Canva.

Buildkite is a differently shaped company that values work-life balance and supports staff to work the ways that make sense for them. From the beginning, our goal has been to build a company that was profitable, grew sustainably, and had a strong, people-centered culture. We’re currently a distributed team of 36 humans working remotely from Seattle, Vancouver, Perth, Sydney, Hobart, Adelaide, Kyoto, Oakland, Long Beach, Wellington, Berlin, London, Kiev, Cordoba, and Melbourne.

Take a look at Buildkite’s values and the way they shape the benefits of working with our team here.

About the position

Buildkite is a continuous delivery platform that helps development teams ship quality code, fast. Making developers happy and productive makes us happy, and we're lucky to work closely with some of the best software teams in the world, including Airbnb, Shopify, Google, Pinterest, and Basecamp.

We're looking for a Senior Site Reliability Engineer to join our team and help us scale one of the biggest Postgres-powered Rails "majestic monoliths" in the world. We have some really fun problems to work on to keep our databases and other infrastructure scaling reliably, and we'd love your help! We run on AWS, we're deep into observability with Datadog, we're moving to immutable infrastructure on Fargate, we're doing interesting partitioning in Postgres, and we're iteratively rearchitecting to improve performance and reliability.

A typical day for a Site Reliability Engineer at Buildkite might look like:

  • Iterating on changes to our PostgreSQL database, and any related changes to the Ruby (Rails) code in our primary app.
  • Consulting with team members to improve how we diagnose and fix operational issues in their software systems.
  • Collaborating in Basecamp on the design of an upcoming feature or fix.
  • Troubleshooting our production app and isolating issues to fix.
  • Video calls with others on the team to discuss or solve problems, or to just say hi.
  • Providing feedback on a GitHub pull request, or responding to feedback left for you.
  • Helping customers via email and Slack support, and following up with “support hacks” to solve any identified issues.
  • Working to manage and improve on-call and incident response processes.
  • Ensuring we have a high level of confidence in the reliability of our code and systems.
  • Engineering solutions to improve the reliability and availability of our platform.
  • Working with peer teams to account for security and operational concerns during development.

This job is for you if you:

  • Have helped manage a production Rails application, and are familiar with database performance at scale, Rails, and the surrounding Ruby ecosystem.
  • Have experience operating multi-terabyte Postgres databases, with informed opinions about sharding, partitioning, and query optimization.
  • Believe in quality code. You are comfortable writing tests and clear comments for the code you write. You know how to balance your own high standards of code quality against the problem you are solving and external constraints, such as how time-sensitive the work is or the impact it will have.
  • Like solving problems. You are happy working through difficult technical problems and solving them in straightforward ways. If you don't know the answer immediately, you are comfortable digging until you figure it out, and you know when to ask for help.
  • Understand development processes. You are comfortable writing Git commits, pull requests, and issues. You know how to critique others' code in a positive and productive way, and are comfortable receiving the same sort of feedback.
  • Are a good communicator. You value empathy and kindness and are able to articulate your ideas and feelings when writing or speaking.
  • Are self-motivated. We are a remote company, so you will need to be comfortable stepping into gaps in the planning, taking initiative, and identifying what needs to be done and how to get it done.
  • Learn fast. You might not be an expert in everything we do initially, but you will quickly become an expert in some aspects. You are comfortable diving in and learning things, even if they are new to you.

Apply

Check out the listing for this position and apply from our website: