Quartzy – Senior Site Reliability Engineer, Foundation Team

I’m hiring for a full-time remote position on my team at Quartzy: the Foundation Team, a team dedicated to helping our product engineers move quickly and reliably in building and shipping software.

As of today, we have Nix deployed on every Quartzy engineer’s laptop, with Nix shells that they use in local development. We’re also in the late stages of rolling out a fully Nix-driven deployment system based on HashiCorp Nomad and nix-nomad (a little Nix library I wrote, which you could help me maintain). Soon, our entire production infra will be running on NixOS. We are slowly opting into Nix everywhere in our infra where it adds value (which turns out, in my experience, to be a lot of places).

I hope to use this role to help upstream improvements to the nixpkgs/NixOS ecosystem, so that the path we pave may help other startups like ours adopt Nix.

If you’d like to help advance scientific research while getting to work with Nix, you can apply here.

I’ve copied the full job description below:

Quartzy is seeking a Senior Site Reliability Engineer to join our fully remote and growing Engineering team! Our mission at Quartzy is to accelerate the pace of scientific discovery. As a member of Quartzy’s Foundation Team, you will engineer solutions that help our Product Team achieve this mission with speed and reliability.

About This Role

Quartzy’s Foundation Team designs, builds, and operates the platform on which our product development teams ship their applications. In this role, you’ll provide technical leadership to all of Engineering by interfacing directly with our product development teams. Your role as a mentor for (and at times, a contributor to) these teams will support timely delivery and reliable operation of our core applications.

Our team builds with HashiCorp Nomad, Consul, Vault and Terraform on top of AWS to create the backbone of our software operations. We ship containers, and have invested heavily in Nix as the glue that pieces all of this together. Monitoring and observability are provided by Datadog, helping us keep things running smoothly. The products our team builds with these tools enable us to continuously deliver software to production, with confidence, dozens of times a day. We facilitate and lean on intelligent monitoring and alerting workflows, and design and build all of these solutions using a code-first approach to infrastructure that we aim to scale with our company.

About You

The ideal person for this role will have prior experience in a SaaS company on a Foundation, Platform, DevOps or similar team. They will have experience as a Senior Site Reliability Engineer, Senior DevOps Engineer, or other hybrid role bridging software and operations responsibilities. This person will work very closely with our Lead Cloud Engineer and is expected to bring the experience necessary to work in a self-directed way at times. This person needs to feel comfortable taking the initiative with external team members to solve problems cross-functionally. Our most successful hires are motivated by our mission to accelerate the Life Sciences industry by building and operating high-quality technology that speeds up research & development.

Why Quartzy

Quartzy is the world’s #1 lab management platform. Every day, hundreds of thousands of scientists from all over the world improve the efficiency of their research by using Quartzy. Our software combines lab resource management and E-Commerce, producing unique value in this large market and returning time to researchers so they can focus on their next discoveries. Our customers range from wine makers, to food/ag companies, to companies working on COVID testing and cancer therapeutics. We are humbled every day to serve them.

What You’ll Do

  • Initiate and support our HashiCorp/AWS infrastructure-as-code using a combination of Nix, Terraform and related tools, providing development, QA, and production environment automation.
  • Support infrastructure refactoring initiatives, seek to automate manually performed tasks, and drive infrastructure monitoring and cloud-based delivery tooling towards meeting Engineering SLAs and key operating metrics.
  • Help establish best practices and standards, including work breakdown and estimation, and assist with developing delivery schedules.
  • Perform root-cause analysis for service interruptions and establish preventive measures.
  • Support development of policies and standard operating procedures for change management, access provisioning, and structured incident handling, in support of company-level security and compliance programs and frameworks.
  • Participate in the on-call support rotation.
  • Participate in knowledge sharing opportunities and contribute to the overall growth and collective knowledge of the team.

What We’re Looking For

  • Bachelor’s degree in Computer Science, Software Engineering, Information Technology or equivalent industry experience
  • 5+ years of experience working on a Platform or Cloud Engineering team in a hands-on role operating a 24x7 AWS-based workload, infrastructure, database, and networking
  • Experience with TCP/IP, VPN and networking concepts, SSH, DNS, SSO and authentication standards, system performance-monitoring tools, load balancing, firewalls and HTTP traffic, email routing, Redis, ElasticSearch, queuing technologies, and interacting with REST APIs for application integration and automation.
  • Experience managing cloud-based infrastructure-as-code using Terraform or similar.
  • Experience deploying containers and workload orchestration tools such as Docker, Nomad, Kubernetes, ECS, EKS, or OpenShift, in a multi-cluster multi-environment setup.
  • Experience working with Git, or other version control systems, and automating code delivery pipelines and infrastructure as code supporting frequent production deployments
  • Experience managing databases and tuning SQL performance on RDS, MySQL, Postgres or other production-scale database platforms.
  • Experience working with or under one or more security and compliance certification frameworks.
  • Excellent communication and interpersonal skills, including the ability to explain complex technical concepts to non-technical team members; a self-starter with a bias towards action who takes the initiative to advance project work with other team members.

What We Offer

  • Great Insurance - We cover 100% of employee premium cost and 50% for partner/family
  • Great Culture - Participate in our fun events like speaker series, virtual happy hours & company/department off-sites
  • Remote Team - We’re 100% distributed so you can live anywhere in the US!
  • Transparency - Weekly all company stand ups, monthly town halls, quarterly state of the start ups and anytime access to co-founders
  • Generous Time Off - Take the time you need, when you need it
  • Internet Stipend - Quartzy provides a monthly stipend for your internet service
  • Great Gear - We’ll set you up for success with the latest tech and help you outfit your home office.

Want to learn more? Take a look at what people are saying about us on Glassdoor!

If that sounds like you, we’d love to hear from you!

Again, if you are interested, you can apply here.
