Site Reliability Engineer (RT3) in Seattle, WA at Fast Switch

Date Posted: 1/11/2019

Job Snapshot

  • Employee Type:
    Contractor
  • Location:
    Seattle, WA
  • Job Type:
  • Experience:
    Not Specified

Job Description

Job ID: 50387

Site Reliability Engineer. We are hiring a Site Reliability Engineer contractor in Seattle, WA who will work with system and software engineers to build reliable, high-capacity, high-performance systems in support of the mission to reimagine learning for millions of students and learners worldwide.

About the client:

  • We aim to break down walls between development and operations; participate in finding and building solutions which enable teams to deliver software updates in a way that is highly stable and operationally sound.
  • We are strongly invested in the AWS Cloud, infrastructure-as-code, and monitoring-as-code.
  • We favor the practical and pragmatic over the ideal, including finding right-sized solutions.
  • We are anticipatory and forward-looking, reliable, and have a bias toward taking action.
  • We understand that without our customers our efforts are worthless, and that operational changes are likely to have a direct impact on user experience. We understand that uptime is paramount, and we work backwards from there.

Essential Accountabilities:

  • Collaborating with product teams and technical principals to prioritize our efforts.
  • Hands-on design, understanding, and troubleshooting of highly-distributed, large-scale production systems — both modern and legacy, monolithic and micro.
  • Co-ownership with the development teams over reliability, uptime, capacity, and performance.
  • Ensuring the repeatability, traceability, and transparency of our infrastructure automation including alignment with client standards and best practices for operational excellence.
  • Identifying highest-impact opportunities to optimize existing systems; ensuring “right-sized” solutions in consideration of technical and business constraints.
  • System design consulting for teams seeking to leverage or improve their production infrastructure.
  • Anticipating, building, and planning capacity for upcoming product/feature launches.
  • Working with application teams and product principals to fully operationalize software/systems projects (including security requirements), delivered on time and within budget.
  • Staying current on industry trends; conceiving and presenting to management ways to improve current practices, strengthen our standing in the marketplace, and remain on the cutting edge of technology.
  • Mentoring team members; fostering growth by setting high-reaching goals and providing support as needed to achieve them.

Required:

  • 3 years of experience as a software application engineer.
  • 3 years of experience as a system/release engineer.
  • 5 years of experience with the foundational AWS services: EC2, RDS, and S3.
  • 3 years of experience with the supporting AWS services (e.g., SQS, SNS, SES, CloudWatch, ElastiCache, Lambda).
  • 1 year of integrating continuous-integration and continuous-delivery software development life cycles (i.e., CI/CD) into one or more applications (using Jenkins, CircleCI, or other modern CI tools).
  • 3 years of infrastructure and/or system configuration automation technologies (e.g., Terraform, AWS CodeDeploy, Puppet, Ansible, Chef).
  • 3 years of experience in container and orchestration technologies (e.g., Docker, Vagrant, etcd, Consul, ZooKeeper).
  • 3 years of experience with Linux-in-the-cloud, with at least 1 year of “Enterprise Linux” distributions (e.g., RHEL, CentOS, Amazon Linux).
  • 1 year of experience with cloud database operations and deployment (e.g., RDS MySQL, RDS PostgreSQL, Amazon Aurora) and with caching operations and deployment (e.g., Memcached, Redis).
  • 3 years of experience with monitoring applications and infrastructure; familiarity with common monitoring systems (e.g., CloudWatch, Datadog, New Relic, Sumo Logic).
  • Strong problem-solving, root cause understanding, and systems engineering skills.
  • Ability to design and manage escalation response plans — from monitoring, to reaction/response/remediation, to retrospection/post-mortem in culturally-aligned (proactive, customer focused, collaborative, proven-with-data) ways.
  • Demonstrated expertise building and managing highly-scaled production infrastructure in the cloud (AWS required; GCP, Azure, OpenStack a plus).
  • Excellent presentation and communication skills.
  • B.S. Degree in Computer Science (or related technical field, or equivalent industry experience).

Nice to Have:

  • Being able to translate between development, operations, security, product, and management dialects is a highly-sought skill.
  • Ability to translate knowledge and ideas into clear written documentation.
  • Cloud and container-native Linux administration/build/management skills (e.g., AMIs, Packer).
  • Expertise with Lean/Agile deployment processes (e.g., blue/green, zero downtime, canary, and DNS strategies).
  • Being “conversational” in JavaScript/TypeScript, Python, PHP, Ruby, Golang, Java, Bash, Markdown, reStructuredText, HCL, JSON, YAML, and TOML would be valuable. Being fluent in 2-3 of them would be a huge plus.
  • Expertise with software development lifecycle branching and distributed source code management systems (e.g., Git/Mercurial, Git-Flow, GitHub-Flow).
  • A non-trivial background in open source is a huge plus.

** Due to the high volume of applicants, only applicants who provide the following information will receive further consideration for the above open position: first and last name; education (graduation date, degree); authorization to work in the U.S., including C2C or H1 sponsorship transfer request.

To view all our open positions, please go to: http://www.jobs.net/jobs/fastswitch/en-us/all-jobs/