Senior Site Reliability Engineer in Columbus, OH at Fast Switch

Date Posted: 12/10/2019

Job Snapshot

  • Location:
    Columbus, OH
  • Experience:
    At least 5 years

Job Description

Job ID: 52985

Senior Site Reliability Engineer. Our client in Columbus, Ohio has a contract need for a Senior Site Reliability Engineer who will be working within a collaborative, flexible environment, helping to push technology forward. The position will be responsible for accelerating product development while providing solutions that enhance the availability, scalability, security and reliability of our client's platforms and supporting systems.

You are encouraged to bring fresh ideas and new perspectives to the position. If you are looking for opportunities to lead and expand your skill set, as well as leverage your expertise in systems and applications to shape the future of products, this may be the position for you.

Responsibilities:


  • Share a 24x7 on-call rotation with your team and respond to service incidents
  • Support the development team with tools, services and information to enhance productivity and manage risk
  • Design, implement and support non-production and production solutions focused on scalability, reliability, security and availability
  • Implement world-class monitoring, alerting, and self-healing capabilities by applying DevOps best practices and honoring SLIs, SLOs and SLAs
  • Diagnose and resolve issues across the entire stack (application, queues, operating system, network, hardware), including cross-application dependencies
  • Lead RCA efforts with an eye towards actionable outcomes and continuous learning
  • Develop and support CI/CD processes for applications and services
  • Use automation to eliminate toil, reduce time to action and repair services
  • Uncover and address weaknesses in our complex systems through chaos engineering

Required skills:

  • 5+ years of experience building and supporting highly available systems in physical or cloud-based hosting locations (e.g. AWS, Azure)
  • 3+ years of development experience in at least one of these languages: Java, C#, Go, Python
  • Extensive experience with Linux
  • BS degree in computer science or a related field, or equivalent experience
  • Experience designing, implementing and operating distributed systems at scale
  • Experience identifying and resolving high-severity, time-sensitive issues and outages across large, distributed, multi-service environments 
  • Experience with distributed caching strategies (e.g. Redis, Couchbase, ElastiCache)
  • Knowledge of authentication and authorization standards (e.g. SAML, OAuth, OpenID, JWT)
  • Knowledge of client-side technologies and troubleshooting (e.g. HTML5, CSS, modern JS frameworks/libraries, CDN delivery)
  • Demonstrated experience in a 24x7 on-call team supporting a customer-facing production environment
  • Strong scripting experience, and a developer's mindset towards system administration (always looking to automate manual tasks and eliminate toil)
  • Experience with application and infrastructure security tools (e.g. WhiteHat, Veracode)
  • Strong experience with automation technologies (e.g. Puppet, Terraform, Ansible, CloudFormation, Chef)
  • Experience with common middleware (e.g., Apache, NGINX, IIS, Tomcat, JBoss)
  • Experience with SQL databases (e.g., PostgreSQL, Oracle, MySQL)
  • Knowledge of networking concepts and application protocols, especially TCP/IP, VPN, BGP, HTTPS and DNS
  • Experience using log management tools such as Splunk, Sumo Logic and/or ELK
  • Experience with APM tools such as New Relic, Datadog and Dynatrace
  • Experience with build and release process technologies (e.g. Git, GitHub, GitFlow, Sonar, CI/CD)
  • Expertise with various deployment processes (Blue/Green, ZDT, Canary, LB/DNS strategies)
  • Experience with AWS technologies is a strong plus (e.g. VPC, EC2, S3, EBS/EFS, ELB/NLB/ALB, RDS, DynamoDB, SQS, SNS, CloudWatch, and Route 53)
  • Experience with container technologies is a strong plus (e.g. Docker, k8s, ECS, Packer)  

Client's ecosystem includes:

  • Angular, Java, Go, Node, PHP
  • MySQL, Oracle, PostgreSQL, SQL Server
  • ElastiCache, Elasticsearch, Couchbase, Solr, Redis
  • SQS, SNS, RabbitMQ
  • SAML, AES, OAuth, JWT, OpenID
  • Linux, Windows
  • Apache, Nginx, IIS, Tomcat
  • Sumo Logic, New Relic, Datadog, PagerDuty, CloudWatch
  • Git, GitHub Enterprise, CircleCI, Jenkins, Terraform, CloudFormation, Puppet
  • VMware, EC2, AWS ECS, K8s
  • AWS Cloud & Physical Datacenters 

You are a good fit if:

  • You are passionate about, and adept at, software development and systems engineering/operations
  • You are continuously learning about application scalability, availability, reliability, and security
  • You are intensely curious about how complex distributed systems operate and fail at scale
  • You think freely and independently, and are ready to share your view
  • You are eager to learn from mistakes and to socialize the lessons learned
  • You like to take ownership of infrastructure components and lead projects