
Homebase
Job Description:
Our Site Reliability team is expanding. Can you help ensure the stability, resilience, and scale of our services through automation, observability, and engineering practices? We value expertise in system operations, cloud infrastructure, pipeline engineering, software development, and performance testing to make sure our services are operating optimally. The successful candidate will be able to identify problems and suggest enhancements using data from metrics, logs, and traces, and then collaborate with product teams to deliver improvements.
Job Requirements:
The ideal candidate will strive for continual improvement by contributing new ideas and assessing innovations. You will work closely with development squads and be jointly responsible for the health of our sites and services. The ideal candidate will have experience with some or all of the following:
- Monitoring tools and instrumentation, such as Datadog or similar observability platforms
- AWS expertise, including familiarity with core services
- Software development or strong scripting experience
- PagerDuty, Slack, and related tooling integrations
- Good understanding of traditional operations areas: Linux, storage, networking
- Good familiarity with Docker and Kubernetes
- Continuous delivery, build pipelines, artefact repositories, zero-downtime deployment
- Experimentation strategies, A/B testing, canary releases
- Proving resilience and scalability using load and stress testing
- Experience with any of the following: Elasticsearch, PostgreSQL, Redis
- CDNs and strong knowledge of web delivery protocols
- Experience of incident management and leading post-mortems
- Some understanding of iOS or Android is also beneficial
Job Details:
Company: Homebase
Vacancy Type: Full Time
Job Location: Northampton, GB
Application Deadline: N/A