Senior Software Engineer, Site Reliability Engineering, Cloud IRT
Minimum qualifications:
- Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
- 5 years of experience with software development in one or more programming languages.
- 3 years of experience in designing, analyzing, and troubleshooting large-scale distributed systems.
- 2 years of experience leading projects and providing technical leadership.
- Experience troubleshooting production incidents as part of an on-call rotation.
Preferred qualifications:
- Master's degree in Computer Science or Engineering.
- Experience with telemetry systems and with incident and risk management.
- Ability to work across organizational boundaries.
- Excellent systematic problem-solving approach, coupled with effective communication skills and a sense of drive.
About the job
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both our internally critical systems and our externally visible ones, have reliability and uptime appropriate to customers' needs and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance.
As a Senior Software Engineer, Site Reliability Engineering, Cloud Incident Response Team (IRT), you will respond to major incidents across all of Google Cloud Platform and help coordinate, mitigate, or resolve them. Our rotation includes a community of some of the most experienced and executive engineers in Cloud Site Reliability Engineering. You will build the processes, systems, and tooling necessary to deliver customer-focused mitigations for critical Google Cloud Platform incidents.
Behind everything our users see online is the architecture built by the Technical Infrastructure team to keep it running. From developing and maintaining our data centers to building the next generation of Google platforms, we make Google's product portfolio possible. We're proud to be our engineers' engineers and love voiding warranties by taking things apart so we can rebuild them. We keep our networks up and running, ensuring our users have the best and fastest experience possible.
Responsibilities
- Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
- Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
- Build systems and tooling to support the Cloud IRT team, improving visibility into the state of Cloud, detection of large-scale issues, and communications to customers, stakeholders, and customer-facing teams.
- Participate in an on-call rotation supporting critical incident response for Google Cloud Platform (GCP).
Additional Information
- Published: 2026-06-24T08:38:13.744Z
- URL: https://careers.google.com/jobs/results/116674326396052166-senior-software-engineer/
- Job type: FULL_TIME
- Employer:
- Language code: en-US
- Remote: onsite
- Is remote: No
- Is hybrid: No
- City: London
- Country: UK