Meet the Team
Cisco’s Collaboration Business Unit empowers people and organizations worldwide to connect, communicate, and innovate seamlessly.
You will collaborate with a global team of software engineers and SREs responsible for delivering world-class collaboration experiences at scale. Our team supports backend services deployed worldwide and works closely with development, product, and operations partners to ensure reliability and performance.
Webex is powering the shift to the hybrid workforce, helping people stay connected in a rapidly evolving digital world. We foster a startup-like culture that values innovation, ownership, and collaboration, while offering the scale and impact of a global technology leader.
Your impact
As a Site Reliability Engineer (SRE) supporting backend services for Cisco’s SaaS collaboration products, you will play a critical role in delivering reliable, scalable, and resilient experiences across calling, messaging, meetings, and contact center solutions. Your work will directly impact the availability, performance, and quality of services used by millions of users globally.
Specifically:
Own the deployment and operation of critical collaboration services across cloud and hybrid environments, driving reliability and scalability.
Design, evolve, and optimize CI/CD pipelines and automation, including AI‑first tooling for deployment, monitoring, and incident response.
Lead incident response for complex production issues, perform root cause analysis, and drive systemic reliability and performance improvements.
Use observability data to guide capacity planning, scaling strategies, and resource optimization across services.
Define and champion operational best practices, documentation standards, and a culture of reliability and operational excellence.
Minimum Qualifications
Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience) with 3–5 years in Site Reliability Engineering, Cloud Operations, or Systems Engineering.
Strong hands-on experience operating production services using Docker and Kubernetes in cloud or hybrid environments.
Proficiency in one or more programming or scripting languages (e.g., Python, Go, Bash) to build automation and operational tooling.
Experience with monitoring, observability, and incident response in production environments, including on-call participation and post-incident reviews.
Working knowledge of Linux systems, networking, distributed systems, CI/CD pipelines, infrastructure-as-code, and Git-based workflows.
Preferred Qualifications