Zoox is transforming mobility-as-a-service by developing a fully autonomous, purpose-built fleet designed for AI to drive and humans to enjoy.
Zoox is seeking a Site Reliability Engineer to help ensure the availability, performance, and resilience of the services that power the development and operation of our autonomous vehicles. In this role, you will own the full lifecycle of our services, from designing fault-tolerant, maintainable systems to deploying, operating, and continuously improving them in production. As a robotics company, Zoox embraces automation at every layer of our infrastructure, and you’ll help drive that ethos forward. You’ll work hands-on with systems that process massive volumes of data and support compute-intensive pipelines running on both CPUs and GPUs.
About Zoox
Zoox is developing the first ground-up, fully autonomous vehicle fleet and the supporting ecosystem required to bring this technology to market. Sitting at the intersection of robotics, machine learning, and design, Zoox aims to provide the next generation of mobility-as-a-service in urban environments. We’re looking for top talent that shares our passion and wants to be part of a fast-moving and highly execution-oriented team.
Accommodations
If you need an accommodation to participate in the application or interview process, please reach out to [email protected] or your assigned recruiter.
A Final Note:
You do not need to match every listed expectation to apply for this position. Here at Zoox, we know that diverse perspectives foster the innovation we need to be successful, and we are committed to building a team that encompasses a variety of backgrounds, experiences, and skills.
In this role, you will:
Design and implement highly scalable and reliable systems to support Zoox's autonomous vehicle platform.
Optimize system performance, reliability, and scalability.
Develop and maintain monitoring, alerting, and reporting systems to ensure proactive identification and resolution of issues.
Collaborate with software engineering teams to improve software architecture, deployment processes, and automation.
Conduct root cause analysis of production issues and implement corrective actions.
Implement disaster recovery and business continuity plans.
Qualifications
5+ years of experience in site reliability engineering or a similar role, with a strong background in working with large-scale distributed systems.
Proven experience with cloud platforms such as AWS, GCP, or Azure.
Expertise in container orchestration technologies like Kubernetes.
Deep understanding of networking, storage, and database technologies.
Strong programming skills in languages such as Python, Go, C/C++, or Java.
Experience with infrastructure as code tools such as Terraform, Ansible, Salt, or CloudFormation.
Bonus Qualifications
Experience in the automotive or autonomous vehicle industry.
Knowledge of security best practices and compliance requirements.