Zoox is transforming mobility-as-a-service by developing a fully autonomous, purpose-built fleet designed for AI to drive and humans to enjoy.
Zoox is looking for an experienced Software Engineer to work on key new frameworks and infrastructure modernization for our custom High-Performance Computing (HPC) infrastructure and its supporting ecosystem of tools and services. Zoox HPC services combine industry-best scheduling and workload orchestration technologies, such as Ray.io and SLURM, with value-add workflows specifically for Autonomous Vehicle development. These HPC services form the backbone of development workflows across all Zoox software teams, from data engineering, to training our AI models in Perception, Planner, and Prediction, to simulation, and more. You will take on a breadth of end-to-end responsibilities, including distributed system design, algorithmic job scheduling, and adaptive cloud scaling, in support of all of Zoox’s computational needs.
The position comes with a high degree of independence and the opportunity to help define Zoox’s compute scaling strategy, both technically and organizationally. You will work closely with stakeholders in Autonomy and Software teams to iterate on world-class developer experiences, incorporating the latest industry tools and best practices.
Compensation
There are three major components to compensation for this position: salary, Amazon Restricted Stock Units (RSUs), and Zoox Stock Appreciation Rights. The salary range for this position is $210,000 to $275,000. A sign-on bonus may be offered as part of the compensation package. Compensation will vary based on geographic location and level. Leveling, as well as positioning within a level, is determined by a range of factors, including, but not limited to, a candidate's relevant years of experience, domain knowledge, and interview performance. The salary range listed in this posting is representative of the range of levels Zoox is considering for this position.
Zoox also offers a comprehensive package of benefits including paid time off (e.g. sick leave, vacation, bereavement), unpaid time off, Zoox Stock Appreciation Rights, Amazon RSUs, health insurance, long-term care insurance, long-term and short-term disability insurance, and life insurance.
In this role, you will:
Evaluate new distributed system paradigms and technologies to meet Zoox’s ever-growing computational and storage needs.
Strike a balance between incremental improvements to Zoox’s existing in-house HPC infrastructure and greenfield services and abstractions.
Create production-grade web service APIs, SDKs, and other tools to provide a world-class developer experience for all of Zoox’s software teams.
Qualifications
7+ years of experience
Experience with Ray.io, particularly Ray Core and Ray Data
Experience with Kubernetes, particularly for heterogeneous workloads and clusters
Experience with Ray.io and Kubernetes deployed on Amazon Web Services (AWS) or other similar cloud providers such as Azure or GCP
Proficiency with Python
Bonus Qualifications
Exposure to machine learning workloads (training, inference, data generation, etc.) from a compute infra service provider perspective
Experience with Kubernetes or SLURM at scale (10k+ nodes)
Experience with the SLURM workload manager