Zoox is transforming mobility-as-a-service by developing a fully autonomous, purpose-built fleet designed for AI to drive and humans to enjoy.
We are seeking an experienced and highly skilled data scientist to join the Perception Data and Labeling team. The team is responsible for the training and evaluation data powering the perception (vision, lidar, and other modalities) ML models at Zoox. The candidate will work alongside data ops partners, ML engineers, software developers, and data engineers to improve model performance through high-quality human- and auto-labeled data.
There are three major components to compensation for this position: salary, Amazon Restricted Stock Units (RSUs), and Zoox Stock Appreciation Rights. A sign-on bonus may be offered as part of the compensation package. The listed range applies only to the base salary. Compensation will vary based on geographic location and level. Leveling, as well as positioning within a level, is determined by a range of factors, including, but not limited to, a candidate's relevant years of experience, domain knowledge, and interview performance. The salary range listed in this posting is representative of the range of levels Zoox is considering for this position.
Zoox also offers a comprehensive package of benefits, including paid time off (e.g. sick leave, vacation, bereavement), unpaid time off, Zoox Stock Appreciation Rights, Amazon RSUs, health insurance, long-term care insurance, long-term and short-term disability insurance, and life insurance.
In this role, you will:
Define and implement scalable data quality measures across complex, multimodal data labeling pipelines
Drive data-centric ML model improvements to achieve critical Zoox milestones
Support an org-wide data ontology and class structure for perception models
Determine trade-offs and integrations between human-labeled, human-in-the-loop, and zero-shot autolabeled data
Build metrics to quantify labeling throughput, capacity, and annotator/vendor quality

Qualifications:
Master's or PhD degree in a field relevant to autonomous driving (computer science, robotics), to the analysis of human data (computational neuroscience, cognitive science), or a related field
Proficient in using data query languages (SQL and/or Spark/Scala) to quickly build complex yet efficient data queries at scale, and in using Python to build production-quality code
Proficient in exploratory data analysis (EDA) and data visualization to understand and present trends and their implications for the business
Background in statistical modeling and analysis, including experience making data-driven decisions that connect point and uncertainty estimates to business impact
Experience with data-centric ML development and data curation

Bonus Qualifications:
Experience with experiment design and statistical comparisons (A/B testing, parametric/non-parametric statistics, etc.)
Experience with human data collection, including annotation task design