SRE Manager – Cloud Infrastructure, Cloud Platform

Lucid Motors

7200

Newark, CA, United States

Posted on: 2022-11-08

Category: emobility


Expired

Employment type:

Full time

Experience required:

Senior

Salary

Salary not provided

About the company:

We launched Lucid in 2016 to build the world’s best cars and accelerate the shift to clean energy. Our vehicles deliver best-in-class performance *and* efficiency. Equally important, we’re building a world-class team: From our state-of-the-art factory in Arizona to our global headquarters in California’s Silicon Valley, we’re recruiting high-performing people who want to help decarbonize Earth. Join us!

Leading the future in luxury electric and mobility

At Lucid, we set out to introduce the most captivating, luxury electric vehicles that elevate the human experience and transcend the perceived limitations of space, performance, and intelligence. Vehicles that are intuitive, liberating, and designed for the future of mobility.

We plan to lead in this new era of luxury electric by returning to the fundamentals of great design – where every decision we make is in service of the individual and environment. Because when you are no longer bound by convention, you are free to define your own experience.

Come work alongside some of the most accomplished minds in the industry. Beyond providing competitive salaries, we’re providing a community for innovators who want to make an immediate and significant impact. If you are driven to create a better, more sustainable future, then this is the right place for you.

The Cloud Platform team at Lucid is seeking a Site Reliability Engineering Manager for Cloud Infrastructure. In this position, you will lead a team to build and maintain the reliability of the Cloud Platform's underlying Cloud Infrastructure on various public and private Cloud Providers.

Our ideal candidate exhibits a can-do attitude and approaches the work with vigor and determination. We are looking for a hands-on Software Engineering Manager who will collaborate with various stakeholders to build tools and services that uphold the SLA of the Data Analytics Cloud Platform.

At Lucid, we don’t just welcome diversity - we celebrate it! Lucid Motors is proud to be an equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, national or ethnic origin, age, religion, disability, sexual orientation, gender, gender identity and expression, marital status, and any other characteristic protected under applicable State or Federal laws and regulations.

Notice regarding COVID-19 protocols

At Lucid, we prioritize the health and wellbeing of our employees, families, and friends above all else. In response to the novel coronavirus, new Lucid employees whose jobs will be based in the United States may be required to provide original documentation confirming that they have received the prescribed inoculation (doses). Vaccination requirements depend on location and position; please refer to the job description for more details.

Individuals in positions requiring vaccinations may seek a medical and/or religious exemption from this requirement. Such an accommodation may be granted after a formal request is submitted to our dedicated Covid-19 Response team and subsequently reviewed and approved by that team.

To all recruitment agencies: Lucid Motors does not accept agency resumes. Please do not forward resumes to our careers alias or other Lucid Motors employees. Lucid Motors is not responsible for any fees related to unsolicited resumes.

The Job:

Manage and lead the Site Reliability Engineering of cloud services across Lucid Motors.

Collaborate with Service Owners to define SLOs, build SLIs, and ensure the Data Analytics services meet the SLA.

Engage with Developers, DevOps, Data Scientists, and Quality Engineers to build reliable services from design to production.

Build tools and frameworks to automate the monitoring systems to ensure the highest level of uptime in various production-grade environments.

Bring experience with highly available (HA) big-data systems and identify bottlenecks at early stages.

Apply a self-service model by creating tools that enable Data Engineers and Data Scientists to optimize data analytics jobs for load and cost-effectiveness.

Manage Kafka as a streaming platform to run various workloads, introduce observability tools to monitor and inspect pub-sub traffic, and automate system scaling based on the load.

Manage orchestration engines such as Airflow, Kubeflow, and others to run various Data ETL and ML pipeline processes by providing additional frameworks and tools to achieve higher reliability.

Manage the cloud connectivity modules and be able to troubleshoot end-to-end on a private or public cloud Infrastructure.

Use incident management processes and continually look for ways to improve operational efficiency.

Navigate incidents swiftly, perform impact analysis, and take appropriate action.

Understand customer impact and prioritize the workload between feature development and customer support.

Create a 24/7 service availability model to proactively monitor the systems across geographical locations.

Support the reliability aspects of services that use MQTT, Kafka, EMQX, RabbitMQ, Spark, Hive, and other open-source software services.

Bring a track record of hiring and building an SRE organization from the ground up.

Qualifications:

B.S. or M.S. degree in Computer Science, Engineering, OR equivalent work experience

Minimum of 5 years of experience in SRE or DevOps Engineering

2-5 years of experience in managing one or more SRE teams

3-5 years of experience in managing large-scale data analytics platforms that use Spark, Storm, or other similar frameworks

1-3 years of experience deploying and maintaining applications that are built using Docker and orchestrated on Kubernetes on Public or Private Cloud Providers

1-3 years of experience using Cloud Automation tools such as Terraform, Pulumi, Cluster API, or other frameworks

1-3 years of experience in programming or scripting languages such as Python, Go, Bash/Shell, or others

1-3 years of administrative operations knowledge in RDBMS such as Postgres and NoSQL databases such as Cassandra, MongoDB, or others

Show experience in hiring and building high-performing SRE teams

Be detail-oriented, with strong time management and organization skills and a dedication to quality

Experienced with various debugging tools and troubleshooting performance bottlenecks at the infrastructure or application tier

Good to have experience with Config Management and automation using Ansible, Chef, Puppet, or others

Experienced with various Networking challenges and able to resolve networking bottlenecks at peak load

Good to have knowledge of REST-based APIs and how to triage request-response issues
