A leading technology firm is seeking an experienced Quality Manager in Sunnyvale, California. The role requires a minimum of two years' experience and proficiency in project management frameworks such as PMP and ITIL. Responsibilities include conducting risk reviews, managing system requirements evaluations, and ensuring compliance with quality assurance practices. The firm offers an equal opportunity workplace and values diversity in its hiring practices.
$124k-184k yearly est. 1d ago
Low-Voltage Reliability Engineer for EV Electronics
Rivian 4.1
Palo Alto, CA jobs
A leading electric vehicle manufacturer in Palo Alto is seeking a Design-for-Reliability Engineer to enhance the reliability of low voltage electronics in their vehicles. The role involves monitoring product performance, utilizing statistical methods, and collaborating with manufacturing teams. The ideal candidate has a Bachelor's degree in Engineering and over five years in reliability engineering. Salaries are competitive, ranging from $146,900 to $194,610 based on experience.
$146.9k-194.6k yearly 2d ago
Site Reliability Engineer - Observability
Rivian 4.1
Palo Alto, CA jobs
About Us
Rivian and Volkswagen Group Technologies is a joint venture between two industry leaders with a clear vision for automotive's next chapter. From operating systems to zonal controllers to cloud and connectivity solutions, we're addressing the challenges of electric vehicles through technology that will set the standards for software-defined vehicles around the world.
The road to the future is uncharted. By combining our expertise across connectivity, AI, security and more, we'll map a new way forward. Working together, we'll create a future that's more connected, more intelligent, more sustainable for everyone.
Role Summary
We are seeking a Senior Site Reliability Engineer (SRE) specializing in Observability to join RivianVW's Data Platform - Production Engineering team. In this role, you will design, implement, and scale robust observability systems to ensure the health, performance, and reliability of our production environment. You will collaborate closely with cross-functional teams to create telemetry solutions that provide actionable insights into our distributed systems.
Responsibilities
Observability Platform Design: Architect, implement, and maintain observability systems, leveraging tools like Datadog, LGTM stack, OpenTelemetry, and Vector to enable real-time performance monitoring, logging, and alerting.
Telemetry Optimization: Evolve and scale telemetry pipelines to ensure low latency and high availability for metrics, logs, and traces across multi-cloud environments.
Performance Engineering: Proactively identify performance bottlenecks, optimize systems, and provide recommendations for reliability improvements.
Scalable Automation: Implement automation solutions to scale systems sustainably while driving improvements in reliability and deployment velocity.
Incident Management: Collaborate with the incident response team to establish data-driven debugging and troubleshooting processes using observability data.
Tooling Development: Create and maintain self-service observability tools and dashboards to empower teams across the organization.
Cross-functional Collaboration: Partner with development, DevOps, and infrastructure teams to define SLOs/SLIs and ensure observability is embedded throughout the software lifecycle.
Qualifications
Educational Background: Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
Experience: 5+ years in Site Reliability Engineering or a related role with a strong emphasis on observability.
Technical Expertise:
Proficiency in designing and operating observability platforms with tools like Prometheus, Grafana, Loki, Jaeger, or Datadog.
Experience with OpenTelemetry and distributed tracing in microservices architectures.
Deep knowledge of Kubernetes (e.g., EKS), ArgoCD, and Crossplane.
Programming Skills: Strong proficiency in Python, Go, or similar languages for building automation and custom telemetry solutions.
Cloud & Systems: Familiarity with multi-cloud setups, containerization (Docker), and Linux system fundamentals.
Soft Skills: Exceptional problem-solving, communication, and a data-driven approach to decision-making.
Pay Disclosure
Salary Range/Hourly Rate for California Based Applicants: $146,900 - $194,610 USD
Actual Compensation will be determined based on experience, location, and other factors permitted by law.
Benefits Summary: Rivian and Volkswagen Group Technologies provides robust medical, prescription, dental and vision insurance packages for full-time employees, their spouse or domestic partner, and their children up to age 26. Coverage is effective on the first day of employment.
Equal Opportunity
Rivian and Volkswagen Group Technologies is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, sex, sexual orientation, gender, gender expression, gender identity, genetic information or characteristics, physical or mental disability, marital/domestic partner status, age, military/veteran status, medical condition, or any other characteristic protected by law. We are also committed to ensuring compliance with all applicable fair employment practice laws regarding citizenship and immigration status.
Rivian and Volkswagen Group Technologies is committed to ensuring that our hiring process is accessible for persons with disabilities. If you have a disability or limitation, such as those covered by the Americans with Disabilities Act, that requires accommodations to assist you in the search and application process, please email us at candidateaccommodations@rivian.com.
Candidate Data Privacy
Rivian and VW Group Technologies ("Rivian and Volkswagen Group Technologies") may collect, use and disclose your personal information or personal data (within the meaning of the applicable data protection laws) when you apply for employment and/or participate in our recruitment processes ("Candidate Personal Data"). This data includes contact, demographic, communications, educational, professional, employment, social media/website, network/device, recruiting system usage/interaction, security and preference information. Rivian and Volkswagen Group Technologies may use your Candidate Personal Data for the purposes of (i) tracking interactions with our recruiting system; (ii) carrying out, analyzing and improving our application and recruitment process, including assessing you and your application and conducting employment, background and reference checks; (iii) establishing an employment relationship or entering into an employment contract with you; (iv) complying with our legal, regulatory and corporate governance obligations; (v) recordkeeping; (vi) ensuring network and information security and preventing fraud; and (vii) as otherwise required or permitted by applicable law.
Rivian and Volkswagen Group Technologies may share your Candidate Personal Data with (i) internal personnel who have a need to know such information in order to perform their duties, including individuals on our People Team, Finance, Legal, and the team(s) with the position(s) for which you are applying; (ii) Rivian and Volkswagen Group Technologies affiliates; and (iii) Rivian and Volkswagen Group Technologies' service providers, including providers of background checks, staffing services, and cloud services.
Rivian and Volkswagen Group Technologies may transfer or store internationally your Candidate Personal Data, including to or in the United States, Canada, and the European Union and in the cloud, and this data may be subject to the laws and accessible to the courts, law enforcement and national security authorities of such jurisdictions.
Please see our Candidate Data Privacy Notice (English) and Candidate Data Privacy Notice (Serbian) for more information.
Please note that we are currently not accepting applications from third party application services.
$146.9k-194.6k yearly 5d ago
Frontend Leaning Full Stack Engineer
Y Combinator 4.2
San Francisco, CA jobs
Y Combinator is run by a small team committed to helping founders build the next Airbnb, Stripe, Reddit, or Doordash. We work together on the same San Francisco campus where we run each YC batch, and we operate using the same principles we teach our startups. If you've read Paul Graham's essays or watched YC's videos, you already have a sense of what it's like to work here: fast-moving, product-led, and focused on helping founders succeed. Working at YC puts you right in the center of the startup world.
About the role
YC relies heavily on software to run the batch and support thousands of alumni - and much of that software is deeply frontend-driven, used all day by founders, partners, and investors. The Bookface team builds YC's most important user-facing tools, and nearly everything you work on will shape someone's experience in the batch or alumni community. Current projects include:
Internal Software. Build and improve the internal tools that keep YC running at scale - from productivity software for Group Partners who support thousands of companies, to systems that help finance and legal teams process high-volume operational workflows.
Bookface. The primary interface founders use throughout the batch - from event schedules to office hours to pitch prep. You'll be shaping UX and UI that founders interact with every single day.
Investor Tools. A critical frontend experience used by thousands of investors. Your work helps founders and investors connect with one another and supports the raising of billions of dollars in seed funding.
About the team
Our whole software team is only 15 full-stack product engineers, and we enjoy working in a small team with high impact and knowing each other by name. We have a broad range of experiences from bigger companies like Meta and Google, and many of us have started startups ourselves. True to YC advice, our product engineers talk to our customers regularly and ship fast. We also define our own roadmap and often design our own products when needed.
Our stack is pretty straightforward (Rails, React, Postgres), and the last three engineers have learned it on the job. For this particular role, you will lean heavily toward owning the frontend: crafting fast, intuitive interfaces, bringing new product ideas to life in React, and shaping the quality bar for everything users touch. In addition, the further down the stack you can go (creating API endpoints, server-side controllers, and database schemas/migrations), the better you'll be able to own entire products end-to-end.
Location: This is an in-person role at YC's campus in Dogpatch, San Francisco. This is where our users (founders, partners, and employees) are five days a week, so it's optimal for you to be here with them. You must live in the SF Bay Area or be willing to relocate. We offer generous relocation support for those who want to move to SF to work here.
Compensation: $185K to $300K base salary, depending on experience. We also offer carry in the YC fund, which gives you potential upside in the YC investment fund (the investment equivalent of upside at a startup).
Benefits: Our full benefits package includes medical, vision, and dental plans, infertility benefit, STD/LTD, life insurance, commuter benefits, flexible spending account, health savings account, 401(k) + 4% matching, generous parental leave, paid holidays, and flexible paid time off policy.
Work Authorization: Y Combinator is willing to sponsor certain employment visas in accordance with company policy.
Legal note: Y Combinator considers qualified applicants with criminal histories, consistent with applicable federal, state, and local law, including San Francisco's Fair Chance Ordinance. Y Combinator is committed to protecting the privacy of the personal information of job applicants and complying with the California Consumer Privacy Act. The privacy policy of Ashby, Inc., the hiring platform used by Y Combinator, governs the collection of such data and can be found here.
A dynamic technology company in San Francisco is seeking a Senior Platform Engineer to influence the technical direction and build robust platform solutions. The ideal candidate has over 10 years of experience in software and platform engineering, with expertise in Terraform and Kubernetes. This role requires collaboration across teams to enhance reliability, scalability, and security of systems. The position offers a competitive salary between $180,000 and $275,000, comprehensive benefits, and a supportive work environment.
$180k-275k yearly 4d ago
Senior+ Site Reliability Engineer
Crusoe Energy Systems LLC 4.1
San Francisco, CA jobs
Crusoe's mission is to accelerate the abundance of energy and intelligence. We're crafting the engine that powers a world where people can create ambitiously with AI - without sacrificing scale, speed, or sustainability.
Be a part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that's setting the pace for responsible, transformative cloud infrastructure.
About This Role:
Crusoe is building the most reliable, energy-efficient, AI-optimized cloud platform - and operational excellence is at the heart of that mission. As a Site Reliability Engineer focused on Operational Excellence, you will help ensure the stability, resilience, and performance of Crusoe's GPU cloud.
This role is ideal for engineers who thrive in fast-paced environments, enjoy solving operational problems, and want to grow their technical career while supporting incident response, reliability, and continuous improvement across a large-scale distributed platform.
You'll partner closely with senior SREs, infrastructure engineers, and platform teams to improve reliability, reduce operational toil, and strengthen Crusoe's incident management practices.
What You'll Be Working On:
Collaborate with cross-functional teams to define and refine availability metrics for Crusoe's cloud infrastructure, including establishing, tracking, and improving SLIs and SLOs.
Assist in incident response by identifying, diagnosing, and resolving service disruptions, and support post-incident processes through RCA documentation and participation in post-incident reviews.
Build, operate, and monitor infrastructure health using Crusoe's observability stack (Prometheus, Grafana, Alertmanager, OpenTelemetry).
Identify and communicate reliability risks, performance bottlenecks, and early indicators of potential incidents that could impact service availability.
Develop automation and tooling to reduce operational toil, minimize manual intervention, and enhance service recovery and self‑healing capabilities.
Partner with compute, network, storage, and platform teams to improve service resilience and strengthen disaster recovery readiness.
Contribute to knowledge sharing, process improvements, and the development of operational best practices across the organization.
Participate in ongoing training, mentorship, and professional development to grow into advanced SRE responsibilities.
What You'll Bring to the Team:
5+ years of experience in cloud operations, SRE, or related roles
Understanding of cloud platforms and infrastructure fundamentals (Kubernetes, AWS/GCP, virtualization, distributed systems)
Familiarity with incident management practices and operational frameworks (SRE/ITIL/etc.)
Experience with monitoring and alerting tools (Prometheus, Grafana) or a strong willingness to learn
Familiarity with infrastructure-as-code and configuration management tools such as Terraform and Ansible
Basic scripting and automation experience (Go, Python, C, C++, or similar)
Strong communication skills, with the ability to clearly articulate technical issues to diverse stakeholders
Ability to stay calm, focused, and effective in fast-moving or high-pressure situations
A growth mindset with enthusiasm for operational excellence, reliability engineering, and continuous improvement
Bonus Points:
Experience with Kubernetes, container orchestration, or large-scale distributed systems
Exposure to change management, operational readiness reviews, or structured RCAs
Familiarity with self‑healing systems, automated remediation, or event‑driven operations
Interest in scaling AI/HPC infrastructure and solving reliability challenges in GPU‑heavy environments
Passion for learning, mentorship, and developing deeper SRE capabilities over time
Benefits:
Industry competitive pay
Restricted Stock Units in a fast-growing, well‑funded technology company
Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
Employer contributions to HSA accounts
Paid Parental Leave
Paid life insurance, short-term and long-term disability
Teladoc
401(k) with a 100% match up to 4% of salary
Generous paid time off and holiday schedule
Cell phone reimbursement
Tuition reimbursement
Subscription to the Calm app
MetLife Legal
Company-paid commuter benefit: $300 per month
Compensation:
Compensation will be paid in the range of $172,000 - $209,000 + Bonus. Restricted Stock Units are included in all offers. Compensation to be determined by the applicant's education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.
Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.
A leading AI research company based in San Francisco is seeking experienced reliability engineers to scale their infrastructure and ensure system performance and reliability. This role involves collaborating with diverse teams to develop resilient systems and enhance operations. Candidates should have strong cloud proficiency, experience in containerization technologies, and a bachelor's degree in a related field.
$127k-176k yearly est. 2d ago
Site Reliability Engineer, Frontier Systems Infrastructure
OpenAI 4.2
San Francisco, CA jobs
About the Team
The Frontier Systems team at OpenAI builds, launches, and supports the largest supercomputers in the world that OpenAI uses for its most cutting-edge model training.
We take data center designs, turn them into real, working systems and build any software needed for running large-scale frontier model trainings.
Our mission is to bring up and stabilize these hyperscale supercomputers and keep them reliable and efficient during the training of frontier models.
About the Role
We are looking for engineers to operate the next generation of compute clusters that power OpenAI's frontier research.
This role blends distributed systems engineering with hands-on infrastructure work in our largest data centers. You will grow Kubernetes clusters to massive scale, automate bare-metal bring-up, and build the software layer that hides the complexity of a vast number of nodes across multiple data centers.
You will work at the intersection of hardware and software, where speed and reliability are critical. Expect to manage fast-moving operations, quickly diagnose and fix issues when things are on fire, and continuously raise the bar for automation and uptime.
You might thrive in this role if you:
Have deep experience operating or scaling Kubernetes clusters or similar container orchestration systems in high-growth or hyperscale environments
Bring strong programming or scripting skills (Python, Go, or similar) and familiarity with Infrastructure-as-Code tools such as Terraform or CloudFormation
Are comfortable with bare-metal Linux environments, GPU hardware, and large-scale networking
Enjoy solving fast-moving, high-impact operational problems and building automation to eliminate manual work
Can balance careful engineering with the urgency of keeping mission-critical systems running
Qualifications
Experience as an infrastructure, systems, or distributed systems engineer in large-scale or high-availability environments
Strong knowledge of Kubernetes internals, cluster scaling patterns, and containerized workloads
Proficiency in cloud infrastructure concepts (compute, networking, storage, security) and in automating cluster or data center operations
Bonus: background with GPU workloads, firmware management, or high-performance computing
About OpenAI
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.
We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic.
For additional information, please see OpenAI's Affirmative Action and Equal Employment Opportunity Policy Statement.
Background checks for applicants will be administered in accordance with applicable law, and qualified applicants with arrest or conviction records will be considered for employment consistent with those laws, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act, for US-based candidates. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.
To notify OpenAI that you believe this job posting is non-compliant, please submit a report through this form. No response will be provided to inquiries unrelated to job posting compliance.
We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.
OpenAI Global Applicant Privacy Policy
At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
$127k-176k yearly est. 1d ago
Reliability/DFX Engineer
OpenAI 4.2
San Francisco, CA jobs
About the Team
OpenAI's Hardware organization develops silicon and system-level solutions designed for the unique demands of advanced AI workloads. The team is responsible for building the next generation of AI-native silicon while working closely with software and research partners to co-design hardware tightly integrated with AI models. In addition to delivering production-grade silicon for OpenAI's supercomputing infrastructure, the team also creates custom design tools and methodologies that accelerate innovation and enable hardware optimized specifically for AI.
About the Role
We are seeking a highly skilled cross-stack engineer with deep expertise in making ML systems reliable at scale. This hands-on individual contributor will sit within our hardware team and work closely with chip design, platform design, hardware health, and the broader industry ecosystem to architect, implement, and deploy reliable next-generation AI accelerator systems. This engineer will evaluate system and chip architecture holistically, identify high-ROI opportunities to improve reliability and availability across the stack, and translate those opportunities into strategy and silicon features.
In this role, you will
Oversee DFX architecture, implementation, and execution in silicon from concept to high-volume deployment, and propose high-ROI features to enhance reliability and fault tolerance. DFX includes design for testability, reliability, availability, and serviceability of high-performance AI hardware.
Build system-level reliability models grounded in empirical data to guide organization-wide DFX and reliability strategy. This requires a detailed understanding of chip and system architecture, design, implementation, and component-level reliability.
Collaborate with chip and platform architecture/design teams to explore and implement DFX features, including the specification and implementation of digital/mixed-signal IP, firmware/system software, and DFX methodology (in partnership with engineering teams).
Partner with hardware health and platform design teams to continuously improve reliability and fault tolerance in NPI and HVM. This includes optimizing operating conditions, designing experiments, and performing data analysis to drive continuous, data-driven improvements across the stack.
Serve as the DFX/reliability champion and evangelist to align the broader industry ecosystem with OpenAI's requirements and roadmap.
Qualifications
BS with 15+ years, MS with 10+ years, or PhD with 3+ years of relevant industry experience focused on reliability across the chip/platform stack.
Hands-on experience with RTL design and DFT is required; physical implementation and/or silicon ATE experience is preferred.
Detailed understanding of ML chip and platform architecture and ML workload characteristics is required.
Strong fundamentals in reliability modeling, with hands-on skills in empirical data analysis.
About OpenAI
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.
We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic.
For additional information, please see OpenAI's Affirmative Action and Equal Employment Opportunity Policy Statement.
Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable law, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.
To notify OpenAI that you believe this job posting is non-compliant, please submit a report through this form. No response will be provided to inquiries unrelated to job posting compliance.
We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.
OpenAI Global Applicant Privacy Policy
At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
$127k-176k yearly est. 4d ago
Reliability Engineer
Medium 4.0
Palo Alto, CA jobs
Pivotal is the leader in the emerging market of electric Vertical Takeoff and Landing (eVTOL) aircraft. We design, develop, and manufacture light eVTOL aircraft and are renowned for the BlackFly, the first light eVTOL to fly manned missions and enter the consumer market.
Efficient, compact, and simple, Pivotal vehicles are designed for a wide range of consumer, public service, and defense applications. Our distinctive tilt‑aircraft architecture and scalable platform have been in development for over 10 years in preparation for entering the market. We recently announced our next‑generation aircraft, the Helix, planned for general release and scalable production in 2025.
Mobility is one of the most highly‑valued areas of technology investment today. This is the right company, in the right space, with the right strategy, at the right time. We invite you to join our amazing team and grow with us.
We are seeking a Reliability Engineer to support the development of our cutting‑edge eVTOL aircraft from concept through production. In this position, you will be responsible for ensuring the reliability, availability, and maintainability (RAM) of aircraft systems required to meet or exceed safety, performance, and regulatory requirements. Your work will directly impact the safety, operational efficiency, and customer experience of our revolutionary air mobility platform.
As a Reliability Engineer, you will develop, lead, and maintain reliability requirements and system reliability in conjunction with the engineering leads. You will work seamlessly across functions and understand the systems engineering approach and execution of a complex system development effort. As the ideal candidate, you are comfortable with a range of tools, including Relyence for reliability, JAMA for requirements, and a variety of project management tools such as GitLab and MS Project. You are tenacious, drive accountability, and leverage the collective smarts of teams to find better solutions; you are committed to “finding a way”.
Ideally, you also bring a passion for technology, aviation, and the future of personal flight!
$82,500 - $187,500 a year
Applicants must be eligible for employment in the United States and willing to work onsite at our HQ office in Palo Alto, CA.
Pivotal offers a comprehensive benefits package, including medical, dental, vision, and 401k plans.
Pivotal is an Equal Opportunity Employer. Pivotal does not discriminate on the basis of race, religion, color, sex, gender identity, sexual orientation, age, non‑disqualifying physical or mental disability, national origin, veteran status or any other basis covered by appropriate law. All employment is decided on the basis of qualifications, merit, and business need.
Responsibilities
Lead the development and execution of Reliability Program Plans
Work closely with Engineering leads to support the Failure Modes and Effects Analysis (FMEA) on each subsystem
Perform Fault Tree Analysis (FTA), Reliability Block Diagrams (RBDs) and Weibull analysis for each aircraft system
Perform reliability & failure analysis and qualification for product launch
Work closely with engineering teams to assess the reliability of a product and use it to improve product performance and inform new product design
Assist with the management, validation, and document verification of requirements at the system and subsystem levels
Assist with managing requirements traceability and evolution through the system development cycle
Identify, analyze, and resolve system design weaknesses
Influence the shaping of future products by contributing to the framework (architecture) used across multiple products or systems
Collaborate with engineers in compliance analysis, system interface definitions, and associated requirements definition.
Ensure “white spaces” in verification and validation plans or results are quickly resolved while keeping the project on schedule
Engage with our test team on verification & validation (V&V) test plans to ensure comprehensive coverage of requirements
Qualifications
Bachelor's degree in Mathematics, Engineering/Computer Science, Electrical Engineering, Aerospace, or Mechanical Engineering, or equivalent combination of education, training, and experience
5+ years of experience using a reliability analysis tool (Relyence preferred)
3+ years of experience in a systems-type product development environment involving requirements development, FMEA, compliance standards, and V&V test planning
6+ years of combined experience in reliability and systems engineering
Familiarity with FAA Part 103, Part 23, DO-160, and MIL-STD-810
Ability to understand specifications, test plans, procedures, and reports as they relate to reliability
Exhibits effective problem‑solving, analytical, and interpersonal skills
Consistently demonstrates teamwork and collaboration, and puts the success of the team above one's own interests
Excellent verbal and written communication skills
Preferred Qualifications
Taking a product from concept to production with contract manufacturers is a plus
Exposure to Guidance, Navigation, and Control (GNC) systems is highly desired
Experience with Modeling, Simulation & Control and Model-Based Systems Engineering is helpful but not required
Knowledge of Simulink, Ansys, and SolidWorks preferred
Proficiency with C/C++ and MATLAB
Proficiency with office software and computer-based productivity tools is a must
Attributes aligned with Core Values
Demonstrates a proactive safety mindset by embedding safety into daily operations, identifying and mitigating risks through assessments and training, encouraging open dialogue on safety concerns, and continuously improving protocols to ensure a safe work environment.
Puts customers at the center of every action by deeply understanding their challenges, delivering exceptional value, and striving to exceed expectations to support their success as our core purpose.
Actively seeks and values diverse stakeholder perspectives, builds cross‑functional relationships, and fosters trust through empathetic, fact‑based communication.
Drives results with clarity and purpose by focusing on what matters most, adapting to change, taking initiative, and owning outcomes while aligning actions with a clear understanding of success at every level.
Navigates ambiguity with resilience and bold thinking by challenging the status quo and combining innovative ideas with practical best practices to overcome obstacles and drive progress.
Fosters a high‑performance culture grounded in respect and professionalism, consistently balancing high expectations with a healthy, collaborative environment.
Is always a trusted, dependable teammate.
$82.5k-187.5k yearly 5d ago
Reliability Engineer - eVTOL Systems RAM (Onsite)
Medium 4.0
Palo Alto, CA jobs
A pioneering aerospace company based in Palo Alto is looking for a Reliability Engineer to support the development of innovative electric Vertical Takeoff and Landing (eVTOL) aircraft. You will lead reliability analysis and ensure all systems meet safety and performance standards. Applicants should have a Bachelor's degree in a relevant field and significant experience in reliability analysis. This role offers a competitive salary range of $82,500 - $187,500 and a comprehensive benefits package.
$82.5k-187.5k yearly 5d ago
Production Engineer, Storage
Crusoe Energy Systems LLC 4.1
San Francisco, CA jobs
Crusoe's mission is to accelerate the abundance of energy and intelligence. We're crafting the engine that powers a world where people can create ambitiously with AI - without sacrificing scale, speed, or sustainability.
Be a part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that's setting the pace for responsible, transformative cloud infrastructure.
About This Role:
At Crusoe Energy Systems, our Site Reliability Engineering (SRE) team plays a mission-critical role in maintaining the performance and reliability of our AI-optimized cloud infrastructure. The Storage-focused SRE role is responsible for ensuring the availability, performance, and scalability of Crusoe's cloud storage products and services, which power compute-intensive, latency-sensitive workloads for AI and HPC use cases. This role directly supports our vertically integrated, sustainable cloud platform by building and optimizing distributed, fault-tolerant storage systems at scale.
What You'll Be Working On:
In this role, you will build automation and self-healing tools to monitor and maintain Crusoe's distributed cloud storage infrastructure, which includes block, file, and object storage systems. You will drive reliability initiatives focused on data replication, encryption, backup and restore strategies, and robust failover mechanisms. Collaborating closely with storage engineers, you will help implement and maintain high-performance NVMe- and SSD-backed volumes that support large-scale AI compute clusters. Your responsibilities will also include supporting user-facing storage services with a focus on availability, performance tuning, and adherence to error budgets. You'll investigate and resolve storage-related incidents using deep telemetry, logs, and performance profiling, while also partnering with hardware and kernel teams to diagnose low-level I/O issues and optimize I/O paths, cache policies, and file systems. Additionally, you will contribute to the architecture of fault-tolerant, scalable storage backends tailored for AI-first cloud environments.
What You'll Bring to the Team:
5+ years of professional experience in SRE, systems, or storage engineering.
Hands-on experience with distributed storage systems (e.g., Ceph, GlusterFS, OpenEBS) and deep understanding of object, block, and file storage paradigms.
Proficiency in a programming language such as Python, Go, Java, or C.
Experience with Infrastructure as Code and deployment tooling such as Terraform, Ansible, or Puppet.
Deep knowledge of Linux internals with a focus on I/O subsystems, memory management, and storage scheduling.
Familiarity with storage protocols like NFS, SMB, iSCSI, or NVMe-oF.
Strong experience working with containerized workloads and orchestration platforms (e.g., Kubernetes, Docker).
Excellent incident response, troubleshooting, and documentation practices.
Experience with building and operating managed services at scale such as object, file and block storage (AWS, GCP, Azure)
Excellent communication skills
Must be able to pass a background check
Embody the Company values
Bonus Points:
Contributions to open-source storage projects or the Linux storage stack.
Experience with hybrid storage models across on-prem and cloud environments.
Familiarity with high-throughput network topologies for storage backplanes (e.g., RoCE, RDMA, InfiniBand).
Benefits:
Industry competitive pay
Restricted Stock Units in a fast growing, well-funded technology company
Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
Employer contributions to HSA accounts
Paid Parental Leave
Paid life insurance, short-term and long-term disability
Teladoc
401(k) with a 100% match up to 4% of salary
Generous paid time off and holiday schedule
Cell phone reimbursement
Tuition reimbursement
Subscription to the Calm app
MetLife Legal
Company paid commuter benefit; $300 per month
Compensation:
Compensation will be paid in the range of $166,000 - $201,000 a year + Bonus. Restricted Stock Units are included in all offers. Compensation to be determined by the applicant's education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.
Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/ orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.
$166k-201k yearly 5d ago
Production Engineer, Compute
Crusoe Energy Systems LLC 4.1
San Francisco, CA jobs
Crusoe's mission is to accelerate the abundance of energy and intelligence. We're crafting the engine that powers a world where people can create ambitiously with AI - without sacrificing scale, speed, or sustainability.
Be a part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that's setting the pace for responsible, transformative cloud infrastructure.
About This Role:
At Crusoe, we are building the most sustainable, AI-first cloud infrastructure, and our Compute-focused Site Reliability Engineers are the backbone of that mission. This role is centered on supporting virtualization, hypervisor, and kernel-level performance for Crusoe's compute infrastructure. You'll play a vital role in deploying and optimizing bare-metal and virtualized compute platforms, ensuring performance, security, and scale for modern AI and HPC workloads.
What You'll Be Working On:
In this role, you will develop automation and observability tools to monitor Crusoe's compute infrastructure, spanning from the kernel to orchestration layers. You will support and scale the company's virtualization stack, including technologies such as KVM, QEMU, and other hypervisors. Collaborating with Linux kernel and hardware teams, you'll help identify and resolve performance bottlenecks and driver issues, and optimize hardware offloads. A key focus will be on optimizing performance for AI and HPC workloads across CPU, GPU, and DPU/NIC resources. You will participate in root cause analysis for kernel crashes, hardware-software integration problems, and performance regressions, while also integrating hypervisor-level enhancements to improve guest VM reliability and workload isolation. The role involves tuning kernel subsystems such as the process scheduler, NUMA configuration, memory management, and interrupt handling. Additionally, you will work closely with platform teams to implement and validate support for emerging compute hardware, including SmartNICs, BlueField devices, and TPUs.
What You'll Bring to the Team:
5+ years of professional experience in Compute SRE, Linux system engineering, or compute infrastructure roles.
Strong proficiency in Linux kernel internals, with exposure to scheduler, memory allocation, and driver subsystems.
Experience with virtualization architectures and technologies such as KVM, Xen, QEMU, or VMware.
Familiarity with SmartNICs/DPUs (e.g., NVIDIA CX6/7, BlueField-3) and kernel bypass techniques.
Expert-level skills in at least one programming language: Go, C or Rust.
Experience with system-level debugging, including kdump, kexec, and kernel panic analysis.
Proficiency in Infrastructure as Code tooling and CI/CD practices for bare-metal or cloud infrastructure.
Strong understanding of compute scheduling, resource management, and high-throughput networking.
Benefits:
Industry competitive pay
Restricted Stock Units in a fast growing, well-funded technology company
Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
Employer contributions to HSA accounts
Paid Parental Leave
Paid life insurance, short-term and long-term disability
Teladoc
401(k) with a 100% match up to 4% of salary
Generous paid time off and holiday schedule
Cell phone reimbursement
Tuition reimbursement
Subscription to the Calm app
MetLife Legal
Company paid commuter benefit; $300 per pay period
Compensation Range:
Compensation will be paid in the range of $166,000 - $201,000 a year + Bonus. Restricted Stock Units are included in all offers. Compensation to be determined by the applicant's education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.
Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/ orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.
$166k-201k yearly 5d ago
GenAI ML Engineer for Real-World Production Systems
Scale AI, Inc. 4.1
San Francisco, CA jobs
A leading AI technology company is seeking a Machine Learning Research Engineer to develop critical ML systems for their GenAI platform. This high-impact role involves designing and deploying scalable ML solutions, utilizing advanced techniques and frameworks. Candidates should have 3+ years of experience, solid software engineering skills, and a passion for solving real-world problems. The position offers competitive compensation and benefits, with a salary range of $176,000 to $220,000 USD in San Francisco.
$176k-220k yearly 5d ago
AI Cloud Storage SRE - Scale & Reliability
Crusoe Energy Systems LLC 4.1
San Francisco, CA jobs
A technology firm based in California is seeking a Site Reliability Engineer to optimize their AI-optimized cloud infrastructure. The role involves building automation tools, driving reliability initiatives, and collaborating with engineers to ensure high-performance storage systems. Candidates should have strong experience in SRE, distributed storage systems, and programming languages. The position offers competitive compensation and various benefits, including health insurance and stock options.
$92k-119k yearly est. 5d ago
Production Engineer - AI Infra, Automation & Kubernetes
Crusoe Energy Systems LLC 4.1
San Francisco, CA jobs
A leading technology firm in San Francisco is seeking a Production Engineer to manage fleet operations and develop automation tools for server provisioning. This role is essential for transitioning the infrastructure to Kubernetes and includes troubleshooting hardware issues. The ideal candidate will have solid hardware experience and a strong Linux background, along with Kubernetes proficiency. Competitive pay and benefits are offered including stock options and health insurance.
$92k-130k yearly est. 4d ago
Production AI Evals Engineer - Build Reproducible Pipelines
OpenAI 4.2
San Francisco, CA jobs
A global AI research and deployment company is hiring product-minded engineers in San Francisco to design evals for advanced AI systems. You will define evaluation signals, prototype solutions, and enhance model reliability while working closely with research and product teams. The ideal candidate has 4+ years of software engineering experience, particularly in building AI applications. This role offers a hybrid work model and competitive compensation.
$94k-132k yearly est. 2d ago
Chemical Engineers - AI Trainer (Contract)
Handshake 3.9
San Francisco, CA jobs
Handshake is recruiting Chemical Engineer Professionals to contribute to an hourly, temporary AI research project, but no AI experience is needed. In this program, you'll leverage your professional experience to evaluate what AI models produce in your field, assess content related to your field of work, and deliver clear, structured feedback that strengthens the model's understanding of your workplace tasks and language. The Handshake AI opportunity runs year-round, with project opportunities opening periodically across different areas of expertise.
Details
The position is remote and asynchronous; work independently from wherever you are.
The hours are flexible, with no minimum commitment, but most contributors average 5-20 hours.
The work includes developing prompts for AI models that reflect your field, and then evaluating responses.
You'll learn new skills and contribute to how AI is used in your field
Your placement into a project will depend on project availability; if you apply now and can't work on this project, more will be available soon.
Qualifications
You have at least 4 years of professional experience in one or more of the following types of work.
The examples below reflect the types of real-world responsibilities you might have had in your role, which give you the context needed to evaluate and train high-quality AI models:
Develop and implement safety procedures for chemical processes, ensuring compliance with safety and environmental regulations.
Troubleshoot and optimize chemical manufacturing processes by analyzing data and conducting research.
Design equipment layouts and control systems, and oversee the transition from laboratory to commercial production, including cost estimation and performance monitoring.
You're able to participate in asynchronous work in partnership with leading AI labs.
Application Process
Create a Handshake account
Upload your resume and verify your identity
Get matched and onboarded into relevant projects
Start working and earning
Work authorization information
F-1 students who are eligible for CPT or OPT may be eligible for projects on Handshake AI. Work with your Designated School Official to determine your eligibility. If your school requires a CPT course, Handshake AI may not meet your school's requirements. STEM OPT is not supported. See our Help Center article for more information on what types of work authorizations are supported on Handshake AI.
$75k-105k yearly est. 5d ago
Wearables Manufacturing Test Engineer - RF & Camera Systems
Sesame 4.7
San Francisco, CA jobs
A leading tech company in San Francisco is seeking a Manufacturing Test Engineer to optimize test solutions for wearable devices. The ideal candidate will have over 10 years of experience in manufacturing test engineering, particularly in consumer electronics, and is proficient in audio and RF testing methodologies. This role offers a range of employee benefits, including 401k matching and comprehensive health coverage.
$67k-89k yearly est. 1d ago
Test Engineer, Manufacturing
Sesame 4.7
San Francisco, CA jobs
Sesame believes in a future where computers are lifelike - with the ability to see, hear, and collaborate with us in ways that feel natural and human. With this vision, we're designing a new kind of computer, focused on making voice companions part of our daily lives. Our team brings together founders from Oculus and Ubiquity6, alongside proven leaders from Meta, Google, and Apple, with deep expertise spanning hardware and software. Join us in shaping a future where computers truly come alive.
About the Role
As a Manufacturing Test Engineer, you will be the technical expert responsible for developing, implementing, and optimizing manufacturing test solutions for our next-generation wearable devices. You will play a critical role in the entire product lifecycle, from initial design reviews to mass production, ensuring that every product leaving our manufacturing lines meets our rigorous quality and performance specifications. This role requires a blend of deep technical expertise, strategic thinking, and the ability to work effectively with large-scale contract manufacturers (CMs) and product development engineering across the globe.
Responsibilities
Test Strategy & Development: Architect and implement comprehensive manufacturing test strategies for new product introductions (NPIs), covering functional, parametric, and in-circuit testing for complex wearable devices.
Test Fixture & Equipment Design: Design, develop, and qualify test fixtures, equipment, and software for various stages of the manufacturing process, from PCBA-level tests to final product assembly.
Specialized Testing: Develop and optimize test solutions for specialized functions critical to our products, including:
Audio Testing: Acoustic test methodologies for speakers, microphones, and other audio components.
Camera & Imaging Testing: Calibration and functional tests for high-resolution cameras and optical systems.
RF & Wireless: Testing for wireless connectivity, including Bluetooth, Wi‑Fi, and other RF protocols.
Sensors: Calibration and functional testing for various sensors (accelerometers, gyroscopes, biometric sensors).
Data Analysis & Optimization: Utilize data analysis tools to automatically monitor test yields, identify trends, root cause failures, and flag production issues. Proactively drive continuous improvement to optimize test coverage, cycle time, and overall manufacturing efficiency.
Collaboration with Contract Manufacturers (CMs): Act as the primary technical interface with our large-scale contract manufacturing partners. Train and support CM teams on test processes, troubleshoot issues on the factory floor, and ensure strict adherence to our specifications.
Design for Testability (DFT): Partner with the product design and engineering teams during the development phase to influence product architecture and design for enhanced testability, manufacturability, and reliability.
Documentation: Create and maintain detailed test specifications, process instructions, and engineering change notices (ECNs) to ensure clear communication and consistency across all manufacturing sites.
Project Leadership: Lead and mentor junior engineers, providing technical guidance and expertise on complex test challenges. Manage third‑party resources as required.
Required Qualifications
Deep expertise in developing test solutions for consumer electronics, including PCBA, FATP (Final Assembly, Test, and Pack), and system‑level testing.
Demonstrated experience with specialized testing:
Audio: Expertise with audio test equipment (e.g., Audio Precision, SoundCheck) and methodologies.
Camera: Experience with optical test benches, image quality analysis, and camera calibration.
RF: Experience with RF test equipment (e.g., spectrum analyzers, network analyzers) and wireless protocol testing.
Battery: Experience with battery testing methodologies, including charge/discharge cycling, impedance spectroscopy, and safety testing.
Experience working directly and extensively with large‑scale Contract Manufacturers (CMs), including overseas travel for factory support and new line bring‑up.
Strong knowledge of statistical process control (SPC), GR&R, and data analysis tools to drive yield improvements.
Excellent problem‑solving, communication, and interpersonal skills.
Preferred Qualifications
10+ years of hands‑on experience in manufacturing test engineering within the consumer electronics industry.
Proven experience with wearables is highly preferred, including familiarity with the unique challenges of small form factor, high‑density products.
Proficiency in a programming language for test automation (e.g., Python, C#, LabVIEW).
Demonstrated experience with specialized testing and calibration for eye tracking and vision.
Bachelor's or Master's degree in Electrical Engineering, Mechanical Engineering, or a related field.
Sesame is committed to a workplace where everyone feels valued, respected, and empowered. We welcome all qualified applicants, embracing diversity in race, gender, identity, orientation, ability, and more. We provide reasonable accommodations for applicants with disabilities; contact ****************** for assistance.
Full‑time Employee Benefits
401k matching
100% employer‑paid health, vision, and dental benefits
Unlimited PTO and sick time
Flexible spending account matching (medical FSA)
Benefits do not apply to contingent/contract workers