
Reliability engineer jobs in Redwood City, CA - 1,308 jobs

  • Site Reliability Engineer

    The Voleon Group 4.1 company rating

    Reliability engineer job in Berkeley, CA

    Voleon is a technology company that applies state‑of‑the‑art AI and machine learning techniques to real‑world problems in finance. For nearly two decades, we have led our industry and worked at the frontier of applying AI/ML to investment management. We have become a multibillion‑dollar asset manager, and we have ambitious goals for the future. Your colleagues will include internationally recognized experts in artificial intelligence and machine learning research as well as highly experienced finance and technology professionals. The people who shape our company come from other backgrounds, including concert music performances, humanitarian aid, opera singing, sports writing, and BMX racing. You will be part of a team that loves to succeed together. In addition to our enriching and collegial working environment, we offer highly competitive compensation and benefits packages, technology talks by our experts, a beautiful modern office, daily catered lunches, and more. As a Site Reliability Engineer (SRE), you will work at the intersection of production operations and software development as you improve, manage, and monitor production‑critical infrastructure and data pipelines. At Voleon, many SREs serve together on a Production Operations team tasked with improving shared production infrastructure. Others are embedded with teams of software engineers to improve specific production systems owned by those teams. Voleon SREs work on important real‑world problems and collaborate with passionate and talented colleagues in an empowering, results‑driven environment. This role is a way to make a real difference: your contributions will make our critical systems more reliable, lower operational risk, and increase the efficiency of our engineering effort. 
Responsibilities Improve fault‑tolerance and maintainability of code in proprietary data pipelines and trading systems Diagnose and fix bugs in code Lead complex deployments Automate manual workflows Track and prioritize outstanding production‑related issues Share an on‑call rotation responding to incidents to ensure the continuous operation of production‑critical systems Requirements Experience with coding and debugging Python Experience with Linux Familiarity with Relational Databases & SQL Sharp analytical and problem‑solving skills and a persistent drive to make things work (better) Strong growth mindset and a passion for learning Strong technical communication skills Attention to detail 2 years of relevant industry experience An undergraduate degree or comparable training in a quantitative field or equivalent, relevant industry experience Preferred Qualifications Familiarity with best practices concerning code maintainability, documentation, quality assurance, continuous integration and deployment Experience supporting production systems Experience with any of the following: gRPC microservices, Postgres, Pandas, Golang, R, Git, Jenkins, Bazel, Prometheus, Grafana, Airflow, Kubernetes The base salary for this position is $120,000 to $160,000 in the location(s) of this posting. Individual salaries are determined through a variety of factors, including, but not limited to, education, experience, knowledge, skills, and geography. Base salary does not include other forms of total compensation such as bonus compensation and other benefits. Our benefits package includes medical, dental and vision coverage, life and AD&D insurance, 20 days of paid time off, 9 sick days, and a 401(k) plan with a company match. 
Friends of Voleon Candidate Referral Program If you have a great candidate in mind for this role and would like to have the potential to earn $7,500 - $15,000 if your referred candidate is successfully hired and employed by The Voleon Group, please use this to submit your referral. For more details regarding eligibility, terms and conditions please make sure to review the Voleon Referral Bonus Program. Equal Opportunity Employer The Voleon Group is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.
    $120k-160k yearly 4d ago

  • Site Reliability Engineer - AI Inference Infra & GPU Clusters

    Near Inc. 4.6 company rating

    Reliability engineer job in San Francisco, CA

    A tech company specializing in AI infrastructure based in San Francisco is looking for a candidate to own the development of decentralized machine learning infrastructure. The role involves designing components, performance tuning, and collaboration with skilled colleagues. The ideal candidate should have experience in Cloud infrastructure and software concurrency, along with a Bachelor's degree in Computer Science. Excellent communication skills and the ability to learn quickly are essential. The position is onsite at the San Francisco office.
    $126k-176k yearly est. 2d ago
  • Site Reliability Engineer

    Rethink Recruit

    Reliability engineer job in San Francisco, CA

    About Runloop Runloop is building the foundational infrastructure for the next generation of AI development. We provide AI engineers and data scientists with lightning-fast, secure, and reproducible code sandboxes. Our platform eliminates friction in environment setup and dependencies, enabling teams to experiment, iterate, and deploy seamlessly. We're a small but dedicated team working to deliver a rock-solid platform that empowers innovation. The Role We're looking for a skilled Site Reliability Engineer (SRE) to ensure the reliability, observability, performance, and security of our core platform-the foundation upon which our users build. You'll work closely with engineering to maintain resilient systems that power our code sandboxes, while mentoring peers on reliability practices. This role blends deep operational expertise with a software engineering mindset. What You'll Do Design, operate, and improve production infrastructure on AWS, GCP, or Azure. Define and monitor SLIs/SLOs, manage error budgets, and maintain observability with Prometheus, Grafana, and logging/tracing frameworks. Build automation for deployments, scaling, and recovery-reducing toil and creating self-healing systems. Lead incident response, root‑cause analysis, and blameless post‑mortems. Collaborate with developers to design scalable, reliable services. Optimize distributed systems, networking, and sandbox performance. Plan for capacity growth and support safe release/change management. Mentor engineers on reliability and front‑end distributed systems (CDNs, caching, client observability). Qualifications Proven experience as an SRE, DevOps Engineer, or similar role. Strong programming skills (Python or Go preferred). Deep knowledge of containerization (Docker, Kubernetes). Expertise in infrastructure-as-code (Terraform or Pulumi). Strong understanding of networking, Linux, and system security. Hands‑on experience with distributed systems and observability (metrics, logs, tracing). 
Skilled in incident management, on‑call rotations, and post‑mortem processes. Ability to mentor and influence best practices across teams. Bonus Points Experience with chaos engineering, CI/CD for front‑end delivery, or observability tools like Sentry, RUM, or synthetic monitoring. Benefits Competitive salary and equity. Comprehensive health, dental, and vision insurance for you and your dependents. Free lunch and snacks. Opportunity to shape the future of AI‑driven software engineering in a high‑impact role. Location On‑site in San Francisco, CA (in office 4 days/week, optional 1 day WFH). Join Us If you're passionate about building resilient systems that empower developers and want to shape the future of AI‑driven software engineering, we'd love to hear from you. Join Runloop and help build the infrastructure that powers tomorrow's AI. Runloop is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability status, protected veteran status, sexual orientation, gender identity, or any other characteristic protected by law.
    $113k-160k yearly est. 6d ago
  • Site Reliability Engineer

    Happyrobot Inc.

    Reliability engineer job in San Francisco, CA

    About HappyRobot HappyRobot is the AI-native operating system for the real economy-a system that closes the circuit between intelligence and action. By combining real-time truth, specialized AI workers, and an orchestrating intelligence, we help enterprises run complex, mission-critical operations with true autonomy. Our AI OS compounds knowledge, optimizes at every level, and evolves over time. We're starting with supply chain and industrial-scale operations, where resilience, speed, and continuous improvement matter most-freeing humans to focus on strategy, creativity, and other high-value tasks. You can learn more about our vision in our Manifesto. HappyRobot has raised $62M to date, including our most recent $44M Series B in September 2025. Our investors include Y Combinator (YC), Andreessen Horowitz (a16z), and Base10-partners who believe in our mission to redefine how enterprises operate. We're channeling this investment into building a world-class team: people with relentless drive, sharp problem-solving skills, and the passion to push limits in a fast-paced, high-intensity environment. If this resonates, you belong at HappyRobot. About the Role We're looking for a Site Reliability Engineer to take the lead on scaling our operational resilience as we grow. You'll own the stability, observability, and debugging workflows that keep our systems running smoothly. You'll be the go-to person for untangling complex failures in real time, designing tools that turn chaos into clarity, and helping us shift from reactive to proactive operations. This is a high-impact, high-trust role where you'll shape how reliability is done - reducing incident load, building internal tooling, and directly improving developer focus and system uptime. If you love getting to the root of hard problems and making systems (and teams) stronger, this is your moment. Must-Have 3+ years of hands-on experience debugging production systems (logs, traces, incidents, etc.) 
Strong problem-solving skills and ability to dive into unfamiliar backend codebases Comfort with Python and Go for reading code and writing small tools/utilities Familiarity with observability and monitoring tools (e.g., Datadog, Prometheus, Sentry) Clear, calm communication under pressure - especially during live incidents Nice-to-Have Experience working with distributed systems or services at scale Built or maintained internal tooling for on-call teams or reliability workflows Familiarity with deployment pipelines, CI/CD, or infra-as-code Experience improving system observability (e.g., custom metrics, traces, log pipelines) Why join us? Opportunity to work at a high-growth AI startup, backed by top investors. Fast Growth - Backed by a16z and YC, on track for double-digit ARR. Top-Tier Compensation - Competitive salary + equity in a high-growth startup. Ownership & Autonomy - Take full ownership of projects and ship fast. Work With the Best - Join a world-class team of engineers and builders. Our Operating Principles Extreme Ownership We take full responsibility for our work, outcomes, and team success. No excuses, no blame-shifting - if something needs fixing, we own it and make it better. This means stepping up, even when it's not “your job.” If a ball is dropped, we pick it up. If a customer is unhappy, we fix it. If a process is broken, we redesign it. We don't wait for someone else to solve it - we lead with accountability and expect the same from those around us. Craftsmanship Putting care and intention into every task, striving for excellence, and taking deep ownership of the quality and outcome of your work. Craftsmanship means never settling for “just fine.” We sweat the details because details compound. Whether it's a product feature, an internal doc, or a sales call - we treat it as a reflection of our standards. We aim to deliver jaw‑dropping customer experiences by being curious, meticulous, and proud of what we build - even when nobody's watching. 
We are “majos” Be friendly & have fun with your coworkers. Always be genuine & honest, but kind. “Majo” is our way of saying: be a good human. Be approachable, helpful, and warm. We're building something ambitious, and it's easier (and more fun) when we enjoy the ride together. We give feedback with kindness, challenge each other with respect, and celebrate wins together without ego. Urgency with Focus Create the highest impact in the shortest amount of time. Move fast, but in the right direction. We operate with speed because time is our most limited resource. But speed without focus is chaos. We prioritize ruthlessly, act decisively, and stay aligned. We aim for high leverage: the biggest results from the simplest, smartest actions. We're running a high-speed marathon - not a sprint with no strategy. Talent Density and Meritocracy Hire only people who can raise the average; ‘exceptional performance is the passing grade.' Ability trumps seniority. We believe the best teams are built on talent density - every hire should raise the bar. We reward contribution, not titles or tenure. We give ownership to those who earn it, and we all hold each other to a high standard. A-players want to work with other A-players - that's how we win. First-Principles Thinking Strip a problem to physics-level facts, ignore industry dogma, rebuild the solution from scratch. We don't copy-paste solutions. We go back to basics, ask why things are the way they are, and rebuild from the ground up if needed. This mindset pushes us to innovate, challenge stale assumptions, and move faster than incumbents. It's how we build what others think is impossible. Personal Data Protection The personal data provided in your application and during the selection process will be processed by Happyrobot, Inc., acting as Data Controller. By sending us your CV, you consent to the processing of your personal data for the purpose of evaluating and selecting you as a candidate for the position. 
Your personal data will be treated confidentially and will only be used for the recruitment process of the selected job offer. Your personal data will be deleted after three months of inactivity, in compliance with the GDPR and legislation on the protection of personal data. If you wish to exercise your rights of access, rectification, deletion, portability or opposition in relation to your personal data, you can do so through ********************** subject to the GDPR. For more information, visit **************************************** By submitting your request, you confirm that you have read and understood this clause and that you agree to the processing of your personal data as described.
    $113k-160k yearly est. 3d ago
  • Founding Site Reliability Engineer

    Relevance Ai

    Reliability engineer job in San Francisco, CA

    About Us 🚀 At Relevance AI, our mission is to empower anyone to delegate work to the AI workforce. We're building a new category of AI automation, enabling teams to create and deploy intelligent AI agents that replicate human-quality work, decision-making, and collaboration at scale. We're scaling fast, backed by top global investors including Bessemer Venture Partners, Insight Partners, Peak XV, and King River Capital, and our platform is already trusted by industry leaders like Canva, Databricks, Confluent, KPMG, Autodesk, and more. With offices in Sydney 🇦🇺 and San Francisco 🇺🇸 (and a new hub launching in Barcelona 🇪🇸), this is your chance to shape the future of work on a global stage. The Role 🧠 We're looking for a Founding Site Reliability Engineer to join us as our first SRE hire in San Francisco. We are open to hiring someone who is Senior, Lead or Principal level and will be candidate led. This role is perfect for someone ready to establish and scale the SRE discipline from the ground up in one of the fastest-growing AI companies globally. You'll own the reliability, scalability, and security of our platform as we power tens of thousands of multi-agent workloads across multiple regions. You'll partner closely with our founders, engineering leads, and product teams to define our reliability culture, shape long-term strategy, and build world-class infrastructure for enterprise scale. 
What You'll Do 💪 Own SRE establishing best practices, tooling, and culture Tackle reliability challenges unique to multi-agent orchestration at enterprise scale Guarantee >99.9% uptime of production systems, ensuring reliability at global scale Architect and automate AWS infrastructure with Terraform and CI/CD pipelines Design observability systems across microservices, APIs, and vector infrastructure (metrics, tracing, logging) Drive down incidents and MTTR through runbooks, alerting, and incident response excellence Help scale infra to support hundreds of thousands of agents and billions of API calls Partner with engineering teams to embed SRE principles into the SDLC and shape org-wide reliability strategy Act as a founding voice in our SF office, influencing product direction and engineering culture What We're Looking For 🧠 5+ years in SRE/DevOps/Infrastructure roles, with experience in enterprise SaaS environments. Deep AWS expertise (EC2, ECS/EKS, Lambda, RDS, VPC, IAM). Proven track record with Infrastructure as Code (Terraform, Kubernetes/EKS, CDK, or CloudFormation). Hands-on with observability stacks (CloudWatch, Grafana, Prometheus, Datadog). Incident management experience in production SaaS systems, including on-call, postmortems, and reliability improvements. Bonus: Prior exposure to AI/ML platforms, data-heavy systems, or multi-agent workloads. Tech Stack 🧰 AWS, Kubernetes/EKS, Terraform, GitHub Actions, Postgres/Mongo, Prometheus/Grafana, CloudWatch, PagerDuty/BetterStack
    $113k-160k yearly est. 5d ago
  • Site Reliability Engineer

    Mercor, Inc.

    Reliability engineer job in San Francisco, CA

    About Mercor Mercor is at the intersection of labor markets and AI research. We partner with leading AI labs and enterprises to provide the human intelligence essential to AI development. Our vast talent network trains frontier AI models in the same way teachers teach students: by sharing knowledge, experience, and context that can't be captured in code alone. Today, more than 30,000 experts in our network collectively earn over $1.5 million a day. Mercor is creating a new category of work where expertise powers AI advancement. Achieving this requires an ambitious, fast‑paced and deeply committed team. You'll work alongside researchers, operators, and AI companies at the forefront of shaping the systems that are redefining society. Mercor is a profitable Series C company valued at $10 billion. We work in‑person five days a week in our new San Francisco headquarters. About the Role As a Site Reliability Engineer (SRE) at Mercor, you'll own production reliability across our most critical systems, partnering directly with infrastructure leadership. You'll play a foundational role in building our SRE function from the ground up and shaping how Mercor operates large‑scale, high‑availability systems. What You'll Do Own reliability and production safety for core shared services and customer‑facing systems. Partner directly with infrastructure leadership to define SRE priorities, reliability standards, and production safety roadmap. Repair and improve how our production systems are structured so they are stable, resource‑efficient, isolated, and well‑observed. Introduce and champion modern SRE practices (e.g., incident response, postmortems, SLIs/SLOs) across engineering teams. Collaborate with leverage engineering and applied AI teams to ensure sustainable growth. Represent SRE best practices internally and help teams onboard onto production in a way that is safe, scalable, and consistent with SRE principles. 
What We're Looking For Experience doing true SRE work (not just operations) across multiple roles or companies. Deep familiarity with SRE practices as popularized by Google (e.g., error budgets, reliability vs. risk trade‑offs, large‑scale distributed systems). 5+ years of SRE experience; 15+ years of overall experience is ideal for this first SRE hire. Proven success operating systems at scale, with a strong understanding of the challenges of large, distributed production environments. Strong collaboration skills; able to work efficiently with cross‑functional engineering teams. Ability to drive cultural change around reliability while remaining hands‑on in building and fixing systems. Comfort working in high‑intensity, high‑availability environments where uptime and production quality are critical. Nice to Haves Experience as a founding SRE or early SRE hire, standing up SRE practices and orgs from scratch. Hands‑on experience in the AWS ecosystem, Kubernetes, and modern IaC tooling (Terraform, Spacelift, etc.).
    $113k-160k yearly est. 5d ago
  • Founding Site Reliability Engineer

    Assort Health Inc.

    Reliability engineer job in San Francisco, CA

    Our mission is to make exceptional healthcare accessible anytime, anywhere, for everyone. At Assort Health, we believe healthcare should feel effortless and connected - quick answers, clear communication, and seamless access to care. That's why we're building a new foundation for how patients and providers connect, driven by AI, built to embrace the complexities of healthcare, and tailored to each provider's unique needs. Assort is the most comprehensive patient experience platform powered by specialty-specific agentic AI. Assort's omnichannel AI agents seamlessly integrate with EHR/PMS and complicated provider preferences to eliminate lengthy hold times and inefficiencies that stand in the way of patients getting the care they need. Since launching in 2023, Assort has managed more than 50M patient interactions, slashing average hold times from 11 minutes to 1 minute. Our platform now handles calls for thousands of providers with 98%+ resolution rates and 99% scheduling accuracy. Patient satisfaction averages 4.4/5, and we've achieved 11× revenue growth since Q4 2024. We're scaling rapidly and expanding adoption across the entire healthcare industry. What You'll Do You'll be the go-to expert for keeping our systems fast, stable, and resilient. While your primary mission is reliability, you'll also help shape the infrastructure, CI/CD, and tooling that enable the team to move faster and safer. 
Your scope may include: Define, own, and improve SLIs / SLOs / error budgets - set measurable targets around availability, latency, and error rates, and drive toward achieving them Build and maintain observability across the stack (metrics, logging, tracing, dashboards, alerts, anomaly detection) and lead incident management - coordinating responses, improving runbooks and postmortems, automating with AI tools, and collaborating with partners like Deepgram, Cartesia, GCP, and EHRs to ensure capacity Reduce operational toil by automating repetitive tasks and building self-healing systems and remediation workflows Improve deployment safety through canary or blue/green rollouts, automated rollbacks, chaos experiments, and deployment guardrails Contribute to infrastructure work: IaC, cloud architecture, networking, autoscaling, and related systems Ensure reliability across services, databases, caches, queues, third-party integrations, and networks Drive capacity planning, performance tuning, and cost optimization Mentor others on reliability best practices and champion a reliability mindset across engineering Experience & Background 3+ years focused on reliability, SRE, or production infrastructure Hands-on experience running production systems in startups or growth-stage companies (not just large enterprises) Comfortable balancing firefighting with strategic reliability improvements Technical Must-haves: Cloud infrastructure experience (GCP preferred; AWS or Azure fine) Implemented or maintained observability stacks (Datadog, Prometheus, Grafana, Honeycomb, OpenTelemetry, Sentry, PagerDuty) Can code and automate Comfortable with Kubernetes Nice-to-haves: Infrastructure-as-code (Terraform strongly preferred) CI/CD pipelines and modern deployment strategies Early-stage/high-growth experience Exposure to security, compliance, or resilience architectures Voice infra experience (Twilio, etc.) 
Why This Role Matters You'll build the reliability foundation - systems, culture, and practices - from the ground up You'll work cross-functionally with product, engineering, and leadership to influence priorities and tradeoffs You'll see direct impact: fewer outages, faster incident resolution, and more confidence in launches Benefits & Perks for Assorties 💸 Competitive Compensation - Including salary and employee stock options so you share in our success. 📚 Lifelong Learning - Annual budget for professional development, plus training opportunities to help you grow. 💻 Office Setup Stipend - We'll outfit your in-office workspace so it's as comfy as it is productive. 🩺 Top-Tier Health Coverage - Medical, dental, and vision insurance, because your health comes first. 🏖 Unlimited PTO - We trust you to take the time you need to recharge and come back ready to crush it. 🥗 Meals & Snacks - Lunch, dinner, and snack breaks that fuel great ideas. 💪 Wellness Stipend - Your physical and mental well-being matters, and we've got a yearly stipend to prove it. 👵 401(k) - Let us help you plan for the future. We've got you covered. How We Work & What We Value Our team at Assort Health moves fast, stays focused, and is fueled by a desire to serve our customers and patients. Our company values guide how we work - they are present in how we show up, make decisions and work together to move our mission forward. We bring a Day One Drive, relentlessly striving to improve, keep a 5-Star Focus, as our customers are our lifeblood, always Answer the Call, remembering that ownership and accountability are paramount, and show up with One Pulse, because we are one team, with one rhythm and one result. Our team is growing and we are looking for motivated, hardworking, and passionate talent. If you want to make healthcare accessible for everyone, we'd love to hear from you!
    $113k-160k yearly est. 4d ago
  • Site Reliability Engineer - Kubernetes

    Theklicker

    Reliability engineer job in Palo Alto, CA

    theklicker is an online platform specializing in electronic product price comparison, enabling users to browse prices across multiple booking sites effortlessly. We are dedicated to being a one-stop solution for purchasing electronic products. With a focus on delivering the best user experience, theklicker empowers users to make informed purchasing decisions quickly and efficiently. Role Description This is a full-time on-site role for a Site Reliability Engineer - Kubernetes, based in Palo Alto, CA. The role involves maintaining and optimizing system reliability, managing infrastructure, troubleshooting technical issues, and supporting software deployments. The Site Reliability Engineer will work closely with development and operations teams to ensure seamless operations and robust technology solutions. Qualifications Proven expertise in Site Reliability Engineering and troubleshooting complex system issues Experience in Software Development with a strong understanding of coding best practices Proficiency in System Administration, managing Linux/Unix environments, and implementing automation scripts Knowledge of Kubernetes and infrastructure management Strong problem-solving, analytical, and communication skills Experience with monitoring and incident management tools is a plus Bachelor's degree in Computer Science, Engineering, or a related field
    $112k-159k yearly est. 3d ago
  • Site Reliability Engineer

    Neara

    Reliability engineer job in Palo Alto, CA

    Job type: Full Time · Department: Backend Engineer · Work type: Remote About Archetype AI Archetype AI is developing the world's first AI platform to bring AI into the real world. Formed by an exceptionally high-caliber team from Google, Archetype AI is building a foundation model for the physical world, a real-time multimodal LLM for real life, transforming real-world data into valuable insights and knowledge that people will be able to interact with naturally. It will help people in their real lives, not just online, because it understands the real-time physical environment and everything that happens in it. Supported by deep tech venture funds in Silicon Valley, Archetype AI is currently pre-Series A, progressing rapidly to develop technology for their next stage. This presents a unique and once-in-a-lifetime opportunity to be part of an exciting AI team at the beginning of their journey, located in the heart of Silicon Valley. Our team is headquartered in Palo Alto, California, with team members throughout the US and Europe. We are actively growing, so if you are an exceptional candidate excited to work on the cutting edge of physical AI and don't see a role that exactly fits you below you can contact us directly with your resume via jobsarchetypeaiio. About the Role As a Site Reliability Engineer (SRE) at Archetype AI, you will be responsible for designing, scaling, and maintaining the infrastructure that powers our AI-driven products. You will collaborate with backend engineers and ML researchers to ensure that our distributed platforms are fault-tolerant, performant, and highly available. Core Responsibilities Design, build, and operate highly available distributed systems. Collaborate with engineering and ML teams to ensure reliable deployment of backend services (in Rust, C++ or similar). Implement monitoring, alerting, and observability solutions across infrastructure. 
Automate deployments, scaling, and infrastructure provisioning using infrastructure-as-code. Diagnose and resolve performance bottlenecks, system outages, and production incidents. Support AI/ML infrastructure for training and serving models at scale, including GPU clusters, pipelines, and inference services. Contribute to infrastructure architecture, standards, and operational best practices. Minimum Qualifications 5+ years of experience as SRE, DevOps, or Systems Engineer. Strong expertise in distributed systems, fault-tolerant architectures, and large-scale production environments. Proficiency in Rust, C++, or other backend languages with willingness to learn. Solid experience with Kubernetes, containers, and cloud platforms (AWS, GCP, Azure). Hands‑on experience with monitoring and observability tools (Prometheus, Grafana, ELK, OpenTelemetry). Experience with data pipelines, messaging systems, and streaming technologies (Kafka, Pulsar, etc.). Familiarity with AI/ML infrastructure (training pipelines, GPU clusters, inference systems). Strong debugging, problem‑solving, and automation mindset (Terraform, Ansible, Pulumi, scripting). Excellent communication and collaboration skills. Preferred Qualifications Experience with real‑time or low‑latency systems. Open‑source contributions to distributed systems or infrastructure projects. Knowledge of security best practices for distributed environments. Experience with edge or embedded systems and sensor‑based infrastructure. Background in multimodal data fusion or physical‑world perception systems. What We Value Ownership - You take initiative, follow through, and care deeply about quality and outcomes. Motivation - You're driven to solve complex problems and continuously raise the bar for yourself and your team. Excellence - You bring discipline, clarity, and rigor to your craft-and help others do the same. 
Collaboration - You work well with others, mentor generously, and contribute to a high‑trust, high‑performance culture.
    $112k-159k yearly est. 6d ago
  • Reliability Engineer for AI-Driven Materials Lab

    Periodiclabs

    Reliability engineer job in Menlo Park, CA

    A cutting-edge materials research lab in Menlo Park seeks a Reliability Engineer to lead maintenance operations and optimize experimental systems. The ideal candidate will have a relevant engineering degree and extensive experience in reliability and systems engineering, particularly in materials science. Responsibilities include establishing maintenance programs, managing a machine shop, and designing labware for automated workflows. Join this innovative team to contribute to groundbreaking scientific discoveries.
    $112k-160k yearly est. 6d ago
  • Site Reliability Engineer - Observability

    Rivian 4.1 company rating

    Reliability engineer job in Palo Alto, CA

    About Us Rivian and Volkswagen Group Technologies is a joint venture between two industry leaders with a clear vision for automotive's next chapter. From operating systems to zonal controllers to cloud and connectivity solutions, we're addressing the challenges of electric vehicles through technology that will set the standards for software-defined vehicles around the world. The road to the future is uncharted. By combining our expertise across connectivity, AI, security and more, we'll map a new way forward. Working together, we'll create a future that's more connected, more intelligent, more sustainable for everyone. Role Summary We are seeking a Senior Site Reliability Engineer (SRE) specializing in Observability to join RivianVW's Data Platform - Production Engineering team. In this role, you will design, implement, and scale robust observability systems to ensure the health, performance, and reliability of our production environment. You will collaborate closely with cross-functional teams to create telemetry solutions that provide actionable insights into our distributed systems. Responsibilities Observability Platform Design: Architect, implement, and maintain observability systems, leveraging tools like Datadog, LGTM stack, OpenTelemetry, and Vector to enable real-time performance monitoring, logging, and alerting. Telemetry Optimization: Evolve and scale telemetry pipelines to ensure low latency and high availability for metrics, logs, and traces across multi-cloud environments. Performance Engineering: Proactively identify performance bottlenecks, optimize systems, and provide recommendations for reliability improvements. Scalable Automation: Implement automation solutions to scale systems sustainably while driving improvements in reliability and deployment velocity. Incident Management: Collaborate with the incident response team to establish data-driven debugging and troubleshooting processes using observability data. 
Tooling Development: Create and maintain self-service observability tools and dashboards to empower teams across the organization. Cross-functional Collaboration: Partner with development, DevOps, and infrastructure teams to define SLOs/SLIs and ensure observability is embedded throughout the software lifecycle. Qualifications Educational Background: Bachelor's degree in Computer Science, Engineering, or equivalent practical experience. Experience: 5+ years in Site Reliability Engineering or a related role with a strong emphasis on observability. Technical Expertise: Proficiency in designing and operating observability platforms with tools like Prometheus, Grafana, Loki, Jaeger, or Datadog. Experience with OpenTelemetry and distributed tracing in microservices architectures. Deep knowledge of Kubernetes (e.g., EKS), ArgoCD, and Crossplane. Programming Skills: Strong proficiency in Python, Go, or similar languages for building automation and custom telemetry solutions. Cloud & Systems: Familiarity with multi-cloud setups, containerization (Docker), and Linux system fundamentals. Soft Skills: Exceptional problem-solving, communication, and a data-driven approach to decision-making. Pay Disclosure Salary Range/Hourly Rate for California Based Applicants: $146,900 - $194,610 USD Actual Compensation will be determined based on experience, location, and other factors permitted by law. Benefits Summary: Rivian and Volkswagen Group Technologies provides robust medical, prescription, dental and vision insurance packages for full-time employees, their spouse or domestic partner, and their children up to age 26. Coverage is effective on the first day of employment. Equal Opportunity Rivian and Volkswagen Group Technologies is committed to creating a diverse environment and is proud to be an equal opportunity employer. 
All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, sex, sexual orientation, gender, gender expression, gender identity, genetic information or characteristics, physical or mental disability, marital/domestic partner status, age, military/veteran status, medical condition, or any other characteristic protected by law. We are also committed to ensuring compliance with all applicable fair employment practice laws regarding citizenship and immigration status. Rivian and Volkswagen Group Technologies is committed to ensuring that our hiring process is accessible for persons with disabilities. If you have a disability or limitation, such as those covered by the Americans with Disabilities Act, that requires accommodations to assist you in the search and application process, please email us at candidateaccommodations@rivian.com. Candidate Data Privacy Rivian and VW Group Technologies ("Rivian and Volkswagen Group Technologies") may collect, use and disclose your personal information or personal data (within the meaning of the applicable data protection laws) when you apply for employment and/or participate in our recruitment processes ("Candidate Personal Data"). This data includes contact, demographic, communications, educational, professional, employment, social media/website, network/device, recruiting system usage/interaction, security and preference information. 
Rivian and Volkswagen Group Technologies may use your Candidate Personal Data for the purposes of (i) tracking interactions with our recruiting system; (ii) carrying out, analyzing and improving our application and recruitment process, including assessing you and your application and conducting employment, background and reference checks; (iii) establishing an employment relationship or entering into an employment contract with you; (iv) complying with our legal, regulatory and corporate governance obligations; (v) recordkeeping; (vi) ensuring network and information security and preventing fraud; and (vii) as otherwise required or permitted by applicable law. Rivian and Volkswagen Group Technologies may share your Candidate Personal Data with (i) internal personnel who have a need to know such information in order to perform their duties, including individuals on our People Team, Finance, Legal, and the team(s) with the position(s) for which you are applying; (ii) Rivian and Volkswagen Group Technologies affiliates; and (iii) Rivian and Volkswagen Group Technologies' service providers, including providers of background checks, staffing services, and cloud services. Rivian and Volkswagen Group Technologies may transfer or store internationally your Candidate Personal Data, including to or in the United States, Canada, and the European Union and in the cloud, and this data may be subject to the laws and accessible to the courts, law enforcement and national security authorities of such jurisdictions. Please see our Candidate Data Privacy Notice (English) and Candidate Data Privacy Notice (Serbian) for more information. Please note that we are currently not accepting applications from third party application services.
    $146.9k-194.6k yearly 3d ago
  • Senior PostgreSQL DBRE - Scale, Reliability & Automation

    Okta, Inc. 4.3 company rating

    Reliability engineer job in San Francisco, CA

    A leading identity management firm is looking for a Senior Database Reliability Engineer (DBRE) in San Francisco, California. The ideal candidate will have over 4 years of experience specifically with PostgreSQL and will be responsible for designing and optimizing data persistence layers for mission-critical systems. Key responsibilities include leading database incidents, working cross-functionally with platform teams, developing automation for tasks, and ensuring high availability across database environments. This position is essential for operational excellence in a hybrid environment.
    $157k-199k yearly est. 4d ago
  • Site Reliability Engineer, Managed AI

    Crusoe Energy Systems LLC 4.1 company rating

    Reliability engineer job in San Francisco, CA

    Crusoe's mission is to accelerate the abundance of energy and intelligence. We're crafting the engine that powers a world where people can create ambitiously with AI - without sacrificing scale, speed, or sustainability. Be a part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that's setting the pace for responsible, transformative cloud infrastructure. About the Role At Crusoe, our Site Reliability Engineering team ensures the reliability and scalability of Crusoe's AI-optimized cloud platform. We're looking for an SRE with a strong background in distributed systems and hands-on experience with large language models to help us build and operate managed AI services at scale. This role is central to delivering highly available, performant, and cost-efficient AI infrastructure that powers compute-intensive, latency-sensitive workloads for our customers. What You'll Work On: Design and operate reliable managed AI services with a focus on serving and scaling LLM workloads Build automation and reliability tooling to support distributed AI pipelines and inference services Define, measure, and improve SLIs/SLOs across AI workloads to ensure performance and reliability targets are met Collaborate with AI, platform, and infrastructure teams to optimize large-scale training and inference clusters Automate observability by building telemetry and performance tuning strategies for latency-sensitive AI services Investigate and resolve reliability issues in distributed AI systems using telemetry, logs, and profiling Contribute to the architecture of next-generation distributed systems purpose-built for AI-first environments What You'll Bring: Strong software engineering background - experience building production-grade systems beyond scripting or Bash Demonstrated experience in distributed systems design and implementation Hands-on work with large language models (LLMs) or AI/ML 
infrastructure SRE mindset and experience (whether or not under the SRE title) including: Defining and measuring SLIs/SLOs Building monitoring and observability systems Driving performance and reliability improvements Designing fault‑tolerant systems and automated testing strategies Proficiency in at least one modern programming language (Python, Go, Java, C++) Familiarity with Kubernetes or container orchestration platforms Strong collaboration and communication skills Ability to thrive in a fast‑paced, mission‑driven environment Bonus Points: Experience scaling inference or training workloads for LLMs Benefits: Industry competitive pay Restricted Stock Units in a fast growing, well‑funded technology company Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents Employer contributions to HSA accounts Paid Parental Leave Paid life insurance, short‑term and long‑term disability Teladoc 401(k) with a 100% match up to 4% of salary Generous paid time off and holiday schedule Cell phone reimbursement Tuition reimbursement Subscription to the Calm app MetLife Legal Company paid commuter benefit; $300 per month Compensation: Compensation will be paid in the range of $204,000 - $247,000 + Bonus. Restricted Stock Units are included in all offers. Compensation to be determined by the applicant's education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data. Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.
    $124k-174k yearly est. 4d ago
  • Reliability/DFX Engineer

    OpenAI 4.2 company rating

    Reliability engineer job in San Francisco, CA

    About the Team OpenAI's Hardware organization develops silicon and system-level solutions designed for the unique demands of advanced AI workloads. The team is responsible for building the next generation of AI-native silicon while working closely with software and research partners to co-design hardware tightly integrated with AI models. In addition to delivering production-grade silicon for OpenAI's supercomputing infrastructure, the team also creates custom design tools and methodologies that accelerate innovation and enable hardware optimized specifically for AI. About the Role We are seeking a highly skilled cross-stack engineer with deep expertise in making ML systems reliable at scale. This hands-on individual contributor will sit within our hardware team and work closely with chip design, platform design, hardware health, and the broader industry ecosystem to architect, implement, and deploy reliable next-generation AI accelerator systems. This engineer will evaluate system and chip architecture holistically, identify high-ROI opportunities to improve reliability and availability across the stack, and translate those opportunities into strategy and silicon features. In this role, you will Oversee DFX architecture, implementation, and execution in silicon from concept to high-volume deployment, and propose high-ROI features to enhance reliability and fault tolerance. DFX includes design for testability, reliability, availability, and serviceability of high-performance AI hardware. Build system-level reliability models grounded in empirical data to guide organization-wide DFX and reliability strategy. This requires a detailed understanding of chip and system architecture, design, implementation, and component-level reliability. 
Collaborate with chip and platform architecture/design teams to explore and implement DFX features, including the specification and implementation of digital/mixed-signal IP, firmware/system software, and DFX methodology (in partnership with engineering teams). Partner with hardware health and platform design teams to continuously improve reliability and fault tolerance in NPI and HVM. This includes optimizing operating conditions, designing experiments, and performing data analysis to drive continuous, data-driven improvements across the stack. Serve as the DFX/reliability champion and evangelist to align the broader industry ecosystem with OpenAI's requirements and roadmap. Qualifications BS with 15+ years, MS with 10+ years, or PhD with 3+ years of relevant industry experience focused on reliability across the chip/platform stack. Hands-on experience with RTL design and DFT is required; physical implementation and/or silicon ATE experience is preferred. Detailed understanding of ML chip and platform architecture and ML workload characteristics is required. Strong fundamentals in reliability modeling, with hands-on skills in empirical data analysis. About OpenAI OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity. We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic. 
For additional information, please see OpenAI's Affirmative Action and Equal Employment Opportunity Policy Statement. Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable law, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations. To notify OpenAI that you believe this job posting is non-compliant, please submit a report through this form. No response will be provided to inquiries unrelated to job posting compliance. We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link. OpenAI Global Applicant Privacy Policy At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
    $127k-176k yearly est. 2d ago
  • Founding SRE Engineer - Reliability & Growth

    Asana 4.6 company rating

    Reliability engineer job in San Francisco, CA

    A leading software company is seeking experienced Software Engineers to join the new Site Reliability Engineering team. This role focuses on building reliable, scalable systems and leading projects across infrastructure. Candidates should have strong software engineering skills and a passion for reliability. The position offers a hybrid work model and generous compensation packages with additional benefits.
    $147k-189k yearly est. 2d ago
  • Site Reliability Engineer: Scale, Automate & Own Cloud Infra

    Clay 4.0 company rating

    Reliability engineer job in San Francisco, CA

    A leading technology firm in San Francisco is looking for a Site Reliability Engineer to design and manage scalable infrastructure solutions. You will ensure high performance and availability while collaborating across teams. Candidates should have at least 5 years of experience, strong coding skills, and familiarity with various cloud services and automation tools. Join us in a culture of continuous improvement and innovation.
    $109k-150k yearly est. 5d ago
  • Senior Technology Site Reliability Engineer

    Cooley LLP 4.8 company rating

    Reliability engineer job in San Francisco, CA

    Locations: San Francisco, New York, Santa Monica, Los Angeles, Palo Alto. Time type: Full time. Posted on: Yesterday. Job requisition ID: Req 4348. Cooley is seeking a Senior Site Reliability Engineer to join the Infrastructure & Development Operations team. The Senior Technology Site Reliability Engineer (“SRE”) is responsible for ensuring the reliability, scalability, and performance of the firm's critical infrastructure and applications. The SRE blends software engineering with systems engineering to build and maintain automated, resilient, and observable systems that support high availability and operational excellence. In addition to being technically advanced, the SRE will have a high degree of emotional intelligence and the ability to work as a team towards complex and layered objectives. Specific duties and responsibilities include, but are not limited to, the following: Position responsibilities: Monitor and maintain production systems to ensure high availability and performance. Implement and manage service-level indicators (SLIs), objectives (SLOs), agreements (SLAs), and error budgets. Participate in on-call rotations and incident response, including root cause analysis and postmortems. Develop and maintain infrastructure as code (IaC) using Terraform. Automate deployment, scaling, and recovery processes to reduce manual intervention. Partner with DevOps to build and maintain CI/CD pipelines to support safe and efficient software delivery. Implement observability solutions using metrics, logs, traces, and alerting systems (Prometheus, Grafana, Datadog, etc.). Proactively identify and resolve system bottlenecks and reliability risks. Work closely with Infrastructure, DevOps, Development, and security teams to embed reliability into the development lifecycle. Contribute to a culture of blameless post-mortems and continuous improvement. Document operational procedures and share knowledge across teams. All other duties as assigned or required.
Skills and experience: Required: After orientation at Cooley LLP, exhibit proficiency in the Microsoft Office suite, iManage and other firm applications. Ability to work extended and/or weekend hours, as required. Ability to travel, as required. 6+ years of directly applicable experience (e.g. site reliability engineering or related field). Proficiency in Terraform and programming languages such as Python, Go, or Java. Deep expertise in cloud platforms, particularly AWS, and container orchestration. Strong background in distributed systems, performance tuning, and automation. Hands-on experience with configuration management tools such as Puppet, Chef, or Salt. Preferred: Bachelor's Degree in Computer Science, Information Technology, Engineering, or associated discipline. Experience working with advanced ETL data workflows including technologies such as AWS EMR, Azure Synapse, Azure Data Factory, or Apache Hive/Spark/Airflow. Experience with IaC deployment of AKS/EKS/GKE architecture. Experience with enterprise Data Lake environments using technologies such as Databricks or Snowflake. Competencies: Expert analytical/quantitative, problem-solving, and deductive reasoning skills, experience performing advanced troubleshooting and root cause analysis of complex technical issues. Excellent organizational, planning, and time management skills and ability to work independently and in a team environment to manage competing priorities and meet deadlines. Advanced verbal and written communication skills with the ability to present findings, conclusions, alternatives, and information clearly and concisely. Experience working with all levels of business professionals, management, stakeholders, and vendors with the ability to build effective relationships through trust and diplomacy.
Cooley offers a competitive compensation and excellent benefits package and is committed to fair and equitable employment practices. EOE. The expected annual pay range for this position with a full-time schedule is $140,000 - $205,000. Please note that final offer amount will be dependent on geographic location, applicable experience and skillset of the candidate. We offer a full range of elective benefits including medical, health savings account (with applicable medical plan), dental, vision, health and/or dependent care flexible spending accounts, pre-tax commuter benefits, life insurance, AD&D, long-term care coverage, backup care for children and/or adults and other parental support benefits. In addition to elective benefit options, benefited employees receive firm-paid life insurance, AD&D, LTD, short term medical benefits as well as 21 days of Paid Time Off (“PTO”) and 10 paid holidays each year. We provide generous parental leave and fertility benefits. New employees will attend a detailed benefit orientation to learn more about our many benefits and resources.
Welcome to Cooley. We are counselors, strategists and advocates for today's and tomorrow's leaders of the business economy. We seek to meet the evolving needs of our clients by building a community of professionals of the highest caliber who share our vision and embrace our values. Working at Cooley provides an opportunity to work in an environment of collaboration, challenge and reward. We are all part of one firm dedicated to maintaining a diverse workplace that values and celebrates differences, from the way we relate to and support each other, to the way we work together to meet the needs of our clients. It is the unique abilities and perspectives of every individual at Cooley that create a rewarding workplace. For Cooley, this means offering all employees the tools, training and mentoring they need to succeed. It enables every individual to balance work and family obligations. It looks beyond the Firm's four walls, fostering community involvement. It includes becoming leaders and contributors in our communities. Our cooperative spirit is the trademark of the Cooley Culture and every employee in every department is instrumental to the success of the Firm. We invite you to take a look at our open positions.
    $140k-205k yearly 4d ago
  • Reliability Engineer: Scale Systems, Observe & Automate

    OpenAI 4.2 company rating

    Reliability engineer job in San Francisco, CA

    A leading AI research company based in San Francisco is seeking experienced reliability engineers to scale their infrastructure and ensure system performance and reliability. This role involves collaborating with diverse teams to develop resilient systems and enhance operations. Candidates should have strong cloud proficiency, experience in containerization technologies, and a bachelor's degree in a related field.
    $127k-176k yearly est. 5d ago
  • Site Reliability Engineer

    Clay 4.0 company rating

    Reliability engineer job in San Francisco, CA

    Clay is a creative tool for growth. Our mission is to help businesses grow - without huge investments in tooling or manual labor. We're already helping over 100,000 people grow their business with Clay. From local pizza shops to enterprises like Anthropic and Notion, our tool lets you instantly translate any idea that you have for growing your company into reality. We believe that modern GTM teams win by finding GTM alpha - a unique competitive edge powered by data, experimentation, and automation. Clay is the platform they use to uncover hidden signals, build custom plays, and launch faster than their competitors. We're looking for sharp, low-ego people to help teams find their GTM alpha. Why is Clay the best place to work? Customers love the product (100K+ users and growing) We're growing a lot (6x YoY last year, and 10x YoY the two years before that) Incredible culture (our customers keep applying to work here) Well-resourced - We raised a $100M Series C in 2025 at a $3.1B valuation and are backed by world-class investors like Capital G (Google), Sequoia and Meritech Read more about why people love working at Clay here and explore our wall of love to learn more about the product. SRE @ Clay In this role, you'll join our growing infrastructure team in building and fine-tuning our infrastructure to keep our services running smoothly. We're looking for someone who's excited about automation and continuous improvement. While your main focus will be on infrastructure, coding skills are a must. As a growing startup, we all jump in where needed, so you'll need to be comfortable taking on a variety of roles. What You'll Do Architect, design, implement, and manage robust, scalable, and secure infrastructure solutions. Develop, maintain, and enforce best practices for CI/CD, infrastructure as code, and automation. Oversee the management and optimization of cloud infrastructure, ensuring high availability, performance, and cost-efficiency. 
Implement monitoring, logging, and alerting solutions to maintain system health and quickly resolve issues. Lead incident response efforts, troubleshooting and resolving complex issues in a timely manner. Participate in an on-call rotation. Work with teams across the company to ensure we achieve the right balance of developer velocity, reliability and performance, and cost efficiency. What You'll Bring 5+ years of experience. Experience with containerization and orchestration tools. Strong understanding of CI/CD concepts and tools. Knowledge of infrastructure automation tools. Experience with on-call and incident response. Proficiency in one or more programming languages. Familiarity with our stack or ability to learn unfamiliar technologies quickly: Aurora Postgres RDS, ElastiCache Redis, Docker + ECS, Lambda, OpenSearch; Terraform and Atlantis; CircleCI, Netlify, Playwright; CloudWatch, Datadog, Mezmo; TypeScript, Python.
    $109k-150k yearly est. 5d ago
  • Senior SRE - AI-Driven Cloud Reliability & Automation

    Crusoe Energy Systems LLC 4.1 company rating

    Reliability engineer job in San Francisco, CA

    A leading energy technology firm seeks a Site Reliability Engineer to enhance its reliable, energy-efficient, AI-optimized cloud platform. In this role, you'll collaborate with cross-functional teams to improve system performance and incident management. Ideal candidates will have a strong background in cloud operations and automation, alongside critical problem-solving skills. Join this innovative team to drive sustainable technology and contribute to a cutting-edge infrastructure focused on operational excellence.
    $142k-189k yearly est. 3d ago

Learn more about reliability engineer jobs

How much does a reliability engineer earn in Redwood City, CA?

The average reliability engineer in Redwood City, CA earns between $96,000 and $187,000 annually. This compares to the national average reliability engineer range of $76,000 to $144,000.

Average reliability engineer salary in Redwood City, CA

$134,000

What are the biggest employers of Reliability Engineers in Redwood City, CA?

The biggest employers of Reliability Engineers in Redwood City, CA are:
  1. Tesla
  2. Zoox
  3. Replit
  4. xAI
  5. Rivian
  6. Skydio
  7. The Connection
  8. JPMorgan Chase & Co.
  9. JPMC
  10. Pantera Capital