Systems engineer jobs in Walnut Creek, CA - 4,864 jobs

  • 4-H Data Systems Analyst 3 - Davis, CA, Job ID 82838

    University of California Agriculture and Natural Resources 3.6 company rating

    Systems engineer job in Davis, CA

    Under the direction and supervision of the Statewide 4-H Director, the 4-H Data Systems Analyst applies advanced analytical concepts, organizational objectives, and database integration principles to assist with the management and development of the statewide 4-H enrollment and reporting system. This role involves analyzing extensive and multi-layered processes and problems; developing identified online system needs and solutions; collaborating to ensure all new and updated enrollment system processes will improve efficiency of the University of California 4-H (CA 4-H) Youth Development Program's enrollment system. The incumbent provides subject-matter expertise to inform enrollment system design, data integrity, reporting, training, and compliance across related platforms used in CA 4-H. This includes serving as the primary liaison with vendors, county offices, statewide staff, and external partners to ensure the enrollment system and related tools meet program, policy, and compliance requirements. The position is responsible for designing data methodologies, developing statewide enrollment reporting frameworks, and analyzing program participation trends to inform organizational decision-making. The analyst also leads requirements gathering and analysis to translate statewide operational, programmatic, and policy needs into technical specifications. The 4-H Data Systems Analyst participates in the development of enrollment system training, resources, and system enhancements. The role requires the ability to manage multiple, high-level projects, anticipate and adapt to organizational needs, and deliver innovative, data-driven solutions that increase efficiency, compliance, and program effectiveness across CA 4-H. This position independently applies advanced data systems concepts to resolve complex issues and shape statewide system functions. 
The position also collaborates with the 4-H Policy Analyst to ensure that all applicable UC, state, federal, and 4-H policy changes are integrated into the enrollment system. The 4-H Data Analyst also collaborates on policy-based issues impacting the UC 4-H enrollment system, UC ANR digital enterprise system, and the national 4-H network for data management and enrollment reporting. This position is a career appointment that is 100% fixed. The home department is CA 4-H. While this position normally is based in Davis, CA, this position is eligible for hybrid flexible work arrangements for applicants living in the State of California at this time. Please note that hybrid flexible work arrangements are subject to change by the University. Pay Scale: $84,100.00/year to $119,400.00/year Job Posting Close Date: This job is open until filled. The first application review date will be 12/16/2025. Key Responsibilities: 40% Statewide Data System Coordination and Support: Provides strategic oversight and management of the statewide 4-H enrollment database and related systems, ensuring data integrity, compliance, and security. Participates in the design and oversees implementation of system features, integrations, and workflows to increase efficiency and effectiveness of program operations. Assists with the development of statewide methodologies for extracting, validating, and reporting data, ensuring alignment with UC, state, and federal reporting requirements. Serves as primary liaison to vendors and developers, advocating California's system needs and ensuring successful system enhancements and problem resolution. Ensures consistent application of data governance and quality assurance practices across all statewide enrollment data workflows. Collaborates with Statewide 4-H Director, 4-H Policy Analyst and others to anticipate and interpret applicable policy changes (UC, state, federal and 4-H) and integrates them into enrollment system design and user processes. 
20% Data Analysis, Reporting, and Policy Support: Designs and delivers advanced reporting dashboards, data visualizations, and analyses to support statewide monitoring, compliance, and decision-making. Conducts complex analyses of program participation and system usage, identifying trends, gaps, and opportunities to inform leadership decisions. Leads requirements gathering and analysis to translate statewide operational, programmatic, and policy needs into technical specifications and system configurations. Serves as subject matter expert in translating program and policy requirements into actionable enrollment system processes. 30% Training, Communication, & Statewide Support: Assists with the design and implementation of statewide training programs, guidance materials, and communication strategies for all 4-H data system users, including county staff, volunteers, and families. Delivers advanced, multi-platform trainings (virtual and in-person), ensuring consistent statewide understanding and compliance. Coaches and advises county-level staff on complex system and policy questions, providing advanced-level troubleshooting and guidance. Represents California 4-H in national peer groups and committees related to enrollment and data systems, sharing best practices and advocating for program needs. 10% Additional Systems & Financial Reporting System: Provides secondary technical support for additional online 4-H systems, including the statewide financial reporting platform, as needed. Advises on future CA 4-H enrollment system technology adoption, integration, and system expansion opportunities to strengthen program operations. Review enrollment system functions for increased efficiencies in enrollment procedures and overall data collection and use. Provides subject-matter expertise to evaluate system functionality and recommend improvements to support statewide operational efficiency. 
Requirements: Bachelor's degree in a related field and extensive professional experience in data systems management, reporting, and analysis, or equivalent combination of education and experience. Demonstrated expertise in database design, system implementation, and data security/integrity practices, including handling complex and sensitive data. Thorough knowledge of data visualization and reporting tools; ability to design dashboards and decision-support tools for executive audiences. Strong analytical, problem-solving, collaboration, and decision-making skills; ability to independently as well as collaboratively resolve highly complex issues requiring evaluation of multiple factors. Excellent written and verbal communication skills; ability to communicate technical concepts to diverse audiences. Ability to anticipate organizational needs, translate policy into operational procedures, and recommend strategic improvements. Demonstrated strong proficiency using Microsoft Office, Zoom, Google Workspace applications, Box, and similar collaboration and communication software tools. Preferred Skills: Master's degree in a related field and significant professional experience in data systems management, reporting, and analysis, and/or equivalent combination of education and experience. Knowledge of Cooperative Extension. Knowledge of 4-H program delivery, including delivery modes. Experience managing vendor relationships and system development projects. Coding knowledge and experience. Fluency in Spanish. Special Conditions of Employment: Must possess valid California Driver's License to drive a County or University vehicle. Ability and means to travel on a flexible schedule as needed; proof of liability damage insurance on vehicle used is required. Job-related travel will be reimbursed according to University policies. Travel, including travel outside normal business hours, may be requested. 
The University reserves the right to make employment contingent upon successful completion of the background check. This is a designated position requiring a background check and may require fingerprinting due to the nature of the job responsibilities. UC ANR does hire people with conviction histories and reviews information received in the context of the job responsibilities. As of January 1, 2014, ANR is a smoke- and tobacco-free environment in which smoking, the use of smokeless tobacco products, and the use of unregulated nicotine products (e-cigarettes), is strictly prohibited. As a condition of employment, you will be required to comply with the University of California Policy on Vaccination Programs, as may be amended or revised from time to time. Federal, state, or local public health directives may impose additional requirements. Exercise the utmost discretion in managing sensitive information learned in the course of performing their duties. Sensitive information includes, but is not limited to, employee and student records, health and patient records, financial data, strategic plans, proprietary information, and any other sensitive or non-public information learned during the course and scope of employment. Understands that sensitive information should be shared on a limited basis and actively takes steps to limit access to sensitive information to individuals who have legitimate business need to know. Ensure that sensitive information is properly safeguarded. Follow all organizational policies and laws on data protection and privacy. This includes secure handling of physical and digital records and proper usage of IT systems to prevent data leaks. The unauthorized or improper disclosure of confidential work-related information obtained from any source on any work-related matter is a violation of these expectations. 
Misconduct Disclosure Requirement: As a condition of employment, the final candidate who accepts a conditional offer of employment will be required to disclose if they have been subject to any final administrative or judicial decisions within the last seven years determining that they committed any misconduct; received notice of any allegations or are currently the subject of any administrative or disciplinary proceedings involving misconduct; have left a position after receiving notice of allegations or while under investigation in an administrative or disciplinary proceeding involving misconduct; or have filed an appeal of a finding of misconduct with a previous employer. a. "Misconduct" means any violation of the policies or laws governing conduct at the applicant's previous place of employment, including, but not limited to, violations of policies or laws prohibiting sexual harassment, sexual assault, or other forms of harassment, discrimination, dishonesty, or unethical conduct, as defined by the employer. For reference, below are UC's policies addressing some forms of misconduct: UC Sexual Violence and Sexual Harassment Policy; UC Anti-Discrimination Policy; Abusive Conduct in the Workplace. To apply, please visit: https://careerspub.universityofcalifornia.edu/psc/ucanr/EMPLOYEE/HRMS/c/HRS_HRAM_FL.HRS_CG_SEARCH_FL.GBL?Page=HRS_APP_JBPST_FL&JobOpeningId=82838&PostingSeq=1&SiteId=17&languageCd=ENG&FOCUS=Applicant
    $84.1k-119.4k yearly 7d ago
  • Machine Learning Infrastructure Engineer

    Apple Inc. 4.8 company rating

    Systems engineer job in Sunnyvale, CA

    Sunnyvale, California, United States Machine Learning and AI Want to ship amazing experiences in Apple products? Be part of the team in the Video Computer Vision (VCV) organization that focuses on people understanding from real-time video streams and building higher level reasoning algorithms. VCV delivered features such as Face ID, RoomPlan as well as many other computer vision algorithms powering Apple Vision Pro, iPhone, and iPad. We focus on a balance of research and development to deliver Apple quality, pioneering experiences. Come shape Apple products as a driven and dedicated ML Infrastructure and Data Engineer to push the limits of ML algorithms with hands‑on work and real world and simulated data, in an innovative team and be part of building the next big thing. Description As part of the Video Computer Vision (VCV) team, you will help us create the data and infrastructure ecosystem needed to support our ML development and continuously improve our features. We take full end-to-end ownership of our services and data products, driving them through every stage meticulously, encompassing conception, design, implementation, deployment, and maintenance. As a result, each one of us takes our responsibilities seriously. In this team, you'll have the opportunity to work on complex problems in close partnership with our ML engineers, data scientists and software integration teams. Minimum Qualifications Bachelor's degree in Computer Science or related discipline, and 2 years relevant industry experience. Strong foundational knowledge in Computer Science. Extensive programming experience in Python. Hands‑on experience with cloud providers (AWS, GCP, or Azure). Strong understanding of core infrastructure concepts (e.g., compute, networking, storage, containers, Kubernetes). Preferred Qualifications Experience with machine learning model development lifecycle, including data preprocessing, model training, evaluation, and deployment. 
Proficiency with cloud computing and distributed data processing infrastructure and tools (e.g., Ray, Spark, Trino). Hands‑on experience with CI/CD pipelines and practices. Familiarity with Infrastructure as Code (IaC) tools (e.g. Terraform, Pulumi, or CloudFormation). Experience building on LLMs or other generative models. Ability to drive projects from concept to production, balancing business needs with technical quality and timely delivery. Excellent communication skills, ability to work both independently and multi‑functionally. At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $147,400 and $272,100, and your base pay will depend on your skills, qualifications, experience, and location. Apple employees also have the opportunity to become an Apple shareholder through participation in Apple's discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple's Employee Stock Purchase Plan. You'll also receive benefits including: Comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and for formal education related to advancing your career at Apple, reimbursement for certain educational expenses - including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation. Learn more about Apple Benefits. Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program. Apple is an equal opportunity employer that is committed to inclusion and diversity. 
We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant. Apple accepts applications to this posting on an ongoing basis.
    $147.4k-272.1k yearly 5d ago
  • MEP Systems Engineer

    Samara 3.4 company rating

    Systems engineer job in Redwood City, CA

    Ready to play a key role in building the future of living? Join Samara in tackling California's housing shortage and enabling people to attain sustainable housing without compromising design or quality. Our flagship product, Backyard, is a fully turnkey, premium accessory dwelling unit (ADU) designed for homeowners and real estate developers. As we expand our offerings and scale our in-house development initiatives, we're at a pivotal moment, redefining homeownership through high-quality, attainable infill housing. Backed by top-tier investors, including Airbnb, Thrive Capital, and 8VC, Samara is positioned for significant growth and market impact. To support our next phase of growth, we're hiring product-focused engineers to advance and scale the technical foundation of our modular system. These roles go beyond traditional design work: they refine system standards, improve factory repeatability, and ensure our units are code-compliant, manufacturable, and built to the highest standards of quality and performance. The MEP Systems Engineer will be responsible for the detailed design and implementation of mechanical, electrical, plumbing, and PV systems tailored for modular construction building systems. This role requires a deep understanding of MEP systems combined with practical experience in modular construction. You will collaborate closely with leadership and cross-functional design and engineering teams to integrate all technical and user experience requirements into our designs to ensure optimal functionality, sustainability, and compliance with all regulations. 
What You'll Do Design and develop integrated MEP systems for our new and existing designs including solar energy systems, including PV and ESS, optimized for prefabricated modular construction Ensure that solar and energy storage designs align with overall MEP system functionality and building energy requirements Lead the creation of comprehensive design documents, schematics, component material selections and system layouts, preferably using CAD and BIM software Provide technical leadership during the installation and commissioning phases to ensure systems meet design specifications and performance standards Conduct system testing and validation to ensure functionality, efficiency, and safety of both MEP and PV installations Collaborate closely with installation teams to facilitate seamless and efficient factory and onsite implementation of design Engage in research and application of the latest technologies and practices in renewable energy and modular construction Work with program managers and other engineering disciplines to ensure holistic integration of all systems within Samara modular units What We're Looking For Modular construction experience in factory builds, multi-mod, stackable and/or other hands on related experience. 
Licensed Electrician or Mechanical Contractor -and/or- Bachelor's degree in Mechanical, Electrical, or Energy Systems Engineering, or a related field Professional Engineering (PE) license preferred Minimum of 7 years of experience in one of the following: Mechanical, Electrical, Solar and/or Plumbing System design Comprehensive knowledge of building codes, safety regulations, and sustainability practices relevant to MEP and renewable energy systems Proficiency in design software such as Onshape, Revit, and/or other BIM methodologies preferred Excellent problem-solving skills and the ability to adapt designs to changing technological and regulatory landscapes Strong communication and leadership skills, capable of driving project decisions and managing complex stakeholder relationships Ability to travel to our factory in Mexico up to 25-40%. What We Offer Salary range of $120-160K and performance-based bonuses. Hybrid work schedule with 3 days each week in our Redwood City office. Snacks and Lunch on in-office days Early stage employee equity. Exceptional health, dental, and vision insurance. 401k eligibility after 6 months. Flexible PTO policy. How to Apply If you're excited to support Samara's mission and have the skills to match, we'd love to hear from you. Please submit your resume and a brief letter of introduction to our team. Let's build something extraordinary-together.
    $120k-160k yearly 4d ago
  • Machine Learning Infrastructure Engineer

    Ambience Healthcare

    Systems engineer job in San Francisco, CA

    About Us: Ambience Healthcare is the leading AI platform for documentation, coding, and clinical workflow, built to reduce administrative burden and protect revenue integrity at the point of care. Trusted by top health systems across North America, Ambience's platform is live across outpatient, emergency, and inpatient settings, supporting more than 100 specialties with real-time, coding‑aware documentation. The platform integrates directly with Epic, Oracle Cerner, athenahealth, and other major EHRs. Founded in 2020 by Mike Ng and Nikhil Buduma, Ambience is headquartered in San Francisco and backed by Oak HC/FT, Andreessen Horowitz (a16z), OpenAI Startup Fund, Kleiner Perkins, and other leading investors. Join us in the endeavor of accelerating the path to safe & useful clinical super intelligence by becoming part of our community of problem solvers, technologists, clinicians, and innovators. The Role: We're looking for a Machine Learning Infrastructure Engineer to join our AI Platform team. This is a high-leverage role focused on building and scaling the core infrastructure that powers every AI system at Ambience. You'll work closely with our ML, data, and product teams to develop the foundational tools, systems, and workflows that support rapid iteration, robust evaluation, and production reliability for our LLM‑based products. Our engineering roles are hybrid - working onsite at our San Francisco office three days per week. 
Who You Are: You have 5+ years of experience as a software engineer, infrastructure engineer, or ML platform engineer You've worked directly on systems that support ML research or production workloads - whether training pipelines, evaluation systems, or deployment frameworks You write high-quality code (we primarily use Python) and have strong engineering and systems design instincts You're excited to work closely with ML researchers and product engineers to unblock them with better infrastructure You're pragmatic and care deeply about making tools that are reliable, scalable, and easy to use You thrive in fast-paced, collaborative environments and are eager to take ownership of ambiguous problems What You'll Do: Design, build, and maintain the infrastructure powering ML model training, batch inference, and evaluation workflows Improve internal tools and developer experience for ML experimentation and observability Partner with ML engineers to optimize model deployment and monitoring across clinical workloads Define standards for model versioning, performance tracking, and rollout processes Collaborate across the engineering team to build reusable abstractions that accelerate AI product development Drive performance, cost efficiency, and reliability improvements across our AI infrastructure stack Pay Transparency We offer a base compensation range of approximately $200,000-300,000 per year, with the addition of significant equity. This intentionally broad range provides flexibility for candidates to tailor their cash and equity mix based on individual preferences. Our compensation philosophy prioritizes meaningful equity grants, enabling team members to share directly in the impact they help create. If your expectations fall outside of this range, we still encourage you to apply-our approach to compensation considers a range of factors to ensure alignment with each candidate's unique needs and preferences. 
Being at Ambience: An opportunity to work with cutting edge AI technology, on a product that dramatically improves the quality of life for healthcare providers and the quality of care they can provide to their patients Dedicated budget for personal development, including access to world class mentors, advisors, and an in‑house executive coach Work alongside a world‑class, diverse team that is deeply mission aligned Ownership over your success and the ability to significantly impact the growth of our company Competitive salary and equity compensation with benefits including health, dental, and vision coverage, quarterly retreats, unlimited PTO, and a 401(k) plan Ambience is committed to supporting every candidate's ability to fully participate in our hiring process. If you need any accommodations during your application or interviews, please reach out to our Recruiting team at accommodations@ambiencehealth.com. We'll handle your request confidentially and work with you to ensure an accessible and equitable experience for all candidates.
    $200k-300k yearly 1d ago
  • Privacy-First ML Infrastructure Engineer

    Workshop Labs

    Systems engineer job in San Francisco, CA

    A pioneering AI startup in San Francisco is looking for an experienced individual to build infrastructure for deploying personalized AI models. The role demands a strong understanding of machine learning technology and a passion for enabling user-controlled AI solutions. Ideal candidates will thrive in fast-paced environments and contribute to impactful AI development. The company offers competitive compensation, equity, and a significant role in shaping the future of AI.
    $115k-175k yearly est. 3d ago
  • Machine Learning Infrastructure Engineer

    David Ai

    Systems engineer job in San Francisco, CA

    David AI is the first audio data research company. We bring an R&D approach to data-developing datasets with the same rigor AI labs bring to models. Our mission is to bring AI into the real world, and we believe audio is the gateway. Speech is versatile, accessible, and human-it fits naturally into everyday life. As audio AI advances and new use cases emerge, high-quality training data is the bottleneck. This is where David AI comes in. David AI was founded in 2024 by a team of former Scale AI engineers and operators. In less than a year, we've brought on most FAANG companies and AI labs as customers. We recently raised a $50M Series B from Meritech, NVIDIA, Jack Altman (Alt Capital), Amplify Partners, First Round Capital and other Tier 1 investors. Our team is sharp, humble, ambitious, and tight-knit. We're looking for the best research, engineering, product, and operations minds to join us on our mission to push the frontier of audio AI. About our Engineering team At David AI, our engineers build the pipelines, platforms, and models that transform raw audio into high-signal data for leading AI labs and enterprises. We're a tight-knit team of product engineers, infrastructure specialists, and machine learning experts focused on building the world's first audio data research company. We move fast, own our work end-to-end, and ship to production daily. Our team designs real-time pipelines handling terabytes of speech data and deploys cutting-edge generative audio models. About this role As our Founding Machine Learning Infrastructure Engineer at David AI, you will build and scale the core infrastructure that powers our cutting-edge audio ML products. You'll be leading the development of the systems that enable our researchers and engineers to train, deploy, and evaluate machine learning models efficiently. 
In this role, you will Design and maintain data pipelines for processing massive audio datasets, ensuring terabytes of data are managed, versioned, and fed into model training efficiently. Develop frameworks for training audio models on compute clusters, managing cloud resources, optimizing GPU utilization, and improving experiment reproducibility. Create robust infrastructure for deploying ML models to production, including APIs, microservices, model serving frameworks, and real-time performance monitoring. Apply software engineering best practices with monitoring, logging, and alerting to guarantee high availability and fault‑tolerant production workloads. Translate research prototypes into production pipelines, working with ML engineers and data teams to support efficient data labeling and preparation. and optimization techniques to enhance infrastructure velocity and reliability. Your background looks like 5+ years of backend engineering with 2+ years ML infrastructure experience. Hands‑on experience scaling cloud infrastructure and large‑scale data processing pipelines for ML model training and evaluation. Proficient with Docker, Kubernetes, and CI/CD pipelines. Proven ML model deployment and lifecycle management in production. Strong system design skills optimizing for scale and performance. Proficient in Python with deep Kubernetes experience. Bonus points if you have Experience with feature stores, experiment tracking (MLflow, Weights and Biases), or custom CI/CD pipelines. Familiarity with large‑scale data ingestion and streaming systems (Spark, Kafka, Airflow). Proven ability to thrive in fast‑moving startup environments. Some technologies we work with Next.js, TypeScript, TailwindCSS, Node.js, tRPC, PostgreSQL, AWS, Trigger.dev, WebRTC, FFmpeg. Benefits Unlimited PTO. Top‑notch health, dental, and vision coverage with 100% coverage for most plans. FSA & HSA access. 401k access. 
Meals 2x daily through DoorDash + snacks and beverages available at the office. Unlimited company‑sponsored Barry's classes.
    $115k-175k yearly est. 5d ago
  • Machine Learning Infrastructure Engineer at early-stage private AI platform

    Jack & Jill/External ATS

    Systems engineer job in San Francisco, CA

    This is a job that we are recruiting for on behalf of one of our customers. To apply, speak to Jack. He's an AI agent that sends you unmissable jobs and then helps you ace the interview. He'll make sure you are considered for this role, and help you find others if you ask. Machine Learning Infrastructure Engineer Company Description: Early-stage private AI platform Job Description: Build the core infrastructure to serve thousands, then millions, of private, personalized AI models at scale. This role involves optimizing model serving performance for low latency and cost, and integrating a TEE-based privacy stack to ensure user data and models are exclusively accessible by the user, not even the company. Drive the foundational systems for a new era of personal AI. Location: San Francisco, USA Why this role is remarkable: Pioneer the infrastructure for truly private, personal AI models, ensuring user data remains confidential. Join an early-stage, well-funded startup backed by top-tier VCs and leading AI experts. Make a massive impact on the future of AI, helping to keep humans empowered in a post-AGI world. What you will do: Build infrastructure for deploying thousands to millions of personalized finetuned models. Monitor and optimize in-the-wild model serving performance for low latency and cost. Integrate with a TEE-based privacy stack to guarantee user data and model confidentiality. The ideal candidate: Deep understanding of the machine learning stack, including transformer optimization and GPU performance. Ability to execute quickly in a fast-paced, early-stage startup environment. A missionary mentality, passionate about ensuring AI works for people. How to Apply: To apply for this job speak to Jack, our AI recruiter. Step 1. Visit our website Step 2. Click 'Speak with Jack' Step 3. Login with your LinkedIn profile Step 4. Talk to Jack for 20 minutes so he can understand your experience and ambitions Step 5. 
If the hiring manager would like to meet you, Jack will make the introduction.
    $115k-175k yearly est. 4d ago
  • Machine Learning Systems Engineer

    Menlo Ventures

    Systems engineer job in Berkeley, CA

    Who We Are At RelationalAI, we are building the future of intelligent data systems through our cloud-native relational knowledge graph management system, a platform designed for learning, reasoning, and prediction. We are a remote-first, globally distributed team with colleagues across six continents. From day one, we've embraced asynchronous collaboration and flexible schedules, recognizing that innovation doesn't follow a 9-to-5. We are committed to an open, transparent, and inclusive workplace. We value the unique backgrounds of every team member and believe in fostering a culture of respect, curiosity, and innovation. We support each other's growth and success, and take the well‑being of our colleagues seriously. We encourage everyone to find a healthy balance that affords them a productive, happy life, wherever they choose to live. We bring together engineers who love building core infrastructure, obsess over developer experience, and want to make complex systems scalable, observable, and reliable. Machine Learning Systems Engineer Location: Remote (San Francisco Bay Area / North America / South America) Experience Level: 3+ years of experience in machine learning engineering or research About ScalarLM This role involves working heavily with the ScalarLM framework and team. ScalarLM unifies vLLM, Megatron-LM, and HuggingFace for fast LLM training, inference, and self‑improving agents, all via an OpenAI‑compatible interface. ScalarLM builds on top of the vLLM inference engine, the Megatron‑LM training framework, and the HuggingFace model hub. It unifies the capabilities of these tools into a single platform, enabling users to easily perform LLM inference and training, and build higher‑level applications such as Agents with a twist - they can teach themselves new abilities via backpropagation. 
ScalarLM is inspired by the work of Seymour Roger Cray (September 28, 1925 - October 5, 1996), an American electrical engineer and supercomputer architect who designed a series of computers that were the fastest in the world for decades, and founded Cray Research, which built many of these machines. Called "the father of supercomputing", Cray has been credited with creating the supercomputer industry. It is a fully open source project (CC‑0 Licensed) focused on democratizing access to cutting‑edge LLM infrastructure that combines training and inference in a unified platform, enabling the development of self‑improving AI agents similar to DeepSeek R1. ScalarLM is supported and maintained by TensorWave in addition to RelationalAI. The Role As a Machine Learning Engineer, you will contribute directly to our machine learning infrastructure, to the ScalarLM open source codebase, and build large‑scale language model applications on top of it. You'll operate at the intersection of high-performance computing, distributed systems, and cutting‑edge machine learning research, developing the fundamental infrastructure that enables researchers and organizations worldwide to train and deploy large language models at scale. This is an opportunity to take on technically demanding projects, contribute to foundational systems, and help shape the next generation of intelligent computing. You Will Contribute code and performance improvements to the open source project. Develop and optimize distributed training algorithms for large language models. Implement high‑performance inference engines and optimization techniques. Work on integration between vLLM, Megatron‑LM, and HuggingFace ecosystems. Build tools for seamless model training, fine‑tuning, and deployment. Optimize performance of advanced GPU architectures. Collaborate with the open source community on feature development and bug fixes. Research and implement new techniques for self‑improving AI agents. 
Who You Are Technical Skills Programming Languages: Proficiency in both C/C++ and Python High Performance Computing: Deep understanding of HPC concepts, including: MPI (Message Passing Interface) programming and optimization Bulk Synchronous Parallel (BSP) computing models Multi‑GPU and multi‑node distributed computing CUDA/ROCm programming experience preferred Machine Learning Foundations: Solid understanding of gradient descent and backpropagation algorithms Experience with transformer architectures and the ability to explain their mechanics Knowledge of deep learning training and its applications Understanding of distributed training techniques (data parallelism, model parallelism, pipeline parallelism, large batch training, optimization) Research and Development Publications: Experience with machine learning research and publications preferred Research Skills: Ability to read, understand, and implement techniques from recent ML research papers Open Source: Demonstrated commitment to open source development and community collaboration Experience 3+ years of experience in machine learning engineering or research. Experience with large-scale distributed training frameworks (Megatron‑LM, DeepSpeed, FairScale, etc.). Familiarity with inference optimization frameworks (vLLM, TensorRT, etc.). Experience with containerization (Docker, Kubernetes) and cluster management. Background in systems programming and performance optimization. Bonus points if: PhD or MS in Computer Science, Computer Engineering, Machine Learning, or related field. Experience with SLURM, Kubernetes, or other cluster orchestration systems. Knowledge of mixed precision training, data parallel training, and scaling laws. Experience with transformer architecture, pytorch, decoding algorithms. Familiarity with high performance GPU programming ecosystem. Previous contributions to major open source ML projects. Experience with MLOps and model deployment at scale. 
Understanding of modern attention mechanisms (multi‑head attention, grouped query attention, etc.). Why RelationalAI RelationalAI is committed to an open, transparent, and inclusive workplace. We value the unique backgrounds of our team. We are driven by curiosity, value innovation, and help each other to succeed and to grow. We take the well‑being of our colleagues seriously, and offer flexible working hours so each individual can find a healthy balance that affords them a productive, happy life wherever they choose to live. 🌎 Global Benefits at RelationalAI At RelationalAI, we believe that people do their best work when they feel supported, empowered, and balanced. Our benefits prioritize well‑being, flexibility, and growth, ensuring you have the resources to thrive both professionally and personally. We are all owners in the company and reward you with a competitive salary and equity. Work from anywhere in the world. Comprehensive benefits coverage, including global mental health support Open PTO - Take the time you need, when you need it. Company Holidays, Your Regional Holidays, and RAI Holidays-where we take one Monday off each month, followed by a week without recurring meetings, giving you the time and space to recharge. Paid parental leave - Supporting new parents as they grow their families. We invest in your learning & development Regular team offsites and global events - Building strong connections while working remotely through team offsites and global events that bring everyone together. A culture of transparency & knowledge‑sharing - Open communication through team standups, fireside chats, and open meetings. Country Hiring Guidelines RelationalAI hires around the world. All of our roles are remote; however, some locations might carry specific eligibility requirements. Because of this, understanding location & visa support helps us better prepare to onboard our colleagues. 
Our People Operations team can help answer any questions about location after starting the recruitment process. Privacy Policy EU residents applying for positions at RelationalAI can see our Privacy Policy here. California residents applying for positions at RelationalAI can see our Privacy Policy here. Equal Opportunity RelationalAI is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, color, gender identity or expression, marital status, national origin, disability, protected veteran status, race, religion, pregnancy, sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances.
    $86k-121k yearly est. 3d ago
  • Distributed Systems Engineer

    Archil, Inc.

    Systems engineer job in San Francisco, CA

    Role As a distributed systems engineer, you'll work across the stack to solve problems as they come up and help build Archil volumes. You'll have significant influence over the technical and product direction. We'll expect you to be able to: Be oncall for a production system to help our customers if anything goes wrong. Build out never-before-seen capabilities in a storage service Design distributed systems interactions for atomicity and idempotency Deploy infrastructure and generalize infrastructure across different clouds Operate through changing customer requirements with lots of ambiguity Who are you? You have 3+ years of experience building and operating distributed systems (flexible). Ideally, you've worked at a startup before, so you know how chaotic this time can be. You've successfully resolved disagreements at work before, and you understand that the highest priority is helping our customers - not being right. You're comfortable debugging problems that occur as a result of failures in multiple, different systems, using tools like metrics and logs. You've been paged at 3am to solve a complex production issue before. You're knowledgeable about distributed systems: you get how consensus works, you know how to scale systems, and you know what pitfalls in API design to avoid. You're familiar with how to optimize the performance of a system, including a general sense of how much latency different operations take, and what kind of bottlenecks could lead to a reduction in potential throughput. Most of all, you know how computers work from the silicon up. Someone once asked you in an interview “what happens when you go to Google.com”, and there wasn't enough time in the interview to talk about all of the steps. Why join us? 
By building the highest-performance, simplest storage product in the cloud, we have a great chance of changing how the world builds the next generation of applications (and with AI, more applications will be written in the next 5 years than ever before). We'd love for you to be a part of our journey. How to join? Show us that you're knowledgeable about the space that we're working in on your application. It's up to you how you do this, but one potential way is by answering one of the following questions: How do you think our system works? What do you think our biggest technical challenge is? What would make our system not work? About Archil Archil is on a mission to change how developers build applications in the cloud, by building the next default storage platform in the cloud. Over the past 15 years, S3 has become the default way to store inactive data sets in the cloud, but the next generation of AI and analytics applications needs to actively process more data than ever before. We're solving this problem by building the first Volume storage product that's as fast as EBS, infinitely scalable like S3, and connects to existing data sets in S3 and other repositories. Our customers choose Archil because this architecture radically simplifies how they think about working with their data (every application becomes stateless, no cold-start latencies, and no need to worry about checkpointing or backup). Hacker News agrees. Hunter, the founder, has 10 years of experience building and operating cloud storage, including helping to launch Amazon's EFS product and working on bleeding-edge storage at Netflix. He started the company after working with hundreds of customers across these roles, and identifying a need for a new kind of storage product. We're fully in-person in San Francisco. 
If you're also someone interested in distributed systems, completely focused on how to make customers successful, and interested in solving really big technical challenges, we'd love for you to join us.
    $87k-121k yearly est. 1d ago
  • ML Engineer - Production-Scale AI Systems

    Inference

    Systems engineer job in San Francisco, CA

    A cutting-edge AI startup in San Francisco is seeking a Machine Learning Engineer. In this role, you will build and improve core ML systems that drive custom model training platforms. You will lead projects from data intake to model delivery, creating robust tools and ensuring model performance. The ideal candidate has experience in AI model training with PyTorch, data processing, and creating benchmarks. The role offers a competitive salary of $220,000 to $320,000, plus equity and benefits.
    $87k-121k yearly est. 1d ago
  • SRE Cybersecurity Engineer - Scale Systems (Equity)

    Pantera Capital

    Systems engineer job in Palo Alto, CA

    A tech-focused financial services firm in California is seeking a Cybersecurity/SRE professional to secure and maintain the reliability of its infrastructure. Responsibilities include building secure applications on AWS, managing identities, and strengthening Kubernetes security. The ideal candidate has expertise in Python, Terraform, and large distributed systems, and holds a proactive, problem-solving mindset. Competitive salary and comprehensive benefits included.
    $86k-120k yearly est. 2d ago
  • Distributed ML Infrastructure Engineer

    Institute of Foundation Models

    Systems engineer job in Sunnyvale, CA

    A leading research lab in Sunnyvale is seeking a distributed ML infrastructure engineer to extend and scale training systems. The ideal candidate must have over 5 years of experience in ML systems with strong expertise in distributed training frameworks like DeepSpeed and FSDP. This role offers a competitive salary ranging from $150,000 to $450,000 annually along with comprehensive benefits and amenities.
    $114k-174k yearly est. 3d ago
  • Staff Hardware Systems Engineer - BIOS/Firmware Lead

    Crusoe 4.1 company rating

    Systems engineer job in San Francisco, CA

    A technology company in Sunnyvale is seeking a Staff Hardware Systems Engineer to lead the development of system firmware and kernel-level software for high-performance server platforms. The ideal candidate has over 8 years of experience in hardware systems development and strong expertise in BIOS and firmware engineering. You will directly influence the company's future by enhancing hardware compatibility and performance. Competitive compensation and benefits are provided, including stock options.
    $117k-170k yearly est. 3d ago
  • ML Infrastructure Engineer - Real-Time Vision

    Apple Inc. 4.8 company rating

    Systems engineer job in Sunnyvale, CA

    A leading technology company is looking for a Machine Learning Infrastructure Engineer in Sunnyvale, California. You will develop data ecosystems and infrastructure for ML projects, partnering closely with engineers and scientists. Candidates should have a Bachelor's in Computer Science and experience with cloud providers, as well as strong programming skills in Python. This is an opportunity to be a part of innovative projects that influence the next generation of technology.
    $150k-196k yearly est. 5d ago
  • Machine Learning Infrastructure Engineer

    Ambience Healthcare, Inc.

    Systems engineer job in San Francisco, CA

    About Us: Ambience Healthcare is the leading AI platform for documentation, coding, and clinical workflow, built to reduce administrative burden and protect revenue integrity at the point of care. Trusted by top health systems across North America, Ambience's platform is live across outpatient, emergency, and inpatient settings, supporting more than 100 specialties with real-time, coding-aware documentation. The platform integrates directly with Epic, Oracle Cerner, athenahealth, and other major EHRs. Founded in 2020 by Mike Ng and Nikhil Buduma, Ambience is headquartered in San Francisco and backed by Oak HC/FT, Andreessen Horowitz (a16z), OpenAI Startup Fund, Kleiner Perkins, and other leading investors. Join us in the endeavor of accelerating the path to safe & useful clinical super intelligence by becoming part of our community of problem solvers, technologists, clinicians, and innovators. The Role: We're looking for a Machine Learning Infrastructure Engineer to join our AI Platform team. This is a high-leverage role focused on building and scaling the core infrastructure that powers every AI system at Ambience. You'll work closely with our ML, data, and product teams to develop the foundational tools, systems, and workflows that support rapid iteration, robust evaluation, and production reliability for our LLM-based products. Our Engineering roles are hybrid in our SF office 3x/wk. 
Who You Are: You have 5+ years of experience as a software engineer, infrastructure engineer, or ML platform engineer You've worked directly on systems that support ML research or production workloads - whether training pipelines, evaluation systems, or deployment frameworks You write high-quality code (we primarily use Python) and have strong engineering and systems design instincts You're excited to work closely with ML researchers and product engineers to unblock them with better infrastructure You're pragmatic and care deeply about making tools that are reliable, scalable, and easy to use You thrive in fast-paced, collaborative environments and are eager to take ownership of ambiguous problems What You'll Do: Design, build, and maintain the infrastructure powering ML model training, batch inference, and evaluation workflows Improve internal tools and developer experience for ML experimentation and observability Partner with ML engineers to optimize model deployment and monitoring across clinical workloads Define standards for model versioning, performance tracking, and rollout processes Collaborate across the engineering team to build reusable abstractions that accelerate AI product development Drive performance, cost efficiency, and reliability improvements across our AI infrastructure stack Pay Transparency We offer a base compensation range of approximately $200,000-300,000 per year, exclusive of equity. This intentionally broad range provides flexibility for candidates to tailor their cash and equity mix based on individual preferences. Our compensation philosophy prioritizes meaningful equity grants, enabling team members to share directly in the impact they help create. Are you outside of the range? We encourage you to still apply: we take an individualized approach to ensure that compensation accounts for all of the life factors that matter for each candidate. 
Being at Ambience: An opportunity to work with cutting edge AI technology, on a product that dramatically improves the quality of life for healthcare providers and the quality of care they can provide to their patients Dedicated budget for personal development, including access to world class mentors, advisors, and an in-house executive coach Work alongside a world-class, diverse team that is deeply mission aligned Ownership over your success and the ability to significantly impact the growth of our company Competitive salary and equity compensation with benefits including health, dental, and vision coverage, quarterly retreats, unlimited PTO, and a 401(k) plan Ambience is committed to supporting every candidate's ability to fully participate in our hiring process. If you need any accommodations during your application or interviews, please reach out to our Recruiting team at accommodations@ambiencehealth.com. We'll handle your request confidentially and work with you to ensure an accessible and equitable experience for all candidates.
    $200k-300k yearly 5d ago
  • Machine Learning Infrastructure Engineer

    Workshop Labs

    Systems engineer job in San Francisco, CA

    Build the infrastructure to serve personal AI models privately and at scale. We're building the first truly private, personal AI - one that learns your skills, judgment, and preferences without big tech ever seeing your data. Our core ML systems challenge: how do we serve the world's best personal model, at low cost and high speed, with bulletproof privacy? What you'll do Build the infrastructure that lets us create & deploy thousands and eventually millions of personalized finetuned models for our customers Monitor & optimize in-the-wild model serving performance to hit low latency & cost Interface with the TEE-based privacy stack that lets us guarantee user data & models can only be seen & used by the user-not even us-and integrate the privacy architecture with the finetuning & inference code You have A deep understanding of the machine learning stack. You can dive into the details of how transformers work & performance optimization techniques for them. You have a mental model of GPUs sufficient to reason about performance from first principles. You can drill down from ML code to metal. Ability to execute quickly. We ship fast and fail fast so we can win faster. The challenge of human relevance in a post-AGI world isn't going to solve itself. A missionary mentality. We're a mission-driven company, looking for mission-first people. If you're passionate about ensuring AI works for people (and not the other way around), you've come to the right place. Ready to roll up your sleeves. We're an early stage startup, so we're looking for someone who can wear many hats. Experience you may have Work at a fast-paced AI startup, or top AI lab Experience deploying ML systems at scale. You might have worked with frameworks like vLLM, S-LoRA, Punica, or LoRAX. Experience with privacy-first infrastructure. You're familiar with confidential computing & ability to reason about both technical and real-world confidentiality and security. 
You may have worked with secure enclaves, TEEs, code measurement & remote attestation, Nvidia Confidential Computing, Intel TDX or AMD SEV-SNP, or related confidential computing technologies. We encourage speculative applications; we expect many strong candidates will have different experience or unconventional backgrounds. What we offer Generous compensation and early stage equity. We're competitive with the top startups, because we believe the best talent deserves it. World-class expertise. We're based in top AI research hubs in San Francisco and London. We're backed by AI experts like Juniper Ventures, Seldon Lab, and angels at Anthropic and Apollo Research. You'll have access to some of the best AI expertise in the world. Massive impact. Our mission is to keep people in the economy well after AGI. You'll help shift the trajectory of AI development for the better, helping break the intelligence curse and prevent gradual disempowerment to keep humans in control of the future. About Workshop Labs We're building the AI economy for humans. While everyone else tries to automate the world top-down, we believe in augmenting people bottom-up. Our team previously created evals used by OpenAI, completed frontier AI research at MIT/Cambridge/Oxford, worked in Stuart Russell's lab, and led product verticals at high growth startups. The essay series The Intelligence Curse has been covered in TIME, The New York Times, and AI 2027. Our vision is for everyone to have a personal AI aligned to their goals and values, helping them stay durably relevant in a post-AGI economy. As a public benefit corporation, we have a fiduciary duty to ensure that as AI becomes more powerful, humans become more empowered, not disempowered or replaced. We're an early stage startup, backed by legendary investors like Brad Burnham and Matt McIlwain, visionary product leaders like Jake Knapp and John Zeratsky, philosopher-builders like Brendan McCord, and top AI safety funds like Juniper Ventures. 
Our investors were early at Anthropic, Slack, Prime Intellect, DuckDuckGo, and Goodfire. Our advisors have held senior roles at Anthropic, Google DeepMind, and UK AISI.
    $115k-175k yearly est. 3d ago
  • Distributed Systems Engineer - High-Impact Cloud Storage

    Archil, Inc.

    Systems engineer job in San Francisco, CA

    A cloud storage technology company in San Francisco is looking for a Distributed Systems Engineer to work across the stack in building innovative storage solutions. You will be on call for production systems and will design distributed systems to meet customer needs. The ideal candidate has over 3 years of experience in distributed systems, strong problem-solving skills, and a passion for enhancing customer experiences. Join us in our mission to revolutionize cloud storage for the next generation of applications.
    $87k-121k yearly est. 1d ago
  • Machine Learning Systems Engineer, RL Engineering

    Menlo Ventures

    Systems engineer job in San Francisco, CA

    About Anthropic Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the role: You want to build the cutting-edge systems that train AI models like Claude. You're excited to work at the frontier of machine learning, implementing and improving advanced techniques to create ever more capable, reliable and steerable AI. As an ML Systems Engineer on our Reinforcement Learning Engineering team, you'll be responsible for the critical algorithms and infrastructure that our researchers depend on to train models. Your work will directly enable breakthroughs in AI capabilities and safety. You'll focus obsessively on improving the performance, robustness, and usability of these systems so our research can progress as quickly as possible. You're energized by the challenge of supporting and empowering our research team in the mission to build beneficial AI systems. Our finetuning researchers train our production Claude models, and internal research models, using RLHF and other related methods. Your job will be to build, maintain, and improve the algorithms and systems that these researchers use to train models. You'll be responsible for improving the speed, reliability, and ease-of-use of these systems. You may be a good fit if you: Have 4+ years of software engineering experience Like working on systems and tools that make other people more productive Are results-oriented, with a bias towards flexibility and impact Pick up slack, even if it goes outside your job description Enjoy pair programming (we love to pair!) 
Want to learn more about machine learning research Care about the societal impacts of your work Strong candidates may also have experience with: High performance, large scale distributed systems Large scale LLM training Python Implementing LLM finetuning algorithms, such as RLHF Representative projects: Profiling our reinforcement learning pipeline to find opportunities for improvement Building a system that regularly launches training jobs in a test environment so that we can quickly detect problems in the training pipeline Making changes to our finetuning systems so they work on new model architectures Building instrumentation to detect and eliminate Python GIL contention in our training code Diagnosing why training runs have started slowing down after some number of steps, and fixing it Implementing a stable, fast version of a new training algorithm proposed by a researcher Deadline to apply: None. Applications will be reviewed on a rolling basis. The expected salary range for this position is: Annual Salary: $315,000-$425,000 USD Logistics Education requirements: We require at least a Bachelor's degree in a related field or equivalent experience. Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices. Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this. We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. 
Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team. How we're different We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact - advancing our long-term goals of steerable, trustworthy AI - rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We're an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills. The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI & Compute, Concrete Problems in AI Safety, and Learning from Human Preferences. Come work with us! Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.
    $87k-121k yearly est. 3d ago
  • Machine Learning Infrastructure Engineer

    Institute of Foundation Models

    Systems engineer job in Sunnyvale, CA

    About the Institute of Foundation Models

    We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.

    As part of our team, you'll have the opportunity to work on the core of cutting‑edge foundation model training, alongside world‑class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Strategic and innovative problem‑solving skills will be instrumental in establishing MBZUAI as a global hub for high‑performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.

    The Role

    We're looking for a distributed ML infrastructure engineer to help extend and scale our training systems. You'll work side‑by‑side with world‑class researchers and engineers to:
      • Extend distributed training frameworks (e.g., DeepSpeed, FSDP, FairScale, Horovod)
      • Implement distributed optimizers from mathematical specs
      • Build robust config + launch systems across multi‑node, multi‑GPU clusters
      • Own experiment tracking, metrics logging, and job monitoring for external visibility
      • Improve training system reliability, maintainability, and performance

    While much of the work will support large‑scale pre‑training, pre‑training experience is not required. Strong infrastructure and systems experience is what we value most.

    Key Responsibilities
      • Distributed Framework Ownership - Extend or modify training frameworks (e.g., DeepSpeed, FSDP) to support new use cases and architectures.
      • Optimizer Implementation - Translate mathematical optimizer specs into distributed implementations.
      • Launch Config & Debugging - Create and debug multi‑node launch scripts with flexible batch sizes, parallelism strategies, and hardware targets.
      • Metrics & Monitoring - Build systems for experiment tracking, job monitoring, and logging usable by collaborators and researchers.
      • Infra Engineering - Write production‑quality code and tests for ML infra in PyTorch or JAX; ensure reliability and maintainability at scale.

    Qualifications

    Must-Haves:
      • 5+ years of experience in ML systems, infra, or distributed training
      • Experience modifying distributed ML frameworks (e.g., DeepSpeed, FSDP, FairScale, Horovod)
      • Strong software engineering fundamentals (Python, systems design, testing)
      • Proven multi‑node experience (e.g., Slurm, Kubernetes, Ray) and debugging skills (e.g., NCCL/GLOO)
      • Ability to implement algorithms across GPUs/nodes based on mathematical specs
      • Experience working on an ML platform/infrastructure and/or distributed inference optimization team
      • Experience with large‑scale machine learning workloads (strong ML fundamentals)

    Nice-to-Haves:
      • Exposure to mixed‑precision training (e.g., bf16, fp8) with accuracy validation
      • Familiarity with performance profiling, kernel fusion, or memory optimization
      • Open‑source contributions or published research (MLSys, ICML, NeurIPS)
      • CUDA or Triton kernel experience
      • Experience with large‑scale pre‑training
      • Experience building custom training pipelines at scale and modifying them for custom needs
      • Deep familiarity with training infrastructure and performance tuning

    $150,000 - $450,000 a year

    Benefits
      • Comprehensive medical, dental, and vision
      • 401(k) program
      • Generous PTO, sick leave, and holidays
      • Paid parental leave and family‑friendly benefits
      • On‑site amenities and perks: complimentary lunch, gym access, and a short walk to the Sunnyvale Caltrain station
    $114k-174k yearly est. 3d ago
  • Staff Hardware Systems Engineer - BIOS/Firmware Lead

    Crusoe 4.1 company rating

    Systems engineer job in Sunnyvale, CA

    A technology company in Sunnyvale is seeking a Staff Hardware Systems Engineer to lead the development of system firmware and kernel-level software for high-performance server platforms. The ideal candidate has over 8 years of experience in hardware systems development and strong expertise in BIOS and firmware engineering. You will directly influence the company's future by enhancing hardware compatibility and performance. Competitive compensation and benefits are provided, including stock options.
    $116k-169k yearly est. 3d ago

Learn more about systems engineer jobs

How much does a systems engineer earn in Walnut Creek, CA?

The average systems engineer in Walnut Creek, CA earns between $74,000 and $140,000 annually. This compares to the national average systems engineer range of $62,000 to $109,000.

Average systems engineer salary in Walnut Creek, CA

$102,000

What are the biggest employers of Systems Engineers in Walnut Creek, CA?

The biggest employers of Systems Engineers in Walnut Creek, CA are:
  1. RTX
  2. EUV Tech