Senior systems engineer jobs in Hayward, CA - 4,472 jobs
Apple Inc. 4.8
Senior systems engineer job in Sunnyvale, CA
Sunnyvale, California, United States Machine Learning and AI
Want to ship amazing experiences in Apple products? Be part of the team in the Video Computer Vision (VCV) organization that focuses on people understanding from real-time video streams and building higher-level reasoning algorithms. VCV delivered features such as Face ID and RoomPlan, as well as many other computer vision algorithms powering Apple Vision Pro, iPhone, and iPad. We balance research and development to deliver Apple-quality, pioneering experiences. Come shape Apple products as a driven and dedicated ML Infrastructure and Data Engineer, pushing the limits of ML algorithms through hands‑on work with real-world and simulated data in an innovative team, and be part of building the next big thing.
Description
As part of the Video Computer Vision (VCV) team, you will help us create the data and infrastructure ecosystem needed to support our ML development and continuously improve our features. We take full end-to-end ownership of our services and data products, driving them through every stage meticulously, encompassing conception, design, implementation, deployment, and maintenance. As a result, each one of us takes our responsibilities seriously. In this team, you'll have the opportunity to work on complex problems in close partnership with our ML engineers, data scientists and software integration teams.
Minimum Qualifications
Bachelor's degree in Computer Science or related discipline, and 2 years relevant industry experience.
Strong foundational knowledge in Computer Science.
Extensive programming experience in Python.
Hands‑on experience with cloud providers (AWS, GCP, or Azure).
Strong understanding of core infrastructure concepts (e.g., compute, networking, storage, containers, Kubernetes).
Preferred Qualifications
Experience with machine learning model development lifecycle, including data preprocessing, model training, evaluation, and deployment.
Proficiency with cloud computing and distributed data processing infrastructure and tools (e.g., Ray, Spark, Trino).
Hands‑on experience with CI/CD pipelines and practices.
Familiarity with Infrastructure as Code (IaC) tools (e.g. Terraform, Pulumi, or CloudFormation).
Experience building on LLMs or other generative models.
Ability to drive projects from concept to production, balancing business needs with technical quality and timely delivery.
Excellent communication skills, ability to work both independently and multi‑functionally.
At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $147,400 and $272,100, and your base pay will depend on your skills, qualifications, experience, and location.
Apple employees also have the opportunity to become an Apple shareholder through participation in Apple's discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple's Employee Stock Purchase Plan. You'll also receive benefits including: Comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and for formal education related to advancing your career at Apple, reimbursement for certain educational expenses - including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation. Learn more about Apple Benefits.
Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.
Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.
Apple accepts applications to this posting on an ongoing basis.
$147.4k-272.1k yearly 3d ago
Machine Learning Infrastructure Engineer
Ambience Healthcare
Senior systems engineer job in San Francisco, CA
About Us:
Ambience Healthcare is the leading AI platform for documentation, coding, and clinical workflow, built to reduce administrative burden and protect revenue integrity at the point of care. Trusted by top health systems across North America, Ambience's platform is live across outpatient, emergency, and inpatient settings, supporting more than 100 specialties with real-time, coding‑aware documentation. The platform integrates directly with Epic, Oracle Cerner, athenahealth, and other major EHRs. Founded in 2020 by Mike Ng and Nikhil Buduma, Ambience is headquartered in San Francisco and backed by Oak HC/FT, Andreessen Horowitz (a16z), OpenAI Startup Fund, Kleiner Perkins, and other leading investors.
Join us in the endeavor of accelerating the path to safe & useful clinical super intelligence by becoming part of our community of problem solvers, technologists, clinicians, and innovators.
The Role:
We're looking for a Machine Learning Infrastructure Engineer to join our AI Platform team. This is a high-leverage role focused on building and scaling the core infrastructure that powers every AI system at Ambience. You'll work closely with our ML, data, and product teams to develop the foundational tools, systems, and workflows that support rapid iteration, robust evaluation, and production reliability for our LLM‑based products.
Our engineering roles are hybrid - working onsite at our San Francisco office three days per week.
Who You Are:
You have 5+ years of experience as a software engineer, infrastructure engineer, or ML platform engineer
You've worked directly on systems that support ML research or production workloads, whether training pipelines, evaluation systems, or deployment frameworks
You write high-quality code (we primarily use Python) and have strong engineering and systems design instincts
You're excited to work closely with ML researchers and product engineers to unblock them with better infrastructure
You're pragmatic and care deeply about making tools that are reliable, scalable, and easy to use
You thrive in fast-paced, collaborative environments and are eager to take ownership of ambiguous problems
What You'll Do:
Design, build, and maintain the infrastructure powering ML model training, batch inference, and evaluation workflows
Improve internal tools and developer experience for ML experimentation and observability
Partner with ML engineers to optimize model deployment and monitoring across clinical workloads
Define standards for model versioning, performance tracking, and rollout processes
Collaborate across the engineering team to build reusable abstractions that accelerate AI product development
Drive performance, cost efficiency, and reliability improvements across our AI infrastructure stack
Drive performance, cost efficiency, and reliability improvements across our AI infrastructure stack
Pay Transparency
We offer a base compensation range of approximately $200,000-300,000 per year, with the addition of significant equity. This intentionally broad range provides flexibility for candidates to tailor their cash and equity mix based on individual preferences. Our compensation philosophy prioritizes meaningful equity grants, enabling team members to share directly in the impact they help create. If your expectations fall outside of this range, we still encourage you to apply; our approach to compensation considers a range of factors to ensure alignment with each candidate's unique needs and preferences.
Being at Ambience:
An opportunity to work with cutting edge AI technology, on a product that dramatically improves the quality of life for healthcare providers and the quality of care they can provide to their patients
Dedicated budget for personal development, including access to world class mentors, advisors, and an in‑house executive coach
Work alongside a world‑class, diverse team that is deeply mission aligned
Ownership over your success and the ability to significantly impact the growth of our company
Competitive salary and equity compensation with benefits including health, dental, and vision coverage, quarterly retreats, unlimited PTO, and a 401(k) plan
Ambience is committed to supporting every candidate's ability to fully participate in our hiring process. If you need any accommodations during your application or interviews, please reach out to our Recruiting team at accommodations@ambiencehealth.com. We'll handle your request confidentially and work with you to ensure an accessible and equitable experience for all candidates.
$200k-300k yearly 4d ago
Machine Learning Infrastructure Engineer at early-stage private AI platform
Jack & Jill/External ATS
Senior systems engineer job in San Francisco, CA
This is a job that we are recruiting for on behalf of one of our customers.
To apply, speak to Jack. He's an AI agent that sends you unmissable jobs and then helps you ace the interview. He'll make sure you are considered for this role, and help you find others if you ask.
Machine Learning Infrastructure Engineer
Company Description: Early-stage private AI platform
Job Description: Build the core infrastructure to serve thousands, then millions, of private, personalized AI models at scale. This role involves optimizing model serving performance for low latency and cost, and integrating a TEE-based privacy stack to ensure user data and models are accessible only to the user, not even to the company. Drive the foundational systems for a new era of personal AI.
Location: San Francisco, USA
Why this role is remarkable:
Pioneer the infrastructure for truly private, personal AI models, ensuring user data remains confidential.
Join an early-stage, well-funded startup backed by top-tier VCs and leading AI experts.
Make a massive impact on the future of AI, helping to keep humans empowered in a post-AGI world.
What you will do:
Build infrastructure for deploying thousands to millions of personalized finetuned models.
Monitor and optimize in-the-wild model serving performance for low latency and cost.
Integrate with a TEE-based privacy stack to guarantee user data and model confidentiality.
The ideal candidate:
Deep understanding of the machine learning stack, including transformer optimization and GPU performance.
Ability to execute quickly in a fast-paced, early-stage startup environment.
A missionary mentality, passionate about ensuring AI works for people.
How to Apply:
To apply for this job speak to Jack, our AI recruiter.
Step 1. Visit our website
Step 2. Click 'Speak with Jack'
Step 3. Login with your LinkedIn profile
Step 4. Talk to Jack for 20 minutes so he can understand your experience and ambitions
Step 5. If the hiring manager would like to meet you, Jack will make the introduction
$115k-175k yearly est. 2d ago
Machine Learning Infrastructure Engineer
David Ai
Senior systems engineer job in San Francisco, CA
David AI is the first audio data research company. We bring an R&D approach to data: developing datasets with the same rigor AI labs bring to models. Our mission is to bring AI into the real world, and we believe audio is the gateway. Speech is versatile, accessible, and human; it fits naturally into everyday life. As audio AI advances and new use cases emerge, high-quality training data is the bottleneck. This is where David AI comes in.
David AI was founded in 2024 by a team of former Scale AI engineers and operators. In less than a year, we've brought on most FAANG companies and AI labs as customers. We recently raised a $50M Series B from Meritech, NVIDIA, Jack Altman (Alt Capital), Amplify Partners, First Round Capital and other Tier 1 investors.
Our team is sharp, humble, ambitious, and tight-knit. We're looking for the best research, engineering, product, and operations minds to join us on our mission to push the frontier of audio AI.
About our Engineering team
At David AI, our engineers build the pipelines, platforms, and models that transform raw audio into high-signal data for leading AI labs and enterprises. We're a tight-knit team of product engineers, infrastructure specialists, and machine learning experts focused on building the world's first audio data research company.
We move fast, own our work end-to-end, and ship to production daily. Our team designs real-time pipelines handling terabytes of speech data and deploys cutting-edge generative audio models.
About this role
As our Founding Machine Learning Infrastructure Engineer at David AI, you will build and scale the core infrastructure that powers our cutting-edge audio ML products. You'll be leading the development of the systems that enable our researchers and engineers to train, deploy, and evaluate machine learning models efficiently.
In this role, you will
Design and maintain data pipelines for processing massive audio datasets, ensuring terabytes of data are managed, versioned, and fed into model training efficiently.
Develop frameworks for training audio models on compute clusters, managing cloud resources, optimizing GPU utilization, and improving experiment reproducibility.
Create robust infrastructure for deploying ML models to production, including APIs, microservices, model serving frameworks, and real-time performance monitoring.
Apply software engineering best practices with monitoring, logging, and alerting to guarantee high availability and fault‑tolerant production workloads.
Translate research prototypes into production pipelines, working with ML engineers and data teams to support efficient data labeling and preparation.
and optimization techniques to enhance infrastructure velocity and reliability.
Your background looks like
5+ years of backend engineering with 2+ years ML infrastructure experience.
Hands‑on experience scaling cloud infrastructure and large‑scale data processing pipelines for ML model training and evaluation.
Proficient with Docker, Kubernetes, and CI/CD pipelines.
Proven ML model deployment and lifecycle management in production.
Strong system design skills optimizing for scale and performance.
Proficient in Python with deep Kubernetes experience.
Bonus points if you have
Experience with feature stores, experiment tracking (MLflow, Weights and Biases), or custom CI/CD pipelines.
Familiarity with large‑scale data ingestion and streaming systems (Spark, Kafka, Airflow).
Proven ability to thrive in fast‑moving startup environments.
Some technologies we work with
Next.js, TypeScript, TailwindCSS, Node.js, tRPC, PostgreSQL, AWS, Trigger.dev, WebRTC, FFmpeg.
Benefits
Unlimited PTO.
Top‑notch health, dental, and vision coverage with 100% coverage for most plans.
FSA & HSA access.
401k access.
Meals 2x daily through DoorDash + snacks and beverages available at the office.
Unlimited company‑sponsored Barry's classes.
$115k-175k yearly est. 3d ago
Machine Learning Infrastructure Engineer
Workshop Labs
Senior systems engineer job in San Francisco, CA
Build the infrastructure to serve personal AI models privately and at scale.
We're building the first truly private, personal AI - one that learns your skills, judgment, and preferences without big tech ever seeing your data.
Our core ML systems challenge: how do we serve the world's best personal model, at low cost and high speed, with bulletproof privacy?
What you'll do
Build the infrastructure that lets us create & deploy thousands and eventually millions of personalized finetuned models for our customers
Monitor & optimize in-the-wild model serving performance to hit low latency & cost
Interface with the TEE-based privacy stack that lets us guarantee user data & models can only be seen & used by the user (not even us), and integrate the privacy architecture with the finetuning & inference code
You have
A deep understanding of the machine learning stack. You can dive into the details of how transformers work & performance optimization techniques for them. You have a mental model of GPUs sufficient to reason about performance from first principles. You can drill down from ML code to metal.
Ability to execute quickly. We ship fast and fail fast so we can win faster. The challenge of human relevance in a post-AGI world isn't going to solve itself.
A missionary mentality. We're a mission-driven company, looking for mission-first people. If you're passionate about ensuring AI works for people (and not the other way around), you've come to the right place.
Ready to roll up your sleeves. We're an early stage startup, so we're looking for someone who can wear many hats.
Experience you may have
Worked at a fast-paced AI startup or a top AI lab
Experience deploying ML systems at scale. You might have worked with frameworks like vLLM, S-LoRA, Punica, or LoRAX.
Experience with privacy-first infrastructure. You're familiar with confidential computing and can reason about both technical and real-world confidentiality and security. You may have worked with secure enclaves, TEEs, code measurement & remote attestation, Nvidia Confidential Computing, Intel TDX or AMD SEV-SNP, or related confidential computing technologies.
We encourage speculative applications; we expect many strong candidates will have different experience or unconventional backgrounds.
What we offer
Generous compensation and early stage equity. We're competitive with the top startups, because we believe the best talent deserves it.
World-class expertise. We're based in top AI research hubs in San Francisco and London. We're backed by AI experts like Juniper Ventures, Seldon Lab, and angels at Anthropic and Apollo Research. You'll have access to some of the best AI expertise in the world.
Massive impact. Our mission is to keep people in the economy well after AGI. You'll help shift the trajectory of AI development for the better, helping break the intelligence curse and prevent gradual disempowerment to keep humans in control of the future.
About Workshop Labs
We're building the AI economy for humans. While everyone else tries to automate the world top-down, we believe in augmenting people bottom-up.
Our team previously created evals used by OpenAI, completed frontier AI research at MIT/Cambridge/Oxford, worked in Stuart Russell's lab, and led product verticals at high-growth startups.
The essay series The Intelligence Curse has been covered in TIME, The New York Times, and AI 2027.
Our vision is for everyone to have a personal AI aligned to their goals and values, helping them stay durably relevant in a post-AGI economy. As a public benefit corporation, we have a fiduciary duty to ensure that as AI becomes more powerful, humans become more empowered, not disempowered or replaced.
We're an early stage startup, backed by legendary investors like Brad Burnham and Matt McIlwain, visionary product leaders like Jake Knapp and John Zeratsky, philosopher-builders like Brendan McCord, and top AI safety funds like Juniper Ventures. Our investors were early at Anthropic, Slack, Prime Intellect, DuckDuckGo, and Goodfire. Our advisors have held senior roles at Anthropic, Google DeepMind, and UK AISI.
$115k-175k yearly est. 1d ago
Distributed ML Infrastructure Engineer
Institute of Foundation Models
Senior systems engineer job in Sunnyvale, CA
A leading research lab in Sunnyvale is seeking a distributed ML infrastructure engineer to extend and scale training systems. The ideal candidate must have over 5 years of experience in ML systems with strong expertise in distributed training frameworks like DeepSpeed and FSDP. This role offers a competitive salary ranging from $150,000 to $450,000 annually along with comprehensive benefits and amenities.
$114k-174k yearly est. 1d ago
Staff Hardware Systems Engineer - BIOS/Firmware Lead
Crusoe 4.1
Senior systems engineer job in Sunnyvale, CA
A technology company in Sunnyvale is seeking a Staff Hardware Systems Engineer to lead the development of system firmware and kernel-level software for high-performance server platforms. The ideal candidate has over 8 years of experience in hardware systems development and strong expertise in BIOS and firmware engineering. You will directly influence the company's future by enhancing hardware compatibility and performance. Competitive compensation and benefits are provided, including stock options.
$116k-169k yearly est. 1d ago
Machine Learning Systems Engineer, Research Tools
Menlo Ventures
Senior systems engineer job in San Francisco, CA
About Anthropic
Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
About the role:
You want to build the cutting-edge systems that train AI models like Claude. You're excited to work at the frontier of machine learning, implementing and improving advanced techniques to create ever more capable, reliable and steerable AI. As an ML Systems Engineer on our Research Tools team, you'll be responsible for the critical algorithms and infrastructure that our researchers depend on to train models. Your work will directly enable breakthroughs in AI capabilities and safety. You'll focus obsessively on improving the performance, robustness, and usability of these systems so our research can progress as quickly as possible. You're energized by the challenge of supporting and empowering our research team in the mission to build beneficial AI systems.
Our finetuning researchers train our production Claude models, and internal research models, using RLHF and other related methods. Your job will be to build, maintain, and improve the algorithms and systems that these researchers use to train models. You'll be responsible for improving the speed, reliability, and ease-of-use of these systems.
You may be a good fit if you:
Have 2+ years of software engineering experience
Like working on systems and tools that make other people more productive
Are results-oriented, with a bias towards flexibility and impact
Pick up slack, even if it goes outside your job description
Enjoy pair programming (we love to pair!)
Want to learn more about machine learning research
Care about the societal impacts of your work
Strong candidates may also have experience with:
Python
Implementing LLM finetuning algorithms, such as RLHF
Representative projects:
Profiling our reinforcement learning pipeline to find opportunities for improvement
Building a system that regularly launches training jobs in a test environment so that we can quickly detect problems in the training pipeline
Making changes to our finetuning systems so they work on new model architectures
Building instrumentation to detect and eliminate Python GIL contention in our training code
Diagnosing why training runs have started slowing down after some number of steps, and fixing it
Implementing a stable, fast version of a new training algorithm proposed by a researcher
Deadline to apply: None. Applications will be reviewed on a rolling basis.
The expected salary range for this position is:
$300,000 - $405,000 USD
Logistics
Education requirements: We require at least a Bachelor's degree in a related field or equivalent experience.
Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.
Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.
We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.
How we're different
We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact - advancing our long-term goals of steerable, trustworthy AI - rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We're an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.
The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI & Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.
Come work with us!
Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues. Guidance on Candidates' AI Usage: Learn about our policy for using AI in our application process.
Voluntary Self-Identification
For government reporting purposes, we ask candidates to respond to the below self-identification survey.Completion of the form is entirely voluntary. Whatever your decision, it will not be considered in the hiringprocess or thereafter. Any information that you do provide will be recorded and maintained in aconfidential file.
As set forth in Anthropic's Equal Employment Opportunity policy,we do not discriminate on the basis of any protected group status under any applicable law.
If you believe you belong to any of the categories of protected veterans listed below, please indicate by making the appropriate selection.As a government contractor subject to the Vietnam Era Veterans Readjustment Assistance Act (VEVRAA), we request this information in order to measurethe effectiveness of the outreach and positive recruitment efforts we undertake pursuant to VEVRAA. Classification of protected categoriesis as follows:
A "disabled veteran" is one of the following: a veteran of the U.S. military, ground, naval or air service who is entitled to compensation (or who but for the receipt of military retired pay would be entitled to compensation) under laws administered by the Secretary of Veterans Affairs; or a person who was discharged or released from active duty because of a service-connected disability.
A "recently separated veteran" means any veteran during the three-year period beginning on the date of such veteran's discharge or release from active duty in the U.S. military, ground, naval, or air service.
An "active duty wartime or campaign badge veteran" means a veteran who served on active duty in the U.S. military, ground, naval or air service during a war, or in a campaign or expedition for which a campaign badge has been authorized under the laws administered by the Department of Defense.
An "Armed forces service medal veteran" means a veteran who, while serving on active duty in the U.S. military, ground, naval or air service, participated in a United States military operation for which an Armed Forces service medal was awarded pursuant to Executive Order 12985.
Select...
Voluntary Self-Identification of Disability
Form CC-305
Page 1 of 1
OMB Control Number 1250-0005
Expires 04/30/2026
Voluntary Self-Identification of DisabilityForm CC-305 Page 1 of 1 OMB Control Number 1250-0005 Expires 04/30/2026
Why are you being asked to complete this form?
We are a federal contractor or subcontractor. The law requires us to provide equal employment opportunity to qualified people with disabilities. We have a goal of having at least 7% of our workers as people with disabilities. The law says we must measure our progress towards this goal. To do this, we must ask applicants and employees if they have a disability or have ever had one. People can become disabled, so we need to ask this question at least every five years.
Completing this form is voluntary, and we hope that you will choose to do so. Your answer is confidential. No one who makes hiring decisions will see it. Your decision to complete the form and your answer will not harm you in any way. If you want to learn more about the law or this form, visit the U.S. Department of Labor's Office of Federal Contract Compliance Programs (OFCCP) website at ***************** .
How do you know if you have a disability?
A disability is a condition that substantially limits one or more of your “major life activities.” If you have or have ever had such a condition, you are a person with a disability. Disabilities include, but are not limited to:
Alcohol or other substance use disorder (not currently using drugs illegally)
Autoimmune disorder, for example, lupus, fibromyalgia, rheumatoid arthritis, HIV/AIDS
Blind or low vision
Cancer (past or present)
Cardiovascular or heart disease
Celiac disease
Cerebral palsy
Deaf or serious difficulty hearing
Diabetes
Disfigurement, for example, disfigurement caused by burns, wounds, accidents, or congenital disorders
Epilepsy or other seizure disorder
Gastrointestinal disorders, for example, Crohn's Disease, irritable bowel syndrome
Intellectual or developmental disability
Mental health conditions, for example, depression, bipolar disorder, anxiety disorder, schizophrenia, PTSD
Missing limbs or partially missing limbs
Mobility impairment, benefiting from the use of a wheelchair, scooter, walker, leg brace(s) and/or other supports
Nervous system condition, for example, migraine headaches, Parkinson's disease, multiple sclerosis (MS)
Neurodivergence, for example, attention-deficit/hyperactivity disorder (ADHD), autism spectrum disorder, dyslexia, dyspraxia, other learning disabilities
Partial or complete paralysis (any cause)
Pulmonary or respiratory conditions, for example, tuberculosis, asthma, emphysema
Short stature (dwarfism)
Traumatic brain injury
PUBLIC BURDEN STATEMENT: According to the Paperwork Reduction Act of 1995 no persons are required to respond to a collection of information unless such collection displays a valid OMB control number. This survey should take about 5 minutes to complete.
$87k-121k yearly est. 1d ago
ML Engineer - Production-Scale AI Systems
Inference
Senior systems engineer job in San Francisco, CA
A cutting-edge AI startup in San Francisco is seeking a Machine Learning Engineer. In this role, you will build and improve core ML systems that drive custom model training platforms. You will lead projects from data intake to model delivery, creating robust tools and ensuring model performance. The ideal candidate has experience in AI model training with PyTorch, data processing, and creating benchmarks. The role offers a competitive salary in the range of $220,000 to $320,000, plus equity and benefits.
$87k-121k yearly est. 4d ago
Distributed Systems Engineer - High-Impact Cloud Storage
Archil, Inc.
Senior systems engineer job in San Francisco, CA
A cloud storage technology company in San Francisco is looking for a Distributed Systems Engineer to work across the stack building innovative storage solutions. You will be on call for production systems and will design distributed systems to meet customer needs. The ideal candidate has more than 3 years of experience in distributed systems, good problem-solving skills, and a passion for enhancing customer experiences. Join us in our mission to revolutionize cloud storage with the next generation of applications.
$87k-121k yearly est. 4d ago
SRE Cybersecurity Engineer - Scale Systems (Equity)
Pantera Capital
Senior systems engineer job in Palo Alto, CA
A tech-focused financial services firm in California is seeking a Cybersecurity/SRE professional to secure and maintain the reliability of its infrastructure. Responsibilities include building secure applications on AWS, managing identities, and strengthening Kubernetes security. The ideal candidate has expertise in Python, Terraform, and large distributed systems, and brings a proactive, problem-solving mindset. Competitive salary and comprehensive benefits included.
$86k-120k yearly est. 5d ago
Siri Runtime ML Engineer - Systems & Interaction
Apple Inc. 4.8
Senior systems engineer job in Cupertino, CA
A leading technology company is seeking a Machine Learning Engineer to contribute to the development of Siri. You will work on designing and optimizing machine learning algorithms and collaborate cross-functionally at Apple. The ideal candidate will have strong programming skills and experience in machine learning. Salary base range is $126,800 to $220,900, with additional benefits including stock options and comprehensive healthcare.
$126.8k-220.9k yearly 5d ago
Machine Learning Infrastructure Engineer
Ambience Healthcare, Inc.
Senior systems engineer job in San Francisco, CA
About Us:
Ambience Healthcare is the leading AI platform for documentation, coding, and clinical workflow, built to reduce administrative burden and protect revenue integrity at the point of care. Trusted by top health systems across North America, Ambience's platform is live across outpatient, emergency, and inpatient settings, supporting more than 100 specialties with real-time, coding-aware documentation. The platform integrates directly with Epic, Oracle Cerner, athenahealth, and other major EHRs. Founded in 2020 by Mike Ng and Nikhil Buduma, Ambience is headquartered in San Francisco and backed by Oak HC/FT, Andreessen Horowitz (a16z), OpenAI Startup Fund, Kleiner Perkins, and other leading investors.
Join us in the endeavor of accelerating the path to safe & useful clinical super intelligence by becoming part of our community of problem solvers, technologists, clinicians, and innovators.
The Role:
We're looking for a Machine Learning Infrastructure Engineer to join our AI Platform team. This is a high-leverage role focused on building and scaling the core infrastructure that powers every AI system at Ambience. You'll work closely with our ML, data, and product teams to develop the foundational tools, systems, and workflows that support rapid iteration, robust evaluation, and production reliability for our LLM-based products.
Our Engineering roles are hybrid, with three days per week in our SF office.
Who You Are:
You have 5+ years of experience as a software engineer, infrastructure engineer, or ML platform engineer
You've worked directly on systems that support ML research or production workloads - whether training pipelines, evaluation systems, or deployment frameworks
You write high-quality code (we primarily use Python) and have strong engineering and systems design instincts
You're excited to work closely with ML researchers and product engineers to unblock them with better infrastructure
You're pragmatic and care deeply about making tools that are reliable, scalable, and easy to use
You thrive in fast-paced, collaborative environments and are eager to take ownership of ambiguous problems
What You'll Do:
Design, build, and maintain the infrastructure powering ML model training, batch inference, and evaluation workflows
Improve internal tools and developer experience for ML experimentation and observability
Partner with ML engineers to optimize model deployment and monitoring across clinical workloads
Define standards for model versioning, performance tracking, and rollout processes
Collaborate across the engineering team to build reusable abstractions that accelerate AI product development
Drive performance, cost efficiency, and reliability improvements across our AI infrastructure stack
Pay Transparency
We offer a base compensation range of approximately $200,000-300,000 per year, exclusive of equity. This intentionally broad range provides flexibility for candidates to tailor their cash and equity mix based on individual preferences. Our compensation philosophy prioritizes meaningful equity grants, enabling team members to share directly in the impact they help create.
Are you outside of the range? We encourage you to still apply: we take an individualized approach to ensure that compensation accounts for all of the life factors that matter for each candidate.
Being at Ambience:
An opportunity to work with cutting edge AI technology, on a product that dramatically improves the quality of life for healthcare providers and the quality of care they can provide to their patients
Dedicated budget for personal development, including access to world class mentors, advisors, and an in-house executive coach
Work alongside a world-class, diverse team that is deeply mission aligned
Ownership over your success and the ability to significantly impact the growth of our company
Competitive salary and equity compensation with benefits including health, dental, and vision coverage, quarterly retreats, unlimited PTO, and a 401(k) plan
Ambience is committed to supporting every candidate's ability to fully participate in our hiring process. If you need any accommodations during your application or interviews, please reach out to our Recruiting team at accommodations@ambiencehealth.com. We'll handle your request confidentially and work with you to ensure an accessible and equitable experience for all candidates.
$200k-300k yearly 3d ago
Privacy-First ML Infrastructure Engineer
Workshop Labs
Senior systems engineer job in San Francisco, CA
A pioneering AI startup in San Francisco is looking for an experienced individual to build infrastructure for deploying personalized AI models. The role demands a strong understanding of machine learning technology and a passion for enabling user-controlled AI solutions. Ideal candidates will thrive in fast-paced environments and contribute to impactful AI development. The company offers competitive compensation, equity, and a significant role in shaping the future of AI.
$115k-175k yearly est. 1d ago
Machine Learning Infrastructure Engineer
Institute of Foundation Models
Senior systems engineer job in Sunnyvale, CA
About the Institute of Foundation Models
We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.
As part of our team, you'll have the opportunity to work on the core of cutting‑edge foundation model training, alongside world‑class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Strategic and innovative problem‑solving skills will be instrumental in establishing MBZUAI as a global hub for high‑performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.
The Role
We're looking for a distributed ML infrastructure engineer to help extend and scale our training systems. You'll work side‑by‑side with world‑class researchers and engineers to:
Extend distributed training frameworks (e.g., DeepSpeed, FSDP, FairScale, Horovod)
Implement distributed optimizers from mathematical specs
Build robust configuration and launch systems across multi-node, multi-GPU clusters
Own experiment tracking, metrics logging, and job monitoring for external visibility
Improve training system reliability, maintainability, and performance
While much of the work will support large‑scale pre‑training, pre‑training experience is not required. Strong infrastructure and systems experience is what we value most.
Key Responsibilities
Distributed Framework Ownership - Extend or modify training frameworks (e.g., DeepSpeed, FSDP) to support new use cases and architectures.
Optimizer Implementation - Translate mathematical optimizer specs into distributed implementations.
Launch Config & Debugging - Create and debug multi‑node launch scripts with flexible batch sizes, parallelism strategies, and hardware targets.
Metrics & Monitoring - Build systems for experiment tracking, job monitoring, and logging usable by collaborators and researchers.
Infra Engineering - Write production‑quality code and tests for ML infra in PyTorch or JAX; ensure reliability and maintainability at scale.
Qualifications
Must-Haves:
5+ years of experience in ML systems, infra, or distributed training
Experience modifying distributed ML frameworks (e.g., DeepSpeed, FSDP, FairScale, Horovod)
Strong software engineering fundamentals (Python, systems design, testing)
Proven multi-node experience (e.g., Slurm, Kubernetes, Ray) and debugging skills (e.g., NCCL/Gloo)
Ability to implement algorithms across GPUs/nodes based on mathematical specs
Experience working on an ML platform/infrastructure and/or distributed inference optimization team
Experience with large‑scale machine learning workloads (strong ML fundamentals)
Nice-to-Haves:
Exposure to mixed‑precision training (e.g., bf16, fp8) with accuracy validation
Familiarity with performance profiling, kernel fusion, or memory optimization
Open‑source contributions or published research (MLSys, ICML, NeurIPS)
CUDA or Triton kernel experience
Experience with large‑scale pre‑training
Experience building custom training pipelines at scale and modifying them for custom needs
Deep familiarity with training infrastructure and performance tuning
$150,000 - $450,000 a year
Benefits
Comprehensive medical, dental, and vision
401(k) program
Generous PTO, sick leave, and holidays
Paid parental leave and family‑friendly benefits
On‑site amenities and perks: Complimentary lunch, gym access, and a short walk to the Sunnyvale Caltrain station
$114k-174k yearly est. 1d ago
Staff Hardware Systems Engineer - BIOS/Firmware Lead
Crusoe 4.1
Senior systems engineer job in San Francisco, CA
A technology company in Sunnyvale is seeking a Staff Hardware Systems Engineer to lead the development of system firmware and kernel-level software for high-performance server platforms. The ideal candidate has more than 8 years of experience in hardware systems development and strong expertise in BIOS and firmware engineering. You will directly influence the company's future by enhancing hardware compatibility and performance. Competitive compensation and benefits are provided, including stock options.
$117k-170k yearly est. 1d ago
Distributed Systems Engineer
Archil, Inc.
Senior systems engineer job in San Francisco, CA
Role
As a distributed systems engineer, you'll work across the stack to solve problems as they come up and help build Archil volumes. You'll have significant influence over the technical and product direction.
We'll expect you to be able to:
Be on call for a production system to help our customers if anything goes wrong.
Build out never-before-seen capabilities in a storage service
Design distributed systems interactions for atomicity and idempotency
Deploy infrastructure and generalize infrastructure across different clouds
Operate amid changing customer requirements and a lot of ambiguity
Who are you?
You have 3+ years of experience building and operating distributed systems (flexible).
Ideally, you've worked at a startup before, so you know how chaotic this time can be.
You've successfully resolved disagreements at work before, and you understand that the highest priority is helping our customers - not being right.
You're comfortable debugging problems that occur as a result of failures in multiple, different systems, using tools like metrics and logs.
You've been paged at 3am to solve a complex production issue before.
You're knowledgeable about distributed systems: you get how consensus works, you know how to scale systems, and you know what pitfalls in API design to avoid.
You're familiar with how to optimize the performance of a system, including a general sense of how much latency different operations take, and what kind of bottlenecks could lead to a reduction in potential throughput.
Most of all, you know how computers work from the silicon up. Someone once asked you in an interview “what happens when you go to Google.com”, and there wasn't enough time in the interview to talk about all of the steps.
Why join us?
By building the highest-performance, simplest storage product in the cloud, we have a great chance of changing how the world builds the next generation of applications (and with AI, more applications will be written in the next 5 years than ever before). We'd love for you to be a part of our journey.
How to join?
In your application, show us that you're knowledgeable about the space we're working in. It's up to you how you do this, but one way is to answer one of the following questions:
How do you think our system works?
What do you think our biggest technical challenge is?
What would make our system not work?
About Archil
Archil is on a mission to change how developers build applications in the cloud, by building the next, default storage platform in the cloud.
Over the past 15 years, S3 has become the default way to store inactive data sets in the cloud, but the next generation of AI and analytics applications needs to actively process more data than ever before. We're solving this problem by building the first Volume storage product that's as fast as EBS, infinitely scalable like S3, and connects to existing data sets in S3 and other repositories. Our customers choose Archil because this architecture radically simplifies how they think about working with their data (every application becomes stateless, no cold-start latencies, and no need to worry about checkpointing or backup). Hacker News agrees.
Hunter, the founder, has 10 years of experience building and operating cloud storage, including helping to launch Amazon's EFS product and working on bleeding-edge storage at Netflix. He started the company after working with hundreds of customers across these roles, and identifying a need for a new kind of storage product.
We're fully in-person in San Francisco. If you're also someone interested in distributed systems, completely focused on how to make customers successful, and interested in solving really big technical challenges, we'd love for you to join us.
$87k-121k yearly est. 4d ago
ML Systems Engineer, Research Tools - Impactful
Menlo Ventures
Senior systems engineer job in San Francisco, CA
A leading AI research company in New York seeks a Machine Learning Systems Engineer to build cutting-edge systems for training AI models. This role involves developing critical algorithms, improving system performance, and collaborating with a dynamic research team. Ideal candidates have a strong software engineering background and care about the societal impacts of AI technology. The expected salary range is $300,000 - $405,000 USD, with a hybrid work policy requiring 25% in-office presence.
$87k-121k yearly est. 1d ago
ML Infrastructure Engineer - Real-Time Vision
Apple Inc. 4.8
Senior systems engineer job in Sunnyvale, CA
A leading technology company is looking for a Machine Learning Infrastructure Engineer in Sunnyvale, California. You will develop data ecosystems and infrastructure for ML projects, partnering closely with engineers and scientists. Candidates should have a Bachelor's in Computer Science and experience with cloud providers, as well as strong programming skills in Python. This is an opportunity to be a part of innovative projects that influence the next generation of technology.
$150k-196k yearly est. 3d ago
Machine Learning Systems Engineer
Menlo Ventures
Senior systems engineer job in Berkeley, CA
Who We Are
At RelationalAI, we are building the future of intelligent data systems through our cloud-native relational knowledge graph management system, a platform designed for learning, reasoning, and prediction.
We are a remote-first, globally distributed team with colleagues across six continents. From day one, we've embraced asynchronous collaboration and flexible schedules, recognizing that innovation doesn't follow a 9-to-5.
We are committed to an open, transparent, and inclusive workplace. We value the unique backgrounds of every team member and believe in fostering a culture of respect, curiosity, and innovation. We support each other's growth and success, and take the well-being of our colleagues seriously. We encourage everyone to find a healthy balance that affords them a productive, happy life, wherever they choose to live.
We bring together engineers who love building core infrastructure, obsess over developer experience, and want to make complex systems scalable, observable, and reliable.
Machine Learning Systems Engineer
Location: Remote (San Francisco Bay Area / North America / South America)
Experience Level: 3+ years of experience in machine learning engineering or research
About ScalarLM
This role will involve heavily working with the ScalarLM framework and team.
ScalarLM unifies vLLM, Megatron-LM, and HuggingFace for fast LLM training, inference, and self-improving agents, all via an OpenAI-compatible interface. ScalarLM builds on top of the vLLM inference engine, the Megatron-LM training framework, and the HuggingFace model hub. It unifies the capabilities of these tools into a single platform, enabling users to easily perform LLM inference and training and to build higher-level applications such as agents with a twist: they can teach themselves new abilities via backpropagation.
ScalarLM is inspired by the work of Seymour Roger Cray (September 28, 1925 - October 5, 1996), an American electrical engineer and supercomputer architect who designed a series of computers that were the fastest in the world for decades, and founded Cray Research, which built many of these machines. Called "the father of supercomputing", Cray has been credited with creating the supercomputer industry.
It is a fully open source project (CC‑0 Licensed) focused on democratizing access to cutting‑edge LLM infrastructure that combines training and inference in a unified platform, enabling the development of self‑improving AI agents similar to DeepSeek R1.
ScalarLM is supported and maintained by TensorWave in addition to RelationalAI.
The Role
As a Machine Learning Engineer, you will contribute directly to our machine learning infrastructure, to the ScalarLM open source codebase, and build large‑scale language model applications on top of it. You'll operate at the intersection of high-performance computing, distributed systems, and cutting‑edge machine learning research, developing the fundamental infrastructure that enables researchers and organizations worldwide to train and deploy large language models at scale.
This is an opportunity to take on technically demanding projects, contribute to foundational systems, and help shape the next generation of intelligent computing.
You Will
Contribute code and performance improvements to the open source project.
Develop and optimize distributed training algorithms for large language models.
Implement high‑performance inference engines and optimization techniques.
Work on integration between vLLM, Megatron‑LM, and HuggingFace ecosystems.
Build tools for seamless model training, fine‑tuning, and deployment.
Optimize performance on advanced GPU architectures.
Collaborate with the open source community on feature development and bug fixes.
Research and implement new techniques for self‑improving AI agents.
Who You Are
Technical Skills
Programming Languages: Proficiency in both C/C++ and Python
High Performance Computing: Deep understanding of HPC concepts, including:
MPI (Message Passing Interface) programming and optimization
Bulk Synchronous Parallel (BSP) computing models
Multi‑GPU and multi‑node distributed computing
CUDA/ROCm programming experience preferred
Machine Learning Foundations:
Solid understanding of gradient descent and backpropagation algorithms
Experience with transformer architectures and the ability to explain their mechanics
Knowledge of deep learning training and its applications
Understanding of distributed training techniques (data parallelism, model parallelism, pipeline parallelism, large batch training, optimization)
Research and Development
Publications: Experience with machine learning research and publications preferred
Research Skills: Ability to read, understand, and implement techniques from recent ML research papers
Open Source: Demonstrated commitment to open source development and community collaboration
Experience
3+ years of experience in machine learning engineering or research.
Experience with large-scale distributed training frameworks (Megatron‑LM, DeepSpeed, FairScale, etc.).
Familiarity with inference optimization frameworks (vLLM, TensorRT, etc.).
Experience with containerization (Docker, Kubernetes) and cluster management.
Background in systems programming and performance optimization.
Bonus points if:
PhD or MS in Computer Science, Computer Engineering, Machine Learning, or related field.
Experience with SLURM, Kubernetes, or other cluster orchestration systems.
Knowledge of mixed precision training, data parallel training, and scaling laws.
Experience with transformer architectures, PyTorch, and decoding algorithms.
Familiarity with high performance GPU programming ecosystem.
Previous contributions to major open source ML projects.
Experience with MLOps and model deployment at scale.
Understanding of modern attention mechanisms (multi‑head attention, grouped query attention, etc.).
Why RelationalAI
RelationalAI is committed to an open, transparent, and inclusive workplace. We value the unique backgrounds of our team. We are driven by curiosity, value innovation, and help each other to succeed and to grow. We take the well‑being of our colleagues seriously, and offer flexible working hours so each individual can find a healthy balance that affords them a productive, happy life wherever they choose to live.
🌎 Global Benefits at RelationalAI
At RelationalAI, we believe that people do their best work when they feel supported, empowered, and balanced. Our benefits prioritize well‑being, flexibility, and growth, ensuring you have the resources to thrive both professionally and personally.
We are all owners in the company and reward you with a competitive salary and equity.
Work from anywhere in the world.
Comprehensive benefits coverage, including global mental health support
Open PTO - Take the time you need, when you need it.
Company Holidays, Your Regional Holidays, and RAI Holidays, where we take one Monday off each month, followed by a week without recurring meetings, giving you the time and space to recharge.
Paid parental leave - Supporting new parents as they grow their families.
We invest in your learning & development
Regular team offsites and global events - Building strong connections while working remotely through team offsites and global events that bring everyone together.
A culture of transparency & knowledge‑sharing - Open communication through team standups, fireside chats, and open meetings.
Country Hiring Guidelines
RelationalAI hires around the world. All of our roles are remote; however, some locations might carry specific eligibility requirements.
Because of this, understanding location & visa support helps us better prepare to onboard our colleagues.
Our People Operations team can help answer any questions about location after starting the recruitment process.
Privacy Policy
EU residents applying for positions at RelationalAI can see our Privacy Policy here.
California residents applying for positions at RelationalAI can see our Privacy Policy here.
Equal Opportunity
RelationalAI is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, color, gender identity or expression, marital status, national origin, disability, protected veteran status, race, religion, pregnancy, sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances.
How much does a senior systems engineer earn in Hayward, CA?
The average senior systems engineer in Hayward, CA earns between $95,000 and $180,000 annually. This compares to the national average senior systems engineer range of $82,000 to $141,000.
Average senior systems engineer salary in Hayward, CA
$131,000
What are the biggest employers of Senior Systems Engineers in Hayward, CA?
The biggest employers of Senior Systems Engineers in Hayward, CA are: