Infrastructure engineer jobs in Berkeley, CA - 2,321 jobs

Machine Learning Infrastructure Engineer
Apple Inc. 4.8
Infrastructure engineer job in Sunnyvale, CA
Sunnyvale, California, United States Machine Learning and AI Want to ship amazing experiences in Apple products? Be part of the team in the Video Computer Vision (VCV) organization that focuses on people understanding from real-time video streams and building higher level reasoning algorithms. VCV delivered features such as Face ID, RoomPlan as well as many other computer vision algorithms powering Apple Vision Pro, iPhone, and iPad. We focus on a balance of research and development to deliver Apple quality, pioneering experiences. Come shape Apple products as a driven and dedicated ML Infrastructure and Data Engineer to push the limits of ML algorithms with hands‑on work and real world and simulated data, in an innovative team and be part of building the next big thing. Description As part of the Video Computer Vision (VCV) team, you will help us create the data and infrastructure ecosystem needed to support our ML development and continuously improve our features. We take full end-to-end ownership of our services and data products, driving them through every stage meticulously, encompassing conception, design, implementation, deployment, and maintenance. As a result, each one of us takes our responsibilities seriously. In this team, you'll have the opportunity to work on complex problems in close partnership with our ML engineers, data scientists and software integration teams. Minimum Qualifications Bachelor's degree in Computer Science or related discipline, and 2 years relevant industry experience. Strong foundational knowledge in Computer Science. Extensive programming experience in Python. Hands‑on experience with cloud providers (AWS, GCP, or Azure). Strong understanding of core infrastructure concepts (e.g., compute, networking, storage, containers, Kubernetes). Preferred Qualifications Experience with machine learning model development lifecycle, including data preprocessing, model training, evaluation, and deployment. 
Proficiency with cloud computing and distributed data processing infrastructure and tools (e.g., Ray, Spark, Trino). Hands‑on experience with CI/CD pipelines and practices. Familiarity with Infrastructure as Code (IaC) tools (e.g. Terraform, Pulumi, or CloudFormation). Experience building on LLMs or other generative models. Ability to drive projects from concept to production, balancing business needs with technical quality and timely delivery. Excellent communication skills, ability to work both independently and multi‑functionally. At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $147,400 and $272,100, and your base pay will depend on your skills, qualifications, experience, and location. Apple employees also have the opportunity to become an Apple shareholder through participation in Apple's discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple's Employee Stock Purchase Plan. You'll also receive benefits including: Comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and for formal education related to advancing your career at Apple, reimbursement for certain educational expenses - including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation. Learn more about Apple Benefits. Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program. Apple is an equal opportunity employer that is committed to inclusion and diversity. 
We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant. Apple accepts applications to this posting on an ongoing basis.
$147.4k-272.1k yearly 3d ago

Machine Learning Infrastructure Engineer
Ambience Healthcare
Infrastructure engineer job in San Francisco, CA
About Us: Ambience Healthcare is the leading AI platform for documentation, coding, and clinical workflow, built to reduce administrative burden and protect revenue integrity at the point of care. Trusted by top health systems across North America, Ambience's platform is live across outpatient, emergency, and inpatient settings, supporting more than 100 specialties with real-time, coding‑aware documentation. The platform integrates directly with Epic, Oracle Cerner, athenahealth, and other major EHRs. Founded in 2020 by Mike Ng and Nikhil Buduma, Ambience is headquartered in San Francisco and backed by Oak HC/FT, Andreessen Horowitz (a16z), OpenAI Startup Fund, Kleiner Perkins, and other leading investors. Join us in the endeavor of accelerating the path to safe & useful clinical super intelligence by becoming part of our community of problem solvers, technologists, clinicians, and innovators. The Role: We're looking for a Machine Learning Infrastructure Engineer to join our AI Platform team. This is a high-leverage role focused on building and scaling the core infrastructure that powers every AI system at Ambience. You'll work closely with our ML, data, and product teams to develop the foundational tools, systems, and workflows that support rapid iteration, robust evaluation, and production reliability for our LLM‑based products. Our engineering roles are hybrid - working onsite at our San Francisco office three days per week. 
Who You Are: You have 5+ years of experience as a software engineer, infrastructure engineer, or ML platform engineer. You've worked directly on systems that support ML research or production workloads, whether training pipelines, evaluation systems, or deployment frameworks. You write high-quality code (we primarily use Python) and have strong engineering and systems design instincts. You're excited to work closely with ML researchers and product engineers to unblock them with better infrastructure. You're pragmatic and care deeply about making tools that are reliable, scalable, and easy to use. You thrive in fast-paced, collaborative environments and are eager to take ownership of ambiguous problems. What You'll Do: Design, build, and maintain the infrastructure powering ML model training, batch inference, and evaluation workflows. Improve internal tools and developer experience for ML experimentation and observability. Partner with ML engineers to optimize model deployment and monitoring across clinical workloads. Define standards for model versioning, performance tracking, and rollout processes. Collaborate across the engineering team to build reusable abstractions that accelerate AI product development. Drive performance, cost efficiency, and reliability improvements across our AI infrastructure stack. Pay Transparency: We offer a base compensation range of approximately $200,000-300,000 per year, with the addition of significant equity. This intentionally broad range provides flexibility for candidates to tailor their cash and equity mix based on individual preferences. Our compensation philosophy prioritizes meaningful equity grants, enabling team members to share directly in the impact they help create. If your expectations fall outside of this range, we still encourage you to apply; our approach to compensation considers a range of factors to ensure alignment with each candidate's unique needs and preferences.
Being at Ambience: An opportunity to work with cutting edge AI technology, on a product that dramatically improves the quality of life for healthcare providers and the quality of care they can provide to their patients. Dedicated budget for personal development, including access to world class mentors, advisors, and an in‑house executive coach. Work alongside a world‑class, diverse team that is deeply mission aligned. Ownership over your success and the ability to significantly impact the growth of our company. Competitive salary and equity compensation with benefits including health, dental, and vision coverage, quarterly retreats, unlimited PTO, and a 401(k) plan. Ambience is committed to supporting every candidate's ability to fully participate in our hiring process. If you need any accommodations during your application or interviews, please reach out to our Recruiting team at accommodations@ambiencehealth.com. We'll handle your request confidentially and work with you to ensure an accessible and equitable experience for all candidates.
$200k-300k yearly 4d ago
Distributed ML Infrastructure Engineer
Institute of Foundation Models
Infrastructure engineer job in Sunnyvale, CA
A leading research lab in Sunnyvale is seeking a distributed ML infrastructure engineer to extend and scale training systems. The ideal candidate must have over 5 years of experience in ML systems with strong expertise in distributed training frameworks like DeepSpeed and FSDP. This role offers a competitive salary ranging from $150,000 to $450,000 annually along with comprehensive benefits and amenities.
$114k-174k yearly est. 1d ago
Machine Learning Infrastructure Engineer at early-stage private AI platform
Jack & Jill/External ATS
Infrastructure engineer job in San Francisco, CA
This is a job that we are recruiting for on behalf of one of our customers. To apply, speak to Jack. He's an AI agent that sends you unmissable jobs and then helps you ace the interview. He'll make sure you are considered for this role, and help you find others if you ask. Machine Learning Infrastructure Engineer Company Description: Early-stage private AI platform Job Description: Build the core infrastructure to serve thousands, then millions, of private, personalized AI models at scale. This role involves optimizing model serving performance for low latency and cost, and integrating a TEE-based privacy stack to ensure user data and models are exclusively accessible by the user, not even the company. Drive the foundational systems for a new era of personal AI. Location: San Francisco, USA Why this role is remarkable: Pioneer the infrastructure for truly private, personal AI models, ensuring user data remains confidential. Join an early-stage, well-funded startup backed by top-tier VCs and leading AI experts. Make a massive impact on the future of AI, helping to keep humans empowered in a post-AGI world. What you will do: Build infrastructure for deploying thousands to millions of personalized finetuned models. Monitor and optimize in-the-wild model serving performance for low latency and cost. Integrate with a TEE-based privacy stack to guarantee user data and model confidentiality. The ideal candidate: Deep understanding of the machine learning stack, including transformer optimization and GPU performance. Ability to execute quickly in a fast-paced, early-stage startup environment. A missionary mentality, passionate about ensuring AI works for people. How to Apply: To apply for this job speak to Jack, our AI recruiter. Step 1. Visit our website Step 2. Click 'Speak with Jack' Step 3. Login with your LinkedIn profile Step 4. Talk to Jack for 20 minutes so he can understand your experience and ambitions Step 5. 
If the hiring manager would like to meet you, Jack will make the introduction.
$115k-175k yearly est. 2d ago
Machine Learning Infrastructure Engineer
David AI
Infrastructure engineer job in San Francisco, CA
David AI is the first audio data research company. We bring an R&D approach to data, developing datasets with the same rigor AI labs bring to models. Our mission is to bring AI into the real world, and we believe audio is the gateway. Speech is versatile, accessible, and human; it fits naturally into everyday life. As audio AI advances and new use cases emerge, high-quality training data is the bottleneck. This is where David AI comes in. David AI was founded in 2024 by a team of former Scale AI engineers and operators. In less than a year, we've brought on most FAANG companies and AI labs as customers. We recently raised a $50M Series B from Meritech, NVIDIA, Jack Altman (Alt Capital), Amplify Partners, First Round Capital and other Tier 1 investors. Our team is sharp, humble, ambitious, and tight-knit. We're looking for the best research, engineering, product, and operations minds to join us on our mission to push the frontier of audio AI. About our Engineering team: At David AI, our engineers build the pipelines, platforms, and models that transform raw audio into high-signal data for leading AI labs and enterprises. We're a tight-knit team of product engineers, infrastructure specialists, and machine learning experts focused on building the world's first audio data research company. We move fast, own our work end-to-end, and ship to production daily. Our team designs real-time pipelines handling terabytes of speech data and deploys cutting-edge generative audio models. About this role: As our Founding Machine Learning Infrastructure Engineer at David AI, you will build and scale the core infrastructure that powers our cutting-edge audio ML products. You'll be leading the development of the systems that enable our researchers and engineers to train, deploy, and evaluate machine learning models efficiently.
In this role, you will Design and maintain data pipelines for processing massive audio datasets, ensuring terabytes of data are managed, versioned, and fed into model training efficiently. Develop frameworks for training audio models on compute clusters, managing cloud resources, optimizing GPU utilization, and improving experiment reproducibility. Create robust infrastructure for deploying ML models to production, including APIs, microservices, model serving frameworks, and real-time performance monitoring. Apply software engineering best practices with monitoring, logging, and alerting to guarantee high availability and fault‑tolerant production workloads. Translate research prototypes into production pipelines, working with ML engineers and data teams to support efficient data labeling and preparation. and optimization techniques to enhance infrastructure velocity and reliability. Your background looks like 5+ years of backend engineering with 2+ years ML infrastructure experience. Hands‑on experience scaling cloud infrastructure and large‑scale data processing pipelines for ML model training and evaluation. Proficient with Docker, Kubernetes, and CI/CD pipelines. Proven ML model deployment and lifecycle management in production. Strong system design skills optimizing for scale and performance. Proficient in Python with deep Kubernetes experience. Bonus points if you have Experience with feature stores, experiment tracking (MLflow, Weights and Biases), or custom CI/CD pipelines. Familiarity with large‑scale data ingestion and streaming systems (Spark, Kafka, Airflow). Proven ability to thrive in fast‑moving startup environments. Some technologies we work with Next.js, TypeScript, TailwindCSS, Node.js, tRPC, PostgreSQL, AWS, Trigger.dev, WebRTC, FFmpeg. Benefits Unlimited PTO. Top‑notch health, dental, and vision coverage with 100% coverage for most plans. FSA & HSA access. 401k access. 
Meals 2x daily through DoorDash + snacks and beverages available at the office. Unlimited company‑sponsored Barry's classes.
$115k-175k yearly est. 3d ago
Machine Learning Infrastructure Engineer
Workshop Labs
Infrastructure engineer job in San Francisco, CA
Build the infrastructure to serve personal AI models privately and at scale. We're building the first truly private, personal AI - one that learns your skills, judgment, and preferences without big tech ever seeing your data. Our core ML systems challenge: how do we serve the world's best personal model, at low cost and high speed, with bulletproof privacy? What you'll do: Build the infrastructure that lets us create & deploy thousands and eventually millions of personalized finetuned models for our customers. Monitor & optimize in-the-wild model serving performance to hit low latency & cost. Interface with the TEE-based privacy stack that lets us guarantee user data & models can only be seen & used by the user (not even us), and integrate the privacy architecture with the finetuning & inference code. You have: A deep understanding of the machine learning stack. You can dive into the details of how transformers work & performance optimization techniques for them. You have a mental model of GPUs sufficient to reason about performance from first principles. You can drill down from ML code to metal. Ability to execute quickly. We ship fast and fail fast so we can win faster. The challenge of human relevance in a post-AGI world isn't going to solve itself. A missionary mentality. We're a mission-driven company, looking for mission-first people. If you're passionate about ensuring AI works for people (and not the other way around), you've come to the right place. Ready to roll up your sleeves. We're an early stage startup, so we're looking for someone who can wear many hats. Experience you may have: Work at a fast-paced AI startup, or top AI lab. Experience deploying ML systems at scale. You might have worked with frameworks like vLLM, S-LoRA, Punica, or LoRAX. Experience with privacy-first infrastructure. You're familiar with confidential computing and able to reason about both technical and real-world confidentiality and security.
You may have worked with secure enclaves, TEEs, code measurement & remote attestation, Nvidia Confidential Computing, Intel TDX or AMD SEV-SNP, or related confidential computing technologies. We encourage speculative applications; we expect many strong candidates will have different experience or unconventional backgrounds. What we offer Generous compensation and early stage equity. We're competitive with the top startups, because we believe the best talent deserves it. World-class expertise. We're based in top AI research hubs in San Francisco and London. We're backed by AI experts like Juniper Ventures, Seldon Lab, and angels at Anthropic and Apollo Research. You'll have access to some of the best AI expertise in the world. Massive impact. Our mission is to keep people in the economy well after AGI. You'll help shift the trajectory of AI development for the better, helping break the intelligence curse and prevent gradual disempowerment to keep humans in control of the future. About Workshop Labs We're building the AI economy for humans. While everyone else tries to automate the world top-down, we believe in augmenting people bottom-up. Our team previously created evals used by Open AI, completed frontier AI research at MIT/Cambridge/Oxford, worked in Stuart Russell's lab, and led product verticals at high growth startups. The essay series The Intelligence Curse has been covered in TIME, The New York Times, and AI 2027. Our vision is for everyone to have a personal AI aligned to their goals and values, helping them stay durably relevant in a post-AGI economy. As a public benefit corporation, we have a fiduciary duty to ensure that as AI becomes more powerful, humans become more empowered, not disempowered or replaced. We're an early stage startup, backed by legendary investors like Brad Burnham and Matt McIlwain, visionary product leaders like Jake Knapp and John Zeratsky, philosopher-builders like Brendan McCord, and top AI safety funds like Juniper Ventures. 
Our investors were early at Anthropic, Slack, Prime Intellect, DuckDuckGo, and Goodfire. Our advisors have held senior roles at Anthropic, Google DeepMind, and UK AISI.
$115k-175k yearly est. 1d ago
Principal Enterprise IT Engineer - Zero-Trust & Automation
1X Technologies
Infrastructure engineer job in Palo Alto, CA
A robotics and AI company in Palo Alto is seeking a Principal Enterprise IT Engineer responsible for leading IT strategy and architecture across the organization. The ideal candidate will have expert knowledge of Google Workspace and Okta, strong scripting skills, and experience scaling IT systems in high-growth environments. This role offers a competitive salary of $180,000 - $235,000 along with comprehensive health benefits and 401(k) matching.
$180k-235k yearly 3d ago
IT Engineer - Onsite SF, Autonomous & Impactful
Hard Yaka
Infrastructure engineer job in San Francisco, CA
A fast-growing tech company based in San Francisco seeks an experienced IT Engineer for contract work. You will manage IT operations, leading employee onboarding and troubleshooting. Candidates should have 3-5 years of experience in IT roles, with skills in Google Workspace and troubleshooting. The position requires reliable execution and communication skills. You will work in the office three days a week, participating in an innovative tech culture that values diversity and collaboration.
$113k-161k yearly est. 5d ago
Machine Learning Systems Engineer, Research Tools
Menlo Ventures
Infrastructure engineer job in San Francisco, CA
About Anthropic: Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the role: You want to build the cutting-edge systems that train AI models like Claude. You're excited to work at the frontier of machine learning, implementing and improving advanced techniques to create ever more capable, reliable and steerable AI. As an ML Systems Engineer on our Research Tools team, you'll be responsible for the critical algorithms and infrastructure that our researchers depend on to train models. Your work will directly enable breakthroughs in AI capabilities and safety. You'll focus obsessively on improving the performance, robustness, and usability of these systems so our research can progress as quickly as possible. You're energized by the challenge of supporting and empowering our research team in the mission to build beneficial AI systems. Our finetuning researchers train our production Claude models, and internal research models, using RLHF and other related methods. Your job will be to build, maintain, and improve the algorithms and systems that these researchers use to train models. You'll be responsible for improving the speed, reliability, and ease-of-use of these systems. You may be a good fit if you: Have 2+ years of software engineering experience. Like working on systems and tools that make other people more productive. Are results-oriented, with a bias towards flexibility and impact. Pick up slack, even if it goes outside your job description. Enjoy pair programming (we love to pair!).
Want to learn more about machine learning research. Care about the societal impacts of your work. Strong candidates may also have experience with: Python. Implementing LLM finetuning algorithms, such as RLHF. Representative projects: Profiling our reinforcement learning pipeline to find opportunities for improvement. Building a system that regularly launches training jobs in a test environment so that we can quickly detect problems in the training pipeline. Making changes to our finetuning systems so they work on new model architectures. Building instrumentation to detect and eliminate Python GIL contention in our training code. Diagnosing why training runs have started slowing down after some number of steps, and fixing it. Implementing a stable, fast version of a new training algorithm proposed by a researcher. Deadline to apply: None. Applications will be reviewed on a rolling basis. The expected salary range for this position is: $300,000 - $405,000 USD. Logistics: Education requirements: We require at least a Bachelor's degree in a related field or equivalent experience. Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices. Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this. We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work.
We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team. How we're different We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact - advancing our long-term goals of steerable, trustworthy AI - rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We're an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills. The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI & Compute, Concrete Problems in AI Safety, and Learning from Human Preferences. Come work with us! Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.
$87k-121k yearly est. 1d ago
ML Engineer - Production-Scale AI Systems
Inference
Infrastructure engineer job in San Francisco, CA
A cutting-edge AI startup in San Francisco is seeking a Machine Learning Engineer. In this role, you will build and improve core ML systems that drive custom model training platforms. You will lead projects from data intake to model delivery, creating robust tools and ensuring model performance. The ideal candidate has experience in AI model training with PyTorch, data processing, and creating benchmarks. The role offers a competitive salary in the range of $220,000 to $320,000, plus equity and benefits.
$87k-121k yearly est. 4d ago
Distributed Systems Engineer - High-Impact Cloud Storage
Archil, Inc.
Infrastructure engineer job in San Francisco, CA
A cloud storage technology company in San Francisco is looking for a Distributed Systems Engineer to work across the stack in building innovative storage solutions. You will be on call for production systems and will design distributed systems to meet customer needs. The ideal candidate has over 3 years of experience in distributed systems, has problem-solving skills, and is passionate about enhancing customer experiences. Join us in our mission to revolutionize cloud storage with the next generation of applications.
$87k-121k yearly est. 4d ago
SRE Cybersecurity Engineer - Scale Systems (Equity)
Pantera Capital
Infrastructure engineer job in Palo Alto, CA
A tech-focused financial services firm in California is seeking a Cybersecurity/SRE professional to secure and maintain the reliability of its infrastructure. Responsibilities include building secure applications on AWS, managing identities, and strengthening Kubernetes security. The ideal candidate has expertise in Python, Terraform, and large distributed systems, and has a proactive, problem-solving mindset. A competitive salary and comprehensive benefits are included.
$86k-120k yearly est. 5d ago
ML Infrastructure Engineer - Real-Time Vision
Apple Inc. 4.8
Infrastructure engineer job in Sunnyvale, CA
A leading technology company is looking for a Machine Learning Infrastructure Engineer in Sunnyvale, California. You will develop data ecosystems and infrastructure for ML projects, partnering closely with engineers and scientists. Candidates should have a Bachelor's in Computer Science and experience with cloud providers, as well as strong programming skills in Python. This is an opportunity to be a part of innovative projects that influence the next generation of technology.
$150k-196k yearly est. 3d ago
Machine Learning Infrastructure Engineer
Ambience Healthcare, Inc.
Infrastructure engineer job in San Francisco, CA
About Us: Ambience Healthcare is the leading AI platform for documentation, coding, and clinical workflow, built to reduce administrative burden and protect revenue integrity at the point of care. Trusted by top health systems across North America, Ambience's platform is live across outpatient, emergency, and inpatient settings, supporting more than 100 specialties with real-time, coding-aware documentation. The platform integrates directly with Epic, Oracle Cerner, athenahealth, and other major EHRs. Founded in 2020 by Mike Ng and Nikhil Buduma, Ambience is headquartered in San Francisco and backed by Oak HC/FT, Andreessen Horowitz (a16z), OpenAI Startup Fund, Kleiner Perkins, and other leading investors. Join us in the endeavor of accelerating the path to safe & useful clinical super intelligence by becoming part of our community of problem solvers, technologists, clinicians, and innovators. The Role: We're looking for a Machine Learning Infrastructure Engineer to join our AI Platform team. This is a high-leverage role focused on building and scaling the core infrastructure that powers every AI system at Ambience. You'll work closely with our ML, data, and product teams to develop the foundational tools, systems, and workflows that support rapid iteration, robust evaluation, and production reliability for our LLM-based products. Our Engineering roles are hybrid in our SF office 3x/wk. 
Who You Are:
You have 5+ years of experience as a software engineer, infrastructure engineer, or ML platform engineer
You've worked directly on systems that support ML research or production workloads - whether training pipelines, evaluation systems, or deployment frameworks
You write high-quality code (we primarily use Python) and have strong engineering and systems design instincts
You're excited to work closely with ML researchers and product engineers to unblock them with better infrastructure
You're pragmatic and care deeply about making tools that are reliable, scalable, and easy to use
You thrive in fast-paced, collaborative environments and are eager to take ownership of ambiguous problems

What You'll Do:
Design, build, and maintain the infrastructure powering ML model training, batch inference, and evaluation workflows
Improve internal tools and developer experience for ML experimentation and observability
Partner with ML engineers to optimize model deployment and monitoring across clinical workloads
Define standards for model versioning, performance tracking, and rollout processes
Collaborate across the engineering team to build reusable abstractions that accelerate AI product development
Drive performance, cost efficiency, and reliability improvements across our AI infrastructure stack

Pay Transparency
We offer a base compensation range of approximately $200,000-300,000 per year, exclusive of equity. This intentionally broad range provides flexibility for candidates to tailor their cash and equity mix based on individual preferences. Our compensation philosophy prioritizes meaningful equity grants, enabling team members to share directly in the impact they help create. Are you outside of the range? We encourage you to still apply: we take an individualized approach to ensure that compensation accounts for all of the life factors that matter for each candidate.

Being at Ambience:
An opportunity to work with cutting-edge AI technology, on a product that dramatically improves the quality of life for healthcare providers and the quality of care they can provide to their patients
Dedicated budget for personal development, including access to world-class mentors, advisors, and an in-house executive coach
Work alongside a world-class, diverse team that is deeply mission aligned
Ownership over your success and the ability to significantly impact the growth of our company
Competitive salary and equity compensation with benefits including health, dental, and vision coverage, quarterly retreats, unlimited PTO, and a 401(k) plan

Ambience is committed to supporting every candidate's ability to fully participate in our hiring process. If you need any accommodations during your application or interviews, please reach out to our Recruiting team at accommodations@ambiencehealth.com. We'll handle your request confidentially and work with you to ensure an accessible and equitable experience for all candidates.
$200k-300k yearly 3d ago
Machine Learning Infrastructure Engineer
Institute of Foundation Models
Infrastructure engineer job in Sunnyvale, CA
About the Institute of Foundation Models
We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy. As part of our team, you'll have the opportunity to work on the core of cutting-edge foundation model training, alongside world-class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Strategic and innovative problem-solving skills will be instrumental in establishing MBZUAI as a global hub for high-performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.

The Role
We're looking for a distributed ML infrastructure engineer to help extend and scale our training systems. You'll work side-by-side with world-class researchers and engineers to:
Extend distributed training frameworks (e.g., DeepSpeed, FSDP, FairScale, Horovod)
Implement distributed optimizers from mathematical specs
Build robust config + launch systems across multi-node, multi-GPU clusters
Own experiment tracking, metrics logging, and job monitoring for external visibility
Improve training system reliability, maintainability, and performance
While much of the work will support large-scale pre-training, pre-training experience is not required. Strong infrastructure and systems experience is what we value most.

Key Responsibilities
Distributed Framework Ownership - Extend or modify training frameworks (e.g., DeepSpeed, FSDP) to support new use cases and architectures.
Optimizer Implementation - Translate mathematical optimizer specs into distributed implementations.
Launch Config & Debugging - Create and debug multi-node launch scripts with flexible batch sizes, parallelism strategies, and hardware targets.
Metrics & Monitoring - Build systems for experiment tracking, job monitoring, and logging usable by collaborators and researchers.
Infra Engineering - Write production-quality code and tests for ML infra in PyTorch or JAX; ensure reliability and maintainability at scale.

Qualifications
Must-Haves:
5+ years of experience in ML systems, infra, or distributed training
Experience modifying distributed ML frameworks (e.g., DeepSpeed, FSDP, FairScale, Horovod)
Strong software engineering fundamentals (Python, systems design, testing)
Proven multi-node experience (e.g., Slurm, Kubernetes, Ray) and debugging skills (e.g., NCCL/GLOO)
Ability to implement algorithms across GPUs/nodes based on mathematical specs
Experience working on an ML platform/infrastructure and/or distributed inference optimization team
Experience with large-scale machine learning workloads (strong ML fundamentals)
Nice-to-Haves:
Exposure to mixed-precision training (e.g., bf16, fp8) with accuracy validation
Familiarity with performance profiling, kernel fusion, or memory optimization
Open-source contributions or published research (MLSys, ICML, NeurIPS)
CUDA or Triton kernel experience
Experience with large-scale pre-training
Experience building custom training pipelines at scale and modifying them for custom needs
Deep familiarity with training infrastructure and performance tuning

$150,000 - $450,000 a year

Benefits
Comprehensive medical, dental, and vision
401(k) program
Generous PTO, sick leave, and holidays
Paid parental leave and family-friendly benefits
On-site amenities and perks: Complimentary lunch, gym access, and a short walk to the Sunnyvale Caltrain station
$114k-174k yearly est. 1d ago
Privacy-First ML Infrastructure Engineer
Workshop Labs
Infrastructure engineer job in San Francisco, CA
A pioneering AI startup in San Francisco is looking for an experienced individual to build infrastructure for deploying personalized AI models. The role demands a strong understanding of machine learning technology and a passion for enabling user-controlled AI solutions. Ideal candidates will thrive in fast-paced environments and contribute to impactful AI development. The company offers competitive compensation, equity, and a significant role in shaping the future of AI.
$115k-175k yearly est. 1d ago
Principal Enterprise IT Engineer
1X Technologies
Infrastructure engineer job in Palo Alto, CA
Principal Enterprise IT Engineer, IT & Security

About 1X:
We're an AI and robotics company based in Palo Alto, California, on a mission to build a truly abundant society through general-purpose robots capable of performing any kind of work autonomously. We believe that to truly understand the world and grow in intelligence, humanoid robots must live and learn alongside us. That's why we're focused on developing friendly home robots designed to integrate seamlessly into everyday life. We're looking for curious, driven, and passionate people who want to help shape the future of robotics and AI. If this mission excites you, we'd be thrilled to hear from you and explore how you might contribute to our journey.

Role Overview
The Principal Enterprise IT Engineer will lead the strategy, architecture, and implementation of enterprise IT systems across the company. This role will define standards for identity, endpoint management, collaboration, and security while scaling IT infrastructure to support rapid organizational growth. You'll play a key leadership role, mentoring senior engineers and influencing cross-functional and executive stakeholders to align IT operations with strategic business needs.

Responsibilities
Define and drive enterprise IT strategy, architecture, and roadmaps across identity, collaboration, and device platforms
Lead administration and scaling of Google Workspace, Okta, Intune, and MDM platforms with a focus on Zero Trust principles
Develop and implement automation frameworks and scripting (Bash, Python, PowerShell) to streamline IT operations
Align IT systems with compliance standards (e.g., SOC2, ISO 27001) and proactively mitigate enterprise risks
Ensure seamless integration of IT systems with engineering, manufacturing, and robotics environments
Act as senior escalation point for IT operations, mentoring IT engineers and building a high-performance function
Influence executive and cross-functional stakeholders to ensure IT strategy supports business growth

Requirements
Expert-level knowledge of Google Workspace, Okta, Microsoft Intune, and MDM platforms across multiple OS (macOS, Windows, iOS, Android)
Strong scripting and automation skills (Bash, Python, PowerShell); experience implementing Zero Trust security
Proven experience scaling IT systems globally in high-growth, cloud-first or hybrid environments
Ability to lead IT architecture initiatives and partner with executive and security leadership
Experience mentoring senior IT engineers and leading high-performance teams
Preferred: Familiarity with Terraform, Ansible, and IT support for robotics or engineering-heavy environments
Preferred: Certifications such as CISSP, Okta Certified Architect, Google Workspace Admin, or Microsoft Enterprise Mobility

Benefits & Compensation
Salary: $180,000 - $235,000
Health, dental, and vision insurance
401(k) with company match
Paid time off and holidays

Equal Opportunity Employer
1X is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, ancestry, citizenship, age, marital status, medical condition, genetic information, disability, military or veteran status, or any other characteristic protected under applicable federal, state, or local law.
$180k-235k yearly 3d ago
IT Engineer (Contract)
Hard Yaka
Infrastructure engineer job in San Francisco, CA
About AngelList
We exist to accelerate innovation. We do this by giving more people the opportunity to participate in the venture economy by building the financial infrastructure that makes it possible for more people to invest in world-changing startups. We also build tools for startup founders that help them run their operations, so they can focus on building their company. AngelList is the nexus of venture capital and the startup community. We support more than $171B in assets on our platform, and we've driven capital to over 13,000 startups. 57% of top-tier U.S. VC deals involve investors on AngelList. While our scale is large, our ambitions are even larger - we're innovating on the infrastructure for venture and individual investors and the startups they invest in. Come build with us!

About the Role
We're looking for an IT Engineer to join our team on a contract basis, reporting directly to our IT Lead. You'll own the execution of critical IT workstreams across AngelList and our subsidiary companies, taking full responsibility for delivery while operating within established systems and priorities. This role is ideal for someone who thrives on autonomy, can run independently after ramping, and takes pride in reliable execution over strategic planning.

Responsibilities
Own day-to-day IT operations, including employee onboarding, offboarding, and access management across Google Workspace, Rippling, 1Password, and Slack.
Troubleshoot and resolve IT issues independently, serving as a reliable resource for employees.
Execute on MDM and endpoint management, maintaining security policies and device compliance.
Manage SaaS platforms, including license tracking, access reviews, and vendor coordination.
Maintain and improve IT documentation, playbooks, and runbooks.
Own cross-functional projects as assigned, coordinating with engineering, security, and ops teams.
Manage office IT infrastructure and AV equipment as needed.

What We're Looking For
3-5+ years of experience in IT operations, systems administration, or similar roles.
Hands-on experience with Google Workspace, Slack, and common SaaS tools.
Solid understanding of identity and access management.
Strong troubleshooting instincts and a bias toward solving problems without escalation.
Reliable and organized - someone who follows through without needing reminders.
Clear communicator who can work across teams and explain technical issues simply.
Experience in automation and scripting to reduce manual work and improve efficiency.
Experience with Rippling, Slack Grid, or n8n.io is a plus.
If you don't tick every box above, we'd still encourage you to apply. We're building a diverse team whose skills balance and complement one another.

Office Location and Expectations
AngelList has offices in two hub cities: This role is based in our San Francisco office. You will be expected to come in three times a week, usually Tuesdays, Wednesdays, and Thursdays, with some flexibility for occasional Mondays or Fridays. The role also includes light after-hours support to set up IT for occasional evening events. If you need to be offline at 5 PM every day, this won't be the right fit.

Compensation: $80+ an hour, 40-45 hours a week.

Working at AngelList:
At AngelList, we are united in our purpose to accelerate innovation and build the future of private markets. Our beliefs and values shape how we work, collaborate, and create impact. If the below resonate, we'd love to have you with us.
*Beliefs: **************************
*Values & Leadership Expectations: *************************

AngelList is an equal opportunity employer and we value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
$113k-161k yearly est. 5d ago
Distributed Systems Engineer
Archil, Inc.
Infrastructure engineer job in San Francisco, CA
Role
As a distributed systems engineer, you'll work across the stack to solve problems as they come up and help build Archil volumes. You'll have significant influence over the technical and product direction. We'll expect you to be able to:
Be on call for a production system to help our customers if anything goes wrong.
Build out never-before-seen capabilities in a storage service
Design distributed systems interactions for atomicity and idempotency
Deploy infrastructure and generalize infrastructure across different clouds
Operate through changing customer requirements with lots of ambiguity

Who are you?
You have 3+ years of experience building and operating distributed systems (flexible). Ideally, you've worked at a startup before, so you know how chaotic this time can be.
You've successfully resolved disagreements at work before, and you understand that the highest priority is helping our customers - not being right.
You're comfortable debugging problems that occur as a result of failures in multiple, different systems, using tools like metrics and logs. You've been paged at 3am to solve a complex production issue before.
You're knowledgeable about distributed systems: you get how consensus works, you know how to scale systems, and you know what pitfalls in API design to avoid.
You're familiar with how to optimize the performance of a system, including a general sense of how much latency different operations take, and what kind of bottlenecks could lead to a reduction in potential throughput.
Most of all, you know how computers work from the silicon up. Someone once asked you in an interview “what happens when you go to Google.com”, and there wasn't enough time in the interview to talk about all of the steps.

Why join us?
By building the highest-performance, simplest storage product in the cloud, we have a great chance of changing how the world builds the next generation of applications (and with AI, more applications will be written in the next 5 years than ever before). We'd love for you to be a part of our journey.

How to join?
Show us that you're knowledgeable about the space that we're working in on your application. It's up to you how you do this, but one potential way is by answering one of the following questions:
How do you think our system works?
What do you think our biggest technical challenge is?
What would make our system not work?

About Archil
Archil is on a mission to change how developers build applications in the cloud, by building the next default storage platform in the cloud. Over the past 15 years, S3 has become the default way to store inactive data sets in the cloud, but the next generation of AI and analytics applications need to actively process more data than ever before. We're solving this problem by building the first Volume storage product that's as fast as EBS, infinitely scalable like S3, and connects to existing data sets in S3 and other repositories. Our customers choose Archil because this architecture radically simplifies how they think about working with their data (every application becomes stateless, no cold-start latencies, and no need to worry about checkpointing or backup). Hacker News agrees.

Hunter, the founder, has 10 years of experience building and operating cloud storage, including helping to launch Amazon's EFS product and working on bleeding-edge storage at Netflix. He started the company after working with hundreds of customers across these roles, and identifying a need for a new kind of storage product.

We're fully in-person in San Francisco. If you're also someone interested in distributed systems, completely focused on how to make customers successful, and interested in solving really big technical challenges, we'd love for you to join us.
$87k-121k yearly est. 4d ago
ML Systems Engineer, Research Tools - Impactful
Menlo Ventures
Infrastructure engineer job in San Francisco, CA
A leading AI research company in New York seeks a Machine Learning Systems Engineer to build cutting-edge systems for training AI models. This role involves developing critical algorithms, improving system performance, and collaborating with a dynamic research team. Ideal candidates have a strong software engineering background and care about the societal impacts of AI technology. The expected salary range is $300,000 - $405,000 USD, with a hybrid work policy requiring 25% in-office presence.
$87k-121k yearly est. 1d ago

Learn more about infrastructure engineer jobs

How much does an infrastructure engineer earn in Berkeley, CA?

The average infrastructure engineer in Berkeley, CA earns between $95,000 and $211,000 annually. This compares to the national average infrastructure engineer range of $76,000 to $148,000.

Average infrastructure engineer salary in Berkeley, CA

$141,000

10th percentile: $95,000

Median: $141,000

90th percentile: $211,000

What are the biggest employers of Infrastructure Engineers in Berkeley, CA?

The biggest employers of Infrastructure Engineers in Berkeley, CA are:

Bluespace
Oxide Computer
Center for Elders' Independence
Insight Global
JSat Automation
