Sunnyvale, California, United States Machine Learning and AI
Want to ship amazing experiences in Apple products? Be part of the team in the Video Computer Vision (VCV) organization that focuses on people understanding from real-time video streams and building higher level reasoning algorithms. VCV delivered features such as Face ID, RoomPlan as well as many other computer vision algorithms powering Apple Vision Pro, iPhone, and iPad. We focus on a balance of research and development to deliver Apple quality, pioneering experiences. Come shape Apple products as a driven and dedicated ML Infrastructure and Data Engineer to push the limits of ML algorithms with hands‑on work and real world and simulated data, in an innovative team and be part of building the next big thing.
Description
As part of the Video Computer Vision (VCV) team, you will help us create the data and infrastructure ecosystem needed to support our ML development and continuously improve our features. We take full end-to-end ownership of our services and data products, driving them through every stage meticulously, encompassing conception, design, implementation, deployment, and maintenance. As a result, each one of us takes our responsibilities seriously. In this team, you'll have the opportunity to work on complex problems in close partnership with our ML engineers, data scientists and software integration teams.
Minimum Qualifications
Bachelor's degree in Computer Science or related discipline, and 2 years relevant industry experience.
Strong foundational knowledge in Computer Science.
Extensive programming experience in Python.
Hands‑on experience with cloud providers (AWS, GCP, or Azure).
Strong understanding of core infrastructure concepts (e.g., compute, networking, storage, containers, Kubernetes).
Preferred Qualifications
Experience with machine learning model development lifecycle, including data preprocessing, model training, evaluation, and deployment.
Proficiency with cloud computing and distributed data processing infrastructure and tools (e.g., Ray, Spark, Trino).
Hands‑on experience with CI/CD pipelines and practices.
Familiarity with Infrastructure as Code (IaC) tools (e.g., Terraform, Pulumi, or CloudFormation).
Experience building on LLMs or other generative models.
Ability to drive projects from concept to production, balancing business needs with technical quality and timely delivery.
Excellent communication skills and the ability to work both independently and cross‑functionally.
At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $147,400 and $272,100, and your base pay will depend on your skills, qualifications, experience, and location.
Apple employees also have the opportunity to become an Apple shareholder through participation in Apple's discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple's Employee Stock Purchase Plan. You'll also receive benefits including: Comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and for formal education related to advancing your career at Apple, reimbursement for certain educational expenses - including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation. Learn more about Apple Benefits.
Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.
Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.
Apple accepts applications to this posting on an ongoing basis.
$147.4k-272.1k yearly 3d ago
Staff ML Engineer - AI-Powered Observability Platform
Cisco Systems 4.8
Engineer job in San Jose, CA
A global technology company is looking for a seasoned software engineer to enhance AI capabilities within their observability platform. Candidates should have a strong background in AI/ML systems, cloud computing, and robust technical leadership. This role is pivotal in driving innovation in data analysis and delivering scalable solutions. The ideal candidate will thrive in an agile environment and provide mentorship to junior engineers. Enjoy competitive salaries and benefits while contributing to impactful technology solutions.
$151k-191k yearly est. 4d ago
Staff ML Engineer, Compute Platform - Scale & GPU
General Motors 4.6
Engineer job in Sunnyvale, CA
An automotive giant is seeking a Staff ML Engineer for their ML Compute Platform to scale backend services and contribute to AI infrastructure. Responsibilities include designing software components, improving system efficiency, and leading initiatives. Candidates should have 7+ years of experience and expertise in languages like Go, C++, or Python, as well as a solid background in distributed systems. Join a team that's transforming mobility and tackling complex engineering challenges with AI applications.
$117k-142k yearly est. 1d ago
Integration Engineer, Hands
1X Technologies AS
Engineer job in Palo Alto, CA
1X builds safe humanoid robots that work alongside people at home and in factories. We ship fast, test hard, iterate until it works.
The job: Own full integration of dexterous hands into the robot. Make every hand move perfectly, sense accurately, hit production yield. Hands-on with mech, EE, controls, AI.
You'll do:
Plug hands into control stack + validate full behaviors
Build calibration + end-of-line tests
Integrate position/torque/tactile sensors into real-time loops
Act as the technical focal point for the hand
Close the loop between design intent, control behavior, and manufacturability
Write diagnostics + fallback logic
Debug mech/EE/SW with scopes, logs, data
Tune high-BW controllers + run sys-ID
Define test stations + x-functional requirements
Script auto-rigs, dashboards, pipelines
Write docs + mentor
You need:
6+ yrs shipping robotic actuators/manipulators
C++ / Python (embedded + test)
Sensor fusion + closed-loop tuning
Mech, kinematics, controls, sensors
Linux, lab gear, debug
BS/MS ME/EE/Robotics
Cross-team driver
Nice to have:
Built robotic hands or multi-DOF
EtherCAT/CAN/SPI/I2C
MATLAB/Simulink
CAD collab
Benefits & Compensation
Comprehensive health, dental, and vision insurance
401(k) with company match
Paid time off and holidays
Pay: $160k-$240k + equity + benefits
Equal Opportunity Employer
1X is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, ancestry, citizenship, age, marital status, medical condition, genetic information, disability, military or veteran status, or any other characteristic protected under applicable federal, state, or local law.
$110k-152k yearly est. 4d ago
Staff ML Engineer - Grid AI & Planning Lead
X Development, LLC
Engineer job in Mountain View, CA
A leading technology company is seeking a Staff Machine Learning Engineer based in Mountain View, CA. In this pivotal role, you will develop and deploy advanced machine learning models to address challenges in today's complex electric grid. You will lead model building efforts, mentor junior engineers, and collaborate on cross-functional projects. Ideal candidates have over 10 years of experience in machine learning, strong Python skills, and familiarity with ML frameworks like PyTorch and TensorFlow. Join us and make a significant impact on the energy future.
$98k-169k yearly est. 1d ago
Staff ML Engineer - Lead Global ML Initiatives
Minimal
Engineer job in Palo Alto, CA
A leading technology firm in California is looking for an experienced Staff Machine Learning Engineer. In this role, you will drive the technical direction of machine learning technology, design and build impactful solutions, and lead cross-team initiatives. The ideal candidate has extensive experience in machine learning and is skilled in collaboration and mentorship. This position offers competitive compensation with equity options and emphasizes a diverse and inclusive workplace.
$98k-169k yearly est. 1d ago
Distributed ML Infrastructure Engineer
Institute of Foundation Models
Engineer job in Sunnyvale, CA
A leading research lab in Sunnyvale is seeking a distributed ML infrastructure engineer to extend and scale training systems. The ideal candidate must have over 5 years of experience in ML systems with strong expertise in distributed training frameworks like DeepSpeed and FSDP. This role offers a competitive salary ranging from $150,000 to $450,000 annually along with comprehensive benefits and amenities.
$114k-174k yearly est. 1d ago
Tech Lead Platform Validation Engineer - GPU Server
D-Matrix
Engineer job in Santa Clara, CA
A technology company is seeking a Platform Validation Engineer to ensure the reliability and performance of their GPU servers. This role involves developing test plans, debugging hardware and software interactions, and automating validation workflows. Candidates should have a Bachelor's or Master's degree in EE or CS with over 5 years of experience in GPU server platform validation. The position offers a hybrid work model at the Santa Clara headquarters.
$89k-123k yearly est. 3d ago
Cellular and Wireless Integration Engineer
Rivian 4.1
Engineer job in Palo Alto, CA
About Us
Rivian and Volkswagen Group Technologies is a joint venture between two industry leaders with a clear vision for automotive's next chapter. From operating systems to zonal controllers to cloud and connectivity solutions, we're addressing the challenges of electric vehicles through technology that will set the standards for software-defined vehicles around the world.
The road to the future is uncharted. By combining our expertise across connectivity, AI, security and more, we'll map a new way forward. Working together, we'll create a future that's more connected, more intelligent, more sustainable for everyone.
Role Summary
As an engineer focusing on cellular and wireless integration for the Connected Systems Integration Team at R|V Tech, you will work alongside software developers, systems integrators, and system engineers to support features such as cellular, WiFi, Ethernet and GPS domains along with telematics, mobile app integration, and other Internet-dependent vehicle features, with a primary focus on cellular systems integration and bring-up.
The cellular and wireless integration engineer will execute tests and integration of connectivity features and will also contribute to automation of these tests (in Python). The cellular and wireless integration engineer may also support other vehicle development activities which require connectivity support. These development activities may include drives in pre-production vehicles and will provide exposure to vehicle networks, cloud connectivity, and fleet management. The connectivity integration engineer may support time-critical test events and may be asked to support special investigations and projects.
Responsibilities
Your typical day looks like this:
Attend meetings with development teams to understand new system designs and to align on test plans and expectations.
Create design documentation and test plans for a new cellular feature implementation.
Perform cellular testing using a specialized test chamber and test equipment; work with test equipment vendors to ensure compatibility and system integrity for automation.
Perform complementary cellular testing on-vehicle using a prototype or fleet vehicle.
Analyze logs from internal and external bugs to assess root cause of failure and create new work scope for dev teams.
Core capabilities and behaviors should include:
Strong fundamentals for test execution and documentation.
The ability to correlate meaningful feedback from disparate data sets.
Keen observation skills of wireless (WiFi, 5G, LTE, GPS, Bluetooth) behavior and performance.
Deep understanding of telematics, end-to-end data routing and validation of data fidelity and frequency over different data links.
Understanding of middleware and layered/encapsulated network schemes.
Native Linux proficiency (command-line, tools, bring-up); embedded Linux a plus.
Ability to identify and distinguish issues in hardware and/or software.
Engage in discussions to test, bug fix, and optimize feature development based on data.
Qualifications
Bachelor's degree in either Computer Science, Computer Engineering, Electrical Engineering, Software Engineering, or Information Systems.
Good understanding of the 5G NR and LTE protocol stacks.
A working understanding of the OSI model and network protocols.
Basic proficiency with scripting languages including Python and/or Shell.
Basic proficiency with database manipulation and data presentation.
Triage and diagnosis of complex connectivity issues to include root cause analysis, stability, and interoperability problems.
A mindset geared towards collaboration and forward progress.
Ability to thrive under pressure and time constraints.
Availability to travel and/or support activities during off-peak hours as-needed.
Pay Disclosure
Salary Range/Hourly Rate for Palo Alto, California Based Applicants: $117,000 - $149,000 USD (actual compensation will be determined based on experience, location, and other factors permitted by law).
Benefits Summary: Rivian and Volkswagen Group Technologies provides robust medical/Rx, dental and vision insurance packages for full-time and part-time employees, their spouse or domestic partner, and children up to age 26. Full Time Employee coverage is effective on the first day of employment. Part-Time employee coverage is effective the first of the month following 90 days of employment.
Equal Opportunity
Rivian and Volkswagen Group Technologies is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, sex, sexual orientation, gender, gender expression, gender identity, genetic information or characteristics, physical or mental disability, marital/domestic partner status, age, military/veteran status, medical condition, or any other characteristic protected by law. We are also committed to ensuring compliance with all applicable fair employment practice laws regarding citizenship and immigration status.
Rivian and Volkswagen Group Technologies is committed to ensuring that our hiring process is accessible for persons with disabilities. If you have a disability or limitation, such as those covered by the Americans with Disabilities Act, that requires accommodations to assist you in the search and application process, please email us at candidateaccommodations@rivian.com.
Candidate Data Privacy
Rivian and VW Group Technologies (“Rivian and Volkswagen Group Technologies”) may collect, use and disclose your personal information or personal data (within the meaning of the applicable data protection laws) when you apply for employment and/or participate in our recruitment processes (“Candidate Personal Data”). This data includes contact, demographic, communications, educational, professional, employment, social media/website, network/device, recruiting system usage/interaction, security and preference information. Rivian and VW Group Technologies may use your Candidate Personal Data for the purposes of (i) tracking interactions with our recruiting system; (ii) carrying out, analyzing and improving our application and recruitment process, including assessing you and your application and conducting employment, background and reference checks; (iii) establishing an employment relationship or entering into an employment contract with you; (iv) complying with our legal, regulatory and corporate governance obligations; (v) recordkeeping; (vi) ensuring network and information security and preventing fraud; and (vii) as otherwise required or permitted by applicable law.
Rivian and Volkswagen Group Technologies may share your Candidate Personal Data with (i) internal personnel who have a need to know such information in order to perform their duties, including individuals on our People Team, Finance, Legal, and the team(s) with the position(s) for which you are applying; (ii) Rivian and Volkswagen Group Technologies affiliates; and (iii) Rivian and Volkswagen Group Technologies' service providers, including providers of background checks, staffing services, and cloud services.
Rivian and Volkswagen Group Technologies may transfer or store internationally your Candidate Personal Data, including to or in the United States, Canada, and the European Union and in the cloud, and this data may be subject to the laws and accessible to the courts, law enforcement and national security authorities of such jurisdictions.
Please see our Candidate Data Privacy Notice (English) and Candidate Data Privacy Notice (Serbian) for more information.
Please note that we are currently not accepting applications from third party application services.
$117k-149k yearly 4d ago
Lead ML Engineer - Ads Identity & Conversion
Pinterest 4.6
Engineer job in Palo Alto, CA
A leading social media platform in Palo Alto is looking for a Technical Lead Manager in Ads Conversion Modeling. This role includes leading the User Match Prediction roadmap, developing conversion models, and partnering with various teams to drive performance. Ideal candidates have a strong software engineering background, machine learning knowledge, and 6+ years of relevant experience. The position offers a competitive salary and hybrid work flexibility.
$163k-210k yearly est. 4d ago
Senior AI Engineer - Photobook AI & Cloud ML Lead
Ring Inc. 4.5
Engineer job in San Mateo, CA
A Silicon Valley-based tech company is seeking a skilled AI Engineer to lead the maintenance and enhancement of AI solutions. The role involves optimizing AI systems, developing models, and managing APIs, predominantly using AWS services. Ideal candidates should have a strong background in AI development, proficiency in Python, and at least 6 years of experience. The position offers a competitive salary of $180,000 - $200,000 with comprehensive benefits and growth opportunities.
$180k-200k yearly 18h ago
SRE Cybersecurity Engineer - Scale Systems (Equity)
Pantera Capital
Engineer job in Palo Alto, CA
A tech-focused financial services firm in California is seeking a Cybersecurity/SRE professional to secure and maintain the reliability of its infrastructure. Responsibilities include building secure applications on AWS, managing identities, and strengthening Kubernetes security. The ideal candidate has expertise in Python, Terraform, and large distributed systems, and holds a proactive, problem-solving mindset. Competitive salary and comprehensive benefits included.
$86k-120k yearly est. 18h ago
Machine Learning Systems Engineer
Menlo Ventures
Engineer job in Berkeley, CA
Who We Are
At RelationalAI, we are building the future of intelligent data systems through our cloud-native relational knowledge graph management system, a platform designed for learning, reasoning, and prediction.
We are a remote-first, globally distributed team with colleagues across six continents. From day one, we've embraced asynchronous collaboration and flexible schedules, recognizing that innovation doesn't follow a 9-to-5.
We are committed to an open, transparent, and inclusive workplace. We value the unique backgrounds of every team member and believe in fostering a culture of respect, curiosity, and innovation. We support each other's growth and success, and take the well‑being of our colleagues seriously. We encourage everyone to find a healthy balance that affords them a productive, happy life, wherever they choose to live.
We bring together engineers who love building core infrastructure, obsess over developer experience, and want to make complex systems scalable, observable, and reliable.
Machine Learning Systems Engineer
Location: Remote (San Francisco Bay Area / North America / South America)
Experience Level: 3+ years of experience in machine learning engineering or research
About ScalarLM
This role will involve heavily working with the ScalarLM framework and team.
ScalarLM unifies vLLM, Megatron-LM, and HuggingFace for fast LLM training, inference, and self‑improving agents, all via an OpenAI‑compatible interface. ScalarLM builds on top of the vLLM inference engine, the Megatron‑LM training framework, and the HuggingFace model hub. It unifies the capabilities of these tools into a single platform, enabling users to easily perform LLM inference and training, and build higher‑level applications such as agents with a twist: they can teach themselves new abilities via backpropagation.
ScalarLM is inspired by the work of Seymour Roger Cray (September 28, 1925 - October 5, 1996), an American electrical engineer and supercomputer architect who designed a series of computers that were the fastest in the world for decades, and founded Cray Research, which built many of these machines. Called "the father of supercomputing", Cray has been credited with creating the supercomputer industry.
It is a fully open source project (CC‑0 Licensed) focused on democratizing access to cutting‑edge LLM infrastructure that combines training and inference in a unified platform, enabling the development of self‑improving AI agents similar to DeepSeek R1.
ScalarLM is supported and maintained by TensorWave in addition to RelationalAI.
The Role
As a Machine Learning Engineer, you will contribute directly to our machine learning infrastructure, to the ScalarLM open source codebase, and build large‑scale language model applications on top of it. You'll operate at the intersection of high-performance computing, distributed systems, and cutting‑edge machine learning research, developing the fundamental infrastructure that enables researchers and organizations worldwide to train and deploy large language models at scale.
This is an opportunity to take on technically demanding projects, contribute to foundational systems, and help shape the next generation of intelligent computing.
You Will
Contribute code and performance improvements to the open source project.
Develop and optimize distributed training algorithms for large language models.
Implement high‑performance inference engines and optimization techniques.
Work on integration between vLLM, Megatron‑LM, and HuggingFace ecosystems.
Build tools for seamless model training, fine‑tuning, and deployment.
Optimize performance of advanced GPU architectures.
Collaborate with the open source community on feature development and bug fixes.
Research and implement new techniques for self‑improving AI agents.
Who You Are
Technical Skills
Programming Languages: Proficiency in both C/C++ and Python
High Performance Computing: Deep understanding of HPC concepts, including:
MPI (Message Passing Interface) programming and optimization
Bulk Synchronous Parallel (BSP) computing models
Multi‑GPU and multi‑node distributed computing
CUDA/ROCm programming experience preferred
Machine Learning Foundations:
Solid understanding of gradient descent and backpropagation algorithms
Experience with transformer architectures and the ability to explain their mechanics
Knowledge of deep learning training and its applications
Understanding of distributed training techniques (data parallelism, model parallelism, pipeline parallelism, large batch training, optimization)
Research and Development
Publications: Experience with machine learning research and publications preferred
Research Skills: Ability to read, understand, and implement techniques from recent ML research papers
Open Source: Demonstrated commitment to open source development and community collaboration
Experience
3+ years of experience in machine learning engineering or research.
Experience with large-scale distributed training frameworks (Megatron‑LM, DeepSpeed, FairScale, etc.).
Familiarity with inference optimization frameworks (vLLM, TensorRT, etc.).
Experience with containerization (Docker, Kubernetes) and cluster management.
Background in systems programming and performance optimization.
Bonus points if:
PhD or MS in Computer Science, Computer Engineering, Machine Learning, or related field.
Experience with SLURM, Kubernetes, or other cluster orchestration systems.
Knowledge of mixed precision training, data parallel training, and scaling laws.
Experience with transformer architectures, PyTorch, and decoding algorithms.
Familiarity with high performance GPU programming ecosystem.
Previous contributions to major open source ML projects.
Experience with MLOps and model deployment at scale.
Understanding of modern attention mechanisms (multi‑head attention, grouped query attention, etc.).
Why RelationalAI
RelationalAI is committed to an open, transparent, and inclusive workplace. We value the unique backgrounds of our team. We are driven by curiosity, value innovation, and help each other to succeed and to grow. We take the well‑being of our colleagues seriously, and offer flexible working hours so each individual can find a healthy balance that affords them a productive, happy life wherever they choose to live.
🌎 Global Benefits at RelationalAI
At RelationalAI, we believe that people do their best work when they feel supported, empowered, and balanced. Our benefits prioritize well‑being, flexibility, and growth, ensuring you have the resources to thrive both professionally and personally.
We are all owners in the company and reward you with a competitive salary and equity.
Work from anywhere in the world.
Comprehensive benefits coverage, including global mental health support
Open PTO - Take the time you need, when you need it.
Company Holidays, Your Regional Holidays, and RAI Holidays, where we take one Monday off each month, followed by a week without recurring meetings, giving you the time and space to recharge.
Paid parental leave - Supporting new parents as they grow their families.
We invest in your learning & development
Regular team offsites and global events - Building strong connections while working remotely through team offsites and global events that bring everyone together.
A culture of transparency & knowledge‑sharing - Open communication through team standups, fireside chats, and open meetings.
Country Hiring Guidelines
RelationalAI hires around the world. All of our roles are remote; however, some locations might carry specific eligibility requirements.
Because of this, understanding location & visa support helps us better prepare to onboard our colleagues.
Our People Operations team can help answer any questions about location after starting the recruitment process.
Privacy Policy
EU residents applying for positions at RelationalAI can see our Privacy Policy here.
California residents applying for positions at RelationalAI can see our Privacy Policy here.
Equal Opportunity
RelationalAI is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, color, gender identity or expression, marital status, national origin, disability, protected veteran status, race, religion, pregnancy, sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances.
$86k-121k yearly est. 1d ago
Staff ML Engineer - Open-Domain QA & LLMs (RL)
Apple Inc. 4.8
Engineer job in Santa Clara, CA
A leading technology company based in Santa Clara, California, is seeking a Staff Machine Learning Engineer to enhance AI features across Apple products. The ideal candidate will have extensive experience in machine learning, particularly in reinforcement learning and large language model post-training. This role promises a competitive salary, stock options, and a comprehensive benefits package including medical, dental, and education reimbursement.
$167k-219k yearly est. 3d ago
Senior ML Engineer - GenAI & RAG Platforms
Cisco Systems 4.8
Engineer job in Palo Alto, CA
A global technology leader is seeking an experienced engineer to implement GenAI services and APIs enhancing Splunk products. Candidates should have over 5 years of backend systems experience, excel in Python or TypeScript, and possess knowledge of LLM APIs. This role offers competitive pay and a variety of benefits including medical insurance, retirement plans, and generous paid time off, with a salary range of $181,000 to $235,000 annually based in Palo Alto, California.
$181k-235k yearly 2d ago
Principal Enterprise IT Engineer
1X Technologies
Engineer job in Palo Alto, CA
Principal Enterprise IT Engineer, IT & Security
About 1X: We're an AI and robotics company based in Palo Alto, California, on a mission to build a truly abundant society through general-purpose robots capable of performing any kind of work autonomously. We believe that to truly understand the world and grow in intelligence, humanoid robots must live and learn alongside us. That's why we're focused on developing friendly home robots designed to integrate seamlessly into everyday life. We're looking for curious, driven, and passionate people who want to help shape the future of robotics and AI. If this mission excites you, we'd be thrilled to hear from you and explore how you might contribute to our journey.
Role Overview
The Principal Enterprise IT Engineer will lead the strategy, architecture, and implementation of enterprise IT systems across the company. This role will define standards for identity, endpoint management, collaboration, and security while scaling IT infrastructure to support rapid organizational growth. You'll play a key leadership role, mentoring senior engineers and influencing cross-functional and executive stakeholders to align IT operations with strategic business needs.
Responsibilities
Define and drive enterprise IT strategy, architecture, and roadmaps across identity, collaboration, and device platforms
Lead administration and scaling of Google Workspace, Okta, Intune, and MDM platforms with a focus on Zero Trust principles
Develop and implement automation frameworks and scripting (Bash, Python, PowerShell) to streamline IT operations
Align IT systems with compliance standards (e.g., SOC2, ISO 27001) and proactively mitigate enterprise risks
Ensure seamless integration of IT systems with engineering, manufacturing, and robotics environments
Act as senior escalation point for IT operations, mentoring IT engineers and building a high-performance function
Influence executive and cross-functional stakeholders to ensure IT strategy supports business growth
Requirements
Expert-level knowledge of Google Workspace, Okta, Microsoft Intune, and MDM platforms across multiple operating systems (macOS, Windows, iOS, Android)
Strong scripting and automation skills (Bash, Python, PowerShell); experience implementing Zero Trust security
Proven experience scaling IT systems globally in high-growth, cloud-first or hybrid environments
Ability to lead IT architecture initiatives and partner with executive and security leadership
Experience mentoring senior IT engineers and leading high-performance teams
Preferred: Familiarity with Terraform, Ansible, and IT support for robotics or engineering-heavy environments
Preferred: Certifications such as CISSP, Okta Certified Architect, Google Workspace Admin, or Microsoft Enterprise Mobility
Benefits & Compensation
Salary: $180,000 - $235,000
Health, dental, and vision insurance
401(k) with company match
Paid time off and holidays
Equal Opportunity Employer
1X is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, ancestry, citizenship, age, marital status, medical condition, genetic information, disability, military or veteran status, or any other characteristic protected under applicable federal, state, or local law.
$180k-235k yearly 3d ago
Machine Learning Infrastructure Engineer
Institute of Foundation Models
Engineer job in Sunnyvale, CA
About the Institute of Foundation Models
We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.
As part of our team, you'll have the opportunity to work on the core of cutting‑edge foundation model training, alongside world‑class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Strategic and innovative problem‑solving skills will be instrumental in establishing MBZUAI as a global hub for high‑performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.
The Role
We're looking for a distributed ML infrastructure engineer to help extend and scale our training systems. You'll work side‑by‑side with world‑class researchers and engineers to:
Extend distributed training frameworks (e.g., DeepSpeed, FSDP, FairScale, Horovod)
Implement distributed optimizers from mathematical specs
Build robust config + launch systems across multi‑node, multi‑GPU clusters
Own experiment tracking, metrics logging, and job monitoring for external visibility
Improve training system reliability, maintainability, and performance
While much of the work will support large‑scale pre‑training, pre‑training experience is not required. Strong infrastructure and systems experience is what we value most.
Key Responsibilities
Distributed Framework Ownership - Extend or modify training frameworks (e.g., DeepSpeed, FSDP) to support new use cases and architectures.
Optimizer Implementation - Translate mathematical optimizer specs into distributed implementations.
Launch Config & Debugging - Create and debug multi‑node launch scripts with flexible batch sizes, parallelism strategies, and hardware targets.
Metrics & Monitoring - Build systems for experiment tracking, job monitoring, and logging usable by collaborators and researchers.
Infra Engineering - Write production‑quality code and tests for ML infra in PyTorch or JAX; ensure reliability and maintainability at scale.
Qualifications
Must-Haves:
5+ years of experience in ML systems, infra, or distributed training
Experience modifying distributed ML frameworks (e.g., DeepSpeed, FSDP, FairScale, Horovod)
Strong software engineering fundamentals (Python, systems design, testing)
Proven multi‑node experience (e.g., Slurm, Kubernetes, Ray) and debugging skills (e.g., NCCL/Gloo)
Ability to implement algorithms across GPUs/nodes based on mathematical specs
Experience working on an ML platform, infrastructure, or distributed inference optimization team
Experience with large‑scale machine learning workloads (strong ML fundamentals)
Nice-to-Haves:
Exposure to mixed‑precision training (e.g., bf16, fp8) with accuracy validation
Familiarity with performance profiling, kernel fusion, or memory optimization
Open‑source contributions or published research (MLSys, ICML, NeurIPS)
CUDA or Triton kernel experience
Experience with large‑scale pre‑training
Experience building custom training pipelines at scale and modifying them for custom needs
Deep familiarity with training infrastructure and performance tuning
$150,000 - $450,000 a year
Benefits
Comprehensive medical, dental, and vision
401(k) program
Generous PTO, sick leave, and holidays
Paid parental leave and family‑friendly benefits
On‑site amenities and perks: Complimentary lunch, gym access, and a short walk to the Sunnyvale Caltrain station
$114k-174k yearly est. 1d ago
Siri Runtime ML Engineer - Systems & Interaction
Apple Inc. 4.8
Engineer job in Cupertino, CA
A leading technology company is seeking a Machine Learning Engineer to contribute to the development of Siri. You will work on designing and optimizing machine learning algorithms and collaborate cross-functionally at Apple. The ideal candidate will have strong programming skills and experience in machine learning. Salary base range is $126,800 to $220,900, with additional benefits including stock options and comprehensive healthcare.
$126.8k-220.9k yearly 18h ago
Senior NLP & Language AI Engineer
Cisco Systems 4.8
Engineer job in San Jose, CA
A leading tech firm is seeking a recent graduate or final-year PhD/Master's student for a role focused on advancing NLP research within their Language AI features. Responsibilities include conducting rigorous research, collaborating with product managers, and innovating in AI solutions. Ideal candidates will have a strong background in machine learning and natural language processing, and experience with Python. This position is located in San Jose, California, and offers a competitive salary package.
$138k-175k yearly est. 1d ago
Principal Enterprise IT Engineer - Zero-Trust & Automation
1X Technologies
Engineer job in Palo Alto, CA
A robotics and AI company in Palo Alto is seeking a Principal Enterprise IT Engineer responsible for leading IT strategy and architecture across the organization. The ideal candidate will have expert knowledge of Google Workspace and Okta, strong scripting skills, and experience scaling IT systems in high-growth environments. This role offers a competitive salary of $180,000 - $235,000 along with comprehensive health benefits and 401(k) matching.
The average engineer in Union City, CA earns between $67,000 and $138,000 annually. This compares to the national average engineer range of $65,000 to $130,000.