Infrastructure engineer jobs in Lafayette, CA

- 1,994 jobs

Surgery Center Administrator
Midland-Marvel Recruiters, LLC
Infrastructure engineer job in San Francisco, CA
Surgery Center Administrator needed for growing Ambulatory Surgery Center! Bonus Incentives and Full Relocation! Surgery Center in which physicians have an ownership or investment interest. This state-of-the-art, multispecialty facility performs procedures in: Orthopedics, Pain Management and Plastic Surgery. 2 Operating Rooms and 1 Treatment Room. Ideal candidate for this role will either have Ambulatory Surgery Center (ASC) leadership experience or hospital surgical leadership experience with ASC exposure. Summary: Responsible for directing, coordinating, and controlling all aspects of the operating functions, processes, and staff of the facility while demonstrating the primary goal of efficiently providing surgical services that exceed customer expectations and improve clinical and financial operations. Qualifications Bachelor's degree or equivalent work experience. Minimum three years of experience in a top administrative or management position either in the ASC or hospital surgical space. Ability to work well with physicians, employees, patients, and others.
$75k-126k yearly est. 3d ago
System Administrator
MacHaon Diagnostics
Infrastructure engineer job in Berkeley, CA
Machaon Diagnostics is a clinical reference laboratory and contract research organization (CRO) that focuses on diagnosing, treating, and monitoring hemostatic and thrombotic conditions, complement-mediated disorders, and rare genetic diseases. Our mission is to save more lives with lab tests. Originating from a collaboration of four laboratory scientists, the team now includes clinicians, scientists, consultants, and technologists with more than 400 years of collective expertise. We provide esoteric and routine testing services to a broad clientele, including community hospitals, university medical centers, clinics, commercial laboratories, and research facilities, as well as biotechnology, pharmaceutical, and medical device companies. Our primary goal is to deliver high-quality testing with industry-leading speed. Role Description This is a full-time, on-site role located in Berkeley, CA for a Patient Advocate and Accounts Receivable Specialist. The responsibilities include managing patient accounts, advocating for patient needs, handling customer service inquiries, and managing cases. Additionally, the role involves critical thinking and effective communication in a medical setting. The specialist will work closely with patients, healthcare providers, and internal teams to ensure the best possible outcomes for those requiring our diagnostic services. Administer and maintain Google Workspace, Microsoft 365, Active Directory, MDM platforms, and enterprise SaaS applications. Manage and maintain server hardware, storage solutions, and network equipment (routers, switches, firewalls). Manage user lifecycle processes, including provisioning, de-provisioning, and access control. Support and secure endpoints across macOS and Windows environments.
Support processes and systems for asset inventory and management for hardware, software, and subscription services Support the onboarding process of new employees to include system setup, adding accounts to the AD infrastructure, and shipping computers and peripherals to employees Support IT projects to completion with direction from the Director of Laboratory Information Systems Supports issuing new computer hardware and the disposition of end-of-life equipment Supports IT requirements through direct employee and guest support for remote and on-site staff Perform other related duties as required and identified in goals set by the Director of Laboratory Information Systems Qualifications: Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent practical experience. Minimum 3 years of experience in system administration, network administration or related field, ideally within a healthcare or similarly regulated environment. Demonstrated competence with Microsoft 365 / Entra ID (Azure AD), Active Directory, and MDM solutions Familiarity with Google Workspace Strong troubleshooting and problem-solving skills. A+/Network+/Security+ Certification is preferred · Demonstrate a high degree of integrity, enthusiasm, and initiative daily. Constant adherence to HIPAA compliance and patient confidentiality requirements Please send a cover letter and resume to the Human Resources Director ******************************
$80k-112k yearly est. 5d ago
Energy Marshall, Data Centers
Suffolk Construction 4.7
Infrastructure engineer job in Sunnyvale, CA
Suffolk is a national enterprise that builds, innovates, and invests. We provide value across the entire project lifecycle through our core construction management services and complementary business lines in real estate investment, design, self-perform construction, and technology start-up investment (Suffolk Technologies). By integrating data, artificial intelligence, and advanced technology through our Seamless Platform, we connect design, construction, and operations to deliver smarter, more predictable results and redefine how America builds. Suffolk - America's Contractor - is a national company with more than $8 billion in annual revenue, 3,000 employees, and 17 offices, including Boston (headquarters), New York City, Miami, West Palm Beach, Tampa, Estero, Dallas, Los Angeles, San Francisco, San Diego, Las Vegas, Herndon, U.S. Virgin Islands, and other key markets. Suffolk manages some of the most complex and transformative projects in the country, serving clients across healthcare, life sciences, education, gaming, aviation, transportation, government, mission critical, and commercial sectors. Suffolk is privately held and is led by founder, chairman and CEO John Fish. Suffolk is ranked #8 on ENR's list of “Top CM-at-Risk Contractors.” For more information, visit *************** and follow Suffolk on Facebook, Twitter, LinkedIn, YouTube, and Instagram. Position: Suffolk is currently seeking an Energy Marshall to implement learning, provide consistency, and drive rigor into energy isolation and electrical safety programs. Responsibilities: Reviewing the Electrical Energization Safety Program with the electrical contractor and commissioning authority Involvement with all stored energy systems - gas, water, steam, air. Organizing and scheduling Pre-Energization meetings Confirming individuals working on energized / de-energized equipment are Qualified Work based on NFPA 70E, OSHA, or an accepted qualified electrical safety training standard. 
Delivering a project specific Electrical Safety Orientation to employees who will be working on energized or de-energized equipment Reviewing the electrician's LOTO plan and verifying it is accurate and managed properly. Reviewing electrician and vendor AHA's. Confirming receipt of the approved coordination study and all arc flash labels have been applied to the equipment. Tracking and confirming all required QA/QC is complete and documentation has been submitted. Reviewing the daily Pre-Task Plan for energization activities. Implementing adequate communication to the project team that identifies daily high-risk activities, energized equipment and spaces, barriers, and off-limit spaces. Confirming all pre-energization steps have been completed. Conducting pre-energization daily walks with the electrician and project stakeholders. Performing end-of-day walks for electrical equipment to confirm all systems are secure. Confirming adherence to the LOTO plan and isolation requirements. Confirming adequate signage and barriers are installed for electrical rooms and spaces with energized equipment. Confirming an adequate access control plan is in place for electrical rooms and spaces with energized equipment. Qualifications: BA/BS + 5 years of related experience or demonstrated equivalency of experience and/or education Able to understand the safe installation of electrical equipment and various voltages, equipment types, and AC/DC systems Knowledge of pressurized mechanical lines, compressed gas and air. Experience in construction and electrical commissioning standards and practices. Experience communicating complex technical solutions and concepts to engineers and non-engineers. Ensure audit site practices against written standards as part of assurance role. Ability to Interpret line drawings and system redundancies to ensure design of LOTO systems are 100% effective and in compliance with customer standards. 
While performing the duties of this job, the employee is regularly required to sit for long periods of time; talk or hear; perform fine motor, hand and finger skills in the use of a keyboard, telephone, or writing. The employee is frequently required to stand; walk; and reach with arms and/or hands. Specific vision abilities include close vision, distance vision, depth perception and the ability to adjust focus. The employee will spend their time in an office environment with a quiet to moderate noise level. Job site walking. Suffolk provides equal employment opportunities to all employees and applicants for employment without regard to race, color, religion, sex, sexual orientation, pregnancy or maternity, national origin, citizenship, genetic information, disability, protected veteran, gender identity, age or any other status protected by law. This policy applies to recruiting, hiring, transfers, promotions, terminations, compensation, benefits, and all other terms and conditions of employment. Suffolk will not tolerate any unlawful discrimination toward, or harassment of, applicants or employees by anyone at Suffolk, or anyone working on behalf of Suffolk.
$62k-83k yearly est. 3d ago
MEP Systems Engineer
Samara 3.4
Infrastructure engineer job in Redwood City, CA
Ready to play a key role in building the future of living? Join Samara in tackling California's housing shortage and enabling people to attain sustainable housing without compromising design or quality. Our flagship product, Backyard, is a fully turnkey, premium accessory dwelling unit (ADU) designed for homeowners and real estate developers. As we expand our offerings and scale our in-house development initiatives, we're at a pivotal moment, redefining homeownership through high-quality, attainable infill housing. Backed by top-tier investors, including Airbnb, Thrive Capital, and 8VC, Samara is positioned for significant growth and market impact. To support our next phase of growth, we're hiring product-focused engineers to advance and scale the technical foundation of our modular system. These roles go beyond traditional design work: they refine system standards, improve factory repeatability, and ensure our units are code-compliant, manufacturable, and built to the highest standards of quality and performance. The MEP Systems Engineer will be responsible for the detailed design and implementation of mechanical, electrical, plumbing, and PV systems tailored for modular construction building systems. This role requires a deep understanding of MEP systems combined with practical experience in modular construction. You will collaborate closely with leadership and cross-functional design and engineering teams to integrate all technical and user experience requirements into our designs to ensure optimal functionality, sustainability, and compliance with all regulations.
What You'll Do Design and develop integrated MEP systems for our new and existing designs including solar energy systems, including PV and ESS, optimized for prefabricated modular construction Ensure that solar and energy storage designs align with overall MEP system functionality and building energy requirements Lead the creation of comprehensive design documents, schematics, component material selections and system layouts, preferably using CAD and BIM software Provide technical leadership during the installation and commissioning phases to ensure systems meet design specifications and performance standards Conduct system testing and validation to ensure functionality, efficiency, and safety of both MEP and PV installations Collaborate closely with installation teams to facilitate seamless and efficient factory and onsite implementation of design Engage in research and application of the latest technologies and practices in renewable energy and modular construction Work with program managers and other engineering disciplines to ensure holistic integration of all systems within Samara modular units What We're Looking For Modular construction experience in factory builds, multi-mod, stackable and/or other hands on related experience. 
Licensed Electrician or Mechanical Contractor -and/or- Bachelor's degree in Mechanical, Electrical, or Energy Systems Engineering, or a related field Professional Engineering (PE) license preferred Minimum of 7 years of experience in one of the following: Mechanical, Electrical, Solar and/or Plumbing System design Comprehensive knowledge of building codes, safety regulations, and sustainability practices relevant to MEP and renewable energy systems Proficiency in design software such as Onshape, Revit, and/or other BIM methodologies preferred Excellent problem-solving skills and the ability to adapt designs to changing technological and regulatory landscapes Strong communication and leadership skills, capable of driving project decisions and managing complex stakeholder relationships Ability to travel to our factory in Mexico up to 25-40%. What We Offer Salary range of $120-160K and performance-based bonuses. Hybrid work schedule with 3 days each week in our Redwood City office. Snacks and Lunch on in-office days Early stage employee equity. Exceptional health, dental, and vision insurance. 401k eligibility after 6 months. Flexible PTO policy. How to Apply If you're excited to support Samara's mission and have the skills to match, we'd love to hear from you. Please submit your resume and a brief letter of introduction to our team. Let's build something extraordinary-together.
$120k-160k yearly 2d ago
Firmware Infrastructure Engineer - GPU
Nvidia 4.9
Infrastructure engineer job in Santa Clara, CA
Do you enjoy hacking and tinkering at the lowest levels of software? Are you capable of crafting and implementing creative secure solutions in heavily resource-constrained environments? If so, you're primed to join our team in developing the tools and infrastructure that support the software and firmware that boot the world's best GPUs. We are searching for an outstanding software engineer to fill a challenging, yet fun role on our GPU Firmware team. You will be joining a team whose primary mission is designing and implementing world-class automation, tools, and practices to improve, unlock and accelerate the innovations of firmware that defines the cutting edge of GPU technology. In the world of firmware, where tight requirements of security, boot-time, storage space act as constraints to all solutions: Every. Byte. Counts. This is your chance to create waves in the industry while directly working with and alongside some of the most top-valued diverse set of minds in the business. Your goal will be to shape the future of GPU technologies doing exactly what you enjoy: solving puzzles. If this sounds exciting and you're up for the task, we'd certainly like to hear from you! 
What you'll be doing: * Improve team software process and core infrastructure by enhancing and designing tooling, build systems, and regression farms * Design, develop, test, debug, and optimize creative solutions for GPU firmware throughout the entire GPU lifecycle * Work closely with hardware, software, infrastructure, and business teams to transform new infrastructure features from idea to reality * Work with leading OS and PC vendors to improve and innovate on the startup experience * Create, document, and automate workflows, processes, and tooling for internal-facing and external-facing team projects What we need to see: * BS or MS degree in EE/CS/CE (or equivalent experience) * 4+ yrs experience * Automation experience with modern CI/CD tools * Sturdy technical background in cloud/distributed infrastructure * In depth understanding of database concepts and object modeling. * Strong grasp of software development lifecycle and coding practices * Strong Python, C, and scripting skills * Even stronger interpersonal skills * Sense of humor highly encouraged, but not required * Easy to work with, as you'll constantly be engaged with both hardware designers and other software engineers to design, develop, and debug functional (and non-functional!) aspects of GPU subsystems and infrastructure Ways to stand out from the crowd: * Experience with static/dynamic code analysis, tooling, and build automation * Experience in developing device BIOS, firmware, or other low-level software * Familiarity with SQL language and one or more SQL database ecosystems * Experience with secure development techniques such as threat models, attack-trees, static/dynamic analysis, fuzzing, and negative testing * Passion for optimizing and unlocking the potential of yourself and others through your work NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people on the planet working for us. 
If you're creative, passionate and self-motivated, we want to hear from you! NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 148,000 USD - 235,750 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4. You will also be eligible for equity and benefits. Applications for this job will be accepted at least until December 6, 2025. NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
$140k-184k yearly est. Auto-Apply 60d+ ago
AI Infrastructure Engineer
Advanced Micro Devices, Inc. 4.9
Infrastructure engineer job in San Jose, CA
WHAT YOU DO AT AMD CHANGES EVERYTHING At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career. THE PERSON: We are seeking a DevOps / Platform Engineer to join our team building and operating large-scale GPU compute infrastructure that powers AI and ML workloads. The ideal candidate should be passionate about software engineering and possess leadership skills to independently deliver on multi-quarter projects. They should be able to communicate effectively and work optimally with their peers within our larger organization. Finally, you aren't afraid of a team in more of a startup mode at a larger company and are willing to jump in to help in areas adjacent to your main project as needed. Key Responsibilities * Build and extend platform capabilities to enable new classes of workloads (e.g., interactive development pods, CI pipelines, inference services, benchmarking jobs). * Design and operate scalable orchestration systems using Kubernetes across both on-prem and multi-cloud environments. * Develop platform features such as secret management, configuration management, and deployment automation for customers. * Partner with development teams to extend the GPU developer platform with features, APIs, templates, and self-service workflows that streamline job orchestration and environment management.
* Manage service lifecycle within Kubernetes using Helm and GitOps workflows (e.g., ArgoCD or Flux). * Apply expertise in storage and networking to design and integrate CSI drivers, persistent volumes, and network policies that enable high-performance GPU workloads. Required Qualifications * 5+ years of experience in DevOps, Platform, or Infrastructure Engineering. * Deep hands-on experience with Kubernetes and container orchestration at scale. * Proven ability to design and deliver platform features that serve internal customers or developer teams. * Experience building developer-facing platforms or internal developer portals (e.g., custom workflow tooling). Nice to Have * Hands-on experience in storage or network engineering within Kubernetes environments (e.g., CSI drivers, dynamic provisioning, CNI plugins, or network policy). * Experience with Infrastructure as Code tools like Terraform. * Background in HPC, Slurm, or GPU-based compute systems for ML/AI workloads. * Practical experience with monitoring and observability tools (Prometheus, Grafana, Loki, etc.). * Understanding of machine learning frameworks (PyTorch, vLLM, SGLang, etc.). Benefits offered are described: AMD benefits at a glance. AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.
$128k-167k yearly est. 57d ago
Virtual Memory Kernel Engineer
Apple Inc. 4.8
Infrastructure engineer job in Cupertino, CA
The Darwin Systems team within Apple's CoreOS organization is responsible for delivery of a high-quality and performant kernel for just about every one of Apple's products. Our software runs on your wrist as part of watchOS; in your pocket with iOS; on your desk in macOS; in your living room with tvOS; and now in visionOS and Apple's Cloud. These are the devices owned by your friends and family; and hundreds of millions of devices beyond those. We ensure every aspect of the kernel and other system software is top class: features, performance, stability, security… This position requires a solid understanding of operating systems fundamentals, including kernel design and implementation. The virtual memory team is in charge of page management, mechanisms such as copy on write, low-memory process killing, swap… We work with every layer of the stack: from hardware all the way up to applications and successful engineers will be able to dig deep into details and work with other engineers to solve problems, find opportunities to keep on improving our stack and design to improve our customers' experience. As Moore's law is slowing down, effective management of resources is becoming more and more important. We work closely with all product teams across Apple to provide them with a modern, efficient operating system that allows them to ship the kinds of quality products that our customers expect. Come work with us on Apple's operating systems and get a chance to influence design across the stack: from Silicon all the way up to the SDK and applications while focusing on performance and delivering value to our customers. BS/MS in Computer Science + 5 years work experience or equivalent knowledge and experience. Ability to work with teams across multiple timezones. Familiarity with Unix and associated tools. Ability to ramp up quickly on an unfamiliar code base. In-depth knowledge of kernel internals.
Highly professional, with the ability to multitask and deliver solid work on tight schedules. A demonstrated record of working on core operating system technologies, specifically around memory management in a modern kernel. Design and implementation responsibility for a major project. Demonstrated creative and critical thinking capabilities and troubleshooting skills. Familiarity with modern processor architecture (e.g. memory hierarchy, multi-core, multithreading, etc).
$131k-169k yearly est. 60d+ ago
Autonomy Engineer - Deep Learning Infrastructure
Skydio, Inc. 4.5
Infrastructure engineer job in San Mateo, CA
Skydio is the leading US drone company and the world leader in autonomous flight, the key technology for the future of drones and aerial mobility. The Skydio team combines deep expertise in artificial intelligence, best-in-class hardware and software product development, operational excellence, and customer obsession to empower a broader, more diverse audience of drone users. From utility inspectors to first responders, soldiers in battlefield scenarios and beyond. Skydio is the leading US drone company and the world leader in autonomous flight. We leverage breakthrough AI to create the world's most intelligent flying machines for use by enterprise and government. Learning a semantic and geometric understanding of the world from visual data is the core of our autonomy system. We are pushing the boundaries of what is possible with real-time deep networks to accelerate progress in intelligent mobile robots. About the role: If you are excited about leveraging massive amounts of structured video data to solve problems in Computer Vision (CV) such as object detection and tracking, optical flow estimation and segmentation, we would love to hear from you. How you'll make an impact: As a deep learning infrastructure engineer, you will be responsible for building and scaling the infrastructure that supports Skydio's DL and AI efforts. 
You will be working at the nexus of Skydio's autonomy, embedded and cloud teams to deliver new capabilities and empower the deep learning team. How you'll make an impact: * Develop solutions for high-performance deep learning inference for CV workloads that can deliver high throughput and low latency on different hardware platforms * Profile CV and Vision Language Models (VLMs) to analyze performance, identify bottlenecks and optimization opportunities and improve power efficiency of deep learning inference workloads * Design and implement end-to-end MLOps workflows for model deployment, monitoring and re-training * Utilize advanced Machine Learning knowledge to leverage training or runtime frameworks or model efficiency tools to improve system performance * Create new methods for improving training efficiency * Implement GPU kernels for custom architectures and optimized inference * Design and implement SDKs that allow customers/external developers to create autonomous workflows using ML * Leverage your expertise and best practices to uphold and improve Skydio's engineering standards What makes you a good fit: * Demonstrated hands-on experience with MLOps, ML inference optimization and edge deployment * Strong knowledge of DL fundamentals, techniques and state-of-the-art DL models/architectures * Strong fundamentals in CV, image processing and video processing * Demonstrated hands-on experience building and managing ML pipelines for solving vision or vision language tasks including data preparation, model training, model deployment and monitoring * Experience and understanding of security and compliance requirements in ML infrastructure * Experience with ML frameworks and libraries * You have demonstrated ability to take a concept and systematically drive it through the software lifecycle: architecture, development, testing, deployment, and monitoring * You are comfortable navigating and delivering within a complex codebase * Strong communication skills and the 
ability to collaborate effectively at all levels of technical depth Compensation: At Skydio, our compensation packages for regular, full-time employees include competitive base salaries, equity in the form of stock options, and comprehensive benefits packages. Compensation will vary based on factors, including skill level, proficiencies, transferable knowledge, and experience. Relocation assistance may also be provided for eligible roles. The annual base salary range for this position is $170,000 - 236,500*. Fundamentally, we believe that equity is the key to long-term financial growth, and we ensure all regular, full-time employees have the opportunity to significantly benefit from the company's success. Regular, full-time employees are eligible to enroll in the Company's group health insurance plans. Regular, full-time employees are eligible to receive the following benefits: Paid vacation time, sick leave, holiday pay and 401K savings plan. This position and all associated benefits are subject to applicable federal, state, and local laws, as well as the Company's policies and eligibility criteria. * Compensation for certain positions may vary based on the position's location. #LI-PG1 At Skydio we believe that diversity drives innovation. We have created a multidisciplinary environment that embraces the power of diverse perspectives to create elegant solutions for complex problems. We are committed to growing our network of people, programs, and resources to nurture an inclusive culture. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or other characteristics protected by federal, state or local anti-discrimination laws. For positions located in the United States of America, Skydio, Inc. uses E-Verify to confirm employment eligibility. 
To learn more about E-Verify, including your rights and responsibilities, please visit *************************
$170k-236.5k yearly Auto-Apply 3d ago
Infrastructure Engineer
Tiger 4.6
Infrastructure engineer job in San Francisco, CA
Example org is a leading software company. Example org allows real-time collaboration on important example workflows. Founded in 2012, we have over 10,000 customers worldwide and are backed by fantastic investors such as Example Capital. Example has raised its Series C and is valued at $750 million. This example role will be part of an example team and will report to the example manager. The new hire plays a critical role in various example workflows. What you'll do: Participate in example meetings. Lead example initiatives. Recruit new team members to example team. Mentor and develop existing team members. Requirements: Experience writing good example job descriptions. Other exemplary skills. 3-5 years prior experience in this role. Motivation. Great English language skills. Why you might want to work with us: We take care of you and your family with comprehensive health, vision, and dental insurance. We're serious about food. Free catered lunch every day, and a fully stocked kitchen with occasional snack appearances from our Japanese office. Healthy and not-so-healthy options are available, as are foods for those with dietary restrictions. You're excited to work on a product that will impact almost any consumer, almost anywhere. We dress casually. If you want, you can wear slippers in the office. You should see the creative collection our team has built. We believe in a culture of learning, and want to keep building our skills, experiences, and capabilities. We offer flexible work schedules. We trust our team to know how they will do their best work. We're family friendly. We want our teammates to focus on what they need to when they need to. We offer very competitive compensation, including equity in Standard, to each one of our employees.
Example org provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability, genetics, sexual orientation, gender identity, or gender expression. We are committed to a diverse and inclusive workforce and welcome people from all backgrounds, experiences, perspectives, and abilities. Do you have further questions about this role? Reach out to our talent team at *******************.
$132k-182k yearly est. Auto-Apply 60d+ ago
ML Infrastructure Engineer (Staff / Principal)
Genesis Therapeutics
Infrastructure engineer job in Burlingame, CA
About the Team We're a tight-knit team of proven drug hunters, deep learning researchers, and software engineers united by a common mission - drive AI innovation in biochemistry, discovering and developing groundbreaking therapies for patients suffering from severe disorders. The Genesis AI team is focused on developing foundation models for small molecule drug discovery by conducting fundamental research at the intersection of machine learning, physics, and computational chemistry, as well as engineering robust software systems that enable running large scale simulations and training generative and predictive AI models designed to learn from all kinds of molecular data, leveraging our cluster with 1000s of GPUs and 10,000s of CPUs. About the Role We're seeking experienced ML infrastructure engineers to join the team and lead engineering efforts focused on driving forward our ML research agenda for generative modeling of molecular systems, which is instrumental to our mission. As an engineer at Genesis, you will lead rapid iteration on our AI platform and infrastructure, unlocking the next level of performance, efficiency, and scale that was not previously possible. You will build massively distributed training and inference pipelines, core MLOps tools and frameworks, and optimize GPU operations to speed up ML models. Genesis is a highly collaborative and cross-functional environment, and you will work in close partnership with our exceptional engineers, researchers, and scientists. You Will * Lead engineering efforts focused on continuous improvement of the AI platform, focused on rapid build out and iteration on scalable and robust distributed infrastructure for ML training, inference, and evaluation. * Support model training and deployment across multiple clusters and multiple clouds, optimizing for throughput and cost. * Optimize the efficiency of ML models and other workloads in terms of latency, throughput, memory consumption, etc. (e.g., via GPU performance engineering), pushing the limits of what's possible with the current hardware.
* Contribute to the long-term vision for Genesis' ML platform. * Have the opportunity to mentor and guide more junior members of our technical team as well as research interns, fostering an environment of growth and innovation. You are * Strong engineer who constantly strives for technical excellence. You can write clean code and have a deep understanding of the codebases you work in. * Deeply experienced with distributed training and inference of large models on GPU clusters and some of the core libraries and frameworks we use: PyTorch, PyTorch Lightning, PyTorch Geometric, and Ray. * Independent thinker with a strong sense of ownership and capability of engineering robust systems from first-principles-based conceptualization to state-of-the-art realization. * Curious, problem-oriented thinker who is excited to dive deep into the emerging field at the intersection of AI, physics, chemistry, and biology and make foundational contributions and discoveries (no previous experience in anything but ML necessary). Nice to haves * Experienced with building, maintaining and debugging low-level cluster infrastructure running on multiple clouds using Kubernetes and Terraform. * Experienced GPU engineer who can quickly figure out performance bottlenecks and architect highly performant code for large scale ML workloads. * Experience with XLA, Triton, CUDA, or similar accelerator programming languages and/or deep learning compiler stacks. * Experience working with some of the following: molecular systems (protein sequences and 3D structures, small molecules, etc.), ML force fields or other physics-informed models and methods, or point cloud data in other application domains, such as 3D graphics. Compensation, Benefits, and Perks * Competitive compensation package that includes salary and equity. 
* Comprehensive health benefits: Medical, Dental, and Vision (covered 100% for the employees). * 401(k) plan. * Open (unlimited) PTO policy. * Free lunches and dinners at our offices. * Paid family leave (maternity and paternity). * Life and long- and short-term disability insurance. About Genesis Molecular AI Genesis Molecular AI is pioneering foundation models for molecular AI to unlock a new era of drug design and development. The company's generative and predictive AI platform, GEMS (Genesis Exploration of Molecular Space), integrates AI and physics into industry-leading models to generate and optimize drug molecules, including the breakthrough generative diffusion model Pearl for structure prediction. Genesis has raised over $300 million from leading AI, tech and life science-focused investors, signed multiple AI-focused research collaborations with major pharma partners, and is deploying GEMS to advance an internal therapeutics pipeline for a variety of high-impact targets. Genesis is headquartered in Burlingame, CA, with a fully integrated laboratory in San Diego. We are proud to be an inclusive workplace and an Equal Opportunity Employer.
$115k-174k yearly est. 25d ago
AI Training Infrastructure Engineer - Helix Team
Figure 4.5
Infrastructure engineer job in San Jose, CA
Figure is an AI robotics company developing autonomous general-purpose humanoid robots. The goal of the company is to ship humanoid robots with human level intelligence. Its robots are engineered to perform a variety of tasks in the home and commercial markets. Figure is headquartered in San Jose, CA. Figure's vision is to deploy autonomous humanoids at a global scale. Our Helix team is looking for an experienced Training Infrastructure Engineer, to take our infrastructure to the next level. This role is focused on managing the training cluster, implementing distributed training algorithms, data loaders, and developer tools for AI researchers. The ideal candidate has experience building tools and infrastructure for a large-scale deep learning system. Responsibilities Design, deploy, and maintain Figure's training clusters Architect and maintain scalable deep learning frameworks for training on massive robot datasets Work together with AI researchers to implement training of new model architectures at a large scale Implement distributed training and parallelization strategies to reduce model development cycles Implement tooling for data processing, model experimentation, and continuous integration Requirements Strong software engineering fundamentals Bachelor's or Master's degree in Computer Science, Robotics, Engineering, or a related field Experience with Python and PyTorch Experience managing HPC clusters for deep neural network training Minimum of 4 years of professional, full-time experience building reliable backend systems Bonus Qualifications Experience managing cloud infrastructure (AWS, Azure, GCP) Experience with job scheduling / orchestration tools (SLURM, Kubernetes, LSF, etc.) Experience with configuration management tools (Ansible, Terraform, Puppet, Chef, etc.) The US base salary range for this full-time position is between $150,000 - $350,000 annually. 
The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended.
$150k-350k yearly Auto-Apply 40d ago
Virtualization & Cloud Platform Engineer
Altera Semiconductor
Infrastructure engineer job in San Jose, CA
Job Details: Job Description: We are seeking an experienced Virtualization and Cloud Platform Engineer with a strong IT infrastructure background. The ideal candidate will have extensive experience (5+ years) managing and optimizing cloud and on-premises environments using Azure Local, Hyper-V, Windows Server OS, SQL databases, file share services, and SUSE Linux. Proficiency in scripting, monitoring solutions, and Windows SCCM administration is essential. Responsibilities: Design, implement, and maintain virtualized infrastructure leveraging Hyper-V and Azure Stack HCI (Azure Local). Manage and optimize Windows Server OS environments, ensuring system performance, reliability, and security. Administer SQL Server databases, including maintenance, backups, tuning, and troubleshooting. Configure and maintain enterprise file-sharing services, ensuring secure and efficient data access. Manage and support SUSE Linux environments within mixed OS landscapes. Develop automation scripts (PowerShell, Bash) to streamline infrastructure provisioning, deployment, monitoring, and maintenance. Implement and maintain comprehensive infrastructure monitoring solutions, proactively addressing system performance issues. Administer Windows System Center Configuration Manager (SCCM), including deployment, patch management, inventory, and reporting. Collaborate with IT teams to architect scalable, secure, and efficient solutions aligned with business requirements. Provide advanced troubleshooting and support for infrastructure-related issues, escalating when necessary. Salary Range Our compensation reflects the cost of labor within the US market. Actual salary may vary based on a number of factors including job location, job-related knowledge, skills, experiences, trainings, etc. $159.7k - $231.2k USD Qualifications: Minimum Qualifications Bachelor's Degree in Computer Science, Information Technology, or a related technical discipline, or equivalent professional experience. 
5+ years of experience in virtualization and cloud platforms, specifically with Azure Local and Hyper-V. Strong proficiency in managing Windows Server Operating Systems and related services. Hands-on experience managing SQL Server databases and file share solutions. Demonstrable experience with SUSE Linux administration in hybrid environments. Expert-level scripting skills (PowerShell, Bash, etc.) for automation and management. Proven ability to deploy and manage monitoring tools and solutions (e.g., Azure Monitor, Nagios, Zabbix, or similar). Extensive knowledge and experience with Windows SCCM for systems deployment, patching, and configuration management. Excellent analytical, troubleshooting, and problem-solving skills. Effective communication skills and ability to work collaboratively within cross-functional teams. Preferred Qualifications Relevant industry certifications (e.g., Microsoft Azure certifications, MCSE, Linux certifications). Experience with hybrid cloud environments and migrations. Knowledge of container technologies (Docker, Kubernetes) and DevOps practices. Job Type: Regular Shift: Shift 1 (United States of America) Primary Location: San Jose, California, United States Additional Locations: Posting Statement: All qualified applicants will receive consideration for employment without regard to race, color, religion, religious creed, sex, national origin, ancestry, age, physical or mental disability, medical condition, genetic information, military and veteran status, marital status, pregnancy, gender, gender expression, gender identity, sexual orientation, or any other characteristic protected by local law, regulation, or ordinance.
$159.7k-231.2k yearly Auto-Apply 60d+ ago
Staff Infrastructure Engineer
Crusoe 4.1
Infrastructure engineer job in Sunnyvale, CA
Job Description Crusoe's mission is to accelerate the abundance of energy and intelligence. We're crafting the engine that powers a world where people can create ambitiously with AI - without sacrificing scale, speed, or sustainability. Be a part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that's setting the pace for responsible, transformative cloud infrastructure. About the Role: We are seeking a Staff Software Infrastructure Engineer to play a critical role in managing Crusoe's fleet operations, focusing on foundational tools for provisioning and reprovisioning servers with a strong emphasis on Infrastructure as code. The role includes building automation tools, troubleshooting hardware, and scaling operations to support high growth. The candidate will be integral in transitioning to Kubernetes and optimizing Crusoe's infrastructure. This position offers the opportunity to work on cutting-edge technologies within a world-class team and contribute directly to the success of a rapidly growing company while making a significant impact on the global energy landscape. What You'll Be Doing: Manage and maintain day-to-day operations of Crusoe's cloud infrastructure. Develop automation tools to streamline server provisioning and reduce SLA times. Scale infrastructure to support mass deployments (80-100 servers simultaneously). Troubleshoot hardware issues, especially with GPUs, and liaise with vendors. Transition Crusoe's environment to Kubernetes and containerized workflows. What You'll Bring to the Team: Solid hardware experience and GPU troubleshooting expertise. Strong Linux background Knowledge of PXE booting and server provisioning (bare metal) Experience with BMC/IPMI, BIOS, and enterprise-grade server management. Kubernetes proficiency (admin or developer). Familiarity with containerization technologies (Docker preferred). 
Experience with version control systems (GitLab) Problem solving skills - able to analyze complex technical issues and develop effective solutions Strong communication and collaboration skills to work effectively with cross-functional teams Values: Embody the Company values Experience with MAAS (nice to have) Proficiency in Python or Golang (preferred language) (nice to have) Kubernetes administration and deployment experience (nice to have) Experience with Ansible and Terraform (nice to have) Benefits: Industry competitive pay Restricted Stock Units in a fast growing, well-funded technology company Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents Employer contributions to HSA accounts Paid Parental Leave Paid life insurance, short-term and long-term disability Teladoc 401(k) with a 100% match up to 4% of salary Generous paid time off and holiday schedule Cell phone reimbursement Tuition reimbursement Subscription to the Calm app MetLife Legal Company paid commuter benefit; $300 per month Compensation: Compensation will be paid in the range of $209,000 - $253,000 + Bonus. Restricted Stock Units are included in all offers. Compensation to be determined by the applicant's education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data. Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.
$127k-183k yearly est. 4d ago
Founding Infrastructure Engineer (Onsite)
Cadre 4.4
Infrastructure engineer job in San Francisco, CA
Founding Infrastructure Engineer Judgment Labs San Francisco, CA (Onsite) Judgment Labs is building Agent Behavior Monitoring (ABM) infrastructure - a critical reliability layer for AI agents in production. Our platform helps companies detect poor agent behavior, prevent failures before they reach customers, and continuously improve agent performance based on real-world interactions. As AI agents move into mission-critical workflows, Judgment Labs is becoming the system of record for how agents actually behave in the wild. We are tackling one of the hardest problems in AI today: correctness, reliability, and learning at scale. Funding: $30M+ raised Backers: Lightspeed and other top-tier investors Stage: Early-stage, high-growth (founded 2025) Role Overview We are hiring a Founding Infrastructure Engineer to own the cloud and infrastructure roadmap end-to-end. This is a foundational, hands-on role responsible for building and scaling the systems that power Judgment Labs' core product across cloud, enterprise, and self-hosted environments. You will design and operate high-throughput, low-latency infrastructure that supports large-scale data ingestion, querying, and secure deployments across multi-cloud and BYOC environments. This role is onsite in San Francisco and is ideal for senior engineers who want deep ownership, technical autonomy, and direct impact at an early-stage company. 
Responsibilities Own and build Judgment Labs' cloud and infrastructure stack end-to-end Design and scale high-throughput data ingestion and query systems Operate and optimize distributed databases and streaming pipelines in production Lead infrastructure for self-hosted and BYOC deployments (VPC, on-prem, enterprise security) Build multi-cloud, multi-region infrastructure with strong reliability and security guarantees Implement and manage Kubernetes-based deployments Define and maintain infrastructure-as-code using Terraform Partner closely with product, AI, and backend engineers to support rapid iteration Establish best practices for observability, performance, and operational excellence Requirements Experience 4+ years of experience as a backend, cloud, or infrastructure engineer Background at top-tier technology companies or high-performing startups Proven experience scaling high-throughput data systems (ingestion and query performance) Experience with self-hosted or BYOC deployments, including enterprise security considerations Technical Skills Strong proficiency with at least one major cloud provider (AWS, GCP, or Azure) Hands-on experience with: ClickHouse or similar analytical databases Kafka, RabbitMQ, or other distributed messaging systems Kubernetes in production environments Terraform or equivalent infrastructure-as-code tooling Education Technical undergraduate degree (Computer Science, Engineering, or related field) What We're Looking For Deeply hands-on individual contributors who enjoy building and operating systems Engineers who thrive in fast-moving, ambiguous environments Strong systems thinkers with a bias toward ownership and execution Not a Fit If Your experience is primarily management-focused with limited recent hands-on work You have only worked in slow-moving, large enterprise environments without startup exposure Team & Culture Tight-knit, senior, highly technical team based in San Francisco Extremely high talent density across AI, 
systems, and infrastructure Direct collaboration with founders and early engineers Fast-moving, low-bureaucracy environment with real ownership Why Join Now Early-stage but well-capitalized with strong investor backing Massive market tailwinds as AI agents become core to enterprise workflows Opportunity to define foundational infrastructure that every serious AI company will eventually need Meaningful ownership and impact at the ground floor of a category-defining company
$141k-202k yearly est. 4d ago
Enterprise Cloud Infrastructure Engineer
Stanford University 4.5
Infrastructure engineer job in Stanford, CA
**Business Affairs: University IT (UIT), Redwood City, California, United States** Information Technology Services Post Date Sep 05, 2025 Requisition # 107211 Build and maintain scalable, highly available, and resilient systems in the cloud and on-prem. Implement any new cloud functionality or migrate existing processes to the cloud and maintain them. Build and deploy systems utilizing Continuous Integration/Continuous Delivery framework and infrastructure automation. This is a hybrid-eligible position. **Core Responsibilities** + Build and manage IaaS, PaaS, and SaaS services (e.g., compute, storage, network, security, administration, automation, application services, and databases) in either native cloud or hybrid cloud environments. Will ensure that the infrastructure is properly optimized for performance, cost, and security. + Develop and maintain automation scripts using a high-level 'DevOps language' (such as Python) and infrastructure-as-code templates to provision and manage Oracle OCI and AWS resources using Terraform. + Engineer processes across various areas like servers, applications, and databases and monitor them to ensure everything works as it should, taking necessary steps to facilitate spikes in traffic and usage. + Take responsibility for infrastructure resource design and performance optimization, backup and recovery strategies and implementation, as well as monitoring the overall health of the environment. + Develop and maintain efficient and appropriate connectivity solutions between various campus infrastructure to ensure necessary data is available as needed. 
+ Collaborate with application development and infrastructure teams to optimize cloud-based solutions, ensuring high performance and cost-effectiveness. + Ensure monitoring coverage for applications and proactively monitor and manage the health of applications/systems; responsible and accountable for driving the root cause and ensuring all issues are addressed and resolved in a timely fashion. + Implement Best Practices for IaaS, PaaS services lifecycle **Minimum Qualifications** Education & Experience: + Bachelor's degree and eight years of relevant experience or a combination of education and relevant experience. Knowledge, Skills and Abilities: + Demonstrate Cloud Infrastructure experience including but not limited to AWS, GCP, Azure, OCI + Extensive Experience with AWS server deployment using AWS CLI, HashiCorp Terraform. + Experience in orchestrating environment deployment from OS all the way through the application layers of a solution, using tools such as Docker, Kubernetes, Jenkins, Git, Puppet and many others. + Experience with Servers, Infrastructure, Platform Sizing, Infrastructure Cost Reduction + Experience with automated deployment/ Continuous Integration/ Release + Demonstrate experience in Network Security and Network awareness principles. + Demonstrate full-stack infrastructure development as code, including but not limited to database, web server, and application server. + Good understanding & experience of Networking, Firewall, Security hardening, storage and OS software package management in Unix Environment. + Strong understanding of Cloud Technologies with focus on AWS and OCI. + Comprehensive knowledge of DevOps practices and tools. + Knowledge of infrastructure APIs on cloud platforms such as EC2/VM/Bare Metal/Containers, Storage/EBS/Volumes, Network/VPC/Subnet, Network Interfaces, etc. 
+ Extensive experience with deploying Infrastructure as Code (IaC) patterns using Terraform + Ability to script proficiently using Unix Shell and Python for automation. + Ability to work independently or as a team member and self-motivated in learning new technology. + Excellent communication, presentation and collaborative problem-solving skills. + Ability to collaborate with vendors and system service providers, provide administration for system solutions, and act as a liaison. Preferred Qualifications: + Experience working with Single Sign-On SAML, OpenID is a plus. + Experience with AWS SQS, API Gateway, OAuth 2.0 & RESTful APIs is a plus. + Experience in Oracle OCI is highly recommended. + AWS, OCI certifications preferred. The expected pay range for this position is $150,922 - $155,000 per annum. Stanford University provides pay ranges representing its good faith estimate of what the university reasonably expects to pay for a position. The pay offered to a selected candidate will be determined based on factors such as (but not limited to) the scope and responsibilities of the position, the qualifications of the selected candidate, departmental budget availability, internal equity, geographic location and external market pay for comparable jobs. At Stanford University, base pay represents only one aspect of the comprehensive rewards package. The Cardinal at Work website (*****************************************************) provides detailed information on Stanford's extensive range of benefits and rewards offered to employees. Specifics about the rewards package for this position may be discussed during the hiring process. **Why Stanford is for you:** Imagine a world without search engines or social platforms. Consider lives saved through first-ever organ transplants and research to cure illnesses. Stanford University has revolutionized the way we live and enrich the world. Supporting this mission is our diverse and dedicated 17,000 staff. 
We seek talent driven to impact the future of our legacy. Our culture and unique perks empower you with: + **Freedom to grow.** We offer career development programs, tuition reimbursement, or audit a course. Join a TedTalk, film screening, or listen to a renowned author or global leader speak. + **A caring culture.** We provide superb retirement plans, generous time-off, and family care resources. + **A healthier you.** Climb our rock wall, or choose from hundreds of health or fitness classes at our world-class exercise facilities. We also provide excellent health care benefits. + **Discovery and fun.** Stroll through historic sculptures, trails, and museums. + **Enviable resources.** Enjoy free commuter programs, ridesharing incentives, discounts and more! The job duties listed are typical examples of work performed by positions in this job classification and are not designed to contain or be interpreted as a comprehensive inventory of all duties, tasks, and responsibilities. Specific duties and responsibilities may vary depending on department or program needs without changing the general nature and scope of the job or level of responsibility. Employees may also perform other duties as assigned. Consistent with its obligations under the law, the University will provide reasonable accommodations to applicants and employees with disabilities. Applicants requiring a reasonable accommodation for any part of the application or hiring process should contact Stanford University Human Resources by submitting a contact form. Stanford is an equal employment opportunity and affirmative action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other characteristic protected by law. 
Additional Information + **Schedule: Full-time** + **Job Code: 4762** + **Employee Status: Regular** + **Grade: K** + **Requisition ID: 107211** + **Work Arrangement : Hybrid Eligible**
$150.9k-155k yearly 60d+ ago
Energy Marshall, Data Centers
Suffolk Construction 4.7
Infrastructure engineer job in Fremont, CA
Suffolk is a national enterprise that builds, innovates, and invests. We provide value across the entire project lifecycle through our core construction management services and complementary business lines in real estate investment, design, self-perform construction, and technology start-up investment (Suffolk Technologies). By integrating data, artificial intelligence, and advanced technology through our Seamless Platform, we connect design, construction, and operations to deliver smarter, more predictable results and redefine how America builds. Suffolk - America's Contractor - is a national company with more than $8 billion in annual revenue, 3,000 employees, and 17 offices, including Boston (headquarters), New York City, Miami, West Palm Beach, Tampa, Estero, Dallas, Los Angeles, San Francisco, San Diego, Las Vegas, Herndon, U.S. Virgin Islands, and other key markets. Suffolk manages some of the most complex and transformative projects in the country, serving clients across healthcare, life sciences, education, gaming, aviation, transportation, government, mission critical, and commercial sectors. Suffolk is privately held and is led by founder, chairman and CEO John Fish. Suffolk is ranked #8 on ENR's list of “Top CM-at-Risk Contractors.” For more information, visit *************** and follow Suffolk on Facebook, Twitter, LinkedIn, YouTube, and Instagram. Position: Suffolk is currently seeking an Energy Marshall to implement learning, provide consistency, and drive rigor into energy isolation and electrical safety programs. Responsibilities: Reviewing the Electrical Energization Safety Program with the electrical contractor and commissioning authority Involvement with all stored energy systems - gas, water, steam, air. Organizing and scheduling Pre-Energization meetings Confirming individuals working on energized / de-energized equipment are Qualified Work based on NFPA 70E, OSHA, or an accepted qualified electrical safety training standard. 
Delivering a project-specific Electrical Safety Orientation to employees who will be working on energized or de-energized equipment Reviewing the electrician's LOTO plan and verifying it is accurate and managed properly. Reviewing electrician and vendor AHAs. Confirming receipt of the approved coordination study and all arc flash labels have been applied to the equipment. Tracking and confirming all required QA/QC is complete and documentation has been submitted. Reviewing the daily Pre-Task Plan for energization activities. Implementing adequate communication to the project team that identifies daily high-risk activities, energized equipment and spaces, barriers, and off-limit spaces. Confirming all pre-energization steps have been completed. Conducting pre-energization daily walks with the electrician and project stakeholders. Performing end-of-day walks for electrical equipment to confirm all systems are secure. Confirming adherence to the LOTO plan and isolation requirements. Confirming adequate signage and barriers are installed for electrical rooms and spaces with energized equipment. Confirming an adequate access control plan is in place for electrical rooms and spaces with energized equipment. Qualifications: BA/BS + 5 years of related experience or demonstrated equivalency of experience and/or education Able to understand the safe installation of electrical equipment and various voltages, equipment types, and AC/DC systems Knowledge of pressurized mechanical lines, compressed gas and air. Experience in construction and electrical commissioning standards and practices. Experience communicating complex technical solutions and concepts to engineers and non-engineers. Audit site practices against written standards as part of the assurance role. Ability to interpret line drawings and system redundancies to ensure the design of LOTO systems is 100% effective and in compliance with customer standards. 
While performing the duties of this job, the employee is regularly required to sit for long periods of time; talk or hear; perform fine motor, hand and finger skills in the use of a keyboard, telephone, or writing. The employee is frequently required to stand; walk; and reach with arms and/or hands. Specific vision abilities include close vision, distance vision, depth perception and the ability to adjust focus. The employee will spend their time in an office environment with a quiet to moderate noise level. Job site walking. Suffolk provides equal employment opportunities to all employees and applicants for employment without regard to race, color, religion, sex, sexual orientation, pregnancy or maternity, national origin, citizenship, genetic information, disability, protected veteran, gender identity, age or any other status protected by law. This policy applies to recruiting, hiring, transfers, promotions, terminations, compensation, benefits, and all other terms and conditions of employment. Suffolk will not tolerate any unlawful discrimination toward, or harassment of, applicants or employees by anyone at Suffolk, or anyone working on behalf of Suffolk.
$62k-82k yearly est. 3d ago
AI Infrastructure Engineer - Slurm Platform
Advanced Micro Devices, Inc. 4.9
Infrastructure engineer job in San Jose, CA
WHAT YOU DO AT AMD CHANGES EVERYTHING At AMD, our mission is to build great products that accelerate next-generation computing experiences-from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges-striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career. THE ROLE: A senior technical contributor that drives end-to-end delivery of software solutions, directly contributing to, and coordinating implementation and optimization across multiple teams for AI software development, inference and training of machine learning models. The position will involve interfacing with software and hardware engineering teams and AMD partners to plan, develop and optimize use cases. This is an exciting opportunity to work on the cutting edge of GPU Computing for Machine Learning. THE PERSON: We are seeking a DevOps Engineer / HPC Platform Engineer to build and operate our Slurm-based GPU compute platform, which complements our Kubernetes-based orchestration layer for GPU and CPU workloads. The ideal candidate should be passionate about software engineering and possess leadership skills to independently deliver on multi-quarter projects. They should be able to communicate effectively and work optimally with their peers within our larger organization. Finally, you aren't afraid of a team in more of a startup mode at a larger company and willing to jump in to help in areas adjacent to your main project as needed. 
KEY RESPONSIBILITIES:
* Design, deploy, and operate Slurm clusters across on-prem and multi-cloud GPU environments (Azure, OCI, Vultr, DigitalOcean, etc.).
* Integrate Slurm with the broader orchestration ecosystem, enabling hybrid scheduling, unified authentication, and telemetry pipelines.
* Build platform features that improve developer experience - e.g., job submission APIs, automated environment setup, and metrics dashboards.
* Optimize cluster utilization and scheduling for GPU and CPU workloads; develop fair-share, QoS, and preemption policies.
* Monitor cluster health and performance, implementing observability pipelines using Prometheus, Grafana, and custom exporters.
* Collaborate with internal developers (framework, compiler, and application teams) to understand workload needs and translate them into scalable Slurm features.
* Contribute to storage and network integration, ensuring performant I/O (e.g., NFS, Lustre, Weka) and high-speed interconnect configuration (InfiniBand, NVLink).
* Support the job lifecycle - from image builds and environment modules to debugging and performance tuning of Slurm jobs.
REQUIRED EXPERIENCES:
* 8+ years of experience managing and automating HPC or Slurm clusters in production environments.
* Deep understanding of Linux systems, job schedulers (Slurm), and resource management for GPU-accelerated workloads.
* Strong troubleshooting skills across compute, storage, and network layers.
* Proven ability to collaborate with developers and researchers to design scalable HPC solutions.
PREFERRED EXPERIENCE:
* Experience integrating Slurm with Kubernetes or other control planes.
* Experience with HPC storage and I/O technologies (Lustre, ZFS, WekaFS, NFS).
* Familiarity with metrics collection and visualization using Prometheus, Grafana, and Thanos.
* Exposure to CI/CD pipelines and DevOps practices for scientific or ML workloads.
* Understanding of machine learning workflows and frameworks (PyTorch, vLLM, SGLang).
* Experience with infrastructure automation (e.g., Ansible, Terraform) and scripting (Python, Bash).
ACADEMIC CREDENTIALS: Bachelor's or Master's degree in related discipline preferred
#LI-G11 #LI-HYBRID
Benefits offered are described: AMD benefits at a glance. AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.
$128k-167k yearly est. 57d ago
Virtual Memory Kernel Engineer
Apple Inc. 4.8
Infrastructure engineer job in Cupertino, CA
The Darwin Systems team within Apple's CoreOS organization is responsible for delivery of a high-quality and performant kernel for just about every one of Apple's products. Our software runs on your wrist as part of watchOS; in your pocket with iOS; on your desk in macOS; in your living room with tvOS; and now in visionOS and Apple's Cloud. These are the devices owned by your friends and family; and hundreds of millions of devices beyond those. We ensure every aspect of the kernel and other system software are top class: features, performance, stability, security… This position requires a solid understanding of operating systems fundamentals, including kernel design and implementation. The virtual memory team is in charge of page management, mechanisms such as copy on write, low-memory process killing, swap… We work with every layer of the stack: from hardware all the way up to applications and successful engineers will be able to dig deep into details and work with other engineers to solve problems, find opportunities to keep on improving our stack and design to improve our customers' experience. As Moore's law is slowing down, effective management of resources is becoming more and more important. We work closely with all product teams across Apple to provide them with a modern, efficient operating system that allows them to ship the kinds of quality products that our customers expect. Come work with us on Apple's operating systems and get a chance to influence design across the stack: from Silicon all the way up to the SDK and applications while focusing on performance and delivering value to our customers. BS/MS in Computer Science + 0-2 years work experience or equivalent knowledge and experience. Ability to work with teams across multiple timezones. Familiarity with Unix and associated tools. Ability to ramp up quickly on an unfamiliar code base. In-depth knowledge of kernel internals. 
Highly professional, with the ability to multitask and deliver solid work on tight schedules. A demonstrated record of working on core operating system technologies, specifically around memory management in a modern kernel. Design and implementation responsibility for a major project. Demonstrated creative and critical thinking capabilities and troubleshooting skills. Familiarity with modern processor architecture (e.g. memory hierarchy, multi-core, multithreading, etc).
$131k-169k yearly est. 60d+ ago
Cloud Infrastructure Engineer
Crusoe 4.1
Infrastructure engineer job in San Francisco, CA
Job Description Crusoe's mission is to accelerate the abundance of energy and intelligence. We're crafting the engine that powers a world where people can create ambitiously with AI - without sacrificing scale, speed, or sustainability. Be a part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that's setting the pace for responsible, transformative cloud infrastructure. About This Role: We're looking for an experienced Cloud Infrastructure Engineer to design, build, and operate the core cloud foundation that powers our AI-first compute environment. You'll own the reliability, scalability, and automation of our Google Cloud Platform footprint and build the guardrails, tooling, and infrastructure that enable engineering teams to move quickly while maintaining strong security and operational excellence. What You'll Be Working On: Design, build, and manage core cloud infrastructure across compute, networking, storage, and IAM. Architect and operate GCE, GKE, and serverless workloads (Cloud Run/Functions) for scale and reliability. Own VPC design, routing, load balancers, interconnects, peering, and network security boundaries. Develop and maintain Terraform modules to define automated, auditable, and secure cloud environments. Implement policies and guardrails across IAM, resource hierarchy, service accounts, and VPC-SC. Build automation for provisioning, lifecycle management, and blue/green or canary deploy patterns. Partner closely with security and platform teams on monitoring, logging, compliance, and operational readiness. Optimize cloud costs, quotas, and capacity planning across multiple projects and regions. Troubleshoot complex production issues across compute, storage, and networking layers. What You'll Bring to the Team: 5-8+ years operating large-scale production workloads on Google Cloud Platform. 
Deep knowledge of GCE, GKE, VPC networking, load balancers, firewall rules, interconnect, and GCS. Strong Terraform experience and a track record of building automated multi-environment infrastructure. Hands-on experience with Kubernetes internals, workload orchestration, scaling, and observability. Ability to debug complex distributed systems across compute, storage, and network boundaries. Strong cloud security fundamentals, including least privilege, secrets management, and policy enforcement. Proficiency with Python, Go, or Shell for automation and tooling. Experience influencing design decisions and partnering with cross-functional teams.
Bonus Points: Experience supporting high-performance or AI/ML workloads in GCP. Familiarity with Anthos, service mesh, or multi-cluster Kubernetes operations. Background in hybrid or multi-cloud infrastructure. Strong SRE fundamentals including SLOs, incident response, and postmortems. Experience with Spanner, BigQuery, Bigtable, or large-scale data platforms.
Benefits:
* Industry competitive pay
* Restricted Stock Units in a fast growing, well-funded technology company
* Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
* Employer contributions to HSA accounts
* Paid Parental Leave
* Paid life insurance, short-term and long-term disability
* Teladoc
* 401(k) with a 100% match up to 4% of salary
* Generous paid time off and holiday schedule
* Cell phone reimbursement
* Tuition reimbursement
* Subscription to the Calm app
* MetLife Legal
* Company paid commuter benefit; $300 per month
Compensation: Compensation will be paid in the range of $172,000 - $209,000 + Bonus. Restricted Stock Units are included in all offers. Compensation to be determined by the applicant's education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.
Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.
$128k-185k yearly est. 10d ago
Enterprise Cloud Infrastructure Engineer
Stanford University 4.5
Infrastructure engineer job in Redwood City, CA
Build and maintain scalable, highly available, and resilient systems in the cloud and on-prem. Implement any new cloud functionality or migrate existing processes to the cloud and maintain them. Build and deploy systems utilizing Continuous Integration/Continuous Delivery framework and infrastructure automation. This is a hybrid-eligible position.
Core Responsibilities
* Build and manage IaaS, PaaS, and SaaS services (e.g., compute, storage, network, security, administration, automation, application services, and databases) in either native cloud or hybrid cloud environments. Ensure that the infrastructure is properly optimized for performance, cost, and security.
* Develop and maintain automation scripts using a high-level 'devops language' (such as Python) and infrastructure-as-code templates to provision and manage Oracle OCI and AWS resources using Terraform.
* Engineer processes across various areas like servers, applications, and databases and monitor them to ensure everything works as it should, taking necessary steps to facilitate spikes in traffic and usage.
* Take responsibility for infrastructure resource design and performance optimization, backup and recovery strategies and implementation, as well as monitoring the overall health of the environment.
* Develop and maintain efficient and appropriate connectivity solutions between various campus infrastructure to ensure necessary data is available as needed.
* Collaborate with application development and infrastructure teams to optimize cloud-based solutions, ensuring high performance and cost-effectiveness.
* Ensure monitoring coverage for applications and proactively monitor and manage the health of applications/systems; responsible and accountable for driving the root cause and ensuring all issues are addressed and resolved in a timely fashion.
* Implement Best Practices for IaaS, PaaS services lifecycle.
Minimum Qualifications
Education & Experience:
* Bachelor's degree and eight years of relevant experience or a combination of education and relevant experience.
Knowledge, Skills and Abilities:
* Demonstrate Cloud Infrastructure experience including but not limited to AWS, GCP, Azure, OCI.
* Extensive experience with AWS server deployment using AWS CLI, HashiCorp Terraform.
* Experience in orchestrating environment deployment from OS all the way through the application layers of a solution, using tools such as Docker, Kubernetes, Jenkins, Git, Puppet and many others.
* Experience with Servers, Infrastructure, Platform Sizing, Infrastructure Cost Reduction.
* Experience with automated deployment/Continuous Integration/Release.
* Demonstrate experience in Network Security and Network awareness principles.
* Demonstrate a full-stack infrastructure including but not limited to database, web server, application server development as code.
* Good understanding & experience of Networking, Firewall, Security hardening, storage and OS software package management in Unix Environment.
* Strong understanding of Cloud Technologies with focus on AWS and OCI.
* Comprehensive knowledge of DevOps practices and tools.
* Knowledge of infrastructure APIs on cloud platforms such as EC2/VM/Bare Metal/Containers, Storage/EBS/Volumes, Network/VPC/Subnet, Network Interfaces, etc.
* Extensive experience with deploying Infrastructure as Code (IaC) patterns using Terraform.
* Ability to script proficiently using Unix Shell, Python for automation.
* Ability to work independently or as a team member and self-motivated in learning new technology.
* Excellent communication, presentation and collaborative problem-solving skills.
* Ability to collaborate with vendors and system service providers, provide administration for system solutions, and act as a liaison.
Preferred Qualifications:
* Experience working with Single Sign-On SAML, OpenID is a plus.
* Experience with AWS SQS, API Gateway, OAuth 2.0 & RESTful APIs is a plus.
* Experience in Oracle OCI is highly recommended.
* AWS, OCI certifications preferred.
The expected pay range for this position is $150,922 - $155,000 per annum. Stanford University provides pay ranges representing its good faith estimate of what the university reasonably expects to pay for a position. The pay offered to a selected candidate will be determined based on factors such as (but not limited to) the scope and responsibilities of the position, the qualifications of the selected candidate, departmental budget availability, internal equity, geographic location and external market pay for comparable jobs. At Stanford University, base pay represents only one aspect of the comprehensive rewards package. The Cardinal at Work website (*****************************************************) provides detailed information on Stanford's extensive range of benefits and rewards offered to employees. Specifics about the rewards package for this position may be discussed during the hiring process.
Why Stanford is for you: Imagine a world without search engines or social platforms. Consider lives saved through first-ever organ transplants and research to cure illnesses. Stanford University has revolutionized the way we live and enrich the world. Supporting this mission is our diverse and dedicated 17,000 staff. We seek talent driven to impact the future of our legacy. Our culture and unique perks empower you with:
* Freedom to grow. We offer career development programs, tuition reimbursement, or audit a course. Join a TedTalk, film screening, or listen to a renowned author or global leader speak.
* A caring culture. We provide superb retirement plans, generous time-off, and family care resources.
* A healthier you. Climb our rock wall, or choose from hundreds of health or fitness classes at our world-class exercise facilities. We also provide excellent health care benefits.
* Discovery and fun. Stroll through historic sculptures, trails, and museums.
* Enviable resources. Enjoy free commuter programs, ridesharing incentives, discounts and more!
The job duties listed are typical examples of work performed by positions in this job classification and are not designed to contain or be interpreted as a comprehensive inventory of all duties, tasks, and responsibilities. Specific duties and responsibilities may vary depending on department or program needs without changing the general nature and scope of the job or level of responsibility. Employees may also perform other duties as assigned. Consistent with its obligations under the law, the University will provide reasonable accommodations to applicants and employees with disabilities. Applicants requiring a reasonable accommodation for any part of the application or hiring process should contact Stanford University Human Resources by submitting a contact form. Stanford is an equal employment opportunity and affirmative action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other characteristic protected by law.
Additional Information
* Schedule: Full-time
* Job Code: 4762
* Employee Status: Regular
* Grade: K
* Requisition ID: 107211
* Work Arrangement: Hybrid Eligible
$150.9k-155k yearly 60d+ ago

Learn more about infrastructure engineer jobs

How much does an infrastructure engineer earn in Lafayette, CA?

The average infrastructure engineer in Lafayette, CA earns between $94,000 and $211,000 annually. This compares to the national average infrastructure engineer range of $76,000 to $148,000.

Average infrastructure engineer salary in Lafayette, CA

$141,000

$94,000 (10%)

$141,000 (Median)

$211,000 (90%)

What are the biggest employers of Infrastructure Engineers in Lafayette, CA?

The biggest employers of Infrastructure Engineers in Lafayette, CA are:

Far.Ai
