Senior Infrastructure Engineer jobs at Apple

- 118 jobs
  • Machine Learning Infrastructure Engineer

    Apple Inc. 4.8 company rating

    Senior infrastructure engineer job at Apple

    Want to ship amazing experiences in Apple products? Be part of the team in the Video Computer Vision (VCV) organization that focuses on people understanding from real-time video streams and building higher level reasoning algorithms. VCV delivered features such as Face ID, RoomPlan as well as many other computer vision algorithms powering Apple Vision Pro, iPhone, and iPad. We focus on a balance of research and development to deliver Apple quality, pioneering experiences. Come shape Apple products as a driven and dedicated ML Infrastructure and Data Engineer to push the limits of ML algorithms with hands-on work and real world and simulated data, in an innovative team and be part of building the next big thing.

    As part of the Video Computer Vision (VCV) team, you will help us create the data and infrastructure ecosystem needed to support our ML development and continuously improve our features. We take full end-to-end ownership of our services and data products, driving them through every stage meticulously, encompassing conception, design, implementation, deployment, and maintenance. As a result, each one of us takes our responsibilities seriously. In this team, you'll have the opportunity to work on complex problems in close partnership with our ML engineers, data scientists and software integration teams.

    * Experience with machine learning model development lifecycle, including data preprocessing, model training, evaluation, and deployment.
    * Proficiency with cloud computing and distributed data processing infrastructure and tools (e.g., Ray, Spark, Trino).
    * Hands-on experience with CI/CD pipelines and practices.
    * Familiarity with Infrastructure as Code (IaC) tools (e.g., Terraform, Pulumi, or CloudFormation).
    * Experience building on LLMs or other generative models.
    * Ability to drive projects from concept to production, balancing business needs with technical quality and timely delivery.
    * Excellent communication skills, ability to work both independently and multi-functionally.
    * Bachelor's degree in Computer Science or related discipline, and 2 years relevant industry experience.
    * Strong foundational knowledge in Computer Science.
    * Extensive programming experience in Python.
    * Hands-on experience with cloud providers (AWS, GCP, or Azure).
    * Strong understanding of core infrastructure concepts (e.g., compute, networking, storage, containers, Kubernetes).
    $150k-196k yearly est. 40d ago
  • On-Device ML Infrastructure Engineer (CoreML Runtime)

    Apple Inc. 4.8 company rating

    Senior infrastructure engineer job at Apple

    Imagine being at the forefront of an evolution where powerful AI meets the elegance of Apple silicon. The On-Device Machine Learning team transforms groundbreaking research into practical applications, enabling billions of Apple devices to run powerful AI models locally, privately, and efficiently. We stand at the unique intersection of research, software engineering, hardware engineering, and product development, making Apple a top destination for on-device machine learning innovation.

    Our team builds the essential infrastructure that enables machine learning at scale on Apple devices. This involves onboarding innovative architectures to embedded systems, developing optimization toolkits for model compression and acceleration, building ML compilers and runtimes for efficient execution, and creating comprehensive benchmarking and debugging toolchains. This infrastructure forms the backbone of Apple's machine learning workflows across Camera, Siri, Health, Vision, and other core experiences, contributing to the overall Apple Intelligence ecosystem. If you are passionate about the technical challenges of running sophisticated ML models on resource-constrained devices and eager to directly impact how machine learning operates across the Apple ecosystem, this role presents an incredible opportunity to work on the next generation of intelligent experiences on Apple platforms.

    We are seeking an ML Infrastructure Engineer with a specific focus on graph compilers and runtimes. In this role, you will build the world's most advanced ML graph compilation and runtime system, capable of optimizing and delivering ML models efficiently on Apple products and services. If you are a highly motivated software engineer who is creative, versatile, and passionate about machine learning operator primitives, common compiler optimizations, runtimes, and system software engineering in the fast-paced and dynamic field of machine learning, this could be a fantastic role for you.

    We're building an end-to-end developer experience for machine learning development that employs Apple's vertical integration. This allows developers to iterate on model authoring, optimization, transformation, execution, debugging, profiling, and analysis. This role focuses on the Core ML Runtime for execution on-device.

    * Experience with any on-device ML stack, such as TFLite, ONNX, ExecuTorch, etc.
    * Experience with any ML authoring framework (PyTorch, TensorFlow, JAX, etc.) is a strong plus.
    * Experience with accelerators, GPU programming is a strong plus.
    * Masters or equivalent experience in Computer Sciences, Engineering, or related subject area.
    * Highly proficient in C++ or Swift.
    * Familiarity with Python.
    * Experience with any compiler stack (MLIR/LLVM/TVM/...).
    * Familiarity with Operating Systems, embedded programming, parallel programming.
    * Sound understanding of ML fundamentals, including common architectures such as Transformers.
    * Good communication skills, including ability to communicate with multi-functional audiences.
    $150k-196k yearly est. 60d+ ago
  • Global Infrastructure Engineer

    Meta 4.8 company rating

    Menlo Park, CA jobs

    The Site Operations team is responsible for the delivery of data center compute and storage at Meta, enabling our family of apps and services to support a growing global community. We are seeking a forward-thinking individual skilled across multiple disciplines to lead global initiatives on this team. The mission of this role is to identify and tackle the biggest technical and operational challenges and opportunities before SiteOps. The Infrastructure Engineer is expected to personally advance our highest impact initiatives, and to work with others to closure through the right working groups and delegates. The scope of the role is Infra-wide; the DC Infra Engineer is expected to work with the data center teams, Core Systems, CEA, PE, and hardware engineering to architect and implement adaptable solutions that transform our infrastructure in dimensions including performance, efficiency, quality, and resiliency. Areas of emphasis include next gen platforms, tools, and technologies; the interplay between our platforms and data centers; and the underlying architecture of our infrastructure including physical vs logical layer trade-offs.

    **Required Skills:** Global Infrastructure Engineer Responsibilities:

    * Represent Site Operations in leading work to define and architect new solutions on global initiatives, working with stakeholders across Infra Data Centers & Infrastructure teams
    * Assemble and lead teams to address complex engineering challenges, requiring technical expertise as well as a broad understanding of Meta's overall infrastructure
    * Address issues that can be ambiguous and global in nature, requiring leadership and collaboration across time zones, teams, and technical domains
    * Act as key SME and mentor in the design, operation, and troubleshooting of tools, technologies, and processes utilized within Site Operations
    * Understand and assess risks and challenges associated with emerging new hardware, data center and software technologies, and define & implement effective mitigations for these
    * Employ a holistic understanding of the full infrastructure stack to lead solutions that appropriately balance physical and logical layer
    * Act as a global communication and advisory point of contact for the design, implementation and delivery of projects that affect our global data center and server fleet and facilitate resolution of issues drawing on local expertise and global support partners
    * Leverage data-driven methodologies to understand a problem at the outset, define a plan, and measure progress throughout a project
    * Provide data-supplied narratives and ensure a focus on continuous improvement
    * Build and support trusted, cross-functional connections with teams across the globe and serve as an advocate for the Site Operations Team with key stakeholders, influencing policies and procedures to improve global data center operations
    * Approximately 20% - 30% travel

    **Minimum Qualifications:**

    * Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
    * Knowledge of the full stack of infrastructure, with experience building or operating logical infrastructure on top of a complex, distributed physical infrastructure
    * Proven communication skills and experience working in a highly distributed environment, across teams/department boundaries
    * 10+ years of technical experience, in a large-scale data center or IT Infrastructure environment, or equivalent experience building platforms and systems for large scale compute
    * Experience building globally scalable solutions and translating global strategic initiatives into local executable projects
    * Knowledge of the interdependencies of data center functions and technologies including electrical, cooling, structured cabling, security, network, server and storage systems
    * Experience building, operating, and scaling with Linux or Unix Operating systems
    * Experience communicating the results of analysis and insights to cross functional teams and influencing the strategy of these teams
    * Experience with Data Center Design and Expansion

    **Preferred Qualifications:**

    * Extensive knowledge of storage and AI/ML related services and the hardware that supports them
    * Coding or scripting experience such as Bash, PHP, Python, SQL, or Perl
    * Experience in providing technical guidance to external vendors and partners. Knowledge and experience with virtualization, containerization, distributed systems, fault tolerance, and incident management
    * Experience with high level data center design, operations, basic electrical/mechanical infrastructure, and scaling physical infrastructure

    **Public Compensation:** $191,000/year to $262,000/year + bonus + equity + benefits

    **Industry:** Internet

    **Equal Opportunity:** Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment. Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at accommodations-ext@fb.com.
    $191k-262k yearly 48d ago
  • ASIC Engineer, EDA Infrastructure

    Meta 4.8 company rating

    Sunnyvale, CA jobs

    Meta is hiring ASIC EDA Infrastructure Engineers within our Infrastructure organization. We are looking for individuals with experience in EDA flow and methodology, CAD/automation and ASIC infrastructure to build efficient System on Chip (SoC) and IP for data center applications.

    **Required Skills:** ASIC Engineer, EDA Infrastructure Responsibilities:

    * Front End implementation flow development and support
    * Physical Design implementation flow development and support
    * RTL2GDS flow development and support
    * Internal tools development and automation to help improve productivity across ASIC design cycles including but not limited to RTL generation tools, memory selection automation, register generation, filelist generation
    * Work with EDA vendors on new tools/technology and new features evaluation and adoption
    * Manage the internal EDA license requests, installation and license forecast as well as EDA tool installation and maintenance
    * Work with internal infrastructure team on compute grid, storage management and job scheduling architecture, efficiency and maintenance
    * Work with internal infrastructure team on adapting Meta infrastructure to ASIC design solutions, including but not limited to Source Control Management, Continuous Integration, data management and reporting

    **Minimum Qualifications:**

    * Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
    * 8+ years of experience with EDA tools and scripting languages (Python, Tcl) used to build tools and flows for complex environments
    * Knowledge of front-end and back-end ASIC tools and flows
    * Experience with communicating across functional internal teams and with vendors

    **Preferred Qualifications:**

    * Experience with ASIC EDA infrastructure (compute, storage, job scheduling) management, maintenance and support
    * Experience with RTL design using SystemVerilog or other HDL
    * Experience setting up EDA infrastructure from scratch
    * Experience with developing and supporting solutions for ASIC design environment and infrastructure

    **Public Compensation:** $173,000/year to $249,000/year + bonus + equity + benefits

    **Industry:** Internet

    **Equal Opportunity:** Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment. Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at accommodations-ext@fb.com.
    $173k-249k yearly 60d+ ago
  • Global Infrastructure Engineer

    Meta Platforms, Inc. 4.8 company rating

    Menlo Park, CA jobs

    The Site Operations team is responsible for the delivery of data center compute and storage at Meta, enabling our family of apps and services to support a growing global community. We are seeking a forward-thinking individual skilled across multiple disciplines to lead global initiatives on this team. The mission of this role is to identify and tackle the biggest technical and operational challenges and opportunities before SiteOps. The Infrastructure Engineer is expected to personally advance our highest impact initiatives, and to work with others to closure through the right working groups and delegates. The scope of the role is Infra-wide; the DC Infra Engineer is expected to work with the data center teams, Core Systems, CEA, PE, and hardware engineering to architect and implement adaptable solutions that transform our infrastructure in dimensions including performance, efficiency, quality, and resiliency. Areas of emphasis include next gen platforms, tools, and technologies; the interplay between our platforms and data centers; and the underlying architecture of our infrastructure including physical vs logical layer trade-offs.

    Minimum Qualifications

    * Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
    * Knowledge of the full stack of infrastructure, with experience building or operating logical infrastructure on top of a complex, distributed physical infrastructure
    * Proven communication skills and experience working in a highly distributed environment, across teams/department boundaries
    * 10+ years of technical experience, in a large-scale data center or IT Infrastructure environment, or equivalent experience building platforms and systems for large scale compute
    * Experience building globally scalable solutions and translating global strategic initiatives into local executable projects
    * Knowledge of the interdependencies of data center functions and technologies including electrical, cooling, structured cabling, security, network, server and storage systems
    * Experience building, operating, and scaling with Linux or Unix Operating systems
    * Experience communicating the results of analysis and insights to cross functional teams and influencing the strategy of these teams
    * Experience with Data Center Design and Expansion

    Preferred Qualifications

    * Extensive knowledge of storage and AI/ML related services and the hardware that supports them
    * Coding or scripting experience such as Bash, PHP, Python, SQL, or Perl
    * Experience in providing technical guidance to external vendors and partners. Knowledge and experience with virtualization, containerization, distributed systems, fault tolerance, and incident management
    * Experience with high level data center design, operations, basic electrical/mechanical infrastructure, and scaling physical infrastructure

    Responsibilities

    * Represent Site Operations in leading work to define and architect new solutions on global initiatives, working with stakeholders across Infra Data Centers & Infrastructure teams
    * Assemble and lead teams to address complex engineering challenges, requiring technical expertise as well as a broad understanding of Meta's overall infrastructure
    * Address issues that can be ambiguous and global in nature, requiring leadership and collaboration across time zones, teams, and technical domains
    * Act as key SME and mentor in the design, operation, and troubleshooting of tools, technologies, and processes utilized within Site Operations
    * Understand and assess risks and challenges associated with emerging new hardware, data center and software technologies, and define & implement effective mitigations for these
    * Employ a holistic understanding of the full infrastructure stack to lead solutions that appropriately balance physical and logical layer
    * Act as a global communication and advisory point of contact for the design, implementation and delivery of projects that affect our global data center and server fleet and facilitate resolution of issues drawing on local expertise and global support partners
    * Leverage data-driven methodologies to understand a problem at the outset, define a plan, and measure progress throughout a project
    * Provide data-supplied narratives and ensure a focus on continuous improvement
    * Build and support trusted, cross-functional connections with teams across the globe and serve as an advocate for the Site Operations Team with key stakeholders, influencing policies and procedures to improve global data center operations
    * Approximately 20% - 30% travel

    About Meta

    Meta builds technologies that help people connect, find communities, and grow businesses. When Facebook launched in 2004, it changed the way people connect. Apps like Messenger, Instagram and WhatsApp further empowered billions around the world. Now, Meta is moving beyond 2D screens toward immersive experiences like augmented and virtual reality to help build the next evolution in social technology. People who choose to build their careers by building with us at Meta help shape a future that will take us beyond what digital connection makes possible today, beyond the constraints of screens, the limits of distance, and even the rules of physics.

    Equal Employment Opportunity

    Meta is proud to be an Equal Employment Opportunity employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, reproductive health decisions, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, political views or activity, or other applicable legally protected characteristics. You may view our Equal Employment Opportunity notice here. Meta is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans in our job application procedures. If you need assistance or an accommodation due to a disability, fill out the Accommodations request form.
    $162k-213k yearly est. 7d ago
  • Global Infrastructure Engineer

    Meta 4.8 company rating

    Newark, CA jobs

    The Site Operations team is responsible for the delivery of data center compute and storage at Meta, enabling our family of apps and services to support a growing global community. We are seeking a forward-thinking individual skilled across multiple disciplines to lead global initiatives on this team. The mission of this role is to identify and tackle the biggest technical and operational challenges and opportunities before SiteOps. The Infrastructure Engineer is expected to personally advance our highest impact initiatives, and to work with others to closure through the right working groups and delegates. The scope of the role is Infra-wide; the DC Infra Engineer is expected to work with the data center teams, Core Systems, CEA, PE, and hardware engineering to architect and implement adaptable solutions that transform our infrastructure in dimensions including performance, efficiency, quality, and resiliency. Areas of emphasis include next gen platforms, tools, and technologies; the interplay between our platforms and data centers; and the underlying architecture of our infrastructure including physical vs logical layer trade-offs.

    **Required Skills:** Global Infrastructure Engineer Responsibilities:

    * Represent Site Operations in leading work to define and architect new solutions on global initiatives, working with stakeholders across Infra Data Centers & Infrastructure teams
    * Assemble and lead teams to address complex engineering challenges, requiring technical expertise as well as a broad understanding of Meta's overall infrastructure
    * Address issues that can be ambiguous and global in nature, requiring leadership and collaboration across time zones, teams, and technical domains
    * Act as key SME and mentor in the design, operation, and troubleshooting of tools, technologies, and processes utilized within Site Operations
    * Understand and assess risks and challenges associated with emerging new hardware, data center and software technologies, and define & implement effective mitigations for these
    * Employ a holistic understanding of the full infrastructure stack to lead solutions that appropriately balance physical and logical layer
    * Act as a global communication and advisory point of contact for the design, implementation and delivery of projects that affect our global data center and server fleet and facilitate resolution of issues drawing on local expertise and global support partners
    * Leverage data-driven methodologies to understand a problem at the outset, define a plan, and measure progress throughout a project
    * Provide data-supplied narratives and ensure a focus on continuous improvement
    * Build and support trusted, cross-functional connections with teams across the globe and serve as an advocate for the Site Operations Team with key stakeholders, influencing policies and procedures to improve global data center operations
    * Approximately 20% - 30% travel

    **Minimum Qualifications:**

    * Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
    * Knowledge of the full stack of infrastructure, with experience building or operating logical infrastructure on top of a complex, distributed physical infrastructure
    * Proven communication skills and experience working in a highly distributed environment, across teams/department boundaries
    * 10+ years of technical experience, in a large-scale data center or IT Infrastructure environment, or equivalent experience building platforms and systems for large scale compute
    * Experience building globally scalable solutions and translating global strategic initiatives into local executable projects
    * Knowledge of the interdependencies of data center functions and technologies including electrical, cooling, structured cabling, security, network, server and storage systems
    * Experience building, operating, and scaling with Linux or Unix Operating systems
    * Experience communicating the results of analysis and insights to cross functional teams and influencing the strategy of these teams
    * Experience with Data Center Design and Expansion

    **Preferred Qualifications:**

    * Extensive knowledge of storage and AI/ML related services and the hardware that supports them
    * Coding or scripting experience such as Bash, PHP, Python, SQL, or Perl
    * Experience in providing technical guidance to external vendors and partners. Knowledge and experience with virtualization, containerization, distributed systems, fault tolerance, and incident management
    * Experience with high level data center design, operations, basic electrical/mechanical infrastructure, and scaling physical infrastructure

    **Public Compensation:** $191,000/year to $262,000/year + bonus + equity + benefits

    **Industry:** Internet

    **Equal Opportunity:** Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment. Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at accommodations-ext@fb.com.
    $191k-262k yearly 48d ago
  • Global Infrastructure Engineer

    Meta Platforms, Inc. 4.8 company rating

    Santa Clara, CA jobs

    The Site Operations team is responsible for the delivery of data center compute and storage at Meta, enabling our family of apps and services to support a growing global community. We are seeking a forward-thinking individual skilled across multiple disciplines to lead global initiatives on this team. The mission of this role is to identify and tackle the biggest technical and operational challenges and opportunities before SiteOps. The Infrastructure Engineer is expected to personally advance our highest impact initiatives, and to work with others to closure through the right working groups and delegates. The scope of the role is Infra-wide; the DC Infra Engineer is expected to work with the data center teams, Core Systems, CEA, PE, and hardware engineering to architect and implement adaptable solutions that transform our infrastructure in dimensions including performance, efficiency, quality, and resiliency. Areas of emphasis include next gen platforms, tools, and technologies; the interplay between our platforms and data centers; and the underlying architecture of our infrastructure including physical vs logical layer trade-offs.

    Minimum Qualifications

    * Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
    * Knowledge of the full stack of infrastructure, with experience building or operating logical infrastructure on top of a complex, distributed physical infrastructure
    * Proven communication skills and experience working in a highly distributed environment, across teams/department boundaries
    * 10+ years of technical experience, in a large-scale data center or IT Infrastructure environment, or equivalent experience building platforms and systems for large scale compute
    * Experience building globally scalable solutions and translating global strategic initiatives into local executable projects
    * Knowledge of the interdependencies of data center functions and technologies including electrical, cooling, structured cabling, security, network, server and storage systems
    * Experience building, operating, and scaling with Linux or Unix Operating systems
    * Experience communicating the results of analysis and insights to cross functional teams and influencing the strategy of these teams
    * Experience with Data Center Design and Expansion

    Preferred Qualifications

    * Extensive knowledge of storage and AI/ML related services and the hardware that supports them
    * Coding or scripting experience such as Bash, PHP, Python, SQL, or Perl
    * Experience in providing technical guidance to external vendors and partners. Knowledge and experience with virtualization, containerization, distributed systems, fault tolerance, and incident management
    * Experience with high level data center design, operations, basic electrical/mechanical infrastructure, and scaling physical infrastructure

    Responsibilities

    * Represent Site Operations in leading work to define and architect new solutions on global initiatives, working with stakeholders across Infra Data Centers & Infrastructure teams
    * Assemble and lead teams to address complex engineering challenges, requiring technical expertise as well as a broad understanding of Meta's overall infrastructure
    * Address issues that can be ambiguous and global in nature, requiring leadership and collaboration across time zones, teams, and technical domains
    * Act as key SME and mentor in the design, operation, and troubleshooting of tools, technologies, and processes utilized within Site Operations
    * Understand and assess risks and challenges associated with emerging new hardware, data center and software technologies, and define & implement effective mitigations for these
    * Employ a holistic understanding of the full infrastructure stack to lead solutions that appropriately balance physical and logical layer
    * Act as a global communication and advisory point of contact for the design, implementation and delivery of projects that affect our global data center and server fleet and facilitate resolution of issues drawing on local expertise and global support partners
    * Leverage data-driven methodologies to understand a problem at the outset, define a plan, and measure progress throughout a project
    * Provide data-supplied narratives and ensure a focus on continuous improvement
    * Build and support trusted, cross-functional connections with teams across the globe and serve as an advocate for the Site Operations Team with key stakeholders, influencing policies and procedures to improve global data center operations
    * Approximately 20% - 30% travel

    About Meta

    Meta builds technologies that help people connect, find communities, and grow businesses. When Facebook launched in 2004, it changed the way people connect. Apps like Messenger, Instagram and WhatsApp further empowered billions around the world. Now, Meta is moving beyond 2D screens toward immersive experiences like augmented and virtual reality to help build the next evolution in social technology. People who choose to build their careers by building with us at Meta help shape a future that will take us beyond what digital connection makes possible today, beyond the constraints of screens, the limits of distance, and even the rules of physics.

    Equal Employment Opportunity

    Meta is proud to be an Equal Employment Opportunity employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, reproductive health decisions, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, political views or activity, or other applicable legally protected characteristics. You may view our Equal Employment Opportunity notice here. Meta is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans in our job application procedures. If you need assistance or an accommodation due to a disability, fill out the Accommodations request form.
    $162k-212k yearly est. 7d ago
  • Enterprise Infrastructure Engineer, IAM

    Meta Platforms, Inc. 4.8 company rating

    Fremont, CA jobs

    The Enterprise Engineering Identity and Access Management (EE IAM) team is responsible for designing, implementing, and maintaining secure identity and access management solutions that enable Meta's workforce to securely access the tools and data they need to do their jobs. As an Enterprise Infrastructure Engineer, you will play a critical role in ensuring the smooth operation of our identity and access management systems. We are seeking an experienced Enterprise Infrastructure Engineer to join our EE IAM team. In this role, you will be responsible for managing the lifecycle of complex applications, including design, development, testing, deployment, and maintenance. You will work closely with cross-functional teams, including IAM engineering, production engineering, and business stakeholders, to ensure that our identity and access management solutions meet the needs of Meta's workforce. Minimum Qualifications * BS or MS in Computer Science, Engineering, or a related technical discipline (or equivalent experience) * 5+ years of experience in Identity and Access Management field * 3+ years of experience in systems engineering * 3+ years of experience automating the management of infrastructure and services * 3+ years of experience working with monitoring and configuration management tools such as Chef, Ansible, Puppet, Saltstack, etc * 3+ years of experience coding in at least one of the following languages: Python, Ruby, PHP, Rust, or Go * Proven experience in managing complex IAM deployments, including design, development, testing, deployment, and maintenance * Experience with identity and access management solutions, such as Sailpoint, Okta, Azure AD etc * Familiarity with DevOps practices and tools, CICD * Ability to analyze complex issues and develop creative solutions * Ability to stay up-to-date with industry trends and emerging technologies in identity and access management Preferred Qualifications * Master's degree in Computer Science, Information 
Technology, or related field * Experience with cloud-based identity and access management solutions, such as AWS IAM or Google Cloud IAM * Certification in security, such as CISSP * Experience of consistently working under your own initiative, seeking feedback and input where appropriate Responsibilities * Manage the lifecycle of complex applications, including design, development, testing, deployment, and maintenance * Collaborate with cross-functional teams, including engineering, product management, and business stakeholders, to ensure that our identity and access management solutions meet the needs of Meta's workforce * Develop and maintain technical documentation, including system architecture, design documents, and user guides * Provide technical leadership and guidance to other team members * Participate in on-call rotation for production support * Stay up-to-date with industry trends and emerging technologies in identity and access management * Identify opportunities for process improvement and automation About Meta Meta builds technologies that help people connect, find communities, and grow businesses. When Facebook launched in 2004, it changed the way people connect. Apps like Messenger, Instagram and WhatsApp further empowered billions around the world. Now, Meta is moving beyond 2D screens toward immersive experiences like augmented and virtual reality to help build the next evolution in social technology. People who choose to build their careers by building with us at Meta help shape a future that will take us beyond what digital connection makes possible today-beyond the constraints of screens, the limits of distance, and even the rules of physics. Equal Employment Opportunity Meta is proud to be an Equal Employment Opportunity employer. 
We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, reproductive health decisions, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, political views or activity, or other applicable legally protected characteristics. You may view our Equal Employment Opportunity notice here. Meta is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans in our job application procedures. If you need assistance or an accommodation due to a disability, fill out the Accommodations request form.
    $162k-212k yearly est. 7d ago
  • Global Infrastructure Engineer

    Meta Platforms, Inc. 4.8 company rating

    Newark, CA jobs

    The Site Operations team is responsible for the delivery of data center compute and storage at Meta, enabling our family of apps and services to support a growing global community. We are seeking a forward-thinking individual skilled across multiple disciplines to lead global initiatives on this team. The mission of this role is to identify and tackle the biggest technical and operational challenges and opportunities before SiteOps. The Infrastructure Engineer is expected to personally advance our highest impact initiatives, and to work with others to closure through the right working groups and delegates. The scope of the role is Infra-wide; the DC Infra Engineer is expected to work with the data center teams, Core Systems, CEA, PE, and hardware engineering to architect and implement adaptable solutions that transform our infrastructure in dimensions including performance, efficiency, quality, and resiliency. Areas of emphasis include next gen platforms, tools, and technologies; the interplay between our platforms and data centers; and the underlying architecture of our infrastructure including physical vs logical layer trade-offs. 
Minimum Qualifications * Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience * Knowledge of the full stack of infrastructure, with experience building or operating logical infrastructure on top of a complex, distributed physical infrastructure * Proven communication skills and experience working in a highly distributed environment, across teams/department boundaries * 10+ years of technical experience, in a large-scale data center or IT Infrastructure environment, or equivalent experience building platforms and systems for large scale compute * Experience building globally scalable solutions and translating global strategic initiatives into local executable projects * Knowledge of the interdependencies of data center functions and technologies including electrical, cooling, structured cabling, security, network, server and storage systems * Experience building, operating, and scaling with Linux or Unix Operating systems * Experience communicating the results of analysis and insights to cross functional teams and influencing the strategy of these teams * Experience with Data Center Design and Expansion Preferred Qualifications * Extensive knowledge of storage and AI/ML related services and the hardware that supports them * Coding or scripting experience such as Bash, PHP, Python, SQL, or Perl * Experience in providing technical guidance to external vendors and partners. 
Knowledge and experience with virtualization, containerization, distributed systems, fault tolerance, and incident management * Experience with high level data center design, operations, basic electrical/mechanical infrastructure, and scaling physical infrastructure Responsibilities * Represent Site Operations in leading work to define and architect new solutions on global initiatives, working with stakeholders across Infra Data Centers & Infrastructure teams * Assemble and lead teams to address complex engineering challenges, requiring technical expertise as well as a broad understanding of Meta's overall infrastructure * Address issues that can be ambiguous and global in nature, requiring leadership and collaboration across time zones, teams, and technical domains * Act as key SME and mentor in the design, operation, and troubleshooting of tools, technologies, and processes utilized within Site Operations * Understand and assess risks and challenges associated with emerging new hardware, data center and software technologies, and define & implement effective mitigations for these * Employ a holistic understanding of the full infrastructure stack to lead solutions that appropriately balance physical and logical layer * Act as a global communication and advisory point of contact for the design, implementation and delivery of projects that affect our global data center and server fleet and facilitate resolution of issues drawing on local expertise and global support partners * Leverage data-driven methodologies to understand a problem at the onset, define a plan, and measure progress throughout a project * Provide data supplied narratives and ensure a focus on continuous improvement * Build and support, trusted, cross-functional connections with teams across the globe and serve as an advocate for the Site Operations Team with key stakeholders, influencing policies and procedures to improve global data center operations * Approximately 20% - 30% travel About Meta 
Meta builds technologies that help people connect, find communities, and grow businesses. When Facebook launched in 2004, it changed the way people connect. Apps like Messenger, Instagram and WhatsApp further empowered billions around the world. Now, Meta is moving beyond 2D screens toward immersive experiences like augmented and virtual reality to help build the next evolution in social technology. People who choose to build their careers by building with us at Meta help shape a future that will take us beyond what digital connection makes possible today-beyond the constraints of screens, the limits of distance, and even the rules of physics. Equal Employment Opportunity Meta is proud to be an Equal Employment Opportunity employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, reproductive health decisions, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, political views or activity, or other applicable legally protected characteristics. You may view our Equal Employment Opportunity notice here. Meta is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans in our job application procedures. If you need assistance or an accommodation due to a disability, fill out the Accommodations request form.
    $162k-213k yearly est. 7d ago
  • Network Production Engineer, Network Infrastructure

    Meta 4.8 company rating

    Menlo Park, CA jobs

    The Network Infrastructure team is responsible for designing, building and operating one of the largest networks in the world. Networking is at the core of all Meta products and experiences, and we are looking for Production Engineers who are interested in solving complex technical challenges in the Backbone or Datacenter Network domains. Production Network Engineers at Meta are hybrid software and network engineers who keep reliability and scalability in mind as they work on different parts of the lifecycle (designing, building, and operating our worldwide network). This role offers an opportunity to solve the scaling challenges of supporting billions of people using our family of apps; to cutting-edge challenges in AI workloads that power new Meta products. **Required Skills:** Network Production Engineer, Network Infrastructure Responsibilities: 1. Conceiving, developing, and deploying systems and tools to keep the network running reliably and efficiently 2. Establish and implement global best practices and contribute to the design of new scalable network solutions 3. Define and partner with network hardware, software, and vendor teams on the development of network platforms 4. Participate in an on-call rotation to support the global network infrastructure and analyze data to diagnose and identify root causes to network issues 5. Lead projects to address hard technical challenges, directly contributing to roadmaps and partner alongside the best engineers in the industry to develop reliable and scalable solutions 6. Help onboard new team members while working with other technical leads and managers to make the team a success **Minimum Qualifications:** 7. Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience 8. 7+ years of relevant experience developing scalable and reliable systems and/or networks 9. 
Experience coding in higher-level languages (e.g., Python, C++, Go, Rust, etc) 10. Experience learning software, frameworks and APIs 11. Engineering degree, or a related technical discipline or equivalent experience 12. Experience in configuration and maintenance of network devices and NMS systems, or applications such as web servers, load balancers, relational databases, storage systems and messaging systems 13. Experience developing and understanding network device configuration for at least one vendor (Juniper, Cisco, Arista, Brocade, etc.) **Preferred Qualifications:** 14. Expert knowledge of TCP/IP and IPv6 15. Protocol knowledge in one or more of BGP, MPLS, ISIS or similar routing protocols - knowledge in typical configurations and performance tuning 16. Understanding of routing and switching - hardware design and knowledge of forwarding and data planes 17. Understanding of RDMA congestion control mechanisms on IB and RoCE Networks 18. Understanding of AI training workloads and demands they exert on networks 19. Understanding of the latest artificial intelligence (AI) technologies **Public Compensation:** $177,000/year to $251,000/year + bonus + equity + benefits **Industry:** Internet **Equal Opportunity:** Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. Meta participates in the E-Verify program in certain locations, as required by law. 
Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment. Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at accommodations-ext@fb.com.
    $177k-251k yearly 60d+ ago
  • Network Production Engineer, Network Infrastructure

    Meta 4.8 company rating

    Sacramento, CA jobs

    The Network Infrastructure team is responsible for designing, building and operating one of the largest networks in the world. Networking is at the core of all Meta products and experiences, and we are looking for Production Engineers who are interested in solving complex technical challenges in the Backbone or Datacenter Network domains. Production Network Engineers at Meta are hybrid software and network engineers who keep reliability and scalability in mind as they work on different parts of the lifecycle (designing, building, and operating our worldwide network). This role offers an opportunity to solve the scaling challenges of supporting billions of people using our family of apps; to cutting-edge challenges in AI workloads that power new Meta products. **Required Skills:** Network Production Engineer, Network Infrastructure Responsibilities: 1. Conceiving, developing, and deploying systems and tools to keep the network running reliably and efficiently 2. Establish and implement global best practices and contribute to the design of new scalable network solutions 3. Define and partner with network hardware, software, and vendor teams on the development of network platforms 4. Participate in an on-call rotation to support the global network infrastructure and analyze data to diagnose and identify root causes to network issues 5. Lead projects to address hard technical challenges, directly contributing to roadmaps and partner alongside the best engineers in the industry to develop reliable and scalable solutions 6. Help onboard new team members while working with other technical leads and managers to make the team a success **Minimum Qualifications:** 7. Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience 8. 7+ years of relevant experience developing scalable and reliable systems and/or networks 9. 
Experience coding in higher-level languages (e.g., Python, C++, Go, Rust, etc) 10. Experience learning software, frameworks and APIs 11. Engineering degree, or a related technical discipline or equivalent experience 12. Experience in configuration and maintenance of network devices and NMS systems, or applications such as web servers, load balancers, relational databases, storage systems and messaging systems 13. Experience developing and understanding network device configuration for at least one vendor (Juniper, Cisco, Arista, Brocade, etc.) **Preferred Qualifications:** 14. Expert knowledge of TCP/IP and IPv6 15. Protocol knowledge in one or more of BGP, MPLS, ISIS or similar routing protocols - knowledge in typical configurations and performance tuning 16. Understanding of routing and switching - hardware design and knowledge of forwarding and data planes 17. Understanding of RDMA congestion control mechanisms on IB and RoCE Networks 18. Understanding of AI training workloads and demands they exert on networks 19. Understanding of the latest artificial intelligence (AI) technologies **Public Compensation:** $177,000/year to $251,000/year + bonus + equity + benefits **Industry:** Internet **Equal Opportunity:** Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. Meta participates in the E-Verify program in certain locations, as required by law. 
Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment. Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at accommodations-ext@fb.com.
    $177k-251k yearly 60d+ ago
  • Network Production Engineer, Network Infrastructure

    Meta Platforms, Inc. 4.8 company rating

    Menlo Park, CA jobs

    The Network Infrastructure team is responsible for designing, building and operating one of the largest networks in the world. Networking is at the core of all Meta products and experiences, and we are looking for Production Engineers who are interested in solving complex technical challenges in the Backbone or Datacenter Network domains. Production Network Engineers at Meta are hybrid software and network engineers who keep reliability and scalability in mind as they work on different parts of the lifecycle (designing, building, and operating our worldwide network). This role offers an opportunity to solve the scaling challenges of supporting billions of people using our family of apps; to cutting-edge challenges in AI workloads that power new Meta products. Minimum Qualifications * Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience * 7+ years of relevant experience developing scalable and reliable systems and/or networks * Experience coding in higher-level languages (e.g., Python, C++, Go, Rust, etc) * Experience learning software, frameworks and APIs * Engineering degree, or a related technical discipline or equivalent experience * Experience in configuration and maintenance of network devices and NMS systems, or applications such as web servers, load balancers, relational databases, storage systems and messaging systems * Experience developing and understanding network device configuration for at least one vendor (Juniper, Cisco, Arista, Brocade, etc.) 
Preferred Qualifications * Expert knowledge of TCP/IP and IPv6 * Protocol knowledge in one or more of BGP, MPLS, ISIS or similar routing protocols - knowledge in typical configurations and performance tuning * Understanding of routing and switching - hardware design and knowledge of forwarding and data planes * Understanding of RDMA congestion control mechanisms on IB and RoCE Networks * Understanding of AI training workloads and demands they exert on networks * Understanding of the latest artificial intelligence (AI) technologies Responsibilities * Conceiving, developing, and deploying systems and tools to keep the network running reliably and efficiently * Establish and implement global best practices and contribute to the design of new scalable network solutions * Define and partner with network hardware, software, and vendor teams on the development of network platforms * Participate in an on-call rotation to support the global network infrastructure and analyze data to diagnose and identify root causes to network issues * Lead projects to address hard technical challenges, directly contributing to roadmaps and partner alongside the best engineers in the industry to develop reliable and scalable solutions * Help onboard new team members while working with other technical leads and managers to make the team a success About Meta Meta builds technologies that help people connect, find communities, and grow businesses. When Facebook launched in 2004, it changed the way people connect. Apps like Messenger, Instagram and WhatsApp further empowered billions around the world. Now, Meta is moving beyond 2D screens toward immersive experiences like augmented and virtual reality to help build the next evolution in social technology. 
People who choose to build their careers by building with us at Meta help shape a future that will take us beyond what digital connection makes possible today-beyond the constraints of screens, the limits of distance, and even the rules of physics. Equal Employment Opportunity Meta is proud to be an Equal Employment Opportunity employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, reproductive health decisions, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, political views or activity, or other applicable legally protected characteristics. You may view our Equal Employment Opportunity notice here. Meta is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans in our job application procedures. If you need assistance or an accommodation due to a disability, fill out the Accommodations request form.
    $174k-234k yearly est. 28d ago
  • Production Systems Engineer

    Meta 4.8 company rating

    Menlo Park, CA jobs

    Meta is seeking a Production Systems Engineer to join our Hardware Design and Release to Production (HDRTP) team. Our servers and data centers are the foundation upon which our rapidly scaling infrastructure operates efficiently to deliver Meta's services globally. The HDRTP team is responsible for the end-to-end Hardware Lifecycle of all Meta servers, from exploration and development to production health. HDRTP Engineers work closely with Production Engineering teams, Enterprise Networking, Hardware Designers, Networking Teams, Manufacturers, Vendors, Datacenter Operation teams and New Product Introduction teams to ensure the smooth operation of systems across the planet. We encounter problems from the very smallest of scales (errors occurring at the microscopic scale, within single registers of a CPU) up to the very largest - deploying solutions to Meta's millions of devices globally. We focus on finding solutions to complex issues, embracing ambiguity, driving impact, and tackling the hardest problems in the domain. **Required Skills:** Production Systems Engineer Responsibilities: 1. Build and develop tooling solutions to automate business critical processes in service of managing the health of the Meta production hardware fleet 2. Troubleshoot, diagnose and root cause system failures, working with key partners to identify and deliver solutions 3. Proactively identify opportunities to fix or enhance tooling, hardware and processes 4. Build subject matter expertise in one or more of the specialist areas covered by the RTP (Release To Production) team 5. Scientific approach to troubleshooting, root-cause analysis and investigation **Minimum Qualifications:** 6. Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience 7. 6+ years experience coding in a higher-level language (Python, PHP, Java, Go, Rust, C++) 8. 
Experience building, maintaining and debugging production services or platforms - usually (but not necessarily) in a Linux/Unix environment 9. Knowledge of server architecture and components across Compute/Storage/AI Systems/Networking **Preferred Qualifications:** 10. Experience managing and debugging hardware platforms in a cloud environment 11. 6+ years experience coding in a higher-level language (Python, PHP, Java, Go, Rust, C++) 12. Demonstrated experience driving projects to successful business outcomes **Public Compensation:** $132,000/year to $191,000/year + bonus + equity + benefits **Industry:** Internet **Equal Opportunity:** Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment. Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at accommodations-ext@fb.com.
    $132k-191k yearly 60d+ ago
  • Technical Support Engineer

    Dell International Services 4.8 company rating

    San Mateo, CA jobs

    $90K+ salary. Very negotiable Equity Typical day: 60% live chat, 30% email, 10% phone support. Training will be provided for recent graduates. Ideally someone with 0-6 years of tech support exp. Notes: Must Haves: Bachelor's Degree, Accredited university. - THIS IS REQUIRED GPA 3.0 or higher Some type of computer networking knowledge whether professional or educational. Good communication sk Notes: Responsibilities: · Troubleshooting and support for all IT-related topics and anything else your customers need help with: software, laptops, cell phones, desk phones, printers, confer Additional Information All your information will be kept confidential according to EEO guidelines.
    $90k yearly 9h ago
  • ML Infrastructure Engineer - Multimodal Training Tools, SIML

    Apple Inc. 4.8 company rating

    Senior infrastructure engineer job at Apple

    Are you passionate about Generative AI? Are you interested in working on groundbreaking generative modeling technologies to enrich billions of people? We are the Intelligence System Experience (ISE) team within Apple's software organization. The team operates at the intersection of multimodal machine learning and system experiences. Our multidisciplinary ML teams focus on a broad spectrum of areas, including Visual Generative Foundation Models, Multimodal Understanding, Visual Understanding of People, Text, Handwriting, and Scenes, Personalization, Knowledge Extraction, Conversation Analysis, Behavioral Modeling for Proactive Suggestions, and Privacy-Preserving Learning. These innovations form the foundation of the seamless, intelligent experiences our users enjoy every day. We are seeking engineers experienced in building tools for training, adapting and deploying large-scale generative models. You will be working alongside a cross-functional team of engineers who own ML infrastructure & algorithms, data scientists, designers, safety and UX engineers. In this role you will bring deep expertise in ML tooling, with a passion to empower engineers across the ML stack. Responsibilities include: - Contributing towards tools for large generative model training including diffusion & autoregressive workflows - Tools for efficient inference and hosting of models for experimentation and human feedback - Tooling for model representation and efficient deployment on multiple HW targets incl. Apple Silicon - Benchmarking, analysing and improving training and inference performance - Integrating efficient data loading strategies and auto-eval workflows - CI/CD of base training workstreams. Strong ML fundamentals. Experience working with large cross-functional and diverse teams. 
Bachelor's, Master's, or PhD in Electrical Engineering/Computer Science or a related field (mathematics, physics or computer engineering), with a focus on machine learning; or comparable professional experience. Experienced in training / adapting LLM and Diffusion models. Advanced fluency in PyTorch. Excellent programming skills and experience contributing software to large projects. Experience with distributed training of large models.
    $150k-196k yearly est. 26d ago
  • On-device ML Infrastructure Engineer (ML User Experience APIs & Integration)

    Apple Inc. 4.8 company rating

    Senior infrastructure engineer job at Apple

    Imagine being at the forefront of an evolution where innovative AI meets the elegance of Apple silicon. The On-Device Machine Learning team transforms groundbreaking research into practical applications, enabling billions of Apple devices to run powerful AI models locally, privately, and efficiently. We stand at the unique intersection of research, software engineering, hardware engineering, and product development, making Apple a top destination for machine learning innovation. This team builds the essential infrastructure that enables machine learning at scale on Apple devices. This involves onboarding modern architectures to embedded systems, developing optimization toolkits for model compression and acceleration, building ML compilers and runtimes for efficient execution, and creating comprehensive benchmarking and debugging toolchains. This infrastructure forms the backbone of Apple's machine learning workflows across Camera, Siri, Health, Vision, and other core experiences, supplying to the overall Apple Intelligence ecosystem. If you are passionate about the technical challenges of running sophisticated ML models across all devices, from resource-constrained devices to powerful clusters, and eager to directly impact how machine learning operates across the Apple ecosystem, this role presents a great opportunity to work on the next generation of intelligent experiences on Apple platforms. Our group is seeking an ML Infrastructure Engineer, with a focus on ML user experience APIs and integration. The role is responsible for developing new ML model conversion and authoring APIs that serve as the main entry point into Apple's ML infrastructure. An engineer in this role will also drive the onboarding of popular and latest ML models-demonstrating end-to-end workflows that highlight both the authoring and runtime capabilities of Apple's ML ecosystem with strong, competitive performance on Apple platforms. 
The role also involves integrating these APIs into internal and external systems (e.g., Hugging Face) to showcase the most efficient path for bringing models into Apple's ML stack. This integration could involve a gamut of optimizations ranging from authored program optimizations (e.g., in PyTorch) to custom transformations within Apple's model representation. As an engineer in this role, you will be primarily focused on developing and using APIs that enable ML engineers to efficiently author and convert ML models to run effectively on Apple platforms. You will integrate Apple's ML tools into internal and external model repositories to evaluate and demonstrate how models can be efficiently ingested and implemented within Apple's ML stack. You will ideate, design, and stress test a variety of optimizations required to support these models, ranging from source-level optimizations (e.g., in the PyTorch program) to custom transformations within Apple's model representation. As a power user of Apple's ML infrastructure, you will also help create the latest and most capable models with strong, driven performance across hardware targets-showcasing the practical power of Apple's authoring and runtime APIs. This role offers the opportunity to shape how ML developers experience Apple's end-to-end inference stack, from model creation to deployment. The role requires a confirmed understanding of ML modeling (architectures, training vs. inference trade-offs, etc.), ML deployment optimizations (e.g., quantization), and strong experience designing Python APIs. We are building the first end-to-end developer experience for ML development that, by taking advantage of Apple's vertical integration, allows developers to iterate on model authoring, optimization, transformation, execution, debugging, profiling and analysis. Experience with C++, Swift, and/or GPU programming paradigms. Familiarity with QAT and other compression and quantization techniques employing PyTorch workflows. 
Experience designing Python APIs and deploying production-grade Python packages. Experience with MLIR/LLVM or similar compiler toolchains. Familiarity with Hugging Face or other model repositories. * Bachelors in Computer Sciences, Engineering, or related subject area. - Highly proficient in Python programming, familiarity with C++ is required. - Proficiency in at least one ML authoring framework, such as PyTorch, MLX, and JAX. Strong understanding of ML fundamentals, including common architectures such as Transformers. Hands-on experience with ML inference optimizations, such as quantization, pruning, KV caching, etc. Strong communication skills, including ability to connect with multi-functional audiences.
    $150k-196k yearly est. 11d ago
  • AIML - Core Infrastructure Engineering, Core Infrastructure

    Apple Inc. 4.8 company rating

    Senior infrastructure engineer job at Apple

    Do you want to make Apple products smarter for our users? The AIML Core Infra team is looking for an experienced software engineer to work on core infrastructure for information intelligence at Apple. We build systems to connect Apple users to information as part of the wider Apple Intelligence initiative. You will work closely with product teams and information intelligence teams to deliver reliable, scalable, and efficient infrastructure that powers Apple products for billions of users. We design and build infrastructure to support features that empower billions of Siri users. Our team processes trillions of links to find the best content to surface to users through search. We also analyze pages to extract critical features for indexing and ranking. We apply statistical analysis to improve link selection, freshness, retrieval rates, extraction quality, and many others. You'll have the opportunity to work with large scale systems with trillions of rows and many petabytes of data and incredible complexity. Experience with large-scale data processing, information retrieval, web search, or ranking systems BS, MS, or Ph.D. in Computer Science or a related field, or equivalent experience Proven experience designing, building, and operating large-scale distributed systems and backend services (e.g., using cloud platforms like AWS/GCP and container orchestration like Kubernetes) Strong software engineering fundamentals in data structures, algorithms, and system design Solid understanding of networking protocols (e.g., TCP/IP, DNS, HTTP) and their application in building large-scale systems Excellent communication and collaboration skills to work effectively with cross-functional teams
    $148k-192k yearly est. 55d ago
  • AIML - ML Infrastructure Engineer, ML Platform & Technology - ML Compute

    Apple Inc. 4.8 company rating

    Senior infrastructure engineer job at Apple

    Apple is where individual imaginations gather together, committing to the values that lead to great work. Every new product we build, service we create, or Apple Store experience we deliver is the result of us making each other's ideas stronger. That happens because every one of us shares a belief that we can make something wonderful and share it with the world, changing lives for the better. It's the diversity of our people and their thinking that inspires the innovation that runs through everything we do. When we bring everybody in, we can do the best work of our lives. Here, you'll do more than join something - you'll add something! As a Senior/Staff Engineer on the Foundation Model Compute Infra team, you will design and scale the scheduling and orchestration systems that power Apple's large-scale foundation model training and inference workloads across TPU clusters. You will drive innovations in resource management, cluster efficiency, and platform reliability, enabling Apple's next-generation AI models to train and serve at scale. Advanced degrees in Computer Science, engineering, or a related field Proficient in working with and debugging accelerators, like: GPU, TPU, AWS Trainium Proficient in ML training and deployment frameworks, like: JAX, TensorFlow, PyTorch, TensorRT, vLLM Bachelors in Computer Science, engineering, or a related field Experience with foundation model training and inference workloads across TPU clusters 5+ years of hands-on experience in building scalable backend systems for training and evaluation of machine learning models Proficient in relevant programming languages, like Python or Go Strong expertise in distributed systems, reliability and scalability, containerization, and cloud platforms Proficient in cloud computing infrastructure and tools: Kubernetes, Ray, PySpark Ability to clearly and concisely communicate technical and architectural problems, while working with partners to iteratively find
    $148k-192k yearly est. 60d+ ago
  • AIML - Staff ML Infrastructure Engineer, ML Platform & Technology - ML Compute

    Apple Inc. 4.8 company rating

    Senior infrastructure engineer job at Apple

    Apple is where individual imaginations gather together, committing to the values that lead to great work. Every new product we build, service we create, or Apple Store experience we deliver is the result of us making each other's ideas stronger. That happens because every one of us shares a belief that we can make something wonderful and share it with the world, changing lives for the better. It's the diversity of our people and their thinking that inspires the innovation that runs through everything we do. When we bring everybody in, we can do the best work of our lives. Here, you'll do more than join something - you'll add something! As a staff engineer on the ML Compute team, your work will include: - Lead the development of the infrastructure to run large-scale workloads on the Cloud, such as Apache Spark, Ray, and distributed training. - Optimize platform efficiency and throughput by improving resource management capabilities with schedulers like Apache YuniKorn and Kueue. - Integrate new features from core distributed computing and ML frameworks into the platform, offering them to production users and providing support. - Enhance the platform's scalability, performance, and observability through improved monitoring and logging. - Drive the architectural evolution of the platform by adopting modern, cloud-native technologies to improve system performance, efficiency, and scalability. - Reduce dev-ops efforts by automating and streamlining operational processes. - Mentor engineers in areas of your expertise, fostering skill growth and knowledge sharing. Advanced degrees in Computer Science, engineering, or a related field. Hands-on experience with cloud-native resource management and scheduling tools like Apache YuniKorn. Experience with advanced architecture for distributed data processing and ML workloads. Proficient in working with and debugging accelerators, like: GPU, TPU, AWS Trainium. 
Bachelors in Computer Science, engineering, or a related field 6+ years of hands-on experience in building scalable backend systems for training and evaluation of machine learning models Proficient in relevant programming languages, like Python or Go Strong expertise in distributed systems, reliability and scalability, containerization, and cloud platforms Proficient in cloud computing infrastructure and tools: Kubernetes, Ray, PySpark Ability to clearly and concisely communicate technical and architectural problems, while working with partners to iteratively find solutions
    $148k-192k yearly est. 60d+ ago
  • Virtual Memory Kernel Engineer

    Apple Inc. 4.8 company rating

    Senior infrastructure engineer job at Apple

    The Darwin Systems team within Apple's CoreOS organization is responsible for delivery of a high-quality and performant kernel for just about every one of Apple's products. Our software runs on your wrist as part of watchOS; in your pocket with iOS; on your desk in macOS; in your living room with tvOS; and now in visionOS and Apple's Cloud. These are the devices owned by your friends and family; and hundreds of millions of devices beyond those. We ensure every aspect of the kernel and other system software are top class: features, performance, stability, security… This position requires a solid understanding of operating systems fundamentals, including kernel design and implementation. The virtual memory team is in charge of page management, mechanisms such as copy on write, low-memory process killing, swap… We work with every layer of the stack: from hardware all the way up to applications and successful engineers will be able to dig deep into details and work with other engineers to solve problems, find opportunities to keep on improving our stack and design to improve our customers' experience. As Moore's law is slowing down, effective management of resources is becoming more and more important. We work closely with all product teams across Apple to provide them with a modern, efficient operating system that allows them to ship the kinds of quality products that our customers expect. Come work with us on Apple's operating systems and get a chance to influence design across the stack: from Silicon all the way up to the SDK and applications while focusing on performance and delivering value to our customers. BS/MS in Computer Science + 5 years work experience or equivalent knowledge and experience Ability to work with teams across multiple timezones. Familiarity with Unix and associated tools. Ability to ramp up quickly on an unfamiliar code base. In-depth knowledge of kernel internals. 
Highly professional, with the ability to multitask and deliver solid work on tight schedules. A demonstrated record of working on core operating system technologies, specifically around memory management in a modern kernel. Design and implementation responsibility for a major project. Demonstrated creative and critical thinking capabilities and troubleshooting skills. Familiarity with modern processor architecture (e.g. memory hierarchy, multi-core, multithreading, etc).
    $131k-169k yearly est. 60d+ ago

Learn more about Apple jobs

View all jobs