Senior Reliability Engineer jobs at Intel - 120 jobs

  • SoC GFX PnP Validation Engineer

    Intel Corp 4.7 company rating

    Senior reliability engineer job at Intel

    The Intel Silicon Architecture, Power-Performance (SiA PTP) team is looking for a highly skilled SoC performance engineer to join our PnP validation team. We are a post-silicon team responsible for attaining power-performance expectations for Intel's integrated graphics products. As a PnP validation engineer, you will develop and execute validation test plans and directed tests to ensure adherence to validation standards and procedures. In this role, you will have the opportunity to work on cutting-edge technology and use best-in-class systems and equipment. We expect you to have good scripting skills and be able to process large amounts of performance data and use automation to analyze and optimize graphics performance. You will also work closely with cross-functional teams to ensure our SoCs meet the highest standards of performance and power efficiency, and to identify opportunities for improvement. Qualifications: Minimum qualifications are required to be initially considered for this position. Requirements listed would be obtained through a combination of industry-relevant job experience, internship experiences, and/or schoolwork/classes/research. Minimum Qualifications: * Bachelor's degree with 3+ years of experience or Master's degree with 2+ years of experience in Electrical Engineering, Computer Engineering or related field. * Post-silicon validation using test equipment, computer systems, and test station software development. * Experience with scripting languages such as Python, C/C++ and/or Perl. Preferred Qualifications: * CPU micro-architectural and/or high-speed bus protocol experience. * Experience in design, verification or validation disciplines, system/platform level debug and root cause isolation, methodology and tools. * Experience with power measurement and debug tools. * Experience with leading automation efforts, planning, execution and overseeing system validation activities.
* Deep technical knowledge in performance and power management including understanding of architecture and microcode sufficient to understand potential Power and performance impacts of changes. * Familiarity with industry benchmarks, which aspects of performance they measure and competitive analysis from previous gen products. * Experience with Performance counters. Job Type: Experienced Hire Shift: Shift 1 (United States of America) Primary Location: US, California, Folsom Additional Locations: Business group: The Silicon Engineering Group (SIG) is a worldwide organization focused on the development and integration of SOCs, Cores, and critical IPs from architecture to manufacturing readiness that power Intel's leadership products. This business group leverages an incomparable mix of experts with different backgrounds, cultures, perspectives, and experiences to unleash the most innovative, amazing, and exciting computing experiences. Posting Statement: All qualified applicants will receive consideration for employment without regard to race, color, religion, religious creed, sex, national origin, ancestry, age, physical or mental disability, medical condition, genetic information, military and veteran status, marital status, pregnancy, gender, gender expression, gender identity, sexual orientation, or any other characteristic protected by local law, regulation, or ordinance. Position of Trust N/A Benefits: We offer a total compensation package that ranks among the best in the industry. It consists of competitive pay, stock, bonuses, as well as, benefit programs which include health, retirement, and vacation. 
Find more information about all of our Amazing Benefits here: ********************************************************************************** Annual Salary Range for jobs which could be performed in the US: 121,050.00 USD - 227,620.00 USD The range displayed on this job posting reflects the minimum and maximum target compensation for the position across all US locations. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific compensation range for your preferred location during the hiring process. Work Model for this Role This role will require an on-site presence. * Job posting details (such as work model, location or time type) are subject to change.
    $96k-125k yearly est. Auto-Apply 27d ago
  • Senior Reliability Methodology Development Engineer

    Nvidia 4.9 company rating

    California jobs

    For two decades, NVIDIA has pioneered visual computing, the art and science of computer graphics and data center products. With our invention of the GPU, the engine of innovative visual computing, AI, and robotics, NVIDIA is now passionate about innovation at the intersection of visual processing, high performance computing, and artificial intelligence. Join NVIDIA to work on powerful technology in a forward-thinking environment. Contribute to projects that are groundbreaking in AI and computing. What you'll be doing: As a Reliability Methodology Engineer at NVIDIA, you will be responsible for ensuring our products and systems operate flawlessly. Your key duties will include: * Collaborating with design, product, and test engineering teams to apply DFT methodologies to improve reliability screening specific to HTOL (Component-level High Temp Op. Life Test). * Familiarity with DFR best practices including reliability prediction, derating, and physics of failure modeling. * Developing and implementing reliability test plans and protocols. * Developing and implementing DFR (DFX) strategies early in the product lifecycle to ensure long-term field reliability, especially in critical datacenter/robotics environments. * Conducting failure mode and effects analysis (FMEA) at the product development stage. * Collaborating with cross-functional teams to determine root causes of failures. * Crafting and maintaining detailed documentation of reliability tests and results. * Analyzing data to identify trends and areas for improvement. * Ensuring compliance with industry standards. * Providing technical support and mentorship to other departments. What we need to see: * A bachelor's or higher degree in Electrical Engineering, Computer Engineering or equivalent experience. An MS or PhD in EE is preferable, with experience in Engineering or a related field. * 5+ years of experience.
* Proven experience in reliability engineering, preferably in the technology sector, and with JEDEC (e.g., JESD22, JESD47) and AEC-Q100 standards. * Hands-on experience in HTOL hardware development. * Background in DFT, including techniques for enhanced screening, debugging, and stress pattern development. * Strong analytical and problem-solving skills. * Excellent written and verbal communication abilities. * Proficient with reliability data analysis tools (JMP/MATLAB/Python) and methodologies. With highly competitive salaries and a comprehensive benefits package, NVIDIA is widely considered to be one of the technological world's most desirable employers. We have some of the most brilliant and talented people in the world working for us. Are you creative and autonomous, with a genuine passion for technology? We want to hear from you. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 136,000 USD - 218,500 USD for Level 3, and 168,000 USD - 264,500 USD for Level 4. You will also be eligible for equity and benefits. Applications for this job will be accepted at least until December 15, 2025. NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
    $151k-199k yearly est. Auto-Apply 60d+ ago
  • Senior Site Reliability Engineer - Observability and Telemetry Platform

    Nvidia 4.9 company rating

    California jobs

    Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large-scale production systems with high efficiency and availability using the combination of software and systems engineering practices. This is a highly specialized discipline which demands knowledge across different systems, networking, coding, database, capacity management, continuous delivery and deployment and open-source cloud-enabling technologies like Kubernetes and OpenStack. SRE at NVIDIA ensures that our internal and external facing GPU cloud services run with maximum reliability and uptime as promised to the users, while at the same time enabling developers to make changes to the existing system through careful preparation and planning, keeping an eye on capacity, latency and performance. SRE is also a mindset and a set of engineering approaches to running better production systems and optimizations. Much of our software development focuses on eliminating manual work through automation, performance tuning and growing efficiency of production systems. As SREs are responsible for the big picture of how our systems relate to each other, we use a breadth of tools and approaches to tackle a broad spectrum of problems. Practices such as limiting time spent on reactive operational work, blameless postmortems and proactive identification of potential outages factor into iterative improvement that is key to both product quality and interesting, dynamic day-to-day work. SRE's culture of diversity, intellectual curiosity, problem solving and openness is important to our success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to build an environment that provides the support and mentorship needed to learn and grow.
What you'll be doing: * Design, implement and support operational and reliability aspects of a large-scale Observability & Telemetry collection platform with a focus on performance at scale, real-time monitoring, logging and alerting. * Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement. * Support services before they go live through activities such as system design consulting, developing software tools, platforms and frameworks, capacity management and launch reviews. * Maintain services once they are live by measuring and monitoring availability, latency and overall system health. * Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity. * Practice sustainable incident response and blameless postmortems. * Be part of an on-call rotation to support production systems. What we need to see: * BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent experience. * 5+ years of experience with infrastructure automation and distributed systems design, and experience designing and developing tools for running large-scale private or public cloud systems in production. * 8+ years of experience delivering foundational infrastructure and observability platforms. * Experience in one or more of the following: Python, Go, Perl or Ruby. * In-depth knowledge of Linux, networking and containers. Ways to stand out from the crowd: * Interest in crafting, analyzing and fixing large-scale distributed systems. * Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive. * Ability to debug and optimize code and automate routine tasks. * Experience in using or running large private and public cloud systems based on Kubernetes, OpenStack and Docker.
* Experience running Grafana, OpenTelemetry, Prometheus, and similar observability-focused tools. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 168,000 USD - 270,250 USD for Level 4, and 208,000 USD - 333,500 USD for Level 5. You will also be eligible for equity and benefits. Applications for this job will be accepted at least until December 29, 2025. NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
    $151k-199k yearly est. Auto-Apply 60d+ ago
  • Senior Site Reliability Engineer - DGX Cloud

    Nvidia 4.9 company rating

    Santa Clara, CA jobs

    Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large-scale production systems with high efficiency and availability using the combination of software and systems engineering practices. This is a highly specialized discipline which demands knowledge across different systems, networking, coding, database, capacity management, continuous delivery and deployment and open-source cloud-enabling technologies like Kubernetes and OpenStack. SRE at NVIDIA ensures that our internal and external facing GPU cloud services run with maximum reliability and uptime as promised to the users, while at the same time enabling developers to make changes to the existing system through careful preparation and planning, keeping an eye on capacity, latency and performance. SRE is also a mindset and a set of engineering approaches to running better production systems and optimizations. Much of our software development focuses on eliminating manual work through automation, performance tuning and growing efficiency of production systems. As SREs are responsible for the big picture of how our systems relate to each other, we use a breadth of tools and approaches to tackle a broad spectrum of problems. Practices such as limiting time spent on reactive operational work, blameless postmortems and proactive identification of potential outages factor into iterative improvement that is key to both product quality and interesting, dynamic day-to-day work. SRE's culture of diversity, intellectual curiosity, problem solving and openness is important to our success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to build an environment that provides the support and mentorship needed to learn and grow.
What you'll be doing: * Design, implement and support operational and reliability aspects of large-scale Kubernetes clusters with a focus on performance at scale, real-time monitoring, logging and alerting. * Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement. * Support services before they go live through activities such as system design consulting, developing software tools, platforms and frameworks, capacity management and launch reviews. * Maintain services once they are live by measuring and monitoring availability, latency and overall system health. * Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity. * Practice sustainable incident response and blameless postmortems. * Be part of an on-call rotation to support production systems. What we need to see: * BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent experience. * 10+ years of experience. * Experience with infrastructure automation and distributed systems design, and experience designing and developing tools for running large-scale private or public cloud systems in production. * Experience in one or more of the following: Python, Go, Perl or Ruby. * In-depth knowledge of Linux, networking and containers. Ways to stand out from the crowd: * Interest in crafting, analyzing and fixing large-scale distributed systems. * Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive. * Ability to debug and optimize code and automate routine tasks. * Experience in using or running large private and public cloud systems based on Kubernetes, OpenStack and Docker. NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hard-working people in the world working for us. Are you creative and autonomous?
Do you love a challenge? If so, we want to hear from you. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 168,000 USD - 270,250 USD for Level 4, and 208,000 USD - 333,500 USD for Level 5. You will also be eligible for equity and benefits. Applications for this job will be accepted at least until December 29, 2025. NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
    $154k-204k yearly est. Auto-Apply 7d ago
  • Senior Site Reliability Engineer, BCM - DGX Cloud

    Nvidia 4.9 company rating

    Santa Clara, CA jobs

    NVIDIA has been redefining computer graphics, PC gaming, and accelerated computing for over 25 years. It's a unique legacy of innovation fueled by great technology and dynamic people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent. NVIDIANS immerse themselves in a diverse, supportive environment that encourages everyone to do their best work. Join the team and see how you can make a lasting impact on the world. NVIDIA Base Command Manager powers thousands of clusters worldwide, varying from a few to several thousands of nodes, and streamlines cluster provisioning, workload management, and infrastructure monitoring. It provides all the tools you need to deploy and run an AI data center. We take great pride in providing excellent, comprehensive support to our customers! The Sr. Site Reliability Engineer in this role will significantly impact and contribute to the overall success of both external customers running their clusters with NVIDIA solutions and internal clusters used for research, operations, and next-generation projects. What you'll be doing: * Contributing to deployments and daily operations of large-scale next-generation GPU platforms. * Handling incidents in GPU clusters, bridging the gap between cluster operations and development. * Designing and implementing small features in the Base Command Manager product to become intimately familiar with the workings of the product. * Validating complex cluster configurations including Slurm and Kubernetes orchestrators for performance, scalability and resilience, ensuring they meet real-world customer scenarios. What we need to see: * Bachelor's Degree or equivalent experience in Computer Science or related field.
* 8+ years of experience in site reliability engineering and/or software development roles. * Fluency in Python. * In-depth knowledge of Linux and networking. Ways to stand out from the crowd: * Experience with C++, high-performance computing, Kubernetes and/or system administration would be an asset. * Previous experience as a system admin running BCM/Bright Cluster Manager/Base Command Manager clusters is a definite plus. * Proficiency with cluster networking including InfiniBand and Spectrum-X. NVIDIA is widely considered one of the world's most desirable employers in technology. We have some of the world's most forward-thinking and passionate people working for us. If you're creative and autonomous, we want to hear from you! Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 168,000 USD - 270,250 USD for Level 4, and 208,000 USD - 333,500 USD for Level 5. You will also be eligible for equity and benefits. Applications for this job will be accepted at least until December 29, 2025. NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
    $154k-204k yearly est. Auto-Apply 7d ago
  • Senior Reliability Methodology Development Engineer

    Nvidia 4.9 company rating

    Santa Clara, CA jobs

    For two decades, NVIDIA has pioneered visual computing, the art and science of computer graphics and data center products. With our invention of the GPU, the engine of innovative visual computing, AI, and robotics, NVIDIA is now passionate about innovation at the intersection of visual processing, high performance computing, and artificial intelligence. Join NVIDIA to work on powerful technology in a forward-thinking environment. Contribute to projects that are groundbreaking in AI and computing. What you'll be doing: As a Reliability Methodology Engineer at NVIDIA, you will be responsible for ensuring our products and systems operate flawlessly. Your key duties will include: * Collaborating with design, product, and test engineering teams to apply DFT methodologies to improve reliability screening specific to HTOL (Component-level High Temp Op. Life Test). * Familiarity with DFR best practices including reliability prediction, derating, and physics of failure modeling. * Developing and implementing reliability test plans and protocols. * Developing and implementing DFR (DFX) strategies early in the product lifecycle to ensure long-term field reliability, especially in critical datacenter/robotics environments. * Conducting failure mode and effects analysis (FMEA) at the product development stage. * Collaborating with cross-functional teams to determine root causes of failures. * Crafting and maintaining detailed documentation of reliability tests and results. * Analyzing data to identify trends and areas for improvement. * Ensuring compliance with industry standards. * Providing technical support and mentorship to other departments. What we need to see: * A bachelor's or higher degree in Electrical Engineering, Computer Engineering or equivalent experience. An MS or PhD in EE is preferable, with experience in Engineering or a related field. * 5+ years of experience.
* Proven experience in reliability engineering, preferably in the technology sector, and with JEDEC (e.g., JESD22, JESD47) and AEC-Q100 standards. * Hands-on experience in HTOL hardware development. * Background in DFT, including techniques for enhanced screening, debugging, and stress pattern development. * Strong analytical and problem-solving skills. * Excellent written and verbal communication abilities. * Proficient with reliability data analysis tools (JMP/MATLAB/Python) and methodologies. With highly competitive salaries and a comprehensive benefits package, NVIDIA is widely considered to be one of the technological world's most desirable employers. We have some of the most brilliant and talented people in the world working for us. Are you creative and autonomous, with a genuine passion for technology? We want to hear from you. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 136,000 USD - 218,500 USD for Level 3, and 168,000 USD - 264,500 USD for Level 4. You will also be eligible for equity and benefits. Applications for this job will be accepted at least until December 15, 2025. NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
    $154k-204k yearly est. Auto-Apply 60d+ ago
  • Reliability Engineer Level 3/4

    Northrop Grumman 4.7 company rating

    Los Angeles, CA jobs

    RELOCATION ASSISTANCE: Relocation assistance may be available. CLEARANCE TYPE: Secret. TRAVEL: Yes, 10% of the time. Description: At Northrop Grumman, our employees have incredible opportunities to work on revolutionary systems that impact people's lives around the world today, and for generations to come. Our pioneering and inventive spirit has enabled us to be at the forefront of many technological advancements in our nation's history - from the first flight across the Atlantic Ocean, to stealth bombers, to landing on the moon. We look for people who have bold new ideas, courage and a pioneering spirit to join forces to invent the future, and have fun along the way. Our culture thrives on intellectual curiosity, cognitive diversity and bringing your whole self to work - and we have an insatiable drive to do what others think is impossible. Our employees are not only part of history, they're making history. Northrop Grumman's Defense Systems is looking for a Reliability Engineer Level 3/4 with the ability to obtain a security clearance to work with our talented Advanced Weapons team located in Northridge, CA.
As an integral part of our multi-discipline engineering team in Advanced Weapons, you will be on the forefront of developing next generation solutions to protect technology for our nation's warfighter. The ideal candidate will be supporting our Advanced Programs for defining and architecting a secured missile weapon system. This position will require close collaboration with team members of other projects within the organization for security architecture designs as well as external partners and customers. This position requires the selected candidate to work on-site in Northridge, CA. Telework is not available for this position. Relocation assistance, although not guaranteed, may be available. What You'll Get to Do: Perform reliability and maintainability allocations and predictions for missile systems. Develop/Analyze Failure Modes Effects and Criticality Analysis (FMECA), Built-In Test (BIT) and Testability analyses. Develop mission success/reliability analyses. Develop/Analyze Fault Tree Analysis (FTA). Support Failure Reporting Analysis and Corrective Action System (FRACAS). Perform Failure Analysis and Risk Assessment. Support BIT Design Verification Testing to evaluate BIT effectiveness. Support Maintainability and Testability Demonstrations. Support Highly Accelerated Life Testing (HALT) and Reliability Growth Testing (RGT). Utilize knowledge of documents such as MIL-HDBK-338, MIL-HDBK-472, MIL-STD-2155 and MIL-HDBK-217 F(N2). Develop trustworthy customer relationships via excellent communication, technical insight and successful execution of programs. Collaborate with Integrated Product Teams (IPT), program management, customers, and suppliers for the reliability, maintainability and testability (RMT) aspects of assigned systems. Support identifying, planning and quoting the RMT tasks for proposals. 
Basic Qualifications Level 3: BS in Electrical Engineering, Aerospace Engineering or related STEM discipline with a minimum of 5 years of relevant experience or MS with 3 years of relevant experience. Demonstrated experience in Reliability and Maintainability disciplines, such as familiarity with MIL-HDBK-338, MIL-HDBK-472, MIL-STD-2155 and MIL-HDBK-217 F(N2). Familiarity with electronic circuit schematics and component failure modes/mechanisms. Clear and effective verbal and written communication skills to be able to interact with team members, peers and leadership within NGDS and the customer community. Innovative mindset with focus on process improvement. U.S. citizen with the ability to obtain a Secret DoD Security Clearance within a reasonable period of time to support program needs. Basic Qualifications Level 4: BS in Electrical Engineering, Aerospace Engineering or related STEM discipline with a minimum of 8 years of relevant experience, or MS with 6 years of relevant experience, or PhD with 4 years relevant experience. Demonstrated experience in Reliability and Maintainability disciplines, such as familiarity with MIL-HDBK-338, MIL-HDBK-472, MIL-STD-2155 and MIL-HDBK-217 F(N2). Familiarity with electronic circuit schematics and component failure modes/mechanisms. Clear and effective verbal and written communication skills to be able to interact with team members, peers and leadership within NGDS and the customer community. Innovative mindset with focus on process improvement. U.S. citizen with the ability to obtain a Secret DoD Security Clearance within a reasonable period of time to support program needs. Preferred Qualifications Level 3/4: Electrical Engineering Degree. Experience with hands-on RMT analysis and modeling skills for electronic and non-electronic systems. Knowledge of reliability techniques including Weibull analysis, Reliability Centered Maintenance, Monte Carlo Simulation and others. 
Experience with Reliability analysis software and modeling tools. Excellent communication and interpersonal skills, and the ability to interface with all levels of employees, management and customers. Knowledge of Reliability Maintainability and Testability disciplines. Ability to read circuit schematics and mechanical drawings. Possess active Secret DoD Security Clearance. Primary Level Salary Range: $100,300.00 - $150,500.00 Secondary Level Salary Range: $124,900.00 - $187,300.00 The above salary range represents a general guideline; however, Northrop Grumman considers a number of factors when determining base salary offers such as the scope and responsibilities of the position and the candidate's experience, education, skills and current market conditions. Depending on the position, employees may be eligible for overtime, shift differential, and a discretionary bonus in addition to base pay. Annual bonuses are designed to reward individual contributions as well as allow employees to share in company results. Employees in Vice President or Director positions may be eligible for Long Term Incentives. In addition, Northrop Grumman provides a variety of benefits including health insurance coverage, life and disability insurance, savings plan, Company paid holidays and paid time off (PTO) for vacation and/or personal business. The application period for the job is estimated to be 20 days from the job posting date. However, this timeline may be shortened or extended depending on business needs and the availability of qualified candidates. Northrop Grumman is an Equal Opportunity Employer, making decisions without regard to race, color, religion, creed, sex, sexual orientation, gender identity, marital status, national origin, age, veteran status, disability, or any other protected class. For our complete EEO and pay transparency statement, please visit *********************************** U.S. 
Citizenship is required for all positions with a government clearance and certain other restricted positions.
    $124.9k-187.3k yearly Auto-Apply 60d+ ago
  • Site Reliability Engineer

    IBM 4.7company rating

    San Jose, CA jobs

    **Introduction** At IBM, work is more than a job - it's a calling: To build. To design. To code. To consult. To think along with clients and sell. To make markets. To invent. To collaborate. Not just to do something better, but to attempt things you've never thought possible. Are you ready to lead in this new era of technology and solve some of the world's most challenging problems? If so, let's talk. **Your role and responsibilities** As a Site Reliability Engineer at IBM, you'll get to work on the systems that are driving the quantum revolution and the AI era. Join our team of creators - the people who help move IBM forward by using their imagination to envision solutions, their curiosity to experiment with new ideas, and their ever-growing skills to help customers make better decisions with greater speed on the most trusted platforms in today's market. IBM has an opening for a Site Reliability Engineer to join our team and help build highly reliable, scalable, and efficient systems. In this role, you will collaborate with development and operations teams to design, implement, and maintain infrastructure that supports mission-critical applications. This is an excellent opportunity for someone with foundational experience who is eager to grow in the SRE discipline. SRE Engineers participate in various aspects of the life cycle, test, and support process, such as: * Design and implement automation for deployment, monitoring, and incident response. * Maintain and improve system reliability, availability, and performance. * Collaborate with software engineers to ensure services are resilient and scalable. * Participate in on-call rotations and incident management. * Contribute to documentation and best practices for reliability engineering. **Required technical and professional expertise** Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience). 
1/2-1 year of experience in software development, systems administration, or DevOps. Proficiency in at least one programming language (Python, Go, Java, or similar). Familiarity with Linux/Unix systems and networking fundamentals. Strong verbal and written communication skills. Passion for technology and engineering. Growth minded, trusted, team focused, courageous, resourceful, and outcome focused. **Preferred technical and professional experience** Experience with containerization (Docker, Kubernetes). Knowledge of CI/CD pipelines and automation tools. Exposure to monitoring and observability tools (Prometheus, Grafana, ELK). Strong analytical and problem-solving skills. IBM is committed to creating a diverse environment and is proud to be an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, caste, genetics, pregnancy, disability, neurodivergence, age, veteran status, or other characteristics. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.
    $97k-135k yearly est. 15d ago
  • Process Engineer III, Senior

    Applied Materials 4.5company rating

    Santa Clara, CA jobs

    Who We Are Applied Materials is a global leader in materials engineering solutions used to produce virtually every new chip and advanced display in the world. We design, build and service cutting-edge equipment that helps our customers manufacture display and semiconductor chips - the brains of devices we use every day. As the foundation of the global electronics industry, Applied enables the exciting technologies that literally connect our world - like AI and IoT. If you want to push the boundaries of materials science and engineering to create next generation technology, join us to deliver material innovation that changes the world. What We Offer Salary: $124,000.00 - $171,000.00 Location: Santa Clara, CA You'll benefit from a supportive work culture that encourages you to learn, develop, and grow your career as you take on challenges and drive innovative solutions for our customers. We empower our team to push the boundaries of what is possible-while learning every day in a supportive leading global company. Visit our Careers website to learn more. At Applied Materials, we care about the health and wellbeing of our employees. We're committed to providing programs and support that encourage personal and professional growth and care for you at work, at home, or wherever you may go. Learn more about our benefits. Process Engineer III, Senior Who We Are Applied Materials is the global leader in materials engineering solutions used to produce virtually every new chip and advanced display in the world. We design, build and service cutting-edge equipment that helps our customers manufacture display and semiconductor chips - the brains of devices we use every day. As the foundation of the global electronics industry, Applied enables the exciting technologies that literally connect our world - like AI and IoT. 
If you want to work beyond the cutting-edge, continuously pushing the boundaries of science and engineering to make possible the next generations of technology, join us to Make Possible a Better Future. What We Offer Location: Onsite; Santa Clara, CA At Applied, we prioritize the well-being of you and your family and encourage you to bring your best self to work. Your happiness, health, and resiliency are at the core of our benefits and wellness programs. Our robust total rewards package makes it easier to take care of your whole self and your whole family. We're committed to providing programs and support that encourage personal and professional growth and care for you at work, at home, or wherever you may go. Learn more about our Benefits. You'll also benefit from a supportive work culture that encourages you to learn, develop and grow your career as you take on challenges and drive innovative solutions for our customers. We empower our team to push the boundaries of what is possible-while learning every day in a supportive leading global company. Visit our Careers website to learn more about careers at Applied. What You'll Do As a Process Engineer, you'll play a critical role in solving our customers' biggest challenges by designing and optimizing display and semiconductor manufacturing processes. You will work on some of the most advanced and challenging technologies in the world, shaping the way process engineering evolves for our business and the semiconductor industry. Process Engineers at Applied enjoy a diverse and exciting work experience-no two days are the same and the challenges are complex and interesting. You will experiment, learn, and collaborate with some of the brightest minds in the semiconductor and display industries, partnering with our globally recognized R&D teams on state-of-the-art research and development projects. 
Role Responsibilities: Design, collect data, analyze, and compile reports on difficult process engineering experiments Perform hardware characterization on a variety of difficult systems Troubleshoot complex problems, perform Root Cause Analysis, and resolve a variety of difficult process engineering issues Measure film properties and interpret data Generate internal documentation for products, presentations, and technical reports Communicate and engage with customers to resolve complex process engineering issues or concerns, under limited supervision Identify, select, and partner with vendors and suppliers Implement new technology, products, and analytical instrumentation Serve as a resource for junior colleagues; lead project teams, as needed Minimum Qualifications: Bachelor's degree in a related field 4-7 years of related experience Demonstrates conceptual and practical expertise in complex problem solving All qualified applicants will receive consideration for employment without regard to race, color, national origin, religion, sex, disability, protected veteran status, or any other characteristics protected by law. Travel is needed. Additional Information Time Type: Full time Employee Type: Assignee / Regular Travel: Yes, 10% of the Time Relocation Eligible: No The salary offered to a selected candidate will be based on multiple factors including location, hire grade, job-related knowledge, skills, experience, and with consideration of internal equity of our current team members. In addition to a comprehensive benefits package, candidates may be eligible for other forms of compensation such as participation in a bonus and a stock award program, as applicable. For all sales roles, the posted salary range is the Target Total Cash (TTC) range for the role, which is the sum of base salary and target bonus amount at 100% goal achievement. Applied Materials is an Equal Opportunity Employer. 
Qualified applicants will receive consideration for employment without regard to race, color, national origin, citizenship, ancestry, religion, creed, sex, sexual orientation, gender identity, age, disability, veteran or military status, or any other basis prohibited by law. In addition, Applied endeavors to make our careers site accessible to all users. If you would like to contact us regarding accessibility of our website or need assistance completing the application process, please contact us via e-mail at Accommodations_****************, or by calling our HR Direct Help Line at ************, option 1, and following the prompts to speak to an HR Advisor. This contact is for accommodation requests only and cannot be used to inquire about the status of applications.
    $124k-171k yearly Auto-Apply 18d ago
  • Customer Reliability Engineer

    Cisco 4.8company rating

    San Jose, CA jobs

    **_This is a fully remote position open to candidates located in the United States with a strong preference for candidates based on the West Coast, with the ability to work in the Pacific Time Zone._** Application window is expected to close on 12/25/2025. However, the job posting may be removed earlier if the position is filled or if a sufficient number of applications are received. **Meet The Team** Isovalent is the company founded by the creators of Cilium and eBPF. Isovalent builds open-source software and enterprise solutions solving networking, security, and observability needs for modern cloud native infrastructure. The flagship technology, Cilium, is the choice of leading global organizations, including Adobe, AWS, Capital One, Datadog, GitLab, Google, and many more. **Your Impact** As a Technical Consulting Engineer, you are the tip of the spear in interacting with our customers. Our CRE team adapts the best practices of Site Reliability Engineering (SRE) and applies them to our customers. As part of the role, you will gain a deep understanding of our customers, their architecture down into their various configurations. The main mission of this role is to ensure that our customers can continue running Isovalent networking for Kubernetes, reliably, at scale. You will work with various stakeholders, internally and externally to provide world class support and issue resolution to various incidents and enhance our organization's view into the health of our various customers. This role takes a proactive approach vs a reactive approach to customer reliability and you will use existing data to help us and our customers be aware of upcoming reliability risks. **Minimum Qualifications** + Minimum of 5 years hands-on experience managing and scaling advanced Kubernetes clusters in production environments. + Minimum of 2 years of hands-on experience configuring and managing Cisco Nexus switches in production environments. 
+ Demonstrated expertise in networking concepts and technologies across OSI layers 2 through 7. **Preferred Qualifications** + At least 2 years of direct experience supporting and engaging with enterprise customers in a technical capacity. + Experience resolving issues with Kubernetes and cloud-native technologies at a large production scale. + Extensive knowledge of Customer Reliability Engineering (CRE) practices, including Production Readiness Reviews (PRRs), Customer Test Environments (CuTEs), tooling, monitoring, knowledge base creation, and retrospectives. + Extensive experience with at least one major cloud provider (AWS, Azure, or GCP). + Strong familiarity with standard methodologies in operating system security and their application in cloud-native technologies. + Dedication to teamwork through the creation of valuable content for customers and internal team members. + Emphasis on using automation technologies for efficiency in handling repetitive tasks. **Why Cisco?** At Cisco, we're revolutionizing how data and infrastructure connect and protect organizations in the AI era - and beyond. We've been innovating fearlessly for 40 years to create solutions that power how humans and technology work together across the physical and digital worlds. These solutions provide customers with unparalleled security, visibility, and insights across the entire digital footprint. Fueled by the depth and breadth of our technology, we experiment and create meaningful solutions. Add to that our worldwide network of doers and experts, and you'll see that the opportunities to grow and build are limitless. We work as a team, collaborating with empathy to make really big things happen on a global scale. Because our solutions are everywhere, our impact is everywhere. We are Cisco, and our power starts with you. **Message to applicants applying to work in the U.S. 
and/or Canada:** The starting salary range posted for this position is $158,200.00 to $200,700.00 and reflects the projected salary range for new hires in this position in U.S. and/or Canada locations, not including incentive compensation*, equity, or benefits. Individual pay is determined by the candidate's hiring location, market conditions, job-related skillset, experience, qualifications, education, certifications, and/or training. The full salary range for certain locations is listed below. For locations not listed below, the recruiter can share more details about compensation for the role in your location during the hiring process. U.S. employees are offered benefits, subject to Cisco's plan eligibility rules, which include medical, dental and vision insurance, a 401(k) plan with a Cisco matching contribution, paid parental leave, short and long-term disability coverage, and basic life insurance. Please see the Cisco careers site to discover more benefits and perks. Employees may be eligible to receive grants of Cisco restricted stock units, which vest following continued employment with Cisco for defined periods of time. U.S. 
employees are eligible for paid time away as described below, subject to Cisco's policies: + 10 paid holidays per full calendar year, plus 1 floating holiday for non-exempt employees + 1 paid day off for employee's birthday, paid year-end holiday shutdown, and 4 paid days off for personal wellness determined by Cisco + Non-exempt employees** receive 16 days of paid vacation time per full calendar year, accrued at rate of 4.92 hours per pay period for full-time employees + Exempt employees participate in Cisco's flexible vacation time off program, which has no defined limit on how much vacation time eligible employees may use (subject to availability and some business limitations) + 80 hours of sick time off provided on hire date and each January 1st thereafter, and up to 80 hours of unused sick time carried forward from one calendar year to the next + Additional paid time away may be requested to deal with critical or emergency issues for family members + Optional 10 paid days per full calendar year to volunteer For non-sales roles, employees are also eligible to earn annual bonuses subject to Cisco's policies. Employees on sales plans earn performance-based incentive pay on top of their base salary, which is split between quota and non-quota components, subject to the applicable Cisco plan. For quota-based incentive pay, Cisco typically pays as follows: + .75% of incentive target for each 1% of revenue attainment up to 50% of quota; + 1.5% of incentive target for each 1% of attainment between 50% and 75%; + 1% of incentive target for each 1% of attainment between 75% and 100%; and + Once performance exceeds 100% attainment, incentive rates are at or above 1% for each 1% of attainment with no cap on incentive compensation. For non-quota-based sales performance elements such as strategic sales objectives, Cisco may pay 0% up to 125% of target. Cisco sales plans do not have a minimum threshold of performance for sales incentive compensation to be paid. 
The applicable full salary ranges for this position, by specific state, are listed below: New York City Metro Area: $158,200.00 - $241,700.00 Non-Metro New York state & Washington state: $140,600.00 - $241,800.00 * For quota-based sales roles on Cisco's sales plan, the ranges provided in this posting include base pay and sales target incentive compensation combined. ** Employees in Illinois, whether exempt or non-exempt, will participate in a unique time off program to meet local requirements. Cisco is an Affirmative Action and Equal Opportunity Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis. Cisco will consider for employment, on a case by case basis, qualified applicants with arrest and conviction records.
    $158.2k-241.7k yearly 26d ago
  • Customer Reliability Engineer

    Cisco Systems, Inc. 4.8company rating

    San Jose, CA jobs

    This is a fully remote position open to candidates located in the United States with a strong preference for candidates based on the West Coast, with the ability to work in the Pacific Time Zone. Application window is expected to close on 12/25/2025. However, the job posting may be removed earlier if the position is filled or if a sufficient number of applications are received. Meet The Team Isovalent is the company founded by the creators of Cilium and eBPF. Isovalent builds open-source software and enterprise solutions solving networking, security, and observability needs for modern cloud native infrastructure. The flagship technology, Cilium, is the choice of leading global organizations, including Adobe, AWS, Capital One, Datadog, GitLab, Google, and many more. Your Impact As a Technical Consulting Engineer, you are the tip of the spear in interacting with our customers. Our CRE team adapts the best practices of Site Reliability Engineering (SRE) and applies them to our customers. As part of the role, you will gain a deep understanding of our customers, their architecture down into their various configurations. The main mission of this role is to ensure that our customers can continue running Isovalent networking for Kubernetes, reliably, at scale. You will work with various stakeholders, internally and externally to provide world class support and issue resolution to various incidents and enhance our organization's view into the health of our various customers. This role takes a proactive approach vs a reactive approach to customer reliability and you will use existing data to help us and our customers be aware of upcoming reliability risks. Minimum Qualifications * Minimum of 5 years hands-on experience managing and scaling advanced Kubernetes clusters in production environments. * Minimum of 2 years of hands-on experience configuring and managing Cisco Nexus switches in production environments. 
* Demonstrated expertise in networking concepts and technologies across OSI layers 2 through 7. Preferred Qualifications * At least 2 years of direct experience supporting and engaging with enterprise customers in a technical capacity. * Experience resolving issues with Kubernetes and cloud-native technologies at a large production scale. * Extensive knowledge of Customer Reliability Engineering (CRE) practices, including Production Readiness Reviews (PRRs), Customer Test Environments (CuTEs), tooling, monitoring, knowledge base creation, and retrospectives. * Extensive experience with at least one major cloud provider (AWS, Azure, or GCP). * Strong familiarity with standard methodologies in operating system security and their application in cloud-native technologies. * Dedication to teamwork through the creation of valuable content for customers and internal team members. * Emphasis on using automation technologies for efficiency in handling repetitive tasks. Why Cisco? At Cisco, we're revolutionizing how data and infrastructure connect and protect organizations in the AI era - and beyond. We've been innovating fearlessly for 40 years to create solutions that power how humans and technology work together across the physical and digital worlds. These solutions provide customers with unparalleled security, visibility, and insights across the entire digital footprint. Fueled by the depth and breadth of our technology, we experiment and create meaningful solutions. Add to that our worldwide network of doers and experts, and you'll see that the opportunities to grow and build are limitless. We work as a team, collaborating with empathy to make really big things happen on a global scale. Because our solutions are everywhere, our impact is everywhere. We are Cisco, and our power starts with you. Message to applicants applying to work in the U.S. 
and/or Canada: The starting salary range posted for this position is $158,200.00 to $200,700.00 and reflects the projected salary range for new hires in this position in U.S. and/or Canada locations, not including incentive compensation*, equity, or benefits. Individual pay is determined by the candidate's hiring location, market conditions, job-related skillset, experience, qualifications, education, certifications, and/or training. The full salary range for certain locations is listed below. For locations not listed below, the recruiter can share more details about compensation for the role in your location during the hiring process. U.S. employees are offered benefits, subject to Cisco's plan eligibility rules, which include medical, dental and vision insurance, a 401(k) plan with a Cisco matching contribution, paid parental leave, short and long-term disability coverage, and basic life insurance. Please see the Cisco careers site to discover more benefits and perks. Employees may be eligible to receive grants of Cisco restricted stock units, which vest following continued employment with Cisco for defined periods of time. U.S. 
employees are eligible for paid time away as described below, subject to Cisco's policies: * 10 paid holidays per full calendar year, plus 1 floating holiday for non-exempt employees * 1 paid day off for employee's birthday, paid year-end holiday shutdown, and 4 paid days off for personal wellness determined by Cisco * Non-exempt employees receive 16 days of paid vacation time per full calendar year, accrued at rate of 4.92 hours per pay period for full-time employees * Exempt employees participate in Cisco's flexible vacation time off program, which has no defined limit on how much vacation time eligible employees may use (subject to availability and some business limitations) * 80 hours of sick time off provided on hire date and each January 1st thereafter, and up to 80 hours of unused sick time carried forward from one calendar year to the next * Additional paid time away may be requested to deal with critical or emergency issues for family members * Optional 10 paid days per full calendar year to volunteer For non-sales roles, employees are also eligible to earn annual bonuses subject to Cisco's policies. Employees on sales plans earn performance-based incentive pay on top of their base salary, which is split between quota and non-quota components, subject to the applicable Cisco plan. For quota-based incentive pay, Cisco typically pays as follows: * .75% of incentive target for each 1% of revenue attainment up to 50% of quota; * 1.5% of incentive target for each 1% of attainment between 50% and 75%; * 1% of incentive target for each 1% of attainment between 75% and 100%; and * Once performance exceeds 100% attainment, incentive rates are at or above 1% for each 1% of attainment with no cap on incentive compensation. For non-quota-based sales performance elements such as strategic sales objectives, Cisco may pay 0% up to 125% of target. Cisco sales plans do not have a minimum threshold of performance for sales incentive compensation to be paid. 
The applicable full salary ranges for this position, by specific state, are listed below: New York City Metro Area: $158,200.00 - $241,700.00 Non-Metro New York state & Washington state: $140,600.00 - $241,800.00 * For quota-based sales roles on Cisco's sales plan, the ranges provided in this posting include base pay and sales target incentive compensation combined. Employees in Illinois, whether exempt or non-exempt, will participate in a unique time off program to meet local requirements.
    $158.2k-241.7k yearly 26d ago
  • Reliability Engineer

    Meta Platforms, Inc. 4.8company rating

    Sunnyvale, CA jobs

    As a Reliability Engineer in Meta Reality Labs, you will take a critical role in bringing reliable new AI-native augmented/virtual reality and wearable products. You will collaborate with a large breadth of cross-functional disciplines to understand emerging designs and technologies. You will be responsible for identifying risks associated with these various technologies and architectures. You will be responsible for identifying appropriate stresses that our products must pass and creating test plans which will be used to assess the reliability of our products. You will be responsible for producing accurate reliability analyses in a timely manner. You will be responsible for providing suggestions to improve the reliability of our products. We are looking for someone who can wear multiple hats depending on the task at hand, has a "can do" attitude with a demonstrated background in reliability engineering, high attention to detail, thirst for knowledge, and an inherent interest in all aspects of engineering. Minimum Qualifications * Currently has, or is in the process of obtaining a Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience. 
Degree must be completed prior to joining Meta * At least 3 years of experience in reliability engineering or related field * Knowledge of mechanical design, material science, and failure mechanisms of consumer electronics * Experience developing and performing reliability tests including (but not limited to) mechanical, shock, vibration, environmental, and characterization testing * Knowledge of industry test standards including ASTM, MIL, JEDEC, and IEC * Experience with failure analysis techniques and tools (but not limited to) optical microscopy, X-Ray, CT scan, SEM, EDX, and FTIR * Ability to manage multiple tasks in a dynamic environment Preferred Qualifications * Experience working on opto-mechanical systems and modules Responsibilities * Develop reliability requirements at system and module level for consumer facing hardware * Lead stress characterization of appropriate usage conditions * Lead and document FMEA (Failure Mode Effects Analysis) * Develop acceleration models and accelerated tests * Develop new reliability test specifications and procedures * Develop appropriate test plans * Apply design of experiments (DOE) * In-depth research into applicable failure mechanisms * Contribute to failure analysis and use correct methodology to identify root cause of failures * Manage reliability testing and analysis through product lifecycle * Provide accurate and in-depth statistical analysis * Prepare detailed and concise reliability reports * Collaborate with cross-functional teams to implement mitigations for reliability risks * International travel, up to 20% About Meta Meta builds technologies that help people connect, find communities, and grow businesses. When Facebook launched in 2004, it changed the way people connect. Apps like Messenger, Instagram and WhatsApp further empowered billions around the world. 
Now, Meta is moving beyond 2D screens toward immersive experiences like augmented and virtual reality to help build the next evolution in social technology. People who choose to build their careers by building with us at Meta help shape a future that will take us beyond what digital connection makes possible today-beyond the constraints of screens, the limits of distance, and even the rules of physics. Equal Employment Opportunity Meta is proud to be an Equal Employment Opportunity employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, reproductive health decisions, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, political views or activity, or other applicable legally protected characteristics. You may view our Equal Employment Opportunity notice here. Meta is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans in our job application procedures. If you need assistance or an accommodation due to a disability, fill out the Accommodations request form.
    $161k-208k yearly est. 7d ago
  • Reliability Engineer

    Meta Platforms, Inc. 4.8company rating

    Sunnyvale, CA jobs

    As a Reliability Engineer in Meta Reality Labs, you will take a critical role in bringing reliable new AI-native augmented/virtual reality and wearable products. You will collaborate with a large breadth of cross-functional disciplines to understand emerging designs and technologies. You will be responsible for identifying risks associated with these various technologies and architectures. You will be responsible for identifying appropriate stresses that our products must pass and creating test plans which will be used to assess the reliability of our products. You will be responsible for producing accurate reliability analyses in a timely manner. You will be responsible for providing suggestions to improve the reliability of our products. We are looking for someone who can wear multiple hats depending on the task at hand, has a "can do" attitude with a demonstrated background in reliability engineering, high attention to detail, thirst for knowledge, and an inherent interest in all aspects of engineering. Minimum Qualifications * Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience * Currently has, or is in the process of obtaining a Bachelor's degree in Mechanical Engineering, Materials Engineering, Electrical Engineering, or a related technical field (or equivalent practical experience). 
Degree must be completed prior to joining Meta * At least 5 years of experience in reliability engineering or related field * Knowledge of mechanical design, material science, and failure mechanisms of consumer electronics * Experience in shipping products and familiarity with PDP cycles * Experience developing and performing reliability tests including (but not limited to) mechanical, shock, vibration, environmental, and characterization testing * Knowledge of industry test standards including ASTM, MIL, JEDEC, and IEC * Experience with failure analysis techniques and tools (but not limited to) optical microscopy, X-Ray, CT scan, SEM, EDX, and FTIR * Experience managing multiple tasks in a dynamic environment Preferred Qualifications * Experience working on opto-mechanical systems and modules Responsibilities * Develop reliability requirements at system and module level for consumer facing hardware * Lead stress characterization of appropriate usage conditions * Lead and document FMEA (Failure Mode Effects Analysis) * Develop acceleration models and accelerated tests * Develop new reliability test specifications and procedures * Develop appropriate test plans * Apply design of experiments (DOE) * In-depth research into applicable failure mechanisms * Contribute to failure analysis and use correct methodology to identify root cause of failures * Manage reliability testing and analysis through product lifecycle * Provide accurate and in-depth statistical analysis * Prepare detailed and concise reliability reports * Collaborate with cross-functional teams to implement mitigations for reliability risks * International travel, up to 20% About Meta Meta builds technologies that help people connect, find communities, and grow businesses. When Facebook launched in 2004, it changed the way people connect. Apps like Messenger, Instagram and WhatsApp further empowered billions around the world. 
Now, Meta is moving beyond 2D screens toward immersive experiences like augmented and virtual reality to help build the next evolution in social technology. People who choose to build their careers by building with us at Meta help shape a future that will take us beyond what digital connection makes possible today-beyond the constraints of screens, the limits of distance, and even the rules of physics. Equal Employment Opportunity Meta is proud to be an Equal Employment Opportunity employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, reproductive health decisions, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, political views or activity, or other applicable legally protected characteristics. You may view our Equal Employment Opportunity notice here. Meta is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans in our job application procedures. If you need assistance or an accommodation due to a disability, fill out the Accommodations request form.
    $161k-208k yearly est. 14d ago
  • ASIC Reliability Engineer

    Cisco 4.8company rating

    San Francisco, CA jobs

    The application window is expected to close on: January 13, 2026. NOTE: Job posting may be removed earlier if the position is filled or if a sufficient number of applications are received. Candidate must be local to the San Jose, CA area as this is a hybrid role. **Meet the Team** Cisco Silicon One ASICs are transforming the Future of the Internet, and the unified, power-efficient architecture is rapidly gaining traction in the market. The Silicon One Productization team enables technology and supplier readiness and takes these cutting-edge ASICs to production, and we are looking for an experienced Reliability Engineer to join the Quality and Reliability organization. Join an exciting team that works in a fast-paced, start-up type environment to establish robust processes to scale Cisco's Silicon Components business. **Your Impact** As a Product Reliability Engineer, you will play a critical role in ensuring our products meet the highest standards of quality and long-term performance. You will lead end-to-end product qualification efforts, applying your semiconductor engineering expertise to validate new designs, processes, and technologies. In this highly visible role, you'll partner with teams across design, manufacturing, suppliers, and leadership - communicating clearly, navigating complex technical challenges, and driving informed decision-making. Your ability to analyze data, resolve issues, and influence stakeholders will directly strengthen product reliability and customer trust. + Own and execute reliability test plans for new products, from sample testing through full qualification, including coordination with internal and external labs. + Assess IC, package, and process reliability to industry standards, performing key evaluations such as HTOL, HAST, ESD/LU, and supporting detailed debug activities. + Lead cross-functional failure analysis and structured problem-solving (8D, 5-Whys, cause-and-effect) to drive issue resolution and long-term prevention. 
+ Analyze reliability data, trends, and acceleration models to evaluate field readiness, identify risks, and support qualification of new processes and package technologies. + Deliver clear qualification reports, collaborate closely with suppliers and engineering partners, and plan reliability resources to support both new product releases and ongoing monitoring programs. **Minimum Qualifications** + Bachelor's degree in Electrical Engineering or a related field. + 5+ years of experience in semiconductor product reliability, manufacturing/assembly, test processes, supply chain, and failure analysis. + Strong understanding of semiconductor manufacturing, IC test methodologies, device reliability, and common silicon/package failure mechanisms. + Hands-on experience with accelerated life testing, reliability estimation, and reliability testing of semiconductor components and modules. + Proficiency in statistical data analysis, ATE data interpretation, electrical characterization, IC failure analysis, and cross-functional collaboration to analyze yield loss and device failures. **Why Cisco?** At Cisco, we're revolutionizing how data and infrastructure connect and protect organizations in the AI era - and beyond. We've been innovating fearlessly for 40 years to create solutions that power how humans and technology work together across the physical and digital worlds. These solutions provide customers with unparalleled security, visibility, and insights across the entire digital footprint. Fueled by the depth and breadth of our technology, we experiment and create meaningful solutions. Add to that our worldwide network of doers and experts, and you'll see that the opportunities to grow and build are limitless. We work as a team, collaborating with empathy to make really big things happen on a global scale. Because our solutions are everywhere, our impact is everywhere. We are Cisco, and our power starts with you. **Message to applicants applying to work in the U.S. 
and/or Canada:** The starting salary range posted for this position is $122,200.00 to $154,700.00 and reflects the projected salary range for new hires in this position in U.S. and/or Canada locations, not including incentive compensation*, equity, or benefits. Individual pay is determined by the candidate's hiring location, market conditions, job-related skillset, experience, qualifications, education, certifications, and/or training. The full salary range for certain locations is listed below. For locations not listed below, the recruiter can share more details about compensation for the role in your location during the hiring process. U.S. employees are offered benefits, subject to Cisco's plan eligibility rules, which include medical, dental and vision insurance, a 401(k) plan with a Cisco matching contribution, paid parental leave, short and long-term disability coverage, and basic life insurance. Please see the Cisco careers site to discover more benefits and perks. Employees may be eligible to receive grants of Cisco restricted stock units, which vest following continued employment with Cisco for defined periods of time. U.S. 
employees are eligible for paid time away as described below, subject to Cisco's policies: + 10 paid holidays per full calendar year, plus 1 floating holiday for non-exempt employees + 1 paid day off for employee's birthday, paid year-end holiday shutdown, and 4 paid days off for personal wellness determined by Cisco + Non-exempt employees** receive 16 days of paid vacation time per full calendar year, accrued at rate of 4.92 hours per pay period for full-time employees + Exempt employees participate in Cisco's flexible vacation time off program, which has no defined limit on how much vacation time eligible employees may use (subject to availability and some business limitations) + 80 hours of sick time off provided on hire date and each January 1st thereafter, and up to 80 hours of unused sick time carried forward from one calendar year to the next + Additional paid time away may be requested to deal with critical or emergency issues for family members + Optional 10 paid days per full calendar year to volunteer For non-sales roles, employees are also eligible to earn annual bonuses subject to Cisco's policies. Employees on sales plans earn performance-based incentive pay on top of their base salary, which is split between quota and non-quota components, subject to the applicable Cisco plan. For quota-based incentive pay, Cisco typically pays as follows: + .75% of incentive target for each 1% of revenue attainment up to 50% of quota; + 1.5% of incentive target for each 1% of attainment between 50% and 75%; + 1% of incentive target for each 1% of attainment between 75% and 100%; and + Once performance exceeds 100% attainment, incentive rates are at or above 1% for each 1% of attainment with no cap on incentive compensation. For non-quota-based sales performance elements such as strategic sales objectives, Cisco may pay 0% up to 125% of target. Cisco sales plans do not have a minimum threshold of performance for sales incentive compensation to be paid. 
The applicable full salary ranges for this position, by specific state, are listed below: New York City Metro Area: $122,200.00 - $177,900.00 Non-Metro New York state & Washington state: $108,700.00 - $158,400.00 * For quota-based sales roles on Cisco's sales plan, the ranges provided in this posting include base pay and sales target incentive compensation combined. ** Employees in Illinois, whether exempt or non-exempt, will participate in a unique time off program to meet local requirements. Cisco is an Affirmative Action and Equal Opportunity Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis. Cisco will consider for employment, on a case by case basis, qualified applicants with arrest and conviction records.
    $122.2k-177.9k yearly 19d ago
  • ASIC Reliability Engineer

    Cisco Systems, Inc. 4.8company rating

    San Francisco, CA jobs

    The application window is expected to close on: January 13, 2026. NOTE: Job posting may be removed earlier if the position is filled or if a sufficient number of applications are received. Candidate must be local to the San Jose, CA area as this is a hybrid role. Meet the Team Cisco Silicon One ASICs are transforming the Future of the Internet, and the unified, power-efficient architecture is rapidly gaining traction in the market. The Silicon One Productization team enables technology and supplier readiness and takes these cutting-edge ASICs to production, and we are looking for an experienced Reliability Engineer to join the Quality and Reliability organization. Join an exciting team that works in a fast-paced, start-up type environment to establish robust processes to scale Cisco's Silicon Components business. Your Impact As a Product Reliability Engineer, you will play a critical role in ensuring our products meet the highest standards of quality and long-term performance. You will lead end-to-end product qualification efforts, applying your semiconductor engineering expertise to validate new designs, processes, and technologies. In this highly visible role, you'll partner with teams across design, manufacturing, suppliers, and leadership - communicating clearly, navigating complex technical challenges, and driving informed decision-making. Your ability to analyze data, resolve issues, and influence stakeholders will directly strengthen product reliability and customer trust. * Own and execute reliability test plans for new products, from sample testing through full qualification, including coordination with internal and external labs. * Assess IC, package, and process reliability to industry standards, performing key evaluations such as HTOL, HAST, ESD/LU, and supporting detailed debug activities. * Lead cross-functional failure analysis and structured problem-solving (8D, 5-Whys, cause-and-effect) to drive issue resolution and long-term prevention. 
* Analyze reliability data, trends, and acceleration models to evaluate field readiness, identify risks, and support qualification of new processes and package technologies. * Deliver clear qualification reports, collaborate closely with suppliers and engineering partners, and plan reliability resources to support both new product releases and ongoing monitoring programs. Minimum Qualifications * Bachelor's degree in Electrical Engineering or a related field. * 5+ years of experience in semiconductor product reliability, manufacturing/assembly, test processes, supply chain, and failure analysis. * Strong understanding of semiconductor manufacturing, IC test methodologies, device reliability, and common silicon/package failure mechanisms. * Hands-on experience with accelerated life testing, reliability estimation, and reliability testing of semiconductor components and modules. * Proficiency in statistical data analysis, ATE data interpretation, electrical characterization, IC failure analysis, and cross-functional collaboration to analyze yield loss and device failures. Why Cisco? At Cisco, we're revolutionizing how data and infrastructure connect and protect organizations in the AI era - and beyond. We've been innovating fearlessly for 40 years to create solutions that power how humans and technology work together across the physical and digital worlds. These solutions provide customers with unparalleled security, visibility, and insights across the entire digital footprint. Fueled by the depth and breadth of our technology, we experiment and create meaningful solutions. Add to that our worldwide network of doers and experts, and you'll see that the opportunities to grow and build are limitless. We work as a team, collaborating with empathy to make really big things happen on a global scale. Because our solutions are everywhere, our impact is everywhere. We are Cisco, and our power starts with you. Message to applicants applying to work in the U.S. 
and/or Canada: The starting salary range posted for this position is $122,200.00 to $154,700.00 and reflects the projected salary range for new hires in this position in U.S. and/or Canada locations, not including incentive compensation*, equity, or benefits. Individual pay is determined by the candidate's hiring location, market conditions, job-related skillset, experience, qualifications, education, certifications, and/or training. The full salary range for certain locations is listed below. For locations not listed below, the recruiter can share more details about compensation for the role in your location during the hiring process. U.S. employees are offered benefits, subject to Cisco's plan eligibility rules, which include medical, dental and vision insurance, a 401(k) plan with a Cisco matching contribution, paid parental leave, short and long-term disability coverage, and basic life insurance. Please see the Cisco careers site to discover more benefits and perks. Employees may be eligible to receive grants of Cisco restricted stock units, which vest following continued employment with Cisco for defined periods of time. U.S. 
employees are eligible for paid time away as described below, subject to Cisco's policies: * 10 paid holidays per full calendar year, plus 1 floating holiday for non-exempt employees * 1 paid day off for employee's birthday, paid year-end holiday shutdown, and 4 paid days off for personal wellness determined by Cisco * Non-exempt employees receive 16 days of paid vacation time per full calendar year, accrued at rate of 4.92 hours per pay period for full-time employees * Exempt employees participate in Cisco's flexible vacation time off program, which has no defined limit on how much vacation time eligible employees may use (subject to availability and some business limitations) * 80 hours of sick time off provided on hire date and each January 1st thereafter, and up to 80 hours of unused sick time carried forward from one calendar year to the next * Additional paid time away may be requested to deal with critical or emergency issues for family members * Optional 10 paid days per full calendar year to volunteer For non-sales roles, employees are also eligible to earn annual bonuses subject to Cisco's policies. Employees on sales plans earn performance-based incentive pay on top of their base salary, which is split between quota and non-quota components, subject to the applicable Cisco plan. For quota-based incentive pay, Cisco typically pays as follows: * .75% of incentive target for each 1% of revenue attainment up to 50% of quota; * 1.5% of incentive target for each 1% of attainment between 50% and 75%; * 1% of incentive target for each 1% of attainment between 75% and 100%; and * Once performance exceeds 100% attainment, incentive rates are at or above 1% for each 1% of attainment with no cap on incentive compensation. For non-quota-based sales performance elements such as strategic sales objectives, Cisco may pay 0% up to 125% of target. Cisco sales plans do not have a minimum threshold of performance for sales incentive compensation to be paid. 
The applicable full salary ranges for this position, by specific state, are listed below: New York City Metro Area: $122,200.00 - $177,900.00 Non-Metro New York state & Washington state: $108,700.00 - $158,400.00 * For quota-based sales roles on Cisco's sales plan, the ranges provided in this posting include base pay and sales target incentive compensation combined. Employees in Illinois, whether exempt or non-exempt, will participate in a unique time off program to meet local requirements.
    $122.2k-177.9k yearly 19d ago
  • Principal Staff Site Reliability Engineer

    Nvidia 4.9company rating

    Santa Clara, CA jobs

    NVIDIA has been reinventing computer graphics, PC gaming, and accelerated computing for 30 years. It is a unique legacy of innovation that's fueled by great technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, generative AI, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent. As an NVIDIAN, you'll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. We are seeking a highly skilled Principal Staff SRE to join our dynamic team. Our company is at the forefront of technological innovation, and we are dedicated to driving efficiency and optimizing the performance of our infrastructure both on-prem and cloud. Join us in this exciting endeavor! What You Will Be Doing: * Lead initiatives to transform IT Compute Core Team architecture to build new service offerings across On-Prem and Cloud. * You will design, scale, and deploy core infrastructure services including DNS, NTP/PTP, DHCP, and LDAP. This includes building for performance and reliability at global scale, covering automation, monitoring, high availability, capacity planning, and lifecycle management. * Define and implement metrics to measure the efficiency of services and drive efficiency with software and hardware optimizations (SR-IOV/DPU). * Experience with technologies like eBPF and XDP for Observability & DDoS mitigation. * Collect and review system data for capacity and planning purposes, analyze capacity data and develop plans for appropriate level enterprise-wide systems, and coordinate with management personnel in implementing changes. * Develop and maintain tools for collecting, analyzing, and visualizing data for reporting, alerting, and monitoring. 
* Collaborate with NVIDIA leadership, senior engineers, program managers, and product managers to develop compelling IT products and services that meet customer needs. What We Need To See: * Bachelor's degree in Engineering, Computer Science, Mathematics, or related field, or equivalent experience. * 15+ years of proven experience in compute platform engineering with a focus on automation. * Experience in designing and deploying Containerization architectures and Distributed Systems Infrastructure. * Proven experience evaluating existing application architectures and identifying opportunities for containerization to improve scalability, reliability, and efficiency. * Strong analytical skills with the ability to define and track key performance metrics. * Experience in developing tools for data analysis and performance profiling, and development with Terraform and Config Management tools. * Proficiency in programming languages such as Go and/or Python. * Linux OS Proficiency with Kernel Internals. * Experience with running large environments consisting of BareMetal Build Infrastructure. * Understanding of Network Protocols and Architectures (VLAN/VxLAN/SDN/BGP/Anycast). Ways To Stand Out From The Crowd: * Deep understanding of other infrastructure components like DNS, LDAP, Security Tools, etc. * Hands-on experience with containers and their implementation. * Deploying and Managing Services like DNS and LDAP at scale. * Solid understanding of microservices architecture, infrastructure as code (IaC) and configuration management tools. NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and passionate people on the planet working for us. If you're creative and autonomous, we want to hear from you! #LI-Hybrid Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 248,000 USD - 391,000 USD. You will also be eligible for equity and benefits. 
Applications for this job will be accepted at least until August 24, 2025. NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
    $135k-180k yearly est. Auto-Apply 60d+ ago
  • Staff Site Reliability Engineer

    Nvidia 4.9company rating

    Santa Clara, CA jobs

    Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large scale production systems with high efficiency and availability using a combination of software and systems engineering practices. This is a highly specialized discipline that demands knowledge across different systems, networking, coding, database, capacity management, continuous delivery and deployment, open source cloud enabling technologies like Kubernetes and Public Cloud. SRE at NVIDIA ensures that our internal and external facing services run with maximum reliability and uptime as promised to the users, while at the same time enabling developers to make changes to the existing system through careful preparation and planning while keeping an eye on capacity, latency and performance. SRE is also a mindset and a set of engineering approaches to running better production systems and optimizations. Much of our software development focuses on building components to eliminate manual work through automation, performance tuning and growing efficiency of production systems. As SREs are responsible for the big picture of how our systems relate to each other, we use a breadth of tools and approaches to tackle a broad spectrum of problems. Practices such as limiting time spent on reactive operational work, blameless postmortems and proactive identification of potential outages factor into iterative improvement that is key to both product quality and interesting dynamic day-to-day work. SRE's culture of diversity, intellectual curiosity, problem solving and openness is important to our success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to build an environment that provides the support and mentorship needed to learn and grow. 
What you'll be doing: * Lead the technical strategy and roadmap for large-scale, cross-functional SRE initiatives that improve reliability, scalability, and developer productivity across enterprise systems. * Design, and build resilient distributed systems that power NVIDIA's next-generation AI-driven enterprise products and services. * Drive automation and observability improvements, using metrics and analytics to enhance performance, reliability, and efficiency. * Collaborate across Cloud, Platform, Security, and AI/ML teams to implement modern SRE components that ensure high availability and secure operations. * Analyze and troubleshoot complex systems, championing best practices in system design, incident management, and postmortem analysis. * Mentor and influence engineers across teams, fostering technical excellence and a culture of reliability engineering. What we need to see: * 10+ years of experience in Site Reliability Engineering, Platform Engineering, or Cloud Architect roles. * BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent experience * Strong proficiency in programming languages such as Python, Typescript, JavaScript, or Go, with a focus on automation and infrastructure-as-code. * Experience with infrastructure-as-code such as AWS CDK, AWS CloudFormation, Terraform or CrossPlane * Solid understanding of OpenTelemetry or other Observability implementation at scale. * Deep expertise in systems architecture, networking, Kubernetes, and public cloud services (AWS, Azure, or GCP). * Outstanding problem-solving, communication, and teamwork skills, with the ability to influence across technical and interpersonal boundaries. Ways to stand out from the crowd: * Passion for and experience with Public Cloud or large-scale automation systems. * Demonstrated ability to drive technical strategy and deliver measurable reliability outcomes in complex environments. 
* A strong sense of ownership, curiosity, and innovation-you thrive in ambiguity and turn challenges into opportunities. NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables outstanding creativity and discovery, and powers what were once science fiction inventions from artificial intelligence to autonomous cars. NVIDIA is looking for exceptional people like you to help us accelerate the next wave of artificial intelligence. #LI-Hybrid Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 168,000 USD - 264,500 USD for Level 4, and 200,000 USD - 322,000 USD for Level 5. You will also be eligible for equity and benefits. Applications for this job will be accepted at least until November 4, 2025. NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
    $135k-180k yearly est. Auto-Apply 19d ago
  • Staff Site Reliability Engineer

    Nvidia 4.9company rating

    California jobs

    Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large scale production systems with high efficiency and availability using a combination of software and systems engineering practices. This is a highly specialized discipline that demands knowledge across different systems, networking, coding, database, capacity management, continuous delivery and deployment, open source cloud enabling technologies like Kubernetes and Public Cloud. SRE at NVIDIA ensures that our internal and external facing services run with maximum reliability and uptime as promised to the users, while at the same time enabling developers to make changes to the existing system through careful preparation and planning while keeping an eye on capacity, latency and performance. SRE is also a mindset and a set of engineering approaches to running better production systems and optimizations. Much of our software development focuses on building components to eliminate manual work through automation, performance tuning and growing efficiency of production systems. As SREs are responsible for the big picture of how our systems relate to each other, we use a breadth of tools and approaches to tackle a broad spectrum of problems. Practices such as limiting time spent on reactive operational work, blameless postmortems and proactive identification of potential outages factor into iterative improvement that is key to both product quality and interesting dynamic day-to-day work. SRE's culture of diversity, intellectual curiosity, problem solving and openness is important to our success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to build an environment that provides the support and mentorship needed to learn and grow. 
What you'll be doing: * Lead the technical strategy and roadmap for large-scale, cross-functional SRE initiatives that improve reliability, scalability, and developer productivity across enterprise systems. * Design, and build resilient distributed systems that power NVIDIA's next-generation AI-driven enterprise products and services. * Drive automation and observability improvements, using metrics and analytics to enhance performance, reliability, and efficiency. * Collaborate across Cloud, Platform, Security, and AI/ML teams to implement modern SRE components that ensure high availability and secure operations. * Analyze and troubleshoot complex systems, championing best practices in system design, incident management, and postmortem analysis. * Mentor and influence engineers across teams, fostering technical excellence and a culture of reliability engineering. What we need to see: * 10+ years of experience in Site Reliability Engineering, Platform Engineering, or Cloud Architect roles. * BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent experience * Strong proficiency in programming languages such as Python, Typescript, JavaScript, or Go, with a focus on automation and infrastructure-as-code. * Experience with infrastructure-as-code such as AWS CDK, AWS CloudFormation, Terraform or CrossPlane * Solid understanding of OpenTelemetry or other Observability implementation at scale. * Deep expertise in systems architecture, networking, Kubernetes, and public cloud services (AWS, Azure, or GCP). * Outstanding problem-solving, communication, and teamwork skills, with the ability to influence across technical and interpersonal boundaries. Ways to stand out from the crowd: * Passion for and experience with Public Cloud or large-scale automation systems. * Demonstrated ability to drive technical strategy and deliver measurable reliability outcomes in complex environments. 
* A strong sense of ownership, curiosity, and innovation-you thrive in ambiguity and turn challenges into opportunities. NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables outstanding creativity and discovery, and powers what were once science fiction inventions from artificial intelligence to autonomous cars. NVIDIA is looking for exceptional people like you to help us accelerate the next wave of artificial intelligence. #LI-Hybrid Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 168,000 USD - 264,500 USD for Level 4, and 200,000 USD - 322,000 USD for Level 5. You will also be eligible for equity and benefits. Applications for this job will be accepted at least until November 4, 2025. NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
    $131k-176k yearly est. Auto-Apply 59d ago
  • ASIC Reliability Engineer

    Cisco 4.8 company rating

    San Jose, CA jobs

    The application window is expected to close on: January 13, 2026. NOTE: Job posting may be removed earlier if the position is filled or if a sufficient number of applications are received. Candidate must be local to the San Jose, CA area as this is a hybrid role. **Meet the Team** Cisco Silicon One ASICs are transforming the Future of the Internet, and the unified, power-efficient architecture is rapidly gaining traction in the market. The Silicon One Productization team enables technology and supplier readiness and takes these cutting-edge ASICs to production, and we are looking for an experienced Reliability Engineer to join the Quality and Reliability organization. Join an exciting team that works in a fast-paced, start-up type environment to establish robust processes to scale Cisco's Silicon Components business. **Your Impact** As a Product Reliability Engineer, you will play a critical role in ensuring our products meet the highest standards of quality and long-term performance. You will lead end-to-end product qualification efforts, applying your semiconductor engineering expertise to validate new designs, processes, and technologies. In this highly visible role, you'll partner with teams across design, manufacturing, suppliers, and leadership, communicating clearly, navigating complex technical challenges, and driving informed decision-making. Your ability to analyze data, resolve issues, and influence stakeholders will directly strengthen product reliability and customer trust. + Own and execute reliability test plans for new products, from sample testing through full qualification, including coordination with internal and external labs. + Assess IC, package, and process reliability to industry standards, performing key evaluations such as HTOL, HAST, ESD/LU, and supporting detailed debug activities. + Lead cross-functional failure analysis and structured problem-solving (8D, 5-Whys, cause-and-effect) to drive issue resolution and long-term prevention.
+ Analyze reliability data, trends, and acceleration models to evaluate field readiness, identify risks, and support qualification of new processes and package technologies. + Deliver clear qualification reports, collaborate closely with suppliers and engineering partners, and plan reliability resources to support both new product releases and ongoing monitoring programs. **Minimum Qualifications** + Bachelor's degree in Electrical Engineering or a related field. + 5+ years of experience in semiconductor product reliability, manufacturing/assembly, test processes, supply chain, and failure analysis. + Strong understanding of semiconductor manufacturing, IC test methodologies, device reliability, and common silicon/package failure mechanisms. + Hands-on experience with accelerated life testing, reliability estimation, and reliability testing of semiconductor components and modules. + Proficiency in statistical data analysis, ATE data interpretation, electrical characterization, IC failure analysis, and cross-functional collaboration to analyze yield loss and device failures. **Why Cisco?** At Cisco, we're revolutionizing how data and infrastructure connect and protect organizations in the AI era - and beyond. We've been innovating fearlessly for 40 years to create solutions that power how humans and technology work together across the physical and digital worlds. These solutions provide customers with unparalleled security, visibility, and insights across the entire digital footprint. Fueled by the depth and breadth of our technology, we experiment and create meaningful solutions. Add to that our worldwide network of doers and experts, and you'll see that the opportunities to grow and build are limitless. We work as a team, collaborating with empathy to make really big things happen on a global scale. Because our solutions are everywhere, our impact is everywhere. We are Cisco, and our power starts with you.
**Message to applicants applying to work in the U.S. and/or Canada:** The starting salary range posted for this position is $122,200.00 to $154,700.00 and reflects the projected salary range for new hires in this position in U.S. and/or Canada locations, not including incentive compensation*, equity, or benefits. Individual pay is determined by the candidate's hiring location, market conditions, job-related skillset, experience, qualifications, education, certifications, and/or training. The full salary range for certain locations is listed below. For locations not listed below, the recruiter can share more details about compensation for the role in your location during the hiring process. U.S. employees are offered benefits, subject to Cisco's plan eligibility rules, which include medical, dental and vision insurance, a 401(k) plan with a Cisco matching contribution, paid parental leave, short and long-term disability coverage, and basic life insurance. Please see the Cisco careers site to discover more benefits and perks. Employees may be eligible to receive grants of Cisco restricted stock units, which vest following continued employment with Cisco for defined periods of time.
U.S. employees are eligible for paid time away as described below, subject to Cisco's policies: + 10 paid holidays per full calendar year, plus 1 floating holiday for non-exempt employees + 1 paid day off for employee's birthday, paid year-end holiday shutdown, and 4 paid days off for personal wellness determined by Cisco + Non-exempt employees** receive 16 days of paid vacation time per full calendar year, accrued at a rate of 4.92 hours per pay period for full-time employees + Exempt employees participate in Cisco's flexible vacation time off program, which has no defined limit on how much vacation time eligible employees may use (subject to availability and some business limitations) + 80 hours of sick time off provided on hire date and each January 1st thereafter, and up to 80 hours of unused sick time carried forward from one calendar year to the next + Additional paid time away may be requested to deal with critical or emergency issues for family members + Optional 10 paid days per full calendar year to volunteer For non-sales roles, employees are also eligible to earn annual bonuses subject to Cisco's policies. Employees on sales plans earn performance-based incentive pay on top of their base salary, which is split between quota and non-quota components, subject to the applicable Cisco plan. For quota-based incentive pay, Cisco typically pays as follows: + .75% of incentive target for each 1% of revenue attainment up to 50% of quota; + 1.5% of incentive target for each 1% of attainment between 50% and 75%; + 1% of incentive target for each 1% of attainment between 75% and 100%; and + Once performance exceeds 100% attainment, incentive rates are at or above 1% for each 1% of attainment with no cap on incentive compensation. For non-quota-based sales performance elements such as strategic sales objectives, Cisco may pay 0% up to 125% of target. Cisco sales plans do not have a minimum threshold of performance for sales incentive compensation to be paid.
The applicable full salary ranges for this position, by specific state, are listed below: New York City Metro Area: $122,200.00 - $177,900.00 Non-Metro New York state & Washington state: $108,700.00 - $158,400.00 * For quota-based sales roles on Cisco's sales plan, the ranges provided in this posting include base pay and sales target incentive compensation combined. ** Employees in Illinois, whether exempt or non-exempt, will participate in a unique time off program to meet local requirements. Cisco is an Affirmative Action and Equal Opportunity Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis. Cisco will consider for employment, on a case by case basis, qualified applicants with arrest and conviction records.
    $122.2k-177.9k yearly 19d ago
  • ASIC Reliability Engineer

    Cisco Systems, Inc. 4.8 company rating

    San Jose, CA jobs

    The application window is expected to close on: January 13, 2026. NOTE: Job posting may be removed earlier if the position is filled or if a sufficient number of applications are received. Candidate must be local to the San Jose, CA area as this is a hybrid role. Meet the Team Cisco Silicon One ASICs are transforming the Future of the Internet, and the unified, power-efficient architecture is rapidly gaining traction in the market. The Silicon One Productization team enables technology and supplier readiness and takes these cutting-edge ASICs to production, and we are looking for an experienced Reliability Engineer to join the Quality and Reliability organization. Join an exciting team that works in a fast-paced, start-up type environment to establish robust processes to scale Cisco's Silicon Components business. Your Impact As a Product Reliability Engineer, you will play a critical role in ensuring our products meet the highest standards of quality and long-term performance. You will lead end-to-end product qualification efforts, applying your semiconductor engineering expertise to validate new designs, processes, and technologies. In this highly visible role, you'll partner with teams across design, manufacturing, suppliers, and leadership, communicating clearly, navigating complex technical challenges, and driving informed decision-making. Your ability to analyze data, resolve issues, and influence stakeholders will directly strengthen product reliability and customer trust. * Own and execute reliability test plans for new products, from sample testing through full qualification, including coordination with internal and external labs. * Assess IC, package, and process reliability to industry standards, performing key evaluations such as HTOL, HAST, ESD/LU, and supporting detailed debug activities. * Lead cross-functional failure analysis and structured problem-solving (8D, 5-Whys, cause-and-effect) to drive issue resolution and long-term prevention.
* Analyze reliability data, trends, and acceleration models to evaluate field readiness, identify risks, and support qualification of new processes and package technologies. * Deliver clear qualification reports, collaborate closely with suppliers and engineering partners, and plan reliability resources to support both new product releases and ongoing monitoring programs. Minimum Qualifications * Bachelor's degree in Electrical Engineering or a related field. * 5+ years of experience in semiconductor product reliability, manufacturing/assembly, test processes, supply chain, and failure analysis. * Strong understanding of semiconductor manufacturing, IC test methodologies, device reliability, and common silicon/package failure mechanisms. * Hands-on experience with accelerated life testing, reliability estimation, and reliability testing of semiconductor components and modules. * Proficiency in statistical data analysis, ATE data interpretation, electrical characterization, IC failure analysis, and cross-functional collaboration to analyze yield loss and device failures. Why Cisco? At Cisco, we're revolutionizing how data and infrastructure connect and protect organizations in the AI era - and beyond. We've been innovating fearlessly for 40 years to create solutions that power how humans and technology work together across the physical and digital worlds. These solutions provide customers with unparalleled security, visibility, and insights across the entire digital footprint. Fueled by the depth and breadth of our technology, we experiment and create meaningful solutions. Add to that our worldwide network of doers and experts, and you'll see that the opportunities to grow and build are limitless. We work as a team, collaborating with empathy to make really big things happen on a global scale. Because our solutions are everywhere, our impact is everywhere. We are Cisco, and our power starts with you.
Message to applicants applying to work in the U.S. and/or Canada: The starting salary range posted for this position is $122,200.00 to $154,700.00 and reflects the projected salary range for new hires in this position in U.S. and/or Canada locations, not including incentive compensation*, equity, or benefits. Individual pay is determined by the candidate's hiring location, market conditions, job-related skillset, experience, qualifications, education, certifications, and/or training. The full salary range for certain locations is listed below. For locations not listed below, the recruiter can share more details about compensation for the role in your location during the hiring process. U.S. employees are offered benefits, subject to Cisco's plan eligibility rules, which include medical, dental and vision insurance, a 401(k) plan with a Cisco matching contribution, paid parental leave, short and long-term disability coverage, and basic life insurance. Please see the Cisco careers site to discover more benefits and perks. Employees may be eligible to receive grants of Cisco restricted stock units, which vest following continued employment with Cisco for defined periods of time.
U.S. employees are eligible for paid time away as described below, subject to Cisco's policies: * 10 paid holidays per full calendar year, plus 1 floating holiday for non-exempt employees * 1 paid day off for employee's birthday, paid year-end holiday shutdown, and 4 paid days off for personal wellness determined by Cisco * Non-exempt employees receive 16 days of paid vacation time per full calendar year, accrued at a rate of 4.92 hours per pay period for full-time employees * Exempt employees participate in Cisco's flexible vacation time off program, which has no defined limit on how much vacation time eligible employees may use (subject to availability and some business limitations) * 80 hours of sick time off provided on hire date and each January 1st thereafter, and up to 80 hours of unused sick time carried forward from one calendar year to the next * Additional paid time away may be requested to deal with critical or emergency issues for family members * Optional 10 paid days per full calendar year to volunteer For non-sales roles, employees are also eligible to earn annual bonuses subject to Cisco's policies. Employees on sales plans earn performance-based incentive pay on top of their base salary, which is split between quota and non-quota components, subject to the applicable Cisco plan. For quota-based incentive pay, Cisco typically pays as follows: * .75% of incentive target for each 1% of revenue attainment up to 50% of quota; * 1.5% of incentive target for each 1% of attainment between 50% and 75%; * 1% of incentive target for each 1% of attainment between 75% and 100%; and * Once performance exceeds 100% attainment, incentive rates are at or above 1% for each 1% of attainment with no cap on incentive compensation. For non-quota-based sales performance elements such as strategic sales objectives, Cisco may pay 0% up to 125% of target. Cisco sales plans do not have a minimum threshold of performance for sales incentive compensation to be paid.
The applicable full salary ranges for this position, by specific state, are listed below: New York City Metro Area: $122,200.00 - $177,900.00 Non-Metro New York state & Washington state: $108,700.00 - $158,400.00 * For quota-based sales roles on Cisco's sales plan, the ranges provided in this posting include base pay and sales target incentive compensation combined. Employees in Illinois, whether exempt or non-exempt, will participate in a unique time off program to meet local requirements.
    $122.2k-177.9k yearly 19d ago
