
Senior reliability engineer job description

Updated March 14, 2024

Example senior reliability engineer requirements on a job description

Senior reliability engineer requirements can be divided into technical requirements and required soft skills. The lists below show the most common requirements included in senior reliability engineer job postings.
Sample senior reliability engineer requirements
  • Bachelor’s degree in engineering or related field
  • Minimum 5 years of experience in reliability engineering
  • Knowledge of reliability engineering tools, techniques, and processes
  • Proficient in CAD and other engineering software
  • Familiarity with safety regulations and standards
Sample required senior reliability engineer soft skills
  • Strong communication and interpersonal skills
  • Excellent problem-solving and analytical skills
  • High attention to detail and accuracy
  • Ability to work independently and as part of a team
  • Aptitude to learn quickly and adapt to changing environments

Senior reliability engineer job description example 1

CVS Health senior reliability engineer job description

CVS Health Digital is looking for hands-on, passionate people who want to join a high energy and growing team to make a difference in customers' lives and who want to be on the forefront of digital innovation that aims to reinvent what a pharmacy and a health care company can be in the digital world.

This role includes closely working with a team of engineers to drive delivery excellence, partner with business and other IT leaders to build digital assets, provide expert level support to resolve complex issues and provide RCA, identify gaps in systems in the current digital ecosystem and provide solutions to enhance overall performance. All of this is critical to driving Digital growth and improving customer experience.

This individual will collaborate with digital leaders, business partners, and IT colleagues to deliver business growth and great customer experiences that differentiate CVS in the market, while ensuring efficient, effective and SAFe development.

As a Senior Site Reliability Engineering professional, you'll be making decisions of a strategic nature that impact our customers, clients, and businesses. Your expertise in analyzing complex data systems, anticipating problems, and finding ways to mitigate risk, will be a key focus of a high performing team to successfully design and navigate the program roadmap. By incorporating your knowledge of business drivers, you will affect change and lead development of innovative improvements and world-class practices.
Pay Range

The typical pay range for this role is:

Minimum: 75,000

Maximum: 167,000

Please keep in mind that this range represents the pay range for all positions in the job grade within which this position falls. The actual salary offer will take into account a wide range of factors, including location.

Required Qualifications

  • 5+ years supporting environments at scale and leading incident/problem management
  • 3+ years of experience with Cloud Infrastructure (AWS and GCP preferred), container orchestration like Kubernetes, and service mesh such as Istio
  • 3+ years demonstrated ability to leverage and maintain observability tools such as Kiali, Jaeger, Prometheus, NewRelic, SumoLogic, Splunk
  • 2+ years of experience with GitOps (ArgoCD)

COVID Requirements

COVID-19 Vaccination Requirement

CVS Health requires certain colleagues to be fully vaccinated against COVID-19 (including any booster shots if required), where allowable under the law, unless they are approved for a reasonable accommodation based on disability, medical condition, religious belief, or other legally recognized reasons that prevents them from being vaccinated.

You are required to have received at least one COVID-19 shot prior to your first day of employment and to provide proof of your vaccination status or apply for a reasonable accommodation within the first 10 days of your employment. Please note that in some states and roles, you may be required to provide proof of full vaccination or an approved reasonable accommodation before you can begin to actively work.

Preferred Qualifications

  • Configuring and maintaining Infrastructure as Code (Terraform)
  • Experience configuring, maintaining, and troubleshooting networking (e.g., AWS VPCs, load balancers)
  • Good understanding of Site Reliability Engineering (SRE) and DevOps philosophies, technologies, platforms and tools, SLA management, incident resolution, and automation
  • Demonstrated ability to conceptualize, launch and deliver multiple IT projects on time and within budget
  • Ability to articulate to more experienced management a technical strategy in clear, concise, understandable terms
  • Proven ability to understand and troubleshoot complex problems under pressure
  • Excellent organizational and workload management capabilities
  • 2+ years of experience with database support and tuning (NoSQL such as MongoDB and relational such as Postgres)

Education

Bachelor's degree or equivalent experience

Business Overview

Bring your heart to CVS Health

Every one of us at CVS Health shares a single, clear purpose: Bringing our heart to every moment of your health. This purpose guides our commitment to deliver enhanced human-centric health care for a rapidly changing world. Anchored in our brand - with heart at its center - our purpose sends a personal message that how we deliver our services is just as important as what we deliver. Our Heart At Work Behaviors™ support this purpose. We want everyone who works at CVS Health to feel empowered by the role they play in transforming our culture and accelerating our ability to innovate and deliver solutions to make health care more personal, convenient and affordable. We strive to promote and sustain a culture of diversity, inclusion and belonging every day.

CVS Health is an affirmative action employer, and is an equal opportunity employer, as are the physician-owned businesses for which CVS Health provides management services. We do not discriminate in recruiting, hiring, promotion, or any other personnel action based on race, ethnicity, color, national origin, sex/gender, sexual orientation, gender identity or expression, religion, age, disability, protected veteran status, or any other characteristic protected by applicable federal, state, or local law. We proudly support and encourage people with military experience (active, veterans, reservists and National Guard) as well as military spouses to apply for CVS Health job opportunities.

Senior reliability engineer job description example 2

Tesla senior reliability engineer job description

The Role:

This role follows the reliability lifecycle of Tesla's High Voltage Battery (HVB) pack integration systems from concept to design, development, manufacturing, field operation, and field returns to design-in, confirm and grow exceptional reliability at every stage.

Responsibilities:

+ Facilitate Design FMEA sessions in order to drive reliable design choices and improve validation test planning and assemblies.

+ Apply knowledge of reliability methods to build in reliability and validate that the targets are met.

+ Analyze usage and environmental conditions from the field in order to improve requirement setting and testing methods.

+ Create Fault Trees and reliability block diagrams to assess system reliability.

+ Apply reliability failure physics to design accelerated test methods for failure mode exploration and reliability growth.

+ Spec out reliability validation plans for components and subsystems.

+ Facilitate failure analysis to understand root cause and drive resolution for failures originating in testing and from the field.

+ Work with manufacturing quality, service engineering, supplier quality and design teams to facilitate field failure resolution (aka Weibull) in order to support all investigations.

+ Provide reliability design guidelines and apply reliability lessons learned to enable continuous improvement.

Desired skills:

+ Experience working with reliability of battery packs and/or complex electro-mechanical systems.

+ Knowledge of failure mechanisms and lifetime acceleration models associated with high current carrying conductors.

+ Knowledge of the ReliaSoft Synthesis Platform, including Weibull++, BlockSim, ALTA, RGA, and xFMEA.

+ Knowledge of database structures and practical understanding and use of SQL. Experience in working with large data sets. Experience with Tableau for big data analytics.

+ Working knowledge of applied statistics and experience with statistical software such as JMP.

+ Understanding of fleet reliability monitoring metrics and reliability KPIs.

+ Knowledge of reliability warranty analysis and reliability prediction methods.

+ Working knowledge of failure analysis techniques such as optical microscopy, SEM, CSAM, X-ray, cross-sectioning and EDX.

+ ASQ Certified Reliability Engineer preferred.

Requirements:

+ BS, MS, or Ph.D. in Mechanical Engineering (EE), or Material Science (prefer metal material), and one or more years of industry experience on reliability of high voltage battery related systems or components.

+ Master's or PhD degree in Reliability Engineering

Senior reliability engineer job description example 3

Global Payments senior reliability engineer job description

Every day, Global Payments makes it possible for millions of people to move money between buyers and sellers using our payments solutions for credit, debit, prepaid and merchant services. Our worldwide team helps over 3 million companies, more than 1,300 financial institutions and over 600 million cardholders grow with confidence and achieve amazing results. We are driven by our passion for success and we are proud to deliver best-in-class payment technology and software solutions. Join our dynamic team and make your mark on the payments technology landscape of tomorrow.

Job Description: Senior Site Reliability Engineer

Summary of This Role:

The Senior SRE Engineer will work closely with Infrastructure, Application, and other teams as required to ensure full-stack logging, monitoring, and alerting systems are providing unparalleled visibility and insight to support Platform Availability, Performance, Security, and Compliance goals. The candidate is expected to be a self-starter who can operate well under various situations and types of projects, ranging from a team of one to a team of many in maintaining and continuously improving observability of the health and state of the Platform's stack to achieve superior Platform Reliability.

Duties and Responsibilities:

Develop a strategy to grow and mature our suite of monitoring technologies and work with teams to support high levels of adoption and best practice usage
Develop implementation standards, guides, processes, and run books to enable teams to deploy, test, and use monitoring system configurations while ensuring maintainability
Work with System, Application, Network, Cloud Ops, Security, Database, Product management, and Operations teams to understand logging, dashboarding, alerting, and analytical requirements and assist them in fulfilling these with the implementation of the appropriate monitoring functionality
Manage the availability, performance, and capacity of all monitoring technologies
Working with Vendors as required, plan and lead execution of upgrades to monitoring technologies ensuring adoption of relevant new features as they become available from the Vendors
Act as liaison for technical questions, issues, or escalations related to monitoring systems, including working with other teams on their roadmaps/plans, solutions, estimation, and implementation where it pertains to monitoring technology functionality
Participate in troubleshooting of live Incidents to minimize time to restoration or postmortem deep dives of any aspect of the technology stack to identify the root cause
Identify and maintain a prioritized backlog of blind spots and gaps in Platform Availability, Performance, Security, and Compliance observability, working with appropriate teams to implement solutions that eliminate them
Track and report on the performance of monitoring technologies in relation to supporting Platform goals
Maintain functional and technical knowledge of existing monitoring suite technologies and research and learn new tools for adoption

Competencies:

Demonstrates good judgment in solving problems as well as identifying problems in advance, and proposing solutions
Solid analytical skills coupled with excellent verbal and written communication skills to articulate findings and recommendations
Strong ability to handle multiple tasks concurrently with competing deadlines, effectively prioritizing, researching, documenting, and managing problems and tasks within a dynamic and fast-paced environment
Excellent troubleshooting skills with the ability to participate and contribute in a meaningful way on live Incidents or postmortem deep dives into any aspect of the technology stack
Demonstrable ability to achieve results through relationship building and influencing others
Desire and ability to learn all facets of an online Platform from non-functional to functional to its supporting teams' processes
Highly motivated self-starter who can excel in this role under minimum supervision

Technical Experience:

4+ years of experience as a Systems/DevOps engineer with a breadth of knowledge across Network, Storage, Virtualization, Operating Systems, and Application Containers
Experience in developing monitoring strategies put into action with monitoring technologies such as Splunk, ELK, PRTG, LogicMonitor, Thousand Eyes, Pingdom, Dynatrace
Experience working heavily with Splunk, including rule and advanced logic creation, dashboards/visualizations, data ingestion, and manipulation
Experience with Application Monitoring solutions (Dynatrace, New Relic, or similar)
Experience with Infrastructure Monitoring solutions (PRTG, LogicMonitor, or similar)
Experience developing custom health checks and alerting
Extensive hands-on experience with scripting languages to automate tasks and manipulate data (e.g., Python, Perl, PowerShell, Terraform, Ansible)
Experience in public cloud environments (GCP & Azure preferred), ideally with monitoring design and implementation for such environments

What qualifications a successful candidate should have:

Bachelor's degree (willing to accept relevant experience in place of a degree)
4 years of relevant experience in IT
Past work experience in regulated environments adhering to standards such as PCI DSS
Azure and GCP experience with a solid security focus, certifications a plus

Global Payments Inc. is an equal opportunity employer.

Global Payments provides equal employment opportunities to all employees and applicants for employment without regard to race, color, religion, sex (including pregnancy), national origin, ancestry, age, marital status, sexual orientation, gender identity or expression, disability, veteran status, genetic information or any other basis protected by law. Those applicants requiring reasonable accommodation to the application and/or interview process should notify a representative of the Human Resources Department.


Zippia Research Team
Editorial Staff

The Zippia Research Team has spent countless hours reviewing resumes, job postings, and government data to determine what goes into getting a job in each phase of life. Professional writers and data scientists comprise the Zippia Research Team.