Senior Reliability Engineer jobs at QGenda

- 68 jobs
  • Senior Site Reliability Engineer

    QGenda 4.1 company rating

    Senior reliability engineer job at QGenda

    Who We Are

    QGenda is redefining healthcare workforce management everywhere care is delivered. We're on a mission to empower the healthcare industry to better onboard, deploy, and manage their workforce. Over 4,500 healthcare organizations have trusted us to help them make strategic workforce decisions through our unified software platform. With more than 700 employees across the US, we are united in our vision and culture to make a difference for our customers, while enjoying the day-to-day.

    At QGenda, we value our employees and their contributions toward the success of the business. We strive to create a dynamic work environment that fosters growth, innovation, and collaboration, where employees can be proud of the work they do and the impact it has on the healthcare industry. QGenda is headquartered in Atlanta. To learn more about QGenda, visit us at qgenda.com or follow us on Instagram or LinkedIn.

    About Your Role

    As a Senior Site Reliability Engineer, you will work with our Infrastructure and Product Development Teams to design, operate, and scale highly available services on AWS. You'll lead automation and infrastructure-as-code efforts to eliminate toil, standardize configuration, and expand observability across metrics, logs, and traces. You will evaluate and introduce AWS services and tooling that improve reliability, performance, and developer velocity. This role offers the opportunity to shape our reliability roadmap and make a measurable impact on the resilience and evolution of our technology stack.

    How You'll Make an Impact

    System Reliability and Performance:
    - Design, implement, and manage scalable systems that ensure high availability, fault tolerance, and optimal performance.
    - Continuously monitor and enhance system health and performance through data analysis and metrics.
    - Embed observability (metrics, logs, traces, alerts) with actionable thresholds and up-to-date runbooks.

    Automation and Tooling:
    - Eliminate toil by building automation and self-service tools for common operational workflows.
    - Own CI/CD pipelines (build, test, security scans) and enable progressive delivery (blue/green, canary).
    - Manage infrastructure as code via Terraform and configuration management with Git-backed workflows.

    Incident Management and Troubleshooting:
    - Participate in on-call; triage, mitigate, and resolve incidents within defined SLAs.
    - Lead incident response and blameless post-incident reviews; document RCAs and drive corrective actions to closure.
    - Maintain runbooks/playbooks and regularly perform disaster recovery scenarios.

    Infrastructure Management:
    - Operate and secure AWS environments (IAM, VPC, EC2/ECS, RDS, S3, Lambda, etc.) with a focus on resilience and compliance.
    - Optimize cost, performance, and reliability (rightsizing, autoscaling, reservations/savings plans, tagging, spend monitoring, etc.).

    Collaboration & Culture:
    - Serve as a technical advisor to engineering teams on infrastructure and operations best practices.
    - Mentor peers on SRE practices; promote observability, continuous improvement, and a blameless culture.
    - Contribute to roadmaps and capacity planning to align reliability goals with product objectives.

    Who You Are

    - Availability for off-hours deployment and upgrades of production systems during release and maintenance windows.
    - Strong problem-solving skills and ability to work effectively under pressure.
    - Excellent communication skills for cross-functional collaboration as well as documentation creation.

    Experience You Bring

    - B.S. in Computer Science, Computer Information Systems, or Computer Engineering from a major U.S. university or equivalent industry experience
    - 5+ years of experience as a DevOps, SRE or Systems Engineer
    - Advanced proficiency with at least one scripting or programming language
    - Experience with Docker and container orchestration tools such as AWS ECS
    - Hands-on experience building infrastructure and supporting applications in AWS using services such as Lambda, EC2, ECS, S3, SNS, SQS, RDS, Redshift, and Elasticache
    - Experience with logging, creating dashboards, and alerts using observability tools such as Datadog and Amazon CloudWatch
    - Strong understanding of networking and DNS
    - Familiarity with configuration management and infrastructure as code (IaC) tools such as Terraform
    - Firm understanding of and experience with Agile and Scrum SDLC processes
    - Experience using a distributed version control system (Git preferred) for check-ins, branching, merging, pull requests, code review, etc.
    - Knowledge of CI/CD best practices and tools such as AWS CodeBuild, Jenkins and/or TeamCity
    - Experience designing and delivering secure, high performance and highly available cloud services

    #LI-Hybrid

    Applicants for this position must be authorized to work for any employer in the United States (U.S.), including being located in the US. We are unable to sponsor, take over sponsorship of, or hire candidates with an employment visa at this time.

    What's In It For You

    We offer a comprehensive total rewards package to support our full-time employees' and their families' day-to-day needs, well-being and major life events, which includes:
    - Fully company-paid options for medical (both in-person and virtual), dental and vision insurance
    - Generous paid time off (PTO) policy to enjoy periods of uninterrupted rest and relaxation for a healthy work/life balance
    - Paid parental leave for birth, adoption or permanent placement
    - 401(k) with company match
    - Options to work in a hybrid-working model or remotely from home, depending on the position
    - Annual Costco membership, cell phone stipend, commuter benefits, in-office perks and more

    QGenda delivers technology solutions to improve how healthcare is delivered and increase access - for everyone. We can only succeed by bringing together diverse minds, thoughts, ideas and team members to create better solutions for our customers and make us a better company as a whole. We are committed to creating a culture of embracing diversity, inclusion and equity for all.

    QGenda is an Equal Employment Opportunity employer and makes all employment decisions without regard to race, color, religion, creed, gender, sex (including pregnancy), sexual orientation, gender identity or expression, national origin, ancestry, age, marital status, disability or genetic information, military status, status as a disabled or protected veteran or any other protected status under applicable law.

    If you require accommodations or assistance to complete the online application process, please contact ********************* and identify the type of accommodation or assistance you are requesting. Do not include any medical or health information in this email. We will respond to your email promptly.
    $97k-130k yearly est. Auto-Apply 24d ago
  • Site Reliability Engineer

    Matlen Silver 3.7 company rating

    Alpharetta, GA jobs

    Title: Senior Cloud Security Engineer/Architect
    Environment: Onsite
    Duration: 6 month contract to hire
    Contract pay: $68-$90/hour W2
    Conversion salary: $150k-$188k
    NO C2C
    ** Due to client requirements, US Citizen or GC Holder ONLY **

    Requirements
    - Minimum 13+ years of professional experience in Cloud Infrastructure, DevOps, or Site Reliability Engineering.
    - Strong Infrastructure as Code (IaC) expertise with Terraform: hands-on experience creating and managing EKS clusters, repositories, and Terraform modules.
    - Architect, implement, and manage Azure IaaS infrastructure encompassing VNets, subnets, network security groups, VPN gateways, CDNs, Traffic Manager, peering, custom routes, DNS, DHCP, and virtual appliances.
    - Proven proficiency across Azure and/or AWS (multi-cloud experience preferred).
    - Strong security mindset with practical experience in IAM, vulnerability remediation, encryption, and patching.
    - Solid understanding of DNS, Docker, Kubernetes, and containerization best practices.
    - Experience with Windows and Linux/Unix system and network administration (8+ years).
    - Proficiency in one or more programming/scripting languages: Python, Go, Bash, or Ruby.
    - Expertise in Terraform, Ansible, or Chef for automation and configuration management.
    - Hands-on experience with cloud services (AWS, Azure, GCP) - including EC2, S3, Kubernetes, and serverless environments.
    - Knowledge of networking fundamentals: DNS, firewalls, load balancing, and VPNs.
    - Experience with container orchestration using Docker, Kubernetes, or OpenShift.
    - Experience with monitoring and observability tools such as Prometheus, Grafana, Datadog, or New Relic.
    - CI/CD pipeline development using Jenkins, GitLab CI, GitHub Actions, or CircleCI.
    - Bonus: Experience with HashiCorp Vault and advanced Terraform module design.
    - Deep understanding of access control, encryption standards, secure coding practices, and regulatory frameworks.
    - Skilled in incident management, root cause analysis, automation, and performance tuning.
    - Understanding of SLOs/SLAs, system scalability, redundancy, and resilience best practices.
    $150k-188k yearly 4d ago
  • Senior Site Reliability Engineer I

    Axon 4.5 company rating

    Atlanta, GA jobs

    Join Axon and be a Force for Good. At Axon, we're on a mission to Protect Life. We're explorers, pursuing society's most critical safety and justice issues with our ecosystem of devices and cloud software. Like our products, we work better together. We connect with candor and care, seeking out diverse perspectives from our customers, communities and each other. Life at Axon is fast-paced, challenging and meaningful. Here, you'll take ownership and drive real change. Constantly grow as you work hard for a mission that matters at a company where you matter. Your Impact As a contributor in the APX SRE organization, you are passionate about delivering solutions to the real-time problems our mission-critical cloud native services encounter. You are also obsessed about achieving the high quality and reliability our customers demand. You will work closely not only with the APX SRE organization, but your technical deliverables will reach the entire engineering organization to enable product teams to continuously deliver features on the vanguard of innovation. Location: This role is based out of our Atlanta, GA office (Peachtree Corners) and follows a hybrid schedule. We rely on in-person collaboration and ask that team members work onsite Tuesdays through Fridays, with the flexibility to work remotely on Mondays, unless there is an approved workplace accommodation. We believe that connection fuels innovation, and our in-office culture is designed to foster meaningful teamwork, mentorship, and shared success. What You'll Do Build robust, easy-to-use foundational platforms and tools that enable engineering teams to provision services rapidly, consistently, and securely. Exemplify cloud-native site reliability best practices. Write code that is performant, maintainable, clear, and concise. Employ strong problem-solving skills, with the ability to debug problems in cloud-native distributed systems. 
Contribute to platform features and tooling by designing clear, well-tested APIs in Go or Python. Author design docs, test plans, and usage guides to promote self-service. Take calculated risks, champion new ideas, and cultivate your craft. What You Bring 6+ years of applicable software engineering or SRE experience 3+ years experience managing cloud platforms such as Azure, AWS, or similar. Experience operating in Kubernetes platforms like AKS, EKS, or similar. Experience using managed languages such as Python, Go, C#, Java, or similar with demonstrable API and unit-testing experience. Experience utilizing CI/CD platforms to automate provisioning infrastructure, software builds, releases, and integration tests on your code. Experience using observability tools such as APM, logging, and metrics to assist with debugging issues. Experience designing tooling to simplify the operational management of SaaS/PaaS systems. Familiarity with building flexible and testable Infrastructure as Code modules. Empathy to support the needs of software engineers. Benefits that Benefit You Competitive salary and 401k with employer match Discretionary paid time off Paid parental leave for all Medical, Dental, Vision plans Fitness Programs Emotional & Mental Wellness support Learning & Development programs And yes, we have snacks in our offices Benefits listed herein may vary depending on the nature of your employment and the location where you work. The Pay: Axon is a total compensation company, meaning compensation is made up of base pay, bonus, and stock awards. The starting base pay for this role is between USD 134,250 in the lowest geographic market and USD 214,800 in the highest geographic market. The actual base pay is dependent upon many factors, such as: level, function, training, transferable skills, work experience, business needs, geographic market, and often a combination of all these factors. 
Our benefits offer an array of options to help support you physically, financially and emotionally through the big milestones and in your everyday life. To see more details on our benefits offerings please visit ****************************** Don't meet every single requirement? That's ok. At Axon, we Aim Far. We think big with a long-term view because we want to reinvent the world to be a safer, better place. We are also committed to building diverse teams that reflect the communities we serve. Studies have shown that women and people of color are less likely to apply to jobs unless they check every box in the . If you're excited about this role and our mission to Protect Life but your experience doesn't align perfectly with every qualification listed here, we encourage you to apply anyways. You may be just the right candidate for this or other roles. Important Notes The above is not intended as, nor should it be construed as, exhaustive of all duties, responsibilities, skills, efforts, or working conditions associated with this job. The job description may change or be supplemented at any time in accordance with business needs and conditions. Some roles may also require legal eligibility to work in a firearms environment. Axon's mission is to Protect Life and is committed to the well-being and safety of its employees as well as Axon's impact on the environment. All Axon employees must be aware of and committed to the appropriate environmental, health, and safety regulations, policies, and procedures. Axon employees are empowered to report safety concerns as they arise and activities potentially impacting the environment. We are an equal opportunity employer that promotes justice, advances equity, values diversity and fosters inclusion. 
We're committed to hiring the best talent - regardless of race, creed, color, ancestry, religion, sex (including pregnancy), national origin, sexual orientation, age, citizenship status, marital status, disability, gender identity, genetic information, veteran status, or any other characteristic protected by applicable laws, regulations and ordinances - and empowering all of our employees so they can do their best work. If you have a disability or special need that requires assistance or accommodation during the application or the recruiting process, please email **********************. Please note that this email address is for accommodation purposes only. Axon will not respond to inquiries for other purposes.
    $101k-130k yearly est. Auto-Apply 60d+ ago
  • Sr. Site Reliability Engineer II

    Axon 4.5 company rating

    Atlanta, GA jobs

    Join Axon and be a Force for Good. At Axon, we're on a mission to Protect Life. We're explorers, pursuing society's most critical safety and justice issues with our ecosystem of devices and cloud software. Like our products, we work better together. We connect with candor and care, seeking out diverse perspectives from our customers, communities and each other. Life at Axon is fast-paced, challenging and meaningful. Here, you'll take ownership and drive real change. Constantly grow as you work hard for a mission that matters at a company where you matter. Your Impact As a contributor in the APX SRE organization, you are passionate about delivering solutions to the real-time problems our mission-critical cloud native services encounter. You are also obsessed about achieving the high quality and reliability our customers demand. You will work closely not only with the APX SRE organization, but your technical deliverables will reach the entire engineering organization to enable product teams to continuously deliver features on the vanguard of innovation. What You'll Do Work Location: This role is based out of our Atlanta Office and follows a hybrid schedule. We rely on in-person collaboration and ask that team members work onsite Tuesdays through Fridays, with the flexibility to work remotely on Mondays, unless there is an approved workplace accommodation. We believe that connection fuels innovation, and our in-office culture is designed to foster meaningful teamwork, mentorship, and shared success Reports to: Senior Manager, SRE Direct Reports: None Build robust, easy-to-use foundational platforms and tools that enable engineering teams to provision services rapidly, consistently, and securely. Exemplify cloud-native site reliability best practices. Write code that is performant, maintainable, clear, and concise. Employ strong problem-solving skills, with the ability to debug problems in cloud-native distributed systems. 
Influence and educate the engineering organization to adopt new and improved architectural patterns. Provide robust documentation for use by engineers to promote self-service. Take calculated risks, champion new ideas, and cultivate your craft. What You Bring This position involves handling of classified federal data; under federal regulations, it is open to US Citizens only. Strong expertise in Kubernetes, including designing, deploying, and managing Kubernetes clusters in production environments. 10+ years of applicable experience. Experience managing cloud platforms such as Azure, AWS, or similar. Experience operating in Kubernetes platforms like AKS, EKS, or similar. Experience using managed languages such as Python, Go, Java, or similar. Experience utilizing CI/CD platforms to automate provisioning infrastructure, software builds, tests, and releases. Experience using observability tools such as APM, logging, and metrics to assist with debugging issues. Experience designing tooling to simplify the operational management of SaaS/PaaS systems. Familiarity with building flexible and testable Infrastructure as Code modules. Empathy to support the needs of software engineers. Benefits that Benefit You Competitive salary and 401k with employer match Discretionary paid time off Paid parental leave for all Medical, Dental, Vision plans Fitness Programs Emotional & Mental Wellness support Learning & Development programs And yes, we have snacks in our offices Benefits listed herein may vary depending on the nature of your employment and the location where you work. The Pay: Axon is a total compensation company, meaning compensation is made up of base pay, bonus, and stock awards. The starting base pay for this role is between USD 150,000 in the lowest geographic market and USD 230,000 in the highest geographic market. 
The actual base pay is dependent upon many factors, such as: level, function, training, transferable skills, work experience, business needs, geographic market, and often a combination of all these factors. Our benefits offer an array of options to help support you physically, financially and emotionally through the big milestones and in your everyday life. To see more details on our benefits offerings please visit ***************************** (http://*****************************). Don't meet every single requirement? That's ok. At Axon, we Aim Far. We think big with a long-term view because we want to reinvent the world to be a safer, better place. We are also committed to building diverse teams that reflect the communities we serve. Studies have shown that women and people of color are less likely to apply to jobs unless they check every box in the . If you're excited about this role and our mission to Protect Life but your experience doesn't align perfectly with every qualification listed here, we encourage you to apply anyways. You may be just the right candidate for this or other roles. Important Notes The above is not intended as, nor should it be construed as, exhaustive of all duties, responsibilities, skills, efforts, or working conditions associated with this job. The job description may change or be supplemented at any time in accordance with business needs and conditions. Some roles may also require legal eligibility to work in a firearms environment. Axon's mission is to Protect Life and is committed to the well-being and safety of its employees as well as Axon's impact on the environment. All Axon employees must be aware of and committed to the appropriate environmental, health, and safety regulations, policies, and procedures. Axon employees are empowered to report safety concerns as they arise and activities potentially impacting the environment. 
We are an equal opportunity employer that promotes justice, advances equity, values diversity and fosters inclusion. We're committed to hiring the best talent - regardless of race, creed, color, ancestry, religion, sex (including pregnancy), national origin, sexual orientation, age, citizenship status, marital status, disability, gender identity, genetic information, veteran status, or any other characteristic protected by applicable laws, regulations and ordinances - and empowering all of our employees so they can do their best work. If you have a disability or special need that requires assistance or accommodation during the application or the recruiting process, please email **********************. Please note that this email address is for accommodation purposes only. Axon will not respond to inquiries for other purposes.
    $101k-130k yearly est. Auto-Apply 58d ago
  • Site Reliability Engineer II

    Axon 4.5 company rating

    Atlanta, GA jobs

    Join Axon and be a Force for Good. At Axon, we're on a mission to Protect Life. We're explorers, pursuing society's most critical safety and justice issues with our ecosystem of devices and cloud software. Like our products, we work better together. We connect with candor and care, seeking out diverse perspectives from our customers, communities and each other. Life at Axon is fast-paced, challenging and meaningful. Here, you'll take ownership and drive real change. Constantly grow as you work hard for a mission that matters at a company where you matter. Your Impact As a contributor in the SRE organization, you are passionate about delivering solutions to the real-time problems our mission-critical cloud native services encounter. You are also obsessed about achieving the high quality and reliability our customers demand. You will work closely not only with the SRE organization, but your technical deliverables will reach the entire engineering organization to enable product teams to continuously deliver features on the vanguard of innovation. Work Location: This role is based out of our Atlanta office and follows a hybrid schedule. We rely on in-person collaboration and ask that team members work onsite Tuesdays through Fridays, with the flexibility to work remotely on Mondays, unless there is an approved workplace accommodation. We believe that connection fuels innovation, and our in-office culture is designed to foster meaningful teamwork, mentorship, and shared success. What You'll Do Build robust, easy-to-use foundational platforms and tools that enable engineering teams to provision services rapidly, consistently, securely, and cost-effectively. Exemplify cloud-native site reliability best practices. Write code that is performant, maintainable, clear, and concise. Employ strong problem-solving skills, with the ability to debug problems in cloud-native distributed systems. 
Influence and educate the engineering organization to adopt new and improved architectural patterns. Provide robust documentation for use by engineers to promote self-service. Take calculated risks, champion new ideas, and cultivate your craft. What You Bring This position involves handling of classified federal data; under federal regulations, it is open to U.S. citizens only 5+ years of applicable experience Experience managing cloud platforms such as Azure, AWS, or similar. Experience using managed languages such as Python, Go, C#, Java, or similar. Experience operating in Kubernetes platforms like AKS, EKS, or similar. Experience utilizing CI/CD platforms to automate provisioning infrastructure, software builds, tests, and releases. Experience using observability tools such as APM, logging, and metrics to assist with debugging issues. Experience using Infrastructure as Code tools for provisioning infrastructure such as Terraform, AWS CloudFormation, or similar. Builder-operator mindset with proven production ownership (uptime, SLOs, on-call, incident leadership). Empathy to support the needs of software engineers. Benefits that Benefit You Competitive salary and 401k with employer match Discretionary time off Paid parental leave for all Medical, Dental, Vision plans Fitness Programs Emotional & Development Programs And yes, we have snacks in our offices Benefits listed herein may vary depending on the nature of your employment and the location where you work. The Pay: Axon is a total compensation company, meaning compensation is made up of base pay, bonus, and stock awards. The starting base pay for this role is between USD 115,500 in the lowest geographic market and USD 184,800 in the highest geographic market. The actual base pay is dependent upon many factors, such as: level, function, training, transferable skills, work experience, business needs, geographic market, and often a combination of all these factors. 
Our benefits offer an array of options to help support you physically, financially and emotionally through the big milestones and in your everyday life. To see more details on our benefits offerings please visit ****************************** Don't meet every single requirement? That's ok. At Axon, we Aim Far. We think big with a long-term view because we want to reinvent the world to be a safer, better place. We are also committed to building diverse teams that reflect the communities we serve. Studies have shown that women and people of color are less likely to apply to jobs unless they check every box in the . If you're excited about this role and our mission to Protect Life but your experience doesn't align perfectly with every qualification listed here, we encourage you to apply anyways. You may be just the right candidate for this or other roles. Important Notes The above is not intended as, nor should it be construed as, exhaustive of all duties, responsibilities, skills, efforts, or working conditions associated with this job. The job description may change or be supplemented at any time in accordance with business needs and conditions. Some roles may also require legal eligibility to work in a firearms environment. Axon's mission is to Protect Life and is committed to the well-being and safety of its employees as well as Axon's impact on the environment. All Axon employees must be aware of and committed to the appropriate environmental, health, and safety regulations, policies, and procedures. Axon employees are empowered to report safety concerns as they arise and activities potentially impacting the environment. We are an equal opportunity employer that promotes justice, advances equity, values diversity and fosters inclusion. 
We're committed to hiring the best talent - regardless of race, creed, color, ancestry, religion, sex (including pregnancy), national origin, sexual orientation, age, citizenship status, marital status, disability, gender identity, genetic information, veteran status, or any other characteristic protected by applicable laws, regulations and ordinances - and empowering all of our employees so they can do their best work. If you have a disability or special need that requires assistance or accommodation during the application or the recruiting process, please email **********************. Please note that this email address is for accommodation purposes only. Axon will not respond to inquiries for other purposes.
    $78k-103k yearly est. Auto-Apply 55d ago
  • Sr. Specialist Site Reliability Engineer

    Waystar 4.6 company rating

    Atlanta, GA jobs

    ** We are looking for a talented and driven Site Reliability Engineering (SRE) Specialist to support our engineering team, which manages the infrastructure and services that power our Waystar products. This role is ideal for an experienced engineer who thrives in data-intensive environments and is passionate about building reliable, scalable systems that ensure data integrity, availability, and performance. As an SRE Specialist, you'll work closely with engineering, product, and data teams to ensure our data licensing platforms are resilient, observable, and continuously improving. **WHAT YOU'LL DO** + **System Reliability & Performance** + Design and implement reliability solutions for data ingestion, processing, and delivery pipelines. + Define and maintain SLIs/SLOs for data licensing services and manage error budgets. + Build automation for deployment, monitoring, and incident response. + **Observability & Monitoring** + Enhance system observability through metrics, logging, and tracing. + Develop and maintain dashboards and alerts to proactively detect and resolve issues. + **Incident Response & Postmortems** + Participate in on-call rotations and lead incident response efforts. + Conduct root cause analysis and drive post-incident improvements. + Maintain runbooks and operational documentation. + **Collaboration & Continuous Improvement** + Partner with software and data engineers to embed reliability into system design. + Contribute to blameless postmortems and reliability reviews. + Share knowledge and mentor junior team members. **WHAT YOU'LL NEED** + 7+ years of experience in SRE, DevOps, or infrastructure engineering. + Strong understanding of cloud platforms (AWS, GCP, or Azure), container orchestration (Kubernetes), and infrastructure-as-code (Terraform, CloudFormation). + Experience with observability tools (e.g., Prometheus, Grafana, Splunk) and CI/CD pipelines. + Familiarity with data platforms, ETL pipelines, and distributed systems. 
+ Excellent problem-solving and communication skills. + Experience with Python, Powershell, and other similar languages + Active use of artificial intelligence (AI) tools and techniques to enhance performance, drive innovation, and improve decision-making across business functions + Ability to leverage AI tools and platforms to streamline workflows, improve decision-making, and drive innovation + Curiosity and adaptability in exploring emerging AI technologies, with a mindset for continuous learning and experimentation Preferred Qualifications + Experience with data licensing, data governance, or data compliance frameworks. + Exposure to data pipeline tools (e.g., Apache Airflow, Kafka, Spark). + Familiarity with regulatory requirements related to data usage and distribution. **ABOUT WAYSTAR** Through a smart platform and better experience, Waystar helps providers simplify healthcare payments and yield powerful results throughout the complete revenue cycle. Waystar's healthcare payments platform combines innovative, cloud-based technology, robust data, and unparalleled client support to streamline workflows and improve financials so providers can focus on what matters most: their patients and communities. Waystar is trusted by 1M+ providers, 1K+ hospitals and health systems, and is connected to over 5K commercial and Medicaid/Medicare payers. We are deeply committed to living out our organizational values: honesty; kindness; passion; curiosity; fanatical focus; best work, always; making it happen; and joyful, optimistic & fun. Waystar products have won multiple Best in KLAS or Category Leader awards since 2010 and earned multiple #1 rankings from Black Book surveys since 2012. The Waystar platform supports more than 500,000 providers, 1,000 health systems and hospitals, and 5,000 payers and health plans. For more information, visit waystar.com or follow @Waystar (**************************** on Twitter. 
**WAYSTAR PERKS** + Competitive total rewards (base salary + bonus, if applicable) + Customizable benefits package (3 medical plans with Health Savings Account company match) + We offer generous paid time off for our non-exempt team members, starting with 3 weeks + 13 paid holidays, including 2 personal floating holidays. We also offer flexible time off for our exempt team members + 13 paid holidays + Paid parental leave (including maternity + paternity leave) + Education assistance opportunities and free LinkedIn Learning access + Free mental health and family planning programs, including adoption assistance and fertility support + 401(K) program with company match + Pet insurance + Employee resource groups Waystar is proud to be an equal opportunity workplace. We celebrate, value, and support diversity and inclusion. Qualified applicants will receive consideration for employment without regard to race, color, religion, age, sex, national origin, disability status, genetics, marital status, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws. This applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation, and training. **Job Category:** Technology/Engineering **Job Type:** Full time **Req ID:** R2880
    $99k-128k yearly est. 4d ago
  • Sr. Specialist Site Reliability Engineer

    Waystar 4.6 company rating

    Atlanta, GA jobs

    We are looking for a talented and driven Site Reliability Engineering (SRE) Specialist to support our engineering team, which manages the infrastructure and services that power our Waystar products. This role is ideal for an experienced engineer who thrives in data-intensive environments and is passionate about building reliable, scalable systems that ensure data integrity, availability, and performance. As an SRE Specialist, you'll work closely with engineering, product, and data teams to ensure our data licensing platforms are resilient, observable, and continuously improving. WHAT YOU'LL DO * System Reliability & Performance * Design and implement reliability solutions for data ingestion, processing, and delivery pipelines. * Define and maintain SLIs/SLOs for data licensing services and manage error budgets. * Build automation for deployment, monitoring, and incident response. * Observability & Monitoring * Enhance system observability through metrics, logging, and tracing. * Develop and maintain dashboards and alerts to proactively detect and resolve issues. * Incident Response & Postmortems * Participate in on-call rotations and lead incident response efforts. * Conduct root cause analysis and drive post-incident improvements. * Maintain runbooks and operational documentation. * Collaboration & Continuous Improvement * Partner with software and data engineers to embed reliability into system design. * Contribute to blameless postmortems and reliability reviews. * Share knowledge and mentor junior team members. WHAT YOU'LL NEED * 7+ years of experience in SRE, DevOps, or infrastructure engineering. * Strong understanding of cloud platforms (AWS, GCP, or Azure), container orchestration (Kubernetes), and infrastructure-as-code (Terraform, CloudFormation). * Experience with observability tools (e.g., Prometheus, Grafana, Splunk) and CI/CD pipelines. * Familiarity with data platforms, ETL pipelines, and distributed systems. 
* Excellent problem-solving and communication skills. * Experience with Python, PowerShell, and other similar languages * Active use of artificial intelligence (AI) tools and techniques to enhance performance, drive innovation, and improve decision-making across business functions * Ability to leverage AI tools and platforms to streamline workflows, improve decision-making, and drive innovation * Curiosity and adaptability in exploring emerging AI technologies, with a mindset for continuous learning and experimentation Preferred Qualifications * Experience with data licensing, data governance, or data compliance frameworks. * Exposure to data pipeline tools (e.g., Apache Airflow, Kafka, Spark). * Familiarity with regulatory requirements related to data usage and distribution. ABOUT WAYSTAR Through a smart platform and better experience, Waystar helps providers simplify healthcare payments and yield powerful results throughout the complete revenue cycle. Waystar's healthcare payments platform combines innovative, cloud-based technology, robust data, and unparalleled client support to streamline workflows and improve financials so providers can focus on what matters most: their patients and communities. Waystar is trusted by 1M+ providers, 1K+ hospitals and health systems, and is connected to over 5K commercial and Medicaid/Medicare payers. We are deeply committed to living out our organizational values: honesty; kindness; passion; curiosity; fanatical focus; best work, always; making it happen; and joyful, optimistic & fun. Waystar products have won multiple Best in KLAS or Category Leader awards since 2010 and earned multiple #1 rankings from Black Book surveys since 2012. The Waystar platform supports more than 500,000 providers, 1,000 health systems and hospitals, and 5,000 payers and health plans. For more information, visit waystar.com or follow @Waystar on Twitter. 
WAYSTAR PERKS * Competitive total rewards (base salary + bonus, if applicable) * Customizable benefits package (3 medical plans with Health Savings Account company match) * We offer generous paid time off for our non-exempt team members, starting with 3 weeks + 13 paid holidays, including 2 personal floating holidays. We also offer flexible time off for our exempt team members + 13 paid holidays * Paid parental leave (including maternity + paternity leave) * Education assistance opportunities and free LinkedIn Learning access * Free mental health and family planning programs, including adoption assistance and fertility support * 401(K) program with company match * Pet insurance * Employee resource groups Waystar is proud to be an equal opportunity workplace. We celebrate, value, and support diversity and inclusion. Qualified applicants will receive consideration for employment without regard to race, color, religion, age, sex, national origin, disability status, genetics, marital status, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws. This applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation, and training.
    $99k-128k yearly est. Auto-Apply 4d ago
  • Sr. Specialist Site Reliability Engineer

    Waystar 4.6 company rating

    Atlanta, GA jobs

    We are looking for a talented and driven Site Reliability Engineering (SRE) Specialist to support our engineering team, which manages the infrastructure and services that power our Waystar products. This role is ideal for an experienced engineer who thrives in data-intensive environments and is passionate about building reliable, scalable systems that ensure data integrity, availability, and performance. As an SRE Specialist, you'll work closely with engineering, product, and data teams to ensure our data licensing platforms are resilient, observable, and continuously improving. WHAT YOU'LL DO System Reliability & Performance Design and implement reliability solutions for data ingestion, processing, and delivery pipelines. Define and maintain SLIs/SLOs for data licensing services and manage error budgets. Build automation for deployment, monitoring, and incident response. Observability & Monitoring Enhance system observability through metrics, logging, and tracing. Develop and maintain dashboards and alerts to proactively detect and resolve issues. Incident Response & Postmortems Participate in on-call rotations and lead incident response efforts. Conduct root cause analysis and drive post-incident improvements. Maintain runbooks and operational documentation. Collaboration & Continuous Improvement Partner with software and data engineers to embed reliability into system design. Contribute to blameless postmortems and reliability reviews. Share knowledge and mentor junior team members. WHAT YOU'LL NEED 7+ years of experience in SRE, DevOps, or infrastructure engineering. Strong understanding of cloud platforms (AWS, GCP, or Azure), container orchestration (Kubernetes), and infrastructure-as-code (Terraform, CloudFormation). Experience with observability tools (e.g., Prometheus, Grafana, Splunk) and CI/CD pipelines. Familiarity with data platforms, ETL pipelines, and distributed systems. Excellent problem-solving and communication skills. 
Experience with Python, PowerShell, and other similar languages Active use of artificial intelligence (AI) tools and techniques to enhance performance, drive innovation, and improve decision-making across business functions Ability to leverage AI tools and platforms to streamline workflows, improve decision-making, and drive innovation Curiosity and adaptability in exploring emerging AI technologies, with a mindset for continuous learning and experimentation Preferred Qualifications Experience with data licensing, data governance, or data compliance frameworks. Exposure to data pipeline tools (e.g., Apache Airflow, Kafka, Spark). Familiarity with regulatory requirements related to data usage and distribution. ABOUT WAYSTAR Through a smart platform and better experience, Waystar helps providers simplify healthcare payments and yield powerful results throughout the complete revenue cycle. Waystar's healthcare payments platform combines innovative, cloud-based technology, robust data, and unparalleled client support to streamline workflows and improve financials so providers can focus on what matters most: their patients and communities. Waystar is trusted by 1M+ providers, 1K+ hospitals and health systems, and is connected to over 5K commercial and Medicaid/Medicare payers. We are deeply committed to living out our organizational values: honesty; kindness; passion; curiosity; fanatical focus; best work, always; making it happen; and joyful, optimistic & fun. Waystar products have won multiple Best in KLAS or Category Leader awards since 2010 and earned multiple #1 rankings from Black Book™ surveys since 2012. The Waystar platform supports more than 500,000 providers, 1,000 health systems and hospitals, and 5,000 payers and health plans. For more information, visit waystar.com or follow @Waystar on Twitter. 
WAYSTAR PERKS Competitive total rewards (base salary + bonus, if applicable) Customizable benefits package (3 medical plans with Health Savings Account company match) We offer generous paid time off for our non-exempt team members, starting with 3 weeks + 13 paid holidays, including 2 personal floating holidays. We also offer flexible time off for our exempt team members + 13 paid holidays Paid parental leave (including maternity + paternity leave) Education assistance opportunities and free LinkedIn Learning access Free mental health and family planning programs, including adoption assistance and fertility support 401(K) program with company match Pet insurance Employee resource groups Waystar is proud to be an equal opportunity workplace. We celebrate, value, and support diversity and inclusion. Qualified applicants will receive consideration for employment without regard to race, color, religion, age, sex, national origin, disability status, genetics, marital status, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws. This applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation, and training.
    $99k-128k yearly est. Auto-Apply 2d ago
  • Sr Specialist Site Reliability Engineer

    Waystar 4.6 company rating

    Atlanta, GA jobs

    We are seeking a highly skilled and proactive Senior Specialist, Site Reliability Engineering (SRE) to help drive reliability, scalability, and performance across our critical platforms. This role is ideal for a senior-level engineer who combines deep technical expertise with a passion for automation, observability, and operational excellence. As a Senior Specialist, you'll work on complex reliability challenges, lead technical initiatives, and collaborate across engineering, product, and infrastructure teams to ensure our systems are resilient and efficient. WHAT YOU'LL DO Reliability Engineering Architect and implement solutions to improve system reliability, scalability, and performance. Define and manage SLIs/SLOs and error budgets across services. Lead efforts to automate operational tasks and improve system observability. Incident Management & Root Cause Analysis Serve as a technical lead during major incidents and drive resolution. Conduct deep root cause analyses and implement long-term fixes. Champion blameless postmortems and continuous improvement. Technical Leadership Lead cross-functional reliability initiatives and mentor junior engineers. Influence system design and architecture to embed reliability from the ground up. Collaborate with software engineers to optimize deployment pipelines and infrastructure. Monitoring & Tooling Enhance observability through metrics, logging, and tracing. Develop and maintain dashboards, alerts, and automated recovery systems. WHAT YOU'LL NEED 7+ years of experience in SRE, DevOps, or infrastructure engineering. Deep expertise in cloud platforms (AWS, GCP, or Azure), container orchestration (Kubernetes), and infrastructure-as-code (Terraform, CloudFormation). Strong proficiency in observability tools (e.g., Prometheus, Grafana, Splunk) and CI/CD pipelines. Proven track record of solving complex reliability challenges in distributed systems. Excellent communication and collaboration skills. 
Experience in Python, PowerShell, or other similar languages Active use of artificial intelligence (AI) tools and techniques to enhance performance, drive innovation, and improve decision-making across business functions Ability to leverage AI tools and platforms to streamline workflows, improve decision-making, and drive innovation Curiosity and adaptability in exploring emerging AI technologies, with a mindset for continuous learning and experimentation Preferred Qualifications Experience in regulated or high-availability environments (e.g., financial services, healthcare). Familiarity with chaos engineering, performance tuning, and capacity planning. Background in software development with strong coding skills (e.g., Python, Go, Bash). ABOUT WAYSTAR Through a smart platform and better experience, Waystar helps providers simplify healthcare payments and yield powerful results throughout the complete revenue cycle. Waystar's healthcare payments platform combines innovative, cloud-based technology, robust data, and unparalleled client support to streamline workflows and improve financials so providers can focus on what matters most: their patients and communities. Waystar is trusted by 1M+ providers, 1K+ hospitals and health systems, and is connected to over 5K commercial and Medicaid/Medicare payers. We are deeply committed to living out our organizational values: honesty; kindness; passion; curiosity; fanatical focus; best work, always; making it happen; and joyful, optimistic & fun. Waystar products have won multiple Best in KLAS or Category Leader awards since 2010 and earned multiple #1 rankings from Black Book™ surveys since 2012. The Waystar platform supports more than 500,000 providers, 1,000 health systems and hospitals, and 5,000 payers and health plans. For more information, visit waystar.com or follow @Waystar on Twitter. 
WAYSTAR PERKS Competitive total rewards (base salary + bonus, if applicable) Customizable benefits package (3 medical plans with Health Savings Account company match) We offer generous paid time off for our non-exempt team members, starting with 3 weeks + 13 paid holidays, including 2 personal floating holidays. We also offer flexible time off for our exempt team members + 13 paid holidays Paid parental leave (including maternity + paternity leave) Education assistance opportunities and free LinkedIn Learning access Free mental health and family planning programs, including adoption assistance and fertility support 401(K) program with company match Pet insurance Employee resource groups Waystar is proud to be an equal opportunity workplace. We celebrate, value, and support diversity and inclusion. Qualified applicants will receive consideration for employment without regard to race, color, religion, age, sex, national origin, disability status, genetics, marital status, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws. This applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation, and training.
    $99k-128k yearly est. Auto-Apply 48d ago
  • DevOps - Site Reliability Engineer (SRE)

    Resource Informatics Group Inc. 3.9 company rating

    Atlanta, GA jobs

    This Software Engineer will be part of the Site Reliability Engineering (SRE) team. The SRE team is an innovative team devoted to providing automated solutions and services for Cox Automotive to measure, evaluate and plan for visible, reliable application delivery and maintenance. As a member of the SRE team, you will work with development teams to help create automated pipelines and solutions required for continuous delivery in an Agile Dev/Ops environment. The tools and use-cases are diverse, and our challenge is to increase the development velocity by optimizing various parts of the pipeline and increase application stability. This is an opportunity to create automation, monitoring, and pipelines to improve deploy and response time across the board. We are looking for engineers who are passionate about infrastructure as code and continuous deployment to build scalable and highly reliable applications. If you love to figure out how all the pieces are put together and if automation and building tools to monitor and manage your applications sounds interesting to you, we want to talk to you. What you will do: Automate anything and everything! (Infrastructure build out, testing, deploying, monitoring, etc) Design and assist in the authoring of software tools that reliably manage application delivery Design and assist in the setup and maintenance of application monitoring and alerting Engage with Development/Capability Teams to ensure best practices are implemented Improve predictability and reliability of software releases, workflows and operating software. Reduce application deployment windows by leading company towards a Continuous Deployment environment Reduce mean time to recovery (MTTR) by helping troubleshoot, monitor, alert, and automating recovery. 
The skills we require: Python, Ruby, Go or other systems programming (moderate skills required) Experience with configuration management systems (Octopus, Chef, Puppet) Experience rolling out redundant, mission-critical applications in a highly available production environment Experience with version control systems (Git or SVN) Experience with Cloud Computing platforms (Amazon AWS, Kubernetes, Heroku, etc.) Experience with continuous integration tools (Jenkins, CircleCI, etc.), Artifactory (or Nexus) Excellent written communication, problem solving, and process management skills Desire to work in a fast paced, evolving, growing, dynamic environment The skills we prefer: Linux system engineering expertise VMware, VirtualBox experience. Experience supporting Ruby or Java applications Experience supporting Database Server infrastructure (MySQL, Postgres, etc.) Networking Knowledge Experience with HashiCorp tools (Vagrant, Terraform, Packer, etc.), Linux Containers (Docker, Rocket) Experience with Java build tools such as Ant, Maven, Gant, or Gradle Experience with agile development, continuous integration and automated testing Experience with dashboarding, monitoring
    $77k-108k yearly est. 23d ago
  • Principal Site Reliability Engineer

    Priority Technology Holdings, LLC 4.5 company rating

    Alpharetta, GA jobs

    Job Description Job title: Principal Site Reliability Engineer Reports to: Director, Site Reliability Engineering Department: Engineering Grade: 21 About Priority: Priority Technology Holdings, Inc. is a leading financial technology company on a mission to deliver a personalized, easy-to-adopt financial toolset that accelerates cash flow and optimizes working capital for businesses. Our vision is to eliminate the barriers to unlocking revenue - empowering businesses to grow faster and operate smarter. We achieve this through the Priority Commerce Engine, an innovative platform that combines payables, acquiring, and banking and treasury solutions. This unified approach allows businesses to streamline financial operations, reduce unnecessary costs, and uncover new revenue opportunities. At Priority, we're driven by results. We expect our people to be known for results - bringing expertise, momentum, and relentless focus to every challenge, helping our clients and each other thrive. About the Role: As a Principal Site Reliability Engineer, you will be a senior technical leader ensuring the reliability, scalability, and operational excellence of Priority's mission-critical financial technology platform. This role blends hands-on engineering with leadership, mentorship, and cross-functional influence. You will partner with product and infrastructure teams to ensure services are observable and resilient. You will resolve incidents, automate operational workflows, and set standards that raise the bar for reliability across the organization. This is an ideal role for an engineer who thrives at the intersection of software, systems, and operations, and who wants to shape the reliability culture at scale. Responsibilities: Define and drive the SRE strategy, aligning reliability practices with Priority's long-term business and technology goals. Lead incident response and retrospectives, driving systemic reliability improvements across multiple product workstreams. 
Own cross-cutting platform concerns such as observability, monitoring, alerting, performance, scalability, and resiliency. Partner with engineering leadership and product teams to embed reliability best practices into design, planning, and delivery. Automate detection, resolution, and recovery for recurring production issues, reducing toil and increasing delivery velocity. Evaluate and introduce new technologies, frameworks, and practices to improve reliability, cost efficiency, and performance. Mentor and coach engineers across levels, multiplying SRE skills and mindset across the organization. What Success Looks Like: Delivery & Execution: Services are deployed frequently and safely; incidents are resolved quickly with minimal customer impact; operational load is reduced through automation and reliable processes. Product Quality & Reliability: Systems consistently meet availability, latency, error-rate, and performance SLOs; incidents are rare, well-mitigated, and quickly remediated; redundancy and fault tolerance are built into all layers of the stack. Collaboration & Knowledge Sharing: Teams design with reliability in mind; incidents are resolved faster with shared playbooks; engineers across disciplines feel confident in operational practices; blameless postmortems drive organizational learning. Business & Product Impact: Reliability improvements directly enhance customer experience, reduce churn, and support adoption of new products and features without sacrificing stability. Professional Growth & Team Contribution: You are recognized as a mentor and thought leader, elevating engineering maturity across the org; teams continuously improve reliability practices, automation, and resilience. Candidate Requirements: Required: 10+ years of professional software engineering / systems engineering experience, including 5+ years in SRE or reliability-focused roles. Proven leadership experience influencing reliability practices across multiple teams or domains. 
Strong background in distributed systems, cloud infrastructure (AWS preferred), CI/CD, microservices, and serverless architectures. Deep expertise in observability, monitoring, alerting, and incident management. Proficiency in modern programming languages (Java, Node.js/JavaScript, or similar) and scripting for automation. Experience with relational and non-relational databases. Strong communication skills with the ability to collaborate effectively across engineering and business stakeholders. Preferred: Prior experience in high-throughput, regulated industries such as payments, fintech, or banking. Familiarity with event-driven and asynchronous architectures. Strong mentoring background, helping senior engineers adopt higher-level reliability thinking. Advanced certifications in cloud, reliability, or system design (e.g., AWS Solutions Architect Professional, SRE Foundation). Work Environment & Culture: We believe that performance and experience go hand in hand - an exceptional employee experience is earned through contribution. We are a results-driven team, grounded in our core values: ownership, authenticity, service, trust, innovation, and camaraderie. Our culture is built for those who want to make an impact. We challenge each other to grow, celebrate progress, and support one another through shared goals and real connection. Whether you're building technology, serving clients, or supporting internal teams, you'll be part of a company that empowers you to perform at your best and be known for results. Compensation and Benefits: Compensation range: $181,700 - $201,000 We invest in the whole employee - personally and professionally. Our benefits package is designed to support your well-being, growth, and success - both inside and outside of work. 
Financial Wellness Bonus programs 401(k) match Employee Stock Purchase Program (ESPP) HSA and FSA options Financial wellness resources and employee discount programs Health & Well-being Medical, dental, and vision coverage Mental health support for employees and dependents through Lyra Health Family planning and women's health benefits through Carrot Gym membership reimbursement and virtual wellness programs (including yoga) Time Off 3 weeks PTO to start, with unlimited PTO after year one Growth & Development Education expense reimbursement Leadership development programs Certified Payments Professional (CPP) certification support We believe great performance starts with feeling supported - and we've built our benefits with that in mind. Traditional Physical Requirements: Requires prolonged sitting, standing, bending, stooping and stretching. Requires the ability to lift 10 pounds. Requires eye-hand coordination, manual dexterity and a normal range of hearing and vision (with or without correction). Join our team at Priority Technology Holdings, Inc. and be part of a dynamic and innovative company that is transforming the financial technology landscape. Together, we can shape the future of payments and banking solutions while providing unmatched value to our clients.
    $181.7k-201k yearly 26d ago
  • Principal Site Reliability Engineer

    Priority Technology Holdings, LLC 4.5 company rating

    Alpharetta, GA jobs

    Job title: Principal Site Reliability Engineer Reports to: Director, Site Reliability Engineering Department: Engineering Grade: 21 About Priority: Priority Technology Holdings, Inc. is a leading financial technology company on a mission to deliver a personalized, easy-to-adopt financial toolset that accelerates cash flow and optimizes working capital for businesses. Our vision is to eliminate the barriers to unlocking revenue - empowering businesses to grow faster and operate smarter. We achieve this through the Priority Commerce Engine, an innovative platform that combines payables, acquiring, and banking and treasury solutions. This unified approach allows businesses to streamline financial operations, reduce unnecessary costs, and uncover new revenue opportunities. At Priority, we're driven by results. We expect our people to be known for results - bringing expertise, momentum, and relentless focus to every challenge, helping our clients and each other thrive. About the Role: As a Principal Site Reliability Engineer, you will be a senior technical leader ensuring the reliability, scalability, and operational excellence of Priority's mission-critical financial technology platform. This role blends hands-on engineering with leadership, mentorship, and cross-functional influence. You will partner with product and infrastructure teams to ensure services are observable and resilient. You will resolve incidents, automate operational workflows, and set standards that raise the bar for reliability across the organization. This is an ideal role for an engineer who thrives at the intersection of software, systems, and operations, and who wants to shape the reliability culture at scale. Responsibilities: Define and drive the SRE strategy, aligning reliability practices with Priority's long-term business and technology goals. Lead incident response and retrospectives, driving systemic reliability improvements across multiple product workstreams. 
Own cross-cutting platform concerns such as observability, monitoring, alerting, performance, scalability, and resiliency. Partner with engineering leadership and product teams to embed reliability best practices into design, planning, and delivery. Automate detection, resolution, and recovery for recurring production issues, reducing toil and increasing delivery velocity. Evaluate and introduce new technologies, frameworks, and practices to improve reliability, cost efficiency, and performance. Mentor and coach engineers across levels, multiplying SRE skills and mindset across the organization. What Success Looks Like: Delivery & Execution: Services are deployed frequently and safely; incidents are resolved quickly with minimal customer impact; operational load is reduced through automation and reliable processes. Product Quality & Reliability: Systems consistently meet availability, latency, error-rate, and performance SLOs; incidents are rare, well-mitigated, and quickly remediated; redundancy and fault tolerance are built into all layers of the stack. Collaboration & Knowledge Sharing: Teams design with reliability in mind; incidents are resolved faster with shared playbooks; engineers across disciplines feel confident in operational practices; blameless postmortems drive organizational learning. Business & Product Impact: Reliability improvements directly enhance customer experience, reduce churn, and support adoption of new products and features without sacrificing stability. Professional Growth & Team Contribution: You are recognized as a mentor and thought leader, elevating engineering maturity across the org; teams continuously improve reliability practices, automation, and resilience. Candidate Requirements: Required: 10+ years of professional software engineering / systems engineering experience, including 5+ years in SRE or reliability-focused roles. Proven leadership experience influencing reliability practices across multiple teams or domains. 
Strong background in distributed systems, cloud infrastructure (AWS preferred), CI/CD, microservices, and serverless architectures. Deep expertise in observability, monitoring, alerting, and incident management. Proficiency in modern programming languages (Java, Node.js/JavaScript, or similar) and scripting for automation. Experience with relational and non-relational databases. Strong communication skills with the ability to collaborate effectively across engineering and business stakeholders. Preferred: Prior experience in high-throughput, regulated industries such as payments, fintech, or banking. Familiarity with event-driven and asynchronous architectures. Strong mentoring background, helping senior engineers adopt higher-level reliability thinking. Advanced certifications in cloud, reliability, or system design (e.g., AWS Solutions Architect Professional, SRE Foundation). Work Environment & Culture: We believe that performance and experience go hand in hand - an exceptional employee experience is earned through contribution. We are a results-driven team, grounded in our core values: ownership, authenticity, service, trust, innovation, and camaraderie. Our culture is built for those who want to make an impact. We challenge each other to grow, celebrate progress, and support one another through shared goals and real connection. Whether you're building technology, serving clients, or supporting internal teams, you'll be part of a company that empowers you to perform at your best and be known for results. Compensation and Benefits: Compensation range: $181,700 - $201,000 We invest in the whole employee - personally and professionally. Our benefits package is designed to support your well-being, growth, and success - both inside and outside of work. 
Financial Wellness Bonus programs 401(k) match Employee Stock Purchase Program (ESPP) HSA and FSA options Financial wellness resources and employee discount programs Health & Well-being Medical, dental, and vision coverage Mental health support for employees and dependents through Lyra Health Family planning and women's health benefits through Carrot Gym membership reimbursement and virtual wellness programs (including yoga) Time Off 3 weeks PTO to start, with unlimited PTO after year one Growth & Development Education expense reimbursement Leadership development programs Certified Payments Professional (CPP) certification support We believe great performance starts with feeling supported - and we've built our benefits with that in mind. Traditional Physical Requirements: Requires prolonged sitting, standing, bending, stooping and stretching. Requires the ability to lift 10 pounds. Requires eye-hand coordination, manual dexterity and a normal range of hearing and vision (with or without correction). Join our team at Priority Technology Holdings, Inc. and be part of a dynamic and innovative company that is transforming the financial technology landscape. Together, we can shape the future of payments and banking solutions while providing unmatched value to our clients.
    $181.7k-201k yearly Auto-Apply 60d ago
  • Site Reliability Engineer

    Origami Risk 4.3company rating

    Atlanta, GA jobs

    The Site Reliability Engineer is a key force behind improving Origami's time to resolution and advancing overall site reliability and scalability. This person participates in efforts to identify root causes during post-incident investigations, while also identifying preventative measures to minimize future disruptions. They also assist with identifying root causes in performance challenges in client implementations and implement methods for tracking key performance metrics across clients. Starting base pay for this role is between $100,000 and $120,000. The actual base pay is dependent upon many factors, such as transferable skills, work experience, business needs, training, location, and market demands. The base pay range is subject to change and may be modified in the future. This role will be eligible for a bonus as well as competitive medical, dental, and vision benefits, wellness reimbursement, life insurance, and a 401(k) with company match. We offer vacation and sick leave benefits (under a flexible time off policy in most states). Responsibilities Leads post-incident investigations for the Site Reliability team. Conducts in-depth post-incident analyses to identify root causes and develops preventive strategies. Drafts clear and insightful RCAs for customer delivery. Cross trains colleagues on how to best leverage observability tools during incident and performance investigations. Provides visibility to all stakeholders throughout the entire Site Reliability process. Collaborates with cross-functional teams to implement system enhancements that enhance scalability and stability. Develops client-focused dashboards/alerts to proactively identify performance challenges. Monitors and continuously improves our time to resolution metrics. Maintains and configures core observability tools to ensure optimum performance and key metrics/data are available for incident response and performance investigations. 
Provides an actionable feedback loop to Observability and Engineering teams toward improving MELT and development patterns. Contributes to the development of automation tools to streamline incident response. Works proactively to prevent incidents and reduce their impact on our platform. Partners with the larger Cloud Operations, SRE, Engineering teams, and the business-at-large to advance our SaaS platforms. Other duties as assigned. Qualifications Bachelor's degree in Computer Science or related field (or equivalent experience) 5+ years of proven experience in a Site Reliability Engineering role. Strong knowledge of SRE best practices and incident management protocols Deep experience using and/or configuring New Relic, Data Dog, SumoLogic or similar observability tools Proficiency in reading and writing code (e.g., JavaScript, .NET, SQL) Familiarity with cloud platforms (e.g., AWS, Azure) and architectural patterns Excellent problem-solving skills and a data-driven approach to incident analysis Prior experience operating within a Public Cloud environment (AWS strongly preferred) Experience troubleshooting C#/.Net based web applications to identify bugs/performance challenges. 
Solid knowledge of SaaS operations Ability to succeed when facing ambiguity and differing levels of operational maturation Advanced written and verbal communication skills Windows and SQL-server troubleshooting skills preferred Knowledge of Continuous Integration and Continuous Delivery (CI/CD) pipelines preferred Experience working in an Infrastructure as a Code (IaC) environment preferred Benefits Medical and Dental coverage available for employees, dependents, domestic partners, and spouses Paid Time Off - Flexible options plus 10 paid company holidays where available** All full-time positions are hybrid, with many eligible to be completely remote Fully Paid by Origami Risk - Vision insurance, Short & Long-Term Disability Insurance, and Basic Life Insurance Generous family leave options-including adoption and foster care placements Pre-Tax Savings Accounts - Flexible Spending Account, Health Savings Account, Commuter Benefits, Dependent Care Savings Account Retirement Savings - 401(k) with company match up to 4% Employee Assistance Program (EAP) - Confidential & Free support offered to colleagues facing personal or work-related complications Education Assistance Program - to help colleagues pursue industry/role-specific certifications Wellness Benefits - reimbursement program to invest in healthy habits as well as support better colleague productivity and stress management Additional coverages available - Pet Insurance, Critical Illness Insurance, and Voluntary Life & AD&D coverage **Flexible PTO not available in California or the UK Who We Are Origami Risk provides integrated SaaS solutions to organizations across the risk and insurance ecosystem - from insured corporate and public entities to brokers and risk consultants, insurers, third party claims administrators (TPAs), and risk pools. 
We deliver our risk management and insurance core system solutions from a cloud-based platform that is highly configurable, completely scalable, and accessible via web browser and mobile app. Dais Technology, a subsidiary of Origami Risk, provides a no-code platform that revolutionizes insurance product creation for MGAs, insurers, and reinsurers. Dais' event-based architecture enables AI-driven bundling, automation, and real-time deployment. Solutions from Origami Risk and Dais Technology are backed by a best-in-class service team of experienced risk and insurance professionals who possess a balance of industry knowledge and technological expertise. A singular focus on helping clients achieve their business objectives underlies our approach to developing, implementing, and supporting our risk management, safety, compliance, and insurance core system technology solutions. Origami Risk is proud to be an equal opportunity employer. We thrive and benefit from diversity and are committed to creating an inclusive and equitable environment for all employees. We do not discriminate against any individual based upon race, religion, gender (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity, gender expression, color, sex, national origin, age, marital status, military or veteran status, disability, or any other characteristic protected by applicable law. Caution: Be alert to recruiting scams. We have received reports of individuals impersonating Origami Risk recruiters to deceive candidates into disclosing personal information. These impostors use fake Origami Risk domain names and email addresses. Please double-check that any email address from an Origami Risk recruiter ends with origamirisk.com or talent.icims.com. To confirm the legitimacy of any recruiting communication, feel free to email *********************************.
    $100k-120k yearly Auto-Apply 1d ago
  • Metrics Platform Site Reliability Engineer

    Capgemini 4.5company rating

    Atlanta, GA jobs

    Choosing Capgemini means choosing a company where you will be empowered to shape your career in the way you'd like, where you'll be supported and inspired by a collaborative community of colleagues around the world, and where you'll be able to reimagine what's possible. Join us and help the world's leading organizations unlock the value of technology and build a more sustainable, more inclusive world. **Job Location - Atlanta, GA** **Key Responsibilities** Manage and mentor a team of Site Reliability Engineers. Define and implement SRE strategies and best practices in alignment with organizational objectives. Monitor clients' service level agreements (SLAs), service level objectives (SLOs), and service level indicators (SLIs). Lead initiatives to improve system reliability, availability, scalability, and performance. Collaborate with development and operations teams to ensure reliability and resiliency goals are met. Implement and improve incident management processes to minimize downtime and ensure timely resolutions. Review and contribute to the architecture of critical systems, ensuring they meet reliability and performance goals. Drive observability practices by implementing robust monitoring, logging, and alerting systems. **Skills required** Proficiency in writing Splunk queries and alerts is a must. Hands-on experience with at least one APM tool (New Relic, AppDynamics, Honeycomb, Datadog) is a must. Expertise in automation tools and scripting languages (Python or JavaScript) is a must. Proficiency in scripting languages (Python or Node.js) is a must. Proficiency in any cloud platform (AWS, GCP, Azure) is a must. Strong understanding of distributed systems, microservices architecture, and container orchestration tools (e.g., Kubernetes). Experience with monitoring tools like Prometheus and Grafana is a must. **Job Description** Monitoring and Alerting: Implement and maintain monitoring systems to proactively identify potential issues and alert engineers to problems before they impact users.
Incident Response: Respond to incidents and outages, diagnose problems, and implement solutions to minimize downtime and restore service. Automation: Automate repetitive tasks and processes to improve efficiency and reduce manual effort. Performance Optimization: Identify and address performance bottlenecks to ensure systems run efficiently and effectively. Infrastructure Management: Manage and maintain the underlying infrastructure, including servers, networks, and cloud resources. Capacity Planning: Plan for future capacity needs to ensure systems can handle anticipated workloads. Release Engineering: Develop and maintain processes for deploying software updates and releases. Collaboration: Work closely with developers, operations teams, and other stakeholders to ensure system reliability and availability. Documentation: Maintain clear and concise documentation of systems, processes, and procedures. Continuous Improvement: Identify areas for improvement and implement changes to enhance system reliability and performance. **Life at Capgemini** Capgemini supports all aspects of your well-being throughout the changing stages of your life and career. For eligible employees, we offer: + Flexible work + Healthcare including dental, vision, mental health, and well-being programs + Financial well-being programs such as 401(k) and Employee Share Ownership Plan + Paid time off and paid holidays + Paid parental leave + Family building benefits like adoption assistance, surrogacy, and cryopreservation + Social well-being benefits like subsidized back-up child/elder care and tutoring + Mentoring, coaching and learning programs + Employee Resource Groups + Disaster Relief **Disclaimer:-** Capgemini is an Equal Opportunity Employer encouraging diversity in the workplace.
All qualified applicants will receive consideration for employment without regard to race, national origin, gender identity/expression, age, religion, disability, sexual orientation, genetics, veteran status, marital status or any other characteristic protected by law. This is a general description of the Duties, Responsibilities and Qualifications required for this position. Physical, mental, sensory or environmental demands may be referenced in an attempt to communicate the manner in which this position traditionally is performed. Whenever necessary to provide individuals with disabilities an equal employment opportunity, Capgemini will consider reasonable accommodations that might involve varying job requirements and/or changing the way this job is performed, provided that such accommodations do not pose an undue hardship. Capgemini is committed to providing reasonable accommodations during our recruitment process. If you need assistance or accommodation, please reach out to your recruiting contact. Click the following link for more information on your rights as an Applicant **Salary Transparency:** Capgemini discloses salary range information in compliance with state and local pay transparency obligations. The disclosed range represents the lowest to highest salary we, in good faith, believe we would pay for this role at the time of this posting, although we may ultimately pay more or less than the disclosed range, and the range may be modified in the future. The disclosed range takes into account the wide range of factors that are considered in making compensation decisions including, but not limited to, geographic location, relevant education, qualifications, certifications, experience, skills, seniority, performance, sales or revenue-based metrics, and business or organizational needs. At Capgemini, it is not typical for an individual to be hired at or near the top of the range for their role. 
The base salary range for the tagged location is $100000 - $130000 / year. This role may be eligible for other compensation including variable compensation, bonus, or commission. Full time regular employees are eligible for paid time off, medical/dental/vision insurance, 401(k), and any other benefits to eligible employees. Note: No amount of pay is considered to be wages or compensation until such amount is earned, vested, and determinable. The amount and availability of any bonus, commission, or any other form of compensation that are allocable to a particular employee remains in the Company's sole discretion unless and until paid and may be modified at the Company's sole discretion, consistent with the law. Capgemini is a global business and technology transformation partner, helping organizations to accelerate their dual transition to a digital and sustainable world, while creating tangible impact for enterprises and society. It is a responsible and diverse group of 340,000 team members in more than 50 countries. With its strong over 55-year heritage, Capgemini is trusted by its clients to unlock the value of technology to address the entire breadth of their business needs. It delivers end-to-end services and solutions leveraging strengths from strategy and design to engineering, all fueled by its market leading capabilities in AI, generative AI, cloud and data, combined with its deep industry expertise and partner ecosystem. Ref. code: 360290 Posted on: Nov 19, 2025 Experience Level: Experienced Professionals Contract Type: Permanent Location: Atlanta, GA, US; New York, US; New York, NY, US Brand: Capgemini Professional Community: Software Engineering
    $100k-130k yearly 23d ago
  • Metrics Platform Site Reliability Engineer

    Capgemini Holding Inc. 4.5company rating

    Atlanta, GA jobs

    Choosing Capgemini means choosing a company where you will be empowered to shape your career in the way you'd like, where you'll be supported and inspired by a collaborative community of colleagues around the world, and where you'll be able to reimagine what's possible. Join us and help the world's leading organizations unlock the value of technology and build a more sustainable, more inclusive world. Job Location - Atlanta, GA. Key Responsibilities: Manage and mentor a team of Site Reliability Engineers. Define and implement SRE strategies and best practices in alignment with organizational objectives. Monitor clients' service level agreements (SLAs), service level objectives (SLOs), and service level indicators (SLIs). Lead initiatives to improve system reliability, availability, scalability, and performance. Collaborate with development and operations teams to ensure reliability and resiliency goals are met. Implement and improve incident management processes to minimize downtime and ensure timely resolutions. Review and contribute to the architecture of critical systems, ensuring they meet reliability and performance goals. Drive observability practices by implementing robust monitoring, logging, and alerting systems. Skills required: Proficiency in writing Splunk queries and alerts is a must. Hands-on experience with at least one APM tool (New Relic, AppDynamics, Honeycomb, Datadog) is a must. Expertise in automation tools and scripting languages (Python or JavaScript) is a must. Proficiency in scripting languages (Python or Node.js) is a must. Proficiency in any cloud platform (AWS, GCP, Azure) is a must. Strong understanding of distributed systems, microservices architecture, and container orchestration tools (e.g., Kubernetes). Experience with monitoring tools like Prometheus and Grafana is a must. Job Description: Monitoring and Alerting: Implement and maintain monitoring systems to proactively identify potential issues and alert engineers to problems before they impact users.
Incident Response: Respond to incidents and outages, diagnose problems, and implement solutions to minimize downtime and restore service. Automation: Automate repetitive tasks and processes to improve efficiency and reduce manual effort. Performance Optimization: Identify and address performance bottlenecks to ensure systems run efficiently and effectively. Infrastructure Management: Manage and maintain the underlying infrastructure, including servers, networks, and cloud resources. Capacity Planning: Plan for future capacity needs to ensure systems can handle anticipated workloads. Release Engineering: Develop and maintain processes for deploying software updates and releases. Collaboration: Work closely with developers, operations teams, and other stakeholders to ensure system reliability and availability. Documentation: Maintain clear and concise documentation of systems, processes, and procedures. Continuous Improvement: Identify areas for improvement and implement changes to enhance system reliability and performance. Life at Capgemini: Capgemini supports all aspects of your well-being throughout the changing stages of your life and career. For eligible employees, we offer: * Flexible work * Healthcare including dental, vision, mental health, and well-being programs * Financial well-being programs such as 401(k) and Employee Share Ownership Plan * Paid time off and paid holidays * Paid parental leave * Family building benefits like adoption assistance, surrogacy, and cryopreservation * Social well-being benefits like subsidized back-up child/elder care and tutoring * Mentoring, coaching and learning programs * Employee Resource Groups * Disaster Relief Disclaimer:- Capgemini is an Equal Opportunity Employer encouraging diversity in the workplace. All qualified applicants will receive consideration for employment without regard to race, national origin, gender identity/expression, age, religion, disability, sexual orientation, genetics, veteran status, marital status or any other characteristic protected by law.
This is a general description of the Duties, Responsibilities and Qualifications required for this position. Physical, mental, sensory or environmental demands may be referenced in an attempt to communicate the manner in which this position traditionally is performed. Whenever necessary to provide individuals with disabilities an equal employment opportunity, Capgemini will consider reasonable accommodations that might involve varying job requirements and/or changing the way this job is performed, provided that such accommodations do not pose an undue hardship. Capgemini is committed to providing reasonable accommodations during our recruitment process. If you need assistance or accommodation, please reach out to your recruiting contact. Click the following link for more information on your rights as an Applicant ************************************************************************** Salary Transparency: Capgemini discloses salary range information in compliance with state and local pay transparency obligations. The disclosed range represents the lowest to highest salary we, in good faith, believe we would pay for this role at the time of this posting, although we may ultimately pay more or less than the disclosed range, and the range may be modified in the future. The disclosed range takes into account the wide range of factors that are considered in making compensation decisions including, but not limited to, geographic location, relevant education, qualifications, certifications, experience, skills, seniority, performance, sales or revenue-based metrics, and business or organizational needs. At Capgemini, it is not typical for an individual to be hired at or near the top of the range for their role. The base salary range for the tagged location is $100000 - $130000 / year. This role may be eligible for other compensation including variable compensation, bonus, or commission. 
Full time regular employees are eligible for paid time off, medical/dental/vision insurance, 401(k), and any other benefits to eligible employees. Note: No amount of pay is considered to be wages or compensation until such amount is earned, vested, and determinable. The amount and availability of any bonus, commission, or any other form of compensation that are allocable to a particular employee remains in the Company's sole discretion unless and until paid and may be modified at the Company's sole discretion, consistent with the law. Capgemini is a global business and technology transformation partner, helping organizations to accelerate their dual transition to a digital and sustainable world, while creating tangible impact for enterprises and society. It is a responsible and diverse group of 340,000 team members in more than 50 countries. With its strong over 55-year heritage, Capgemini is trusted by its clients to unlock the value of technology to address the entire breadth of their business needs. It delivers end-to-end services and solutions leveraging strengths from strategy and design to engineering, all fueled by its market leading capabilities in AI, generative AI, cloud and data, combined with its deep industry expertise and partner ecosystem.
    $100k-130k yearly 11d ago
  • Site Reliability Engineer II

    Southeastern Data Cooperative 4.3company rating

    Atlanta, GA jobs

    At Meridian, we're building resilient, high-performing systems that power meaningful experiences for the people and organizations we support. We're looking for an experienced Site Reliability Engineer (SRE) who thrives at the intersection of software engineering and systems operations-someone who is energized by solving complex problems, strengthening infrastructure, and creating automation that makes our technology smarter, faster, and more reliable. In this role, you will be a key technical leader responsible for ensuring the stability, performance, and scalability of our cloud-based platforms. You'll collaborate closely with DevOps, Development, and Security teams to design robust solutions, eliminate inefficiencies, and prevent issues before they reach our customers. If you're excited by deep technical challenges, building tools that empower teams, and driving operational excellence across an organization, we want to meet you. What You'll Do * Architect and maintain highly available, scalable AWS-based systems that support critical production environments. * Lead efforts to improve system performance, reliability, and observability across the enterprise. * Develop smart automation that reduces manual work, accelerates deployments, and improves consistency through scripting and Infrastructure-as-Code. * Build and refine monitoring, alerting, and incident-management processes to ensure rapid detection and resolution of issues. * Partner with engineering teams to troubleshoot problems, remove bottlenecks, and enhance application and system resiliency. * Conduct root cause analyses and implement long-term solutions that prevent recurring incidents. * Strengthen disaster recovery and backup strategies to protect data and maintain uptime. * Stay current on emerging cloud, DevOps, and SRE practices-constantly pushing Meridian toward greater operational maturity. 
What You'll Bring * 8+ years of hands-on experience with cloud technologies, with deep expertise in AWS. * A strong systems foundation-operating systems, networking, storage, distributed systems. * Experience in Identity & Access Management, including SAML/OIDC federation, MFA/Passwordless authentication, SCIM automation, and role-based access architecture across cloud and SaaS ecosystems. * Proven experience supporting complex cloud architectures and high-availability environments. * Proficiency in scripting or programming (Python, Go, Bash) to automate processes and manage infrastructure. * Experience with monitoring and observability tools such as Prometheus, Grafana, Datadog, or ELK. * Experience with enterprise identity management by unifying SSO, MFA/Passwordless, SCIM provisioning, and RBAC across platforms (Okta, Snowflake, AWS, GCP, OCI, Datadog, StrongDM, Twilio), establishing a consistent, secure, least-privilege access model for all users and services. * Familiarity with containers (Docker, Kubernetes), distributed storage (NFS, S3, Ceph, HDFS), and CI/CD tools such as Jenkins, GitLab, or CircleCI. * Hands-on experience with Infrastructure-as-Code tools like Terraform, AWS CDK/CloudFormation, or Pulumi. * Excellent troubleshooting skills, strong problem-solving ability, and a proactive mindset. * Effective communication skills and the ability to work collaboratively across technical teams. Preferred Certifications * AWS Certified Cloud Practitioner * AWS Certified Solutions Architect What We Offer * Outstanding Medical/Dental/Vision * Education/Training Reimbursement * On-Site Education Courses * Flexible Spending Account * Health/Wellness Reimbursement * Excellent Life and AD&D insurance * Paid Time Off: Eligible to begin accrual from date of hire; accrual amount based on years of service. Beginning accrual rate equivalent to 22 days per year. 9 holidays which include the day after Thanksgiving, and Christmas Eve. 
Up to 240 hours of PTO can roll over to the following year. * Volunteer Time: 8 hours per year * Retirement: Robust 401K. Following one year of eligible service, the Company contributes in two ways: (1) match of 100% of each dollar you contribute on the first five percent (5%) of eligible compensation, and (2) Employer basic contribution of 4% of base salary (with increases in basic contribution percentage based on years of service). Employees are 100% vested in Company funded contributions from the date they enter the plan. About Us: We were formed in 1976 by a group of Electric Membership Cooperatives with a vision for a single enterprise solution provider to serve data processing, IT, and operational needs to cooperatives, public utility districts, and municipal utilities. Through carefully curated acquisitions and partnerships, Meridian Cooperative has unified multiple leading-edge companies under its umbrella to truly execute that vision. Today, the Meridian Suite serves over 500 utilities across the country with industry-leading enterprise software solutions.
    $75k-99k yearly est. 17d ago
  • Database Reliability Engineer IV

    Pagerduty 3.8company rating

    Atlanta, GA jobs

    PagerDuty, Inc. (NYSE:PD) is a global leader in digital operations management. Trusted by nearly half of both the Fortune 500 and the Forbes AI 50, as well as approximately two-thirds of the Fortune 100, PagerDuty is essential for delivering always-on digital experiences to modern businesses. Join us. At PagerDuty, you'll tackle complex problems, collaborate with kind and ambitious people, and help build a more equitable world-all in a flexible, award-winning workplace. PagerDuty is seeking a proficient Senior Database Reliability Engineer (DBRE) IV to enhance our dynamic, customer-centric team! In this role as a DBRE, you will develop standards and practices for our back-end data storage, and data streaming systems. You will also build tools to enable engineers to easily interact with and optimize their database systems. This role offers a thrilling opportunity to contribute to scaling the PagerDuty Platform. The perfect candidate will come with coding/scripting abilities, Kafka expertise, experience with AWS, and a solid background in Site Reliability Engineering or DBRE in high-scale environments. If you have a track record of solving complex problems with automation and a keen interest in making database systems more approachable to others, then you're the ideal candidate. Key Responsibilities You partner with Engineering stakeholders to design and deliver reliable, scalable, secure, and performant data platforms. You continuously strive to improve the customer experience: Full lifecycle support (creation, development, deployment, retirement), observability, flexible connectivity, and monitoring. You stay current on technology trends in order to deliver innovative tools and approaches to interesting problems. You share your expertise with the entire Engineering organization You participate in a 24/7 on-call rotation. And yes, we use PagerDuty to manage our on-call schedules. Basic Qualifications 5+ years of experience in SRE, DBRE, or Software Development. 
3+ years of experience with database management systems such as MySQL, PostgreSQL, DynamoDB, Cassandra, etc. Experience in one or more languages such as Ruby, Python, or Golang. Experience working on cloud-native infrastructure in AWS. Experience working with a container scheduler platform, preferably Kubernetes. Preferred Qualifications Experience with infrastructure as code (Terraform) for managing database & cloud resources. Experience with data streaming solutions such as Kafka, AWS Kinesis, etc. Knowledge of various Kafka-related tools and frameworks, such as Apache ZooKeeper, Kafka Connect, Kafka Streams, or KSQL, can help with integrating Kafka-based solutions with external systems, data sources, and data destinations. Experience with monitoring, observability and logging platforms for databases (e.g. DataDog, New Relic, Grafana Logs, etc.). Knowledge of configuration management systems like Ansible, Chef, or Puppet for database infrastructure management. Experience in automating database releases, continuous integration/delivery systems, and relevant tools (e.g., Jenkins, CircleCI, Travis CI, Buildkite, etc.) with a focus on database performance and reliability. PagerDuty is a flexible, hybrid workplace. We embrace and encourage in-person working as an integral part of our culture. Both our employees and external research tell us that co-located collaboration strengthens connections, drives innovation, and accelerates learning. This role is expected to come into our Atlanta office 2 days per week, so you can thrive in your new role and fully embrace being a Dutonian! The base salary range for this position is 150,000 - 252,000 USD. This role may also be eligible for bonus, commission, equity, and/or benefits. Our base salary ranges are determined by role, level, and location. The range, which is subject to change based on primary work location, reflects the minimum and maximum base salary we expect to pay newly hired employees for the position.
Within the range, we determine pay for an individual based on a number of factors including market location, job-related knowledge, skills/competencies and experience. Your recruiter can share more about the specific offerings for this role, as well as the salary range for your primary work location during the hiring process. Hesitant to apply? We encourage you to submit your resume even if you don't meet every requirement. We value potential and consider each candidate's full professional story. Whether you're exploring a career change or taking your next step, we look forward to reviewing your application. If this just isn't the right role or time - sign up for job alerts! Where we work PagerDuty operates a hybrid work model with offices in 8 major cities: Atlanta, Lisbon, London, San Francisco, Santiago, Sydney, Tokyo, and Toronto. While we offer flexibility within our established locations, we cannot employ candidates residing in the locations listed below. Location restrictions: Australia: Northern Territory, Queensland, South Australia, Tasmania, Western Australia. Canada: Alberta, Manitoba, Newfoundland, Northwest Territories, Nunavut, PEI, Quebec, Saskatchewan, Yukon. United States: Alaska, Hawaii, Iowa, Louisiana, Mississippi, Nebraska, New Mexico, Oklahoma, Rhode Island, South Dakota, West Virginia, Wyoming. Candidates must reside in an eligible location, which varies by role. How we work Our values guide how we support customers, collaborate with colleagues, develop products, and foster a culture of belonging. They define not just our actions, but what it means to be Dutonian. People Leaders at PagerDuty are responsible for creating high performance environments that drive accountability. PagerDuty has four key dimensions that define our Leadership Impact: Lead Self, Lead the Team, Lead the Business, and Lead the Future. 
Each dimension has three associated competencies to give leaders a shared language for guiding their development, career, promotion, and succession planning discussions. Our Manager Expectations serve as a practical guide for managers to understand their responsibilities, prioritize their efforts, and drive engagement and performance. What we offer As a global organization, our total rewards approach is competitive with industry standards and aligned with local laws and regulations. Learn more, including country-specific offerings, on our benefits site. Your package may include: Competitive salary; Comprehensive benefits package; Flexible work arrangements; Company equity*; ESPP (Employee Stock Purchase Program)*; Retirement or pension plan*; Generous paid vacation time; Paid holidays and sick leave; Dutonian Wellness Days & HibernationDuty - companywide paid days off in addition to PTO; Paid parental leave: 22 weeks for pregnant parent, 12 weeks for non-pregnant parent (some countries have longer leave standards and we comply with local laws)*; Paid volunteer time off: 20 hours per year; Company-wide hack weeks; Mental wellness programs. *Eligibility may vary by role, region, and tenure. About PagerDuty PagerDuty, Inc. (NYSE:PD) is a global leader in digital operations management. The PagerDuty Operations Cloud is an AI-powered platform that empowers business resilience and drives operational efficiency for enterprises. With a generative AI assistant at its core, PagerDuty empowers teams to detect and resolve issues in real time, orchestrate complex workflows, and drive continuous improvement across their digital operations. 
Trusted by nearly half of both the Fortune 500 and the Forbes AI 50, as well as approximately two-thirds of the Fortune 100, PagerDuty is essential for delivering always-on digital experiences to modern businesses. PagerDuty is Great Place to Work-certified™, a Fortune Best Workplace for Millennials, a Fortune Best Medium Workplace, a Fortune Best Workplace in Technology, and a top rated product on TrustRadius and G2. Go behind-the-scenes on our careers site and @pagerduty on Instagram. Additional Information PagerDuty is an equal opportunity employer. PagerDuty does not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, parental status, veteran status, or disability status. Your privacy is important to us. By submitting an application, you confirm that you have read and understand PagerDuty's Privacy Policy. PagerDuty is committed to providing reasonable accommodations for qualified individuals with disabilities in our job application process. Should you require accommodation, please email accommodation@pagerduty.com and we will work with you to meet your accessibility needs. PagerDuty uses the E-Verify employment verification program.
    $110k-138k yearly est. Auto-Apply 4d ago
  • Project Leader Engineering II

    Vanderlande 4.7 company rating

    Marietta, GA jobs

    Job Title: Project Leader Engineering II. Job Description: The Project Leader Engineering is responsible for leading a project team in organizing, optimizing, and realizing sold projects into a functional system design. Extensive planning and coordination with other disciplines are essential to ensure on-time, cost-effective, and efficient project execution. As the spearhead of the engineering team, the Project Leader Engineering acts as a communication bridge between all parties, including customers and parties outside of engineering, and serves as the technical expert for the engineering team. Responsibilities & Tasks Project Execution: Execute projects from conception through customer handover, contributing positively to technical performance, financial results, and organizational success. Oversee the system design, including hardware, system and network design, software development (PLC and/or PC related systems), electrical installation, integration of components, system commissioning, and testing, leading to final handover to the customer. Ensure system functionality and maintain technical quality for material handling systems. Project Planning and Coordination: Develop engineering project plans and schedules for the project team, consistently monitoring and reporting progress. Coordinate and communicate activities with the Project Manager, Site Manager, and subcontractors as needed. Collaborate closely with all members of the value chain to ensure as-planned execution. Documentation and Reporting: Document functional system specifications, interface specifications, and (high-level) test plans (FAT, SAT, CAT), including validation. Manage all engineering project-related information, including complete project archiving. Report project progress and status to the Project Manager and Engineering Manager (Project Status Report with risks and opportunities). Budget and Safety: Monitor and control project engineering budgets. 
Execute health and safety analysis and coordinate safety during the engineering phase. Basic Qualifications Bachelor's Degree in Engineering or equivalent. Minimum 8 years of experience working on large-scale material handling systems. Leadership experience in a material handling environment. Willingness and ability to travel up to 50% or as required by project needs. Preferred Qualifications Experience utilizing MS Project, Primavera, or other scheduling software. Project Management certification or experience. Knowledge of (controls) engineering theories and principles. Knowledge of electrical installation and design standards. Proficient in Microsoft Office Suite of Tools. Knowledge-Skills-Abilities Technical knowledge/background in material handling processes and standards. In-depth knowledge of test approaches (Mechanical and software testing). Proactive and hands-on work ethic. Self-motivated with proven problem-solving abilities, especially in intense situations. Pragmatic and capable of balancing quality, lead time, costs, and technology. Ability to lead a specialized team responsible for complex projects. Motivator and trainer for team members.
    $101k-133k yearly est. Auto-Apply 22d ago
  • Project Leader Engineering III

    Vanderlande 4.7 company rating

    Marietta, GA jobs

    Job Title: Project Leader Engineering III. Job Description: The Project Leader Engineer III is responsible for leading a project team in organizing, optimizing, and realizing sold projects into a functional system design. Extensive planning and coordination with other disciplines are essential to ensure on-time, cost-effective, and efficient project execution. As the spearhead of the engineering team, the Project Leader Engineer III acts as a communication bridge between all parties, including customers and parties outside of engineering, and serves as the technical expert for the engineering team. Responsibilities & Tasks Project Execution: Execute projects from conception through customer handover, contributing positively to technical performance, financial results, and organizational success. Oversee the system design, including mechanical, hardware, installation, and testing, leading to final handover to the customer. Ensure system functionality and maintain technical quality for material handling systems. Project Planning and Coordination: Develop engineering project plans and schedules for the project team, consistently monitoring and reporting progress. Coordinate and communicate activities with the Project Manager, Site Manager, and subcontractors as needed. Collaborate closely with all members of the value chain to ensure as-planned execution. Documentation and Reporting: Manage all engineering project-related information, including complete project archiving. Report project progress and status to the Project Manager and Engineering Manager (Project Status Report with risks and opportunities). Budget and Safety: Monitor and control project engineering budgets. Execute health and safety analysis and coordinate safety during the engineering phase. Basic Qualifications Bachelor's degree in engineering or equivalent. Minimum 5 years of experience working on large-scale material handling systems. Willingness and ability to travel up to 50% or as required by project needs. 
Proactive and hands-on work ethic. Self-motivated with proven problem-solving abilities, especially in stressful situations. Pragmatic and capable of balancing quality, schedule, costs, scope, and resources. Ability to lead a specialized team responsible for complex projects. Motivator and trainer for team members. Knowledge of mechanical installation and design standards. Basic knowledge of controls engineering theories and principles. Basic knowledge of electrical installation and design standards. Proficient in Microsoft Office Suite of Tools. Preferred Qualifications Project Management certification or experience. Technical knowledge/background in material handling processes and standards. In-depth knowledge of test approaches (Mechanical). Leadership experience as a project team and people manager.
    $101k-133k yearly est. Auto-Apply 10d ago
  • Site Reliability Engineer

    Icims 4.6 company rating

    Atlanta, GA jobs

    We are seeking a skilled Engineer, Site Reliability (SRE) to contribute to the reliability, scalability, and performance of our multi-cloud SaaS platform serving thousands of customers worldwide. This role involves hands-on technical work in incident response, system monitoring, automation, and continuous improvement of our platform reliability. The successful candidate will work within a global SRE team to ensure optimal system performance and customer satisfaction. **About Us** When you join iCIMS, you join the team helping global companies transform business and the world through the power of talent. Our customers do amazing things: design rocket ships, create vaccines, deliver consumer goods globally, overnight, with a smile. As the Talent Cloud company, we empower these organizations to attract, engage, hire, and advance the right talent. We're passionate about helping companies build a diverse, winning workforce and about building our home team. We're dedicated to fostering an inclusive, purpose-driven, and innovative work environment where everyone belongs. 
**Responsibilities**

+ **System Monitoring & Reliability:**
  + Monitor multi-cloud infrastructure (AWS, Azure, GCP) using New Relic, Grafana, and Sumo Logic
  + Maintain reliability of AWS resources, Auth0/Okta authentication, databases, and legacy applications
  + Implement monitoring, alerting, and dashboards for assigned systems
+ **Incident Management & Response:**
  + Respond to alerts and incidents within SLA timeframes
  + Perform root cause analysis and document findings
  + Create and maintain runbooks and troubleshooting procedures
  + Participate in 24/7 on-call rotation
+ **Automation & Improvement:**
  + Develop scripts to reduce manual operational overhead
  + Build monitoring and alerting solutions
  + Support infrastructure-as-code initiatives
  + Implement automated remediation where possible
+ **Success Metrics:**
  + **Customer Impact:** Reduced MTTR and improved customer satisfaction scores
  + **Reliability:** Achievement of 99.9%+ uptime SLAs across all products and regions
  + **Proactive Prevention:** Reduction in incident frequency through automated detection and prevention
  + **Cross-functional Collaboration:** Improved partnership metrics with Product, Engineering, and Customer Success teams
  + **Automation Delivery:** Complete assigned automation projects to reduce manual tasks
  + **Knowledge Sharing:** Contribute to team knowledge base and mentor junior engineers

**Qualifications**

+ 4+ years experience in SRE, DevOps, or Infrastructure Engineering
+ Hands-on experience with AWS (required) and Azure (preferred)
+ Strong Linux system administration skills
+ Experience with monitoring tools (New Relic, Grafana, Prometheus)
+ Scripting skills in Python, Bash, or similar
+ Knowledge of databases (SQL Server, PostgreSQL, MongoDB)

**Preferred**

**Technical Experience:**

+ SaaS experience in a global environment
+ Authentication and identity management systems knowledge
+ Cloud certifications (AWS, Azure, or Google Cloud)
+ Infrastructure-as-code tools (Terraform, CloudFormation)

**Education/Certifications/Licenses:**

+ Bachelor's degree in Computer Science, Engineering, Information Systems, or related technical field
+ Equivalent combination of education and experience will be considered

**Working Conditions:**

+ Global role requiring flexibility for incident response and team coordination across time zones
+ Occasional client-facing responsibilities during critical incidents
+ Travel may be required for team building
+ Hybrid work environment with team members distributed globally

**EEO Statement**

iCIMS is a place where everyone belongs. We celebrate diversity and are committed to creating an inclusive environment for all employees. Our approach helps us to build a winning team that represents a variety of backgrounds, perspectives, and abilities. So, regardless of how your diversity expresses itself, you can find a home here at iCIMS. We are proud to be an equal opportunity and affirmative action employer. We prohibit discrimination and harassment of any kind based on race, color, religion, national origin, sex (including pregnancy), sexual orientation, gender identity, gender expression, age, veteran status, genetic information, disability, or other applicable legally protected characteristics. If you would like to request an accommodation due to a disability, please contact us at *****************.

**Compensation and Benefits**

We accept applications for this position on an ongoing basis until the position is filled. Applications will be reviewed as they are received, and qualified candidates may be contacted throughout the posting period. The anticipated base pay range for this position is $100,000-140,000.00 annually. Final compensation will be based on factors such as relevant experience, skills, education, internal equity, and market data. This range aligns with our commitment to equitable and transparent compensation practices, as required by applicable law. 
Competitive health and wellness benefits include medical, dental, vision, 401(k), dependent care, short-term and long-term disability, life and AD&D insurance, bonding and parental leave, mindfulness resources, an open vacation policy, sick days, paid holidays, quiet hours each workday, and tuition reimbursement. Benefits and eligibility may vary by location, role, and tenure. Learn more here: **********************************
    $100k-140k yearly 60d+ ago
