Reliability Engineer jobs at JPMorgan Chase & Co. - 678 jobs
Site Reliability Engineer II
JPMorgan Chase & Co. 4.8
Reliability engineer job at JPMorgan Chase & Co.
Job ID: 210637057. Job schedule: Full time.
Play a key role in ensuring system reliability at one of the world's most iconic and largest financial institutions. As a Site Reliability Engineer II at JPMorgan Chase within the Enterprise Technology, Corporate Technology team, you will use technology to solve business problems and leverage software engineering best practices as we strive towards excellence. This role often works independently to execute small to medium projects, but you'll also have the opportunity to collaborate with cross-functional teams to continually improve your level of knowledge about JPMorgan Chase's business and relevant technologies.
Job responsibilities
* Execute small to medium projects independently with initial direction, eventually graduating to designing and delivering projects autonomously.
* Leverage technology to solve business problems by writing high-quality, maintainable, and robust code following best practices in software engineering.
* Participate in triaging, examining, diagnosing, and resolving incidents, collaborating with others to solve problems at their root.
* Recognize toil within the role and proactively work towards eliminating it through systems engineering or updating application code.
* Understand observability patterns and strive to implement and improve service level indicators, objectives monitoring, and alerting solutions for optimal transparency and analysis.
Required qualifications, capabilities, and skills
* Formal training or certification in software engineering concepts with 2+ years of applied experience.
* Ability to code in at least one programming language.
* Experience maintaining a cloud-based infrastructure.
* Familiarity with site reliability concepts, principles, and practices.
* Familiarity with observability tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others.
* Familiarity with containers or a common server OS such as Linux and Windows.
* Emerging knowledge of software, applications, and technical processes within a given technical discipline (e.g., Cloud, AI, Android, etc.).
* Emerging knowledge of continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform.
* Emerging knowledge of common networking technologies.
Preferred qualifications, capabilities, and skills
* Ability to work in a large, collaborative team and demonstrate the willingness to vocalize ideas with peers and managers.
* Understanding of how to prioritize and adjust work plans to adapt to changes in assigned responsibilities and projects.
* Eagerness to participate in learning opportunities to enhance effectiveness in executing day-to-day project activities.
* Ability to demonstrate and apply existing and new system processes, methodologies, and skills to contribute to the development of systems.
* General knowledge of the financial services industry.
* Knowledge of IDEs and use of coding assistants.
* Knowledge of GEN AI for technology and operations.
$97k-119k yearly est. Auto-Apply 60d+ ago
Site Reliability Engineer
The Voleon Group 4.1
Berkeley, CA jobs
Voleon is a technology company that applies state‑of‑the‑art AI and machine learning techniques to real‑world problems in finance. For nearly two decades, we have led our industry and worked at the frontier of applying AI/ML to investment management. We have become a multibillion‑dollar asset manager, and we have ambitious goals for the future.
Your colleagues will include internationally recognized experts in artificial intelligence and machine learning research as well as highly experienced finance and technology professionals. The people who shape our company come from other backgrounds, including concert music performances, humanitarian aid, opera singing, sports writing, and BMX racing. You will be part of a team that loves to succeed together.
In addition to our enriching and collegial working environment, we offer highly competitive compensation and benefits packages, technology talks by our experts, a beautiful modern office, daily catered lunches, and more.
As a Site Reliability Engineer (SRE), you will work at the intersection of production operations and software development as you improve, manage, and monitor production‑critical infrastructure and data pipelines. At Voleon, many SREs serve together on a Production Operations team tasked with improving shared production infrastructure. Others are embedded with teams of software engineers to improve specific production systems owned by those teams. Voleon SREs work on important real‑world problems and collaborate with passionate and talented colleagues in an empowering, results‑driven environment. This role is a way to make a real difference: your contributions will make our critical systems more reliable, lower operational risk, and increase the efficiency of our engineering effort.
Responsibilities
Improve fault‑tolerance and maintainability of code in proprietary data pipelines and trading systems
Diagnose and fix bugs in code
Lead complex deployments
Automate manual workflows
Track and prioritize outstanding production‑related issues
Share an on‑call rotation responding to incidents to ensure the continuous operation of production‑critical systems
Requirements
Experience with coding and debugging Python
Experience with Linux
Familiarity with Relational Databases & SQL
Sharp analytical and problem‑solving skills and a persistent drive to make things work (better)
Strong growth mindset and a passion for learning
Strong technical communication skills
Attention to detail
2 years of relevant industry experience
An undergraduate degree or comparable training in a quantitative field or equivalent, relevant industry experience
Preferred Qualifications
Familiarity with best practices concerning code maintainability, documentation, quality assurance, continuous integration and deployment
Experience supporting production systems
Experience with any of the following: gRPC microservices, Postgres, Pandas, Golang, R, Git, Jenkins, Bazel, Prometheus, Grafana, Airflow, Kubernetes
The base salary for this position is $120,000 to $160,000 in the location(s) of this posting. Individual salaries are determined through a variety of factors, including, but not limited to, education, experience, knowledge, skills, and geography. Base salary does not include other forms of total compensation such as bonus compensation and other benefits. Our benefits package includes medical, dental and vision coverage, life and AD&D insurance, 20 days of paid time off, 9 sick days, and a 401(k) plan with a company match.
Friends of Voleon Candidate Referral Program
If you have a great candidate in mind for this role and would like to have the potential to earn $7,500 - $15,000 if your referred candidate is successfully hired and employed by The Voleon Group, please use this to submit your referral. For more details regarding eligibility, terms and conditions please make sure to review the Voleon Referral Bonus Program.
Equal Opportunity Employer
The Voleon Group is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.
$120k-160k yearly 2d ago
Senior AI SRE: Scale GenAI Reliability & Impact
Charles Schwab Corporation 4.8
San Francisco, CA jobs
A leading financial services firm is seeking a Senior AI Site Reliability Engineer responsible for designing and managing the reliability of AI-driven applications. In this role, you'll work on innovative projects and mentor junior engineers while collaborating with cross-functional teams. Candidates should have extensive experience in software development and reliability engineering, with a particular focus on AI systems. This on-site position is located in San Francisco and offers opportunities for professional growth and development.
$118k-152k yearly est. 3d ago
Process Improvement Specialist
DZ Corporation 4.3
The Villages, FL jobs
Reports To:
Operations Manager
The Process Improvement Specialist is responsible for optimizing production processes within the precast concrete facility. This role focuses on identifying inefficiencies, implementing process enhancements, and supporting quality and safety improvements across manufacturing operations. Working closely with production teams, engineers, and supervisors, the specialist helps streamline workflows, reduce waste, and ensure consistent product quality.
Key Responsibilities:
Process Analysis & Optimization:
Observe and analyze daily production activities (casting, curing, reinforcement, finishing, etc.) to identify bottlenecks and improvement opportunities.
Data Collection & Reporting:
Gather and track production data such as cycle times, material usage, downtime, and defect rates to support improvement projects.
Continuous Improvement Projects:
Assist in implementing Lean, 5S, or Six Sigma initiatives to improve plant efficiency, reduce waste, and enhance workplace organization.
Standard Work & Documentation:
Help develop and update standard operating procedures (SOPs), work instructions, and visual management tools.
Quality & Safety Support:
Collaborate with Quality Control and Safety teams to ensure process changes meet safety standards and product specifications.
Technical Support:
Support the introduction of new molds, equipment, or materials by conducting process trials and documenting results.
Collaboration:
Partner with maintenance, engineering, and production supervisors to troubleshoot recurring process issues.
Qualifications:
Education:
Associate's degree or technical diploma in Manufacturing Technology, Industrial Engineering, or related field.
Equivalent experience in precast concrete production or process improvement will be considered.
Experience:
2+ years in a manufacturing or precast concrete environment.
Familiarity with Lean Manufacturing, 6S, or Continuous Improvement principles.
Skills:
Strong mechanical aptitude and understanding of production equipment.
Ability to collect and interpret process data (cycle times, scrap, yield, etc.).
Proficiency in Microsoft Office and basic data entry tools.
Good communication and problem-solving skills.
Team-oriented and hands-on approach.
Preferred Qualifications:
Experience with precast or concrete manufacturing processes (casting, curing, form setup, reinforcement, finishing).
Knowledge of quality systems such as NPCA or PCI standards.
Basic CAD or technical drawing reading ability.
Certification in Lean or Six Sigma or willingness to acquire.
Performance Indicators:
Reduction in process waste or rework rates.
Increased production throughput and efficiency.
Improved safety compliance and incident reduction.
Consistency in meeting product quality standards.
Implementation and sustainability of improvement projects.
$68k-100k yearly est. 2d ago
Electromechanical Validation Engineer
Generis Tek Inc. 4.0
Milwaukee, WI jobs
Please Contact: To discuss this amazing opportunity, reach out to our Talent Acquisition Specialist Jigar Kachhia by email at **************************** or by phone at ************.
We have a permanent Electromechanical Validation Engineer role for our client in Willowbrook, IL. Please let me know if you or any of your friends would be interested in this position.
Position Details:
Electromechanical Validation Engineer- Willowbrook, IL
Location : Willowbrook, IL 60527
Project Duration : Full-time Permanent
Job Summary:
Join Client as an Electromechanical Validation Engineer and help ensure the reliability and performance of innovative battery diagnostic tools and systems. This role is ideal for engineers with deep electromechanical aptitude who thrive in a hands-on lab environment.
You'll lead validation for a range of battery testers and diagnostic platforms for both traditional 12V ICE vehicles and high voltage EV battery modules. Day-to-day work involves setting up complex test environments, troubleshooting real-world hardware issues, analyzing system behaviors, and collaborating with design, firmware, and manufacturing teams to ensure robust product performance.
Key Responsibilities:
• Develop and execute validation test plans for electromechanical systems (e.g., test equipment with embedded electronics, relays, sensors, and power electronics).
• Build, maintain, and instrument hardware test setups involving DC power systems, loads, thermal management, cabling, enclosures, and mechanical interfaces.
• Use tools like oscilloscopes, DAQ systems, power supplies, thermal chambers, and custom test fixtures to execute validation activities.
• Investigate issues by analyzing both electrical signals and mechanical performance; drive root-cause resolution in cross-functional teams.
• Define and manage validation timelines to align with hardware development milestones.
• Act as primary liaison with external test labs for regulatory, certification, and environmental testing (e.g., thermal, vibration, EMC).
• Own compliance, qualification, and certification efforts for design releases.
• Author detailed validation plans, protocols, test reports, and engineering documentation.
Position Requirements:
• BS or higher in Electrical Engineering, Mechatronics, or a related discipline
• Minimum 10 years of experience in hardware validation of electromechanical systems
• Proven track record diagnosing mixed signal, power, and electromechanical issues in lab environments
• Strong understanding of validation methods and lab instrumentation (oscilloscopes, DAQ, thermal cycling, high current load testing)
• Experience with LabVIEW or equivalent for test automation.
• Excellent technical communication and teamwork skills
WHY CHOOSE Client:
• Comprehensive Health Coverage: Medical, dental, and vision benefits that prioritize your well-being.
• Secure Your Future: Life and disability insurance provided at no extra cost to you.
• Invest in Tomorrow: 401K savings plan with company match.
• Performance Rewards: Annual bonus and profit-sharing opportunities.
• Time to Recharge: Enjoy 12 days of vacation per year (prorated based on start date); 5 emergency PTO days; plus 10 company-paid holidays.
• Continual Learning: Tuition reimbursement to support your educational goals.
• Health & Wellness: Onsite wellness screenings, flu shots, and subsidized health club memberships.
• Sustainable Choices: Free charging stations for hybrid and electric vehicles.
• Exclusive Perks: Discounts with auto suppliers.
• Appreciation in Action: Weekly breakfast or lunch as a gesture of our gratitude to our team.
• Must be able to travel to external labs.
$61k-78k yearly est. 1d ago
Staff Site Reliability Engineer
Figure 4.5
San Jose, CA jobs
Figure is an AI robotics company developing autonomous general-purpose humanoid robots. The goal of the company is to ship humanoid robots with human level intelligence. Its robots are engineered to perform a variety of tasks in the home and commercial markets. Figure is headquartered in San Jose, CA.
We are looking for a Site Reliability Engineer to own our internal systems infrastructure. This role is responsible for setting up and managing cloud and on-prem infrastructure to deliver highly available, reliable, and automated systems.
Responsibilities:
Be the go-to person for mission-critical infrastructure enabling critical operations such as Source Configuration Management, CI/CD systems, software distribution, supplier portals, manufacturing and more.
Migrate SaaS to self-hosted solutions to enhance security and reliability.
Implement monitoring and alerting systems, and define incident response plans and runbooks.
Reduce human workload by automating deployment and scaling.
Establish strong relationships with stakeholders to identify infrastructure needs and establish Service Level Objectives.
Use a data-driven approach to demonstrate service robustness and track optimization work.
Partner with the security team to ensure that security remediations and updates are applied in a timely manner.
Requirements:
Strong experience with Linux/Unix systems administration
Proficiency in programming/scripting
Extensive experience with cloud platforms (Azure, AWS, GCP) and on-prem hardware architectures
Experience designing, deploying, and operating high-availability, fault-tolerant, and distributed systems.
Mastery of infrastructure as code (Terraform, CloudFormation, Ansible…)
Familiarity with monitoring, logging, and alerting tools (Prometheus, Grafana, Datadog…)
Solid understanding of networking fundamentals (TCP/IP, DNS, HTTP, load balancers, firewalls)
Experience defining Service Level Objectives (SLO), developing runbooks/incident response plans, facilitating post-mortems and managing systems assets.
Ability to work in cross-functional teams with developers, infra, and product teams
Excellent verbal and written communication skills
The US base salary range for this full-time position is between $175,000 - $250,000 annually.
The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended.
$175k-250k yearly Auto-Apply 60d+ ago
Site Reliability Engineer - Capital Markets
Jefferies Financial Group Inc. 4.8
Jersey City, NJ jobs
Jefferies is seeking a Site Reliability Engineer to play an instrumental role in supporting Equity Front office trading application, risk and middle office real-time products, developed and used for Equity Cash and ETS application.
As part of the wider platform engineering team, you will be working closely with the Business users interactively throughout the day, along with technical, analysis and testing colleagues. Investigation and resolution of the work items at hand will require competent technical skills and a keen intellect. The business is a growth area, with current investments taking place in all the technology, business and middle office areas.
Responsibilities:
Front Line Site Reliability Engineering and Support functions for Equity trading systems used by Jefferies clients as well as internal users.
Build monitoring tools for application and infrastructure components.
Implement and manage scalable infrastructure using cloud-native technologies and tools.
Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding.
Partner with business, development and infrastructure teams to improve services through rigorous testing and release procedures.
Develop and maintain CI/CD pipelines to streamline deployment processes.
Expedient deployment of new systems. Capacity planning, Platform Management, and support for increasing volumes and business growth.
Create sustainable systems and services through automation.
Collaborate with Application team to establish and enforce production and development standards.
Document procedures, best practices and troubleshooting FAQs.
Resolve complex application and technical problems.
Debugging the system and fixing the production related issues.
Escalate / follow-up on permanent fix for development related issues.
Lead incident response efforts and post-mortem analysis to prevent future occurrences.
Handles complex operational tasks and recommends process and technology changes.
Provide global support, including weekend availability to troubleshoot production-related issues and perform checkouts.
Ability to work both independently and in groups in an energetic, diverse environment.
Participate in on-call rotations to ensure 24/7 system availability and support.
Support compliance and legal queries.
Qualifications:
Strong experience in Windows and Linux/Unix services.
Strong experience in scripting languages such as PowerShell, Python and SQL.
Strong Knowledge of monitoring tools - Nagios, Splunk, OTEL, Datadog
Strong Knowledge of FIX protocol
Strong Domain skills - Must have working experience in Capital Markets across modules and instruments especially - CASH, ETS, Bonds, Options, Futures, Swaps products
Experience in BFSI (Banking, Financial Services and Insurance) domain applications with a proper understanding of the Trade Lifecycle.
Excellent communication, time management and project management skills.
Primary Location Full Time Salary Range of $175,000 - $200,000
$175k-200k yearly Auto-Apply 51d ago
Site Reliability Engineer
Tata Consulting Services 4.3
Atlanta, GA jobs
Must Have Technical/Functional Skills
* Monitoring solutions - CloudWatch, Dynatrace, PagerDuty
* DevOps - GitLab, GitLab CI/CD, AWS Cloud Development Kit (CDK), CloudFormation (CFT) and CodePipeline
* Languages, IDEs, Tools & Architectures - Node.js, TypeScript, YAML, VSCode, IntelliJ, Eclipse, REST API, Postman, Docker
* AWS Technologies - API Gateway, Route 53, Lambda, Kafka, ElastiCache, PostgreSQL, SNS, Quarkus, EventBridge, Secrets Manager
Roles & Responsibilities
* Building and supporting a reliable application suite for the environment to meet the development and maintenance requirements of systems/platforms
* Implement Service Reliability Engineering by working as part of the development team to evaluate the health, stability, and reliability of applications
* Lead the team in best practices in incident, problem, and change management
* Utilizing monitoring, alerts, dashboards, and management tools to ensure the availability, reliability, cost, and performance of applications and services
* Constantly working to improve and implement automation of applications tasks
* Providing technical support for systems/platforms according to application SLAs
* Responsible for designing and developing resiliency in the application code, troubleshooting incidents, engaging with squads to address failure patterns, and participating in incident management
* Develop delivery pipelines and automated deployment scripts
* Configure services, such as databases and monitoring
Salary Range: $100,000-$125,000 a year
TCS Employee Benefits Summary:
Discretionary Annual Incentive.
Comprehensive Medical Coverage: Medical & Health, Dental & Vision, Disability Planning & Insurance, Pet Insurance Plans.
Family Support: Maternal & Parental Leaves.
Insurance Options: Auto & Home Insurance, Identity Theft Protection.
Convenience & Professional Growth: Commuter Benefits & Certification & Training Reimbursement.
Time Off: Vacation, Time Off, Sick Leave & Holidays.
Legal & Financial Assistance: Legal Assistance, 401K Plan, Performance Bonus, College Fund, Student Loan Refinancing.
$100k-125k yearly 5d ago
Staff Site Reliability Engineer
CME Group 4.4
Chicago, IL jobs
We're looking for a Staff Site Reliability Engineer to join our team, focusing on the core systems that power global financial markets. This isn't just about keeping the lights on; it's about pioneering the future of financial technology. As a member of our Clearing department, you'll be on the front lines, ensuring the integrity and performance of mission-critical systems that facilitate billions of dollars in daily transactions. If you're a builder at heart, driven by a passion for creating ultra-reliable and resilient systems, you'll thrive here.
This is a hybrid role. You must be in our office 2+ days a week
What You'll Get
* A supportive environment fostering career progression, continuous learning, and an inclusive culture.
* Broad exposure to CME's diverse products, asset classes, and cross-functional teams.
* A competitive salary and comprehensive benefits package. Learn more about our career opportunities here.
What You'll Do
As a Staff Site Reliability Engineer, you'll be a visionary builder of our resilient infrastructure. You'll move beyond conventional operations to apply software engineering principles to every facet of our clearing systems.
* Pioneer solutions to guarantee the reliability, performance, and availability of our CME clearing and risk systems, where every millisecond and every transaction counts.
* Architect and implement cutting-edge solutions for application resiliency and fault tolerance.
* Drive automation and continuous improvement across the entire system lifecycle, eliminating manual toil and enhancing operational excellence.
* Integrate SRE principles directly into the software development lifecycle, embedding reliability from day one.
* Collaborate with cross-functional development and platform teams, providing expert-level guidance to deploy and maintain critical applications.
* Innovate and lead efforts to prevent incidents, enhance operational processes, and automate solutions at a global scale.
* Spearhead the adoption of observability and performance testing, guiding teams to a "build with SRE mindset" culture.
* Own the end-to-end operational integrity of products, understanding and contributing to the bigger picture of the organization.
What You'll Bring
* A strong academic background: Bachelor's degree in Engineering, Computer Science, Information Technology, or a related field is strongly preferred.
* Cloud expertise: Hands-on experience deploying and operating applications using IaaS and PaaS on major cloud providers, preferably Google Cloud Services.
* Coding fluency: Proficiency in one or more of the following languages: Java, Python, Bash, or Go. Typescript and/or Rust are a significant plus.
* Infrastructure as Code (IaC) mastery: Experience with tools such as GKE, Terraform, CloudFormation, and Chef.
* Proven reliability engineering skills: Deep knowledge of SRE and security best practices, with a track record of implementing them into workflows. A solid understanding of performance testing tools is essential, along with the ability to help teams resolve complex performance issues.
* Automation prowess: Demonstrated experience with automation, CI/CD, orchestration, and configuration management.
* Observability knowledge: Familiarity with logging and observability platforms such as OpenTelemetry and Prometheus.
* A security-first mindset: Strong understanding of security and compliance frameworks.
* Problem-solving abilities: Excellent written and verbal communication skills, with the ability to convey complex technical concepts clearly to both technical and non-technical audiences.
* Strong collaboration skills: An agile team player who is self-motivated and can work with minimal supervision while juggling multiple concurrent projects.
* A passion for innovation: A continuous desire to learn and stay up-to-date with the latest technologies and industry trends.
CME Group is committed to offering a competitive total rewards package for our employees that recognizes their contributions to the business and reflects our long-term investment in their future. The pay range for this role is $128,500-$214,100. Actual salary offered will be dependent on a wide array of factors including but not limited to: relevant experience, skills, education and comparison to internal employees (where relevant). Our compensation program also includes an annual target bonus opportunity for all employees, as well as the opportunity to become an owner in the company through our broad-based equity program. Through our benefits program, we strive to offer flexibility, value and choice. From comprehensive health coverage, to a retirement package that includes both a 401(k) and an active pension plan, to highly competitive education reimbursement provisions, paid time off and a mental health benefit, CME Group offers a holistic benefits package for our team and their dependents.
CME Group: Where Futures are Made
CME Group is the world's leading derivatives marketplace. But who we are goes deeper than that. Here, you can impact markets worldwide. Transform industries. And build a career by shaping tomorrow. We invest in your success and you own it - all while working alongside a team of leading experts who inspire you in ways big and small. Problem solvers, difference makers, trailblazers. Those are our people. And we're looking for more.
At CME Group, we embrace our employees' unique experiences and skills to ensure that everyone's perspectives are acknowledged and valued. As an equal-opportunity employer, we consider all potential employees without regard to any protected characteristic.
Important Notice: Recruitment fraud is on the rise, with scammers using misleading promises of job offers and interviews to solicit money and personal information from job seekers. CME Group adheres to established procedures designed to maintain trust, confidence and security throughout our recruitment process. Learn more here.
$128.5k-214.1k yearly 60d+ ago
Reliability Engineer
Tata Consulting Services 4.3
Marlborough, MA jobs
* SRE able to quickly write automations and self-healing scripts, and to understand and resolve errors from microservices on essentially any stack (full-stack capable).
* Operations skillset with enough attitude to scale to a Reliability Engineer.
* Should be able to handle customer communication and coordination with offshore team.
TCS Employee Benefits Summary:
* Discretionary Annual Incentive.
* Comprehensive Medical Coverage: Medical & Health, Dental & Vision, Disability Planning & Insurance, Pet Insurance Plans.
* Family Support: Maternal & Parental Leaves.
* Insurance Options: Auto & Home Insurance, Identity Theft Protection.
* Convenience & Professional Growth: Commuter Benefits & Certification & Training Reimbursement.
* Time Off: Vacation, Time Off, Sick Leave & Holidays.
* Legal & Financial Assistance: Legal Assistance, 401K Plan, Performance Bonus, College Fund, Student Loan Refinancing.
Salary Range - $100,000-$120,000 a year
$100k-120k yearly 5d ago
Reliability Engineer (SRE OMS)
Tata Consulting Services 4.3
Marlborough, MA jobs
* SRE with Sterling OMS skillset and adaptability to distributed systems, developing automations with AI/GenAI tools, etc.
* Operations skillset with enough attitude to scale to a Reliability Engineer.
* Should be able to handle customer communication and coordination with offshore team.
TCS Employee Benefits Summary:
* Discretionary Annual Incentive.
* Comprehensive Medical Coverage: Medical & Health, Dental & Vision, Disability Planning & Insurance, Pet Insurance Plans.
* Family Support: Maternal & Parental Leaves.
* Insurance Options: Auto & Home Insurance, Identity Theft Protection.
* Convenience & Professional Growth: Commuter Benefits & Certification & Training Reimbursement.
* Time Off: Vacation, Time Off, Sick Leave & Holidays.
* Legal & Financial Assistance: Legal Assistance, 401K Plan, Performance Bonus, College Fund, Student Loan Refinancing.
Salary Range - $100,000-$120,000 a year
$100k-120k yearly 5d ago
Site Reliability Engineer
Tata Consulting Services 4.3
Miami, FL jobs
Must-Have
* Strong development experience in .NET and Java frameworks.
* Proven leadership managing SRE and DevOps teams.
* Incident and problem management using ServiceNow.
* Expertise in Observability: AppDynamics, PagerDuty, Grafana, Splunk.
* Deep understanding of CI/CD with Azure ADO, GitHub, Maven, Gradle.
* Automated regression and performance testing experience with Selenium, JMeter.
* Experience building self-healing systems.
* Strong skills in root cause analysis (RCA) and problem identification.
* Ability to define and enforce SLAs and response metrics.
* Document and maintain version-controlled knowledge repositories.
* Exposure to self-healing systems in SRE or DevOps context.
Good-to-Have
* Certifications in AWS/GCP/Azure
Salary Range-$100,000-$120,000 a year
#LI-KR3
TCS Employee Benefits Summary:
Discretionary Annual Incentive.
Comprehensive Medical Coverage: Medical & Health, Dental & Vision, Disability Planning & Insurance, Pet Insurance Plans.
Family Support: Maternal & Parental Leaves.
Insurance Options: Auto & Home Insurance, Identity Theft Protection.
Convenience & Professional Growth: Commuter Benefits & Certification & Training Reimbursement.
Time Off: Vacation, Time Off, Sick Leave & Holidays.
Legal & Financial Assistance: Legal Assistance, 401K Plan, Performance Bonus, College Fund, Student Loan Refinancing.
Experience working in a Travel/Tourism industry
$100k-120k yearly 2d ago
Site Reliability Engineer - Capital Markets
Jefferies Financial Group Inc. 4.8
New York, NY jobs
Jefferies is seeking a Site Reliability Engineer to play an instrumental role in supporting Equity Front Office trading applications and risk and middle office real-time products, developed and used for Equity Cash and ETS applications. As part of the wider platform engineering team, you will work closely and interactively with business users throughout the day, along with technical, analysis, and testing colleagues. Investigation and resolution of the work items at hand will require competent technical skills and a keen intellect. The business is a growth area, with current investments taking place in all the technology, business, and middle office areas.
Responsibilities:
* Front-line Site Reliability Engineering and support functions for Equity trading systems used by Jefferies clients as well as internal users.
* Build monitoring tools for application and infrastructure components.
* Implement and manage scalable infrastructure using cloud-native technologies and tools.
* Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding.
* Partner with business, development and infrastructure teams to improve services through rigorous testing and release procedures.
* Develop and maintain CI/CD pipelines to streamline deployment processes.
* Expedient deployment of new systems. Capacity planning, Platform Management, and support for increasing volumes and business growth.
* Create sustainable systems and services through automation.
* Collaborate with Application team to establish and enforce production and development standards.
* Document procedures, best practices and troubleshooting FAQs.
* Resolve complex application and technical problems.
* Debugging the system and fixing the production related issues.
* Escalate / follow-up on permanent fix for development related issues.
* Lead incident response efforts and post-mortem analysis to prevent future occurrences.
* Handles complex operational tasks and recommends process and technology changes.
* Global support and includes weekend availability to troubleshoot production related issues and perform checkouts.
* Ability to work both independently and in groups in an energetic, diverse environment.
* Participate in on-call rotations to ensure 24/7 system availability and support.
* Support compliance and legal queries.
Qualifications:
* Strong experience in Windows and Linux/Unix services.
* Strong experience in scripting languages such as PowerShell, Python, and SQL.
* Strong Knowledge of monitoring tools - Nagios, Splunk, OTEL, Datadog
* Strong Knowledge of FIX protocol
* Strong Domain skills - Must have working experience in Capital Markets across modules and instruments especially - CASH, ETS, Bonds, Options, Futures, Swaps products
* Experience in BFSI (Banking and Financial Industry) Domain applications with a proper understanding of the Trade Lifecycle.
* Excellent communication, time management and project management skills.
Primary Location Full Time Salary Range of $175,000 - $200,000
$175k-200k yearly Auto-Apply 33d ago
Network Reliability Engineer III
CME Group 4.4
Chicago, IL jobs
As we embark on a journey to transform the Network Services Group in CME, we are seeking a Network Reliability Engineer III to join our dynamic team. In this role, you will design, develop and maintain self-service tools and applications that enhance productivity and reduce operational costs. You will work across the full stack-both front-end and back-end-to architect microservices (GKE) in Google Cloud Platform (GCP), driving our infrastructure towards greater automation and reliability.
We are a global team across US, UK, India and Singapore made up of a diverse range of people from varied backgrounds who each bring unique network experiences and skill sets. The relatively new Network Reliability/Automation team are responsible for building a suite of custom automation tools and developing our self-healing capabilities while working closely with other members of the Network Services team in project delivery to ensure one of the largest Exchange network infrastructures in the world is highly available, resilient, secure and reliable.
Responsibilities
* Design, develop and maintain self-service and automation tools to streamline IT operations and reduce manual effort.
* Engage in full-stack development, delivering responsive front-end interfaces as well as robust scalable back-end services.
* With support, architect, deploy, and scale microservices on GCP, with particular emphasis on containers and Google Kubernetes Engine (GKE).
* Manage cloud infrastructure via Infrastructure-as-Code (IaC), primarily using Terraform to provision and maintain resources.
* Operate and troubleshoot solutions on Linux-based platforms, leveraging Visual Studio Code (VSCode) as the primary development environment.
* Adhere to software engineering best practices, including PEP8 coding standards, SOLID design principles, and established SDLC processes.
* Implement and manage CI/CD pipelines with a DevOps mindset, ensuring rapid, reliable delivery of code.
* Develop and consume Flask-based RESTful APIs to support network and security automation.
* Collaborate within an Agile Scrum framework, utilizing tools such as Bitbucket and Jira to track progress and manage sprints.
* Apply strong analytical and problem-solving skills to balance multiple project variables and deliver high-quality solutions on schedule.
What we are looking for
* Approximately 2-3 years' hands-on Python programming experience, with a demonstrable track record of automation or tooling projects.
* Knowledge and experience working with both Python Django and Flask in a corporate environment.
* Any experience in network and security automation, coupled with understanding of network fundamentals (routing, switching, firewalls, VPNs) would be beneficial.
* Experience developing REST APIs using Flask (or a comparable Python framework).
* Applicants with front-end experience using Javascript/JQuery/HTML5/CSS would be ideal.
* Familiarity with Infrastructure-as-Code using Terraform (or similar) to manage cloud resources.
* Comfortable working in Linux environments and proficient in using Visual Studio Code (VSCode).
* Strong software engineering mindset: adherence to PEP8, SOLID principles, and best practices for SDLC, CI/CD and DevOps.
* Excellent communication skills, both verbal and written, with the ability to convey technical concepts to diverse stakeholders.
* Highly analytical, with the ability to troubleshoot complex issues and manage multiple tasks concurrently.
* Experience working in Agile Scrum teams, utilizing Bitbucket and Jira (or equivalent tools) for version control and project tracking.
Personal Attributes
* Proactive and positive attitude, taking initiative to identify and resolve issues ahead of time.
* Collaborative team player, eager to contribute knowledge and assist colleagues.
* Innovative thinker who brings fresh ideas and constructive suggestions for continuous improvement.
Education
Bachelor's Degree in Computer Science, Engineering or a related field is preferred. Equivalent practical experience will also be considered.
#LI - Hybrid
#LI - JK1
CME Group is committed to offering a competitive total rewards package for our employees that recognizes their contributions to the business and reflects our long-term investment in their future. The pay range for this role is $100,700-$167,800. Actual salary offered will be dependent on a wide array of factors including but not limited to: relevant experience, skills, education and comparison to internal employees (where relevant). Our compensation program also includes an annual target bonus opportunity for all employees, as well as the opportunity to become an owner in the company through our broad-based equity program. Through our benefits program, we strive to offer flexibility, value and choice. From comprehensive health coverage, to a retirement package that includes both a 401(k) and an active pension plan, to highly competitive education reimbursement provisions, paid time off and a mental health benefit, CME Group offers a holistic benefits package for our team and their dependents.
CME Group: Where Futures are Made
CME Group is the world's leading derivatives marketplace. But who we are goes deeper than that. Here, you can impact markets worldwide. Transform industries. And build a career by shaping tomorrow. We invest in your success and you own it - all while working alongside a team of leading experts who inspire you in ways big and small. Problem solvers, difference makers, trailblazers. Those are our people. And we're looking for more.
At CME Group, we embrace our employees' unique experiences and skills to ensure that everyone's perspectives are acknowledged and valued. As an equal-opportunity employer, we consider all potential employees without regard to any protected characteristic.
Important Notice: Recruitment fraud is on the rise, with scammers using misleading promises of job offers and interviews to solicit money and personal information from job seekers. CME Group adheres to established procedures designed to maintain trust, confidence and security throughout our recruitment process.
$100.7k-167.8k yearly 60d+ ago
Site Reliability Engineer II, Operations
Pennymac 4.7
Westlake Village, CA jobs
PENNYMAC
Pennymac (NYSE: PFSI) is a specialty financial services firm with a comprehensive mortgage platform and integrated business focused on the production and servicing of U.S. mortgage loans and the management of investments related to the U.S. mortgage market.
At Pennymac, our people are the foundation of our success and at the heart of our dynamic work culture.
Together, we work towards a unified goal of helping millions of Americans achieve aspirations of homeownership through the complete mortgage journey.
A Typical Day
As the Site Reliability Operations, Engineer II (SRO), you will help the team provide 24/7 monitoring and support of the company's IT Infrastructure.
Ideal candidates should have experience in Windows and Linux administration, in addition to experience working in AWS, as Pennymac is now almost completely migrated into the AWS cloud.
Individuals in this role should be comfortable working in a fast-paced environment.
Multitasking, in addition to communicating quickly and accurately, is critical to the success of anyone in this role.
Responsibilities
Monitoring - 24/7 health monitoring of Pennymac's IT Infrastructure using tools such as AWS CloudWatch and New Relic.
Alert Management - participate in the active modification and creation of alerts to ensure the SRO team has constant visibility and is able to proactively identify threats to the stability of Pennymac's IT Infrastructure.
Incident Management - Engineers will coordinate with Pennymac's Incident Management team, Application Developers, Internal Support Teams, and 3rd Party Vendors, with the goal of resolving any production service outages quickly and accurately.
Systems Administration - responsible for various administrative tasks in both Windows and Linux environments.
Virtual Server and Desktop Management - maintenance and troubleshooting of Pennymac's virtual server and desktop environments.
Technical Troubleshooting and Investigation - investigate and troubleshoot various technical issues that are submitted by Pennymac's IT and Application Development teams.
Internal and External Escalation - act as a point of escalation for any production impacting incidents.
Ensure both internal and external support teams are contacted in a timely manner to ensure a quick and accurate resolution.
Change Management - follow and enforce Pennymac's established Change Management processes and procedures.
Communication - monitor and respond to Call, Chat, and Email inquiries sent to the SRO team.
Ticket Queue Management - responsible for managing multiple different Ticket Queues using tools such as ServiceNow and JIRA to ensure deliverables are on time and accurate.
Documentation - assist in maintaining the SRO team's knowledge base of support articles and Standard Operating Procedures.
Play an active role in the creation of new documentation as needed.
Deployments - handle application and website code deployments, making use of tools such as Jenkins and GitLab.
Data backup, recovery, retention, and compliance - responsible for various tasks related to backup management using tools like CommVault and AWS Backup.
Project Management - organize and prioritize tasks, adhere to deadlines, and achieve all project goals within the given constraints.
What You'll Bring
Qualifications
* Bachelor's Degree in Computer Science or comparable experience
* AWS Solutions Architect and/or AWS SysOps Administrator certification
* Proficient with Windows and Linux administration
* Proficient with Monitoring and Alerting tools such as Nagios, New Relic, SumoLogic, and AWS CloudWatch
* Proficient with programming languages such as Powershell or Python
* Strong attention to detail
* Able to prioritize tasks and have a sense of urgency with critical issues or requests
* Excellent written and verbal communication skills
* Must be comfortable completing annual role-based training and certification assignments
Why You Should Join
As one of the top mortgage lenders in the country, Pennymac has helped over 4 million lifetime homeowners achieve and sustain their aspirations of home.
Our vision is to be the most trusted partner for home.
Together, 4,000 Pennymac team members across the country are guided by our core values: to be Accountable, Reliable and Ethical in all that we do.
Pennymac is committed to conducting a business that makes positive contributions and promotes long-term sustainable growth and to fostering an equitable and inclusive environment, where all employees and customers feel valued, respected and supported.
Benefits That Bring It Home: Whether you're looking for flexible benefits for today, setting up short-term goals for tomorrow, or planning for long-term success and retirement, Pennymac's benefits have you covered.
Some key benefits include:
* Comprehensive Medical, Dental, and Vision
* Paid Time Off Programs including vacation, holidays, illness, and parental leave
* Wellness Programs, Employee Recognition Programs, and onsite gyms and cafe style dining (select locations)
* Retirement benefits, life insurance, 401k match, and tuition reimbursement
* Philanthropy Programs including matching gifts, volunteer grants, charitable grants and corporate sponsorships
We value the hard work and dedication of our employees.
In addition to a competitive salary, positions may offer bonus opportunities.
To learn more about our benefits visit: *********************page.link/benefits
Compensation: Individual salary may vary based on multiple factors including specific role, geographic location / market data, and skills and experience as defined below:
* Lower in range - Building skills and experience in the role
* Mid-range - Experience and skills align with proficiency in the role
* Higher in range - Experience and skills add value above typical requirements of the role
Some roles may be eligible for performance-based compensation and/or stock-based incentives awarded to employees based on company and individual performance.
Salary: $68,000 - $115,000
Work Model: OFFICE
$68k-115k yearly Auto-Apply 33d ago
Observability / Reliability Engineer Lead (US)
TD Bank 4.5
Mount Laurel, NJ jobs
Hours: 40
Pay Details: $113,000 - $196,000 USD
TD is committed to providing fair and equitable compensation opportunities to all colleagues. Growth opportunities and skill development are defining features of the colleague experience at TD. Our compensation policies and practices have been designed to allow colleagues to progress through the salary range over time as they progress in their role. The base pay actually offered may vary based upon the candidate's skills and experience, job-related knowledge, geographic location, and other specific business and organizational needs.
As a candidate, you are encouraged to ask compensation related questions and have an open dialogue with your recruiter who can provide you more specific details for this role.
Line of Business:
Technology Solutions
Job Description:
The Senior IT Development Manager manages / leads a team of technology development / designs professionals in providing a wide range of application or system solutions to the organization, ensuring standards are met and business objectives are achieved. Also provides technical leadership and guidance beyond own team.
Depth & Scope:
* Deep expertise and knowledge of Bank, technology standards and leading large and varied teams of professionals
* Extensive knowledge and understanding of businesses and/or organizational practices/ disciplines
* Advanced knowledge of external competition, industry and/or market trends in relation to own function / business
* Directs/manages a large and diverse group / team of IT professionals (e.g. up to 50) focused on complex development, system enhancements, new releases, large-scale applications / projects across multiple product lines / businesses, involving significant scope and complexity
* Sets operational team direction and works autonomously in the management of the unit and collaborates with others to establish and execute on common goals
Education & Experience:
* Undergraduate degree or Technical Certificate
* Graduate degree, preferred
* 10+ years related experience
Preferred Qualifications:
* 5+ years of expertise in reliability engineering concepts including SLI, SLO, error budget, incident and problem management, resiliency patterns, etc.
* 5+ years of experience with observability platforms including Dynatrace, Splunk, Datadog, etc.
* 5+ years of experience with automated monitoring and alerting: building monitoring frameworks, dashboards, alert tuning.
* Advanced understanding of distributed and cloud native systems engineering including scalability, high availability, and performance optimization
* Strong hands-on experience with Azure and infrastructure as code using Terraform
* 5+ years of experience with CI/CD pipelines and automated testing frameworks
* Proficiency in at least one modern language (Python or Java) with strong scripting skills
* Strong technical leadership and cross functional influencing experience with ability to communicate abstract technical concepts in business meaningful terms.
* 5+ years of technical strategy and roadmap development experience
Physical Requirements:
Never: 0%; Occasional: 1-33%; Frequent: 34-66%; Continuous: 67-100%
* Domestic Travel - Occasional
* International Travel - Never
* Performing sedentary work - Continuous
* Performing multiple tasks - Continuous
* Operating standard office equipment - Continuous
* Responding quickly to sounds - Occasional
* Sitting - Continuous
* Standing - Occasional
* Walking - Occasional
* Moving safely in confined spaces - Occasional
* Lifting/Carrying (under 25 lbs.) - Occasional
* Lifting/Carrying (over 25 lbs.) - Never
* Squatting - Occasional
* Bending - Occasional
* Kneeling - Never
* Crawling - Never
* Climbing - Never
* Reaching overhead - Never
* Reaching forward - Occasional
* Pushing - Never
* Pulling - Never
* Twisting - Never
* Concentrating for long periods of time - Continuous
* Applying common sense to deal with problems involving standardized situations - Continuous
* Reading, writing and comprehending instructions - Continuous
* Adding, subtracting, multiplying and dividing - Continuous
The above statements are intended to describe the general nature and level of work being performed by people assigned to this job. They are not intended to be an exhaustive list of all responsibilities, duties and skills required. The listed or specified responsibilities & duties are considered essential functions for ADA purposes.
Who We Are:
TD is one of the world's leading global financial institutions and is the fifth largest bank in North America by branches/stores. Every day, we deliver legendary customer experiences to over 27 million households and businesses in Canada, the United States and around the world. More than 95,000 TD colleagues bring their skills, talent, and creativity to the Bank, those we serve, and the economies we support. We are guided by our vision to Be the Better Bank and our purpose to enrich the lives of our customers, communities and colleagues.
TD is deeply committed to being a leader in customer experience, which is why we believe that all colleagues, no matter where they work, are customer facing. As we build our business and deliver on our strategy, we are innovating to enhance the customer experience and build capabilities to shape the future of banking. Whether you've got years of banking experience or are just starting your career in financial services, we can help you realize your potential. From regular leadership and development conversations to mentorship and training programs, we're here to support you towards your goals. As an organization, we keep growing - and so will you.
Our Total Rewards Package
Our Total Rewards package reflects the investments we make in our colleagues to help them and their families achieve their financial, physical and mental well-being goals. Total Rewards at TD includes base salary and variable compensation/incentive awards (e.g., eligibility for cash and/or equity incentive awards, generally through participation in an incentive plan) and several other key plans such as health and well-being benefits, savings and retirement programs, paid time off (including Vacation PTO, Flex PTO, and Holiday PTO), banking benefits and discounts, career development, and reward and recognition.
Additional Information:
We're delighted that you're considering building a career with TD. Through regular development conversations, training programs, and a competitive benefits plan, we're committed to providing the support our colleagues need to thrive both at work and at home.
Colleague Development
If you're interested in a specific career path or are looking to build certain skills, we want to help you succeed. You'll have regular career, development, and performance conversations with your manager, as well as access to an online learning platform and a variety of mentoring programs to help you unlock future opportunities. Whether you have a passion for helping customers and want to expand your experience, or you want to coach and inspire your colleagues, there are many different career paths within our organization at TD - and we're committed to helping you identify opportunities that support your goals.
Training & Onboarding
We will provide training and onboarding sessions to ensure that you've got everything you need to succeed in your new role.
Interview Process
We'll reach out to candidates of interest to schedule an interview. We do our best to communicate outcomes to all applicants by email or phone call.
Accommodation
TD Bank is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, status as a protected veteran or any other characteristic protected under applicable federal, state, or local law.
If you are an applicant with a disability and need accommodations to complete the application process, please email TD Bank US Workplace Accommodations Program at ***************. Include your full name, best way to reach you and the accommodation needed to assist you with the applicant process.
$113k-196k yearly Auto-Apply 3d ago
Observability / Reliability Engineer Lead (US)
TD Bank 4.5
Mount Laurel, NJ jobs
Mount Laurel, New Jersey, United States of America **Hours:** 40 **Pay Details:** $113,000 - $196,000 USD TD is committed to providing fair and equitable compensation opportunities to all colleagues. Growth opportunities and skill development are defining features of the colleague experience at TD. Our compensation policies and practices have been designed to allow colleagues to progress through the salary range over time as they progress in their role. The base pay actually offered may vary based upon the candidate's skills and experience, job-related knowledge, geographic location, and other specific business and organizational needs.
As a candidate, you are encouraged to ask compensation related questions and have an open dialogue with your recruiter who can provide you more specific details for this role.
**Line of Business:**
Technology Solutions
**Job Description:**
The Senior IT Development Manager manages / leads a team of technology development / designs professionals in providing a wide range of application or system solutions to the organization, ensuring standards are met and business objectives are achieved. Also provides technical leadership and guidance beyond own team.
**Depth & Scope:**
+ Deep expertise and knowledge of Bank, technology standards and leading large and varied teams of professionals
+ Extensive knowledge and understanding of businesses and/or organizational practices/ disciplines
+ Advanced knowledge of external competition, industry and/or market trends in relation to own function / business
+ Directs/manages a large and diverse group / team of IT professionals (e.g. up to 50) focused on complex development, system enhancements, new releases, large-scale applications / projects across multiple product lines / businesses, involving significant scope and complexity
+ Sets operational team direction and works autonomously in the management of the unit and collaborates with others to establish and execute on common goals
**Education & Experience:**
+ Undergraduate degree or Technical Certificate
+ Graduate degree, preferred
+ 10+ years related experience
**Preferred Qualifications:**
+ 5+ years of expertise in reliability engineering concepts including SLI, SLO, error budget, incident and problem management, resiliency patterns, etc.
+ 5+ years of experience with observability platforms including Dynatrace, Splunk, Datadog, etc.
+ 5+ years of experience with automated monitoring and alerting: building monitoring frameworks, dashboards, alert tuning.
+ Advanced understanding of distributed and cloud native systems engineering including scalability, high availability, and performance optimization
+ Strong hands-on experience with Azure and infrastructure as code using Terraform
+ 5+ years of experience with CI/CD pipelines and automated testing frameworks
+ Proficiency in at least one modern language (Python or Java) with strong scripting skills
+ Strong technical leadership and cross functional influencing experience with ability to communicate abstract technical concepts in business meaningful terms.
+ 5+ years of technical strategy and roadmap development experience
**Physical Requirements:**
Never: 0%; Occasional: 1-33%; Frequent: 34-66%; Continuous: 67-100%
+ Domestic Travel - Occasional
+ International Travel - Never
+ Performing sedentary work - Continuous
+ Performing multiple tasks - Continuous
+ Operating standard office equipment - Continuous
+ Responding quickly to sounds - Occasional
+ Sitting - Continuous
+ Standing - Occasional
+ Walking - Occasional
+ Moving safely in confined spaces - Occasional
+ Lifting/Carrying (under 25 lbs.) - Occasional
+ Lifting/Carrying (over 25 lbs.) - Never
+ Squatting - Occasional
+ Bending - Occasional
+ Kneeling - Never
+ Crawling - Never
+ Climbing - Never
+ Reaching overhead - Never
+ Reaching forward - Occasional
+ Pushing - Never
+ Pulling - Never
+ Twisting - Never
+ Concentrating for long periods of time - Continuous
+ Applying common sense to deal with problems involving standardized situations - Continuous
+ Reading, writing and comprehending instructions - Continuous
+ Adding, subtracting, multiplying and dividing - Continuous
The above statements are intended to describe the general nature and level of work being performed by people assigned to this job. They are not intended to be an exhaustive list of all responsibilities, duties and skills required. The listed or specified responsibilities & duties are considered essential functions for ADA purposes.
**Who We Are:**
TD is one of the world's leading global financial institutions and is the fifth largest bank in North America by branches/stores. Every day, we deliver legendary customer experiences to over 27 million households and businesses in Canada, the United States and around the world. More than 95,000 TD colleagues bring their skills, talent, and creativity to the Bank, those we serve, and the economies we support. We are guided by our vision to Be the Better Bank and our purpose to enrich the lives of our customers, communities and colleagues.
TD is deeply committed to being a leader in customer experience, that is why we believe that all colleagues, no matter where they work, are customer facing. As we build our business and deliver on our strategy, we are innovating to enhance the customer experience and build capabilities to shape the future of banking. Whether you've got years of banking experience or are just starting your career in financial services, we can help you realize your potential. Through regular leadership and development conversations to mentorship and training programs, we're here to support you towards your goals. As an organization, we keep growing - and so will you.
**Our Total Rewards Package**
Our Total Rewards package reflects the investments we make in our colleagues to help them and their families achieve their financial, physical and mental well-being goals. Total Rewards at TD includes base salary and variable compensation/incentive awards (e.g., eligibility for cash and/or equity incentive awards, generally through participation in an incentive plan) and several other key plans such as health and well-being benefits, savings and retirement programs, paid time off (including Vacation PTO, Flex PTO, and Holiday PTO), banking benefits and discounts, career development, and reward and recognition. Learn more (***************************************
**Additional Information:**
We're delighted that you're considering building a career with TD. Through regular development conversations, training programs, and a competitive benefits plan, we're committed to providing the support our colleagues need to thrive both at work and at home.
**Colleague Development**
If you're interested in a specific career path or are looking to build certain skills, we want to help you succeed. You'll have regular career, development, and performance conversations with your manager, as well as access to an online learning platform and a variety of mentoring programs to help you unlock future opportunities. Whether you have a passion for helping customers and want to expand your experience, or you want to coach and inspire your colleagues, there are many different career paths within our organization at TD - and we're committed to helping you identify opportunities that support your goals.
**Training & Onboarding**
We will provide training and onboarding sessions to ensure that you've got everything you need to succeed in your new role.
**Interview Process**
We'll reach out to candidates of interest to schedule an interview. We do our best to communicate outcomes to all applicants by email or phone call.
**Accommodation**
TD Bank is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, status as a protected veteran or any other characteristic protected under applicable federal, state, or local law.
If you are an applicant with a disability and need accommodations to complete the application process, please email the TD Bank US Workplace Accommodations Program at ***************. Include your full name, the best way to reach you, and the accommodation needed to assist you with the application process.
Federal law prohibits job discrimination based on race, color, sex, sexual orientation, gender identity, national origin, religion, age, equal pay, disability and genetic information.
$113k-196k yearly 60d+ ago
Site Reliability Engineer III
Jpmorganchase 4.8
Reliability engineer job at JPMorgan Chase & Co.
There's nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems.
As a Site Reliability Engineer III at JPMorgan Chase within the Enterprise technology, engineering services and platform team, you will solve complex and broad business problems with simple and straightforward solutions. Through code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their associated infrastructure to independently decompose and iteratively improve on existing solutions. You are a significant contributor to your team by sharing your knowledge of end-to-end operations, availability, reliability, and scalability of your application or platform.
Job responsibilities
Guides and assists others in the areas of building appropriate level designs and gaining consensus from peers where appropriate
Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines
Collaborates with other software engineers and teams to design, develop, test, and implement availability, reliability, and scalability solutions in their applications
Implements infrastructure, configuration, and network as code for the applications and platforms in your remit
Collaborates with technical experts, key stakeholders, and team members to resolve complex problems
Understands service level indicators and utilizes service level objectives to proactively resolve issues before they impact customers
Supports the adoption of site reliability engineering best practices within your team
Provides 24x7 production support for business-critical applications
Required qualifications, capabilities, and skills
Formal training or certification in software engineering concepts with 2+ years of applied experience.
Proficient in site reliability culture and principles and familiarity with how to implement site reliability within an application or platform
Proficient in at least one programming language such as Python, Java/Spring Boot, or .NET
Proficient knowledge of software applications and technical processes within a given technical discipline (e.g., Cloud, artificial intelligence, Android, etc.)
Experience in observability such as white and black box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others
Experience with continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform
Familiarity with container and container orchestration such as ECS, Kubernetes, and Docker
Familiarity with troubleshooting common networking technologies and issues
Ability to contribute to large and collaborative teams by presenting information in a logical and timely manner with compelling language and limited supervision
Ability to proactively recognize roadblocks and demonstrate interest in learning technology that facilitates innovation
Experience with event streaming platforms like Kafka
Experience in incident and change management
Preferred qualifications, capabilities, and skills
Ability to identify new technologies and relevant solutions to ensure design constraints are met by the software team
Ability to initiate and implement ideas to solve business problems
Networking and systems
Deep understanding of TCP/IP, DNS, load balancing, firewalls, and VPN technologies
Experience tuning Linux performance and troubleshooting system-level issues
Collaborative leadership
Demonstrated ability to mentor junior engineers and drive SRE best-practice adoption
Strong written and verbal communication skills; comfortable presenting to technical and non-technical stakeholders
Certifications (a plus)
AWS Certified SysOps Administrator or Professional, Certified Kubernetes Administrator (CKA), or equivalent
$97k-119k yearly est. Auto-Apply 60d+ ago
Site Reliability Engineer II-1
Mastercard 4.7
Bogota, NJ jobs
Our Purpose Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we're helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships and networks combine to deliver a unique set of products and services that help people, businesses and governments realize their greatest potential.
Title and Summary
Site Reliability Engineer II-1
Overview
The GBSC EPMS team is looking for a Site Reliability Engineer who can help us solve problems, implement automation, and leverage best practices.
* Are you a born problem solver who loves to figure out how something works?
* Are you a detail-oriented individual who enjoys complex problem solving?
* Do you love determining the correct actions required to fix a problem?
* Do you have a low tolerance for manual work and look to automate everything you can?
Business Operations is leading the Site Reliability Engineering (SRE) transformation at Mastercard through our tooling and by being an advocate for change & standards throughout the development, quality, release, and product organizations. We need team members with an appetite for change and pushing the boundaries of what can be done with automation. Experience in working across development, operations, and product teams to prioritize needs and to build relationships is a must.
Responsibilities
* Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement.
* Analyze ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns.
* Support services before they go live through activities such as system design consulting, capacity planning and launch reviews.
* Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
* Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
* Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating, and lead Mastercard in DevOps automation and best practices.
* Practice sustainable incident response and blameless postmortems.
* Take a holistic approach to problem solving by connecting the dots during a production event through the various technology stacks that make up the platform, to optimize mean time to recover.
* Work with a global team spread across tech hubs in multiple geographies and time zones.
* Share knowledge and mentor junior resources.
All About You
* BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent practical experience.
* Experience with algorithms, data structures, scripting, pipeline management, software design and OLAP systems.
* Hands-on experience with custom objects using JavaScript, HTML5, CSS and API integrations.
* Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
* Ability to help debug and optimize code and automate routine tasks.
* We support many different stakeholders. Experience in dealing with difficult situations and making decisions with a sense of urgency is needed.
* Experience in one or more of the following is preferred: C, C++, Java, Python, Go, Perl, Ruby, MDX.
* Interest in designing, analyzing and troubleshooting large-scale distributed systems.
* We need team members with an appetite for change and pushing the boundaries of what can be done with automation. Experience in working across development, operations, and product teams to prioritize needs and to build relationships is a must.
Corporate Security Responsibility
All activities involving access to Mastercard assets, information, and networks come with an inherent risk to the organization and, therefore, it is expected that every person working for, or on behalf of, Mastercard is responsible for information security and must:
* Abide by Mastercard's security policies and practices;
* Ensure the confidentiality and integrity of the information being accessed;
* Report any suspected information security violation or breach; and
* Complete all periodic mandatory security trainings in accordance with Mastercard's guidelines.
$88k-119k yearly est. Auto-Apply 9d ago
Site Reliability Engineer II-2
Mastercard 4.7
Bogota, NJ jobs
Our Purpose Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we're helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships and networks combine to deliver a unique set of products and services that help people, businesses and governments realize their greatest potential.
Title and Summary
Site Reliability Engineer II-2
Overview
The GBSC EPMS team is looking for a Site Reliability Engineer who can help us solve problems, implement automation, and leverage best practices.
* Are you a born problem solver who loves to figure out how something works?
* Are you a detail-oriented individual who enjoys complex problem solving?
* Do you love determining the correct actions required to fix a problem?
* Do you have a low tolerance for manual work and look to automate everything you can?
Business Operations is leading the Site Reliability Engineering (SRE) transformation at Mastercard through our tooling and by being an advocate for change & standards throughout the development, quality, release, and product organizations. We need team members with an appetite for change and pushing the boundaries of what can be done with automation. Experience in working across development, operations, and product teams to prioritize needs and to build relationships is a must.
Responsibilities
* Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement.
* Analyze ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns.
* Support services before they go live through activities such as system design consulting, capacity planning and launch reviews.
* Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
* Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
* Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating, and lead Mastercard in DevOps automation and best practices.
* Practice sustainable incident response and blameless postmortems.
* Take a holistic approach to problem solving by connecting the dots during a production event through the various technology stacks that make up the platform, to optimize mean time to recover.
* Work with a global team spread across tech hubs in multiple geographies and time zones.
* Share knowledge and mentor junior resources.
All About You
* BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent practical experience.
* Experience with algorithms, data structures, scripting, pipeline management, software design and OLAP systems.
* Hands-on experience with custom objects using JavaScript, HTML5, CSS and API integrations.
* Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
* Ability to help debug and optimize code and automate routine tasks.
* We support many different stakeholders. Experience in dealing with difficult situations and making decisions with a sense of urgency is needed.
* Experience in one or more of the following is preferred: C, C++, Java, Python, Go, Perl, Ruby, MDX.
* Interest in designing, analyzing and troubleshooting large-scale distributed systems.
* We need team members with an appetite for change and pushing the boundaries of what can be done with automation. Experience in working across development, operations, and product teams to prioritize needs and to build relationships is a must.
Corporate Security Responsibility
All activities involving access to Mastercard assets, information, and networks come with an inherent risk to the organization and, therefore, it is expected that every person working for, or on behalf of, Mastercard is responsible for information security and must:
* Abide by Mastercard's security policies and practices;
* Ensure the confidentiality and integrity of the information being accessed;
* Report any suspected information security violation or breach; and
* Complete all periodic mandatory security trainings in accordance with Mastercard's guidelines.