Senior Site Reliability Engineer - Enterprise Technology
Hudson River Trading (HRT) is looking for a Senior Site Reliability Engineer to join our growing Enterprise Technology group. The SRE team sits within Enterprise Technology and is responsible for operating and optimizing corporate productivity & collaboration infrastructure for the entire firm, both on-prem and in the cloud.
As one of Enterprise Technology's first SREs, you will help to establish and grow our site reliability engineering practice in addition to ensuring the availability and reliability of systems within our stack.
This role requires a deep Linux operating system and application administration skill set, proficiency in Python, and solid experience with configuration management/IaC. Successful candidates should also have exceptional organizational, communication, and project management skills, as well as the ability to troubleshoot complex technical issues.
Responsibilities
Manage on-premises containerized web services, as well as a multitude of bridge services, integrations, and batch processes that interconnect the elements of our productivity ecosystem
Proactively eliminate sources of operational work: engineering, not firefighting
Automate and troubleshoot a broad range of technical infrastructure both on-prem and in the cloud
Develop and implement monitoring solutions to ensure high system uptime and reliability
Enable transparency and high development velocity within the firm while maintaining a high bar for security. Find ways to reduce user friction, and make sure HRTers have access to the tools and data they need when they need them
Break down complexity, iterate, and communicate progress to a wide variety of leads and stakeholders
Qualifications
5+ years of experience in site reliability engineering or related disciplines
Proficiency with Python
Experience managing and monitoring containerized infrastructure
Experience working with CI/CD tools such as Jenkins, GitHub Actions, or ArgoCD
Expert-level experience with IaC and configuration management tools such as Terraform, SaltStack, Chef, Puppet, or Ansible
Annual base salary range of $150,000 to $250,000. Pay (base and bonus) may vary depending on job-related skills and experience. A sign-on and discretionary performance bonus may be provided as part of the total compensation package, in addition to company-paid medical and/or other benefits.
Culture
Hudson River Trading (HRT) brings a scientific approach to trading financial products. We have built one of the world's most sophisticated computing environments for research and development. Our researchers are at the forefront of innovation in the world of algorithmic trading.
At HRT we welcome a variety of expertise: mathematics and computer science, physics and engineering, media and tech. We're a community of self-starters who are motivated by the excitement of being at the cutting edge of automation in every part of our organization: from trading, to business operations, to recruiting and beyond. We value openness and transparency, and celebrate great ideas from HRT veterans and new hires alike. At HRT we're friends and colleagues, whether we are sharing a meal, playing the latest board game, or writing elegant code. We embrace a culture of togetherness that extends far beyond the walls of our office.
Feel like you belong at HRT? Our goal is to find the best people and bring them together to do great work in a place where everyone is valued. HRT is proud of our diverse staff; we have offices all over the globe and benefit from our varied and unique perspectives. HRT is an equal opportunity employer, so whoever you are, we'd love to get to know you.
NPD Quality Engineer
Plymouth, MA
Must Have Technical/Functional Skills
• Knowledge of Quality Management and its tools & techniques
• Knowledge of GMP (Good Manufacturing Practices), FDA, ISO 13485, and compliance regulations
• Knowledge of medical device regulatory standards (MDD and MDR)
• Knowledge of NC, CAPA, root cause analysis, and audit processes
• Knowledge of the validation process, including writing protocols/reports
• Very good understanding of, and experience in, writing procedures, product specs, and work instructions
• Knowledge of statistics, risk management, and design control
• Must possess good communication skills (verbal and written), familiarity with project management methodology, and problem-solving and presentation skills
• Experience in creating FMEAs and writing reports
• Experience in PMS (Post Market Surveillance)
• Experience with a PLM tool (Windchill)
• Good understanding of design, drawings, and GD&T
• Excellent interpersonal/communication, organizational/planning, and project management skills preferred
• Personal computer skills, Windows: word processing, presentation, e-mail, web browsers & spreadsheet software
• Ability to work efficiently, meet timelines, and communicate status (generate trackers, send emails, etc.)
Roles & Responsibilities
• Under limited supervision, and in accordance with all applicable federal, state, and local laws/regulations and Johnson & Johnson corporate procedures and guidelines, the duties and responsibilities for this position are:
• Development and review of PDP (Product Development Process) deliverables
• Review and approve R&D/engineering protocols and reports
• Development of risk management records (i.e., DFMEA/PFMEA) in collaboration with SMEs
• Support and provide guidance on validations and, if required, write validation protocols/reports
• Support and remediation of validation/quality documentation
• Support root cause investigation and closure of NCs and CAPAs
• Review and approve change orders (CR/CN)
• Review and update design/process control documents such as procedures, work instructions, and product specs
• Work with cross-functional and internal teams to create deliverables
• Perform other duties as assigned
Salary Range: $90,000-$95,000 per year
TCS Employee Benefits Summary:
Discretionary Annual Incentive.
Comprehensive Medical Coverage: Medical & Health, Dental & Vision, Disability Planning & Insurance, Pet Insurance Plans.
Family Support: Maternal & Parental Leaves.
Insurance Options: Auto & Home Insurance, Identity Theft Protection.
Convenience & Professional Growth: Commuter Benefits & Certification & Training Reimbursement.
Time Off: Vacation, Time Off, Sick Leave & Holidays.
Legal & Financial Assistance: Legal Assistance, 401K Plan, Performance Bonus, College Fund, Student Loan Refinancing.
Site Reliability Engineer
New York, NY
MIO Partners, Inc. (MIO) provides proprietary investment products to McKinsey's retirement plan and partners and offers independent, high-quality financial advice to McKinsey's partners. We manage a wide array of investment vehicles with significant expertise and a long and successful track record in alternative strategies, including hedge funds and private equity. We have a multibillion-dollar portfolio of assets under management, and we manage assets for and advise only McKinsey-related clients; we do not accept outside or third-party investments.
MIO is a values-based organization that is strongly aligned with our investors' interests. MIO measures success as performance relative to a market-based benchmark.
MIO, a 250+ person registered investment adviser, provides ample opportunities for somebody with an entrepreneurial drive to shine. We strive to meet the highest professional standards and build an organization that attracts, develops, and retains exceptional people. MIO is a wholly owned subsidiary of McKinsey, but our activities are kept entirely separate from those of the consulting Firm.
Primary responsibilities
The successful candidate will have extensive technical experience working with AWS cloud technologies, preferably for financial services firms, such as asset managers, hedge funds, and/or broker/dealers.
The new hire must lead by example and work collaboratively to:
Design and maintain monitoring systems and dashboards
Architect and manage cloud infrastructure (AWS, Azure) with security, stability, and cost in mind
Implement CI/CD pipelines for reliable software delivery
Establish infrastructure-as-code practices using CDK, GitLab, and AWS developer tools
Contribute to the MIO application codebase, following resiliency and performance best practices
Ensure application architectures follow cloud best practices for reliability, security, performance, and efficiency
Work with development teams to improve deployment processes and system reliability
Collaborate with business owners to translate business requirements into technical solutions with an eye toward technology consistency and best practices
Work with engineers, business users, and other stakeholders to understand their needs and ensure solutions align with business goals
Maintain detailed documentation for reference architectures, design patterns, and system configurations
Raise the bar on our development capabilities, standards, and processes
Synthesize requirements gathered from various teams within/outside of IT and suggest creative solutions; where appropriate, guiding MIO to “do it the right way”
Following a Scrum methodology, coordinate with end users, business analysts, and other architects and developers
Recommend positive steps toward standardizing development processes, including technology selection, deployment steps, code reviews, and IT tools
Partner with development, QA, and AppSecOps teams to promote standardization, consistency, and improved security posture
Our applications are primarily developed using Python/Django, libraries such as Pandas and NumPy, and PL/SQL. In addition, we utilize SQL Server, MySQL, Elasticsearch, Redis, Kafka, Tableau, and various third-party APIs and data sources. Our applications are hosted in AWS using Docker containers on ECS/EC2 platforms.
Primary responsibilities (estimated percentage allocation)
25% Technology Leadership: design, mentoring
15% Relationship Building: requirements
60% Heads-Down Development
Desired background
Please note applicants must be authorized to work in the U.S. without current or future visa sponsorship
8+ years of hands-on experience in DevOps, SRE, or platform engineering roles
Bachelor of Science in computer science or a related discipline (although strong experience with a less directly related degree will be considered)
Strong experience in AWS Cloud technologies
Knowledge of CI/CD pipeline tools (GitLab pipelines, Jenkins etc.)
Understanding of monitoring and observability tools (ELK, Dynatrace, Datadog etc.)
Experience with microservices, serverless architectures, and containerization
Proficiency in AWS cloud platform including infrastructure-as-code and CI/CD pipelines
Formal problem-solving and/or analytical training/experience a plus, as is experience working with management consultants
Good intuition for end-user requirements gathering; iterative and collaborative approach to design
Strong client relationship management skills and excellent written/verbal communication skills to interact at all levels
MIO Partners, Inc. (MIO) is an equal opportunity employer. MIO will consider all applicants regardless of race, color, religion, sex, sexual orientation, gender identity, national origin, veteran status, or disability status. MIO has adopted a flexible, hybrid model that supports a blend of in-office and remote work. Our office is in New York City.
Certain US states require MIO Partners, Inc. to include a reasonable estimate of the salary range for this role. Actual salaries may vary and may be above or below the range based on various factors, including, but not limited to an individual's assigned office location, experience, and expertise. Certain roles are also eligible for bonuses, subject to MIO's discretion and based on factors such as individual and/or organizational performance. Additionally, MIO offers a comprehensive benefits package, including medical, dental and vision coverage, telemedicine services, life, accident and disability insurance, parental leave and family planning benefits, caregiving resources, a generous retirement program, financial guidance, and paid time off.
Base salary range: $175,000-$200,000 USD
We are committed to protecting your privacy. Please review our Applicant Privacy Policy for a detailed explanation of how we collect, use, and protect your personal information.
EQ Electronic Reliability Engineer
New York
Purpose of the role
To apply software engineering techniques, automation, and best practices in incident response to ensure the reliability, availability, and scalability of systems, platforms, and technology.
Accountabilities
Availability, performance, and scalability of systems and services through proactive monitoring, maintenance, and capacity planning.
Resolution, analysis, and response to system outages and disruptions, and implementation of measures to prevent similar incidents from recurring.
Development of tools and scripts to automate operational processes, reducing manual workload, increasing efficiency, and improving system resilience.
Monitoring and optimisation of system performance and resource usage, identification and remediation of bottlenecks, and implementation of best practices for performance tuning.
Collaboration with development teams to integrate best practices for reliability, scalability, and performance into the software development lifecycle, and close work with other teams to ensure smooth and efficient operations.
Stay informed of industry technology trends and innovations, and actively contribute to the organization's technology communities to foster a culture of technical excellence and growth.
Assistant Vice President Expectations
To advise and influence decision making, contribute to policy development and take responsibility for operational effectiveness. Collaborate closely with other functions/ business divisions.
Lead a team performing complex tasks, using well-developed professional knowledge and skills to deliver work that impacts the whole business function. Set objectives, coach employees in pursuit of those objectives, appraise performance relative to objectives, and determine reward outcomes.
If the position has leadership responsibilities, People Leaders are expected to demonstrate a clear set of leadership behaviours to create an environment for colleagues to thrive and deliver to a consistently excellent standard. The four LEAD behaviours are: L - Listen and be authentic, E - Energise and inspire, A - Align across the enterprise, D - Develop others.
OR, for an individual contributor, they will lead collaborative assignments, guide team members through structured assignments, and identify the need to include other areas of specialisation to complete assignments. They will identify new directions for assignments and/or projects, identifying a combination of cross-functional methodologies or practices to meet required outcomes.
Consult on complex issues; providing advice to People Leaders to support the resolution of escalated issues.
Identify ways to mitigate risk and develop new policies/procedures in support of the control and governance agenda.
Take ownership for managing risk and strengthening controls in relation to the work done.
Perform work that is closely related to that of other areas, which requires understanding of how areas coordinate and contribute to the achievement of the objectives of the organisation sub-function.
Collaborate with other areas of work and business-aligned support areas to keep up to speed with business activity and the business strategy.
Engage in complex analysis of data from multiple internal and external sources, such as procedures and practices (in other areas, teams, companies, etc.), to solve problems creatively and effectively.
Communicate complex information. 'Complex' information could include sensitive information or information that is difficult to communicate because of its content or its audience.
Influence or convince stakeholders to achieve outcomes.
All colleagues will be expected to demonstrate the Barclays Values of Respect, Integrity, Service, Excellence and Stewardship - our moral compass, helping us do what we believe is right. They will also be expected to demonstrate the Barclays Mindset - to Empower, Challenge and Drive - the operating manual for how we behave.
Embark on a transformative journey as an EQ Electronic Reliability Engineer. At Barclays, our vision is clear - to redefine the future of banking and help craft innovative solutions. In this role, you will be directly involved in supporting and enhancing electronic trading systems that underpin critical business operations. As part of a small, collaborative team, you will ensure the reliability of fast-moving, low-latency platforms while contributing to projects that are modernizing the trading environment. This position provides an opportunity to gain exposure to both stability management and forward-looking engineering projects that will shape the next generation of our infrastructure.
To be successful as an EQ Electronic Reliability Engineer, you should have experience with:
Electronic trading systems and concepts, including exchange gateways, trading algorithms, dark pools, market data, and industry protocols such as FIX, within at least one asset class such as Equities, Rates, Futures, or FX
Applying scripting, monitoring (ITRS), observability, and Linux platform experience to troubleshoot and support complex trading environments
Writing and maintaining scripts to automate processes and improve efficiency
Some other highly valued skills may include:
Programming experience in Java, Python, or C++ to extend automation and strengthen application reliability
Using KDB to manage and analyze time-series trading data effectively
Leveraging SRE tooling such as Grafana or Elastic to enhance monitoring and operational resilience
You may be assessed on the key critical skills relevant for success in this role, such as risk and controls, change and transformation, business acumen, strategic thinking, digital and technology, as well as job-specific technical skills.
This role is located in New York, NY.
Minimum Salary: $120,000
Maximum Salary: $175,000
The minimum and maximum salary/rate information above includes only base salary or base hourly rate. It does not include any other type of compensation or benefits that may be available.
Site Reliability Engineer II
Arlington, TX
Why GMF Technology? Innovation isn't just a talking point at GM Financial, it's how we operate. From generative AI and cloud-native technologies to peer-led learning and hackathons, our tech teams are building real solutions that make a difference. We're committed to AI-powered transformation, using advanced machine learning and automation to help us reimagine customer interactions and modernize operations, positioning GM Financial as a leader in digital innovation within a dynamic industry.
Join us and discover a workplace where your ideas matter, your development is prioritized, and you can truly make a global impact.
Flexible hybrid work environment (onsite 2 days a week/3 days remote) at our Arlington (AOC1), TX office.
Please note: we are unable to provide sponsorship for this role at this time.
About this Role:
The Site Reliability Engineering (SRE) team provides leadership, direction, and accountability for building and running large-scale software systems. As a Site Reliability Engineer, you will identify and deliver automation solutions designed to ensure high availability and resiliency using your expertise in software development, complexity analysis, and scalable system design. Strong collaboration skills will be required to work closely with other engineering teams to ensure services/systems are highly stable and performant, meeting the expectations of our business partners and end-users.
* Partner with the architecture and development teams on how to make applications highly available, reliable, and performant at global scale
* Collaborate with the architecture team to ensure reliability factors are accounted for in business features and enablers
* Guide development teams in understanding established service level objectives and consequences, and implementing appropriate SLIs to support the objectives
* Collaborate with development team members to swarm, troubleshoot, and resolve problems
* Guide ad-hoc teams to brainstorm solutions and build implementation plans based on the Root Cause Analysis of production issues
* Design and build automated solutions to optimize application/service/platform uptime with minimal human intervention
* Be available for an on-call rotation to participate in troubleshooting and communication efforts outside of normal business hours
* Implement and help create standards and best practices, and mentor other team members in order to drive adoption across development teams
What makes you a dream candidate?
Knowledge and Skills
* Expert in defining, implementing, and evaluating Service Level Objectives (SLO) and Service Level Indicators (SLI), and associated consequences
* Software development expertise in two or more high-level programming and scripting languages
* Experience in evolutionary database design, query performance analysis, and indexing as a cornerstone for delivering scalable, performant products and services
* Experience in designing, building, and optimizing automated pipelines with automated testing and automated security controls
* Experience in performing Root Cause Analysis and Problem Management
* Experience working in Agile Scrum teams with demonstrated success leading improvements (getting better/faster/happier)
* Help establish and maintain a culture of learning through the development and sharing of skills, knowledge, process, and tools; combat traditional silos that create "us and them" environments
* A driving passion for finding solutions to hard problems at scale and operationalizing them
* Exceptional critical thinking and communication skills, with a passion for leveraging documentation as a tool for constant improvement
* Pipeline Automation: Azure DevOps (YAML, ARM), Terraform, Jenkins, Chef, Octopus Deploy
* Code Scanning: SonarQube, Checkmarx
* Source Code repos: Git
* Containerization: Azure Kubernetes Service, Kubernetes (open source), Docker
* High-level programming languages: C# (.NET MVC and .NET Core)
* Scripting: PowerShell, Bash
* Database: Oracle, Microsoft SQL Server, NoSQL (e.g. CosmosDB)
* Test Automation: Xamarin.UITest, Specflow, DevTest, Selenium, Test Data Manager, Postman, Maven, TestNG, JMeter
* Operating systems: Windows, Linux
* Cloud Platforms: Azure
* Metrics and Monitoring: Splunk
Experience and Education
* 3-5 years of experience in software development and test automation required
* 3-5 years of web development experience strongly preferred
* High School Diploma or equivalent required
* Bachelor's Degree in related field or equivalent experience required
* Master's Degree in related field preferred
What We Offer: Generous benefits package available on day one to include: 401K matching, bonding leave for new parents (12 weeks, 100% paid), tuition assistance, training, GM employee auto discount, community service pay and nine company holidays.
Our Culture: Our team members define and shape our culture - an environment that welcomes innovative ideas, fosters integrity, and creates a sense of community and belonging. Here we do more than work - we thrive.
Compensation: Competitive pay and bonus eligibility
Work Life Balance: Flexible hybrid work environment (onsite 2 days a week/3 days remote) at our Arlington (AOC1), TX office.
Site Reliability Engineer III
Arlington, TX
Why GMF Technology?
GM Financial is set to change the auto finance industry and is leading the way in tech modernization; we have a startup mindset and preserve our small-company culture in a public company environment, with financial stability and intense growth over a decade-plus history. We are data junkies and trust in data and insights to advance our business objectives. We take our goal of zero emissions, zero collisions, zero congestion, and zero friction very seriously. We believe that, as an auto finance market leader, we are in the driver's seat to help lead GM's EV mission to change the world. We are building global platforms in LATAM, Europe, China, the U.S., and Canada, and we are looking to grow our high-performing team. GMF comprises over 10,000 team members globally. Join our fintech culture within a blue-chip company where we are changing the way we use technology to support our customers, dealers, and business.
Flexible hybrid work environment (onsite 2 days a week/3 days remote) at our Arlington (AOC1), TX office.
Knowledge and Skills
Pipeline Automation: Azure DevOps (YAML, ARM), GitHub, Terraform, Jenkins, Chef, Octopus Deploy
Code Scanning: SonarQube, Checkmarx, Blackduck
Source Code repos: Git
Containerization: Azure Kubernetes Service, Kubernetes (open source), Docker
High-level programming languages: Java, C# (.NET MVC and .NET Core), Go
Scripting: PowerShell, Bash
Database: Oracle, Microsoft SQL Server, NoSQL (e.g. CosmosDB)
Test Automation: Xamarin.UITest, Specflow, DevTest, Selenium, Test Data Manager, Postman, Maven, TestNG, JMeter
Operating systems: Windows, Linux
Cloud Platforms: Azure
Metrics and Monitoring: Splunk, Azure Application Insights, Azure Monitor
Networking: Azure DNS, Virtual Networks, Azure API Manager, Azure Application Gateway, Akamai WAF/CDN
Experience and Education
5-7 years of experience in software development and test automation required
5-7 years of cloud experience strongly preferred
High School Diploma
Associate Degree
Bachelor's Degree in related field or equivalent work or military experience Required
Master's Degree in related field Preferred
What We Offer: Generous benefits package available on day one to include: 401K matching, bonding leave for new parents (12 weeks, 100% paid), tuition assistance, training, GM employee auto discount, community service pay and nine company holidays.
Our Culture: Our team members define and shape our culture - an environment that welcomes innovative ideas, fosters integrity, and creates a sense of community and belonging. Here we do more than work - we thrive.
Compensation: Competitive pay and bonus eligibility
Work Life Balance: Flexible hybrid work environment, 2 days a week in office
About this role
Collaborate with product owners and managers to establish service level objectives for applications and agreed consequences if the objectives are not being met
Collaborate with development team members to identify monitoring gaps, identify application performance improvement opportunities and assist with troubleshooting problems
Drive the Root Cause Analysis of production issues and other failures within the product software, pipeline, or other DevOps support processes or technology
Design, build, and champion automated solutions to optimize application/service/platform uptime with minimal human intervention
Be available for an on-call rotation to participate in troubleshooting and communication efforts outside of normal business hours
Create and implement standards and best practices, driving adoption across development teams and external vendors as applicable
JOB DUTIES
Authority in defining, implementing, and evaluating Service Level Objectives (SLO) and Service Level Indicators (SLI), and associated consequences
Software development expertise in multiple high-level programming and scripting languages
Expert in evolutionary database design, query performance analysis, and indexing as a cornerstone for delivering scalable, performant products and services
Expert in designing, building, and optimizing automated pipelines with automated testing and automated security controls
Expert in performing Root Cause Analysis and Problem Management
Experience working in Agile Scrum teams with demonstrated success leading improvements (getting better/faster/happier)
Site Reliability Engineer I
Arlington, TX
Why GMF Technology?
GM Financial is set to change the auto finance industry and is leading the way in tech modernization; we have a startup mindset and preserve our small-company culture in a public company environment, with financial stability and intense growth over a decade-plus history. We are data junkies and trust in data and insights to advance our business objectives. We take our goal of zero emissions, zero collisions, zero congestion, and zero friction very seriously. We believe that, as an auto finance market leader, we are in the driver's seat to help lead GM's EV mission to change the world. We are building global platforms in LATAM, Europe, China, the U.S., and Canada, and we are looking to grow our high-performing team. GMF comprises over 10,000 team members globally. Join our fintech culture within a blue-chip company where we are changing the way we use technology to support our customers, dealers, and business.
Flexible hybrid work environment (onsite 2 days a week/3 days remote) at our Arlington (AOC1), TX office.
Please note: we are unable to provide sponsorship for this role at this time.
Knowledge and Skills
Thorough command of both the Windows and Linux operating systems, with a strong background in troubleshooting either
Knowledge of native Kubernetes or related enterprise container platforms such as OpenShift
Good understanding of the mechanics of this platform and the deployment pipeline that feeds it
Knowledge of Public Cloud Governance frameworks, architectures, configurations, services, and solutions, specifically within Microsoft Azure, but may also include AWS and GCP
Knowledge of core Azure services such as Azure Kubernetes Service, Cosmos DB, Azure Functions, Azure Storage entities and concepts, Azure CLI, and PowerShell cmdlets
Knowledge in Azure organizational entities such as Departments, Accounts, Subscriptions, Resource Groups and Management Groups
Strong automation skills in Linux and Windows, including Bash, Python, and PowerShell
Extensive experience with Terraform plans and associated development
Knowledge of ARM templates and various related automation methods within Azure
Experience with modern source control repositories (e.g., Git) and DevOps toolsets (Jenkins, Ansible, etc.), and familiarity with Agile/Scrum methodologies
Experience with cloud-native and microservice architectures and an understanding of design principles for scalability, performance, and reliability
Experience with distributed systems, asynchronous messaging, and networking protocols
Experience with open source applications, frameworks, and libraries
Experience and Education
3-5 years of experience in cloud computing, DevOps, and all related automation disciplines preferred
High School Diploma or equivalent required
Bachelor's Degree in related field or equivalent work experience within the IT field required
What We Offer: Generous benefits package available on day one to include: 401K matching, bonding leave for new parents (12 weeks, 100% paid), tuition assistance, training, GM employee auto discount, community service pay and nine company holidays.
Our Culture: Our team members define and shape our culture - an environment that welcomes innovative ideas, fosters integrity, and creates a sense of community and belonging. Here we do more than work - we thrive.
Compensation: Competitive pay and bonus eligibility
Work Life Balance: Flexible hybrid work environment (onsite 2 days a week/3 days remote) at our Arlington (AOC1), TX office.
Please note: we are unable to provide sponsorship for this role at this time.
About this Role:
Under the general direction of leadership, the Site Reliability Engineer will assist with the day-to-day tasks critical to the team's success. The position will be responsible for supporting cloud infrastructure architecture and components, including hybrid cloud and public cloud platforms. This will include prototyping, initiating, and operationalizing public cloud solutions. The role will also support overall Cloud Transformation initiatives designed to meet key goals in creating a service-driven culture through the performance and delivery of SaaS, PaaS, and IaaS solutions from public cloud vendors such as Azure and AWS. The Site Reliability Engineer will be responsible for the configuration, efficiency, and performance of the deployed public cloud solutions. The scope of the role includes not only cloud engineering but also advanced automation capabilities and some overlap with software development disciplines.
Design and automate governance framework across cloud environments
In partnership with provisioning teams, and in accordance with policy frameworks, automate identity and access management in cloud environments for enabling users and services
Partner with various delivery teams in enabling cloud services and the infrastructure as code patterns to facilitate automated deployments
Work with various dev, ops and security stakeholders to provide solutions that meet security and governance requirements while minimizing impact on developer productivity
Proactive monitoring, logging, audits and automated policy enforcement for security and cost compliance
Ensure service availability and continuity through proper response to incidents and requests
Work with management to build the long-term strategic roadmap for the team, providing mentorship to engineering peers seeking to up-skill
Work within an evolving operating model based on Agile/Scrum and SRE practices
Assist Cloud Transformation and related Squads with the design of automation and configuration management
Identify, evaluate, and recommend opportunities for automation; all automated processes should be maintainable and simple, and should leverage a CI/CD pipeline release process where possible
Develop test plans and accept items for production release
Assist in development of automated post-release validation
Develop and maintain cloud integration, script documentation and other standards that align with enterprise architecture standards
Participates in business discussions, solution architectures, and customer-related activities when required
Prototyping and testing of infrastructure components
Validates and contributes to operational testing and change plans
Provides Tier 3 automation support for cloud ecosystem, infrastructure, and automation
Assist in various cloud reporting activities
Platform and service monitoring responsibilities within cloud ecosystems
Assists in the design of highly available, fault tolerant and cloud DR failover capabilities
Site Reliability Engineer II
Arlington, TX
Why GMF Technology?
GM Financial is set to change the auto finance industry and is leading the way in tech modernization. We have a startup mindset and preserve our small-company culture within a public company environment, with financial stability and intense growth over a decade-plus history. We are data junkies and trust in data and insights to advance our business objectives. We take our goal of zero emission, zero collision, zero congestion, and zero friction very seriously. We believe that, as an auto finance market leader, we are in the driver's seat to lead the GM EV mission to change the world. We are building global platforms in LATAM, Europe, China, the U.S., and Canada, and we are looking to grow our high-performing team. GMF comprises over 10,000 team members globally. Join our fintech culture within a blue-chip company where we are changing the way we use technology to support our customers, dealers, and business.
Flexible hybrid work environment (onsite 2 days a week/3 days remote) at our Arlington (AOC1), TX office.
Please note: we are unable to provide sponsorship for this role at this time.
What makes you a dream candidate?
Knowledge and Skills
Expert in defining, implementing, and evaluating Service Level Objectives (SLOs) and Service Level Indicators (SLIs), and their associated consequences
Software development expertise in two or more high-level programming and scripting languages
Experience in evolutionary database design, query performance analysis, and indexing as a cornerstone for delivering scalable, performant products and services
Experience in designing, building, and optimizing automated pipelines with automated testing and automated security controls
Experience in performing Root Cause Analysis and Problem Management
Experience working in Agile Scrum teams with demonstrated success leading improvements (getting better/faster/happier)
Help establish and maintain a culture of learning through the development and sharing of skills, knowledge, processes, and tools; combat traditional silos that create “us and them” environments
A driving passion for finding solutions to hard problems at scale and operationalizing them
Exceptional critical thinking and communication skills, with a passion for leveraging documentation as a tool for constant improvement
Pipeline Automation: Azure DevOps (YAML, ARM), Terraform, Jenkins, Chef, Octopus Deploy
Code Scanning: SonarQube, Checkmarx
Source Code repos: Git
Containerization: Azure Kubernetes Service, Kubernetes (open source), Docker
High-level programming languages: C# (.NET MVC and .NET Core)
Scripting: PowerShell, Bash
Database: Oracle, Microsoft SQL Server, NoSQL (e.g., Cosmos DB)
Test Automation: Xamarin.UITest, SpecFlow, DevTest, Selenium, Test Data Manager, Postman, Maven, TestNG, JMeter
Operating systems: Windows, Linux
Cloud Platforms: Azure
Metrics and Monitoring: Splunk
Experience and Education
3-5 years of experience in software development and test automation required
3-5 years of web development experience strongly preferred
High School Diploma or equivalent required
Bachelor's Degree in related field or equivalent experience required
Master's Degree in related field preferred
What We Offer: Generous benefits package available on day one to include: 401K matching, bonding leave for new parents (12 weeks, 100% paid), tuition assistance, training, GM employee auto discount, community service pay and nine company holidays.
Our Culture: Our team members define and shape our culture - an environment that welcomes innovative ideas, fosters integrity, and creates a sense of community and belonging. Here we do more than work - we thrive.
Compensation: Competitive pay and bonus eligibility
About this Role:
The Site Reliability Engineering (SRE) team provides leadership, direction, and accountability for building and running large-scale software systems. As a Site Reliability Engineer, you will identify and deliver automation solutions designed to ensure high availability and resiliency using your expertise in software development, complexity analysis, and scalable system design. Strong collaboration skills will be required to work closely with other engineering teams to ensure services/systems are highly stable and performant, meeting the expectations of our business partners and end-users.
Partner with the architecture and development teams on how to make applications highly available, reliable, and performant at global scale
Collaborate with the architecture team to ensure Reliability factors are accounted for in business features and enablers
Guide development teams in understanding established service level objectives and consequences, and implementing appropriate SLIs to support the objectives
Collaborate with development team members to swarm, troubleshoot, and resolve problems
Guide ad-hoc teams to brainstorm solutions and build implementation plans based on the Root Cause Analysis of production issues
Design and build automated solutions to optimize application/service/platform uptime with minimal human intervention
Be available for an on-call rotation to participate in troubleshooting and communication efforts outside of normal business hours
Implement and help create standards and best practices, and mentor other team members in order to drive adoption across development teams
Site Reliability Engineer
New York
We are FIS. Our technology powers the world's economy and our teams bring innovation to life. We champion diversity to deliver the best products and solutions for our colleagues, clients and communities. If you're ready to start learning, growing and making an impact with a career in fintech, we'd like to know: Are you FIS?
NOTE:
1: This position is hybrid (3 days onsite) in our FIS Office locations in New York City (New York), Milwaukee (Wisconsin), Jacksonville (Florida) & Atlanta (Georgia).
2: Current and future sponsorship is not available for this position
About the Team:
This position is under our CTO org to support SRE functions for innovation and growth for the Banking Solutions, Payments and Capital Markets business. This role will report under our Wealth and Retirement team.
FIS empowers small to large retirement plan providers around the globe with a comprehensive, integrated suite of retirement solutions. Our industry-leading offering includes an extensive selection of technology and services, ranging from record-keeping technology and plan administration and compliance solutions to financial wellness. Together, these support all aspects of the retirement services business and position our clients for future growth. We are looking for a talented engineer who is comfortable working on multiple, appropriately prioritized issues and/or projects at a time, and who wants to be part of this global, dynamic retirement technology team, where they will grow personally, technically, and professionally.
What you will be doing:
The Site Reliability Engineer will play a critical role in driving innovation and growth for the Banking Solutions, Payments and Capital Markets business. In this role, you will have the opportunity to make a lasting impact on the company's transformation journey, drive customer-centric innovation and automation, and position the organization as a leader in the competitive banking, payments, and investment landscape.
Your broad responsibilities will include owning the technical engagement and ultimate success of specific modernization projects. You should have hands-on experience with AWS technologies as well as a broad understanding of how applications and services are built on the AWS platform.
Design and maintain monitoring solutions for infrastructure, application performance, and user experience.
Implement automation tools to streamline tasks, scale infrastructure, and ensure seamless deployments.
Ensure application reliability, availability, and performance, minimizing downtime and optimizing response times.
Lead incident response, including identification, triage, resolution, and post-incident analysis.
Conduct capacity planning, performance tuning, and resource optimization.
Collaborate with security teams to implement best practices and ensure compliance.
Manage deployment pipelines and configuration management for consistent and reliable app deployments.
Develop and test disaster recovery plans and backup strategies.
Collaborate with development, QA, DevOps, and product teams to align on reliability goals and incident response processes.
Participate in on-call rotations and provide 24/7 support for critical incidents.
What you bring:
Proficiency in development technologies, architectures, and platforms (web, mobile apps, API).
Experience with AWS cloud platform and IaC tools.
AWS CLI: used for tasks such as changing AWS configurations, uploading and downloading files to and from S3, and downloading the kubectl context
Terraform: defining infrastructure as code; all AWS resource configurations are defined and changed through Terraform
kubectl: for communicating with EKS
Knowledge of monitoring tools (Prometheus, Grafana, DataDog) and logging frameworks (Splunk, ELK Stack).
Experience in incident management and post-mortem reviews.
Strong troubleshooting skills for complex technical issues.
Proficiency in scripting languages (Python, Bash) and automation tools (Terraform, Ansible).
Experience with CI/CD pipelines (Jenkins, GitLab CI/CD).
Ownership approach to engineering and product outcomes.
Excellent interpersonal communication, negotiation, and influencing skills.
Must have the following Certifications:
AWS Certified Cloud Practitioner
AWS Certified Solutions Architect Associate OR AWS Certified DevOps Engineer
Tech Stack:
Linux, Git, Docker, Kubernetes/OpenShift, Helm, Jenkins, Harness, Checkmarx, SonarQube, Maven, Node, Artifactory, Flyway, Splunk, Keyfactor, HashiCorp Vault, CyberArk, SNOW, Jira, Confluence, Oracle DB, PostgreSQL DB, EC2, EKS, KMS, Secrets Manager (stores passwords and enables auto-rotation), RDS: Postgres DB, Redis: in-memory cache, S3: object storage, Security Groups: ingress/egress firewall configurations for each service, IAM: access control to AWS resources, Route 53, SFTP. Familiarity with a scheduler such as Tivoli is a plus.
What we offer you:
At FIS, we hire the best. In return, you receive exceptional benefits including:
Opportunities to innovate in fintech
Tools for personal and professional growth
Inclusive and diverse work environment
Resources to invest in your community
Competitive salary and benefits
FIS is committed to providing its employees with an exciting career opportunity and competitive compensation. The pay range for this full-time position is $170,550.00 - $286,520.00 and reflects the minimum and maximum target for new hire salaries for this position based on the posted role, level, and location. Within the range, actual individual starting pay is determined by additional factors, including job-related skills, experience, and relevant education or training. Any changes in work location will also impact actual individual starting pay. Please consult with your recruiter about the specific salary range for your preferred location during the hiring process.
Privacy Statement
FIS is committed to protecting the privacy and security of all personal information that we process in order to provide services to our clients. For specific information on how FIS protects personal information online, please see the Online Privacy Notice.
EEOC Statement
FIS is an equal opportunity employer. We evaluate qualified applicants without regard to race, color, religion, sex, sexual orientation, gender identity, marital status, genetic information, national origin, disability, veteran status, and other protected characteristics. The EEO is the Law poster and its supplement document are also available.
For positions located in the US, the following conditions apply. If you are made a conditional offer of employment, you will be required to undergo a drug test. ADA Disclaimer: In developing this job description care was taken to include all competencies needed to successfully perform in this position. However, for Americans with Disabilities Act (ADA) purposes, the essential functions of the job may or may not have been described for purposes of ADA reasonable accommodation. All reasonable accommodation requests will be reviewed and evaluated on a case-by-case basis.
Sourcing Model
Recruitment at FIS works primarily on a direct sourcing model; a relatively small portion of our hiring is through recruitment agencies. FIS does not accept resumes from recruitment agencies which are not on the preferred supplier list and is not responsible for any related fees for resumes submitted to job postings, our employees, or any other part of our company.
Reliability Engineer
Dallas, TX
TrinityRail is searching for a Reliability Engineer to join our Railcar Fleet Engineering team at our corporate headquarters in Dallas, Texas.
What you'll do:
* Analyze data from various quality inputs (including, but not limited to, nonconformance reports, customer complaints, and internal quality data) to determine trends and identify areas for systemic improvements. Prepare and issue reports on a weekly, monthly, and/or quarterly basis
* Extract, compile, and analyze data using standard statistical tools
* Perform reliability analyses for railcar design and components
* Recommend changes to maintenance plans, inspection intervals, and component selection based on reliability analyses
* Work with customers, suppliers, and shops to improve data collection processes
* Facilitate root cause investigations to gain support from key stakeholders for implementation of corrective actions
* Serve as a liaison for researching, analyzing, and communicating on quality-related issues
* B.S. in Mechanical or Reliability Engineering or other relevant engineering discipline required
* 5+ years of experience in reliability analysis or quality assurance; emphasis on heavy industrial environment preferred
* Demonstrated ability to program and create data visualizations
* Knowledge of railcar operating environment, regulations, and standards preferred
* Outstanding communication skills - both written and verbal
* Excellent working knowledge of SQL, R, Python, Databricks, or Power BI required
* Demonstrated ability to perform data analysis and reliability analysis
* Ability to multi-task / handle multiple projects of differing scale
* Demonstrated ability to execute all phases of root cause analysis, recommend and implement corrective actions.
* Experience in a manufacturing or repair plant in a Mechanical Engineer, Reliability Engineer, or Quality Engineer position with a custom product, freight and/or tank car maintenance preferred.
Site Reliability Engineer III - AWM
Boston, MA
We have an exciting and rewarding opportunity for you to take your software engineering career to the next level. As a Software Engineer III at JPMorganChase within the Asset and Wealth Management Americas team, you serve as a seasoned member of an agile team to design and deliver trusted market-leading technology products in a secure, stable, and scalable way. You are responsible for carrying out critical technology solutions across multiple technical areas within various business functions in support of the firm's business objectives.
**Job responsibilities**
+ Executes software solutions, design, development, and technical troubleshooting with ability to think beyond routine or conventional approaches to build solutions or break down technical problems
+ Creates secure and high-quality production code and maintains algorithms that run synchronously with appropriate systems
+ Produces architecture and design artifacts for complex applications while being accountable for ensuring design constraints are met by software code development
+ Gathers, analyzes, synthesizes, and develops visualizations and reporting from large, diverse data sets in service of continuous improvement of software applications and systems
+ Proactively identifies hidden problems and patterns in data and uses these insights to drive improvements to coding hygiene and system architecture
+ Contributes to software engineering communities of practice and events that explore new and emerging technologies
+ Adds to team culture of diversity, opportunity, inclusion, and respect
**Required qualifications, capabilities, and skills**
+ Formal training or certification on computer science and reliability concepts and 3+ years applied experience.
+ Hands-on practical experience in system design, application development, testing, and operational stability
+ Proficient in coding in one or more languages
+ Experience in developing, debugging, and maintaining code in a large corporate environment with one or more modern programming languages and database querying languages
+ Overall knowledge of the Software Development Life Cycle
+ Solid understanding of agile methodologies such as CI/CD, Application Resiliency, and Security
+ Demonstrated knowledge of software applications and technical processes within a technical discipline (e.g., cloud, artificial intelligence, machine learning, mobile, etc.)
**Preferred qualifications, capabilities, and skills**
+ Familiarity with modern front-end technologies
+ Exposure to cloud technologies
JPMorganChase, one of the oldest financial institutions, offers innovative financial solutions to millions of consumers, small businesses and many of the world's most prominent corporate, institutional and government clients under the J.P. Morgan and Chase brands. Our history spans over 200 years and today we are a leader in investment banking, consumer and small business banking, commercial banking, financial transaction processing and asset management.
We offer a competitive total rewards package including base salary determined based on the role, experience, skill set and location. Those in eligible roles may receive commission-based pay and/or discretionary incentive compensation, paid in the form of cash and/or forfeitable equity, awarded in recognition of individual achievements and contributions. We also offer a range of benefits and programs to meet employee needs, based on eligibility. These benefits include comprehensive health care coverage, on-site health and wellness centers, a retirement savings plan, backup childcare, tuition reimbursement, mental health support, financial coaching and more. Additional details about total compensation and benefits will be provided during the hiring process.
We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants' and employees' religious practices and beliefs, as well as mental health or physical disability needs. Visit our FAQs for more information about requesting an accommodation.
JPMorgan Chase & Co. is an Equal Opportunity Employer, including Disability/Veterans
**Base Pay/Salary**
New York, NY $133,000.00 - $185,000.00 / year
Site Reliability Engineer - Capital Markets
New York, NY
Jefferies is seeking a Site Reliability Engineer to play an instrumental role in supporting Equity front-office trading applications and the risk and middle-office real-time products developed and used for the Equity Cash and ETS applications. As part of the wider platform engineering team, you will work closely and interactively with business users throughout the day, along with technical, analysis, and testing colleagues. Investigating and resolving the work items at hand will require strong technical skills and a keen intellect. The business is a growth area, with current investments taking place across technology, business, and middle-office areas.
Responsibilities:
* Front-line site reliability engineering and support for Equity trading systems used by Jefferies clients as well as internal users.
* Build monitoring tools for application and infrastructure components.
* Implement and manage scalable infrastructure using cloud-native technologies and tools.
* Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding.
* Partner with business, development and infrastructure teams to improve services through rigorous testing and release procedures.
* Develop and maintain CI/CD pipelines to streamline deployment processes.
* Expedient deployment of new systems. Capacity planning, Platform Management, and support for increasing volumes and business growth.
* Create sustainable systems and services through automation.
* Collaborate with Application team to establish and enforce production and development standards.
* Document procedures, best practices and troubleshooting FAQs.
* Resolve complex application and technical problems.
* Debug systems and fix production-related issues.
* Escalate and follow up on permanent fixes for development-related issues.
* Lead incident response efforts and post-mortem analysis to prevent future occurrences.
* Handles complex operational tasks and recommends process and technology changes.
* Provide global support, including weekend availability to troubleshoot production-related issues and perform checkouts.
* Ability to work both independently and in groups in an energetic, diverse environment.
* Participate in on-call rotations to ensure 24/7 system availability and support.
* Support compliance and legal queries.
Qualifications:
* Strong experience in Windows and Linux/Unix services.
* Strong experience with scripting languages such as PowerShell, Python, and SQL.
* Strong Knowledge of monitoring tools - Nagios, Splunk, OTEL, Datadog
* Strong Knowledge of FIX protocol
* Strong Domain skills - Must have working experience in Capital Markets across modules and instruments especially - CASH, ETS, Bonds, Options, Futures, Swaps products
* Experience with BFSI (Banking, Financial Services, and Insurance) domain applications, with a proper understanding of the trade lifecycle.
* Excellent communication, time management and project management skills.
Primary location full-time salary range: $175,000 - $200,000
Site Reliability Engineer III
Plano, TX
There's nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems.
As a Site Reliability Engineer III at JPMorgan Chase within the Consumer & Community Banking team, you will solve complex and broad business problems with simple and straightforward solutions. Through code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their associated infrastructure to independently decompose and iteratively improve on existing solutions. You are a significant contributor to your team by sharing your knowledge of end-to-end operations, availability, reliability, and scalability of your application or platform.
Job responsibilities
Guides and assists others in the areas of building appropriate level designs and gaining consensus from peers where appropriate
Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines
Collaborates with other software engineers and teams to design, develop, test, and implement availability, reliability, scalability, and solutions in their applications
Implements infrastructure, configuration, and network as code for the applications and platforms in your remit
Collaborates with technical experts, key stakeholders, and team members to resolve complex problems
Understands service level indicators and utilizes service level objectives to proactively resolve issues before they impact customers
Supports the adoption of site reliability engineering best practices within your team
Required qualifications, capabilities, and skills
Formal training or certification on software engineering concepts and 3+ years applied experience
Proficient in site reliability culture and principles and familiarity with how to implement site reliability within an application or platform
Proficient in at least one programming language such as Python, Java/Spring Boot, or .NET
Proficient knowledge of software applications and technical processes within a given technical discipline (e.g., Cloud, artificial intelligence, Android, etc.)
Experience in observability such as white and black box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others
Experience with continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform
Familiarity with containers and container orchestration such as ECS, Kubernetes, and Docker
Familiarity with troubleshooting common networking technologies and issues
Ability to contribute to large and collaborative teams by presenting information in a logical and timely manner with compelling language and limited supervision
Ability to proactively recognize roadblocks and demonstrated interest in learning technology that facilitates innovation
Ability to identify new technologies and relevant solutions to ensure design constraints are met by the software team
Ability to initiate and implement ideas to solve business problems
Preferred qualifications, capabilities, and skills
Prior fin-tech industry experience
Excellent written and verbal communication skills
Cloud certification on AWS, Azure, or GCP
Site Reliability Engineer III
Plano, TX
JobID: 210683373
JobSchedule: Full time
There's nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems.
As a Site Reliability Engineer III at JPMorgan Chase within the Enterprise Technology, Engineering Services and Platform team, you will solve complex and broad business problems with simple and straightforward solutions. Through code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their associated infrastructure to independently decompose and iteratively improve on existing solutions. You are a significant contributor to your team by sharing your knowledge of end-to-end operations, availability, reliability, and scalability of your application or platform.
Job responsibilities
* Guides and assists others in the areas of building appropriate level designs and gaining consensus from peers where appropriate
* Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines
* Collaborates with other software engineers and teams to design, develop, test, and implement availability, reliability, scalability, and solutions in their applications
* Implements infrastructure, configuration, and network as code for the applications and platforms in your remit
* Collaborates with technical experts, key stakeholders, and team members to resolve complex problems
* Understands service level indicators and utilizes service level objectives to proactively resolve issues before they impact customers
* Supports the adoption of site reliability engineering best practices within your team
* Provide 24/7 production support for business-critical applications
Required qualifications, capabilities, and skills
* Formal training or certification in software engineering concepts with 2+ years of applied experience.
* Proficient in site reliability culture and principles and familiarity with how to implement site reliability within an application or platform
* Proficient in at least one programming language such as Python, Java/Spring Boot, or .NET
* Proficient knowledge of software applications and technical processes within a given technical discipline (e.g., Cloud, artificial intelligence, Android, etc.)
* Experience in observability such as white and black box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others
* Experience with continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform
* Familiarity with containers and container orchestration such as ECS, Kubernetes, and Docker
* Familiarity with troubleshooting common networking technologies and issues
* Ability to contribute to large and collaborative teams by presenting information in a logical and timely manner with compelling language and limited supervision
* Ability to proactively recognize roadblocks, and demonstrated interest in learning technology that facilitates innovation
* Experience with event streaming platforms like Kafka
* Experience in incident and change management
Preferred qualifications, capabilities, and skills
* Ability to identify new technologies and relevant solutions to ensure design constraints are met by the software team
* Ability to initiate and implement ideas to solve business problems
* Networking and systems:
  * Deep understanding of TCP/IP, DNS, load balancing, firewalls, and VPN technologies
  * Experience tuning Linux performance and troubleshooting system-level issues
* Collaborative leadership:
  * Demonstrated ability to mentor junior engineers and drive SRE best-practice adoption
  * Strong written and verbal communication skills; comfortable presenting to technical and non-technical stakeholders
* Certifications (a plus):
  * AWS Certified SysOps Administrator or Professional, Certified Kubernetes Administrator (CKA), or equivalent
Site Reliability Engineer III
Plano, TX
Play a key role in ensuring system reliability at one of the world's most iconic and largest financial institutions.
As a Site Reliability Engineer II at JPMorgan Chase within the Commercial and Investment Bank, Digital and Platform Devices team, you will use technology to solve business problems and leverage software engineering best practices as we strive towards excellence. This role often works independently to execute small to medium projects, but you'll also have the opportunity to collaborate with cross-functional teams to continually improve your knowledge of JPMorgan Chase's business and relevant technologies.
Job responsibilities
Executes small to medium projects independently with initial direction, eventually graduating to designing and delivering projects on your own
Leverages technology to solve business problems by writing high quality, maintainable, and robust code following best practices in software engineering
Participates in triaging, examining, diagnosing, and resolving incidents, and works with others to solve problems at their root
Recognizes the toil within your role and proactively works towards eliminating it through either systems engineering or updating application code
Understands observability patterns and strives to implement and improve service level indicators and objectives, monitoring, and alerting solutions for optimal transparency and analysis
Required qualifications, capabilities, and skills
Formal training or certification on software engineering concepts and 3+ years of applied experience
Ability to code in at least one programming language
Experience maintaining cloud-based infrastructure
Familiar with site reliability concepts, principles, and practices
Familiar with observability such as white and black box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others
Familiarity with containers or a common server OS such as Linux or Windows
Emerging knowledge of software, applications and technical processes within a given technical discipline (e.g., Cloud, artificial intelligence, Android, etc.)
Emerging knowledge of continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform
Emerging knowledge of common networking technologies
Preferred qualifications, capabilities, and skills
General knowledge of the financial services industry
Ability to work in a large, collaborative team and demonstrated willingness to vocalize ideas with peers and managers
Understanding of how to prioritize and adjust work plans to adapt to changes in assigned responsibilities and projects
Eagerness to participate in learning opportunities to enhance one's effectiveness in executing day-to-day project activities
Ability to demonstrate and apply existing and new system processes, methodologies, and skills to contribute to the development of systems
Lead Site Reliability Engineer
Plano, TX
Assume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability.
As a Lead Site Reliability Engineer at JPMorgan Chase within the Consumer and Community Banking team, you hold a leadership role in your team, demonstrate strong knowledge across multiple technical domains, and advise others on the technical and business issues facing them. You will lead resiliency design reviews, break up complex problems into digestible work for other engineers, act as a technical lead for medium to large-sized products, and provide advice and mentoring to other engineers.
Job responsibilities
Demonstrates and champions site reliability culture and practices and exerts technical influence throughout your team
Leads initiatives to improve the reliability and stability of your team's applications and platforms using data-driven analytics to improve service levels
Collaborates with team members to identify comprehensive service level indicators, and with stakeholders to establish reasonable service level objectives and error budgets with customers
Demonstrates a high level of technical expertise within one or more technical domains and proactively identifies and solves technology-related bottlenecks in your areas of expertise
Acts as the main point of contact during major incidents for your application and demonstrates the skills to identify and solve issues quickly to avoid financial losses
Documents and shares knowledge within your organization via internal forums and communities of practice
Required qualifications, capabilities, and skills
Formal training or certification on software engineering and SRE concepts and 5+ years of applied experience.
Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices with the ability to implement these practices within an application or platform
Fluency in at least one programming language such as Python, Java Spring Boot, or .NET
Deep knowledge of software applications and technical processes with emerging depth in one or more technical disciplines
Proficiency and experience in observability such as white and black box monitoring, SLO alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, etc.
Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform, etc.)
Experience with containers and container orchestration (e.g., ECS, Kubernetes, Docker, etc.)
Experience with troubleshooting common networking technologies and issues
Ability to identify and solve problems related to complex data structures and algorithms
Preferred qualifications, capabilities, and skills
Drive to self-educate and evaluate new technology
Ability to teach new programming languages to team members
Ability to expand and collaborate across different levels and stakeholder groups
Lead Site Reliability Engineer - Pega, Cloud
Plano, TX
As a Lead Site Reliability Engineer at JPMorgan Chase within the Corporate Sector, you hold a senior technical role in your team, demonstrate strong knowledge across multiple technical domains, and advise others on the technical and business issues facing them. As a core technical contributor, you conduct resiliency design reviews, break up complex problems into digestible work for other engineers, act as a senior technical resource for medium to large-sized products, and provide advice and mentoring to other engineers.
Job responsibilities
Demonstrates and champions site reliability culture and practices and exerts technical influence throughout your team
Leads initiatives to improve the reliability and stability of your team's applications and platforms using data-driven analytics to improve service levels
Collaborates with team members to identify comprehensive service level indicators, and with stakeholders to establish reasonable service level objectives and error budgets with customers
Demonstrates a high level of technical expertise within one or more technical domains and proactively identifies and solves technology-related bottlenecks in your areas of expertise
Acts as the main point of contact during major incidents for your application and demonstrates the skills to identify and solve issues quickly to avoid financial losses
Documents and shares knowledge within your organization via internal forums and communities of practice
Required qualifications, capabilities, and skills
Formal Pega training and certification with over 8 years of applied experience, including architecting complex solutions and integrating Pega with databases, messaging systems, and cloud services.
At least 3 years serving as a senior engineer or technical lead, demonstrating strong individual contributor skills and the ability to engage technically with senior technology and management teams.
More than 3 years of experience in Pega Production Support, PRPC Framework, SQL, PL/SQL, SQL Server, Oracle Databases, and Unix environments.
Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and site reliability best practices, with proven ability to implement these practices within applications and platforms.
Experience applying DevOps principles to Pega applications, including CI/CD, automated testing, monitoring, and logging.
Fluency in at least one programming language such as Python, Java Spring Boot, or .Net, with strong knowledge of software applications and technical processes.
Experience troubleshooting common networking technologies and resolving related issues.
Ability to identify and solve problems involving complex data structures and algorithms.
Demonstrated drive to self-educate, evaluate new technologies, and collaborate effectively across different levels and stakeholder groups; prior experience as a Pega SRE in a Center of Excellence (CoE) is a plus.
Preferred qualifications, capabilities, and skills
Must have Certified Pega Senior System Architect or higher certification on recent versions (Pega 8 and above).
Experience with cloud-native Pega deployments and hybrid architectures.
Advanced knowledge of automation frameworks and scripting for Pega environments.
Experience with regulatory and compliance requirements in financial services.
Prior experience working in global, distributed teams and large-scale enterprise environments.
Strong presentation and communication skills for both technical and non-technical audiences.
JPMorganChase, one of the oldest financial institutions, offers innovative financial solutions to millions of consumers, small businesses and many of the world's most prominent corporate, institutional and government clients under the J.P. Morgan and Chase brands. Our history spans over 200 years and today we are a leader in investment banking, consumer and small business banking, commercial banking, financial transaction processing and asset management.
We offer a competitive total rewards package including base salary determined based on the role, experience, skill set and location. Those in eligible roles may receive commission-based pay and/or discretionary incentive compensation, paid in the form of cash and/or forfeitable equity, awarded in recognition of individual achievements and contributions. We also offer a range of benefits and programs to meet employee needs, based on eligibility. These benefits include comprehensive health care coverage, on-site health and wellness centers, a retirement savings plan, backup childcare, tuition reimbursement, mental health support, financial coaching and more. Additional details about total compensation and benefits will be provided during the hiring process.
We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants' and employees' religious practices and beliefs, as well as mental health or physical disability needs. Visit our FAQs for more information about requesting an accommodation.
JPMorgan Chase & Co. is an Equal Opportunity Employer, including Disability/Veterans
Chase is a leading financial services firm, helping nearly half of America's households and small businesses achieve their financial goals through a broad range of financial products. Our mission is to create engaged, lifelong relationships and put our customers at the heart of everything we do. We also help small businesses, nonprofits and cities grow, delivering solutions to solve all their financial needs.
Site Reliability Engineer III - CIB - Markets - Post Trade Technology - SRE
Houston, TX
There's nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems.
As a Site Reliability Engineer III at JPMorgan Chase within the Commercial and Investment Bank, Post Trade team, you will solve complex and broad business problems with simple and straightforward solutions. Through code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their associated infrastructure to independently decompose and iteratively improve on existing solutions. You are a significant contributor to your team, sharing your knowledge of end-to-end operations, availability, reliability, and scalability of your application or platform.
Job responsibilities
Guides and assists others in the areas of building appropriate level designs and gaining consensus from peers where appropriate
Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines
Collaborates with other software engineers and teams to design, develop, test, and implement availability, reliability, and scalability solutions in their applications
Implements infrastructure, configuration, and network as code for the applications and platforms in your remit
Collaborates with technical experts, key stakeholders, and team members to resolve complex problems
Understands service level indicators and utilizes service level objectives to proactively resolve issues before they impact customers
Supports the adoption of site reliability engineering best practices within your team
Required qualifications, capabilities, and skills
Formal training or certification on software engineering concepts and 3+ years of applied experience
Proficient in site reliability culture and principles and familiarity with how to implement site reliability within an application or platform
Proficient in at least one programming language such as Python, Java/Spring Boot, or .NET
Proficient knowledge of software applications and technical processes within a given technical discipline (e.g., Cloud, artificial intelligence, Android, etc.)
Experience in observability such as white and black box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others
Experience with continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform
Familiarity with containers and container orchestration such as ECS, Kubernetes, and Docker
Familiarity with troubleshooting common networking technologies and issues
Preferred qualifications, capabilities, and skills
Ability to contribute to large and collaborative teams by presenting information in a logical and timely manner with compelling language and limited supervision
Ability to proactively recognize roadblocks, and demonstrated interest in learning technology that facilitates innovation
Ability to identify new technologies and relevant solutions to ensure design constraints are met by the software team
Ability to initiate and implement ideas to solve business problems