Reliability Engineer jobs at Arch Capital Group - 808 jobs
Site Reliability Engineer
The Voleon Group 4.1
Berkeley, CA
Voleon is a technology company that applies state‑of‑the‑art AI and machine learning techniques to real‑world problems in finance. For nearly two decades, we have led our industry and worked at the frontier of applying AI/ML to investment management. We have become a multibillion‑dollar asset manager, and we have ambitious goals for the future.
Your colleagues will include internationally recognized experts in artificial intelligence and machine learning research as well as highly experienced finance and technology professionals. The people who shape our company come from other backgrounds, including concert music performances, humanitarian aid, opera singing, sports writing, and BMX racing. You will be part of a team that loves to succeed together.
In addition to our enriching and collegial working environment, we offer highly competitive compensation and benefits packages, technology talks by our experts, a beautiful modern office, daily catered lunches, and more.
As a Site Reliability Engineer (SRE), you will work at the intersection of production operations and software development as you improve, manage, and monitor production‑critical infrastructure and data pipelines. At Voleon, many SREs serve together on a Production Operations team tasked with improving shared production infrastructure. Others are embedded with teams of software engineers to improve specific production systems owned by those teams. Voleon SREs work on important real‑world problems and collaborate with passionate and talented colleagues in an empowering, results‑driven environment. This role is a way to make a real difference: your contributions will make our critical systems more reliable, lower operational risk, and increase the efficiency of our engineering effort.
Responsibilities
Improve fault‑tolerance and maintainability of code in proprietary data pipelines and trading systems
Diagnose and fix bugs in code
Lead complex deployments
Automate manual workflows
Track and prioritize outstanding production‑related issues
Share an on‑call rotation responding to incidents to ensure the continuous operation of production‑critical systems
Requirements
Experience with coding and debugging Python
Experience with Linux
Familiarity with Relational Databases & SQL
Sharp analytical and problem‑solving skills and a persistent drive to make things work (better)
Strong growth mindset and a passion for learning
Strong technical communication skills
Attention to detail
2 years of relevant industry experience
An undergraduate degree or comparable training in a quantitative field, or equivalent relevant industry experience
Preferred Qualifications
Familiarity with best practices concerning code maintainability, documentation, quality assurance, continuous integration and deployment
Experience supporting production systems
Experience with any of the following: gRPC microservices, Postgres, Pandas, Golang, R, Git, Jenkins, Bazel, Prometheus, Grafana, Airflow, Kubernetes
The base salary for this position is $120,000 to $160,000 in the location(s) of this posting. Individual salaries are determined through a variety of factors, including, but not limited to, education, experience, knowledge, skills, and geography. Base salary does not include other forms of total compensation such as bonus compensation and other benefits. Our benefits package includes medical, dental and vision coverage, life and AD&D insurance, 20 days of paid time off, 9 sick days, and a 401(k) plan with a company match.
Friends of Voleon Candidate Referral Program
If you have a great candidate in mind for this role and would like the potential to earn $7,500 - $15,000 if your referred candidate is successfully hired and employed by The Voleon Group, please use this form to submit your referral. For details regarding eligibility and terms and conditions, please review the Voleon Referral Bonus Program.
Equal Opportunity Employer
The Voleon Group is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.
$120k-160k yearly 5d ago
Service Reliability Engineer, GI Application Management
AIG-American International Group, Inc. 4.5
Charlotte, NC
As a Site Reliability Engineer (SRE), you will apply software engineering principles to IT operations, ensuring robust and scalable systems. The core mission is to build resilient, efficient, and rapidly evolving IT infrastructure through a data-driven …
$93k-119k yearly est. 2d ago
Process Improvement Specialist/Concrete Industry
DZ Corporation 4.3
The Villages, FL
Reports To: Operations Manager
The Process Improvement Specialist is responsible for optimizing production processes within the precast concrete facility. This role focuses on identifying inefficiencies, implementing process enhancements, and supporting quality and safety improvements across manufacturing operations. Working closely with production teams, engineers, and supervisors, the specialist helps streamline workflows, reduce waste, and ensure consistent product quality.
Key Responsibilities:
Process Analysis & Optimization:
Observe and analyze daily production activities (casting, curing, reinforcement, finishing, etc.) to identify bottlenecks and improvement opportunities.
Data Collection & Reporting:
Gather and track production data such as cycle times, material usage, downtime, and defect rates to support improvement projects.
Continuous Improvement Projects:
Assist in implementing Lean, 5S, or Six Sigma initiatives to improve plant efficiency, reduce waste, and enhance workplace organization.
Standard Work & Documentation:
Help develop and update standard operating procedures (SOPs), work instructions, and visual management tools.
Quality & Safety Support:
Collaborate with Quality Control and Safety teams to ensure process changes meet safety standards and product specifications.
Technical Support:
Support the introduction of new molds, equipment, or materials by conducting process trials and documenting results.
Collaboration:
Partner with maintenance, engineering, and production supervisors to troubleshoot recurring process issues.
Qualifications:
Education:
Associate's degree or technical diploma in Manufacturing Technology, Industrial Engineering, or related field.
Equivalent experience in precast concrete production or process improvement will be considered.
Experience:
2+ years in a manufacturing or precast concrete environment.
Familiarity with Lean Manufacturing, 6S, or Continuous Improvement principles.
Skills:
Strong mechanical aptitude and understanding of production equipment.
Ability to collect and interpret process data (cycle times, scrap, yield, etc.).
Proficiency in Microsoft Office and basic data entry tools.
Good communication and problem-solving skills.
Team-oriented and hands-on approach.
Preferred Qualifications:
Experience with precast or concrete manufacturing processes (casting, curing, form setup, reinforcement, finishing).
Knowledge of quality systems such as NPCA or PCI standards.
Basic CAD or technical drawing reading ability.
Certification in Lean or Six Sigma or willingness to acquire.
Performance Indicators:
Reduction in process waste or rework rates.
Increased production throughput and efficiency.
Improved safety compliance and incident reduction.
Consistency in meeting product quality standards.
Implementation and sustainability of improvement projects.
$68k-100k yearly est. 5d ago
Senior ML Engineer: Production Pipelines & HPC Expert
Capital One 4.7
McLean, VA
A leading financial services company in Virginia seeks an experienced professional to design and build data-intensive solutions. The role requires expertise in C, C++, Python, Scala, and machine learning, along with the ability to lead teams and communicate complex concepts effectively. Candidates should hold a Bachelor's degree, preferably a Master's, and have a proven track record with production-ready data pipelines and the ML lifecycle. Competitive compensation and comprehensive benefits are offered.
$90k-111k yearly est. 2d ago
Site Reliability Engineer
The Voleon Group 4.1
Remote
Voleon is a technology company that applies state-of-the-art AI and machine learning techniques to real-world problems in finance. For nearly two decades, we have led our industry and worked at the frontier of applying AI/ML to investment management. We have become a multibillion-dollar asset manager, and we have ambitious goals for the future. Your colleagues will include internationally recognized experts in artificial intelligence and machine learning research as well as highly experienced finance and technology professionals. The people who shape our company come from other backgrounds, including concert music performances, humanitarian aid, opera singing, sports writing, and BMX racing. You will be part of a team that loves to succeed together.
In addition to our enriching and collegial working environment, we offer highly competitive compensation and benefits packages, technology talks by our experts, a beautiful modern office, daily catered lunches, and more.
As a Site Reliability Engineer (SRE), you will work at the intersection of production operations and software development as you improve, manage, and monitor production-critical infrastructure and data pipelines. At Voleon, many SREs serve together on a Production Operations team tasked with improving shared production infrastructure. Others are embedded with teams of software engineers to improve specific production systems owned by those teams. Voleon SREs work on important real-world problems and collaborate with passionate and talented colleagues in an empowering, results-driven environment. This role is a way to make a real difference: your contributions will make our critical systems more reliable, lower operational risk, and increase the efficiency of our engineering effort.
Responsibilities
Improve fault-tolerance and maintainability of code in proprietary data pipelines and trading systems
Diagnose and fix bugs in code
Lead complex deployments
Automate manual workflows
Track and prioritize outstanding production-related issues
Share an on-call rotation responding to incidents to ensure the continuous operation of production-critical systems
Requirements
Experience with coding and debugging Python
Experience with Linux
Familiarity with Relational Databases & SQL
Sharp analytical and problem-solving skills and a persistent drive to make things work (better)
Strong growth mindset and a passion for learning
Strong technical communication skills
Attention to detail
2 years of relevant industry experience
An undergraduate degree or comparable training in a quantitative field, or equivalent relevant industry experience
Preferred Qualifications
Familiarity with best practices concerning code maintainability, documentation, quality assurance, continuous integration and deployment
Experience supporting production systems
Experience with any of the following: gRPC microservices, Postgres, Pandas, Golang, R, Git, Jenkins, Bazel, Prometheus, Grafana, Airflow, Kubernetes
The base salary for this position is $120,000 to $160,000 in the location(s) of this posting. Individual salaries are determined through a variety of factors, including, but not limited to, education, experience, knowledge, skills, and geography. Base salary does not include other forms of total compensation such as bonus compensation and other benefits. Our benefits package includes medical, dental and vision coverage, life and AD&D insurance, 20 days of paid time off, 9 sick days, and a 401(k) plan with a company match.
“Friends of Voleon” Candidate Referral Program
If you have a great candidate in mind for this role and would like the potential to earn $7,500 - $15,000 if your referred candidate is successfully hired and employed by The Voleon Group, please use this form to submit your referral. For details regarding eligibility and terms and conditions, please review the Voleon Referral Bonus Program.
Equal Opportunity Employer
The Voleon Group is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.
$120k-160k yearly Auto-Apply 51d ago
Site Reliability Engineer 2
Drivewealth 4.0
Remote
DriveWealth is a global B2B financial technology organization dedicated to democratizing access to financial independence around the world. Our mission is realized through an API-based platform, empowering our partners to offer seamless investing and trading experiences to clients worldwide, all from their mobile devices.
Our technology provides partners with a modern, extensible toolkit, enabling traditional investment workflows and innovative techniques like fractional share ownership. DriveWealth has evolved into a global platform offering trading of US equities, mutual funds, ETFs, fixed income, and options.
We seek enthusiastic professionals to contribute diverse perspectives and experiences to our Brokerage-as-a-Service platform. Our culture blends the pace and opportunity of a tech start-up with the impact, stability, and significance of Wall Street. We encourage creativity and experimentation while ensuring institutional-grade execution and regulatory compliance in everything we do. We value diversity and inclusion, celebrating the unique differences of our employees as we scale and grow together. We're guided by operating principles grounded in accountability, teamwork, integrity, and solutions built to scale. Join us!
About The Role
As a Site Reliability Engineer 2, you will enhance the reliability and performance of our Brokerage-as-a-Service platform during critical 24/7 operations. This role demands a proactive approach to managing technical challenges and system optimizations that align with our global operational strategies.
What You'll Do
Support the SRE team in developing and implementing enhancements to support workflows, focusing on automation and efficiency improvements.
Handle technical escalations, troubleshoot complex issues, and actively participate in on-call rotations to ensure rapid response and resolution during non-traditional hours.
Adhere to and administer incident and change management policies.
Coordinate incident resolution efforts and implement change management protocols to maintain and enhance system reliability, especially during critical system operations at night.
Work closely with the New York office to ensure smooth operation and alignment of SRE practices across time zones.
What You'll Need
3+ years in an SRE role or a similar position, demonstrating deep knowledge and expertise in site reliability engineering and operations.
Working knowledge of REST APIs and an understanding of API integration.
Proficiency in Python scripting for automation and system management, with a track record of developing and implementing automation solutions.
SQL and database expertise with transactional databases, including querying and troubleshooting.
Strong analytical skills, with a demonstrated ability to troubleshoot and perform root cause analysis of technical issues.
Availability for flexible work hours and willingness to cover US market trading sessions, including L2 on-call coverage.
Knowledge of change management processes and risk management.
Nice to Have, But Not Required
Experience in the brokerage or financial industry
Proficient with cloud services, particularly AWS, and knowledgeable about cloud architecture best practices, including IAM, EC2, S3, and DynamoDB
Experience maintaining and supporting containerized systems, with familiarity with orchestration tools
Knowledge of Infrastructure as Code (IaC) practices and tools such as Terraform or CloudFormation
Ability to manage and troubleshoot job scheduling tools like Rundeck or Apache Airflow
Advanced skills in managing containerized environments using Kubernetes and OpenShift
Practical experience with Confluent Cloud for event streaming architectures
Experience with Java applications and a basic understanding of using the browser developer console for front-end debugging
Additional Notes: This role is critical for our continuous operations and requires a commitment to nighttime hours, aligning with the global nature of our financial services. Candidates must be prepared for intense collaboration periods and proactive communication across global teams.
Applicants must be authorized to work for any employer in the U.S. DriveWealth is unable to sponsor or take over sponsorship of an employment Visa at this time.
Compensation
Compensation package offerings are based on candidate experience and technical qualifications, as it relates to the role. These are identified and determined throughout your interviewing experience.
Please note: at this time, we are not able to hire in all states.
Remote (Most US States) Pay Range: $70,000-$120,000 USD
Benefits
Competitive medical, dental, and vision insurance options
Mental health resources
Generous paid time off with observed holidays (varies per country)
Paid parental leave for biological and adoptive parents
Up to $2,500 or local equivalent each year to invest in continued education and personal development
Up to $900 each year or local equivalent for fitness and wellness reimbursement
Company-provided phone (varies by country)
For HQ in-office employees, a daily lunch stipend, unlimited snacks, and engaging office space in the Financial District
Pre-tax commuter benefits (US only)
Employer 401K match (US only)
Benefit offerings vary based on country and are subject to change.
Equal Employment Opportunity
To build technology and products that are used and loved by people and solve real-world problems, we need to build a team with many different perspectives and experiences. We are an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. We encourage candidates from all backgrounds to apply. Applicants in need of special assistance or accommodation during the interview process or in accessing our website may contact us at **************************.
Agency Disclaimer
DriveWealth does not accept agency resumes. Please do not forward resumes to our jobs alias, employees, or any other organization location. DriveWealth is not responsible for any fees related to unsolicited resumes.
$70k-120k yearly Auto-Apply 2d ago
Mechanical Reliability Engineer
Cantor Fitzgerald 4.8
Freeport, TX
The Reliability Engineer improves the operational reliability of the various office and R&D facilities across NA and drives equipment reliability improvement by analyzing performance data to identify, prioritize, and eliminate causes of equipment failures. The initial emphasis is on mechanical systems, but all equipment types that require reliability improvement are considered. The Reliability Engineer draws on cross-functional teams as technical resources to develop reliability improvement solutions.
Duties/Responsibilities:
Transforms sites from a reactive maintenance culture to a preventative maintenance culture.
Manages the preventive maintenance (PM) program for rotating equipment. Implements and coordinates the utilization of Predictive Maintenance technologies such as vibration, oil, and ultra-sound analysis.
Assists with defining and tracking Key Performance Indicators for site equipment; provides reports to support equipment performance analyses. Conducts root cause analysis of equipment failures and implements corrective and preventative actions.
Develops maintenance repair procedures and arranges for craftsmen training as needed.
Develops engineering solutions to maintenance problems, including ways of reducing or eliminating the need for maintenance through equipment replacement or improvement work.
Provides input to project teams regarding ease of maintenance, materials of construction, and equipment selection.
Assists operations in using technicians to set up and monitor routine equipment checks (seal levels, oil levels, etc.). Documents methods and measurements to support technician maintenance efforts.
Coordinates with site operations in analyzing equipment maintenance problems, project work and determining solutions.
Acts as the mechanical technical resource for the Maintenance and Facilities department. Limited travel for training, meetings, and equipment supplier visits may be required.
Skills, Qualifications, Experience, Special Requirements:
Minimum of a bachelor's degree in engineering, preferably mechanical engineering
At least 5 years of experience working in laboratory facilities, chemical or petrochemical plant environments.
Experience in evaluating tank, piping and heat exchanger condition in accordance with ASME, API or STI standards.
Experience in performing pump and system curve calculations and evaluation.
Excellent written and verbal communication skills
Microsoft Office experience with emphasis on Excel data manipulation
Experience leading cross-functional teams toward equipment improvement targets
Experience that includes reliability analysis of some or all of the following facility plant equipment: pumps, fans, blowers, pressure vessels, heat exchangers, boilers, air compressors, mechanical seals, bearings, and gears, as well as analyzing performance data from a CMMS.
Considered a Plus
Previous experience with maintenance scheduling or planning in a production environment.
Experience with Building Automation Systems
Experience with Chillers, Exhaust Fans and Air Handlers
Knowledge of State Boiler and Mechanical Codes
Experience with the engineering and construction process and/or project management.
Experience with Tririga and SAP
Proven ability to drive organizational change.
Benefits and Perks:
Industry-leading Parental Leave Policy (up to 16 weeks)
Generous healthcare
Bright Horizons back-up care program
Generous paid time off
Education reimbursement
Referral Program
Opportunities to network and connect.
Benefits/perks listed may vary depending on the nature of your employment with Newmark and the job location.
Working Conditions: Normal working conditions with the absence of disagreeable elements
Note: The statements herein are intended to describe the general nature and level of work being performed by employees and are not to be construed as an exhaustive list of responsibilities, duties, and skills required of personnel so classified.
Newmark is an Equal Opportunity/Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex including sexual orientation and gender identity, national origin, disability, protected Veteran Status, or any other characteristic protected by applicable federal, state, or local law.
$113k-147k yearly est. Auto-Apply 1d ago
Staff Site Reliability Engineer
Figure 4.5
San Jose, CA
Figure is an AI robotics company developing autonomous general-purpose humanoid robots. The goal of the company is to ship humanoid robots with human level intelligence. Its robots are engineered to perform a variety of tasks in the home and commercial markets. Figure is headquartered in San Jose, CA.
We are looking for a Site Reliability Engineer to own our internal systems infrastructure. This role is responsible for setting up and managing cloud and on-prem infrastructure to deliver highly available, reliable, and automated systems.
Responsibilities:
Be the go-to person for mission-critical infrastructure enabling critical operations such as Source Configuration Management, CI/CD systems, software distribution, supplier portals, manufacturing, and more.
Migrate SaaS to self-hosted solutions to enhance security and reliability.
Implement monitoring and alerting systems, and define incident response plans and runbooks.
Reduce human workload by automating deployment and scaling.
Establish strong relationships with stakeholders to identify infrastructure needs and establish Service Level Objectives.
Use a data-driven approach to demonstrate service robustness and track optimization work.
Partner with the security team to ensure that security remediations and updates are applied in a timely manner.
Requirements:
Strong experience with Linux/Unix systems administration
Proficiency in programming/scripting
Extensive experience with cloud platforms (Azure, AWS, GCP) and on-prem hardware architectures
Experience designing, deploying, and operating high-availability, fault-tolerant, and distributed systems.
Mastery of infrastructure as code (Terraform, CloudFormation, Ansible…)
Familiarity with monitoring, logging, and alerting tools (Prometheus, Grafana, Datadog…)
Solid understanding of networking fundamentals (TCP/IP, DNS, HTTP, load balancers, firewalls)
Experience defining Service Level Objectives (SLOs), developing runbooks and incident response plans, facilitating post-mortems, and managing system assets.
Ability to work in cross-functional teams with developers, infra, and product teams
Excellent verbal and written communication skills
The US base salary range for this full-time position is between $175,000 - $250,000 annually.
The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended.
$175k-250k yearly Auto-Apply 60d+ ago
Site Reliability Engineer - Capital Markets
Jefferies Financial Group Inc. 4.8
Jersey City, NJ
Jefferies is seeking a Site Reliability Engineer to play an instrumental role in supporting the Equity front-office trading application and the risk and middle-office real-time products developed and used for the Equity Cash and ETS applications.
As part of the wider platform engineering team, you will work closely with business users throughout the day, along with technical, analysis, and testing colleagues. Investigating and resolving the work items at hand will require competent technical skills and a keen intellect. The business is a growth area, with current investments across the technology, business, and middle-office areas.
Responsibilities:
Front-line site reliability engineering and support functions for Equity trading systems used by Jefferies clients as well as internal users.
Build monitoring tools for application and infrastructure components.
Implement and manage scalable infrastructure using cloud-native technologies and tools.
Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding.
Partner with business, development and infrastructure teams to improve services through rigorous testing and release procedures.
Develop and maintain CI/CD pipelines to streamline deployment processes.
Expedient deployment of new systems. Capacity planning, Platform Management, and support for increasing volumes and business growth.
Create sustainable systems and services through automation.
Collaborate with Application team to establish and enforce production and development standards.
Document procedures, best practices and troubleshooting FAQs.
Resolve complex application and technical problems.
Debug systems and fix production-related issues.
Escalate and follow up on permanent fixes for development-related issues.
Lead incident response efforts and post-mortem analysis to prevent future occurrences.
Handle complex operational tasks and recommend process and technology changes.
Provide global support, including weekend availability, to troubleshoot production-related issues and perform checkouts.
Ability to work both independently and in groups in an energetic, diverse environment.
Participate in on-call rotations to ensure 24/7 system availability and support.
Support compliance and legal queries.
Qualifications:
Strong experience in Windows and Linux/Unix services.
Strong experience in scripting languages such as PowerShell, Python, and SQL.
Strong Knowledge of monitoring tools - Nagios, Splunk, OTEL, Datadog
Strong Knowledge of FIX protocol
Strong Domain skills - Must have working experience in Capital Markets across modules and instruments especially - CASH, ETS, Bonds, Options, Futures, Swaps products
Experience with BFSI (Banking, Financial Services, and Insurance) domain applications, with a proper understanding of the trade lifecycle.
Excellent communication, time management and project management skills.
Primary Location Full-Time Salary Range: $175,000 - $200,000
$175k-200k yearly Auto-Apply 60d+ ago
Site Reliability Engineer
Tata Consulting Services 4.3
Jacksonville, FL
Must Have Technical/Functional Skills
Site Reliability Engineers are expected to drive technology triage efforts to completion by assisting with restoration steps, identifying root causes, and working with different ITIL teams to improve environment stability.
Must have a background in server management (Windows/Linux), an understanding of distributed application flow, a working understanding of code (Java/.NET), scripting, database knowledge (SQL/Oracle), and the network transport layer.
Primary Skill: Splunk, Dynatrace, Linux/Windows Administration, Java, SQL, Autosys, Spring Boot, Kafka, Cockroach DB, Redis, JIRA, NetScout or related.
Secondary: Oracle, MS SQL, Sybase, Mainframe, and DB2; ITIL certification.
Experience: Minimum 8 years
Roles & Responsibilities
* Experience with Remedy, ServiceNow, and JIRA in creating, updating, and closing incident tickets
* Executing BladeLogic scripts to route traffic, recycle JVMs, and enable/disable MQs
* Routing traffic using the AIC tool
* Experience using SoapUI/Postman to test SOAP/REST APIs
* Proven experience in production support or a related role
* Experience supporting Java and Java web services-based applications with high transaction volumes
* Working knowledge of Splunk and Dynatrace tools to identify issues in production quickly
* Basic knowledge of Unix/Linux commands to log in to servers, fetch logs, copy/delete files, and run shell scripts
* Strong analytical and problem-solving skills
* Familiarity with incident and problem management processes
* Excellent communication skills and ability to understand customer-based requirements and expectations. Strong documentation skills. Highly effective at driving process improvement based on lessons learned analysis.
* Ability to work effectively in a team environment and independently
* Willingness to work in shifts and provide weekend support on a rotational basis
* Strong MuleSoft experience (4.x and up)
* Web Services (SOAP, REST, etc.), Windows Services experience
* Strong RDBMS experience (SQL Server, Oracle, DB2, etc.)
* Hands on or strong knowledge with Autosys job scheduling
Salary Range: $100,000-$130,000 a year
$100k-130k yearly 3d ago
Site Reliability Engineer, Lead - Consumer Lending Domain
Toyota Motor Company 4.8
Plano, TX
Who we are
Collaborative. Respectful. A place to dream and do. These are just a few words that describe what life is like at Toyota. As one of the world's most admired brands, Toyota is growing and leading the future of mobility through innovative, high-quality solutions designed to enhance lives and delight those we serve. We're looking for talented team members who want to Dream. Do. Grow. with us.
An important part of the Toyota family is Toyota Financial Services (TFS), the finance and insurance brand for Toyota and Lexus in North America. While TFS is a separate business entity, it is an essential part of this world-changing company- delivering on Toyota's vision to move people beyond what's possible. At TFS, you will help create best-in-class customer experience in an innovative, collaborative environment.
To save time applying, Toyota does not offer sponsorship of job applicants for employment-based visas or any other work authorization for this position at this time.
Who we're looking for
Toyota Financial Services is building out a new Site Reliability Engineering (SRE) team for application domains, and we are seeking senior SRE engineers to ensure the reliability, performance and availability of the applications within each domain. As a senior applications SRE engineer, you will work with development engineers, product owners, SRE Infrastructure, production engineers and Technology Operations Center personnel, with a primary focus on improving observability, automation, overall system health, reliability and uptime.
What you'll be doing
* Design, code, and maintain automation to streamline operations, reduce manual tasks, and improve system efficiency to enable a robust application environment.
* Work with observability engineers to enable actionable insights into application and infrastructure health and performance. Foster a collaborative team culture and support professional development.
* Ensure scalable, repeatable code deployments through CI/CD pipelines using GitHub & Harness, and repeatable infrastructure deployments through infrastructure as code (IaC) using Terraform.
* Build automation and operational runbooks primarily using Python scripting.
* Manage container orchestration platforms and related cloud-native services.
* Drive reliability improvements through Service Level Objectives (SLOs), error budgets and Service Level Agreements (SLAs) aligned with business goals.
* Design & implement observability improvements using Dynatrace & CloudWatch.
* Lead major incident response, coordinate with stakeholders on resolution, and drive problem management to prevent recurrence.
* Conduct blameless post-incident reviews and drive continuous improvement.
* Collaborate cross-functionally to embed SRE principles into application design and operations to meet reliability goals.
* Participate in architectural reviews, providing input on reliability and scalability.
* Mentor, guide & provide technical direction to colleagues & SREs on the team, including design decisions & tradeoffs.
What you bring
* Experience with DevOps tools like GitHub, Harness & Dynatrace.
* Experience building self-healing systems and automated remediation workflows.
* 10+ years of experience in Site Reliability Engineering, DevOps, or related field.
* Demonstrated experience in problem-solving, key SRE/DevOps concepts & tools with a proven track record of achieving high system reliability and performance.
* Strong experience with Terraform for AWS IaC.
* Proficient in scripting and automation with Python and familiar with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack).
* Deep knowledge of container orchestration (Kubernetes/EKS).
* Deep understanding of cloud platforms (e.g., AWS, GCP, Azure) and container orchestration technologies (e.g., Kubernetes).
* Effective communication skills, with the ability to convey complex technical concepts to diverse audiences.
Added bonus if you have
* AWS certifications (DevOps Engineer, Solutions Architect, etc.).
* Familiarity with GitOps, secrets management, and infrastructure monitoring best practices.
* Experience building self-healing systems and automated remediation workflows.
What we'll bring
During your interview process, our team can fill you in on all the details of our industry-leading benefits and career development opportunities. A few highlights include:
* A work environment built on teamwork, flexibility, and respect.
* Professional growth and development programs to help advance your career, as well as tuition reimbursement.
* Team Member Vehicle Purchase Discount
* Toyota Team Member Lease Vehicle Program (if applicable)
* Comprehensive health care and wellness plans for your entire family.
* Toyota 401(k) Savings Plan featuring a company match, as well as an annual retirement contribution from Toyota regardless of whether you contribute.
* Paid holidays and paid time off.
* Referral services related to prenatal services, adoption, childcare, schools, and more.
* Flexible spending accounts.
* Relocation assistance (if applicable).
Belonging at Toyota
Our success begins and ends with our people. We embrace all perspectives and value unique human experiences. Respect for all is our North Star. Toyota is proud to have 10+ different Business Partnering Groups across 100 different North American chapter locations that support team members' efforts to dream, do and grow without questioning that they belong.
Applicants for our positions are considered without regard to race, ethnicity, national origin, sex, sexual orientation, gender identity or expression, age, disability, religion, military or veteran status, or any other characteristics protected by law.
Have a question, need assistance with your application or do you require any special accommodations? Please send an email to *****************************.
$86k-116k yearly est. Auto-Apply 1d ago
Site Reliability Engineer, Lead - Banking and Commercial Lending Domain
Toyota Motor Company 4.8
Plano, TX jobs
Who we are
Collaborative. Respectful. A place to dream and do. These are just a few words that describe what life is like at Toyota. As one of the world's most admired brands, Toyota is growing and leading the future of mobility through innovative, high-quality solutions designed to enhance lives and delight those we serve. We're looking for talented team members who want to Dream. Do. Grow. with us.
An important part of the Toyota family is Toyota Financial Services (TFS), the finance and insurance brand for Toyota and Lexus in North America. While TFS is a separate business entity, it is an essential part of this world-changing company, delivering on Toyota's vision to move people beyond what's possible. At TFS, you will help create best-in-class customer experience in an innovative, collaborative environment.
To save time applying, Toyota does not offer sponsorship of job applicants for employment-based visas or any other work authorization for this position at this time.
Who we're looking for
Toyota Financial Services is building out a new Site Reliability Engineering (SRE) team for application domains, and we are seeking senior SRE engineers to ensure the reliability, performance and availability of the applications within each domain. As a senior applications SRE engineer, you will work with development engineers, product owners, SRE Infrastructure, production engineers and Technology Operations Center personnel, with a primary focus on improving observability, automation, overall system health, reliability and uptime.
What you'll be doing
* Design, code, and maintain automation to streamline operations, reduce manual tasks, and improve system efficiency to enable a robust application environment.
* Work with observability engineers to enable actionable insights into application and infrastructure health and performance. Foster a collaborative team culture and support professional development.
* Ensure scalable, repeatable code deployments through CI/CD pipelines using GitHub & Harness, and repeatable infrastructure deployments through infrastructure as code (IaC) using Terraform.
* Build automation and operational runbooks primarily using Python scripting.
* Manage container orchestration platforms and related cloud-native services.
* Drive reliability improvements through Service Level Objectives (SLOs), error budgets and Service Level Agreements (SLAs) aligned with business goals.
* Design & implement observability improvements using Dynatrace & CloudWatch.
* Lead major incident response, coordinate with stakeholders on resolution, and drive problem management to prevent recurrence.
* Conduct blameless post-incident reviews and drive continuous improvement.
* Collaborate cross-functionally to embed SRE principles into application design and operations to meet reliability goals.
* Participate in architectural reviews, providing input on reliability and scalability.
* Mentor, guide & provide technical direction to colleagues & SREs on the team, including design decisions & tradeoffs.
What you bring
* Experience with DevOps tools like GitHub, Harness & Dynatrace.
* Experience building self-healing systems and automated remediation workflows.
* 5-10 years of experience in Site Reliability Engineering, DevOps, or related field.
* Demonstrated experience in problem-solving, key SRE/DevOps concepts & tools with a proven track record of achieving high system reliability and performance.
* Strong experience with Terraform for AWS IaC.
* Proficient in scripting and automation with Python and familiar with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack).
* Deep knowledge of container orchestration (Kubernetes/EKS).
* Deep understanding of cloud platforms (e.g., AWS, GCP, Azure) and container orchestration technologies (e.g., Kubernetes).
* Effective communication skills, with the ability to convey complex technical concepts to diverse audiences.
Added bonus if you have
* AWS certifications (DevOps Engineer, Solutions Architect, etc.).
* Familiarity with GitOps, secrets management, and infrastructure monitoring best practices.
* Experience building self-healing systems and automated remediation workflows.
What we'll bring
During your interview process, our team can fill you in on all the details of our industry-leading benefits and career development opportunities. A few highlights include:
* A work environment built on teamwork, flexibility, and respect.
* Professional growth and development programs to help advance your career, as well as tuition reimbursement.
* Team Member Vehicle Purchase Discount
* Toyota Team Member Lease Vehicle Program (if applicable)
* Comprehensive health care and wellness plans for your entire family.
* Toyota 401(k) Savings Plan featuring a company match, as well as an annual retirement contribution from Toyota regardless of whether you contribute.
* Paid holidays and paid time off.
* Referral services related to prenatal services, adoption, childcare, schools, and more.
* Flexible spending accounts.
* Relocation assistance (if applicable).
Belonging at Toyota
Our success begins and ends with our people. We embrace all perspectives and value unique human experiences. Respect for all is our North Star. Toyota is proud to have 10+ different Business Partnering Groups across 100 different North American chapter locations that support team members' efforts to dream, do and grow without questioning that they belong.
Applicants for our positions are considered without regard to race, ethnicity, national origin, sex, sexual orientation, gender identity or expression, age, disability, religion, military or veteran status, or any other characteristics protected by law.
Have a question, need assistance with your application or do you require any special accommodations? Please send an email to *****************************.
$86k-116k yearly est. Auto-Apply 1d ago
Reliability Engineer (SRE OMS)
Tata Consulting Services 4.3
Marlborough, MA jobs
* SRE with a Sterling OMS skill set, adaptable to distributed systems, and able to develop automations with AI/GenAI tools, etc.
* An operations skill set and the right attitude to grow into a Reliability Engineer.
* Should be able to handle customer communication and coordination with the offshore team.
TCS Employee Benefits Summary:
* Discretionary Annual Incentive.
* Comprehensive Medical Coverage: Medical & Health, Dental & Vision, Disability Planning & Insurance, Pet Insurance Plans.
* Family Support: Maternal & Parental Leaves.
* Insurance Options: Auto & Home Insurance, Identity Theft Protection.
* Convenience & Professional Growth: Commuter Benefits & Certification & Training Reimbursement.
* Time Off: Vacation, Time Off, Sick Leave & Holidays.
* Legal & Financial Assistance: Legal Assistance, 401K Plan, Performance Bonus, College Fund, Student Loan Refinancing.
# LI-RJ2
Salary Range - $100,000-$120,000 a year
$100k-120k yearly 23d ago
Reliability Engineer
Tata Consulting Services 4.3
Marlborough, MA jobs
* SRE able to quickly write automations and self-healing scripts, and to understand and resolve errors from microservices on virtually any stack (full-stack capable).
* An operations skill set and the right attitude to grow into a Reliability Engineer.
* Should be able to handle customer communication and coordination with the offshore team.
TCS Employee Benefits Summary:
* Discretionary Annual Incentive.
* Comprehensive Medical Coverage: Medical & Health, Dental & Vision, Disability Planning & Insurance, Pet Insurance Plans.
* Family Support: Maternal & Parental Leaves.
* Insurance Options: Auto & Home Insurance, Identity Theft Protection.
* Convenience & Professional Growth: Commuter Benefits & Certification & Training Reimbursement.
* Time Off: Vacation, Time Off, Sick Leave & Holidays.
* Legal & Financial Assistance: Legal Assistance, 401K Plan, Performance Bonus, College Fund, Student Loan Refinancing.
# LI-RJ2
Salary Range - $100,000-$120,000 a year
$100k-120k yearly 23d ago
Site Reliability Engineer
Tata Consulting Services 4.3
Miami, FL jobs
Must-Have
* Strong development experience in .NET and Java frameworks.
* Proven leadership managing SRE and DevOps teams.
* Incident and problem management using ServiceNow.
* Expertise in observability: AppDynamics, PagerDuty, Grafana, Splunk.
* Deep understanding of CI/CD with Azure ADO, GitHub, Maven, Gradle.
* Automated regression and performance testing experience with Selenium, JMeter.
* Experience building self-healing systems.
* Strong skills in root cause analysis (RCA) and problem identification.
* Ability to define and enforce SLAs and response metrics.
* Document and maintain version-controlled knowledge repositories.
* Exposure to self-healing systems in an SRE or DevOps context.
Good-to-Have
* Certifications in AWS/GCP/Azure
Salary Range - $100,000-$120,000 a year
#LI-KR3
TCS Employee Benefits Summary:
Discretionary Annual Incentive.
Comprehensive Medical Coverage: Medical & Health, Dental & Vision, Disability Planning & Insurance, Pet Insurance Plans.
Family Support: Maternal & Parental Leaves.
Insurance Options: Auto & Home Insurance, Identity Theft Protection.
Convenience & Professional Growth: Commuter Benefits & Certification & Training Reimbursement.
Time Off: Vacation, Time Off, Sick Leave & Holidays.
Legal & Financial Assistance: Legal Assistance, 401K Plan, Performance Bonus, College Fund, Student Loan Refinancing.
Experience working in the travel/tourism industry
$100k-120k yearly 20d ago
Lead Reliability Engineer (AI Focus)
Mastercard 4.7
O'Fallon, MO jobs
Our Purpose
Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we're helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships and networks combine to deliver a unique set of products and services that help people, businesses and governments realize their greatest potential.
Title and Summary
Lead Reliability Engineer (AI Focus)
Overview
The Business Operations (BizOps) team is seeking a Lead Reliability Engineer to help establish an AI Community of Practice.
BizOps serves as the production readiness steward for Mastercard products. As part of Reliability Engineering, we ensure highly reliable service functionality by developing and maintaining service management strategies, tools, and service-level objectives to deliver zero-touch, resilient solutions for applications and infrastructure. We see the big picture, enforce operational standards, and foster an agile, learning culture.
The Process and Governance team within Payment Network BizOps is embarking on a journey to build an AI Community of Practice across the enterprise. AI will be implemented at scale to provide a foundational competitive advantage, enabling all internal processes, products, and services to continuously advance Mastercard's value proposition, consumer experience, and operational efficiency.
Responsibilities
The BizOps AI Service Management Reliability Engineer will focus on:
Applying data analysis and reporting to support innovation, maturity of practices, and regulatory/compliance risk assessment.
Analyzing IT Service Management (ITSM) activities and providing feedback to development teams on operational gaps and resiliency concerns.
Engaging in the full lifecycle of services, from design and deployment to operation and refinement.
Driving adoption of an industry-leading AI platform and building a secure, scalable, resilient ecosystem aligned with Mastercard standards.
Supporting platform integration with internal systems, tools, and data sources, including real-time and warehouse data.
Preparing services for launch through design consulting, capacity planning, and readiness reviews.
Maintaining live services by monitoring availability, latency, and overall system health.
Optimizing processes and implementing best practices for reliability and automation.
Collaborating across technical operations, product teams, data science, and quality engineering.
Ensuring AI solutions comply with data management and privacy standards.
Establishing governance, audit, and automation processes to accelerate build-to-deploy cycles.
Troubleshooting production events holistically across the technology stack to minimize recovery time.
Promoting a learning culture and advancing AI capabilities continuously.
Qualifications
Bachelor's degree in Information Systems, IT, Computer Science, Engineering, or equivalent experience.
Strong analytical skills and curiosity to identify root causes and solve upstream challenges.
Experience with AI platforms, data science frameworks, and integrating payment products and analytics solutions at scale.
Knowledge of AI development tools, methodologies, and languages.
Familiarity with data governance, security, privacy, and compliance in complex environments.
Ability to adapt quickly to emerging trends and tools in AI/ML.
Proven experience collaborating across development, operations, and product teams.
Background in production support and ITSM processes.
Equal Opportunity
Mastercard is an inclusive Equal Employment Opportunity employer. We consider applicants without regard to gender, gender identity, sexual orientation, race, ethnicity, disability, veteran status, or any other characteristic protected by law.
Mastercard is a merit-based, inclusive, equal opportunity employer that considers applicants without regard to gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law. We hire the most qualified candidate for the role. In the US or Canada, if you require accommodations or assistance to complete the online application process or during the recruitment process, please contact reasonable_accommodation@mastercard.com and identify the type of accommodation or assistance you are requesting. Do not include any medical or health information in this email. The Reasonable Accommodations team will respond to your email promptly.
Corporate Security Responsibility
All activities involving access to Mastercard assets, information, and networks come with an inherent risk to the organization and, therefore, it is expected that every person working for, or on behalf of, Mastercard is responsible for information security and must:
Abide by Mastercard's security policies and practices;
Ensure the confidentiality and integrity of the information being accessed;
Report any suspected information security violation or breach, and
Complete all periodic mandatory security trainings in accordance with Mastercard's guidelines.
In line with Mastercard's total compensation philosophy and assuming that the job will be performed in the US, the successful candidate will be offered a competitive base salary and may be eligible for an annual bonus or commissions depending on the role. The base salary offered may vary depending on multiple factors, including but not limited to location, job-related knowledge, skills, and experience. Mastercard benefits for full time (and certain part time) employees generally include: insurance (including medical, prescription drug, dental, vision, disability, life insurance); flexible spending account and health savings account; paid leaves (including 16 weeks of new parent leave and up to 20 days of bereavement leave); 80 hours of Paid Sick and Safe Time, 25 days of vacation time and 5 personal days, pro-rated based on date of hire; 10 annual paid U.S. observed holidays; 401k with a best-in-class company match; deferred compensation for eligible roles; fitness reimbursement or on-site fitness facilities; eligibility for tuition reimbursement; and many more. Mastercard benefits for interns generally include: 56 hours of Paid Sick and Safe Time; jury duty leave; and on-site fitness facilities in some locations.
Pay Ranges
O'Fallon, Missouri: $122,000 - $207,000 USD
$122k-207k yearly Auto-Apply 4d ago
Lead Reliability Engineer (AI Focus)
Mastercard 4.7
O'Fallon, MO jobs
Our Purpose
Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we're helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships and networks combine to deliver a unique set of products and services that help people, businesses and governments realize their greatest potential.
Title and Summary
Lead Reliability Engineer (AI Focus)
Overview
The Business Operations (BizOps) team is seeking a Lead Reliability Engineer to help establish an AI Community of Practice.
BizOps serves as the production readiness steward for Mastercard products. As part of Reliability Engineering, we ensure highly reliable service functionality by developing and maintaining service management strategies, tools, and service-level objectives to deliver zero-touch, resilient solutions for applications and infrastructure. We see the big picture, enforce operational standards, and foster an agile, learning culture.
The Process and Governance team within Payment Network BizOps is embarking on a journey to build an AI Community of Practice across the enterprise. AI will be implemented at scale to provide a foundational competitive advantage, enabling all internal processes, products, and services to continuously advance Mastercard's value proposition, consumer experience, and operational efficiency.
Responsibilities
The BizOps AI Service Management Reliability Engineer will focus on:
Applying data analysis and reporting to support innovation, maturity of practices, and regulatory/compliance risk assessment.
Analyzing IT Service Management (ITSM) activities and providing feedback to development teams on operational gaps and resiliency concerns.
Engaging in the full lifecycle of services, from design and deployment to operation and refinement.
Driving adoption of an industry-leading AI platform and building a secure, scalable, resilient ecosystem aligned with Mastercard standards.
Supporting platform integration with internal systems, tools, and data sources, including real-time and warehouse data.
Preparing services for launch through design consulting, capacity planning, and readiness reviews.
Maintaining live services by monitoring availability, latency, and overall system health.
Optimizing processes and implementing best practices for reliability and automation.
Collaborating across technical operations, product teams, data science, and quality engineering.
Ensuring AI solutions comply with data management and privacy standards.
Establishing governance, audit, and automation processes to accelerate build-to-deploy cycles.
Troubleshooting production events holistically across the technology stack to minimize recovery time.
Promoting a learning culture and advancing AI capabilities continuously.
Qualifications
Bachelor's degree in Information Systems, IT, Computer Science, Engineering, or equivalent experience.
Strong analytical skills and curiosity to identify root causes and solve upstream challenges.
Experience with AI platforms, data science frameworks, and integrating payment products and analytics solutions at scale.
Knowledge of AI development tools, methodologies, and languages.
Familiarity with data governance, security, privacy, and compliance in complex environments.
Ability to adapt quickly to emerging trends and tools in AI/ML.
Proven experience collaborating across development, operations, and product teams.
Background in production support and ITSM processes.
Equal Opportunity
Mastercard is an inclusive Equal Employment Opportunity employer. We consider applicants without regard to gender, gender identity, sexual orientation, race, ethnicity, disability, veteran status, or any other characteristic protected by law.
Mastercard is a merit-based, inclusive, equal opportunity employer that considers applicants without regard to gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law. We hire the most qualified candidate for the role. In the US or Canada, if you require accommodations or assistance to complete the online application process or during the recruitment process, please contact reasonable_accommodation@mastercard.com and identify the type of accommodation or assistance you are requesting. Do not include any medical or health information in this email. The Reasonable Accommodations team will respond to your email promptly.
Corporate Security Responsibility
All activities involving access to Mastercard assets, information, and networks come with an inherent risk to the organization and, therefore, it is expected that every person working for, or on behalf of, Mastercard is responsible for information security and must:
Abide by Mastercard's security policies and practices;
Ensure the confidentiality and integrity of the information being accessed;
Report any suspected information security violation or breach, and
Complete all periodic mandatory security trainings in accordance with Mastercard's guidelines.
In line with Mastercard's total compensation philosophy and assuming that the job will be performed in the US, the successful candidate will be offered a competitive base salary and may be eligible for an annual bonus or commissions depending on the role. The base salary offered may vary depending on multiple factors, including but not limited to location, job-related knowledge, skills, and experience. Mastercard benefits for full time (and certain part time) employees generally include: insurance (including medical, prescription drug, dental, vision, disability, life insurance); flexible spending account and health savings account; paid leaves (including 16 weeks of new parent leave and up to 20 days of bereavement leave); 80 hours of Paid Sick and Safe Time, 25 days of vacation time and 5 personal days, pro-rated based on date of hire; 10 annual paid U.S. observed holidays; 401k with a best-in-class company match; deferred compensation for eligible roles; fitness reimbursement or on-site fitness facilities; eligibility for tuition reimbursement; and many more. Mastercard benefits for interns generally include: 56 hours of Paid Sick and Safe Time; jury duty leave; and on-site fitness facilities in some locations.
Pay Ranges
O'Fallon, Missouri: $122,000 - $207,000 USD
$122k-207k yearly 9d ago
Network Reliability Engineer III
CME Group 4.4
Chicago, IL jobs
As we embark on a journey to transform the Network Services Group at CME, we are seeking a Network Reliability Engineer III to join our dynamic team. In this role, you will design, develop and maintain self-service tools and applications that enhance productivity and reduce operational costs. You will work across the full stack, both front end and back end, to architect microservices (GKE) in Google Cloud Platform (GCP), driving our infrastructure towards greater automation and reliability.
We are a global team across the US, UK, India and Singapore made up of a diverse range of people from varied backgrounds who each bring unique network experiences and skill sets. The relatively new Network Reliability/Automation team is responsible for building a suite of custom automation tools and developing our self-healing capabilities, while working closely with other members of the Network Services team in project delivery to ensure that one of the largest exchange network infrastructures in the world is highly available, resilient, secure and reliable.
Responsibilities
* Design, develop and maintain self-service and automation tools to streamline IT operations and reduce manual effort.
* Engage in full-stack development, delivering responsive front-end interfaces as well as robust scalable back-end services.
* With support, architect, deploy and scale microservices on GCP, with particular emphasis on containers and Google Kubernetes Engine (GKE).
* Manage cloud infrastructure via Infrastructure-as-Code (IaC), primarily using Terraform to provision and maintain resources.
* Operate and troubleshoot solutions on Linux-based platforms, leveraging Visual Studio Code (VSCode) as the primary development environment.
* Adhere to software engineering best practices, including PEP8 coding standards, SOLID design principles, and established SDLC processes.
* Implement and manage CI/CD pipelines with a DevOps mindset, ensuring rapid, reliable delivery of code.
* Develop and consume Flask-based RESTful APIs to support network and security automation.
* Collaborate within an Agile Scrum framework, utilizing tools such as Bitbucket and Jira to track progress and manage sprints.
* Apply strong analytical and problem-solving skills to balance multiple project variables and deliver high-quality solutions on schedule.
What we are looking for
* Approximately 2-3 years' hands-on Python programming experience, with a demonstrable track record of automation or tooling projects.
* Knowledge and experience working with both Python Django and Flask in a corporate environment.
* Any experience in network and security automation, coupled with an understanding of network fundamentals (routing, switching, firewalls, VPNs), would be beneficial.
* Experience developing REST APIs using Flask (or a comparable Python framework).
* Applicants with front-end experience using JavaScript/jQuery/HTML5/CSS would be ideal.
* Familiarity with Infrastructure-as-Code using Terraform (or similar) to manage cloud resources.
* Comfortable working in Linux environments and proficient in using Visual Studio Code (VSCode).
* Strong software engineering mindset: adherence to PEP8, SOLID principles, and best practices for SDLC, CI/CD and DevOps.
* Excellent communication skills, both verbal and written, with the ability to convey technical concepts to diverse stakeholders.
* Highly analytical, with the ability to troubleshoot complex issues and manage multiple tasks concurrently.
* Experience working in Agile Scrum teams, utilizing Bitbucket and Jira (or equivalent tools) for version control and project tracking.
Personal Attributes
* Proactive and positive attitude, taking initiative to identify and resolve issues ahead of time.
* Collaborative team player, eager to contribute knowledge and assist colleagues.
* Innovative thinker who brings fresh ideas and constructive suggestions for continuous improvement.
Education
Bachelor's Degree in Computer Science, Engineering or a related field is preferred. Equivalent practical experience will also be considered.
CME Group is committed to offering a competitive total rewards package for our employees that recognizes their contributions to the business and reflects our long-term investment in their future. The pay range for this role is $100,700-$167,800. Actual salary offered will be dependent on a wide array of factors including but not limited to: relevant experience, skills, education and comparison to internal employees (where relevant). Our compensation program also includes an annual target bonus opportunity for all employees, as well as the opportunity to become an owner in the company through our broad-based equity program. Through our benefits program, we strive to offer flexibility, value and choice. From comprehensive health coverage, to a retirement package that includes both a 401(k) and an active pension plan, to highly competitive education reimbursement provisions, paid time off and a mental health benefit, CME Group offers a holistic benefits package for our team and their dependents.
CME Group: Where Futures are Made
CME Group is the world's leading derivatives marketplace. But who we are goes deeper than that. Here, you can impact markets worldwide. Transform industries. And build a career by shaping tomorrow. We invest in your success and you own it - all while working alongside a team of leading experts who inspire you in ways big and small. Problem solvers, difference makers, trailblazers. Those are our people. And we're looking for more.
At CME Group, we embrace our employees' unique experiences and skills to ensure that everyone's perspectives are acknowledged and valued. As an equal-opportunity employer, we consider all potential employees without regard to any protected characteristic.
Important Notice: Recruitment fraud is on the rise, with scammers using misleading promises of job offers and interviews to solicit money and personal information from job seekers. CME Group adheres to established procedures designed to maintain trust, confidence and security throughout our recruitment process.
$100.7k-167.8k yearly
Java Site Reliability Engineer, Messaging Platforms
Pacific Investment Management Co 4.9
Austin, TX
We are a leading global asset management firm with over 3,000 employees across 20 offices in 15 countries; we help millions of investors around the world pursue their financial goals.
We hire critical thinkers. People who thrive in a collaborative culture like ours where we solve real problems while building the future of finance.
You
Are excited to be part of a vibrant engineering community that values diversity, hard work, and continuous learning.
Love solving complex real-world business problems.
Recognize that cross-functional collaboration is a core component of success for the team.
Believe there are multiple ways to solve most technical problems and are willing to debate the trade-offs.
Have become a stronger engineer by making mistakes and learning from them.
Are a doer, someone who wants to grow their career and gain experience across technologies and business functions.
We
Continuously invest in a high-performance and inclusive culture, in which a diversity of backgrounds, experiences and viewpoints are celebrated and valued.
Encourage career mobility, so you can benefit from learning different functions and technologies, and we gain the benefits of your experience across teams.
Run technology pro bono programs that help the non-profit community and give our engineering community opportunities to volunteer and participate.
Offer education reimbursements and ongoing training in technology, communication, and diversity & inclusion.
Embrace knowledge sharing through lunch-and-learns, demos, and technical forums.
Consider our people to be our greatest asset; we will help you learn what PIMCO Technology has to offer so you can participate in activities that benefit your career while delivering impactful technology solutions.
As a Java SRE in Trading Technology, you will:
Immediate needs:
Help support the messaging platforms in use (MQ, AMPS, Kafka, etc.).
Drive the firm's best use of these platforms, making sure every choice makes sense, the correct tool is used for each job, and we build a sustainable messaging strategy.
Improve the operational efficiency and reduce the operational risk of our messaging platforms through better tools, better design, and better monitoring.
In the future:
Help solve the new architectural and coding problems that will require an experienced engineer.
Work closely with the business and other teams to design and implement solutions that have immediate impact to the business and help us build towards our strategic vision across all our trade floor applications.
We need someone proficient in Java, passionate about SRE practices, and able to collaborate effectively with an infrastructure team. We expect you to have a strong passion for messaging systems, including their proper setup, monitoring, and maintenance. In addition, this role will involve software development for target platforms once the immediate needs related to messaging platforms are resolved.
You will work with a team consisting of 1 SRE and 1 Unix SA, with full support from the infrastructure and DevOps teams.
Position Requirements
Bachelor's degree in computer science or equivalent
Strong Linux skills (including configuration tools such as Chef, Puppet, and Ansible)
Strong experience with different messaging systems (Kafka, AMPS, MQ, FIX, etc.).
Strong engineering culture (unit tests, CI/CD)
Ability to work independently and in teams
Good communication skills
Working from the office in Austin 4 days a week.
PIMCO follows a total compensation approach when rewarding employees which includes a base salary and a discretionary bonus. Base salary is the fixed component of compensation that is determined by core job responsibilities, relevant experience, internal level, and market factors. The discretionary bonus is used to award performance and therefore is determined by company, business, team, and individual performance.
Salary Range: $175,000.00 - $240,000.00
Equal Employment Opportunity and Affirmative Action Statement
PIMCO recruits and hires qualified candidates without regard to race, national origin, ancestry, religion (including religious dress and grooming practices), sex (including pregnancy, childbirth, breastfeeding, or related medical conditions), sexual orientation, gender (including gender identity and expression), age, military or veteran status, disability (physical or mental), or any other factor prohibited by law, and as such affirms in policy and practice to support and promote the concept of equal employment opportunity and affirmative action, in accordance with all applicable federal, state, provincial and municipal laws. The company also prohibits discrimination on other bases, such as medical condition or marital status, under applicable laws.
Applicants with Disabilities
PIMCO is an Equal Employment Opportunity/Affirmative Action employer. We provide reasonable accommodation for qualified individuals with disabilities, including veterans, in job application procedures. If you have any difficulty using our online system due to a disability and you would like to request an accommodation, you may contact us at ************ and leave a message. This is a dedicated line designed exclusively to assist job seekers with disabilities to apply online. Only messages left for this purpose will be considered. A response to your request may take up to two business days.
$175k-240k yearly
Site Reliability Engineer II-2
Mastercard 4.7
Bogota, NJ
Our Purpose
Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we're helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships and networks combine to deliver a unique set of products and services that help people, businesses and governments realize their greatest potential.
Title and Summary
Site Reliability Engineer II-2
Overview
The GBSC EPMS team is looking for a Site Reliability Engineer who can help us solve problems, implement automation, and leverage best practices.
* Are you a born problem solver who loves to figure out how something works?
* Are you a detail-oriented individual who enjoys complex problem solving?
* Do you love determining the correct actions required to fix a problem?
* Do you have a low tolerance for manual work and look to automate everything you can?
Business Operations is leading the Site Reliability Engineering (SRE) transformation at Mastercard through our tooling and by being an advocate for change & standards throughout the development, quality, release, and product organizations. We need team members with an appetite for change and pushing the boundaries of what can be done with automation. Experience in working across development, operations, and product teams to prioritize needs and to build relationships is a must.
Responsibilities
* Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement.
* Analyze ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns.
* Support services before they go live through activities such as system design consulting, capacity planning and launch reviews.
* Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
* Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
* Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating, and lead Mastercard in DevOps automation and best practices.
* Practice sustainable incident response and blameless postmortems.
* Take a holistic approach to problem solving by connecting the dots during a production event across the various technology stacks that make up the platform, to optimize mean time to recover (MTTR).
* Work with a global team spread across tech hubs in multiple geographies and time zones.
* Share knowledge and mentor junior resources.
All About You
* BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent practical experience.
* Experience with algorithms, data structures, scripting, pipeline management, software design and OLAP systems.
* Hands-on experience understanding custom objects built with JavaScript, HTML5, and CSS, and with API integrations.
* Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
* Ability to help debug and optimize code and automate routine tasks.
* We support many different stakeholders. Experience in dealing with difficult situations and making decisions with a sense of urgency is needed.
* Experience in one or more of the following is preferred: C, C++, Java, Python, Go, Perl, Ruby, MDX.
* Interest in designing, analyzing and troubleshooting large-scale distributed systems.
Corporate Security Responsibility
All activities involving access to Mastercard assets, information, and networks come with an inherent risk to the organization and, therefore, it is expected that every person working for, or on behalf of, Mastercard is responsible for information security and must:
* Abide by Mastercard's security policies and practices;
* Ensure the confidentiality and integrity of the information being accessed;
* Report any suspected information security violation or breach, and
* Complete all periodic mandatory security trainings in accordance with Mastercard's guidelines.