Reliability Engineer jobs at Mastercard - 472 jobs
Site Reliability Engineer
The Voleon Group 4.1
Berkeley, CA jobs
Voleon is a technology company that applies state‑of‑the‑art AI and machine learning techniques to real‑world problems in finance. For nearly two decades, we have led our industry and worked at the frontier of applying AI/ML to investment management. We have become a multibillion‑dollar asset manager, and we have ambitious goals for the future.
Your colleagues will include internationally recognized experts in artificial intelligence and machine learning research as well as highly experienced finance and technology professionals. The people who shape our company come from other backgrounds, including concert music performances, humanitarian aid, opera singing, sports writing, and BMX racing. You will be part of a team that loves to succeed together.
In addition to our enriching and collegial working environment, we offer highly competitive compensation and benefits packages, technology talks by our experts, a beautiful modern office, daily catered lunches, and more.
As a Site Reliability Engineer (SRE), you will work at the intersection of production operations and software development as you improve, manage, and monitor production‑critical infrastructure and data pipelines. At Voleon, many SREs serve together on a Production Operations team tasked with improving shared production infrastructure. Others are embedded with teams of software engineers to improve specific production systems owned by those teams. Voleon SREs work on important real‑world problems and collaborate with passionate and talented colleagues in an empowering, results‑driven environment. This role is a way to make a real difference: your contributions will make our critical systems more reliable, lower operational risk, and increase the efficiency of our engineering effort.
Responsibilities
Improve fault‑tolerance and maintainability of code in proprietary data pipelines and trading systems
Diagnose and fix bugs in code
Lead complex deployments
Automate manual workflows
Track and prioritize outstanding production‑related issues
Share an on‑call rotation responding to incidents to ensure the continuous operation of production‑critical systems
Requirements
Experience with coding and debugging Python
Experience with Linux
Familiarity with Relational Databases & SQL
Sharp analytical and problem‑solving skills and a persistent drive to make things work (better)
Strong growth mindset and a passion for learning
Strong technical communication skills
Attention to detail
2 years of relevant industry experience
An undergraduate degree or comparable training in a quantitative field, or equivalent relevant industry experience
Preferred Qualifications
Familiarity with best practices concerning code maintainability, documentation, quality assurance, continuous integration and deployment
Experience supporting production systems
Experience with any of the following: gRPC microservices, Postgres, Pandas, Golang, R, Git, Jenkins, Bazel, Prometheus, Grafana, Airflow, Kubernetes
The base salary for this position is $120,000 to $160,000 in the location(s) of this posting. Individual salaries are determined through a variety of factors, including, but not limited to, education, experience, knowledge, skills, and geography. Base salary does not include other forms of total compensation such as bonus compensation and other benefits. Our benefits package includes medical, dental and vision coverage, life and AD&D insurance, 20 days of paid time off, 9 sick days, and a 401(k) plan with a company match.
Friends of Voleon Candidate Referral Program
If you have a great candidate in mind for this role and would like to have the potential to earn $7,500 - $15,000 if your referred candidate is successfully hired and employed by The Voleon Group, please use this to submit your referral. For more details regarding eligibility, terms and conditions please make sure to review the Voleon Referral Bonus Program.
Equal Opportunity Employer
The Voleon Group is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.
$120k-160k yearly 3d ago
Senior AI Site Reliability Engineer
Charles Schwab Corporation 4.8
San Francisco, CA jobs
Your Opportunity
At Schwab, you will build a rewarding career while making a difference in the lives of our millions of clients. Here, innovative thinking meets creative problem solving as we work together to challenge the status quo. We believe in the power of collaboration and value being together in the office, which is why this role is based on-site in our San Francisco office. Joining Schwab means joining a company committed to transforming the financial industry and putting clients at the center of everything we do.
Schwab's AI Strategy & Transformation team, known as AI.x, is the central hub for Artificial Intelligence at Schwab. We are an integrated product, engineering, strategy and risk team, all based in San Francisco. We help set the enterprise vision for AI, invest in the most promising opportunities, and accelerate delivery across the company. We also build the core platform that powers AI at scale and explore next‑generation GenAI efforts that will redefine how we serve our clients. As a Senior AI Site Reliability Engineer on AI.x, you will play a key role in ensuring our AI solutions are reliable, scalable, and resilient, enabling us to deliver innovative experiences to millions of clients.
This role is more than a reliability engineering position. It is an opportunity to join a high‑profile team shaping Schwab's future with AI, to build and maintain solutions that matter to millions of clients, and to grow your career in one of the most exciting areas of technology today.
As a Senior AI Site Reliability Engineer, you will design, implement, and manage the reliability and operational excellence of GenAI applications and platforms. You will work closely with architects, engineers, and business leaders to align reliability practices with Schwab's enterprise strategy. You will mentor and coach junior engineers, helping to build strong operational practices and foster a culture of continuous improvement. You will lead by example in solving complex reliability challenges, advancing SRE standards, and driving rapid iteration from concept to production. Above all, you will bring curiosity, creativity, and technical depth to help shape the next generation of reliable AI at Schwab.
What you have
Required Qualifications
8+ years of software development or reliability engineering experience, with 4+ years as a hands‑on senior engineer in startups and/or large organizations.
Bachelor's degree in Computer Science or related field.
5+ years of experience building and operating complex products from scratch and running them in production.
3+ years of experience supporting applications that use Artificial Intelligence (AI) models to deliver real business impact.
3+ years of experience building and maintaining data pipelines and infrastructure for large datasets.
3+ years of experience with containers and cloud‑native applications, and the ability to operationalize them in the public cloud with infrastructure as code.
Experience implementing monitoring, alerting, and incident response for large‑scale distributed systems.
Proven track record in driving reliability, scalability, and performance improvements for production AI systems.
Preferred Qualifications
Strong computer science fundamentals and experience working across different parts of the tech stack.
Experience working with proprietary or open‑source LLMs (Gemini, Claude, OpenAI or other models) and supporting LLM‑powered applications in production.
Focus on quality and reliability in everything you do. Continue to raise the bar and drive others to deliver high‑quality, resilient products, with experience writing tests and implementing automated reliability checks.
Experience writing and running evaluations to ensure quality and monitor consistency in LLM‑generated responses and actions.
Strong communication skills - you balance written and verbal communication to clearly share your perspective with others on the team.
Experience mentoring junior engineers and helping them grow their technical and operational skills through clear feedback and code reviews.
Demonstrated mindset of continuous learning and improvement.
Ability to solve complex problems with ambiguous or incomplete data in highly distributed systems.
Demonstrated business domain knowledge related to all products you have worked on.
Curiosity about new technologies and processes - you always seek to improve yourself and everyone around you and proactively seek and share knowledge with others on your team.
Experience with Python and front‑end development preferred but not required.
Master's or advanced degrees in Computer Science or related fields.
In addition to the salary range, this role is also eligible for bonus or incentive opportunities.
$118k-152k yearly est. 4d ago
Senior AI SRE: Scale GenAI Reliability & Impact
Charles Schwab Corporation 4.8
San Francisco, CA jobs
A leading financial services firm is seeking a Senior AI Site Reliability Engineer responsible for designing and managing the reliability of AI-driven applications. In this role, you'll work on innovative projects and mentor junior engineers while collaborating with cross-functional teams. Candidates should have extensive experience in software development and reliability engineering, with a particular focus on AI systems. This on-site position is located in San Francisco and offers opportunities for professional growth and development.
$118k-152k yearly est. 4d ago
Process Improvement Specialist/Concrete Industry
DZ Corporation 4.3
The Villages, FL jobs
Reports To:
Operations Manager
The Process Improvement Specialist is responsible for optimizing production processes within the precast concrete facility. This role focuses on identifying inefficiencies, implementing process enhancements, and supporting quality and safety improvements across manufacturing operations. Working closely with production teams, engineers, and supervisors, the specialist helps streamline workflows, reduce waste, and ensure consistent product quality.
Key Responsibilities:
Process Analysis & Optimization:
Observe and analyze daily production activities (casting, curing, reinforcement, finishing, etc.) to identify bottlenecks and improvement opportunities.
Data Collection & Reporting:
Gather and track production data such as cycle times, material usage, downtime, and defect rates to support improvement projects.
Continuous Improvement Projects:
Assist in implementing Lean, 5S, or Six Sigma initiatives to improve plant efficiency, reduce waste, and enhance workplace organization.
Standard Work & Documentation:
Help develop and update standard operating procedures (SOPs), work instructions, and visual management tools.
Quality & Safety Support:
Collaborate with Quality Control and Safety teams to ensure process changes meet safety standards and product specifications.
Technical Support:
Support the introduction of new molds, equipment, or materials by conducting process trials and documenting results.
Collaboration:
Partner with maintenance, engineering, and production supervisors to troubleshoot recurring process issues.
Qualifications:
Education:
Associate's degree or technical diploma in Manufacturing Technology, Industrial Engineering, or related field.
Equivalent experience in precast concrete production or process improvement will be considered.
Experience:
2+ years in a manufacturing or precast concrete environment.
Familiarity with Lean Manufacturing, 6S, or Continuous Improvement principles.
Skills:
Strong mechanical aptitude and understanding of production equipment.
Ability to collect and interpret process data (cycle times, scrap, yield, etc.).
Proficiency in Microsoft Office and basic data entry tools.
Good communication and problem-solving skills.
Team-oriented and hands-on approach.
Preferred Qualifications:
Experience with precast or concrete manufacturing processes (casting, curing, form setup, reinforcement, finishing).
Knowledge of quality systems such as NPCA or PCI standards.
Basic CAD or technical drawing reading ability.
Certification in Lean or Six Sigma or willingness to acquire.
Performance Indicators:
Reduction in process waste or rework rates.
Increased production throughput and efficiency.
Improved safety compliance and incident reduction.
Consistency in meeting product quality standards.
Implementation and sustainability of improvement projects.
$68k-100k yearly est. 3d ago
Senior Quality Engineer
Affinipay, LLC 3.9
Austin, TX jobs
About the role:
We are seeking an experienced Senior Quality Engineer to join our Quality Engineering team at 8am. In this role, you will serve as a customer advocate and quality champion for your embedded scrum team, ensuring the functionality, reliability, performance, and user experience of our FinTech/Back-End SaaS platform through comprehensive testing strategies and collaborative engineering practices.
At 8am, Quality is not a phase; it's a mindset woven into every conversation, decision, and line of code. As a senior member of the team, you'll leverage your deep testing expertise to drive quality improvements that impact not just your immediate team but the broader engineering organization. You'll mentor junior engineers, contribute to process improvements, and take ownership of quality initiatives that protect and enhance our customers' experience.
About us:
At 8am, our vision is to power a world where professionals thrive. We start every day on a mission to empower professionals with the most trusted, innovative technology to deliver world-class outcomes for their clients and exceptional financial results for their business. They count on our purpose-built solutions to simplify operations, ensure compliance, and fuel profitable growth, so they can focus on their clients and do more of the work that matters.
Founded in 2005, 8am (formerly AffiniPay) is the professional business platform built to help legal, accounting, and other client-focused professionals run stronger, more profitable businesses. Today, more than 250,000 professionals across the U.S. trust 8am to help them work smarter, serve clients better, and unlock their full potential. We have been recognized as one of Inc 5000's fastest growing companies in the U.S. for 13 years in a row, and as a result, our teams continue to grow as well!
What you'll do:
Customer Advocacy & Technical Excellence
Champion the customer experience at every stage of development, viewing deliverables through the customer's lens
Design and execute comprehensive test strategies across UI, API, and database layers, ensuring they reflect real customer scenarios
Speak up when features aren't complete enough to test or when quality concerns arise, even if it means slowing down
Challenge requirements that don't serve customer needs and identify issues that could frustrate users before they reach production
Write high-quality test cases that tell a story about customer interactions and maintain them through their entire lifecycle
Identify and document defects with comprehensive root cause analysis, always including "how this affects the customer" assessments
Drive continuous improvement in our Defect Leakage Rate (≤15% target), recognizing that every bug reaching production erodes customer trust
Leadership & Execution
Complete 90%+ of assigned tasks independently while providing guidance and mentorship to junior engineers
Own quality initiatives and drive measurable improvements to QE processes and standards
Create and lead test plan reviews using test management tools like TestRail or Qase, ensuring adherence to documented standards
Engage with Product Owners and Product Management on feature feasibility, testability, and requirement clarification
Become a Subject Matter Expert (SME) for specific technical areas or product components
Track and report on Quality KPIs, with particular focus on Defect Leakage Rate, our primary customer-focused metric that measures bugs caught by QE versus those found by customers
Collaborate with the SDET team and developers to determine which tests are high value for automation, ensure that test automation operates as expected, and own automated test execution
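To illustrate the Defect Leakage Rate metric named in this list, here is a minimal sketch. It assumes the rate is the share of all known defects that were found by customers rather than by QE. The posting does not state the exact formula, so the function name, the formula, and the sample counts below are hypothetical.

```python
def defect_leakage_rate(found_by_qe: int, found_by_customers: int) -> float:
    """Return the percentage of all known defects that reached customers.

    Assumed formula (not stated in the posting):
        leaked / (caught by QE + leaked) * 100
    """
    total = found_by_qe + found_by_customers
    if total == 0:
        return 0.0  # no defects recorded, so nothing leaked
    return 100.0 * found_by_customers / total


# Hypothetical sample counts for one release cycle.
rate = defect_leakage_rate(found_by_qe=46, found_by_customers=4)
meets_target = rate <= 15.0  # the posting's stated target is <=15%
print(f"Defect Leakage Rate: {rate:.1f}% (meets target: {meets_target})")
```

Under this assumed definition, a lower value means QE caught a larger share of defects before release.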
Technical Proficiency
Demonstrate proficient to advanced networking and Linux skills, utilizing command-line tools across common distributions
Use Python scripting to assist with test automation tasks as necessary (a dedicated SDET team owns the framework and most automation tasks); willingness to learn basic Python scripting if you have not used it before
Validate services deployed on AWS and other cloud platforms, managing test environments and cloud resources efficiently
Basic understanding of security and performance testing
Collaboration & Communication
Work effectively within your scrum team, and across teams when user features span them, contributing meaningfully to technical discussions
Translate technical issues into customer experience impacts for both technical and non-technical stakeholders
Provide timely, helpful feedback to peers and junior engineers through code reviews and test case audits
Actively contribute to team knowledge sharing through documentation and presentations
Participate in all Agile ceremonies with a focus on quality advocacy
About you:
3-7 years of experience in software quality engineering, preferably in SaaS environments
Strong demonstrated proficiency in Linux distributions and CLI-based testing, including log file analysis, and other troubleshooting tasks
Experience with AWS or other major cloud platforms
Basic Python/Shell (or similar) scripting knowledge with ability to edit existing scripts and create new automation
Advanced skills with UI, API, and SQL testing methodologies
Familiarity with test management tools such as TestRail or Qase
Demonstrated experience leveraging Version Control Systems with a focus on GitHub and Bitbucket
Experience with testing tools: Jira, Sentry, DataDog
Understanding of containerization, Kubernetes, and CI/CD pipelines (Jenkins, CircleCI)
Strong understanding of Agile/Scrum methodologies
Proven track record of mentoring junior engineers and contributing to process improvements
Excellent analytical and problem-solving abilities
Strong communication skills with ability to present to both technical and non-technical stakeholders
Most importantly: The courage to be vocal about quality concerns and testing impediments
Demonstrated experience leveraging AI tools and technologies to improve workflows, enhance decision-making, or drive innovation.
Additional Information
The annual salary range for this position is $100,000 to $160,000. The salary range for performing this role outside of the US / Austin / California may differ. 8am is committed to offering competitive, fair and commensurate compensation and has provided an estimated pay range for this role. Actual compensation may vary based on job-related knowledge, skills, experience and education.
Why 8am:
At 8am, our culture is shaped by the people who bring it to life every day. Together, we build a company rooted in continuous learning, genuine community, holistic wellness, and meaningful engagement: values that empower us as individuals and unite us as a team. Our culture is grounded in our core values: Work Smart, Win Fast; Outshine Ordinary; and We Find a Way. These values drive how we serve our customers and work with each other in a collaborative, inspiring, and empowering environment, every day.
Here's how we support our 8Team:
Health Insurance Coverage: We offer our 8Team a variety of medical, dental, and vision plans, designed to fit your needs, including a 100% company-paid HDHP plan for employees.
Financial perks: We offer a competitive compensation and benefits package including annual bonuses, equity options and 401(k) or RRSP if in Canada, with a company match for all team members.
Time for what matters: Flexible Time Off, paid holidays, and a parental leave program for our new parents.
Wellness: Wellness stipends, mental health support, and one-on-one nutrition coaching.
Learning and Development: Continuous learning through 8am.edu, leadership programs, professional development funds, and individually focused talent development.
Giving back to the communities around us: Participate in our charitable matching gift program, paid time off for volunteer service, and company-sponsored volunteer events (both local and virtual).
Engagement: Virtual and in-person team-building events, quarterly award recognition through our Rise & Shine Award of Excellence Program, and our peer-to-peer appreciation platform.
At 8am, we don't just offer benefits - we create an environment where people can thrive, grow, and make a real impact every day.
Diversity, equity & inclusion at 8am:
At 8am, we recognize that innovation occurs with a strong team of people who are diverse in background, personality, talent and ideas. Experience comes in many forms and ensuring a diverse and inclusive workplace where we continue to learn from each other is an integral part of our culture. We are committed to creating a welcoming and transparent environment for all that embraces those differences through education, equal access to opportunities and information, inclusionary programs, and community outreach.
Security advisory:
Our hiring teams at 8am are dedicated to recruiting top talent that share our passion for serving the professional services industry through innovative financial technology. As such, our Talent Acquisition Team only follows legitimate hiring practices. We will always communicate with our candidates using emails with the 8am domain and will never ask for sensitive/personal data during the application process. All interviews take place over phone call, Zoom/Google Meet or in person. All offers are communicated verbally by our Talent Acquisition Specialists with a written offer letter as a follow up.
$100k-160k yearly 7d ago
Senior Quality Engineer | Mobile/UI
Affinipay, LLC 3.9
Austin, TX jobs
About the role:
We are seeking an experienced Senior Quality Engineer with a strong focus on mobile testing to join our Quality Engineering team at 8am. In this role, you will serve as a customer advocate and quality champion for your embedded scrum team, ensuring the functionality, reliability, performance, and user experience of our SaaS platform across mobile and web interfaces through comprehensive testing strategies and collaborative engineering practices.
At 8am, Quality is not a phase; it's a mindset woven into every conversation, decision, and line of code. As a senior member of the team, you'll leverage your deep mobile testing expertise to drive quality improvements that impact not just your immediate team but the broader engineering organization. You'll mentor junior engineers, contribute to process improvements, and take ownership of quality initiatives that protect and enhance our customers' experience - whether they're accessing our platform from an iPhone, Android device, iPad, or desktop browser.
About us:
At 8am, our vision is to power a world where professionals thrive. We start every day on a mission to empower professionals with the most trusted, innovative technology to deliver world-class outcomes for their clients and exceptional financial results for their business. They count on our purpose-built solutions to simplify operations, ensure compliance, and fuel profitable growth, so they can focus on their clients and do more of the work that matters.
Founded in 2005, 8am (formerly AffiniPay) is the professional business platform built to help legal, accounting, and other client-focused professionals run stronger, more profitable businesses. Today, more than 250,000 professionals across the U.S. trust 8am to help them work smarter, serve clients better, and unlock their full potential. We have been recognized as one of Inc 5000's fastest growing companies in the U.S. for 13 years in a row, and as a result, our teams continue to grow as well!
What you'll do:
Customer Advocacy & Technical Excellence
Champion customer experience; speak up when features aren't ready or quality concerns arise
Design and execute comprehensive test strategies across iOS (native + Safari), Android (native + Chrome), and desktop browsers
Validate responsive design, touch interactions, gestures, orientation changes, and varying network conditions
Write high-quality test cases reflecting real-world scenarios; document defects with root cause analysis
Drive continuous improvement toward ≤15% Defect Leakage Rate
Cross-Platform & Browser Testing
Ensure seamless experiences across platform transitions (mobile ↔ desktop)
Validate across macOS, Windows, and major browsers (Chrome, Safari, Firefox, Edge)
Maintain a comprehensive device/browser/OS testing matrix based on user analytics
Leadership & Execution
Complete 90%+ of tasks independently; mentor junior engineers on mobile testing best practices
Own quality initiatives and drive measurable improvements to QE standards
Become SME for mobile testing, device fragmentation, or specific product components
Collaborate with SDET team on automation priorities (Appium, XCUITest, Espresso)
Track and report Quality KPIs with focus on Defect Leakage Rate
About you:
3-7 years of experience in software quality engineering, preferably in SaaS environments
Strong demonstrated experience in mobile application testing across iOS and Android platforms, including both native apps and mobile web browsers
Experience testing on physical iOS devices (iPhones, iPads) and Android devices across multiple manufacturers
Familiarity with the major app store submission processes and general guidelines
Proficiency testing across major desktop environments (macOS, Windows) and browsers (Chrome, Safari, Firefox, Edge)
Strong demonstrated proficiency in Linux distributions and CLI-based testing, including log file analysis and other troubleshooting tasks
Experience with AWS or other major cloud platforms
Basic Python/Shell (or similar) scripting knowledge with ability to edit existing scripts and create new automation
Advanced skills with UI, API, and SQL testing methodologies
Familiarity with test management tools such as TestRail; experience with Qase is a plus
Demonstrated experience leveraging Version Control Systems with a focus on GitHub
Experience with testing tools: Jira, Sentry, DataDog
Strong understanding of Agile/Scrum methodologies
Proven track record of mentoring junior engineers and contributing to process improvements
Excellent analytical and problem-solving abilities
Strong communication skills with ability to present to both technical and non-technical stakeholders
Demonstrated experience leveraging AI tools and technologies to improve workflows, enhance decision-making, or drive innovation.
Nice to have:
Experience with mobile test automation frameworks such as Appium, XCUITest (iOS), or Espresso (Android)
Experience with BrowserStack or similar cloud-based device testing platforms for cross-platform compatibility testing
Experience testing SaaS products in regulated industries (such as PCI-compliant environments)
Basic understanding of containerization, Kubernetes, and CI/CD pipelines (Jenkins, CircleCI)
Knowledge of basic non-functional testing (security, performance, accessibility) with emphasis on mobile-specific concerns
Experience with microservices architectures and distributed systems
Understanding of responsive design principles and mobile UX best practices
Certifications such as ISTQB or CSTE
Familiarity with AI-assisted testing tools and leveraging LLMs as a productivity-boosting tool
Experience evaluating and implementing new QE tools and processes
Additional Information
The annual salary range for this position is $120,000 to $150,000. The salary range for performing this role outside of the US / Austin / California may differ. 8am is committed to offering competitive, fair and commensurate compensation and has provided an estimated pay range for this role. Actual compensation may vary based on job-related knowledge, skills, experience and education.
Why 8am:
At 8am, our culture is shaped by the people who bring it to life every day. Together, we build a company rooted in continuous learning, genuine community, holistic wellness, and meaningful engagement: values that empower us as individuals and unite us as a team. Our culture is grounded in our core values: Work Smart, Win Fast; Outshine Ordinary; and We Find a Way. These values drive how we serve our customers and work with each other in a collaborative, inspiring, and empowering environment, every day.
Here's how we support our 8Team:
Health Insurance Coverage: We offer our 8Team a variety of medical, dental, and vision plans, designed to fit your needs, including a 100% company-paid HDHP plan for employees.
Financial perks: We offer a competitive compensation and benefits package including annual bonuses, equity options and 401(k) or RRSP if in Canada, with a company match for all team members.
Time for what matters: Flexible Time Off, paid holidays, and a parental leave program for our new parents.
Wellness: Wellness stipends, mental health support, and one-on-one nutrition coaching.
Learning and Development: Continuous learning through 8am.edu, leadership programs, professional development funds, and individually focused talent development.
Giving back to the communities around us: Participate in our charitable matching gift program, paid time off for volunteer service, and company-sponsored volunteer events (both local and virtual).
Engagement: Virtual and in-person team-building events, quarterly award recognition through our Rise & Shine Award of Excellence Program, and our peer-to-peer appreciation platform.
At 8am, we don't just offer benefits - we create an environment where people can thrive, grow, and make a real impact every day.
Diversity, equity & inclusion at 8am:
At 8am, we recognize that innovation occurs with a strong team of people who are diverse in background, personality, talent and ideas. Experience comes in many forms and ensuring a diverse and inclusive workplace where we continue to learn from each other is an integral part of our culture. We are committed to creating a welcoming and transparent environment for all that embraces those differences through education, equal access to opportunities and information, inclusionary programs, and community outreach.
Security advisory:
Our hiring teams at 8am are dedicated to recruiting top talent that share our passion for serving the professional services industry through innovative financial technology. As such, our Talent Acquisition Team only follows legitimate hiring practices. We will always communicate with our candidates using emails with the 8am domain and will never ask for sensitive/personal data during the application process. All interviews take place over phone call, Zoom/Google Meet or in person. All offers are communicated verbally by our Talent Acquisition Specialists with a written offer letter as a follow up.
$120k-150k yearly 7d ago
Senior ML Engineer: Production Pipelines & HPC Expert
Capital One 4.7
McLean, VA jobs
A leading financial services company in Virginia seeks an experienced professional to design and build data-intensive solutions. The role requires expertise in C, C++, Python, Scala, and machine learning, along with the ability to lead teams and communicate complex concepts effectively. Candidates should possess a Bachelor's and preferably a Master's degree, with a proven track record in production-ready data pipelines and ML lifecycle. Competitive compensation and comprehensive benefits are offered.
$90k-111k yearly est. 5d ago
Site Reliability Engineer
The Voleon Group 4.1
Remote
Voleon is a technology company that applies state-of-the-art AI and machine learning techniques to real-world problems in finance. For nearly two decades, we have led our industry and worked at the frontier of applying AI/ML to investment management. We have become a multibillion-dollar asset manager, and we have ambitious goals for the future. Your colleagues will include internationally recognized experts in artificial intelligence and machine learning research as well as highly experienced finance and technology professionals. The people who shape our company come from other backgrounds, including concert music performances, humanitarian aid, opera singing, sports writing, and BMX racing. You will be part of a team that loves to succeed together.
In addition to our enriching and collegial working environment, we offer highly competitive compensation and benefits packages, technology talks by our experts, a beautiful modern office, daily catered lunches, and more.
As a Site Reliability Engineer (SRE), you will work at the intersection of production operations and software development as you improve, manage, and monitor production-critical infrastructure and data pipelines. At Voleon, many SREs serve together on a Production Operations team tasked with improving shared production infrastructure. Others are embedded with teams of software engineers to improve specific production systems owned by those teams. Voleon SREs work on important real-world problems and collaborate with passionate and talented colleagues in an empowering, results-driven environment. This role is a way to make a real difference: your contributions will make our critical systems more reliable, lower operational risk, and increase the efficiency of our engineering effort.
Responsibilities
Improve fault-tolerance and maintainability of code in proprietary data pipelines and trading systems
Diagnose and fix bugs in code
Lead complex deployments
Automate manual workflows
Track and prioritize outstanding production-related issues
Share an on-call rotation responding to incidents to ensure the continuous operation of production-critical systems
Requirements
Experience with coding and debugging Python
Experience with Linux
Familiarity with Relational Databases & SQL
Sharp analytical and problem-solving skills and a persistent drive to make things work (better)
Strong growth mindset and a passion for learning
Strong technical communication skills
Attention to detail
2 years of relevant industry experience
An undergraduate degree or comparable training in a quantitative field or equivalent, relevant industry experience
Preferred Qualifications
Familiarity with best practices concerning code maintainability, documentation, quality assurance, continuous integration and deployment
Experience supporting production systems
Experience with any of the following: gRPC microservices, Postgres, Pandas, Golang, R, Git, Jenkins, Bazel, Prometheus, Grafana, Airflow, Kubernetes
The base salary for this position is $120,000 to $160,000 in the location(s) of this posting. Individual salaries are determined through a variety of factors, including, but not limited to, education, experience, knowledge, skills, and geography. Base salary does not include other forms of total compensation such as bonus compensation and other benefits. Our benefits package includes medical, dental and vision coverage, life and AD&D insurance, 20 days of paid time off, 9 sick days, and a 401(k) plan with a company match.
“Friends of Voleon” Candidate Referral Program
If you have a great candidate in mind for this role and would like to have the potential to earn $7,500 - $15,000 if your referred candidate is successfully hired and employed by The Voleon Group, please use this form to submit your referral. For more details regarding eligibility, terms and conditions please make sure to review the Voleon Referral Bonus Program.
Equal Opportunity Employer
The Voleon Group is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.
$120k-160k yearly Auto-Apply 50d ago
Principal Site Reliability Engineer - Foundational Services
JPMorgan Chase 4.8
Palo Alto, CA jobs
Join a globally recognized financial organization and advance your profession to new heights by contributing to revolutionary projects. You've discovered the perfect environment to have a major impact. As a Principal Site Reliability Engineer at JP Morgan Chase within the (insert LOB or sub LOB), you draw upon your advanced knowledge to identify new opportunities to influence critical incident management and improve the end-to-end lifecycle of software development for the firm. You will have the opportunity to manage, design, and implement infrastructure components to improve reliability and ensure operational efficiency.
**Job responsibilities**
+ Identifies and solves problems of high complexity
+ Works with development teams throughout the Software Development Life Cycle to ensure sustainable software releases
+ Leads medium to large projects by bringing together the proper perspective, identifying roadblocks, and integrating feedback from team members and subject matter experts at the firm
+ Participates in support responsibilities for coverage of critical applications
+ Sees problems as opportunities to improve
**Required qualifications, capabilities, and skills**
+ Formal training or certification on site reliability engineering concepts and 10+ years applied experience
+ Advanced knowledge of software applications and technical processes with considerable depth in one or more technical disciplines
+ Ability to determine how each system relates to each other and use breadth of tools to build automation to improve reliability for the firm
+ Experience with translating research, analysis, and tests into business recommendations
+ Ability to balance and be accountable for the work of multiple architects and designers
**Preferred qualifications, capabilities, and skills**
+ Understands and leads partnerships across job functions (e.g., Cybersecurity and Data) to develop efficient and developer-friendly systems
+ Engages team members and expresses complex ideas with appropriate level of detail, while also providing constructive feedback
JPMorganChase, one of the oldest financial institutions, offers innovative financial solutions to millions of consumers, small businesses and many of the world's most prominent corporate, institutional and government clients under the J.P. Morgan and Chase brands. Our history spans over 200 years and today we are a leader in investment banking, consumer and small business banking, commercial banking, financial transaction processing and asset management.
We offer a competitive total rewards package including base salary determined based on the role, experience, skill set and location. Those in eligible roles may receive commission-based pay and/or discretionary incentive compensation, paid in the form of cash and/or forfeitable equity, awarded in recognition of individual achievements and contributions. We also offer a range of benefits and programs to meet employee needs, based on eligibility. These benefits include comprehensive health care coverage, on-site health and wellness centers, a retirement savings plan, backup childcare, tuition reimbursement, mental health support, financial coaching and more. Additional details about total compensation and benefits will be provided during the hiring process.
We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants' and employees' religious practices and beliefs, as well as mental health or physical disability needs. Visit our FAQs for more information about requesting an accommodation.
JPMorgan Chase & Co. is an Equal Opportunity Employer, including Disability/Veterans
**Base Pay/Salary**
Palo Alto, CA $204,250.00 - $285,000.00 / year
$204.3k-285k yearly 4d ago
Staff Site Reliability Engineer
Figure 4.5
San Jose, CA jobs
Figure is an AI robotics company developing autonomous general-purpose humanoid robots. The goal of the company is to ship humanoid robots with human level intelligence. Its robots are engineered to perform a variety of tasks in the home and commercial markets. Figure is headquartered in San Jose, CA.
We are looking for a Site Reliability Engineer to own our internal systems infrastructure. This role is responsible for setting up and managing cloud and on-prem infrastructure to deliver highly available, reliable, and automated systems.
Responsibilities:
Be the go-to person for mission-critical infrastructure enabling critical operations such as Source Configuration Management, CI/CD systems, software distribution, supplier portals, manufacturing and more.
Migrate SaaS to self-hosted solutions to enhance security and reliability.
Implement monitoring and alerting systems, and define incident response plans and runbooks.
Reduce human workload by automating deployment and scaling.
Establish strong relationships with stakeholders to identify infrastructure needs and establish Service Level Objectives.
Use a data driven approach to demonstrate service robustness and track optimization work.
Partner with the security team to ensure that security remediations and updates are applied in a timely manner.
Requirements:
Strong experience with Linux/Unix systems administration
Proficiency in programming/scripting
Extensive experience with cloud platforms (Azure, AWS, GCP) and on-prem hardware architectures
Experience designing, deploying, and operating high-availability, fault-tolerant, and distributed systems.
Mastery of infrastructure as code (Terraform, CloudFormation, Ansible…)
Familiarity with monitoring, logging, and alerting tools (Prometheus, Grafana, Datadog…)
Solid understanding of networking fundamentals (TCP/IP, DNS, HTTP, load balancers, firewalls)
Experience defining Service Level Objectives (SLO), developing runbooks/incident response plans, facilitating post-mortems and managing systems assets.
Ability to work in cross-functional teams with developers, infra, and product teams
Excellent verbal and written communication skills
The US base salary range for this full-time position is between $175,000 - $250,000 annually.
The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended.
$175k-250k yearly Auto-Apply 44d ago
Site Reliability Engineer - Capital Markets
Jefferies Financial Group Inc. 4.8
Jersey City, NJ jobs
Jefferies is seeking a Site Reliability Engineer to play an instrumental role in supporting Equity Front office trading application, risk and middle office real time products, developed and used for Equity Cash and ETS application.
As part of the wider platform engineering team, you will be working closely with the Business users interactively throughout the day, along with technical, analysis and testing colleagues. Investigation and resolution of the work items at hand will require competent technical skills and a keen intellect. The business is a growth area, with current investments taking place in all the technology, business and middle office areas.
Responsibilities:
Front Line Site Reliability Engineering and Support functions for Equity trading systems used by Jefferies clients as well as internal users.
Build monitoring tools for application and infrastructure components.
Implement and manage scalable infrastructure using cloud-native technologies and tools.
Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding.
Partner with business, development and infrastructure teams to improve services through rigorous testing and release procedures.
Develop and maintain CI/CD pipelines to streamline deployment processes.
Expedient deployment of new systems. Capacity planning, Platform Management, and support for increasing volumes and business growth.
Create sustainable systems and services through automation.
Collaborate with Application team to establish and enforce production and development standards.
Document procedures, best practices and troubleshooting FAQs.
Resolve complex application and technical problems.
Debugging the system and fixing the production related issues.
Escalate / follow-up on permanent fix for development related issues.
Lead incident response efforts and post-mortem analysis to prevent future occurrences.
Handles complex operational tasks and recommends process and technology changes.
Provide global support, including weekend availability, to troubleshoot production related issues and perform checkouts.
Ability to work both independently and in groups in an energetic, diverse environment.
Participate in on-call rotations to ensure 24/7 system availability and support.
Support compliance and legal queries.
Qualifications:
Strong experience in Windows and Linux/Unix services.
Strong experience in scripting languages like PowerShell, Python and SQL.
Strong Knowledge of monitoring tools - Nagios, Splunk, OTEL, Datadog
Strong Knowledge of FIX protocol
Strong Domain skills - Must have working experience in Capital Markets across modules and instruments especially - CASH, ETS, Bonds, Options, Futures, Swaps products
Experience in BFSI (Banking and Financial Industry) Domain applications with a proper understanding of the Trade Lifecycle.
Excellent communication, time management and project management skills.
Primary Location Full Time Salary Range of $175,000 - $200,000
$175k-200k yearly Auto-Apply 60d+ ago
Site Reliability Engineer
The Voleon Group 4.1
Berkeley, CA jobs
Voleon is a technology company that applies state-of-the-art AI and machine learning techniques to real-world problems in finance. For nearly two decades, we have led our industry and worked at the frontier of applying AI/ML to investment management. We have become a multibillion-dollar asset manager, and we have ambitious goals for the future. Your colleagues will include internationally recognized experts in artificial intelligence and machine learning research as well as highly experienced finance and technology professionals. The people who shape our company come from other backgrounds, including concert music performances, humanitarian aid, opera singing, sports writing, and BMX racing. You will be part of a team that loves to succeed together.
In addition to our enriching and collegial working environment, we offer highly competitive compensation and benefits packages, technology talks by our experts, a beautiful modern office, daily catered lunches, and more.
As a Site Reliability Engineer (SRE), you will work at the intersection of production operations and software development as you improve, manage, and monitor production-critical infrastructure and data pipelines. At Voleon, many SREs serve together on a Production Operations team tasked with improving shared production infrastructure. Others are embedded with teams of software engineers to improve specific production systems owned by those teams. Voleon SREs work on important real-world problems and collaborate with passionate and talented colleagues in an empowering, results-driven environment. This role is a way to make a real difference: your contributions will make our critical systems more reliable, lower operational risk, and increase the efficiency of our engineering effort.
Responsibilities
Improve fault-tolerance and maintainability of code in proprietary data pipelines and trading systems
Diagnose and fix bugs in code
Lead complex deployments
Automate manual workflows
Track and prioritize outstanding production-related issues
Share an on-call rotation responding to incidents to ensure the continuous operation of production-critical systems
Requirements
Experience with coding and debugging Python
Experience with Linux
Familiarity with Relational Databases & SQL
Sharp analytical and problem-solving skills and a persistent drive to make things work (better)
Strong growth mindset and a passion for learning
Strong technical communication skills
Attention to detail
2 years of relevant industry experience
An undergraduate degree or comparable training in a quantitative field or equivalent, relevant industry experience
Preferred Qualifications
Familiarity with best practices concerning code maintainability, documentation, quality assurance, continuous integration and deployment
Experience supporting production systems
Experience with any of the following: gRPC microservices, Postgres, Pandas, Golang, R, Git, Jenkins, Bazel, Prometheus, Grafana, Airflow, Kubernetes
The base salary for this position is $120,000 to $160,000 in the location(s) of this posting. Individual salaries are determined through a variety of factors, including, but not limited to, education, experience, knowledge, skills, and geography. Base salary does not include other forms of total compensation such as bonus compensation and other benefits. Our benefits package includes medical, dental and vision coverage, life and AD&D insurance, 20 days of paid time off, 9 sick days, and a 401(k) plan with a company match.
“Friends of Voleon” Candidate Referral Program
If you have a great candidate in mind for this role and would like to have the potential to earn $7,500 - $15,000 if your referred candidate is successfully hired and employed by The Voleon Group, please use this form to submit your referral. For more details regarding eligibility, terms and conditions please make sure to review the Voleon Referral Bonus Program.
Equal Opportunity Employer
The Voleon Group is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.
$120k-160k yearly Auto-Apply 50d ago
Reliability Engineer
Tata Consulting Services 4.3
Marlborough, MA jobs
* SRE to quickly write automations, self-heal scripts, understanding and finding resolutions for errors from Microservices from basically any stack (Full-Stack capable).
* Operations skillset with enough attitude to scale to a Reliability Engineer.
* Should be able to handle customer communication and coordination with offshore team.
TCS Employee Benefits Summary:
* Discretionary Annual Incentive.
* Comprehensive Medical Coverage: Medical & Health, Dental & Vision, Disability Planning & Insurance, Pet Insurance Plans.
* Family Support: Maternal & Parental Leaves.
* Insurance Options: Auto & Home Insurance, Identity Theft Protection.
* Convenience & Professional Growth: Commuter Benefits & Certification & Training Reimbursement.
* Time Off: Vacation, Time Off, Sick Leave & Holidays.
* Legal & Financial Assistance: Legal Assistance, 401K Plan, Performance Bonus, College Fund, Student Loan Refinancing.
Salary Range - $100,000-$120,000 a year
$100k-120k yearly 21d ago
Reliability Engineer (SRE OMS)
Tata Consulting Services 4.3
Marlborough, MA jobs
* SRE with Sterling OMS Skillset with adaptability to Distributed Systems, developing Automations with AI/GenAI tools, etc.
* Operations skillset with enough attitude to scale to a Reliability Engineer.
* Should be able to handle customer communication and coordination with offshore team.
TCS Employee Benefits Summary:
* Discretionary Annual Incentive.
* Comprehensive Medical Coverage: Medical & Health, Dental & Vision, Disability Planning & Insurance, Pet Insurance Plans.
* Family Support: Maternal & Parental Leaves.
* Insurance Options: Auto & Home Insurance, Identity Theft Protection.
* Convenience & Professional Growth: Commuter Benefits & Certification & Training Reimbursement.
* Time Off: Vacation, Time Off, Sick Leave & Holidays.
* Legal & Financial Assistance: Legal Assistance, 401K Plan, Performance Bonus, College Fund, Student Loan Refinancing.
Salary Range - $100,000-$120,000 a year
$100k-120k yearly 21d ago
Site Reliability Engineer
Tata Consulting Services 4.3
Miami, FL jobs
Must-Have
* Strong development experience in .NET and Java frameworks.
* Proven leadership managing SRE and DevOps teams.
* Incident and problem management using ServiceNow.
* Expertise in Observability: AppDynamics, PagerDuty, Grafana, Splunk.
* Deep understanding of CI/CD with Azure ADO, GitHub, Maven, Gradle.
* Automated regression and performance testing experience with Selenium, JMeter.
* Experience building self-healing systems.
* Strong skills in root cause analysis (RCA) and problem identification.
* Ability to define and enforce SLAs and response metrics.
* Document and maintain version-controlled knowledge repositories.
* Exposure to self-healing systems in SRE or DevOps context.
Good-to-Have
* Certifications in AWS/GCP/Azure
Salary Range - $100,000-$120,000 a year
TCS Employee Benefits Summary:
Discretionary Annual Incentive.
Comprehensive Medical Coverage: Medical & Health, Dental & Vision, Disability Planning & Insurance, Pet Insurance Plans.
Family Support: Maternal & Parental Leaves.
Insurance Options: Auto & Home Insurance, Identity Theft Protection.
Convenience & Professional Growth: Commuter Benefits & Certification & Training Reimbursement.
Time Off: Vacation, Time Off, Sick Leave & Holidays.
Legal & Financial Assistance: Legal Assistance, 401K Plan, Performance Bonus, College Fund, Student Loan Refinancing.
Experience working in a Travel/Tourism industry
$100k-120k yearly 18d ago
Site Reliability Engineer - Capital Markets
Jefferies Financial Group Inc. 4.8
New York, NY jobs
Jefferies is seeking a Site Reliability Engineer to play an instrumental role in supporting Equity Front office trading application, risk and middle office real time products, developed and used for Equity Cash and ETS application. As part of the wider platform engineering team, you will be working closely with the Business users interactively throughout the day, along with technical, analysis and testing colleagues. Investigation and resolution of the work items at hand will require competent technical skills and a keen intellect. The business is a growth area, with current investments taking place in all the technology, business and middle office areas.
Responsibilities:
* Front Line Site Reliability Engineering and Support functions for Equity trading systems used by Jefferies clients as well as internal users.
* Build monitoring tools for application and infrastructure components.
* Implement and manage scalable infrastructure using cloud-native technologies and tools.
* Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding.
* Partner with business, development and infrastructure teams to improve services through rigorous testing and release procedures.
* Develop and maintain CI/CD pipelines to streamline deployment processes.
* Expedient deployment of new systems. Capacity planning, Platform Management, and support for increasing volumes and business growth.
* Create sustainable systems and services through automation.
* Collaborate with Application team to establish and enforce production and development standards.
* Document procedures, best practices and troubleshooting FAQs.
* Resolve complex application and technical problems.
* Debugging the system and fixing the production related issues.
* Escalate / follow-up on permanent fix for development related issues.
* Lead incident response efforts and post-mortem analysis to prevent future occurrences.
* Handles complex operational tasks and recommends process and technology changes.
* Provide global support, including weekend availability, to troubleshoot production related issues and perform checkouts.
* Ability to work both independently and in groups in an energetic, diverse environment.
* Participate in on-call rotations to ensure 24/7 system availability and support.
* Support compliance and legal queries.
Qualifications:
* Strong experience in Windows and Linux/Unix services.
* Strong experience in scripting languages like PowerShell, Python and SQL.
* Strong Knowledge of monitoring tools - Nagios, Splunk, OTEL, Datadog
* Strong Knowledge of FIX protocol
* Strong Domain skills - Must have working experience in Capital Markets across modules and instruments especially - CASH, ETS, Bonds, Options, Futures, Swaps products
* Experience in BFSI (Banking and Financial Industry) Domain applications with a proper understanding of the Trade Lifecycle.
* Excellent communication, time management and project management skills.
Primary Location Full Time Salary Range of $175,000 - $200,000
$175k-200k yearly Auto-Apply 49d ago
Network Reliability Engineer III
CME Group 4.4
Chicago, IL jobs
As we embark on a journey to transform the Network Services Group in CME, we are seeking a Network Reliability Engineer III to join our dynamic team. In this role, you will design, develop and maintain self-service tools and applications that enhance productivity and reduce operational costs. You will work across the full stack, both front-end and back-end, to architect microservices (GKE) in Google Cloud Platform (GCP), driving our infrastructure towards greater automation and reliability.
We are a global team across US, UK, India and Singapore made up of a diverse range of people from varied backgrounds who each bring unique network experiences and skill sets. The relatively new Network Reliability/Automation team are responsible for building a suite of custom automation tools and developing our self-healing capabilities while working closely with other members of the Network Services team in project delivery to ensure one of the largest Exchange network infrastructures in the world is highly available, resilient, secure and reliable.
Responsibilities
* Design, develop and maintain self-service and automation tools to streamline IT operations and reduce manual effort.
* Engage in full-stack development, delivering responsive front-end interfaces as well as robust scalable back-end services.
* With support, architect, deploy and scale microservices on GCP, with particular emphasis on containers and Google Kubernetes Engine (GKE).
* Manage cloud infrastructure via Infrastructure-as-Code (IaC), primarily using Terraform to provision and maintain resources.
* Operate and troubleshoot solutions on Linux-based platforms, leveraging Visual Studio Code (VSCode) as the primary development environment.
* Adhere to software engineering best practices, including PEP8 coding standards, SOLID design principles, and established SDLC processes.
* Implement and manage CI/CD pipelines with a DevOps mindset, ensuring rapid, reliable delivery of code.
* Develop and consume Flask-based RESTful APIs to support network and security automation.
* Collaborate within an Agile Scrum framework, utilizing tools such as Bitbucket and Jira to track progress and manage sprints.
* Apply strong analytical and problem-solving skills to balance multiple project variables and deliver high-quality solutions on schedule.
What we are looking for
* Approximately 2-3 years' hands-on Python programming experience, with a demonstrable track record of automation or tooling projects.
* Knowledge and experience working with both Python Django and Flask in a corporate environment.
* Any experience in network and security automation, coupled with understanding of network fundamentals (routing, switching, firewalls, VPNs) would be beneficial.
* Experience developing REST APIs using Flask (or a comparable Python framework).
* Applicants with front-end experience using JavaScript/jQuery/HTML5/CSS would be ideal.
* Familiarity with Infrastructure-as-Code using Terraform (or similar) to manage cloud resources.
* Comfortable working in Linux environments and proficient in using Visual Studio Code (VSCode).
* Strong software engineering mindset: adherence to PEP8, SOLID principles, and best practices for SDLC, CI/CD and DevOps.
* Excellent communication skills, both verbal and written, with the ability to convey technical concepts to diverse stakeholders.
* Highly analytical, with the ability to troubleshoot complex issues and manage multiple tasks concurrently.
* Experience working in Agile Scrum teams, utilizing Bitbucket and Jira (or equivalent tools) for version control and project tracking.
Personal Attributes
* Proactive and positive attitude, taking initiative to identify and resolve issues ahead of time.
* Collaborative team player, eager to contribute knowledge and assist colleagues.
* Innovative thinker who brings fresh ideas and constructive suggestions for continuous improvement.
Education
Bachelor's Degree in Computer Science, Engineering or a related field is preferred. Equivalent practical experience will also be considered.
#LI - Hybrid
CME Group is committed to offering a competitive total rewards package for our employees that recognizes their contributions to the business and reflects our long-term investment in their future. The pay range for this role is $100,700-$167,800. Actual salary offered will be dependent on a wide array of factors including but not limited to: relevant experience, skills, education and comparison to internal employees (where relevant). Our compensation program also includes an annual target bonus opportunity for all employees, as well as the opportunity to become an owner in the company through our broad-based equity program. Through our benefits program, we strive to offer flexibility, value and choice. From comprehensive health coverage, to a retirement package that includes both a 401(k) and an active pension plan, to highly competitive education reimbursement provisions, paid time off and a mental health benefit, CME Group offers a holistic benefits package for our team and their dependents.
CME Group: Where Futures are Made
CME Group is the world's leading derivatives marketplace. But who we are goes deeper than that. Here, you can impact markets worldwide. Transform industries. And build a career by shaping tomorrow. We invest in your success and you own it - all while working alongside a team of leading experts who inspire you in ways big and small. Problem solvers, difference makers, trailblazers. Those are our people. And we're looking for more.
At CME Group, we embrace our employees' unique experiences and skills to ensure that everyone's perspectives are acknowledged and valued. As an equal-opportunity employer, we consider all potential employees without regard to any protected characteristic.
Important Notice: Recruitment fraud is on the rise, with scammers using misleading promises of job offers and interviews to solicit money and personal information from job seekers. CME Group adheres to established procedures designed to maintain trust, confidence and security throughout our recruitment process. Learn more here.
$100.7k-167.8k yearly 60d+ ago
Principal Site Reliability Engineer - Foundational Services
Jpmorganchase 4.8
Palo Alto, CA jobs
Join a globally recognized financial organization and advance your profession to new heights by contributing to revolutionary projects. You've discovered the perfect environment to have a major impact.
As a Principal Site Reliability Engineer at JP Morgan Chase, you draw upon your advanced knowledge to identify new opportunities to influence critical incident management and improve the end-to-end lifecycle of software development for the firm. You will have the opportunity to manage, design, and implement infrastructure components to improve reliability and ensure operational efficiency.
Job responsibilities
Identifies and solves problems of high complexity
Works with development teams throughout the Software Development Life Cycle to ensure sustainable software releases
Leads medium to large projects by bringing together the proper perspective, identifying roadblocks, and integrating feedback from team members and subject matter experts at the firm
Participates in support responsibilities for coverage of critical applications
Sees problems as opportunities to improve
Required qualifications, capabilities, and skills
Formal training or certification on site reliability engineering concepts and 10+ years applied experience
Advanced knowledge of software applications and technical processes with considerable depth in one or more technical disciplines
Ability to determine how systems relate to each other and use a breadth of tools to build automation to improve reliability for the firm
Experience with translating research, analysis, and tests into business recommendations
Ability to balance and be accountable for the work of multiple architects and designers
Preferred qualifications, capabilities, and skills
Understands and leads partnerships across job functions (e.g., Cybersecurity and Data) to develop efficient and developer-friendly systems
Engages team members and expresses complex ideas with appropriate level of detail, while also providing constructive feedback
$140k-177k yearly est. Auto-Apply 7d ago
Reliability Engineer*
3M 4.6
Georgia jobs
Job Title
Reliability Engineer
Collaborate with Innovative 3Mers Around the World
Choosing where to start and grow your career has a major impact on your professional and personal life, so it's equally important you know that the company that you choose to work at, and its leaders, will support and guide you. With a wide variety of people, global locations, technologies and products, 3M is a place where you can collaborate with other curious, creative 3Mers.
This position provides an opportunity to transition from other private, public, government or military experience to a 3M career.
The Impact You'll Make in this Role
As a hands-on Reliability Engineer, you will have the opportunity to tap into your curiosity and collaborate with some of the most innovative people around the world. Here, you will make an impact by:
Performing failure mode and effects analysis to ensure that the proper Preventive & Predictive Maintenance programs are implemented, audited and improved on all existing and future assets.
Applying Reliability Based Maintenance programs such as Reliability Centered Maintenance (RCM) and Total Productive Maintenance (TPM).
Assessing & developing the capability of mechanics in their role in reliability improvement and advancing their technical capabilities.
Analyzing data (failure, cost, uptime, etc.) and applying appropriate reliability analysis tools to develop & implement improvement plans.
Performing & documenting equipment criticality analysis in support of an effective critical spares strategy.
Submitting recommendations and justification for capital expenditures that support and improve the Reliability Program.
Providing an external awareness of methods and technologies that advance our own internal body of knowledge for the improvement of our operations reliability.
Your Skills and Expertise
To set you up for success in this role from day one, 3M requires (at a minimum) the following qualifications:
Technical degree or higher (completed and verified prior to start) and two (2) years of manufacturing experience in a private, public, government or military environment.
OR
Associate's degree or higher (completed and verified prior to start) and two (2) years of manufacturing experience in a private, public, government or military environment.
AND
One (1) year of experience with mechanical and electrical drawings.
Additional qualifications that could help you succeed even further in this role include:
Bachelor's degree in Electrical, Mechanical, or Mechatronics Engineering from an accredited institution
Five (5) years of manufacturing experience in an automotive or aerospace private, public, government or military environment
Experience with reliability analysis, predictive (PdM), and preventative maintenance (PM).
Skills include strong communication, independence, strategic thinking, and problem solving, along with PLCs, automation, and variable frequency drives.
Work location:
On-site
Clarkston, GA
Travel: May include up to 5% domestic/international
Relocation Assistance: Not Authorized
Must be legally authorized to work in country of employment without sponsorship for employment visa status (e.g., H1B status).
Responsibilities of this position may include direct and/or indirect physical or logical access to information, systems, technologies subjected to the regulations/compliance with U.S. Export Control Laws.
U.S. Export Control laws and U.S. Government Department of Defense contracts and sub-contracts impose certain restrictions on companies and their ability to share export-controlled and other technology and services with certain "non-U.S. persons" (persons who are not U.S. citizens or nationals, lawful permanent residents of the U.S., refugees, "Temporary Residents" (granted Amnesty or Special Agricultural Worker provisions), or persons granted asylum (but excluding persons in nonimmigrant status such as H-1B, L-1, F-1, etc.)) or non-U.S. citizens.
To comply with these laws, and in conjunction with the review of candidates for those positions within 3M that may present access to export controlled technical data, 3M must assess employees' U.S. person status, as well as citizenship(s).
The questions asked in this application are intended to assess this and will be used for evaluation purposes only. Failure to provide the necessary information in this regard will result in our inability to consider you further for this particular position. The decision whether or not to file or pursue an export license application is at 3M Company's sole election.
Supporting Your Well-being
3M offers many programs to help you live your best life - both physically and financially. To ensure competitive pay and benefits, 3M regularly benchmarks with other companies that are comparable in size and scope.
Applicable to US Applicants Only: The expected compensation range for this position is $81,983 - $100,202, which includes base pay plus variable incentive pay, if eligible. This range represents a good faith estimate for this position. The specific compensation offered to a candidate may vary based on factors including, but not limited to, the candidate's relevant knowledge, training, skills, work location, and/or experience. In addition, this position may be eligible for a range of benefits (e.g., Medical, Dental & Vision, Health Savings Accounts, Health Care & Dependent Care Flexible Spending Accounts, Disability Benefits, Life Insurance, Voluntary Benefits, Paid Absences and Retirement Benefits, etc.). Additional information is available at: *******************************************************************
Good Faith Posting Date Range: 08/11/2025 to 09/10/2025, or until filled
All US-based 3M full time employees will need to sign an employee agreement as a condition of employment with 3M. This agreement lays out key terms on using 3M Confidential Information and Trade Secrets. It also has provisions discussing conflicts of interest and how inventions are assigned. Employees that are Job Grade 7 or equivalent and above may also have obligations to not compete against 3M or solicit its employees or customers, both during their employment, and for a period after they leave 3M.
Learn more about 3M's creative solutions to the world's problems at ********** or on Instagram, Facebook, and LinkedIn @3M.
Responsibilities of this position include that corporate policies, procedures and security standards are complied with while performing assigned duties.
Safety is a core value at 3M. All employees are expected to contribute to a strong Environmental Health and Safety (EHS) culture by following safety policies, identifying hazards, and engaging in continuous improvement.
Pay & Benefits Overview: https://**********/3M/en_US/careers-us/working-at-3m/benefits/
3M does not discriminate in hiring or employment on the basis of race, color, sex, national origin, religion, age, disability, veteran status, or any other characteristic protected by applicable law.
Please note: your application may not be considered if you do not provide your education and work history, either by: 1) uploading a resume, or 2) entering the information into the application fields directly.
3M Global Terms of Use and Privacy Statement
Carefully read these Terms of Use before using this website. Your access to and use of this website and application for a job at 3M are conditioned on your acceptance and compliance with these terms.
Please access the linked document by clicking here, select the country where you are applying for employment, and review. Before submitting your application, you will be asked to confirm your agreement with the terms.
$82k-100.2k yearly Auto-Apply 60d+ ago
Java Site Reliability Engineer, Messaging Platforms
Pacific Investment Management Co 4.9
Austin, TX jobs
We are a leading global asset management firm with over 3,000 employees across 20 offices in 15 countries; we help millions of investors around the world pursue their financial goals.
We hire critical thinkers. People who thrive in a collaborative culture like ours where we solve real problems while building the future of finance.
You
Are excited to be part of a vibrant engineering community that values diversity, hard work, and continuous learning.
Love solving complex real-world business problems.
Recognize that cross-functional collaboration is a core component of success for the team.
Believe there are multiple ways to solve most technical problems and are willing to debate the trade-offs.
Have become a stronger engineer by making mistakes and learning from them.
Are a doer, someone who wants to grow their career and gain experience across technologies and business functions.
We
Continuously invest in a high-performance and inclusive culture, in which a diversity of backgrounds, experiences and viewpoints are celebrated and valued.
Encourage career mobility, so you can benefit from learning different functions and technologies, and we gain the benefits of your experience across teams.
Run technology pro bono programs that help the non-profit community and give our engineering community opportunities to volunteer and participate.
Offer education reimbursements and ongoing training in technology, communication, and diversity & inclusion.
Embrace knowledge sharing through lunch-and-learns, demos, and technical forums.
Consider our people to be our greatest asset. We will help you learn what PIMCO Technology has to offer so you can participate in activities that benefit your career while delivering impactful technology solutions.
As a Java SRE in Trading Technology, you will:
As our immediate need
Help support the messaging platforms in use (MQ, AMPS, Kafka, etc.).
Drive the firm's best use of these platforms, making sure all choices make sense, the correct tool is used for solving each job, and that we build a sustainable messaging strategy.
Improve the operational efficiency and reduce the operational risk of our messaging platforms through better tools, better design, and better monitoring.
In the future
There will be new architectural or coding problems that we will need an experienced engineer to help solve.
Work closely with the business and other teams to design and implement solutions that have immediate impact on the business and help us build towards our strategic vision across all our trade floor applications.
We need someone proficient in Java, passionate about SRE practices, and able to collaborate effectively with an infrastructure team. We expect you to have a strong passion for messaging systems, including their proper setup, monitoring, and maintenance. At the same time, this role involves software development for target platforms once the immediate needs related to messaging platforms are resolved.
You will work with a team consisting of 1 SRE and 1 Unix SA, with full support from the infrastructure and DevOps teams.
Position Requirements
Bachelor's degree in computer science or equivalent
Strong Linux skills (including configuration tools such as Chef, Puppet, and Ansible)
Strong experience with different messaging systems (Kafka, AMPS, MQ, FIX, etc.).
Strong engineering culture (unit tests, CI/CD)
Ability to work independently and in teams
Good communication skills
Working from the office in Austin 4 days a week.
PIMCO follows a total compensation approach when rewarding employees which includes a base salary and a discretionary bonus. Base salary is the fixed component of compensation that is determined by core job responsibilities, relevant experience, internal level, and market factors. The discretionary bonus is used to award performance and therefore is determined by company, business, team, and individual performance.
Salary Range: $175,000.00 - $240,000.00
Equal Employment Opportunity and Affirmative Action Statement
PIMCO recruits and hires qualified candidates without regard to race, national origin, ancestry, religion (including religious dress and grooming practices), sex (including pregnancy, childbirth, breastfeeding, or related medical conditions), sexual orientation, gender (including gender identity and expression), age, military or veteran status, disability (physical or mental), or any factor prohibited by law, and as such affirms in policy and practice its support for and promotion of the concept of equal employment opportunity and affirmative action, in accordance with all applicable federal, state, provincial and municipal laws. The company also prohibits discrimination on other bases, such as medical condition or marital status, under applicable laws.
Applicants with Disabilities
PIMCO is an Equal Employment Opportunity/Affirmative Action employer. We provide reasonable accommodation for qualified individuals with disabilities, including veterans, in job application procedures. If you have any difficulty using our online system due to a disability and you would like to request an accommodation, you may contact us at ************ and leave a message. This is a dedicated line designed exclusively to assist job seekers with disabilities to apply online. Only messages left for this purpose will be considered. A response to your request may take up to two business days.