Reliability Engineer jobs at J.P. Morgan - 79 jobs

  • Process Engineer

    CTC 4.6 company rating

    Cincinnati, OH

    20 hrs/week, ONSITE, Cincinnati, OH 45224

    The Manufacturing Process Engineer will be responsible for evaluating, improving, and maintaining manufacturing processes and equipment to ensure efficiency, safety, and compliance. This role requires strong analytical skills, technical expertise, and the ability to drive continuous improvement initiatives across the plant.

    Responsibilities
      • Evaluate existing manufacturing processes and identify areas for improvement.
      • Inspect and maintain mechanical equipment performance within the plant.
      • Diagnose production issues and implement effective solutions.
      • Conduct cost-benefit analyses for new processes and equipment.
      • Design detailed layouts for equipment, processes, and workflows.
      • Research and develop new processes, equipment, and products.
      • Implement cost-saving measures and quality control systems.
      • Ensure compliance with safety standards and legal regulations.
      • Maintain documentation and prepare technical reports.

    Must Have
      • Process evaluation and continuous improvement experience.
      • Mechanical equipment inspection and maintenance knowledge.
      • Strong problem-solving and root cause analysis skills.
      • Ability to perform cost-benefit analysis.
      • Process design and workflow optimization expertise.
      • Knowledge of quality control systems and regulatory compliance.
      • Technical documentation and report preparation skills.
      • Bachelor's degree in Mechanical, Industrial, or Manufacturing Engineering (or equivalent).
      • 2 years of experience.

    Nice to Have
      • Experience with advanced manufacturing technologies (automation, robotics, Industry 4.0).
      • Familiarity with Lean Manufacturing, Six Sigma, or Kaizen methodologies.
      • Exposure to ERP systems (SAP, Oracle, Salesforce).
      • Project management and cross-functional collaboration skills.
      • Innovation mindset for R&D of new processes and products.
      • Bilingual communication (English/Spanish) for global operations.
      • Experience in cost-saving initiatives with measurable impact.
    $55k-74k yearly est. 5d ago
  • Site Reliability Engineer

    Mio Partners 4.5 company rating

    New York, NY

    MIO Partners, Inc. (MIO) provides proprietary investment products to McKinsey's retirement plan and partners and offers independent, high-quality financial advice to McKinsey's partners. We manage a wide array of investment vehicles with significant expertise and a long and successful track record in alternative strategies, including hedge funds and private equity. We have a multibillion-dollar portfolio of assets under management, and we manage assets for and advise only McKinsey-related clients; we do not accept outside or third-party investments.

    MIO is a values-based organization that is strongly aligned with our investors' interests. MIO measures success as performance relative to a market-based benchmark. MIO, a 250+ person registered investment adviser, provides ample opportunities for somebody with an entrepreneurial drive to shine. We strive to meet the highest professional standards and build an organization that attracts, develops, and retains exceptional people. MIO is a wholly owned subsidiary of McKinsey, but our activities are kept entirely separate from those of the consulting Firm.

    Primary responsibilities
    The successful candidate will have extensive technical experience working with AWS cloud technologies, preferably for financial services firms such as asset managers, hedge funds, and/or broker/dealers.
    The new hire must lead by example and work collaboratively to:
      • Design and maintain monitoring systems and dashboards
      • Architect and manage cloud infrastructure (AWS, Azure) with security, stability, and cost in mind
      • Implement CI/CD pipelines for reliable software delivery
      • Establish infrastructure-as-code practices using CDK, GitLab, and AWS developer tools
      • Contribute to the MIO application codebase, following resiliency and performance best practices
      • Ensure application architectures follow cloud best practices for reliability, security, performance, and efficiency
      • Work with development teams to improve deployment processes and system reliability
      • Collaborate with business owners to translate business requirements into technical solutions, with an eye toward technology consistency and best practices
      • Work with engineers, business users, and other stakeholders to understand their needs and ensure solutions align with business goals
      • Maintain detailed documentation for reference architectures, design patterns, and system configurations
      • Raise the bar on our development capabilities, standards, and processes
      • Synthesize requirements gathered from various teams within and outside of IT and suggest creative solutions, guiding MIO to “do it the right way” where appropriate
      • Following a Scrum methodology, organize work with end users, business analysts, and other architects and developers
      • Recommend positive steps toward standardizing development processes, including technology selection, deployment steps, code reviews, and IT tools
      • Partner with development, QA, and AppSecOps teams to promote standardization, consistency, and improved security posture

    Our applications are primarily developed using Python/Django with libraries such as Pandas and NumPy, along with PL/SQL. In addition, we use SQL Server, MySQL, Elasticsearch, Redis, Kafka, Tableau, and various third-party APIs and data sources. Our applications are hosted in AWS using Docker containers on the ECS/EC2 platforms.
    Primary responsibilities: estimated percentage allocation
      • 25% Technology Leadership: design, mentoring
      • 15% Relationship Building: requirements
      • 60% Heads-Down Development

    Desired background
      • Please note: applicants must be authorized to work in the U.S. without current or future visa sponsorship
      • 8+ years of hands-on experience in DevOps, SRE, or platform engineering roles
      • Bachelor of Science in computer science or another related discipline (strong experience with a less directly related degree will also be considered)
      • Strong experience in AWS cloud technologies
      • Knowledge of CI/CD pipeline tools (GitLab pipelines, Jenkins, etc.)
      • Understanding of monitoring and observability tools (ELK, Dynatrace, Datadog, etc.)
      • Experience with microservices, serverless architectures, and containerization
      • Proficiency in the AWS cloud platform, including infrastructure-as-code and CI/CD pipelines
      • Formal problem-solving and/or analytical training or experience is a plus, as is experience working with management consultants
      • Good intuition for end-user requirements gathering; iterative and collaborative approach to design
      • Strong client relationship management skills and excellent written and verbal communication skills to interact at all levels

    MIO Partners, Inc. (MIO) is an equal opportunity employer. MIO will consider all applicants regardless of race, color, religion, sex, sexual orientation, gender identity, national origin, veteran status, or disability status.

    MIO has adopted a flexible, hybrid model that supports a blend of in-office and remote work. Our office is in New York City.

    Certain US states require MIO Partners, Inc. to include a reasonable estimate of the salary range for this role. Actual salaries may vary and may be above or below the range based on various factors, including, but not limited to, an individual's assigned office location, experience, and expertise.
    Certain roles are also eligible for bonuses, subject to MIO's discretion and based on factors such as individual and/or organizational performance. Additionally, MIO offers a comprehensive benefits package, including medical, dental, and vision coverage, telemedicine services, life, accident, and disability insurance, parental leave and family planning benefits, caregiving resources, a generous retirement program, financial guidance, and paid time off.

    Base salary range: $175,000-$200,000 USD

    We are committed to protecting your privacy. Please review our Applicant Privacy Policy for a detailed explanation of how we collect, use, and protect your personal information.
    $175k-200k yearly Auto-Apply 57d ago
  • Site Reliability Engineer

    The Voleon Group 4.1 company rating

    Remote

    Voleon is a technology company that applies state-of-the-art AI and machine learning techniques to real-world problems in finance. For nearly two decades, we have led our industry and worked at the frontier of applying AI/ML to investment management. We have become a multibillion-dollar asset manager, and we have ambitious goals for the future.

    Your colleagues will include internationally recognized experts in artificial intelligence and machine learning research as well as highly experienced finance and technology professionals. The people who shape our company also come from other backgrounds, including concert music performance, humanitarian aid, opera singing, sports writing, and BMX racing. You will be part of a team that loves to succeed together. In addition to our enriching and collegial working environment, we offer highly competitive compensation and benefits packages, technology talks by our experts, a beautiful modern office, daily catered lunches, and more.

    As a Site Reliability Engineer (SRE), you will work at the intersection of production operations and software development as you improve, manage, and monitor production-critical infrastructure and data pipelines. At Voleon, many SREs serve together on a Production Operations team tasked with improving shared production infrastructure. Others are embedded with teams of software engineers to improve specific production systems owned by those teams. Voleon SREs work on important real-world problems and collaborate with passionate and talented colleagues in an empowering, results-driven environment.
    This role is a way to make a real difference: your contributions will make our critical systems more reliable, lower operational risk, and increase the efficiency of our engineering effort.

    Responsibilities
      • Improve fault tolerance and maintainability of code in proprietary data pipelines and trading systems
      • Diagnose and fix bugs in code
      • Lead complex deployments
      • Automate manual workflows
      • Track and prioritize outstanding production-related issues
      • Share an on-call rotation, responding to incidents to ensure the continuous operation of production-critical systems

    Requirements
      • Experience coding and debugging in Python
      • Experience with Linux
      • Familiarity with relational databases and SQL
      • Sharp analytical and problem-solving skills and a persistent drive to make things work (better)
      • Strong growth mindset and a passion for learning
      • Strong technical communication skills
      • Attention to detail
      • 2 years of relevant industry experience
      • An undergraduate degree or comparable training in a quantitative field, or equivalent relevant industry experience

    Preferred Qualifications
      • Familiarity with best practices concerning code maintainability, documentation, quality assurance, and continuous integration and deployment
      • Experience supporting production systems
      • Experience with any of the following: gRPC microservices, Postgres, Pandas, Golang, R, Git, Jenkins, Bazel, Prometheus, Grafana, Airflow, Kubernetes

    The base salary for this position is $120,000 to $160,000 in the location(s) of this posting. Individual salaries are determined through a variety of factors, including, but not limited to, education, experience, knowledge, skills, and geography. Base salary does not include other forms of total compensation such as bonus compensation and other benefits. Our benefits package includes medical, dental, and vision coverage, life and AD&D insurance, 20 days of paid time off, 9 sick days, and a 401(k) plan with a company match.
    “Friends of Voleon” Candidate Referral Program
    If you have a great candidate in mind for this role and would like the potential to earn $7,500-$15,000 if your referred candidate is successfully hired and employed by The Voleon Group, please use this form to submit your referral. For more details regarding eligibility, terms, and conditions, please make sure to review the Voleon Referral Bonus Program.

    Equal Opportunity Employer
    The Voleon Group is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.
    $120k-160k yearly Auto-Apply 49d ago
  • Principal Site Reliability Engineer - Remote

    Donnelley Financial Solutions 4.8 company rating

    Remote

    Join a dynamic team at the pulse of global markets, where we deliver innovative software and service solutions for essential financial reporting and capital markets transactions. At DFIN, we are a values-driven organization that empowers you to build a fulfilling career while bringing your authentic self to work every day. Our "Win as One" mentality ensures that our team's success is directly linked to Client, Shareholder, and Employee Satisfaction. Recognized as one of AMERICA'S MOST LOVED WORKPLACES for five consecutive years and a Built In Best Places to Work for six years, we are committed to our employees' total well-being. Enjoy competitive compensation, a flexible workplace, comprehensive benefits, and opportunities for professional growth. Bring your passion and talents to DFIN - because being YOU thrives here.

    Summary:
    We are looking for technical team members at all levels who want to push themselves to deliver best-in-market SaaS solutions. We offer a challenging environment where you will have to grow, adapt, and use your skills consistently. Our customers rely on us in the moments that matter. Engineering delivers on that promise.

    The Principal Site Reliability Engineer - Cloud is responsible for designing, building, securing, monitoring, and maintaining our SaaS product cloud infrastructure so that it is fast, cost-effective, stable, and optimized for our customers. SREs at DFIN take on availability, performance, change management, monitoring, and response, and are guardians of non-functional requirements. You either have a SaaS cloud infrastructure background in Azure or AWS with a programmatic, automated mindset, or you come from a software engineering background with SaaS cloud infrastructure experience in Azure or AWS. The SRE goal is to build automated systems that reduce or eliminate manual work to keep our products up and running and performing optimally.
    We are looking for someone who thrives on collaboration within the team and across other groups, and who can independently lead colleagues to deliver solutions to complex problems.

    Responsibilities:
      • Champion and implement a culture of maintaining performant, reliable, secure, cost-effective platform cloud infrastructure in DFIN SaaS products, based on operationalized processes you define
      • Champion the security of our cloud infrastructure, collaborating with Security and Governance teams and using static and dynamic tooling
      • Champion and implement application and cloud infrastructure monitoring and alerting to prevent client-impacting issues by ensuring system availability, performance, and scalability to maintain SLOs and SLAs
      • Optimize cloud infrastructure and application performance at scale while maintaining effective cost controls
      • Automate cloud infrastructure buildout and maintenance, including system operational runbooks
      • Dive deep into technology and stay at the forefront of the latest tools, technologies, and strategies; help evaluate, prototype, and integrate them into operationalized work processes
      • Work with broad independence and deliver on the project milestones and tasks you define, on schedule, while communicating progress regularly
      • Build strong relationships with SRE team members and software engineering teams to hold each other accountable for quality expectations
      • Learn continuously and apply lessons learned
      • Evangelize best practices, eliminate bottlenecks, and improve processes
      • Participate in 24/7/365 on-call duties and lead the triage and root cause analysis (RCA) of production incidents

    Qualifications:
      • 8+ years of experience designing, building, securing, monitoring, and maintaining cloud infrastructure in Azure or AWS
      • 5+ years of experience creating, configuring, maintaining, and monitoring Kubernetes clusters (AKS or EKS) in cloud infrastructure to optimize application performance and reliability
      • 5+ years building and deploying Infrastructure as Code with Terraform or similar technology
      • 5+ years of experience with common cloud networking, firewall, and load balancing configuration
      • 5+ years of experience writing software in a modern programming language such as C#/.NET or Java
      • 5+ years of experience creating automated deployments with tools such as Harness, Azure DevOps, Ansible, or Jenkins to manage Infrastructure as Code and software build and deployment in a continuous integration (CI) / continuous delivery (CD) environment
      • 5+ years of experience implementing production performance, availability, and scalability monitoring and alerting using a tool such as New Relic, Dynatrace, Datadog, or AppDynamics
      • 5+ years of experience supporting public, client-facing, revenue-generating systems
      • Experience monitoring and preventing issues with databases and database queries (SQL) using tools like SolarWinds Database Performance Analyzer, Idera SQL Diagnostic Manager, or Redgate SQL Monitor
      • Experience planning, coordinating, developing, and executing all stages of post-deployment verification test scripts
      • Experience securing Windows or Linux systems in a 24x7 production environment
      • BS in Computer Science or equivalent work experience

    It is the policy of Donnelley Financial Solutions to select, place, and manage all its employees without discrimination based on race, color, national origin, gender, age, religion, actual or perceived disability, veteran status, actual or perceived sexual orientation, genetic information, or any other protected status. If you are a qualified individual with a disability or a disabled veteran, you have the right to request a reasonable accommodation if you are unable or limited in your ability to use or access jobs.dfinsolutions.com as a result of your disability. You can request a reasonable accommodation by sending an email to ***********************************.

    At DFIN, protecting your identity is a top priority. Please be aware of scammers impersonating DFIN recruiters.
    DFIN recruiters will never request personal information via email or text. You will only receive a text from us if you've already been in contact. All automated messages will come from ***********************************. If you ever have doubts about the legitimacy of any communication from us, please do not hesitate to reach out for verification via *********************************** (this email is for general TA questions and is not used for updates on your application status).
    $105k-156k yearly est. 29d ago
  • Site Reliability Engineer / AWS

    New Era Technologies Inc. 3.5 company rating

    Remote

    Join New Era Technology, where People First is at the heart of everything we do. With a global team of over 4,500 professionals, we're committed to creating a workplace where everyone feels valued, empowered, and inspired to grow. Our mission is to securely connect people, places, and information with end-to-end technology solutions at scale. At New Era, you'll join a team-oriented culture that prioritizes your personal and professional development. Work alongside industry-certified experts, access continuous training, and enjoy competitive benefits. Driven by values like Community, Integrity, Agility, and Commitment, we nurture our people to deliver exceptional customer service. If you want to make an impact in a supportive, growth-oriented environment, New Era is the place for you. Apply today and help us shape the future of work, together.

    New Era Technology, Inc., and its subsidiaries (“New Era”, “we”, “us”, or “our”) in its operating regions worldwide are committed to respecting your privacy and recognize the need for appropriate protection and management of any Personal Data that you may provide us. In this, we are also committed to providing you with a positive experience on our websites and while using our products, services, and solutions (“Solutions”). View our Privacy Policy here: *********************************************

    We never ask candidates to pay any fees at any point in our hiring process. If you are ever asked to provide payment for training, certification, equipment, or any other purpose, it is not from our company. Only communications from our official company channels should be trusted. Please note our official email domain is @neweratech.com. If you suspect fraudulent activity, please contact us immediately at privacy@neweratech.com.
    $94k-136k yearly est. Auto-Apply 35d ago
  • Staff Reliability Engineer

    The Hartford 4.5 company rating

    Hartford, CT

    Staff Reliability Engineer - IE07KE

    We're determined to make a difference and are proud to be an insurance company that goes well beyond coverages and policies. Working here means having every opportunity to achieve your goals - and to help others accomplish theirs, too. Join our team as we help shape the future.

    The Hartford is seeking a highly skilled Senior Reliability Engineer (RE) to join our Enterprise Data Organization. This role is pivotal in applying software engineering principles to operations, ensuring the reliability, performance, and scalability of the organization's foundational data infrastructure, platforms, and applications. You will be instrumental in driving our transition from traditional production support to a modern RE model through automation, toil reduction, and standardized service management.

    This role can have a Hybrid or Remote work arrangement. Candidates who live near one of our locations will be expected to work in an office 3 days a week (Tuesday through Thursday). Candidates who do not live near an office should maintain their current work arrangement, with the expectation of coming into the office as business needs arise.

    Responsibilities
      • Platform Reliability & Resiliency: Design, build, and maintain highly reliable, scalable, and resilient cloud-based data platforms on AWS and GCP, including core infrastructure and services like Snowflake, EKS, OpenSearch, EMR, and Hadoop ecosystems.
      • Automation & Toil Reduction: Champion the RE mandate by identifying manual, repetitive operational tasks (toil) and developing robust automation solutions to eliminate them. This includes automating provisioning, deployment, self-healing, and operational tasks.
      • Observability & Monitoring: Implement and manage comprehensive observability solutions (monitoring, alerting, logging, tracing) for the underlying data infrastructure and applications, focusing on establishing clear Service Level Indicators (SLIs) and Service Level Objectives (SLOs).
      • Incident Response & Management: Act as an escalation point for production incidents, leading incident response, performing deep root cause analysis (RCA), designing error budgets, and implementing preventative measures to ensure issues do not recur.
      • Standardization & Documentation: Lead the standardization of operational processes and documentation, including the creation and automation of dynamic runbooks and playbooks for consistent and efficient incident resolution and service management.
      • RE Transition: Lead as the RE subject matter expert and collaborate with other Platform, Product, and Data Engineering Support teams to instill RE best practices, including participation in system design consulting, capacity planning, and deployment pipelines (CI/CD).

    Qualifications
      • 10+ years of overall experience in an infrastructure, data, or related technology organization, with increasing responsibilities as a hands-on technologist.
      • Must have 5+ years of experience as an RE, Cloud, or DevOps Engineer, or in a similar role supporting large-scale enterprise infrastructure and applications.
      • Strong scripting and programming skills (Python, etc.) for automation and tooling development.
      • Experience with infrastructure-as-code (e.g., Terraform, CloudFormation, Ansible) and CI/CD tools.
      • Experience designing and operating reliable and resilient infrastructure, fail-safe patterns, reliability controls, and observability from a Reliability Engineering (SRE/RE) infrastructure support perspective across cloud and big data platforms (AWS, GCP, Amazon EMR, Hadoop/Spark, OpenSearch, container orchestration platforms, etc.).
      • Familiarity with cloud-native integrations with databases, data integration, and business intelligence platforms (Snowflake, Informatica IDMC, Tableau, ThoughtSpot, etc.).
      • Expertise in setting up and tuning monitoring and alerting systems (e.g., Dynatrace, Splunk, Prometheus, Grafana, Datadog, OpenTelemetry).
      • Expertise defining and implementing DataOps practices.
      • Expertise implementing AIOps to monitor, manage, and self-heal infrastructure and data platforms, including experience applying machine learning principles to anomaly detection, alerting, and runbook automation.
      • Experience with prompt engineering, implementing AWS or Google AI services, and AI-enabled automation for infrastructure reliability and performance management.
      • Relevant industry certifications preferred (AWS, GCP, Kubernetes, SRE/DevOps frameworks, etc.).

    This role will have a Hybrid work schedule, with the expectation of working in an office (Columbus, OH; Chicago, IL; Hartford, CT; or Charlotte, NC) 3 days a week (Tuesday through Thursday).

    Candidates must be authorized to work in the US without company sponsorship. The company will not support the STEM OPT I-983 Training Plan endorsement for this position.

    Compensation
    The listed annualized base pay range is primarily based on analysis of similar positions in the external market. Actual base pay could vary and may be above or below the listed range based on factors including, but not limited to, performance, proficiency, and demonstration of competencies required for the role. The base pay is just one component of The Hartford's total compensation package for employees. Other rewards may include short-term or annual bonuses, long-term incentives, and on-the-spot recognition. The annualized base pay range for this role is: $127,600 - $191,400

    Equal Opportunity Employer/Sex/Race/Color/Veterans/Disability/Sexual Orientation/Gender Identity or Expression/Religion/Age
    $127.6k-191.4k yearly Auto-Apply 2d ago
  • Senior Cluster Site Reliability Engineer

    The Voleon Group 4.1 company rating

    Remote

    Voleon is a technology company that applies state-of-the-art machine learning techniques to real-world problems in finance. For nearly two decades, we have led our industry and worked at the frontier of applying machine learning to investment management. We have become a multibillion-dollar asset manager, and we have ambitious goals for the future.

    As a Senior Cluster Site Reliability Engineer (SRE), you will help scale our research compute cluster to meet our growing needs, and you will leverage engineering skills to ensure high degrees of uptime, reliability, and robustness. Our research clusters are at the core of our R&D, and you will be directly responsible for keeping this key resource available and performant. Your work will provide a world-class HPC platform for researchers to focus on cutting-edge machine learning problems at scale. You will support both on-prem and cloud infrastructure and work to provide the best experience to our technical staff. You will leverage IaC, automation, and SRE principles to refine and hone a product that operates 24/7 to support Voleon.

    The Cluster Operations team works on the front line to triage and mitigate real-time operational issues. You will be an integral member of this team, solving day-to-day issues with high urgency while also engineering systemic improvements and architectural fixes to prevent recurring issues. You will collaborate with engineering teams to develop improvements to monitoring and telemetry. You will help design and oversee operational frameworks to ensure the cluster operates within a set of rigorous SLAs.

    Responsibilities
      • Be a first responder in the event of cluster outages or issues. Triage and resolve urgent issues as they arise
      • Ensure a high degree of cluster uptime (measured in multiple nines), and define and track SLAs to quantify reliability
      • Diagnose systemic or recurring patterns of problems, and engineer precise solutions to them in collaboration with engineering teams
      • Develop robust metrics and observability for cluster health, and use those metrics to inform your work. Build custom observability mechanisms when off-the-shelf ones won't do
      • Help software and research teams design policies around fair cluster usage, and help develop enforcement mechanisms for those policies
      • Assist in forecasting cluster growth, and help select appropriate scale-up strategies
      • Help optimize operations across the dimensions of cost and usability

    Requirements
      • 5+ years of experience in SRE or DevOps roles, preferably as a senior engineer or tech lead
      • Knowledge of HPC/batch compute frameworks (Slurm, Kueue, AWS/GCP Batch) and/or machine learning training systems (Kubeflow, MLflow, Horovod)
      • Ability to develop scripts and utilities of moderate complexity in a common scripting language (Python, Ruby, etc.)
      • Familiarity with infrastructure-as-code and configuration management tools (Terraform, Ansible)
      • Experience with cloud infrastructure (AWS or GCP)
      • Familiarity with designing and implementing modern observability stacks (Prometheus, Grafana, Loki, ELK, OpenTelemetry)
      • Experience with distributed storage technologies (Lustre, Ceph, S3)
      • Embodies a "system engineer" rather than "system administrator" mindset, thinking systematically and leveraging automation
      • Bachelor's degree in computer science

    Preferred Qualifications
      • Hands-on experience with HPC frameworks (Slurm, Grid Engine) and Kubernetes-based job orchestrators (Airflow, Kueue, Kubeflow Pipelines), along with other distributed computing frameworks (Ray, Modin, Dask, Spark)
      • Familiarity with ML frameworks (PyTorch/TensorFlow, JAX, Horovod, DeepSpeed)
      • Familiarity with hybrid/on-prem environments
      • Experience with containerization (Docker, Podman, Singularity), particularly for HPC/batch compute environments
      • Experience with HPC networking (InfiniBand, RDMA)
      • Solid security/IAM foundations (identity management systems, AWS/GCP IAM, Zero Trust)

    The base salary range for this position is $205,000 to $235,000 in the location(s) of this posting. Individual salaries are determined through a variety of factors, including, but not limited to, education, experience, knowledge, skills, and geography. Base salary does not include other forms of total compensation such as bonus compensation and other benefits. Our benefits package includes medical, dental, and vision coverage, life and AD&D insurance, 20 days of paid time off, 9 sick days, and a 401(k) plan with a company match.

    “Friends of Voleon” Candidate Referral Program
    If you have a great candidate in mind for this role and would like the potential to earn $15,000 if your referred candidate is successfully hired and employed by The Voleon Group, please use this form to submit your referral. For more details regarding eligibility, terms, and conditions, please make sure to review the Voleon Referral Bonus Program.

    Equal Opportunity Employer
    The Voleon Group is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.
    $205k-235k yearly Auto-Apply 60d+ ago
  • Site Reliability Engineer 2

    Drivewealth 4.0 company rating

    Remote

    DriveWealth is a global B2B financial technology organization dedicated to democratizing access to financial independence around the world. Our mission is realized through an API-based platform, empowering our partners to offer seamless investing and trading experiences to clients worldwide, all from their mobile devices. Our technology provides partners with a modern, extensible toolkit, enabling traditional investment workflows and innovative techniques like fractional share ownership. DriveWealth has evolved into a global platform offering trading of US equities, mutual funds, ETFs, fixed income, and options. We seek enthusiastic professionals to contribute diverse perspectives and experiences to our Brokerage-as-a-Service platform. Our culture blends the pace and opportunity of a tech start-up with the impact, stability, and significance of Wall Street. We encourage creativity and experimentation while ensuring institutional-grade execution and regulatory compliance in everything we do. We value diversity and inclusion, celebrating the unique differences of our employees as we scale and grow together. We're guided by operating principles grounded in accountability, teamwork, integrity, and solutions built to scale. Join us! About The Role: As a Site Reliability Engineer 2, you will enhance the reliability and performance of our Brokerage-as-a-Service platform during critical 24/7 operations. This role demands a proactive approach to managing technical challenges and system optimizations that align with our global operational strategies. What You'll Do: Support the SRE team in developing and implementing enhancements to support workflows, focusing on automation and efficiency improvements. Handle technical escalations, troubleshoot complex issues, and actively participate in on-call rotations to ensure rapid response and resolution during non-traditional hours. Adhere to and administer incident and change management policies.
Coordinate incident resolution efforts and implement change management protocols to maintain and enhance system reliability, especially during critical system operations at night. Work closely with the New York office to ensure smooth operation and alignment of SRE practices across time zones. What You'll Need: 3+ years in an SRE role or a similar position, demonstrating deep knowledge and expertise in site reliability engineering and operations. Working knowledge of REST APIs and understanding of API integration. Python proficiency in scripting for automation and system management, with a track record of developing and implementing automation solutions. SQL and database expertise in transactional databases, including querying and troubleshooting. Strong analytical skills, with a demonstrated ability to perform troubleshooting and root cause analysis of technical issues. Availability for flexible work hours and willingness to cover US market trading sessions, including L2 on-call coverage. Knowledge of change management processes and risk management.
Nice to Have, But Not Required: Experience in the brokerage or financial industry. Proficiency with cloud services, particularly AWS, and knowledge of cloud architecture best practices, including IAM, EC2, S3, and DynamoDB. Experience maintaining and supporting containerized systems, with familiarity with orchestration tools. Knowledge of Infrastructure as Code (IaC) practices and tools such as Terraform or CloudFormation. Ability to manage and troubleshoot job scheduling tools like Rundeck or Apache Airflow. Advanced skills in managing containerized environments using Kubernetes and OpenShift. Practical experience with Confluent Cloud for event streaming architectures. Experience with Java applications and a basic understanding of using the browser developer console for front-end debugging. Additional Notes: This role is critical for our continuous operations and requires a commitment to nighttime hours, aligning with the global nature of our financial services. Candidates must be prepared for intense collaboration periods and proactive communication across global teams. Applicants must be authorized to work for any employer in the U.S. DriveWealth is unable to sponsor or take over sponsorship of an employment Visa at this time. Compensation: Compensation package offerings are based on candidate experience and technical qualifications as they relate to the role. These are identified and determined throughout your interviewing experience. Please note: at this time, we are not able to hire in all states.
Remote (Most US States). Pay Range: $130,000-$150,000 USD. Benefits: Competitive medical, dental, and vision insurance options. Mental health resources. Generous paid time off with observed holidays (varies per country). Paid parental leave for biological and adoptive parents. Up to $2,500 or local equivalent each year to invest in continued education and personal development. Up to $900 or local equivalent each year for fitness and wellness reimbursement. Company-provided phone (varies by country). For HQ in-office employees, a daily lunch stipend, unlimited snacks, and engaging office space in the Financial District. Pre-tax commuter benefits (US only). Employer 401(k) match (US only). Benefit offerings vary based on country and are subject to change. Equal Employment Opportunity: To build technology and products that are used and loved by people and solve real-world problems, we need to build a team with many different perspectives and experiences. We are an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. We encourage candidates from all backgrounds to apply. Applicants in need of special assistance or accommodation during the interview process or in accessing our website may contact us at **************************. Agency Disclaimer: DriveWealth does not accept agency resumes. Please do not forward resumes to our jobs alias, employees, or any other organization location. DriveWealth is not responsible for any fees related to unsolicited resumes.
    $130k-150k yearly Auto-Apply 40d ago
  • Senior Site Reliability Engineer

    Circle Internet Financial 4.5 company rating

    Remote

    Circle is a financial technology company at the epicenter of the emerging internet of money, where value can finally travel like other digital data - globally, nearly instantly and less expensively than legacy settlement systems. This ground-breaking new internet layer opens up previously unimaginable possibilities for payments, commerce and markets that can help raise global economic prosperity and enhance inclusion. Our infrastructure - including USDC, a blockchain-based dollar - helps businesses, institutions and developers harness these breakthroughs and capitalize on this major turning point in the evolution of money and technology. What you'll be part of: Circle is committed to visibility and stability in everything we do. As we grow as an organization, we're expanding into some of the world's strongest jurisdictions. Speed and efficiency are motivators for our success and our employees live by our company values: High Integrity, Future Forward, Multistakeholder, Mindful, and Driven by Excellence. We have built a flexible and diverse work environment where new ideas are encouraged and everyone is a stakeholder. 
What you'll be responsible for: The Site Reliability Engineer is responsible for building and maintaining Circle's common libraries and infrastructure to support the rapid development of software features; analyzing requirements, procedures, and problems to improve existing systems and modifying systems; building and owning scalable microservices that are responsible for reliable and secure APIs; working with SRE to improve software shipping experience and improve the speed and quality of iteration; building internal developer platform capabilities; collaborating with Product and Engineering teams to design, test, and ship software, including developing and documenting system design procedures, testing procedures, and quality standards; troubleshooting program and system malfunctions to restore normal functioning; consulting with management to ensure agreement on system principles; writing the infrastructure to deliver great development experiences. What you'll bring to Circle: 2-4 years of professional software development experience, with a strong foundation in object-oriented programming, preferably in languages such as Java or Golang. Hands-on experience with major cloud platforms, including AWS, Google Cloud Platform (GCP), and Microsoft Azure. Proficiency with Kubernetes for container orchestration and managing scalable infrastructure. Skilled in SQL database design, including schema modeling and query optimization. Experience in the deployment and operation of production-quality, scalable software. Emphasis on clean, maintainable code with a focus on speed, quality, and high test coverage to support continuous delivery practices. Adaptable and quick learner, comfortable exploring new languages, frameworks, and technologies as needed. Degree in Computer Science or a closely related field (or foreign equivalent). Solid understanding of API design and RESTful architecture, with the ability to derive and communicate well-structured designs. Excellent communicator, able to
collaborate effectively across remote teams and clearly present technical ideas and solutions. Self-motivated with a growth mindset; thrives in fast-paced environments, delivers impactful user-focused software, and continuously seeks to improve without heavy oversight. Circle is on a mission to create an inclusive financial future, with transparency at our core. We consider a wide variety of elements when crafting our compensation ranges and total compensation packages. Starting pay is determined by various factors, including but not limited to: relevant experience, skill set, qualifications, and other business and organizational needs. Please note that compensation ranges may differ for candidates in other locations. Base Pay Range: $147,500 - $195,000. We are an equal opportunity employer and value diversity at Circle. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. Additionally, Circle participates in the E-Verify Program in certain locations, as required by law. Should you require accommodations or assistance in our interview process because of a disability, please reach out to accommodations@circle.com for support. We respect your privacy and will connect with you separately from our interview process to accommodate your needs. #LI-Remote
    $147.5k-195k yearly Auto-Apply 60d+ ago
  • Site Reliability Engineer II

    Jpmorganchase 4.8 company rating

    Columbus, OH jobs

    Play a key role in ensuring system reliability at one of the world's most iconic and largest financial institutions. As a Site Reliability Engineer II at JPMorgan Chase within the Enterprise technology, corporate technology team, you will use technology to solve business problems and leverage software engineering best practices as we strive towards excellence. This role often works independently to execute small to medium projects, but you'll also have the opportunity to collaborate with cross-functional teams to continually improve your knowledge of JPMorgan Chase's business and relevant technologies. Job responsibilities: Execute small to medium projects independently with initial direction, eventually graduating to designing and delivering projects autonomously. Leverage technology to solve business problems by writing high-quality, maintainable, and robust code following best practices in software engineering. Participate in triaging, examining, diagnosing, and resolving incidents, collaborating with others to solve problems at their root. Recognize toil within the role and proactively work towards eliminating it through systems engineering or updating application code. Understand observability patterns and strive to implement and improve service level indicators, service level objectives, monitoring, and alerting solutions for optimal transparency and analysis. Required qualifications, capabilities, and skills: Formal training or certification in software engineering concepts with 2+ years of applied experience. Ability to code in at least one programming language. Experience maintaining a cloud-based infrastructure. Familiarity with site reliability concepts, principles, and practices. Familiarity with observability tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others. Familiarity with containers or a common server OS such as Linux or Windows.
Emerging knowledge of software, applications, and technical processes within a given technical discipline (e.g., Cloud, AI, Android, etc.). Emerging knowledge of continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform. Emerging knowledge of common networking technologies. Preferred qualifications, capabilities, and skills: Ability to work in a large, collaborative team and demonstrate the willingness to vocalize ideas with peers and managers. Understanding of how to prioritize and adjust work plans to adapt to changes in assigned responsibilities and projects. Eagerness to participate in learning opportunities to enhance effectiveness in executing day-to-day project activities. Ability to demonstrate and apply existing and new system processes, methodologies, and skills to contribute to the development of systems. General knowledge of the financial services industry. Knowledge of IDEs and use of coding assistants. Knowledge of generative AI (GenAI) for technology and operations. #LI-ID1
    $97k-119k yearly est. Auto-Apply 60d+ ago
  • Lead Site Reliability Engineer

    Jpmorgan Chase & Co 4.8 company rating

    Columbus, OH jobs

    Job ID: 210704038. Job schedule: Full time. Assume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability. As a Lead Site Reliability Engineer at JPMorgan Chase within the Consumer and Community Banking - Infrastructure & Production Management team, you will hold a leadership role in your team, demonstrate strong knowledge across multiple technical domains, and advise others on the technical and business issues facing them. Take the lead in conducting resiliency design reviews, break up complex problems into digestible work for other engineers, act as a technical lead for medium to large-sized products, and provide advice and mentoring to other engineers. Job responsibilities * Demonstrates and champions site reliability culture and practices and exerts technical influence throughout your team * Leads initiatives to improve the reliability and stability of your team's applications and platforms using data-driven analytics to improve service levels * Collaborates with team members to identify comprehensive service level indicators, and with stakeholders to establish reasonable service level objectives and error budgets with customers * Demonstrates a high level of technical expertise within one or more technical domains and proactively identifies and solves technology-related bottlenecks in your areas of expertise * Acts as the main point of contact during major incidents for your application and demonstrates the skills to identify and solve issues quickly to avoid financial losses * Exhibits deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other SRE best practices, implementing these within an application or a platform * Documents and shares knowledge within your organization via internal forums and communities of practice Required qualifications, capabilities, and skills * Formal training or certification in software engineering concepts and 5+ years of applied experience * Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices, with the ability to implement these practices within an application or platform * Fluency in at least one programming language (e.g., Python, Java) and related frameworks and architectures (e.g., Spring Boot, microservices) * Deep knowledge of software applications and technical processes with emerging depth in one or more technical disciplines * Proficiency and experience in observability, such as white-box and black-box monitoring, SLO alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, etc. * Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform, etc.) * Experience with containers and container orchestration (e.g., ECS, Kubernetes, Docker, etc.) * Experience troubleshooting common networking technologies and issues * Ability to expand and collaborate across different levels and stakeholder groups Preferred qualifications, capabilities, and skills * Ability to identify and solve problems related to complex data structures and algorithms * Drive to self-educate and evaluate new technology, with the ability to teach, train, and coach team members on current technology trends
    $97k-119k yearly est. Auto-Apply 5d ago
  • Site Reliability Engineer II

    Jpmorgan Chase 4.8 company rating

    Columbus, OH jobs

    Play a key role in ensuring system reliability at one of the world's most iconic and largest financial institutions. As a Site Reliability Engineer II at JPMorgan Chase within the Enterprise technology, corporate technology team, you will use technology to solve business problems and leverage software engineering best practices as we strive towards excellence. This role often works independently to execute small to medium projects, but you'll also have the opportunity to collaborate with cross-functional teams to continually improve your knowledge of JPMorgan Chase's business and relevant technologies. **Job responsibilities** + Execute small to medium projects independently with initial direction, eventually graduating to designing and delivering projects autonomously. + Leverage technology to solve business problems by writing high-quality, maintainable, and robust code following best practices in software engineering. + Participate in triaging, examining, diagnosing, and resolving incidents, collaborating with others to solve problems at their root. + Recognize toil within the role and proactively work towards eliminating it through systems engineering or updating application code. + Understand observability patterns and strive to implement and improve service level indicators, service level objectives, monitoring, and alerting solutions for optimal transparency and analysis. **Required qualifications, capabilities, and skills** + Formal training or certification in software engineering concepts with 2+ years of applied experience. + Ability to code in at least one programming language. + Experience maintaining a cloud-based infrastructure. + Familiarity with site reliability concepts, principles, and practices. + Familiarity with observability tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others. + Familiarity with containers or a common server OS such as Linux or Windows.
+ Emerging knowledge of software, applications, and technical processes within a given technical discipline (e.g., Cloud, AI, Android, etc.). + Emerging knowledge of continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform. + Emerging knowledge of common networking technologies. **Preferred qualifications, capabilities, and skills** + Ability to work in a large, collaborative team and demonstrate the willingness to vocalize ideas with peers and managers. + Understanding of how to prioritize and adjust work plans to adapt to changes in assigned responsibilities and projects. + Eagerness to participate in learning opportunities to enhance effectiveness in executing day-to-day project activities. + Ability to demonstrate and apply existing and new system processes, methodologies, and skills to contribute to the development of systems. + General knowledge of the financial services industry. + Knowledge of IDEs and use of coding assistants. + Knowledge of generative AI (GenAI) for technology and operations. #LI-ID1 JPMorganChase, one of the oldest financial institutions, offers innovative financial solutions to millions of consumers, small businesses and many of the world's most prominent corporate, institutional and government clients under the J.P. Morgan and Chase brands. Our history spans over 200 years and today we are a leader in investment banking, consumer and small business banking, commercial banking, financial transaction processing and asset management. We offer a competitive total rewards package including base salary determined based on the role, experience, skill set and location. Those in eligible roles may receive commission-based pay and/or discretionary incentive compensation, paid in the form of cash and/or forfeitable equity, awarded in recognition of individual achievements and contributions. We also offer a range of benefits and programs to meet employee needs, based on eligibility.
These benefits include comprehensive health care coverage, on-site health and wellness centers, a retirement savings plan, backup childcare, tuition reimbursement, mental health support, financial coaching and more. Additional details about total compensation and benefits will be provided during the hiring process. We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants' and employees' religious practices and beliefs, as well as mental health or physical disability needs. Visit our FAQs for more information about requesting an accommodation. JPMorgan Chase & Co. is an Equal Opportunity Employer, including Disability/Veterans
    $97k-119k yearly est. 60d+ ago
  • Lead Site Reliability Engineer

    Jpmorganchase 4.8 company rating

    Columbus, OH jobs

    Assume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability. As a Lead Site Reliability Engineer at JPMorgan Chase within the Consumer and Community Banking - Infrastructure & Production Management team, you will hold a leadership role in your team, demonstrate strong knowledge across multiple technical domains, and advise others on the technical and business issues facing them. Take the lead in conducting resiliency design reviews, break up complex problems into digestible work for other engineers, act as a technical lead for medium to large-sized products, and provide advice and mentoring to other engineers. Job responsibilities: Demonstrates and champions site reliability culture and practices and exerts technical influence throughout your team. Leads initiatives to improve the reliability and stability of your team's applications and platforms using data-driven analytics to improve service levels. Collaborates with team members to identify comprehensive service level indicators, and with stakeholders to establish reasonable service level objectives and error budgets with customers. Demonstrates a high level of technical expertise within one or more technical domains and proactively identifies and solves technology-related bottlenecks in your areas of expertise. Acts as the main point of contact during major incidents for your application and demonstrates the skills to identify and solve issues quickly to avoid financial losses. Exhibits deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other SRE best practices, implementing these within an application or a platform. Documents and shares knowledge within your organization via internal forums and communities of practice. Required qualifications, capabilities, and skills: Formal training or certification in software engineering concepts and 5+ years of applied experience. Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices, with the ability to implement these practices within an application or platform. Fluency in at least one programming language (e.g., Python, Java) and related frameworks and architectures (e.g., Spring Boot, microservices). Deep knowledge of software applications and technical processes with emerging depth in one or more technical disciplines. Proficiency and experience in observability, such as white-box and black-box monitoring, SLO alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, etc. Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform, etc.). Experience with containers and container orchestration (e.g., ECS, Kubernetes, Docker, etc.). Experience troubleshooting common networking technologies and issues. Ability to expand and collaborate across different levels and stakeholder groups. Preferred qualifications, capabilities, and skills: Ability to identify and solve problems related to complex data structures and algorithms. Drive to self-educate and evaluate new technology, with the ability to teach, train, and coach team members on current technology trends.
    $97k-119k yearly est. Auto-Apply 5d ago
  • Staff Reliability Engineer

    The Hartford 4.5 company rating

    Columbus, OH jobs

    Staff Reliability Engineer - IE07KE We're determined to make a difference and are proud to be an insurance company that goes well beyond coverages and policies. Working here means having every opportunity to achieve your goals - and to help others accomplish theirs, too. Join our team as we help shape the future. Position Overview: The Staff Reliability Engineer plays a critical role in maintaining the stability, performance, and scalability of our systems and services. This senior-level position is responsible for implementing best practices in reliability engineering, driving continuous improvement, and mentoring team members. The ideal candidate possesses deep technical expertise, strong problem-solving skills, and a passion for building resilient infrastructure. Key Responsibilities + Lead the design, implementation, and optimization of reliable systems and infrastructure. + Collaborate with software engineering, operations, and product teams to ensure uptime and availability targets are met. + Develop and maintain monitoring, alerting, and incident response strategies to detect and resolve issues quickly. + Conduct root cause analysis of system failures and drive corrective actions to prevent recurrence. + Advocate for reliability best practices and foster a culture of proactive risk mitigation across the organization. + Mentor and provide technical guidance to other reliability engineers and cross-functional team members. + Develop automation tools to enhance efficiency in deployment, monitoring, and recovery processes. + Participate in capacity planning, performance testing, and disaster recovery exercises. + Stay current with industry trends, emerging technologies, and best practices in reliability engineering. Qualifications + 5+ years of experience in reliability engineering, site reliability engineering (SRE), or related roles. + Expertise in cloud platforms (e.g., AWS, Azure, Google Cloud) and container orchestration (e.g., Kubernetes). 
+ Strong programming skills in one or more languages (e.g., Python, Java). + Proven experience with logging and monitoring tools (e.g., Splunk, Dynatrace, Datadog) and incident management frameworks (e.g., ServiceNow). + Excellent analytical, troubleshooting, and communication skills. + Ability to lead complex projects and influence stakeholders at all levels. Preferred Skills + Experience with infrastructure as code (e.g., Terraform, CloudFormation). + Knowledge of security best practices and compliance requirements. + Background in high-availability architectures and distributed systems. + Certifications in cloud or reliability engineering domains are a plus. Work Environment This position may require participation in an on-call rotation and occasional after-hours support for critical incidents. We offer a dynamic, collaborative environment where innovation and reliability are valued. This role will have a hybrid work schedule, with the expectation of working in an office (Columbus, OH, Chicago, IL, Hartford, CT or Charlotte, NC) 3 days a week (Tuesday through Thursday). Candidates must be authorized to work in the US without company sponsorship. The company will not support the STEM OPT I-983 Training Plan endorsement for this position. Compensation The listed annualized base pay range is primarily based on analysis of similar positions in the external market. Actual base pay could vary and may be above or below the listed range based on factors including but not limited to performance, proficiency and demonstration of competencies required for the role. The base pay is just one component of The Hartford's total compensation package for employees. Other rewards may include short-term or annual bonuses, long-term incentives, and on-the-spot recognition.
The annualized base pay range for this role is: $127,600 - $191,400. Equal Opportunity Employer/Sex/Race/Color/Veterans/Disability/Sexual Orientation/Gender Identity or Expression/Religion/Age. Every day, a day to do right. Showing up for people isn't just what we do. It's who we are - and have been for more than 200 years. We're devoted to finding innovative ways to serve our customers, communities and employees, continually asking ourselves what more we can do. Is our policy language as simple and inclusive as it can be? Can we better help businesses navigate our ever-changing world? What else can we do to destigmatize mental health in the workplace? Can we make our communities more equitable? That we can rise to the challenge of these questions is due in no small part to our company values that our employees have shaped and defined. And while how we contribute looks different for each of us, it's these values that drive all of us to do more and to do better every day.
    $127.6k-191.4k yearly 11d ago
  • Staff Reliability Engineer

    The Hartford 4.5 company rating

    Columbus, OH jobs

    Staff Reliability Engineer - IE07KE We're determined to make a difference and are proud to be an insurance company that goes well beyond coverages and policies. Working here means having every opportunity to achieve your goals - and to help others accomplish theirs, too. Join our team as we help shape the future. Position Overview: The Staff Reliability Engineer plays a critical role in maintaining the stability, performance, and scalability of our systems and services. This senior-level position is responsible for implementing best practices in reliability engineering, driving continuous improvement, and mentoring team members. The ideal candidate possesses deep technical expertise, strong problem-solving skills, and a passion for building resilient infrastructure. Key Responsibilities * Lead the design, implementation, and optimization of reliable systems and infrastructure. * Collaborate with software engineering, operations, and product teams to ensure uptime and availability targets are met. * Develop and maintain monitoring, alerting, and incident response strategies to detect and resolve issues quickly. * Conduct root cause analysis of system failures and drive corrective actions to prevent recurrence. * Advocate for reliability best practices and foster a culture of proactive risk mitigation across the organization. * Mentor and provide technical guidance to other reliability engineers and cross-functional team members. * Develop automation tools to enhance efficiency in deployment, monitoring, and recovery processes. * Participate in capacity planning, performance testing, and disaster recovery exercises. * Stay current with industry trends, emerging technologies, and best practices in reliability engineering. Qualifications * 5+ years of experience in reliability engineering, site reliability engineering (SRE), or related roles. * Expertise in cloud platforms (e.g., AWS, Azure, Google Cloud) and container orchestration (e.g., Kubernetes). 
* Strong programming skills in one or more languages (e.g., Python, Java). * Proven experience with logging and monitoring tools (e.g., Splunk, Dynatrace, Datadog) and incident management frameworks (e.g., ServiceNow). * Excellent analytical, troubleshooting, and communication skills. * Ability to lead complex projects and influence stakeholders at all levels. Preferred Skills * Experience with infrastructure as code (e.g., Terraform, CloudFormation). * Knowledge of security best practices and compliance requirements. * Background in high-availability architectures and distributed systems. * Certifications in cloud or reliability engineering domains are a plus. Work Environment This position may require participation in an on-call rotation and occasional after-hours support for critical incidents. We offer a dynamic, collaborative environment where innovation and reliability are valued. This role will have a hybrid work schedule, with the expectation of working in an office (Columbus, OH; Chicago, IL; Hartford, CT; or Charlotte, NC) 3 days a week (Tuesday through Thursday). Candidates must be authorized to work in the US without company sponsorship. The company will not support the STEM OPT I-983 Training Plan endorsement for this position. Compensation The listed annualized base pay range is primarily based on analysis of similar positions in the external market. Actual base pay could vary and may be above or below the listed range based on factors including but not limited to performance, proficiency and demonstration of competencies required for the role. The base pay is just one component of The Hartford's total compensation package for employees. Other rewards may include short-term or annual bonuses, long-term incentives, and on-the-spot recognition. 
The annualized base pay range for this role is: $127,600 - $191,400 Equal Opportunity Employer/Sex/Race/Color/Veterans/Disability/Sexual Orientation/Gender Identity or Expression/Religion/Age
    $127.6k-191.4k yearly Auto-Apply 12d ago
  • Lead Site Reliability Engineer

    JPMorgan Chase Bank, N.A. 4.8 company rating

    Westerville, OH jobs

    Assume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability. As a Lead Site Reliability Engineer at JPMorgan Chase within the Consumer and Community Banking - Infrastructure & Production Management team, you will hold a leadership role in your team, demonstrate strong knowledge across multiple technical domains, and advise others on the technical and business issues facing them. Take the lead in conducting resiliency design reviews, break up complex problems into digestible work for other engineers, act as a technical lead for medium- to large-sized products, and provide advice and mentoring to other engineers. **Job responsibilities** + Demonstrates and champions site reliability culture and practices and exerts technical influence throughout your team + Leads initiatives to improve the reliability and stability of your team's applications and platforms using data-driven analytics to improve service levels + Collaborates with team members to identify comprehensive service level indicators and with stakeholders to establish reasonable service level objectives and error budgets with customers + Demonstrates a high level of technical expertise within one or more technical domains and proactively identifies and solves technology-related bottlenecks in your areas of expertise + Acts as the main point of contact during major incidents for your application and demonstrates the skills to identify and solve issues quickly to avoid financial losses + Exhibits deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other SRE best practices, implementing these within an application or a platform + Documents and shares knowledge within your organization via internal forums and communities of practice **Required qualifications, capabilities, and skills** + Formal training or certification on software engineering concepts and 5+ years of applied experience + Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices with the ability to implement these practices within an application or platform + Fluency in at least one programming language (e.g., Python, Java/Spring Boot, microservices) + Deep knowledge of software applications and technical processes with emerging depth in one or more technical disciplines + Proficiency and experience in observability such as white-box and black-box monitoring, SLO alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk + Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform) + Experience with containers and container orchestration (e.g., Docker, ECS, Kubernetes) + Experience troubleshooting common networking technologies and issues + Ability to expand and collaborate across different levels and stakeholder groups **Preferred qualifications, capabilities, and skills** + Ability to identify and solve problems related to complex data structures and algorithms + Drive to self-educate and evaluate new technology, with the ability to teach, train, and coach team members on current technology trends Chase is a leading financial services firm, helping nearly half of America's households and small businesses achieve their financial goals through a broad range of financial products. Our mission is to create engaged, lifelong relationships and put our customers at the heart of everything we do. We also help small businesses, nonprofits and cities grow, delivering solutions to solve all their financial needs. We offer a competitive total rewards package including base salary determined based on the role, experience, skill set and location. 
Those in eligible roles may receive commission-based pay and/or discretionary incentive compensation, paid in the form of cash and/or forfeitable equity, awarded in recognition of individual achievements and contributions. We also offer a range of benefits and programs to meet employee needs, based on eligibility. These benefits include comprehensive health care coverage, on-site health and wellness centers, a retirement savings plan, backup childcare, tuition reimbursement, mental health support, financial coaching and more. Additional details about total compensation and benefits will be provided during the hiring process. We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants' and employees' religious practices and beliefs, as well as mental health or physical disability needs. Visit our FAQs for more information about requesting an accommodation. Equal Opportunity Employer/Disability/Veterans
    $97k-119k yearly est. 3d ago
  • Lead Site Reliability Engineer

    JPMorgan Chase 4.8 company rating

    Westerville, OH jobs

    Assume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability. As a Lead Site Reliability Engineer at JPMorgan Chase within the Consumer and Community Banking - Infrastructure & Production Management team, you will hold a leadership role in your team, demonstrate strong knowledge across multiple technical domains, and advise others on the technical and business issues facing them. Take the lead in conducting resiliency design reviews, break up complex problems into digestible work for other engineers, act as a technical lead for medium- to large-sized products, and provide advice and mentoring to other engineers. **Job responsibilities** + Demonstrates and champions site reliability culture and practices and exerts technical influence throughout your team + Leads initiatives to improve the reliability and stability of your team's applications and platforms using data-driven analytics to improve service levels + Collaborates with team members to identify comprehensive service level indicators and with stakeholders to establish reasonable service level objectives and error budgets with customers + Demonstrates a high level of technical expertise within one or more technical domains and proactively identifies and solves technology-related bottlenecks in your areas of expertise + Acts as the main point of contact during major incidents for your application and demonstrates the skills to identify and solve issues quickly to avoid financial losses + Exhibits deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other SRE best practices, implementing these within an application or a platform + Documents and shares knowledge within your organization via internal forums and communities of practice **Required qualifications, capabilities, and skills** + Formal training or certification on software engineering concepts and 5+ years of applied experience + Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices with the ability to implement these practices within an application or platform + Fluency in at least one programming language (e.g., Python, Java/Spring Boot, microservices) + Deep knowledge of software applications and technical processes with emerging depth in one or more technical disciplines + Proficiency and experience in observability such as white-box and black-box monitoring, SLO alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk + Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform) + Experience with containers and container orchestration (e.g., Docker, ECS, Kubernetes) + Experience troubleshooting common networking technologies and issues + Ability to expand and collaborate across different levels and stakeholder groups **Preferred qualifications, capabilities, and skills** + Ability to identify and solve problems related to complex data structures and algorithms + Drive to self-educate and evaluate new technology, with the ability to teach, train, and coach team members on current technology trends Chase is a leading financial services firm, helping nearly half of America's households and small businesses achieve their financial goals through a broad range of financial products. Our mission is to create engaged, lifelong relationships and put our customers at the heart of everything we do. We also help small businesses, nonprofits and cities grow, delivering solutions to solve all their financial needs. We offer a competitive total rewards package including base salary determined based on the role, experience, skill set and location. 
Those in eligible roles may receive commission-based pay and/or discretionary incentive compensation, paid in the form of cash and/or forfeitable equity, awarded in recognition of individual achievements and contributions. We also offer a range of benefits and programs to meet employee needs, based on eligibility. These benefits include comprehensive health care coverage, on-site health and wellness centers, a retirement savings plan, backup childcare, tuition reimbursement, mental health support, financial coaching and more. Additional details about total compensation and benefits will be provided during the hiring process. We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants' and employees' religious practices and beliefs, as well as mental health or physical disability needs. Visit our FAQs for more information about requesting an accommodation. Equal Opportunity Employer/Disability/Veterans
    $97k-119k yearly est. 2d ago
  • Senior Lead Site Reliability Engineer

    JPMorgan Chase 4.8 company rating

    Columbus, OH jobs

    Elevate your engineering prowess to unprecedented levels by joining a team of exceptionally gifted professionals and position yourself among the top echelon in site reliability. As a **Senior Lead Site Reliability Engineer** at **JPMorgan Chase** within the **Infrastructure Platforms team of Corporate Technology**, you work with your fellow stakeholders to define non-functional requirements (NFRs) and availability targets for the services in your application and product lines. You will ensure those NFRs are accounted for in your products' design and test phases, that your service level indicators are effectively measuring customer experience, and that service level objectives are defined with stakeholders and implemented in production. **Job responsibilities** + Creates **high-quality designs, roadmaps, and program charters** that are delivered by you or the engineers under your guidance + Provides advice and mentoring to other engineers and acts as a key resource for technologists seeking advice on technical and business-related issues + Demonstrates site reliability principles and practices every day and champions the adoption of site reliability throughout your team + Collaborates with others to create and implement observability and reliability designs for complex systems that are robust, stable, and do not incur additional toil or technical debt + Works toward becoming an expert on the applications and platforms in your remit while understanding their interdependencies and limitations + Evolves and debugs critical components of applications and platforms + Provides comprehensive and ongoing guidance, tools, and solutions to support the firm's growth + Makes significant contributions to JPMorgan Chase's site reliability community via internal forums, communities of practice, guilds, and conferences **Required qualifications, capabilities, and skills** + Formal training or certification on software engineering concepts and 5+ years of applied experience + Advanced knowledge of site reliability culture and principles with demonstrated ability to implement site reliability within an application or platform + Advanced knowledge and experience in observability such as white-box and black-box monitoring, service level objectives, alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk + Advanced knowledge of software applications and technical processes with considerable depth in one or more technical disciplines + Ability to communicate data-based solutions with complex reporting and visualization methods + Ability to anticipate, identify, and troubleshoot defects found during testing **Preferred qualifications, capabilities, and skills** + Strong communication skills with the ability to mentor and educate others on site reliability principles and practices + Recognized as an active contributor to the engineering community + Continues to expand their network and leads evaluation sessions with vendors to see how offerings can fit into the firm's strategy + A fintech background is a plus JPMorganChase, one of the oldest financial institutions, offers innovative financial solutions to millions of consumers, small businesses and many of the world's most prominent corporate, institutional and government clients under the J.P. Morgan and Chase brands. Our history spans over 200 years and today we are a leader in investment banking, consumer and small business banking, commercial banking, financial transaction processing and asset management. We offer a competitive total rewards package including base salary determined based on the role, experience, skill set and location. Those in eligible roles may receive commission-based pay and/or discretionary incentive compensation, paid in the form of cash and/or forfeitable equity, awarded in recognition of individual achievements and contributions. We also offer a range of benefits and programs to meet employee needs, based on eligibility. 
These benefits include comprehensive health care coverage, on-site health and wellness centers, a retirement savings plan, backup childcare, tuition reimbursement, mental health support, financial coaching and more. Additional details about total compensation and benefits will be provided during the hiring process. We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants' and employees' religious practices and beliefs, as well as mental health or physical disability needs. Visit our FAQs for more information about requesting an accommodation. JPMorgan Chase & Co. is an Equal Opportunity Employer, including Disability/Veterans
    $108k-132k yearly est. 60d+ ago
  • Senior Lead Site Reliability Engineer

    JPMorganChase 4.8 company rating

    Columbus, OH jobs

    Elevate your engineering prowess to unprecedented levels by joining a team of exceptionally gifted professionals and position yourself among the top echelon in site reliability. As a Senior Lead Site Reliability Engineer at JPMorgan Chase within the Infrastructure Platforms team of Corporate Technology, you work with your fellow stakeholders to define non-functional requirements (NFRs) and availability targets for the services in your application and product lines. You will ensure those NFRs are accounted for in your products' design and test phases, that your service level indicators are effectively measuring customer experience, and that service level objectives are defined with stakeholders and implemented in production. **Job responsibilities** + Creates high-quality designs, roadmaps, and program charters that are delivered by you or the engineers under your guidance + Provides advice and mentoring to other engineers and acts as a key resource for technologists seeking advice on technical and business-related issues + Demonstrates site reliability principles and practices every day and champions the adoption of site reliability throughout your team + Collaborates with others to create and implement observability and reliability designs for complex systems that are robust, stable, and do not incur additional toil or technical debt + Works toward becoming an expert on the applications and platforms in your remit while understanding their interdependencies and limitations + Evolves and debugs critical components of applications and platforms + Provides comprehensive and ongoing guidance, tools, and solutions to support the firm's growth + Makes significant contributions to JPMorgan Chase's site reliability community via internal forums, communities of practice, guilds, and conferences **Required qualifications, capabilities, and skills** + Formal training or certification on software engineering concepts and 5+ years of applied experience + Advanced knowledge of site reliability culture and principles with demonstrated ability to implement site reliability within an application or platform + Advanced knowledge and experience in observability such as white-box and black-box monitoring, service level objectives, alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk + Advanced knowledge of software applications and technical processes with considerable depth in one or more technical disciplines + Ability to communicate data-based solutions with complex reporting and visualization methods + Ability to anticipate, identify, and troubleshoot defects found during testing **Preferred qualifications, capabilities, and skills** + Strong communication skills with the ability to mentor and educate others on site reliability principles and practices + Recognized as an active contributor to the engineering community + Continues to expand their network and leads evaluation sessions with vendors to see how offerings can fit into the firm's strategy + A fintech background is a plus
    $108k-132k yearly est. Auto-Apply 60d+ ago
  • Senior Site Reliability Engineer

    DriveWealth 4.0 company rating

    Remote

    DriveWealth is a global B2B financial technology organization dedicated to democratizing access to financial independence around the world. Our mission is realized through an API-based platform, empowering our partners to offer seamless investing and trading experiences to clients worldwide, all from their mobile devices. Our technology provides partners with a modern, extensible toolkit, enabling traditional investment workflows and innovative techniques like fractional share ownership. DriveWealth has evolved into a global platform offering trading of US equities, mutual funds, ETFs, fixed income, and options. We seek enthusiastic professionals to contribute diverse perspectives and experiences to our Brokerage-as-a-Service platform. Our culture blends the pace and opportunity of a tech start-up with the impact, stability, and significance of Wall Street. We encourage creativity and experimentation while ensuring institutional-grade execution and regulatory compliance in everything we do. We value diversity and inclusion, celebrating the unique differences of our employees as we scale and grow together. We're guided by operating principles grounded in accountability, teamwork, integrity, and solutions built to scale. Join us! About The Role As a Senior Site Reliability Engineer based in the US, you will enhance the reliability and performance of our Brokerage-as-a-Service platform during critical 24/7 operations. This role demands a proactive approach to managing technical challenges and system optimizations aligned with our global operational strategies. 
What You'll Do Support the SRE team in developing and implementing enhancements to support workflows, focusing on automation and efficiency improvements Handle technical escalations, troubleshoot complex FIX and API connectivity issues, and actively participate in on-call rotations during non-traditional hours to ensure rapid response and resolution Adhere to and administer incident and change management policies Coordinate incident resolution efforts and implement change management protocols to maintain and enhance system reliability Work closely with the Lithuania office to ensure smooth operation and alignment of SRE practices across time zones Coordinate incident postmortems and root cause analysis (RCA) Design, implement, and maintain comprehensive monitoring, logging, and tracing solutions (observability stack) to provide deep insights into system performance and user experience Partner with product and engineering teams to define clear Service Level Indicators (SLIs) and Service Level Objectives (SLOs), managing error budgets to ensure service reliability meets business needs What You'll Need 3+ years in a senior SRE role or a similar position, demonstrating deep knowledge and expertise in site reliability engineering and operations Knowledge of the FIX protocol and messages, and the ability to read FIX logs Familiarity with REST APIs and a strong understanding of API integration Proficient in Python and scripting for automation and system management, with a proven track record of developing and implementing automation solutions Expertise in SQL and transactional databases, including querying and troubleshooting Strong analytical and troubleshooting skills with a proven ability to identify and resolve technical issues through root cause analysis In-depth knowledge of core networking concepts including TCP/IP, routing, and DNS Familiarity with maintaining and troubleshooting systems in both cloud (AWS) and co-location (colo) environments Availability for flexible work hours and willingness to cover US market trading sessions, including L2 on-call coverage Knowledge of change management processes and risk management Nice to Have, But Not Required Experience in the brokerage or financial industry Proficient with cloud services, particularly AWS, and knowledgeable about cloud architecture best practices, including IAM, EC2, S3, and DynamoDB Experience maintaining and supporting containerized systems, with familiarity with orchestration tools Knowledge of Infrastructure as Code (IaC) practices and tools such as Terraform or CloudFormation Ability to manage and troubleshoot job scheduling tools like Rundeck or Apache Airflow Advanced skills in managing containerized environments using Kubernetes and OpenShift Practical experience with Confluent Cloud or Redpanda for event streaming architectures Experience with API-based applications and a basic understanding of using the browser developer console for front-end debugging Additional Notes: This role is critical for our continuous operations and requires a commitment to nighttime hours, aligning with the global nature of our financial services. Candidates must be prepared for intense collaboration periods and proactive communication across global teams. Applicants must be authorized to work for any employer in the U.S. DriveWealth is unable to sponsor or take over sponsorship of an employment Visa at this time. Compensation Compensation package offerings are based on candidate experience and technical qualifications, as they relate to the role. These are identified and determined throughout your interviewing experience. Please note: at this time, we are not able to hire in all states. 
Remote (Most US States) Pay Range: $150,000-$170,000 USD Benefits Competitive medical, dental, and vision insurance options Mental health resources Generous paid time off with observed holidays (varies per country) Paid parental leave for biological and adoptive parents Up to $2,500 or local equivalent each year to invest in continued education and personal development Up to $900 each year or local equivalent for fitness and wellness reimbursement Company-provided phone (varies by country) For HQ in-office employees, a daily lunch stipend, unlimited snacks, and engaging office space in the Financial District Pre-tax commuter benefits (US only) Employer 401K match (US only) Benefit offerings vary based on country and are subject to change. Equal Employment Opportunity To build technology and products that are used and loved by people and solve real-world problems, we need to build a team with many different perspectives and experiences. We are an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. We encourage candidates from all backgrounds to apply. Applicants in need of special assistance or accommodation during the interview process or in accessing our website may contact us at **************************. Agency Disclaimer DriveWealth does not accept agency resumes. Please do not forward resumes to our jobs alias, employees, or any other organization location. DriveWealth is not responsible for any fees related to unsolicited resumes.
    $150k-170k yearly Auto-Apply 1d ago
