
Reliability Engineer jobs at Tradeweb

- 99 jobs
  • Site Reliability Engineer

    Tradeweb Markets 4.9company rating

    Reliability engineer job at Tradeweb

    Tradeweb is a global leader in electronic trading for rates, credit, equities, and money markets. As financial markets become increasingly interconnected, our technology enables efficient, multi-asset trading on a global scale. We serve more than 3,000 clients in more than 85 countries, including many of the world's largest banks, asset managers, hedge funds, insurers, corporations, and wealth managers. Creative collaboration and sharp client focus have helped fuel our organic growth. We facilitated average daily trading volume (ADV) of more than $2.2 trillion over the past four fiscal quarters, topping $2.5 trillion in ADV for the first quarter of 2025. Since our IPO in 2019, Tradeweb has completed four acquisitions and doubled our revenues - and 2024 was our 25th consecutive year of record revenues. Tradeweb is a great place to work, recognized in 2024 by Forbes as one of America's Best Companies (2024) and by U.S. News & World Report as one of the Best Financial Services Companies to Work For. Tradeweb Markets LLC ("Tradeweb") is proud to be an EEO Minorities/Females/Protected Veterans/Disabled/Affirmative Action Employer. Mission: Move first and never stop. Collaborate with clients to create and build solutions that drive efficiency, connectivity, and transparency in electronic trading. Group Details Tradeweb is seeking a driven Site Reliability Engineer to enhance the reliability of our trading systems and core client-facing applications, with a focus on observability, troubleshooting, optimization, and collaboration. You will be working to enhance the reliability and performance of the Tradeweb Viewer, our core client-facing application, focusing on the network layer and distributed systems. The ideal candidate combines technical expertise with strong communication and client-facing skills to deliver seamless user experiences in complex trading environments.
Tradeweb Technology jobs are fully remote. The Tradeweb Technology hub is located in our Jersey City office, which can be used for team meetings and collaboration efforts. There may be days where travel to the Jersey City office is recommended for organizational off-sites. Job Responsibilities Develop, maintain, and improve distributed systems and applications. Collaborate with external clients, including occasional on-site visits, to resolve critical issues in the Tradeweb Viewer. Troubleshoot and resolve complex connectivity issues related to proxies, networking protocols (TCP/IP, HTTPS, SSL), and client-server interactions. Optimize application performance, reliability, and scalability using Python, JavaScript, and C++. Analyze network traffic (e.g., Wireshark, packet capture) to identify root causes of latency, errors, or connectivity failures. Design and implement data-driven solutions for monitoring, analytics, and system improvements. Work closely with engineering teams to ensure application performance and reliability. Engage in continuous improvement efforts, identifying and implementing best practices in software engineering, testing, and deployment. Qualifications 1-4 years of software engineering experience with a focus on robust, scalable systems. Proficiency in a high-level programming language such as Python or Java, and familiarity with C++; experience with asynchronous programming and web fundamentals (JavaScript, HTTP). Foundational understanding of networking concepts (TCP/IP, SSL/TLS, proxies). Strong problem-solving skills with the ability to debug complex, distributed systems. Excellent communication skills to liaise with technical/non-technical stakeholders and clients. Self-starter mentality with a passion for learning new technologies and domains. Preferred (Not Mandatory): Understanding of diagnostic network tools (Wireshark, Pico). Understanding of encryption and security best practices and their performance implications.
Knowledge of data analytics, visualization, or performance optimization techniques. Familiarity with client-facing support or DevOps practices in financial/enterprise environments. Additional Information Tradeweb is committed to providing valuable and competitive benefits. In addition to working in our culture of innovation and collaboration, we offer: Health Insurance: Highly competitive medical, dental, and vision programs. Hybrid Environment: Our employees have the flexibility of working in the office and from home. Health Care and Dependent Care Flexible Spending Accounts: You may elect to set aside pre-tax earnings to pay for eligible health care and dependent day care expenses for you and your eligible family members. Maven Family Building Benefit: Maven offers support for fertility and preconception; pregnancy and post-partum; adoption; surrogacy and pediatrics for children up to age 10. Tradeweb provides a $10,000 lifetime reimbursement towards fertility, egg freezing, adoption and surrogacy expenses. Building Wealth - 401(k) Savings Plan: Employees are immediately eligible for the 401(k) plan. Participants may contribute up to 75% of eligible compensation into a traditional 401(k) and/or Roth 401(k). Tradeweb will match 100% of the first 4% of compensation that you contribute. The pay range for this role is currently $120,000 to $175,000 per year, based on a regular, full-time schedule. The amount of pay offered will be determined by a number of factors, including but not limited to qualifications, market data, and internal guidelines. This role will also be eligible to participate in Tradeweb's discretionary bonus program. This role is expected to remain open until 11/29/25. Other Benefit Programs: Pre-Tax Commuter Benefits Program, ARAG Legal Services, Employee Assistance Program, Tuition Reimbursement, Financial Wellness Tools, Travel Assistance Benefits, Pet Insurance, Corporate Gym Subsidies, Wellness Perks, Paid Time Off and Parental Leave.
    $120k-175k yearly Auto-Apply 45d ago
  • Site Reliability Engineer

    Mio Partners 4.5company rating

    New York, NY jobs

    MIO Partners, Inc. (MIO) provides proprietary investment products to McKinsey's retirement plan and partners and offers independent, high-quality financial advice to McKinsey's partners. We manage a wide array of investment vehicles with significant expertise and a long and successful track record in alternative strategies, including hedge funds and private equity. We have a multibillion-dollar portfolio of assets under management, and we manage assets for and advise only McKinsey-related clients; we do not accept outside or third-party investments. MIO is a values-based organization that is strongly aligned with our investors' interests. MIO measures success as performance relative to a market-based benchmark. MIO, a 250+ person registered investment adviser, provides ample opportunities for somebody with an entrepreneurial drive to shine. We strive to meet the highest professional standards and build an organization that attracts, develops, and retains exceptional people. MIO is a wholly owned subsidiary of McKinsey, but our activities are kept entirely separate from those of the consulting Firm. Primary responsibilities The successful candidate will have extensive technical experience working with AWS cloud technologies, preferably for financial services firms, such as asset managers, hedge funds, and/or broker/dealers. 
The new hire must lead by example and work collaboratively to: Design and maintain monitoring systems and dashboards Architect and manage cloud infrastructure (AWS, Azure) with security, stability, and cost in mind Implement CI/CD pipelines for reliable software delivery Establish infrastructure as code practices using CDK, GitLab, AWS developer tools Contribute to MIO application codebase to follow resiliency and performance best practices Ensure application architectures follow cloud best practices for reliability, security, performance, and efficiency Work with development teams to improve deployment processes and system reliability Collaborate with business owners to translate business requirements into technical solutions with an eye toward technology consistency and best practices Work with engineers, business users, and other stakeholders to understand their needs and ensure solutions align with business goals Maintain detailed documentation for reference architectures, design patterns, and system configurations Raise the bar on our development capabilities, standards, and processes Synthesize requirements gathered from various teams within/outside of IT and suggest creative solutions; where appropriate, guiding MIO to “do it the right way” Following a scrum methodology, organize with end users, business analysts, and other architects and developers Recommend positive steps toward standardizing development processes, including technology selection, deployment steps, code reviews, and IT tools Partner with development, QA, and AppSecOps teams to promote standardization, consistency, and improved security posture Our applications are primarily developed using Python/Django and libraries such as Pandas, NumPy, PL/SQL. In addition, we utilize SQL Server, MySQL, Elastic Search, Redis, Kafka, Tableau, and various third-party APIs and data sources. Our applications are hosted in AWS using docker containers on ECS/EC2 platforms. 
Primary responsibilities estimated percentage allocation: 25% Technology Leadership (design, mentoring); 15% Relationship Building (requirements); 60% Heads Down Development. Desired background Please note applicants must be authorized to work in the U.S. without current or future visa sponsorship. 8+ years of hands-on experience in DevOps, SRE, or platform engineering roles. Bachelor of science in computer science or other related discipline (although strong experience with a less directly related degree will be considered). Strong experience in AWS Cloud technologies. Knowledge of CI/CD pipeline tools (GitLab pipelines, Jenkins etc.). Understanding of monitoring and observability tools (ELK, Dynatrace, Datadog etc.). Experience with microservices, serverless architectures, and containerization. Proficiency in AWS cloud platform including infrastructure-as-code and CI/CD pipelines. Formal problem-solving and/or analytical training/experience a plus, as is experience working with management consultants. Good intuition for end-user requirements gathering; iterative and collaborative approach to design. Strong client relationship management skills and excellent written/verbal communication skills to interact at all levels. ***************** MIO Partners, Inc. (MIO) is an equal opportunity employer. MIO will consider all applicants regardless of race, color, religion, sex, sexual orientation, gender identity, national origin, veteran status, or disability status. MIO has adopted a flexible, hybrid model that supports a blend of in-office and remote work. Our office is in New York City. Certain US states require MIO Partners, Inc. to include a reasonable estimate of the salary range for this role. Actual salaries may vary and may be above or below the range based on various factors, including, but not limited to an individual's assigned office location, experience, and expertise.
Certain roles are also eligible for bonuses, subject to MIO's discretion and based on factors such as individual and/or organizational performance. Additionally, MIO offers a comprehensive benefits package, including medical, dental and vision coverage, telemedicine services, life, accident and disability insurance, parental leave and family planning benefits, caregiving resources, a generous retirement program, financial guidance, and paid time off. Base salary range: $175,000-$200,000 USD. We are committed to protecting your privacy. Please review our Applicant Privacy Policy for a detailed explanation of how we collect, use, and protect your personal information.
    $175k-200k yearly Auto-Apply 12d ago
  • Site Reliability Engineer

    The Voleon Group 4.1company rating

    Remote

    Voleon is a technology company that applies state-of-the-art AI and machine learning techniques to real-world problems in finance. For nearly two decades, we have led our industry and worked at the frontier of applying AI/ML to investment management. We have become a multibillion-dollar asset manager, and we have ambitious goals for the future. Your colleagues will include internationally recognized experts in artificial intelligence and machine learning research as well as highly experienced finance and technology professionals. The people who shape our company come from other backgrounds, including concert music performances, humanitarian aid, opera singing, sports writing, and BMX racing. You will be part of a team that loves to succeed together. In addition to our enriching and collegial working environment, we offer highly competitive compensation and benefits packages, technology talks by our experts, a beautiful modern office, daily catered lunches, and more. As a Site Reliability Engineer (SRE), you will work at the intersection of production operations and software development as you improve, manage, and monitor production-critical infrastructure and data pipelines. At Voleon, many SREs serve together on a Production Operations team tasked with improving shared production infrastructure. Others are embedded with teams of software engineers to improve specific production systems owned by those teams. Voleon SREs work on important real-world problems and collaborate with passionate and talented colleagues in an empowering, results-driven environment. 
This role is a way to make a real difference: your contributions will make our critical systems more reliable, lower operational risk, and increase the efficiency of our engineering effort. Responsibilities Improve fault-tolerance and maintainability of code in proprietary data pipelines and trading systems. Diagnose and fix bugs in code. Lead complex deployments. Automate manual workflows. Track and prioritize outstanding production-related issues. Share an on-call rotation responding to incidents to ensure the continuous operation of production-critical systems. Requirements Experience with coding and debugging Python. Experience with Linux. Familiarity with Relational Databases & SQL. Sharp analytical and problem-solving skills and a persistent drive to make things work (better). Strong growth mindset and a passion for learning. Strong technical communication skills. Attention to detail. 2 years of relevant industry experience. An undergraduate degree or comparable training in a quantitative field or equivalent, relevant industry experience. Preferred Qualifications Familiarity with best practices concerning code maintainability, documentation, quality assurance, continuous integration and deployment. Experience supporting production systems. Experience with any of the following: gRPC microservices, Postgres, Pandas, Golang, R, Git, Jenkins, Bazel, Prometheus, Grafana, Airflow, Kubernetes. The base salary for this position is $115,000 to $135,000 in the location(s) of this posting. Individual salaries are determined through a variety of factors, including, but not limited to, education, experience, knowledge, skills, and geography. Base salary does not include other forms of total compensation such as bonus compensation and other benefits. Our benefits package includes medical, dental and vision coverage, life and AD&D insurance, 20 days of paid time off, 9 sick days, and a 401(k) plan with a company match.
“Friends of Voleon” Candidate Referral Program: If you have a great candidate in mind for this role and would like to have the potential to earn $7,500 - $15,000 if your referred candidate is successfully hired and employed by The Voleon Group, please use this form to submit your referral. For more details regarding eligibility, terms and conditions please make sure to review the Voleon Referral Bonus Program. Equal Opportunity Employer: The Voleon Group is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.
    $115k-135k yearly Auto-Apply 4d ago
  • Site Reliability Engineer 2

    Drivewealth 4.0company rating

    Remote

    DriveWealth is a global B2B financial technology organization dedicated to democratizing access to financial independence around the world. Our mission is realized through an API-based platform, empowering our partners to offer seamless investing and trading experiences to clients worldwide, all from their mobile devices. Our technology provides partners with a modern, extensible toolkit, enabling traditional investment workflows and innovative techniques like fractional share ownership. DriveWealth has evolved into a global platform offering trading of US equities, mutual funds, ETFs, fixed income, and options. We seek enthusiastic professionals to contribute diverse perspectives and experiences to our Brokerage-as-a-Service platform. Our culture blends the pace and opportunity of a tech start-up with the impact, stability, and significance of Wall Street. We encourage creativity and experimentation while ensuring institutional-grade execution and regulatory compliance in everything we do. We value diversity and inclusion, celebrating the unique differences of our employees as we scale and grow together. We're guided by operating principles grounded in accountability, teamwork, integrity, and solutions built to scale. Join us! About The Role As a Site Reliability Engineer 2, you will enhance the reliability and performance of our Brokerage-as-a-Service platform during critical 24/7 operations. This role demands a proactive approach to managing technical challenges and system optimizations that align with our global operational strategies. What You'll Do Support the SRE team in developing and implementing enhancements to support workflows, focusing on automation and efficiency improvements. Handle technical escalations, troubleshoot complex issues, and actively participate in on-call rotations to ensure rapid response and resolution during non-traditional hours. Adhere to and administer incident and change management policies.
Coordinate incident resolution efforts and implement change management protocols to maintain and enhance system reliability, especially during critical system operations at night. Work closely with the New York office to ensure smooth operation and alignment of SRE practices across time zones. What You'll Need 3+ years in a SRE role or a similar position, demonstrating deep knowledge and expertise in site reliability engineering and operations. Working knowledge in REST APIs and understanding of API integration. Python proficiency in scripting for automation and system management, with a track record of developing and implementing automation solutions. SQL and Database expertise in transactional databases, including querying and troubleshooting. Analytical and troubleshooting skills with a demonstrated ability to perform troubleshooting and root cause analysis of technical issues. Availability for flexible work hours and willingness to cover US markets trading sessions, including L2 on-call coverage. Knowledge of Change Management Process and Risk Management. 
Nice to Have, But Not Required: Experience in the brokerage or financial industry. Proficient with cloud services, particularly AWS, and knowledgeable about cloud architecture best practices, including IAM, EC2, S3, and DynamoDB. Experience maintaining and supporting containerized systems, with familiarity in orchestration tools. Knowledge of Infrastructure as Code (IaC) practices and tools such as Terraform or CloudFormation. Ability to manage and troubleshoot job scheduling tools like Rundeck or Apache Airflow. Advanced skills in managing containerized environments using Kubernetes and OpenShift. Practical experience with Confluent Cloud for event streaming architectures. Experience with Java applications and a basic understanding of using the browser developer console for front-end debugging. Additional Notes: This role is critical for our continuous operations and requires a commitment to nighttime hours, aligning with the global nature of our financial services. Candidates must be prepared for intense collaboration periods and proactive communication across global teams. Applicants must be authorized to work for any employer in the U.S. DriveWealth is unable to sponsor or take over sponsorship of an employment Visa at this time. Compensation Compensation package offerings are based on candidate experience and technical qualifications, as it relates to the role. These are identified and determined throughout your interviewing experience. Please note: at this time, we are not able to hire in all states.
Remote (Most US States). Pay Range: $130,000-$150,000 USD. Benefits: Competitive medical, dental, and vision insurance options. Mental health resources. Generous paid time off with observed holidays (varies per country). Paid parental leave for biological and adoptive parents. Up to $2,500 or local equivalent each year to invest in continued education and personal development. Up to $900 each year or local equivalent for fitness and wellness reimbursement. Company-provided phone (varies by country). For HQ in-office employees, a daily lunch stipend, unlimited snacks, and engaging office space in the Financial District. Pre-tax commuter benefits (US only). Employer 401K match (US only). Benefit offerings vary based on country and are subject to change. Equal Employment Opportunity To build technology and products that are used and loved by people and solve real-world problems, we need to build a team with many different perspectives and experiences. We are an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. We encourage candidates from all backgrounds to apply. Applicants in need of special assistance or accommodation during the interview process or in accessing our website may contact us at **************************. Agency Disclaimer DriveWealth does not accept agency resumes. Please do not forward resumes to our jobs alias, employees, or any other organization location. DriveWealth is not responsible for any fees related to unsolicited resumes.
    $130k-150k yearly Auto-Apply 19d ago
  • Data Reliability Engineer II

    Zeta 4.4company rating

    Ridgefield, NJ jobs

    Zeta is a Next-Gen Banking Tech company that empowers banks and fintechs to launch banking products for the future. It was founded by Bhavin Turakhia and Ramki Gaddipati in 2015. Our flagship processing platform - Zeta Tachyon - is the industry's first modern, cloud-native, and fully API-enabled stack that brings together issuance, processing, lending, core banking, fraud & risk, and many more capabilities as a single-vendor stack. 20M+ cards have been issued on our platform globally. Zeta is actively working with the largest Banks and Fintechs in multiple global markets transforming customer experience for multi-million card portfolios. Zeta has 1700+ employees - with over 70% of roles in R&D - across locations in the US, EMEA, and Asia. We raised $340 million at a $2 billion valuation from Softbank, Mastercard, and other investors in 2021. Learn more @ ************** careers.zeta.tech, LinkedIn, Twitter. Responsibilities Proactively monitor PostgreSQL RDS instances for performance, availability, and resource utilization (CPU, memory, storage, connections) using established monitoring tools (e.g., CloudWatch, Prometheus). Assist in identifying performance bottlenecks in PostgreSQL RDS. Apply basic performance tuning techniques like reviewing query execution plans, adding missing indexes, and recommending parameter adjustments. Monitor the health and performance of Debezium and Kafka Connect connectors, identifying and troubleshooting basic issues related to data capture and delivery. Monitor Apache Nifi data flows for errors, backpressure, and performance issues. Assist in troubleshooting and resolving common Nifi flow failures. Provide support for data related issues and participate in root cause analysis. Monitor the execution of Apache Airflow DAGs, identify failed tasks, and troubleshoot and re-run them.
Develop and maintain automation scripts and infrastructure as code (IAC) templates (e.g., using Crossplane, Terraform) to automate routine database tasks, deployments, and updates. Participate in on-call rotations to respond to database-related incidents and perform troubleshooting and root cause analysis. Assist in implementing and maintaining security best practices for cloud databases, including access controls, encryption, and compliance with regulatory requirements. Regularly audit and assess database security configurations. Configure and manage database backup and recovery strategies to ensure data integrity and availability in case of failures or data loss. Analyse database query performance and collaborate with developers to optimize SQL queries and schemas. Participate in continuous improvement initiatives to enhance the reliability, scalability, and performance of cloud databases. Assist in the design and optimization of database schemas for cloud environments. Skills Familiarity with data pipeline concepts and technologies like Debezium, Kafka Connect, Apache Nifi. Basic understanding of Amazon Redshift and S3. Exposure to Apache Spark for data processing. Basic understanding of Apache Airflow for workflow orchestration. Strong SQL scripting skills for querying and basic data manipulation. Familiarity with scripting languages (e.g., Python, Bash) is a plus. Knowledge of database security best practices, including access controls, encryption, and compliance with regulatory requirements (e.g., GDPR, HIPAA). Having ‘AWS Certified Database - Specialty' certification is a plus Experience and Qualifications Bachelor's degree in Computer Science, Information Technology, or a related field. 3-5 years of experience in database administration, with a focus on PostgreSQL. 1-2 years of hands-on experience with PostgreSQL RDS. Equal Opportunity Zeta is an equal opportunity employer. 
We celebrate diversity and are committed to creating an inclusive environment for all employees. We encourage applicants from all backgrounds, cultures, and communities to apply and believe that a diverse workforce is key to our success.
    $141k-187k yearly est. 16d ago
  • Site Reliability Engineer - Core Platform Services

    Morgan Stanley 4.6company rating

    Edison, NJ jobs

    Company Profile: At Morgan Stanley, we advise, originate, trade, manage and distribute capital for governments, institutions and individuals, and always do so with a standard of excellence. We are a leading global financial services firm that conducts its business through three principal business segments-Institutional Securities, Wealth Management (WM), and Investment Management. The Firm's employees serve clients worldwide from more than 1,200 offices in 43 countries. Our WM business is one of the largest in the world with more than $2 trillion in client assets, $73 billion in lending balances, and nearly 16,000 Financial Advisors in 600+ offices across the U.S. Our Financial Advisors focus on delivering timely, customized solutions and services that help clients meet their financial and life goals. Our offering includes brokerage and investment advisory services, financial and wealth planning, access to credit and lending, cash management, annuities and insurance, and retirement services. As a market leader, the talent and passion of our people is critical to our success. Together, we share a common set of values rooted in integrity, excellence and strong team ethic. Morgan Stanley can provide a superior foundation for building a professional career - a place for people to learn, to achieve and grow. A philosophy that balances personal lifestyles, perspectives and needs is an important part of our culture. Department Profile: Reliability Operations is responsible for risk mitigation, stability, driving performance, and efficiency across Wealth Management Technology. Through Production Operations, Observability Engineering, Resiliency Assessment & Validation and Reliability Engineering, we will improve and increase Wealth Management stability, reliability, resiliency, efficiency, and performance. 
If you are an exceptional individual who is interested in solving complex problems and building sophisticated solutions in a dynamic team environment, Reliability Operations is the place for you. The ‘Site Reliability Engineer' role is within the Core Platform Services Super Department in Wealth Management Technology. Job Summary: We are looking for a Site Reliability Engineer at the Associate, Director and Vice President levels. The position in the Reliability Operations team is focused on delivering exceptional services to both BU and Dev partners to minimize/avoid any production outages. The role will focus on production support, automating deployments and working with the agile teams to build and support stable and reliable production systems. The ideal candidate will be passionate about automation and skilled in one of the programming languages: Python/PERL/ SHELL, Ruby, JAVA, C# or the like. Candidate should possess a strong understanding of database concepts, job scheduler, MQ, Web services, UNIX/LINUX/Windows OS as well as experience with debugging applications. We are looking for a strong leader with excellent communications skills who is committed to continuously improving and delivering results. Candidate should be organized, disciplined, detail-oriented, self-motivated, and delivery-focused. Responsibilities: Maintain applications once they are live by measuring and monitoring availability, latency and overall system health with a focus on business activities and continuously evaluate cost and TOIL. Engage in and improve the whole lifecycle of services from inception and design, through deployment, operation, capacity planning and launch reviews. Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity; includes automation for other various operational needs. 
Troubleshoot infrastructure issues, review log files, update documentation, and maintain a knowledge base with resolutions. Work closely with the application Development team to understand the platform and create tools/utilities to help with production management. Work with upstream data providers and upstream consumers, and reduce the amount of escalation to development teams. Develop scripts and assist with code changes along with operational tasks/activities. Work closely with Application Development to ensure that the support team has excellent knowledge of the application set, and own and maintain the support knowledge base and documents. Use analytical skills to find trends in the environment and drive out problems. Lead effort to determine improvement areas to stabilize the plant. Identify risks and work with a sense of urgency, working within a team or independently. Test and tune network, hardware, and software configurations to maximize performance. Interface with different teams like IT Dev managers and Infrastructure teams, and lead as a Subject Matter Expert (SME) for the application(s) supported. Understand the overall business flow of supported application systems and its interface with clients. Take ownership of and manage production requests, questions, and issues, and perform Root Cause Analysis for outages/incidents. Be flexible to provide weekend on call rotation and available for offshore time lead. Be accountable for the Production Environments as well as the non-Production Environments for the existing GBOT team and be part of 24/7 production support coverage.
Skills Required: 10+ years of experience in a production environment with a solid software development background and understanding of performance tuning, end-to-end troubleshooting, networking fundamentals and appropriate attention to detail. Ability to focus and provide resolutions for production issues in a highly demanding, high-pressure environment. 10+ years of hands-on experience in designing, developing, and implementing technical solutions, or significant experience in deep technical support. Strong experience in scripting languages (Shell scripting, Python, Perl, etc.) and cloud-driven development. Strong database skills with DB2, Sybase or Oracle. Hands-on experience with Autosys or other batch scheduling software. Strong experience in Continuous Integration and Continuous Deployment. Strong experience in environment on demand for both Virtual Machines and containers. Knowledge of and hands-on experience with monitoring tools like Splunk, IP Soft, Sockeye. Practical experience with Agile Methodology (e.g. Scrum). Knowledge of or experience with automating deployments using Jenkins, Train or Windeploy. Ability to diagnose technical problems, debug, optimize code, and automate routine tasks. Hands-on experience in application and database troubleshooting/issue resolution in a fast-paced environment. Excellent communication and ability to think outside the box for process improvements.
Knowledge of cloud-based deployment, security, and networking concepts in Azure and AWS. Bachelor's/Master's Degree in Computer Science, Information Systems or related field. Skills Desired: Knowledge of or experience with algorithms, data structures, complexity analysis and software design. Interest in designing, analyzing and troubleshooting large-scale distributed systems. Educational Qualification: Minimum BS degree in Computer Science, Engineering or a related field. WHAT YOU CAN EXPECT FROM MORGAN STANLEY: We are committed to maintaining the first-class service and high standard of excellence that have defined Morgan Stanley for over 89 years. Our values - putting clients first, doing the right thing, leading with exceptional ideas, committing to diversity and inclusion, and giving back - aren't just beliefs, they guide the decisions we make every day to do what's best for our clients, communities and more than 80,000 employees in 1,200 offices across 42 countries. At Morgan Stanley, you'll find an opportunity to work alongside the best and the brightest, in an environment where you are supported and empowered. Our teams are relentless collaborators and creative thinkers, fueled by their diverse backgrounds and experiences. We are proud to support our employees and their families at every point along their work-life journey, offering some of the most attractive and comprehensive employee benefits and perks in the industry. There's also ample opportunity to move about the business for those who show passion and grit in their work. To learn more about our offices across the globe, please copy and paste ***************************************************** into your browser. Expected base pay rates for the role will be between $70,000 and $120,000 per year for Associate, between $95,000 and $135,000 per year for Director, and between $120,000 and $170,000 for Vice President at the commencement of employment.
However, base pay if hired will be determined on an individualized basis and is only part of the total compensation package, which, depending on the position, may also include commission earnings, incentive compensation, discretionary bonuses, other short and long-term incentive packages, and other Morgan Stanley sponsored benefit programs. Morgan Stanley's goal is to build and maintain a workforce that is diverse in experience and background but uniform in reflecting our standards of integrity and excellence. Consequently, our recruiting efforts reflect our desire to attract and retain the best and brightest from all talent pools. We want to be the first choice for prospective employees. It is the policy of the Firm to ensure equal employment opportunity without discrimination or harassment on the basis of race, color, religion, creed, age, sex, sex stereotype, gender, gender identity or expression, transgender, sexual orientation, national origin, citizenship, disability, marital and civil partnership/union status, pregnancy, veteran or military service status, genetic information, or any other characteristic protected by law. Morgan Stanley is an equal opportunity employer committed to diversifying its workforce (M/F/Disability/Vet).
    $120k-170k yearly Auto-Apply 60d+ ago
  • Site Reliability Engineer - Capital Markets

    Jefferies 4.8company rating

    Jersey City, NJ jobs

    Jefferies is seeking a Site Reliability Engineer to play an instrumental role in supporting Equity Front office trading application, risk and middle office real time products, developed and used for Equity Cash and ETS application. As part of the wider platform engineering team, you will be working closely with the Business users interactively throughout the day, along with technical, analysis and testing colleagues. Investigation and resolution of the work items at hand will require competent technical skills and a keen intellect. The business is a growth area, with current investments taking place in all the technology, business and middle office areas. Responsibilities: Front Line Site Reliability Engineering and Support functions for Equity trading systems used by Jefferies clients as well as internal users. Build monitoring tools for application and infrastructure components. Implement and manage scalable infrastructure using cloud-native technologies and tools. Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding. Partner with business, development and infrastructure teams to improve services through rigorous testing and release procedures. Develop and maintain CI/CD pipelines to streamline deployment processes. Expedient deployment of new systems. Capacity planning, Platform Management, and support for increasing volumes and business growth. Create sustainable systems and services through automation. Collaborate with Application team to establish and enforce production and development standards. Document procedures, best practices and troubleshooting FAQs. Resolve complex application and technical problems. Debug the system and fix production-related issues. Escalate / follow up on permanent fixes for development-related issues. Lead incident response efforts and post-mortem analysis to prevent future occurrences.
Handle complex operational tasks and recommend process and technology changes. Provide global support, including weekend availability to troubleshoot production-related issues and perform checkouts. Ability to work both independently and in groups in an energetic, diverse environment. Participate in on-call rotations to ensure 24/7 system availability and support. Support compliance and legal queries. Qualifications: Strong experience in Windows and Linux/Unix services. Strong experience in scripting languages like PowerShell, Python and SQL. Strong knowledge of monitoring tools - Nagios, Splunk, OTEL, Datadog. Strong knowledge of FIX protocol. Strong domain skills - must have working experience in Capital Markets across modules and instruments, especially CASH, ETS, Bonds, Options, Futures, Swaps products. Experience in BFSI (Banking and Financial Industry) domain applications with a proper understanding of the Trade Lifecycle. Excellent communication, time management and project management skills. Primary Location Full Time Salary Range of $175,000 - $200,000
    $175k-200k yearly Auto-Apply 22d ago
  • Site Reliability Engineer - Capital Markets

    Jefferies Financial Group Inc. 4.8company rating

    New York, NY jobs

    Jefferies is seeking a Site Reliability Engineer to play an instrumental role in supporting Equity Front office trading application, risk and middle office real time products, developed and used for Equity Cash and ETS application. As part of the wider platform engineering team, you will be working closely with the Business users interactively throughout the day, along with technical, analysis and testing colleagues. Investigation and resolution of the work items at hand will require competent technical skills and a keen intellect. The business is a growth area, with current investments taking place in all the technology, business and middle office areas. Responsibilities: * Front Line Site Reliability Engineering and Support functions for Equity trading systems used by Jefferies clients as well as internal users. * Build monitoring tools for application and infrastructure components. * Implement and manage scalable infrastructure using cloud-native technologies and tools. * Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding. * Partner with business, development and infrastructure teams to improve services through rigorous testing and release procedures. * Develop and maintain CI/CD pipelines to streamline deployment processes. * Expedient deployment of new systems. Capacity planning, Platform Management, and support for increasing volumes and business growth. * Create sustainable systems and services through automation. * Collaborate with Application team to establish and enforce production and development standards. * Document procedures, best practices and troubleshooting FAQs. * Resolve complex application and technical problems. * Debug the system and fix production-related issues. * Escalate / follow up on permanent fixes for development-related issues. * Lead incident response efforts and post-mortem analysis to prevent future occurrences.
* Handle complex operational tasks and recommend process and technology changes. * Provide global support, including weekend availability to troubleshoot production-related issues and perform checkouts. * Ability to work both independently and in groups in an energetic, diverse environment. * Participate in on-call rotations to ensure 24/7 system availability and support. * Support compliance and legal queries. Qualifications: * Strong experience in Windows and Linux/Unix services. * Strong experience in scripting languages like PowerShell, Python and SQL. * Strong knowledge of monitoring tools - Nagios, Splunk, OTEL, Datadog * Strong knowledge of FIX protocol * Strong domain skills - must have working experience in Capital Markets across modules and instruments, especially CASH, ETS, Bonds, Options, Futures, Swaps products * Experience in BFSI (Banking and Financial Industry) domain applications with a proper understanding of the Trade Lifecycle. * Excellent communication, time management and project management skills. Primary Location Full Time Salary Range of $175,000 - $200,000
    $175k-200k yearly Auto-Apply 3d ago
  • Site Reliability Engineer III

    Jpmorgan Chase 4.8company rating

    Jersey City, NJ jobs

    There's nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems. As a Site Reliability Engineer III at JPMorgan Chase within the Consumer and Community banking team, you will solve complex and broad business problems with simple and straightforward solutions. Through code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their associated infrastructure to independently decompose and iteratively improve on existing solutions. You are a significant contributor to your team by sharing your knowledge of end-to-end operations, availability, reliability, and scalability of your application or platform. **Job responsibilities** + Guides and assists others in the areas of building appropriate level designs and gaining consensus from peers where appropriate + Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines + Implements infrastructure, configuration, and network as code for the applications and platforms in your remit + Collaborates with technical experts, key stakeholders, and team members to resolve complex problems + Understands service level indicators and utilizes service level objectives to proactively resolve issues before they impact customers + Supports the adoption of site reliability engineering best practices within your team **Required qualifications, capabilities, and skills** + Formal training or certification on software engineering concepts and 3+ years applied experience + Proficient in site reliability culture and principles and familiarity with how to implement site reliability within an application or platform + Proficient in at least one programming language such as Python, Java/Spring Boot, and .Net + Proficient knowledge of software 
applications and technical processes within a given technical discipline (e.g., Cloud, artificial intelligence, Android, etc.) + Experience in observability such as white and black box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others + Familiarity with container and container orchestration such as ECS, Kubernetes, and Docker + Familiarity with troubleshooting common networking technologies and issues + Ability to contribute to large and collaborative teams by presenting information in a logical and timely manner with compelling language and limited supervision + Ability to proactively recognize road blocks and demonstrates interest in learning technology that facilitates innovation + Ability to identify new technologies and relevant solutions to ensure design constraints are met by the software team **Preferred qualifications, capabilities, and skills** + Participate on call support rota for high severity issues to help with diagnosis and collect facts for RCA + Ability to facilitate post mortem meetings for Root Cause Analysis and implement effective steps for stability improvements + Ability to write technical documentation for lessons learnt from issues and help improve runbook steps for Mission Control teams. + Completed AWS Solution Architect certification Chase is a leading financial services firm, helping nearly half of America's households and small businesses achieve their financial goals through a broad range of financial products. Our mission is to create engaged, lifelong relationships and put our customers at the heart of everything we do. We also help small businesses, nonprofits and cities grow, delivering solutions to solve all their financial needs. We offer a competitive total rewards package including base salary determined based on the role, experience, skill set and location. 
Those in eligible roles may receive commission-based pay and/or discretionary incentive compensation, paid in the form of cash and/or forfeitable equity, awarded in recognition of individual achievements and contributions. We also offer a range of benefits and programs to meet employee needs, based on eligibility. These benefits include comprehensive health care coverage, on-site health and wellness centers, a retirement savings plan, backup childcare, tuition reimbursement, mental health support, financial coaching and more. Additional details about total compensation and benefits will be provided during the hiring process. We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants' and employees' religious practices and beliefs, as well as mental health or physical disability needs. Visit our FAQs for more information about requesting an accommodation. Equal Opportunity Employer/Disability/Veterans **Base Pay/Salary** Jersey City,NJ $133,000.00 - $185,000.00 / year
    $133k-185k yearly 33d ago
  • Site Reliability Engineer III- Kafka Platform Engineering

    Jpmorgan Chase 4.8company rating

    Jersey City, NJ jobs

    There's nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems. As a Site Reliability Engineer III at JPMorgan Chase within the Infrastructure Platforms, you will solve complex and broad business problems with simple and straightforward solutions. Through code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their associated infrastructure to independently decompose and iteratively improve on existing solutions. You are a significant contributor to your team by sharing your knowledge of end-to-end operations, availability, reliability, and scalability of your application or platform. **Job responsibilities** + Guides and assists others in the areas of building appropriate level designs and gaining consensus from peers where appropriate. + Demonstrate deep knowledge of Kafka technology, Kafka connect framework, and distributed systems technologies, with the ability to operate in and migrate across public and private clouds. + Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines + Collaborates with other software engineers and teams to design, develop, test, and implement availability, reliability, scalability, and solutions in their applications + Implements infrastructure, configuration, and network as code for the applications and platforms in your remit. + Collaborates with technical experts, key stakeholders, and team members to resolve complex problems. + Understands service level indicators and utilizes service level objectives to proactively resolve issues before they impact customers. 
+ Contribute to the development of technical documentation, including service APIs using Swagger, ensuring robust logging, auditability, security, and monitoring features. + Supports the adoption of site reliability engineering best practices within your team. + Engage in periodic on-call rotation shifts, providing client support and ensuring thorough monitoring of the platform. **Required qualifications, capabilities, and skills** + Formal training or certification on computer science and reliability concepts and 3+ years applied experience. + Proficient in site reliability culture and principles and familiarity with how to implement site reliability within an application or platform + Proficient in at least one programming language such as Java/Spring Boot, python. + Proficient knowledge of software applications and technical processes within a given technical discipline (e.g., Cloud, artificial intelligence, Android, etc.) + Experience in observability such as white and black box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, etc. + Experience with public cloud platforms like AWS, GCP or Azure. + Experience with Kafka ecosystem products: Kafka, Kafka Connect, Kafka Streams. + Experience with continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform. + Familiarity with container and container orchestration such as ECS, Kubernetes, and Docker. + Familiarity with troubleshooting common networking technologies and issues. 
+ Ability to contribute to large and collaborative teams by presenting information in a logical and timely manner with compelling language and limited supervision + Ability to proactively recognize road blocks and demonstrates interest in learning technology that facilitates innovation + Ability to identify new technologies and relevant solutions to ensure design constraints are met by the software team + Ability to initiate and implement ideas to solve business problems. **Preferred qualifications, capabilities, and skills** + Familiarity with running Apache Flink. + Understanding of authentication and authorization technologies (e.g., OAUTH, Kerberos). + Experience with AWS cloud services and Kubernetes platform orchestration. JPMorganChase, one of the oldest financial institutions, offers innovative financial solutions to millions of consumers, small businesses and many of the world's most prominent corporate, institutional and government clients under the J.P. Morgan and Chase brands. Our history spans over 200 years and today we are a leader in investment banking, consumer and small business banking, commercial banking, financial transaction processing and asset management. We offer a competitive total rewards package including base salary determined based on the role, experience, skill set and location. Those in eligible roles may receive commission-based pay and/or discretionary incentive compensation, paid in the form of cash and/or forfeitable equity, awarded in recognition of individual achievements and contributions. We also offer a range of benefits and programs to meet employee needs, based on eligibility. These benefits include comprehensive health care coverage, on-site health and wellness centers, a retirement savings plan, backup childcare, tuition reimbursement, mental health support, financial coaching and more. Additional details about total compensation and benefits will be provided during the hiring process. 
We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants' and employees' religious practices and beliefs, as well as mental health or physical disability needs. Visit our FAQs for more information about requesting an accommodation. JPMorgan Chase & Co. is an Equal Opportunity Employer, including Disability/Veterans **Base Pay/Salary** Jersey City,NJ $133,000.00 - $185,000.00 / year
    $133k-185k yearly 60d+ ago
  • Site Reliability Engineer III

    Jpmorgan Chase & Co 4.8company rating

    Jersey City, NJ jobs

    JobID: 210689758 JobSchedule: Full time JobShift: Day Base Pay/Salary: Jersey City,NJ $133,000.00-$185,000.00 There's nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems. As a Site Reliability Engineer III at JPMorgan Chase within the [insert LOB or sub LOB], you will solve complex and broad business problems with simple and straightforward solutions. Through code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their associated infrastructure to independently decompose and iteratively improve on existing solutions. You are a significant contributor to your team by sharing your knowledge of end-to-end operations, availability, reliability, and scalability of your application or platform. Job responsibilities * Guides and assists others in the areas of building appropriate level designs and gaining consensus from peers where appropriate * Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines * Collaborates with other software engineers and teams to design, develop, test, and implement availability, reliability, scalability, and solutions in their applications * Implements infrastructure, configuration, and network as code for the applications and platforms in your remit * Collaborates with technical experts, key stakeholders, and team members to resolve complex problems * Understands service level indicators and utilizes service level objectives to proactively resolve issues before they impact customers * Supports the adoption of site reliability engineering best practices within your team Required qualifications, capabilities, and skills * Formal training or certification on software engineering concepts and 3+ years applied experience * Proficient in 
site reliability culture and principles and familiarity with how to implement site reliability within an application or platform * Proficient in at least one programming language such as Python, Java/Spring Boot, and .Net * Proficient knowledge of software applications and technical processes within a given technical discipline (e.g., Cloud, artificial intelligence, Android, etc.) * Experience in observability such as white and black box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others * Experience with continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform * Familiarity with container and container orchestration such as ECS, Kubernetes, and Docker * Familiarity with troubleshooting common networking technologies and issues * Ability to contribute to large and collaborative teams by presenting information in a logical and timely manner with compelling language and limited supervision Preferred qualifications, capabilities, and skills * Ability to proactively recognize road blocks and demonstrates interest in learning technology that facilitates innovation * Ability to identify new technologies and relevant solutions to ensure design constraints are met by the software team * Ability to initiate and implement ideas to solve business problems
    $133k-185k yearly Auto-Apply 16d ago
  • Site Reliability Engineer III

    Jpmorgan Chase 4.8company rating

    Jersey City, NJ jobs

    There's nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems. As a Site Reliability Engineer III at JPMorgan Chase within the (insert LOB or sub LOB), you will solve complex and broad business problems with simple and straightforward solutions. Through code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their associated infrastructure to independently decompose and iteratively improve on existing solutions. You are a significant contributor to your team by sharing your knowledge of end-to-end operations, availability, reliability, and scalability of your application or platform. **Job responsibilities** + Guides and assists others in the areas of building appropriate level designs and gaining consensus from peers where appropriate + Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines + Collaborates with other software engineers and teams to design, develop, test, and implement availability, reliability, scalability, and solutions in their applications + Implements infrastructure, configuration, and network as code for the applications and platforms in your remit + Collaborates with technical experts, key stakeholders, and team members to resolve complex problems + Understands service level indicators and utilizes service level objectives to proactively resolve issues before they impact customers + Supports the adoption of site reliability engineering best practices within your team **Required qualifications, capabilities, and skills** + Formal training or certification on software engineering concepts and 3+ years applied experience + Proficient in site reliability culture and principles and familiarity with how to implement site reliability within 
an application or platform + Proficient in at least one programming language such as Python, Java/Spring Boot, and .Net + Proficient knowledge of software applications and technical processes within a given technical discipline (e.g., Cloud, artificial intelligence, Android, etc.) + Experience in observability such as white and black box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others + Experience with continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform + Familiarity with container and container orchestration such as ECS, Kubernetes, and Docker + Familiarity with troubleshooting common networking technologies and issues + Ability to contribute to large and collaborative teams by presenting information in a logical and timely manner with compelling language and limited supervision **Preferred qualifications, capabilities, and skills** + Ability to proactively recognize road blocks and demonstrates interest in learning technology that facilitates innovation + Ability to identify new technologies and relevant solutions to ensure design constraints are met by the software team + Ability to initiate and implement ideas to solve business problems JPMorganChase, one of the oldest financial institutions, offers innovative financial solutions to millions of consumers, small businesses and many of the world's most prominent corporate, institutional and government clients under the J.P. Morgan and Chase brands. Our history spans over 200 years and today we are a leader in investment banking, consumer and small business banking, commercial banking, financial transaction processing and asset management. We offer a competitive total rewards package including base salary determined based on the role, experience, skill set and location. 
Those in eligible roles may receive commission-based pay and/or discretionary incentive compensation, paid in the form of cash and/or forfeitable equity, awarded in recognition of individual achievements and contributions. We also offer a range of benefits and programs to meet employee needs, based on eligibility. These benefits include comprehensive health care coverage, on-site health and wellness centers, a retirement savings plan, backup childcare, tuition reimbursement, mental health support, financial coaching and more. Additional details about total compensation and benefits will be provided during the hiring process. We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants' and employees' religious practices and beliefs, as well as mental health or physical disability needs. Visit our FAQs for more information about requesting an accommodation. JPMorgan Chase & Co. is an Equal Opportunity Employer, including Disability/Veterans **Base Pay/Salary** Jersey City,NJ $133,000.00 - $185,000.00 / year
    $133k-185k yearly 14d ago
  • Site Reliability Engineer III- Kafka Platform Engineering

    Jpmorgan Chase & Co 4.8company rating

    Jersey City, NJ jobs

    JobID: 210662270 JobSchedule: Full time JobShift: Base Pay/Salary: Jersey City,NJ $133,000.00-$185,000.00 There's nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems. As a Site Reliability Engineer III at JPMorgan Chase within the Infrastructure Platforms, you will solve complex and broad business problems with simple and straightforward solutions. Through code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their associated infrastructure to independently decompose and iteratively improve on existing solutions. You are a significant contributor to your team by sharing your knowledge of end-to-end operations, availability, reliability, and scalability of your application or platform. Job responsibilities * Guides and assists others in the areas of building appropriate level designs and gaining consensus from peers where appropriate. * Demonstrate deep knowledge of Kafka technology, Kafka connect framework, and distributed systems technologies, with the ability to operate in and migrate across public and private clouds. * Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines * Collaborates with other software engineers and teams to design, develop, test, and implement availability, reliability, scalability, and solutions in their applications * Implements infrastructure, configuration, and network as code for the applications and platforms in your remit. * Collaborates with technical experts, key stakeholders, and team members to resolve complex problems. * Understands service level indicators and utilizes service level objectives to proactively resolve issues before they impact customers. 
* Contribute to the development of technical documentation, including service APIs using Swagger, ensuring robust logging, auditability, security, and monitoring features. * Supports the adoption of site reliability engineering best practices within your team. * Engage in periodic on-call rotation shifts, providing client support and ensuring thorough monitoring of the platform. Required qualifications, capabilities, and skills * Formal training or certification on computer science and reliability concepts and 3+ years applied experience. * Proficient in site reliability culture and principles and familiarity with how to implement site reliability within an application or platform * Proficient in at least one programming language such as Java/Spring Boot, python. * Proficient knowledge of software applications and technical processes within a given technical discipline (e.g., Cloud, artificial intelligence, Android, etc.) * Experience in observability such as white and black box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, etc. * Experience with public cloud platforms like AWS, GCP or Azure. * Experience with Kafka ecosystem products: Kafka, Kafka Connect, Kafka Streams. * Experience with continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform. * Familiarity with container and container orchestration such as ECS, Kubernetes, and Docker. * Familiarity with troubleshooting common networking technologies and issues. 
* Ability to contribute to large and collaborative teams by presenting information in a logical and timely manner with compelling language and limited supervision * Ability to proactively recognize road blocks and demonstrates interest in learning technology that facilitates innovation * Ability to identify new technologies and relevant solutions to ensure design constraints are met by the software team * Ability to initiate and implement ideas to solve business problems. Preferred qualifications, capabilities, and skills * Familiarity with running Apache Flink. * Understanding of authentication and authorization technologies (e.g., OAUTH, Kerberos). * Experience with AWS cloud services and Kubernetes platform orchestration.
    $133k-185k yearly Auto-Apply 60d+ ago
  • Equity Site Reliability Engineer

    Jefferies Financial Group 4.6company rating

    Jersey City, NJ jobs

    Jefferies is seeking a Site Reliability Engineer to play an instrumental role in supporting the Equity front-office trading application and the risk and middle-office real-time products developed and used for Equity Cash and ETS applications. As part of the wider platform engineering team, you will work closely and interactively with business users throughout the day, along with technical, analysis, and testing colleagues. Investigation and resolution of the work items at hand will require competent technical skills and a keen intellect. The business is a growth area, with current investments taking place in all the technology, business, and middle-office areas. Job Duties: Front-line Site Reliability Engineering and support functions for Equity trading systems used by Jefferies clients as well as internal users. Build monitoring tools for application and infrastructure components. Implement and manage scalable infrastructure using cloud-native technologies and tools. Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding. Partner with business, development, and infrastructure teams to improve services through rigorous testing and release procedures. Develop and maintain CI/CD pipelines to streamline deployment processes. Expedient deployment of new systems. Capacity planning, platform management, and support for increasing volumes and business growth. Create sustainable systems and services through automation. Collaborate with the Application team to establish and enforce production and development standards. Document procedures, best practices, and troubleshooting FAQs. Resolve complex application and technical problems. Debug the system and fix production-related issues. Escalate and follow up on permanent fixes for development-related issues. Lead incident response efforts and post-mortem analysis to prevent future occurrences. 
Handle complex operational tasks and recommend process and technology changes. Provide global support, including weekend availability to troubleshoot production-related issues and perform checkouts. Ability to work both independently and in groups in an energetic, diverse environment. Participate in on-call rotations to ensure 24/7 system availability and support. Support compliance and legal queries. Experience/skills Required: Strong experience in Windows and Linux/Unix services. Strong experience in scripting languages such as PowerShell, Python, and SQL. Strong knowledge of monitoring tools: Nagios, Splunk, OTEL, Datadog. Strong knowledge of the FIX protocol. Strong domain skills: must have working experience in Capital Markets across modules and instruments, especially Cash, ETS, Bonds, Options, Futures, and Swaps products. Experience in BFSI (Banking and Financial Industry) Domain applications with a proper understanding of the Trade Lifecycle. Excellent communication, time management, and project management skills. The salary range for this role is $150,000-$225,000.
    $150k-225k yearly Auto-Apply 60d+ ago
  • Site Reliability Engineer II- Physical Security Technology

    Jpmorgan Chase 4.8company rating

    Jersey City, NJ jobs

    Play a key role in ensuring system reliability at one of the world's most iconic and largest financial institutions. As a Site Reliability Engineer II at JPMorgan Chase within the enterprise technology, finance technology team, you will use technology to solve business problems and leverage software engineering best practices as we strive towards excellence. This role often works independently to execute small to medium projects, but you'll also have the opportunity to collaborate with cross functional teams to continually improve your level of knowledge about JPMorgan Chase's business and relevant technologies. **Job responsibilities** + Assist in the deployment and configuration of Genetec Security Center on Windows servers, ensuring successful implementation and integration of security systems. + Provide first-level technical support to end-users, troubleshoot issues related to Genetec Security Center, and offer recommendations on best practices. + Recognize and eliminate toil through systems engineering or automation, and implement observability patterns to improve service level indicators, objectives monitoring, and alerting solutions. + Collaborate with senior IT staff and participate in training sessions to improve knowledge and skills related to Genetec Security Center and other IT systems. + Monitor system performance and availability of core Genetec Security Center services, ensuring optimal transparency and analysis. + Document and maintain accurate records of system configurations, changes, and support requests, ensuring clear communication and organization. + Package Genetec Security Center software for client and server installs, and provide on-call support as needed to address urgent issues. **Required qualifications, capabilities, and skills** + Formal training or certification on software engineering concepts and 2+ years applied experience. 
+ 2+ years' experience working with Genetec Security Center, including configuring federations, and experience with installing and upgrading Genetec Security Center software while managing Windows patching. + Familiarity with observability practices such as white and black box monitoring, service level objective alerting, and telemetry collection using tools like Grafana, Dynatrace, Prometheus, Datadog, and Splunk. + Possession of Genetec Security Center Omnicast certification and a good understanding of network protocols and security principles. + Experience working with third-party applications deployed on Windows Server environments and the ability to work with SQL Server, including running queries. + Strong problem-solving skills and attention to detail, ensuring effective troubleshooting and resolution of issues. + Excellent communication and interpersonal skills, facilitating collaboration and effective interaction with team members and stakeholders. + Ability to work independently and as part of a team, demonstrating flexibility and adaptability in various work environments. + Willingness to learn and adapt to new technologies, staying current with industry trends and advancements. **Preferred qualifications, capabilities, and skills** + General knowledge of financial services industry + Experience working with third-party applications. + Experience working with any other video management solutions + Experience working with Intrusion Detection systems + Genetec Mission Control certification is a plus JPMorganChase, one of the oldest financial institutions, offers innovative financial solutions to millions of consumers, small businesses and many of the world's most prominent corporate, institutional and government clients under the J.P. Morgan and Chase brands. 
Our history spans over 200 years and today we are a leader in investment banking, consumer and small business banking, commercial banking, financial transaction processing and asset management. We offer a competitive total rewards package including base salary determined based on the role, experience, skill set and location. Those in eligible roles may receive commission-based pay and/or discretionary incentive compensation, paid in the form of cash and/or forfeitable equity, awarded in recognition of individual achievements and contributions. We also offer a range of benefits and programs to meet employee needs, based on eligibility. These benefits include comprehensive health care coverage, on-site health and wellness centers, a retirement savings plan, backup childcare, tuition reimbursement, mental health support, financial coaching and more. Additional details about total compensation and benefits will be provided during the hiring process. We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants' and employees' religious practices and beliefs, as well as mental health or physical disability needs. Visit our FAQs for more information about requesting an accommodation. JPMorgan Chase & Co. is an Equal Opportunity Employer, including Disability/Veterans **Base Pay/Salary** Jersey City,NJ $118,750.00 - $150,000.00 / year
    $118.8k-150k yearly 60d+ ago
  • Site Reliability Engineer III- Kafka Platform Engineering

    Jpmorganchase 4.8company rating

    Jersey City, NJ jobs

    There's nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems. As a Site Reliability Engineer III at JPMorgan Chase within the Infrastructure Platforms, you will solve complex and broad business problems with simple and straightforward solutions. Through code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their associated infrastructure to independently decompose and iteratively improve on existing solutions. You are a significant contributor to your team by sharing your knowledge of end-to-end operations, availability, reliability, and scalability of your application or platform. Job responsibilities Guides and assists others in the areas of building appropriate level designs and gaining consensus from peers where appropriate. Demonstrate deep knowledge of Kafka technology, Kafka connect framework, and distributed systems technologies, with the ability to operate in and migrate across public and private clouds. Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines Collaborates with other software engineers and teams to design, develop, test, and implement availability, reliability, scalability, and solutions in their applications Implements infrastructure, configuration, and network as code for the applications and platforms in your remit. Collaborates with technical experts, key stakeholders, and team members to resolve complex problems. Understands service level indicators and utilizes service level objectives to proactively resolve issues before they impact customers. Contribute to the development of technical documentation, including service APIs using Swagger, ensuring robust logging, auditability, security, and monitoring features. 
Supports the adoption of site reliability engineering best practices within your team. Engage in periodic on-call rotation shifts, providing client support and ensuring thorough monitoring of the platform. Required qualifications, capabilities, and skills Formal training or certification on computer science and reliability concepts and 3+ years applied experience. Proficient in site reliability culture and principles and familiarity with how to implement site reliability within an application or platform Proficient in at least one programming language such as Java/Spring Boot, python. Proficient knowledge of software applications and technical processes within a given technical discipline (e.g., Cloud, artificial intelligence, Android, etc.) Experience in observability such as white and black box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, etc. Experience with public cloud platforms like AWS, GCP or Azure. Experience with Kafka ecosystem products: Kafka, Kafka Connect, Kafka Streams. Experience with continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform. Familiarity with container and container orchestration such as ECS, Kubernetes, and Docker. Familiarity with troubleshooting common networking technologies and issues. Ability to contribute to large and collaborative teams by presenting information in a logical and timely manner with compelling language and limited supervision Ability to proactively recognize road blocks and demonstrates interest in learning technology that facilitates innovation Ability to identify new technologies and relevant solutions to ensure design constraints are met by the software team Ability to initiate and implement ideas to solve business problems. Preferred qualifications, capabilities, and skills Familiarity with running Apache Flink. 
Understanding of authentication and authorization technologies (e.g., OAUTH, Kerberos). Experience with AWS cloud services and Kubernetes platform orchestration.
    $113k-140k yearly est. Auto-Apply 60d+ ago
  • Site Reliability Engineer III

    Jpmorganchase 4.8company rating

    Jersey City, NJ jobs

    There's nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems. As a Site Reliability Engineer III at JPMorgan Chase within the Consumer and Community banking team, you will solve complex and broad business problems with simple and straightforward solutions. Through code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their associated infrastructure to independently decompose and iteratively improve on existing solutions. You are a significant contributor to your team by sharing your knowledge of end-to-end operations, availability, reliability, and scalability of your application or platform. Job responsibilities Guides and assists others in the areas of building appropriate level designs and gaining consensus from peers where appropriate Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines Implements infrastructure, configuration, and network as code for the applications and platforms in your remit Collaborates with technical experts, key stakeholders, and team members to resolve complex problems Understands service level indicators and utilizes service level objectives to proactively resolve issues before they impact customers Supports the adoption of site reliability engineering best practices within your team Required qualifications, capabilities, and skills Formal training or certification on software engineering concepts and 3+ years applied experience Proficient in site reliability culture and principles and familiarity with how to implement site reliability within an application or platform Proficient in at least one programming language such as Python, Java/Spring Boot, and .Net Proficient knowledge of software applications and technical 
processes within a given technical discipline (e.g., Cloud, artificial intelligence, Android, etc.) Experience in observability such as white and black box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others Familiarity with container and container orchestration such as ECS, Kubernetes, and Docker Familiarity with troubleshooting common networking technologies and issues Ability to contribute to large and collaborative teams by presenting information in a logical and timely manner with compelling language and limited supervision Ability to proactively recognize road blocks and demonstrates interest in learning technology that facilitates innovation Ability to identify new technologies and relevant solutions to ensure design constraints are met by the software team Preferred qualifications, capabilities, and skills Participate on call support rota for high severity issues to help with diagnosis and collect facts for RCA Ability to facilitate post mortem meetings for Root Cause Analysis and implement effective steps for stability improvements Ability to write technical documentation for lessons learnt from issues and help improve runbook steps for Mission Control teams. Completed AWS Solution Architect certification
    $113k-140k yearly est. Auto-Apply 23d ago
  • Site Reliability Engineer III

    Jpmorganchase 4.8company rating

    Jersey City, NJ jobs

    There's nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems. As a Site Reliability Engineer III at JPMorgan Chase within the [insert LOB or sub LOB], you will solve complex and broad business problems with simple and straightforward solutions. Through code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their associated infrastructure to independently decompose and iteratively improve on existing solutions. You are a significant contributor to your team by sharing your knowledge of end-to-end operations, availability, reliability, and scalability of your application or platform. Job responsibilities Guides and assists others in the areas of building appropriate level designs and gaining consensus from peers where appropriate Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines Collaborates with other software engineers and teams to design, develop, test, and implement availability, reliability, scalability, and solutions in their applications Implements infrastructure, configuration, and network as code for the applications and platforms in your remit Collaborates with technical experts, key stakeholders, and team members to resolve complex problems Understands service level indicators and utilizes service level objectives to proactively resolve issues before they impact customers Supports the adoption of site reliability engineering best practices within your team Required qualifications, capabilities, and skills Formal training or certification on software engineering concepts and 3+ years applied experience Proficient in site reliability culture and principles and familiarity with how to implement site reliability within an application or 
platform Proficient in at least one programming language such as Python, Java/Spring Boot, and .Net Proficient knowledge of software applications and technical processes within a given technical discipline (e.g., Cloud, artificial intelligence, Android, etc.) Experience in observability such as white and black box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others Experience with continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform Familiarity with container and container orchestration such as ECS, Kubernetes, and Docker Familiarity with troubleshooting common networking technologies and issues Ability to contribute to large and collaborative teams by presenting information in a logical and timely manner with compelling language and limited supervision Preferred qualifications, capabilities, and skills Ability to proactively recognize road blocks and demonstrates interest in learning technology that facilitates innovation Ability to identify new technologies and relevant solutions to ensure design constraints are met by the software team Ability to initiate and implement ideas to solve business problems
    $113k-140k yearly est. Auto-Apply 17d ago
  • Site Reliability Engineer II- Physical Security Technology

    Jpmorganchase 4.8company rating

    Jersey City, NJ jobs

    Play a key role in ensuring system reliability at one of the world's most iconic and largest financial institutions. As a Site Reliability Engineer II at JPMorgan Chase within the enterprise technology, finance technology team, you will use technology to solve business problems and leverage software engineering best practices as we strive towards excellence. This role often works independently to execute small to medium projects, but you'll also have the opportunity to collaborate with cross functional teams to continually improve your level of knowledge about JPMorgan Chase's business and relevant technologies. Job responsibilities Assist in the deployment and configuration of Genetec Security Center on Windows servers, ensuring successful implementation and integration of security systems. Provide first-level technical support to end-users, troubleshoot issues related to Genetec Security Center, and offer recommendations on best practices. Recognize and eliminate toil through systems engineering or automation, and implement observability patterns to improve service level indicators, objectives monitoring, and alerting solutions. Collaborate with senior IT staff and participate in training sessions to improve knowledge and skills related to Genetec Security Center and other IT systems. Monitor system performance and availability of core Genetec Security Center services, ensuring optimal transparency and analysis. Document and maintain accurate records of system configurations, changes, and support requests, ensuring clear communication and organization. Package Genetec Security Center software for client and server installs, and provide on-call support as needed to address urgent issues. Required qualifications, capabilities, and skills Formal training or certification on software engineering concepts and 2+ years applied experience. 
2+ years' experience working with Genetec Security Center, including configuring federations, and experience with installing and upgrading Genetec Security Center software while managing Windows patching. Familiarity with observability practices such as white and black box monitoring, service level objective alerting, and telemetry collection using tools like Grafana, Dynatrace, Prometheus, Datadog, and Splunk. Possession of Genetec Security Center Omnicast certification and a good understanding of network protocols and security principles. Experience working with third-party applications deployed on Windows Server environments and the ability to work with SQL Server, including running queries. Strong problem-solving skills and attention to detail, ensuring effective troubleshooting and resolution of issues. Excellent communication and interpersonal skills, facilitating collaboration and effective interaction with team members and stakeholders. Ability to work independently and as part of a team, demonstrating flexibility and adaptability in various work environments. Willingness to learn and adapt to new technologies, staying current with industry trends and advancements. Preferred qualifications, capabilities, and skills General knowledge of financial services industry Experience working with third-party applications. Experience working with any other video management solutions Experience working with Intrusion Detection systems Genetec Mission Control certification is a plus
    $113k-140k yearly est. Auto-Apply 60d+ ago
  • Site Reliability Engineer II-1

    Mastercard 4.7company rating

    Bogota, NJ jobs

    Our Purpose Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we're helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships and networks combine to deliver a unique set of products and services that help people, businesses and governments realize their greatest potential. Title and Summary Site Reliability Engineer II-1 Overview The GBSC EPMS team is looking for a Site Reliability Engineer who can help us solve problems, implement automation, and leverage best practices. * Are you a born problem solver who loves to figure out how something works? * Are you a detail-oriented individual who enjoys complex problem solving? * Do you love determining the correct actions required to fix a problem? * Do you have a low tolerance for manual work and look to automate everything you can? Business Operations is leading the Site Reliability Engineering (SRE) transformation at Mastercard through our tooling and by being an advocate for change & standards throughout the development, quality, release, and product organizations. We need team members with an appetite for change and pushing the boundaries of what can be done with automation. Experience in working across development, operations, and product teams to prioritize needs and to build relationships is a must. Responsibilities * Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement. * Analyze ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns * Support services before they go live through activities such as system design consulting, capacity planning and launch reviews. 
* Maintain services once they are live by measuring and monitoring availability, latency and overall system health. * Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity. * Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating, and lead Mastercard in DevOps automation and best practices. * Practice sustainable incident response and blameless postmortems. * Take a holistic approach to problem solving by connecting the dots during a production event through the various technology stacks that make up the platform, to optimize mean time to recover * Work with a global team spread across tech hubs in multiple geographies and time zones * Share knowledge and mentor junior resources All About You * BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent practical experience. * Experience with algorithms, data structures, scripting, pipeline management, software design and OLAP systems. * Hands-on experience with understanding custom objects using JavaScript, HTML5, CSS and API integrations. * Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive. * Ability to help debug and optimize code and automate routine tasks. * We support many different stakeholders. Experience in dealing with difficult situations and making decisions with a sense of urgency is needed. * Experience in one or more of the following is preferred: C, C++, Java, Python, Go, Perl, Ruby, MDX. * Interest in designing, analyzing and troubleshooting large-scale distributed systems. * We need team members with an appetite for change and pushing the boundaries of what can be done with automation. Experience in working across development, operations, and product teams to prioritize needs and to build relationships is a must. 
Corporate Security Responsibility All activities involving access to Mastercard assets, information, and networks come with an inherent risk to the organization and, therefore, it is expected that every person working for, or on behalf of, Mastercard is responsible for information security and must: * Abide by Mastercard's security policies and practices; * Ensure the confidentiality and integrity of the information being accessed; * Report any suspected information security violation or breach; and * Complete all periodic mandatory security trainings in accordance with Mastercard's guidelines.
    $88k-119k yearly est. Auto-Apply 9d ago
