Reliability engineer jobs in Union City, NJ

- 707 jobs
  • Project Quality Engineer

    Top Prospect Group

    Reliability engineer job in Yonkers, NY

    Job Title: Project Quality Engineer Shift: 1st Shift (Monday - Friday) Pay Rate: $75,000-$95,000 annually (commensurate with experience) Type: Direct Hire Reports To: QA Manager Dept.: Quality Assurance Job Description The Project Quality Engineer supports the Quality Assurance Manager in overseeing the Quality Assurance program for assigned rail car manufacturing projects. This role ensures compliance with contract requirements, technical specifications, and industry standards across production, acceptance, warranty, and modification phases. Responsibilities include creating Master Test and Inspection Plans, First Article Inspection schedules, Project Quality Plans, and audit procedures. The Project Quality Engineer coordinates closely with customers, vendors, and internal Kawasaki divisions to align project requirements, resolve quality issues, and support continuous improvement initiatives. This position also monitors documentation, leads corrective action activities, conducts contract reviews, and provides weekly and monthly quality reports. The engineer will serve as a primary Quality liaison between internal teams, subcontractors, and customer Resident Inspectors, ensuring timely communication, follow-up, and delivery of all quality-related commitments. Candidate Fit Summary This candidate is an excellent fit for organizations in the rail, aerospace, transportation, and heavy manufacturing sectors where strict compliance, technical quality standards, and contractual requirements are essential. They bring strong experience supporting complex production programs, managing supplier and customer interfaces, and developing detailed quality documentation. Skilled in FAI, FMEA, audits, and ISO 9001 processes, they excel in environments requiring strict quality controls, cross-functional coordination, and schedule accountability. 
Their ability to lead inspections, manage customer quality requirements, and drive corrective actions makes them a strong match for production-focused, project-driven engineering organizations. Essential Functions Implement and maintain QA programs for assigned contracts. Develop Master Test and Inspection Plans, Project Quality Plans, FAI schedules, and audit procedures. Attend project meetings and provide detailed quality status updates and reports. Analyze quality issues, identify root causes, and drive corrective actions. Coordinate with customers, suppliers, and internal teams across production and warranty phases. Manage project quality schedules and interface with Resident Inspectors. Ensure compliance with customer specifications, contract terms, and Kawasaki quality standards. Review and approve subcontractor/supplier documentation (PSI, FAI, audits, drawings, field reports). Monitor and report deviations, implement process improvements, and update procedures. Support Configuration Management planning, execution, and product delivery. Assist with subcontractor activity quality review and documentation. Travel domestically/internationally up to 30% to support project quality functions. Job Specifications Bachelor's Degree in Engineering (Master's preferred). Minimum five (5) years' experience in rail, aerospace, transportation, or heavy manufacturing. Knowledge of FAI, FMEA, ISO 9001, and source inspection processes. Strong communication, analytical, reporting, and computer skills. Ability to plan, coordinate, and manage workloads across multiple concurrent projects. Capable of working in both office and field/manufacturing environments. Work Environment Office and manufacturing floor settings. Frequent interaction with engineering, production, and customer teams. PPE required in production areas; must adhere to all safety protocols. 
Candidate Fit This candidate is a strong fit for Project Quality Engineering roles in complex manufacturing environments like rail, aerospace, automotive, and heavy industrial production. They have demonstrated capability in quality planning, regulatory compliance, supplier oversight, and customer interface management. With experience leading FAIs, audits, and corrective actions while supporting production schedules, they excel in driving continuous improvement, ensuring contract compliance, and maintaining high standards of safety, product quality, and documentation integrity. Their structured approach, technical acumen, and ability to manage project-based workloads make them a key contributor to high-complexity engineering programs. Company Overview Founded in 2010, Top Prospect Group was created with a focus on matching high-quality candidates with top-tier clients while fostering an environment where success is shared by all. In 2023, the company was acquired by HW Staffing Solutions, expanding its service offerings to include technology, engineering, and professional services. Qualified candidates are encouraged to apply immediately! Please include a clean copy of your resume, salary expectations, and availability with your application.
    $75k-95k yearly 1d ago
  • Production Engineer

    Insight Global

    Reliability engineer job in New York, NY

    A client of Insight Global in the Bronx, NY, is seeking a Production Engineer to join their team! This individual will lead manufacturing improvements by optimizing packaging line performance and minimizing downtime through data-driven analysis. The role requires partnering with cross-functional teams to implement sustainable process enhancements and uphold quality standards, as well as applying Lean Six Sigma methodologies to drive efficiency and support continuous improvement. This is an onsite position; candidates are required to be onsite 5 days per week. Required Skills & Experience: 5-7 years of experience as an engineer in a manufacturing environment; Bachelor's degree in engineering (mechanical, chemical, or biomedical preferred); experience partnering with cross-functional teams; strong understanding of Lean and Six Sigma principles.
    $64k-95k yearly est. 4d ago
  • Process Engineers

    Scientific Search

    Reliability engineer job in Parsippany-Troy Hills, NJ

    Nationwide Opportunities. Do you thrive on optimizing systems, improving yields, and driving operational excellence? Whether your background is in pharma, biotech, medical device, or advanced manufacturing, we'd love to connect. At Scientific Search, we partner with industry leaders and emerging innovators across the U.S. who rely on talented Process Engineers to enhance production efficiency, scale technologies, and ensure consistent product quality. We're continually supporting new searches and always expanding our network of process professionals. We're interested in connecting with engineers experienced in: process development, scale-up, and optimization; equipment design, installation, and validation; root cause analysis, troubleshooting, and continuous improvement (Six Sigma, Lean); cGMP manufacturing support within regulated environments; and cross-functional collaboration with operations, quality, and R&D teams. If you'd like to be considered for future Process Engineer roles, please send us your resume. We'll keep you in mind as new positions arise that fit your expertise, interests, and preferred locations. We're always growing our network of skilled Process Engineers - let's stay connected so we can help you discover the right opportunity when the time is right. #19580
    $72k-98k yearly est. 4d ago
  • Quality Engineer

    Techgene Solutions 3.4 company rating

    Reliability engineer job in Florham Park, NJ

    Role: QA Engineer App Sr Advanced Tech (hybrid, 3 days onsite). Duration: Contract. Mandatory Skills & Qualifications: 10+ years of relevant client-facing experience in QA/testing roles (excluding training periods); minimum 5 years of on-site client experience in similar technology and domain; Educational Requirement: Bachelor's degree in Technology (Master's preferred); strong hands-on experience with the MERN Stack (MongoDB, Express, React, Node.js); proficiency in programming/scripting in Java and JavaScript; extensive experience with modern automation tools such as Selenium WebDriver, Cypress, or other industry-standard automation frameworks; strong understanding of QA methodologies, SDLC, STLC, and Agile/Scrum processes; experience building reusable automation frameworks (UI and/or API); experience with performance testing tools (e.g., JMeter, Locust), preferred for Group B.
    $70k-90k yearly est. 2d ago
  • Site Reliability Engineer - AML Global Recommendation - USDS

    TikTok 4.4 company rating

    Reliability engineer job in New York, NY

    About the Team: The Site Reliability Engineering (SRE) team within AML (Applied Machine Learning) combines system engineering and the art of machine learning to develop and run a massively distributed AI/ML recommendation system serving the United States and the rest of the world. On the SRE team, you'll have the opportunity to sharpen your expertise in coding, performance analysis, and large-scale systems operation. Join us and you'll have the chance to shape the future of AML systems and make a real, tangible impact on TikTok users. To enhance collaboration and cross-functional partnerships, among other things, our organization currently follows a hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager/department. We regularly review our hybrid work model, and the specific requirements may change at any time. Responsibilities: * Design, build, and maintain highly available, scalable, and fault-tolerant systems. * Monitor and analyze system performance, identifying and resolving issues before they cause user impact. * Develop and maintain automated monitoring, alerting, and incident response systems. * Collaborate closely with software engineering teams to ensure that applications are designed with reliability, scalability, and performance in mind. * Implement and maintain security best practices and ensure compliance with regulatory requirements. * Participate in on-call rotations and respond to issues and incidents within and outside of normal business hours. * Conduct root cause analysis of incidents, hold post-mortem reviews with stakeholders, and implement preventative measures to minimize the risk of similar incidents occurring in the future. Minimum Qualifications: * Expertise in analyzing and troubleshooting Linux-based distributed systems. * Bachelor's/Master's degree in Computer Science, Computer Engineering, or equivalent years of experience in an SRE or software engineering role. 
* Experience programming with at least one commonly used language (C, C++, Python, Go). * Strong understanding of data structures and algorithms. * Competent knowledge of relational database systems. Preferred Qualifications: * Ability to design and maintain large-scale systems. * Strong understanding of code optimization and routine task automation. * Proficiency in at least one machine learning framework: TensorFlow, PyTorch, MXNet, or PaddlePaddle. As a condition of employment, all successful candidates must be able to establish authorization to work in the United States. For this position, the Company does not provide sponsorship for any immigration-related benefits.
    $131k-200k yearly est. 60d+ ago
  • Software Engineer II - Site Reliability Engineer

    The Walt Disney Company 4.6 company rating

    Reliability engineer job in New York, NY

    Technology is at the heart of Disney's past, present, and future. Disney Entertainment and ESPN Product & Technology is a global organization of engineers, product developers, designers, technologists, data scientists, and more - all working to build and advance the technological backbone for Disney's media business globally. The team marries technology with creativity to build world-class products, enhance storytelling, and drive velocity, innovation, and scalability for our businesses. We are Storytellers and Innovators. Creators and Builders. Entertainers and Engineers. We work with every part of The Walt Disney Company's media portfolio to advance the technological foundation and consumer media touch points serving millions of people around the world. Here are a few reasons why we think you'd love working here: **Building the future of Disney's media:** Our Technologists are designing and building the products and platforms that will power our media, advertising, and distribution businesses for years to come. **Reach, Scale & Impact:** More than ever, Disney's technology and products serve as a signature doorway for fans' connections with the company's brands and stories. Disney+. Hulu. ESPN. ABC. ABC News...and many more. These products and brands - and the unmatched stories, storytellers, and events they carry - matter to millions of people globally. **Innovation:** We develop and implement groundbreaking products and techniques that shape industry norms, and solve complex and distinctive technical problems. Product Engineering is a unified team responsible for the engineering of Disney Entertainment & ESPN digital and streaming products and platforms. This includes product engineering, media engineering, quality assurance, engineering behind personalization, commerce, lifecycle, and identity. 
**Job Summary:** As a Software Engineer on the COPEX team, you'll design and build the foundational backend systems that directly power the Hulu & Disney+ streaming experience. You will architect mission-critical, high-throughput services for API and content recommendation delivery, while also building the platforms that empower our entire engineering organization to ship code with speed and confidence. You will join a talented team of engineers who build the software that: + Delivers foundational APIs and serves personalized streaming experiences to millions of users daily. + Enables our engineering organization to define, provision, and manage cloud infrastructure programmatically and at scale. + Allows teams to deploy changes to production swiftly and safely through sophisticated, automated CI/CD pipelines. + Provides deep insight into application performance via powerful, self-service observability and testing platforms. + Optimizes system capacity and cloud costs by engineering data-driven, automated solutions. **Responsibilities and Duties of the Role:** + Architect, build, and scale foundational backend services for API delivery and content recommendation, focusing on high availability, low latency, and massive throughput. + Design, build, and evolve our CI/CD solutions, writing clean, scalable code to automate the entire build, test, and deployment lifecycle. + Architect and develop robust, scalable test automation frameworks that product teams will use for load, integration, and functional testing. + Write software to abstract and automate infrastructure provisioning, creating a seamless, self-service experience for engineering teams using Infrastructure as Code (IaC). + Develop the core software, libraries, and services that form our observability platform, enabling engineers to easily build reliable and performant applications. 
+ Proactively improve system architecture and build software-based solutions to reduce toil, minimize incidents, and automate remediation. **Required Education, Experience/Skills/Training:** Basic Qualifications + Minimum 3 years of professional experience + Experience in a DevOps or SRE role. + Experience with IaC + Experience with incident response + Experience with containerization + Experience with CI/CD tools + Experience programming in Java or a JVM language + Experience working on cross-team projects. + An ability to work both independently and collaboratively + Strong communication skills and a desire to share and learn Required Education + Bachelor's degree in Computer Science, Computer Engineering, Information Technology, or a related technical field. The hiring range for this position in New York is $120,300 to $161,300 per year. The base pay actually offered will take into account internal equity and also may vary depending on the candidate's geographic region, job-related knowledge, skills, and experience among other factors. A bonus and/or long-term incentive units may be provided as part of the compensation package, in addition to the full range of medical, financial, and/or other benefits, dependent on the level and position offered. **Job ID:** 10133879 **Location:** New York, New York **Job Posting Company:** Disney Entertainment and ESPN Product & Technology The Walt Disney Company and its Affiliated Companies are Equal Employment Opportunity employers and welcome all job seekers including individuals with disabilities and veterans with disabilities. If you have a disability and believe you need a reasonable accommodation in order to search for a job opening or apply for a position, email Candidate.Accommodations@Disney.com with your request. This email address is not for general employment inquiries or correspondence. We will only respond to those requests that are related to the accessibility of the online application system due to a disability.
    $120.3k-161.3k yearly 29d ago
  • Reliability Engineer

    GE Vernova

    Reliability engineer job in Parsippany-Troy Hills, NJ

    Summary: As the Reliability Engineer for Metem, a GE Vernova business, you will be an active contributor to the success of the organization by improving the reliability, availability, and performance of our equipment and processes. You will analyze failure data, develop maintenance strategies, and work cross-functionally to implement proactive measures that reduce downtime and increase efficiency. Job Description What you'll do: Develop and implement reliability improvement strategies using industry best practices such as RCM (Reliability-Centered Maintenance), FMEA (Failure Mode and Effects Analysis), and Root Cause Analysis (RCA). Monitor and analyze equipment performance and failure data to identify trends and areas for improvement. Collaborate with maintenance, operations, engineering, and safety teams to design and implement preventive and predictive maintenance programs. Establish key performance indicators (KPIs) for equipment reliability, and track progress against targets. Drive continuous improvement initiatives aimed at reducing equipment downtime and maintenance costs. Lead investigations into equipment failures and chronic issues, identifying root causes and implementing long-term solutions. Provide technical support for asset management, including equipment life cycle analysis and spare parts optimization. Participate in the design and installation of new equipment, ensuring reliability is considered from the outset (Design for Reliability). Eligibility Requirements This role requires use of technical data subject to U.S. Government export restrictions and this posting is only for U.S. Persons (U.S. Citizens, lawful permanent residents and protected individuals (e.g., certain refugees and asylees)). GE will require proof of status prior to employment. This is an onsite position based in Parsippany, NJ. Must be open to travel requirements. 
Ability to travel to the Allentown, PA facility approximately once per week and to Hungary an average of 2 times a year. What you'll bring (Basic Qualifications): Bachelor's degree from an accredited university in Mechanical, Electrical, or Industrial Engineering. Minimum of 7 years of experience in reliability, maintenance, or engineering. Strong knowledge of reliability engineering tools and methodologies (e.g., FMEA, RCA, Weibull analysis, MTBF/MTTR). Strong knowledge of engineering concepts and maintenance repair methods. Ability to interpret blueprints, specifications, drawings, and schematics. Experience with Maintenance Management Systems. Project Management skills and experience. What will make you stand out: You have completed your CMRP certification. You have a Six Sigma certification. You are detail oriented with good organizational skills. You have excellent verbal and written communication skills. You have experience in chemical manufacturing operations and/or CNC machining facilities. You have a Process Safety Management background. This role requires access to U.S. export-controlled information. If applicable, final offers will be contingent on ability to obtain authorization for access to U.S. export-controlled information from the U.S. Government. Additional Information GE Vernova offers a great work environment, professional development, challenging careers, and competitive compensation. GE Vernova is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, national or ethnic origin, sex, sexual orientation, gender identity or expression, age, disability, protected veteran status or other characteristics protected by law. GE Vernova will only employ those who are legally authorized to work in the United States for this opening. Any offer of employment is conditioned upon the successful completion of a drug screen (as applicable). Relocation Assistance Provided: Yes For candidates applying to a U.S. 
based position, the pay range for this position is between $123,700.00 and $206,200.00. The Company pays a geographic differential of 110%, 120% or 130% of salary in certain areas. The specific pay offered may be influenced by a variety of factors, including the candidate's experience, education, and skill set. Bonus eligibility: discretionary annual bonus. This posting is expected to remain open for at least seven days after it was posted on December 01, 2025. Available benefits include medical, dental, vision, and prescription drug coverage; access to Health Coach from GE Vernova, a 24/7 nurse-based resource; and access to the Employee Assistance Program, providing 24/7 confidential assessment, counseling and referral services. Retirement benefits include the GE Vernova Retirement Savings Plan, a tax-advantaged 401(k) savings opportunity with company matching contributions and company retirement contributions, as well as access to Fidelity resources and financial planning consultants. Other benefits include tuition assistance, adoption assistance, paid parental leave, disability benefits, life insurance, 12 paid holidays, and permissive time off. GE Vernova Inc. or its affiliates (collectively or individually, “GE Vernova”) sponsor certain employee benefit plans or programs. GE Vernova reserves the right to terminate, amend, suspend, replace, or modify its benefit plans and programs at any time and for any reason, in its sole discretion. No individual has a vested right to any benefit under a GE Vernova welfare benefit plan or program. This document does not create a contract of employment with any individual.
    $123.7k-206.2k yearly Auto-Apply 60d+ ago
  • Site Reliability Engineer

    Kalshi

    Reliability engineer job in New York, NY

    Role Roadmap: We are building a next-generation financial ecosystem (think NYSE or CME from scratch). We are a small team, which means your responsibilities scale very rapidly, and your contributions are clear and visible, not marginal. There is still a lot of green field at Kalshi and a lot of it (including entire systems) can be yours. What you'll do: Improve observability, reliability and availability by defining and measuring key metrics. Build automation and improve systems to eliminate toil and operations work. Collaborate with our core infrastructure team to performance tune and optimize our cloud deployments (think Docker, Terraform, Kubernetes, EC2, etc.). Collaborate with product teams to reduce service disruptions and automate incident response. Proactively find and analyze reliability problems across our business units and stack, then design and implement software to create step-function improvements. Educate, mentor and hold accountable the engineering team to improve the reliability of our systems and make reliability a core value of the Kalshi engineering culture. Write high quality, well tested code to meet the needs of your customers. Debug extremely difficult technical problems, and make systems and products both work better and become easier to deploy, own, operate and diagnose. Review all feature designs within your product area and across the company for cross-cutting projects. Be an owner of the security, safety, scale, operational integrity, and architectural clarity of these designs. Build integrations with 3rd party vendors. Participate in an on-call support rotation to provide timely troubleshooting and resolution of urgent issues. What we're looking for (Attributes): You have at least 4 years of experience in software engineering. You've designed, built, scaled and maintained production services, and know how to compose a service oriented architecture. You write high quality, well tested code to meet the needs of your customers. 
You're passionate about building an open financial system that brings the world together. You possess strong technical skills for system design and coding. Excellent written and verbal communication skills, and a bias toward open, transparent cultural practices. Strong skills around observability, debugging and performance tuning. Strong interpersonal skills working with engineers from junior to principal levels. Demonstrated critical thinking under pressure. A willingness to dive into understanding, debugging, and improving any layer of the stack. On-call availability to ensure swift resolution of issues. Bonus points: Experience designing and building reliable systems capable of handling high throughput and low latency. Experience with Datadog. Experience with Rust, Go and Terraform. Experience with AWS, GCP, or Azure. Experience working in a highly regulated environment. Experience writing company-facing blog posts and training materials. Our Culture: Meritocracy is at our core, and we value people who take ownership and figure (usually hard) things out. We dream big. We love our craft deeply and are proud of what we put out in the world. We are committed to our vision of building something big… but also useful: a product that brings more truth through the power of markets. Kalshians are Kalshi's most important asset: we pick Kalshians carefully, so we trust them fully on day 1. NYC Pay Transparency Disclosure: Salary Range: $100,000 to $250,000 annually plus equity and benefits. This salary range is based on the current available market data and represents the expected salary range for this role. Kalshi has minimal hierarchy and few titles, but a broad range of experience is represented within roles. Should you have compensation expectations that exceed these bands, we'd love to hear from you and would welcome you to reach out to discuss further. 
Commitment to Equal Opportunity Kalshi is committed to creating a culture of inclusion and belonging, and we are proud to be an equal opportunity employer. We believe it is our collective responsibility to uphold these values and encourage candidates from all backgrounds to join us in our mission. All qualified applicants will be treated with respect and receive equal consideration for employment without regard to race, color, creed, religion, sex, gender identity, sexual orientation, national origin, disability, uniform service, veteran status, age, or any other protected characteristic per federal, state, or local law. If you are passionate about what you do and want to use your talents to support our mission and values, we'd love to hear from you.
    $100k-250k yearly Auto-Apply 39d ago
  • Staff Site Reliability Engineer

    Ro

    Reliability engineer job in New York, NY

    Ro is a direct-to-patient healthcare company with a mission of helping patients achieve their health goals by delivering the easiest, most effective care possible. Ro is the only company to offer nationwide telehealth, labs, and pharmacy services. This is enabled by Ro's vertically integrated platform that helps patients achieve their goals through a convenient, end-to-end healthcare experience spanning from diagnosis, to delivery of medication, to ongoing care. Since 2017, Ro has helped millions of patients, including patients in every county in the United States and in 98% of primary care deserts. Ro has been recognized as a Fortune Best Workplace in New York and Health Care for four consecutive years (2021-2024). In 2023, Ro was also named Best Workplace for Parents for the third year in a row. In 2022, Ro was listed as a CNBC Disruptor 50. The Role: At Ro, our mission is to provide world-class healthcare by putting patients first - and that mission depends on reliable, secure, and scalable systems. As a Staff SRE on the infrastructure team, you'll sit at the core of that effort: owning the reliability of our production systems, hardening infrastructure and building tools that empower our engineers to ship safely and confidently. You will work across teams to drive uptime, performance and observability - partnering closely with product, platform and security engineers. 
From designing resilient systems to shaping incident response practices, this is a role for engineers who thrive on impact and care deeply about operational excellence. What You'll Do: Design and implement resilient infrastructure to support high availability at scale. Build and contribute to tools and platforms that streamline deployment, monitoring and recovery of systems. Drive incident response and harness learnings, leading efforts to minimize downtime and improve MTTR. Partner with engineering teams to bake best practices for reliability, resilience and observability into services. Automate infrastructure workflows using IaC and other cloud-native tools. Champion a culture of operational excellence, guiding engineers through reliability practices and raising the bar across the engineering org. What You'll Bring to the Team: Deep understanding of systems and infrastructure, with experience operating distributed services in production. We are mostly in AWS and leverage a lot of its primitives - EKS, RDS, Route 53, S3, ElastiCache to name a few. Strong programming and automation skills using Go (bonus points for Python). Proficiency with infrastructure as code (Terraform / Pulumi). A passion for observability, with hands-on experience in metrics, logging, and tracing using Datadog. Strong cross-functional communication, able to collaborate with product, platform, security and other teams. An operational mindset that treats reliability and resilience as a core product requirement. A mission-driven attitude, motivated by the opportunity to make healthcare better. 
We've Got You Covered: Full medical, dental, and vision insurance + OneMedical membership; Healthcare and Dependent Care FSA; 401(k) with company match; Flexible PTO; Wellbeing + Learning & Growth reimbursements; Paid parental leave + Fertility benefits; Pet insurance; Student loan refinancing; Virtual resources for mindfulness, counseling, and fitness. The target base salary for this position ranges from $202,000 to $243,000, in addition to a competitive equity and benefits package (as applicable). When determining compensation, we analyze and carefully consider several factors, including location, job-related knowledge, skills and experience. These considerations may cause your compensation to vary. Ro recognizes the power of in-person collaboration, while supporting the flexibility to work anywhere in the United States. For our Ro'ers in the tri-state (NY) area, you will join us at HQ on Tuesdays and Thursdays. For those outside of the tri-state area, you will be able to join in-person collaborations throughout the year (i.e., during team on-sites). At Ro, we believe that our diverse perspectives are our biggest strengths - and that embracing them will create real change in healthcare. As an equal opportunity employer, we provide equal opportunity in all aspects of employment, including recruiting, hiring, compensation, training and promotion, termination, and any other terms and conditions of employment without regard to race, ethnicity, color, religion, sex, sexual orientation, gender identity, gender expression, familial status, age, disability and/or any other legally protected classification protected by federal, state, or local law.
    $202k-243k yearly Auto-Apply 60d+ ago
  • Site Reliability Engineer

    Cape Asset Management

    Reliability engineer job in New York, NY

    The Company: Cape was founded in early 2022 by Palantir and Anduril alums with deep expertise in privacy and national security. While running Palantir's US national security business, our CEO became passionate about privacy and security on mobile devices. Our mission is to be a force for good in global wireless. At Cape, we are not just another cellular service provider; we are the architects of a privacy-centric movement that starts with the devices in your pocket. We are building a cellular network that helps citizens, including those responsible for our nation's security, regain control of their own data. We believe that where we are, where we go, and whom we are with are among our most personal information and should be kept private. Privacy is not something you achieve by limiting yourself or by doing less; it is a set of features to be built so you can do more. We have raised money from Andreessen Horowitz and other top-tier VCs, and are excited to grow the team. The Team: We are relentless builders, constantly pushing the boundaries of what's possible and bringing to life ideas that have never before existed. Innovation is at the core of everything we do. At Cape, we trust our team to deliver greatness and empower them to make a profound impact. As a member of our team, you will collaborate seamlessly with our diverse group of talented engineers and other team members, enjoying dynamic interactions with colleagues from across the organization. The Role: To join our team, you should be excited to: * Dive into a well-funded but early-stage startup. We're in a scrappy phase, so be comfortable getting a little uncomfortable. * Reclaim some of the personal privacy we have all sacrificed as smartphone adoption has grown. * Flex your technical skills on hard, important problems with serious implications for consumer privacy and national security. * Push the envelope: we are using new technology in novel ways. * Work on greenfield problems.
Start new projects from the ground up, shape the stack and its practices, and get the opportunity to try new tools and technologies. * Work in person! There are no face-time requirements or set hours here, and we all take work-from-home days. But our default work location is our DC or NY office, and we enjoy the informal culture and serendipity that in-person work enables. We're offering competitive salary, benefits, and equity with early-stage upside. What you'll do: * Be responsible for the full lifecycle development of our privacy-focused telecommunications and deployment infrastructure. * Build, integrate, and maintain our instrumentation and monitoring infrastructure and tooling for improving the reliability, availability, and performance of our system. * Help identify and solve problems proactively, before they become incidents. * Build new or integrate with existing telecommunications infrastructure and components. * Own the technical accreditation and compliance process end-to-end for FedRAMP. * Shape and influence what great software engineering practices look like. * Balance short-term critical business needs with long-term product vision and roadmap. Qualifications: Although we list what we generally look for, you may have other attributes and skills, not listed here, that could make you a great fit. It doesn't hurt to take a chance and apply! Preferred: * 4+ years of software engineering or SRE experience. * Strong familiarity with AWS. * Fluency in Golang, Rust, Java/Kotlin, Python, or a similar language. * Experience with building, deploying, and using monitoring infrastructure & tools. * Experience designing, building, and delivering high-availability systems and infrastructure. * Passion for privacy and national security. * A desire to work on software that has real-world impact. Nice to have: * Familiarity with Azure and/or GCP in addition to AWS.
* Familiarity with mobile telecommunications technologies such as IP Multimedia Subsystem (IMS), 4G/5G mobile core network functions, and/or multimedia protocols. * Experience managing redundant, high-availability multi-site deployments. * Experience organizing documentation to support accreditation processes such as FedRAMP and ATOs. The salary range for this role is $150,000-$230,000 a year + equity + 401(k) match. Within the range, individual pay is determined by location, experience, relevant education, and/or training. Our Culture: * We are builders, and we choose to spend our time building things that matter. Many of our people have backgrounds in Defense Tech as well as the defense and intelligence community. We build to win. * We hire excellent people, give them outsized responsibility, and trust them to execute at a high level. Everyone here has a track record of solving hard problems throughout their careers. * We believe that personal privacy and national security interests are not inherently at odds, and can be reconciled via strong technology. * We believe that companies exist to build awesome things and take care of their people. Our benefits reflect that: top-tier health care, 401(k) matching, and a generous vacation policy (that we actually use). * We hire candidates of any race, color, ancestry, religion, sex, national origin, sexual orientation, gender identity, age, marital or family status, disability, Veteran status, and any other status. Achieving diversity across these categories will serve to make our company stronger and our product better. How to apply: Click the link below to apply. We reserve the right to make use of any unsolicited resumes received from outside recruiting agencies and/or individual recruiters without being responsible for payment of any fees asserted from the use of unsolicited resumes.
    $150k-230k yearly 45d ago
  • Lead Site Reliability Engineer

    Kraken 3.3 company rating

    Reliability engineer job in New York, NY

    Help us use technology to make a big green dent in the universe! Kraken powers some of the most innovative global developments in energy. We're a technology company focused on creating a smart, sustainable energy system. From optimising renewable generation, creating a more intelligent grid and enabling utilities to provide excellent customer experiences, our operating system for energy is transforming the industry around the world in a way that benefits everyone. It's a really exciting time in energy. Help us make a real impact on shaping a better, more sustainable future. Our Global Platform Engineering Reliability group is responsible for architecting, developing, and maintaining the resilient and scalable infrastructure that powers and supports our platforms. As a Lead Site Reliability Engineer within the newly created 'Product Reliability' team, you'll be responsible for ensuring the availability, performance, and scalability of the products on our platform. Your proficiency in leading technical teams that support products serving millions of customers will ensure stability and high performance for our brands and clients. You will keep up with best practices in building products for scale.
Your communication skills and attention to detail will be indispensable as you pinpoint areas for enhancement, ensure optimal product performance, and continuously improve our platforms' reliability and efficiency. What you'll do: Team leadership: Have ownership of the Product Reliability team within Platform, working closely with the Director and Heads of Platform Engineering to define strategic objectives and team direction. Manage team priorities and ensure initiatives are completed within deadlines. Collaborate regularly and effectively with the Staff Platform Engineer in your functional team to deliver the technical implementation of the team's strategic priorities. Lead delivery of major initiatives on clear timelines. Partner effectively with the wider Platform Engineering team to deliver outcomes. Build a strong culture of open communication where teammates can ask questions without fear, promoting a positive and inclusive team environment. People management: Line-manage the engineers in the Product Reliability team. Set clear performance expectations and goals for team members. Regularly review individual and team performance, offering actionable insights and constructive feedback to support and grow team members. Technical delivery: Deliver technical improvements such as small features and bug fixes. Support team delivery through code reviews, technology research and architectural guidance. Provide support for service offerings owned by your team. Help solve interesting and difficult problems.
There's a great opportunity for disruption in the global energy market. What you'll have: Excellent communication skills, working effectively with developers, product managers and other business stakeholders to understand and deliver impactful projects and reliability improvements. A record of successfully and consistently delivering critical-path projects, on time and at scale. Meticulous organisation and planning skills. Experience mentoring and coaching a team to perform at a high level of quality. Experience managing and supporting large-scale, internet-facing distributed systems for millions of customers. Good experience with AWS and a programming language; we use a lot of different AWS services, not just the standard few. Knowledge of security best practices, security and CI/CD tooling, and methodologies. We're hiring this role in New York City but would also consider remote candidates based in the EST time zone; we cannot consider any applicants outside this region. What will help: Previous experience leading technical delivery for small, highly autonomous teams. Previous experience as a technical individual contributor, preferably as a Site Reliability Engineer. A track record of effective collaboration with other teams and departments to drive holistic outcomes. A proactive, innovative mindset with the ability to drive continuous improvement. Previous experience working in a remote-first, asynchronous global team. Familiarity with some of our tech stack: - PostgreSQL, or a similar RDBMS, particularly in Amazon RDS at scale - Docker and Kubernetes (we use Amazon EKS in production) - Python - Datadog, or a similar logging/monitoring tool - Message queues, event-driven async processing, or similar technologies (we use RabbitMQ) - Terraform, or a similar infrastructure-as-code tool - Experience with a Linux distribution. Why you'll love it here: Great medical, dental, and vision insurance options, including FSAs.
Paid time off: we know working hard also means being able to recharge as needed, and we trust our employees to get the work done and take the time they need. 401(k) plan with employer match. Parental leave: biological, adoptive, and foster parents are all eligible. Pre-tax commuter benefits. Flexible working environment: need to shift your schedule around? You do you; we genuinely believe in work/life balance. Equity options: every Octopus employee owns part of the business. We're a team, working together towards huge goals. Every person is crucial to our success, and you should be rewarded as such. Modern office or co-working spaces, depending on location. We hire a wide range of experience levels into our platform team. The salary range for this role in the US is, on average, $170,000-$200,000, depending on relevant experience, role alignment, and performance throughout the interview process. While the broad salary range is listed, not all candidates will be placed at the top of the range; placement will be determined by overall fit for the position. If you have questions about this, just ask! Our recruiters are happy to provide more context. We are hiring this role in our New York City office but would also consider remote candidates within the East Coast region/time zone. We cannot consider candidates outside this region. Kraken is a certified Great Place to Work in France, Germany, Spain, Japan and Australia. In the UK, we are one of the Best Workplaces on Glassdoor with a score of 4.7. Check out our Welcome to the Jungle site (FR/EN) to learn more about our teams and culture. Are you ready for a career with us? We want to ensure you have all the tools and environment you need to unleash your potential. If you have any specific accommodations or a unique preference, please contact us at [email protected] and we'll do what we can to customise your interview process for comfort and maximum magic!
Studies have shown that some groups of people, like women, are less likely to apply to a role unless they meet 100% of the job requirements. Whoever you are, if you like one of our jobs, we encourage you to apply, as you might just be the candidate we hire. Across Kraken, we're looking for genuinely decent people who are honest and empathetic. Our people are our strongest asset and the unique skills and perspectives people bring to the team are the driving force of our success. As an equal opportunity employer, we do not discriminate on the basis of any protected attribute. We consider all applicants without regard to race, colour, religion, national origin, age, sex, gender identity or expression, sexual orientation, marital or veteran status, disability, or any other legally protected status. U.S.-based candidates can learn more about their EEO rights here. Our (i) Applicant and Candidate Privacy Notice and Artificial Intelligence (AI) Notice, (ii) Website Privacy Notice and (iii) Cookie Notice govern the collection and use of your personal data in connection with your application and use of our website. These policies explain how we handle your data and outline your rights under applicable laws, including, but not limited to, the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). Depending on your location, you may have the right to access, correct, or delete your information, object to processing, or withdraw consent. By applying, you acknowledge that you've read, understood and consent to these terms. We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
    $170k-200k yearly Auto-Apply 60d+ ago
  • Director, Site Reliability Engineer

    Ordergroove 4.5 company rating

    Reliability engineer job in New York, NY

    Who We Are: Ordergroove is a dynamic, fast-paced environment where you will be involved in building something of real value from the ground up. We're looking for bright, talented people who are excited about innovation, growth, and the exciting world of Relationship Commerce. If you're motivated by a desire to solve problems and deliver groundbreaking insights and solutions, you'll fit in perfectly! About the Role: Ordergroove is looking for an extraordinarily talented, passionate and naturally curious person to join our Engineering Team. Our Engineers are problem solvers who get excited about pushing the boundaries of what we can achieve, love learning and thrive in a fast-paced, collaborative environment. As the Site Reliability Engineer Director, you will be joining our SRE team, whose primary goal is to infinitely scale and secure our cloud-based hosted platform and accelerate time to market of code deployments while supporting the biggest brands in the world. What You Will Do: You will define and execute the vision for continuous delivery, cloud deployment strategies, and operational excellence. You will spearhead our reliability, scalability, and automation efforts to ensure our platform operates securely and efficiently. Work closely with engineering, security, and QA teams to enhance deployment and release processes. You will help us scale how we push high-quality code, securely and efficiently. You will guide, mentor, and collaborate with an awesome group of highly passionate engineers. About You: 8+ years working in DevOps, SRE, or a similar capacity, managing cloud, private, or hybrid environments. Passionate about automation, leveraging best-of-breed technologies, and eager to learn new skills. Experience working with automation tools (we use Terraform, Ansible and Chef). Experience with Kubernetes, Docker, and container orchestration best practices. Comfortable with Linux systems administration (CentOS / RHEL / Debian).
Experience configuring CI/CD pipelines using Jenkins, GitHub Actions, or GitOps. Fluent with Apache/Nginx configurations. Comfortable with MySQL/MongoDB administration and scaling strategies. Experience designing high-availability/fault-tolerant systems. Comfortable working with development teams to understand their pain points and come up with creative solutions. Excellent communication skills. Service-oriented mentality. Great critical thinking skills. Ability to quickly adapt to change. Bonus Points For: Python or Go experience. Managing GCP hosted environments. Previous eCommerce experience. Experience going through PCI1 and SOC2 compliance approval processes. Start-up and SaaS experience strongly preferred. Familiarity with monitoring tools such as Prometheus, Grafana, or New Relic. Bachelor's or Master's degree in Computer Science, Engineering, or a related field preferred. If you don't meet 100% of the qualifications outlined above - that's okay, nobody's perfect! We encourage you to apply if you think this is a role that would make you excited to come into work every day. About Ordergroove: Ordergroove powers recurring revenue for the world's largest and most innovative retailers including L'Oreal, Dollar Shave Club, La Colombe Coffee, Bonafide Health, BarkBox and more. As a direct result, more than 11% of adult Americans have a subscription powered by Ordergroove. Our technology makes seamless, one-of-a-kind subscriber and membership experiences possible to turn one-time transactions into profitable recurring customer relationships. Ordergroove's powerful platform empowers merchants with highly customizable options such as flexible promotions, bundling, and analytics to bolster their bottom line while making customers' lives easier. We recently achieved a milestone year with 152% year over year new business growth, and were rated best-in-class subscription technology by CB Insights and eCommerce Platform of the Year by RetailTech Breakthrough Awards. 
Our company values celebrate collaboration, different perspectives, and curiosity with the goal of getting to the right answer, no matter who came up with it. At Ordergroove we are committed to creating a welcoming and supportive environment for all people. We encourage people with different backgrounds and experiences to join our growing team so that we gain different perspectives and build the best team possible. We demand the best of ourselves and each other and never miss an opportunity to celebrate our successes. With a fully flexible work-from-anywhere culture, staying connected and supporting each other are always top of mind. We build our tight-knit community through small group events like trivia night, cooking classes, and book clubs. We encourage cross-functional relationships through virtual coffees and we stay close to the business through weekly team updates and quarterly all-hands meetings. At Ordergroove, we focus on flexibility and empowering our team to make the right decisions for themselves. We have flexible PTO, a fully remote workforce (anywhere in the US), and an annual personal development budget you can use for what matters to you (wellness, career development, productivity at home, etc.). And of course, that is on top of the basics like competitive compensation (including stock options) and incredible, affordable benefits. Come join our amazing team while we enable the fastest-growing segment of commerce that makes life easier for millions of consumers every day! At Ordergroove, we want to hire, develop and retain the best talent, making Ordergroove a top destination to grow your career. The pay transparency law is a way of narrowing the gender pay gap and fostering an engaged and positive working environment. It is also a way to share what we think is a reasonable, equitable and competitive compensation structure for the roles on our team. The total compensation range for this role starts at $175,000.
    $175k yearly Auto-Apply 60d+ ago
  • Site Reliability Engineer III - Kafka Platform Engineering

    JPMorgan Chase & Co. 4.8 company rating

    Reliability engineer job in Jersey City, NJ

    JobID: 210662270 JobSchedule: Full time JobShift: Base Pay/Salary: Jersey City, NJ $133,000.00-$185,000.00 There's nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems. As a Site Reliability Engineer III at JPMorgan Chase within Infrastructure Platforms, you will solve complex and broad business problems with simple and straightforward solutions. Through code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their associated infrastructure to independently decompose and iteratively improve on existing solutions. You are a significant contributor to your team by sharing your knowledge of end-to-end operations, availability, reliability, and scalability of your application or platform. Job responsibilities * Guides and assists others in building designs at the appropriate level and gaining consensus from peers where appropriate. * Demonstrates deep knowledge of Kafka technology, the Kafka Connect framework, and distributed systems technologies, with the ability to operate in and migrate across public and private clouds. * Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines. * Collaborates with other software engineers and teams to design, develop, test, and implement availability, reliability, and scalability solutions in their applications. * Implements infrastructure, configuration, and network as code for the applications and platforms in your remit. * Collaborates with technical experts, key stakeholders, and team members to resolve complex problems. * Understands service level indicators and utilizes service level objectives to proactively resolve issues before they impact customers.
* Contributes to the development of technical documentation, including service APIs using Swagger, ensuring robust logging, auditability, security, and monitoring features. * Supports the adoption of site reliability engineering best practices within your team. * Engages in periodic on-call rotation shifts, providing client support and ensuring thorough monitoring of the platform. Required qualifications, capabilities, and skills * Formal training or certification in computer science and reliability concepts and 3+ years of applied experience. * Proficient in site reliability culture and principles and familiarity with how to implement site reliability within an application or platform. * Proficient in at least one programming language, such as Java (Spring Boot) or Python. * Proficient knowledge of software applications and technical processes within a given technical discipline (e.g., Cloud, artificial intelligence, Android, etc.). * Experience in observability, such as white- and black-box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, etc. * Experience with public cloud platforms like AWS, GCP, or Azure. * Experience with Kafka ecosystem products: Kafka, Kafka Connect, Kafka Streams. * Experience with continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform. * Familiarity with containers and container orchestration, such as ECS, Kubernetes, and Docker. * Familiarity with troubleshooting common networking technologies and issues.
* Ability to contribute to large and collaborative teams by presenting information in a logical and timely manner with compelling language and limited supervision. * Ability to proactively recognize roadblocks and demonstrate interest in learning technology that facilitates innovation. * Ability to identify new technologies and relevant solutions to ensure design constraints are met by the software team. * Ability to initiate and implement ideas to solve business problems. Preferred qualifications, capabilities, and skills * Familiarity with running Apache Flink. * Understanding of authentication and authorization technologies (e.g., OAuth, Kerberos). * Experience with AWS cloud services and Kubernetes platform orchestration.
    $133k-185k yearly Auto-Apply 60d+ ago
  • Site Reliability Engineer - Capital Markets

    Jefferies 4.8 company rating

    Reliability engineer job in Jersey City, NJ

    Jefferies is seeking a Site Reliability Engineer to play an instrumental role in supporting front-office equity trading applications, as well as risk and middle-office real-time products developed and used for Equity Cash and ETS applications. As part of the wider platform engineering team, you will work closely with business users interactively throughout the day, along with technical, analysis, and testing colleagues. Investigating and resolving the work items at hand will require strong technical skills and a keen intellect. The business is a growth area, with investments currently being made across technology, business, and middle-office areas. Responsibilities: Front-line Site Reliability Engineering and support functions for equity trading systems used by Jefferies clients as well as internal users. Build monitoring tools for application and infrastructure components. Implement and manage scalable infrastructure using cloud-native technologies and tools. Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding. Partner with business, development, and infrastructure teams to improve services through rigorous testing and release procedures. Develop and maintain CI/CD pipelines to streamline deployment processes. Expedient deployment of new systems. Capacity planning, platform management, and support for increasing volumes and business growth. Create sustainable systems and services through automation. Collaborate with the application team to establish and enforce production and development standards. Document procedures, best practices, and troubleshooting FAQs. Resolve complex application and technical problems. Debug the system and fix production-related issues. Escalate and follow up on permanent fixes for development-related issues. Lead incident response efforts and post-mortem analysis to prevent future occurrences.
Handles complex operational tasks and recommends process and technology changes. Provides global support, including weekend availability to troubleshoot production-related issues and perform checkouts. Ability to work both independently and in groups in an energetic, diverse environment. Participate in on-call rotations to ensure 24/7 system availability and support. Support compliance and legal queries. Qualifications: Strong experience with Windows and Linux/Unix services. Strong experience in scripting languages like PowerShell, Python, and SQL. Strong knowledge of monitoring tools: Nagios, Splunk, OTel, Datadog. Strong knowledge of the FIX protocol. Strong domain skills: must have working experience in Capital Markets across modules and instruments, especially Cash, ETS, Bonds, Options, Futures, and Swaps products. Experience with BFSI (Banking, Financial Services, and Insurance) domain applications, with a proper understanding of the trade lifecycle. Excellent communication, time management, and project management skills. Full time. Salary range: $175,000 - $200,000.
    $175k-200k yearly Auto-Apply 13d ago
  • Senior Site Reliability Engineer

    Ava Labs 4.5 company rating

    Reliability engineer job in New York, NY

    Looking to join a world-class blockchain development team? Ava Labs makes it simple to deploy high-performance solutions for Web3, led by innovations on Avalanche. The company was founded by Cornell computer scientists, who partnered with Wall Street veterans and early Web3 leaders to execute a promising vision for redefining the way people build permissionless networks. Ava Labs is redefining the way people create value with Web3. Join us as we empower people to easily and freely digitize all the world's assets on one open, programmable blockchain platform. We're looking for a Senior Site Reliability Engineer to join our engineering team. This engineer will be responsible for release pipelines, environments, observability, and monitoring of critical components of the Avalanche blockchain network. They will drive efficiency and velocity across the engineering team while maintaining reliability and security. WHAT YOU WILL DO: Develop and optimize highly reliable and scalable infrastructure focused on best practices and SRE principles. Implement and maintain monitoring, logging, and tracing tools to gain insight into service behavior and health. Uphold SLOs (Service Level Objectives), SLIs (Service Level Indicators), and error budgets for critical systems. Enhance the reliability and resiliency of critical systems by identifying single points of failure and implementing best practices. Deploy and monitor observability tools and dashboards for monitoring and optimizing systems, using tools such as Datadog and Grafana. Develop infrastructure deployment scripts using Terraform, Terragrunt, and Argo CD for production and test environments. Improve the development release pipeline automation and quality gates in GitHub Actions. Work closely with developers to increase the productivity, velocity, and efficiency of the team. Identify areas for cost optimization and reduction, and execute cost reduction measures. Automate and streamline
incident management processes to minimize service disruption and improve response times. Participate in on-call rotations, ensuring quick restoration of services and fostering a blameless post-mortem culture. Foster a continuous improvement mindset by analyzing and learning from incidents and implementing preventive measures. Leverage cloud technologies and IaC tools to ensure scalability and repeatability. Advocate for best practices in reliability, security, and maintainability within the team. WHAT YOU WILL BRING: BS in Computer Science or a related field. 7+ years of experience as an SRE, DevOps, or Cloud Engineer. Strong grasp of SRE principles, including error budgets, SLOs, and SLIs. Cloud networking and orchestration with AWS (EKS, ECS, VPC, S3, ELB). Strong Kubernetes experience with Docker or rkt containerization. Proficiency in Infrastructure as Code (IaC) using tools such as Terraform, Terragrunt, and Ansible. Experience with monitoring and observability tools like Prometheus, Grafana, or the ELK Stack. Building and maintaining CI/CD pipelines with GitHub Actions (preferred), Jenkins, Travis CI, or CircleCI. Experience with automation and configuration management using Ansible, Puppet, or Chef. Experience with Linux-based infrastructure (Ubuntu preferred). Experience with scripting languages and writing scripts (Python and Go preferred). Working knowledge of decentralized architecture design patterns and distributed systems. Salary Range: $158,440.00 to $188,147.50 (This is not a guarantee of compensation or salary; a final offer amount may vary based on factors including but not limited to experience and geographic location. NYC metro candidates are required to be in the office 2-3x/week, with exceptions.) WHY AVA LABS? If you've ever thought about joining an early-stage Web3 company, this is it!
We're a global, world-class team of experts in computer science, economics, finance, marketing, and law with offices in New York City and Miami. We're highly passionate about Web3 and redefining the way people build and use finance and decentralized applications of all kinds. The company received early-stage funding from Andreessen Horowitz, Initialized Capital, and Polychain Capital, with angel investments from Balaji Srinivasan and Naval Ravikant. Join us and be a pioneer in a new technology that will have implications across a range of verticals such as finance, gaming, investing, and collectibles, among many others. Ava Labs is committed to diversity in the workplace and we're proud to be an Equal Opportunity Employer. We do not make hiring decisions on the basis of race, color, religion, creed, gender, national origin, citizenship, age, disability, veteran status, marital status, pregnancy, parental status, sex, gender expression or identity, sexual orientation, or any other basis protected by local, state, or federal law. All employment is decided on the basis of qualifications, merit, and business need.
    $158.4k-188.1k yearly Auto-Apply 60d+ ago
  • Site Reliability Engineer

    Hebbia

    Reliability engineer job in New York, NY

    The AI platform for investors and bankers that generates alpha and drives upside. Founded in 2020 by George Sivulka and backed by Peter Thiel and Andreessen Horowitz, Hebbia powers investment decisions for BlackRock, KKR, Carlyle, Centerview, and 40% of the world's largest asset managers. Our flagship product, Matrix, delivers industry-leading accuracy, speed, and transparency in AI-driven analysis. It is trusted to help manage over $15 trillion in assets globally. We deliver the intelligence that gives finance professionals a definitive edge. Our AI uncovers signals no human could see, surfaces hidden opportunities, and accelerates decisions with unmatched speed and conviction. We do not just streamline workflows. We transform how capital is deployed, how risk is managed, and how value is created across markets. Hebbia is not a tool. Hebbia is the competitive advantage that drives performance, alpha, and market leadership. The Role As a highly skilled Site Reliability Engineer (SRE), you will contribute to building systems that optimize the uptime and reliability of our platform, and support the management and optimization of our DevOps and infrastructure operations. You will be responsible for owning our deployment pipelines, building and maintaining our continuous integration and continuous deployment (CI/CD) systems, ensuring the reliability and performance of our services, enhancing our observability, supporting our local development environments, and bolstering our security posture. Your technical expertise and problem-solving skills will contribute to the success of our AI products and shape the future of our technology stack. Responsibilities Assist in managing deployment pipelines to facilitate smooth and efficient software releases. Help implement and maintain observability solutions for monitoring system performance and reliability. Support local development environments to optimize developer workflows. 
Work with development teams to ensure infrastructure aligns with project requirements. Contribute to improving the security of our infrastructure by assisting with proactive measures and audits. Assist in developing and maintaining automation scripts and tools to enhance operational efficiency. Help troubleshoot and resolve infrastructure and application issues to minimize downtime and maintain smooth operations. Participate in evaluating and integrating new technologies to enhance the scalability, reliability, and security of our infrastructure. Who You Are Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience. 5+ years of software development experience at a venture-backed startup or top technology firm. Proven experience as a Site Reliability Engineer, DevOps Engineer, or similar role. Strong expertise in managing CI/CD pipelines and deployment automation. Proficiency in cloud platforms such as AWS, Azure, or Google Cloud (we are an AWS shop). Solid understanding of containerization and orchestration technologies such as Docker and Kubernetes. Experience with monitoring and observability tools such as Datadog, Prometheus, Grafana, or similar. Knowledge of infrastructure-as-code (IaC) tools such as Terraform or CloudFormation. Familiarity with security best practices and tools for infrastructure and application security. Excellent problem-solving skills and the ability to troubleshoot complex issues. Strong communication skills and the ability to work effectively in a collaborative environment. A proactive and self-motivated approach to learning and adopting new technologies. Passion for continuous improvement and operational excellence. Compensation The salary range for this role is $160,000 to $300,000. This range may be inclusive of several career levels at Hebbia and will be narrowed during the interview process based on the candidate's experience and qualifications.
Adjustments outside of this range may be considered for candidates whose qualifications differ significantly from those outlined in the job description. Life @ Hebbia PTO: Unlimited Insurance: Medical + Dental + Vision + 401K Eats: Catered lunch daily + DoorDash dinner credit if you ever need to stay late Parental leave policy: 3 months for non-birthing parent, 4 months for birthing parent Fertility benefits: $15k lifetime benefit New hire equity grant: competitive equity package with unmatched upside potential
    $90k-125k yearly est. Auto-Apply 51d ago
  • Site Reliability Engineering

    Forhyre

    Reliability engineer job in New York, NY

    Job Description Forhyre is looking for engineers who can bring unique perspectives and innovative ideas to all areas of development and who are interested in continuing to improve our platform as the technology landscape changes. To be successful in this role You'll have the opportunity to design and implement major infrastructure components, systems, and developer-friendly capabilities to improve the availability, scalability, latency, and efficiency of our services You will provide technical leadership to cross-functional engineering, infrastructure, and product teams, and evangelize cloud best practices while building a culture of reliability and observability Engage in and improve the end-to-end software development lifecycle, from inception and design through deployment, operation, and refinement of a highly distributed system running in the public cloud Serve as a subject matter expert on the SRE mindset, best practices, and cloud-native principles Scale systems sustainably through automation to improve reliability and velocity Assist with all aspects of operational security and compliance Run software performance analysis and system tuning Design and implement tools to collect data from various sources and provide actionable insights Participate in critical incident management and timely post-mortems of production incidents to drive practices around blameless analysis, resolution, and continuous improvement Work with cross-functional teams Develop the rest of the team by conducting code reviews and providing mentorship, pairing, and training opportunities Qualification & Skills We are looking for a Principal SRE with proven experience running distributed systems at scale in production You have 15+ years of relevant experience in the same or a similar role Strong knowledge of container orchestration (preferably Kubernetes) and networking technology Hands-on experience in one or more languages, such as Node.js, Python, Go, Perl, Ruby, and Bash Experience with SOA, microservices architecture, API management, and enterprise system integrations Strong production experience with cloud infrastructure (AWS, Azure, and Google Cloud) Strong sense of ownership and an ability to drive tasks to completion Experience developing and monitoring distributed systems Experience working in an Agile environment, with strong collaboration skills
    $90k-125k yearly est. 11d ago
  • Site Reliability Engineer

    Clay Labs

    Reliability engineer job in New York, NY

    About Clay Clay is a creative tool for growth. Our mission is to help businesses grow without huge investments in tooling or manual labor. We're already helping over 100,000 people grow their business with Clay. From local pizza shops to enterprises like Anthropic and Notion, our tool lets you instantly translate any idea you have for growing your company into reality. We believe that modern GTM teams win by finding GTM alpha: a unique competitive edge powered by data, experimentation, and automation. Clay is the platform they use to uncover hidden signals, build custom plays, and launch faster than their competitors. We're looking for sharp, low-ego people to help teams find their GTM alpha. Why is Clay the best place to work? Customers love the product (100K+ users and growing) We're growing a lot (6x YoY last year, and 10x YoY the two years before that) Incredible culture (our customers keep applying to work here) Well-resourced: we raised a $100M Series C in 2025 at a $3.1B valuation and are backed by world-class investors like CapitalG (Google), Sequoia, and Meritech Read more about why people love working at Clay and explore our wall of love to learn more about the product. SRE @ Clay In this role, you'll join our growing infrastructure team in building and fine-tuning our infrastructure to keep our services running smoothly. We're looking for someone who's excited about automation and continuous improvement. While your main focus will be on infrastructure, coding skills are a must. As a growing startup, we all jump in where needed, so you'll need to be comfortable taking on a variety of roles. What You'll Do Architect, design, implement, and manage robust, scalable, and secure infrastructure solutions. Develop, maintain, and enforce best practices for CI/CD, infrastructure as code, and automation. Oversee the management and optimization of cloud infrastructure, ensuring high availability, performance, and cost-efficiency.
Implement monitoring, logging, and alerting solutions to maintain system health and quickly resolve issues. Lead incident response efforts, troubleshooting and resolving complex issues in a timely manner. Participate in an on-call rotation. Work with teams across the company to ensure we achieve the right balance of developer velocity, reliability and performance, and cost efficiency. What You'll Bring 5+ years of experience Experience with containerization and orchestration tools Strong understanding of CI/CD concepts and tools Knowledge of infrastructure automation tools Experience with on-call and incident response Proficiency in one or more programming languages Familiarity with our stack, or the ability to learn unfamiliar technologies quickly: Aurora Postgres RDS, ElastiCache Redis, Docker + ECS, Lambda, OpenSearch; Terraform and Atlantis; CircleCI, Netlify, Playwright; CloudWatch, Datadog, Mezmo; TypeScript, Python
    $90k-125k yearly est. Auto-Apply 13d ago
  • Lead Site Reliability Engineer

    JPMC

    Reliability engineer job in Jersey City, NJ

    Assume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability. As a Lead Site Reliability Engineer at JPMorgan Chase within Employee Platforms, you hold a leadership role in your team, demonstrate strong knowledge across multiple technical domains, and advise others on the technical and business issues they face. Lead resiliency design reviews, break complex problems into digestible work for other engineers, act as a technical lead for medium- to large-sized products, and provide advice and mentoring to other engineers. Job responsibilities Demonstrates and champions site reliability culture and practices and exerts technical influence throughout your team Leads initiatives to improve the reliability and stability of your team's applications and platforms, using data-driven analytics to improve service levels Collaborates with team members to identify comprehensive service level indicators, and with stakeholders to establish reasonable service level objectives and error budgets with customers Demonstrates a high level of technical expertise within one or more technical domains and proactively identifies and solves technology-related bottlenecks in your areas of expertise Acts as the main point of contact during major incidents for your application and demonstrates the skills to identify and solve issues quickly to avoid financial losses Documents and shares knowledge within your organization via internal forums and communities of practice Required qualifications, capabilities, and skills Formal training or certification on site reliability engineering concepts and 5+ years of applied experience Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices, with the ability to implement these practices within an application or platform Fluency
in at least one programming language (e.g., Python, Java Spring Boot, .NET) Deep knowledge of software applications and technical processes with emerging depth in one or more technical disciplines Proficiency and experience in observability, such as white- and black-box monitoring, SLO alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform) Experience with containers and container orchestration (e.g., ECS, Kubernetes, Docker) Experience troubleshooting common networking technologies and issues Ability to identify and solve problems related to complex data structures and algorithms Drive to self-educate and evaluate new technology Preferred qualifications, capabilities, and skills Experience with Splunk, Azure, and Microsoft 365 infrastructure optimization, especially migrating from monolithic to distributed services. Experience with cloud infrastructure management. Demonstrated achievements in automating operational excellence, especially proactive monitoring. Ability to teach new programming languages to team members Ability to expand and collaborate across different levels and stakeholder groups
    $87k-121k yearly est. Auto-Apply 60d+ ago
  • Cloud Site Reliability Engineer

    Ayr Global IT Solutions (3.4 company rating)

    Reliability engineer job in New York, NY

    AYR Global IT Solutions is a national staffing firm focused on cloud, cybersecurity, web application services, ERP, and BI implementations, providing proven and experienced consultants to our clients. Our competitive, transparent pricing model and industry experience make us a top choice for global system integrators and enterprise customers, with federal and commercial projects supported nationwide. Job Description Role: Cloud Site Reliability Engineer Location: NYC or Boston Duration: Full-time, permanent Qualifications Description: The Cloud Site Reliability Engineer will deploy solutions in the public cloud (e.g., AWS) using configuration, provisioning, and management tools (e.g., AWS CloudFormation). The engineer is required to design configuration templates that are used to provision infrastructure components (e.g., AWS EFS, EC2, RDS) in a scalable manner, and works closely with the scrum master to understand project requirements within an agile software development environment.
Responsibilities: • Provide input to major architectural designs to ensure consistency, security, maintainability, and flexibility with respect to the overall system architecture • Support architects in designing highly scalable and automated deployments for a wide range of applications • Templatize configurations using architecture blueprints • Develop stable and scalable services across public cloud environments such as AWS and GCP • Configure infrastructure resources and assess their overall compliance against policy rules • Make recommendations to improve process efficiency and effectiveness • Handle support escalations from developers that require troubleshooting existing configurations • Fulfill developer requests for new configuration templates Requirements: • Prior experience scripting and creating configuration templates using cloud provider tools (e.g., AWS CloudFormation) • Experience as an infrastructure and/or platform developer with scripting languages (e.g., Python, Ruby) • General knowledge of IT concepts, strategies, and methodologies, and of IT architectures and technical standards • General knowledge of layered systems architectures • General understanding of shared software concepts • General knowledge of cloud-centric architectures and technical standards • General knowledge of agile software development concepts and processes • Consultative skills, including the ability to understand customer requirements and assist in applying them • Ability to draw out unforeseen implications and make design recommendations • Ability to explain design reasoning with an understanding of potential impacts on design requirements • Familiarity with developing stable and scalable services in public and private cloud environments • Bachelor's degree in a computer-related field • 5+ years of experience
Additional Information If you are interested, please share your resume at *************************** or contact me directly at ************
    $104k-148k yearly est. Easy Apply 11h ago

Learn more about reliability engineer jobs

How much does a reliability engineer earn in Union City, NJ?

The average reliability engineer in Union City, NJ earns between $75,000 and $140,000 annually. This compares to the national average reliability engineer range of $76,000 to $144,000.

Average reliability engineer salary in Union City, NJ

$102,000

What are the biggest employers of Reliability Engineers in Union City, NJ?

The biggest employers of Reliability Engineers in Union City, NJ are:
  1. JPMC
  2. JPMorgan Chase & Co.
  3. S&P Global
  4. AYR
  5. Kraken
  6. BNY Mellon
  7. The Walt Disney Company
  8. Verisk Analytics
  9. Ordergroove
  10. Zeta