Reliability engineer jobs in West New York, NJ - 777 jobs
Reliability Engineer
Mini-Circuits 4.1
Reliability engineer job in New York, NY
Mini-Circuits designs, manufactures and distributes integrated circuits, modules, and sub‑systems for high‑performance radio frequency (RF) and microwave applications. With design, sales and manufacturing locations in over 30 countries, Mini‑Circuits' products are used in a range of wired and wireless communications applications. Our products are also used in detection, measurement and imaging applications, including military communication, guidance and electronic countermeasure systems, commercial, scientific, military land, sea and aircraft; automotive systems, medical systems, and industrial test equipment.
Mini‑Circuits sells its products to over 20,000 customers globally through our direct sales force, applications engineering staff, and sales representatives, as well as through our extensive website.
Position Summary:
The Reliability Engineer is responsible for conducting reliability studies of existing products and coordinating new product qualification prior to market release. The candidate will work in collaboration with various teams including Reliability, Design Engineering, Product Engineering, Failure Analysis and Project Management teams.
Salary Range: $99,000 - $117,000 per year
Job Function:
Participate in the product development meetings and guide the team to develop reliable products that meet internal specifications and customer requirements.
Develop qualification plans for new products, primarily MMICs but also support other product lines including but not limited to Low Temperature Co‑Fired Ceramics, PCBA products, RF accessories and Core & Wire Products.
Analyze new products for similarity with existing released products in terms of package, die process and design to determine Qualification by Similarity, thus streamlining qualification testing.
Design and execute both device level and package level qualification tests including but not limited to MSL pre‑conditioning, Thermal cycling, UHAST, HTSL, ESD and Life Tests.
Define ESD Human Body Model (HBM) and Charged Device Model (CDM) tests as per JEDEC standards.
Collaborate with Engineering Test Teams to execute Accelerated Life Tests and High Temperature Operating Life Tests.
Execute Mechanical stresses such as Vibration, Mechanical Shock, Constant Acceleration & Bend Testing.
Co‑ordinate with external labs for outsourced tests.
Review RF Test data before and after stresses to analyze changes in performance.
Collaborate with Failure Analysis teams to understand the root cause of failures.
Identify and record any non‑conformities. Monitor solution implementations to verify effectiveness of corrective actions.
Ensure On‑Time Completion of Qualification activities and escalate any potential delays.
Present Qualification results to all relevant stakeholders to help Design teams initiate changes to improve reliability performance.
Prepare written reports summarizing the results of product performance and failure analysis for both internal purposes as well as customer review.
Interface with customers and suppliers on product reliability as required.
Interface with supplier to purchase lab equipment.
Support reliability assessments originating from production of released products or customer returns.
Makes decisions within area of specialty, manages medium to large projects.
Promotes ISO9001/AS9100 Quality.
The duties, responsibilities and expectations described above are not a comprehensive list and additional tasks may be assigned to the member, within the scope of the position.
Qualifications:
BS in Mechanical Engineering, Electrical Engineering, Materials, Reliability, Industrial Engineering or Physics. Advanced degree preferred.
3‑5 years' experience as a Reliability Engineer in Semiconductor or equivalent industry.
Familiarity with common industry standards including JEDEC, MIL‑STD‑883, MIL‑STD‑202 and AEC‑Q.
Experience with Reliability Qualification by Similarity.
Experience with Environmental, Mechanical and ESD stresses.
Experience with problem solving methodologies and leading root cause analysis.
Experience with customer returns failure analysis support.
Must have familiarity with failure analysis techniques including Scanning Acoustic Microscopy (SAM), Radiographic Inspection (X‑Ray), and Cross‑Section methods.
Familiarity with MTTF, MTBF Calculations.
Experience with Reliability prediction modeling and tools like Weibull++ (or equivalent reliability software)
Experience with Data analysis tools including Advanced Excel, JMP, Minitab.
Ability to analyze component performance data in reliability tests, including large variety of test parts and multiple design variations.
Experience with Design of Experiments, FMEA, product design reviews and DFM.
Excellent written and oral communication skills.
Physical Demands:
The physical demands described here are representative of those that must be met by an employee to successfully perform the essential functions of this job. While performing the duties of this job, the employee is regularly required to talk and hear. The employee frequently is required to stand, walk, sit and use hands to operate a computer keyboard. The employee is occasionally required to reach with hands and arms. The employee must occasionally lift and/or move up to 10 pounds. Specific vision abilities required by this job include close vision, and ability to adjust focus. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.
Additional Requirements/Skills:
Ability and willingness to abide by Company's Code of Conduct.
Occasional travel, some overnight, as required (up to 10%).
Disclaimer: The listed qualifications and requirements for each position are intended as guidelines. Mini‑Circuits reserves the right to hire outside of these guidelines at Management's discretion.
Mini‑Circuits is an Equal Opportunity Employer and does not discriminate on the basis of actual or perceived age, race, creed, color, national origin, sexual orientation, military status, sex, disability, predisposing genetic characteristics, marital status, familial status, gender identity, gender dysphoria, pregnancy‑related condition, and domestic violence victim status or protected class characteristic, or any other protected characteristic as established by federal or state law.
$99k-117k yearly 4d ago
Senior Site Reliability Engineer
Unify 4.2
Reliability engineer job in New York, NY
Unify was founded January 17th, 2023 by Austin Hughes and Connor Heggie. Prior to Unify, Austin led Ramp's growth product team focused on new customer acquisition, and Connor was a machine learning research engineer at Scale AI. The rest of our team comes from companies like Airbnb, Spotify, Bridgewater and LinkedIn.
Our mission is to build the first system-of-action for go-to-market teams, starting with an end to end platform powering warm outbound. Today, outbound sales is dominated by cold, mass outreach that floods people's inboxes and converts to deals at a tiny rate. We're building a platform to power warm outbound, allowing go-to-market teams to get in touch with the right people at the exact time they're looking for a solution.
We've grown revenue 8x year-over-year, and are already serving customers like Guru, Justworks, Together.AI, Flock Safety, Hightouch and more. We're a high energy, high intensity team and we've raised $58M from Thrive, Emergence, OpenAI and others. Come join us in changing how go-to-market works.
About the Role
Unify is redefining go-to-market with state-of-the-art AI. As a Senior SRE, you'll tackle the scaling and reliability challenges that come with adding terabytes of data monthly and supporting enterprise customers with demanding uptime requirements. You'll work across the stack: optimizing databases, hardening services, and building the automation and observability that keep Unify fast and reliable at scale.
What You'll Do
Scale our data infrastructure: Optimize and extend our ClickHouse and PostgreSQL deployments by designing partitioning strategies, tuning queries, and improving replication and failover systems.
Improve system performance: Profile and optimize critical paths across backend services, identify bottlenecks in data pipelines and API layers, and ship changes that improve latency and throughput.
Build for reliability: Implement rate limiting, circuit breakers, graceful degradation, and other patterns that keep the platform stable under load and during partial failures.
Automate everything: Write tooling that eliminates toil by automating deployments, scaling operations, backup verification, and incident remediation.
Instrument and observe: Build out distributed tracing, metrics, and alerting that give engineers clear visibility into system behavior and accelerate debugging.
Respond and learn: Participate in on-call rotations, run incident response, and drive blameless postmortems that prevent recurrence.
Who You Are
5+ years of software engineering experience with a strong backend foundation, including 2+ years focused on reliability, infrastructure, or platform work.
Hands-on experience operating databases at scale, including query optimization, replication, and failover.
Strong programming skills (Typescript, Python, Go, or similar) with experience building automation and tooling.
Able to diagnose complex distributed systems issues under pressure and communicate clearly during incidents.
Collaborative, low-ego attitude and desire to work in a fast‑paced environment.
$104k-142k yearly est. 4d ago
Senior Process Innovation Engineer, AMZL Process Engineering Innovation
Amazon 4.7
Reliability engineer job in New York, NY
Job ID: 3126879 | Amazon.com Services LLC
Amazon Logistics (AMZL) is searching for a Sr PE Innovation Engineer to take the AMZL Process Engineering Innovation team to the next level. Our logistics teams are changing the way we interact with customers around the globe every single day and solving some of the biggest logistical challenges facing not just Amazon, but the entire industry.
We are looking for an experienced, customer-obsessed Engineer to join the Amazon Logistics WW Process Engineering Team. This Engineer will be responsible for driving change, large-scale business communications, mitigating risks to plan, goal tracking, data delivery and governance, process improvement, and other ad-hoc projects that drive efficiency and reduce risk across the PE Innovation portfolio for Process Engineering. The right candidate can influence without authority and present to senior leadership. They will be a strong advocate for the team and will work hand-in-hand with key stakeholders in developing and growing strategic relationships, both internally and externally, as well as creating and streamlining internal processes. The Engineer will strive for service excellence, and drive centralization and adoption of our mechanisms with all global stakeholders.
This highly visible, cross-functional role will require effective collaboration and efficient communication with multiple teams (Product, Tech, Program, Finance, Legal, SMEs, Local Ops, GP, etc.) and global stakeholders. The right candidate is proactive in understanding and communicating critical business needs, and knows how those translate into solutions. This person must have experience building long-term, strategic relationships, and becoming influential partners. This position requires heavy verbal and written communication on status, risks and our sourcing pipeline.
This role is a great fit for a professional team player who is flexible and very comfortable with ambiguity. The successful applicant will merge strong technical depth with bar-raising communication skills. The Engineer will possess a strong business and project management background, and will be able to thrive and succeed in an entrepreneurial environment without being hindered by opposing priorities. They will not only successfully develop and drive programs across teams, but will also be willing to roll up their sleeves and dive into the weeds to get the job done, focusing on the data and details to ensure the right inputs are top of mind for our customers and stakeholders.
To be successful in this role you will need to be able to influence and work with senior leaders across multiple organizations and geographies. You will work with organizations such as Worldwide (WW) Engineering and Design, WW Tech SME, Last Mile Delivery and Tech as well as Operations to innovate at scale and drive the development roadmap.
Key job responsibilities
Handle a high volume of requests, engagements and interactions in a fast-paced environment
Track existing and set up new metrics as required to validate vendor and team performance
Develop and implement performance improvement plans, working cross-functionally with internal and external stakeholders
Report on performance in weekly and monthly forums with partner teams and senior leadership
Set measurable targets that align with the overall goals of the organization
Comprehend the technical requirements of our internal customers and become an expert in the field
Influencer: Innovative pioneer with the ability to identify opportunities and influence organizations to gain support and overcome resistance with data and persuasion.
Significant experience overseeing partner teams through good judgment, decision-making, and data.
Doer: Ability to successfully deliver end-to-end technology and vendor operations projects, working through the many obstacles along the way
Communicator: Ability to present to senior leadership, communicate expectations and requirements equally well with business and technology teams, and capacity to write well-reasoned and data-oriented proposals, performing your own data examination as necessary
Problem Solver: Ability to utilize exceptional problem-solving skills to work through difficult tasks
Qualifications
5+ years of experience directly managing and being responsible for multiple large projects
5+ years of experience with Microsoft Office products and applications
Bachelor's degree in Electrical or Mechanical Engineering
Experience and strong technical background in relevant fields of automated or non-automated material handling equipment
5+ years of experience leading large, complex programs
Master's degree in engineering, mechanical, operations, supply chain, business administration, or equivalent STEM field
Experience in Lean Management, Six Sigma and other operations engineering tools
Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit ********************************************************* for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.
Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $107,600/year in our lowest geographic market up to $177,900/year in our highest geographic market. Pay is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience. Amazon is a total compensation company. Dependent on the position offered, equity, sign-on payments, and other forms of compensation may be provided as part of a total compensation package, in addition to a full range of medical, financial, and/or other benefits. For more information, please visit ********************************************************
This position will remain posted until filled. Applicants should apply via our internal or external career site.
$107.6k-177.9k yearly 4d ago
Biomedical ML Engineer II/III - AI Pathology
Pathai 4.3
Reliability engineer job in New York, NY
A leading healthcare technology firm is hiring Machine Learning Engineers to develop and deploy AI models for enhancing patient outcomes. Candidates should have strong backgrounds in Python and machine learning frameworks, with an emphasis on collaboration across scientific and engineering teams. The ideal candidates are those with a passion for improving healthcare through innovative solutions. This role presents a unique opportunity to directly impact patient care while working on advanced ML projects and initiatives.
$71k-98k yearly est. 4d ago
Data Quality Engineer
Capital Rx 4.1
Reliability engineer job in New York, NY
About Judi Health
Judi Health is an enterprise health technology company providing a comprehensive suite of solutions for employers and health plans, including:
Capital Rx, a public benefit corporation delivering full-service pharmacy benefit management (PBM) solutions to self-insured employers,
Judi Health, which offers full-service health benefit management solutions to employers, TPAs, and health plans, and
Judi, the industry's leading proprietary Enterprise Health Platform (EHP), which consolidates all claim administration-related workflows in one scalable, secure platform.
Together with our clients, we're rebuilding trust in healthcare in the U.S. and deploying the infrastructure we need for the care we deserve. To learn more, visit ****************
Location: Remote (For Non-Local) or Hybrid (Local to NYC area)
Position Summary:
We are seeking a highly motivated and detail-oriented Data Quality Engineer to join our team. In this critical role, you will be the guardian of our data's integrity, ensuring the accuracy, reliability, and robustness of the systems that power our operations and analytics. You will be instrumental in building trust in our data and empowering the organization to make confident, data-driven decisions that drive positive healthcare outcomes.
Position Responsibilities:
Technical Issue Identification & Root Cause Analysis: Identify, investigate, and triage technical issues within the data engineering tech stack (e.g., Python, SQL, Airflow, dbt). Conduct thorough root cause analysis, utilizing logs, database queries, and system monitoring data to pinpoint the source of problems.
Log Analysis & Monitoring: Monitor and analyze system logs (e.g., using CloudWatch) to validate application functionality, identify performance bottlenecks, and proactively detect anomalies. Develop and maintain dashboards to visualize key system metrics.
Database Querying & Analysis: Utilize SQL to query and analyze data within the Snowflake data warehouse. Develop and execute complex queries to investigate data discrepancies, identify trends, and support troubleshooting efforts. Familiarity with SQLAlchemy is a plus.
AWS Service Support: Collaborate with the engineering team on the support and monitoring of AWS services utilized within data engineering (e.g., EC2, S3). Assist with troubleshooting issues related to these services.
Collaboration & Requirements Translation: Collaborate with Product Managers and engineers to understand business requirements and translate them into actionable test requirements and test plans. Participate in sprint planning and daily stand-ups.
QA Execution: Conduct thorough QA tasks, including ticket review, refinement, testing (manual and potentially exploratory), and bug identification.
Scrum Team Support: Partner with the scrum team to manage backlogs, refine tickets, and support roadmap development.
UAT Support: Assist with UAT testing, stakeholder communication, and documentation to align team efforts with business goals.
Compliance & Reporting: Ensure adherence to company policies, including timely reporting of noncompliance.
Code of Conduct: Responsible for adherence to the Capital Rx Code of Conduct including reporting of noncompliance.
Minimum Qualifications:
Bachelor's degree strongly preferred in Computer Science, Information Technology, or a related field.
3+ years of experience in a QA Analyst, Data Engineer, Business Analyst, or related role.
Proficiency in Python
Strong SQL experience; familiarity with Snowflake preferred.
Familiarity with Agile methodologies and workflows.
Experience with GitHub or similar source control repositories.
Excellent communication and collaboration skills, with the ability to translate between technical and non-technical audiences both verbally and in writing.
Strong analytical and problem-solving skills with attention to detail and QA principles
A meticulous, detail-oriented mindset with a passion for ensuring data accuracy.
Preferred Qualifications:
Experience in the healthcare or PBM sector.
Hands-on experience with modern data stack tools like Airflow, dbt, and Snowflake.
Experience with CI/CD pipelines.
Understanding of data warehousing concepts.
Familiarity with automated testing frameworks and data validation tools.
This position description is designed to be flexible, allowing management the opportunity to assign or reassign duties and responsibilities as needed to best meet organizational goals.
Salary Range: $85,000-$100,000 USD
All employees are responsible for adherence to the Capital Rx Code of Conduct including the reporting of non-compliance.
Judi Health values a diverse workplace and celebrates the diversity that each employee brings to the table. We are proud to provide equal employment opportunities to all employees and applicants for employment and prohibit discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, medical condition, genetic information, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
By submitting an application, you agree to the retention of your personal data for consideration for a future position at Judi Health. More details about Judi Health's privacy practices can be found at *********************************************
$85k-100k yearly 6d ago
Inventory Control & Warehouse Process Improvement Specialist
LX Pantos Americas
Reliability engineer job in Englewood Cliffs, NJ
We are hiring an Inventory Control & Warehouse Process Improvement Manager to own inventory accuracy and drive operational process improvements across approximately 30 small warehouse locations in the U.S.
This role serves as a central control and improvement owner, focusing on inventory integrity, KPI visibility, and standardized warehouse processes.
*This position does NOT manage daily warehouse operations or on-site staff.
Key Responsibilities
Inventory Control (Primary Responsibility)
Own inventory accuracy across multi-site warehouse operations
Manage system vs. physical inventory reconciliation
Design and execute cycle count and audit programs
Investigate inventory variances and drive root cause analysis
Monitor shrinkage, adjustments, and aging inventory
Warehouse Operations Process Improvement (PI)
Analyze warehouse KPIs (productivity, error rate, on-time performance)
Develop and roll out standardized SOPs for: Receiving, Shipping, Transfers, Returns and damages
Benchmark performance across warehouses and share best practices
Build and maintain inventory and operations dashboards
Conduct occasional site visits (Approximately 20% travel) for audits, alignment, and improvement rollout
Support new warehouse launches from an inventory and process perspective
Qualifications
Bachelor's degree or higher
5+ years of experience in inventory control, warehouse operations, or supply chain
Experience supporting multiple warehouse locations
Strong understanding of WMS / ERP systems
Proven experience with cycle counts and physical inventories
Experience in process improvement, SOP development, or operational standardization
Strong Excel skills (Pivot Tables, XLOOKUP/VLOOKUP)
Comfortable with limited travel (20%)
Preferred Qualifications
Experience in installation service or final mile logistics operations
Understanding of logistics systems such as TMS and WMS
Experience leading projects or collaborating in cross-functional teams
Power BI / Tableau or similar reporting tools
$76k-105k yearly est. 2d ago
Site Reliability Engineer, Payments - USDS
Tiktok 4.4
Reliability engineer job in New York, NY
Team Intro: The Global Payment team of the US Tech Service department of TikTok provides all-round payment solutions for the company's USA products, overseas commercialization, and the company's overseas travel and procurement, including channel access, product order design, user interaction, capital management, tax and exchange optimization, settlement reconciliation, and so on. In this role, you'll have the opportunity to develop and manage the complex challenges of scale with your expertise in large-scale system design.
Responsibilities:
* Engage in and improve the whole lifecycle of services from inception and design, through deployment, operation and refinement.
* Support services before they go live through activities such as capacity planning and launch reviews.
* Support and maintain services by measuring and monitoring availability, latency and overall system health.
* Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
Minimum Qualifications:
* BS or MS degree in Computer Science, Electrical Engineering, Computer Engineering or related areas.
* 3+ years of experience in one or more programming languages such as Go, Java, C++, Python etc.
* Good problem-solving, analytical thinking capabilities and exceptional attention to details.
* Good communication and collaboration skills.
Preferred Qualifications:
* Expertise in designing, analyzing, and troubleshooting large-scale distributed systems.
* Ability to debug, optimize code, and automate routine tasks.
* Proficiency working with algorithms, data structures and production troubleshooting.
* Expertise in problem solving and analyzing global scale distributed systems.
* Systematic problem-solving approach, coupled with effective communication skills and a sense of drive.
$131k-200k yearly est. 60d+ ago
Staff Site Reliability Engineer
Tabs 4.5
Reliability engineer job in New York, NY
Job Description
Tabs is the leading AI-native revenue platform for modern finance and accounting teams. Tabs agents automate the entire contract-to-cash lifecycle, including billing, collections, revenue recognition, and reporting, to help teams eliminate manual work and accelerate cash flow.
High-growth companies like Cursor and Statsig rely on Tabs to generate invoices directly from contracts, reconcile payments in real time, and automate ASC 606 compliance.
Founded in 2023, Tabs has raised over $91 million from Lightspeed Venture Partners, General Catalyst, and Primary. The team is headquartered in New York and brings deep expertise in finance and AI.
About the Role
We're looking for a Staff Site Reliability Engineer to help ensure Tabs' platform is reliable, scalable, and ready for the next phase of growth. You'll operate as a senior individual contributor, partnering with engineering and product teams to embed reliability into how systems are designed, built, and operated.
This role is ideal for someone who enjoys solving complex infrastructure and systems problems, influencing architecture, and raising the reliability bar across an organization.
What You'll Do
Define and evolve reliability standards, SLIs, SLOs, and error budgets
Improve observability, alerting, and incident response across services
Lead high-severity incidents and drive meaningful post-incident improvements
Partner with engineers to design resilient, scalable systems
Build automation to reduce toil and operational risk
Mentor engineers and influence best practices across teams
Who You Are
You think in systems and failure modes, not just tools
You're comfortable operating in ambiguity and setting direction
You value learning from failure over blame
You balance pragmatism with long-term system health
You communicate clearly and influence through trust and expertise
Experience
10+ years of experience in SRE, infrastructure, or backend engineering roles
Strong software engineering experience in one or more modern languages
Experience operating distributed systems in production at scale
Deep familiarity with cloud infrastructure, observability tooling, and CI/CD
Additional Information
This role is based in New York City at our Soho office.
Perks and Benefits (Full-time Employees)
Competitive compensation and equity
Unlimited PTO
Up to 100% employer covered monthly healthcare premium (medical, dental, vision)
Lunch provided via Sharebite, plus dinner for any later office days.
Parental leave up to 12 weeks
Tax free commuter and parking benefits
Voluntary insurances (Life, Hospital, Critical Illness, Accident)
Employee Assistance Program (Rightway)
401k
Tabs is an equal opportunity employer. We welcome teammates of all identities and do not discriminate on the basis of race, ethnicity, religion, gender identity, sexual orientation, age, disability, veteran status, or any other protected characteristic. We're committed to creating an environment where everyone can grow, contribute, and feel comfortable being themselves.
Compensation Range: $200K - $250K
$200k-250k yearly 7d ago
Site Reliability Engineer
Kalshi
Reliability engineer job in New York, NY
Role Roadmap
We are building a next-generation financial ecosystem (think NYSE or CME from scratch). We are a small team, which means your responsibilities scale very rapidly, and your contributions are clear and visible, not marginal. There is still a lot of green field at Kalshi and a lot of it (including entire systems) can be yours.
What you'll do
Improve observability, reliability and availability by defining and measuring key metrics.
Build automation and improve systems to eliminate toil and operations work.
Collaborate with our core infrastructure team to performance tune and optimize our cloud deployments. (Think Docker, Terraform, Kubernetes, EC2, etc.)
Collaborate with product teams to reduce service disruptions and automate incident response.
Proactively find and analyze reliability problems across our business units and stack, then design and implement software to create step-function improvements.
Educate, mentor and hold accountable the engineering team to improve the reliability of our systems and make reliability a core value of the Kalshi engineering culture.
Write high quality, well tested code to meet the needs of your customers.
Debug extremely difficult technical problems, and make systems and products both work better and easier to deploy, own, operate and diagnose.
Review all feature designs within your product area and across the company for cross-cutting projects.
Be an owner of the security, safety, scale, operational integrity, and architectural clarity of these designs.
Build integrations with 3rd party vendors.
Participate in an on-call support rotation to provide timely troubleshooting and resolution of urgent issues.
What we're looking for
Attributes:
You have at least 4 years of experience in software engineering.
You've designed, built, scaled and maintained production services, and know how to compose a service oriented architecture.
You write high quality, well tested code to meet the needs of your customers.
You're passionate about building an open financial system that brings the world together.
You possess strong technical skills for system design and coding.
Excellent written and verbal communication skills, and a bias toward open, transparent cultural practices.
Strong skills around observability, debugging and performance tuning.
Strong interpersonal skills working with engineers from junior to principal levels.
Demonstrated critical thinking under pressure.
A willingness to dive into understanding, debugging, and improving any layer of the stack.
On-call availability to ensure swift resolution of issues.
Bonus points
Experience designing and building reliable systems capable of handling high throughput and low latency.
Experience with Datadog.
Experience with Rust, Go and Terraform.
Experience with AWS, GCP, or Azure.
Experience working in a highly regulated environment.
Experience writing company-facing blog posts and training materials.
Our Culture
Meritocracy is at our core, and we value people who take ownership and figure (usually hard) things out. We dream big. We love our craft deeply and are proud of what we put out in the world. We are committed to our vision of building something big… but also useful: a product that brings more truth through the power of markets.
Kalshians are Kalshi's most important asset: we pick Kalshians carefully, so we trust them fully on day 1.
NYC Pay Transparency Disclosure:
Salary Range: $100,000 to $250,000 annually plus equity and benefits.
This salary range is based on the current available market data and represents the expected salary range for this role. Kalshi has minimal hierarchy and few titles, but a broad range of experience is represented within roles. Should you have compensation expectations that exceed these bands, we'd love to hear from you and would welcome you to reach out to discuss further.
Commitment to Equal Opportunity
Kalshi is committed to creating a culture of inclusion and belonging, and we are proud to be an equal opportunity employer. We believe it is our collective responsibility to uphold these values and encourage candidates from all backgrounds to join us in our mission. All qualified applicants will be treated with respect and receive equal consideration for employment without regard to race, color, creed, religion, sex, gender identity, sexual orientation, national origin, disability, uniform service, veteran status, age, or any other protected characteristic per federal, state, or local law. If you are passionate about what you do and want to use your talents to support our mission and values, we'd love to hear from you.
$100k-250k yearly Auto-Apply 60d+ ago
Staff Site Reliability Engineer
Garner Health
Reliability engineer job in New York, NY
Job Description
Healthcare quality is declining and soaring costs are crushing American families and businesses. At Garner, we've developed a revolutionary approach to evaluating doctor performance and a unique incentive model that's reshaping the healthcare economy to ensure everyone can afford high quality care. By providing organizations relief from surging healthcare costs, we've experienced rapid adoption in the market and have more than doubled our revenue annually over the last 5 years, becoming the fastest growing company in our space. To support our continued growth, we're expanding our team by over 50% each year, seeking exceptional talent to shape our unique, award winning culture (for example, USA Today Top Workplaces 2025) designed to cultivate teamwork, trust, autonomy, exceptional results, and individual growth that creates an inflection point in your career.
About the role:
We're looking for a Staff Site Reliability Engineer to architect, operate, and improve the platform our product runs on. This role will report to the Manager of Platform Engineering (DevOps/SRE).
This role is open to remote candidates across the U.S. For candidates based in New York City, the position follows a hybrid schedule with in-office work required Tuesday, Wednesday, and Thursday each week.
What you will do:
Architect, operate, improve and secure the platform the Garner Health app runs on
Boost development velocity and productivity
Build systems to a high engineering standard and hold others to the same high standard
Research and advocate for improved techniques, process, and designs within the team
Collaborate with teammates to deliver strategic platform initiatives
Support the Garner platform in production
Secure the Garner app in production according to regulatory and compliance requirements
Partner with other stakeholders to ensure a highly-available and performant product for users
Shape long-term platform strategy, influence cross-team engineering decisions, and mentor engineers across the org
What you will bring to the team:
10+ years experience delivering software solutions
10+ years hands on production work with cloud infrastructure, containers, monitoring, and alerting
8+ years working in a security-conscious environment
Expertise and experience leading and/or delivering cloud-first/only projects, preferably AWS
Expertise improving developer experience/efficiency with respect to change management
Expertise with Terraform and Kubernetes
Expertise with Go and Python, especially utilizing Kubernetes APIs
Technologies we use:
Python, TypeScript, React, NodeJS, Kubernetes, Istio, Postgres, ElasticSearch, NATS, AWS, Terraform
Compensation Transparency:
The target salary range for this position is $219,000 - $245,000. Individual compensation for this role will depend on various factors, including qualifications, skills, and applicable laws. In addition to base compensation, this role is eligible to participate in our equity incentive and competitive benefits plans, including but not limited to: flexible PTO, Medical/Dental/Vision plan options, 401(k), Teladoc Health and more.
Fraud and Security Notice:
Please be aware of recent job scam attempts. Our recruiters use getgarner.com email domain exclusively. If you have been contacted by someone claiming to be a Garner recruiter or a hiring manager from a different domain about a potential job, please report it to law enforcement here and to *********************************.
Equal Employment Opportunity: Garner Health is proud to be an Equal Employment Opportunity employer and values diversity in the workplace. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, reproductive health decisions, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, political views or activity, or other applicable legally protected characteristics.
Garner Health is committed to providing accommodations for qualified individuals with disabilities in our recruiting process. If you need assistance or an accommodation due to a disability, you may contact us at ********************.
$219k-245k yearly Easy Apply 15d ago
Staff Site Reliability Engineer
Gradle Inc. 4.1
Reliability engineer job in New York, NY
Job Description
Who We Are
Develocity is a first-of-its-kind toolchain observability and acceleration platform that helps software teams adopt and improve DORA capabilities (including continuous delivery) in order to achieve software delivery excellence. It combines build and test acceleration with deep observability for builds and tests with Gradle Build Tool, Apache Maven™, sbt, npm, and Python, and applies to both CI and local builds and tests. Ultimately, Develocity provides an operational layer across an organization's toolchains to speed up, troubleshoot, and optimize local developer and remote CI feedback loops.
Our software is used by some of the world's leading software organizations, such as Netflix, Airbnb, SAP, several top ten banks, and many other major customers across all verticals. We regularly collaborate with these and other users to make our products continuously better.
We have partnered with the Apache Software Foundation, the Commonhaus Foundation, the Scala Center, the Micronaut Foundation, and other OSS projects like Spring, Quarkus, Kotlin, JUnit, AndroidX, and many more to bring the values of Develocity also to the OSS Community.
Our Values
Seek to Understand: Everything starts with listening and understanding, and we strive to understand different viewpoints, problems, and motivations. Before we take action, we ensure we truly grasp the challenges, perspectives, and goals.
Know the Why: We approach our work with a clear sense of purpose, ensuring every step is deliberate and focused. We take meaningful action with urgency, but never at the expense of thoughtful consideration.
Innovate & Iterate: We embrace challenges and are not afraid to try new things, even if they might fail. With deep understanding and a clear purpose, we can develop creative and bold solutions to tackle challenges.
Own the Outcome: We are empowered to take initiative and we maintain transparency in our work and its outcomes. When we execute, we take responsibility for our decisions, measure the success of our innovations, and learn from the results.
Who You Are
We're building a new SRE team and looking for founding members to help shape how we operate. As a Lead SRE, you'll be a technical and operational leader for reliability across Develocity. You'll help define our SRE vision, set standards for how we operate production services, and mentor other SREs as the team grows. This is a hands-on role with broad influence across engineering, cloud platform, and customer-facing teams.
The SRE team will be responsible for the reliability, performance, and availability of Develocity instances serving paying customers, open-source projects, and public-facing services, plus supporting infrastructure like artifact registries.
You'll work on our internally-built Cloud Application Platform, Kubernetes on AWS, and develop deep expertise in it. When incidents happen, you'll troubleshoot issues across the stack, from application to infrastructure. You'll collaborate with the Cloud Platform team to improve the tooling you depend on, and with engineering teams to build reliability into how we ship software. If you like automating things and hate doing the same task twice, you'll fit in well.
You'll be part of a distributed, remote-first team that values asynchronous communication and written documentation. Strong self-direction and clear communication across time zones are essential.
Responsibilities
Operate and maintain all Develocity instances and supporting services in production.
Define and evolve SRE standards, practices, and operating models, including on-call, incident response, postmortems, and SLOs.
Participate in a follow-the-sun on-call rotation, acting as a technical escalation point for complex or high-severity incidents.
Lead incident response and blameless retrospectives, ensuring learnings result in measurable reliability improvements.
Set reliability priorities using risk, customer impact, business goals, SLOs, and error budgets.
Identify systemic reliability risks and continuously evolve Develocity's SaaS operations as the platform and customer base grow.
Lead and influence architectural and design reviews to ensure reliability, scalability, and operability.
Drive automation across deployment, upgrades, monitoring, self-healing, recovery, and operational workflows.
Build and maintain comprehensive observability for all managed services, including logging, metrics, tracing, and alerting.
Own disaster recovery, backups, and business continuity planning and execution.
Partner with engineering leadership to balance feature delivery with reliability and operational excellence.
Mentor and coach SREs, supporting technical growth and strong operational practices.
Help onboard new SREs and contribute to hiring by defining and assessing SRE excellence at Develocity.
Communicate clearly with customers during incidents and maintenance windows.
Optimize performance, resource utilization, and operational costs.
Minimum qualifications
7+ years in SRE, DevOps, or an equivalent role operating production services at scale.
Experience leading reliability initiatives across multiple teams or services.
Demonstrated ability to influence technical direction without direct authority.
Experience designing and operating systems with SLOs and error budgets, and exercising strong judgment in balancing reliability, velocity, and cost.
Strong Kubernetes experience in production environments.
Cloud infrastructure expertise, preferably AWS (EKS, RDS, S3, EC2).
Proficiency with observability tools (Prometheus, Grafana) and Infrastructure as Code (Terraform).
Track record of incident management and response in a 24/7 on-call environment.
Scripting proficiency (Python, Bash) for automation.
Strong written and verbal English communication skills.
Preferred qualifications
Experience as a founding or early SRE establishing practices in a growing SaaS organization.
Familiarity with Develocity.
JVM language experience (Java, Kotlin).
Experience with customer-facing and executive-level incident communications.
What We Offer
A ground-floor role in a new SRE team - you'll shape how we do things, not inherit someone else's decisions.
Real ownership of production systems used by engineers at companies you've heard of.
Direct interaction with customers when things go wrong (and when they go right).
A culture that values automation over heroics.
In-person meetings, such as our annual company offsite and team meetings.
Work from home in a remote-first environment.
Competitive salaries and equity grants.
Compensation
The US salary range for this position is $180-220k which reflects the target ranges for all US locations. Within this range, individual pay is determined by geographic location and additional factors including but not limited to experience, relevant skills, qualifications, seniority, performance, and travel requirements. Our recruiting team can share more information about the specific salary range for your location during the hiring process.
Location
Remote from anywhere in EST timezone.
While our team works remotely and is spread across the globe, we deeply value daily interactions and collaboration.
$180k-220k yearly 4d ago
Lead Site Reliability Engineer, AI/ML Platform
JPMorgan Chase 4.8
Reliability engineer job in Jersey City, NJ
Responsibilities:
+ Design and implement solutions to enhance the reliability and scalability of AI/ML platforms and applications to accommodate fast growing demands.
+ Partner with product engineering teams to ensure the AI/ML systems are reliable and high performing.
+ Develop observability, security, automation and fin-ops tools and orchestration.
+ Provide strategic technology leadership by defining and evaluating standards and architecture for reliability, observability and automation frameworks.
+ Build strong cross-functional relationships that foster engagements across the organization and deliver solutions to user problems.
+ Debug and solve issues in a production environment, identify root cause and remediate.
+ Participate in on-call rotations, incident management and escalation workflows.
+ Take full ownership of problems, develop solutions, and acquire new knowledge to complete the task.
+ Mentor and guide junior engineers.
Required Qualifications:
+ Bachelor's degree in Computer Science, Information Technology, or equivalent technical qualification with 5+ years professional experience.
+ Expertise in SRE principles, reliability, scalability and performance of application and infrastructure.
+ Hands-on experience with cloud platforms (AWS, GCP, Azure) and IaC tools (Terraform, Ansible).
+ Extensive experience implementing advanced observability using tools like OpenTelemetry, Dynatrace, Grafana, and/or cloud-native services.
+ Experience in architecting distributed systems and cloud-native architecture in AWS.
+ Systematic problem-solving and troubleshooting skills in a complex system.
+ Excellent communication skills and ability to represent and present business and technical concepts to stakeholders.
+ Self-managed and self-motivated, with a strong sense of ownership, urgency, and drive.
Good to have:
+ Prior experience working in AI, ML, or Data engineering.
+ Prior experience developing AI Ops/AI Agents.
+ Multi-cloud experience (AWS, GCP, Azure) is a plus.
JPMorganChase, one of the oldest financial institutions, offers innovative financial solutions to millions of consumers, small businesses and many of the world's most prominent corporate, institutional and government clients under the J.P. Morgan and Chase brands. Our history spans over 200 years and today we are a leader in investment banking, consumer and small business banking, commercial banking, financial transaction processing and asset management.
We offer a competitive total rewards package including base salary determined based on the role, experience, skill set and location. Those in eligible roles may receive commission-based pay and/or discretionary incentive compensation, paid in the form of cash and/or forfeitable equity, awarded in recognition of individual achievements and contributions. We also offer a range of benefits and programs to meet employee needs, based on eligibility. These benefits include comprehensive health care coverage, on-site health and wellness centers, a retirement savings plan, backup childcare, tuition reimbursement, mental health support, financial coaching and more. Additional details about total compensation and benefits will be provided during the hiring process.
We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants' and employees' religious practices and beliefs, as well as mental health or physical disability needs. Visit our FAQs for more information about requesting an accommodation.
JPMorgan Chase & Co. is an Equal Opportunity Employer, including Disability/Veterans
**Base Pay/Salary**
Jersey City, NJ $152,000.00 - $215,000.00 / year
$152k-215k yearly 28d ago
Staff Site Reliability Engineer
Roman 4.1
Reliability engineer job in New York, NY
Ro is a direct-to-patient healthcare company with a mission of helping patients achieve their health goals by delivering the easiest, most effective care possible. Ro is the only company to offer nationwide telehealth, labs, and pharmacy services. This is enabled by Ro's vertically integrated platform that helps patients achieve their goals through a convenient, end-to-end healthcare experience spanning from diagnosis, to delivery of medication, to ongoing care. Since 2017, Ro has helped millions of patients, including one in every county in the United States, and in 98% of primary care deserts.
Ro has been recognized as a Fortune Best Workplace in New York and Health Care for four consecutive years (2021-2024). In 2023, Ro was also named Best Workplace for Parents for the third year in a row. In 2022, Ro was listed as a CNBC Disruptor 50.
The Role:
At Ro, our mission is to provide world-class healthcare by putting patients first - and that mission depends on reliable, secure, and scalable systems. As a Staff SRE on the infrastructure team, you'll sit at the core of that effort: owning the reliability of our production systems, hardening infrastructure and building tools that empower our engineers to ship safely and confidently.
You will work across teams to drive uptime, performance and observability - partnering closely with product, platform and security engineers.
From designing resilient systems to shaping incident response practices, this is a role for engineers who thrive on impact and care deeply about operational excellence.
What You'll Do:
* Design and implement resilient infrastructure to support high availability at scale
* Build and contribute to tools and platforms that streamline deployment, monitoring and recovery of systems
* Drive incident response and harness learnings, leading efforts to minimize downtime and improve MTTR
* Partner with engineering teams to bake best practices for reliability, resilience and observability into services
* Automate infrastructure workflows using IaC and other cloud native tools
* Champion a culture of operational excellence, guiding engineers through reliability practices and raising the bar across the engineering org
What You'll Bring to the Team:
* Deep understanding of systems and infrastructure, with experience operating distributed services in production. We are mostly in AWS and leverage a lot of its primitives - EKS, RDS, Route53, S3, Elasticache to name a few
* Strong programming and automation skills using Go (bonus points for Python)
* Proficiency with infrastructure as code - Terraform / Pulumi
* A passion for observability, with hands-on experience in metrics, logging, tracing using Datadog
* Strong cross-functional communication, able to collaborate with product, platform, security and other teams
* An operational mindset that puts reliability and resilience as a core product requirement
* A mission-driven attitude, motivated by the opportunity to make healthcare better.
We've Got You Covered:
* Full medical, dental, and vision insurance + OneMedical membership
* Healthcare and Dependent Care FSA
* 401(k) with company match
* Flexible PTO
* Wellbeing + Learning & Growth reimbursements
* Paid parental leave + Fertility benefits
* Pet insurance
* Student loan refinancing
* Virtual resources for mindfulness, counseling, and fitness
The target base salary for this position ranges from $211,700 to $292,000, in addition to a competitive equity and benefits package (as applicable). When determining compensation, we analyze and carefully consider several factors, including location, job-related knowledge, skills and experience. These considerations may cause your compensation to vary.
Ro recognizes the power of in-person collaboration, while supporting the flexibility to work anywhere in the United States. For our Ro'ers in the tri-state (NY) area, you will join us at HQ on Tuesdays and Thursdays. For those outside of the tri-state area, you will be able to join in-person collaborations throughout the year (i.e., during team on-sites).
At Ro, we believe that our diverse perspectives are our biggest strengths - and that embracing them will create real change in healthcare. As an equal opportunity employer, we provide equal opportunity in all aspects of employment, including recruiting, hiring, compensation, training and promotion, termination, and any other terms and conditions of employment without regard to race, ethnicity, color, religion, sex, sexual orientation, gender identity, gender expression, familial status, age, disability and/or any other legally protected classification protected by federal, state, or local law.
See our California Privacy Policy here.
$99k-135k yearly est. 60d+ ago
Site Reliability Engineer III
Stratacuity
Reliability engineer job in Jersey City, NJ
Title: Site Reliability Engineer III
Contract Duration: 10 Months Contract Role
Roles and Responsibilities
* Design and develop automation tools and frameworks to eliminate repetitive manual tasks and improve operational efficiency.
* Build self-service capabilities for common support activities, reducing dependency on manual intervention.
* Create CI/CD pipelines and automated workflows for application deployment, configuration, and maintenance.
* Partner with development and infrastructure teams to embed reliability and scalability into automated solutions.
* Analyze operational pain points and implement tooling to reduce toil and improve system resilience.
* Maintain and enhance production support runbooks, incorporating automation wherever possible.
* Participate in major incident triage and root cause analysis, focusing on automation to prevent recurrence.
* Strong programming skills in Python, Java, or similar languages for automation and integration.
* Experience with CI/CD tools (e.g., Jenkins, GitHub, Ansible).
* Understanding of distributed systems, APIs, and data pipelines.
* Ability to design and implement automation solutions for operational workflows
* Operating systems: Linux, Windows
* Scripting languages: Unix Shell, Perl, Python, SQL (Intermediate level: multi-table joins, grouping results, etc.)
* Well rounded understanding of infrastructure disciplines (network, server, storage, messaging)
* Experience with messaging systems (60East AMPS, Kafka, etc.) and event-driven architectures.
* Exposure to infrastructure-as-code tools (Terraform, Ansible).
* Experience with observability tools: Splunk, Dynatrace, Prometheus/Grafana
* Strong collaboration and communication skills across global teams
EEO Employer
Apex Systems is an equal opportunity employer. We do not discriminate or allow discrimination on the basis of race, color, religion, creed, sex (including pregnancy, childbirth, breastfeeding, or related medical conditions), age, sexual orientation, gender identity, national origin, ancestry, citizenship, genetic information, registered domestic partner status, marital status, disability, status as a crime victim, protected veteran status, political affiliation, union membership, or any other characteristic protected by law. Apex will consider qualified applicants with criminal histories in a manner consistent with the requirements of applicable law. If you have visited our website in search of information on employment opportunities or to apply for a position, and you require an accommodation in using our website for a search or application, please contact our Employee Services Department at [email protected] or ************.
Apex Systems is a world-class IT services company that serves thousands of clients across the globe. When you join Apex, you become part of a team that values innovation, collaboration, and continuous learning. We offer quality career resources, training, certifications, development opportunities, and a comprehensive benefits package. Our commitment to excellence is reflected in many awards, including ClearlyRated's Best of Staffing in Talent Satisfaction in the United States and Great Place to Work in the United Kingdom and Mexico. Apex uses a virtual recruiter as part of the application process.
Apex Benefits Overview: Apex offers a range of supplemental benefits, including medical, dental, vision, life, disability, and other insurance plans that offer an optional layer of financial protection. We offer an ESPP (employee stock purchase program) and a 401K program which allows you to contribute typically within 30 days of starting, with a company match after 12 months of tenure. Apex also offers a HSA (Health Savings Account on the HDHP plan), a SupportLinc Employee Assistance Program (EAP) with up to 8 free counseling sessions, a corporate discount savings program and other discounts. In terms of professional development, Apex hosts an on-demand training program, provides access to certification prep and a library of technical and leadership courses/books/seminars once you have 6+ months of tenure, and certification discounts and other perks to associations that include CompTIA and IIBA. Apex has a dedicated customer service team for our Consultants that can address questions around benefits and other resources, as well as a certified Career Coach. You can access a full list of our benefits, programs, support teams and resources within our 'Welcome Packet' as well, which an Apex team member can provide.
Employee Type: Contract
Location: Jersey City, NJ, US
Job Type:
Date Posted: January 15, 2026
Pay Range: $65 - $73 per hour
$65-73 hourly 4d ago
Reliability Engineer
Minaris
Reliability engineer job in Allendale, NJ
The Reliability Engineer is responsible for providing both tactical and strategic support for the development, implementation, monitoring, and continuous improvement of Reliability Solutions over the entire lifecycle of Site Assets and Capital Projects.
Essential Functions and Responsibilities
Below is the summary of the role responsibilities. This is not an exhaustive list of all responsibilities, duties, skills, efforts, requirements or working conditions associated with the role. While this is intended to be an accurate reflection of the current job, management reserves the right to revise the job or to require that other or different tasks be performed as assigned with or without notice.
CMMS Development
Develop, maintain and update the Proactive Maintenance and Condition Monitoring programs within the organization.
Authorize all final changes to the PM/PdM program and ensure that program changes are properly entered into CMMS
Integrate Asset Criticality Assessment results into Asset records and maintenance strategy
Assist in developing a spare parts program strategy in the CMMS and ensure required changes are documented and communicated
Assist in establishing clear workflows & responsibilities for Work Order and Job Plan execution and approval, system administration, and training in use of the system.
PM Optimization Program - review the existing maintenance program for opportunities to:
Identify and remove low value-added tasks
Revise PM intervals based on task type, asset usage, environment, and industry guidelines
Replace time-based intrusive tasks with condition based non-intrusive tasks where applicable
Develop Failure Modes and ensure all Job Plan tasks are tied to a failure mode
Strengthen work packages (task instructions, parts, and tools)
Procedural Development
Provide oversight and direction to implement a life cycle Asset Management program across the organization. This includes defining the Vision, providing guidance to Site teams, setting expectations, establishing and monitoring metrics, interfacing with numerous groups and communicating the Vision to key Stakeholders across the Organization. This effort also involves defining the interfaces with other Business Enterprise systems supported across HCATS
Assist in developing a site Criticality Analysis approach and update as required.
Devise methods to develop and improve engineering standards and work practices within the department
Develop procedures and specifications for the reliability maintenance aspects of existing and new equipment and systems.
Analytics
Monitor and drive the effectiveness of the Reliability & Maintenance program.
Develop reliability metrics to evaluate activities and processes to assess Capital Effectiveness. Identify opportunities to provide improvement through benchmarking of metrics, personal knowledge, and interfacing with others.
Evaluate and trend critical equipment performance and history and develop proactive strategies to minimize the occurrence of the failures. Compare system and components failure rates with industry norms.
Identify bad actors by analyzing equipment history using CMMS data, spare parts utilization, PdM history, etc.
Investigation and Quality Support
Provide support in Quality investigations relating to critical equipment failures
Conduct Risk Assessments (FMEAs) and Root Cause Analyses for critical equipment failures
Ensure that all CAPAs (Corrective Actions, Preventive Actions) are included in the proactive maintenance program updates
Ensure that change control requirements including testing and documentation are always adhered to within the department and the Asset Management Program as a whole.
Project Support
Defining and supporting Processes, Systems and Approaches that provide significant improvement in Capital Effectiveness, contribute to improved Operating Performance, promote consistency across the Network and support compliance, reliability, energy efficiency, ergonomics, operability, etc. initiatives with all project designs
Providing support for the ongoing development of Sites and facilities through development of Site Master Plans. This includes assessing facility designs with EHS, Regulatory and Operational requirements and promoting the use of innovative designs that provide improved reliability and performance
Assist in the continued development of the Capital Project Management Process within the organization. This includes collaboration with other Departments to ensure best practices are shared across the site network, implementing Engineering Design guidelines as appropriate and contributing to optimization of the RFP, DQ, Commissioning, and ETOP Processes.
Manage smaller reliability improvement projects in areas such as yield loss improvement, efficiency, PdM, TPM, lean processes
Participate with Corporate communities of practice and engineering excellence forums to ensure compliance of the site while providing SME (Subject Matter Expert) lead on site initiatives.
Knowledge, Skills & Ability
Candidate must be self-motivated and able to quickly ramp up on the understanding of existing plant processes.
Demonstrated diagnostic and problem-solving skills in a team environment
Acts decisively: Exercises good judgment and makes effective, sound, timely and informed decisions. Seeks to identify, analyze and resolve problems effectively.
Communicates effectively: Uses appropriate modes and media, targeting the amount, level of detail, and content of the information to the needs of the audience. Prepares clear, concise, and well-organized written documents and oral presentations. Conveys information clearly, confidently, and with the proper tone. Facilitates open communication. Uses discretion and demonstrates sensitivity to confidentiality concerns. Listens effectively and provides appropriate feedback.
Strong collaborative and influencing skills and ability to work well in a matrixed environment with internal and external stakeholders.
Promotes efforts aimed at improving current business processes through a culture that fosters continuous improvement and innovation. Identifies and implements improvements and innovations that increase efficiency and enhance work quality. Promotes ongoing development of staff and takes initiative to assess and self-develop supervisory competencies.
Displays and fosters integrity and honesty through the promotion of mutual trust and respect, demonstrates and fosters high ethical standards, and treats others fairly and ethically.
Manages projects and leads initiatives in the workplace. Organizes resources, people, and activities; and ensures collaboration and the achievement of project and function goals and targets. Ensures effectiveness and efficiency in the delivery of services, products and/or programs.
Capable of working both as part of a team or autonomously - previous team leadership and facilitation skills will be a plus.
Fosters continuous improvement and innovation
Desire to learn and develop career within organization
Fosters integrity and honesty
Manages projects and functions
Education & Experience
A minimum of 3 years of post-college or equivalent experience required
Knowledge and application of reliability methodologies and tools (e.g. Root Cause Analysis, FMEA, Structured problem solving, Lean and Six Sigma) is preferred.
Hands-on maintenance and engineering experience required
Both tactical and strategic knowledge of the principles of preventive & predictive maintenance, reliability centered maintenance and equipment performance standards
Experience preparing site assessments, FMEA reports and the ability to produce sample documents if requested
Previous experience in a Pharmaceutical/Life Sciences/Medical Device environment preferred
CMRP Certification preferred
Experience working with a CMMS package - Blue Mountain and/or Maximo preferred
Disclaimer
The above information in this description is intended to describe the general nature and level of work performed. It does not contain nor is it intended to be interpreted as a comprehensive inventory of all duties, responsibilities and qualifications required of employees assigned to this job. Duties, responsibilities, and activities may change at any time with or without notice.
$87k-121k yearly est. 11d ago
Site Reliability Engineer - Capital Markets
Jefferies Financial Group Inc. 4.8
Reliability engineer job in Jersey City, NJ
Jefferies is seeking a Site Reliability Engineer to play an instrumental role in supporting Equity Front Office trading applications and the risk and middle office real-time products developed and used for the Equity Cash and ETS applications.
As part of the wider platform engineering team, you will be working closely with the Business users interactively throughout the day, along with technical, analysis and testing colleagues. Investigation and resolution of the work items at hand will require competent technical skills and a keen intellect. The business is a growth area, with current investments taking place in all the technology, business and middle office areas.
Responsibilities:
Front Line Site Reliability Engineering and Support functions for Equity trading systems used by Jefferies clients as well as internal users.
Build monitoring tools for application and infrastructure components.
Implement and manage scalable infrastructure using cloud-native technologies and tools.
Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding.
Partner with business, development and infrastructure teams to improve services through rigorous testing and release procedures.
Develop and maintain CI/CD pipelines to streamline deployment processes.
Expedient deployment of new systems. Capacity planning, Platform Management, and support for increasing volumes and business growth.
Create sustainable systems and services through automation.
Collaborate with Application team to establish and enforce production and development standards.
Document procedures, best practices and troubleshooting FAQs.
Resolve complex application and technical problems.
Debugging the system and fixing the production related issues.
Escalate / follow-up on permanent fix for development related issues.
Lead incident response efforts and post-mortem analysis to prevent future occurrences.
Handles complex operational tasks and recommends process and technology changes.
Provide global support, including weekend availability to troubleshoot production-related issues and perform checkouts.
Ability to work both independently and in groups in an energetic, diverse environment.
Participate in on-call rotations to ensure 24/7 system availability and support.
Support compliance and legal queries.
Qualifications:
Strong experience in Windows and Linux/Unix services.
Strong experience in scripting languages like PowerShell, Python and SQL.
Strong Knowledge of monitoring tools - Nagios, Splunk, OTEL, Datadog
Strong Knowledge of FIX protocol
Strong Domain skills - Must have working experience in Capital Markets across modules and instruments especially - CASH, ETS, Bonds, Options, Futures, Swaps products
Experience in BFSI (Banking, Financial Services and Insurance) domain applications with a proper understanding of the Trade Lifecycle.
Excellent communication, time management and project management skills.
Primary Location Full Time Salary Range of $175,000 - $200,000
$175k-200k yearly Auto-Apply 60d+ ago
Site Reliability Engineer
Akkodis
Reliability engineer job in New York, NY
Akkodis is seeking a Site Reliability Engineer for a Contract with a client in New York, NY. Candidate must have expert Linux server management and troubleshooting skills, with strong scripting and solid production experience across networking/security and database administration to be considered.
Rate Range: $53/hour to $60/hour; The rate may be negotiable based on experience, education, geographic location, and other factors.
Site Reliability Engineer job responsibilities include:
* Design, implement, and maintain Linux infrastructure for enterprise IoT systems, ensuring high availability, performance, and security hardening.
* Automate operations, deployments, and monitoring using Perl/Python/PowerShell and manage CI/CD workflows.
* Administer and optimize SQL Server and Sybase databases; perform backups, tuning, and integrity checks.
* Manage application packaging, deployment, patching, and upgrades on Linux servers with end‑to‑end request flow awareness.
* Diagnose and resolve complex incidents across OS, applications, networks, and IoT platforms; drive root‑cause analysis and prevention.
* Implement and enforce security/network controls (RBAC, IP/DNS/DHCP, firewalls, segmentation) and maintain user/permission governance.
Required Qualifications:
* Bachelor's degree in Computer Science, Information Systems, or a related field.
* 8+ years in Linux infrastructure/SRE within large enterprise environments (preferably with IoT systems exposure).
* Candidate must have expert Linux server management and troubleshooting skills, with strong scripting in Perl/Python/PowerShell and CI/CD automation experience.
* Candidate will have solid knowledge of networking/security (IP, DNS, DHCP, firewalls, segmentation) and hands-on administration of SQL Server/Sybase in regulated, high-availability environments.
If you are interested in this role, then please click APPLY NOW. For other opportunities available at Akkodis, or any questions, feel free to contact me at **********************************
Pay Details: $53.00 to $60.00 per hour
Benefit offerings available for our associates include medical, dental, vision, life insurance, short-term disability, additional voluntary benefits, EAP program, commuter benefits and a 401K plan. Our benefit offerings provide employees the flexibility to choose the type of coverage that meets their individual needs. In addition, our associates may be eligible for paid leave including Paid Sick Leave or any other paid leave required by Federal, State, or local law, as well as Holiday pay where applicable.
Equal Opportunity Employer/Veterans/Disabled
Military connected talent encouraged to apply
To read our Candidate Privacy Information Statement, which explains how we will use your information, please navigate to ******************************************************
The Company will consider qualified applicants with arrest and conviction records in accordance with federal, state, and local laws and/or security clearance requirements, including, as applicable:
* The California Fair Chance Act
* Los Angeles City Fair Chance Ordinance
* Los Angeles County Fair Chance Ordinance for Employers
* San Francisco Fair Chance Ordinance
Massachusetts Candidates Only: It is unlawful in Massachusetts to require or administer a lie detector test as a condition of employment or continued employment. An employer who violates this law shall be subject to criminal penalties and civil liability.
$53-60 hourly Easy Apply 5d ago
Site Reliability Engineer
Clay Labs
Reliability engineer job in New York, NY
About Clay
Our mission is to help organizations turn any growth idea into reality.
We see growth as a creative practice, not a formula. Finding and reaching your best-fit customers takes unique ideas and constant iteration. As AI makes execution faster and tactics easier to copy, creativity is the only lasting advantage. We're already helping thousands of customers - including Anthropic, Waste Management, Figma, and Ramp - go to market with unique data, signals, and AI research.
In 2025, we crossed $100M in revenue and raised a $100M Series C at a $3.1B valuation, backed by world-class investors including Sequoia, CapitalG, and First Round. We also completed our first employee tender offer and launched a community equity round for our customers, agency partners, and club members.
Some things to know about us:
Our community includes 11,000+ customers, 150+ integration partners, 125+ agencies, 50+ Clay clubs, and 30k members on Slack.
Our culture is unique inside and outside of work. Our team members are also DJs, activists, writers, clowns, marathoners, skydivers, psychedelic therapists, social workers, and more.
All employees can work for free with world-class coaches who specialize in creativity, management, and more.
Our operating principles - including negative maintenance and non-attached action - guide our work. Read more about them here.
Read about us in the NYT, Forbes, First Round Review, and more.
Hear from our employees directly on our Glassdoor page!
SRE @ Clay
In this role, you'll join our growing infrastructure team in building and fine-tuning our infrastructure to keep our services running smoothly. We're looking for someone who's excited about automation and continuous improvement. While your main focus will be on infrastructure, coding skills are a must. As a growing startup, we all jump in where needed, so you'll need to be comfortable taking on a variety of roles.
What You'll Do
Architect, design, implement, and manage robust, scalable, and secure infrastructure solutions.
Develop, maintain, and enforce best practices for CI/CD, infrastructure as code, and automation.
Oversee the management and optimization of cloud infrastructure, ensuring high availability, performance, and cost-efficiency.
Implement monitoring, logging, and alerting solutions to maintain system health and quickly resolve issues.
Lead incident response efforts, troubleshooting and resolving complex issues in a timely manner.
Participate in an oncall rotation.
Work with teams across the company to ensure we achieve the right balance of developer velocity, reliability and performance, and cost efficiency.
What You'll Bring
5+ years of experience
Experience with containerization and orchestration tools
Strong understanding of CI/CD concepts and tools
Knowledge of infrastructure automation tools
Experience with oncall and incident response
Proficiency in one or more programming languages
Familiarity with our stack or ability to learn unfamiliar technologies quickly:
Aurora Postgres RDS, Elasticache Redis, Docker + ECS, Lambda, OpenSearch
Terraform and Atlantis
CircleCI, Netlify, Playwright
Cloudwatch, Datadog, Mezmo
Typescript, Python
$90k-125k yearly est. Auto-Apply 60d+ ago
Site Reliability Engineer/ Terraform Developer - C67711 6.0 New York, NYC
CapB Infotek
Reliability engineer job in New York, NY
SRE
Perform low-level component design
Develop and build the code
Create confidence and certainty in deployments with immutable infrastructure built and tested using CI/CD.
Should have experience with the following technologies:
AWS Terraform
Container Orchestration ( Kubernetes, ECS)
Configuration Management tools (Chef, Puppet)
Infrastructure as Code (Terraform, CloudFormation)
Experience level 3-5 yrs.
$90k-125k yearly est. 60d+ ago
Site Reliability Engineer - (Linux & Python/Go)
Elliot Partnership
Reliability engineer job in New York, NY
Site Reliability Engineer - (Linux & Python/Go)
New York, NY (Hybrid, 3 days in office)
Highly competitive compensation package
Join an elite technology and research group at the forefront of global finance, where world-class engineering and quantitative research converge to solve some of the most complex problems in any industry. Their teams are composed of passionate problem-solvers who operate in a dynamic, large-scale IT environment. We are seeking a visionary engineer to lead critical reliability and automation initiatives, ensuring the firm's complex trading and research platforms operate with maximum performance, scalability, and resilience.
The Role:
We are seeking a deeply experienced Site Reliability Engineer to act as a Tech Lead for key infrastructure initiatives. This is a crucial, hands-on role for a hybrid systems and software engineer who thrives on solving complex problems at scale. You will be a key technical leader responsible for architecting and building the robust, automated systems that underpin the firm's critical operations. You will act as a force multiplier for the engineering organization by leading high-impact projects, mentoring other engineers, and setting the standard for technical excellence in reliability and performance.
Responsibilities:
Lead the design and execution of high-impact projects focused on improving the reliability, scalability, and performance of their core infrastructure.
Architect, build, and maintain mission-critical tools and automation in Python or Go to eliminate operational toil and enhance system capabilities.
Serve as a senior escalation point for complex Linux systems issues, diagnosing and resolving deep technical challenges related to performance, configuration, and stability.
Drive the architecture for scalable, resilient, and performant infrastructure, making key design decisions for production environments.
Mentor and guide other engineers, championing best practices in software development, infrastructure management and site reliability.
Your experience:
7+ years of experience in a senior site reliability, infrastructure, or software engineering role with a track record of success in complex, large-scale environments.
Expert-level proficiency in Python or Go, with a proven track record of engineering libraries, tools, or applications (not just scripting).
Deep, hands-on expertise with the Linux operating system, including performance tuning, troubleshooting, and systems administration in a large-scale environment.
Demonstrated experience leading technical projects, driving architectural decisions, and mentoring other engineers.
Strong knowledge of CI/CD, infrastructure-as-code (Ansible, Terraform), and containerization (Docker, Kubernetes).
Exceptional communication skills, with the ability to articulate complex technical concepts to a variety of audiences.
How much does a reliability engineer earn in West New York, NJ?
The average reliability engineer in West New York, NJ earns between $75,000 and $140,000 annually. This compares to the national average reliability engineer range of $76,000 to $144,000.
Average reliability engineer salary in West New York, NJ
$102,000
What are the biggest employers of Reliability Engineers in West New York, NJ?
The biggest employers of Reliability Engineers in West New York, NJ are: