Reliability Engineer jobs at BAE Systems - 4393 jobs

Site Reliability Engineer II, Boston, Massachusetts, United States
Axon Enterprise 4.5
Boston, MA jobs
Join Axon and be a Force for Good. At Axon, we're on a mission to Protect Life. We're explorers, pursuing society's most critical safety and justice issues with our ecosystem of devices and cloud software. Like our products, we work better together. We connect with candor and care, seeking out diverse perspectives from our customers, communities and each other. Life at Axon is fast-paced, challenging and meaningful. Here, you'll take ownership and drive real change. Constantly grow as you work hard for a mission that matters at a company where you matter. Your Impact As a Site Reliability Engineer II within the APX SRE organization, you'll focus on delivering practical, scalable solutions to support the reliability and performance of our mission‑critical, cloud‑native global Kubernetes platform and the services that run on it. You care deeply about system stability, clear documentation, and creating tools that improve the developer experience. Work Location: This role is based out of our Boston office and follows a hybrid schedule. We rely on in‑person collaboration and ask that team members work onsite Tuesdays through Fridays, with the flexibility to work remotely on Mondays, unless there is an approved workplace accommodation. We believe that connection fuels innovation, and our in‑office culture is designed to foster meaningful teamwork, mentorship, and shared success. What You'll Do As an SRE, you'll play a critical role in building the infrastructure and tools that power reliable, scalable, and secure engineering operations across the company. You will: Build robust, easy‑to‑use Kubernetes platforms and tools that enable engineering teams to provision and operate services rapidly, consistently, and securely. Exemplify cloud‑native site reliability best practices. Write code that is performant, maintainable, clear, and concise. Employ strong problem‑solving skills, with the ability to debug problems in cloud‑native distributed systems. 
Influence and educate the engineering organization to adopt new and improved architectural patterns. Provide robust documentation for use by engineers to promote self‑service. Continually seek improvement within our Kubernetes platform for improved reliability, operability, and cost efficiency. Take calculated risks, champion new ideas, and cultivate your craft. What You Bring 3+ years of applicable experience in platform engineering, and container orchestration. Experience building platforms on clouds such as Azure and AWS. Building, operating, and innovating clustering solutions for Kubernetes platforms like AKS, EKS, or similar in production at scale. Experience with programming languages such as Python, Go, C#, Java, or similar. Experience of code collaboration such as GitHub, ArgoCD, or similar. Experience utilizing CI/CD platforms to automate provisioning infrastructure, software builds, tests, and releases. Experience using observability tools such as APM, logging, and metrics to assist with debugging issues. Experience using Infrastructure as Code tools for provisioning infrastructure such as Terraform, Pulumi, or similar. Experience designing tooling to simplify the operational management of SaaS/PaaS systems. Familiarity with building flexible and testable Infrastructure as Code modules. Empathy to support the needs of software engineers. Benefits that Benefit You Competitive salary and 401k with employer match. Discretionary time off. Paid parental leave for all. Fitness programs. Emotional & Development Programs. And yes, we have snacks in our offices. Benefits listed herein may vary depending on the nature of your employment and the location where you work. Base Pay Range $115,500 - $184,800 USD Don't meet every single requirement? That's ok. At Axon, we aim far. We think big with a long‑term view because we want to reinvent the world to be a safer, better place. We are also committed to building diverse teams that reflect the communities we serve. 
Studies have shown that women and people of color are less likely to apply to jobs unless they check every box in the job description. If you're excited about this role and our mission to Protect Life but your experience doesn't align perfectly with every qualification listed here, we encourage you to apply anyway. You may be just the right candidate for this or other roles. The above is not intended as, nor should it be construed as, exhaustive of all duties, responsibilities, skills, efforts, or working conditions associated with this job. The job description may change or be supplemented at any time in accordance with business needs and conditions. Some roles may also require legal eligibility to work in a firearms environment. Axon's mission is to Protect Life and is committed to the well‑being and safety of its employees as well as Axon's impact on the environment. All Axon employees must be aware of and committed to the appropriate environmental, health, and safety regulations, policies, and procedures. Axon employees are empowered to report safety concerns as they arise and activities potentially impacting the environment. We are an equal opportunity employer that promotes justice, advances equity, values diversity and fosters inclusion. We're committed to hiring the best talent, regardless of race, creed, color, ancestry, religion, sex (including pregnancy), national origin, sexual orientation, age, citizenship status, marital status, disability, gender identity, genetic information, veteran status, or any other characteristic protected by applicable laws, regulations and ordinances, and empowering all of our employees so they can do their best work. If you have a disability or special need that requires assistance or accommodation during the application or the recruiting process, please email **********************. Please note that this email address is for accommodation purposes only. Axon will not respond to inquiries for other purposes.
$115.5k-184.8k yearly 17h ago

Remote Site Reliability Engineer - Windows, AD & ITIL
Iron Mountain 4.3
Boston, MA jobs
A global leader in information management is looking for a talented Systems Engineer in Boston. This role requires U.S. Citizenship and the ability to obtain government clearance. Responsibilities include troubleshooting, providing support, and performing system documentation. Ideal candidates will have a Bachelor's degree, strong technical skills, and experience with Windows Server and Linux. The expected salary range is $93,400 to $124,500, offering opportunities for professional growth in a fast-paced environment.
$93.4k-124.5k yearly 2d ago
Site Reliability Engineer - Scale, Automation & Observability
Apple Inc. 4.8
San Francisco, CA jobs
A leading technology company in San Francisco is seeking a Site Reliability Engineer. This role requires expertise in managing infrastructure across multiple data centers, with a focus on automation and reliability. Candidates must have a BS/MS in Computer Science and at least three years of experience in a related role, with advanced skills in programming languages like GoLang, Python, or Java. The position also emphasizes collaboration with development teams to deliver high-quality outcomes, along with a commitment to automation in systems management.
$147k-191k yearly est. 3d ago
Senior Site Reliability Engineer
Apple Inc. 4.8
San Francisco, CA jobs
San Francisco, California, United States Software and Services The Apple Services Engineering (ASE) team is one of the most exciting examples of Apple's long-held passion for combining art and technology. These are the people who power the App Store, Apple TV, Apple Music, Apple Podcasts, and Apple Books. And they do it on a massive scale, meeting Apple's high expectations with high performance to deliver a huge variety of entertainment in over 35 languages to more than 150 countries. These engineers build secure, end-to-end solutions. They develop the custom software used to process all the creative work, the tools that providers use to deliver that media, all the server-side systems, and the APIs for many Apple services. Thanks to Apple's unique integration of hardware, software, and services, engineers here partner to get behind a single unified vision. That vision always includes a deep commitment to strengthening Apple's privacy policy, one of Apple's core values. Although services are a bigger part of Apple's business than ever before, these teams remain small, forward-thinking, and cross‑functional, offering greater exposure to the array of opportunities here. Description Apple Services Engineering infrastructure is BIG. Operating at our scale, across multiple geographically dispersed data centers and servicing hundreds of millions of users presents unique challenges. As an SRE at Apple, you'll need to solve these problems using data, teamwork, and your own expertise. SREs at Apple own the full infrastructure stack; from device driver performance debugging to content delivery network traffic management - our responsibilities are both broad and deep. ASE runs the majority of its systems on Linux. We run a mix of open source, vendor licensed, and internally developed tools to perform functions such as system configuration management, provisioning, software deployment, logging, and monitoring. You'll learn these tools and have opportunities to improve them. 
Our team is collaborative; we work closely with the development teams we support to deliver the best results for Apple. We think critically and strive to balance the best solution with the need to get things done for each engineering challenge we face. Good ideas are heard and results are rewarded. Culturally we believe in a close partnership with our development teams and aim to design & build new services together. We're passionate about software and automation in SRE and develop a variety of tooling and infrastructure. Our services run on mixed & hybrid platforms. Minimum Qualifications BS/MS in Computer Science or Equivalent At least 5-7 years in a Reliability Engineering, DevOps or infrastructure focused role Advanced experience with programming languages (GoLang, Python, Java) Passion for designing and building reliable systems Strong sense of ownership and integrity demonstrated through clear communication and collaboration Deep systems and infrastructure knowledge Advanced knowledge and hands‑on experience with CI/CD systems Automation advocate - you truly believe in removing operation load with software Understanding of the Linux Operating System, standard networking protocols, and components Preferred Qualifications Experience in managing and scaling distributed systems in a public, private, or hybrid cloud environment Hands‑on experience managing large numbers of diverse systems with configuration management or software delivery platforms (such as Puppet, Ansible, and Spinnaker) Experience with deploying, supporting and monitoring new and existing services, platforms, and application stacks Excellent troubleshooting and problem solving skills Experience with scale testing, disaster recovery, and capacity planning Familiarity with microservices architecture and container orchestration with Docker & Kubernetes Demonstrated ability to deliver results on time with high quality At Apple, base pay is one part of our total compensation package and is 
determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $181,100 and $318,400, and your base pay will depend on your skills, qualifications, experience, and location. Apple employees also have the opportunity to become an Apple shareholder through participation in Apple's discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple's Employee Stock Purchase Plan. You'll also receive benefits including: Comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and for formal education related to advancing your career at Apple, reimbursement for certain educational expenses - including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation. Learn more about Apple Benefits. Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program. Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant. Apple accepts applications to this posting on an ongoing basis.
$181.1k-318.4k yearly 4d ago
Site Reliability Engineer
Phase2 Technology 3.9
Washington, DC jobs
Site Reliability Engineer The Opportunity Engineering to make a system more resilient and efficient frees up time and money to build more capabilities. Whether you come from a background in network engineering, systems administration, or software development, if you have a passion for making systems better, we need you! As a site reliability engineer on our team, you'll lead the development of more robust systems by building a resilient infrastructure. You'll build in redundancy, implement monitoring tools, and automate wherever possible. You'll reduce toil by scripting routine tasks and automating self-repair. This is your chance to leverage your expertise in cloud technologies while supporting your team of engineers and acting as a subject matter expert for your clients. Work with us as we help deliver a scalable, secure, and intelligent payment ecosystem that meets modernization goals and public expectations for transparency and service quality. Join us. The world can't wait. You Have 2+ years of experience leading teams Experience deploying, maintaining, or troubleshooting complex applications at an enterprise scale Experience with CloudWatch, CloudTrail, Splunk/ITSI, and Pager Duty Experience working in Unix or Linux, AWS, SaaS, and PaaS implementation Ability to obtain and maintain a Public Trust or Suitability/Fitness determination based on client requirements Master's degree in CS, Engineering, or IT and 8+ years of experience working with key indicators for IT system operability, reliability, application performance, or code quality, or 10+ years of experience working with key indicators for IT system operability, reliability, application performance, or code quality in lieu of a degree Nice If You Have Experience with test‑driven development, distributed systems, microservices, and cloud‑native application implementation Experience with CI/CD, including GitLab Runners, GitHub Actions, and Jenkins, Git, and system administration Experience working in an 
Agile framework, including Kanban and Scrum Possession of excellent written and verbal communication skills Possession of excellent critical‑thinking and error assessment skills Vetting Applicants selected will be subject to a government investigation and may need to meet eligibility requirements of the U.S. government client. Compensation At Booz Allen, we celebrate your contributions, provide you with opportunities and choices, and support your total well‑being. Our offerings include health, life, disability, financial, and retirement benefits, as well as paid leave, professional development, tuition assistance, work‑life programs, and dependent care. Our recognition awards program acknowledges employees for exceptional performance and superior demonstration of our values. Full‑time and part‑time employees working at least 20 hours a week on a regular basis are eligible to participate in Booz Allen's benefit programs. Individuals that do not meet the threshold are only eligible for select offerings, not inclusive of health benefits. We encourage you to learn more about our total benefits by visiting the Resource page on our Careers site and reviewing Our Employee Benefits page. Salary at Booz Allen is determined by various factors, including but not limited to location, the individual's particular combination of education, knowledge, skills, competencies, and experience, as well as contract‑specific affordability and organizational requirements. The projected compensation range for this position is $99,000.00 to $225,000.00 (annualized USD). The estimate displayed represents the typical salary range for this position and is just one component of Booz Allen's total compensation package for employees. This posting will close within 90 days from the Posting Date. Identity Statement As part of the application process, you are expected to be on camera during interviews and assessments. 
We reserve the right to take your picture to verify your identity and prevent fraud. Work Model Our people‑first culture prioritizes the benefits of flexibility and collaboration, whether that happens in person or remotely. If this position is listed as remote or hybrid, you'll periodically work from a Booz Allen or client site facility. If this position is listed as onsite, you'll work with colleagues and clients in person, as needed for the specific role. Commitment to Non-Discrimination All qualified applicants will receive consideration for employment without regard to disability, status as a protected veteran or any other status protected by applicable federal, state, local, or international law.
$99k-225k yearly 4d ago
Site Reliability Engineer
Iron Mountain 4.3
Boston, MA jobs
At Iron Mountain we know that work, when done well, makes a positive impact for our customers, our employees, and our planet. That's why we need smart, committed people to join us. Whether you're looking to start your career or make a change, talk to us and see how you can elevate the power of your work at Iron Mountain. We provide expert, sustainable solutions in records and information management, digital transformation services, data centers, asset lifecycle management, and fine art storage, handling, and logistics. We proudly partner every day with our 225,000 customers around the world to preserve their invaluable artifacts, extract more from their inventory, and protect their data privacy in innovative and socially responsible ways. Are you curious about being part of our growth story while evolving your skills in a culture that will welcome your unique contributions? If so, let's start the conversation. Summary This role will be working within the Iron Mountain Government Services team and requires U.S. Citizenship living on U.S. soil and while active government clearance is not required, the candidate will be required to obtain government clearance to perform this role. We are looking for a talented Systems Engineer who has the experience necessary to help build out our existing infrastructure and troubleshoot problems as they arise. The ideal candidate for this position can prioritize mission critical tasks and coordinate expansion of our system so updates and other maintenance tasks do not get in the way of daily operations. In addition to solid technical, analytical, and troubleshooting skills, the candidate must have great soft and customer service skills allowing them to confidently interact with customers and explain highly technical concepts in simple, easy to understand terms. 
Responsibilities Troubleshoot escalated tickets Provide 2nd and 3rd level support to external and internal customers Join critical incident calls for priority 1 issues as the subject matter expert Perform on-call duties on a weekly rotation Windows Server Builds Maintaining backups Security patching and vulnerability management General break-fix and KLO Create and maintain system documentation and support processes Communicate verbally and in writing with customers Participate in the setup and configuration of new customers, environments, and proof-of-concept solutions Work with multiple vendors routinely for support and troubleshooting of the solution Work with internal cross-functional teams on troubleshooting system issues Responsible for working and prioritizing issues and project tasks according to SLA's Responsible for managing their own tickets and maintaining them on a daily basis Participate in testing and post-deployment validation of the system Perform Active Directory Services administration and management to include design, cleanup and routine maintenance and configuration Other duties and projects as assigned Qualifications Strong communication, collaboration and problem solving skills with a track record of delivering production grade systems in a team environment Motivated individual who learns quickly, has pride in building a new product and can engage others to accelerate technical solutions Minimum Bachelor's degree 5+ Years, hands-on technical architecture skills and depth across multiple technologies Technology areas to include Cloud, Virtualization, Network, Compute, and Storage Demonstrable experience with Hyper Converged Infrastructure an advantage Experience with Nutanix hardware is a plus Advanced experience with Windows Server operating systems Experience with Linux is required. 
Proficient in server oriented architectures and web platform applications Familiarity with PowerShell scripting preferred Knowledge in AD architecture and infrastructure (LDAP, Directory Replication, group policy, security, schema changes, Identity and Access Management, etc.) Excellent troubleshooting and analysis skills Knowledge in Microsoft Endpoint Configuration Manager or System Center Experience in working with geographically distributed teams Excellent written and verbal communications skills with external customers Experience working with and coordinating issue resolution with 3rd party vendors 3 years experience of Incident and Request Management process as defined by ITIL v3 Written and verbal proficiency in the English language Comfortable working in a fast-paced environment Knowledge in Rubrik backup and recovery is a plus Ability to get and maintain US Gov security clearances for various customers and projects Eligibility and Compliance Requirements Must be a U.S. Citizen. Required to pass a rigorous background check to obtain IRS Publication 1075 certification and gain access to Federal Tax Information (FTI). Must be able to successfully complete all government-mandated security and suitability investigations, which may include fingerprinting and financial reviews. Proven experience supporting the IT infrastructure or applications for a federal agency, such as the Defense Logistics Agency (DLA), is essential and clearance required. Education BS from accredited/recognized university Location #LI-Remote Reasonably expected salary range: $93,400.00 - $124,500.00 Category: Information Technology Iron Mountain is a global leader in storage and information management services trusted by more than 225,000 organizations in 60 countries. We safeguard billions of our customers' assets, including critical business information, highly sensitive data, and invaluable cultural and historic artifacts. Take a look at our history here. 
Iron Mountain helps lower cost and risk, comply with regulations, recover from disaster, and enable digital and sustainable solutions, whether in information management, digital transformation, secure storage and destruction, data center operations, cloud services, or art storage and logistics. Please see our Values and Code of Ethics for a look at our principles and aspirations in elevating the power of our work together. If you have a physical or mental disability that requires special accommodations, please let us know by sending an email to accommodationrequest@ironmountain.com. See the Supplement to learn more about Equal Employment Opportunity. Iron Mountain is committed to a policy of equal employment opportunity. We recruit and hire applicants without regard to race, color, religion, sex (including pregnancy), national origin, disability, age, sexual orientation, veteran status, genetic information, gender identity, gender expression, or any other factor prohibited by law. Requisition J0089482
$93.4k-124.5k yearly 2d ago
Senior Site Reliability Engineer, Global Services Platform
Apple Inc. 4.8
San Francisco, CA jobs
A leading technology company in San Francisco is seeking a Site Reliability Engineer (SRE) to manage and optimize their extensive infrastructure. The ideal candidate will have 5-7 years of experience in reliability engineering or DevOps, with advanced skills in programming languages like GoLang, Python, and Java. You will collaborate with cross-functional teams to ensure the robust performance of systems across multiple data centers. This role offers competitive compensation and comprehensive benefits including medical coverage and stock options.
$158k-202k yearly est. 4d ago
Founding SRE Engineer - Reliability & Growth
Asana 4.6
San Francisco, CA jobs
A leading software company is seeking experienced Software Engineers to join the new Site Reliability Engineering team. This role focuses on building reliable, scalable systems and leading projects across infrastructure. Candidates should have strong software engineering skills and a passion for reliability. The position offers a hybrid work model and generous compensation packages with additional benefits.
$147k-189k yearly est. 3d ago
SRE II - Hybrid Cloud Reliability Engineer
Axon Enterprise 4.5
Boston, MA jobs
A technology company is seeking a Site Reliability Engineer in Boston to enhance cloud-native services and ensure high reliability. Responsibilities include building foundational platforms, utilizing cloud tools, and following best practices. Candidates should have over 5 years of relevant experience in managing cloud platforms and programming languages like Python or Go. This role follows a hybrid work schedule, promoting collaboration and innovation.
$88k-122k yearly est. 2d ago
Reliability Engineer
Mini-Circuits 4.1
New York, NY jobs
Mini-Circuits designs, manufactures and distributes integrated circuits, modules, and sub‑systems for high‑performance radio frequency (RF) and microwave applications. With design, sales and manufacturing locations in over 30 countries, Mini‑Circuits' products are used in a range of wired and wireless communications applications. Our products are also used in detection, measurement and imaging applications, including military communication, guidance and electronic countermeasure systems, commercial, scientific, military land, sea and aircraft; automotive systems, medical systems, and industrial test equipment. Mini‑Circuits sells its products to over 20,000 customers globally through our direct sales force, applications engineering staff, sales representatives, as well as through our extensive website. Position Summary: The Reliability Engineer is responsible for conducting reliability studies of existing products and coordinating new product qualification prior to market release. The candidate will work in collaboration with various teams including Reliability, Design Engineering, Product Engineering, Failure Analysis and Project Management teams. Salary Range: $99,000 - $117,000 per year Job Function: Participate in the product development meetings and guide the team to develop reliable products that meet internal specifications and customer requirements. Develop qualification plans for new products, primarily MMICs but also support other product lines including but not limited to Low Temperature Co‑Fired Ceramics, PCBA products, RF accessories and Core & Wire Products. Analyze new products for similarity with existing released products in terms of package, die process and design to determine Qualification by Similarity, thus streamlining qualification testing. Design and execute both device level and package level qualification tests including but not limited to MSL pre‑conditioning, Thermal cycling, UHAST, HTSL, ESD and Life Tests. 
Define ESD Human Body Model (HBM) and Charged Device Model (CDM) tests as per JEDEC standards. Collaborate with Engineering Test Teams to execute Accelerated Life Tests and High Temperature Operating Life Tests. Execute Mechanical stresses such as Vibration, Mechanical Shock, Constant Acceleration & Bend Testing. Co‑ordinate with external labs for outsourced tests. Review RF Test data before and after stresses to analyze changes in performance. Collaborate with Failure Analysis teams to understand the root cause of failures. Identify and record any non‑conformities. Monitor solution implementations to verify effectiveness of corrective actions. Ensure On‑Time Completion of Qualification activities and escalate any potential delays. Present Qualification results to all relevant stakeholders to help Design teams initiate changes to improve reliability performance. Prepare written reports summarizing the results of product performance and failure analysis for both internal purposes as well as customer review. Interface with customers and suppliers on product reliability as required. Interface with suppliers to purchase lab equipment. Support reliability assessments originating from production of released products or customer returns. Makes decisions within area of specialty, manages medium to large projects. Promotes ISO9001/AS9100 Quality. The duties, responsibilities and expectations described above are not a comprehensive list and additional tasks may be assigned to the member, within the scope of the position. Qualifications: BS in Mechanical Engineering, Electrical Engineering, Materials, Reliability, Industrial Engineering or Physics. Advanced degree preferred. 3‑5 years' experience as a Reliability Engineer in Semiconductor or equivalent industry. Familiarity with common industry standards including JEDEC, MIL‑STD‑883, MIL‑STD‑202 and AEC‑Q. Experience with Reliability Qualification by Similarity. Experience with Environmental, Mechanical and ESD stresses. 
Experience with problem solving methodologies and leading root cause analysis. Experience with customer returns failure analysis support. Must have familiarity with failure analysis techniques including Scanning Acoustic Microscopy (SAM), Radiographic Inspection (X‑Ray), Cross‑Section methods. Familiarity with MTTF, MTBF Calculations. Experience with Reliability prediction modeling and tools like Weibull++ (or equivalent reliability software). Experience with Data analysis tools including Advanced Excel, JMP, Minitab. Ability to analyze component performance data in reliability tests, including large variety of test parts and multiple design variations. Experience with Design of Experiments, FMEA, product design reviews and DFM. Excellent written and oral communication skills. Physical Demands: The physical demands described here are representative of those that must be met by an employee to successfully perform the essential functions of this job. While performing the duties of this job, the employee is regularly required to talk and hear. The employee frequently is required to stand, walk, sit and use hands to operate a computer keyboard. The employee is occasionally required to reach with hands and arms. The employee must occasionally lift and/or move up to 10 pounds. Specific vision abilities required by this job include close vision, and ability to adjust focus. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions. Additional Requirements/Skills: Ability and willingness to abide by Company's Code of Conduct. Occasional travel, some overnight, as required (up to 10%). Disclaimer: The listed qualifications and requirements for each position are intended as guidelines. Mini‑Circuits reserves the right to hire outside of these guidelines at Management's discretion. 
Mini‑Circuits is an Equal Opportunity Employer and does not discriminate on the basis of actual or perceived age, race, creed, color, national origin, sexual orientation, military status, sex, disability, predisposing genetic characteristics, marital status, familial status, gender identity, gender dysphoria, pregnancy‑related condition, and domestic violence victim status or protected class characteristic, or any other protected characteristic as established by federal or state law.
$99k-117k yearly 1d ago
Senior Site Reliability Engineer
CaptivateIQ, Inc. 4.3
San Francisco, CA jobs
CaptivateIQ is transforming the way companies plan, manage, and optimize sales performance. We started by revolutionizing incentive compensation management, and now we're expanding our platform to solve broader sales planning challenges. Recognized by industry analysts like Forrester and G2 and backed by top-tier investors, including Sequoia, ICONIQ, Accel, and Sapphire Ventures, we empower high-growth companies like Netflix, Figma, and Stripe with the flexibility and insights needed to drive revenue performance. Join a talented, fast-growing team committed to solving some of the most complex and impactful problems in sales performance management. About the Role The Site Reliability Engineering team in CaptivateIQ operates across the engineering organization, supporting our development teams by providing them with the tools and processes they need to get their job done well. We ensure that the service provided by our product is great for the paying customers and when it isn't we ensure that the business is well informed. We do this by providing infrastructure, platform, reliability, and observability support to our internal customers to help them achieve their goals. The team are thoughtful and pragmatic engineers who balance doing things right versus doing things right now. We invest in iterative efforts to refine or pivot our work, deliver real-world results, and reflect on the process in order to improve it incrementally. We are fully remote and invest in written communication for long term institutional memory while valuing the synchronous time we have together in order to build and strengthen our relationships. Location San Francisco Bay Area, CA (Menlo Park) / Hybrid Responsibilities SRE team responsibilities vary based on the needs of our internal customers and the skills available in the team. Below is a list of general responsibilities that all SRE team members should be expected to fulfill. 
Learn by reading and writing designs, documentation, runbooks, and industry literature. Partner with development teams to design and implement reliable and resilient services. Build infrastructure automation that's easy to use by other teams. Develop observability processes, reports, and tooling to diagnose performance and stability issues. Eliminate toil by automating manual processes. Ensure we exceed our compliance and security commitments. Act in an ethical and professional manner. Requirements: This list is not comprehensive or evaluated in total; it is meant as a guide. If you have some of these skills and traits, please apply! 5+ years of experience in Software Engineer, SRE, or DevOps roles. Strong written and verbal communication skills (We use Slack, Notion, and GitHub). Experience with Infrastructure as Code (We use Terraform and AWS). Experience with containers and container orchestration tools (We use ECS). Experience with authoring and maintaining code (We use Bash, Python, and Golang). Experience with using observability tools and techniques and helping others use them (We use Datadog). Love for the Oxford comma (We use, love, and respect it). Nice to Haves: Experience with cloud cost management and FinOps. Experience in building, maintaining, and operating SaaS or web-based applications. Experience with distributed system principles and their application. Experience building and operating multi-region or cell-based applications. Experience with managing cloud vendor relationships. Experience with compliance and regulated environments (We use SOC2 and HIPAA). Benefits: (US-ONLY) 100% of medical, dental, and vision covered, including 75% for dependents. Vacation days and quarterly mental health days so you can recharge. (US-ONLY) 401k plan to participate in and save towards the future. Apple products to help you do your best work. Employee Resource Groups (ERGs) to support and celebrate the shared identities and life experiences of communities within CaptivateIQ. 
ERGs directly support our company-wide DEI goals as a space for developing and retaining diverse talent. Notice to Prospective Candidates: Only emails ********************* should be trusted. We are aware of active recruitment scams using the CaptivateIQ name, in which individuals pose as our recruiters and post fake remote job openings and make fake job offers on the Internet. Please note, we will never do the following: Attempt to correspond with a candidate using a free web-based account, such as an email address that ends *************, @yahoo.com, @hotmail.com, etc. Make an offer of employment without conducting multiple rounds of interviews face-to-face using secure video-conferencing technology. Ask candidates to cash checks to buy equipment on behalf of CaptivateIQ. Ask candidates to make a payment in order to be considered for a position. Make early requests for candidates' personal information such as date of birth, passport details, credit card numbers, bank details and social security number, etc. Please note that we'll only ask for more sensitive personal information in connection with background checks after an offer is made. Participate in an on-call rotation to provide after-hours support, ensuring timely resolution of critical issues and maintaining system uptime. $195,700 - $225,000 a year. The base range represents the minimum and maximum for this position in the San Francisco Bay area. The compensation offered for this position will depend on numerous factors, including individual proficiency, anticipated performance, and the location of the selected candidate. Our OTE is just one component of CaptivateIQ's competitive total rewards package. CaptivateIQ participates in E-Verify, a web-based system that allows enrolled employers to confirm the eligibility of their employees to work in the United States.
$195.7k-225k yearly 1d ago
Senior SRE - Remote-First, Observability & Reliability
CaptivateIQ, Inc. 4.3
San Francisco, CA jobs
A tech company focused on sales performance is seeking a Site Reliability Engineer in San Francisco. This role involves collaborating with development teams, automating infrastructure, and ensuring service reliability. Ideal candidates will have extensive experience in SRE or DevOps, with skills in infrastructure as code and strong communication abilities. The company offers generous benefits including health coverage and a 401k plan, fostering a diverse and inclusive work environment.
$142k-189k yearly est. 1d ago
Senior PostgreSQL DBRE - Scale, Reliability & Automation
Okta, Inc. 4.3
San Francisco, CA jobs
A leading identity management firm is looking for a Senior Database Reliability Engineer (DBRE) in San Francisco, California. The ideal candidate will have over 4 years of experience specifically with PostgreSQL and will be responsible for designing and optimizing data persistence layers for mission-critical systems. Key responsibilities include leading database incidents, working cross-functionally with platform teams, developing automation for tasks, and ensuring high availability across database environments. This position is essential for operational excellence in a hybrid environment.
$157k-199k yearly est. 17h ago
Senior PostgreSQL DBRE - Reliability at Scale
Okta, Inc. 4.3
San Francisco, CA jobs
A leading identity management company is seeking a skilled Senior Database Reliability Engineer (DBRE) to optimize and manage their PostgreSQL database environment. The role emphasizes designing resilient data infrastructure, automating key database processes, and collaborating with engineering teams. With a focus on high availability and performance optimization, the ideal candidate will possess extensive experience in high-volume production environments, specifically with PostgreSQL and MySQL. This hybrid position requires in-person onboarding in San Francisco.
$157k-199k yearly est. 17h ago
Site Reliability Engineer US - San Francisco
Near Inc. 4.6
San Francisco, CA jobs
The NEAR AI engineering team is developing decentralized and confidential machine learning infrastructure to power user-owned AI. We currently focus on building infrastructure to enable private and confidential inference that works across different compute providers, as well as a blockchain-based coordination layer that incentivizes compute providers to join the decentralized inference network. You will own various components and drive critical decisions throughout their life cycles, including architecture, implementation, and maintenance. You will collaborate with highly knowledgeable and skilled colleagues who are passionate about solving hard problems that can disrupt the industry. What You'll Be Doing: End-to-end infrastructure ownership (for handling telemetry data, for performing testing, etc.). Design and implementation of infrastructure components that manage clusters of GPUs with special configurations. Performance tuning and optimizations. Create and maintain runbooks that support the on-call rotation. Participate in the on-call rotation. 
Support code releases and delivery. Plan and implement infrastructure cost and security strategies. Plan and implement effective CI/CD pipelines to facilitate development processes. What We're Looking For: Agility to quickly learn new programming languages and technologies. Ability to write clean and efficient code. Ability to transform ambiguous problems into tangible solutions or prototypes. Experience with software concurrency or parallelism. Experience in building, operating, and scaling Cloud infrastructure (GCP, AWS, etc.). Experience with data visualization and observability tooling (Grafana, Graphite, Zabbix, etc.). Detail-oriented mindset with a focus on setting priorities and progressing towards objectives. Excellent communication and teamwork skills. Bachelor's Degree in Computer Science or a related field. We'd Love If You Have: Experience with NEAR or other blockchain internals. Experience with GPUs. Experience with Trusted Execution Environments. Experience debugging and troubleshooting complex concurrent systems. Professional experience with Rust. Locations: onsite, San Francisco office.
$126k-176k yearly est. 3d ago
Site Reliability Engineer - AI Inference Infra & GPU Clusters
Near Inc. 4.6
San Francisco, CA jobs
A tech company specializing in AI infrastructure based in San Francisco is looking for a candidate to own the development of decentralized machine learning infrastructure. The role involves designing components, performance tuning, and collaboration with skilled colleagues. The ideal candidate should have experience in Cloud infrastructure and software concurrency, along with a Bachelor's degree in Computer Science. Excellent communication skills and the ability to learn quickly are essential. The position is onsite at the San Francisco office.
$126k-176k yearly est. 3d ago
Business Value Engineer
Ironclad 3.8
San Francisco, CA jobs
Ironclad is the leading AI contracting platform that transforms agreements into assets. Contracts move faster, insights surface instantly, and agents push work forward, all with you in control. Whether you're buying or selling, Ironclad unifies the entire process on one intelligent platform, providing leaders with the visibility they need to stay one step ahead. That's why the world's most transformative organizations, from OpenAI to the World Health Organization and the Associated Press, trust Ironclad to accelerate their business. We're consistently recognized as a leader in the industry: a Leader in the Forrester Wave and Gartner Magic Quadrant for Contract Lifecycle Management, a Fortune Great Place to Work, and one of Fast Company's Most Innovative Workplaces. Ironclad has also been named to Forbes' AI 50 and Business Insider's list of Companies to Bet Your Career On. We're backed by leading investors including Accel, Y Combinator, Sequoia, BOND, and Franklin Templeton. For more information, visit ******************* or follow us on LinkedIn. The Business Value Engineer role is intended to own, define, and execute Ironclad's value-selling methodology across the customer lifecycle. This role is critical in bringing financial rigor and strategic storytelling to the sales process, ensuring our prospects and customers clearly understand the projected and realized impact of Ironclad's technology. You will be responsible for the end-to-end value lifecycle, from initial discovery and financial modeling to executive-level presentations and post‑sale value realization. This role is cross‑functional, partnering with Sales, Customer Success, Product, and Marketing to ensure our value strategy drives revenue growth, deal velocity, and long‑term customer success. What you'll do: Deal‑Level Execution: Own the value strategy for complex deals by leading discovery‑based sales processes to identify customer pain points and business objectives. 
Develop custom financial models (TCO, ROI) and defend them against CFO and procurement scrutiny. Process Analysis & Standardization: Map as‑is workflows to identify inefficiencies and quantify metrics. Standardize advanced modeling methodologies (scenario‑based, risk‑adjusted) and build repeatable process frameworks for specific industries or functions. Strategic Programs: Lead multi‑threaded value programs that extend beyond individual deals. Create prioritization frameworks for the team to focus the organization on the highest‑impact paths. Sales Partnership & Influence: Provide informal coaching and share best practices across the team to raise the collective bar. Partner seamlessly with GTM and Sales Engineers (SEs) on deal strategy and sponsorship creation. Who are you? Experience: 6+ years of experience in value engineering, management consulting, or software sales, with a track record of driving complex deal strategy. Financial Expertise: Deep experience in financial modeling, specifically developing custom business cases that detail the value of complex software solutions. Resilient & Adaptive: Proven ability to stay effective under ambiguity and adjust value narratives quickly based on customer feedback and shifting deal dynamics. Agile Problem Solver: Ability to guide teams on when to pivot approaches and teach others how to unblock value engagements with methodology when resources are limited. Strategic Storyteller: Demonstrated ability to identify patterns from CXO conversations to innovate and pilot new solutions that establish the team as a trusted advisor. Collaborative Leader: Extensive experience collaborating with cross‑functional teams, including Sales, Sales Engineering, and gTech, to drive persuasive customer adoption. Technical Rigor: Proficiency in mapping as‑is workflows to identify inefficiencies and quantify metrics for "above-the-line" executive audiences. 
OTE Range: $188,000.00 - $235,000.00. The OTE range represents the minimum and maximum of the OTE range for this position based at our San Francisco headquarters. The OTE offered for this position will depend on numerous factors, including individual proficiency, anticipated performance, and the location of the selected candidate. Our OTE is just one component of Ironclad's competitive total rewards package, which also includes equity awards (a new hire grant, along with opportunities for additional awards throughout your tenure), competitive health and wellness benefits, and a commitment to career growth and development. US Employee Benefits at Ironclad: 100% health coverage for employees (medical, dental, and vision), and 75% coverage for dependents with buy‑up plan options available. Market‑leading leave policies, including gender‑neutral parental leave and compassionate leave. Family forming support through Maven for you and your partner. Paid time off - take the time you need, when you need it. Monthly stipends for wellbeing, hybrid work, and (if applicable) cell phone use. Mental health support through Modern Health, including therapy, coaching, and digital tools. Pre‑tax commuter benefits (US Employees). 401(k) plan with Fidelity with employer match (US Employees). Regular team events to connect, recharge, and have fun. And most importantly: the opportunity to help build the company you want to work at. UK Employee-specific benefits are included on our UK job postings. Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.
$188k-235k yearly 17h ago
Reliability/DFX Engineer
OpenAI 4.2
San Francisco, CA jobs
About the Team OpenAI's Hardware organization develops silicon and system-level solutions designed for the unique demands of advanced AI workloads. The team is responsible for building the next generation of AI-native silicon while working closely with software and research partners to co-design hardware tightly integrated with AI models. In addition to delivering production-grade silicon for OpenAI's supercomputing infrastructure, the team also creates custom design tools and methodologies that accelerate innovation and enable hardware optimized specifically for AI. About the Role We are seeking a highly skilled cross-stack engineer with deep expertise in making ML systems reliable at scale. This hands-on individual contributor will sit within our hardware team and work closely with chip design, platform design, hardware health, and the broader industry ecosystem to architect, implement, and deploy reliable next-generation AI accelerator systems. This engineer will evaluate system and chip architecture holistically, identify high-ROI opportunities to improve reliability and availability across the stack, and translate those opportunities into strategy and silicon features. In this role, you will Oversee DFX architecture, implementation, and execution in silicon from concept to high-volume deployment, and propose high-ROI features to enhance reliability and fault tolerance. DFX includes design for testability, reliability, availability, and serviceability of high-performance AI hardware. Build system-level reliability models grounded in empirical data to guide organization-wide DFX and reliability strategy. This requires a detailed understanding of chip and system architecture, design, implementation, and component-level reliability. 
Collaborate with chip and platform architecture/design teams to explore and implement DFX features, including the specification and implementation of digital/mixed-signal IP, firmware/system software, and DFX methodology (in partnership with engineering teams). Partner with hardware health and platform design teams to continuously improve reliability and fault tolerance in NPI and HVM. This includes optimizing operating conditions, designing experiments, and performing data analysis to drive continuous, data-driven improvements across the stack. Serve as the DFX/reliability champion and evangelist to align the broader industry ecosystem with OpenAI's requirements and roadmap. Qualifications BS with 15+ years, MS with 10+ years, or PhD with 3+ years of relevant industry experience focused on reliability across the chip/platform stack. Hands-on experience with RTL design and DFT is required; physical implementation and/or silicon ATE experience is preferred. Detailed understanding of ML chip and platform architecture and ML workload characteristics is required. Strong fundamentals in reliability modeling, with hands-on skills in empirical data analysis. About OpenAI OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity. We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic. 
For additional information, please see OpenAI's Affirmative Action and Equal Employment Opportunity Policy Statement. Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable law, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations. To notify OpenAI that you believe this job posting is non-compliant, please submit a report through this form. No response will be provided to inquiries unrelated to job posting compliance. We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link. OpenAI Global Applicant Privacy Policy At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
$127k-176k yearly est. 3d ago
Reliability Engineer: Scale Systems, Observe & Automate
OpenAI 4.2
San Francisco, CA jobs
A leading AI research company based in San Francisco is seeking experienced reliability engineers to scale their infrastructure and ensure system performance and reliability. This role involves collaborating with diverse teams to develop resilient systems and enhance operations. Candidates should have strong cloud proficiency, experience in containerization technologies, and a bachelor's degree in a related field.
$127k-176k yearly est. 1d ago
Senior Value Engineer - Strategic Deals & ROI Expert
Ironclad 3.8
San Francisco, CA jobs
A leading AI contracting platform is seeking a Business Value Engineer in San Francisco to own and execute value-selling methodology throughout the customer lifecycle. This role will drive revenue growth and customer success by partnering with Sales and Customer Success while developing financial models to articulate the impact of the platform. Candidates should have over six years of relevant experience and a proven track record in strategic storytelling and agile problem-solving. Competitive compensation is offered, including benefits and opportunity for growth.
$96k-132k yearly est. 17h ago