Reliability engineer jobs in Olympia, WA - 321 jobs
Production ML Engineer, Foundation Models at Scale
Apple Inc. 4.8
Reliability engineer job in Seattle, WA
A leading technology firm in Seattle seeks a Machine Learning Engineer to optimize inference for advanced model architectures. You will build solutions serving millions of customers, mentor engineers, and work with cutting-edge technologies. This role requires strong leadership, cloud application knowledge, and GPU programming expertise. A competitive compensation package, including stock options and comprehensive benefits, is offered.
$146k-187k yearly est. 1d ago
Site Reliability Engineer, Product - USDS
TikTok 4.4
Reliability engineer job in Seattle, WA
About the team
The Product Engineering team monitors and maintains the availability of TikTok, including services such as video playback, content discovery/recommendations, live streaming, and customer service feedback.
Responsibilities
In this role, you will:
* Gain a solid understanding of the various components and services that power the TikTok experience
* Maintain services to meet service-level agreements (SLAs) and service-level objectives (SLOs) by measuring and monitoring availability, performance, and overall system health
* Participate as part of a global team to support site-up issues to ensure that services are reliable, fault-tolerant, efficiently scalable and cost-effective
* Scale systems sustainably through mechanisms such as automation; evolve system reliability, efficiency, and velocity by pushing for changes
* Provide user support, incident responses and postmortems
Minimum Qualifications
1. Bachelor's degree or above in Computer Science or a related technical discipline with 3+ years of experience in the deployment and administration of large-scale distributed systems
2. Strong understanding of Unix/Linux operating systems internals and administration, networking (e.g. TCP/IP, routing, network topologies and hardware), storage systems, and database systems
3. Experience in one or more programming languages, such as C, C++, Java, Python, Go, Ruby, Rust, JavaScript
4. Experience in debugging and optimizing code and automating routine tasks
5. Experience in development, testing, deployment and administration of one or more of the following types of systems: Nginx, Kubernetes, Docker, OpenStack, Hadoop, Spark, Flink, Kafka
6. Experience in designing and analyzing large-scale distributed systems is preferred
7. Strong skills in problem solving and communication
As a condition of employment, all successful candidates must be able to establish authorization to work in the United States. For this position, the Company does not provide sponsorship or any immigration-related benefits.
$145k-221k yearly est. 60d+ ago
Site Reliability Engineer, Edge Services (Seattle)
ByteDance 4.6
Reliability engineer job in Seattle, WA
Our Content Distribution Networks (CDN) team operates on a hybrid platform that integrates both commercial CDN vendors and ByteDance's proprietary edge network. This platform encompasses a vast array of Points of Presence (POPs) around the world, hosting edge services like traffic acceleration, CDN caching, gaming, and more. We are looking for experienced reliability and performance engineers to ensure the stability and reliability of our edge services and products running on our hybrid CDN platform. This role will also involve optimizing performance and developing innovative solutions to meet the evolving business demands at the edge.
Site Reliability Engineering (SRE) combines software and systems engineering to create and manage large-scale, highly distributed infrastructures. Our SREs are responsible for ensuring that these infrastructure services are reliable, fault-tolerant, scalable, and cost-effective. In this role, you will manage complex systems at scale, including hyperscale datacenter administration, public cloud management, global CDNs, and load balancers that process terabits of traffic per second. You will collaborate with diverse teams to translate business requirements into actionable items, driving improvements in system design and operational procedures.
Responsibilities
• Architect and implement solutions that enable both internal and external customers to harness the power of ByteDance's globally scaled content delivery network.
• Build metrics, tools, automations, visualizations and monitors to facilitate the operation and optimization of the edge services.
• Develop procedures and workflows that improve efficiency, foster trust, and ensure compliance in operational processes.
• Run vulnerability and capacity assessments and develop disaster recovery strategies to ensure high availability of our global CDN services.
• Work in a fast-paced environment. Participate in technical operations and rotations in response to performance and reliability issues.
Minimum Qualifications
• Bachelor's degree with 2+ years of experience in Computer Engineering, Computer Science, or related fields.
• 2+ years of working experience in CDN performance engineering, solution architecting or site reliability engineering roles.
• 2+ years of experience in one or more programming languages such as Java, C++, Go, or scripting experience in Shell and Python.
Preferred Qualifications
• Self-driven and capable of coping with ambiguity and moving projects from concept to delivery.
• Experience in operating in a multi-CDN environment.
• Experience in networking technologies such as TCP/IP, BGP, DNS, etc. in a carrier-grade environment. Past experience with CDN technologies is a plus.
• Strong analytical skills and the ability to solve real-world problems in a fast-moving environment.
• Experience in designing, analyzing and building automation and tools for large-scale systems.
• Experience in developing and operating one or more of the following systems: OpenStack, Kubernetes, Nginx, ipvs, ELK stack, Hadoop, etc.
$161k-224k yearly est. 19d ago
Principal Site Reliability Engineer
Oracle 4.6
Reliability engineer job in Olympia, WA
**Our Team** Building off our Cloud momentum, Oracle has formed a new organization - Oracle Health Data, Analytics Platform. This team will focus on product development and product strategy for Oracle Health, while building out a complete platform supporting modernized, automated healthcare. This is a net new line of business, constructed with an entrepreneurial spirit that promotes an energetic and creative environment. We are unencumbered and will need your contribution to make it a world class engineering center with the focus on excellence.
Oracle Health Data, Analytics Platform has a rare opportunity to play a critical role in how Oracle Health products impact and disrupt the healthcare industry by transforming how healthcare and technology intersect.
You will have the opportunity to:
+ Reach billions of people with our products & services
+ Create technology that truly impacts the world
+ Have an immediate impact on developing technology
+ Enjoy unlimited growth potential with inspiring work
+ Work with the best minds in the industry
+ Enjoy working in an open, diverse, and productive environment
**About The Job**
This role provides technical leadership for the core data platforms behind Oracle Health's Data & Analytics Platform. As a Principal Site Reliability Engineer (SRE), you will own shared, mission-critical systems used by multiple products and teams.
You will lead the design and operation of large-scale, stateful distributed platforms, including Hadoop ecosystem components (HDFS, YARN, HBase) deployed on Oracle Big Data Service (BDS), Kafka, and Storm. These multi-tenant platforms are deployed and operated through Ansible- and Terraform-based automation and require strong architectural ownership to manage scale, change, and broad blast radius.
**What You'll Do**
**Platform Ownership & Technical Leadership**
+ Own the end-to-end reliability, scalability, and operability of shared data platforms
+ Define platform standards, architectural direction, and operational guardrails
+ Influence cross-team technical decisions and long-term platform strategy
+ Drive long-term platform evolution and influence reliability strategy across the data ecosystem
**Architecture & Design**
+ Lead platform architecture and design reviews
+ Clearly articulate system behavior, dependencies, and failure modes
+ Make principled trade-offs between reliability, performance, cost, and complexity
+ Provide guidance and guardrails that enable downstream teams to use platforms safely and effectively
**Operations Engineering**
+ Establish capacity models, scaling strategies, and operational best practices
+ Design platforms that behave predictably under load, failure, and change
+ Own platform lifecycle events: upgrades, expansions, decommissioning, and recovery
**Distributed Systems Expertise**
+ Operate and evolve stateful distributed systems where data placement, replication, and recovery are critical
+ Reason about failure modes such as backpressure, rebalancing, region movement, replication lag, and rolling upgrades
**Security**
+ Operate and maintain Kerberized platforms, including authentication, authorization, and secure service-to-service communication
+ Treat security as a first-class architectural concern
**Automation**
+ Design and evolve an Ansible- and Terraform-driven automation framework
+ Treat automation as production software: versioned, reviewed, tested, and improved
+ Eliminate operational toil by encoding reliability and safety into the platform
**Incident Leadership & Prevention**
+ Serve as the ultimate escalation point for complex or ambiguous incidents
+ Focus on eliminating entire classes of failure, not just resolving individual issues
**Representation**
+ Represent SRE and platform engineering in high-visibility and sensitive forums
+ Communicate clearly with engineering leadership and partner teams
**Responsibilities**
The team operates within the Oracle Health Data & Analytics Platform, supporting one of Oracle Health's core products, HealtheIntent. We operate the big data and streaming infrastructure that enables downstream teams to deliver reliable customer-facing solutions at scale, while continuously improving operability and efficiency.
**Required Experience**
+ 8+ years operating large-scale, customer-facing distributed platforms
+ Deep experience with HDFS, YARN, HBase, Kafka, Storm, or similar systems
+ Strong background in Linux, networking, and distributed system troubleshooting
+ Infrastructure-as-Code using Ansible and Terraform
+ Scripting and automation using Python, Ruby, and Bash
+ Hands-on experience operating Kerberized environments
+ Proven ability to define and document technical architecture for complex systems
+ Demonstrated ownership of shared platforms with broad blast radius and multiple downstream consumers
+ Experience designing observability and capacity models for distributed platforms
**Required Qualifications:**
+ U.S. Citizenship and eligibility for a Federal Security Clearance
+ 10+ years of technical experience relevant to this position
+ Ability to communicate effectively and build rapport with team members
+ BS or MS in Computer Science, or equivalent
**Responsibilities**
Work with the Site Reliability Engineering (SRE) team on the shared full stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Responsible for the design and delivery of the mission-critical stack, with a focus on security, resiliency, scale, and performance. Authority for end-to-end performance and operability. Partner with development teams in defining and implementing improvements in service architecture. Articulate technical characteristics of services and technology areas and guide Development Teams to engineer and add premier capabilities to the Oracle Cloud service portfolio. Understand and communicate the scale, capacity, security, performance attributes, and requirements of the service and technology stack. Demonstrate clear understanding of automation and orchestration principles. Act as ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs). Utilize a deep understanding of service topology and dependencies to troubleshoot issues and define mitigations. Understand and explain the effect of product architecture decisions on distributed systems. Professional curiosity and a desire to develop a deep understanding of services and technologies.
Disclaimer:
**Certain US customer or client-facing roles may be required to comply with applicable requirements, such as immunization and occupational health mandates.**
**Range and benefit information provided in this posting are specific to the stated locations only**
US: Hiring Range in USD from: $86,400 to $199,500 per annum. May be eligible for bonus and equity.
Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business.
Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.
Oracle US offers a comprehensive benefits package which includes the following:
1. Medical, dental, and vision insurance, including expert medical opinion
2. Short term disability and long term disability
3. Life insurance and AD&D
4. Supplemental life insurance (Employee/Spouse/Child)
5. Health care and dependent care Flexible Spending Accounts
6. Pre-tax commuter and parking benefits
7. 401(k) Savings and Investment Plan with company match
8. Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
9. 11 paid holidays
10. Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.
11. Paid parental leave
12. Adoption assistance
13. Employee Stock Purchase Plan
14. Financial planning and group legal
15. Voluntary benefits including auto, homeowner and pet insurance
The role will generally accept applications for at least three calendar days from the posting date or as long as the job remains posted.
Career Level - IC4
**About Us**
As a world leader in cloud solutions, Oracle uses tomorrow's technology to tackle today's challenges. We've partnered with industry leaders in almost every sector, and continue to thrive after 40+ years of change by operating with integrity.
We know that true innovation starts when everyone is empowered to contribute. That's why we're committed to growing an inclusive workforce that promotes opportunities for all.
Oracle careers open the door to global opportunities where work-life balance flourishes. We offer competitive benefits based on parity and consistency and support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.
We're committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing accommodation-request_************* or by calling *************** in the United States.
Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans' status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.
$86.4k-199.5k yearly 4d ago
Site Reliability Engineer
Comtech 4.3
Reliability engineer job in Seattle, WA
Comtech is a woman-owned small business founded in 1998 and headquartered in Reston, VA. We offer IT solutions across the disciplines of program/project management, applications development, infrastructure, cybersecurity, and enterprise content/data management services. We have developed our methodologies and processes based on the IT Infrastructure Library (ITIL) v.3 Framework across enterprise infrastructure operations. These methodologies and processes are reinforced through our organization's externally accredited certifications, which include ISO 9001:2008 Quality Management System (QMS), ISO/IEC 20000-1:2011 IT Service Management Systems (SMS, corporate ITIL certification), ISO 27001:2005 Information Security Management System (ISMS), and CMMI-DEV Level 3.
Job Description
Job Title: Sr. Site Reliability Engineer
Duration: Long Term
These are the top 4 criteria important for SREs
2. Platform experience = Chef, Puppet, Azure, Ansible. Strong hands-on experience implementing and executing at least 2 of these. This requires a language such as Python, PowerShell, Ruby, or Perl; experience in at least 2 of these languages is a must - Requires 30% of work
4. Agile and project methodologies knowledge - 10% work
As a Sr. Site Reliability Engineer - you will be responsible for the day-to-day maintenance and administration of Internet-based enterprise systems. On an on-going basis, this position will identify root causes of operational issues in order to resolve them. As required, this position will help develop tools and scripts to facilitate that maintenance and administration.
This position will also work closely with other teams to document the enterprise infrastructure and monitoring systems. You will also be responsible for planning and execution of small to large-scale projects within the Technology teams under the direction of the manager.
This role requires your A-Game: deep technical proficiency in both enterprise-scale systems and next-gen cloud-native applications is required. So if you believe, like we do, that a cup of coffee can change a life and change our world, come check us out and help us deliver that same amazing experience to our customers around the globe.
Must Haves/Nice to Haves:
· Proven ability to participate with other functional teams in systems integration and design including writing operational specifications, test plans and requirements management with attention to detail.
· Web (IIS, Apache), .Net & Java application (Tomcat, Jboss, etc) server expertise including installation, administration, configuration, troubleshooting, performance tuning, preventative maintenance, capacity planning, monitoring, and security procedures.
· Experience with Configuration Management platforms (Chef, Ansible, CFEngine, Puppet, etc.).
· Understanding of internet standards such as HTTP, DNS, FTP, SSH, HTML, XML, JDBC, ODBC, SNMP and other protocols.
· Knowledge of storage systems (SAN, NAS, RAID Array, etc).
· Network hardware architecting experience with load balancing equipment, switches, routers, and network troubleshooting.
· Experience working with ITIL and Service Management best practices is a plus.
· Demonstrated knowledge of agile project methodologies.
· 4+ years operating complex, large-scale Enterprise guest-facing Applications or web sites
Wasim Ahmed
************
Additional Information
** Please share an updated Word copy of your resume.
*** I would appreciate it if you could refer anyone who is looking for this position.
$110k-145k yearly est. 60d+ ago
OpenShift/Kubernetes Site Reliability Engineer
Ford Motor Company 4.7
Reliability engineer job in Olympia, WA
This Kubernetes Site Reliability Engineer position will design and provision infrastructure supporting cloud native applications alongside a geographically distributed team. Emphasis will be on container strategies and ecosystem (Kubernetes) supporting Ford's rapidly increasing data and compute requirements. These environments will be both multi-cloud (GCP, Azure) and within our on-premise datacenters. This role will have the opportunity to drive and shape new solutions and feature capabilities that improve our Kubernetes platform's developer experience (reducing toil and complexity).
+ Develop automation (Pipelines/Operators) to support platform management and end user capabilities
+ Engineering of new platform feature capabilities
+ Research / prove-outs of new technologies to extend platform functionality, increase ease of use, and lower the developer barrier to entry
+ Lifecycle management (Upgrades / Maintenance) and support of all Kubernetes clusters (OpenShift) deployed globally by the team
_Technical Skills Required:_
+ Bachelors in Computer Science or related field or equivalent work experience
+ (3+ years) Strong Linux knowledge on multiple platforms (RHEL, Ubuntu, SUSE).
+ (1+ years) Experience in use of Kubernetes.
+ (1+ years) Experience in use of container technologies.
+ (3+ years) In-depth use of Cloud-based PaaS and IaaS. Google Cloud (GCP) and Azure experience preferred.
+ (2+ years) Experience with Git and CI/CD pipelines using Tekton, GitHub Actions, Cloud Build, and/or Jenkins.
+ Experience using PaaS solutions such as Cloud Foundry.
+ Demonstrable proficiency in at least one programming or scripting language. Go and/or Bourne Shell preferred.
+ Demonstrated knowledge of RESTful API concepts and JSON formats.
+ Working knowledge of application development architectures, web deployment methodologies and mobile technologies.
+ This position requires a wide breadth of skills and experience in order to develop new application hosting platforms and strategies.
_Soft Skills Required:_
+ Work effectively in a team environment. Team leader capability desired.
+ Pairing with team members occasionally required.
+ Primary team is located locally with other individuals distributed globally.
+ Must have good communication skills.
+ Must be a self-starter, capable of working independently and within a team with minimum supervision.
+ Ability to work with a variety of cross-functional teams to support deployment to remote locations across the globe.
_Technical Skills Preferred:_
+ Experience with IT Automation products such as Terraform, Chef, etc.
+ Experience with VMWare ESX and vCenter.
+ Experience with network load balancers (SLB, GSLB).
+ Exceptional technical writing skills (run books, DR plans, platform architecture, etc.)
You may not check every box, or your experience may look a little different from what we've outlined, but if you think you can bring value to Ford Motor Company, we encourage you to apply!
As an established global company, we offer the benefit of choice. You can choose what your Ford future will look like: will your story span the globe, or keep you close to home? Will your career be a deep dive into what you love, or a series of new teams and new skills? Will you be a leader, a changemaker, a technical expert, a culture builder...or all of the above? No matter what you choose, we offer a work life that works for you, including:
- Immediate medical, dental, and prescription drug coverage
- Flexible family care, parental leave, new parent ramp-up programs, subsidized back-up child care and more
- Vehicle discount program for employees and family members, and management leases
- Tuition assistance
- Established and active employee resource groups
- Paid time off for individual and team community service
- A generous schedule of paid holidays, including the week between Christmas and New Year's Day
- Paid time off and the option to purchase additional vacation time.
**_This role is remote but if you live within 50 miles of a Ford Hub (Dearborn, MI or Palo Alto), you will be required on-site 4x a week._**
**_*Visa Sponsorship is NOT provided for this specific role*_**
**_*Relocation assistance IS NOT provided for this specific role*_**
Candidates for positions with Ford Motor Company must be legally authorized to work in the United States. Verification of employment eligibility will be required at the time of hire.
We are an Equal Opportunity Employer committed to a culturally diverse workforce. All qualified applicants will receive consideration for employment without regard to race, religion, color, age, sex, national origin, sexual orientation, gender identity, disability status or protected veteran status. In the United States, if you need a reasonable accommodation for the online application process due to a disability, please call **************.
GSR6-8
**Requisition ID** : 52545
$113k-142k yearly est. 60d+ ago
Alibaba Cloud-ECS Site Reliability Engineer-Seattle
Alibaba Group Ltd.
Reliability engineer job in Seattle, WA
Minimum qualifications:
Professional Knowledge and Experience
● Bachelor's degree or higher in Computer Science, Information Technology, or a related field.
● At least 3 years of experience in system operations or SRE, with familiarity in cloud computing services and core products (e.g., ECS, K8S, heterogeneous computing, etc.).
● Familiarity with the design and optimization of cloud resource provisioning and delivery systems; experience in serving overseas customers is preferred.
● In-depth understanding of the overall architecture and operational mechanisms of the elastic computing product line, with the ability to quickly identify and resolve complex issues.
Preferred qualifications:
● Possession of cloud-related certifications (e.g., ACP, ACE, or other major cloud vendor certifications).
● Participation in the architectural design or performance optimization projects of large cloud platforms.
● Outstanding contributions in system stability assurance, automation tool development, or cloud-native domains are highly valued.
Position Highlights
As an SRE for elastic computing, you will have the opportunity to:
● Deeply engage in the core operations of Alibaba Cloud's elastic computing product line, ensuring service stability for global users.
● Explore cutting-edge technologies in virtualization, containerization, and cloud-native, driving technological innovation.
● Grow within an open and innovative team, collaborating with top engineers to solve complex technical challenges.
If you are passionate about technology, strive for excellence, and wish to leverage your expertise in the cloud computing domain, we welcome you to join us!
The pay range for this position at commencement of employment is expected to be between $133,200/year and $219,600/year. However, base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience.
If hired, employee will be in an "at-will position" and the Company reserves the right to modify base salary (as well as any other discretionary payment or compensation program) at any time, including for reasons related to individual performance, Company or individual department/team performance, and market factors. Alibaba U.S. based full time regular employees have access to medical, dental, and vision insurance, a 401(k) plan and basic life insurance, and wellbeing benefits like FSA, subject to the terms and conditions of the applicable plans then in effect. U.S. based employees are also eligible to receive up to 12 paid holidays, accrue up to 15 paid vacation days for this position, and receive up to 72 hours paid sick time (front-loaded) per calendar year.
Elastic Compute Service (ECS) is a core product of Alibaba Cloud. The Elastic Compute team is dedicated to building world-leading cloud computing infrastructure. As a key component of Alibaba Cloud's self-developed Apsara operating system, Elastic Compute Service (ECS) provides full-stack computing resources covering virtual machine instances, container services and heterogeneous computing clusters. Through technological innovation and product optimization, the Alibaba Cloud Elastic Compute team continuously drives advancements in cloud computing technologies, delivering high-quality computing services to users worldwide. Our goal is not only to support enterprises in achieving elastic scalability but also to deeply empower infrastructure innovation in the new era. Our mission is to build an intelligent foundation of "Computing as a Service," enabling developers and businesses to concentrate on breakthroughs, without worrying about the complex engineering implementations from chips to clusters.
SRE Team: The Alibaba Cloud Elastic Compute Service (ECS) SRE (Site Reliability Engineering) team is a critical force in ensuring system stability and reliability. The SRE team focuses on guaranteeing the high availability, high performance, and robust stability of ECS products through technical expertise and innovation. The Alibaba Cloud ECS SRE team is not only a core technical safeguard but also a driver of technological innovation and continuous optimization. By leveraging technical capabilities and collaborative teamwork, we ensure the stability and reliability of ECS products, safeguarding global customers' businesses. Additionally, we are committed to advancing cloud computing technologies through knowledge sharing and industry collaboration. Joining the Alibaba Cloud ECS SRE team offers the opportunity to engage in the development and optimization of world-leading cloud computing technologies, while growing alongside a passionate and creative team.
This is an SRE or DevOps position focused on the entire Elastic Computing product line. The responsibilities of this role include:
1. Stability, Performance Optimization, Monitoring, and Operations: Oversee the stability, performance optimization, monitoring, and operational work for multiple core products of Alibaba Cloud (such as ECS, ACK, ACS, heterogeneous computing clusters, OOS, Compute Nest, etc.), taking responsibility for the online stability of these products.
2. Operation System and Online System Development: Engage in the development of operation systems and some online systems. Through tools, process optimization, and system improvements, ensure the stability and performance of Alibaba Cloud's Elastic Computing-related products.
3. Customer and Team Collaboration: Work closely with other teams (such as R&D, after-sales support, etc.) to ensure efficient technical support and problem resolution.
Candidates can choose to take responsibility for one or more core duties based on their expertise. Meanwhile, we are looking for experts who possess cross-team collaboration skills and system-level thinking abilities.
$133.2k-219.6k yearly 60d+ ago
Site Reliability Engineer
Ensono 4.4
Reliability engineer job in Olympia, WA
Site Reliability Engineer
Remote - United States
JR012690
At Ensono, our **Purpose is to be a relentless ally, disrupting the status quo and unleashing our clients to Do Great Things!** We enable our clients to achieve key business outcomes that reshape how our world runs. As an expert technology adviser and managed service provider with cross-platform certifications, Ensono empowers our clients to keep up with continuous change and embrace innovation.
We can **Do Great Things** because we have great Associates. The Ensono Core Values unify our diverse talents and are woven into how we do business. These five traits are the key to achieving our purpose.
Honesty - Reliability - Curiosity - Collaboration - Passion
**About the role and what you'll be doing:**
We are seeking an experienced Site Reliability Engineer (SRE) with expertise in Infrastructure as Code tools like Terraform, core CI/CD tools such as Azure DevOps, and monitoring tools including DataDog and AWS CloudWatch. The ideal candidate will have commercial experience in technologies like Dotnet or Java, and be skilled in troubleshooting, incident resolution, and improving service and change management processes. Strong leadership in client-facing discussions and engagement with third-party suppliers is essential. An SRE Foundation certificate and a cloud provider associate-level certification are highly beneficial.
+ Commercial experience and proficiency with industry standard:
+ IaC tooling (Terraform preferably, or ARM/Bicep and CloudFront)
+ Core CI/CD Tooling (Azure DevOps, GitHub Actions or GitLab)
+ Monitoring Tooling (Splunk, New Relic, Azure Monitor, AWS CloudWatch)
+ Commercial experience in at least one core technology (Dotnet, Java, AI/Data Engineering, Golang)
+ Troubleshooting issues and identifying systemic failings indicated by incidents/failures
+ Implementing fixes
+ Proposing solutions for reducing toil
+ Providing leadership in the Incident resolution process, including creating and maintaining documentation, and providing key input to Post-mortem analysis
+ Improving Service Requests and Change Management processes, both technically and through stakeholder management.
+ Participating in, and proactively mitigating risks within, the security management process (vulnerabilities in code, infrastructure, and dependencies)
+ Leading client-facing meetings and discussions around the SRE process, and identifying areas for increasing the SRE footprint.
+ Engaging with suppliers and 3rd parties for support, requests and opportunities
**We want all new Associates to succeed in their roles at Ensono. That's why we've outlined the job requirements below. To be considered for this role, it's important that you meet all Required Qualifications. If you do not meet all of the Preferred Qualifications, we still encourage you to apply.**
**Required Qualifications**
+ 3-9 years of experience
+ Bachelor's degree (or equivalent) in computer science or related discipline
+ SRE Foundation certificate (DevOps Institute) and a cloud provider (AWS, Azure, GCP) 'associate'-level certification, either already held or completed during the probationary period.
+ Proficiency in Azure and Kubernetes, with hands-on experience in managing and deploying applications.
+ Expertise in Infrastructure as Code (IaC) using Terraform for efficient and scalable infrastructure management.
+ Familiarity with Harness for continuous delivery and deployment processes.
**Preferred Qualifications**
+ Certified Kubernetes Administrator / Application Developer
+ Certified Azure DevOps Engineer
+ Experience with monitoring tools such as New Relic or Splunk for effective system monitoring and alerting.
+ Strong programming skills in .NET, Java, or JavaScript for developing robust and scalable applications
**Why Ensono?**
Ensono is a place to make better happen - for our clients and for your career. You can do great things through innovation or collaboration, by learning or volunteering, or to promote diversity and inclusion. You can do great things for your own health or for a healthier planet. Whatever it means to you to do great things we want Ensono to be the place you can do it.
We are a client-facing business, but we do encourage clients to allow us to work remotely most of the time so if you are not required to be on a client site, you can choose to work from home or in our Ensono offices.
Some of our benefits include:
+ Unlimited Paid Days Off
+ Three health plan options
+ 401k with company match
+ Eligibility for dental, vision, short and long-term disability, life and AD&D coverage, and flexible spending accounts
+ Family Forming Benefit including fertility coverage and adoption/surrogacy reimbursement
+ Paid childbearing and paternal leave
+ Education Reimbursement, Student Loan Assistance or 529 College Funding
+ Sabbatical leave
+ Wellness program
+ Flexible work schedule
As of the date of this posting, a good faith estimate of the current pay scale for this role is $85,000 to $135,000 annually based on a full-time schedule. Please note that placement in the range may vary based on numerous factors including but not limited to skills, experience, internal equity, and business needs. In addition to base salary, other compensation programs, depending on eligibility, include an annual bonus plan based on company and individual performance or a role-based, sales-incentive plan, and an equity grant under our Associate Equity Appreciation Program.
Ensono is an Equal Opportunity/Affirmative Action employer. We are committed to providing equal employment to our Associates and building a diverse and inclusive workforce. All qualified applicants will be considered without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, protected veteran status, disability, or other legally protected basis, in accordance with applicable law.
Pay transparency nondiscrimination statement/posting OFCCP's pay transparency policy can be found on OFCCP's website.
If you need accommodation at any point during the application or interview process, please let your recruiter know or email ******************************.
JR012690
$85k-135k yearly Easy Apply 3d ago
Site Reliability Engineer
DAT 4.6
Reliability engineer job in Seattle, WA
DAT is an award-winning employer of choice and a next-generation SaaS technology company that has been at the leading edge of innovation in transportation supply chain logistics for 45 years. We continue to transform the industry year over year, by deploying a suite of software solutions to millions of customers every day - customers who depend on DAT for the most relevant data and most accurate insights to help them make smarter business decisions and run their companies more profitably. We operate the largest marketplace of its kind in North America, with 400 million freights posted in 2022, and a database of $150 billion of annual global shipment market transaction data. Our headquarters are in Denver, CO, and Beaverton, OR, with additional offices in Seattle, WA; Springfield, MO; and Bangalore, India. For additional information, see *******************
Job Application Deadline: 01/20/2026
The Opportunity
DAT is looking for a Senior Site Reliability Engineer to join our SRE platform team. This position will work hybrid in Seattle, WA.
Candidate profile
DAT is seeking an experienced Senior Site Reliability Engineer to help grow our SRE practices. In this role, you will be responsible for contributing to technical initiatives and enhancing your skills. You'll work closely with development teams and platform architects to achieve critical reliability goals and help scale our platform. A successful candidate will be instrumental in driving key technical initiatives, fostering a culture of continuous improvement, and significantly enhancing their own professional expertise.
This role necessitates close collaboration with various stakeholders, including our dedicated development teams and platform architects. The primary objective of these partnerships is to collectively achieve ambitious reliability goals and strategically scale our platform to meet the company's evolving growth. The SRE will be responsible for ensuring the stability, performance, and scalability of our systems, implementing robust monitoring solutions, automating operational tasks, and proactively identifying and resolving potential issues. This will involve a deep understanding of distributed systems, cloud infrastructure, and a commitment to best practices in site reliability engineering.
What You'll Do
Contribute to the design, implementation, and maintenance of scalable and reliable systems. Collaborate with engineering teams to ensure reliability targets are met.
Identify and troubleshoot complex issues across distributed systems, ensuring minimal downtime and optimal performance.
Advocate for and implement SRE best practices, including automation, monitoring, and incident response, to enhance system resilience.
Participate in capacity planning and performance tuning to proactively address potential bottlenecks and support future growth.
Leverage new AI tools to assist with software development and observability tasks.
Assist and respond to critical engineering incidents.
Improve your engineering skills within the SRE team.
Provide technical guidance and best practices for use of cloud infrastructure and tooling. Contribute to Infrastructure-as-Code within the platform. We strive to automate all the things!
Contribute to reliability-focused initiatives and projects.
Help optimize our work to be customer-focused. Continually seek feedback from our customers on how we can improve.
Assist in migrating legacy systems to modern, scalable cloud environments.
Help develop and drive a culture of continuous improvement with the Platform Engineering and Software Engineering groups.
Participate in an on-call rotation.
The Skills and Experience You'll Bring
Strong collaboration and problem-solving abilities, especially within SRE or Platform Engineering/Infrastructure teams.
Total of 4 to 6+ years industry experience
At least 2+ years of software engineering experience (JavaScript, Python, Go, Java/Kotlin, C++, etc)
Extensive experience with modern observability tools (Datadog preferred). Experience working with development teams on APM monitoring and instrumentation.
Extensive experience with cloud platforms (preferably AWS).
Demonstrated success in contributing to large technical initiatives and acting as a driving force to complete those initiatives.
Proven experience assisting in modernizing legacy code and infrastructure.
Ability to work closely with peer teams, platform/software architects and management to drive key reliability improvements.
Willingness to share your expertise among team members and others within the engineering organization. We value upleveling our peers however we can.
Understanding of cloud infrastructure, automation, and best practices for reliability.
Strong desire to automate toil work and create easily maintainable tooling for the team and developers.
Strong experience with our tools (Kubernetes, ArgoCD, Terraform, GitHub Actions).
Experience with AI Tooling/Observability a plus.
Why DAT?
DAT is an award-winning employer of choice.
For starters, we have a hybrid work environment, but we also know what makes a great workplace. We have a time-tested and resolute set of operating values predicated on integrity, mutual respect, open communication, and executing with excellence. These values inform our strategic vision as much as any one of our products does. We've been an employer of choice in the Portland metropolitan area for four decades, and within one year of opening our Denver office, DAT was #26 on Built In Colorado's 100 Best Places to Work In Colorado.
Medical, Dental, Vision, Life, and AD&D insurance
Parental Leave
Flexible Vacation Time (FVT)
An additional 10 holidays of paid time off per calendar year
401k matching (immediately vested)
Employee Stock Purchase Plan
Short- and Long-term disability sick leave
Flexible Spending Accounts
Health Savings Accounts
Employee Assistance Program
Additional programs - Employee Referral, Internal Recognition, and Wellness
Free TriMet transit pass (Beaverton Office)
Competitive salary and benefits package
Work on impactful projects in a cutting-edge environment
Collaborative and supportive team culture
Opportunity to make a real difference in the trucking industry
Employee Resource Groups
For Washington-based candidates, in compliance with the Washington State Pay Transparency Law, the salary range for this role is $102,000.00 - $139,000.00 + target bonus. DAT considers factors such as scope and responsibilities of the position, candidate's work experience, education and training, core skills, internal equity, and market and business elements when extending an offer.
DAT embraces the value of a diverse workforce, and believes it is a core strength of our company that we encourage those values in every DAT employee, at every level of our organization, regardless of tenure or rank. We provide equal employment opportunities (EEO) to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, disability, genetic information, marital status, amnesty, or status as a covered veteran in accordance with applicable federal, state, and local laws.
Equal Opportunity Employer/Protected Veterans/Individuals with Disabilities
The contractor will not discharge or in any other manner discriminate against employees or applicants because they have inquired about, discussed, or disclosed their own pay or the pay of another employee or applicant. However, employees who have access to the compensation information of other employees or applicants as a part of their essential job functions cannot disclose the pay of other employees or applicants to individuals who do not otherwise have access to compensation information, unless the disclosure is (a) in response to a formal complaint or charge, (b) in furtherance of an investigation, proceeding, hearing, or action, including an investigation conducted by the employer, or (c) consistent with the contractor's legal duty to furnish information. 41 CFR 60-1.35(c)
#LI-RF1
#LI-hybrid
$102k-139k yearly Auto-Apply 60d+ ago
Platform / Site Reliability Engineer
Axiom Software Solutions Limited 3.8
Reliability engineer job in Seattle, WA
We are looking for a skilled Platform Engineer / SRE to design, implement, and maintain our cloud infrastructure and platforms. The ideal candidate will have a strong background in Kubernetes administration, Azure cloud services, infrastructure as code, and automation. You will play a crucial role in ensuring the scalability, reliability, and security of our systems while supporting our AI/ML initiatives.
* Design, deploy, and manage infrastructure solutions using Terraform, ensuring scalability, security, and reliability.
* Develop and maintain infrastructure as code scripts to automate the provisioning and configuration of resources.
* Ensure version-controlled, repeatable deployments using IaC best practices.
* Implement and manage Kubernetes clusters for containerized applications.
* Collaborate with development teams to deploy, scale, and optimize applications in Kubernetes environments.
* Leverage scripting languages (e.g., Python) to automate routine tasks and streamline workflows.
* Implement continuous integration and continuous deployment (CI/CD) pipelines for efficient software delivery.
* Ensure seamless integration of infrastructure components with CI/CD pipelines.
* Design, deploy, and maintain scalable and reliable infrastructure for AI/ML platforms.
* Implement containerization (Docker) and orchestration (Kubernetes) solutions for deploying and managing AI/ML applications.
* Ensure containerized applications are secure, scalable, and easily deployable.
* Enable seamless integration of AI/ML models into the platform, ensuring data pipelines are efficient and reliable.
* Establish monitoring and alerting systems to ensure the health and performance of AI/ML platforms.
* Implement security best practices for AI/ML platforms, ensuring data privacy and compliance with industry standards.
* Bachelor's degree in Computer Science, Engineering, or a related field
* Proven experience in Kubernetes administration, specifically with Azure Kubernetes Service (AKS)
* Strong proficiency in Azure cloud services and Azure ARM templates
* Expert-level scripting skills in PowerShell and Python
* Hands-on experience with Terraform for infrastructure as code
* Solid understanding of CI/CD principles and experience with Azure DevOps
* Experience with containerization technologies, particularly Docker
* Strong problem-solving skills and ability to work in a fast-paced environment
* Excellent communication and collaboration skills
$122k-165k yearly est. Auto-Apply 60d+ ago
Lead Site Reliability Engineer
JPMC
Reliability engineer job in Seattle, WA
Assume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability.
As a Lead Site Reliability Engineer at JPMorgan Chase within the Enterprise Technology, Infrastructure Platforms team, you hold a leadership role in your team, demonstrate strong knowledge across multiple technical domains, and advise others on the technical and business issues facing them. You take the lead in conducting resiliency design reviews, break up complex problems into digestible work for other engineers, act as a technical lead for medium to large-sized products, and provide advice and mentoring to other engineers.
Job responsibilities
Demonstrates and champions site reliability culture and practices and exerts technical influence throughout your team
Leads initiatives to improve the reliability and stability of your team's applications and platforms using data-driven analytics to improve service levels
Collaborates with team members to identify comprehensive service level indicators, and with stakeholders to establish reasonable service level objectives and error budgets with customers
Demonstrates a high level of technical expertise within one or more technical domains and proactively identifies and solves technology-related bottlenecks in your areas of expertise
Acts as the main point of contact during major incidents for your application and demonstrates the skills to identify and solve issues quickly to avoid financial losses
Documents and shares knowledge within your organization via internal forums and communities of practice
Required qualifications, capabilities, and skills
Formal training or certification on software engineering and SRE concepts and 5+ years applied experience.
Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices with the ability to implement these practices within an application or platform
Proficiency in at least one programming language (e.g., Python, Java Spring Boot, .NET, etc.)
Deep knowledge of software applications and technical processes with emerging depth in one or more technical disciplines
Proficiency and experience in observability such as white and black box monitoring, SLO alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, etc.
Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform, etc.)
Experience with container and container orchestration (e.g., ECS, Kubernetes, Docker, etc.)
Experience with troubleshooting common networking technologies and issues
Ability to identify and solve problems related to complex data structures and algorithms
Preferred qualifications, capabilities, and skills
Drive to self-educate and evaluate new technology
Ability to teach new programming languages to team members
Ability to expand and collaborate across different levels and stakeholder groups
AWS and/or CKA certified preferred
$106k-150k yearly est. Auto-Apply 60d+ ago
DevOps / Site Reliability Engineer
Ittconnect
Reliability engineer job in Seattle, WA
ITTConnect is seeking a DevOps / Site Reliability Engineer to work in a hybrid mode for one of our clients.
Seniority: 8+ years of total experience
About the role:
We would like someone to come on primarily as a "DevOps Engineer" but also have some SRE capabilities.
This hybrid role is focused on enabling our application development by building automation and improving the health of our applications by bringing SRE best practices into effect. The work includes building Ops pipelines for performance testing and writing custom scripts to automate tasks.
Requirements
Tech - Must have
Docker - expert at working with containers; Docker expertise is a must
Docker builds, dockerfiles, docker compose
Container registries, kaniko, devcontainers
Kubernetes - expert at working with Kubernetes from the application development side
Certified Kubernetes Application Developer (CKAD) is a huge plus
Experience working with GitOps tooling: ArgoCD or Flux experience needed.
CI/CD:
GitHub Actions or Azure DevOps preferred
Other platforms such as Jenkins, Bitbucket, AppVeyor, etc. are OK
Observability
Application monitoring, alerting, dashboarding - Datadog preferred
Looking for candidates with experience working with Datadog or a suitable alternative:
Grafana, Loki, Prometheus, OpenTelemetry, Promtail, ELK
Linux administration (some)
Should be highly competent with Bash, shell scripting, and many basic Linux commands
Tech - Nice-to-have
App Dev:
Any Node.js and TypeScript experience is a HUGE plus, though not fully necessary for this role
Go - also nice-to-have, as it could be used for CI/CD and automation tooling
Python - not used in our stack, but familiarity with app dev code is a big plus
Kubernetes administration - cluster administration; Certified Kubernetes Administrator (CKA) preferred
Advanced Linux admin - Ansible experience preferred
Terraform - minimal experience here needed
Azure / AKS - We shouldn't need to do a lot in Azure, but experience working with Azure and Azure Kubernetes Service (AKS) would be valuable
Job type: Contract. Service type: Staff Augmentation. Duration: 6+ months. Workplace: On-site. Date created: 20-Nov-2025. Visa transfer provided: No. Open to C2C vendors: Yes. Location: Seattle, WA. Job description URL: http://bit.ly/ITTConnect-SRESeattle
$106k-150k yearly est. 60d ago
Senior Site Reliability Engineer
Supio
Reliability engineer job in Seattle, WA
Who We're Looking For
We're looking for a hands-on, high-agency Site Reliability Engineer to help shape and scale the reliability layer of our stack. You'll own the release pipeline end-to-end - managing daily releases, weekly deploys, and hotfixes - while also automating infrastructure, monitoring systems, and GitHub workflows. This is a software engineering role, deeply embedded in DevOps culture, with significant autonomy and direct impact on the pace and safety of our shipping process.
You'll work closely with engineers, product leads, and company leadership to ensure uptime, speed, and confidence in every deploy.
What You'll Do
* Own Deployments: Lead our release and deployment process - from daily rollouts to weekly deploys and hotfix coordination. Build safe, repeatable, and observable workflows.
* GitHub Operations: Manage GitHub branching strategies, pull request flows, merge policies, and GitHub Actions. Set and enforce collaboration standards for the engineering team.
* Infrastructure & Monitoring: Build and maintain resilient AWS-based infrastructure. Set up and manage observability tools (logs, metrics, traces), configure alarms, and be the first responder for incidents. Triage, escalate, or resolve based on impact.
* Automation & Internal Tooling: Write scripts, services, and automations that reduce friction and improve deployment confidence. Using AI tools to generate code is encouraged and expected - you'll be comfortable guiding, adapting, and integrating AI-assisted outputs into production workflows.
* Software Development: You'll contribute code when needed - whether that's building internal tools, improving system reliability, or unblocking a deploy. This is not a sprint-based role, but strong software fundamentals are key to success.
* Support Global Teams: Work off-hours as needed to unblock offshore teams and maintain deployment velocity across time zones.
You're a Great Fit If You
* Have 3-6+ years in SRE, DevOps, or infrastructure roles with production ownership.
* Started your career in software development - and still enjoy writing code.
* Are fluent in or at least familiar with Bash, Python, TypeScript, and Postgres SQL.
* Are a confident AWS operator and know your way around EC2, Lambda, RDS, IAM, and VPCs.
* Have strong experience with GitHub workflows, including GitHub Actions and release automation.
* Are comfortable using AI tools (Claude, ChatGPT, etc.) to generate code - and have the skill to audit and adapt that code to meet production standards.
* Are familiar with CI/CD principles and enjoy owning the full deployment lifecycle.
* Are comfortable being on-call and understand how to design systems for both speed and safety.
* Can operate with a high level of autonomy in fast-moving, ambiguous environments.
Compensation
The base salary range for this position in Seattle is $168,000 - $220,000 annually. Compensation may vary outside of this range depending on a number of factors, including a candidate's qualifications, skills, competencies, and experience.
$168k-220k yearly 40d ago
Network Reliability Engineer
Govcio
Reliability engineer job in Olympia, WA
GovCIO is currently hiring a Network Reliability Engineer to support our client's contract needs. The Network Reliability Engineer will support, maintain, optimize, monitor, and participate in troubleshooting efforts for a mature network environment for a large Government Agency. This position is located within the United States and is fully remote.
**Responsibilities**
The ideal candidate will have extensive experience with SolarWinds Network Performance Monitor (NPM), Network Configuration Manager (NCM), NetFlow Traffic Analyzer (NTA), and SolarWinds Security Event Manager (SEM). The candidate will also have substantial experience with developing and maintaining interactive webpage dashboards related to the SolarWinds suite. Highly desirable skills would include experience with Cisco switches and routers, Cisco-specific monitoring systems, packet capture systems, packet broker switches, and automation provisioning. Additional skills that would provide value include experience with COTS DDI products, load balancers, NAC, AAA systems, SSL certificate management, and proxy systems.
+ Perform Incident Response support by monitoring ticket system and taking appropriate actions on network related Incidents.
+ Manage, maintain, and support existing SolarWinds instance which monitors and generates alerts for network equipment.
+ Manage, maintain, and support existing interactive webpage dashboards through programming and database calls to SolarWinds and various other systems utilizing SWQL Studio.
+ Participate in troubleshooting sessions involving connectivity issues for infrastructure and applications by using SolarWinds and/or analyzing packet capture data, syslogs and related log data.
+ Use network diagnostic tools to reduce outage times by quickly identifying network related anomalies and issues.
+ Generate root cause analysis for issues diagnosed on the network.
+ Use network diagnostic tools to proactively address network anomalies and issues as well as optimize network performance.
+ Develop and maintain technical documentation as it pertains to the systems being managed.
+ Ensure compliance with security standards, policies, and best practices for IT systems and data protection.
+ Train and mentor internal teams on efficiencies gained by utilizing SolarWinds to monitor and diagnose network anomalies and issues.
**Qualifications**
Bachelor's with 5 - 8 years (or commensurate experience)
Required Skills and Experience
+ 5+ years of experience in monitoring and managing network equipment.
+ Experience with Incident and Request ticketing systems.
+ Strong knowledge and experience with SolarWinds and related dashboard activities.
+ Strong knowledge of networking protocols (TCP/IP, HTTP/HTTPS, DNS, etc.).
+ Solid knowledge of SNMP as it relates to SolarWinds implementations.
+ Solid network and cybersecurity fundamentals knowledge.
+ Fundamental knowledge of MPLS and WAN routing principles.
+ Fundamental knowledge of IPv4 and IPv6 protocols.
+ Basic understanding of Red Hat Enterprise Linux (RHEL 8/9) to administer Apache2 web server.
+ Basic understanding of Application Programming Interface (API) utilization to extract performance and informational data from various vendors.
+ Basic understanding of Bash shell scripting, Python, PHP, Perl, JavaScript, and MySQL databases.
+ Proficiency with Microsoft Office suite.
+ Familiarity with cloud service providers (AWS, Azure, Google Cloud).
+ Familiarity with Data Center network environments.
+ Strong ability to work as part of a team, but also possess the drive and ability to perform duties autonomously if the project demands it.
**Clearance Required:** Must be able to obtain and maintain a HUD Public Trust
**Company Overview**
GovCIO is a team of transformers--people who are passionate about transforming government IT. Every day, we make a positive impact by delivering innovative IT services and solutions that improve how government agencies operate and serve our citizens.
But we can't do it alone. We need great people to help us do great things - for our customers, our culture, and our ability to attract other great people. We are changing the face of government IT and building a workforce that fuels this mission. Are you ready to be a transformer?
**What You Can Expect**
**Interview & Hiring Process**
If you are selected to move forward through the process, here's what you can expect:
+ During the Interview Process
+ Virtual interview conducted via video with the hiring manager and/or team
+ Camera must be on
+ A valid photo ID must be presented during each interview
+ During the Hiring Process
+ Enhanced Biometrics ID verification screening
+ Background check, to include:
+ Criminal history (past 7 years)
+ Verification of your highest level of education
+ Verification of your employment history (past 7 years), based on information provided in your application
**Employee Perks**
At GovCIO, we consistently hear that meaningful work and a collaborative team environment are two of the top reasons our employees enjoy working here. In addition, our employees have access to a range of perks and benefits to support their personal and professional well-being, beyond the standard company offered health benefits, including:
+ Employee Assistance Program (EAP)
+ Corporate Discounts
+ Learning & Development platform, to include certification preparation content
+ Training, Education and Certification Assistance*
+ Referral Bonus Program
+ Internal Mobility Program
+ Pet Insurance
+ Flexible Work Environment
*Available to full-time employees
Our employees' unique talents and contributions are the driving force behind our success in supporting our customers, which ultimately fuels the success of our company. Join us and be a part of a culture that invests in its people and prioritizes continuous enhancement of the employee experience.
**We are an Equal Opportunity Employer.** All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, disability, or status as a protected veteran. EOE, including disability/vets.
**Posted Pay Range**
The posted pay range, if referenced, reflects the range expected for this position at the commencement of employment; however, base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, education, experience, and internal equity. The total compensation package for this position may also include other compensation elements, to be discussed during the hiring process. If hired, employee will be in an "at-will position" and GovCIO reserves the right to modify base salary (as well as any other discretionary payment or compensation program) at any time, including for reasons related to individual performance, GovCIO or individual department/team performance, and market factors.
**Posted Salary Range**
USD $90,000.00 - USD $120,000.00 /Yr.
**Location** _US-Remote_
**ID** _2025-7212_
**Category** _IT Infrastructure & Network Engineering & Operations_
**Position Type** _Full-Time_
$90k-120k yearly 39d ago
Senior Reliability Engineer
Tin Can
Reliability engineer job in Seattle, WA
Tin Can is building a safer, simpler way for kids to connect - without smartphones. We're creating screen-free, delightful devices and services that let families call the people who matter most, free from the noise of today's digital world.
We're building a bold, authentic, nostalgic, and kinda quirky brand that resonates with folks who want something simpler & better for their kids than the tech-infused lives we're currently living (and who have a sense of humor about it). As we gear up to scale to thousands of families, we're ready to bring on an engineer to help make it happen.
The Role
We're looking for a Senior Infrastructure / Reliability Engineer to take clear ownership of platform reliability as we scale. This role exists to make our systems more predictable, observable, and resilient - and to ensure stability work doesn't compete with feature development.
You'll work closely with the founding team and senior engineers to reduce incidents, improve operational maturity, and support architectural changes required for growth.
What You'll Do
Own production reliability, including incident response, root-cause analysis, and follow-through
Improve observability (metrics, logging, alerting) so issues are easier to detect and diagnose
Strengthen infrastructure and deployment practices to support growth and reduce operational risk
Partner on architectural changes needed to scale the platform safely
Help establish on-call, incident, and postmortem practices that are effective and humane
Identify recurring failure patterns and drive systemic fixes over time
What We're Looking For
5+ years experience operating and improving production systems that matter to users
Comfort working across infrastructure, backend services, and cloud environments
Strong judgment under pressure and a bias toward durable, boring solutions
Ability to balance reliability with the realities of shipping and iteration
A collaborative mindset and interest in improving how we operate as a team and business
Why Join Tin Can
We're building tech that protects childhood: We're on a quest to give kids a more analog childhood - one with real conversations, real connection, and way less screen time. No doomscrolling, data mining, or dopamine traps - just a simpler, better way for kids to stay in touch. At Tin Can, security isn't a checkbox - it's how we earn and keep the trust of every family we serve. This is a rare opportunity to build something that truly matters: technology that protects what's most precious.
You'll have real ownership, not just responsibility: This is the dream role for an engineer who wants to fully own building and scaling secure systems from the ground up - with the full backing of the founding team. You'll have both the autonomy and the responsibility to shape Tin Can's security vision, tools, and culture as we scale.
Small, high trust team: Every company says “team,” but at Tin Can it means something different. We're a small, mission-driven group that genuinely has each other's backs - professionally and personally. You'll be joining a tight-knit crew where your ideas and instincts matter from day one.
Room to explore, not just execute: You'll have the space to explore new ideas, challenge assumptions, and build secure systems that reflect our values of trust, simplicity, and care.
If you're excited by the idea of taking a solid technical foundation and scaling it to support thousands of families - while making it more efficient, secure, and delightful along the way - please reach out. We can't wait to meet someone who's ready to jump in, level up what we've started, and help us grow.
$121k-166k yearly est. Auto-Apply 8d ago
Sr. Site Reliability Engineer
Insight Global
Reliability engineer job in Seattle, WA
An employer in the Pacific Northwest is seeking a highly skilled Senior Site Reliability Engineer (SRE). This role is critical in ensuring the reliability, scalability, and performance of our infrastructure and services. You will work on automation, infrastructure-as-code, and observability solutions while collaborating with cross-functional teams to deliver secure and efficient systems. In this role, you will:
- Design, implement, and maintain Infrastructure as Code (IaC) solutions using Ansible and Terraform for consistent and scalable deployments.
- Develop automation scripts and tools in Python to streamline operational workflows, including system upgrades and configuration management.
- Manage and optimize containerized environments using Kubernetes, ensuring high availability and resilience.
- Drive automation for system upgrades and patching processes to reduce downtime and improve operational efficiency.
- Collaborate on networking-focused projects, leveraging tools like NetBox and Infoblox for IP address management and network automation.
- Support and enhance virtualization and cloud environments, including OpenStack and VMware, for hybrid infrastructure solutions.
- Implement and maintain observability frameworks using Grafana and Prometheus to monitor system health and performance.
- Partner with development and operations teams to define and track SLIs, SLOs, and KPIs, ensuring alignment with reliability goals.
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to ********************. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: ****************************************************
Skills and Requirements
- 8+ years of experience in networking concepts and experience with network automation tools.
- 6+ years of hands-on experience with NetBox or Infoblox for network resource management.
- Proficiency in Python for automation and scripting tasks.
- 6+ years of expertise in Kubernetes, OpenStack, and VMware environments.
- Familiarity with observability tools such as Grafana and Prometheus.
- Solid understanding of IaC principles and experience with Ansible and Terraform.
- Ability to work in a fast-paced environment and collaborate effectively across teams.
- Experience mentoring less experienced SREs.
$121k-166k yearly est. 33d ago
Senior Site Reliability Engineer, Tacoma
Onebrief
Reliability engineer job in Tacoma, WA
Onebrief is collaboration and AI-powered workflow software designed specifically for military staffs. By transforming this work, Onebrief makes the staff as a whole superhuman - meaning faster, smarter, and more efficient.
We take ownership, seek excellence, and play to win with the seriousness and camaraderie of an Olympic team. Onebrief operates as an all-remote company, though many of our employees work alongside our customers at military commands around the world.
Onebrief was founded in 2019 by a group of experienced planners. Today, Onebrief's team spans veterans from all forces and global organizations, and technologists from leading-edge software companies. We've raised $320m+ from top-tier investors, including Battery Ventures, General Catalyst, Sapphire Ventures, Insight Partners, and Human Capital, and today, Onebrief is valued at $2.15B. With this continued growth, Onebrief is able to make an impact where it matters most.
Security Clearance, Location, and Onsite Notice:
This role requires regularly working on-site at customer locations in Tacoma, WA (Joint Base Lewis-McChord).
If you are not currently within commuting distance, you must be willing to relocate (note that Onebrief will provide relocation assistance).
Active Secret Clearance required.
About The Role
We are hiring a Site Reliability Engineer to join our Infrastructure & Security team. You'll work closely with fellow SREs, security, and customer success.
You will be the first line of support for our mission critical deployments, and responsible for ensuring best-in-class service quality and issue resolution. You will work in both on-premise DoD environments and AWS cloud environments. Your lessons from the field will shape how our team works, from policy to implementation.
In addition to working at the customer site, you will contribute directly to solutions that increase stability, performance, and security of our deployments, and improve the overall experience of deploying and managing Onebrief on premise.
About You
You care deeply about reliability and treat it as a core feature of any application or platform, with a bias toward “reliability over novelty.” You think about infrastructure and operability as products to be automated, well-documented, and continuously improved, and you aim to leave systems easier to operate than you found them.
You are equally comfortable leading a post-incident review, or diving into a kubectl shell to triage a complex production issue. You don't just fix problems; you translate constraints and failure modes into clear, automated guardrails and scalable, resilient architecture. For you, robust monitoring, actionable alerting, and insightful runbooks are core parts of the engineering process, not afterthoughts.
You mentor others, fostering a culture of blameless postmortems and proactive reliability. You collaborate naturally with application and platform teams, helping them move quickly but safely by building the tools, processes, and observability that make "fast recovery" a reality.
What You'll Do
You'll own the reliability, scalability, and security of the production application and/or platform. You will do this by:
Implementing a World-Class Observability Platform: Design, implement, and manage our monitoring, logging, and alerting stack (e.g., Prometheus, Loki, Alloy, and Grafana). You won't just track metrics; you'll create the actionable insights and automated alerting that allow teams to identify and resolve issues before they impact users.
Defining and Upholding Reliability: Define, measure, and own alerting that feeds into our Service Level Indicators (SLIs) and Service Level Objectives (SLOs), increasing trust internally and externally. You will be the organization's expert on what it means for our systems to be reliable and how to measure it.
Leading Incident Response: Act as the incident responder, and potentially incident commander, during critical incidents, and lead blameless post-mortems / After Action Reviews (AARs) that identify true root causes and drive automated, long-term solutions to prevent recurrence.
Automating for Scale and Security: Partner with platform engineers to design, build, and manage secure, resilient Kubernetes clusters and cloud/on-prem environments using Infrastructure-as-Code (Terraform, Ansible). You will embed security and compliance controls (RMF, STIGs) directly into this automation.
Eliminating Toil and Scaling the Team: Proactively identify and eliminate operational toil by building automation. You will partner with other teams to share best practices for air-gapped environments and support their readiness for production.
What We Look For
An active Secret clearance
5+ years in Platform, DevOps, or Site Reliability Engineering with an infrastructure and operations focus.
Proven partner to DevOps/Platform and application teams; collaborates well across functions and shares context openly.
A deep understanding of incident response processes, with experience conducting thorough root cause analyses and driving continuous improvement.
Technical expertise
Infrastructure as Code: Terraform (or CloudFormation), Ansible.
Containers and orchestration: Kubernetes design, deployment, and operations.
CI/CD: experience building and maintaining pipelines (GitLab CI/CD, Jenkins, GitHub Actions).
Scripting: proficiency with at least one of Python, Go, or Bash.
Cloud: Familiarity with AWS or AWS GovCloud.
Observability: Grafana stack, ELK stack, or Datadog.
Networking fundamentals: core protocols and secure configurations.
Bonus points (nice to have)
Experience in DoD environments and compliance frameworks (RMF, STIGs, ICD 503).
GitOps practices and toolchains.
Security‑minded design for sensitive environments.
Experience designing and implementing meaningful SLIs/SLOs (including error budgets) for complex, distributed systems.
Familiarity with on‑prem virtualization (VMware, Proxmox, Nutanix, Hyper-V, etc.).
Service mesh exposure (Istio, Linkerd).
Relevant certifications (e.g., AWS DevOps Engineer, CKA/CKAD).
Active Security+ or another DoD 8570.01-approved security credential, or the ability to obtain the valid credentials within 3 months of employment.
Notice to Third Party Recruitment Agencies
Please note that Onebrief does not accept unsolicited resumes from recruiters or employment agencies. In the absence of an executed Recruitment Services Agreement, there will be no obligation to any referral compensation or recruiter fee. In the event a recruiter or agency submits a resume or candidate without an agreement Onebrief explicitly reserves the right to pursue and hire those candidate(s) without any financial obligation to the recruiter or agency. Any unsolicited resumes, including those submitted to hiring managers, shall be deemed the property of Onebrief.
$121k-166k yearly est. Auto-Apply 34d ago
Systems Safety Engineer
Overland AI
Reliability engineer job in Seattle, WA
Founded in 2022 and headquartered in Seattle, Washington, Overland AI is transforming land operations for modern defense. The company leverages over a decade of advanced research in robotics and machine learning, as well as a field-test forward ethos, to deliver combined capabilities for unit commanders. Our OverDrive autonomy stack enables ground vehicles to navigate and operate off-road in any terrain without GPS or direct operator control. Our intuitive OverWatch C2 interface provides commanders with precise coordination capabilities essential for mission success.
Overland AI has secured funding from prominent defense tech investors including 8VC and Point72, and built trusted partnerships with DARPA, the U.S. Army, Marine Corps, and Special Operations Command. Backed by eight-figure contracts across the Department of Defense, we are strengthening national security by iterating closely with end users engaged in tactical operations.
Role Summary
Overland AI is hiring Systems Safety Engineers to lead system safety engineering across our autonomous vehicles and programs. These roles sit within the Systems Engineering, Integration, and Test (SEIT) organization and partner closely with systems engineers, designers, and test teams to ensure safety is fully integrated into requirements, architecture, design, integration, and verification activities.
As a Systems Safety Engineer, you will develop and execute system safety programs across the full lifecycle - from concept and requirements definition through architecture, integration, testing, and field operations. You will produce and maintain standard safety artifacts aligned with DoD and industry expectations, including MIL-STD-882E-based analyses, UL 4600, ATEC safety releases, and hazard tracking, while working cross-functionally with hardware, software, test, and operations teams.
Safety Engineers collaborate closely with those building the system while maintaining independence in safety assessment. You will objectively evaluate hazards, risk controls, verification evidence, and operational suitability to support rigorous, defensible, and auditable safety determinations. This role emphasizes collaboration and alignment: safety analyses are developed alongside evolving system designs and remain tightly connected to system requirements, baselines, test plans, and operational concepts.
As Overland AI scales and pursues multiple safety-certified deployments in parallel, these roles are critical to sustaining disciplined engineering execution and credible safety approvals.
Key Responsibilities
System Safety Engineering
Develop and maintain system safety analyses in accordance with MIL-STD-882E and applicable customer or regulatory guidance.
Perform PHAs, FHAs, SHAs, SSHAs, and O&SHAs as appropriate to system maturity, incorporating operational concepts, mission scenarios, and anticipated use cases to assess safety impacts and residual risk.
Apply standard system safety analysis techniques (e.g., hazard analysis, FMEA/FMECA, fault tree analysis, functional hazard assessment) as appropriate to system scope and maturity.
Define safety requirements and constraints and ensure they are correctly flowed into system and subsystem requirements.
Review system architectures, autonomy behaviors, software-controlled functions, integration approaches, and operational workflows to identify hazards and inform safety requirements and constraints.
Facilitate cross-functional safety discussions with system and technical leads to surface hazards, align on safety intent, and ensure safety considerations are consistently integrated into system decisions.
Provide independent technical assessment of system designs and proposed mitigations, offering objective safety insights that inform system requirements, architectural decisions, and operational procedures.
Analyze software-controlled and autonomy-driven behaviors as part of system hazard identification, ensuring software-based safety controls and constraints are captured, traced, and supported by verification evidence.
Hazard Tracking & Safety Case Development
Maintain hazard tracking with clear traceability from hazards through mitigations, safety requirements, and verification activities.
Develop and maintain the system safety case, integrating hazard analyses, design controls, operational assumptions, and verification evidence into a coherent, defensible argument for safe operation.
Ensure software-based safety mitigations (e.g., autonomy behaviors, fault detection, failsafe logic) are explicitly captured within the safety case and supported by appropriate verification evidence.
Ensure the safety case evolves alongside system design, configuration changes, and operational refinement and is consistently integrated into system decisions.
Test & Verification Integration
Define safety-focused test objectives, acceptance criteria, and verification strategies aligned with system requirements and operational use cases.
Collaborate with Test & Evaluation teams to ensure safety-driven objectives are represented in test plans, procedures, and acceptance criteria.
Participate in lab, vehicle, and field test events to independently assess safety performance and mitigation effectiveness.
Review and evaluate verification evidence to confirm safety controls are effective, sufficient, and correctly implemented, and to support readiness and risk acceptance decisions.
Capture and organize verification evidence that supports safety assessments, readiness reviews, and release decisions.
Support readiness reviews, safety releases, and risk acceptance processes.
Customer & Certification Support
Prepare safety documentation and technical inputs required for ATEC-led evaluations, customer reviews, and non-DoD certification or approval efforts.
Support technical reviews and safety discussions with government and commercial stakeholders.
Clearly communicate safety rationale, risk posture, and supporting evidence to internal leadership and external reviewers.
Help translate technical safety work into clear, defensible narratives for external stakeholders.
Process, Standards, & Scaling
Contribute to company safety processes, templates, and standards to support multiple concurrent programs.
Help establish consistent safety practices across prototypes, fielded systems, and future production platforms.
Mentor engineers and help raise overall system safety maturity within the organization.
What You'll Need to Succeed
Bachelor's degree in Engineering or related technical discipline.
Experience performing system safety engineering for complex systems (autonomy, robotics, vehicles, aerospace, defense, or similar).
Knowledge of existing standards and regulations relevant in the automotive industry, especially ISO 26262, ISO 21448, UL 4600, MIL-STD-882E, and JSSSEH.
Experience developing and maintaining safety artifacts suitable for external review, especially a Safety Assessment Report (SAR).
Ability to operate independently and exercise sound engineering judgment in risk assessment.
Comfort working in hands-on, fast-moving environments with real hardware and field testing.
What Will Set You Apart
Direct experience supporting ATEC, DoD test agencies, or similar certification authorities.
Direct experience in the design and development of autonomous vehicles or advanced ADAS applications.
Experience with autonomous or semi-autonomous systems.
Experience supporting both developmental testing and operational/test readiness activities.
Experience with SysML modeling and Model-Based Systems Engineering (MBSE) tools (Cameo, Goal Structuring Notation).
Experience with managing requirements (Jama, DOORS) and issue tracking (Jira).
Location
The preferred location for this position is onsite in Seattle, WA.
Compensation
Annual Base Pay: $170,000 - $225,000 USD
Benefits
Equity compensation
Best-in-class healthcare, dental, and vision plans
Unlimited PTO
401(k) with company match
Parental leave
$67k-104k yearly est. Auto-Apply 36d ago
Senior Site Reliability Engineer
Astreya 4.3
Reliability engineer job in Seattle, WA
Salary Range
$98,040.00 - $154,800.00 USD (Salary)
Please note that the salary information provided herein is base pay only (gross); it does not include other forms of compensation which may or may not apply to this specific position, namely, performance-based bonuses, benefits-related payments, or other general incentives - none of which are guaranteed, may be subject to specific eligibility requirements, and are wholly within the discretion of Astreya to remit.
Further, the salary information noted above is a range that consists of a minimum and maximum rate of pay for this specific position. Where an applicant or employee is placed on this range will depend and be contingent on objective, documented work-related considerations like education, experience, certifications, licenses, preferred qualifications, among other factors.
Astreya offers comprehensive benefits to all Regular, Full-Time Employees, including:
Medical provided through Cigna (PPO, HSA, EPO options) / Medical provided through Kaiser (HMO option only) for California employees only
Dental provided through Cigna (DPPO & DHMO options)
Nationwide Vision provided through VSP
Flexible Spending Account for Health & Dependent Care
Pre-Tax Account for Commuter Benefit/Parking & Transit (location-specific)
Continuing Education and Professional Development via various integrated platforms, e.g. Udemy and Coursera
Corporate Wellness Program
Employee Assistance Program
Wellness Days
401k Plan
Basic Life, Accidental Life, Supplemental Life Insurance
Short Term & Long Term Disability
Critical Illness, Critical Hospital, and Voluntary Accident Insurance
Tuition Reimbursement (available 6 months after start date, capped)
Paid Time Off (accrued and prorated, maximum of 120 hours annually)
Paid Holidays
Any other statutory leaves, paid time, or other fringe benefits required under state and federal law
$98k-154.8k yearly Auto-Apply 60d+ ago
Senior Systems Safety Engineer
Powerlight Technologies
Reliability engineer job in Kent, WA
Who We Are:
PowerLight Technologies is the leading developer of safe, long-distance laser power beaming solutions to transmit kilowatt-level power remotely, making energy accessible to new and increasingly mobile distributed digital assets on and off planet Earth. Laser power beaming delivers energy to digital infrastructure, enabling power to be delivered when and where it is needed, including in hard-to-reach places or at high altitudes, with or without human access, and without the tether to today's battery- and power-cord power distribution environment. At PowerLight Technologies we seek to hire contributors who exhibit three special traits: the hands-on skill to create hardware and software; a commitment to teamwork; and a track record of high achievement, ideally with startup experience growing into a larger organization. The most successful members of our team exhibit all of these qualities, wrapped with genuine niceness and respect for others.
Successful team members are undeterred by extremely challenging, first-of-their-kind problems to solve. They experiment, fail-and-learn quickly, and lock in the mastery of their explorations alongside like-minded innovators.
We hire professionals who know their craft; we look for those who can design boards, assemblies, optical chains, code, and processes with an eye toward successful, field-ready, secure, nicely documented products and systems. We also value academic chops: people who embrace physics, analysis, and technical progress.
Our culture is one of urgency, accountability, teamwork, and a healthy sense of fun, along with a passion for science. We show up prepared, engage in healthy debate, embrace deadlines, and work hard to maintain a workplace environment that is energizing.
A career here means you will interact with the world's leading scientists, engineers, and builders. Join our team and bring your skills, experiences, and determination to make a difference. This is a unique opportunity for a skilled professional and thought leader to work in the emerging field of long-distance power beaming.
Job Description:
As a Senior Systems Safety Engineer, you will play a pivotal role in ensuring the safety of our next-generation complex multidisciplinary laser power beaming systems. Your responsibilities will include the modeling, analysis, optimization, and design of safety features through a novel safety-based approach for both free-space power beaming systems and power-over-fiber systems. This role requires a deep understanding of failure characteristics of system components, as well as expertise in formal system safety design and assessment methodologies.
Key Responsibilities:
Lead ongoing safety Concept of Operations and requirements development in a model-based environment for advanced high-energy laser power beaming systems.
Multidisciplinary Design Analysis:
Collaborate and contribute across teams to analyze the design of complex systems spanning multiple disciplines, including mechanical and electronic hardware, optics, and software.
Assess potential safety risks associated with various design elements and propose mitigating solutions.
Perform safety analyses based on knowledge of standards such as ISO 26262 and MIL-STD-882E.
Optimization of Safety Features:
Develop and implement strategies to optimize safety features within the system architecture, including algorithms, sensing layers, and components.
Collaborate with design and engineering teams to integrate safety considerations seamlessly into designs.
Novel Safety-Based Approach:
Innovate and propose novel approaches to enhance the safety of systems, incorporating the latest industry best practices and emerging technologies.
Develop and propose hazard mitigation techniques.
Stay abreast of advancements in safety engineering and integrate relevant methodologies into the design process. Influence evolving standards for laser power beaming safety. Participate in standards organizations and industry alliances.
Hazard and Failure Analysis:
Conduct in-depth analysis of failure characteristics of system components to identify potential failure modes and their impact on overall system safety.
Collaborate with reliability engineers to develop strategies for minimizing the likelihood of failures.
Ability to develop and perform HAZOP, FMECA, FTA and other related methods.
Safety Assessment:
Lead and conduct safety assessments and audits to evaluate performance of systems.
Generate safety reports and documentation, providing clear and comprehensive insights into potential risks and recommended mitigations.
Collaboration and Communication:
Work closely with colleagues on cross-functional teams, including engineering and quality assurance, to ensure a cohesive and integrated approach to system safety. Facilitate discussions and progress toward addressing safety issues. Manage functional safety programs and activities.
Prepare detailed documentation, including safety analysis reports, feature tracking, hazard identification, and more.
Communicate effectively with stakeholders, presenting safety analyses and recommendations in a clear and understandable manner.
Qualifications:
Bachelor's or higher degree in Engineering, physics, or a related field. Master's or PhD preferred.
Proven experience in systems safety engineering, with a focus on multidisciplinary design analysis, optimization, and modeling.
Strong analytical and problem-solving skills.
Ability to develop and implement a System Safety Program Plan.
Familiarity with IEC 61508, ISO 26262, SAE ARP4761, MIL-STD-882E, and other safety-related standards, and ability to evaluate designed systems in accordance with industry standards and regulations related to system safety, laser safety, and range safety.
Excellent communication and collaboration skills.
Preferred Skills:
Experience in safety-critical industries such as aerospace, automotive, or medical devices.
Knowledge of software tools used in safety analysis and modeling.
This role requires a combination of technical expertise, creativity, and effective communication to drive the development of reliable and safe systems. The Senior Systems Safety Engineer will be at the forefront of ensuring that safety considerations continue to be integral to our design and development processes.
Physical Requirements:
Prolonged periods sitting or standing at a desk and working on a computer, as well as setting up and adjusting laser power beaming equipment for testing, both indoors and outdoors. Must be able to occasionally lift up to 20 pounds (e.g. computer, printer paper). Repeating motions that may include the wrists, hands and/or fingers using a computer, keyboard, mouse, writing.
Benefits:
401(k) and HSA options
Comprehensive medical, dental & vision benefits
Additional voluntary benefits offered
Flexible paid time off program and paid holidays
TeleDoc and Employee assistance programs
In-office gym and locker rooms, plus Green River Trail access right outside the office doors
Snacks/beverages
Community giving programs and events
Contact us at ************************** or on our website: ***************************
PowerLight Technologies is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.
How much does a reliability engineer earn in Olympia, WA?
The average reliability engineer in Olympia, WA earns between $91,000 and $175,000 annually. This compares to the national average reliability engineer range of $76,000 to $144,000.
Average reliability engineer salary in Olympia, WA
$126,000
What are the biggest employers of Reliability Engineers in Olympia, WA?
The biggest employers of Reliability Engineers in Olympia, WA are: