Reliability Engineer jobs at KIK Custom Products Inc. - 260 jobs

  • Remote JavaScript Engineer for AI Training & Code Quality

    Labelbox 4.3 company rating

    San Francisco, CA jobs

    A leading AI solutions firm is looking for a JavaScript Developer to work remotely on AI-related projects. The successful candidate will review AI-generated JavaScript code, develop high-quality solutions, and create explanations for code logic. A Bachelor's degree in Computer Science and 3-5 years of experience with JavaScript frameworks like React or Node.js are required. This hourly contract offers flexible working hours ranging from 10 to 40 hours per week.
    $85k-115k yearly est. 2d ago
  • Lead Site Reliability Engineer

    One Dynamic 3.7 company rating

    Remote

    Quick Details: Fully Remote (US); 8+ Years; Rate: $70-75/hour; Duration: 6 months+. About One Dynamic One Dynamic is a Service-Disabled Veteran-Owned Small Business (SDVOSB) headquartered in Fairfax, VA. We specialize in digital transformation, cloud infrastructure, quality assurance, and enterprise architecture for federal and healthcare organizations. We are currently seeking a Lead Site Reliability Engineer to support our client ARC, a rapidly growing device management company revolutionizing how frontline workers interact with enterprise mobile devices. About the Role The Lead Site Reliability Engineer is a senior technical leadership role responsible for the reliability, availability, and operational excellence of the cloud infrastructure and kiosks platform. This role owns uptime, SLAs, and incident response while driving long-term improvements to system resilience, observability, and operational maturity. The Lead SRE serves as both a hands-on technical leader and a force multiplier across platform, QA, and development teams. This role is well-suited for an experienced engineer who thrives in high-ownership environments and can balance real-time operational demands with strategic reliability initiatives. Strong communication, sound technical judgment, and a bias toward preventative engineering are critical to success. Key Responsibilities Own uptime, SLAs, and overall reliability of the cloud infrastructure and kiosks platform Lead incident response, root-cause analysis, and drive actionable postmortems Automate infrastructure, deployments, and operational tasks using modern IaC and scripting in collaboration with the Platform Engineering team Maintain and improve monitoring, alerting, and observability (e.g., Grafana, Prometheus, New Relic). 
Execute and continuously improve disaster recovery and business continuity plans Partner with platform engineering, QA, and development teams to ensure operational readiness Establish and maintain runbooks, operational standards, and reliability best practices Provide leadership, mentorship, and clear communication during both normal operations and incidents Optimize cloud and Kubernetes environments for reliability, performance, and scalability Required Qualifications 8+ years in SRE, DevOps, or Platform Engineering roles; 2+ years in a senior or lead capacity Strong experience supporting production environments with strict SLAs and high uptime requirements Deep knowledge of Kubernetes, containers, and cloud-native infrastructure Proficiency in automation and scripting using Bash, Python, or Go Hands-on experience with CI/CD pipelines and release engineering in modern environments Expert-level familiarity with IaC tools (Terraform preferred) Strong understanding of monitoring, alerting, logging, and observability tooling Experience implementing and managing GitOps workflows (ArgoCD or similar) Demonstrated ability to lead incidents and communicate effectively with technical and non-technical stakeholders Solid understanding of disaster recovery planning, resilience practices, and system hardening Must be authorized to work in the United States (US-based candidates only) The Ideal Candidate You think several steps ahead. You are relentless, strategic, and a long-term thinker. You believe the details are essential, and so you get them right. You are a fast learner. You take feedback well and implement it. You care about achieving the best outcome and do not focus on being right or wrong. About the Client ARC is a device management solution integrated with smart lockers, designed to store, secure, and charge company-owned handheld devices (E.g., Zebra, Honeywell) used by frontline workers to perform core job functions. 
Launched in late 2021, ARC was spun off from ChargeItSpot, a consumer-facing phone-charging technology company established in 2012. ARC's Mission: Minimize Device Waste. Maximize Worker Productivity. Make Life Easier. How to Apply If you have the unique combination of skills and qualities we are seeking, please submit your resume via One Dynamic's careers portal. We look forward to hearing from you! One Dynamic is an Equal Opportunity employer. Personnel are chosen based on ability without regard to race, color, religion, sex, national origin, disability, marital status, or sexual orientation, in accordance with federal and state law.
    $70-75 hourly Auto-Apply 13d ago
  • Site Reliability Engineer 2

    Drivewealth 4.0 company rating

    Remote

    DriveWealth is a global B2B financial technology organization dedicated to democratizing access to financial independence around the world. Our mission is realized through an API-based platform, empowering our partners to offer seamless investing and trading experiences to clients worldwide, all from their mobile devices. Our technology provides partners with a modern, extensible toolkit, enabling traditional investment workflows and innovative techniques like fractional share ownership. DriveWealth has evolved into a global platform offering trading of US equities, mutual funds, ETFs, fixed income, and options. We seek enthusiastic professionals to contribute diverse perspectives and experiences to our Brokerage-as-a-Service platform. Our culture blends the pace and opportunity of a tech start-up with the impact, stability, and significance of Wall Street. We encourage creativity and experimentation while ensuring institutional-grade execution and regulatory compliance in everything we do. We value diversity and inclusion, celebrating the unique differences of our employees as we scale and grow together. We're guided by operating principles grounded in accountability, teamwork, integrity, and solutions built to scale. Join us! About The Role As a Site Reliability Engineer 2, you will enhance the reliability and performance of our Brokerage-as-a-Service platform during critical 24/7 operations. This role demands a proactive approach to managing technical challenges and system optimizations that align with our global operational strategies. What You'll Do Support the SRE team in developing and implementing enhancements to support workflows, focusing on automation and efficiency improvements. Handle technical escalations, troubleshoot complex issues, and actively participate in on-call rotations to ensure rapid response and resolution during non-traditional hours. Adhere to and administer incident and change management policies. 
Coordinate incident resolution efforts and implement change management protocols to maintain and enhance system reliability, especially during critical system operations at night. Work closely with the New York office to ensure smooth operation and alignment of SRE practices across time zones. What You'll Need 3+ years in a SRE role or a similar position, demonstrating deep knowledge and expertise in site reliability engineering and operations. Working knowledge in REST APIs and understanding of API integration. Python proficiency in scripting for automation and system management, with a track record of developing and implementing automation solutions. SQL and Database expertise in transactional databases, including querying and troubleshooting. Analytical and troubleshooting skills with a demonstrated ability to perform troubleshooting and root cause analysis of technical issues. Availability for flexible work hours and willingness to cover US markets trading sessions, including L2 on-call coverage. Knowledge of Change Management Process and Risk Management. 
Nice to Have, But Not Required Experience in the brokerage or financial industry Proficient with cloud services, particularly AWS, and knowledgeable about cloud architecture best practices, including IAM, EC2, S3, and DynamoDB Experience maintaining and supporting containerized systems, with familiarity in orchestration tools Knowledge of Infrastructure as Code (IaC) practices and tools such as Terraform or CloudFormation Ability to manage and troubleshoot job scheduling tools like Rundeck or Apache Airflow Advanced skills in managing containerized environments using Kubernetes and OpenShift Practical experience with Confluent Cloud for event streaming architectures Experience with Java applications and a basic understanding of using the browser developer console for front-end debugging Additional Notes: This role is critical for our continuous operations and requires a commitment to nighttime hours, aligning with the global nature of our financial services. Candidates must be prepared for intense collaboration periods and proactive communication across global teams. Applicants must be authorized to work for any employer in the U.S. DriveWealth is unable to sponsor or take over sponsorship of an employment Visa at this time. Compensation Compensation package offerings are based on candidate experience and technical qualifications, as it relates to the role. These are identified and determined throughout your interviewing experience. Please note: at this time, we are not able to hire in all states. 
Remote (Most US States) Pay Range: $130,000-$150,000 USD Benefits Competitive medical, dental, and vision insurance options Mental health resources Generous paid time off with observed holidays (varies per country) Paid parental leave for biological and adoptive parents Up to $2,500 or local equivalent each year to invest in continued education and personal development Up to $900 each year or local equivalent for fitness and wellness reimbursement Company-provided phone (varies by country) For HQ in-office employees, a daily lunch stipend, unlimited snacks, and engaging office space in the Financial District Pre-tax commuter benefits (US only) Employer 401K match (US only) Benefit offerings vary based on country and are subject to change. Equal Employment Opportunity To build technology and products that are used and loved by people and solve real-world problems, we need to build a team with many different perspectives and experiences. We are an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. We encourage candidates from all backgrounds to apply. Applicants in need of special assistance or accommodation during the interview process or in accessing our website may contact us at **************************. Agency Disclaimer DriveWealth does not accept agency resumes. Please do not forward resumes to our jobs alias, employees, or any other organization location. DriveWealth is not responsible for any fees related to unsolicited resumes.
    $130k-150k yearly Auto-Apply 22d ago
  • Site Reliability Engineer

    Minio 4.1 company rating

    Remote

    MinIO is the industry leader in high-performance object storage and the company behind the world's fastest, most widely deployed object store, powering production infrastructure for more than half of the Fortune 500, including 9 of the 10 largest global automakers and all 10 of the largest U.S. banks. Our enterprise offering, AIStor, is engineered to handle the scale, speed, and pressure of modern AI and analytics, from terabytes to exabytes, all in a single namespace. As a Site Reliability Engineer, you will work closely with customers as well as the engineering team on enhancing, optimizing, validating and automating our cloud-native storage platform. Your role will be a mix of DevOps and software engineering to ensure that MinIO delivers a very high-quality product with high performance, scalability, and durability to enable seamless data storage and retrieval for demanding customer workloads. This role requires deep expertise in DevOps practices, SRE, systems programming, distributed computing, and storage architectures. You will work closely with a world-class team of engineers to push the boundaries of object storage performance and reliability. What You Will Do: Enhance, optimize, validate and automate core MinIO software for performance, scalability, and security. Help build and deliver high-performance distributed storage solutions with a focus on cloud-native architectures. Validate the MinIO Software according to customer environment and requirements, ensuring no surprises are observed at customer deployments. Improve existing features, fix critical issues, and contribute to open-source repositories. Collaborate with other engineers to refine architecture, APIs, and integrations. Write efficient, well-documented, and maintainable code. Conduct performance benchmarking and debugging of complex storage environments. Work closely with customers to address issues and manage expectations. 
Your Skills and Experience: Bachelor's or Master's degree in Computer Science, Engineering, or a related field. 5+ years of professional experience in software engineering. Desire and ability to directly work with customers to solve their problems with product enhancement and automation. Experience in DevOps, GitOps, Automation and testing frameworks. Expertise in distributed systems, networking, or high-performance computing. Experience with cloud-native technologies (Kubernetes, containers, microservices). Strong proficiency in Go desired (or deep experience in C/C++/Rust with a willingness to learn Go). Deep understanding of storage systems, file systems, or databases. Strong problem-solving skills and experience debugging complex, large-scale applications. Contributions to open-source projects are a plus. Ability to work in a fast-moving, collaborative environment with a strong sense of ownership. Empathy towards the customer and ability to quickly dig in to resolve any customer issue. Passion for innovation and staying current with technology trends. Self-motivation and a commitment to continuous learning and adopting new tools and frameworks. A strong sense of ownership and accountability in delivering high-quality work while directly working with customers. A collaborative and team-oriented mindset, thriving in environments that value open communication and shared goals. Ability to collaborate effectively with cross-functional teams, contributing to a positive and productive work environment. Attention to detail and fine craftsmanship. What We Offer: Health Care Plan (Medical, Dental & Vision) 401K with 3% Contribution Pre-IPO Stock Options At least 12 Public Holidays Flexible Time Off Equal Opportunity Policy (EEO) MinIO is proud to be an equal opportunity workplace and an affirmative action employer. 
We review applications for employment without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, citizenship, age, veteran status, genetic information, physical or mental disability, medical condition, marital status, or any other basis prohibited by law.
    $94k-136k yearly est. Auto-Apply 17d ago
  • Site Reliability Engineer

    Podium Corporation 4.5 company rating

    Remote

    At Podium, our mission is to arm every local business with a complete platform and outcome-driven AI employees that convert leads into real, paying customers. Every day, millions of workers use our AI lead conversion and communication platform to help them get more leads and make more money. Our work and focus on helping local businesses thrive has been recognized across the industry, including Forbes' Next Billion Dollar Startups, Forbes' Cloud 100, the Inc. 5000, and Fast Company's World's Most Innovative Companies. At Podium, we believe in fostering a culture that thrives on hiring and developing exceptional talent. Our operating principles serve as a compass, guiding daily behavior and decision-making, and ensure we hire people who will thrive at Podium. If you resonate with our operating principles and are energized by our mission, Podium will be a great place for you! Site Reliability Engineer At Podium, our Site Reliability Engineers (SREs) operate at the intersection of software and systems engineering. The SRE team ensures our products are stable, scalable, sustainable, and seamless. We partner closely with product engineering teams to address their operational needs while continuously improving the reliability and performance of Podium's platform. We're looking for a SRE who can make an impact from day one! What You'll Do Work with technologies including Kubernetes, Helm, Docker, AWS, Terraform, Datadog, Honeycomb, Prometheus, Ansible, StrongDM, Python, Go, Ruby, GitLab/GitHub, and CI/CD pipelines. Collaborate across Podium's engineering community to identify areas for improvement, enhance reliability, and create a safer, more efficient system. Participate in an on-call rotation, triaging and resolving production and development issues. Partner with cross-functional teams to minimize downtime and ensure platform resilience. Mentor junior engineers, fostering growth and technical excellence. 
What You'll Bring Bachelor's degree in a technical field or equivalent experience. 4+ years experience supporting production systems in a software or systems engineering role. 3+ years deploying, operating, and debugging server software on Linux. Strong curiosity and a desire to learn continuously. Willingness to participate in on-call rotations. What We Hope You Have Experience with distributed systems and microservices. Knowledge of system design principles. Hands-on experience with cloud computing (AWS, GCP, or Azure). Familiarity with SOC2, HIPAA, PCI, or similar compliance frameworks. Experience building and maintaining CI/CD pipelines. Deep expertise in infrastructure engineering. BENEFITS Open and transparent culture Life insurance, long and short-term disability coverage Paid maternity and paternity leave Fertility Benefits Generous vacation time, plus three 4-day summer holiday weekends Excellent medical, dental, and vision benefits 401k Plan with company matching Bi-annual swag drops with cool Podium gear and apparel A stellar HQ (Utah) gym with local professional coaches and classes offered Onsite HQ (Utah) child care center, subsidized for employees Podium is an equal opportunity employer. Podium provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, gender, national origin, sexual orientation, gender identity or expression, age, disability, genetic information, marital status or veteran status.
    $96k-139k yearly est. Auto-Apply 28d ago
  • Staff Database Reliability Engineer

    Boulevard Ford 4.6 company rating

    Remote

    Who is Boulevard? Boulevard provides the first and only client experience platform for appointment-based, self-care businesses. We empower our customers to give their clients more of the magical moments that matter most. Before launching in 2016, our founders spent months interviewing salon managers and working behind front desks to understand their pain points so we could design a modern, user-friendly platform that meets the unique needs of their business. Our roots may be in hair salons, but we are built for the broader self-care industry, including many types of salons, spas, medspa, barbershops, and more. Our technology not only helps our customers survive but thrive. Take a look at how we (and YOU) can make that happen. We have an insatiable curiosity and embrace experimentation. We believe that simple solutions require the most sophistication, and we design each and every detail to maximize potential, power, and impact. Do our values match? Read through our story and what we value the most. Our team values and celebrates our diverse backgrounds. Being open about who we are and what we do allows us to do the best work of our lives. We believe in equal opportunity for all, and you should too. Come do the best work of your life at Boulevard. We're hiring a Staff Database Reliability Engineer to shape the foundation of Database Reliability Engineering at Boulevard. This role goes beyond optimizing queries or reviewing SQL PRs - you'll own database reliability at scale within our cloud-based infrastructure, influence site reliability practices, and drive our RDBMS scalability strategy. You'll help teams improve how they design, operate, and depend on databases through repeatable, reliable practices. Reporting to the Director of Cloud & Reliability, this is a hands-on technical leadership role focused on elevating reliability practices and building resilient database platforms. 
You'll define what “good” looks like, partner closely with engineering teams, and help Boulevard operate databases that scale with confidence. The Cloud & Reliability group operates on four foundational principles: Reliable Infrastructure - a foundation of stability and security Developer Productivity - empowering builders to do the right things Clear Ownership - accountability aligned with ownership; collaboration over silos Long-Term Focus - we engineer for tomorrow Key Projects & Initiatives Database Reliability & Fault Tolerance: Lead initiatives to make our database platforms more robust, fault-tolerant, and self-healing. Platform Performance Optimization: Drive continuous improvement in performance and cost efficiency, using observability data to identify and resolve bottlenecks with a focus on RDBMS infrastructure. Observability & Operational Insight: Enhance database observability across metrics, logging, and tracing to ensure deep visibility into production health and behavior. What You'll Do Here Develop a deep understanding of how Boulevard's systems behave, scale, interact, and fail, and use that insight to identify risks and improvement opportunities. Own and improve database reliability, performance, and scalability; participate in incident response and drive architectural improvements that reduce incident frequency and impact. Partner with engineering teams to design, build, and operate scalable, fault-tolerant, and secure distributed systems that support Boulevard's growth and customer trust. Build tools, automation, and frameworks that eliminate toil, reduce operational overhead, and establish best practices used across engineering teams. Elevate observability and operational excellence through actionable metrics, alerts, and dashboards that enable faster incident resolution and proactive reliability improvements. Mentor and influence engineers across the organization, helping foster a culture where reliability is a shared responsibility. 
What You'll Need to Thrive Deep Systems Expertise: 8-10+ years of experience in systems, infrastructure, or backend software engineering, with a strong focus on RDBMS and NoSQL systems. Cloud Database Experience: Production experience with managed cloud databases such as AWS Aurora/RDS (PostgreSQL), and deploying/managing infrastructure using infrastructure-as-code tools. Reliability Engineering Mindset: Proven experience delivering reliability outcomes using SLOs, SLIs, error budgets, and mature observability practices. Automation-First Philosophy: Strong background in automation, scripting, and infrastructure-as-code (e.g., Terraform, Python, Go, or similar). Incident Management Mastery: Experience diagnosing and mitigating production incidents in high-availability systems, with a focus on learning and continuous improvement. Collaboration & Influence: Excellent communication skills and the ability to influence without authority across engineering teams. Technical Leadership & Mentorship: Demonstrated ability to set technical standards, mentor engineers, and scale impact through others. Comfort with Ambiguity: Ability to navigate uncertainty, set direction, and iterate toward meaningful outcomes in a fast-moving environment. Bonus Experience with Elixir, Phoenix, Ruby, or Rails Hands-on experience identifying and improving database performance In addition to the wonderful people you'll get to work with and challenging projects that'll push you - Boulevard is here to make sure you're always at the top of your game emotionally, mentally, and physically. ✨ We've got you covered with a 401(k) match plus dental, medical, vision, and life insurance. 🏝 Take a break whenever you need with our flexible vacation day policy. 🖥 Fully remote so you can choose where you want to work. You'll receive a work from home stipend every month. 💚 Family planning resources and specialized support programs. 🔮 Equity: get ahead on the ground floor and grow with Boulevard. 
💅 Boulevard Bucks Learning and Development program allows employees to explore businesses in the market we serve. 📲 We recommend following our official LinkedIn page to stay up to date on all things Boulevard life! Boulevard Labs, Inc. is an Equal Opportunity Employer committed to hiring a diverse workforce and sustaining an inclusive culture. All employment decisions at Boulevard Labs, Inc. are based on business needs, job requirements, and individual qualifications, without regard to race, color, religion, marital status, age, national origin, ancestry, physical or mental disability, medical condition, pregnancy, gender, sexual orientation, gender identity or expression, veteran status, or any other status protected under federal, state, or local law.
    $94k-136k yearly est. Auto-Apply 3d ago
  • Site Reliability Engineer, Edge Services

    Bytedance 4.6 company rating

    Boston, MA jobs

    Our Content Distribution Networks (CDN) team operates on a hybrid platform that integrates both commercial CDN vendors and ByteDance's proprietary edge network. This platform encompasses a vast array of Points of Presence (POPs) around the world, hosting edge services like traffic acceleration, CDN caching, gaming, and more. We are looking for experienced reliability and performance engineers to ensure the stability and reliability of our edge services and products running on our hybrid CDN platform. This role will also involve optimizing performance and developing innovative solutions to meet the evolving business demands at the edge. Site Reliability Engineering (SRE) combines software and systems engineering to create and manage large-scale, highly distributed infrastructures. Our SREs are responsible for ensuring that these infrastructure services are reliable, fault-tolerant, scalable, and cost-effective. In this role, you will manage complex systems at scale, including hyperscale datacenter administration, public cloud management, global CDNs, and load balancers that process terabits of traffic per second. You will collaborate with diverse teams to translate business requirements into actionable items, driving improvements in system design and operational procedures. Responsibilities • Architect and implement solutions that enable both internal and external customers to harness the power of Bytedance's globally scaled content delivery network. • Build metrics, tools, automations, visualizations and monitors to facilitate the operation and optimization of the edge services. • Develop procedures and workflows that improve efficiency, foster trust, and ensure compliance in operational processes. • Run vulnerability and capacity assessment and develop disaster recovery strategies to ensure high availability of our global CDN services. • Work in a fast-paced environment. 
Participate in technical operations and rotations in response to performance and reliability issues. Minimum Qualifications • Bachelor's degree with 2+ years of experience in Computer Engineering, Computer Science, or related fields. • 2+ years working experience in the field of CDN performance engineering, solution architecting or site reliability engineering roles. • 2+ years experience in one or more programming languages such as Java, C++, Go, or scripting experience in Shell and Python. Preferred Qualifications • Self-driven and capable of coping with ambiguity and moving projects from concept to delivery. • Experience in operating in a multi-CDN environment. • Experience in networking technologies such as TCP/IP, BGP, DNS, etc. in a carrier-grade environment. Past experience with CDN technologies is a plus. • Strong analytical skills and the ability to solve real-world problems in a fast-moving environment. • Experience in designing, analyzing and building automation and tools for large-scale systems • Experience in developing and operating one or more of the following systems: OpenStack, Kubernetes, Nginx, ipvs, ELK stack, Hadoop, etc.
    $133k-192k yearly est. 8d ago
  • Lead Site Reliability Engineer - Federal Team

    Saviynt 4.4 company rating

    Atlanta, GA jobs

    Job Description: Saviynt is an identity authority platform built to power and protect the world at work. In a world of digital transformation, where organizations are faced with increasing cyber risk but cannot afford defensive measures to slow down progress, Saviynt's Enterprise Identity Cloud gives customers unparalleled visibility, control and intelligence to better defend against threats while empowering users with right-time, right-level access to the digital technologies and tools they need to do their best work. This opportunity is in the Saviynt Labs organization. We design, build and run the leading Enterprise Identity solutions. Our product teams innovate industry leading solutions. The engineering teams design, build and run SaaS software built on leading edge technologies. We focus on engineering excellence and we attract the best talent in our industry. Our cloud services are built on AWS, GCP and Azure with a global presence. Our customers love what we do and work with us to build the future customer experience at scale. WHAT YOU WILL BE DOING Perform customer deployments, migrations, and upgrades in the cloud environment. Installing and configuring Saviynt product(s) following installation procedure and organizational guidelines Troubleshooting and resolving incidents while collaborating with the development and IT teams to minimize downtime and maintain service quality Manage and maintain cloud infrastructure on platforms such as AWS, Azure, or Google Cloud. Monitor cloud resources to ensure availability and scalability. Automate any manual work being performed pre/during/post deployments. Troubleshoot cloud-related infrastructure incidents and issues. Develop and maintain CI/CD pipelines to ensure reliable and efficient software delivery. Monitor and troubleshoot issues within the CI/CD pipelines. Automate infrastructure setup and maintenance using Infrastructure as Code (IaC) tools. 
Collaborate with development, operations, and QA teams to improve deployment processes. Maintain compliance with security and quality standards throughout the CI/CD pipeline Creating and maintaining technical documents for cloud infrastructure and related processes. Design and implement novel solutions to automate cloud-environment provisioning. Developing automation solutions to streamline processes, such as creating scripts to run specific tasks on systems. Developing and implementing automation scripts to reduce repetitive tasks and eliminate human error. Configuring and deploying monitoring tools WHAT YOU BRING U.S. Citizenship: Applicants must be United States citizens. 8+ years of professional experience in observability, SRE, or cloud platform roles, with demonstrated success in leading strategic initiatives and cross-team collaborations. 4+ years of hands-on cloud experience (AWS, Azure), with deep understanding of cloud-native architectures and observability practices. Proven track record of designing and operating highly available and resilient systems in public cloud environments (especially AWS). 3+ years of experience in software development using Python, NodeJS, or Java, with strong focus on automation, CI/CD integration, and DevOps practices. Advanced expertise in container orchestration platforms (Kubernetes) and service mesh technologies. Hands-on experience implementing observability at scale using tools such as Prometheus, Grafana, OpenTelemetry, ELK/OpenSearch, Datadog, CloudWatch, or Azure Monitor. Demonstrated success in driving adoption of SLOs, SLIs, error budgets, and automated alerting frameworks across engineering teams. Strong experience with infrastructure as code (e.g., Terraform, Helm) and automated deployment pipelines. Proven leadership in setting engineering standards, mentoring team members, and driving initiatives that reduce MTTD/MTTR and improve operational excellence. 
Strong analytical skills, communication capabilities, and a strategic mindset to influence and guide technical direction across large-scale engineering teams. Meet US persons on US soil requirements. Undergo full background investigation/screening. Undergo IAL3 requirements (identity proofing to include I-9 document verification, biometric collection, and mailing address confirmation). We offer you a competitive total rewards package, learning, and tremendous opportunities to grow and advance in your career. At Saviynt, it is not typical for an individual to be hired at or near the top of the range for their role, and final compensation decisions are dependent on many factors including but not limited to location; skill sets; experience and training; licensure and certifications; and other relevant business and organizational needs. A reasonable estimate of the current range is $135,000-$180,000 annually. You may also be eligible to participate in a Saviynt discretionary bonus plan, subject to the rules governing the program, whereby an award, if any, depends on various factors, including, without limitation, individual and organizational performance. If required for this role, you will: - Complete security & privacy literacy and awareness training during onboarding and annually thereafter - Review (initially and annually thereafter), understand, and adhere to Information Security/Privacy Policies and Procedures such as (but not limited to): - Data Classification, Retention & Handling Policy - Incident Response Policy/Procedures - Business Continuity/Disaster Recovery Policy/Procedures - Mobile Device Policy - Account Management Policy - Access Control Policy - Personnel Security Policy - Privacy Policy. Saviynt is an amazing place to work. We are a high-growth, Platform as a Service company focused on Identity Authority to power and protect the world at work. 
You will experience tremendous growth and learning opportunities through challenging yet rewarding work that directly impacts our customers, all within a welcoming and positive work environment. If you're resilient and enjoy working in a dynamic environment, you belong with us! Saviynt is an equal opportunity employer, and we welcome everyone to our team. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status.
    $135k-180k yearly 25d ago
  • Lead Site Reliability Engineer - Federal Team

    Saviynt 4.4 company rating

    Atlanta, GA jobs

    Saviynt is an identity authority platform built to power and protect the world at work. In a world of digital transformation, where organizations are faced with increasing cyber risk but cannot afford defensive measures to slow down progress, Saviynt's Enterprise Identity Cloud gives customers unparalleled visibility, control and intelligence to better defend against threats while empowering users with right-time, right-level access to the digital technologies and tools they need to do their best work. This opportunity is in the Saviynt Labs organization. We design, build and run the leading Enterprise Identity solutions. Our product teams innovate industry leading solutions. The engineering teams design, build and run SaaS software built on leading edge technologies. We focus on engineering excellence and we attract the best talent in our industry. Our cloud services are built on AWS, GCP and Azure with a global presence. Our customers love what we do and work with us to build the future customer experience at scale. WHAT YOU WILL BE DOING Perform customer deployments, migrations, and upgrades in the cloud environment. Install and configure Saviynt product(s) following installation procedures and organizational guidelines. Troubleshoot and resolve incidents while collaborating with the development and IT teams to minimize downtime and maintain service quality. Manage and maintain cloud infrastructure on platforms such as AWS, Azure, or Google Cloud. Monitor cloud resources to ensure availability and scalability. Automate any manual work performed before, during, or after deployments. Troubleshoot cloud-related infrastructure incidents and issues. Develop and maintain CI/CD pipelines to ensure reliable and efficient software delivery. Monitor and troubleshoot issues within the CI/CD pipelines. Automate infrastructure setup and maintenance using Infrastructure as Code (IaC) tools. 
Collaborate with development, operations, and QA teams to improve deployment processes. Maintain compliance with security and quality standards throughout the CI/CD pipeline. Create and maintain technical documentation for cloud infrastructure and related processes. Design and implement novel solutions to automate cloud-environment provisioning. Develop automation solutions to streamline processes, such as scripts that run specific tasks on systems. Develop and implement automation scripts to reduce repetitive tasks and eliminate human error. Configure and deploy monitoring tools. WHAT YOU BRING U.S. Citizenship: Applicants must be United States citizens. 8+ years of professional experience in observability, SRE, or cloud platform roles, with demonstrated success in leading strategic initiatives and cross-team collaborations. 4+ years of hands-on cloud experience (AWS, Azure), with deep understanding of cloud-native architectures and observability practices. Proven track record of designing and operating highly available and resilient systems in public cloud environments (especially AWS). 3+ years of experience in software development using Python, NodeJS, or Java, with strong focus on automation, CI/CD integration, and DevOps practices. Advanced expertise in container orchestration platforms (Kubernetes) and service mesh technologies. Hands-on experience implementing observability at scale using tools such as Prometheus, Grafana, OpenTelemetry, ELK/OpenSearch, Datadog, CloudWatch, or Azure Monitor. Demonstrated success in driving adoption of SLOs, SLIs, error budgets, and automated alerting frameworks across engineering teams. Strong experience with infrastructure as code (e.g., Terraform, Helm) and automated deployment pipelines. Proven leadership in setting engineering standards, mentoring team members, and driving initiatives that reduce MTTD/MTTR and improve operational excellence. 
Strong analytical skills, communication capabilities, and a strategic mindset to influence and guide technical direction across large-scale engineering teams. Meet US persons on US soil requirements. Undergo full background investigation/screening. Undergo IAL3 requirements (identity proofing to include I-9 document verification, biometric collection, and mailing address confirmation). If required for this role, you will: - Complete security & privacy literacy and awareness training during onboarding and annually thereafter - Review (initially and annually thereafter), understand, and adhere to Information Security/Privacy Policies and Procedures such as (but not limited to): - Data Classification, Retention & Handling Policy - Incident Response Policy/Procedures - Business Continuity/Disaster Recovery Policy/Procedures - Mobile Device Policy - Account Management Policy - Access Control Policy - Personnel Security Policy - Privacy Policy. Saviynt is an amazing place to work. We are a high-growth, Platform as a Service company focused on Identity Authority to power and protect the world at work. You will experience tremendous growth and learning opportunities through challenging yet rewarding work that directly impacts our customers, all within a welcoming and positive work environment. If you're resilient and enjoy working in a dynamic environment, you belong with us! Saviynt is an equal opportunity employer, and we welcome everyone to our team. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status.
    $82k-115k yearly est. Auto-Apply 60d+ ago
  • Principal Site Reliability Engineer - Federal Team

    Saviynt 4.4 company rating

    Atlanta, GA jobs

    Saviynt's AI-powered identity platform manages and governs human and non-human access to all of an organization's applications, data, and business processes. Customers trust Saviynt to safeguard their digital assets, drive operational efficiency, and reduce compliance costs. Built for the AI age, Saviynt is today helping organizations safely accelerate their deployment and usage of AI. Saviynt is recognized as the leader in identity security, with solutions that protect and empower the world's leading brands, Fortune 500 companies and government institutions. For more information, please visit **************** This opportunity is in the Saviynt Labs organization. We design, build and run the leading Enterprise Identity solutions. Our product teams innovate industry leading solutions. The engineering teams design, build and run SaaS software built on leading edge technologies. We focus on engineering excellence and we attract the best talent in our industry. Our cloud services are built on AWS, GCP and Azure with a global presence. Our customers love what we do and work with us to build the future customer experience at scale. WHAT YOU WILL BE DOING Implement monitoring and alerting systems to guarantee high availability and performance, with a dedicated focus on SLA and availability metrics. Collaborate with engineering and operations teams to identify critical components and systems requiring enhanced availability measures. Design and implement strategies, tooling, and processes to enhance system uptime and reliability. Continuously evaluate and recommend improvements to platform infrastructure and processes, enhancing efficiency and reliability. Align the platform with customer needs and business goals by working closely with cross-functional teams. Run the production environment by monitoring availability and taking a holistic view of system health. Build software and systems to monitor platform infrastructure and applications. 
Monitor and improve reliability, quality, and time-to-market of our suite of software solutions. Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement. Provide primary operational support and engineering for multiple large-scale distributed software applications. Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding. WHAT YOU BRING U.S. Citizenship: Applicants must be United States citizens. Master's Degree in an Engineering discipline, a bachelor's degree and 7+ years of professional software engineering experience, or equivalent experience. 10+ years of professional experience in monitoring and alerting roles on major cloud platforms (AWS, Azure), preferably including project leadership roles. 4+ years of experience in cloud development (AWS, Azure) and observability. Experience with building and operating highly resilient platforms in AWS cloud environments. 3+ years of experience in software development with Python, NodeJS, or Java with a focus on SDLC and automation. Hands-on experience with container orchestration, preferably with Kubernetes. Hands-on experience with building observability, monitoring and alerting on large scale distributed systems. Logging and monitoring tools experience (preference for Prometheus, Grafana, Datadog, AWS CloudWatch; related: Azure Monitor, Log Analytics). Proven experience in implementing advanced observability practices and techniques at scale. Hands-on experience with one or more observability tools (preference for Prometheus, Grafana, ELK/OpenSearch, OpenTelemetry, Datadog, etc.). 
If required for this role, you will: - Complete security & privacy literacy and awareness training during onboarding and annually thereafter - Review (initially and annually thereafter), understand, and adhere to Information Security/Privacy Policies and Procedures such as (but not limited to): > Data Classification, Retention & Handling Policy > Incident Response Policy/Procedures > Business Continuity/Disaster Recovery Policy/Procedures > Mobile Device Policy > Account Management Policy > Access Control Policy > Personnel Security Policy > Privacy Policy. Saviynt is an amazing place to work. We are a high-growth, Platform as a Service company focused on Identity Authority to power and protect the world at work. You will experience tremendous growth and learning opportunities through challenging yet rewarding work which directly impacts our customers, all within a welcoming and positive work environment. If you're resilient and enjoy working in a dynamic environment, you belong with us! Saviynt is an equal opportunity employer and we welcome everyone to our team. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status.
    $82k-115k yearly est. Auto-Apply 60d+ ago
  • Principal Site Reliability Engineer - Federal Team

    Saviynt 4.4 company rating

    Atlanta, GA jobs

    Saviynt's AI-powered identity platform manages and governs human and non-human access to all of an organization's applications, data, and business processes. Customers trust Saviynt to safeguard their digital assets, drive operational efficiency, and reduce compliance costs. Built for the AI age, Saviynt is today helping organizations safely accelerate their deployment and usage of AI. Saviynt is recognized as the leader in identity security, with solutions that protect and empower the world's leading brands, Fortune 500 companies and government institutions. For more information, please visit **************** This opportunity is in the Saviynt Labs organization. We design, build and run the leading Enterprise Identity solutions. Our product teams innovate industry leading solutions. The engineering teams design, build and run SaaS software built on leading edge technologies. We focus on engineering excellence and we attract the best talent in our industry. Our cloud services are built on AWS, GCP and Azure with a global presence. Our customers love what we do and work with us to build the future customer experience at scale. WHAT YOU WILL BE DOING * Implement monitoring and alerting systems to guarantee high availability and performance, with a dedicated focus on SLA and availability metrics. * Collaborate with engineering and operations teams to identify critical components and systems requiring enhanced availability measures. * Design and implement strategies, tooling, and processes to enhance system uptime and reliability. * Continuously evaluate and recommend improvements to platform infrastructure and processes, enhancing efficiency and reliability. * Align the platform with customer needs and business goals by working closely with cross-functional teams. * Run the production environment by monitoring availability and taking a holistic view of system health. * Build software and systems to monitor platform infrastructure and applications. 
* Monitor and improve reliability, quality, and time-to-market of our suite of software solutions. * Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement. * Provide primary operational support and engineering for multiple large-scale distributed software applications. * Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding. WHAT YOU BRING * U.S. Citizenship: Applicants must be United States citizens. * Master's Degree in an Engineering discipline, a bachelor's degree and 7+ years of professional software engineering experience, or equivalent experience. * 10+ years of professional experience in monitoring and alerting roles on major cloud platforms (AWS, Azure), preferably including project leadership roles * 4+ years of experience in cloud development (AWS, Azure) and observability * Experience with building and operating highly resilient platforms in AWS cloud environments * 3+ years of experience in software development with Python, NodeJS, or Java with a focus on SDLC and automation * Hands-on experience with container orchestration, preferably with Kubernetes * Hands-on experience with building observability, monitoring and alerting on large scale distributed systems * Logging and monitoring tools experience (preference for Prometheus, Grafana, Datadog, AWS CloudWatch; related: Azure Monitor, Log Analytics) * Proven experience in implementing advanced observability practices and techniques at scale. * Hands-on experience with one or more observability tools (preference for Prometheus, Grafana, ELK/OpenSearch, OpenTelemetry, Datadog, etc.) 
If required for this role, you will: * Complete security & privacy literacy and awareness training during onboarding and annually thereafter * Review (initially and annually thereafter), understand, and adhere to Information Security/Privacy Policies and Procedures such as (but not limited to): > Data Classification, Retention & Handling Policy > Incident Response Policy/Procedures > Business Continuity/Disaster Recovery Policy/Procedures > Mobile Device Policy > Account Management Policy > Access Control Policy > Personnel Security Policy > Privacy Policy Saviynt is an amazing place to work. We are a high-growth, Platform as a Service company focused on Identity Authority to power and protect the world at work. You will experience tremendous growth and learning opportunities through challenging yet rewarding work which directly impacts our customers, all within a welcoming and positive work environment. If you're resilient and enjoy working in a dynamic environment you belong with us! Saviynt is an equal opportunity employer and we welcome everyone to our team. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status.
    $82k-115k yearly est. 60d+ ago
  • Principal Site Reliability Engineer - Federal Team

    Saviynt 4.4 company rating

    Atlanta, GA jobs

    Job Description: Saviynt's AI-powered identity platform manages and governs human and non-human access to all of an organization's applications, data, and business processes. Customers trust Saviynt to safeguard their digital assets, drive operational efficiency, and reduce compliance costs. Built for the AI age, Saviynt is today helping organizations safely accelerate their deployment and usage of AI. Saviynt is recognized as the leader in identity security, with solutions that protect and empower the world's leading brands, Fortune 500 companies and government institutions. For more information, please visit **************** This opportunity is in the Saviynt Labs organization. We design, build and run the leading Enterprise Identity solutions. Our product teams innovate industry leading solutions. The engineering teams design, build and run SaaS software built on leading edge technologies. We focus on engineering excellence and we attract the best talent in our industry. Our cloud services are built on AWS, GCP and Azure with a global presence. Our customers love what we do and work with us to build the future customer experience at scale. WHAT YOU WILL BE DOING Implement monitoring and alerting systems to guarantee high availability and performance, with a dedicated focus on SLA and availability metrics. Collaborate with engineering and operations teams to identify critical components and systems requiring enhanced availability measures. Design and implement strategies, tooling, and processes to enhance system uptime and reliability. Continuously evaluate and recommend improvements to platform infrastructure and processes, enhancing efficiency and reliability. Align the platform with customer needs and business goals by working closely with cross-functional teams. Run the production environment by monitoring availability and taking a holistic view of system health. Build software and systems to monitor platform infrastructure and applications. 
Monitor and improve reliability, quality, and time-to-market of our suite of software solutions. Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement. Provide primary operational support and engineering for multiple large-scale distributed software applications. Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding. WHAT YOU BRING U.S. Citizenship: Applicants must be United States citizens. Master's Degree in an Engineering discipline, a bachelor's degree and 7+ years of professional software engineering experience, or equivalent experience. 10+ years of professional experience in monitoring and alerting roles on major cloud platforms (AWS, Azure), preferably including project leadership roles. 4+ years of experience in cloud development (AWS, Azure) and observability. Experience with building and operating highly resilient platforms in AWS cloud environments. 3+ years of experience in software development with Python, NodeJS, or Java with a focus on SDLC and automation. Hands-on experience with container orchestration, preferably with Kubernetes. Hands-on experience with building observability, monitoring and alerting on large scale distributed systems. Logging and monitoring tools experience (preference for Prometheus, Grafana, Datadog, AWS CloudWatch; related: Azure Monitor, Log Analytics). Proven experience in implementing advanced observability practices and techniques at scale. Hands-on experience with one or more observability tools (preference for Prometheus, Grafana, ELK/OpenSearch, OpenTelemetry, Datadog, etc.). 
If required for this role, you will: - Complete security & privacy literacy and awareness training during onboarding and annually thereafter - Review (initially and annually thereafter), understand, and adhere to Information Security/Privacy Policies and Procedures such as (but not limited to): > Data Classification, Retention & Handling Policy > Incident Response Policy/Procedures > Business Continuity/Disaster Recovery Policy/Procedures > Mobile Device Policy > Account Management Policy > Access Control Policy > Personnel Security Policy > Privacy Policy. Saviynt is an amazing place to work. We are a high-growth, Platform as a Service company focused on Identity Authority to power and protect the world at work. You will experience tremendous growth and learning opportunities through challenging yet rewarding work which directly impacts our customers, all within a welcoming and positive work environment. If you're resilient and enjoy working in a dynamic environment, you belong with us! Saviynt is an equal opportunity employer and we welcome everyone to our team. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status.
    $82k-115k yearly est. 25d ago
  • Site Reliability Engineer

    Eagle Eye Networks 4.0 company rating

    Austin, TX jobs

    About Us: Eagle Eye Networks is the global leader in cloud video surveillance, delivering cyber-secure, cloud-based video with artificial intelligence (AI) and analytics to make businesses more efficient and the world a safer place. The Eagle Eye Cloud VMS (video management system) is the only platform robust and flexible enough to power the future of video surveillance and intelligence. Eagle Eye is based in Austin, Texas, with offices in Amsterdam, Bangalore, and Tokyo. Learn more at een.com. Summary: Are you an SRE ready to grow your impact in a high-scale, cloud-native environment? Eagle Eye Networks is hiring a Site Reliability Engineer to help ensure the reliability, performance, and automation of our global video surveillance platform. As an SRE, you'll collaborate with senior engineers to operate and improve infrastructure, contribute to observability and automation, and support deployments. This is a hands-on role for someone who enjoys solving systems problems, learning through collaboration, and gradually taking on greater ownership of platform reliability. If you're excited about infrastructure, thrive in a collaborative environment, and are eager to deepen your technical skills, this is the role for you. Responsibilities Build and maintain reliable, automated infrastructure across private cloud environments. Participate in incident response, assisting with communication, troubleshooting, and follow-up actions. Contribute to efforts that reduce recurring issues and improve service availability and recovery. Apply and support best practices for observability, incident management, and production readiness. Collaborate on improvements to Infrastructure as Code and CI/CD tooling. Work with product and application teams to help define meaningful Service Level Indicators (SLIs) and Service Level Objectives (SLOs). Advocate for automation and efficiency in day-to-day operations. 
Contribute to reliability-focused projects and offer insights during architecture discussions when needed. Participate in the on-call rotation and help identify opportunities to improve its effectiveness. Experience Must Have: 2+ years of experience as a Site Reliability Engineer (or related role). Strong experience managing Linux systems in production environments. Good working knowledge of Kubernetes or other container orchestration systems. Solid scripting abilities in languages such as Python or Bash; familiarity with Golang is a plus. Experience contributing to automation that reduces operational toil and improves reliability. Hands-on experience participating in incident response and contributing to root-cause analysis. Familiarity with observability tooling such as Prometheus/VictoriaMetrics and Grafana for metrics, alerting, and basic SLO/error-budget usage. Understanding of networking fundamentals and security best practices. Ability to identify reliability issues and assist in implementing scalable improvements. Experience participating in or improving on-call and alerting systems. Nice to Have: Exposure to system performance tuning or capacity planning. Experience collaborating in cross-functional architecture or design discussions. Interest in growing toward technical leadership or mentorship over time. Why work for Eagle Eye? Eagle Eye Networks is an innovative, global start-up building the only platform powerful enough to support the future of video surveillance and security. Here your voice will be heard, and talent respected. We have proven leadership and financial backing of one of the world's premier venture capital firms. The work we do is essential in today's world, as our systems are used to protect the health, safety, and welfare of people and property around the world. Eagle Eye is a place where you can make a difference. Bring your passion, your drive, a roll-up-your-sleeves-and-get-it-done work ethic, and a collaborative mindset. 
Be ready to work hard and have fun. We also have great benefits and perks. Medical Benefits: We offer a competitive medical plan. Company offsets premiums. 100% paid employee dental and vision insurance. Taco Tuesdays: Like breakfast tacos? You're at the right place, because weekly breakfast tacos are provided. 401k plan with company match! Weekly Lunch: Food is love. Especially when it is free. Snacks: You will never go hungry. Culture: Innovation drives our vibe. Diversity: We embrace our global presence, the diverse ideas and backgrounds of our team to improve our culture, our products and grow our people and our business. Unlimited PTO: We value our employees' work/life balance and want you to spend the time off you need. More About Eagle Eye Networks Eagle Eye Networks is leveraging artificial intelligence on its true cloud platform to dramatically reshape the video surveillance and security industry. The Eagle Eye Cloud Video Management System (VMS) is a smart cloud video surveillance solution, purpose-built to help businesses improve safety, security, operations, and customer service. Tens of thousands of companies in more than 90 countries around the globe have moved their video surveillance to the cloud with Eagle Eye VMS. Customers, including multi-family residences, smart cities, schools, hospitals, hotels, logistics, restaurants, and retail shops trust Eagle Eye for actionable business intelligence and proactive security across multiple locations. The Eagle Eye VMS has strong APIs for the secure integration of third-party systems and works with thousands of industry cameras, so customers don't have to “rip and replace” their existing infrastructure. Eagle Eye Cloud VMS is the only platform robust enough to power the future of video surveillance. Eagle Eye Networks is an equal employment opportunity employer and values diversity. 
Qualified candidates are considered for employment without regard to race, religion, gender, gender identity, sexual orientation, national origin, age, military or veteran status, disability, or any other characteristic protected by applicable law.
    $85k-120k yearly est. Auto-Apply 54d ago
  • Principal Site Reliability Engineer

    Veracode 4.2 company rating

    Burlington, MA jobs

    Veracode is seeking an enthusiastic, motivated engineer with deep AWS knowledge and the ability to keep up with a high-performing team. This is a chance to be on the leading edge of our evolution to the cloud in a fast-paced environment. As a member of our SRE team, you will be part of a team migrating existing applications into AWS while seeking opportunities for efficiencies in deployment, monitoring, and cost. In addition to this, you will serve as a subject matter expert and consultant to other teams developing applications in the cloud. The ideal candidate will be a self-starter who can work with minimal supervision, covering a wide range of duties in a high-energy DevOps environment. 
Skills & Requirements · BS in Computer Science or equivalent work experience · 3+ years' experience architecting and automating AWS infrastructure · 3+ years' experience automating deployments in AWS · Experience in an Agile environment · Strong written and verbal communication · Experience with logging and monitoring tools like Sumologic, Splunk, ELK preferred · Experience with Docker and ECS or Kubernetes preferred · Experience with infrastructure as code like Terraform or Cloudformation preferred · Experience with configuration management tools like Ansible or Puppet preferred · Experience with metrics collection and aggregation solutions preferred · Proficient in one or more programming languages, preferably Python, and in contemporary engineering practices · Working knowledge of software-defined networking in the cloud and its automation (VPC, Routing, DNS, Direct Connect) · Influence, design and create new architectures, standards and methods for enterprise systems · Collaborate with, learn from, and mentor teammates
    $84k-109k yearly est. 60d+ ago
  • Process Engineer

    Cellink 3.5 company rating

    Georgetown, TX jobs

    This position is responsible for designing and implementing systems and equipment procedures. The main duties include testing and monitoring equipment, updating current system processes, conducting risk assessments, and ensuring high-quality deliverables for business objectives.
    Essential Duties and Responsibilities
    To perform this job successfully, the individual must be able to perform each essential duty satisfactorily. The requirements listed below are representative of the knowledge, skill, and/or ability required. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.
    * Lead multiple interdisciplinary technology projects from ideation phase through full implementation with minimal oversight.
    * Achieve results through working with and guiding less experienced engineers and technicians.
    * Develop novel, robust processes for manufacturing flexible circuits.
    * Evaluate new materials for flexible circuits, interfacing with vendors and suppliers to co-develop the best solutions for our products.
    * Work hands-on with high-volume manufacturing, qualifying new tools and processes on the manufacturing line.
    * Identify key process metrics and establish statistical process control at workstations.
    * Develop and implement metrology methods to measure tool health and process performance data.
    * Document standard operating procedures (SOPs) and train staff to perform processes correctly and safely.
    Minimum Qualifications (Knowledge, Skills and Abilities):
    To perform this job successfully, the individual must be able to perform each essential duty satisfactorily. The requirements listed below are representative of the knowledge, skill, and/or ability required. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.
    Experience/Education:
    * Bachelor's degree or higher in Materials Science, Physics, Chemical Engineering, Mechanical Engineering, or a related field
    * 5+ years of hands-on experience working in a high-volume manufacturing or equivalent environment
    Knowledge/Skills/Abilities:
    * Solid background in statistical methods, systematic experiment design, and related software (e.g. JMP, Minitab, etc.)
    * Ability to synthesize and interpret complex data sets to understand physical phenomena and support data-driven decisions
    * Ability to safely and confidently operate and troubleshoot custom manufacturing and metrology equipment
    * Independent problem solver with a desire to learn new skills and new technology
    * Strong organization skills and detail-oriented mindset
    * Able to adapt to changing requirements in a fast-paced start-up setting
    * Excellent interpersonal and communication skills, strong work ethic, and ability to work in a collaborative team environment
    * Strong understanding of CAD software; SolidWorks and/or AutoCAD is a plus
    Pluses:
    * Knowledge of and process experience with precision patterning equipment
    * Prior start-up experience
    * Prior experience with roll-to-roll processing
    * Hands-on experience with injection molding process development and optimization
    Physical Demands and Work Environment
    The physical demands described here are representative of those that must be met by an employee to successfully perform the essential functions of this position. Reasonable accommodations may be made to enable individuals with disabilities to perform the functions.
    Working Conditions/Hours: Salaried Exempt
    Physical Demands - Office and Manufacturing Environment
    While performing the duties of this position, the employee is regularly required to talk or hear. The employee frequently is required to use hands to finger, handle, or feel objects, tools, or controls. The employee is required to stand; walk; sit; reach with hands and arms and pull/push; climb or balance; and stoop, kneel, crouch, or crawl.
    The employee must lift and/or move up to 50 pounds without assistance. Specific vision abilities required by this position include close vision, distance vision, color vision, peripheral vision, and the ability to adjust focus.
    Work Environment - Includes both a typical office environment, with minimal exposure to excessive noise or adverse environmental issues, and a shop environment, with exposure to high noise levels from operating machines, physical hazards from moving equipment and machine parts, nuisance dust, and skin exposure to chemicals used to run/maintain machines.
    PPE - May be required to wear Personal Protective Equipment, including but not limited to safety glasses, safety shoes, bump-caps, gloves, hair nets, masks, and clean-room frocks while adhering to the prescribed safety procedures.
    We believe diversity and inclusion among our teammates are essential to our success. We celebrate diversity and are committed to creating an inclusive environment for all employees while building teams that represent a variety of backgrounds, perspectives, and skills. We are an equal opportunity employer. All employment is decided based on qualifications, merit, and business needs. CelLink participates in the E-Verify program in specific locations as required by law.
    CelLink was founded in 2012 and entered volume production in 2018. CelLink provides electrical systems to the world's leading EV manufacturers, traditional automotive OEMs, and tiered suppliers. The company has raised approximately $315M in funding through private investment and multiple grants from the US Department of Energy. CelLink's investors include 3M, Atreides, BMW, BorgWarner, Bosch, D1 Capital, Fidelity, Fontinalis Partners, Ford, Franklin Templeton, Lear, Park West, SK Telecom, Standard Investments, T. Rowe Price, Tinicum, and Whale Rock.
    $77k-102k yearly est. Auto-Apply 38d ago
  • Process Engineer

    Bright Farms 4.1 company rating

    Lorena, TX jobs

    BrightFarms is The Place to Grow! At BrightFarms, we're on a mission to revolutionize the way leafy greens are grown. But we don't just want to grow great tasting greens, we want them to do good as well: for the planet, for the health of people, and for the well-being of our employees. We give BrightFarmers the tools, training, support, and opportunities they need to do better for themselves and the world every day. Because when you do good for your people, they do good for the world. BrightFarms is a national leader in the booming indoor farming industry, transforming how produce is grown and delivered with its expanding network of five high-tech, sustainable hydroponic farms. From seed to leaf we grow lettuce smarter, with less space, no pesticides and protected inside, so it's better for the planet and way better for humans. BrightFarms' fresh lettuce options, from classic greens to crunchy mixes and salad kits, are available in more than 4,500 retail stores across the East Coast, Midwest, and South. Our Lorena, Texas farm is the largest and most advanced greenhouse in the state, delivering fresh, locally grown leafy greens to Texans year-round. Spanning nearly 480,000 square feet, this facility will produce approximately 6 million pounds of leafy greens annually. 
    Specifically, you will:
    * Analyze and improve greenhouse workflows, including packaging, planting, harvesting, irrigation and climate control through leading cross-functional Kaizen events and Root Cause Analysis sessions to identify and mitigate operational risks
    * Work side by side with operators daily and quickly learn facility-specific equipment to become an expert ready to develop SOPs, mentor other technical teams, and train hourly and technical teams
    * Work directly with Greenhouse staff to understand operational challenges and implement practical, employee-engaged solutions while providing coaching on Lean principles, Six Sigma methods, and statistical analysis tools (e.g., SPC, DOE, regression modeling)
    * Collaborate with mechanical, electrical, and controls engineers and vendors to design and implement process improvements and automation solutions
    * Collect and interpret operational data to identify trends, inefficiencies, and opportunities for improvement
    * Lead and support process improvement and capital projects from concept through implementation, including timeline, budget, ROI and resource planning
    * Partner with HSE and FSQ teams on people safety, environmental and food compliance. Ensure all process changes comply with safety standards, environmental regulations, and company policies
    * Maintain accurate records of process changes, performance metrics, and improvement outcomes
    The ideal candidate:
    * Bachelor's degree in Engineering (Process, Production, Mechanical, Industrial, Chemical, or related) or equivalent experience
    * 3-5 years of experience in process engineering, preferably in agriculture, horticulture, food production, or manufacturing
    * Must be flexible to work on day or night shift depending on site needs
    * Expertise in data analytics tools (Excel, Minitab, Tableau, etc.) and experience with ERP/MES systems
    * Knowledge of automation systems (PLCs, HMIs, SCADA) and control strategy development
    * Demonstrated success leading cost savings initiatives, product launches, or system redesigns
    * Excellent communication, mentorship, and stakeholder alignment skills
    BrightFarms is an Equal Employment Opportunity employer - All qualified applicants/employees will receive consideration for employment without regard to that individual's age, race, color, religion or creed, national origin or ancestry, sex (including pregnancy), sexual orientation, gender, gender identity, physical or mental disability, veteran status, genetic information, ethnicity, citizenship, or any other characteristic protected by law.
    $77k-102k yearly est. 2d ago
  • Process Engineer

    Brightfarms Inc. 4.1 company rating

    Lorena, TX jobs

    BrightFarms is The Place to Grow! At BrightFarms, we're on a mission to revolutionize the way leafy greens are grown. But we don't just want to grow great tasting greens, we want them to do good as well: for the planet, for the health of people, and for the well-being of our employees. We give BrightFarmers the tools, training, support, and opportunities they need to do better for themselves and the world every day. Because when you do good for your people, they do good for the world. BrightFarms is a national leader in the booming indoor farming industry, transforming how produce is grown and delivered with its expanding network of five high-tech, sustainable hydroponic farms. From seed to leaf we grow lettuce smarter, with less space, no pesticides and protected inside, so it's better for the planet and way better for humans. BrightFarms' fresh lettuce options, from classic greens to crunchy mixes and salad kits, are available in more than 4,500 retail stores across the East Coast, Midwest, and South. Our Lorena, Texas farm is the largest and most advanced greenhouse in the state, delivering fresh, locally grown leafy greens to Texans year-round. Spanning nearly 480,000 square feet, this facility will produce approximately 6 million pounds of leafy greens annually.
    Specifically, you will:
    * Analyze and improve greenhouse workflows, including packaging, planting, harvesting, irrigation and climate control through leading cross-functional Kaizen events and Root Cause Analysis sessions to identify and mitigate operational risks
    * Work side by side with operators daily and quickly learn facility-specific equipment to become an expert ready to develop SOPs, mentor other technical teams, and train hourly and technical teams
    * Work directly with Greenhouse staff to understand operational challenges and implement practical, employee-engaged solutions while providing coaching on Lean principles, Six Sigma methods, and statistical analysis tools (e.g., SPC, DOE, regression modeling)
    * Collaborate with mechanical, electrical, and controls engineers and vendors to design and implement process improvements and automation solutions
    * Collect and interpret operational data to identify trends, inefficiencies, and opportunities for improvement
    * Lead and support process improvement and capital projects from concept through implementation, including timeline, budget, ROI and resource planning
    * Partner with HSE and FSQ teams on people safety, environmental and food compliance. Ensure all process changes comply with safety standards, environmental regulations, and company policies
    * Maintain accurate records of process changes, performance metrics, and improvement outcomes
    The ideal candidate:
    * Bachelor's degree in Engineering (Process, Production, Mechanical, Industrial, Chemical, or related) or equivalent experience
    * 3-5 years of experience in process engineering, preferably in agriculture, horticulture, food production, or manufacturing
    * Must be flexible to work on day or night shift depending on site needs
    * Expertise in data analytics tools (Excel, Minitab, Tableau, etc.) and experience with ERP/MES systems
    * Knowledge of automation systems (PLCs, HMIs, SCADA) and control strategy development
    * Demonstrated success leading cost savings initiatives, product launches, or system redesigns
    * Excellent communication, mentorship, and stakeholder alignment skills
    BrightFarms is an Equal Employment Opportunity employer - All qualified applicants/employees will receive consideration for employment without regard to that individual's age, race, color, religion or creed, national origin or ancestry, sex (including pregnancy), sexual orientation, gender, gender identity, physical or mental disability, veteran status, genetic information, ethnicity, citizenship, or any other characteristic protected by law.
    $77k-102k yearly est. 25d ago
  • Process Engineer

    Brightfarms Inc. 4.1 company rating

    Lorena, TX jobs

    BrightFarms is The Place to Grow! At BrightFarms, we're on a mission to revolutionize the way leafy greens are grown. But we don't just want to grow great tasting greens, we want them to do good as well: for the planet, for the health of people, and for the well-being of our employees. We give BrightFarmers the tools, training, support, and opportunities they need to do better for themselves and the world every day. Because when you do good for your people, they do good for the world. BrightFarms is a national leader in the booming indoor farming industry, transforming how produce is grown and delivered with its expanding network of five high-tech, sustainable hydroponic farms. From seed to leaf we grow lettuce smarter, with less space, no pesticides and protected inside, so it's better for the planet and way better for humans. BrightFarms' fresh lettuce options, from classic greens to crunchy mixes and salad kits, are available in more than 4,500 retail stores across the East Coast, Midwest, and South. Our Lorena, Texas farm is the largest and most advanced greenhouse in the state, delivering fresh, locally grown leafy greens to Texans year-round. Spanning nearly 480,000 square feet, this facility will produce approximately 6 million pounds of leafy greens annually. 
    Specifically, you will:
    * Analyze and improve greenhouse workflows, including packaging, planting, harvesting, irrigation and climate control through leading cross-functional Kaizen events and Root Cause Analysis sessions to identify and mitigate operational risks
    * Work side by side with operators daily and quickly learn facility-specific equipment to become an expert ready to develop SOPs, mentor other technical teams, and train hourly and technical teams
    * Work directly with Greenhouse staff to understand operational challenges and implement practical, employee-engaged solutions while providing coaching on Lean principles, Six Sigma methods, and statistical analysis tools (e.g., SPC, DOE, regression modeling)
    * Collaborate with mechanical, electrical, and controls engineers and vendors to design and implement process improvements and automation solutions
    * Collect and interpret operational data to identify trends, inefficiencies, and opportunities for improvement
    * Lead and support process improvement and capital projects from concept through implementation, including timeline, budget, ROI and resource planning
    * Partner with HSE and FSQ teams on people safety, environmental and food compliance. Ensure all process changes comply with safety standards, environmental regulations, and company policies
    * Maintain accurate records of process changes, performance metrics, and improvement outcomes
    The ideal candidate:
    * Bachelor's degree in Engineering (Process, Production, Mechanical, Industrial, Chemical, or related) or equivalent experience
    * 3-5 years of experience in process engineering, preferably in agriculture, horticulture, food production, or manufacturing
    * Must be flexible to work on day or night shift depending on site needs
    * Expertise in data analytics tools (Excel, Minitab, Tableau, etc.) and experience with ERP/MES systems
    * Knowledge of automation systems (PLCs, HMIs, SCADA) and control strategy development
    * Demonstrated success leading cost savings initiatives, product launches, or system redesigns
    * Excellent communication, mentorship, and stakeholder alignment skills
    BrightFarms is an Equal Employment Opportunity employer - All qualified applicants/employees will receive consideration for employment without regard to that individual's age, race, color, religion or creed, national origin or ancestry, sex (including pregnancy), sexual orientation, gender, gender identity, physical or mental disability, veteran status, genetic information, ethnicity, citizenship, or any other characteristic protected by law.
    $77k-102k yearly est. Auto-Apply 60d+ ago
  • Process Engineer - Manufacturing & Embedded Systems

    Carnegie Robotics 4.4 company rating

    Pittsburgh, PA jobs

    Who We Are: Carnegie Robotics designs and manufactures advanced robotics systems and components for defense, agricultural, mining, industrial, and off-road autonomy applications. Our ruggedized solutions can meet the challenges of any industry, providing effective and efficient answers for even the toughest problems.
    Job Summary
    Carnegie Robotics is seeking a Process Engineer who can merge manufacturing engineering principles with embedded systems and connected technologies. This role is ideal for someone who thrives at the intersection of hardware, software, and process optimization. The incumbent will build and refine reliable manufacturing systems from prototype through full-scale production.
    What You'll Be Doing
    * Developing and optimizing manufacturing processes for electronic and electromechanical assemblies, ensuring repeatability, scalability, and quality
    * Applying manufacturing engineering principles to improve assembly line flow, ergonomics, equipment utilization, and process documentation
    * Designing and implementing test systems using C++ and Python to automate device configuration, data capture, and validation workflows
    * Integrating Internet of Things communication protocols such as MQTT and LoRa into production and test environments to enable real-time monitoring and data-driven process improvement
    * Designing jigs, fixtures, and assembly aids in SolidWorks to support efficient and error-proof production
    * Working with MES/MRP systems to track production data, manage work orders, maintain BOMs, and ensure alignment with quality and inventory systems
    * Collaborating with Electrical, Mechanical, and Software Engineering teams to refine designs for manufacturability (DFM) and assembly (DFA)
    * Performing root-cause analyses and leading continuous improvement initiatives using Lean, Six Sigma, and 8D methodologies
    What You'll Have
    * Bachelor's degree in Manufacturing, Mechanical, Electrical, or Software Engineering, or related discipline; or applicable job experience
    * 3+ years' experience in a manufacturing or process engineering environment
    * Proficiency in C++ and Python for test automation and data analysis
    * Knowledge of MQTT, LoRa, and other IoT/industrial communication protocols
    * Strong working experience with SolidWorks for fixturing and tooling design
    * Hands-on experience with MES and MRP systems (e.g. Odoo, Epicor, Fishbowl, SAP)
    * Strong understanding of Lean manufacturing principles, process mapping, and continuous improvement methodologies
    How You'll Stand Out
    * You have experience with embedded Linux systems, microcontrollers, or industrial sensor integration
    * You have familiarity with data visualization tools (e.g. Grafana, Power BI) or production analytics dashboards
    * You have experience writing process documentation, PFMEAs, and control plans
    * You have work experience in robotics, automation, or low-volume high-mix manufacturing environments
    What You Get Out of It
    * UPMC health coverage with FSA or HSA options
    * Comprehensive dental, vision, and life insurance
    * Fidelity 401(k) plan with employer match
    * Free catered lunch every day with a vegan option
    * 31 Days of PTO (including holidays)
    * Comp time for company travel
    Carnegie Robotics is an Equal Opportunity Employer that welcomes applications from all employees and applicants for employment without regard to race, color, religion, gender, sexual orientation, national origin, age, disability, marital status, or status as a covered veteran in accordance with applicable laws and Carnegie Robotics' employment policies.
    $62k-80k yearly est. Auto-Apply 42d ago
  • Project Engineer - Total Process Safety (TPS)

    Amp Americas 4.1 company rating

    Chicago, IL jobs

    Founded in 2011, Amp Americas builds, manages, operates and maintains RNG production facilities that convert dairy waste into carbon-negative hydrogen, renewable transportation fuel and power. The vertically-integrated company leverages over a decade of unique expertise and specialized experience in carbon-negative fuel development, operations, services and marketing to deliver comprehensive, turn-key solutions that address greenhouse gas emissions and seek to improve air, land and water quality. Visit ampamericas.com.
    Position Summary
    The Project Engineer - Total Process Safety (TPS) will play a key role in advancing the company's engineering and safety excellence within its biogas manufacturing operations. During the first six (6) months, the position will focus on implementing the Total Process Safety (TPS) framework - developing systems, safety inspections, tools, and procedures that ensure safe, reliable, and compliant operation of biogas production and upgrading facilities. Following the initial implementation phase, the role will transition into a Project Engineer position, leading and supporting capital projects from design through commissioning, while maintaining strong integration of process safety principles in all project phases. This position will report to the Engineering Manager.
    Key Responsibilities
    A. Initial 6-Month Total Process Safety (TPS) Phase
    * Lead the rollout of the TPS technical assessments within operations and engineering teams
    * Work with the external engineering partners to complete site inspections of Safety Critical Devices, electrical systems and gas pressure systems
    * Develop and maintain Safety Critical Element (SCE) and Safety Critical Device (SCD) registers derived from HAZOP/PHA studies
    * Support and facilitate HAZOP, LOPA, and risk assessment workshops for existing and new systems
    * Implement procedures for functional testing, alarm and trip verification, and bypass/override management
    * Work with operations and maintenance teams to embed safe operating practices, ensuring clear accountability for process safety controls
    * Establish process safety performance metrics (leading and lagging indicators)
    * Support training and awareness activities related to process safety systems and TPS principles
    B. Transition to Project Engineering Role
    * Plan, execute, and deliver capital and improvement projects related to biogas production, upgrading, and utilities systems
    * Prepare engineering design packages, including scopes of work, P&IDs, datasheets, and cost estimates
    * Coordinate FEED, detailed design, procurement, construction, and commissioning activities
    * Ensure process safety integration in all project stages, from concept to startup
    * Collaborate with multidisciplinary teams (process, mechanical, electrical, controls, and operations) to achieve safe and efficient project outcomes
    * Manage contractors, vendors, and third-party service providers to meet project objectives on schedule, within budget, and in compliance with standards
    * Support project documentation, design reviews, management of change (MOC), and pre-startup safety reviews (PSSR)
    Qualifications
    * Bachelor's Degree in Chemical, Mechanical, or Process Engineering
    * 3-7 years of relevant experience in engineering, operations, or process safety in industrial or energy sectors (biogas, oil & gas, or chemical processing preferred)
    * Strong understanding of Process Safety Management (PSM) and functional safety
    * Experience participating in or leading HAZOPs and risk assessments
    * Solid knowledge of plant instrumentation, control systems, and process operations
    * Strong organizational and communication skills; ability to collaborate across teams
    * Demonstrated commitment to safety-first culture
    * Experience in biogas, renewable energy, or anaerobic digestion operations (preferred)
    * Knowledge of environmental and regulatory frameworks (EPA, OSHA, COMAH, ATEX, etc.) (preferred)
    * Project management experience or certification (PMP, PRINCE2, or equivalent) (preferred)
    What We Offer
    * Compensation package commensurate with experience
    * Comprehensive benefits package including health, dental, vision, disability, and life insurance
    * Paid time off and paid company holidays
    * Opportunity to build upon your career in a company on the cutting edge of the RNG industry
    Disclaimer: This job description is not intended to be a comprehensive list of the duties and responsibilities of the position, and the duties and responsibilities may change without notice. Amp is an Equal Opportunity Employer and is committed to excellence through diversity.
    $65k-84k yearly est. Auto-Apply 60d+ ago

Learn more about KIK Custom Products Inc. jobs