
Reliability Engineer jobs at Cisco

- 4024 jobs
  • AI Infrastructure Site Reliability Engineer (remote USA)

    Cisco 4.8 company rating

    Reliability engineer job at Cisco

    The application window is expected to close on: 12-25-25. Job posting may be removed earlier if the position is filled or if a sufficient number of applications are received.

    Meet the Team

    At Cisco, the AI Infrastructure Services team is at the forefront of integrating artificial intelligence into our platforms, transforming collaboration, security, networking, observability, and more. We design, build, and maintain high-performance compute and AI platforms, including NVIDIA DGX and Cisco-UCS infrastructure, to empower Cisco's business and drive innovation. Working alongside top AI experts, you'll contribute to ethical AI products and solutions that solve real-world problems and shape the future of technology.

    Your Impact

    As an AI Site Reliability Engineer, you will:
    - Leverage SRE practices to reduce toil and maintain Service Level Objectives (SLOs) for internal AI platforms.
    - Lead, build, and run fully automated pipelines through CI/CD systems for operational excellence and continuous improvements.
    - Ensure the availability, scalability, latency, and efficiency of NVIDIA DGX and Cisco-UCS infrastructure using fault-tolerant engineering approaches.
    - Drive capacity planning, performance analysis, instrumentation, and other non-functional requirements.
    - Automate operational capabilities using Python, Ansible, Terraform, Go, and related technologies.
    - Deliver automation through CI/CD pipelines and chatbot integrations.
    - Implement metrics-driven processes to maintain high service quality.

    Minimum Qualifications

    - Bachelor's degree in Computer Science, Information Technology, or a related field; or equivalent years of IT experience.
    - 5+ years of experience deploying and administering NVIDIA (DGX) or equivalent high-performance-compute (HPC) clusters (e.g., Cray, HPE, IBM).
    - 5+ years coordinating and supporting Linux-based operating systems.
    - 5+ years of proficiency in programming languages such as Python, Go, C/C++; experience with Git and CI/CD systems (e.g., GitLab, GitHub Actions, Jenkins).
    - 5+ years of experience deploying enterprise-grade Kubernetes clusters (RedHat OpenShift preferred) and/or Google Anthos.
    - Advanced knowledge of Kubernetes, Docker, Terraform, Ansible, Jenkins, GitOps, Git, and Linux.
    - 5+ years of experience with the software development lifecycle: design, development, testing, packaging, and deployment (preferably using Python or Go).

    Preferred Qualifications

    - Master's degree or equivalent experience in a relevant field.
    - Certifications in Linux, networking, cloud, or related technologies.
    - Previous experience as a compute or site/systems reliability engineer.
    - Experience with hybrid cloud, virtualization, and container technologies.
    - Familiarity with Agile and DevOps operating models, including project tracking tools (e.g., Jira, Rally).
    - Excellent collaboration, leadership, and communication skills.

    **Why Cisco?**

    At Cisco, we're revolutionizing how data and infrastructure connect and protect organizations in the AI era - and beyond. We've been innovating fearlessly for 40 years to create solutions that power how humans and technology work together across the physical and digital worlds. These solutions provide customers with unparalleled security, visibility, and insights across the entire digital footprint. Fueled by the depth and breadth of our technology, we experiment and create meaningful solutions. Add to that our worldwide network of doers and experts, and you'll see that the opportunities to grow and build are limitless. We work as a team, collaborating with empathy to make really big things happen on a global scale. Because our solutions are everywhere, our impact is everywhere. We are Cisco, and our power starts with you.

    **Message to applicants applying to work in the U.S. and/or Canada:**

    The starting salary range posted for this position is $165,000.00 to $241,400.00 and reflects the projected salary range for new hires in this position in U.S. and/or Canada locations, not including incentive compensation*, equity, or benefits. Individual pay is determined by the candidate's hiring location, market conditions, job-related skillset, experience, qualifications, education, certifications, and/or training. The full salary range for certain locations is listed below. For locations not listed below, the recruiter can share more details about compensation for the role in your location during the hiring process.

    U.S. employees are offered benefits, subject to Cisco's plan eligibility rules, which include medical, dental and vision insurance, a 401(k) plan with a Cisco matching contribution, paid parental leave, short and long-term disability coverage, and basic life insurance. Please see the Cisco careers site to discover more benefits and perks. Employees may be eligible to receive grants of Cisco restricted stock units, which vest following continued employment with Cisco for defined periods of time.

    U.S. employees are eligible for paid time away as described below, subject to Cisco's policies:
    + 10 paid holidays per full calendar year, plus 1 floating holiday for non-exempt employees
    + 1 paid day off for employee's birthday, paid year-end holiday shutdown, and 4 paid days off for personal wellness determined by Cisco
    + Non-exempt employees** receive 16 days of paid vacation time per full calendar year, accrued at a rate of 4.92 hours per pay period for full-time employees
    + Exempt employees participate in Cisco's flexible vacation time off program, which has no defined limit on how much vacation time eligible employees may use (subject to availability and some business limitations)
    + 80 hours of sick time off provided on hire date and each January 1st thereafter, and up to 80 hours of unused sick time carried forward from one calendar year to the next
    + Additional paid time away may be requested to deal with critical or emergency issues for family members
    + Optional 10 paid days per full calendar year to volunteer

    For non-sales roles, employees are also eligible to earn annual bonuses subject to Cisco's policies.

    Employees on sales plans earn performance-based incentive pay on top of their base salary, which is split between quota and non-quota components, subject to the applicable Cisco plan. For quota-based incentive pay, Cisco typically pays as follows:
    + 0.75% of incentive target for each 1% of revenue attainment up to 50% of quota;
    + 1.5% of incentive target for each 1% of attainment between 50% and 75%;
    + 1% of incentive target for each 1% of attainment between 75% and 100%; and
    + Once performance exceeds 100% attainment, incentive rates are at or above 1% for each 1% of attainment with no cap on incentive compensation.

    For non-quota-based sales performance elements such as strategic sales objectives, Cisco may pay 0% up to 125% of target. Cisco sales plans do not have a minimum threshold of performance for sales incentive compensation to be paid.
    The applicable full salary ranges for this position, by specific state, are listed below:

    New York City Metro Area: $165,000.00 - $277,600.00
    Non-Metro New York state & Washington state: $146,700.00 - $247,000.00

    * For quota-based sales roles on Cisco's sales plan, the ranges provided in this posting include base pay and sales target incentive compensation combined.
    ** Employees in Illinois, whether exempt or non-exempt, will participate in a unique time off program to meet local requirements.

    Cisco is an Affirmative Action and Equal Opportunity Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis. Cisco will consider for employment, on a case by case basis, qualified applicants with arrest and conviction records.
    $92k-112k yearly est. 46d ago
  • Site Reliability Engineer

    Matlen Silver 3.7 company rating

    Columbus, OH jobs

    Title: Senior Cloud Security Engineer/Architect
    Environment: Onsite
    Duration: 6 month contract to hire
    Contract pay: $68-$90/hour W2
    Conversion salary: $150k-$188k
    NO C2C
    ** Due to client requirements, US Citizen or GC Holder ONLY **

    Requirements
    - Minimum 13+ years of professional experience in Cloud Infrastructure, DevOps, or Site Reliability Engineering.
    - Strong Infrastructure as Code (IaC) expertise with Terraform: hands-on experience creating and managing EKS clusters, repositories, and Terraform modules.
    - Architect, implement, and manage Azure IaaS infrastructure encompassing VNets, subnets, network security groups, VPN gateways, CDNs, Traffic Manager, peering, custom routes, DNS, DHCP, and virtual appliances.
    - Proven proficiency across Azure and/or AWS (multi-cloud experience preferred).
    - Strong security mindset with practical experience in IAM, vulnerability remediation, encryption, and patching.
    - Solid understanding of DNS, Docker, Kubernetes, and containerization best practices.
    - Experience with Windows and Linux/Unix system and network administration (8+ years).
    - Proficiency in one or more programming/scripting languages: Python, Go, Bash, or Ruby.
    - Expertise in Terraform, Ansible, or Chef for automation and configuration management.
    - Hands-on experience with cloud services (AWS, Azure, GCP), including EC2, S3, Kubernetes, and serverless environments.
    - Knowledge of networking fundamentals: DNS, firewalls, load balancing, and VPNs.
    - Experience with container orchestration using Docker, Kubernetes, or OpenShift.
    - Experience with monitoring and observability tools such as Prometheus, Grafana, Datadog, or New Relic.
    - CI/CD pipeline development using Jenkins, GitLab CI, GitHub Actions, or CircleCI.
    - Bonus: Experience with HashiCorp Vault and advanced Terraform module design.
    - Deep understanding of access control, encryption standards, secure coding practices, and regulatory frameworks.
    - Skilled in incident management, root cause analysis, automation, and performance tuning.
    - Understanding of SLOs/SLAs, system scalability, redundancy, and resilience best practices.
    $150k-188k yearly 3d ago
  • Site Reliability Engineer

    Bcforward 4.7 company rating

    Jersey City, NJ jobs

    *Presently we are unable to sponsor and request applicants to apply who are authorized to work without sponsorship* (Can work only on W2)

    Below are a few details of the opportunity.

    Job Title: Software Engineering (SRE/DevOps/Windows Eng)
    Location: Jersey City, NJ 07310 - Onsite
    Duration: Contract to Hire

    Job Description:

    About Candidate:
    - End to end: development, deployment, automation, and monitoring using automated CI/CD pipelines
    - Working with SQL servers, Oracle
    - Most apps deployed on Windows servers (Windows stack: deployment of front-end web servers, application servers, and database servers)
    - Manage vendor applications
    - Experience with reporting
    - Observability is key: Grafana, dashboards, Dynatrace, SQL monitoring
    - Agile

    Skills (required):
    - Windows PowerShell: scripting / APIs (Postman, Swagger)
    - Automation (jewls PL); this is a CI/CD process
    $86k-115k yearly est. 1d ago
  • Site Reliability Engineer

    Pyramid Consulting, Inc. 4.1 company rating

    Roanoke, TX jobs

    Immediate need for a talented Site Reliability Engineer. This is a 12-month contract opportunity with long-term potential and is in Westlake, TX (Onsite). Please review the job description below and contact me ASAP if you are interested.

    Job Diva ID: 25-94846
    Pay Range: $60/hr - $65/hr. Employee benefits include, but are not limited to, health insurance (medical, dental, vision), 401(k) plan, and paid sick leave (depending on work location).

    Key Requirements and Technology Experience:

    Key Skills: Production Support, Python/Java, JavaScript, Kubernetes, AWS, SQL, DevOps, CI/CD, Observability tools (Datadog, Splunk, etc.).

    - Bachelor's Degree in Computer Science, Information Science (or equivalent)
    - Minimum 8 years of software engineering experience
    - Hands-on Linux experience preferred (user and permissions management, file systems, performance tuning)
    - Hands-on experience in AWS compute and storage services (AWS Lambda, S3, Glue, Route 53 & IAM), Kubernetes
    - Observability skills such as Datadog, Splunk, SLI/SLO and other tools
    - Some database and SQL experience (such as Oracle, MySQL, Postgres, or DynamoDB) required
    - Proficiency with Data Processing and ETL (Control-M & Informatica)
    - Experience with tools like Git, Maven, Jenkins, uDeploy, JIRA, Artifactory, Sonar
    - Proficiency with scripting languages like Python, Java, Bash or PowerShell preferred
    - Proven experience with CI/CD pipelines using technology such as Groovy, Jenkins, JenkinsCore and Urbancode Deploy preferred
    - Experience with ITSM (Incident, Change & Problem Management)
    - Automation skills
    - Excellent at communicating and building relationships across teams and technology partners
    - Confidence to work independently and with minimum supervision

    Our client is a leader in the financial industry, and we are currently interviewing to fill this and other similar contract positions. If you are interested in this position, please apply online for immediate consideration.

    Pyramid Consulting, Inc. provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, colour, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws.

    By applying to our jobs, you agree to receive calls, AI-generated calls, text messages, or emails from Pyramid Consulting, Inc. and its affiliates, and contracted partners. Frequency varies for text messages. Message and data rates may apply. Carriers are not liable for delayed or undelivered messages. You can reply STOP to cancel and HELP for help.
    $60 hourly 5d ago
  • Process & Reliability Engineer- Industrial Analytics

    Prometheus Group 3.9 company rating

    Raleigh, NC jobs

    Prometheus Group is a team of self-starters centered on being resourceful, accountable, and results-focused. Career progress is based on merit and not years of service or attaining certifications. Our drive and dedication to creating great products for our global customers are at the heart of all we do! In joining Prometheus, you will become a part of the largest global provider of comprehensive enterprise asset management (EAM) software solutions that support the management life cycle for equipment maintenance and operations. This position will provide an opportunity to join a successful, rapidly-growing software company that is backed by some of the most reputable private equity firms in the world such as Advent, LGP, and Genstar. An ideal candidate will bring the skills and aptitude necessary to manage the increasing complexity of the company's global operations driven by the company's continued expansion. Success in this role will provide opportunities for increased levels of responsibility within the company. Position Overview We have a truly unique opportunity to apply your engineering background in the exciting world of AI and Machine Learning! We're looking for a self-motivated, detail-oriented hands-on engineer who is passionate about analytics technology and enjoys customer interaction. Specifically, you will be responsible for remotely implementing Prometheus APM, monitoring plant operations, and working directly with customers to ensure they receive great value from our software. If you have an engineering background and would thrive on solving problems using a unique blend of applied engineering, cutting-edge technology, and customer interaction, this is the perfect opportunity to join the fast-paced tech industry. Responsibilities Implement Prometheus APM for new customers. Review and structure customer process data (pressure, temperature, flows, vibrations, etc.) Configure first-principle performance calculations (efficiency, effectiveness, etc.) 
Build machine learning models easily using Prometheus APM's automated model deployment tools. Monitor customer's plant operations Responsible for risk identification, customer escalation, and mitigation. Review alerts generated by Prometheus APM's AI and machine learning models Diagnose the root cause of operational deviations and collaborate with on-site personnel to develop and execute remediation plans Track the data in Prometheus APM to ensure the problem has been resolved Lead weekly/monthly/quarterly business reviews with customers, reinforcing Prometheus APM solutions to solve critical plant issues. Lead customers through adopting Prometheus APM while driving high ROI leading to subscription renewals. Facilitate customer relationships through video calls and occasional on-site visits. Gather and share customer feedback with the appropriate Product Teams, ensuring customer needs are heard throughout the product team. Identify incremental opportunities company-wide and works closely with Product and Sales Teams. Preferred Qualifications Bachelor's degree in Mechanical Engineering, Chemical Engineering, or Electrical Engineering 0-2 Years of engineering experience in industrial process plants (Power Generation, Pulp & Paper, Oil & Gas, Chemicals, etc). Demonstrated aptitude for adopting new technologies. Strong data analysis skills. Excellent customer relationship skills; ability to grow and retain accounts, build relationships and quickly spot and communicate potential risks and issues. Customer-focused, organized, excellence oriented, and personable. Problem Solver - the ability to get to the root cause of complicated operational issues. Strong written and verbal communication skills to effectively present operational findings and suggested actions. 
Benefits Overview We offer an attractive benefits program to meet the diverse needs of our teammates: Employee base HSA plan, dental, life and short-term disability coverage 100% paid for by Prometheus Group HSA & FSA plan options Retirement Savings with Generous Company Match & Immediate Vesting Gym membership to O2 Fitness Casual dress attire Half-Day Fridays Generous Paid Time Off Company Outings, Trips & Activities Prometheus Group is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, gender (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. #LI-onsite
    $89k-127k yearly est. 1d ago
  • Sr. Site Reliability Engineer (SRE)

    Avenue Code 3.5 company rating

    Mountain View, CA jobs

    About the Opportunity: We're seeking an experienced, highly collaborative SRE to partner with product teams and tackle our most critical infrastructure challenges. You'll be hands-on in designing, building, and operating our cloud platform-and driving the reliability, performance, and security that empower our engineering organization. Responsibilities: Infrastructure as Code & CI/CD: Automate provisioning and deployments with Terraform and integrate best-practice pipelines (GitHub Actions, ArgoCD, etc.). Reliability Engineering: Define SLIs/SLOs, manage error budgets, and build dashboards & alerts to proactively measure and improve system health. Security & Compliance: Enforce least-privilege IAM policies, automate vulnerability scans, and maintain audit logging for compliance. Monitoring & Observability: Instrument services with metrics, logs, and distributed tracing to enable rapid troubleshooting, aid teams in alerting, custom metrics, and dashboarding Incident Management: Own on-call rotations, lead real-time incident response, conduct post-mortems, and drive continuous improvements. Cost Optimization: Implement tagging strategies, right-size resources, and leverage concrete data to decide on optimal methods to control cloud spend at scale. Documentation & Mentorship: Author runbooks, standards, and best-practice guides-and coach dev teams on implementing modern DevOps, reliability, and security patterns. Required Qualifications: Have 5+ years of experience running production critical systems. Deep proficiency with the AWS Cloud and Cloud-Native best practices. Experience with Kubernetes (EKS, GKE) and Container Orchestration at scale. Skilled in Terraform to declaratively provision and maintain infrastructure services. Working knowledge of managing and debugging databases like Redis and Postgres. Strong familiarity with VPC, VPN, Load Balancing, and cloud networking components. 
Proficiency with Git workflows, branching strategies, and CI/CD system integrations. Solid understanding of web and network protocols and standards (HTTP, REST, TLS, DNS, etc.). Professional proficiency in English (both written and spoken) is required for this role. Nice to Have Skills: Bachelor's degree, or equivalent in Computer Science, Engineering, or a related field. Experience with ArgoCD, GitHub Actions, Jenkins, or other CI/CD pipeline solutions. Working knowledge of Python, Golang, and Helm templating languages. Node.js experience a plus, including running scalable, resilient Node microservices. Grasp of foundational security best practices for cloud infrastructure. Awareness of Terragrunt, managing Terraform state, and optimal project structure. Seasoned in production readiness fundamentals amidst a fast-moving team. Avenue Code reinforces its commitment to privacy and to all the principles guaranteed by the most accurate global data protection laws, such as GDPR, LGPD, CCPA and CPRA. The Candidate data shared with Avenue Code will be kept confidential and will not be transmitted to disinterested third parties, nor will it be used for purposes other than the application for open positions. As a Consultancy company, Avenue Code may share your information with its clients and other Companies from the CompassUol Group to which Avenue Code's consultants are allocated to perform its services.
    $144k-188k yearly est. 3d ago
  • Site Reliability Engineer

    Optomi 4.5 company rating

    Plano, TX jobs

    Optomi, in partnership with a leading technology operations center, is looking for an SRE - Cloud Platform to join their team in Plano, TX. 6 month contract to hire Onsite in Plano, TX 4x/week The SRE - Cloud Platform will be focused on operating and automating scalable, resilient AWS infrastructure. Working with core AWS services such as EKS, Lambda, CloudWAN, ECR, and Systems Manager, this role will drive self-healing automation, observability, and CI/CD pipeline integration. The role embodies SRE best practices to ensure reliability, performance, and operational excellence of cloud-native platforms supporting business-critical applications. This position will collaborate closely with Cloud Platform Development Teams, Production Engineering, and Major Incident Management teams to resolve production issues and improve infrastructure. What the right candidate will enjoy: Opportunity to work with cutting-edge AWS technologies. Collaborative and cross-functional team environment. Focus on automation, scalability, and operational excellence. What type of experience does the right candidate have: Solid understanding of SRE concepts: SLIs, SLOs, error budgets, incident response. Strong hands-on experience with AWS services such as EKS, Lambda, CloudWAN, and Systems Manager. Experience with infrastructure-as-code tools like Terraform and CloudFormation. Proficiency in scripting languages such as Python, Bash, or PowerShell. Familiarity with DevOps tools like GitHub, Harness, and Dynatrace. What the responsibilities are of the right candidate: Build and maintain components required to automate and self-heal AWS infrastructure. Develop and maintain infrastructure as code (IaC) using Terraform for scalable and repeatable deployments. Manage container orchestration platforms and related cloud-native services. Define and measure SLIs/SLOs, error budgets, and drive reliability improvements. 
Implement monitoring and observability using Dynatrace and AWS native services like CloudWatch. Participate in incident management, on-call rotations, and lead blameless postmortems. Collaborate cross-functionally to embed SRE principles into cloud platform design and operation. Troubleshoot network issues and manage cloud routing. Added bonus if you have: Certifications like AWS Certified DevOps Engineer or AWS Certified Solutions Architect. Knowledge of integration tools and technologies like MuleSoft, Camel, and message streaming services.
    $87k-122k yearly est. 5d ago
  • Life Safety Systems Engineer - 25-02807

    Datasoft Technologies, Inc. 4.2 company rating

    Tempe, AZ jobs

    Life Safety Systems Engineer

    About the Job

    Duration: 12-month contract with possibility of extension

    Job Qualifications
    - Bachelor's Degree in Engineering with the ability to attain PE licensure
    - At least 4 years of AutoCAD design experience
    - Knowledge of IFC, IBC, NFPA 101 Life Safety Code and NFPA 72 Fire Alarm and Signaling Code
    - Knowledge and application of NFPA 70 (NEC) and electrical design requirements
    - Knowledge and application of NFPA 400 Hazardous Materials Code
    - Knowledge of Fire Alarm Systems and manufacturers' equipment, including VESDA systems
    - Design experience with Fire Alarm, VESDA systems, security CCTV, HPM and Leak Detection
    - Knowledge of design and construction specifications
    - Experience estimating and scheduling mid-to-large-scale projects

    Ideally, you'll also have:
    - Professional Engineering (PE) license
    - Experience leading projects and managing resources
    - Experience mentoring junior staff
    - Design Management experience
    - Design experience in semiconductor or data centers
    - Working knowledge of 3D BIM software (Revit)

    Job Description

    We're looking for a Fire Protection / Life Safety Systems (LSS) Engineer who is excited about working on projects that enable the heart of our client's business. The ideal candidate will be located in AZ, OR, TX or PA; however, qualified candidates located elsewhere in the U.S. will also be considered. As an LSS Engineer working with our teams, you'll use your skills and experience to help our clients by providing code-compliant asset protection and fire life safety mitigation solutions. You will be expected to both provide guidance to co-workers and receive solutions from various senior design professionals so that you can take ownership of various projects as the engineer of record. These fire solutions entail fire sprinklers, gaseous agents, fire detection, toxic gas detection, smoke modeling, and explosion prevention.
You will interface directly with client counterparts in the designing of their facilities and state and local code officials. Utilizing AutoCAD and Revit MEP, our teams create 3D models in coordination with architects and engineers. From those 3D BIM models you'll design fire sprinkler systems, fire alarm systems, smoke and leak detection, security and intrusion detection systems, and more. Using the various applicable Life Safety & Fire codes as well as industry standards you will provide calculations; equipment size and quantities; equipment and construction specifications; network diagrams and various construction deliverables in both 2D and 3D platforms for a constructible design. About our Company DataSoft Technologies is a highly recognized provider of professional IT Consulting services in the US. Founded in 1994, DataSoft Technologies, Inc. provides staff augmentation services for Information Technology and Automotive Services. Our team member benefits include: Paid Holidays/Paid Time Off (PTO) Medical/Dental Insurance Vision Insurance Short Term/Long Term Disability Life Insurance 401 (K)
    $71k-101k yearly est. 1d ago
  • Site Reliability Engineer

    Optomi 4.5 company rating

    Orlando, FL jobs

    Site Reliability Engineer - (Hybrid, Orlando FL) Optomi, in partnership with a leading enterprise organization, is seeking a Site Reliability Engineer (SRE) to join a cloud-focused engineering team supporting large-scale, customer-facing systems. This role requires onsite presence two days per week in Orlando, FL. The ideal candidate is a strong cloud engineer with AWS expertise, hands-on Terraform experience, solid scripting skills, and the confidence to communicate clearly with stakeholders and executive leadership during high-pressure situations. What the Right Candidate Will Enjoy! Working in a modern cloud environment with primary focus on AWS and exposure to GCP and Azure! Supporting enterprise-scale systems with real business impact! Participating in incident bridge calls and collaborating directly with leadership! Maintaining and improving existing Infrastructure as Code environments! Joining a small, highly collaborative SRE/DevOps-focused team! Having autonomy, trust, and visibility while contributing to critical initiatives! Experience of the Right Candidate: Strong hands-on experience supporting AWS cloud environments. Experience working with GCP and/or Azure in an enterprise setting. Hands-on experience maintaining and modifying existing Terraform infrastructure. Comfortable scripting and troubleshooting code-related issues (Python, Bash, Node.js, or similar). Experience using monitoring and observability tools such as Splunk, CloudWatch, Grafana, or AppDynamics. Ability to clearly communicate technical issues to both technical and non-technical audiences. Confidence speaking on calls with large groups, including stakeholders and leadership. Experience working in on-call or incident-response environments. Responsibilities of the Right Candidate: Maintain, support, and optimize cloud infrastructure across AWS, GCP, and Azure environments. Work with existing Terraform and Atlantis configurations to support infrastructure needs. 
Troubleshoot infrastructure, application, and CI/CD-related issues. Participate in incident bridge calls and provide clear status updates to leadership. Support load balancers, containerized workloads, and cloud-native services. Collaborate with application teams to identify whether issues are infrastructure- or code-related. Utilize monitoring and alerting tools to ensure system performance and reliability. Communicate effectively with engineers, stakeholders, and executives during incidents and projects. Monitoring, Tooling & Cloud Exposure: AWS services including EC2, ECS, EKS, Fargate, Lambda, API Gateway, S3, ALB/ELB, VPC, IAM, and KMS. Google Cloud Platform services including App Engine, Kubernetes, Cloud Functions, and IAM. Infrastructure as Code using Terraform (existing configurations). Monitoring and observability tools including Splunk, CloudWatch, Grafana, and AppDynamics. Configuration and automation tools such as Chef, Ansible, Rundeck, and Vault. Message queuing technologies including RabbitMQ and Pub/Sub. Preferred Qualifications: Experience supporting load balancers and high-traffic systems. Background in SRE or DevOps-oriented teams. Experience working in hybrid cloud and on-prem environments. Strong Linux or Windows systems administration background. Enterprise experience supporting customer-facing applications.
    $79k-114k yearly est. 1d ago
  • Site Reliability Engineer

    Optomi 4.5 company rating

    Irving, TX jobs

    Optomi, in partnership with our client, is seeking an experienced SRE II to join their team for a 6-month contract-to-hire opportunity that is hybrid, with 2 days onsite in Irving, TX. W2 only - no C2C/sponsorship at this time.

    We are seeking a highly skilled Site Reliability Engineer II to join our engineering organization. This role focuses on building resilient, scalable, and automated systems, not traditional production support. The ideal candidate has hands-on engineering experience across cloud infrastructure, observability, automation, and reliability-focused development. You will work closely with development, cloud engineering, and platform teams to ensure high availability, optimal performance, and operational excellence of critical customer-facing applications.

    Key Responsibilities
    - Contribute directly to the reliability, scalability, performance, and security of critical applications.
    - Build reusable services, automation, and frameworks that improve platform stability and developer velocity.

    Cloud & Platform Engineering
    - Design and enhance cloud infrastructure using Azure services including Azure Service Bus, Event Hub, Azure SQL, AKS (Azure Kubernetes Service), Function Apps, and App Services.
    - Implement and manage Infrastructure as Code (IaC) using Terraform.

    Containerization & Orchestration
    - Build and deploy containerized applications using Docker (2-3+ years).
    - Support Kubernetes workloads via AKS, including scaling, upgrades, and cluster reliability improvements.

    Development & DevOps
    - Collaborate with development teams using a working knowledge of .NET.
    - Improve CI/CD workflows using Azure DevOps (ADO).

    Monitoring, Observability & Incident Response
    - Implement and optimize monitoring and alerting strategies.
    - Use Splunk Observability Cloud (preferred) or equivalent observability platforms to enhance visibility and reduce MTTR.
    - Drive proactive incident identification, root-cause analysis, and long-term fixes.

    Performance, Reliability & Scalability Enhancements
    - Design and implement SLOs, SLIs, and error budgets.
    - Develop auto-scaling policies, failover strategies, and disaster recovery procedures.
    - Optimize application and database performance to ensure reliability across high-traffic, mission-critical systems.

    Required Qualifications
    - 3-5+ years of hands-on SRE experience
    - Bachelor's degree in Computer Science, Engineering, or a related technical field (or equivalent experience)
    - Master's degree preferred
    - Hands-on experience with:
      - Azure Cloud (AKS, Service Bus, Event Hub, SQL, Function Apps, App Services)
      - Terraform
      - Docker
      - Azure DevOps
      - Monitoring tools (Splunk Observability Cloud preferred)
      - .NET ecosystem (understanding of development fundamentals)

    Preferred Skills
    - Experience designing resilient, distributed systems
    - Strong troubleshooting and analytical skills
    - Performance tuning across applications, databases, and cloud services
    - Experience improving uptime, latency, throughput, or cost efficiency of production applications
    - Familiarity with SRE principles and modern operational practices
    $87k-122k yearly est. 1d ago
  • Process Engineer HLK

    U.S. Tsubaki Power Transmission, LLC 4.2 company rating

    Holyoke, MA jobs

    The TSUBAKI name is synonymous with excellence in quality, dependability, and customer service. U.S. Tsubaki is a leading manufacturer and supplier of power transmission and motion control products. As a part of a vast, international network of corporate and industrial resources, Tsubaki offers its customers the finest state-of-the-art products available in the world and we strive to be the "Best Value" supplier in the industry. Essential Duties and Responsibilities: The essential duties and responsibilities of this job are included but not limited to this job description - other tasks may be assigned and expected to be performed. Prepare supporting data and documentation for Capital Appropriation approval request forms Create and maintain shop routings for components and chain assemblies. Create product structures by assigning and calculating raw material for components. Provide tooling cost for job quotations. Provide run time standards and estimates for job quotations. Maintain new and existing tooling database. Maintain records for wastewater treatment, discharge, chemical purchasing and usage. Manage database for item master and item site planning information. Manage the database for work centers/departments. Installation and maintenance of all cost and performance standards. Perform occupied time studies for machine and labor run time standards. Perform cost analysis of product Analyze manufacturing processes and determine Return on Investment (ROI) for project justification. Participate in Design Review meetings with Product and Design Engineers. Respond to customer inquiries, escalating manufacturing and delivery issues as appropriate. Other tasks, projects and functions as assigned. Requirements: Bachelor's Degree in industrial or Manufacturing Engineering preferred. 4 or more years of related work experience. 
Knowledge of lean manufacturing and an understanding of rates and cycle time. Proficient in Microsoft Office. Print interpretation including GD&T. Knowledge of engineering principles. Ability to travel, if needed. Learn more about U.S. Tsubaki at: ************************* U.S. Tsubaki offers a competitive compensation and benefits package, including health benefits effective on date of hire, dental and vision benefits effective on the first of the month following date of hire, Paid Time Off ("PTO"), 10 paid holidays, generous 401(k) match and profit sharing, annual bonus potential, life insurance, short and long-term disability, flexible spending accounts, commuter benefits, education reimbursement, home and auto insurance discounts, and pet insurance. The estimated salary range is meant to reflect an anticipated salary range for the position. We may pay more or less than the anticipated range based upon market data and other factors, all of which are subject to change. Individual pay is based on location, skills and expertise, experience, and other relevant factors. Tsubaki is an Equal Opportunity Employer - Minorities/Females/Veterans/Disability. Compensation details: 80000-100000 Yearly Salary
    $72k-127k yearly est. 6d ago
  • Senior Site Reliability Engineer

    Optomi 4.5 company rating

    Winter Park, FL jobs

    Optomi, in partnership with a leading organization, is looking for a Sr. Site Reliability Engineer (SRE) to join their team. 4-month contract with possibility to extend. W2 only; no sponsorship offered, USC/GC holders only. Hybrid onsite 2 days a week in Winter Park, FL. Position Summary: This role focuses on maintaining and optimizing cloud environments, primarily in AWS (80%), with some exposure to GCP (15%) and Azure (5%). The Site Reliability Engineer will work on existing Terraform infrastructure, ensuring systems are efficient and functional, without the need to write Terraform from scratch. Candidates should be comfortable scripting, using tools like Splunk, and speaking in front of stakeholders and leadership, including calls with over 100 attendees. What you will enjoy: Working in a dynamic cloud environment with exposure to multiple platforms (AWS, GCP). Opportunities to collaborate with leadership and stakeholders. A supportive team environment with potential for contract extension. What you can bring: Strong expertise in AWS and GCP cloud environments. Proficiency in scripting. Experience maintaining and modifying existing Terraform infrastructure. Familiarity with monitoring tools like Splunk. Confidence speaking in front of stakeholders and large groups.
Terraform/Atlantis; AWS Cloud (Fargate, ECS, Lambdas, ApiGateways, EC2, S3, ALB/ELB, Elasticache, EKS, KMS-Secret Manager, VPCs, IAM); Logging/Monitoring/Alerting (Cloudwatch/Splunk/AppDynamics/Elasticache/Grafana); Rundeck, Chef, Ansible, Vault; Message Queueing: RabbitMQ, PubSub; Google Cloud Platform (App Engine, Kubernetes (Helm/Tiller), Cloud Functions, Firebase, IAM); Load balancers; Languages: Go/Python/Node.js (Angular.js framework)/Java (Spring MVC) Technical Qualifications: • Technical experience in consumer and employee facing enterprise systems • Ability to troubleshoot applications and systems • Expertise in maintaining web, caching and queuing technologies in large high traffic environments. • Expertise in architecting highly scalable and highly available systems • Expertise in a public cloud (AWS or Google's GCP, Azure) • Proficiency with containerization • Proficiency in a programming language • Proficiency with distributed version control systems (for example GIT) with Continuous Integration/Deployment techniques. • Proficiency supporting SQL and NoSQL technologies
    $94k-133k yearly est. 1d ago
  • ONLY W2 & LOCAL CANDIDATES :: Quality Engineer in Sunnyvale, CA

    Infotree Global Solutions 4.1 company rating

    Sunnyvale, CA jobs

    Key Skills: Full Stack Validation & Hands-On Testing, Automation Scripting and Tool Development, Data Collection & Analysis, DevOps Required Experience: Proven experience with full stack validation, test case development, and test strategy creation. Proven experience running data collection protocols and managing datasets. Strong experience with Python (and ideally Bash) scripting for data processing and automation. Data analysis and statistical interpretation skills. Meticulous attention to detail with excellent written and verbal communication.
    $85k-113k yearly est. 2d ago
  • Process Engineer

    Pyramid Consulting, Inc. 4.1 company rating

    San Antonio, TX jobs

    Immediate need for a talented Process Engineer. This is a 12+ Months contract opportunity with long-term potential and is in San Antonio, TX (Remote). Please review the job description below and contact me ASAP if you are interested. Job Diva ID: 25-95050 Pay Range: $45 - $50 /hour. Employee benefits include, but are not limited to, health insurance (medical, dental, vision), 401(k) plan, and paid sick leave (depending on work location). Key Responsibilities: Work closely with operations to perform value stream mapping, conduct time motion study to identify opportunities for work process improvements, and suggest quality and productivity improvement measures to enhance operational efficiency Perform Lean Six Sigma review to map current state, capture team utilization and throughput to identify Non-Value Adds (NVA), process gaps and redundancies, hand-offs, and perform cost-benefit analysis for deploying process improvement initiatives Develop and maintain scorecard and other tools to measure the success of process improvement initiatives Create and periodically review metrics to capture operational performance more efficiently Provide training/working sessions to junior team members on various business process analysis and process improvement approaches using business process reengineering and improvement (BPR/BPI) tools Provide guidance to project/program managers on the most effective, efficient use of resources to ensure consistent productivity across teams Facilitate workshops for operations managers focused on critical thinking, brainstorming, and business process design to capture improvement opportunities Key Requirements and Technology Experience: Key skills: Process Engineering, Process Improvement, Risk Management, Controls, Jira, Visio, Banking Industry. 2 Years Banking 2 Year Risk and Controls What are the top three technology tools this resource must have knowledge of using? 
MS Visio, PowerPoint, JIRA Can you please provide a brief description of the daily duties using the technology tools above? Responsible for reviewing assigned controls to identify process efficacies, potential redundancies, and opportunities for automation. Please provide any details that will be helpful to find the right candidate for the job. Must have banking and risk, and control experience. Experience just indicating where risks and controls reside in the process is not sufficient. Our client is a leading Banking Industry, and we are currently interviewing to fill this and other similar contract positions. If you are interested in this position, please apply online for immediate consideration Pyramid Consulting, Inc. provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, colour, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws. By applying to our jobs, you agree to receive calls, AI-generated calls, text messages, or emails from Pyramid Consulting, Inc. and its affiliates, and contracted partners. Frequency varies for text messages. Message and data rates may apply. Carriers are not liable for delayed or undelivered messages. You can reply STOP to cancel and HELP for help. You can access our privacy policy here.
    $45-50 hourly 4d ago
  • Business Intelligence Engineer

    Comrise 4.3 company rating

    Foster City, CA jobs

    Foster City, CA (On-Site) Contract | 6-12 Months | $90-100/hr About the Role We're an autonomous mobility company building an on-demand, driverless ride-hailing service-and we're looking for a Business Intelligence Engineer to help power the insights behind our safety, operations, and commercial readiness efforts. In this role, you'll partner closely with data scientists, engineers, and operational leaders to build scalable data models, high-impact dashboards, and reliable metrics that support informed, data-driven decisions. What You'll Do Partner with technical and non-technical teams to gather requirements and deliver automated, actionable BI solutions. Design, build, and maintain data models, datamarts, and ETL/ELT pipelines. Collaborate with data scientists and engineers to define consistent and trustworthy metrics. Develop dashboards and visualizations that drive operational insights and support leadership decisions. Enable self-service analytics and promote data literacy across the organization. Ensure reporting best practices-data integrity, validation, documentation, and scalability. Translate business needs into well-structured data assets under fast-paced timelines. Ideal Candidate Profile dbt certification or strong hands-on experience with dbt. Experience with Airflow for workflow orchestration. Strong background in analytics engineering, SQL, and dimensional data modeling. Full-stack BI skill set: ~40-50% dashboarding and ~50-60% backend datamart development. Proven ability to build and maintain datamarts-not just frontend dashboards. Skilled in creating self-serve dashboards and working directly with stakeholders. Must have Looker (not Looker Studio) experience, including LookML modeling. Required Skills 6+ years of relevant industry experience. Degree or background in Computer Science, Engineering, Applied Math, Statistics, or similar. High proficiency in SQL, dbt, and data modeling. Expertise in Looker and BI best practices. 
Strong communication and collaboration skills. Interview Process Coding Assessment 30-minute Zoom interview with Hiring Manager 1.5-hour technical panel interview
    $90-100 hourly 5d ago
  • Validation Engineer

    Collabera 4.5 company rating

    Palo Alto, CA jobs

    Role: Firmware Validation Software Engineer Type: Contract to Hire Pay Range: $48-$53/hr. Mission: This will be part of the supercharger team and will be responsible for testing our EV charger features to ensure the quality and safety of the charging experience for both client owners and third party EVs. You will architect, design, and implement firmware validation procedures, equipment, tooling, and automation to efficiently test charging components and subsystems. You will work closely with development and integration teams to explore and validate the performance capabilities of our hardware and firmware to ensure code quality is high. Must Haves: Degree in Electrical Engineering, Computer Engineering, or a related technical field, or equivalent experience Experience in embedded systems validation, firmware testing, or related fields Hands-on expertise with hardware debugging tools (oscilloscopes, protocol analyzers, etc.) Strong understanding of software development in systems languages (e.g. C, C++, Rust), Linux software architecture, embedded firmware (e.g. RTOS) Ability to translate complex requirements into scalable test solutions Day-to-Day Design and deploy advanced automated test frameworks for embedded Linux and RTOS-based products Develop software-in-the-loop (SIL) and hardware-in-the-loop (HIL) test systems using tools like oscilloscopes, logic analyzers, and custom automation Create actionable test reports to track code coverage, regression metrics, and release readiness Reverse-engineer complex systems to identify edge cases and failure modes Collaborate with cross-functional teams to refine validation strategies and troubleshoot issues Drive adoption of best practices for test automation, CI/CD, code robustness, and infrastructure scalability Plusses: communication protocols (Ethernet, CAN, RS485, etc.)
The Company offers the following benefits for this position, subject to applicable eligibility requirements: medical insurance, dental insurance, vision insurance, 401(k) retirement plan, life insurance, long-term disability insurance, short-term disability insurance, paid parking/public transportation, paid time off, paid sick and safe time, hours of paid vacation time, weeks of paid parental leave, and paid holidays annually - as applicable.
    $48-53 hourly 2d ago
  • Business Process Engineer

    Logisolve 3.6 company rating

    Madison, WI jobs

    No third-party vendors will be accepted. Please do not respond/reach out. Logisolve is seeking a Manager, Business Process Engineering for a 6-month contract-to-hire position with our direct Healthcare company. The Manager collaborates closely with cross-functional leaders, vendors and employees at all levels. We have adopted a holistic approach to Lean Six Sigma where we identify Process Owners for ongoing continuous improvement in seven key Value Streams. Each Value Stream is supported by a Manager and Business Consultant that evaluates, documents, designs, manages, and monitors the end-to-end processes and underlying systems through the continuous application of Lean principles. This position plays a leading role in transitioning to implementation to ensure changes stick, using additional skills in project management and organizational change management (OCM). This position may have multiple headcount accountability. Qualifications: Bachelor's degree or equivalent experience in related field, plus 7+ years of related work experience beyond degree within Business Process Management (BPM), Business Analytics, Program and Project Management, Business Operations, etc. Health Plan/Payer and/or Healthcare experience (7+ years required) Platform migration experience preferred. Knowledge of Healthrules and/or Cosmos preferred Proven leadership ability across large cross functional teams required Continuous improvement and implementation experience required Program and Project Management experience required Required License/Certification: Lean Six Sigma Black Belt required Preferred Qualifications: Demonstrated experience managing day-to-day supervision of Business Process Consultants Mastery over all Six Sigma concepts and tools including Value Stream Mapping, Kaizen events, A3, Kanban boards, 5 Whys, FMEA, etc.
Functional understanding of Agile methodology preferred Hands-on change management experience preferred Skills and Abilities: Client focused program, project, and process management experience including operational and cross-functional workflows Proven ability to formulate content and present clearly both internally and externally Experience creating, building, and leading cross functional teams from conception through implementation Advanced level of proficiency with Microsoft Teams, Visio, Smartsheet, PowerPoint & SharePoint Experience working with Business and Technology to design future state Hands on experience doing and driving work - not coaching Logisolve offers medical, dental, vision, life insurance, short-term disability, long-term disability, paid sick leave, and retirement benefits to eligible employees.
    $69k-99k yearly est. 5d ago
  • Silicon Validation Engineer

    Vdart 4.5 company rating

    Raleigh, NC jobs

    Manage lab servers, test benches, and associated hardware infrastructure Perform system provisioning, installation, configuration, and upgrades Support daily lab operations including: Hardware bring-up Firmware flashing Network connectivity Monitor lab health and promptly resolve system or network issues Maintain lab inventory and manage access control Develop automation scripts using PowerShell, Python, or similar scripting languages to improve efficiency Collect and analyze logs using Kusto (KQL), Splunk, or equivalent tools for troubleshooting and data analytics Collaborate with cross-functional teams to ensure lab readiness for validation and development activities Document lab processes, configurations, and troubleshooting steps Support on-call or after-hours lab issues when required Mandatory Skills: PowerShell D2D SERDES
    $68k-90k yearly est. 4d ago
  • Process Engineer

    Talent Software Services 3.6company rating

    Pittsburgh, PA jobs

    Are you an experienced Process Engineer with a desire to excel? If so, then Talent Software Services may have the job for you! Our client is seeking an experienced Process Engineer to work at their company in Pittsburgh, PA. Primary Responsibilities/Accountabilities: Value Stream Assessment & Process Design End-to-end value stream definition, assessment and mapping Value stream diagnostic, including process improvement opportunities, pain-point identification, process waste, etc. Executing and interpreting process mining outputs Automation and AI-enabled workflow design and use case identification KPI framework and metric design Reporting design, including statistical process control Apply horizontal and systems thinking to design efforts Apply lean tools (e.g., poka-yoke, RCA) to ensure future state process excellence and continuous improvement Cross Team Support Technology requirements identification and gathering Workshop facilitation Change partnership Identify risks and issues Qualifications: Business process modelling Automation / AI BPM tools (e.g., Visio, ARIS, etc.) Process mining Statistical process control Lean tools and techniques Healthcare domain knowledge (preferred) Scrappiness and sense of urgency 7+ years in process and operational excellence MBA or equivalent (preferred) Lean / Six Sigma Black Belt or MBB (preferred)
    $61k-81k yearly est. 4d ago
  • Manufacturing Diagnostics Engineer

    Comrise 4.3 company rating

    Tuscaloosa, AL jobs

    Company is helping our client find a Manufacturing Diagnostics Engineer to support the bring-up, validation, testing, and troubleshooting of the company's products, as well as the development and implementation of diagnostic systems and infrastructure. In this role, you'll develop, refine, and scale test processes for the electrical and firmware systems of the company's primary drive module subassembly, which is ultimately assembled onto the vehicle. The ideal candidate has strong debugging and diagnostic skills, hands-on experience with automotive networking and firmware testing, and enjoys working cross-functionally in a fast-paced manufacturing environment. As a Manufacturing Diagnostics Engineer, you will: Serve as the first line of defense for electrical, firmware, and functional issues on the drive module manufacturing assembly, including software, firmware, harnessing, networking (CAN, LIN, Automotive & Standard Ethernet, GMSL), hardware, and infrastructure. Troubleshoot and resolve electrical and firmware issues at the manufacturing supplier (e.g., flashing firmware, updating test sequences, repairing harness connectors, terminals, and wires). Work closely with multidisciplinary engineering teams to interpret component- and vehicle-level requirements and translate them into scalable, system-level validation test scripts and cases. Develop work instructions, troubleshooting guides, and workaround documentation for operators on the manufacturing line. Design and implement next-generation diagnostic architectures to support higher-volume production in future manufacturing lines. Daily tasks Responsibilities Serve as the first line of defense for all electrical and functional issues on the drive module manufacturing assembly including but not limited to software / firmware, harness, networking (CAN, LIN, Automotive and Standard Ethernet, GMSL, etc.), hardware, and infrastructure issues.
Resolve electrical and firmware issues at manufacturing supplier by flashing firmware, updating test sequences, repairing harness connectors / terminals / wires, etc. Work closely with multidisciplinary engineers to collect and interpret the component and vehicle level requirements and translate them into scalable system-level validation test scripts and test cases Develop work instructions and troubleshooting / workaround guides for operators on the Manufacturing line Design and implement future manufacturing lines involving the next generation diagnostic architecture to support higher volume production Required skills Qualifications Bachelor's degree in a relevant area such as Electrical Engineering or Computer Science. Very strong troubleshooting and debugging skills and experience, including reading, troubleshooting, and injecting packets over various network protocols such as CAN, LIN, and Ethernet 2-4 years of experience with automotive controller integration testing or test script development / programming in high-level languages such as Python Experience with Github or similar tools for software management Strong background in Linux and shell / bash / terminal scripting Bonus Qualifications Master's degree in a relevant area such as Computer Science or Engineering Experience with integration and automation of manufacturing and test equipment Background in creating, reading, and writing to tables in SQL database Experience with computer network engineering, TCP / IP protocols, communication over API Business driver of role As a Manufacturing Diagnostics Engineer, you will be a part of the team that executes processes to bring up, validate, test, and troubleshoot the company products as well as implements diagnostic systems / infrastructure. You will develop and improve our processes for testing the electrical and firmware systems of the primary drive module subassembly that is ultimately assembled onto the company vehicle.
    $67k-89k yearly est. 4d ago
