Data engineer jobs in Santa Rosa, CA

- 3,442 jobs
  • Engineer

    The Timbri

    Data engineer job in San Francisco, CA

    Compensation Type: Hourly

    Highgate Hotels: Highgate is a premier real estate investment and hospitality management company widely recognized as an innovator in the industry. Highgate is the dominant player in U.S. gateway markets including New York, Boston, Miami, San Francisco and Honolulu, with a rapidly expanding presence in Europe, Latin America, and the Caribbean. Highgate's portfolio of global properties represents an aggregate asset value exceeding $20B and generates over $5B in cumulative revenues. The company provides expert guidance through all stages of the hospitality property cycle, from planning and development through recapitalization or disposition. Highgate also has the creativity and bandwidth to develop bespoke hotel brands and utilizes industry-leading proprietary revenue management tools that identify and predict evolving market dynamics to drive outperformance and maximize asset value. With an executive team consisting of some of the industry's most experienced hotel management leaders, the company is a trusted partner for top ownership groups and major hotel brands. Highgate maintains corporate offices in London, New York, Dallas, and Seattle.

    Location:

    Overview: Respond and attend to guest repair requests. Communicate with guests/customers to resolve maintenance issues. Perform preventive maintenance on tools and kitchen and mechanical room equipment, including cleaning and lubrication. Visually inspect tools, equipment, or machines.

    Responsibilities: Assist with the operation, maintenance, and repair of equipment. Change out light bulbs; perform preventative maintenance for guest rooms, including vinyl repair, touch-up paint, minor furniture repair, tub caulking, tile repairs, etc. Perform preventative maintenance for ice machines, refrigerators, kitchen equipment, laundry equipment, HVAC, guestrooms, meeting rooms, the swimming pool, and hot tub. Perform plumbing repair, laundry equipment repair, and preventative maintenance on all exhaust fans and supply; monitor energy conservation; repair vacuum cleaners and any other small equipment upon request. Immediately follow up on any alarms to determine the exact location and cause; determine emergency status and report to the Front Desk with findings. Perform other tasks/jobs as assigned by the supervisor or manager.

    Qualifications: Experience in a hotel or a related field preferred. High School diploma or equivalent required. Licensed in a trade preferred (plumbing, electrical, HVAC, carpentry, etc.). Must have a valid driver's license for the applicable state. Must have an acceptable MVR (Motor Vehicle Driving Record), property specific.
    $95k-138k yearly est. Auto-Apply 4d ago
  • Staff Data Scientist

    Quantix Search

    Data engineer job in Santa Rosa, CA

    Staff Data Scientist | San Francisco | $250K-$300K + Equity

    We're partnering with one of the fastest-growing AI companies in the world to hire a Staff Data Scientist. Backed by over $230M from top-tier investors and already valued at over $1B, they've secured customers that include some of the most recognizable names in tech. Their AI platform powers millions of daily interactions and is quickly becoming the enterprise standard for conversational AI. In this role, you'll bring rigorous analytics and experimentation leadership that directly shapes product strategy and company performance.

    What you'll do:
      • Drive deep-dive analyses on user behavior, product performance, and growth drivers
      • Design and interpret A/B tests to measure product impact at scale
      • Build scalable data models, pipelines, and dashboards for company-wide use
      • Partner with Product and Engineering to embed experimentation best practices
      • Evaluate ML models, ensuring business relevance, performance, and trade-off clarity

    What we're looking for:
      • 5+ years in data science or product analytics at scale (consumer or marketplace preferred)
      • Advanced SQL and Python skills, with strong foundations in statistics and experimental design
      • Proven record of designing, running, and analyzing large-scale experiments
      • Ability to analyze and reason about ML models (classification, recommendation, LLMs)
      • Strong communicator with a track record of influencing cross-functional teams

    If you're excited by the sound of this challenge, apply today and we'll be in touch.
    $250k-300k yearly 3d ago
  • Data Scientist

    Skale 3.7 company rating

    Data engineer job in San Francisco, CA

    We're working with a Series A health tech start-up pioneering a revolutionary approach to healthcare AI, developing neurosymbolic systems that combine statistical learning with structured medical knowledge. Their technology is being adopted by leading health systems and insurers to enhance patient outcomes through advanced predictive analytics.

    We're seeking Machine Learning Engineers who excel at the intersection of data science, modeling, and software engineering. You'll design and implement models that extract insights from longitudinal healthcare data, balancing analytical rigor, interpretability, and scalability. This role offers a unique opportunity to tackle foundational modeling challenges in healthcare, where your contributions will directly influence clinical, actuarial, and policy decisions.

    Key Responsibilities
      • Develop predictive models to forecast disease progression, healthcare utilization, and costs using temporal clinical data (claims, EHR, laboratory results, pharmacy records)
      • Design interpretable and explainable ML solutions that earn the trust of clinicians, actuaries, and healthcare decision-makers
      • Research and prototype innovative approaches leveraging both classical and modern machine learning techniques
      • Build robust, scalable ML pipelines for training, validation, and deployment in distributed computing environments
      • Collaborate cross-functionally with data engineers, clinicians, and product teams to ensure models address real-world healthcare needs
      • Communicate findings and methodologies effectively through visualizations, documentation, and technical presentations

    Required Qualifications
      • Strong foundation in statistical modeling, machine learning, or data science, with preference for experience in temporal or longitudinal data analysis
      • Proficiency in Python and ML frameworks (PyTorch, JAX, NumPyro, PyMC, etc.)
      • Proven track record of transitioning models from research prototypes to production systems
      • Experience with probabilistic methods, survival analysis, or Bayesian inference (highly valued)

    Bonus Qualifications
      • Experience working with clinical data and healthcare terminologies (ICD, CPT, SNOMED CT, LOINC)
      • Background in actuarial modeling, claims forecasting, or risk adjustment methodologies
    $123k-171k yearly est. 1d ago
  • AI Data Engineer

    Hartleyco

    Data engineer job in Santa Rosa, CA

    Member of Technical Staff - AI Data Engineer
    San Francisco (In-Office)
    $150K to $225K + Equity

    A high-growth, AI-native startup coming out of stealth is hiring AI Data Engineers to build the systems that power production-grade AI. The company has recently signed a Series A term sheet and is scaling rapidly. This role is central to unblocking current bottlenecks across data engineering, context modeling, and agent performance.

    Responsibilities:
      • Build distributed, reliable data pipelines using Airflow, Temporal, and n8n
      • Model SQL, vector, and NoSQL databases (Postgres, Qdrant, etc.)
      • Build API and function-based services in Python
      • Develop custom automations (Playwright, Stagehand, Zapier)
      • Work with AI researchers to define and expose context as services
      • Identify gaps in data quality and drive changes to upstream processes
      • Ship fast, iterate, and own outcomes end-to-end

    Required Experience:
      • Strong background in data engineering
      • Hands-on experience working with LLMs or LLM-powered applications
      • Data modeling skills across SQL and vector databases
      • Experience building distributed systems
      • Experience with Airflow, Temporal, n8n, or similar workflow engines
      • Python experience (API/services)
      • Startup mindset and bias toward rapid execution

    Nice To Have:
      • Experience with stream processing (Flink)
      • dbt or Clickhouse experience
      • CDC pipelines
      • Experience with context construction, RAG, or agent workflows
      • Analytical tooling (Posthog)

    What You Can Expect:
      • High-intensity, in-office environment
      • Fast decision-making and rapid shipping cycles
      • Real ownership over architecture and outcomes
      • Opportunity to work on AI systems operating at meaningful scale
      • Competitive compensation package
      • Meals provided plus full medical, dental, and vision benefits

    If this sounds like you, please apply now.
    $150k-225k yearly 5d ago
  • Data Engineer

    Midjourney

    Data engineer job in Sonoma, CA

    Midjourney is a research lab exploring new mediums to expand the imaginative powers of the human species. We are a small, self-funded team focused on design, human infrastructure, and AI. We have no investors, no big company controlling us, and no advertisers. We are 100% supported by our amazing community. Our tools are already used by millions of people to dream, to explore, and to create. But this is just the start. We think the story of the 2020s is about building the tools that will remake the world for the next century. We're making those tools, to expand what it means to be human.

    Core Responsibilities:
      • Design and maintain data pipelines to consolidate information across multiple sources (subscription platforms, payment systems, infrastructure and usage monitoring, and financial systems) into a unified analytics environment
      • Build and manage interactive dashboards and self-service BI tools that enable leadership to track key business metrics including revenue performance, infrastructure costs, customer retention, and operational efficiency
      • Serve as technical owner of our financial planning platform (Pigment or similar), leading implementation and build-out of models, data connections, and workflows in partnership with Finance leadership to translate business requirements into functional system architecture
      • Develop automated data quality checks and cleaning processes to ensure accuracy and consistency across financial and operational datasets
      • Partner with Finance, Product and Operations teams to translate business questions into analytical frameworks, including cohort analysis, cost modeling, and performance trending
      • Create and maintain documentation for data models, ETL processes, dashboard logic, and system workflows to ensure knowledge continuity
      • Support strategic planning initiatives by building financial models, scenario analyses, and data-driven recommendations for resource allocation and growth investments

    Required Qualifications:
      • 3-5+ years experience in data engineering, analytics engineering, or similar role with demonstrated ability to work with large-scale datasets
      • Strong SQL skills and experience with modern data warehousing solutions (BigQuery, Snowflake, Redshift, etc.)
      • Proficiency in at least one programming language (Python, R) for data manipulation and analysis
      • Experience with BI/visualization tools (Looker, Tableau, Power BI, or similar)
      • Hands-on experience administering enterprise financial systems (NetSuite, SAP, Oracle, or similar ERP platforms)
      • Experience working with Stripe Billing or similar subscription management platforms, including data extraction and revenue reporting
      • Ability to communicate technical concepts clearly to non-technical stakeholders
    $110k-157k yearly est. 5d ago
  • Data Engineer

    Zigma LLC

    Data engineer job in Santa Rosa, CA

    Zigma LLC is a women-owned technology consulting and IT services start-up specializing in Big Data engineering, cloud data modernization, cloud architecture, and advanced analytics. Our mission is to empower organizations through secure, scalable, and high-performance digital ecosystems while maintaining a strong commitment to cybersecurity and compliance. We work with clients across various industries, including healthcare, telecom, and financial services, ranging from local businesses to enterprise-level corporations. Dedicated to fostering inclusion and women's leadership, we strive to deliver innovative solutions that drive operational efficiency and digital transformation. Zigma LLC combines technical expertise with a passion for empowering the next generation of women entrepreneurs.

    Data Engineer (Mid-Level) - Hybrid | C2C | Healthcare
    Locations: East Bay Area, CA | Greater Los Angeles Area, CA | Oregon's Willamette Valley, OR | Greater Atlanta Area, GA
    Employment Type: C2C
    Work Authorization: US Citizens, Green Card, H4/L2/Any EAD, OPT/CPT Candidates.
    Work Arrangement: Hybrid
    Openings: 3 per location
    Experience: 7-12 years
    Contract: Long-term (12+ months, performance-based)
    Preferred Education/Certification: B.S/M.S. in Engineering Discipline with Computer Science, Data Engineering or relevant skills and certifications

    Join a leading healthcare analytics team as a Data Engineer! Work on Azure Cloud, Databricks, and modern Data Pipelines to drive insights from complex healthcare datasets. This is a hybrid role with opportunities to collaborate across multiple locations.

    Key Responsibilities:
      • Design, build, and maintain ETL/ELT Ingestion pipelines on Azure Cloud
      • Collaborate with data scientists and analysts to ensure data quality, governance, and availability
      • Implement batch and streaming data processing workflows
      • Optimize data workflows and pipelines for performance and scalability
      • Work with HIPAA-compliant healthcare data

    Technical Skills & Tools:
      • Programming & Scripting: Python, SQL, Scala/Java
      • Data Processing Frameworks: Apache Spark, Kafka, Airflow/Prefect
      • Databases: Relational (PostgreSQL, MySQL, SQL Server), NoSQL (MongoDB, Cassandra), Data Warehouses (Snowflake, Redshift)
      • Data Formats: CSV, JSON, Parquet, Avro, ORC
      • Version Control & DevOps: Git, Azure DevOps, CI/CD pipelines
      • Cloud & Containerization: Azure Cloud, Docker, Kubernetes, Terraform

    Core Skills:
      • ETL/ELT Ingestion pipeline design
      • Batch & streaming data processing
      • Data modelling (star/snowflake schema)
      • Performance optimization & scalability
      • Data governance and security

    Must-Have:
      • 7-12 years in Data Engineering
      • Hands-on Azure Cloud and Databricks experience
      • M.S. in Data Science or relevant certifications (Databricks/Data Science)
    $110k-157k yearly est. 1d ago
  • Lead Data Engineer

    Mentor Talent Acquisition

    Data engineer job in Santa Rosa, CA

    We're looking for a Lead Data Engineer to spearhead the design, implementation, and iteration of a world-class, modern data infrastructure that powers analytics, data science, and ML/AI systems. You will be in the driver's seat for a new function on the Engineering team and will help chart its future. This role is highly strategic, cross-functional, and hands-on. If you're passionate about building 0→1 data platforms collaboratively and have experience scaling them at a rapidly growing startup, this role is for you.

    What you will do
      • Define and execute the strategic roadmap for data infrastructure and analytics capabilities across the organization.
      • Partner closely with Data Science, Operations Analytics, Engineering, and Product on the design and implementation of scalable data pipelines, models, and solutions.
      • Drive the development of foundational data products and tools to power self-service analytics.
      • Actively contribute to and influence engineering processes, culture, practices, and systems.
      • Serve as a technical thought leader on data engineering best practices.

    About you
      • Strong technical foundation with the modern data engineering stack (dbt, PySpark, Fivetran, Snowflake, Lakehouse, CDPs, ETL tools, etc.).
      • Advanced knowledge of SQL and Python.
      • Deep expertise in data pipelines, distributed systems, and analytics infrastructure.
      • Hands-on experience with data warehousing technologies, data lake architecture, and ETL pipelines/tools.
      • Deep understanding of BI tooling infrastructure and semantic layer design (e.g., Looker, Tableau, Metabase, Mode).
      • Experience and interest in leading major architecture initiatives from the ground up.
      • Believer in applying best-in-class software engineering practices to data systems.
      • Interest in coaching/mentoring junior engineers.

    Bonus points
      • Experience building data products that meet HIPAA requirements.
      • Built platforms that support real-time and batch ML/AI products and systems.
      • Experience integrating EHR and other complex third-party system data.

    For more info or to apply please share your resume to *************************.
    $110k-157k yearly est. 2d ago
  • Senior Data Engineer

    Sigmaways Inc.

    Data engineer job in Santa Rosa, CA

    If you're hands-on with modern data platforms, cloud tech, and big data tools and you like building solutions that are secure, repeatable, and fast, this role is for you. As a Senior Data Engineer, you will design, build, and maintain scalable data pipelines that transform raw information into actionable insights. The ideal candidate will have strong experience across modern data platforms, cloud environments, and big data technologies, with a focus on building secure, repeatable, and high-performing solutions.

    Responsibilities:
      • Design, develop, and maintain secure, scalable data pipelines to ingest, transform, and deliver curated data into the Common Data Platform (CDP).
      • Participate in Agile rituals and contribute to delivery within the Scaled Agile Framework (SAFe).
      • Ensure quality and reliability of data products through automation, monitoring, and proactive issue resolution.
      • Deploy alerting and auto-remediation for pipelines and data stores to maximize system availability.
      • Apply a security-first and automation-driven approach to all data engineering practices.
      • Collaborate with cross-functional teams (data scientists, analysts, product managers, and business stakeholders) to align infrastructure with evolving data needs.
      • Stay current on industry trends and emerging tools, recommending improvements to strengthen efficiency and scalability.

    Qualifications:
      • Bachelor's degree in Computer Science, Information Systems, or related field (or equivalent experience).
      • At least 3 years of experience with Python and PySpark, including Jupyter notebooks and unit testing.
      • At least 2 years of experience with Databricks, Collibra, and Starburst.
      • Proven work with relational and NoSQL databases, including STAR and dimensional modeling approaches.
      • Hands-on experience with modern data stacks: object stores (S3), Spark, Airflow, lakehouse architectures, and cloud warehouses (Snowflake, Redshift).
      • Strong background in ETL and big data engineering (on-prem and cloud).
      • Work within enterprise cloud platforms (CFS2, Cloud Foundational Services 2/EDS) for governance and compliance.
      • Experience building end-to-end pipelines for structured, semi-structured, and unstructured data using Spark.
    $110k-157k yearly est. 2d ago
  • Data Engineer / Analytics Specialist

    Ittconnect

    Data engineer job in Santa Rosa, CA

    Citizenship Requirement: U.S. Citizens Only

    ITTConnect is seeking a Data Engineer / Analytics Specialist to work for one of our clients, a major Technology Consulting firm with headquarters in Europe. They are experts in tailored technology consulting and services to banks, investment firms and other Financial vertical clients.

    Job location: San Francisco Bay Area or New York City.
    Work Model: Ability to come into the office as requested
    Seniority: 10+ years of total experience

    About the role: The Data Engineer / Analytics Specialist will support analytics, product insights, and AI initiatives. You will build robust data pipelines, integrate data sources, and enhance the organization's analytical foundations.

    Responsibilities:
      • Build and operate Snowflake-based analytics environments.
      • Develop ETL/ELT pipelines (DBT, Airflow, etc.).
      • Integrate APIs, external data sources, and streaming inputs.
      • Perform query optimization, basic data modeling, and analytics support.
      • Enable downstream GenAI and analytics use cases.

    Requirements:
      • 10+ years of overall technology experience
      • 3+ years hands-on AWS experience required
      • Strong SQL and Snowflake experience.
      • Hands-on pipeline engineering with DBT, Airflow, or similar.
      • Experience with API integrations and modern data architectures.
    $110k-157k yearly est. 3d ago
  • Data Engineer

    Odiin

    Data engineer job in San Francisco, CA

    You'll work closely with engineering, analytics, and product teams to ensure data is accurate, accessible, and efficiently processed across the organization.

    Key Responsibilities:
      • Design, develop, and maintain scalable data pipelines and architectures.
      • Collect, process, and transform data from multiple sources into structured, usable formats.
      • Ensure data quality, reliability, and security across all systems.
      • Work with data analysts and data scientists to optimize data models for analytics and machine learning.
      • Implement ETL (Extract, Transform, Load) processes and automate workflows.
      • Monitor and troubleshoot data infrastructure, ensuring minimal downtime and high performance.
      • Collaborate with cross-functional teams to define data requirements and integrate new data sources.
      • Maintain comprehensive documentation for data systems and processes.

    Requirements:
      • Proven experience as a Data Engineer, ETL Developer, or similar role.
      • Strong programming skills in Python, SQL, or Scala.
      • Experience with data pipeline tools (Airflow, dbt, Luigi, etc.).
      • Familiarity with big data technologies (Spark, Hadoop, Kafka, etc.).
      • Hands-on experience with cloud data platforms (AWS, GCP, Azure, Snowflake, or Databricks).
      • Understanding of data modeling, warehousing, and schema design.
      • Solid knowledge of database systems (PostgreSQL, MySQL, NoSQL).
      • Strong analytical and problem-solving skills.
    $110k-157k yearly est. 3d ago
  • Data Engineer, Knowledge Graphs

    Mithrl

    Data engineer job in San Francisco, CA

    We imagine a world where new medicines reach patients in months, not years, and where scientific breakthroughs happen at the speed of thought. Mithrl is building the world's first commercially available AI Co-Scientist. It is a discovery engine that transforms messy biological data into insights in minutes. Scientists ask questions in natural language, and Mithrl responds with analysis, novel targets, hypotheses, and patent-ready reports. No coding. No waiting. No bioinformatics bottlenecks. We are one of the fastest growing tech bio companies in the Bay Area with 12x year over year revenue growth. Our platform is used across three continents by leading biotechs and big pharmas. We power breakthroughs from early target discovery to mechanism-of-action. And we are just getting started.

    ABOUT THE ROLE
    We are hiring a Data Engineer, Knowledge Graphs to build the infrastructure that powers Mithrl's biological knowledge layer. You will partner closely with the Data Scientist, Knowledge Graphs to take curated knowledge sources and transform them into scalable, reliable, production ready systems that serve the entire platform. Your work includes building ETL pipelines for large biological datasets, designing schemas and storage models for graph structured data, and creating the API surfaces that allow ML engineers, application teams, and the AI Co-Scientist to query and use the knowledge graph efficiently. You will also own the reliability, performance, and versioning of knowledge graph infrastructure across releases. This role is the bridge between biological knowledge ingestion and the high performance engineering systems that use it. If you enjoy working on data modeling, schema design, graph storage, ETL, and scalable infrastructure, this is an opportunity to have deep impact on the intelligence layer of Mithrl.

    WHAT YOU WILL DO
      • Build and maintain ETL pipelines for large public biological datasets and curated knowledge sources
      • Design, implement, and evolve schemas and storage models for graph structured biological data
      • Create efficient APIs and query surfaces that allow internal teams and AI systems to retrieve nodes, relationships, pathways, annotations, and graph analytics
      • Partner closely with the Data Scientists to operationalize curated relationships, harmonized variable IDs, metadata standards, and ontology mappings
      • Build data models that support multi tenant access, versioning, and reproducibility across releases
      • Implement scalable storage and indexing strategies for high volume graph data
      • Maintain data quality, validate data integrity, and build monitoring around ingestion and usage
      • Work with ML engineers and application teams to ensure the knowledge graph infrastructure supports downstream reasoning, analysis, and discovery applications
      • Support data warehousing, documentation, and API reliability
      • Ensure performance, reliability, and uptime for knowledge graph services

    WHAT YOU BRING
    Required Qualifications
      • Strong experience as a data engineer or backend engineer working with data intensive systems
      • Experience building ETL or ELT pipelines for large structured or semi structured datasets
      • Strong understanding of database design, schema modeling, and data architecture
      • Experience with graph data models or willingness to learn graph storage concepts
      • Proficiency in Python or similar languages for data engineering
      • Experience designing and maintaining APIs for data access
      • Understanding of versioning, provenance, validation, and reproducibility in data systems
      • Experience with cloud infrastructure and modern data stack tools
      • Strong communication skills and ability to work closely with scientific and engineering teams

    Nice to Have
      • Experience with graph databases or graph query languages
      • Experience with biological or chemical data sources
      • Familiarity with ontologies, controlled vocabularies, and metadata standards
      • Experience with data warehousing and analytical storage formats
      • Previous work in a tech bio company or scientific platform environment

    WHAT YOU WILL LOVE AT MITHRL
      • You will build the core infrastructure that makes the biological knowledge graph fast, reliable, and usable
      • Team: Join a tight-knit, talent-dense team of engineers, scientists, and builders
      • Culture: We value consistency, clarity, and hard work. We solve hard problems through focused daily execution
      • Speed: We ship fast (2x/week) and improve continuously based on real user feedback
      • Location: Beautiful SF office with a high-energy, in-person culture
      • Benefits: Comprehensive PPO health coverage through Anthem (medical, dental, and vision) + 401(k) with top-tier plans
    $110k-157k yearly est. 2d ago
  • Imaging Data Engineer/Architect

    Intuitive.Ai

    Data engineer job in San Francisco, CA

    About us: Intuitive is an innovation-led engineering company delivering business outcomes for hundreds of enterprises globally. With the reputation of being a Tiger Team & a Trusted Partner of enterprise technology leaders, we help solve the most complex Digital Transformation challenges across the following Intuitive Superpowers: Modernization & Migration Application & Database Modernization Platform Engineering (IaC/EaC, DevSecOps & SRE) Cloud Native Engineering, Migration to Cloud, VMware Exit FinOps Data & AI/ML Data (Cloud Native / DataBricks / Snowflake) Machine Learning, AI/GenAI Cybersecurity Infrastructure Security Application Security Data Security AI/Model Security SDx & Digital Workspace (M365, G-suite) SDDC, SD-WAN, SDN, NetSec, Wireless/Mobility Email, Collaboration, Directory Services, Shared Files Services

    Intuitive Services: Professional and Advisory Services Elastic Engineering Services Managed Services Talent Acquisition & Platform Resell Services

    About the job:
    Title: Imaging Data Engineer/Architect
    Start Date: Immediate
    # of Positions: 1
    Position Type: Contract/ Full-Time
    Location: San Francisco, CA
    Notes: Imaging Data Engineer/Architect who understands radiology and digital pathology, related clinical data, and metadata. Hands-on experience with the above technologies and good knowledge of biomedical imaging and data pipelines overall.

    About the Role
    We are seeking a highly skilled Imaging Data Engineer/Architect to join our San Francisco team as a Subject Matter Expert (SME) in radiology and digital pathology. This role will design and manage imaging data pipelines, ensuring seamless integration of clinical data and metadata to support advanced diagnostic and research applications. The ideal candidate will have deep expertise in medical imaging standards, cloud-based data architectures, and healthcare interoperability, contributing to innovative solutions that enhance patient outcomes.

    Responsibilities
      • Design and implement scalable data architectures for radiology and digital pathology imaging data, including DICOM, HL7, and FHIR standards.
      • Develop and optimize data pipelines to process and store large-scale imaging datasets (e.g., MRI, CT, histopathology slides) and associated metadata.
      • Collaborate with clinical teams to understand radiology and pathology workflows, ensuring data solutions align with clinical needs.
      • Ensure data integrity, security, and compliance with healthcare regulations (e.g., HIPAA, GDPR).
      • Integrate imaging data with AI/ML models for diagnostic and predictive analytics, working closely with data scientists.
      • Build and maintain metadata schemas to support data discoverability and interoperability across systems.
      • Provide technical expertise to cross-functional teams, including product managers and software engineers, to drive imaging data strategy.
      • Conduct performance tuning and optimization of imaging data storage and retrieval systems in cloud environments (e.g., AWS, Google Cloud, Azure).
      • Document data architectures and processes, ensuring knowledge transfer to internal teams and external partners.
      • Stay updated on emerging imaging technologies and standards, proposing innovative solutions to enhance data workflows.

    Qualifications
      • Education: Bachelor's degree in Computer Science, Biomedical Engineering, or a related field (master's preferred).
      • Experience: 5+ years in data engineering or architecture, with at least 3 years focused on medical imaging (radiology and/or digital pathology). Proven experience with DICOM, HL7, FHIR, and imaging metadata standards (e.g., SNOMED, LOINC). Hands-on experience with cloud platforms (AWS, Google Cloud, or Azure) for imaging data storage and processing.
      • Technical Skills: Proficiency in programming languages (e.g., Python, Java, SQL) for data pipeline development. Expertise in ETL processes, data warehousing, and database management (e.g., Snowflake, BigQuery, PostgreSQL). Familiarity with AI/ML integration for imaging data analytics. Knowledge of containerization (e.g., Docker, Kubernetes) for deploying data solutions.
      • Domain Knowledge: Deep understanding of radiology and digital pathology workflows, including PACS and LIS systems. Familiarity with clinical data integration and healthcare interoperability standards.
      • Soft Skills: Strong analytical and problem-solving skills to address complex data challenges. Excellent communication skills to collaborate with clinical and technical stakeholders. Ability to work independently in a fast-paced environment, with a proactive approach to innovation.
      • Certifications (preferred): AWS Certified Solutions Architect, Google Cloud Professional Data Engineer, or equivalent. Certifications in medical imaging (e.g., CIIP - Certified Imaging Informatics Professional).
    $110k-157k yearly est. 5d ago
  • Data Platform Engineer / AI Workloads

    The Crypto Recruiters 3.3 company rating

    Data engineer job in Santa Rosa, CA

    We are actively searching for a Data Infrastructure Engineer to join our team on a permanent basis. In this founding engineer role you will focus on building next-generation data infrastructure for our AI platform. If you have a passion for distributed systems, unified storage, orchestration, and retrieval for AI workloads, we would love to speak with you.

    Your Rhythm:
      • Design, build, and maintain data infrastructure systems such as distributed compute, data orchestration, distributed storage, streaming infrastructure, and machine learning infrastructure while ensuring scalability, reliability, and security
      • Ensure our data platform can scale by orders of magnitude while remaining reliable and efficient
      • Tackle complex challenges in distributed systems, databases, and AI infrastructure
      • Collaborate with technical leadership to define and refine the product roadmap
      • Write high-quality, well-tested, and maintainable code
      • Contribute to the open-source community and engage with developers in the space

    Your Vibe:
      • 5+ years experience designing and building distributed database systems
      • Expertise in building and operating scalable, reliable and secure database infrastructure systems
      • Strong knowledge of distributed compute, data orchestration, distributed storage, and streaming infrastructure
      • Strong knowledge of SQL and NoSQL databases, such as MySQL, Postgres, and MongoDB
      • Programming skills in Python
      • Passion for building developer tools and scalable infrastructure

    Our Vibe:
      • Relaxed work environment
      • 100% paid top of the line health care benefits
      • Full ownership, no micromanagement
      • Strong equity package
      • 401K
      • Unlimited vacation
      • An actual work/life balance; we aren't trying to run you into the ground. We have families and enjoy life too!
    $128k-181k yearly est. 2d ago
  • Staff Data Scientist

    Quantix Search

    Data engineer job in San Francisco, CA

    Staff Data Scientist | San Francisco | $250K-$300K + Equity. We're partnering with one of the fastest-growing AI companies in the world to hire a Staff Data Scientist. Backed by over $230M from top-tier investors and already valued at over $1B, they've secured customers that include some of the most recognizable names in tech. Their AI platform powers millions of daily interactions and is quickly becoming the enterprise standard for conversational AI. In this role, you'll bring rigorous analytics and experimentation leadership that directly shapes product strategy and company performance. What you'll do: Drive deep-dive analyses on user behavior, product performance, and growth drivers. Design and interpret A/B tests to measure product impact at scale. Build scalable data models, pipelines, and dashboards for company-wide use. Partner with Product and Engineering to embed experimentation best practices. Evaluate ML models, ensuring business relevance, performance, and trade-off clarity. What we're looking for: 5+ years in data science or product analytics at scale (consumer or marketplace preferred). Advanced SQL and Python skills, with strong foundations in statistics and experimental design. Proven record of designing, running, and analyzing large-scale experiments. Ability to analyze and reason about ML models (classification, recommendation, LLMs). Strong communicator with a track record of influencing cross-functional teams. If you're excited by the sound of this challenge, apply today and we'll be in touch.
    $250k-300k yearly 3d ago
  • AI Data Engineer

    Hartleyco

    Data engineer job in San Francisco, CA

    Member of Technical Staff - AI Data Engineer San Francisco (In-Office) $150K to $225K + Equity A high-growth, AI-native startup coming out of stealth is hiring AI Data Engineers to build the systems that power production-grade AI. The company has recently signed a Series A term sheet and is scaling rapidly. This role is central to unblocking current bottlenecks across data engineering, context modeling, and agent performance. Responsibilities: • Build distributed, reliable data pipelines using Airflow, Temporal, and n8n • Model SQL, vector, and NoSQL databases (Postgres, Qdrant, etc.) • Build API and function-based services in Python • Develop custom automations (Playwright, Stagehand, Zapier) • Work with AI researchers to define and expose context as services • Identify gaps in data quality and drive changes to upstream processes • Ship fast, iterate, and own outcomes end-to-end Required Experience: • Strong background in data engineering • Hands-on experience working with LLMs or LLM-powered applications • Data modeling skills across SQL and vector databases • Experience building distributed systems • Experience with Airflow, Temporal, n8n, or similar workflow engines • Python experience (API/services) • Startup mindset and bias toward rapid execution Nice To Have: • Experience with stream processing (Flink) • dbt or Clickhouse experience • CDC pipelines • Experience with context construction, RAG, or agent workflows • Analytical tooling (Posthog) What You Can Expect: • High-intensity, in-office environment • Fast decision-making and rapid shipping cycles • Real ownership over architecture and outcomes • Opportunity to work on AI systems operating at meaningful scale • Competitive compensation package • Meals provided plus full medical, dental, and vision benefits If this sounds like you, please apply now.
    $150k-225k yearly 5d ago
  • Senior ML Data Engineer

    Midjourney

    Data engineer job in Santa Rosa, CA

    We're the data team behind Midjourney's image generation models. We handle the dataset side: processing, filtering, scoring, captioning, and all the distributed compute that makes high-quality training data possible. What you'd be working on: Large-scale dataset processing and filtering pipelines. Training classifiers for content moderation and quality assessment. Models for data quality and aesthetic evaluation. Data visualization tools for experimenting on dataset samples. Testing/simulating distributed inference pipelines. Monitoring dashboards for data quality and pipeline health. Performance optimization and infrastructure scaling. Occasionally jumping into inference optimization and other cross-team projects. Our current stack: PySpark, Slurm, distributed batch processing across a hybrid cloud setup. We're pragmatic about tools - if there's something better, we'll switch. We're looking for someone strong in either: Data engineering/ML pipelines at scale, or Cloud/infrastructure with distributed systems experience. You don't need exact tech matches; comfort with adjacent technologies and willingness to learn matter more. We work with our own hardware plus GCP and other providers, so adaptability across different environments is valuable. Location: SF office a few times per week (we may make exceptions on location for truly exceptional candidates). The role offers variety: our team members often get pulled into different projects across the company, from dataset work to inference optimization. If you're interested in the intersection of large-scale data processing and cutting-edge generative AI, we'd love to hear from you.
    $110k-157k yearly est. 3d ago
  • Data Engineer - Scientific Data Ingestion

    Mithrl

    Data engineer job in San Francisco, CA

    We envision a world where novel drugs and therapies reach patients in months, not years, accelerating breakthroughs that save lives. Mithrl is building the world's first commercially available AI Co-Scientist, a discovery engine that empowers life science teams to go from messy biological data to novel insights in minutes. Scientists ask questions in natural language, and Mithrl answers with real analysis, novel targets, and patent-ready reports. No coding. No waiting. No bioinformatics bottlenecks. We are the fastest-growing tech-bio startup in the Bay Area with over 12X YoY revenue growth. Our platform is already being used by teams at some of the largest biotechs and big pharma across three continents to accelerate and uncover breakthroughs, from target discovery to mechanism of action. WHAT YOU WILL DO Build and own an AI-powered ingestion & normalization pipeline to import data from a wide variety of sources - unprocessed Excel/CSV uploads, lab and instrument exports, as well as processed data from internal pipelines. Develop robust schema mapping, coercion, and conversion logic (think: units normalization, metadata standardization, variable-name harmonization, vendor-instrument quirks, plate-reader formats, reference-genome or annotation updates, batch-effect correction, etc.). Use LLM-driven and classical data-engineering tools to structure “semi-structured” or messy tabular data - extracting metadata, inferring column roles/types, cleaning free-text headers, fixing inconsistencies, and preparing final clean datasets. Ensure all transformations that should only happen once (normalization, coercion, batch-correction) execute during ingestion - so downstream analytics / the AI “Co-Scientist” always works with clean, canonical data. Build validation, verification, and quality-control layers to catch ambiguous, inconsistent, or corrupt data before it enters the platform. 
Collaborate with product teams, data science / bioinformatics colleagues, and infrastructure engineers to define and enforce data standards, and ensure pipeline outputs integrate cleanly into downstream analysis and storage systems. WHAT YOU BRING Must-have 5+ years of experience in data engineering / data wrangling with real-world tabular or semi-structured data. Strong fluency in Python, and data processing tools (Pandas, Polars, PyArrow, or similar). Excellent experience dealing with messy Excel / CSV / spreadsheet-style data - inconsistent headers, multiple sheets, mixed formats, free-text fields - and normalizing it into clean structures. Comfort designing and maintaining robust ETL/ELT pipelines, ideally for scientific or lab-derived data. Ability to combine classical data engineering with LLM-powered data normalization / metadata extraction / cleaning. Strong desire and ability to own the ingestion & normalization layer end-to-end - from raw upload → final clean dataset - with an eye for maintainability, reproducibility, and scalability. Good communication skills; able to collaborate across teams (product, bioinformatics, infra) and translate real-world messy data problems into robust engineering solutions. Nice-to-have Familiarity with scientific data types and “modalities” (e.g. plate-readers, genomics metadata, time-series, batch-info, instrumentation outputs). Experience with workflow orchestration tools (e.g. Nextflow, Prefect, Airflow, Dagster), or building pipeline abstractions. Experience with cloud infrastructure and data storage (AWS S3, data lakes/warehouses, database schemas) to support multi-tenant ingestion. Past exposure to LLM-based data transformation or cleansing agents - building or integrating tools that clean or structure messy data automatically. Any background in computational biology / lab-data / bioinformatics is a bonus - though not required. 
WHAT YOU WILL LOVE AT MITHRL Mission-driven impact: you'll be the gatekeeper of data quality - ensuring that all scientific data entering Mithrl becomes clean, consistent, and analysis-ready. You'll have outsized influence over the reliability and trustworthiness of our entire data + AI stack. High ownership & autonomy: this role is yours to shape. You decide how ingestion works, define the standards, build the pipelines. You'll work closely with our product, data science, and infrastructure teams - shaping how data is ingested, stored, and exposed to end users or AI agents. Team: Join a tight-knit, talent-dense team of engineers, scientists, and builders. Culture: We value consistency, clarity, and hard work. We solve hard problems through focused daily execution. Speed: We ship fast (2x/week) and improve continuously based on real user feedback. Location: Beautiful SF office with a high-energy, in-person culture. Benefits: Comprehensive PPO health coverage through Anthem (medical, dental, and vision) + 401(k) with top-tier plans.
    $110k-157k yearly est. 2d ago
  • Data Engineer

    Zigma LLC

    Data engineer job in San Francisco, CA

    Zigma LLC is a women-owned technology consulting and IT services start-up specializing in Big Data engineering, cloud data modernization, cloud architecture, and advanced analytics. Our mission is to empower organizations through secure, scalable, and high-performance digital ecosystems while maintaining a strong commitment to cybersecurity and compliance. We work with clients across various industries, including healthcare, telecom, and financial services, ranging from local businesses to enterprise-level corporations. Dedicated to fostering inclusion and women's leadership, we strive to deliver innovative solutions that drive operational efficiency and digital transformation. Zigma LLC combines technical expertise with a passion for empowering the next generation of women entrepreneurs. Data Engineer (Mid-Level) - Hybrid | C2C | Healthcare Locations: East Bay Area, CA | Greater Los Angeles Area, CA | Oregon's Willamette Valley, OR | Greater Atlanta Area, GA Employment Type: C2C Work Authorization: US Citizens, Green Card, H4/L2/Any EAD, OPT/CPT Candidates. Work Arrangement: Hybrid Openings: 3 per location Experience: 7-12 years Contract: Long-term (12+ months, performance-based) Preferred Education/Certification: B.S./M.S. in an engineering discipline such as Computer Science or Data Engineering, or relevant skills and certifications. Join a leading healthcare analytics team as a Data Engineer! Work on Azure Cloud, Databricks, and modern Data Pipelines to drive insights from complex healthcare datasets. This is a hybrid role with opportunities to collaborate across multiple locations. 
Key Responsibilities: • Design, build, and maintain ETL/ELT Ingestion pipelines on Azure Cloud • Collaborate with data scientists and analysts to ensure data quality, governance, and availability • Implement batch and streaming data processing workflows • Optimize data workflows and pipelines for performance and scalability • Work with HIPAA-compliant healthcare data Technical Skills & Tools: Programming & Scripting: Python, SQL, Scala/Java Data Processing Frameworks: Apache Spark, Kafka, Airflow/Prefect Databases: Relational (PostgreSQL, MySQL, SQL Server), NoSQL (MongoDB, Cassandra), Data Warehouses (Snowflake, Redshift) Data Formats: CSV, JSON, Parquet, Avro, ORC Version Control & DevOps: Git, Azure DevOps, CI/CD pipelines Cloud & Containerization: Azure Cloud, Docker, Kubernetes, Terraform Core Skills: • ETL/ELT Ingestion pipeline design • Batch & streaming data processing • Data modelling (star/snowflake schema) • Performance optimization & scalability • Data governance and security Must-Have: • 7-12 years in Data Engineering • Hands-on Azure Cloud and Databricks experience • M.S. in Data Science or relevant certifications (Databricks/Data Science)
    $110k-157k yearly est. 1d ago
  • Lead Data Engineer

    Mentor Talent Acquisition

    Data engineer job in San Francisco, CA

    We're looking for a Lead Data Engineer to spearhead the design, implementation, and iteration of a world-class, modern data infrastructure that powers analytics, data science, and ML/AI systems. You will be in the driver's seat for a new function on the Engineering team and will help chart its future. This role is highly strategic, cross-functional, and hands-on. If you're passionate about building 0→1 data platforms collaboratively and have experience scaling them at a rapidly growing startup, this role is for you. What you will do Define and execute the strategic roadmap for data infrastructure and analytics capabilities across the organization. Partner closely with Data Science, Operations Analytics, Engineering, and Product on the design and implementation of scalable data pipelines, models, and solutions. Drive the development of foundational data products and tools to power self-service analytics. Actively contribute to and influence engineering processes, culture, practices, and systems. Serve as a technical thought leader on data engineering best practices. About you Strong technical foundation with the modern data engineering stack (dbt, PySpark, Fivetran, Snowflake, Lakehouse, CDPs, ETL tools, etc.). Advanced knowledge of SQL and Python. Deep expertise in data pipelines, distributed systems, and analytics infrastructure. Hands-on experience with data warehousing technologies, data lake architecture, and ETL pipelines/tools. Deep understanding of BI tooling infrastructure and semantic layer design (e.g., Looker, Tableau, Metabase, Mode). Experience and interest in leading major architecture initiatives from the ground up. Believer in applying best-in-class software engineering practices to data systems. Interest in coaching/mentoring junior engineers. Bonus points Experience building data products that meet HIPAA requirements. Built platforms that support real-time and batch ML/AI products and systems. 
Experience integrating EHR and other complex third-party system data. For more info or to apply, please send your resume to *************************.
    $110k-157k yearly est. 2d ago
  • Senior Data Engineer

    Sigmaways Inc.

    Data engineer job in San Francisco, CA

    If you're hands-on with modern data platforms, cloud tech, and big data tools and you like building solutions that are secure, repeatable, and fast, this role is for you. As a Senior Data Engineer, you will design, build, and maintain scalable data pipelines that transform raw information into actionable insights. The ideal candidate will have strong experience across modern data platforms, cloud environments, and big data technologies, with a focus on building secure, repeatable, and high-performing solutions. Responsibilities: Design, develop, and maintain secure, scalable data pipelines to ingest, transform, and deliver curated data into the Common Data Platform (CDP). Participate in Agile rituals and contribute to delivery within the Scaled Agile Framework (SAFe). Ensure quality and reliability of data products through automation, monitoring, and proactive issue resolution. Deploy alerting and auto-remediation for pipelines and data stores to maximize system availability. Apply a security-first and automation-driven approach to all data engineering practices. Collaborate with cross-functional teams (data scientists, analysts, product managers, and business stakeholders) to align infrastructure with evolving data needs. Stay current on industry trends and emerging tools, recommending improvements to strengthen efficiency and scalability. Qualifications: Bachelor's degree in Computer Science, Information Systems, or related field (or equivalent experience). At least 3 years of experience with Python and PySpark, including Jupyter notebooks and unit testing. At least 2 years of experience with Databricks, Collibra, and Starburst. Proven work with relational and NoSQL databases, including STAR and dimensional modeling approaches. Hands-on experience with modern data stacks: object stores (S3), Spark, Airflow, lakehouse architectures, and cloud warehouses (Snowflake, Redshift). Strong background in ETL and big data engineering (on-prem and cloud). 
Work within enterprise cloud platforms (CFS2, Cloud Foundational Services 2/EDS) for governance and compliance. Experience building end-to-end pipelines for structured, semi-structured, and unstructured data using Spark.
    $110k-157k yearly est. 2d ago

Learn more about data engineer jobs

How much does a data engineer earn in Santa Rosa, CA?

The average data engineer in Santa Rosa, CA earns between $94,000 and $184,000 annually. This compares to the national average data engineer range of $80,000 to $149,000.

Average data engineer salary in Santa Rosa, CA

$132,000

What are the biggest employers of Data Engineers in Santa Rosa, CA?

The biggest employers of Data Engineers in Santa Rosa, CA are:
  1. Midjourney
  2. Sigmaways Inc.
  3. Crypto.com
  4. Hartleyco
  5. Ittconnect
  6. Mentor Talent Acquisition
  7. Zigma LLC