
Data analyst jobs in Santa Rosa, CA

- 950 jobs
  • Air Quality Data Analyst

    Brightline Defense

    Data analyst job in San Francisco, CA

    Please review the sections below, especially the "How to Apply" section, to complete your application and be considered for this position.

    Title: Air Quality Data Analyst
    Salary: $36-$45/hour, commensurate with experience
    Job Type: Part-time, temporary
    Benefits: Sick leave accrual
    Duration of Appointment: Est. 2 months from start date
    Location/Schedule: Hybrid, with potential for remote. For remote consideration, the individual must be California-based and able to be on site for their first day. Estimated 10 hours/week.

    About the Position: This part-time, temporary role (10 hours/week) will support our air quality data work in the areas identified below.

    Air Quality Data:
    - Work with the Program Manager to bridge the technical components of our air quality data work and translate that information for the general public and community members, including in written materials, presentations, and reports.
    - Evolve Brightline's air quality program to the next generation and help prepare us for the next level of grantmaking.
    - Review and assess our current air quality data and network to identify gaps and opportunities for expansion or new directions, and communicate those findings to the team. This includes researching other air quality data programs and materials and providing written recommendations/reports.
    - Collaborate with Brightline staff, partners, and volunteers who are analyzing our air quality data.
    - Participate in meetings related to our air quality work, including with vendors and other key partners.
    - Draft, review, and analyze air quality data and other documents; conduct inquiries; compile and research information.
    - Provide additional program support as needed, which could include site visits, off-site community meetings, air quality sensor network maintenance, etc.

    General Support:
    - Provide grant support, including progress report deliverables, locating documentation, tracking deadlines, and following up with partners.
    - Assist Brightline team members with other projects as needed.

    Required Qualifications, Skills, and Abilities:
    - 3-5 years of related experience, including some direct work with environmental mapping
    - Experience with ArcGIS
    - Experience with R, modeling, or other coding languages
    - Familiarity with Google Suite and Canva (or other design software)
    - Strong interpersonal, written, and oral communication skills
    - Ability to work with data and identify trends and areas where further data points or analysis are needed
    - Ability to translate the technical components of the data analysis and findings for those outside the field, including individuals in the community
    - Ability to collaborate with multiple stakeholders and take and incorporate feedback
    - Outstanding relationship-building skills, as well as the ability to adapt communication style
    - Strong organizational skills, adaptability, and attention to detail
    - Desire to learn more about Brightline's work
    - Passion for working in environmental justice
    - Desire to work with diverse communities and neighborhoods in San Francisco

    Preferred Skills & Qualifications:
    - Master's or PhD in a related field
    - Experience with Tableau or other data visualization software
    - Experience with website coding
    - Spanish, Cantonese, or Arabic language skills
    - Experience working with community-based organizations and/or low-income communities

    How to Apply: Please email a short cover letter, resume, and three references (preferably direct supervisors; include an e-mail address and phone number for each) to ************************** with the subject line "Air Quality Data Analyst Application - [Your Name]." Applications will be reviewed on a rolling basis until the position is filled, with the first round of interviews occurring the week of December 1, 2025.
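    For context on the trend-spotting this posting describes, a minimal sketch of smoothing raw sensor readings might look like the following. The posting names R; Python is used here purely as an illustration, and the file name and column names are hypothetical:

    ```python
    import pandas as pd

    # Hypothetical sensor export: one row per reading, columns "timestamp" and "pm25"
    readings = pd.read_csv("sensor_export.csv", parse_dates=["timestamp"])

    # Daily means, then a 7-day rolling average to surface trends past day-to-day noise
    daily = readings.set_index("timestamp")["pm25"].resample("D").mean()
    trend = daily.rolling(window=7, min_periods=4).mean()

    # Flag days well above the typical level as candidates for further analysis
    elevated = trend[trend > daily.median() * 1.5]
    print(elevated)
    ```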
    $36 hourly 3d ago
  • Staff Data Scientist

    Quantix Search

    Data analyst job in Santa Rosa, CA

    Staff Data Scientist | San Francisco | $250K-$300K + Equity

    We're partnering with one of the fastest-growing AI companies in the world to hire a Staff Data Scientist. Backed by over $230M from top-tier investors and already valued at over $1B, they've secured customers that include some of the most recognizable names in tech. Their AI platform powers millions of daily interactions and is quickly becoming the enterprise standard for conversational AI. In this role, you'll bring rigorous analytics and experimentation leadership that directly shapes product strategy and company performance.

    What you'll do:
    - Drive deep-dive analyses on user behavior, product performance, and growth drivers
    - Design and interpret A/B tests to measure product impact at scale
    - Build scalable data models, pipelines, and dashboards for company-wide use
    - Partner with Product and Engineering to embed experimentation best practices
    - Evaluate ML models, ensuring business relevance, performance, and trade-off clarity

    What we're looking for:
    - 5+ years in data science or product analytics at scale (consumer or marketplace preferred)
    - Advanced SQL and Python skills, with strong foundations in statistics and experimental design
    - Proven record of designing, running, and analyzing large-scale experiments
    - Ability to analyze and reason about ML models (classification, recommendation, LLMs)
    - Strong communicator with a track record of influencing cross-functional teams

    If you're excited by the sound of this challenge, apply today and we'll be in touch.
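    As an illustration of the experiment-analysis work this role centers on, a two-proportion z-test is one standard way to read out an A/B test on a conversion metric. This sketch is generic and not specific to the company's stack:

    ```python
    import math
    from scipy.stats import norm

    def ab_test_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int):
        """Two-proportion z-test: did variant B convert differently from control A?"""
        p_a, p_b = conv_a / n_a, conv_b / n_b
        pooled = (conv_a + conv_b) / (n_a + n_b)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        z = (p_b - p_a) / se
        p_value = 2 * norm.sf(abs(z))  # two-sided p-value
        return p_a, p_b, z, p_value

    # Example: 1,000 users per arm, 110 vs. 135 conversions
    print(ab_test_ztest(110, 1000, 135, 1000))
    ```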
    $250k-300k yearly 3d ago
  • Data Scientist

    Skale (3.7 company rating)

    Data analyst job in San Francisco, CA

    We're working with a Series A health tech start-up pioneering a revolutionary approach to healthcare AI, developing neurosymbolic systems that combine statistical learning with structured medical knowledge. Their technology is being adopted by leading health systems and insurers to enhance patient outcomes through advanced predictive analytics.

    We're seeking Machine Learning Engineers who excel at the intersection of data science, modeling, and software engineering. You'll design and implement models that extract insights from longitudinal healthcare data, balancing analytical rigor, interpretability, and scalability. This role offers a unique opportunity to tackle foundational modeling challenges in healthcare, where your contributions will directly influence clinical, actuarial, and policy decisions.

    Key Responsibilities:
    - Develop predictive models to forecast disease progression, healthcare utilization, and costs using temporal clinical data (claims, EHR, laboratory results, pharmacy records)
    - Design interpretable and explainable ML solutions that earn the trust of clinicians, actuaries, and healthcare decision-makers
    - Research and prototype innovative approaches leveraging both classical and modern machine learning techniques
    - Build robust, scalable ML pipelines for training, validation, and deployment in distributed computing environments
    - Collaborate cross-functionally with data engineers, clinicians, and product teams to ensure models address real-world healthcare needs
    - Communicate findings and methodologies effectively through visualizations, documentation, and technical presentations

    Required Qualifications:
    - Strong foundation in statistical modeling, machine learning, or data science, with preference for experience in temporal or longitudinal data analysis
    - Proficiency in Python and ML frameworks (PyTorch, JAX, NumPyro, PyMC, etc.)
    - Proven track record of transitioning models from research prototypes to production systems
    - Experience with probabilistic methods, survival analysis, or Bayesian inference (highly valued)

    Bonus Qualifications:
    - Experience working with clinical data and healthcare terminologies (ICD, CPT, SNOMED CT, LOINC)
    - Background in actuarial modeling, claims forecasting, or risk adjustment methodologies
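    Since the posting highlights survival analysis on longitudinal data, here is a minimal Kaplan-Meier estimator written from scratch in NumPy. It is illustrative only; production work would more likely use a dedicated library:

    ```python
    import numpy as np

    def kaplan_meier(durations, observed):
        """Kaplan-Meier survival curve.

        durations: time until the event or until censoring, per subject
        observed:  1 if the event occurred, 0 if the subject was censored
        """
        t = np.asarray(durations, dtype=float)
        e = np.asarray(observed, dtype=int)
        surv, curve = 1.0, []
        for ti in np.unique(t[e == 1]):           # step only at observed event times
            at_risk = np.sum(t >= ti)             # subjects still under observation
            events = np.sum((t == ti) & (e == 1))
            surv *= 1.0 - events / at_risk        # product-limit update
            curve.append((ti, surv))
        return curve

    # Toy cohort: months until hospitalization, with two censored subjects
    print(kaplan_meier([2, 3, 3, 5, 8, 10], [1, 1, 0, 1, 0, 1]))
    ```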
    $123k-171k yearly est. 1d ago
  • Business Intelligence Analyst - Tableau

    A.K.A. Brands (3.8 company rating)

    Data analyst job in Santa Rosa, CA

    About the Role:
    We are seeking a Tableau Report Developer to join our Data & Analytics team in San Francisco. This role is critical to building and maintaining high-quality business reporting that drives decision-making across our retail brands. You will work closely with stakeholders in finance, operations, merchandising, and leadership to deliver insights that directly impact growth.

    Responsibilities:
    ● Design, develop, and maintain Tableau dashboards and reports that provide actionable insights to business teams.
    ● Translate business questions into effective data visualizations and reporting solutions.
    ● Partner with stakeholders to understand requirements, gather feedback, and refine reporting deliverables.
    ● Perform data analysis to validate trends, identify anomalies, and ensure accuracy of reporting.
    ● Work with the data engineering team to improve data pipelines and ensure reliable data availability.
    ● Provide ad-hoc reporting support for retail, e-commerce, and cross-functional business partners.

    Requirements:
    ● 3+ years of professional experience developing Tableau dashboards and reports.
    ● Strong background in data analysis and business reporting.
    ● Excellent ability to engage with business stakeholders, translating needs into technical solutions.
    ● Experience in retail or e-commerce analytics highly preferred.
    ● Solid SQL skills and familiarity with cloud-based data warehouses (e.g., Snowflake, Domo).
    ● Strong communication and collaboration skills.
    $84k-115k yearly est. 5d ago
  • Snowflake Data Architect

    Gspann Technologies, Inc. (3.4 company rating)

    Data analyst job in San Francisco, CA

    About GSPANN:
    Headquartered in Milpitas, California (U.S.A.), GSPANN provides consulting and IT services to global clients, ranging from mid-size to Fortune 500 companies. With our experience in retail, high technology, and manufacturing, we help our clients transform and deliver business value by optimizing their IT capabilities, practices, and operations. With ten offices, including four global delivery centers, and approximately 1,400 employees globally, we offer the intimacy of a boutique consultancy with the capabilities of a large IT services firm.

    Location: South San Francisco, CA
    Duration: Long term

    Key Responsibilities:
    • Design and architect end-to-end data solutions leveraging Snowflake as the primary data warehouse.
    • Build and optimize ETL/ELT pipelines using Matillion, ensuring scalability, performance, and reliability.
    • Develop data models (conceptual, logical, and physical) addressing business and analytical requirements.
    • Define and enforce data governance, data quality, and metadata management standards.
    • Work closely with product, BI, analytics, and engineering teams to translate business needs into architectural blueprints.
    • Optimize Snowflake performance: clustering, micro-partitioning, caching, warehouse sizing, and cost governance (a sketch follows below).
    • Integrate data from various structured and unstructured sources.
    • Review existing data systems and recommend modernization opportunities.
    • Ensure security best practices including role-based access control, encryption, and compliance.
    • Create documentation, architectural diagrams, and data flow maps.
    • Drive best practices for CI/CD, data testing, and DevOps for data pipelines.

    Required Skills:
    • 10+ years of experience in data engineering or data architecture roles.
    • Strong hands-on expertise in Matillion ETL/ELT (jobs, orchestration, components, transformations).
    • Deep understanding of Snowflake: warehouses, schema design, streams, tasks, materialized views, performance tuning.
    • Proficiency in SQL, data modeling, and data warehouse concepts (Kimball/Inmon).
    • Experience with cloud platforms (AWS/Azure/GCP), especially AWS services like S3, Lambda, EC2, and IAM.
    • Strong understanding of batch and real-time data processing patterns.
    • Knowledge of data governance, master data management, and metadata frameworks.
    • Experience with version control (Git) and CI/CD workflows.
    • Ability to create architecture diagrams using tools like Lucidchart, Draw.io, or Visio.

    Working at GSPANN:
    GSPANN is a diverse, prosperous, and rewarding place to work. We provide competitive benefits, educational assistance, and career growth opportunities to our employees. Every employee is valued for their talent and contribution. Working with us will give you an opportunity to work globally with some of the best brands in the industry. The company does and will take affirmative action to employ and advance in employment individuals with disabilities and protected veterans, and to treat qualified individuals without discrimination on the basis of their physical or mental disability status. GSPANN is an equal opportunity employer for minorities/females/veterans/disabled.
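    To make the Snowflake performance duties concrete, the following sketch shows the kind of clustering and warehouse-sizing statements involved, issued through the Snowflake Python connector. Connection parameters, table, and warehouse names are placeholders, and the exact tuning choices would depend on the workload:

    ```python
    import snowflake.connector

    # Placeholder credentials; real deployments would use key-pair auth or SSO
    conn = snowflake.connector.connect(
        account="my_account", user="data_architect", password="...",
        warehouse="ETL_WH", database="ANALYTICS",
    )
    cur = conn.cursor()

    # Cluster the fact table on its most common filter column so micro-partition
    # pruning can skip irrelevant data
    cur.execute("ALTER TABLE sales.fact_orders CLUSTER BY (order_date)")

    # Cost governance: keep the warehouse small and auto-suspend when idle
    cur.execute("ALTER WAREHOUSE ETL_WH SET WAREHOUSE_SIZE = 'SMALL' AUTO_SUSPEND = 60")
    ```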
    $119k-168k yearly est. 1d ago
  • AWS Data Architect

    Fractal (4.2 company rating)

    Data analyst job in Santa Rosa, CA

    Fractal is a strategic AI partner to Fortune 500 companies with a vision to power every human decision in the enterprise. Fractal is building a world where individual choices, freedom, and diversity are the greatest assets; an ecosystem where human imagination is at the heart of every decision. Where no possibility is written off, only challenged to get better. We believe that a true Fractalite is one who empowers imagination with intelligence. Fractal has been featured as a Great Place to Work by The Economic Times in partnership with the Great Place to Work Institute and recognized as a 'Cool Vendor' and a 'Vendor to Watch' by Gartner. Please visit Fractal | Intelligence for Imagination for more information about Fractal.

    Fractal is looking for a proactive and driven AWS Lead Data Architect/Engineer to join our cloud and data tech team. In this role, you will design the system architecture and solution, ensure the platform is scalable and performant, and create automated data pipelines.

    Responsibilities:

    Design & Architecture of Scalable Data Platforms
    - Design, develop, and maintain large-scale data processing architectures on the Databricks Lakehouse Platform to support business needs.
    - Architect multi-layer data models including Bronze (raw), Silver (cleansed), and Gold (curated) layers for various domains (e.g., Retail Execution, Digital Commerce, Logistics, Category Management); a sketch of this layering appears below, after the responsibilities.
    - Leverage Delta Lake, Unity Catalog, and advanced features of Databricks for governed data sharing, versioning, and reproducibility.

    Client & Business Stakeholder Engagement
    - Partner with business stakeholders to translate functional requirements into scalable technical solutions.
    - Conduct architecture workshops and solutioning sessions with enterprise IT and business teams to define data-driven use cases.

    Data Pipeline Development & Collaboration
    - Collaborate with data engineers and data scientists to develop end-to-end pipelines using Python, PySpark, and SQL.
    - Enable data ingestion from diverse sources such as ERP (SAP), POS data, syndicated data, CRM, e-commerce platforms, and third-party datasets.

    Performance, Scalability, and Reliability
    - Optimize Spark jobs for performance tuning, cost efficiency, and scalability by configuring appropriate cluster sizing, caching, and query optimization techniques.
    - Implement monitoring and alerting using Databricks observability features, Ganglia, and cloud-native tools.

    Security, Compliance & Governance
    - Design secure architectures using Unity Catalog, role-based access control (RBAC), encryption, token-based access, and data lineage tools to meet compliance policies.
    - Establish data governance practices including data fitness indexes, quality scores, SLA monitoring, and metadata cataloging.

    Adoption of AI Copilots & Agentic Development
    - Utilize GitHub Copilot, Databricks Assistant, and other AI code agents for writing PySpark, SQL, and Python code snippets for data engineering and ML tasks; generating documentation and test cases to accelerate pipeline development; and interactive debugging and iterative code optimization within notebooks.
    - Advocate for agentic AI workflows that use specialized agents for data profiling, schema inference, and automated testing and validation.

    Innovation and Continuous Learning
    - Stay abreast of emerging trends in Lakehouse architectures, Generative AI, and cloud-native tooling.
    - Evaluate and pilot new features from Databricks releases and partner integrations for modern data stack improvements.
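    A minimal sketch of the Bronze-to-Silver step of that medallion layering, in PySpark with Delta tables; the paths, table names, and columns are hypothetical:

    ```python
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Bronze: land the raw POS extract unchanged (hypothetical path)
    bronze = spark.read.option("header", True).csv("s3://lake/raw/pos_sales/")
    bronze.write.format("delta").mode("append").saveAsTable("bronze.pos_sales")

    # Silver: a typed, deduplicated, cleansed view of the same data
    silver = (
        spark.table("bronze.pos_sales")
        .withColumn("sale_date", F.to_date("sale_date"))
        .withColumn("amount", F.col("amount").cast("double"))
        .filter(F.col("transaction_id").isNotNull())
        .dropDuplicates(["transaction_id"])
    )
    silver.write.format("delta").mode("overwrite").saveAsTable("silver.pos_sales")
    ```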
    Requirements:
    - Bachelor's or master's degree in Computer Science, Information Technology, or a related field.
    - 8-12 years of hands-on experience in data engineering, with at least 5 years on Python and Apache Spark.
    - Expertise in building high-throughput, low-latency ETL/ELT pipelines on AWS/Azure/GCP using Python, PySpark, and SQL.
    - Excellent hands-on experience with workload automation tools such as Airflow, Prefect, etc.
    - Familiarity with building dynamic ingestion frameworks from structured/unstructured data sources including APIs, flat files, RDBMS, and cloud storage.
    - Experience designing Lakehouse architectures with bronze, silver, and gold layering.
    - Strong understanding of data modeling concepts, star/snowflake schemas, dimensional modeling, and modern cloud-based data warehousing.
    - Experience designing data marts using cloud data warehouses and integrating with BI tools (Power BI, Tableau, etc.).
    - Experience building CI/CD pipelines using tools such as AWS CodeCommit, Azure DevOps, and GitHub Actions.
    - Knowledge of infrastructure-as-code (Terraform, ARM templates) for provisioning platform resources.
    - In-depth experience with AWS services such as Glue, S3, Redshift, etc.
    - Strong understanding of data privacy, access controls, and governance best practices.
    - Experience working with RBAC, tokenization, and data classification frameworks.
    - Excellent communication skills for stakeholder interaction, solution presentations, and team coordination.
    - Proven experience leading or mentoring global, cross-functional teams across multiple time zones and engagements.
    - Ability to work independently in agile or hybrid delivery models, while guiding junior engineers and ensuring solution quality.
    - Must be able to work in the PST time zone.

    Pay: The wage range for this role takes into account the wide range of factors that are considered in making compensation decisions, including but not limited to skill sets; experience and training; licensure and certifications; and other business and organizational needs. The disclosed range estimate has not been adjusted for the applicable geographic differential associated with the location at which the position may be filled. At Fractal, it is not typical for an individual to be hired at or near the top of the range for their role, and compensation decisions are dependent on the facts and circumstances of each case. A reasonable estimate of the current range is $150k-$180k. In addition, you may be eligible for a discretionary bonus for the current performance period.

    Benefits: As a full-time employee of the company or as an hourly employee working more than 30 hours per week, you will be eligible to participate in the health, dental, vision, life insurance, and disability plans in accordance with the plan documents, which may be amended from time to time. You will be eligible for benefits on the first day of employment with the Company. In addition, you are eligible to participate in the Company 401(k) Plan after 30 days of employment, in accordance with the applicable plan terms. The Company provides for 11 paid holidays and 12 weeks of Parental Leave. We also follow a "free time" PTO policy, allowing you the flexibility to take the time needed for either sick time or vacation.
Fractal provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
    $150k-180k yearly 1d ago
  • AI Data Engineer

    Hartleyco

    Data analyst job in San Francisco, CA

    Member of Technical Staff - AI Data Engineer
    San Francisco (In-Office) | $150K to $225K + Equity

    A high-growth, AI-native startup coming out of stealth is hiring AI Data Engineers to build the systems that power production-grade AI. The company has recently signed a Series A term sheet and is scaling rapidly. This role is central to unblocking current bottlenecks across data engineering, context modeling, and agent performance.

    Responsibilities:
    • Build distributed, reliable data pipelines using Airflow, Temporal, and n8n
    • Model SQL, vector, and NoSQL databases (Postgres, Qdrant, etc.)
    • Build API and function-based services in Python
    • Develop custom automations (Playwright, Stagehand, Zapier)
    • Work with AI researchers to define and expose context as services
    • Identify gaps in data quality and drive changes to upstream processes
    • Ship fast, iterate, and own outcomes end-to-end

    Required Experience:
    • Strong background in data engineering
    • Hands-on experience working with LLMs or LLM-powered applications
    • Data modeling skills across SQL and vector databases
    • Experience building distributed systems
    • Experience with Airflow, Temporal, n8n, or similar workflow engines
    • Python experience (API/services)
    • Startup mindset and bias toward rapid execution

    Nice To Have:
    • Experience with stream processing (Flink)
    • dbt or ClickHouse experience
    • CDC pipelines
    • Experience with context construction, RAG, or agent workflows
    • Analytical tooling (PostHog)

    What You Can Expect:
    • High-intensity, in-office environment
    • Fast decision-making and rapid shipping cycles
    • Real ownership over architecture and outcomes
    • Opportunity to work on AI systems operating at meaningful scale
    • Competitive compensation package
    • Meals provided plus full medical, dental, and vision benefits

    If this sounds like you, please apply now.
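    As a sketch of the retrieval behind the RAG and "context as services" work this posting mentions, nearest-neighbor lookup over embeddings reduces to a cosine-similarity ranking. This is illustrative only; per the posting, the actual stack uses a vector database such as Qdrant rather than raw NumPy:

    ```python
    import numpy as np

    def top_k_context(query_vec: np.ndarray, corpus: np.ndarray, k: int = 5):
        """Return indices and scores of the k corpus embeddings closest to the query."""
        q = query_vec / np.linalg.norm(query_vec)
        c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
        sims = c @ q                         # cosine similarity per corpus row
        return np.argsort(-sims)[:k], np.sort(sims)[::-1][:k]

    # Toy corpus: 1,000 documents with 384-dim embeddings
    rng = np.random.default_rng(0)
    docs = rng.normal(size=(1000, 384))
    idx, scores = top_k_context(docs[42] + 0.1 * rng.normal(size=384), docs)
    print(idx)  # document 42 should rank first
    ```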
    $150k-225k yearly 5d ago
  • Data Engineer

    Zigma LLC

    Data analyst job in Santa Rosa, CA

    Zigma LLC is a women-owned technology consulting and IT services start-up specializing in Big Data engineering, cloud data modernization, cloud architecture, and advanced analytics. Our mission is to empower organizations through secure, scalable, and high-performance digital ecosystems while maintaining a strong commitment to cybersecurity and compliance. We work with clients across various industries, including healthcare, telecom, and financial services, ranging from local businesses to enterprise-level corporations. Dedicated to fostering inclusion and women's leadership, we strive to deliver innovative solutions that drive operational efficiency and digital transformation. Zigma LLC combines technical expertise with a passion for empowering the next generation of women entrepreneurs.

    Data Engineer (Mid-Level) - Hybrid | C2C | Healthcare
    Locations: East Bay Area, CA | Greater Los Angeles Area, CA | Oregon's Willamette Valley, OR | Greater Atlanta Area, GA
    Employment Type: C2C
    Work Authorization: US Citizens, Green Card, H4/L2/Any EAD, OPT/CPT candidates
    Work Arrangement: Hybrid
    Openings: 3 per location
    Experience: 7-12 years
    Contract: Long-term (12+ months, performance-based)
    Preferred Education/Certification: B.S./M.S. in an engineering discipline with Computer Science, Data Engineering, or relevant skills and certifications

    Join a leading healthcare analytics team as a Data Engineer! Work on Azure Cloud, Databricks, and modern data pipelines to drive insights from complex healthcare datasets. This is a hybrid role with opportunities to collaborate across multiple locations.

    Key Responsibilities:
    • Design, build, and maintain ETL/ELT ingestion pipelines on Azure Cloud
    • Collaborate with data scientists and analysts to ensure data quality, governance, and availability
    • Implement batch and streaming data processing workflows
    • Optimize data workflows and pipelines for performance and scalability
    • Work with HIPAA-compliant healthcare data

    Technical Skills & Tools:
    - Programming & Scripting: Python, SQL, Scala/Java
    - Data Processing Frameworks: Apache Spark, Kafka, Airflow/Prefect
    - Databases: Relational (PostgreSQL, MySQL, SQL Server), NoSQL (MongoDB, Cassandra), Data Warehouses (Snowflake, Redshift)
    - Data Formats: CSV, JSON, Parquet, Avro, ORC
    - Version Control & DevOps: Git, Azure DevOps, CI/CD pipelines
    - Cloud & Containerization: Azure Cloud, Docker, Kubernetes, Terraform

    Core Skills:
    • ETL/ELT ingestion pipeline design
    • Batch & streaming data processing
    • Data modeling (star/snowflake schema)
    • Performance optimization & scalability
    • Data governance and security

    Must-Have:
    • 7-12 years in data engineering
    • Hands-on Azure Cloud and Databricks experience
    • M.S. in Data Science or relevant certifications (Databricks/Data Science)
    $110k-157k yearly est. 1d ago
  • Senior Data Engineer

    Sigmaways Inc.

    Data analyst job in Santa Rosa, CA

    If you're hands-on with modern data platforms, cloud tech, and big data tools, and you like building solutions that are secure, repeatable, and fast, this role is for you. As a Senior Data Engineer, you will design, build, and maintain scalable data pipelines that transform raw information into actionable insights. The ideal candidate will have strong experience across modern data platforms, cloud environments, and big data technologies, with a focus on building secure, repeatable, and high-performing solutions.

    Responsibilities:
    - Design, develop, and maintain secure, scalable data pipelines to ingest, transform, and deliver curated data into the Common Data Platform (CDP).
    - Participate in Agile rituals and contribute to delivery within the Scaled Agile Framework (SAFe).
    - Ensure quality and reliability of data products through automation, monitoring, and proactive issue resolution.
    - Deploy alerting and auto-remediation for pipelines and data stores to maximize system availability.
    - Apply a security-first and automation-driven approach to all data engineering practices.
    - Collaborate with cross-functional teams (data scientists, analysts, product managers, and business stakeholders) to align infrastructure with evolving data needs.
    - Stay current on industry trends and emerging tools, recommending improvements to strengthen efficiency and scalability.

    Qualifications:
    - Bachelor's degree in Computer Science, Information Systems, or a related field (or equivalent experience).
    - At least 3 years of experience with Python and PySpark, including Jupyter notebooks and unit testing.
    - At least 2 years of experience with Databricks, Collibra, and Starburst.
    - Proven work with relational and NoSQL databases, including star-schema and dimensional modeling approaches.
    - Hands-on experience with modern data stacks: object stores (S3), Spark, Airflow, lakehouse architectures, and cloud warehouses (Snowflake, Redshift).
    - Strong background in ETL and big data engineering (on-prem and cloud).
    - Experience working within enterprise cloud platforms (CFS2, Cloud Foundational Services 2/EDS) for governance and compliance.
    - Experience building end-to-end pipelines for structured, semi-structured, and unstructured data using Spark.
    $110k-157k yearly est. 2d ago
  • Data Engineer

    Midjourney

    Data analyst job in Santa Rosa, CA

    Midjourney is a research lab exploring new mediums to expand the imaginative powers of the human species. We are a small, self-funded team focused on design, human infrastructure, and AI. We have no investors, no big company controlling us, and no advertisers. We are 100% supported by our amazing community. Our tools are already used by millions of people to dream, to explore, and to create. But this is just the start. We think the story of the 2020s is about building the tools that will remake the world for the next century. We're making those tools, to expand what it means to be human.

    Core Responsibilities:
    - Design and maintain data pipelines to consolidate information across multiple sources (subscription platforms, payment systems, infrastructure and usage monitoring, and financial systems) into a unified analytics environment
    - Build and manage interactive dashboards and self-service BI tools that enable leadership to track key business metrics including revenue performance, infrastructure costs, customer retention, and operational efficiency
    - Serve as technical owner of our financial planning platform (Pigment or similar), leading implementation and build-out of models, data connections, and workflows in partnership with Finance leadership to translate business requirements into functional system architecture
    - Develop automated data quality checks and cleaning processes to ensure accuracy and consistency across financial and operational datasets
    - Partner with Finance, Product, and Operations teams to translate business questions into analytical frameworks, including cohort analysis, cost modeling, and performance trending (a cohort-analysis sketch follows below)
    - Create and maintain documentation for data models, ETL processes, dashboard logic, and system workflows to ensure knowledge continuity
    - Support strategic planning initiatives by building financial models, scenario analyses, and data-driven recommendations for resource allocation and growth investments

    Required Qualifications:
    - 3-5+ years of experience in data engineering, analytics engineering, or a similar role, with demonstrated ability to work with large-scale datasets
    - Strong SQL skills and experience with modern data warehousing solutions (BigQuery, Snowflake, Redshift, etc.)
    - Proficiency in at least one programming language (Python, R) for data manipulation and analysis
    - Experience with BI/visualization tools (Looker, Tableau, Power BI, or similar)
    - Hands-on experience administering enterprise financial systems (NetSuite, SAP, Oracle, or similar ERP platforms)
    - Experience working with Stripe Billing or similar subscription management platforms, including data extraction and revenue reporting
    - Ability to communicate technical concepts clearly to non-technical stakeholders
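    A minimal pandas sketch of the cohort analysis named in the responsibilities, using hypothetical column names (`user_id`, `order_date`):

    ```python
    import pandas as pd

    def retention_matrix(orders: pd.DataFrame) -> pd.DataFrame:
        """Monthly retention by signup cohort; order_date must be datetime64."""
        df = orders.copy()
        df["order_month"] = df["order_date"].dt.to_period("M")
        # Each user's cohort is the month of their first order
        df["cohort"] = df.groupby("user_id")["order_month"].transform("min")
        df["months_since"] = (df["order_month"] - df["cohort"]).apply(lambda d: d.n)
        active = (
            df.groupby(["cohort", "months_since"])["user_id"]
            .nunique()
            .unstack(fill_value=0)
        )
        return active.div(active[0], axis=0)  # fraction of each cohort still active
    ```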
    $110k-157k yearly est. 5d ago
  • Data Platform Engineer / AI Workloads

    The Crypto Recruiters (3.3 company rating)

    Data analyst job in San Francisco, CA

    We are actively searching for a Data Infrastructure Engineer to join our team on a permanent basis. In this founding engineer role you will focus on building next-generation data infrastructure for our AI platform. If you have a passion for distributed systems, unified storage, orchestration, and retrieval for AI workloads, we would love to speak with you.

    Your Rhythm:
    - Design, build, and maintain data infrastructure systems such as distributed compute, data orchestration, distributed storage, streaming infrastructure, and machine learning infrastructure, while ensuring scalability, reliability, and security
    - Ensure our data platform can scale by orders of magnitude while remaining reliable and efficient
    - Tackle complex challenges in distributed systems, databases, and AI infrastructure
    - Collaborate with technical leadership to define and refine the product roadmap
    - Write high-quality, well-tested, and maintainable code
    - Contribute to the open-source community and engage with developers in the space

    Your Vibe:
    - 5+ years of experience designing and building distributed database systems
    - Expertise in building and operating scalable, reliable, and secure database infrastructure systems
    - Strong knowledge of distributed compute, data orchestration, distributed storage, and streaming infrastructure
    - Strong knowledge of SQL and NoSQL databases, such as MySQL, Postgres, and MongoDB
    - Programming skills in Python
    - Passion for building developer tools and scalable infrastructure

    Our Vibe:
    - Relaxed work environment
    - 100% paid, top-of-the-line health care benefits
    - Full ownership, no micromanagement
    - Strong equity package
    - 401K
    - Unlimited vacation
    - An actual work/life balance; we aren't trying to run you into the ground. We have families and enjoy life too!
    $127k-180k yearly est. 2d ago
  • Data Engineer - Scientific Data Ingestion

    Mithrl

    Data analyst job in San Francisco, CA

    We envision a world where novel drugs and therapies reach patients in months, not years, accelerating breakthroughs that save lives. Mithrl is building the world's first commercially available AI Co-Scientist, a discovery engine that empowers life science teams to go from messy biological data to novel insights in minutes. Scientists ask questions in natural language, and Mithrl answers with real analysis, novel targets, and patent-ready reports. No coding. No waiting. No bioinformatics bottlenecks. We are the fastest-growing tech-bio startup in the Bay Area, with over 12X YoY revenue growth. Our platform is already being used by teams at some of the largest biotechs and big pharma across three continents to accelerate and uncover breakthroughs, from target discovery to mechanism of action.

    WHAT YOU WILL DO
    - Build and own an AI-powered ingestion & normalization pipeline to import data from a wide variety of sources: unprocessed Excel/CSV uploads, lab and instrument exports, as well as processed data from internal pipelines.
    - Develop robust schema mapping, coercion, and conversion logic (think: units normalization, metadata standardization, variable-name harmonization, vendor-instrument quirks, plate-reader formats, reference-genome or annotation updates, batch-effect correction, etc.); a sketch of this kind of normalization appears at the end of this listing.
    - Use LLM-driven and classical data-engineering tools to structure "semi-structured" or messy tabular data: extracting metadata, inferring column roles/types, cleaning free-text headers, fixing inconsistencies, and preparing final clean datasets.
    - Ensure all transformations that should only happen once (normalization, coercion, batch correction) execute during ingestion, so downstream analytics and the AI "Co-Scientist" always work with clean, canonical data.
    - Build validation, verification, and quality-control layers to catch ambiguous, inconsistent, or corrupt data before it enters the platform.
    - Collaborate with product teams, data science / bioinformatics colleagues, and infrastructure engineers to define and enforce data standards, and ensure pipeline outputs integrate cleanly into downstream analysis and storage systems.

    WHAT YOU BRING
    Must-have:
    - 5+ years of experience in data engineering / data wrangling with real-world tabular or semi-structured data.
    - Strong fluency in Python and data processing tools (Pandas, Polars, PyArrow, or similar).
    - Excellent experience dealing with messy Excel / CSV / spreadsheet-style data (inconsistent headers, multiple sheets, mixed formats, free-text fields) and normalizing it into clean structures.
    - Comfort designing and maintaining robust ETL/ELT pipelines, ideally for scientific or lab-derived data.
    - Ability to combine classical data engineering with LLM-powered data normalization, metadata extraction, and cleaning.
    - Strong desire and ability to own the ingestion & normalization layer end-to-end, from raw upload → final clean dataset, with an eye for maintainability, reproducibility, and scalability.
    - Good communication skills; able to collaborate across teams (product, bioinformatics, infra) and translate real-world messy data problems into robust engineering solutions.

    Nice-to-have:
    - Familiarity with scientific data types and "modalities" (e.g. plate readers, genomics metadata, time series, batch info, instrumentation outputs).
    - Experience with workflow orchestration tools (e.g. Nextflow, Prefect, Airflow, Dagster), or building pipeline abstractions.
    - Experience with cloud infrastructure and data storage (AWS S3, data lakes/warehouses, database schemas) to support multi-tenant ingestion.
    - Past exposure to LLM-based data transformation or cleansing agents: building or integrating tools that clean or structure messy data automatically.
    - Any background in computational biology / lab data / bioinformatics is a bonus, though not required.

    WHAT YOU WILL LOVE AT MITHRL
    - Mission-driven impact: you'll be the gatekeeper of data quality, ensuring that all scientific data entering Mithrl becomes clean, consistent, and analysis-ready. You'll have outsized influence over the reliability and trustworthiness of our entire data + AI stack.
    - High ownership & autonomy: this role is yours to shape. You decide how ingestion works, define the standards, and build the pipelines. You'll work closely with our product, data science, and infrastructure teams, shaping how data is ingested, stored, and exposed to end users or AI agents.
    - Team: join a tight-knit, talent-dense team of engineers, scientists, and builders.
    - Culture: we value consistency, clarity, and hard work. We solve hard problems through focused daily execution.
    - Speed: we ship fast (2x/week) and improve continuously based on real user feedback.
    - Location: beautiful SF office with a high-energy, in-person culture.
    - Benefits: comprehensive PPO health coverage through Anthem (medical, dental, and vision) + 401(k) with top-tier plans.
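    A toy version of the header-harmonization and unit-coercion step described above; the unit map and the 90% numeric threshold are assumptions for illustration:

    ```python
    import re
    import pandas as pd

    # Assumed canonical unit map: suffix -> multiplier into grams
    UNIT_FACTORS = {"ug": 1e-6, "mg": 1e-3, "g": 1.0}

    def normalize_table(df: pd.DataFrame) -> pd.DataFrame:
        out = df.copy()
        # Harmonize free-text headers into snake_case
        out.columns = [re.sub(r"\W+", "_", str(c).strip().lower()).strip("_")
                       for c in out.columns]
        for col in list(out.columns):
            # Coerce object columns that are mostly numeric
            if out[col].dtype == object:
                nums = pd.to_numeric(out[col], errors="coerce")
                if nums.notna().mean() > 0.9:
                    out[col] = nums
            # Convert unit-suffixed columns (e.g. "dose_mg") to canonical grams
            m = re.search(r"_(ug|mg|g)$", col)
            if m and pd.api.types.is_numeric_dtype(out[col]):
                out[col.removesuffix(f"_{m.group(1)}") + "_g"] = (
                    out.pop(col) * UNIT_FACTORS[m.group(1)]
                )
        return out
    ```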
    $110k-157k yearly est. 2d ago
  • Data Engineer

    Odiin

    Data analyst job in San Francisco, CA

    You'll work closely with engineering, analytics, and product teams to ensure data is accurate, accessible, and efficiently processed across the organization.

    Key Responsibilities:
    - Design, develop, and maintain scalable data pipelines and architectures.
    - Collect, process, and transform data from multiple sources into structured, usable formats.
    - Ensure data quality, reliability, and security across all systems.
    - Work with data analysts and data scientists to optimize data models for analytics and machine learning.
    - Implement ETL (Extract, Transform, Load) processes and automate workflows.
    - Monitor and troubleshoot data infrastructure, ensuring minimal downtime and high performance.
    - Collaborate with cross-functional teams to define data requirements and integrate new data sources.
    - Maintain comprehensive documentation for data systems and processes.

    Requirements:
    - Proven experience as a Data Engineer, ETL Developer, or similar role.
    - Strong programming skills in Python, SQL, or Scala.
    - Experience with data pipeline tools (Airflow, dbt, Luigi, etc.).
    - Familiarity with big data technologies (Spark, Hadoop, Kafka, etc.).
    - Hands-on experience with cloud data platforms (AWS, GCP, Azure, Snowflake, or Databricks).
    - Understanding of data modeling, warehousing, and schema design.
    - Solid knowledge of database systems (PostgreSQL, MySQL, NoSQL).
    - Strong analytical and problem-solving skills.
    $110k-157k yearly est. 3d ago
  • Data Engineer / Analytics Specialist

    Ittconnect

    Data analyst job in San Francisco, CA

    Citizenship Requirement: U.S. Citizens only

    ITTConnect is seeking a Data Engineer / Analytics Specialist to work for one of our clients, a major technology consulting firm headquartered in Europe. They are experts in tailored technology consulting and services for banks, investment firms, and other financial-vertical clients.

    Job location: San Francisco Bay Area or New York City
    Work Model: Ability to come into the office as requested
    Seniority: 10+ years of total experience

    About the role: The Data Engineer / Analytics Specialist will support analytics, product insights, and AI initiatives. You will build robust data pipelines, integrate data sources, and enhance the organization's analytical foundations.

    Responsibilities:
    - Build and operate Snowflake-based analytics environments.
    - Develop ETL/ELT pipelines (dbt, Airflow, etc.); a minimal DAG sketch follows below.
    - Integrate APIs, external data sources, and streaming inputs.
    - Perform query optimization, basic data modeling, and analytics support.
    - Enable downstream GenAI and analytics use cases.

    Requirements:
    - 10+ years of overall technology experience
    - 3+ years of hands-on AWS experience required
    - Strong SQL and Snowflake experience
    - Hands-on pipeline engineering with dbt, Airflow, or similar
    - Experience with API integrations and modern data architectures
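    A minimal sketch of such a pipeline as an Airflow DAG; the task bodies and names are placeholders, and dbt would typically be invoked through its own operator or CLI:

    ```python
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract_from_api():
        ...  # pull from an external API into staging (placeholder)

    def load_to_snowflake():
        ...  # copy staged files into Snowflake (placeholder)

    with DAG(
        dag_id="analytics_refresh",
        start_date=datetime(2025, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract", python_callable=extract_from_api)
        load = PythonOperator(task_id="load", python_callable=load_to_snowflake)
        extract >> load  # load runs only after extraction succeeds
    ```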
    $110k-157k yearly est. 3d ago

Learn more about data analyst jobs

How much does a data analyst earn in Santa Rosa, CA?

The average data analyst in Santa Rosa, CA earns between $60,000 and $129,000 annually. This compares to the national average data analyst range of $53,000 to $103,000.
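The $88,000 figure below is consistent with the geometric mean of the two endpoints (the arithmetic mean would be $94,500); the site does not state its method, so this is only a plausible reading:

$$\sqrt{\$60{,}000 \times \$129{,}000} \approx \$88{,}000, \qquad \frac{\$60{,}000 + \$129{,}000}{2} = \$94{,}500.$$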

Average data analyst salary in Santa Rosa, CA

$88,000

What are the biggest employers of Data Analysts in Santa Rosa, CA?

The biggest employers of Data Analysts in Santa Rosa, CA are:
  1. Atos
  2. Zigma LLC