Senior data scientist jobs in Petaluma, CA - 653 jobs
EY-Parthenon - Strategy and Execution - Growth Platforms - Data Scientist - Director
Ernst & Young Oman 4.7
Senior data scientist job in San Francisco, CA
Location: Atlanta, Boston, Chicago, Dallas, Denver, Detroit, Houston, Los Angeles, McLean, New York, Hoboken, Philadelphia, San Francisco, Seattle
At EY, we're all in to shape your future with confidence.
We'll help you succeed in a globally connected powerhouse of diverse teams and take your career wherever you want it to go. Join EY and help to build a better working world.
EY-Parthenon - EY Growth Platforms - Data Scientist - Director
The opportunity
EY-Parthenon's unique combination of transformative strategy, transactions and corporate finance delivers real-world value - solutions that work in practice, not just on paper. Benefiting from EY's full spectrum of services, we've reimagined strategic consulting to work in a world of increasing complexity.
With deep functional and sector expertise, paired with innovative AI-powered technology and an investor mindset, we partner with CEOs, Boards, Private Equity and Governments every step of the way - enabling you to shape your future with confidence.
Within the EY-Parthenon service line, the EY Growth Platforms Data Scientist Director will collaborate with Business Leaders, AI/ML Engineers, Project Managers, and other team members to design, build, and scale innovative AI solutions that power strategic growth initiatives and create enterprise value for F500 clients.
Your key responsibilities
The EY Growth Platforms Data Scientist Director will play a critical role in building and scaling our multi-source data pipelines: sourcing, merging, and transforming data assets that power high-visibility client engagements. This role will architect, clean, transform, and enrich data to power AI/ML-driven agents and dashboards, and collaborate with Business leaders and C-level executives to get hands‑on experience solving some of the most interesting and mission‑critical business questions with data.
Skills and attributes for success
Lead ingestion and ETL design for structured and semi‑structured data (CSV, JSON, APIs, Flat Files).
Understand schema, data quality, and transformation logic for multiple sources on a client‑by‑client basis, such as NAIC, NOAA, Google Trends, EBRI, Cannex, LIMRA, and internal client logs.
Design normalization and joining pipelines across vertical domains (insurance + consumer + economic data).
Build data access layers optimized for ML (feature stores, event streams, vector stores).
Define and enforce standards for data provenance, quality checks, logging, and version control.
Partner with AI/ML and Platform teams to ensure data is ML‑ and privacy‑ready (HIPAA, SOC2, etc.).
To qualify for the role you must have
A bachelor's degree in Business, Statistics, Economics, Mathematics, Engineering, Computer Science, Analytics, or other related field and 5 years of related work experience; or a graduate degree and approximately 3 years of related work experience.
Experience in data engineering or hybrid data science roles focused on pipeline scalability and schema management.
Expertise in cloud‑native data infrastructure (e.g., GCP/AWS, Snowflake, BigQuery, Databricks, Delta Lake).
Strong SQL/Python/Scala proficiency and experience with orchestration tools (Airflow, dbt).
Experience with merging and reconciling third‑party data (public APIs, vendor flat files, dashboards).
Comfort defining semantic layers and mapping unstructured/dirty datasets into usable models for AI/BI use.
Basic understanding of ML/feature pipelines and downstream modeling needs.
The ability and willingness to travel and work in excess of standard hours when necessary.
Ideally, you will have
Experience working in a startup and/or management/strategy consulting.
Knowledge of how to leverage AI tools in a business setting, including Microsoft Copilot.
Collaborative, problem‑solving, and growth‑oriented mindset.
What we look for
We're interested in passionate leaders with strong vision and a desire to stay on top of trends in the Data Science and Big Data industry. If you have a genuine passion for helping businesses achieve the full potential of their data, this role is for you.
What we offer you
At EY, we'll develop you with future‑focused skills and equip you with world‑class experiences. We'll empower you in a flexible environment, and fuel you and your extraordinary talents in a diverse and inclusive culture of globally connected teams.
We offer a comprehensive compensation and benefits package where you'll be rewarded based on your performance and recognized for the value you bring to the business. The base salary range for this job in all geographic locations in the US is $205,000 to $235,000. Individual salaries within those ranges are determined through a wide variety of factors including but not limited to education, experience, knowledge, skills and geography. In addition, our Total Rewards package includes medical and dental coverage, pension and 401(k) plans, and a wide range of paid time off options.
Join us in our team‑led and leader‑enabled hybrid model. Our expectation is for most people in external, client serving roles to work together in person 40‑60% of the time over the course of an engagement, project or year.
Under our flexible vacation policy, you'll decide how much vacation time you need based on your own personal circumstances. You'll also be granted time off for designated EY Paid Holidays, Winter/Summer breaks, Personal/Family Care, and other leaves of absence when needed to support your physical, financial, and emotional well‑being.
Are you ready to shape your future with confidence? Apply today.
EY accepts applications for this position on an on‑going basis.
For those living in California, please click here for additional information.
EY focuses on high‑ethical standards and integrity among its employees and expects all candidates to demonstrate these qualities.
EY | Building a better working world
EY is building a better working world by creating new value for clients, people, society and the planet, while building trust in capital markets.
Enabled by data, AI and advanced technology, EY teams help clients shape the future with confidence and develop answers for the most pressing issues of today and tomorrow.
EY teams work across a full spectrum of services in assurance, consulting, tax, strategy and transactions. Fueled by sector insights, a globally connected, multi‑disciplinary network and diverse ecosystem partners, EY teams can provide services in more than 150 countries and territories.
EY provides equal employment opportunities to applicants and employees without regard to race, color, religion, age, sex, sexual orientation, gender identity/expression, pregnancy, genetic information, national origin, protected veteran status, disability status, or any other legally protected basis, including arrest and conviction records, in accordance with applicable law.
EY is committed to providing reasonable accommodation to qualified individuals with disabilities including veterans with disabilities. If you have a disability and either need assistance applying online or need to request an accommodation during any part of the application process, please call 1-800-EY-HELP3, select Option 2 for candidate related inquiries, then select Option 1 for candidate queries and finally select Option 2 for candidates with an inquiry which will route you to EY's Talent Shared Services Team (TSS) or email the TSS at ************************** .
$205k-235k yearly 4d ago
Staff Data Scientist - Post Sales
Harnham
Senior data scientist job in Santa Rosa, CA
Salary: $200-250k base + RSUs
This fast-growing Series E AI SaaS company is redefining how modern engineering teams build and deploy applications. We're expanding our data science organization to accelerate customer success after the initial sale, driving onboarding, retention, expansion, and long-term revenue growth.
About the Role
As the senior data scientist supporting post-sales teams, you will use advanced analytics, experimentation, and predictive modeling to guide strategy across Customer Success, Account Management, and Renewals. Your insights will help leadership forecast expansion, reduce churn, and identify the levers that unlock sustainable net revenue retention.
Key Responsibilities
Forecast & Model Growth: Build predictive models for renewal likelihood, expansion potential, churn risk, and customer health scoring.
Optimize the Customer Journey: Analyze onboarding flows, product adoption patterns, and usage signals to improve activation, engagement, and time-to-value.
Experimentation & Causal Analysis: Design and evaluate experiments (A/B tests, uplift modeling) to measure the impact of onboarding programs, success initiatives, and pricing changes on retention and expansion.
Revenue Insights: Partner with Customer Success and Sales to identify high-value accounts, cross-sell opportunities, and early warning signs of churn.
Cross-Functional Partnership: Collaborate with Product, RevOps, Finance, and Marketing to align post-sales strategies with company growth goals.
Data Infrastructure Collaboration: Work with Analytics Engineering to define data requirements, maintain data quality, and enable self-serve dashboards for Success and Finance teams.
Executive Storytelling: Present clear, actionable recommendations to senior leadership that translate complex analysis into strategic decisions.
About You
Experience: 6+ years in data science or advanced analytics, with a focus on post-sales, customer success, or retention analytics in a B2B SaaS environment.
Technical Skills: Expert SQL and proficiency in Python or R for statistical modeling, forecasting, and machine learning.
Domain Knowledge: Deep understanding of SaaS metrics such as net revenue retention (NRR), gross churn, expansion ARR, and customer health scoring.
Analytical Rigor: Strong background in experimentation design, causal inference, and predictive modeling to inform customer-lifecycle strategy.
Communication: Exceptional ability to translate data into compelling narratives for executives and cross-functional stakeholders.
Business Impact: Demonstrated success improving onboarding efficiency, retention rates, or expansion revenue through data-driven initiatives.
$200k-250k yearly 2d ago
Data Partnerships Lead - Equity & Growth (SF)
Exa
Senior data scientist job in San Francisco, CA
A cutting-edge AI search engine company in San Francisco is seeking a Data Partnerships specialist to build their data pipeline. The role involves owning the partnerships cycle, making strategic decisions, negotiating contracts, and potentially building a team. Candidates should have experience in contract negotiation and a Juris Doctor degree. This in-person role offers a competitive salary range of $160,000 - $250,000 with above-market equity.
$160k-250k yearly 5d ago
Data Scientist
Talent Software Services 3.6
Senior data scientist job in Novato, CA
Are you an experienced Data Scientist with a desire to excel? If so, then Talent Software Services may have the job for you! Our client is seeking an experienced Data Scientist to work at their company in Novato, CA.
Client's Data Science is responsible for designing, capturing, analyzing, and presenting data that can drive key decisions for Clinical Development, Medical Affairs, and other business areas of Client. With a quality-by-design culture, Data Science builds quality data that is fit-for-purpose to support statistically sound investigation of critical scientific questions. The Data Science team develops solid analytics that are visually relevant and impactful in supporting key data-driven decisions across Client.
The Data Management Science (DMS) group contributes to Data Science by providing complete, correct, and consistent analyzable data at data, data structure and documentation levels following international standards and GCP. The DMS Center of Risk Based Quality Management (RBQM) sub-function is responsible for the implementation of a comprehensive, cross-functional strategy to proactively manage quality risks for clinical trials. Starting at protocol development, the team collaborates to define critical-to-quality factors, design fit-for-purpose quality strategies, and enable ongoing oversight through centralized monitoring and data-driven risk management.
The RBQM Data Scientist supports central monitoring and risk-based quality management (RBQM) for clinical trials. This role focuses on implementing and running pre-defined KRIs, QTLs, and other risk metrics using clinical data, with strong emphasis on SAS programming to deliver robust and scalable analytics across multiple studies.
Primary Responsibilities/Accountabilities:
The RBQM Data Scientist may perform a range of the following responsibilities, depending upon the study's complexity and the study's development stage:
Implement and maintain pre-defined KRIs, QTLs, and triggers using robust SAS programs/macros across multiple clinical studies.
Extract, transform, and integrate data from EDC systems (e.g., RAVE) and other clinical sources into analysis-ready SAS datasets.
Run routine and ad-hoc RBQM/central monitoring outputs (tables, listings, data extracts, dashboard feeds) to support signal detection and study review.
Perform QC and troubleshooting of SAS code; ensure outputs are accurate and efficient.
Maintain clear technical documentation (specifications, validation records, change logs) for all RBQM programs and processes.
Collaborate with Central Monitors, Central Statistical Monitors, Data Management, Biostatistics, and Study Operations to understand requirements and ensure correct implementation of RBQM metrics.
Qualifications:
PhD, MS, or BA/BS in statistics, biostatistics, computer science, data science, life science, or a related field.
Relevant clinical development experience (programming, RBM/RBQM, Data Management), for example:
PhD: 3+ years
MS: 5+ years
BA/BS: 8+ years
Advanced SAS programming skills (hard requirement) in a clinical trials environment (Base SAS, Macro, SAS SQL; experience with large, complex clinical datasets).
Hands-on experience working with clinical trial data.
Proficiency with Microsoft Word, Excel, and PowerPoint.
Technical - Preferred / Strong Plus
Experience with RAVE EDC.
Awareness or working knowledge of CDISC, CDASH, SDTM standards.
Exposure to R, Python, or JavaScript and/or clinical data visualization tools/platforms.
Preferred:
Knowledge of GCP, ICH, FDA guidance related to clinical trials and risk-based monitoring.
Strong analytical and problem-solving skills; ability to interpret complex data and risk outputs.
Effective communication and teamwork skills; comfortable collaborating with cross-functional, global teams.
Ability to manage multiple programming tasks and deliver high-quality work in a fast-paced environment.
$99k-138k yearly est. 3d ago
Staff Data Engineer, Energy
Medium 4.0
Senior data scientist job in San Francisco, CA
About GoodLeap
GoodLeap is a technology company delivering best-in-class financing and software products for sustainable solutions, from solar panels and batteries to energy-efficient HVAC, heat pumps, roofing, windows, and more. Over 1 million homeowners have benefited from our simple, fast, and frictionless technology that makes the adoption of these products more affordable, accessible, and easier to understand. Thousands of professionals deploying home efficiency and solar solutions rely on GoodLeap's proprietary, AI-powered applications and developer tools to drive more transparent customer communication, deeper business intelligence, and streamlined payment and operations. Our platform has led to more than $30 billion in financing for sustainable solutions since 2018.
GoodLeap is also proud to support our award-winning nonprofit, GivePower, which is building and deploying life-saving water and clean electricity systems, changing the lives of more than 1.6 million people across Africa, Asia, and South America.
Position Summary
The GoodLeap team is looking for a hands‑on Data Engineer with a strong background in API data integrations, Spark processing and data lake development. The focus of this role will be on ingesting production energy data and helping get the aggregated metrics to the many teams in GoodLeap that need them. The successful candidate is a highly motivated individual with strong technical skills to create secure and performant data pipelines as well as support our foundational enterprise data warehouse. The ideal candidate is passionate about quality and has a bold, visionary approach to data practices in a modern finance enterprise.
The candidate in this role will be required to work closely with cross‑functional teams to effectively coordinate the complex interdependencies inherent in the applications. Typical teams we collaborate with are Analytics & Reporting, Origination Platform engineers and AI developers. We are looking for a hardworking and passionate engineer who wants to make a difference with the tools they develop.
Essential Job Duties and Responsibilities
Implement data integrations across the organization as well as with business applications
Develop and maintain data oriented web applications with scalable web services
Participate in the design and development of projects, either independently or in a team
Utilize agile software development lifecycle and DevOps principles
Be the data stewards of the organization upholding quality and availability standards for our downstream consumers
Be self‑sufficient and fully own the responsibility of executing projects from inception to delivery
Provide mentorship to team members including pair programming and skills development
Participate in data design and architecture discussions, considering solutions in the context of the larger GoodLeap ecosystem
Required Skills, Knowledge & Abilities
6-10 years of full‑time Data Analysis and/or Software Development experience
Experience with an end to end reporting & analytics technology: data warehousing (SQL, NoSQL) to BI/Visualization (Tableau, PowerBI, Excel)
Degree in Computer Science or related discipline
Experience with DataBricks/Spark processing
Expertise with relational databases (including functional SQL/stored procedures) and non‑relational databases (MongoDB, DynamoDB, Elastic Search)
Experience with orchestrating data pipelines with modern tools such as Airflow
Strong knowledge and hands‑on experience with open source web frameworks (e.g. Vue /React)
Solid understanding of performance implications and scalability of code
Experience with Amazon Web Services (IAM, Cognito, EC2, S3, RDS, Cloud Formation)
Experience with messaging paradigms and serverless technologies (Lambda, SQS, SNS, SES)
Experience working with server‑less applications on public clouds (e.g. AWS)
Experience with large, complex codebases and know how to maintain them
$160,000 - $210,000 a year
In addition to the above salary, this role may be eligible for a bonus and equity.
Additional Information Regarding Job Duties and Job Descriptions
Job duties include additional responsibilities as assigned by one's supervisor or other managers related to the position/department. This job description is meant to describe the general nature and level of work being performed; it is not intended to be construed as an exhaustive list of all responsibilities, duties and other skills required for the position. The Company reserves the right at any time with or without notice to alter or change job responsibilities, reassign or transfer job position or assign additional job responsibilities, subject to applicable law. The Company shall provide reasonable accommodations of known disabilities to enable a qualified applicant or employee to apply for employment, perform the essential functions of the job, or enjoy the benefits and privileges of employment as required by the law.
If you are an extraordinary professional who thrives in a collaborative work culture and values a rewarding career, then we want to work with you! Apply today!
We are committed to protecting your privacy. To learn more about how we collect, use, and safeguard your personal information during the application process, please review our Employment Privacy Policy and Recruiting Policy on AI.
$160k-210k yearly 3d ago
Staff Machine Learning Data Engineer
Backflip 3.7
Senior data scientist job in San Francisco, CA
Mechanical design, the work done in CAD, is the rate-limiter for progress in the physical world. However, there are only 2-4 million people on Earth who know how to CAD. But what if hundreds of millions could? What if creating something in the real world were as easy as imagining the use case, or sketching it on paper?
Backflip is building a foundation model for mechanical design: unifying the world's scattered engineering knowledge into an intelligent, end-to-end design environment. Our goal is to enable anyone to imagine a solution and hit “print.”
Founded by a second-time CEO in the same space (first company: Markforged), Backflip combines deep industry insight with breakthrough AI research. Backed by a16z and NEA, we raised a $30M Series A and built a deeply technical, mission-driven team.
We're building the AI foundation that tomorrow's space elevators, nanobots, and spaceships will be built in.
If you're excited to define the next generation of hard tech, come build it with us.
The Role
We're looking for a Staff Machine Learning Data Engineer to lead and build the data pipelines powering Backflip's foundation model for manufacturing and CAD.
You'll design the systems, tools, and strategies that turn the world's engineering knowledge - text, geometry, and design intent - into high-quality training data.
This is a core leadership role within the AI team, driving the data architecture, augmentation, and evaluation that underpin our model's performance and evolution.
You'll collaborate with Machine Learning Engineers to run data-driven experiments, analyze results, and deliver AI products that shape the future of the physical world.
What You'll Do
Architect and own Backflip's ML data pipeline, from ingestion to processing to evaluation.
Define data strategy: establish best practices for data augmentation, filtering, and sampling at scale.
Design scalable data systems for multimodal training (text, geometry, CAD, and more).
Develop and automate data collection, curation, and validation workflows.
Collaborate with MLEs to design and execute experiments that measure and improve model performance.
Build tools and metrics for dataset analysis, monitoring, and quality assurance.
Contribute to model development through insights grounded in data, shaping what, how, and when we train.
Who You Are
You've built and maintained ML data pipelines at scale, ideally for foundation or generative models, that shipped into production in the real world.
You have deep experience with data engineering for ML, including distributed systems, data extraction, transformation, and loading, and large-scale data processing (e.g. PySpark, Beam, Ray, or similar).
You're fluent in Python and experienced with ML frameworks and data formats (Parquet, TFRecord, HuggingFace datasets, etc.).
You've developed data augmentation, sampling, or curation strategies that improved model performance.
You think like both an engineer and an experimentalist: curious, analytical, and grounded in evidence.
You collaborate well across AI development, infra, and product, and enjoy building the data systems that make great models possible.
You care deeply about data quality, reproducibility, and scalability.
You're excited to help shape the future of AI for physical design.
Bonus points if:
You are comfortable working with a variety of complex data formats, e.g. for 3D geometry kernels or rendering engines.
You have an interest in math, geometry, topology, rendering, or computational geometry.
You've worked in 3D printing, CAD, or computer graphics domains.
Why Backflip
This is a rare opportunity to own the data backbone of a frontier foundation model, and help define how AI learns to design the physical world.
You'll join a world-class, mission-driven team operating at the intersection of research, engineering, and deep product sense, building systems that let people design the physical world as easily as they imagine it.
Your work will directly shape the performance, capability, and impact of Backflip's foundation model, the core of how the world will build in the future.
Let's build the tools the future will be made in.
$126k-178k yearly est. 3d ago
Founding ML Infra Engineer - Audio Data Platform
David Ai
Senior data scientist job in San Francisco, CA
A pioneering audio tech company based in San Francisco is searching for a Founding Machine Learning Infrastructure Engineer. In this role, you will build and scale the core infrastructure that powers cutting-edge audio ML products. You will lead the development of systems for training and deploying models. Candidates should have over 5 years of backend experience with strong skills in cloud infrastructure and machine learning principles. The company offers benefits like unlimited PTO and comprehensive health coverage.
$110k-157k yearly est. 3d ago
Data/Full Stack Engineer, Data Storage & Ingestion Consultant
Eon Systems PBC
Senior data scientist job in San Francisco, CA
About us
At Eon, we are at the forefront of large-scale neuroscientific data collection. Our mission is to enable the safe and scalable development of brain emulation technology to empower humanity over the next decade, beginning with the creation of a fully emulated digital twin of a mouse.
Role
We're a San Francisco team collecting very large microscopy datasets and we need an expert to design and implement our end-to-end data pipeline, from high-rate ingest to multi-petabyte storage and downstream processing. You'll own the strategy (on-prem vs. S3 or hybrid), the bill of materials, and the deployment, and you'll be on the floor wiring, racking, tuning, and validating performance.
Our current instruments generate data at ~1+ GB/s sustained (higher during bursts) and the program will accumulate multiple petabytes total over time. You'll help us choose and implement the right architecture considering reliability and cost controls.
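To put the sustained rate in perspective, here is a rough back-of-the-envelope sketch of how quickly storage fills. It assumes decimal units (1 PB = 1,000,000 GB) and a hypothetical duty cycle, meaning the fraction of time the instruments are actually recording; neither assumption comes from the listing, and the function name is illustrative only.

```python
# Back-of-the-envelope storage growth for a sustained ingest rate.
# Assumptions (not from the listing): decimal units (1 PB = 1,000,000 GB)
# and a hypothetical duty cycle, i.e. the fraction of time instruments record.

GB_PER_PB = 1_000_000
SECONDS_PER_DAY = 86_400


def days_to_fill(petabytes: float, rate_gb_per_s: float, duty_cycle: float = 1.0) -> float:
    """Return the days of operation needed to accumulate `petabytes` of data."""
    if rate_gb_per_s <= 0 or not 0 < duty_cycle <= 1:
        raise ValueError("rate must be positive and duty_cycle must be in (0, 1]")
    gb_per_day = rate_gb_per_s * duty_cycle * SECONDS_PER_DAY
    return petabytes * GB_PER_PB / gb_per_day


if __name__ == "__main__":
    # Compare continuous acquisition with a hypothetical 25% duty cycle.
    for duty in (1.0, 0.25):
        days = days_to_fill(1, 1.0, duty)
        print(f"duty cycle {duty:.0%}: {days:.1f} days per PB")
```

At 1 GB/s recorded around the clock, one petabyte accumulates in under 12 days, which is why tiering and compression decisions matter early.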
Outcomes (what success looks like)
Within 2 weeks: Implement an immediate data-handling strategy that reliably ingests our initial data streams, and deliver a documented medium-term data architecture covering storage, networking, ingest, and durability.
Within 1 month: Operationalize the medium-term pipeline in production (ingest → buffer → long-term store → compute access).
Ongoing: Maintain ≥95% uptime for the end-to-end data-handling pipeline after setup.
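The uptime target above translates into a concrete downtime budget. The sketch below assumes uptime is measured against wall-clock time over the whole window with no excluded maintenance periods; that measurement convention and the function name are assumptions for illustration, not details from the listing.

```python
# Downtime allowed by an uptime target over a given window.
# Assumption (not from the listing): uptime is measured against wall-clock
# time over the whole window, with no excluded maintenance periods.

HOURS_PER_DAY = 24


def downtime_budget_hours(uptime_target: float, window_days: float) -> float:
    """Return the hours of downtime permitted within `window_days`."""
    if not 0 <= uptime_target <= 1:
        raise ValueError("uptime_target must be a fraction between 0 and 1")
    if window_days < 0:
        raise ValueError("window_days must be non-negative")
    return (1 - uptime_target) * window_days * HOURS_PER_DAY


if __name__ == "__main__":
    for days in (30, 365):
        hours = downtime_budget_hours(0.95, days)
        print(f"95% uptime over {days} days allows about {hours:.0f} hours of downtime")
```

Under these assumptions, a 95% target leaves roughly 36 hours of downtime per 30-day window, so the ingest buffer has to be sized to absorb outages of that order if acquisition cannot pause.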
Responsibilities
Architect ingest & storage: Choose and implement an on-prem hardware and data pipeline design or a cloud/S3 alternative with explicit cost and performance tradeoffs at multi-petabyte scale.
Set up a sustained-write ingest path ≥1 GB/s with adequate burst headroom (camera/frame-to-disk), including networking considerations, cooling, and throttling safeguards.
Optimize footprint & cost: Incorporate on-the-fly compression/downsampling options and quantify CPU budget vs. write-speed tradeoffs; document when/where to compress to control $/PB.
Integrate with acquisition workflows ensuring image data and metadata are compatible with downstream stitching/flat-field correction pipelines.
Enable downstream compute: Expose the data to segmentation/analysis stacks (local GPU nodes or cloud).
Skills
5+ years designing and deploying high-throughput storage or HPC pipelines (≥1 GB/s sustained ingest) in production.
Deep hands-on with: NVMe RAID/striping, ZFS/MDRAID/erasure coding, PCIe topology, NUMA pinning, Linux performance tuning, and NIC offload features.
Proven delivery of multi-GB/s ingest systems and petabyte-scale storage in production (life-sciences, vision, HPC, or media).
Experience building tiered storage systems (NVMe → HDD/object) and validating real-world throughput under sustained load.
Practical S3/object-storage know-how (AWS S3 and/or on-prem S3-compatible systems) with lifecycle, versioning, and cost controls.
Data integrity & reliability: snapshots, scrubs, replication, erasure coding, and backup/DR for PB-scale systems.
Networking: ****25/40/100 GbE (SFP+/SFP28), RDMA/ RoCE/iWARP familiarity; switch config and path tuning.
Ability to spec and rack hardware: selecting chassis/backplanes, RAID/HBA cards, NICs, and cooling strategies to prevent NVMe throttling under sustained writes.
Ideal skills:
Experience with microscopy or scientific imaging ingest at frame-to-disk speeds, including Micro-Manager-based pipelines and raw-to-containerized format conversions.
Experience with life science imaging data a plus.
Engagement details
Contract (1099 or corp-to-corp); contract-to-hire if there's a mutual fit.
On-site requirement: You must be physically present in San Francisco during build-out and initial operations; local field work (e.g., UCSF) as needed.
Compensation: Contract, $100-300/hour
Timeline: Immediate start
$110k-157k yearly est. 4d ago
Global Data ML Engineer for Multilingual Speech & AI
Cartesia
Senior data scientist job in San Francisco, CA
A leading technology company in San Francisco is seeking a Machine Learning Engineer to ensure the quality and coverage of data across diverse languages. You will design large-scale datasets, evaluate models, and implement quality control systems. The ideal candidate has expertise in multilingual datasets and a strong background in applied ML. This full-time role offers competitive benefits, including fully covered insurance and in-office perks, in a supportive team environment.
$110k-157k yearly est. 5d ago
Data/Full Stack Engineer, Data Storage & Ingestion Consultant
Kubelt
Senior data scientist job in San Francisco, CA
Employment Type: Full time
Department: Engineering
About us
At Eon, we are at the forefront of large-scale neuroscientific data collection. Our mission is to enable the safe and scalable development of brain emulation technology to empower humanity over the next decade, beginning with the creation of a fully emulated digital twin of a mouse.
Role
We're a San Francisco team collecting very large microscopy datasets and we need an expert to design and implement our end-to-end data pipeline, from high-rate ingest to multi-petabyte storage and downstream processing. You'll own the strategy (on-prem vs. S3 or hybrid), the bill of materials, and the deployment, and you'll be on the floor wiring, racking, tuning, and validating performance.
Our current instruments generate data at ~1+ GB/s sustained (higher during bursts) and the program will accumulate multiple petabytes total over time. You'll help us choose and implement the right architecture considering reliability and cost controls.
Outcomes (what success looks like)
Within 2 weeks: Implement an immediate data-handling strategy that reliably ingests our initial data streams.
Within 2 weeks: Deliver a documented medium-term data architecture covering storage, networking, ingest, and durability.
Within 1 month: Operationalize the medium-term pipeline in production (ingest → buffer → long-term store → compute access).
Ongoing: Maintain ≥95% uptime for the end-to-end data-handling pipeline after setup.
Responsibilities
Architect ingest & storage: Choose and implement an on-prem hardware and data pipeline design or a cloud/S3 alternative with explicit cost and performance tradeoffs at multi-petabyte scale.
Set up a sustained-write ingest path ≥1 GB/s with adequate burst headroom (camera/frame-to-disk), including networking considerations, cooling, and throttling safeguards.
Optimize footprint & cost: Incorporate on-the-fly compression/downsampling options and quantify CPU budget vs. write-speed tradeoffs; document when/where to compress to control $/PB.
Integrate with acquisition workflows, ensuring image data and metadata are compatible with downstream stitching/flat-field correction pipelines.
Enable downstream compute: Expose the data to segmentation/analysis stacks (local GPU nodes or cloud).
Skills
5+ years designing and deploying high-throughput storage or HPC pipelines (≥1 GB/s sustained ingest) in production.
Deep hands-on experience with: NVMe RAID/striping, ZFS/MDRAID/erasure coding, PCIe topology, NUMA pinning, Linux performance tuning, and NIC offload features.
Proven delivery of multi-GB/s ingest systems and petabyte-scale storage in production (life-sciences, vision, HPC, or media).
Experience building tiered storage systems (NVMe ← HDD/object) and validating real-world throughput under sustained load.
Practical S3/object-storage know-how (AWS S3 and/or on-prem S3-compatible systems) with lifecycle, versioning, and cost controls.
Data integrity & reliability: snapshots, scrubs, replication, erasure coding, and backup/DR for PB-scale systems.
Networking: 25/40/100 GbE (SFP+/SFP28), RDMA/RoCE/iWARP familiarity; switch config and path tuning.
Ability to spec and rack hardware: selecting chassis/backplanes, RAID/HBA cards, NICs, and cooling strategies to prevent NVMe throttling under sustained writes.
Ideal skills:
Experience with microscopy or scientific imaging ingest at frame-to-disk speeds, including Micro-Manager-based pipelines and raw-to-containerized format conversions.
Experience with life science imaging data a plus.
Engagement details
Contract (1099 or corp-to-corp); contract-to-hire if there's a mutual fit.
On-site requirement: You must be physically present in San Francisco during build-out and initial operations; local field work (e.g., UCSF) as needed.
Compensation: Contract, $100-300/hour
Timeline: Immediate start
$110k-157k yearly est. 2d ago
Full-Stack Engineer: AI Data Editor
Hex 3.9
Senior data scientist job in San Francisco, CA
A cutting-edge data analytics firm in San Francisco is seeking a full-stack engineer to enhance user experiences and integrate AI tools within their platform. You will work on innovative projects that shape data interactions, collaborate with teams on product initiatives, and tackle UX challenges. Ideal candidates should possess 3+ years of software engineering experience, proficiency in React and TypeScript, and a strong desire to work in AI development. This position offers a competitive salary and benefits, with a hybrid work model.
$126k-178k yearly est. 5d ago
Foundry Data Engineer: ETL Automation & Dashboards
Data Freelance Hub 4.5
Senior data scientist job in San Francisco, CA
A data consulting firm based in San Francisco is seeking a Palantir Foundry Consultant for a contract position. The ideal candidate should have strong experience in Palantir Foundry, SQL, and PySpark, with proven skills in data pipeline development and ETL automation. Responsibilities include building data pipelines, implementing interactive dashboards, and leveraging data analysis for actionable insights. This on-site role offers an excellent opportunity for those experienced in the field.
$114k-160k yearly est. 2d ago
Senior Data Engineer: ML Pipelines & Signal Processing
Zendar
Senior data scientist job in Berkeley, CA
An innovative tech firm in Berkeley seeks a Senior Data Engineer to manage complex data engineering pipelines. You will ensure data quality, support ML engineers across locations, and establish infrastructure standards. The ideal candidate has over 5 years of experience in Data Science or MLOps, strong algorithmic skills, and proficiency in GCP, Python, and SQL. This role offers a competitive salary and the chance to impact a growing team in a dynamic field.
$110k-157k yearly est. 3d ago
Senior Data Engineer, Card Data Platform
Capital One 4.7
Senior data scientist job in San Francisco, CA
A financial services company in San Francisco seeks a Distinguished Data Engineer to lead innovation in data architecture and management. The role involves building critical data solutions, mentoring teams, and leveraging cloud technologies like AWS. Ideal candidates will have significant experience in data engineering, a Bachelor's degree, and proficiency in modern data practices to drive customer value through analytics and automation.
$106k-144k yearly est. 5d ago
Staff Data Engineer
PG Forsta
Senior data scientist job in Emeryville, CA
PG Forsta is the leading experience measurement, data analytics, and insights provider for complex industries - a status we earned over decades of deep partnership with clients to help them understand and meet the needs of their key stakeholders. Our earliest roots are in U.S. healthcare - perhaps the most complex of all industries. Today we serve clients around the globe in every industry to help them improve the Human Experiences at the heart of their business. We serve our clients through an unparalleled offering that combines technology, data, and expertise to enable them to pinpoint and prioritize opportunities, accelerate improvement efforts and build lifetime loyalty among their customers and employees.
Like all great companies, our success is a function of our people and our culture. Our employees have world-class talent, a collaborative work ethic, and a passion for the work that have earned us trusted advisor status among the world's most recognized brands. As a member of the team, you will help us create value for our clients, you will make us better through your contribution to the work and your voice in the process. Ours is a path of learning and continuous improvement; team efforts chart the course for corporate success.
Our Mission:
We empower organizations to deliver the best experiences. With industry expertise and technology, we turn data into insights that drive innovation and action.
Our Values:
To put Human Experience at the heart of organizations so every person can be seen and understood.
Energize the customer relationship: Our clients are our partners. We make their goals our own, working side by side to turn challenges into solutions.
Success starts with me: Personal ownership fuels collective success. We each play our part and empower our teammates to do the same.
Commit to learning: Every win is a springboard. Every hurdle is a lesson. We use each experience as an opportunity to grow.
Dare to innovate: We challenge the status quo with creativity and innovation as our true north.
Better together: We check our egos at the door. We work together, so we win together.
We are seeking an experienced Staff Data Engineer to join our Unified Data Platform team. The ideal candidate will design, develop, and maintain enterprise-scale data infrastructure leveraging Azure and Databricks technologies. This role involves building robust data pipelines, optimizing data workflows, and ensuring data quality and governance across the platform. You will collaborate closely with analytics, data science, and business teams to enable data-driven decision-making.
Duties & Responsibilities:
Design, build, and optimize data pipelines and workflows in Azure and Databricks, including Data Lake and SQL Database integrations.
Implement scalable ETL/ELT frameworks using Azure Data Factory, Databricks, and Spark.
Optimize data structures and queries for performance, reliability, and cost efficiency.
Drive data quality and governance initiatives, including metadata management and validation frameworks.
Collaborate with cross-functional teams to define and implement data models aligned with business and analytical requirements.
Maintain clear documentation and enforce engineering best practices for reproducibility and maintainability.
Ensure adherence to security, compliance, and data privacy standards.
Mentor junior engineers and contribute to establishing engineering best practices.
Support CI/CD pipeline development for data workflows using GitLab or Azure DevOps.
Partner with data consumers to publish curated datasets into reporting tools such as Power BI.
Stay current with advancements in Azure, Databricks, Delta Lake, and data architecture trends.
Technical Skills:
Advanced proficiency in Azure (5+ years): Data Lake, ADF, SQL.
Strong expertise in Databricks (5+ years), Apache Spark (5+ years), and Delta Lake (5+ years).
Proficient in SQL (10+ years) and Python (5+ years); familiarity with Scala is a plus.
Strong understanding of data modeling, data governance, and metadata management.
Knowledge of source control (Git), CI/CD, and modern DevOps practices.
Familiarity with the Power BI visualization tool.
Minimum Qualifications:
Bachelor's or Master's degree in Computer Science, Data Science, or related field.
7+ years of experience in data engineering, with significant hands-on work in cloud-based data platforms (Azure).
Experience building real-time data pipelines and streaming frameworks.
Strong analytical and problem-solving skills.
Proven ability to lead projects and mentor engineers.
Excellent communication and collaboration skills.
Preferred Qualifications:
Master's degree in Computer Science, Engineering, or a related field.
Exposure to machine learning integration within data engineering pipelines.
Don't meet every single requirement? Studies have shown that women and people of color are less likely to apply to jobs unless they meet every single qualification. At PG Forsta we are dedicated to building a diverse, inclusive and authentic workplace, so if you're excited about this role but your past experience doesn't align perfectly with every qualification in the job description, we encourage you to apply anyway. You may be just the right candidate for this or other roles.
Additional Information for US based jobs:
Press Ganey Associates LLC is an Equal Employment Opportunity/Affirmative Action employer and well committed to a diverse workforce. We do not discriminate against any employee or applicant for employment because of race, color, sex, age, national origin, religion, sexual orientation, gender identity, veteran status, and basis of disability or any other federal, state, or local protected class.
Pay Transparency Non-Discrimination Notice - Press Ganey will not discharge or in any other manner discriminate against employees or applicants because they have inquired about, discussed, or disclosed their own pay or the pay of another employee or applicant. However, employees who have access to the compensation information of other employees or applicants as a part of their essential job functions cannot disclose the pay of other employees or applicants to individuals who do not otherwise have access to compensation information, unless the disclosure is (a) in response to a formal complaint or charge, (b) in furtherance of an investigation, proceeding, hearing, or action, including an investigation conducted by the employer, or (c) consistent with the contractor's legal duty to furnish information.
The expected base salary for this position ranges from $110,000 to $170,000. It is not typical for offers to be made at or near the top of the range. Salary offers are based on a wide range of factors including relevant skills, training, experience, education, and, where applicable, licensure or certifications obtained. Market and organizational factors are also considered. In addition to base salary and a competitive benefits package, successful candidates are eligible to receive a discretionary bonus or commission tied to achieved results.
All your information will be kept confidential according to EEO guidelines.
Our privacy policy can be found here: legal-privacy/
$110k-170k yearly 4d ago
Director, Growth Platforms Data Scientist
Ernst & Young Oman 4.7
Senior data scientist job in San Francisco, CA
A leading global consulting firm seeks a Data Scientist - Director in San Francisco to drive AI solutions and data initiatives. The ideal candidate will lead multi-source data pipelines and architect complex data solutions while collaborating with business leaders. Candidates should have a strong educational background, extensive experience in data engineering, and proficiency with SQL and cloud-native infrastructure. This role offers a competitive salary range of $205,000 to $235,000 and promotes a hybrid working model.
$205k-235k yearly 4d ago
Staff Data Scientist - Post Sales
Harnham
Senior data scientist job in San Francisco, CA
Salary: $200-250k base + RSUs
This fast-growing Series E AI SaaS company is redefining how modern engineering teams build and deploy applications. We're expanding our data science organization to accelerate customer success after the initial sale-driving onboarding, retention, expansion, and long-term revenue growth.
About the Role
As the senior data scientist supporting post-sales teams, you will use advanced analytics, experimentation, and predictive modeling to guide strategy across Customer Success, Account Management, and Renewals. Your insights will help leadership forecast expansion, reduce churn, and identify the levers that unlock sustainable net revenue retention.
Key Responsibilities
Forecast & Model Growth: Build predictive models for renewal likelihood, expansion potential, churn risk, and customer health scoring.
Optimize the Customer Journey: Analyze onboarding flows, product adoption patterns, and usage signals to improve activation, engagement, and time-to-value.
Experimentation & Causal Analysis: Design and evaluate experiments (A/B tests, uplift modeling) to measure the impact of onboarding programs, success initiatives, and pricing changes on retention and expansion.
Revenue Insights: Partner with Customer Success and Sales to identify high-value accounts, cross-sell opportunities, and early warning signs of churn.
Cross-Functional Partnership: Collaborate with Product, RevOps, Finance, and Marketing to align post-sales strategies with company growth goals.
Data Infrastructure Collaboration: Work with Analytics Engineering to define data requirements, maintain data quality, and enable self-serve dashboards for Success and Finance teams.
Executive Storytelling: Present clear, actionable recommendations to senior leadership that translate complex analysis into strategic decisions.
About You
Experience: 6+ years in data science or advanced analytics, with a focus on post-sales, customer success, or retention analytics in a B2B SaaS environment.
Technical Skills: Expert SQL and proficiency in Python or R for statistical modeling, forecasting, and machine learning.
Domain Knowledge: Deep understanding of SaaS metrics such as net revenue retention (NRR), gross churn, expansion ARR, and customer health scoring.
Analytical Rigor: Strong background in experimentation design, causal inference, and predictive modeling to inform customer-lifecycle strategy.
Communication: Exceptional ability to translate data into compelling narratives for executives and cross-functional stakeholders.
Business Impact: Demonstrated success improving onboarding efficiency, retention rates, or expansion revenue through data-driven initiatives.
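For readers unfamiliar with the SaaS metrics named above, a minimal sketch of how net revenue retention and gross churn are commonly computed follows. The formulas reflect one widely used convention, not a definition given in this posting; companies differ on details such as whether contraction counts toward gross churn. All function and parameter names are hypothetical.

```python
def net_revenue_retention(starting_arr: float, expansion: float,
                          contraction: float, churned: float) -> float:
    """NRR over a period for a fixed customer cohort.

    Assumed convention: (start + expansion - contraction - churn) / start.
    """
    if starting_arr <= 0:
        raise ValueError("starting_arr must be positive")
    return (starting_arr + expansion - contraction - churned) / starting_arr


def gross_churn_rate(starting_arr: float, contraction: float,
                     churned: float) -> float:
    """Share of starting ARR lost to downgrades and cancellations.

    Assumption: contraction is counted as part of gross churn.
    """
    if starting_arr <= 0:
        raise ValueError("starting_arr must be positive")
    return (contraction + churned) / starting_arr


if __name__ == "__main__":
    # Illustrative figures only, not data from any company.
    nrr = net_revenue_retention(1_000_000, 150_000, 30_000, 50_000)
    churn = gross_churn_rate(1_000_000, 30_000, 50_000)
    print(f"NRR: {nrr:.0%}, gross churn: {churn:.0%}")
```

With these made-up inputs, expansion outweighs losses, so NRR lands above 100% even though 8% of starting ARR was lost.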
$200k-250k yearly 2d ago
Senior Energy Data Engineer - API & Spark Pipelines
Medium 4.0
Senior data scientist job in San Francisco, CA
A technology finance firm in San Francisco is seeking an experienced Data Engineer. The role involves building data pipelines, integrating data across various platforms, and developing scalable web applications. The ideal candidate will have a strong background in data analysis, software development, and experience with AWS. The salary range for this position is between $160,000 and $210,000, with potential bonuses and equity.
$160k-210k yearly 3d ago
Distinguished Data Engineer - Card Data
Capital One 4.7
Senior data scientist job in San Francisco, CA
Distinguished Data Engineers are individual contributors who strive to be diverse in thought so we visualize the problem space. At Capital One, we believe diversity of thought strengthens our ability to influence, collaborate and provide the most innovative solutions across organizational boundaries. Distinguished Engineers will significantly impact our trajectory and devise clear roadmaps to deliver next generation technology solutions.
About the Team:
Capital One is seeking a Distinguished Data Engineer to work in our Credit Card Technology Data Engineering Team and build the future of financial services. We are a fast-paced, mission-driven group responsible for managing and leveraging petabytes of sensitive, real-time and batch data that powers everything from fraud detection models and personalized reward systems to regulatory compliance reporting. As a leader in Data Engineering, you won't just move data; you'll architect high-availability systems that directly influence millions of customer experiences and secure billions in transactions daily. You'll own critical data domains end-to-end, working cross-functionally with ML Scientists, Product Managers, Business Analysts, and other teams to solve complex, high-stakes problems with cutting-edge cloud technologies (like Snowflake, Kafka, and AWS). If you thrive on technical challenges, demand data integrity, and want your work to have a clear, measurable impact on the bank's core profitability and security, this is your team.
This leader must have the ability to attract and recruit the industry's best talent, and simultaneously have the technical chops to ensure that we build compelling, customer-oriented solutions in an iterative methodology. Success in the role requires an innovative mind, a proven track record of delivering next generation software and data products, rigorous analytical skills, and a passion for delivering customer value through automation, machine learning and predictive analytics.
Our Distinguished Engineers Are:
Deep technical experts and thought leaders that help accelerate adoption of the very best engineering practices, while maintaining knowledge on industry innovations, trends and practices
Visionaries, collaborating on Capital One's toughest issues, to deliver on business needs that directly impact the lives of our customers and associates
Role models and mentors, helping to coach and strengthen the technical expertise and know-how of our engineering and product community
Evangelists, both internally and externally, helping to elevate the Distinguished Engineering community and establish themselves as a go-to resource on given technologies and technology-enabled capabilities
Responsibilities:
Build awareness, increase knowledge and drive adoption of modern technologies, sharing consumer and engineering benefits to gain buy-in
Strike the right balance between lending expertise and providing an inclusive environment where others' ideas can be heard and championed; leverage expertise to grow skills in the broader Capital One team
Promote a culture of engineering excellence, using opportunities to reuse and innersource solutions where possible
Effectively communicate with and influence key stakeholders across the enterprise, at all levels of the organization
Operate as a trusted advisor for a specific technology, platform or capability domain, helping to shape use cases and implementation in a unified manner
Lead the way in creating next-generation talent for Tech, mentoring internal talent and actively recruiting external talent to bolster Capital One's Tech talent
Basic Qualifications:
Bachelor's Degree
At least 7 years of experience in data engineering
At least 3 years of experience in data architecture
At least 2 years of experience building applications in AWS
Preferred Qualifications:
Master's Degree
9+ years of experience in data engineering
3+ years of data modeling experience
2+ years of experience with ontology standards for defining a domain
2+ years of experience using Python, SQL or Scala
1+ year of experience deploying machine learning models
3+ years of experience implementing big data processing solutions on AWS
Capital One will consider sponsoring a new qualified applicant for employment authorization for this position.
Capital One offers a comprehensive, competitive, and inclusive set of health, financial and other benefits that support your total well-being. Learn more at the . Eligibility varies based on full or part-time status, exempt or non-exempt status, and management level.
$106k-144k yearly est. 2d ago
Staff Data Scientist - Sales Analytics
Harnham
Senior data scientist job in San Francisco, CA
Salary: $200-250k base + RSUs
This fast-growing Series E AI SaaS company is redefining how modern engineering teams build and deploy applications. We're looking for a Staff Data Scientist to drive Sales and Go-to-Market (GTM) analytics, applying advanced modeling and experimentation to accelerate revenue growth and optimize the full sales funnel.
About the Role
As the senior data scientist supporting Sales and GTM, you will combine statistical modeling, experimentation, and advanced analytics to inform strategy and guide decision-making across our revenue organization. Your work will help leadership understand pipeline health, predict outcomes, and identify the levers that unlock sustainable growth.
Key Responsibilities
Model the Business: Build forecasting and propensity models for pipeline generation, conversion rates, and revenue projections.
Optimize the Sales Funnel: Analyze lead scoring, opportunity progression, and deal velocity to recommend improvements in acquisition, qualification, and close rates.
Experimentation & Causal Analysis: Design and evaluate experiments (A/B tests, uplift modeling) to measure the impact of pricing, incentives, and campaign initiatives.
Advanced Analytics for GTM: Apply machine learning and statistical techniques to segment accounts, predict churn/expansion, and identify high-value prospects.
Cross-Functional Partnership: Work closely with Sales, Marketing, RevOps, and Product to influence GTM strategy and ensure data-driven decisions.
Data Infrastructure Collaboration: Partner with Analytics Engineering to define data requirements, ensure data quality, and enable self-serve reporting.
Strategic Insights: Present findings to executive leadership, translating complex analyses into actionable recommendations.
About You
Experience: 6+ years in data science or advanced analytics roles, with significant time spent in B2B SaaS or developer tools environments.
Technical Depth: Expert in SQL and proficient in Python or R for statistical modeling, forecasting, and machine learning.
Domain Knowledge: Strong understanding of sales analytics, revenue operations, and product-led growth (PLG) motions.
Analytical Rigor: Skilled in experimentation design, causal inference, and building predictive models that influence GTM strategy.
Communication: Exceptional ability to tell a clear story with data and influence senior stakeholders across technical and business teams.
Business Impact: Proven record of driving measurable improvements in pipeline efficiency, conversion rates, or revenue outcomes.
How much does a senior data scientist earn in Petaluma, CA?
The average senior data scientist in Petaluma, CA earns between $105,000 and $205,000 annually. This compares to the national average senior data scientist range of $90,000 to $170,000.
Average senior data scientist salary in Petaluma, CA