Staff Data Scientist
Data scientist job in Fremont, CA
Staff Data Scientist | San Francisco | $250K-$300K + Equity
We're partnering with one of the fastest-growing AI companies in the world to hire a Staff Data Scientist. Backed by over $230M from top-tier investors and already valued at over $1B, they've secured customers that include some of the most recognizable names in tech. Their AI platform powers millions of daily interactions and is quickly becoming the enterprise standard for conversational AI.
In this role, you'll bring rigorous analytics and experimentation leadership that directly shapes product strategy and company performance.
What you'll do:
Drive deep-dive analyses on user behavior, product performance, and growth drivers
Design and interpret A/B tests to measure product impact at scale (see the sketch after this list)
Build scalable data models, pipelines, and dashboards for company-wide use
Partner with Product and Engineering to embed experimentation best practices
Evaluate ML models, ensuring business relevance, performance, and trade-off clarity
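To give a flavor of the experimentation work above, here is a minimal sketch of reading out a two-variant conversion A/B test with statsmodels; the counts and the 5.5% to 6.5% target lift are hypothetical.

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest, proportion_effectsize
from statsmodels.stats.power import NormalIndPower

# Hypothetical results: conversions and exposures for control vs. treatment.
conversions = np.array([1320, 1415])
exposures = np.array([24000, 24100])

# Two-proportion z-test on the observed conversion rates.
z_stat, p_value = proportions_ztest(count=conversions, nobs=exposures)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")

# Sample size per arm needed to detect a 5.5% -> 6.5% lift at alpha=0.05, power=0.8.
effect = proportion_effectsize(0.055, 0.065)
n_per_arm = NormalIndPower().solve_power(effect_size=effect, alpha=0.05, power=0.8)
print(f"~{n_per_arm:.0f} users per arm")
```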
What we're looking for:
5+ years in data science or product analytics at scale (consumer or marketplace preferred)
Advanced SQL and Python skills, with strong foundations in statistics and experimental design
Proven record of designing, running, and analyzing large-scale experiments
Ability to analyze and reason about ML models (classification, recommendation, LLMs)
Strong communicator with a track record of influencing cross-functional teams
If you're excited by the sound of this challenge, apply today and we'll be in touch.
Data Scientist
Data scientist job in Fremont, CA
Key Responsibilities
Design and productionize models for opportunity scanning, anomaly detection, and significant change detection across CRM, streaming, ecommerce, and social data.
Define and tune alerting logic (thresholds, SLOs, precision/recall) to minimize noise while surfacing high-value marketing actions.
Partner with marketing, product, and data engineering to operationalize insights into campaigns, playbooks, and automated workflows, with clear monitoring and experimentation.
Required Qualifications
Strong proficiency in Python (pandas, NumPy, scikit-learn; plus experience with PySpark or similar for large-scale data) and SQL on modern warehouses (e.g., BigQuery, Snowflake, Redshift).
Hands-on experience with time-series modeling and anomaly / changepoint / significant-movement detection (e.g., STL decomposition, EWMA/CUSUM, Bayesian/Prophet-style models, isolation forests, robust statistics); see the sketch after this list.
Experience building and deploying production ML pipelines (batch and/or streaming), including feature engineering, model training, CI/CD, and monitoring for performance and data drift.
Solid background in statistics and experimentation: hypothesis testing, power analysis, A/B testing frameworks, uplift/propensity modeling, and basic causal inference techniques.
Familiarity with cloud platforms (GCP/AWS/Azure), orchestration tools (e.g., Airflow/Prefect), and dashboarding/visualization tools to expose alerts and model outputs to business users.
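To make the detection bullet concrete, here is a minimal EWMA-based anomaly flag in pandas, one of the techniques the list names. The signup series, span, and z threshold are illustrative assumptions; a production version would add CUSUM or changepoint logic, seasonality handling, and alert routing.

```python
import numpy as np
import pandas as pd

def ewma_anomalies(series: pd.Series, span: int = 20, z: float = 3.0) -> pd.Series:
    """Flag points deviating from the trailing EWMA by more than z EWM stds."""
    mean = series.ewm(span=span, adjust=False).mean()
    std = series.ewm(span=span, adjust=False).std()
    scores = (series - mean.shift(1)) / std.shift(1)  # compare against history only
    return scores.abs() > z

rng = np.random.default_rng(0)
daily_signups = pd.Series(rng.poisson(200, 120).astype(float))  # synthetic metric
daily_signups.iloc[90] = 320.0                                  # injected spike
flags = ewma_anomalies(daily_signups)
print(flags[flags].index.tolist())  # should surface the injected spike
```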
Data Scientist
Data scientist job in San Francisco, CA
We're working with a Series A health tech start-up pioneering a revolutionary approach to healthcare AI, developing neurosymbolic systems that combine statistical learning with structured medical knowledge. Their technology is being adopted by leading health systems and insurers to enhance patient outcomes through advanced predictive analytics.
We're seeking Machine Learning Engineers who excel at the intersection of data science, modeling, and software engineering. You'll design and implement models that extract insights from longitudinal healthcare data, balancing analytical rigor, interpretability, and scalability.
This role offers a unique opportunity to tackle foundational modeling challenges in healthcare, where your contributions will directly influence clinical, actuarial, and policy decisions.
Key Responsibilities
Develop predictive models to forecast disease progression, healthcare utilization, and costs using temporal clinical data (claims, EHR, laboratory results, pharmacy records)
Design interpretable and explainable ML solutions that earn the trust of clinicians, actuaries, and healthcare decision-makers
Research and prototype innovative approaches leveraging both classical and modern machine learning techniques
Build robust, scalable ML pipelines for training, validation, and deployment in distributed computing environments
Collaborate cross-functionally with data engineers, clinicians, and product teams to ensure models address real-world healthcare needs
Communicate findings and methodologies effectively through visualizations, documentation, and technical presentations
Required Qualifications
Strong foundation in statistical modeling, machine learning, or data science, with preference for experience in temporal or longitudinal data analysis
Proficiency in Python and ML frameworks (PyTorch, JAX, NumPyro, PyMC, etc.)
Proven track record of transitioning models from research prototypes to production systems
Experience with probabilistic methods, survival analysis, or Bayesian inference (highly valued; see the sketch below)
Bonus Qualifications
Experience working with clinical data and healthcare terminologies (ICD, CPT, SNOMED CT, LOINC)
Background in actuarial modeling, claims forecasting, or risk adjustment methodologies
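Since survival analysis is called out above, here is a minimal Kaplan-Meier sketch using the lifelines library (one common choice, not named in the posting); the durations and censoring flags are synthetic stand-ins for time-to-event clinical data.

```python
import numpy as np
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(1)
durations = rng.exponential(365, 200)   # days until event (synthetic)
observed = rng.random(200) < 0.7        # True = event observed, False = censored

kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=observed, label="synthetic cohort")
print(kmf.median_survival_time_)        # median time-to-event estimate
print(kmf.survival_function_.head())    # S(t) curve for plotting or reporting
```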
Lead Data Scientist - Computer Vision
Data scientist job in Santa Clara, CA
Lead Data Scientist - Computer Vision/Image Processing
About the Role
We are seeking a Lead Data Scientist to drive the strategy and execution of data science initiatives, with a particular focus on computer vision systems and image processing techniques. The ideal candidate has deep expertise in image processing techniques including filtering, binary morphology, perspective/affine transformations, and edge detection.
Qualifications
Solid knowledge of computer vision and image processing techniques: filtering, binary morphology, perspective/affine transformations, edge detection (see the sketch after this list)
Strong understanding of machine learning: Regression, Supervised and Unsupervised Learning
Proficiency in Python and libraries such as OpenCV, NumPy, scikit-learn, TensorFlow/PyTorch.
Familiarity with version control (Git) and collaborative development practices
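A minimal OpenCV sketch of the techniques this list names: filtering, edge detection, binary morphology, and a perspective transform. The input path and corner coordinates are hypothetical.

```python
import cv2
import numpy as np

img = cv2.imread("document.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input image

# Filtering + edge detection
blurred = cv2.GaussianBlur(img, (5, 5), 0)
edges = cv2.Canny(blurred, 50, 150)

# Binary morphology: close small gaps in the edge map
kernel = np.ones((3, 3), np.uint8)
closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)

# Perspective transform: map a hypothetical quadrilateral onto a 400x400 square
src = np.float32([[10, 20], [380, 15], [390, 390], [5, 380]])
dst = np.float32([[0, 0], [400, 0], [400, 400], [0, 400]])
M = cv2.getPerspectiveTransform(src, dst)
warped = cv2.warpPerspective(img, M, (400, 400))
```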
Data Scientist V
Data scientist job in Menlo Park, CA
Creospan is a growing tech collective of makers, shakers, and problem solvers, offering solutions today that will propel businesses into a better tomorrow. “Tomorrow's ideas, built today!” In addition to being able to work alongside equally brilliant and motivated developers, our consultants appreciate the opportunity to learn and apply new skills and methodologies to different clients and industries.
NO C2C / 3rd party; W2 candidates only. Must be able to work in the US without sponsorship now or in the future.
Summary:
The main function of the Data Scientist is to produce innovative solutions driven by exploratory data analysis from complex and high-dimensional datasets.
Job Responsibilities:
• Apply knowledge of statistics, machine learning, programming, data modeling, simulation, and advanced mathematics to recognize patterns, identify opportunities, pose business questions, and make valuable discoveries leading to prototype development and product improvement.
• Use a flexible, analytical approach to design, develop, and evaluate predictive models and advanced algorithms that lead to optimal value extraction from the data.
• Generate and test hypotheses, then analyze and interpret the results of product experiments (a minimal sketch follows this list).
• Work with product engineers to translate prototypes into new products, services, and features and provide guidelines for large-scale implementation.
• Provide Business Intelligence (BI) and data visualization support, which includes, but is not limited to, support for the online customer service dashboards and other ad hoc requests requiring data analysis and visual support.
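As a small illustration of the hypothesis-testing responsibility above, here is a Welch's t-test on two synthetic engagement samples using SciPy; the metric and numbers are invented.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(5.2, 1.1, 500)   # e.g., minutes per session (synthetic)
variant = rng.normal(5.4, 1.1, 500)

# Welch's t-test: does the variant shift the mean?
t_stat, p_value = stats.ttest_ind(control, variant, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```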
Skills:
• Experienced in programming languages such as Python and/or R, big data tools such as Hadoop, or data visualization tools such as Tableau.
• The ability to communicate effectively in writing, including conveying complex information and promoting in-depth engagement with technical topics.
• Experience working with large datasets.
Education/Experience:
• Master of Science degree in computer science or in a relevant field.
Senior Data Scientist
Data scientist job in Pleasanton, CA
Net2Source is a Global Workforce Solutions Company headquartered in NJ, USA, with branch offices in the Asia-Pacific region. We are one of the fastest-growing IT consulting companies in the USA, and we are hiring a Senior Data Scientist for one of our clients. We offer a wide gamut of consulting solutions customized to our 450+ clients, ranging from Fortune 500/1000 companies to start-ups, across verticals such as Technology, Financial Services, Healthcare, Life Sciences, Oil & Gas, Energy, Retail, Telecom, Utilities, Manufacturing, the Internet, and Engineering.
Position: Senior Data Scientist
Location: Pleasanton, CA (Onsite) - Locals Only
Type: Contract
Exp Level - 10+ Years
Required Skills
Design, develop, and deploy advanced marketing models.
Build and productionize NLP solutions.
Partner with Marketing and Business stakeholders to translate business objectives into data science solutions.
Work with large-scale structured and unstructured datasets using SQL, Python, and distributed systems.
Evaluate and implement state-of-the-art ML/NLP techniques to improve model performance and business impact.
Communicate insights, results, and recommendations clearly to both technical and non-technical audiences.
Required Qualifications
5+ years of experience in data science or applied machine learning, with a strong focus on marketing analytics.
Hands-on experience building predictive marketing models (e.g., segmentation, attribution, personalization).
Strong expertise in NLP techniques and libraries (e.g., spaCy, NLTK, Hugging Face, Gensim); see the sketch after this list.
Proficiency in Python, SQL, and common data science libraries (pandas, NumPy, scikit-learn).
Solid understanding of statistics, machine learning algorithms, and model evaluation.
Experience deploying models into production environments.
Strong communication and stakeholder management skills.
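A minimal sketch of the kind of NLP marketing model these qualifications describe: a TF-IDF + logistic regression pipeline in scikit-learn. The example texts and labels are invented, and a real system would use far more data (or the transformer libraries listed above).

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical labeled marketing copy: 1 = high engagement, 0 = low.
texts = [
    "limited time offer on premium plans",
    "monthly newsletter: product updates",
    "flash sale ends tonight",
    "terms of service update",
]
labels = [1, 0, 1, 0]

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
    ("model", LogisticRegression(max_iter=1000)),
])
clf.fit(texts, labels)
print(clf.predict(["weekend sale on all plans"]))
```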
Why Work With Us?
We believe in more than just jobs; we build careers. At Net2Source, we champion leadership at all levels, celebrate diverse perspectives, and empower you to make an impact. Think work-life balance, professional growth, and a collaborative culture where your ideas matter.
Our Commitment to Inclusion & Equity
Net2Source is an equal opportunity employer, dedicated to fostering a workplace where diverse talents and perspectives are valued. We make all employment decisions based on merit, ensuring a culture of respect, fairness, and opportunity for all, regardless of age, gender, ethnicity, disability, or other protected characteristics.
Awards & Recognition
America's Most Honored Businesses (Top 10%)
Fastest-Growing Staffing Firm by Staffing Industry Analysts
INC 5000 List for Eight Consecutive Years
Top 100 by Dallas Business Journal
Spirit of Alliance Award by Agile1
Maddhuker Singh
Sr Account & Delivery Manager
AI Data Engineer
Data scientist job in Palo Alto, CA
Experience Level: Mid-Senior (4+ years)
About the Role
This role supports the organization's data, analytics, and AI innovation efforts by building reliable data pipelines, developing machine learning solutions, and improving how information is collected, processed, and used across the business. This person is a collaborative problem-solver who can translate business needs into technical solutions, work confidently across large datasets, and deliver clear, actionable insights that help drive strategic outcomes.
What They Do
Support innovation projects by facilitating discussions, refining business processes, integrating data sources, and advising on best practices in analytics and AI.
Build and maintain data architectures, APIs, and pipelines to support both operational systems and AI initiatives.
Design and implement machine learning models, including developing prompts and AI-driven applications.
Collaborate with technical and non-technical teams to improve data quality, reporting, and analytics across the organization.
Manage large, complex datasets and develop tools to extract meaningful insights.
Automate manual processes, optimize data delivery, and enhance internal infrastructure.
Create dashboards and visualizations that clearly communicate insights to stakeholders.
Apply statistical, computational, NLP, and ML techniques to solve business problems.
Document processes, use version control best practices, and deploy models to production using enterprise platforms such as Azure.
Ensure data security and privacy across systems, vendors, and applications.
Experience & Skills
4+ years in a Data Engineering or DataOps/DevOps role.
Strong experience with SQL, relational databases, cloud environments (Azure/AWS), and building/optimizing data pipelines.
Ability to work with structured and unstructured data and perform root-cause analysis on data and integrations.
Experience with AI/ML data preparation, preprocessing, feature engineering, and model deployment.
Familiarity with vector databases, embeddings, and RAG-based applications (a minimal sketch follows this list).
Strong communication skills and ability to collaborate across teams.
Ability to handle confidential and sensitive information responsibly.
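To ground the embeddings/RAG bullet, here is a minimal retrieval sketch using plain NumPy cosine similarity. The embed() function is a hypothetical stand-in for a real embedding model call (e.g., via Azure OpenAI), and the documents are invented.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical stand-in for an embedding model; deterministic per text here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)  # unit vector, so dot product = cosine similarity

docs = ["refund policy", "shipping times", "enterprise pricing"]
index = np.stack([embed(d) for d in docs])  # toy in-memory vector index

def retrieve(query: str, k: int = 2) -> list[str]:
    scores = index @ embed(query)
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

print(retrieve("how long does delivery take"))
```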
Technologies
Data & BI: SQL Server, T-SQL, SSIS/SSRS, ETL/ELT, Power BI, DAX, PowerQuery
Programming: Python, R, C++, Julia, JavaScript, SQL
File Formats: JSON, XML, SQL
Extras (Nice to Have): PowerShell, Regex, VBA, documentation/process mapping, Tableau/Domo, Azure ML, legal tech platforms, LLM integration (OpenAI, Azure OpenAI, Claude), embedding models
Data Engineer
Data scientist job in San Francisco, CA
Midjourney is a research lab exploring new mediums to expand the imaginative powers of the human species. We are a small, self-funded team focused on design, human infrastructure, and AI. We have no investors, no big company controlling us, and no advertisers. We are 100% supported by our amazing community.
Our tools are already used by millions of people to dream, to explore, and to create. But this is just the start. We think the story of the 2020s is about building the tools that will remake the world for the next century. We're making those tools, to expand what it means to be human.
Core Responsibilities:
Design and maintain data pipelines to consolidate information across multiple sources (subscription platforms, payment systems, infrastructure and usage monitoring, and financial systems) into a unified analytics environment
Build and manage interactive dashboards and self-service BI tools that enable leadership to track key business metrics including revenue performance, infrastructure costs, customer retention, and operational efficiency
Serve as technical owner of our financial planning platform (Pigment or similar), leading implementation and build-out of models, data connections, and workflows in partnership with Finance leadership to translate business requirements into functional system architecture
Develop automated data quality checks and cleaning processes to ensure accuracy and consistency across financial and operational datasets
Partner with Finance, Product, and Operations teams to translate business questions into analytical frameworks, including cohort analysis, cost modeling, and performance trending (see the cohort sketch after this list)
Create and maintain documentation for data models, ETL processes, dashboard logic, and system workflows to ensure knowledge continuity
Support strategic planning initiatives by building financial models, scenario analyses, and data-driven recommendations for resource allocation and growth investments
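A minimal pandas sketch of the cohort analysis mentioned above: group users by first active month and compute monthly retention. The event table is a toy stand-in for subscription data.

```python
import pandas as pd

# Hypothetical subscription events: one row per (user, active month).
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3],
    "month": pd.to_datetime(["2024-01", "2024-02", "2024-03",
                             "2024-01", "2024-03", "2024-02"]),
})
first = events.groupby("user_id")["month"].transform("min")
events["cohort"] = first.dt.to_period("M")
events["age"] = (events["month"].dt.to_period("M") - events["cohort"]).apply(lambda d: d.n)

# Rows = signup cohort, columns = months since signup, values = active users.
retention = events.pivot_table(index="cohort", columns="age",
                               values="user_id", aggfunc="nunique")
print(retention.div(retention[0], axis=0))  # share of each cohort still active
```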
Required Qualifications:
3-5+ years of experience in data engineering, analytics engineering, or a similar role, with demonstrated ability to work with large-scale datasets
Strong SQL skills and experience with modern data warehousing solutions (BigQuery, Snowflake, Redshift, etc.)
Proficiency in at least one programming language (Python, R) for data manipulation and analysis
Experience with BI/visualization tools (Looker, Tableau, Power BI, or similar)
Hands-on experience administering enterprise financial systems (NetSuite, SAP, Oracle, or similar ERP platforms)
Experience working with Stripe Billing or similar subscription management platforms, including data extraction and revenue reporting
Ability to communicate technical concepts clearly to non-technical stakeholders
Data Engineer, Knowledge Graphs
Data scientist job in San Francisco, CA
We imagine a world where new medicines reach patients in months, not years, and where scientific breakthroughs happen at the speed of thought.
Mithrl is building the world's first commercially available AI Co-Scientist. It is a discovery engine that transforms messy biological data into insights in minutes. Scientists ask questions in natural language, and Mithrl responds with analysis, novel targets, hypotheses, and patent-ready reports.
No coding. No waiting. No bioinformatics bottlenecks.
We are one of the fastest-growing tech-bio companies in the Bay Area, with 12x year-over-year revenue growth. Our platform is used across three continents by leading biotechs and big pharmas. We power breakthroughs from early target discovery to mechanism of action. And we are just getting started.
ABOUT THE ROLE
We are hiring a Data Engineer, Knowledge Graphs to build the infrastructure that powers Mithrl's biological knowledge layer. You will partner closely with the Data Scientist, Knowledge Graphs to take curated knowledge sources and transform them into scalable, reliable, production-ready systems that serve the entire platform.
Your work includes building ETL pipelines for large biological datasets, designing schemas and storage models for graph-structured data, and creating the API surfaces that allow ML engineers, application teams, and the AI Co-Scientist to query and use the knowledge graph efficiently. You will also own the reliability, performance, and versioning of knowledge graph infrastructure across releases.
This role is the bridge between biological knowledge ingestion and the high-performance engineering systems that use it. If you enjoy working on data modeling, schema design, graph storage, ETL, and scalable infrastructure, this is an opportunity to have deep impact on the intelligence layer of Mithrl.
WHAT YOU WILL DO
Build and maintain ETL pipelines for large public biological datasets and curated knowledge sources
Design, implement, and evolve schemas and storage models for graph-structured biological data (a minimal sketch follows this list)
Create efficient APIs and query surfaces that allow internal teams and AI systems to retrieve nodes, relationships, pathways, annotations, and graph analytics
Partner closely with the Data Scientists to operationalize curated relationships, harmonized variable IDs, metadata standards, and ontology mappings
Build data models that support multi-tenant access, versioning, and reproducibility across releases
Implement scalable storage and indexing strategies for high-volume graph data
Maintain data quality, validate data integrity, and build monitoring around ingestion and usage
Work with ML engineers and application teams to ensure the knowledge graph infrastructure supports downstream reasoning, analysis, and discovery applications
Support data warehousing, documentation, and API reliability
Ensure performance, reliability, and uptime for knowledge graph services
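A minimal sketch of the kind of schema and query surface this list describes, in plain Python dataclasses. Fields such as provenance and version are illustrative assumptions, and a real system would back this with a graph or relational store rather than in-memory dicts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    node_id: str      # e.g., an ontology identifier such as "HGNC:1100" (illustrative)
    node_type: str    # "gene", "pathway", "disease", ...
    version: str      # knowledge-graph release this node was ingested under

@dataclass(frozen=True)
class Edge:
    source: str       # node_id of the source node
    target: str       # node_id of the target node
    relation: str     # "participates_in", "associated_with", ...
    provenance: str   # curated source the relationship came from
    version: str

class KnowledgeGraph:
    """Toy in-memory store; a production system would use a graph backend."""
    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.out_edges: dict[str, list[Edge]] = {}

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, edge: Edge) -> None:
        self.out_edges.setdefault(edge.source, []).append(edge)

    def neighbors(self, node_id: str, relation: str | None = None) -> list[str]:
        """Targets reachable from node_id, optionally filtered by relation type."""
        return [e.target for e in self.out_edges.get(node_id, [])
                if relation is None or e.relation == relation]
```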
WHAT YOU BRING
Required Qualifications
Strong experience as a data engineer or backend engineer working with data-intensive systems
Experience building ETL or ELT pipelines for large structured or semi-structured datasets
Strong understanding of database design, schema modeling, and data architecture
Experience with graph data models or willingness to learn graph storage concepts
Proficiency in Python or similar languages for data engineering
Experience designing and maintaining APIs for data access
Understanding of versioning, provenance, validation, and reproducibility in data systems
Experience with cloud infrastructure and modern data stack tools
Strong communication skills and ability to work closely with scientific and engineering teams
Nice to Have
Experience with graph databases or graph query languages
Experience with biological or chemical data sources
Familiarity with ontologies, controlled vocabularies, and metadata standards
Experience with data warehousing and analytical storage formats
Previous work in a tech bio company or scientific platform environment
WHAT YOU WILL LOVE AT MITHRL
You will build the core infrastructure that makes the biological knowledge graph fast, reliable, and usable
Team: Join a tight-knit, talent-dense team of engineers, scientists, and builders
Culture: We value consistency, clarity, and hard work. We solve hard problems through focused daily execution
Speed: We ship fast (2x/week) and improve continuously based on real user feedback
Location: Beautiful SF office with a high-energy, in-person culture
Benefits: Comprehensive PPO health coverage through Anthem (medical, dental, and vision) + 401(k) with top-tier plans
Data Engineer / Analytics Specialist
Data scientist job in San Francisco, CA
Citizenship Requirement: U.S. Citizens Only
ITTConnect is seeking a Data Engineer / Analytics Specialist to work for one of our clients, a major technology consulting firm headquartered in Europe. They are experts in tailored technology consulting and services for banks, investment firms, and other financial-vertical clients.
Job location: San Francisco Bay Area or New York City.
Work Model: Ability to come into the office as requested
Seniority: 10+ years of total experience
About the role:
The Data Engineer / Analytics Specialist will support analytics, product insights, and AI initiatives. You will build robust data pipelines, integrate data sources, and enhance the organization's analytical foundations.
Responsibilities:
Build and operate Snowflake-based analytics environments.
Develop ETL/ELT pipelines (DBT, Airflow, etc.); a minimal sketch follows this list.
Integrate APIs, external data sources, and streaming inputs.
Perform query optimization, basic data modeling, and analytics support.
Enable downstream GenAI and analytics use cases.
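A minimal Airflow DAG sketch of the ETL/ELT bullet above, assuming Airflow 2.x. The dag_id, schedule, and task bodies are placeholders, not this client's pipeline.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract() -> None:
    """Placeholder: pull from an API or external source."""

def load_to_snowflake() -> None:
    """Placeholder: write curated data into Snowflake."""

with DAG(
    dag_id="analytics_refresh",   # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",            # Airflow 2.4+ keyword; older versions use schedule_interval
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load_to_snowflake)
    extract_task >> load_task     # run extract before load
```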
Requirements:
10+ years of overall technology experience
3+ years hands-on AWS experience required
Strong SQL and Snowflake experience.
Hands-on pipeline engineering with DBT, Airflow, or similar.
Experience with API integrations and modern data architectures.
Data Engineer
Data scientist job in San Francisco, CA
You'll work closely with engineering, analytics, and product teams to ensure data is accurate, accessible, and efficiently processed across the organization.
Key Responsibilities:
Design, develop, and maintain scalable data pipelines and architectures.
Collect, process, and transform data from multiple sources into structured, usable formats.
Ensure data quality, reliability, and security across all systems.
Work with data analysts and data scientists to optimize data models for analytics and machine learning.
Implement ETL (Extract, Transform, Load) processes and automate workflows (a minimal sketch follows this list).
Monitor and troubleshoot data infrastructure, ensuring minimal downtime and high performance.
Collaborate with cross-functional teams to define data requirements and integrate new data sources.
Maintain comprehensive documentation for data systems and processes.
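A minimal sketch of the ETL pattern named above, as three small functions; the file paths and column names are hypothetical.

```python
import pandas as pd

def extract(path: str) -> pd.DataFrame:
    """Read raw events from a source file (CSV here for simplicity)."""
    return pd.read_csv(path)

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Drop unusable rows and derive a partition-friendly date column."""
    df = df.dropna(subset=["user_id"])               # hypothetical key column
    df["event_date"] = pd.to_datetime(df["event_ts"]).dt.date
    return df

def load(df: pd.DataFrame, path: str) -> None:
    """Write curated output in a columnar format."""
    df.to_parquet(path, index=False)

if __name__ == "__main__":
    load(transform(extract("raw_events.csv")), "curated_events.parquet")
```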
Requirements:
Proven experience as a Data Engineer, ETL Developer, or similar role.
Strong programming skills in Python, SQL, or Scala.
Experience with data pipeline tools (Airflow, dbt, Luigi, etc.).
Familiarity with big data technologies (Spark, Hadoop, Kafka, etc.).
Hands-on experience with cloud data platforms (AWS, GCP, Azure, Snowflake, or Databricks).
Understanding of data modeling, warehousing, and schema design.
Solid knowledge of database systems (PostgreSQL, MySQL, NoSQL).
Strong analytical and problem-solving skills.
Senior Data Engineer
Data scientist job in San Francisco, CA
If you're hands-on with modern data platforms, cloud tech, and big data tools, and you like building solutions that are secure, repeatable, and fast, this role is for you.
As a Senior Data Engineer, you will design, build, and maintain scalable data pipelines that transform raw information into actionable insights. The ideal candidate will have strong experience across modern data platforms, cloud environments, and big data technologies, with a focus on building secure, repeatable, and high-performing solutions.
Responsibilities:
Design, develop, and maintain secure, scalable data pipelines to ingest, transform, and deliver curated data into the Common Data Platform (CDP).
Participate in Agile rituals and contribute to delivery within the Scaled Agile Framework (SAFe).
Ensure quality and reliability of data products through automation, monitoring, and proactive issue resolution.
Deploy alerting and auto-remediation for pipelines and data stores to maximize system availability (see the sketch after this list).
Apply a security-first, automation-driven approach to all data engineering practices.
Collaborate with cross-functional teams (data scientists, analysts, product managers, and business stakeholders) to align infrastructure with evolving data needs.
Stay current on industry trends and emerging tools, recommending improvements to strengthen efficiency and scalability.
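A minimal sketch of the monitoring and alerting responsibilities above: lightweight freshness and null-rate checks with a stubbed alert hook. The column names and thresholds are illustrative assumptions.

```python
import pandas as pd

def check_freshness(df: pd.DataFrame, ts_col: str, max_lag_hours: int = 24) -> bool:
    """True if the newest timestamp is within the allowed lag."""
    lag = pd.Timestamp.now(tz="UTC") - pd.to_datetime(df[ts_col], utc=True).max()
    return lag <= pd.Timedelta(hours=max_lag_hours)

def check_nulls(df: pd.DataFrame, cols: list[str], max_null_rate: float = 0.01) -> bool:
    """True if every listed column is under the null-rate threshold."""
    return bool((df[cols].isna().mean() <= max_null_rate).all())

def alert(message: str) -> None:
    print(f"ALERT: {message}")  # stand-in for a real pager/webhook integration

def run_checks(df: pd.DataFrame) -> None:
    if not check_freshness(df, "ingested_at"):         # column name is hypothetical
        alert("curated table is stale")
    if not check_nulls(df, ["account_id", "amount"]):  # columns are hypothetical
        alert("null rate above threshold")
```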
Qualifications:
Bachelor's degree in Computer Science, Information Systems, or related field (or equivalent experience).
At least 3 years of experience with Python and PySpark, including Jupyter notebooks and unit testing.
At least 2 years of experience with Databricks, Collibra, and Starburst.
Proven work with relational and NoSQL databases, including star-schema and dimensional modeling approaches.
Hands-on experience with modern data stacks: object stores (S3), Spark, Airflow, lakehouse architectures, and cloud warehouses (Snowflake, Redshift).
Strong background in ETL and big data engineering (on-prem and cloud).
Work within enterprise cloud platforms (CFS2, Cloud Foundational Services 2/EDS) for governance and compliance.
Experience building end-to-end pipelines for structured, semi-structured, and unstructured data using Spark.
Sr Data Platform Engineer
Data scientist job in Elk Grove, CA
Hybrid role, 3 days a week in the Elk Grove, CA office; no remote option.
This is a direct hire opportunity.
We're seeking a seasoned Senior Data Platform Engineer to design, build, and optimize scalable data solutions that power analytics, reporting, and AI/ML initiatives. This full‑time role is hands‑on, working with architects, analysts, and business stakeholders to ensure data systems are reliable, secure, and high‑performing.
Responsibilities:
Build and maintain robust data pipelines (structured, semi‑structured, unstructured).
Implement ETL workflows with Spark, Delta Lake, and cloud-native tools (a minimal sketch follows this list).
Support big data platforms (Databricks, Snowflake, GCP) in production.
Troubleshoot and optimize SQL queries, Spark jobs, and workloads.
Ensure governance, security, and compliance across data systems.
Integrate workflows into CI/CD pipelines with Git, Jenkins, Terraform.
Collaborate cross‑functionally to translate business needs into technical solutions.
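A minimal PySpark + Delta Lake sketch of the ETL bullet above; the bucket paths and column names are hypothetical, and a real job would add schema enforcement and incremental loads.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

raw = spark.read.json("gs://raw-bucket/orders/")  # hypothetical source location

curated = (raw
           .filter(F.col("status").isNotNull())           # drop incomplete records
           .withColumn("order_date", F.to_date("created_at"))
           .dropDuplicates(["order_id"]))                  # idempotent re-runs

(curated.write
        .format("delta")
        .mode("overwrite")
        .partitionBy("order_date")
        .save("gs://curated-bucket/orders/"))              # hypothetical target
```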
Qualifications:
7+ years in data engineering with production pipeline experience.
Expertise in Spark ecosystem, Databricks, Snowflake, GCP.
Strong skills in PySpark, Python, SQL.
Experience with RAG systems, semantic search, and LLM integration.
Familiarity with Kafka, Pub/Sub, vector databases.
Proven ability to optimize ETL jobs and troubleshoot production issues.
Agile team experience and excellent communication skills.
Certifications in Databricks, Snowflake, GCP, or Azure.
Exposure to Airflow, BI tools (Power BI, Looker Studio).
Senior Data Engineer
Data scientist job in Fremont, CA
We're hiring a Senior/Lead Data Engineer to join a fast-growing AI startup. The team comes from a billion-dollar AI company and has raised a $40M+ seed round.
You'll need to be comfortable transforming and moving data from legacy sources into a new 'group-level' data warehouse, and you'll need a strong data modeling background.
Proven proficiency in modern data transformation tools, specifically dbt and/or SQLMesh.
Exceptional ability to apply systems thinking and complex problem-solving to ambiguous challenges. Experience within a high-growth startup environment is highly valued.
Deep, practical knowledge of the entire data lifecycle, from generation and governance through to advanced downstream applications (e.g., fueling AI/ML models, LLM consumption, and core product features).
Outstanding ability to communicate technical complexity clearly, synthesizing information into actionable frameworks for executive and cross-functional teams.
Data Engineer - Scientific Data Ingestion
Data scientist job in San Francisco, CA
We envision a world where novel drugs and therapies reach patients in months, not years, accelerating breakthroughs that save lives.
Mithrl is building the world's first commercially available AI Co-Scientist: a discovery engine that empowers life science teams to go from messy biological data to novel insights in minutes. Scientists ask questions in natural language, and Mithrl answers with real analysis, novel targets, and patent-ready reports. No coding. No waiting. No bioinformatics bottlenecks.
We are the fastest-growing tech-bio startup in the Bay Area, with over 12x YoY revenue growth. Our platform is already being used by teams at some of the largest biotechs and big pharmas across three continents to accelerate and uncover breakthroughs, from target discovery to mechanism of action.
WHAT YOU WILL DO
Build and own an AI-powered ingestion & normalization pipeline to import data from a wide variety of sources: unprocessed Excel/CSV uploads, lab and instrument exports, and processed data from internal pipelines.
Develop robust schema mapping, coercion, and conversion logic (think: units normalization, metadata standardization, variable-name harmonization, vendor-instrument quirks, plate-reader formats, reference-genome or annotation updates, batch-effect correction, etc.); a minimal sketch follows this list.
Use LLM-driven and classical data-engineering tools to structure "semi-structured" or messy tabular data: extracting metadata, inferring column roles/types, cleaning free-text headers, fixing inconsistencies, and preparing final clean datasets.
Ensure all transformations that should only happen once (normalization, coercion, batch correction) execute during ingestion, so downstream analytics and the AI Co-Scientist always work with clean, canonical data.
Build validation, verification, and quality-control layers to catch ambiguous, inconsistent, or corrupt data before it enters the platform.
Collaborate with product teams, data science / bioinformatics colleagues, and infrastructure engineers to define and enforce data standards, and ensure pipeline outputs integrate cleanly into downstream analysis and storage systems.
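A minimal pandas sketch of the header-cleaning and unit-normalization work described above; the canonical unit, conversion factors, and column names are illustrative assumptions.

```python
import re
import pandas as pd

# Hypothetical canonical unit: ng/mL. Factors convert common vendor units into it.
UNIT_FACTORS = {"ng/ml": 1.0, "ug/ml": 1000.0}

def clean_header(name: str) -> str:
    """'Conc. (value) ' -> 'conc_value': lower-case, snake_case, strip noise."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")

def normalize(df: pd.DataFrame, value_col: str, unit_col: str) -> pd.DataFrame:
    """Rename headers, coerce values to numeric, convert units to the canonical one.
    value_col / unit_col are the *cleaned* column names."""
    df = df.rename(columns={c: clean_header(c) for c in df.columns})
    units = df[unit_col].astype(str).str.strip().str.lower()
    df[value_col] = pd.to_numeric(df[value_col], errors="coerce") * units.map(UNIT_FACTORS)
    df[unit_col] = "ng/ml"
    return df

raw = pd.DataFrame({"Conc. (value)": ["1.2", "3,4", "5.0"],   # "3,4" coerces to NaN
                    "Unit ": ["ng/mL", "ug/mL", "ng/mL"]})
print(normalize(raw, "conc_value", "unit"))
```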
WHAT YOU BRING
Must-have
5+ years of experience in data engineering / data wrangling with real-world tabular or semi-structured data.
Strong fluency in Python and data processing tools (Pandas, Polars, PyArrow, or similar).
Extensive experience dealing with messy Excel / CSV / spreadsheet-style data (inconsistent headers, multiple sheets, mixed formats, free-text fields) and normalizing it into clean structures.
Comfort designing and maintaining robust ETL/ELT pipelines, ideally for scientific or lab-derived data.
Ability to combine classical data engineering with LLM-powered data normalization / metadata extraction / cleaning.
Strong desire and ability to own the ingestion and normalization layer end-to-end, from raw upload to final clean dataset, with an eye for maintainability, reproducibility, and scalability.
Good communication skills; able to collaborate across teams (product, bioinformatics, infra) and translate real-world messy data problems into robust engineering solutions.
Nice-to-have
Familiarity with scientific data types and "modalities" (e.g., plate readers, genomics metadata, time series, batch info, instrumentation outputs).
Experience with workflow orchestration tools (e.g. Nextflow, Prefect, Airflow, Dagster), or building pipeline abstractions.
Experience with cloud infrastructure and data storage (AWS S3, data lakes/warehouses, database schemas) to support multi-tenant ingestion.
Past exposure to LLM-based data transformation or cleansing agents, building or integrating tools that clean or structure messy data automatically.
Any background in computational biology, lab data, or bioinformatics is a bonus, though not required.
WHAT YOU WILL LOVE AT MITHRL
Mission-driven impact: you'll be the gatekeeper of data quality, ensuring that all scientific data entering Mithrl becomes clean, consistent, and analysis-ready. You'll have outsized influence over the reliability and trustworthiness of our entire data + AI stack.
High ownership & autonomy: this role is yours to shape. You decide how ingestion works, define the standards, and build the pipelines. You'll work closely with our product, data science, and infrastructure teams, shaping how data is ingested, stored, and exposed to end users and AI agents.
Team: Join a tight-knit, talent-dense team of engineers, scientists, and builders
Culture: We value consistency, clarity, and hard work. We solve hard problems through focused daily execution
Speed: We ship fast (2x/week) and improve continuously based on real user feedback
Location: Beautiful SF office with a high-energy, in-person culture
Benefits: Comprehensive PPO health coverage through Anthem (medical, dental, and vision) + 401(k) with top-tier plans
Senior ML Data Engineer
Data scientist job in San Francisco, CA
We're the data team behind Midjourney's image generation models. We handle the dataset side: processing, filtering, scoring, captioning, and all the distributed compute that makes high-quality training data possible.
What you'd be working on:
Large-scale dataset processing and filtering pipelines (a minimal sketch follows this list)
Training classifiers for content moderation and quality assessment
Models for data quality and aesthetic evaluation
Data visualization tools for experimenting on dataset samples
Testing/simulating distributed inference pipelines
Monitoring dashboards for data quality and pipeline health
Performance optimization and infrastructure scaling
Occasionally jumping into inference optimization and other cross-team projects
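A minimal PySpark sketch of a dataset filtering pass like the one described above; the metadata columns and thresholds are invented, and real pipelines would chain many such scoring and filtering stages.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dataset_filter").getOrCreate()

# Hypothetical image-metadata table with precomputed quality/aesthetic scores.
meta = spark.read.parquet("/data/image_metadata/")

kept = (meta
        .filter((F.col("width") >= 512) & (F.col("height") >= 512))  # size floor
        .filter(F.col("nsfw_score") < 0.2)                           # moderation gate
        .filter(F.col("aesthetic_score") >= 5.0))                    # quality gate

kept.write.mode("overwrite").parquet("/data/filtered/")
```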
Our current stack: PySpark, Slurm, and distributed batch processing across a hybrid cloud setup. We're pragmatic about tools; if there's something better, we'll switch.
We're looking for someone strong in either:
Data engineering/ML pipelines at scale, or
Cloud/infrastructure with distributed systems experience
You don't need exact tech matches; comfort with adjacent technologies and a willingness to learn matter more. We work with our own hardware plus GCP and other providers, so adaptability across different environments is valuable.
Location: SF office a few times per week (we may make exceptions on location for truly exceptional candidates)
The role offers variety; our team members often get pulled into different projects across the company, from dataset work to inference optimization. If you're interested in the intersection of large-scale data processing and cutting-edge generative AI, we'd love to hear from you.