
Data scientist jobs in Apple Valley, CA

- 2,384 jobs
All
Data Scientist
Data Engineer
Senior Data Scientist
Data Modeler
Engineering Mathematician
Data Science Internship
Operations Research Analyst
  • Data Scientist

    Stand 8 Technology Consulting

    Data scientist job in Long Beach, CA

    STAND 8 provides end-to-end IT solutions to enterprise partners across the United States, with offices in Los Angeles, New York, New Jersey, Atlanta, and more, including internationally in Mexico and India. We are seeking a highly analytical and technically skilled Data Scientist to transform complex, multi-source data into unified, actionable insights used for executive reporting and decision-making. This role requires expertise in business intelligence design, data modeling, metadata management, data integrity validation, and the development of dashboards, reports, and analytics used across operational and strategic environments. The ideal candidate thrives in a fast-paced environment, demonstrates strong investigative skills, and can collaborate effectively with technical teams, business stakeholders, and leadership.

    Essential Duties & Responsibilities: As a Data Scientist, participate across the full solution lifecycle: business case, planning, design, development, testing, migration, and production support. Analyze large and complex datasets with accuracy and attention to detail. Collaborate with users to develop effective metadata and data relationships. Identify reporting and dashboard requirements across business units. Determine strategic placement of business logic within ETL or metadata models. Build enterprise data warehouse metadata/semantic models. Design and develop unified dashboards, reports, and data extractions from multiple data sources. Develop and execute testing methodologies for reports and metadata models. Document BI architecture, data lineage, and project report requirements. Provide technical specifications and data definitions to support the enterprise data dictionary. Apply analytical skills and data science techniques to understand business processes, financial calculations, data flows, and application interactions. Identify and implement improvements, workarounds, or alternative solutions related to ETL processes, ensuring integrity and timeliness. Create UI components or portal elements (e.g., SharePoint) for dynamic or interactive stakeholder reporting. Download and process SQL database information to build Power BI or Tableau reports, including cybersecurity awareness campaigns (a minimal extract-and-summarize sketch follows this listing). Utilize SQL, Python, R, or similar languages for data analysis and modeling. Support process optimization through advanced modeling, leveraging data science experience where needed.

    Required Knowledge & Attributes: Highly self-motivated with strong organizational skills and the ability to manage multiple verbal and written assignments. Experience collaborating across organizational boundaries for data sourcing and usage. Analytical understanding of business processes, forecasting, capacity planning, and data governance. Proficient with BI tools (Power BI, Tableau, PBIRS, SSRS, SSAS). Strong Microsoft Office skills (Word, Excel, Visio, PowerPoint). High attention to detail and accuracy. Ability to work independently, demonstrate ownership, and ensure high-quality outcomes. Strong communication, interpersonal, and stakeholder engagement skills. Deep understanding that data integrity and consistency are essential for adoption and trust. Ability to shift priorities and adapt within fast-paced environments.

    Required Education & Experience: Bachelor's degree in Computer Science, Mathematics, or Statistics (or equivalent experience). 3+ years of BI development experience.
    3+ years with Power BI and supporting Microsoft stack tools (SharePoint 2019, PBIRS/SSRS, Excel 2019/2021). 3+ years of experience with SDLC/project lifecycle processes. 3+ years of experience with data warehousing methodologies (ETL, data modeling). 3+ years of VBA experience in Excel and Access. Strong ability to write SQL queries and work with SQL Server 2017-2022. Experience with BI tools including PBIRS, SSRS, SSAS, Tableau. Strong analytical skills in business processes, financial modeling, forecasting, and data flow understanding. Critical thinking and problem-solving capabilities. Experience producing high-quality technical documentation and presentations. Excellent communication and presentation skills, with the ability to explain insights to leadership and business teams.

    Benefits: Medical coverage and Health Savings Account (HSA) through Anthem. Dental/vision/various ancillary coverages through Unum. 401(k) retirement savings plan. Paid-time-off options. Company-paid Employee Assistance Program (EAP). Discount programs through ADP WorkforceNow.

    Additional Details: The base range for this contract position is $73 - $83 per hour, depending on experience. Our pay ranges are determined by role, level, and location. The range displayed on each job posting reflects the minimum and maximum target for new hires of this position across all US locations. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Qualified applicants with arrest or conviction records will be considered.

    About Us: STAND 8 provides end-to-end IT solutions to enterprise partners across the United States and globally, with offices in Los Angeles, Atlanta, New York, Mexico, Japan, India, and more. STAND 8 focuses on the "bleeding edge" of technology and leverages automation, process, marketing, and over fifteen years of success and growth to provide a world-class experience for our customers, partners, and employees. Our mission is to impact the world positively by creating success through PEOPLE, PROCESS, and TECHNOLOGY. Check out more at ************** and reach out today to explore opportunities to grow together! By applying to this position, your data will be processed in accordance with the STAND 8 Privacy Policy.
    $73-83 hourly 1d ago
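
    The posting above mentions pulling SQL data and shaping it into Power BI or Tableau extracts. A minimal, hypothetical sketch of that kind of task in Python (the database file, table, and column names are illustrative assumptions, not part of the posting; SQL Server via pyodbc would follow the same pattern):

        import sqlite3              # stand-in for any SQL source
        import pandas as pd

        # Hypothetical: summarize phishing-simulation results for a BI report.
        conn = sqlite3.connect("awareness_campaigns.db")   # hypothetical database
        df = pd.read_sql("SELECT department, clicked, reported FROM phishing_results", conn)

        summary = (
            df.groupby("department")
              .agg(click_rate=("clicked", "mean"), report_rate=("reported", "mean"))
              .reset_index()
        )

        # Power BI and Tableau can both ingest a flat CSV extract like this.
        summary.to_csv("campaign_summary.csv", index=False)
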
  • Data Scientist

    Centraprise

    Data scientist job in Pleasanton, CA

    Key Responsibilities: Design and develop marketing-focused machine learning models, including customer segmentation; propensity, churn, and lifetime value (LTV) models; campaign response and uplift models; and attribution and marketing mix models (MMM). Build and deploy NLP solutions for customer sentiment analysis, text classification and topic modeling, and social media, reviews, chat, and voice-of-customer analytics. Apply advanced statistical and ML techniques to solve real-world business problems. Work with structured and unstructured data from multiple marketing channels (digital, CRM, social, email, web). Translate business objectives into analytical frameworks and actionable insights. Partner with stakeholders to define KPIs, success metrics, and experimentation strategies (A/B testing). Optimize and productionize models using MLOps best practices. Mentor junior data scientists and provide technical leadership. Communicate complex findings clearly to technical and non-technical audiences.

    Required Skills & Qualifications: 7+ years of experience in Data Science, with a strong focus on marketing analytics. Strong expertise in machine learning (supervised and unsupervised techniques). Hands-on experience with NLP techniques, including text preprocessing and feature extraction, and word embeddings (Word2Vec, GloVe, Transformers); experience with Large Language Models (LLMs) is a plus. Proficiency in Python (NumPy, Pandas, Scikit-learn, TensorFlow/PyTorch). Experience with SQL and large-scale data processing. Strong understanding of statistics, probability, and experimental design. Experience working with cloud platforms (AWS, Azure, or GCP). Ability to translate data insights into business impact. (A minimal propensity-model sketch follows this listing.)

    Nice to Have: Experience with marketing automation or CRM platforms. Knowledge of MLOps, model monitoring, and deployment pipelines. Familiarity with GenAI/LLM-based NLP use cases for marketing. Prior experience in consumer, e-commerce, or digital marketing domains.

    EEO: Centraprise is an equal opportunity employer. Your application and candidacy will not be considered based on race, color, sex, religion, creed, sexual orientation, gender identity, national origin, disability, genetic information, pregnancy, veteran status or any other characteristic protected by federal, state or local laws.
    $107k-155k yearly est. 2d ago
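
    The propensity and churn modeling this listing describes is often prototyped with a simple classifier before anything more elaborate. A minimal sketch on synthetic data, assuming scikit-learn is available; the features and coefficients are made up for illustration:

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import roc_auc_score

        # Hypothetical features: recency, frequency, monetary value, email engagement.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(5_000, 4))
        y = (rng.random(5_000) < 1 / (1 + np.exp(-(1.2 * X[:, 0] - 0.8 * X[:, 3])))).astype(int)

        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
        model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

        churn_scores = model.predict_proba(X_test)[:, 1]   # propensity scores in [0, 1]
        print("AUC:", roc_auc_score(y_test, churn_scores))
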
  • Data Scientist

    Skale (3.7 company rating)

    Data scientist job in San Francisco, CA

    We're working with a Series A health tech start-up pioneering a revolutionary approach to healthcare AI, developing neurosymbolic systems that combine statistical learning with structured medical knowledge. Their technology is being adopted by leading health systems and insurers to enhance patient outcomes through advanced predictive analytics. We're seeking Machine Learning Engineers who excel at the intersection of data science, modeling, and software engineering. You'll design and implement models that extract insights from longitudinal healthcare data, balancing analytical rigor, interpretability, and scalability. This role offers a unique opportunity to tackle foundational modeling challenges in healthcare, where your contributions will directly influence clinical, actuarial, and policy decisions.

    Key Responsibilities: Develop predictive models to forecast disease progression, healthcare utilization, and costs using temporal clinical data (claims, EHR, laboratory results, pharmacy records). Design interpretable and explainable ML solutions that earn the trust of clinicians, actuaries, and healthcare decision-makers. Research and prototype innovative approaches leveraging both classical and modern machine learning techniques. Build robust, scalable ML pipelines for training, validation, and deployment in distributed computing environments. Collaborate cross-functionally with data engineers, clinicians, and product teams to ensure models address real-world healthcare needs. Communicate findings and methodologies effectively through visualizations, documentation, and technical presentations.

    Required Qualifications: Strong foundation in statistical modeling, machine learning, or data science, with preference for experience in temporal or longitudinal data analysis. Proficiency in Python and ML frameworks (PyTorch, JAX, NumPyro, PyMC, etc.). Proven track record of transitioning models from research prototypes to production systems. Experience with probabilistic methods, survival analysis, or Bayesian inference is highly valued (a brief survival-analysis sketch follows this listing).

    Bonus Qualifications: Experience working with clinical data and healthcare terminologies (ICD, CPT, SNOMED CT, LOINC). Background in actuarial modeling, claims forecasting, or risk adjustment methodologies.
    $123k-171k yearly est. 3d ago
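
    For the survival-analysis experience this listing values, a common starting point is a Kaplan-Meier estimate of time-to-event data with censoring. A minimal sketch on synthetic data, assuming the lifelines package is installed (the cohort and follow-up times are made up for illustration):

        import numpy as np
        from lifelines import KaplanMeierFitter   # assumes lifelines is available

        # Hypothetical cohort: months until a utilization event, with right-censoring.
        rng = np.random.default_rng(1)
        durations = rng.exponential(scale=24, size=500)   # months of follow-up
        observed = rng.random(500) < 0.7                  # False = censored

        kmf = KaplanMeierFitter()
        kmf.fit(durations, event_observed=observed)

        # Estimated probability of remaining event-free at 12 months.
        print(kmf.survival_function_at_times(12))
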
  • Lead Data Scientist - Computer Vision

    Straive

    Data scientist job in Santa Clara, CA

    Lead Data Scientist - Computer Vision/Image Processing. About the Role: We are seeking a Lead Data Scientist to drive the strategy and execution of data science initiatives, with a particular focus on computer vision systems and image processing techniques. The ideal candidate has deep expertise in image processing techniques including Filtering, Binary Morphology, Perspective/Affine Transformation, and Edge Detection (a brief OpenCV sketch follows this listing).

    Responsibilities: Solid knowledge of computer vision programs and image processing techniques: Filtering, Binary Morphology, Perspective/Affine Transformation, Edge Detection. Strong understanding of machine learning: Regression, Supervised and Unsupervised Learning. Proficiency in Python and libraries such as OpenCV, NumPy, scikit-learn, TensorFlow/PyTorch. Familiarity with version control (Git) and collaborative development practices.
    $107k-154k yearly est. 5d ago
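
    The image-processing techniques named in this listing (filtering, binary morphology, perspective transforms, edge detection) map directly onto standard OpenCV calls. A minimal sketch; the input image and corner coordinates are hypothetical:

        import cv2
        import numpy as np

        img = cv2.imread("document_photo.jpg")             # hypothetical input image
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

        # Filtering + edge detection.
        blurred = cv2.GaussianBlur(gray, (5, 5), 0)
        edges = cv2.Canny(blurred, 100, 200)

        # Binary morphology: remove speckle noise from a thresholded mask.
        _, mask = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
        cleaned = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

        # Perspective transform: map four (hypothetical) corners to a flat rectangle.
        src = np.float32([[50, 60], [420, 80], [440, 600], [40, 580]])
        dst = np.float32([[0, 0], [400, 0], [400, 560], [0, 560]])
        M = cv2.getPerspectiveTransform(src, dst)
        warped = cv2.warpPerspective(img, M, (400, 560))
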
  • Principal Data Scientist

    Hiretalent-Staffing & Recruiting Firm

    Data scientist job in Alhambra, CA

    The Principal Data Scientist works to establish a comprehensive Data Science Program to advance data-driven decision-making, streamline operations, and fully leverage modern platforms including Databricks, or similar, to meet increasing demand for predictive analytics and AI solutions. The Principal Data Scientist will guide program development, provide training and mentorship to junior members of the team, accelerate adoption of advanced analytics, and build internal capacity through structured mentorship. The Principal Data Scientist will possess exceptional communication abilities, both verbal and written, with a strong customer service mindset and the ability to translate complex concepts into clear, actionable insights; strong analytical and business acumen, including foundational experience with regression, association analysis, outlier detection, and core data analysis principles; working knowledge of database design and organization, with the ability to partner effectively with Data Management and Data Engineering teams; outstanding time management and organizational skills, with demonstrated success managing multiple priorities and deliverables in parallel; a highly collaborative work style, coupled with the ability to operate independently, maintain focus, and drive projects forward with minimal oversight; a meticulous approach to quality, ensuring accuracy, reliability, and consistency in all deliverables; and proven mentorship capabilities, including the ability to guide, coach, and upskill junior data scientists and analysts. 5+ years of professional experience leading data science initiatives, including developing machine learning models, statistical analyses, and end-to-end data science workflows in production environments. 3+ years of experience working with Databricks and similar cloud-based analytics platforms, including notebook development, feature engineering, ML model training, and workflow orchestration. 3+ years of experience applying advanced analytics and predictive modeling (e.g., regression, classification, clustering, forecasting, natural language processing). 2+ years of experience implementing MLOps practices, such as model versioning, CI/CD for ML, MLflow, automated pipelines, and model performance monitoring. 2+ years of experience collaborating with data engineering teams to design data pipelines, optimize data transformations, and implement Lakehouse or data warehouse architectures (e.g., Databricks, Snowflake, SQL-based platforms). 2+ years of experience mentoring or supervising junior data scientists or analysts, including code reviews, training, and structured skill development. 2+ years of experience with Python and SQL programming, using data sources such as SQL Server, Oracle, PostgreSQL, or similar relational databases. 1+ year of experience operationalizing analytics within enterprise governance frameworks, partnering with Data Management, Security, and IT to ensure compliance, reproducibility, and best practices. Education: This classification requires possession of a Master's degree or higher in Data Science, Statistics, Computer Science, or a closely related field. Additional qualifying professional experience may be substituted for the required education on a year-for-year basis. 
    At least one of the following industry-recognized certifications in data science or cloud analytics, such as:
    • Microsoft Azure Data Scientist Associate (DP-100)
    • Databricks Certified Data Scientist or Machine Learning Professional
    • AWS Machine Learning Specialty
    • Google Professional Data Engineer
    • or equivalent advanced analytics certifications.
    The certification is required and may not be substituted with additional experience.
    $97k-141k yearly est. 3d ago
  • Data Scientist

    Axtria - Ingenious Insights (3.7 company rating)

    Data scientist job in Thousand Oaks, CA

    Axtria is a leading global provider of cloud software and data analytics tailored for the Life Sciences industry. Since our inception in 2010, we have pioneered technology-driven solutions to revolutionize the commercialization journey, driving sales growth and enhancing patient healthcare outcomes. Committed to impacting millions of lives positively, our innovative platforms deploy cutting-edge Artificial Intelligence and Machine Learning technologies. With a presence in over 30 countries, Axtria is a key player in delivering commercial solutions to the Life Sciences sector, consistently recognized for our growth and technological advancements.

    Job Description: We are looking for a Project Lead for our Decision Science practice. Success in this position requires managing consulting projects/engagements delivering Brand Analytics, Real World Data (RWD) Analytics, Commercial Analytics, Marketing Analytics, and Market Access Analytics solutions. Candidates will be expected to have familiarity with: patient analytics using Real World Data (RWD) sources such as claims data, EHR/EMR data, lab/diagnostic testing data, etc.; predictive modeling using Real World Data; patient and HCP segmentation; campaign effectiveness, promotion response modeling, and marketing mix optimization; and marketing analytics, including digital marketing.

    Required skills and experience: Overall, 4-8 years of relevant work experience and 2+ years of US local experience in pharma analytics. Knowledge of the biopharmaceutical domain; prior experience in analytics in the therapeutic areas of Oncology, Inflammation, Cardio, and Bone is preferred. Exposure to syndicated data sets including claims and EMR/EHR data, and exposure to/experience working with large data sets. Strong quantitative and analytical skills, including sound knowledge of statistical concepts and predictive modeling/machine learning. Demonstrated ability to frame and scope business problems, design solutions, and deliver results. Excellent spoken and written communication skills, including superior visualization, storyboarding, and presentation skills. Ability to communicate actionable analytical findings to a technical or non-technical audience in clear and concise language. Relevant expertise in using analytical tools such as R/Python, Alteryx, Dataiku, etc., and the ability to quickly master new analytics tools/software as needed. Ability to lead project teams and own project delivery.

    Logistics and Location: U.S. Citizens and those authorized to work in the U.S. are encouraged to apply. The position is based out of Thousand Oaks, and the candidate needs to be at the client site 3-5 days per week. Axtria is an EEO/AA employer M/F/D/V. We offer attractive performance-based compensation packages including salary and bonus. Comprehensive benefits are available including health insurance, flexible spending accounts, and 401k with company match. Immigration sponsorship will be considered.

    Pay Transparency: The salary range for this position is $83,200 to $129,738 annually. The actual salary will vary based on the applicant's education, experience, skills, and abilities, as well as internal equity and alignment with market data. The salary may also be adjusted based on the applicant's geographic location. The salary range reflected is based on a primary work location of Thousand Oaks, CA. The actual salary may vary for applicants in a different geographic location.
    $83.2k-129.7k yearly 2d ago
  • Data Scientist

    Randomtrees

    Data scientist job in San Francisco, CA

    Key Responsibilities: Design and productionize models for opportunity scanning, anomaly detection, and significant change detection across CRM, streaming, ecommerce, and social data. Define and tune alerting logic (thresholds, SLOs, precision/recall) to minimize noise while surfacing high-value marketing actions. Partner with marketing, product, and data engineering to operationalize insights into campaigns, playbooks, and automated workflows, with clear monitoring and experimentation.

    Required Qualifications: Strong proficiency in Python (pandas, NumPy, scikit-learn; plus experience with PySpark or similar for large-scale data) and SQL on modern warehouses (e.g., BigQuery, Snowflake, Redshift). Hands-on experience with time-series modeling and anomaly / changepoint / significant-movement detection (e.g., STL decomposition, EWMA/CUSUM, Bayesian/prophet-style models, isolation forests, robust statistics); a minimal EWMA-style sketch follows this listing. Experience building and deploying production ML pipelines (batch and/or streaming), including feature engineering, model training, CI/CD, and monitoring for performance and data drift. Solid background in statistics and experimentation: hypothesis testing, power analysis, A/B testing frameworks, uplift/propensity modeling, and basic causal inference techniques. Familiarity with cloud platforms (GCP/AWS/Azure), orchestration tools (e.g., Airflow/Prefect), and dashboarding/visualization tools to expose alerts and model outputs to business users.
    $108k-155k yearly est. 4d ago
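
    As an illustration of the EWMA-based change detection named in this listing, here is a minimal Python sketch on synthetic data; the metric, smoothing factor, window size, and 3-sigma threshold are illustrative assumptions, not part of the posting:

        import numpy as np
        import pandas as pd

        # Hypothetical daily metric (e.g., signups per day) with one injected spike.
        rng = np.random.default_rng(42)
        signups = pd.Series(200 + rng.normal(0, 10, 180),
                            index=pd.date_range("2024-01-01", periods=180, freq="D"))
        signups.iloc[120] += 80   # anomaly to be detected

        # EWMA baseline from prior observations only (shift avoids peeking at today).
        baseline = signups.ewm(alpha=0.3).mean().shift(1)
        residual = signups - baseline
        sigma = residual.rolling(window=30, min_periods=10).std()

        alerts = residual.abs() > 3 * sigma   # flag points far from the EWMA baseline
        print(signups[alerts])
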
  • Data Scientist with Gen AI and Python experience

    Droisys (4.3 company rating)

    Data scientist job in Palo Alto, CA

    About the Company: Droisys is an innovation technology company focused on helping companies accelerate their digital initiatives from strategy and planning through execution. We leverage deep technical expertise, Agile methodologies, and data-driven intelligence to modernize systems of engagement and simplify human/tech interaction. Amazing things happen when we work in environments where everyone feels a true sense of belonging and when candidates have the requisite skills and opportunities to succeed. At Droisys, we invest in our talent and support career growth, and we are always on the lookout for amazing talent who can contribute to our growth by delivering top results for our clients. Join us to challenge yourself and accomplish work that matters.

    Here are the job details: Data Scientist with Gen AI and Python experience. Palo Alto, CA - 5 days onsite. Interview mode: phone and face-to-face.

    Job Overview: We are looking for a competent Data Scientist who is independent and results-driven, and who is capable of taking business requirements and building out the technologies to generate statistically sound analysis and production-grade ML models. Data science skills with GenAI and LLM knowledge. Expertise in Python/Spark and their related libraries and frameworks. Experience in building training ML pipelines and the effort involved in ML model deployment. Experience in other ML concepts - real-time distributed model inferencing pipelines, Champion/Challenger framework, A/B testing, Model. Familiar with DS/ML production implementation. Excellent problem-solving skills, with attention to detail and a focus on quality and timely delivery of assigned tasks. Prior knowledge of Azure cloud and Databricks will be a big plus.

    Droisys is an equal opportunity employer. We do not discriminate based on race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law. Droisys believes in diversity, inclusion, and belonging, and we are committed to fostering a diverse work environment.
    $104k-146k yearly est. 3d ago
  • Senior Data Scientist

    Net2Source (N2S)

    Data scientist job in Pleasanton, CA

    Net2Source is a Global Workforce Solutions Company headquartered in NJ, USA, with branch offices in the Asia Pacific Region. We are one of the fastest growing IT consulting companies across the USA, and we are hiring a "Senior Data Scientist" for one of our clients. We offer a wide gamut of consulting solutions customized to our 450+ clients ranging from Fortune 500/1000 to start-ups across various verticals like Technology, Financial Services, Healthcare, Life Sciences, Oil & Gas, Energy, Retail, Telecom, Utilities, Technology, Manufacturing, the Internet, and Engineering.

    Position: Senior Data Scientist. Location: Pleasanton, CA (Onsite) - Locals Only. Type: Contract. Exp Level: 10+ Years.

    Required Skills: Design, develop, and deploy advanced marketing models. Build and productionize NLP solutions. Partner with Marketing and Business stakeholders to translate business objectives into data science solutions. Work with large-scale structured and unstructured datasets using SQL, Python, and distributed systems. Evaluate and implement state-of-the-art ML/NLP techniques to improve model performance and business impact. Communicate insights, results, and recommendations clearly to both technical and non-technical audiences.

    Required Qualifications: 5+ years of experience in data science or applied machine learning, with a strong focus on marketing analytics. Hands-on experience building predictive marketing models (e.g., segmentation, attribution, personalization). Strong expertise in NLP techniques and libraries (e.g., spaCy, NLTK, Hugging Face, Gensim). Proficiency in Python, SQL, and common data science libraries (pandas, NumPy, scikit-learn). Solid understanding of statistics, machine learning algorithms, and model evaluation. Experience deploying models into production environments. Strong communication and stakeholder management skills.

    Why Work With Us? We believe in more than just jobs - we build careers. At Net2Source, we champion leadership at all levels, celebrate diverse perspectives, and empower you to make an impact. Think work-life balance, professional growth, and a collaborative culture where your ideas matter.

    Our Commitment to Inclusion & Equity: Net2Source is an equal opportunity employer, dedicated to fostering a workplace where diverse talents and perspectives are valued. We make all employment decisions based on merit, ensuring a culture of respect, fairness, and opportunity for all, regardless of age, gender, ethnicity, disability, or other protected characteristics.

    Awards & Recognition: America's Most Honored Businesses (Top 10%). Fastest-Growing Staffing Firm by Staffing Industry Analysts. INC 5000 List for Eight Consecutive Years. Top 100 by Dallas Business Journal. Spirit of Alliance Award by Agile1.

    Maddhuker Singh, Sr Account & Delivery Manager ***********************
    $122k-174k yearly est. 4d ago
  • Senior Data Scientist

    Revolve (4.2 company rating)

    Data scientist job in Cerritos, CA

    Meet REVOLVE: REVOLVE is the next-generation fashion retailer for Millennial and Generation Z consumers. As a trusted, premium lifestyle brand, and a go-to online source for discovery and inspiration, we deliver an engaging customer experience from a vast yet curated offering totaling over 45,000 apparel, footwear, accessories and beauty styles. Our dynamic platform connects a deeply engaged community of millions of consumers, thousands of global fashion influencers, and more than 500 emerging, established and owned brands. Through 16 years of continued investment in technology, data analytics, and innovative marketing and merchandising strategies, we have built a powerful platform and brand that we believe is connecting with the next generation of consumers and is redefining fashion retail for the 21st century. For more information please visit ****************

    At REVOLVE the most successful team members have a thirst and the creativity to make this the top e-commerce brand in the world. With a team of 1,000+ based out of Cerritos, California, we are a dynamic bunch that are motivated by getting the company to the next level. It's our goal to hire high-energy, diverse, bright, creative, and flexible individuals who thrive in a fast-paced work environment. In return, we promise to keep REVOLVE a company where inspired people will always thrive. To take a behind-the-scenes look at the REVOLVE “corporate” lifestyle, check out our Instagram @REVOLVEcareers or #lifeatrevolve. Are you ready to set the standard for premium apparel?

    Main purpose of the Senior Data Science Analyst role: Use a diverse skill set across math and computer science, dedicated to solving complex and analytically challenging problems here at Revolve.

    Major Responsibilities: Essential duties and responsibilities include the following; other duties may be assigned. Partner closely with business leaders in Marketing, Product, Operations, and the Buying team to plan out valuable data science projects. Conduct complex analysis and build models to uncover key learnings from data, leading to appropriate strategy recommendations. Work closely with the DBA to improve BI's infrastructure, architect the reporting system, and invest time in technical proofs of concept. Work closely with the business intelligence and tech teams to define, automate and validate the extraction of new metrics from various data sources for use in future analysis. Work alongside business stakeholders to apply our findings and models, from website personalization, product recommendations, and marketing optimization to fraud detection, demand forecasting, and CLV prediction.

    Required Competencies: To perform the job successfully, an individual should demonstrate the following competencies: Outstanding analytical skills, with a strong academic background in statistics, math, science or technology. High comfort level with programming and the ability to learn and adopt new technology with a short turn-around time. Knowledge of quantitative methods in statistics and machine learning. Intense intellectual curiosity - a strong desire to always be learning. Proven business acumen and results orientation. Ability to demonstrate logical thinking and problem-solving skills. Strong attention to detail.

    Minimum Qualifications: A Master's degree is required. 3+ years of DS and ML experience in a strong analytical environment.
    Proficient in Python, NumPy and other packages. Familiar with statistical and ML methodology: causal inference, logistic regression, tree-based models, clustering, model validation and interpretation. Experience with A/B testing and pseudo-A/B test setup and evaluation (a minimal two-proportion test sketch follows this listing). Advanced SQL experience, query optimization, and data extraction. Ability to build, validate, and productionize models.

    Preferred Qualifications: Strong business acumen. Experience in deploying end-to-end machine learning models. 5+ years of DS and ML experience preferred. Advanced SQL and Python, with query and coding optimization experience. Experience with e-commerce marketing and product analytics is a plus.

    A successful candidate works well in a dynamic environment with minimal supervision. At REVOLVE we all roll up our sleeves to pitch in and do whatever it takes to get the job done. Each day is a little different; it's what keeps us on our toes and excited to come to work every day. A reasonable estimate of the current base salary range is $120,000 to $150,000 per year.
    $120k-150k yearly 3d ago
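
    The A/B-test evaluation mentioned above is often reduced to a two-proportion z-test. A minimal sketch with made-up conversion counts (not data from the posting), assuming scipy is available:

        from math import sqrt
        from scipy.stats import norm

        # Hypothetical experiment: conversions out of visitors for control (A) and variant (B).
        conv_a, n_a = 480, 10_000
        conv_b, n_b = 540, 10_000

        p_a, p_b = conv_a / n_a, conv_b / n_b
        p_pool = (conv_a + conv_b) / (n_a + n_b)
        se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))

        z = (p_b - p_a) / se
        p_value = 2 * norm.sf(abs(z))          # two-sided test

        print(f"lift = {p_b - p_a:.4f}, z = {z:.2f}, p = {p_value:.4f}")
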
  • AI Data Engineer

    Hartleyco

    Data scientist job in San Jose, CA

    Member of Technical Staff - AI Data Engineer. San Francisco (In-Office). $150K to $225K + Equity.

    A high-growth, AI-native startup coming out of stealth is hiring AI Data Engineers to build the systems that power production-grade AI. The company has recently signed a Series A term sheet and is scaling rapidly. This role is central to unblocking current bottlenecks across data engineering, context modeling, and agent performance.

    Responsibilities:
    • Build distributed, reliable data pipelines using Airflow, Temporal, and n8n
    • Model SQL, vector, and NoSQL databases (Postgres, Qdrant, etc.)
    • Build API and function-based services in Python
    • Develop custom automations (Playwright, Stagehand, Zapier)
    • Work with AI researchers to define and expose context as services
    • Identify gaps in data quality and drive changes to upstream processes
    • Ship fast, iterate, and own outcomes end-to-end

    Required Experience:
    • Strong background in data engineering
    • Hands-on experience working with LLMs or LLM-powered applications
    • Data modeling skills across SQL and vector databases
    • Experience building distributed systems
    • Experience with Airflow, Temporal, n8n, or similar workflow engines
    • Python experience (API/services)
    • Startup mindset and bias toward rapid execution

    Nice To Have:
    • Experience with stream processing (Flink)
    • dbt or Clickhouse experience
    • CDC pipelines
    • Experience with context construction, RAG, or agent workflows
    • Analytical tooling (Posthog)

    What You Can Expect:
    • High-intensity, in-office environment
    • Fast decision-making and rapid shipping cycles
    • Real ownership over architecture and outcomes
    • Opportunity to work on AI systems operating at meaningful scale
    • Competitive compensation package
    • Meals provided plus full medical, dental, and vision benefits

    If this sounds like you, please apply now.
    $150k-225k yearly 1d ago
  • Senior Data Engineer - Spark, Airflow

    Sigmaways Inc.

    Data scientist job in San Jose, CA

    We are seeking an experienced Data Engineer to design and optimize scalable data pipelines that drive our global data and analytics initiatives. In this role, you will leverage technologies such as Apache Spark, Airflow, and Python to build high-performance data processing systems and ensure data quality, reliability, and lineage across Mastercard's data ecosystem. The ideal candidate combines strong technical expertise with hands-on experience in distributed data systems, workflow automation, and performance tuning to deliver impactful, data-driven solutions at enterprise scale.

    Responsibilities: Design and optimize Spark-based ETL pipelines for large-scale data processing. Build and manage Airflow DAGs for scheduling, orchestration, and checkpointing. Implement partitioning and shuffling strategies to improve Spark performance. Ensure data lineage, quality, and traceability across systems. Develop Python scripts for data transformation, aggregation, and validation. Execute and tune Spark jobs using spark-submit. Perform DataFrame joins and aggregations for analytical insights (a minimal PySpark sketch follows this listing). Automate multi-step processes through shell scripting and variable management. Collaborate with data, DevOps, and analytics teams to deliver scalable data solutions.

    Qualifications: Bachelor's degree in Computer Science, Data Engineering, or a related field (or equivalent experience). At least 7 years of experience in data engineering or big data development. Strong expertise in Apache Spark architecture, optimization, and job configuration. Proven experience with Airflow DAGs, including authoring, scheduling, checkpointing, and monitoring. Skilled in data shuffling, partitioning strategies, and performance tuning in distributed systems. Expertise in Python programming, including data structures and algorithmic problem-solving. Hands-on with Spark DataFrames and PySpark transformations using joins, aggregations, and filters. Proficient in shell scripting, including managing and passing variables between scripts. Experienced with spark-submit for deployment and tuning. Solid understanding of ETL design, workflow automation, and distributed data systems. Excellent debugging and problem-solving skills in large-scale environments. Experience with AWS Glue, EMR, Databricks, or similar Spark platforms. Knowledge of data lineage and data quality frameworks like Apache Atlas. Familiarity with CI/CD pipelines, Docker/Kubernetes, and data governance tools.
    $110k-156k yearly est. 3d ago
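
    For the DataFrame joins and aggregations this listing describes, a minimal PySpark sketch; the tables and column names are hypothetical, and a real pipeline would read Parquet or Delta rather than in-memory rows:

        from pyspark.sql import SparkSession, functions as F

        spark = SparkSession.builder.appName("orders-aggregation").getOrCreate()

        orders = spark.createDataFrame(
            [(1, "c1", 120.0), (2, "c2", 75.5), (3, "c1", 42.0)],
            ["order_id", "customer_id", "amount"],
        )
        customers = spark.createDataFrame(
            [("c1", "WEST"), ("c2", "EAST")],
            ["customer_id", "region"],
        )

        # Join, then aggregate revenue and distinct buyers per region.
        summary = (
            orders.join(customers, on="customer_id", how="left")
                  .groupBy("region")
                  .agg(F.sum("amount").alias("total_revenue"),
                       F.countDistinct("customer_id").alias("unique_buyers"))
        )
        summary.show()
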
  • Data Engineer - Scientific Data Ingestion

    Mithrl

    Data scientist job in San Francisco, CA

    We envision a world where novel drugs and therapies reach patients in months, not years, accelerating breakthroughs that save lives. Mithrl is building the world's first commercially available AI Co-Scientist - a discovery engine that empowers life science teams to go from messy biological data to novel insights in minutes. Scientists ask questions in natural language, and Mithrl answers with real analysis, novel targets, and patent-ready reports. No coding. No waiting. No bioinformatics bottlenecks. We are the fastest growing tech-bio startup in the Bay Area with over 12X YoY revenue growth. Our platform is already being used by teams at some of the largest biotechs and big pharma across three continents to accelerate and uncover breakthroughs - from target discovery to mechanism of action.

    WHAT YOU WILL DO: Build and own an AI-powered ingestion & normalization pipeline to import data from a wide variety of sources - unprocessed Excel/CSV uploads, lab and instrument exports, as well as processed data from internal pipelines. Develop robust schema mapping, coercion, and conversion logic (think: units normalization, metadata standardization, variable-name harmonization, vendor-instrument quirks, plate-reader formats, reference-genome or annotation updates, batch-effect correction, etc.). Use LLM-driven and classical data-engineering tools to structure “semi-structured” or messy tabular data - extracting metadata, inferring column roles/types, cleaning free-text headers, fixing inconsistencies, and preparing final clean datasets (a minimal pandas cleaning sketch follows this listing). Ensure all transformations that should only happen once (normalization, coercion, batch-correction) execute during ingestion - so downstream analytics / the AI “Co-Scientist” always works with clean, canonical data. Build validation, verification, and quality-control layers to catch ambiguous, inconsistent, or corrupt data before it enters the platform. Collaborate with product teams, data science / bioinformatics colleagues, and infrastructure engineers to define and enforce data standards, and ensure pipeline outputs integrate cleanly into downstream analysis and storage systems.

    WHAT YOU BRING

    Must-have: 5+ years of experience in data engineering / data wrangling with real-world tabular or semi-structured data. Strong fluency in Python and data processing tools (Pandas, Polars, PyArrow, or similar). Excellent experience dealing with messy Excel / CSV / spreadsheet-style data - inconsistent headers, multiple sheets, mixed formats, free-text fields - and normalizing it into clean structures. Comfort designing and maintaining robust ETL/ELT pipelines, ideally for scientific or lab-derived data. Ability to combine classical data engineering with LLM-powered data normalization / metadata extraction / cleaning. Strong desire and ability to own the ingestion & normalization layer end-to-end - from raw upload → final clean dataset - with an eye for maintainability, reproducibility, and scalability. Good communication skills; able to collaborate across teams (product, bioinformatics, infra) and translate real-world messy data problems into robust engineering solutions.

    Nice-to-have: Familiarity with scientific data types and “modalities” (e.g. plate-readers, genomics metadata, time-series, batch-info, instrumentation outputs). Experience with workflow orchestration tools (e.g. Nextflow, Prefect, Airflow, Dagster), or building pipeline abstractions.
Experience with cloud infrastructure and data storage (AWS S3, data lakes/warehouses, database schemas) to support multi-tenant ingestion. Past exposure to LLM-based data transformation or cleansing agents - building or integrating tools that clean or structure messy data automatically. Any background in computational biology / lab-data / bioinformatics is a bonus - though not required. WHAT YOU WILL LOVE AT MITHRL Mission-driven impact: you'll be the gatekeeper of data quality - ensuring that all scientific data entering Mithrl becomes clean, consistent, and analysis-ready. You'll have outsized influence over the reliability and trustworthiness of our entire data + AI stack. High ownership & autonomy: this role is yours to shape. You decide how ingestion works, define the standards, build the pipelines. You'll work closely with our product, data science, and infrastructure teams - shaping how data is ingested, stored, and exposed to end users or AI agents. Team: Join a tight-knit, talent-dense team of engineers, scientists, and builders Culture: We value consistency, clarity, and hard work. We solve hard problems through focused daily execution Speed: We ship fast (2x/week) and improve continuously based on real user feedback Location: Beautiful SF office with a high-energy, in-person culture Benefits: Comprehensive PPO health coverage through Anthem (medical, dental, and vision) + 401(k) with top-tier plans
    $110k-157k yearly est. 4d ago
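
    The header cleanup, type coercion, and unit harmonization this listing describes are classic pandas chores. A minimal sketch on a made-up instrument export; the column names, units, and conversion factors are illustrative assumptions:

        import re
        import pandas as pd

        # Hypothetical export with free-text headers, mixed units, and stray whitespace.
        raw = pd.DataFrame({
            " Sample ID ": ["s1", "s2", "s3"],
            "Conc. (ng/uL)": ["12.5", "n/a", " 8.0 "],
            "Volume": ["100 uL", "0.2 mL", "50 uL"],
        })

        # Normalize headers: strip, lowercase, snake_case.
        raw.columns = (raw.columns.str.strip()
                                  .str.lower()
                                  .str.replace(r"[^\w]+", "_", regex=True)
                                  .str.strip("_"))

        # Coerce numeric fields; unparseable values become NaN instead of crashing downstream code.
        raw["conc_ng_ul"] = pd.to_numeric(raw["conc_ng_ul"], errors="coerce")

        # Harmonize units: convert every volume to microliters.
        vol = raw["volume"].str.extract(r"([\d.]+)\s*(ul|ml)", flags=re.IGNORECASE)
        raw["volume_ul"] = pd.to_numeric(vol[0]) * vol[1].str.lower().map({"ul": 1, "ml": 1000})

        clean = raw[["sample_id", "conc_ng_ul", "volume_ul"]]
        print(clean)
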
  • Data Engineer (SQL / SQL Server Focus)

    Franklin Fitch

    Data scientist job in San Francisco, CA

    Data Engineer (SQL / SQL Server Focus). (Kind note: we cannot provide sponsorship for this role.) A leading professional services organization is seeking an experienced Data Engineer to join its team. This role supports enterprise-wide systems, analytics, and reporting initiatives, with a strong emphasis on SQL Server-based data platforms.

    Key Responsibilities: Design, develop, and optimize SQL Server-centric ETL/ELT pipelines to ensure reliable, accurate, and timely data movement across enterprise systems. Develop and maintain SQL Server data models, schemas, and tables to support financial analytics and reporting. Write, optimize, and maintain complex T-SQL queries, stored procedures, functions, and views with a strong focus on performance and scalability. Build and support SQL Server Reporting Services (SSRS) solutions, translating business requirements into clear, actionable reports. Partner with finance and business stakeholders to define KPIs and ensure consistent, trusted reporting outputs. Monitor, troubleshoot, and tune SQL Server workloads, including query performance, indexing strategies, and execution plans. Ensure adherence to data governance, security, and access control standards within SQL Server environments. Support documentation, version control, and change management for database and reporting solutions. Collaborate closely with business analysts, data engineers, and IT teams to deliver end-to-end data solutions. Mentor junior team members and contribute to database development standards and best practices. Act as a key contributor to enterprise data architecture and reporting strategy, particularly around SQL Server platforms.

    Required Education & Experience: Bachelor's or Master's degree in Computer Science, Information Systems, Data Engineering, or a related field. 8+ years of hands-on experience working with SQL Server in enterprise data warehouse or financial reporting environments. Advanced expertise in T-SQL, including query optimization, index design and maintenance, and stored procedures and performance tuning. Strong experience with SQL Server Integration Services (SSIS) and SSRS. Solid understanding of data warehousing concepts, including star and snowflake schemas, and OLAP vs. OLTP design. Experience supporting large, business-critical databases with high reliability and performance requirements. Familiarity with Azure-based SQL Server deployments (Azure SQL, Managed Instance, or SQL Server on Azure VMs) is a plus. Strong analytical, problem-solving, and communication skills, with the ability to work directly with non-technical stakeholders.
    $110k-157k yearly est. 1d ago
  • Data Engineer

    Odiin

    Data scientist job in San Francisco, CA

    You'll work closely with engineering, analytics, and product teams to ensure data is accurate, accessible, and efficiently processed across the organization. Key Responsibilities: Design, develop, and maintain scalable data pipelines and architectures. Collect, process, and transform data from multiple sources into structured, usable formats. Ensure data quality, reliability, and security across all systems. Work with data analysts and data scientists to optimize data models for analytics and machine learning. Implement ETL (Extract, Transform, Load) processes and automate workflows. Monitor and troubleshoot data infrastructure, ensuring minimal downtime and high performance. Collaborate with cross-functional teams to define data requirements and integrate new data sources. Maintain comprehensive documentation for data systems and processes. Requirements: Proven experience as a Data Engineer, ETL Developer, or similar role. Strong programming skills in Python, SQL, or Scala. Experience with data pipeline tools (Airflow, dbt, Luigi, etc.). Familiarity with big data technologies (Spark, Hadoop, Kafka, etc.). Hands-on experience with cloud data platforms (AWS, GCP, Azure, Snowflake, or Databricks). Understanding of data modeling, warehousing, and schema design. Solid knowledge of database systems (PostgreSQL, MySQL, NoSQL). Strong analytical and problem-solving skills.
    $110k-157k yearly est. 5d ago
  • Snowflake/AWS Data Engineer

    Ostechnical

    Data scientist job in Irvine, CA

    Sr. Data Engineer. Full-time, direct-hire. Hybrid, with work location in Irvine, CA.

    The Senior Data Engineer will help design and build a modern data platform that supports enterprise analytics, integrations, and AI/ML initiatives. This role focuses on developing scalable data pipelines, modernizing the enterprise data warehouse, and enabling self-service analytics across the organization.

    Key Responsibilities
    • Build and maintain scalable data pipelines using Snowflake, dbt, and Fivetran.
    • Design and optimize enterprise data models for performance and scalability.
    • Support data cataloging, lineage, quality, and compliance efforts.
    • Translate business and analytics requirements into reliable data solutions.
    • Use AWS (primarily S3) for storage, integration, and platform reliability.
    • Perform other data engineering tasks as needed.

    Required Qualifications
    • Bachelor's degree in Computer Science, Data Engineering, Information Systems, or a related field.
    • 5+ years of data engineering experience.
    • Hands-on expertise with Snowflake, dbt, and Fivetran.
    • Strong background in data warehousing, dimensional modeling, and SQL.
    • Experience with AWS (S3) and data governance tools such as Alation or Atlan.
    • Proficiency in Python for scripting and automation.
    • Experience with streaming technologies (Kafka, Kinesis, Flink) a plus.
    • Knowledge of data security and compliance best practices.
    • Exposure to AI/ML workflows and modern BI tools like Power BI, Tableau, or Looker.
    • Ability to mentor junior engineers.

    Skills
    • Snowflake
    • dbt
    • Fivetran
    • Data modeling and warehousing
    • AWS
    • Data governance
    • SQL
    • Python
    • Strong communication and cross-functional collaboration
    • Interest in emerging data and AI technologies
    $99k-139k yearly est. 2d ago
  • Senior ML Data Engineer

    Midjourney

    Data scientist job in Fremont, CA

    We're the data team behind Midjourney's image generation models. We handle the dataset side: processing, filtering, scoring, captioning, and all the distributed compute that makes high-quality training data possible. What you'd be working on: Large-scale dataset processing and filtering pipelines Training classifiers for content moderation and quality assessment Models for data quality and aesthetic evaluation Data visualization tools for experimenting on dataset samples Testing/simulating distributed inference pipelines Monitoring dashboards for data quality and pipeline health Performance optimization and infrastructure scaling Occasionally jumping into inference optimization and other cross-team projects Our current stack: PySpark, Slurm, distributed batch processing across hybrid cloud setup. We're pragmatic about tools - if there's something better, we'll switch. We're looking for someone strong in either: Data engineering/ML pipelines at scale, or Cloud/infrastructure with distributed systems experience Don't need exact tech matches - comfort with adjacent technologies and willingness to learn matters more. We work with our own hardware plus GCP and other providers, so adaptability across different environments is valuable. Location: SF office a few times per week (we may make exceptions on location for truly exceptional candidates) The role offers variety, our team members often get pulled into different projects across the company, from dataset work to inference optimization. If you're interested in the intersection of large-scale data processing and cutting-edge generative AI, we'd love to hear from you.
    $110k-156k yearly est. 5d ago
  • Senior Data Engineer

    Akube

    Data scientist job in Glendale, CA

    City: Glendale, CA. Onsite/Hybrid/Remote: Hybrid (3 days a week onsite, Friday remote). Duration: 12 months. Rate Range: Up to $85/hr on W2, depending on experience (no C2C, 1099, or sub-contract). Work Authorization: GC, USC, all valid EADs except OPT, CPT, H1B.

    Must Have:
    • 5+ years Data Engineering
    • Airflow
    • Spark DataFrame API
    • Databricks
    • SQL
    • API integration
    • AWS
    • Python or Java or Scala

    Responsibilities:
    • Maintain, update, and expand Core Data platform pipelines.
    • Build tools for data discovery, lineage, governance, and privacy.
    • Partner with engineering and cross-functional teams to deliver scalable solutions.
    • Use Airflow, Spark, Databricks, Delta Lake, Kubernetes, and AWS to build and optimize workflows (a minimal Airflow DAG sketch follows this listing).
    • Support platform standards, best practices, and documentation.
    • Ensure data quality, reliability, and SLA adherence across datasets.
    • Participate in Agile ceremonies and continuous process improvement.
    • Work with internal customers to understand needs and prioritize enhancements.
    • Maintain detailed documentation that supports governance and quality.

    Qualifications:
    • 5+ years in data engineering with large-scale pipelines.
    • Strong SQL and one major programming language (Python, Java, or Scala).
    • Production experience with Spark and Databricks.
    • Experience ingesting and interacting with API data sources.
    • Hands-on Airflow orchestration experience.
    • Experience developing APIs with GraphQL.
    • Strong AWS knowledge and infrastructure-as-code familiarity.
    • Understanding of OLTP vs OLAP, data modeling, and data warehousing.
    • Strong problem-solving and algorithmic skills.
    • Clear written and verbal communication.
    • Agile/Scrum experience.
    • Bachelor's degree in a STEM field or equivalent industry experience.
    $85 hourly 1d ago
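
    The Airflow orchestration this listing asks for boils down to defining DAGs of dependent tasks. A minimal sketch assuming Airflow 2.x; the DAG name and task bodies are placeholders, not the client's actual pipeline:

        from datetime import datetime

        from airflow import DAG
        from airflow.operators.python import PythonOperator   # Airflow 2.x import path


        def extract(**context):
            # Placeholder: pull a day's worth of records from an upstream API.
            print("extracting", context["ds"])


        def load(**context):
            # Placeholder: write the transformed batch to the warehouse / Delta Lake.
            print("loading", context["ds"])


        with DAG(
            dag_id="core_data_daily_ingest",        # hypothetical pipeline name
            start_date=datetime(2024, 1, 1),
            schedule="@daily",                      # "schedule_interval" on older Airflow versions
            catchup=False,
        ) as dag:
            extract_task = PythonOperator(task_id="extract", python_callable=extract)
            load_task = PythonOperator(task_id="load", python_callable=load)

            extract_task >> load_task               # run extract before load
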
  • Sr Data Platform Engineer

    The Judge Group (4.7 company rating)

    Data scientist job in Elk Grove, CA

    Hybrid role, 3x a week in office in Elk Grove, CA; no remote capabilities. This is a direct hire opportunity. We're seeking a seasoned Senior Data Platform Engineer to design, build, and optimize scalable data solutions that power analytics, reporting, and AI/ML initiatives. This full-time role is hands-on, working with architects, analysts, and business stakeholders to ensure data systems are reliable, secure, and high-performing.

    Responsibilities: Build and maintain robust data pipelines (structured, semi-structured, unstructured). Implement ETL workflows with Spark, Delta Lake, and cloud-native tools. Support big data platforms (Databricks, Snowflake, GCP) in production. Troubleshoot and optimize SQL queries, Spark jobs, and workloads. Ensure governance, security, and compliance across data systems. Integrate workflows into CI/CD pipelines with Git, Jenkins, Terraform. Collaborate cross-functionally to translate business needs into technical solutions.

    Qualifications: 7+ years in data engineering with production pipeline experience. Expertise in the Spark ecosystem, Databricks, Snowflake, GCP. Strong skills in PySpark, Python, SQL. Experience with RAG systems, semantic search, and LLM integration. Familiarity with Kafka, Pub/Sub, vector databases. Proven ability to optimize ETL jobs and troubleshoot production issues. Agile team experience and excellent communication skills. Certifications in Databricks, Snowflake, GCP, or Azure. Exposure to Airflow, BI tools (Power BI, Looker Studio).
    $108k-153k yearly est. 3d ago
  • Senior Snowflake Data Engineer

    Zensar Technologies (4.3 company rating)

    Data scientist job in Santa Clara, CA

    About the job: Why Zensar? We're a bunch of hardworking, fun-loving, people-oriented technology enthusiasts. We love what we do, and we're passionate about helping our clients thrive in an increasingly complex digital world. Zensar is an organization focused on building relationships, with our clients and with each other - and happiness is at the core of everything we do. In fact, we're so into happiness that we've created a Global Happiness Council, and we send out a Happiness Survey to our employees each year. We've learned that employee happiness requires more than a competitive paycheck, and our employee value proposition - grow, own, achieve, learn (GOAL) - lays out the core opportunities we seek to foster for every employee. Teamwork and collaboration are critical to Zensar's mission and success, and our teams work on a diverse and challenging mix of technologies across a broad industry spectrum. These industries include banking and financial services, high-tech and manufacturing, healthcare, insurance, retail, and consumer services. Our employees enjoy flexible work arrangements and a competitive benefits package, including medical, dental, vision, and 401(k), among other benefits. If you are looking for a place to have an immediate impact, to grow and contribute, where we work hard, play hard, and support each other, consider joining team Zensar!

    Zensar is seeking a Senior Snowflake Data Engineer in Santa Clara, CA (work from the office all 5 days). The position is open as a full-time role with excellent benefits and growth opportunities, as well as a contract role.

    Job Description - Key Requirements: Strong hands-on experience in data engineering using Snowflake, with a proven ability to build and optimize large-scale data pipelines. Deep understanding of data architecture principles, including ingestion, transformation, storage, and access control. Solid experience in system design and solution architecture, focusing on scalability, reliability, and maintainability. Expertise in ETL/ELT pipeline design, including data extraction, transformation, validation, and load processes. In-depth knowledge of data modeling techniques (dimensional modeling, star, and snowflake schemas). Skilled in optimizing compute and storage costs across Snowflake environments. Strong proficiency in administration, including database design, schema management, user roles, permissions, and access control policies. Hands-on experience implementing data lineage, quality, and monitoring frameworks. Advanced proficiency in SQL for data processing, transformation, and automation. Experience with reporting and visualization tools such as Power BI and Sigma Computing. Excellent communication and collaboration skills, with the ability to work independently and drive technical initiatives.

    Zensar believes that diversity of backgrounds, thought, experience, and expertise fosters the robust exchange of ideas that enables the highest quality collaboration and work product. Zensar is an equal opportunity employer. All employment decisions shall be made without regard to age, race, creed, color, religion, sex, national origin, ancestry, disability status, veteran status, sexual orientation, gender identity or expression, genetic information, marital status, citizenship status or any other basis as protected by federal, state, or local law. Zensar is committed to providing veteran employment opportunities to our service men and women.
Zensar is committed to providing equal employment opportunities for persons with disabilities or religious observances, including reasonable accommodation when needed. Accommodations made to facilitate the recruiting process are not a guarantee of future or continued accommodations once hired. Zensar does not facilitate/sponsor any work authorization for this position. Candidates who are currently employed by a client or vendor of Zensar may be ineligible for consideration. Zensar values your privacy. We'll use your data in accordance with our privacy statement located at: *********************************
    $109k-150k yearly est. 2d ago

Learn more about data scientist jobs

How much does a data scientist earn in Apple Valley, CA?

The average data scientist in Apple Valley, CA earns between $83,000 and $168,000 annually. This compares to the national average data scientist range of $75,000 to $148,000.

Average data scientist salary in Apple Valley, CA

$118,000