
Data engineer jobs in Berkeley, CA

- 9,802 jobs
  • Software Development Engineer, AI/ML, AWS Neuron, Model Inference

    Annapurna Labs (U.S.) Inc. (4.6 company rating)

    Data engineer job in Cupertino, CA

    The Annapurna Labs team at Amazon Web Services (AWS) builds AWS Neuron, the software development kit used to accelerate deep learning and GenAI workloads on Amazon's custom machine learning accelerators, Inferentia and Trainium. This comprehensive toolkit includes an ML compiler, runtime, and application framework that seamlessly integrates with popular ML frameworks like PyTorch and JAX, enabling unparalleled ML inference and training performance. The Inference Enablement and Acceleration team is at the forefront of running a wide range of models, supporting novel architectures, and maximizing their performance on AWS's custom ML accelerators. Working across the stack, from PyTorch down to the hardware-software boundary, our engineers build systematic infrastructure, innovate new methods, and create high-performance kernels for ML functions, ensuring every compute unit is fine-tuned for optimal performance on our customers' demanding workloads. We combine deep hardware knowledge with ML expertise to push the boundaries of what's possible in AI acceleration. As part of the broader Neuron organization, our team works across multiple technology layers, from frameworks and kernels to the compiler, runtime, and collectives. We not only optimize current performance but also contribute to future architecture designs, working closely with customers to enable their models and ensure optimal performance. This role offers a unique opportunity to work at the intersection of machine learning, high-performance computing, and distributed architectures, where you'll help shape the future of AI acceleration technology. You will architect and implement business-critical features and mentor a brilliant team of experienced engineers. We operate in spaces that are very large, yet our teams remain small and agile. There is no blueprint. We're inventing. We're experimenting. It is a unique learning culture. The team works closely with customers on their model enablement, providing direct support and optimization expertise to ensure their machine learning workloads achieve optimal performance on AWS ML accelerators. The team collaborates with open-source ecosystems to provide seamless integration and bring peak performance at scale for customers and developers. This role is responsible for development, enablement, and performance tuning of a wide variety of LLM model families, including massive-scale large language models like the Llama family, DeepSeek, and beyond. The Inference Enablement and Acceleration team works side by side with compiler and runtime engineers to create, build, and tune distributed inference solutions with Trainium and Inferentia. Experience optimizing inference performance for both latency and throughput on such large models across the stack, from system-level optimizations up to PyTorch or JAX, is a must-have. You can learn more about Neuron in the AWS Neuron documentation. Key job responsibilities: This role will help lead the effort to build distributed inference support for PyTorch in the Neuron SDK.
    This role will tune these models to ensure the highest performance and maximize their efficiency running on customers' AWS Trainium and Inferentia silicon and servers. Strong software development skills in Python, system-level programming, and ML knowledge are all critical to this role. Our engineers collaborate across compiler, runtime, framework, and hardware teams to optimize machine learning workloads for our global customer base. Working at the intersection of software, hardware, and machine learning systems, you'll bring expertise in low-level optimization, system architecture, and ML model acceleration. In this role, you will: * Design, develop, and optimize machine learning models and frameworks for deployment on custom ML hardware accelerators. * Participate in all stages of the ML system development lifecycle, including distributed-computing-based architecture design, implementation, performance profiling, hardware-specific optimizations, testing, and production deployment. * Build infrastructure to systematically analyze and onboard multiple models with diverse architectures. * Design and implement high-performance kernels and features for ML operations, leveraging the Neuron architecture and programming models. * Analyze and optimize system-level performance across multiple generations of Neuron hardware. * Conduct detailed performance analysis using profiling tools to identify and resolve bottlenecks. * Implement optimizations such as fusion, sharding, tiling, and scheduling. * Conduct comprehensive testing, including unit and end-to-end model testing, with continuous deployment and releases through pipelines. * Work directly with customers to enable and optimize their ML models on AWS accelerators. * Collaborate across teams to develop innovative optimization techniques. A day in the life: You will collaborate with a cross-functional team of applied scientists, system engineers, and product managers to deliver state-of-the-art inference capabilities for Generative AI applications. Your work will involve debugging performance issues, optimizing memory usage, and shaping the future of Neuron's inference stack across Amazon and the open-source community. As you design and code solutions to help our team drive efficiencies in software architecture, you'll create metrics, implement automation and other improvements, and resolve the root cause of software defects. You will also build high-impact solutions to deliver to our large customer base, participate in design discussions and code reviews, and communicate with internal and external stakeholders. You will work cross-functionally to help drive business decisions with your technical input. You will work in a startup-like development environment, where you're always working on the most important initiative. About the team: The Inference Enablement and Acceleration team fosters a builder's culture where experimentation is encouraged and impact is measurable. We emphasize collaboration, technical ownership, and continuous learning. Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we're building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help you develop your engineering expertise, so you feel empowered to take on more complex tasks in the future.
    Join us to solve some of the most interesting and impactful infrastructure challenges in AI/ML today. BASIC QUALIFICATIONS: - Bachelor's degree in computer science or equivalent - 5+ years of non-internship professional software development experience - 5+ years of non-internship design or architecture (design patterns, reliability, and scaling) experience with new and existing systems - Fundamentals of machine learning and LLMs, including their architectures and training and inference lifecycles, along with work experience in optimizations that improve model execution - Software development experience in C++ or Python (experience in at least one language is required) - Strong understanding of system performance, memory management, and parallel computing principles - Proficiency in debugging, profiling, and implementing best software engineering practices in large-scale systems. PREFERRED QUALIFICATIONS: - Familiarity with PyTorch, JIT compilation, and AOT tracing - Familiarity with CUDA kernels or equivalent ML or low-level kernels - Experience developing performant kernels, such as with CUTLASS or FlashInfer - Familiarity with tile-level programming semantics similar to Triton's - Experience with online/offline inference serving with vLLM, SGLang, TensorRT, or similar platforms in production environments - Deep understanding of computer architecture and operating-system-level software, and working knowledge of parallel computing. Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status. Los Angeles County applicants: Job duties for this position include: work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness and professionalism, and safeguard business operations and the Company's reputation. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records. Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit Amazon's accommodations page for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner. Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $129,300/year in our lowest geographic market up to $223,600/year in our highest geographic market. Pay is based on a number of factors, including market location, and may vary depending on job-related knowledge, skills, and experience. Amazon is a total compensation company.
    Dependent on the position offered, equity, sign-on payments, and other forms of compensation may be provided as part of a total compensation package, in addition to a full range of medical, financial, and/or other benefits. For more information, please visit Amazon's employee benefits page. This position will remain posted until filled. Applicants should apply via our internal or external career site.
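    As a concrete illustration of the inference-enablement work described above, here is a minimal sketch of compiling a PyTorch module for Trainium/Inferentia with the Neuron SDK's tracing API; the model, shapes, and file name are placeholders, not part of the posting.

```python
import torch
import torch_neuronx  # AWS Neuron SDK integration for PyTorch (assumed installed)

# Placeholder model standing in for a real LLM component.
model = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU()).eval()
example = torch.rand(1, 128)  # example input fixing the traced shapes

# Ahead-of-time compile the module for a Neuron device; the result behaves
# like a TorchScript module and can be saved and reloaded for serving.
neuron_model = torch_neuronx.trace(model, example)
torch.jit.save(neuron_model, "model_neuron.pt")
```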
    $129.3k-223.6k yearly 23h ago
  • Staff Data Scientist

    Quantix Search

    Data engineer job in San Francisco, CA

    Staff Data Scientist | San Francisco | $250K-$300K + Equity We're partnering with one of the fastest-growing AI companies in the world to hire a Staff Data Scientist. Backed by over $230M from top-tier investors and already valued at over $1B, they've secured customers that include some of the most recognizable names in tech. Their AI platform powers millions of daily interactions and is quickly becoming the enterprise standard for conversational AI. In this role, you'll bring rigorous analytics and experimentation leadership that directly shapes product strategy and company performance. What you'll do: Drive deep-dive analyses on user behavior, product performance, and growth drivers Design and interpret A/B tests to measure product impact at scale Build scalable data models, pipelines, and dashboards for company-wide use Partner with Product and Engineering to embed experimentation best practices Evaluate ML models, ensuring business relevance, performance, and trade-off clarity What we're looking for: 5+ years in data science or product analytics at scale (consumer or marketplace preferred) Advanced SQL and Python skills, with strong foundations in statistics and experimental design Proven record of designing, running, and analyzing large-scale experiments Ability to analyze and reason about ML models (classification, recommendation, LLMs) Strong communicator with a track record of influencing cross-functional teams If you're excited by the sound of this challenge, apply today and we'll be in touch.
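    For a sense of the experimentation work this role involves, a minimal sketch of analyzing a two-arm A/B test with a two-proportion z-test follows; the counts are invented for illustration.

```python
from statsmodels.stats.proportion import proportions_ztest

conversions = [420, 505]      # conversions in control and treatment (made up)
exposures = [10_000, 10_000]  # users assigned to each arm

z, p = proportions_ztest(count=conversions, nobs=exposures)
print(f"z = {z:.2f}, p = {p:.4f}")  # small p suggests a real treatment effect
```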
    $250k-300k yearly 23h ago
  • Data Scientist

    Skale (3.7 company rating)

    Data engineer job in San Francisco, CA

    We're working with a Series A health tech start-up pioneering a revolutionary approach to healthcare AI, developing neurosymbolic systems that combine statistical learning with structured medical knowledge. Their technology is being adopted by leading health systems and insurers to enhance patient outcomes through advanced predictive analytics. We're seeking Machine Learning Engineers who excel at the intersection of data science, modeling, and software engineering. You'll design and implement models that extract insights from longitudinal healthcare data, balancing analytical rigor, interpretability, and scalability. This role offers a unique opportunity to tackle foundational modeling challenges in healthcare, where your contributions will directly influence clinical, actuarial, and policy decisions. Key Responsibilities Develop predictive models to forecast disease progression, healthcare utilization, and costs using temporal clinical data (claims, EHR, laboratory results, pharmacy records) Design interpretable and explainable ML solutions that earn the trust of clinicians, actuaries, and healthcare decision-makers Research and prototype innovative approaches leveraging both classical and modern machine learning techniques Build robust, scalable ML pipelines for training, validation, and deployment in distributed computing environments Collaborate cross-functionally with data engineers, clinicians, and product teams to ensure models address real-world healthcare needs Communicate findings and methodologies effectively through visualizations, documentation, and technical presentations Required Qualifications Strong foundation in statistical modeling, machine learning, or data science, with preference for experience in temporal or longitudinal data analysis Proficiency in Python and ML frameworks (PyTorch, JAX, NumPyro, PyMC, etc.) Proven track record of transitioning models from research prototypes to production systems Experience with probabilistic methods, survival analysis, or Bayesian inference (highly valued) Bonus Qualifications Experience working with clinical data and healthcare terminologies (ICD, CPT, SNOMED CT, LOINC) Background in actuarial modeling, claims forecasting, or risk adjustment methodologies
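    Since the posting highlights survival analysis on longitudinal data, here is a hedged sketch using the lifelines library on synthetic time-to-event data; the distributions and censoring rate are invented.

```python
import numpy as np
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(0)
durations = rng.exponential(scale=365, size=500)  # days until event (synthetic)
observed = rng.random(500) < 0.7                  # roughly 30% of subjects censored

kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=observed)
print(kmf.median_survival_time_)  # median time-to-event under censoring
```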
    $123k-171k yearly est. 3d ago
  • Data Scientist V

    Creospan Inc.

    Data engineer job in Menlo Park, CA

    Creospan is a growing tech collective of makers, shakers, and problem solvers, offering solutions today that will propel businesses into a better tomorrow. “Tomorrow's ideas, built today!” In addition to being able to work alongside equally brilliant and motivated developers, our consultants appreciate the opportunity to learn and apply new skills and methodologies to different clients and industries. NOTE: No C2C/third party; W2 candidates only; must be able to work in the US without sponsorship now or in the future. Summary: The main function of the Data Scientist is to produce innovative solutions driven by exploratory data analysis of complex and high-dimensional datasets. Job Responsibilities: • Apply knowledge of statistics, machine learning, programming, data modeling, simulation, and advanced mathematics to recognize patterns, identify opportunities, pose business questions, and make valuable discoveries leading to prototype development and product improvement. • Use a flexible, analytical approach to design, develop, and evaluate predictive models and advanced algorithms that lead to optimal value extraction from the data. • Generate and test hypotheses, and analyze and interpret the results of product experiments. • Work with product engineers to translate prototypes into new products, services, and features, and provide guidelines for large-scale implementation. • Provide Business Intelligence (BI) and data visualization support, which includes, but is not limited to, support for the online customer service dashboards and other ad-hoc requests requiring data analysis and visual support. Skills: • Experience with programming languages such as Python and/or R, big data tools such as Hadoop, or data visualization tools such as Tableau. • The ability to communicate effectively in writing, including conveying complex information and promoting in-depth engagement. • Experience working with large datasets. Education/Experience: • Master of Science degree in computer science or in a relevant field.
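    To ground the model-evaluation responsibilities above, a generic sketch of fitting and cross-validating a predictive model follows; the dataset and estimator are placeholders, not the client's actual stack.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic tabular data standing in for real product data.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

scores = cross_val_score(GradientBoostingClassifier(), X, y,
                         cv=5, scoring="roc_auc")
print(f"AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```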
    $107k-155k yearly est. 4d ago
  • AI Data Engineer

    Hartleyco

    Data engineer job in Fremont, CA

    Member of Technical Staff - AI Data Engineer San Francisco (In-Office) $150K to $225K + Equity A high-growth, AI-native startup coming out of stealth is hiring AI Data Engineers to build the systems that power production-grade AI. The company has recently signed a Series A term sheet and is scaling rapidly. This role is central to unblocking current bottlenecks across data engineering, context modeling, and agent performance. Responsibilities: • Build distributed, reliable data pipelines using Airflow, Temporal, and n8n • Model SQL, vector, and NoSQL databases (Postgres, Qdrant, etc.) • Build API and function-based services in Python • Develop custom automations (Playwright, Stagehand, Zapier) • Work with AI researchers to define and expose context as services • Identify gaps in data quality and drive changes to upstream processes • Ship fast, iterate, and own outcomes end-to-end Required Experience: • Strong background in data engineering • Hands-on experience working with LLMs or LLM-powered applications • Data modeling skills across SQL and vector databases • Experience building distributed systems • Experience with Airflow, Temporal, n8n, or similar workflow engines • Python experience (API/services) • Startup mindset and bias toward rapid execution Nice To Have: • Experience with stream processing (Flink) • dbt or Clickhouse experience • CDC pipelines • Experience with context construction, RAG, or agent workflows • Analytical tooling (Posthog) What You Can Expect: • High-intensity, in-office environment • Fast decision-making and rapid shipping cycles • Real ownership over architecture and outcomes • Opportunity to work on AI systems operating at meaningful scale • Competitive compensation package • Meals provided plus full medical, dental, and vision benefits If this sounds like you, please apply now.
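    As a sketch of the pipeline work listed above, here is a minimal Airflow DAG in the TaskFlow style; the task bodies, schedule, and names are invented, and the Qdrant/Postgres load step is stubbed out.

```python
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule="@hourly", start_date=datetime(2024, 1, 1), catchup=False)
def context_ingestion():
    @task
    def extract() -> list[dict]:
        # Stub: in practice, pull records from an upstream source or API.
        return [{"doc_id": 1, "text": "example record"}]

    @task
    def load(records: list[dict]) -> int:
        # Stub: in practice, embed the text and upsert into Qdrant/Postgres.
        return len(records)

    load(extract())

context_ingestion()
```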
    $150k-225k yearly 2d ago
  • Data Engineer

    Midjourney

    Data engineer job in Sonoma, CA

    Midjourney is a research lab exploring new mediums to expand the imaginative powers of the human species. We are a small, self-funded team focused on design, human infrastructure, and AI. We have no investors, no big company controlling us, and no advertisers. We are 100% supported by our amazing community. Our tools are already used by millions of people to dream, to explore, and to create. But this is just the start. We think the story of the 2020s is about building the tools that will remake the world for the next century. We're making those tools, to expand what it means to be human. Core Responsibilities: Design and maintain data pipelines to consolidate information across multiple sources (subscription platforms, payment systems, infrastructure and usage monitoring, and financial systems) into a unified analytics environment Build and manage interactive dashboards and self-service BI tools that enable leadership to track key business metrics including revenue performance, infrastructure costs, customer retention, and operational efficiency Serve as technical owner of our financial planning platform (Pigment or similar), leading implementation and build-out of models, data connections, and workflows in partnership with Finance leadership to translate business requirements into functional system architecture Develop automated data quality checks and cleaning processes to ensure accuracy and consistency across financial and operational datasets Partner with Finance, Product and Operations teams to translate business questions into analytical frameworks, including cohort analysis, cost modeling, and performance trending Create and maintain documentation for data models, ETL processes, dashboard logic, and system workflows to ensure knowledge continuity Support strategic planning initiatives by building financial models, scenario analyses, and data-driven recommendations for resource allocation and growth investments Required Qualifications: 3-5+ years experience in data engineering, analytics engineering, or similar role with demonstrated ability to work with large-scale datasets Strong SQL skills and experience with modern data warehousing solutions (BigQuery, Snowflake, Redshift, etc.) Proficiency in at least one programming language (Python, R) for data manipulation and analysis Experience with BI/visualization tools (Looker, Tableau, Power BI, or similar) Hands-on experience administering enterprise financial systems (NetSuite, SAP, Oracle, or similar ERP platforms) Experience working with Stripe Billing or similar subscription management platforms, including data extraction and revenue reporting Ability to communicate technical concepts clearly to non-technical stakeholders
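    As an illustration of the cohort analysis this role supports, a small pandas sketch follows; the events table is synthetic and the monthly granularity is an assumption.

```python
import pandas as pd

events = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "month": pd.to_datetime(["2024-01-01", "2024-02-01", "2024-01-01",
                             "2024-03-01", "2024-02-01"]),
})

# Cohort = month of first activity; age = months elapsed since that cohort month.
events["cohort"] = events.groupby("user_id")["month"].transform("min").dt.to_period("M")
events["age"] = (events["month"].dt.to_period("M") - events["cohort"]).apply(lambda d: d.n)

# Unique active users per cohort per month-of-life.
retention = events.pivot_table(index="cohort", columns="age",
                               values="user_id", aggfunc="nunique")
print(retention)
```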
    $110k-157k yearly est. 2d ago
  • Senior Data Engineer - Spark, Airflow

    Sigmaways Inc.

    Data engineer job in Fremont, CA

    We are seeking an experienced Data Engineer to design and optimize scalable data pipelines that drive our global data and analytics initiatives. In this role, you will leverage technologies such as Apache Spark, Airflow, and Python to build high-performance data processing systems and ensure data quality, reliability, and lineage across Mastercard's data ecosystem. The ideal candidate combines strong technical expertise with hands-on experience in distributed data systems, workflow automation, and performance tuning to deliver impactful, data-driven solutions at enterprise scale. Responsibilities: Design and optimize Spark-based ETL pipelines for large-scale data processing. Build and manage Airflow DAGs for scheduling, orchestration, and checkpointing. Implement partitioning and shuffling strategies to improve Spark performance. Ensure data lineage, quality, and traceability across systems. Develop Python scripts for data transformation, aggregation, and validation. Execute and tune Spark jobs using spark-submit. Perform DataFrame joins and aggregations for analytical insights. Automate multi-step processes through shell scripting and variable management. Collaborate with data, DevOps, and analytics teams to deliver scalable data solutions. Qualifications: Bachelor's degree in Computer Science, Data Engineering, or a related field (or equivalent experience). At least 7 years of experience in data engineering or big data development. Strong expertise in Apache Spark architecture, optimization, and job configuration. Proven experience with Airflow DAGs, including authoring, scheduling, checkpointing, and monitoring. Skilled in data shuffling, partitioning strategies, and performance tuning in distributed systems. Expertise in Python programming, including data structures and algorithmic problem-solving. Hands-on with Spark DataFrames and PySpark transformations using joins, aggregations, and filters. Proficient in shell scripting, including managing and passing variables between scripts. Experienced with spark-submit for deployment and tuning. Solid understanding of ETL design, workflow automation, and distributed data systems. Excellent debugging and problem-solving skills in large-scale environments. Experience with AWS Glue, EMR, Databricks, or similar Spark platforms. Knowledge of data lineage and data quality frameworks like Apache Atlas. Familiarity with CI/CD pipelines, Docker/Kubernetes, and data governance tools.
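    To make the Spark responsibilities above concrete, here is a minimal PySpark sketch of a join plus aggregation with explicit partitioning; the S3 paths and column names are placeholders. A job like this would be launched with spark-submit, e.g. `spark-submit --executor-memory 4g pipeline.py`.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

orders = spark.read.parquet("s3://bucket/orders/")        # placeholder path
customers = spark.read.parquet("s3://bucket/customers/")  # placeholder path

daily = (orders.join(customers, "customer_id")
               .repartition("region")  # partitioning strategy to tame shuffle skew
               .groupBy("region", "order_date")
               .agg(F.sum("amount").alias("revenue"),
                    F.countDistinct("customer_id").alias("buyers")))

daily.write.mode("overwrite").partitionBy("region").parquet("s3://bucket/daily/")
```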
    $110k-156k yearly est. 3d ago
  • Imaging Data Engineer/Architect

    Intuitive.Ai

    Data engineer job in San Francisco, CA

    About us: Intuitive is an innovation-led engineering company delivering business outcomes for hundreds of enterprises globally. With the reputation of being a Tiger Team & a Trusted Partner of enterprise technology leaders, we help solve the most complex Digital Transformation challenges across the following Intuitive Superpowers: Modernization & Migration (Application & Database Modernization; Platform Engineering (IaC/EaC, DevSecOps & SRE); Cloud Native Engineering, Migration to Cloud, VMware Exit; FinOps); Data & AI/ML (Data - Cloud Native / DataBricks / Snowflake; Machine Learning, AI/GenAI); Cybersecurity (Infrastructure Security; Application Security; Data Security; AI/Model Security); SDx & Digital Workspace (M365, G-suite; SDDC, SD-WAN, SDN, NetSec, Wireless/Mobility; Email, Collaboration, Directory Services, Shared Files Services). Intuitive Services: Professional and Advisory Services; Elastic Engineering Services; Managed Services; Talent Acquisition & Platform Resell Services. About the job: Title: Imaging Data Engineer/Architect. Start Date: Immediate. # of Positions: 1. Position Type: Contract/Full-Time. Location: San Francisco, CA. Notes: We are looking for an imaging data engineer/architect who understands radiology and digital pathology and their related clinical data and metadata, with hands-on experience in these technologies and good knowledge of biomedical imaging and data pipelines overall. About the Role: We are seeking a highly skilled Imaging Data Engineer/Architect to join our San Francisco team as a Subject Matter Expert (SME) in radiology and digital pathology. This role will design and manage imaging data pipelines, ensuring seamless integration of clinical data and metadata to support advanced diagnostic and research applications. The ideal candidate will have deep expertise in medical imaging standards, cloud-based data architectures, and healthcare interoperability, contributing to innovative solutions that enhance patient outcomes. Responsibilities: Design and implement scalable data architectures for radiology and digital pathology imaging data, including DICOM, HL7, and FHIR standards. Develop and optimize data pipelines to process and store large-scale imaging datasets (e.g., MRI, CT, histopathology slides) and associated metadata. Collaborate with clinical teams to understand radiology and pathology workflows, ensuring data solutions align with clinical needs. Ensure data integrity, security, and compliance with healthcare regulations (e.g., HIPAA, GDPR). Integrate imaging data with AI/ML models for diagnostic and predictive analytics, working closely with data scientists. Build and maintain metadata schemas to support data discoverability and interoperability across systems. Provide technical expertise to cross-functional teams, including product managers and software engineers, to drive imaging data strategy. Conduct performance tuning and optimization of imaging data storage and retrieval systems in cloud environments (e.g., AWS, Google Cloud, Azure). Document data architectures and processes, ensuring knowledge transfer to internal teams and external partners. Stay updated on emerging imaging technologies and standards, proposing innovative solutions to enhance data workflows. Qualifications: Education: Bachelor's degree in computer science, Biomedical Engineering, or a related field (master's preferred). Experience: 5+ years in data engineering or architecture, with at least 3 years focused on medical imaging (radiology and/or digital pathology).
Proven experience with DICOM, HL7, FHIR, and imaging metadata standards (e.g., SNOMED, LOINC). Hands-on experience with cloud platforms (AWS, Google Cloud, or Azure) for imaging data storage and processing. Technical Skills: Proficiency in programming languages (e.g., Python, Java, SQL) for data pipeline development. Expertise in ETL processes, data warehousing, and database management (e.g., Snowflake, BigQuery, PostgreSQL). Familiarity with AI/ML integration for imaging data analytics. Knowledge of containerization (e.g., Docker, Kubernetes) for deploying data solutions. Domain Knowledge: Deep understanding of radiology and digital pathology workflows, including PACS and LIS systems. Familiarity with clinical data integration and healthcare interoperability standards. Soft Skills: Strong analytical and problem-solving skills to address complex data challenges. Excellent communication skills to collaborate with clinical and technical stakeholders. Ability to work independently in a fast-paced environment, with a proactive approach to innovation. Certifications (preferred): AWS Certified Solutions Architect, Google Cloud Professional Data Engineer, or equivalent. Certifications in medical imaging (e.g., CIIP - Certified Imaging Informatics Professional).
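    A minimal sketch of the metadata-extraction step such imaging pipelines typically start with, using the pydicom library; the file path is a placeholder and the chosen tags are illustrative.

```python
import pydicom

ds = pydicom.dcmread("study/slice_001.dcm")  # placeholder DICOM file

# Pull a few standard DICOM attributes into a flat record for indexing.
record = {
    "patient_id": ds.PatientID,
    "modality": ds.Modality,          # e.g. "MR", "CT"
    "study_uid": ds.StudyInstanceUID,
    "rows": int(ds.Rows),
    "cols": int(ds.Columns),
}
print(record)
```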
    $110k-157k yearly est. 2d ago
  • Data Engineer (SQL / SQL Server Focus)

    Franklin Fitch

    Data engineer job in San Francisco, CA

    Data Engineer (SQL / SQL Server Focus) (Kind note: we cannot provide sponsorship for this role.) A leading professional services organization is seeking an experienced Data Engineer to join its team. This role supports enterprise-wide systems, analytics, and reporting initiatives, with a strong emphasis on SQL Server-based data platforms. Key Responsibilities Design, develop, and optimize SQL Server-centric ETL/ELT pipelines to ensure reliable, accurate, and timely data movement across enterprise systems. Develop and maintain SQL Server data models, schemas, and tables to support financial analytics and reporting. Write, optimize, and maintain complex T-SQL queries, stored procedures, functions, and views with a strong focus on performance and scalability. Build and support SQL Server Reporting Services (SSRS) solutions, translating business requirements into clear, actionable reports. Partner with finance and business stakeholders to define KPIs and ensure consistent, trusted reporting outputs. Monitor, troubleshoot, and tune SQL Server workloads, including query performance, indexing strategies, and execution plans. Ensure adherence to data governance, security, and access control standards within SQL Server environments. Support documentation, version control, and change management for database and reporting solutions. Collaborate closely with business analysts, data engineers, and IT teams to deliver end-to-end data solutions. Mentor junior team members and contribute to database development standards and best practices. Act as a key contributor to enterprise data architecture and reporting strategy, particularly around SQL Server platforms. Required Education & Experience Bachelor's or Master's degree in Computer Science, Information Systems, Data Engineering, or a related field. 8+ years of hands-on experience working with SQL Server in enterprise data warehouse or financial reporting environments. Advanced expertise in T-SQL, including query optimization, index design and maintenance, and stored procedures and performance tuning. Strong experience with SQL Server Integration Services (SSIS) and SSRS. Solid understanding of data warehousing concepts, including star and snowflake schemas, and OLAP vs. OLTP design. Experience supporting large, business-critical databases with high reliability and performance requirements. Familiarity with Azure-based SQL Server deployments (Azure SQL, Managed Instance, or SQL Server on Azure VMs) is a plus. Strong analytical, problem-solving, and communication skills, with the ability to work directly with non-technical stakeholders.
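    As an illustration of the T-SQL work above, here is a hedged sketch of running a parameterized query from Python via pyodbc; the DSN, schema, and column names are invented, not the firm's actual environment.

```python
import pyodbc

# Placeholder DSN; in practice this points at the SQL Server warehouse.
conn = pyodbc.connect("DSN=finance_dw;Trusted_Connection=yes;")

sql = """
SELECT d.fiscal_month, SUM(f.amount) AS total
FROM dbo.fact_billing AS f
JOIN dbo.dim_date AS d ON d.date_key = f.date_key
WHERE d.fiscal_year = ?
GROUP BY d.fiscal_month
ORDER BY d.fiscal_month;
"""

for row in conn.cursor().execute(sql, 2024):  # parameter binding, not string concat
    print(row.fiscal_month, row.total)
```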
    $110k-157k yearly est. 1d ago
  • Data Engineer - Scientific Data Ingestion

    Mithrl

    Data engineer job in San Francisco, CA

    We envision a world where novel drugs and therapies reach patients in months, not years, accelerating breakthroughs that save lives. Mithrl is building the world's first commercially available AI Co-Scientist-a discovery engine that empowers life science teams to go from messy biological data to novel insights in minutes. Scientists ask questions in natural language, and Mithrl answers with real analysis, novel targets, and patent-ready reports. No coding. No waiting. No bioinformatics bottlenecks. We are the fastest growing tech-bio startup in the Bay Area with over 12X YoY revenue growth. Our platform is already being used by teams at some of the largest biotechs and big pharma across three continents to accelerate and uncover breakthroughs-from target discovery to mechanism of action. WHAT YOU WILL DO Build and own an AI-powered ingestion & normalization pipeline to import data from a wide variety of sources - unprocessed Excel/CSV uploads, lab and instrument exports, as well as processed data from internal pipelines. Develop robust schema mapping, coercion, and conversion logic (think: units normalization, metadata standardization, variable-name harmonization, vendor-instrument quirks, plate-reader formats, reference-genome or annotation updates, batch-effect correction, etc.). Use LLM-driven and classical data-engineering tools to structure “semi-structured” or messy tabular data - extracting metadata, inferring column roles/types, cleaning free-text headers, fixing inconsistencies, and preparing final clean datasets. Ensure all transformations that should only happen once (normalization, coercion, batch-correction) execute during ingestion - so downstream analytics / the AI “Co-Scientist” always works with clean, canonical data. Build validation, verification, and quality-control layers to catch ambiguous, inconsistent, or corrupt data before it enters the platform. Collaborate with product teams, data science / bioinformatics colleagues, and infrastructure engineers to define and enforce data standards, and ensure pipeline outputs integrate cleanly into downstream analysis and storage systems. WHAT YOU BRING Must-have 5+ years of experience in data engineering / data wrangling with real-world tabular or semi-structured data. Strong fluency in Python, and data processing tools (Pandas, Polars, PyArrow, or similar). Excellent experience dealing with messy Excel / CSV / spreadsheet-style data - inconsistent headers, multiple sheets, mixed formats, free-text fields - and normalizing it into clean structures. Comfort designing and maintaining robust ETL/ELT pipelines, ideally for scientific or lab-derived data. Ability to combine classical data engineering with LLM-powered data normalization / metadata extraction / cleaning. Strong desire and ability to own the ingestion & normalization layer end-to-end - from raw upload → final clean dataset - with an eye for maintainability, reproducibility, and scalability. Good communication skills; able to collaborate across teams (product, bioinformatics, infra) and translate real-world messy data problems into robust engineering solutions. Nice-to-have Familiarity with scientific data types and “modalities” (e.g. plate-readers, genomics metadata, time-series, batch-info, instrumentation outputs). Experience with workflow orchestration tools (e.g. Nextflow, Prefect, Airflow, Dagster), or building pipeline abstractions. 
Experience with cloud infrastructure and data storage (AWS S3, data lakes/warehouses, database schemas) to support multi-tenant ingestion. Past exposure to LLM-based data transformation or cleansing agents - building or integrating tools that clean or structure messy data automatically. Any background in computational biology / lab-data / bioinformatics is a bonus - though not required. WHAT YOU WILL LOVE AT MITHRL Mission-driven impact: you'll be the gatekeeper of data quality - ensuring that all scientific data entering Mithrl becomes clean, consistent, and analysis-ready. You'll have outsized influence over the reliability and trustworthiness of our entire data + AI stack. High ownership & autonomy: this role is yours to shape. You decide how ingestion works, define the standards, build the pipelines. You'll work closely with our product, data science, and infrastructure teams - shaping how data is ingested, stored, and exposed to end users or AI agents. Team: Join a tight-knit, talent-dense team of engineers, scientists, and builders Culture: We value consistency, clarity, and hard work. We solve hard problems through focused daily execution Speed: We ship fast (2x/week) and improve continuously based on real user feedback Location: Beautiful SF office with a high-energy, in-person culture Benefits: Comprehensive PPO health coverage through Anthem (medical, dental, and vision) + 401(k) with top-tier plans
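    A minimal sketch of the header-harmonization and unit-coercion step described above; the column names, unit rule, and file paths are invented for illustration.

```python
import pandas as pd

# Invented mapping from messy vendor headers to canonical names.
HEADER_MAP = {"Conc. (ug/ml)": "concentration_ng_ml", "Sample ID": "sample_id"}

df = pd.read_csv("upload.csv")                 # placeholder raw upload
df.columns = [c.strip() for c in df.columns]   # clean stray whitespace
df = df.rename(columns=HEADER_MAP)

# Coerce to numeric and convert ug/ml -> ng/ml once, at ingestion time.
df["concentration_ng_ml"] = (
    pd.to_numeric(df["concentration_ng_ml"], errors="coerce") * 1000
)

df = df.dropna(subset=["sample_id"])           # QC: reject rows without IDs
df.to_parquet("clean/upload.parquet")          # canonical downstream dataset
```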
    $110k-157k yearly est. 4d ago
  • Bigdata Engineer

    Net2Source (N2S)

    Data engineer job in Mountain View, CA

    Net2Source is a Global Workforce Solutions Company headquartered in NJ, USA, with branch offices in the Asia Pacific region. We are one of the fastest-growing IT consulting companies in the USA, and we are hiring a "Bigdata Engineer" for one of our clients. We offer a wide gamut of consulting solutions customized to our 450+ clients ranging from Fortune 500/1000 to start-ups across various verticals like Technology, Financial Services, Healthcare, Life Sciences, Oil & Gas, Energy, Retail, Telecom, Utilities, Manufacturing, the Internet, and Engineering. Position: Bigdata Engineer. Location: MTV, CA (Onsite) - Locals Only. Type: Contract. Exp Level: 10+ Years. Required Skills: Min of 7+ years working with Apache Flink and Apache Spark. 5+ years' experience with Java. Strong expertise in Python. Expertise developing new pipelines. Adept at supporting and enhancing existing pipelines. Strong experience with the AWS stack. Why Work With Us? We believe in more than just jobs - we build careers. At Net2Source, we champion leadership at all levels, celebrate diverse perspectives, and empower you to make an impact. Think work-life balance, professional growth, and a collaborative culture where your ideas matter. Our Commitment to Inclusion & Equity: Net2Source is an equal opportunity employer, dedicated to fostering a workplace where diverse talents and perspectives are valued. We make all employment decisions based on merit, ensuring a culture of respect, fairness, and opportunity for all, regardless of age, gender, ethnicity, disability, or other protected characteristics. Awards & Recognition: America's Most Honored Businesses (Top 10%); Fastest-Growing Staffing Firm by Staffing Industry Analysts; INC 5000 List for Eight Consecutive Years; Top 100 by Dallas Business Journal; Spirit of Alliance Award by Agile1. Maddhuker Singh, Sr. Account & Delivery Manager.
    $110k-157k yearly est. 3d ago
  • Snowflake Data Engineer (DBT SQL)

    Maganti It Resources, LLC (3.9 company rating)

    Data engineer job in San Jose, CA

    Job Description - Snowflake Data Engineer (DBT SQL) Duration: 6 months Key Responsibilities • Design, develop, and optimize data pipelines using Snowflake and DBT SQL. • Implement and manage data warehousing concepts, metadata management, and data modeling. • Work with data lakes, multi-dimensional models, and data dictionaries. • Utilize Snowflake features such as Time Travel and Zero-Copy Cloning. • Perform query performance tuning and cost optimization in cloud environments. • Administer Snowflake architecture, warehousing, and processing. • Develop and maintain PL/SQL Snowflake solutions. • Apply design patterns for scalable and maintainable data solutions. • Collaborate with cross-functional teams and tech leads across multiple tracks. • Provide technical and functional guidance to team members. Required Skills & Experience • Hands-on Snowflake development experience (mandatory). • Strong proficiency in SQL and DBT SQL. • Knowledge of data warehousing concepts, metadata management, and data modeling. • Experience with data lakes, multi-dimensional models, and data dictionaries. • Expertise in Snowflake features (Time Travel, Zero-Copy Cloning). • Strong background in query optimization and cost management. • Familiarity with Snowflake administration and pipeline development. • Knowledge of PL/SQL and SQL databases (additional plus). • Excellent communication, leadership, and organizational skills. • Strong team player with a positive attitude.
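    To illustrate the two Snowflake features the posting names, Time Travel and Zero-Copy Cloning, here is a hedged sketch issued from Python; the account credentials and table names are placeholders.

```python
import snowflake.connector

conn = snowflake.connector.connect(account="...", user="...", password="...")
cur = conn.cursor()

# Zero-Copy Cloning: an instant, storage-free copy for dev/testing.
cur.execute("CREATE TABLE analytics.orders_dev CLONE analytics.orders")

# Time Travel: query the table as it looked one hour (3600 s) ago.
cur.execute("SELECT COUNT(*) FROM analytics.orders AT(OFFSET => -3600)")
print(cur.fetchone())
```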
    $119k-167k yearly est. 1d ago
  • Staff Data Scientist

    Quantix Search

    Data engineer job in Santa Rosa, CA

    Staff Data Scientist | San Francisco | $250K-$300K + Equity We're partnering with one of the fastest-growing AI companies in the world to hire a Staff Data Scientist. Backed by over $230M from top-tier investors and already valued at over $1B, they've secured customers that include some of the most recognizable names in tech. Their AI platform powers millions of daily interactions and is quickly becoming the enterprise standard for conversational AI. In this role, you'll bring rigorous analytics and experimentation leadership that directly shapes product strategy and company performance. What you'll do: Drive deep-dive analyses on user behavior, product performance, and growth drivers Design and interpret A/B tests to measure product impact at scale Build scalable data models, pipelines, and dashboards for company-wide use Partner with Product and Engineering to embed experimentation best practices Evaluate ML models, ensuring business relevance, performance, and trade-off clarity What we're looking for: 5+ years in data science or product analytics at scale (consumer or marketplace preferred) Advanced SQL and Python skills, with strong foundations in statistics and experimental design Proven record of designing, running, and analyzing large-scale experiments Ability to analyze and reason about ML models (classification, recommendation, LLMs) Strong communicator with a track record of influencing cross-functional teams If you're excited by the sound of this challenge, apply today and we'll be in touch.
    $250k-300k yearly 23h ago
  • AI Data Engineer

    Hartleyco

    Data engineer job in Santa Rosa, CA

    Member of Technical Staff - AI Data Engineer San Francisco (In-Office) $150K to $225K + Equity A high-growth, AI-native startup coming out of stealth is hiring AI Data Engineers to build the systems that power production-grade AI. The company has recently signed a Series A term sheet and is scaling rapidly. This role is central to unblocking current bottlenecks across data engineering, context modeling, and agent performance. Responsibilities: • Build distributed, reliable data pipelines using Airflow, Temporal, and n8n • Model SQL, vector, and NoSQL databases (Postgres, Qdrant, etc.) • Build API and function-based services in Python • Develop custom automations (Playwright, Stagehand, Zapier) • Work with AI researchers to define and expose context as services • Identify gaps in data quality and drive changes to upstream processes • Ship fast, iterate, and own outcomes end-to-end Required Experience: • Strong background in data engineering • Hands-on experience working with LLMs or LLM-powered applications • Data modeling skills across SQL and vector databases • Experience building distributed systems • Experience with Airflow, Temporal, n8n, or similar workflow engines • Python experience (API/services) • Startup mindset and bias toward rapid execution Nice To Have: • Experience with stream processing (Flink) • dbt or Clickhouse experience • CDC pipelines • Experience with context construction, RAG, or agent workflows • Analytical tooling (Posthog) What You Can Expect: • High-intensity, in-office environment • Fast decision-making and rapid shipping cycles • Real ownership over architecture and outcomes • Opportunity to work on AI systems operating at meaningful scale • Competitive compensation package • Meals provided plus full medical, dental, and vision benefits If this sounds like you, please apply now.
    $150k-225k yearly 2d ago
  • Senior Data Engineer

    Skale (3.7 company rating)

    Data engineer job in San Francisco, CA

    We're hiring a Senior/Lead Data Engineer to join a fast-growing AI startup. The team comes from a billion-dollar AI company and has raised a $40M+ seed round. You'll need to be comfortable transforming and moving data from legacy sources into a new 'group-level' data warehouse. You'll have a strong data modeling background; proven proficiency in modern data transformation tools, specifically dbt and/or SQLMesh; an exceptional ability to apply systems thinking and complex problem-solving to ambiguous challenges; deep, practical knowledge of the entire data lifecycle, from generation and governance through to advanced downstream applications (e.g., fueling AI/ML models, LLM consumption, and core product features); and an outstanding ability to communicate technical complexity clearly, synthesizing information into actionable frameworks for executive and cross-functional teams. Experience within a high-growth startup environment is highly valued.
    $126k-178k yearly est. 1d ago
  • AI Data Engineer

    Franklin Fitch

    Data engineer job in Palo Alto, CA

    Experience Level: Mid-Senior (4+ years) About the Role This role supports the organization's data, analytics, and AI innovation efforts by building reliable data pipelines, developing machine learning solutions, and improving how information is collected, processed, and used across the business. This person is a collaborative problem-solver who can translate business needs into technical solutions, work confidently across large datasets, and deliver clear, actionable insights that help drive strategic outcomes. What They Do Support innovation projects by facilitating discussions, refining business processes, integrating data sources, and advising on best practices in analytics and AI. Build and maintain data architectures, APIs, and pipelines to support both operational systems and AI initiatives. Design and implement machine learning models, including developing prompts and AI-driven applications. Collaborate with technical and non-technical teams to improve data quality, reporting, and analytics across the organization. Manage large, complex datasets and develop tools to extract meaningful insights. Automate manual processes, optimize data delivery, and enhance internal infrastructure. Create dashboards and visualizations that clearly communicate insights to stakeholders. Apply statistical, computational, NLP, and ML techniques to solve business problems. Document processes, use version control best practices, and deploy models to production using enterprise platforms such as Azure. Ensure data security and privacy across systems, vendors, and applications. Experience & Skills 4+ years in a Data Engineering or DataOps/DevOps role. Strong experience with SQL, relational databases, cloud environments (Azure/AWS), and building/optimizing data pipelines. Ability to work with structured and unstructured data and perform root-cause analysis on data and integrations. Experience with AI/ML data preparation, preprocessing, feature engineering, and model deployment. Familiarity with vector databases, embeddings, and RAG-based applications. Strong communication skills and ability to collaborate across teams. Ability to handle confidential and sensitive information responsibly. Technologies Data & BI: SQL Server, T-SQL, SSIS/SSRS, ETL/ELT, Power BI, DAX, PowerQuery Programming: Python, R, C++, Julia, JavaScript, SQL File Formats: JSON, XML, SQL Extras (Nice to Have): PowerShell, Regex, VBA, documentation/process mapping, Tableau/Domo, Azure ML, legal tech platforms, LLM integration (OpenAI, Azure OpenAI, Claude), embedding models
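    As a toy illustration of the embeddings/RAG retrieval this role touches, the sketch below ranks documents by cosine similarity to a query vector; the vectors are random stand-ins for real embedding-model output.

```python
import numpy as np

rng = np.random.default_rng(1)
doc_vecs = rng.normal(size=(100, 384))  # pretend document embeddings (384-dim)
query = rng.normal(size=384)            # pretend query embedding

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = np.array([cosine(v, query) for v in doc_vecs])
top_k = np.argsort(scores)[::-1][:5]    # indices of the 5 nearest documents
print(top_k, scores[top_k])
```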
    $110k-157k yearly est. 23h ago
  • Data Engineer

    Midjourney

    Data engineer job in San Jose, CA

    Midjourney is a research lab exploring new mediums to expand the imaginative powers of the human species. We are a small, self-funded team focused on design, human infrastructure, and AI. We have no investors, no big company controlling us, and no advertisers. We are 100% supported by our amazing community. Our tools are already used by millions of people to dream, to explore, and to create. But this is just the start. We think the story of the 2020s is about building the tools that will remake the world for the next century. We're making those tools, to expand what it means to be human. Core Responsibilities: Design and maintain data pipelines to consolidate information across multiple sources (subscription platforms, payment systems, infrastructure and usage monitoring, and financial systems) into a unified analytics environment Build and manage interactive dashboards and self-service BI tools that enable leadership to track key business metrics including revenue performance, infrastructure costs, customer retention, and operational efficiency Serve as technical owner of our financial planning platform (Pigment or similar), leading implementation and build-out of models, data connections, and workflows in partnership with Finance leadership to translate business requirements into functional system architecture Develop automated data quality checks and cleaning processes to ensure accuracy and consistency across financial and operational datasets Partner with Finance, Product and Operations teams to translate business questions into analytical frameworks, including cohort analysis, cost modeling, and performance trending Create and maintain documentation for data models, ETL processes, dashboard logic, and system workflows to ensure knowledge continuity Support strategic planning initiatives by building financial models, scenario analyses, and data-driven recommendations for resource allocation and growth investments Required Qualifications: 3-5+ years experience in data engineering, analytics engineering, or similar role with demonstrated ability to work with large-scale datasets Strong SQL skills and experience with modern data warehousing solutions (BigQuery, Snowflake, Redshift, etc.) Proficiency in at least one programming language (Python, R) for data manipulation and analysis Experience with BI/visualization tools (Looker, Tableau, Power BI, or similar) Hands-on experience administering enterprise financial systems (NetSuite, SAP, Oracle, or similar ERP platforms) Experience working with Stripe Billing or similar subscription management platforms, including data extraction and revenue reporting Ability to communicate technical concepts clearly to non-technical stakeholders
    $110k-156k yearly est. 2d ago
  • Senior Data Engineer - Spark, Airflow

    Sigmaways Inc.

    Data engineer job in San Jose, CA

    We are seeking an experienced Data Engineer to design and optimize scalable data pipelines that drive our global data and analytics initiatives. In this role, you will leverage technologies such as Apache Spark, Airflow, and Python to build high-performance data processing systems and ensure data quality, reliability, and lineage across Mastercard's data ecosystem. The ideal candidate combines strong technical expertise with hands-on experience in distributed data systems, workflow automation, and performance tuning to deliver impactful, data-driven solutions at enterprise scale. Responsibilities: Design and optimize Spark-based ETL pipelines for large-scale data processing. Build and manage Airflow DAGs for scheduling, orchestration, and checkpointing. Implement partitioning and shuffling strategies to improve Spark performance. Ensure data lineage, quality, and traceability across systems. Develop Python scripts for data transformation, aggregation, and validation. Execute and tune Spark jobs using spark-submit. Perform DataFrame joins and aggregations for analytical insights. Automate multi-step processes through shell scripting and variable management. Collaborate with data, DevOps, and analytics teams to deliver scalable data solutions. Qualifications: Bachelor's degree in Computer Science, Data Engineering, or a related field (or equivalent experience). At least 7 years of experience in data engineering or big data development. Strong expertise in Apache Spark architecture, optimization, and job configuration. Proven experience with Airflow DAGs, including authoring, scheduling, checkpointing, and monitoring. Skilled in data shuffling, partitioning strategies, and performance tuning in distributed systems. Expertise in Python programming, including data structures and algorithmic problem-solving. Hands-on with Spark DataFrames and PySpark transformations using joins, aggregations, and filters. Proficient in shell scripting, including managing and passing variables between scripts. Experienced with spark-submit for deployment and tuning. Solid understanding of ETL design, workflow automation, and distributed data systems. Excellent debugging and problem-solving skills in large-scale environments. Experience with AWS Glue, EMR, Databricks, or similar Spark platforms. Knowledge of data lineage and data quality frameworks like Apache Atlas. Familiarity with CI/CD pipelines, Docker/Kubernetes, and data governance tools.
    $110k-156k yearly est. 3d ago
  • Senior Data Engineer

    Skale (3.7 company rating)

    Data engineer job in Fremont, CA

    We're hiring a Senior/Lead Data Engineer to join a fast-growing AI startup. The team comes from a billion-dollar AI company and has raised a $40M+ seed round. You'll need to be comfortable transforming and moving data from legacy sources into a new 'group-level' data warehouse. You'll have a strong data modeling background; proven proficiency in modern data transformation tools, specifically dbt and/or SQLMesh; an exceptional ability to apply systems thinking and complex problem-solving to ambiguous challenges; deep, practical knowledge of the entire data lifecycle, from generation and governance through to advanced downstream applications (e.g., fueling AI/ML models, LLM consumption, and core product features); and an outstanding ability to communicate technical complexity clearly, synthesizing information into actionable frameworks for executive and cross-functional teams. Experience within a high-growth startup environment is highly valued.
    $126k-177k yearly est. 1d ago
  • AI Data Engineer

    Hartleyco

    Data engineer job in San Jose, CA

    Member of Technical Staff - AI Data Engineer San Francisco (In-Office) $150K to $225K + Equity A high-growth, AI-native startup coming out of stealth is hiring AI Data Engineers to build the systems that power production-grade AI. The company has recently signed a Series A term sheet and is scaling rapidly. This role is central to unblocking current bottlenecks across data engineering, context modeling, and agent performance. Responsibilities: • Build distributed, reliable data pipelines using Airflow, Temporal, and n8n • Model SQL, vector, and NoSQL databases (Postgres, Qdrant, etc.) • Build API and function-based services in Python • Develop custom automations (Playwright, Stagehand, Zapier) • Work with AI researchers to define and expose context as services • Identify gaps in data quality and drive changes to upstream processes • Ship fast, iterate, and own outcomes end-to-end Required Experience: • Strong background in data engineering • Hands-on experience working with LLMs or LLM-powered applications • Data modeling skills across SQL and vector databases • Experience building distributed systems • Experience with Airflow, Temporal, n8n, or similar workflow engines • Python experience (API/services) • Startup mindset and bias toward rapid execution Nice To Have: • Experience with stream processing (Flink) • dbt or Clickhouse experience • CDC pipelines • Experience with context construction, RAG, or agent workflows • Analytical tooling (Posthog) What You Can Expect: • High-intensity, in-office environment • Fast decision-making and rapid shipping cycles • Real ownership over architecture and outcomes • Opportunity to work on AI systems operating at meaningful scale • Competitive compensation package • Meals provided plus full medical, dental, and vision benefits If this sounds like you, please apply now.
    $150k-225k yearly 2d ago

Learn more about data engineer jobs

How much does a data engineer earn in Berkeley, CA?

The average data engineer in Berkeley, CA earns between $94,000 and $184,000 annually. This compares to the national average data engineer range of $80,000 to $149,000.

Average data engineer salary in Berkeley, CA

$131,000

What are the biggest employers of Data Engineers in Berkeley, CA?

The biggest employers of Data Engineers in Berkeley, CA are:
  1. ATOMIC
  2. Promise
  3. Tata Group
  4. VuMedi
  5. Rockridge Resources
  6. Common
  7. Elicit