Data Scientist jobs at LegalZoom

- 882 jobs
  • ETL/ELT Data Engineer (Secret Clearance) - Hybrid

    LaunchCode 2.9 company rating

    Austin, TX jobs

    LaunchCode is recruiting for a Software Data Engineer to work at one of our partner companies!

    Details:
    - Full-Time W2, Salary
    - Immediate opening
    - Hybrid - Austin, TX (onsite 1-2 times a week)
    - Pay: $85K-$120K
    - Minimum Experience: 4 years
    - Security Clearance: Active DoD Secret Clearance

    Disclaimer: Please note that we are unable to provide work authorization or sponsorship for this role, now or in the future. Candidates requiring current or future sponsorship will not be considered.

    Job description

    Job Summary

    A Washington, DC-based software solutions provider founded in 2017 specializes in delivering mission-critical and enterprise solutions to the federal government. Originating from the Department of Defense's software factory ecosystem, the company focuses on Command and Control, Cybersecurity, Space, Geospatial, and Modeling & Simulation. The company leverages commercial technology to enhance the capabilities of the DoD, IC, and their end-users, with innovation driven by its Innovation centers. The company has a presence in Boston, MA, Colorado Springs, CO, San Antonio, TX, and St. Louis, MO.

    Why the company?
    - Environment of Autonomy
    - Innovative Commercial Approach
    - People over process

    We are seeking a passionate Software Data Engineer to support the Army Software Factory (ASWF) in aligning with DoDM 8140.03 Cyber Workforce requirements and broader compliance mandates. The Army Software Factory (ASWF), a first-of-its-kind initiative under Army Futures Command, is revolutionizing the Army's approach to software development by training and employing self-sustaining technical talent from across the military and civilian workforce. Guided by the motto “By Soldiers, For Soldiers,” ASWF equips service members to develop mission-critical software solutions independently, which is especially vital for future contested environments where traditional technical support may be unavailable. This initiative also serves as a strategic prototype to modernize legacy IT processes and build technical readiness across the force to ensure battlefield dominance in the digital age.

    Required Skills:
    - Active DoD Secret Clearance (Required)
    - 4+ years of experience in data science, data engineering, or similar roles.
    - Expertise in designing, building, and maintaining scalable ETL/ELT pipelines using tools and languages such as Python, SQL, Apache Spark, or Airflow.
    - Strong proficiency in working with relational and NoSQL databases, including experience with database design, optimization, and query performance tuning (e.g., PostgreSQL, MySQL, MongoDB, Cassandra).
    - Demonstrable experience with cloud data platforms and services (e.g., AWS Redshift, S3, Glue, Athena; Azure Data Lake, Data Factory, Synapse; Google BigQuery, Cloud Storage, Dataflow).
    - Solid understanding of data warehousing concepts (e.g., Kimball, Inmon methodologies) and experience with data modeling for analytical purposes.
    - Proficiency in at least one programming language commonly used in data engineering (e.g., Python, Java, Scala) for data manipulation, scripting, and automation.
    - CompTIA Security+ Certified or otherwise DoDM 8140.03 (formerly DoD 8570.01-M) compliant.

    Nice to Have:
    - Familiarity with SBIR technologies and transformative platform shifts
    - Experience working in Agile or DevSecOps environments
    - 2+ years of experience interfacing with Platform Engineers and the data visibility team, managing AWS resources, and administering GitLab

    #LI-hybrid #austintx #ETLengineer #dataengineer #army #aswf #clearancejobs #clearedjobs #secretclearance #ETL
    $85k-120k yearly 2d ago
  • Data Architect - Azure Databricks

    Fractal 4.2 company rating

    Palo Alto, CA jobs

    Fractal is a strategic AI partner to Fortune 500 companies with a vision to power every human decision in the enterprise. Fractal is building a world where individual choices, freedom, and diversity are the greatest assets; an ecosystem where human imagination is at the heart of every decision. Where no possibility is written off, only challenged to get better. We believe that a true Fractalite is the one who empowers imagination with intelligence. Fractal has been featured as a Great Place to Work by The Economic Times in partnership with the Great Place to Work Institute and recognized as a 'Cool Vendor' and a 'Vendor to Watch' by Gartner. Please visit Fractal | Intelligence for Imagination for more information about Fractal.

    Job Posting Title: Principal Architect - Azure Databricks

    Job Description

    Seeking a visionary and hands-on Principal Architect to lead large-scale, complex technical initiatives leveraging Databricks within the healthcare payer domain. This role is pivotal in driving data modernization, advanced analytics, and AI/ML solutions for our clients. You will serve as a strategic advisor, technical leader, and delivery expert across multiple engagements.

    Responsibilities:

    Design & Architecture of Scalable Data Platforms
    - Design, develop, and maintain large-scale data processing architectures on the Databricks Lakehouse Platform to support business needs such as sales forecasting, trade promotions, and supply chain optimization.
    - Architect multi-layer data models including Bronze (raw), Silver (cleansed), and Gold (curated) layers for various domains (e.g., Retail Execution, Digital Commerce, Logistics, Category Management).
    - Leverage Delta Lake, Unity Catalog, and advanced features of Databricks for governed data sharing, versioning, and reproducibility.

    Client & Business Stakeholder Engagement
    - Partner with business stakeholders to translate functional requirements into scalable technical solutions.
    - Conduct architecture workshops and solutioning sessions with enterprise IT and business teams to define data-driven use cases.

    Data Pipeline Development & Collaboration
    - Collaborate with data engineers and data scientists to develop end-to-end pipelines using PySpark, SQL, DLT (Delta Live Tables), and Databricks Workflows.
    - Enable data ingestion from diverse sources such as ERP (SAP), POS data, Syndicated Data, CRM, e-commerce platforms, and third-party datasets.

    Performance, Scalability, and Reliability
    - Optimize Spark jobs for performance tuning, cost efficiency, and scalability by configuring appropriate cluster sizing, caching, and query optimization techniques.
    - Implement monitoring and alerting using Databricks Observability, Ganglia, and cloud-native tools.

    Security, Compliance & Governance
    - Design secure architectures using Unity Catalog, role-based access control (RBAC), encryption, token-based access, and data lineage tools to meet compliance policies.
    - Establish data governance practices including Data Fitness Index, Quality Scores, SLA Monitoring, and Metadata Cataloging.

    Adoption of AI Copilots & Agentic Development
    - Utilize GitHub Copilot, Databricks Assistant, and other AI code agents for:
      - Writing PySpark, SQL, and Python code snippets for data engineering and ML tasks.
      - Generating documentation and test cases to accelerate pipeline development.
      - Interactive debugging and iterative code optimization within notebooks.
    - Advocate for agentic AI workflows that use specialized agents for:
      - Data profiling and schema inference.
      - Automated testing and validation.

    Innovation and Continuous Learning
    - Stay abreast of emerging trends in Lakehouse architectures, Generative AI, and cloud-native tooling.
    - Evaluate and pilot new features from Databricks releases and partner integrations for modern data stack improvements.

    Requirements:
    - Bachelor's or master's degree in Computer Science, Information Technology, or a related field.
    - 12-18 years of hands-on experience in data engineering, with at least 5 years on Databricks Architecture and Apache Spark.
    - Expertise in building high-throughput, low-latency ETL/ELT pipelines on Azure Databricks using PySpark, SQL, and Databricks-native features.
    - Familiarity with ingestion frameworks from structured/unstructured data sources including APIs, flat files, RDBMS, and cloud storage (Azure Data Lake Storage Gen2).
    - Experience designing Lakehouse architectures with bronze, silver, gold layering.
    - Expertise in optimizing Databricks performance using Delta Lake features such as OPTIMIZE, VACUUM, ZORDER, and Time Travel.
    - Strong understanding of data modelling concepts, star/snowflake schemas, dimensional modelling, and modern cloud-based data warehousing.
    - Experience with designing Data marts using Databricks SQL warehouse and integrating with BI tools (Power BI, Tableau, etc.).
    - Hands-on experience designing solutions using Workflows (Jobs), Delta Lake, Delta Live Tables (DLT), Unity Catalog, and MLflow.
    - Familiarity with Databricks REST APIs, Notebooks, and cluster configurations for automated provisioning and orchestration.
    - Experience in integrating Databricks with CI/CD pipelines using tools such as Azure DevOps, GitHub Actions.
    - Knowledge of infrastructure-as-code (Terraform, ARM templates) for provisioning Databricks workspaces and resources.
    - In-depth experience with Azure Cloud services such as ADF, Synapse, ADLS, Key Vault, Azure Monitor, and Azure Security Center.
    - Strong understanding of data privacy, access controls, and governance best practices.
    - Experience working with Unity Catalog, RBAC, tokenization, and data classification frameworks.
    - At least 4-5 years of experience as a consultant with multiple clients.
    - Contribute to pre-sales, proposals, and client presentations as a subject matter expert.
    - Participated in and led RFP responses for your organization.
    - Experience in providing solutions for technical problems and providing cost estimates.
    - Excellent communication skills for stakeholder interaction, solution presentations, and team coordination.
    - Proven experience leading or mentoring global, cross-functional teams across multiple time zones and engagements.
    - Ability to work independently in agile or hybrid delivery models, while guiding junior engineers and ensuring solution quality.

    Pay: The wage range for this role takes into account the wide range of factors that are considered in making compensation decisions including but not limited to skill sets; experience and training; licensure and certifications; and other business and organizational needs. The disclosed range estimate has not been adjusted for the applicable geographic differential associated with the location at which the position may be filled. At Fractal, it is not typical for an individual to be hired at or near the top of the range for their role and compensation decisions are dependent on the facts and circumstances of each case. A reasonable estimate of the current range is: $200,000 - $300,000. In addition, you may be eligible for a discretionary bonus for the current performance period.

    Benefits: As a full-time employee of the company or as an hourly employee working more than 30 hours per week, you will be eligible to participate in the health, dental, vision, life insurance, and disability plans in accordance with the plan documents, which may be amended from time to time. You will be eligible for benefits on the first day of employment with the Company. In addition, you are eligible to participate in the Company 401(k) Plan after 30 days of employment, in accordance with the applicable plan terms. The Company provides for 11 paid holidays and 12 weeks of Parental Leave. We also follow a “free time” PTO policy, allowing you the flexibility to take time needed for either sick time or vacation.

    Fractal provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
    $200k-300k yearly 4d ago
  • Head of Data Science & AI

    Addison Group 4.6 company rating

    Austin, TX jobs

    Duration: 6-month contract-to-hire
    Compensation: $150K-160K
    Work schedule: Monday-Friday (8 AM-5 PM CST) - onsite 3x per week
    Benefits: This position is eligible for medical, dental, vision and 401(k)

    The Head of Data Science & AI leads the organization's data science strategy and team, driving advanced analytics and AI initiatives to deliver business value and innovation. This role sets the strategic direction for data science, ensures alignment with organizational goals, and promotes a data-driven culture. It involves close collaboration with business and technology teams to identify opportunities for leveraging machine learning and AI to improve operations and customer experiences.

    Key Responsibilities
    - Develop and execute a data science strategy and roadmap aligned with business objectives.
    - Build and lead the data science team, providing mentorship and fostering growth.
    - Partner with business leaders to identify challenges and deliver actionable insights.
    - Oversee design and deployment of predictive models, algorithms, and analytical frameworks.
    - Ensure data integrity, governance, and security in collaboration with engineering teams.
    - Communicate complex insights to non-technical stakeholders.
    - Manage infrastructure, tools, and budget for data science initiatives.
    - Drive experimentation with emerging AI technologies and ensure ethical AI practices.
    - Oversee full AI model lifecycle: development, deployment, monitoring, and compliance.

    Qualifications
    - 8+ years in data science/analytics with leadership experience.
    - Expertise in Python, R, SQL, and ML frameworks (TensorFlow, PyTorch, Scikit-Learn).
    - Experience deploying ML models and monitoring performance.
    - Familiarity with visualization tools (Tableau, Power BI).
    - Strong knowledge of data governance, advanced statistical methods, and AI trends.
    - Skills in project management tools (MS Project, JIRA) and software development best practices (CI/CD, Git, Agile).

    Please apply directly to be considered.
    $150k-160k yearly 5d ago
  • AWS Data Architect

    Fractal 4.2 company rating

    San Jose, CA jobs

    Fractal is a strategic AI partner to Fortune 500 companies with a vision to power every human decision in the enterprise. Fractal is building a world where individual choices, freedom, and diversity are the greatest assets; an ecosystem where human imagination is at the heart of every decision. Where no possibility is written off, only challenged to get better. We believe that a true Fractalite is the one who empowers imagination with intelligence. Fractal has been featured as a Great Place to Work by The Economic Times in partnership with the Great Place to Work Institute and recognized as a 'Cool Vendor' and a 'Vendor to Watch' by Gartner. Please visit Fractal | Intelligence for Imagination for more information about Fractal.

    Fractal is looking for a proactive and driven AWS Lead Data Architect/Engineer to join our cloud and data tech team. In this role, you will work on designing the system architecture and solution, ensuring the platform is scalable and performant, and creating automated data pipelines.

    Responsibilities:

    Design & Architecture of Scalable Data Platforms
    - Design, develop, and maintain large-scale data processing architectures on the Databricks Lakehouse Platform to support business needs.
    - Architect multi-layer data models including Bronze (raw), Silver (cleansed), and Gold (curated) layers for various domains (e.g., Retail Execution, Digital Commerce, Logistics, Category Management).
    - Leverage Delta Lake, Unity Catalog, and advanced features of Databricks for governed data sharing, versioning, and reproducibility.

    Client & Business Stakeholder Engagement
    - Partner with business stakeholders to translate functional requirements into scalable technical solutions.
    - Conduct architecture workshops and solutioning sessions with enterprise IT and business teams to define data-driven use cases.

    Data Pipeline Development & Collaboration
    - Collaborate with data engineers and data scientists to develop end-to-end pipelines using Python, PySpark, and SQL.
    - Enable data ingestion from diverse sources such as ERP (SAP), POS data, Syndicated Data, CRM, e-commerce platforms, and third-party datasets.

    Performance, Scalability, and Reliability
    - Optimize Spark jobs for performance tuning, cost efficiency, and scalability by configuring appropriate cluster sizing, caching, and query optimization techniques.
    - Implement monitoring and alerting using Databricks Observability, Ganglia, and cloud-native tools.

    Security, Compliance & Governance
    - Design secure architectures using Unity Catalog, role-based access control (RBAC), encryption, token-based access, and data lineage tools to meet compliance policies.
    - Establish data governance practices including Data Fitness Index, Quality Scores, SLA Monitoring, and Metadata Cataloging.

    Adoption of AI Copilots & Agentic Development
    - Utilize GitHub Copilot, Databricks Assistant, and other AI code agents for:
      - Writing PySpark, SQL, and Python code snippets for data engineering and ML tasks.
      - Generating documentation and test cases to accelerate pipeline development.
      - Interactive debugging and iterative code optimization within notebooks.
    - Advocate for agentic AI workflows that use specialized agents for:
      - Data profiling and schema inference.
      - Automated testing and validation.

    Innovation and Continuous Learning
    - Stay abreast of emerging trends in Lakehouse architectures, Generative AI, and cloud-native tooling.
    - Evaluate and pilot new features from Databricks releases and partner integrations for modern data stack improvements.

    Requirements:
    - Bachelor's or master's degree in Computer Science, Information Technology, or a related field.
    - 8-12 years of hands-on experience in data engineering, with at least 5 years on Python and Apache Spark.
    - Expertise in building high-throughput, low-latency ETL/ELT pipelines on AWS/Azure/GCP using Python, PySpark, and SQL.
    - Excellent hands-on experience with workload automation tools such as Airflow and Prefect.
    - Familiarity with building dynamic ingestion frameworks from structured/unstructured data sources including APIs, flat files, RDBMS, and cloud storage.
    - Experience designing Lakehouse architectures with bronze, silver, gold layering.
    - Strong understanding of data modelling concepts, star/snowflake schemas, dimensional modelling, and modern cloud-based data warehousing.
    - Experience with designing Data marts using Cloud data warehouses and integrating with BI tools (Power BI, Tableau, etc.).
    - Experience with CI/CD pipelines using tools such as AWS CodeCommit, Azure DevOps, GitHub Actions.
    - Knowledge of infrastructure-as-code (Terraform, ARM templates) for provisioning platform resources.
    - In-depth experience with AWS Cloud services such as Glue, S3, and Redshift.
    - Strong understanding of data privacy, access controls, and governance best practices.
    - Experience working with RBAC, tokenization, and data classification frameworks.
    - Excellent communication skills for stakeholder interaction, solution presentations, and team coordination.
    - Proven experience leading or mentoring global, cross-functional teams across multiple time zones and engagements.
    - Ability to work independently in agile or hybrid delivery models, while guiding junior engineers and ensuring solution quality.
    - Must be able to work in PST time zone.

    Pay: The wage range for this role takes into account the wide range of factors that are considered in making compensation decisions including but not limited to skill sets; experience and training; licensure and certifications; and other business and organizational needs. The disclosed range estimate has not been adjusted for the applicable geographic differential associated with the location at which the position may be filled. At Fractal, it is not typical for an individual to be hired at or near the top of the range for their role and compensation decisions are dependent on the facts and circumstances of each case. A reasonable estimate of the current range is: $150k - $180k. In addition, you may be eligible for a discretionary bonus for the current performance period.

    Benefits: As a full-time employee of the company or as an hourly employee working more than 30 hours per week, you will be eligible to participate in the health, dental, vision, life insurance, and disability plans in accordance with the plan documents, which may be amended from time to time. You will be eligible for benefits on the first day of employment with the Company. In addition, you are eligible to participate in the Company 401(k) Plan after 30 days of employment, in accordance with the applicable plan terms. The Company provides for 11 paid holidays and 12 weeks of Parental Leave. We also follow a “free time” PTO policy, allowing you the flexibility to take the time needed for either sick time or vacation.

    Fractal provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
    $150k-180k yearly 5d ago
  • AWS Data Architect

    Fractal 4.2 company rating

    Santa Rosa, CA jobs

    Fractal is a strategic AI partner to Fortune 500 companies with a vision to power every human decision in the enterprise. Fractal is building a world where individual choices, freedom, and diversity are the greatest assets; an ecosystem where human imagination is at the heart of every decision. Where no possibility is written off, only challenged to get better. We believe that a true Fractalite is the one who empowers imagination with intelligence. Fractal has been featured as a Great Place to Work by The Economic Times in partnership with the Great Place to Work Institute and recognized as a 'Cool Vendor' and a 'Vendor to Watch' by Gartner. Please visit Fractal | Intelligence for Imagination for more information about Fractal.

    Fractal is looking for a proactive and driven AWS Lead Data Architect/Engineer to join our cloud and data tech team. In this role, you will work on designing the system architecture and solution, ensuring the platform is scalable and performant, and creating automated data pipelines.

    Responsibilities:

    Design & Architecture of Scalable Data Platforms
    - Design, develop, and maintain large-scale data processing architectures on the Databricks Lakehouse Platform to support business needs.
    - Architect multi-layer data models including Bronze (raw), Silver (cleansed), and Gold (curated) layers for various domains (e.g., Retail Execution, Digital Commerce, Logistics, Category Management).
    - Leverage Delta Lake, Unity Catalog, and advanced features of Databricks for governed data sharing, versioning, and reproducibility.

    Client & Business Stakeholder Engagement
    - Partner with business stakeholders to translate functional requirements into scalable technical solutions.
    - Conduct architecture workshops and solutioning sessions with enterprise IT and business teams to define data-driven use cases.

    Data Pipeline Development & Collaboration
    - Collaborate with data engineers and data scientists to develop end-to-end pipelines using Python, PySpark, and SQL.
    - Enable data ingestion from diverse sources such as ERP (SAP), POS data, Syndicated Data, CRM, e-commerce platforms, and third-party datasets.

    Performance, Scalability, and Reliability
    - Optimize Spark jobs for performance tuning, cost efficiency, and scalability by configuring appropriate cluster sizing, caching, and query optimization techniques.
    - Implement monitoring and alerting using Databricks Observability, Ganglia, and cloud-native tools.

    Security, Compliance & Governance
    - Design secure architectures using Unity Catalog, role-based access control (RBAC), encryption, token-based access, and data lineage tools to meet compliance policies.
    - Establish data governance practices including Data Fitness Index, Quality Scores, SLA Monitoring, and Metadata Cataloging.

    Adoption of AI Copilots & Agentic Development
    - Utilize GitHub Copilot, Databricks Assistant, and other AI code agents for:
      - Writing PySpark, SQL, and Python code snippets for data engineering and ML tasks.
      - Generating documentation and test cases to accelerate pipeline development.
      - Interactive debugging and iterative code optimization within notebooks.
    - Advocate for agentic AI workflows that use specialized agents for:
      - Data profiling and schema inference.
      - Automated testing and validation.

    Innovation and Continuous Learning
    - Stay abreast of emerging trends in Lakehouse architectures, Generative AI, and cloud-native tooling.
    - Evaluate and pilot new features from Databricks releases and partner integrations for modern data stack improvements.

    Requirements:
    - Bachelor's or master's degree in Computer Science, Information Technology, or a related field.
    - 8-12 years of hands-on experience in data engineering, with at least 5 years on Python and Apache Spark.
    - Expertise in building high-throughput, low-latency ETL/ELT pipelines on AWS/Azure/GCP using Python, PySpark, and SQL.
    - Excellent hands-on experience with workload automation tools such as Airflow and Prefect.
    - Familiarity with building dynamic ingestion frameworks from structured/unstructured data sources including APIs, flat files, RDBMS, and cloud storage.
    - Experience designing Lakehouse architectures with bronze, silver, gold layering.
    - Strong understanding of data modelling concepts, star/snowflake schemas, dimensional modelling, and modern cloud-based data warehousing.
    - Experience with designing Data marts using Cloud data warehouses and integrating with BI tools (Power BI, Tableau, etc.).
    - Experience with CI/CD pipelines using tools such as AWS CodeCommit, Azure DevOps, GitHub Actions.
    - Knowledge of infrastructure-as-code (Terraform, ARM templates) for provisioning platform resources.
    - In-depth experience with AWS Cloud services such as Glue, S3, and Redshift.
    - Strong understanding of data privacy, access controls, and governance best practices.
    - Experience working with RBAC, tokenization, and data classification frameworks.
    - Excellent communication skills for stakeholder interaction, solution presentations, and team coordination.
    - Proven experience leading or mentoring global, cross-functional teams across multiple time zones and engagements.
    - Ability to work independently in agile or hybrid delivery models, while guiding junior engineers and ensuring solution quality.
    - Must be able to work in PST time zone.

    Pay: The wage range for this role takes into account the wide range of factors that are considered in making compensation decisions including but not limited to skill sets; experience and training; licensure and certifications; and other business and organizational needs. The disclosed range estimate has not been adjusted for the applicable geographic differential associated with the location at which the position may be filled. At Fractal, it is not typical for an individual to be hired at or near the top of the range for their role and compensation decisions are dependent on the facts and circumstances of each case. A reasonable estimate of the current range is: $150k - $180k. In addition, you may be eligible for a discretionary bonus for the current performance period.

    Benefits: As a full-time employee of the company or as an hourly employee working more than 30 hours per week, you will be eligible to participate in the health, dental, vision, life insurance, and disability plans in accordance with the plan documents, which may be amended from time to time. You will be eligible for benefits on the first day of employment with the Company. In addition, you are eligible to participate in the Company 401(k) Plan after 30 days of employment, in accordance with the applicable plan terms. The Company provides for 11 paid holidays and 12 weeks of Parental Leave. We also follow a “free time” PTO policy, allowing you the flexibility to take the time needed for either sick time or vacation.

    Fractal provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
    $150k-180k yearly 5d ago
  • AWS Data Architect

    Fractal 4.2 company rating

    San Francisco, CA jobs

    Fractal is a strategic AI partner to Fortune 500 companies with a vision to power every human decision in the enterprise. Fractal is building a world where individual choices, freedom, and diversity are the greatest assets; an ecosystem where human imagination is at the heart of every decision. Where no possibility is written off, only challenged to get better. We believe that a true Fractalite is the one who empowers imagination with intelligence. Fractal has been featured as a Great Place to Work by The Economic Times in partnership with the Great Place to Work Institute and recognized as a ‘Cool Vendor' and a ‘Vendor to Watch' by Gartner. Please visit Fractal | Intelligence for Imagination for more information about Fractal. Fractal is looking for a proactive and driven AWS Lead Data Architect/Engineer to join our cloud and data tech team. In this role, you will work on designing the system architecture and solution, ensuring the platform is scalable while performant, and creating automated data pipelines. Responsibilities: Design & Architecture of Scalable Data Platforms Design, develop, and maintain large-scale data processing architectures on the Databricks Lakehouse Platform to support business needs Architect multi-layer data models including Bronze (raw), Silver (cleansed), and Gold (curated) layers for various domains (e.g., Retail Execution, Digital Commerce, Logistics, Category Management). Leverage Delta Lake, Unity Catalog, and advanced features of Databricks for governed data sharing, versioning, and reproducibility. Client & Business Stakeholder Engagement Partner with business stakeholders to translate functional requirements into scalable technical solutions. 
Conduct architecture workshops and solutioning sessions with enterprise IT and business teams to define data-driven use cases Data Pipeline Development & Collaboration Collaborate with data engineers and data scientists to develop end-to-end pipelines using Python, PySpark, SQL Enable data ingestion from diverse sources such as ERP (SAP), POS data, Syndicated Data, CRM, e-commerce platforms, and third-party datasets. Performance, Scalability, and Reliability Optimize Spark jobs for performance tuning, cost efficiency, and scalability by configuring appropriate cluster sizing, caching, and query optimization techniques. Implement monitoring and alerting using Databricks Observability, Ganglia, Cloud-native tools Security, Compliance & Governance Design secure architectures using Unity Catalog, role-based access control (RBAC), encryption, token-based access, and data lineage tools to meet compliance policies. Establish data governance practices including Data Fitness Index, Quality Scores, SLA Monitoring, and Metadata Cataloging. Adoption of AI Copilots & Agentic Development Utilize GitHub Copilot, Databricks Assistant, and other AI code agents for Writing PySpark, SQL, and Python code snippets for data engineering and ML tasks. Generating documentation and test cases to accelerate pipeline development. Interactive debugging and iterative code optimization within notebooks. Advocate for agentic AI workflows that use specialized agents for Data profiling and schema inference. Automated testing and validation. Innovation and Continuous Learning Stay abreast of emerging trends in Lakehouse architectures, Generative AI, and cloud-native tooling. Evaluate and pilot new features from Databricks releases and partner integrations for modern data stack improvements. Requirements: Bachelor's or master's degree in computer science, Information Technology, or a related field. 
8-12 years of hands-on experience in data engineering, with at least 5 years on Python and Apache Spark. Expertise in building high-throughput, low-latency ETL/ELT pipelines on AWS/Azure/GCP using Python, PySpark, SQL. Excellent hands-on experience with workload automation tools such as Airflow, Prefect, etc. Familiarity with building dynamic ingestion frameworks from structured/unstructured data sources including APIs, flat files, RDBMS, and cloud storage. Experience designing Lakehouse architectures with bronze, silver, gold layering. Strong understanding of data modelling concepts, star/snowflake schemas, dimensional modelling, and modern cloud-based data warehousing. Experience with designing Data marts using Cloud data warehouses and integrating with BI tools (Power BI, Tableau, etc.). Experience with CI/CD pipelines using tools such as AWS CodeCommit, Azure DevOps, GitHub Actions. Knowledge of infrastructure-as-code (Terraform, ARM templates) for provisioning platform resources. In-depth experience with AWS Cloud services such as Glue, S3, Redshift, etc. Strong understanding of data privacy, access controls, and governance best practices. Experience working with RBAC, tokenization, and data classification frameworks. Excellent communication skills for stakeholder interaction, solution presentations, and team coordination. Proven experience leading or mentoring global, cross-functional teams across multiple time zones and engagements. Ability to work independently in agile or hybrid delivery models, while guiding junior engineers and ensuring solution quality. Must be able to work in the PST time zone. Pay: The wage range for this role takes into account the wide range of factors that are considered in making compensation decisions including but not limited to skill sets; experience and training; licensure and certifications; and other business and organizational needs. 
The disclosed range estimate has not been adjusted for the applicable geographic differential associated with the location at which the position may be filled. At Fractal, it is not typical for an individual to be hired at or near the top of the range for their role and compensation decisions are dependent on the facts and circumstances of each case. A reasonable estimate of the current range is: $150k - $180k. In addition, you may be eligible for a discretionary bonus for the current performance period. Benefits: As a full-time employee of the company or as an hourly employee working more than 30 hours per week, you will be eligible to participate in the health, dental, vision, life insurance, and disability plans in accordance with the plan documents, which may be amended from time to time. You will be eligible for benefits on the first day of employment with the Company. In addition, you are eligible to participate in the Company 401(k) Plan after 30 days of employment, in accordance with the applicable plan terms. The Company provides for 11 paid holidays and 12 weeks of Parental Leave. We also follow a “free time” PTO policy, allowing you the flexibility to take the time needed for either sick time or vacation. Fractal provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
    $150k-180k yearly 5d ago
  • AWS Data Architect

    Fractal 4.2company rating

    Sunnyvale, CA jobs

    Fractal is a strategic AI partner to Fortune 500 companies with a vision to power every human decision in the enterprise. Fractal is building a world where individual choices, freedom, and diversity are the greatest assets; an ecosystem where human imagination is at the heart of every decision. Where no possibility is written off, only challenged to get better. We believe that a true Fractalite is the one who empowers imagination with intelligence. Fractal has been featured as a Great Place to Work by The Economic Times in partnership with the Great Place to Work Institute and recognized as a ‘Cool Vendor' and a ‘Vendor to Watch' by Gartner. Please visit Fractal | Intelligence for Imagination for more information about Fractal. Fractal is looking for a proactive and driven AWS Lead Data Architect/Engineer to join our cloud and data tech team. In this role, you will work on designing the system architecture and solution, ensuring the platform is scalable while performant, and creating automated data pipelines. Responsibilities: Design & Architecture of Scalable Data Platforms Design, develop, and maintain large-scale data processing architectures on the Databricks Lakehouse Platform to support business needs Architect multi-layer data models including Bronze (raw), Silver (cleansed), and Gold (curated) layers for various domains (e.g., Retail Execution, Digital Commerce, Logistics, Category Management). Leverage Delta Lake, Unity Catalog, and advanced features of Databricks for governed data sharing, versioning, and reproducibility. Client & Business Stakeholder Engagement Partner with business stakeholders to translate functional requirements into scalable technical solutions. 
Conduct architecture workshops and solutioning sessions with enterprise IT and business teams to define data-driven use cases Data Pipeline Development & Collaboration Collaborate with data engineers and data scientists to develop end-to-end pipelines using Python, PySpark, SQL Enable data ingestion from diverse sources such as ERP (SAP), POS data, Syndicated Data, CRM, e-commerce platforms, and third-party datasets. Performance, Scalability, and Reliability Optimize Spark jobs for performance tuning, cost efficiency, and scalability by configuring appropriate cluster sizing, caching, and query optimization techniques. Implement monitoring and alerting using Databricks Observability, Ganglia, Cloud-native tools Security, Compliance & Governance Design secure architectures using Unity Catalog, role-based access control (RBAC), encryption, token-based access, and data lineage tools to meet compliance policies. Establish data governance practices including Data Fitness Index, Quality Scores, SLA Monitoring, and Metadata Cataloging. Adoption of AI Copilots & Agentic Development Utilize GitHub Copilot, Databricks Assistant, and other AI code agents for Writing PySpark, SQL, and Python code snippets for data engineering and ML tasks. Generating documentation and test cases to accelerate pipeline development. Interactive debugging and iterative code optimization within notebooks. Advocate for agentic AI workflows that use specialized agents for Data profiling and schema inference. Automated testing and validation. Innovation and Continuous Learning Stay abreast of emerging trends in Lakehouse architectures, Generative AI, and cloud-native tooling. Evaluate and pilot new features from Databricks releases and partner integrations for modern data stack improvements. Requirements: Bachelor's or master's degree in computer science, Information Technology, or a related field. 
8-12 years of hands-on experience in data engineering, with at least 5 years on Python and Apache Spark. Expertise in building high-throughput, low-latency ETL/ELT pipelines on AWS/Azure/GCP using Python, PySpark, SQL. Excellent hands-on experience with workload automation tools such as Airflow, Prefect, etc. Familiarity with building dynamic ingestion frameworks from structured/unstructured data sources including APIs, flat files, RDBMS, and cloud storage. Experience designing Lakehouse architectures with bronze, silver, gold layering. Strong understanding of data modelling concepts, star/snowflake schemas, dimensional modelling, and modern cloud-based data warehousing. Experience with designing Data marts using Cloud data warehouses and integrating with BI tools (Power BI, Tableau, etc.). Experience with CI/CD pipelines using tools such as AWS CodeCommit, Azure DevOps, GitHub Actions. Knowledge of infrastructure-as-code (Terraform, ARM templates) for provisioning platform resources. In-depth experience with AWS Cloud services such as Glue, S3, Redshift, etc. Strong understanding of data privacy, access controls, and governance best practices. Experience working with RBAC, tokenization, and data classification frameworks. Excellent communication skills for stakeholder interaction, solution presentations, and team coordination. Proven experience leading or mentoring global, cross-functional teams across multiple time zones and engagements. Ability to work independently in agile or hybrid delivery models, while guiding junior engineers and ensuring solution quality. Must be able to work in the PST time zone. Pay: The wage range for this role takes into account the wide range of factors that are considered in making compensation decisions including but not limited to skill sets; experience and training; licensure and certifications; and other business and organizational needs. 
The disclosed range estimate has not been adjusted for the applicable geographic differential associated with the location at which the position may be filled. At Fractal, it is not typical for an individual to be hired at or near the top of the range for their role and compensation decisions are dependent on the facts and circumstances of each case. A reasonable estimate of the current range is: $150k - $180k. In addition, you may be eligible for a discretionary bonus for the current performance period. Benefits: As a full-time employee of the company or as an hourly employee working more than 30 hours per week, you will be eligible to participate in the health, dental, vision, life insurance, and disability plans in accordance with the plan documents, which may be amended from time to time. You will be eligible for benefits on the first day of employment with the Company. In addition, you are eligible to participate in the Company 401(k) Plan after 30 days of employment, in accordance with the applicable plan terms. The Company provides for 11 paid holidays and 12 weeks of Parental Leave. We also follow a “free time” PTO policy, allowing you the flexibility to take the time needed for either sick time or vacation. Fractal provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
    $150k-180k yearly 5d ago
  • Senior Data Engineer

    Robert Half 4.5company rating

    Los Angeles, CA jobs

    Robert Half is partnering with a well-known brand seeking an experienced Data Engineer with Databricks experience. Working alongside data scientists and software developers, your work will directly impact dynamic pricing strategies by ensuring the availability, accuracy, and scalability of data systems. This position is full time with full benefits and 3 days onsite in the Woodland Hills, CA area. Responsibilities: Design, build, and maintain scalable data pipelines for dynamic pricing models. Collaborate with data scientists to prepare data for model training, validation, and deployment. Develop and optimize ETL processes to ensure data quality and reliability. Monitor and troubleshoot data workflows for continuous integration and performance. Partner with software engineers to embed data solutions into product architecture. Ensure compliance with data governance, privacy, and security standards. Translate stakeholder requirements into technical specifications. Document processes and contribute to data engineering best practices. Requirements: Bachelor's or Master's degree in Computer Science, Engineering, or a related field. 4+ years of experience in data engineering, data warehousing, and big data technologies. Proficiency in SQL and experience with relational databases (e.g., PostgreSQL, MySQL, SQL Server). Must have experience in Databricks. Experience working within an Azure, AWS, or GCP environment. Familiarity with big data tools like Spark, Hadoop, or Databricks. Experience in real-time data pipeline tools. Experience with Python.
    $116k-165k yearly est. 1d ago
  • AWS Data Architect

    Fractal 4.2company rating

    Santa Clara, CA jobs

    Fractal is a strategic AI partner to Fortune 500 companies with a vision to power every human decision in the enterprise. Fractal is building a world where individual choices, freedom, and diversity are the greatest assets; an ecosystem where human imagination is at the heart of every decision. Where no possibility is written off, only challenged to get better. We believe that a true Fractalite is the one who empowers imagination with intelligence. Fractal has been featured as a Great Place to Work by The Economic Times in partnership with the Great Place to Work Institute and recognized as a ‘Cool Vendor' and a ‘Vendor to Watch' by Gartner. Please visit Fractal | Intelligence for Imagination for more information about Fractal. Fractal is looking for a proactive and driven AWS Lead Data Architect/Engineer to join our cloud and data tech team. In this role, you will work on designing the system architecture and solution, ensuring the platform is scalable while performant, and creating automated data pipelines. Responsibilities: Design & Architecture of Scalable Data Platforms Design, develop, and maintain large-scale data processing architectures on the Databricks Lakehouse Platform to support business needs Architect multi-layer data models including Bronze (raw), Silver (cleansed), and Gold (curated) layers for various domains (e.g., Retail Execution, Digital Commerce, Logistics, Category Management). Leverage Delta Lake, Unity Catalog, and advanced features of Databricks for governed data sharing, versioning, and reproducibility. Client & Business Stakeholder Engagement Partner with business stakeholders to translate functional requirements into scalable technical solutions. 
Conduct architecture workshops and solutioning sessions with enterprise IT and business teams to define data-driven use cases Data Pipeline Development & Collaboration Collaborate with data engineers and data scientists to develop end-to-end pipelines using Python, PySpark, SQL Enable data ingestion from diverse sources such as ERP (SAP), POS data, Syndicated Data, CRM, e-commerce platforms, and third-party datasets. Performance, Scalability, and Reliability Optimize Spark jobs for performance tuning, cost efficiency, and scalability by configuring appropriate cluster sizing, caching, and query optimization techniques. Implement monitoring and alerting using Databricks Observability, Ganglia, Cloud-native tools Security, Compliance & Governance Design secure architectures using Unity Catalog, role-based access control (RBAC), encryption, token-based access, and data lineage tools to meet compliance policies. Establish data governance practices including Data Fitness Index, Quality Scores, SLA Monitoring, and Metadata Cataloging. Adoption of AI Copilots & Agentic Development Utilize GitHub Copilot, Databricks Assistant, and other AI code agents for Writing PySpark, SQL, and Python code snippets for data engineering and ML tasks. Generating documentation and test cases to accelerate pipeline development. Interactive debugging and iterative code optimization within notebooks. Advocate for agentic AI workflows that use specialized agents for Data profiling and schema inference. Automated testing and validation. Innovation and Continuous Learning Stay abreast of emerging trends in Lakehouse architectures, Generative AI, and cloud-native tooling. Evaluate and pilot new features from Databricks releases and partner integrations for modern data stack improvements. Requirements: Bachelor's or master's degree in computer science, Information Technology, or a related field. 
8-12 years of hands-on experience in data engineering, with at least 5 years on Python and Apache Spark. Expertise in building high-throughput, low-latency ETL/ELT pipelines on AWS/Azure/GCP using Python, PySpark, SQL. Excellent hands-on experience with workload automation tools such as Airflow, Prefect, etc. Familiarity with building dynamic ingestion frameworks from structured/unstructured data sources including APIs, flat files, RDBMS, and cloud storage. Experience designing Lakehouse architectures with bronze, silver, gold layering. Strong understanding of data modelling concepts, star/snowflake schemas, dimensional modelling, and modern cloud-based data warehousing. Experience with designing Data marts using Cloud data warehouses and integrating with BI tools (Power BI, Tableau, etc.). Experience with CI/CD pipelines using tools such as AWS CodeCommit, Azure DevOps, GitHub Actions. Knowledge of infrastructure-as-code (Terraform, ARM templates) for provisioning platform resources. In-depth experience with AWS Cloud services such as Glue, S3, Redshift, etc. Strong understanding of data privacy, access controls, and governance best practices. Experience working with RBAC, tokenization, and data classification frameworks. Excellent communication skills for stakeholder interaction, solution presentations, and team coordination. Proven experience leading or mentoring global, cross-functional teams across multiple time zones and engagements. Ability to work independently in agile or hybrid delivery models, while guiding junior engineers and ensuring solution quality. Must be able to work in the PST time zone. Pay: The wage range for this role takes into account the wide range of factors that are considered in making compensation decisions including but not limited to skill sets; experience and training; licensure and certifications; and other business and organizational needs. 
The disclosed range estimate has not been adjusted for the applicable geographic differential associated with the location at which the position may be filled. At Fractal, it is not typical for an individual to be hired at or near the top of the range for their role and compensation decisions are dependent on the facts and circumstances of each case. A reasonable estimate of the current range is: $150k - $180k. In addition, you may be eligible for a discretionary bonus for the current performance period. Benefits: As a full-time employee of the company or as an hourly employee working more than 30 hours per week, you will be eligible to participate in the health, dental, vision, life insurance, and disability plans in accordance with the plan documents, which may be amended from time to time. You will be eligible for benefits on the first day of employment with the Company. In addition, you are eligible to participate in the Company 401(k) Plan after 30 days of employment, in accordance with the applicable plan terms. The Company provides for 11 paid holidays and 12 weeks of Parental Leave. We also follow a “free time” PTO policy, allowing you the flexibility to take the time needed for either sick time or vacation. Fractal provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
    $150k-180k yearly 5d ago
  • AWS Data Architect

    Fractal 4.2company rating

    Fremont, CA jobs

    Fractal is a strategic AI partner to Fortune 500 companies with a vision to power every human decision in the enterprise. Fractal is building a world where individual choices, freedom, and diversity are the greatest assets; an ecosystem where human imagination is at the heart of every decision. Where no possibility is written off, only challenged to get better. We believe that a true Fractalite is the one who empowers imagination with intelligence. Fractal has been featured as a Great Place to Work by The Economic Times in partnership with the Great Place to Work Institute and recognized as a ‘Cool Vendor' and a ‘Vendor to Watch' by Gartner. Please visit Fractal | Intelligence for Imagination for more information about Fractal. Fractal is looking for a proactive and driven AWS Lead Data Architect/Engineer to join our cloud and data tech team. In this role, you will work on designing the system architecture and solution, ensuring the platform is scalable while performant, and creating automated data pipelines. Responsibilities: Design & Architecture of Scalable Data Platforms Design, develop, and maintain large-scale data processing architectures on the Databricks Lakehouse Platform to support business needs Architect multi-layer data models including Bronze (raw), Silver (cleansed), and Gold (curated) layers for various domains (e.g., Retail Execution, Digital Commerce, Logistics, Category Management). Leverage Delta Lake, Unity Catalog, and advanced features of Databricks for governed data sharing, versioning, and reproducibility. Client & Business Stakeholder Engagement Partner with business stakeholders to translate functional requirements into scalable technical solutions. 
Conduct architecture workshops and solutioning sessions with enterprise IT and business teams to define data-driven use cases Data Pipeline Development & Collaboration Collaborate with data engineers and data scientists to develop end-to-end pipelines using Python, PySpark, SQL Enable data ingestion from diverse sources such as ERP (SAP), POS data, Syndicated Data, CRM, e-commerce platforms, and third-party datasets. Performance, Scalability, and Reliability Optimize Spark jobs for performance tuning, cost efficiency, and scalability by configuring appropriate cluster sizing, caching, and query optimization techniques. Implement monitoring and alerting using Databricks Observability, Ganglia, Cloud-native tools Security, Compliance & Governance Design secure architectures using Unity Catalog, role-based access control (RBAC), encryption, token-based access, and data lineage tools to meet compliance policies. Establish data governance practices including Data Fitness Index, Quality Scores, SLA Monitoring, and Metadata Cataloging. Adoption of AI Copilots & Agentic Development Utilize GitHub Copilot, Databricks Assistant, and other AI code agents for Writing PySpark, SQL, and Python code snippets for data engineering and ML tasks. Generating documentation and test cases to accelerate pipeline development. Interactive debugging and iterative code optimization within notebooks. Advocate for agentic AI workflows that use specialized agents for Data profiling and schema inference. Automated testing and validation. Innovation and Continuous Learning Stay abreast of emerging trends in Lakehouse architectures, Generative AI, and cloud-native tooling. Evaluate and pilot new features from Databricks releases and partner integrations for modern data stack improvements. Requirements: Bachelor's or master's degree in computer science, Information Technology, or a related field. 
8-12 years of hands-on experience in data engineering, with at least 5 years on Python and Apache Spark. Expertise in building high-throughput, low-latency ETL/ELT pipelines on AWS/Azure/GCP using Python, PySpark, SQL. Excellent hands-on experience with workload automation tools such as Airflow, Prefect, etc. Familiarity with building dynamic ingestion frameworks from structured/unstructured data sources including APIs, flat files, RDBMS, and cloud storage. Experience designing Lakehouse architectures with bronze, silver, gold layering. Strong understanding of data modelling concepts, star/snowflake schemas, dimensional modelling, and modern cloud-based data warehousing. Experience with designing Data marts using Cloud data warehouses and integrating with BI tools (Power BI, Tableau, etc.). Experience with CI/CD pipelines using tools such as AWS CodeCommit, Azure DevOps, GitHub Actions. Knowledge of infrastructure-as-code (Terraform, ARM templates) for provisioning platform resources. In-depth experience with AWS Cloud services such as Glue, S3, Redshift, etc. Strong understanding of data privacy, access controls, and governance best practices. Experience working with RBAC, tokenization, and data classification frameworks. Excellent communication skills for stakeholder interaction, solution presentations, and team coordination. Proven experience leading or mentoring global, cross-functional teams across multiple time zones and engagements. Ability to work independently in agile or hybrid delivery models, while guiding junior engineers and ensuring solution quality. Must be able to work in the PST time zone. Pay: The wage range for this role takes into account the wide range of factors that are considered in making compensation decisions including but not limited to skill sets; experience and training; licensure and certifications; and other business and organizational needs. 
The disclosed range estimate has not been adjusted for the applicable geographic differential associated with the location at which the position may be filled. At Fractal, it is not typical for an individual to be hired at or near the top of the range for their role and compensation decisions are dependent on the facts and circumstances of each case. A reasonable estimate of the current range is: $150k - $180k. In addition, you may be eligible for a discretionary bonus for the current performance period. Benefits: As a full-time employee of the company or as an hourly employee working more than 30 hours per week, you will be eligible to participate in the health, dental, vision, life insurance, and disability plans in accordance with the plan documents, which may be amended from time to time. You will be eligible for benefits on the first day of employment with the Company. In addition, you are eligible to participate in the Company 401(k) Plan after 30 days of employment, in accordance with the applicable plan terms. The Company provides for 11 paid holidays and 12 weeks of Parental Leave. We also follow a “free time” PTO policy, allowing you the flexibility to take the time needed for either sick time or vacation. Fractal provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
    $150k-180k yearly 5d ago
  • Data Engineer

    Robert Half Recruiting 4.5company rating

    Culver City, CA jobs

    Robert Half is partnering with a well-known high-tech company seeking an experienced Data Engineer with strong Python and SQL skills. The primary duties involve managing the complete data lifecycle and utilizing extensive datasets across marketing, software, and web platforms. This position is full time with full benefits and 3 days onsite in the Culver City area. Responsibilities: 4+ years of professional experience ideally in a combination of data engineering and business intelligence. Working heavily with SQL and programming in Python. Ownership mindset to oversee the entire data lifecycle, including collection, extraction, and cleansing processes. Building reports and data visualization to help advance business. Leverage industry-standard tools for data integration such as Talend. Work extensively within cloud-based ecosystems such as AWS and GCP. Requirements: Bachelor's or Master's degree in Computer Science, Engineering, or a related field. 5+ years of experience in data engineering, data warehousing, and big data technologies. Proficiency in SQL and experience with relational databases (e.g., PostgreSQL, MySQL, SQL Server) and NoSQL technologies. Experience working within GCP environments and AWS. Experience in real-time data pipeline tools. Hands-on expertise with Google Cloud services including BigQuery. Deep knowledge of SQL, including dimension tables, and experience in Python programming.
    $116k-165k yearly est. 1d ago
  • Data Engineer

    Addison Group 4.6company rating

    Coppell, TX jobs

    Title: Data Engineer Assignment Type: 6-12 month contract-to-hire Compensation: $65/hr-$75/hr W2 Work Model: Hybrid (4 days on-site, 1 day remote) Benefits: Medical, Dental, Vision, 401(k) What we need is someone who comes with 8+ years of experience in the Data Engineering space and who specializes in Microsoft Azure and Databricks. This person will be a part of multiple initiatives for the "New Development" and "Data Reporting" teams but will be primarily tasked with designing, building, maintaining, and automating their enterprise data architecture/pipelines within the cloud. Technology-wise, we need someone who comes with skills in Azure Databricks (5+ years), cloud-based environments (Azure and/or AWS), Azure DevOps (ADO), SQL (ETL, SSIS packages), and PySpark or Scala automation, plus architecture experience in building pipelines, data modeling, data pipeline deployment, data mapping, etc. Top Skills: -8+ Years of Data Engineering/Business Intelligence -Databricks and Azure Data Factory *Most updated is Unity Catalog for Databricks* -Cloud-based environments (Azure or AWS) -Data Pipeline Architecture and CI/CD methodology -SQL -Automation (Python (PySpark), Scala)
    $65-75 hourly 5d ago
  • Data Engineer

    Robert Half 4.5company rating

    Dallas, TX jobs

    We are seeking a highly experienced Senior Data Engineer with deep expertise in modern data engineering frameworks and cloud-native architectures, primarily on AWS. This role focuses on designing, building, and optimizing scalable data pipelines and distributed systems. You will collaborate cross-functionally to deliver secure, high-quality data solutions that drive business decisions. Key Responsibilities Design & Build: Develop and maintain scalable, highly available AWS-based data pipelines, specializing in EKS/ECS containerized workloads and services like Glue, EMR, and Lake Formation. Orchestration: Implement automated data ingestion, transformation, and workflow orchestration using Airflow, NiFi, and AWS Step Functions. Real-time: Architect and implement real-time streaming solutions with Kafka, MSK, and Flink. Data Lake & Storage: Architect secure S3 data storage and govern data lakes using Lake Formation and Glue Data Catalog. Optimization: Optimize distributed processing solutions (Databricks, Spark, Hadoop) and troubleshoot performance across cloud-native systems. Governance: Ensure robust data quality, security, and governance via IAM, Lake Formation controls, and automated validations. Mentorship: Mentor junior team members and foster technical excellence. Requirements Experience: 7+ years in data engineering; strong hands-on experience designing cloud data pipelines. AWS Expertise: Deep proficiency in EKS, ECS, S3, Lake Formation, Glue, EMR, IAM, and MSK. Core Tools: Strong experience with Kafka, Airflow, NiFi, Databricks, Spark, Hadoop, and Flink. Coding: Proficiency in Python, Scala, or Java for building data pipelines and automation. Databases: Strong SQL skills and experience with relational/NoSQL databases (e.g., Redshift, DynamoDB). Cloud-Native Skills: Strong knowledge of Kubernetes, containerization, and CI/CD pipelines. Education: Bachelor's degree in Computer Science or related field.
    $86k-121k yearly est. 1d ago
  • Senior Data Engineer

    Addison Group 4.6company rating

    Houston, TX jobs

    About the Role The Senior Data Engineer will play a critical role in building and scaling an enterprise data platform to enable analytics, reporting, and operational insights across the organization. This position requires deep expertise in Snowflake and cloud technologies (AWS or Azure), along with strong upstream oil & gas domain experience. The engineer will design and optimize data pipelines, enforce data governance and quality standards, and collaborate with cross-functional teams to deliver reliable, scalable data solutions. Key Responsibilities Data Architecture & Engineering Design, develop, and maintain scalable data pipelines using Snowflake, AWS/Azure, and modern data engineering tools. Implement ETL/ELT processes integrating data from upstream systems (SCADA, production accounting, drilling, completions, etc.). Architect data models supporting both operational reporting and advanced analytics. Establish and maintain frameworks for data quality, validation, and lineage to ensure enterprise data trust. Platform Development & Optimization Lead the build and optimization of Snowflake-based data warehouses for performance and cost efficiency. Design cloud-native data solutions leveraging AWS/Azure services (S3, Lambda, Azure Data Factory, Databricks). Manage large-scale time-series and operational data processing workflows. Implement strong security, access control, and governance practices. Technical Leadership & Innovation Mentor junior data engineers and provide technical leadership across the data platform team. Research and introduce new technologies to enhance platform scalability and automation. Build reusable frameworks, components, and utilities to streamline delivery. Support AI/ML initiatives by delivering production-ready, high-quality data pipelines. Business Partnership Collaborate with stakeholders across business units to translate requirements into technical solutions. 
Work with analysts and data scientists to enable self-service analytics and reporting. Ensure data integration supports regulatory and compliance reporting. Act as a bridge between business and technical teams to ensure alignment and impact. Qualifications & Experience Education Bachelor's degree in Computer Science, Engineering, Information Systems, or a related field. Advanced degree or relevant certifications (SnowPro, AWS/Azure Data Engineer, Databricks) preferred. Experience 7+ years in data engineering roles, with at least 3 years on cloud data platforms. Proven expertise in Snowflake and at least one major cloud platform (AWS or Azure). Hands-on experience with upstream oil & gas data (wells, completions, SCADA, production, reserves, etc.). Demonstrated success delivering operational and analytical data pipelines. Technical Skills Advanced SQL and Python programming skills. Strong background in data modeling, ETL/ELT, cataloging, lineage, and data security. Familiarity with Airflow, Azure Data Factory, or similar orchestration tools. Experience with CI/CD, Git, and automated testing. Knowledge of BI tools such as Power BI, Spotfire, or Tableau. Understanding of AI/ML data preparation and integration.
    $86k-125k yearly est. 2d ago
  • Data Architect

    Vlink Inc. 4.0company rating

    Orlando, FL jobs

    Data Architect. Duration: 6 Months. Responsible for enterprise-wide data design, balancing optimization of data access with batch loading and resource utilization factors. Knowledgeable in most aspects of designing and constructing data architectures, operational data stores, and data marts. Focuses on enterprise-wide data modelling and database design. Defines data architecture standards, policies and procedures for the organization, structure, attributes and nomenclature of data elements, and applies accepted data content standards to technology projects. Responsible for business analysis, data acquisition and access analysis and design, Database Management Systems optimization, recovery strategy and load strategy design and implementation. Essential Position Functions: Evaluate and recommend data management processes. Design, prepare and optimize data pipelines and workflows. Lead implementations of secure, scalable, and reliable Azure solutions. Observe and recommend how to monitor and optimize Azure for performance and cost-efficiency. Endorse and foster security best practices, access controls, and compliance standards for all data lake resources. Perform knowledge transfer about troubleshooting and documenting Azure architectures and solutions. Skills required: Deep understanding of Azure Synapse Analytics, Azure Data Factory, and related Azure data tools. Expertise in implementing Data Vault 2.0 methodologies using WhereScape automation software. Proficient in designing and optimizing fact and dimension table models. Demonstrated ability to design, develop, and maintain data pipelines and workflows. Strong skills in formulating, reviewing, and optimizing SQL code. Expertise in data collection, storage, accessibility, and quality improvement processes. Proven track record of delivering consumable data using information marts. Excellent communication skills to effectively liaise with technical and non-technical team members. Ability to document designs, procedures, and troubleshooting methods clearly. Proficiency in Python or PowerShell preferred. Bachelor's or master's degree in computer science, information systems, or another related field, or equivalent work experience. A minimum of 7 years of experience with large and complex database management systems.
    $86k-111k yearly est. 1d ago
  • Data Engineer

    Richard, Wayne & Roberts 4.3company rating

    Houston, TX jobs

    Python Data Engineer - Houston, TX (Onsite Only). A global energy and commodities organization is seeking an experienced Python Data Engineer to expand and optimize data assets that support high-impact analytics. This role works closely with traders, analysts, researchers, and data scientists to translate business needs into scalable technical solutions. The position is fully onsite due to the collaborative, fast-paced nature of the work. Candidates MUST come from an oil & gas organization; a commodity trading firm is preferred. CANNOT do C2C. Key Responsibilities: Build modular, reusable Python components to connect external data sources with internal tools and databases. Partner with business stakeholders to define data ingestion and access requirements. Translate business requirements into well-designed technical deliverables. Maintain and enhance the central Python codebase following established standards. Contribute to internal developer tools and ETL frameworks, helping standardize and consolidate core functionality. Collaborate with global engineering teams and participate in internal Python community initiatives. Qualifications: 7+ years of professional Python development experience. Strong background in data engineering and pipeline development. Experience with web scraping tools (Requests, BeautifulSoup, Selenium). Hands-on Oracle PL/SQL development, including stored procedures. Strong grasp of object-oriented design, design patterns, and service-oriented architectures. Experience with Agile/Scrum, code reviews, version control, and issue tracking. Familiarity with scientific computing libraries (Pandas, NumPy). Excellent communication skills. Industry experience in energy or commodities preferred. Exposure to containerization (Docker, Kubernetes) is a plus.
    $83k-120k yearly est. 2d ago
  • Data Architect

    Mastech Digital 4.7company rating

    Dallas, TX jobs

    Primary responsibilities of the Senior Data Architect include designing and managing Data Architectural solutions for multiple environments including but not limited to Data Warehouse, ODS, Data Replication/ETL Data Management initiatives. The candidate will be in an expert role and will work closely with Business, DBA, ETL and Data Management teams providing solutions for complex Data related initiatives. This individual will also be responsible for developing and managing Data Governance and Master Data Management solutions. This candidate must have good technical and communication skills coupled with the ability to mentor effectively. Responsibilities: Establish policies, procedures and guidelines regarding all aspects of Data Governance. Ensure data decisions are consistent, and best practices are adhered to. Ensure Data Standardization definitions, Data Dictionary and Data Lineage are kept up to date and accessible. Work with ETL, Replication and DBA teams to determine best practices as they relate to data transformations, data movement and derivations. Work with support teams to ensure consistent and proactive support methodologies are in place for all aspects of data movements and data transformations. Work with and mentor Data Architects and Data Analysts to ensure best practices are adhered to for database design and data management. Assist in overall Architectural solutions including, but not limited to Data Warehouse, ODS, Data Replication/ETL Data Management initiatives. Work with the business teams and Enterprise Architecture team to ensure best architectural solutions from a Data perspective. Create a strategic roadmap for MDM implementation. Implement a Master Data Management tool. Establish policies, procedures and guidelines regarding all aspects of Master Data Management. Ensure Architectural rules and design of the MDM process are documented and best practices are adhered to. Qualifications: 5+ years of Data Architecture experience, including OLTP, Data Warehouse, Big Data. 5+ years of Solution Architecture experience. 5+ years of MDM experience. 5+ years of Data Governance experience, with working knowledge of best practices. Extensive working knowledge of all aspects of Data Movement and Processing, including Middleware, ETL, API, OLAP and best practices for data tracking. Good communication skills. Self-motivated. Capable of presenting to all levels of audiences. Works well in a team environment.
    $93k-120k yearly est. 4d ago
  • Data Architect

    KPI Partners 4.8company rating

    Plano, TX jobs

    KPI Partners is a five-time Gartner-recognized data, analytics, and AI consulting company. We are leaders in data engineering on Azure, AWS, Google, Snowflake, and Databricks. Founded in 2006, KPI has over 400 consultants and has successfully delivered over 1,000 projects to our clients. We are looking for skilled data engineers who want to work with the best team in data engineering. Title: Senior Data Architect. Location: Plano, TX (Hybrid). Job Type: Contract - 6 Months. Key Skills: SQL, PySpark, Databricks, and Azure Cloud. Key Note: Looking for a Data Architect who is hands-on with SQL, PySpark, Databricks, and Azure Cloud. About the Role: We are seeking a highly skilled and experienced Senior Data Architect to join our dynamic team at KPI, working on challenging and multi-year data transformation projects. This is an excellent opportunity for a talented data engineer to play a key role in building innovative data solutions using Azure Native Services and related technologies. If you are passionate about working with large-scale data systems and enjoy solving complex engineering problems, this role is for you. Key Responsibilities: Data Engineering: Design, develop, and implement data pipelines and solutions using PySpark, SQL, and related technologies. Collaboration: Work closely with cross-functional teams to understand business requirements and translate them into robust data solutions. Data Warehousing: Design and implement data warehousing solutions, ensuring scalability, performance, and reliability. Continuous Learning: Stay up to date with modern technologies and trends in data engineering and apply them to improve our data platform. Mentorship: Provide guidance and mentorship to junior data engineers, ensuring best practices in coding, design, and development. Must-Have Skills & Qualifications: 12+ years of overall experience in the IT industry. 4+ years of experience in data engineering, with a strong background in building large-scale data solutions. 4+ years of hands-on experience developing and implementing data pipelines using the Azure stack (Azure, ADF, Databricks, Functions). Proven expertise in SQL for querying, manipulating, and analyzing large datasets. Strong knowledge of ETL processes and data warehousing fundamentals. Self-motivated and independent, with a “let's get this done” mindset and the ability to thrive in a fast-paced and dynamic environment. Good-to-Have Skills: Databricks Certification is a plus. Data Modeling, Azure Architect Certification.
    $88k-123k yearly est. 4d ago
  • Senior Data Engineer

    Luna Data Solutions, Inc. 4.4company rating

    Austin, TX jobs

    We are looking for a seasoned Azure Data Engineer to design, build, and optimize secure, scalable, and high-performance data solutions within the Microsoft Azure ecosystem. This will be a multi-year contract worked FULLY ONSITE in Austin, TX. The ideal candidate brings deep technical expertise in data architecture, ETL/ELT engineering, data integration, and governance, along with hands-on experience in MDM, API Management, Lakehouse architectures, and data mesh or data hub frameworks. This position combines strategic architectural planning with practical, hands-on implementation, empowering cross-functional teams to leverage data as a key organizational asset. Key Responsibilities: 1. Data Architecture & Strategy: Design and deploy end-to-end Azure data platforms using Azure Data Lake, Azure Synapse Analytics, Azure Databricks, and Azure SQL Database. Build and implement Lakehouse and medallion (Bronze/Silver/Gold) architectures for scalable and modular data processing. Define and support data mesh and data hub patterns to promote domain-driven design and federated governance. Establish standards for conceptual, logical, and physical data modeling across data warehouse and data lake environments. 2. Data Integration & Pipeline Development: Develop and maintain ETL/ELT pipelines using Azure Data Factory, Synapse Pipelines, and Databricks for both batch and streaming workloads. Integrate diverse data sources (on-prem, cloud, SaaS, APIs) into a unified Azure data environment. Optimize pipelines for cost-effectiveness, performance, and scalability. 3. Master Data Management (MDM) & Data Governance: Implement MDM solutions using Azure-native or third-party platforms (e.g., Profisee, Informatica, Semarchy). Define and manage data governance, metadata, and data quality frameworks. Partner with business teams to align data standards and maintain data integrity across domains. 4. API Management & Integration: Build and manage APIs for data access, transformation, and system integration using Azure API Management and Logic Apps. Design secure, reliable data services for internal and external consumers. Automate workflows and system integrations using Azure Functions, Logic Apps, and Power Automate. 5. Database & Platform Administration: Perform core DBA tasks, including performance tuning, query optimization, indexing, and backup/recovery for Azure SQL and Synapse. Monitor and optimize cost, performance, and scalability across Azure data services. Implement CI/CD and Infrastructure-as-Code (IaC) solutions using Azure DevOps, Terraform, or Bicep. 6. Collaboration & Leadership: Work closely with data scientists, analysts, business stakeholders, and application teams to deliver high-value data solutions. Mentor junior engineers and define best practices for coding, data modeling, and solution design. Contribute to enterprise-wide data strategy and roadmap development. Required Qualifications: Bachelor's or Master's degree in Computer Science, Data Engineering, Information Systems, or related fields. 5+ years of hands-on experience in Azure-based data engineering and architecture. Strong proficiency with the following: Azure Data Factory, Azure Synapse, Azure Databricks, Azure Data Lake Storage Gen2; SQL, Python, PySpark, PowerShell; Azure API Management and Logic Apps. Solid understanding of data modeling approaches (3NF, dimensional modeling, Data Vault, star/snowflake schemas). Proven experience with Lakehouse/medallion architectures and data mesh/data hub designs. Familiarity with MDM concepts, data governance frameworks, and metadata management. Experience with automation, data-focused CI/CD, and IaC. Thorough understanding of Azure security, RBAC, Key Vault, and core networking principles. What We Offer: Competitive compensation and benefits package. Luna Data Solutions, Inc. (LDS) provides equal employment opportunities to all employees. All applicants will be considered for employment. LDS prohibits discrimination and harassment of any type regarding age, race, color, religion, sexual orientation, gender identity, sex, national origin, genetics, protected veteran status, and/or disability status.
    $74k-95k yearly est. 4d ago
  • Data Engineer

    Interactive Resources-IR 4.2company rating

    Austin, TX jobs

    About the Role: We are seeking a highly skilled Databricks Data Engineer with strong expertise in modern data engineering, Azure cloud technologies, and Lakehouse architectures. This role is ideal for someone who thrives in dynamic environments, enjoys solving complex data challenges, and can lead end-to-end delivery of scalable data solutions. What We're Looking For: 8+ years designing and delivering scalable data pipelines in modern data platforms. Deep experience in data engineering, data warehousing, and enterprise-grade solution delivery. Ability to lead cross-functional initiatives in matrixed teams. Advanced skills in SQL, Python, and ETL/ELT development, including performance tuning. Hands-on experience with Azure, Snowflake, and Databricks, including system integrations. Key Responsibilities: Design, build, and optimize large-scale data pipelines on the Databricks Lakehouse platform. Modernize and enhance cloud-based data ecosystems on Azure, contributing to architecture, modeling, security, and CI/CD. Use Apache Airflow and similar tools for workflow automation and orchestration. Work with financial or regulated datasets while ensuring strong compliance and governance. Drive best practices in data quality, lineage, cataloging, and metadata management. Primary Technical Skills: Develop and optimize ETL/ELT pipelines using Python, PySpark, Spark SQL, and Databricks Notebooks. Design efficient Delta Lake models for reliability and performance. Implement and manage Unity Catalog for governance, RBAC, lineage, and secure data sharing. Build reusable frameworks using Databricks Workflows, Repos, and Delta Live Tables. Create scalable ingestion pipelines for APIs, databases, files, streaming sources, and MDM systems. Automate ingestion and workflows using Python and REST APIs. Support downstream analytics for BI, data science, and application workloads. Write optimized SQL/T-SQL queries, stored procedures, and curated datasets. Automate DevOps workflows, testing pipelines, and workspace configurations. Additional Skills: Azure: Data Factory, Data Lake, Key Vault, Logic Apps, Functions. CI/CD: Azure DevOps. Orchestration: Apache Airflow (plus). Streaming: Delta Live Tables. MDM: Profisee (nice-to-have). Databases: SQL Server, Cosmos DB. Soft Skills: Strong analytical and problem-solving mindset. Excellent communication and cross-team collaboration. Detail-oriented with a high sense of ownership and accountability.
    $84k-111k yearly est. 4d ago
