Post job

Data Scientist jobs at Knowesis

- 1080 jobs
  • Data Scientist

    Kavaliro 4.2company rating

    McLean, VA jobs

    Kavaliro is seeking a Data Scientist to provide highly technical and in-depth data engineering support. The candidate MUST have experience designing and building data infrastructure, developing data pipelines, transforming and preparing data, ensuring data quality and security, and monitoring and optimizing systems. The candidate MUST have extensive experience with Python and AWS. Experience with SQL, multi-data source queries with database technologies (PostgreSQL, MySQL, RDS, etc.), NiFi, Git, Elasticsearch, Kibana, Jupyter Notebooks, NLP, AI, and any data visualization tools (Tableau, Kibana, Qlik, etc.) are desired. Required Skills and Demonstrated Experience Demonstrated experience with data engineering, to include designing and building data infrastructure, developing data pipelines, transforming/preparing data, ensuring data quality and security, and monitoring/optimizing systems. Demonstrated experience with data management and integration, including designing and perating robust data layers for application development across local and cloud or web data sources. Demonstrated work experience programming with Python Demonstrated experience building scalable ETL and ELT workflows for reporting and analytics. Demonstrated experience with general Linux computing and advanced bash scripting Demonstrated experience with SQL. Demonstrated experience constructing complex multi-data source queries with database technologies such as PostgreSQL, MySQL, Neo4J or RDS Demonstrated experience processing data sources containing structured or unstructured data Demonstrated experience developing data pipelines with NiFi to bring data into a central environment Demonstrated experience delivering results to stakeholders through written documentation and oral briefings Demonstrated experience using code repositories such as Git Demonstrated experience using Elastic and Kibana Demonstrated experience working with multiple stakeholders Demonstrated experience documenting such artifacts as code, Python packages and methodologies Demonstrated experience using Jupyter Notebooks Demonstrated experience with machine learning techniques including natural language processing Demonstrated experience explaining complex technical issues to more junior data scientists, in graphical, verbal, or written formats Demonstrated experience developing tested, reusable and reproducible work Work or educational background in one or more of the following areas: mathematics, statistics, hard sciences (e.g. Physics, Computational Biology, Astronomy, Neuroscience, etc.) computer science, data science, or business analytics Desired Skills and Demonstrated Experience Demonstrated experience with cloud services, such as AWS, as well as cloud data technologies and architecture. Demonstrated experience using big data processing tools such as Apache Spark or Trino Demonstrated experience with machine learning algorithms Demonstrated experience with using container frameworks such as Docker or Kubernetes Demonstrated experience with using data visualizations tools such as Tableau, Kibana or Apache Superset Demonstrated experience creating learning objectives and creating teaching curriculum in technical or scientific fields Location: McLean, Virginia This position is onsite and there is no remote availability. Clearance: TS/SCI with Full Scope Polygraph Applicant MUST hold a permanent U.S. citizenship for this position in accordance with government contract requirements. Kavaliro provides Equal Employment Opportunities to all employees and applicants. All qualified applicants will receive consideration for employment without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws. Kavaliro is committed to the full inclusion of all qualified individuals. In keeping with our commitment, Kavaliro will take the steps to assure that people with disabilities are provided reasonable accommodations. Accordingly, if reasonable accommodation is required to fully participate in the job application or interview process, to perform the essential functions of the position, and/or to receive all other benefits and privileges of employment, please respond to this posting to connect with a company representative.
    $74k-105k yearly est. 2d ago
  • Data Scientist with ML

    Kavaliro 4.2company rating

    Reston, VA jobs

    Kavaliro is seeking a Data Scientist to provide highly technical and in-depth data engineering support. MUST have experience with Python, PyTorch, Flask (knowledge at minimum with ability to quickly pickup), Familiarity with REST APIs (at minimum), Statistics background/experience, Basic understanding of NLP. Desired skills for a candidate include experience performance R&D with natural language processing, deploying CNN and LLMs or foundational models, deploying ML models on multimedia data, experience with Linux System Administration (or bash), experience with Android Configuration, experience in embedded systems (Raspberry Pi). Required Skills and Demonstrated Experience Demonstrated experience in Python, Javascript, and R. Demonstrated experience employing machine learning and deep learning modules such as Pandas, Scikit, Tensorflow, Pytorch. Demonstrated experience with statistical inference, as well as building and understanding predictive models, using machine learning methods. Demonstrated experience with large-scale text analytics. Desired Skills Demonstrated hands-on experience performing research or development with natural language processing and working with, deploying, and testing Convolutional Neural Networks (CNN), large-language models (LLMs) or foundational models. Demonstrated experience developing and deploying testing and verification methodologies to evaluate algorithm performance and identify strategies for improvement or optimization. Demonstrated experience deploying machine learning models on multimedia data, to include joint text, audio, video, hardware, and peripherals. Demonstrated experience with Linux System Administration and associated scripting languages (Bash) Demonstrated experience with Android configuration, software development, and interfacing. Demonstrated experience in embedded systems (Raspberry Pi) Develops and conducts independent testing and evaluation methods on research-grade algorithms in applicable fields. Reports results and provide documentation and guidance on working with the research-grade algorithms. Evaluates, Integrates and leverage internally-hosted data science tools. Customize research grade algorithms to be optimized for memory and computational efficiency through quantizing, trimming layers, or through custom methods Location: Reston, Virginia This position is onsite and there is no remote availability. Clearance: Active TS/SCI with Full Scope Polygraph Applicant MUST hold a permanent U.S. citizenship for this position in accordance with government contract requirements. Kavaliro provides Equal Employment Opportunities to all employees and applicants. All qualified applicants will receive consideration for employment without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws. Kavaliro is committed to the full inclusion of all qualified individuals. In keeping with our commitment, Kavaliro will take the steps to assure that people with disabilities are provided reasonable accommodations. Accordingly, if reasonable accommodation is required to fully participate in the job application or interview process, to perform the essential functions of the position, and/or to receive all other benefits and privileges of employment, please respond to this posting to connect with a company representative.
    $74k-105k yearly est. 4d ago
  • Data Scientist - Causal Inference and Measurement

    SGS Consulting 4.1company rating

    Menlo Park, CA jobs

    W2 Contract Role (No C2C) - Requires Big-Tech or Core Causal Inference & Measurement Experience The main function of the Data Scientist is to produce innovative solutions driven by exploratory data analysis from complex and high-dimensional datasets. Job Responsibilities Apply knowledge of statistics, machine learning, programming, data modeling, simulation, and advanced mathematics to recognize patterns, identify opportunities, pose business questions, and make valuable discoveries leading to prototype development and product improvement. Use a flexible, analytical approach to design, develop, and evaluate predictive models and advanced algorithms that lead to optimal value extraction from the data. Generate and test hypotheses and analyze and interpret the results of product experiments. Work with product engineers to translate prototypes into new products, services, and features and provide guidelines for large-scale implementation. Provide Business Intelligence (BI) and data visualization support, which includes, but limited to support for the online customer service dashboards and other ad-hoc requests requiring data analysis and visual support. Key Projects Methodological questions - optimize current methods. Analyzing marketing performance. Expert to marketing measurements. Inference in python. Candidate Value Proposition Forefront of marketing measurement research. State of the art measurement methods, marketing tools, and access to experts in the field. Skills Experienced in either programming languages such as Python and/or R, big data tools such as Hadoop, or data visualization tools such as Tableau. The ability to communicate effectively in writing, including conveying complex information and promoting in-depth engagement on course topics. Experience working with large datasets. Education/Experience Master of Science degree in computer science or in a relevant field. Top 3 must-have HARD skills Python, experience with causal inference models (impact measurement). experience with different Difference in Difference DID models. methodology experience and ability to track it using Python. Good to Have Skills Survey data analysis AI workflows
    $115k-162k yearly est. 1d ago
  • Data Scientist - ML, Python

    Avance Consulting 4.4company rating

    McLean, VA jobs

    10+years of experience required in Information Technology. Python Programming: At least 5 years of hands-on experience with Python, particularly in frameworks like FastAPI, Django, Flask, and experience using AI frameworks. • Access Control Expertise: Strong understanding of access control models such as Role-Based Access Control (RBAC) and Attribute-Based Access Control (ABAC). • API and Connector Development: Experience in developing API connectors using Python for extracting and managing access control data from platforms like Azure, SharePoint, Java, .NET, WordPress, etc. • AI and Machine Learning: Hands-on experience integrating AI into applications for automating tasks such as access control reviews and identifying anomalies • Cloud and Microsoft Technologies: Proficiency with Azure services, Microsoft Graph API, and experience integrating Python applications with Azure for access control reviews and reporting. • Reporting and Visualization: Experience using reporting libraries in Python (Pandas, Matplotlib, Plotly, Dash) to build dashboards and reports related to security and access control metrics. • Communication Skills: Ability to collaborate with various stakeholders, explain complex technical solutions, and deliver high-quality solutions on time. • PlainID: Experience or familiarity with PlainID platforms for identity and access management. • Azure OpenAI: Familiarity with Azure OpenAI technologies and their application in access control and security workflows. • Power BI: Experience with Microsoft Power BI for data visualization and reporting. • Agile Methodologies: Experience working in Agile environments and familiarity with Scrum methodologies for delivering security solutions.
    $76k-111k yearly est. 4d ago
  • Data Scientist

    Us Tech Solutions 4.4company rating

    Alhambra, CA jobs

    Title: Principal Data Scientist Duration: 12 Months Contract Additional Information California Resident Candidates Only. This position is HYBRID (2 days onsite, 2 days telework). Interviews will be conducted via Microsoft Teams. The work schedule follows a 4/40 (10-hour days, Monday-Thursday), with the specific shift determined by the program manager. Shifts may range between 7:15 a.m. and 6:00 p.m. Job description: The Principal Data Scientist works to establish a comprehensive Data Science Program to advance data-driven decision-making, streamline operations, and fully leverage modern platforms including Databricks, or similar, to meet increasing demand for predictive analytics and AI solutions. The Principal Data Scientist will guide program development, provide training and mentorship to junior members of the team, accelerate adoption of advanced analytics, and build internal capacity through structured mentorship. The Principal Data Scientist will possess exceptional communication abilities, both verbal and written, with a strong customer service mindset and the ability to translate complex concepts into clear, actionable insights; strong analytical and business acumen, including foundational experience with regression, association analysis, outlier detection, and core data analysis principles; working knowledge of database design and organization, with the ability to partner effectively with Data Management and Data Engineering teams; outstanding time management and organizational skills, with demonstrated success managing multiple priorities and deliverables in parallel; a highly collaborative work style, coupled with the ability to operate independently, maintain focus, and drive projects forward with minimal oversight; a meticulous approach to quality, ensuring accuracy, reliability, and consistency in all deliverables; and proven mentorship capabilities, including the ability to guide, coach, and upskill junior data scientists and analysts. Experience Required: Five (5)+ years of professional experience leading data science initiatives, including developing machine learning models, statistical analyses, and end-to-end data science workflows in production environments. Three (3)+ years of experience working with Databricks and similar cloud-based analytics platforms, including notebook development, feature engineering, ML model training, and workflow orchestration. Three (3)+ years of experience applying advanced analytics and predictive modeling (e.g., regression, classification, clustering, forecasting, natural language processing). Two (2)+ years of experience implementing MLOps practices, such as model versioning, CI/CD for ML, MLflow, automated pipelines, and model performance monitoring. Two (2)+ years of experience collaborating with data engineering teams to design data pipelines, optimize data transformations, and implement Lakehouse or data warehouse architectures (e.g., Databricks, Snowflake, SQL-based platforms). Two (2)+ years of experience mentoring or supervising junior data scientists or analysts, including code reviews, training, and structured skill development. Two (2)+ years of experience with Python and SQL programming, using data sources such as SQL Server, Oracle, PostgreSQL, or similar relational databases. One (1)+ year of experience operationalizing analytics within enterprise governance frameworks, partnering with Data Management, Security, and IT to ensure compliance, reproducibility, and best practices. Education Required & certifications: This classification requires possession of a Master's degree or higher in Data Science, Statistics, Computer Science, or a closely related field. Additional qualifying professional experience may be substituted for the required education on a year-for-year basis. At least one of the following industry-recognized certifications in data science or cloud analytics, such as: Microsoft Azure Data Scientist Associate (DP-100) Databricks Certified Data Scientist or Machine Learning Professional AWS Machine Learning Specialty Google Professional Data Engineer • or equivalent advanced analytics certifications. The certification is required and may not be substituted with additional experience. About US Tech Solutions: US Tech Solutions is a global staff augmentation firm providing a wide range of talent on-demand and total workforce solutions. To know more about US Tech Solutions, please visit ************************ US Tech Solutions is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, colour, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran. Recruiter Details: Name: T Saketh Ram Sharma Email: ***************************** Internal Id: 25-54101
    $92k-133k yearly est. 5d ago
  • Senior Data Scientist

    Entech 4.0company rating

    Plainfield, NJ jobs

    Data Scientist - Pharmaceutical Analytics (PhD) 1 year Contract - Hybrid- Plainfield, NJ We're looking for a PhD-level Data Scientist with experience in the pharmaceutical industry and expertise working with commercial data sets (IQVIA, claims, prescription data). This role will drive insights that shape drug launches, market access, and patient outcomes. What You'll Do Apply machine learning & advanced analytics to pharma commercial data Deliver insights on market dynamics, physician prescribing, and patient behavior Partner with R&D, medical affairs, and commercial teams to guide strategy Build predictive models for sales effectiveness, adherence, and market forecasting What We're Looking For PhD in Data Science, Statistics, Computer Science, Bioinformatics, or related field 5+ years of pharma or healthcare analytics experience Strong skills in enterprise-class software stacks and cloud computing Deep knowledge of pharma market dynamics & healthcare systems Excellent communication skills to translate data into strategy
    $84k-120k yearly est. 2d ago
  • Senior Data Engineer

    Bayforce 4.4company rating

    Charlotte, NC jobs

    **NO 3rd Party vendor candidates or sponsorship** Role Title: Senior Data Engineer Client: Global construction and development company Employment Type: Contract Duration: 1 year Preferred Location: Remote based in ET or CT time zones Role Description: The Senior Data Engineer will play a pivotal role in designing, architecting, and optimizing cloud-native data integration and Lakehouse solutions on Azure, with a strong emphasis on Microsoft Fabric adoption, PySpark/Spark-based transformations, and orchestrated pipelines. This role will lead end-to-end data engineering-from ingestion through APIs and Azure services to curated Lakehouse/warehouse layers-while ensuring scalable, secure, well-governed, and well-documented data products. The ideal candidate is hands-on in delivery and also brings data architecture knowledge to help shape patterns, standards, and solution designs. Key Responsibilities Design and implement end-to-end data pipelines and ELT/ETL workflows using Azure Data Factory (ADF), Synapse, and Microsoft Fabric. Build and optimize PySpark/Spark transformations for large-scale processing, applying best practices for performance tuning (partitioning, joins, file sizing, incremental loads). Develop and maintain API-heavy ingestion patterns, including REST/SOAP integrations, authentication/authorization handling, throttling, retries, and robust error handling. Architect scalable ingestion, transformation, and serving solutions using Azure Data Lake / OneLake, Lakehouse patterns (Bronze/Silver/Gold), and data warehouse modeling practices. Implement monitoring, logging, alerting, and operational runbooks for production pipelines; support incident triage and root-cause analysis. Apply governance and security practices across the lifecycle, including access controls, data quality checks, lineage, and compliance requirements. Write complex SQL, develop data models, and enable downstream consumption through analytics tools and curated datasets. Drive engineering standards: reusable patterns, code reviews, documentation, source control, and CI/CD practices. Requirements: Bachelor's degree (or equivalent experience) in Computer Science, Engineering, or a related field. 5+ years of experience in data engineering with strong focus on Azure Cloud. Strong experience with Azure Data Factory pipelines, orchestration patterns, parameterization, and production support. Strong hands-on experience with Synapse (pipelines, SQL pools and/or Spark), and modern cloud data platform patterns. Advanced PySpark/Spark experience for complex transformations and performance optimization. Heavy experience with API-based integrations (building ingestion frameworks, handling auth, pagination, retries, rate limits, and resiliency). Strong knowledge of SQL and data warehousing concepts (dimensional modeling, incremental processing, data quality validation). Strong understanding of cloud data architectures including Data Lake, Lakehouse, and Data Warehouse patterns. Preferred Skills Experience with Microsoft Fabric (Lakehouse/Warehouse/OneLake, Pipelines, Dataflows Gen2, notebooks). Architecture experience (formal or informal), such as contributing to solution designs, reference architectures, integration standards, and platform governance. Experience with DevOps/CI-CD for data engineering using Azure DevOps or GitHub (deployment patterns, code promotion, testing). Experience with Power BI and semantic model considerations for Lakehouse/warehouse-backed reporting. Familiarity with data catalog/governance tooling (e.g., Microsoft Purview).
    $70k-93k yearly est. 4d ago
  • Data Architect - Azure Databricks

    Fractal 4.2company rating

    Palo Alto, CA jobs

    Fractal is a strategic AI partner to Fortune 500 companies with a vision to power every human decision in the enterprise. Fractal is building a world where individual choices, freedom, and diversity are the greatest assets; an ecosystem where human imagination is at the heart of every decision. Where no possibility is written off, only challenged to get better. We believe that a true Fractalite is the one who empowers imagination with intelligence. Fractal has been featured as a Great Place to Work by The Economic Times in partnership with the Great Place to Work Institute and recognized as a ‘Cool Vendor' and a ‘Vendor to Watch' by Gartner. Please visit Fractal | Intelligence for Imagination for more information about Fractal. Job Posting Title: Principal Architect - Azure Databricks Job Description Seeking a visionary and hands-on Principal Architect to lead large-scale, complex technical initiatives leveraging Databricks within the healthcare payer domain. This role is pivotal in driving data modernization, advanced analytics, and AI/ML solutions for our clients. You will serve as a strategic advisor, technical leader, and delivery expert across multiple engagements. Responsibilities: Design & Architecture of Scalable Data Platforms Design, develop, and maintain large-scale data processing architectures on the Databricks Lakehouse Platform to support business needs such as sales forecasting, trade promotions, supply chain optimization etc... Architect multi-layer data models including Bronze (raw), Silver (cleansed), and Gold (curated) layers for various domains (e.g., Retail Execution, Digital Commerce, Logistics, Category Management). Leverage Delta Lake, Unity Catalog, and advanced features of Databricks for governed data sharing, versioning, and reproducibility. Client & Business Stakeholder Engagement Partner with business stakeholders to translate functional requirements into scalable technical solutions. Conduct architecture workshops and solutioning sessions with enterprise IT and business teams to define data-driven use cases Data Pipeline Development & Collaboration Collaborate with data engineers and data scientists to develop end-to-end pipelines using PySpark, SQL, DLT (Delta Live Tables), and Databricks Workflows. Enable data ingestion from diverse sources such as ERP (SAP), POS data, Syndicated Data, CRM, e-commerce platforms, and third-party datasets. Performance, Scalability, and Reliability Optimize Spark jobs for performance tuning, cost efficiency, and scalability by configuring appropriate cluster sizing, caching, and query optimization techniques. Implement monitoring and alerting using Databricks Observability, Ganglia, Cloud-native tools Security, Compliance & Governance Design secure architectures using Unity Catalog, role-based access control (RBAC), encryption, token-based access, and data lineage tools to meet compliance policies. Establish data governance practices including Data Fitness Index, Quality Scores, SLA Monitoring, and Metadata Cataloging. Adoption of AI Copilots & Agentic Development Utilize GitHub Copilot, Databricks Assistant, and other AI code agents for: Writing PySpark, SQL, and Python code snippets for data engineering and ML tasks. Generating documentation and test cases to accelerate pipeline development. Interactive debugging and iterative code optimization within notebooks. Advocate for agentic AI workflows that use specialized agents for: Data profiling and schema inference. Automated testing and validation. Innovation and Continuous Learning Stay abreast of emerging trends in Lakehouse architectures, Generative AI, and cloud-native tooling. Evaluate and pilot new features from Databricks releases and partner integrations for modern data stack improvements. Requirements: Bachelor's or master's degree in computer science, Information Technology, or a related field. 12-18 years of hands-on experience in data engineering, with at least 5+ years on Databricks Architecture and Apache Spark. Expertise in building high-throughput, low-latency ETL/ELT pipelines on Azure Databricks using PySpark, SQL, and Databricks-native features. Familiarity with ingestion frameworks from structured/unstructured data sources including APIs, flat files, RDBMS, and cloud storage (Azure Data Lake Storage Gen2) Experience designing Lakehouse architectures with bronze, silver, gold layering. Expertise in optimizing Databricks performance using Delta Lake features such as OPTIMIZE, VACUUM, ZORDER, and Time Travel Strong understanding of data modelling concepts, star/snowflake schemas, dimensional modelling, and modern cloud-based data warehousing. Experience with designing Data marts using Databricks SQL warehouse and integrating with BI tools (Power BI, Tableau, etc.). Hands-on experience designing solutions using Workflows (Jobs), Delta Lake, Delta Live Tables (DLT), Unity Catalog, and MLflow. Familiarity with Databricks REST APIs, Notebooks, and cluster configurations for automated provisioning and orchestration. Experience in integrating Databricks with CI/CD pipelines using tools such as Azure DevOps, GitHub Actions. Knowledge of infrastructure-as-code (Terraform, ARM templates) for provisioning Databricks workspaces and resources In-depth experience with Azure Cloud services such as ADF, Synapse, ADLS, Key Vault, Azure Monitor, and Azure Security Centre. Strong understanding of data privacy, access controls, and governance best practices. Experience working with Unity Catalog, RBAC, tokenization, and data classification frameworks Worked as a consultant for more than 4-5 years with multiple clients Contribute to pre-sales, proposals, and client presentations as a subject matter expert. Participated and Lead RFP responses for your organization. Experience in providing solutions for technical problems and provide cost estimates Excellent communication skills for stakeholder interaction, solution presentations, and team coordination. Proven experience leading or mentoring global, cross-functional teams across multiple time zones and engagements. Ability to work independently in agile or hybrid delivery models, while guiding junior engineers and ensuring solution quality. Pay: The wage range for this role takes into account the wide range of factors that are considered in making compensation decisions including but not limited to skill sets; experience and training; licensure and certifications; and other business and organizational needs. The disclosed range estimate has not been adjusted for the applicable geographic differential associated with the location at which the position may be filled. At Fractal, it is not typical for an individual to be hired at or near the top of the range for their role and compensation decisions are dependent on the facts and circumstances of each case. A reasonable estimate of the current range is: $ 200,000 - $300,000. In addition, you may be eligible for a discretionary bonus for the current performance period. Benefits: As a full-time employee of the company or as an hourly employee working more than 30 hours per week, you will be eligible to participate in the health, dental, vision, life insurance, and disability plans in accordance with the plan documents, which may be amended from time to time. You will be eligible for benefits on the first day of employment with the Company. In addition, you are eligible to participate in the Company 401(k) Plan after 30 days of employment, in accordance with the applicable plan terms. The Company provides for 11 paid holidays and 12 weeks of Parental Leave. We also follow a “free time” PTO policy, allowing you the flexibility to take time needed for either sick time or vacation. Fractal provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
    $200k-300k yearly 4d ago
  • AWS Data Architect

    Fractal 4.2company rating

    San Jose, CA jobs

    Fractal is a strategic AI partner to Fortune 500 companies with a vision to power every human decision in the enterprise. Fractal is building a world where individual choices, freedom, and diversity are the greatest assets; an ecosystem where human imagination is at the heart of every decision. Where no possibility is written off, only challenged to get better. We believe that a true Fractalite is the one who empowers imagination with intelligence. Fractal has been featured as a Great Place to Work by The Economic Times in partnership with the Great Place to Work Institute and recognized as a ‘Cool Vendor' and a ‘Vendor to Watch' by Gartner. Please visit Fractal | Intelligence for Imagination for more information about Fractal. Fractal is looking for a proactive and driven AWS Lead Data Architect/Engineer to join our cloud and data tech team. In this role, you will work on designing the system architecture and solution, ensuring the platform is scalable while performant, and creating automated data pipelines. Responsibilities: Design & Architecture of Scalable Data Platforms Design, develop, and maintain large-scale data processing architectures on the Databricks Lakehouse Platform to support business needs Architect multi-layer data models including Bronze (raw), Silver (cleansed), and Gold (curated) layers for various domains (e.g., Retail Execution, Digital Commerce, Logistics, Category Management). Leverage Delta Lake, Unity Catalog, and advanced features of Databricks for governed data sharing, versioning, and reproducibility. Client & Business Stakeholder Engagement Partner with business stakeholders to translate functional requirements into scalable technical solutions. Conduct architecture workshops and solutioning sessions with enterprise IT and business teams to define data-driven use cases Data Pipeline Development & Collaboration Collaborate with data engineers and data scientists to develop end-to-end pipelines using Python, PySpark, SQL Enable data ingestion from diverse sources such as ERP (SAP), POS data, Syndicated Data, CRM, e-commerce platforms, and third-party datasets. Performance, Scalability, and Reliability Optimize Spark jobs for performance tuning, cost efficiency, and scalability by configuring appropriate cluster sizing, caching, and query optimization techniques. Implement monitoring and alerting using Databricks Observability, Ganglia, Cloud-native tools Security, Compliance & Governance Design secure architectures using Unity Catalog, role-based access control (RBAC), encryption, token-based access, and data lineage tools to meet compliance policies. Establish data governance practices including Data Fitness Index, Quality Scores, SLA Monitoring, and Metadata Cataloging. Adoption of AI Copilots & Agentic Development Utilize GitHub Copilot, Databricks Assistant, and other AI code agents for Writing PySpark, SQL, and Python code snippets for data engineering and ML tasks. Generating documentation and test cases to accelerate pipeline development. Interactive debugging and iterative code optimization within notebooks. Advocate for agentic AI workflows that use specialized agents for Data profiling and schema inference. Automated testing and validation. Innovation and Continuous Learning Stay abreast of emerging trends in Lakehouse architectures, Generative AI, and cloud-native tooling. Evaluate and pilot new features from Databricks releases and partner integrations for modern data stack improvements. Requirements: Bachelor's or master's degree in computer science, Information Technology, or a related field. 8-12 years of hands-on experience in data engineering, with at least 5+ years on Python and Apache Spark. Expertise in building high-throughput, low-latency ETL/ELT pipelines on AWS/Azure/GCP using Python, PySpark, SQL. Excellent hands on experience with workload automation tools such as Airflow, Prefect etc. Familiarity with building dynamic ingestion frameworks from structured/unstructured data sources including APIs, flat files, RDBMS, and cloud storage Experience designing Lakehouse architectures with bronze, silver, gold layering. Strong understanding of data modelling concepts, star/snowflake schemas, dimensional modelling, and modern cloud-based data warehousing. Experience with designing Data marts using Cloud data warehouses and integrating with BI tools (Power BI, Tableau, etc.). Experience CI/CD pipelines using tools such as AWS Code commit, Azure DevOps, GitHub Actions. Knowledge of infrastructure-as-code (Terraform, ARM templates) for provisioning platform resources In-depth experience with AWS Cloud services such as Glue, S3, Redshift etc. Strong understanding of data privacy, access controls, and governance best practices. Experience working with RBAC, tokenization, and data classification frameworks Excellent communication skills for stakeholder interaction, solution presentations, and team coordination. Proven experience leading or mentoring global, cross-functional teams across multiple time zones and engagements. Ability to work independently in agile or hybrid delivery models, while guiding junior engineers and ensuring solution quality Must be able to work in PST time zone. Pay: The wage range for this role takes into account the wide range of factors that are considered in making compensation decisions including but not limited to skill sets; experience and training; licensure and certifications; and other business and organizational needs. The disclosed range estimate has not been adjusted for the applicable geographic differential associated with the location at which the position may be filled. At Fractal, it is not typical for an individual to be hired at or near the top of the range for their role and compensation decisions are dependent on the facts and circumstances of each case. A reasonable estimate of the current range is: $150k - $180k. In addition, you may be eligible for a discretionary bonus for the current performance period. Benefits: As a full-time employee of the company or as an hourly employee working more than 30 hours per week, you will be eligible to participate in the health, dental, vision, life insurance, and disability plans in accordance with the plan documents, which may be amended from time to time. You will be eligible for benefits on the first day of employment with the Company. In addition, you are eligible to participate in the Company 401(k) Plan after 30 days of employment, in accordance with the applicable plan terms. The Company provides for 11 paid holidays and 12 weeks of Parental Leave. We also follow a “free time” PTO policy, allowing you the flexibility to take the time needed for either sick time or vacation. Fractal provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
    $150k-180k yearly 5d ago
  • AWS Data Architect

    Fractal 4.2company rating

    Santa Rosa, CA jobs

    Fractal is a strategic AI partner to Fortune 500 companies with a vision to power every human decision in the enterprise. Fractal is building a world where individual choices, freedom, and diversity are the greatest assets; an ecosystem where human imagination is at the heart of every decision. Where no possibility is written off, only challenged to get better. We believe that a true Fractalite is the one who empowers imagination with intelligence. Fractal has been featured as a Great Place to Work by The Economic Times in partnership with the Great Place to Work Institute and recognized as a ‘Cool Vendor' and a ‘Vendor to Watch' by Gartner. Please visit Fractal | Intelligence for Imagination for more information about Fractal. Fractal is looking for a proactive and driven AWS Lead Data Architect/Engineer to join our cloud and data tech team. In this role, you will work on designing the system architecture and solution, ensuring the platform is scalable while performant, and creating automated data pipelines. Responsibilities: Design & Architecture of Scalable Data Platforms Design, develop, and maintain large-scale data processing architectures on the Databricks Lakehouse Platform to support business needs Architect multi-layer data models including Bronze (raw), Silver (cleansed), and Gold (curated) layers for various domains (e.g., Retail Execution, Digital Commerce, Logistics, Category Management). Leverage Delta Lake, Unity Catalog, and advanced features of Databricks for governed data sharing, versioning, and reproducibility. Client & Business Stakeholder Engagement Partner with business stakeholders to translate functional requirements into scalable technical solutions. Conduct architecture workshops and solutioning sessions with enterprise IT and business teams to define data-driven use cases Data Pipeline Development & Collaboration Collaborate with data engineers and data scientists to develop end-to-end pipelines using Python, PySpark, SQL Enable data ingestion from diverse sources such as ERP (SAP), POS data, Syndicated Data, CRM, e-commerce platforms, and third-party datasets. Performance, Scalability, and Reliability Optimize Spark jobs for performance tuning, cost efficiency, and scalability by configuring appropriate cluster sizing, caching, and query optimization techniques. Implement monitoring and alerting using Databricks Observability, Ganglia, Cloud-native tools Security, Compliance & Governance Design secure architectures using Unity Catalog, role-based access control (RBAC), encryption, token-based access, and data lineage tools to meet compliance policies. Establish data governance practices including Data Fitness Index, Quality Scores, SLA Monitoring, and Metadata Cataloging. Adoption of AI Copilots & Agentic Development Utilize GitHub Copilot, Databricks Assistant, and other AI code agents for Writing PySpark, SQL, and Python code snippets for data engineering and ML tasks. Generating documentation and test cases to accelerate pipeline development. Interactive debugging and iterative code optimization within notebooks. Advocate for agentic AI workflows that use specialized agents for Data profiling and schema inference. Automated testing and validation. Innovation and Continuous Learning Stay abreast of emerging trends in Lakehouse architectures, Generative AI, and cloud-native tooling. Evaluate and pilot new features from Databricks releases and partner integrations for modern data stack improvements. Requirements: Bachelor's or master's degree in computer science, Information Technology, or a related field. 8-12 years of hands-on experience in data engineering, with at least 5+ years on Python and Apache Spark. Expertise in building high-throughput, low-latency ETL/ELT pipelines on AWS/Azure/GCP using Python, PySpark, SQL. Excellent hands on experience with workload automation tools such as Airflow, Prefect etc. Familiarity with building dynamic ingestion frameworks from structured/unstructured data sources including APIs, flat files, RDBMS, and cloud storage Experience designing Lakehouse architectures with bronze, silver, gold layering. Strong understanding of data modelling concepts, star/snowflake schemas, dimensional modelling, and modern cloud-based data warehousing. Experience with designing Data marts using Cloud data warehouses and integrating with BI tools (Power BI, Tableau, etc.). Experience CI/CD pipelines using tools such as AWS Code commit, Azure DevOps, GitHub Actions. Knowledge of infrastructure-as-code (Terraform, ARM templates) for provisioning platform resources In-depth experience with AWS Cloud services such as Glue, S3, Redshift etc. Strong understanding of data privacy, access controls, and governance best practices. Experience working with RBAC, tokenization, and data classification frameworks Excellent communication skills for stakeholder interaction, solution presentations, and team coordination. Proven experience leading or mentoring global, cross-functional teams across multiple time zones and engagements. Ability to work independently in agile or hybrid delivery models, while guiding junior engineers and ensuring solution quality Must be able to work in PST time zone. Pay: The wage range for this role takes into account the wide range of factors that are considered in making compensation decisions including but not limited to skill sets; experience and training; licensure and certifications; and other business and organizational needs. The disclosed range estimate has not been adjusted for the applicable geographic differential associated with the location at which the position may be filled. At Fractal, it is not typical for an individual to be hired at or near the top of the range for their role and compensation decisions are dependent on the facts and circumstances of each case. A reasonable estimate of the current range is: $150k - $180k. In addition, you may be eligible for a discretionary bonus for the current performance period. Benefits: As a full-time employee of the company or as an hourly employee working more than 30 hours per week, you will be eligible to participate in the health, dental, vision, life insurance, and disability plans in accordance with the plan documents, which may be amended from time to time. You will be eligible for benefits on the first day of employment with the Company. In addition, you are eligible to participate in the Company 401(k) Plan after 30 days of employment, in accordance with the applicable plan terms. The Company provides for 11 paid holidays and 12 weeks of Parental Leave. We also follow a “free time” PTO policy, allowing you the flexibility to take the time needed for either sick time or vacation. Fractal provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
    $150k-180k yearly 5d ago
  • AWS Data Architect

    Fractal 4.2company rating

    San Francisco, CA jobs

    Fractal is a strategic AI partner to Fortune 500 companies with a vision to power every human decision in the enterprise. Fractal is building a world where individual choices, freedom, and diversity are the greatest assets; an ecosystem where human imagination is at the heart of every decision. Where no possibility is written off, only challenged to get better. We believe that a true Fractalite is the one who empowers imagination with intelligence. Fractal has been featured as a Great Place to Work by The Economic Times in partnership with the Great Place to Work Institute and recognized as a ‘Cool Vendor' and a ‘Vendor to Watch' by Gartner. Please visit Fractal | Intelligence for Imagination for more information about Fractal. Fractal is looking for a proactive and driven AWS Lead Data Architect/Engineer to join our cloud and data tech team. In this role, you will work on designing the system architecture and solution, ensuring the platform is scalable while performant, and creating automated data pipelines. Responsibilities: Design & Architecture of Scalable Data Platforms Design, develop, and maintain large-scale data processing architectures on the Databricks Lakehouse Platform to support business needs Architect multi-layer data models including Bronze (raw), Silver (cleansed), and Gold (curated) layers for various domains (e.g., Retail Execution, Digital Commerce, Logistics, Category Management). Leverage Delta Lake, Unity Catalog, and advanced features of Databricks for governed data sharing, versioning, and reproducibility. Client & Business Stakeholder Engagement Partner with business stakeholders to translate functional requirements into scalable technical solutions. Conduct architecture workshops and solutioning sessions with enterprise IT and business teams to define data-driven use cases Data Pipeline Development & Collaboration Collaborate with data engineers and data scientists to develop end-to-end pipelines using Python, PySpark, SQL Enable data ingestion from diverse sources such as ERP (SAP), POS data, Syndicated Data, CRM, e-commerce platforms, and third-party datasets. Performance, Scalability, and Reliability Optimize Spark jobs for performance tuning, cost efficiency, and scalability by configuring appropriate cluster sizing, caching, and query optimization techniques. Implement monitoring and alerting using Databricks Observability, Ganglia, Cloud-native tools Security, Compliance & Governance Design secure architectures using Unity Catalog, role-based access control (RBAC), encryption, token-based access, and data lineage tools to meet compliance policies. Establish data governance practices including Data Fitness Index, Quality Scores, SLA Monitoring, and Metadata Cataloging. Adoption of AI Copilots & Agentic Development Utilize GitHub Copilot, Databricks Assistant, and other AI code agents for Writing PySpark, SQL, and Python code snippets for data engineering and ML tasks. Generating documentation and test cases to accelerate pipeline development. Interactive debugging and iterative code optimization within notebooks. Advocate for agentic AI workflows that use specialized agents for Data profiling and schema inference. Automated testing and validation. Innovation and Continuous Learning Stay abreast of emerging trends in Lakehouse architectures, Generative AI, and cloud-native tooling. Evaluate and pilot new features from Databricks releases and partner integrations for modern data stack improvements. Requirements: Bachelor's or master's degree in computer science, Information Technology, or a related field. 8-12 years of hands-on experience in data engineering, with at least 5+ years on Python and Apache Spark. Expertise in building high-throughput, low-latency ETL/ELT pipelines on AWS/Azure/GCP using Python, PySpark, SQL. Excellent hands on experience with workload automation tools such as Airflow, Prefect etc. Familiarity with building dynamic ingestion frameworks from structured/unstructured data sources including APIs, flat files, RDBMS, and cloud storage Experience designing Lakehouse architectures with bronze, silver, gold layering. Strong understanding of data modelling concepts, star/snowflake schemas, dimensional modelling, and modern cloud-based data warehousing. Experience with designing Data marts using Cloud data warehouses and integrating with BI tools (Power BI, Tableau, etc.). Experience CI/CD pipelines using tools such as AWS Code commit, Azure DevOps, GitHub Actions. Knowledge of infrastructure-as-code (Terraform, ARM templates) for provisioning platform resources In-depth experience with AWS Cloud services such as Glue, S3, Redshift etc. Strong understanding of data privacy, access controls, and governance best practices. Experience working with RBAC, tokenization, and data classification frameworks Excellent communication skills for stakeholder interaction, solution presentations, and team coordination. Proven experience leading or mentoring global, cross-functional teams across multiple time zones and engagements. Ability to work independently in agile or hybrid delivery models, while guiding junior engineers and ensuring solution quality Must be able to work in PST time zone. Pay: The wage range for this role takes into account the wide range of factors that are considered in making compensation decisions including but not limited to skill sets; experience and training; licensure and certifications; and other business and organizational needs. The disclosed range estimate has not been adjusted for the applicable geographic differential associated with the location at which the position may be filled. At Fractal, it is not typical for an individual to be hired at or near the top of the range for their role and compensation decisions are dependent on the facts and circumstances of each case. A reasonable estimate of the current range is: $150k - $180k. In addition, you may be eligible for a discretionary bonus for the current performance period. Benefits: As a full-time employee of the company or as an hourly employee working more than 30 hours per week, you will be eligible to participate in the health, dental, vision, life insurance, and disability plans in accordance with the plan documents, which may be amended from time to time. You will be eligible for benefits on the first day of employment with the Company. In addition, you are eligible to participate in the Company 401(k) Plan after 30 days of employment, in accordance with the applicable plan terms. The Company provides for 11 paid holidays and 12 weeks of Parental Leave. We also follow a “free time” PTO policy, allowing you the flexibility to take the time needed for either sick time or vacation. Fractal provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
    $150k-180k yearly 5d ago
  • AWS Data Architect

    Fractal 4.2company rating

    Sunnyvale, CA jobs

    Fractal is a strategic AI partner to Fortune 500 companies with a vision to power every human decision in the enterprise. Fractal is building a world where individual choices, freedom, and diversity are the greatest assets; an ecosystem where human imagination is at the heart of every decision. Where no possibility is written off, only challenged to get better. We believe that a true Fractalite is the one who empowers imagination with intelligence. Fractal has been featured as a Great Place to Work by The Economic Times in partnership with the Great Place to Work Institute and recognized as a ‘Cool Vendor' and a ‘Vendor to Watch' by Gartner. Please visit Fractal | Intelligence for Imagination for more information about Fractal. Fractal is looking for a proactive and driven AWS Lead Data Architect/Engineer to join our cloud and data tech team. In this role, you will work on designing the system architecture and solution, ensuring the platform is scalable while performant, and creating automated data pipelines. Responsibilities: Design & Architecture of Scalable Data Platforms Design, develop, and maintain large-scale data processing architectures on the Databricks Lakehouse Platform to support business needs Architect multi-layer data models including Bronze (raw), Silver (cleansed), and Gold (curated) layers for various domains (e.g., Retail Execution, Digital Commerce, Logistics, Category Management). Leverage Delta Lake, Unity Catalog, and advanced features of Databricks for governed data sharing, versioning, and reproducibility. Client & Business Stakeholder Engagement Partner with business stakeholders to translate functional requirements into scalable technical solutions. Conduct architecture workshops and solutioning sessions with enterprise IT and business teams to define data-driven use cases Data Pipeline Development & Collaboration Collaborate with data engineers and data scientists to develop end-to-end pipelines using Python, PySpark, SQL Enable data ingestion from diverse sources such as ERP (SAP), POS data, Syndicated Data, CRM, e-commerce platforms, and third-party datasets. Performance, Scalability, and Reliability Optimize Spark jobs for performance tuning, cost efficiency, and scalability by configuring appropriate cluster sizing, caching, and query optimization techniques. Implement monitoring and alerting using Databricks Observability, Ganglia, Cloud-native tools Security, Compliance & Governance Design secure architectures using Unity Catalog, role-based access control (RBAC), encryption, token-based access, and data lineage tools to meet compliance policies. Establish data governance practices including Data Fitness Index, Quality Scores, SLA Monitoring, and Metadata Cataloging. Adoption of AI Copilots & Agentic Development Utilize GitHub Copilot, Databricks Assistant, and other AI code agents for Writing PySpark, SQL, and Python code snippets for data engineering and ML tasks. Generating documentation and test cases to accelerate pipeline development. Interactive debugging and iterative code optimization within notebooks. Advocate for agentic AI workflows that use specialized agents for Data profiling and schema inference. Automated testing and validation. Innovation and Continuous Learning Stay abreast of emerging trends in Lakehouse architectures, Generative AI, and cloud-native tooling. Evaluate and pilot new features from Databricks releases and partner integrations for modern data stack improvements. Requirements: Bachelor's or master's degree in computer science, Information Technology, or a related field. 8-12 years of hands-on experience in data engineering, with at least 5+ years on Python and Apache Spark. Expertise in building high-throughput, low-latency ETL/ELT pipelines on AWS/Azure/GCP using Python, PySpark, SQL. Excellent hands on experience with workload automation tools such as Airflow, Prefect etc. Familiarity with building dynamic ingestion frameworks from structured/unstructured data sources including APIs, flat files, RDBMS, and cloud storage Experience designing Lakehouse architectures with bronze, silver, gold layering. Strong understanding of data modelling concepts, star/snowflake schemas, dimensional modelling, and modern cloud-based data warehousing. Experience with designing Data marts using Cloud data warehouses and integrating with BI tools (Power BI, Tableau, etc.). Experience CI/CD pipelines using tools such as AWS Code commit, Azure DevOps, GitHub Actions. Knowledge of infrastructure-as-code (Terraform, ARM templates) for provisioning platform resources In-depth experience with AWS Cloud services such as Glue, S3, Redshift etc. Strong understanding of data privacy, access controls, and governance best practices. Experience working with RBAC, tokenization, and data classification frameworks Excellent communication skills for stakeholder interaction, solution presentations, and team coordination. Proven experience leading or mentoring global, cross-functional teams across multiple time zones and engagements. Ability to work independently in agile or hybrid delivery models, while guiding junior engineers and ensuring solution quality Must be able to work in PST time zone. Pay: The wage range for this role takes into account the wide range of factors that are considered in making compensation decisions including but not limited to skill sets; experience and training; licensure and certifications; and other business and organizational needs. The disclosed range estimate has not been adjusted for the applicable geographic differential associated with the location at which the position may be filled. At Fractal, it is not typical for an individual to be hired at or near the top of the range for their role and compensation decisions are dependent on the facts and circumstances of each case. A reasonable estimate of the current range is: $150k - $180k. In addition, you may be eligible for a discretionary bonus for the current performance period. Benefits: As a full-time employee of the company or as an hourly employee working more than 30 hours per week, you will be eligible to participate in the health, dental, vision, life insurance, and disability plans in accordance with the plan documents, which may be amended from time to time. You will be eligible for benefits on the first day of employment with the Company. In addition, you are eligible to participate in the Company 401(k) Plan after 30 days of employment, in accordance with the applicable plan terms. The Company provides for 11 paid holidays and 12 weeks of Parental Leave. We also follow a “free time” PTO policy, allowing you the flexibility to take the time needed for either sick time or vacation. Fractal provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
    $150k-180k yearly 5d ago
  • AWS Data Architect

    Fractal 4.2company rating

    Santa Clara, CA jobs

    Fractal is a strategic AI partner to Fortune 500 companies with a vision to power every human decision in the enterprise. Fractal is building a world where individual choices, freedom, and diversity are the greatest assets; an ecosystem where human imagination is at the heart of every decision. Where no possibility is written off, only challenged to get better. We believe that a true Fractalite is the one who empowers imagination with intelligence. Fractal has been featured as a Great Place to Work by The Economic Times in partnership with the Great Place to Work Institute and recognized as a ‘Cool Vendor' and a ‘Vendor to Watch' by Gartner. Please visit Fractal | Intelligence for Imagination for more information about Fractal. Fractal is looking for a proactive and driven AWS Lead Data Architect/Engineer to join our cloud and data tech team. In this role, you will work on designing the system architecture and solution, ensuring the platform is scalable while performant, and creating automated data pipelines. Responsibilities: Design & Architecture of Scalable Data Platforms Design, develop, and maintain large-scale data processing architectures on the Databricks Lakehouse Platform to support business needs Architect multi-layer data models including Bronze (raw), Silver (cleansed), and Gold (curated) layers for various domains (e.g., Retail Execution, Digital Commerce, Logistics, Category Management). Leverage Delta Lake, Unity Catalog, and advanced features of Databricks for governed data sharing, versioning, and reproducibility. Client & Business Stakeholder Engagement Partner with business stakeholders to translate functional requirements into scalable technical solutions. Conduct architecture workshops and solutioning sessions with enterprise IT and business teams to define data-driven use cases Data Pipeline Development & Collaboration Collaborate with data engineers and data scientists to develop end-to-end pipelines using Python, PySpark, SQL Enable data ingestion from diverse sources such as ERP (SAP), POS data, Syndicated Data, CRM, e-commerce platforms, and third-party datasets. Performance, Scalability, and Reliability Optimize Spark jobs for performance tuning, cost efficiency, and scalability by configuring appropriate cluster sizing, caching, and query optimization techniques. Implement monitoring and alerting using Databricks Observability, Ganglia, Cloud-native tools Security, Compliance & Governance Design secure architectures using Unity Catalog, role-based access control (RBAC), encryption, token-based access, and data lineage tools to meet compliance policies. Establish data governance practices including Data Fitness Index, Quality Scores, SLA Monitoring, and Metadata Cataloging. Adoption of AI Copilots & Agentic Development Utilize GitHub Copilot, Databricks Assistant, and other AI code agents for Writing PySpark, SQL, and Python code snippets for data engineering and ML tasks. Generating documentation and test cases to accelerate pipeline development. Interactive debugging and iterative code optimization within notebooks. Advocate for agentic AI workflows that use specialized agents for Data profiling and schema inference. Automated testing and validation. Innovation and Continuous Learning Stay abreast of emerging trends in Lakehouse architectures, Generative AI, and cloud-native tooling. Evaluate and pilot new features from Databricks releases and partner integrations for modern data stack improvements. Requirements: Bachelor's or master's degree in computer science, Information Technology, or a related field. 8-12 years of hands-on experience in data engineering, with at least 5+ years on Python and Apache Spark. Expertise in building high-throughput, low-latency ETL/ELT pipelines on AWS/Azure/GCP using Python, PySpark, SQL. Excellent hands on experience with workload automation tools such as Airflow, Prefect etc. Familiarity with building dynamic ingestion frameworks from structured/unstructured data sources including APIs, flat files, RDBMS, and cloud storage Experience designing Lakehouse architectures with bronze, silver, gold layering. Strong understanding of data modelling concepts, star/snowflake schemas, dimensional modelling, and modern cloud-based data warehousing. Experience with designing Data marts using Cloud data warehouses and integrating with BI tools (Power BI, Tableau, etc.). Experience CI/CD pipelines using tools such as AWS Code commit, Azure DevOps, GitHub Actions. Knowledge of infrastructure-as-code (Terraform, ARM templates) for provisioning platform resources In-depth experience with AWS Cloud services such as Glue, S3, Redshift etc. Strong understanding of data privacy, access controls, and governance best practices. Experience working with RBAC, tokenization, and data classification frameworks Excellent communication skills for stakeholder interaction, solution presentations, and team coordination. Proven experience leading or mentoring global, cross-functional teams across multiple time zones and engagements. Ability to work independently in agile or hybrid delivery models, while guiding junior engineers and ensuring solution quality Must be able to work in PST time zone. Pay: The wage range for this role takes into account the wide range of factors that are considered in making compensation decisions including but not limited to skill sets; experience and training; licensure and certifications; and other business and organizational needs. The disclosed range estimate has not been adjusted for the applicable geographic differential associated with the location at which the position may be filled. At Fractal, it is not typical for an individual to be hired at or near the top of the range for their role and compensation decisions are dependent on the facts and circumstances of each case. A reasonable estimate of the current range is: $150k - $180k. In addition, you may be eligible for a discretionary bonus for the current performance period. Benefits: As a full-time employee of the company or as an hourly employee working more than 30 hours per week, you will be eligible to participate in the health, dental, vision, life insurance, and disability plans in accordance with the plan documents, which may be amended from time to time. You will be eligible for benefits on the first day of employment with the Company. In addition, you are eligible to participate in the Company 401(k) Plan after 30 days of employment, in accordance with the applicable plan terms. The Company provides for 11 paid holidays and 12 weeks of Parental Leave. We also follow a “free time” PTO policy, allowing you the flexibility to take the time needed for either sick time or vacation. Fractal provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
    $150k-180k yearly 5d ago
  • AWS Data Architect

    Fractal 4.2company rating

    Fremont, CA jobs

    Fractal is a strategic AI partner to Fortune 500 companies with a vision to power every human decision in the enterprise. Fractal is building a world where individual choices, freedom, and diversity are the greatest assets; an ecosystem where human imagination is at the heart of every decision. Where no possibility is written off, only challenged to get better. We believe that a true Fractalite is the one who empowers imagination with intelligence. Fractal has been featured as a Great Place to Work by The Economic Times in partnership with the Great Place to Work Institute and recognized as a ‘Cool Vendor' and a ‘Vendor to Watch' by Gartner. Please visit Fractal | Intelligence for Imagination for more information about Fractal. Fractal is looking for a proactive and driven AWS Lead Data Architect/Engineer to join our cloud and data tech team. In this role, you will work on designing the system architecture and solution, ensuring the platform is scalable while performant, and creating automated data pipelines. Responsibilities: Design & Architecture of Scalable Data Platforms Design, develop, and maintain large-scale data processing architectures on the Databricks Lakehouse Platform to support business needs Architect multi-layer data models including Bronze (raw), Silver (cleansed), and Gold (curated) layers for various domains (e.g., Retail Execution, Digital Commerce, Logistics, Category Management). Leverage Delta Lake, Unity Catalog, and advanced features of Databricks for governed data sharing, versioning, and reproducibility. Client & Business Stakeholder Engagement Partner with business stakeholders to translate functional requirements into scalable technical solutions. Conduct architecture workshops and solutioning sessions with enterprise IT and business teams to define data-driven use cases Data Pipeline Development & Collaboration Collaborate with data engineers and data scientists to develop end-to-end pipelines using Python, PySpark, SQL Enable data ingestion from diverse sources such as ERP (SAP), POS data, Syndicated Data, CRM, e-commerce platforms, and third-party datasets. Performance, Scalability, and Reliability Optimize Spark jobs for performance tuning, cost efficiency, and scalability by configuring appropriate cluster sizing, caching, and query optimization techniques. Implement monitoring and alerting using Databricks Observability, Ganglia, Cloud-native tools Security, Compliance & Governance Design secure architectures using Unity Catalog, role-based access control (RBAC), encryption, token-based access, and data lineage tools to meet compliance policies. Establish data governance practices including Data Fitness Index, Quality Scores, SLA Monitoring, and Metadata Cataloging. Adoption of AI Copilots & Agentic Development Utilize GitHub Copilot, Databricks Assistant, and other AI code agents for Writing PySpark, SQL, and Python code snippets for data engineering and ML tasks. Generating documentation and test cases to accelerate pipeline development. Interactive debugging and iterative code optimization within notebooks. Advocate for agentic AI workflows that use specialized agents for Data profiling and schema inference. Automated testing and validation. Innovation and Continuous Learning Stay abreast of emerging trends in Lakehouse architectures, Generative AI, and cloud-native tooling. Evaluate and pilot new features from Databricks releases and partner integrations for modern data stack improvements. Requirements: Bachelor's or master's degree in computer science, Information Technology, or a related field. 8-12 years of hands-on experience in data engineering, with at least 5+ years on Python and Apache Spark. Expertise in building high-throughput, low-latency ETL/ELT pipelines on AWS/Azure/GCP using Python, PySpark, SQL. Excellent hands on experience with workload automation tools such as Airflow, Prefect etc. Familiarity with building dynamic ingestion frameworks from structured/unstructured data sources including APIs, flat files, RDBMS, and cloud storage Experience designing Lakehouse architectures with bronze, silver, gold layering. Strong understanding of data modelling concepts, star/snowflake schemas, dimensional modelling, and modern cloud-based data warehousing. Experience with designing Data marts using Cloud data warehouses and integrating with BI tools (Power BI, Tableau, etc.). Experience CI/CD pipelines using tools such as AWS Code commit, Azure DevOps, GitHub Actions. Knowledge of infrastructure-as-code (Terraform, ARM templates) for provisioning platform resources In-depth experience with AWS Cloud services such as Glue, S3, Redshift etc. Strong understanding of data privacy, access controls, and governance best practices. Experience working with RBAC, tokenization, and data classification frameworks Excellent communication skills for stakeholder interaction, solution presentations, and team coordination. Proven experience leading or mentoring global, cross-functional teams across multiple time zones and engagements. Ability to work independently in agile or hybrid delivery models, while guiding junior engineers and ensuring solution quality Must be able to work in PST time zone. Pay: The wage range for this role takes into account the wide range of factors that are considered in making compensation decisions including but not limited to skill sets; experience and training; licensure and certifications; and other business and organizational needs. The disclosed range estimate has not been adjusted for the applicable geographic differential associated with the location at which the position may be filled. At Fractal, it is not typical for an individual to be hired at or near the top of the range for their role and compensation decisions are dependent on the facts and circumstances of each case. A reasonable estimate of the current range is: $150k - $180k. In addition, you may be eligible for a discretionary bonus for the current performance period. Benefits: As a full-time employee of the company or as an hourly employee working more than 30 hours per week, you will be eligible to participate in the health, dental, vision, life insurance, and disability plans in accordance with the plan documents, which may be amended from time to time. You will be eligible for benefits on the first day of employment with the Company. In addition, you are eligible to participate in the Company 401(k) Plan after 30 days of employment, in accordance with the applicable plan terms. The Company provides for 11 paid holidays and 12 weeks of Parental Leave. We also follow a “free time” PTO policy, allowing you the flexibility to take the time needed for either sick time or vacation. Fractal provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
    $150k-180k yearly 5d ago
  • Data Engineer

    Yochana 4.2company rating

    Charlotte, NC jobs

    Job Title: Azure Databricks Engineer (Onsite) Years of Experience: 7- 12 Years Full Time We are seeking a highly skilled and motivated Technical Team Lead with extensive experience in Azure Databricks to join our dynamic team. The ideal candidate will possess a strong technical background, exceptional leadership abilities, and a passion for driving innovative solutions. As a Technical Team Lead, you will be responsible for guiding a team of developers and engineers in the design, development, and implementation of data driven solutions that leverage Azure Databricks. Responsibilities: Lead and mentor a team of technical professionals, fostering a collaborative and high performance culture. Design and implement data processing solutions using Azure Databricks, ensuring scalability and efficiency. Collaborate with cross functional teams to gather requirements and translate them into technical specifications. Oversee the development lifecycle, from planning and design to deployment and maintenance. Conduct code reviews and provide constructive feedback to team members to ensure code quality and adherence to best practices. Stay up to date with industry trends and emerging technologies related to Azure Databricks and data engineering. Facilitate communication between technical and non technical stakeholders to ensure alignment on project goals. Identify and mitigate risks associated with project delivery and team performance. Mandatory Skills: Proven expertise in Azure Databricks, including experience with Spark, data pipelines, and data lakes. Strong programming skills in languages such as Python, Scala, or SQL. Experience with cloud based data storage solutions, particularly Azure Data Lake Storage and Azure SQL Database. Solid understanding of data modeling, ETL processes, and data warehousing concepts. Demonstrated ability to lead technical teams and manage multiple projects simultaneously. Preferred Skills: Familiarity with Azure DevOps for CI/CD processes. Experience with machine learning frameworks and libraries. Knowledge of data governance and compliance standards. Strong analytical and problem solving skills. Excellent communication and interpersonal skills. Qualifications: Bachelor's degree in Computer Science, Information Technology, or a related field. 7 10 years of experience in data engineering, software development, or a related technical role. Proven track record of leading technical teams and delivering successful projects. Relevant certifications in Azure or data engineering are a plus. If you are a passionate Technical Team Lead with a strong background in Azure Databricks and a desire to drive innovation, we encourage you to apply and join our team
    $80k-111k yearly est. 2d ago
  • Data Architect

    Optech 4.6company rating

    Cincinnati, OH jobs

    THIS IS A W2 (NOT C2C OR REFERRAL BASED) CONTRACT OPPORTUNITY REMOTE MOSTLY WITH 1 DAY/MO ONSITE IN CINCINNATI-LOCAL CANDIDATES TAKE PREFERENCE RATE: $75-85/HR WITH BENEFITS We are seeking a highly skilled Data Architect to function in a consulting capacity to analyze, redesign, and optimize a Medical Payments client's environment. The ideal candidate will have deep expertise in SQL, Azure cloud services, and modern data architecture principles. Responsibilities Design and maintain scalable, secure, and high-performing data architectures. Lead migration and modernization projects in heavy use production systems. Develop and optimize data models, schemas, and integration strategies. Implement data governance, security, and compliance standards. Collaborate with business stakeholders to translate requirements into technical solutions. Ensure data quality, consistency, and accessibility across systems. Required Qualifications Bachelor's degree in Computer Science, Information Systems, or related field. Proven experience as a Data Architect or similar role. Strong proficiency in SQL (query optimization, stored procedures, indexing). Hands-on experience with Azure cloud services for data management and analytics. Knowledge of data modeling, ETL processes, and data warehousing concepts. Familiarity with security best practices and compliance frameworks. Preferred Skills Understanding of Electronic Health Records systems. Understanding of Big Data technologies and modern data platforms outside the scope of this project.
    $75-85 hourly 3d ago
  • Azure Data Engineer

    Sharp Decisions 4.6company rating

    Jersey City, NJ jobs

    Title: Senior Azure Data Engineer Client: Major Japanese Bank Experience Level: Senior (10+ Years) The Senior Azure Data Engineer will design, build, and optimize enterprise data solutions within Microsoft Azure for a major Japanese bank. This role focuses on architecting scalable data pipelines, enhancing data lake environments, and ensuring security, compliance, and data governance best practices. Key Responsibilities: Develop, maintain, and optimize Azure-based data pipelines and ETL/ELT workflows. Design and implement Azure Data Lake, Synapse, Databricks, and ADF solutions. Ensure data security, compliance, lineage, and governance controls. Partner with architecture, data governance, and business teams to deliver high-quality data solutions. Troubleshoot performance issues and improve system efficiency. Required Skills: 10+ years of data engineering experience. Strong hands-on expertise with Azure Synapse, Azure Data Factory, Azure Databricks, Azure Data Lake, and Azure SQL. Azure certifications strongly preferred. Strong SQL, Python, and cloud data architecture skills. Experience in financial services or large enterprise environments preferred.
    $77k-101k yearly est. 2d ago
  • Big Data Engineer

    Kellymitchell Group 4.5company rating

    Santa Monica, CA jobs

    Our client is seeking a Big Data Engineer to join their team! This position is located in Santa Monica, California. Design and build core components of a large-scale data platform for both real-time and batch processing, owning key features of big data applications that evolve with business needs Develop next-generation, cloud-based big data infrastructure supporting batch and streaming workloads, with continuous improvements to performance, scalability, reliability, and availability Champion engineering excellence, promoting best practices such as design patterns, CI/CD, thorough code reviews, and automated testing Drive innovation, contributing new ideas and applying cutting-edge technologies to deliver impactful solutions Participate in the full software development lifecycle, including system design, experimentation, implementation, deployment, and testing Collaborate closely with program managers, product managers, SDETs, and researchers in an open, agile, and highly innovative environment Desired Skills/Experience: Bachelor's degree in a STEM field such as: Science, Technology, Engineering, Mathematics 5+ years of relevant professional experience 4+ years of professional software development experience using Java, Scala, Python, or similar programming languages 3+ years of hands-on big data development experience with technologies such as Spark, Flink, SingleStore, Kafka, NiFi, and AWS big data tools Strong understanding of system and application design, architecture principles, and distributed system fundamentals Proven experience building highly available, scalable, and production-grade services Genuine passion for technology, with the ability to work across interdisciplinary areas and adopt new tools or approaches Experience processing massive datasets at the petabyte scale Proficiency with cloud infrastructure and DevOps tools, such as Terraform, Kubernetes (K8s), Spinnaker, IAM, and ALB Hands-on experience with modern data warehousing and analytics platforms, including ClickHouse, Druid, Snowflake, Impala, Presto, Kinesis, and more Familiarity with common web development frameworks, such as Spring Boot, React.js, Vue.js, or Angular Benefits: Medical, Dental, & Vision Insurance Plans Employee-Owned Profit Sharing (ESOP) 401K offered The approximate pay range for this position is between $52.00 and $75.00. Please note that the pay range provided is a good faith estimate. Final compensation may vary based on factors including but not limited to background, knowledge, skills, and location. We comply with local wage minimums.
    $52-75 hourly 4d ago
  • Senior Data Engineer

    Kellymitchell Group 4.5company rating

    Glendale, CA jobs

    Our client is seeking a Senior Data Engineer to join their team! This position is located in Glendale, California. Contribute to maintaining, updating, and expanding existing Core Data platform data pipelines Build tools and services to support data discovery, lineage, governance, and privacy Collaborate with other software and data engineers and cross-functional teams Work with a tech stack that includes Airflow, Spark, Databricks, Delta Lake, Kubernetes, and AWS Collaborate with product managers, architects, and other engineers to drive the success of the Core Data platform Contribute to developing and documenting internal and external standards and best practices for pipeline configurations, naming conventions, and more Ensure high operational efficiency and quality of Core Data platform datasets to meet SLAs and ensure reliability and accuracy for stakeholders in Engineering, Data Science, Operations, and Analytics Participate in agile and scrum ceremonies to collaborate and refine team processes Engage with customers to build relationships, understand needs, and prioritize both innovative solutions and incremental platform improvements Maintain detailed documentation of work and changes to support data quality and data governance requirements Desired Skills/Experience: 5+ years of data engineering experience developing large data pipelines Proficiency in at least one major programming language such as: Python, Java or Scala Strong SQL skills and the ability to create queries to analyze complex datasets Hands-on production experience with distributed processing systems such as Spark Experience interacting with and ingesting data efficiently from API data sources Experience coding with the Spark DataFrame API to create data engineering workflows in Databricks Hands-on production experience with data pipeline orchestration systems such as Airflow for creating and maintaining data pipelines Experience developing APIs with GraphQL Deep understanding of AWS or other cloud providers, as well as infrastructure-as-code Familiarity with data modeling techniques and data warehousing best practices Strong algorithmic problem-solving skills Excellent written and verbal communication skills Advanced understanding of OLTP versus OLAP environments Benefits: Medical, Dental, & Vision Insurance Plans Employee-Owned Profit Sharing (ESOP) 401K offered The approximate pay range for this position is between $51.00 and $73.00. Please note that the pay range provided is a good faith estimate. Final compensation may vary based on factors including but not limited to background, knowledge, skills, and location. We comply with local wage minimums.
    $51-73 hourly 3d ago
  • AWS Data Engineer

    Mindlance 4.6company rating

    McLean, VA jobs

    Responsibilities: Design, build, and maintain scalable data pipelines using AWS Glue and Databricks. Develop and optimize ETL/ELT processes using PySpark and Python. Collaborate with data scientists, analysts, and stakeholders to enable efficient data access and transformation. Implement and maintain data lake and warehouse solutions on AWS (S3, Glue Catalog, Redshift, Athena, etc.). Ensure data quality, consistency, and reliability across systems. Optimize performance of large-scale distributed data processing workflows. Develop automation scripts and frameworks for data ingestion, transformation, and validation. Follow best practices for data governance, security, and compliance. Required Skills & Experience: 5-8 years of hands-on experience in Data Engineering. Strong proficiency in Python and PySpark for data processing and transformation. Expertise in AWS services - particularly Glue, S3, Lambda, Redshift, and Athena. Hands-on experience with Databricks for building and managing data pipelines. Experience working with large-scale data systems and optimizing performance. Solid understanding of data modeling, data lake architecture, and ETL design principles. Strong problem-solving skills and ability to work independently in a fast-paced environment. “Mindlance is an Equal Opportunity Employer and does not discriminate in employment on the basis of - Minority/Gender/Disability/Religion/LGBTQI/Age/Veterans.”
    $85k-113k yearly est. 4d ago

Learn more about Knowesis jobs

Most common jobs at Knowesis