Top 50 Data Engineer Skills

Below we've compiled a list of the most important skills for a Data Engineer. We ranked these skills by the percentage of Data Engineer resumes they appeared on; for example, 8.5% of Data Engineer resumes contained PL/SQL as a skill. Let's find out what skills a Data Engineer actually needs to be successful in the workplace.

These Are The Most Important Skills For A Data Engineer

1. PL/SQL
High Demand
Here's how PL/SQL is used in Data Engineer jobs:
  • Developed database triggers, packages, functions, and stored procedures using PL/SQL and maintained the scripts for various data feeds.
  • Have extensive experience in developing efficient PL/SQL code for OLTP environment and ETL code for the Data Warehouse environment.
  • Worked on Programming using PL/SQL, Stored Procedures, Functions, Packages, Database triggers for Oracle and SQL.
  • Created database objects such as stored procedures, views, and materialized views in Oracle 10g using PL/SQL.
  • Worked on HRIS Data warehouse and wrote new PL/SQL functions to add Stock vest logic for different countries.
  • Developed suite of reports used by Domain Experts, requiring complex SQL and PL/SQL.
  • Major technologies: Oracle, PL/SQL, XML, J2EE, Java, Agile/Scrum methodologies
  • Developed PL/SQL, Perl scripts and custom ETL programs for NOC Data Warehouse.
  • Created various PL/SQL stored procedures for dropping and recreating indexes on target tables.
  • Coded and implemented PL/SQL packages to perform application security and batch job scheduling.
  • Developed modules in PL/SQL and scheduled them to run using BMC scheduling software.
  • Developed PL/SQL procedures, functions and packages for automating the monthly reports.
  • Involved in developing high performance database applications using SQL and PL/SQL.
  • Developed and tested stored procedures, functions and packages in PL/SQL.
  • Implemented object-oriented programming in PL/SQL by using Object Types.
  • Designed and developed Test Harness using Perl and PL/SQL.
  • Used PL/SQL as a QA tool for all changes.
  • Managed and cleaned data, removed unusable traces, and corrected or removed errors in flight information with Oracle PL/SQL.
  • Led the development of new ETL processes using PL/SQL, Informatica and Oracle Warehouse Builder.
  • Executed this through scripts, PL/SQL code and sync using Informatica.
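
To make the PL/SQL bullets above more concrete, here is a short, hypothetical sketch, written in Python with the oracledb driver; the connection details, table names and procedure name are placeholders, not taken from any resume. It creates and then calls a stored procedure for a nightly data feed:

    # Minimal sketch: create and call a PL/SQL stored procedure from Python.
    # Credentials, DSN, and object names below are illustrative placeholders.
    import datetime
    import oracledb

    conn = oracledb.connect(user="etl_user", password="***", dsn="dwhost/ORCLPDB")
    cur = conn.cursor()

    # A simple procedure that moves one day of staged rows into a fact table.
    cur.execute("""
        CREATE OR REPLACE PROCEDURE load_daily_feed(p_feed_date IN DATE) AS
        BEGIN
            INSERT INTO sales_fact (sale_id, amount, sale_date)
            SELECT stg.sale_id, stg.amount, stg.sale_date
            FROM   sales_staging stg
            WHERE  stg.sale_date = p_feed_date;
            COMMIT;
        END;
    """)

    # Call it the way a scheduled batch job would.
    cur.callproc("load_daily_feed", [datetime.date(2024, 1, 31)])
    conn.close()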


2. Database
High Demand
Here's how Database is used in Data Engineer jobs:
  • Developed normalized Logical and Physical database models to design OLTP system.
  • Inspected the infrastructure layout and configuration across Network, SAN storage, Database Server, and SOA J2EE Application servers.
  • Created SQL codes from data models and interacted with DBA's to create development, testing and production database.
  • Reduced a production database's size from over 100 GB to less than 10 GB by improving data management
  • Load data into clients' data validation and training databases to perform data comparison and updates.
  • Extracted data from various sources relational databases like SQL Server, Oracle, and PostgreSQL.
  • Created test fitness database with 6 months data and 50,000 users to test fitness API.
  • Added database objects (like indexes, partitions in oracle database) for performance.
  • Checked to see if the data models and the databases are in sync.
  • Involved in the design review of the finalized database model.
  • Reviewed and developed data models and database designs with Architects.
  • Manage and maintain various databases for network.
  • Created a database discovery process using SCCM.
  • Used Pentaho to migrate data between different databases and applications.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI games team.
  • Architected CRM Database System to store membership data and analyze member's engagement level with the chapter.
  • Research data in non-relational databases using Keen IO for javascript events in the company's websites.
  • Designed and modeled a geodatabase as well as a geometric network.
  • Used Visio to engineer, design, or view database layouts.
  • Transfer data from Hive to Oracle or NoSQL database Vertica.


3. ETL
High Demand
Here's how ETL is used in Data Engineer jobs:
  • Designed the ETL architecture and produced architecture and mapping documents.
  • Modeled the mart design and created ETL to store the data in multiple pivots along with aggregated tables for faster access.
  • Designed and developed ETL scripts to process past chat records to load into Agent IQ's data warehouse.
  • Conducted peer design and code reviews and prepared checklists for standards, best practices, and ETL procedures.
  • Performed ETL and classification of districts into areas of high and low energy consumption rates.
  • Identify multiple sources of Physician Master Data and develop ETL for integration into MDM Hub.
  • Developed process to read files and performed ETL through spark RDD and data frame.
  • Established best practice and standard for ETL process documents.
  • Coordinate the efforts of the ETL Development team.
  • Develop and manage ETL processes.
  • Developed the ETL pipelines, created BO universes, implemented extraction schedules and created test procedures for the Salesforce Service group.
  • Worked on Google Cloud Platform to build pipelines in BigQuery, handled ETL development tasks, and ensured pipeline workflows ran correctly.
  • Documented routine tasks like ETL process flow diagrams, mapping specs, source-target matrix, and unit test documentation.
  • Identified data sources and performed ETL tasks to bring in the data from disparate source systems.
  • Worked with Business Developer team in generating customized reports and ETL workflows in Data Stage.
  • Prepared ETL scheduled jobs and integrated them with the existing codebase.
  • Used Pentaho for execution and scheduling of ETL jobs.
  • Design ETL processes using Informatica tool.
  • Created new shell executable files to execute ETL jobs and added new crontab entries to schedule ETLs.
  • Worked with unstructured/semi-structured data in terabytes size Responsible for ETL process using Pig, Sqoop, Flume and Oozie.
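
If you are new to ETL, the pattern behind most of these bullets is small: extract records from a source, clean and reshape them, and load them into a target. A minimal, self-contained Python sketch (the file and table names are made up for illustration):

    import csv
    import sqlite3

    def run_etl(source_csv: str, target_db: str) -> None:
        conn = sqlite3.connect(target_db)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS orders (order_id TEXT, amount REAL, region TEXT)"
        )
        rows = []
        with open(source_csv, newline="") as f:
            for rec in csv.DictReader(f):               # extract
                if not rec["order_id"]:                 # transform: drop bad records
                    continue
                rows.append((rec["order_id"].strip(),
                             float(rec["amount"]),
                             rec["region"].upper()))
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)   # load
        conn.commit()
        conn.close()

    run_etl("daily_orders.csv", "warehouse.db")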


4. Hadoop
High Demand
Here's how Hadoop is used in Data Engineer jobs:
  • Deployed Hadoop Cluster in Fully Distributed and Pseudo-distributed modes.
  • Involved in installing Hadoop Ecosystem components.
  • Integrated Hadoop into traditional ETL, accelerating the extraction, transformation, and loading of massive structured and unstructured data.
  • Involved in Capacity planning, Hardware planning, Installation, Performance tuning of Hadoop Ecosystem.
  • Build Hadoop Data Lake using SQOOP, HIVE, and PIG for all reporting requests.
  • Installed and configured Apache Hadoop, Hive and Pig environment on the prototype server.
  • Create ETL jobs and implement Machine Learning models in Hadoop using Java & Cascading.
  • Involved in running Hadoop streaming jobs to process terabytes of text data.
  • Developed ETL routines to move 400 TB data from Oracle to Hadoop.
  • Developed Scripts and Batch Job to schedule various Hadoop Program.
  • Experienced in managing and reviewing Hadoop log files.
  • Used Sqoop to import the data from RDBMS to Hadoop Distributed File System (HDFS).
  • Worked with engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Implemented an end to end Hadoop System using HIVE, Spark, Hadoop and Pig.
  • Automated workflows using shell scripts to pull data from various databases into Hadoop.
  • Monitored and imported preexisting Oozie work flow for pig and Hadoop jobs.
  • Deployed the Big Data Hadoop application using Talend on cloud AWS.
  • Trained on Hadoop tool like Hive, PIG and Sqoop.
  • Worked with data ingestion teams for data migration from EDW to Hadoop using Falcon, Oozie and Sqoop.
  • Architected and Implemented Cloudera Hadoop DataLake for Enterprise Sales data.
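
Several of the bullets above mention Hadoop streaming jobs. As a rough illustration (the jar location and HDFS paths vary by distribution and are only placeholders here), submitting one from Python looks like this:

    import subprocess

    # Submit a Hadoop Streaming job that runs Python scripts as mapper/reducer.
    cmd = [
        "hadoop", "jar", "/usr/lib/hadoop-mapreduce/hadoop-streaming.jar",
        "-files", "mapper.py,reducer.py",
        "-mapper", "python3 mapper.py",
        "-reducer", "python3 reducer.py",
        "-input", "/data/raw/weblogs/",
        "-output", "/data/processed/weblog_counts/",
    ]
    subprocess.run(cmd, check=True)   # raises CalledProcessError if submission fails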


5. HDFS
High Demand
Here's how HDFS is used in Data Engineer jobs:
  • Used Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.
  • Involved in loading data from UNIX file system to HDFS using Flume and Kettle and HDFS API.
  • Maintain System integrity of all sub-components (primarily HDFS, MR and Flume).
  • Transfer of data between HDFS to database can be achieved using Spark-submit job.
  • Developed hive queries and UDF's to analyze/transform the data in HDFS.
  • Involved in HDFS maintenance and loading of structured and unstructured data.
  • Involved in loading data from UNIX file system to HDFS.
  • Loaded data from Oracle database into HDFS.
  • Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
  • Captured the weblog, call log streaming data from Application servers and moved it into HDFS using Flume.
  • Involved in creating Data Lake by extracting customer's Big Data from various data sources into Hadoop HDFS.
  • Created reports for the BI team using Sqoop to export data into HDFS and Hive.
  • Utilized Sqoop to import data from Oracle to HDFS.
  • Used NoSQL, HDFS, MySQL, Oracle, Hive, Oozie, Java, Python, Tableau & UNIX scripting.
  • Applied the machine learning KMeans algorithm on the preprocessed data to predict the customer behavior and stored the result in HDFS.
  • Involved in importing data from Oracle tables to HDFS and Hbase tables using Sqoop.
  • Used apache Spark connector to send data from HDFS/hive to NO-SQL database Vertica.
  • Used Sqoop to import and export data from HDFS to RDBMS and vice-versa.
  • Imported data frequently from MySQL to HDFS using Sqoop.
  • Utilized Sqoop, MapReduce, HDFS, Pig, Oozie, Spark, QlikView, Hue & HAWQ
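
Loading files from a local UNIX file system into HDFS, as many of these bullets describe, often comes down to the hdfs command-line client. A small sketch (the paths are placeholders):

    import subprocess

    local_path = "/var/log/app/2024-01-31.log"
    hdfs_dir = "/data/raw/app_logs/2024-01-31/"

    # Create the target directory, then copy the file in (overwriting if present).
    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir], check=True)
    subprocess.run(["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir], check=True)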

6. Python
High Demand
Here's how Python is used in Data Engineer jobs:
  • Developed fuzzy-text matching program in Python to identify and eliminate redundancy on county instruments.
  • Used Python to scrub and transform medical procedure data, and loaded clean data into ElasticSearch.
  • Created Python programs to scrape text from different websites containing scanned images of county records.
  • Gathered data on select companies by creating unique web-scraping programs in Ruby and Python.
  • Extracted feeds from social media using python and shell scripting.
  • Extracted & loaded data from unstructured flat files using Python.
  • Automated some of Batch jobs using Python Scripts.
  • Used HDP 2.3 and CDH 4.x distributions, Hive 0.x, 1.x, Spark 1.6.1/2.x, Oracle DB, Python 2.x
  • Developed rule based analytics on Time Series sensor data using python to detect anomaly for thermal power generation assets.
  • Develop python scripts and T-SQL stored procedures and functions for data file imports, integrating data from different systems.
  • Experience in writing job flows in Python which integrate developed Java code and shell scripts that run on an Azkaban server.
  • Authored open-source Python library, KettleParser, for parsing Pentaho Data Integration (Kettle) transformation files.
  • Create and maintain in-house data pipeline using python Celery, Flower, RabbitMQ, and Redshift.
  • Create a custom logging framework in Python for all applications running on Hadoop cluster.
  • Performed importing data from various sources to the Cassandra cluster using Java/Python APIs.
  • Optimized searching and analyzing algorithms, reducing the time needed by 37%; developed a Suspicious Passenger Detection Model in Python.
  • Created Oozie workflows to run Hive, Unix shell scripts, MapReduce and Python programs.
  • Created process to develop and to deploy Dockerized Python RESTful services.
  • PROJECT: Customizable Dashboard Tools and Algorithms: R Shiny Apps, Hadoop, Python, SAS, Tableau.
  • Worked on data load from various sources i.e., Oracle, Cassandra, Mongodb, Hadoop using Sqoop and Python script.
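
Many of these bullets boil down to "fetch messy text, scrub it, and move it on." A tiny illustrative example (the URL is hypothetical) using the requests library:

    import re
    import requests

    def scrape_text(url: str) -> str:
        html = requests.get(url, timeout=30).text
        text = re.sub(r"<[^>]+>", " ", html)        # crude tag stripping
        return re.sub(r"\s+", " ", text).strip()    # collapse whitespace

    print(scrape_text("https://example.com/county-records/page1"))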


7. Sqoop
High Demand
Here's how Sqoop is used in Data Engineer jobs:
  • Exported analyzed data to downstream systems using Sqoop for generating end-user reports, Business Analysis reports and payment reports.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
  • Experience in importing and exporting terabytes of data using Sqoop from Relational Database Systems to HDFS.
  • Transformed and aggregated data for analysis by implementing work flow management of Sqoop and Hive scripts.
  • Experience in injecting data from multiple data sources to HDFS and Hive Tables using Sqoop.
  • Extracted files from Cassandra through Sqoop and placed in HDFS for further processing.
  • Configured periodic incremental imports of data from MySQL into HDFS using Sqoop.
  • Designed and developed a framework to automate the creation of Sqoop jobs.
  • Utilized Sqoop to import and output data between HDFS and Oracle database.
  • Used Sqoop extensively to import data from RDBMS sources into HDFS.
  • Exported the patterns analyzed back to MYSQL using Sqoop.
  • Created Sqoop import and export jobs for multiple sources.
  • Used Sqoop to transfer data between database and HDFS.
  • Played a substantial role in validating and loading data using Hive, Sqoop and Hadoop features from disparate file formats.
  • Worked on pulling the data from oracle databases into the hadoop cluster using the sqoop import.
  • Worked on data load from various sources i.e., Oracle, MySQL, Hadoop using Sqoop.
  • Imported all the Customer specific personal data to Hadoop using SQOOP component of Hadoop.
  • Developed scripts to automate the creation Sqoop jobs for various workflows.
  • Automated the Sqoop jobs in a timely manner using Shell Scripting.
  • Configured Sqoop and developed scripts to extract data from MySQL into HDFS; hands-on experience with productionizing Hadoop applications.
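
A typical incremental MySQL-to-HDFS import of the kind listed above is a single Sqoop command. Here it is wrapped in Python for scheduling; the host, table, credentials file and paths are placeholders:

    import subprocess

    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:mysql://dbhost/sales",
        "--username", "etl_user",
        "--password-file", "/user/etl/.mysql.pwd",
        "--table", "orders",
        "--target-dir", "/data/raw/orders",
        "--incremental", "append",        # only pull rows added since the last run
        "--check-column", "order_id",
        "--last-value", "1000000",
        "--num-mappers", "4",
    ]
    subprocess.run(cmd, check=True)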


8. Analytics
High Demand
Here's how Analytics is used in Data Engineer jobs:
  • Designed ETL processes for Data migration between Data warehouses to improve business intelligence, analytics and reports across other factory applications.
  • Used advanced analytics techniques such as regression, decision trees and optimization to design algorithms for advertisement targeting.
  • Synthesized and advocated insights, and recommendations from data analytics and modeling.
  • Developed real time data clustering sub-system to implement unsupervised analytics.
  • Designed and developed real-time analytics processing tools, as well as batch processes to analyze approximately 7 million signals each day.
  • Worked with product and analytics team to test hypothesis, answer key questions, and create reports and dashboards.
  • Worked on Recruiting Analytics (RA), a dimensional model designed to analyze the recruiting data in Amazon.
  • Experience working with several windowing and Analytics functions of Hive for aggregating data of a specific range.
  • Use various data science machine learning libraries to develop applications used for predictive analytics.
  • Support analytics group by adding additional fields and processing as needed.
  • Build and present proposals for Analytics work, lead project teams, interface with both business and analytics sides of clients.
  • Developed BI analytics and reporting using SAS BI Tools and Cognos (v10) for Marketing and Finance team.
  • Develop and deliver the data infrastructure required to support needs of predictive modeling and analytics for Claims Analytics.
  • Developed BI analytics and reporting using Microstrategy for Sales, Marketing, and Finance.
  • Praise award for design and implementation of log analytics pipeline at Citi.
  • Implemented Portfolio analytics using HIVE, Spark, Hadoop and Pig.
  • Worked as an analytics engineer with Lehigh University, Dept.
  • Developed integrations between the bigdata platform and proprietary telco analytics products.
  • Project involved implementing the Sales Data Analytics for ecommerce platform for Netherlands, US, Canada multichannel eCommerce applications.
  • Automated basic reports through different kind of APIs like Google Analytics, Youtube Analytics, ComScore, DoubleClick, Facebook.


9. Web Application
High Demand
Here's how Web Application is used in Data Engineer jobs:
  • Created web applications that facilitate running simulations and fine-tuning various traffic model parameters and interacting with the results directly.
  • Designed and implemented web application security based on Basic Authentication and AES-256.
  • Developed web applications using JSP (software developer position, 06/2005-09/2005).
  • Created PL/SQL packages to be called by Web Application to process and store data for Consumer Data Privacy Preference.
  • Collaborated with Web Application Engineers, used Python scripts to load the data into AWS Cloud Cassandra database.
  • Used AWS Elastic Beanstalk service for deploying and scaling web applications and services developed with Java, Python.
  • Worked closely with the web application development team to develop the user interface for the data movement framework.
  • Designed AWS Cloud Formation templates to create and ensure successful deployment of Web applications and database templates.
  • Developed web application using JSP custom tag libraries, Struts Action classes and Action.
  • Supported and taught co-workers SQL and Python strategies using web applications and knowledge.
  • Designed and developed SQL database and C#/ASP.Net web application for lease tracking.
  • Enhanced and created over 20 PL/pgSQL analytical functions used by our customer-facing web application.
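
A data engineer's "web application" work is often a thin service over a database. A minimal sketch with Flask (the schema and route are invented for illustration):

    import sqlite3
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/leases/<int:lease_id>")
    def get_lease(lease_id: int):
        conn = sqlite3.connect("leases.db")
        row = conn.execute(
            "SELECT lease_id, tenant, monthly_rent FROM leases WHERE lease_id = ?",
            (lease_id,),
        ).fetchone()
        conn.close()
        if row is None:
            return jsonify({"error": "not found"}), 404
        return jsonify({"lease_id": row[0], "tenant": row[1], "monthly_rent": row[2]})

    if __name__ == "__main__":
        app.run(port=5000)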


10. Cloud
High Demand
Here's how Cloud is used in Data Engineer jobs:
  • Work as software developer in Dev-Ops Model to support Data Platform Engineering Service in Private Cloud.
  • Lead Ferret is a cloud-based data management platform loaded with 30+ million B2B companies and contacts.
  • Manage a team of subject matter experts for Cloud Security and Data Loss Protection.
  • Monitor and evaluate the industry trends and directions in the Cloud technologies and tools.
  • Evaluate Puppet framework and tools to automate the cloud deployment and operations.
  • Team builds big data platforms to create clusters in the clouds.
  • Work supported a large scale data framework using cloud technologies.
  • Design and Implement AWS environment end-to-end solution for Cloud Migration.
  • Developed a public private cloud platform decision model.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Deployed Cloudera Manager to improve cluster manageability.
  • Interact with the Vendor (Cloudera)for any Technical issues.
  • Monitored continuously and managed the hadoop cluster using cloudera manager.
  • Debug and solve the major issues with Cloudera manager by interacting with the Cloudera team.
  • Implemented "infrastructure as code" practice for resource provisioning at AWS using CloudFormation.
  • Supported Aretta Cloud PBX, and Broadsoft Virtual Cloud Phone Systems and Networks.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Upgraded the Cloudera Hadoop ecosystems in the cluster using Cloudera distribution packages.
  • Configured 10 node Cloudera Hadoop Cluster as a part of POC.
  • Worked extensively on creating on-demand Hadoop clouds for development environments using virtualization solutions like Docker and Vagrant.
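
Cloud work in these roles frequently starts with staging data in object storage. A hedged AWS example (the bucket and key are placeholders, and boto3 picks up credentials from the environment):

    import boto3

    s3 = boto3.client("s3")
    s3.upload_file(
        Filename="/tmp/daily_extract.parquet",
        Bucket="example-data-lake",
        Key="raw/sales/2024-01-31/daily_extract.parquet",
    )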


11. Data Warehouse
High Demand
Here's how Data Warehouse is used in Data Engineer jobs:
  • Prioritize, address and resolve support requests regarding data warehouse/reporting systems, coordinating with other IT departments when necessary.
  • Develop and implement data warehouse solutions while observing industry best practices and established standard operating procedures.
  • Created several SSIS packages in SQL Server 2012 environment to load data from OLTP to staging and to Data Warehouse incrementally.
  • Managed SAP ERP/SAP BW/SAP BPC/ SAP BO and Enterprise Data warehouse implementation for over 10 years.
  • Worked with multiple end to end implementations of Enterprise Data Warehouse projects and Data Migration projects.
  • Established and Managed Global Enterprise Data warehouse and Data Management life cycle for around 9 years.
  • Managed and created all Data Warehouse loads using SAP Business Objects and Microsoft SQL Server.
  • Created the entire infrastructure in data warehouse (DW) that is needed for scoring.
  • Review, repair, and enhance existing data warehouse structures and reporting systems
  • Developed new fact tables and dimensions for Data Warehouse.
  • Design data warehouse for reporting.
  • Architected/Implemented data warehouse platform utilizing feeds from all Brokerage companies.
  • Performed comparative performance benchmarking and tuning of Data Warehouse components.
  • Designed and developed a multi-terabyte data warehouse.
  • Implemented an efficient data warehouse whose processing speed is 7.3 times faster than the second-fastest data warehouse.
  • Work in conjunction with Business Analyst, DBA, Data Architects on the backend data warehouse reporting solutions.
  • Investigated data technologies such as AWS Redshift, Azure Data Warehouse, ScaleDB, MySQL, and PostgreSQL.
  • Performed data modelling in Data Warehouse after discussions with the team including the architect.
  • Led a team of ETL engineers in the full lifecycle design and deployment of the Hotwire enterprise data warehouse.
  • Involved in designing and modelling the data architecture of the Data Warehouse initiatives.


12. Big Data
High Demand
Here's how Big Data is used in Data Engineer jobs:
  • Leveraged expertise in Big Data technologies.
  • Lead data architect for Sony's Online Technology Group, responsible for high transaction relational databases and Big Data reporting environments.
  • Identify and spearhead Big Data initiatives to support marketing, merchandise, supply chain and logistics opportunities at an enterprise level.
  • Implemented POC by comparing SPARK with Hive on big data sets by performing aggregations and observing time responses.
  • Be on the leading edge to provide cutting edge big data machine learning solutions to the end clients.
  • Perform ETL process (extracting source data, cleansing rules, loading process) on Big Data.
  • Provide technical assistance for selection and integrating of the new big data technologies.
  • Create and change system which leverage business intelligence and big data technologies.
  • Developed Map Reduce programs for some refined queries on big data.
  • Installed and configured Big Data clusters on Open Stack Tenant.
  • Lead architect responsible for Endgame's Big Data Platform.
  • Mentor junior members of the Big Data team.
  • Worked on analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive, Cassandra database and SQOOP.
  • Worked on ETL, data warehouse and OLAP technologies for big data analytics and troubleshoot during the ETL process.
  • Experience with installation, configuration, supporting and managing of Big Data and underlying infrastructure of Hadoop Cluster.
  • Worked independently on different POCs for Insurance, Media and Retail Clients using Big Data technology.
  • Implemented Big Data, Cloud Computing and Data Analytics solutions for multiple end clients.
  • Led the re-architecture of a security product onto cloud-based Big Data platforms.
  • Export data into HDFS format, analyzed Big data using Hadoop environment.
  • Used Apache Solr to search for specific products each cycle for the business Proof of Concept in Pentaho for Big Data.


13. Procedures
High Demand
Here's how Procedures is used in Data Engineer jobs:
  • Created and modified indexes to optimize query performance of several stored procedures supporting front end applications and reports.
  • Managed connectivity using JDBC for querying/inserting & data management including triggers and stored procedures.
  • Analyze business procedures and problems to create and present thorough project requirements documentation.
  • Supported efficient data storage and manipulation by creating complex stored procedures and functions.
  • Developed and implemented testing procedures.
  • Developed processes and procedures for annual client audit, saving the client 8 hours per month and 72 hours per year.
  • Used the machine learning libraries of Mahout to perform advanced statistical procedures like clustering and classification to determine the usage trends.
  • Created complex stored procedures with T-SQL and D-SQL to profile the data and discover potential anomalies and missing data values.
  • Converted complex business logic into SQL Stored Procedures and user-defined functions to achieve functionality required by the UI team.
  • Put together processes, procedures, object oriented designs and implementation documents using UML.
  • Designed Database tables, created Packages by using Stored Procedures, Functions, Exceptions.
  • Participated in performance tuning and management for stored procedures, tables and database servers.
  • Design and development of Oracle packages, procedures, programming and data structures.
  • Documented the systems processes and procedures for future references.
  • Optimized SQL Server stored procedures.
  • Generated reports using SQL procedures.
  • Identified cable rack details from drawings; developed a technical white paper ensuring emergency recovery procedures during network element upgrades.
  • Developed stored procedures and functions to implement necessary business logic for interface and parameterized reports.
  • Designed stored procedures to perform data cleansing in pre-staging database, and application of business rules before loading data marts.
  • Write and re-write stored procedures in SQL Server and Oracle (packages and functions).


14. HBase
High Demand
Here's how HBase is used in Data Engineer jobs:
  • Developed Use cases and Technical prototyping for implementing PIG, HDP, HIVE and HBASE.
  • Perform the coding and testing on Load, Extract the data into the HBASE database.
  • Performed Hive-Hbase Integration for providing some additional functionalities like aggregation, sorting etc.
  • Analyzed the alternatives for NOSQL Data stores and intensive documentation for HBASE vs. Accumulo data stores.
  • Involved in adding huge volume of data in rows and columns to store in HBase.
  • Involved in designing and developing HBase tables and storing aggregated data from Hive table.
  • Created HBase tables to store variable data formats of data coming from different applications.
  • Imported the huge sized TSV files to HBase for storing, using bulk load.
  • Parse process and persist the received message to the HBase.
  • Developed Kafka consumer to streaming data from Kafka to Hbase.
  • Created data models in HBase for customer data.
  • Experience in NoSQL databases such as HBase.
  • Implemented an IoT data store in HBase.
  • Loaded the created HFiles into HBase for faster access of large customer base without taking performance hit.
  • Developed low latency response queries by using Hbase-phoenix layer to address the transactional needs of other teams.
  • Integrated NoSQL database like Hbase with Map Reduce to move bulk amount of data into HBase.
  • Develop a Restful API to provide access to data in Solr, HBase and HDFS.
  • Manage HBase cluster on Openstack, provide production support on Kafka producer & consumer.
  • Experience in streaming the data between Kafka and databases like HBase and Elasticsearch.
  • Consume and deserialize data from Kafka and push to HBase.
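
For readers who have not touched HBase, the put/get pattern these bullets describe looks roughly like this with the happybase client (the host, table and column family are placeholders):

    import happybase

    connection = happybase.Connection("hbase-master.example.com")
    table = connection.table("customer_profiles")

    # HBase stores raw bytes; column qualifiers live under a column family ("cf").
    table.put(b"customer-0001", {
        b"cf:name": b"Ada Lovelace",
        b"cf:segment": b"premium",
    })
    print(table.row(b"customer-0001"))
    connection.close()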


15. SQL
Average Demand
Here's how SQL is used in Data Engineer jobs:
  • Developed SparkSQL automation components and responsible for modifying java component to directly connect to thrift server.
  • Monitored performance and optimized SQL queries for maximum efficiency.
  • Support 200+ SQL Server instances on a team providing level three or higher support for an energy client.
  • Performed Database Health Checks to make sure SQL Server environments are in line with Microsoft Best Practices.
  • Created general troubleshooting and recovery manuals for Oracle, SQL Server, and NetApp environments.
  • Use of T-SQL heavily in order to handle some daily processes and some task.
  • Developed Hive queries in Spark-SQL for analysis and processing the data.
  • Involved in integrating hive queries into spark environment using SparkSql.
  • Performed SQL code reviews for multiple cloud based applications.
  • Converted the regression algorithm written in SAS to SQL.
  • Implemented SQL jobs to alert when problems were detected.
  • Worked with Spark Context, Spark-SQL, DataFrames, Pair RDD's, Spark Streaming.
  • Use of SQLRAP for policies that will be rolled out.
  • Constructed both simple and complex PostgreSQL queries to pull data and create views or rollup tables for Marketing and Engineering teams.
  • Analyzed requirements; create stored procedure in T-SQL to calculate scores for nearly 800,000 listings on a daily basis.
  • Exported the result set from HIVE to MySQL using Kettle (Pentaho data-integration tool).
  • Worked on SQL statements in checking the validity of the Backend.
  • Generate simple Dashboard using Direct SQLs in OBIEE.
  • Come up with ways to use Oracle's PL/SQL to handle some of the business needs (application needs).
  • Developed an environmental search engine using PHP5, Java, Lucene/SOLR, Apache and MySQL.
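
The day-to-day SQL in these bullets is mostly queries, views and rollups. A generic, runnable example against SQLite (the listings table is invented for illustration):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE listings (listing_id INTEGER, region TEXT, score REAL);
        INSERT INTO listings VALUES (1, 'west', 0.91), (2, 'west', 0.72), (3, 'east', 0.88);
        -- A rollup view of the kind marketing or engineering teams query directly.
        CREATE VIEW region_scores AS
            SELECT region, COUNT(*) AS listings, AVG(score) AS avg_score
            FROM listings GROUP BY region;
    """)
    for row in conn.execute("SELECT * FROM region_scores ORDER BY region"):
        print(row)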


16. Oozie
Average Demand
Here's how Oozie is used in Data Engineer jobs:
  • Configured the daily batch job using the oozie/Amazon data pipeline.
  • Scheduled the batch loading jobs using oozie.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs, which run independently with time and data availability.
  • Created Qliksense and Apache Zeppelin Dashboards of Oozie and Falcon data ingestion jobs for efficient monitoring.
  • Developed dynamic workflow engine using oozie and java which schedules most of our batch jobs.
  • Involved in scheduling Oozie workflow engine to run multiple Hive and pig jobs.
  • Developed OOZIE workflows for automating Sqoop, Spark and Hive scripts.
  • Created Oozie job to design the workflow and run it periodically.
  • Automated scheduling of various machine learning workflows using Oozie and Hue.
  • Worked with workflow scheduling and monitor tools using Oozie.
  • Used Oozie scheduler system to automate the pipeline workflow.
  • Developed oozie coordinator workflow that runs hourly.
  • Designed Oozie workflows for job automation.
  • Used Oozie for workflow management.
  • Involved in executing various Oozie workflows and automating parallel Hadoop MapReduce jobs.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Create Oozie coordinators & workflows to schedule jobs on Hadoop cluster.
  • Integrated hive, Pig, Sqoop with Oozie workflows for ETL flow.
  • Developed Oozie workflow for scheduling and orchestrating the ETL processes Implemented authentication using Kerberos and authentication using Apache Sentry.
  • Worked on installing Hadoop Ecosystem components such as Sqoop, Pig, Hive, Oozie, and Hcatalog.


17. Unix
Average Demand
Here's how Unix is used in Data Engineer jobs:
  • Perform quality assurance for data input/output on all orders including data manipulation in UNIX environment if necessary.
  • Developed a frame work to handle loading and transform large sets of unstructured data from UNIX system to HIVE tables.
  • Authored several UNIX scripts to support data sharing between ZDNET and CNET Data Warehouse Systems
  • Applied UNIX administration skills whenever it is required to access from putty and terminal.
  • Worked on UNIX shell scripts to handle Job Control Process.
  • Tune application and query performance using UNIX and SQL tools.
  • Created UNIX shell scripts to run single lines and batch scripts to make changes to the live network.
  • Consulted with Elkay Manufacturing to build a data mart from a PeopleSoft ERP database using Oracle under Unix.
  • Created parameter files with Global, mapping, session and workflow variables using UNIX scripts.
  • Supported MS Window NT, Network Printer, UNIX switches, and Telegence billing system.
  • Involved in migrating hardware and software from UNIX/ Informix to Windows/ SQL Server 2005.
  • Diagnosed and resolved issues through reading Tivoli Workload Scheduler logs retrieved through UNIX.
  • Developed Dollar-U Unix scripts to automate the execution of Informatica ETL jobs.
  • Converted existing UNIX scripts and SAS programs to Ab Initio Graphs.
  • Used the Crontab in UNIX for Automatic Tasks.
  • Create and validate scripts in the RNC, Node B & MSN in Unix/ Citrix environment.
  • Created UNIX shell scripts to handle pre and post session tasks.
  • Create UNIX scripts for ftp'ing the target files.
  • Created UNIX scripts in support of ETLs.
  • Create Data Warehouse report, Ad-hoc report, Microsoft Report Service report and some UNIX Shell script.


18. Data Analysis
Average Demand
Here's how Data Analysis is used in Data Engineer jobs:
  • Performed wireless data analysis, mapping, quality assurance and develop effective business strategic plans and weekly/daily status reports for managers.
  • Performed data analysis and optimized algorithms using various statistical & mathematical models for ice events pattern recognition.
  • Performed data analysis on Facebook accounts to segment customers for more effective, targeted marketing.
  • Provided efficient coding to reduce large data analysis time for time sensitive credit applications.
  • Prepared data analysis for independent driver coaching and engine monitoring.
  • Used big data tools such as Apache Pig, Hive, R-Studio, and Apache Spark to complete data analysis tasks.
  • Monitored critical daily jobs, performed data analysis to identify the root cause and fixed the issues using SQL queries.
  • Developed Complex ETL code through Data manager to design BI related Cubes for data analysis at corporate level.
  • Implemented Data import/export, Transformation, Data Analysis, Data Quality, Data lineage, Data governance.
  • Explored the R statistical tool to provide data analysis on peer feedback data on leadership principles.
  • Performed data analysis and profiling of source data to better understand the sources.
  • Led team tasked with data analysis and network management.
  • Create insightful and appealing reports based on in-depth Data analysis regarding systems performance, gap assessments, schema modeling, etc.
  • Collect and clean, transform data from client for data analysis, maintenance every day data loading with T-SQL and SSIS.
  • Work with Facebook global marketing team to provide end-to-end BI solutions via data collection, processing and advanced data analysis.
  • Performed data analysis and data profiling using complex SQL on various sources systems including Oracle and Teradata.
  • Provided large data analysis support for clients like Mercedes USA , Bank of America, Wells Fargo.
  • Created ad-hoc T-SQL queries and SSIS reports for data analysis.
  • Perform Big Data analysis using Scala, Spark, Spark SQL, Hive, MLlib, and machine learning algorithms.
  • Assist with the addition of Hadoop processing to the IT infrastructure; perform data analysis using Hive and Pig and manage jobs.
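
A first data analysis or profiling pass over a new source, as several bullets above mention, can be sketched in a few lines of pandas (the file and columns are hypothetical):

    import pandas as pd

    df = pd.read_csv("source_extract.csv")

    print(df.shape)                       # row and column counts
    print(df.isnull().mean())             # share of missing values per column
    print(df.describe(include="all"))     # basic distribution of each column
    print(df.duplicated().sum(), "duplicate rows")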


19. Linux
Average Demand
Here's how Linux is used in Data Engineer jobs:
  • Installed, configured, upgraded and administrated Linux Operating Systems.
  • Used Linux shell scripts to automate the build process, and to perform regular jobs like file transfers between different hosts.
  • Worked on Linux shell scripts for business processes and with loading the data from different systems to the HDFS.
  • Involved in complete cycle on migrating physical Linux/Windows machines to cloud (AWS) and test it.
  • Used Linux, python, pandas, and SQL to accomplish tasks assigned to me.
  • Worked with Linux server admin team in administering the server hardware and operating system.
  • Develop scripts to automate routine DBA tasks using Linux Shell Scripts, Python.
  • Managed the Linux server infrastructure.
  • Maintained the configuration files of UNIX servers, AIX, HP-UNIX, Solaris and Linux resulting in servers being security compliant.
  • Involved in integration work Technical Environment: C, Xml, NXP TV 550, MIPS Processor, Linux
  • Worked with Linux systems and RDBMS database on a regular basis in order to ingest data using Sqoop.
  • Skilled in using Linux command-line tools, such as the s3 commands and the Hadoop HDFS shell commands.
  • Implemented nine nodes CDH3 Hadoop cluster on Red hat LINUX.
  • Scheduled Tasks on Linux Using Crontab and Shell Scripts.
  • Perform web-based deployments using Websphere in a linux environment.
  • Used SVN for version control Environment: Hadoop HDFS, MapReduce, Sqoop, Hive, Linux, MySql
  • Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment with Linux/Windows for big data resources.
  • Virtualized Hadoop in Linux environment for providing a safer, scalable analytics sandbox on the application.


20. MapReduce
Average Demand
Here's how MapReduce is used in Data Engineer jobs:
  • Implemented Generic writable to incorporate multiple data sources into reducer to implement recommendation based reports using MapReduce programs.
  • Implemented MapReduce programs to perform joins using secondary sorting and distributed cache.
  • Developed MapReduce layer to source targeted data for visualization.
  • Designed and developed MapReduce programs for data lineage.
  • Integrated product catalog structure with MapReduce framework for Recommendations (ML - Classification and Regression algorithm) and POI strategy.
  • Develop MapReduce job for AVRO conversion and load the AVRO data to hive table using the SerDe's.
  • Developed a MapReduce pipeline to explore real-time top 10 positive and negative words on the dashboard.
  • Used Maven extensively for building jar files of MapReduce programs and deployed to Cluster.
  • Developed MapReduce jobs using Hive, and Pig to extract and analyze data.
  • Experience in troubleshooting errors in Shell, Hive, and MapReduce.
  • Integrated R scripts with MapReduce jobs.
  • Involved in creating Hive tables, loading data and writing Hive queries that run internally as MapReduce jobs.
  • Developed MapReduce programs to parse the raw data for the Microsoft Office Data Collection project.
  • Developed MapReduce job that runs on top of encoded, encrypted json messages.
  • Optimized Mapreduce jobs to use HDFS efficiently by using various compression techniques.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Performed unit testing of MapReduce jobs on cluster using MRUnit.
  • Developed a java MapReduce and pig cleansers for data cleansing Developed hive udfs to mask confidential information in the data.
  • Designed and wrote a component in Spark and MapReduce to calculate daily overall scores in a daily batch and store them to HBase.
  • Write Hadoop MapReduce programs, Spark/Storm topologies to process large set of data for better decision-making.
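
As a rough sketch of the MapReduce jobs described above, here is a word count written for Hadoop Streaming. One small Python file plays both roles; pass "map" or "reduce" on the command line (or pipe through sort locally to test):

    import sys
    from itertools import groupby

    def mapper(stdin):
        # Emit one "word<TAB>1" line per token.
        for line in stdin:
            for word in line.strip().split():
                print(f"{word.lower()}\t1")

    def reducer(stdin):
        # Hadoop delivers mapper output sorted by key, so groupby sums per word.
        pairs = (line.rstrip("\n").split("\t") for line in stdin)
        for word, group in groupby(pairs, key=lambda kv: kv[0]):
            print(f"{word}\t{sum(int(count) for _, count in group)}")

    if __name__ == "__main__":
        (mapper if sys.argv[1] == "map" else reducer)(sys.stdin)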


21. Kafka
Average Demand
Here's how Kafka is used in Data Engineer jobs:
  • Designed and configured Kafka cluster to accommodate heavy throughput messages per second.
  • Configured Spark Streaming to receive real time data from the Kafka and store the stream data to HDFS.
  • Used Apache Kafka as messaging system to load log data, data from UI applications into HDFS system.
  • Worked on Spark Streaming using Kafka to submit the job and start the job working in Live manner.
  • Experience in creating NiFi flow to streaming data between Kafka, PulsarDB, ElasticSearch, and FTP.
  • Created Spark streaming projects for data ingestion and integrated with Kafka consumers and producer for messaging.
  • Involved in setting up the Kafka Streams Framework which is the core of enterprise inventory.
  • Harmonized data coming from different sources using Kafka to bring it to consistent format.
  • Used Kafka to ingest data into a data center unit.
  • Installed and configured Kafka clusters.
  • Developed a model to predict the life of goods and alert the customer by sending it to another Kafka queue.
  • Designed and developed an Elasticsearch connector using the Kafka Connect API, with Kafka as the source and Elasticsearch as the sink.
  • Created a component to receive messages from the Mosquito queue and post them to a Kafka producer for availability.
  • Created Map Reduce Jobs using Pig Latin and Hive Queries Integrated Kafka with Storm for real time processing.
  • Develop Nifi flows to transform, serialize streaming data and send to Kafka and Elastic Search.
  • Implemented the spark real time component to poll the messages from kafka for every 5 seconds.
  • Implemented POC to extract data with Kafka and Spark into HDFS and Hbase.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Implemented contracts administrator service using Kafka, Spark and Java.
  • Configured deployed and maintained multi-node Dev and Test Kafka Clusters.
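
The produce-and-consume loop behind these streaming pipelines is compact. A hedged sketch with the kafka-python client (the broker address and topic are placeholders):

    import json
    from kafka import KafkaProducer, KafkaConsumer

    producer = KafkaProducer(
        bootstrap_servers=["broker1:9092"],
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("clickstream", {"user_id": 42, "page": "/checkout"})
    producer.flush()

    consumer = KafkaConsumer(
        "clickstream",
        bootstrap_servers=["broker1:9092"],
        auto_offset_reset="earliest",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    for message in consumer:
        print(message.value)   # a real pipeline would write this on to HDFS or HBase
        break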


22. Scala
Average Demand
Here's how Scala is used in Data Engineer jobs:
  • Informed higher levels of critical office statuses where escalation was necessary.
  • Communicate and escalate issues appropriately.
  • Served as point of escalation for the resolution of complex issues.
  • Designed and implemented a physical model utilizing Oracle 11g to maximize query performance and maintain data scalability.
  • Build new data pipelines for data integration and data ingestion using Spark/Python/Scala and Airflow.
  • Developed data marts (sourced from ODS) that provision fast, scalable and maintainable cubes for data mining by SSAS.
  • Assisted and performed Hadoop-MapReduce jobs and streaming to execute scalable machine learning data algorithms on terabytes of data.
  • Developed a fault tolerant, highly scalable and robust pipeline to ingest large volume of unstructured data.
  • Led efforts for scalable data processing pipelines for music metadata (Python, Shell, MySQL)
  • Involved in converting Hive/HQL queries into Spark transformations using Spark RDD, Scala and Python.
  • Implemented Spark using Java/Scala and Spark SQL for faster testing and processing of data.
  • Used Spark and Scala for developing machine learning algorithms which analyses click stream data.
  • Developed Spark code using Scala and Spark-SQL for faster testing and processing of data.
  • Extended the capabilities of DataFrame using User Defined Functions in Python and Scala.
  • Used Scala programming as well to perform transformations and applying business logic.
  • Used Scala to extract information from JSON files.
  • Implemented audit service using Scala on Spark.
  • Provided detailed design for system modules/components with the vision of future extensibility and scalability.
  • Developed SQL queries into Spark Transformations using Spark RDDs, DataFrames and Scala, and performed broadcast join on RDD's/DF.
  • Worked with Hbase to provide real-time and scalable data storage.


23. Flume
Average Demand
Here's how Flume is used in Data Engineer jobs:
  • Used interceptors with RegEx as part of flume configuration to eliminate the chunk from logs and dump the rest into HDFS.
  • Used the source agent in Apache Flume to import the data, especially logs information, into HDFS.
  • Worked on implementing Flume to import streaming data logs and aggregating the data to HDFS through Flume.
  • Experience in working with Flume to load the log data from multiple sources directly into HDFS.
  • Architected and developed a Zero Data Loss data pipeline using Flume and UM queues.
  • Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume.
  • Worked in Agile development approach and Storm, Flume, Bolt, Kafka.
  • Worked on streaming the data into HDFS from web servers using Flume.
  • Developed custom Flume interceptor to transform near real time data.
  • Used Flume in Loading log data into HDFS.
  • Worked with Flume to transfer online logs data.
  • Worked on tools Flume, Kafka, Storm and Spark.
  • Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
  • Designed workflow by scheduling Hive processes for Log file data, which is streamed into HDFS using Flume.
  • Configured Flume with sources as multiple Kafka topics, channel as Kafka channel, and sink as Hdfs.
  • Research on HBase, flume, Yarn and Hive on spark for different project requirement.
  • Design and build data pipelines using Sqoop, Flume, and Kafka.
  • Worked with flume to import the log data from the reaper logs, syslog's into the Hadoop cluster.
  • Used the Avro SerDe's for serialization and de-serialization of log files at different flume agents.
  • Installed Hive, Pig, Flume, Sqoop and Oozie on the Hadoop cluster.


24. BI
Average Demand
Here's how BI is used in Data Engineer jobs:
  • Report: power volatility of hydro-electric systems in Colombia, comparison of power pools and potential for gaming
  • Implemented NameNode backup using NFS for High availability.
  • Presented research at INFORMS Conference, 1997: Probabilistic Assessment of Timing of Access to the US Retail Power Market.
  • Paused this activity due to forming Big Data teams in Asia (Taiwan and Japan) from 09/2015.
  • Developed accounting data model in Looker application to analyze billing across the enterprise at the transaction level.
  • Worked on Apache Spark streaming API on Big Data distributions in the active cluster environment.
  • Design and develop DMA(Disney Movies anywhere) dashboard for BI analyst team.
  • Provide revenue assurance between CDPD system and company billing system.
  • Serve as the single point of contact for notification of planned/emergency data network outages and communications between carriers for interoperability.
  • Discovered a very useful metric that uses signal strength variability to estimate UE speed when GPS is not available.
  • Worked on gsutil to get data from Hive to Google buckets to be consumed by big query tables.
  • Project: Marketing and Securities FDR Description: Designed and developed big data solutions involving Terabytes of data.
  • Maintain data base that provide trends and patterns of product stability companywide for quick response to product issues.
  • Worked on predictive Analytics on performance metrics for big data platforms for Center of Excellence team.
  • Managed roadmap for long term BI strategy working with various organizations from legacy and new technology.
  • Worked as a contractor on the NG-ABIS department of defense national security biometrics project.
  • Migrated BI instance to same availability zone as Redshift to co-locate in same zone.
  • Project Title: Rejuvenate Responsibilities: Evaluate and set up the environments for various new tech stacks.
  • Designed a cloud-based Big Data architecture: a 3-tier, TB-level data lake system from scratch.
  • Power BI (training); set up and configured always-on instances.


25. API
Average Demand
Here's how API is used in Data Engineer jobs:
  • Designed and developed named entity recognition system using AlchemyAPI, Spotlight, UIMA and GATE.
  • Developed REST API using Spark(web framework) to register device into HSDM.
  • Integrated internal and external data via API for cross platform marketing campaign evaluations.
  • Obtained, cleaned and processed data from open sources and scraping from Internet.
  • Involved in Designing and Developing Enhancements of CSG using AWS APIS.
  • Developed reusable modules using buck build for Graph API endpoints access.
  • Involved in HDFS maintenance and administering it through Hadoop-Java API.
  • Developed shell script to schedule data consumption from API.
  • Created REST API layer using JAX-RS and Hibernate.
  • Developed Map Reduce process using Java API.
  • Post Processing (TEMS and MapInfo).
  • Used MATLAB and developed Apache Mahout APIs to analyze the dictionary data to act as Machine Learning model.
  • Designed and developed Apache Spark APIs to integrate with AWS DataLake for in memory transactions.
  • Create standard and custom visualizations using d3.js, tableau, Tableau JavaScript API
  • Experience building and maintaining a REST API with Nodejs.
  • Work closely with architect and clients to define and prioritize their use cases and iteratively develop APIs and architecture.
  • Created all the services to work with the various entities provided and restified the services using REST APIs.
  • Prepared custom APIs for the CEO's Dev metric team on top of the existing Mart.
  • Created optimized Postgres queries based around performance metrics and the end point APIs to track performance of agents utilizing the app.
  • Developed automated PySpark jobs to extract data from third-party APIs like eBay, Manheim Auction, etc.
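
Pulling data from a third-party REST API, as several bullets describe, usually means paging through results. An illustrative client (the endpoint is fictional and assumed to return a JSON list per page):

    import requests

    def fetch_all(url: str, page_size: int = 100):
        page, results = 1, []
        while True:
            resp = requests.get(
                url, params={"page": page, "per_page": page_size}, timeout=30
            )
            resp.raise_for_status()
            batch = resp.json()
            if not batch:
                return results
            results.extend(batch)
            page += 1

    records = fetch_all("https://api.example.com/v1/listings")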


26. Business Requirements
Average Demand
Here's how Business Requirements is used in Data Engineer jobs:
  • Receive business requirements from stakeholders, convert them to technical specs, and then develop the solution.
  • Participated in user meetings, gathered Business requirements & specifications for the data reports.
  • Interacted with Business Analyst to understand the business requirements.
  • Provide ETL design, development and testing estimations based on business requirements and research into the data that is currently sourced.
  • Analyzed business requirements and delivering test plans and other artifacts such as test scenarios, test cases and reports.
  • Designed and created mappings to deal with Change Data Capture (CDC) for various business requirements.
  • Defined ETL processes and worked with offshore teams to collaborate business requirements into functional data models.
  • Work with end users and translate their business requirements into technical tasks using Test Driven Development.
  • Created reports with the help of SSRS, Tableau and presentations to meet business requirements.
  • Capture business requirements from the users and lead the team by providing technical solutions.
  • Gathered business requirements from the power users and designed the Reports based on requirements.
  • Analyze business requirements and map the requirements to a technical solution.
  • Gathered business requirements from finance, marketing and product groups.
  • Implement the business requirements from the analysts.
  • Evaluate new technological developments and evolving business requirements and makes recommendations for improved service levels and efficiencies.
  • Participated in business requirements as well as provided trainings to various technical and business users.
  • Used Pig Latin to analyze datasets and perform transformation according to business requirements.
  • Develop iBots (alerts) and schedule OBIEE reports according to the business requirements.
  • Create tables, views, projections in Vertica per Business requirements.
  • Translated business requirements into technical requirements for the BI Tivomart project.


27. Informatica
Average Demand
Here's how Informatica is used in Data Engineer jobs:
  • Designed and developed ETL solutions using Informatica PowerCenter.
  • Configured Informatica Power Center with SharePoint 2010/2013.
  • Developed SSIS packages for supporting Informatica deficiencies.
  • Assisted in configuring Informatica Power Exchange.
  • Develop batch and real-time ETL applications using Informatica to populate the data warehouse and perform unit testing / system integration testing.
  • Worked with Informatica Power Center to design mappings and sessions to extract, transform and load into Data Warehouse/DataMart.
  • Worked with Informatica Mapping Architect to design the template which can be used to generate ETL Interfaces.
  • Designed Sources to Targets mappings from SQL Server, Excel/Flat files to Oracle using Informatica Power Center.
  • Develop ETL solutions utilizing Informatica to fetch data from external sources to populate the data warehouse.
  • Design, Development, Testing and Implementation of ETL programs using Informatica and batch/shell scripts.
  • Worked on Informatica Templates to standardize the process and maintain consistency across different teams.
  • Conducted performance tuning of informatica maps/db queries, debugged maps, completed code reviews.
  • Create ETL Mappings using Informatica 9.x to distribute plan data to multiple destinations.
  • Designed and developed ETL mappings for multiple projects using Informatica and PL/SQL.
  • Installed Hot fixes, utilities, and patches released from Informatica Corporation.
  • Worked with Informatica Power Exchange to extract the data from SAP.
  • Worked on setting up the Informatica Sales Cloud environment.
  • Upgraded to Informatica Power Center 8.6.1 from version 8.1.1.
  • Used Informatica to extract data from the Salesforce SaaS system and scheduled jobs to bring real-time data into a staging area.
  • Developed the Unix Scripts to automate pre-session and post-session processes in Informatica, file transfers between the various hosts.

Show More

321 Informatica Jobs

No jobs at selected location

28. Teradata
demand arrow
average Demand
Here's how Teradata is used in Data Engineer jobs:
  • Handled end-to-end implementation of SAS to Teradata migration projects to successful completion under tight deadlines.
  • Involved in Teradata database design prototypes.
  • Experienced in working with various kinds of data sources such as Teradata and Oracle and successfully loaded files to HDFS.
  • Developed automated jobs for reconciling the data between the source (Oracle) and EDW systems (Teradata).
  • Converted DB2 SQL scripts, macros, and stored procedures to Teradata SQL in support of DB2-to-Teradata database migration.
  • Developed automated scripts for ingesting the data from Teradata around 200TB bi-weekly refreshment of data.
  • Created DDL for the dropping, creation, and modification of database objects in Teradata.
  • Loaded data from Hive to Teradata using TDCH.
  • Used Teradata SQL Assistant to view hive tables.
  • Adhered to GCSS-AF Teradata standards.
  • Diagnosed, resolved, and remediated performance issues of varying degrees through SQL analysis, logs, and Teradata Explain Plans.
  • Developed job flows in TWS to automate the workflow for extraction of data from Teradata and Oracle.
  • Imported data from different relational data sources like Oracle, Teradata to HDFS using Sqoop.
  • Led data migration project to replace PeopleSoft with Workday on Schwab's Teradata Data Warehouse.
  • Exported data using Sqoop from HDFS to Teradata on regular basis.
  • Extracted the data from Teradata into HDFS using Sqoop.
  • Worked on importing and exporting data from different databases like Oracle and Teradata into HDFS and Hive using Sqoop.
  • Design and develop Teradata FastLoad and MultiLoad scripts to load and unload data from the data warehouse.
  • Created Hive tables and loaded retail transactional data from Teradata using Sqoop.
  • Extract large datasets from Teradata using utilities like FastExport and BTEQ and load them into Vertica.

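To make the extract-and-land pattern above concrete, here is a minimal PySpark sketch that pulls a Teradata table over JDBC and writes it to HDFS as Parquet. The host, credentials and table names are placeholders, and it assumes the Teradata JDBC driver jar is available on the Spark classpath; it illustrates the general pattern rather than any particular project's code.

    # Pull a Teradata table over JDBC into a DataFrame, then land it on HDFS.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("teradata_extract").getOrCreate()

    df = (spark.read.format("jdbc")
          .option("url", "jdbc:teradata://td-host.example.com/DATABASE=sales")  # placeholder host/db
          .option("driver", "com.teradata.jdbc.TeraDriver")
          .option("dbtable", "sales.daily_orders")                              # placeholder table
          .option("user", "etl_user")
          .option("password", "***")
          .load())

    df.write.mode("overwrite").parquet("hdfs:///data/ingest/daily_orders")
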
29. XML
average Demand
Here's how XML is used in Data Engineer jobs:
  • Developed java program that uses JAXP, XML/XSLT and Java Mail API to send data to mainframe application via an email.
  • Involved in development of Web Services using SOAP for sending and getting data from the external interface in the XML format.
  • Designed a relational model to house XML and text data files received from third parties.
  • Used web based markup languages such as HTML and XML schema.
  • Developed user interfaces using JSP, HTML, XML and JavaScript.
  • Develop Spark/MapReduce jobs to parse the JSON or XML data.
  • Developed Perl code to process and load XML-based vendor feeds.
  • Worked with customers to establish direct XML data feeds.
  • Involved in creating and enhancing homegrown ETL scripts using Perl, XML, PL/SQL, and SQL.
  • Used XSLT to convert the XML documents into the HTML pages.
  • Utilize Unix, SOAP, XML, LDAP interfaces.
  • Implemented a Java MapReduce XML parser to handle both gas and electric XML data and improved performance using Avro.
  • Designed ETL dataflow pipelines for processing large XML payloads, data files and Message queue messages.
  • Report large datasets to Warner, Sony and Universal using SQL in TPT and XML.
  • Integrated client systems for data automation using Talend, Java, and XML.
  • Involved in web service/API testing with SOAP XML using the SoapUI tool.
  • Reduced time to create the xml file from 2-4 weeks to 2-10 seconds.
  • Processed unstructured clinical text into JSON/XML formats with natural language processing tool Bioportal Annotator and REST.
  • Used JSON, XML and Avro SerDes for serialization and deserialization, packaged with Hive, to parse the contents of streamed data.
  • Created and updated a stored procedure for daily data loading into the back-end database using T-SQL and XMLA code.

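As a small illustration of the XML feed handling mentioned above, here is a minimal Python sketch using the standard library parser; the <orders> layout is invented purely for the example.

    # Parse a third-party XML feed into flat records with xml.etree.ElementTree.
    import xml.etree.ElementTree as ET

    def parse_orders(xml_text):
        root = ET.fromstring(xml_text)
        for order in root.findall("order"):
            yield {
                "id": order.get("id"),
                "customer": order.findtext("customer"),
                "amount": float(order.findtext("amount", default="0")),
            }

    sample = "<orders><order id='1'><customer>Acme</customer><amount>9.99</amount></order></orders>"
    print(list(parse_orders(sample)))  # [{'id': '1', 'customer': 'Acme', 'amount': 9.99}]
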
30. Nosql
low Demand
Here's how Nosql is used in Data Engineer jobs:
  • Used Cassandra as the NoSQL Database and acquired very good working experience with NoSQL databases.
  • Worked on Apache Cassandra writing NoSQL routines for time-series data.
  • Used NoSQL databases like MongoDB in implementation and integration.
  • Provided solution using MongoDB and NoSQL databases.
  • Implemented NoSQL databases such as HBase and managed the other tools and processes running on YARN.
  • Used Spring Data for NoSQL (HBase, Cassandra) and Hibernate/JPA for MySQL and Derby.
  • Worked with NoSQL databases like HBase, Cassandra, DynamoDB (AWS) and MongoDB.
  • Worked on NoSQL databases like Cassandra and HBase to store structured and unstructured data.
  • Experience in different NoSQL databases such as MongoDB, HBase and Cassandra.
  • Worked with the NoSQL database HBase to create tables and store data.
  • Involved in implementing and integrating NoSQL databases like HBase.
  • Gained experience with NOSQL databases like Hbase, Cassandra.
  • Used NoSQL, Cassandra, Mongo databases.
  • Developed practical NoSQL and relational solutions to conquer scalability and distributed data processing challenges for analytic model development.
  • Worked with NoSQL databases like MongoDB, HBase and Cassandra, creating tables to load large sets of semi-structured data.
  • Worked with Pig and the NoSQL database HBase for analyzing the Hadoop cluster as well as big data.
  • Performed Sqoop-based file transfers into Cassandra tables for processing data into several NoSQL DBs.
  • Secure data ingest to NOSQL data stores (Accumulo) and running analytics against it.
  • Experience in NoSQL Column-Oriented Databases like HBase and its integration with Hadoop cluster.
  • Experienced in NOSQL databases like HBase, MongoDB and experienced with Hortonworks distribution of Hadoop.

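To give one concrete flavor of this NoSQL work, here is a minimal sketch using the DataStax Python driver to write and read time-series rows in Cassandra. The contact point, keyspace and table are placeholders, and the readings table is assumed to already exist.

    # Write one sensor reading to Cassandra and read it back.
    from datetime import datetime
    from cassandra.cluster import Cluster

    cluster = Cluster(["cassandra1.example.com"])   # placeholder contact point
    session = cluster.connect("metrics")            # placeholder keyspace

    session.execute(
        "INSERT INTO readings (sensor_id, ts, value) VALUES (%s, %s, %s)",
        ("sensor-42", datetime(2018, 1, 1), 21.5),
    )

    rows = session.execute(
        "SELECT ts, value FROM readings WHERE sensor_id = %s", ("sensor-42",)
    )
    for row in rows:
        print(row.ts, row.value)

    cluster.shutdown()
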
31. Ssis
low Demand
Here's how Ssis is used in Data Engineer jobs:
  • Assisted designing machine learning algorithms for solving business insight problems.
  • Worked on several projects that required me to use SSIS to move data around and normalize it.
  • Assisted in the development of a service used to perform sentiment classification on Twitter tweets.
  • Assisted Applied Mathematician in setting up VM's to run software for training food recognition.
  • Developed SSIS packages and SSRS reports for Engineering Division of a major automotive corporation.
  • Assisted the ETL staff, developers and end users to understand the data model.
  • Assisted remote engineers with troubleshooting network issues in remote offices and data centers.
  • Build efficient SSIS packages for processing fact and dimension tables with complex transforms.
  • Assisted with computer projects of other areas in processing of department specific data.
  • Assist DBA in tuning SQL queries and stored procedures to improve performance.
  • Assisted with design, develop, and deployment of network solutions.
  • Assist in placing and/or disabling accounts on test as requested.
  • Front of House assisted with microphone placement and Live Recording.
  • Utilize SSIS & SSRS to increase productivity and efficacy.
  • Work on MS SQL Server, SSIS and MS Visual Studio on a daily basis.
  • Migrated DTS 2000/2005 packages to SQL server 2008, 2012 and 2014 SSIS packages.
  • Use of SSIS to move data into a SQL Server box.
  • Provided technical support assistance with printers, computers etc., and helped input lots of data information in AllStates Database.
  • Assist as needed with troubleshooting equipment over the phone including the use of some GWS internally developed tools.
  • Assisted the project manager in problem solving with Big Data technologies for integration of Hive with HBASE and Sqoop with HBase.

32. R
low Demand
Here's how R is used in Data Engineer jobs:
  • Created user behavior metrics and conducted cluster analysis.
  • Utilized Oracle Data Integrator (12C) to develop star schema data models and create ELT jobs for data warehousing.
  • Worked as the configuration controller (CC) of the team and implemented a controlled change process.
  • Coordinated with offshore/onshore, collaboration and arranging the weekly meeting to discuss and track the development progress.
  • Involved in Analyzing system failures, identifying root causes, and recommended course of actions.
  • Designed, created, and maintained over 100 reports that facilitated great customer service.
  • Prepare daily and weekly project status report and share it with the client.
  • Lead organizations Data management strategies; both vendor, and open-source.
  • Developed Java module to automate XUDML file creation for ODI.
  • Created Sequence files, and AVRO file formats.
  • Involved in User Acceptance Testing(UAT)
  • Supervised upgrades to the SS7 (SCP, STP, SSP, trunking) architecture.
  • Developed Hive scripts with analyst requirements for Adhoc analysis.
  • Architected automated batch jobs to create, encrypt, and transfer data extracts using Oracle Pump, PGP.
  • Developed a recommendation engine for Magento based e-commerce website, using Spark, MySQL and Azure.
  • Handled end to end project lifecycles, right from cost, time estimates to deployment.
  • Upgraded HDP and Ambari on multiple clusters and installation of HDP services on the cluster.
  • Interact with the onsite periodically to discuss project status.
  • Re-engineered Proxyvote, an OLTP voting system.
  • Implemented a CDC data pipeline with the Cloudera stack, Kafka, Java 8 and Postgres on AWS.

33. Json
low Demand
Here's how Json is used in Data Engineer jobs:
  • Experience with creating ETL jobs to load JSON data and server data into MongoDB and transformed MongoDB into the Data Warehouse.
  • Created ETL jobs to load Twitter JSON data into MongoDB and jobs to load data from MongoDB into Data warehouse.
  • Created restful web service using the spring framework to acquire the JSON trip messages from the Kafka/Amazon SQS queue.
  • Performed ETL on the data coming in the form of JSON files and populate it into a MySQL Database.
  • Parse all the received JSON trip messages using JSON Parser.
  • Parsed JSON formatted twitter data and uploaded to database.
  • Automated report development using Java, SQL, JSON.
  • Used Cassandra to work on JSON documented data.
  • Worked with parquet, JSON file formats.
  • Handled different File Formats like JSON.
  • Used the JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables.
  • Designed an ETL process to handle unstructured data (JSON, PHP-serialized, XML).
  • Used a JSON file as a config file to set different input parameters.

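For readers who want to see the basic pattern behind these bullets, here is a minimal Python sketch that parses JSON messages (a made-up "trip" layout) and quietly skips malformed ones.

    # Parse raw JSON messages and keep only the well-formed records.
    import json

    raw_messages = [
        '{"trip_id": 42, "rider": "a1", "miles": 3.7}',
        'not valid json',
    ]

    trips = []
    for msg in raw_messages:
        try:
            trips.append(json.loads(msg))
        except json.JSONDecodeError:
            continue  # skip malformed messages instead of failing the batch

    print(trips)
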
34. S3
low Demand
Here's how S3 is used in Data Engineer jobs:
  • Developed Pig Latin/Hive scripts to extract the data from the web server output files to load into S3.
  • Validated if data has been normalized and loaded into S3 buckets using Hive queries.
  • Have written Spark jobs to aggregate the data and stored in S3.
  • Developed and deployed PySpark application to process AWS S3 data.
  • Worked on EMR to analyze data in S3 buckets.
  • Imported data from DynamoDB to S3 using hive.
  • Optimized data movement from S3 to minimize latency.
  • Configured lifecycle management in S3.
  • Worked with carriers and field techs nationwide to test and turn-up Ds1 & Ds3 Voice/Data circuits utilizing Cisco MGX 8260 gateway.
  • Table Backup: Setup table backup/restore to/from S3 scripts for long-term data archiving and easy recovery by non-DBAs.
  • Configured Sqoop and developed scripts to extract structured data from PostgreSQL onto Amazon S3 cloud.
  • Supplemented ad-hoc querying of data in S3 using Hive and Presto on Elastic MapReduce.
  • Developed ETL workflow which pushes web server logs to an Amazon S3 bucket.
  • Implemented data pipelines for data migration from AWS S3 to Redshift.
  • Exported analyzed data to S3 using Sqoop for generating reports.
  • Build reliable Data Warehousing on AWS Cloud using: RDS, EC2, Redshift, S3, EMR, Data Pipeline.
  • Experience in working with AWS S3, RDS along with AWS Redshift.
  • Perform data analytics and load data to the Amazon S3 data lake and Spark cluster.
  • Developed a workflow in AWS Data Pipeline to automate the tasks of loading the data into S3 and pre-processing it with Pig/Hive.
  • Transferred data using Informatica from AWS S3 to AWS Redshift for storing the data in the cloud.

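Here is a minimal boto3 sketch of the kind of S3 work described above: push a local log file into a bucket, then list what landed. The bucket, key and local path are placeholders, and credentials are assumed to come from the environment or an IAM role.

    # Upload a web server log to S3 and list the objects under the prefix.
    import boto3

    s3 = boto3.client("s3")
    s3.upload_file("/var/log/webserver/access.log",
                   "example-raw-logs",                    # placeholder bucket
                   "webserver/2018-01-01/access.log")     # placeholder key

    resp = s3.list_objects_v2(Bucket="example-raw-logs", Prefix="webserver/")
    for obj in resp.get("Contents", []):
        print(obj["Key"], obj["Size"])
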
35. Zookeeper
low Demand
Here's how Zookeeper is used in Data Engineer jobs:
  • Performed cluster co-ordination and assisted with data capacity planning and node forecasting using ZooKeeper.
  • Experienced in using Zookeeper Operational Services for coordinating the cluster.
  • Used Zookeeper for various types of centralized configurations.
  • Maintained cluster co-ordination services through ZooKeeper.
  • Cluster coordination services using ZooKeeper.
  • Involved in Cluster coordination services through Zookeeper and Adding new nodes to an existing cluster.
  • Implemented Storm integration with Kafka and ZooKeeper for the processing of real time data.
  • Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and ZooKeeper.
  • Configured Apache Kafka's multiple brokers along with zookeeper nodes.
  • Used Zookeeper for providing coordinating services to the cluster.
  • Worked on setting up high availability for major production cluster and designed automatic failover control using zookeeper and quorum journal nodes.
  • Automated the installation and maintenance of Kafka, Storm, ZooKeeper and Elasticsearch using SaltStack.
  • Worked on multiple data formats on HDFS using Spark, and used ZooKeeper for various types of centralized configurations.
  • Utilized ZooKeeper to implement high availability for Namenode and automatic failover infrastructure to overcome single point of failure.
  • Provided solution using Hadoop ecosystem HDFS, MapReduce, Pig, Hive, Impala, HBase, and Zookeeper.
  • Worked with Zookeeper, Oozie, and Data Pipeline Operational Services for coordinating the cluster and scheduling workflows.
  • Used Apache ZooKeeper and Oozie operational services for coordinating the cluster and scheduling workflows.
  • Used Zookeeper and Oozie for coordinating the cluster and scheduling workflows.
  • Scheduled workflows using Zookeeper and Oozie.
  • Installed Hive, Zookeeper, Oozie, Sqoop, Hue and Hbase.

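One common way to script the "centralized configuration" use of ZooKeeper mentioned above is the kazoo Python client; here is a minimal sketch with a placeholder ensemble address and znode path.

    # Store and read a small piece of shared configuration in ZooKeeper.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="zk1.example.com:2181")   # placeholder ensemble
    zk.start()

    zk.ensure_path("/config/ingest")
    zk.set("/config/ingest", b"batch_size=5000")

    value, stat = zk.get("/config/ingest")
    print(value.decode(), "version:", stat.version)

    zk.stop()
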
36. Spark SQL
low Demand
Here's how Spark SQL is used in Data Engineer jobs:
  • Involved in developing the Spark Streaming jobs by writing RDD's and developing data frame using Spark SQL as needed.
  • Developed Spark SQL Scripts for data Ingestion from Oracle to Spark Clusters and relevant data joins using Spark SQL.
  • Developed a JDBC API for ArrayDB to integrate it into the Spark platform based on JNI and Spark SQL.
  • Implemented Spark using Java and Spark SQL for real time processing of storage/compute/ network operation metrics.
  • Involved in working and tuning of the Spark SQL queries and Spark streaming scripts.
  • Used Spark SQL to process the huge amount of structured data.
  • Created Spark SQL queries for faster processing of data.
  • Worked on Spark SQL UDF's and Hive UDF's.
  • Designed and built the reporting application that uses the Spark SQL to fetch and generate reports on HBase table data.
  • Implemented Spark using Scala and utilizing Data frames and Spark SQL API for faster processing of data.
  • Worked on Spark SQL and DataFrames for faster execution of Hive queries using the Spark SQLContext.
  • Worked on the componentization of existing Spark jobs such that they were re-written in Spark SQL.
  • Project scope was limited to Spark SQL and Spark Streaming, with interest in exploring MLlib.
  • Developed Hive queries, Pig scripts, and Spark SQL queries to analyze large datasets.
  • Design & Develop ETL Workflows in Spark SQL for data analysis.
  • Have worked with Apache Spark SQL, Spark Streaming and Spark MLlib.
  • Parsed and queried JSON, CSV, Parquet, Avro and XML data formats using Spark SQL and Dataframes.
  • Developed Spark programs, scripts and UDFs using Scala/Spark SQL for aggregation operations as per the requirements.
  • Developed various main and service classes in Scala using Spark SQL for requirement-specific tasks.
  • Develop Spark SQL tables and queries to perform ad-hoc data analytics for the analyst team.

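To make the Spark SQL pattern above concrete, here is a minimal PySpark sketch: load a dataset, register it as a temporary view, and answer an ad-hoc question with plain SQL. The input path and column names are placeholders.

    # Run an ad-hoc SQL aggregation over a JSON dataset with Spark SQL.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("adhoc_sql").getOrCreate()

    orders = spark.read.json("hdfs:///data/ingest/orders/")   # placeholder path
    orders.createOrReplaceTempView("orders")

    daily_totals = spark.sql("""
        SELECT order_date, COUNT(*) AS orders, SUM(amount) AS revenue
        FROM orders
        GROUP BY order_date
        ORDER BY order_date
    """)
    daily_totals.show(20, truncate=False)
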
37. Rdbms
low Demand
Here's how Rdbms is used in Data Engineer jobs:
  • Modeled and instantiated a reference data capability using Oracle 9i/10g RDBMS to support DOJ homeland security activities.
  • Planned, revised, tested and documented backup and recovery procedures for several RDBMS (Oracle and SQL server).
  • Gathered and analyzed data to aid in the decisions that would create a healthy RDBMS system.
  • Experience in RDBMS concepts, writing PL/SQL database queries, stored procedures, and triggers.
  • Implemented comprehensive regression test system for RDBMS, Native API, ODBC API and clients.
  • Involved in Extracting, loading Data from Hive to Load an RDBMS using SQOOP.
  • Become more proactive instead of reactive, especially on RDBMS.
  • Maintain, monitor and service any RDBMS systems already in existence or new for a variety of Systems (i.e.
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
  • Involved in creating generic Sqoop import script for loading data into hive tables from RDBMS.
  • Exported data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose.
  • Transferred the structured data in RDBMS to HDFS for computation preparation with Sqoop.
  • Developed, profiled, tuned native middleware for proprietary data-warehousing SQL RDBMS.
  • Export the processed data from HDFS to RDBMS using Sqoop.
  • Handled incremental data loads from RDBMS into HDFS using Sqoop.
  • Imported the data from RDBMS (MYSQL) to HDFS using Sqoop.
  • Used Sqoop to transfer data between RDBMS and Hadoop Distributed File System.
  • Used Sqoop utility tool to import structured RDBMS data to Hadoop.
  • Performed transferring of data from RDBMS to HDFS/HIVE/HBase(NoSQL) using the Sqoop and vice versa.
  • Utilized the Apache Hadoop environment via the Cloudera distribution.

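The RDBMS-to-HDFS transfers above are typically Sqoop jobs; here is a minimal sketch of such an import driven from Python via subprocess. The connection string, table and HDFS paths are placeholders, and in practice this often lives in a shell script or an Oozie workflow rather than Python.

    # Kick off a Sqoop import of one table into HDFS.
    import subprocess

    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:mysql://db.example.com/retail",     # placeholder source
        "--username", "etl_user",
        "--password-file", "hdfs:///user/etl/.mysql_pass",
        "--table", "customers",
        "--target-dir", "/data/raw/customers",
        "-m", "4",                                             # 4 parallel mappers
    ]
    subprocess.run(cmd, check=True)
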
38. Ingest
low Demand
Here's how Ingest is used in Data Engineer jobs:
  • Designed and developed a critical ingestion pipeline to process over 100TB of data into a Data Lake.
  • Designed and developed various machine learning modules for data ingestion, transformation, training and predictions.
  • Extracted raw data from landing zone and loaded into a staging / ingest zone.
  • Designed Spark schema and data selection queries and Involved in data ingestion process.
  • Reviewed audit data ingested in to the SIEM tool for accuracy and usability.
  • Implemented data Ingestion and handling clusters in real time processing using Apache Kafka.
  • Created and maintained all data ingestion flows into cloud architecture.
  • Tasked to improve the ingestion and data representation.
  • Loaded metadata, data into Ingest zone.
  • Created Hive tables in Landing, Ingest and IDL zones.
  • Designed and developed data ingestion pipeline to work on hybrid (partly on-prem/partly in-cloud) environment using Apache Spark and EMR.
  • Designed and developed solution for ESP pump real time data ingestion using Kafka, Storm and HBase.
  • Created a cron job that executes a program to start the ingestion process.
  • Fixed ingestion issues using Regex and coordinated with System Administrators to verify audit log data.
  • Loaded data from ingest zone to IDL Data Validation in IDL zone using HiveSQL.
  • Experience in retrieving data from databases like MYSQL and Oracle into HDFS using Sqoop and ingesting them into HBase.
  • Integrated Apache Kafka for data ingestion and configured the Domain Name System (DNS) for hostname-to-IP resolution.
  • Worked on data ingestion from SQL Server to our data lake using Sqoop and shell scripts.
  • Envisioned a news article crawler/ingestion pipeline using Nutch, a feed reader and ActiveMQ.
  • Ingested a large sensor dataset into HBase/Hive on a Hortonworks 2.3 Hadoop cluster.

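As a small sketch of the Kafka-based real-time ingestion described above, here is a minimal kafka-python consumer loop; the broker address, topic and downstream sink are placeholders.

    # Read messages from a Kafka topic and hand each one to a downstream sink.
    from kafka import KafkaConsumer

    def write_to_landing_zone(record_bytes):
        # Stand-in for the real sink (HDFS, HBase, a staging table, ...).
        print(record_bytes.decode("utf-8"))

    consumer = KafkaConsumer(
        "sensor-readings",                                 # placeholder topic
        bootstrap_servers=["kafka1.example.com:9092"],     # placeholder broker
        auto_offset_reset="earliest",
        group_id="ingest-pipeline",
    )

    for message in consumer:
        write_to_landing_zone(message.value)
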
39. Data Quality
low Demand
Here's how Data Quality is used in Data Engineer jobs:
  • Collaborated and coordinated with development teams to deploy data quality solutions, while creating and maintaining standard operating procedure documentation.
  • Established data quality program, profiling metrics, trend monitoring and defect tracking/management.
  • Designed, built an automated validation processes/jobs to ensure data quality.
  • Applied processes improving and re-engineering methodologies to ensure data quality.
  • Implement automated processes for tracking data quality and consistency.
  • Developed and executed data quality strategies.
  • Worked in a team environment to fix data quality issues typically by creating Regular Expression codes to parse the data.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
  • Involved in data validation by writing complex SQL queries and participated in back-end testing and data quality issues.
  • Design and develop reusable components, frameworks and UDF's for ingestion and data quality.
  • Build out internal infrastructure processes to help with maintenance and operations of data quality.
  • Managed export of large data sets for user end testing for estimating Data quality.
  • Monitor Data Quality of department's Big Data databases using Metric Insights.
  • Perform data discovery, integration and identify / fix data quality issues.
  • Create Data Reconciliation & Data Quality reports for newly acquired data sources.
  • Established a data quality methodology to determine traits of poor data.
  • Led in consistent data audits to maintain enterprise data quality.
  • Worked with Subject Matter Experts (SMEs) and with system owners to establish data quality business rules and definitions.
  • Designed conceptual, logical and physical data models using best practices to ensure high data quality and reduce redundancy.
  • Involved in the mapping of data sources and analytics, with the goal of ensuring data quality.

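A minimal sketch of the automated validation jobs described above might look like this; it works against any DB-API connection, and the table names, key column and expected counts are hypothetical.

    # Basic data-quality checks: row-count reconciliation and null keys.
    def run_checks(conn, table, key_column, expected_rows):
        failures = []
        cur = conn.cursor()

        cur.execute(f"SELECT COUNT(*) FROM {table}")
        actual_rows = cur.fetchone()[0]
        if actual_rows != expected_rows:
            failures.append(f"{table}: row count {actual_rows} != expected {expected_rows}")

        cur.execute(f"SELECT COUNT(*) FROM {table} WHERE {key_column} IS NULL")
        null_keys = cur.fetchone()[0]
        if null_keys > 0:
            failures.append(f"{table}: {null_keys} rows with NULL {key_column}")

        return failures
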
40. Log Files
low Demand
Here's how Log Files is used in Data Engineer jobs:
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
  • Extracted data from various log files and imported data into text files.
  • Used by scheduled jobs to email notifications and log files.
  • Transferred log files from the log generating servers into HDFS.
  • Performed Cluster Drive and Post Processing of Log Files.
  • Configured Flume to stream log files.
  • Developed bash scripts to pull the Tlog files from the FTP server and then process them for loading into Hive tables.
  • Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
  • Achieved high security and retention by managing and reviewing Hadoop log files.
  • Performed troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Managed and reviewed the Hadoop log files using Shell scripts.
  • Managed and reviewed Hadoop log files using Flume and Kafka.
  • Load and transform large sets of structured, semi structured and unstructured data Experience in managing and reviewing Hadoop log files.
  • Implemented and administered Cloudera Hadoop cluster(CDH); reviewed log files of all daemons.
  • Managed and Reviewed Hadoop Log Files, deploy and Maintaining Hadoop Cluster.
  • Gained experience in managing and reviewing Hadoop log files.
  • Upload drive data log files to ATIX, FileZilla and customized software and perform pre- and post-drive data analysis.

41. Data Processing
low Demand
Here's how Data Processing is used in Data Engineer jobs:
  • Performed repairs and inspections on vehicle equipment and data processing equipment.
  • Design, develop and test tools to automate data processing methods in Java using JA Builder (MATLAB-Java platform integration).
  • Modeled hive partitions extensively for data separation and faster data processing and followed by and Hive best practices for tuning.
  • Implemented large scale data processing of trade and position level information, by using a custom built MapReduce engine.
  • Assisted in creation of ETL processes for transformation of data sources from existing data processing systems.
  • Used Apache Spark on YARN to have fast large scale data processing and to increase performance.
  • Developed Spark jobs using Python for faster data processing and used Spark SQL for querying.
  • Worked on Batch processing and Real-time data processing on Spark Streaming using python Lambda functions.
  • Optimized ORACLE ETL (partitioning, direct path inserts, parallel data processing etc.)
  • Tasked with creating in-house data processing tools for increasing performance and data continuity.
  • Research data processing and analysis technologies and their application to client data.
  • Developed a data pipeline for data processing using Kafka-Spark API.
  • Developed data processing pipeline in Spark for DNS data.
  • Developed k-streams using java for real time data processing.
  • Lead architecture and design of data processing, warehousing and analytics initiatives.
  • Developed data processing pipeline in Hadoop.
  • Worked to implement Spark as a new platform for faster data processing and analytics.
  • Configured Hadoop HDFS and developed multiple MapReduce jobs for data processing.
  • Estimated workload profiles (for analytical processing, data processing, ad-hoc processing, etc.).
  • Developed complete end-to-end big data processing in the Hadoop ecosystem.

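Here is a minimal sketch of the "Spark Streaming with Python lambda functions" idea mentioned above, using the classic DStream API; the host and port are placeholders, and a real pipeline would typically read from Kafka or Flume rather than a socket.

    # Count events per type over 10-second micro-batches with Spark Streaming.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="stream_demo")
    ssc = StreamingContext(sc, 10)  # 10-second batch interval

    lines = ssc.socketTextStream("stream-host.example.com", 9999)
    counts = (lines.map(lambda line: line.split(",")[0])      # take the event type
                   .map(lambda event_type: (event_type, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()
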
42. Avro
low Demand
Here's how Avro is used in Data Engineer jobs:
  • Fulfilled requirements from all silos by using AVRO, Parquet, JSON, XML, CSV file formats.
  • Worked on various file formats AVRO, ORC, Text, CSV, Parquet using Snappy compression.
  • Experienced in handling Sequence, ORC, AVRO and Parquet files.
  • Worked with file formats TEXT, AVRO, PARQUET and SEQUENCE files.
  • Used Pig Latin scripts to convert data from JSON, XML and other formats to Avro file format.
  • Designed Hive tables on the hospital data with the file format as Avro and snappy compression.
  • Used different file formats like Text files, Sequence Files, Avro.
  • Worked with Avro Data Serialization system to work with JSON data formats.
  • Experience in using Sequence files, ORC and Avro file formats.
  • Converted the text data to Apache Avro records with compression.
  • Worked with RC/ORC & Avro Hive tables with Snappy compression.
  • Loaded data into landing zone in Avro format.
  • Implemented data serialization using apache Avro.
  • Processed the web server logs by developing Multi-hop flume agents by using Avro Sink and loaded into MongoDB for further analysis.
  • Determine best serialization methodology to store and retrieve the data like Text, Sequential, Avro and Parquet file format.
  • Extracted, translated, loaded and streamed disparate datasets in multiple formats/sources including Avro, JSON delivered by Kafka.
  • Implemented Avro and parquet data formats for apache Hive computations to handle custom business requirements.
  • Imported Avro files using Apache Kafka and did some analytics using Spark.
  • Experience in using Avro, Parquet, RCFile and JSON file formats and developed UDFs using Hive and Pig.
  • Convert Avro format to ORC format and convert existing SQL scripts to HQL and Hive UDFs.

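To show what the Avro handling above looks like in code, here is a minimal sketch with fastavro (one of several Python Avro libraries, assumed to be installed); the schema is a made-up example.

    # Write and read Avro records with a small example schema.
    from fastavro import parse_schema, reader, writer

    schema = parse_schema({
        "name": "Reading", "type": "record",
        "fields": [
            {"name": "meter_id", "type": "string"},
            {"name": "kwh", "type": "double"},
        ],
    })

    records = [{"meter_id": "m-001", "kwh": 1.25}, {"meter_id": "m-002", "kwh": 0.8}]

    with open("readings.avro", "wb") as out:
        writer(out, schema, records, codec="snappy")  # snappy needs the python-snappy package

    with open("readings.avro", "rb") as fo:
        for rec in reader(fo):
            print(rec)
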
43. Mongodb
low Demand
Here's how Mongodb is used in Data Engineer jobs:
  • Handled MongoDB Disaster Recovery Plan for Business Continuity Plan.
  • Developed application component interacting with MongoDB.
  • Developed a tool to stream business related Twitter data and store in MongoDB for sentiment analysis.
  • Worked closely with the development team to help them better understand MongoDB database concepts and techniques.
  • Implemented document store database (like MongoDB) to integrate with a relational database.
  • Migrated Data from Oracle & SQL Server Database by Reverse Engineering to MongoDB Database.
  • Worked with MongoDB and utilized NoSQL for non-relation data storage and retrieval.
  • Involved in data migration from Oracle database to MongoDB.
  • Designed and Implemented MongoDB Cloud Manger for Google Cloud.
  • Populated sensor data into NoSQL database (MongoDB).
  • Worked on NoSQL including MongoDB and Cassandra.
  • Maintain customer data pipelines and ETL processes, extracting data from Exchange/Gmail/CRM/EXCEL into MongoDB with processing and reporting in PostgreSQL.
  • Tuned Reports and Metadata Models for Efficient and faster reporting Environment: Windows 7, UNIX, SQL Server, MongoDB
  • Set up development environment with backend of MongoDB Sandbox and data visualization front end hosted on AWS Elastic Beanstalk.
  • Designed and Implemented Capacity Management Database (CMDB) using MongoDB and reports using D3.js.
  • Experience in ingesting data into MongoDB and consuming the ingested data from MongoDB to Hadoop.
  • Created and Designed the Workflow of migration from Oracle to MongoDB Database.
  • Re-designed the site analytics using MySQL & MongoDB to provide analytic reports for the business group.
  • Picked up skills and contributed greatly in areas of Storm and Scala, and created POCs on HBase and MongoDB.
  • Developed a POC to analyze high-volume events data (>100M views) using Scala/Hadoop/Mahout. Tech stack: Raisin/Spring/Postgres/MongoDB/Talend/Scala/Splunk/AWS.

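Here is a minimal pymongo sketch of the pattern in these bullets: land JSON-like documents in MongoDB and pull a filtered batch back out. The connection string, database and collection names are placeholders.

    # Insert a tweet-like document and query it back from MongoDB.
    from pymongo import MongoClient

    client = MongoClient("mongodb://mongo.example.com:27017")   # placeholder URI
    tweets = client["social"]["tweets"]

    tweets.insert_one({"user": "a1", "text": "shipping the pipeline", "lang": "en"})

    for doc in tweets.find({"lang": "en"}).limit(10):
        print(doc["user"], doc["text"])
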
44. Impala
low Demand
Here's how Impala is used in Data Engineer jobs:
  • Extracted and loaded the required data from the tables by using Hive and Impala by creating new tables in Hive.
  • Load and transform large sets of structured, semi structured and unstructured data with map reduce, Impala and Pig.
  • Created databases, tables and views in HiveQL, Impala and Pig Latin.
  • Load and transform large sets of structured, semi structured using Hive and Impala.
  • Performed querying of both managed and external tables created by Hive using Impala.
  • Leveraged Impala Shell in Unix scripts for auditing the number of processed records.
  • Designed and developed complex impala SQL queries for 360 degree comparison of data.
  • Connected Hive and Impala to Tableau reporting tool and generated graphical reports.
  • Developed shell scripts for running Hive scripts in Hive and Impala.
  • Process unstructured data utilizing Spark, Impala and Hive.
  • Worked closely with developers to enhance queries in Impala.
  • Worked on analyzing different big data analytic tools including Hive, Impala and Sqoop in importing data from RDBMS to HDFS.
  • Stored the data in an Apache Cassandra Cluster Used Impala to query the Hadoop data stored in HDFS.
  • Have been writing Python programs to assist in moving and analyzing data, primarily from Vertica and impala.
  • Implemented extensive Impala 2.7.0 queries and creating views for adhoc and business processing.
  • Installed Cloudera CDH 5.3 Enterprise edition with YARN, Spark and Impala.
  • Developed Hive and Impala scripts on Avro and parquet file formats.
  • Worked on Hive, Impala, Sqoop.
  • Used Spark API 2.0.0 over Cloudera to perform analytics on data in Impala 2.7.0.
  • Implied ETL from Impala Hadoop system to Neo4j Graph Database.

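One common way to script the Impala audits and comparisons above is the impyla client; here is a minimal sketch with a placeholder daemon host and table.

    # Run a query against Impala over its HiveServer2-compatible port.
    from impala.dbapi import connect

    conn = connect(host="impalad.example.com", port=21050)   # placeholder daemon
    cur = conn.cursor()
    cur.execute("SELECT event_date, COUNT(*) FROM events GROUP BY event_date")
    for event_date, cnt in cur.fetchall():
        print(event_date, cnt)
    cur.close()
    conn.close()
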
45. Redshift
low Demand
Here's how Redshift is used in Data Engineer jobs:
  • Developed migration framework to RedShift DW platform.
  • Designed the production Redshift data warehouse star schema for the proper balance of performance, cost, and flexibility.
  • Create ETL jobs that wrangled, aggregated, and migrated data from PostGres to Redshift to meet BI demands.
  • Tuned Performance of Redshift DW by creating suitable Distribution Styles and Sort Keys on Dimensions and Facts.
  • Worked on AWS- Redshift Data ware house offering from Amazon in Display Advertising group.
  • Experienced in data warehousing using Amazon Redshift when dealing with large sets of data.
  • Worked on migrating the existing BI system to the AWS cloud and Redshift.
  • Extracted the data from MySQL, AWS RedShift into HDFS using Sqoop.
  • Programmed ETL functions between Oracle and Amazon Redshift.
  • Validated data on cloud using Amazon Redshift.
  • Piloted project evaluating Amazon Redshift for analytics.
  • Involved in migrating and testing ETL jobs from Oracle, Netezza to Hive (built on HDFS), AWS Redshift Database
  • Included designing different types of schema to maximize the distributing capability for Redshift and collecting data from multiple sources (i.e.
  • Created and managed a data-warehouse and a data-lake for BI analytics platform in Redshift.
  • Prepared, Scaled and transformed datasets for classification models using R and Redshift.
  • Create new tables in Amazon Redshift using sortkeys, distkeys and primary-foreign key references to optimize performance.
  • Design of the Redshift data model and Redshift performance improvements/analysis; continuous monitoring and management of the Hadoop cluster through Cloudera Manager.

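A typical way to feed Redshift, implied by several bullets above, is a COPY from S3 issued over a PostgreSQL connection. Here is a minimal psycopg2 sketch; the cluster endpoint, credentials, table, bucket and IAM role are all placeholders.

    # Load a staged S3 file into a Redshift table with COPY.
    import psycopg2

    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439, dbname="analytics", user="etl_user", password="***",
    )
    with conn, conn.cursor() as cur:
        cur.execute("""
            COPY sales.daily_orders
            FROM 's3://example-raw-data/daily_orders/'
            IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
            FORMAT AS CSV;
        """)
    conn.close()
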
46. File System
low Demand
Here's how File System is used in Data Engineer jobs:
  • Involved in loading data from UNIX file system to HDFS and also responsible for writing generic scripts in UNIX.
  • Imported the data from different sources like AWS S3, Local file system into Spark RDD.
  • Developed UNIX shell scripts to load large number of files into HDFS from Linux File System.
  • Ingested the data from various file system to HDFS using Unix command line utilities.
  • Develop shell scripts and invoke through program to process operations on HDFS file system.
  • Developed a HDFS plugin to interface with proprietary file system.
  • File system management with VERITAS volume manager and LVM.
  • Developed extract of file system info.
  • Load data from Linux file system to HDFS (Hadoop framework) and manipulate /analyze the data utilizing Impala and Hive.
  • Pipelined the output of the filtered data in HDFS file system and same has been projected in a MapViewer.
  • Used SQOOP to import and export structured and semi structured data into Hadoop distributed file system for further processing.
  • Performed fsck on the file systems and bad super blocks were repaired using repair and analyze.
  • Involved in loading data from local file systems to Hadoop Distributed File System.
  • Performed File system management and monitoring on Hadoop log files.
  • Advised file system engineers on Hadoop specific optimizations.
  • Used shell scripts to dump the data from MySQL to the Hadoop Distributed File System (HDFS).
  • Extracted RDBMS data (Oracle, MySQL) to the Hadoop Distributed File System (HDFS) using Sqoop.

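The "local file system to HDFS" loads above usually come down to the hdfs CLI; here is a minimal sketch that drives it from Python, with placeholder paths. In practice this is often just a shell script on a cron schedule.

    # Create the target HDFS directory and copy local files into it.
    import subprocess

    local_dir = "/data/landing/clickstream/"
    hdfs_dir = "/user/etl/raw/clickstream/"

    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir], check=True)
    subprocess.run(["hdfs", "dfs", "-put", "-f", local_dir, hdfs_dir], check=True)
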
47. EC2
low Demand
Here's how EC2 is used in Data Engineer jobs:
  • Developed applications with high performance utilizing EMR-EC2 instances by choosing appropriate instance types and capacity.
  • Moved physical servers to AWS EC2 Cloud by creating multiple cloud instances using Elastic IP and Elastic Block Volumes.
  • Created monitors, alarms and notifications for EC2 hosts using Cloud Watch, Cloud trail and SNS.
  • Designed, developed and deployed CSV Parsing using the big data approach on AWS EC2.
  • Used Amazon EC2 to utilize GPU acceleration for the heavy-load and computation power demanding CNN.
  • Experience in migration of data across cloud environment to Amazon EC2 clusters.
  • Experience in AWS cloud environment and on s3 storage and ec2 instances.
  • Developed a task execution framework on EC2 instances using SQL and DynamoDB.
  • Worked with ELASTIC MAPREDUCE and setup environment in AWS EC2 Instances.
  • Installed and configured Hive and Pig environment on Amazon EC2.
  • Worked and developed solutions in major AWS services like Lambda, EMR, EC2, Kinesis, Elastic Search, CloudFormation.
  • Experienced on adding/installation of new components and removal of them through ambari on HDP and manually on EC2 clusters.
  • Implemented Installation and configuration of multi-node cluster on Cloud using Amazon Web Services (AWS) on EC2.
  • Used Aspera Client on Amazon EC2 instance to connect and store data in the Amazon S3 cloud.
  • Provisioned servers using public cloud (AWS EC2) and private cloud (vSphere).
  • Worked with administration team and created the Load Balancer on AWS EC2 for unstable cluster.
  • Set up various instances of Yellowfin AMI on multiple Amazon EC2 instances.
  • Worked on AWS to create, manage EC2 instances and Hadoop Clusters.
  • Ported from Microsoft BI stack to Amazon EC2 & Redshift.
  • Worked on orchestrating Hadoop, Pig and Hive workflows on EC2 Cluster using Apache Oozie.

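To make the EC2 provisioning above concrete, here is a minimal boto3 sketch; the AMI ID, key pair and instance type are placeholders, and credentials are assumed to come from the environment or an IAM role.

    # Launch a single EC2 instance.
    import boto3

    ec2 = boto3.resource("ec2", region_name="us-east-1")
    instances = ec2.create_instances(
        ImageId="ami-0123456789abcdef0",   # placeholder AMI
        InstanceType="m4.large",
        KeyName="data-eng-keypair",        # placeholder key pair
        MinCount=1,
        MaxCount=1,
    )
    print("launched:", instances[0].id)
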
48. Perl
low Demand
Here's how Perl is used in Data Engineer jobs:
  • Developed automated process utilizing Shell/Perl scripts for operations users to maintain data.
  • Converted all chain of custody forms and legal documents into PDF format minimizing any risk of information being improperly altered.
  • Ensured the proper equipment was procured and properly installed in accordance with regulations while maintaining an appropriate budget.
  • Created automated feeds of data using both SSIS and Perl to provide the business with the critical data.
  • Generate TOP business KPI using Perl scripting in the MS Excel for Business users and analysts.
  • Created API and Web-Services using REST to other Web-properties of the company using Perl.
  • Created a Dashboard using Perl/CGI, Strawberry Perl for Windows and Apache Web Server.
  • Developed Shell and Perl scripts to automate the DBA monitoring and diagnostic jobs.
  • Design and code web services automation using Perl SOAP and web API.
  • Introduced PERL automated framework to assure quality of SQL code.
  • Design and Develop File Based data collections in Perl.
  • Develop Unix and PERL shell scripts.
  • Implemented the same using PERL, PHP and MySQL.
  • Experience in UNIX, SQL, and Perl.
  • Develop QA automation SQL/Perl scripts for validating ETL processes and BI reporting for the Yahoo advertiser reporting data warehouse.
  • Tested eNB handover to check that handover from 4G to 3G and vice versa was taking place properly.
  • Customized core Joomla functionality in Php to interface and integrate the Existing site in Perl.
  • Develop ETL scripts using Perl, Python, Scala on Linux/Hadoop Platform.
  • Built using mod_perl and MongoDB.
  • Developed Perl scripts to generate compatible DDL for Hive external tables; also worked with Scala and Scala Play.

49. Pig UDF
low Demand
Here's how Pig UDF is used in Data Engineer jobs:
  • Designed and implemented Hive and Pig UDF's for evaluation, filtering, loading and storing of data.
  • Developed PIG UDF'S for manipulating the data according to Business Requirements and also worked on developing custom PIG Loaders.
  • Experienced with Pig Latin operations and writing Pig UDF's to perform analytics.
  • Developed Pig UDF's to handle complex parsing and business logics.
  • Developed the Pig UDF's to pre-process the data for analysis.
  • Developed Pig UDFs to specifically preprocess and filter data sets for analysis.
  • Experience writing reusable custom Hive and Pig UDFs in Java, and using existing UDFs from Piggybank and other sources.

50. Log Data
low Demand
Here's how Log Data is used in Data Engineer jobs:
  • Collected and aggregated large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Imported the log data from different servers into HDFS using Flume and developed MapReduce programs for analyzing the data.
  • Graph database modeling for log data and brute-force attack identification using R and Neo4j graph database.
  • Used Flume to collect, aggregate, and store the log data from different web servers.
  • Ingested and integrated the unstructured log data from the web servers onto cloud using Flume.
  • Analyze the log data from Elastic Search to find the pattern of service/host failures.
  • Analyzed the performance log data to improve the application response time and performance.
  • Log data and implemented Hive custom UDF's.
  • Designed and implemented monitoring service using distributed frameworks for persistence and analysis of metric and log data within Macys.
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
  • Interpreted lithology based on formation samples and electric log data.
  • Streamed log data from mobile and web servers into the Hadoop cluster using Sqoop.

20 Most Common Skills For A Data Engineer

Skill Percentage
Pl/Sql 12.6%
Database 8.0%
ETL 7.9%
Hadoop 7.4%
Hdfs 5.3%
Python 4.9%
Sqoop 4.7%
Analytics 4.7%
Web Application 4.4%
Cloud 4.3%
Data Warehouse 4.3%
Big Data 4.2%
Procedures 3.8%
Hbase 3.6%
SQL 3.6%
Oozie 3.4%
Unix 3.3%
Data Analysis 3.2%
Linux 3.2%
Mapreduce 3.2%

Typical Skill-Sets Required For A Data Engineer

Rank Skill Percentage
1 Pl/Sql 8.5%
2 Database 5.4%
3 ETL 5.3%
4 Hadoop 5.0%
5 Hdfs 3.6%
6 Python 3.3%
7 Sqoop 3.2%
8 Analytics 3.2%
9 Web Application 2.9%
10 Cloud 2.9%
11 Data Warehouse 2.9%
12 Big Data 2.8%
13 Procedures 2.6%
14 Hbase 2.5%
15 SQL 2.4%
16 Oozie 2.3%
17 Unix 2.2%
18 Data Analysis 2.2%
19 Linux 2.2%
20 Mapreduce 2.1%
21 Kafka 1.8%
22 Scala 1.7%
23 Flume 1.7%
24 BI 1.6%
25 API 1.6%
26 Business Requirements 1.4%
27 Informatica 1.4%
28 Teradata 1.3%
29 XML 1.2%
30 Nosql 1.2%
31 Ssis 1.1%
32 R 1.1%
33 Json 1.1%
34 S3 1.0%
35 Zookeeper 1.0%
36 Spark SQL 1.0%
37 Rdbms 0.9%
38 Ingest 0.9%
39 Data Quality 0.9%
40 Log Files 0.9%
41 Data Processing 0.9%
42 Avro 0.9%
43 Mongodb 0.8%
44 Impala 0.8%
45 Redshift 0.8%
46 File System 0.8%
47 EC2 0.7%
48 Perl 0.7%
49 Pig UDF 0.7%
50 Log Data 0.7%