Top Data Engineer Skills

Below we've compiled a list of the most important skills for a Data Engineer. We ranked the top skills based on the percentage of Data Engineer resumes they appeared on. For example, 8.5% of Data Engineer resumes contained PL/SQL as a skill. Let's find out what skills a Data Engineer actually needs to be successful in the workplace.

The most common skills found on Data Engineer resumes in 2020 are listed below. Read on to see the full list.

1. PL/SQL

high Demand
Here's how PL/SQL is used in Data Engineer jobs:
  • Created PL/SQL packages to be called by Web Application to process and store data for Consumer Data Privacy Preference.
  • Worked on Programming using PL/SQL, Stored Procedures, Functions, Packages, Database triggers for Oracle and SQL.
  • Created database objects such as stored procedures, views, and materialized views in Oracle 10g using PL/SQL.
  • Developed SQL queries, PL/SQL programming Packages, Procedures, and Functions to meet various user/business requirements.
  • Major technologies: Oracle, PL/SQL, XML, J2EE, Java, Agile/Scrum methodologies
  • Developed suite of reports used by Domain Experts, requiring complex SQL and PL/SQL.
  • Developed modules in PL/SQL and scheduled them to run using BMC scheduling software.
  • Create extract data files for auditing or external customers using SQL or PL/SQL.
  • Coded and implemented PL/SQL packages to perform application security and batch job scheduling.
  • Developed PL/SQL procedures, functions and packages for automating the monthly reports.
  • Developed and tested stored procedures, functions and packages in PL/SQL.
  • Involved in developing high performance database applications using SQL and PL/SQL.
  • Object Orient programming implementation in PL/SQL by using Object Types.
  • Developed payroll reports using PL/SQL in Oracle ERP system.
  • Used PL/SQL as a QA tool for all changes.
  • Managed and cleaned data, removed unusable traces, and corrected or removed errors in flight information with Oracle PL/SQL.
  • Migrate and convert MSSQL stored procedures to Oracle PL/SQL Create and maintain month end reporting process and reports.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus and PL/SQL.
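To make the bullets above concrete, here is a minimal, hedged sketch (not taken from any resume above) of calling a PL/SQL packaged procedure from application code with the python-oracledb driver. The package name, procedure name, parameters, and connection details are all assumptions for illustration.

    # Illustrative sketch: invoke a hypothetical PL/SQL packaged procedure
    # (pkg_privacy.save_preference) through the python-oracledb driver.
    import oracledb

    # Placeholder credentials and DSN.
    conn = oracledb.connect(user="app_user", password="app_pass",
                            dsn="dbhost:1521/ORCLPDB1")
    try:
        with conn.cursor() as cur:
            # callproc passes positional arguments to the procedure's IN parameters.
            cur.callproc("pkg_privacy.save_preference",
                         ["consumer-123", "EMAIL_OPT_OUT", "Y"])
        conn.commit()
    finally:
        conn.close()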


2. Relational Databases

high Demand
Here's how Relational Databases is used in Data Engineer jobs:
  • Developed new OLAP and relational databases.
  • Imported millions of structured data from relational databases using Sqoop.
  • Exported the analyzed data to the relational databases using Sqoop.
  • Experience in migrating data from relational databases to Cassandra.
  • Developed UNIX scripts to create batch loads that bring large amounts of data from relational databases to the Big Data platform.
  • Work on the relational databases deployed with GC Computing engine.
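As a rough illustration of pulling data out of a relational database in Python, here is a short sketch using SQLAlchemy and pandas; the connection URL, table, and columns are placeholders, not details from the resumes above.

    # Minimal sketch: read rows from a relational database into a DataFrame.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("postgresql+psycopg2://user:pass@dbhost:5432/sales")
    df = pd.read_sql("SELECT order_id, customer_id, total FROM orders", engine)
    print(df.head())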


3. ETL

high Demand
Here's how ETL is used in Data Engineer jobs:
  • Designed ETL architecture and architecture documents and mapping documents.
  • Developed ETL jobs using DataStage to populate dimensional models.
  • Scheduled/monitored ETL jobs and responsible for troubleshooting failures.
  • Implemented ETL metadata and perform data validation.
  • Used Spark as an ETL tool to remove Duplicates, Joins and aggregate the input data before storing in a Blob.
  • Use and maintenance of configurable C++ modules allowed us to create efficient ETL strategies quickly for each customer sending us information.
  • Monitored day-to-day ETL solutions, scheduled regular database backups, restoring the databases and coordinated with DBA's and Business users.
  • Extract, transform, and load (ETL) data from multiple federated data sources (relational databases) in Spark.
  • Modeled the mart design and created ETL to store the data in multiple pivots along with aggregated tables for faster access.
  • Analyzed Business and systems requirements and playing a key role in implementing the system using Data Stage as ETL tool.
  • Lead and implemented multiple key data warehousing ETL and BI solutions for various internal business units including CTG and SBG.
  • Consolidated three scheduled ETL jobs into one and improved time from source to target from 3 days to 2 hours.
  • Implement ETL framework to provide features such as Master Data Management, ETL-restart capability, security model and version control.
  • Installed and configured Pig for ETL jobs and made sure we had Pig scripts with regular expression for data cleaning.
  • Accessed, configured, and tuned SQL Server databases to store, query, and perform ETL processes on data.
  • Build ETL, create low latency dashboards, build web interactive forms and integrate calendar for P&G.
  • Developed an ETL system to replace an existing data pipeline built on DTS using Python 2.7 and SQL Server.
  • Tested graphs for extracting, cleansing, transforming, integrating, and loading data using Data Stage ETL Tool.
  • Provide ETL design, development and testing estimations based on business requirements and research into data that is currently sourced.
  • Created ETL error log which stored errors so if packages fail, it sends email notifications with error attachments.
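Several bullets above describe Spark-based ETL (deduplicate, join, aggregate, write). Here is a minimal PySpark sketch of that pattern; the paths, columns, and table layout are assumptions for illustration only.

    # Sketch of a Spark ETL step: deduplicate, join to a reference table,
    # aggregate, and write the curated result back out.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

    orders = spark.read.parquet("/data/raw/orders").dropDuplicates(["order_id"])
    customers = spark.read.parquet("/data/raw/customers")

    joined = orders.join(customers, on="customer_id", how="left")
    daily = (joined.groupBy("order_date")
                   .agg(F.sum("amount").alias("total_amount"),
                        F.countDistinct("customer_id").alias("customers")))

    daily.write.mode("overwrite").parquet("/data/curated/daily_sales")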


4. Hadoop

high Demand
Here's how Hadoop is used in Data Engineer jobs:
  • Used Spark on HADOOP to analyze data.
  • Provided solution in supporting Hadoop Developers and assisting in troubleshooting and optimization of MapReduce jobs and Pig Latin scripts.
  • Enhanced recent computer science education with system administration skills developed while standing up and maintaining Hadoop big data platforms.
  • Involved in integrating Hadoop into existing technology stacks and software portfolios to achieve maximum Business value.
  • Validated and Recommended on Hadoop Infrastructure and data center planning considering data growth.
  • Research and recommend suitable technology stack for Hadoop migration considering current enterprise architecture.
  • Installed and configured various components of Hadoop ecosystem and maintained their integrity.
  • Performed operating system installation, Hadoop version updates using automation tools.
  • Involved in Design and Development of technical specifications using Hadoop technologies.
  • Mentored an intern working on recommendation engine using Hadoop/ Mahout.
  • Performed data validation and transformation using Python and Hadoop streaming.
  • Installed and configured Hadoop Cluster for development and testing environment.
  • Integrated Hadoop execution environment with enterprise job scheduling environments.
  • Maintain System integrity of all sub-components related to Hadoop.
  • Deployed Hadoop Cluster in Fully Distributed and Pseudo-distributed modes.
  • Provided solution for MapReduce development using Hadoop.
  • Developed data processing pipeline in Hadoop.
  • Involved in installing Hadoop Ecosystem components.
  • Developed custom ETL solution, batch and real time data ingestion pipeline to move data in and out from Hadoop.
  • Integrated Hadoop into traditional ETL, accelerating the extraction, transformation, and loading of massive structured and unstructured data.


5. Data Warehouse

high Demand
Here's how Data Warehouse is used in Data Engineer jobs:
  • Prioritize, address and resolve support requests regarding data warehouse/reporting systems, coordinating with other IT departments when necessary.
  • Resolved client support tickets regarding the data warehouse production environment such as data accuracy, timeliness and validity.
  • Participated in design of Staging Databases and Data Warehouse/Data mart using Star Schema/Snowflakes schema in data modeling.
  • Develop and implement data warehouse solutions while observing industry best practices and established standard operating procedures.
  • Designed staging database and data mart for data warehouse using Star Schema Dimensional Modeling Methodology.
  • Modeled a financial data warehouse creating a golden version of data incorporating sentiment analysis.
  • Provided comprehensive system administration support for large data warehouses.
  • Prepared data warehouse and ETL documentations.
  • Managed, designed, and maintained a data warehouse used to track liquidity data for several equity markets around the globe.
  • Experience with creating ETL jobs to load JSON data and server data into MongoDB and transforming MongoDB data into the Data Warehouse.
  • Created several SSIS packages in SQL Server 2012 environment to load data from OLTP to staging and to Data Warehouse incrementally.
  • Led efforts to develop a Data Model used for populating the Data Warehouse with Star Schema as the primary technology.
  • Design and write summary and detail reports against the Data Warehouse DB using Crystal Reports, SAS and Oracle Discoverer.
  • Created ETL jobs to load Twitter JSON data into MongoDB and jobs to load data from MongoDB into Data warehouse.
  • Have extensive experience in developing efficient PL/SQL code for OLTP environment and ETL code for the Data Warehouse environment.
  • Designed and developed ETL scripts to process past chat records to load into Agent IQ's data warehouse.
  • Worked on HRIS Data warehouse and wrote new PL/SQL functions to add Stock vest logic for different countries.
  • Engineered and supported the Campaign Data Mart (Oracle Database) as part of the data warehouse team.
  • Analyze, develop, test and support data integration processes in support of the Data Warehouse project.
  • Required to monitor the performance of the data warehouse and work with team members for technology decisions.



6. HDFS

high Demand
Here's how HDFS is used in Data Engineer jobs:
  • Used interceptors with RegEx as part of flume configuration to eliminate the chunk from logs and dump the rest into HDFS.
  • Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.
  • Created and written queries in SQL to update the changes in MySQL when we upload or delete file in HDFS.
  • Worked on Linux shell scripts for business processes and with loading the data from different systems to the HDFS.
  • Developed a Spark job in Java which indexes data into ElasticSearch from external Hive tables which are in HDFS.
  • Imported the log data from different servers into HDFS using Flume and developed MapReduce programs for analyzing the data.
  • Developed Hive scripts to enrich, cleanse and transform data to load them on to HDFS with dynamic partitioning.
  • Stored data from HDFS to respective Hive tables for further analysis in order to identify the Trends in data.
  • Involved in loading data from UNIX file system to HDFS and also responsible for writing generic scripts in UNIX.
  • Used Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.
  • Collected and aggregated large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and processing.
  • Configured Spark Streaming to receive real time data from the Kafka and store the stream data to HDFS.
  • Developed ETL process in PySpark to effectively parse and load huge volume of data onto HDFS and S3.
  • Used the source agent in Apache Flume to import the data, especially logs information, into HDFS.
  • Used Apache Kafka as messaging system to load log data, data from UI applications into HDFS system.
  • Performed data extraction from various databases to use in statistical modeling including HDFS, MySQL and Flat files.
  • Worked on implementing Flume to import streaming data logs and aggregating the data to HDFS through Flume.
  • Involved in loading data from UNIX file system to HDFS using Flume and Kettle and HDFS API.
  • Create tables, analyze data and write SQL queries in Impala on the processed data in HDFS.
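Many of the bullets above involve landing files from a local or UNIX file system into HDFS. One simple, hedged way to do that from Python is to shell out to the hdfs dfs command line; the paths below are placeholders.

    # Load a local log file into HDFS via the hdfs dfs CLI.
    import subprocess

    local_path = "/var/log/app/events.log"
    hdfs_dir = "/data/raw/logs/"

    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir], check=True)
    subprocess.run(["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir], check=True)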


7. Python

high Demand
Here's how Python is used in Data Engineer jobs:
  • Developed fuzzy-text matching program in Python to identify and eliminate redundancy on county instruments.
  • Developed data quality portal using Python scripting.
  • Implemented automation scripts using Python.
  • Developed internal tools for text mining (sentiment analysis) and a web crawler for collecting data using Python and SQL.
  • Developed various Python scripts to find vulnerabilities with SQL Queries by doing SQL injection, permission checks and performance analysis.
  • Developed python scripts to parse the raw data, populate staging tables and store the refined data in partitioned tables.
  • Write scripts through Python to accomplish various repetitive tasks, validate database fields, and to write to databases.
  • Developed automation scripts in python to automate the test, analyze, plot and report the results.
  • Used Python to scrub and transform medical procedure data, and loaded clean data into ElasticSearch.
  • Developed Python and Bash, Shell scripts to automate the end-to-end implementation process of AI project.
  • Used Spark-Python as part of doing a POC to migrate an existing MapReduce job to Spark.
  • Developed shell scripting and Python programs to automate the data flow on day to day tasks.
  • Used Linux, python, pandas, and SQL to accomplish tasks assigned to me.
  • Experience in developing customized UDF's in Python to extend Hive and Pig Latin functionality.
  • Created Python API loader for mobile reports that are viewed at the CIO level daily.
  • Created Python programs to scrape text from different websites containing scanned images of county records.
  • Involved in converting Hive SQL queries into Spark transformation using Spark RDD's, Python.
  • Created solution to facilitate collection of real time logs using Apache Spark and Python.
  • Gathered data on select companies by creating unique web-scraping programs in Ruby and Python.
  • Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.
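The bullets above mention Python scripts that parse raw data, validate it, and stage it for loading. Here is a small, hedged sketch of that kind of script using pandas; the file names, columns, and partitioning scheme are assumptions.

    # Parse a raw CSV, drop malformed rows, and write output partitioned by date.
    import pandas as pd
    from pathlib import Path

    raw = pd.read_csv("raw_events.csv", dtype=str)
    raw = raw.dropna(subset=["event_id", "event_date"])           # basic validation
    raw["amount"] = pd.to_numeric(raw["amount"], errors="coerce")  # coerce bad values

    out_dir = Path("staging")
    for event_date, part in raw.groupby("event_date"):
        part_dir = out_dir / f"event_date={event_date}"
        part_dir.mkdir(parents=True, exist_ok=True)
        part.to_csv(part_dir / "part-0.csv", index=False)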


8. Sqoop

high Demand
Here's how Sqoop is used in Data Engineer jobs:
  • Create Data Models in Cassandra using CQL and SQOOP.
  • Developed UDF's using both DataFrames/ SQL and RDD in Spark for data Aggregation queries and reverting into OLTP through Sqoop.
  • Exported analyzed data to downstream systems using Sqoop for generating end-user reports, Business Analysis reports and payment reports.
  • Streamlined Sqoop import from Oracle to Hive and implemented scripts to export the aggregations back to MicroStrategy.
  • Developed the Sqoop scripts in order to make the interaction between Apache Pig and MYSQL Database.
  • Configured Sqoop and developed scripts to extract structured data from PostgreSQL onto Amazon S3 cloud.
  • Designed and developed a framework to automate the creation of Sqoop jobs.
  • Created Sqoop job with incremental load to populate Hive External tables.
  • Developed a strategy for Full load and incremental load using Sqoop.
  • Proposed an automated system using Shell script to sqoop the job.
  • Captured data from existing databases that provide SQL interfaces using Sqoop.
  • Exported analyzed data to S3 using Sqoop for generating reports.
  • Exported the patterns analyzed back to MYSQL using Sqoop.
  • Scheduled ingestion scripts to support incremental loading using Sqoop.
  • Migrated data using sqoop from HANA and Oracle.
  • Automated the Sqoop jobs using Shell Scripts.
  • Worked extensively with Sqoop for importing data.
  • Performed data analytics in Hive and then exported this metrics back to Oracle Database using Sqoop.
  • Handled migration of 160 tables from Oracle to HDFS using Apache Sqoop and the Z connector.
  • Developed Sqoop scripts to handle the interaction between Hive and the Vertica database.
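Several bullets above describe incremental Sqoop imports. As a hedged illustration, here is what driving such an import from Python might look like by invoking the sqoop CLI; the JDBC URL, table, credentials, and checkpoint value are placeholders.

    # Incremental Sqoop import driven from Python via subprocess.
    import subprocess

    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@dbhost:1521/ORCL",
        "--username", "etl_user", "--password-file", "/user/etl/.pwd",
        "--table", "ORDERS",
        "--target-dir", "/data/raw/orders",
        "--incremental", "append",
        "--check-column", "ORDER_ID",
        "--last-value", "1000000",
        "-m", "4",
    ]
    subprocess.run(cmd, check=True)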


9. Analytics

high Demand
Here's how Analytics is used in Data Engineer jobs:
  • Used advanced analytics techniques such as regression, decision trees and optimization to design algorithms for advertisement targeting.
  • Provide expertise, advice and support regarding database development, analytics and performance tuning.
  • Synthesized and advocated insights, and recommendations from data analytics and modeling.
  • Lead data integration process life cycle from Requirements gathering to Business Analytics.
  • Developed ad-clicks based data analytics, for keyword analysis and insights.
  • Develop analytics services to support internal operations.
  • Designed Diagnostics Analytics using Tableau Desktop.
  • Designed and developed real-time analytics processing tools, as well as batch processes to analyze approximately 7 million signals each day.
  • Implemented SAP BODS Data Services 4.1 for Demeter Analytics, a database marketing firm located in Alexandria, VA.
  • Worked with product and analytics team to test hypothesis, answer key questions, and create reports and dashboards.
  • Worked on Recruiting Analytics (RA), a dimensional model designed to analyze the recruiting data in Amazon.
  • Implemented the analytics dashboard used to track app features performance: search, feed and each single app.
  • Devise the architecture for data provisioning, data storage, data extraction, data transformation and data analytics.
  • Engaged in various internal R&D effort based on NLP technologies for social and media analytics.
  • Experience working with several windowing and Analytics functions of Hive for aggregating data of a specific range.
  • Designed a real estate data acquisition system to enable managing, searching, and analytics for agents.
  • Develop new advanced targeting and data analytics to further enhance NDN's digital media exchange platform.
  • Project was presented during in-house data analytics expo and yearly project highlights town hall meeting.
  • Worked with SSAS cubes, Excel and other UI frameworks to implement the visual analytics.
  • Converted semi-structured data to structured data and imported it into Spotfire with the analytics team.


10. Amazon Web Services

high Demand
Here's how Amazon Web Services is used in Data Engineer jobs:
  • Experience using Amazon Web Services.
  • Initial deployment of 5 node Hortonworks distributed environment on Amazon Web services.
  • Implemented the web interface using Flask and deployed it using Amazon Web Service.
  • Utilized Amazon Web Services for cloud computing, from networking to storage.
  • Used Amazon Web Services to host website for app.
  • Worked with the database on Amazon Web Services.
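For a concrete but hedged example of the kind of AWS usage described above, here is a minimal Python sketch that uploads a data extract to S3 with boto3; the bucket and key names are placeholders.

    # Upload a local extract to S3 using boto3.
    import boto3

    s3 = boto3.client("s3")
    s3.upload_file("daily_extract.csv", "example-data-bucket",
                   "extracts/2020/daily_extract.csv")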


11. Cloud

high Demand
Here's how Cloud is used in Data Engineer jobs:
  • Liaised with multiple IC community/civilian customers and working groups in Cloud/data flow development and coordination of parser design.
  • Designed and developed ingestion framework to cleanse, validate, transform and load data on to GCS and BigQuery on cloud.
  • Experience in deployment of Cassandra cluster in cloud, premises and data storage and their disaster recovery.
  • Work as software developer in Dev-Ops Model to support Data Platform Engineering Service in Private Cloud.
  • Achieved success in providing and growing Public, Private and Hybrid cloud VM in NYC.
  • Monitor and evaluate the industry trends and directions in the Cloud technologies and tools.
  • Manage a team of subject matter experts for Cloud Security and Data Loss Protection.
  • Developed Spark streaming application to pull data from cloud to Hive table.
  • Work supported a large scale data framework using cloud technologies.
  • Developed a public private cloud platform decision model.
  • Experienced with monitoring Cluster using Cloudera manager.
  • Identified feedback using sentiment analysis based on the comments; integrated the data cloud based on the analysis performed on technology streams.
  • Interact with the vendor (Cloudera) for any technical issues.
  • Worked on cloud computing infrastructure (e.g.
  • Installed Cloudera Manager CDH on the clusters.
  • Designed and developed collections, shards and replicas in Solr and successfully implemented Cloudera Search functionality.
  • Debug and solve the major issues with Cloudera manager by interacting with the Cloudera team.
  • Worked on POC and implementation & integration of Cloudera & Hortonworks for multiple clients.
  • Supported Aretta Cloud PBX, and Broadsoft Virtual Cloud Phone Systems and Networks.
  • Installed Cloudera tools, Ganglia and Nagios tools for production environment.
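One bullet above mentions loading data onto GCS and BigQuery. As a rough sketch only, here is how a load from a GCS file into a BigQuery table can look with the google-cloud-bigquery client; the project, dataset, table, and URI are assumptions.

    # Load a CSV file from GCS into a BigQuery table.
    from google.cloud import bigquery

    client = bigquery.Client(project="example-project")
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
    )
    load_job = client.load_table_from_uri(
        "gs://example-bucket/raw/events.csv",
        "example-project.analytics.events",
        job_config=job_config,
    )
    load_job.result()  # block until the load job finishes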


12. Data Analysis

high Demand
Here's how Data Analysis is used in Data Engineer jobs:
  • Performed wireless data analysis, mapping, quality assurance and develop effective business strategic plans and weekly/daily status reports for managers.
  • Performed data analysis and modeling using statistical regression methods for evaluating recommendations based on optimal performance and cost data.
  • Performed data analysis and optimized algorithms using various statistical & mathematical models for ice events pattern recognition.
  • Performed data analysis on Facebook accounts to segment customers for more effective, targeted marketing.
  • Provided efficient coding to reduce large data analysis time for time sensitive credit applications.
  • Prepared data analysis for independent driver coaching and engine monitoring.
  • Provided feedback to engineers regarding information obtained from data analysis.
  • Perform complex data analysis to identify patterns and anomalies.
  • Developed Pig scripts for data analysis and perform transformation.
  • Perform data analysis in an evolving data environment.
  • Performed data analysis using SQL and excel
  • Used big data tools such as Apache Pig, Hive, RStudio, and Apache Spark to complete data analysis tasks.
  • Used Power BI Power Pivot to develop data analysis, and used Power View and Power Map to visualize reports.
  • Created and maintained a post mission data request database for subsystem post flight data analysis for all Space Shuttle flights.
  • Created and maintained a post test data request database for subsystem test data analysis of all Shuttle software ground tests.
  • Monitored critical daily jobs, performed data analysis to identify the root cause and fixed the issues using SQL queries.
  • Source system analysis, data analysis, data modeling to ETL (Extract, Transform and Load) and HiveQL.
  • Gather and clarify requirements of customized data analysis, reporting and data extracts to meet client need and requests.
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
  • Developed Complex ETL code through Data manager to design BI related Cubes for data analysis at corporate level.


13. Big Data

high Demand
Here's how Big Data is used in Data Engineer jobs:
  • Implemented Big Data solutions and analyzed virtual machine requirements.
  • Designed the application framework, data strategies, tools and technologies using the Big Data and Cloud technologies.
  • Involved in the Big Data requirements review meetings and partnered with business analysts to clarify any specific scenarios.
  • Involved in working with large sets of big data in dealing with various security logs.
  • Designed, developed and maintained Big Data streaming and batch applications using Storm.
  • Involve in defining and implementing Big Data BI reports and data marts.
  • Understand the requirements and prepared architecture document for the Big Data project.
  • Design, development and deployment of custom applications with big data.
  • Design the Global Match Utility using the power of Big Data.
  • Create and maintain Tableau reports based on Big Data requirements.
  • Started using Apache Spark 2.0 to simplify big data management.
  • Worked as the team member in the platform group (Big Data).
  • Collaborate with architects to define Big Data implementation in the data warehouse.
  • Developed analytical solutions, data strategies, tools and technologies for the marketing platform using the Big Data technologies.
  • Developed Use cases in demonstrating the current Big Data Eco Systems and inter connected them on their functionalities.
  • Analyzed Hadoop cluster using big data analytic tools including Kafka, Pig, Hive, Spark.
  • Defined Big Data Roadmap and Capabilities for the bank and the asset management firm.
  • Worked with analysts to understand and load big data sets into Accumulo.
  • Support the Innovation Center, specifically the Big Data Platform & Analytics tools and platforms available on Hortonworks.


14. HBase

high Demand
Here's how HBase is used in Data Engineer jobs:
  • Designed and Modified Database tables and used HBASE Queries to insert and fetch data from tables.
  • Perform the coding and testing on Load, Extract the data into the HBASE database.
  • Performed Hive-Hbase Integration for providing some additional functionalities like aggregation, sorting etc.
  • Worked on ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies such as HBase, and Hive.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from a variety of data sources.
  • Implemented the NoSQL databases like HBase, the management of the other tools and process observed running on YARN.
  • Used Hbase tables to capture all job statistics and maintained job history table based on batch numbers.
  • Used Spring Data for NoSQL HBase, Cassandra, & Hibernate/JPA for MySql & Derby.
  • Involved in adding huge volume of data in rows and columns to store in HBase.
  • Used Hbase to store majority of data which needs to be divided based on region.
  • Worked on NoSQL databases like Cassandra, HBase to store structured and unstructured data.
  • Created HBase tables to store variable data formats of data coming from different applications.
  • Imported the huge sized TSV files to HBase for storing, using bulk load.
  • Performed transformations and aggregation to build data model and persists the data into HBase.
  • Created HBase tables to store variable data formats of data coming from different portfolios.
  • Developed counters on HBase data to count total records on different tables.
  • Implemented helper classes that access HBase directly from Java using the Java API.
  • Integrated the Hive warehouse with HBase for information sharing among teams.
  • Used pig loader for loading tables from HBase to various clusters.
  • Developed the storage system interface using Hbase as the database.
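To illustrate the HBase reads and writes described above, here is a hedged Python sketch using the happybase library (which talks to HBase through the Thrift server); the table name, column family, and row keys are assumptions.

    # Write a row to an HBase table and scan it back via happybase.
    import happybase

    connection = happybase.Connection("hbase-thrift-host")
    table = connection.table("job_stats")

    # HBase row keys, column qualifiers, and values are byte strings.
    table.put(b"batch-20200401", {b"stats:records": b"152340",
                                  b"stats:status": b"SUCCESS"})

    for key, data in table.scan(row_prefix=b"batch-2020"):
        print(key, data)

    connection.close()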


15. SQL

average Demand
Here's how SQL is used in Data Engineer jobs:
  • Developed SparkSQL automation components and responsible for modifying java component to directly connect to thrift server.
  • Monitored performance and optimized SQL queries for reducing engineering time and better efficiency.
  • Developed SQL based time-bound notification alerts for subjects meeting compliance/noncompliance criteria.
  • Performed data cleaning and data manipulation activities using NZSQL utility.
  • Tuned MySQL database for optimizing against predictive weather model data.
  • Continued working with them to create and lead an application DBA team for SQL Server, Oracle, and DB2 UDB.
  • Utilized SQL to pull data from various databases and created various dashboards, scorecards and reports to support business decision making.
  • Championed the shift from existing extracts to excel spreadsheets from DB2 to using SQL Server T/SQL in relational data marts.
  • Created number of jobs, alerts and operators to be paged or emailed in case of failure for SQL 2000.
  • Worked on complex SQL queries to create database views and used those views in RPD to create few complex reports.
  • Designed SQL views, triggers and queries as per the requirement of data for testing screen of each insurance policy.
  • Created complex stored procedures with T-SQL and D-SQL to profile the data and discover potential anomalies and missing data values.
  • Utilized spatial data tools such as Shp2pgsql, ogr2ogr to perform complex transformations and standardize the raw spatial data.
  • Created SQL codes from data models and interacted with DBA's to create development, testing and production database.
  • Involved in developing the Spark Streaming jobs by writing RDD's and developing data frame using SparkSQL as needed.
  • Worked on blending, manipulating, cleaning of large data sets on the back-end using SQL in SAP HANA.
  • Converted complex business logic into SQL Stored Procedures and user-defined functions to achieve functionality required by the UI team.
  • Generated reports using SQL procedures to view patient information, doctor itineraries, hospital administration, inventory and sales.
  • Mentor Data Engineers and SQL Developers with less experience; continually strive to find opportunities for process improvement.
  • Involved in creating database objects like tables, views, procedures, triggers, and functions using PL-SQL.


16. Oozie

average Demand
Here's how Oozie is used in Data Engineer jobs:
  • Automated all the jobs, for pulling data from FTP server to load data into Hive tables, using Oozie workflows.
  • Develop Oozie job to send email after Spark job completion with status as success or failure with details.
  • Used Oozie to orchestrate the map reduce jobs that extract the data on a timely manner.
  • Developed Pig scripts to transform the data into structured format; automated through Oozie coordinators.
  • Developed Oozie coordinators to schedule Pig and Hive scripts to create Data pipelines.
  • Designed and developed an Oozie job to run a Spark job on a daily basis.
  • Used OOZIE Operational Services for batch processing and scheduling workflows dynamically.
  • Configured and scheduled MapReduce jobs manually and through Oozie and Yarn.
  • Provided solution in defining job flow and work flow using Oozie.
  • Developed the OOZIE workflows for the Application execution.
  • Scheduled all jobs using Oozie and monitored workloads.
  • Used Oozie to automate/schedule business per the requirements.
  • Implemented Oozie coordinators for job scheduling and processing.
  • Scheduled the batch loading jobs using oozie.
  • Job management using Oozie scheduler.
  • Created production jobs using Oozie work flows that integrated different actions like MapReduce, Sqoop, and Hive.
  • Worked with Zookeeper, Oozie, and Data Pipeline Operational Services for coordinating the cluster and scheduling workflows.
  • Used Oozie Scheduler system to automate the pipeline workflow and extract the data on a timely manner.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS for analysis.
  • Developed a dynamic workflow engine using Oozie and Java which schedules most of our batch jobs.


17. Unix

average Demand
Here's how Unix is used in Data Engineer jobs:
  • Perform quality assurance for data input/output on all orders including data manipulation in UNIX environment if necessary.
  • Set up and coordinated legacy data loads from flat files into SAS staged data repositories using UNIX shell scripts and CRON.
  • Developed a framework to handle loading and transforming large sets of unstructured data from the UNIX system to Hive tables.
  • Applied UNIX administration skills whenever it is required to access from putty and terminal.
  • Developed Unix Shell Scripts for batch processing, data loading and extraction in parallel.
  • Tune application and query performance using UNIX and SQL tools.
  • Involved in maintaining various Unix Shell scripts.
  • Consulted with Elkay Manufacturing to build a data mart from a PeopleSoft ERP database using Oracle under Unix.
  • Created parameter files with Global, mapping, session and workflow variables using UNIX scripts.
  • Involved in migrating hardware and software from UNIX/ Informix to Windows/ SQL Server 2005.
  • Diagnosed and resolved issues through reading Tivoli Workload Scheduler logs retrieved through UNIX.
  • Developed UNIX shell scripts, Apxrcv scripts and Visual Basic programs.
  • Converted existing UNIX scripts and SAS programs to Ab Initio Graphs.
  • Used the Crontab in UNIX for Automatic Tasks.


18. Linux

average Demand
Here's how Linux is used in Data Engineer jobs:
  • Installed, configured, upgraded and administrated Linux Operating Systems.
  • Worked with Linux server admin team in administering the server hardware and operating system.
  • Maintained the configuration files of UNIX servers, AIX, HP-UNIX, Solaris and Linux resulting in servers being security compliant.
  • Scheduled Tasks on Linux Using Crontab and Shell Scripts.
  • Perform web-based deployments using Websphere in a linux environment.
  • Improved scoring algorithm using statistical factor analysis methodology.
  • Deployed and maintained the hosting server in a Linux environment.


19. MapReduce

average Demand
Here's how MapReduce is used in Data Engineer jobs:
  • Developed and implemented MapReduce programs for analyzing Big Data with different file formats like structured and unstructured data.
  • Implemented MapReduce programs to perform joins using secondary sorting and distributed cache.
  • Designed and oversaw implementation of MapReduce-based large-scale parallel relation-learning system.
  • Provided solution in development of MapReduce jobs using Java.
  • Developed MapReduce layer to source targeted data for visualization.
  • Developed MapReduce jobs using Java for data transformations.
  • Implemented Data classification algorithms using MapReduce design patterns.
  • Designed and developed MapReduce programs for data lineage.
  • Developed MapReduce jobs for data cleaning and manipulation.
  • Design and Develop Pig Latin scripts and Pig command line transformations for data joins and custom processing of MapReduce outputs.
  • Integrated product catalog structure with MapReduce framework for Recommendations (ML - Classification and Regression algorithm) and POI strategy.
  • Develop MapReduce job for AVRO conversion and load the AVRO data to hive table using the SerDe's.
  • Develop MapReduce & Pig scripts to process high volumes of data-understand and deploy various open source technologies.
  • Implemented Custom Input formats that handles input files received from java applications to process in MapReduce.
  • Developed MapReduce programs to parse the raw data and store the refined data in tables.
  • Created Hive External tables in partitioned format to load the processed data obtained from MapReduce.
  • Divided each data set into corresponding categories by following the MapReduce binning design pattern.
  • Developed MapReduce jobs using Hive, and Pig to extract and analyze data.
  • Developed MapReduce/EMR jobs to analyze the data and provide heuristics and reports.
  • Integrated native Map-Reduce jobs in PIG data pipeline using the MAPREDUCE Command.
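The bullets above are mostly about Java MapReduce jobs; for a compact illustration of the same map step, here is a hedged Hadoop Streaming mapper written in Python. The tab-separated field layout is an assumption, and a matching reducer would sum the emitted counts per key.

    #!/usr/bin/env python3
    # Hadoop Streaming mapper: read raw lines on stdin and emit
    # "<category>\t1" for each well-formed record.
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue                # skip malformed records
        category = fields[2]
        print(f"{category}\t1")

A script like this is passed to the Hadoop Streaming jar as the -mapper, with a companion reducer script aggregating the counts.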


20. Business Requirements

average Demand
Here's how Business Requirements is used in Data Engineer jobs:
  • Analyzed Business Requirements and Identified mapping documents required for system and functional testing efforts for all test scenarios.
  • Receive business requirements from stakeholders and converting them to technical specs then developing the solution.
  • Participated in Gathering requirements, analyze requirements and design technical documents for business requirements.
  • Participated in user meetings, gathered Business requirements & specifications for the data reports.
  • Work closely with business, transforming business requirements to technical requirements.
  • Interacted with Business Analyst to understand the business requirements.
  • Involved from the beginning of the design, architecture phase and determined what open source frameworks suites for the business requirements.
  • Analyzed business requirements and delivering test plans and other artifacts such as test scenarios, test cases and reports.
  • Analyzed the business requirements by dividing them into subject areas and understood the data flow within the organization.
  • Designed and created mappings to deal with Change Data Capture (CDC) for various business requirements.
  • Work with end users and translate their business requirements into technical tasks using Test Driven Development.
  • Collected the business requirements from the subject matter experts like data scientists and business partners.
  • Created reports with the help of SSRS, Tableau and presentations to meet business requirements.
  • Performed walk through of the data models with the stakeholders to confirm business requirements.
  • Capture business requirements from the users and lead the team by providing technical solutions.
  • Gathered business requirements from the power users and designed the Reports based on requirements.
  • Analyze source systems and business requirements and create functional and technical design documents.
  • Designed customized dashboards and SSRS reports for business users as per business requirements.
  • Experience in writing complex spatial queries using SQL/PostGIS that satisfy business requirements.
  • Performed unit testing to meet the functional, technical and business requirements.


21. Kafka

average Demand
Here's how Kafka is used in Data Engineer jobs:
  • Designed and configured Kafka cluster to accommodate heavy throughput messages per second.
  • Created Kafka topics and distributed to different consumer applications.
  • Involved in processing the streaming data as well as batch data using Apache Spark, Spark Streaming and Kafka.
  • Worked on Spark Streaming using Kafka to submit the job and start the job working in Live manner.
  • Evaluated Kafka v2.10-0.8.1.1, KafkaCat build 2/18/15, Camus build 2/18/15 and QuantiFind Kafka Offset Monitor v0.2.1.
  • Experience in creating NiFi flow to streaming data between Kafka, PulsarDB, ElasticSearch, and FTP.
  • Design and Development of adapters to inject and eject data from various data source to/from Kafka.
  • Involved in setting up the Kafka Streams Framework which is the core of enterprise inventory.
  • Handled partitions and replication of topics and broker/server for fault tolerance using Apache Kafka.
  • Harmonized data coming from different sources using Kafka to bring it to consistent format.
  • Worked on HDFS and Cassandra with huge amounts of data using Apache Kafka.
  • Streamed real time data by integrating Spark with Kafka for dynamic price surging.
  • Utilized Apache Kafka to develop real time reporting using its Java API.
  • Experienced in writing Producer and Consumer in Kafka using Java based client.
  • Worked on Creating Kafka topics, partitions, writing custom partitioned classes.
  • Experienced in collecting the real-time data from Kafka using Spark Streaming.
  • Used Kafka to ingest data into a data center unit.
  • Experience on Kafka, RabbitMQ and various messaging systems.
  • Used Apache Kafka for handling real-time user data feeds.
  • Implemented a proof of concept using Kafka, Storm.
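As a hedged sketch of the producer and consumer work described above, here is a minimal example with the kafka-python client; the broker address, topic name, and message schema are placeholders.

    # Produce and consume JSON messages with kafka-python.
    import json
    from kafka import KafkaProducer, KafkaConsumer

    producer = KafkaProducer(
        bootstrap_servers="broker:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("user-events", {"user_id": 42, "action": "click"})
    producer.flush()

    consumer = KafkaConsumer(
        "user-events",
        bootstrap_servers="broker:9092",
        auto_offset_reset="earliest",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    for message in consumer:
        print(message.value)
        break  # read a single message in this sketch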


22. Scala

average Demand
Here's how Scala is used in Data Engineer jobs:
  • Provided status updates to management, escalating risks/challenges and providing proposed solutions.
  • Informed higher levels of critical office statuses where escalation was necessary.
  • Communicate and escalate issues appropriately.
  • Responded to and tracked issues, ensuring the customer is promptly notified while resolving and/or escalating to team members accordingly.
  • Implemented all the components following test-driven development (TDD) methodology and used ScalaTest 3.0.1 for unit testing.
  • Designed and developed SPARK-SCALA programs for performing ETL on large volumes of medical membership and claims data.
  • Received and processed customer trouble ticket escalations from Tier 1 and Tier 2.
  • Trouble Isolation and Resolution for issues that were escalated from Technical Support.
  • Coordinate and escalate network outages to III and IV Tier support.
  • Provided Escalation support for customer network relates issues or change requests.
  • Served as point of escalation for the resolution of complex issues.
  • Designed and implemented a physical model utilizing Oracle 11g to maximize query performance and maintain data scalability.
  • Develop and support the scalability and functionality of relational databases.
  • Developed Scala applications using Spring tool suits environment.
  • Worked with developers designing scalable supportable infrastructure.
  • Developed data marts (sourced from ODS) that provision fast, scalable and maintainable cubes for data mining by SSAS.
  • Improved stability and performance of the Scala plug-in for Eclipse, using product feedback from customers and internal users.
  • Implemented Spark SQL to connect to Hive to read the data and distributed processing to make highly scalable.
  • Consolidated small files for a large data set using Spark Scala to create a table on the data.
  • Developed a fault tolerant, highly scalable and robust pipeline to ingest large volume of unstructured data.


23. Flume

average Demand
Here's how Flume is used in Data Engineer jobs:
  • Executed custom interceptors for Flume to filter data and defined channel selectors to multiplex the data into different sinks.
  • Implemented Custom interceptors to Mask confidential data and filter unwanted records from the event payload in flume.
  • Analyzed the customer behavior by performing click stream analysis and to ingest the data using Flume.
  • Used Flume to collect, aggregate, and store the log data from different web servers.
  • Ingested and integrated the unstructured log data from the web servers onto cloud using Flume.
  • Developed various data ingestion pipelines using streaming tools like Flume, Spark and Kafka.
  • Architected and developed a Zero Data Loss data pipeline using Flume and UM queues.
  • Worked in Agile development approach and Storm, Flume, Bolt, Kafka.
  • Developed custom Flume interceptor to transform near real time data.
  • Stack includes Flume into Cassandra with Storm and Kafka.
  • Worked with Flume to transfer online logs data.
  • Configured Flume to stream log files.
  • Worked on tools Flume, Kafka, Storm and Spark.
  • Research on HBase, Flume, YARN and Hive on Spark for different project requirements.
  • Design and build data pipelines using Sqoop, Flume, and Kafka.
  • Managed and reviewed Hadoop log files using Flume and Kafka.
  • Worked on tools like Flume, Sqoop, Hive and PySpark.
  • Processed the web server logs by developing multi-hop Flume agents using the Avro sink and loaded the data into MongoDB for further analysis.
  • Used the Avro SerDe's for serialization and deserialization of log files at different Flume agents.


24. BI

average Demand
Here's how BI is used in Data Engineer jobs:
  • Major responsibilities included performing core data analyses and reporting functionality within the Analysis Tools Section of the Product Sustaining Group.
  • Provide design recommendations to the relevant T-Mobile engineering organization to improve the customer experience or ease of operation long-term.
  • Programmed BI visualization reports giving insights on patient performance thereby aiding the decision making for clinical drug investigators.
  • Delivered an interactive network visualization framework for quickly responding to and resolving network availability issues and service disruptions.
  • Designed and developed Enterprise Eligibility business objects and domain objects with Object Relational Mapping framework such as Hibernate.
  • Configured to Authenticate users and implemented both object level and Data level security based on roles and responsibilities.
  • Key areas were leveraging this capability to drive financial benefits, Information Management, Marketing and Risk Capabilities.
  • Collaborated remotely with third-party vendor for installation of lightning detection system upgrades inside of turbine located in Greece.
  • Report: power volatility of hydro-electric systems in Colombia, comparison of power pools and potential for gaming
  • Delivered a predictive model using random forest classification model to predict possibility of denial of service attack.
  • Used substantial technical infrastructure and management model and operational philosophy, which provided the flexibility and reliability.
  • Design Link system architecture to support Combined Test Bed laboratory testing efforts for respective departments.
  • Oversee day-to-day 24/7 high-availability operations including change management processes to ensure SLA is met.
  • Educate client management and business users on business value and capabilities of data warehousing.
  • Operated and maintained computerized data systems for the Gas Turbine Technologies Laboratory.
  • Designed custom tools for validating telecommunication market data over billions of records.
  • Position Responsibilities: - Monitoring and observing drilling parameters during drilling operation.
  • Conduct security risk assessments that analyzed both security controls and technical vulnerabilities.
  • Maintained well data integrity and reliability by consistently acquiring accurate data.
  • Participate in reliability studies and evaluation of SOA engineering designs constructs.


25. API

average Demand
Here's how API is used in Data Engineer jobs:
  • Utilized CMM and automated laser measurement equipment in the GE manufacturing center to perform dimensional inspections of turbine capital parts.
  • Acquainted with Comprehensive Capital Analysis and Review (CCAR) and Asset Liability Management (ALM) for various clients.
  • Implemented the GBM and Random Forest models in the system using H2O Java API to help make business decisions.
  • Increase automation and productivity in the on-boarding process to support extremely rapid growth of both content providers and consumers.
  • Added extension to Podium platform to automate ingestion of data based on Podium metadata using Podium REST API.
  • Discussed H2O and Apache Spark-based machine learning techniques for increasing customer acquisition rates on Capital One's website.
  • Designed and built Rails and Sinatra application for verifying information via crowdsourcing using Amazon Mechanical Turk API.
  • Worked with software developers to deploy web services loading Latitude/longitude data by interfacing with ESRI API.
  • Tested all data reports published to Web including dashboards, summarized, master-detailed and API's.
  • Use Spark as an ETL tool to manipulate web data; use the Spark API for machine learning.
  • Provided enhancements to core and peripheral modules, including a read-repair REST API module.
  • Developed REST API using Spark (web framework) to register devices into HSDM.
  • Developed race fuel strategy application using C# and the Pi Server API.
  • Integrated internal and external data via API for cross platform marketing campaign evaluations.
  • Utilized Google Maps API to embed maps showing various weather station location and details
  • Extracted Google AdWords and Facebook Marketing Insights data using corresponding Java API libraries.
  • Analyzed and recommended neighbor List/ Parameter changes based on MapInfo plots.
  • Worked extensively on API and used Java to perform data validations.
  • Developed reusable modules using buck build for Graph API endpoints access.
  • Develop the rest API to extract the data from the warehouse.
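Several bullets above mention building REST APIs that expose warehouse data. Here is a small, hedged sketch of that idea using Flask and SQLAlchemy; the route, table, and connection URL are assumptions rather than anything from the resumes above.

    # Minimal REST endpoint that returns warehouse rows as JSON.
    from flask import Flask, jsonify
    import sqlalchemy

    app = Flask(__name__)
    engine = sqlalchemy.create_engine("postgresql://user:pass@warehouse:5432/dw")

    @app.route("/api/daily-sales/<date>")
    def daily_sales(date):
        with engine.connect() as conn:
            rows = conn.execute(
                sqlalchemy.text(
                    "SELECT region, total FROM daily_sales WHERE sales_date = :d"),
                {"d": date},
            )
            return jsonify([{"region": r.region, "total": float(r.total)}
                            for r in rows])

    if __name__ == "__main__":
        app.run(port=8080)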


26. Informatica

average Demand
Here's how Informatica is used in Data Engineer jobs:
  • Created Oracle Stored Procedures to implement complex business logic for good performance and called from Informatica using Stored Procedure transformation.
  • Developed Informatica design mappings using various transformations.
  • Tuned Informatica Mappings and Sessions for optimum performance
  • Configured Informatica Power Center with SharePoint 2010/2013.
  • Created Informatica Mappings for Error Handling/Quality control.
  • Designed Sources to Targets mappings from SQL Server, Excel/Flat files to Oracle using Informatica Power Center.
  • Worked on complete SDLC from Extraction, Transformation and Loading of data using Informatica.
  • Conducted performance tuning of informatica maps/db queries, debugged maps, completed code reviews.
  • Worked on Informatica Templates to standardize the process and maintain consistency across different teams.
  • Installed Hot fixes, utilities, and patches released from Informatica Corporation.
  • Executed this through scripts, PL/SQL code and sync using Informatica.
  • Worked with Informatica Power Exchange to extract the data from SAP.
  • Perform proof of concept on new features of Informatica and SnapLogic
  • Worked on setting up the Informatica Sales Cloud environment.
  • Upgraded to Informatica Power Center 8.6.1 from version 8.1.1.
  • Developed Informatica Mappings using heterogeneous sources like flat files and different relational databases, Mapplets, Mappings using Power Center Designer.
  • Assumed data modeling responsibilities and developed Informatica push-down optimized mappings for a Netezza MPP appliance.
  • Developed and implemented Informatica mappings and workflows to integrate data into SQL Server database.
  • Used Informatica to extract data from the Salesforce SaaS system and scheduled jobs to bring real-time data into a staging area.
  • Developed the Unix Scripts to automate pre-session and post-session processes in Informatica, file transfers between the various hosts.


27. Teradata

average Demand
Here's how Teradata is used in Data Engineer jobs:
  • Reduced Teradata space used by optimizing tables - adding compression where appropriate and ensuring optimum column definitions.
  • Handled end-to-end implementation of SAS to Teradata migration projects to successful completion under tight deadlines.
  • Used Teradata database management system to manage the warehousing operations and parallel processing.
  • Support development organizations with Teradata expertise and user training.
  • Involved in Teradata database design prototypes.
  • Created Teradata BTEQ scripts to transform the data and create Purpose Build Extensions(PBE) which are used in Reporting.
  • Mentored fellow employees on: Teradata syntax, debugging, and how data is loaded and stored in the database.
  • Developed automated jobs for reconciling the data between the source (Oracle) and EDW systems (Teradata).
  • Converted DB2 SQL scripts, macros, and stored procedures to Teradata SQL in support of DB2-to-Teradata database migration.
  • Merged different sources of data such as Excels and Teradata in Tableau for building monthly reports.
  • Experienced in troubleshooting Teradata Scripts, fixing bugs and addressing production issues and performance tuning.
  • Developed automated scripts for ingesting the data from Teradata around 200TB bi-weekly refreshment of data.
  • Created DDL for the dropping, creation, and modification of database objects in Teradata.
  • Trained in consulting, technical and operational skills through Teradata's new hire program.
  • Used Teradata Aster bulk load feature to bulk load flat files to Aster.
  • Trained employees in various departments on how to use Teradata and run reports.
  • Developed complex Teradata SQL queries, unions, multiple table joins and views.
  • Work with all Teradata utilities and WhereScape development tool.
  • Loaded data from Hive to Teradata using TDCH.
  • Worked on Oracle to Teradata data mover utility.


28. XML

average Demand
Here's how XML is used in Data Engineer jobs:
  • Designed, developed, and maintained data flows involving various data formats like file feeds, XML feeds to Web Services.
  • Fulfilled requirements from all silos by using AVRO, Parquet, JSON, XML, CSV file formats.
  • Worked on different file formats like Sequence files, XML files and Map files using MapReduce Programs.
  • Generated XML files that conformed to the DOD Data Quality Metadata Exchange (DQME) standard.
  • Designed a relational model to house XML and text data files received from third parties.
  • Used XML, XSLT and XSD to created dashboards and reports that identified anomalous data.
  • Developed user interfaces using JSP, HTML, XML and JavaScript.
  • Worked on different file formats like Sequence files, XML files.
  • Used web based markup languages such as HTML and XML schema.
  • Develop Spark/MapReduce jobs to parse the JSON or XML data.
  • Worked with customers to establish direct XML data feeds.
  • Utilize Unix, SOAP, XML, LDAP interfaces.
  • Used the JSON and XML SerDe's for serialization and de-serialization to load JSON and XML data into HIVE tables.
  • Participated in meetings to discuss strategic initiatives and improvising internal monitoring tools and reports functioning on SQL and XML.
  • Implemented the JAVA MapReduce XMLParser to handle both gas and electric XML data and improved the performance using AVRO.
  • Involved in integration work. Technical environment: C, XML, NXP TV 550, MIPS processor, Linux.
  • Used Pig Latin scripts to convert data from JSON, XML and other formats to Avro file format.
  • Developed the Spark application consumer code to perform data checks and filters, parsing data using XML parsers.
  • Report large datasets to Warner, Sony and Universal using SQL in TPT and XML.
  • Integrated clients like for data automation using Talend, Java, and XML.
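Since many bullets above involve parsing XML feeds, here is a minimal sketch using Python's standard library; the element and attribute names are made up for illustration.

    # Parse a small XML document with xml.etree.ElementTree.
    import xml.etree.ElementTree as ET

    xml_text = """
    <orders>
      <order id="1001"><customer>ACME</customer><total>250.00</total></order>
      <order id="1002"><customer>Globex</customer><total>99.50</total></order>
    </orders>
    """

    root = ET.fromstring(xml_text)
    for order in root.findall("order"):
        print(order.get("id"),
              order.findtext("customer"),
              float(order.findtext("total")))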


29. NoSQL

average Demand
Here's how NoSQL is used in Data Engineer jobs:
  • Used Cassandra as the NoSQL Database and acquired very good working experience with NoSQL databases.
  • Worked on Apache Cassandra writing NoSQL routines for time-series data.
  • Performed Sqooping for various file transfers through the Cassandra tables for processing of data to several NoSQL DBs.
  • Secure data ingest to NOSQL data stores (Accumulo) and running analytics against it.
  • Designed and implemented proprietary data solutions by correlating data from SQL and NoSQL databases using Kafka.
  • Involved in NoSQL Cassandra database design, integration and implementation.
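For a concrete but hedged view of the Cassandra work described above, here is a minimal sketch with the DataStax Python driver; the contact point, keyspace, and table schema are assumptions.

    # Insert and read a row in Cassandra with the DataStax Python driver.
    from datetime import datetime
    from cassandra.cluster import Cluster

    cluster = Cluster(["cassandra-host"])
    session = cluster.connect("metrics")  # keyspace

    session.execute(
        "INSERT INTO sensor_readings (sensor_id, ts, value) VALUES (%s, %s, %s)",
        ("sensor-1", datetime(2020, 4, 1, 12, 0), 21.5),
    )
    rows = session.execute(
        "SELECT ts, value FROM sensor_readings WHERE sensor_id = %s",
        ("sensor-1",),
    )
    for row in rows:
        print(row.ts, row.value)

    cluster.shutdown()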


30. SSIS

low Demand
Here's how SSIS is used in Data Engineer jobs:
  • Assist in analyzing and assessing the feasibility of proposed enhancement requests for reports and application systems.
  • Assist in identifying and prioritizing business information needs, and defining business benefit of data warehousing.
  • Performed cluster co-ordination and assisted with data capacity planning and node forecasting using ZooKeeper.
  • Perform 1st level assistance to subscribers making new network connections or becoming operational.
  • Coordinated and provided oversight management and design assistance on hundreds of projects.
  • Assist in reviewing and performing quality assurance tasks for overall accuracy.
  • Analyzed mechanical problems with engine and chassis performance and implemented solutions.
  • Assisted designing machine learning algorithms for solving business insight problems.
  • Assist companies in designing and optimizing their email marketing campaigns
  • Assisted business analyst in gathering technical requirements from management.
  • Provide assistance with the configuration of customer equipment.
  • Assist with general computer operations.
  • Developed a SQL Server SSIS custom task transform to execute Data Services jobs via web services from within the SSIS environment.
  • Assist the client in the control of drilling operations by inputting geological information to the best of the operator's ability.
  • Assisted research in RFID integration into business process, to track global shipment real-time tracking of goods and predict expected deliveries.
  • Created complex SSIS packages and optimized them to the fullest depth to reflecting required business logic to implement designed ETL Strategy.
  • Assist with development, revision, and maintenance of Tactical Data Link (TDL) course materials IAW ISD processes.
  • Created SSIS packages to automate data movement into SAN Box from flat files and exposed them to the users.
  • Assisted in implementation and maintenance of the Operational Data Store(ODS) which involved the migration of data.
  • Used project configuration, parameters, project connections, integration service catalog and environment to perform SSIS 2012 configuration.


31. R

low Demand
Here's how R is used in Data Engineer jobs:
  • Generated reports based on the data to present relevant metrics to executives using Tableau for visualization and better understanding.
  • Performed preliminary analysis of county record sets as part of Master Data Management process.
  • Work with Project Manager/Supervisor on open issues regarding markets and data collection progress.
  • Analyze business procedures and problems to create and present thorough project requirements documentation.
  • Developed a stream processing audit system used to characterize the distributed system.
  • Supported network operations center technicians with level-3 support for customer issues.
  • Developed normalized Logical and Physical database models to design OLTP system.
  • Created an automated reconciliation process, applying Six Sigma methodologies.
  • Maintained warehouse metadata, naming standards and warehouse standards.
  • Gather and analyze requirement data for projects being developed.
  • Selected numerical and categorical variables from data set.
  • Created user behavior metrics and conducted cluster analysis.
  • Designed interest ontology using Protege and Neon ToolKit.
  • Developed a social lead generation strategy and application.
  • Involved in review of functional and non-functional requirements.
  • Developed transformation algorithms to implement dimension reduction.
  • Perform thorough testing and validation to support the accuracy of data transformations and the data verification used in the machine learning models.
  • Utilized Oracle Data Integrator (12C) to develop star schema data models and create ELT jobs for data warehousing.
  • Worked on Partitions, Bucketing concepts in Hive and designed both Managed and External tables in Hive for optimized performance.
  • Extracted and loaded the required data from the tables by using Hive and Impala by creating new tables in Hive.

Show More

32. Json

low Demand
Here's how Json is used in Data Engineer jobs:
  • Automated report development using Java, SQL, JSON.
  • Used Cassandra to work on JSON documented data.
  • Worked with parquet, JSON file formats.
  • Handled different File Formats like JSON.
  • Used JSON file as config file, to set different input parameters.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in R and Python.
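
The JSON-as-config pattern mentioned above is easy to show in a short, self-contained Python sketch; the file name and keys used here are hypothetical.

```python
# Use a JSON file to drive job parameters.
import json

# Write a small example config (in practice this file would already exist).
with open("job_config.json", "w") as fh:
    json.dump({"input_path": "/data/raw", "output_path": "/data/clean"}, fh)

with open("job_config.json") as fh:
    config = json.load(fh)

input_path = config["input_path"]
output_path = config["output_path"]
batch_size = config.get("batch_size", 1000)  # fall back to a default when absent
print(f"input={input_path} output={output_path} batch_size={batch_size}")
```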

Show More

33. AWS

low Demand
Here's how AWS is used in Data Engineer jobs:
  • Follow dispatch instructions and comply with all applicable laws/regulations, as well as company policies/procedures.
  • Created an AWS tag that, when set, causes a script to create snapshots periodically (daily, weekly, and monthly).
  • Worked with Amazon Web Services (AWS)cloud infrastructure services and involved in ETL, Data Integration and Migration.
  • Used a myriad of tools to form ETL/ELT applications to integrate many disparate sources into a platform in AWS.
  • Moved physical servers to AWS EC2 Cloud by creating multiple cloud instances using Elastic IP and Elastic Block Volumes.
  • Provide architectural solutions when building new and migrating existing applications, software and services on the AWS platform.
  • Utilized Amazon Web Services (AWS) such as S3 to save the executed results along with Cassandra.
  • Bridged the gap between engineering and management regarding potential mechanical or design flaws for all server product lines.
  • Collaborated with Web Application Engineers, used Python scripts to load the data into AWS Cloud Cassandra database.
  • Used AWS Elastic Beanstalk service for deploying and scaling web applications and services developed with Java, Python.
  • Designed AWS Cloud Formation templates to create and ensure successful deployment of Web applications and database templates.
  • Maintained and enhanced SQL Server databases; designed and built a streaming data architecture using NodeJS, Kafka, and AWS.
  • Worked on Spark SQL and DataFrames for faster execution of Hive queries using Spark and AWS EMR.
  • Involved in complete cycle on migrating physical Linux/Windows machines to cloud (AWS) and test it.
  • Shaped or cut materials to specified measurements, using hand tools, machines, or power saws.
  • Used AWS Identity and Access Management (IAM) to create policies for AWS resource access.
  • Involved in data migration, data processing and data validation in the S3 buckets of AWS.
  • Imported the data from different sources like AWS S3, Local file system into Spark RDD.
  • Designed, developed and deployed CSV Parsing using the big data approach on AWS EC2.
  • Worked on mapping reviews to identify design flaws as part of peer testing.
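
To make the tag-driven snapshot idea concrete, here is a hedged boto3 sketch; the tag key/value ("Backup": "daily"), region, and description are assumptions, not details from any listed project.

```python
# Snapshot every EBS volume that carries a specific backup tag.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

volumes = ec2.describe_volumes(
    Filters=[{"Name": "tag:Backup", "Values": ["daily"]}]
)["Volumes"]

for vol in volumes:
    snap = ec2.create_snapshot(
        VolumeId=vol["VolumeId"],
        Description="Scheduled daily snapshot",
    )
    print("Created snapshot", snap["SnapshotId"], "for volume", vol["VolumeId"])
```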

Show More

34. Zookeeper

low Demand
Here's how Zookeeper is used in Data Engineer jobs:
  • Experienced in using Zookeeper Operational Services for coordinating the cluster.
  • Used Zookeeper for various types of centralized configurations.
  • Maintained cluster co-ordination services through ZooKeeper.
  • Cluster coordination services using ZooKeeper.
  • Involved in Cluster coordination services through Zookeeper and Adding new nodes to an existing cluster.
  • Implemented Storm integration with Kafka and ZooKeeper for the processing of real time data.
  • Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and ZooKeeper.
  • Configured multiple Apache Kafka brokers along with ZooKeeper nodes.
  • Used Zookeeper for providing coordinating services to the cluster.
  • Automated the installation and maintenance of Kafka, Storm, ZooKeeper, and Elasticsearch using SaltStack.
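
For the centralized-configuration use case mentioned above, here is a minimal sketch assuming the kazoo client library for ZooKeeper; the znode path and payload are hypothetical.

```python
# Store and read a shared configuration value in ZooKeeper.
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

path = "/app/config/batch_size"
if not zk.exists(path):
    zk.create(path, b"500", makepath=True)  # create parent znodes as needed
else:
    zk.set(path, b"500")

data, stat = zk.get(path)
print("batch_size =", data.decode(), "version", stat.version)

zk.stop()
```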

Show More

35. Spark SQL

low Demand
Here's how Spark SQL is used in Data Engineer jobs:
  • Developed a JDBC API for ArrayDB to integrate it into the Spark platform based on JNI and Spark SQL.
  • Implemented Spark using Java and Spark SQL for real time processing of storage/compute/ network operation metrics.
  • Used Spark SQL and DataFrames API to load structured and semi-structured data into Spark Clusters.
  • Involved in working and tuning of the Spark SQL queries and Spark streaming scripts.
  • Worked on the Core, Spark SQL and Spark Streaming modules of Spark extensively.
  • Involved in data transformation, filtering and load using Spark SQL.
  • Used Hive, spark SQL Connection to generate Tableau BI reports.
  • Used Spark SQL to process the huge amount of structured data.
  • Created Spark SQL queries for faster processing of data.
  • Worked on Spark SQL UDF's and Hive UDF's.
  • Designed and built the reporting application that uses the Spark SQL to fetch and generate reports on HBase table data.
  • Experience in extracting appropriate features from datasets in order to handle bad, null, and partial records using Spark SQL.
  • Involved in design and development of Big Data / distributed systems using Spark / Hadoop / Spark SQL.
  • Worked on Spark SQL and DataFrames for faster execution of Hive queries using Spark SQLContext.
  • Worked on the componentization of existing Spark jobs such that they were re-written in Spark SQL.
  • Project scope was limited to Spark SQL and Spark Streaming, with interest in exploring MLlib.
  • Developed Hive queries, Pig scripts, and Spark SQL queries to analyze large datasets.
  • Have worked with Apache Spark SQL, Spark Streaming and Spark MLlib.
  • Parsed and queried JSON, CSV, Parquet, Avro and XML data formats using Spark SQL and Dataframes.
  • Develop Spark SQL tables and queries to perform ad hoc data analytics for the analyst team.
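
The common pattern behind most of these bullets is: load semi-structured data into a DataFrame, register a temporary view, and query it with SQL. Here is a small PySpark sketch of that pattern; the input path, column names, and output location are placeholders.

```python
# Load JSON data, register a view, aggregate with SQL, and write Parquet.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-sql-example").getOrCreate()

events = spark.read.json("s3://example-bucket/events/")  # assumed input path
events.createOrReplaceTempView("events")

daily_counts = spark.sql("""
    SELECT event_date, event_type, COUNT(*) AS cnt
    FROM events
    WHERE event_type IS NOT NULL
    GROUP BY event_date, event_type
""")

daily_counts.write.mode("overwrite").parquet("s3://example-bucket/reports/daily_counts/")
spark.stop()
```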

Show More

36. Rdbms

low Demand
Here's how Rdbms is used in Data Engineer jobs:
  • Modeled and instantiated a reference data capability using Oracle 9i/10g RDBMS to support DOJ homeland security activities.
  • Planned, revised, tested and documented backup and recovery procedures for several RDBMS (Oracle and SQL server).
  • Designed universal update history library using C# and MongoDB to store history log for any RDBMS table.
  • Gathered and analyzed data to aid in the decisions that would create a healthy RDBMS system.
  • Experience in RDBMS concepts, writing PL/SQL database queries, stored procedures, and triggers.
  • Implemented comprehensive regression test system for RDBMS, Native API, ODBC API and clients.
  • Involved in extracting and loading data from Hive into an RDBMS using Sqoop.
  • Become more proactive instead of reactive, especially on RDBMS.
  • Maintain, monitor and service any RDBMS systems already in existence or new for a variety of Systems (i.e.
  • Involved in creating generic Sqoop import script for loading data into hive tables from RDBMS.
  • Developed, profiled, tuned native middleware for proprietary data-warehousing SQL RDBMS.
  • Imported and exporting data into RDBMS and Hive using Sqoop.
  • Used Sqoop to extract and load incremental and non-incremental data from RDBMS systems into Hadoop.
  • Used Sqoop to transfer data between RDBMS and Hadoop Distributed File System.
  • Used Sqoop utility tool to import structured RDBMS data to Hadoop.
  • Ingested data from RDBMS to Hadoop using Sqoop.
  • Used Sqoop to pull the data from RDBMS like Teradata, Netezza, Oracle and storing it in the Hadoop.
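
The bullets above describe Sqoop imports from relational databases into Hadoop and Hive. As a hedged alternative sketch in PySpark, the same RDBMS-to-Hive transfer can be expressed as a JDBC read followed by a Hive table write; the JDBC URL, credentials, and table names below are placeholders, and the appropriate JDBC driver must be on the Spark classpath.

```python
# Pull a relational table over JDBC and land it as a Hive table.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("rdbms-to-hive")
         .enableHiveSupport()
         .getOrCreate())

orders = (spark.read.format("jdbc")
          .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCL")  # placeholder URL
          .option("dbtable", "sales.orders")
          .option("user", "etl_user")
          .option("password", "***")
          .load())

orders.write.mode("overwrite").saveAsTable("staging.orders")  # Hive target table
spark.stop()
```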

Show More

37. Data Ingestion

low Demand
Here's how Data Ingestion is used in Data Engineer jobs:
  • Worked with the windows PowerShell and Azure scheduler to automate the data ingestion and transformation jobs on daily and monthly schedules.
  • Developed Spark SQL Scripts for data Ingestion from Oracle to Spark Clusters and relevant data joins using Spark SQL.
  • Designed and developed various machine learning modules for data ingestion, transformation, training and predictions.
  • Created Spark streaming projects for data ingestion and integrated with Kafka consumers and producer for messaging.
  • Created, monitored, and maintained all data ingestion flows cloud architecture using GOTS product.
  • Tested and solved the critical issues faced while data ingestion into data unit.
  • Implemented data Ingestion and handling clusters in real time processing using Apache Kafka.
  • Designed Spark schema and data selection queries and Involved in data ingestion process.
  • Worked as a developer in data transformation and data ingestion teams.
  • Created and maintained all data ingestion flows into cloud architecture.
  • Designed and developed application for real-time data ingestion using Kinesis and Lambda on cloud.
  • Designed and developed data ingestion pipeline to work on hybrid (partly on-prem/partly in-cloud) environment using Apache Spark and EMR.
  • Designed and developed solution for ESP pump real time data ingestion using Kafka, Storm and HBase.
  • Implemented Kafka for real-time processing with data ingestion and cluster handling.
  • Design Talend jobs for data ingestion, enrichment, and provisioning.
  • Integrated Apache Kafka for data ingestion. Configured Domain Name System (DNS) for hostname-to-IP resolution.
  • Worked on the data ingestion from SQL Server to our data lake by using Sqoop and shell scripts.
  • Created Qlik Sense and Apache Zeppelin dashboards of Oozie and Falcon data ingestion jobs for efficient monitoring.
  • Designed workflows for processes from data ingestion to transformation and access using LinkedIn's Azkaban.
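
Many of the bullets above center on Kafka-based ingestion. Here is a hedged sketch using the kafka-python client; the topic name, broker address, and record shape are assumptions made for illustration.

```python
# Produce a JSON record to a topic and read it back.
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("sensor-ingest", {"sensor_id": "pump-7", "pressure": 101.3})
producer.flush()

consumer = KafkaConsumer(
    "sensor-ingest",
    bootstrap_servers=["localhost:9092"],
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    consumer_timeout_ms=10000,   # stop iterating if no messages arrive
)
for message in consumer:
    print(message.value)   # a downstream step would validate, enrich, and persist
    break
```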

Show More

38. Log Files

low Demand
Here's how Log Files is used in Data Engineer jobs:
  • Extracted data from various log files and imported data into text files.
  • Used by scheduled jobs to email notifications and log files.
  • Performed Cluster Drive and Post Processing of Log Files.
  • Developed Bash scripts to bring the Tlog files from the FTP server and then process them for loading into Hive tables.
  • Achieved high security and retention by managing and reviewing Hadoop log files.
  • Managed and reviewed the Hadoop log files using Shell scripts.
  • Managed and reviewed Hadoop log files; deployed and maintained the Hadoop cluster.
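
Extracting fields from log files into a flat text file, as described above, usually comes down to a small parsing routine. The sketch below uses a hypothetical log-line format and field names.

```python
# Pull timestamp and message of ERROR lines into a tab-separated file.
import re

LINE_RE = re.compile(r"^(?P<ts>\S+ \S+) (?P<level>[A-Z]+) (?P<msg>.*)$")

def extract_errors(in_path, out_path):
    """Copy the timestamp and message of every ERROR line to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            match = LINE_RE.match(line)
            if match and match.group("level") == "ERROR":
                dst.write(f"{match.group('ts')}\t{match.group('msg')}\n")

# Example call (paths are placeholders):
# extract_errors("app.log", "errors.tsv")
```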

Show More

39. Data Processing

low Demand
Here's how Data Processing is used in Data Engineer jobs:
  • Performed repairs and inspections on vehicle equipment and data processing equipment.
  • Designed and implemented Spark jobs to support distributed data processing.
  • Design, develop, and test tools to automate data processing methods in Java using JA Builder (MATLAB-Java platform integration).
  • Modeled Hive partitions extensively for data separation and faster data processing, following Hive best practices for tuning.
  • Involved in different phases of big data projects, such as data acquisition, data processing, and data serving using dashboards.
  • Implemented large scale data processing of trade and position level information, by using a custom built MapReduce engine.
  • Developed Python programs and utilities for data processing, password encryption and other Data Engineering needs.
  • Used Apache Spark on YARN to have fast large scale data processing and to increase performance.
  • Developed Spark jobs using Python for faster data processing and used Spark SQL for querying.
  • Optimized ORACLE ETL (partitioning, direct path inserts, parallel data processing etc.)
  • Tasked with creating in-house data processing tools for increasing performance and data continuity.
  • Worked on optimization of queries that impacted the performance of loads/data processing.
  • Research data processing and analysis technologies and their application to client data.
  • Developed a data pipeline for data processing using Kafka-Spark API.
  • Developed k-streams using java for real time data processing.
  • Developed data processing pipeline in Spark for DNS data.
  • Experience building a data processing pipeline.
  • Lead architecture and design of data processing, warehousing and analytics initiatives.
  • Integrated Kafka with Spark Streaming for real-time data processing; imported data from disparate sources into Spark RDDs for processing.
  • Led efforts for scalable data processing pipelines for music metadata (Python, Shell, MySQL)

Show More

40. Avro

low Demand
Here's how Avro is used in Data Engineer jobs:
  • Worked on various file formats AVRO, ORC, Text, CSV, Parquet using Snappy compression.
  • Experienced in handling Sequence, ORC, AVRO and Parquet files.
  • Designed Hive tables on the hospital data with the file format as Avro and snappy compression.
  • Worked with Text, Avro, and Parquet file formats and Snappy as the default compression.
  • Worked with different file formats like Text files, Sequence Files, Avro.
  • Used different file formats like Text files, Sequence Files, Avro.
  • Worked with Avro Data Serialization system to work with JSON data formats.
  • Experience in using Sequence files, ORC and Avro file formats.
  • Converted the text data to Apache Avro records with compression.
  • Loaded data into landing zone in Avro format.
  • Determined the best serialization methodology to store and retrieve data, such as Text, Sequence, Avro, and Parquet file formats.
  • Imported Avro files using Apache Kafka and did some analytics using Spark.
  • Experience in using Avro, Parquet, RCFile and JSON file formats and developed UDFs using Hive and Pig.
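
In the spirit of the format-conversion bullets above, here is a hedged sketch that writes and reads Avro records with the fastavro library. The schema and file name are illustrative; the bullets mention Snappy, but "deflate" is used here so the example carries no extra native dependency.

```python
# Serialize dictionaries to an Avro file and read them back.
from fastavro import writer, reader, parse_schema

schema = parse_schema({
    "name": "Visit",
    "type": "record",
    "fields": [
        {"name": "patient_id", "type": "string"},
        {"name": "charge", "type": "double"},
    ],
})

records = [{"patient_id": "p-1", "charge": 120.0},
           {"patient_id": "p-2", "charge": 75.5}]

with open("visits.avro", "wb") as out:
    writer(out, schema, records, codec="deflate")

with open("visits.avro", "rb") as src:
    for rec in reader(src):
        print(rec)
```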

Show More

41. Mongodb

low Demand
Here's how Mongodb is used in Data Engineer jobs:
  • Handled MongoDB Disaster Recovery Plan for Business Continuity Plan.
  • Developed application component interacting with MongoDB.
  • Led the capability-building exercise for the bank's foray into Big Data with open-source MongoDB and the cloud.
  • Developed batch utility for migrating data between MongoDB and SQL Server using the 10gen C# driver.
  • Developed a tool to stream business related Twitter data and store in MongoDB for sentiment analysis.
  • Worked closely with the development team to help them better understand MongoDB database concepts and techniques.
  • Implemented document store database (like MongoDB) to integrate with a relational database.
  • Extracted events from 7,000 pieces of news articles and built knowledge base with MongoDB.
  • Migrated Data from Oracle & SQL Server Database by Reverse Engineering to MongoDB Database.
  • Implemented LVM snapshots and Ops manager backups for large scale MongoDB systems.
  • Worked with MongoDB and utilized NoSQL for non-relation data storage and retrieval.
  • Used NoSQL databases like MongoDB in implementation and integration.
  • Populated sensor data into NoSQL database (MongoDB).
  • Involved in data migration from Oracle database to MongoDB.
  • Designed and Implemented MongoDB Cloud Manger for Google Cloud.
  • Provided solution using MongoDB and NoSQL databases.
  • Worked on NoSQL including MongoDB and Cassandra.
  • Worked with multiple storage engines in MongoDB.
  • Employed MongoDB Aggregation Framework to perform analytics based on logged-in user-id.
  • Tuned reports and metadata models for efficient and faster reporting. Environment: Windows 7, UNIX, SQL Server, MongoDB
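
The aggregation-framework analytics mentioned above (grouping logged events by user id) look roughly like the following pymongo sketch; the database, collection, and field names are hypothetical.

```python
# Group login events by user and count them with the aggregation framework.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
events = client.analytics.login_events          # assumed database/collection

events.insert_many([
    {"user_id": "u1", "action": "login"},
    {"user_id": "u1", "action": "login"},
    {"user_id": "u2", "action": "login"},
])

pipeline = [
    {"$group": {"_id": "$user_id", "logins": {"$sum": 1}}},
    {"$sort": {"logins": -1}},
]
for row in events.aggregate(pipeline):
    print(row["_id"], row["logins"])
```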

Show More

42. Impala

low Demand
Here's how Impala is used in Data Engineer jobs:
  • Load and transform large sets of structured, semi-structured, and unstructured data with MapReduce, Impala, and Pig.
  • Created databases, tables, and views in HiveQL, Impala, and Pig Latin.
  • Documented the process and mentored analyst and test team for writing Hive and Impala Queries.
  • Load and transform large sets of structured, semi structured using Hive and Impala.
  • Leveraged Impala Shell in Unix scripts for auditing the number of processed records.
  • Designed and developed complex impala SQL queries for 360 degree comparison of data.
  • Performed querying of both managed and external tables created by Hive using Impala.
  • Saved the output in Apache parquet format, to use with Impala.
  • Designed and presented a POC on introducing Impala in project architecture.
  • Process unstructured data utilizing Spark, Impala and Hive.
  • Worked closely with developers to enhance queries in Impala.
  • Used Tableau for Data Visualization on Hive/Impala tables.
  • Visualized impala tables in Tableau for reporting.
  • Integrated BI tool with Impala for visualization.
  • Visualize the impala tables in Tableau.
  • Implemented extensive Impala 2.7.0 queries and created views for ad hoc and business processing.
  • Developed Hive and Impala queries for end-user/analyst requirements to perform ad hoc analysis.
  • Installed Cloudera CDH 5.3 Enterprise edition with YARN, Spark and Impala.
  • Worked on Hive, Impala, Sqoop.
  • Worked on analyzing Hadoop cluster and different Big Data components including Pig, Hive, Spark, Impala, and Sqoop.
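
Auditing record counts through Impala, as one bullet above describes, can be scripted from Python; the sketch below assumes the impyla client library, and the host, port, and table names are placeholders.

```python
# Run an audit-style count against an Impala table.
from impala.dbapi import connect

conn = connect(host="impala-daemon.example.com", port=21050)
cur = conn.cursor()

cur.execute(
    "SELECT COUNT(*) FROM warehouse.processed_records WHERE load_date = '2020-01-01'"
)
print("processed rows:", cur.fetchone()[0])

cur.close()
conn.close()
```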

Show More

43. Redshift

low Demand
Here's how Redshift is used in Data Engineer jobs:
  • Developed migration framework to RedShift DW platform.
  • Tuned Performance of Redshift DW by creating suitable Distribution Styles and Sort Keys on Dimensions and Facts.
  • Experienced in data warehousing using Amazon Redshift when dealing with large sets of data.
  • Validated data on cloud using Amazon Redshift.
  • Piloted project evaluating Amazon Redshift for analytics.
  • Prepared, Scaled and transformed datasets for classification models using R and Redshift.
  • Designed the Redshift data model and performed Redshift performance improvements and analysis; continuously monitored and managed the Hadoop cluster through Cloudera Manager.
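
The distribution-style and sort-key tuning mentioned above is ordinary DDL; here is a hedged sketch issued to Redshift through psycopg2, with connection details and the table definition invented for the example.

```python
# Create a fact table with an explicit distribution key and sort key.
import psycopg2

conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="etl_user", password="***",
)
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS fact_sales (
        sale_id     BIGINT,
        customer_id BIGINT,
        sale_date   DATE,
        amount      DECIMAL(12,2)
    )
    DISTSTYLE KEY
    DISTKEY (customer_id)   -- co-locate rows that join on customer_id
    SORTKEY (sale_date);    -- speed up date-range scans
""")
conn.commit()
cur.close()
conn.close()
```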

Show More

44. File System

low Demand
Here's how File System is used in Data Engineer jobs:
  • File system management with VERITAS volume manager and LVM.
  • Developed extract of file system info.
  • Performed fsck on the file systems and bad super blocks were repaired using repair and analyze.
  • Developed a Java API to invoke the REST API of HTTPFS, which supported all the file system operations in HDFS.
  • Created volume groups, logical volumes and partitions on the Linux servers and mounted file systems on the created partitions.
  • Leveraged Amazon S3 as the data layer by using EMR File System (EMRFS) by using Amazon EMR cluster.
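
One bullet above mentions wrapping the HttpFS REST API for HDFS file-system operations. A hedged Python sketch of the same idea, using the WebHDFS-compatible endpoints with requests, looks like this; the host, port, user, and paths are placeholders.

```python
# List an HDFS directory and create another one over the HttpFS/WebHDFS REST API.
import requests

BASE = "http://httpfs-host.example.com:14000/webhdfs/v1"
USER = "hdfs"

# Equivalent to `hdfs dfs -ls /data/raw`.
resp = requests.get(f"{BASE}/data/raw", params={"op": "LISTSTATUS", "user.name": USER})
resp.raise_for_status()
for entry in resp.json()["FileStatuses"]["FileStatus"]:
    print(entry["type"], entry["pathSuffix"], entry["length"])

# Equivalent to `hdfs dfs -mkdir -p /data/staging`.
resp = requests.put(f"{BASE}/data/staging", params={"op": "MKDIRS", "user.name": USER})
print("mkdirs ok:", resp.json()["boolean"])
```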

Show More

45. EC2

low Demand
Here's how EC2 is used in Data Engineer jobs:
  • Developed applications with high performance utilizing EMR-EC2 instances by choosing appropriate instance types and capacity.
  • Created monitors, alarms and notifications for EC2 hosts using Cloud Watch, Cloud trail and SNS.
  • Used Amazon EC2 to utilize GPU acceleration for the heavy-load and computation power demanding CNN.
  • Developed a task execution framework on EC2 instances using SQL and DynamoDB.
  • Experience in migration of data across cloud environment to Amazon EC2 clusters.
  • Installed and configured Hive and Pig environment on Amazon EC2.
  • Experienced in adding/installing new components and removing them through Ambari on HDP, and manually on EC2 clusters.
  • Used Aspera Client on Amazon EC2 instance to connect and store data in the Amazon S3 cloud.
  • Set up various instances of Yellowfin AMI on multiple Amazon EC2 instances.
  • Used a 40-node cluster with Cloudera Hadoop distribution on Amazon EC2.

Show More

46. Perl

low Demand
Here's how Perl is used in Data Engineer jobs:
  • Developed automated process utilizing Shell/Perl scripts for operations users to maintain data.
  • Developed regular expressions to properly transform inconsistent data sources.
  • Converted all chain of custody forms and legal documents into PDF format minimizing any risk of information being improperly altered.
  • Developed Perl scripts to manipulate incoming data in-preparation for data loading and SQL reporting (Web and email reports).
  • Ensured the proper equipment was procured and properly installed in accordance with regulations while maintaining an appropriate budget.
  • Developed Python, Shell/Perl Scripts and Power shell for automation purpose and Component unit testing using Azure Emulator.
  • Created automated feeds of data using both SSIS and Perl to provide the business with the critical data.
  • Verified and tested all systems, confirmed network connectivity was maintained and functioning properly before departing the site.
  • Trained others on how to properly utilize department data and understand the different nuances of the data.
  • Generate TOP business KPI using Perl scripting in the MS Excel for Business users and analysts.
  • Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
  • Maintain online Scrum board with stories and tasks to ensure work is being tracked properly.
  • Used Perl scripting in the automation of the collection of data from the network routers.
  • Created a Dashboard using Perl/CGI, Strawberry Perl for Windows and Apache Web Server.
  • Created API and Web-Services using REST to other Web-properties of the company using Perl.
  • Developed Shell and Perl scripts to automate the DBA monitoring and diagnostic jobs.
  • Created supporting scripts for automation in Shell, Perl, and Python.
  • Design and code web services automation using Perl SOAP and web API.
  • Developed Perl code to process and load XML-based vendor feeds.
  • Introduced PERL automated framework to assure quality of SQL code.

Show More

47. UDF

low Demand
Here's how UDF is used in Data Engineer jobs:
  • Developed customized UDF's for extending Hive functionality.
  • Worked on Hive UDFs, but due to some security privilege restrictions had to end the task midway.
  • Designed and implemented Hive and Pig UDF's for evaluation, filtering, loading and storing of data.
  • Developed Hive UDF to parse the staged raw data to get the item details from a specific store.
  • Design and develop reusable components, frameworks and UDF's for ingestion and data quality.
  • Developed PIG and Hive UDF's in java for extended use of PIG and Hive.
  • Installed and configured Hive and also wrote Hive UDF's that helped spot market trends.
  • Created Hive Generic UDF's to process business logic that varies based on policy.
  • Developed and involved in the industry specific UDF (user defined functions).
  • Extended Hive framework through the use of custom UDF to meet the requirements.
  • Implemented business logic based on state in Hive using Generic UDF's.
  • Involved in developing UDF's and UDAF's to implement customized transformations.
  • Implemented Hive Generic UDF's to implement business logic.
  • Developed UDF's using Hive, Pig in Java.
  • Log data and implemented Hive custom UDF's.
  • Developed custom UDFS and implemented Pig scripts.
  • Developed complex UDF for data validation.
  • Developed FILTER and EVAL UDF's in java for PIG.
  • Developed highly efficient Pig Java UDFs utilizing advanced concepts like the Algebraic and Accumulator interfaces to populate ADP Benchmarks cube metrics.
  • Used Aster UDFs to unload data from staging tables and client data for SCD which resided on Aster Database.
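
The bullets above describe Hive and Pig UDFs written in Java. As a hedged Python analogue, here is a minimal Spark SQL UDF that applies per-record business logic; the column names and the rule itself are hypothetical.

```python
# Register a Python function as a Spark SQL UDF and apply it to a DataFrame.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-example").getOrCreate()

def risk_band(score):
    """Toy business rule: map a numeric score to a risk band."""
    if score is None:
        return "unknown"
    return "high" if score >= 700 else "low"

risk_band_udf = udf(risk_band, StringType())

df = spark.createDataFrame([("p1", 720), ("p2", 540), ("p3", None)],
                           ["policy_id", "score"])
df.withColumn("risk", risk_band_udf("score")).show()
spark.stop()
```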

Show More

48. Log Data

low Demand
Here's how Log Data is used in Data Engineer jobs:
  • Graph database modeling for log data and brute-force attack identification using R and Neo4j graph database.
  • Analyze the log data from Elastic Search to find the pattern of service/host failures.
  • Analyzed the performance log data to improve the application response time and performance.
  • Designed and implemented monitoring service using distributed frameworks for persistence and analysis of metric and log data within Macys.
  • Fixed ingestion issues using Regex and coordinated with System Administrators to verify audit log data.
  • Interpreted lithology based on formation samples and electric log data.

Show More

49. Internet

low Demand
Here's how Internet is used in Data Engineer jobs:
  • Sling TV is an American over-the-top internet television service that is owned by the Sling TV LLC subsidiary of Dish Network.
  • Company uses telemarketing and mail order as well as internet to reach virtually every country with established postal and telephone systems.
  • Used in a variety of products: Backbone infrastructure, Layer 3 internet services via Frame, Private L2 ATM/frame-relay transport.
  • Supported sales office, provided support on desktop, software, network, telephone and Internet connection.
  • Developed the Web Based Rich Internet Application (RIA) using JAVA/J2EE (spring framework).
  • Enable high-speed Internet solutions in T1, T3, and Ethernet over copper installation through Verizon.
  • Designed and Developed complete solution for the Internet of Things to monitors the goods in transit.
  • Installed and currently supporting Internet Access to different levels of the IZOD Center and Racetrack.
  • Obtained, cleaned and processed data from open sources and scraping from Internet.
  • Installed and supported high speed Internet access and high speed data networks.
  • Designed, configured, and deployed high speed internet access networks.
  • Orchestrated relationships with Frame Relay and Internet Service Providers.
  • Co-ordinated and implemented security features for outside vendor VPN's and internal user access to the internet.
  • Detected and solved issues in data circuits; helped customers fix internet breakdowns; gave technical support to technicians.

Show More

50. IP

low Demand
Here's how IP is used in Data Engineer jobs:
  • Reviewed and analyzed data standards, entity and attribute relationships as well as technically advised staff on data integration issues.
  • Implemented Generic writable to incorporate multiple data sources into reducer to implement recommendation based reports using MapReduce programs.
  • Designed and instructed class on Basic Telemetry of multiplexers for Network Engineering groups within the organization.
  • Participated fully in collaborative modeling sessions, informal and formal collaboration, and process improvement discussions.
  • Developed server-based visualization applications that leverage machine learning and predictive analytics to predict equipment state.
  • Interviewed and evaluated multiple candidates for a variety of positions including business analysts and engineers.
  • Installed, migrated and troubleshot workstations and related peripherals in time-sensitive and dynamic environments.
  • Configured Fair Scheduler to provide service-level agreements for multiple users of a cluster.
  • Created entity diagrams and relationship diagrams and modeled cascade to maintain referential integrity.
  • Involved in story-driven Agile development methodology and actively participated in daily scrum meetings.
  • Supported efficient data storage and manipulation by creating complex stored procedures and functions.
  • Communicated with other power-generating stations and vendors for equipment and readiness.
  • Analyze equipment, establish operating data and handle complex engineering assignments.
  • Participate in ongoing projects including security assessments and metric reporting.
  • Right Now Technologies - Customer Relationship Management Software development.
  • Performed receipt and completeness verification of client data.
  • Participated in creating job scheduler infrastructure using AirFlow.
  • Participated in Hughes Network Management Portal project.
  • Decommissioned old equipment and prepared for shipping.
  • Build relationship with Technology partners.

Show More

20 Most Common Skills for a Data Engineer

Pl/Sql: 12.4%
Relational Databases: 7.9%
ETL: 7.8%
Hadoop: 7.2%
Data Warehouse: 6.5%
Hdfs: 5.2%
Python: 4.8%
Sqoop: 4.6%

Typical Skill-Sets Required For A Data Engineer

Rank  Skill                  Percentage of Resumes
1     Pl/Sql                 8.5%
2     Relational Databases   5.4%
3     ETL                    5.3%
4     Hadoop                 4.9%
5     Data Warehouse         4.4%
6     Hdfs                   3.6%
7     Python                 3.3%
8     Sqoop                  3.1%
9     Analytics              3.1%
10    Amazon Web             2.9%
11    Cloud                  2.9%
12    Data Analysis          2.8%
13    Big Data               2.8%
14    Hbase                  2.4%
15    SQL                    2.4%
16    Oozie                  2.3%
17    Unix                   2.2%
18    Linux                  2.2%
19    Mapreduce              2.1%
20    Business Requirements  1.8%
21    Kafka                  1.8%
22    Scala                  1.7%
23    Flume                  1.7%
24    BI                     1.6%
25    API                    1.6%
26    Informatica            1.4%
27    Teradata               1.3%
28    XML                    1.2%
29    Nosql                  1.2%
30    Ssis                   1.1%
31    R                      1.1%
32    Json                   1.1%
33    AWS                    1%
34    Zookeeper              1%
35    Spark SQL              1%
36    Rdbms                  0.9%
37    Data Ingestion         0.9%
38    Log Files              0.9%
39    Data Processing        0.9%
40    Avro                   0.9%
41    Mongodb                0.8%
42    Impala                 0.8%
43    Redshift               0.8%
44    File System            0.8%
45    EC2                    0.7%
46    Perl                   0.7%
47    UDF                    0.7%
48    Log Data               0.7%
49    Internet               0.7%
50    IP                     0.7%
