Top Hadoop Developer Skills

Below we've compiled a list of the most important skills for a Hadoop Developer. We ranked the top skills based on the percentage of Hadoop Developer resumes they appeared on. For example, 6.0% of Hadoop Developer resumes contained HDFS as a skill. Let's find out what skills a Hadoop Developer actually needs in order to be successful in the workplace.

The six most common skills found on Hadoop Developer resumes in 2020. Read below to see the full list.

1. HDFS

High Demand
Here's how Hdfs is used in Hadoop Developer jobs:
  • Used Kafka, Flume for building robust and fault tolerant data Ingestion pipeline for transporting streaming web log data into HDFS.
  • Perform transformations, clean and filter on imported data using Hive, Map Reduce, and load final data into HDFS.
  • Load log data into HDFS using Flume, worked extensively in creating MapReduce jobs to power data for search and aggregation.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with HDFS reference tables and historical metrics.
  • Worked on HDP security implementations, backup, recovery, and DR; loaded log data directly into HDFS using Flume.
  • Created external tables with proper partitions for efficiency and loaded the structured data in HDFS resulted from MR jobs.
  • Worked extensively in creating a data platform for data ingestion from multiple sources into HDFS using Kafka producers and consumers.
  • Handled data coming from different sources and involved in HDFS maintenance and loading of structured and unstructured data.
  • Used HUE to view HDFS directory structure, monitor jobs, Query editors (HIVE, Impala).
  • Worked on the Ingestion of Files into HDFS from remote systems using MFT (Managed File Transfer).
  • Used Pig for ETL purposes and performed operations using Pig functions before loading the data into HDFS.
  • Designed and configured Flume servers to collect data from the network proxy servers and store to HDFS.
  • Collected the logs from the physical machines and the OpenStack controller and integrated into HDFS using Flume.
  • Used different data formats (Text format and ORC format) while loading the data into HDFS.
  • Performed SQOOP import from Oracle to load the data in HDFS and directly into Hive tables.
  • Implemented Kafka Simple consumer to get data from specific partition into HDFS using Custom logic.
  • Executed the HiveQL commands on CLI and transferred back the required output data to HDFS.
  • Published processed data to HDFS and exported it for downstream consumption and report generation.
  • Worked on HDFS & HIVE to ingest the structured data into the raw data zone.
  • Developed Pig program for loading and filtering the streaming data into HDFS using Flume.
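
For illustration, here is a minimal sketch of the kind of HDFS loading described above, using the Hadoop FileSystem Java API; the file paths are hypothetical and the cluster URI is assumed to come from the client configuration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsLoader {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml/hdfs-site.xml from the classpath;
        // fs.defaultFS (e.g. hdfs://namenode:8020) is assumed to be set there.
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf)) {
            // Copy a local log file into a raw-zone directory (hypothetical paths).
            fs.copyFromLocalFile(new Path("/var/log/app/weblog.txt"),
                                 new Path("/data/raw/weblogs/weblog.txt"));
        }
    }
}
```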

2. Sqoop

High Demand
Here's how Sqoop is used in Hadoop Developer jobs:
  • Implemented the import and export of data using SQOOP between MySQL and HDFS on a regular basis.
  • Involved in transfer of data from post log tables into HDFS and Hive using SQOOP.
  • Involved in SQOOP, HDFS Put or CopyFromLocal to ingest data.
  • Worked on SQOOP to import data from various relational data sources.
  • Created SQOOP jobs to export analyzed data to relational database.
  • Migrated data between source systems and HDFS using SQOOP.
  • Connected to relational databases using SQOOP.
  • Generated summary reports utilizing Hive and Pig and exported these results via Sqoop for Business reporting and intelligence analysis.
  • Developed Sqoop scripts to import/export data from relational sources and handled incremental loading on the customer and transaction data by date.
  • Performed series of ingestion jobs using Sqoop, Kafka and custom Input adapter to move data from various sources to HDFS.
  • Worked in Loading and transforming large sets of structured data using Sqoop, semi structured and unstructured data using Map Reduce.
  • Imported Metadata to Hive by using Sqoop and transferred the applications and tables which already exist to work on hive.
  • Developed Sqoop scripts to import and export data from relational sources and handled incremental and updated changes into HDFS layer.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to produce reports for the BI group.
  • Used Sqoop to export tables from HDFS to the database, from which different reports are generated using reporting tools.
  • Worked with the DBA on understanding the limitations of concurrent DB connections when using Sqoop import/Export on different times.
  • Worked on loading disparate data sets coming from different sources to BDpaas (HADOOP) environment using SQOOP.
  • Imported data as parquet files for some use cases using SQOOP to improve processing speed for later analytics.
  • Trained and Mentored analyst / test team for writing, running and validating Sqoop scripts and Hive Queries.
  • Imported the data from different data sources into HDFS using Sqoop by making the required transformations using Hive.
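
As a rough illustration of the Sqoop imports mentioned above, the sketch below shells out to the sqoop CLI from Java; the JDBC URL, credentials, table name, and target directory are hypothetical, and the sqoop binary is assumed to be on the PATH.

```java
import java.util.Arrays;
import java.util.List;

public class SqoopImportJob {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details and HDFS target directory.
        List<String> cmd = Arrays.asList(
            "sqoop", "import",
            "--connect", "jdbc:mysql://dbhost:3306/sales",
            "--username", "etl_user", "--password-file", "/user/etl/.pw",
            "--table", "transactions",
            "--target-dir", "/data/raw/transactions",
            "--as-parquetfile", "-m", "4");
        // Run the CLI and inherit its console output for easy debugging.
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(p.waitFor());
    }
}
```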

3. PL/SQL

High Demand
Here's how Pl/Sql is used in Hadoop Developer jobs:
  • Used PL/SQL to develop stored procedures and triggers in order to calculate and update the tables to implement business logic.
  • Worked on data migration and data conversion using PL/SQL, SQL and Python to convert them into custom ETL tasks.
  • Crafted SQL Queries and wrote PL/SQL blocks for storing procedures, Functions, Cursors, Index, triggers and packages.
  • Created Oracle PL/SQL queries and Stored Procedures, Packages, Triggers, Cursors and backup-recovery for the various tables.
  • Developed efficient PL/SQL packages for data migration and involved in bulk loads, testing and reports generation.
  • Developed SQL, HQL queries and PL/SQL stored procedures, functions to interact with the oracle database.
  • Involved in Oracle database development by creating Oracle PL/SQL Functions, Procedures, Triggers and Packages.
  • Used Oracle Java Developer and SQL Navigator as tools for Java and PL/SQL development.
  • Developed Unix Shell scripts to call Oracle PL/SQL packages and contributed to standard framework.
  • Designed and Developed PL/SQL Procedures and UNIX Shell Scripts Data Import/Export and Data Conversions.
  • Created the Stored Procedures, functions and triggers using PL/SQL.
  • Scripted PL/SQL procedures for automation tasks like partition rotation.
  • Involved in writing PL/SQL for the stored procedures.
  • Developed Unix/Linux Shell Scripts and PL/SQL procedures.
  • Transformed existing PL/SQL procedures to hive queries.
  • Worked in PL/SQL Performance tuning.
  • Involved in writing PL/SQL, SQL queries.
  • Identified several PL/SQL batch applications in General Ledger processing and conducted performance comparison to demonstrate the benefits of migrating to Hadoop.
  • Implemented the DBCRs by developing PL/SQL scripts and stored procedures.
  • Developed SQL queries and Stored Procedures using PL/SQL to retrieve and insert data into multiple database schemas.
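
To make the stored-procedure work above concrete, here is a minimal, hypothetical sketch of calling an Oracle PL/SQL procedure from Java over JDBC; the connection string, credentials, and procedure name are made up.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Types;

public class CallBusinessLogicProc {
    public static void main(String[] args) throws Exception {
        // Hypothetical Oracle connection details.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@dbhost:1521:ORCL", "etl_user", "secret");
             // Hypothetical procedure: takes an account id, returns an updated balance.
             CallableStatement cs = conn.prepareCall("{call update_balance(?, ?)}")) {
            cs.setLong(1, 12345L);
            cs.registerOutParameter(2, Types.NUMERIC);
            cs.execute();
            System.out.println("New balance: " + cs.getBigDecimal(2));
        }
    }
}
```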

4. Pig UDF

High Demand
Here's how Pig UDF is used in Hadoop Developer jobs:
  • A PIG UDF was required to extract area information from the huge volume of data received from the sensors.
  • Defined some PIG UDF for some financial functions such as swap, hedging, Speculation and arbitrage.
  • Design and develop customized business rule framework to implement business logic using hive, pig UDF functions.
  • Involved in writing Hive and Pig UDF's to perform aggregations on customers' data.
  • Designed and implemented PIG UDFS for evaluation, filtering, loading and storing of data.
  • Developed Pig UDF's in java for custom data for various levels of optimization.
  • Implemented various Pig UDF's for converting unstructured data into structured data.
  • Developed Pig UDF's and Hive UDF's using Java.
  • Experience in writing Pig UDF's and macros.
  • Implemented business logic by writing Pig scripts, Pig UDF's and macros, using Piggybank, DataFu and other sources.
  • Involved in developing Pig UDFs for the needed functionality that is not out of the box available from Apache Pig.
  • Developed custom Java code to call the Pig job that runs the custom Pig UDF at runtime through REST web services.
  • Used Pig UDF's and Hive UDF's extensively to implement data transformation according to user requirements in Hadoop.
  • Filtered, transformed and combined data from multiple providers based on payer filter criteria using custom Pig UDFs.
  • Experience writing Hive and Pig UDFs to perform aggregation to support the business use case.
  • Developed Pig UDF's for preprocessing the data for analysis as per business logic.
  • Involved in the development of Pig UDF'S to analyze by pre-processing the data.
  • Developed custom Pig UDFs for transforming the data as per the requirement for analysis.
  • Developed HIVE and PIG UDF functions using Java to format the logs data.
  • Developed PIG UDF's to preprocess and analyze the Data source feed files.
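
As a small illustration of the custom Pig UDFs described above, here is a sketch of an EvalFunc that normalizes a string field; the class name and field semantics are hypothetical.

```java
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical UDF: trims and upper-cases a raw text field before further processing.
public class NormalizeField extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return input.get(0).toString().trim().toUpperCase();
    }
}
```

In a Pig script this would typically be registered with REGISTER and then invoked like a built-in function.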

5. MapReduce

High Demand
Here's how Mapreduce is used in Hadoop Developer jobs:
  • Experience in optimization of MapReduce algorithm using combiners and partitions to deliver the best results and worked on Application performance optimization.
  • Developed MapReduce programs in java to support distributed processing of data generated by various business units.
  • Developed Secondary sorting implementation to get sorted values at reduces side to improve MapReduce performance.
  • Divided each data set into corresponding categories by following the MapReduce binning design pattern.
  • Experience in developing MapReduce jobs to process data sets using MapReduce programming paradigm.
  • Designed and implemented MapReduce based large scale parallel relation learning system.
  • Designed & Implemented Java MapReduce programs to support distributed data processing.
  • Developed MapReduce jobs for cleaning, accessing and validating the data.
  • Implemented secondary sorting to sort reducer output globally in MapReduce.
  • Designed and implemented MapReduce based large-scale parallel processing.
  • Implemented MapReduce programs using Java.
  • Performed transformations on the data imported from HDFS, Oracle, Cassandra, MongoDB, DB2 by using MapReduce and Hive.
  • Involved in migrating MapReduce jobs into Spark jobs and used Spark SQL to load structured and semi-structured data into Spark clusters.
  • Performed MapReduce programs on log data to transform it into a structured form and find user location, age group, and time spent.
  • Implemented Data Interface to get information of customers using Rest API and Pre-Process data using MapReduce and store into HDFS.
  • Developed automated data ingest process to handle the Inserts, Updates and Deletes during the Incremental loads using Java MapReduce.
  • Implemented custom Data Types, Input Format, Record Reader, Output Format, Record Writer for MapReduce computations.
  • Developed MapReduce programs to parse the raw data, populate tables and store the refined data in partitioned tables.
  • Worked on migrating PIG scripts and MapReduce programs to Spark Data frames API and Spark SQL to improve performance.
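
The sketch below shows a minimal MapReduce job of the kind described above, counting records per HTTP status code in web logs; the input layout and field position are hypothetical assumptions.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class StatusCodeCount {

    public static class LogMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            // Assumes space-delimited log lines with the HTTP status in column 9.
            String[] fields = value.toString().split(" ");
            if (fields.length > 8) {
                ctx.write(new Text(fields[8]), ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context ctx)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            ctx.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "status-code-count");
        job.setJarByClass(StatusCodeCount.class);
        job.setMapperClass(LogMapper.class);
        job.setCombinerClass(SumReducer.class);   // combiner reuses the reducer
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```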

6. Oozie

High Demand
Here's how Oozie is used in Hadoop Developer jobs:
  • Defined and captured metadata and rules associated with HIVE and OOZIE processes.
  • Experienced in defining OOZIE job flows.
  • Worked on MapReduce, HIVE, PIG and OOZIE.
  • Defined job flows in Oozie to automate the process of loading data into HDFS and pre-processing it with Pig.
  • Integrated Quartz scheduler with Oozie work flows to get data from multiple data sources parallel using fork.
  • Used Hue for UI based PIG script execution, Oozie scheduling and creating tables in Hive.
  • Installed and worked on Apache Oozie in order to run multiple Pig and Hive scripts.
  • Experienced in defining job flows to run multiple Map Reduce and Pig jobs using Oozie.
  • Configured Time Based Schedulers that get data from multiple sources parallel using Oozie work flows.
  • Worked on OOZIE to automate data loading into HDFS and PIG to pre-process the data.
  • Used Oozie to coordinate and automate the flow of jobs in the cluster accordingly.
  • Implemented a web application, which uses Oozie Rest API and schedule jobs.
  • Used Oozie to orchestrate the scheduling of map reduce jobs and pig scripts.
  • Scheduled Recurrent Jobs, Packaged Multiple Jobs, Sequenced jobs in Oozie.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Involved in setting up the daily process on a schedule using Oozie.
  • Experienced in defining job work flows as per their dependencies in Oozie.
  • Used Oozie Operational Services for batch processing and scheduling work flows dynamically.
  • Integrated the Oozie client with Java applications to build workflows into the application.
  • Developed Hive queries for the analysts and scheduled jobs via Oozie.
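
Some bullets above mention driving Oozie from Java; the sketch below uses the Oozie Java client to submit and start a workflow application. The Oozie URL, HDFS application path, and property values are hypothetical.

```java
import java.util.Properties;
import org.apache.oozie.client.OozieClient;

public class SubmitWorkflow {
    public static void main(String[] args) throws Exception {
        // Hypothetical Oozie server URL.
        OozieClient oozie = new OozieClient("http://oozie-host:11000/oozie");

        Properties conf = oozie.createConfiguration();
        // Hypothetical HDFS path containing workflow.xml.
        conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode:8020/apps/ingest-wf");
        conf.setProperty("nameNode", "hdfs://namenode:8020");
        conf.setProperty("jobTracker", "resourcemanager:8032");

        String jobId = oozie.run(conf);   // submit and start the workflow
        System.out.println("Started Oozie job " + jobId);
    }
}
```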

7. HBase

High Demand
Here's how Hbase is used in Hadoop Developer jobs:
  • Involved in HBASE setup and storing data into HBASE, which will be used for further analysis.
  • Implemented HBASE for creating tabular data.
  • Involved in storing data into HBASE.
  • Worked exclusively on SQOOP and HBASE.
  • Worked with HBASE NOSQL database.
  • Experience in HBase database manipulation with structured, unstructured and semi-structured types of data.
  • Integrated multiple sources data into Hadoop cluster and analyzed data by Hive-Hbase integration.
  • Installed and configured Phoenix on HDP, created views over HBase tables, and used SQL queries to retrieve alerts and metadata.
  • Analyzed HBase data in Hive (version 0.11.0.2) by creating external partitioned and bucketed tables so that efficiency is maintained.
  • Experience in creating integration between Hive and HBase for effective usage and performed MR Unit testing for the Map Reduce jobs.
  • Performed transformations by developing Map Reduce jobs and Hive Script on imported data and then loading obtained data into HBase tables.
  • Experience in creating tables, dropping and altered at run time without blocking updates and queries using HBase and Hive.
  • Worked with the team on fetching live streaming data from the database into HBase tables using Spark Streaming and Apache Kafka.
  • Implemented Pattern matching algorithms with Regular Expressions, built profiles using Hive and stored the results in HBase.
  • Analyzed the option of running Hive on top of HBase to query against data other than existing architecture.
  • Worked on NoSQL databases like HBase, integrating them with a Storm topology written to accept inputs from a Kafka producer.
  • Developed Spark applications to move data into HBase tables from various sources like Relational Database or Hive.
  • Experienced with accessing HBase using different client API's like Thrift, Java and Rest API.
  • Extracted the data from different sources into HDFS and Bulk Loaded the cleaned data into HBase.
  • Managed real time data processing and real time Data Ingestion in HBase and Hive using Storm.
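
Here is a minimal sketch of writing to and reading from an HBase table with the Java client API, in the spirit of the usage described above; the table name, column family, and row key are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ProfileStore {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();  // reads hbase-site.xml
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("user_profiles"))) {

            // Write one cell: row key = user id, column family "d", qualifier "city".
            Put put = new Put(Bytes.toBytes("user-42"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("city"), Bytes.toBytes("Austin"));
            table.put(put);

            // Read it back.
            Result result = table.get(new Get(Bytes.toBytes("user-42")));
            byte[] city = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("city"));
            System.out.println("city = " + Bytes.toString(city));
        }
    }
}
```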

8. Hadoop

High Demand
Here's how Hadoop is used in Hadoop Developer jobs:
  • Created and maintained Technical documentation for launching HADOOP Clusters and for executing Pig Scripts.
  • Project description: OneHadoop is a Big Data solution developed at Bank of America.
  • Implemented nodes on a CDH3 Hadoop cluster on Red Hat Linux.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
  • Supported operations team in Hadoop cluster maintenance activities including commissioning and decommissioning nodes and upgrades.
  • Experience in integrating RHadoop for categorization and statistical analysis to generate reports.
  • Involved in Design and Development of technical specification documents using Hadoop.
  • Experienced in installing, configuring and using Hadoop Ecosystem components.
  • Explored and used Hadoop ecosystem features and architectures.
  • Installed and configured various components of Hadoop ecosystem.
  • Monitored multiple Hadoop clusters environments using Ganglia.
  • Project deals with developing historical database using Hadoop ecosystem for maintaining last 10 years of data spread across branches in US.
  • Deploy new hardware and software environments required for Hadoop and to expand memory and disks on nodes in existing environments.
  • Analyzed the data using Map Reduce, Pig, Hive and produce summary results from Hadoop to downstream systems.
  • Participated in a project whose main goal was to replace the existing legacy system (mainframes) with Hadoop.
  • Convinced the organization for using Hadoop as a technology to store the data apart from their traditional systems.
  • Enabled Verizon to connect to Facebook followers and be able to grow the brand advocacy using Hadoop platform.
  • Involved in Hadoop upgrade from HDP 1.3.3 to HDP 2.2 by enhancing existing framework and testing it.
  • Worked on loading and transformation of large sets of structured, semi structured data into Hadoop system.
  • Generated the data cubes using Hive, Pig, and Java MapReduce on a provisioned Hadoop cluster in AWS.

9. Flume

High Demand
Here's how Flume is used in Hadoop Developer jobs:
  • Developed data pipeline using Flume to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Used Flume to collect, aggregate, and store the streaming data from different sources like Twitter and ingested into HDFS.
  • Worked on a proof of concept to show that the current Kafka-based data processing could be converted to a Flume-based application.
  • Configured Apache Flume to extract the data from the web server O/P files and to copy log files to HDFS.
  • Involved in increasing the performance of system by adding other real time components like Flume, Storm to the platform.
  • Ingested the click stream data from web servers to Spark Streaming and HDFS using Kafka and Flume for further processing.
  • Used Flume to load unstructured and semi structured data from various sources such as websites and streaming data to cluster.
  • Used Flume to collect, aggregate and store the log data from Sensor Log data and pushed to HDFS.
  • Worked on social media analysis using Flume and provided required data sets to social media teams to generate reports.
  • Used MapReduce and Flume to load, aggregate, store and analyze web log data from different web servers.
  • Imported unstructured data like logs from different web servers to HDFS using Flume and developed custom Map Reduce jobs.
  • Configured the source, sink, and channel in flume configuration file in order to collect streaming data.
  • Worked with Flume for gathering and moving log files from Application server to a central location in HDFS.
  • Configured flume source, sink and memory channel to handle streaming data from server logs and JMS sources.
  • Used Flume to channel data from different sources to HDFS as well as managed and monitored them.
  • Experience in transferring Streaming data from different data sources into HDFS and NoSQL databases using Apache Flume.
  • Collected and aggregated huge amount of log data from multiple sources and integrated into HDFS using Flume.
  • Developed data pipeline using Flume, Kafka to store data into HDFS and further processing through spark.
  • Developed Hive queries to process the data for visualizing Loading log data directly into HDFS using Flume.
  • Involved in processing real-time streaming data using Kafka and Flume, integrating with the Spark Streaming API.

10. Pig Scripts

High Demand
Here's how Pig Scripts is used in Hadoop Developer jobs:
  • Carried out transforming huge data of Structured, Semi-Structured and Unstructured types and analyzing them using Hive queries and Pig scripts.
  • Created different pig scripts & converted them as a shell command to provide aliases for common operation for project business flow.
  • Created Java UDFs to handle derived fields while inserting into Hive tables, and Pig scripts to format the data.
  • Develop Pig scripts to establish the data flow to achieve the desired watch list at store and item level exception reporting.
  • Developed Apache Pig scripts and UDFs extensively for data transformations, calculating statement date formats and aggregating the monetary transactions.
  • Analyzed the data by performing Hive queries and running Pig scripts to understand the premium, claims and sales trending.
  • Processed data using Hive Query languages (HQL) in HIVE and PIG scripts in different stages of the project.
  • Involved in writing Pig Scripts for Cleansing the data and implemented Hive tables for the processed data in tabular format.
  • Analyzed the data by performing Hive queries (HiveQL), ran Pig scripts, Spark SQL and Spark streaming.
  • Developed and used Pig Scripts to process and query flat files in HDFS which cannot be accessed using HIVE.
  • Ran Pig scripts and Hive queries to analyze the data and understand user behavior.
  • Developed Pig Scripts to perform operations on data such as Filter, Joins, Grouping, Sorting and Splitting.
  • Experience of performance tuning hive scripts, pig scripts, MR jobs in production environment by altering job parameters.
  • Developed Pig Scripts for capturing data change and record processing between new data and already existed data in HDFS.
  • Performed optimization on Pig scripts and Hive queries to increase efficiency and added new features to existing code.
  • Designed and developed Pig scripts with Java UDF's to implement business logic to transform the ingested data.
  • Developed PIG scripts to arrange incoming data into suitable and structured data before piping it out for analysis.
  • Involved in data cleaning, null suppression and various other transformations like date reformatting using PIG scripts.
  • Analyzed the data by performing Hive queries and running PIG scripts to study various features in an organization.
  • Involved in developing Pig scripts to transform raw data into data that is useful to gain business insights.
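
To illustrate the Pig work above from the same Java angle as the other examples, the sketch below embeds a small Pig Latin flow through the PigServer API; the input path, schema, and filter condition are hypothetical.

```java
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class CleanWeblogs {
    public static void main(String[] args) throws Exception {
        PigServer pig = new PigServer(ExecType.MAPREDUCE);

        // Hypothetical tab-delimited web log input in HDFS.
        pig.registerQuery("logs = LOAD '/data/raw/weblogs' USING PigStorage('\\t') "
                + "AS (ts:chararray, userid:chararray, url:chararray, status:int);");
        // Keep only successful requests and drop records with a null user id.
        pig.registerQuery("clean = FILTER logs BY status == 200 AND userid IS NOT NULL;");
        // Store the cleansed data for downstream Hive/analytics use.
        pig.store("clean", "/data/clean/weblogs");
    }
}
```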

11. Cloudera

High Demand
Here's how Cloudera is used in Hadoop Developer jobs:
  • Participated in development/implementation of the Cloudera Hadoop environment.
  • Worked on POC and implementation & integration of Cloudera & Hortonworks for multiple clients.
  • Involved in upgrading clusters to Cloudera Distributed versions.
  • Clear understanding of Cloudera Manager Enterprise edition.
  • Experienced in development using Cloudera distribution system.
  • Used Cloudera Distribution for Data Transformations.
  • Received certifications in Cloudera and Hortonworks.
  • Implemented Cloudera Manager on existing cluster.
  • Created operational reports using Cloudera manager.
  • Created Data Lineage using Cloudera navigator.
  • Created metadata tagging using Cloudera navigator.
  • Utilized the Apache Hadoop environment distributed by Cloudera.
  • Used Cloudera Manager and Cloudera Navigator.
  • Cluster maintenance as well as creation and removal of nodes using tools like Cloudera Manager Enterprise, and other tools.
  • Used Cloudera Manager to pull metrics on various cluster features like JVM usage, running map and reduce tasks, etc.
  • Monitored System health and logs and responded accordingly to any warning or failure conditions through the Cloudera Manager.
  • Used Cloudera Crunch to develop data pipelines that ingests data from multiple data sources and process them.
  • Experienced in using one or more of the Big Data vendor tools such as Cloudera or HortonWorks.
  • Used Cloudera manager to monitor the health of the jobs which are running on the cluster.
  • Experience in using Cloudera Sentry for Role Based controlled for HDFS, Hive and impala tables.

12. Linux

High Demand
Here's how Linux is used in Hadoop Developer jobs:
  • Involved in development and deployment of application on Linux environment.
  • Possess good Linux and Shell Scripting and familiarity with open source configuration management and deployment tools such as puppet or chef.
  • Involved in loading huge volumes of data to and from Linux and HDFS as part of development.
  • Have done activities like Installation, Configuration and Upgrade of LINUX operating Systems and SQL database.
  • Performed complex Linux administrative activities; created, maintained and updated Linux shell scripts.
  • Used Linux (Ubuntu) machine for designing, developing and deploying of Java modules.
  • Created Linux shell Scripts to automate the daily ingestion of IVR data.
  • Design, Implement and Administer Linux Web and MySQL Database Servers.
  • Worked on installation of Hortonworks 2.1 on Azure Linux servers.
  • Install, manage and support Linux operating systems, ex.
  • Performed system administration activities on Linux, CentOS & Ubuntu.
  • Worked on shell scripting in Linux and the Cluster.
  • Prepared Linux shell scripts for automating the process.
  • Test build and deploy applications on Linux.
  • Installed and configured databases on Linux platforms.
  • Configured Hadoop in Linux and deploying application.
  • Provide Subject Matter Expertise on Linux (To support running CDH/Hadoop optimally on the underlying OS).
  • Configured a nine-node CDH5 Hadoop cluster on Red Hat Linux.
  • Implemented a nine-node CDH4 Hadoop cluster on Ubuntu Linux.

13. Unix

High Demand
Here's how Unix is used in Hadoop Developer jobs:
  • Developed UNIX shell scripts to send a mail notification upon the job completing either with a success or Failure notation.
  • Developed Unix shell scripts for creating reports from Hive data and automated them using the CRON job scheduler.
  • Experience in Installing Firmware Upgrades, kernel patches, systems configuration, performance tuning on UNIX/Linux systems.
  • Worked on creation of UNIX shell scripts to watch for 'null' files and trigger jobs accordingly.
  • Developed MES Ingestion framework Unix scripts for pushing daily source files into HDFS & Hive tables.
  • Developed UNIX shell scripts for the business process and assimilation of data from different interfaces.
  • Automated the triggering of Data Lake REST API calls using Unix Shell Scripting and PERL.
  • Experience on UNIX shell scripts for process and loading data from various interfaces to HDFS.
  • Experience in UNIX shell scripting and has good understanding of OOPS and Data structures.
  • Worked with 10+ source systems and got batch files from heterogeneous systems like UNIX/Windows/Oracle/mainframe/DB2.
  • Imported data from Oracle database to HDFS using Unix based File Watcher tool.
  • Load and transform large sets of unstructured data from UNIX system to HDFS.
  • Experienced in working with different scripting technologies like Python, Unix shell scripts.
  • Developed UNIX scripts for fast load and export for various vendor formats.
  • Worked on different operating systems like UNIX/Linux, Windows XP and Z/OS.
  • Developed automated scripts using Unix Shell for scheduling and automation of tasks.
  • Developed simple to complex Unix shell/Bash scripting scripts in framework developing process.
  • Created numerous UNIX scripts to process raw data from the above sources.
  • Converted and loaded local data files into HDFS through the UNIX shell.
  • Developed UNIX shell scripts for creating the reports from Hive data.

14. Hive Tables

High Demand
Here's how Hive Tables is used in Hadoop Developer jobs:
  • Developed and managed partition-level locks on Hive tables using Zookeeper.
  • Created Managed and external Hive tables with static/dynamic partitioning.
  • Involved in creating Hive tables, loading with data and writing hive queries that will run internally in map reduce way.
  • Designed and created the Hive tables to load large sets of structured, semi-structured and unstructured data coming from upstream.
  • Developed bash scripts to bring the log files from FTP server and then processing it to load into Hive tables.
  • Involved in creating Hive tables, and loading and analyzing data using hive queries which will internally run MapReduce jobs.
  • Involved in creating Hive tables, loading the data using it and in writing Hive queries to analyze the data.
  • Extracted StrongView logs from servers using Flume and extracted information like open/click info of customers and loaded into Hive tables.
  • Created several Hive tables, loaded with data and wrote Hive Queries in order to run internally in MapReduce.
  • Worked on partitioning and used bucketing in HIVE tables and running the scripts in parallel to improve the performance.
  • Involved in creating Hive tables, loading data, writing Hive queries, and creating partitions and buckets for optimization.
  • Worked on Impala-Tableau integration to help client's access data in hive tables through impala for creating various Dashboards.
  • Cleansed data is transformed into HIVE tables and then necessary queries performed on data to get the required results.
  • Experience in creating HIVE tables in order to further store data in a way that contained only useful information.
  • Performed operations like Delta Processing, Validations & Transformations on hive tables and transformed data into Base Consumable Layer.
  • Loaded data into HIVE tables, and extensively used Hive/HQL or Hive queries to query data in Hive Tables.
  • Involved in creating Hive Tables, and loading and analyzing data using hive queries for the customer card Details.
  • Involved in creating Hive tables, loading data and writing Hive queries which ran internally in map reduce way.
  • Created Hive tables and managed loading the data and writing hive queries that will run internally in MapReduce.
  • Used Hive/HQL or Hive queries to query or search for a particular string in Hive tables of HDFS.
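
The sketch below shows one common way to create and query a partitioned Hive table from Java over the HiveServer2 JDBC driver, in the spirit of the bullets above; the host, credentials, table definition, and HDFS location are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveTableExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical HiveServer2 endpoint.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hiveserver2:10000/default", "etl_user", "");
             Statement stmt = conn.createStatement()) {

            // External, partitioned table over data already landed in HDFS.
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS weblogs ("
                    + " ts STRING, userid STRING, url STRING, status INT)"
                    + " PARTITIONED BY (dt STRING)"
                    + " STORED AS ORC"
                    + " LOCATION '/data/clean/weblogs'");

            stmt.execute("ALTER TABLE weblogs ADD IF NOT EXISTS PARTITION (dt='2020-01-01')");

            try (ResultSet rs = stmt.executeQuery(
                    "SELECT status, COUNT(*) FROM weblogs WHERE dt='2020-01-01' GROUP BY status")) {
                while (rs.next()) {
                    System.out.println(rs.getInt(1) + " -> " + rs.getLong(2));
                }
            }
        }
    }
}
```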

15. Log Files

Average Demand
Here's how Log Files is used in Hadoop Developer jobs:
  • Used Kafka for Log aggregation to collect physical log files from servers and puts them in the HDFS for further processing.
  • Logged various level of information like error, info, and debug into the log files using the Log4j.
  • Implemented the Map Reduce program for converting log files to CSV format which can be used for further processing.
  • Experienced in processing server, application and user log files using Hive in combination with Pig.
  • Developed Map Reduce Program for searching the production log files for application issues and download performance.
  • Performed root cause analysis by debugging the log files whenever a problem occurred in the system.
  • Developed Pig Latin Scripts to extract data from log files and store them to HDFS.
  • Transformed the log files into structured data using Hive SerDe's and Pig Loaders.
  • Monitored the log files and solved the issues raised by the application development team.
  • Involved in moving all the log files to HDFS for further processing through Flume.
  • Implemented different search techniques to perform on the structured log files for analysis.
  • Developed Pig Latin scripts to aggregate the log files of the business clients.
  • Implemented the Map Reduce program for converting log files to CSV format.
  • Developed custom aggregate UDF's in Hive to parse log files.
  • Used Spark Streaming to ingest the web server log files.
  • Moved all log files generated by various products into HDFS location.
  • Experience in troubleshooting in map reduce jobs by reviewing log files.
  • Involved in managing & review of data backups and log files.
  • Experience in working on log files using Apache Storm.
  • Obtained experience in maintaining and understanding the log files.

16. File System

Average Demand
Here's how File System is used in Hadoop Developer jobs:
  • Involved in loading data from Linux file systems, servers, java web services using Kafka producers and consumers.
  • Worked on tuning the performance Pig queries and involved in loading data from LINUX file system to HDFS.
  • Involved in loading data from the LINUX file system, servers, and Java web services using Kafka producers and partitions.
  • Involved in loading data from Linux file system to HDFS and migrated data across clusters using DISTCP.
  • Connected local file system to HDFS using WinSCP and loaded data from UNIX file system to HDFS.
  • Used Cassandra Query language (CQL) to implement CRUD operations on Cassandra file system.
  • Involved in exporting and importing data from local(Linux) file system to HDFS.
  • Handled data coming from different data sources and loaded from UNIX file system into HDFS.
  • Developed MapReduce program to convert data from storage systems, flat file system into HDFS.
  • Developed a Shell Script which dynamically downloads Amazon S3 files into HDFS file system.
  • Loaded data from UNIX file system to HDFS and written Hive User Defined Functions.
  • Worked on StreamingContext API to convert the file system data to batches of data.
  • Involved in loading data from local file system to HDFS using HDFS Shell commands.
  • Used Unix bash scripts to validate the files from Unix to HDFS file systems.
  • Experience in reading and writing files into HDFS using Java file system API.
  • Loaded the data from Linux/UNIX file system into HDFS for analyzing data.
  • Involved in loading data from LINUX file system to HDFS using Kettle.
  • Deployed Network file system (NFS) for Name Node Metadata backup.
  • Involved in loading data from the UNIX file system to HDFS.
  • Implemented CRUD operations using CQL on top Cassandra file system.

17. ETL

Average Demand
Here's how ETL is used in Hadoop Developer jobs:
  • Evaluated ETL applications to support overall performance and improvement opportunities.
  • Create and execute unit test plans based on system and validation requirements; troubleshoot, optimize, and tune ETL processes.
  • Used SSIS to create ETL/ELT packages to validate, extract, transform and load data to data warehouse and data marts.
  • Investigate and diagnose issues with ETL and Big Data systems, data, and processes as part of systems administration team.
  • Developed ETL code using different transformations to extract, transform the data from legacy sources and load data into target system.
  • Developed ETL processes using Hive, including performance tuning the scripts using partitioning, map join, bucketing etc.
  • Developed map-reduce programs using Java and performed ETL operations such as data cleansing, data mapping and data conversion.
  • Developed Pig Latin scripts and used Pig as ETL tool for transformations, event joins, and filter.
  • Hive is used to perform ETL on data and conclusive reports were made using Excel sheet with graphs.
  • Extracted and loaded the resulting data into Hive on an incremental basis and with full refresh after applying ETL transformations.
  • Validated ETL mappings and tuned them for better performance and implemented various Performance and Tuning techniques.
  • Involved in orchestration of delta generation for time series data and developed ETL from a ParAccel database.
  • Involved in migration of ETL processes from Oracle to Hive to test the easy data manipulation.
  • Worked on extract, transform, and load (ETL) data from multiple data sources.
  • Involved in a POC for migrating ETLs from Hive to Spark in a Spark-on-YARN environment.
  • Developed Map Reduce programs using Java to perform various ETL, cleaning and scrubbing tasks.
  • Created Hive external tables to perform ETL on data that is generated on a daily basis.
  • Worked on QlikView to provide data integration, reporting, data mining and ETL.
  • Worked on ETL data flow development using Hive/Pig scripts and loading from Hive views.
  • Implemented ETL code to load data from multiple sources into HDFS using pig scripts.

18. Zookeeper

Average Demand
Here's how Zookeeper is used in Hadoop Developer jobs:
  • Used Kafka in conjunction with Zookeeper for deployment management, which necessitates monitoring its metrics alongside the Kafka clusters.
  • Involved in configuring cluster using Zookeeper for Cluster configuration and Coordination.
  • Involved in configuring cluster using Zookeeper distributed cluster coordination service.
  • Configured Kafka brokers, Zookeepers to increase node utilization.
  • Configured the Zookeeper for Kafka implementation for dedicated Consumer.
  • Configured Zookeeper for Cluster co-ordination services.
  • Cluster co-ordination services through Zookeeper.
  • Coordinated cluster services using ZooKeeper.
  • Processed the logs in Spark Streaming to store it in Cassandra, provide high availability to Kafka brokers using Zookeeper.
  • Automated the installation and maintenance of Kafka, storm, zookeeper and elastic search using salt stack technology.
  • Experience in configuring Zookeeper to coordinate the servers in clusters to maintain the data consistency and Monitored services.
  • Monitored System health and logs and respond accordingly to any warning or failure conditions using Zookeeper.
  • Worked with ZooKeeper and Kafka, using various classes to make Kafka communicate with ZooKeeper.
  • Involved in Cluster coordination services through Zookeeper and adding new nodes to an existing cluster.
  • Used Zookeeper to coordinate the servers in clusters and to maintain the data consistency.
  • Configured, deployed and maintained a single node Zookeeper cluster in DEV environment.
  • Created topics on the Desktop portal using Spark Streaming with Kafka and Zookeeper.
  • Used ZooKeeper and HDFS for high availability during Spark Streaming.
  • Created producer, consumer, and Zookeeper setups for Kafka replication.
  • Implemented real time system with Kafka, Storm and Zookeeper.
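
For the cluster-coordination work above, here is a minimal sketch using the plain ZooKeeper Java client to create and read a coordination znode; the connect string, znode path, and payload are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class CoordinationExample {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);
        // Hypothetical ZooKeeper ensemble; the watcher releases the latch on the first event.
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 30000,
                event -> connected.countDown());
        connected.await();

        // Publish a small piece of shared configuration for worker processes.
        byte[] data = "ingest-enabled=true".getBytes(StandardCharsets.UTF_8);
        if (zk.exists("/app/config", false) == null) {
            zk.create("/app/config", data,
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
        System.out.println(new String(zk.getData("/app/config", false, null),
                StandardCharsets.UTF_8));
        zk.close();
    }
}
```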

19. Kafka

Average Demand
Here's how Kafka is used in Hadoop Developer jobs:
  • Replicated data across data centers using mirror maker in Apache Kafka by doing both synchronous replication and asynchronous replication.
  • Experience in maintaining and operating Kafka and monitor it consistently and effectively using cluster management tools.
  • Developed a Java-based JMX application to capture Kafka audit information for data reconciliation.
  • Developed multiple Kafka topics/queues and produced 20 million records using a producer written in Java.
  • Developed multiple Kafka Producers and Consumers as per the software requirement specifications.
  • Developed multiple Kafka Producers and Consumers from scratch implementing organization & requirements.
  • Developed Kafka producer and consumer components for real time data processing.
  • Performed streaming data ingestion using Kafka to the spark distribution environment.
  • Configured, Designed implemented and monitored Kafka cluster and connectors.
  • Worked on events triggering using Kafka distributed messaging system.
  • Developed the customized Kafka Producers and consumers.
  • Configured Kafka Brokers and developed Storm topology.
  • Developed code base to stream data from sample data files > Kafka > Kafka Spout > Storm Bolt > HDFS Bolt.
  • Used Kafka to set up a customer activity tracking pipeline as a set of real-time publish-subscribe feeds.
  • Involved in requirement and design phase to implement Streaming Lambda Architecture to use real time streaming using Spark and Kafka.
  • Dumped data to Cassandra using Kafka, Created Cassandra tables to store various data formats data came from different portfolios.
  • Have been experienced with Spark streaming to ingest data into Spark engine and to receive real time data using Kafka.
  • Experienced in transferring data from different data sources into HDFS systems using Kafka producers, consumers, and Kafka brokers.
  • Developed multiple Kafka Producers and Consumers from scratch using the low-level and high-level APIs.
  • Experience with Kafka in understanding and performing thousands of megabytes of reads and writes per second on streaming data.
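
Here is a short sketch of a Kafka producer along the lines of the bullets above, using the standard Java client; the broker list, topic name, and record payload are hypothetical.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ClickstreamProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092"); // hypothetical brokers
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("acks", "all");   // wait for full acknowledgment for durability

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key by user id so events for one user land on the same partition.
            producer.send(new ProducerRecord<>("clickstream", "user-42",
                    "{\"url\":\"/home\",\"ts\":1588888888}"));
            producer.flush();
        }
    }
}
```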

20. Data Analysis

Average Demand
Here's how Data Analysis is used in Hadoop Developer jobs:
  • Developed statistical models to forecast inventory and procurement cycles; developed Python code to provide data analysis and generate complex data reports.
  • Utilized various data analysis and data visualization tools to accomplish data analysis, report design and report delivery.
  • Developed MapReduce programs in Java for Data Analysis and worked on compression mechanisms to optimize MapReduce Jobs.
  • Developed complex MapReduce programs in Java for Data Analysis on different data formats.
  • Created Hive queries for performing data analysis and improving performance using tuning parameters.
  • Performed Data analysis and prepared the physical database based on the requirements.
  • Developed MapReduce Programs for data analysis and data cleaning.
  • Developed Map Reduce programs to perform data analysis.
  • Conduct data analysis and perform feasibility study.
  • Implemented Impala for data analysis.
  • Involved in source system analysis, data analysis, and data modeling to ETL (Extract, Transform and Load).
  • Implemented Hive custom UDF's to transform large volumes of data with respect to business requirement and achieve comprehensive data analysis.
  • Implemented Hive custom UDF's to integrate the Weather and geographical data which produces business data to achieve comprehensive data analysis.
  • Developed scripts for creating Hive tables, loaded with data and written hive queries using Hive QL for data analysis.
  • Used PigLatin and Hive to work on unmodified data and produce reports for client's purposes for large data analysis.
  • Performed Input data analysis, generated space estimation reports for Staging and Target tables in Testing and Production environments.
  • Involved in creating Hive tables and working on them using HiveQL and perform data analysis using Hive and Pig.
  • Involved in supporting data analysis projects using Elastic Map Reduce on the Amazon Web Services (AWS) cloud.
  • Designed and implemented SQL queries for data analysis and data validation and compare data in test and production environment.

21. Log Data

Average Demand
Here's how Log Data is used in Hadoop Developer jobs:
  • Collected the log data that is frequently generated over the sources and stored the data in HDFS.
  • Created MapReduce jobs which were used in processing survey data and log data stored in HDFS.
  • Analyzed the web log data using the HiveQL to extract number of unique visitors per day.
  • Used Apache Kafka and Apache Storm to gather log data and fed into HDFS.
  • Write Map Reduce Java programs to analyze the log data for large-scale data sets.
  • Involved in analyzing log data to predict the errors by using Apache Spark.
  • Connect customer, product, and TLOG data for scientific analysis of pricing.
  • Create Hive scripts for processing log data to develop report metrics.
  • Analyzed the web log data using the Hive query language.
  • Analyzed the web log data using the HiveQL.
  • Used Talend Open Studio to ingest the log data in to Hive, processing and loading in to MySQL database.
  • Analyzed customers before launching products and maintained log data in the Hadoop storage system.
  • Imported the streaming and log data into HDFS using Spark.
  • Performed database cloning for testing purposes and scheduled jobs in Oozie to remove duplicate log data files in HDFS.
  • Used Map Reduce and Sqoop to load, aggregate, store and analyse web log data from different web servers.
  • Source daily weblog data into Hadoop/Hive and transform raw logs into defined schema for Aster Web Analytics project consumption.

22. NoSQL

Average Demand
Here's how Nosql is used in Hadoop Developer jobs:
  • Provided NoSQL solutions in MongoDB and Cassandra for data extraction and storing huge amounts of data.
  • Worked on to ease the jobs by building the applications on top of NoSQL database Cassandra.
  • Involved in building the REST API using Jersey API for fetching the data from NoSQL MongoDB.
  • Processed the source data to structured data and store in NoSQL database Cassandra.
  • Acquired good understanding and experience of NoSQL databases such as MongoDB and Cassandra.
  • Analyzed the advantage of introduction of Cassandra NoSQL database into the present system.
  • Worked on a Cassandra NoSQL based database that persists high-volume user profile data.
  • Worked with NoSQL databases like Cassandra to create control and Metadata tables.
  • Experienced in design, development, tuning and maintenance of NoSQL database.
  • Worked with Cassandra and utilized NoSQL for non-relation data storage and retrieval.
  • Implemented the DAO layer of rewriting applications with MongoDB NoSQL database.
  • Worked on importing and exporting data from various NoSQL databases frequently.
  • Tested the performance of the data sets on various NoSQL databases.
  • Created and compared solutions with NoSQL databases and SQL server solutions.
  • Designed table architecture and developed DAO layer using Cassandra NoSQL database.
  • Well Versed in NoSQL concepts and good at writing queries.
  • Gained knowledge in NoSQL database with Cassandra and MongoDB.
  • Used Cassandra to process high volume of NoSQL data.
  • Experienced with NoSQL database and handled using the queries.
  • Involved in loading and maintenance of NoSQL database.
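
As an illustration of the Cassandra usage described above, the sketch below uses the DataStax Java driver (3.x-style API) to write and read a row; the contact point, keyspace, and table are hypothetical.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class ProfileDao {
    public static void main(String[] args) {
        // Hypothetical contact point and keyspace.
        try (Cluster cluster = Cluster.builder().addContactPoint("cassandra-host").build();
             Session session = cluster.connect("analytics")) {

            session.execute("INSERT INTO user_profiles (user_id, city) VALUES (?, ?)",
                    "user-42", "Austin");

            ResultSet rs = session.execute(
                    "SELECT city FROM user_profiles WHERE user_id = ?", "user-42");
            Row row = rs.one();
            System.out.println("city = " + (row == null ? "n/a" : row.getString("city")));
        }
    }
}
```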

23. XML

Average Demand
Here's how XML is used in Hadoop Developer jobs:
  • Involved in development of Storm topology for ingestion of data through XML payload and then load them to various distributed stores.
  • Used spark to parse XML files and extract values from tags and load it into multiple hive tables using map classes.
  • Implemented SOA architecture with Web Services using SOAP, WSDL, UDDI and XML using Apache CXF framework tool/Apache Commons.
  • Created a java utility to sort and convert log files to XML files based on different parameters.
  • Created database schema for app and imported data from XML file to Oracle database on Amazon RDS.
  • Worked on Map phase programs which extracts appraisal PDF and appraisal images from the original appraisal XML.
  • Moved all crawl data files (XML) generated from various retailers to HDFS for further processing.
  • Developed applications using Java/J2EE, Struts, Spring, Hibernate, JMS, JDBC and XML.
  • Developed PySpark code to read data from Hive, group the fields and generate XML files.
  • Implemented custom input format and record reader to read XML input efficiently using SAX parser.
  • Developed Map Reduce scripts to make web service calls and convert data into XML.
  • Developed the XML Schema and Amazon Web services for the data maintenance and structures.
  • Serialized the streamed XML to String using Text Decoders to further processing the data.
  • Integrated messaging with MQSeries classes for JMS, which provides an XML message-based interface.
  • Performed XML data mining and stored extracted results into database using XSLT and JAXB.
  • Involved in parsing of the input XML files and loading to MySQL database.
  • Applied SOAP for Web Services by exchanging XML data between applications over HTTP.
  • Parsed XML files and loaded the data into hive ORC tables using spark.
  • Used pig and map reduce to analyze XML files and log files.
  • Used SOAP service to exchange XML based messaging between server and user.
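
Several bullets above mention parsing XML feeds with a SAX parser; below is a minimal, generic sketch using the JDK's SAX API, with a hypothetical input file and element name.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class RecordCounter {
    public static void main(String[] args) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        final int[] count = {0};

        // Count <record> elements in a hypothetical file without loading it all into memory.
        parser.parse("input.xml", new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if ("record".equals(qName)) {
                    count[0]++;
                }
            }
        });
        System.out.println("records: " + count[0]);
    }
}
```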

24. Scala

Average Demand
Here's how Scala is used in Hadoop Developer jobs:
  • Wrote user-defined functions wherever special functionality was required, using Spark with Scala.
  • Fine-Tuned Oracle procedures during implementation of Spark-Scala processes.
  • Worked on Apache Spark along with the Scala programming language for transferring the data in a much faster and more efficient way.
  • Analyze data in Pig Latin, Hive and Map Reduce in Java and SCALA (SPARK).
  • Created RDDs in Spark using Python/Scala.
  • Worked on regular expression related text-processing using the in-memory computing capabilities of Spark using Scala.
  • Implemented data classification and categorization using Spark machine learning library using Scala.
  • Developed enhancements to Hive architecture to improve performance and scalability.
  • Worked on migrating MapReduce programs into Spark transformations using Scala.
  • Developed Scala programs to perform data scrubbing for unstructured data.
  • Develop batch processing solutions using Spark with Scala/Python and Cassandra.
  • Involved in developing Scala programs which supports functional programming.
  • Implemented recommendation engine using Scala.
  • Created reader, processor and writer components to load the data from transporter to Enterprise data hub using Spark and Scala.
  • Used TWS for batch processing and scheduling work flows by understanding workload management, availability, scalability and distributed data platforms.
  • Diagnosed and produced solutions to problems such as scalability to thousands of processing units and load balancing on terabytes of data.
  • Create flexible data model design that is scalable, reusable, while emphasizing performance, Data Validation and business needs.
  • Worked with Apache Spark which provides fast and general engine for large data processing integrated with functional programming language Scala.
  • Developed Scala programs, Upgraded older version Pipelines using Spark code and Spark-SQL for faster testing and processing of data.
  • Worked on optimizing ETL jobs based on Spark in Scala to improve performance in order to meet demanding client requirements.

25. Business Requirements

Average Demand
Here's how Business Requirements is used in Hadoop Developer jobs:
  • Played a key role in understanding user requirements and translating business requirements into technical solutions and documenting them for ClearXchange project.
  • Interact with business users to gather business requirements and understand the requirements, analyze and validate design architecture.
  • Involved in gathering business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
  • Coordinated with business customers to gather business requirements and interact with other technical peers to derive technical requirements.
  • Worked in an agile development environment, evaluated business requirements and prepare the business requirements and design documents.
  • Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop the application.
  • Participate in requirement gathering and documenting the business requirements by conducting workshops/meetings with various business users.
  • Designed number of partitions and replication factor for Kafka topics based on business requirements.
  • Participated in gathering requirements, analyzing requirements, and designing technical documents for business requirements.
  • Optimized Map Reduce algorithms and implemented in Java according to the business requirements.
  • Prepare technical design documents based on business requirements and prepare data flow diagrams.
  • Developed and implemented Java code according to MapReduce for the business requirements.
  • Create technical specification documents by analyzing the functional specifications and business requirements.
  • Collected business requirements and wrote functional specifications and detailed design documents.
  • Gathered the business requirements by coordinating and communicating with business team.
  • Work closely with the technology counterparts in communicating the business requirements.
  • Performed visualizations according to business requirements using custom visualization tool.
  • Developed Java Mapper and Reducer programs for complex business requirements.
  • Experienced in gathering business requirements and develop design documents.

26. Reduce Programs

Average Demand
Here's how Reduce Programs is used in Hadoop Developer jobs:
  • Implemented Map Reduce programs to classify data into different categories based on different types of records.
  • Support and maintain a repository of map-reduce programs used for creating metrics for regulatory reporting.
  • Experienced in developing complex MapReduce programs against structured, and unstructured data.
  • Developed MapReduce programs to Validate Incoming Data and report invalid data.
  • Developed several advanced Map Reduce programs as part of functional requirements.
  • Developed several advanced Map Reduce programs to process data files received.
  • Used MongoDB extensively to filter required data for MapReduce programs.
  • Developed efficient MapReduce programs for filtering out the unstructured data.
  • Worked on migrating MapReduce programs into PySpark transformation.
  • Developed Java map-reduce programs to encapsulate transformations.
  • Developed transformations using custom MapReduce programs.
  • Develop map reduce programs using Combiners, Sequence Files, Compression techniques, Chained Jobs, multiple input and output API.
  • Map reduce programs used this data to determine different meter data numbers that could be used for calculations involving power usage.
  • Worked on creating Hive tables, loading with data and writing Hive queries that will run internally like map reduce programs.
  • Map reduce programs were written to do Sales Compensation calculation, grouping items and extending the promos to store level.
  • Implemented Partitioning, Dynamic Partitions, buckets in Hive and wrote map reduce programs to analyze and process the data.
  • Developed MapReduce programs for data cleansing and transformations and load the output to Hive/Impala partition tables in Parquet format.
  • Developed MapReduce programs and Hive queries to analyze shipping pattern and customer satisfaction index over the history of data.
  • Implemented Map Reduce programs to get Top K results by following MapReduce design patterns.

27. Relational Databases

Average Demand
Here's how Relational Databases is used in Hadoop Developer jobs:
  • Maintained Relational Databases for NS.
  • Developed Sqoop scripts for exporting analyzed data from relational databases to hive.
  • Worked on job configuration, chaining Map Reduce jobs, and reading and writing data from relational databases; implemented sorting and joining of data sets.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports by Business Intelligence tools.
  • Used Sqoop jobs, PIG and Hive scripts for data ingestion from relational databases to compare with historical data.
  • Exported the analyzed data to the relational databases using Sqoop to settling up QA testing environment.
  • Used Apache Sqoop to transfer Relational databases into HDFS as Hive tables.
  • Exported data from Hive, HDFS to other relational databases using Sqoop.
  • Exported the analyzed data to the relational databases using Sqoop for visualization.
  • Used sqoop to migrate data from several relational databases into HDFS.
  • Used Sqoop to move the data from relational databases to HDFS.
  • Involved in exporting processed data from Hadoop to relational databases or external systems using Sqoop, HDFS get, or copyToLocal.
  • Loaded data from different relational databases (Oracle and MySQL) into HDFS and vice versa using Sqoop.
  • Imported data from MySQL server and other relational databases to Apache Hadoop with the help of Apache Sqoop.
  • Exported the analyzed data to the relational databases using Sqoop and to generate reports for the BI team.
  • Transfer data to & from different relational databases like MySQL to HDFS, HBase using Sqoop.
  • Used Talend for integrating data from relational databases like Oracle and MySQL.
  • Experience in importing and exporting the data using Sqoop from HDFS to relational database systems/mainframe and vice versa.
  • Exported the analyzed data to the relational databases using Sqoop and scheduled them using Oozie.

Show More

28. Python

average Demand
Here's how Python is used in Hadoop Developer jobs:
  • Developed Spark application using Python on different data formats for processing and analysis.
  • Developed MapReduce jobs in Python for data cleaning and data processing.
  • Developed scripting components using the Python scripting language.
  • Developed MapReduce programs in both Java and Python.
  • Worked on internal automation frameworks using Python.
  • Implemented Spark using python and Spark SQL for faster processing of data and algorithms for real time analysis in Spark.
  • Developed a web crawler code to obtain the raw data of product review and performed data cleansing in Python.
  • Developed simple to complex map reduce streaming jobs using Python language that are implemented using Hive and Pig.
  • Implemented Python script to call the Cassandra Rest API, performed transformations and loaded the data into Hive.
  • Experience with Python and shell scripting, which were used for automating ETL jobs and tasks.
  • Worked on a Python script that analyzes the required data and processes it into the required output.
  • Developed several advanced MapReduce programs in Java and Python as part of the functional requirements for Big Data.
  • Crawled some websites using Python and collected information about users, questions asked and the answers posted.
  • Developed Spark jobs in Python that are used by data scientists to design predictive models of devices.
  • Used Python (pandas) and R (ggplot2) for analysis and visualization.
  • Worked with standard Python modules, including csv, and have good knowledge of pickle for development.
  • Experience in improving the search focus and quality in ElasticSearch by using aggregations and Python scripts.
  • Used open source web scraping framework for python to crawl and extract data from web pages.
  • Implemented the Python Schema check utility to check the schema integrity for the input landing files.
  • Used NumPy array indexing and the pandas Series and DataFrame structures in Python.
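
As a rough illustration of the Spark-with-Python work listed above, the sketch below cleans a tab-delimited event feed with PySpark; the column names, event types, and paths are invented for the example.

    # Illustrative PySpark cleaning job; schema and paths are assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("clean_events").getOrCreate()

    raw = spark.read.csv("/data/raw/events", sep="\t", header=True)

    clean = (raw
             .dropna(subset=["user_id", "event_ts"])             # drop incomplete rows
             .withColumn("event_ts", F.to_timestamp("event_ts")) # normalize timestamps
             .filter(F.col("event_type").isin("click", "view"))  # keep known events
             .dropDuplicates(["user_id", "event_ts"]))

    clean.write.mode("overwrite").parquet("/data/clean/events")
    spark.stop()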

Show More

29. BI

average Demand
Here's how BI is used in Hadoop Developer jobs:
  • Involved in the Big-data requirements review meetings and partnered with business analysts to clarify any specific scenarios.
  • Evaluate big data technologies and prototype solutions to improve our data processing architecture.
  • Advanced through multiple roles and contributed to high-visibility projects during highly successful tenure.
  • Collaborated with BI teams to ensure data quality and availability.
  • Developed user defined functions to provide custom pig capabilities.
  • Used Tableau for visualizing the customer eligibility data.
  • Populate big data customer Marketing data structures.
  • Worked on Ingestion of deltas for the company's biggest table INVOICE_LINE (700 GB) with partitioning on daily basis.
  • Worked in the BI team in the area of Big Data cluster implementation and data integration in developing large-scale system software.
  • Analyze and Ingest Policy, Claims, Billing and Agency Data in Client's Solution which is done through multiple stages.
  • Build a mechanism for automatically moving the existing proprietary binary format data files to HDFS using a service called Ingestion service.
  • Worked with big data developers, designers and scientists in troubleshooting map reduce job failures and issues with Hive, Pig.
  • Key areas were leveraging this capability to drive financial benefit to the company in Information Management, Marketing and Risk Capabilities.
  • Tested the possible use of a graph database (Neo4j) by combining different sources and finding the relevant path for a node.
  • Created Design and architecture documents for the billing system and coordinated the effort across cross functional teams across this huge company.
  • Performed queries on tables stored in Pig to find patterns in the temperatures of each turbine on a daily scale.
  • Prospect Marketing & Digital-Big Data Capabilities: Merchant Financing from American Express is a unique approach to small business lending.
  • Performed Configuring, Managing of Azure Storage with PowerShell, Azure Portal, Azure virtual Machines for High Availability Solutions.
  • Used Visualization tools such as Power view for excel, Tableau for visualizing and generating reports to the BI team.
  • Demonstrated an understanding of the concepts, best practices, and functions needed to implement a Big Data solution in a corporate environment.

Show More

30. Rdbms

low Demand
Here's how Rdbms is used in Hadoop Developer jobs:
  • Exported analyzed data to downstream systems using Sqoop-RDBMS for generating end-user reports, Business Analysis reports and payment reports.
  • Involved in migrating tables from RDBMS into Hive tables using SQOOP and later generate visualizations using Tableau.
  • Normalized the data coming from various sources like RDBMS, Flat Files and various log files.
  • Pushed the data to RDBMS systems at a mount location for Tableau to import for reporting.
  • Moved all RDBMS data into flat files generated from various channels to HDFS for further processing.
  • Used MapReduce to process large sets of unstructured data to make them compatible with RDBMSs.
  • Developed multiple MapReduce jobs and worked on importing and exporting data into HDFS from RDBMS.
  • Ingested data from RDBMS and performed data transformations, and then export to Cassandra.
  • Experience in writing SQOOP Scripts for importing and exporting data from RDBMS to HDFS.
  • Exported data from HDFS to RDBMS for visualization and user report generation using Tableau.
  • Experience in streaming the data between Kafka and other databases like RDBMS and NoSQL.
  • Aggregated results are then exported over to downstream RDBMS for Business Intelligence reporting.
  • Created SQOOP jobs to handle incremental loads from RDBMS into HDFS.
  • Loaded the aggregated data onto RDBMS for reporting on the dashboard.
  • Developed Hive tables to store data from RDBMS database.
  • Automated importing data from RDBMS to Hive using Spark.
  • Imported data from RDBMS into HDFS/Hive tables.
  • Migrated data from HDFS to RDBMS using Spark.
  • Created Hive tables from RDBMS data.
  • Developed ER Diagrams to design RDBMS.
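
For the reverse direction mentioned in several bullets (pushing aggregated results from HDFS back to an RDBMS), a sqoop export call can be scripted the same way; the connection string, target table, and export directory below are placeholders.

    # Sketch of HDFS -> RDBMS with sqoop export; all names are illustrative.
    import subprocess

    subprocess.run([
        "sqoop", "export",
        "--connect", "jdbc:oracle:thin:@dbhost:1521/REPORTS",
        "--username", "report_user",
        "--password-file", "/user/etl/.sqoop.pwd",
        "--table", "DAILY_SALES_AGG",             # target RDBMS table
        "--export-dir", "/data/agg/daily_sales",  # aggregated results in HDFS
        "--input-fields-terminated-by", r"\t",    # tab-delimited export files
    ], check=True)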

Show More

31. API

low Demand
Here's how API is used in Hadoop Developer jobs:
  • Developed Hive-JDBC implementation using Thrift API.
  • Project is uniquely focused to help clients manage and move their financial assets and succeed in the rapidly changing global marketplace.
  • Design and develop JAVA API (Commerce API) which provides functionality to connect to the Cassandra through Java services.
  • Used ASN encoding to send the data across the network and used MIIG API to talk to mainframe server.
  • Involved in loading the structured and semi structured data into spark clusters using Spark SQL and Data Frames API.
  • Used Spark Data Frame API to process Structured and Semi Structured files and load them back into S3 Bucket.
  • Implemented a Data service as a rest API project to retrieve server utilization data from this Cassandra Table.
  • Developed on-the-fly decryption support for Hive, Pig and custom Map Reduce use cases using Java API.
  • Involved in team meetings with the customers and weekly status meetings for the Presence API development project.
  • Used Thrift API/Cassandra Query Language (CQL) to access Cassandra data and implement CRUD operations.
  • Experienced in Developing spark application using spark Core, spark SQL and spark Streaming API's.
  • Integrated Apache Kafka with Apache Spark, for streaming data using Twitter Data with Twitter API.
  • Stored the large data sets that are rapidly changing in HBASE, optimizing the updates.
  • Worked on MongoDB to load and retrieve data for real time processing using Rest API.
  • Migrated the existing web application from using DB2 data to calling the REST API services.
  • Used Cassandra (CQL) with Java API's to retrieve data from Cassandra tables.
  • Gained experience on Selenium, Rally, SOAP UI, REST/SOAP testing and API testing.
  • Assisted in developer engagement for Phone API applications and cloud services through social networking portal.
  • Developed the Java client API for node provisioning, load balancing and artifact deployment.
  • Participated in Rapid Application Development and Agile processes to deliver new cloud platform services.
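
One concrete way Hadoop itself exposes an API is WebHDFS; the hedged sketch below writes a file over REST using the two-step CREATE handshake. The NameNode host, port (9870 on Hadoop 3, 50070 on Hadoop 2), user, and path are assumptions for the example.

    # Writing to HDFS through the WebHDFS REST API with the requests library.
    import requests

    NAMENODE = "http://namenode:9870"        # assumed NameNode HTTP endpoint
    path = "/data/api_demo/sample.txt"

    # Step 1: ask the NameNode where to write; it answers with a redirect
    # to a DataNode (the WebHDFS two-step CREATE).
    r = requests.put(
        f"{NAMENODE}/webhdfs/v1{path}",
        params={"op": "CREATE", "user.name": "etl_user", "overwrite": "true"},
        allow_redirects=False,
    )
    datanode_url = r.headers["Location"]

    # Step 2: send the actual bytes to the DataNode it pointed us at.
    requests.put(datanode_url, data=b"hello from the REST API\n")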

Show More

32. Teradata

low Demand
Here's how Teradata is used in Hadoop Developer jobs:
  • Created extensive SQL queries for data extraction to test the data against the various databases like ORACLE, TERADATA and DB2.
  • Extracted data is processed and dispatched to Teradata for reporting purposes.
  • Designed and developed logic for aggregates and complex procedures in Teradata.
  • Propose and implement technical solutions to improve data warehouse performance in Teradata.
  • Developed Hive Scripts equivalent to Teradata and performance tuning using Hive.
  • Worked with Oracle and Teradata for data import/export operations.
  • Prepare ETL specification, creating mappings, Teradata scripts for extraction, transformation and loading of data to data warehouse.
  • Replaced full table statistics with Sample statistics and reduced the usage of existing Teradata resources on need to change basis.
  • Created several Teradata SQL queries and created several reports using the above data mart for UAT and user reports.
  • Involved in migration of Teradata queries such as updates, inserts and deletes migration into the hive queries.
  • Have migrated various files from Teradata and DB2 to HDFS using shell scripts and other FTP Commands.
  • Worked with Teradata Appliance team, HortonWorks PM and Engineering Team, Aster PM and Engineering team.
  • Set session parameters to run sessions in ANSI or Teradata (BTET) mode.
  • Exported the data present in HDFS to Teradata for further usage by the data science team.
  • Developed a process for importing data with Sqoop from multiple sources like SQL Server, Oracle, and Teradata.
  • Processed data with Hive and Teradata, and developed web applications using Java and Oracle SQL.
  • Performed unit testing on both Teradata and Hive tables to verify that they have the same record count.
  • Involved heavily in writing complex SQL queries based on the given requirements on Teradata platform.
  • Worked on several BTEQ scripts to transform the data and load into the Teradata database.
  • Experienced in working with various kinds of data sources such as Teradata and DB2.

Show More

33. Latin Scripts

low Demand
Here's how Latin Scripts is used in Hadoop Developer jobs:
  • Developed Pig Latin scripts for data cleansing and analysis of semi-structured data.
  • Design and Develop Pig Latin scripts and Pig command line transformations for data joins and custom processing of Map reduce outputs.
  • Developed Pig Latin scripts to extract the data from the FTP Servers as files and applied data transformation logic.
  • Developed Pig Latin scripts to extract the data and load into HDFS for further analysis using Hive.
  • Experience writing Pig Latin scripts for Data Cleansing, ETL operations and query optimization of existing scripts.
  • Developed Pig Latin scripts and HQL queries for the analysis of Structured, Semi-Structured and Unstructured data.
  • Developed Pig Latin scripts using DDL and DML to extract data from files and load into HDFS.
  • Configured Pig and also designed Pig Latin scripts to process the data into a universal data model.
  • Created Hive and Pig Latin scripts to manage, transform and load the data to HDFS.
  • Removed bugs from code by performing testing and debugging of MapReduce jobs and Pig Latin scripts.
  • Created Pig Latin scripts to support multiple data flows involving various data transformations on input data.
  • Participated in performance and optimization meetings for MapReduce programs, Hive scripts and Pig Latin scripts.
  • Involved in writing optimized Pig Script along with involved in developing and testing Pig Latin Scripts.
  • Developed Pig Latin scripts to do operations of sorting, joining and filtering enterprise data.
  • Developed PigLatin scripts and HiveQL queries for trend analysis and pattern recognition on user data.
  • Installed and configured Pig and also wrote Pig Latin scripts for Map Reduce jobs.
  • Performed joins, group by and other operations in MapReduce using PIG Latin scripts.
  • Created Pig Latin scripts in the areas where extensive MapReduce code can be reduced.
  • Developed Pig Latin scripts for the analysis of input files that were semi-structured.
  • Developed Pig Latin scripts to load data from output files and put to HDFS.
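
A small Pig Latin script of the kind described above, embedded in a Python driver that submits it with the pig CLI; the web-log schema, filter condition, and output path are illustrative only.

    # Writes a Pig Latin script to a temp file and submits it with "pig -f".
    import subprocess, tempfile

    PIG_SCRIPT = r"""
    logs  = LOAD '/data/raw/weblogs' USING PigStorage('\t')
            AS (ip:chararray, ts:chararray, url:chararray, status:int);
    ok    = FILTER logs BY status == 200;          -- simple data cleansing
    byurl = GROUP ok BY url;
    hits  = FOREACH byurl GENERATE group AS url, COUNT(ok) AS hits;
    STORE hits INTO '/data/out/url_hits' USING PigStorage('\t');
    """

    with tempfile.NamedTemporaryFile("w", suffix=".pig", delete=False) as f:
        f.write(PIG_SCRIPT)
        script_path = f.name

    subprocess.run(["pig", "-f", script_path], check=True)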

Show More

34. Generate Reports

low Demand
Here's how Generate Reports is used in Hadoop Developer jobs:
  • Demonstrated Proof of Concept to business units/stake holders to identify business strategies and generate reports using Tableau and QlikView
  • Used Tableau for visualization and generate reports for financial data consolidation, reconciliation and segmentation.
  • Worked on the DWCDR web module, users can generate reports by querying data stored in the HDFS HIVE tables.
  • Generate reports based on hive queries and by querying big data sources using REST based web services.
  • Exported the analyzed data to hive for visualization and to generate reports using Tableau.
  • Developed an application using Spark to process data from Cassandra clusters and generate reports.
  • Used Tableau for visualizing and to generate reports.
  • Generate reports and predictions using Tableau.
  • Involved in extracting data from different servers and dump the data into Hadoop cluster to generate reports for analysis.
  • Used Tableau, which pulls the data to generate reports, graphs, and charts summarizing the given set of data.
  • Extracted the data from Oracle into HDFS using Sqoop to store and generate reports for visualization purpose.
  • Created many complex SQL queries and used them in Oracle Reports to generate reports.
  • Managed and reviewed Hadoop logs and generate reports on the data.
  • Transformed massive amounts of raw data into actionable analytics; developed scripts to automate the process and generate reports.

Show More

35. Json

low Demand
Here's how Json is used in Hadoop Developer jobs:
  • Developed Custom Loaders and Storage Classes in PIG to work with various data formats like JSON, XML, CSV etc.
  • Experience in working on various data compression techniques like snappy, ORC, sequence and formats like XML, JSON etc.
  • Experience with creating ETL jobs to load JSON Data and server Data into MongoDB and transformed MongoDB into the Data warehouse.
  • Designed and developed Map Reduce jobs to process data coming in different file formats like XML, CSV, JSON.
  • Developed Java programs to process huge JSON files received from marketing team to convert into format standardized for the application.
  • Worked with various formats of files like delimited text files, Apache log files, JSON files, XML Files.
  • Experience in Data Serialization formats for converting Complex objects into sequence bits by using AVRO, JSON, CSV formats.
  • Collected JSON that is generated by the Data Collector from Oracle and mapped to objects for processing and persistence.
  • Integrated Spark Streaming with Sprinkler using pull mechanism and loaded the JSON data from social media into HDFS system.
  • Worked on custom Pig loaders to work with a variety of data formats such as JSON, CSV etc.
  • Partitioned, bucketed, and performed joins on Hive tables, utilizing Hive SerDes like Regex, JSON, and Avro.
  • Implemented MapReduce programs to handle semi/unstructured data like XML, JSON, and sequence files for log files.
  • Involved in developing code to write canonical model JSON records from numerous input sources to Kafka Queues.
  • Worked on various data formats including tab delimited, comma-separated, XML, JSON and semi-structured data.
  • Developed scripts for parsing data from CSV, JSON and XML files into Hive and Pig environments.
  • Performed validation and standardization of raw data from XML and JSON files with Pig and MapReduce.
  • Created a phase for the demonstration of retrieving the JSON data by calling the REST service.
  • Created external tables in Hive, Loaded JSON format log files and ran queries using HiveQL.
  • Worked with KAFKA POC, implemented the twitter producer to test the live raw JSON data.
  • Developed shell scripts to automate the process of converting data from JSON to BSON.
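
Much of the JSON handling above comes down to flattening semi-structured records into a layout Hive can read; the sketch below is a minimal Python filter for newline-delimited JSON, with the field names assumed for the example.

    # Turns newline-delimited JSON records into tab-delimited rows;
    # malformed lines are skipped, field names are assumptions.
    import json, sys

    def json_lines_to_tsv(in_stream, out_stream):
        for line in in_stream:
            line = line.strip()
            if not line:
                continue
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue                       # drop malformed records
            row = [str(rec.get(k, "")) for k in ("user_id", "event", "ts")]
            out_stream.write("\t".join(row) + "\n")

    if __name__ == "__main__":
        json_lines_to_tsv(sys.stdin, sys.stdout)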

Show More

36. Avro

low Demand
Here's how Avro is used in Hadoop Developer jobs:
  • Developed a PySpark code for saving data in to AVRO and Parquet format and building hive tables on top of them.
  • Implemented RC, ORC, Sequence, and Avro file formats in Hive to achieve better performance.
  • Developed scripts to convert raw data to AVRO and load the data into Hive tables.
  • Worked with Hive AVRO data format to compress data and speed up processing.
  • End to End implementation with AVRO and Snappy.
  • Gained knowledge on AVRO serialization technique.
  • Worked on platforms like Kafka clusters with ORC, RC, AVRO and PARQUET files.
  • Used Avro serialization technique to serialize data for handling schema evolution.
  • Worked on creation of Hive tables on top of Avro files for execution of HQL queries as per client's requirement.
  • Optimized MAP/Reduce jobs to use HDFS efficiently by using various compression mechanisms like Avro, text, and parquet.
  • Converted data in files to AVRO and loaded it into Hive tables Created Producers and Consumers for Kafka.
  • Loaded various formats of data like Avro, parquet into these tables and analyzed data using HQL.
  • Developed Avro model designs on required fields from the database to represent the data to Dashboards.
  • Design and developed Custom Avro Storage to use in Pig Latin to Load and Store data.
  • Worked with different file formats like Avro and SequenceFile for Hive querying and processing.
  • Used Avro file format to store query results in order to manage large quantity of data.
  • Worked on Avro, parquet file format to ingest the data from source to target.
  • Worked on Avro file format and Snappy Compression techniques to leverage the data in HDFS.
  • Exported the data from Avro files and indexed the documents in sequence file format.
  • Used Pig to import semi-structured data from Avro files to make serialization faster.
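
For the Avro-plus-Snappy pattern mentioned above, here is a hedged Python sketch using the fastavro library (python-snappy is needed for the snappy codec); the schema and records are made up for illustration.

    # Writing a Snappy-compressed Avro file from Python with fastavro.
    from fastavro import parse_schema, writer

    schema = parse_schema({
        "name": "Meter",
        "type": "record",
        "fields": [
            {"name": "meter_id", "type": "string"},
            {"name": "reading",  "type": "double"},
            {"name": "read_ts",  "type": "long"},
        ],
    })

    records = [
        {"meter_id": "M-1001", "reading": 42.7, "read_ts": 1577836800},
        {"meter_id": "M-1002", "reading": 13.9, "read_ts": 1577836860},
    ]

    with open("meters.avro", "wb") as out:
        writer(out, schema, records, codec="snappy")   # snappy needs python-snappy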

Show More

37. Workflow Engine

low Demand
Here's how Workflow Engine is used in Hadoop Developer jobs:
  • Scheduled the jobs with Oozie workflow engine.
  • Scheduled jobs using Oozie workflow Engine.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive jobs; developed HiveQL scripts to perform the incremental loads.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs; experienced with performing CRUD operations in HBase.
  • Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs.
  • Used the Oozie workflow engine to run multiple Hive and Pig jobs; performed unit testing using MRUnit (a minimal workflow sketch follows below).
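
A minimal Oozie workflow with a single Hive action, generated and submitted from Python as a sketch; the Oozie server URL, schema versions, property names, and the HQL script are placeholders rather than a specific project's configuration.

    # Writes a one-action Oozie workflow and kicks it off via the Oozie CLI.
    import subprocess

    WORKFLOW_XML = """<workflow-app name="daily-hive-load" xmlns="uri:oozie:workflow:0.5">
        <start to="hive-load"/>
        <action name="hive-load">
            <hive xmlns="uri:oozie:hive-action:0.5">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <script>incremental_load.hql</script>
            </hive>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Hive load failed</message>
        </kill>
        <end name="end"/>
    </workflow-app>
    """

    with open("workflow.xml", "w") as f:
        f.write(WORKFLOW_XML)

    # After copying workflow.xml to HDFS and pointing oozie.wf.application.path
    # at it in job.properties, the job is started with the Oozie CLI:
    subprocess.run(["oozie", "job", "-oozie", "http://oozie-host:11000/oozie",
                    "-config", "job.properties", "-run"], check=True)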

Show More

38. Test Cases

low Demand
Here's how Test Cases is used in Hadoop Developer jobs:
  • Write test cases to test software throughout development cycles, inclusive of functional/unit-testing/continuous integration.
  • Followed Agile Methodology (SCRUM) for easy verification from clients and wrote Test Cases for Unit Testing the integration layer.
  • Worked on preparing the test cases, documenting and performing unit testing and System Integration testing and fixing defects.
  • Created visualizations in Tableau using Excel data extract source, wrote test cases & documented the test case results.
  • Involved in testing activities within QA environment which include System testing, Integration testing and writing test cases.
  • Created unit test plans, test cases and reports on various test cases for testing the data loads.
  • Involved in building and maintaining test plans, test cases, defects and test scripts using SVN.
  • Involved in designing unit test cases for mapper, reducer and driver classes using MR testing library.
  • Prepare Test scenarios, Test cases/scripts and get them signed off by the business team.
  • Carried out DIT testing exhaustively including negative test cases and transferred scripts for SIT testing.
  • Develop and execute maintainable automation tests for acceptance, functional, and regression test cases.
  • Worked as the Test Engineer for preparing test results document for the test cases.
  • Developed Unit Test Cases, and used JUNIT for Unit Testing of the application.
  • Used Test Driven Development in writing the test cases for developing the Java modules.
  • Installed and configured QA framework for 100+ test cases automation in Linux.
  • Developed several test cases using MR Unit for testing Map Reduce Applications.
  • Created test data and test functions for the various test cases.
  • Involved in writing unit test cases for camel routes and processors.
  • Prepare Developer (Unit) Test cases and execute Developer Testing.
  • Developed Use Case diagrams and flow diagrams for the test cases.
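
In the same spirit as the MRUnit-style tests above, the sketch below unit-tests a hypothetical record-parsing mapper function with Python's unittest; the record format is assumed for the example.

    # Unit tests for a small mapper helper, including a negative test case.
    import unittest

    def map_record(line):
        """Parse a tab-delimited record and emit (key, 1), or None if malformed."""
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2 or not parts[0]:
            return None
        return (parts[0], 1)

    class MapRecordTest(unittest.TestCase):
        def test_valid_record(self):
            self.assertEqual(map_record("user42\tclick\n"), ("user42", 1))

        def test_malformed_record_is_dropped(self):
            self.assertIsNone(map_record("\t\n"))      # negative test case

    if __name__ == "__main__":
        unittest.main()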

Show More

39. AWS

low Demand
Here's how AWS is used in Hadoop Developer jobs:
  • Used AWS to produce comprehensive architecture strategy for environment mapping.
  • Designed and implemented data processing using AWS Data Pipeline.
  • Migrated existing on-premises application to AWS.
  • Design and implement solutions with Amazon Web Services (AWS) cloud technologies such as S3, EC2 and Dynamo DB.
  • Experience and Work in a language agnostic environment with exposure to multiple web platforms such as AWS and databases like Cassandra.
  • Worked on setting up cloud environment and supported Amazon AWS services like EC2, S3, VPC, RDS etc.
  • Used AWS Data Pipeline to schedule an Amazon EMR cluster to clean and process data stored in Amazon S3 bucket.
  • Worked on importing metadata into Hive and migrating existing tables and applications to work on Hive and AWS cloud.
  • Worked on installing and configuring EC2 instances on Amazon Web Services (AWS) for establishing clusters on cloud.
  • Worked on Amazon AWS concepts like EMR and EC2 web services for fast and efficient processing of Big Data.
  • Worked on the Data Pipeline which is an orchestration tool for all our jobs that run on AWS.
  • Created custom input adapters for pulling the raw click stream data from FTP servers and AWS S3 buckets.
  • Worked on AWS cloud to create EC2 instance and installed Java, Zookeeper and Kafka on those instances.
  • Developed setting up Heterogeneous Connectivity between a PostgreSQL and Oracle Database, both running on AWS Cloud.
  • Designed, coded, and configured server-side J2EE components such as JSPs in Java on AWS.
  • Configured and managed through the Amazon Web Services (AWS) Management Console using Amazon Cloud Search.
  • Experienced in working with Elastic Map Reduce (EMR) on Amazon Web Services (AWS).
  • Performed Sqoop incremental import jobs and used shell scripts and cron jobs for importing data into AWS S3.
  • Worked with the Amazon Web Services (AWS) set of infrastructure and application services.
  • Migrated corporate Linux servers from physical servers to Amazon Web Services (AWS) virtual servers.
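
For the S3 side of the AWS work described above, a boto3 sketch might look like the following; the bucket, keys, and local file name are placeholders, and AWS credentials are assumed to be configured in the environment.

    # Land a local extract in S3 and sanity-check the prefix with boto3.
    import boto3

    s3 = boto3.client("s3")

    # Upload a daily extract where an EMR/Hive job can pick it up later.
    s3.upload_file("daily_extract.tsv", "example-data-lake", "raw/daily_extract.tsv")

    # List what is sitting under the raw/ prefix as a quick check.
    resp = s3.list_objects_v2(Bucket="example-data-lake", Prefix="raw/")
    for obj in resp.get("Contents", []):
        print(obj["Key"], obj["Size"])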

Show More

40. Large Amounts

low Demand
Here's how Large Amounts is used in Hadoop Developer jobs:
  • Analyzed large amounts of data sets from hospitals and providers to determine optimal way to aggregate and generate summary reports.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Used to Storm analyze large amounts of non-unique data points with low latency and high throughput.
  • Worked on implementing custom Hive and Pig UDF's to transform large amounts of data.
  • Experience in using Apache Flume for collecting, aggregating and moving large amounts of data.
  • Analyzed large amounts of raw data in an effort to create information.
  • Analyzed large amounts of data sets by writing Pig scripts.
  • Configured Flume for efficiently collecting, aggregating and moving large amounts of log Data from Many different sources to the HDFS.
  • Collected and aggregated large amounts of streaming log data into the Hadoop cluster using Flume.
  • Worked on partitioning and bucketing of large amounts of data to optimize Hive query performance.
  • Analyzed large amounts of datasets to determine optimal way to aggregate and report.
  • Experience loading and transforming large amounts of structured and unstructured data into the HBase database, with exposure to handling automatic failover in HBase.

Show More

41. Impala

low Demand
Here's how Impala is used in Hadoop Developer jobs:
  • Use Impala to determine statistical information about Operational Data.
  • Published and analyzed data using Impala.
  • Transformed the Impala queries into hive scripts which can be run using the shell commands directly for higher performance rate.
  • Worked on real-time, in-memory tools such as Spark, Impala and integration with BI Tools such as Tableau.
  • Developed various JAVA UDF functions to use in both Hive and Impala for ease of usage in various requirements.
  • Configured the Hive Metadata and CatalogD to make it possible for Impala daemon to pull data using Hive metadata.
  • Used Impala to pull Hive table data for faster query processing and pushed the results to Cassandra.
  • Developed Spark core and Spark SQL jobs for critical SLA applications and load into Hive/Impala tables.
  • Worked on customizing Map Reduce code in Amazon EMR using Hive, Pig, Impala frameworks.
  • Used Impala connectivity from the user interface (UI) and queried the results using Impala QL.
  • Used Impala for aggregating jobs and runs an average of 100K aggregation queries per day.
  • Experienced in converting the AVRO data into PARQUET format in IMPALA for faster query processing.
  • Performed analysis using MPP database like Impala to analyze instant insights from the data.
  • Experience working on Parquet files in Impala and implementing multiple batch queries in Impala.
  • Exported data from Impala to Tableau reporting tool, created dashboards on live connection.
  • Coordinated with the Shared Platform team to deploy the code and Impala views.
  • Performed Joins, Grouping, and Count Operations on the Tables using Impala.
  • Involved in migrating Hive QL into Impala to minimize query response time.
  • Connected hive and impala to tableau reporting tool and generated graphical reports.
  • Worked on Impala for obtaining fast results without any transformation of data.

Show More

42. Manage Data

low Demand
Here's how Manage Data is used in Hadoop Developer jobs:
  • Documented the systems processes and procedures for future references, responsible to manage data coming from different sources.
  • Developed shell script to manage data in UNIX file system and moving to HDFS as Landing.
  • Managed data coming from different sources.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data; responsible for managing data coming from different sources.
  • Used UDFs to implement business logic in Hadoop and was responsible for managing data coming from different sources.
  • Developed UNIX shell scripts to automate repetitive database processes; responsible for managing data coming from different sources.
  • Involved in reviewing functional and non-functional requirements; responsible for managing data coming from different sources.
  • Manage databases from 40TB to 100 TB using Hadoop Cluster.
  • Dumped the data from HDFS to a MySQL database and vice versa using Sqoop; responsible for managing data coming from different sources.

Show More

43. Data Warehouse

low Demand
Here's how Data Warehouse is used in Hadoop Developer jobs:
  • Worked closely with data warehouse architect and business intelligence analyst to develop solutions.
  • Implemented procedures for maintenance of quality and security of company data warehouse systems.
  • Prepared and implemented data verification and testing methods for the data warehouse.
  • Assisted in different data Modeling and Data Warehouse design and development.
  • Created reports by extracting transformed data from Composite data warehouse.
  • Involved in Data pipeline validations for different internal data warehouse.
  • Used DataStage Designer for developing various jobs for Extracting, Cleansing, Transforming, Integrating and Loading data into Data Warehouse.
  • Performed analysis on the extracted data which is transformed and loaded into the Data Warehouse, to achieve quality and consistency.
  • Involved in the development of the Hive/Impala scripts for extraction, transformation and loading of data into other data warehouses.
  • Created views on Oracle to source the data from DART data warehouse and expose them to Axiom SL reporting tool.
  • Developed and maintained large-scale distributed data platforms, with experience in data warehouses, data marts, and data lakes.
  • Involved in creating Dashboards using Tableau Desktop, Tableau Server by connecting to HIVE, Impala data warehouse systems.
  • Created RDD's in Spark technology and extracted data from data warehouse on to the Spark RDD's.
  • Created Mappings to move data from Oracle, SQL Server to new Data Warehouse in Green Plum.
  • Migrated EDW (Enterprise Data Warehouse) into Data Lake and implemented Star Schema in Big Data.
  • Worked with Hive data warehouse to analyze the historic data in HDFS to identify behavioral patterns.
  • Worked on Hive (Distributed Data Warehouse) in managing the data stored in HDFS.
  • Develop ETL routines to source data from client source systems and target the data warehouse.
  • Experience in writing Hive/HQL scripts to extract and load data in to Hive Data warehouse.
  • Integrated Data warehouse (IDW) as single source of truth in the BI Landscape.

Show More

44. Maven

low Demand
Here's how Maven is used in Hadoop Developer jobs:
  • Used Apache Maven extensively while developing MapReduce program.
  • Used different plugins of Maven to clean, compile, build, install, deploy and more for jars and wars.
  • Project Description: Involved in multiple projects which includes optimizing hive queries in spark developed in Eclipse - Maven built.
  • Used WebSphere as an application server and used Apache Maven to deploy and build the application in WebSphere.
  • Used automated build systems like SBT and MAVEN and implement new features or scripts for the build system.
  • Used Maven extensively for building jar files of Map-Reduce jobs as per the requirement and deployed to cluster.
  • Experience in using Maven extensively for building jar files of Map Reduce programs to deploy into the cluster.
  • Used Vagrant as an internal VM which provides a running environment for testing and maven to build.
  • Involved in build applications using Maven and integrated with CI servers like Jenkins to build jobs.
  • Used JUnit for unit testing and used Eclipse and Maven to build the application.
  • Experienced in building Java applications using Maven build and Jenkins, version management software SVN.
  • Compiled and built the application using MAVEN and used SVN as version control system.
  • Experienced in configuring Maven builds that integrated dependency checks, Checkstyle, and test coverage.
  • Used Apache Maven to build and configure the application for the MapReduce jobs.
  • Used Maven and Jenkins for compiling, building and packing the applications.
  • Involved in generating JAXB classes from XSD file using maven-JAXB plugins.
  • Worked on Maven 3.3.9 for building and managing Java based projects.
  • Used Maven as build tool and Git for version control.
  • Used Maven, Eclipse and Ant to build the application.
  • Implemented Maven builds using plugins to do automatic versioning.

Show More

45. POC

low Demand
Here's how POC is used in Hadoop Developer jobs:
  • Performed feasibility analysis and POC for big data application in fraud detection.
  • Developed mini games like rock-paper-scissor-lizard-Spock, Pong.
  • Created working POC's using Spark 1.1.0 streaming for real time stream processing of continuous stream of large data sets.
  • Implemented POC in persisting click stream data with Apache Kafka and Spark Streaming API into HBASE and Hive.
  • Developed test cases and POCs to benchmark and verify data flow through the Kafka clusters.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying, on a POC level.
  • Performed POC to implement Apache Spark to discuss the uses of implementing it in project.
  • Implemented POC's using Amazon Cloud Components S3, EC2, Elastic beanstalk and SimpleDB.
  • Worked on the POC/development for watch tower application to bring data to HDFS and Hive.
  • Worked on various POC's like Apache Spark and integration of Kafka with spark streaming.
  • Proof of Concept (PoC)for enabling member and suspect search using Elastic search.
  • Worked on a POC to perform sentiment analysis of twitter data using Open NLP API.
  • Gathered business requirements in meetings for successful implementation and POC and moving it to Production.
  • Participated in multiple big data POC to evaluate different architectures, tools and vendor products.
  • Involved in various POC's to choose right big data tools for business use cases.
  • Involved in supporting Design and analysis by providing POC's by using Cassandra DB.
  • Created PoC to store Server Log data in MongoDB to identify System Alert Metrics.
  • Designed a data model in Cassandra(POC) for storing server performance data.
  • Implemented POC for using APACHE IMPALA for data processing on top of HIVE.
  • Developed a POC (proof of concept) on Cassandra as NoSQL database.

Show More

46. File Formats

low Demand
Here's how File Formats is used in Hadoop Developer jobs:
  • Worked on various file formats such as Sequence files, Avro, and HAR files to be processed by MapReduce programs.
  • Developed Custom Input Formats in MapReduce jobs to handle custom file formats and to convert them into key-value pairs.
  • Gained hands on experience dealing with ORC, Sequence file formats and loading compressed data into hive tables.
  • Worked with various Hive file formats like TEXT, ORC and AVRO for data sources having dynamic schema.
  • Worked with different file formats like Sequence files, XML files and Flat files using Map Reduce Programs.
  • Worked with different Hive file formats like RC file, Sequence file, ORC file format and Parquet.
  • Wrote MapReduce programs to work with many small files by storing the data in SequenceFile format.
  • Have hands on experience working on Sequence files, AVRO, HAR file formats and compression.
  • Used Spark transformations for Data Wrangling and ingesting the real-time data of various file formats.
  • Used Compression Techniques (snappy) with file formats to leverage the storage in HDFS.
  • Used different columnar file formats (RC File, Parquet and ORC formats).
  • Worked on different file formats like Text files, Sequence Files, ORC files.
  • Worked on loading tables to Impala for faster retrieval using different file formats.
  • Developed Custom Input Formats in MapReduce jobs to handle custom file formats.
  • Experienced with multiple file formats in Hive, including Avro and SequenceFile.
  • Implemented MapReduce custom file formats, custom Writables, and custom Partitioners.
  • Worked with different file formats and compression techniques to determine standards.
  • Worked on impala performance tuning with different workloads and file formats.
  • Experience in using Sequence files, AVRO and HAR file formats.
  • Stored, Accessed and Processed data from different file formats i.e.

Show More

47. Data Solutions

low Demand
Here's how Data Solutions is used in Hadoop Developer jobs:
  • Designed and developed big data solutions involving Terabytes of data.
  • Project Description: The objective is to design and develop big data solutions involving Terabytes of data.
  • Experience with Jenkins for deploying Big data solutions (CI/CD).
  • Project: Engineering Data Mart * Designed and developed data solutions to help product and business teams make data driven decisions.
  • Experience in installation, configuration, management and deployment of Big Data solutions and the underlying infrastructure of Hadoop Cluster.
  • Configured MySQL Database to store Hive metadata and Involved in building scalable distributed data solutions using Hadoop.
  • Worked on CPT as a part-time employee; primary responsibilities included building scalable distributed data solutions using the Hadoop ecosystem.

Show More

48. Setup

low Demand
Here's how Setup is used in Hadoop Developer jobs:
  • Experience in setting up the whole app stack; set up and debugged Logstash to send Apache logs to Elasticsearch.
  • Simplified the configuration setup for the varying user defined environments by automating the processes and saving time for the ASBCE team.
  • Involved in launching and Setup of HADOOP/ HBASE Cluster which includes configuring different components of HADOOP and HBASE Cluster.
  • Support Data Scientists with Data and Platform Setup for their analysis and finally migrating their finished product to Production.
  • Cluster design, installation and configuration using HDP 2.1 stack on Azure and HDP 2.2 on premise setup.
  • Provided extensive support to L2 team regarding the environment setup, monitoring and solving of several issues.
  • Launched and Setup of HADOOP Cluster which includes configuring different components of HADOOP on Linux.
  • Control M Job setup by cloning and updating jobs parallel to UAT/production environment.
  • Worked on environment setup for coding as part of the migration.
  • Created Glacier account and setup automated data movement to save money.
  • Involved in elk setup with chef on open stack cluster.
  • Involved in cluster setup meetings with the administration team.
  • Initial setup to receive data from external source.
  • Worked on JDBC connectivity using Squirrel, ODBC connection setups with Toad, Stress testing Hadoop data for BI reporting tools.
  • Involved in using CA7 tool to setup dependencies at each level (Table Data, File and Time).
  • Extracted data of everyday transaction of customers from DB2 and export to Hive and setup Online analytical processing.
  • Implemented daily cron jobs that automate the jobs once the upstream jobs have run, using the Control-M setup.
  • Based on the offers setup for each client, the requests were post processed and given offers.
  • Experience in setting up Hadoop Linux cluster, setup SSH, Network monitoring and trouble shooting.
  • Developed the base abstract classes/interfaces, did setup of the development workspace, environment etc.

Show More

49. External Tables

low Demand
Here's how External Tables is used in Hadoop Developer jobs:
  • Designed and created Hive external tables using shared meta-store and supported partitioning, dynamic partitioning for faster data retrieval.
  • Created internal and external tables with properly defined static and dynamic partitions for efficiency.
  • Created Hive internal/external tables with proper static and dynamic partitions.
  • Involved in creating Hive internal and external tables, loading with data and writing hive queries which involves multiple join scenarios.
  • Worked on different Hive tables like External Tables, Managed Tables and implemented dynamic partitioning and clustering for efficient data access.
  • Created Hive external tables for the data in HDFS and moved data from archive layer to business layer with hive transformations.
  • Created Hive External tables and designed the loading of the data(overwrite/append) into tables and query data using HQL.
  • Created Hive tables as per requirement which were internal and external tables and used static and dynamic partitions to improve efficiency.
  • Created partitioned external tables in Hive which are used as staging table for the data loaded from the inbound feeds.
  • Implemented dynamic/static partitions and bucketing concepts in Hive and designed both managed and external tables in Hive to optimize performance.
  • Worked on partitions, bucketing concepts in Hive and designed both Managed and External tables in Hive to optimize performance.
  • Create Hive external tables on the map reduce output before partitioning, bucketing is applied on top of it.
  • Designed and created Hive external tables using shared meta-store instead of derby with partitioning, dynamic partitioning and buckets.
  • Created Partitions, Bucketing in Hive and designed both Managed and External tables in Hive for optimized performance.
  • Created hive external tables on top of processed data to easily manage and query the data using HiveQL.
  • Created the Hive external tables and loaded the converted ASCII data based on partition logic into ingestion layer.
  • Optimize performance of external tables in Hive, very good knowledge of Partitions, Bucketing in Hive.
  • Created Hive External tables and loaded the data in to tables and query data using HQL.
  • Implemented bucketing concepts in Hive and Managed and External tables were designed to enhance the performance.
  • Created external tables on top of the flat file which are stored in HDFS using HIVE.

Show More

50. Multiple Map

low Demand
Here's how Multiple Map is used in Hadoop Developer jobs:
  • Developed multiple Map Reduce applications.
  • Developed multiple Map Reduce jobs and used Hive and Pig Scripts for analyzing the data.
  • Developed multiple Map Reduce jobs in java for data extraction and transformation.
  • Developed multiple MapReduce jobs in Pig for data cleaning and processing.
  • Implemented data pipeline by chaining multiple mappers by using Chained Mapper.
  • Developed multiple Map Reduce jobs in java for data cleaning.
  • Developed multiple MapReduce jobs in java language for data processing.
  • Developed multiple MapReduce jobs using Java for Data Cleansing.
  • Implemented multiple MapReduce jobs for data processing.
  • Developed multiple Map Reduce jobs in java.
  • Developed multiple MapReduce jobs in Java.
  • Developed multiple Map Reduce jobs in Java for complex business requirements including data cleansing and preprocessing.
  • Installed and configured MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Developed multiple Map Reduce jobs in Java and supported them on Hadoop cluster for parallel processing of large data sets.
  • Developed multiple Map Reduce jobs in java for data cleaning and also written Hive & Pig UDFs.
  • Developed multiple Map Reduce jobs in java for data cleaning and preprocessing according to the business requirements.
  • Installed, configured Apache Hadoop, developed multiple MapReduce jobs in Command Line Interface and Eclipse.
  • Performed multiple MapReduce jobs in PIG and Hive for data cleaning and pre-processing.
  • Developed multiple Map Reduce jobs using Java API for data cleaning and pre-processing.
  • Implemented multiple Map Reduce Jobs in java for data cleaning and pre-processing.

Show More

20 Most Common Skills for a Hadoop Developer

Hdfs: 9%
Sqoop: 8.6%
Pl/Sql: 7.2%
Pig UDF: 6.7%
Mapreduce: 6.7%
Oozie: 6.7%
Hbase: 6.1%
Hadoop: 6%

Typical Skill-Sets Required For A Hadoop Developer

Rank  Skill                    Percentage of Resumes
1     Hdfs                     6%
2     Sqoop                    5.7%
3     Pl/Sql                   4.8%
4     Pig UDF                  4.5%
5     Mapreduce                4.5%
6     Oozie                    4.5%
7     Hbase                    4.1%
8     Hadoop                   4%
9     Flume                    3.6%
10    Pig Scripts              3.5%
11    Cloudera                 2.7%
12    Linux                    2.6%
13    Unix                     2.4%
14    Hive Tables              2.3%
15    Log Files                2.3%
16    File System              2%
17    ETL                      2%
18    Zookeeper                1.8%
19    Kafka                    1.7%
20    Data Analysis            1.7%
21    Log Data                 1.6%
22    Nosql                    1.6%
23    XML                      1.5%
24    Scala                    1.4%
25    Business Requirements    1.4%
26    Reduce Programs          1.4%
27    Relational Databases     1.4%
28    Python                   1.3%
29    BI                       1.3%
30    Rdbms                    1.3%
31    API                      1.3%
32    Teradata                 1.2%
33    Latin Scripts            1.2%
34    Generate Reports         1.1%
35    Json                     1.1%
36    Avro                     1.1%
37    Workflow Engine          1.1%
38    Test Cases               1%
39    AWS                      1%
40    Large Amounts            0.9%
41    Impala                   0.9%
42    Manage Data              0.9%
43    Data Warehouse           0.9%
44    Maven                    0.8%
45    POC                      0.8%
46    File Formats             0.8%
47    Data Solutions           0.8%
48    Setup                    0.8%
49    External Tables          0.7%
50    Multiple Map             0.7%
