Hadoop Administrator

Top Hadoop Administrator Skills

Below we've compiled a list of the most important skills for a Hadoop Administrator. We ranked the top skills based on the percentage of Hadoop Administrator resumes they appeared on. For example, 9.9% of Hadoop Administrator resumes contained Hadoop as a skill. Let's find out what skills a Hadoop Administrator actually needs in order to be successful in the workplace.

The six most common skills found on Hadoop Administrator resumes in 2020. Read below to see the full list.

1. Hadoop

high Demand
Here's how Hadoop is used in Hadoop Administrator jobs:
  • Installed and configured Hadoop cluster in pseudo and fully distributed mode environments.
  • Worked with Hadoop developers and designers to troubleshoot MapReduce job failures and issues, and helped developers resolve them.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
  • Created user accounts and granted users access to the Hadoop cluster.
  • Worked with the Linux team to prepare nodes for Hadoop deployment.
  • Managed and reviewed Hadoop log files.
  • Upgraded Hadoop to 2.7.2 version.
  • Monitored multiple Hadoop clusters environments using Ganglia and Nagios.
  • Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
  • Worked on pulling the data from oracle databases into the Hadoop cluster using the Sqoop import.
  • Commissioned and decommissioned nodes on CDH2 and CDH3 Hadoop clusters on Red Hat Linux.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Performed benchmarking on the Hadoop cluster using different benchmarking mechanisms.
  • Involved in collecting metrics for Hadoop clusters using Ganglia and Ambari.
  • Implemented high availability for Namenode's on the Hadoop cluster.
  • Implemented Kerberos for authenticating all the services in Hadoop Cluster.
  • Developed various POCs over Hadoop, Big data.
  • Implemented the Hadoop stack and different big data analytics tools, and migrated data from different databases to Hadoop.
  • Administered Linux servers and other UNIX variants, and managed Hadoop clusters.
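
Several of the bullets above boil down to routine health checks on a running cluster. As a rough illustration only (paths and hostnames are placeholders, and log locations vary by distribution), such checks might look like:

```bash
#!/usr/bin/env bash
# Minimal sketch of routine Hadoop cluster health checks. Paths and host
# names are illustrative; adjust for your distribution and install layout.

hdfs dfsadmin -report | head -n 20            # capacity, live/dead DataNodes
yarn node -list -all                          # NodeManager states
hdfs fsck / -blocks -locations | tail -n 20   # block and replication health

# Review recent NameNode log entries for errors (log directory varies by distro).
tail -n 100 /var/log/hadoop-hdfs/hadoop-hdfs-namenode-*.log | grep -i error
```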


2. HDFS

high Demand
Here's how HDFS is used in Hadoop Administrator jobs:
  • Performed HDFS cluster support and maintenance tasks like adding and removing nodes without any effect to running nodes and data.
  • Used Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.
  • Experienced in importing and exporting data into HDFS and assisted in exporting analyzed data to RDBMS using SQOOP.
  • Involved in moving all log files generated from various sources to HDFS for further processing.
  • Used Hive and created Hive tables, loaded data from Local file system to HDFS.
  • Deployed and configured flume agents to stream log events into HDFS for analysis.
  • Imported logs from web servers with Flume to ingest the data into HDFS.
  • Involved in loading data from UNIX file system to HDFS.
  • Involved in loading data to HDFS from various sources.
  • Loaded data from Linux/UNIX file system into HDFS.
  • Designed and allocated HDFS quotas for multiple groups.
  • Configured Flume for efficiently collecting, aggregating and moving large amounts of log data from many different sources to HDFS.
  • Involved in extracting the data from various sources into Hadoop HDFS for processing.
  • Deployed Sqoop server to perform imports from heterogeneous data sources to HDFS.
  • Extracted files from HBase and placed in HDFS/HIVE for processing.
  • Retrieved data from HDFS into relational databases with Sqoop.
  • Processed information from Hadoop HDFS.
  • Dumped the data from MySQL database to HDFS and vice-versa using Sqoop.
  • Extracted the data from Teradata into HDFS using Sqoop.
  • Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml based upon the job requirement.
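
To make the HDFS loading and quota bullets concrete, here is a minimal sketch with example paths; the directories and quota sizes are assumptions, not values from any resume:

```bash
#!/usr/bin/env bash
# Sketch of common HDFS tasks from the bullets above: loading local files
# into HDFS and allocating quotas for a group directory. Paths are examples.

hdfs dfs -mkdir -p /data/weblogs
hdfs dfs -put /var/log/webserver/access.log /data/weblogs/

# Allocate a name quota (max files/dirs) and a space quota for a team area.
hdfs dfsadmin -setQuota 1000000 /user/analytics
hdfs dfsadmin -setSpaceQuota 10t /user/analytics
hdfs dfs -count -q -h /user/analytics        # verify quotas and current usage
```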


3. Sqoop

high Demand
Here's how Sqoop is used in Hadoop Administrator jobs:
  • Worked with SQOOP import and export functionalities to handle large data set transfer between traditional databases and HDFS.
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Advised teams on best practices and optimal processes using Pig, Hive and Sqoop tools.
  • Handled the imports and exports of data onto HDFS using Flume and Sqoop.
  • Configured Sqoop and developed scripts to extract data from DB2 into HDFS.
  • Extracted files from MongoDB through Sqoop and placed in HDFS and processed.
  • Performed Importing and exporting data into HDFS and Hive using Sqoop.
  • Worked extensively with Sqoop for importing metadata from DB2.
  • Imported data from Oracle to HDFS using Sqoop.
  • Worked extensively with Sqoop for importing data.
  • Used Apache Sqoop for import and export functionalities to handle large data set transfer between DB2, Oracle databases and HDFS.
  • Performed periodic data dumps of related datasets into HDFS using Sqoop to run clustering MR jobs.
  • Performed data analytics in Hive and then exported these metrics back to the Oracle database using Sqoop.
  • Installed and configured Hadoop HDFS, Map Reduce, Pig, Hive, and Sqoop.
  • Use of Sqoop to import and export data from HDFS to Relational database and vice-versa.
  • Experience in using sqoop to import and export data from external databases to Hadoop cluster.
  • Used Sqoop, Distcp utilities for data copying and for data migration.
  • Exported the patterns analyzed back to Teradata using Sqoop.
  • Integrated data from various sources into Hadoop and moved data from Hadoop to other databases using Sqoop import and export.
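
The import/export bullets above generally translate into Sqoop commands along these lines; the JDBC URLs, table names and credentials file are placeholders:

```bash
#!/usr/bin/env bash
# Illustrative Sqoop import/export commands. Connection strings, tables and
# paths are made up; credentials would normally come from a password file.

# Import a MySQL table into HDFS, splitting the work across 4 mappers.
sqoop import \
  --connect jdbc:mysql://dbhost.example.com/sales \
  --username etl_user --password-file /user/etl/.db_password \
  --table orders --target-dir /data/raw/orders -m 4

# Export aggregated results from HDFS back to a relational table.
sqoop export \
  --connect jdbc:mysql://dbhost.example.com/reporting \
  --username etl_user --password-file /user/etl/.db_password \
  --table daily_order_totals --export-dir /data/curated/daily_order_totals
```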


4. Cluster Nodes

high Demand
Here's how Cluster Nodes is used in Hadoop Administrator jobs:
  • Cluster maintenance including Capacity planning, adding and removing cluster nodes; cluster Monitoring and Troubleshooting.
  • Implemented strategy to upgrade entire cluster nodes OS from RHEL5 to RHEL6 and ensured cluster remains up and running.
  • Performed Yarn and hive tuning on the cluster nodes.
  • Added and removed cluster nodes as required.
  • Balance, commission & decommission cluster nodes.
  • Check the disk space in cluster nodes.
  • Write queries to access the data for internal data processing and monitoring of Hadoop cluster nodes/edge nodes.
  • Integrated Hadoop cluster nodes with Active Directory in order to control user access.
  • Added and Decommissioned Hadoop cluster nodes including Balancing HDFS block data.
  • Add and decommission Hadoop cluster nodes.
  • Installed multi-node clusters on the HDP platform with the help of Ambari.
  • Worked on installations of Kafka on single-node and multi-node clusters.
  • Checked the disk space on cluster nodes. Environment: HDFS, MapReduce, Sqoop, Pig, Cloudera Manager, Linux.
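
Commissioning and decommissioning nodes, as described above, is typically driven by the NameNode's include/exclude files. A minimal sketch, assuming dfs.hosts.exclude already points at the exclude file shown:

```bash
#!/usr/bin/env bash
# Sketch of a graceful DataNode decommission. Assumes hdfs-site.xml points
# dfs.hosts.exclude at /etc/hadoop/conf/dfs.exclude (file path is an example).

echo "worker07.example.com" >> /etc/hadoop/conf/dfs.exclude
hdfs dfsadmin -refreshNodes                      # NameNode starts draining the node
hdfs dfsadmin -report | grep -A5 "worker07"      # watch for "Decommissioned"

# After adding new nodes, rebalance block distribution (10% threshold).
hdfs balancer -threshold 10
```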


5. HBase

high Demand
Here's how HBase is used in Hadoop Administrator jobs:
  • Installed, Configured and maintained HBASE.
  • Created HBase tables to store variable data formats of data coming from different applications.
  • Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
  • Created HBase tables to store data depending on column families.
  • Stored unstructured data in semi-structured form on HDFS using HBase.
  • Worked on taking Snapshot backups for HBase tables.
  • Work with HBase and Hive scripts to extract, transform and load the data into HBase and Hive.
  • Involved in transforming data from Mainframe tables to HDFS, and HBASE tables using Sqoop and Pentaho Kettle.
  • Worked with Sqoop Importing and exporting data, from different databases into HDFS, HBase and Hive.
  • Create views over HBase table and used SQL queries to retrieve alerts and meta data.
  • Worked on importing and exporting data from relational databases to HDFS and HBase using Sqoop.
  • Assisted in designing, development and architecture of Hadoop clusters and HBase systems.
  • Experience in administration of NoSQL databases including HBase and MongoDB.
  • Commissioned and decommissioned nodes from time to time; wrote automated scripts to monitor HDFS and HBase through cron jobs.
  • Build and maintain scalable data using the Hadoop ecosystem and other open source components like Hive and HBase.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
  • Implemented a script to transmit sysprin information from Oracle to Hbase using Sqoop.
  • Installed and configured HBase by installing the HBase Master and HBase RegionServers.
  • Installed and configured Hadoop ecosystem components like Hive, Pig, Sqoop, Flume, Oozie and HBase.
  • Used apache tools/frameworks Hive, Pig, Sqoop & HBase for the entire ETL workflow.
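
The HBase bullets above (creating tables with column families, taking snapshot backups) map onto hbase shell commands such as the following sketch; table and column-family names are invented for illustration:

```bash
#!/usr/bin/env bash
# Sketch of HBase table creation and snapshot backup via the hbase shell.
# Table, column-family and snapshot names are illustrative only.

hbase shell <<'EOF'
create 'web_events', {NAME => 'meta'}, {NAME => 'payload', VERSIONS => 3}
put 'web_events', 'row1', 'meta:source', 'app-server-01'
scan 'web_events', {LIMIT => 5}

# Point-in-time backup of the table, as described in the snapshot bullet.
snapshot 'web_events', 'web_events_snap_20200101'
list_snapshots
EOF
```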


6. Flume

high Demand
Here's how Flume is used in Hadoop Administrator jobs:
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Implemented custom interceptors for flume to filter data and defined channel selectors to multiplex the data into different sinks.
  • Load daily traffic data from online transactions into the HDFS using Apache Flume.
  • Experience in using Flume to stream data into HDFS from various sources.
  • Worked on streaming data into HDFS from web servers using Flume.
  • Load data from various data sources into HDFS using Flume.
  • Used Flume to channel data from different sources to HDFS.
  • Installed, Configured and managed Flume Infrastructure.
  • Used Flume to move the data from web logs onto HDFS.
  • Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
  • Installed and configured flume to get the messages from MDM servers to Hadoop for business analysis.
  • Handle the data exchange between HDFS & Web Applications and databases using Flume and Sqoop.
  • Captured the data logs from web server into HDFS using Flume & Splunk for analysis.
  • Collected the logs data from web servers and integrated in to HDFS using Flume.
  • Experience in Setting up Data Ingestion tools like Flume, Sqoop, SFTP.
  • Extracted output files using Sqoop and loaded the extracted log data using Flume.
  • Build Flume and Kafka setup to stream webserver log data into HDFS.
  • Configure Sqoop and Flume to Export/Import Data in to HDFS.
  • Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioural data into HDFS for analysis.
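
A typical Flume setup behind these bullets is a single agent that tails a web-server log and lands events in HDFS. A minimal sketch, with an illustrative agent name, log path and HDFS directory:

```bash
#!/usr/bin/env bash
# Sketch of one Flume agent streaming a web-server log into HDFS.
# Agent name, file paths and HDFS directory are placeholders.

cat > /etc/flume/conf/weblog-agent.conf <<'EOF'
a1.sources  = tail-src
a1.channels = mem-ch
a1.sinks    = hdfs-sink

a1.sources.tail-src.type = exec
a1.sources.tail-src.command = tail -F /var/log/httpd/access_log
a1.sources.tail-src.channels = mem-ch

a1.channels.mem-ch.type = memory
a1.channels.mem-ch.capacity = 10000

a1.sinks.hdfs-sink.type = hdfs
a1.sinks.hdfs-sink.hdfs.path = /data/weblogs/%Y-%m-%d
a1.sinks.hdfs-sink.hdfs.fileType = DataStream
a1.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
a1.sinks.hdfs-sink.channel = mem-ch
EOF

flume-ng agent --name a1 --conf /etc/flume/conf \
  --conf-file /etc/flume/conf/weblog-agent.conf
```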


7. Oozie

high Demand
Here's how Oozie is used in Hadoop Administrator jobs:
  • Worked on Configuring Oozie Jobs.
  • Configured Oozie for workflow automation and coordination.
  • Cluster balancing and performance tuning of Hadoop components like HDFS, Hive, Impala, MapReduce, Oozie work flows.
  • Prepared Oozie workflow engine to run multiple Hive and Pig jobs which run independently with time and data availability.
  • Involved in creating Oozie workflow and Coordinator jobs to kick off the jobs on time for data availability.
  • Defined Oozie workflow based on time to copy the data upon availability from different Sources to Hive.
  • Installed and configured Hive, Pig, Sqoop and Oozie on the CDH cluster.
  • Designed and proposed end-to-end data pipeline using falcon and Oozie by doing POCs.
  • Installed Oozie workflow engine to run multiple Hive and pig jobs.
  • Configured Oozie workflow engine to run multiple Hive jobs.
  • Used Hue to create, maintain, monitor and run various Hadoop jobs such as Hive queries and Oozie workflows.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Managed and reviewed Hadoop log files; installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Managed end-to-end data flow from sources to a NoSQL (MongoDB) database using Oozie.
  • Implemented Oozie workflows for Map Reduce, Hive and Sqoop actions.
  • Worked on installing Hadoop Ecosystem components such as Sqoop, Pig, Hive, Oozie, and Hcatalog.
  • Installed and worked on Hadoop ecosystem components Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
  • Orchestrated Sqoop scripts, pig scripts, hive queries using Oozie workflows and sub-workflows.
  • Streamlined Hadoop jobs and workflow operations using Oozie workflows, job schedules.
  • Monitored and troubleshot Oozie workflows with Hive/Pig/MapReduce/Java actions that process data events collected to HDFS/Hive.
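
Running an Oozie workflow from the command line usually comes down to a job.properties file plus the oozie CLI. A minimal sketch, assuming the workflow definition (workflow.xml) already sits in HDFS at the path shown:

```bash
#!/usr/bin/env bash
# Sketch of submitting and monitoring an Oozie workflow from the command line.
# Hostnames, ports and the workflow path are examples only.

cat > job.properties <<'EOF'
nameNode=hdfs://nn1.example.com:8020
jobTracker=rm1.example.com:8032
oozie.wf.application.path=${nameNode}/user/etl/workflows/daily-ingest
queueName=default
EOF

# Submit and start the workflow, then check its status.
JOB_ID=$(oozie job -oozie http://oozie.example.com:11000/oozie \
         -config job.properties -run | awk -F': ' '{print $2}')
oozie job -oozie http://oozie.example.com:11000/oozie -info "$JOB_ID"
```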


8. Cloudera Hadoop

high Demand
Here's how Cloudera Hadoop is used in Hadoop Administrator jobs:
  • Participate in development/implementation of Cloudera Hadoop environment.
  • Installed, configured and deployed a 60 node Cloudera Hadoop cluster for development, production.
  • Upgraded the Cloudera Hadoop ecosystems in the cluster using Cloudera distribution parcels.
  • Performed both major and minor upgrades to the existing Cloudera Hadoop cluster.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Upgraded the Cloudera Hadoop ecosystems in the cluster using Cloudera distribution packages.
  • Installed, configured and optimized Hadoop infrastructure using Apache Hadoop and Cloudera Hadoop distributions.
  • Experience with Cloudera Hadoop upgrades from CDH 4.3 to CDH 5.3 and applying patches; worked on NameNode recovery and balancing the Hadoop cluster.
  • Experience in Cloudera Hadoop Upgrades and Patches and Installation of Ecosystem Products through Cloudera manager along with Cloudera Manager Upgrade.
  • Maintained 70+ node Hadoop clusters using Cloudera Hadoop Cluster CDH 4 using Cloudera Manager.
  • Maintained and supported multi-node Production, Staging and Pre-Production Hadoop clusters; the Production cluster ran Cloudera Hadoop CDH 4.1.2.


9. SQL

high Demand
Here's how SQL is used in Hadoop Administrator jobs:
  • Implemented custom joins to create tables containing the records of items or vendors blacklisted for defaulting payments, using Spark SQL.
  • Used HiveQL to write Hive queries from the existing SQL queries.
  • Created Python/MySQL back-end for data entry from Flash.
  • Configured MySQL Database to store Hive metadata.
  • Deployed necessary SQL queries for database transactions.
  • Lead various data conversion initiatives that included testing Hadoop SQL performance using bucketing and partitioning.
  • Configured Hive Metastore to use MySQL Database, to make available all the tables created in Hive for different users simultaneously.
  • Worked with NoSQL databases like HBase in creating tables to load large sets of semi structured data coming from various sources.
  • Analyzed the alternatives for NOSQL Data stores and intensive documentation for HBASE vs. Accumulo data stores.
  • Imported data using Sqoop to load data from MySQL to HDFS on regular basis.
  • Worked extensively with Sqoop for importing metadata from SQL server 2008.
  • Worked with NoSQL database Hbase to create tables and store data.
  • Configured and deployed the Hive metastore using MySQL and Thrift server.
  • Installed and configured Hive with remote Metastore using MySQL.
  • Experience in configuring MySQL to store the Hive metadata.
  • Configured and deployed hive metastore using MySQL.
  • Experience with NoSQL database Hbase.
  • Performed Sqooping for various file transfers through the Cassandra tables for processing of data to several NoSQL DBs.
  • Used Sqoop to import data into HDFS from MySQL and Access databases and vice-versa.
  • Migrated the Hive Metastore from PostgreSQL to MySQL for better scalability.
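
Several bullets describe pointing the Hive metastore at MySQL. The hive-site.xml properties involved look roughly like this sketch; the host, database name and credentials are placeholders:

```bash
#!/usr/bin/env bash
# Sketch of the hive-site.xml properties that point the Hive metastore at a
# MySQL database. Host, database and credentials are placeholders.

cat > /etc/hive/conf/hive-site.xml <<'EOF'
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://dbhost.example.com/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive_password</value>
  </property>
</configuration>
EOF
```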


10. Review Log Files

high Demand

11. Linux

high Demand
Here's how Linux is used in Hadoop Administrator jobs:
  • Performed Linux server administration of the server hardware and operating system.
  • Work with network and Linux system engineers to define optimum network configurations, server hardware and operating system.
  • Worked with Linux, EMC SAN, and Network teams to ensure the smooth relocation of the servers.
  • Worked with Linux server admin team in administering the server hardware and operating system.
  • Provide after Hours and On Call Linux support to Development team & internal customers.
  • Experience writing Shell scripts in Linux OS and integrating them with other solutions.
  • Worked with Linux systems and MySQL database on a regular basis.
  • Experience with Linux internals, virtual machines and open source tools/platforms.
  • Installed and maintained Linux Servers, upgraded and downloaded patches.
  • Implemented NFS, NAS and HTTP servers on Linux servers.
  • Red hat Linux package administration using RPM and YUM.
  • Involved in support and monitoring production Linux Systems.
  • Experienced in Linux Administration tasks like IP Management (IP Addressing, Ethernet Bonding, Static IP and Subnetting).
  • Performed Red Hat Linux Kickstart installations on RedHat 4.x/5.x, performed Red Hat Linux Kernel Tuning, memory upgrades.
  • Install, validate, test, and package Hadoop products on Red Hat Linux platforms.
  • Involved in the process of linux kernel upgrade in the cluster.
  • Involved in estimation and setting-up Hadoop Cluster in Linux.
  • Maintained and Monitored Hadoop and Linux Servers.
  • Provision Virtual Linux servers in VMWare environment.
  • Image machines using Jumpstart /Kickstart to install Solaris 10 and Red Hat Enterprise Linux.


12. MapReduce

high Demand
Here's how MapReduce is used in Hadoop Administrator jobs:
  • Developed MapReduce programs to perform data filtering for unstructured data.
  • Supported/Troubleshooted MapReduce programs running on the cluster.
  • Worked with big data developers, designers in troubleshooting MapReduce job failures and issues with Hive, Pig and Flume.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Parsed, cleansed, and mined useful and meaningful data in HDFS using MapReduce for further analysis.
  • Experienced in writing MapReduce jobs using Pig Latin scripts and Pig UDF's in Java.
  • Close monitoring and analysis of the MapReduce job executions on cluster at task level.
  • Developed Simple to complex MapReduce Jobs using Hive and Pig.
  • Tuned YARN components to achieve high performance for MapReduce jobs.
  • Supported MapReduce Programs which are running on the cluster.
  • Helped develop MapReduce programs and define job flows.
  • Supported Data Analysts in running MapReduce Programs.
  • Develop MapReduce jobs for the users.
  • Set up Linux Users, and tested HDFS, Hive, Pig and MapReduce Access for the new users.
  • Supported MapReduce Programs and distributed applications running on the Hadoop cluster.
  • Optimized Hadoop clusters components: HDFS, Yarn, MapReduce, Hive, Kafka to achieve high performance.
  • Experienced in developing MapReduce programs using Apache Hadoop for working with Big Data.
  • Validated YARN and HIVE parameters for mapreduce jobs to run successfully.
  • Involved in installation of Hive, Pig, MapReduce and Sqoop.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
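
Supporting MapReduce jobs on the cluster often means submitting a test job and checking YARN for its status and logs. A minimal sketch (the examples jar path varies by distribution, and the application ID is made up):

```bash
#!/usr/bin/env bash
# Sketch of submitting and monitoring a MapReduce job on YARN. The examples
# jar location differs between distributions; this path is just one common case.

yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
  wordcount /data/weblogs /data/out/wordcount

yarn application -list -appStates RUNNING        # jobs currently on the cluster
yarn logs -applicationId application_1577836800000_0042 | less   # logs for a finished job (example ID)
```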


13. Capacity Planning

high Demand
Here's how Capacity Planning is used in Hadoop Administrator jobs:
  • Involved in cluster capacity planning and forecasts including timing and budget considerations.
  • Involved in Applications capacity planning.
  • Created 25+ Linux Bash scripts for users, groups, data distribution, capacity planning, and system monitoring.
  • Involved in capacity planning, hardware planning, installation, and performance tuning of the Hadoop ecosystem.
  • Cluster capacity planning, performance tuning, cluster Monitoring, Troubleshooting.
  • Worked on Capacity planning for the Production Cluster.
  • Monitored workload, job performance and capacity planning.
  • Assisted with data capacity planning and node forecasting.
  • Assist in the capacity planning process.
  • Involved in Hadoop Cluster environment administration that includes cluster capacity planning, performance tuning, cluster Monitoring and Troubleshooting.
  • Worked on file system management and monitoring and Capacity planning Execute system disaster recovery processes
  • Involved in cluster capacity planning along with expansion of the existing environment.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Worked on Hadoop clusters capacity planning and management.
  • Worked on installing cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
  • Experience commissioning & decommissioning of nodes, performance tuning, capacity planning, cluster monitoring.
  • Worked on commissioning & decommissioning of Data Nodes, Namenode recovery, capacity planning.
  • Screen Hadoop cluster job performances and capacity planning.
  • Manage and review Hadoop log files, file system management and monitoring Hadoop cluster capacity planning.
  • Screened Hadoop cluster job performance and capacity planning; monitored Hadoop cluster connectivity and security; managed and reviewed Hadoop log files.
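
Capacity planning usually starts with back-of-envelope arithmetic: raw storage is the data volume times the replication factor plus headroom, divided by usable disk per node. A sketch with assumed numbers:

```bash
#!/usr/bin/env bash
# Back-of-envelope capacity planning sketch: estimate the DataNode count for
# a given data volume. Every number below is an assumption, not a standard.

DATA_TB=200          # expected data volume
REPLICATION=3        # HDFS replication factor
OVERHEAD=25          # % headroom for temp/shuffle data and growth
NODE_DISK_TB=48      # usable disk per DataNode after OS/reserved space

RAW_TB=$(( DATA_TB * REPLICATION * (100 + OVERHEAD) / 100 ))
NODES=$(( (RAW_TB + NODE_DISK_TB - 1) / NODE_DISK_TB ))   # ceiling division

echo "Raw storage needed: ${RAW_TB} TB  ->  DataNodes required: ${NODES}"
```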


14. Zookeeper

high Demand
Here's how Zookeeper is used in Hadoop Administrator jobs:
  • Configured ZooKeeper to implement node coordination in clustering support.
  • Cluster coordination services through Zookeeper.
  • Experience in configuring Zookeeper to coordinate the servers in clusters to maintain the data consistency.
  • Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and ZooKeeper.
  • Worked on setting up high availability for major production cluster and designed automatic failover control using zookeeper and quorum journal nodes.
  • Implemented High Availability and automatic failover infrastructure to overcome single point of failure for Name node utilizing Zookeeper services.
  • Integrated Hadoop cluster with Zookeeper cluster and achieved NameNode High Availability.
  • Implemented automatic failover controller using zookeeper service.
  • Load and transform large sets of structured, semi structured and unstructured data Cluster coordination services through Zookeeper.
  • Experience configuration of high availability solutions for Name node, and HBase master through Zookeeper ensemble.
  • Installed and configured Zookeeper for Hadoop cluster.
  • Implemented automatic failover zookeeper and zookeeper failover controller.
  • Experience in upgrading the Hadoop cluster HBase/ZooKeeper from CDH3 to CDH4.
  • Designed and configured the cluster with the required services (HDFS, Hive, HBase, Oozie, ZooKeeper).
  • Experience in using Hive, Pig, Sqoop, HBase, ZooKeeper, Oozie, etc.
  • Installed and configured of Hadoop projects such as Hive, Pig, HBase, Spark, Zookeeper, Oozie, Kafka.
  • Installed and configured Flume, Hive, Sqoop, Zookeeper and Oozie on the Hadoop cluster.
  • Installed and configured Hadoop ecosystem components like MapReduce, Hive, Pig, Sqoop, HBase, ZooKeeper and Oozie.
  • Installed/Configured/Maintained Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.


15. High Availability

average Demand
Here's how High Availability is used in Hadoop Administrator jobs:
  • Developed a high availability cluster configuration with various high available services on a need basis.
  • Worked on implementing NameNode backup using NFS for High availability.
  • Involved in setting up of High Availability disaster recovery environment.
  • Configured NameNode high availability and NameNode federation.
  • Configured High Availability for HDFS NameNode.
  • Applied standard Back up policies to make sure the high availability of cluster.
  • Maintain data safety and maintain high availability(HA) of NameNode.
  • Installed and configured MYSQL and Enabled High Availability.
  • Experience in setting up HBase which includes master and region server configuration, High availability configuration, performance tuning and administration.
  • Worked on setting up High Availability for major production cluster and designed automatic failover.
  • Implemented Hadoop High Availability, Backup and Disaster Recovery.
  • Worked on High Availability for NameNode using Cloudera Manager to avoid single point of failure.
  • Implemented NAMENODE, YARN high availability and Hive Metastore/hiveserver2 for cluster load balancing.
  • Implemented and Configured High Availability Hadoop Cluster (Quorum Based).
  • Configured High Availability (HA) for Namenode and Hiveserver2.
  • Deployed high availability on the Hadoop cluster quorum journal nodes.
  • Involved in configuring HBase to Use HDFS High Availability model.
  • Created load balancers (ELB) and used Route53 with failover and latency options for high availability and fault tolerance.
  • Implemented cluster high availability in case of crashes or planned maintenance.
  • Experience in handling NameNode high availability, experience deploying Hadoop, and understanding of how queries run in Hadoop.
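
NameNode high availability with quorum journal nodes and automatic failover is driven by a handful of hdfs-site.xml properties. A sketch of the key ones, using an example nameservice and hostnames (the ZooKeeper quorum itself is set in core-site.xml via ha.zookeeper.quorum):

```bash
#!/usr/bin/env bash
# Sketch of the hdfs-site.xml properties behind NameNode HA with quorum
# journal nodes and automatic failover. Nameservice and hosts are examples;
# in practice these entries belong in the cluster's hdfs-site.xml.

cat > /etc/hadoop/conf/hdfs-site-ha-snippet.xml <<'EOF'
<configuration>
  <property><name>dfs.nameservices</name><value>mycluster</value></property>
  <property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>nn1.example.com:8020</value></property>
  <property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>nn2.example.com:8020</value></property>
  <property><name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value></property>
  <property><name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
  <property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>
</configuration>
EOF

hdfs haadmin -getServiceState nn1     # verify which NameNode is currently active
```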


16. Kerberos

average Demand
Here's how Kerberos is used in Hadoop Administrator jobs:
  • Worked on authenticated user management by setting up Kerberos integrated with Active Directory.
  • Involved in installation and configuration of Kerberos security setup on CDH5.5 cluster.
  • Configured and maintained Kerberos security for ecosystem application users.
  • Implemented authentication and authorization service using Kerberos authentication protocol.
  • Worked on Configuring Kerberos Authentication in the cluster.
  • Implemented Kerberos Security Authentication protocol for existing cluster.
  • Implemented Kerberos security in all environments.
  • Implemented Cluster Security using Kerberos.
  • Configured Kerberos for authentication, Knox for perimeter security and Ranger for granular access in the cluster.
  • Performed setup of Linux users, Kerberos principals and access of HDFS/Hive/MapReduce for the new users.
  • Involved in installation and configuration of LDAP server and integrated with kerberos on cluster.
  • Created various permissions levels on various databases using Ranger and Kerberos.
  • Experience in setting up Kerberos in a Hortonworks cluster.
  • Implemented MIT Kerberos in cluster to authenticate users.
  • Maintained Hadoop ecosystem security by Installing and configuring Kerberos.
  • Implement security in Hadoop using Kerberos authentication.
  • Involved in installing and configuring Kerberos for the authentication of users and Hadoop daemons.
  • Deployed the Hadoop cluster using Kerberos to provide secure access to the cluster.
  • Worked on setting up Kerberos authentication for Hadoop and monitored Hadoop cluster connectivity and security as well.
  • Integrated complete Hadoop CDH stack with Enterprise AD/LDAP, Kerberos in company's enterprise security standards.
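
Kerberizing a cluster involves creating service principals and keytabs on the KDC and authenticating with them. A minimal MIT Kerberos sketch, with an example realm, host and keytab path:

```bash
#!/usr/bin/env bash
# Sketch of creating a service principal and keytab on an MIT Kerberos KDC
# and authenticating with it. Realm, host and keytab path are examples.

kadmin.local -q "addprinc -randkey hdfs/nn1.example.com@EXAMPLE.COM"
kadmin.local -q "ktadd -k /etc/security/keytabs/hdfs.service.keytab hdfs/nn1.example.com@EXAMPLE.COM"

# Obtain a ticket from the keytab and confirm it, as a service account would.
kinit -kt /etc/security/keytabs/hdfs.service.keytab hdfs/nn1.example.com@EXAMPLE.COM
klist
```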


17. CDH

average Demand
Here's how CDH is used in Hadoop Administrator jobs:
  • Collaborated with different teams on cluster planning, hardware requirements, server configurations, and network equipment to implement CDH 3.6.
  • Involved in the installation of CDH5 and up-gradation from CDH4 to CDH5.
  • Performed a major upgrade of cluster from CDH4 to CDH5.
  • Performed a Major upgrade HADOOP Cluster running 100 Nodes from CDH 4 to CDH 5.
  • Upgraded cluster from CDH 5.3.2 to CDH 5.4.4.
  • Involved in upgrading clusters to Cloudera Distributed Clusters and deployed into CDH5.
  • Configured alerts for all the services in CDH using Cloudera Manager.
  • Installed and configured CDH5.0.0 cluster, using Cloudera manager.
  • Have knowledge about Hadoop 2.0 version and CDH 5.x.
  • Upgraded production Hadoop clusters from CDH4U1 to CDH5.2 and CM 4.x to CM 5.1.
  • Major Upgrade from CDH 4 to CDH 5.2 HDFS support and maintenance.
  • Worked on Hadoop CDH upgrade from CDH4.x to CDH5.x.
  • Upgraded the Hadoop cluster from CDH4.7 to CDH5.2.
  • Performed installation and configuration of Hadoop Cluster of 90 Nodes with Cloudera distribution with cdh4.
  • Installed and configured Hadoop Cloudera CDH5.7.1.
  • Major upgrade from CDH 4 to CDH 5.2.
  • Involved in HDFS maintenance and upgrading the cluster to the latest versions of CDH; imported/exported data between RDBMS and HDFS using Sqoop.
  • Installed and configured HDP, Cloudera Manager, Hive, Pig, Sqoop and Oozie on the CDH4 cluster.
  • Installed and configured Hadoop cluster of Cloudera Distribution(CDH 4.x) using Cloudera Manager and maintained their integrity.
  • Implemented 100 node CDH4 Hadoop cluster on Red hat Linux using Cloudera Manager.


18. Data Nodes

average Demand
Here's how Data Nodes is used in Hadoop Administrator jobs:
  • Implemented commissioning and decommissioning of data nodes.
  • Commissioned / Decommissioned data nodes to the cluster at times of planned Maintenance and as well as unplanned outages.
  • Implemented Commissioning and Decommissioning of data nodes, killing the unresponsive task tracker and dealing with blacklisted task trackers.
  • Commissioned data nodes as per the cluster capacity and decommissioned when the hardware degraded or failed.
  • Monitor health check of data nodes and fix the servers that are with bad hard drives.
  • Commission and decommission the Data nodes from cluster in case of problems.
  • Commissioned Data Nodes when data grew and Decommissioned when the hardware degraded.
  • Worked on installing cluster, commissioning & decommissioning of data nodes.
  • Tuned the cluster by Commissioning and decommissioning the Data Nodes.
  • Monitor Hadoop Name node Health status, number of Task trackers running, number of Data nodes running.
  • Worked with operational team for Commissioning and Decommissioning of data nodes on the Hadoop Cluster.
  • Added new Data Nodes when needed and ran balancer.
  • Implemented automatic failover with ZooKeeper and the ZooKeeper failover controller; tuned the cluster by commissioning and decommissioning the Data Nodes.


19. File System

average Demand
Here's how File System is used in Hadoop Administrator jobs:
  • File system management and monitoring.
  • Set up automated processes to archive/clean the unwanted data on the cluster, in particular on HDFS and Local file system.
  • Created volume groups, logical volumes and partitions on the Linux servers and mounted file systems on the created partitions.
  • Developed script to check the Space utilization of local file system directories on Gateway servers and Master Nodes.
  • Set up Flume agents to stream the log data onto local Linux file systems and to HDFS.
  • Experienced in writing the automatic scripts for monitoring the file systems, key MAPR services.
  • Enabled HA for CLDB using MapR File System to avoid single point of failure.
  • Deployed Network file system (NFS) for NameNode Metadata backup.
  • Experienced in loading data from Linux file system to HDFS.
  • Monitored the file systems and CPU load for better performance.
  • Defined file system layout and data set permissions.
  • Involved in HDFS File system management and monitoring.
  • Worked on the Hadoop File System Java API to develop or Compute the Disk Usage Statistics.
  • Implemented spark on HDP cluster and configured HDFS as backend file system for spark engine.
  • Monitored Hadoop cluster and file system usage on name and data nodes.
  • Worked in Kerberos, Active Directory/LDAP, Unix based File System.
  • Deployed Network file system for NameNode Meta data backup.
  • Configured an NFS server and POSIX clients on data nodes in order to read/write into the Hadoop MapR file system locally in a secured way.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing; involved in loading data from the Linux file system to HDFS.
  • Identified topologies for high availability; identified and set up all the infrastructure such as storage, file systems, and user accounts.


20. Setup

average Demand
Here's how Setup is used in Hadoop Administrator jobs:
  • Enabled High-Availability for NameNode and setup fencing mechanism for split-brain scenario.
  • Involved in setup, installation, configuration of OBIEE 11g in Linux operating system also integrating with the existing environment.
  • Configured JMS Server setup, Database connection setup and deployed the returned items in WebLogic Application Server.
  • Experienced in Setting up the project and volume setups for the new projects.
  • Build automated setup for the cluster monitoring and issue escalation process.
  • Worked on Audit setup for activity on HDFS, entitlement reviews.
  • Involved in cluster setup, monitoring, test benchmarks for results.
  • Involved in sentry services setup with security configuration with LDAP Users.
  • Name node hardware clustering setup with NFS as common storage.
  • Helped in the setup of Map reduce jobs.
  • Worked on monitoring setup for Hadoop ecosystem components.
  • Installed and configured the Kerberos KDC setup and created the realm and principals.
  • Set up a load balancer for Impala to keep queries running efficiently when a DataNode fails.
  • Managed Hadoop clusters: setup, install, monitor, maintain.
  • Created method of process for the Kerberos KDC cluster Setup.
  • Experience in setup, configuration and management of security for Hadoop clusters using Kerberos.
  • Involved in start to end process of Hadoop cluster setup where in installation, configuration and monitoring the Hadoop Cluster.
  • Designed and Implemented Icinga monitoring for complete Hadoop setup, including architecture planning, host-grouping and alert setup.
  • Worked with MapR Hadoop clusters to setup and maintain environment.
  • Worked with data delivery teams to setup new Hadoop users.


21. Nagios

average Demand
Here's how Nagios is used in Hadoop Administrator jobs:
  • Use NAGIOS to configure cluster/server level alerts and notifications in case of a failure or glitch in the service.
  • Designed and implemented a distributed network monitoring solution based on Nagios and Ganglia using puppet.
  • Implemented nagios and integrated with puppet for automatic monitoring of servers known to puppet.
  • Installed and implemented the monitoring tools like ganglia and Nagios on both the clusters.
  • Have written scripts for Nagios and configured the Check NRPE accordingly.
  • Used Ganglia and Nagios to monitor the cluster around the clock.
  • Installed Apache Kafka and monitored it with ganglia and nagios.
  • Experience with monitoring tools such as Cloud Watch/ Nagios.
  • Monitored jobs using the Nagios tool.
  • Monitored Clusters with Ganglia and Nagios.
  • Administered, monitored and maintained multi data-center Cassandra cluster using OpsCenter and Nagios in production.
  • Cluster Monitoring with Cloudera Manager and Nagios.
  • Cluster maintenance as well as creation and removal of nodes using tools like Ganglia, Nagios, Cloudera Manager Enterprise.
  • Monitored cluster health and activity with Nagios, Ganglia and Rivermuse; maintained and troubleshot Scribe intake and integration servers.
  • Deployed a Hadoop cluster using cdh4 integrated with Nagios and Ganglia.
  • Monitored multiple clusters environments using Ambari Alerts, Metrics and Nagios.
  • Worked on Nagios XI for monitoring Hadoop clusters.
  • Migrated all Nagios alerts to Icinga alerts.
  • Set up automated 24x7x365 monitoring and escalation infrastructure for Hadoop cluster using Nagios Core and Ambari.
  • Installed Datastax Opscenter and Nagios for monitoring purposes.


22. Unix

average Demand
Here's how Unix is used in Hadoop Administrator jobs:
  • Experience with Unix or Linux, including shell scripting and ran monthly security checks through UNIX and Linux environment.
  • Involved in a set of migration projects moving from Unix to Linux and from PL/SQL code to Ab Initio graphs.
  • Designed and implemented UNIX level security model for Big Data Cluster on Linux systems.
  • Ingested the data from various file system to HDFS using UNIX command line utilities.
  • Installed and configured TIBCO EMS servers on developer's workstations and UNIX machines.
  • Managed UNIX account maintenance including additions, changes, and removals.
  • Resolved issues related to Application, Hardware and UNIX Environment.
  • Experience with Unix or Linux, including shell scripting.
  • Shell scripting for Linux/Unix Systems Administration and related tasks.
  • Apply security patches on UNIX and Linux servers.
  • Worked in Unix commands and Shell Scripting.
  • Managed Users on Unix/Linux systems.
  • Monitor and communicate the availability of UNIX and Informatica Environment.
  • Coded well-tuned Unix Korn Shell scripts, Wrapper scripts for high volume data warehouse instances.
  • Developed UNIX scripts for scheduling the delta loads and master loads using Auto sys Scheduler.
  • Work with various Admin teams (Teradata, UNIX & Informatica) to migrate code from one environment to another environment.
  • Developed and maintained UNIX scripts that load Salesforce data into HDFS.
  • Worked with the UNIX team on remediating Qualys findings.
  • Experience with Unix or Linux, including shell scripting Installing, Upgrading and Managing Hadoop Cluster on Cloudera distribution.
  • Worked on configuring Kerberos authentication in the cluster; very good experience with all the Hadoop ecosystem components in a UNIX environment.


23. Hortonworks

average Demand
Here's how Hortonworks is used in Hadoop Administrator jobs:
  • Worked on Installation of HORTONWORKS 2.1 in AWS Linux Servers.
  • Installed and Deployed HADOOP cluster with HORTONWORKS HDP 2.4.3 via AMBARI 2.2.
  • Collaborate with Hortonworks team for technical consultation on business problems and validate the architecture/design proposed.
  • Coordinated with Hortonworks support team through support portal to sort out the critical issues during upgrades.
  • Performed both major and minor upgrades to the existing Hortonworks cluster of another business unit.
  • Worked on Migrating Hive scripts from CDH to Hortonworks.
  • Experienced on setting up hortonworks cluster and installing all the ecosystem components through Ambari and manually from command line.
  • Built the 12-node development Hortonworks HDP 2.4.0 cluster on AWS using Cloudbreak from scratch.
  • Worked on Installing and configuring the HDP Hortonworks 2.x Clusters in Dev and Production Environments.
  • Installed, configured, secured, and troubleshoot Hortonworks Hadoop Data Platform (HDP).
  • Designed the Data Model to be used for correlation in Hadoop/Hortonworks.
  • Worked on the Hortonworks Distribution, one of the major contributors to Apache Hadoop.
  • Worked with Hortonworks flavor to setup and maintain environment.
  • Performed major and minor upgrades on Hortonworks Hadoop.
  • Experience in setting up Kerberos in hortonworks cluster.
  • Set up 5 node Cloudera CDH5 and 5 node Hortonworks cluster (HDP-2.1 and HDP-2.2) for carrying the POC.
  • Installed Hortonworks 2.1, 2.2 and 2.3 Installed HUE Browser.
  • Experience in architecting, designing, installation, configuration and management of Apache Hadoop, Hortonworks Distribution.
  • Installed and configured Cloudera Manager for easy management of existing Hadoop cluster Administered and supported distribution of Hortonworks.
  • Experienced in installing, configuring and optimizing Cloudera Hadoop version CDH4 and Hortonworks in a 100 node Multi Clustered environment.


24. Hive Tables

average Demand
Here's how Hive Tables is used in Hadoop Administrator jobs:
  • Created internal and external Hive tables and defined static and dynamic partitions as per requirement for optimized performance.
  • Involved in creating Hive tables, loading with data and writing hive queries which will run internally in map reduce way.
  • Created External Hive tables and loaded the data in to tables and query data using HQL.
  • Configured Hive meta store with MySQL, which stores the metadata of Hive tables.
  • Involved in creating Hive tables and loading and analyzing data using hive queries.
  • Created Hive tables to store data into HDFS and processed data.
  • Involved in Data model sessions to develop models for HIVE tables.
  • Created Hive tables and working on them using Hive QL.
  • Identified and created ORC formatted hive tables for high usage.
  • Created Hive tables in Parquet format.
  • Used IMPALA to pull the data from Hive tables.
  • Developed bash scripts to bring the Tlog files from the FTP server and then process them to load into Hive tables.
  • Developed an audit script to check that data in Hive tables is sent correctly without mismatch when running DistCp between two clusters.
  • Configured Hive metastore which stores the metadata for Hive tables and partitions in a relational database.
  • Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
  • Developed scripts to delete the empty hive tables existing in the Hadoop file system.
  • Configured Hive metastore with MySQL, which stores the metadata for Hive tables.
  • Involved in developing Hive DDLs to create, alter and drop Hive TABLES.
  • Automated all the jobs, for pulling data from FTP server to load data into Hive tables, using Oozie workflows.
  • Developed Oozie Workflows for daily incremental loads, which gets data from Teradata and then imported into hive tables.
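
The Hive table work above (external tables, partitions, ORC) corresponds to DDL along these lines; table and column names are invented for illustration:

```bash
#!/usr/bin/env bash
# Sketch of Hive DDL behind several bullets above: an external table over raw
# data plus a partitioned ORC table loaded from it. Names are illustrative.

cat > /tmp/orders_ddl.hql <<'EOF'
CREATE EXTERNAL TABLE IF NOT EXISTS raw_orders (
  order_id BIGINT, customer_id BIGINT, amount DOUBLE, order_ts STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/raw/orders';

CREATE TABLE IF NOT EXISTS orders_orc (
  order_id BIGINT, customer_id BIGINT, amount DOUBLE, order_ts STRING
)
PARTITIONED BY (order_date STRING)
STORED AS ORC;

SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE orders_orc PARTITION (order_date)
SELECT order_id, customer_id, amount, order_ts,
       to_date(order_ts) AS order_date
FROM raw_orders;
EOF

hive -f /tmp/orders_ddl.hql
```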


25. Ambari

average Demand
Here's how Ambari is used in Hadoop Administrator jobs:
  • Created own YUM repository for AMBARI, HDP, HDP-UTILS and EPEL for installation and update packages.
  • Experienced in Ambari-alerts configuration for various components and managing the alerts.
  • Validated infrastructure readiness on H2O and Ambari components.
  • Worked on integrating the LDAP server and Active Directory with Ambari through the command-line interface.
  • Experienced in implementing and configuring HDP tools Ranger, Ambari Alerts, SMART SENSE.
  • Experience in using Ambari Web UI for automatic installations and also manual installations.
  • Experience in setting up a new cluster from existing cluster using Ambari blueprint.
  • Monitored cluster using Ambari and optimize system based on job performance and criteria.
  • Maintained, audited and built new clusters for testing purposes using the Ambari.
  • Experienced on adding/installation of new components and removal of them through Ambari.
  • Experience in Upgrades and Patches and Installation of Ecosystem Products through Ambari.
  • Worked on continuous monitoring of Hadoop cluster components and Metrics on AMBARI.
  • Configured hive view and file view in ambari and maintained ambari database.
  • Cluster status monitoring by Ambari cluster Management and HBase Master Web UI.
  • Experience in collecting metrics for Hadoop clusters using Ambari.
  • Deployed cluster with Ambari and Cloudera manager.
  • Installed Ambari on existing Hadoop cluster.
  • Created Ambari Views for Tez, Hive and HDFS.
  • Upgraded Hortonworks Ambari and HDP Stack from 2.3 to 2.4 Version in Dev, DR and Prod Environment.


26. HDP

average Demand
Here's how HDP is used in Hadoop Administrator jobs:
  • Developed Use cases and Technical prototyping for implementing PIG, HDP, HIVE and HBASE.
  • Experience in installing, configuring, monitoring HDP stacks 2.1, 2.2, and 2.3.
  • Performed a Major upgrade in production environment from HDP 1.3 to HDP 2.2.
  • Worked on Configuring High Availability for Name Node in HDP 2.2.
  • Worked on HDP 2.2 and Enabled NameNode HA.
  • Install and configure Phoenix on HDP 2.1.
  • Provided a POC for the test and QA clusters using HDP 1.7 with the NoSQL database HBase.
  • Worked with Hortonworks HDP 2 including Pig, Hive, and HBase.
  • Major Upgrade from HDP 2.2 to HDP 2.3.
  • Performed both major (HDP 2.)
  • Managed 350+ Nodes HDP 2.2.4 cluster with 4 petabytes of data using Ambari 2.0 and Linux Cent OS 6.5.
  • Installed and configured Hortonworks Distribution Platform (HDP 2.3) on Amazon EC2 instances.
  • Involved in upgrading Hadoop Cluster from HDP 1.3 to HDP 2.0.
  • Experience in Upgrading Apache Ambari, CDH and HDP Cluster.
  • Implemented the new SmartSense feature in HDP 2.3.
  • Experience in Upgrading the Hadoop cluster from HDP 2.3.4 to HDP 2.3.6.
  • Deployed HDP 2.2 with Ambari.
  • Worked on cluster Upgradation in Hadoop from HDP 2.2 to HDP 2.3.
  • Upgraded the Hadoop cluster from HDP 1.3 TO 2.2 Deployed high availability on the Hadoop cluster quorum journal nodes.
  • Installed and maintained Hortonworks HDP 2.1 hadoop cluster in a test environment with Ambari 1.5.1.


27. Name Node

average Demand
Here's how Name Node is used in Hadoop Administrator jobs:
  • Set up automated processes to archive/clean the unwanted data on the cluster, on Name node and Secondary name node.
  • Designed the cluster so that only one Secondary name node daemon could be run at any given time.
  • Worked on fixing the cluster issues and Configuring High Availability for Name Node in HDP 2.1.
  • Design and maintain the Name node and Data nodes with appropriate processing capacity and disk space.
  • Installed Name Node, Secondary Name Node, Job Tracker, Data Node, Task Tracker.
  • Set up and managing HA Name Node to avoid single point of failures in large clusters.
  • Worked on High Availability for Name Node using Cloudera Manager to avoid single point of failure.
  • Implemented Name Node HA in all environments to provide high availability of clusters.
  • Implemented Name Node backup using NFS for High availability.
  • Deployed Network file system for Name Node Metadata backup.
  • Enabled Name node High Availability on production cluster.
  • Experience in Name Node HA implementation.
  • Implemented Name node backup using NFS.
  • Experience in benchmarking, performing backup and disaster recovery of Name Node metadata and important sensitive data residing on cluster.
  • Experience in Implementing High Availability of Name Node and Hadoop Cluster capacity planning to add and remove the nodes.
  • Implemented Name node High Availability on the Hadoop cluster to overcome single point of failure.
  • Configured High Availability on the name node for the Hadoop cluster - part of the disaster recovery roadmap.
  • Deployed Name Node high availability for Hadoop cluster where to handle automatic failover control.
  • Implemented HA for the NameNode, ResourceManager and HBase Master using Ambari.
  • Installed a Cloudera CDH 4 cluster (name node and data nodes); implemented Hive security using Sentry.


28. ETL

average Demand
Here's how ETL is used in Hadoop Administrator jobs:
  • Developed multiple Proof-Of-Concepts to justify viability of the ETL solution including performance and compliance to non-functional requirements.
  • Have deep and thorough understanding of ETL tools and how they can be applied in a Big Data environment.
  • Designed end to end ETL flow for one of the feed having millions of records inflow daily.
  • Worked on ETL process and handled importing data from various data sources, performed transformations.
  • Participated in team meetings and proposed ETL Strategy and provide them best practices.
  • Performed administration, troubleshooting and maintenance of ETL and ELT processes.
  • Monitored the performance and identified performance bottlenecks in ETL code.
  • Worked hands on with ETL process.
  • Worked extensively with ETL processes.
  • Reviewed ETL application use-cases before on-boarding to Hadoop.
  • Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
  • Used SPARK to build fast analytics for ETL Process and Constructed ingest pipeline using Spark streaming.
  • Involved in collecting requirements from business users, designing and implementing data pipelines and ETL workflows.
  • Load log data into HDFS using Flume, Kafka and performing ETL integrations.
  • Involved in development of ETL processes with Hadoop, YARN and Hive.
  • Loaded the dataset into Hive for ETL Operation.
  • Involved in migration of ETL processes from Oracle to Hive to test easier data manipulation.
  • Configured Talend, DataStage and Toad DataPoint for ETL activities on Hadoop/Hive databases.
  • Created Talend mappings for initial loads and daily updates, and was also involved in the ETL migration jobs from Informatica to Talend.
  • Design and maintain dataflow and workflows (ETL scripts/packages) using Flume, Sqoop and Oozie.


29. NoSQL

average Demand
Here's how NoSQL is used in Hadoop Administrator jobs:
  • Evaluated system performance and validated NoSQL solutions.
  • Worked on implementing NOSQL database Cassandra cluster.
  • Gained good experience with NOSQL database.
  • Involved in NoSQL (Datastax Cassandra) database design, integration and implementation.
  • Worked on NoSQL databases including HBase.
  • Create and publish REST Clients for the middleware to interact with the Accumulo NoSQL DB.
  • Extracted files from NoSQL database like HBase through Sqoop and placed in HDFS for processing.
  • Experience in administration of NoSQL databases including HBase and MongoDB.
  • Worked on NoSQL databases including HBase, MongoDB, and Cassandra; implemented multi-data center and multi-rack Cassandra clusters.
  • Experienced in using Pentaho Data Integration in Cassandra Imported files into NoSQL database like HBase through Sqoop.


30. Version Upgrades

low Demand
Here's how Version Upgrades is used in Hadoop Administrator jobs:
  • Implement both major and minor version upgrades to the existing cluster and also rolling back to the previous version.
  • Experience in managing backups and version upgrades.
  • Worked with application teams to install OS level updates, patches and version upgrades required for Hadoop cluster environments.
  • Rack aware configuration Applying Patches and Perform Version Upgrades to the various modules in the HDFS cluster.
  • Install operating system and Hadoop updates, patches, version upgrades when required.
  • Formulated procedures for installation of Hadoop patches, updates and version upgrades.
  • Assisted application teams in Hadoop updates, version upgrades.
  • Experience in managing backups and version upgrades; good experience in troubleshooting production-level issues in the cluster and its functionality.
  • Worked on component version upgrades, backup, commissioning and decommissioning Hadoop eco system components.


31. Kafka

low Demand
Here's how Kafka is used in Hadoop Administrator jobs:
  • Experience With installing and configuring Distributed Messaging System like Kafka.
  • Performed Kafka operations on regular basis.
  • Worked with Kafka for the proof of concept for carrying out log processing on a distributed system.
  • Involved in monitoring data and filtering data for high speed data handling using Kafka.
  • Integrated Kafka with Flume in sand box Environment using Kafka source and Kafka sink.
  • Design and implemented Kafka cluster with separate nodes for brokers and provide operation support.
  • Integrated Kafka with Storm for real time Streaming data ingestion and processing.
  • Worked on Partition, Kafka topics and monitoring Kafka clusters.
  • Integrated Kafka with Spark Streaming for real time data processing.
  • Consumed the data from Kafka queue using spark.
  • Installed Kafka cluster with separate nodes for brokers.
  • Used Kafka to publish messages.
  • Configured Spark streaming to receive real time data from the Kafka and store the stream data to HDFS using Scala.
  • Build Apache Kafka Multinode Cluster and used Kafka Manager to monitor multiple Clusters.
  • Configured Kafka to write the data into Elasticsearch via the dedicated consumer.
  • Managed Hadoop log files using Flumes and Kafka.
  • Worked on monitoring Hadoop cluster and different big data tools including Flume, Oozie & Kafka.
  • Managed application access logs using Flume and Kafka; monitored daily access logs and average service response time using Flume.
  • Experience in diagnosing problems with HBase, Solr, Oozie, YARN, Storm, Kafka, and Kerberos.
  • Experienced with Hadoop ecosystems such as Hive, HBase, Sqoop, Kafka, Oozie etc.
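
Day-to-day Kafka administration of the kind described above starts with topic management and quick smoke tests. A sketch with example broker and ZooKeeper hosts (newer Kafka versions take --bootstrap-server instead of --zookeeper for topic commands):

```bash
#!/usr/bin/env bash
# Sketch of basic Kafka administration: create a topic, then check it with
# the console producer and consumer. Hostnames and the topic are examples.

kafka-topics.sh --create --zookeeper zk1.example.com:2181 \
  --replication-factor 3 --partitions 6 --topic weblogs
kafka-topics.sh --describe --zookeeper zk1.example.com:2181 --topic weblogs

# Smoke test: publish one message and read it back.
echo "hello from the admin node" | \
  kafka-console-producer.sh --broker-list broker1.example.com:9092 --topic weblogs
kafka-console-consumer.sh --bootstrap-server broker1.example.com:9092 \
  --topic weblogs --from-beginning --max-messages 1
```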


32. Workflow Engine

low Demand
Here's how Workflow Engine is used in Hadoop Administrator jobs:
  • Installed and configured the Oozie workflow engine by adding the service on the existing cluster.
  • Worked on Oozie workflow engine to run multiple Map Reduce jobs.
  • Involved in scheduling Oozie workflow engine to run multiple Hive, sqoop and pig jobs.


33. Configuration Management

low Demand
Here's how Configuration Management is used in Hadoop Administrator jobs:
  • Performed the automation using Chef Configuration management and managing the infrastructure environment with Puppet.
  • Used Puppet for User Account Provisioning and automating software configuration management.
  • Automated the configuration management for several servers using Chef and Puppet.
  • Used ClearCase Version Control for Project Configuration Management.
  • Experienced in automating the configuration management using Puppet.
  • Used Puppet Enterprise edition for configuration management.
  • Installed and maintain puppet-based configuration management system.
  • Provide software product build configuration management.
  • Interact with developers and Enterprise Configuration Management Team for changes to best practices and tools to eliminate non-efficient practices and bottlenecks.
  • Deployed Puppet, Puppet Dashboard, and Puppet DB for configuration management to existing infrastructure.
  • Involved in administration, configuration management, monitoring, debugging and performance tuning of Hadoop environments.
  • Performed automation/configuration management using Chef, Ansible, and Docker based containerized applications.
  • Deployed and used Ansible dashboard for configuration management to existing infrastructure.
  • Used Ansible to automate Configuration management.
  • Deploy and monitor scalable infrastructure on Amazon web services (AWS) & configuration management using puppet.
  • Set up multi-node Hadoop cluster with configuration management/deployment tool (Chef).


34. Review Data Backups

low Demand
Here's how Review Data Backups is used in Hadoop Administrator jobs:
  • Manage and review data backups and log files and experience in deploying Java applications on cluster.
  • Manage and review data backups and log files.


35. AWS

low Demand
Here's how AWS is used in Hadoop Administrator jobs:
  • Used AWS (Amazon Web Services) Cloud computing EC2 for provisioning like new instance (VM) creation.
  • Involved in data analysis projects using Elastic Map Reduce on the Amazon Web Services (AWS) cloud.
  • Worked on standards and proof of concept in support of CDH4/5 implementation using AWS cloud infrastructure.
  • Monitored and configured a test cluster on AWS for further testing process and gradual migration.
  • Installed the application on AWS EC2 instances and also configured the storage on S3 buckets.
  • Reviewed firewall settings (security group) and updated on Amazon AWS.
  • Worked on AWS services like EC2 instances on regular basis.
  • Configured AWS IAM and Security Groups.
  • Experience with Amazon Web Services (AWS).
  • Cloud installation utilizing Cloudera Director with AWS provider.
  • Worked on Configuration and administration of Load Balancers, Network and Auto scaling for sub-domains in AWS VPC.
  • Provided guidance on AWS operations and deployment, and best practices throughout the lifecycle of a project.
  • Worked on customizing ambari blueprints, AWS resources, VPC, Networks and Security Groups.
  • Worked on moving some of the data pipelines from CDH cluster to run on AWS.
  • Configured and monitored clusters in AWS and established connections from Hadoop for NoSQL data transfer.
  • Deployed secondary data-storage Hadoop clusters in the cloud using AWS.
  • Installed Hadoop Cluster on AWS for Evaluation purpose.
  • Created EC2 instances and implemented large multi node Hadoop clusters in AWS cloud from scratch.
  • Developed terraform template to deploy Cloudera Manager on AWS.
  • Configured and installed several Hadoop clusters in both physical machines as well as the AWS cloud for POCs.
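
Several bullets above describe creating EC2 instances for Hadoop nodes. A minimal sketch using the boto3 library is shown below; the region, AMI ID, instance type, key pair, and security group ID are placeholder assumptions.

    # Hypothetical sketch: launch one EC2 instance intended as a Hadoop worker node.
    # Region, AMI ID, instance type, key pair, and security group are placeholders.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",            # placeholder AMI
        InstanceType="m5.xlarge",
        MinCount=1,
        MaxCount=1,
        KeyName="hadoop-admin-key",                 # placeholder key pair
        SecurityGroupIds=["sg-0123456789abcdef0"],  # placeholder security group
    )

    for instance in response["Instances"]:
        print("Launched instance:", instance["InstanceId"])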

Show More

36. Log Data

low Demand
Here's how Log Data is used in Hadoop Administrator jobs:
  • Migrated real time log data with FLUME from VVD cluster, Compute, Edge & Management nodes into HDFS.
  • Configured Flume for efficient collection, aggregation and transformation of huge log data from various sources to HDFS.
  • Worked on Flume to collect, aggregate, and store the web log data from different sources.
  • Configured Flume for efficiently collecting, aggregating and moving large amounts of log data.
  • Used Flume to collect, aggregate and store the web log data onto HDFS.
  • Used Flume for Loading log data into HDFS from multiple sources.
  • Load log data into HDFS using Flume.
  • Used the RegEx and JSON serialization/de-serialization (SerDe) libraries packaged with Hive to parse the contents of streamed log data.
  • Involved in efficiently collecting and aggregating large amounts of streaming log data into Hadoop Cluster using Apache Flume.
  • Collected log data from Webservers and integrated into HDFS using Flume.
  • Loaded web log data into HDFS using apache flume for analysis.
  • Collected and aggregated large amounts of log data using HBase.
  • Developed Pig Latin scripts to filter the weblog data.
  • Worked with Flume to import the log data from the reaper logs and syslogs into the Hadoop cluster.
  • Experienced in defining job flows with Oozie; loaded log data directly into HDFS using Flume.
  • Created Oozie workflows to automate data ingestion using Sqoop and process incremental log data ingested by Flume using Pig.
  • Streamed weblog data into HDFS using Flume; exported data from an Oracle database to Hive using Sqoop.
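
The Flume usage described above follows the standard source/channel/sink pattern. A minimal, hypothetical agent configuration is sketched below; the agent name, log file path, and HDFS destination are placeholder assumptions.

    # Hypothetical sketch: generate a minimal Flume agent config that tails a web
    # server log and writes events to HDFS, then start the agent with flume-ng.
    # Agent name, log file path, and HDFS path are placeholder assumptions.
    import subprocess

    flume_conf = """
    weblog.sources  = tailsrc
    weblog.channels = memch
    weblog.sinks    = hdfssink

    weblog.sources.tailsrc.type     = exec
    weblog.sources.tailsrc.command  = tail -F /var/log/httpd/access_log
    weblog.sources.tailsrc.channels = memch

    weblog.channels.memch.type      = memory
    weblog.channels.memch.capacity  = 10000

    weblog.sinks.hdfssink.type      = hdfs
    weblog.sinks.hdfssink.hdfs.path = hdfs://namenode-host:8020/data/weblogs
    weblog.sinks.hdfssink.channel   = memch
    """

    with open("weblog-agent.conf", "w") as f:
        f.write(flume_conf)

    subprocess.run(
        ["flume-ng", "agent", "--name", "weblog",
         "--conf", "/etc/flume/conf", "--conf-file", "weblog-agent.conf"],
        check=True,
    )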

Show More

37. Job Performance

low Demand
Here's how Job Performance is used in Hadoop Administrator jobs:
  • Monitored system activities and fine-tuned system parameters and configurations to optimize job performance and ensure security of systems.
  • Monitored workload, job performance and capacity planning using MapR control systems.
  • Screened Hadoop cluster job performances and capacity planning, setting up queues and scheduler.
  • Monitored workload, job performance and collected metrics for Hadoop cluster when required.
  • Monitored Hadoop cluster, workload, and job performance environments using Datadog and Cloudera Manager.
  • Monitored multiple Hadoop cluster environments using Ganglia and Nagios; monitored workload, job performance, and capacity planning using Ambari.
  • Screened Hadoop cluster job performance and capacity planning; experienced in disaster recovery and high availability of Hadoop clusters/components.
  • Managed nodes on Hadoop cluster and optimized Hadoop cluster job performance using Cloudera Manager.
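
Most of the monitoring described above was done through vendor consoles (Cloudera Manager, Ambari, Ganglia, Nagios, Datadog). The same workload figures can also be pulled programmatically from the YARN ResourceManager REST API, as in this minimal sketch; the ResourceManager host and port are placeholder assumptions.

    # Hypothetical sketch: read cluster workload metrics from the YARN
    # ResourceManager REST API. The host and port are placeholder assumptions.
    import json
    import urllib.request

    RM_METRICS_URL = "http://resourcemanager-host:8088/ws/v1/cluster/metrics"

    with urllib.request.urlopen(RM_METRICS_URL) as resp:
        metrics = json.load(resp)["clusterMetrics"]

    print("Running applications :", metrics["appsRunning"])
    print("Pending applications :", metrics["appsPending"])
    print("Allocated memory (MB):", metrics["allocatedMB"])
    print("Available memory (MB):", metrics["availableMB"])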

Show More

38. Job Tracker

low Demand
Here's how Job Tracker is used in Hadoop Administrator jobs:
  • Enabled Job Tracker and Hue to listen on all interfaces.
  • Implemented schedulers on the JobTracker to share cluster resources for the MapReduce jobs submitted by users.
  • Installed and configured HBase Master and RegionServer services on the cluster; configured high availability for control services like the NameNode and JobTracker.

Show More

39. Impala

low Demand
Here's how Impala is used in Hadoop Administrator jobs:
  • Created transactions using the Hive and Impala query editors and checked the entries in the Navigator logs.
  • Involved in low-level design for MapReduce, Hive, Impala, and shell scripts to process data.
  • Performed installation, upgrade, and configuration tasks for Impala on all machines in a cluster.
  • Worked on Impala performance tuning with different workloads and file formats.
  • Installed MySQL on NFS Server and Created Database for Impala Stats.
  • Installed and configured Drill, Fuse and Impala on MapR-5.1.
  • Created reporting views in Impala using Sentry Policy files.
  • Worked on installation and configuring the Spark2.9 and Impala1.2.3.
  • Implemented IMPALA for data processing on top of HIVE.
  • Granted Data access to Hive and Impala.
  • Implemented load balancing on Impala/Hive DB access with HAProxy load balancing that resulted in 15% reduction in query execution time.
  • Installed and configured Hadoop components MaprFS, Hive, Impala, Pig, Hue.
  • Managed massively parallel processing with Impala alongside HBase and Hive.
  • Installed CDH4 Services such as HDFS, MapReduce, Zookeeper, Hive, Oozie, Hue, and Impala.
  • Used Impala to read, write and query the Hadoop data in HDFS or HBase or Cassandra.
  • Configured MySQL for Hue, Oozie, and Impala; changed ports for HiveServer2.
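
For the ad-hoc query work mentioned above, the standard impala-shell CLI can be driven from a script. A minimal sketch follows; the Impala daemon host and the table name are placeholder assumptions.

    # Hypothetical sketch: run an ad-hoc query through the impala-shell CLI.
    # The impalad host/port and the table name are placeholder assumptions.
    import subprocess

    query = "SELECT COUNT(*) FROM web_logs WHERE status_code = 500;"

    # -i selects the Impala daemon to connect to; -q runs a single query and exits.
    subprocess.run(
        ["impala-shell", "-i", "impalad-host:21000", "-q", query],
        check=True,
    )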

Show More

40. Puppet

low Demand
Here's how Puppet is used in Hadoop Administrator jobs:
  • Designed, automated the process of installation and configuration of secure DataStax Enterprise Cassandra using puppet.
  • Performed centralized management of Linux boxes using Puppet.
  • Designed custom deployment and configuration automation systems to allow for hands-off management of clusters via Cobbler, FUNC, and Puppet.
  • Used Puppet for creating scripts, deployment for servers, and managing changes through Puppet master server on its clients.
  • Configured Security group for EC2 Window and Linux instances and also for puppet master and puppet agents.
  • Assembled Puppet Master, Agent and Database servers on Red Hat Enterprise Linux Platforms.
  • Automated the cloud deployments using Puppet, Python and AWS Cloud Formation Templates.
  • Automated the process of installation and configuration of the nodes using Puppet.
  • Configure and setup multiple nodes with writing Puppet manifest scripts.
  • Used Puppet tool for managing system configurations.
  • Worked with Puppet for automated deployments.
  • Worked with Puppet for application deployment; experienced in developing custom UDFs in Java to extend Hive and Pig Latin functionality.
  • Performed Linux deployments utilizing Puppet and Chef with scalability in Amazon Web Services (AWS).
  • Worked with Puppet for application deployment, helping users with production deployments throughout the process.
  • Worked on the Hue interface for querying data; automated system tasks using Puppet.
  • Installed, configured Hadoop Cluster using Puppet.
  • Manage deployment automation using Puppet and Ansible.
  • Worked with Puppet, Kibana, Elasticsearch, Talend, and Red Hat infrastructure for data ingestion, processing, and storage.

Show More

41. POC

low Demand
Here's how POC is used in Hadoop Administrator jobs:
  • Implemented POC on Cassandra data replication for disaster recovery.
  • Involved in Cluster Capacity planning, deployment and Implementing POC.
  • Developed POC for Apache Kafka.
  • Involved in a POC to implement a failsafe distributed data storage and computation system using Apache YARN.
  • Involved in various POC activity using technology like Map reduce, Hive, Pig, and Oozie.
  • Created POCs for applications interested in Big Data solutions; used Sqoop to transfer data.
  • Prepared Standard Operating Procedures for all the POC activities to reflect in production environment in future.
  • Worked as an admin on the HortonWorks Hadoop distribution for 4 clusters ranging from POC to PROD.
  • Conducted POC for Hadoop and Spark as part of NextGen platform implementation.
  • Worked on POC Recommendation System for social media using Movie lens dataset.
  • Worked on migrating applications from relational database systems by doing POCs.
  • Implemented POC using Hive/Pig and Oozie.
  • Developed POCs on Amazon Web Services (S3, EC2, EMR, etc.).
  • Installed Elasticsearch for a POC and configured it by starting the services.
  • Worked on building a new 5-node VM cluster for a POC.
  • Set up Kerberos locally on 5 node POC cluster using Cloudera manager and evaluated the performance of cluster.
  • Started with POC on Cloudera Hadoop converting one small, medium, complex legacy system into Hadoop.
  • Installed and configured Hortonworks and Cloudera distributions on single node clusters for POCs.
  • Performed a major CDH upgrade from 4.X to 5.X; part of a POC using Kafka, Spark, and HBase for streaming data analysis.
  • Built Kylin multi-dimensional cubes using HBase tables and injected a billion records for POC performance testing/comparison.

Show More

42. Distcp

low Demand
Here's how Distcp is used in Hadoop Administrator jobs:
  • Dumped the data from one cluster to other cluster by using DISTCP, and automated the dumping procedure using shell scripts.
  • Implemented cross realm between two clusters and established a trust for DISTCP.
  • Migrated data across clusters using DISTCP.
  • Performed cluster backup using DistCp.
  • Developed the design for data migration from one cluster to another using DistCp.
  • Backed up data from the active cluster to a backup cluster using DistCp.
  • Performed cluster backup using DistCp, Cloudera Manager BDR, and parallel ingestion.
  • Backed up data on regular basis to a remote cluster using distcp.
  • Dumped the data from one cluster to another cluster by using Distcp.
  • Backed up data on a regular basis to a remote cluster using DistCp; fine-tuned Hive jobs for better performance.
  • Performed data ingestion using Hadoop DistCp and Java file transfer techniques.
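
The cluster-to-cluster copies above use the standard hadoop distcp tool. A minimal sketch of wrapping it in a backup script is shown below; the NameNode hosts and paths are placeholder assumptions.

    # Hypothetical sketch: copy a warehouse directory from a production cluster to
    # a backup cluster with "hadoop distcp". Hosts and paths are placeholders.
    import subprocess

    src = "hdfs://prod-namenode:8020/data/warehouse"
    dst = "hdfs://backup-namenode:8020/backups/warehouse"

    # -update copies only new or changed files; -p preserves file attributes.
    subprocess.run(["hadoop", "distcp", "-update", "-p", src, dst], check=True)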

Show More

43. Cdh3

low Demand
Here's how Cdh3 is used in Hadoop Administrator jobs:
  • Installed and configured CDH3, 4 and 5 on each cluster on Ubuntu.
  • Created and published installation guides for CDH3, 4, and 5.
  • Worked on performing minor upgrade from CDH3-u4 to CDH3-u6.
  • Upgraded the Cluster from CDH3U0 to CDH3U1.
  • Installed and configured MapReduce, HIVE and the HDFS; implemented CDH3 Hadoop cluster on CentOS.
  • Experience with Cloudera CDH3, CDH4 and CDH5 distributions.
  • Implemented ten nodes CDH3 Hadoop cluster on Ubuntu LINUX.
  • Implemented CDH3 Hadoop cluster on RedHat Enterprise Linux 6.4.
  • Upgraded the Hadoop cluster from cdh3 to cdh4.
  • Experience in installation, configuration, supporting and managing Hadoop Clusters using Apache, Cloudera (CDH3, CDH4) distributions.
  • Installed and managed Hadoop distributions: CDH3, CDH4, Cloudera Manager, MapR, and Hortonworks.
  • Worked on performing a minor upgrade from CDH4.u3 to CDH4.u5; upgraded the Hadoop cluster from CDH3 to CDH4.
  • Upgraded the Hadoop cluster from CDH3 to CDH4 using Cloudera distribution packages.
  • Implemented CDH3 Hadoop cluster on Redhat & CentOS.
  • Involved in the process of designing, installation and configuration of Cloudera Hadoop cluster (CDH3) using Cloudera manager.
  • Involved in architecting Hadoop clusters using major Hadoop Distributions - CDH3 & CDH4.
  • Implemented nine node CDH3 Hadoop cluster on Redhat LINUX.

Show More

44. Fair Scheduler

low Demand
Here's how Fair Scheduler is used in Hadoop Administrator jobs:
  • Configured Fair Scheduler to provide service-level agreements for multiple users of a cluster.
  • Configured Fair Scheduler to provide service-level agreements for various teams.
  • Enabled resource management using Fair scheduler.
  • Used Fair Scheduler to manage Map Reduce jobs so that each job gets roughly the same amount of CPU time.
  • Use of fair scheduler to manage map reduce jobs for equivalent execution time of all the jobs.
  • Implemented Fair scheduler on the job tracker to allocate the fair amount of resources to small jobs.
  • Set up queues in Fair scheduler for efficient usage of cluster resources by different users.
  • Configured Fair Scheduler to provide fair resources to all the applications across the cluster.
  • Implemented Fair Schedulers to share the resources of the cluster for MapReduce jobs.
  • Configured memory and v-cores for the dynamic resource pools within the fair scheduler.
  • Configured Fair scheduler to share the resources of the cluster.
  • Advanced knowledge in configuring Fair scheduler in cluster.
  • Involved in scheduling jobs using fair scheduler.
  • Enabled Fair Scheduler to Share resources fairly.
  • Define job flows using fair scheduler.
  • Job management using Fair scheduler.
  • Configured Fair scheduler to ensure proper resources usage of the Cluster for Map Reduce jobs submitted by the users.
  • Set up different queues using Fair schedulers to share cluster resources between Datameer jobs and regular jobs.
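
Fair Scheduler queues like those described above are defined in an allocation file referenced by yarn.scheduler.fair.allocation.file in yarn-site.xml. A minimal, hypothetical allocation file is sketched below; the queue names, weights, and resource figures are placeholder assumptions.

    # Hypothetical sketch: write a minimal fair-scheduler allocation file with two
    # queues. Queue names, weights, and minimum resources are placeholders.
    allocations = """<?xml version="1.0"?>
    <allocations>
      <queue name="etl">
        <weight>2.0</weight>
        <minResources>20000 mb, 10 vcores</minResources>
        <schedulingPolicy>fair</schedulingPolicy>
      </queue>
      <queue name="adhoc">
        <weight>1.0</weight>
        <schedulingPolicy>fair</schedulingPolicy>
      </queue>
    </allocations>
    """

    with open("fair-scheduler.xml", "w") as f:
        f.write(allocations)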

Show More

45. Relational Databases

low Demand
Here's how Relational Databases is used in Hadoop Administrator jobs:
  • Assisted in exporting analyzed data to relational databases using Sqoop.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Imported the data from relational databases into HDFS using Sqoop.
  • Exported the analyzed data to relational databases; deployed Hadoop clusters in fully distributed and pseudo-distributed modes.
  • Installed and Configured Sqoop to import and export the data into MapR-FS, HBase and Hive from Relational databases.
  • Copied data from relational databases to Hadoop HDFS using Sqoop to fulfill business requirements.
  • Load data from relational databases into MapR-FS filesystem and HBase using Sqoop.
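
The Sqoop imports described above are typically run from the sqoop CLI. A minimal sketch is shown below; the JDBC URL, credentials file, table, and target directory are placeholder assumptions.

    # Hypothetical sketch: import one table from MySQL into HDFS with the sqoop CLI.
    # JDBC URL, credentials, table name, and target directory are placeholders.
    import subprocess

    subprocess.run(
        ["sqoop", "import",
         "--connect", "jdbc:mysql://mysql-host:3306/sales",
         "--username", "etl_user",
         "--password-file", "/user/etl/.mysql_password",  # keeps the password off the command line
         "--table", "orders",
         "--target-dir", "/data/sales/orders",
         "--num-mappers", "4"],
        check=True,
    )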

Show More

46. Pig Scripts

low Demand
Here's how Pig Scripts is used in Hadoop Administrator jobs:
  • Analyzed the data by performing Hive queries and running Pig scripts to know user behavior.
  • Analyzed data using Pig and wrote Pig scripts by grouping, joining and sorting data.
  • Used Java to develop User Defined Functions (UDF) for Pig Scripts.
  • Developed Pig scripts in the areas where extensive coding needs to be reduced.
  • Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries and Pig Scripts.
  • Analyzed the data by using Hive queries and running Pig scripts to study customer behaviour.
  • Implemented best income logic using Pig scripts and UDFs.
  • Developed Pig scripts to transform the data into a structured format and automated them through Oozie coordinators.
  • Installed Oozie workflow engine to schedule Hive and PIG scripts.
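
As a minimal, hypothetical example of the Pig work described above (input path, output path, and field layout are placeholder assumptions), a short Pig Latin script can be generated and run with the pig CLI:

    # Hypothetical sketch: write a small Pig Latin script that counts weblog hits
    # per IP address and run it with the pig CLI. Paths and fields are placeholders.
    import subprocess

    pig_script = """
    logs  = LOAD '/data/weblogs' USING PigStorage('\\t')
            AS (ip:chararray, ts:chararray, url:chararray, status:int);
    by_ip = GROUP logs BY ip;
    hits  = FOREACH by_ip GENERATE group AS ip, COUNT(logs) AS hit_count;
    STORE hits INTO '/data/weblog_hits';
    """

    with open("weblog_hits.pig", "w") as f:
        f.write(pig_script)

    subprocess.run(["pig", "-f", "weblog_hits.pig"], check=True)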

Show More

47. Capacity Scheduler

low Demand
Here's how Capacity Scheduler is used in Hadoop Administrator jobs:
  • Configured Capacity scheduler for Queue and Resource management.
  • Implemented Fair schedulers and Capacity schedulers to share the resources of the cluster with other teams to run map reduce jobs.
  • Implemented Capacity Scheduler to share the resources of the cluster for the map reduce jobs given by the users.
  • Configured Capacity Scheduler on the Resource Manager to provide a way to share large cluster resources.
  • Worked on YARN capacity scheduler by creating queues to allocate resource guarantee to specific groups.
  • Implemented Fair scheduler and capacity scheduler to allocate fair amount of resources to small jobs.
  • Experience in managing the cluster resources by implementing fair scheduler and capacity scheduler.
  • Configured capacity scheduler and allocated resources to various pool through YARN queue manager.
  • Configured YARN queues - based on Capacity Scheduler for resource management.
  • Worked on Configuring queues in capacity scheduler.
  • Configured the capacity scheduler in the Dev cluster and allocated the resources as required by enabling the queues.
  • Worked on analyzing data with Hive and Pig; implemented the Fair and Capacity Schedulers.
  • Scheduled Hadoop jobs using the FIFO and Fair Schedulers along with the Capacity Scheduler.
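
The Capacity Scheduler queues mentioned above are declared in capacity-scheduler.xml. A minimal, hypothetical two-queue layout is sketched below; queue names and capacity percentages are placeholder assumptions.

    # Hypothetical sketch: a minimal capacity-scheduler.xml with two queues under
    # root whose capacities sum to 100%. Names and percentages are placeholders.
    capacity_conf = """<?xml version="1.0"?>
    <configuration>
      <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>etl,adhoc</value>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.etl.capacity</name>
        <value>70</value>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.adhoc.capacity</name>
        <value>30</value>
      </property>
    </configuration>
    """

    with open("capacity-scheduler.xml", "w") as f:
        f.write(capacity_conf)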

Show More

48. Ecosystem Components

low Demand
Here's how Ecosystem Components is used in Hadoop Administrator jobs:
  • Analyzed logs and resolved issues with in the cluster and ecosystem components.
  • Involved in installing Hadoop ecosystem components.
  • Experience installing Hadoop Ecosystem components.
  • Worked on developing, installing and configuring Hadoop ecosystem components that moved data from individual servers to HDFS.
  • Involved in performance tuning of various hadoop ecosystem components like YARN, MRv2.
  • Enabled High-Availability for Resource Manager and several ecosystem components including Hiveserver2, Hive Metastore, and HBase.
  • Upgrade of Hadoop ecosystem components (Pig, Hive, oozie).
  • Experience in installing Hortonworks stack of Hadoop and other ecosystem components.
  • Installed, configured and integrated Hadoop Ecosystem components (Hive, Sqoop, Flume, PIG, and Tez).
  • Installed and configured a multi-node, fully distributed Hadoop cluster; involved in installing Hadoop ecosystem components.
  • Deployed Hortonworks Hadoop Ecosystem components such as Sqoop, Hbase and Mapreduce.

Show More

49. Rdbms

low Demand
Here's how Rdbms is used in Hadoop Administrator jobs:
  • Installed, configured, tested and administration of RDBMS/NoSQL database clusters in AWS Virtual Private Cloud Network.
  • Developed Data Quality checks to match the ingested data with source in RDBMS using Hive.
  • Assisted in creation of ETL processes for transformation of data sources from existing RDBMS systems.
  • Experience in data migration from RDBMS to Cassandra.
  • Implemented generic export framework for moving data from HDFS to RDBMS and vice-versa.
  • Imported data using Sqoop to load data from RDBMS to HDFS whenever required.
  • Imported/exported data from RDBMS to HDFS using Data Ingestion tools like Sqoop.
  • Moved data from HDFS to RDBMS and vice-versa using SQOOP.
  • Handled incremental data loads from RDBMS into HDFS using Sqoop.
  • Involved in transferring data between RDBMS and HDFS using Sqoop.
  • Analyzed differences between Traditional RDBMS database and HBase.
  • Migrated the existing data to Hadoop from RDBMS (SQL Server and Oracle) using sqoop for processing the data.
  • Used Sqoop to import the data to Hadoop Distributed File System (HDFS) from RDBMS.
  • Used HBase tables to load semi-structured, structured and un-structured data from existing RDBMS.
  • Used Sqoop to import and export data between HDFS and RDBMS.
  • Migrated structured data from multiple RDBMS servers to Hadoop platform using Sqoop.
  • Worked on data imports from various RDBMS systems like MySQL, Oracle, and MS SQL Server.
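
The reverse direction mentioned above, moving results from HDFS back into an RDBMS, is handled by sqoop export. A minimal sketch follows; the JDBC URL, credentials file, table, and export directory are placeholder assumptions.

    # Hypothetical sketch: export analyzed results from HDFS into an Oracle table
    # with the sqoop CLI. Connection details, table, and paths are placeholders.
    import subprocess

    subprocess.run(
        ["sqoop", "export",
         "--connect", "jdbc:oracle:thin:@oracle-host:1521:ORCL",
         "--username", "report_user",
         "--password-file", "/user/etl/.oracle_password",
         "--table", "DAILY_SUMMARY",
         "--export-dir", "/data/output/daily_summary"],
        check=True,
    )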

Show More

50. OS

low Demand
Here's how OS is used in Hadoop Administrator jobs:
  • Involved closely with developers for choosing right compaction strategies and consistency levels.
  • Resolve application problems, support end-users, and manage the processes of reporting, diagnosing, and troubleshooting.
  • Hands-on experience with cluster upgrades and patching without any data loss and with proper backup plans.
  • Evaluate and propose new tools and technologies to meet the needs of the organization.
  • Developed the ingestion pipeline framework for collection of surveys across all customers and types.
  • Worked closely with data analysts to construct creative solutions for their analysis tasks.
  • Created a local YUM repository for installing and updating packages.
  • Supported MapReduce programs running on the cluster.
  • Implemented JMS for asynchronous auditing purposes.
  • Built tables, views, indexes, and macros; created roles, profiles, users, and database objects.
  • Proposed new hardware/software environments required for Hadoop and expand existing environments.
  • Worked on Hive to expose data for further analysis and to transform files from different analytical formats into text files.
  • Visa Inc is an American multinational financial services corporation headquartered in Foster City, California, United States.
  • Perform maintenance, monitoring, deployments, and upgrades across infrastructure that supports all our Hadoop clusters.
  • Installed CentOS on multiple servers using PXE (Preboot Execution Environment) boot and the Kickstart method.
  • Experience in using distcp to migrate data between and across the clusters.
  • Perform cluster validation and run various pre-install and post install tests.
  • Configured Metastore for Hadoop ecosystem and management tools.
  • Created Virtual server on Citrix Xen Server based host and installed operating system on Guest Servers.
  • Provided Hadoop, OS, Hardware optimizations.

Show More

20 Most Common Skills for a Hadoop Administrator

Hadoop: 14%
Hdfs: 7.3%
Sqoop: 6.7%
Cluster Nodes: 5.7%
Hbase: 5.6%
Flume: 5.5%
Oozie: 5.2%
Cloudera Hadoop: 5.1%

Typical Skill-Sets Required For A Hadoop Administrator

Rank | Skill | Percentage of Resumes
1 | Hadoop | 9.9%
2 | Hdfs | 5.1%
3 | Sqoop | 4.7%
4 | Cluster Nodes | 4%
5 | Hbase | 3.9%
6 | Flume | 3.9%
7 | Oozie | 3.7%
8 | Cloudera Hadoop | 3.6%
9 | SQL | 3.2%
10 | Review Log Files | 3.2%
11 | Linux | 3.1%
12 | Mapreduce | 3.1%
13 | Capacity Planning | 2.8%
14 | Zookeeper | 2.8%
15 | High Availability | 2.6%
16 | Kerberos | 2.6%
17 | CDH | 2.1%
18 | Data Nodes | 2.1%
19 | File System | 1.9%
20 | Setup | 1.9%
21 | Nagios | 1.8%
22 | Unix | 1.7%
23 | Hortonworks | 1.4%
24 | Hive Tables | 1.3%
25 | Ambari | 1.2%
26 | HDP | 1.2%
27 | Name Node | 1.1%
28 | ETL | 1.1%
29 | Nosql | 1%
30 | Version Upgrades | 1%
31 | Kafka | 1%
32 | Workflow Engine | 1%
33 | Configuration Management | 1%
34 | Review Data Backups | 1%
35 | AWS | 1%
36 | Log Data | 1%
37 | Job Performance | 0.9%
38 | Job Tracker | 0.9%
39 | Impala | 0.8%
40 | Puppet | 0.8%
41 | POC | 0.8%
42 | Distcp | 0.8%
43 | Cdh3 | 0.8%
44 | Fair Scheduler | 0.8%
45 | Relational Databases | 0.8%
46 | Pig Scripts | 0.8%
47 | Capacity Scheduler | 0.7%
48 | Ecosystem Components | 0.7%
49 | Rdbms | 0.7%
50 | OS | 0.6%
