Top Data Scientist Skills

Below we've compiled a list of the most important skills for a Data Scientist. We ranked the top skills by the percentage of Data Scientist resumes they appeared on; for example, 8.4% of Data Scientist resumes contained R as a skill. Let's find out which skills a Data Scientist actually needs to succeed in the workplace.

The six most common skills found on Data Scientist resumes in 2020 lead the ranking. Read below to see the full list.

1. R

High Demand
Here's how R is used in Data Scientist jobs:
  • Developed promotional pricing optimization procedures.
  • Developed analyses to differentiate oilfield water utilization and production trends by their corresponding geological formation in the greater Permian basin.
  • Implemented algorithms to analyze credit card purchases in order to provide specialized recommendation to customers based on their purchase history.
  • Worked on dependency parsing of unstructured product review data for sentiment and aspect based analysis by computing specification scores.
  • Collaborated with collections to identify characteristics of customers likely to 'never pay' Sprint through a decision tree analysis.
  • Collaborated with product management and engineering departments to understand every California Community College's needs and devised possible solutions.
  • Designed and implemented statistical and machine learning pipelines for data exploration, feature engineering/extraction, and predictive modeling.
  • Developed and implemented an automated flight scheduling solution which optimized the Naval Academy's Powered Flight Program.
  • Completed stress testing on different portfolios under different market scenarios to further develop our forecasting models.
  • Worked on multiple projects to leverage statistical learning/machine learning algorithms to automate Alternate Asset Servicing.
  • Measure effectiveness of marketing campaigns, reporting results to marketing leaders with recommendations for future campaigns.
  • Managed offshore resources and acted as a communication funnel to upper management helping determine priorities.
  • Implemented new statistical or other mathematical methodologies as needed for specific models or analysis.
  • Help merchants by providing recommendations to optimize inventory & space allocations and reduce out-of-stocks.
  • Develop database systems to streamline wildlife trade control beginning with internationally protected fish and invertebrates
  • Implemented statistical methods to match new songs to significantly similar audio profiles.
  • Build time-series forecasting models to determine if internal advertisements are effective.
  • Support development of client-side, cloud-based predictive analysis engine software portal.
  • Follow agile software development methods to develop an internal reporting application.
  • Provided graphic reports to illustrate portfolio performance under various market trends.


2. PL/SQL

High Demand
Here's how PL/SQL is used in Data Scientist jobs:
  • Designed and Developed Oracle11g, PL/SQL Procedures and UNIX Shell Scripts for Data Import/Export and Data Conversions.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*PLUS and PL/SQL.
  • Provide analysis, validation and synchronization PL/SQL scripts for system data initialization.
  • Developed PL/SQL procedures for specialized processes and select data transformations.
  • Delivered models and prototypes for multiple browser-based visualizations for TurboTax user attrition leveraging D3 JavaScript API and Oracle PL/SQL.
  • Write PL/SQL packages, store procedures and functions to load the data and generate batch files to initiate the load process.


3. Python

High Demand
Here's how Python is used in Data Scientist jobs:
  • Created categorization models in Python to identify customer loss and metrics to identify customers prior to discontinuation for improved retention.
  • Translated scientific and statistical designs to a developer based on common Python coding language.
  • Developed python based statistical visualization to provide insights of fuzzy social media data.
  • Manipulated internal database and scraped different open data sources online in Python.
  • Improved curriculum materials in python, machine learning and statistical inference.
  • Implemented a Python-based distributed random forest via Python streaming.
  • Build customized filters in Python to achieve initial candidate selection.
  • Evaluated multiprocessing infrastructure in R versus python.
  • Implemented Support Vector Machine (SVM), Logistic regression model, K-means clustering for predictive analysis using python Scikit-Learn 0.18.
  • Analyzed and designed feature set to maximize model fit with R. Implemented the machine learning algorithm into production software utilizing Python.
  • Analyzed, verified, and modified UNIX, SAS, and Python scripts to improve data quality and performance.
  • Utilized Java, Python, and SQL to develop and maintain customized algorithms that meet customer and business needs.
  • Created data acquisition and cleaning protocol using the newest technology including MongoDB, Python, R, and AWS
  • Explored, debugged and improved fundamental data system, efficiently access data queue and process data using Python.
  • Trained neural network model for prediction of approval of credit card for a given client using python libraries.
  • Conduct regular training on how to use AWS, the command line, and python for data science.
  • Assisted the project with Python programming, coding and running QA on the same from time to time.
  • Created Python scripts using regular expressions to scan documents for contact information and answers to survey questions.
  • Used Python as a programming tool to analyze public MTA turnstile data to make an informed decision.
  • Mined the non-digitized patents with Python code to clean and to strip them into an analysis-ready format.

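One of the bullets above describes using regular expressions to scan documents for contact information. Here's a minimal sketch of that task in Python; the sample text and patterns are illustrative placeholders, not taken from any specific resume:

```python
# Minimal sketch: scan text for emails and US-style phone numbers with regex.
import re

text = "Contact Jane Doe at jane.doe@example.com or (555) 123-4567."
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text)
phones = re.findall(r"\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}", text)
print(emails, phones)  # ['jane.doe@example.com'] ['(555) 123-4567']
```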

4. Analytics

High Demand
Here's how Analytics is used in Data Scientist jobs:
  • Advance an LLNL program's fundamental understanding of advanced data analytics by modernizing related processes using applied research and development.
  • Investigate existing and emerging technologies to explore analytics solutions for business partners, and make recommendations for enterprise-wide implementation.
  • Mentored sophisticated organizations on large scale data and analytics using advanced statistical and machine learning models.
  • Evaluated potential vendors and products to enable continued maturity of analytics tools and safeguard data integrity.
  • Designed and applied statistical and mathematical methods for corporate analytics that were implemented into client-facing products.
  • Provided external analytics modeling consultant services to commercial teams in businesses outside the chemical industry.
  • Analyzed and processed complex data sets using advanced querying, visualization and analytics tools.
  • Recommended and evaluated marketing approaches based on quality analytics of customer consuming behavior.
  • Maintained a $542M utilities budget by delivering business intelligence and predictive analytics.
  • Consulted in Competitive Intelligence, Due Diligence, Strategic Analytics and Scientific Assessment
  • Implemented a Google Analytics connector and built visualizations for important KPIs.
  • Helped major companies and organizations use predictive analytics to inform business strategy.
  • Fantasy Football Predictive Analytics: Use predictive modeling to project future fantasy points
  • Provided thought leadership for integrating business concepts to analytics and data mining.
  • Implemented end-to-end systems for Data Analytics, Data Automation and Integration.
  • Constructed vision for analytics improvements on retail and hospitality clients.
  • Performed behavioral analytics to identify various demographic patterns of users.
  • Improved analytics for every customer-facing department in the company.
  • Delivered in-depth descriptive analytics on U.S. cancer demographics.
  • Provided consulting services to clients regarding predictive analytics initiatives.


5. Algorithms

High Demand
Here's how Algorithms is used in Data Scientist jobs:
  • Designed and implemented proprietary distributed clustering algorithms
  • Executed massive voice-of-customer initiative by scraping Twitter feeds and developing machine learning algorithms to assess the sentiment of each tweet.
  • Implemented supervised machine learning algorithms to predict the engine performance based on the selected features using multivariate regressions.
  • Led efforts to develop feature extraction algorithms on Twitter data for detection and identification of social systems.
  • Developed offline model of current algorithm as well as implemented several offline algorithms that improved upon it.
  • Developed complex algorithms to monitor suspicious transactions that increased due diligence of regulatory requirement by 40%.
  • Designed different query algorithms for obtaining relevant images pertaining to a color using the above indexed data.
  • Developed a program to identify facial expressions in images using information retrieval techniques and machine learning algorithms.
  • Performed a variety of statistical algorithms including cluster analysis, predictive modeling, and Bayesian networks.
  • Worked with software engineering teams to implement and verify production statistical modeling and machine learning algorithms.
  • Formulate and implement clinical quality measure algorithms and validation strategies to ensure data validity.
  • Designed and launched the new Business-to-Business Customer Experience Index, including developing proprietary algorithms.
  • Worked on Clustering and factor analysis for classification of data using machine learning algorithms.
  • Developed NLP and predictive algorithms to turn massive review data into actionable information.
  • Utilized spatial-temporal cluster algorithms to detect the activity clusters and stay points for patients
  • Designed and implemented machine learning algorithms to enhance existing data mining capabilities.
  • Refine and improve processes and algorithms with technical input from investigative analysts.
  • Developed predictive algorithms optimizing daily manpower needs for maximal warehouse throughput.
  • Provide data expertise to data scientist for developing machine learning algorithms.
  • Designed novel algorithms to detect market manipulation in high frequency trading.


6. Hadoop

High Demand
Here's how Hadoop is used in Data Scientist jobs:
  • Discovered interesting correlations between Wikipedia traffic volume spike and news events using R and Hadoop.
  • Played integral part in transitioning company from relational databases to Hadoop based data-store.
  • Implemented distributed algorithms in the Hadoop environment using Hive MapReduce.
  • Developed Hadoop to MongoDB integration.
  • Worked with Ajax API calls to communicate with Hadoop through Impala Connection and SQL to render the required data through it.
  • Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools using R, Mahout, Hadoop and MongoDB.
  • Experience in developing custom Map Reduce Programs in Java using Apache Hadoop for analyzing Big Data as per the requirement.
  • Farm machine logs collector: gathers information from several types of farm machines and stores it in Apache Hadoop.
  • Review current technology for storing and processing Big Data including Hadoop, SPARK, MATLAB, and IBM BigInsights.
  • Worked on ingesting company data into Hadoop to form a data lake and analyzing it using DDL statements.
  • Hive and Impala queries written in Hadoop cluster to analyze pattern of usage of various services by subscribers.
  • Used Hadoop - HIVE to fit the complete data and HIVE queries to perform Data Munging.
  • Created visualizations using R. Environment: Linux, Hadoop, MySQL, R, RStudio.
  • Established partnerships with top Big Data vendors, DataStax/Cassandra, DataBricks/Spark and 3 Hadoop distributions.
  • Designed and implemented software using the following tools: Hadoop, Cassandra, and Java.
  • Delivered Internal Training classes on Big Data Hadoop, Spark to ramp up the teams.
  • Worked on Hadoop cluster and data querying tools Hive to store and retrieve data.
  • Gained firm understanding of Hadoop architecture, which involves data processing in various nodes.
  • Used client's Hadoop platform to pull data via Hive and Pig queries.
  • Configured and built the Hadoop clusters for Production, Staging and Test environments.


7. Logistic Regression

High Demand
Here's how Logistic Regression is used in Data Scientist jobs:
  • Developed machine learning models utilizing Logistic Regression and Random Forests to identify human characteristics and behavior to derive striking insights.
  • Designed and developed data wrangling and visualization techniques as well as a classification engine based on Logistic Regression.
  • Build Ordinal Logistic Regression or Generalized Logistic Regression to present the relationship between injury level and risk factors.
  • Developed a multiple logistic regression prediction model to evaluate quality of mortgages.
  • Experience with supply chain management analyzing data with logistic regression model.
  • Developed Random forest and logistic regression models to observe this classification.
  • Implemented logistic regression algorithm and measured the model accuracy.
  • Build analytic models using a variety of techniques such as logistic regression, risk scorecards and pattern recognition technologies.
  • Performed logistic regression within each cluster on other predictors like image size, name, and caption.
  • Used R to pull large-scale data and help build logistic regressions to predict customer churn rate.
  • Fit Logistic Regression, Linear Regression Models on training data, using SciKit Learn.
  • Used Decision tree CART and Logistic Regression to identify the loan defaulters.
  • Constructed, fitted and diagnosed a logistic regression model for cancer data.
  • Developed Statistical Analysis and Response Modeling for Analytical Data base contributors (logistic regression).
  • Performed analyses using classification trees, principle component analysis, and logistic regression.
  • Used GLM packages in R for implementation of Multi Band Logistic Regression which can give a linear output.
  • Developed a Logistic Regression based filter that uses merchant transactions data to block false positive merchant alerts.
  • Compared frequentist logistic regression methods to Bayesian Markov Chain Monte Carlo methods.
  • Combined IMDb rating data with Oscars demographics data to build a classification model, using logistic regression and random forest.
  • Trained a model for multi-label data; implemented a logistic regression classifier on 34 GB of data using Apache Spark.

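For readers new to the technique, here's a minimal scikit-learn sketch of fitting and scoring a logistic regression classifier; synthetic data stands in for the mortgage, churn, and loan-default data mentioned above:

```python
# Minimal sketch: fit a logistic regression on synthetic binary-class data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
# Predicted probability of the positive class for the first test row:
print("P(positive):", clf.predict_proba(X_test[:1])[0, 1])
```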

8. Data Warehouse

High Demand
Here's how Data Warehouse is used in Data Scientist jobs:
  • Improved performance of the existing data warehouse applications to increase the efficiency of the system.
  • Conducted one-to-one sessions with business users to gather data for Data Warehouse requirements.
  • Manage data warehouse and retrieve data from MySQL database for analysis
  • Translated operational rules into ETL and data warehouse requirements.
  • Created views, queries, and data warehouse reports using SSRS providing management with financial information from SQL Server production databases.
  • Created and optimized processes in the Data Warehouse to import, retrieve and analyze data from the CyberLife database.
  • Provided inputs to development team in performing extraction, transformation and load for data marts and data warehouses.
  • Worked as Big Data Architect and data engineer to ensure business value is achieved in 1.5M data warehouse.
  • Manage EFA data warehouse and develop data acquisition, ETL and data quality control framework and software.
  • Developed the data warehouse model (star schema) for the proposed central model for the project.
  • Worked on enhancements to the Data Warehouse model using Erwin as per the business reporting requirements.
  • Maintained SQL scripts to create and populate tables in data warehouse for daily reporting across departments.
  • Used T-SQL queries to pull the data from disparate systems and Data warehouse in different environments.
  • Managed data warehouse (DW) release deployment and maintenance in an Agile development environment.
  • Implemented and maintained a data warehouse used by over 130 employees.
  • Assist R enterprise implementation in the enterprise oracle data warehouse platform.
  • Build ETL scripts and provided Oracle Data Warehouse tables design.
  • Experience extracting from Data Warehouse using HIVE and Impala.
  • Optimized OLAP system for MSN's Data Warehouse.
  • Re-engineered data warehouse within 8 weeks.


9. Big Data

High Demand
Here's how Big Data is used in Data Scientist jobs:
  • Automated and integrated customization algorithm for different business partners into big data environment.
  • Developed and evangelized best practices for statistical analysis of Big Data.
  • Developed and implemented big data and machine learning capabilities that could predict and optimize key quantities and events of the business.
  • Devised Big Data adoption plan to assess existing IT systems and screen hiring applicants to help management deploy said strategy.
  • Voice of Customer was a system that used Big Data technologies to get insight into customer behavior.
  • Used ElasticSearch (Big Data) to retrieve data into application as required.
  • Utilized MapR as a low-risk big data solution to build a digital oilfield.
  • Evaluate large databases (terabyte+) using big data methodology.
  • Build Churn Prediction models using Big Data Platform.
  • Worked with big data using Apache Spark.
  • Utilized advanced methods of big data analytics, machine learning, artificial intelligence, wave equation modeling, and statistical analysis.
  • Analyzed data and recommended new strategies for root cause and finding quickest way to solve big data sets.
  • Tasked to evaluate and integrate Fusion Charts libraries as strategic reporting tool for Big Data Analytics platform.
  • Utilized specific big-data-oriented hardware like Cray's Urika-GX.

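Several bullets above mention Apache Spark. Here's a minimal PySpark sketch of that kind of large-scale aggregation, with a toy DataFrame standing in for real customer data:

```python
# Minimal sketch: aggregate a Spark DataFrame; the rows are toy placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("big-data-demo").getOrCreate()
df = spark.createDataFrame(
    [("A", 120.0), ("B", 75.5), ("A", 60.0)],
    ["segment", "spend"],
)
df.groupBy("segment").sum("spend").show()
spark.stop()
```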

10. Amazon Web Services

High Demand
Here's how Amazon Web Services is used in Data Scientist jobs:
  • Implemented the web interface using Flask and deployed it using Amazon Web Service.
  • Utilized Amazon Web Services for cloud computing, from networking to storage.
  • Used Amazon Web Services to host website for app.
  • Worked with the database on Amazon Web Services.
  • Worked in Amazon Web Services cloud computing environment.
  • Used Amazon Web Services S3 to store large amounts of data in a repository.


11. SAS

High Demand
Here's how SAS is used in Data Scientist jobs:
  • Performed standard SAS Administration duties such as creating users/groups, defining libraries/databases, and registering metadata.
  • Led the analysis in SAS for data integration of mortality data using meta-analysis integration methods.
  • Provided technical support on SAS programming related to data manipulation and analysis.
  • Developed and performance tuned various SAS modules hosting reporting applications.
  • Changed our model codes and results for smooth connection with SAS MO system and managed the testing, implementation of MO.
  • Created modeling platform in SAS and Visual Basic 6.0, enabling efficient estimation of linear, logistic, and Tobit models.
  • Analyze business problems and design Statistical models using Regression and Machine Learning, using SAS, R, and H2O.
  • Produced RTF, CSV, PDF and HTML formatted files using SAS to produce reports for presentation and further analysis.
  • Created SAS-based system that extracts and mines queue time data and identifies operations whose queue times have correlations to yield.
  • Use SAS to do text analysis for police report (CDS, SEMCOG) to extract crash micro scenario.
  • Solved and explained discrepancies in financial health insurance data using SAS software and applying data management techniques.
  • Performed Data Validation and Data Cleaning using PROC SORT, PROC FREQ and through various SAS formats.
  • Used R, SAS and SQL to manipulate data, and develop and validate quantitative models.
  • Promoted enterprise-wide business intelligence by enabling report access in SAS BI Portal and on Tableau Server.
  • Worked with DBMI, SAS_IT, BI team and automated 2 mature modeling and scoring process.
  • Developed and executed SAS SQL queries for merging, concatenating and updating large volumes of data.
  • Used SAS/Macro facility to create macros for statistical analysis, reporting results and data extraction.
  • Developed Predictive Clinical Score Index for severe morbidity after coronary artery surgery (SAS).
  • Included detailed product profitability study and market forecasts (SAS, ARIMA, Regression).
  • Created complex reports utilizing SAS, Microsoft Word, Microsoft Excel, and R studio.


12. Data Visualization

High Demand
Here's how Data Visualization is used in Data Scientist jobs:
  • Implemented clinical reporting programs that were utilized by both clinical and data management teams which aided in data visualization and reporting.
  • Key internal adviser on statistical modeling, machine learning, data validation, data visualization, and business intelligence processes.
  • Provided clients with high quality data visualizations including static plots, animations, and interactive Shiny applications.
  • Provided training for executives on statistical analysis, Tableau, data visualization and storytelling best practices.
  • Model selection, Statistical analysis, Time Series analysis, Frequency domain analysis, Data Visualization.
  • Developed technical documents and reports, and designing data visualizations to communicate complex analysis results.
  • Specialized in developing intuitive user interfaces and data visualization tools within a team environment.
  • Create reports by using data visualization on Tableau to provide strategy recommendations.
  • Manipulated and prepared the data for data visualization and report generation.
  • Performed Inventory Analysis with Statistical and Data Visualization Tools.
  • Translate all data to statistical reports with data visualizations.
  • Produced interactive data visualizations using Tableau and Python.
  • Build web interface for data visualization and analysis
  • Develop back-end web applications, such as querying, data visualization, and real-time data processing for data services.
  • Applied Large scale and low latency Machine learning to Non-parametric models, Fraud detection models, High dimensional data visualization.
  • Deployed data-driven tools to enable a large restaurant chain reduce food-waste by 15% through demand forecasting and data visualizations.
  • Conceived data visualization techniques using R, Microsoft Excel and Tableau to identify data relationships, trends and anomalies.
  • Used R to create various data visualization plots between RSSI and logarithmic distance among various nodes.
  • Work on projects about data visualization, text mining, deep learning and computer vision.
  • Prepared Data Visualization reports for the management using R, Tableau, and Power BI.

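As a minimal illustration of the static-plot work described above, here's a short Matplotlib sketch; the revenue series is synthetic stand-in data:

```python
# Minimal sketch: a labeled line plot saved to disk; the data is synthetic.
import matplotlib.pyplot as plt
import numpy as np

months = np.arange(1, 13)
revenue = 100 + 5 * months + np.random.default_rng(0).normal(0, 4, 12)

plt.plot(months, revenue, marker="o")
plt.xlabel("Month")
plt.ylabel("Revenue ($K)")
plt.title("Monthly revenue trend")
plt.savefig("revenue_trend.png")
```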

13. Data Science

High Demand
Here's how Data Science is used in Data Scientist jobs:
  • Consult clients in data science/statistical consultation, database development/design, and machine learning (classification, regression, etc.).
  • Completed Microsoft Professional Program in Data Science.
  • Serve as the sole in-house Principal Data Scientist to build data science functionality for the company from the ground up.
  • Designed and delivered five day training on introduction to data science and research methods for public sector employees.
  • Worked with Data science team and developed algorithm, to detect financial misstatement in Audit data.
  • Led/managed software engineering and data science teams (ranging from 3-8 scientists and engineers).
  • Covered fundamentals of Data Science Essentials, Principles of Machine Learning, and Statistics.
  • Collaborated with data science team to create user level telemetry outputs on apps.
  • Advise senior management team on, data science strategy and vendor selection.
  • Produced descriptive tables and presented the analysis to the data science team.
  • Helped promote data science within R&D and all BV products.
  • Implement data science and engineering techniques to bring insight to difficult data.
  • Lead efforts to implement Big Data and Data Science practices.
  • Supervised interns in data science and data engineering projects.
  • Perform end-to-end data science research and development projects.
  • Act as a data science subject matter expert.
  • Specialized in Data Science and Machine Learning.
  • Leverage data analytics and data science to bring key pricing initiatives/strategies from ideas only to implementation to performance management.
  • Collaborated with marketing and technical professionals across organizations to translate business requirements into data science questions and actions.
  • Collaborated with several multidisciplinary groups of my peers to participate in Kaggle data science competitions.


14. Predictive Models

High Demand
Here's how Predictive Models is used in Data Scientist jobs:
  • Developed explanatory/ predictive models using independent variables (manufacturing process variables, raw material attributes) to predict critical parameters.
  • Researched methods to improve statistical inferences of variables across models and developed statistical, mathematical and predictive models.
  • Developed regression based predictive models using macro-economic variables to be used in Long Range Business Planning process.
  • Implemented Predictive Models for ad delivery algorithm to provide ad targeting with suitability and ranking.
  • Designed multiple research studies, developed targeted profiles through quantitative methods including predictive models.
  • Designed and Developed online advertising predictive models for JenJo's proprietary trading platform.
  • Developed statistical tools for comparing the performance of predictive models.
  • Skilled with using R to build predictive models that update continuously, mine historical data, and predict future outcomes.
  • Utilized Machine Learning algorithms in Python to create predictive models for consumer behavior, client activity, and revenue projection.
  • Used Mahout and collaborative filtering to build predictive models, which were used to optimize ad campaign performance.
  • Worked on multiple predictive models to predict future electricity and gas bills based on the current usage patterns.
  • Performed statistical analysis in languages: SAS and R. Created various predictive models using structured or unstructured inputs.
  • Created predictive models to increase the revenue of the company by studying the demographics and usage patterns.
  • Build predictive models, based on facility characteristics of the adoption of solar and power storage technologies.
  • Developed predictive models using SAS and R, leading to 48% improvement on incremental sales.
  • Developed predictive models using Logistic regression, Decision Tree, Random Forest and KNN algorithms.
  • Implemented predictive models in python to improve click-through rate for an email marketing campaign.
  • Developed predictive models using regression, C5.0, decision lists, and decision trees.
  • Developed predictive models using Decision Tree, Random Forest and Naïve Bayes.
  • Engineered new features for predictive models to improve the cross-device attribution graph.

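Here's a minimal sketch of building and cross-validating a random forest, one of the model types named above; a bundled scikit-learn dataset stands in for proprietary business data:

```python
# Minimal sketch: cross-validate a random forest on a bundled dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(model, X, y, cv=5)
print("Mean CV accuracy:", scores.mean().round(3))
```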

15. ETL

Average Demand
Here's how ETL is used in Data Scientist jobs:
  • Authored tailor-made ETL system to analyze conversation between customer & agent to identify new potential business opportunities.
  • Utilize SAS programming software and ETL techniques to manage and maintain U.S. foreign assistance and trade capacity building databases for USAID.
  • Created ETL packages using SSIS to extract data from the relational database and then transform and load into the data mart.
  • Led team consisting of data content experts, DBA, and programmers to deliver robust solution to ETL needs.
  • Lead the transition from a single monolithic ETL process to more modular processes using Spark ML data pipelines.
  • Worked on file systems, server architectures, databases, SQL, and data movement (ETL).
  • Created ETL processes to source, transform and link data to be fed into machine learning algorithms.
  • Worked on Data modeling, Data integration, Data Migration, ETL process and Business Intelligence.
  • Update SQL/ETL supporting bank-wide profitability (SVA) model, affecting several million in officer compensation.
  • Implemented ETL data streams from end-to-end, delivered data points and checked data quality regularly.
  • Fixed broken ETL processes, reducing production system downtime from 2 hours daily to zero.
  • Interfaced with large scale database system through an ETL server for data extraction and preparation.
  • Build machine learning pipeline for customer churn with end-to-end ETL processes with SQL/NoSQL databases.
  • Managed Data quality & integrity using skills in Data Warehousing, Databases & ETL.
  • Write small code/macros for nuclear data ETL, data visualization, and graphical presentation.
  • Developed an ETL pipeline to predict soil properties across the entire United States.
  • Perform data-extraction, transformation and loading (ETL) development and programming.
  • Coded and deployed this ETL pipeline to be run in batch mode.
  • Use of scripting for backups, ETL, and report assembly.
  • Evaluate cloud based ETL/ELT tools and perform a pilot on AWS.

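A minimal extract-transform-load sketch in Python with pandas; the file name, column names, and SQLite target are hypothetical stand-ins for the SSIS and Spark pipelines described above:

```python
# Minimal ETL sketch: extract a CSV, aggregate, load into a database table.
# File name, columns, and the SQLite URL are hypothetical placeholders.
import pandas as pd
from sqlalchemy import create_engine

raw = pd.read_csv("orders.csv")                         # extract
raw["order_date"] = pd.to_datetime(raw["order_date"])   # transform
daily = raw.groupby(raw["order_date"].dt.date)["amount"].sum().reset_index()

engine = create_engine("sqlite:///warehouse.db")        # load
daily.to_sql("daily_sales", engine, if_exists="replace", index=False)
```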

16. Data Analysis

Average Demand
Here's how Data Analysis is used in Data Scientist jobs:
  • Performed data analysis, data migration, data preparation, graphical presentation, statistical analysis, reporting, validation and documentation.
  • Developed supervised learning-based computer application for global mining-machine data analysis; contributed significantly to earning $1.3 million in revenue for the company.
  • Performed quantitative high-dimensional data analysis using LASSO on the order transactions error data set in developing forecast of error occurrence.
  • Performed exploratory data analysis for identifying high value candidate features and gaining additional insight into the efficacy of engineered features.
  • Worked closely with cross functional teams to encourage statistical best practices with respect to experimental design and data analysis.
  • Developed robust data analysis tools used to produce statistical analysis reports along with maintaining my team's code repository.
  • Performed exploratory data analysis on corporate purchase orders, contracts and projects data using sampling and statistical methods.
  • Performed preliminary data analysis using descriptive statistics and handled anomalies such as removing duplicates and imputing missing values.
  • Executed statistical data analysis plans to evaluate completeness and correctness of data supplied by several vendors.
  • Design and develop new algorithms for data analysis and prediction; application to battery performance data.
  • Performed data analysis, statistical analysis and generated reports and graphs for customer data.
  • Designed, developed and documented logic for algorithms for data analysis and presentation.
  • Provide technical support for data analysis and enterprise database system utilizing SAS and SQL
  • Conducted Exploratory Data Analysis using R and carried out visualizations with Tableau reporting.
  • Develop data analysis/visualization pipeline based on mathematical modeling and machine learning algorithms.
  • Implemented deep learning architectures for use in data analysis and classification.
  • Developed new methods and techniques for accurate data analysis and interpretation.
  • Interpreted data analysis results and made recommendations for client performance improvement.
  • Analyzed market growth and implemented efficient strategies using exploratory data analysis.
  • Assisted in developing internal tools for data analysis and statistical modeling.


17. SQL

Average Demand
Here's how SQL is used in Data Scientist jobs:
  • Developed and automated the data manipulation process for above using store procedures/views in SQL Server.
  • Created normalized database architecture and accordingly designed a Relational Database in SQL.
  • Redefined many attributes and relationships and cleansed unwanted tables/columns using SQL queries.
  • Analyzed business data primarily utilizing SQL queries and stored procedures in PostgreSQL.
  • Loaded data from flat files into SQL server database tables using BCP utility, bulk insert command and table export/import wizard.
  • Executed SQL scripts to extract data records, wrote C++ algorithms for data filtering and introduced shower correlation method in analysis.
  • Involved in business process modeling using DDL, DML, TCL statements for structured data in Microsoft SQL Server Management Studio.
  • Develop automated user reports using Visual Studio 2013, SQL Server 2014, MS Access 2016 and MS Office 365.
  • Implemented a second SQL Server (2008) for reverse engineering the proprietary building automation system and other development projects.
  • Created SQL tables with referential integrity and developed advanced queries using stored procedures and functions using SQL server management studio.
  • Implemented procedures for extracting Excel sheet data into the mainframe environment by connecting to the database using SQL.
  • Pick Systems: Performed strategic performance evaluation of Pick NoSQL multidimensional DBMS vs standard high performance database engines.
  • Involved in performing extensive Testing by writing T-SQL queries and stored procedures to extract the data from Database.
  • Prepared large volume of user history data and performed SQL query using SparkR (R on Spark).
  • Create visualizations in Tableau using existing data or data modified in SQL to suit clients' needs.
  • Analyzed large amounts of data using statistical techniques and implemented through SQL queries to derive actionable insights.
  • Retrieved and modified data using SQL Server 2008 and BIDS and acquired disease information from different databases.
  • Project McNulty: Set up SQL database on cloud server, storing client's data for query analysis.
  • Leveraged Tableau, SQL, Big Data, R, and Java to analyze and make proposals.
  • Automated and scheduled various reports generated on predetermined time using SSMS SQL Agent job or scheduled tasks.

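A minimal sketch of running SQL from Python; an in-memory SQLite database and toy schema stand in for the SQL Server and PostgreSQL systems named above:

```python
# Minimal sketch: create a table, insert rows, and run a GROUP BY query.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("East", 100.0), ("West", 250.0), ("East", 75.0)])

for row in conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"):
    print(row)  # ('East', 175.0) then ('West', 250.0)
conn.close()
```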

18. Support Vector Machines

Average Demand
Here's how Support Vector Machines is used in Data Scientist jobs:
  • Machine learning, statistics, survival analysis, support vector machines, neural nets, graph theory.
  • Used Support vector machines for classification of data in groups.
  • Gained familiarity with K-means clustering and support vector machines.
  • Utilized Support Vector Machines, Logistic/Linear Regression, Nearest Neighbors, Naive Bayes Classifiers.

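A minimal scikit-learn sketch of the kind of support vector machine classification the bullets above describe, using a bundled dataset:

```python
# Minimal sketch: train and score an RBF-kernel SVM classifier.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```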

19. HDFS

Average Demand
Here's how HDFS is used in Data Scientist jobs:
  • Worked on importing data from various sources and performed transformations using Map Reduce, Hive to load data into HDFS.
  • Handled importing data from various data sources, performed transformations using MapReduce, Hive and loaded data into HDFS.
  • Spark SQL uses the Spark engine to execute SQL queries on data sets persisted in HDFS or data lake.
  • Analyzed sensors and well log data in HDFS with HiveQL and prepare for prediction learning models.
  • Worked on Linux shell scripts for business process and loading data from different interfaces to HDFS.
  • Extracted data from HDFS and prepared data for exploratory analysis using data munging.
  • Worked with developers to extract data from HDFS to Spark shell for analysis.
  • Performed various Unix operations to import data to HDFS and run Map-Reduce jobs.
  • Import the data from different sources like HDFS/data lake into Spark.
  • Interfaced with Data Architecture teams using HDFS distributed systems.
  • Involved in extracting data from source to HDFS.
  • Dumped (ETL) to HDFS.
  • Installed and configured Hadoop, MapReduce, HDFS, Developed multiple MapReduce jobs in java for data cleaning and processing.
  • Handled importing other enterprise data from different data sources into HDFS using Sqoop and performing transformations using Hive, Spark.
  • Cleaned and transformed these web data from a JSON file to a data frame, and loaded them into a Hive table in HDFS.
  • Involved in HDFS maintenance and loading the data using Sqoop and responsible to manage data coming from different sources.
  • Experience with configuration, management of Hadoop, HDFS clusters and MapReduce based algorithm development under UNIX/LINUX environment.
  • Have built and configured HDFS with User-level and Job-level disk storage, User Security using Kerberos, SSL.
  • Involved in creating Data Lake by extracting customer's Big Data from various data sources into Hadoop HDFS.
  • Downloaded images are stored in HDFS; metadata about the images is stored in SQL Server.

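A minimal sketch of loading local files into HDFS from Python by shelling out to the standard hdfs dfs CLI; the paths are hypothetical and a configured Hadoop client is assumed:

```python
# Minimal sketch: create an HDFS directory, upload a file, list the contents.
# Paths are hypothetical; assumes the `hdfs` CLI is installed and configured.
import subprocess

subprocess.run(["hdfs", "dfs", "-mkdir", "-p", "/data/raw"], check=True)
subprocess.run(["hdfs", "dfs", "-put", "local_logs.csv", "/data/raw/"], check=True)

listing = subprocess.run(["hdfs", "dfs", "-ls", "/data/raw"],
                         capture_output=True, text=True, check=True)
print(listing.stdout)
```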

20. Natural Language Processing

Average Demand
Here's how Natural Language Processing is used in Data Scientist jobs:
  • Use natural language processing and topic clustering analysis with Latent Dirichlet Allocation and Non-negative Matrix Factorization.
  • Worked on Natural Language Processing with NLTK module of python for application development for automated customer response.
  • Developed internal natural language processing software to analyze product sentiment trends from these mentions.
  • Positioned as team expert in Natural Language Processing and Market Segmentation projects.
  • Design algorithms and systems for natural language processing (NLP) based feature extraction from social media postings.
  • Used Natural Language Processing (NLP) for response modeling and fraud detection efforts for credit cards.
  • Created word cloud graphics using R and natural language processing (NLP) to provide insight.
  • Performed natural language processing on users' reviews, incorporated with an enhanced star-rating average.
  • Apply natural language processing to interpret legal rules using Text Mining and Data Mining methods.
  • Focused on issues around data science and especially natural language processing at scale.
  • Machine Learning, Natural Language Processing, Data Mining and Data Visualization.
  • Applied some basic natural language processing techniques to process social media data.
  • Create speech recognition and natural language processing components of the product.
  • Performed natural language processing to extract features from text data.
  • Applied GATE NLP for text mining and natural language processing.
  • Develop natural language processing solutions (named entity recognition, keyword extraction).
  • Used standard natural language processing techniques, graph theory, and linear algebra to cluster, compare, and analyze films.
  • Conducted Natural Language Processing and machine learning analysis (Bayesian, logistic, SVM, Random Forest, etc.).
  • Worked on a POC project for NLP(Natural Language processing) with Lowes Review Data.
  • Assisted in teaching the Applied Natural Language Processing course at gU.

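As a minimal sketch of the kind of text classification this work involves, here's a bag-of-words sentiment model in scikit-learn, trained on five toy reviews:

```python
# Minimal sketch: bag-of-words sentiment classification on toy reviews.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["great product", "terrible service", "loved it",
         "awful experience", "highly recommend"]
labels = [1, 0, 1, 0, 1]  # 1 = positive, 0 = negative

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["service was great"]))
```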

21. Neural Networks

Average Demand
Here's how Neural Networks is used in Data Scientist jobs:
  • Included decision trees, support vector machines, genetic programming, neural networks, distance correlation and mixture models.
  • Used supervised learning techniques such as classifiers and neural networks to identify patterns in these data sets.
  • Develop and modify Neural Networks and machine learning for pattern and image recognition.
  • Developed diagnostic tests for neural networks used in prediction.
  • Created web app that utilized convolutional neural networks to identify actors in user supplied images.
  • Implemented supervised learning algorithms such as Neural networks, SVM, Decision trees and Naïve Bayes for advanced text analytics.
  • Researched neural networks for examining the crime problem in Chicago by looking through public data sets.
  • Introduced Decision Trees, Neural Networks, LASSO & Quantile Regression techniques to the client targeting process.
  • Applied neural networks models (RNN, Feedforward, Auto encoder) on multiple projects.
  • Implemented .NET microservice technology to allow various neural networks to share/co-evolve information faster and more intelligently.
  • Advanced Text analytics using Deep learning techniques such as convolutional neural networks to determine the sentiment of texts.
  • Experienced in Artificial Neural Networks (ANN) and Deep Learning models using the Theano, TensorFlow and Keras packages in Python.

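A minimal Keras sketch of a small feedforward network like those described above; the layer sizes are arbitrary and the data is random placeholder input:

```python
# Minimal sketch: a tiny feedforward binary classifier on random data.
import numpy as np
from tensorflow import keras

X = np.random.rand(100, 10)
y = np.random.randint(0, 2, 100)

model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, verbose=0)
```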

22. Pandas

Average Demand
Here's how Pandas is used in Data Scientist jobs:
  • Performed data modeling operations using Power BI, Pandas, and SQL.
  • Analyzed NYC Subway Turnstile data for traffic patterns using pandas.
  • Utilized tools including Tableau, Pandas, SQL, Flask, JS, D3, MongoDB, and AWS.
  • Used Python libraries (like Pandas and Scikit-learn) to import, clean and analyze data, and Tableau for visualization.
  • Automated data reporting using Python (Pandas), R, and Redshift (SQL).
  • Predicted audience scores on Rotten Tomatoes using BeautifulSoup and Python (pandas, scikit-learn).
  • Perform EDA using SQL and Pandas.
  • Utilized data science packages such as scikit-learn, pandas and Jupyter to optimize student accessibility.
  • Scraped data acquired using BeautifulSoup and Selenium and analyzed it using the pandas module.
  • Developed combination of Extractive and Abstractive summarizer using python packages (Pandas, Beautiful Soup, etc.)

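In the spirit of the turnstile analysis mentioned above, here's a minimal pandas sketch; the DataFrame is a toy stand-in for real MTA data:

```python
# Minimal sketch: total entries per station, busiest first; toy data.
import pandas as pd

df = pd.DataFrame({
    "station": ["Times Sq", "Times Sq", "Union Sq", "Union Sq"],
    "entries": [5200, 4800, 3100, 3500],
})
busiest = df.groupby("station")["entries"].sum().sort_values(ascending=False)
print(busiest)
```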

23. K-Means

Average Demand
Here's how K-Means is used in Data Scientist jobs:
  • Clustered retailers using k-means algorithm and identified different sales drivers for different groups of retailers.
  • Leveraged k-means to optimize media costs and improve advertising budget management system.
  • Improved Right Party Contact Rates by 11% by using Monte-Carlo based simulation k-means to establish contact likelihood estimation by date-parts.
  • Accomplished customer segmentation using K-means algorithm in R, based on behavioral and demographic tendencies, for improving campaigning strategies.
  • Created a recommendation system using k-means clustering, NLP and Flask to identify the potential customers.
  • Used R to develop k-means, random forest and decision tree models to classify the members.
  • Used R to do K-means clustering over the metric defined by latitude and longitude.
  • Utilized clustering method of K-means clustering to categorized patients into groups via packages in Spark.
  • Used k-means clustering for profiling customers, location, product, season, etc.
  • Applied k-means and hierarchical clustering (using R) on the above data.
  • Created clusters for customer type and vehicle type using k-means clustering.
  • Use cluster analysis (k-means) for market segmentation.
  • Segmented the customers based on demographics using K-means Clustering.
  • Used the Scikit-learn k-means algorithm to cluster news articles for the different state banking holidays together.
  • Use CHAID, Apriori, K-Means, SVM Classification algorithm for prediction of opportunities
  • Used clustering technique K-Means to identify outliers and to classify unlabeled data.
  • Used recency, frequency and monetary (RFM) indices to perform customer-segmentation using k-means and k-medoids clustering.

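A minimal scikit-learn sketch of the customer segmentation described above; the two behavioral features are synthetic placeholders:

```python
# Minimal sketch: cluster synthetic (spend, visits) data into 3 segments.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
spend_and_visits = rng.random((200, 2)) * [1000, 50]  # spend ($), store visits

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(spend_and_visits)
print("Cluster sizes:", np.bincount(km.labels_))
print("Cluster centers:\n", km.cluster_centers_.round(1))
```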

24. AWS

Average Demand
Here's how AWS is used in Data Scientist jobs:
  • Installed, configured and maintained DNS systems using Route53 (AWS) and used JENKINS for continuous integration and continuous delivery.
  • Developed, tested, and fielded the supporting database and web application on an Amazon AWS LAMP stack.
  • Used AWS EC2 and RDS instances to scale across hundreds of virtual machines.
  • Developed platform requirements, designed prototype by using AWS.
  • Extracted data from various web page interfaces via AWS.
  • Used Amazon AWS for cloud computing.
  • Worked in AWS cloud computing environment.
  • Designed the schema, configured and deployed AWS Redshift for optimal storage and fast retrieval of data.
  • Determined new correlations between business categories in the YELP dataset using SAP HANA on AWS Marketplace.
  • Worked on AWS S3 buckets and intra cluster file transfer between PNDA and s3 securely.
  • Deployed on AWS spot instances and persisted data to PostgreSQL on RDS.
  • Perform gap analysis and proposing a scalable architecture on AWS.
  • Created and administered Google and AWS Hybrid Cloud infrastructure.
  • Spark, Scala, Deeplearning4j, Cloudera, EMR, AWS.

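A minimal boto3 sketch of the S3 storage work described above; the bucket name and key are hypothetical, and AWS credentials are assumed to be configured in the environment:

```python
# Minimal sketch: upload a file to S3 and list the bucket's keys.
# Bucket name and key are hypothetical; credentials come from the environment.
import boto3

s3 = boto3.client("s3")
s3.upload_file("model_output.csv", "my-analytics-bucket",
               "outputs/model_output.csv")

for obj in s3.list_objects_v2(Bucket="my-analytics-bucket").get("Contents", []):
    print(obj["Key"])
```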

25. Scikit-Learn

Average Demand
Here's how Scikit-Learn is used in Data Scientist jobs:
  • Participated in features engineering such as feature creating, feature scaling and One-Hot encoding with Scikit-learn.
  • Classified trade journal and newspaper articles using Scikit-learn's k-nearest neighbor algorithm.
  • Participated in features engineering such as feature intersection generating, feature normalize and label encoding with Scikit-learn preprocessing.
  • Scaled the link prediction experimentation from a specific decision tree classifier to any classifier within Spark ML/MLlib and scikit-learn.

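A minimal sketch of the feature scaling and one-hot encoding mentioned above, using scikit-learn's preprocessing tools on a toy DataFrame:

```python
# Minimal sketch: scale a numeric column and one-hot encode a categorical one.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({"age": [25, 40, 31], "plan": ["basic", "pro", "basic"]})
pre = ColumnTransformer([
    ("scale", StandardScaler(), ["age"]),
    ("onehot", OneHotEncoder(), ["plan"]),
])
print(pre.fit_transform(df))
```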

26. NumPy

Average Demand
Here's how NumPy is used in Data Scientist jobs:
  • Developed scripts in Python (Pandas, Numpy) for data ingestion, analyzing and data cleaning.
  • Worked on data cleaning and ensured data quality, consistency, integrity using Pandas, Numpy.
  • Experience in Data wrangling tasks on Insurance data using python libraries Numpy and pandas.
  • Performed data processing using Python libraries like Numpy and Pandas.
  • Cleaned data using numpy and pandas.
  • Generated graphical reports using the Python packages NumPy and Matplotlib.
  • Developed machine learning models for health insurance claims data, using R and Python libraries like NumPy, pandas, and scikit-learn.
  • Implemented the customer choice model described in the paper Restricted Boltzmann Machines Modeling Human Choice using Theano and NumPy.
  • Work on outliers identification with box-plot, K-means clustering using Pandas, NumPy.
  • Architected and prototyped item recommendation system using python, scikit-learn, numpy
  • Created analytical reports for CEO using SQL, python (pandas, 2014 numpy, scipy, scikit-learn).
  • Utilized Python libraries wxPython, NumPy, Twisted and Matplotlib; also used libraries like Beautiful Soup.
  • Used pandas, numpy, seaborn, scipy, matplotlib, scikit-learn in Python for developing various machine learning algorithms.

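A minimal NumPy sketch of the box-plot-style outlier screening mentioned above, using the interquartile-range rule on synthetic data:

```python
# Minimal sketch: flag values outside 1.5 * IQR of the quartiles.
import numpy as np

data = np.array([10, 12, 11, 13, 12, 95, 11, 10, 200, 12])
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
outliers = data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)]
print("Outliers:", outliers)  # [ 95 200]
```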

27. Teradata

Average Demand
Here's how Teradata is used in Data Scientist jobs:
  • Acted as DBA for the cluster and trained subordinates on how to plan and architect databases and use Teradata Aster Database software.
  • Developed Python programs for manipulating the data read from various Teradata tables and updating the content in the database tables.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Created multiple custom SQL queries in Teradata SQL Workbench to prepare the right data sets for Tableau dashboards.
  • Developed SQL queries to extract data from a Teradata database to provide reports to clients of Precision.
  • Utilized SPSS, SAS, SAS Enterprise, and Teradata to perform analyses on millions of customers.
  • Developed SAS/SQL and Teradata/SQL queries for automation of reports in SAS and SSRS.
  • Prepared SQL scripts for ODBC and Teradata servers for analysis and modeling.
  • Installed, upgraded, and maintained the Teradata Aster Database software.
  • Used Teradata utilities such as Fast Export, Multi LOAD for handling various tasks.
  • Worked on Teradata SQL queries, Teradata Indexes, and utilities such as MLoad, TPump, FastLoad and FastExport.
  • Perform tuning of Redshift database to match with performance of "on premise" Teradata implementation.
  • Developed "Guide to Using Informatica Power Center in a Teradata Aster nCluster Environment."
  • Managed data using DB2 and Teradata; modeled conditionalities environment of Bolsa Família.


28. MapReduce

Average Demand
Here's how MapReduce is used in Data Scientist jobs:
  • Developed MapReduce pipeline for feature extraction using Hive.
  • Utilized Hive and MapReduce to process and link millions of rows of data from multiple data sources.
  • Implemented Theta Join from structural database for large scale tables through MapReduce programming model utilizing Java.
  • Experience in writing MapReduce programs with Java API to cleanse Structured and unstructured data.
  • Used Maven extensively for building jar files of MapReduce programs and deployed to Cluster.
  • Implemented MapReduce jobs to clean and wrangle customer data based on client-specific rules.
  • Worked on machine learning on large size data using Spark and MapReduce.
  • Developed multiple MapReduce jobs in java for data cleaning and preprocessing.
  • Tested MapReduce jobs with MRUnit and scheduled works in OOZIE.
  • Created and ran daily jobs in MapReduce, [ ] Pig, Hive, etc.
  • Used R and MapReduce to conduct data analytics; conducted data modeling in Cassandra.

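Here's a minimal Hadoop Streaming sketch (word count, the classic stand-in for the cleaning jobs above). In a real job the mapper and reducer run as separate scripts passed to the hadoop streaming jar; here both are shown in one file, and the reducer assumes its input arrives sorted by key, as Hadoop guarantees between the map and reduce phases:

```python
# Minimal Hadoop Streaming sketch: word-count mapper and reducer over stdin.
import sys
from itertools import groupby

def mapper(lines):
    # Emit "word\t1" for every word in the input.
    for line in lines:
        for word in line.strip().split():
            print(f"{word}\t1")

def reducer(lines):
    # Input must be sorted by key (Hadoop sorts between map and reduce).
    pairs = (line.strip().split("\t") for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{word}\t{sum(int(count) for _, count in group)}")

if __name__ == "__main__":
    role = sys.argv[1] if len(sys.argv) > 1 else "map"
    (mapper if role == "map" else reducer)(sys.stdin)
```

Locally this can be tested with a shell pipeline like `cat input.txt | python wc.py map | sort | python wc.py reduce`.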

29. MongoDB

Average Demand
Here's how MongoDB is used in Data Scientist jobs:
  • Collected unstructured data from MongoDB and completed data aggregation.
  • Collected 5 million customer records from MongoDB and MySQL.
  • Worked on NOSQL databases such as MongoDB and Cassandra.
  • Experience with NoSQL databases such as MongoDB.
  • Worked on NOSQL databases like MongoDB.
  • Deployed on the web using Flask, jQuery, Bootstrap, and Plot.ly from stored data in a MongoDB database.

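A minimal PyMongo sketch of aggregating collected records; the connection string, database, and collection names are hypothetical, and a running MongoDB server is assumed:

```python
# Minimal sketch: sum order amounts per region with an aggregation pipeline.
# Connection string, database, and collection are hypothetical placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
orders = client["shop"]["orders"]

pipeline = [
    {"$match": {"status": "complete"}},
    {"$group": {"_id": "$region", "total": {"$sum": "$amount"}}},
]
for doc in orders.aggregate(pipeline):
    print(doc)
```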

30. Large Data

Low Demand
Here's how Large Data is used in Data Scientist jobs:
  • Used data mining and non-parametric methods to analyze very large data sets (over 5,000 rows) using R.
  • Managed large data sets, including 10 million checks from over 30 restaurants and 10,000 employees.
  • Build automated data conditioning tools for use on large data sets.
  • Developed large data sets from structured and unstructured data.
  • Build analytical solutions and models by manipulating large data sets.
  • Value-added modeling; managed large datasets.
  • Advised business partners regarding best practices on how to utilize large datasets to make better data-driven decisions.
  • Managed simultaneous projects, large data sets and strict timelines and communicated results to the management.
  • Applied machine learning and statistical techniques to large datasets to find actionable insights.
  • Analyzed large datasets to answer business questions by generating reports and outcome.
  • Analyze large datasets to provide strategic direction to the company.
  • Carry out various statistical analyses of large data sets.
  • Implemented Cassandra cluster to store large data.
  • Worked on large datasets, transformed information from raw data into meaningful analysis that identifies trends and predicts outcomes.
  • Distributed and multi-threaded development for efficient processing of large data sets; end-to-end product development and integration.
  • Developed a data governance framework for the creation and storage of large datasets, including internal data and Mixpanel events.
  • Utilized the sparklyr and H2O packages for high-performance computing to reduce the time to process large datasets.
  • Analyzed large datasets to provide strategic recommendations for the prediction and detection of various crimes using RStat.


31. API

Low Demand
Here's how API is used in Data Scientist jobs:
  • Develop web-based software to automatically understand generic user input data and rapidly offer relevant visualizations and intelligent insights.
  • Assisted business owners in developing marketing strategies to capitalize on the statistical analysis results.
  • Developed decision tree to segment customers and spearheaded churn predictive modeling using RapidMiner.
  • Performed data manipulation and wrangling; visualized data in diverse ways using APIs, Shapely, Shiny and so on.
  • Managed and coded application development projects for clinical trials, market research, and capital markets trading risk management systems.
  • Create agent-based model to predict human interactions and relationships Develop API and graph database for use in social model
  • Processed data such as web scraping, data cleaning, joining and wrangling (munging).
  • Orchestrated IBM Discovery Rest API web service for logging in and database communication.
  • Web scraping/crawling to collect additional data from web pages using different tools.
  • Project includes substantial data wrangling, web scraping and Google Books API interaction.
  • Integrated Maven Framework in Java and wrote the REST API classes.
  • Used Restful API's to gather network traffic data from Servers.
  • Performed web scraping of external websites to provide business insights.
  • Extracted data from Twitter using Java and Twitter API.
  • Performed as specialist on intelligent scraping and machine learning.
  • Conducted post audits of completed capital projects.
  • Prototyped and validated Economic Capital, Value-at-Risk, capital adequacy, and credit risk models.
  • Provided a POJO Java output by compiling the R code using the H2O.ai package used in developing an API.
  • Evaluated and finalized the API using F1-score performance metrics; the shortlisted API was used to set up batch processing of interactions.
  • Conduct Trading Service, Capital Market, and Commercial Loan reporting visualization on a monthly basis using Spotfire.

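A minimal sketch of pulling data from a REST API with the requests library; the endpoint URL and token are hypothetical placeholders:

```python
# Minimal sketch: authenticated GET request to a hypothetical REST endpoint.
import requests

resp = requests.get(
    "https://api.example.com/v1/traffic",           # hypothetical endpoint
    headers={"Authorization": "Bearer <TOKEN>"},    # hypothetical token
    params={"since": "2020-01-01"},
    timeout=10,
)
resp.raise_for_status()
for record in resp.json():
    print(record)
```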

32. Decision Trees

Low Demand
Here's how Decision Trees is used in Data Scientist jobs:
  • Implemented Decision trees instead of Cluster analysis to identify conditions.
  • Developed audience extension models relying on decision trees, random forest, Support Vector Regression, and other continuous data.
  • Performed Market-Basket Analysis and implemented decision trees, random forests and K-fold cross validation.
  • Used linear regression, k-means clustering, and decision trees for modeling.
  • Used Decision trees and Random forests to find employee attrition rate.
  • Used Logistic regression, LDA & random forest decision trees.
  • Used linear regression, ARMA, ARIMA, k-means, decision trees for modeling.
  • Developed audience extension models relying on decision trees, random forest, logistic regression, and other categorical data
  • Used variety of analytical tools and techniques (regression, logistic, GLM, decision trees, machine learning etc.)
  • Used Regression Decision Trees, Neural network and Time series analysis.
  • Implemented ctree algorithm of Decision trees and Random Forest.
  • Implemented user segmentation using Decision Trees and K-means Clustering, prototyped dynamic visualization of clustering results in R shiny and Plotly.
  • Developed data pre-processing modules and rule extraction engines in R using Random Forest and Decision Trees for an analytics product.

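A minimal scikit-learn sketch of fitting a decision tree and printing its learned rules for inspection; a bundled dataset stands in for the attrition and loan data above:

```python
# Minimal sketch: fit a shallow decision tree and print its split rules.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(data.data, data.target)
print(export_text(tree, feature_names=list(data.feature_names)))
```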

33. Statistical Models

Low Demand
Here's how Statistical Models is used in Data Scientist jobs:
  • Developed theoretical and practical approaches for formulating mathematical optimization models and statistical models to solve business problems and optimize existing processes.
  • Led initiative to build statistical models using historical data to predict real estate prices in several economic markets.
  • Create statistical models using distributed and standalone models to build various diagnostic, predictive and prescriptive solutions.
  • Created statistical models, algorithms and methodologies to predict, quantify and/or forecast performance metrics.
  • Developed and evaluated statistical models, and made suggestions for further improvements of training.
  • Improved statistical model performance by using learning curves, feature selection methods, and regularization.
  • Delivered Statistical models using Machine Learning, predictive Modeling and Text mining.
  • Developed statistical models to analyze physiological data and human performance measures.
  • Developed statistical models to forecast requirements of the company.
  • Design and develop statistical models to classify text statements.
  • Developed statistical models to forecast inventory and procurement cycles.
  • Gathered requirements, pulled and prepared the data, developed statistical models, and interpreted and presented results to management.
  • Worked on statistical models such as regression, logistic regression, SVM, linear models, and random forests (see the sketch after this list).
  • Worked on various Statistical models like DOE, hypothesis testing, Survey testing and queuing theory.
  • Implement statistical models like time series analysis, regression analysis, confidence interval, etc.
  • Established measures, metrics, visualizations, and tools to evaluate statistical models.
  • Develop statistical models using software such as R, SPSS, and STATA.
  • Designed and built statistical models to streamline global health monitoring.
  • Design and implement novel mathematical and statistical models to predict when mechanical equipment will require maintenance.
  • Understand transaction data and develop Analytics insights using Statistical models using Azure Machine learning.
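
As one hedged illustration of the workflow above, the sketch below fits a logistic regression with statsmodels on synthetic data; the summary output carries the coefficients, p-values, and confidence intervals that these bullets report to management.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic data standing in for a real business outcome
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X @ np.array([1.5, -0.8]) + rng.normal(size=500) > 0).astype(int)

# Fit a logistic regression; add_constant supplies the intercept term
model = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
print(model.summary())  # coefficients, p-values, confidence intervals
```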


34. Data Collection

low Demand
Here's how Data Collection is used in Data Scientist jobs:
  • Build and implement data collection and machine learning analysis strategy for online direct product marketing on mobile devices
  • Develop and implement data collection systems that optimize prediction and relevance accuracy in a recommendation system.
  • Collaborated with software engineer to optimize data collection efficiency and deploy prediction model on website.
  • Manage raw data collection, preparing and validating raw data for statistical analysis.
  • Performed extensive implicit as well as explicit data collection.
  • Participated in all phases of data mining; data collection, data cleaning, developing models, validation and visualization.
  • Understand all phases of the analytic process including data collection, preparation, modeling, evaluation, and deployment.
  • Developed original techniques for this technology, and implemented all necessary code, data collection, and database design.
  • Developed software for data collection, analysis and experiments automation control which saved the lab $20K.
  • Lead research team throughout data collection, experimental design, and results delivery stages.
  • Build the internal data collection platform for 200+ tenants which fuels product/design decisions.
  • Perform data collection, cleaning, imputing, stitching, and balancing of data (see the sketch after this list).
  • Advised on strategic data collection and data mining of network traffic flows to increase efficiency at operational bottlenecks.
  • Developed data collection specifications for contracted studies.
  • Lead person for data collection and munging, creating database, performing all kinds of analytics, forecasting, and predictions.
  • Research designed on Qualtrics for data collection on student experience.
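
A minimal pandas sketch of the collect-then-clean step described above (the file and column names are illustrative assumptions, not a specific pipeline):

```python
import pandas as pd

# Hypothetical raw collection file; column names are illustrative.
raw = pd.read_csv("collected_responses.csv")

clean = (
    raw.drop_duplicates()              # remove repeat submissions
       .dropna(subset=["user_id"])     # records without a key are unusable
       .assign(age=lambda d: d["age"].fillna(d["age"].median()))  # impute
)
clean.to_csv("clean_responses.csv", index=False)
```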


35. BI

low Demand
Here's how BI is used in Data Scientist jobs:
  • Worked extensively on data pipelines to aggregate data from different sources and determine any limitations in terms of reliability or usability.
  • Created dashboard design to maximize visibility of all resource used by different research projects that combines both financial and operational data.
  • Model building based on historical data for discrete event simulator including fitting reliability curves and analysis of multiple simulation histories.
  • Developed novel machine learning technology by combining advanced models and innovative algorithms to solve complex scientific and business problems.
  • Increased conditional probability of success through novel sourcing of additional features to supplement both the collaborative and content-based processes.
  • Overhauled corporate billing system, providing increased billing flexibility and saving several hours per month in manual processing.
  • Used regression analysis to forecast profitability of new customers based on detailed analysis of their banking behavior.
  • Supervised production of several terabytes of data and worked on feasibility studies and impact analysis.
  • Developed models to isolate the key parameters to optimize manufacturing defect reduction process & set desirability.
  • Optimized markdown schedules and increased profitability of lower-performing products in cooperation with the merchandising team.
  • Created captivating interactive visualizations and presentations to enhance decision making capabilities by the management.
  • Validated client data to optimize modeling data and monitored client credit/debit data feed contribution.
  • Research Responsibilities: Conducted scholarly studies on student classroom behavior and previous academic work.
  • Maintain memory storage on server to allow high availability and functional data processing.
  • Created reliability based analysis programs and procedures translating raw data into statistical model inputs
  • Explain possibilities of analytic solutions to clients with limited knowledge of statistical modeling.
  • Identified risk level and eligibility of new insurance applicants with Machine Learning algorithms.
  • Analyzed the performance characteristics of baseball players, and the factors influencing them, using a Beta-Binomial model.
  • Combined collaborative filtering and information filtering to develop a complete recommendation solution.
  • Conducted usability tests with quantitative and qualitative data to validate interface designs.


36. Matlab

low Demand
Here's how Matlab is used in Data Scientist jobs:
  • Improved model performance by over 78% relative to the baseline model using R and MATLAB.
  • Developed software based analysis technique to detect water toxins in a closed-loop water purification plant (C++/Matlab).
  • Performed quality assurance analysis on spectrometer performance and accuracy; used MATLAB, Excel, SQL, and Python.
  • Implemented the methodology using MATLAB, Python, and R, including cross-modality comparison and cross-validation.
  • Used R statistical software, Matlab, and Octave.
  • Designed and implemented an instrumented glove for assessment of spasticity with Python and Matlab.
  • Tool evaluation: evaluated utilization of QlikView, Hyperion, MicroStrategy, Demantra, MATLAB, Minitab, etc.


37. Linux

low Demand
Here's how Linux is used in Data Scientist jobs:
  • Improved scoring algorithm using statistical factor analysis methodology; deployed and maintained the hosting server in a Linux environment.
  • Work with all programs, including the Microsoft Office suite (advanced Excel), in Linux/Unix.
  • Supported Apache Tomcat web server on Linux Platform.
  • Developed Linux/C++ code to perform text analysis and topic classification of 737 systems logbook data.
  • Manage Installation and system administration on Linux.
  • Developed hybrid system solutions utilizing Linux as the central transaction-mapping platform.


38. Informatica

low Demand
Here's how Informatica is used in Data Scientist jobs:
  • Hands-on development and maintenance using Oracle SQL, SQL*Loader, PL/SQL, and Informatica PowerCenter 9.1.
  • Focused on integration overlap and Informatica's newer commitment to MDM following the acquisition of Identity Systems.
  • Implemented complex business rules in Informatica Power Center by creating re-usable transformations, and robust Mapplets.
  • Loaded Excel files into Vertica via ksh and Informatica; manipulated and analyzed data in the Vertica database.
  • Prepared Low level design documents, including Source to Target mapping specifications to help Informatica Mapping design and development.
  • Utilized Informatica Data Explorer to analyze corporate data to discover relational model and redesign existing system AIS Migration project.


39. Machine Learning Techniques

low Demand
Here's how Machine Learning Techniques is used in Data Scientist jobs:
  • Identified areas of improvement in existing business by unearthing insights by analyzing vast amount of data using machine learning techniques.
  • Provided statistical support including statistical modeling and machine learning techniques for internal research and provided recommendations based on the results.
  • Applied machine learning techniques such as generalized linear regression, logistic regression, clustering algorithms, and other supervised classification algorithms.
  • Used supervised machine learning techniques such as logistic regression and decision tree classification.
  • Implemented 5 machine learning techniques, evaluated their accuracy, and improved their predictive accuracy from 60% to 80%.
  • Applied different Machine Learning techniques for customer insight, target marketing, channel execution, risk management, and business strategy.
  • Used Machine Learning techniques to analyze complex interactions among players to help drive intelligent decision-making regarding player churn rate.
  • Use of cutting edge data mining, machine learning techniques for building advanced customer solutions.
  • Developed models to identify high value users and their lookalikes using advanced machine learning techniques.
  • Utilized machine learning techniques for predictions & forecasting based on the training data.
  • Perform statistical analysis and apply machine learning techniques to large amount of data for identifying potential areas of enhancement in products.
  • Applied supervised and unsupervised machine learning techniques (see the sketch after this list).
  • Prepared large volumes of user history data and performed ETL with Hadoop and applied above-mentioned machine learning techniques using Mahout.
  • Performed descriptive and predictive data analytics using machine learning techniques in R and/or Python.
  • Implemented regression and machine learning techniques to study and model ticket buyers' behavior.
  • Researched and implemented machine learning techniques to predict phrase usage in mathematical publications (contract research).
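
On the unsupervised side, a minimal k-means sketch in scikit-learn (synthetic blobs stand in for customer features; the cluster count is an assumption you would tune):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for customer features (spend, frequency, tenure, ...)
X, _ = make_blobs(n_samples=1000, centers=4, n_features=3, random_state=0)

# Scale first: k-means is distance-based and sensitive to feature scale
X_scaled = StandardScaler().fit_transform(X)
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_scaled)
print(km.labels_[:10])  # cluster assignment per record
```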


40. A/B

low Demand
Here's how A/B is used in Data Scientist jobs:
  • Coordinated the execution of A/B tests to measure the effectiveness of personalized recommendation system.
  • Delivered analysis support to hotel recommendation and providing an online A/B test.
  • Designed and analyzed A/B tests on email coupon templates, improving click through rate by 30%.
  • Build infrastructures to simulate data, design and implement A/B tests, and report effects of treatments.
  • Leverage modeling insights to design and conduct A/B test to evaluate performance impact on the design features.
  • Predicted sales growth of the next quarter using Time Series Analysis to carry out an A/B testing.
  • Utilized A/B/N tests recorded in NoSQL search logs to evaluate new features in search engine results.
  • Conduct experiment design, A/B testing, time series analysis, survey design, etc.
  • Designed A/B tests, created and interpreted post-campaign analytic reports for addressable campaigns for clients.
  • Designed A/B testing frameworks to test efficacy of products and interventions designed to help students.
  • Designed a schema to analyze A/B tests, and created automated reporting and alerting.
  • Perform A/B Testing using different variants on website and compare web traffic and conversions.
  • Used A/B tests and hypothesis tests to check the accuracy of the model (see the sketch after this list).
  • Experienced with User Engagement Modeling, Data Pipelines, and A/B testing.
  • Executed A/B and multivariate tests to optimize web analytics performance.
  • Design and report on A/B split tests.
  • Analyzed historical performance and administered A/B tests to examine viewer interactions.
  • Provided in-depth analysis on new data sets, complex web analytics, and key support and feedback on A/B campaign testing.
  • Calculated RMSE score, F-SCORE, PRECISION, RECALL, and A/B testing to evaluate recommender's performance.
  • Participated in implementation and using of A/B testing functionality in the core part of Experian web projects.
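
The statistical core of a simple A/B test is a two-proportion comparison. A minimal sketch with statsmodels, using made-up click counts:

```python
from statsmodels.stats.proportion import proportions_ztest

# Made-up results: clicks out of emails sent for variants A and B
clicks = [310, 262]
sends = [5000, 5000]

stat, pvalue = proportions_ztest(clicks, sends)
print(f"z = {stat:.2f}, p = {pvalue:.4f}")  # e.g., flag significance at p < 0.05
```

A production framework, as in the bullets above, would add a pre-registered sample size and guardrails against peeking before a test is declared done.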


41. Sentiment Analysis

low Demand
Here's how Sentiment Analysis is used in Data Scientist jobs:
  • Developed sentiment analysis and text classification techniques for analyzing large volumes of financial news, with application to financial markets.
  • Designed, programmed, and implemented into production a novel ensemble technique for sentiment analysis on unstructured social media.
  • Applied statistical models to understand student behavior and performance, and applied sentiment analysis to understand challenges faced by students
  • Designed a natural language processing pipeline for sentiment analysis and topic extraction of social media postings.
  • Developed and implemented in R a text classification for sentiment analysis of twitter data.
  • Perform sentiment analysis and gathering insights from large volumes of unstructured data.
  • Served as lead engineer developing algorithms for text mining and sentiment analysis.
  • Designed and developed Natural Language Processing models for sentiment analysis.
  • Designed sentiment analysis model using NLP on internet data.
  • Designed and developed NLP models for sentiment analysis.
  • Created sentiment analysis dashboard for Human Resources.
  • Structured Twitter post data by topic and affect using a sentiment analysis engine to test compensatory control theories in social psychology.
  • Ensured that the model had a low false-positive rate; performed text classification and sentiment analysis on unstructured and semi-structured data.
  • Assisted a senior colleague in developing a Sentiment Analysis application in Spark using its Python API.
  • Perform sentiment analysis using Naive Bayes algorithm and gathering insights from large volume of unstructured/text data.
  • Used NLP to construct sentiment analysis and social networks of characters in Harry Potter books.
  • Project for natural language processing (NLP) and sentiment analysis using Python.
  • Perform sentiment analysis and identify patterns in comments from lost account surveys.
  • Conduct user research through social media sentiment analysis and data mining.
  • Research on sentiment analysis of social media text (see the sketch after this list).
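
Approaches above range from Naive Bayes classifiers to full NLP pipelines; as one common lexicon-based baseline (not any specific resume's method), NLTK's VADER scores social-media text directly:

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
sia = SentimentIntensityAnalyzer()

for post in ["Love the new release!", "Support never answered my ticket."]:
    score = sia.polarity_scores(post)["compound"]  # -1 (negative) .. +1 (positive)
    print(f"{score:+.2f}  {post}")
```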


42. Hbase

low Demand
Here's how Hbase is used in Data Scientist jobs:
  • Involved in extracting and cleansing the data and defining a quality process for the builds that load data into HBase.
  • Verified the data in HBASE environment.
  • Created HBase tables to store variable data formats coming from different portfolios (see the sketch after this list).
  • Worked with Hive, Pig, HBase, and Spark.
  • Used Sqoop commands to load Hive and HBase from Oracle, automated using Unix shell scripts and Oozie.
  • Developed use cases and technical prototyping to implement Pig, HDP, Hive, and HBase.
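
From Python, HBase is commonly reached through its Thrift gateway with the happybase client. A minimal sketch (the host, table, and column names are hypothetical, and a running HBase Thrift server is assumed):

```python
import happybase  # Thrift-based HBase client

# Hypothetical connection details; requires a running HBase Thrift server.
conn = happybase.Connection(host="hbase-thrift.example.com", port=9090)
table = conn.table("portfolio_data")

# HBase stores bytes under column-family:qualifier keys, so rows with
# variable data formats can share one table.
table.put(b"row-001", {b"cf:format": b"csv", b"cf:payload": b"..."})

for key, data in table.scan(limit=5):  # read a few rows back
    print(key, data)
```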


43. XML

low Demand
Here's how XML is used in Data Scientist jobs:
  • Designed an automation process that parses XML financial data into a database (see the sketch after this list).
  • Modify the DB2 table and API to be able to save an incomplete bond as an XML back to AIX DB2.
  • Used SAX and DOM parsers to parse the raw XML documents; used RAD as the development IDE for web applications.
  • Developed archival process using Oracle stored procedures and partitioning to assemble fully self-contained XML documents.
  • Designed and implemented innovative path index method for handling hierarchical coding and generic XML data.
  • Developed database program to generate XML configuration file for Adobe Flex application
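
For parsing work like the above in Python, the standard-library ElementTree module is the usual starting point (SAX and DOM suit streaming and Java contexts). The bond fields below are illustrative assumptions:

```python
import xml.etree.ElementTree as ET

# Illustrative financial-data document; tag names are assumptions.
doc = """<bonds>
  <bond id="B1"><coupon>4.25</coupon><maturity>2031</maturity></bond>
  <bond id="B2"><coupon>3.10</coupon><maturity>2028</maturity></bond>
</bonds>"""

root = ET.fromstring(doc)
for bond in root.iter("bond"):
    print(bond.get("id"), bond.findtext("coupon"), bond.findtext("maturity"))
```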


44. Nltk

low Demand
Here's how Nltk is used in Data Scientist jobs:
  • Analyzed tweets using NLTK and TF-IDF to understand common topics in tweets (see the sketch after this list).
  • Open source tools used in the project include NLTK, Gensim and Mallet.
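
A minimal sketch of the tweets-plus-TF-IDF combination in the first bullet, tokenizing with NLTK and weighting with scikit-learn (the tweets are made up):

```python
from nltk.tokenize import TweetTokenizer
from sklearn.feature_extraction.text import TfidfVectorizer

tweets = ["new model shipping today!!", "so many models, so little data..."]

# TweetTokenizer copes with hashtags and @handles; TF-IDF weights the terms.
tok = TweetTokenizer(preserve_case=False)
vec = TfidfVectorizer(tokenizer=tok.tokenize, token_pattern=None)
tfidf = vec.fit_transform(tweets)  # sparse document-term matrix
print(sorted(vec.vocabulary_))     # learned vocabulary
```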


45. Business Requirements

low Demand
Here's how Business Requirements is used in Data Scientist jobs:
  • Gathered, analyzed and translated business requirements into relevant analytic approaches and shared for peer review.
  • Work closely with business team to understand business requirements and conduct data preparation regarding the requirements.
  • Experienced in data requirement analysis for transforming data according to business requirements.
  • Gathered complex business requirements and translated them into technical requirements.
  • Evaluated big data solutions relative to business requirements.
  • Developed reports per business requirements, creating summary reports, tabular reports, Excel reports, etc.
  • Communicate with different Stakeholders, Business Groups, and field User Groups to elicit and to analyze business requirements.
  • Worked with project team to understand the problem and business requirements.
  • Analyzed business requirements and translated them into SAS and SQL.
  • Gathered, analyzed, and documented the business requirements.


46. Json

low Demand
Here's how Json is used in Data Scientist jobs:
  • Worked on different data formats such as JSON and XML, and applied machine learning algorithms in R and Python.
  • Parsed JSON-formatted Twitter data and uploaded it to a database (see the sketch after this list).
  • Used MongoDB to accept JSON pings and output new Fraud predictions on a web-app interface.
  • Scraped and retrieved web data as JSON using Scrapy, presented with Pandas library.
  • Updated the elasticsearch-tableau connector to handle nested JSON objects and published it to GitHub.
  • Used JavaScript and JSON to update a portion of a webpage.
  • Placed data into JSON files using Python to test Django websites.
  • Used MongoDB as the back end to store the text data as JSON documents.
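
The common Python moves behind these bullets: json.loads to turn a payload into a dict, and pandas.json_normalize to flatten nesting into columns. The tweet-like fields below are illustrative assumptions:

```python
import json
import pandas as pd

# Illustrative tweet-like payload; field names are assumptions.
payload = '{"id": 1, "user": {"name": "ada"}, "text": "hello"}'

record = json.loads(payload)    # JSON string -> Python dict
df = pd.json_normalize(record)  # nested fields become dotted columns
print(df[["id", "user.name", "text"]])
```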


47. PCA

low Demand
Here's how PCA is used in Data Scientist jobs:
  • Applied Principal Component Analysis (PCA) for data reduction and to create Key Performance Indicators (KPI).
  • Joined PCA scores with factory equipment data to identify process tools responsible for yield issues.
  • Improved model generalization through feature engineering, grid search, and PCA (see the sketch after this list).
  • Utilized dimensionality reduction methods (such as PCA and neural networks) to improve the stability of security segmentation.
  • Build revenue per click model to adjust bid for SEM using PCA and time series analysis.
  • Applied kernel PCA towards effective performance monitoring of WTGs.
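
A minimal scikit-learn PCA sketch: standardize, fit, inspect the explained variance, then keep the component scores that the yield-analysis bullet joins to other data. The bundled wine dataset stands in for real process measurements:

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_wine(return_X_y=True)             # stand-in for process data
X_scaled = StandardScaler().fit_transform(X)  # PCA is scale-sensitive

pca = PCA(n_components=3).fit(X_scaled)
print(pca.explained_variance_ratio_)  # variance share per component
scores = pca.transform(X_scaled)      # the "PCA scores" used downstream
```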


48. Spss

low Demand
Here's how Spss is used in Data Scientist jobs:
  • Create custom visualization and data exploration solutions in R and SPSS to query databases and display data to subject matter experts.
  • Trained in data science basics and applied those software applications to collecting and managing patient data in Excel/SPSS.
  • Introduced, designed and integrated R modeling into existing SPSS data mining process.
  • Augmented process for R model deployment & scoring on existing SPSS scoring engine.
  • Refined time-series data and validated mathematical models using analytical tools like R and SPSS to reduce forecasting errors.
  • Data entry was performed in MS Excel and recoded in IBM SPSS for further analysis.
  • Utilized SPSS and Minitab software to randomize, analyze, and interpret data.
  • Generated ad-hoc or management specific reports using SSRS, SPSS, and Excel.


49. Mahout

low Demand
Here's how Mahout is used in Data Scientist jobs:
  • Designed and conducted two successful trainings on Mahout (platform for machine learning on big data).
  • Machine learning: user churn modeling (proof of concept in Mahout).


20 Most Common Skills for a Data Scientist

R: 11.8%
Pl/Sql: 9.8%
Python: 9.8%
Analytics: 9.7%
Algorithms: 7.4%
Hadoop: 5.2%
Logistic Regression: 4.7%
Data Warehouse: 4.1%

Typical Skill-Sets Required For A Data Scientist

Rank. Skill: Percentage of Resumes
1. R: 8.4%
2. Pl/Sql: 7%
3. Python: 7%
4. Analytics: 6.9%
5. Algorithms: 5.3%
6. Hadoop: 3.7%
7. Logistic Regression: 3.4%
8. Data Warehouse: 3%
9. Big Data: 2.9%
10. Amazon Web: 2.8%
11. SAS: 2.5%
12. Data Visualization: 2.5%
13. Data Science: 2.5%
14. Predictive Models: 2.4%
15. ETL: 2.2%
16. Data Analysis: 2%
17. SQL: 1.8%
18. Support Vector Machines: 1.7%
19. Hdfs: 1.7%
20. Natural Language Processing: 1.7%
21. Neural Networks: 1.6%
22. Pandas: 1.5%
23. K-Means: 1.5%
24. AWS: 1.4%
25. Scikit-Learn: 1.3%
26. Numpy: 1.2%
27. Teradata: 1.1%
28. Mapreduce: 1.1%
29. Mongodb: 1.1%
30. Large Data: 1.1%
31. API: 1.1%
32. Decision Trees: 1.1%
33. Statistical Models: 1%
34. Data Collection: 1%
35. BI: 1%
36. Matlab: 0.9%
37. Linux: 0.8%
38. Informatica: 0.8%
39. Machine Learning Techniques: 0.8%
40. A/B: 0.8%
41. Sentiment Analysis: 0.8%
42. Hbase: 0.8%
43. XML: 0.8%
44. Nltk: 0.8%
45. Business Requirements: 0.7%
46. Json: 0.7%
47. PCA: 0.7%
48. Spss: 0.7%
49. Mahout: 0.7%
