Data Science

The program begins with an overview of Digital Transformation and covers all the phases of the Data Science process from Data Cleansing, Data Manipulation, Data Integration, Data Wrangling, Descriptive Analytics and Visualization to Predictive Analytics and Machine Learning Models using R, Python and Tableau.

Since a foundation in Statistics and Mathematics is required for Data Science and Machine Learning, the course also focuses on laying a strong foundation in Statistics, covering Descriptive Statistics, Inferential Statistics, Hypothesis Testing and Exploratory Data Analysis.

Special attention is also given to understanding how Machine Learning models work behind the scenes.

The course also helps you understand the methodology and framework in which data science projects are handled and the way data science teams are managed. A comparative study with other Project Management methodologies is also taken up as part of the course.

The course will help you launch your career in the field of data science and take up roles such as data scientist, data analyst or machine learning expert, deploying various machine learning algorithms to solve complex, real-world business problems.

What is Data Science and Machine Learning?

Data science is the practice of working with data, often at large scale, through collection, cleansing, preparation and analysis for various purposes. A data scientist collects data from multiple sources and, after analysing it, applies techniques such as predictive analytics, machine learning and sentiment analysis to extract critical information from the data sets. Data scientists analyse and understand the data from a business perspective and provide useful insights and accurate predictions that can be used when taking critical business decisions.

Data science covers a wide array of data-oriented technologies, including SQL, Python, R and Hadoop.

Data science stitches together ideas and algorithms drawn from machine learning to create a solution and, in doing so, borrows heavily from traditional statistics, domain expertise and basic mathematics. In this way, data science is the process of solving a use case end to end, whereas machine learning is an important cog in that solution.

Machine learning can be defined as the practice of using algorithms to parse data, learn from it and then forecast future trends. Traditional machine learning software comprises statistical analysis and predictive analysis, which are used to spot patterns and uncover hidden insights in observed data.

It is an application of artificial intelligence (AI) that gives systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.

Machine learning tasks are typically classified into three broad categories, depending on the nature of the learning "signal" or "feedback" available to a learning system:

      1. Supervised Learning
      2. Unsupervised Learning
      3. Reinforcement Learning

List of Common Machine Learning Algorithms

Here is a list of commonly used machine learning algorithms. Between them, these algorithms can be applied to a wide range of data problems:

      1. Linear Regression
      2. Logistic Regression
      3. Decision Tree
      4. SVM
      5. Naive Bayes
      6. KNN
      7. K-Means
      8. Random Forest
      9. Dimensionality Reduction Algorithms
      10. Gradient Boosting algorithms
         • GBM
         • XGBoost
         • LightGBM
         • CatBoost

Who is a Data Scientist?

Data scientists are responsible for discovering insights from massive amounts of structured and unstructured data to help shape or meet specific business needs and goals. The data scientist role is becoming increasingly important as businesses rely more heavily on data analytics to drive decision-making and lean on automation and machine learning as core components of their IT strategies.

Industries where Data Science is being used

Life Sciences and Health Care, Retail, Banking, Telecom, FMCG, eCommerce, Manufacturing, etc.

Participants can further specialise in different verticals based on their domain experience: HR Analytics, Marketing Analytics, Social Network Analytics, Health Care Analytics and Fraud Analytics.

Machine learning aids data science by providing a suite of algorithms for data modelling, decision making and even data preparation.

Where machine learning, at its core, is about the use and development of these learning algorithms, data science is more about the extraction of knowledge from data to answer particular questions or solve particular problems.

Machine learning is often a big part of a "data science" project. It is often heavily used for exploratory analysis and discovery (clustering algorithms) and building predictive models (supervised learning algorithms). However, in data science, you often also worry about the collection, wrangling, and cleaning of your data (i.e., data engineering), and eventually, you want to draw conclusions from your data that help you solve a particular problem.

Course Information

  • Class Start: 27th Oct 2018
  • Course Duration: 3 Months / 100+ hours
  • Class Schedule: Saturday & Sunday
  • Class Time:

Course Curriculum

Intro to Data Science

  • Intro to Data Science
  • WHY Data Science
  • WHAT is achieved using Data Science
  • Intro to Algebra / Statistics / Probability involved in Data Science
  • A Case Study on usage of Data Science

Getting Started with R

  • Getting, Installing & Setting up R & RStudio
  • Understanding the R Interface & various key Features of R & RStudio
  • Short Introduction to R Programming Language
  • R Data Structures
  • R Packages & Functions
  • Data Handling in R
  • Data Visualization in R
  • 2 Exercises using R

Data Handling & Visualization in R

  • Data Handling & Data Management using R
  • Exploratory Data Analysis using R
  • Data Wrangling using R (dplyr)
  • Text Mining Using R
  • R Graphics
  • Data Visualization using R

Descriptive Statistics

  • Measures of Central Tendency
  • Spread of Data
  • Association between variables
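
As a rough illustration of the three topics above, the following Python sketch computes measures of central tendency, spread, and the association between two variables using only the standard library. The sample data and the helper name `pearson_r` are made up for illustration and are not taken from the course material.

```python
import math
import statistics

# Hypothetical sample: hours studied and exam scores for six learners.
hours = [2, 4, 4, 5, 7, 8]
scores = [50, 60, 65, 70, 85, 90]

# Measures of central tendency
mean_hours = statistics.mean(hours)
median_hours = statistics.median(hours)
mode_hours = statistics.mode(hours)

# Spread of data
value_range = max(hours) - min(hours)
sample_std = statistics.stdev(hours)  # sample standard deviation (divides by n - 1)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Association between variables: values near +1 indicate a strong positive linear link.
association = pearson_r(hours, scores)

print(mean_hours, median_hours, mode_hours, value_range)
print(round(sample_std, 3), round(association, 3))
```

The correlation is computed by hand here so the formula stays visible; it measures linear association only.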

Inferential Statistics

  • Basics of Probability
  • Probability Distribution
  • Sampling and Sampling Distribution

Hypothesis Testing

  • Concepts in Hypothesis Testing
  • Setting up Hypothesis Test
  • When not to use Z-test
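
To make the idea of setting up a test concrete, here is a minimal Python sketch of a one-sample, two-sided Z-test. The numbers and the function name `one_sample_z_test` are hypothetical. The Z-test assumes the population standard deviation is known and the sample is reasonably large; when the standard deviation must be estimated from a small sample, a t-test is the usual alternative.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, population_mean, population_std, n):
    """Return (z statistic, two-sided p-value) for a one-sample Z-test."""
    standard_error = population_std / math.sqrt(n)
    z = (sample_mean - population_mean) / standard_error
    # Two-sided p-value: probability of a result at least this extreme under H0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical set-up: H0 says the mean is 100; a sample of 36 has mean 104,
# and the population standard deviation is assumed known to be 12.
z, p = one_sample_z_test(sample_mean=104, population_mean=100, population_std=12, n=36)
reject_h0 = p < 0.05  # 5% significance level, chosen for illustration

print(round(z, 2), round(p, 4), reject_h0)
```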

Python

  • Data Structures in Python
  • Control Structures & Functions
  • Data Analysis Using Pandas

SQL

  • SQL Basics and Introduction

Case Study 1 Using R

  • R hands-on: Case Study implementation using R.
  • Twitter Sentiment Analysis Case Study, which will cover
    • Data Handling
    • Data Wrangling
    • Text Mining
    • Data Visualization

Intro to Machine Learning

  • Machine Learning Basics
  • WHY Machine Learning
  • HOW Machine Learns
  • High level view on Machine Learning Algorithms
  • WHEN to use WHICH Algorithms
  • Data Preparation & Exploration for Machine Learning
  • Data Split - Creating Training Data & Testing Data
  • Missing Data Handling & Data Imputation
  • Machine Learning in R - Required Packages & Functions
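
The course works through these steps in R; purely as an illustration, the same ideas can be sketched in plain Python. The data, function names and the 25% test fraction below are assumptions made for this example. The sketch splits first and then fills missing values using the mean of the training portion only, which is one common way to keep test data from influencing the preparation step.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def mean_of_observed(values):
    """Mean of the non-missing entries (missing values are None)."""
    return statistics.mean(v for v in values if v is not None)


def fill_missing(values, fill_value):
    """Replace every None with the given fill value."""
    return [fill_value if v is None else v for v in values]


# Hypothetical column of ages with two missing entries.
ages = [25, None, 31, 40, None, 28, 36, 30]

train, test = train_test_split(ages)
fill_value = mean_of_observed(train)       # learned from training data only
train_filled = fill_missing(train, fill_value)
test_filled = fill_missing(test, fill_value)

print(len(train_filled), len(test_filled))
```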

Case Study 2 Using R

  • R hands-on: Case Study implementation using R.
  • Case Study on Data Preparation for Machine Learning
  • Using the HouseVotes84 Dataset available in 'mlbench' package
    • Identifying Missing Data
    • Visualizing the Missing Data
    • Data Imputation for Missing Data
    • Data Splitting to create Training & Testing Data

Unsupervised Machine Learning

  • Unsupervised Machine Learning Algorithms
    • Association Rules / Market Basket Analysis
      • What is Market Basket Analysis & Why it is used
      • Various Concepts involved in MBA
      • Apriori Algorithm
    • Clustering Analysis
      • What is Clustering Analysis
      • Various concepts of Clustering
      • K-Means Clustering
      • Hierarchical Clustering
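
To give a feel for how K-Means clustering works, here is a minimal sketch in plain Python that alternates between assigning points to their nearest centroid and moving each centroid to the mean of its cluster. The points, the starting centroids and the function name `k_means` are made up for illustration; practical work would normally rely on a library implementation and a more careful choice of starting centroids.

```python
import math


def k_means(points, centroids, max_iterations=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = list(centroids)
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            distances = [math.dist(point, c) for c in centroids]
            clusters[distances.index(min(distances))].append(point)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Hypothetical 2-D data with two visibly separate groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])

print(centroids)
print(clusters)
```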

Case Study 3/4 & 5 Using R

  • R Hands-on: Case Study implementation using R:
  • Retail Case Study on Association Rule
  • Using R, perform Association Rule Mining
  • Case Study: Identifying frequently purchased groceries with association rules
  • Wine Data Case Study on Clustering
  • Applying K-Means Clustering
  • Case Study: Diagnosing Breast Cancer with KNN Algorithm
  • Case Study: Finding Teen Market segment using K-means Clustering
  • Applying Hierarchical Clustering

Supervised Learning - Classification

  • Supervised Machine Learning: Classification Algorithms
  • What is Supervised Machine Learning
  • What is Classification Algorithm
    • Naïve Bayes Classification
    • Logistic Regression
    • Decision Trees
    • Random Forest
    • SVM

Case Study 6/7 & 8 using R

  • R Hands-on: Case Study implementation using R:
  • Case Study to identify Political Affiliation based on voting Pattern
  • Applying Naïve Bayes using R
  • Case Study: Filtering mobile phone spam with Naïve Bayes Algorithm
  • Case study to Predict the Survival on the Titanic
  • Applying Logistic Regression using R

Case Study 9/10 & 11 using R

  • R Hands-on: Case Study implementation using R:
  • Case study to implement Decision Trees in R
  • Telecom Case Study
  • Case Study: Identifying Risky bank loans using C5.0 Decision Trees
  • Case Study to implement Random Forest
  • Sonar Data Case Study

ML Model Validation Techniques

  • Validation of the Predictive Model Developed
    • Resubstitution Technique
    • Holdout
    • K-fold Cross Validation
    • LOOCV (Leave One Out Cross Validation)
    • Random Subsampling
    • Bootstrapping
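
As an illustration of two of the techniques listed above, the sketch below generates the train/test index pairs for K-fold cross validation in plain Python. LOOCV falls out as the special case where the number of folds equals the number of samples. The function name `k_fold_indices` is hypothetical, and in practice the rows are usually shuffled before being split into folds.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for K-fold cross validation."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# 3-fold split of 10 samples: every sample is used for testing exactly once.
folds = list(k_fold_indices(10, 3))

# LOOCV on 5 samples: 5 folds, each holding out a single sample.
loocv = list(k_fold_indices(5, 5))

for train, test in folds:
    print("train:", train, "test:", test)
```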

Methods to Fine-tune Input Data & Evaluation of Binary Classification

  • Anomaly Detection Technique
  • Data Anomaly & Outlier Detection using R
  • Evaluation of Classification Model using Confusion Matrix
  • True Positive / False Positive / True Negative / False Negative
  • Other Metrics from Confusion Matrix
  • Accuracy / Specificity / Precision / Recall
  • ROC Curve & AUC
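
The confusion-matrix metrics listed above can be sketched in a few lines of Python. The labels below are invented for illustration and the function name `confusion_counts` is an assumption of this example; ROC curves and AUC are not covered by the sketch.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true positives, false positives, true negatives and false negatives."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


# Hypothetical binary labels: 1 is the positive class, 0 the negative class.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(actual, predicted)

accuracy = (tp + tn) / (tp + fp + tn + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted positives that are correct
recall = tp / (tp + fn)                     # share of actual positives that are found
specificity = tn / (tn + fp)                # share of actual negatives that are found

print(tp, fp, tn, fn)
print(accuracy, precision, recall, round(specificity, 3))
```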

Case Study 12 / 13 using R

  • R Hands-on: Case Study implementation using R:
    • Anomaly Detection using R (Wikipedia Pageview Case Study)
    • Confusion Matrix computation for HouseVotes84 Case Study
    • Creating ROC Curve + computing AUC for HouseVotes84 data

ML Prediction Using Linear Regression

  • Linear Regression Technique
    • What is Linear Regression
    • When to use Linear Regression
  • Simple vs. Multiple Linear Regression
  • Measures of Model Performance
    • R-Squared
    • Adjusted R-Squared
    • RMSE (Root Mean Square Error)
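
As a small worked example of the performance measures above, the following Python sketch fits a simple linear regression by ordinary least squares and then computes R-Squared, Adjusted R-Squared and RMSE. The data points and function names are made up for illustration.

```python
import math
import statistics


def fit_simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return slope, intercept


def regression_metrics(y_true, y_pred, n_predictors):
    """Return (R-squared, adjusted R-squared, RMSE)."""
    n = len(y_true)
    mean_y = statistics.mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total variation
    r2 = 1 - ss_res / ss_tot
    # Adjusted R-squared penalises the model for each extra predictor.
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    rmse = math.sqrt(ss_res / n)
    return r2, adjusted_r2, rmse


# Hypothetical data with one predictor.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = fit_simple_linear_regression(x, y)
predictions = [intercept + slope * value for value in x]
r2, adjusted_r2, rmse = regression_metrics(y, predictions, n_predictors=1)

print(round(slope, 3), round(intercept, 3))
print(round(r2, 3), round(adjusted_r2, 3), round(rmse, 3))
```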

Case Study 14 & 15 using R

  • R Hands-on: Case Study implementation using R:
    • Simple Linear Regression Case Study
    • Australian Athletics Data
    • Case Study: Predicting Medical Expenses using Linear Regression
  • Multiple Linear Regression Case Study
    • OZONE Case Study

Time Series Analysis Using R

  • Time Series Analysis using R
  • WHAT is Time Series Analysis
  • WHEN to use it
  • ARIMA
  • Random Walks
  • State Space Models
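
To illustrate one of the topics above, here is a minimal Python sketch of a random walk, in which each value is the previous value plus random noise. Taking first differences of such a series recovers the noise terms, which is the idea behind the differencing ("integrated") step in ARIMA. The function names, the seed and the series length are assumptions of this example.

```python
import random


def simulate_random_walk(n_steps, seed=0):
    """Each value is the previous value plus standard normal noise."""
    rng = random.Random(seed)
    position = 0.0
    walk = []
    for _ in range(n_steps):
        position += rng.gauss(0, 1)
        walk.append(position)
    return walk


def difference(series):
    """First differences: the change from one time step to the next."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


walk = simulate_random_walk(200)
steps = difference(walk)  # roughly the noise terms that generated the walk

print(len(walk), len(steps))
```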

Case Study 16 using R

  • R Hands-on: Case Study implementation using R:
  • Manufacturing Case Study using ARIMA
  • Forecasting tractor sales using time series and ARIMA models

Tableau

  • Visualisation with Tableau

Intro to Big Data

  • Introduction to Big Data and Industry Applications and Utility

Real-time Project Experience

  • Guest Lecture on Real-time Project Experience by Rohit

Managing DS Projects and Teams

  • Guest Lecture on How to Manage DS Projects and Handle DS Teams

Introduction to Deep Learning / NLP and Neural Networks

Live Project Discussion