Data Science

The program begins with an overview of Digital Transformation and covers all the phases of the Data Science process from Data Cleansing, Data Manipulation, Data Integration, Data Wrangling, Descriptive Analytics and Visualization to Predictive Analytics and Machine Learning Models using R, Python and Tableau.
Since Data Science and Machine Learning require a grounding in Statistics and Mathematics, the course also focuses on laying a strong foundation in Statistics, covering Descriptive Statistics, Inferential Statistics, Hypothesis Testing and Exploratory Data Analysis.
Special attention is also given to understanding how the Machine Learning models work behind the scenes.
The course also helps you understand the methodology and framework in which data science projects are handled and the way data science teams are managed. A comparative study with other Project Management methodologies will also be taken up as part of the course.
The course will help you launch your career in data science and take up roles such as data scientist, data analyst or machine learning expert, deploying machine learning algorithms to solve complex, real-world business problems.
What is Data Science and Machine Learning?
Data science is a term used for dealing with big data, and it includes data collection, cleansing, preparation and analysis for various purposes. A data scientist collects data from multiple sources and, after analysis, applies predictive analysis, machine learning or sentiment analysis to extract the critical information from the data sets. Data scientists analyse and understand the data from a business perspective and give useful insights and accurate predictions that can be used when taking critical business decisions.
Data science covers a wide array of data-oriented technologies, including SQL, Python, R and Hadoop.
Data Science stitches together ideas and algorithms drawn from machine learning to create a solution, and in doing so borrows many ideas from traditional statistics, domain expertise and basic mathematics. In this way, data science is the process of solving a use case and providing a solution, whereas machine learning is an important cog in that solution.
Machine learning can be defined as the practice of using algorithms to parse data, learn from it and then forecast future trends. Traditional machine learning software comprises statistical analysis and predictive analysis, which are used to spot patterns and uncover hidden insights in observed data.
It is an application of artificial intelligence (AI) that gives systems the ability to learn and improve automatically from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.
Machine learning tasks are typically classified into three broad categories, depending on the nature of the learning "signal" or "feedback" available to a learning system:
 1. Supervised Learning
 2. Unsupervised Learning
 3. Reinforcement Learning
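As a rough illustration of the supervised case, the sketch below "learns" from a handful of labelled points and predicts the label of a new point with a 1-nearest-neighbour rule. The data and the function name `predict_1nn` are made up for illustration and are not part of the course material.

```python
import math

def predict_1nn(train, query):
    """Return the label of the training point closest to `query`."""
    # Each training item is a (features, label) pair; pick the nearest one.
    nearest = min(train, key=lambda item: math.dist(item[0], query))
    return nearest[1]

# Hypothetical labelled examples: (features, label)
training_data = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((8.0, 9.0), "high"),
    ((9.0, 8.5), "high"),
]

print(predict_1nn(training_data, (2.0, 1.5)))  # closest to a "low" point
print(predict_1nn(training_data, (8.5, 9.5)))  # closest to a "high" point
```

The labels act as the "feedback" mentioned above. An unsupervised method would receive only the feature tuples and would have to find the two groups on its own.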
List of Common Machine Learning Algorithms
Here is a list of commonly used machine learning algorithms. Between them, they can be applied to a wide range of data problems:
 1. Linear Regression
 2. Logistic Regression
 3. Decision Tree
 4. SVM
 5. Naive Bayes
 6. KNN
 7. K-Means
 8. Random Forest
 9. Dimensionality Reduction Algorithms
 10. Gradient Boosting algorithms
 11. GBM
 12. XGBoost
 13. LightGBM
 14. CatBoost
Who is a Data Scientist?
Data scientists are responsible for discovering insights from massive amounts of structured and unstructured data to help shape or meet specific business needs and goals. The data scientist role is becoming increasingly important as businesses rely more heavily on data analytics to drive decision-making and lean on automation and machine learning as core components of their IT strategies.
Industries where Data Science is being used
Life Sciences and Healthcare, Retail, Banking, Telecom, FMCG, eCommerce, Manufacturing, etc.
Participants can further specialise in different verticals based on their domain experience: HR Analytics, Marketing Analytics, Social Network Analytics, Healthcare Analytics and Fraud Analytics.
Machine learning aids data science by providing a suite of algorithms for data modelling, decision making and even data preparation.
Where machine learning, at its core, is about the use and development of these learning algorithms, data science is more about the extraction of knowledge from data to answer particular questions or solve particular problems.
Machine learning is often a big part of a "data science" project. It is often heavily used for exploratory analysis and discovery (clustering algorithms) and building predictive models (supervised learning algorithms). However, in data science, you often also worry about the collection, wrangling, and cleaning of your data (i.e., data engineering), and eventually, you want to draw conclusions from your data that help you solve a particular problem.
Course Information
 Class Start: 23rd March 2019
 Course Duration: 3 Months / 100+ hours
 Class Schedule: Saturday & Sunday
 Class Time:
Course Curriculum
Intro to Data Science
 Intro to Data Science
 WHY Data Science
 WHAT is achieved using Data Science
 Intro to Algebra / Statistics / Probability involved in Data Science
 A Case Study on usage of Data Science
Getting Started with R
 Getting, Installing & Setting up R & RStudio
 Understanding R Interface & various key Features of R & RStudio
 Short Introduction to R Programming Language
 R Data Structures
 R Packages & Functions
 Data Handling in R
 Data Visualization in R
 2 Exercises using R
Data Handling & Visualization in R
 Data Handling & Data Management using R
 Exploratory Data Analysis using R
 Data Wrangling using R (dplyr)
 Text Mining Using R
 R Graphics
 Data Visualization using R
Descriptive Statistics
 Measures of Central Tendency
 Spread of Data
 Association between variables
Inferential Statistics
 Basics of Probability
 Probability Distribution
 Sampling and Sampling Distribution
Hypothesis Testing
 Concepts in Hypothesis Testing
 Setting up Hypothesis Test
 When not to use the Z-test
Python
 Data Structures in Python
 Control Structures & Functions
 Data Analysis using Pandas
SQL
 SQL Basics and Introduction
Case Study 1 Using R
 R Hands-on: Case Study implementation using R
 Twitter Sentiment Analysis Case Study, which will cover:
 Data Handling
 Data Wrangling
 Text Mining
 Data Visualization
Intro to Machine Learning
 Machine Learning Basics
 WHY Machine Learning
 HOW Machine Learns
 High level view on Machine Learning Algorithms
 WHEN to use WHICH Algorithms
 Data Preparation & Exploration for Machine Learning
 Data Split: Creating Training Data & Testing Data
 Missing Data Handling & Data Imputation
 Machine Learning in R: Required Packages & Functions
Case Study 2 Using R
 R Hands-on: Case Study implementation using R
 Case Study on Data Preparation for Machine Learning
 Using the HouseVotes84 Dataset available in the 'mlbench' package
 Identifying Missing Data
 Visualizing the Missing Data
 Data Imputation for Missing Data
 Data Splitting to create Training & Testing Data
Unsupervised Machine Learning
 Unsupervised Machine Learning Algorithms
 Association Rules / Market Basket Analysis
 What is Market Basket Analysis & Why it is used
 Various Concepts involved in MBA
 Apriori Algorithm
 Clustering Analysis
 What is Clustering Analysis
 Various concepts of Clustering
 K-Means Clustering
 Hierarchical Clustering
Case Studies 3, 4 & 5 Using R
 R Hands-on: Case Study implementation using R:
 Retail Case Study on Association Rule
 Using R, perform Association Rule Mining
 Case Study: Identifying frequently purchased groceries with association rules
 Wine Data Case Study on Clustering
 Applying K-Means Clustering
 Case Study: Diagnosing Breast Cancer with KNN Algorithm
 Case Study: Finding Teen Market segment using K-Means Clustering
 Applying Hierarchical Clustering
Supervised Learning Classification
 Supervised Machine Learning: Classification Algorithms
 What is Supervised Machine Learning
 What is Classification Algorithm
 Naïve Bayes Classification
 Logistic Regression
 Decision Trees
 Random Forest
 SVM
Case Studies 6, 7 & 8 Using R
 R Hands-on: Case Study implementation using R:
 Case Study to identify Political Affiliation based on Voting Pattern
 Applying Naïve Bayes using R
 Case Study: Filtering mobile phone spam with Naïve Bayes Algorithm
 Case study to Predict the Survival on the Titanic
 Applying Logistic Regression using R
Case Studies 9, 10 & 11 Using R
 R Hands-on: Case Study implementation using R:
 Case study to implement Decision Trees in R
 Telecom Case Study
 Case Study: Identifying Risky bank loans using C5.0 Decision Trees
 Case Study to implement Random Forest
 Sonar Data Case Study
ML Model Validation Techniques
 Validation of the Predictive Model Developed
 Resubstitution Technique
 Holdout
 K-fold Cross-Validation
 LOOCV (Leave-One-Out Cross-Validation)
 Random Subsampling
 Bootstrapping
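The K-fold idea listed above can be sketched in a few lines: split the sample indices into K parts, and let each part serve once as the test set while the rest form the training set. This is a minimal sketch with a hypothetical helper name, `k_fold_indices`. It does not shuffle the data, which a real workflow would normally do first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# 5-fold split of 10 samples: every sample is tested exactly once.
folds = list(k_fold_indices(10, 5))
for train, test in folds:
    print("train:", train, "test:", test)

# LOOCV is the special case where k equals the number of samples.
loocv = list(k_fold_indices(4, 4))
```

A single holdout split corresponds to taking just one of these (train, test) pairs.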
Methods to Fine-tune Input Data & Evaluation of Binary Classification
 Anomaly Detection Technique 
 Data Anomaly & Outlier Detection using R
 Evaluation of Classification Model using Confusion Matrix
 True Positive / False Positive / True Negative / False Negative
 Other Metrics from Confusion Matrix
 Accuracy / Specificity / Precision / Recall
 ROC Curve & AUC
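To make the confusion-matrix metrics above concrete, here is a small sketch that counts the four cells for a binary classifier and derives accuracy, precision, recall and specificity from them. The labels are invented for illustration, the function names are hypothetical, and the code assumes no denominator is zero.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive and a != positive:
            fp += 1
        elif p != positive and a != positive:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def classification_metrics(tp, fp, tn, fn):
    """Derive common metrics; assumes every denominator is non-zero."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),    # of predicted positives, how many are right
        "recall": tp / (tp + fn),       # of actual positives, how many were found
        "specificity": tn / (tn + fp),  # of actual negatives, how many were found
    }

# Made-up labels: 1 = positive class, 0 = negative class
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(actual, predicted)
metrics = classification_metrics(*counts)
print(counts)
print(metrics)
```

The ROC curve is built from the same quantities: it plots recall (the true positive rate) against 1 - specificity (the false positive rate) as the classification threshold varies.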
Case Studies 12 & 13 Using R
 R Hands-on: Case Study implementation using R:
 Anomaly Detection using R (Wikipedia Pageview Case Study)
 Confusion Matrix computation for HouseVotes84 Case Study
 Creating ROC Curve + computing AUC for HouseVotes84 data
ML Prediction Using Linear Regression
 Linear Regression Technique 
 What is Linear Regression
 When to use Linear Regression
 Simple vs. Multiple Linear Regression
 Measures of Model Performance
 R-Squared
 Adjusted R-Squared
 RMSE (Root Mean Square Error)
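The three performance measures above follow from the residuals of a fitted model. The sketch below computes them from actual and predicted values using the standard textbook definitions. The function name and the numbers are illustrative assumptions, not taken from the course case studies.

```python
import math

def regression_metrics(actual, predicted, n_predictors):
    """Return (R-squared, adjusted R-squared, RMSE) for a fitted model."""
    n = len(actual)
    mean_actual = sum(actual) / n
    # Residual sum of squares: variation the model fails to explain.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: variation around the mean of the actual values.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    # Adjusted R-squared penalises additional predictors.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    rmse = math.sqrt(ss_res / n)
    return r2, adj_r2, rmse

# Made-up values for a model with a single predictor
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.0]

r2, adj_r2, rmse = regression_metrics(actual, predicted, n_predictors=1)
print(round(r2, 4), round(adj_r2, 4), round(rmse, 4))
```

Adjusted R-squared is the more useful of the two when comparing a simple model against a multiple regression, since plain R-squared never decreases when a predictor is added.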
Case Studies 14 & 15 Using R
 R Hands-on: Case Study implementation using R:
 Simple Linear Regression Case Study 
 Australian Athletics Data
 Case Study: Predicting Medical Expenses using Linear Regression
 Multiple Linear Regression Case Study
 OZONE Case Study
Time Series Analysis Using R
 Time Series Analysis using R
 WHAT is Time Series Analysis
 WHEN to use it
 ARIMA
 Random Walks
 State Space Models
Case Study 16 using R
 R Hands-on: Case Study implementation using R:
 Manufacturing Case Study using ARIMA
 Manufacturing case study example: forecasting tractor sales with time series and ARIMA models
Tableau
 Visualisation with Tableau
Intro to Big Data
 Introduction to Big Data and Industry Applications and Utility
Real time Project Experience
 Guest Lecture on Real time Project Experience by Rohit
Managing DS Projects and Teams
 Guest Lecture on How to Manage DS Projects and Handle DS Teams