Data Science Course in Ahmedabad, Data Science Training Institute

Data Science Training In Ahmedabad

Introduction to Data Science with R

What is Data Science, significance of Data Science in today’s digitally-driven world, applications of Data Science, lifecycle of Data Science, components of the Data Science lifecycle, introduction to big data and Hadoop, introduction to Machine Learning and Deep Learning, introduction to R programming and RStudio.

Data Science is a multidisciplinary field that draws on software engineering, data engineering, business intelligence, scientific methods, visualization, statistics and many other disciplines. R is a statistical programming language well suited to detailed data analysis. R plays a major role in data science today and continues to open up new areas to explore. This tutorial series explains how to build Data Science applications using the R programming language. First, let us go through R.

Hands-on Exercise – Installation of RStudio, implementing simple mathematical operations and logic using R operators, loops, if statements and switch cases.

Data Exploration

Introduction to data exploration, importing and exporting data to/from external sources, what is exploratory data analysis, dataframes, working with dataframes, accessing individual elements, vectors and factors, operators, built-in functions, conditional and looping statements, user-defined functions, matrix, list and array.

Hands-on Exercise – Accessing individual elements of customer churn data, modifying and extracting the results from the dataset using user-defined functions in R.

Data Manipulation

Need for Data Manipulation, Introduction to dplyr package, Selecting one or more columns with select() function, Filtering out records on the basis of a condition with filter() function, Adding new columns with the mutate() function, Sampling & Counting with sample_n(), sample_frac() & count() functions, Getting summarized results with the summarise() function, Combining different functions with the pipe operator, Implementing SQL-like operations with sqldf.

Hands-on Exercise – Implementing dplyr to perform various operations for abstracting over how data is manipulated and stored.

Data Visualization

Introduction to visualization, different types of graphs, introduction to the grammar of graphics and the ggplot2 package, understanding categorical distribution with the geom_bar() function, understanding numerical distribution with the geom_histogram() function, building frequency polygons with geom_freqpoly(), making a scatter-plot with the geom_point() function, multivariate analysis with geom_boxplot(), univariate analysis with bar-plot, histogram and density plot, multivariate distribution with scatter-plots and smooth lines, continuous vs categorical with box-plots, subgrouping the plots, working with co-ordinates and themes to make the graphs more presentable, adding themes with the theme() layer, introduction to plotly and various plots, visualization with the ggvis package, geographic visualization with ggmap(), building web applications with Shiny.

Hands-on Exercise – Creating data visualizations with ggplot2 charts to understand the customer churn ratio, and using Plotly for importing and analyzing data in grids. You will visualize tenure, monthly charges, total charges and other individual columns using scatter plots.

Introduction to Statistics

Why do we need Statistics?, Categories of Statistics, Statistical Terminologies, Types of Data, Measures of Central Tendency, Measures of Spread, Correlation & Covariance, Standardization & Normalization, Probability & Types of Probability, Hypothesis Testing, Chi-Square testing, ANOVA, normal distribution, binomial distribution.

Hands-on Exercise – Building a statistical analysis model that uses quantifications, representations, experimental data for gathering, reviewing, analyzing and drawing conclusions from data.

Machine Learning

Introduction to Machine Learning, introduction to Linear Regression, predictive modeling with Linear Regression, simple Linear and multiple Linear Regression, Linear Regression concepts and detailed formulas, assumptions and residual diagnostics in Linear Regression, residuals, qqnorm(), qqline(), understanding the fit of the model, building a simple linear model, predicting results and finding the p-value, understanding the summary results with Null Hypothesis, p-value & F-statistic, building linear models with multiple independent variables, introduction to logistic regression, comparing linear regression and logistic regression, bivariate & multivariate logistic regression, confusion matrix & accuracy of model, threshold evaluation with ROCR.

Hands-on Exercise – Modeling the relationship within the data using linear predictor functions. Implementing Linear & Logistic Regression in R by building a model with ‘tenure’ as the dependent variable and multiple independent variables.

Logistic Regression

Introduction to Logistic Regression, Logistic Regression concepts, Linear vs Logistic Regression, math behind Logistic Regression, detailed formulas, logit function and odds, bivariate Logistic Regression, Poisson Regression, building a simple “binomial” model and predicting the result, confusion matrix and accuracy, true positive rate and false positive rate for evaluating the built model, threshold evaluation with ROCR, finding the right threshold by building the ROC plot, cross validation & multivariate Logistic Regression, building logistic models with multiple independent variables, real-life applications of Logistic Regression.

Hands-on Exercise – Implementing predictive analytics by describing the data and explaining the relationship between one binary dependent variable and one or more independent variables. You will use glm() to build a model and use ‘Churn’ as the dependent variable.

Decision Trees & Random Forest

What is classification and different classification techniques, introduction to Decision Tree, algorithm for decision tree induction, building a decision tree in R, creating a perfect Decision Tree, Confusion Matrix, Regression trees vs Classification trees, introduction to ensemble of trees and bagging, Random Forest concept, implementing Random Forest in R, what is Naive Bayes, computing probabilities, impurity functions (entropy, information gain and Gini index) and how each is used to choose the right split of a node, overfitting & pruning, pre-pruning, post-pruning, cost-complexity pruning, pruning a decision tree and predicting values, finding the right number of trees and evaluating performance metrics.

Hands-on Exercise – Implementing Random Forest for both regression and classification problems. You will build a tree with ‘churn’ as the dependent variable, prune it, and build a Random Forest with the right number of trees, using ROCR for performance metrics.

Unsupervised learning

What is Clustering & its Use Cases, what is K-means Clustering, what is Canopy Clustering, what is Hierarchical Clustering, introduction to Unsupervised Learning, feature extraction & clustering algorithms, k-means clustering algorithm, theoretical aspects of k-means and the k-means process flow, K-means in R, implementing K-means on the data-set and finding the right number of clusters using a Scree-plot, hierarchical clustering & Dendrogram, understanding Hierarchical clustering, implementing it in R and having a look at Dendrograms, Principal Component Analysis, explanation of Principal Component Analysis in detail, implementing PCA in R.

Hands-on Exercise – Deploying unsupervised learning with R to achieve clustering and dimensionality reduction, K-means clustering for visualizing and interpreting results for the customer churn data.

Association Rule Mining & Recommendation Engine

Introduction to association rule Mining & Market Basket Analysis, measures of Association Rule Mining: Support, Confidence, Lift, Apriori algorithm & implementing it in R, Introduction to Recommendation Engine, user-based collaborative filtering & Item-Based Collaborative Filtering, implementing Recommendation Engine in R, user-Based and item-Based, Recommendation Use-cases.

Hands-on Exercise – Deploying association analysis as a rule-based machine learning method, identifying strong rules discovered in databases with measures based on interesting discoveries.

Support Vector Machine – (SVM)

Introduction to Support Vector Machine (SVM), data classification using SVM, SVM algorithms for separable and inseparable cases, Linear SVM for identifying the margin hyperplane.

Naïve Bayes

What is Bayes theorem, What is Naïve Bayes Classifier, Classification Workflow, How Naive Bayes classifier works, Classifier building in Scikit-learn, building a probabilistic classification model using Naïve Bayes, Zero Probability Problem.

Data science Projects

Data Science with Python

Introduction to Data Science

What is Data Science, what does a data scientist do, various examples of Data Science in the industries and how Python is deployed for Data Science applications, various steps in Data Science process like data wrangling, data exploration and selecting the model, understanding data visualization, what is exploratory data analysis and building of hypothesis, plotting and other techniques.

Introduction to Python

Introduction to Python programming language, important Python features, how is Python different from other programming languages, Python installation, Anaconda Python distribution for Windows, Linux and Mac, how to run a sample Python script, Python IDE working mechanism, running some Python basic commands, Python variables, data types and keywords.

Hands-on Exercise – Installing the Anaconda Python distribution on Windows, Linux and Mac.

Python basic constructs

Introduction to basic constructs in Python, understanding indentation with tabs and spaces, code comments with the pound (#) character, names and variables, Python built-in data types like containers (list, set, tuple and dict), numeric (float, complex, int), text sequence (string), constants (True, False, Ellipsis) and others (classes, instances, modules, exceptions and more), basic operators in Python like logical, bitwise, assignment, comparison and more, slicing and the slice operator, loop and control statements like break, if, for, continue, else, range() and more.

Hands-on Exercise – Write your first Python program, write a Python function (with and without parameters), use a lambda expression, write a class, create a member function and a variable, create an object and write a for loop to print all odd numbers.
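The exercise above could look roughly like the following minimal sketch. The names `greet`, `square`, `is_odd` and `OddCounter` are illustrative assumptions and are not taken from the course material.

```python
# Function without parameters.
def greet():
    return "Hello, Python!"


# Function with one parameter.
def square(n):
    return n * n


# Lambda expression: a small anonymous function bound to a name.
is_odd = lambda n: n % 2 == 1


class OddCounter:
    """Hypothetical class with a member variable and a member function."""

    def __init__(self, limit):
        self.limit = limit  # member variable

    def odd_numbers(self):
        """Return all odd numbers from 1 up to and including the limit."""
        odds = []
        for n in range(1, self.limit + 1):
            if is_odd(n):
                odds.append(n)
        return odds


# Create an object and loop over the odd numbers it produces.
counter = OddCounter(10)
for n in counter.odd_numbers():
    print(n)
```

Collecting the numbers in a list before printing keeps the class easy to check in isolation; printing directly inside the loop would work just as well for a first program.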

Writing OOP in Python and connecting to database

Understanding the OOP paradigm, including encapsulation, inheritance, polymorphism and abstraction, what are access modifiers, instances, class members, classes and objects, function parameters and return types, lambda expressions, connecting to a database to pull the data.

Hands-on Exercise – Writing a Python program and incorporating the OOP concepts and connecting to a database for getting the data.
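As a rough illustration of this exercise, the sketch below combines a small class hierarchy with a database query. It assumes an in-memory SQLite database from the standard library; the course may use a different database, and the table, columns and class names here are hypothetical.

```python
import sqlite3


class Customer:
    """Base class; the leading underscore marks a field as private by convention."""

    def __init__(self, name, tenure_months):
        self.name = name
        self._tenure_months = tenure_months

    def describe(self):
        return f"{self.name} ({self._tenure_months} months)"


class PremiumCustomer(Customer):
    """Subclass that overrides describe() to demonstrate polymorphism."""

    def describe(self):
        return "Premium: " + super().describe()


def load_customers(conn):
    """Pull rows from the database and wrap each one in the matching class."""
    rows = conn.execute(
        "SELECT name, tenure_months, premium FROM customers ORDER BY name"
    )
    return [
        (PremiumCustomer if premium else Customer)(name, tenure)
        for name, tenure, premium in rows
    ]


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (name TEXT, tenure_months INTEGER, premium INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Asha", 12, 0), ("Ravi", 30, 1)],
)

customers = load_customers(conn)
for customer in customers:
    print(customer.describe())
conn.close()
```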

NumPy for mathematical computing

Introduction to mathematical computing in Python, what are arrays and matrices, array indexing, array math, ND-array object, datatypes, standard deviation, conditional probability in NumPy, correlation, covariance

Hands-on Exercise – How to import NumPy module, creating array using ND-array, calculating standard deviation on array of numbers and calculating correlation between two variables
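A minimal sketch of these steps, assuming NumPy is installed. The two arrays are made-up values chosen so the results are easy to verify by hand.

```python
import numpy as np

# Hypothetical sample data: charges are exactly twice the tenure.
tenure = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
charges = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# Population standard deviation (divides by n) vs sample (divides by n - 1).
std_population = tenure.std()
std_sample = tenure.std(ddof=1)

# corrcoef and cov return 2x2 matrices; the off-diagonal entry relates
# the two variables to each other.
correlation = np.corrcoef(tenure, charges)[0, 1]
covariance = np.cov(tenure, charges)[0, 1]

print(std_population, std_sample, correlation, covariance)
```

Note that `std()` defaults to the population formula while `np.cov` defaults to the sample formula, which is a common source of confusion when comparing results.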

SciPy for scientific computing

Introduction to SciPy, building on top of NumPy, what are the characteristics of SciPy, various subpackages for SciPy like Signal, Integrate, Fftpack, Cluster, Optimize, Stats and more, Bayes Theorem with SciPy.

Hands-on Exercise – Importing SciPy, applying the Bayes theorem to the given dataset.
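The arithmetic behind the Bayes theorem can be sketched in plain Python before bringing in any library. The example below does not use a SciPy call; the function name `bayes_posterior` and all probabilities are hypothetical and chosen only to illustrate the formula P(A|B) = P(B|A) P(A) / P(B).

```python
def bayes_posterior(prior, likelihood, likelihood_given_not):
    """Return P(A|B) from P(A), P(B|A) and P(B|not A)."""
    # Total probability of the evidence B across both cases.
    evidence = prior * likelihood + (1 - prior) * likelihood_given_not
    return prior * likelihood / evidence


# Made-up numbers: 20% of customers churn, 60% of churners filed a
# complaint, and 10% of non-churners filed a complaint.
posterior = bayes_posterior(prior=0.2, likelihood=0.6, likelihood_given_not=0.1)

# Probability that a customer who complained will churn (about 0.6).
print(round(posterior, 3))
```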

Data Analysis and Machine Learning (Pandas)

Introduction to Machine Learning with Python, various tools in Python used for Machine Learning like NumPy, Scikit-Learn, Pandas, Matplotlib and more, use cases of Machine Learning, process flow of Machine Learning, various categories of Machine Learning, understanding Linear Regression and Logistic Regression, what is gradient descent in Machine Learning, introduction to Python DataFrames, importing data from JSON, CSV, Excel, SQL database, NumPy array to DataFrame, various data operations like selecting, filtering, sorting, viewing, joining and combining, how to handle missing values, time series analysis.

Hands-on Exercise – Implementing Python libraries for Machine Learning models, doing Linear and Logistic Regression using these Python libraries.
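To make the idea of gradient descent concrete, here is a minimal sketch that fits a simple linear regression without any library. It is an illustration under assumed settings (the function name `fit_line`, the learning rate and the epoch count are all choices made for this example), not the method a library such as Scikit-Learn uses internally.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = slope * x + intercept by gradient descent on mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every point with the current parameters.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad_slope = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2 / n * sum(errors)
        # Step against the gradient.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Made-up data that follows y = 2x + 1 exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(round(slope, 4), round(intercept, 4))
```

A learning rate that is too large makes the updates diverge, and one that is too small needs many more epochs, so both values usually have to be tuned for the data at hand.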

Data manipulation

What is a data object and its basic functionalities, using Pandas library for data manipulation, NumPy dependency of Pandas library, loading and handling data with Pandas, how to merge data objects, concatenation and various types of joins on data objects, exploring and analyzing datasets.

Hands-on Exercise – Doing data manipulation with Pandas by handling tabular datasets that includes variable types like float, integer, double and others.
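The merge, join and concatenation topics above can be sketched as follows, assuming Pandas is installed. The two small tables and their column names are hypothetical.

```python
import pandas as pd

# Hypothetical tables sharing a customer_id key.
customers = pd.DataFrame({"customer_id": [1, 2, 3], "tenure": [12, 5, 30]})
charges = pd.DataFrame(
    {"customer_id": [1, 2, 4], "monthly_charges": [29.5, 70.0, 45.25]}
)

# Inner join keeps only the ids present in both tables.
inner = customers.merge(charges, on="customer_id", how="inner")

# Left join keeps every customer; missing charges become NaN.
left = customers.merge(charges, on="customer_id", how="left")

# Concatenation stacks rows and renumbers the index.
stacked = pd.concat([customers, customers], ignore_index=True)

print(inner)
print(left)
print(len(stacked))
```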

Data visualization with Matplotlib

Using Matplotlib for plotting graphs and charts like Scatter, Bar, Pie, Line, Histogram and more, Matplotlib API, Subplots and Pandas built-in data visualization.

Hands-on Exercise – Deploying Matplotlib for creating pie, scatter, line and histogram.

Supervised learning

What is supervised learning, classification, Decision Tree, algorithm for Decision Tree induction, Confusion Matrix, Random Forest, Naïve Bayes, working of Naïve Bayes, how to implement Naïve Bayes Classifier, Support Vector Machine, working process of Support Vector Machine, what is Hyperparameter Optimization, comparing Random Search with Grid Search, how to implement Support Vector Machine for classification.

Hands-on Exercise – Using the Python library Scikit-Learn to build a Random Forest model for supervised learning.
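One possible shape for this exercise is sketched below, assuming Scikit-Learn is installed. The data is synthetic (generated by `make_classification`) rather than a course dataset, and the parameter values are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic two-class data standing in for a real dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train a forest of 100 trees; random_state makes the run repeatable.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Evaluate on the held-out data.
predictions = model.predict(X_test)
matrix = confusion_matrix(y_test, predictions, labels=[0, 1])
accuracy = accuracy_score(y_test, predictions)

print(matrix)
print(accuracy)
```

Rows of the confusion matrix are the true classes and columns are the predicted classes, so the diagonal holds the correctly classified samples.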

Unsupervised Learning

Introduction to unsupervised learning, use cases of unsupervised learning, what is K-means clustering, understanding the K-means clustering algorithm, optimal clustering, hierarchical clustering and K-means clustering, how hierarchical clustering works, what is natural language processing, working with NLP on text data, setting up the environment using Jupyter Notebook, analyzing sentences, the Scikit-Learn Machine Learning algorithms, bag-of-words model, extracting features from text, searching a grid, model training, multiple parameters and building a pipeline.

Hands-on Exercise – Setting up the Jupyter notebook environment, loading of a dataset in Jupyter, algorithms in Scikit-Learn package for performing Machine Learning techniques and training a model to search a grid.
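The bag-of-words model mentioned in this module can be illustrated with the standard library alone. The sketch below is a simplified stand-in, not the Scikit-Learn implementation: it splits on whitespace only, and the function name `bag_of_words` and the sample sentences are hypothetical.

```python
from collections import Counter


def bag_of_words(documents):
    """Turn each document into a vector of word counts over a shared vocabulary."""
    tokenized = [doc.lower().split() for doc in documents]
    # The vocabulary is every distinct word, sorted for a stable column order.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


docs = ["the service was good", "the service was slow slow"]
vocabulary, vectors = bag_of_words(docs)

print(vocabulary)
for vector in vectors:
    print(vector)
```

Each vector has one position per vocabulary word, so word order within a sentence is discarded and only the counts remain.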

Web Scraping with Python

Introduction to web scraping in Python, various web scraping libraries, the BeautifulSoup and Scrapy Python packages, installing BeautifulSoup, installing the Python parser lxml, creating a soup object with input HTML, full or partial parsing, searching the tree and printing the output.

Hands-on Exercise – Installation of Beautiful Soup and the lxml Python parser, making a soup object with an input HTML file and navigating the soup tree using Python objects.
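A minimal sketch of creating a soup object and searching the tree, assuming the `beautifulsoup4` package is installed. To avoid depending on lxml, this example uses Python's built-in `html.parser` instead; the HTML snippet is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML input; a real script would read a file or a response body.
html = """
<html><body>
  <h1>Courses</h1>
  <ul>
    <li class="course"><a href="/r">Data Science with R</a></li>
    <li class="course"><a href="/python">Data Science with Python</a></li>
  </ul>
</body></html>
"""

# Build the soup object with the standard-library parser.
soup = BeautifulSoup(html, "html.parser")

# Search the tree: first <h1>, then every <li> with class "course".
title = soup.find("h1").get_text()
links = [
    (item.a.get_text(), item.a["href"])
    for item in soup.find_all("li", class_="course")
]

print(title)
print(links)
```

Passing `"lxml"` as the second argument selects the lxml parser instead, provided it is installed.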

Python integration with Hadoop and Spark

What is the need for integrating Python with Hadoop and Spark, the basics of the Hadoop ecosystem, Hadoop Common, the architecture of MapReduce and HDFS and deploying Python coding for MapReduce jobs on Hadoop framework, understanding Apache Spark, setting up Cloudera QuickStart VM, Spark tools, RDD in Spark, PySpark, integrating PySpark with Jupyter Notebook, introduction to Artificial Intelligence and Deep Learning, deploying Spark code with Python, the Machine Learning library of Spark MLlib, deploying Spark MLlib for classification, clustering and regression.

Hands-on Exercise – How to write a MapReduce job with Python, connecting to the Hadoop framework and performing the tasks, how to implement Python in a sandbox, working with the HDFS file system.
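The map, shuffle and reduce phases of a word-count job can be simulated locally before running anything on a cluster. The sketch below is only a single-process illustration: `sorted()` stands in for Hadoop's shuffle-and-sort step, and the function names and input lines are hypothetical. On a real cluster the mapper and reducer would typically be separate scripts reading from standard input.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word."""
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1


def reducer(sorted_pairs):
    """Reduce phase: sum the counts for each word in key-sorted input."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


lines = ["big data with hadoop", "big data with spark"]

# Sorting by key groups identical words together, as the shuffle step would.
shuffled = sorted(mapper(lines))
word_counts = dict(reducer(shuffled))

print(word_counts)
```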