Data science, also known as data-driven science, is an interdisciplinary field that uses scientific methods, processes, and systems to extract knowledge or insights from data in various forms, both structured and unstructured, similar to data mining.
The Introduction to Data Science class will survey the foundational topics in data science.
1.) What is Data Science? – Introduction.
2.) Importance of Data Science.
3.) Demand for Data Science Professionals.
4.) Brief Introduction to Big Data and Data Analytics.
5.) The lifecycle of data science.
6.) Tools and Technologies Used in Data Science.
7.) Business Intelligence vs Data Science.
8.) Role of a data scientist
Use the R Programming Language to execute data science projects and become a data scientist
1. Introduction to R
R Basics, background.
Comprehensive R Archive Network (CRAN)
Demo of Installing R on Windows from the CRAN Website
Installing RStudio on Windows
Setting Up the R Workspace.
Getting Help in R – How to Use the Help System
Installing Packages – Loading And Unloading Packages
2. Starting with R: Getting familiar with basics
Operators in R – Arithmetic, Relational, Logical and Assignment Operators
Variables, Types Of Variables, Using variables
Conditional Statements, ifelse(), switch()
Loops: for Loops, while Loops, Using the break Statement
3. The R Programming Language – Data Types and Functions
Use R for simple maths and create data objects from the keyboard.
How to make different types of data objects.
Understand the various data types that the language supports.
Introduction to Functions in R
Types of data structures in R
Arrays and Lists – Create and Access the Elements
Vectors – Create Vectors, Vectorized Operations, Power of Vectorized Operations
Matrices – Building the First Matrices, Matrix Operations, Subsetting, Visualizing Subsets with matplot()
Factors – Creating a Factor
Data Frames – Create and Filter Data Frames, Building and Merging Data Frames.
4. Functions And Importing data into R
Function Overview – Naming Guidelines
Argument Matching, Functions with Multiple Arguments
Additional Arguments using Ellipsis, Lazy Evaluation
Multiple Return Values
Functions as Objects, Anonymous Functions
Importing and Exporting Data in R – Importing from Files such as Excel, CSV, and Minitab
Importing from URLs and Excel Files
Importing from a Database.
5. Descriptive Statistics, Tabulation, and Distributions
Summary Statistics for Matrix Objects.
The apply() Command, Converting an Object into a Table
Histograms, Stem-and-Leaf Plots, Density Functions, the Normal Distribution
6. Graphics in R – Types of graphics
Bar Charts, Pie Charts, Histograms – Create and Edit
Box Plots – Basics of Box Plots, Create and Edit
Visualisation in R using ggplot2.
More About Graphs: Adding Legends to Graphs
Adding Text to Graphs, Orienting the Axis Labels.
SQL is a standard language for accessing and manipulating databases.
1. Introduction to SQL Server and RDBMS
Covers an overview of using relational databases. You’ll learn the basic terminology used in later modules. SQL Server Management Studio is the primary tool used to create queries and manage objects in SQL Server databases.
2. SQL Operations
Single Table Queries – SELECT, WHERE, ORDER BY, DISTINCT, AND, OR
Multiple Table Queries: INNER, SELF, CROSS, and OUTER Joins (LEFT, RIGHT, FULL), UNION, and more
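The single- and multi-table queries above can be tried without a SQL Server installation. This is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in for SQL Server; the customers and orders tables and their rows are hypothetical. SQLite's dialect differs from T-SQL in places (RIGHT and FULL OUTER JOIN, for example, only arrived in SQLite 3.39), so the sketch sticks to INNER and LEFT joins.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (illustration only).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables and rows.
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Asha', 'Pune'), (2, 'Ben', 'Leeds'), (3, 'Chen', 'Pune');
INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 80.0), (12, 2, 120.0);
""")

# Single-table query: SELECT with DISTINCT, WHERE and ORDER BY.
pune = cur.execute(
    "SELECT DISTINCT name FROM customers WHERE city = 'Pune' ORDER BY name"
).fetchall()

# INNER JOIN: only customers that have at least one matching order.
inner = cur.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

# LEFT (OUTER) JOIN: every customer; a missing order comes back as NULL (None).
left = cur.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

print(pune)   # [('Asha',), ('Chen',)]
print(inner)  # [('Asha', 250.0), ('Asha', 80.0), ('Ben', 120.0)]
print(left)   # [('Asha', 250.0), ('Asha', 80.0), ('Ben', 120.0), ('Chen', None)]
conn.close()
```

The LEFT JOIN result is the one to compare with the INNER JOIN: Chen has no orders, so only the outer join keeps that row.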
3. Advanced SQL Operations
Data Aggregation and Summarizing Data
Ranking Functions: Top-N Analysis
Advanced SQL Queries for Analytics
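Aggregation and Top-N analysis can be sketched the same way. Again this assumes SQLite through Python's sqlite3 module and a hypothetical sales table. In SQL Server the Top-N query would usually be written with SELECT TOP (2) ..., whereas SQLite uses LIMIT; when ties matter, ranking functions such as RANK() OVER (...) are the more robust tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical sales table (amount stored as REAL, so sums come back as floats).
cur.executescript("""
CREATE TABLE sales (region TEXT, rep TEXT, amount REAL);
INSERT INTO sales VALUES
  ('North', 'Ana', 500), ('North', 'Bo', 300), ('North', 'Ana', 200),
  ('South', 'Cy', 900), ('South', 'Di', 100), ('West', 'Ed', 400);
""")

# Aggregation: total, count and average amount per region.
summary = cur.execute("""
    SELECT region, SUM(amount) AS total, COUNT(*) AS n, AVG(amount) AS avg_amount
    FROM sales
    GROUP BY region
    ORDER BY region
""").fetchall()

# Top-N analysis: the two reps with the highest total sales.
top2 = cur.execute("""
    SELECT rep, SUM(amount) AS total
    FROM sales
    GROUP BY rep
    ORDER BY total DESC
    LIMIT 2
""").fetchall()

print(summary[1])  # ('South', 1000.0, 2, 500.0)
print(top2)        # [('Cy', 900.0), ('Ana', 700.0)]
conn.close()
```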
A comprehensive learning path to becoming a data scientist using Python. Topics include machine learning, deep learning, and pandas.
1. Python Programming Basics
Installing Jupyter Notebooks
Python Overview
Python 2.7 vs Python 3
Python Identifiers
Various Operators and Operator Precedence
Getting Input from the User, Comments, Multi-line Comments.
2. Making Decisions And Loop Control
Simple if Statement, if-else Statement
if-elif Statement.
Introduction To while Loops.
Introduction To for Loops, Using continue and break
3. Python Data Types: List, Tuples, Dictionaries
Python Lists, Tuples, Dictionaries
Accessing Values
Basic Operations
Indexing, Slicing, and Matrices
Built-in Functions & Methods
Exercises on Lists, Tuples, and Dictionaries
4. Functions And Modules
Introduction To Functions – Why Use Functions
Defining Functions
Calling Functions
Functions With Multiple Arguments.
Anonymous Functions – Lambda
Using Built-In Modules, User-Defined Modules, Module Namespaces
Iterators And Generators
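A short sketch of the function topics above in Python 3. The names describe and squares and the sample data are hypothetical; they only illustrate the syntax for multiple arguments, a lambda, and a generator driven by the iterator protocol.

```python
def describe(name, *scores, sep=": "):
    """Positional argument, variable-length *scores, and a keyword argument with a default."""
    return name + sep + ", ".join(str(s) for s in scores)


# Anonymous (lambda) function used as a sort key: sort pairs by their second item.
pairs = [("b", 3), ("a", 5), ("c", 1)]
by_value = sorted(pairs, key=lambda p: p[1])


def squares(n):
    """Generator function: yields i * i lazily instead of building a full list."""
    for i in range(n):
        yield i * i


gen = squares(4)
first = next(gen)  # iterator protocol: next() pulls one value
rest = list(gen)   # list() consumes the remaining values

print(describe("scores", 90, 85))  # scores: 90, 85
print(by_value)                    # [('c', 1), ('b', 3), ('a', 5)]
print(first, rest)                 # 0 [1, 4, 9]
```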
5. File I/O And Exception Handling
Opening and Closing Files
The open() Function, File Object Attributes
The close() Method, read(), write(), seek(), Exception Handling, the try-finally Clause
Raising Exceptions, User-Defined Exceptions
Regular Expressions – Search and Replace
Regular Expression Modifiers
Regular Expression Patterns, the re Module
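File handling, exceptions and regular expressions fit together in one small sketch. It assumes a hypothetical task, masking e-mail addresses in a text file; EmptyFileError, EMAIL and mask_emails are illustrative names, and the e-mail pattern is deliberately simplistic rather than a complete address validator.

```python
import os
import re
import tempfile


class EmptyFileError(Exception):
    """Hypothetical user-defined exception raised for an empty input file."""


# Compiled pattern; re.IGNORECASE is shown as an example of a regular-expression modifier.
EMAIL = re.compile(r"[\w.]+@[\w.]+\.\w+", re.IGNORECASE)


def mask_emails(path):
    f = open(path, "r", encoding="utf-8")
    try:
        text = f.read()
    finally:
        f.close()  # runs whether or not read() raised an exception
    if not text:
        raise EmptyFileError(path)
    # Search and replace: every match is substituted with a placeholder.
    return EMAIL.sub("<email>", text)


# Write a throwaway file, process it, then clean up.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as out:
    out.write("Contact ana@example.com or bo.li@test.org")
try:
    masked = mask_emails(path)
finally:
    os.remove(path)

print(masked)  # Contact <email> or <email>
```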
6. NumPy
Introduction to NumPy. Array Creation, Printing Arrays
Basic Operations- Indexing, Slicing, and Iterating
Shape Manipulation – Changing Shape, Stacking and Splitting Arrays
Vector stacking
7. Pandas And Matplotlib
Introduction to Pandas
Importing data into Python
Pandas Data Frames, Indexing Data Frames, Basic Operations With Data Frames, Renaming Columns, Subsetting and Filtering a Data Frame.
Matplotlib – Introduction, plot(), Controlling Line Properties, Working with Multiple Figures, Histograms
Numerical descriptive measures computed from data are called statistics.
1. Fundamentals of Math and Probability
Basic understanding of linear algebra, matrices, vectors
Addition and Multiplication of Matrices
Fundamentals of Probability
Probability Density Function and Cumulative Distribution Function.
Class Hands-on:
Problem-solving using R for vector manipulation
Problem-solving for probability assignments
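The matrix and probability ideas in this module can be checked numerically. The hands-on sessions use R; this sketch uses plain Python instead so that all examples share one language. mat_add and mat_mul are hypothetical helper names, and the normal PDF and CDF come from the standard library's statistics.NormalDist (Python 3.8+).

```python
from statistics import NormalDist


def mat_add(A, B):
    """Element-wise sum of two matrices of the same shape (lists of rows)."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_mul(A, B):
    """Matrix product: entry (i, j) is the dot product of row i of A and column j of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(mat_add(A, B))  # [[6, 8], [10, 12]]
print(mat_mul(A, B))  # [[19, 22], [43, 50]]

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0, sigma=1)
density_at_0 = z.pdf(0)              # PDF height at x = 0, about 0.3989
p_below = z.cdf(1.96)                # CDF: P(X <= 1.96), about 0.975
p_within_1sd = z.cdf(1) - z.cdf(-1)  # P(-1 <= X <= 1), about 0.6827
```

The last line shows the usual relationship between the two functions: a probability over an interval is a difference of CDF values, not a PDF value.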
2. Descriptive Statistics
Describe or summarise a set of data
Measures of central tendency and measures of dispersion.
The mean, median, mode, kurtosis, and skewness
Computing Standard deviation and Variance.
Types of distributions.
Class Hands-on:
Five-Number Summary and Box Plot
Histogram and Bar Chart
Exploratory Analytics Methods in R
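The descriptive measures above map directly onto Python's statistics module. A minimal sketch on a made-up sample: skewness and excess kurtosis are computed by hand from population (divide-by-n) moments, which is one of several conventions, so they may differ from tools that apply sample-size corrections. Quartile conventions also vary (R's boxplot() uses Tukey's hinges via fivenum(), for instance), so Q1 and Q3 here may not match every tool.

```python
import statistics as st

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample

mean = st.mean(data)       # 5
median = st.median(data)   # 4.5
mode = st.mode(data)       # 4
pvar = st.pvariance(data)  # population variance (divide by n): 4
pstd = st.pstdev(data)     # population standard deviation: 2.0
svar = st.variance(data)   # sample variance (divide by n - 1): 32/7, about 4.571

# Moment-based shape measures using population moments.
n = len(data)
skewness = sum((x - mean) ** 3 for x in data) / n / pstd ** 3             # 0.65625
excess_kurtosis = sum((x - mean) ** 4 for x in data) / n / pstd ** 4 - 3  # -0.21875

# Five-number summary behind a box plot: min, Q1, median, Q3, max.
q1, q2, q3 = st.quantiles(data, n=4, method="inclusive")
five_number = (min(data), q1, q2, q3, max(data))
print(five_number)  # (2, 4.0, 4.5, 5.5, 9)
```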
3. Inferential Statistics
What is inferential statistics?
Different types of Sampling techniques
Central Limit Theorem
Point estimate and Interval estimate
Creating confidence interval for a population parameter
Characteristics of the Z-Distribution and T-Distribution
Basics of Hypothesis Testing
Types of tests and rejection regions
Types of errors in hypothesis testing: Type I and Type II errors
P-Value and Z-Score Method
T-Test, Analysis of Variance (ANOVA) and Analysis of Covariance (ANCOVA)
Regression analysis in ANOVA
Class Hands-on:
Problem-solving for the Central Limit Theorem (CLT)
Problem-solving for Hypothesis Testing
Problem-solving for T-test, Z-score test
Case study and model run for ANOVA, ANCOVA
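Two of the exercises in this module, the Central Limit Theorem and a confidence interval for a mean, can be sketched with Python's standard library. The sample values and the "known" population standard deviation sigma = 0.3 are assumptions for illustration. With sigma unknown and a small sample, a t-interval would normally replace this z-interval.

```python
import math
import random
from statistics import NormalDist, mean, stdev

# Central Limit Theorem: means of Uniform(0, 1) samples are approximately normal.
rng = random.Random(42)  # fixed seed so the run is repeatable
n = 30                   # size of each sample
sample_means = [mean(rng.random() for _ in range(n)) for _ in range(2000)]
# Uniform(0, 1) has sd 1/sqrt(12); the CLT predicts the means have sd close to that over sqrt(n).
predicted_se = (1 / math.sqrt(12)) / math.sqrt(n)
observed_se = stdev(sample_means)

# 95% z confidence interval for a population mean (sigma assumed known).
sample = [4.8, 5.1, 5.5, 4.9, 5.2, 5.0, 5.3, 4.7, 5.4, 5.1]  # made-up data
sigma = 0.3                                                  # assumed population sd
z_crit = NormalDist().inv_cdf(0.975)                         # about 1.96
margin = z_crit * sigma / math.sqrt(len(sample))
ci = (mean(sample) - margin, mean(sample) + margin)          # about (4.914, 5.286)
```

The point estimate is the sample mean (5.1 here); the interval estimate widens with sigma and narrows with the square root of the sample size.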
4. Hypothesis Testing
Hypothesis Testing
Basics of Hypothesis Testing
Type of test and Rejection Region
Types of Errors – Type I Errors, Type II Errors
P-Value Method, Z-Score Method
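The p-value method and the z-score (critical value) method always reach the same decision, which a short sketch makes concrete. It assumes a two-sided one-sample z-test with a known population standard deviation; the function name one_sample_z_test and the numbers (a sample mean of 103 from n = 50, tested against mu0 = 100 with sigma = 10) are hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample z-test of H0: mu = mu0, with sigma assumed known."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # P-value method: probability of a |Z| at least this large if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Z-score method: compare |z| with the boundary of the rejection region.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    reject_by_p = p_value < alpha
    reject_by_z = abs(z) > z_crit
    return z, p_value, reject_by_p, reject_by_z


z, p, reject_p, reject_z = one_sample_z_test(sample_mean=103, mu0=100, sigma=10, n=50)
# z is about 2.12 and p about 0.034, so both methods reject H0 at alpha = 0.05.
print(round(z, 3), round(p, 4), reject_p, reject_z)
```

Here alpha is the Type I error rate the test tolerates; lowering it shrinks the rejection region and, for a fixed sample size, raises the chance of a Type II error.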
Implementing machine learning algorithms from scratch seems like a great way for a programmer to understand machine learning.
1. Introduction To Machine Learning
What is Machine Learning?
What is the Challenge?
Introduction to Supervised Learning, Unsupervised Learning
What is Reinforcement Learning?
2. Linear Regression
Introduction to Linear Regression
Linear Regression with Multiple Variables
Disadvantages of Linear Models
Interpretation of Model Outputs
Understanding Covariance and Collinearity
Understanding Heteroscedasticity
Case Study – Application of Linear Regression for Housing Price Prediction
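Simple linear regression has a closed-form least-squares solution, which the sketch below implements from scratch in the spirit of this part of the course. The area and price numbers are invented for illustration and are not the case-study data. Regression with multiple variables would need the matrix form of the normal equations instead.

```python
def fit_simple_ols(x, y):
    """Least-squares estimates for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx     # slope = covariance(x, y) / variance(x)
    b0 = my - b1 * mx  # the fitted line passes through (mean x, mean y)
    return b0, b1


def r_squared(x, y, b0, b1):
    """Share of the variance in y explained by the fitted line."""
    my = sum(y) / len(y)
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Invented data: floor area (hundreds of m^2) and price (hundreds of thousands).
area = [1.0, 1.5, 2.0, 2.5, 3.0]
price = [2.1, 2.9, 4.2, 4.8, 6.0]
b0, b1 = fit_simple_ols(area, price)
r2 = r_squared(area, price, b0, b1)
print(f"price = {b0:.2f} + {b1:.2f} * area, R^2 = {r2:.3f}")  # price = 0.12 + 1.94 * area, R^2 = 0.990
```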
3. Logistic Regression
Introduction to Logistic Regression – Why Logistic Regression?
Introduce the notion of classification
Cost function for logistic regression
Application of logistic regression to multi-class classification.
Confusion Matrix, Odds Ratio, and ROC Curve
Advantages And Disadvantages of Logistic Regression.
Case Study: Classifying an Email as Spam or Not Spam Using Logistic Regression.
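To connect the cost function, the odds ratio and the confusion matrix, here is a from-scratch sketch of binary logistic regression trained by batch gradient descent on the mean log-loss. The two "spam" features, the six labelled rows, the learning rate and the epoch count are all hypothetical. Tools such as R's glm() use different optimisers, and a multi-class problem would need one-vs-rest or softmax on top of this.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def log_loss(w, b, X, y):
    """Mean cross-entropy cost: -[y log p + (1 - y) log(1 - p)]."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict_proba(w, b, xi)
        total -= yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps)
    return total / len(X)


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Batch gradient descent; the gradient of the mean log-loss is mean((p - y) * x)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        errs = [predict_proba(w, b, xi) - yi for xi, yi in zip(X, y)]
        w = [wj - lr * sum(e * xi[j] for e, xi in zip(errs, X)) / n for j, wj in enumerate(w)]
        b -= lr * sum(errs) / n
    return w, b


def confusion_matrix(y_true, y_pred):
    """Counts (TP, FP, FN, TN) for labels coded 1 = spam, 0 = not spam."""
    pairs = list(zip(y_true, y_pred))
    return pairs.count((1, 1)), pairs.count((0, 1)), pairs.count((1, 0)), pairs.count((0, 0))


# Hypothetical features per email: [number of links, count of the word "free"].
X = [[0, 0], [1, 0], [0, 1], [5, 3], [4, 4], [6, 2]]
y = [0, 0, 0, 1, 1, 1]

loss_before = log_loss([0.0, 0.0], 0.0, X, y)  # log(2), about 0.693
w, b = fit_logistic(X, y)
loss_after = log_loss(w, b, X, y)
y_pred = [1 if predict_proba(w, b, xi) >= 0.5 else 0 for xi in X]
tp, fp, fn, tn = confusion_matrix(y, y_pred)
# exp(coefficient) is an odds ratio: how the odds of spam scale per extra link.
odds_ratio_links = math.exp(w[0])
```

Sweeping the 0.5 threshold and recording the true and false positive rates at each value would trace out the ROC curve from the same predicted probabilities.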
4. Decision Trees And Supervised Learning
Decision Tree – data set
How to Build a Decision Tree?
Understanding the CART Model
Classification Rules – Overfitting Problem
Stopping Criteria and Pruning
How to Find the Final Size of a Tree?
Modeling a Decision Tree.
Naive Bayes
Random Forests and Support Vector Machines
Interpretation of Model Outputs
Case Study:
1 Business Case Study for CART Model
2 Business Case Study for Random Forest
3 Business Case Study for SVM
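The core step in growing a CART-style classification tree is choosing the split that most reduces impurity. A minimal sketch for a single numeric feature, assuming Gini impurity as the criterion (the usual default for CART classification trees) and a small hypothetical age/purchase dataset. A full tree would repeat this search over every feature and recurse on each side until a stopping criterion is met, then prune.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(xs, ys):
    """Try midpoints between sorted distinct values; keep the lowest weighted Gini."""
    best_score, best_threshold = float("inf"), None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # Weighted impurity of the two child nodes.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_score, best_threshold = score, t
    return best_score, best_threshold


# Hypothetical data: customer age and whether they bought (1) or not (0).
ages = [22, 25, 31, 35, 42, 48, 53, 60]
bought = [0, 0, 0, 1, 1, 1, 1, 0]
score, threshold = best_split(ages, bought)
print(threshold, round(score, 3))  # 33.0 0.2
```

The root node's impurity is 0.5, so splitting at age 33 lowers the weighted impurity by 0.3.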
5. Unsupervised Learning
Hierarchical Clustering
k-Means algorithm for clustering – groupings of unlabeled data points.
Principal Component Analysis (PCA) – Data
Independent Component Analysis (ICA)
Anomaly Detection
Recommender Systems – Collaborative Filtering Algorithm
Case Study – Recommendation Engine for an E-commerce/Retail Chain
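The k-means idea (assign each point to its nearest centroid, then move each centroid to the mean of its points, and repeat) fits in a few lines of plain Python. This is a sketch of Lloyd's algorithm under stated assumptions: Euclidean distance, initial centroids drawn at random from the data with a fixed seed, and a hypothetical two-group dataset. Production implementations add smarter seeding such as k-means++ and multiple restarts.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate an assignment step and a centroid-update step."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as starting centroids
    clusters = []
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return centroids, clusters


# Hypothetical unlabeled 2-D points forming two obvious groups.
pts = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(pts, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```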
6. Introduction to Deep Learning
Neural Network
Understanding Neural Network Model
Understanding How to Tune a Neural Network
Case Study:
Case study using a neural network
7. Natural Language Processing
Introduction to Natural Language Processing (NLP).
Word Frequency Algorithms for NLP
Sentiment Analysis
Case Study:
Twitter data analysis using NLP
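Word-frequency counting and a lexicon-based sentiment score are the simplest versions of the two techniques listed in this module. This sketch assumes a tiny hand-made stop-word list, a tiny sentiment lexicon and three made-up tweets; real Twitter analysis would use a proper tokenizer and a curated lexicon or a trained model.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "is", "and", "to", "of", "it"}  # tiny illustrative list
# Hypothetical mini sentiment lexicon: +1 for positive words, -1 for negative ones.
LEXICON = {"love": 1, "great": 1, "happy": 1, "hate": -1, "slow": -1, "bad": -1}


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(docs):
    counts = Counter()
    for doc in docs:
        counts.update(t for t in tokenize(doc) if t not in STOPWORDS)
    return counts


def sentiment(text):
    score = sum(LEXICON.get(t, 0) for t in tokenize(text))
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


tweets = [
    "I love the new phone, the camera is great!",
    "Battery is bad and the app is slow.",
    "Phone arrived today.",
]
freq = word_frequencies(tweets)
labels = [sentiment(t) for t in tweets]
print(freq.most_common(1))  # [('phone', 2)]
print(labels)               # ['positive', 'negative', 'neutral']
```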
8. Apache Spark Analytics
What is Spark?
Introduction to Spark RDD
Introduction to Spark SQL and DataFrames
Using R-Spark for machine learning
Hands-on:
Installation and configuration of Spark
Hands-on Spark RDD programming
Hands-on Spark SQL and DataFrame programming
Using R-Spark for machine learning programming
9. Introduction to Tableau/Spotfire
Connecting to the data source
Creating dashboard pages
How to create calculated columns
Different charts
Hands-on:
Hands-on connecting to a data source and data cleansing
Hands-on with various charts
Hands-on deployment of a predictive model in a visualization