Data Science Publication

Posts

Exploratory data analysis

September 30, 2020

In statistics, exploratory data analysis is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task. Exploratory data analysis was promoted by John Tukey to encourage statisticians to explore the data, and possibly formulate hypotheses that could lead to new data collection and experiments. EDA is different from initial data analysis (IDA), which focuses more narrowly on checking assumptions required for model fitting and hypothesis testing, and handling missing values and making transformations of variables as needed. EDA encompasses IDA. For a good example of automated EDA, please check this one: (IBM cloud) https://dataplatform.cloud.ibm.com/analytics/notebooks/v2/b021d2c8-585c-4af7-98fb-9c7d950fb9d1/view?access_token=92f2f438d92d81ce251a8aaecc4e3e35373bc978cee3f0c03ff59ec0e67...

Decision Tree algorithm

August 21, 2020

A decision tree is a supervised machine learning algorithm mainly used for Regression and Classification. It breaks down a data set into smaller and smaller subsets while at the same time an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. A decision tree can handle both categorical and numerical data.

Different kernels in SVM

August 21, 2020

There are four types of kernels in SVM. Linear Kernel Polynomial kernel Radial basis kernel Sigmoid kernel

Support Vectors in SVM

August 21, 2020

In the diagram, we see that the thinner lines mark the distance from the classifier to the closest data points called the support vectors (darkened data points). The distance between the two thin lines is called the margin.

SVM algorithm

August 21, 2020

SVM stands for support vector machine, it is a supervised machine learning algorithm which can be used for both Regression and Classification. If you have n features in your training data set, SVM tries to plot it in n-dimensional space with the value of each feature being the value of a particular coordinate. SVM uses hyperplanes to separate out different classes based on the provided kernel function.

Naive Bayes

August 21, 2020

The Naive Bayes Algorithm is based on the Bayes Theorem. Bayes’ theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event. The Algorithm is ‘naive’ because it makes assumptions that may or may not turn out to be correct.

Eigenvectors and Eigenvalues

August 21, 2020

Eigenvectors are used for understanding linear transformations. In data analysis, we usually calculate the eigenvectors for a correlation or covariance matrix. Eigenvectors are the directions along which a particular linear transformation acts by flipping, compressing or stretching. Eigenvalue can be referred to as the strength of the transformation in the direction of eigenvector or the factor by which the compression occurs.