Data Science Publication

Posts

Correlation and Covariance in Statistics

August 20, 2020

Covariance and Correlation are two mathematical concepts; these two approaches are widely used in statistics. Both Correlation and Covariance establish the relationship and also measure the dependency between two random variables. Though the work is similar between these two in mathematical terms, they are different from each other. Correlation : Correlation is considered or described as the best technique for measuring and also for estimating the quantitative relationship between two variables. Correlation measures how strongly two variables are related. Covariance : In covariance two items vary together and it’s a measure that indicates the extent to which two random variables change in cycle. It is a statistical term; it explains the systematic relation between a pair of random variables, wherein changes in one variable reciprocal by a corresponding change in another variable.

Chocolate Bar (data set)

January 01, 2019

Chocolate is one of the most popular candies in the world. Each year, residents of the United States collectively eat more than 2.8 billions pounds. However, not all chocolate bars are created equal! This dataset contains expert ratings of over 1,700 individual chocolate bars, along with information on their regional origin, percentage of cocoa, the variety of chocolate bean used and where the beans were grown. Flavors of Cacao Rating System: 5= Elite (Transcending beyond the ordinary limits) 4= Premium (Superior flavor development, character and style) 3= Satisfactory(3.0) to praiseworthy(3.75) (well made with special qualities) 2= Disappointing (Passable but contains at least one significant flaw) 1= Unpleasant (mostly unpalatable) Each chocolate is evaluated from a combination of both objective qualities and subjective interpretation. A rating here only represents an experience with one bar from one batch. Batch numbers, vintages and review dates are included in the database w...

Real life example for linear regression: different training regimens and player performance

October 12, 2017

Data scientists for professional sports teams often use linear regression to measure the effect that different training regimens have on player performance. For example, data scientists in the NBA might analyze how different amounts of weekly yoga sessions and weightlifting sessions affect the number of points a player scores. They might fit a multiple linear regression model using yoga sessions and weightlifting sessions as the predictor variables and total points scored as the response variable. The regression model would take the following form: points scored = β0 + β1(yoga sessions) + β2(weightlifting sessions) The coefficient β0 would represent the expected points scored for a player who participates in zero yoga sessions and zero weightlifting sessions. The coefficient β1 would represent the average change in points scored when weekly yoga sessions is increased by one, assuming the number of weekly weightlifting sessions remains unchanged. The coefficient β2 would represent...

Real life example for linear regression: fertiliser, water and crop yields

October 12, 2017

Agricultural scientists often use linear regression to measure the effect of fertilizer and water on crop yields. For example, scientists might use different amounts of fertilizer and water on different fields and see how it affects crop yield. They might fit a multiple linear regression model using fertilizer and water as the predictor variables and crop yield as the response variable. The regression model would take the following form: crop yield = β0 + β1(amount of fertilizer) + β2(amount of water) The coefficient β0 would represent the expected crop yield with no fertilizer or water. The coefficient β1 would represent the average change in crop yield when fertilizer is increased by one unit, assuming the amount of water remains unchanged. The coefficient β2 would represent the average change in crop yield when water is increased by one unit, assuming the amount of fertilizer remains unchanged. Depending on the values of β1 and β2, the scientists may change the amount of ferti...

Real life example for linear regression: drug dosage and blood pressure of patients

October 12, 2017

Medical researchers often use linear regression to understand the relationship between drug dosage and blood pressure of patients. For example, researchers might administer various dosages of a certain drug to patients and observe how their blood pressure responds. They might fit a simple linear regression model using dosage as the predictor variable and blood pressure as the response variable. The regression model would take the following form: blood pressure = β0 + β1(dosage) The coefficient β0 would represent the expected blood pressure when dosage is zero. The coefficient β1 would represent the average change in blood pressure when dosage is increased by one unit. If β1 is negative, it would mean that an increase in dosage is associated with a decrease in blood pressure. If β1 is close to zero, it would mean that an increase in dosage is associated with no change in blood pressure. If β1 is positive, it would mean that an increase in dosage is associated with an increase in bl...

Real life example for linear regression: advertising spending and revenue

October 12, 2017

Businesses often use linear regression to understand the relationship between advertising spending and revenue. For example, they might fit a simple linear regression model using advertising spending as the predictor variable and revenue as the response variable. The regression model would take the following form: revenue = β0 + β1(ad spending) The coefficient β0 would represent the total expected revenue when ad spending is zero. The coefficient β1 would represent the average change in total revenue when ad spending is increased by one unit (e.g. one dollar). If β1 is negative, it would mean that more ad spending is associated with less revenue. If β1 is close to zero, it would mean that ad spending has little effect on revenue. And if β1 is positive, it would mean more ad spending is associated with more revenue. Depending on the value of β1, a company may decide to either decrease or increase their ad spending.

Linear regression

October 12, 2017

In statistics, linear regression is a linear approach to modeling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables). The case of one explanatory variable is called simple linear regression. For more than one explanatory variable, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable. In linear regression, the relationships are modeled using linear predictor functions whose unknown model parameters are estimated from the data. Such models are called linear models. Most commonly, the conditional mean of the response given the values of the explanatory variables (or predictors) is assumed to be an affine function of those values; less commonly, the conditional median or some other quantile is used. Like all forms of regression analysis, line...