
Hypothesis testing with R

February 13, 2018 - 12:00 am


This article is an excerpt taken from the book Learning Quantitative Finance with R written by Dr. Param Jeet and Prashant Vats. This book will help you understand the basics of R and how they can be applied in various Quantitative Finance scenarios.

Hypothesis testing is used to reject or retain a hypothesis based on measurements from an observed sample. In this tutorial, we will implement one-tailed and two-tailed tests of a population mean in R, first with known and then with unknown population variance.

Lower tail test of population mean with known variance

The null hypothesis is \( H_0: \mu \ge \mu_0 \), where \( \mu_0 \) is the hypothesized lower bound of the population mean.

Let us assume a scenario where an investor assumes that the mean daily return of a stock since inception is at least $10. The average daily return over a sample of 30 days is $9.9. Assume the population standard deviation is 1.1. Can we reject the null hypothesis at the .05 significance level?

Now let us calculate the test statistic z, which can be computed by the following code in R:

> xbar= 9.9

> mu0 = 10

> sig = 1.1

> n = 30

> z = (xbar-mu0)/(sig/sqrt(n))

> z

Here:

xbar: Sample mean
mu0: Hypothesized value
sig: Standard deviation of population
n: Sample size
z: Test statistic

This gives the value of the test statistic z:

[1] -0.4979296
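Written out, the statistic computed above is the standard one-sample z statistic. Substituting this scenario's numbers gives the same value as the R output:

```latex
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
  = \frac{9.9 - 10}{1.1 / \sqrt{30}}
  \approx \frac{-0.1}{0.2008}
  \approx -0.498
```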

Now let us find the critical value at the 0.05 significance level. It can be computed with the following code:

> alpha = .05

> z.alpha = qnorm(1-alpha)

> -z.alpha

This gives the following output:

[1] -1.644854

Since the test statistic (-0.4979) is greater than the critical value (-1.6449), we fail to reject the null hypothesis that the mean daily return is at least $10.

Instead of the critical value approach, we can use the pnorm function to compute the lower-tail p-value of the test statistic. This can be computed with the following code:

> pnorm(z)

This gives the following output:

[1] 0.3092668

Since the p-value is greater than 0.05, we fail to reject the null hypothesis.
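The same decision can also be stated on the scale of the sample mean. A lower-tail test at level α rejects only when \( \bar{x} < \mu_0 - z_{\alpha}\,\sigma/\sqrt{n} \). The following is a minimal sketch that reuses this scenario's numbers. The variable names `se`, `xbar.crit`, and `reject` are our own and do not come from the book.

```r
# Lower-tail z-test expressed as a critical sample mean (scenario values from the text)
xbar  <- 9.9    # sample mean
mu0   <- 10     # hypothesized lower bound of the population mean
sig   <- 1.1    # known population standard deviation
n     <- 30     # sample size
alpha <- 0.05   # significance level

se <- sig / sqrt(n)                        # standard error of the mean
xbar.crit <- mu0 - qnorm(1 - alpha) * se   # reject H0 only if xbar falls below this
reject <- xbar < xbar.crit

xbar.crit   # about 9.67
reject      # FALSE: 9.9 is not below the critical sample mean
```

Working on this scale shows how far the sample mean would have to fall, in dollars, before the investor's claim is rejected.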

Upper tail test of population mean with known variance

The null hypothesis is \( H_0: \mu \le \mu_0 \), where \( \mu_0 \) is the hypothesized upper bound of the population mean.

Let us assume a scenario where an investor assumes that the mean daily return of a stock since inception is at most $5. The average daily return over a sample of 30 days is $5.1. Assume the population standard deviation is 0.25. Can we reject the null hypothesis at the .05 significance level?

Now let us calculate the test statistic z, which can be computed by the following code in R:

> xbar= 5.1

> mu0 = 5

> sig = .25

> n = 30

> z = (xbar-mu0)/(sig/sqrt(n))

> z

Here:

xbar: Sample mean
mu0: Hypothesized value
sig: Standard deviation of population
n: Sample size
z: Test statistic

This gives 2.19089 as the value of the test statistic. Now let us calculate the critical value at the .05 significance level, which is given by the following code:

> alpha = .05

> z.alpha = qnorm(1-alpha)

> z.alpha

This gives 1.644854, which is less than the test statistic. Hence we reject the null hypothesis.

Alternatively, the upper-tail p-value of the test statistic is given as follows:

> pnorm(z, lower.tail=FALSE)

This gives 0.01422987, which is less than 0.05, and hence we reject the null hypothesis.
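A note on `lower.tail=FALSE`: it returns the same upper-tail probability as `1 - pnorm(z)`, but it computes that probability directly instead of by subtraction. As a result, it stays accurate far into the tail, where `1 - pnorm(z)` rounds to zero. The following small check uses this scenario's statistic plus an illustrative extreme value of 10:

```r
z <- (5.1 - 5) / (0.25 / sqrt(30))    # test statistic from this scenario

p.direct   <- pnorm(z, lower.tail = FALSE)   # upper-tail p-value, computed directly
p.subtract <- 1 - pnorm(z)                   # same quantity via subtraction
all.equal(p.direct, p.subtract)              # TRUE for a moderate z like this one

# Far in the tail, subtraction loses everything to rounding:
pnorm(10, lower.tail = FALSE)   # tiny but nonzero (about 7.6e-24)
1 - pnorm(10)                   # exactly 0, because pnorm(10) rounds to 1
```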

Two-tailed test of population mean with known variance

The null hypothesis is \( H_0: \mu = \mu_0 \), where \( \mu_0 \) is the hypothesized value of the population mean.

Let us assume a scenario where the mean daily return of a stock last year was $2. This year, the average daily return over a sample of 30 days is $1.5. Assume the population standard deviation is 0.1. At the .05 significance level, can we reject the null hypothesis that this year's mean return does not differ from last year's?

Now let us calculate the test statistic z, which can be computed by the following code in R:

> xbar= 1.5

> mu0 = 2

> sig = .1

> n = 30

> z = (xbar-mu0)/(sig/sqrt(n))

> z

This gives -27.38613 as the value of the test statistic.

Now let us find the critical values for the test statistic at the .05 significance level. They are given by the following code:

> alpha = .05

> z.half.alpha = qnorm(1-alpha/2)

> c(-z.half.alpha, z.half.alpha)

This gives -1.959964 and 1.959964. Since the test statistic does not lie within the interval (-1.959964, 1.959964), we reject the null hypothesis that this year's mean return does not differ from last year's at the .05 significance level.

The two-tailed p-value is given as follows:

> 2*pnorm(z)

This gives a value far below .05, so we reject the null hypothesis.
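Doubling `pnorm(z)` gives the two-tailed p-value here only because z is negative. For a test statistic of either sign, `2*pnorm(-abs(z))` is the safe form. The two-tailed test also has a confidence-interval reading: at level α, it rejects exactly when μ0 falls outside the interval \( \bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n} \). Here is a minimal sketch with this scenario's numbers. The variable names `p.value`, `ci`, and `outside` are our own.

```r
# Two-tailed z-test: sign-agnostic p-value and the equivalent confidence interval
xbar  <- 1.5    # this year's sample mean
mu0   <- 2      # last year's mean (hypothesized value)
sig   <- 0.1    # known population standard deviation
n     <- 30     # sample size
alpha <- 0.05   # significance level

se <- sig / sqrt(n)
z  <- (xbar - mu0) / se

p.value <- 2 * pnorm(-abs(z))   # valid whether z is positive or negative

# 95% confidence interval for the population mean
ci <- xbar + c(-1, 1) * qnorm(1 - alpha / 2) * se
ci                               # roughly 1.464 to 1.536

# mu0 lies outside the interval exactly when p.value < alpha
outside <- mu0 < ci[1] || mu0 > ci[2]
outside                          # TRUE
p.value < alpha                  # TRUE
```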

In all the preceding scenarios, the population variance is known, so we use the normal distribution for hypothesis testing. In the following scenarios, the population variance is not given. We therefore use the sample standard deviation in its place and test the hypothesis with the t distribution, using n - 1 degrees of freedom.
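To see why the switch matters, compare one-sided 5% critical values of the t distribution with the normal value 1.644854 used earlier. The t critical values are larger for small samples, which reflects the extra uncertainty from estimating the standard deviation. As the degrees of freedom grow, they shrink toward the normal value. The degrees of freedom below are arbitrary illustrations, except 29, which corresponds to n = 30.

```r
# One-sided 5% critical values: t distribution vs. standard normal
alpha <- 0.05
df    <- c(5, 10, 29, 100, 1000)   # illustrative degrees of freedom (29 = n - 1 for n = 30)

t.crit <- qt(1 - alpha, df = df)   # qt is vectorized over df
z.crit <- qnorm(1 - alpha)         # 1.644854

data.frame(df = df, t.crit = round(t.crit, 4), z.crit = round(z.crit, 4))
```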

Lower tail test of population mean with unknown variance

The null hypothesis is \( H_0: \mu \ge \mu_0 \), where \( \mu_0 \) is the hypothesized lower bound of the population mean.

Let us assume a scenario where an investor assumes that the mean daily return of a stock since inception is at least $1. The average daily return over a sample of 30 days is $0.9, with a sample standard deviation of 0.1. Can we reject the null hypothesis at the .05 significance level?

In this scenario, we can compute the test statistic by executing the following code:

> xbar= .9

> mu0 = 1

> sig = .1

> n = 30

> t = (xbar-mu0)/(sig/sqrt(n))

> t

Here:

xbar: Sample mean
mu0: Hypothesized value
sig: Standard deviation of sample
n: Sample size
t: Test statistic

This gives -5.477226 as the value of the test statistic. Now let us compute the critical value at the .05 significance level. It is given by the following code:

> alpha = .05

> t.alpha = qt(1-alpha, df=n-1)

> -t.alpha

This gives -1.699127. Since the test statistic is less than the critical value, we reject the null hypothesis.

Instead of comparing the test statistic with the critical value, we can use its lower-tail p-value, which is given as follows:

> pt(t, df=n-1)

This results in a value less than .05, so we can reject the null hypothesis.
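When the raw daily returns are available rather than just their summary, R's built-in `t.test()` performs this test in a single call. The sketch below builds a synthetic sample of 30 values and rescales it to have exactly the scenario's mean (0.9) and standard deviation (0.1). These data are made up for illustration only, but the resulting t statistic matches the hand computation above.

```r
# Lower-tail one-sample t-test with t.test() on a synthetic sample
set.seed(1)                               # any seed works; the rescaling fixes mean and sd
raw <- rnorm(30)                          # illustrative random values, not real returns
x   <- 0.9 + 0.1 * as.vector(scale(raw))  # rescale: mean exactly 0.9, sd exactly 0.1

res <- t.test(x, mu = 1, alternative = "less")   # H0: mu >= 1 vs H1: mu < 1
res$statistic   # t = -5.477226, as computed by hand
res$parameter   # df = 29
res$p.value     # well below 0.05, so H0 is rejected
```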

Upper tail test of population mean with unknown variance

The null hypothesis is \( H_0: \mu \le \mu_0 \), where \( \mu_0 \) is the hypothesized upper bound of the population mean.

Let us assume a scenario where an investor assumes that the mean daily return of a stock since inception is at most $3. The average daily return over a sample of 30 days is $3.1, with a sample standard deviation of 0.2. Can we reject the null hypothesis at the .05 significance level?

Now let us calculate the test statistic t, which can be computed by the following code in R:

> xbar= 3.1

> mu0 = 3

> sig = .2

> n = 30

> t = (xbar-mu0)/(sig/sqrt(n))

> t

Here:

xbar: Sample mean
mu0: Hypothesized value
sig: Standard deviation of sample
n: Sample size
t: Test statistic

This gives 2.738613 as the value of the test statistic. Now let us find the critical value at the .05 significance level. It is given by the following code:

> alpha = .05

> t.alpha = qt(1-alpha, df=n-1)

> t.alpha

Since the critical value 1.699127 is less than the test statistic, we reject the null hypothesis.

Alternatively, the upper-tail p-value of the test statistic is given as follows:

> pt(t, df=n-1, lower.tail=FALSE)

This is less than .05. Hence the null hypothesis is rejected.
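`t.test()` needs the raw observations, so a small helper is handy when only summary statistics are known, as in these scenarios. The function `t_test_summary()` below is a hypothetical helper written for this tutorial. It is not part of base R or the book. It covers all three alternatives and reproduces the upper-tail result above.

```r
# Hypothetical helper: one-sample t-test from summary statistics
t_test_summary <- function(xbar, s, n, mu0,
                           alternative = c("two.sided", "less", "greater"),
                           alpha = 0.05) {
  alternative <- match.arg(alternative)
  df <- n - 1
  t  <- (xbar - mu0) / (s / sqrt(n))      # test statistic
  p  <- switch(alternative,
               less      = pt(t, df),                       # lower tail
               greater   = pt(t, df, lower.tail = FALSE),   # upper tail
               two.sided = 2 * pt(-abs(t), df))             # both tails
  list(t = t, df = df, p.value = p, reject = p < alpha)
}

# Upper-tail scenario: H0: mu <= 3, sample mean 3.1, sample sd 0.2, n = 30
res <- t_test_summary(xbar = 3.1, s = 0.2, n = 30, mu0 = 3, alternative = "greater")
res$t        # 2.738613
res$reject   # TRUE
```

The same call with `alternative = "less"` or `"two.sided"` covers the other two unknown-variance scenarios.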

Two-tailed test of population mean with unknown variance

The null hypothesis is \( H_0: \mu = \mu_0 \), where \( \mu_0 \) is the hypothesized value of the population mean.

Let us assume a scenario where the mean daily return of a stock last year was $2. This year, the average daily return over a sample of 30 days is $1.9, with a sample standard deviation of 0.1. At the .05 significance level, can we reject the null hypothesis that this year's mean return does not differ from last year's?

Now let us calculate the test statistic t, which can be computed by the following code in R:

> xbar= 1.9

> mu0 = 2

> sig = .1

> n = 30

> t = (xbar-mu0)/(sig/sqrt(n))

> t

This gives -5.477226 as the value of the test statistic. Now let us find the range of critical values for the comparison, which is given by the following code:

> alpha = .05

> t.half.alpha = qt(1-alpha/2, df=n-1)

> c(-t.half.alpha, t.half.alpha)

This gives the range (-2.04523, 2.04523). Since the test statistic -5.477226 lies outside this range, we reject the null hypothesis.
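This section stops at the critical-value comparison. For completeness, the two-tailed p-value for this scenario follows the same pattern as in the known-variance case, with `pt()` in place of `pnorm()`. Here is a minimal sketch using the scenario's numbers:

```r
# Two-tailed p-value for the unknown-variance scenario
xbar <- 1.9   # this year's sample mean
mu0  <- 2     # last year's mean (hypothesized value)
s    <- 0.1   # sample standard deviation
n    <- 30    # sample size

t <- (xbar - mu0) / (s / sqrt(n))        # -5.477226
p.value <- 2 * pt(-abs(t), df = n - 1)   # probability in both tails
p.value < 0.05                           # TRUE: reject H0
```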

We have learned how to perform one-tailed and two-tailed hypothesis tests of a population mean in R, with both known and unknown variance.

If you enjoyed this excerpt, check out the book Learning Quantitative Finance with R to explore different methods to manage risks and trading using Machine Learning with R.
