In this article, we will cover the following topics:

Generating random numbers from standard normal distribution and normal distribution

Generating random numbers from a uniform distribution

A simple application: estimate pi by the Monte Carlo simulation

Generating random numbers from a Poisson distribution

Bootstrapping with/without replacements

The lognormal distribution and simulation of stock price movements

Simulating terminal stock prices

Simulating an efficient portfolio and an efficient frontier
Generating random numbers from a standard normal distribution
Normal distributions play a central role in finance. A major reason is that many finance theories, such as option theory and its applications, are based on the assumption that stock returns follow a normal distribution. We often need to generate n random numbers from a standard normal distribution. For this purpose, we have the following two lines of code:
>>>import scipy as sp
>>>x=sp.random.standard_normal(size=10)
The basic random numbers in SciPy/NumPy are created by the Mersenne Twister PRNG in the numpy.random module. The routines for the individual distributions in numpy.random are written in Cython/Pyrex and are pretty fast. To print the first few observations, we use the print() function as follows:
>>>print(x[0:5])
[0.55062594 0.51338547 0.04208367 0.66432268 0.49461661]
Alternatively, we could use the following code:
>>>import scipy as sp
>>>x=sp.random.normal(size=10)
This program is equivalent to the following one:
>>>import scipy as sp
>>>x=sp.random.normal(0,1,10)
The first input is the mean, the second input is the standard deviation, and the last one is the number of random numbers, that is, the size of the dataset. The default values for the mean and standard deviation are 0 and 1. We could use the help() function to find out the input variables. To save space, we show only the first few lines:
>>>help(sp.random.normal)
Help on built-in function normal:
normal(...)
    normal(loc=0.0, scale=1.0, size=None)
    Drawing random samples from a normal (Gaussian) distribution
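As a quick check of the defaults described in the help text, the following sketch compares the three calls under one seed. It assumes that numpy.random is called directly (imported as np), which offers the same function names the article reaches through sp.random; the seed value 2024 is arbitrary.

```python
import numpy as np

# Re-seed before each call so all three draws consume the same random stream.
np.random.seed(2024)
a = np.random.standard_normal(size=10)

np.random.seed(2024)
b = np.random.normal(size=10)        # defaults: loc=0.0, scale=1.0

np.random.seed(2024)
c = np.random.normal(0, 1, 10)       # mean, standard deviation, size

same_defaults = bool(np.allclose(a, b) and np.allclose(b, c))
print(same_defaults)
```

If the three arrays agree, the explicit arguments (0, 1, 10) add nothing beyond the defaults plus a size.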
The probability density function of the normal distribution, first derived by De Moivre and 200 years later by both Gauss and Laplace independently, is often called the bell curve because of its characteristic shape; refer to the following graph:
The density function for a standard normal distribution is defined as follows:
$$f(x)=\frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^{2}}{2}} \qquad (1)$$
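To make the standard normal density concrete, here is a minimal sketch that evaluates exp(-x^2/2)/sqrt(2*pi) on a small grid and compares it with scipy.stats.norm.pdf. The helper name std_normal_pdf and the grid are illustrative choices, not part of the article.

```python
import numpy as np
from scipy.stats import norm

def std_normal_pdf(x):
    """Density of the standard normal distribution (mean 0, std 1)."""
    return np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)

grid = np.linspace(-4.0, 4.0, 9)     # illustrative evaluation points
ours = std_normal_pdf(grid)
reference = norm.pdf(grid)           # SciPy's standard normal density

matches = bool(np.allclose(ours, reference))
peak = float(std_normal_pdf(0.0))    # height of the bell curve at its center
print(matches, round(peak, 6))
```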
Generating random numbers with a seed
Sometimes, we want to produce the same random numbers repeatedly. For example, when a professor is explaining how to estimate the mean, standard deviation, skewness, and kurtosis of five random numbers, it helps if students can generate exactly the same values as their instructor. Similarly, when we are debugging a Python program that simulates a stock’s movements, we might prefer to have the same intermediate numbers. For such cases, we use the seed() function as follows:
>>>import scipy as sp
>>>sp.random.seed(12345)
>>>x=sp.random.normal(0,1,20)
>>>print(x[0:5])
[-0.20470766 0.47894334 -0.51943872 -0.5557303 1.96578057]
In this program, we use 12345 as our seed. The value of the seed is not important. The key is that the same seed leads to the same random values.
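The claim that equal seeds give equal values can be checked directly. The sketch below assumes numpy.random is used in place of the sp.random alias, which may not be available in newer SciPy releases. It also shows NumPy's Generator interface (np.random.default_rng) as an alternative way to seed; that interface is not used in the article, and its stream differs from the one seed() produces.

```python
import numpy as np

# Legacy global seeding, mirroring the article's seed() call.
np.random.seed(12345)
first = np.random.normal(0, 1, 20)
np.random.seed(12345)
second = np.random.normal(0, 1, 20)
same_legacy = bool(np.array_equal(first, second))

# Generator objects carry their own state; equal seeds give equal streams.
rng_a = np.random.default_rng(12345)
rng_b = np.random.default_rng(12345)
same_generator = bool(np.array_equal(rng_a.normal(0, 1, 20),
                                     rng_b.normal(0, 1, 20)))

# A different seed gives a different sequence.
rng_c = np.random.default_rng(54321)
other = rng_c.normal(0, 1, 20)

print(same_legacy, same_generator)
```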
Generating n random numbers from a normal distribution
To generate n random numbers from a normal distribution, we have the following code:
>>>import scipy as sp
>>>sp.random.seed(12345)
>>>x=sp.random.normal(0.05,0.1,50)
>>>print(x[0:5])
[ 0.02952923 0.09789433 -0.00194387 -0.00557303 0.24657806]
The difference between this program and the previous one is that the mean is 0.05 instead of 0, while the standard deviation is 0.1 instead of 1. The density of a normal distribution is defined by the following equation, where μ is the mean and σ is the standard deviation. The standard normal distribution is just the special case with μ = 0 and σ = 1:
$$f(x)=\frac{1}{\sqrt{2\pi\sigma^{2}}}\,e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}} \qquad (2)$$
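The relation between the two distributions can be sketched in code: a normal draw with mean μ and standard deviation σ is a standard normal draw scaled by σ and shifted by μ. The example assumes numpy.random is called directly and reuses the article's seed and parameters.

```python
import numpy as np

mu, sigma, n = 0.05, 0.1, 50

np.random.seed(12345)
x = np.random.normal(mu, sigma, n)

np.random.seed(12345)
z = np.random.normal(0, 1, n)        # standard normal draws, same stream

# Shifting and scaling the standard normal draws reproduces x.
reproduced = bool(np.allclose(x, mu + sigma * z))
print(reproduced)
```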
Histogram for a normal distribution
A histogram is used intensively in the process of analyzing the properties of datasets. To generate a histogram for a set of random values drawn from a normal distribution with specified mean and standard deviation, we have the following code:
>>>import scipy as sp
>>>import matplotlib.pyplot as plt
>>>sp.random.seed(12345)
>>>x=sp.random.normal(0.08,0.2,1000)
>>>plt.hist(x, 15, density=True)
>>>plt.show()
The resultant graph is presented as follows:
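The same histogram can also be computed numerically, without drawing it. This sketch assumes np.histogram with density=True as the counterpart of the normalized plt.hist call, and checks that the bar areas sum to one.

```python
import numpy as np

np.random.seed(12345)
x = np.random.normal(0.08, 0.2, 1000)

# density=True rescales bar heights so the total bar area equals 1.
heights, edges = np.histogram(x, bins=15, density=True)
widths = np.diff(edges)
total_area = float(np.sum(heights * widths))

sample_mean = float(x.mean())
sample_std = float(x.std())
print(round(total_area, 6), round(sample_mean, 3), round(sample_std, 3))
```

With 1,000 draws the sample mean and standard deviation should land near, but not exactly on, 0.08 and 0.2.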
Graphical presentation of a lognormal distribution
When returns follow a normal distribution, the prices would follow a lognormal distribution. The definition of a lognormal distribution is as follows:
$$f(x;\mu,\sigma)=\frac{1}{x\sigma\sqrt{2\pi}}\,e^{-\frac{(\ln x-\mu)^{2}}{2\sigma^{2}}} \qquad (3)$$
The following code shows three different lognormal distributions with three pairs of parameters: (0, 0.25), (0, 0.5), and (0, 1.0). The first parameter is the mean (μ), while the second one is the standard deviation (σ):
import numpy as np
import matplotlib.pyplot as plt
# start slightly above zero because the density is undefined at x=0
x=np.linspace(0.001,3,200)
mu=0
sigma0=[0.25,0.5,1]
color=['blue','red','green']
target=[(1.2,1.3),(1.7,0.4),(0.18,0.7)]
start=[(1.8,1.4),(1.9,0.6),(0.18,1.6)]
for i in range(len(sigma0)):
    sigma=sigma0[i]
    # lognormal density for the current sigma
    y=1/(x*sigma*np.sqrt(2*np.pi))*np.exp(-(np.log(x)-mu)**2/(2*sigma*sigma))
    # label each curve with an arrow in the curve's color
    plt.annotate('mu='+str(mu)+', sigma='+str(sigma),
                 xy=target[i], xytext=start[i],
                 arrowprops=dict(facecolor=color[i],shrink=0.01),)
    plt.plot(x,y,color=color[i])
plt.title('Lognormal distribution')
plt.xlabel('x')
plt.ylabel('lognormal density distribution')
plt.show()
The corresponding three graphs are put together to illustrate their similarities and differences:
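As a numerical cross-check of the lognormal density, the sketch below compares a hand-written version with scipy.stats.lognorm.pdf for the same three parameter pairs. The helper name lognormal_pdf and the evaluation grid are illustrative; the mapping s=sigma, scale=exp(mu) follows SciPy's parameterization.

```python
import numpy as np
from scipy.stats import lognorm

def lognormal_pdf(x, mu, sigma):
    """Lognormal density for x > 0 with log-mean mu and log-std sigma."""
    return (np.exp(-(np.log(x) - mu)**2 / (2.0 * sigma**2))
            / (x * sigma * np.sqrt(2.0 * np.pi)))

x = np.linspace(0.05, 3.0, 60)       # start above zero: the density needs x > 0
mu = 0.0
checks = []
for sigma in (0.25, 0.5, 1.0):
    ours = lognormal_pdf(x, mu, sigma)
    # SciPy parameterizes the lognormal with s=sigma and scale=exp(mu).
    reference = lognorm.pdf(x, s=sigma, scale=np.exp(mu))
    checks.append(bool(np.allclose(ours, reference)))
print(checks)
```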
Generating random numbers from a uniform distribution
When we plan to randomly choose m stocks from n available stocks, we could draw a set of random numbers from a uniform distribution. The following code generates 10 random numbers between 1 and 100 from a uniform distribution. To guarantee that we generate the same set of random numbers, we use the seed() function as follows:
>>>import scipy as sp
>>>sp.random.seed(123345)
>>>x=sp.random.uniform(low=1,high=100,size=10)
Again, low, high, and size are the three keywords for the three input variables. The first one specifies the minimum, the second one specifies the maximum, while size gives the number of random numbers we intend to generate. The first five numbers are shown as follows:
>>>print(x[0:5])
[ 30.32749021 20.58006409 2.43703988 76.15661293 75.06929084]
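A short sketch of what the low and high keywords imply, assuming numpy.random is called directly: the draws fall in the half-open interval [low, high), and with many draws the sample mean approaches the midpoint. The sample size of 10,000 is an illustrative choice.

```python
import numpy as np

np.random.seed(123345)
x = np.random.uniform(low=1, high=100, size=10000)

# Draws fall in the half-open interval [low, high).
within_bounds = bool(x.min() >= 1 and x.max() < 100)

# With many draws the sample mean approaches the midpoint (low + high) / 2.
sample_mean = float(x.mean())
print(within_bounds, round(sample_mean, 1))
```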
Using simulation to estimate the pi value
It is a good exercise to estimate pi by the Monte Carlo simulation. Let’s draw a square with 2R as its side. If we put the largest circle inside the square, its radius will be R. In other words, the areas for those two shapes have the following equations:
$$S_{circle}=\pi R^{2} \qquad (4)$$
$$S_{square}=(2R)^{2}=4R^{2} \qquad (5)$$
By dividing equation (4) by equation (5), we have the following result:
$$\frac{S_{circle}}{S_{square}}=\frac{\pi}{4}$$
In other words, the value of pi will be 4*Scircle/Ssquare. When running the simulation, we generate n pairs of x and y from a uniform distribution over the range zero to one, which covers the upper-right quarter of the square when R equals one. That quarter contains one quarter of the square and one quarter of the circle, so the ratio of the two areas is unchanged. Then we compute the distance of each point from the origin, that is, $d=\sqrt{x^{2}+y^{2}}$. When d is at most one (the value of R), the point falls into the circle. We can imagine throwing a dart that falls into the circle. The value of pi will take the following form:
$$\pi\approx 4\times\frac{\text{number of points inside the circle}}{n} \qquad (6)$$
The following graph illustrates these random points within a circle and within a square:
The Python program to estimate the value of pi is presented as follows:
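import numpy as np
import scipy as sp
n=100000
# n random points in the unit square
x=sp.random.uniform(low=0,high=1,size=n)
y=sp.random.uniform(low=0,high=1,size=n)
# distance of each point from the origin
dist=np.sqrt(x**2+y**2)
# keep the points that fall inside the circle
in_circle=dist[dist<=1]
our_pi=len(in_circle)*4./n
print ('pi=',our_pi)
print('error (%)=', (our_pi-np.pi)/np.pi)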
The estimated pi value changes whenever we run the previous code, as the following output from one run shows, and the accuracy of the estimate depends on the number of trials, that is, n:
('pi=', 3.15)
('error (%)=', 0.0026761414789406262)
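To illustrate how the accuracy depends on n, the following sketch wraps the estimate in a function and runs it for several sample sizes. The function name estimate_pi, the seed, and the use of np.random.default_rng are assumptions made for this example and do not come from the article.

```python
import numpy as np

def estimate_pi(n, seed):
    """Monte Carlo estimate of pi from n points in the unit square."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n)
    y = rng.uniform(0.0, 1.0, n)
    inside = (x**2 + y**2) <= 1.0    # points inside the quarter circle
    return 4.0 * float(inside.mean())

results = {}
for n in (1000, 100000, 1000000):
    est = estimate_pi(n, seed=12345)
    results[n] = est
    # relative error of the estimate for this sample size
    print(n, est, abs(est - np.pi) / np.pi)
```

The error of a single run is random, so a larger n makes a small error more likely rather than guaranteeing one.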
Generating random numbers from a Poisson distribution
To investigate the impact of private information, Easley, Kiefer, O’Hara, and Paperman (1996) designed the Probability of Informed Trading (PIN) measure, which is derived from the daily number of buyer-initiated trades and the number of seller-initiated trades. The fundamental assumption of their model is that order arrivals follow a Poisson distribution. The following code shows how to generate n random numbers from a Poisson distribution:
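import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
# draw 100 random numbers from a Poisson distribution with lambda=1
x=sp.random.poisson(lam=1, size=100)
#plt.plot(x,'o')
# draw n random numbers from a power distribution and compare
# their histogram with the scaled theoretical density
a = 5. # shape
n = 1000
s = np.random.power(a, n)
count, bins, ignored = plt.hist(s, bins=30)
x = np.linspace(0, 1, 100)
y = a*x**(a-1.)
normed_y = n*np.diff(bins)[0]*y
plt.plot(x, normed_y)
plt.show()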
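One way to sanity-check Poisson draws is to use the fact that the mean and the variance of a Poisson distribution both equal its rate parameter. The sketch below assumes numpy.random is called directly; the rate lam=3.0, the seed, and the sample size are hypothetical values chosen for illustration.

```python
import numpy as np

lam = 3.0                            # hypothetical arrival rate per period
np.random.seed(12345)
arrivals = np.random.poisson(lam=lam, size=100000)

# For a Poisson distribution both the mean and the variance equal lam.
sample_mean = float(arrivals.mean())
sample_var = float(arrivals.var())
print(round(sample_mean, 2), round(sample_var, 2))
```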
Selecting m stocks randomly from n given stocks
Based on the preceding program, we could easily choose 20 stocks from 500 available securities. This is an important step if we intend to investigate the impact of the number of randomly selected stocks on the portfolio volatility as shown in the following code:
import numpy as np
import scipy as sp
n_stocks_available=500
n_stocks=20
x=sp.random.uniform(low=1,high=n_stocks_available,size=n_stocks)
y=[]
for i in range(n_stocks):
    # convert each real number into an integer
    y.append(int(x[i]))
#print(y)
# drop duplicates
final=np.unique(y)
print(final)
print(len(final))
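The uniform-draw program truncates real numbers to integers, so two draws can land on the same stock. As a hedged alternative, and not the article's method, NumPy can sample distinct integers directly. The sketch assumes np.random.default_rng and its choice() method with replace=False; the seed is arbitrary, and the indices are zero-based (0 to 499).

```python
import numpy as np

n_stocks_available = 500
n_stocks = 20

rng = np.random.default_rng(12345)
# replace=False guarantees that no index is drawn twice.
picked = rng.choice(n_stocks_available, size=n_stocks, replace=False)

n_distinct = len(np.unique(picked))
print(np.sort(picked))
print(n_distinct)
```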
In the program based on uniform draws, we select 20 numbers from 500 numbers. Since we have to choose integers, we might end up with fewer than 20 values, because some integers can appear more than once after we convert real numbers into integers. One solution is to pick more than we need and then keep the first 20 distinct integers. An alternative is to use the randrange() and randint() functions. In the next program, we choose n stocks from all available stocks. First, we download a dataset from http://canisius.edu/~yany/yanMonthly.pickle:
import numpy as np
import pandas as pd
import scipy as sp
n_stocks=10
x=pd.read_pickle('c:/temp/yanMonthly.pickle')
# all unique IDs in the dataset
x2=np.unique(np.array(x.index))
# IDs starting with ^ sort after ZZZZ, so this drops the market indices
x3=x2[x2<'ZZZZ']
sp.random.seed(1234567)
nonStocks=['GOLDPRICE','HML','SMB','Mkt_Rf','Rf','Russ3000E_D','US_DEBT',
           'Russ3000E_X','US_GDP2009dollar','US_GDP2013dollar']
x4=list(x3)
# remove the remaining non-stock IDs one by one
for i in range(len(nonStocks)):
    x4.remove(nonStocks[i])
k=sp.random.uniform(low=1,high=len(x4),size=n_stocks)
y,s=[],[]
for i in range(n_stocks):
    index=int(k[i])
    y.append(index)
    s.append(x4[index])
final=np.unique(y)
print(final)
print(s)
In the preceding program, we first load a dataset called yanMonthly.pickle that includes over 200 stocks, gold price, GDP, unemployment rate, SMB (Small Minus Big), HML (High Minus Low), risk-free rate, price rate, market excess rate, and Russell indices. We then remove the non-stock data items, which are part of this dataset.
The .pickle extension indicates that the dataset is stored as a pickled Pandas object. Since x.index holds the ID for every observation, we need to use the unique() function to select all unique IDs. Since we only consider stocks to form our portfolio, we have to remove all market indices and other non-stock securities, such as HML and US_DEBT. Because all stock market indices start with a caret (^), we keep only the IDs that are less than ZZZZ to remove them. The other non-stock IDs fall between A and Z, so we have to remove them one after another. For this purpose, we use the remove() function available for a list variable. The final output is shown as follows: