Let us first learn how to read a file line by line in Python. Reading a file line by line is not only fast but also gives a lot of control over how the file is parsed. As an example, we will use the following data set:
http://lib.stat.cmu.edu/DASL/Datafiles/USTemperatures.html
The data gives the normal average January minimum temperature in degrees Fahrenheit with the latitude and longitude of 56 U.S. cities. (For each year from 1931 to 1960, the daily minimum temperatures in January were added together and divided by 31. Then, the averages for each year were averaged over the 30 years.)
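The averaging procedure described above can be sketched in a few lines. This is only an illustration: the daily values below are synthetic random numbers (not the real measurements), and the names `daily_min`, `yearly_avg`, and `normal_avg` are made up for this example.

```python
import numpy as np

# Synthetic stand-in data: 30 years (rows) x 31 January days (columns).
# These are random numbers for illustration, NOT the real temperatures.
rng = np.random.default_rng(0)
daily_min = rng.normal(loc=30.0, scale=8.0, size=(30, 31))

# Step 1: for each year, add the 31 daily minima and divide by 31.
yearly_avg = daily_min.sum(axis=1) / 31

# Step 2: average the 30 yearly values to get the "normal" January minimum.
normal_avg = yearly_avg.mean()

print(yearly_avg.shape)
print(normal_avg)
```

Because every year has the same number of days, the result equals the plain mean over all 930 daily values.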
# Import all the packages we will use
import numpy as np
import scipy as sc
import matplotlib.pyplot as plt
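# Read every line of the file into a list.
# The with block closes the file automatically, so no f.close() is needed.
with open('temp1.txt', 'r') as f:
    lines = f.readlines()
print(lines)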
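`readlines()` loads the whole file into memory at once. For large files, it may be preferable to iterate over the file object, which yields one line at a time. The sketch below assumes a tab-delimited layout; the in-memory `sample` text, its column names, and the helper `read_rows` are hypothetical stand-ins so that the example runs without `temp1.txt`.

```python
import io

# Hypothetical stand-in for an open text file. With a real file you would
# write `with open('temp1.txt') as f:` and pass f to read_rows instead.
sample = io.StringIO(u"City\tJanTemp\nCity A\t10\nCity B\t20\n")

def read_rows(handle):
    """Yield each line of an open text file as a list of tab-separated fields."""
    for raw in handle:  # the file object yields one line per iteration
        yield raw.rstrip('\n').split('\t')

rows = list(read_rows(sample))
header, records = rows[0], rows[1:]
print(header)
print(records)
```

Only one line is held in memory at a time inside the loop; the `list(...)` call here collects the rows purely for display.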
# Remove the trailing '\n' and split a line into columns (remember, '\t' is the delimiter)
linesX = lines[1].strip('\n')
line = linesX.split('\t')
print('line = {}'.format(line))
print('line[0] = {}'.format(line[0]))
# Let us read just column 1 (city, state) and column 2 (temperature).
# The first line is the header, so the array needs len(lines) - 1 rows.
xdata = np.zeros(len(lines) - 1, dtype={'names': ['city', 'temp'], 'formats': ['U32', 'i4']})
for j in range(1, len(lines)):
    line = lines[j].strip('\n').split('\t')
    # Convert the temperature explicitly from text to an integer
    xdata[j-1] = (line[0], int(line[1]))
print(xdata['city'], xdata['temp'])
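When the file is a regular tab-delimited table, NumPy's `genfromtxt` can do the splitting and type conversion in one call. The following is a minimal sketch, not a replacement for the manual loop: the in-memory `text` and its column names are hypothetical, and it assumes the first row is a header.

```python
import io
import numpy as np

# Hypothetical two-column, tab-delimited content standing in for the real file.
text = u"City\tJanTemp\nCity A\t10\nCity B\t20\n"

# names=True takes field names from the header row;
# dtype=None lets NumPy infer a type for each column.
data = np.genfromtxt(io.StringIO(text), delimiter='\t', names=True,
                     dtype=None, encoding='utf-8')

print(data['City'])
print(data['JanTemp'])
```

The manual loop remains useful when lines are irregular or need custom cleaning before conversion.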
%matplotlib inline
plt.hist(xdata['temp'],bins=range(-1,60),color='g');
plt.xlabel('Temperature')
plt.ylabel('Frequency');
# Random numbers between 0 and 1 from a uniform distribution
ur=np.random.rand(5000)
# Random numbers from a Gaussian (normal) distribution with mean=0 and standard deviation=1 (so the variance is also 1)
gr=np.random.normal(0,1,5000)
# Random numbers from exponential distribution with mean=1
ex=np.random.exponential(1,1000)
# Plot the three distributions above as side-by-side subplots
plt.subplot(1,3,1)
plt.hist(ur,20,color='m')
plt.subplot(1,3,2)
plt.hist(gr,20,color='b')
plt.subplot(1,3,3)
plt.hist(ex,20,color='c')
plt.tight_layout(); # improves the spacing between subplots
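Besides looking at histograms, a quick sanity check is to compare sample statistics with the theoretical values: mean 1/2 and variance 1/12 for the uniform distribution on [0, 1), mean 0 and variance 1 for the standard normal, and mean 1 and variance 1 for the exponential with scale 1. The sketch below uses NumPy's `default_rng` generator with a fixed seed for reproducibility, which is a choice made for this example rather than what the cells above use.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded generator (an assumption of this sketch)
n = 100000                       # larger sample than above, to tighten the estimates

ur = rng.random(n)               # uniform on [0, 1)
gr = rng.normal(0.0, 1.0, n)     # normal: mean 0, standard deviation 1
ex = rng.exponential(1.0, n)     # exponential: scale (mean) 1

# (name, sample, theoretical mean, theoretical variance)
checks = [('uniform', ur, 0.5, 1.0 / 12.0),
          ('normal', gr, 0.0, 1.0),
          ('exponential', ex, 1.0, 1.0)]

for name, sample, mean, var in checks:
    print('{}: mean {:.4f} (theory {:.4f}), variance {:.4f} (theory {:.4f})'.format(
        name, sample.mean(), mean, sample.var(), var))
```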
Let us define the probability density function of the normal (Gaussian) distribution:
def gaussianpdf(mu, sigma, x):
    fdensity = 1/(sigma*np.sqrt(2 * np.pi))*np.exp(-(x-mu)**2 / (2 * sigma**2))
    return fdensity
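In mathematical notation, the density computed by `gaussianpdf` is the following, where $\mu$ is the mean and $\sigma$ is the standard deviation:

```latex
f(x \mid \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2 \sigma^2} \right)
```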
plt.subplot(1,2,2)
# density=True scales the histogram to unit area (it replaces the removed normed argument)
plt.hist(gr,20,color='k',density=True)
x=np.linspace(-4,4,num=125)
fx=gaussianpdf(0,1,x)
plt.plot(x,fx,'r-',linewidth=3);
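One way to gain confidence in the density function is to check numerically that it integrates to 1 and has the expected mean and variance. The sketch below repeats the definition of `gaussianpdf` so that it runs on its own, and approximates the integrals with a simple Riemann sum; the grid range and spacing are arbitrary choices for this example.

```python
import numpy as np

def gaussianpdf(mu, sigma, x):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    return 1 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

# Fine, wide grid: beyond 8 standard deviations the tails are negligible.
x = np.linspace(-8, 8, num=4001)
dx = x[1] - x[0]
fx = gaussianpdf(0, 1, x)

area = np.sum(fx) * dx                      # should be close to 1
mean = np.sum(x * fx) * dx                  # should be close to mu = 0
var = np.sum((x - mean) ** 2 * fx) * dx     # should be close to sigma**2 = 1

print(area, mean, var)
```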