Exploring census and demographic data with R

I wanted to use R to explore how to access and visualize census data, home value data, and school rating data, using the Indianapolis metro area as an example. My objectives here are to learn how to use R to:

  • Make choropleth maps at the zip code level
  • Process the XML or JSON data returned by the APIs that websites provide (here I chose Zillow and education.com as examples)

The data that I wanted to overlay on the Indianapolis map was median age, median income, median home listing price/value per square foot, and school rating, all at the zip code level. I got the census data using the US Census data API, the zip-level home demographic data using the Zillow demographics API, and the school rating data using the education.com API. (Disclaimer: this is only a one-off use of the Zillow and education.com APIs. I believe I am in compliance with the terms of use of these APIs; however, if somebody notices any violation, please bring it to my attention and I will remove this material.) I should also note that there are several nice interactive tools and widgets on the web that provide the same information. My purpose here is just to play with R to extract and visualize this data.
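As a rough sketch of the data-access step (this is not the code from the linked script; the endpoints, variable codes, XPath, and keys below are placeholders that would need to be checked against each API's documentation):

# illustrative: ACS median age by zip code via the Census API (JSON)
library(jsonlite) # JSON parsing

cenUrl=paste0("https://api.census.gov/data/2012/acs5",
              "?get=B01002_001E&for=zip+code+tabulation+area:*",
              "&key=YOUR_CENSUS_KEY")
raw=fromJSON(cenUrl) # character matrix; the first row is the header
medAge=data.frame(zip=raw[-1,2],medianAge=as.numeric(raw[-1,1]))

# illustrative: Zillow demographics response parsed as XML
library(XML) # XML parsing and XPath queries

zUrl=paste0("http://www.zillow.com/webservice/GetDemographics.htm",
            "?zws-id=YOUR_ZWSID&zip=46032")
doc=xmlParse(zUrl)
# pull one value out of the XML tree (the XPath here is illustrative)
xpathSApply(doc,"//attribute[name='Median List Price']//value",xmlValue)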

The full R code along with explanation can be found here.

The maps of median age, median household income, median home listing price/value per square foot, and school rating overlaid on the Indianapolis map are shown below.

[Map: median age by zip code]

[Maps: median household income and median home listing price by zip code]

[Maps: median home value per square foot and school rating by zip code]
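For completeness, here is a minimal sketch of how such a zip-level choropleth might be drawn (my addition, assuming a zip/ZCTA boundary shapefile and the medAge data frame from the sketch above; the file name and id field are placeholders):

# sketch: zip-level choropleth with ggplot2 and a ZCTA shapefile
library(maptools) # readShapeSpatial reads the shapefile
library(ggplot2)

zipShp=readShapeSpatial("indy_zcta.shp") # hypothetical shapefile
zipDf=fortify(zipShp,region="ZCTA5CE10") # id field depends on the shapefile
plotDf=merge(zipDf,medAge,by.x="id",by.y="zip")
plotDf=plotDf[order(plotDf$order),] # restore polygon drawing order after merge
ggplot(plotDf,aes(x=long,y=lat,group=group,fill=medianAge))+
  geom_polygon()+coord_equal()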


Using R in Used Car Search

Recently I was looking to buy a used car and was using the Edmunds website to estimate the true market value of a used car. It has a nice click-through interface that guides you to input make, model, year, etc. and gives an estimated value. While the interface is pleasant, I wanted a broader view across years, models, etc. Luckily, Edmunds also has a developer API that is nicely documented. Here I am listing my basic attempts at using the developer API to get some data into R and make exploratory plots of car value by year. To use this API, you need to get an API key (specifically, the vehicle API key is used here).
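The general pattern looks something like the sketch below (the endpoint path and response structure are illustrative rather than taken from my actual script; check the Edmunds API documentation):

# illustrative: querying the Edmunds vehicle API and parsing the JSON
library(jsonlite)

apikey="YOUR_VEHICLE_API_KEY" # placeholder
url=paste0("https://api.edmunds.com/api/vehicle/v2/honda/accord/2008/styles",
           "?fmt=json&api_key=",apikey)
styles=fromJSON(url)
# the parsed list can then be flattened into a data frame of styles,
# and the TMV endpoint queried style by style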

The code used for creating the plots below can be found here.

The plots below show the market value across styles for a given make/model of car, by year. The y axis in these plots is the Edmunds-estimated True Market Value for a typically equipped vehicle (I am not sure how exactly they define this). The bars in each graph span the range of market values across styles.
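The plot itself can be built along these lines (a sketch, assuming a hypothetical data frame tmvDf with columns year, style, and tmv assembled from the API responses):

# sketch: TMV by year, with a bar spanning the range across styles
library(ggplot2)

ggplot(tmvDf,aes(x=factor(year),y=tmv))+
  stat_summary(fun.ymin=min,fun.ymax=max,geom="linerange")+
  stat_summary(fun.y=median,geom="point",size=3)+
  xlab("Year")+ylab("True Market Value ($)")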

TMV for Honda Accord

[Plot: hondaAccord_tmv]

TMV for Toyota Corolla

[Plot: toyotaCorolla_tmv]

Edmunds also makes several adjustments for car condition, car mileage, and additional car options. The plot below shows the TMV adjustment for each option for a 2008 Toyota Corolla (LE 4dr Sedan (1.8L 4cyl 4A)).

[Plot: adjOption]

The plot below shows the adjustment made for car condition for a 2008 Toyota Corolla (LE 4dr Sedan (1.8L 4cyl 4A)).

[Plot: adjCondition]

The plot below shows the mileage adjustment, again for a 2008 Toyota Corolla (LE 4dr Sedan (1.8L 4cyl 4A)).

[Plot: adjMiles]


Learning R: Parameter Fitting for Models Involving Differential Equations

MATLAB, Octave, and Python seem to be the preferred tools for scientific and engineering analysis (especially analyses involving physical models with differential equations). However, as part of my learning-R experience, I wanted to check out some of R's tools for parameter fitting of models involving ordinary differential equations. R has the package deSolve for solving differential equations and FME for parameter fitting. The specific example here is taken from the computational appendix (A.6) of the book Chemical Reactor Analysis and Design Fundamentals by Rawlings and Ekerdt. In fact, all examples in this book are available in Octave and MATLAB.

This example involves two reactions in series

A \xrightarrow{k_1} B \xrightarrow{k_2} C

where we need to estimate the rate constants of the two reactions, k_1 and k_2, from data.

First, the required libraries are loaded:


# set working directory
setwd("~/R/wkspace")

# load libraries
library(ggplot2) #library for plotting
library(reshape2) # library for reshaping data (tall-narrow <-> short-wide)
library(deSolve) # library for solving differential equations
library(minpack.lm) # library for least squares fit using the Levenberg-Marquardt algorithm

The available data consists of the concentrations of A, B, and C over time; it is loaded and plotted below:


#load concentration data
 df=read.table("ABC_data.dat")
 names(df)=c("time","ca","cb","cc")

# plot data
 tmp=melt(df,id.vars=c("time"),variable.name="species",value.name="conc")
 ggplot(data=tmp,aes(x=time,y=conc,color=species))+geom_point(size=3)

[Plot: concentration data for species A, B, C over time]

The system being solved is

\frac{dc_A}{dt}=-k_1c_A

\frac{dc_B}{dt}=k_1c_A-k_2c_B

\frac{dc_C}{dt}=k_2c_B

with initial concentrations c_A=1, c_B=0, c_C=0.

The rate equations are captured in a function that is passed as an input to the ODE solver:


# rate function
rxnrate=function(t,c,parms){

 # rate constant passed through a list called parms
 k1=parms$k1
 k2=parms$k2

 # c is the concentration of species

 # derivatives dc/dt are computed below
 r=rep(0,length(c))
 r[1]=-k1*c["A"] #dcA/dt
 r[2]=k1*c["A"]-k2*c["B"] #dcB/dt
 r[3]=k2*c["B"] #dcC/dt

 # the computed derivatives are returned as a list
 # order of derivatives needs to be the same as the order of species in c
 return(list(r))

}

Computing the predicted concentrations for a given set of rate constants involves just solving the ODEs with the initial conditions. This is illustrated for the parameter values k1=2, k2=1 (the actual values used to generate the data).

# predicted concentration for a given parameter set
cinit=c(A=1,B=0,C=0)
t=df$time
parms=list(k1=2,k2=1)
out=ode(y=cinit,times=t,func=rxnrate,parms=parms)
head(out)

      time          A         B          C
[1,] 0.000 1.00000000 0.0000000 0.00000000
[2,] 0.263 0.59096447 0.3555550 0.05348051
[3,] 0.526 0.34923934 0.4834497 0.16731099
[4,] 0.789 0.20638798 0.4958219 0.29779013
[5,] 1.053 0.12172441 0.4543303 0.42394528
[6,] 1.316 0.07193491 0.3925423 0.53552284

This is wrapped into a function whose input is the parameters to be estimated and whose output is the vector of residuals. Here three concentration profiles are fitted; in general we may want to put different weights on them, but in this example they all have the same weight. The package FME has several nice features for fitting and uncertainty estimation, but here I did the fit directly rather than use FME, mainly to learn how this works.


ssq=function(parms){

 # initial concentration
 cinit=c(A=1,B=0,C=0)
 # time points for which conc is reported
 # include the points where data is available
 t=c(seq(0,5,0.1),df$time)
 t=sort(unique(t))
 # parameters from the parameter estimation routine
 k1=parms[1]
 k2=parms[2]
 # solve ODE for a given set of parameters
 out=ode(y=cinit,times=t,func=rxnrate,parms=list(k1=k1,k2=k2))

 # Filter data that contains time points where data is available
 outdf=data.frame(out)
 outdf=outdf[outdf$time %in% df$time,]
 # Evaluate predicted vs experimental residual
 preddf=melt(outdf,id.var="time",variable.name="species",value.name="conc")
 expdf=melt(df,id.var="time",variable.name="species",value.name="conc")
 ssqres=preddf$conc-expdf$conc

 # return predicted vs experimental residual
 return(ssqres)

}

The parameter fitting is done using the Levenberg-Marquardt routine (nls.lm) in the package minpack.lm.


# parameter fitting using the Levenberg-Marquardt algorithm
# initial guess for parameters
parms=c(k1=0.5,k2=0.5)
# fitting
fitval=nls.lm(par=parms,fn=ssq)

The fitval object has information on the estimated parameters and the variance-covariance matrix:


# Summary of fit
summary(fitval)

Parameters:
   Estimate Std. Error t value Pr(>|t|)
k1  2.01906    0.04867   41.49   <2e-16 ***
k2  0.99297    0.01779   55.82   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.0212 on 58 degrees of freedom
Number of iterations to termination: 7
Reason for termination: Relative error in the sum of squares is at most `ftol'.

# Estimated parameter
parest=as.list(coef(fitval))
parest
$k1
[1] 2.019065

$k2
[1] 0.992973

# degrees of freedom: # data points - # parameters
dof=3*nrow(df)-2
dof
[1] 58
# residual standard error (root mean square error)
ms=sqrt(deviance(fitval)/dof)
ms
[1] 0.02119577

# variance-covariance matrix
S=vcov(fitval)
S

              k1            k2
k1  0.0023685244 -0.0003605831
k2 -0.0003605831  0.0003164724

The estimated parameters (k1=2.019, k2=0.993) are close to the real parameters (k1=2, k2=1) used to generate the data. The predicted profiles are overlaid with the experimental data below:


# plot of predicted vs experimental data

# simulated predicted profile at estimated parameter values
cinit=c(A=1,B=0,C=0)
t=seq(0,5,0.2)
parms=as.list(parest)
out=ode(y=cinit,times=t,func=rxnrate,parms=parms)
outdf=data.frame(out)
names(outdf)=c("time","ca_pred","cb_pred","cc_pred")

# Overlay predicted profile with experimental data
tmppred=melt(outdf,id.var=c("time"),variable.name="species",value.name="conc")
tmpexp=melt(df,id.var=c("time"),variable.name="species",value.name="conc")
p=ggplot(data=tmppred,aes(x=time,y=conc,color=species,linetype=species))+geom_line()
p=p+geom_line(data=tmpexp,aes(x=time,y=conc,color=species,linetype=species))
p=p+geom_point(data=tmpexp,aes(x=time,y=conc,color=species))
p=p+scale_linetype_manual(values=c(0,1,0,1,0,1))
p=p+scale_color_manual(values=rep(c("red","blue","green"),each=2))+theme_bw()
print(p)

[Plot: predicted concentration profiles overlaid on experimental data]

Estimation of Parameter Uncertainty

Although the model is nonlinear, the \alpha confidence region is usually approximated using the following ellipsoid:

(\theta-\hat{\theta})^TS^{-1}(\theta-\hat{\theta}) \leq pF(p,n-p,\alpha)

where \hat{\theta} is the estimated parameter vector, S is the variance-covariance matrix of the estimated parameters, p=2 is the number of parameters, and n=60 is the total number of data points.


# Get the 95% confidence region

# Inverse of covariance matrix
Sinv=solve(S)

# get points for a circle with radius r
r=sqrt(qf(0.95,2,58)*2)
theta=seq(0,2*pi,length.out=100)
z=cbind(r*cos(theta),r*sin(theta))
# transform points of circle into points of ellipse using
# svd of inverse covariance matrix
Sinv_svd=svd(Sinv) # inverse of covariance matrix
xt=t(Sinv_svd$v)%*%diag(1/sqrt(Sinv_svd$d))%*%t(z) # transform from circle to ellipse
x=t(xt)
# translate the ellipse so that center is the estimated parameter value
x=x+matrix(rep(as.numeric(parest),100),nrow=100,byrow=T)

plot(x[,1],x[,2],type="l",xlab="k1",ylab="k2",lwd=2)
points(parest$k1,parest$k2,pch=20,col="blue",cex=2)

[Plot: 95% confidence region for (k1, k2)]

Another way of estimating the uncertainty is a bootstrap-style simulation procedure. Here several simulated datasets are generated from the currently estimated model by adding random normal noise to each data point (with mean 0 and standard deviation equal to the residual standard error from the fit). The parameters are then estimated for each simulated dataset, and the spread of the resulting estimates indicates the uncertainty in the parameters.


# Simulation based estimation of uncertainty

# store original experimental data in a separate dataframe
dforig=df

# conc profile based on estimated k1 and k2
cinit=c(A=1,B=0,C=0)
t=dforig$time
parms=parest
out=ode(y=cinit,times=t,func=rxnrate,parms=parms)

outsim=matrix(0,nrow=nrow(dforig),ncol=4)
outsim[,1]=out[,1]

# number of simulations
nsim=1000

parsim=matrix(0,nrow=nsim,ncol=2)
colnames(parsim)=c("k1","k2")

for (i in 1:nsim){

 # Simulate data set by adding normal random variable with mean 0 and stdev from fit

 outsim[,2:4]=out[,2:4]+matrix(rnorm(3*nrow(dforig)),nrow=nrow(dforig),ncol=3)*ms
 df=data.frame(outsim)
 names(df)=c("time","ca","cb","cc")

 # get parameter estimate for the simulated dataset
 parms=as.numeric(parest)
 fitsim=nls.lm(par=parms,fn=ssq)
 # store estimated parameters in the ith row
 parsim[i,]=coef(fitsim)

}

# plot the parameter estimates from the 1000 simulations
plot(parsim[,1],parsim[,2],xlab="k1",ylab="k2")
# overlay the 95% ellipse computed previously
lines(x[,1],x[,2],col="blue",lwd=2)

[Plot: parameter estimates from 1000 simulated datasets with the 95% confidence ellipse overlaid]

Next, the percentage of parameter estimates that fall within the ellipse is computed and found to be 93% (the expected value is 95%). In this case, the ellipsoidal approximation of parameter uncertainty seems adequate, but this might not be the case in general.


# percentage of parameters from simulation within the 95% ellipse
tmp=rep(0,length.out=nsim)
for(i in 1:nsim){
 tmp[i]=(parsim[i,]-as.numeric(parest))%*%Sinv%*%(parsim[i,]-as.numeric(parest))
}
sum(tmp <= qf(0.95,2,58)*2)/nsim

[1] 0.933

As I mentioned previously, the package FME has several functions for parameter fitting, MCMC simulation, and sensitivity analysis, but it was not used in this post. Also, when running many simulations with ODE models, it might be better to use a C-compiled version of the reaction rate function; the deSolve package provides that ability and describes it in this vignette. The data and R code are available here.
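For reference, here is a minimal sketch of what the same fit might look like through FME (my addition; it reuses the ssq function defined above):

# sketch: fitting with FME::modFit, which wraps several optimizers
# (including Levenberg-Marquardt) around a residual function
library(FME)
# note: ssq uses the data frame df, so run this while df still holds
# the experimental data (before the bootstrap loop overwrites it)
fitFME=modFit(f=ssq,p=c(k1=0.5,k2=0.5))
summary(fitFME) # parameter estimates and standard errors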


# session Info
sessionInfo()

R version 3.0.1 (2013-05-16)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] minpack.lm_1.1-7 deSolve_1.10-6 reshape2_1.2.2 ggplot2_0.9.3.1

loaded via a namespace (and not attached):
 [1] colorspace_1.2-2 dichromat_2.0-0 digest_0.6.3 grid_3.0.1 gtable_0.1.2
 [6] labeling_0.2 MASS_7.3-26 munsell_0.4 plyr_1.8 proto_0.3-10
[11] RColorBrewer_1.0-5 scales_0.2.3 stringr_0.6.2 tools_3.0.1


Learning R Using a Chemical Reaction Engineering Book: Part 4

The links to the previous parts are listed here (Part 1, Part 2, Part 3). In this part, I tried to recreate the examples in section A.3.1 of the computational appendix of the reaction engineering book (by Rawlings and Ekerdt).

Solving a system of ordinary differential equations

This example involves benzene pyrolysis in a plug flow reactor. The reactions are:

\begin{aligned}  (Rxn1) \; &2B = D + H \\  (Rxn2) \; &B + D = T+H    \end{aligned}

The rate of each reaction is given by:

r_1=k_1(c_B^2-\frac{c_Dc_H}{K_1})
r_2 = k_2(c_Bc_D-\frac{c_Tc_H}{K_2})

The feed to the reactor consists of 60 kmol/hr of benzene (B). The temperature of the reactor is T = 1033 K and the pressure is P = 1 atm. The rate constants and equilibrium constants for this example are:
k_1=7\times 10^5\; L/mol.hr,\; k_2=4\times 10^5 \; L/mol.hr,\; K_1=0.31,\; K_2=0.48


# Appendix A.3.1: Solution of Differential Equations

# Benzene pyrolysis example

# Parameters
# NBf - feed benzene rate - mol/h
# R - Universal gas constant
# T - Reactor temperature K
# P - Reactor pressure atm
# k1 - rxn1 forward rate constant L/mol.h
# k2 - rxn2 forward rate constant L/mol.h
# Keq1 - rxn1 equilibrium constant
# Keq2 - rxn2 equilibrium constant
pars=c(
NBf=60e3,
R=0.08205,
T=1033,
P=1,
k1=7e5,
k2=4e5,
Keq1=0.31,
Keq2=0.48
)

The governing equations for conversion versus volume in a plug flow reactor are written in terms of the extent of each reaction:
\frac{d\epsilon_1}{dV}=r_1,\; \frac{d\epsilon_2}{dV}=r_2
The initial conditions (corresponding to the feed conditions N_B(0)=60kmol/h,\;N_D(0)=N_H(0)=N_T(0)=0) are that the extents of both reactions are zero:
\epsilon_1(0)=0, \; \epsilon_2(0)=0

The flow rates of each component along the reactor volume can be calculated from the reaction extents:
N_B=N_B(0)-2\epsilon_1-\epsilon_2, \; N_D=\epsilon_1-\epsilon_2, \; N_H=\epsilon_1+\epsilon_2, \; N_T=\epsilon_2

These are set up in a function that can be passed to an ODE solver. In this case the ODE solver we use is lsode from the R package deSolve. The inputs to the function are:

  • Variable over which the integration is done (Volume in this case)
  • The state variables of the system (Extent of the two reactions)
  • Parameters that are needed for description of the system (Rate constants, Temperature, Pressure, etc.)

The output from this function is the rates of change described by the equations above.

# load library deSolve for solving ODEs
library(deSolve)

# function that will be passed to odesolver
# vol is the variable over which the system is integrated (equivalent of time in batch reactions)
# ext is the extent of reactions 1 and 2
# params are the parameters passed to the system
rxnrate=function(vol,ext,params) {
    with(as.list(c(ext,params)),{
      NB=NBf-2*ext1-ext2
      ND=ext1-ext2
      NH=ext1+ext2
      NT=ext2
      Q=NBf*R*T/P
      cB=NB/Q
      cD=ND/Q
      cT=NT/Q
      cH=NH/Q
      dext1=k1*(cB*cB-cD*cH/Keq1)
      dext2=k2*(cB*cD-cT*cH/Keq2)
      return(list(c(dext1=dext1,dext2=dext2)))
    })
}

Since the reactions start only after the feed enters the reactor, the extents of both reactions are zero at the beginning of the reactor (V=0 L). The set of volumes where the concentration and reaction extent are computed is chosen in this case to be 50 evenly spaced points from 0 L to 1600 L. The ODE solver lsode from the deSolve package is used to solve the system of equations.


# initial extent of reaction (zero in this case for both reactions)
extinit=c(ext1=0,ext2=0)
# Volumes where the concentration is reported (50 evenly spaced points from 0 to 1600L)
vols=seq(0,1600,length=50)
# Solution of the set of differential equations using lsode solver in deSolve package
extout=lsode(times=vols,y=extinit,func=rxnrate,parms=pars)

extout contains the extent-of-reaction versus volume data. That is used to compute the mole fractions and conversion at different volumes along the reactor.


# Calculation of mole fraction and conversion from extent of reaction at different volumes
extoutdf=data.frame(extout)
NBf=pars["NBf"]
extoutdf$conv=(extoutdf$ext1*2+extoutdf$ext2)/NBf
extoutdf$yB=(NBf-2*extoutdf$ext1-extoutdf$ext2)/NBf
extoutdf$yD=(extoutdf$ext1-extoutdf$ext2)/NBf
extoutdf$yT=(extoutdf$ext2)/NBf
extoutdf$yH=(extoutdf$ext1+extoutdf$ext2)/NBf

Next, conversion and mole fractions are plotted as functions of reactor volume:

# load library ggplot2 for plotting
library(ggplot2)
# load library reshape2 for data reshaping
library(reshape2)
# plot of conversion vs volume
ggplot(extoutdf,aes(x=time,y=conv))+geom_line()+
scale_x_continuous(breaks=seq(0,1600,by=200))+xlab('Volume (L)')+ylab('Conversion')+theme_bw(20)

# plot of mole fraction vs volume
tmp=melt(extoutdf[,c("time","yB","yD","yT","yH")],id.vars=c("time"),variable.name="moleFraction")
ggplot(tmp,aes(x=time,y=value,color=moleFraction))+geom_line()+
scale_x_continuous(breaks=seq(0,1600,by=200))+xlab('Volume (L)')+ylab('moleFraction')+theme_bw(20)

[Plot: conversion vs volume]

[Plot: mole fractions vs volume]


sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] reshape2_1.2.1  ggplot2_0.9.2.1 deSolve_1.10-4

loaded via a namespace (and not attached):
[1] colorspace_1.1-1   dichromat_1.2-4    digest_0.5.2       grid_2.15.1
[5] gtable_0.1.1       labeling_0.1       MASS_7.3-18        memoise_0.1
[9] munsell_0.3        plyr_1.7.1         proto_0.3-9.2      RColorBrewer_1.0-5
[13] scales_0.2.2       stringr_0.6.1


Learning R using a Chemical Reaction Engineering Book: Part 3

In case you missed the previous parts, the links to them are listed here (Part 1, Part 2).

In this part, I tried to recreate the examples in section A.2.3 of the computational appendix in the reaction engineering book (by Rawlings and Ekerdt).

Function Minimization

In part 2, the reaction equilibrium condition was determined by solving a set of nonlinear equations. Alternatively, it can be determined by minimizing the Gibbs free energy of the system. The function to be minimized is (refer to part 2 for information on what the variables refer to):

G=-(x_1\ln K_1+x_2\ln K_2)+(1-x_1-x_2)\ln P+(y_{I0}-x_1-x_2)\ln y_I+(y_{B0}-x_1-x_2)\ln y_B+(y_{P10}+x_1)\ln y_{P1}+(y_{P20}+x_2)\ln y_{P2}

The following constraints need to be satisfied by x_1 and x_2:

\begin{aligned}  0 &\leq x_1 &\leq 0.5 \\  0 &\leq x_2 &\leq 0.5 \\  x_1&+x_2 &\leq 0.5 \\  \end{aligned}

First I used the constrOptim function for minimization. We need to specify the function to be minimized:

# function to be minimized
eval_f0=function(x){
dg1=-3.72e3; dg2=-4.49e3; T=400; R=1.987; P=2.5
K1=exp(-dg1/(R*T)); K2=exp(-dg2/(R*T))

yI0=0.5; yB0=0.5; yP10=0; yP20=0;
d=1-x[1]-x[2]
yI=(yI0-x[1]-x[2])/d
yB=(yB0-x[1]-x[2])/d
yP1=(yP10+x[1])/d
yP2=(yP20+x[2])/d

f=-(x[1]*log(K1)+x[2]*log(K2))+(1-x[1]-x[2])*log(P)+yI*d*log(yI)+
yB*d*log(yB)+yP1*d*log(yP1)+yP2*d*log(yP2)

return(f)
}

The constraints need to be specified in the form Ax-b \geq 0

#  constraint
A=matrix(c(-1,-1,1,0,-1,0,0,1,0,-1),ncol=2,byrow=TRUE)
> A
     [,1] [,2]
[1,]   -1   -1
[2,]    1    0
[3,]   -1    0
[4,]    0    1
[5,]    0   -1

b=c(-0.5,0,-0.5,0,-0.5)
> b
[1] -0.5  0.0 -0.5  0.0 -0.5

Next, the function is minimized using constrOptim (starting from an initial guess of (0.2,0.2)). The Nelder-Mead method is used here since the BFGS method requires the user to supply the gradient of the function. The R task view on optimization lists other options.

# initial guess
xinit=c(0.2,0.2)

# minimize function subject to bounds and constraints
xans2=constrOptim(theta=xinit, f=eval_f0, grad=NULL, ui=A, ci=b,
method = "Nelder-Mead")
> xans2
$par
[1] 0.1331157 0.3509254

$value
[1] -2.559282

$counts
function gradient
     100       NA

$convergence
[1] 0

$message
NULL

$outer.iterations
[1] 3

$barrier.value
[1] 8.695209e-05

The solution can be accessed from xans2$par and is (0.1331, 0.3509).
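As an aside, constrOptim can also be run with BFGS if a gradient is supplied; a minimal sketch (my addition) provides one numerically via the numDeriv package:

# sketch: BFGS with a finite-difference gradient from numDeriv
library(numDeriv)
grad_f0=function(x) grad(eval_f0,x)
xans_bfgs=constrOptim(theta=xinit,f=eval_f0,grad=grad_f0,ui=A,ci=b,
                      method="BFGS")
xans_bfgs$par # should again be close to (0.1331,0.3509)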

Next, I also tried function minimization with constrOptim.nl from the package alabama. Here the constraints are specified in the form h(x) \geq 0. Even if a gradient is not supplied, this function will estimate it using finite differences.

The constraints are defined in the format required by constrOptim.nl:

# load library alabama
library(alabama)
library(numDeriv)

h_ineq=function(x){
h=rep(NA,5)
h[1]=-x[1]-x[2]+0.5
h[2]=x[1]
h[3]=-x[1]+0.5
h[4]=x[2]
h[5]=-x[2]+0.5
return(h)
}

> xans3=constrOptim.nl(par=xinit,fn=eval_f0,hin=h_ineq)
Min(hin):  0.1
par:  0.2 0.2
fval:  -2.313951
Min(hin):  0.01602445
par:  0.1332729 0.3507026
fval:  -2.559282
Min(hin):  0.01597985
par:  0.1331951 0.350825
fval:  -2.559282
Min(hin):  0.01597985
par:  0.1331951 0.350825
fval:  -2.559282

> xans3
$par
[1] 0.1331951 0.3508250

$value
[1] -2.559282

$counts
function gradient
      97       14

$convergence
[1] 0

$message
NULL

$outer.iterations
[1] 4

$barrier.value
[1] 0.0008697741

The solution can be accessed from xans3$par and is (0.1332, 0.3508). Since this is a two-dimensional problem, the solution can also be visualized using a contour plot of the function.

# Region of interest: 0.01<=x1<=0.49, 0.01<=x2<=0.49
x1=seq(0.01,0.49,by=0.01)
x2=seq(0.01,0.49,by=0.01)

# vectorizing function eval_f0 so that it can be evaluated in the outer function
fcont=function(x,y) eval_f0(c(x,y))
fcontv=Vectorize(fcont) # default SIMPLIFY=TRUE so outer() returns a numeric matrix
> z=outer(x1,x2,fcontv)
There were 50 or more warnings (use warnings() to see the first 50)

The warnings are produced in regions where x_1+x_2 > 0.5, since the function is not defined there. This is not an issue for the contour plot we make next (those regions are simply ignored).


# filled.contour and contour are overlaid with the minimum point (0.133,0.351)
filled.contour(x1,x2,z,xlab="x1",ylab="x2",
plot.axes={axis(1); axis(2); contour(x1,x2,z,levels=c(-3,-2.5,-2,-1.5,-1,0),vfont=c("sans serif","bold"),labcex=1,lwd=2,add=TRUE);
points(0.133,0.351,pch=15,col="blue")})

[Plot: filled contour of G with the minimum point marked]

MATLAB/Octave functions for function minimization (fmincon) have been used in chemical engineering computations for a long time and are robust. R has traditionally not been used in this domain, so it is hard to say how the functions I have used in this blog will perform across the range of problems encountered in reaction engineering.


> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] alabama_2011.9-1  numDeriv_2012.3-1 rootSolve_1.6.2

Learning R using a Chemical Reaction Engineering Book: Part 2

In case you missed part 1, you can view it here. In this part, I tried to recreate the examples in section A.2.2 of the computational appendix in the reaction engineering book by Rawlings and Ekerdt.

Solving a nonlinear system of equations

This example involves determining reaction equilibrium conditions by solving the following system of nonlinear equations.
\begin{aligned}  PK_1y_Iy_B-y_{P1}&=&0, \\  PK_2y_Iy_B-y_{P2}&=&0  \end{aligned}

The relations between the variables y_I,y_B,y_{P1},y_{P2} and the extents of reaction x_1,x_2 are:
\begin{aligned}  y_I&=&\frac{y_{I0}-x_1-x_2}{1-x_1-x_2} \\  y_B&=&\frac{y_{B0}-x_1-x_2}{1-x_1-x_2} \\  y_{P1}&=&\frac{y_{p10}+x_1}{1-x_1-x_2} \\  y_{P2}&=&\frac{y_{p20}+x_2}{1-x_1-x_2}  \end{aligned}

Here I have used the R package rootSolve to solve the above set of equations and determine x_1 and x_2. The library is loaded and the functions to be solved are defined in the R function fns:

# load library rootSolve
library(rootSolve)

# function defining F(x)=0
fns=function(x){
K1=108; K2=284; P=2.5
yI0=0.5; yB0=0.5; yP10=0; yP20=0;
d=1-x[1]-x[2]
yI=(yI0-x[1]-x[2])/d
yB=(yB0-x[1]-x[2])/d
yP1=(yP10+x[1])/d
yP2=(yP20+x[2])/d
F1=P*K1*yI*yB-yP1
F2=P*K2*yI*yB-yP2
c(F1=F1,F2=F2)
}

Next, an initial guess of (0.2,0.2) is set for the variables and the equations are solved using the function multiroot (from the package rootSolve):

# initial guess for x
xinit=c(0.2,0.2)

# solve the equations
xans=multiroot(f=fns,start=xinit)

# object returned by multiroot
> xans
$root
[1] 0.1333569 0.3506793

$f.root
          F1           F2
6.161738e-15 1.620926e-14

$iter
[1] 7

$estim.precis
[1] 1.11855e-14

# solution to the equations
> xans$root
[1] 0.1333569 0.3506793

The solution to the equations is accessed from the variable xans$root, which in this case is (0.1334, 0.3507).
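As a quick cross-check (my addition, not from the book), the nleqslv package offers Newton and Broyden solvers with a similar interface:

# cross-check with the nleqslv package
library(nleqslv)
xchk=nleqslv(x=xinit,fn=fns)
xchk$x # should again be close to (0.1334,0.3507)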

MATLAB/Octave functions for solving nonlinear equations (fsolve) have been used in chemical engineering computations for a long time and are robust. R has traditionally not been used in this domain, so it is hard to say how the functions I have used in this blog will perform across the range of problems encountered in reaction engineering.


Learning R using a Chemical Reaction Engineering Book: Part 1

Chemical Reactor Analysis and Design Fundamentals by J.B. Rawlings and J.G. Ekerdt is a textbook for studying chemical reaction engineering. The popular open source package Octave traces its origins to the reaction engineering course offered by Prof. Rawlings. The book is accompanied by Octave and MATLAB code for solving typical problems encountered in reaction engineering.

I figured that maybe one way to learn R is to see whether I can code some of the examples from this book in R. I am by no means suggesting that R can replace MATLAB/Octave for engineering problems; it is merely a way for me to learn the language.

I started with the computational appendix listed on the book’s website and am trying to work through some of the examples there. It will be good to refer to the computational appendix to follow the R code below.


# stoichiometric matrix
stoi=matrix(c(0,1,0,-1,-1,1,
-1,1,1,-1,0,0,
1,0,-1,0,-1,1),
ncol=6,byrow=T)
> stoi
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    0    1    0   -1   -1    1
[2,]   -1    1    1   -1    0    0
[3,]    1    0   -1    0   -1    1

# rank of the stoichiometric matrix
rank=qr(stoi)$rank
> rank
[1] 2

# reaction rate vector
r=c(1,2,3)

Given the reaction rate vector r=(r_1,r_2,r_3)', the rate of change of the species concentrations R is given by
R=\nu^Tr
where \nu is the stoichiometric matrix.

# rate of change of components
R=t(stoi)%*%r
> R
     [,1]
[1,]    1
[2,]    3
[3,]   -1
[4,]   -3
[5,]   -4
[6,]    4

Example A1: Estimating reaction rates

The stoichiometric matrix \nu is input below:

# stoichiometry
stoi=matrix(c(0,1,0,-1,-1,1,
-1,1,1,-1,0,0),nrow=2,byrow=T)
> stoi
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    0    1    0   -1   -1    1
[2,]   -1    1    1   -1    0    0

# number of species and number of reactions
nspec=ncol(stoi)
nr=nrow(stoi)

> nspec
[1] 6
> nr
[1] 2

# true rxn rates
r=c(1,2)
> r
[1] 1 2

# true component rates
R=t(stoi)%*%r
> R
     [,1]
[1,]   -2
[2,]    3
[3,]    2
[4,]   -3
[5,]   -1
[6,]    1

Simulate 2000 measured component rates

Add random noise (normally distributed with mean 0 and standard deviation 0.05) to the true species rate vector R:
R^m = R + \epsilon, \quad \epsilon \sim N(0,0.0025)

## simulate 2000 noise estimates
e=matrix(0.05*rnorm(2000*nspec,0,1),nrow=2000,byrow=T)
Rmeas=matrix(rep(R,2000),ncol=nspec,byrow=T)+e

The least squares estimate of the reaction rate vector, \hat{r}, is
\hat{r}=(\nu\nu^T)^{-1}\nu R^m

## estimate reaction rates
rest=solve(stoi%*%t(stoi),stoi%*%t(Rmeas))

I was trying different plot features in R, applying them to this data of estimated rates. I found the following function, which plots a scatterplot with marginal histograms:


# plotting scatterplot with histogram
# downloaded from web
#  http://www.r-bloggers.com/example-8-41-scatterplot-with-marginal-histograms/
#
scatterhist = function(x, y, xlab="", ylab=""){
par.default <- par(no.readonly=TRUE)
zones=matrix(c(2,0,1,3), ncol=2, byrow=TRUE)
layout(zones, widths=c(4/5,1/5), heights=c(1/5,4/5))
xhist = hist(x, plot=FALSE)
yhist = hist(y, plot=FALSE)
top = max(c(xhist$counts, yhist$counts))
par(mar=c(3,3,1,1))
plot(x,y)
par(mar=c(0,3,1,1))
barplot(xhist$counts, axes=FALSE, ylim=c(0, top), space=0)
par(mar=c(3,0,1,1))
barplot(yhist$counts, axes=FALSE, xlim=c(0, top), space=0, horiz=TRUE)
par(oma=c(3,3,0,0))
mtext(xlab, side=1, line=1, outer=TRUE, adj=0,
at=0.9 * (mean(x) - min(x))/(max(x)-min(x)))
mtext(ylab, side=2, line=1, outer=TRUE, adj=0,
at=(0.9 * (mean(y) - min(y))/(max(y) - min(y))))
par(par.default) # restore the graphical parameters

}

# scatter plot of reaction rates with marginal histograms
scatterhist(t(rest)[,1],t(rest)[,2],xlab="r_1",ylab="r_2")

[Plot: scatterplot of estimated rates with marginal histograms]

The library car has a command ‘scatterplot’ for plotting a scatterplot with marginal box plots, along with several other options. In the plot below, 50% and 90% ellipses are overlaid on the data:

# scatter plot of reaction rates with marginal box plots
library(car)
scatterplot(t(rest)[,1],t(rest)[,2],reg.line=FALSE,smooth=FALSE,ellipse=TRUE)

[Plot: scatterplot of estimated rates with marginal box plots and ellipses]

I also tried ggplot2 for the same plot:

# 2d contours/density with ggplot2 for reaction rates

# create dataframe of reaction rates
rest_df=data.frame(t(rest))
names(rest_df)=c("r1","r2")

library(ggplot2)
ggplot(data=rest_df,aes(x=r1,y=r2))+geom_point()

[Plot: ggplot2 scatterplot of estimated rates]

ggplot(data=rest_df,aes(x=r1,y=r2))+stat_density2d()

[Plot: 2d density contours with stat_density2d]

ggplot(data=rest_df,aes(x=r1,y=r2))+stat_density2d(aes(fill=..level..),geom="polygon")

[Plot: filled 2d density contours]
