Friday, August 21, 2009

Transposing a Dataset


# A common problem is to transpose a dataset from the long to the wide
# format, e.g., given the dataset below:
> s <- c("a", "b", "c")                            # Set up a group of sources
> d <- expand.grid(s, s)                           # Create the Cartesian product of s with itself
> d$flow <- ifelse(d$Var1 == d$Var2, 0, 0.2534469) # Zero on the diagonal, 0.2534469 elsewhere
> print(d)
  Var1 Var2      flow
1    a    a 0.0000000
2    b    a 0.2534469
3    c    a 0.2534469
4    a    b 0.2534469
5    b    b 0.0000000
6    c    b 0.2534469
7    a    c 0.2534469
8    b    c 0.2534469
9    c    c 0.0000000
# We would like something like this:
#     Var2
# Var1         a         b         c
#    a 0.0000000 0.2534469 0.2534469
#    b 0.2534469 0.0000000 0.2534469
#    c 0.2534469 0.2534469 0.0000000
# The function reshape() can do this, but more often than not it is
# cumbersome to use. The reshape package, with its melt() and cast()
# functions, is far more intuitive. An alternative takes advantage of base
# R's tapply(), which splits a vector by one or more factors and applies a
# function to each group:
> tapply(d[, 3], d[, c(1, 2)], c) # split flow by (Var1, Var2), apply c() to each cell
    Var2
Var1         a         b         c
   a 0.0000000 0.2534469 0.2534469
   b 0.2534469 0.0000000 0.2534469
   c 0.2534469 0.2534469 0.0000000
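Passing c as the function works here only because each (Var1, Var2) pair occurs exactly once, so every cell receives a single value. The sketch below uses a hypothetical variant of the data (d2), with one pair duplicated and another pair missing, to show what changes. Choosing sum as the aggregating function is an assumption on my part; mean or another function may suit other data better.

```r
# Same toy data as in the post.
s <- c("a", "b", "c")
d <- expand.grid(s, s)
d$flow <- ifelse(d$Var1 == d$Var2, 0, 0.2534469)

# Hypothetical variant: drop the ("c", "c") pair and add a second
# observation for the ("b", "a") pair.
d2 <- rbind(d[!(d$Var1 == "c" & d$Var2 == "c"), ],
            data.frame(Var1 = "b", Var2 = "a", flow = 0.1))

# With FUN = c the ("b", "a") cell now holds two numbers, so tapply()
# cannot simplify and returns a list-matrix instead of a numeric one.
wide_c <- tapply(d2$flow, d2[, c("Var1", "Var2")], c)

# An aggregating FUN such as sum always yields one value per non-empty
# cell, so the result stays numeric; cells with no observations get NA.
wide_sum <- tapply(d2$flow, d2[, c("Var1", "Var2")], sum)
print(wide_sum)
```

In this variant wide_c comes back as a list-matrix, while wide_sum stays numeric, adds the two ("b", "a") flows together and leaves the empty ("c", "c") cell as NA.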
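# Contrast the tapply() version, which returns a matrix, with the reshape()
# one, which returns a data frame with one flow.<level> column per level of Var2: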
> reshape(d, idvar = "Var1", timevar = "Var2", direction = "wide")
  Var1    flow.a    flow.b    flow.c
1    a 0.0000000 0.2534469 0.2534469
2    b 0.2534469 0.0000000 0.2534469
3    c 0.2534469 0.2534469 0.0000000
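The two results differ in type: tapply() gives a numeric matrix with named dimnames, whereas reshape() gives a data frame that keeps Var1 as an ordinary column. The minimal sketch below checks that they hold the same numbers once the data frame is turned into a matrix. It also adds a third base-R route that is not part of the original post: filling an empty matrix by (row, column) position from the integer codes of the two factors. The variable names (m_tapply, m_reshape, m_index) are illustrative only.

```r
s <- c("a", "b", "c")
d <- expand.grid(s, s)   # Var1 and Var2 are factors with levels a, b, c
d$flow <- ifelse(d$Var1 == d$Var2, 0, 0.2534469)

# Route 1: tapply() returns a matrix directly.
m_tapply <- tapply(d$flow, d[, c("Var1", "Var2")], c)

# Route 2: reshape() returns a data frame; drop the id column and rebuild
# the row/column labels so the result is comparable with m_tapply.
w <- reshape(d, idvar = "Var1", timevar = "Var2", direction = "wide")
m_reshape <- as.matrix(w[, -1])
dimnames(m_reshape) <- list(Var1 = as.character(w$Var1),
                            Var2 = sub("^flow\\.", "", colnames(w)[-1]))

# Route 3 (not in the original post): start from an all-NA matrix and
# assign each flow to the cell given by the factor codes of Var1 and Var2.
m_index <- matrix(NA_real_, nlevels(d$Var1), nlevels(d$Var2),
                  dimnames = list(Var1 = levels(d$Var1),
                                  Var2 = levels(d$Var2)))
m_index[cbind(as.integer(d$Var1), as.integer(d$Var2))] <- d$flow

print(all.equal(m_reshape, m_tapply))
print(all.equal(m_index, m_tapply))
```

Route 3 makes the long-to-wide step explicit: each long row supplies one (row, column, value) triple. Unlike tapply(), it silently keeps only the last value when a pair is duplicated.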
