Archive for September, 2011

Visualizing Dynamic Changes Using googleVis, R, and Dropbox

Published by admin on September 12th, 2011

googleVis can be used to visualize dynamic changes in social patterns. Here I test a few examples.

library(googleVis)

data(package="googleVis")
# Data sets in package ‘googleVis’:
# Andrew            Hurricane Andrew: googleVis example data set
# CityPopularity    CityPopularity: googleVis example data set
# Exports           Exports: googleVis example data set
# Fruits            Fruits: googleVis example data set
# OpenClose         OpenClose: googleVis example data set
# Population        Population: googleVis example data set
# Regions           Regions: googleVis example data set
# Stock             Stock: googleVis example data set

# Visualizing Hurricane Andrew

data(Andrew)
plot(Andrew)

AndrewGeoMap <- gvisGeoMap(Andrew, locationvar='LatLong', numvar='Speed_kt',
hovervar='Category',
options=list(width=600,height=300,
region='US', dataMode='Markers'))
plot(AndrewGeoMap)
AndrewGeoMap$html$chart
setwd("d:/r")
cat(AndrewGeoMap$html$chart, file="AndrewGeoMap.html")
# The file is written to the working directory.
# WordPress doesn't allow embedded JavaScript, so I put the HTML file into my Dropbox Public folder.

The link is given here.

Next, I visualize the dynamic change in the cell phone ratio around the world.

# fixed telephone use

ftel<-read.csv("d:/r/Fixed Telephone.csv", header = TRUE, sep = ",", quote = "\"'", dec = ".")[1:7263,]
ftelr<-read.csv("d:/r/Fixed Telephone ratio.csv", header = TRUE, sep = ",", quote = "\"'", dec = ".")

ftelm<-merge(ftelr, ftel, by=c("Country.or.Area","Year"))

ftelm<-cbind(ftelm[,1:3], ftelm[,5])   # country, year, ratio (columns 1-3) and number of users (column 5)
names(ftelm)<-c("Country or Area","Year","Percentage","Users")
library("googleVis")
fixed_telephone_motion<- gvisMotionChart(ftelm, idvar="Country or Area", timevar="Year", options=list(width=1024, height=768))
plot(fixed_telephone_motion)
setwd("d:/r")
cat(fixed_telephone_motion$html$chart, file="fixed_telephone_motion.html")
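The merge above produces the long layout a motion chart reads: one row per country (`idvar`) and year (`timevar`). The base-R sketch below repeats that step on two small, made-up tables (the numbers are placeholders, not the UN data, and the column names only mimic the CSV headers), picks the value columns by name instead of by position, and checks that no country-year pair is duplicated, which I assume is what gvisMotionChart() needs.

```r
# Hypothetical toy inputs: user counts and per-100-people ratios
users <- data.frame(Country.or.Area = c("A", "A", "B", "B"),
                    Year = c(2009, 2010, 2009, 2010),
                    Value = c(100, 120, 50, 55))
ratio <- data.frame(Country.or.Area = c("A", "A", "B", "B"),
                    Year = c(2009, 2010, 2009, 2010),
                    Value = c(10.0, 11.5, 4.8, 5.1))

# Join on country and year; the two 'Value' columns become Value.x / Value.y
m <- merge(ratio, users, by = c("Country.or.Area", "Year"))
m <- m[, c("Country.or.Area", "Year", "Value.x", "Value.y")]
names(m) <- c("Country or Area", "Year", "Percentage", "Users")

# Each (id, time) pair should occur at most once
stopifnot(!anyDuplicated(m[, c("Country or Area", "Year")]))
print(m)
```

Selecting by name keeps the code working if the CSV export gains or loses a column.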

The geographical distribution of cell phone usage (per 100 people)

The motion chart of cell phone usage (per 100 people)

# International news coverage visualization

Finally, I visualize the international news coverage of different countries during August 2011.

In case the links above are not visible, I made a web page that shows the results; click here:

http://weblab.com.cityu.edu.hk/chengjun/


Visualizing the News Diffusion Process Using a Heat Graph

Published by chengjun on September 8th, 2011

# visualize diffusion according to diffusion time
# chengjunwang @ office, CMC, 2011/09/06

This is the diffusion network: yellow nodes are users who did not diffuse the news, and red nodes are diffusers. Most people do not diffuse the news. To keep these non-diffusers from producing a misleading picture, I merge all nodes that connect to diffusers but do not diffuse the news themselves. This produces the giant white node in the figure below.
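The merging step can be illustrated on a toy follower list. In this minimal base-R sketch (the user ids and votes are made up), every friend who did not vote is replaced by the placeholder id 0, the same replace() trick the code below applies, so all non-diffusers collapse into one node.

```r
# Hypothetical toy follower edges: each row is (user_id, friend_id)
edges <- data.frame(user_id   = c(1, 1, 2, 3, 3),
                    friend_id = c(2, 7, 8, 7, 9))
voters <- c(1, 2, 3)  # users who voted for (diffused) the story

# Replace every friend who did not vote by the single placeholder id 0
merged <- edges
merged$friend_id <- replace(merged$friend_id, !(merged$friend_id %in% voters), 0)

nodes_before <- length(unique(c(edges$user_id, edges$friend_id)))
nodes_after  <- length(unique(c(merged$user_id, merged$friend_id)))
print(merged)
cat("nodes before:", nodes_before, " after:", nodes_after, "\n")
```

Note that the merged list contains the edge (3, 0) twice; igraph keeps such duplicates as multi-edges unless they are removed with simplify().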

#~~~~~~~~~~~~~~~~~~~~~~~~~read csv~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
digg <- read.csv(file= "d:/micro-blog/digg/digg_votes1.csv", header=FALSE, na.strings='NA')
 
net <- read.csv(file= "d:/micro-blog/digg/digg_net.csv", header=TRUE) # get the directed friendship network
 
# V1 is friend_date: Unix time stamp of when the friendship link was created
# user_id: anonymized unique id of a user
# friend_id: anonymized unique id of a user
#~~~~~~~~~~~~~~~~~get the name list~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
namelist<-subset(digg,digg[,3]==299) # get the list of users who voted for the story with id 299
name<-data.frame(namelist[,2])
dim(namelist)       # number of diffusers
subnet=subset(net,net[,3]%in%name[,1])
dim(subnet)         # number of edges
length(unique(c(subnet[,3], subnet[,4]))) # number of unique nodes
########################################################################
#      merging nodes for visualization                                 #
########################################################################
g1<-subnet[,3:4]
# merge every friend who did not vote into a single node with id 0
g1[,2]<-replace(g1[,2], !(g1[,2]%in%namelist[,2]), 0)
 
library(igraph)     # install.packages("igraph")
g1<-graph.data.frame(g1, directed=TRUE)
summary(g1)
 
V(g1)$size=log(degree(g1))
 
time<-namelist[,1][match(V(g1)$name,namelist[,2])]   # vote time of each node; NA for the merged node 0
time[is.na(time)]<-max(time, na.rm=TRUE)+1          # without na.rm=TRUE, max() would return NA
# subset(namelist, namelist[,2]==333807) ## use this to test the sequence
# Sort the distinct vote times and replace each node's time by its rank (1 = earliest);
# the merged non-diffuser node gets the last rank
code=sort(unique(time))
time=match(time,code)
# V(g1)$color <- rainbow(max(time))[time]   # alternative palette
V(g1)$color <- heat.colors(max(time), alpha = 1)[time]
 
par(mfrow=c(1,2))
plot(g1, vertex.label= NA,edge.arrow.size=0.2,layout=layout.kamada.kawai)
plot(rep(2,max(time)), col=heat.colors(max(time)),axes = FALSE, ann = FALSE)  # color legend: earliest (left) to latest (right)

In this figure, darker nodes diffused the news earlier, and the largest, white node is the merged cluster of non-diffusers. Many diffusers connect to this non-diffuser giant but not to each other; apparently, they were not influenced by the people they follow. So by whom were they influenced? This question is the starting point of my study of news diffusion.
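The color scale comes from ranking the vote times. This small base-R sketch with hypothetical times shows the mapping, and also why `na.rm = TRUE` matters when the non-diffusers (whose time is NA) are pushed to the end: `max()` returns NA whenever its input contains NA.

```r
# Hypothetical vote times (Unix seconds) for five nodes; NA = never voted
vote_time <- c(1300, 1100, NA, 1100, 1250)

# Place non-diffusers after the last vote; na.rm = TRUE avoids an NA maximum
vote_time[is.na(vote_time)] <- max(vote_time, na.rm = TRUE) + 1

# Rank distinct times: 1 = earliest, tied times share a rank
code <- sort(unique(vote_time))
vote_rank <- match(vote_time, code)

# heat.colors() runs from red (earliest) toward pale yellow (latest)
pal <- heat.colors(max(vote_rank))
node_color <- pal[vote_rank]
print(data.frame(vote_time, vote_rank, node_color))
```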

# pie(rep(1,max(time)), col=rainbow(max(time)))
# plot(g1, vertex.label= NA,edge.arrow.size=0.2,layout=layout.fruchterman.reingold )
 
setwd("D:/NKS & SFI/")
# restoreConsole is only available in the Windows version of savePlot()
savePlot(filename = "diffusion 5 news3 digg266 of 9496 users 58 isolates neglected",
         type = "png",
         device = dev.cur(),
         restoreConsole = TRUE)


Human Disease Network

Published by chengjun on September 5th, 2011

This is the result; the code is attached below.
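The degree counts in the code below start from the pair list in AllNet3.txt. As a hedged illustration on a made-up pair list, and assuming each disease pair appears only once in the file, counting codes in column 1 alone (as `table(dis[,1])` does) can differ from counting them in either column, which is the undirected degree igraph reports:

```r
# Hypothetical pair list in the AllNet3.txt layout (columns 1-2 are ICD-9 codes)
pairs <- data.frame(dis1 = c(401, 401, 250, 250, 272),
                    dis2 = c(250, 272, 272, 414, 414))

# Appearances in column 1 only, which is what table(dis[,1]) counts
first_col <- table(pairs$dis1)

# Undirected degree: count each code in either column
degree_all <- table(c(pairs$dis1, pairs$dis2))

# Degree distribution: number of diseases with each degree k
deg_dist <- as.data.frame(table(as.vector(degree_all)))
names(deg_dist) <- c("k", "n_diseases")

print(first_col)
print(degree_all)
print(deg_dist)
```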

# disease co-occurrence network
# http://barabasilab.neu.edu/projects/hudine/resource/data/data.html
# chengjun @ common room 2011/9/4
 
#~~~~~~~~~~~load data~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
dis<-read.table("D:/NKS & SFI/AllNet3.txt", header = FALSE, sep = "", dec = ".")
 
#~~~~~~~~~~~Data Description and rename~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
# 1  ICD-9 code disease 1
# 2  ICD-9 code disease 2
# 3  Prevalence disease 1
# 4  Prevalence disease 2
# 5  Co-occurrence between diseases 1 and 2
# 6  Relative Risk
# 7  Relative Risk 99% Conf. Interval (left)
# 8  Relative Risk 99% Conf. Interval (right)
# 9  Phi-correlation
# 10 t-test value
 
names(dis)<-c("dis1","dis2","prevalence_dis1","prevalence_dis2","co_ocurrence","risk",
     "riskleft","riskright","phi","t" ) 
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~distribution~~~~~~~~~~~~~~~~~~~~~~~~#
dis1<-as.data.frame(table(dis[,1]))
names(dis1)<-c("disease","Number of co-occurring diseases")
plot(dis1) # number of diseases each disease co-occurs with (counted from column 1)
plot(as.data.frame(table(dis1[,2])))  # distribution of those counts
 
# degree distribution: number of diseases (Frequency) having each count (In-degree)
popp<-as.data.frame(table(dis1[,2]))
deg<-as.numeric(levels(popp[,1]))[popp[,1]]   # factor levels -> numeric degree values
 
plot(deg,popp[,2],xlab="In-degree",ylab="Frequency",type = "p", col = "black", lwd=2,main = "")
 
powerfit<-lm(log(popp[,2])~log(deg))   # the slope of the log-log line estimates -exponent
summary(powerfit)
 
plot(log(deg),log(popp[,2]),
   xlab="log(In-degree)",ylab="log(Frequency)",type = "p", col = "black", lwd=2,main = "")
 
abline(powerfit, col = "grey",lwd=3)
 
#~~~~~~~compute the degree distribution using igraph~~~~~~~~~~~~~~~~~~~~#
library(igraph)# install.packages("igraph")
jj<-graph.data.frame(dis[,1:2], directed=FALSE, vertices=NULL)
class(jj)
dd <- degree(jj, mode="in")
ddd <- degree.distribution(jj, mode="in", cumulative=TRUE)
alpha <- power.law.fit(dd, xmin=20)
alpha
plot(ddd, log="xy", xlab="degree", ylab="cumulative frequency",
     col=1, main="Nonlinear preferential attachment")
lines(10:500, 10*(10:500)^(-coef(alpha)+1))
 
# save the plot
setwd("D:/NKS & SFI/")
savePlot(filename = "Nonlinear preferential attachment",
         type = "png",
         device = dev.cur(),
         restoreConsole = TRUE)
#####################################################
#
#                               plot the disease graph
#
######################################################
#~~~~~~~~~~~~~~~~subset data~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
backbone<-subset(dis, dis[,5]>=100000)   # keep pairs with a co-occurrence of at least 100000
dim(backbone)
 
library(igraph)# install.packages("igraph")
g<-graph.data.frame(backbone[,1:2], directed=FALSE, vertices=NULL)
summary(g)
#------the size of nodes denotes the prevalence of diseases~~~~~~~~~~~~~~~~~~~~~~~~~~~#
 
prevalence<-data.frame(rbind(cbind(backbone[,1], backbone[,3]), cbind(backbone[,2], backbone[,4])) )
prevalencen<-unique(prevalence, fromLast = F)  ## extract unique elements
 
V(g)$size <- log(prevalencen[,2])-5
 
#------the color of nodes denotes the prevalence of diseases~~~~~~~~~~~~~~~~~~~~------#
V(g)$color <- rainbow(20)[log(prevalencen[,2]+5)]
# V(g)$color <-heat.colors(20, alpha = 1)[log(prevalencen[,2])]
#-------the width of links denotes the co-occurrence of the two diseases------#
E(g)$weight <- log(backbone[,5])/5
 
plot(g, vertex.label= NA,edge.arrow.size=0.2,layout=layout.fruchterman.reingold,edge.width=E(g)$weight+1 )
 
setwd("D:/NKS & SFI/")
savePlot(filename = "Disease Coocurrence plot 100000 prevalence_97nodes_674",
         type = "png",
         device = dev.cur(),
         restoreConsole = TRUE)
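The two exponent estimates above (the `powerfit` line on log-log axes and `power.law.fit()`) can be compared on synthetic data. This base-R sketch is not the disease data: it assumes an exponent of 2.5, fits a least-squares line to noisy log-log counts, and separately applies the standard continuous maximum-likelihood estimator to raw samples; igraph's `power.law.fit()` may differ in detail. For a continuous power law p(k) ∝ k^(-γ), the cumulative P(K ≥ k) ∝ k^(-(γ-1)), which is why the reference line for the cumulative plot uses the exponent `-coef(alpha)+1`.

```r
set.seed(1)

## 1) Least squares on log-log axes (the approach of the powerfit model)
# Synthetic counts n(k) = 1000 * k^(-2.5) with small multiplicative noise
k   <- 1:50
n_k <- 1000 * k^(-2.5) * exp(rnorm(length(k), sd = 0.05))
fit <- lm(log(n_k) ~ log(k))
gamma_ls <- -unname(coef(fit)[2])   # the slope of the log-log line is -gamma

## 2) Continuous maximum-likelihood estimate from raw samples
# Inverse-transform sampling from p(x) ~ x^(-2.5) for x >= xmin
xmin <- 1
u <- runif(10000)
x <- xmin * (1 - u)^(-1 / (2.5 - 1))
gamma_ml <- 1 + length(x) / sum(log(x / xmin))

cat("least squares:", round(gamma_ls, 3), " maximum likelihood:", round(gamma_ml, 3), "\n")
```

On real, noisy degree data, least squares on a log-log plot is known to be biased, so a maximum-likelihood fit is usually the safer choice.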