Posts Tagged ‘Network’

Understanding dyad in ERGM

Published by chengjun on October 20th, 2012

It’s a bit difficult to understand the terms used by ERGM.

I am working on predicting how friendships influence the information diffusion using weibo landscape data with ERGM.

According to the statnet library:

With the parameter of “dyadcov” term, we add three statistics to the model, each equal to the sum of the covariate values for all dyads occupying one of the three possible non-empty dyad states (mutual, upper-triangular asymmetric, and lower-triangular asymmetric dyads, respectively).

Obviously, there are three kinds of dyads.

Check Wasserman and Faust’ book of Social Network Analysis, and I find the figure above.

 However, most of us don’t know the difference between upper-triangular asymmetric and lower-triangular asymmetric dyads.

Based on my understanding, it’s related to the direction of ties, see the figure below (am i right?):


Human Disease Network

Published by chengjun on September 5th, 2011

This is the result, the codes is attached below.

# disease co-occurrence network
# http://barabasilab.neu.edu/projects/hudine/resource/data/data.html
# chengjun @ common room 2011/9/4
 
#~~~~~~~~~~~load data~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
dis<-read.table("D:/NKS & SFI/AllNet3.txt", header = FALSE, sep = "",  dec = ".")
 
#~~~~~~~~~~~Data Description and rename~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
# 1  ICD-9 code disease 1
# 2  ICD-9 code disease 2
# 3  Prevalence disease 1
# 4  Prevalence disease 2
# 5  Co-ocurrence between diseases 1 and 2
# 6  Relative Risk
# 7  Relative Risk 99% Conf. Interval (left)
# 8  Relative Risk 99% Conf. Interval (right)
# 9  Phi-correlation
# 10 t-test value
 
names(dis)<-c("dis1","dis2","prevalence_dis1","prevalence_dis2","co_ocurrence","risk",
     "riskleft","riskright","phi","t" ) 
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~distribution~~~~~~~~~~~~~~~~~~~~~~~~#
dis1<-as.data.frame(table(dis[,1]))
names(dis1)<-c("disease","Numbers of combinations of coocurrence with other diseases")
plot(dis1) # this is the number of combinations of coocurrences with other diseases
plot(as.data.frame(table(dis1[,2])))  # distribution of combinations
 
popp<-as.data.frame(dis1)
 
plot(popp[,1],popp[,2],xlab="In-degree",ylab="Frequency",type = "p", col = "black", lwd=2,main = "")
 
powerfit<-lm(log(popp[,2])~log(as.numeric(levels(popp[,1])[popp[,1])))
summary(powerfit)
 
plot(log(as.numeric(levels(popp[,1])[popp[,1]]),log(popp[,2]),
   xlab="In-degree",ylab="Frequency",type = "p", col = "black", lwd=2,main = "")
 
abline(powerfit, col = "grey",lwd=3)
 
#~~~~~~~compute the degree distribution using igraph~~~~~~~~~~~~~~~~~~~~#
library(igraph)# install.packages("igraph")
jj<-graph.data.frame(dis[,1:2], directed=FALSE, vertices=NULL)
class(jj)
dd <- degree(jj, mode="in")
ddd <- degree.distribution(jj, mode="in", cumulative=TRUE)
alpha <- power.law.fit(dd, xmin=20)
alpha
plot(ddd, log="xy", xlab="degree", ylab="cumulative frequency",
     col=1, main="Nonlinear preferential attachment")
lines(10:500, 10*(10:500)^(-coef(alpha)+1))
 
# save data
setwd("D:/NKS & SFI/")
savePlot(filename = "Nonlinear preferential attachment",
         type = c( "png"),
         device = dev.cur(),
         restoreConsole = TRUE)
#####################################################
#
#                               plot the disease graph
#
######################################################
#~~~~~~~~~~~~~~~~subset data~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
backbone<-subset(dis, dis[5]>=100000)
dim(backbone)
 
library(igraph)# install.packages("igraph")
g<-graph.data.frame(backbone[,1:2], directed=FALSE, vertices=NULL)
summary(g)
#------the size of nodes denotes the prevalence of events~~~~~~~~~~~~~~~~~~~~~~~~~~~#
 
prevalence<-data.frame(rbind(cbind(backbone[,1], backbone[,3]), cbind(backbone[,2], backbone[,4])) )
prevalencen<-unique(prevalence, fromLast = F)  ## extract unique elements
 
V(g)$size <- log(prevalencen[,2])-5
 
#------the color of nodes denotes the popularity of events~~~~~~~~~~~~~~~~~~~~------#
V(g)$color <- rainbow(20)[log(prevalencen[,2]+5)]
# V(g)$color <-heat.colors(20, alpha = 1)[log(prevalencen[,2])]
#-------the width of links denots the volume of user traffic on the link------#
E(g)$weight <- log(backbone[,5])/5
 
plot(g, vertex.label= NA,edge.arrow.size=0.2,layout=layout.fruchterman.reingold,edge.width=E(g)$weight+1 )
 
setwd("D:/NKS & SFI/")
savePlot(filename = "Disease Coocurrence plot 100000 prevalence_97nodes_674",
         type = c( "png"),
         device = dev.cur(),
         restoreConsole = TRUE)

the hidden pattern behind online posting behavior: take 3377 as an example

Published by chengjun on October 3rd, 2010

This is a study for fun. There was a case of flesh search called 3377. I download the whole thread and analyze the hidden pattern behind the reply behavior.

From the figures above, we can find that the number of reply is distributed as power law.

Then I use Rost software to analyze its semantic network structure.