Archive for the ‘Reflection’ Category

How to Get Standardized Regression Coefficients Using R

Published by admin on March 18th, 2012

Last week, Lexing showed us a picture of what different programming languages look like. Very interesting. Now let's dive into the question raised in the title of this post: how do you get standardized regression coefficients using R? (I also want to practice using SyntaxHighlighter here.)

# e.g., set up a regression
reg <- lm(I(view / top$DAYS) ~ feo + fev + ffvv + frf + frfsm + frfgs +
            frfgvs + frflv + frfy + frfys + fvfa + fvfmd + fvocp + ov)
summary(reg)
# model diagnostics: the four standard lm plots in a 2 x 2 grid
layout(matrix(1:4, 2, 2))
plot(reg)

# formula for the standardized regression coefficient:
#   beta.x = b.x * sd.x / sd.y
# calculate the standardized coefficients one by one,
# e.g. the beta of feo (assumes lm dropped no rows for missing values)
b.x <- coef(reg)[2]
sd.y <- sd(view / top$DAYS)
sd.x <- sd(feo)
beta.x <- b.x * sd.x / sd.y
# it's tedious to repeat this for every predictor!

# output all standardized coefficients with the QuantPsyc package
library(QuantPsyc)  # install.packages("QuantPsyc")
as.data.frame(lm.beta(reg))

R is not well designed in this respect: we can get what we want, but not as efficiently or conveniently as with some commercial statistics software. I hope this will be improved in the future.
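To avoid repeating the one-by-one calculation, a small helper can apply beta = b * sd(x) / sd(y) to every predictor at once. The sketch below is a hypothetical function (`std_coef` is my own name, not part of QuantPsyc or base R), shown on R's built-in `mtcars` data rather than the data above. It assumes the model has an intercept, and it cross-checks the result against a regression fitted on z-scored variables, whose slopes are standardized by construction.

```r
# Hypothetical helper: standardized slopes beta = b * sd(x) / sd(y)
# for every predictor of a fitted lm object (assumes an intercept).
std_coef <- function(fit) {
  X <- model.matrix(fit)[, -1, drop = FALSE]  # design matrix without intercept
  y <- model.response(model.frame(fit))       # response rows actually used in the fit
  b <- coef(fit)[-1]                           # unstandardized slopes
  b * apply(X, 2, sd) / sd(y)
}

# Illustration on the built-in mtcars data (not the post's data)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
std_coef(fit)

# Cross-check: refit on z-scored columns; its slopes are the standardized ones
mtz <- as.data.frame(scale(mtcars[, c("mpg", "wt", "hp", "qsec")]))
fit_z <- lm(mpg ~ wt + hp + qsec, data = mtz)
all.equal(std_coef(fit), coef(fit_z)[-1])
```

Because the helper works from `model.matrix()` and the model frame, rows that `lm()` dropped for missing values are also excluded from the standard deviations, and factor predictors are treated as their dummy columns.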


Communication Journals

Published by chengjun on November 3rd, 2011

Visualization of Dynamic Changes Using googleVis, R, and Dropbox

Published by admin on September 12th, 2011

googleVis can be used to visualize dynamic changes in social patterns. Here I test a few examples.

library(googleVis)

data(package="googleVis")
# Data sets in package ‘googleVis’:
# Andrew            Hurricane Andrew: googleVis example data set
# CityPopularity    CityPopularity: googleVis example  data set
# Exports           Exports: googleVis example data set
# Fruits            Fruits: googleVis example data set
# OpenClose         OpenClose: googleVis example data set
# Population        Population: googleVis example data  set
# Regions           Regions: googleVis example data set
# Stock             Stock: googleVis example data set

# Visualizing Hurricane Andrew

data(Andrew)
plot(Andrew)

AndrewGeoMap <- gvisGeoMap(Andrew, locationvar = 'LatLong', numvar = 'Speed_kt',
                           hovervar = 'Category',
                           options = list(width = 600, height = 300,
                                          region = 'US', dataMode = 'Markers'))
plot(AndrewGeoMap)
AndrewGeoMap$html$chart
setwd("d:/r")
cat(AndrewGeoMap$html$chart, file="AndrewGeoMap.html")
# The HTML file is now in the working directory.
# Since WordPress doesn't allow embedded JavaScript, I put the file
# in my Dropbox Public folder.

The link is given here.

Next, I visualize how the ratio of cell phone users changed across the world.

# fixed telephone use

# keep only the first 7263 rows of the user-count file
ftel <- read.csv("d:/r/Fixed Telephone.csv", header = TRUE, quote = "\"'", dec = ".")[1:7263, ]
ftelr <- read.csv("d:/r/Fixed Telephone ratio.csv", header = TRUE, quote = "\"'", dec = ".")

# join the ratio table and the user-count table on country and year
ftelm <- merge(ftelr, ftel, by = c("Country.or.Area", "Year"))

ftelm <- cbind(ftelm[, 1:3], ftelm[, 5])
names(ftelm) <- c("Country or Area", "Year", "Percentage", "Users")
library("googleVis")
fixed_telephone_motion <- gvisMotionChart(ftelm, idvar = "Country or Area", timevar = "Year",
                                          options = list(width = 1024, height = 768))
plot(fixed_telephone_motion)
setwd("d:/r")
cat(fixed_telephone_motion$html$chart, file = "fixed_telephone_motion.html")
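The `merge()` step joins the ratio table and the user-count table on two keys, so the motion chart gets one row per country and year. A toy version with made-up numbers (the data frames `ratio`, `users` and `long` are hypothetical, not the author's CSV files) shows what the join keeps and drops.

```r
# Toy data (made up): a ratio table and a user-count table that share
# the keys Country.or.Area and Year, as read.csv() names them.
ratio <- data.frame(Country.or.Area = c("A", "A", "B"),
                    Year  = c(2000, 2001, 2000),
                    Value = c(10.5, 11.2, 3.4))
users <- data.frame(Country.or.Area = c("A", "B", "A", "C"),
                    Year  = c(2000, 2000, 2001, 2000),
                    Value = c(1200, 300, 1300, 50))

# Inner join on both keys: country C has no ratio row, so it is dropped.
# The clashing non-key column Value gets the default suffixes .x and .y.
m <- merge(ratio, users, by = c("Country.or.Area", "Year"))

# One row per (country, year): an id variable plus a time variable
long <- data.frame(Country = m$Country.or.Area, Year = m$Year,
                   Percentage = m$Value.x, Users = m$Value.y)
long
```

Because `merge()` is an inner join by default, any country or year present in only one file silently disappears; `all = TRUE` would keep those rows with `NA` instead.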

The geographical distribution of cell phone usage (per 100 people)

The motion chart of cell phone usage (per 100 people)

# International news coverage visualization

Finally, I visualize the international news coverage of different countries during August 2011.

In case the links above don't work, I made a web page showing the results:

http://weblab.com.cityu.edu.hk/chengjun/


Visualizing the News Diffusion Process Using a Heat Graph

Published by chengjun on September 8th, 2011

# visualize diffusion according to diffusion time
# chengjunwang @ office, CMC, 2011/09/06

This is the diffusion network: yellow nodes are users who did not diffuse the news, and red nodes are diffusers. Most people do not diffuse the news. To keep these non-diffusers from dominating the picture and misleading the reader, I merge all nodes that connect to diffusers but do not diffuse the news themselves into a single node. This produces the giant white node in the figure below.
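The merging idea can be tried on a toy edge list before running it on the Digg data. In this sketch the data frame `edges` and the vector `voters` are hypothetical; as in the script below, every non-voting friend is recoded to the single id 0.

```r
# Hypothetical toy edge list: each row is user -> friend, and every
# "user" is a voter (diffuser), as in the subsetted Digg network.
edges <- data.frame(user   = c(1, 1, 2, 3, 3),
                    friend = c(2, 7, 8, 7, 9))
voters <- c(1, 2, 3)

# Recode every friend who never voted to id 0, so all non-diffusers
# collapse into one merged node.
edges$friend <- replace(edges$friend, !(edges$friend %in% voters), 0)
edges

# Number of ties each diffuser has to the merged node
table(edges$user[edges$friend == 0])
```

Repeated ties to node 0 (user 3 has two) remain as parallel edges once the data frame is turned into a graph, so the merged node's degree counts every such tie; that is why it becomes so large when node size is based on degree.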

#~~~~~~~~~~~~~~~~~~~~~~~~~read csv~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
digg <- read.csv(file = "d:/micro-blog/digg/digg_votes1.csv", header = FALSE, na.strings = 'NA')

net <- read.csv(file = "d:/micro-blog/digg/digg_net.csv", header = TRUE) # directed friendship network

# V1 is friend_date: Unix time stamp of when the friendship link was created
# user_id: anonymized unique id of a user
# friend_id: anonymized unique id of a user
#~~~~~~~~~~~~~~~~~get the name list~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
namelist <- subset(digg, digg[, 3] == 299) # votes on story 299, i.e. the diffusers
name <- data.frame(namelist[, 2])
dim(namelist)       # number of diffusers
subnet <- subset(net, net[, 3] %in% name[, 1])  # edges whose user_id is a diffuser
dim(subnet)         # number of edges
length(unique(c(subnet[, 3], subnet[, 4]))) # number of unique nodes
########################################################################
#      merging nodes for visualization                                 #
########################################################################
g1 <- subnet[, 3:4]
# recode every friend who is not a diffuser as node 0 (one merged node)
g1[, 2] <- replace(g1[, 2], !(g1[, 2] %in% namelist[, 2]), 0)

library(igraph)     # install.packages("igraph")
g1 <- graph.data.frame(g1, directed = TRUE)
summary(g1)

V(g1)$size <- log(degree(g1))

# vote time of each node; the merged non-voter node is put last
time <- namelist[, 1][match(V(g1)$name, namelist[, 2])]
time[is.na(time)] <- max(time, na.rm = TRUE) + 1
# subset(namelist, namelist[,2]==333807) ## use this to test the sequence
# replace raw times by their rank 1..k in sorted order (earliest = 1)
code <- sort(unique(time))
time <- match(time, code)
# color nodes by rank: early diffusers red, late ones pale yellow
# V(g1)$color <- rainbow(max(time))[time]   # alternative palette
V(g1)$color <- heat.colors(max(time), alpha = 1)[time]

par(mfrow = c(1, 2))
plot(g1, vertex.label = NA, edge.arrow.size = 0.2, layout = layout.kamada.kawai)
# color key: one point per rank, earliest on the left
plot(rep(2, max(time)), col = heat.colors(max(time)), axes = FALSE, ann = FALSE)

In this figure, the darker nodes diffused the news first, and the large white node is the merged cluster of non-diffusers. Many diffusers connect to this non-diffuser giant but not to each other; apparently, they were not influenced by the people they follow. So who influenced them? This is the starting point of my study of news diffusion.
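The "dark nodes diffused first" reading comes from ranking vote times and indexing into `heat.colors()`, which runs from red toward pale yellow. A minimal sketch on made-up times (the vectors `time`, `rank_id`, `pal` and `node_col` are hypothetical); note that `max()` needs `na.rm = TRUE` here, otherwise the `NA` of the never-voting merged node would make the maximum `NA` as well.

```r
# Hypothetical vote times (Unix seconds) for five nodes; NA marks the
# merged non-diffuser node, which never voted.
time <- c(1200, 1000, NA, 1000, 1500)

# Place non-voters after every real vote
time[is.na(time)] <- max(time, na.rm = TRUE) + 1

# Convert raw times to ranks 1..k (ties share a rank), earliest = 1
rank_id <- match(time, sort(unique(time)))

# heat.colors() starts at red and moves toward pale yellow, so early
# diffusers get the darkest colors and the merged node the lightest one
pal <- heat.colors(max(rank_id))
node_col <- pal[rank_id]
node_col
```

Using `max(rank_id)` as the palette size keeps one color per distinct time, so the index never runs past the end of the palette.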

# pie(rep(1,max(time)), col=rainbow(max(time)))
# plot(g1, vertex.label= NA,edge.arrow.size=0.2,layout=layout.fruchterman.reingold )
 
setwd("D:/NKS & SFI/")
# savePlot() with restoreConsole is for the Windows graphics device
savePlot(filename = "diffusion 5 news3 digg266 of 9496 users 58 isolates neglected",
         type = "png",
         device = dev.cur(),
         restoreConsole = TRUE)

