Three clickstream network data sets

Monday, May 14th, 2012


The data sets for three clickstream networks can be downloaded here. They were collected from Alexa as described below. Please cite the paper "The Decentralized Structure of Collective Attention on the Web" when using these data sets.

We first selected three lists of the top 1000 sites at different points in time. Two of the lists were taken from Google statistics and the third from Alexa reports. We then downloaded from Alexa the clickstreams between the sites on each list. From the downloaded data we constructed three clickstream networks, in which a directed, weighted edge from node i to node j indicates the daily percentage of global Web users who visited i and then j in succession. Note that because Alexa reports at most ten top inbound and outbound clickstreams for each site, our data sets do not necessarily include every clickstream between the studied sites. What we constructed and studied are therefore the "backbone networks" of clickstreams on the Web: the top clickstreams connecting the largest sites.
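
As a rough illustration of the construction described above, here is a minimal Python sketch. It is not the code used to build the published data sets. The record layout `(source, target, weight)`, the function name `build_backbone`, and the example sites are all hypothetical, and the `top_k` cutoff only imitates the way Alexa truncates its inbound and outbound reports.

```python
from collections import defaultdict


def build_backbone(records, sites, top_k=10):
    """Return {(i, j): weight} for the top clickstreams among `sites`.

    records: iterable of (source, target, weight) tuples, where weight is
             assumed to be the daily percentage of users going from source
             to target (hypothetical layout).
    sites:   the list of top sites; clickstreams touching any other site
             are dropped.
    top_k:   how many inbound and outbound clickstreams to keep per site.
    """
    site_set = set(sites)
    outbound = defaultdict(list)
    inbound = defaultdict(list)

    # Keep only clickstreams whose endpoints are both on the site list.
    for src, dst, weight in records:
        if src in site_set and dst in site_set:
            outbound[src].append((weight, dst))
            inbound[dst].append((weight, src))

    # An edge survives if it is among the top_k outbound clickstreams of
    # its source or the top_k inbound clickstreams of its target.
    edges = {}
    for src, flows in outbound.items():
        for weight, dst in sorted(flows, reverse=True)[:top_k]:
            edges[(src, dst)] = weight
    for dst, flows in inbound.items():
        for weight, src in sorted(flows, reverse=True)[:top_k]:
            edges[(src, dst)] = weight
    return edges


# Toy example with made-up sites and weights.
records = [
    ("a.com", "b.com", 0.5),
    ("a.com", "c.com", 0.2),
    ("b.com", "a.com", 0.1),
    ("a.com", "x.org", 0.9),  # x.org is not on the list, so this is dropped
]
edges = build_backbone(records, ["a.com", "b.com", "c.com"], top_k=1)
```

In this toy run the edge from a.com to c.com is kept even though it is not a.com's top outbound clickstream, because it is the top inbound clickstream of c.com. The same effect is why a backbone built from per-site top lists can contain more than `top_k` outgoing edges for a single node.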