Datasets

Last update: 2011-02-25

Dataset1: CiteULike posts and paper titles

Description: This data was crawled from CiteULike in June and July 2009. The crawl started from a set of selected users and then moved on to their neighbors, meaning users who tagged the same papers.
The table post_normalized_instances contains:
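The crawling strategy described above is essentially a snowball (breadth-first) crawl over the user-paper graph: visit the seed users, then any user who tagged one of the same papers, and so on. The following is a minimal sketch of that idea on a small in-memory dictionary, not the original crawler. The function name snowball_crawl, the user_papers mapping, and the max_users cap are hypothetical.

```python
from collections import deque


def snowball_crawl(seed_users, user_papers, max_users=None):
    """Breadth-first crawl from seed users to their neighbors.

    Two users are neighbors if they tagged at least one common paper.
    user_papers is a hypothetical stand-in for the website: a dict
    mapping user id -> set of paper ids that the user tagged.
    Returns users in the order they were visited.
    """
    # Invert the mapping once so we can look up who tagged each paper.
    paper_users = {}
    for user, papers in user_papers.items():
        for paper in papers:
            paper_users.setdefault(paper, set()).add(user)

    # dict.fromkeys removes duplicate seeds while keeping their order.
    queue = deque(dict.fromkeys(seed_users))
    seen = set(queue)
    visited = []
    while queue and (max_users is None or len(visited) < max_users):
        user = queue.popleft()
        visited.append(user)
        # Sorted iteration keeps the crawl order deterministic.
        for paper in sorted(user_papers.get(user, ())):
            for neighbor in sorted(paper_users.get(paper, ())):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
    return visited
```

For example, if alice and bob share a paper and bob and carol share another, crawling from ["alice"] visits alice, bob and carol. A user with no shared papers is never reached. A real crawler would fetch each user's library over the network instead of reading a dictionary.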

  • 1,894,165 instances of tagging
  • On 475,283 unique posts (roughly equivalent to papers)
  • Made by 5,468 unique users
  • Using 117,230 unique tags
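The figures above are simple distinct counts over the tagging table. The following sketch shows how they could be recomputed. It assumes that each row of post_normalized_instances holds one (user, post, tag) assignment. The column names user_id, post_id and tag, and the TagInstance and summarize names, are hypothetical, so check the actual schema before using it.

```python
from typing import Iterable, NamedTuple


class TagInstance(NamedTuple):
    # Hypothetical column names; the real table schema may differ.
    user_id: str
    post_id: str
    tag: str


def summarize(rows: Iterable[TagInstance]) -> dict:
    """Count tagging instances plus distinct posts, users and tags."""
    users, posts, tags = set(), set(), set()
    instances = 0
    for row in rows:
        instances += 1  # every row is one tagging instance
        users.add(row.user_id)
        posts.add(row.post_id)
        tags.add(row.tag)
    return {
        "instances": instances,
        "posts": len(posts),
        "users": len(users),
        "tags": len(tags),
    }
```

Because the function makes a single pass and only keeps the distinct identifiers, it can also consume rows streamed from a database cursor rather than a list loaded into memory.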

If you use this dataset, please cite this paper:
Denis Parra and Peter Brusilovsky. 2009. Collaborative filtering for social tagging systems: an experiment with CiteULike. In Proceedings of the third ACM conference on Recommender systems (RecSys '09). ACM, New York, NY, USA, 237-240. DOI=10.1145/1639714.1639757 http://doi.acm.org/10.1145/1639714.1639757

Dataset2: Twitter activity during four conferences in 2012

Description: We used this data for an extended abstract submitted to Web Science 2013, "Who contributes and Who is receiving the attention on Twitter during academic conferences?" by Denis Parra, Christoph Trattner and Xidao Wen. We collected the tweets of four conferences using their respective conference hashtags: Hypertext 2012 (#ht2012), UMAP 2012 (#umap2012), RECSYS 2012 (#recsys2012), and ECTEL 2012 (#ectel2012).
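Collecting tweets by conference hashtag amounts to matching each tweet against a fixed set of tags. The following sketch assigns tweet texts to conferences in that way, with case-insensitive matching. It is an illustration only and not the collection pipeline used for the dataset. The group_by_conference function and the CONFERENCE_HASHTAGS table are hypothetical, and #recsys2012 is assumed to be the RecSys 2012 hashtag.

```python
import re
from collections import defaultdict

# Hashtag (lowercase, without "#") -> conference name.
# "recsys2012" as the RecSys 2012 hashtag is an assumption.
CONFERENCE_HASHTAGS = {
    "ht2012": "Hypertext 2012",
    "umap2012": "UMAP 2012",
    "recsys2012": "RECSYS 2012",
    "ectel2012": "ECTEL 2012",
}

# A hashtag is "#" followed by word characters (letters, digits, underscore).
HASHTAG_RE = re.compile(r"#(\w+)")


def group_by_conference(tweets):
    """Assign each tweet text to every conference whose hashtag it contains.

    Matching is case-insensitive and exact, so "#ht2012s" does not count
    as "#ht2012". Tweets without a tracked hashtag are dropped.
    """
    groups = defaultdict(list)
    for text in tweets:
        tags = {t.lower() for t in HASHTAG_RE.findall(text)}
        for tag in sorted(tags):
            conference = CONFERENCE_HASHTAGS.get(tag)
            if conference is not None:
                groups[conference].append(text)
    return dict(groups)
```

A tweet that carries two conference hashtags is counted once for each conference. This is one possible design choice. Another would be to keep only the first matching hashtag.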

  • Link to download the data
  • Link to download the R code that generates the tables and plots