US government data http://www.data.gov/
Information Network
-
Proximity DBLP http://kdl.cs.umass.edu/data/dblp/dblp-info.html
-
DBLP-Citation-Network http://arnetminer.org/citation
-
CiteSeer (hardly) http://csxstatic.ist.psu.edu/about/data
-
CiteSeer dumped http://martinharrigan.blogspot.com/2008/07/citeseers-dataset.html
-
Cora (hardly) http://people.cs.umass.edu/~mccallum/data.html
Social Network
-
Stanford large network dataset collection (contains many network datasets): http://snap.stanford.edu/data/
-
Stanford class resources http://snap.stanford.edu/na09/resources.html
-
ICWSM Twitter dataset: http://twitter.mpi-sws.org/data-icwsm2010.html
-
EBSN - Event-based social network dataset: http://www.largenetwork.org/ebsn
-
Other social network datasets: Slashdot, Enron email, MIT mobile, Epinions reviews.
Sentiment and Opinion Mining
-
Bing Liu's homepage
-
Movie Review http://www.cs.cornell.edu/people/pabo/movie-review-data/
-
Lee's homepage
-
Twitter sentiment: http://www.sananalytics.com/lab/twitter-sentiment/
Recommendation
-
index1: https://gist.github.com/1653794
Machine Learning
-
UCI datasets http://archive.ics.uci.edu/ml/datasets.html
Audio Retrieval
-
CAL-500: http://twitterdata.org/
-
Million Song Dataset http://labrosa.ee.columbia.edu/millionsong/
Miscellaneous1
-
Many graph datasets, including several cup competitions, Twitter, etc. http://graphlab.org/downloads/datasets/
-
Several graph datasets http://law.di.unimi.it/datasets.php
-
Delicious/Flickr/Last.FM, etc. http://www.tagora-project.eu/data/
-
A small dataset about links http://www.cs.umd.edu/projects/linqs/projects/lbc/index.html
-
A small dataset including CiteSeerX/IMDb http://komarix.org/ac/ds/
Miscellaneous2
Only user-object
-
Amazon
Both user-user and user-object
Single-type user network
-
Flickr, YouTube, Twitter
Signed user network
-
Epinions, Slashdot, Ciao
Multi-type user network
-
Facebook, Google Plus
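
The grouping above can be sketched as a small data structure. This is only an illustration: the `UserEdge` type, its `sign` and `relation` fields, and the `network_kind` helper are hypothetical names invented here, not part of any of the listed datasets, and it assumes that a negative sign marks distrust or foe links and that more than one relation label means a multi-type network.

```python
from typing import Iterable, NamedTuple


class UserEdge(NamedTuple):
    """A hypothetical user-user link (field names are assumptions)."""
    source: str
    target: str
    sign: int = 1            # +1 for trust/friend, -1 for distrust/foe
    relation: str = "link"   # e.g. "friend", "follow", "circle"


def network_kind(user_edges: Iterable[UserEdge]) -> str:
    """Map a set of user-user edges onto the categories listed above."""
    edges = list(user_edges)
    if not edges:
        # No user-user links at all: only user-object data (e.g. ratings).
        return "only user-object"
    if any(e.sign < 0 for e in edges):
        return "signed user network"
    if len({e.relation for e in edges}) > 1:
        return "multi-type user network"
    return "single-type user network"
```

For instance, a dataset with only follower links would come out as a single-type user network, while one with both trust and distrust links would come out as a signed user network.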