Open Data Resources

SUBJECTS

PUBLIC DATASETS

datasetsearch.research.google.com

  • Google's search engine designed specifically for finding datasets.
  • Many of the government and association-provided datasets discussed in this guide can be found through this platform as well.

github.com/awesomedata/awesome-public-datasets

  • Curated list of open data resources on GitHub, maintained by the AwesomeData community.
  • Entries categorized neatly by subject area.

kaggle.com/datasets

  • Kaggle is a popular data science competition site, where community members compete in building predictive models from uploaded datasets.
  • The datasets are incredibly varied and interesting, but also highly specific. Examples: NYC Airbnb data, YouTube trending videos data, video game sales data, and Netflix movies & shows data.
  • A word of caution: Any user in the Kaggle community can submit datasets, so quality varies. Check the dataset documentation to see how the owner collected the data; some datasets are simply simulated for the sake of practice.

reddit.com/r/datasets

  • The r/datasets subreddit is an active community where data users share interesting datasets or request data they are seeking.
  • As with Kaggle, be mindful that content can be uploaded by anonymous members who have not been vetted.

 

RESEARCH DATASETS

re3data.org

  • Over 2,000 research data repositories indexed.
  • Multidisciplinary list covering Humanities & Social Sciences, Natural Sciences, Life Sciences, and Engineering Sciences.

[Breakdown of subject areas]


oad.simmons.edu/oadwiki/Data_repositories

  • Maintained by a community of researchers and scholars.
  • If you're interested in browsing general datasets, start from their multidisciplinary section.

dataverse.harvard.edu

  • Harvard's Dataverse contains more than 100,000 multidisciplinary datasets from published studies by researchers within and outside the Harvard community.

archive.ics.uci.edu/ml/datasets

  • Hosted by the University of California, Irvine, the UCI Machine Learning Repository contains many well-known datasets commonly used to practice machine learning techniques.
  • Examples: Challenger Space Shuttle dataset and Wine Quality dataset.
  • Datasets are typically already cleaned and preprocessed.
  • Categorized by the analysis methods they were designed for.
GOVERNMENT DATASETS

data.gov

  • The US government's open data portal contains datasets from city and state governments as well as from federal departments and agencies.
  • Topics include business, demographics, education, energy & environment, finance & economics, health, human services, public safety, recreation, and transportation.

SUBJECT AREAS DATA
Businesses
  • General data on US businesses -- including types, finances, investments, profits, expenditures, sales, and inventory [Source: US Census Bureau]
  • Annual number of firms, establishments, employment, and payroll by US geographic location, industry, and company size [Source: Statistics of US Businesses (SUSB)]
  • Production & business activity across industries such as retail, manufacturing, services, and tech. [Source: Federal Reserve]
Consumer
  • Consumer Expenditure Surveys (CE) -- data on expenditures, income, and characteristics of US consumers [Source: US Bureau of Labor Statistics]
  • Consumer Price Index (CPI) -- Average change over time in prices paid by consumers for goods and services [Source: US Bureau of Labor Statistics]
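
Since the CPI is described above as a change in prices over time, a small worked example may help show how an index is typically read. The sketch below is only an illustration: the function name `percent_change` and both index values are hypothetical and are not real CPI figures from the Bureau of Labor Statistics.

```python
def percent_change(old_index: float, new_index: float) -> float:
    """Return the percent change from old_index to new_index."""
    if old_index == 0:
        raise ValueError("old_index must be nonzero")
    # Multiply before dividing to keep the arithmetic simple to follow.
    return (new_index - old_index) * 100.0 / old_index


# Hypothetical index values for two periods (assumed, not real CPI data).
earlier_index = 200.0
later_index = 210.0

change = percent_change(earlier_index, later_index)
print(f"Prices rose by {change:.1f}% between the two periods")
```

With these made-up values, an index moving from 200 to 210 corresponds to a 5% rise in the measured prices. The same calculation applies to any pair of index values taken from a published index series.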
Demographics of labor force
Economic indicators
Employment
International economic accounts
  • International trade (exports and imports) in goods and services, investments, transactions (balance of payments), and activities of multinational enterprises (employment, sales, expenditures, R&D, etc.) [Source: US Bureau of Economic Analysis]
Money statistics
  • National finances, interest rates, exchange rates, consumer credit, and loans [Source: US Federal Reserve]
Prices
  • Producer Price Index (PPI) data by industry and commodity [Source: US Bureau of Labor Statistics]
  • Commodity prices, house price indexes, health care indexes, PPI & CPI, etc. [Source: Federal Reserve]
  • Historical pricing data (up to 2012): Indexes of producer and consumer prices, actual prices for selected commodities, food costs, and energy/fuel prices [Source: see the "Prices" section of the Statistical Abstract of the United States]
Tax
  • Statistical tables for individual and corporate taxes. Tax data at the organization-level for charities and private foundations. [Source: Internal Revenue Service]

SUBJECT AREAS DATA
Arts & Culture
Crime & Justice
Minority
Politics & Voting
  • Public opinion survey data on political issues, presidential performance, and voting trends [Source: ICPSR]
  • Political opinion survey data [Source: Pew Research Center]
  • Polling data and presidential performance [Source: FiveThirtyEight]
Sports
  • Sports data on gambling odds, player performance metrics, and rankings [Source: FiveThirtyEight]
Social media / internet
  • Raw content data from social media networks (Facebook, Twitter, etc.), online communities (Reddit), media (YouTube), and e-commerce (Amazon) [Source: Stanford Network Analysis Project]
Social trends
  • Public opinion survey data on American cultural and social trends, media, political climate, technology, and current issues [Source: Pew Research Center]
Women in the labor force

HIGHLIGHTED RESOURCES

icpsr.umich.edu

  • ICPSR is a consortium of over 750 academic institutions and research organizations.
  • Contains more than 250,000 datasets from research in the social, political, and behavioral sciences.
  • Includes topics on education, voting, criminal justice, substance abuse, and terrorism.

[list of major topic areas]


pewresearch.org

  • The Pew Research Center provides data from its polling and survey studies on social issues, attitudes, and trends.
  • Covers topics on politics, media, culture, religion, and internet/tech.

[list of topics] [list of dataset categories]


Suggest a dataset

Got an interesting dataset or data source to recommend? Let me know about it!

E-mail: charles.terng@baruch.cuny.edu