Public Datasets: Data Science, Machine Learning, AI, and Data Analytics

Data Repositories

Check out the world’s largest externally curated data platform, which integrates data from leading sources worldwide.


Anacode Chinese Web Datastore: A collection of crawled Chinese news and blogs in JSON format

Appen Open Source Datasets: Over 270 audio, image, video and text datasets in over 80 languages

AssetMacro: Historical data of macroeconomic indicators and market data

Awesome Public Datasets: A topic-centric list of high-quality open datasets

AWS Public Data Sets: A centralized repository of public data sets

BigML Public Data Sources: A long list of sources of data that anyone can use

USA.gov: APIs and data feeds to help people find useful government information

DataPortals.org: A Comprehensive List of Open Data Portals from Around the World

Data.gov.uk: Find data published by central government, local authorities and public bodies to help you build products and services

Data Planet: The largest repository of standardized and structured statistical data

DataSF.org: Search hundreds of datasets from the City and County of San Francisco

Data.world: Discover and share data, connect with interesting people, and work together to solve problems faster

Europeana Data: Open metadata on 20 million texts, images, videos and sounds gathered by Europeana

GEO Gene Expression Omnibus: A curated, online resource for gene expression data browsing, query and retrieval

HitCompanies Datasets: Comprehensive data on 10,000 randomly sampled UK companies from HitCompanies, updated automatically using AI/machine learning

ICWSM 2009 Data Challenge: 44 million blog posts made between August 1st and October 1st, 2008

JMP Public Featured Datasets: Assorted public datasets from JMP

Kaggle Datasets: Explore, analyze, and share quality data

Linking Open Data: Making data freely available to everyone

LoveTheSales: The world’s biggest online sales marketplace

Lyst Fashion Data Trends: The industry’s trusted source for tracking fashion data trends

Million Song Dataset: A freely available collection of audio features and metadata for a million contemporary popular music tracks

NASDAQ Data Link: A premier source for financial, economic and alternative datasets

NASA Space Science Data Coordinated Archive: NASA’s archive for space science mission data

Qlik Sense Data Sources: Connect and combine data from hundreds of data sources

Robert Shiller Data: Housing data, financial market data, and more from his book Irrational Exuberance

Sports Statistics: Data for soccer, NBA, NFL, NHL, and more

StatLib Datasets Archive: Datasets from Carnegie Mellon University

UCI Machine Learning Repository: A collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms (new beta version)

UCR Time Series Classification Archive: Datasets, papers, links, and code

UK Open Postcode Geo: We organise UK open data by location and signpost the source

United States Census Bureau: An assortment of US Census data

Virtual Screening of Bioassay Data: Bioassay datasets available for download, by Amanda Schierz, J.

Web Data Commons: Structured data from the Common Crawl, the largest web corpus available to the public

WorldData.AI: Connect your data to many of the 3.5 billion WorldData datasets to improve your data science and machine learning models

Yahoo Webscope Program: Reference library of interesting and scientifically useful datasets for non-commercial use by academics and other scientists

Yelp Open Dataset: An all-purpose dataset for learning; a subset of Yelp businesses, reviews, and user data for personal, educational, and academic use