A Data Science and Machine Learning Knowledge Map and Resource Collection for Programmers

Posted by GretaColeba 9 years ago | 9K reads | Tags: data mining, machine learning

Table of Contents generated with DocToc

DataScience & Machine Learning Reference
Introduction & Overview
- Collections: Resource Roundups
- Video Courses
- Blogs & Forum
- Data Process: Data Processing
- Machine Learning
- Natural Language Processing
- Deep Learning
- Recommender Systems
CrawlerSE: Crawlers and Search Engines
- Search Engine
Data Visual: Data Visualization
- Collections: Resource Roundups
  - Cross-Disciplinary Databases and Search Engines
- Social Network
- Driving Data
- Competition: Machine Learning Competitions

DataScience & Machine Learning Reference

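This article collects the resources I gathered while learning data science. It focuses on introductory material and survey-style resources for each area and does not dig deep into the research frontier; readers who want more can follow my Data Science and Machine Learning Handbook for Coders. The article starts with an introductory overview of data science and machine learning, then lists resources, books, and tutorials, then gives reference articles for each specific area, and finally introduces a set of practical tools. My map of the data science and machine learning landscape, which belongs to my series on programming worldview and methodology, is shown in the following diagram:

This article will keep improving as my own perspective and skills grow through study and practice. I am not a pure machine learning or data mining researcher; I mostly look, from an engineering standpoint, for the parts that can be applied alongside engineering work.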

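Introduction & Overview

Introduction

Application: Real-World Use Cases of Data Mining, Machine Learning, and Deep Learning

Resources

Collections: Resource Roundups

An Incomplete Roundup of Introductory Machine Learning Resources (機器學習入門資源不完全匯總): a themed collection from Machine Learning Daily (機器學習日報).
Top-down learning path: Machine Learning for Software Engineers: a path for software engineers to advance in machine learning

Books

Video Courses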

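University of Illinois at Urbana-Champaign: Text Mining and Analytics
NTU Machine Learning Techniques (臺大機器學習技法)
Stanford Machine Learning course
CS224d: Deep Learning for Natural Language Processing
Unsupervised Feature Learning and Deep Learning: a Stanford tutorial series on unsupervised feature learning and deep learning
Xiaoxiang (小象) machine learning video course
Xiaoxiang (小象) deep learning video course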

Blogs & Forum

Methodology

Data Process: Data Processing

Machine Learning

Natural Language Processing

Deep Learning

Major paper: 14 Design Patterns for Deep Convolutional Neural Networks, Explained (重磅論文：解析深度卷積神經網絡的14種設計模式)

Application

Recommender Systems

CrawlerSE: Crawlers and Search Engines

Crawler

Search Engine

Toolkits

Language

Python

Jupyter: interactive programming and data presentation
data-science-ipython-notebooks: a collection of IPython-based data science code examples
The Open Source Data Science Masters

Java

Matlab

R

ClusterComputing

Mahout
MLlib

DeepLearning: Deep Learning Toolkits

Evaluation of Deep Learning Toolkits
A Code-Level Look at Deep Learning System Programming Models: TensorFlow vs. CNTK (代碼解析深度學習系統編程模型：TensorFlow vs. CNTK)
tensorflow-playground: Play with neural networks!
dl-docker: packages commonly used deep learning tools into a single Docker image
deep-learning-models: Keras code and weights files for popular deep learning models.
Top Deep Learning Projects

Data Visual: Data Visualization

Books

Video Courses

John C. Hart Coursera

Toolkits

Data Sets

Collections: Resource Roundups

awesome-public-datasets: An awesome list of high-quality open datasets in public domains (on-going).
Wikimedia Dumps: packaged downloads of wiki data
Reddit Datasets: the Reddit discussion board about datasets
Militarized Interstate Disputes: Nearly 200 years of international threats, conflicts, etc. for modelling or prediction. Includes action taken, level of hostility, fatalities, and outcomes. Multiple datasets, e.g., 962KB, 179KB. http://www.correlatesofwar.org/data-sets/MIDs

Single Databases

Cross-Disciplinary Databases and Search Engines

Text

20 Newsgroups: The text from 20000 messages taken from 20 Usenet newsgroups for text analysis, classification, etc. 61.6MB
Amazon Reviews: Over 142 million product reviews for sentiment analysis, recommender systems, and more. 20GB
SMS Spam Collection: A collection of 5,574 SMS (text) messages, some spam, some normal, for spam filtering. 204KB. http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/

Social Network

http://enigma.io
http://www.ufindthem.com/
http://NetworkRepository.com (a machine learning data repository with interactive visual analytics)
http://MLvis.com
Yahoo Instant Messenger Friends Connectivity Graph: Connections between Yahoo users who communicate with each other using Yahoo messenger, can be used to identify key social contacts/influencers. Add dataset to cart to access. 28MB in total.

Media: Audio, Video, and Images

Labeled Faces in the Wild: 13,000 named faces for facial recognition. Multiple training and test sets. 173MB in total
Mushroom Identification: For hypothetically classifying mushrooms as edible or poisonous based on their characteristics. 3 files, 480KB
NORB 3D Object Recognition: Binocular images of 50 toy figurines for 3D object recognition from image. Multiple files, over 5GB total
One Million Songs: Audio features and metadata for a subset (10,000) of the one million popular songs dataset for recognition/classification. 1.8GB
Hate Speech Identification: A sampling of Twitter posts that have been judged based on whether they are offensive or contain hate speech, as a training set for text analysis. 2.66MB
Hidden Beauty of Flickr Pictures: 15,000 Flickr photo IDs that have received ratings based on aesthetics, for image analysis. 138KB, use Flickr API to get images

Recognition

Human Activity Recognition with Smartphones: Sensor data for recognizing the human activity - walking, sitting, etc. 25MB. https://www.kaggle.com/uciml/human-activity-recognition-with-smartphones

Driving Data

Udacity: 223GB of open-sourced historical data on autonomous driving

Domain: Domain-Specific Data

Sports

Football Strategy: Thousands of scenarios to make the best coaching decisions. 876KB in total
Horses for Courses: Horse-racing data for predicting race results. 19MB in total
NBA & MLB Stats: Current and past season stats for teams and players for fantasy sports predictions.

Medicine

National Survey on Drug Use and Health: Predict drug use based on health survey questions. 2GB in total
Prostate Cancer: Tumor and nontumor samples, used to recognize prostate cancer. 4.8MB in total
Record of Heart Sound: Recordings of normal and abnormal heartbeats, used to recognize heart murmur, etc. 47.7MB in total

Alien

UFO Reports: 80,000 historic reports for classification or regression. This dataset has been standardized from the source data at nuforc.org. 14.6MB in total.

Foods

Wine Quality: Chemical properties of red and white wines (separately) and quality, for classification. 3 files, 343KB in total.

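Finance

Others

Competition: Machine Learning Competitions

Alibaba Tianchi newcomer practice competitions (阿里天池新人實戰賽)
Kaggle: the official getting-started competitions, a good way to learn the basics
Kaggle Tutorial: a complete tutorial based on the hotel recommendation competition
Driven Data
Innocentive
Crowdanalytix
Tunedit
DataFountain: DF, a professional Chinese data competition platform designated by CCF

Career

Quora's job postings for machine learning
Google's job postings for machine learning and artificial intelligence positions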

Source: https://github.com/wxyyxc1992/DataScience-And-MachineLearning-Handbook-For-Coders/blob/master/DataScience-Reference.md

