280,000 Open Source Projects: Bonus Chapter

Posted by jopen 9 years ago | 54K views | Open source projects

I. Tools

1. Data collection

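At first I planned to use the Open API of openhub.net. They have a decent API and have even published an open source project on GitHub. Unfortunately, you can apply for at most 5 API keys, and each key is limited to 1,000 requests per day. At the time I didn't know that OpenHub only holds a little over 280,000 projects; I figured that even running flat out it would take at least 60-odd days to fetch everything, and my heart sank.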
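A quick calculation shows why the quota was a problem. This Python sketch assumes one API request per project and that all five keys are used at their full 1,000-request daily cap; both are simplifying assumptions, not documented OpenHub behavior.

```python
import math


def days_to_fetch(n_projects: int, n_keys: int = 5, requests_per_key_per_day: int = 1000) -> int:
    """Days needed to fetch n_projects records, assuming one request per record."""
    daily_budget = n_keys * requests_per_key_per_day  # 5 * 1000 = 5,000 requests per day
    return math.ceil(n_projects / daily_budget)


print(days_to_fetch(280_000))  # 56
print(days_to_fetch(300_000))  # 60
```

Under these assumptions even the real total of roughly 280,000 projects needs close to two months of requests, which is why scraping the HTML listing pages was the more attractive route.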

Later a friend recommended a great tool called noodle, which scrapes HTML pages directly and then analyzes their DOM.

From there, all I had to scrape was: https://www.openhub.net/p?ref=homepage&q=&page={num}
and that was enough to get summary data for every project.
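The author did this step with noodle. As a stand-in, here is a minimal Python sketch using only the standard library: it expands the listing-page URL template above and pulls project links out of a page's HTML. The assumption that project pages are linked as relative `/p/<name>` paths is hypothetical, as is the sample markup in the usage note; check the real page structure first. Fetching the pages over the network is deliberately left out.

```python
from html.parser import HTMLParser

LIST_URL = "https://www.openhub.net/p?ref=homepage&q=&page={num}"


def listing_urls(first_page: int, last_page: int) -> list[str]:
    """Expand the paginated listing URL template over an inclusive page range."""
    return [LIST_URL.format(num=n) for n in range(first_page, last_page + 1)]


class ProjectLinkParser(HTMLParser):
    """Collect hrefs shaped like /p/<name> (a hypothetical pattern), in order, without duplicates."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""  # valueless attributes come through as None
        parts = href.strip("/").split("/")
        # Keep only two-segment paths such as /p/firefox; skip /p?page=2 and deeper links
        if len(parts) == 2 and parts[0] == "p" and parts[1] and href not in self.links:
            self.links.append(href)


def extract_project_links(html: str) -> list[str]:
    parser = ProjectLinkParser()
    parser.feed(html)
    return parser.links
```

For example, feeding it `'<a href="/p/firefox">Firefox</a><a href="/p?page=2">next</a><a href="/p/curl">cURL</a>'` yields `['/p/firefox', '/p/curl']`.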

Of course, the detailed data for the 331 projects I analyzed afterwards still had to be fetched through OpenHub's API.
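Because the 1,000-requests-per-key daily cap still applies to these detail requests, one way to stay within it is a small key pool that rotates through the keys and refuses to exceed any key's cap. `KeyPool` and `project_url` are hypothetical helpers, and the endpoint shape `https://www.openhub.net/p/{project}.xml?api_key={key}` reflects my understanding of the OpenHub (formerly Ohloh) XML API; verify it against the official API documentation. No request is actually sent.

```python
from itertools import cycle
from urllib.parse import quote


class KeyPool:
    """Round-robin over API keys while enforcing a per-key daily request cap (hypothetical helper)."""

    def __init__(self, keys: list[str], daily_cap: int = 1000) -> None:
        self.daily_cap = daily_cap
        self.used = {k: 0 for k in keys}
        self._order = cycle(keys)
        self._n_keys = len(keys)

    def next_key(self) -> str:
        # Try each key at most once per call, skipping keys that have hit their cap
        for _ in range(self._n_keys):
            key = next(self._order)
            if self.used[key] < self.daily_cap:
                self.used[key] += 1
                return key
        raise RuntimeError("daily quota exhausted for all keys")

    def reset(self) -> None:
        """Call once per day, when the quota window rolls over."""
        for k in self.used:
            self.used[k] = 0


def project_url(project: str, key: str) -> str:
    # Assumed endpoint shape for the OpenHub XML API; check the official docs before use
    return f"https://www.openhub.net/p/{quote(project)}.xml?api_key={quote(key)}"
```

With five keys and the default cap, the pool hands out at most 5,000 keys per day and then raises, so a caller can pause until the next `reset()`.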

2. Data analysis

Entirely homegrown methods: sqlite3 + Numbers + CSV + Ruby. I used whatever tool felt handy for each job.
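To illustrate the sqlite3-plus-CSV part of that toolchain (the author used Ruby; this sketch is Python), here is one way to load a CSV into SQLite and run an aggregate query over it. The table name `projects` and the columns `name`, `first_commit_days` and `lines_of_code` are hypothetical and are not taken from the released files.

```python
import csv
import io
import sqlite3


def load_csv(conn: sqlite3.Connection, csv_text: str) -> int:
    """Load rows with hypothetical columns name, first_commit_days, lines_of_code."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS projects ("
        "name TEXT PRIMARY KEY, first_commit_days INTEGER, lines_of_code INTEGER)"
    )
    rows = [
        (r["name"], int(r["first_commit_days"]), int(r["lines_of_code"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO projects VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def largest(conn: sqlite3.Connection, n: int) -> list[tuple[str, int]]:
    """Return the n projects with the most lines of code."""
    return conn.execute(
        "SELECT name, lines_of_code FROM projects ORDER BY lines_of_code DESC LIMIT ?",
        (n,),
    ).fetchall()
```

Keeping the raw data in SQLite and exporting only the columns needed for a chart to CSV is one way to combine a database with spreadsheet tools such as Numbers.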

3. Data visualization

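I originally planned to work something out in Numbers, but it turned out to be far too limited, and Excel is not much better. So I searched online for infographic tools, found several good web-based ones, and after comparing them settled on infogr.am. It really is very good.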

II. Clearing up a question: project size versus creation time

I had a short exchange with @云风 (Yunfeng) on Weibo, prompted by some of the views from my earlier analysis:

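  • Whether a project uses GitHub: the newer the project, the more willing it is to use it; the larger the project, the less able it is to use it.
  • Whether a project uses GitHub to manage its issues: the newer the project, the more willing it is to use it; the larger the project, the less able it is to use it.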

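The wording of these conclusions was chosen with some care. Strictly speaking, "new" pairs with "old" and "small" with "large", while "willing" pairs with "unwilling" and "able" with "unable". Neither of my two conclusions is neatly parallel, and that was in fact deliberate.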

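So Yunfeng and I had the following exchange:
Yunfeng: Project size and project age are correlated in themselves, aren't they? A project with a larger codebase has most likely been around longer.
Me: A project's size mainly depends on the nature of the project itself. Only projects that are complex to begin with can keep growing bigger. A project that starts small won't necessarily grow steadily year after year.
Yunfeng: That only shows that small projects can have long histories, not that large projects can have short ones. Very few new projects are big right from the start. Code is written one line at a time, after all.
Me: Then it's a matter of different growth rates. OpenStack, for example, was not small even at the start.
Yunfeng: Being big from the start can only mean it was developed closed-source for a while, or moved over from somewhere else. Can you imagine an initial commit of more than 100,000 lines of code that was never under version control before? Look at this link: the commit log says "initial fork out of nova".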

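I didn't continue the discussion after that, but I kept thinking about the question: "How strongly is a project's size actually correlated with its creation time?"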

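Later I analyzed the two quantities together, plotting log(days from the first commit to today) against log(lines of code), which gave roughly the following chart:

[Chart: log(days since first commit) versus log(lines of code)]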
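According to the mighty Excel, the correlation coefficient between the two is about 0.203. In other words, there is roughly a weak positive correlation.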
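The same calculation can be reproduced outside Excel. Assuming the coefficient is Pearson's r (which is what Excel's CORREL function returns), here is a minimal Python sketch that computes it on log-transformed values; the inputs in the usage note are made-up placeholders, not the author's data.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sample has zero variance")
    return sxy / math.sqrt(sxx * syy)


def log_log_correlation(days_since_first_commit: list[int], lines_of_code: list[int]) -> float:
    # The log base does not matter: Pearson's r is unchanged when a variable is rescaled
    xs = [math.log(d) for d in days_since_first_commit]
    ys = [math.log(n) for n in lines_of_code]
    return pearson(xs, ys)
```

For instance, `log_log_correlation([10, 100, 1000], [100, 10000, 1000000])` is 1 up to rounding, because a power law becomes a straight line on log-log axes. For scale, r = 0.203 means r² ≈ 0.04, so log age accounts for only about 4% of the variance in log size.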

III. Open source

I have open-sourced the data behind this analysis on GitHub. A quick overview:

data.sqlite3.zip is the base data for the 280,000 projects
projects.sqlite3 is the detailed data for the 331 projects
projects.csv is the large spreadsheet I used for the analysis
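For anyone picking up these files, here is a minimal Python sketch that unpacks `data.sqlite3.zip` and lists the tables in a SQLite database. It reads table names from `sqlite_master` instead of assuming a schema; the assumption that the archive contains files ending in `.sqlite3` and the destination paths are placeholders.

```python
import sqlite3
import zipfile
from contextlib import closing
from pathlib import Path


def unpack(zip_path: str, dest: str) -> list[Path]:
    """Extract the archive and return the paths of the extracted .sqlite3 files."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        names = zf.namelist()
    return [Path(dest) / n for n in names if n.endswith(".sqlite3")]


def list_tables(db_path) -> list[str]:
    """Return user table names, skipping SQLite's internal sqlite_* tables."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        ).fetchall()
    return [r[0] for r in rows]
```

`projects.sqlite3` needs no unpacking and can be passed to `list_tables` directly; `projects.csv` opens in any spreadsheet tool.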

IV. The list

The 331 open source projects are listed below:

Name Homepage
Metasploit Framework http://www.metasploit.com/framework/
NetBSD http://www.netbsd.org
GNU C Library http://www.gnu.org/software/libc/
cURL http://curl.haxx.se/
Python programming language https://www.python.org
Linux Kernel http://kernel.org/
GNU Emacs http://www.gnu.org/software/emacs
gnulib http://savannah.gnu.org/projects/gnulib/
GNU Core Utilities http://savannah.gnu.org/projects/coreutils/
GNU Compiler Collection http://gcc.gnu.org/
Wine http://www.winehq.org
Debian http://www.debian.org/
GNU Octave http://www.octave.org
Visualization Toolkit http://www.vtk.org
pf http://www.benzedrine.cx/pf.html
GDB http://www.gnu.org/software/gdb/
GNU binutils http://www.gnu.org/software/binutils/
GHC http://haskell.org/ghc/
Zope http://zope2.zope.org
FreeBSD https://github.com/trueos/trueos
Perl http://www.perl.org/
GNU LilyPond Music Typesetter http://lilypond.org/
Gnus http://gnus.org/
ikiwiki https://github.com/schmonz/ikiwiki
Samba http://www.samba.org
PHP http://php.net
FreeBSD Ports http://www.freebsd.org/ports/
pkgsrc: The NetBSD Packages Collection http://www.pkgsrc.org/
Mesa http://www.mesa3d.org/
Squid Cache http://www.squid-cache.org/
KDElibs (KDE) http://www.kde.org/
gedit http://www.gnome.org/projects/gedit/
Evolution http://www.gnome.org/projects/evolution/
Kontact http://kontact.org/
KDE PIM http://pim.kde.org
Advanced Linux Sound Architecture (ALSA) http://www.alsa-project.org/
Wireshark http://www.wireshark.org
OpenSSL http://www.openssl.org/
GIMP http://www.gimp.org/
NetBeans IDE http://www.netbeans.org
Koha Library Automation Package http://www.koha-community.org
openSUSE Linux http://www.opensuse.org/
Doxygen http://doxygen.org/
libcurl http://curl.haxx.se/libcurl
GStreamer http://github.com/zaheerm/gst-plugins-good
GNOME http://www.gnome.org/
Insight Toolkit http://www.itk.org
zsh http://zsh.sourceforge.net/
Nautilus https://wiki.gnome.org/Apps/Nautilus
X.Org http://www.x.org/wiki/
Mozilla Core http://www.ahrcloud.com
MariaDB http://mariadb.org/
CMake http://www.cmake.org
LibreOffice http://www.libreoffice.org
ALT Linux http://www.altlinux.org
ParaView http://www.paraview.org
GTK+ http://www.gtk.org/
Poedit http://www.poedit.net/
Bugzilla http://www.bugzilla.org/
Enlightenment (window manager) http://www.enlightenment.org
FFmpeg http://www.ffmpeg.org/
GLib http://library.gnome.org/devel/glib/
PEAR http://pear.php.net/
Ruby http://www.ruby-lang.org/
GnuCash http://www.gnucash.org/
phpMyAdmin http://www.phpmyadmin.net/
Mono http://www.mono-project.com
SWIG http://www.swig.org
SWT (Standard Widget Toolkit) http://www.eclipse.org/swt/
Checkstyle http://checkstyle.sourceforge.net
Eclipse Java Development Tools (JDT) http://www.eclipse.org/jdt/
Eclipse Platform Project http://www.eclipse.org/eclipse/platform-ui/
Natural Language Toolkit (NLTK) http://www.nltk.org
Ekiga http://ekiga.org/
Boost C++ Libraries http://www.boost.org
Kate (KDE) http://kate-editor.org
Devhelp http://live.gnome.org/devhelp
Arch Linux Packages http://www.archlinux.org
SPIP http://www.spip.net
GNOME Terminal https://help.gnome.org/users/gnome-terminal/stable/
ScummVM http://www.scummvm.org/
Anjuta DevStudio http://anjuta.org
BlueZ http://www.bluez.org/
Eye of GNOME http://www.gnome.org/projects/eog
Tor http://www.torproject.org/
Fedora Packages http://fedoraproject.org
Haiku http://www.haiku-os.org
Stellarium http://stellarium.org/
Totem http://projects.gnome.org/totem/
Rhythmbox http://www.gnome.org/projects/rhythmbox/
Gentoo Linux http://www.gentoo.org/
CDT (Eclipse) http://www.eclipse.org/cdt/
JRuby http://www.jruby.org
eZ Publish http://share.ez.no
VLC media player http://videolan.org/
Equinox http://www.eclipse.org/equinox/
Epiphany http://www.gnome.org/projects/epiphany/
Thunderbird http://mozilla.org/thunderbird/
GeoTools http://geotools.org
PyPy http://pypy.org
KDE http://www.kde.org
apt - Advanced Package Tool https://wiki.debian.org/Apt
Moodle http://git.moodle.org/gw?p=moodle.git
Calligra Suite http://www.calligra.org
QGIS http://qgis.org/
Mozilla Firefox http://www.firefox.com/
coreboot http://www.coreboot.org/Welcome_to_coreboot
Tiki Wiki CMS Groupware http://tiki.org
Apache Maven 2 http://github.com/apache/maven-archetype
Plone http://plone.org
Superior Lisp Interaction Mode for Emacs http://common-lisp.net/project/slime/
Kodi http://kodi.tv
MythTV http://www.mythtv.org
systemd http://www.freedesktop.org/wiki/Software/systemd
GeoServer http://www.geoserver.org
Groovy http://groovy.codehaus.org/
Blender http://www.blender.org/
MySQL http://www.mysql.com/
iproute2 http://www.linuxfoundation.org/collaborate/workgroups/networking/iproute2
MonoDevelop http://www.monodevelop.com
Hibernate http://www.hibernate.org/subprojects/ogm
NetworkManager http://www.gnome.org/projects/NetworkManager/
NLog - Advanced .NET Logging http://nlog-project.org/
GParted http://gparted.org/
Seahorse http://www.gnome.org/projects/seahorse/
Glade User Interface Designer http://glade.gnome.org/
Jenkins http://jenkins-ci.org/
IntelliJ IDEA Community Edition http://www.jetbrains.org
Ruby on Rails http://rubyonrails.org
BusyBox http://busybox.net/
Evince http://projects.gnome.org/evince/
DokuWiki http://www.dokuwiki.org/
Linux NTFS file system support http://www.linux-ntfs.org/
KVM http://kvm.qumranet.com/kvmwiki
Battle for Wesnoth http://wesnoth.org/
Git http://git-scm.com/
SPIP-Zone http://zone.spip.org/trac/spip-zone/
Mercurial http://mercurial.selenic.com/
Hibernate Entity Manager http://entitymanager.hibernate.org/
Racket http://racket-lang.org/
RubyGems http://rubygems.org
SQLAlchemy http://www.sqlalchemy.org/
cabal http://haskell.org/cabal/
U-Boot http://www.denx.de/wiki/U-Boot/WebHome
WebKit http://webkit.org
OpenEmbedded http://openembedded.org
Yocto Project http://www.yoctoproject.org
matplotlib http://matplotlib.org/
Symfony http://www.symfony.com/
Meld http://meldmerge.org/
Haxe http://haxe.org/
FreeSWITCH http://www.freeswitch.org/
Geany http://geany.org/
collectd http://collectd.org/
Gramps http://gramps-project.org
phpBB Forum Software http://www.phpbb.com/
HAProxy http://www.haproxy.org/
fail2ban http://www.fail2ban.org/wiki/index.php/Main_Page
NumPy http://numpy.scipy.org
Scala http://www.scala-lang.org/
dpkg http://wiki.debian.org/Teams/Dpkg/
Nette Framework http://nette.org
Inkscape http://www.inkscape.org
Phing http://www.phing.info/
jBPM http://jbpm.org
JBoss Drools http://www.jboss.org/drools
Bitbake http://developer.berlios.de/projects/bitbake/
Zotero http://www.zotero.org/
Lutece http://www.lutece.paris.fr
OTRS http://www.otrs.com/
Sage: Open Source Mathematics Software http://sagemath.org
Rockbox http://rockbox.org
Liferay Portal http://liferay.com
TYPO3 CMS http://typo3.org
Vala http://live.gnome.org/Vala
pylint http://pylint.org
The LLVM Compiler Infrastructure http://llvm.org/
libvirt http://libvirt.org
TinyMCE http://tinymce.moxiecode.com
Django http://www.djangoproject.com/
PHPUnit http://www.phpunit.de/
OpenStreetMap http://www.openstreetmap.org/
SymPy http://sympy.org
Xen Project (Hypervisor) http://www.xenproject.org
Eclipse Mylyn http://www.eclipse.org/mylyn/
PHP_CodeSniffer http://pear.php.net/package/PHP_CodeSniffer
Sakai LMS (core) http://www.sakaiproject.org/
Spring Framework http://github.com/SpringSource/spring-framework
Joomla! http://www.joomla.org/
Marble http://edu.kde.org/marble/
LXDE http://lxde.org
Pygments http://pygments.org/
OpenLayers http://openlayers.org/
The MacPorts Project http://www.macports.org/
calibre http://calibre-ebook.com/
Grails http://grails.org
Alfresco Content Management http://www.alfresco.com
util-linux https://github.com/karelzak/util-linux
jQuery http://jquery.com/
Vaadin http://vaadin.com/
Cython http://www.cython.org/
Dojo Toolkit http://dojotoolkit.org/
MediaWiki https://www.mediawiki.org/wiki/MediaWiki
Second Life Viewer http://www.secondlife.com/
Munin http://munin-monitoring.org/
Odoo https://www.odoo.com
Mozilla Calendar http://www.mozilla.org/projects/calendar/
KDevelop http://kdevelop.org/
ZNC http://znc.in
Werkzeug http://werkzeug.pocoo.org/
cppcheck http://cppcheck.sourceforge.net/
Wicket Stuff http://wicketstuff.org
Drush http://drupal.org/project/drush
Sphinx documentation builder http://sphinx-doc.org/
Piwik http://piwik.org
JDownloader http://www.jdownloader.org
SeaMonkey http://www.seamonkey-project.org/
Empathy http://live.gnome.org/Empathy
SilverStripe http://www.silverstripe.org
PulseAudio http://pulseaudio.org
LLVM/Clang C family frontend http://clang.llvm.org/
Pylons http://pylonsproject.org
MongoDB http://www.mongodb.org/
Mockito https://github.com/mockito/mockito
Doctrine http://www.doctrine-project.org
Pacman http://www.archlinux.org/pacman/
MAME - Multiple Arcade Machine Emulator http://mamedev.org/
Rubinius http://rubini.us/
Apache Camel http://camel.apache.org/
OpenJDK http://openjdk.java.net/
Buildbot http://buildbot.net/trac
MPD http://sourceforge.net/projects/musicpd
Tracker http://projects.gnome.org/tracker/
org-mode http://orgmode.org
Sass http://sass-lang.com/
WPA/WPA2/IEEE 802.1X Supplicant http://hostap.epitest.fi/wpa_supplicant/
Go programming language http://golang.org/
Apache CouchDB http://couchdb.apache.org/
Qt 4 http://qt-project.org/
Apache CXF http://cxf.apache.org/
CakePHP http://cakephp.org
CKeditor WYSIWYG editor http://ckeditor.com/
SciPy http://www.scipy.org
gitg http://trac.novowork.com/gitg/
Banshee http://banshee-project.org
OGRE http://www.ogre3d.org
Chromium (Google Chrome) http://code.google.com/chromium/
Gradle http://www.gradle.org/
Netty Project http://netty.io/
Sinatra http://www.sinatrarb.com
Chef http://www.opscode.com/chef
Gerrit Code Review http://code.google.com/p/gerrit
GNOME Shell http://live.gnome.org/GnomeShell
Git Extensions http://code.google.com/p/gitextensions
Qt Creator http://qt-project.org/
Kohana v3 http://kohanaframework.org/
Android http://www.android.com
JUnit http://www.junit.org
PCSX2 http://pcsx2.net/
Shotwell https://wiki.gnome.org/Apps/Shotwell
Redis http://redis.io/
Cassandra http://cassandra.apache.org/
PhoneGap http://phonegap.com/
Trinity Core http://www.trinitycore.org
Icinga http://www.icinga.org
CyanogenMod http://www.cyanogenmod.com/
Rygel http://live.gnome.org/Rygel
QEMU http://www.qemu.org/
Trinity Core2 http://www.trinitycore.org
Pitivi http://github.com/jhoolmans
Openfire http://www.igniterealtime.org/projects/openfire/
Apache Hadoop http://hadoop.apache.org/core/
akka http://akka.io
JGit http://www.eclipse.org/jgit/
Homebrew https://github.com/Homebrew/homebrew-apache
Oh My Zsh http://github.com/robbyrussell/oh-my-zsh
ehcache http://www.ehcache.org/
EGit http://www.eclipse.org/egit/
node.js (NodeJs) http://nodejs.org
Thunar http://www.xfce.org
Selenium http://seleniumhq.org/
Arquillian http://jboss.org/arquillian
Erlang http://www.erlang.org
YUI http://yuilibrary.com/
Gunicorn http://gunicorn.org
CoffeeScript http://www.coffeescript.org/
Clementine Music Player https://github.com/clementine-player/Clementine
scikit learn http://scikit-learn.org
Processing http://processing.org/
Vagrant http://vagrantup.com/
Qt 5 http://www.qt-project.org/
Yii PHP Framework http://www.yiiframework.com
Zend Framework http://framework.zend.com/
Apache Spark http://spark.apache.org
Flask http://flask.pocoo.org/
OsmAnd http://www.osmand.net
ownCloud http://ownCloud.org
Open Computer Vision Library (OpenCV) http://opencv.org/
phpDocumentor http://www.phpdoc.org
IPython http://ipython.org/
RSpec http://rspec.info/
OpenStack http://www.openstack.org/
OpenStack Nova https://launchpad.net/nova
Apache CloudStack https://github.com/apache/incubator-cloudstack
AngularJS http://angularjs.org/
GWT (formerly Google Web Toolkit) https://github.com/google-web-toolkit/gwt
Facter http://puppetlabs.com/puppet/related-projects/facter/
salt http://saltstack.org
jMonkey Engine http://jmonkeyengine.org
Puppet http://puppetlabs.com/puppet/
Play! framework http://www.playframework.org/
Elasticsearch http://www.elasticsearch.com
Bootstrap (Twitter) http://twitter.github.com/bootstrap/
Apache OpenOffice http://www.openoffice.org/
GlassFish https://glassfish.dev.java.net/
Propel http://propelorm.org
JabRef http://jabref.sourceforge.net
CodeIgniter http://www.codeigniter.com/
GNOME Boxes http://live.gnome.org/Boxes
GitLab https://www.gitlab.com/gitlab-ce/
TiddlyWiki http://www.tiddlywiki.org
Fish shell https://github.com/fish-shell/fish-shell
Ansible http://ansible.com
Simple Machines Forum http://www.simplemachines.org/
FontForge http://www.fontforge.org
libgdx http://libgdx.badlogicgames.com
py-pandas http://pandas.sourceforge.net/
javascript https://github.com/airbnb/javascript
EasyTAG https://wiki.gnome.org/Apps/EasyTAG
docker http://docker.io
Capistrano http://capistranorb.com/

Source: http://www.jianshu.com/p/093d3386b0a3

 This article was uploaded and shared by user jopen for learning and discussion only. All rights belong to the original author.