Apache SystemML v0.10.0-incubating released, a machine learning language

Posted by jopen 8 years ago | 9K views | Machine Learning, Apache SystemML

 

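SystemML is a flexible, scalable machine learning (ML) language written in Java. It offers three main capabilities: (1) customizable algorithms; (2) multiple execution modes, including single node, Hadoop batch, and Spark batch; (3) automatic optimization.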

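Machine learning in SystemML rests on two things:

  • The SystemML language, Declarative Machine Learning (DML). It includes linear algebra primitives, statistical functions, and ML-specific constructs that make ML algorithms easier and more natural to express. Algorithms are written in an R-like or Python-like syntax. DML significantly raises data scientists' productivity by offering full flexibility in expressing custom analytics, along with data independence from the underlying input formats and physical data representations.
  • Automatic optimization. SystemML optimizes according to data and cluster characteristics to ensure both efficiency and scalability. It can run in MapReduce or Spark environments.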

Changelog

Different Types of Sparse Matrix Blocks

  • Supported internal formats: MCSR (default), CSR, COO
  • Automatic MCSR->CSR on Spark read/caching (for memory efficiency)
  • Automatic MCSR->CSR on sparse update-in-place (avoid serialization)
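
To make the three format names concrete, here is a minimal pure-Python sketch of what COO, CSR, and an MCSR-like layout generally look like for a small matrix, plus an MCSR-to-CSR conversion. It assumes MCSR means a row-wise layout with one independently growable array pair per row; the helper names are hypothetical and do not correspond to SystemML's internal classes.

```python
# Illustrative sketch only: hypothetical helpers, not SystemML classes.

def dense_to_coo(dense):
    """COO: parallel lists of row index, column index, and value per nonzero."""
    rows, cols, vals = [], [], []
    for i, row in enumerate(dense):
        for j, v in enumerate(row):
            if v != 0:
                rows.append(i)
                cols.append(j)
                vals.append(v)
    return rows, cols, vals

def dense_to_mcsr(dense):
    """MCSR-like (assumed layout): one separate (cols, vals) pair per row."""
    return [([j for j, v in enumerate(row) if v != 0],
             [v for v in row if v != 0]) for row in dense]

def mcsr_to_csr(mcsr):
    """CSR: three contiguous arrays; row i spans row_ptr[i]:row_ptr[i + 1]."""
    row_ptr, col_idx, vals = [0], [], []
    for cols, row_vals in mcsr:
        col_idx.extend(cols)
        vals.extend(row_vals)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, vals

dense = [[0, 5, 0],
         [0, 0, 0],
         [7, 0, 9]]
coo = dense_to_coo(dense)
mcsr = dense_to_mcsr(dense)
csr = mcsr_to_csr(mcsr)
```

The per-row layout is convenient for incremental updates because each row can grow on its own, while CSR packs everything into three flat arrays, which is generally more compact and simpler to serialize.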

Frame Support for JMLC API/CP

  • New frame data type, deeply integrated into compiler and runtime
  • New builtin functions: transformapply, transformencode, transformdecode, transformmeta
  • Supported operations: read/write, left/right indexing, casting, append, transform/transformapply
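
The transform builtins are listed here without further detail. As a rough illustration of the general encode/apply/decode idea, the sketch below recodes a categorical column to integer codes, reuses the resulting metadata on new data, and decodes back. The function names, the 1-based codes, and the handling of unseen values are all assumptions made for the example; this is not the DML API.

```python
# Hypothetical helpers sketching recoding; these are not the DML builtins.

def encode_fit(column):
    """Build a recode map (the metadata) and encode the column with it."""
    meta = {}
    for value in column:
        if value not in meta:
            meta[value] = len(meta) + 1  # 1-based codes (assumption)
    return [meta[v] for v in column], meta

def encode_apply(column, meta):
    """Encode new data with existing metadata; unseen values become None."""
    return [meta.get(v) for v in column]

def decode(codes, meta):
    """Invert the recode map to recover the original values."""
    inverse = {code: value for value, code in meta.items()}
    return [inverse[c] for c in codes]

train = ["red", "green", "red", "blue"]
codes, meta = encode_fit(train)
applied = encode_apply(["blue", "red", "pink"], meta)
restored = decode(codes, meta)
```

Keeping the metadata separate from the encoded data is what allows the same mapping to be applied consistently at training and scoring time.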

Framework Compatibility/Configuration

  • [SYSTEMML-418] Version-specific Spark memory budgets (>=1.6, legacy)
  • [SYSTEMML-158] Updated deprecated Hadoop properties
  • [SYSTEMML-476] Version-specific MR configuration handling (MRv2, MRv1)
  • Fixes for backwards compatibility to MRv1 (Guava dependency conflicts, runtime changes such as task handling for multiple output committer)
  • New pass-through mapred/mapreduce configurations through SystemML-config
  • [SYSTEMML-584/585] New thread-local configuration handling (compiler/DML config)

Deep Learning Support

  • [SYSTEMML-618] New DML-script NN library
  • [SYSTEMML-540] New built-in singlenode operations: conv2d, maxpooling, im2col, col2im, rotate
  • New lenet-train DML script
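
For readers unfamiliar with how im2col relates to conv2d, the following pure-Python sketch shows the general technique: unroll every kernel-sized patch into a column, then compute the convolution as a single dot product per patch. It assumes a single channel, stride 1, no padding, and cross-correlation (no kernel flip); it is not SystemML's implementation, and the function signatures are made up for the example.

```python
def im2col(image, kh, kw):
    """Unroll each kh x kw patch (stride 1, no padding) into one column."""
    h, w = len(image), len(image[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    patches = []
    for i in range(out_h):
        for j in range(out_w):
            patches.append([image[i + di][j + dj]
                            for di in range(kh) for dj in range(kw)])
    # Transpose so each patch is a column: (kh * kw) x (out_h * out_w).
    return [list(r) for r in zip(*patches)]

def conv2d(image, kernel):
    """Cross-correlate image with kernel via im2col and one dot product per patch."""
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [v for row in kernel for v in row]
    cols = im2col(image, kh, kw)
    out_w = len(image[0]) - kw + 1
    n_patches = len(cols[0])
    flat = [sum(flat_kernel[r] * cols[r][p] for r in range(len(flat_kernel)))
            for p in range(n_patches)]
    # Reshape the flat result back into out_h rows of out_w values.
    return [flat[k:k + out_w] for k in range(0, n_patches, out_w)]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
result = conv2d(image, kernel)  # each output is image[i][j] + image[i+1][j+1]
```

The appeal of this formulation is that the convolution becomes a matrix multiplication, so it can reuse whatever optimized matrix multiply the runtime already has.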

API/Script Usability

  • [SYSTEMML-607/604/611] Parser error handling
  • [SYSTEMML-506/508/544/577/649/651] Extended MLContext/JMLC APIs
  • [SYSTEMML-625/626/632] Improved source statement handling (e.g., imports, absolute paths)
  • [SYSTEMML-617/631/654] Improved namespace handling
  • [SYSTEMML-240] Extended stats outputs for Spark collect/broadcast/parallelize
  • [SYSTEMML-495] SystemML configuration handling
  • [SYSTEMML-209] Include algorithms in SystemML jar
  • [SYSTEMML-647/648] Deprecated castAsScalar, ppred
  • [SYSTEMML-477] JSON meta data handling
  • [SYSTEMML-294] Print matrix built-in function
  • [SYSTEMML-296/676/670] Improved PyDML syntax: slicing, rand, cdf, elif
  • [SYSTEMML-675] Support for negative for/parfor loop increments

New Fused Physical Operators

  • [SYSTEMML-488] Fused wdivmm w/ 4 operands
  • [SYSTEMML-510] Fused wdivmm/wcemm w/ eps term

Various Performance Features

  • [SYSTEMML-427/512] Extended IPA (propagate scalar variables)
  • [SYSTEMML-282] Extended update-in-place support for parfor intermediates
  • [SYSTEMML-552/399] Performance parallel binary/text readers (sort sparse/nnz handling)
  • [SYSTEMML-552/641] Cache-conscious operations: sparse-dense wdivmm/wsloss, sparse-dense/sparse-sparse mm, dense-dense skinny rhs mm
  • [SYSTEMML-641] Tuned special cases for block matrix multiplication: e.g., mm w/ skinny rhs, colwise parallelization wide rhs
  • [SYSTEMML-396/400] New/extended multithreaded operations: cumsum/cummin/cummax/cumprod, transpose, and rand
  • [SYSTEMML-510/694] New simplification rewrites: “pushdown unaryagg-transpose”, “simplify transpose-aggbin-binary chains”, “reorder minus-mmult”, “canonicalize matmult-add-scalar”, improved constant folding (all unary)
  • [SYSTEMML-653] Asynchronous bufferpool cleanup of evicted files/nio file eviction
  • MR iqm/quantile/median (qsort num reducers, qpick buffer size)
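
The simplification rewrites are named but not defined here. Two of them plausibly correspond to well-known algebraic identities: colSums(t(X)) equals t(rowSums(X)), and (-t(X)) %*% y equals -(t(X) %*% y). The pure-Python sketch below checks both identities on a small example; reading the rewrite names this way is an assumption, and the helpers are hypothetical.

```python
def transpose(X):
    return [list(col) for col in zip(*X)]

def col_sums(X):
    """Column sums as a 1 x n row vector."""
    return [[sum(col) for col in zip(*X)]]

def row_sums(X):
    """Row sums as an m x 1 column vector."""
    return [[sum(row)] for row in X]

def negate(X):
    return [[-v for v in row] for row in X]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

X = [[1, 2, 3],
     [4, 5, 6]]
y = [[1],
     [2]]

# Assumed form of "pushdown unaryagg-transpose": aggregate first, transpose
# only the small result instead of the whole matrix.
agg_before = col_sums(transpose(X))
agg_after = transpose(row_sums(X))

# Assumed form of "reorder minus-mmult": negate the small product instead of
# the large input matrix.
neg_before = matmul(negate(transpose(X)), y)
neg_after = negate(matmul(transpose(X), y))
```

In both cases the rewritten form applies the cheap operation to a vector-sized result rather than to the full matrix, which is why such rewrites can save both time and intermediate memory.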

DML Script Updates

  • [SYSTEMML-536] New KNN algorithm (still staging)
  • [SYSTEMML-534] Optional console output univariate statistics
  • [SYSTEMML-494] GLM compiler warnings
  • Robustness input/output handling L2SVM, MSVM, and Naive Bayes
  • Random data generator for ALS

Various Fixes

  • Dozens of fixes for diverse issues, fix pack for 0.9 release

Build, Documentation, Examples

  • [SYSTEMML-551] Enhanced JMLC javadoc
  • [SYSTEMML-484] Build javadoc jar
  • [SYSTEMML-468] Contributing to SystemML doc
  • [SYSTEMML-517/524] DML Language Reference updates
  • [SYSTEMML-498] Troubleshooting guide
  • SystemML Jupyter/Zeppelin Notebook examples

Downloads

  • systemml-0.10.0-incubating (tar.gz): tar.gz, MD5, ASC
  • systemml-0.10.0-incubating (zip): zip, MD5, ASC
  • systemml-0.10.0-incubating-standalone (tar.gz): tar.gz, MD5, ASC
  • systemml-0.10.0-incubating-standalone (zip): zip, MD5, ASC
  • systemml-0.10.0-incubating (Source tar.gz): tar.gz, MD5, ASC
  • systemml-0.10.0-incubating (Source zip): zip, MD5, ASC

 
