Miller 3.4.0 released, a CSV and JSON processing tool
Miller 3.4.0 has been released. Example usage:
% mlr --csv cut -f hostname,uptime mydata.csv
% mlr --csv --rs lf filter '$status != "down" && $upsec >= 10000' *.csv
% mlr --nidx put '$sum = $7 < 0.0 ? 3.5 : $7 + 2.1*$8' *.dat
% grep -v '^#' /etc/group | mlr --ifs : --nidx --opprint label group,pass,gid,member then sort -f group
% mlr join -j account_id -f accounts.dat then group-by account_name balances.dat
% mlr put '$attr = sub($attr, "([0-9]+)_([0-9]+)_.*", "\1:\2")' data/*.json
% mlr stats1 -a min,mean,max,p10,p50,p90 -f flag,u,v data/*
% mlr stats2 -a linreg-pca -f u,v -g shape data/
Improvements in the new version:
Primary features:
- JSON is now a supported format for input and output. Miller handles tabular data, and JSON supports arbitrarily deeply nested data structures, so if you want general JSON processing you should use jq. But if you have tabular data represented in JSON then Miller can now handle that for you. Please see the reference page and the FAQ.
- Reshape is a standard data-processing idiom, now available in Miller: http://johnkerl.org/miller/doc/reference.html#reshape
- Incidentally (not part of this release, but new since the last release) Miller is now available in FreeBSD's package manager: https://www.freshports.org/textproc/miller/. A full list of distributions containing Miller may be found here.
- Miller is not yet available from within Fedora/CentOS, but as a step toward this goal, an SRPM is included in this release (see file-list below).
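To make the reshape idiom concrete, here is a minimal Python sketch of long-to-wide and wide-to-long conversion on lists of records. It is not Miller's implementation; the function names long_to_wide and wide_to_long and the sample field names are hypothetical, chosen only for illustration.

```python
from collections import OrderedDict


def long_to_wide(records, key_field, value_field):
    """Collapse rows that share all other fields into one wide row.

    Hypothetical helper: illustrates the idiom, not Miller's own code.
    """
    wide = OrderedDict()
    for rec in records:
        # Everything except the key/value pair identifies the output row.
        other = tuple((k, v) for k, v in rec.items()
                      if k not in (key_field, value_field))
        row = wide.setdefault(other, OrderedDict(other))
        # The key field's value becomes a new column name.
        row[rec[key_field]] = rec[value_field]
    return [dict(row) for row in wide.values()]


def wide_to_long(records, input_fields, key_field, value_field):
    """Split each wide row into one row per listed input field."""
    out = []
    for rec in records:
        rest = {k: v for k, v in rec.items() if k not in input_fields}
        for name in input_fields:
            if name in rec:
                out.append({**rest, key_field: name, value_field: rec[name]})
    return out


long_rows = [
    {"id": "a", "key": "x", "value": "1"},
    {"id": "a", "key": "y", "value": "2"},
    {"id": "b", "key": "x", "value": "3"},
    {"id": "b", "key": "y", "value": "4"},
]
wide_rows = long_to_wide(long_rows, "key", "value")
print(wide_rows)

# Converting back recovers the original long-format rows.
round_trip = wide_to_long(wide_rows, ["x", "y"], "key", "value")
print(round_trip == long_rows)
```

Values stay as strings throughout, in keeping with the string-to-string record model described in the release notes.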
DSL enhancements for mlr put and mlr filter:
- Regex captures \0 through \9: http://johnkerl.org/miller/doc/reference.html#Regex_captures
- Ternary operator in expression right-hand sides: e.g. mlr put '$y = $x < 0.5 ? 0 : 1'
- Boolean literals true and false
- Final semicolon is now allowed: e.g. mlr put '$x=1;$y=2;'
- Environment variables are now accessible, where environment-variable names may be string literals or arbitrary expressions: mlr put '$home = ENV["HOME"]' or mlr put '$value = ENV[$name]'.
- While records are still string-to-string maps for input and output, and between then statements, types are preserved between multiple statements within a put. Example: mlr put '$y = string($x); $z = $y . $y' works as expected, without requiring mlr put '$y = string($x); $z = string($y) . string($y)' as before.
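For readers less familiar with these constructs, the following Python sketch mirrors three of them on a single record: capture-group substitution, a ternary expression, and environment-variable lookup. It is only an analogy written in Python, not Miller DSL and not Miller's implementation; the record contents and the apply_put helper name are hypothetical.

```python
import os
import re


def apply_put(record, env=os.environ):
    """Return a copy of record with three derived fields (hypothetical helper)."""
    out = dict(record)
    # Analogous to sub($attr, "([0-9]+)_([0-9]+)_.*", "\1:\2"):
    # replace the first match, reusing capture groups 1 and 2.
    out["attr"] = re.sub(r"([0-9]+)_([0-9]+)_.*", r"\1:\2", out["attr"], count=1)
    # Analogous to $y = $x < 0.5 ? 0 : 1
    out["y"] = 0 if float(out["x"]) < 0.5 else 1
    # Analogous to $home = ENV["HOME"]. Mapping a missing variable to ""
    # is an assumption of this sketch, not documented Miller behavior.
    out["home"] = env.get("HOME", "")
    return out


result = apply_put({"attr": "12_34_rest", "x": "0.25"}, env={"HOME": "/home/demo"})
print(result)
```

Passing env explicitly keeps the example deterministic; by default the helper reads the real process environment.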
Bug fixes:
- Mixed-format join, e.g. CSV file joined with DKVP file, was incorrectly computing default separators (IRS, IFS, IPS). This resulted in records not being joined together.
- Segmentation violation on non-standard-input read of files with size an exact multiple of page size and not ending in IRS, e.g. newline. (This is less of a corner case than it sounds: for example, leave a long-running program running with output redirected to a file, then in a sleep-and-process loop, have Miller process that file. The former program's stdio library will likely be doing block-sized buffered I/O, where block sizes will often be multiples of system page size and the block will almost surely not end in a newline.)
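The input that triggered the second bug can be described precisely. As a minimal sketch, the following Python snippet builds a file whose size is an exact multiple of the system page size and which does not end in a newline. The helper name write_page_aligned_file and the record contents are hypothetical, and the snippet only constructs such an input; it does not invoke Miller.

```python
import mmap
import os
import tempfile


def write_page_aligned_file(path, pages=1):
    """Write pages * PAGESIZE bytes of DKVP-style lines with no final newline."""
    target = pages * mmap.PAGESIZE
    line = b"a=1,b=2\n"
    # Repeat the line past the target size, then cut to exactly the target.
    data = (line * (target // len(line) + 1))[:target]
    # If the cut lands right after a newline, overwrite that newline so the
    # file ends mid-record, as a block-buffered writer's output often would.
    if data.endswith(b"\n"):
        data = data[:-1] + b"x"
    with open(path, "wb") as f:
        f.write(data)
    return target


tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "aligned.dkvp")
size = write_page_aligned_file(path)
print(size % mmap.PAGESIZE == 0)
```

A file like this, read by name rather than from standard input, matches the conditions listed in the bug description.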
Download: https://github.com/johnkerl/miller/releases/tag/v3.4.0
Source: http://www.oschina.net//news/70709/miller-3-4-0