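Processing JSON Files with Bash Shell
Original: http://wsgzao.github.io/post/bash-json/
Preface
JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for humans to read and write and easy for machines to parse and generate. This post presents a real test-case requirement whose design logic resembles a Makefile. I use Bash to process the JSON as an example. My coding skills are limited, so please bear with me; everyone is welcome to learn along and take up the challenge of implementing it in other languages.
Making good use of jq to process JSON data
Revision History
2015-06-19 - First draft
Read the original - http://wsgzao.github.io/post/bash-json/
Further Reading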
- JSON - http://json.org/
- jq - http://stedolan.github.io/jq/
Test Case
In data pipeline systems and configuration management systems, it is very common to need to execute a set of jobs that depend on one another.
Write a program pipeline_runner to execute a list of shell scripts. The definitions of those scripts and their dependencies are described in a JSON file. The program takes only one argument: the path of the JSON file that defines the jobs.
For example,
// jobs.json
{
    "log0_compressed" : {
        "commands": "curl http://websrv0/logs/access.log.gz > access0.log.gz",
        "input": [],
        "output": "access0.log.gz"
    },
    "log0" : {
        "commands": "gunzip access0.log.gz",
        "input": ["access0.log.gz"],
        "output": "access0.log"
    },
    "log1_compressed": {
        "commands": "curl http://websrv1/logs/access.log.gz > access1.log.gz",
        "input": [],
        "output": "access1.log.gz"
    },
    "log1" : {
        "commands": "gunzip access1.log.gz",
        "input": ["access1.log.gz"],
        "output": "access1.log"
    },
    "log_combined": {
        "commands": "cat access0.log access1.log > access.log",
        "input": ["access0.log", "access1.log"],
        "output": "access.log"
    }
} 
To run the program:
pipeline_runner jobs.json
As you can see, each job has its input files and output file.
- A job will only be executed if all its input files exist.
- A job can have multiple input files (or none) but produces only one output file.
- Users can run the program multiple times; if a job’s output file already exists, the program skips that job.
If this is still not clear, think of a Makefile on Linux systems. The logic is quite similar.
You may complete the test in whichever programming language you prefer.
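The rules above can be sketched as a small Bash helper. This is a minimal sketch and not part of the original test case: the function name should_run and its return codes are made up for illustration.

```bash
#!/bin/bash
# Hypothetical helper (not from the original post): decide whether one job should run.
# Usage: should_run OUTPUT [INPUT...]
# Returns 0 (run), 1 (skip: output already exists), 2 (blocked: an input is missing).
should_run() {
    local output=$1
    shift
    # Skip the job when its output file already exists.
    [ -e "$output" ] && return 1
    # Every input file must exist; with no inputs the loop body never runs.
    local input
    for input in "$@"; do
        [ -e "$input" ] || return 2
    done
    return 0
}
```

For the log0 job in the example, a caller could write `should_run access0.log access0.log.gz && gunzip access0.log.gz`, so the command runs only when the output is missing and the input is present.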
Bash Shell
#!/bin/bash
# dos2unix *.sh
# Program:
# This program runs the jobs described in a JSON file, using jq to parse it.
# History:
# 2015/06/18 by OX
#---------------------------- custom variables ---------------------start
runuser=root
# JSON file that defines the jobs: first argument, defaulting to jobs.json
jobs_file=${1:-jobs.json}
# commands
log_combined_commands=$(./jq -r '.log_combined.commands' "$jobs_file")
log1_commands=$(./jq -r '.log1.commands' "$jobs_file")
log1_compressed_commands=$(./jq -r '.log1_compressed.commands' "$jobs_file")
log0_commands=$(./jq -r '.log0.commands' "$jobs_file")
log0_compressed_commands=$(./jq -r '.log0_compressed.commands' "$jobs_file")
# input file name
log0_input=$(./jq -r '.log0.input[0]' "$jobs_file")
log1_input=$(./jq -r '.log1.input[0]' "$jobs_file")
log_combined_input1=$(./jq -r '.log_combined.input[0]' "$jobs_file")
log_combined_input2=$(./jq -r '.log_combined.input[1]' "$jobs_file")
# output file name
log_combined_output=$(./jq -r '.log_combined.output' "$jobs_file")
log1_output=$(./jq -r '.log1.output' "$jobs_file")
log1_compressed_output=$(./jq -r '.log1_compressed.output' "$jobs_file")
log0_output=$(./jq -r '.log0.output' "$jobs_file")
log0_compressed_output=$(./jq -r '.log0_compressed.output' "$jobs_file")
#---------------------------- custom variables ---------------------end
#---------------------------- user check ---------------------start
if [ "$(whoami)" != "$runuser" ]; then
    echo "Please re-run $0 as $runuser."
    exit 1
fi
#---------------------------- user check ---------------------end
#---------------------------- function ---------------------start
pause()
{
    read -n1 -p "Press any key to continue..."
}
log_combined_check_first()
{
if [ -f "$log_combined_output" ]; then
   echo "${log_combined_output} has been generated, the program will exit"
   exit 0
fi
}
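log0_compressed_check()
{
# no input files: run whenever the output is missing
if [ ! -f "$log0_compressed_output" ]; then
   eval "${log0_compressed_commands}"
fi
}
log0_check()
{
# run only if the output is missing and the input file exists
if [ ! -f "$log0_output" ] && [ -f "$log0_input" ]; then
   eval "${log0_commands}"
fi
}
log1_compressed_check()
{
# no input files: run whenever the output is missing
if [ ! -f "$log1_compressed_output" ]; then
   eval "${log1_compressed_commands}"
fi
}
log1_check()
{
# run only if the output is missing and the input file exists
if [ ! -f "$log1_output" ] && [ -f "$log1_input" ]; then
   eval "${log1_commands}"
fi
}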
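log_combined_check()
{
# run only if the output is missing and both input files exist
if [ ! -f "$log_combined_output" ] && [ -f "$log_combined_input1" ] && [ -f "$log_combined_input2" ]; then
   eval "${log_combined_commands}"
   echo "${log_combined_output} has been generated, the program will exit"
fi
}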
#---------------------------- function ---------------------end
#---------------------------- main ---------------------start
echo "
Please read first:
[0]Check jobs.json and jq by yourself first
[1]A job will only be executed if all its input files exist.
[2]A job can have multiple input files (or none) but only produce one output file.
[3]Users could run the program multiple times, but if a job's output file already exists, the program would skip the job.
"
pause
# check whether each output file exists and run the job if it does not
log_combined_check_first
log0_compressed_check
log0_check
log1_compressed_check
log1_check
log_combined_check
#---------------------------- main ---------------------end
Summary
My code does not handle an arbitrary number of jobs or inputs; I would welcome pointers from more experienced developers.
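One possible way to lift that limitation is to let jq list the job names and inputs instead of hard-coding them. The following is a rough sketch rather than the author's code. It assumes bash 4 or later (for associative arrays and mapfile) and a jq binary on PATH rather than the ./jq used above; the function name run_pipeline is made up for illustration.

```bash
#!/bin/bash
# Sketch only, not the original author's code.
# Assumptions: bash 4+ (associative arrays, mapfile) and a `jq` binary on PATH.

# run_pipeline JOBS_FILE
# Sweeps over all jobs repeatedly; each sweep runs every job whose output is
# missing and whose inputs all exist. Stops when a sweep runs nothing.
run_pipeline() {
    local jobs_file=$1
    local -A ran=()              # jobs already executed in this invocation
    local -a jobs inputs
    local job input output commands ready progressed=1

    mapfile -t jobs < <(jq -r 'keys[]' "$jobs_file")

    while [ "$progressed" -eq 1 ]; do
        progressed=0
        for job in "${jobs[@]}"; do
            [ -n "${ran[$job]:-}" ] && continue
            output=$(jq -r --arg j "$job" '.[$j].output' "$jobs_file")
            [ -e "$output" ] && continue          # output exists: skip the job

            mapfile -t inputs < <(jq -r --arg j "$job" '.[$j].input[]' "$jobs_file")
            ready=1
            for input in "${inputs[@]}"; do
                [ -e "$input" ] || ready=0        # a missing input blocks the job
            done
            [ "$ready" -eq 1 ] || continue

            commands=$(jq -r --arg j "$job" '.[$j].commands' "$jobs_file")
            echo "Running job: $job"
            # eval is only acceptable because the jobs file is assumed to be trusted.
            eval "$commands"
            ran[$job]=1
            progressed=1
        done
    done
}

# Run only when executed directly with an argument, e.g. ./pipeline_runner jobs.json
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ $# -ge 1 ]; then
    run_pipeline "$1"
fi
```

Repeating the sweep until nothing runs means the order of jobs in the JSON file does not matter: a job whose inputs are produced by a later job is simply picked up on the next sweep. Each job runs at most once per invocation.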
file://D:\pipeline (2 folders, 4 files, 490.55 KB, 531.04 KB in total.)
│  jobs.json 794 bytes
│  jq 486.13 KB
│  pipeline_runner 2.95 KB
│  README.md 714 bytes
├─logs (0 folders, 2 files, 248 bytes, 248 bytes in total.)
│      access0.log.gz 123 bytes
│      access1.log.gz 125 bytes
└─result (0 folders, 5 files, 40.25 KB, 40.25 KB in total.)
        access.log 20.00 KB
        access0.log 10.00 KB
        access0.log.gz 123 bytes
        access1.log 10.00 KB
        access1.log.gz 125 bytes