Kiba: A Lightweight ETL Tool for Ruby
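Writing data-processing code that is reliable, concise, well tested, and maintainable is tricky. Kiba lets you define and run high-quality ETL (Extract-Transform-Load) jobs easily in Ruby.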
Kiba provides you with a DSL to define ETL jobs:
    require 'date'
    require 'yaml'

    # declare a ruby method here, for quick reusable logic
    def parse_french_date(date)
      Date.strptime(date, '%d/%m/%Y')
    end

    # or better, include a ruby file which loads reusable assets
    # eg: commonly used sources / destinations / transforms, under unit-test
    require_relative 'common'

    # declare a pre-processor: a block called before the first row is read
    pre_process do
      # do something
    end

    # declare a source where to take data from (you implement it - see notes below)
    source MyCsvSource, 'input.csv'

    # declare a row transform to process a given field
    transform do |row|
      row[:birth_date] = parse_french_date(row[:birth_date])
      # return to keep in the pipeline
      row
    end

    # declare another row transform, dismissing rows conditionally by returning nil
    transform do |row|
      row[:birth_date].year < 2000 ? row : nil
    end

    # declare a row transform as a class, which can be tested properly
    transform ComplianceCheckTransform, eula: 2015

    # before declaring a definition, maybe you'll want to retrieve credentials
    config = YAML.load_file('config.yml')

    # declare a destination - like source, you implement it (see below)
    destination MyDatabaseDestination, config['my_database']

    # declare a post-processor: a block called after all rows are successfully processed
    post_process do
      # do something
    end
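
The job definition above refers to `MyCsvSource`, `ComplianceCheckTransform` and `MyDatabaseDestination`, which you are expected to write yourself. The following is a minimal sketch of what such classes might look like, assuming Kiba's conventions: a source exposes `each` and yields one row per call, a class-based transform exposes `process(row)`, and a destination exposes `write(row)` and `close`. The `:eula_year` column, the `InMemoryDestination` class (standing in for a real database destination) and the `run_pipeline` helper are hypothetical and exist only so the sketch runs without the Kiba gem; `run_pipeline` is not Kiba's runner.

```ruby
require 'csv'

# Hypothetical source: built with the arguments passed to `source`,
# then asked to yield each row (here, a Hash with symbol keys).
class MyCsvSource
  def initialize(path)
    @path = path
  end

  def each
    CSV.foreach(@path, headers: true, header_converters: :symbol) do |row|
      yield row.to_h
    end
  end
end

# Hypothetical class-based transform: return the row to keep it in the
# pipeline, or nil to dismiss it. The :eula_year column is an assumption.
class ComplianceCheckTransform
  def initialize(options = {})
    @eula = options.fetch(:eula)
  end

  def process(row)
    row[:eula_year].to_i >= @eula ? row : nil
  end
end

# Hypothetical destination that collects rows in memory instead of
# writing them to a database.
class InMemoryDestination
  attr_reader :rows

  def initialize
    @rows = []
  end

  def write(row)
    @rows << row
  end

  def close
    # a real destination would flush buffers and release its connection here
  end
end

# Hand-rolled stand-in for a runner, for illustration only:
# each row goes through every transform until one of them returns nil.
def run_pipeline(source, transforms, destination)
  source.each do |row|
    row = transforms.reduce(row) { |current, t| current && t.process(current) }
    destination.write(row) if row
  end
  destination.close
end
```

Because each class is a plain Ruby object, it can be unit-tested in isolation, for example by calling `ComplianceCheckTransform.new(eula: 2015).process(row)` on a hand-built row and checking the result.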