smart_open: a Python library for efficient streaming access to (very) large files


A Python library for efficient streaming access to (very) large files, supporting compressed and uncompressed files both in the cloud and locally: S3, HDFS, gzip, bz2...
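smart_open installs from PyPI (the package name there is smart_open); all of the examples below assume the module has been imported:

$ pip install smart_open

>>> import smart_open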

>>> # stream lines from an S3 object
>>> for line in smart_open.smart_open('s3://mybucket/mykey.txt'):
...     print(line)

>>> # can use context managers too:
>>> with smart_open.smart_open('s3://mybucket/mykey.txt') as fin:
...     for line in fin:
...         print(line)
...     fin.seek(0)  # seek to the beginning
...     print(fin.read(1000))  # read 1000 bytes

>>> # stream from HDFS
>>> for line in smart_open.smart_open('hdfs://user/hadoop/my_file.txt'):
...     print(line)

>>> # stream content into S3 (write mode):
>>> with smart_open.smart_open('s3://mybucket/mykey.txt', 'wb') as fout:
...     for line in ['first line', 'second line', 'third line']:
...         fout.write((line + '\n').encode('utf8'))  # encode to bytes for the binary stream

>>> # stream from/to local compressed files:
>>> for line in smart_open.smart_open('./foo.txt.gz'):
...     print(line)

>>> with smart_open.smart_open('/home/radim/foo.txt.bz2', 'wb') as fout:
...     fout.write(b"some content\n")
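Since each of these calls returns an ordinary file-like object, readers and writers compose freely. Below is a minimal sketch of that idea (the bucket and file names are placeholders, not from the original post): copy a local gzipped file into S3 line by line, decompressing on read and uploading on write, so the whole file never sits in memory at once.

>>> # hedged sketch -- placeholder paths and bucket name throughout
>>> with smart_open.smart_open('./big_log.txt.gz') as fin:
...     with smart_open.smart_open('s3://mybucket/big_log.txt', 'wb') as fout:
...         for line in fin:  # fin yields decompressed bytes, one line at a time
...             fout.write(line)  # streamed to S3 in chunks, never buffered whole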

Project homepage: http://www.baiduhome.net/lib/view/home/1422349535814
