smart_open: a Python library for efficient streaming access to (very) large files
smart_open provides efficient streaming of (very) large files, compressed or uncompressed, local or in the cloud (S3, HDFS, gzip, bz2, ...).
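smart_open is available on PyPI; the examples below assume the package has been installed and imported:

$ pip install smart_open
>>> import smart_open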
>>> # stream lines from an S3 object
>>> for line in smart_open.smart_open('s3://mybucket/mykey.txt'):
...     print(line)

>>> # can use context managers too:
>>> with smart_open.smart_open('s3://mybucket/mykey.txt') as fin:
...     for line in fin:
...         print(line)
...     fin.seek(0)  # seek back to the beginning
...     print(fin.read(1000))  # read 1000 bytes
>>> # stream from HDFS
>>> for line in smart_open.smart_open('hdfs://user/hadoop/my_file.txt'):
...     print(line)
>>> # stream content into S3 (write mode):
>>> with smart_open.smart_open('s3://mybucket/mykey.txt', 'wb') as fout:
...     for line in ['first line', 'second line', 'third line']:
...         fout.write((line + '\n').encode('utf8'))  # 'wb' mode expects bytes
>>> # stream from/to local compressed files:
>>> for line in smart_open.smart_open('./foo.txt.gz'):
...     print(line)
>>> with smart_open.smart_open('/home/radim/foo.txt.bz2', 'wb') as fout:
...     fout.write(b"some content\n")  # bz2 compression applied transparently
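Because compression is inferred from the file extension, the transports and codecs compose: a gzip-compressed object on S3 streams just like the examples above. A minimal sketch (the bucket and key here are hypothetical):

>>> # decompression is inferred from the .gz extension
>>> for line in smart_open.smart_open('s3://mybucket/mykey.txt.gz'):
...     print(line)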