11 Python Libraries You Might Not Know

Posted by jopen 10 years ago | 11K reads | Python

This article focuses on lesser-known Python libraries. Even Python veterans should take a look; there may be one or two you have never come across.

1) delorean

Delorean is a really cool date/time library. Apart from having a sweet name, it's one of the more natural-feeling date/time munging libraries I've used in Python. It's sort of like Moment.js in JavaScript, except I laugh every time I import it. The docs are also good and, in addition to being technically helpful, they make countless Back to the Future references.

from delorean import Delorean
EST = "US/Eastern"
d = Delorean(timezone=EST) 

 

2) prettytable

There's a chance you haven't heard of prettytable because it's listed on GoogleCode, which is basically the coding equivalent of Siberia.

Despite being exiled to a cold, snowy and desolate place, prettytable is great for constructing output that looks good in the terminal or in the browser. So if you're working on a new plug-in for the IPython Notebook, check out prettytable for your HTML __repr__.

from prettytable import PrettyTable
table = PrettyTable(["animal", "ferocity"])
table.add_row(["wolverine", 100])
table.add_row(["grizzly", 87])
table.add_row(["Rabbit of Caerbannog", 110])
table.add_row(["cat", -1])
table.add_row(["platypus", 23])
table.add_row(["dolphin", 63])
table.add_row(["albatross", 44])
table.sortby = "ferocity"  # sort rows by the ferocity column
table.reversesort = True   # highest ferocity first
print(table)
+----------------------+----------+
|        animal        | ferocity |
+----------------------+----------+
| Rabbit of Caerbannog |   110    |
|      wolverine       |   100    |
|       grizzly        |    87    |
|       dolphin        |    63    |
|      albatross       |    44    |
|       platypus       |    23    |
|         cat          |    -1    |
+----------------------+----------+ 

3) snowballstemmer

Ok so the first time I installed snowballstemmer, it was because I thought the name was cool. But it's actually a pretty slick little library. snowballstemmer will stem words in 15 different languages and also comes with a porter stemmer to boot.

from snowballstemmer import EnglishStemmer, SpanishStemmer
EnglishStemmer().stemWord("Gregory")

Gregori

SpanishStemmer().stemWord("amarillo")

amarill

4) wget

Remember every time you wrote that web crawler for some specific purpose? Turns out somebody built it...and it's called wget. Recursively download a website? Grab every image from a page? Sidestep cookie traces? Done, done, and done.

Movie Mark Zuckerberg even says it himself:

First up is Kirkland, they keep everything open and allow indexes on their apache configuration, so a little wget magic is enough to download the entire Kirkland facebook. Kid stuff!

 

The Python version comes with just about every feature you could ask for and is easy to use.

import wget
wget.download("

100% [............................................................................] 280385 / 280385

Note that another option for Linux and OS X users would be: from sh import wget. However, the Python wget module does have better argument handling.

5) PyMC

I'm not sure how PyMC gets left out of the mix so often. scikit-learn seems to be everyone's darling (as it should, it's fantastic), but in my opinion, not enough love is given to PyMC.

# Uses the PyMC 2.x API
from pymc.examples import disaster_model
from pymc import MCMC
M = MCMC(disaster_model)
M.sample(iter=10000, burn=1000, thin=10)
[-----------------100%-----------------] 10000 of 10000 complete in 1.4 sec 

If you don't already know it, PyMC is a library for doing Bayesian analysis. It's featured heavily in Cam Davidson-Pilon's Bayesian Methods for Hackers and has made cameos on a lot of popular data science/python blogs, but it has never attracted a cult following like scikit-learn's.

6) sh

I can't risk you leaving this page and not knowing about sh. sh lets you import shell commands into Python as functions. It's super useful for doing things that are easy in bash but you can't remember how to do in Python (e.g. recursively searching for files).

from sh import find
find("/tmp")
/tmp/foo
/tmp/foo/file1.json
/tmp/foo/file2.json
/tmp/foo/file3.json
/tmp/foo/bar/file3.json 
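
One way to picture "shell commands as functions" is a thin wrapper around subprocess. The sketch below is only an illustration of the idea, not how sh is actually implemented; the helper name command is hypothetical, and the demo wraps the running Python interpreter so it does not depend on any particular shell tool being installed.

```python
import subprocess
import sys


def command(name):
    """Return a function that runs the executable `name` and returns its stdout."""
    def run(*args):
        result = subprocess.run(
            [name, *map(str, args)],
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
        return result.stdout
    return run


# Wrap the current interpreter so the demo runs on any platform.
# On Linux or OS X, command("find")("/tmp") would mirror the sh example.
python = command(sys.executable)
print(python("-c", "print('hello from a subprocess')"))
```

sh itself layers a lot more on top of this idea (keyword arguments mapped to flags, piping, background processes), which is why it is worth installing rather than hand-rolling.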

7) fuzzywuzzy

Ranking in the top 10 of simplest libraries I've ever used (if you have 2-3 minutes, you can read through the source), fuzzywuzzy is a fuzzy string matching library built by the fine people at SeatGeek.

fuzzywuzzy implements things like string comparison ratios, token ratios, and plenty of other matching metrics. It's great for creating feature vectors or matching up records in different databases.

from fuzzywuzzy import fuzz
fuzz.ratio("Hit me with your best shot", "Hit me with your pet shark")

85
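
To get a feel for what a comparison ratio and a token ratio measure, here is a rough standard-library stand-in built on difflib. The function names simple_ratio and token_sort_ratio are my own for this sketch; fuzzywuzzy's scorers do more normalization, so its scores may differ from these.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Similarity of two strings on a 0-100 scale."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def token_sort_ratio(a, b):
    """Compare strings after lowercasing and sorting their words."""
    def normalize(s):
        return " ".join(sorted(s.lower().split()))
    return simple_ratio(normalize(a), normalize(b))


# Character-level similarity: sensitive to word order and spelling.
print(simple_ratio("New York Mets", "New York Meats"))

# Word order no longer matters once the tokens are sorted.
print(token_sort_ratio("New York Mets", "mets new york"))
```

Scores like these can go straight into a feature vector, or be thresholded to decide whether two records from different databases refer to the same thing.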

8) progressbar

You know those scripts you have where you do a print "still going..." in that giant mess of a for loop you call your __main__? Yeah well instead of doing that, why don't you step up your game and start using progressbar?

progressbar does pretty much exactly what you think it does...makes progress bars. And while this isn't exactly a data science specific activity, it does put a nice touch on those extra long running scripts.

Alas, as another GoogleCode outcast, it's not getting much love (the docs have 2 spaces for indents...2!!!). Do what's right and give it a good ole pip install.

from progressbar import ProgressBar
import time
pbar = ProgressBar(maxval=10).start()  # start the bar before the first update()
for i in range(1, 11):
    pbar.update(i)
    time.sleep(1)
pbar.finish()

60% |######################################################## |

9) colorama

So while you're making your logs have nice progress bars, why not also make them colorful! It can actually be helpful for reminding yourself when things are going horribly wrong.

colorama is super easy to use. Just pop it into your scripts and prepend a color to any text you want to print:
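
A minimal sketch of that, assuming colorama's Fore, Style and init names. The colorize helper is hypothetical, and the except branch is my own addition so the snippet still runs where colorama is not installed; it falls back to the plain ANSI escape sequences that those constants stand for.

```python
try:
    from colorama import Fore, Style, init
    init()  # on Windows, wraps stdout so ANSI codes are translated
    RED, GREEN, RESET = Fore.RED, Fore.GREEN, Style.RESET_ALL
except ImportError:
    # Fallback: the raw ANSI escape sequences for red, green and reset
    RED, GREEN, RESET = "\033[31m", "\033[32m", "\033[0m"


def colorize(text, color):
    """Wrap text in a color code and reset styling afterwards."""
    return color + text + RESET


print(colorize("all systems go", GREEN))
print(colorize("something went horribly wrong", RED))
```

Resetting after every message keeps one red error line from bleeding into everything that gets logged after it.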

10) uuid

I'm of the mind that there are really only a few tools one needs in programming: hashing, key/value stores, and universally unique ids. uuid is the built in Python UUID library. It implements versions 1, 3, 4, and 5 of the UUID standards and is really handy for doing things like...err...ensuring uniqueness.

That might sound silly, but how many times have you had records for a marketing campaign, or an e-mail drop and you want to make sure everyone gets their own promo code or id number?

And if you're worried about running out of ids, then fear not! A version 4 UUID carries 122 random bits, so there are roughly 5.3 x 10^36 possible values.

import uuid
print(uuid.uuid4())

e7bafa3d-274e-4b0a-b9cc-d898957b4b61
Well if you were a uuid you probably would be.
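
For the promo-code case mentioned above, a name-based UUID is one option. This is a sketch under stated assumptions: the campaign name, the example.com addresses and the promo_code helper are all made up for illustration. It uses uuid5, which derives the id from its input, so re-running the job hands each recipient the same code.

```python
import uuid

# Hypothetical namespace for one campaign; any fixed UUID works here.
CAMPAIGN_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "spring-sale.example.com")


def promo_code(email):
    """Derive a stable, per-recipient code from an e-mail address."""
    # uuid5 hashes (namespace, name) with SHA-1, so the same input always
    # yields the same UUID, while different inputs yield different ones.
    return uuid.uuid5(CAMPAIGN_NAMESPACE, email.strip().lower())


recipients = ["ada@example.com", "grace@example.com", "ADA@example.com "]
for address in recipients:
    print(address, promo_code(address))

# uuid4 is random instead: a fresh value on every call.
print(uuid.uuid4())
```

Note that a uuid5 code can be recomputed by anyone who knows the namespace and the address, so uuid4 is the better fit when codes must be unguessable.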


11) bashplotlib

Shameless self-promotion here, bashplotlib is one of my creations. It lets you plot histograms and scatterplots using stdin. So while you might not find it replacing ggplot or matplotlib as your everyday plotting library, the novelty value is quite high. At the very least, use it as a way to spruce up your logs a bit.

$ pip install bashplotlib
$ scatter --file data/texas.txt --pch x 
via: http://blog.yhathq.com/posts/11-python-libraries-you-might-not-know.html
