In this interactive tutorial, we'll cover many essential Python idioms and techniques in depth, adding immediately useful tools to your belt.
There are 3 versions of this presentation:
©2006-2008, licensed under a Creative Commons Attribution/Share-Alike (BY-SA) license.
My credentials: I am
In the tutorial I presented at PyCon 2006 (called Text & Data Processing), I was surprised at the reaction to some techniques I used that I had thought were common knowledge. But many of the attendees were unaware of these tools that experienced Python programmers use without thinking.
Many of you will have seen some of these techniques and idioms before. Hopefully you'll learn a few techniques that you haven't seen before and maybe something new about the ones you have already seen.
These are the guiding principles of Python, but are open to interpretation. A sense of humor is required for their proper interpretation.
If you're using a programming language named after a sketch comedy troupe, you had better have a sense of humor.
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than right now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
-- Tim Peters
This particular "poem" began as a kind of a joke, but it really embeds a lot of truth about the philosophy behind Python. The Zen of Python has been formalized in PEP 20, where the abstract reads:
Long time Pythoneer Tim Peters succinctly channels the BDFL's guiding principles for Python's design into 20 aphorisms, only 19 of which have been written down.
—www.python.org/dev/peps/pep-0020/
You can decide for yourself if you're a "Pythoneer" or a "Pythonista". The terms have somewhat different connotations.
When in doubt:
import this
Try it in a Python interactive interpreter:
>>> import this
Here's another easter egg:
>>> from __future__ import braces
  File "<stdin>", line 1
SyntaxError: not a chance
What a bunch of comedians! :-)
Programs must be written for people to read, and only incidentally for machines to execute.
—Abelson & Sussman, Structure and Interpretation of Computer Programs
Worthwhile reading:
www.python.org/dev/peps/pep-0008/
PEP = Python Enhancement Proposal
A PEP is a design document providing information to the Python community, or describing a new feature for Python or its processes or environment.
The Python community has its own standards for what source code should look like, codified in PEP 8. These standards are different from those of other communities, like C, C++, C#, Java, Visual Basic, etc.
Because indentation and whitespace are so important in Python, the Style Guide for Python Code approaches a standard. It would be wise to adhere to the guide! Most open-source projects and (hopefully) in-house projects follow the style guide quite closely.
4 spaces per indentation level.
No hard tabs.
Never mix tabs and spaces.
This is exactly what IDLE and the Emacs Python mode support. Other editors may also provide this support.
One blank line between functions.
Two blank lines between classes.
def make_squares(key, value=0):
    """Return a dictionary and a list..."""
    d = {key: value}
    l = [key, value]
    return d, l
joined_lower for functions, methods, attributes
joined_lower or ALL_CAPS for constants
StudlyCaps for classes
camelCase only to conform to pre-existing conventions
Attributes: interface, _internal, __private
But try to avoid the __private form. I never use it. Trust me. If you use it, you WILL regret it later.
Explanation:
People coming from a C++/Java background are especially prone to overusing/misusing this "feature". But __private names don't work the same way as in Java or C++. They just trigger a name mangling whose purpose is to prevent accidental namespace collisions in subclasses: MyClass.__private just becomes MyClass._MyClass__private. (Note that even this breaks down for subclasses with the same name as the superclass, e.g. subclasses in different modules.) It is possible to access __private names from outside their class, just inconvenient and fragile (it adds a dependency on the exact name of the superclass).
The problem is that the author of a class may legitimately think "this attribute/method name should be private, only accessible from within this class definition" and use the __private convention. But later on, a user of that class may make a subclass that legitimately needs access to that name. So either the superclass has to be modified (which may be difficult or impossible), or the subclass code has to use manually mangled names (which is ugly and fragile at best).
There's a concept in Python: "we're all consenting adults here". If you use the __private form, who are you protecting the attribute from? It's the responsibility of subclasses to use attributes from superclasses properly, and it's the responsibility of superclasses to document their attributes properly.
It's better to use the single-leading-underscore convention, _internal. This isn't name mangled at all; it just indicates to others to "be careful with this, it's an internal implementation detail; don't touch it if you don't fully understand it". It's only a convention though.
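To make the mangling concrete, here is a minimal sketch. The class and attribute names (Account, SavingsAccount, _balance, __secret) are invented for illustration and are not from the tutorial. It shows where the double-underscore attribute actually ends up, and what a subclass has to write to reach it:

```python
class Account(object):
    def __init__(self):
        self._balance = 10      # single underscore: internal by convention only
        self.__secret = 'abc'   # double underscore: stored as _Account__secret


class SavingsAccount(Account):
    def read_secret(self):
        # Works, but hard-codes the superclass name, which is fragile.
        return self._Account__secret

    def read_secret_naively(self):
        # Inside this class body, __secret is mangled to
        # _SavingsAccount__secret, which was never set: AttributeError.
        return self.__secret


acct = SavingsAccount()

# The instance dictionary shows the mangled name that was really stored.
mangled_names = sorted(name for name in vars(acct) if 'secret' in name)
```

The _balance attribute, by contrast, is reachable from the subclass under its own name, which is why the single-underscore convention tends to age better.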
There are some good explanations in the answers here:
Keep lines below 80 characters in length.
Use implied line continuation inside parentheses/brackets/braces:
def __init__(self, first, second, third,
             fourth, fifth, sixth):
    output = (first + second + third
              + fourth + fifth + sixth)
Use backslashes as a last resort:
VeryLong.left_hand_side \
    = even_longer.right_hand_side()
Adjacent literal strings are concatenated by the parser:
>>> print 'o' 'n' "e"
one
The spaces between literals are not required, but help with readability. Any type of quoting can be used:
>>> print 't' r'\/\/' """o"""
t\/\/o
The string prefixed with an "r" is a "raw" string. Backslashes are not evaluated as escapes in raw strings. They're useful for regular expressions and Windows filesystem paths.
Note named string objects are not concatenated:
>>> a = 'three'
>>> b = 'four'
>>> a b
  File "<stdin>", line 1
    a b
      ^
SyntaxError: invalid syntax
That's because this automatic concatenation is a feature of the Python parser/compiler, not the interpreter. You must use the "+" operator to concatenate strings at run time.
text = ('Long strings can be made up '
        'of several shorter strings.')
The parentheses allow implicit line continuation.
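A small sketch of the difference, with a made-up regular expression as the example. Adjacent literals, even with different quoting or an "r" prefix, are merged when the source is compiled, while named strings have to be joined with "+" at run time:

```python
# Adjacent literals are merged at compile time; no "+" is involved.
pattern = ('\\d+'       # ordinary literal: the backslash is escaped
           r'\.\d+')    # raw literal: backslashes are kept as written

a = 'three'
b = 'four'
joined = a + b          # named strings need "+" at run time
```

Here pattern ends up as the single string \d+\.\d+, exactly as if it had been written as one raw literal.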
Multiline strings use triple quotes:
"""Triple
double
quotes"""
'''\
Triple
single
quotes\
'''
Good:
if foo == 'blah':
    do_something()
do_one()
do_two()
do_three()
Bad:
if foo == 'blah': do_something()
do_one(); do_two(); do_three()
Whitespace and indentation are useful visual indicators of the program flow. The indentation of the second "Good" line above shows the reader that something's going on, whereas the lack of indentation in "Bad" hides the "if" statement.
Multiple statements on one line are a cardinal sin. In Python, readability counts.
Docstrings = How to use code
Comments = Why (rationale) & how code works
Docstrings explain how to use code, and are for the users of your code. Uses of docstrings:
Comments explain why, and are for the maintainers of your code. Examples include notes to yourself, like:
# !!! BUG: ...
# !!! FIX: This is a hack
# ??? Why is this here?
Both of these groups include you, so write good docstrings and comments!
Docstrings are useful in interactive use (help()) and for auto-documentation systems.
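As a small illustration (the function is made up for this example), here is a docstring aimed at users next to a comment aimed at maintainers. The docstring is the text that help() displays, and it is also available as the __doc__ attribute:

```python
def rectangle_area(width, height):
    """Return the area of a rectangle with the given width and height."""
    # Convert to float so that string inputs such as '2.5' also work.
    return float(width) * float(height)


usage_text = rectangle_area.__doc__   # the text help(rectangle_area) shows
```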
False comments & docstrings are worse than none at all. So keep them up to date! When you make changes, make sure the comments & docstrings are consistent with the code, and don't contradict it.
There's an entire PEP about docstrings, PEP 257, "Docstring Conventions":
www.python.org/dev/peps/pep-0257/
A foolish consistency is the hobgoblin of little minds.
—Ralph Waldo Emerson
(hobgoblin: Something causing superstitious fear; a bogy.)
There are always exceptions. From PEP 8:
But most importantly: know when to be inconsistent -- sometimes the style guide just doesn't apply. When in doubt, use your best judgment. Look at other examples and decide what looks best. And don't hesitate to ask!
Two good reasons to break a particular rule:
- When applying the rule would make the code less readable, even for someone who is used to reading code that follows the rules.
- To be consistent with surrounding code that also breaks it (maybe for historic reasons) -- although this is also an opportunity to clean up someone else's mess (in true XP style).
... but practicality shouldn't beat purity to a pulp!
A selection of small, useful idioms.
Now we move on to the meat of the tutorial: lots of idioms.
We'll start with some easy ones and work our way up.
In other languages:
temp = a
a = b
b = temp
In Python:
b, a = a, b
The right-hand side is unpacked into the names in the tuple on the left-hand side.
Further examples of unpacking:
>>> l = ['David', 'Pythonista', '+1-514-555-1234']
>>> name, title, phone = l
>>> name
'David'
>>> title
'Pythonista'
>>> phone
'+1-514-555-1234'
Useful in loops over structured data:
l (L) above is the list we just made (David's info). So people is a list containing two items, each a 3-item list.
>>> people = [l, ['Guido', 'BDFL', 'unlisted']]
>>> for (name, title, phone) in people:
...     print name, phone
...
David +1-514-555-1234
Guido unlisted
Each item in people is being unpacked into the (name, title, phone) tuple.
Arbitrarily nestable (just be sure to match the structure on the left & right!):
>>> david, (gname, gtitle, gphone) = people
>>> gname
'Guido'
>>> gtitle
'BDFL'
>>> gphone
'unlisted'
>>> david
['David', 'Pythonista', '+1-514-555-1234']
>>> 1,
(1,)
>>> (1,)
(1,)
>>> (1)
1
>>> ()
()
>>> tuple()
()
>>> value = 1,
>>> value
(1,)
This is a really useful feature that surprisingly few people know.
In the interactive interpreter, whenever you evaluate an expression or call a function, the result is bound to a temporary name, _ (an underscore):
>>> 1 + 1
2
>>> _
2
_ stores the last printed expression.
When a result is None, nothing is printed, so _ doesn't change. That's convenient!
This only works in the interactive interpreter, not within a module.
It is especially useful when you're working out a problem interactively, and you want to store the result for a later step:
>>> import math
>>> math.pi / 3
1.0471975511965976
>>> angle = _
>>> math.cos(angle)
0.50000000000000011
>>> _
0.50000000000000011
colors = ['red', 'blue', 'green', 'yellow']
Don't do this:
result = ''
for s in colors:
    result += s
This is very inefficient.
It has terrible memory usage and performance patterns. The "summation" will compute, store, and then throw away each intermediate step.
Instead, do this:
result = ''.join(colors)
The join() string method does all the copying in one pass.
When you're only dealing with a few dozen or a few hundred strings, it won't make much difference. But get in the habit of building strings efficiently, because with thousands of strings, or inside loops, it will make a difference.
If you want spaces between your substrings:
result = ' '.join(colors)
Or commas and spaces:
result = ', '.join(colors)
Here's a common case:
colors = ['red', 'blue', 'green', 'yellow']
print 'Choose', ', '.join(colors[:-1]), \
      'or', colors[-1]
To make a nicely grammatical sentence, we want commas between all but the last pair of values, where we want the word "or". The slice syntax does the job. The "slice until -1" ([:-1]) gives all but the last value, which we join with comma-space.
Of course, this code doesn't handle the corner cases: lists of length 0 or 1.
Choose red, blue, green or yellow
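One possible way to cover those corner cases is sketched below. The helper name choose_sentence and the decision to return a bare 'Choose' for an empty list are assumptions made for this example, not part of the tutorial. It builds the string with join and "+" instead of print so that it runs unchanged on Python 2 and 3:

```python
def choose_sentence(colors):
    """Return 'Choose a, b or c' for a list of any length (hypothetical helper)."""
    if not colors:
        # Assumption: with nothing to offer, return the bare prompt.
        return 'Choose'
    if len(colors) == 1:
        return 'Choose ' + colors[0]
    # Commas between all but the last pair, "or" before the last value.
    return 'Choose ' + ', '.join(colors[:-1]) + ' or ' + colors[-1]


sentence = choose_sentence(['red', 'blue', 'green', 'yellow'])
```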
If you need to apply a function to generate the substrings:
result = ''.join(fn(i) for i in items)
If you need to compute the substrings incrementally, accumulate them in a list first:
items = []
...
items.append(item)  # many times
...
# items is now complete
result = ''.join(fn(i) for i in items)
Good:
for key in d:
    print key
Bad:
for key in d.keys():
    print key
But .keys() is necessary when mutating the dictionary:
for key in d.keys():
    d[str(key)] = d[key]
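A note for readers on newer interpreters: in Python 2, d.keys() returns a new list, which is why the loop above is safe. In Python 3, d.keys() is a live view, and adding keys while looping over it raises a RuntimeError. A sketch that takes an explicit snapshot with list(d) behaves the same way on both (the sample dictionary is made up):

```python
d = {1: 'one', 2: 'two'}

# Snapshot the keys first so that adding entries cannot disturb the loop.
for key in list(d):
    d[str(key)] = d[key]
```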
For consistency, use key in dict, not dict.has_key():
# do this:
if key in d:
    ...do something with d[key]

# not this:
if d.has_key(key):
    ...do something with d[key]
We often have to initialize dictionary entries before use:
navs = {}
for (portfolio, equity, position) in data:
    if portfolio not in navs:
        navs[portfolio] = 0
    navs[portfolio] += position * prices[equity]
dict.get(key, default) removes the need for the test:
navs = {}
for (portfolio, equity, position) in data:
    navs[portfolio] = (navs.get(portfolio, 0)
                       + position * prices[equity])
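To see the dict.get version run end to end, here is a sketch with made-up sample values for prices and data. The tutorial does not define them, so the ticker names and numbers are purely illustrative:

```python
# Hypothetical sample inputs: price per equity, and (portfolio, equity, position).
prices = {'ACME': 10.0, 'INIT': 2.5}
data = [('growth', 'ACME', 3), ('income', 'INIT', 4), ('growth', 'INIT', 2)]

navs = {}
for (portfolio, equity, position) in data:
    # get() supplies 0 the first time a portfolio is seen.
    navs[portfolio] = navs.get(portfolio, 0) + position * prices[equity]
```

With these inputs, 'growth' accumulates 3 * 10.0 + 2 * 2.5 and 'income' accumulates 4 * 2.5.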
Initializing mutable dictionary values:
equities = {}
for (portfolio, equity) in data:
    if portfolio in equities:
        equities[portfolio].append(equity)
    else:
        equities[portfolio] = [equity]
dict.setdefault(key, default) does the job much more efficiently:
equities = {}
for (portfolio, equity) in data:
    equities.setdefault(portfolio, []).append(equity)
dict.setdefault() is equivalent to "get, or set & get". Or "set if necessary, then get". It's especially efficient if your dictionary key is expensive to compute or long to type.
The only problem with dict.setdefault() is that the default value is always evaluated, whether needed or not. That only matters if the default value is expensive to compute.
If the default value is expensive to compute, you may want to use the defaultdict class, which we'll cover shortly.
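The "always evaluated" point is easy to miss, so here is a small sketch. The function expensive_default is a hypothetical stand-in for a costly computation; it records each call so we can count them:

```python
calls = []


def expensive_default():
    """Stand-in for a costly computation (hypothetical); records each call."""
    calls.append(1)
    return []


d = {'a': [1]}
d.setdefault('a', expensive_default()).append(2)  # key exists; default still built
d.setdefault('b', expensive_default()).append(3)  # key missing; default is stored

times_called = len(calls)
```

The default is built on both calls, because the argument is evaluated before setdefault ever looks at the dictionary.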
setdefault can also be used as a stand-alone statement:
navs = {}
for (portfolio, equity, position) in data:
    navs.setdefault(portfolio, 0)
    navs[portfolio] += position * prices[equity]
New in Python 2.5.
defaultdict is new in Python 2.5, part of the collections module. A defaultdict is identical to a regular dictionary, except for two things:
There are two ways to get defaultdict:
Import the collections module and reference it via the module:
import collections
d = collections.defaultdict(...)
Or import the defaultdict name directly:
from collections import defaultdict
d = defaultdict(...)
from collections import defaultdict

equities = defaultdict(list)
for (portfolio, equity) in data:
    equities[portfolio].append(equity)
There's no fumbling around at all now. In this case, the default factory function is list, which returns an empty list.
This is how to get a dictionary with default values of 0: use int as a default factory function:
navs = defaultdict(int)
for (portfolio, equity, position) in data:
    navs[portfolio] += position * prices[equity]
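Here is a self-contained sketch of both factories at work. The sample data is invented for illustration. It also shows one behavior worth knowing about: merely looking up a missing key creates an entry for it.

```python
from collections import defaultdict

# Hypothetical sample input: (portfolio, equity) pairs.
data = [('growth', 'ACME'), ('income', 'INIT'), ('growth', 'INIT')]

equities = defaultdict(list)        # missing keys start as []
for (portfolio, equity) in data:
    equities[portfolio].append(equity)

counts = defaultdict(int)           # missing keys start as 0
for (portfolio, equity) in data:
    counts[portfolio] += 1

# A plain lookup of a missing key calls the factory and stores the result.
missing = equities['unknown']
```

If that auto-creation is unwanted, test with "key in d" or use d.get(key), neither of which calls the factory.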
given = ['John', 'Eric', 'Terry', 'Michael']
family = ['Cleese', 'Idle', 'Gilliam', 'Palin']
pythons = dict(zip(given, family))
>>> import pprint
>>> pprint.pprint(pythons)
{'John': 'Cleese',
 'Michael': 'Palin',
 'Eric': 'Idle',
 'Terry': 'Gilliam'}
>>> pythons.keys()
['John', 'Michael', 'Eric', 'Terry']
>>> pythons.values()
['Cleese', 'Palin', 'Idle', 'Gilliam']
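Notice that the keys and values in the session above come back in matching order, even though that order differs from the order of the input lists. A quick sketch of what that guarantee gives us: zipping keys() and values() back together reproduces the original pairs. Sorting is used here only so the comparison does not depend on dictionary ordering.

```python
given = ['John', 'Eric', 'Terry', 'Michael']
family = ['Cleese', 'Idle', 'Gilliam', 'Palin']

pythons = dict(zip(given, family))

# keys() and values() correspond item by item, so zipping them
# recovers the same (given, family) pairs we started with.
pairs = sorted(zip(pythons.keys(), pythons.values()))
original_pairs = sorted(zip(given, family))
```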
# do this:        # not this:
if x:             if x == True:
    pass              pass
Testing a list:
# do this:        # not this:
if items:         if len(items) != 0:
    pass              pass

                  # and definitely not this:
                  if items != []:
                      pass
False | True |
---|---|
False (== 0) | True (== 1) |
"" (empty string) | any string but "" (" ", "anything") |
0, 0.0 | any number but 0 (1, 0.1, -1, 3.14) |
[], (), {}, set() | any non-empty container ([0], (None,), ['']) |
None | almost any object that's not explicitly False |
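The table can be checked directly. This sketch collects the values from each column and applies the same truth test that "if x:" uses:

```python
falsy = [False, 0, 0.0, '', [], (), {}, set(), None]
truthy = [True, 1, -1, 3.14, ' ', 'anything', [0], (None,), ['']]

# any() and all() apply the same truth test as "if x:".
all_falsy = not any(falsy)
all_truthy = all(truthy)
```

Note the last few entries of truthy: a container is true as soon as it is non-empty, even if everything inside it is false.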
Example of an object's truth value:
>>> class C:
...     pass
...
>>> o = C()
>>> bool(o)
True
>>> bool(C)
True
(Examples: execute truth.py.)
To control the truth value of instances of a user-defined class, use the __nonzero__ or __len__ special methods. Use __len__ if your class is a container which has a length:
class MyContainer(object):
    def __init__(self, data):
        self.data = data
    def __len__(self):
        """Return my length."""
        return len(self.data)
If your class is not a container, use __nonzero__:
class MyClass(object):
    def __init__(self, value):
        self.value = value
    def __nonzero__(self):
        """Return my truth value (True or False)."""
        # This could be arbitrarily complex:
        return bool(self.value)
In Python 3.0, __nonzero__ has been renamed to __bool__ for consistency with the bool built-in type. For compatibility, add this to the class definition:
__bool__ = __nonzero__
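Putting the pieces together, here is a sketch of both classes with the compatibility alias in place, so that instances have the same truth value under Python 2 and Python 3:

```python
class MyClass(object):
    def __init__(self, value):
        self.value = value

    def __nonzero__(self):          # consulted by Python 2
        """Return my truth value (True or False)."""
        return bool(self.value)

    __bool__ = __nonzero__          # consulted by Python 3


class MyContainer(object):
    def __init__(self, data):
        self.data = data

    def __len__(self):              # a length of 0 means false
        """Return my length."""
        return len(self.data)


empty_is_false = not MyContainer([])
zero_is_false = not MyClass(0)
```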
>>> items = 'zero one two three'.split()
>>> print items
['zero', 'one', 'two', 'three']
Say we want to iterate over the items, and we need both the item's index and the item itself:
                        - or -
i = 0
for item in items:      for i in range(len(items)):
    print i, item           print i, items[i]
    i += 1
The enumerate function takes a list and returns (index, item) pairs:
>>> print list(enumerate(items))
[(0, 'zero'), (1, 'one'), (2, 'two'), (3, 'three')]
Our loop becomes much simpler:
for (index, item) in enumerate(items):
    print index, item
# compare:              # compare:
index = 0               for i in range(len(items)):
for item in items:          print i, items[i]
    print index, item
    index += 1
The enumerate version is much shorter and simpler than the version on the left, and much easier to read and understand than either.
An example showing how the enumerate function actually returns an iterator (a generator is a kind of iterator):
>>> enumerate(items)
<enumerate object at 0x011EA1C0>
>>> e = enumerate(items)
>>> e.next()
(0, 'zero')
>>> e.next()
(1, 'one')
>>> e.next()
(2, 'two')
>>> e.next()
(3, 'three')
>>> e.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration
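The e.next() method shown above is Python 2 spelling. The built-in next() function, available from Python 2.6 onward and in Python 3, does the same job. Here is a sketch of the same walk through the iterator, using next() with a default value to avoid the StopIteration at the end:

```python
items = 'zero one two three'.split()

e = enumerate(items)
first = next(e)              # pulls one (index, item) pair
second = next(e)
rest = list(e)               # consumes whatever is left
exhausted = next(e, None)    # a default is returned instead of raising
```

Because the pairs are produced one at a time, enumerate never builds the whole list of pairs unless we ask for it, as list(e) does here.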