Diffstat (limited to 'Doc/howto')
-rw-r--r--  Doc/howto/advocacy.rst          |    3
-rw-r--r--  Doc/howto/argparse.rst          |  764
-rw-r--r--  Doc/howto/cporting.rst          |  117
-rw-r--r--  Doc/howto/curses.rst            |    6
-rw-r--r--  Doc/howto/descriptor.rst        |   14
-rw-r--r--  Doc/howto/doanddont.rst         |  290
-rw-r--r--  Doc/howto/functional.rst        |  488
-rw-r--r--  Doc/howto/index.rst             |    5
-rw-r--r--  Doc/howto/logging-cookbook.rst  | 1672
-rw-r--r--  Doc/howto/logging.rst           | 1053
-rwxr-xr-x  Doc/howto/logging_flow.png      | bin 0 -> 49648 bytes
-rw-r--r--  Doc/howto/pyporting.rst         |  715
-rw-r--r--  Doc/howto/regex.rst             |   79
-rw-r--r--  Doc/howto/sockets.rst           |   62
-rw-r--r--  Doc/howto/sorting.rst           |   18
-rw-r--r--  Doc/howto/unicode.rst           |  286
-rw-r--r--  Doc/howto/urllib2.rst           |   39
-rw-r--r--  Doc/howto/webservers.rst        |   14
18 files changed, 4721 insertions, 904 deletions
diff --git a/Doc/howto/advocacy.rst b/Doc/howto/advocacy.rst
index e67e201702..2969d266ad 100644
--- a/Doc/howto/advocacy.rst
+++ b/Doc/howto/advocacy.rst
@@ -264,8 +264,7 @@ the organizations that use Python.
**What are the restrictions on Python's use?**
-They're practically nonexistent. Consult the :file:`Misc/COPYRIGHT` file in the
-source distribution, or the section :ref:`history-and-license` for the full
+They're practically nonexistent. Consult :ref:`history-and-license` for the full
language, but it boils down to three conditions:
* You have to leave the copyright notice on the software; if you don't include
diff --git a/Doc/howto/argparse.rst b/Doc/howto/argparse.rst
new file mode 100644
index 0000000000..a134036802
--- /dev/null
+++ b/Doc/howto/argparse.rst
@@ -0,0 +1,764 @@
+*****************
+Argparse Tutorial
+*****************
+
+:author: Tshepang Lekhonkhobe
+
+.. _argparse-tutorial:
+
+This tutorial is intended to be a gentle introduction to :mod:`argparse`, the
+recommended command-line parsing module in the Python standard library.
+
+.. note::
+
+ There are two other modules that fulfill the same task, namely
+ :mod:`getopt` (an equivalent for :c:func:`getopt` from the C
+ language) and the deprecated :mod:`optparse`.
+ Note also that :mod:`argparse` is based on :mod:`optparse`,
+ and therefore very similar in terms of usage.
+
+
+Concepts
+========
+
+Let's show the sort of functionality that we are going to explore in this
+introductory tutorial by making use of the :command:`ls` command:
+
+.. code-block:: sh
+
+ $ ls
+ cpython devguide prog.py pypy rm-unused-function.patch
+ $ ls pypy
+ ctypes_configure demo dotviewer include lib_pypy lib-python ...
+ $ ls -l
+ total 20
+ drwxr-xr-x 19 wena wena 4096 Feb 18 18:51 cpython
+ drwxr-xr-x 4 wena wena 4096 Feb 8 12:04 devguide
+ -rwxr-xr-x 1 wena wena 535 Feb 19 00:05 prog.py
+ drwxr-xr-x 14 wena wena 4096 Feb 7 00:59 pypy
+ -rw-r--r-- 1 wena wena 741 Feb 18 01:01 rm-unused-function.patch
+ $ ls --help
+ Usage: ls [OPTION]... [FILE]...
+ List information about the FILEs (the current directory by default).
+ Sort entries alphabetically if none of -cftuvSUX nor --sort is specified.
+ ...
+
+A few concepts we can learn from the four commands:
+
+* The :command:`ls` command is useful when run without any options at all. It defaults
+ to displaying the contents of the current directory.
+
+* If we want more than what it provides by default, we tell it a bit more. In
+ this case, we want it to display a different directory, ``pypy``.
+ What we did is specify what is known as a positional argument. It's named so
+ because the program should know what to do with the value, solely based on
+ where it appears on the command line. This concept is more relevant
+ to a command like :command:`cp`, whose most basic usage is ``cp SRC DEST``.
+ The first position is *what you want copied,* and the second
+ position is *where you want it copied to*.
+
+* Now, say we want to change the behaviour of the program. In our example,
+ we display more info for each file instead of just showing the file names.
+ The ``-l`` in that case is known as an optional argument.
+
+* That's a snippet of the help text. It's very useful in that you can
+ come across a program you have never used before, and can figure out
+ how it works simply by reading its help text.
+
+
+The basics
+==========
+
+Let us start with a very simple example which does (almost) nothing::
+
+ import argparse
+ parser = argparse.ArgumentParser()
+ parser.parse_args()
+
+Following is a result of running the code:
+
+.. code-block:: sh
+
+ $ python3 prog.py
+ $ python3 prog.py --help
+ usage: prog.py [-h]
+
+ optional arguments:
+ -h, --help show this help message and exit
+ $ python3 prog.py --verbose
+ usage: prog.py [-h]
+ prog.py: error: unrecognized arguments: --verbose
+ $ python3 prog.py foo
+ usage: prog.py [-h]
+ prog.py: error: unrecognized arguments: foo
+
+Here is what is happening:
+
+* Running the script without any options results in nothing displayed to
+ stdout. Not so useful.
+
+* The second one starts to display the usefulness of the :mod:`argparse`
+ module. We have done almost nothing, but already we get a nice help message.
+
+* The ``--help`` option, which can also be shortened to ``-h``, is the only
+ option we get for free (i.e. no need to specify it). Specifying anything
+ else results in an error. But even then, we do get a useful usage message,
+ also for free.
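A minimal sketch, not part of the patch, that reproduces the four runs above without a shell. It assumes that passing an explicit list to ``parse_args()`` stands in for the command line, and it sets ``prog="prog.py"`` only so the messages resemble the transcript. The helper name ``exit_status`` is hypothetical.

```python
import argparse

# prog is set explicitly so usage messages read "prog.py" regardless of
# how this snippet is actually run.
parser = argparse.ArgumentParser(prog="prog.py")


def exit_status(argv):
    """Return the exit status argparse asks for, or None if parsing succeeds."""
    try:
        parser.parse_args(argv)
    except SystemExit as exc:
        # argparse prints its help or error message, then raises SystemExit.
        return exc.code
    return None


print(exit_status([]))             # no arguments: parsing succeeds
print(exit_status(["--help"]))     # help is printed, clean exit
print(exit_status(["--verbose"]))  # unrecognized option
print(exit_status(["foo"]))        # unrecognized positional argument
```

Catching ``SystemExit`` is useful mainly in tests and interactive sessions; a real script would normally let argparse terminate the process.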
+
+
+Introducing Positional arguments
+================================
+
+An example::
+
+ import argparse
+ parser = argparse.ArgumentParser()
+ parser.add_argument("echo")
+ args = parser.parse_args()
+ print(args.echo)
+
+And running the code:
+
+.. code-block:: sh
+
+ $ python3 prog.py
+ usage: prog.py [-h] echo
+ prog.py: error: the following arguments are required: echo
+ $ python3 prog.py --help
+ usage: prog.py [-h] echo
+
+ positional arguments:
+ echo
+
+ optional arguments:
+ -h, --help show this help message and exit
+ $ python3 prog.py foo
+ foo
+
+Here is what's happening:
+
+* We've added the :meth:`add_argument` method, which is what we use to specify
+ which command-line options the program is willing to accept. In this case,
+ I've named it ``echo`` so that it's in line with its function.
+
+* Calling our program now requires us to specify an option.
+
+* The :meth:`parse_args` method actually returns some data from the
+ options specified, in this case, ``echo``.
+
+* The variable is some form of 'magic' that :mod:`argparse` performs for free
+ (i.e. no need to specify which variable that value is stored in).
+ You will also notice that its name matches the string argument given
+ to the method, ``echo``.
+
+Note however that, although the help display looks nice and all, it currently
+is not as helpful as it can be. For example we see that we got ``echo`` as a
+positional argument, but we don't know what it does, other than by guessing or
+by reading the source code. So, let's make it a bit more useful::
+
+ import argparse
+ parser = argparse.ArgumentParser()
+ parser.add_argument("echo", help="echo the string you use here")
+ args = parser.parse_args()
+ print(args.echo)
+
+And we get:
+
+.. code-block:: sh
+
+ $ python3 prog.py -h
+ usage: prog.py [-h] echo
+
+ positional arguments:
+ echo echo the string you use here
+
+ optional arguments:
+ -h, --help show this help message and exit
+
+Now, how about doing something even more useful::
+
+ import argparse
+ parser = argparse.ArgumentParser()
+ parser.add_argument("square", help="display a square of a given number")
+ args = parser.parse_args()
+ print(args.square**2)
+
+Following is a result of running the code:
+
+.. code-block:: sh
+
+ $ python3 prog.py 4
+ Traceback (most recent call last):
+ File "prog.py", line 5, in <module>
+ print(args.square**2)
+ TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int'
+
+That didn't go so well. That's because :mod:`argparse` treats the options we
+give it as strings, unless we tell it otherwise. So, let's tell
+:mod:`argparse` to treat that input as an integer::
+
+ import argparse
+ parser = argparse.ArgumentParser()
+ parser.add_argument("square", help="display a square of a given number",
+ type=int)
+ args = parser.parse_args()
+ print(args.square**2)
+
+Following is a result of running the code:
+
+.. code-block:: sh
+
+ $ python3 prog.py 4
+ 16
+ $ python3 prog.py four
+ usage: prog.py [-h] square
+ prog.py: error: argument square: invalid int value: 'four'
+
+That went well. The program now even helpfully quits on illegal input
+before proceeding.
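A short sketch, not part of the patch, of what ``type=int`` changes. It assumes an explicit argument list in place of the command line, and the variable names ``squared`` and ``bad_input_status`` are illustrative only.

```python
import argparse

parser = argparse.ArgumentParser(prog="prog.py")
parser.add_argument("square", type=int,
                    help="display a square of a given number")

# Equivalent to running: python3 prog.py 4
args = parser.parse_args(["4"])
squared = args.square ** 2  # args.square is already an int here
print(squared)

# Equivalent to running: python3 prog.py four
try:
    parser.parse_args(["four"])
except SystemExit as exc:
    # argparse has printed its "invalid int value" error and requested exit.
    bad_input_status = exc.code
    print("exit status:", bad_input_status)
```

The conversion happens inside ``parse_args()``, so the rest of the program never sees the raw string.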
+
+
+Introducing Optional arguments
+==============================
+
+So far, we have been playing with positional arguments. Let us
+have a look at how to add optional ones::
+
+ import argparse
+ parser = argparse.ArgumentParser()
+ parser.add_argument("--verbosity", help="increase output verbosity")
+ args = parser.parse_args()
+ if args.verbosity:
+ print("verbosity turned on")
+
+And the output:
+
+.. code-block:: sh
+
+ $ python3 prog.py --verbosity 1
+ verbosity turned on
+ $ python3 prog.py
+ $ python3 prog.py --help
+ usage: prog.py [-h] [--verbosity VERBOSITY]
+
+ optional arguments:
+ -h, --help show this help message and exit
+ --verbosity VERBOSITY
+ increase output verbosity
+ $ python3 prog.py --verbosity
+ usage: prog.py [-h] [--verbosity VERBOSITY]
+ prog.py: error: argument --verbosity: expected one argument
+
+Here is what is happening:
+
+* The program is written so as to display something when ``--verbosity`` is
+ specified and display nothing when not.
+
+* To show that the option is actually optional, there is no error when running
+ the program without it. Note that by default, if an optional argument isn't
+ used, the relevant variable, in this case :attr:`args.verbosity`, is
+ given ``None`` as a value, which is the reason it fails the truth
+ test of the :keyword:`if` statement.
+
+* The help message is a bit different.
+
+* When using the ``--verbosity`` option, one must also specify some value,
+ any value.
+
+The above example accepts arbitrary values for ``--verbosity``, but for
+our simple program, only two values are actually useful, ``True`` or ``False``.
+Let's modify the code accordingly::
+
+ import argparse
+ parser = argparse.ArgumentParser()
+ parser.add_argument("--verbose", help="increase output verbosity",
+ action="store_true")
+ args = parser.parse_args()
+ if args.verbose:
+ print("verbosity turned on")
+
+And the output:
+
+.. code-block:: sh
+
+ $ python3 prog.py --verbose
+ verbosity turned on
+ $ python3 prog.py --verbose 1
+ usage: prog.py [-h] [--verbose]
+ prog.py: error: unrecognized arguments: 1
+ $ python3 prog.py --help
+ usage: prog.py [-h] [--verbose]
+
+ optional arguments:
+ -h, --help show this help message and exit
+ --verbose increase output verbosity
+
+Here is what is happening:
+
+* The option is now more of a flag than something that requires a value.
+ We even changed the name of the option to match that idea.
+ Note that we now specify a new keyword, ``action``, and give it the value
+ ``"store_true"``. This means that, if the option is specified,
+ assign the value ``True`` to :data:`args.verbose`.
+ Not specifying it implies ``False``.
+
+* It complains when you specify a value, in true spirit of what flags
+ actually are.
+
+* Notice the different help text.
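A small sketch, not part of the patch, showing the two values a ``store_true`` flag can take. The argument lists passed to ``parse_args()`` are stand-ins for the command line, and the names ``absent`` and ``present`` are illustrative.

```python
import argparse

parser = argparse.ArgumentParser(prog="prog.py")
parser.add_argument("--verbose", action="store_true",
                    help="increase output verbosity")

absent = parser.parse_args([])              # python3 prog.py
present = parser.parse_args(["--verbose"])  # python3 prog.py --verbose

# A store_true flag defaults to False rather than None.
print(absent.verbose, present.verbose)
```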
+
+
+Short options
+-------------
+
+If you are familiar with command line usage,
+you will notice that I haven't yet touched on the topic of short
+versions of the options. It's quite simple::
+
+ import argparse
+ parser = argparse.ArgumentParser()
+ parser.add_argument("-v", "--verbose", help="increase output verbosity",
+ action="store_true")
+ args = parser.parse_args()
+ if args.verbose:
+ print("verbosity turned on")
+
+And here goes:
+
+.. code-block:: sh
+
+ $ python3 prog.py -v
+ verbosity turned on
+ $ python3 prog.py --help
+ usage: prog.py [-h] [-v]
+
+ optional arguments:
+ -h, --help show this help message and exit
+ -v, --verbose increase output verbosity
+
+Note that the new ability is also reflected in the help text.
+
+
+Combining Positional and Optional arguments
+===========================================
+
+Our program keeps growing in complexity::
+
+ import argparse
+ parser = argparse.ArgumentParser()
+ parser.add_argument("square", type=int,
+ help="display a square of a given number")
+ parser.add_argument("-v", "--verbose", action="store_true",
+ help="increase output verbosity")
+ args = parser.parse_args()
+ answer = args.square**2
+ if args.verbose:
+ print("the square of {} equals {}".format(args.square, answer))
+ else:
+ print(answer)
+
+And now the output:
+
+.. code-block:: sh
+
+ $ python3 prog.py
+ usage: prog.py [-h] [-v] square
+ prog.py: error: the following arguments are required: square
+ $ python3 prog.py 4
+ 16
+ $ python3 prog.py 4 --verbose
+ the square of 4 equals 16
+ $ python3 prog.py --verbose 4
+ the square of 4 equals 16
+
+* We've brought back a positional argument, hence the complaint.
+
+* Note that the order does not matter.
+
+How about we give this program of ours back the ability to have
+multiple verbosity values, and actually get to use them::
+
+ import argparse
+ parser = argparse.ArgumentParser()
+ parser.add_argument("square", type=int,
+ help="display a square of a given number")
+ parser.add_argument("-v", "--verbosity", type=int,
+ help="increase output verbosity")
+ args = parser.parse_args()
+ answer = args.square**2
+ if args.verbosity == 2:
+ print("the square of {} equals {}".format(args.square, answer))
+ elif args.verbosity == 1:
+ print("{}^2 == {}".format(args.square, answer))
+ else:
+ print(answer)
+
+And the output:
+
+.. code-block:: sh
+
+ $ python3 prog.py 4
+ 16
+ $ python3 prog.py 4 -v
+ usage: prog.py [-h] [-v VERBOSITY] square
+ prog.py: error: argument -v/--verbosity: expected one argument
+ $ python3 prog.py 4 -v 1
+ 4^2 == 16
+ $ python3 prog.py 4 -v 2
+ the square of 4 equals 16
+ $ python3 prog.py 4 -v 3
+ 16
+
+These all look good except the last one, which exposes a bug in our program.
+Let's fix it by restricting the values the ``--verbosity`` option can accept::
+
+ import argparse
+ parser = argparse.ArgumentParser()
+ parser.add_argument("square", type=int,
+ help="display a square of a given number")
+ parser.add_argument("-v", "--verbosity", type=int, choices=[0, 1, 2],
+ help="increase output verbosity")
+ args = parser.parse_args()
+ answer = args.square**2
+ if args.verbosity == 2:
+ print("the square of {} equals {}".format(args.square, answer))
+ elif args.verbosity == 1:
+ print("{}^2 == {}".format(args.square, answer))
+ else:
+ print(answer)
+
+And the output:
+
+.. code-block:: sh
+
+ $ python3 prog.py 4 -v 3
+ usage: prog.py [-h] [-v {0,1,2}] square
+ prog.py: error: argument -v/--verbosity: invalid choice: 3 (choose from 0, 1, 2)
+ $ python3 prog.py 4 -h
+ usage: prog.py [-h] [-v {0,1,2}] square
+
+ positional arguments:
+ square display a square of a given number
+
+ optional arguments:
+ -h, --help show this help message and exit
+ -v {0,1,2}, --verbosity {0,1,2}
+ increase output verbosity
+
+Note that the change is reflected in both the error message and the
+help string.
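A sketch, not part of the patch, of how ``choices`` interacts with ``type=int``. It assumes explicit argument lists instead of a shell, and the helper ``verbosity_or_status`` is hypothetical.

```python
import argparse

parser = argparse.ArgumentParser(prog="prog.py")
parser.add_argument("square", type=int)
parser.add_argument("-v", "--verbosity", type=int, choices=[0, 1, 2])


def verbosity_or_status(argv):
    """Return ("ok", verbosity) on success or ("exit", status) on error."""
    try:
        return ("ok", parser.parse_args(argv).verbosity)
    except SystemExit as exc:
        return ("exit", exc.code)


print(verbosity_or_status(["4", "-v", "2"]))  # an allowed choice
print(verbosity_or_status(["4", "-v", "3"]))  # rejected by choices
print(verbosity_or_status(["4"]))             # option omitted
```

The value is converted with ``int`` before it is compared against ``choices``, which is why the list holds integers rather than strings.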
+
+Now, let's use a different approach of playing with verbosity, which is pretty
+common. It also matches the way the CPython executable handles its own
+verbosity argument (check the output of ``python --help``)::
+
+ import argparse
+ parser = argparse.ArgumentParser()
+ parser.add_argument("square", type=int,
+ help="display a square of a given number")
+ parser.add_argument("-v", "--verbosity", action="count",
+ help="increase output verbosity")
+ args = parser.parse_args()
+ answer = args.square**2
+ if args.verbosity == 2:
+ print("the square of {} equals {}".format(args.square, answer))
+ elif args.verbosity == 1:
+ print("{}^2 == {}".format(args.square, answer))
+ else:
+ print(answer)
+
+We have introduced another action, "count",
+to count the number of occurrences of a specific optional argument:
+
+.. code-block:: sh
+
+ $ python3 prog.py 4
+ 16
+ $ python3 prog.py 4 -v
+ 4^2 == 16
+ $ python3 prog.py 4 -vv
+ the square of 4 equals 16
+ $ python3 prog.py 4 --verbosity --verbosity
+ the square of 4 equals 16
+ $ python3 prog.py 4 -v 1
+ usage: prog.py [-h] [-v] square
+ prog.py: error: unrecognized arguments: 1
+ $ python3 prog.py 4 -h
+ usage: prog.py [-h] [-v] square
+
+ positional arguments:
+ square display a square of a given number
+
+ optional arguments:
+ -h, --help show this help message and exit
+ -v, --verbosity increase output verbosity
+ $ python3 prog.py 4 -vvv
+ 16
+
+* Yes, it's now more of a flag (similar to ``action="store_true"`` in the
+ previous version of our script). That should explain the complaint.
+
+* It also behaves similarly to the "store_true" action.
+
+* Now here's a demonstration of what the "count" action gives. You've probably
+ seen this sort of usage before.
+
+* And if you don't specify the ``-v`` flag,
+ that flag is considered to have ``None`` value.
+
+* As should be expected, specifying the long form of the flag, we should get
+ the same output.
+
+* Sadly, our help output isn't very informative on the new ability our script
+ has acquired, but that can always be fixed by improving the documentation for
+ our script (e.g. via the ``help`` keyword argument).
+
+* That last output exposes a bug in our program.
+
+
+Let's fix it::
+
+ import argparse
+ parser = argparse.ArgumentParser()
+ parser.add_argument("square", type=int,
+ help="display a square of a given number")
+ parser.add_argument("-v", "--verbosity", action="count",
+ help="increase output verbosity")
+ args = parser.parse_args()
+ answer = args.square**2
+
+ # bugfix: replace == with >=
+ if args.verbosity >= 2:
+ print("the square of {} equals {}".format(args.square, answer))
+ elif args.verbosity >= 1:
+ print("{}^2 == {}".format(args.square, answer))
+ else:
+ print(answer)
+
+And this is what it gives:
+
+.. code-block:: sh
+
+ $ python3 prog.py 4 -vvv
+ the square of 4 equals 16
+ $ python3 prog.py 4 -vvvv
+ the square of 4 equals 16
+ $ python3 prog.py 4
+ Traceback (most recent call last):
+ File "prog.py", line 11, in <module>
+ if args.verbosity >= 2:
+ TypeError: unorderable types: NoneType() >= int()
+
+* The first output went well, and fixes the bug we had before.
+ That is, we want any value >= 2 to be as verbose as possible.
+
+* The third output is not so good.
+
+Let's fix that bug::
+
+ import argparse
+ parser = argparse.ArgumentParser()
+ parser.add_argument("square", type=int,
+ help="display a square of a given number")
+ parser.add_argument("-v", "--verbosity", action="count", default=0,
+ help="increase output verbosity")
+ args = parser.parse_args()
+ answer = args.square**2
+ if args.verbosity >= 2:
+ print("the square of {} equals {}".format(args.square, answer))
+ elif args.verbosity >= 1:
+ print("{}^2 == {}".format(args.square, answer))
+ else:
+ print(answer)
+
+We've just introduced yet another keyword, ``default``.
+We've set it to ``0`` in order to make it comparable to the other int values.
+Remember that by default,
+if an optional argument isn't specified,
+it gets the ``None`` value, and that cannot be compared to an int value
+(hence the :exc:`TypeError` exception).
+
+And:
+
+.. code-block:: sh
+
+ $ python3 prog.py 4
+ 16
+
+You can go quite far just with what we've learned so far,
+and we have only scratched the surface.
+The :mod:`argparse` module is very powerful,
+and we'll explore a bit more of it before we end this tutorial.
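A sketch, not part of the patch, contrasting the "count" action with and without ``default=0``. The factory ``make_parser`` is a hypothetical helper that exists only to build both variants side by side.

```python
import argparse


def make_parser(**count_kwargs):
    """Build the tutorial's parser; extra keywords go to the -v option."""
    parser = argparse.ArgumentParser(prog="prog.py")
    parser.add_argument("square", type=int)
    parser.add_argument("-v", "--verbosity", action="count", **count_kwargs)
    return parser


without_default = make_parser()
with_default = make_parser(default=0)

# Without a default, an omitted -v leaves the attribute as None,
# which cannot be ordered against an int.
print(without_default.parse_args(["4"]).verbosity)

# With default=0 the attribute is always an int.
print(with_default.parse_args(["4"]).verbosity)
print(with_default.parse_args(["4", "-vvv"]).verbosity)
```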
+
+
+Getting a little more advanced
+==============================
+
+What if we wanted to expand our tiny program to perform other powers,
+not just squares::
+
+ import argparse
+ parser = argparse.ArgumentParser()
+ parser.add_argument("x", type=int, help="the base")
+ parser.add_argument("y", type=int, help="the exponent")
+ parser.add_argument("-v", "--verbosity", action="count", default=0)
+ args = parser.parse_args()
+ answer = args.x**args.y
+ if args.verbosity >= 2:
+ print("{} to the power {} equals {}".format(args.x, args.y, answer))
+ elif args.verbosity >= 1:
+ print("{}^{} == {}".format(args.x, args.y, answer))
+ else:
+ print(answer)
+
+Output:
+
+.. code-block:: sh
+
+ $ python3 prog.py
+ usage: prog.py [-h] [-v] x y
+ prog.py: error: the following arguments are required: x, y
+ $ python3 prog.py -h
+ usage: prog.py [-h] [-v] x y
+
+ positional arguments:
+ x the base
+ y the exponent
+
+ optional arguments:
+ -h, --help show this help message and exit
+ -v, --verbosity
+ $ python3 prog.py 4 2 -v
+ 4^2 == 16
+
+
+Notice that so far we've been using verbosity level to *change* the text
+that gets displayed. The following example instead uses verbosity level
+to display *more* text instead::
+
+ import argparse
+ parser = argparse.ArgumentParser()
+ parser.add_argument("x", type=int, help="the base")
+ parser.add_argument("y", type=int, help="the exponent")
+ parser.add_argument("-v", "--verbosity", action="count", default=0)
+ args = parser.parse_args()
+ answer = args.x**args.y
+ if args.verbosity >= 2:
+ print("Running '{}'".format(__file__))
+ if args.verbosity >= 1:
+ print("{}^{} == ".format(args.x, args.y), end="")
+ print(answer)
+
+Output:
+
+.. code-block:: sh
+
+ $ python3 prog.py 4 2
+ 16
+ $ python3 prog.py 4 2 -v
+ 4^2 == 16
+ $ python3 prog.py 4 2 -vv
+ Running 'prog.py'
+ 4^2 == 16
+
+
+Conflicting options
+-------------------
+
+So far, we have been working with two methods of an
+:class:`argparse.ArgumentParser` instance. Let's introduce a third one,
+:meth:`add_mutually_exclusive_group`. It allows us to specify options that
+conflict with each other. Let's also change the rest of the program so that
+the new functionality makes more sense:
+we'll introduce the ``--quiet`` option,
+which will be the opposite of the ``--verbose`` one::
+
+ import argparse
+
+ parser = argparse.ArgumentParser()
+ group = parser.add_mutually_exclusive_group()
+ group.add_argument("-v", "--verbose", action="store_true")
+ group.add_argument("-q", "--quiet", action="store_true")
+ parser.add_argument("x", type=int, help="the base")
+ parser.add_argument("y", type=int, help="the exponent")
+ args = parser.parse_args()
+ answer = args.x**args.y
+
+ if args.quiet:
+ print(answer)
+ elif args.verbose:
+ print("{} to the power {} equals {}".format(args.x, args.y, answer))
+ else:
+ print("{}^{} == {}".format(args.x, args.y, answer))
+
+Our program is now simpler, and we've lost some functionality for the sake of
+demonstration. Anyway, here's the output:
+
+.. code-block:: sh
+
+ $ python3 prog.py 4 2
+ 4^2 == 16
+ $ python3 prog.py 4 2 -q
+ 16
+ $ python3 prog.py 4 2 -v
+ 4 to the power 2 equals 16
+ $ python3 prog.py 4 2 -vq
+ usage: prog.py [-h] [-v | -q] x y
+ prog.py: error: argument -q/--quiet: not allowed with argument -v/--verbose
+ $ python3 prog.py 4 2 -v --quiet
+ usage: prog.py [-h] [-v | -q] x y
+ prog.py: error: argument -q/--quiet: not allowed with argument -v/--verbose
+
+That should be easy to follow. I've added that last output so you can see the
+sort of flexibility you get, i.e. mixing long form options with short form
+ones.
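A sketch, not part of the patch, that exercises the mutually exclusive group programmatically. Argument lists replace the shell invocations, and the helper ``parse_or_status`` is hypothetical.

```python
import argparse

parser = argparse.ArgumentParser(prog="prog.py")
group = parser.add_mutually_exclusive_group()
group.add_argument("-v", "--verbose", action="store_true")
group.add_argument("-q", "--quiet", action="store_true")
parser.add_argument("x", type=int, help="the base")
parser.add_argument("y", type=int, help="the exponent")


def parse_or_status(argv):
    """Return the parsed namespace, or the exit status if argparse rejects argv."""
    try:
        return parser.parse_args(argv)
    except SystemExit as exc:
        return exc.code


quiet_run = parse_or_status(["4", "2", "-q"])
print(quiet_run.quiet, quiet_run.verbose)

# Both spellings of the conflict are rejected in the same way.
print(parse_or_status(["4", "2", "-vq"]))
print(parse_or_status(["4", "2", "-v", "--quiet"]))
```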
+
+Before we conclude, you probably want to tell your users the main purpose of
+your program, just in case they don't know::
+
+ import argparse
+
+ parser = argparse.ArgumentParser(description="calculate X to the power of Y")
+ group = parser.add_mutually_exclusive_group()
+ group.add_argument("-v", "--verbose", action="store_true")
+ group.add_argument("-q", "--quiet", action="store_true")
+ parser.add_argument("x", type=int, help="the base")
+ parser.add_argument("y", type=int, help="the exponent")
+ args = parser.parse_args()
+ answer = args.x**args.y
+
+ if args.quiet:
+ print(answer)
+ elif args.verbose:
+ print("{} to the power {} equals {}".format(args.x, args.y, answer))
+ else:
+ print("{}^{} == {}".format(args.x, args.y, answer))
+
+Note the slight difference in the usage text. Note the ``[-v | -q]``,
+which tells us that we can either use ``-v`` or ``-q``,
+but not both at the same time:
+
+.. code-block:: sh
+
+ $ python3 prog.py --help
+ usage: prog.py [-h] [-v | -q] x y
+
+ calculate X to the power of Y
+
+ positional arguments:
+ x the base
+ y the exponent
+
+ optional arguments:
+ -h, --help show this help message and exit
+ -v, --verbose
+ -q, --quiet
+
+
+Conclusion
+==========
+
+The :mod:`argparse` module offers a lot more than shown here.
+Its docs are quite detailed and thorough, and full of examples.
+Having gone through this tutorial, you should easily digest them
+without feeling overwhelmed.
diff --git a/Doc/howto/cporting.rst b/Doc/howto/cporting.rst
index d20e4a6a94..9d8a1b0e35 100644
--- a/Doc/howto/cporting.rst
+++ b/Doc/howto/cporting.rst
@@ -2,27 +2,28 @@
.. _cporting-howto:
-********************************
-Porting Extension Modules to 3.0
-********************************
+*************************************
+Porting Extension Modules to Python 3
+*************************************
:author: Benjamin Peterson
.. topic:: Abstract
- Although changing the C-API was not one of Python 3.0's objectives, the many
- Python level changes made leaving 2.x's API intact impossible. In fact, some
- changes such as :func:`int` and :func:`long` unification are more obvious on
- the C level. This document endeavors to document incompatibilities and how
- they can be worked around.
+ Although changing the C-API was not one of Python 3's objectives,
+ the many Python-level changes made leaving Python 2's API intact
+ impossible. In fact, some changes such as :func:`int` and
+ :func:`long` unification are more obvious on the C level. This
+ document endeavors to document incompatibilities and how they can
+ be worked around.
Conditional compilation
=======================
-The easiest way to compile only some code for 3.0 is to check if
-:cmacro:`PY_MAJOR_VERSION` is greater than or equal to 3. ::
+The easiest way to compile only some code for Python 3 is to check
+if :c:macro:`PY_MAJOR_VERSION` is greater than or equal to 3. ::
#if PY_MAJOR_VERSION >= 3
#define IS_PY3K
@@ -35,7 +36,7 @@ conditional blocks.
Changes to Object APIs
======================
-Python 3.0 merged together some types with similar functions while cleanly
+Python 3 merged together some types with similar functions while cleanly
separating others.
@@ -43,16 +44,16 @@ str/unicode Unification
-----------------------
-Python 3.0's :func:`str` (``PyString_*`` functions in C) type is equivalent to
-2.x's :func:`unicode` (``PyUnicode_*``). The old 8-bit string type has become
-:func:`bytes`. Python 2.6 and later provide a compatibility header,
+Python 3's :func:`str` (``PyString_*`` functions in C) type is equivalent to
+Python 2's :func:`unicode` (``PyUnicode_*``). The old 8-bit string type has
+become :func:`bytes`. Python 2.6 and later provide a compatibility header,
:file:`bytesobject.h`, mapping ``PyBytes`` names to ``PyString`` ones. For best
-compatibility with 3.0, :ctype:`PyUnicode` should be used for textual data and
-:ctype:`PyBytes` for binary data. It's also important to remember that
-:ctype:`PyBytes` and :ctype:`PyUnicode` in 3.0 are not interchangeable like
-:ctype:`PyString` and :ctype:`PyUnicode` are in 2.x. The following example
-shows best practices with regards to :ctype:`PyUnicode`, :ctype:`PyString`,
-and :ctype:`PyBytes`. ::
+compatibility with Python 3, :c:type:`PyUnicode` should be used for textual data and
+:c:type:`PyBytes` for binary data. It's also important to remember that
+:c:type:`PyBytes` and :c:type:`PyUnicode` in Python 3 are not interchangeable like
+:c:type:`PyString` and :c:type:`PyUnicode` are in Python 2. The following example
+shows best practices with regard to :c:type:`PyUnicode`, :c:type:`PyString`,
+and :c:type:`PyBytes`. ::
#include "stdlib.h"
#include "Python.h"
@@ -94,10 +95,12 @@ and :ctype:`PyBytes`. ::
long/int Unification
--------------------
-In Python 3.0, there is only one integer type. It is called :func:`int` on the
-Python level, but actually corresponds to 2.x's :func:`long` type. In the
-C-API, ``PyInt_*`` functions are replaced by their ``PyLong_*`` neighbors. The
-best course of action here is using the ``PyInt_*`` functions aliased to
+Python 3 has only one integer type, :func:`int`. But it actually
+corresponds to Python 2's :func:`long` type--the :func:`int` type
+used in Python 2 was removed. In the C-API, ``PyInt_*`` functions
+are replaced by their ``PyLong_*`` equivalents.
+
+The best course of action here is using the ``PyInt_*`` functions aliased to
``PyLong_*`` found in :file:`intobject.h`. The abstract ``PyNumber_*`` APIs
can also be used in some cases. ::
@@ -120,10 +123,11 @@ can also be used in some cases. ::
Module initialization and state
===============================
-Python 3.0 has a revamped extension module initialization system. (See PEP
-:pep:`3121`.) Instead of storing module state in globals, they should be stored
-in an interpreter specific structure. Creating modules that act correctly in
-both 2.x and 3.0 is tricky. The following simple example demonstrates how. ::
+Python 3 has a revamped extension module initialization system. (See
+:pep:`3121`.) Instead of storing module state in globals, it should
+be stored in an interpreter-specific structure. Creating modules that
+act correctly in both Python 2 and Python 3 is tricky. The following
+simple example demonstrates how. ::
#include "Python.h"
@@ -209,10 +213,65 @@ both 2.x and 3.0 is tricky. The following simple example demonstrates how. ::
}
+CObject replaced with Capsule
+=============================
+
+The :c:type:`Capsule` object was introduced in Python 3.1 and 2.7 to replace
+:c:type:`CObject`. CObjects were useful,
+but the :c:type:`CObject` API was problematic: it didn't permit distinguishing
+between valid CObjects, which allowed mismatched CObjects to crash the
+interpreter, and some of its APIs relied on undefined behavior in C.
+(For further reading on the rationale behind Capsules, please see :issue:`5630`.)
+
+If you're currently using CObjects, and you want to migrate to 3.1 or newer,
+you'll need to switch to Capsules.
+:c:type:`CObject` was deprecated in 3.1 and 2.7 and completely removed in
+Python 3.2. If you only support 2.7, or 3.1 and above, you
+can simply switch to :c:type:`Capsule`. If you need to support Python 3.0,
+or versions of Python earlier than 2.7,
+you'll have to support both CObjects and Capsules.
+(Note that Python 3.0 is no longer supported, and it is not recommended
+for production use.)
+
+The following example header file :file:`capsulethunk.h` may
+solve the problem for you. Simply write your code against the
+:c:type:`Capsule` API and include this header file after
+:file:`Python.h`. Your code will automatically use Capsules
+in versions of Python with Capsules, and switch to CObjects
+when Capsules are unavailable.
+
+:file:`capsulethunk.h` simulates Capsules using CObjects. However,
+:c:type:`CObject` provides no place to store the capsule's "name". As a
+result the simulated :c:type:`Capsule` objects created by :file:`capsulethunk.h`
+behave slightly differently from real Capsules. Specifically:
+
+ * The name parameter passed in to :c:func:`PyCapsule_New` is ignored.
+
+ * The name parameter passed in to :c:func:`PyCapsule_IsValid` and
+ :c:func:`PyCapsule_GetPointer` is ignored, and no error checking
+ of the name is performed.
+
+ * :c:func:`PyCapsule_GetName` always returns NULL.
+
+ * :c:func:`PyCapsule_SetName` always raises an exception and
+ returns failure. (Since there's no way to store a name
+ in a CObject, noisy failure of :c:func:`PyCapsule_SetName`
+ was deemed preferable to silent failure here. If this is
+ inconvenient, feel free to modify your local
+ copy as you see fit.)
+
+You can find :file:`capsulethunk.h` in the Python source distribution
+as :source:`Doc/includes/capsulethunk.h`. We also include it here for
+your convenience:
+
+.. literalinclude:: ../includes/capsulethunk.h
+
+
+
Other options
=============
If you are writing a new extension module, you might consider `Cython
<http://www.cython.org>`_. It translates a Python-like language to C. The
-extension modules it creates are compatible with Python 3.x and 2.x.
+extension modules it creates are compatible with Python 3 and Python 2.
diff --git a/Doc/howto/curses.rst b/Doc/howto/curses.rst
index 53ef7deb9d..1b14ceb6bd 100644
--- a/Doc/howto/curses.rst
+++ b/Doc/howto/curses.rst
@@ -118,7 +118,7 @@ function to restore the terminal to its original operating mode. ::
A common problem when debugging a curses application is to get your terminal
messed up when the application dies without restoring the terminal to its
previous state. In Python this commonly happens when your code is buggy and
-raises an uncaught exception. Keys are no longer be echoed to the screen when
+raises an uncaught exception. Keys are no longer echoed to the screen when
you type them, for example, which makes using the shell difficult.
In Python you can avoid these complications and make debugging much easier by
@@ -271,7 +271,7 @@ application are commonly shown in reverse video; a text viewer may need to
highlight certain words. curses supports this by allowing you to specify an
attribute for each cell on the screen.
-An attribute is a integer, each bit representing a different attribute. You can
+An attribute is an integer, each bit representing a different attribute. You can
try to display text with multiple attribute bits set, but curses doesn't
guarantee that all the possible combinations are available, or that they're all
visually distinct. That depends on the ability of the terminal being used, so
@@ -300,7 +300,7 @@ could code::
curses.A_REVERSE)
stdscr.refresh()
-The curses library also supports color on those terminals that provide it, The
+The curses library also supports color on those terminals that provide it. The
most common such terminal is probably the Linux console, followed by color
xterms.
diff --git a/Doc/howto/descriptor.rst b/Doc/howto/descriptor.rst
index cdb6a8ec3d..f8763d888e 100644
--- a/Doc/howto/descriptor.rst
+++ b/Doc/howto/descriptor.rst
@@ -42,7 +42,7 @@ classes (a class is new style if it inherits from :class:`object` or
Descriptors are a powerful, general purpose protocol. They are the mechanism
behind properties, methods, static methods, class methods, and :func:`super()`.
-They are used used throughout Python itself to implement the new style classes
+They are used throughout Python itself to implement the new style classes
introduced in version 2.2. Descriptors simplify the underlying C-code and offer
a flexible set of new tools for everyday Python programs.
@@ -97,7 +97,7 @@ transforms ``b.x`` into ``type(b).__dict__['x'].__get__(b, type(b))``. The
implementation works through a precedence chain that gives data descriptors
priority over instance variables, instance variables priority over non-data
descriptors, and assigns lowest priority to :meth:`__getattr__` if provided. The
-full C implementation can be found in :cfunc:`PyObject_GenericGetAttr()` in
+full C implementation can be found in :c:func:`PyObject_GenericGetAttr()` in
`Objects/object.c <http://svn.python.org/view/python/trunk/Objects/object.c?view=markup>`_\.
For classes, the machinery is in :meth:`type.__getattribute__` which transforms
@@ -131,7 +131,7 @@ search using :meth:`object.__getattribute__`.
Note, in Python 2.2, ``super(B, obj).m()`` would only invoke :meth:`__get__` if
``m`` was a data descriptor. In Python 2.3, non-data descriptors also get
invoked unless an old-style class is involved. The implementation details are
-in :cfunc:`super_getattro()` in
+in :c:func:`super_getattro()` in
`Objects/typeobject.c <http://svn.python.org/view/python/trunk/Objects/typeobject.c?view=markup>`_
and a pure Python equivalent can be found in `Guido's Tutorial`_.
@@ -224,17 +224,17 @@ here is a pure Python equivalent::
if obj is None:
return self
if self.fget is None:
- raise AttributeError, "unreadable attribute"
+ raise AttributeError("unreadable attribute")
return self.fget(obj)
def __set__(self, obj, value):
if self.fset is None:
- raise AttributeError, "can't set attribute"
+ raise AttributeError("can't set attribute")
self.fset(obj, value)
def __delete__(self, obj):
if self.fdel is None:
- raise AttributeError, "can't delete attribute"
+ raise AttributeError("can't delete attribute")
self.fdel(obj)
The :func:`property` builtin helps whenever a user interface has granted
@@ -297,7 +297,7 @@ Running the interpreter shows how the function descriptor works in practice::
The output suggests that bound and unbound methods are two different types.
While they could have been implemented that way, the actual C implementation of
-:ctype:`PyMethod_Type` in
+:c:type:`PyMethod_Type` in
`Objects/classobject.c <http://svn.python.org/view/python/trunk/Objects/classobject.c?view=markup>`_
is a single object with two different representations depending on whether the
:attr:`im_self` field is set or is *NULL* (the C equivalent of *None*).
diff --git a/Doc/howto/doanddont.rst b/Doc/howto/doanddont.rst
deleted file mode 100644
index 365a6209d4..0000000000
--- a/Doc/howto/doanddont.rst
+++ /dev/null
@@ -1,290 +0,0 @@
-************************************
- Idioms and Anti-Idioms in Python
-************************************
-
-:Author: Moshe Zadka
-
-This document is placed in the public domain.
-
-
-.. topic:: Abstract
-
- This document can be considered a companion to the tutorial. It shows how to use
- Python, and even more importantly, how *not* to use Python.
-
-
-Language Constructs You Should Not Use
-======================================
-
-While Python has relatively few gotchas compared to other languages, it still
-has some constructs which are only useful in corner cases, or are plain
-dangerous.
-
-
-from module import \*
----------------------
-
-
-Inside Function Definitions
-^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-``from module import *`` is *invalid* inside function definitions. While many
-versions of Python do not check for the invalidity, it does not make it more
-valid, no more than having a smart lawyer makes a man innocent. Do not use it
-like that ever. Even in versions where it was accepted, it made the function
-execution slower, because the compiler could not be certain which names are
-local and which are global. In Python 2.1 this construct causes warnings, and
-sometimes even errors.
-
-
-At Module Level
-^^^^^^^^^^^^^^^
-
-While it is valid to use ``from module import *`` at module level it is usually
-a bad idea. For one, this loses an important property Python otherwise has ---
-you can know where each toplevel name is defined by a simple "search" function
-in your favourite editor. You also open yourself to trouble in the future, if
-some module grows additional functions or classes.
-
-One of the most awful question asked on the newsgroup is why this code::
-
- f = open("www")
- f.read()
-
-does not work. Of course, it works just fine (assuming you have a file called
-"www".) But it does not work if somewhere in the module, the statement ``from os
-import *`` is present. The :mod:`os` module has a function called :func:`open`
-which returns an integer. While it is very useful, shadowing builtins is one of
-its least useful properties.
-
-Remember, you can never know for sure what names a module exports, so either
-take what you need --- ``from module import name1, name2``, or keep them in the
-module and access on a per-need basis --- ``import module; print(module.name)``.
-
-
-When It Is Just Fine
-^^^^^^^^^^^^^^^^^^^^
-
-There are situations in which ``from module import *`` is just fine:
-
-* The interactive prompt. For example, ``from math import *`` makes Python an
- amazing scientific calculator.
-
-* When extending a module in C with a module in Python.
-
-* When the module advertises itself as ``from import *`` safe.
-
-
-from module import name1, name2
--------------------------------
-
-This is a "don't" which is much weaker than the previous "don't"s but is still
-something you should not do if you don't have good reasons to do that. The
-reason it is usually bad idea is because you suddenly have an object which lives
-in two separate namespaces. When the binding in one namespace changes, the
-binding in the other will not, so there will be a discrepancy between them. This
-happens when, for example, one module is reloaded, or changes the definition of
-a function at runtime.
-
-Bad example::
-
- # foo.py
- a = 1
-
- # bar.py
- from foo import a
- if something():
- a = 2 # danger: foo.a != a
-
-Good example::
-
- # foo.py
- a = 1
-
- # bar.py
- import foo
- if something():
- foo.a = 2
-
-
-except:
--------
-
-Python has the ``except:`` clause, which catches all exceptions. Since *every*
-error in Python raises an exception, using ``except:`` can make many
-programming errors look like runtime problems, which hinders the debugging
-process.
-
-The following code shows a great example of why this is bad::
-
- try:
- foo = opne("file") # misspelled "open"
- except:
- sys.exit("could not open file!")
-
-The second line triggers a :exc:`NameError`, which is caught by the except
-clause. The program will exit, and the error message the program prints will
-make you think the problem is the readability of ``"file"`` when in fact
-the real error has nothing to do with ``"file"``.
-
-A better way to write the above is ::
-
- try:
- foo = opne("file")
- except IOError:
- sys.exit("could not open file")
-
-When this is run, Python will produce a traceback showing the :exc:`NameError`,
-and it will be immediately apparent what needs to be fixed.
-
-.. index:: bare except, except; bare
-
-Because ``except:`` catches *all* exceptions, including :exc:`SystemExit`,
-:exc:`KeyboardInterrupt`, and :exc:`GeneratorExit` (which is not an error and
-should not normally be caught by user code), using a bare ``except:`` is almost
-never a good idea. In situations where you need to catch all "normal" errors,
-such as in a framework that runs callbacks, you can catch the base class for
-all normal exceptions, :exc:`Exception`.
-
-
-Exceptions
-==========
-
-Exceptions are a useful feature of Python. You should learn to raise them
-whenever something unexpected occurs, and catch them only where you can do
-something about them.
-
-The following is a very popular anti-idiom ::
-
- def get_status(file):
- if not os.path.exists(file):
- print("file not found")
- sys.exit(1)
- return open(file).readline()
-
-Consider the case where the file gets deleted between the time the call to
-:func:`os.path.exists` is made and the time :func:`open` is called. In that
-case the last line will raise an :exc:`IOError`. The same thing would happen
-if *file* exists but has no read permission. Since testing this on a normal
-machine on existent and non-existent files makes it seem bugless, the test
-results will seem fine, and the code will get shipped. Later an unhandled
-:exc:`IOError` (or perhaps some other :exc:`EnvironmentError`) escapes to the
-user, who gets to watch the ugly traceback.
-
-Here is a somewhat better way to do it. ::
-
- def get_status(file):
- try:
- return open(file).readline()
- except EnvironmentError as err:
- print("Unable to open file: {}".format(err))
- sys.exit(1)
-
-In this version, *either* the file gets opened and the line is read (so it
-works even on flaky NFS or SMB connections), or an error message is printed
-that provides all the available information on why the open failed, and the
-application is aborted.
-
-However, even this version of :func:`get_status` makes too many assumptions ---
-that it will only be used in a short running script, and not, say, in a long
-running server. Sure, the caller could do something like ::
-
- try:
- status = get_status(log)
- except SystemExit:
- status = None
-
-But there is a better way. You should try to use as few ``except`` clauses in
-your code as you can --- the ones you do use will usually be inside calls which
-should always succeed, or a catch-all in a main function.
-
-So, an even better version of :func:`get_status()` is probably ::
-
- def get_status(file):
- return open(file).readline()
-
-The caller can deal with the exception if it wants (for example, if it tries
-several files in a loop), or just let the exception filter upwards to *its*
-caller.
-
-But the last version still has a serious problem --- due to implementation
-details in CPython, the file would not be closed when an exception is raised
-until the exception handler finishes; and, worse, in other implementations
-(e.g., Jython) it might not be closed at all regardless of whether or not
-an exception is raised.
-
-The best version of this function uses the ``open()`` call as a context
-manager, which will ensure that the file gets closed as soon as the
-function returns::
-
- def get_status(file):
- with open(file) as fp:
- return fp.readline()
-
-
-Using the Batteries
-===================
-
-Every so often, people seem to be writing stuff in the Python library again,
-usually poorly. While the occasional module has a poor interface, it is usually
-much better to use the rich standard library and data types that come with
-Python than inventing your own.
-
-A useful module very few people know about is :mod:`os.path`. It always has the
-correct path arithmetic for your operating system, and will usually be much
-better than whatever you come up with yourself.
-
-Compare::
-
- # ugh!
- return dir+"/"+file
- # better
- return os.path.join(dir, file)
-
-More useful functions in :mod:`os.path`: :func:`basename`, :func:`dirname` and
-:func:`splitext`.
-
-There are also many useful built-in functions people seem not to be aware of
-for some reason: :func:`min` and :func:`max` can find the minimum/maximum of
-any sequence with comparable semantics, for example, yet many people write
-their own :func:`max`/:func:`min`. Another highly useful function is
-:func:`functools.reduce` which can be used to repeatly apply a binary
-operation to a sequence, reducing it to a single value. For example, compute
-a factorial with a series of multiply operations::
-
- >>> n = 4
- >>> import operator, functools
- >>> functools.reduce(operator.mul, range(1, n+1))
- 24
-
-When it comes to parsing numbers, note that :func:`float`, :func:`int` and
-:func:`long` all accept string arguments and will reject ill-formed strings
-by raising an :exc:`ValueError`.
-
-
-Using Backslash to Continue Statements
-======================================
-
-Since Python treats a newline as a statement terminator, and since statements
-are often more than is comfortable to put in one line, many people do::
-
- if foo.bar()['first'][0] == baz.quux(1, 2)[5:9] and \
- calculate_number(10, 20) != forbulate(500, 360):
- pass
-
-You should realize that this is dangerous: a stray space after the ``\`` would
-make this line wrong, and stray spaces are notoriously hard to see in editors.
-In this case, at least it would be a syntax error, but if the code was::
-
- value = foo.bar()['first'][0]*baz.quux(1, 2)[5:9] \
- + calculate_number(10, 20)*forbulate(500, 360)
-
-then it would just be subtly wrong.
-
-It is usually much better to use the implicit continuation inside parenthesis:
-
-This version is bulletproof::
-
- value = (foo.bar()['first'][0]*baz.quux(1, 2)[5:9]
- + calculate_number(10, 20)*forbulate(500, 360))
-
diff --git a/Doc/howto/functional.rst b/Doc/howto/functional.rst
index bfd2c96397..ebbb229e57 100644
--- a/Doc/howto/functional.rst
+++ b/Doc/howto/functional.rst
@@ -181,26 +181,26 @@ foundation for writing functional-style programs: iterators.
An iterator is an object representing a stream of data; this object returns the
data one element at a time. A Python iterator must support a method called
-``__next__()`` that takes no arguments and always returns the next element of
-the stream. If there are no more elements in the stream, ``__next__()`` must
-raise the ``StopIteration`` exception. Iterators don't have to be finite,
-though; it's perfectly reasonable to write an iterator that produces an infinite
-stream of data.
+:meth:`~iterator.__next__` that takes no arguments and always returns the next
+element of the stream. If there are no more elements in the stream,
+:meth:`~iterator.__next__` must raise the :exc:`StopIteration` exception.
+Iterators don't have to be finite, though; it's perfectly reasonable to write
+an iterator that produces an infinite stream of data.
The built-in :func:`iter` function takes an arbitrary object and tries to return
an iterator that will return the object's contents or elements, raising
:exc:`TypeError` if the object doesn't support iteration. Several of Python's
built-in data types support iteration, the most common being lists and
-dictionaries. An object is called an **iterable** object if you can get an
-iterator for it.
+dictionaries. An object is called :term:`iterable` if you can get an iterator
+for it.
You can experiment with the iteration interface manually:
>>> L = [1,2,3]
>>> it = iter(L)
- >>> it
+ >>> it #doctest: +ELLIPSIS
<...iterator object at ...>
- >>> it.__next__()
+ >>> it.__next__() # same as next(it)
1
>>> next(it)
2
@@ -213,9 +213,9 @@ You can experiment with the iteration interface manually:
>>>
Python expects iterable objects in several different contexts, the most
-important being the ``for`` statement. In the statement ``for X in Y``, Y must
-be an iterator or some object for which ``iter()`` can create an iterator.
-These two statements are equivalent::
+important being the :keyword:`for` statement. In the statement ``for X in Y``,
+Y must be an iterator or some object for which :func:`iter` can create an
+iterator. These two statements are equivalent::
for i in iter(obj):
@@ -246,16 +246,16 @@ Built-in functions such as :func:`max` and :func:`min` can take a single
iterator argument and will return the largest or smallest element. The ``"in"``
and ``"not in"`` operators also support iterators: ``X in iterator`` is true if
X is found in the stream returned by the iterator. You'll run into obvious
-problems if the iterator is infinite; ``max()``, ``min()``, and ``"not in"``
+problems if the iterator is infinite; :func:`max`, :func:`min`
will never return, and if the element X never appears in the stream, the
-``"in"`` operator won't return either.
+``"in"`` and ``"not in"`` operators won't return either.
Note that you can only go forward in an iterator; there's no way to get the
previous element, reset the iterator, or make a copy of it. Iterator objects
can optionally provide these additional capabilities, but the iterator protocol
-only specifies the ``next()`` method. Functions may therefore consume all of
-the iterator's output, and if you need to do something different with the same
-stream, you'll have to create a new iterator.
+only specifies the :meth:`~iterator.__next__` method. Functions may therefore
+consume all of the iterator's output, and if you need to do something different
+with the same stream, you'll have to create a new iterator.
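As a minimal sketch of the point above (the list and variable names are illustrative, not taken from the HOWTO), a function that walks an iterator leaves it exhausted, so a second pass needs a fresh iterator:

```python
# Illustrative data; these names do not appear in the original text.
numbers = [3, 1, 2]
it = iter(numbers)

# max() has to walk the whole stream, so it consumes the iterator.
largest = max(it)

# The same iterator is now exhausted and produces nothing more.
leftover = list(it)

# To traverse the data again, ask the iterable for a new iterator.
again = list(iter(numbers))
```

Here ``leftover`` is empty because ``max()`` already consumed every element, while ``again`` contains all three values.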
@@ -267,15 +267,11 @@ sequence type, such as strings, will automatically support creation of an
iterator.
Calling :func:`iter` on a dictionary returns an iterator that will loop over the
-dictionary's keys:
-
-.. not a doctest since dict ordering varies across Pythons
-
-::
+dictionary's keys::
>>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
... 'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
- >>> for key in m:
+ >>> for key in m: #doctest: +SKIP
... print(key, m[key])
Mar 3
Feb 2
@@ -296,7 +292,7 @@ ordering of the objects in the dictionary.
Applying :func:`iter` to a dictionary always loops over the keys, but
dictionaries have methods that return other iterators. If you want to iterate
over values or key/value pairs, you can explicitly call the
-:meth:`values` or :meth:`items` methods to get an appropriate iterator.
+:meth:`~dict.values` or :meth:`~dict.items` methods to get an appropriate iterator.
The :func:`dict` constructor can accept an iterator that returns a finite stream
of ``(key, value)`` tuples:
@@ -305,9 +301,9 @@ of ``(key, value)`` tuples:
>>> dict(iter(L))
{'Italy': 'Rome', 'US': 'Washington DC', 'France': 'Paris'}
-Files also support iteration by calling the ``readline()`` method until there
-are no more lines in the file. This means you can read each line of a file like
-this::
+Files also support iteration by calling the :meth:`~io.TextIOBase.readline`
+method until there are no more lines in the file. This means you can read each
+line of a file like this::
for line in file:
# do something for each line
@@ -410,12 +406,9 @@ clauses, the length of the resulting output will be equal to the product of the
lengths of all the sequences. If you have two lists of length 3, the output
list is 9 elements long:
-.. doctest::
- :options: +NORMALIZE_WHITESPACE
-
>>> seq1 = 'abc'
>>> seq2 = (1,2,3)
- >>> [(x,y) for x in seq1 for y in seq2]
+ >>> [(x, y) for x in seq1 for y in seq2] #doctest: +NORMALIZE_WHITESPACE
[('a', 1), ('a', 2), ('a', 3),
('b', 1), ('b', 2), ('b', 3),
('c', 1), ('c', 2), ('c', 3)]
@@ -425,9 +418,9 @@ creating a tuple, it must be surrounded with parentheses. The first list
comprehension below is a syntax error, while the second one is correct::
# Syntax error
- [ x,y for x in seq1 for y in seq2]
+ [x, y for x in seq1 for y in seq2]
# Correct
- [ (x,y) for x in seq1 for y in seq2]
+ [(x, y) for x in seq1 for y in seq2]
Generators
@@ -448,15 +441,13 @@ is what generators provide; they can be thought of as resumable functions.
Here's the simplest example of a generator function:
-.. testcode::
+ >>> def generate_ints(N):
+ ... for i in range(N):
+ ... yield i
- def generate_ints(N):
- for i in range(N):
- yield i
-
-Any function containing a ``yield`` keyword is a generator function; this is
-detected by Python's :term:`bytecode` compiler which compiles the function
-specially as a result.
+Any function containing a :keyword:`yield` keyword is a generator function;
+this is detected by Python's :term:`bytecode` compiler which compiles the
+function specially as a result.
When you call a generator function, it doesn't return a single value; instead it
returns a generator object that supports the iterator protocol. On executing
@@ -464,12 +455,13 @@ the ``yield`` expression, the generator outputs the value of ``i``, similar to a
``return`` statement. The big difference between ``yield`` and a ``return``
statement is that on reaching a ``yield`` the generator's state of execution is
suspended and local variables are preserved. On the next call to the
-generator's ``.__next__()`` method, the function will resume executing.
+generator's :meth:`~generator.__next__` method, the function will resume
+executing.
Here's a sample usage of the ``generate_ints()`` generator:
>>> gen = generate_ints(3)
- >>> gen
+ >>> gen #doctest: +ELLIPSIS
<generator object generate_ints at ...>
>>> next(gen)
0
@@ -491,17 +483,19 @@ value, and signals the end of the procession of values; after executing a
``return`` the generator cannot return any further values. ``return`` with a
value, such as ``return 5``, is a syntax error inside a generator function. The
end of the generator's results can also be indicated by raising
-``StopIteration`` manually, or by just letting the flow of execution fall off
+:exc:`StopIteration` manually, or by just letting the flow of execution fall off
the bottom of the function.
You could achieve the effect of generators manually by writing your own class
and storing all the local variables of the generator as instance variables. For
example, returning a list of integers could be done by setting ``self.count`` to
-0, and having the ``__next__()`` method increment ``self.count`` and return it.
+0, and having the :meth:`~iterator.__next__` method increment ``self.count`` and
+return it.
However, for a moderately complicated generator, writing a corresponding class
can be much messier.
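For comparison, here is a rough sketch of what such a hand-written class might look like for the ``generate_ints()`` example. The class name ``GenerateInts`` is hypothetical, and this version returns the current count before incrementing it so that it produces the same values as the generator:

```python
class GenerateInts:
    """Class-based counterpart of generate_ints(N); the name is hypothetical."""

    def __init__(self, n):
        self.n = n
        self.count = 0          # state a generator would keep in a local variable

    def __iter__(self):
        # An iterator returns itself from __iter__().
        return self

    def __next__(self):
        if self.count >= self.n:
            raise StopIteration  # signals the end of the stream
        value = self.count
        self.count += 1
        return value

result = list(GenerateInts(3))
```

Even in this small case the bookkeeping that the generator version handles implicitly has to be written out by hand.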
-The test suite included with Python's library, ``test_generators.py``, contains
+The test suite included with Python's library,
+:source:`Lib/test/test_generators.py`, contains
a number of more interesting examples. Here's one generator that implements an
in-order traversal of a tree using generators recursively. ::
@@ -544,23 +538,23 @@ when you're doing something with the returned value, as in the above example.
The parentheses aren't always necessary, but it's easier to always add them
instead of having to remember when they're needed.
-(PEP 342 explains the exact rules, which are that a ``yield``-expression must
+(:pep:`342` explains the exact rules, which are that a ``yield``-expression must
always be parenthesized except when it occurs at the top-level expression on the
right-hand side of an assignment. This means you can write ``val = yield i``
but have to use parentheses when there's an operation, as in ``val = (yield i)
+ 12``.)
-Values are sent into a generator by calling its ``send(value)`` method. This
-method resumes the generator's code and the ``yield`` expression returns the
-specified value. If the regular ``__next__()`` method is called, the ``yield``
-returns ``None``.
+Values are sent into a generator by calling its :meth:`send(value)
+<generator.send>` method. This method resumes the generator's code and the
+``yield`` expression returns the specified value. If the regular
+:meth:`~generator.__next__` method is called, the ``yield`` returns ``None``.
Here's a simple counter that increments by 1 and allows changing the value of
the internal counter.
.. testcode::
- def counter (maximum):
+ def counter(maximum):
i = 0
while i < maximum:
val = (yield i)
@@ -572,16 +566,16 @@ the internal counter.
And here's an example of changing the counter:
- >>> it = counter(10)
- >>> next(it)
+ >>> it = counter(10) #doctest: +SKIP
+ >>> next(it) #doctest: +SKIP
0
- >>> next(it)
+ >>> next(it) #doctest: +SKIP
1
- >>> it.send(8)
+ >>> it.send(8) #doctest: +SKIP
8
- >>> next(it)
+ >>> next(it) #doctest: +SKIP
9
- >>> next(it)
+ >>> next(it) #doctest: +SKIP
Traceback (most recent call last):
File "t.py", line 15, in ?
it.next()
@@ -589,20 +583,23 @@ And here's an example of changing the counter:
Because ``yield`` will often be returning ``None``, you should always check for
this case. Don't just use its value in expressions unless you're sure that the
-``send()`` method will be the only method used resume your generator function.
+:meth:`~generator.send` method will be the only method used to resume your
+generator function.
-In addition to ``send()``, there are two other new methods on generators:
+In addition to :meth:`~generator.send`, there are two other methods on
+generators:
-* ``throw(type, value=None, traceback=None)`` is used to raise an exception
- inside the generator; the exception is raised by the ``yield`` expression
- where the generator's execution is paused.
+* :meth:`throw(type, value=None, traceback=None) <generator.throw>` is used to
+ raise an exception inside the generator; the exception is raised by the
+ ``yield`` expression where the generator's execution is paused.
-* ``close()`` raises a :exc:`GeneratorExit` exception inside the generator to
- terminate the iteration. On receiving this exception, the generator's code
- must either raise :exc:`GeneratorExit` or :exc:`StopIteration`; catching the
- exception and doing anything else is illegal and will trigger a
- :exc:`RuntimeError`. ``close()`` will also be called by Python's garbage
- collector when the generator is garbage-collected.
+* :meth:`~generator.close` raises a :exc:`GeneratorExit` exception inside the
+ generator to terminate the iteration. On receiving this exception, the
+ generator's code must either raise :exc:`GeneratorExit` or
+ :exc:`StopIteration`; catching the exception and doing anything else is
+ illegal and will trigger a :exc:`RuntimeError`. :meth:`~generator.close`
+ will also be called by Python's garbage collector when the generator is
+ garbage-collected.
If you need to run cleanup code when a :exc:`GeneratorExit` occurs, I suggest
using a ``try: ... finally:`` suite instead of catching :exc:`GeneratorExit`.
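A minimal sketch of that suggestion follows. The ``read_items`` generator and the ``events`` list are hypothetical names used only to make the cleanup observable:

```python
events = []  # hypothetical log, used only to observe that cleanup ran

def read_items(items):
    try:
        for item in items:
            yield item
    finally:
        # Runs however the generator leaves the try block,
        # including when close() raises GeneratorExit at the yield.
        events.append("cleanup")

gen = read_items([1, 2, 3])
first = next(gen)   # the generator is now paused at the yield
gen.close()         # GeneratorExit is raised at the paused yield
```

After ``gen.close()`` returns, ``events`` contains ``"cleanup"`` even though the generator never caught :exc:`GeneratorExit` explicitly.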
@@ -624,13 +621,12 @@ Let's look in more detail at built-in functions often used with iterators.
Two of Python's built-in functions, :func:`map` and :func:`filter` duplicate the
features of generator expressions:
-``map(f, iterA, iterB, ...)`` returns an iterator over the sequence
+:func:`map(f, iterA, iterB, ...) <map>` returns an iterator over the sequence
``f(iterA[0], iterB[0]), f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ...``.
>>> def upper(s):
... return s.upper()
-
>>> list(map(upper, ['sentence', 'fragment']))
['SENTENCE', 'FRAGMENT']
>>> [upper(s) for s in ['sentence', 'fragment']]
@@ -638,11 +634,11 @@ features of generator expressions:
You can of course achieve the same effect with a list comprehension.
-``filter(predicate, iter)`` returns an iterator over all the sequence elements
-that meet a certain condition, and is similarly duplicated by list
-comprehensions. A **predicate** is a function that returns the truth value of
-some condition; for use with :func:`filter`, the predicate must take a single
-value.
+:func:`filter(predicate, iter) <filter>` returns an iterator over all the
+sequence elements that meet a certain condition, and is similarly duplicated by
+list comprehensions. A **predicate** is a function that returns the truth
+value of some condition; for use with :func:`filter`, the predicate must take a
+single value.
>>> def is_even(x):
... return (x % 2) == 0
@@ -657,8 +653,8 @@ This can also be written as a list comprehension:
[0, 2, 4, 6, 8]
-``enumerate(iter)`` counts off the elements in the iterable, returning 2-tuples
-containing the count and each element. ::
+:func:`enumerate(iter) <enumerate>` counts off the elements in the iterable,
+returning 2-tuples containing the count and each element. ::
>>> for item in enumerate(['subject', 'verb', 'object']):
... print(item)
@@ -674,29 +670,28 @@ indexes at which certain conditions are met::
if line.strip() == '':
print('Blank line at line #%i' % i)
-``sorted(iterable, [key=None], [reverse=False])`` collects all the elements of
-the iterable into a list, sorts the list, and returns the sorted result. The
-``key``, and ``reverse`` arguments are passed through to the constructed list's
-``.sort()`` method. ::
+:func:`sorted(iterable, key=None, reverse=False) <sorted>` collects all the
+elements of the iterable into a list, sorts the list, and returns the sorted
+result. The *key* and *reverse* arguments are passed through to the
+constructed list's :meth:`~list.sort` method. ::
>>> import random
>>> # Generate 8 random numbers between [0, 10000)
>>> rand_list = random.sample(range(10000), 8)
- >>> rand_list
+ >>> rand_list #doctest: +SKIP
[769, 7953, 9828, 6431, 8442, 9878, 6213, 2207]
- >>> sorted(rand_list)
+ >>> sorted(rand_list) #doctest: +SKIP
[769, 2207, 6213, 6431, 7953, 8442, 9828, 9878]
- >>> sorted(rand_list, reverse=True)
+ >>> sorted(rand_list, reverse=True) #doctest: +SKIP
[9878, 9828, 8442, 7953, 6431, 6213, 2207, 769]
-(For a more detailed discussion of sorting, see the Sorting mini-HOWTO in the
-Python wiki at http://wiki.python.org/moin/HowTo/Sorting.)
+(For a more detailed discussion of sorting, see the :ref:`sortinghowto`.)
-The ``any(iter)`` and ``all(iter)`` built-ins look at the truth values of an
-iterable's contents. :func:`any` returns True if any element in the iterable is
-a true value, and :func:`all` returns True if all of the elements are true
-values:
+The :func:`any(iter) <any>` and :func:`all(iter) <all>` built-ins look at the
+truth values of an iterable's contents. :func:`any` returns True if any element
+in the iterable is a true value, and :func:`all` returns True if all of the
+elements are true values:
>>> any([0,1,0])
True
@@ -712,7 +707,7 @@ values:
True
-``zip(iterA, iterB, ...)`` takes one element from each iterable and
+:func:`zip(iterA, iterB, ...) <zip>` takes one element from each iterable and
returns them in a tuple::
zip(['a', 'b', 'c'], (1, 2, 3)) =>
@@ -752,42 +747,44 @@ The module's functions fall into a few broad classes:
Creating new iterators
----------------------
-``itertools.count(n)`` returns an infinite stream of integers, increasing by 1
-each time. You can optionally supply the starting number, which defaults to 0::
+:func:`itertools.count(n) <itertools.count>` returns an infinite stream of
+integers, increasing by 1 each time. You can optionally supply the starting
+number, which defaults to 0::
itertools.count() =>
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...
itertools.count(10) =>
10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...
-``itertools.cycle(iter)`` saves a copy of the contents of a provided iterable
-and returns a new iterator that returns its elements from first to last. The
-new iterator will repeat these elements infinitely. ::
+:func:`itertools.cycle(iter) <itertools.cycle>` saves a copy of the contents of
+a provided iterable and returns a new iterator that returns its elements from
+first to last. The new iterator will repeat these elements infinitely. ::
itertools.cycle([1,2,3,4,5]) =>
1, 2, 3, 4, 5, 1, 2, 3, 4, 5, ...
-``itertools.repeat(elem, [n])`` returns the provided element ``n`` times, or
-returns the element endlessly if ``n`` is not provided. ::
+:func:`itertools.repeat(elem, [n]) <itertools.repeat>` returns the provided
+element *n* times, or returns the element endlessly if *n* is not provided. ::
itertools.repeat('abc') =>
abc, abc, abc, abc, abc, abc, abc, abc, abc, abc, ...
itertools.repeat('abc', 5) =>
abc, abc, abc, abc, abc
-``itertools.chain(iterA, iterB, ...)`` takes an arbitrary number of iterables as
-input, and returns all the elements of the first iterator, then all the elements
-of the second, and so on, until all of the iterables have been exhausted. ::
+:func:`itertools.chain(iterA, iterB, ...) <itertools.chain>` takes an arbitrary
+number of iterables as input, and returns all the elements of the first
+iterator, then all the elements of the second, and so on, until all of the
+iterables have been exhausted. ::
itertools.chain(['a', 'b', 'c'], (1, 2, 3)) =>
a, b, c, 1, 2, 3
-``itertools.islice(iter, [start], stop, [step])`` returns a stream that's a
-slice of the iterator. With a single ``stop`` argument, it will return the
-first ``stop`` elements. If you supply a starting index, you'll get
-``stop-start`` elements, and if you supply a value for ``step``, elements will
-be skipped accordingly. Unlike Python's string and list slicing, you can't use
-negative values for ``start``, ``stop``, or ``step``. ::
+:func:`itertools.islice(iter, [start], stop, [step]) <itertools.islice>` returns
+a stream that's a slice of the iterator. With a single *stop* argument, it
+will return the first *stop* elements. If you supply a starting index, you'll
+get *stop-start* elements, and if you supply a value for *step*, elements
+will be skipped accordingly. Unlike Python's string and list slicing, you can't
+use negative values for *start*, *stop*, or *step*. ::
itertools.islice(range(10), 8) =>
0, 1, 2, 3, 4, 5, 6, 7
@@ -796,9 +793,10 @@ negative values for ``start``, ``stop``, or ``step``. ::
itertools.islice(range(10), 2, 8, 2) =>
2, 4, 6
-``itertools.tee(iter, [n])`` replicates an iterator; it returns ``n``
-independent iterators that will all return the contents of the source iterator.
-If you don't supply a value for ``n``, the default is 2. Replicating iterators
+:func:`itertools.tee(iter, [n]) <itertools.tee>` replicates an iterator; it
+returns *n* independent iterators that will all return the contents of the
+source iterator.
+If you don't supply a value for *n*, the default is 2. Replicating iterators
requires saving some of the contents of the source iterator, so this can consume
significant memory if the iterator is large and one of the new iterators is
consumed more than the others. ::
@@ -816,19 +814,21 @@ consumed more than the others. ::
Calling functions on elements
-----------------------------
-The ``operator`` module contains a set of functions corresponding to Python's
-operators. Some examples are ``operator.add(a, b)`` (adds two values),
-``operator.ne(a, b)`` (same as ``a!=b``), and ``operator.attrgetter('id')``
-(returns a callable that fetches the ``"id"`` attribute).
+The :mod:`operator` module contains a set of functions corresponding to Python's
+operators. Some examples are :func:`operator.add(a, b) <operator.add>` (adds
+two values), :func:`operator.ne(a, b) <operator.ne>` (same as ``a != b``), and
+:func:`operator.attrgetter('id') <operator.attrgetter>`
+(returns a callable that fetches the ``.id`` attribute).
-``itertools.starmap(func, iter)`` assumes that the iterable will return a stream
-of tuples, and calls ``f()`` using these tuples as the arguments::
+:func:`itertools.starmap(func, iter) <itertools.starmap>` assumes that the
+iterable will return a stream of tuples, and calls *func* using these tuples as
+the arguments::
itertools.starmap(os.path.join,
- [('/usr', 'bin', 'java'), ('/bin', 'python'),
- ('/usr', 'bin', 'perl'),('/usr', 'bin', 'ruby')])
+ [('/bin', 'python'), ('/usr', 'bin', 'java'),
+ ('/usr', 'bin', 'perl'), ('/usr', 'bin', 'ruby')])
=>
- /usr/bin/java, /bin/python, /usr/bin/perl, /usr/bin/ruby
+ /bin/python, /usr/bin/java, /usr/bin/perl, /usr/bin/ruby
Selecting elements
@@ -837,20 +837,18 @@ Selecting elements
Another group of functions chooses a subset of an iterator's elements based on a
predicate.
-``itertools.filterfalse(predicate, iter)`` is the opposite, returning all
-elements for which the predicate returns false::
+:func:`itertools.filterfalse(predicate, iter) <itertools.filterfalse>` is the
+opposite, returning all elements for which the predicate returns false::
itertools.filterfalse(is_even, itertools.count()) =>
1, 3, 5, 7, 9, 11, 13, 15, ...
-``itertools.takewhile(predicate, iter)`` returns elements for as long as the
-predicate returns true. Once the predicate returns false, the iterator will
-signal the end of its results.
-
-::
+:func:`itertools.takewhile(predicate, iter) <itertools.takewhile>` returns
+elements for as long as the predicate returns true. Once the predicate returns
+false, the iterator will signal the end of its results. ::
     def less_than_10(x):
-        return (x < 10)
+        return x < 10
     itertools.takewhile(less_than_10, itertools.count()) =>
       0, 1, 2, 3, 4, 5, 6, 7, 8, 9
@@ -858,10 +856,9 @@ signal the end of its results.
itertools.takewhile(is_even, itertools.count()) =>
0
-``itertools.dropwhile(predicate, iter)`` discards elements while the predicate
-returns true, and then returns the rest of the iterable's results.
-
-::
+:func:`itertools.dropwhile(predicate, iter) <itertools.dropwhile>` discards
+elements while the predicate returns true, and then returns the rest of the
+iterable's results. ::
itertools.dropwhile(less_than_10, itertools.count()) =>
10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...
@@ -873,14 +870,14 @@ returns true, and then returns the rest of the iterable's results.
Grouping elements
-----------------
-The last function I'll discuss, ``itertools.groupby(iter, key_func=None)``, is
-the most complicated. ``key_func(elem)`` is a function that can compute a key
-value for each element returned by the iterable. If you don't supply a key
-function, the key is simply each element itself.
+The last function I'll discuss, :func:`itertools.groupby(iter, key_func=None)
+<itertools.groupby>`, is the most complicated. ``key_func(elem)`` is a function
+that can compute a key value for each element returned by the iterable. If you
+don't supply a key function, the key is simply each element itself.
-``groupby()`` collects all the consecutive elements from the underlying iterable
-that have the same key value, and returns a stream of 2-tuples containing a key
-value and an iterator for the elements with that key.
+:func:`~itertools.groupby` collects all the consecutive elements from the
+underlying iterable that have the same key value, and returns a stream of
+2-tuples containing a key value and an iterator for the elements with that key.
::
@@ -890,7 +887,7 @@ value and an iterator for the elements with that key.
...
]
-    def get_state (city_state):
+    def get_state(city_state):
         return city_state[1]
     itertools.groupby(city_list, get_state) =>
@@ -906,9 +903,9 @@ value and an iterator for the elements with that key.
iterator-3 =>
('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ')
-``groupby()`` assumes that the underlying iterable's contents will already be
-sorted based on the key. Note that the returned iterators also use the
-underlying iterable, so you have to consume the results of iterator-1 before
+:func:`~itertools.groupby` assumes that the underlying iterable's contents will
+already be sorted based on the key. Note that the returned iterators also use
+the underlying iterable, so you have to consume the results of iterator-1 before
requesting iterator-2 and its corresponding key.
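
A minimal sketch of why the sorting assumption matters, using a small made-up
city list (the data below is illustrative and not part of the example above).
Grouping unsorted input splits one key into several runs, while sorting with
the same key function first yields one group per state:

```python
import itertools

# Hypothetical sample data, deliberately not sorted by state.
city_list = [('Decatur', 'AL'), ('Anchorage', 'AK'),
             ('Huntsville', 'AL'), ('Nome', 'AK')]

def get_state(city_state):
    return city_state[1]

# Without sorting, every run of consecutive equal keys is its own group,
# so the same state shows up more than once.
unsorted_keys = [key for key, group in itertools.groupby(city_list, get_state)]

# Sorting by the same key function first gives one group per state.
grouped = {}
for state, cities in itertools.groupby(sorted(city_list, key=get_state),
                                       get_state):
    # Consume each group before advancing to the next key.
    grouped[state] = [name for name, _ in cities]
```

Each group is turned into a list inside the loop because the group iterators
share the underlying iterable and become invalid once the next key is requested.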
@@ -926,33 +923,34 @@ Consider a Python function ``f(a, b, c)``; you may wish to create a new function
``g(b, c)`` that's equivalent to ``f(1, b, c)``; you're filling in a value for
one of ``f()``'s parameters. This is called "partial function application".
-The constructor for ``partial`` takes the arguments ``(function, arg1, arg2,
-... kwarg1=value1, kwarg2=value2)``. The resulting object is callable, so you
-can just call it to invoke ``function`` with the filled-in arguments.
+The constructor for :func:`~functools.partial` takes the arguments
+``(function, arg1, arg2, ..., kwarg1=value1, kwarg2=value2)``. The resulting
+object is callable, so you can just call it to invoke ``function`` with the
+filled-in arguments.
Here's a small but realistic example::
     import functools
-    def log (message, subsystem):
-        "Write the contents of 'message' to the specified subsystem."
+    def log(message, subsystem):
+        """Write the contents of 'message' to the specified subsystem."""
         print('%s: %s' % (subsystem, message))
         ...
     server_log = functools.partial(log, subsystem='server')
     server_log('Unable to open socket')
-``functools.reduce(func, iter, [initial_value])`` cumulatively performs an
-operation on all the iterable's elements and, therefore, can't be applied to
-infinite iterables. (Note it is not in :mod:`builtins`, but in the
-:mod:`functools` module.) ``func`` must be a function that takes two elements
-and returns a single value. :func:`functools.reduce` takes the first two
-elements A and B returned by the iterator and calculates ``func(A, B)``. It
-then requests the third element, C, calculates ``func(func(A, B), C)``, combines
-this result with the fourth element returned, and continues until the iterable
-is exhausted. If the iterable returns no values at all, a :exc:`TypeError`
-exception is raised. If the initial value is supplied, it's used as a starting
-point and ``func(initial_value, A)`` is the first calculation. ::
+:func:`functools.reduce(func, iter, [initial_value]) <functools.reduce>`
+cumulatively performs an operation on all the iterable's elements and,
+therefore, can't be applied to infinite iterables. *func* must be a function
+that takes two elements and returns a single value. :func:`functools.reduce`
+takes the first two elements A and B returned by the iterator and calculates
+``func(A, B)``. It then requests the third element, C, calculates
+``func(func(A, B), C)``, combines this result with the fourth element returned,
+and continues until the iterable is exhausted. If the iterable returns no
+values at all, a :exc:`TypeError` exception is raised. If the initial value is
+supplied, it's used as a starting point and ``func(initial_value, A)`` is the
+first calculation. ::
>>> import operator, functools
>>> functools.reduce(operator.concat, ['A', 'BB', 'C'])
@@ -978,8 +976,8 @@ built-in called :func:`sum` to compute it:
>>> sum([])
0
-For many uses of :func:`functools.reduce`, though, it can be clearer to just write the
-obvious :keyword:`for` loop::
+For many uses of :func:`functools.reduce`, though, it can be clearer to just
+write the obvious :keyword:`for` loop::
import functools
# Instead of:
@@ -1010,135 +1008,6 @@ Some of the functions in this module are:
Consult the operator module's documentation for a complete list.
-
-The functional module
----------------------
-
-Collin Winter's `functional module <http://oakwinter.com/code/functional/>`__
-provides a number of more advanced tools for functional programming. It also
-reimplements several Python built-ins, trying to make them more intuitive to
-those used to functional programming in other languages.
-
-This section contains an introduction to some of the most important functions in
-``functional``; full documentation can be found at `the project's website
-<http://oakwinter.com/code/functional/documentation/>`__.
-
-``compose(outer, inner, unpack=False)``
-
-The ``compose()`` function implements function composition. In other words, it
-returns a wrapper around the ``outer`` and ``inner`` callables, such that the
-return value from ``inner`` is fed directly to ``outer``. That is, ::
-
- >>> def add(a, b):
- ... return a + b
- ...
- >>> def double(a):
- ... return 2 * a
- ...
- >>> compose(double, add)(5, 6)
- 22
-
-is equivalent to ::
-
- >>> double(add(5, 6))
- 22
-
-The ``unpack`` keyword is provided to work around the fact that Python functions
-are not always `fully curried <http://en.wikipedia.org/wiki/Currying>`__. By
-default, it is expected that the ``inner`` function will return a single object
-and that the ``outer`` function will take a single argument. Setting the
-``unpack`` argument causes ``compose`` to expect a tuple from ``inner`` which
-will be expanded before being passed to ``outer``. Put simply, ::
-
- compose(f, g)(5, 6)
-
-is equivalent to::
-
- f(g(5, 6))
-
-while ::
-
- compose(f, g, unpack=True)(5, 6)
-
-is equivalent to::
-
- f(*g(5, 6))
-
-Even though ``compose()`` only accepts two functions, it's trivial to build up a
-version that will compose any number of functions. We'll use
-:func:`functools.reduce`, ``compose()`` and ``partial()`` (the last of which is
-provided by both ``functional`` and ``functools``). ::
-
- from functional import compose, partial
- import functools
-
-
- multi_compose = partial(functools.reduce, compose)
-
-
-We can also use ``map()``, ``compose()`` and ``partial()`` to craft a version of
-``"".join(...)`` that converts its arguments to string::
-
- from functional import compose, partial
-
- join = compose("".join, partial(map, str))
-
-
-``flip(func)``
-
-``flip()`` wraps the callable in ``func`` and causes it to receive its
-non-keyword arguments in reverse order. ::
-
- >>> def triple(a, b, c):
- ... return (a, b, c)
- ...
- >>> triple(5, 6, 7)
- (5, 6, 7)
- >>>
- >>> flipped_triple = flip(triple)
- >>> flipped_triple(5, 6, 7)
- (7, 6, 5)
-
-``foldl(func, start, iterable)``
-
-``foldl()`` takes a binary function, a starting value (usually some kind of
-'zero'), and an iterable. The function is applied to the starting value and the
-first element of the list, then the result of that and the second element of the
-list, then the result of that and the third element of the list, and so on.
-
-This means that a call such as::
-
- foldl(f, 0, [1, 2, 3])
-
-is equivalent to::
-
- f(f(f(0, 1), 2), 3)
-
-
-``foldl()`` is roughly equivalent to the following recursive function::
-
- def foldl(func, start, seq):
- if len(seq) == 0:
- return start
-
- return foldl(func, func(start, seq[0]), seq[1:])
-
-Speaking of equivalence, the above ``foldl`` call can be expressed in terms of
-the built-in :func:`functools.reduce` like so::
-
- import functools
- functools.reduce(f, [1, 2, 3], 0)
-
-
-We can use ``foldl()``, ``operator.concat()`` and ``partial()`` to write a
-cleaner, more aesthetically-pleasing version of Python's ``"".join(...)``
-idiom::
-
- from functional import foldl, partial from operator import concat
-
- join = partial(foldl, concat, "")
-
-
Small functions and the lambda expression
=========================================
@@ -1152,28 +1021,23 @@ need to define a new function at all::
existing_files = filter(os.path.exists, file_list)
If the function you need doesn't exist, you need to write it. One way to write
-small functions is to use the ``lambda`` statement. ``lambda`` takes a number
-of parameters and an expression combining these parameters, and creates a small
-function that returns the value of the expression::
+small functions is to use the :keyword:`lambda` expression. ``lambda`` takes a
+number of parameters and an expression combining these parameters, and creates
+an anonymous function that returns the value of the expression::
- lowercase = lambda x: x.lower()
+ adder = lambda x, y: x+y
print_assign = lambda name, value: name + '=' + str(value)
- adder = lambda x, y: x+y
-
An alternative is to just use the ``def`` statement and define a function in the
usual way::
-    def lowercase(x):
-        return x.lower()
+    def adder(x, y):
+        return x + y
     def print_assign(name, value):
         return name + '=' + str(value)
-    def adder(x,y):
-        return x + y
-
Which alternative is preferable? That's a style question; my usual course is to
avoid using ``lambda``.
@@ -1182,9 +1046,7 @@ functions it can define. The result has to be computable as a single
expression, which means you can't have multiway ``if... elif... else``
comparisons or ``try... except`` statements. If you try to do too much in a
``lambda`` statement, you'll end up with an overly complicated expression that's
-hard to read. Quick, what's the following code doing?
-
-::
+hard to read. Quick, what's the following code doing? ::
import functools
total = functools.reduce(lambda a, b: (0, a[1] + b[1]), items)[1]
@@ -1194,7 +1056,7 @@ out what's going on. Using a short nested ``def`` statements makes things a
little bit better::
     import functools
-    def combine (a, b):
+    def combine(a, b):
         return 0, a[1] + b[1]
     total = functools.reduce(combine, items)[1]
@@ -1214,12 +1076,12 @@ Many uses of :func:`functools.reduce` are clearer when written as ``for`` loops.
Fredrik Lundh once suggested the following set of rules for refactoring uses of
``lambda``:
-1) Write a lambda function.
-2) Write a comment explaining what the heck that lambda does.
-3) Study the comment for a while, and think of a name that captures the essence
+1. Write a lambda function.
+2. Write a comment explaining what the heck that lambda does.
+3. Study the comment for a while, and think of a name that captures the essence
of the comment.
-4) Convert the lambda to a def statement, using that name.
-5) Remove the comment.
+4. Convert the lambda to a def statement, using that name.
+5. Remove the comment.
I really like these rules, but you're free to disagree
about whether this lambda-free style is better.
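
As a rough illustration of those five steps, here is a sketch that applies
them to the ``reduce`` example from earlier in this section. The ``items``
data and the name ``add_amounts`` are made up for this example:

```python
import functools

# Hypothetical data: (name, amount) pairs.
items = [('apples', 3), ('pears', 5), ('plums', 2)]

# Steps 1 and 2: a lambda, plus a comment explaining what it does.
# Adds up the second field of each pair, carrying the sum in slot 1.
total_lambda = functools.reduce(lambda a, b: (0, a[1] + b[1]), items)[1]

# Steps 3 to 5: pick a name that captures the comment, convert the
# lambda to a def using that name, and drop the comment.
def add_amounts(running, item):
    return 0, running[1] + item[1]

total_def = functools.reduce(add_amounts, items)[1]

# Once the operation has a name, a simpler spelling is often obvious.
total_sum = sum(amount for _, amount in items)
```

All three totals are equal; the named version simply makes the intent visible
at the call site.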
@@ -1280,9 +1142,9 @@ Text Processing".
Mertz also wrote a 3-part series of articles on functional programming
for IBM's DeveloperWorks site; see
-`part 1 <http://www-128.ibm.com/developerworks/library/l-prog.html>`__,
-`part 2 <http://www-128.ibm.com/developerworks/library/l-prog2.html>`__, and
-`part 3 <http://www-128.ibm.com/developerworks/linux/library/l-prog3.html>`__,
+`part 1 <http://www.ibm.com/developerworks/linux/library/l-prog/index.html>`__,
+`part 2 <http://www.ibm.com/developerworks/linux/library/l-prog2/index.html>`__, and
+`part 3 <http://www.ibm.com/developerworks/linux/library/l-prog3/index.html>`__,
Python documentation
diff --git a/Doc/howto/index.rst b/Doc/howto/index.rst
index 417ae0047e..a11d3da02f 100644
--- a/Doc/howto/index.rst
+++ b/Doc/howto/index.rst
@@ -14,15 +14,18 @@ Currently, the HOWTOs are:
:maxdepth: 1
advocacy.rst
+ pyporting.rst
cporting.rst
curses.rst
descriptor.rst
- doanddont.rst
functional.rst
+ logging.rst
+ logging-cookbook.rst
regex.rst
sockets.rst
sorting.rst
unicode.rst
urllib2.rst
webservers.rst
+ argparse.rst
diff --git a/Doc/howto/logging-cookbook.rst b/Doc/howto/logging-cookbook.rst
new file mode 100644
index 0000000000..33963f93ec
--- /dev/null
+++ b/Doc/howto/logging-cookbook.rst
@@ -0,0 +1,1672 @@
+.. _logging-cookbook:
+
+================
+Logging Cookbook
+================
+
+:Author: Vinay Sajip <vinay_sajip at red-dove dot com>
+
+This page contains a number of recipes related to logging, which have been found
+useful in the past.
+
+.. currentmodule:: logging
+
+Using logging in multiple modules
+---------------------------------
+
+Multiple calls to ``logging.getLogger('someLogger')`` return a reference to the
+same logger object. This is true not only within the same module, but also
+across modules as long as it is in the same Python interpreter process. It is
+true for references to the same object; additionally, application code can
+define and configure a parent logger in one module and create (but not
+configure) a child logger in a separate module, and all logger calls to the
+child will pass up to the parent. Here is a main module::
+
+    import logging
+    import auxiliary_module
+
+    # create logger with 'spam_application'
+    logger = logging.getLogger('spam_application')
+    logger.setLevel(logging.DEBUG)
+    # create file handler which logs even debug messages
+    fh = logging.FileHandler('spam.log')
+    fh.setLevel(logging.DEBUG)
+    # create console handler with a higher log level
+    ch = logging.StreamHandler()
+    ch.setLevel(logging.ERROR)
+    # create formatter and add it to the handlers
+    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
+    fh.setFormatter(formatter)
+    ch.setFormatter(formatter)
+    # add the handlers to the logger
+    logger.addHandler(fh)
+    logger.addHandler(ch)
+
+    logger.info('creating an instance of auxiliary_module.Auxiliary')
+    a = auxiliary_module.Auxiliary()
+    logger.info('created an instance of auxiliary_module.Auxiliary')
+    logger.info('calling auxiliary_module.Auxiliary.do_something')
+    a.do_something()
+    logger.info('finished auxiliary_module.Auxiliary.do_something')
+    logger.info('calling auxiliary_module.some_function()')
+    auxiliary_module.some_function()
+    logger.info('done with auxiliary_module.some_function()')
+
+Here is the auxiliary module::
+
+    import logging
+
+    # create logger
+    module_logger = logging.getLogger('spam_application.auxiliary')
+
+    class Auxiliary:
+        def __init__(self):
+            self.logger = logging.getLogger('spam_application.auxiliary.Auxiliary')
+            self.logger.info('creating an instance of Auxiliary')
+        def do_something(self):
+            self.logger.info('doing something')
+            a = 1 + 1
+            self.logger.info('done doing something')
+
+    def some_function():
+        module_logger.info('received a call to "some_function"')
+
+The output looks like this::
+
+    2005-03-23 23:47:11,663 - spam_application - INFO -
+       creating an instance of auxiliary_module.Auxiliary
+    2005-03-23 23:47:11,665 - spam_application.auxiliary.Auxiliary - INFO -
+       creating an instance of Auxiliary
+    2005-03-23 23:47:11,665 - spam_application - INFO -
+       created an instance of auxiliary_module.Auxiliary
+    2005-03-23 23:47:11,668 - spam_application - INFO -
+       calling auxiliary_module.Auxiliary.do_something
+    2005-03-23 23:47:11,668 - spam_application.auxiliary.Auxiliary - INFO -
+       doing something
+    2005-03-23 23:47:11,669 - spam_application.auxiliary.Auxiliary - INFO -
+       done doing something
+    2005-03-23 23:47:11,670 - spam_application - INFO -
+       finished auxiliary_module.Auxiliary.do_something
+    2005-03-23 23:47:11,671 - spam_application - INFO -
+       calling auxiliary_module.some_function()
+    2005-03-23 23:47:11,672 - spam_application.auxiliary - INFO -
+       received a call to "some_function"
+    2005-03-23 23:47:11,673 - spam_application - INFO -
+       done with auxiliary_module.some_function()
+
+Multiple handlers and formatters
+--------------------------------
+
+Loggers are plain Python objects. The :meth:`~Logger.addHandler` method has no minimum
+or maximum quota for the number of handlers you may add. Sometimes it will be
+beneficial for an application to log all messages of all severities to a text
+file while simultaneously logging errors or above to the console. To set this
+up, simply configure the appropriate handlers. The logging calls in the
+application code will remain unchanged. Here is a slight modification to the
+previous simple module-based configuration example::
+
+    import logging
+
+    logger = logging.getLogger('simple_example')
+    logger.setLevel(logging.DEBUG)
+    # create file handler which logs even debug messages
+    fh = logging.FileHandler('spam.log')
+    fh.setLevel(logging.DEBUG)
+    # create console handler with a higher log level
+    ch = logging.StreamHandler()
+    ch.setLevel(logging.ERROR)
+    # create formatter and add it to the handlers
+    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
+    ch.setFormatter(formatter)
+    fh.setFormatter(formatter)
+    # add the handlers to logger
+    logger.addHandler(ch)
+    logger.addHandler(fh)
+
+    # 'application' code
+    logger.debug('debug message')
+    logger.info('info message')
+    logger.warning('warn message')
+    logger.error('error message')
+    logger.critical('critical message')
+
+Notice that the 'application' code does not care about multiple handlers. All
+that changed was the addition and configuration of a new handler named *fh*.
+
+The ability to create new handlers with higher- or lower-severity filters can be
+very helpful when writing and testing an application. Instead of using many
+``print`` statements for debugging, use ``logger.debug``: unlike the print
+statements, which you will have to delete or comment out later, the
+``logger.debug`` statements can remain intact in the source code and remain
+dormant until you need them again. At that time, the only change that needs to
+happen is to modify the severity level of the logger and/or handler to debug.
+
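A minimal sketch of that last point, assuming a made-up logger name
(``level_demo``) and an in-memory stream standing in for the console. The
debug call stays in the code the whole time; only the handler's level changes:

```python
import io
import logging

stream = io.StringIO()                    # stands in for the console
logger = logging.getLogger('level_demo')  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False                  # keep the demo self-contained

handler = logging.StreamHandler(stream)
handler.setLevel(logging.ERROR)
logger.addHandler(handler)

logger.debug('first debug message')   # dormant: the handler filters it out
handler.setLevel(logging.DEBUG)       # the only change that is needed
logger.debug('second debug message')  # now emitted

output = stream.getvalue()
```

Only the second message reaches the stream, even though both calls are
identical in form.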
+.. _multiple-destinations:
+
+Logging to multiple destinations
+--------------------------------
+
+Let's say you want to log to console and file with different message formats and
+in differing circumstances. Say you want to log messages with levels of DEBUG
+and higher to file, and those messages at level INFO and higher to the console.
+Let's also assume that the file should contain timestamps, but the console
+messages should not. Here's how you can achieve this::
+
+    import logging
+
+    # set up logging to file - see previous section for more details
+    logging.basicConfig(level=logging.DEBUG,
+                        format='%(asctime)s %(name)-12s %(levelname)-8s %(message)s',
+                        datefmt='%m-%d %H:%M',
+                        filename='/temp/myapp.log',
+                        filemode='w')
+    # define a Handler which writes INFO messages or higher to sys.stderr
+    console = logging.StreamHandler()
+    console.setLevel(logging.INFO)
+    # set a format which is simpler for console use
+    formatter = logging.Formatter('%(name)-12s: %(levelname)-8s %(message)s')
+    # tell the handler to use this format
+    console.setFormatter(formatter)
+    # add the handler to the root logger
+    logging.getLogger('').addHandler(console)
+
+    # Now, we can log to the root logger, or any other logger. First the root...
+    logging.info('Jackdaws love my big sphinx of quartz.')
+
+    # Now, define a couple of other loggers which might represent areas in your
+    # application:
+
+    logger1 = logging.getLogger('myapp.area1')
+    logger2 = logging.getLogger('myapp.area2')
+
+    logger1.debug('Quick zephyrs blow, vexing daft Jim.')
+    logger1.info('How quickly daft jumping zebras vex.')
+    logger2.warning('Jail zesty vixen who grabbed pay from quack.')
+    logger2.error('The five boxing wizards jump quickly.')
+
+When you run this, on the console you will see ::
+
+    root        : INFO     Jackdaws love my big sphinx of quartz.
+    myapp.area1 : INFO     How quickly daft jumping zebras vex.
+    myapp.area2 : WARNING  Jail zesty vixen who grabbed pay from quack.
+    myapp.area2 : ERROR    The five boxing wizards jump quickly.
+
+and in the file you will see something like ::
+
+    10-22 22:19 root         INFO     Jackdaws love my big sphinx of quartz.
+    10-22 22:19 myapp.area1  DEBUG    Quick zephyrs blow, vexing daft Jim.
+    10-22 22:19 myapp.area1  INFO     How quickly daft jumping zebras vex.
+    10-22 22:19 myapp.area2  WARNING  Jail zesty vixen who grabbed pay from quack.
+    10-22 22:19 myapp.area2  ERROR    The five boxing wizards jump quickly.
+
+As you can see, the DEBUG message only shows up in the file. The other messages
+are sent to both destinations.
+
+This example uses console and file handlers, but you can use any number and
+combination of handlers you choose.
+
+
+Configuration server example
+----------------------------
+
+Here is an example of a module using the logging configuration server::
+
+    import logging
+    import logging.config
+    import time
+    import os
+
+    # read initial config file
+    logging.config.fileConfig('logging.conf')
+
+    # create and start listener on port 9999
+    t = logging.config.listen(9999)
+    t.start()
+
+    logger = logging.getLogger('simpleExample')
+
+    try:
+        # loop through logging calls to see the difference
+        # new configurations make, until Ctrl+C is pressed
+        while True:
+            logger.debug('debug message')
+            logger.info('info message')
+            logger.warning('warn message')
+            logger.error('error message')
+            logger.critical('critical message')
+            time.sleep(5)
+    except KeyboardInterrupt:
+        # cleanup
+        logging.config.stopListening()
+        t.join()
+
+And here is a script that takes a filename and sends that file to the server,
+properly preceded with the binary-encoded length, as the new logging
+configuration::
+
+    #!/usr/bin/env python
+    import socket, sys, struct
+
+    with open(sys.argv[1], 'rb') as f:
+        data_to_send = f.read()
+
+    HOST = 'localhost'
+    PORT = 9999
+    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+    print('connecting...')
+    s.connect((HOST, PORT))
+    print('sending config...')
+    # sendall() keeps sending until every byte has been written
+    s.sendall(struct.pack('>L', len(data_to_send)))
+    s.sendall(data_to_send)
+    s.close()
+    print('complete')
+
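The length prefix used by the script above can be sketched in isolation. The
helper names ``frame_config`` and ``unframe_config`` are hypothetical and exist
only for this illustration; the ``'>L'`` format is the one used in the script,
a 4-byte big-endian unsigned integer:

```python
import struct

def frame_config(data):
    """Prefix ``data`` (bytes) with its length as a 4-byte big-endian integer."""
    return struct.pack('>L', len(data)) + data

def unframe_config(message):
    """Split a framed message back into (declared_length, payload)."""
    (length,) = struct.unpack('>L', message[:4])
    return length, message[4:4 + length]

# A tiny made-up configuration fragment, just to have some bytes to send.
framed = frame_config(b'[loggers]\nkeys=root\n')
length, payload = unframe_config(framed)
```

The receiving side reads the first four bytes to learn how many more bytes to
expect, which is why the sender has to write the length before the data.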
+
+Dealing with handlers that block
+--------------------------------
+
+.. currentmodule:: logging.handlers
+
+Sometimes you have to get your logging handlers to do their work without
+blocking the thread you're logging from. This is common in Web applications,
+though of course it also occurs in other scenarios.
+
+A common culprit which demonstrates sluggish behaviour is the
+:class:`SMTPHandler`: sending emails can take a long time, for a
+number of reasons outside the developer's control (for example, a poorly
+performing mail or network infrastructure). But almost any network-based
+handler can block: even a :class:`SocketHandler` operation may do a
+DNS query under the hood which is too slow (and this query can be deep in the
+socket library code, below the Python layer, and outside your control).
+
+One solution is to use a two-part approach. For the first part, attach only a
+:class:`QueueHandler` to those loggers which are accessed from
+performance-critical threads. They simply write to their queue, which can be
+sized to a large enough capacity or initialized with no upper bound to its
+size. The write to the queue will typically be accepted quickly, though you
+will probably need to catch the :exc:`queue.Full` exception as a precaution
+in your code. If you are a library developer who has performance-critical
+threads in your code, be sure to document this (together with a suggestion to
+attach only ``QueueHandlers`` to your loggers) for the benefit of other
+developers who will use your code.
+
+The second part of the solution is :class:`QueueListener`, which has been
+designed as the counterpart to :class:`QueueHandler`. A
+:class:`QueueListener` is very simple: it's passed a queue and some handlers,
+and it fires up an internal thread which listens to its queue for ``LogRecords``
+sent from ``QueueHandlers`` (or any other source of ``LogRecords``, for that
+matter). The ``LogRecords`` are removed from the queue and passed to the
+handlers for processing.
+
+The advantage of having a separate :class:`QueueListener` class is that you
+can use the same instance to service multiple ``QueueHandlers``. This is more
+resource-friendly than, say, having threaded versions of the existing handler
+classes, which would eat up one thread per handler for no particular benefit.
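+
+The sharing described above can be sketched as follows. The logger names and
+the ``ListHandler`` helper are assumptions made for illustration only; the
+point is that two ``QueueHandlers`` feed one queue, which is serviced by a
+single :class:`QueueListener` thread.
+

```python
import logging
import logging.handlers
import queue


class ListHandler(logging.Handler):
    """Illustrative handler that stores messages in a list for inspection."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append('%s: %s' % (record.name, record.getMessage()))


shared_queue = queue.Queue(-1)          # no limit on size
sink = ListHandler()
listener = logging.handlers.QueueListener(shared_queue, sink)

# Each logger gets its own QueueHandler, but both write to the same queue.
for name in ('demo.web', 'demo.db'):
    demo_logger = logging.getLogger(name)
    demo_logger.setLevel(logging.INFO)
    demo_logger.propagate = False
    demo_logger.addHandler(logging.handlers.QueueHandler(shared_queue))

listener.start()
logging.getLogger('demo.web').info('request handled')
logging.getLogger('demo.db').info('query finished')
listener.stop()                         # processes queued records, then joins

print(sink.messages)
```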
+
+An example of using these two classes follows (imports omitted)::
+
+ que = queue.Queue(-1) # no limit on size
+ queue_handler = QueueHandler(que)
+ handler = logging.StreamHandler()
+ listener = QueueListener(que, handler)
+ root = logging.getLogger()
+ root.addHandler(queue_handler)
+ formatter = logging.Formatter('%(threadName)s: %(message)s')
+ handler.setFormatter(formatter)
+ listener.start()
+ # The log output will display the thread which generated
+ # the event (the main thread) rather than the internal
+ # thread which monitors the internal queue. This is what
+ # you want to happen.
+ root.warning('Look out!')
+ listener.stop()
+
+which, when run, will produce::
+
+ MainThread: Look out!
+
+
+.. _network-logging:
+
+Sending and receiving logging events across a network
+-----------------------------------------------------
+
+Let's say you want to send logging events across a network, and handle them at
+the receiving end. A simple way of doing this is attaching a
+:class:`SocketHandler` instance to the root logger at the sending end::
+
+ import logging, logging.handlers
+
+ rootLogger = logging.getLogger('')
+ rootLogger.setLevel(logging.DEBUG)
+ socketHandler = logging.handlers.SocketHandler('localhost',
+ logging.handlers.DEFAULT_TCP_LOGGING_PORT)
+ # don't bother with a formatter, since a socket handler sends the event as
+ # an unformatted pickle
+ rootLogger.addHandler(socketHandler)
+
+ # Now, we can log to the root logger, or any other logger. First the root...
+ logging.info('Jackdaws love my big sphinx of quartz.')
+
+ # Now, define a couple of other loggers which might represent areas in your
+ # application:
+
+ logger1 = logging.getLogger('myapp.area1')
+ logger2 = logging.getLogger('myapp.area2')
+
+ logger1.debug('Quick zephyrs blow, vexing daft Jim.')
+ logger1.info('How quickly daft jumping zebras vex.')
+ logger2.warning('Jail zesty vixen who grabbed pay from quack.')
+ logger2.error('The five boxing wizards jump quickly.')
+
+At the receiving end, you can set up a receiver using the :mod:`socketserver`
+module. Here is a basic working example::
+
+ import pickle
+ import logging
+ import logging.handlers
+ import socketserver
+ import struct
+
+
+ class LogRecordStreamHandler(socketserver.StreamRequestHandler):
+     """Handler for a streaming logging request.
+
+     This basically logs the record using whatever logging policy is
+     configured locally.
+     """
+
+     def handle(self):
+         """
+         Handle multiple requests - each expected to be a 4-byte length,
+         followed by the LogRecord in pickle format. Logs the record
+         according to whatever policy is configured locally.
+         """
+         while True:
+             chunk = self.connection.recv(4)
+             if len(chunk) < 4:
+                 break
+             slen = struct.unpack('>L', chunk)[0]
+             chunk = self.connection.recv(slen)
+             while len(chunk) < slen:
+                 chunk = chunk + self.connection.recv(slen - len(chunk))
+             obj = self.unPickle(chunk)
+             record = logging.makeLogRecord(obj)
+             self.handleLogRecord(record)
+
+     def unPickle(self, data):
+         return pickle.loads(data)
+
+     def handleLogRecord(self, record):
+         # if a name is specified, we use the named logger rather than the one
+         # implied by the record.
+         if self.server.logname is not None:
+             name = self.server.logname
+         else:
+             name = record.name
+         logger = logging.getLogger(name)
+         # N.B. EVERY record gets logged. This is because Logger.handle
+         # is normally called AFTER logger-level filtering. If you want
+         # to do filtering, do it at the client end to save wasting
+         # cycles and network bandwidth!
+         logger.handle(record)
+
+ class LogRecordSocketReceiver(socketserver.ThreadingTCPServer):
+     """
+     Simple TCP socket-based logging receiver suitable for testing.
+     """
+
+     allow_reuse_address = 1
+
+     def __init__(self, host='localhost',
+                  port=logging.handlers.DEFAULT_TCP_LOGGING_PORT,
+                  handler=LogRecordStreamHandler):
+         socketserver.ThreadingTCPServer.__init__(self, (host, port), handler)
+         self.abort = 0
+         self.timeout = 1
+         self.logname = None
+
+     def serve_until_stopped(self):
+         import select
+         abort = 0
+         while not abort:
+             rd, wr, ex = select.select([self.socket.fileno()],
+                                        [], [],
+                                        self.timeout)
+             if rd:
+                 self.handle_request()
+             abort = self.abort
+
+ def main():
+     logging.basicConfig(
+         format='%(relativeCreated)5d %(name)-15s %(levelname)-8s %(message)s')
+     tcpserver = LogRecordSocketReceiver()
+     print('About to start TCP server...')
+     tcpserver.serve_until_stopped()
+
+ if __name__ == '__main__':
+     main()
+
+First run the server, and then the client. On the client side, nothing is
+printed on the console; on the server side, you should see something like::
+
+ About to start TCP server...
+ 59 root INFO Jackdaws love my big sphinx of quartz.
+ 59 myapp.area1 DEBUG Quick zephyrs blow, vexing daft Jim.
+ 69 myapp.area1 INFO How quickly daft jumping zebras vex.
+ 69 myapp.area2 WARNING Jail zesty vixen who grabbed pay from quack.
+ 69 myapp.area2 ERROR The five boxing wizards jump quickly.
+
+Note that there are some security issues with pickle in some scenarios. If
+these affect you, you can use an alternative serialization scheme by overriding
+the :meth:`makePickle` method and implementing your alternative there, as
+well as adapting the above script to use your alternative serialization.
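+
+As a rough sketch of such an alternative, the handler below sends each record
+as length-prefixed JSON instead of a pickle. The names ``JSONSocketHandler``
+and ``decode_record`` are hypothetical, and the sketch assumes that it is
+acceptable to merge the arguments into the message on the sending side and to
+drop exception information; a receiver would call something like
+``decode_record`` in place of ``unPickle``.
+

```python
import json
import logging
import logging.handlers
import struct


class JSONSocketHandler(logging.handlers.SocketHandler):
    """Hypothetical handler: serialize records as JSON rather than pickle."""

    def makePickle(self, record):
        d = dict(record.__dict__)
        d['msg'] = record.getMessage()   # merge args into the message now
        d['args'] = None
        d['exc_info'] = None             # tracebacks are dropped in this sketch
        payload = json.dumps(d, default=str).encode('utf-8')
        # Same framing as the default implementation: 4-byte length prefix.
        return struct.pack('>L', len(payload)) + payload


def decode_record(data):
    """Receiver-side counterpart: rebuild a LogRecord from one framed message."""
    (length,) = struct.unpack('>L', data[:4])
    attrs = json.loads(data[4:4 + length].decode('utf-8'))
    return logging.makeLogRecord(attrs)


# Round trip without any network traffic: the socket is only opened on emit.
sender = JSONSocketHandler('localhost',
                           logging.handlers.DEFAULT_TCP_LOGGING_PORT)
original = logging.LogRecord('myapp.area1', logging.INFO, 'demo.py', 10,
                             'hello %s', ('world',), None)
wire_bytes = sender.makePickle(original)
rebuilt = decode_record(wire_bytes)
print(rebuilt.name, rebuilt.getMessage())
```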
+
+
+.. _context-info:
+
+Adding contextual information to your logging output
+----------------------------------------------------
+
+Sometimes you want logging output to contain contextual information in
+addition to the parameters passed to the logging call. For example, in a
+networked application, it may be desirable to log client-specific information
+in the log (e.g. remote client's username, or IP address). Although you could
+use the *extra* parameter to achieve this, it's not always convenient to pass
+the information in this way. While it might be tempting to create
+:class:`Logger` instances on a per-connection basis, this is not a good idea
+because these instances are not garbage collected. This is not a problem in
+practice when the number of :class:`Logger` instances depends only on the
+level of granularity you want to use in logging an application, but it could
+be hard to manage if the number of :class:`Logger` instances becomes
+effectively unbounded.
+
+
+Using LoggerAdapters to impart contextual information
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+An easy way in which you can pass contextual information to be output along
+with logging event information is to use the :class:`LoggerAdapter` class.
+This class is designed to look like a :class:`Logger`, so that you can call
+:meth:`debug`, :meth:`info`, :meth:`warning`, :meth:`error`,
+:meth:`exception`, :meth:`critical` and :meth:`log`. These methods have the
+same signatures as their counterparts in :class:`Logger`, so you can use the
+two types of instances interchangeably.
+
+When you create an instance of :class:`LoggerAdapter`, you pass it a
+:class:`Logger` instance and a dict-like object which contains your contextual
+information. When you call one of the logging methods on an instance of
+:class:`LoggerAdapter`, it delegates the call to the underlying instance of
+:class:`Logger` passed to its constructor, and arranges to pass the contextual
+information in the delegated call. Here's a snippet from the code of
+:class:`LoggerAdapter`::
+
+ def debug(self, msg, *args, **kwargs):
+     """
+     Delegate a debug call to the underlying logger, after adding
+     contextual information from this adapter instance.
+     """
+     msg, kwargs = self.process(msg, kwargs)
+     self.logger.debug(msg, *args, **kwargs)
+
+The :meth:`process` method of :class:`LoggerAdapter` is where the contextual
+information is added to the logging output. It's passed the message and
+keyword arguments of the logging call, and it passes back (potentially)
+modified versions of these to use in the call to the underlying logger. The
+default implementation of this method leaves the message alone, but inserts
+an 'extra' key in the keyword argument whose value is the dict-like object
+passed to the constructor. Of course, if you had passed an 'extra' keyword
+argument in the call to the adapter, it will be silently overwritten.
+
+The advantage of using 'extra' is that the values in the dict-like object are
+merged into the :class:`LogRecord` instance's ``__dict__``, allowing you to use
+customized strings with your :class:`Formatter` instances which know about
+the keys of the dict-like object. If you need a different method, e.g. if you
+want to prepend or append the contextual information to the message string,
+you just need to subclass :class:`LoggerAdapter` and override :meth:`process`
+to do what you need. Here's an example script which uses this class, which
+also illustrates what dict-like behaviour is needed from an arbitrary
+'dict-like' object for use in the constructor::
+
+ import logging
+
+ class ConnInfo:
+     """
+     An example class which shows how an arbitrary class can be used as
+     the 'extra' context information repository passed to a LoggerAdapter.
+     """
+
+     def __getitem__(self, name):
+         """
+         To allow this instance to look like a dict.
+         """
+         from random import choice
+         if name == 'ip':
+             result = choice(['127.0.0.1', '192.168.0.1'])
+         elif name == 'user':
+             result = choice(['jim', 'fred', 'sheila'])
+         else:
+             result = self.__dict__.get(name, '?')
+         return result
+
+     def __iter__(self):
+         """
+         To allow iteration over keys, which will be merged into
+         the LogRecord dict before formatting and output.
+         """
+         keys = ['ip', 'user']
+         keys.extend(self.__dict__.keys())
+         return keys.__iter__()
+
+ if __name__ == '__main__':
+     from random import choice
+     levels = (logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL)
+     a1 = logging.LoggerAdapter(logging.getLogger('a.b.c'),
+                                { 'ip' : '123.231.231.123', 'user' : 'sheila' })
+     logging.basicConfig(level=logging.DEBUG,
+                         format='%(asctime)-15s %(name)-5s %(levelname)-8s IP: %(ip)-15s User: %(user)-8s %(message)s')
+     a1.debug('A debug message')
+     a1.info('An info message with %s', 'some parameters')
+     a2 = logging.LoggerAdapter(logging.getLogger('d.e.f'), ConnInfo())
+     for x in range(10):
+         lvl = choice(levels)
+         lvlname = logging.getLevelName(lvl)
+         a2.log(lvl, 'A message at %s level with %d %s', lvlname, 2, 'parameters')
+
+When this script is run, the output should look something like this::
+
+ 2008-01-18 14:49:54,023 a.b.c DEBUG IP: 123.231.231.123 User: sheila A debug message
+ 2008-01-18 14:49:54,023 a.b.c INFO IP: 123.231.231.123 User: sheila An info message with some parameters
+ 2008-01-18 14:49:54,023 d.e.f CRITICAL IP: 192.168.0.1 User: jim A message at CRITICAL level with 2 parameters
+ 2008-01-18 14:49:54,033 d.e.f INFO IP: 192.168.0.1 User: jim A message at INFO level with 2 parameters
+ 2008-01-18 14:49:54,033 d.e.f WARNING IP: 192.168.0.1 User: sheila A message at WARNING level with 2 parameters
+ 2008-01-18 14:49:54,033 d.e.f ERROR IP: 127.0.0.1 User: fred A message at ERROR level with 2 parameters
+ 2008-01-18 14:49:54,033 d.e.f ERROR IP: 127.0.0.1 User: sheila A message at ERROR level with 2 parameters
+ 2008-01-18 14:49:54,033 d.e.f WARNING IP: 192.168.0.1 User: sheila A message at WARNING level with 2 parameters
+ 2008-01-18 14:49:54,033 d.e.f WARNING IP: 192.168.0.1 User: jim A message at WARNING level with 2 parameters
+ 2008-01-18 14:49:54,033 d.e.f INFO IP: 192.168.0.1 User: fred A message at INFO level with 2 parameters
+ 2008-01-18 14:49:54,033 d.e.f WARNING IP: 192.168.0.1 User: sheila A message at WARNING level with 2 parameters
+ 2008-01-18 14:49:54,033 d.e.f WARNING IP: 127.0.0.1 User: jim A message at WARNING level with 2 parameters
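+
+The other option mentioned above, subclassing :class:`LoggerAdapter` and
+overriding :meth:`process` to prepend the contextual information to the
+message, might look something like the following sketch. The class names, the
+``connid`` key and the ``ListHandler`` helper are assumptions chosen for this
+illustration.
+

```python
import logging


class ConnectionAdapter(logging.LoggerAdapter):
    """Hypothetical adapter that prepends a connection id to each message."""

    def process(self, msg, kwargs):
        # Leave kwargs untouched; only the message text is changed.
        return '[%s] %s' % (self.extra['connid'], msg), kwargs


class ListHandler(logging.Handler):
    """Collects formatted output in a list so the result is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


base_logger = logging.getLogger('adapter.demo')
base_logger.setLevel(logging.DEBUG)
base_logger.propagate = False
collector = ListHandler()
base_logger.addHandler(collector)

adapter = ConnectionAdapter(base_logger, {'connid': 'conn-42'})
adapter.info('received %d bytes', 128)
print(collector.lines)
```

+
+Because the prefix becomes part of the format string, any ``%`` character in
+the contextual value would need to be escaped as ``%%`` when the logging call
+also passes arguments.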
+
+
+.. _filters-contextual:
+
+Using Filters to impart contextual information
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+You can also add contextual information to log output using a user-defined
+:class:`Filter`. ``Filter`` instances are allowed to modify the ``LogRecords``
+passed to them, including adding additional attributes which can then be output
+using a suitable format string, or if needed a custom :class:`Formatter`.
+
+For example, in a web application, the request being processed (or at least,
+the interesting parts of it) can be stored in a threadlocal
+(:class:`threading.local`) variable, and then accessed from a ``Filter`` to
+add information from the request (say, the remote IP address and remote
+user's username) to the ``LogRecord``, using the attribute names 'ip' and
+'user' as in the ``LoggerAdapter`` example above. In that case, the same format
+string can be used to get similar output to that shown above. Here's an example
+script::
+
+ import logging
+ from random import choice
+
+ class ContextFilter(logging.Filter):
+     """
+     This is a filter which injects contextual information into the log.
+
+     Rather than use actual contextual information, we just use random
+     data in this demo.
+     """
+
+     USERS = ['jim', 'fred', 'sheila']
+     IPS = ['123.231.231.123', '127.0.0.1', '192.168.0.1']
+
+     def filter(self, record):
+
+         record.ip = choice(ContextFilter.IPS)
+         record.user = choice(ContextFilter.USERS)
+         return True
+
+ if __name__ == '__main__':
+     levels = (logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL)
+     logging.basicConfig(level=logging.DEBUG,
+                         format='%(asctime)-15s %(name)-5s %(levelname)-8s IP: %(ip)-15s User: %(user)-8s %(message)s')
+     a1 = logging.getLogger('a.b.c')
+     a2 = logging.getLogger('d.e.f')
+
+     f = ContextFilter()
+     a1.addFilter(f)
+     a2.addFilter(f)
+     a1.debug('A debug message')
+     a1.info('An info message with %s', 'some parameters')
+     for x in range(10):
+         lvl = choice(levels)
+         lvlname = logging.getLevelName(lvl)
+         a2.log(lvl, 'A message at %s level with %d %s', lvlname, 2, 'parameters')
+
+which, when run, produces something like::
+
+ 2010-09-06 22:38:15,292 a.b.c DEBUG IP: 123.231.231.123 User: fred A debug message
+ 2010-09-06 22:38:15,300 a.b.c INFO IP: 192.168.0.1 User: sheila An info message with some parameters
+ 2010-09-06 22:38:15,300 d.e.f CRITICAL IP: 127.0.0.1 User: sheila A message at CRITICAL level with 2 parameters
+ 2010-09-06 22:38:15,300 d.e.f ERROR IP: 127.0.0.1 User: jim A message at ERROR level with 2 parameters
+ 2010-09-06 22:38:15,300 d.e.f DEBUG IP: 127.0.0.1 User: sheila A message at DEBUG level with 2 parameters
+ 2010-09-06 22:38:15,300 d.e.f ERROR IP: 123.231.231.123 User: fred A message at ERROR level with 2 parameters
+ 2010-09-06 22:38:15,300 d.e.f CRITICAL IP: 192.168.0.1 User: jim A message at CRITICAL level with 2 parameters
+ 2010-09-06 22:38:15,300 d.e.f CRITICAL IP: 127.0.0.1 User: sheila A message at CRITICAL level with 2 parameters
+ 2010-09-06 22:38:15,300 d.e.f DEBUG IP: 192.168.0.1 User: jim A message at DEBUG level with 2 parameters
+ 2010-09-06 22:38:15,301 d.e.f ERROR IP: 127.0.0.1 User: sheila A message at ERROR level with 2 parameters
+ 2010-09-06 22:38:15,301 d.e.f DEBUG IP: 123.231.231.123 User: fred A message at DEBUG level with 2 parameters
+ 2010-09-06 22:38:15,301 d.e.f INFO IP: 123.231.231.123 User: fred A message at INFO level with 2 parameters
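+
+The script above uses random data. A sketch closer to the thread-local
+approach described earlier might look like this; the ``_request`` store, the
+``handle_request`` function and the ``ListHandler`` helper are hypothetical
+stand-ins for whatever your web framework provides.
+

```python
import logging
import threading

_request = threading.local()    # hypothetical per-thread request store


class RequestFilter(logging.Filter):
    """Copy per-thread request details onto every record that passes through."""

    def filter(self, record):
        record.ip = getattr(_request, 'ip', '-')
        record.user = getattr(_request, 'user', '-')
        return True


class ListHandler(logging.Handler):
    """Collects formatted output in a list so the result is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


def handle_request(log, ip, user):
    # Store the context for this thread only, and clean up afterwards.
    _request.ip, _request.user = ip, user
    try:
        log.info('request processed')
    finally:
        del _request.ip, _request.user


log = logging.getLogger('web.demo')
log.setLevel(logging.INFO)
log.propagate = False
log.addFilter(RequestFilter())
collector = ListHandler()
collector.setFormatter(
    logging.Formatter('IP: %(ip)s User: %(user)s %(message)s'))
log.addHandler(collector)

clients = [('127.0.0.1', 'jim'), ('192.168.0.1', 'sheila')]
threads = [threading.Thread(target=handle_request, args=(log, ip, user))
           for ip, user in clients]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(collector.lines))
```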
+
+
+.. _multiple-processes:
+
+Logging to a single file from multiple processes
+------------------------------------------------
+
+Although logging is thread-safe, and logging to a single file from multiple
+threads in a single process *is* supported, logging to a single file from
+*multiple processes* is *not* supported, because there is no standard way to
+serialize access to a single file across multiple processes in Python. If you
+need to log to a single file from multiple processes, one way of doing this is
+to have all the processes log to a :class:`SocketHandler`, and have a separate
+process which implements a socket server which reads from the socket and logs
+to file. (If you prefer, you can dedicate one thread in one of the existing
+processes to perform this function.) :ref:`This section <network-logging>`
+documents this approach in more detail and includes a working socket receiver
+which can be used as a starting point for you to adapt in your own
+applications.
+
+If you are using a recent version of Python which includes the
+:mod:`multiprocessing` module, you could write your own handler which uses the
+:class:`Lock` class from this module to serialize access to the file from
+your processes. The existing :class:`FileHandler` and subclasses do not make
+use of :mod:`multiprocessing` at present, though they may do so in the future.
+Note that at present, the :mod:`multiprocessing` module does not provide
+working lock functionality on all platforms (see
+http://bugs.python.org/issue3770).
+
+.. currentmodule:: logging.handlers
+
+Alternatively, you can use a ``Queue`` and a :class:`QueueHandler` to send
+all logging events to one of the processes in your multi-process application.
+The following example script demonstrates how you can do this; in the example
+a separate listener process listens for events sent by other processes and logs
+them according to its own logging configuration. Although the example only
+demonstrates one way of doing it (for example, you may want to use a listener
+thread rather than a separate listener process -- the implementation would be
+analogous) it does allow for completely different logging configurations for
+the listener and the other processes in your application, and can be used as
+the basis for code meeting your own specific requirements::
+
+ # You'll need these imports in your own code
+ import logging
+ import logging.handlers
+ import multiprocessing
+
+ # Next two import lines for this demo only
+ from random import choice, random
+ import time
+
+ #
+ # Because you'll want to define the logging configurations for listener and workers, the
+ # listener and worker process functions take a configurer parameter which is a callable
+ # for configuring logging for that process. These functions are also passed the queue,
+ # which they use for communication.
+ #
+ # In practice, you can configure the listener however you want, but note that in this
+ # simple example, the listener does not apply level or filter logic to received records.
+ # In practice, you would probably want to do this logic in the worker processes, to avoid
+ # sending events which would be filtered out between processes.
+ #
+ # The size of the rotated files is made small so you can see the results easily.
+ def listener_configurer():
+     root = logging.getLogger()
+     h = logging.handlers.RotatingFileHandler('mptest.log', 'a', 300, 10)
+     f = logging.Formatter('%(asctime)s %(processName)-10s %(name)s %(levelname)-8s %(message)s')
+     h.setFormatter(f)
+     root.addHandler(h)
+
+ # This is the listener process top-level loop: wait for logging events
+ # (LogRecords) on the queue and handle them, quit when you get a None for a
+ # LogRecord.
+ def listener_process(queue, configurer):
+     configurer()
+     while True:
+         try:
+             record = queue.get()
+             if record is None:  # We send this as a sentinel to tell the listener to quit.
+                 break
+             logger = logging.getLogger(record.name)
+             logger.handle(record)  # No level or filter logic applied - just do it!
+         except (KeyboardInterrupt, SystemExit):
+             raise
+         except:
+             import sys, traceback
+             print('Whoops! Problem:', file=sys.stderr)
+             traceback.print_exc(file=sys.stderr)
+
+ # Arrays used for random selections in this demo
+
+ LEVELS = [logging.DEBUG, logging.INFO, logging.WARNING,
+           logging.ERROR, logging.CRITICAL]
+
+ LOGGERS = ['a.b.c', 'd.e.f']
+
+ MESSAGES = [
+     'Random message #1',
+     'Random message #2',
+     'Random message #3',
+ ]
+
+ # The worker configuration is done at the start of the worker process run.
+ # Note that on Windows you can't rely on fork semantics, so each process
+ # will run the logging configuration code when it starts.
+ def worker_configurer(queue):
+     h = logging.handlers.QueueHandler(queue)  # Just the one handler needed
+     root = logging.getLogger()
+     root.addHandler(h)
+     root.setLevel(logging.DEBUG)  # send all messages, for demo; no other level or filter logic applied.
+
+ # This is the worker process top-level loop, which just logs ten events with
+ # random intervening delays before terminating.
+ # The print messages are just so you know it's doing something!
+ def worker_process(queue, configurer):
+     configurer(queue)
+     name = multiprocessing.current_process().name
+     print('Worker started: %s' % name)
+     for i in range(10):
+         time.sleep(random())
+         logger = logging.getLogger(choice(LOGGERS))
+         level = choice(LEVELS)
+         message = choice(MESSAGES)
+         logger.log(level, message)
+     print('Worker finished: %s' % name)
+
+ # Here's where the demo gets orchestrated. Create the queue, create and start
+ # the listener, create ten workers and start them, wait for them to finish,
+ # then send a None to the queue to tell the listener to finish.
+ def main():
+     queue = multiprocessing.Queue(-1)
+     listener = multiprocessing.Process(target=listener_process,
+                                        args=(queue, listener_configurer))
+     listener.start()
+     workers = []
+     for i in range(10):
+         worker = multiprocessing.Process(target=worker_process,
+                                          args=(queue, worker_configurer))
+         workers.append(worker)
+         worker.start()
+     for w in workers:
+         w.join()
+     queue.put_nowait(None)
+     listener.join()
+
+ if __name__ == '__main__':
+     main()
+
+A variant of the above script keeps the logging in the main process, in a
+separate thread::
+
+ import logging
+ import logging.config
+ import logging.handlers
+ from multiprocessing import Process, Queue
+ import random
+ import threading
+ import time
+
+ def logger_thread(q):
+     while True:
+         record = q.get()
+         if record is None:
+             break
+         logger = logging.getLogger(record.name)
+         logger.handle(record)
+
+
+ def worker_process(q):
+     qh = logging.handlers.QueueHandler(q)
+     root = logging.getLogger()
+     root.setLevel(logging.DEBUG)
+     root.addHandler(qh)
+     levels = [logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR,
+               logging.CRITICAL]
+     loggers = ['foo', 'foo.bar', 'foo.bar.baz',
+                'spam', 'spam.ham', 'spam.ham.eggs']
+     for i in range(100):
+         lvl = random.choice(levels)
+         logger = logging.getLogger(random.choice(loggers))
+         logger.log(lvl, 'Message no. %d', i)
+
+ if __name__ == '__main__':
+     q = Queue()
+     d = {
+         'version': 1,
+         'formatters': {
+             'detailed': {
+                 'class': 'logging.Formatter',
+                 'format': '%(asctime)s %(name)-15s %(levelname)-8s %(processName)-10s %(message)s'
+             }
+         },
+         'handlers': {
+             'console': {
+                 'class': 'logging.StreamHandler',
+                 'level': 'INFO',
+             },
+             'file': {
+                 'class': 'logging.FileHandler',
+                 'filename': 'mplog.log',
+                 'mode': 'w',
+                 'formatter': 'detailed',
+             },
+             'foofile': {
+                 'class': 'logging.FileHandler',
+                 'filename': 'mplog-foo.log',
+                 'mode': 'w',
+                 'formatter': 'detailed',
+             },
+             'errors': {
+                 'class': 'logging.FileHandler',
+                 'filename': 'mplog-errors.log',
+                 'mode': 'w',
+                 'level': 'ERROR',
+                 'formatter': 'detailed',
+             },
+         },
+         'loggers': {
+             'foo': {
+                 'handlers' : ['foofile']
+             }
+         },
+         'root': {
+             'level': 'DEBUG',
+             'handlers': ['console', 'file', 'errors']
+         },
+     }
+     workers = []
+     for i in range(5):
+         wp = Process(target=worker_process, name='worker %d' % (i + 1), args=(q,))
+         workers.append(wp)
+         wp.start()
+     logging.config.dictConfig(d)
+     lp = threading.Thread(target=logger_thread, args=(q,))
+     lp.start()
+     # At this point, the main process could do some useful work of its own
+     # Once it's done that, it can wait for the workers to terminate...
+     for wp in workers:
+         wp.join()
+     # And now tell the logging thread to finish up, too
+     q.put(None)
+     lp.join()
+
+This variant shows how you can apply configuration for particular loggers.
+For example, the ``foo`` logger has a special handler which stores all events
+in the ``foo`` subsystem in a file ``mplog-foo.log``. This configuration is
+used by the logging machinery in the main process (even though the logging
+events are generated in the worker processes) to direct the messages to the
+appropriate destinations.
+
+Using file rotation
+-------------------
+
+.. sectionauthor:: Doug Hellmann, Vinay Sajip (changes)
+.. (see <http://blog.doughellmann.com/2007/05/pymotw-logging.html>)
+
+Sometimes you want to let a log file grow to a certain size, then open a new
+file and log to that. You may want to keep a certain number of these files, and
+when that many files have been created, rotate the files so that the number of
+files and the size of the files both remain bounded. For this usage pattern, the
+logging package provides a :class:`RotatingFileHandler`::
+
+ import glob
+ import logging
+ import logging.handlers
+
+ LOG_FILENAME = 'logging_rotatingfile_example.out'
+
+ # Set up a specific logger with our desired output level
+ my_logger = logging.getLogger('MyLogger')
+ my_logger.setLevel(logging.DEBUG)
+
+ # Add the log message handler to the logger
+ handler = logging.handlers.RotatingFileHandler(
+ LOG_FILENAME, maxBytes=20, backupCount=5)
+
+ my_logger.addHandler(handler)
+
+ # Log some messages
+ for i in range(20):
+     my_logger.debug('i = %d' % i)
+
+ # See what files are created
+ logfiles = glob.glob('%s*' % LOG_FILENAME)
+
+ for filename in logfiles:
+     print(filename)
+
+The result should be 6 separate files, each with part of the log history for the
+application::
+
+ logging_rotatingfile_example.out
+ logging_rotatingfile_example.out.1
+ logging_rotatingfile_example.out.2
+ logging_rotatingfile_example.out.3
+ logging_rotatingfile_example.out.4
+ logging_rotatingfile_example.out.5
+
+The most current file is always :file:`logging_rotatingfile_example.out`,
+and each time it reaches the size limit it is renamed with the suffix
+``.1``. Each of the existing backup files is renamed to increment the suffix
+(``.1`` becomes ``.2``, etc.), and the oldest backup, which would otherwise
+become ``.6``, is erased.
+
+Obviously, this example sets the log length much too small, purely so that
+rotation happens quickly. You would want to set *maxBytes* to an appropriate
+value.
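+
+To see which messages survive rotation, the following sketch repeats the
+experiment in a temporary directory and then reads the files back. The file
+name ``rotation_demo.out`` and the logger name are arbitrary choices made for
+this illustration.
+

```python
import logging
import logging.handlers
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'rotation_demo.out')
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=20, backupCount=5)
    demo_logger = logging.getLogger('rotation.demo')
    demo_logger.setLevel(logging.DEBUG)
    demo_logger.propagate = False
    demo_logger.addHandler(handler)

    for i in range(20):
        demo_logger.debug('i = %d', i)

    # Close the handler before reading the files and removing the directory.
    demo_logger.removeHandler(handler)
    handler.close()

    names = sorted(os.listdir(tmpdir))
    with open(path) as f:
        newest = f.read()            # contents of the current log file
    surviving = ''
    for name in names:
        with open(os.path.join(tmpdir, name)) as f:
            surviving += f.read()    # everything still on disk

print(names)
print(newest)
```

+
+The current file holds the most recent message, while the earliest messages
+are no longer in any file, because the backups that contained them have been
+discarded.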
+
+.. _format-styles:
+
+Use of alternative formatting styles
+------------------------------------
+
+When logging was added to the Python standard library, the only way of
+formatting messages with variable content was to use the %-formatting
+method. Since then, Python has gained two new formatting approaches:
+:class:`string.Template` (added in Python 2.4) and :meth:`str.format`
+(added in Python 2.6).
+
+Logging (as of 3.2) provides improved support for these two additional
+formatting styles. The :class:`Formatter` class has been enhanced to take an
+additional, optional keyword parameter named ``style``. This defaults to
+``'%'``, but other possible values are ``'{'`` and ``'$'``, which correspond
+to the other two formatting styles. Backwards compatibility is maintained by
+default (as you would expect), but by explicitly specifying a style parameter,
+you get the ability to specify format strings which work with
+:meth:`str.format` or :class:`string.Template`. Here's an example console
+session to show the possibilities:
+
+.. code-block:: pycon
+
+ >>> import logging
+ >>> root = logging.getLogger()
+ >>> root.setLevel(logging.DEBUG)
+ >>> handler = logging.StreamHandler()
+ >>> bf = logging.Formatter('{asctime} {name} {levelname:8s} {message}',
+ ... style='{')
+ >>> handler.setFormatter(bf)
+ >>> root.addHandler(handler)
+ >>> logger = logging.getLogger('foo.bar')
+ >>> logger.debug('This is a DEBUG message')
+ 2010-10-28 15:11:55,341 foo.bar DEBUG This is a DEBUG message
+ >>> logger.critical('This is a CRITICAL message')
+ 2010-10-28 15:12:11,526 foo.bar CRITICAL This is a CRITICAL message
+ >>> df = logging.Formatter('$asctime $name ${levelname} $message',
+ ... style='$')
+ >>> handler.setFormatter(df)
+ >>> logger.debug('This is a DEBUG message')
+ 2010-10-28 15:13:06,924 foo.bar DEBUG This is a DEBUG message
+ >>> logger.critical('This is a CRITICAL message')
+ 2010-10-28 15:13:11,494 foo.bar CRITICAL This is a CRITICAL message
+ >>>
+
+Note that the formatting of logging messages for final output to logs is
+completely independent of how an individual logging message is constructed.
+That can still use %-formatting, as shown here::
+
+ >>> logger.error('This is an%s %s %s', 'other,', 'ERROR,', 'message')
+ 2010-10-28 15:19:29,833 foo.bar ERROR This is another, ERROR, message
+ >>>
+
+Logging calls (``logger.debug()``, ``logger.info()`` etc.) only take
+positional parameters for the actual logging message itself, with keyword
+parameters used only for determining options for how to handle the actual
+logging call (e.g. the ``exc_info`` keyword parameter to indicate that
+traceback information should be logged, or the ``extra`` keyword parameter
+to indicate additional contextual information to be added to the log). So
+you cannot directly make logging calls using :meth:`str.format` or
+:class:`string.Template` syntax, because internally the logging package
+uses %-formatting to merge the format string and the variable arguments.
+There would be no changing this while preserving backward compatibility, since
+all logging calls which are out there in existing code will be using %-format
+strings.
+
+There is, however, a way that you can use {}- and $-formatting to construct
+your individual log messages. Recall that for a message you can use an
+arbitrary object as a message format string, and that the logging package will
+call ``str()`` on that object to get the actual format string. Consider the
+following two classes::
+
+ class BraceMessage:
+     def __init__(self, fmt, *args, **kwargs):
+         self.fmt = fmt
+         self.args = args
+         self.kwargs = kwargs
+
+     def __str__(self):
+         return self.fmt.format(*self.args, **self.kwargs)
+
+ class DollarMessage:
+     def __init__(self, fmt, **kwargs):
+         self.fmt = fmt
+         self.kwargs = kwargs
+
+     def __str__(self):
+         from string import Template
+         return Template(self.fmt).substitute(**self.kwargs)
+
+Either of these can be used in place of a format string, to allow {}- or
+$-formatting to be used to build the actual "message" part which appears in the
+formatted log output in place of "%(message)s" or "{message}" or "$message".
+It's a little unwieldy to use the class names whenever you want to log
+something, but it's quite palatable if you use an alias such as __ (double
+underscore – not to be confused with _, the single underscore used as a
+synonym/alias for :func:`gettext.gettext` or its brethren).
+
+The above classes are not included in Python, though they're easy enough to
+copy and paste into your own code. They can be used as follows (assuming that
+they're declared in a module called ``wherever``):
+
+.. code-block:: pycon
+
+ >>> from wherever import BraceMessage as __
+ >>> print(__('Message with {0} {name}', 2, name='placeholders'))
+ Message with 2 placeholders
+ >>> class Point: pass
+ ...
+ >>> p = Point()
+ >>> p.x = 0.5
+ >>> p.y = 0.5
+ >>> print(__('Message with coordinates: ({point.x:.2f}, {point.y:.2f})',
+ ... point=p))
+ Message with coordinates: (0.50, 0.50)
+ >>> from wherever import DollarMessage as __
+ >>> print(__('Message with $num $what', num=2, what='placeholders'))
+ Message with 2 placeholders
+ >>>
+
+While the above examples use ``print()`` to show how the formatting works, you
+would of course use ``logger.debug()`` or similar to actually log using this
+approach.
+
+One thing to note is that you pay no significant performance penalty with this
+approach: the actual formatting happens not when you make the logging call, but
+when (and if) the logged message is actually about to be output to a log by a
+handler. The only slightly unusual thing which might trip you up is that the
+parentheses go around the format string and the arguments, not just the format
+string. That's because the ``__`` notation is just syntactic sugar for a
+constructor call to one of the ``XXXMessage`` classes.
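+
+A small sketch of the approach used with a real logger follows. The
+``format_calls`` counter and the ``ListHandler`` helper are additions made
+purely for this demonstration (they are not part of the classes shown
+earlier); the counter is there to show that a message which is filtered out by
+level is never formatted.
+

```python
import logging


class BraceMessage:
    format_calls = 0    # demo-only counter of how often formatting happens

    def __init__(self, fmt, *args, **kwargs):
        self.fmt = fmt
        self.args = args
        self.kwargs = kwargs

    def __str__(self):
        BraceMessage.format_calls += 1
        return self.fmt.format(*self.args, **self.kwargs)


class ListHandler(logging.Handler):
    """Collects formatted output in a list so the result is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


__ = BraceMessage

brace_logger = logging.getLogger('brace.demo')
brace_logger.setLevel(logging.INFO)
brace_logger.propagate = False
collector = ListHandler()
brace_logger.addHandler(collector)

# DEBUG is below the logger's level, so this message is never formatted.
brace_logger.debug(__('Skipped {0}', 'entirely'))
# INFO passes the level check and is formatted when the handler outputs it.
brace_logger.info(__('Point ({x:.1f}, {y:.1f})', x=0.5, y=1.5))

print(collector.lines, BraceMessage.format_calls)
```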
+
+
+.. currentmodule:: logging
+
+.. _custom-logrecord:
+
+Customising ``LogRecord``
+-------------------------
+
+Every logging event is represented by a :class:`LogRecord` instance.
+When an event is logged and not filtered out by a logger's level, a
+:class:`LogRecord` is created, populated with information about the event and
+then passed to the handlers for that logger (and its ancestors, up to and
+including the logger where further propagation up the hierarchy is disabled).
+Before Python 3.2, there were only two places where this creation was done:
+
+* :meth:`Logger.makeRecord`, which is called in the normal process of
+  logging an event. This invoked :class:`LogRecord` directly to create an
+  instance.
+* :func:`makeLogRecord`, which is called with a dictionary containing
+  attributes to be added to the LogRecord. This is typically invoked when a
+  suitable dictionary has been received over the network (e.g. in pickle form
+  via a :class:`~handlers.SocketHandler`, or in JSON form via an
+  :class:`~handlers.HTTPHandler`).
+
+This has usually meant that if you need to do anything special with a
+:class:`LogRecord`, you've had to do one of the following.
+
+* Create your own :class:`Logger` subclass, which overrides
+  :meth:`Logger.makeRecord`, and set it using :func:`~logging.setLoggerClass`
+  before any loggers that you care about are instantiated.
+* Add a :class:`Filter` to a logger or handler, which does the
+  necessary special manipulation you need when its
+  :meth:`~Filter.filter` method is called.
+
+The first approach would be a little unwieldy in the scenario where (say)
+several different libraries wanted to do different things. Each would attempt
+to set its own :class:`Logger` subclass, and the one which did this last would
+win.
+
+The second approach works reasonably well for many cases, but does not allow
+you to e.g. use a specialized subclass of :class:`LogRecord`. Library
+developers can set a suitable filter on their loggers, but they would have to
+remember to do this every time they introduced a new logger (which they would
+do simply by adding new packages or modules and doing ::
+
+ logger = logging.getLogger(__name__)
+
+at module level). It's probably one too many things to think about. Developers
+could also add the filter to a :class:`~logging.NullHandler` attached to their
+top-level logger, but this would not be invoked if an application developer
+attached a handler to a lower-level library logger – so output from that
+handler would not reflect the intentions of the library developer.
+
+In Python 3.2 and later, :class:`~logging.LogRecord` creation is done through a
+factory, which you can specify. The factory is just a callable you can set with
+:func:`~logging.setLogRecordFactory`, and interrogate with
+:func:`~logging.getLogRecordFactory`. The factory is invoked with the same
+signature as the :class:`~logging.LogRecord` constructor, as :class:`LogRecord`
+is the default setting for the factory.
+
+This approach allows a custom factory to control all aspects of LogRecord
+creation. For example, you could return a subclass, or just add some additional
+attributes to the record once created, using a pattern similar to this::
+
+    old_factory = logging.getLogRecordFactory()
+
+    def record_factory(*args, **kwargs):
+        record = old_factory(*args, **kwargs)
+        record.custom_attribute = 0xdecafbad
+        return record
+
+    logging.setLogRecordFactory(record_factory)
+
+This pattern allows different libraries to chain factories together, and as
+long as they don't overwrite each other's attributes or unintentionally
+overwrite the attributes provided as standard, there should be no surprises.
+However, it should be borne in mind that each link in the chain adds run-time
+overhead to all logging operations, and the technique should only be used when
+the use of a :class:`Filter` does not provide the desired result.
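+
+As a minimal sketch of how an attribute added by a factory can be used, the
+following makes it available to a format string. The ``request_id`` attribute,
+its value and the ``factory_demo`` logger name are assumptions made up for
+this illustration.

```python
import io
import logging

old_factory = logging.getLogRecordFactory()


def record_factory(*args, **kwargs):
    # Delegate to the previously installed factory, then decorate the record.
    record = old_factory(*args, **kwargs)
    record.request_id = 'req-42'  # hypothetical attribute for illustration
    return record


logging.setLogRecordFactory(record_factory)

stream = io.StringIO()
handler = logging.StreamHandler(stream)
# The custom attribute can be referenced like any standard LogRecord attribute.
handler.setFormatter(logging.Formatter('%(request_id)s %(message)s'))

logger = logging.getLogger('factory_demo')
logger.propagate = False
logger.addHandler(handler)
logger.warning('payment accepted')
output = stream.getvalue()

# Restore the previous factory so other code is unaffected.
logging.setLogRecordFactory(old_factory)
```

+
+Because the factory applies to every record created in the process, restoring
+the previous factory (as the last line does) is a sensible habit in tests.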
+
+
+.. _zeromq-handlers:
+
+Subclassing QueueHandler - a ZeroMQ example
+-------------------------------------------
+
+You can use a :class:`QueueHandler` subclass to send messages to other kinds
+of queues, for example a ZeroMQ 'publish' socket. In the example below, the
+socket is created separately and passed to the handler (as its 'queue')::
+
+    import zmq   # using pyzmq, the Python binding for ZeroMQ
+    import json  # for serializing records portably
+    from logging.handlers import QueueHandler
+
+    ctx = zmq.Context()
+    sock = zmq.Socket(ctx, zmq.PUB)  # or zmq.PUSH, or other suitable value
+    sock.bind('tcp://*:5556')        # or wherever
+
+    class ZeroMQSocketHandler(QueueHandler):
+        def enqueue(self, record):
+            data = json.dumps(record.__dict__)
+            # ZeroMQ sockets send bytes, so encode the JSON text first
+            self.queue.send(data.encode('utf-8'))
+
+    handler = ZeroMQSocketHandler(sock)
+
+
+Of course there are other ways of organizing this, for example passing in the
+data needed by the handler to create the socket::
+
+    class ZeroMQSocketHandler(QueueHandler):
+        def __init__(self, uri, socktype=zmq.PUB, ctx=None):
+            self.ctx = ctx or zmq.Context()
+            socket = zmq.Socket(self.ctx, socktype)
+            socket.bind(uri)
+            QueueHandler.__init__(self, socket)
+
+        def enqueue(self, record):
+            data = json.dumps(record.__dict__)
+            # ZeroMQ sockets send bytes, so encode the JSON text first
+            self.queue.send(data.encode('utf-8'))
+
+        def close(self):
+            self.queue.close()
+
+
+Subclassing QueueListener - a ZeroMQ example
+--------------------------------------------
+
+You can also subclass :class:`QueueListener` to get messages from other kinds
+of queues, for example a ZeroMQ 'subscribe' socket. Here's an example::
+
+    class ZeroMQSocketListener(QueueListener):
+        def __init__(self, uri, *handlers, **kwargs):
+            self.ctx = kwargs.get('ctx') or zmq.Context()
+            socket = zmq.Socket(self.ctx, zmq.SUB)
+            socket.setsockopt(zmq.SUBSCRIBE, b'')  # subscribe to everything
+            socket.connect(uri)
+            # Hand the socket to the base class so that it becomes self.queue
+            QueueListener.__init__(self, socket, *handlers)
+
+        def dequeue(self):
+            msg = self.queue.recv()
+            return logging.makeLogRecord(json.loads(msg.decode('utf-8')))
+
+
+.. seealso::
+
+   Module :mod:`logging`
+      API reference for the logging module.
+
+   Module :mod:`logging.config`
+      Configuration API for the logging module.
+
+   Module :mod:`logging.handlers`
+      Useful handlers included with the logging module.
+
+   :ref:`A basic logging tutorial <logging-basic-tutorial>`
+
+   :ref:`A more advanced logging tutorial <logging-advanced-tutorial>`
+
+
+An example dictionary-based configuration
+-----------------------------------------
+
+Below is an example of a logging configuration dictionary - it's taken from
+the `documentation on the Django project <https://docs.djangoproject.com/en/1.3/topics/logging/#configuring-logging>`_.
+This dictionary is passed to :func:`~logging.config.dictConfig` to put the configuration into effect::
+
+ LOGGING = {
+ 'version': 1,
+ 'disable_existing_loggers': True,
+ 'formatters': {
+ 'verbose': {
+ 'format': '%(levelname)s %(asctime)s %(module)s %(process)d %(thread)d %(message)s'
+ },
+ 'simple': {
+ 'format': '%(levelname)s %(message)s'
+ },
+ },
+ 'filters': {
+ 'special': {
+ '()': 'project.logging.SpecialFilter',
+ 'foo': 'bar',
+ }
+ },
+ 'handlers': {
+ 'null': {
+ 'level':'DEBUG',
+ 'class':'django.utils.log.NullHandler',
+ },
+ 'console':{
+ 'level':'DEBUG',
+ 'class':'logging.StreamHandler',
+ 'formatter': 'simple'
+ },
+ 'mail_admins': {
+ 'level': 'ERROR',
+ 'class': 'django.utils.log.AdminEmailHandler',
+ 'filters': ['special']
+ }
+ },
+ 'loggers': {
+ 'django': {
+ 'handlers':['null'],
+ 'propagate': True,
+ 'level':'INFO',
+ },
+ 'django.request': {
+ 'handlers': ['mail_admins'],
+ 'level': 'ERROR',
+ 'propagate': False,
+ },
+ 'myproject.custom': {
+ 'handlers': ['console', 'mail_admins'],
+ 'level': 'INFO',
+ 'filters': ['special']
+ }
+ }
+ }
+
+For more information about this configuration, you can see the `relevant
+section <https://docs.djangoproject.com/en/1.3/topics/logging/#configuring-logging>`_
+of the Django documentation.
+
+A more elaborate multiprocessing example
+----------------------------------------
+
+The following working example shows how logging can be used with multiprocessing
+using configuration files. The configurations are fairly simple, but serve to
+illustrate how more complex ones could be implemented in a real multiprocessing
+scenario.
+
+In the example, the main process spawns a listener process and some worker
+processes. The main process, the listener and the workers each have their own
+configuration, three in all (the workers all share the same one). We can
+see logging in the main process, how the workers log to a QueueHandler and how
+the listener implements a QueueListener and a more complex logging
+configuration, and arranges to dispatch events received via the queue to the
+handlers specified in the configuration. Note that these configurations are
+purely illustrative, but you should be able to adapt this example to your own
+scenario.
+
+Here's the script - the docstrings and the comments hopefully explain how it
+works::
+
+    import logging
+    import logging.config
+    import logging.handlers
+    from multiprocessing import Process, Queue, Event, current_process
+    import os
+    import random
+    import time
+
+    class MyHandler:
+        """
+        A simple handler for logging events. It runs in the listener process and
+        dispatches events to loggers based on the name in the received record,
+        which then get dispatched, by the logging system, to the handlers
+        configured for those loggers.
+        """
+        def handle(self, record):
+            logger = logging.getLogger(record.name)
+            # The process name is transformed just to show that it's the listener
+            # doing the logging to files and console
+            record.processName = '%s (for %s)' % (current_process().name, record.processName)
+            logger.handle(record)
+
+    def listener_process(q, stop_event, config):
+        """
+        This could be done in the main process, but is just done in a separate
+        process for illustrative purposes.
+
+        This initialises logging according to the specified configuration,
+        starts the listener and waits for the main process to signal completion
+        via the event. The listener is then stopped, and the process exits.
+        """
+        logging.config.dictConfig(config)
+        listener = logging.handlers.QueueListener(q, MyHandler())
+        listener.start()
+        if os.name == 'posix':
+            # On POSIX, the setup logger will have been configured in the
+            # parent process, but should have been disabled following the
+            # dictConfig call.
+            # On Windows, since fork isn't used, the setup logger won't
+            # exist in the child, so it would be created and the message
+            # would appear - hence the "if posix" clause.
+            logger = logging.getLogger('setup')
+            logger.critical('Should not appear, because of disabled logger ...')
+        stop_event.wait()
+        listener.stop()
+
+    def worker_process(config):
+        """
+        A number of these are spawned for the purpose of illustration. In
+        practice, they could be a heterogeneous bunch of processes rather than
+        ones which are identical to each other.
+
+        This initialises logging according to the specified configuration,
+        and logs a hundred messages with random levels to randomly selected
+        loggers.
+
+        A small sleep is added to allow other processes a chance to run. This
+        is not strictly needed, but it mixes the output from the different
+        processes a bit more than if it's left out.
+        """
+        logging.config.dictConfig(config)
+        levels = [logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR,
+                  logging.CRITICAL]
+        loggers = ['foo', 'foo.bar', 'foo.bar.baz',
+                   'spam', 'spam.ham', 'spam.ham.eggs']
+        if os.name == 'posix':
+            # On POSIX, the setup logger will have been configured in the
+            # parent process, but should have been disabled following the
+            # dictConfig call.
+            # On Windows, since fork isn't used, the setup logger won't
+            # exist in the child, so it would be created and the message
+            # would appear - hence the "if posix" clause.
+            logger = logging.getLogger('setup')
+            logger.critical('Should not appear, because of disabled logger ...')
+        for i in range(100):
+            lvl = random.choice(levels)
+            logger = logging.getLogger(random.choice(loggers))
+            logger.log(lvl, 'Message no. %d', i)
+            time.sleep(0.01)
+
+    def main():
+        q = Queue()
+        # The main process gets a simple configuration which prints to the console.
+        config_initial = {
+            'version': 1,
+            'formatters': {
+                'detailed': {
+                    'class': 'logging.Formatter',
+                    'format': '%(asctime)s %(name)-15s %(levelname)-8s %(processName)-10s %(message)s'
+                }
+            },
+            'handlers': {
+                'console': {
+                    'class': 'logging.StreamHandler',
+                    'level': 'INFO',
+                },
+            },
+            'root': {
+                'level': 'DEBUG',
+                'handlers': ['console']
+            },
+        }
+        # The worker process configuration is just a QueueHandler attached to the
+        # root logger, which allows all messages to be sent to the queue.
+        # We disable existing loggers to disable the "setup" logger used in the
+        # parent process. This is needed on POSIX because the logger will
+        # be there in the child following a fork().
+        config_worker = {
+            'version': 1,
+            'disable_existing_loggers': True,
+            'handlers': {
+                'queue': {
+                    'class': 'logging.handlers.QueueHandler',
+                    'queue': q,
+                },
+            },
+            'root': {
+                'level': 'DEBUG',
+                'handlers': ['queue']
+            },
+        }
+        # The listener process configuration shows that the full flexibility of
+        # logging configuration is available to dispatch events to handlers however
+        # you want.
+        # We disable existing loggers to disable the "setup" logger used in the
+        # parent process. This is needed on POSIX because the logger will
+        # be there in the child following a fork().
+        config_listener = {
+            'version': 1,
+            'disable_existing_loggers': True,
+            'formatters': {
+                'detailed': {
+                    'class': 'logging.Formatter',
+                    'format': '%(asctime)s %(name)-15s %(levelname)-8s %(processName)-10s %(message)s'
+                },
+                'simple': {
+                    'class': 'logging.Formatter',
+                    'format': '%(name)-15s %(levelname)-8s %(processName)-10s %(message)s'
+                }
+            },
+            'handlers': {
+                'console': {
+                    'class': 'logging.StreamHandler',
+                    'level': 'INFO',
+                    'formatter': 'simple',
+                },
+                'file': {
+                    'class': 'logging.FileHandler',
+                    'filename': 'mplog.log',
+                    'mode': 'w',
+                    'formatter': 'detailed',
+                },
+                'foofile': {
+                    'class': 'logging.FileHandler',
+                    'filename': 'mplog-foo.log',
+                    'mode': 'w',
+                    'formatter': 'detailed',
+                },
+                'errors': {
+                    'class': 'logging.FileHandler',
+                    'filename': 'mplog-errors.log',
+                    'mode': 'w',
+                    'level': 'ERROR',
+                    'formatter': 'detailed',
+                },
+            },
+            'loggers': {
+                'foo': {
+                    'handlers': ['foofile']
+                }
+            },
+            'root': {
+                'level': 'DEBUG',
+                'handlers': ['console', 'file', 'errors']
+            },
+        }
+        # Log some initial events, just to show that logging in the parent works
+        # normally.
+        logging.config.dictConfig(config_initial)
+        logger = logging.getLogger('setup')
+        logger.info('About to create workers ...')
+        workers = []
+        for i in range(5):
+            wp = Process(target=worker_process, name='worker %d' % (i + 1),
+                         args=(config_worker,))
+            workers.append(wp)
+            wp.start()
+            logger.info('Started worker: %s', wp.name)
+        logger.info('About to create listener ...')
+        stop_event = Event()
+        lp = Process(target=listener_process, name='listener',
+                     args=(q, stop_event, config_listener))
+        lp.start()
+        logger.info('Started listener')
+        # We now hang around for the workers to finish their work.
+        for wp in workers:
+            wp.join()
+        # Workers all done, listening can now stop.
+        # Logging in the parent still works normally.
+        logger.info('Telling listener to stop ...')
+        stop_event.set()
+        lp.join()
+        logger.info('All done.')
+
+    if __name__ == '__main__':
+        main()
+
+
+Inserting a BOM into messages sent to a SysLogHandler
+-----------------------------------------------------
+
+`RFC 5424 <http://tools.ietf.org/html/rfc5424>`_ requires that a
+Unicode message be sent to a syslog daemon as a set of bytes which have the
+following structure: an optional pure-ASCII component, followed by a UTF-8 Byte
+Order Mark (BOM), followed by Unicode encoded using UTF-8. (See the `relevant
+section of the specification <http://tools.ietf.org/html/rfc5424#section-6>`_.)
+
+In Python 3.1, code was added to
+:class:`~logging.handlers.SysLogHandler` to insert a BOM into the message, but
+unfortunately, it was implemented incorrectly, with the BOM appearing at the
+beginning of the message and hence not allowing any pure-ASCII component to
+appear before it.
+
+As this behaviour is broken, the incorrect BOM insertion code is being removed
+from Python 3.2.4 and later. However, it is not being replaced, and if you
+want to produce RFC 5424-compliant messages which include a BOM, an optional
+pure-ASCII sequence before it and arbitrary Unicode after it, encoded using
+UTF-8, then you need to do the following:
+
+#. Attach a :class:`~logging.Formatter` instance to your
+   :class:`~logging.handlers.SysLogHandler` instance, with a format string
+   such as::
+
+      'ASCII section\ufeffUnicode section'
+
+   The Unicode code point U+FEFF, when encoded using UTF-8, will be
+   encoded as a UTF-8 BOM -- the byte-string ``b'\xef\xbb\xbf'``.
+
+#. Replace the ASCII section with whatever placeholders you like, but make sure
+   that the data that appears in there after substitution is always ASCII (that
+   way, it will remain unchanged after UTF-8 encoding).
+
+#. Replace the Unicode section with whatever placeholders you like; if the data
+   which appears there after substitution contains characters outside the ASCII
+   range, that's fine -- it will be encoded using UTF-8.
+
+The formatted message *will* be encoded using UTF-8 encoding by
+``SysLogHandler``. If you follow the above rules, you should be able to produce
+RFC 5424-compliant messages. If you don't, logging may not complain, but your
+messages will not be RFC 5424-compliant, and your syslog daemon may complain.
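+
+The effect of such a format string can be sketched without a syslog daemon by
+formatting a record and encoding the result by hand. The format string, the
+logger name ``app`` and the message text are assumptions chosen for this
+illustration, and the sketch only shows the formatted message part, not
+anything else that ``SysLogHandler`` adds when it sends the message.

```python
import logging

# ASCII-only placeholders before U+FEFF, arbitrary Unicode after it.
formatter = logging.Formatter('%(name)s %(levelname)s\ufeff%(message)s')

record = logging.makeLogRecord({
    'name': 'app',                 # hypothetical logger name
    'levelname': 'INFO',
    'msg': 'caf\u00e9 opened',     # message containing a non-ASCII character
})

# Encode the formatted text as UTF-8, as the text above says the handler does.
payload = formatter.format(record).encode('utf-8')

# U+FEFF becomes the three-byte UTF-8 BOM, separating the two sections.
ascii_part, bom, unicode_part = payload.partition(b'\xef\xbb\xbf')
```

+
+Here ``ascii_part`` holds the pure-ASCII component and ``unicode_part`` the
+UTF-8 encoded remainder, which is the layout the list above aims for.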
+
+
+Implementing structured logging
+-------------------------------
+
+Although most logging messages are intended for reading by humans, and thus not
+readily machine-parseable, there might be circumstances where you want to output
+messages in a structured format which *is* capable of being parsed by a program
+(without needing complex regular expressions to parse the log message). This is
+straightforward to achieve using the logging package. There are a number of
+ways in which this could be achieved, but the following is a simple approach
+which uses JSON to serialise the event in a machine-parseable manner::
+
+    import json
+    import logging
+
+    class StructuredMessage(object):
+        def __init__(self, message, **kwargs):
+            self.message = message
+            self.kwargs = kwargs
+
+        def __str__(self):
+            return '%s >>> %s' % (self.message, json.dumps(self.kwargs))
+
+    _ = StructuredMessage   # optional, to improve readability
+
+    logging.basicConfig(level=logging.INFO, format='%(message)s')
+    logging.info(_('message 1', foo='bar', bar='baz', num=123, fnum=123.456))
+
+If the above script is run, it prints::
+
+ message 1 >>> {"fnum": 123.456, "num": 123, "bar": "baz", "foo": "bar"}
+
+Note that the order of items might be different according to the version of
+Python used.
+
+If you need more specialised processing, you can use a custom JSON encoder,
+as in the following complete example::
+
+    from __future__ import unicode_literals
+
+    import json
+    import logging
+
+    # This next bit is to ensure the script runs unchanged on 2.x and 3.x
+    try:
+        unicode
+    except NameError:
+        unicode = str
+
+    class Encoder(json.JSONEncoder):
+        def default(self, o):
+            if isinstance(o, set):
+                return tuple(o)
+            elif isinstance(o, unicode):
+                return o.encode('unicode_escape').decode('ascii')
+            return super(Encoder, self).default(o)
+
+    class StructuredMessage(object):
+        def __init__(self, message, **kwargs):
+            self.message = message
+            self.kwargs = kwargs
+
+        def __str__(self):
+            s = Encoder().encode(self.kwargs)
+            return '%s >>> %s' % (self.message, s)
+
+    _ = StructuredMessage   # optional, to improve readability
+
+    def main():
+        logging.basicConfig(level=logging.INFO, format='%(message)s')
+        logging.info(_('message 1', set_value=set([1, 2, 3]), snowman='\u2603'))
+
+    if __name__ == '__main__':
+        main()
+
+When the above script is run, it prints::
+
+ message 1 >>> {"snowman": "\u2603", "set_value": [1, 2, 3]}
+
+Note that the order of items might be different according to the version of
+Python used.
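+
+For the consuming side, a program reading such a log could recover the
+structured part roughly as follows. This is only a sketch: the
+``parse_structured`` helper is hypothetical, and it assumes the message text
+itself never contains the ``' >>> '`` separator.

```python
import json


def parse_structured(line, separator=' >>> '):
    """Split a log line into its message text and its JSON payload.

    Hypothetical helper: assumes the separator first occurs between the
    human-readable message and the JSON-encoded keyword arguments.
    """
    message, _, payload = line.partition(separator)
    return message, json.loads(payload)


line = 'message 1 >>> {"fnum": 123.456, "num": 123, "bar": "baz", "foo": "bar"}'
message, fields = parse_structured(line)
```

+
+No regular expressions are needed: one split and one ``json.loads()`` call
+give back the message and a dictionary of fields.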
+
diff --git a/Doc/howto/logging.rst b/Doc/howto/logging.rst
new file mode 100644
index 0000000000..79f1336deb
--- /dev/null
+++ b/Doc/howto/logging.rst
@@ -0,0 +1,1053 @@
+=============
+Logging HOWTO
+=============
+
+:Author: Vinay Sajip <vinay_sajip at red-dove dot com>
+
+.. _logging-basic-tutorial:
+
+.. currentmodule:: logging
+
+Basic Logging Tutorial
+----------------------
+
+Logging is a means of tracking events that happen when some software runs. The
+software's developer adds logging calls to their code to indicate that certain
+events have occurred. An event is described by a descriptive message which can
+optionally contain variable data (i.e. data that is potentially different for
+each occurrence of the event). Events also have an importance which the
+developer ascribes to the event; the importance can also be called the *level*
+or *severity*.
+
+When to use logging
+^^^^^^^^^^^^^^^^^^^
+
+Logging provides a set of convenience functions for simple logging usage. These
+are :func:`debug`, :func:`info`, :func:`warning`, :func:`error` and
+:func:`critical`. To determine when to use logging, see the table below, which
+states, for each of a set of common tasks, the best tool to use for it.
+
++-------------------------------------+--------------------------------------+
+| Task you want to perform            | The best tool for the task           |
++=====================================+======================================+
+| Display console output for ordinary | :func:`print`                        |
+| usage of a command line script or   |                                      |
+| program                             |                                      |
++-------------------------------------+--------------------------------------+
+| Report events that occur during     | :func:`logging.info` (or             |
+| normal operation of a program (e.g. | :func:`logging.debug` for very       |
+| for status monitoring or fault      | detailed output for diagnostic       |
+| investigation)                      | purposes)                            |
++-------------------------------------+--------------------------------------+
+| Issue a warning regarding a         | :func:`warnings.warn` in library     |
+| particular runtime event            | code if the issue is avoidable and   |
+|                                     | the client application should be     |
+|                                     | modified to eliminate the warning    |
+|                                     |                                      |
+|                                     | :func:`logging.warning` if there is  |
+|                                     | nothing the client application can do|
+|                                     | about the situation, but the event   |
+|                                     | should still be noted                |
++-------------------------------------+--------------------------------------+
+| Report an error regarding a         | Raise an exception                   |
+| particular runtime event            |                                      |
++-------------------------------------+--------------------------------------+
+| Report suppression of an error      | :func:`logging.error`,               |
+| without raising an exception (e.g.  | :func:`logging.exception` or         |
+| error handler in a long-running     | :func:`logging.critical` as          |
+| server process)                     | appropriate for the specific error   |
+|                                     | and application domain               |
++-------------------------------------+--------------------------------------+
+
+The logging functions are named after the level or severity of the events
+they are used to track. The standard levels and their applicability are
+described below (in increasing order of severity):
+
++--------------+---------------------------------------------+
+| Level        | When it's used                              |
++==============+=============================================+
+| ``DEBUG``    | Detailed information, typically of interest |
+|              | only when diagnosing problems.              |
++--------------+---------------------------------------------+
+| ``INFO``     | Confirmation that things are working as     |
+|              | expected.                                   |
++--------------+---------------------------------------------+
+| ``WARNING``  | An indication that something unexpected     |
+|              | happened, or indicative of some problem in  |
+|              | the near future (e.g. 'disk space low').    |
+|              | The software is still working as expected.  |
++--------------+---------------------------------------------+
+| ``ERROR``    | Due to a more serious problem, the software |
+|              | has not been able to perform some function. |
++--------------+---------------------------------------------+
+| ``CRITICAL`` | A serious error, indicating that the program|
+|              | itself may be unable to continue running.   |
++--------------+---------------------------------------------+
+
+The default level is ``WARNING``, which means that only events of this level
+and above will be tracked, unless the logging package is configured to do
+otherwise.
+
+Events that are tracked can be handled in different ways. The simplest way of
+handling tracked events is to print them to the console. Another common way
+is to write them to a disk file.
+
+
+.. _howto-minimal-example:
+
+A simple example
+^^^^^^^^^^^^^^^^
+
+A very simple example is::
+
+ import logging
+ logging.warning('Watch out!') # will print a message to the console
+ logging.info('I told you so') # will not print anything
+
+If you type these lines into a script and run it, you'll see::
+
+ WARNING:root:Watch out!
+
+printed out on the console. The ``INFO`` message doesn't appear because the
+default level is ``WARNING``. The printed message includes the indication of
+the level and the description of the event provided in the logging call, i.e.
+'Watch out!'. Don't worry about the 'root' part for now: it will be explained
+later. The actual output can be formatted quite flexibly if you need that;
+formatting options will also be explained later.
+
+
+Logging to a file
+^^^^^^^^^^^^^^^^^
+
+A very common situation is that of recording logging events in a file, so let's
+look at that next::
+
+ import logging
+ logging.basicConfig(filename='example.log', level=logging.DEBUG)
+ logging.debug('This message should go to the log file')
+ logging.info('So should this')
+ logging.warning('And this, too')
+
+And now if we open the file and look at what we have, we should find the log
+messages::
+
+ DEBUG:root:This message should go to the log file
+ INFO:root:So should this
+ WARNING:root:And this, too
+
+This example also shows how you can set the logging level which acts as the
+threshold for tracking. In this case, because we set the threshold to
+``DEBUG``, all of the messages were written to the file.
+
+If you want to set the logging level from a command-line option such as::
+
+ --log=INFO
+
+and you have the value of the parameter passed for ``--log`` in some variable
+*loglevel*, you can use::
+
+ getattr(logging, loglevel.upper())
+
+to get the value which you'll pass to :func:`basicConfig` via the *level*
+argument. You may want to error check any user input value, perhaps as in the
+following example::
+
+    # assuming loglevel is bound to the string value obtained from the
+    # command line argument. Convert to upper case to allow the user to
+    # specify --log=DEBUG or --log=debug
+    numeric_level = getattr(logging, loglevel.upper(), None)
+    if not isinstance(numeric_level, int):
+        raise ValueError('Invalid log level: %s' % loglevel)
+    logging.basicConfig(level=numeric_level, ...)
+
+The call to :func:`basicConfig` should come *before* any calls to :func:`debug`,
+:func:`info` etc. As it's intended as a one-off simple configuration facility,
+only the first call will actually do anything: subsequent calls are effectively
+no-ops.
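+
+Putting the pieces above together, a complete sketch might look like this. It
+assumes the option is parsed with :mod:`argparse` and defaults to ``WARNING``;
+the ``level_from_args`` helper is hypothetical and not part of the logging
+package.

```python
import argparse
import logging


def level_from_args(argv):
    """Return the numeric logging level named by a --log option.

    Hypothetical helper: argv is a list such as ['--log=debug'].
    """
    parser = argparse.ArgumentParser()
    parser.add_argument('--log', default='WARNING')
    args = parser.parse_args(argv)
    # Upper-case the value so that --log=DEBUG and --log=debug both work.
    numeric_level = getattr(logging, args.log.upper(), None)
    if not isinstance(numeric_level, int):
        raise ValueError('Invalid log level: %s' % args.log)
    return numeric_level


level = level_from_args(['--log=debug'])
# Configure logging once, before any other logging calls are made.
logging.basicConfig(level=level)
```

+
+In a real script you would pass ``sys.argv[1:]`` (or simply call
+``parse_args()`` with no arguments) instead of the hard-coded list.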
+
+If you run the above script several times, the messages from successive runs
+are appended to the file *example.log*. If you want each run to start afresh,
+not remembering the messages from earlier runs, you can specify the *filemode*
+argument, by changing the call in the above example to::
+
+ logging.basicConfig(filename='example.log', filemode='w', level=logging.DEBUG)
+
+The output will be the same as before, but the log file is no longer appended
+to, so the messages from earlier runs are lost.
+
+
+Logging from multiple modules
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If your program consists of multiple modules, here's an example of how you
+could organize logging in it::
+
+    # myapp.py
+    import logging
+    import mylib
+
+    def main():
+        logging.basicConfig(filename='myapp.log', level=logging.INFO)
+        logging.info('Started')
+        mylib.do_something()
+        logging.info('Finished')
+
+    if __name__ == '__main__':
+        main()
+
+::
+
+    # mylib.py
+    import logging
+
+    def do_something():
+        logging.info('Doing something')
+
+If you run *myapp.py*, you should see this in *myapp.log*::
+
+ INFO:root:Started
+ INFO:root:Doing something
+ INFO:root:Finished
+
+which is hopefully what you were expecting to see. You can generalize this to
+multiple modules, using the pattern in *mylib.py*. Note that for this simple
+usage pattern, you won't know, by looking in the log file, *where* in your
+application your messages came from, apart from looking at the event
+description. If you want to track the location of your messages, you'll need
+to refer to the documentation beyond the tutorial level -- see
+:ref:`logging-advanced-tutorial`.
+
+
+Logging variable data
+^^^^^^^^^^^^^^^^^^^^^
+
+To log variable data, use a format string for the event description message and
+append the variable data as arguments. For example::
+
+ import logging
+ logging.warning('%s before you %s', 'Look', 'leap!')
+
+will display::
+
+ WARNING:root:Look before you leap!
+
+As you can see, merging of variable data into the event description message
+uses the old, %-style of string formatting. This is for backwards
+compatibility: the logging package pre-dates newer formatting options such as
+:meth:`str.format` and :class:`string.Template`. These newer formatting
+options *are* supported, but exploring them is outside the scope of this
+tutorial.
+
+
+Changing the format of displayed messages
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To change the format which is used to display messages, you need to
+specify the format you want to use::
+
+ import logging
+ logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.DEBUG)
+ logging.debug('This message should appear on the console')
+ logging.info('So should this')
+ logging.warning('And this, too')
+
+which would print::
+
+ DEBUG:This message should appear on the console
+ INFO:So should this
+ WARNING:And this, too
+
+Notice that the 'root' which appeared in earlier examples has disappeared. For
+a full set of things that can appear in format strings, you can refer to the
+documentation for :ref:`logrecord-attributes`, but for simple usage, you just
+need the *levelname* (severity), *message* (event description, including
+variable data) and perhaps to display when the event occurred. This is
+described in the next section.
+
+
+Displaying the date/time in messages
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To display the date and time of an event, you would place '%(asctime)s' in
+your format string::
+
+ import logging
+ logging.basicConfig(format='%(asctime)s %(message)s')
+ logging.warning('is when this event was logged.')
+
+which should print something like this::
+
+ 2010-12-12 11:41:42,612 is when this event was logged.
+
+The default format for date/time display (shown above) is like ISO 8601. If you need
+more control over the formatting of the date/time, provide a *datefmt*
+argument to ``basicConfig``, as in this example::
+
+ import logging
+ logging.basicConfig(format='%(asctime)s %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p')
+ logging.warning('is when this event was logged.')
+
+which would display something like this::
+
+ 12/12/2010 11:46:36 AM is when this event was logged.
+
+The format of the *datefmt* argument is the same as supported by
+:func:`time.strftime`.
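+
+The same *datefmt* strings work on a :class:`Formatter` directly, which makes
+the behaviour easy to try out on a fixed timestamp. In this sketch the record
+is built by hand with a creation time of 0 (the epoch), and the formatter is
+switched to UTC so the result does not depend on the local time zone; both
+choices are assumptions made for the illustration.

```python
import logging
import time

formatter = logging.Formatter('%(asctime)s %(message)s',
                              datefmt='%Y-%m-%d %H:%M:%S')
# Use UTC instead of local time so the output is the same everywhere.
formatter.converter = time.gmtime

# A hand-made record whose creation time is the epoch (illustrative only).
record = logging.makeLogRecord({'msg': 'epoch', 'created': 0})
line = formatter.format(record)
```

+
+With an explicit *datefmt*, the milliseconds shown in the default format are
+not appended.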
+
+
+Next Steps
+^^^^^^^^^^
+
+That concludes the basic tutorial. It should be enough to get you up and
+running with logging. There's a lot more that the logging package offers, but
+to get the best out of it, you'll need to invest a little more of your time in
+reading the following sections. If you're ready for that, grab some of your
+favourite beverage and carry on.
+
+If your logging needs are simple, then use the above examples to incorporate
+logging into your own scripts, and if you run into problems or don't
+understand something, please post a question on the comp.lang.python Usenet
+group (available at http://groups.google.com/group/comp.lang.python) and you
+should receive help before too long.
+
+Still here? You can carry on reading the next few sections, which provide a
+slightly more advanced/in-depth tutorial than the basic one above. After that,
+you can take a look at the :ref:`logging-cookbook`.
+
+.. _logging-advanced-tutorial:
+
+
+Advanced Logging Tutorial
+-------------------------
+
+The logging library takes a modular approach and offers several categories
+of components: loggers, handlers, filters, and formatters.
+
+* Loggers expose the interface that application code directly uses.
+* Handlers send the log records (created by loggers) to the appropriate
+ destination.
+* Filters provide a finer grained facility for determining which log records
+ to output.
+* Formatters specify the layout of log records in the final output.
+
+Log event information is passed between loggers, handlers, filters and
+formatters in a :class:`LogRecord` instance.
+
+Logging is performed by calling methods on instances of the :class:`Logger`
+class (hereafter called :dfn:`loggers`). Each instance has a name, and they are
+conceptually arranged in a namespace hierarchy using dots (periods) as
+separators. For example, a logger named 'scan' is the parent of loggers
+'scan.text', 'scan.html' and 'scan.pdf'. Logger names can be anything you want,
+and indicate the area of an application in which a logged message originates.
+
+A good convention to use when naming loggers is to use a module-level logger,
+in each module which uses logging, named as follows::
+
+ logger = logging.getLogger(__name__)
+
+This means that logger names track the package/module hierarchy, and it's
+intuitively obvious where events are logged just from the logger name.
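+
+As a small illustration of the hierarchy, the following sketch checks the
+parent/child relationship described above. The ``scan`` names are taken from
+the text; the checks themselves are only a demonstration and are not something
+application code would normally need.
+
```python
import logging

# Create the parent first, then a child whose name extends it with a dot.
scan = logging.getLogger('scan')
scan_text = logging.getLogger('scan.text')

# The child's parent is the logger named by the part before the last dot.
child_has_scan_parent = scan_text.parent is scan

# A top-level logger's parent is the root logger.
scan_has_root_parent = scan.parent is logging.getLogger()

print(child_has_scan_parent, scan_has_root_parent)
```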
+
+The root of the hierarchy of loggers is called the root logger. That's the
+logger used by the functions :func:`debug`, :func:`info`, :func:`warning`,
+:func:`error` and :func:`critical`, which just call the same-named method of
+the root logger. The functions and the methods have the same signatures. The
+root logger's name is printed as 'root' in the logged output.
+
+It is, of course, possible to log messages to different destinations. Support
+is included in the package for writing log messages to files, HTTP GET/POST
+locations, email via SMTP, generic sockets, queues, or OS-specific logging
+mechanisms such as syslog or the Windows NT event log. Destinations are served
+by :dfn:`handler` classes. You can create your own log destination class if
+you have special requirements not met by any of the built-in handler classes.
+
+By default, no destination is set for any logging messages. You can specify
+a destination (such as console or file) by using :func:`basicConfig` as in the
+tutorial examples. If you call the functions :func:`debug`, :func:`info`,
+:func:`warning`, :func:`error` and :func:`critical`, they will check whether a
+destination is set; if none is set, they will set a destination
+of the console (``sys.stderr``) and a default format for the displayed
+message before delegating to the root logger to do the actual message output.
+
+The default format set by :func:`basicConfig` for messages is::
+
+ severity:logger name:message
+
+You can change this by passing a format string to :func:`basicConfig` with the
+*format* keyword argument. For all options regarding how a format string is
+constructed, see :ref:`formatter-objects`.
+
+Logging Flow
+^^^^^^^^^^^^
+
+The flow of log event information in loggers and handlers is illustrated in the
+following diagram.
+
+.. image:: logging_flow.png
+
+Loggers
+^^^^^^^
+
+:class:`Logger` objects have a threefold job. First, they expose several
+methods to application code so that applications can log messages at runtime.
+Second, logger objects determine which log messages to act upon based upon
+severity (the default filtering facility) or filter objects. Third, logger
+objects pass along relevant log messages to all interested log handlers.
+
+The most widely used methods on logger objects fall into two categories:
+configuration and message sending.
+
+These are the most common configuration methods:
+
+* :meth:`Logger.setLevel` specifies the lowest-severity log message a logger
+ will handle, where debug is the lowest built-in severity level and critical
+ is the highest built-in severity. For example, if the severity level is
+ INFO, the logger will handle only INFO, WARNING, ERROR, and CRITICAL messages
+ and will ignore DEBUG messages.
+
+* :meth:`Logger.addHandler` and :meth:`Logger.removeHandler` add and remove
+ handler objects from the logger object. Handlers are covered in more detail
+ in :ref:`handler-basic`.
+
+* :meth:`Logger.addFilter` and :meth:`Logger.removeFilter` add and remove filter
+ objects from the logger object. Filters are covered in more detail in
+ :ref:`filter`.
+
+You don't need to always call these methods on every logger you create. See the
+last two paragraphs in this section.
+
+With the logger object configured, the following methods create log messages:
+
+* :meth:`Logger.debug`, :meth:`Logger.info`, :meth:`Logger.warning`,
+ :meth:`Logger.error`, and :meth:`Logger.critical` all create log records with
+ a message and a level that corresponds to their respective method names. The
+ message is actually a format string, which may contain the standard string
+ substitution syntax of ``%s``, ``%d``, ``%f``, and so on. The
+ rest of their arguments is a list of objects that correspond with the
+ substitution fields in the message. With regard to ``**kwargs``, the
+ logging methods care only about a keyword of ``exc_info`` and use it to
+ determine whether to log exception information.
+
+* :meth:`Logger.exception` creates a log message similar to
+ :meth:`Logger.error`. The difference is that :meth:`Logger.exception` dumps a
+ stack trace along with it. Call this method only from an exception handler.
+
+* :meth:`Logger.log` takes a log level as an explicit argument. This is a
+ little more verbose for logging messages than using the log level convenience
+ methods listed above, but this is how to log at custom log levels.
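+
+The message-creating methods listed above can be sketched as follows. This is
+a minimal example, not a recommended setup: the logger name ``howto.methods``
+is made up for illustration, and output is captured in an in-memory stream so
+that the example is self-contained.
+
```python
import io
import logging

# Capture output in memory so the example does not depend on the console.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s:%(message)s'))

logger = logging.getLogger('howto.methods')  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False                     # keep the demo away from root handlers
logger.addHandler(handler)

# The message is a format string; the remaining arguments fill it in.
logger.info('%s has %d items', 'cart', 3)

# Logger.log() takes the level as an explicit first argument.
logger.log(logging.WARNING, 'disk usage at %d%%', 91)

# Logger.exception() logs at ERROR level and appends the current traceback,
# so it is called from inside an exception handler.
try:
    1 / 0
except ZeroDivisionError:
    logger.exception('division failed')

output = stream.getvalue()
print(output)
```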
+
+:func:`getLogger` returns a reference to a logger instance with the specified
+name if it is provided, or ``root`` if not. The names are period-separated
+hierarchical structures. Multiple calls to :func:`getLogger` with the same name
+will return a reference to the same logger object. Loggers that are further
+down in the hierarchical list are children of loggers higher up in the list.
+For example, given a logger with a name of ``foo``, loggers with names of
+``foo.bar``, ``foo.bar.baz``, and ``foo.bam`` are all descendants of ``foo``.
+
+Loggers have a concept of *effective level*. If a level is not explicitly set
+on a logger, the level of its parent is used instead as its effective level.
+If the parent has no explicit level set, *its* parent is examined, and so on -
+all ancestors are searched until an explicitly set level is found. The root
+logger always has an explicit level set (``WARNING`` by default). When deciding
+whether to process an event, the effective level of the logger is used to
+determine whether the event is passed to the logger's handlers.
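+
+The lookup of the effective level can be seen in a short sketch. The logger
+names are hypothetical; :meth:`Logger.getEffectiveLevel` and
+:meth:`Logger.isEnabledFor` are the methods that expose the result.
+
```python
import logging

parent = logging.getLogger('howto.level')        # hypothetical names
child = logging.getLogger('howto.level.child')

parent.setLevel(logging.ERROR)

# No level was set on the child, so its own level is NOTSET ...
own_level = child.level
# ... and its effective level comes from the nearest ancestor that has one.
effective = child.getEffectiveLevel()

print(logging.getLevelName(own_level), logging.getLevelName(effective))

# isEnabledFor() is based on the effective level.
warning_enabled = child.isEnabledFor(logging.WARNING)
error_enabled = child.isEnabledFor(logging.ERROR)
print(warning_enabled, error_enabled)
```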
+
+Child loggers propagate messages up to the handlers associated with their
+ancestor loggers. Because of this, it is unnecessary to define and configure
+handlers for all the loggers an application uses. It is sufficient to
+configure handlers for a top-level logger and create child loggers as needed.
+(You can, however, turn off propagation by setting the *propagate*
+attribute of a logger to *False*.)
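+
+A rough sketch of propagation, assuming a made-up application logger named
+``howto_app``: only the top-level logger has a handler, yet events logged on a
+child reach it until propagation is turned off on that child.
+
```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(name)s:%(message)s'))

# Only the top-level logger gets a handler (names are hypothetical).
top = logging.getLogger('howto_app')
top.setLevel(logging.INFO)
top.propagate = False      # keep this demo's output away from root handlers
top.addHandler(handler)

child = logging.getLogger('howto_app.db')
child.info('propagated to the handler on howto_app')

# With propagation switched off, the child's events no longer reach it.
child.propagate = False
child.info('this one is not handled by howto_app')

captured = stream.getvalue()
print(captured)
```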
+
+
+.. _handler-basic:
+
+Handlers
+^^^^^^^^
+
+:class:`~logging.Handler` objects are responsible for dispatching the
+appropriate log messages (based on the log messages' severity) to the handler's
+specified destination. Logger objects can add zero or more handler objects to
+themselves with an :func:`addHandler` method. As an example scenario, an
+application may want to send all log messages to a log file, all log messages
+of error or higher to stdout, and all messages of critical to an email address.
+This scenario requires three individual handlers where each handler is
+responsible for sending messages of a specific severity to a specific location.
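+
+A simplified sketch of this scenario follows. It uses two handlers rather
+than three, and in-memory streams stand in for the log file and for stdout so
+that the example can run anywhere; the logger name is hypothetical. A real
+application would more likely use :class:`FileHandler` and an email handler.
+
```python
import io
import logging

# In-memory streams stand in for the log file and the console here.
everything = io.StringIO()
errors_only = io.StringIO()

all_handler = logging.StreamHandler(everything)
all_handler.setLevel(logging.DEBUG)

error_handler = logging.StreamHandler(errors_only)
error_handler.setLevel(logging.ERROR)

logger = logging.getLogger('howto.handlers')   # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(all_handler)
logger.addHandler(error_handler)

logger.info('routine event')
logger.error('something failed')

# The first stream received both events, the second only the error.
print(everything.getvalue())
print(errors_only.getvalue())
```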
+
+The standard library includes quite a few handler types (see
+:ref:`useful-handlers`); the tutorials use mainly :class:`StreamHandler` and
+:class:`FileHandler` in their examples.
+
+There are very few methods in a handler for application developers to concern
+themselves with. The only handler methods that seem relevant for application
+developers who are using the built-in handler objects (that is, not creating
+custom handlers) are the following configuration methods:
+
+* The :meth:`Handler.setLevel` method, just as in logger objects, specifies the
+ lowest severity that will be dispatched to the appropriate destination. Why
+ are there two :func:`setLevel` methods? The level set in the logger
+ determines which severity of messages it will pass to its handlers. The level
+ set in each handler determines which messages that handler will send on.
+
+* :func:`setFormatter` selects a Formatter object for this handler to use.
+
+* :func:`addFilter` and :func:`removeFilter` respectively configure and
+ deconfigure filter objects on handlers.
+
+Application code should not directly instantiate and use instances of
+:class:`Handler`. Instead, the :class:`Handler` class is a base class that
+defines the interface that all handlers should have and establishes some
+default behavior that child classes can use (or override).
+
+
+Formatters
+^^^^^^^^^^
+
+Formatter objects configure the final order, structure, and contents of the log
+message. Unlike the base :class:`logging.Handler` class, application code may
+instantiate formatter classes, although you could likely subclass the formatter
+if your application needs special behavior. The constructor takes three
+optional arguments -- a message format string, a date format string and a style
+indicator.
+
+.. method:: logging.Formatter.__init__(fmt=None, datefmt=None, style='%')
+
+If there is no message format string, the default is to use the
+raw message. If there is no date format string, the default date format is::
+
+ %Y-%m-%d %H:%M:%S
+
+with the milliseconds tacked on at the end. The ``style`` is one of '%', '{'
+or '$'. If one of these is not specified, then '%' will be used.
+
+If the ``style`` is '%', the message format string uses
+``%(<dictionary key>)s`` styled string substitution; the possible keys are
+documented in :ref:`logrecord-attributes`. If the style is '{', the message
+format string is assumed to be compatible with :meth:`str.format` (using
+keyword arguments), while if the style is '$' then the message format string
+should conform to what is expected by :meth:`string.Template.substitute`.
+
+.. versionchanged:: 3.2
+ Added the ``style`` parameter.
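+
+To compare the three styles, the sketch below formats one record with three
+equivalent format strings. The record is built by hand purely for
+illustration (its name, path and message are made up); in normal use, records
+are created by loggers.
+
```python
import logging

# A hand-built record, only for demonstration purposes.
record = logging.LogRecord(
    name='demo', level=logging.INFO, pathname='demo.py', lineno=1,
    msg='disk usage at %d%%', args=(91,), exc_info=None)

percent = logging.Formatter('%(levelname)s:%(message)s')        # default style
braces = logging.Formatter('{levelname}:{message}', style='{')
dollar = logging.Formatter('$levelname:$message', style='$')

# The style only affects the format string, not the message arguments,
# so all three formatters produce the same text.
results = [f.format(record) for f in (percent, braces, dollar)]
print(results)
```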
+
+The following message format string will log the time in a human-readable
+format, the severity of the message, and the contents of the message, in that
+order::
+
+ '%(asctime)s - %(levelname)s - %(message)s'
+
+Formatters use a user-configurable function to convert the creation time of a
+record to a tuple. By default, :func:`time.localtime` is used; to change this
+for a particular formatter instance, set the ``converter`` attribute of the
+instance to a function with the same signature as :func:`time.localtime` or
+:func:`time.gmtime`. To change it for all formatters, for example if you want
+all logging times to be shown in GMT, set the ``converter`` attribute in the
+Formatter class (to ``time.gmtime`` for GMT display).
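+
+As a minimal sketch of the per-instance case, the example below sets
+``converter`` on one formatter and pins the record's creation time so that the
+result is reproducible. The hand-built record and the explicit *datefmt* are
+assumptions made for the demonstration.
+
```python
import logging
import time

formatter = logging.Formatter('%(asctime)s %(message)s',
                              datefmt='%Y-%m-%d %H:%M:%S')
# Per-instance override: render times in UTC instead of local time.
formatter.converter = time.gmtime

record = logging.LogRecord(
    name='demo', level=logging.INFO, pathname='demo.py', lineno=1,
    msg='epoch', args=(), exc_info=None)
record.created = 0.0   # pin the timestamp to the Unix epoch for the demo

line = formatter.format(record)
print(line)
```
+
+Assigning ``logging.Formatter.converter = time.gmtime`` instead would apply
+the same change to every formatter that does not override it.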
+
+
+Configuring Logging
+^^^^^^^^^^^^^^^^^^^
+
+.. currentmodule:: logging.config
+
+Programmers can configure logging in three ways:
+
+1. Creating loggers, handlers, and formatters explicitly using Python
+ code that calls the configuration methods listed above.
+2. Creating a logging config file and reading it using the :func:`fileConfig`
+ function.
+3. Creating a dictionary of configuration information and passing it
+ to the :func:`dictConfig` function.
+
+For the reference documentation on the last two options, see
+:ref:`logging-config-api`. The following example configures a very simple
+logger, a console handler, and a simple formatter using Python code::
+
+ import logging
+
+ # create logger
+ logger = logging.getLogger('simple_example')
+ logger.setLevel(logging.DEBUG)
+
+ # create console handler and set level to debug
+ ch = logging.StreamHandler()
+ ch.setLevel(logging.DEBUG)
+
+ # create formatter
+ formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
+
+ # add formatter to ch
+ ch.setFormatter(formatter)
+
+ # add ch to logger
+ logger.addHandler(ch)
+
+ # 'application' code
+ logger.debug('debug message')
+ logger.info('info message')
+ logger.warning('warn message')
+ logger.error('error message')
+ logger.critical('critical message')
+
+Running this module from the command line produces the following output::
+
+ $ python simple_logging_module.py
+ 2005-03-19 15:10:26,618 - simple_example - DEBUG - debug message
+ 2005-03-19 15:10:26,620 - simple_example - INFO - info message
+ 2005-03-19 15:10:26,695 - simple_example - WARNING - warn message
+ 2005-03-19 15:10:26,697 - simple_example - ERROR - error message
+ 2005-03-19 15:10:26,773 - simple_example - CRITICAL - critical message
+
+The following Python module creates a logger, handler, and formatter nearly
+identical to those in the example listed above, with the only difference being
+the names of the objects::
+
+ import logging
+ import logging.config
+
+ logging.config.fileConfig('logging.conf')
+
+ # create logger
+ logger = logging.getLogger('simpleExample')
+
+ # 'application' code
+ logger.debug('debug message')
+ logger.info('info message')
+ logger.warning('warn message')
+ logger.error('error message')
+ logger.critical('critical message')
+
+Here is the logging.conf file::
+
+ [loggers]
+ keys=root,simpleExample
+
+ [handlers]
+ keys=consoleHandler
+
+ [formatters]
+ keys=simpleFormatter
+
+ [logger_root]
+ level=DEBUG
+ handlers=consoleHandler
+
+ [logger_simpleExample]
+ level=DEBUG
+ handlers=consoleHandler
+ qualname=simpleExample
+ propagate=0
+
+ [handler_consoleHandler]
+ class=StreamHandler
+ level=DEBUG
+ formatter=simpleFormatter
+ args=(sys.stdout,)
+
+ [formatter_simpleFormatter]
+ format=%(asctime)s - %(name)s - %(levelname)s - %(message)s
+ datefmt=
+
+The output is nearly identical to that of the non-config-file-based example::
+
+ $ python simple_logging_config.py
+ 2005-03-19 15:38:55,977 - simpleExample - DEBUG - debug message
+ 2005-03-19 15:38:55,979 - simpleExample - INFO - info message
+ 2005-03-19 15:38:56,054 - simpleExample - WARNING - warn message
+ 2005-03-19 15:38:56,055 - simpleExample - ERROR - error message
+ 2005-03-19 15:38:56,130 - simpleExample - CRITICAL - critical message
+
+You can see that the config file approach has a few advantages over the Python
+code approach, mainly separation of configuration and code and the ability of
+noncoders to easily modify the logging properties.
+
+.. warning:: The :func:`fileConfig` function takes a default parameter,
+ ``disable_existing_loggers``, which defaults to ``True`` for reasons of
+ backward compatibility. This may or may not be what you want, since it
+ will cause any loggers existing before the :func:`fileConfig` call to
+ be disabled unless they (or an ancestor) are explicitly named in the
+ configuration. Please refer to the reference documentation for more
+ information, and specify ``False`` for this parameter if you wish.
+
+ The dictionary passed to :func:`dictConfig` can also specify a Boolean
+ value with key ``disable_existing_loggers``, which if not specified
+ explicitly in the dictionary also defaults to being interpreted as
+ ``True``. This leads to the logger-disabling behaviour described above,
+ which may not be what you want - in which case, provide the key
+ explicitly with a value of ``False``.
+
+
+.. currentmodule:: logging
+
+Note that the class names referenced in config files need to be either relative
+to the logging module, or absolute values which can be resolved using normal
+import mechanisms. Thus, you could use either
+:class:`~logging.handlers.WatchedFileHandler` (relative to the logging module) or
+``mypackage.mymodule.MyHandler`` (for a class defined in package ``mypackage``
+and module ``mymodule``, where ``mypackage`` is available on the Python import
+path).
+
+In Python 3.2, a new means of configuring logging has been introduced, using
+dictionaries to hold configuration information. This provides a superset of the
+functionality of the config-file-based approach outlined above, and is the
+recommended configuration method for new applications and deployments. Because
+a Python dictionary is used to hold configuration information, and since you
+can populate that dictionary using different means, you have more options for
+configuration. For example, you can use a configuration file in JSON format,
+or, if you have access to YAML processing functionality, a file in YAML
+format, to populate the configuration dictionary. Or, of course, you can
+construct the dictionary in Python code, receive it in pickled form over a
+socket, or use whatever approach makes sense for your application.
+
+Here's an example of the same configuration as above, in YAML format for
+the new dictionary-based approach::
+
+ version: 1
+ formatters:
+ simple:
+ format: '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ handlers:
+ console:
+ class: logging.StreamHandler
+ level: DEBUG
+ formatter: simple
+ stream: ext://sys.stdout
+ loggers:
+ simpleExample:
+ level: DEBUG
+ handlers: [console]
+ propagate: no
+ root:
+ level: DEBUG
+ handlers: [console]
+
+For more information about logging using a dictionary, see
+:ref:`logging-config-api`.
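+
+The same configuration can also be written directly as a Python dictionary
+and passed to :func:`~logging.config.dictConfig`. The sketch below mirrors the
+YAML above; the ``disable_existing_loggers`` entry is an addition made here so
+that loggers created earlier stay enabled.
+
```python
import logging
import logging.config

config = {
    'version': 1,
    # Assumption for this sketch: leave previously created loggers enabled.
    'disable_existing_loggers': False,
    'formatters': {
        'simple': {
            'format': '%(asctime)s - %(name)s - %(levelname)s - %(message)s',
        },
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'level': 'DEBUG',
            'formatter': 'simple',
            'stream': 'ext://sys.stdout',
        },
    },
    'loggers': {
        'simpleExample': {
            'level': 'DEBUG',
            'handlers': ['console'],
            'propagate': False,
        },
    },
    'root': {
        'level': 'DEBUG',
        'handlers': ['console'],
    },
}

logging.config.dictConfig(config)

logger = logging.getLogger('simpleExample')
logger.debug('debug message')
```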
+
+What happens if no configuration is provided
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If no logging configuration is provided, it is possible to have a situation
+where a logging event needs to be output, but no handlers can be found to
+output the event. The behaviour of the logging package in these
+circumstances is dependent on the Python version.
+
+For versions of Python prior to 3.2, the behaviour is as follows:
+
+* If *logging.raiseExceptions* is *False* (production mode), the event is
+ silently dropped.
+
+* If *logging.raiseExceptions* is *True* (development mode), a message
+ 'No handlers could be found for logger X.Y.Z' is printed once.
+
+In Python 3.2 and later, the behaviour is as follows:
+
+* The event is output using a 'handler of last resort', stored in
+ ``logging.lastResort``. This internal handler is not associated with any
+ logger, and acts like a :class:`~logging.StreamHandler` which writes the
+ event description message to the current value of ``sys.stderr`` (therefore
+ respecting any redirections which may be in effect). No formatting is
+ done on the message - just the bare event description message is printed.
+ The handler's level is set to ``WARNING``, so all events at this and
+ greater severities will be output.
+
+To obtain the pre-3.2 behaviour, ``logging.lastResort`` can be set to *None*.
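+
+The handler of last resort can be inspected directly, as in this short
+sketch (Python 3.2 or later is assumed):
+
```python
import logging

last = logging.lastResort

# It behaves like a StreamHandler and only handles WARNING and above.
is_stream_handler = isinstance(last, logging.StreamHandler)
level_name = logging.getLevelName(last.level)
print(is_stream_handler, level_name)

# Setting logging.lastResort = None would restore the pre-3.2 behaviour;
# that assignment is deliberately not executed here.
```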
+
+.. _library-config:
+
+Configuring Logging for a Library
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When developing a library which uses logging, you should take care to
+document how the library uses logging - for example, the names of loggers
+used. Some consideration also needs to be given to its logging configuration.
+If the using application does not use logging, and library code makes logging
+calls, then (as described in the previous section) events of severity
+``WARNING`` and greater will be printed to ``sys.stderr``. This is regarded as
+the best default behaviour.
+
+If for some reason you *don't* want these messages printed in the absence of
+any logging configuration, you can attach a do-nothing handler to the top-level
+logger for your library. This avoids the message being printed, since a handler
+will always be found for the library's events: it just doesn't produce any
+output. If the library user configures logging for application use, presumably
+that configuration will add some handlers, and if levels are suitably
+configured then logging calls made in library code will send output to those
+handlers, as normal.
+
+A do-nothing handler is included in the logging package:
+:class:`~logging.NullHandler` (since Python 3.1). An instance of this handler
+could be added to the top-level logger of the logging namespace used by the
+library (*if* you want to prevent your library's logged events being output to
+``sys.stderr`` in the absence of logging configuration). If all logging by a
+library *foo* is done using loggers with names matching 'foo.x', 'foo.x.y',
+etc. then the code::
+
+ import logging
+ logging.getLogger('foo').addHandler(logging.NullHandler())
+
+should have the desired effect. If an organisation produces a number of
+libraries, then the logger name specified can be 'orgname.foo' rather than
+just 'foo'.
+
+.. note:: It is strongly advised that you *do not add any handlers other
+ than* :class:`~logging.NullHandler` *to your library's loggers*. This is
+ because the configuration of handlers is the prerogative of the application
+ developer who uses your library. The application developer knows their
+ target audience and what handlers are most appropriate for their
+ application: if you add handlers 'under the hood', you might well interfere
+ with their ability to carry out unit tests and deliver logs which suit their
+ requirements.
+
+
+Logging Levels
+--------------
+
+The numeric values of logging levels are given in the following table. These are
+primarily of interest if you want to define your own levels, and need them to
+have specific values relative to the predefined levels. If you define a level
+with the same numeric value, it overwrites the predefined value; the predefined
+name is lost.
+
++--------------+---------------+
+| Level | Numeric value |
++==============+===============+
+| ``CRITICAL`` | 50 |
++--------------+---------------+
+| ``ERROR`` | 40 |
++--------------+---------------+
+| ``WARNING`` | 30 |
++--------------+---------------+
+| ``INFO`` | 20 |
++--------------+---------------+
+| ``DEBUG`` | 10 |
++--------------+---------------+
+| ``NOTSET`` | 0 |
++--------------+---------------+
+
+Levels can also be associated with loggers, being set either by the developer or
+through loading a saved logging configuration. When a logging method is called
+on a logger, the logger compares its own level with the level associated with
+the method call. If the logger's level is higher than the method call's, no
+logging message is actually generated. This is the basic mechanism controlling
+the verbosity of logging output.
+
+Logging messages are encoded as instances of the :class:`~logging.LogRecord`
+class. When a logger decides to actually log an event, a
+:class:`~logging.LogRecord` instance is created from the logging message.
+
+Logging messages are subjected to a dispatch mechanism through the use of
+:dfn:`handlers`, which are instances of subclasses of the :class:`Handler`
+class. Handlers are responsible for ensuring that a logged message (in the form
+of a :class:`LogRecord`) ends up in a particular location (or set of locations)
+which is useful for the target audience for that message (such as end users,
+support desk staff, system administrators, developers). Handlers are passed
+:class:`LogRecord` instances intended for particular destinations. Each logger
+can have zero, one or more handlers associated with it (via the
+:meth:`~Logger.addHandler` method of :class:`Logger`). In addition to any
+handlers directly associated with a logger, *all handlers associated with all
+ancestors of the logger* are called to dispatch the message (unless the
+*propagate* flag for a logger is set to a false value, at which point the
+passing to ancestor handlers stops).
+
+Just as for loggers, handlers can have levels associated with them. A handler's
+level acts as a filter in the same way as a logger's level does. If a handler
+decides to actually dispatch an event, the :meth:`~Handler.emit` method is used
+to send the message to its destination. Most user-defined subclasses of
+:class:`Handler` will need to override this :meth:`~Handler.emit`.
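+
+A minimal sketch of a user-defined handler is shown below. ``ListHandler`` and
+the logger name are hypothetical; the point is only that :meth:`~Handler.emit`
+is the method a subclass overrides, and that the handler's level is checked
+before :meth:`~Handler.emit` is called.
+
```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps formatted messages in a list."""

    def __init__(self, level=logging.NOTSET):
        super().__init__(level)
        self.messages = []

    def emit(self, record):
        try:
            self.messages.append(self.format(record))
        except Exception:
            # Delegate to the standard error handling for handlers.
            self.handleError(record)


handler = ListHandler(level=logging.WARNING)
logger = logging.getLogger('howto.custom_handler')   # hypothetical name
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(handler)

logger.info('below the handler level, so emit() is not called')
logger.warning('kept')

print(handler.messages)
```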
+
+.. _custom-levels:
+
+Custom Levels
+^^^^^^^^^^^^^
+
+Defining your own levels is possible, but should not be necessary, as the
+existing levels have been chosen on the basis of practical experience.
+However, if you are convinced that you need custom levels, great care should
+be exercised when doing this, and it is possibly *a very bad idea to define
+custom levels if you are developing a library*. That's because if multiple
+library authors all define their own custom levels, there is a chance that
+the logging output from such multiple libraries used together will be
+difficult for the using developer to control and/or interpret, because a
+given numeric value might mean different things for different libraries.
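+
+If, despite the caution above, an application does define a level of its own,
+a sketch might look like this. The ``NOTICE`` name and the value 25 are
+invented for the example; :func:`addLevelName` and :meth:`Logger.log` are the
+relevant calls.
+
```python
import io
import logging

NOTICE = 25   # hypothetical level between INFO (20) and WARNING (30)
logging.addLevelName(NOTICE, 'NOTICE')

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s:%(message)s'))

logger = logging.getLogger('howto.custom_level')   # hypothetical name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

# There is no convenience method for a custom level, so use Logger.log().
logger.log(NOTICE, 'cache rebuilt')

logged = stream.getvalue()
print(logged)
```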
+
+.. _useful-handlers:
+
+Useful Handlers
+---------------
+
+In addition to the base :class:`Handler` class, many useful subclasses are
+provided:
+
+#. :class:`StreamHandler` instances send messages to streams (file-like
+ objects).
+
+#. :class:`FileHandler` instances send messages to disk files.
+
+#. :class:`~handlers.BaseRotatingHandler` is the base class for handlers that
+ rotate log files at a certain point. It is not meant to be instantiated
+ directly. Instead, use :class:`~handlers.RotatingFileHandler` or
+ :class:`~handlers.TimedRotatingFileHandler`.
+
+#. :class:`~handlers.RotatingFileHandler` instances send messages to disk
+ files, with support for maximum log file sizes and log file rotation.
+
+#. :class:`~handlers.TimedRotatingFileHandler` instances send messages to
+ disk files, rotating the log file at certain timed intervals.
+
+#. :class:`~handlers.SocketHandler` instances send messages to TCP/IP
+ sockets.
+
+#. :class:`~handlers.DatagramHandler` instances send messages to UDP
+ sockets.
+
+#. :class:`~handlers.SMTPHandler` instances send messages to a designated
+ email address.
+
+#. :class:`~handlers.SysLogHandler` instances send messages to a Unix
+ syslog daemon, possibly on a remote machine.
+
+#. :class:`~handlers.NTEventLogHandler` instances send messages to a
+ Windows NT/2000/XP event log.
+
+#. :class:`~handlers.MemoryHandler` instances send messages to a buffer
+ in memory, which is flushed whenever specific criteria are met.
+
+#. :class:`~handlers.HTTPHandler` instances send messages to an HTTP
+ server using either ``GET`` or ``POST`` semantics.
+
+#. :class:`~handlers.WatchedFileHandler` instances watch the file they are
+ logging to. If the file changes, it is closed and reopened using the file
+ name. This handler is only useful on Unix-like systems; Windows does not
+ support the underlying mechanism used.
+
+#. :class:`~handlers.QueueHandler` instances send messages to a queue, such as
+ those implemented in the :mod:`queue` or :mod:`multiprocessing` modules.
+
+#. :class:`NullHandler` instances do nothing with error messages. They are used
+ by library developers who want to use logging, but want to avoid the 'No
+ handlers could be found for logger XXX' message which can be displayed if
+ the library user has not configured logging. See :ref:`library-config` for
+ more information.
+
+.. versionadded:: 3.1
+ The :class:`NullHandler` class.
+
+.. versionadded:: 3.2
+ The :class:`~handlers.QueueHandler` class.
+
+The :class:`NullHandler`, :class:`StreamHandler` and :class:`FileHandler`
+classes are defined in the core logging package. The other handlers are
+defined in a sub-module, :mod:`logging.handlers`. (There is also another
+sub-module, :mod:`logging.config`, for configuration functionality.)
+
+Logged messages are formatted for presentation through instances of the
+:class:`Formatter` class. They are initialized with a format string suitable for
+use with the % operator and a dictionary.
+
+For formatting multiple messages in a batch, instances of
+:class:`BufferingFormatter` can be used. In addition to the format string (which
+is applied to each message in the batch), there is provision for header and
+trailer format strings.
+
+When filtering based on logger level and/or handler level is not enough,
+instances of :class:`Filter` can be added to both :class:`Logger` and
+:class:`Handler` instances (through their :meth:`addFilter` method). Before
+deciding to process a message further, both loggers and handlers consult all
+their filters for permission. If any filter returns a false value, the message
+is not processed further.
+
+The basic :class:`Filter` functionality allows filtering by specific logger
+name. If this feature is used, messages sent to the named logger and its
+children are allowed through the filter, and all others dropped.
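+
+Filtering by name can be tried out without any loggers at all, as in this
+sketch. The records are built by hand and the names ``a.b``, ``a.b.c`` and so
+on are placeholders chosen for the demonstration.
+
```python
import logging

name_filter = logging.Filter('a.b')


def make_record(name):
    """Build a throwaway record for the given logger name."""
    return logging.LogRecord(
        name=name, level=logging.INFO, pathname='demo.py', lineno=1,
        msg='event', args=(), exc_info=None)


# 'a.b' and its children pass; a sibling such as 'a.bc' and the parent 'a'
# do not.
allowed = {
    name: bool(name_filter.filter(make_record(name)))
    for name in ('a.b', 'a.b.c', 'a.bc', 'a')
}
print(allowed)
```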
+
+
+.. _logging-exceptions:
+
+Exceptions raised during logging
+--------------------------------
+
+The logging package is designed to swallow exceptions which occur while logging
+in production. This is so that errors which occur while handling logging events
+- such as logging misconfiguration, network or other similar errors - do not
+cause the application using logging to terminate prematurely.
+
+:exc:`SystemExit` and :exc:`KeyboardInterrupt` exceptions are never
+swallowed. Other exceptions which occur during the :meth:`emit` method of a
+:class:`Handler` subclass are passed to its :meth:`handleError` method.
+
+The default implementation of :meth:`handleError` in :class:`Handler` checks
+to see if a module-level variable, :data:`raiseExceptions`, is set. If set, a
+traceback is printed to :data:`sys.stderr`. If not set, the exception is swallowed.
+
+.. note:: The default value of :data:`raiseExceptions` is ``True``. This is
+ because during development, you typically want to be notified of any
+ exceptions that occur. It's advised that you set :data:`raiseExceptions` to
+ ``False`` for production usage.
+
+.. currentmodule:: logging
+
+.. _arbitrary-object-messages:
+
+Using arbitrary objects as messages
+-----------------------------------
+
+In the preceding sections and examples, it has been assumed that the message
+passed when logging the event is a string. However, this is not the only
+possibility. You can pass an arbitrary object as a message, and its
+:meth:`__str__` method will be called when the logging system needs to convert
+it to a string representation. In fact, if you want to, you can avoid
+computing a string representation altogether - for example, the
+:class:`~handlers.SocketHandler` emits an event by pickling it and sending it over the
+wire.
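+
+As a sketch of this idea, the hypothetical ``Transfer`` class below is passed
+as the message; its :meth:`__str__` method is only called when the event is
+actually formatted for output. The class, the logger name and the in-memory
+stream are all assumptions made for the example.
+
```python
import io
import logging


class Transfer:
    """Hypothetical message object describing a file copy."""

    def __init__(self, source, target, size):
        self.source = source
        self.target = target
        self.size = size

    def __str__(self):
        return 'copied %d bytes from %s to %s' % (
            self.size, self.source, self.target)


stream = io.StringIO()
logger = logging.getLogger('howto.objects')   # hypothetical name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))

# The object itself is the message; no string is built at the call site.
logger.info(Transfer('a.txt', 'b.txt', 1024))

# This event is below the logger's level, so str() is never called for it.
logger.debug(Transfer('c.txt', 'd.txt', 2048))

text = stream.getvalue()
print(text)
```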
+
+
+Optimization
+------------
+
+Formatting of message arguments is deferred until it cannot be avoided.
+However, computing the arguments passed to the logging method can also be
+expensive, and you may want to avoid doing it if the logger will just throw
+away your event. To decide what to do, you can call the :meth:`isEnabledFor`
+method which takes a level argument and returns true if the event would be
+created by the Logger for that level of call. You can write code like this::
+
+ if logger.isEnabledFor(logging.DEBUG):
+ logger.debug('Message with %s, %s', expensive_func1(),
+ expensive_func2())
+
+so that if the logger's threshold is set above ``DEBUG``, the calls to
+:func:`expensive_func1` and :func:`expensive_func2` are never made.
+
+There are other optimizations which can be made for specific applications which
+need more precise control over what logging information is collected. Here's a
+list of things you can do to avoid processing during logging which you don't
+need:
+
++-----------------------------------------------+----------------------------------------+
+| What you don't want to collect | How to avoid collecting it |
++===============================================+========================================+
+| Information about where calls were made from. | Set ``logging._srcfile`` to ``None``. |
++-----------------------------------------------+----------------------------------------+
+| Threading information. | Set ``logging.logThreads`` to ``0``. |
++-----------------------------------------------+----------------------------------------+
+| Process information. | Set ``logging.logProcesses`` to ``0``. |
++-----------------------------------------------+----------------------------------------+
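+
+As a rough check of the last two rows, the sketch below switches both flags off
+and builds a record by hand. The logger name and file name are placeholders,
+and the ``None`` values reflect how current Python 3 releases behave rather
+than a documented guarantee:

```python
import logging

# Switch off collection of thread and process information.
logging.logThreads = 0
logging.logProcesses = 0

# Build a record directly to see what was (not) collected.
# 'howto.opt' and 'example.py' are placeholder values.
record = logging.LogRecord(
    'howto.opt', logging.INFO, 'example.py', 1, 'message', None, None)

thread_info = (record.thread, record.threadName)
process_info = record.process
```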
+
+Also note that the core logging module only includes the basic handlers. If
+you don't import :mod:`logging.handlers` and :mod:`logging.config`, they won't
+take up any memory.
+
+.. seealso::
+
+ Module :mod:`logging`
+ API reference for the logging module.
+
+ Module :mod:`logging.config`
+ Configuration API for the logging module.
+
+ Module :mod:`logging.handlers`
+ Useful handlers included with the logging module.
+
+ :ref:`A logging cookbook <logging-cookbook>`
+
diff --git a/Doc/howto/logging_flow.png b/Doc/howto/logging_flow.png
new file mode 100755
index 0000000000..a88382309a
--- /dev/null
+++ b/Doc/howto/logging_flow.png
Binary files differ
diff --git a/Doc/howto/pyporting.rst b/Doc/howto/pyporting.rst
new file mode 100644
index 0000000000..a2e41733e4
--- /dev/null
+++ b/Doc/howto/pyporting.rst
@@ -0,0 +1,715 @@
+.. _pyporting-howto:
+
+*********************************
+Porting Python 2 Code to Python 3
+*********************************
+
+:author: Brett Cannon
+
+.. topic:: Abstract
+
+ With Python 3 being the future of Python while Python 2 is still in active
+ use, it is good to have your project available for both major releases of
+ Python. This guide is meant to help you choose which strategy works best
+ for your project to support both Python 2 & 3 along with how to execute
+ that strategy.
+
+ If you are looking to port an extension module instead of pure Python code,
+ please see :ref:`cporting-howto`.
+
+
+Choosing a Strategy
+===================
+
+When a project makes the decision that it's time to support both Python 2 & 3,
+a decision needs to be made as to how to go about accomplishing that goal.
+The chosen strategy will depend on how large the project's existing
+codebase is and how much divergence you want between your Python 2 codebase and
+your Python 3 one (e.g., starting a new version with Python 3).
+
+If your project is brand-new or does not have a large codebase, then you may
+want to consider writing/porting :ref:`all of your code for Python 3
+and use 3to2 <use_3to2>` to port your code for Python 2.
+
+If you would prefer to maintain a codebase which is semantically **and**
+syntactically compatible with Python 2 & 3 simultaneously, you can write
+:ref:`use_same_source`. While this tends to lead to somewhat non-idiomatic
+code, it does mean you keep a rapid development process for you, the developer.
+
+Finally, you do have the option of :ref:`using 2to3 <use_2to3>` to translate
+Python 2 code into Python 3 code (with some manual help). This can take the
+form of branching your code and using 2to3 to start a Python 3 branch. You can
+also have users perform the translation at installation time automatically so
+that you only have to maintain a Python 2 codebase.
+
+Regardless of which approach you choose, porting is not as hard or
+time-consuming as you might initially think. You can also tackle the problem
+piecemeal as a good portion of porting is simply updating your code to follow
+current best practices in a Python 2/3 compatible way.
+
+
+Universal Bits of Advice
+------------------------
+
+Regardless of what strategy you pick, there are a few things you should
+consider.
+
+One is to make sure you have a robust test suite. You need to make sure everything
+continues to work, just like when you support a new minor version of Python.
+This means making sure your test suite is thorough and is ported properly
+between Python 2 & 3. You will also most likely want to use something like tox_
+to automate testing between both a Python 2 and Python 3 VM.
+
+Two, once your project has Python 3 support, make sure to add the proper
+classifier on the Cheeseshop_ (PyPI_). To have your project listed as Python 3
+compatible it must have the
+`Python 3 classifier <http://pypi.python.org/pypi?:action=browse&c=533>`_
+(from
+http://techspot.zzzeek.org/2011/01/24/zzzeek-s-guide-to-python-3-porting/)::
+
+ setup(
+ name='Your Library',
+ version='1.0',
+ classifiers=[
+ # make sure to use :: Python *and* :: Python :: 3 so
+ # that pypi can list the package on the python 3 page
+ 'Programming Language :: Python',
+ 'Programming Language :: Python :: 3'
+ ],
+ packages=['yourlibrary'],
+ # make sure to add custom_fixers to the MANIFEST.in
+ include_package_data=True,
+ # ...
+ )
+
+
+Doing so will cause your project to show up in the
+`Python 3 packages list
+<http://pypi.python.org/pypi?:action=browse&c=533&show=all>`_. You will know
+you set the classifier properly as visiting your project page on the Cheeseshop
+will show a Python 3 logo in the upper-left corner of the page.
+
+Three, the six_ project provides a library which helps iron out differences
+between Python 2 & 3. If you find there is a sticky point that is a continual
+point of contention in your translation or maintenance of code, consider using
+a source-compatible solution relying on six. If you have to create your own
+Python 2/3 compatible solution, you can use ``sys.version_info[0] >= 3`` as a
+guard.
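+
+A home-grown guard might look like the sketch below. The names ``PY3``,
+``text_type``, ``binary_type`` and ``to_text`` are chosen for illustration and
+are not part of the standard library:

```python
import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    text_type = str
    binary_type = bytes
else:
    # This branch only runs on Python 2, where ``unicode`` exists.
    text_type = unicode
    binary_type = str


def to_text(value, encoding='utf-8'):
    """Return value as text, decoding it first if it is bytes."""
    if isinstance(value, binary_type):
        return value.decode(encoding)
    return value
```

+Keeping such a guard in a single compatibility module means the rest of the
+code never has to inspect ``sys.version_info`` itself.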
+
+Four, read all the approaches. Just because some bit of advice applies to one
+approach more than another doesn't mean that some advice doesn't apply to other
+strategies.
+
+Five, drop support for older Python versions if possible. `Python 2.5`_
+introduced a lot of useful syntax and libraries which have become idiomatic
+in Python 3. `Python 2.6`_ introduced future statements which make
+compatibility much easier if you are going from Python 2 to 3.
+`Python 2.7`_ continues the trend in the stdlib. So choose the newest version
+of Python which you believe can be your minimum support version
+and work from there.
+
+
+.. _tox: http://codespeak.net/tox/
+.. _Cheeseshop:
+.. _PyPI: http://pypi.python.org/
+.. _six: http://packages.python.org/six
+.. _Python 2.7: http://www.python.org/2.7.x
+.. _Python 2.6: http://www.python.org/2.6.x
+.. _Python 2.5: http://www.python.org/2.5.x
+.. _Python 2.4: http://www.python.org/2.4.x
+.. _Python 2.3: http://www.python.org/2.3.x
+.. _Python 2.2: http://www.python.org/2.2.x
+
+
+.. _use_3to2:
+
+Python 3 and 3to2
+=================
+
+If you are starting a new project or your codebase is small enough, you may
+want to consider writing your code for Python 3 and backporting to Python 2
+using 3to2_. Thanks to Python 3 being more strict about things than Python 2
+(e.g., bytes vs. strings), the source translation can be easier and more
+straightforward than from Python 2 to 3. Plus it gives you more direct
+experience developing in Python 3 which, since it is the future of Python, is a
+good thing long-term.
+
+A drawback of this approach is that 3to2 is a third-party project. This means
+that the Python core developers (and thus this guide) can make no promises
+about how well 3to2 works at any time. There is nothing to suggest, though,
+that 3to2 is not a high-quality project.
+
+
+.. _3to2: https://bitbucket.org/amentajo/lib3to2/overview
+
+
+.. _use_2to3:
+
+Python 2 and 2to3
+=================
+
+Included with Python since 2.6, the 2to3_ tool (and :mod:`lib2to3` module)
+helps with porting Python 2 to Python 3 by performing various source
+translations. This is a perfect solution for projects which wish to branch
+their Python 3 code from their Python 2 codebase and maintain them as
+independent codebases. You can even begin preparing to use this approach
+today by writing future-compatible Python code which works cleanly in
+Python 2 in conjunction with 2to3; all steps outlined below will work
+with Python 2 code up to the point when the actual use of 2to3 occurs.
+
+Use of 2to3 as an on-demand translation step at install time is also possible,
+preventing the need to maintain a separate Python 3 codebase, but this approach
+does come with some drawbacks. While users will only have to pay the
+translation cost once at installation, you as a developer will need to pay the
+cost regularly during development. If your codebase is sufficiently large
+then the translation step ends up acting like a compilation step,
+robbing you of the rapid development process you are used to with Python.
+Obviously the time required to translate a project will vary, so do an
+experimental translation just to see how long it takes to evaluate whether you
+prefer this approach compared to using :ref:`use_same_source` or simply keeping
+a separate Python 3 codebase.
+
+Below are the typical steps taken by a project which uses a 2to3-based approach
+to supporting Python 2 & 3.
+
+
+Support Python 2.7
+------------------
+
+As a first step, make sure that your project is compatible with `Python 2.7`_.
+This is just good to do as Python 2.7 is the last release of Python 2 and thus
+will be used for a rather long time. It also allows for use of the ``-3`` flag
+to Python to help discover places in your code which 2to3 cannot handle but are
+known to cause issues.
+
+Try to Support `Python 2.6`_ and Newer Only
+-------------------------------------------
+
+While not possible for all projects, if you can support `Python 2.6`_ and newer
+**only**, your life will be much easier. Various future statements, stdlib
+additions, etc. exist only in Python 2.6 and later which greatly assist in
+porting to Python 3. But if your project must keep support for `Python 2.5`_ (or
+even `Python 2.4`_) then it is still possible to port to Python 3.
+
+Below are the benefits you gain if you only have to support Python 2.6 and
+newer. Some of these options are personal choice while others are
+**strongly** recommended (the ones that are more for personal choice are
+labeled as such). If you continue to support older versions of Python then you
+at least need to watch out for situations that these solutions fix.
+
+
+``from __future__ import print_function``
+'''''''''''''''''''''''''''''''''''''''''
+
+This is a personal choice. 2to3 handles the translation from the print
+statement to the print function rather well so this is an optional step. This
+future statement does help, though, with getting used to typing
+``print('Hello, World')`` instead of ``print 'Hello, World'``.
+
+
+``from __future__ import unicode_literals``
+'''''''''''''''''''''''''''''''''''''''''''
+
+Another personal choice. You can always mark what you want to be a (unicode)
+string with a ``u`` prefix to get the same effect. But regardless of whether
+you use this future statement or not, you **must** make sure you know exactly
+which Python 2 strings you want to be bytes, and which are to be strings. This
+means you should, **at minimum**, mark all strings that are meant to be text
+strings with a ``u`` prefix if you do not use this future statement.
+
+
+Bytes literals
+''''''''''''''
+
+This is a **very** important one. The ability to prefix Python 2 strings that
+are meant to contain bytes with a ``b`` prefix helps to very clearly delineate
+what is and is not a Python 3 string. When you run 2to3 on code, all Python 2
+strings become Python 3 strings **unless** they are prefixed with ``b``.
+
+There are some differences between byte literals in Python 2 and those in
+Python 3 thanks to the bytes type just being an alias to ``str`` in Python 2.
+Probably the biggest "gotcha" is that indexing results in different values. In
+Python 2, the value of ``b'py'[1]`` is ``'y'``, while in Python 3 it's ``121``.
+You can avoid this disparity by always slicing at the size of a single element:
+``b'py'[1:2]`` is ``'y'`` in Python 2 and ``b'y'`` in Python 3 (i.e., close
+enough).
+
+You cannot concatenate bytes and strings in Python 3. But since Python
+2 has bytes aliased to ``str``, it will succeed: ``b'a' + u'b'`` works in
+Python 2, but ``b'a' + 'b'`` in Python 3 is a :exc:`TypeError`. A similar issue
+also comes about when doing comparisons between bytes and strings.
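+
+One way to stay clear of both problems is to convert explicitly before mixing
+values, and to slice rather than index. The following is a sketch with
+illustrative variable names; it assumes the data is UTF-8 encoded:

```python
raw = b'caf\xc3\xa9'          # UTF-8 encoded bytes
text = raw.decode('utf-8')    # text: unicode on Python 2, str on Python 3

# Decode (or encode) explicitly so that both operands have the same type.
combined_text = text + b' noir'.decode('ascii')
combined_bytes = raw + b' noir'

# Slicing yields a length-1 bytes object on both Python 2 and Python 3.
first_byte = raw[0:1]
```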
+
+
+Supporting `Python 2.5`_ and Newer Only
+---------------------------------------
+
+If you are supporting `Python 2.5`_ and newer there are still some features of
+Python that you can utilize.
+
+
+``from __future__ import absolute_import``
+''''''''''''''''''''''''''''''''''''''''''
+
+Implicit relative imports (e.g., importing ``spam.bacon`` from within
+``spam.eggs`` with the statement ``import bacon``) do not work in Python 3.
+This future statement moves away from that and allows the use of explicit
+relative imports (e.g., ``from . import bacon``).
+
+In `Python 2.5`_ you must use
+the __future__ statement to be able to use explicit relative imports and prevent
+implicit ones. In `Python 2.6`_ explicit relative imports are available without
+the statement, but you still want the __future__ statement to prevent implicit
+relative imports. In `Python 2.7`_ implicit relative imports remain the default,
+so the __future__ statement is still needed to turn them off. In other words,
+unless you are supporting a version earlier than Python 2.5, use the __future__
+statement.
+
+
+
+Handle Common "Gotchas"
+-----------------------
+
+There are a few things that just consistently come up as sticking points for
+people which 2to3 cannot handle automatically or can easily be done in Python 2
+to help modernize your code.
+
+
+``from __future__ import division``
+'''''''''''''''''''''''''''''''''''
+
+While the exact same outcome can be had by using the ``-Qnew`` argument to
+Python, using this future statement lifts the requirement that your users use
+the flag to get the expected behavior of division in Python 3
+(e.g., ``1/2 == 0.5; 1//2 == 0``).
+
+
+
+Specify when opening a file as binary
+'''''''''''''''''''''''''''''''''''''
+
+Unless you have been working on Windows, there is a chance you have not always
+bothered to add the ``b`` mode when opening a binary file (e.g., ``rb`` for
+binary reading). Under Python 3, binary files and text files are clearly
+distinct and mutually incompatible; see the :mod:`io` module for details.
+Therefore, you **must** make a decision of whether a file will be used for
+binary access (allowing you to read and/or write bytes data) or text access
+(allowing you to read and/or write unicode data).
+
+Text files
+''''''''''
+
+Text files created using ``open()`` under Python 2 return byte strings,
+while under Python 3 they return unicode strings. Depending on your porting
+strategy, this can be an issue.
+
+If you want text files to return unicode strings in Python 2, you have two
+possibilities:
+
+* Under Python 2.6 and higher, use :func:`io.open`. Since :func:`io.open`
+ is essentially the same function in both Python 2 and Python 3, it will
+ help iron out any issues that might arise.
+
+* If pre-2.6 compatibility is needed, then you should use :func:`codecs.open`
+ instead. This will make sure that you get back unicode strings in Python 2.
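+
+A short sketch of the first option, using a temporary file so that it is
+self-contained. The UTF-8 encoding is an assumption made for the example;
+pick whatever encoding your data actually uses:

```python
import io
import os
import tempfile

# A throwaway file, created only for this illustration.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

try:
    # Text mode with an explicit encoding: text is written and read back.
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(b'caf\xc3\xa9'.decode('utf-8'))

    with io.open(path, 'r', encoding='utf-8') as f:
        text = f.read()

    # Binary mode: the same file yields the raw encoded bytes.
    with io.open(path, 'rb') as f:
        raw = f.read()
finally:
    os.remove(path)
```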
+
+Subclass ``object``
+'''''''''''''''''''
+
+New-style classes have been around since `Python 2.2`_. You need to make sure
+you are subclassing from ``object`` to avoid odd edge cases involving method
+resolution order, etc. This continues to be totally valid in Python 3 (although
+unneeded as all classes implicitly inherit from ``object``).
+
+
+Deal With the Bytes/String Dichotomy
+''''''''''''''''''''''''''''''''''''
+
+One of the biggest issues people have when porting code to Python 3 is handling
+the bytes/string dichotomy. Because Python 2 allowed the ``str`` type to hold
+textual data, people have over the years been rather loose in their delineation
+of what ``str`` instances held text compared to bytes. In Python 3 you cannot
+be so care-free anymore and need to properly handle the difference. The key to
+handling this issue is to make sure that **every** string literal in your
+Python 2 code is either syntactically or functionally marked as either bytes or
+text data. After this is done you then need to make sure your APIs are designed
+to either handle a specific type or made to be properly polymorphic.
+
+
+Mark Up Python 2 String Literals
+********************************
+
+The first thing you must do is designate every single string literal in Python 2
+as either textual or bytes data. If you are only supporting Python 2.6 or
+newer, this can be accomplished by marking bytes literals with a ``b`` prefix
+and then designating textual data with a ``u`` prefix or using the
+``unicode_literals`` future statement.
+
+If your project supports versions of Python predating 2.6, then you should use
+the six_ project and its ``b()`` function to denote bytes literals. For text
+literals you can either use six's ``u()`` function or use a ``u`` prefix.
+
+
+Decide what APIs Will Accept
+****************************
+
+In Python 2 it was very easy to accidentally create an API that accepted both
+bytes and textual data. But in Python 3, thanks to the more strict handling of
+disparate types, this loose usage of bytes and text together tends to fail.
+
+Take the dict ``{b'a': 'bytes', u'a': 'text'}`` in Python 2.6. It creates the
+dict ``{'a': 'text'}`` since ``b'a' == u'a'``: the two keys collide, the first
+key is kept and its value is overwritten. But in Python 3 the equivalent
+dict creates ``{b'a': 'bytes', 'a': 'text'}``, i.e., no lost data. Similar
+issues can crop up when transitioning Python 2 code to Python 3.
+
+This means you need to choose what an API is going to accept and create and
+consistently stick to that API in both Python 2 and 3.
+
+
+Bytes / Unicode Comparison
+**************************
+
+In Python 3, mixing bytes and unicode is forbidden in most situations; it
+will raise a :class:`TypeError` where Python 2 would have attempted an implicit
+coercion between types. However, there is one case where it doesn't and
+it can be very misleading::
+
+ >>> b"" == ""
+ False
+
+This is because an equality comparison is required by the language to always
+succeed (and return ``False`` for incompatible types). However, this also
+means that code incorrectly ported to Python 3 can display buggy behaviour
+if such comparisons are silently executed. To detect such situations,
+Python 3 has a ``-b`` flag that will display a warning::
+
+ $ python3 -b
+ >>> b"" == ""
+ __main__:1: BytesWarning: Comparison between bytes and string
+ False
+
+To turn the warning into an exception, use the ``-bb`` flag instead::
+
+ $ python3 -bb
+ >>> b"" == ""
+ Traceback (most recent call last):
+ File "<stdin>", line 1, in <module>
+ BytesWarning: Comparison between bytes and string
+
+
+Indexing bytes objects
+''''''''''''''''''''''
+
+Another potentially surprising change is the indexing behaviour of bytes
+objects in Python 3::
+
+ >>> b"xyz"[0]
+ 120
+
+Indeed, Python 3 bytes objects (as well as :class:`bytearray` objects)
+are sequences of integers. But code converted from Python 2 will often
+assume that indexing a bytestring produces another bytestring, not an
+integer. To reconcile both behaviours, use slicing::
+
+ >>> b"xyz"[0:1]
+ b'x'
+ >>> n = 1
+ >>> b"xyz"[n:n+1]
+ b'y'
+
+The only remaining gotcha is that an out-of-bounds slice returns an empty
+bytes object instead of raising ``IndexError``::
+
+ >>> b"xyz"[3]
+ Traceback (most recent call last):
+ File "<stdin>", line 1, in <module>
+ IndexError: index out of range
+ >>> b"xyz"[3:4]
+ b''
+
+
+``__str__()``/``__unicode__()``
+'''''''''''''''''''''''''''''''
+
+In Python 2, objects can specify both a string and unicode representation of
+themselves. In Python 3, though, there is only a string representation. This
+becomes an issue as people can inadvertently do things in their ``__str__()``
+methods which have unpredictable results (e.g., infinite recursion if you
+happen to use the ``unicode(self).encode('utf8')`` idiom as the body of your
+``__str__()`` method).
+
+There are two ways to solve this issue. One is to use a custom 2to3 fixer. The
+blog post at http://lucumr.pocoo.org/2011/1/22/forwards-compatible-python/
+specifies how to do this. That will allow 2to3 to change all instances of ``def
+__unicode__(self): ...`` to ``def __str__(self): ...``. This does require that you
+define your ``__str__()`` method in Python 2 before your ``__unicode__()``
+method.
+
+The other option is to use a mixin class. This allows you to only define a
+``__unicode__()`` method for your class and let the mixin derive
+``__str__()`` for you (code from
+http://lucumr.pocoo.org/2011/1/22/forwards-compatible-python/)::
+
+ import sys
+
+ class UnicodeMixin(object):
+
+ """Mixin class to handle defining the proper __str__/__unicode__
+ methods in Python 2 or 3."""
+
+ if sys.version_info[0] >= 3: # Python 3
+ def __str__(self):
+ return self.__unicode__()
+ else: # Python 2
+ def __str__(self):
+ return self.__unicode__().encode('utf8')
+
+
+ class Spam(UnicodeMixin):
+
+ def __unicode__(self):
+ return u'spam-spam-bacon-spam' # 2to3 will remove the 'u' prefix
+
+
+Don't Index on Exceptions
+'''''''''''''''''''''''''
+
+In Python 2, the following worked::
+
+ >>> exc = Exception(1, 2, 3)
+ >>> exc.args[1]
+ 2
+ >>> exc[1] # Python 2 only!
+ 2
+
+But in Python 3, indexing directly on an exception is an error. You need to
+make sure to only index on the :attr:`BaseException.args` attribute which is a
+sequence containing all arguments passed to the :meth:`__init__` method.
+
+Even better is to use the documented attributes the exception provides.
+
+Don't use ``__getslice__`` & Friends
+''''''''''''''''''''''''''''''''''''
+
+These have been deprecated for a while, but Python 3 finally drops support for
+``__getslice__()``, etc. Move completely over to :meth:`__getitem__` and
+friends.
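+
+As a sketch of what that looks like, the hypothetical ``Deck`` class below
+handles both indexing and slicing in :meth:`__getitem__` by checking for a
+``slice`` object:

```python
class Deck(object):
    """Hypothetical container that supports indexing and slicing."""

    def __init__(self, cards):
        self._cards = list(cards)

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, index):
        # A slice arrives here as a slice object instead of going
        # through __getslice__().
        if isinstance(index, slice):
            return Deck(self._cards[index])
        return self._cards[index]


deck = Deck(['A', 'K', 'Q', 'J'])
first = deck[0]
top_two = deck[:2]
```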
+
+
+Updating doctests
+'''''''''''''''''
+
+2to3_ will attempt to generate fixes for doctests that it comes across. It's
+not perfect, though. If you wrote a monolithic set of doctests (e.g., a single
+docstring containing all of your doctests), you should at least consider
+breaking the doctests up into smaller pieces to make it more manageable to fix.
+Otherwise it might very well be worth your time and effort to port your tests
+to :mod:`unittest`.
+
+
+Update ``map`` for imbalanced input sequences
+'''''''''''''''''''''''''''''''''''''''''''''
+
+With Python 2, ``map`` would pad input sequences of unequal length with
+``None`` values, returning a sequence as long as the longest input sequence.
+
+With Python 3, if the input sequences to ``map`` are of unequal length, ``map``
+will stop at the termination of the shortest of the sequences. For full
+compatibility with ``map`` from Python 2.x, pad the sequences with
+:func:`itertools.zip_longest` and unpack each resulting tuple with
+:func:`itertools.starmap`, e.g. ``map(func, *sequences)`` becomes
+``list(itertools.starmap(func, itertools.zip_longest(*sequences)))``.
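+
+The difference can be seen in the Python 3 sketch below. The ``add`` function
+is a stand-in for your own callable, and :func:`itertools.starmap` is used so
+that each padded tuple is unpacked into separate arguments, as Python 2's
+``map`` did:

```python
import itertools


def add(a, b):
    """Stand-in callable that treats a missing (None) operand as zero."""
    return (a or 0) + (b or 0)


short = [1, 2, 3]
longer = [10, 20, 30, 40, 50]

# Python 3 map() stops at the end of the shortest input.
truncated = list(map(add, short, longer))

# zip_longest() pads the shorter input with None; starmap() then
# calls add(a, b) once per padded pair.
padded = list(itertools.starmap(add, itertools.zip_longest(short, longer)))
```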
+
+Eliminate ``-3`` Warnings
+-------------------------
+
+When you run your application's test suite, run it using the ``-3`` flag passed
+to Python. This will cause various warnings to be raised during execution about
+things that 2to3 cannot handle automatically (e.g., modules that have been
+removed). Try to eliminate those warnings to make your code even more portable
+to Python 3.
+
+
+Run 2to3
+--------
+
+Once you have made your Python 2 code future-compatible with Python 3, it's
+time to use 2to3_ to actually port your code.
+
+
+Manually
+''''''''
+
+To manually convert source code using 2to3_, you use the ``2to3`` script that
+is installed with Python 2.6 and later::
+
+ 2to3 <directory or file to convert>
+
+This will cause 2to3 to write out a diff with all of the fixers applied for the
+converted source code. If you would like 2to3 to go ahead and apply the changes
+you can pass it the ``-w`` flag::
+
+ 2to3 -w <stuff to convert>
+
+There are other flags available to control exactly which fixers are applied,
+etc.
+
+
+During Installation
+'''''''''''''''''''
+
+When a user installs your project for Python 3, you can have either
+:mod:`distutils` or Distribute_ run 2to3_ on your behalf.
+For distutils, use the following idiom::
+
+ try: # Python 3
+ from distutils.command.build_py import build_py_2to3 as build_py
+ except ImportError: # Python 2
+ from distutils.command.build_py import build_py
+
+ setup(cmdclass = {'build_py': build_py},
+ # ...
+ )
+
+For Distribute::
+
+ setup(use_2to3=True,
+ # ...
+ )
+
+This will allow you to not have to distribute a separate Python 3 version of
+your project. It does require, though, that when you perform development
+you at least build your project and use the built Python 3 source for testing.
+
+
+Verify & Test
+-------------
+
+At this point you should (hopefully) have your project converted in such a way
+that it works in Python 3. Verify it by running your unit tests and making sure
+nothing has gone awry. If you miss something then figure out how to fix it in
+Python 3, backport to your Python 2 code, and run your code through 2to3 again
+to verify the fix transforms properly.
+
+
+.. _2to3: http://docs.python.org/py3k/library/2to3.html
+.. _Distribute: http://packages.python.org/distribute/
+
+
+.. _use_same_source:
+
+Python 2/3 Compatible Source
+============================
+
+While it may seem counter-intuitive, you can write Python code which is
+source-compatible between Python 2 & 3. It does lead to code that is not
+entirely idiomatic Python (e.g., having to extract the currently raised
+exception from ``sys.exc_info()[1]``), but it can be run under Python 2
+**and** Python 3 without using 2to3_ as a translation step (although the tool
+should be used to help find potential portability problems). This allows you to
+continue to have a rapid development process regardless of whether you are
+developing under Python 2 or Python 3. Whether this approach or using
+:ref:`use_2to3` works best for you will be a per-project decision.
+
+To get a complete idea of what issues you will need to deal with, see the
+`What's New in Python 3.0`_. Others have reorganized the data in other formats
+such as http://docs.pythonsprints.com/python3_porting/py-porting.html .
+
+The following are some steps to take to try to support both Python 2 & 3 from
+the same source code.
+
+
+.. _What's New in Python 3.0: http://docs.python.org/release/3.0/whatsnew/3.0.html
+
+
+Follow The Steps for Using 2to3_
+--------------------------------
+
+All of the steps outlined in how to
+:ref:`port Python 2 code with 2to3 <use_2to3>` apply
+to creating a Python 2/3 codebase. This includes trying to only support Python 2.6
+or newer (the :mod:`__future__` statements work in Python 3 without issue),
+eliminating warnings that are triggered by ``-3``, etc.
+
+You should even consider running 2to3_ over your code (without committing the
+changes). This will let you know where potential pain points are within your
+code so that you can fix them properly before they become an issue.
+
+
+Use six_
+--------
+
+The six_ project contains many things to help you write portable Python code.
+You should make sure to read its documentation from beginning to end and use
+any and all features it provides. That way you will minimize any mistakes you
+might make in writing cross-version code.
+
+
+Capturing the Currently Raised Exception
+----------------------------------------
+
+One change between Python 2 and 3 that will require changing how you code (if
+you support `Python 2.5`_ and earlier) is
+accessing the currently raised exception. In Python 2.5 and earlier the syntax
+to access the current exception is::
+
+ try:
+ raise Exception()
+ except Exception, exc:
+ # Current exception is 'exc'
+ pass
+
+This syntax changed in Python 3 (and was backported to `Python 2.6`_ and later)
+to::
+
+ try:
+ raise Exception()
+ except Exception as exc:
+ # Current exception is 'exc'
+ # In Python 3, 'exc' is restricted to the block; Python 2.6 will "leak"
+ pass
+
+Because of this syntax change you must change how you capture the current
+exception to::
+
+ try:
+ raise Exception()
+ except Exception:
+ import sys
+ exc = sys.exc_info()[1]
+ # Current exception is 'exc'
+ pass
+
+You can get more information about the raised exception from
+:func:`sys.exc_info` than simply the current exception instance, but you most
+likely don't need it.
+
+.. note::
+ In Python 3, the traceback is attached to the exception instance
+ through the ``__traceback__`` attribute. If the instance is saved in
+ a local variable that persists outside of the ``except`` block, the
+ traceback will create a reference cycle with the current frame and its
+ dictionary of local variables. This will delay reclaiming dead
+ resources until the next cyclic :term:`garbage collection` pass.
+
+ In Python 2, this problem only occurs if you save the traceback itself
+ (e.g. the third element of the tuple returned by :func:`sys.exc_info`)
+ in a variable.
+
+
+Other Resources
+===============
+
+The authors of the following blog posts, wiki pages, and books deserve special
+thanks for making public their tips for porting Python 2 code to Python 3 (and
+thus helping provide information for this document):
+
+* http://python3porting.com/
+* http://docs.pythonsprints.com/python3_porting/py-porting.html
+* http://techspot.zzzeek.org/2011/01/24/zzzeek-s-guide-to-python-3-porting/
+* http://dabeaz.blogspot.com/2011/01/porting-py65-and-my-superboard-to.html
+* http://lucumr.pocoo.org/2011/1/22/forwards-compatible-python/
+* http://lucumr.pocoo.org/2010/2/11/porting-to-python-3-a-guide/
+* http://wiki.python.org/moin/PortingPythonToPy3k
+
+If you feel there is something missing from this document that should be added,
+please email the python-porting_ mailing list.
+
+.. _python-porting: http://mail.python.org/mailman/listinfo/python-porting
diff --git a/Doc/howto/regex.rst b/Doc/howto/regex.rst
index 07a8b561d0..9adfa85252 100644
--- a/Doc/howto/regex.rst
+++ b/Doc/howto/regex.rst
@@ -260,7 +260,7 @@ performing string substitutions. ::
>>> import re
>>> p = re.compile('ab*')
- >>> p
+ >>> p #doctest: +ELLIPSIS
<_sre.SRE_Pattern object at 0x...>
:func:`re.compile` also accepts an optional *flags* argument, used to enable
@@ -354,13 +354,13 @@ for a complete listing.
+------------------+-----------------------------------------------+
:meth:`match` and :meth:`search` return ``None`` if no match can be found. If
-they're successful, a ``MatchObject`` instance is returned, containing
-information about the match: where it starts and ends, the substring it matched,
-and more.
+they're successful, a :ref:`match object <match-objects>` instance is returned,
+containing information about the match: where it starts and ends, the substring
+it matched, and more.
You can learn about this by interactively experimenting with the :mod:`re`
module. If you have :mod:`tkinter` available, you may also want to look at
-:file:`Tools/demo/redemo.py`, a demonstration program included with the
+:source:`Tools/demo/redemo.py`, a demonstration program included with the
Python distribution. It allows you to enter REs and strings, and displays
whether the RE matches or fails. :file:`redemo.py` can be quite useful when
trying to debug a complicated RE. Phil Schwartz's `Kodos
@@ -372,7 +372,7 @@ Python interpreter, import the :mod:`re` module, and compile a RE::
>>> import re
>>> p = re.compile('[a-z]+')
- >>> p
+ >>> p #doctest: +ELLIPSIS
<_sre.SRE_Pattern object at 0x...>
Now, you can try matching various strings against the RE ``[a-z]+``. An empty
@@ -386,16 +386,16 @@ interpreter to print no output. You can explicitly print the result of
None
Now, let's try it on a string that it should match, such as ``tempo``. In this
-case, :meth:`match` will return a :class:`MatchObject`, so you should store the
-result in a variable for later use. ::
+case, :meth:`match` will return a :ref:`match object <match-objects>`, so you
+should store the result in a variable for later use. ::
>>> m = p.match('tempo')
- >>> m
+ >>> m #doctest: +ELLIPSIS
<_sre.SRE_Match object at 0x...>
-Now you can query the :class:`MatchObject` for information about the matching
-string. :class:`MatchObject` instances also have several methods and
-attributes; the most important ones are:
+Now you can query the :ref:`match object <match-objects>` for information
+about the matching string. :ref:`match object <match-objects>` instances
+also have several methods and attributes; the most important ones are:
+------------------+--------------------------------------------+
| Method/Attribute | Purpose |
@@ -429,15 +429,16 @@ case. ::
>>> print(p.match('::: message'))
None
- >>> m = p.search('::: message') ; print(m)
+ >>> m = p.search('::: message'); print(m) #doctest: +ELLIPSIS
<_sre.SRE_Match object at 0x...>
>>> m.group()
'message'
>>> m.span()
(4, 11)
-In actual programs, the most common style is to store the :class:`MatchObject`
-in a variable, and then check if it was ``None``. This usually looks like::
+In actual programs, the most common style is to store the
+:ref:`match object <match-objects>` in a variable, and then check if it was
+``None``. This usually looks like::
p = re.compile( ... )
m = p.match( 'string goes here' )
@@ -454,11 +455,11 @@ Two pattern methods return all of the matches for a pattern.
['12', '11', '10']
:meth:`findall` has to create the entire list before it can be returned as the
-result. The :meth:`finditer` method returns a sequence of :class:`MatchObject`
-instances as an :term:`iterator`::
+result. The :meth:`finditer` method returns a sequence of
+:ref:`match object <match-objects>` instances as an :term:`iterator`::
>>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
- >>> iterator
+ >>> iterator #doctest: +ELLIPSIS
<callable_iterator object at 0x...>
>>> for match in iterator:
... print(match.span())
@@ -476,11 +477,11 @@ You don't have to create a pattern object and call its methods; the
:func:`search`, :func:`findall`, :func:`sub`, and so forth. These functions
take the same arguments as the corresponding pattern method, with
the RE string added as the first argument, and still return either ``None`` or a
-:class:`MatchObject` instance. ::
+:ref:`match object <match-objects>` instance. ::
>>> print(re.match(r'From\s+', 'Fromage amk'))
None
- >>> re.match(r'From\s+', 'From amk Thu May 14 19:12:10 1998')
+ >>> re.match(r'From\s+', 'From amk Thu May 14 19:12:10 1998') #doctest: +ELLIPSIS
<_sre.SRE_Match object at 0x...>
Under the hood, these functions simply create a pattern object for you
@@ -495,7 +496,7 @@ more convenient. If a program contains a lot of regular expressions, or re-uses
the same ones in several locations, then it might be worthwhile to collect all
the definitions in one place, in a section of code that compiles all the REs
ahead of time. To take an example from the standard library, here's an extract
-from the now deprecated :file:`xmllib.py`::
+from the now-defunct Python 2 standard :mod:`xmllib` module::
ref = re.compile( ... )
entityref = re.compile( ... )
@@ -682,7 +683,7 @@ given location, they can obviously be matched an infinite number of times.
For example, if you wish to match the word ``From`` only at the beginning of a
line, the RE to use is ``^From``. ::
- >>> print(re.search('^From', 'From Here to Eternity'))
+ >>> print(re.search('^From', 'From Here to Eternity')) #doctest: +ELLIPSIS
<_sre.SRE_Match object at 0x...>
>>> print(re.search('^From', 'Reciting From Memory'))
None
@@ -694,11 +695,11 @@ given location, they can obviously be matched an infinite number of times.
Matches at the end of a line, which is defined as either the end of the string,
or any location followed by a newline character. ::
- >>> print(re.search('}$', '{block}'))
+ >>> print(re.search('}$', '{block}')) #doctest: +ELLIPSIS
<_sre.SRE_Match object at 0x...>
>>> print(re.search('}$', '{block} '))
None
- >>> print(re.search('}$', '{block}\n'))
+ >>> print(re.search('}$', '{block}\n')) #doctest: +ELLIPSIS
<_sre.SRE_Match object at 0x...>
To match a literal ``'$'``, use ``\$`` or enclose it inside a character class,
@@ -723,7 +724,7 @@ given location, they can obviously be matched an infinite number of times.
match when it's contained inside another word. ::
>>> p = re.compile(r'\bclass\b')
- >>> print(p.search('no class at all'))
+ >>> print(p.search('no class at all')) #doctest: +ELLIPSIS
<_sre.SRE_Match object at 0x...>
>>> print(p.search('the declassified algorithm'))
None
@@ -741,7 +742,7 @@ given location, they can obviously be matched an infinite number of times.
>>> p = re.compile('\bclass\b')
>>> print(p.search('no class at all'))
None
- >>> print(p.search('\b' + 'class' + '\b') )
+ >>> print(p.search('\b' + 'class' + '\b')) #doctest: +ELLIPSIS
<_sre.SRE_Match object at 0x...>
Second, inside a character class, where there's no use for this assertion,
@@ -786,9 +787,9 @@ Groups indicated with ``'('``, ``')'`` also capture the starting and ending
index of the text that they match; this can be retrieved by passing an argument
to :meth:`group`, :meth:`start`, :meth:`end`, and :meth:`span`. Groups are
numbered starting with 0. Group 0 is always present; it's the whole RE, so
-:class:`MatchObject` methods all have group 0 as their default argument. Later
-we'll see how to express groups that don't capture the span of text that they
-match. ::
+:ref:`match object <match-objects>` methods all have group 0 as their default
+argument. Later we'll see how to express groups that don't capture the span
+of text that they match. ::
>>> p = re.compile('(a)b')
>>> m = p.match('ab')
@@ -908,10 +909,10 @@ numbers, groups can be referenced by a name.
The syntax for a named group is one of the Python-specific extensions:
``(?P<name>...)``. *name* is, obviously, the name of the group. Named groups
also behave exactly like capturing groups, and additionally associate a name
-with a group. The :class:`MatchObject` methods that deal with capturing groups
-all accept either integers that refer to the group by number or strings that
-contain the desired group's name. Named groups are still given numbers, so you
-can retrieve information about a group in two ways::
+with a group. The :ref:`match object <match-objects>` methods that deal with
+capturing groups all accept either integers that refer to the group by number
+or strings that contain the desired group's name. Named groups are still
+given numbers, so you can retrieve information about a group in two ways::
>>> p = re.compile(r'(?P<word>\b\w+\b)')
>>> m = p.search( '(((( Lots of punctuation )))' )
@@ -1175,16 +1176,16 @@ three variations of the replacement string. ::
*replacement* can also be a function, which gives you even more control. If
*replacement* is a function, the function is called for every non-overlapping
-occurrence of *pattern*. On each call, the function is passed a
-:class:`MatchObject` argument for the match and can use this information to
-compute the desired replacement string and return it.
+occurrence of *pattern*. On each call, the function is passed a
+:ref:`match object <match-objects>` argument for the match and can use this
+information to compute the desired replacement string and return it.
-In the following example, the replacement function translates decimals into
+In the following example, the replacement function translates decimals into
hexadecimal::
- >>> def hexrepl( match ):
+ >>> def hexrepl(match):
... "Return the hex string for a decimal number"
- ... value = int( match.group() )
+ ... value = int(match.group())
... return hex(value)
...
>>> p = re.compile(r'\d+')
diff --git a/Doc/howto/sockets.rst b/Doc/howto/sockets.rst
index 04e9b98b2c..279bb3ef5e 100644
--- a/Doc/howto/sockets.rst
+++ b/Doc/howto/sockets.rst
@@ -1,3 +1,5 @@
+.. _socket-howto:
+
****************************
Socket Programming HOWTO
****************************
@@ -60,11 +62,10 @@ Creating a Socket
Roughly speaking, when you clicked on the link that brought you to this page,
your browser did something like the following::
- #create an INET, STREAMing socket
+ # create an INET, STREAMing socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
- #now connect to the web server on port 80
- # - the normal http port
- s.connect(("www.mcmillan-inc.com", 80))
+ # now connect to the web server on port 80 - the normal http port
+ s.connect(("www.python.org", 80))
When the ``connect`` completes, the socket ``s`` can be used to send
in a request for the text of the page. The same socket will read the
@@ -75,13 +76,11 @@ exchanges).
What happens in the web server is a bit more complex. First, the web server
creates a "server socket"::
- #create an INET, STREAMing socket
- serversocket = socket.socket(
- socket.AF_INET, socket.SOCK_STREAM)
- #bind the socket to a public host,
- # and a well-known port
+ # create an INET, STREAMing socket
+ serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+ # bind the socket to a public host, and a well-known port
serversocket.bind((socket.gethostname(), 80))
- #become a server socket
+ # become a server socket
serversocket.listen(5)
A couple things to notice: we used ``socket.gethostname()`` so that the socket
@@ -101,10 +100,10 @@ Now that we have a "server" socket, listening on port 80, we can enter the
mainloop of the web server::
while True:
- #accept connections from outside
+ # accept connections from outside
(clientsocket, address) = serversocket.accept()
- #now do something with the clientsocket
- #in this case, we'll pretend this is a threaded server
+ # now do something with the clientsocket
+ # in this case, we'll pretend this is a threaded server
ct = client_thread(clientsocket)
ct.run()
@@ -126,12 +125,13 @@ IPC
---
If you need fast IPC between two processes on one machine, you should look into
-whatever form of shared memory the platform offers. A simple protocol based
-around shared memory and locks or semaphores is by far the fastest technique.
+pipes or shared memory. If you do decide to use AF_INET sockets, bind the
+"server" socket to ``'localhost'``. On most platforms, this will take a
+shortcut around a couple of layers of network code and be quite a bit faster.
-If you do decide to use sockets, bind the "server" socket to ``'localhost'``. On
-most platforms, this will take a shortcut around a couple of layers of network
-code and be quite a bit faster.
+.. seealso::
+ The :mod:`multiprocessing` module integrates cross-platform IPC into a
+ higher-level API.
Using a Socket
@@ -153,7 +153,7 @@ I'm not going to talk about it here, except to warn you that you need to use
there, you may wait forever for the reply, because the request may still be in
your output buffer.
-Now we come the major stumbling block of sockets - ``send`` and ``recv`` operate
+Now we come to the major stumbling block of sockets - ``send`` and ``recv`` operate
on the network buffers. They do not necessarily handle all the bytes you hand
them (or expect from them), because their major focus is handling the network
buffers. In general, they return when the associated network buffers have been
@@ -164,7 +164,7 @@ been completely dealt with.
When a ``recv`` returns 0 bytes, it means the other side has closed (or is in
the process of closing) the connection. You will not receive any more data on
this connection. Ever. You may be able to send data successfully; I'll talk
-about that some on the next page.
+more about this later.
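
The two points above (``recv`` may hand back fewer bytes than requested, and an empty result means the peer has closed) can be sketched as a small read loop. This is only an illustration: the helper name ``recv_exactly`` is hypothetical and not part of the HOWTO, and ``socket.socketpair()`` stands in for a real network connection.

```python
import socket


def recv_exactly(sock, nbytes):
    """Keep calling recv until nbytes have arrived (hypothetical helper)."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(remaining)
        if chunk == b"":
            # An empty result means the other side closed the connection.
            raise ConnectionError("closed with %d bytes still expected" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# A connected pair of sockets stands in for a client and a server.
a, b = socket.socketpair()
a.sendall(b"hello world")
a.close()

first = recv_exactly(b, 5)   # the first five bytes
rest = recv_exactly(b, 6)    # the remaining six bytes
after_close = b.recv(1)      # nothing left and the peer has closed
b.close()
```

Once the buffered data has been drained, ``recv`` on the surviving socket returns an empty bytes object, which is the signal described above.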
A protocol like HTTP uses a socket for only one transfer. The client sends a
request, then reads a reply. That's it. The socket is discarded. This means that
@@ -300,7 +300,7 @@ When Sockets Die
Probably the worst thing about using blocking sockets is what happens when the
other side comes down hard (without doing a ``close``). Your socket is likely to
-hang. SOCKSTREAM is a reliable protocol, and it will wait a long, long time
+hang. TCP is a reliable protocol, and it will wait a long, long time
before giving up on a connection. If you're using threads, the entire thread is
essentially dead. There's not much you can do about it. As long as you aren't
doing something dumb, like holding a lock while doing a blocking read, the
@@ -395,19 +395,13 @@ Performance
There's no question that the fastest sockets code uses non-blocking sockets and
select to multiplex them. You can put together something that will saturate a
-LAN connection without putting any strain on the CPU. The trouble is that an app
-written this way can't do much of anything else - it needs to be ready to
-shuffle bytes around at all times.
-
-Assuming that your app is actually supposed to do something more than that,
-threading is the optimal solution, (and using non-blocking sockets will be
-faster than using blocking sockets). Unfortunately, threading support in Unixes
-varies both in API and quality. So the normal Unix solution is to fork a
-subprocess to deal with each connection. The overhead for this is significant
-(and don't do this on Windows - the overhead of process creation is enormous
-there). It also means that unless each subprocess is completely independent,
-you'll need to use another form of IPC, say a pipe, or shared memory and
-semaphores, to communicate between the parent and child processes.
+LAN connection without putting any strain on the CPU.
+
+The trouble is that an app written this way can't do much of anything else -
+it needs to be ready to shuffle bytes around at all times. Assuming that your
+app is actually supposed to do something more than that, threading is the
+optimal solution (and using non-blocking sockets will be faster than using
+blocking sockets).
Finally, remember that even though blocking sockets are somewhat slower than
non-blocking, in many cases they are the "right" solution. After all, if your
diff --git a/Doc/howto/sorting.rst b/Doc/howto/sorting.rst
index 351713f2cf..f2e64ee98b 100644
--- a/Doc/howto/sorting.rst
+++ b/Doc/howto/sorting.rst
@@ -23,7 +23,7 @@ returns a new sorted list::
>>> sorted([5, 2, 3, 1, 4])
[1, 2, 3, 4, 5]
-You can also use the :meth:`list.sort` method of a list. It modifies the list
+You can also use the :meth:`list.sort` method. It modifies the list
in-place (and returns *None* to avoid confusion). Usually it's less convenient
than :func:`sorted` - but if you don't need the original list, it's slightly
more efficient.
@@ -42,7 +42,7 @@ lists. In contrast, the :func:`sorted` function accepts any iterable.
Key Functions
=============
-Both :meth:`list.sort` and :func:`sorted` have *key* parameter to specify a
+Both :meth:`list.sort` and :func:`sorted` have a *key* parameter to specify a
function to be called on each list element prior to making comparisons.
For example, here's a case-insensitive string comparison:
@@ -87,9 +87,9 @@ Operator Module Functions
=========================
The key-function patterns shown above are very common, so Python provides
-convenience functions to make accessor functions easier and faster. The operator
-module has :func:`operator.itemgetter`, :func:`operator.attrgetter`, and
-an :func:`operator.methodcaller` function.
+convenience functions to make accessor functions easier and faster. The
+:mod:`operator` module has :func:`~operator.itemgetter`,
+:func:`~operator.attrgetter`, and a :func:`~operator.methodcaller` function.
Using those functions, the above examples become simpler and faster:
@@ -114,7 +114,7 @@ Ascending and Descending
========================
Both :meth:`list.sort` and :func:`sorted` accept a *reverse* parameter with a
-boolean value. This is using to flag descending sorts. For example, to get the
+boolean value. This is used to flag descending sorts. For example, to get the
student data in reverse *age* order:
>>> sorted(student_tuples, key=itemgetter(2), reverse=True)
@@ -225,7 +225,7 @@ function. The following wrapper makes that easy to do::
def cmp_to_key(mycmp):
'Convert a cmp= function into a key= function'
- class K(object):
+ class K:
def __init__(self, obj, *args):
self.obj = obj
def __lt__(self, other):
@@ -247,6 +247,8 @@ To convert to a key function, just wrap the old comparison function:
>>> sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric))
[5, 4, 3, 2, 1]
+In Python 3.2, the :func:`functools.cmp_to_key` function was added to the
+:mod:`functools` module in the standard library.
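
A minimal sketch of the standard-library version, assuming Python 3.2 or later and reusing the ``reverse_numeric`` comparison function from the example above:

```python
from functools import cmp_to_key


def reverse_numeric(x, y):
    # Old-style comparison: negative, zero, or positive result.
    return y - x


result = sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric))
print(result)
```

With the import in place, the hand-written wrapper class is no longer needed.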
Odd and Ends
============
@@ -254,7 +256,7 @@ Odd and Ends
* For locale aware sorting, use :func:`locale.strxfrm` for a key function or
:func:`locale.strcoll` for a comparison function.
-* The *reverse* parameter still maintains sort stability (i.e. records with
+* The *reverse* parameter still maintains sort stability (so that records with
equal keys retain the original order). Interestingly, that effect can be
simulated without the parameter by using the builtin :func:`reversed` function
twice:
diff --git a/Doc/howto/unicode.rst b/Doc/howto/unicode.rst
index 13efa7610f..b309f601d0 100644
--- a/Doc/howto/unicode.rst
+++ b/Doc/howto/unicode.rst
@@ -4,13 +4,11 @@
Unicode HOWTO
*****************
-:Release: 1.11
+:Release: 1.12
-This HOWTO discusses Python 2.x's support for Unicode, and explains
+This HOWTO discusses Python support for Unicode, and explains
various problems that people commonly encounter when trying to work
-with Unicode. (This HOWTO has not yet been updated to cover the 3.x
-versions of Python.)
-
+with Unicode.
Introduction to Unicode
=======================
@@ -44,14 +42,14 @@ In the 1980s, almost all personal computers were 8-bit, meaning that bytes could
hold values ranging from 0 to 255. ASCII codes only went up to 127, so some
machines assigned values between 128 and 255 to accented characters. Different
machines had different codes, however, which led to problems exchanging files.
-Eventually various commonly used sets of values for the 128-255 range emerged.
+Eventually various commonly used sets of values for the 128--255 range emerged.
Some were true standards, defined by the International Standards Organization,
-and some were **de facto** conventions that were invented by one company or
+and some were *de facto* conventions that were invented by one company or
another and managed to catch on.
255 characters aren't very many. For example, you can't fit both the accented
characters used in Western Europe and the Cyrillic alphabet used for Russian
-into the 128-255 range because there are more than 127 such characters.
+into the 128--255 range because there are more than 127 such characters.
You could write files using different codes (all your Russian files in a coding
system called KOI8, all your French files in a different coding system called
@@ -64,8 +62,8 @@ bits means you have 2^16 = 65,536 distinct values available, making it possible
to represent many different characters from many different alphabets; an initial
goal was to have Unicode contain the alphabets for every single human language.
It turns out that even 16 bits isn't enough to meet that goal, and the modern
-Unicode specification uses a wider range of codes, 0-1,114,111 (0x10ffff in
-base-16).
+Unicode specification uses a wider range of codes, 0 through 1,114,111
+(``0x10FFFF`` in base 16).
There's a related ISO standard, ISO 10646. Unicode and ISO 10646 were
originally separate efforts, but the specifications were merged with the 1.1
@@ -89,9 +87,11 @@ meanings.
The Unicode standard describes how characters are represented by **code
points**. A code point is an integer value, usually denoted in base 16. In the
-standard, a code point is written using the notation U+12ca to mean the
-character with value 0x12ca (4810 decimal). The Unicode standard contains a lot
-of tables listing characters and their corresponding code points::
+standard, a code point is written using the notation ``U+12CA`` to mean the
+character with value ``0x12ca`` (4,810 decimal). The Unicode standard contains
+a lot of tables listing characters and their corresponding code points:
+
+.. code-block:: none
0061 'a'; LATIN SMALL LETTER A
0062 'b'; LATIN SMALL LETTER B
@@ -100,7 +100,7 @@ of tables listing characters and their corresponding code points::
007B '{'; LEFT CURLY BRACKET
Strictly, these definitions imply that it's meaningless to say 'this is
-character U+12ca'. U+12ca is a code point, which represents some particular
+character ``U+12CA``'. ``U+12CA`` is a code point, which represents some particular
character; in this case, it represents the character 'ETHIOPIC SYLLABLE WI'. In
informal contexts, this distinction between code points and characters will
sometimes be forgotten.
@@ -117,13 +117,15 @@ Encodings
---------
To summarize the previous section: a Unicode string is a sequence of code
-points, which are numbers from 0 to 0x10ffff. This sequence needs to be
-represented as a set of bytes (meaning, values from 0-255) in memory. The rules
-for translating a Unicode string into a sequence of bytes are called an
-**encoding**.
+points, which are numbers from 0 through ``0x10FFFF`` (1,114,111 decimal). This
+sequence needs to be represented as a set of bytes (meaning, values
+from 0 through 255) in memory. The rules for translating a Unicode string
+into a sequence of bytes are called an **encoding**.
The first encoding you might think of is an array of 32-bit integers. In this
-representation, the string "Python" would look like this::
+representation, the string "Python" would look like this:
+
+.. code-block:: none
P y t h o n
0x50 00 00 00 79 00 00 00 74 00 00 00 68 00 00 00 6f 00 00 00 6e 00 00 00
@@ -135,10 +137,10 @@ problems.
1. It's not portable; different processors order the bytes differently.
2. It's very wasteful of space. In most texts, the majority of the code points
- are less than 127, or less than 255, so a lot of space is occupied by zero
+ are less than 127, or less than 255, so a lot of space is occupied by ``0x00``
bytes. The above string takes 24 bytes compared to the 6 bytes needed for an
ASCII representation. Increased RAM usage doesn't matter too much (desktop
- computers have megabytes of RAM, and strings aren't usually that large), but
+ computers have gigabytes of RAM, and strings aren't usually that large), but
expanding our usage of disk and network bandwidth by a factor of 4 is
intolerable.
@@ -164,7 +166,7 @@ encoding, for example, are simple; for each code point:
case.)
Latin-1, also known as ISO-8859-1, is a similar encoding. Unicode code points
-0-255 are identical to the Latin-1 values, so converting to this encoding simply
+0--255 are identical to the Latin-1 values, so converting to this encoding simply
requires converting code points to byte values; if a code point larger than 255
is encountered, the string can't be encoded into Latin-1.
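
As a rough illustration of the Latin-1 rule just described, the sketch below encodes a string whose code points are all below 256 and then tries one that is not. The sample strings are arbitrary choices, not taken from the HOWTO.

```python
text = "caf\u00e9"  # every code point here is below 256
data = text.encode("latin-1")

# Each byte value is simply the code point of the matching character.
code_points = [ord(ch) for ch in text]
byte_values = list(data)

# U+20AC (EURO SIGN) is larger than 255, so it cannot be encoded.
try:
    "\u20ac".encode("latin-1")
    euro_failed = False
except UnicodeEncodeError:
    euro_failed = True
```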
@@ -177,14 +179,12 @@ internal detail.
UTF-8 is one of the most commonly used encodings. UTF stands for "Unicode
Transformation Format", and the '8' means that 8-bit numbers are used in the
-encoding. (There's also a UTF-16 encoding, but it's less frequently used than
-UTF-8.) UTF-8 uses the following rules:
+encoding. (There are also UTF-16 and UTF-32 encodings, but they are less
+frequently used than UTF-8.) UTF-8 uses the following rules:
-1. If the code point is <128, it's represented by the corresponding byte value.
-2. If the code point is between 128 and 0x7ff, it's turned into two byte values
- between 128 and 255.
-3. Code points >0x7ff are turned into three- or four-byte sequences, where each
- byte of the sequence is between 128 and 255.
+1. If the code point is < 128, it's represented by the corresponding byte value.
+2. If the code point is >= 128, it's turned into a sequence of two, three, or
+ four bytes, where each byte of the sequence is between 128 and 255.
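
The two rules can be checked from the interpreter. The following is a small sketch with four sample code points (chosen here for illustration) that need one, two, three, and four bytes respectively:

```python
samples = ["a", "\u00e9", "\u20ac", "\U0001f600"]
encoded = {ch: ch.encode("utf-8") for ch in samples}

for ch, data in encoded.items():
    print("U+%04X -> %d byte(s): %r" % (ord(ch), len(data), data))

lengths = [len(encoded[ch]) for ch in samples]

# For code points >= 128, every byte of the sequence is in the range 128-255.
high_bytes_only = all(
    byte >= 128 for ch in samples[1:] for byte in encoded[ch]
)
```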
UTF-8 has several convenient properties:
@@ -194,8 +194,8 @@ UTF-8 has several convenient properties:
processed by C functions such as ``strcpy()`` and sent through protocols that
can't handle zero bytes.
3. A string of ASCII text is also valid UTF-8 text.
-4. UTF-8 is fairly compact; the majority of code points are turned into two
- bytes, and values less than 128 occupy only a single byte.
+4. UTF-8 is fairly compact; the majority of commonly used characters can be
+ represented with one or two bytes.
5. If bytes are corrupted or lost, it's possible to determine the start of the
next UTF-8-encoded code point and resynchronize. It's also unlikely that
random 8-bit data will look like valid UTF-8.
@@ -205,29 +205,29 @@ UTF-8 has several convenient properties:
References
----------
-The Unicode Consortium site at <http://www.unicode.org> has character charts, a
+The `Unicode Consortium site <http://www.unicode.org>`_ has character charts, a
glossary, and PDF versions of the Unicode specification. Be prepared for some
-difficult reading. <http://www.unicode.org/history/> is a chronology of the
-origin and development of Unicode.
+difficult reading. `A chronology <http://www.unicode.org/history/>`_ of the
+origin and development of Unicode is also available on the site.
-To help understand the standard, Jukka Korpela has written an introductory guide
-to reading the Unicode character tables, available at
-<http://www.cs.tut.fi/~jkorpela/unicode/guide.html>.
+To help understand the standard, Jukka Korpela has written `an introductory
+guide <http://www.cs.tut.fi/~jkorpela/unicode/guide.html>`_ to reading the
+Unicode character tables.
-Another good introductory article was written by Joel Spolsky
-<http://www.joelonsoftware.com/articles/Unicode.html>.
+Another `good introductory article <http://www.joelonsoftware.com/articles/Unicode.html>`_
+was written by Joel Spolsky.
If this introduction didn't make things clear to you, you should try reading this
alternate article before continuing.
.. Jason Orendorff XXX http://www.jorendorff.com/articles/unicode/ is broken
-Wikipedia entries are often helpful; see the entries for "character encoding"
-<http://en.wikipedia.org/wiki/Character_encoding> and UTF-8
-<http://en.wikipedia.org/wiki/UTF-8>, for example.
+Wikipedia entries are often helpful; see the entries for "`character encoding
+<http://en.wikipedia.org/wiki/Character_encoding>`_" and `UTF-8
+<http://en.wikipedia.org/wiki/UTF-8>`_, for example.
-Python 2.x's Unicode Support
-============================
+Python's Unicode Support
+========================
Now that you've learned the rudiments of Unicode, we can look at Python's
Unicode features.
@@ -235,11 +235,11 @@ Unicode features.
The String Type
---------------
-Since Python 3.0, the language features a ``str`` type that contain Unicode
+Since Python 3.0, the language features a :class:`str` type that contains Unicode
characters, meaning any string created using ``"unicode rocks!"``, ``'unicode
rocks!'``, or the triple-quoted string syntax is stored as Unicode.
-To insert a Unicode character that is not part ASCII, e.g., any letters with
+To insert a non-ASCII Unicode character, e.g., any letters with
accents, one can use escape sequences in their string literals as such::
>>> "\N{GREEK CAPITAL LETTER DELTA}" # Using the character name
@@ -249,23 +249,24 @@ accents, one can use escape sequences in their string literals as such::
>>> "\U00000394" # Using a 32-bit hex value
'\u0394'
-In addition, one can create a string using the :func:`decode` method of
-:class:`bytes`. This method takes an encoding, such as UTF-8, and, optionally,
-an *errors* argument.
+In addition, one can create a string using the :meth:`~bytes.decode` method of
+:class:`bytes`. This method takes an *encoding* argument, such as ``UTF-8``,
+and optionally, an *errors* argument.
The *errors* argument specifies the response when the input string can't be
converted according to the encoding's rules. Legal values for this argument are
-'strict' (raise a :exc:`UnicodeDecodeError` exception), 'replace' (use U+FFFD,
-'REPLACEMENT CHARACTER'), or 'ignore' (just leave the character out of the
-Unicode result). The following examples show the differences::
+``'strict'`` (raise a :exc:`UnicodeDecodeError` exception), ``'replace'`` (use
+``U+FFFD``, ``REPLACEMENT CHARACTER``), or ``'ignore'`` (just leave the
+character out of the Unicode result).
+The following examples show the differences::
- >>> b'\x80abc'.decode("utf-8", "strict")
+ >>> b'\x80abc'.decode("utf-8", "strict") #doctest: +NORMALIZE_WHITESPACE
Traceback (most recent call last):
- File "<stdin>", line 1, in ?
- UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0:
- unexpected code byte
+ ...
+ UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0:
+ invalid start byte
>>> b'\x80abc'.decode("utf-8", "replace")
- '?abc'
+ '\ufffdabc'
>>> b'\x80abc'.decode("utf-8", "ignore")
'abc'
@@ -275,8 +276,8 @@ a question mark because it may not be displayed on some systems.)
Encodings are specified as strings containing the encoding's name. Python 3.2
comes with roughly 100 different encodings; see the Python Library Reference at
:ref:`standard-encodings` for a list. Some encodings have multiple names; for
-example, 'latin-1', 'iso_8859_1' and '8859' are all synonyms for the same
-encoding.
+example, ``'latin-1'``, ``'iso_8859_1'`` and ``'8859'`` are all synonyms for
+the same encoding.
One-character Unicode strings can also be created with the :func:`chr`
built-in function, which takes integers and returns a Unicode string of length 1
@@ -284,30 +285,31 @@ that contains the corresponding code point. The reverse operation is the
built-in :func:`ord` function that takes a one-character Unicode string and
returns the code point value::
- >>> chr(40960)
- '\ua000'
- >>> ord('\ua000')
- 40960
+ >>> chr(57344)
+ '\ue000'
+ >>> ord('\ue000')
+ 57344
Converting to Bytes
-------------------
-Another important str method is ``.encode([encoding], [errors='strict'])``,
-which returns a ``bytes`` representation of the Unicode string, encoded in the
-requested encoding. The ``errors`` parameter is the same as the parameter of
-the :meth:`decode` method, with one additional possibility; as well as 'strict',
-'ignore', and 'replace' (which in this case inserts a question mark instead of
-the unencodable character), you can also pass 'xmlcharrefreplace' which uses
-XML's character references. The following example shows the different results::
+The opposite method of :meth:`bytes.decode` is :meth:`str.encode`,
+which returns a :class:`bytes` representation of the Unicode string, encoded in the
+requested *encoding*. The *errors* parameter is the same as the parameter of
+the :meth:`~bytes.decode` method, with one additional possibility; as well as
+``'strict'``, ``'ignore'``, and ``'replace'`` (which in this case inserts a
+question mark instead of the unencodable character), you can also pass
+``'xmlcharrefreplace'`` which uses XML's character references.
+The following example shows the different results::
>>> u = chr(40960) + 'abcd' + chr(1972)
>>> u.encode('utf-8')
b'\xea\x80\x80abcd\xde\xb4'
- >>> u.encode('ascii')
+ >>> u.encode('ascii') #doctest: +NORMALIZE_WHITESPACE
Traceback (most recent call last):
- File "<stdin>", line 1, in ?
+ ...
UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in
- position 0: ordinal not in range(128)
+ position 0: ordinal not in range(128)
>>> u.encode('ascii', 'ignore')
b'abcd'
>>> u.encode('ascii', 'replace')
@@ -315,6 +317,8 @@ XML's character references. The following example shows the different results::
>>> u.encode('ascii', 'xmlcharrefreplace')
b'&#40960;abcd&#1972;'
+.. XXX mention the surrogate* error handlers
+
The low-level routines for registering and accessing the available encodings are
found in the :mod:`codecs` module. However, the encoding and decoding functions
returned by this module are usually more low-level than is comfortable, so I'm
@@ -329,15 +333,15 @@ Unicode Literals in Python Source Code
In Python source code, specific Unicode code points can be written using the
``\u`` escape sequence, which is followed by four hex digits giving the code
-point. The ``\U`` escape sequence is similar, but expects 8 hex digits, not 4::
+point. The ``\U`` escape sequence is similar, but expects eight hex digits,
+not four::
>>> s = "a\xac\u1234\u20ac\U00008000"
- ^^^^ two-digit hex escape
- ^^^^^ four-digit Unicode escape
- ^^^^^^^^^^ eight-digit Unicode escape
- >>> for c in s: print(ord(c), end=" ")
- ...
- 97 172 4660 8364 32768
+ ... # ^^^^ two-digit hex escape
+ ... # ^^^^^^ four-digit Unicode escape
+ ... # ^^^^^^^^^^ eight-digit Unicode escape
+ >>> [ord(c) for c in s]
+ [97, 172, 4660, 8364, 32768]
Using escape sequences for code points greater than 127 is fine in small doses,
but becomes an annoyance if you're using many accented characters, as you would
@@ -367,14 +371,14 @@ they have no significance to Python but are a convention. Python looks for
``coding: name`` or ``coding=name`` in the comment.
If you don't include such a comment, the default encoding used will be UTF-8 as
-already mentioned.
+already mentioned. See also :pep:`263` for more information.
Unicode Properties
------------------
The Unicode specification includes a database of information about code points.
-For each code point that's defined, the information includes the character's
+For each defined code point, the information includes the character's
name, its category, the numeric value if applicable (Unicode has characters
representing the Roman numerals and fractions such as one-third and
four-fifths). There are also properties related to the code point's use in
@@ -394,7 +398,9 @@ prints the numeric value of one particular character::
# Get numeric value of second character
print(unicodedata.numeric(u[1]))
-When run, this prints::
+When run, this prints:
+
+.. code-block:: none
0 00e9 Ll LATIN SMALL LETTER E WITH ACUTE
1 0bf2 No TAMIL NUMBER ONE THOUSAND
@@ -409,13 +415,13 @@ These are grouped into categories such as "Letter", "Number", "Punctuation", or
from the above output, ``'Ll'`` means 'Letter, lowercase', ``'No'`` means
"Number, other", ``'Mn'`` is "Mark, nonspacing", and ``'So'`` is "Symbol,
other". See
-<http://unicode.org/Public/5.1.0/ucd/UCD.html#General_Category_Values> for a
+<http://www.unicode.org/reports/tr44/#General_Category_Values> for a
list of category codes.
References
----------
-The ``str`` type is described in the Python library reference at
+The :class:`str` type is described in the Python library reference at
:ref:`typesseq`.
The documentation for the :mod:`unicodedata` module.
@@ -445,16 +451,16 @@ columns and can return Unicode values from an SQL query.
Unicode data is usually converted to a particular encoding before it gets
written to disk or sent over a socket. It's possible to do all the work
-yourself: open a file, read an 8-bit byte string from it, and convert the string
-with ``str(bytes, encoding)``. However, the manual approach is not recommended.
+yourself: open a file, read an 8-bit bytes object from it, and convert the bytes
+with ``bytes.decode(encoding)``. However, the manual approach is not recommended.
One problem is the multi-byte nature of encodings; one Unicode character can be
represented by several bytes. If you want to read the file in arbitrary-sized
-chunks (say, 1K or 4K), you need to write error-handling code to catch the case
+chunks (say, 1k or 4k), you need to write error-handling code to catch the case
where only part of the bytes encoding a single Unicode character are read at the
end of a chunk. One solution would be to read the entire file into memory and
then perform the decoding, but that prevents you from working with files that
-are extremely large; if you need to read a 2Gb file, you need 2Gb of RAM.
+are extremely large; if you need to read a 2GB file, you need 2GB of RAM.
(More, really, since for at least a moment you'd need to have both the encoded
string and its Unicode version in memory.)
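The chunk-boundary problem described above can be sketched with an incremental decoder from the :mod:`codecs` module, which buffers an incomplete byte sequence until the rest arrives. This is a minimal sketch, not a recommended replacement for :func:`open`; the helper name ``decode_chunks`` and the sample data are assumptions made for illustration.

```python
import codecs

def decode_chunks(chunks, encoding='utf-8'):
    """Yield decoded text for each bytes chunk, buffering partial sequences."""
    decoder = codecs.getincrementaldecoder(encoding)()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # final=True raises an error if an incomplete character is left over.
    tail = decoder.decode(b'', final=True)
    if tail:
        yield tail

data = 'caf\u00e9 \u4500'.encode('utf-8')
# Split into 2-byte chunks so that multi-byte characters straddle boundaries.
chunks = [data[i:i + 2] for i in range(0, len(data), 2)]
result = ''.join(decode_chunks(chunks))
print(result == data.decode('utf-8'))
```

Only one chunk needs to be held in memory at a time, which is the property the whole-file approach lacks.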
@@ -462,26 +468,25 @@ The solution would be to use the low-level decoding interface to catch the case
of partial coding sequences. The work of implementing this has already been
done for you: the built-in :func:`open` function can return a file-like object
that assumes the file's contents are in a specified encoding and accepts Unicode
-parameters for methods such as ``.read()`` and ``.write()``. This works through
+parameters for methods such as :meth:`read` and :meth:`write`. This works through
:func:`open`\'s *encoding* and *errors* parameters which are interpreted just
-like those in string objects' :meth:`encode` and :meth:`decode` methods.
+like those in :meth:`str.encode` and :meth:`bytes.decode`.
Reading Unicode from a file is therefore simple::
- f = open('unicode.rst', encoding='utf-8')
- for line in f:
- print(repr(line))
+ with open('unicode.rst', encoding='utf-8') as f:
+ for line in f:
+ print(repr(line))
It's also possible to open files in update mode, allowing both reading and
writing::
- f = open('test', encoding='utf-8', mode='w+')
- f.write('\u4500 blah blah blah\n')
- f.seek(0)
- print(repr(f.readline()[:1]))
- f.close()
+ with open('test', encoding='utf-8', mode='w+') as f:
+ f.write('\u4500 blah blah blah\n')
+ f.seek(0)
+ print(repr(f.readline()[:1]))
-The Unicode character U+FEFF is used as a byte-order mark (BOM), and is often
+The Unicode character ``U+FEFF`` is used as a byte-order mark (BOM), and is often
written as the first character of a file in order to assist with autodetection
of the file's byte ordering. Some encodings, such as UTF-16, expect a BOM to be
present at the start of a file; when such an encoding is used, the BOM will be
@@ -516,20 +521,19 @@ usually just provide the Unicode string as the filename, and it will be
automatically converted to the right encoding for you::
filename = 'filename\u4500abc'
- f = open(filename, 'w')
- f.write('blah\n')
- f.close()
+ with open(filename, 'w') as f:
+ f.write('blah\n')
Functions in the :mod:`os` module such as :func:`os.stat` will also accept Unicode
filenames.
-:func:`os.listdir`, which returns filenames, raises an issue: should it return
-the Unicode version of filenames, or should it return byte strings containing
+The :func:`os.listdir` function, which returns filenames, raises an issue: should it return
+the Unicode version of filenames, or should it return bytes containing
the encoded versions? :func:`os.listdir` will do both, depending on whether you
-provided the directory path as a byte string or a Unicode string. If you pass a
+provided the directory path as bytes or a Unicode string. If you pass a
Unicode string as the path, filenames will be decoded using the filesystem's
encoding and a list of Unicode strings will be returned, while passing a byte
-path will return the byte string versions of the filenames. For example,
+path will return the bytes versions of the filenames. For example,
assuming the default filesystem encoding is UTF-8, running the following
program::
@@ -555,7 +559,6 @@ should only be used on systems where undecodable file names can be present,
i.e. Unix systems.
-
Tips for Writing Unicode-aware Programs
---------------------------------------
@@ -564,47 +567,23 @@ Unicode.
The most important tip is:
- Software should only work with Unicode strings internally, converting to a
- particular encoding on output.
+ Software should only work with Unicode strings internally, decoding the input
+ data as soon as possible and encoding the output only at the end.
If you attempt to write processing functions that accept both Unicode and byte
strings, you will find your program vulnerable to bugs wherever you combine the
-two different kinds of strings. There is no automatic encoding or decoding if
-you do e.g. ``str + bytes``, a :exc:`TypeError` is raised for this expression.
-
-It's easy to miss such problems if you only test your software with data that
-doesn't contain any accents; everything will seem to work, but there's actually
-a bug in your program waiting for the first user who attempts to use characters
-> 127. A second tip, therefore, is:
-
- Include characters > 127 and, even better, characters > 255 in your test
- data.
+two different kinds of strings. There is no automatic encoding or decoding: if
+you do e.g. ``str + bytes``, a :exc:`TypeError` will be raised.
When using data coming from a web browser or some other untrusted source, a
common technique is to check for illegal characters in a string before using the
string in a generated command line or storing it in a database. If you're doing
-this, be careful to check the string once it's in the form that will be used or
-stored; it's possible for encodings to be used to disguise characters. This is
-especially true if the input data also specifies the encoding; many encodings
-leave the commonly checked-for characters alone, but Python includes some
-encodings such as ``'base64'`` that modify every single character.
-
-For example, let's say you have a content management system that takes a Unicode
-filename, and you want to disallow paths with a '/' character. You might write
-this code::
-
- def read_file(filename, encoding):
- if '/' in filename:
- raise ValueError("'/' not allowed in filenames")
- unicode_name = filename.decode(encoding)
- f = open(unicode_name, 'r')
- # ... return contents of file ...
-
-However, if an attacker could specify the ``'base64'`` encoding, they could pass
-``'L2V0Yy9wYXNzd2Q='``, which is the base-64 encoded form of the string
-``'/etc/passwd'``, to read a system file. The above code looks for ``'/'``
-characters in the encoded form and misses the dangerous character in the
-resulting decoded form.
+this, be careful to check the decoded string, not the encoded bytes data;
+some encodings may have interesting properties, such as not being bijective
+or not being fully ASCII-compatible. This is especially true if the input
+data also specifies the encoding, since the attacker can then choose a
+clever way to hide malicious text in the encoded bytestream.
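To make the advice concrete, here is a small sketch that compares a check on the encoded bytes with a check on the decoded string. It assumes the attacker controls both the data and the encoding name; the helper ``contains_slash`` and the sample payload are hypothetical, and UTF-7 is used only because it is a standard codec that is not ASCII-transparent.

```python
def contains_slash(raw, encoding):
    """Decode *raw* first, then look for '/' in the resulting text."""
    return '/' in raw.decode(encoding)

payload = b'+AC8-etc+AC8-passwd'   # UTF-7 form of '/etc/passwd'

print(b'/' in payload)                   # check on the encoded bytes
print(contains_slash(payload, 'utf-7'))  # check on the decoded string
```

The byte-level check finds nothing suspicious, while the check on the decoded text sees the path separators.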
+
References
----------
@@ -613,27 +592,30 @@ The PDF slides for Marc-André Lemburg's presentation "Writing Unicode-aware
Applications in Python" are available at
<http://downloads.egenix.com/python/LSM2005-Developing-Unicode-aware-applications-in-Python.pdf>
and discuss questions of character encodings as well as how to internationalize
-and localize an application.
+and localize an application. These slides cover Python 2.x only.
-Revision History and Acknowledgements
-=====================================
+Acknowledgements
+================
Thanks to the following people who have noted errors or offered suggestions on
this article: Nicholas Bastin, Marius Gedminas, Kent Johnson, Ken Krugler,
Marc-André Lemburg, Martin von Löwis, Chad Whitacre.
-Version 1.0: posted August 5 2005.
+.. comment
+ Revision History
+
+ Version 1.0: posted August 5 2005.
-Version 1.01: posted August 7 2005. Corrects factual and markup errors; adds
-several links.
+ Version 1.01: posted August 7 2005. Corrects factual and markup errors; adds
+ several links.
-Version 1.02: posted August 16 2005. Corrects factual errors.
+ Version 1.02: posted August 16 2005. Corrects factual errors.
-Version 1.1: Feb-Nov 2008. Updates the document with respect to Python 3 changes.
+ Version 1.1: Feb-Nov 2008. Updates the document with respect to Python 3 changes.
-Version 1.11: posted June 20 2010. Notes that Python 3.x is not covered,
-and that the HOWTO only covers 2.x.
+ Version 1.11: posted June 20 2010. Notes that Python 3.x is not covered,
+ and that the HOWTO only covers 2.x.
.. comment Describe Python 3.x support (new section? new document?)
.. comment Additional topic: building Python w/ UCS2 or UCS4 support
diff --git a/Doc/howto/urllib2.rst b/Doc/howto/urllib2.rst
index 110b6de3b6..87f42ba0ec 100644
--- a/Doc/howto/urllib2.rst
+++ b/Doc/howto/urllib2.rst
@@ -108,6 +108,7 @@ library. ::
'language' : 'Python' }
data = urllib.parse.urlencode(values)
+ data = data.encode('utf-8') # data should be bytes
req = urllib.request.Request(url, data)
response = urllib.request.urlopen(req)
the_page = response.read()
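The encoding step added in this hunk can be checked without any network access. The sketch below uses made-up form values and the example URL that appears later in this document; it only builds the request object and never opens it.

```python
import urllib.parse
import urllib.request

url = 'http://www.example.com/example.cgi'
values = {'name': 'Somebody Here', 'language': 'Python'}  # illustrative data

query = urllib.parse.urlencode(values)   # a str
data = query.encode('utf-8')             # bytes, as Request expects
req = urllib.request.Request(url, data)  # nothing is sent at this point

print(type(query).__name__, type(data).__name__)
print(req.get_method())
```

Supplying *data* is what turns the request into a POST; passing the unencoded ``str`` would fail once the request is actually sent.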
@@ -136,11 +137,11 @@ This is done as follows::
>>> data['location'] = 'Northampton'
>>> data['language'] = 'Python'
>>> url_values = urllib.parse.urlencode(data)
- >>> print(url_values)
+ >>> print(url_values) # The order may differ from below. #doctest: +SKIP
name=Somebody+Here&language=Python&location=Northampton
>>> url = 'http://www.example.com/example.cgi'
>>> full_url = url + '?' + url_values
- >>> data = urllib.request.open(full_url)
+ >>> data = urllib.request.urlopen(full_url)
Notice that the full URL is created by adding a ``?`` to the URL, followed by
the encoded values.
@@ -172,7 +173,8 @@ Explorer [#]_. ::
'language' : 'Python' }
headers = { 'User-Agent' : user_agent }
- data = urllib.parse.urlencode(values)
+ data = urllib.parse.urlencode(values)
+ data = data.encode('utf-8')
req = urllib.request.Request(url, data, headers)
response = urllib.request.urlopen(req)
the_page = response.read()
@@ -205,9 +207,9 @@ e.g. ::
>>> req = urllib.request.Request('http://www.pretend_server.org')
>>> try: urllib.request.urlopen(req)
- >>> except urllib.error.URLError as e:
- >>> print(e.reason)
- >>>
+ ... except urllib.error.URLError as e:
+ ... print(e.reason) #doctest: +SKIP
+ ...
(4, 'getaddrinfo failed')
@@ -313,18 +315,17 @@ geturl, and info, methods as returned by the ``urllib.response`` module::
>>> req = urllib.request.Request('http://www.python.org/fish.html')
>>> try:
- >>> urllib.request.urlopen(req)
- >>> except urllib.error.HTTPError as e:
- >>> print(e.code)
- >>> print(e.read())
- >>>
+ ... urllib.request.urlopen(req)
+ ... except urllib.error.HTTPError as e:
+ ... print(e.code)
+ ... print(e.read()) #doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
+ ...
404
- <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
- "http://www.w3.org/TR/html4/loose.dtd">
- <?xml-stylesheet href="./css/ht2html.css"
- type="text/css"?>
- <html><head><title>Error 404: File Not Found</title>
- ...... etc...
+ b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
+ ...
+ <title>Page Not Found</title>\n
+ ...
Wrapping it Up
--------------
@@ -446,12 +447,12 @@ Authentication Tutorial
When authentication is required, the server sends a header (as well as the 401
error code) requesting authentication. This specifies the authentication scheme
-and a 'realm'. The header looks like : ``Www-authenticate: SCHEME
+and a 'realm'. The header looks like: ``WWW-Authenticate: SCHEME
realm="REALM"``.
e.g. ::
- Www-authenticate: Basic realm="cPanel Users"
+ WWW-Authenticate: Basic realm="cPanel Users"
The client should then retry the request with the appropriate name and password
diff --git a/Doc/howto/webservers.rst b/Doc/howto/webservers.rst
index caf0ad6667..72ccd1f690 100644
--- a/Doc/howto/webservers.rst
+++ b/Doc/howto/webservers.rst
@@ -264,7 +264,7 @@ used for the deployment of WSGI applications.
* `FastCGI, SCGI, and Apache: Background and Future
<http://www.vmunix.com/mark/blog/archives/2006/01/02/fastcgi-scgi-and-apache-background-and-future/>`_
- is a discussion on why the concept of FastCGI and SCGI is better that that
+ is a discussion on why the concept of FastCGI and SCGI is better than that
of mod_python.
@@ -274,7 +274,7 @@ Setting up FastCGI
Each web server requires a specific module.
* Apache has both `mod_fastcgi <http://www.fastcgi.com/drupal/>`_ and `mod_fcgid
- <http://fastcgi.coremail.cn/>`_. ``mod_fastcgi`` is the original one, but it
+ <http://httpd.apache.org/mod_fcgid/>`_. ``mod_fastcgi`` is the original one, but it
has some licensing issues, which is why it is sometimes considered non-free.
``mod_fcgid`` is a smaller, compatible alternative. One of these modules needs
to be loaded by Apache.
@@ -293,7 +293,7 @@ following WSGI-application::
# -*- coding: UTF-8 -*-
import sys, os
- from cgi import escape
+ from html import escape
from flup.server.fcgi import WSGIServer
def app(environ, start_response):
@@ -365,7 +365,7 @@ testing.
A really great WSGI feature is middleware. Middleware is a layer around your
program which can add various functionality to it. There is quite a bit of
-`middleware <http://wsgi.org/wsgi/Middleware_and_Utilities>`_ already
+`middleware <http://www.wsgi.org/en/latest/libraries.html>`_ already
available. For example, instead of writing your own session management (HTTP
is a stateless protocol, so to associate multiple HTTP requests with a single
user your application must create and manage such state via a session), you can
@@ -396,9 +396,9 @@ compared with other web technologies.
.. seealso::
- A good overview of WSGI-related code can be found in the `WSGI wiki
- <http://wsgi.org/wsgi>`_, which contains an extensive list of `WSGI servers
- <http://wsgi.org/wsgi/Servers>`_ which can be used by *any* application
+ A good overview of WSGI-related code can be found in the `WSGI homepage
+ <http://www.wsgi.org/en/latest/index.html>`_, which contains an extensive list of `WSGI servers
+ <http://www.wsgi.org/en/latest/servers.html>`_ which can be used by *any* application
supporting WSGI.
You might be interested in some WSGI-supporting modules already contained in