diff options
Diffstat (limited to 'Doc/howto')
-rw-r--r-- | Doc/howto/clinic.rst | 1636 | ||||
-rw-r--r-- | Doc/howto/functional.rst | 144 | ||||
-rw-r--r-- | Doc/howto/index.rst | 1 | ||||
-rw-r--r-- | Doc/howto/logging-cookbook.rst | 4 | ||||
-rw-r--r-- | Doc/howto/logging.rst | 4 | ||||
-rw-r--r-- | Doc/howto/pyporting.rst | 612 | ||||
-rw-r--r-- | Doc/howto/regex.rst | 24 | ||||
-rw-r--r-- | Doc/howto/urllib2.rst | 2 |
8 files changed, 1975 insertions, 452 deletions
diff --git a/Doc/howto/clinic.rst b/Doc/howto/clinic.rst new file mode 100644 index 0000000000..5ee6c3da10 --- /dev/null +++ b/Doc/howto/clinic.rst @@ -0,0 +1,1636 @@ +====================== +Argument Clinic How-To +====================== + +:author: Larry Hastings + + +.. topic:: Abstract + + Argument Clinic is a preprocessor for CPython C files. + Its purpose is to automate all the boilerplate involved + with writing argument parsing code for "builtins". + This document shows you how to convert your first C + function to work with Argument Clinic, and then introduces + some advanced topics on Argument Clinic usage. + + Currently Argument Clinic is considered internal-only + for CPython. Its use is not supported for files outside + CPython, and no guarantees are made regarding backwards + compatibility for future versions. In other words: if you + maintain an external C extension for CPython, you're welcome + to experiment with Argument Clinic in your own code. But the + version of Argument Clinic that ships with CPython 3.5 *could* + be totally incompatible and break all your code. + +============================ +The Goals Of Argument Clinic +============================ + +Argument Clinic's primary goal +is to take over responsibility for all argument parsing code +inside CPython. This means that, when you convert a function +to work with Argument Clinic, that function should no longer +do any of its own argument parsing--the code generated by +Argument Clinic should be a "black box" to you, where CPython +calls in at the top, and your code gets called at the bottom, +with ``PyObject *args`` (and maybe ``PyObject *kwargs``) +magically converted into the C variables and types you need. + +In order for Argument Clinic to accomplish its primary goal, +it must be easy to use. Currently, working with CPython's +argument parsing library is a chore, requiring maintaining +redundant information in a surprising number of places. +When you use Argument Clinic, you don't have to repeat yourself. + +Obviously, no one would want to use Argument Clinic unless +it's solving their problem--and without creating new problems of +its own. +So it's paramount that Argument Clinic generate correct code. +It'd be nice if the code was faster, too, but at the very least +it should not introduce a major speed regression. (Eventually Argument +Clinic *should* make a major speedup possible--we could +rewrite its code generator to produce tailor-made argument +parsing code, rather than calling the general-purpose CPython +argument parsing library. That would make for the fastest +argument parsing possible!) + +Additionally, Argument Clinic must be flexible enough to +work with any approach to argument parsing. Python has +some functions with some very strange parsing behaviors; +Argument Clinic's goal is to support all of them. + +Finally, the original motivation for Argument Clinic was +to provide introspection "signatures" for CPython builtins. +It used to be, the introspection query functions would throw +an exception if you passed in a builtin. With Argument +Clinic, that's a thing of the past! + +One idea you should keep in mind, as you work with +Argument Clinic: the more information you give it, the +better job it'll be able to do. +Argument Clinic is admittedly relatively simple right +now. But as it evolves it will get more sophisticated, +and it should be able to do many interesting and smart +things with all the information you give it. + + +======================== +Basic Concepts And Usage +======================== + +Argument Clinic ships with CPython; you'll find it in ``Tools/clinic/clinic.py``. +If you run that script, specifying a C file as an argument:: + + % python3 Tools/clinic/clinic.py foo.c + +Argument Clinic will scan over the file looking for lines that +look exactly like this:: + + /*[clinic input] + +When it finds one, it reads everything up to a line that looks +exactly like this:: + + [clinic start generated code]*/ + +Everything in between these two lines is input for Argument Clinic. +All of these lines, including the beginning and ending comment +lines, are collectively called an Argument Clinic "block". + +When Argument Clinic parses one of these blocks, it +generates output. This output is rewritten into the C file +immediately after the block, followed by a comment containing a checksum. +The Argument Clinic block now looks like this:: + + /*[clinic input] + ... clinic input goes here ... + [clinic start generated code]*/ + ... clinic output goes here ... + /*[clinic end generated code: checksum=...]*/ + +If you run Argument Clinic on the same file a second time, Argument Clinic +will discard the old output and write out the new output with a fresh checksum +line. However, if the input hasn't changed, the output won't change either. + +You should never modify the output portion of an Argument Clinic block. Instead, +change the input until it produces the output you want. (That's the purpose of the +checksum--to detect if someone changed the output, as these edits would be lost +the next time Argument Clinic writes out fresh output.) + +For the sake of clarity, here's the terminology we'll use with Argument Clinic: + +* The first line of the comment (``/*[clinic input]``) is the *start line*. +* The last line of the initial comment (``[clinic start generated code]*/``) is the *end line*. +* The last line (``/*[clinic end generated code: checksum=...]*/``) is the *checksum line*. +* In between the start line and the end line is the *input*. +* In between the end line and the checksum line is the *output*. +* All the text collectively, from the start line to the checksum line inclusively, + is the *block*. (A block that hasn't been successfully processed by Argument + Clinic yet doesn't have output or a checksum line, but it's still considered + a block.) + + +============================== +Converting Your First Function +============================== + +The best way to get a sense of how Argument Clinic works is to +convert a function to work with it. Here, then, are the bare +minimum steps you'd need to follow to convert a function to +work with Argument Clinic. Note that for code you plan to +check in to CPython, you really should take the conversion farther, +using some of the advanced concepts you'll see later on in +the document (like "return converters" and "self converters"). +But we'll keep it simple for this walkthrough so you can learn. + +Let's dive in! + +0. Make sure you're working with a freshly updated checkout + of the CPython trunk. + +1. Find a Python builtin that calls either :c:func:`PyArg_ParseTuple` + or :c:func:`PyArg_ParseTupleAndKeywords`, and hasn't been converted + to work with Argument Clinic yet. + For my example I'm using ``pickle.Pickler.dump()``. + +2. If the call to the ``PyArg_Parse`` function uses any of the + following format units:: + + O& + O! + es + es# + et + et# + + or if it has multiple calls to :c:func:`PyArg_ParseTuple`, + you should choose a different function. Argument Clinic *does* + support all of these scenarios. But these are advanced + topics--let's do something simpler for your first function. + + Also, if the function has multiple calls to :c:func:`PyArg_ParseTuple` + or :c:func:`PyArg_ParseTupleAndKeywords` where it supports different + types for the same argument, or if the function uses something besides + PyArg_Parse functions to parse its arguments, it probably + isn't suitable for conversion to Argument Clinic. Argument Clinic + doesn't support generic functions or polymorphic parameters. + +3. Add the following boilerplate above the function, creating our block:: + + /*[clinic input] + [clinic start generated code]*/ + +4. Cut the docstring and paste it in between the ``[clinic]`` lines, + removing all the junk that makes it a properly quoted C string. + When you're done you should have just the text, based at the left + margin, with no line wider than 80 characters. + (Argument Clinic will preserve indents inside the docstring.) + + If the old docstring had a first line that looked like a function + signature, throw that line away. (The docstring doesn't need it + anymore--when you use ``help()`` on your builtin in the future, + the first line will be built automatically based on the function's + signature.) + + Sample:: + + /*[clinic input] + Write a pickled representation of obj to the open file. + [clinic start generated code]*/ + +5. If your docstring doesn't have a "summary" line, Argument Clinic will + complain. So let's make sure it has one. The "summary" line should + be a paragraph consisting of a single 80-column line + at the beginning of the docstring. + + (Our example docstring consists solely of a summary line, so the sample + code doesn't have to change for this step.) + +6. Above the docstring, enter the name of the function, followed + by a blank line. This should be the Python name of the function, + and should be the full dotted path + to the function--it should start with the name of the module, + include any sub-modules, and if the function is a method on + a class it should include the class name too. + + Sample:: + + /*[clinic input] + pickle.Pickler.dump + + Write a pickled representation of obj to the open file. + [clinic start generated code]*/ + +7. If this is the first time that module or class has been used with Argument + Clinic in this C file, + you must declare the module and/or class. Proper Argument Clinic hygiene + prefers declaring these in a separate block somewhere near the + top of the C file, in the same way that include files and statics go at + the top. (In our sample code we'll just show the two blocks next to + each other.) + + Sample:: + + /*[clinic input] + module pickle + class pickle.Pickler + [clinic start generated code]*/ + + /*[clinic input] + pickle.Pickler.dump + + Write a pickled representation of obj to the open file. + [clinic start generated code]*/ + + The name of the class and module should be the same as the one + seen by Python. Check the name defined in the :c:type:`PyModuleDef` + or :c:type:`PyTypeObject` as appropriate. + + + +8. Declare each of the parameters to the function. Each parameter + should get its own line. All the parameter lines should be + indented from the function name and the docstring. + + The general form of these parameter lines is as follows:: + + name_of_parameter: converter + + If the parameter has a default value, add that after the + converter:: + + name_of_parameter: converter = default_value + + Argument Clinic's support for "default values" is quite sophisticated; + please see :ref:`the section below on default values <default_values>` + for more information. + + Add a blank line below the parameters. + + What's a "converter"? It establishes both the type + of the variable used in C, and the method to convert the Python + value into a C value at runtime. + For now you're going to use what's called a "legacy converter"--a + convenience syntax intended to make porting old code into Argument + Clinic easier. + + For each parameter, copy the "format unit" for that + parameter from the ``PyArg_Parse()`` format argument and + specify *that* as its converter, as a quoted + string. ("format unit" is the formal name for the one-to-three + character substring of the ``format`` parameter that tells + the argument parsing function what the type of the variable + is and how to convert it. For more on format units please + see :ref:`arg-parsing`.) + + For multicharacter format units like ``z#``, use the + entire two-or-three character string. + + Sample:: + + /*[clinic input] + module pickle + class pickle.Pickler + [clinic start generated code]*/ + + /*[clinic input] + pickle.Pickler.dump + + obj: 'O' + + Write a pickled representation of obj to the open file. + [clinic start generated code]*/ + +9. If your function has ``|`` in the format string, meaning some + parameters have default values, you can ignore it. Argument + Clinic infers which parameters are optional based on whether + or not they have default values. + + If your function has ``$`` in the format string, meaning it + takes keyword-only arguments, specify ``*`` on a line by + itself before the first keyword-only argument, indented the + same as the parameter lines. + + (``pickle.Pickler.dump`` has neither, so our sample is unchanged.) + + +10. If the existing C function calls :c:func:`PyArg_ParseTuple` + (as opposed to :c:func:`PyArg_ParseTupleAndKeywords`), then all its + arguments are positional-only. + + To mark all parameters as positional-only in Argument Clinic, + add a ``/`` on a line by itself after the last parameter, + indented the same as the parameter lines. + + Currently this is all-or-nothing; either all parameters are + positional-only, or none of them are. (In the future Argument + Clinic may relax this restriction.) + + Sample:: + + /*[clinic input] + module pickle + class pickle.Pickler + [clinic start generated code]*/ + + /*[clinic input] + pickle.Pickler.dump + + obj: 'O' + / + + Write a pickled representation of obj to the open file. + [clinic start generated code]*/ + +11. It's helpful to write a per-parameter docstring for each parameter. + But per-parameter docstrings are optional; you can skip this step + if you prefer. + + Here's how to add a per-parameter docstring. The first line + of the per-parameter docstring must be indented further than the + parameter definition. The left margin of this first line establishes + the left margin for the whole per-parameter docstring; all the text + you write will be outdented by this amount. You can write as much + text as you like, across multiple lines if you wish. + + Sample:: + + /*[clinic input] + module pickle + class pickle.Pickler + [clinic start generated code]*/ + + /*[clinic input] + pickle.Pickler.dump + + obj: 'O' + The object to be pickled. + / + + Write a pickled representation of obj to the open file. + [clinic start generated code]*/ + +12. Save and close the file, then run ``Tools/clinic/clinic.py`` on it. + With luck everything worked and your block now has output! Reopen + the file in your text editor to see:: + + /*[clinic input] + module pickle + class pickle.Pickler + [clinic start generated code]*/ + /*[clinic end generated code: checksum=da39a3ee5e6b4b0d3255bfef95601890afd80709]*/ + + /*[clinic input] + pickle.Pickler.dump + + obj: 'O' + The object to be pickled. + / + + Write a pickled representation of obj to the open file. + [clinic start generated code]*/ + + PyDoc_STRVAR(pickle_Pickler_dump__doc__, + "Write a pickled representation of obj to the open file.\n" + "\n" + ... + static PyObject * + pickle_Pickler_dump_impl(PyObject *self, PyObject *obj) + /*[clinic end generated code: checksum=3bd30745bf206a48f8b576a1da3d90f55a0a4187]*/ + + Obviously, if Argument Clinic didn't produce any output, it's because + it found an error in your input. Keep fixing your errors and retrying + until Argument Clinic processes your file without complaint. + +13. Double-check that the argument-parsing code Argument Clinic generated + looks basically the same as the existing code. + + First, ensure both places use the same argument-parsing function. + The existing code must call either + :c:func:`PyArg_ParseTuple` or :c:func:`PyArg_ParseTupleAndKeywords`; + ensure that the code generated by Argument Clinic calls the + *exact* same function. + + Second, the format string passed in to :c:func:`PyArg_ParseTuple` or + :c:func:`PyArg_ParseTupleAndKeywords` should be *exactly* the same + as the hand-written one in the existing function, up to the colon + or semi-colon. + + (Argument Clinic always generates its format strings + with a ``:`` followed by the name of the function. If the + existing code's format string ends with ``;``, to provide + usage help, this change is harmless--don't worry about it.) + + Third, for parameters whose format units require two arguments + (like a length variable, or an encoding string, or a pointer + to a conversion function), ensure that the second argument is + *exactly* the same between the two invocations. + + Fourth, inside the output portion of the block you'll find a preprocessor + macro defining the appropriate static :c:type:`PyMethodDef` structure for + this builtin:: + + #define _PICKLE_PICKLER_DUMP_METHODDEF \ + {"dump", (PyCFunction)_pickle_Pickler_dump, METH_O, _pickle_Pickler_dump__doc__}, + + This static structure should be *exactly* the same as the existing static + :c:type:`PyMethodDef` structure for this builtin. + + If any of these items differ in *any way*, + adjust your Argument Clinic function specification and rerun + ``Tools/clinic/clinic.py`` until they *are* the same. + + +14. Notice that the last line of its output is the declaration + of your "impl" function. This is where the builtin's implementation goes. + Delete the existing prototype of the function you're modifying, but leave + the opening curly brace. Now delete its argument parsing code and the + declarations of all the variables it dumps the arguments into. + Notice how the Python arguments are now arguments to this impl function; + if the implementation used different names for these variables, fix it. + + Let's reiterate, just because it's kind of weird. Your code should now + look like this:: + + static return_type + your_function_impl(...) + /*[clinic end generated code: checksum=...]*/ + { + ... + + Argument Clinic generated the checksum line and the function prototype just + above it. You should write the opening (and closing) curly braces for the + function, and the implementation inside. + + Sample:: + + /*[clinic input] + module pickle + class pickle.Pickler + [clinic start generated code]*/ + /*[clinic end generated code: checksum=da39a3ee5e6b4b0d3255bfef95601890afd80709]*/ + + /*[clinic input] + pickle.Pickler.dump + + obj: 'O' + The object to be pickled. + / + + Write a pickled representation of obj to the open file. + [clinic start generated code]*/ + + PyDoc_STRVAR(pickle_Pickler_dump__doc__, + "Write a pickled representation of obj to the open file.\n" + "\n" + ... + static PyObject * + pickle_Pickler_dump_impl(PyObject *self, PyObject *obj) + /*[clinic end generated code: checksum=3bd30745bf206a48f8b576a1da3d90f55a0a4187]*/ + { + /* Check whether the Pickler was initialized correctly (issue3664). + Developers often forget to call __init__() in their subclasses, which + would trigger a segfault without this check. */ + if (self->write == NULL) { + PyErr_Format(PicklingError, + "Pickler.__init__() was not called by %s.__init__()", + Py_TYPE(self)->tp_name); + return NULL; + } + + if (_Pickler_ClearBuffer(self) < 0) + return NULL; + + ... + +15. Remember the macro with the :c:type:`PyMethodDef` structure for this + function? Find the existing :c:type:`PyMethodDef` structure for this + function and replace it with a reference to the macro. (If the builtin + is at module scope, this will probably be very near the end of the file; + if the builtin is a class method, this will probably be below but relatively + near to the implementation.) + + Note that the body of the macro contains a trailing comma. So when you + replace the existing static :c:type:`PyMethodDef` structure with the macro, + *don't* add a comma to the end. + + Sample:: + + static struct PyMethodDef Pickler_methods[] = { + _PICKLE_PICKLER_DUMP_METHODDEF + _PICKLE_PICKLER_CLEAR_MEMO_METHODDEF + {NULL, NULL} /* sentinel */ + }; + + +16. Compile, then run the relevant portions of the regression-test suite. + This change should not introduce any new compile-time warnings or errors, + and there should be no externally-visible change to Python's behavior. + + Well, except for one difference: ``inspect.signature()`` run on your function + should now provide a valid signature! + + Congratulations, you've ported your first function to work with Argument Clinic! + +=============== +Advanced Topics +=============== + +Now that you've had some experience working with Argument Clinic, it's time +for some advanced topics. + + +Symbolic default values +----------------------- + +The default value you provide for a parameter can't be any arbitrary +expression. Currently the following are explicitly supported: + +* Numeric constants (integer and float) +* String constants +* ``True``, ``False``, and ``None`` +* Simple symbolic constants like ``sys.maxsize``, which must + start with the name of the module + +In case you're curious, this is implemented in ``from_builtin()`` +in ``Lib/inspect.py``. + +(In the future, this may need to get even more elaborate, +to allow full expressions like ``CONSTANT - 1``.) + + +Renaming the C functions generated by Argument Clinic +----------------------------------------------------- + +Argument Clinic automatically names the functions it generates for you. +Occasionally this may cause a problem, if the generated name collides with +the name of an existing C function. There's an easy solution: override the names +used for the C functions. Just add the keyword ``"as"`` +to your function declaration line, followed by the function name you wish to use. +Argument Clinic will use that function name for the base (generated) function, +then add ``"_impl"`` to the end and use that for the name of the impl function. + +For example, if we wanted to rename the C function names generated for +``pickle.Pickler.dump``, it'd look like this:: + + /*[clinic input] + pickle.Pickler.dump as pickler_dumper + + ... + +The base function would now be named ``pickler_dumper()``, +and the impl function would now be named ``pickler_dumper_impl()``. + + + +Converting functions using PyArg_UnpackTuple +-------------------------------------------- + +To convert a function parsing its arguments with :c:func:`PyArg_UnpackTuple`, +simply write out all the arguments, specifying each as an ``object``. You +may specify the ``type`` argument to cast the type as appropriate. All +arguments should be marked positional-only (add a ``/`` on a line by itself +after the last argument). + +Currently the generated code will use :c:func:`PyArg_ParseTuple`, but this +will change soon. + +Optional Groups +--------------- + +Some legacy functions have a tricky approach to parsing their arguments: +they count the number of positional arguments, then use a ``switch`` statement +to call one of several different :c:func:`PyArg_ParseTuple` calls depending on +how many positional arguments there are. (These functions cannot accept +keyword-only arguments.) This approach was used to simulate optional +arguments back before :c:func:`PyArg_ParseTupleAndKeywords` was created. + +While functions using this approach can often be converted to +use :c:func:`PyArg_ParseTupleAndKeywords`, optional arguments, and default values, +it's not always possible. Some of these legacy functions have +behaviors :c:func:`PyArg_ParseTupleAndKeywords` doesn't directly support. +The most obvious example is the builtin function ``range()``, which has +an optional argument on the *left* side of its required argument! +Another example is ``curses.window.addch()``, which has a group of two +arguments that must always be specified together. (The arguments are +called ``x`` and ``y``; if you call the function passing in ``x``, +you must also pass in ``y``--and if you don't pass in ``x`` you may not +pass in ``y`` either.) + +In any case, the goal of Argument Clinic is to support argument parsing +for all existing CPython builtins without changing their semantics. +Therefore Argument Clinic supports +this alternate approach to parsing, using what are called *optional groups*. +Optional groups are groups of arguments that must all be passed in together. +They can be to the left or the right of the required arguments. They +can *only* be used with positional-only parameters. + +To specify an optional group, add a ``[`` on a line by itself before +the parameters you wish to group together, and a ``]`` on a line by itself +after these parameters. As an example, here's how ``curses.window.addch`` +uses optional groups to make the first two parameters and the last +parameter optional:: + + /*[clinic input] + + curses.window.addch + + [ + x: int + X-coordinate. + y: int + Y-coordinate. + ] + + ch: object + Character to add. + + [ + attr: long + Attributes for the character. + ] + / + + ... + + +Notes: + +* For every optional group, one additional parameter will be passed into the + impl function representing the group. The parameter will be an int named + ``group_{direction}_{number}``, + where ``{direction}`` is either ``right`` or ``left`` depending on whether the group + is before or after the required parameters, and ``{number}`` is a monotonically + increasing number (starting at 1) indicating how far away the group is from + the required parameters. When the impl is called, this parameter will be set + to zero if this group was unused, and set to non-zero if this group was used. + (By used or unused, I mean whether or not the parameters received arguments + in this invocation.) + +* If there are no required arguments, the optional groups will behave + as if they're to the right of the required arguments. + +* In the case of ambiguity, the argument parsing code + favors parameters on the left (before the required parameters). + +* Optional groups can only contain positional-only parameters. + +* Optional groups are *only* intended for legacy code. Please do not + use optional groups for new code. + + +Using real Argument Clinic converters, instead of "legacy converters" +--------------------------------------------------------------------- + +To save time, and to minimize how much you need to learn +to achieve your first port to Argument Clinic, the walkthrough above tells +you to use "legacy converters". "Legacy converters" are a convenience, +designed explicitly to make porting existing code to Argument Clinic +easier. And to be clear, their use is acceptable when porting code for +Python 3.4. + +However, in the long term we probably want all our blocks to +use Argument Clinic's real syntax for converters. Why? A couple +reasons: + +* The proper converters are far easier to read and clearer in their intent. +* There are some format units that are unsupported as "legacy converters", + because they require arguments, and the legacy converter syntax doesn't + support specifying arguments. +* In the future we may have a new argument parsing library that isn't + restricted to what :c:func:`PyArg_ParseTuple` supports; this flexibility + won't be available to parameters using legacy converters. + +Therefore, if you don't mind a little extra effort, please use the normal +converters instead of legacy converters. + +In a nutshell, the syntax for Argument Clinic (non-legacy) converters +looks like a Python function call. However, if there are no explicit +arguments to the function (all functions take their default values), +you may omit the parentheses. Thus ``bool`` and ``bool()`` are exactly +the same converters. + +All arguments to Argument Clinic converters are keyword-only. +All Argument Clinic converters accept the following arguments: + + ``c_default`` + The default value for this parameter when defined in C. + Specifically, this will be the initializer for the variable declared + in the "parse function". See :ref:`the section on default values <default_values>` + for how to use this. + Specified as a string. + + ``annotation`` + The annotation value for this parameter. Not currently supported, + because PEP 8 mandates that the Python library may not use + annotations. + +In addition, some converters accept additional arguments. Here is a list +of these arguments, along with their meanings: + + ``bitwise`` + Only supported for unsigned integers. The native integer value of this + Python argument will be written to the parameter without any range checking, + even for negative values. + + ``converter`` + Only supported by the ``object`` converter. Specifies the name of a + :ref:`C "converter function" <o_ampersand>` + to use to convert this object to a native type. + + ``encoding`` + Only supported for strings. Specifies the encoding to use when converting + this string from a Python str (Unicode) value into a C ``char *`` value. + + ``length`` + Only supported for strings. If true, requests that the length of the + string be passed in to the impl function, just after the string parameter, + in a parameter named ``<parameter_name>_length``. + + ``nullable`` + Only supported for strings. If true, this parameter may also be set to + ``None``, in which case the C parameter will be set to ``NULL``. + + ``subclass_of`` + Only supported for the ``object`` converter. Requires that the Python + value be a subclass of a Python type, as expressed in C. + + ``types`` + Only supported for the ``object`` (and ``self``) converter. Specifies + the C type that will be used to declare the variable. Default value is + ``"PyObject *"``. + + ``types`` + A string containing a list of Python types (and possibly pseudo-types); + this restricts the allowable Python argument to values of these types. + (This is not a general-purpose facility; as a rule it only supports + specific lists of types as shown in the legacy converter table.) + + ``zeroes`` + Only supported for strings. If true, embedded NUL bytes (``'\\0'``) are + permitted inside the value. + +Please note, not every possible combination of arguments will work. +Often these arguments are implemented internally by specific ``PyArg_ParseTuple`` +*format units*, with specific behavior. For example, currently you cannot +call ``str`` and pass in ``zeroes=True`` without also specifying an ``encoding``; +although it's perfectly reasonable to think this would work, these semantics don't +map to any existing format unit. So Argument Clinic doesn't support it. (Or, at +least, not yet.) + +Below is a table showing the mapping of legacy converters into real +Argument Clinic converters. On the left is the legacy converter, +on the right is the text you'd replace it with. + +========= ================================================================================= +``'B'`` ``unsigned_char(bitwise=True)`` +``'b'`` ``unsigned_char`` +``'c'`` ``char`` +``'C'`` ``int(types='str')`` +``'d'`` ``double`` +``'D'`` ``Py_complex`` +``'es#'`` ``str(encoding='name_of_encoding', length=True, zeroes=True)`` +``'es'`` ``str(encoding='name_of_encoding')`` +``'et#'`` ``str(encoding='name_of_encoding', types='bytes bytearray str', length=True)`` +``'et'`` ``str(encoding='name_of_encoding', types='bytes bytearray str')`` +``'f'`` ``float`` +``'h'`` ``short`` +``'H'`` ``unsigned_short(bitwise=True)`` +``'i'`` ``int`` +``'I'`` ``unsigned_int(bitwise=True)`` +``'k'`` ``unsigned_long(bitwise=True)`` +``'K'`` ``unsigned_PY_LONG_LONG(bitwise=True)`` +``'L'`` ``PY_LONG_LONG`` +``'n'`` ``Py_ssize_t`` +``'O!'`` ``object(subclass_of='&PySomething_Type')`` +``'O&'`` ``object(converter='name_of_c_function')`` +``'O'`` ``object`` +``'p'`` ``bool`` +``'s#'`` ``str(length=True)`` +``'S'`` ``PyBytesObject`` +``'s'`` ``str`` +``'s*'`` ``Py_buffer(types='str bytes bytearray buffer')`` +``'u#'`` ``Py_UNICODE(length=True)`` +``'u'`` ``Py_UNICODE`` +``'U'`` ``unicode`` +``'w*'`` ``Py_buffer(types='bytearray rwbuffer')`` +``'y#'`` ``str(types='bytes', length=True)`` +``'Y'`` ``PyByteArrayObject`` +``'y'`` ``str(types='bytes')`` +``'y*'`` ``Py_buffer`` +``'Z#'`` ``Py_UNICODE(nullable=True, length=True)`` +``'z#'`` ``str(nullable=True, length=True)`` +``'Z'`` ``Py_UNICODE(nullable=True)`` +``'z'`` ``str(nullable=True)`` +``'z*'`` ``Py_buffer(types='str bytes bytearray buffer', nullable=True)`` +========= ================================================================================= + +As an example, here's our sample ``pickle.Pickler.dump`` using the proper +converter:: + + /*[clinic input] + pickle.Pickler.dump + + obj: object + The object to be pickled. + / + + Write a pickled representation of obj to the open file. + [clinic start generated code]*/ + +Argument Clinic will show you all the converters it has +available. For each converter it'll show you all the parameters +it accepts, along with the default value for each parameter. +Just run ``Tools/clinic/clinic.py --converters`` to see the full list. + +Py_buffer +--------- + +When using the ``Py_buffer`` converter +(or the ``'s*'``, ``'w*'``, ``'*y'``, or ``'z*'`` legacy converters), +you *must* not call :c:func:`PyBuffer_Release` on the provided buffer. +Argument Clinic generates code that does it for you (in the parsing function). + + + +Advanced converters +------------------- + +Remeber those format units you skipped for your first +time because they were advanced? Here's how to handle those too. + +The trick is, all those format units take arguments--either +conversion functions, or types, or strings specifying an encoding. +(But "legacy converters" don't support arguments. That's why we +skipped them for your first function.) The argument you specified +to the format unit is now an argument to the converter; this +argument is either ``converter`` (for ``O&``), ``subclass_of`` (for ``O!``), +or ``encoding`` (for all the format units that start with ``e``). + +When using ``subclass_of``, you may also want to use the other +custom argument for ``object()``: ``type``, which lets you set the type +actually used for the parameter. For example, if you want to ensure +that the object is a subclass of ``PyUnicode_Type``, you probably want +to use the converter ``object(type='PyUnicodeObject *', subclass_of='&PyUnicode_Type')``. + +One possible problem with using Argument Clinic: it takes away some possible +flexibility for the format units starting with ``e``. When writing a +``PyArg_Parse`` call by hand, you could theoretically decide at runtime what +encoding string to pass in to :c:func:`PyArg_ParseTuple`. But now this string must +be hard-coded at Argument-Clinic-preprocessing-time. This limitation is deliberate; +it made supporting this format unit much easier, and may allow for future optimizations. +This restriction doesn't seem unreasonable; CPython itself always passes in static +hard-coded encoding strings for parameters whose format units start with ``e``. + + +.. _default_values: + +Parameter default values +------------------------ + +Default values for parameters can be any of a number of values. +At their simplest, they can be string, int, or float literals:: + + foo: str = "abc" + bar: int = 123 + bat: float = 45.6 + +They can also use any of Python's built-in constants:: + + yep: bool = True + nope: bool = False + nada: object = None + +There's also special support for a default value of ``NULL``, and +for simple expressions, documented in the following sections. + + +The ``NULL`` default value +-------------------------- + +For string and object parameters, you can set them to ``None`` to indicate +that there's no default. However, that means the C variable will be +initialized to ``Py_None``. For convenience's sakes, there's a special +value called ``NULL`` for just this reason: from Python's perspective it +behaves like a default value of ``None``, but the C variable is initialized +with ``NULL``. + +Expressions specified as default values +--------------------------------------- + +The default value for a parameter can be more than just a literal value. +It can be an entire expression, using math operators and looking up attributes +on objects. However, this support isn't exactly simple, because of some +non-obvious semantics. + +Consider the following example:: + + foo: Py_ssize_t = sys.maxsize - 1 + +``sys.maxsize`` can have different values on different platforms. Therefore +Argument Clinic can't simply evaluate that expression locally and hard-code it +in C. So it stores the default in such a way that it will get evaluated at +runtime, when the user asks for the function's signature. + +What namespace is available when the expression is evaluated? It's evaluated +in the context of the module the builtin came from. So, if your module has an +attribute called "``max_widgets``", you may simply use it:: + + foo: Py_ssize_t = max_widgets + +If the symbol isn't found in the current module, it fails over to looking in +``sys.modules``. That's how it can find ``sys.maxsize`` for example. (Since you +don't know in advance what modules the user will load into their interpreter, +it's best to restrict yourself to modules that are preloaded by Python itself.) + +Evaluating default values only at runtime means Argument Clinic can't compute +the correct equivalent C default value. So you need to tell it explicitly. +When you use an expression, you must also specify the equivalent expression +in C, using the ``c_default`` parameter to the converter:: + + foo: Py_ssize_t(c_default="PY_SSIZE_T_MAX - 1") = sys.maxsize - 1 + +Another complication: Argument Clinic can't know in advance whether or not the +expression you supply is valid. It parses it to make sure it looks legal, but +it can't *actually* know. You must be very careful when using expressions to +specify values that are guaranteed to be valid at runtime! + +Finally, because expressions must be representable as static C values, there +are many restrictions on legal expressions. Here's a list of Python features +you're not permitted to use: + +* Function calls. +* Inline if statements (``3 if foo else 5``). +* Automatic sequence unpacking (``*[1, 2, 3]``). +* List/set/dict comprehensions and generator expressions. +* Tuple/list/set/dict literals. + + + +Using a return converter +------------------------ + +By default the impl function Argument Clinic generates for you returns ``PyObject *``. +But your C function often computes some C type, then converts it into the ``PyObject *`` +at the last moment. Argument Clinic handles converting your inputs from Python types +into native C types--why not have it convert your return value from a native C type +into a Python type too? + +That's what a "return converter" does. It changes your impl function to return +some C type, then adds code to the generated (non-impl) function to handle converting +that value into the appropriate ``PyObject *``. + +The syntax for return converters is similar to that of parameter converters. +You specify the return converter like it was a return annotation on the +function itself. Return converters behave much the same as parameter converters; +they take arguments, the arguments are all keyword-only, and if you're not changing +any of the default arguments you can omit the parentheses. + +(If you use both ``"as"`` *and* a return converter for your function, +the ``"as"`` should come before the return converter.) + +There's one additional complication when using return converters: how do you +indicate an error has occured? Normally, a function returns a valid (non-``NULL``) +pointer for success, and ``NULL`` for failure. But if you use an integer return converter, +all integers are valid. How can Argument Clinic detect an error? Its solution: each return +converter implicitly looks for a special value that indicates an error. If you return +that value, and an error has been set (``PyErr_Occurred()`` returns a true +value), then the generated code will propogate the error. Otherwise it will +encode the value you return like normal. + +Currently Argument Clinic supports only a few return converters:: + + bool + int + unsigned int + long + unsigned int + size_t + Py_ssize_t + float + double + DecodeFSDefault + +None of these take parameters. For the first three, return -1 to indicate +error. For ``DecodeFSDefault``, the return type is ``char *``; return a NULL +pointer to indicate an error. + +(There's also an experimental ``NoneType`` converter, which lets you +return ``Py_None`` on success or ``NULL`` on failure, without having +to increment the reference count on ``Py_None``. I'm not sure it adds +enough clarity to be worth using.) + +To see all the return converters Argument Clinic supports, along with +their parameters (if any), +just run ``Tools/clinic/clinic.py --converters`` for the full list. + + +Cloning existing functions +-------------------------- + +If you have a number of functions that look similar, you may be able to +use Clinic's "clone" feature. When you clone an existing function, +you reuse: + +* its parameters, including + + * their names, + + * their converters, with all parameters, + + * their default values, + + * their per-parameter docstrings, + + * their *kind* (whether they're positional only, + positional or keyword, or keyword only), and + +* its return converter. + +The only thing not copied from the original function is its docstring; +the syntax allows you to specify a new docstring. + +Here's the syntax for cloning a function:: + + /*[clinic input] + module.class.new_function [as c_basename] = module.class.existing_function + + Docstring for new_function goes here. + [clinic start generated code]*/ + +(The functions can be in different modules or classes. I wrote +``module.class`` in the sample just to illustrate that you must +use the full path to *both* functions.) + +Sorry, there's no syntax for partially-cloning a function, or cloning a function +then modifying it. Cloning is an all-or nothing proposition. + +Also, the function you are cloning from must have been previously defined +in the current file. + +Calling Python code +------------------- + +The rest of the advanced topics require you to write Python code +which lives inside your C file and modifies Argument Clinic's +runtime state. This is simple: you simply define a Python block. + +A Python block uses different delimiter lines than an Argument +Clinic function block. It looks like this:: + + /*[python input] + # python code goes here + [python start generated code]*/ + +All the code inside the Python block is executed at the +time it's parsed. All text written to stdout inside the block +is redirected into the "output" after the block. + +As an example, here's a Python block that adds a static integer +variable to the C code:: + + /*[python input] + print('static int __ignored_unused_variable__ = 0;') + [python start generated code]*/ + static int __ignored_unused_variable__ = 0; + /*[python checksum:...]*/ + + +Using a "self converter" +------------------------ + +Argument Clinic automatically adds a "self" parameter for you +using a default converter. However, you can override +Argument Clinic's converter and specify one yourself. +Just add your own ``self`` parameter as the first parameter in a +block, and ensure that its converter is an instance of +``self_converter`` or a subclass thereof. + +What's the point? This lets you automatically cast ``self`` +from ``PyObject *`` to a custom type, just like ``object()`` +does with its ``type`` parameter. + +How do you specify the custom type you want to cast ``self`` to? +If you only have one or two functions with the same type for ``self``, +you can directly use Argument Clinic's existing ``self`` converter, +passing in the type you want to use as the ``type`` parameter:: + + /*[clinic input] + + _pickle.Pickler.dump + + self: self(type="PicklerObject *") + obj: object + / + + Write a pickled representation of the given object to the open file. + [clinic start generated code]*/ + +On the other hand, if you have a lot of functions that will use the same +type for ``self``, it's best to create your own converter, subclassing +``self_converter`` but overwriting the ``type`` member:: + + /*[python input] + class PicklerObject_converter(self_converter): + type = "PicklerObject *" + [python start generated code]*/ + + /*[clinic input] + + _pickle.Pickler.dump + + self: PicklerObject + obj: object + / + + Write a pickled representation of the given object to the open file. + [clinic start generated code]*/ + + + +Writing a custom converter +-------------------------- + +As we hinted at in the previous section... you can write your own converters! +A converter is simply a Python class that inherits from ``CConverter``. +The main purpose of a custom converter is if you have a parameter using +the ``O&`` format unit--parsing this parameter means calling +a :c:func:`PyArg_ParseTuple` "converter function". + +Your converter class should be named ``*something*_converter``. +If the name follows this convention, then your converter class +will be automatically registered with Argument Clinic; its name +will be the name of your class with the ``_converter`` suffix +stripped off. (This is accomplished with a metaclass.) + +You shouldn't subclass ``CConverter.__init__``. Instead, you should +write a ``converter_init()`` function. ``converter_init()`` +always accepts a ``self`` parameter; after that, all additional +parameters *must* be keyword-only. Any arguments passed in to +the converter in Argument Clinic will be passed along to your +``converter_init()``. + +There are some additional members of ``CConverter`` you may wish +to specify in your subclass. Here's the current list: + +``type`` + The C type to use for this variable. + ``type`` should be a Python string specifying the type, e.g. ``int``. + If this is a pointer type, the type string should end with ``' *'``. + +``default`` + The Python default value for this parameter, as a Python value. + Or the magic value ``unspecified`` if there is no default. + +``py_default`` + ``default`` as it should appear in Python code, + as a string. + Or ``None`` if there is no default. + +``c_default`` + ``default`` as it should appear in C code, + as a string. + Or ``None`` if there is no default. + +``c_ignored_default`` + The default value used to initialize the C variable when + there is no default, but not specifying a default may + result in an "uninitialized variable" warning. This can + easily happen when using option groups--although + properly-written code will never actually use this value, + the variable does get passed in to the impl, and the + C compiler will complain about the "use" of the + uninitialized value. This value should always be a + non-empty string. + +``converter`` + The name of the C converter function, as a string. + +``impl_by_reference`` + A boolean value. If true, + Argument Clinic will add a ``&`` in front of the name of + the variable when passing it into the impl function. + +``parse_by_reference`` + A boolean value. If true, + Argument Clinic will add a ``&`` in front of the name of + the variable when passing it into :c:func:`PyArg_ParseTuple`. + + +Here's the simplest example of a custom converter, from ``Modules/zlibmodule.c``:: + + /*[python input] + + class uint_converter(CConverter): + type = 'unsigned int' + converter = 'uint_converter' + + [python start generated code]*/ + /*[python end generated code: checksum=da39a3ee5e6b4b0d3255bfef95601890afd80709]*/ + +This block adds a converter to Argument Clinic named ``uint``. Parameters +declared as ``uint`` will be declared as type ``unsigned int``, and will +be parsed by the ``'O&'`` format unit, which will call the ``uint_converter`` +converter function. +``uint`` variables automatically support default values. + +More sophisticated custom converters can insert custom C code to +handle initialization and cleanup. +You can see more examples of custom converters in the CPython +source tree; grep the C files for the string ``CConverter``. + +Writing a custom return converter +--------------------------------- + +Writing a custom return converter is much like writing +a custom converter. Except it's somewhat simpler, because return +converters are themselves much simpler. + +Return converters must subclass ``CReturnConverter``. +There are no examples yet of custom return converters, +because they are not widely used yet. If you wish to +write your own return converter, please read ``Tools/clinic/clinic.py``, +specifically the implementation of ``CReturnConverter`` and +all its subclasses. + +METH_O and METH_NOARGS +---------------------------------------------- + +To convert a function using ``METH_O``, make sure the function's +single argument is using the ``object`` converter, and mark the +arguments as positional-only:: + + /*[clinic input] + meth_o_sample + + argument: object + / + [clinic start generated code]*/ + + +To convert a function using ``METH_NOARGS``, just don't specify +any arguments. + +You can still use a self converter, a return converter, and specify +a ``type`` argument to the object converter for ``METH_O``. + +tp_new and tp_init functions +---------------------------------------------- + +You can convert ``tp_new`` and ``tp_init`` +functions. Just name them ``__new__`` or +``__init__`` as appropriate. Notes: + +* The function name generated for ``__new__`` doesn't end in ``__new__`` + like it would by default. It's just the name of the class, converted + into a valid C identifier. + +* No ``PyMethodDef`` ``#define`` is generated for these functions. + +* ``__init__`` functions return ``int``, not ``PyObject *``. + +* Argument Clinic supports any signature for these functions, even though + the parsing function is required to always take ``args`` and ``kwargs`` + objects. + +The #ifdef trick +---------------------------------------------- + +If you're converting a function that isn't available on all platforms, +there's a trick you can use to make life a little easier. The existing +code probably looks like this:: + + #ifdef HAVE_FUNCTIONNAME + static module_functionname(...) + { + ... + } + #endif /* HAVE_FUNCTIONNAME */ + +And then in the ``PyMethodDef`` structure at the bottom you'll have:: + + #ifdef HAVE_FUNCTIONNAME + {'functionname', ... }, + #endif /* HAVE_FUNCTIONNAME */ + +In this scenario, you should change the code to look like the following:: + + #ifdef HAVE_FUNCTIONNAME + /*[clinic input] + module.functionname + ... + [clinic start generated code]*/ + static module_functionname(...) + { + ... + } + #endif /* HAVE_FUNCTIONNAME */ + +Run Argument Clinic on the code in this state, then refresh the file in +your editor. Now you'll have the generated code, including the #define +for the ``PyMethodDef``, like so:: + + #ifdef HAVE_FUNCTIONNAME + /*[clinic input] + ... + [clinic start generated code]*/ + ... + #define MODULE_FUNCTIONNAME \ + {'functionname', ... }, + ... + /*[clinic end generated code: checksum=...]*/ + static module_functionname(...) + { + ... + } + #endif /* HAVE_FUNCTIONNAME */ + +Change the #endif at the bottom as follows:: + + #else + #define MODULE_FUNCTIONNAME + #endif /* HAVE_FUNCTIONNAME */ + +Now you can remove the #ifdefs around the ``PyMethodDef`` structure +at the end, and replace those three lines with ``MODULE_FUNCTIONNAME``. +If the function is available, the macro turns into the ``PyMethodDef`` +static value, including the trailing comma; if the function isn't +available, the macro turns into nothing. Perfect! + +(This is the preferred approach for optional functions; in the future, +Argument Clinic may generate the entire ``PyMethodDef`` structure.) + + +Changing and redirecting Clinic's output +---------------------------------------- + +It can be inconvenient to have Clinic's output interspersed with +your conventional hand-edited C code. Luckily, Clinic is configurable: +you can buffer up its output for printing later (or earlier!), or write +its output to a separate file. You can also add a prefix or suffix to +every line of Clinic's generated output. + +While changing Clinic's output in this manner can be a boon to readability, +it may result in Clinic code using types before they are defined, or +your code attempting to use Clinic-generated code befire it is defined. +These problems can be easily solved by rearranging the declarations in your file, +or moving where Clinic's generated code goes. (This is why the default behavior +of Clinic is to output everything into the current block; while many people +consider this hampers readability, it will never require rearranging your +code to fix definition-before-use problems.) + +Let's start with defining some terminology: + +*field* + A field, in this context, is a subsection of Clinic's output. + For example, the ``#define`` for the ``PyMethodDef`` structure + is a field, called ``methoddef_define``. Clinic has seven + different fields it can output per function definition:: + + docstring_prototype + docstring_definition + methoddef_define + impl_prototype + parser_prototype + parser_definition + impl_definition + + All the names are of the form ``"<a>_<b>"``, + where ``"<a>"`` is the semantic object represented (the parsing function, + the impl function, the docstring, or the methoddef structure) and ``"<b>"`` + represents what kind of statement the field is. Field names that end in + ``"_prototype"`` + represent forward declarations of that thing, without the actual body/data + of the thing; field names that end in ``"_definition"`` represent the actual + definition of the thing, with the body/data of the thing. (``"methoddef"`` + is special, it's the only one that ends with ``"_define"``, representing that + it's a preprocessor #define.) + +*destination* + A destination is a place Clinic can write output to. There are + five built-in destinations: + + ``block`` + The default destination: printed in the output section of + the current Clinic block. + + ``buffer`` + A text buffer where you can save text for later. Text sent + here is appended to the end of any exsiting text. It's an + error to have any text left in the buffer when Clinic finishes + processing a file. + + ``file`` + A separate "clinic file" that will be created automatically by Clinic. + The filename chosen for the file is ``{basename}.clinic{extension}``, + where ``basename`` and ``extension`` were assigned the output + from ``os.path.splitext()`` run on the current file. (Example: + the ``file`` destination for ``_pickle.c`` would be written to + ``_pickle.clinic.c``.) + + **Important: When using a** ``file`` **destination, you** + *must check in* **the generated file!** + + ``two-pass`` + A buffer like ``buffer``. However, a two-pass buffer can only + be written once, and it prints out all text sent to it during + all of processing, even from Clinic blocks *after* the + + ``suppress`` + The text is suppressed--thrown away. + + +Clinic defines five new directives that let you reconfigure its output. + +The first new directive is ``dump``:: + + dump <destination> + +This dumps the current contents of the named destination into the output of +the current block, and empties it. This only works with ``buffer`` and +``two-pass`` destinations. + +The second new directive is ``output``. The most basic form of ``output`` +is like this:: + + output <field> <destination> + +This tells Clinic to output *field* to *destination*. ``output`` also +supports a special meta-destination, called ``everything``, which tells +Clinic to output *all* fields to that *destination*. + +``output`` has a number of other functions:: + + output push + output pop + output preset <preset> + + +``output push`` and ``output pop`` allow you to push and pop +configurations on an internal configuration stack, so that you +can temporarily modify the output configuration, then easily restore +the previous configuration. Simply push before your change to save +the current configuration, then pop when you wish to restore the +previous configuration. + +``output preset`` sets Clinic's output to one of several built-in +preset configurations, as follows: + + ``original`` + Clinic's starting configuration. + + Suppress the ``parser_prototype`` + and ``docstring_prototype``, write everything else to ``block``. + + ``file`` + Designed to write everything to the "clinic file" that it can. + You then ``#include`` this file near the top of your file. + You may need to rearrange your file to make this work, though + usually this just means creating forward declarations for various + ``typedef`` and ``PyTypeObject`` definitions. + + Suppress the ``parser_prototype`` + and ``docstring_prototype``, write the ``impl_definition`` to + ``block``, and write everything else to ``file``. + + ``buffer`` + Save up all most of the output from Clinic, to be written into + your file near the end. For Python files implementing modules + or builtin types, it's recommended that you dump the buffer + just above the static structures for your module or + builtin type; these are normally very near the end. Using + ``buffer`` may require even more editing than ``file``, if + your file has static ``PyMethodDef`` arrays defined in the + middle of the file. + + Suppress the ``parser_prototype``, ``impl_prototype``, + and ``docstring_prototype``, write the ``impl_definition`` to + ``block``, and write everything else to ``file``. + + ``two-pass`` + Similar to the ``buffer`` preset, but writes forward declarations to + the ``two-pass`` buffer, and definitions to the ``buffer``. + This is similar to the ``buffer`` preset, but may require + less editing than ``buffer``. Dump the ``two-pass`` buffer + near the top of your file, and dump the ``buffer`` near + the end just like you would when using the ``buffer`` preset. + + Suppresses the ``impl_prototype``, write the ``impl_definition`` + to ``block``, write ``docstring_prototype``, ``methoddef_define``, + and ``parser_prototype`` to ``two-pass``, write everything else + to ``buffer``. + + ``partial-buffer`` + Similar to the ``buffer`` preset, but writes more things to ``block``, + only writing the really big chunks of generated code to ``buffer``. + This avoids the definition-before-use problem of ``buffer`` completely, + at the small cost of having slightly more stuff in the block's output. + Dump the ``buffer`` near the end, just like you would when using + the ``buffer`` preset. + + Suppresses the ``impl_prototype``, write the ``docstring_definition`` + and ``parser_defintion`` to ``buffer``, write everything else to ``block``. + +The third new directive is ``destination``:: + + destination <name> <command> [...] + +This performs an operation on the destination named ``name``. + +There are two defined subcommands: ``new`` and ``clear``. + +The ``new`` subcommand works like this:: + + destination <name> new <type> + +This creates a new destination with name ``<name>`` and type ``<type>``. + +There are five destination types:: + + ``suppress`` + Throws the text away. + + ``block`` + Writes the text to the current block. This is what Clinic + originally did. + + ``buffer`` + A simple text buffer, like the "buffer" builtin destination above. + + ``file`` + A text file. The file destination takes an extra argument, + a template to use for building the filename, like so: + + destination <name> new <type> <file_template> + + The template can use three strings internally that will be replaced + by bits of the filename: + + {filename} + The full filename. + {basename} + Everything up to but not including the last '.'. + {extension} + The last '.' and everything after it. + + If there are no periods in the filename, {basename} and {filename} + are the same, and {extension} is empty. "{basename}{extension}" + is always exactly the same as "{filename}"." + + ``two-pass`` + A two-pass buffer, like the "two-pass" builtin destination above. + + +The ``clear`` subcommand works like this:: + + destination <name> clear + +It removes all the accumulated text up to this point in the destination. +(I don't know what you'd need this for, but I thought maybe it'd be +useful while someone's experimenting.) + +The fourth new directive is ``set``:: + + set line_prefix "string" + set line_suffix "string" + +``set`` lets you set two internal variables in Clinic. +``line_prefix`` is a string that will be prepended to every line of Clinic's output; +``line_suffix`` is a string that will be appended to every line of Clinic's output. + +Both of these suport two format strings: + + ``{block comment start}`` + Turns into the string ``/*``, the start-comment text sequence for C files. + + ``{block comment end}`` + Turns into the string ``*/``, the end-comment text sequence for C files. + +The final new directive is one you shouldn't need to use directly, +called ``preserve``:: + + preserve + +This tells Clinic that the current contents of the output should be kept, unmodifed. +This is used internally by Clinic when dumping output into ``file`` files; wrapping +it in a Clinic block lets Clinic use its existing checksum functionality to ensure +the file was not modified by hand before it gets overwritten. + + +Using Argument Clinic in Python files +------------------------------------- + +It's actually possible to use Argument Clinic to preprocess Python files. +There's no point to using Argument Clinic blocks, of course, as the output +wouldn't make any sense to the Python interpreter. But using Argument Clinic +to run Python blocks lets you use Python as a Python preprocessor! + +Since Python comments are different from C comments, Argument Clinic +blocks embedded in Python files look slightly different. They look like this:: + + #/*[python input] + #print("def foo(): pass") + #[python start generated code]*/ + def foo(): pass + #/*[python checksum:...]*/ diff --git a/Doc/howto/functional.rst b/Doc/howto/functional.rst index 7189c65127..97c90c7c0b 100644 --- a/Doc/howto/functional.rst +++ b/Doc/howto/functional.rst @@ -3,7 +3,7 @@ ******************************** :Author: A. M. Kuchling -:Release: 0.31 +:Release: 0.32 In this document, we'll take a tour of Python's features suitable for implementing programs in a functional style. After an introduction to the @@ -15,9 +15,9 @@ concepts of functional programming, we'll look at language features such as Introduction ============ -This section explains the basic concept of functional programming; if you're -just interested in learning about Python language features, skip to the next -section. +This section explains the basic concept of functional programming; if +you're just interested in learning about Python language features, +skip to the next section on :ref:`functional-howto-iterators`. Programming languages support decomposing problems in several different ways: @@ -173,6 +173,8 @@ new programs by arranging existing functions in a new configuration and writing a few functions specialized for the current task. +.. _functional-howto-iterators: + Iterators ========= @@ -670,7 +672,7 @@ indexes at which certain conditions are met:: :func:`sorted(iterable, key=None, reverse=False) <sorted>` collects all the elements of the iterable into a list, sorts the list, and returns the sorted -result. The *key*, and *reverse* arguments are passed through to the +result. The *key* and *reverse* arguments are passed through to the constructed list's :meth:`~list.sort` method. :: >>> import random @@ -836,7 +838,8 @@ Another group of functions chooses a subset of an iterator's elements based on a predicate. :func:`itertools.filterfalse(predicate, iter) <itertools.filterfalse>` is the -opposite, returning all elements for which the predicate returns false:: +opposite of :func:`filter`, returning all elements for which the predicate +returns false:: itertools.filterfalse(is_even, itertools.count()) => 1, 3, 5, 7, 9, 11, 13, 15, ... @@ -864,6 +867,77 @@ iterable's results. :: itertools.dropwhile(is_even, itertools.count()) => 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... +:func:`itertools.compress(data, selectors) <itertools.compress>` takes two +iterators and returns only those elements of *data* for which the corresponding +element of *selectors* is true, stopping whenever either one is exhausted:: + + itertools.compress([1,2,3,4,5], [True, True, False, False, True]) => + 1, 2, 5 + + +Combinatoric functions +---------------------- + +The :func:`itertools.combinations(iterable, r) <itertools.combinations>` +returns an iterator giving all possible *r*-tuple combinations of the +elements contained in *iterable*. :: + + itertools.combinations([1, 2, 3, 4, 5], 2) => + (1, 2), (1, 3), (1, 4), (1, 5), + (2, 3), (2, 4), (2, 5), + (3, 4), (3, 5), + (4, 5) + + itertools.combinations([1, 2, 3, 4, 5], 3) => + (1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 4), (1, 3, 5), (1, 4, 5), + (2, 3, 4), (2, 3, 5), (2, 4, 5), + (3, 4, 5) + +The elements within each tuple remain in the same order as +*iterable* returned them. For example, the number 1 is always before +2, 3, 4, or 5 in the examples above. A similar function, +:func:`itertools.permutations(iterable, r=None) <itertools.permutations>`, +removes this constraint on the order, returning all possible +arrangements of length *r*:: + + itertools.permutations([1, 2, 3, 4, 5], 2) => + (1, 2), (1, 3), (1, 4), (1, 5), + (2, 1), (2, 3), (2, 4), (2, 5), + (3, 1), (3, 2), (3, 4), (3, 5), + (4, 1), (4, 2), (4, 3), (4, 5), + (5, 1), (5, 2), (5, 3), (5, 4) + + itertools.permutations([1, 2, 3, 4, 5]) => + (1, 2, 3, 4, 5), (1, 2, 3, 5, 4), (1, 2, 4, 3, 5), + ... + (5, 4, 3, 2, 1) + +If you don't supply a value for *r* the length of the iterable is used, +meaning that all the elements are permuted. + +Note that these functions produce all of the possible combinations by +position and don't require that the contents of *iterable* are unique:: + + itertools.permutations('aba', 3) => + ('a', 'b', 'a'), ('a', 'a', 'b'), ('b', 'a', 'a'), + ('b', 'a', 'a'), ('a', 'a', 'b'), ('a', 'b', 'a') + +The identical tuple ``('a', 'a', 'b')`` occurs twice, but the two 'a' +strings came from different positions. + +The :func:`itertools.combinations_with_replacement(iterable, r) <itertools.combinations_with_replacement>` +function relaxes a different constraint: elements can be repeated +within a single tuple. Conceptually an element is selected for the +first position of each tuple and then is replaced before the second +element is selected. :: + + itertools.combinations_with_replacement([1, 2, 3, 4, 5], 2) => + (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), + (2, 2), (2, 3), (2, 4), (2, 5), + (3, 3), (3, 4), (3, 5), + (4, 4), (4, 5), + (5, 5) + Grouping elements ----------------- @@ -986,6 +1060,17 @@ write the obvious :keyword:`for` loop:: for i in [1,2,3]: product *= i +A related function is `itertools.accumulate(iterable, func=operator.add) <itertools.accumulate`. +It performs the same calculation, but instead of returning only the +final result, :func:`accumulate` returns an iterator that also yields +each partial result:: + + itertools.accumulate([1,2,3,4,5]) => + 1, 3, 6, 10, 15 + + itertools.accumulate([1,2,3,4,5], operator.mul) => + 1, 2, 6, 24, 120 + The operator module ------------------- @@ -1159,51 +1244,6 @@ features in Python 2.5. .. comment - Topics to place - ----------------------------- - - XXX os.walk() - - XXX Need a large example. - - But will an example add much? I'll post a first draft and see - what the comments say. - -.. comment - - Original outline: - Introduction - Idea of FP - Programs built out of functions - Functions are strictly input-output, no internal state - Opposed to OO programming, where objects have state - - Why FP? - Formal provability - Assignment is difficult to reason about - Not very relevant to Python - Modularity - Small functions that do one thing - Debuggability: - Easy to test due to lack of state - Easy to verify output from intermediate steps - Composability - You assemble a toolbox of functions that can be mixed - - Tackling a problem - Need a significant example - - Iterators - Generators - The itertools module - List comprehensions - Small functions and the lambda statement - Built-in functions - map - filter - -.. comment - Handy little function for printing part of an iterator -- used while writing this document. @@ -1214,5 +1254,3 @@ features in Python 2.5. sys.stdout.write(str(elem)) sys.stdout.write(', ') print(elem[-1]) - - diff --git a/Doc/howto/index.rst b/Doc/howto/index.rst index 81a4f8b98d..2c9d69910a 100644 --- a/Doc/howto/index.rst +++ b/Doc/howto/index.rst @@ -28,4 +28,5 @@ Currently, the HOWTOs are: webservers.rst argparse.rst ipaddress.rst + clinic.rst diff --git a/Doc/howto/logging-cookbook.rst b/Doc/howto/logging-cookbook.rst index 674695b059..93b1b8acd5 100644 --- a/Doc/howto/logging-cookbook.rst +++ b/Doc/howto/logging-cookbook.rst @@ -704,9 +704,7 @@ the basis for code meeting your own specific requirements:: break logger = logging.getLogger(record.name) logger.handle(record) # No level or filter logic applied - just do it! - except (KeyboardInterrupt, SystemExit): - raise - except: + except Exception: import sys, traceback print('Whoops! Problem:', file=sys.stderr) traceback.print_exc(file=sys.stderr) diff --git a/Doc/howto/logging.rst b/Doc/howto/logging.rst index 55b1c867dd..dab58f4109 100644 --- a/Doc/howto/logging.rst +++ b/Doc/howto/logging.rst @@ -901,10 +901,10 @@ provided: disk files, rotating the log file at certain timed intervals. #. :class:`~handlers.SocketHandler` instances send messages to TCP/IP - sockets. + sockets. Since 3.4, Unix domain sockets are also supported. #. :class:`~handlers.DatagramHandler` instances send messages to UDP - sockets. + sockets. Since 3.4, Unix domain sockets are also supported. #. :class:`~handlers.SMTPHandler` instances send messages to a designated email address. diff --git a/Doc/howto/pyporting.rst b/Doc/howto/pyporting.rst index 92207e296d..d6ad40161d 100644 --- a/Doc/howto/pyporting.rst +++ b/Doc/howto/pyporting.rst @@ -10,258 +10,185 @@ Porting Python 2 Code to Python 3 With Python 3 being the future of Python while Python 2 is still in active use, it is good to have your project available for both major releases of - Python. This guide is meant to help you choose which strategy works best - for your project to support both Python 2 & 3 along with how to execute - that strategy. + Python. This guide is meant to help you figure out how best to support both + Python 2 & 3 simultaneously. If you are looking to port an extension module instead of pure Python code, please see :ref:`cporting-howto`. - -Choosing a Strategy -=================== - -When a project chooses to support both Python 2 & 3, -a decision needs to be made as to how to go about accomplishing that goal. -The chosen strategy will depend on how large the project's existing -codebase is and how much divergence you want from your current Python 2 codebase -(e.g., changing your code to work simultaneously with Python 2 and 3). - -If you would prefer to maintain a codebase which is semantically **and** -syntactically compatible with Python 2 & 3 simultaneously, you can write -:ref:`use_same_source`. While this tends to lead to somewhat non-idiomatic -code, it does mean you keep a rapid development process for you, the developer. - -If your project is brand-new or does not have a large codebase, then you may -want to consider writing/porting :ref:`all of your code for Python 3 -and use 3to2 <use_3to2>` to port your code for Python 2. - -Finally, you do have the option of :ref:`using 2to3 <use_2to3>` to translate -Python 2 code into Python 3 code (with some manual help). This can take the -form of branching your code and using 2to3 to start a Python 3 branch. You can -also have users perform the translation at installation time automatically so -that you only have to maintain a Python 2 codebase. - -Regardless of which approach you choose, porting is not as hard or -time-consuming as you might initially think. You can also tackle the problem -piece-meal as a good portion of porting is simply updating your code to follow -current best practices in a Python 2/3 compatible way. - - -Universal Bits of Advice ------------------------- - -Regardless of what strategy you pick, there are a few things you should -consider. - -One is make sure you have a robust test suite. You need to make sure everything -continues to work, just like when you support a new minor/feature release of -Python. This means making sure your test suite is thorough and is ported -properly between Python 2 & 3. You will also most likely want to use something -like tox_ to automate testing between both a Python 2 and Python 3 interpreter. - -Two, once your project has Python 3 support, make sure to add the proper -classifier on the Cheeseshop_ (PyPI_). To have your project listed as Python 3 -compatible it must have the -`Python 3 classifier <http://pypi.python.org/pypi?:action=browse&c=533>`_ -(from -http://techspot.zzzeek.org/2011/01/24/zzzeek-s-guide-to-python-3-porting/):: - - setup( - name='Your Library', - version='1.0', - classifiers=[ - # make sure to use :: Python *and* :: Python :: 3 so - # that pypi can list the package on the python 3 page - 'Programming Language :: Python', - 'Programming Language :: Python :: 3' - ], - packages=['yourlibrary'], - # make sure to add custom_fixers to the MANIFEST.in - include_package_data=True, - # ... - ) - - -Doing so will cause your project to show up in the -`Python 3 packages list -<http://pypi.python.org/pypi?:action=browse&c=533&show=all>`_. You will know -you set the classifier properly as visiting your project page on the Cheeseshop -will show a Python 3 logo in the upper-left corner of the page. - -Three, the six_ project provides a library which helps iron out differences -between Python 2 & 3. If you find there is a sticky point that is a continual -point of contention in your translation or maintenance of code, consider using -a source-compatible solution relying on six. If you have to create your own -Python 2/3 compatible solution, you can use ``sys.version_info[0] >= 3`` as a -guard. - -Four, read all the approaches. Just because some bit of advice applies to one -approach more than another doesn't mean that some advice doesn't apply to other -strategies. This is especially true of whether you decide to use 2to3 or be -source-compatible; tips for one approach almost always apply to the other. - -Five, drop support for older Python versions if possible. `Python 2.5`_ + If you would like to read one core Python developer's take on why Python 3 + came into existence, you can read Nick Coghlan's `Python 3 Q & A`_. + + If you prefer to read a (free) book on porting a project to Python 3, + consider reading `Porting to Python 3`_ by Lennart Regebro which should cover + much of what is discussed in this HOWTO. + + For help with porting, you can email the python-porting_ mailing list with + questions. + + +Before You Begin +================ + +If your project is on the Cheeseshop_/PyPI_, make sure it has the proper +`trove classifiers`_ to signify what versions of Python it **currently** +supports. At minimum you should specify the major version(s), e.g. +``Programming Language :: Python :: 2`` if your project currently only supports +Python 2. It is preferrable that you be as specific as possible by listing every +major/minor version of Python that you support, e.g. if your project supports +Python 2.6 and 2.7, then you want the classifiers of:: + + Programming Language :: Python :: 2 + Programming Language :: Python :: 2.6 + Programming Language :: Python :: 2.7 + +Once your project supports Python 3 you will want to go back and add the +appropriate classifiers for Python 3 as well. This is important as setting the +``Programming Language :: Python :: 3`` classifier will lead to your project +being listed under the `Python 3 Packages`_ section of PyPI. + +Make sure you have a robust test suite. You need to +make sure everything continues to work, just like when you support a new +minor/feature release of Python. This means making sure your test suite is +thorough and is ported properly between Python 2 & 3 (consider using coverage_ +to measure that you have effective test coverage). You will also most likely +want to use something like tox_ to automate testing between all of your +supported versions of Python. You will also want to **port your tests first** so +that you can make sure that you detect breakage during the transition. Tests also +tend to be simpler than the code they are testing so it gives you an idea of how +easy it can be to port code. + +Drop support for older Python versions if possible. `Python 2.5`_ introduced a lot of useful syntax and libraries which have become idiomatic in Python 3. `Python 2.6`_ introduced future statements which makes compatibility much easier if you are going from Python 2 to 3. -`Python 2.7`_ continues the trend in the stdlib. So choose the newest version +`Python 2.7`_ continues the trend in the stdlib. Choose the newest version of Python which you believe can be your minimum support version and work from there. -Six, target the newest version of Python 3 that you can. Beyond just the usual +Target the newest version of Python 3 that you can. Beyond just the usual bugfixes, compatibility has continued to improve between Python 2 and 3 as time -has passed. This is especially true for Python 3.3 where the ``u`` prefix for -strings is allowed, making source-compatible Python code easier. - -Seven, make sure to look at the `Other Resources`_ for tips from other people -which may help you out. - - -.. _tox: http://codespeak.net/tox/ -.. _Cheeseshop: -.. _PyPI: http://pypi.python.org/ -.. _six: http://packages.python.org/six -.. _Python 2.7: http://www.python.org/2.7.x -.. _Python 2.6: http://www.python.org/2.6.x -.. _Python 2.5: http://www.python.org/2.5.x -.. _Python 2.4: http://www.python.org/2.4.x -.. _Python 2.3: http://www.python.org/2.3.x -.. _Python 2.2: http://www.python.org/2.2.x +has passed. E.g. Python 3.3 added back the ``u`` prefix for +strings, making source-compatible Python code easier to write. -.. _use_3to2: +Writing Source-Compatible Python 2/3 Code +========================================= -Python 3 and 3to2 -================= +Over the years the Python community has discovered that the easiest way to +support both Python 2 and 3 in parallel is to write Python code that works in +either version. While this might sound counter-intuitive at first, it actually +is not difficult and typically only requires following some select +(non-idiomatic) practices and using some key projects to help make bridging +between Python 2 and 3 easier. -If you are starting a new project or your codebase is small enough, you may -want to consider writing your code for Python 3 and backporting to Python 2 -using 3to2_. Thanks to Python 3 being more strict about things than Python 2 -(e.g., bytes vs. strings), the source translation can be easier and more -straightforward than from Python 2 to 3. Plus it gives you more direct -experience developing in Python 3 which, since it is the future of Python, is a -good thing long-term. +Projects to Consider +-------------------- -A drawback of this approach is that 3to2 is a third-party project. This means -that the Python core developers (and thus this guide) can make no promises -about how well 3to2 works at any time. There is nothing to suggest, though, -that 3to2 is not a high-quality project. +The lowest level library for suppoting Python 2 & 3 simultaneously is six_. +Reading through its documentation will give you an idea of where exactly the +Python language changed between versions 2 & 3 and thus what you will want the +library to help you continue to support. +To help automate porting your code over to using six, you can use +modernize_. This project will attempt to rewrite your code to be as modern as +possible while using six to smooth out any differences between Python 2 & 3. -.. _3to2: https://bitbucket.org/amentajo/lib3to2/overview +If you want to write your compatible code to feel more like Python 3 there is +the future_ project. It tries to provide backports of objects from Python 3 so +that you can use them from Python 2-compatible code, e.g. replacing the +``bytes`` type from Python 2 with the one from Python 3. +It also provides a translation script like modernize (its translation code is +actually partially based on it) to help start working with a pre-existing code +base. It is also unique in that its translation script will also port Python 3 +code backwards as well as Python 2 code forwards. -.. _use_2to3: - -Python 2 and 2to3 -================= - -Included with Python since 2.6, the 2to3_ tool (and :mod:`lib2to3` module) -helps with porting Python 2 to Python 3 by performing various source -translations. This is a perfect solution for projects which wish to branch -their Python 3 code from their Python 2 codebase and maintain them as -independent codebases. You can even begin preparing to use this approach -today by writing future-compatible Python code which works cleanly in -Python 2 in conjunction with 2to3; all steps outlined below will work -with Python 2 code up to the point when the actual use of 2to3 occurs. - -Use of 2to3 as an on-demand translation step at install time is also possible, -preventing the need to maintain a separate Python 3 codebase, but this approach -does come with some drawbacks. While users will only have to pay the -translation cost once at installation, you as a developer will need to pay the -cost regularly during development. If your codebase is sufficiently large -enough then the translation step ends up acting like a compilation step, -robbing you of the rapid development process you are used to with Python. -Obviously the time required to translate a project will vary, so do an -experimental translation just to see how long it takes to evaluate whether you -prefer this approach compared to using :ref:`use_same_source` or simply keeping -a separate Python 3 codebase. - -Below are the typical steps taken by a project which tries to support -Python 2 & 3 while keeping the code directly executable by Python 2. +Tips & Tricks +------------- +To help with writing source-compatible code using one of the projects mentioned +in `Projects to Consider`_, consider following the below suggestions. Some of +them are handled by the suggested projects, so if you do use one of them then +read their documentation first to see which suggestions below will taken care of +for you. Support Python 2.7 ------------------- +////////////////// As a first step, make sure that your project is compatible with `Python 2.7`_. This is just good to do as Python 2.7 is the last release of Python 2 and thus will be used for a rather long time. It also allows for use of the ``-3`` flag -to Python to help discover places in your code which 2to3 cannot handle but are -known to cause issues. +to Python to help discover places in your code where compatibility might be an +issue (the ``-3`` flag is in Python 2.6 but Python 2.7 adds more warnings). Try to Support `Python 2.6`_ and Newer Only -------------------------------------------- +/////////////////////////////////////////// While not possible for all projects, if you can support `Python 2.6`_ and newer **only**, your life will be much easier. Various future statements, stdlib additions, etc. exist only in Python 2.6 and later which greatly assist in -porting to Python 3. But if you project must keep support for `Python 2.5`_ (or -even `Python 2.4`_) then it is still possible to port to Python 3. +supporting Python 3. But if you project must keep support for `Python 2.5`_ then +it is still possible to simultaneously support Python 3. Below are the benefits you gain if you only have to support Python 2.6 and newer. Some of these options are personal choice while others are **strongly** recommended (the ones that are more for personal choice are labeled as such). If you continue to support older versions of Python then you -at least need to watch out for situations that these solutions fix. +at least need to watch out for situations that these solutions fix and handle +them appropriately (which is where library help from e.g. six_ comes in handy). ``from __future__ import print_function`` ''''''''''''''''''''''''''''''''''''''''' -This is a personal choice. 2to3 handles the translation from the print -statement to the print function rather well so this is an optional step. This -future statement does help, though, with getting used to typing -``print('Hello, World')`` instead of ``print 'Hello, World'``. +It will not only get you used to typing ``print()`` as a function instead of a +statement, but it will also give you the various benefits the function has over +the Python 2 statement (six_ provides a function if you support Python 2.5 or +older). ``from __future__ import unicode_literals`` ''''''''''''''''''''''''''''''''''''''''''' -Another personal choice. You can always mark what you want to be a (unicode) -string with a ``u`` prefix to get the same effect. But regardless of whether -you use this future statement or not, you **must** make sure you know exactly -which Python 2 strings you want to be bytes, and which are to be strings. This -means you should, **at minimum** mark all strings that are meant to be text -strings with a ``u`` prefix if you do not use this future statement. Python 3.3 -allows strings to continue to have the ``u`` prefix (it's a no-op in that case) -to make it easier for code to be source-compatible between Python 2 & 3. +If you choose to use this future statement then all string literals in +Python 2 will be assumed to be Unicode (as is already the case in Python 3). +If you choose not to use this future statement then you should mark all of your +text strings with a ``u`` prefix and only support Python 3.3 or newer. But you +are **strongly** advised to do one or the other (six_ provides a function in +case you don't want to use the future statement **and** you want to support +Python 3.2 or older). -Bytes literals -'''''''''''''' +Bytes/string literals +''''''''''''''''''''' -This is a **very** important one. The ability to prefix Python 2 strings that -are meant to contain bytes with a ``b`` prefix help to very clearly delineate -what is and is not a Python 3 string. When you run 2to3 on code, all Python 2 -strings become Python 3 strings **unless** they are prefixed with ``b``. +This is a **very** important one. Prefix Python 2 strings that +are meant to contain bytes with a ``b`` prefix to very clearly delineate +what is and is not a Python 3 text string (six_ provides a function to use for +Python 2.5 compatibility). This point cannot be stressed enough: make sure you know what all of your string -literals in Python 2 are meant to become in Python 3. Any string literal that +literals in Python 2 are meant to be in Python 3. Any string literal that should be treated as bytes should have the ``b`` prefix. Any string literal that should be Unicode/text in Python 2 should either have the ``u`` literal (supported, but ignored, in Python 3.3 and later) or you should have ``from __future__ import unicode_literals`` at the top of the file. But the key -point is you should know how Python 3 will treat everyone one of your string +point is you should know how Python 3 will treat every one one of your string literals and you should mark them as appropriate. There are some differences between byte literals in Python 2 and those in Python 3 thanks to the bytes type just being an alias to ``str`` in Python 2. -Probably the biggest "gotcha" is that indexing results in different values. In -Python 2, the value of ``b'py'[1]`` is ``'y'``, while in Python 3 it's ``121``. -You can avoid this disparity by always slicing at the size of a single element: -``b'py'[1:2]`` is ``'y'`` in Python 2 and ``b'y'`` in Python 3 (i.e., close -enough). +See the `Handle Common "Gotchas"`_ section for what to watch out for. -You cannot concatenate bytes and strings in Python 3. But since Python -2 has bytes aliased to ``str``, it will succeed: ``b'a' + u'b'`` works in -Python 2, but ``b'a' + 'b'`` in Python 3 is a :exc:`TypeError`. A similar issue -also comes about when doing comparisons between bytes and strings. +``from __future__ import absolute_import`` +'''''''''''''''''''''''''''''''''''''''''' +Discussed in more detail below, but you should use this future statement to +prevent yourself from accidentally using implicit relative imports. Supporting `Python 2.5`_ and Newer Only ---------------------------------------- +/////////////////////////////////////// If you are supporting `Python 2.5`_ and newer there are still some features of Python that you can utilize. @@ -271,7 +198,7 @@ Python that you can utilize. '''''''''''''''''''''''''''''''''''''''''' Implicit relative imports (e.g., importing ``spam.bacon`` from within -``spam.eggs`` with the statement ``import bacon``) does not work in Python 3. +``spam.eggs`` with the statement ``import bacon``) do not work in Python 3. This future statement moves away from that and allows the use of explicit relative imports (e.g., ``from . import bacon``). @@ -281,7 +208,7 @@ implicit ones. In `Python 2.6`_ explicit relative imports are available without the statement, but you still want the __future__ statement to prevent implicit relative imports. In `Python 2.7`_ the __future__ statement is not needed. In other words, unless you are only supporting Python 2.7 or a version earlier -than Python 2.5, use the __future__ statement. +than Python 2.5, use this __future__ statement. Mark all Unicode strings with a ``u`` prefix @@ -290,17 +217,65 @@ Mark all Unicode strings with a ``u`` prefix While Python 2.6 has a ``__future__`` statement to automatically cause Python 2 to treat all string literals as Unicode, Python 2.5 does not have that shortcut. This means you should go through and mark all string literals with a ``u`` -prefix to turn them explicitly into Unicode strings where appropriate. That -leaves all unmarked string literals to be considered byte literals in Python 3. +prefix to turn them explicitly into text strings where appropriate and only +support Python 3.3 or newer. Otherwise use a project like six_ which provides a +function to pass all text string literals through. + + +Capturing the Currently Raised Exception +'''''''''''''''''''''''''''''''''''''''' + +In Python 2.5 and earlier the syntax to access the current exception is:: + + try: + raise Exception() + except Exception, exc: + # Current exception is 'exc'. + pass + +This syntax changed in Python 3 (and backported to `Python 2.6`_ and later) +to:: + + try: + raise Exception() + except Exception as exc: + # Current exception is 'exc'. + # In Python 3, 'exc' is restricted to the block; in Python 2.6/2.7 it will "leak". + pass + +Because of this syntax change you must change how you capture the current +exception in Python 2.5 and earlier to:: + + try: + raise Exception() + except Exception: + import sys + exc = sys.exc_info()[1] + # Current exception is 'exc'. + pass + +You can get more information about the raised exception from +:func:`sys.exc_info` than simply the current exception instance, but you most +likely don't need it. +.. note:: + In Python 3, the traceback is attached to the exception instance + through the ``__traceback__`` attribute. If the instance is saved in + a local variable that persists outside of the ``except`` block, the + traceback will create a reference cycle with the current frame and its + dictionary of local variables. This will delay reclaiming dead + resources until the next cyclic :term:`garbage collection` pass. + + In Python 2, this problem only occurs if you save the traceback itself + (e.g. the third element of the tuple returned by :func:`sys.exc_info`) + in a variable. Handle Common "Gotchas" ------------------------ +/////////////////////// -There are a few things that just consistently come up as sticking points for -people which 2to3 cannot handle automatically or can easily be done in Python 2 -to help modernize your code. +These are things to watch out for no matter what version of Python 2 you are +supporting which are not syntactic considerations. ``from __future__ import division`` @@ -357,9 +332,9 @@ One of the biggest issues people have when porting code to Python 3 is handling the bytes/string dichotomy. Because Python 2 allowed the ``str`` type to hold textual data, people have over the years been rather loose in their delineation of what ``str`` instances held text compared to bytes. In Python 3 you cannot -be so care-free anymore and need to properly handle the difference. The key +be so care-free anymore and need to properly handle the difference. The key to handling this issue is to make sure that **every** string literal in your -Python 2 code is either syntactically of functionally marked as either bytes or +Python 2 code is either syntactically or functionally marked as either bytes or text data. After this is done you then need to make sure your APIs are designed to either handle a specific type or made to be properly polymorphic. @@ -466,14 +441,7 @@ methods which have unpredictable results (e.g., infinite recursion if you happen to use the ``unicode(self).encode('utf8')`` idiom as the body of your ``__str__()`` method). -There are two ways to solve this issue. One is to use a custom 2to3 fixer. The -blog post at http://lucumr.pocoo.org/2011/1/22/forwards-compatible-python/ -specifies how to do this. That will allow 2to3 to change all instances of ``def -__unicode(self): ...`` to ``def __str__(self): ...``. This does require that you -define your ``__str__()`` method in Python 2 before your ``__unicode__()`` -method. - -The other option is to use a mixin class. This allows you to only define a +You can use a mixin class to work around this. This allows you to only define a ``__unicode__()`` method for your class and let the mixin derive ``__str__()`` for you (code from http://lucumr.pocoo.org/2011/1/22/forwards-compatible-python/):: @@ -516,6 +484,7 @@ sequence containing all arguments passed to the :meth:`__init__` method. Even better is to use the documented attributes the exception provides. + Don't use ``__getslice__`` & Friends '''''''''''''''''''''''''''''''''''' @@ -527,23 +496,23 @@ friends. Updating doctests ''''''''''''''''' -2to3_ will attempt to generate fixes for doctests that it comes across. It's -not perfect, though. If you wrote a monolithic set of doctests (e.g., a single -docstring containing all of your doctests), you should at least consider -breaking the doctests up into smaller pieces to make it more manageable to fix. -Otherwise it might very well be worth your time and effort to port your tests -to :mod:`unittest`. +Don't forget to make them Python 2/3 compatible as well. If you wrote a +monolithic set of doctests (e.g., a single docstring containing all of your +doctests), you should at least consider breaking the doctests up into smaller +pieces to make it more manageable to fix. Otherwise it might very well be worth +your time and effort to port your tests to :mod:`unittest`. -Update `map` for imbalanced input sequences -''''''''''''''''''''''''''''''''''''''''''' +Update ``map`` for imbalanced input sequences +''''''''''''''''''''''''''''''''''''''''''''' -With Python 2, `map` would pad input sequences of unequal length with -`None` values, returning a sequence as long as the longest input sequence. +With Python 2, when ``map`` was given more than one input sequence it would pad +the shorter sequences with `None` values, returning a sequence as long as the +longest input sequence. -With Python 3, if the input sequences to `map` are of unequal length, `map` +With Python 3, if the input sequences to ``map`` are of unequal length, ``map`` will stop at the termination of the shortest of the sequences. For full -compatibility with `map` from Python 2.x, also wrap the sequences in +compatibility with ``map`` from Python 2.x, wrap the sequence arguments in :func:`itertools.zip_longest`, e.g. ``map(func, *sequences)`` becomes ``list(map(func, itertools.zip_longest(*sequences)))``. @@ -552,176 +521,34 @@ Eliminate ``-3`` Warnings When you run your application's test suite, run it using the ``-3`` flag passed to Python. This will cause various warnings to be raised during execution about -things that 2to3 cannot handle automatically (e.g., modules that have been -removed). Try to eliminate those warnings to make your code even more portable -to Python 3. - - -Run 2to3 --------- - -Once you have made your Python 2 code future-compatible with Python 3, it's -time to use 2to3_ to actually port your code. - - -Manually -'''''''' - -To manually convert source code using 2to3_, you use the ``2to3`` script that -is installed with Python 2.6 and later.:: - - 2to3 <directory or file to convert> - -This will cause 2to3 to write out a diff with all of the fixers applied for the -converted source code. If you would like 2to3 to go ahead and apply the changes -you can pass it the ``-w`` flag:: - - 2to3 -w <stuff to convert> - -There are other flags available to control exactly which fixers are applied, -etc. - - -During Installation -''''''''''''''''''' - -When a user installs your project for Python 3, you can have either -:mod:`distutils` or Distribute_ run 2to3_ on your behalf. -For distutils, use the following idiom:: - - try: # Python 3 - from distutils.command.build_py import build_py_2to3 as build_py - except ImportError: # Python 2 - from distutils.command.build_py import build_py - - setup(cmdclass = {'build_py': build_py}, - # ... - ) - -For Distribute:: - - setup(use_2to3=True, - # ... - ) - -This will allow you to not have to distribute a separate Python 3 version of -your project. It does require, though, that when you perform development that -you at least build your project and use the built Python 3 source for testing. - - -Verify & Test -------------- - -At this point you should (hopefully) have your project converted in such a way -that it works in Python 3. Verify it by running your unit tests and making sure -nothing has gone awry. If you miss something then figure out how to fix it in -Python 3, backport to your Python 2 code, and run your code through 2to3 again -to verify the fix transforms properly. - - -.. _2to3: http://docs.python.org/py3k/library/2to3.html -.. _Distribute: http://packages.python.org/distribute/ - - -.. _use_same_source: - -Python 2/3 Compatible Source -============================ - -While it may seem counter-intuitive, you can write Python code which is -source-compatible between Python 2 & 3. It does lead to code that is not -entirely idiomatic Python (e.g., having to extract the currently raised -exception from ``sys.exc_info()[1]``), but it can be run under Python 2 -**and** Python 3 without using 2to3_ as a translation step (although the tool -should be used to help find potential portability problems). This allows you to -continue to have a rapid development process regardless of whether you are -developing under Python 2 or Python 3. Whether this approach or using -:ref:`use_2to3` works best for you will be a per-project decision. - -To get a complete idea of what issues you will need to deal with, see the -`What's New in Python 3.0`_. Others have reorganized the data in other formats -such as http://docs.pythonsprints.com/python3_porting/py-porting.html\ . +things that are semantic changes between Python 2 and 3. Try to eliminate those +warnings to make your code even more portable to Python 3. -The following are some steps to take to try to support both Python 2 & 3 from -the same source code. - - -.. _What's New in Python 3.0: http://docs.python.org/release/3.0/whatsnew/3.0.html - - -Follow The Steps for Using 2to3_ --------------------------------- - -All of the steps outlined in how to -:ref:`port Python 2 code with 2to3 <use_2to3>` apply -to creating a Python 2/3 codebase. This includes trying only support Python 2.6 -or newer (the :mod:`__future__` statements work in Python 3 without issue), -eliminating warnings that are triggered by ``-3``, etc. - -You should even consider running 2to3_ over your code (without committing the -changes). This will let you know where potential pain points are within your -code so that you can fix them properly before they become an issue. - - -Use six_ --------- - -The six_ project contains many things to help you write portable Python code. -You should make sure to read its documentation from beginning to end and use -any and all features it provides. That way you will minimize any mistakes you -might make in writing cross-version code. - - -Capturing the Currently Raised Exception ----------------------------------------- - -One change between Python 2 and 3 that will require changing how you code (if -you support `Python 2.5`_ and earlier) is -accessing the currently raised exception. In Python 2.5 and earlier the syntax -to access the current exception is:: - - try: - raise Exception() - except Exception, exc: - # Current exception is 'exc' - pass -This syntax changed in Python 3 (and backported to `Python 2.6`_ and later) -to:: +Alternative Approaches +====================== - try: - raise Exception() - except Exception as exc: - # Current exception is 'exc' - # In Python 3, 'exc' is restricted to the block; Python 2.6 will "leak" - pass +While supporting Python 2 & 3 simultaneously is typically the preferred choice +by people so that they can continue to improve code and have it work for the +most number of users, your life may be easier if you only have to support one +major version of Python going forward. -Because of this syntax change you must change to capturing the current -exception to:: +Supporting Only Python 3 Going Forward From Python 2 Code +--------------------------------------------------------- - try: - raise Exception() - except Exception: - import sys - exc = sys.exc_info()[1] - # Current exception is 'exc' - pass +If you have Python 2 code but going forward only want to improve it as Python 3 +code, then you can use 2to3_ to translate your Python 2 code to Python 3 code. +This is only recommended, though, if your current version of your project is +going into maintenance mode and you want all new features to be exclusive to +Python 3. -You can get more information about the raised exception from -:func:`sys.exc_info` than simply the current exception instance, but you most -likely don't need it. -.. note:: - In Python 3, the traceback is attached to the exception instance - through the ``__traceback__`` attribute. If the instance is saved in - a local variable that persists outside of the ``except`` block, the - traceback will create a reference cycle with the current frame and its - dictionary of local variables. This will delay reclaiming dead - resources until the next cyclic :term:`garbage collection` pass. +Backporting Python 3 code to Python 2 +------------------------------------- - In Python 2, this problem only occurs if you save the traceback itself - (e.g. the third element of the tuple returned by :func:`sys.exc_info`) - in a variable. +If you have Python 3 code and have little interest in supporting Python 2 you +can use 3to2_ to translate from Python 3 code to Python 2 code. This is only +recommended if you don't plan to heavily support Python 2 users. Other Resources @@ -729,18 +556,41 @@ Other Resources The authors of the following blog posts, wiki pages, and books deserve special thanks for making public their tips for porting Python 2 code to Python 3 (and -thus helping provide information for this document): +thus helping provide information for this document and its various revisions +over the years): +* http://wiki.python.org/moin/PortingPythonToPy3k * http://python3porting.com/ * http://docs.pythonsprints.com/python3_porting/py-porting.html * http://techspot.zzzeek.org/2011/01/24/zzzeek-s-guide-to-python-3-porting/ * http://dabeaz.blogspot.com/2011/01/porting-py65-and-my-superboard-to.html * http://lucumr.pocoo.org/2011/1/22/forwards-compatible-python/ * http://lucumr.pocoo.org/2010/2/11/porting-to-python-3-a-guide/ -* http://wiki.python.org/moin/PortingPythonToPy3k * https://wiki.ubuntu.com/Python/3 If you feel there is something missing from this document that should be added, please email the python-porting_ mailing list. + + +.. _2to3: http://docs.python.org/2/library/2to3.html +.. _3to2: https://pypi.python.org/pypi/3to2 +.. _Cheeseshop: PyPI_ +.. _coverage: https://pypi.python.org/pypi/coverage +.. _future: http://python-future.org/ +.. _modernize: https://github.com/mitsuhiko/python-modernize +.. _Porting to Python 3: http://python3porting.com/ +.. _PyPI: http://pypi.python.org/ +.. _Python 2.2: http://www.python.org/2.2.x +.. _Python 2.5: http://www.python.org/2.5.x +.. _Python 2.6: http://www.python.org/2.6.x +.. _Python 2.7: http://www.python.org/2.7.x +.. _Python 2.5: http://www.python.org/2.5.x +.. _Python 3.3: http://www.python.org/3.3.x +.. _Python 3 Packages: https://pypi.python.org/pypi?:action=browse&c=533&show=all +.. _Python 3 Q & A: http://ncoghlan-devs-python-notes.readthedocs.org/en/latest/python3/questions_and_answers.html .. _python-porting: http://mail.python.org/mailman/listinfo/python-porting +.. _six: https://pypi.python.org/pypi/six +.. _tox: https://pypi.python.org/pypi/tox +.. _trove classifiers: https://pypi.python.org/pypi?%3Aaction=list_classifiers + diff --git a/Doc/howto/regex.rst b/Doc/howto/regex.rst index 5203e53700..fbe763b3f2 100644 --- a/Doc/howto/regex.rst +++ b/Doc/howto/regex.rst @@ -271,8 +271,8 @@ performing string substitutions. :: >>> import re >>> p = re.compile('ab*') - >>> p #doctest: +ELLIPSIS - <_sre.SRE_Pattern object at 0x...> + >>> p + re.compile('ab*') :func:`re.compile` also accepts an optional *flags* argument, used to enable various special features and syntax variations. We'll go over the available @@ -383,8 +383,8 @@ Python interpreter, import the :mod:`re` module, and compile a RE:: >>> import re >>> p = re.compile('[a-z]+') - >>> p #doctest: +ELLIPSIS - <_sre.SRE_Pattern object at 0x...> + >>> p + re.compile('[a-z]+') Now, you can try matching various strings against the RE ``[a-z]+``. An empty string shouldn't match at all, since ``+`` means 'one or more repetitions'. @@ -402,7 +402,7 @@ should store the result in a variable for later use. :: >>> m = p.match('tempo') >>> m #doctest: +ELLIPSIS - <_sre.SRE_Match object at 0x...> + <_sre.SRE_Match object; span=(0, 5), match='tempo'> Now you can query the :ref:`match object <match-objects>` for information about the matching string. :ref:`match object <match-objects>` instances @@ -441,7 +441,7 @@ case. :: >>> print(p.match('::: message')) None >>> m = p.search('::: message'); print(m) #doctest: +ELLIPSIS - <_sre.SRE_Match object at 0x...> + <_sre.SRE_Match object; span=(4, 11), match='message'> >>> m.group() 'message' >>> m.span() @@ -493,7 +493,7 @@ the RE string added as the first argument, and still return either ``None`` or a >>> print(re.match(r'From\s+', 'Fromage amk')) None >>> re.match(r'From\s+', 'From amk Thu May 14 19:12:10 1998') #doctest: +ELLIPSIS - <_sre.SRE_Match object at 0x...> + <_sre.SRE_Match object; span=(0, 5), match='From '> Under the hood, these functions simply create a pattern object for you and call the appropriate method on it. They also store the compiled @@ -685,7 +685,7 @@ given location, they can obviously be matched an infinite number of times. line, the RE to use is ``^From``. :: >>> print(re.search('^From', 'From Here to Eternity')) #doctest: +ELLIPSIS - <_sre.SRE_Match object at 0x...> + <_sre.SRE_Match object; span=(0, 4), match='From'> >>> print(re.search('^From', 'Reciting From Memory')) None @@ -697,11 +697,11 @@ given location, they can obviously be matched an infinite number of times. or any location followed by a newline character. :: >>> print(re.search('}$', '{block}')) #doctest: +ELLIPSIS - <_sre.SRE_Match object at 0x...> + <_sre.SRE_Match object; span=(6, 7), match='}'> >>> print(re.search('}$', '{block} ')) None >>> print(re.search('}$', '{block}\n')) #doctest: +ELLIPSIS - <_sre.SRE_Match object at 0x...> + <_sre.SRE_Match object; span=(6, 7), match='}'> To match a literal ``'$'``, use ``\$`` or enclose it inside a character class, as in ``[$]``. @@ -726,7 +726,7 @@ given location, they can obviously be matched an infinite number of times. >>> p = re.compile(r'\bclass\b') >>> print(p.search('no class at all')) #doctest: +ELLIPSIS - <_sre.SRE_Match object at 0x...> + <_sre.SRE_Match object; span=(3, 8), match='class'> >>> print(p.search('the declassified algorithm')) None >>> print(p.search('one subclass is')) @@ -744,7 +744,7 @@ given location, they can obviously be matched an infinite number of times. >>> print(p.search('no class at all')) None >>> print(p.search('\b' + 'class' + '\b')) #doctest: +ELLIPSIS - <_sre.SRE_Match object at 0x...> + <_sre.SRE_Match object; span=(0, 7), match='\x08class\x08'> Second, inside a character class, where there's no use for this assertion, ``\b`` represents the backspace character, for compatibility with Python's diff --git a/Doc/howto/urllib2.rst b/Doc/howto/urllib2.rst index 5a2266070a..3d5de32152 100644 --- a/Doc/howto/urllib2.rst +++ b/Doc/howto/urllib2.rst @@ -507,7 +507,7 @@ than the URL you pass to .add_password() will also match. :: -- ``ProxyHandler`` (if a proxy setting such as an :envvar:`http_proxy` environment variable is set), ``UnknownHandler``, ``HTTPHandler``, ``HTTPDefaultErrorHandler``, ``HTTPRedirectHandler``, ``FTPHandler``, - ``FileHandler``, ``HTTPErrorProcessor``. + ``FileHandler``, ``DataHandler``, ``HTTPErrorProcessor``. ``top_level_url`` is in fact *either* a full URL (including the 'http:' scheme component and the hostname and optionally the port number) |