| author | wiemann <wiemann@929543f6-e4f2-0310-98a6-ba3bd3dd1d04> | 2005-06-05 15:12:08 +0000 |
|---|---|---|
| committer | wiemann <wiemann@929543f6-e4f2-0310-98a6-ba3bd3dd1d04> | 2005-06-05 15:12:08 +0000 |
| commit | 5dd123325ec534de16ee060bf1723a8fa5bb33d7 | |
| tree | 297d8258c5f87b9d4ee869f8b504f5373a119af7 /docs/dev/hacking.txt | |
| parent | 1e84e4a9faaecb92e19f35d4597db8d208490132 | |
added a "Hacker's Guide" containing a first overview of Docutils' architecture
git-svn-id: http://svn.code.sf.net/p/docutils/code/trunk/docutils@3431 929543f6-e4f2-0310-98a6-ba3bd3dd1d04
Diffstat (limited to 'docs/dev/hacking.txt')
| -rw-r--r-- | docs/dev/hacking.txt | 210 |
1 files changed, 210 insertions, 0 deletions
diff --git a/docs/dev/hacking.txt b/docs/dev/hacking.txt
new file mode 100644
index 000000000..d98823ba0
--- /dev/null
+++ b/docs/dev/hacking.txt
@@ -0,0 +1,210 @@
+.. -*- coding: utf-8 -*-
+
+==========================
+ Docutils_ Hacker's Guide
+==========================
+
+:Author: Felix Wiemann
+:Contact: Felix.Wiemann@ososo.de
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This document has been placed in the public domain.
+
+:Abstract: This is the introduction to Docutils for anyone who
+    wants to extend Docutils in some way.
+:Prerequisites: You have used reStructuredText_ and played around with
+    the `Docutils front-end tools`_ before. Some (basic) Python
+    knowledge is certainly helpful (though not necessary, strictly
+    speaking).
+
+.. _Docutils: http://docutils.sourceforge.net/
+.. _reStructuredText: http://docutils.sourceforge.net/rst.html
+.. _Docutils front-end tools: ../user/tools.html
+
+.. contents::
+
+
+Overview of the Docutils Architecture
+=====================================
+
+To give you an understanding of the Docutils architecture, we'll dive
+right into the internals using a practical example.
+
+Consider the following reStructuredText file::
+
+    My *favorite* language is Python_.
+
+    .. _Python: http://www.python.org/
+
+Using the ``rst2html.py`` front-end tool, you would get HTML output
+which looks like this::
+
+    [uninteresting HTML code removed]
+    <body>
+    <div class="document">
+    <p>My <em>favorite</em> language is <a class="reference" href="http://www.python.org/">Python</a>.</p>
+    </div>
+    </body>
+    </html>
+
+While this looks very simple, it's enough to illustrate all internal
+processing stages of Docutils. Let's see how this document is
+processed from the reStructuredText source to the final HTML output:
+
+Reading the Document
+--------------------
+
+The **Reader** reads the document from the source file and passes it
+to the parser (see below). The default reader is the standalone
+reader (``docutils/readers/standalone.py``) which just reads the input
+data from a single text file. Unless you want to do really fancy
+things, there is no need to change that.
+
+Since you probably won't need to touch readers, we will just move on
+to the next stage:
+
+Parsing the Document
+--------------------
+
+The **Parser** analyzes the input document and creates a **node
+tree** representation. In this case we are using the
+**reStructuredText parser** (``docutils/parsers/rst/__init__.py``).
+To see what that node tree looks like, we call ``quicktest.py`` (which
+can be found in the ``tools/`` directory of the Docutils distribution)
+with our example file (``test.txt``) as the first argument (Windows users
+might need to type ``python quicktest.py test.txt``)::
+
+    $ quicktest.py test.txt
+    <document source="test.txt">
+        <paragraph>
+            My
+            <emphasis>
+                favorite
+             language is
+            <reference name="Python" refname="python">
+                Python
+            .
+        <target ids="python" names="python" refuri="http://www.python.org/">
+
+Let us now examine the node tree:
+
+The top-level node is ``document``. It has a ``source`` attribute
+whose value is ``test.txt``. There are two children: A ``paragraph``
+node and a ``target`` node. The ``paragraph`` in turn has children: A
+``Text`` node ("My "), an ``emphasis`` node, a ``Text`` node (" language is "),
+a ``reference`` node, and again a ``Text`` node (".").
+
+These node types (``document``, ``paragraph``, ``emphasis``, etc.) are
+all defined in ``docutils/nodes.py``. The node types are internally
+arranged as a class hierarchy (for example, both ``emphasis`` and
+``reference`` have the common superclass ``Inline``). To get an
+overview of the node class hierarchy, use epydoc (type ``epydoc
+nodes.py``) and look at the class hierarchy tree.
+
+Transforming the Document
+-------------------------
+
+In the node tree above, the ``reference`` node does not contain the
+target URI (``http://www.python.org/``) yet.
+
+Assigning the target URI (from the ``target`` node) to the
+``reference`` node is *not* done by the parser (the parser only
+translates the input document into a node tree).
+
+Instead, it's done by a **Transform**. In this case (resolving a
+reference), it's done by the ``ExternalTargets`` transform in
+``docutils/transforms/references.py``.
+
+In fact, there are quite a lot of Transforms, which do various useful
+things like creating the table of contents, applying substitution
+references or resolving auto-numbered footnotes.
+
+The Transforms are applied after parsing. To see how the node tree
+has changed after applying the Transforms, we use the
+``rst2pseudoxml.py`` tool:
+
+.. parsed-literal::
+
+    $ rst2pseudoxml.py test.txt
+    <document source="test.txt">
+        <paragraph>
+            My
+            <emphasis>
+                favorite
+             language is
+            <reference name="Python" **refuri="http://www.python.org/"**>
+                Python
+            .
+        <target ids="python" names="python" ``refuri="http://www.python.org/"``>
+
+For our small test document, the only change is that the ``refname``
+attribute of the reference has been replaced by a ``refuri``
+attribute: the reference has been resolved.
+
+While this does not look very exciting, transforms are a powerful tool
+to apply any kind of transformation on the node tree.
+
+By the way, you can also get a "real" XML representation of the node
+tree by using ``rst2xml.py`` instead of ``rst2pseudoxml.py``.
+
+Writing the Document
+--------------------
+
+To get an HTML document out of the node tree, we use a **Writer**, the
+HTML writer in this case (``docutils/writers/html4css1.py``).
+
+The writer receives the node tree and returns the output document.
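The transform and writer stages can be sketched from Python in the same way, assuming a recent Docutils release. `publish_doctree` runs the reader, the parser and the transforms (its result should correspond to the `rst2pseudoxml.py` output above), and `publish_from_doctree` hands the finished node tree to a writer. The writer name `html4css1` is chosen to match the writer module named in the text; `SOURCE` is an illustrative name. Current Docutils versions may render the HTML slightly differently from the 0.3.10 output shown here, so the sketch only prints the paragraph line.

```python
# Minimal sketch of the transform and writer stages, assuming a recent
# Docutils release.
from docutils import nodes
from docutils.core import publish_doctree, publish_from_doctree

# Illustrative input: the example file test.txt from the text.
SOURCE = """\
My *favorite* language is Python_.

.. _Python: http://www.python.org/
"""

# Reader + parser + transforms: in the returned tree the reference has
# been resolved, i.e. "refname" was replaced by "refuri".
doctree = publish_doctree(SOURCE, source_path="test.txt")
reference = doctree.next_node(nodes.reference)
print(reference.get("refuri"))

# Writer: hand the transformed node tree to the HTML writer.
# output_encoding="unicode" returns a str instead of encoded bytes.
html = publish_from_doctree(
    doctree,
    writer_name="html4css1",
    settings_overrides={"output_encoding": "unicode"},
)

# Print only the paragraph; the full output also contains the
# XHTML header and (by default) an embedded stylesheet.
for line in html.splitlines():
    if line.startswith("<p>"):
        print(line)
```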
+For HTML output, we can test this using the ``rst2html.py`` tool::
+
+    $ rst2html.py test.txt
+    <?xml version="1.0" encoding="utf-8" ?>
+    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+    <head>
+    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+    <meta name="generator" content="Docutils 0.3.10: http://docutils.sourceforge.net/" />
+    <title></title>
+    <link rel="stylesheet" href="default.css" type="text/css" />
+    </head>
+    <body>
+    <div class="document">
+    <p>My <em>favorite</em> language is <a class="reference" href="http://www.python.org/">Python</a>.</p>
+    </div>
+    </body>
+    </html>
+
+So here we finally have our HTML output. The actual document contents
+are in the fourth-last line. Note, by the way, that the HTML writer
+did not render the (invisible) ``target`` node; only the ``paragraph``
+node and its children appear in the HTML output.
+
+
+Extending Docutils
+==================
+
+Now you'll ask, "how do I actually extend Docutils?"
+
+First of all, once you are clear about *what* you want to achieve, you
+have to decide *where* to implement it: in the Parser (e.g. by adding a
+directive or role to the reStructuredText parser), as a Transform, or
+in the Writer. There is often one obvious choice among those three
+(Parser, Transform, Writer). If you are unsure, ask on the
+Docutils-develop_ mailing list.
+
+In order to find out how to start, it is often helpful to look at
+similar features which are already implemented. For example, if you
+want to add a new directive to the reStructuredText parser, look at
+the implementation of a similar directive in
+``docutils/parsers/rst/directives/``.
+
+
+What Now?
+=========
+
+This document is not complete. Many topics could (and should) be
+covered here. To find out which topics we should write about
+first, we are awaiting *your* feedback. So please ask your questions
+on the Docutils-develop_ mailing list.
+
+
+.. _Docutils-develop: ../user/mailing-lists.html#docutils-develop
