diff options
Diffstat (limited to 'docs/dev/enthought-plan.txt')
| -rw-r--r-- | docs/dev/enthought-plan.txt | 480 |
1 files changed, 480 insertions, 0 deletions
diff --git a/docs/dev/enthought-plan.txt b/docs/dev/enthought-plan.txt new file mode 100644 index 000000000..0ab0d3c83 --- /dev/null +++ b/docs/dev/enthought-plan.txt @@ -0,0 +1,480 @@ +=========================================== + Plan for Enthought API Documentation Tool +=========================================== + +:Author: David Goodger +:Contact: goodger@python.org +:Date: $Date$ +:Revision: $Revision$ +:Copyright: 2004 by `Enthought, Inc. <http://www.enthought.com>`_ +:License: `Enthought License`_ (BSD-style) + +.. _Enthought License: http://docutils.sf.net/licenses/enthought.txt + +This document should be read in conjunction with the `Enthought API +Documentation Tool RFP`__ prepared by Janet Swisher. + +__ enthought-rfp.html + +.. contents:: +.. sectnum:: + + +Introduction +============ + +In March 2004 at I met Eric Jones, president and CTO of `Enthought, +Inc.`_, at `PyCon 2004`_ in Washington DC. He told me that Enthought +was using reStructuredText_ for source code documentation, but they +had some issues. He asked if I'd be interested in doing some work on +a customized API documentation tool. Shortly after PyCon, Janet +Swisher, Enthought's senior technical writer, contacted me to work out +details. Some email, a trip to Austin in May, and plenty of Texas +hospitality later, we had a project. This document will record the +details, milestones, and evolution of the project. + +In a nutshell, Enthought is sponsoring the implementation of an open +source API documentation tool that meets their needs. Fortuitously, +their needs coincide well with the "Python Source Reader" description +in `PEP 258`_. In other words, Enthought is funding some significant +improvements to Docutils, improvements that were planned but never +implemented due to time and other constraints. The implementation +will take place gradually over several months, on a part-time basis. + +This is an ideal example of cooperation between a corporation and an +open-source project. The corporation, the project, I personally, and +the community all benefit. Enthought, whose commitment to open source +is also evidenced by their sponsorship of SciPy_, benefits by +obtaining a useful piece of software, much more quickly than would +have been possible without their support. Docutils benefits directly +from the implementation of one of its core subsystems. I benefit from +the funding, which allows me to justify the long hours to my wife and +family. All the corporations, projects, and individuals that make up +the community will benefit from the end result, which will be great. + +All that's left now is to actually do the work! + +.. _PyCon 2004: http://pycon.org/dc2004/ +.. _reStructuredText: http://docutils.sf.net/rst.html +.. _SciPy: http://www.scipy.org/ + + +Development Plan +================ + +1. Analyze prior art, most notably Epydoc_ and HappyDoc_, to see how + they do what they do. I have no desire to reinvent wheels + unnecessarily. I want to take the best ideas from each tool, + combined with the outline in `PEP 258`_ (which will evolve), and + build at least the foundation of the definitive Python + auto-documentation tool. + + .. _Epydoc: http://epydoc.sourceforge.net/ + .. _HappyDoc: http://happydoc.sourceforge.net/ + .. _PEP 258: + http://docutils.sf.net/docs/peps/pep-0258.html#python-source-reader + +2. Decide on a base platform. The best way to achieve Enthought's + goals in a reasonable time frame may be to extend Epydoc or + HappyDoc. Or it may be necessary to start fresh. + +3. Extend the reStructuredText parser. See `Proposed Changes to + reStructuredText`_ below. + +4. Depending on the base platform chosen, build or extend the + docstring & doc comment extraction tool. This may be the biggest + part of the project, but I won't be able to break it down into + details until more is known. + + +Repository +========== + +If possible, all software and documentation files will be stored in +the Subversion repository of Docutils and/or the base project, which +are all publicly-available via anonymous pserver access. + +The Docutils project is very open about granting Subversion write +access; so far, everyone who asked has been given access. Any +Enthought staff member who would like Subversion write access will get +it. + +If either Epydoc or HappyDoc is chosen as the base platform, I will +ask the project's administrator for CVS access for myself and any +Enthought staff member who wants it. If sufficient access is not +granted -- although I doubt that there would be any problem -- we may +have to begin a fork, which could be hosted on SourceForge, on +Enthought's Subversion server, or anywhere else deemed appropriate. + + +Copyright & License +=================== + +Most existing Docutils files have been placed in the public domain, as +follows:: + + :Copyright: This document has been placed in the public domain. + +This is in conjunction with the "Public Domain Dedication" section of +COPYING.txt__. + +__ http://docutils.sourceforge.net/COPYING.html + +The code and documentation originating from Enthought funding will +have Enthought's copyright and license declaration. While I will try +to keep Enthought-specific code and documentation separate from the +existing files, there will inevitably be cases where it makes the most +sense to extend existing files. + +I propose the following: + +1. New files related to this Enthought-funded work will be identified + with the following field-list headers:: + + :Copyright: 2004 by Enthought, Inc. + :License: Enthought License (BSD Style) + + The license field text will be linked to the license file itself. + +2. For significant or major changes to an existing file (more than 10% + change), the headers shall change as follows (for example):: + + :Copyright: 2001-2004 by David Goodger + :Copyright: 2004 by Enthought, Inc. + :License: BSD-style + + If the Enthought-funded portion becomes greater than the previously + existing portion, Enthought's copyright line will be shown first. + +3. In cases of insignificant or minor changes to an existing file + (less than 10% change), the public domain status shall remain + unchanged. + +A section describing all of this will be added to the Docutils +`COPYING`__ instructions file. + +If another project is chosen as the base project, similar changes +would be made to their files, subject to negotiation. + +__ http://docutils.sf.net/COPYING.html + + +Proposed Changes to reStructuredText +==================================== + +Doc Comment Syntax +------------------ + +The "traits" construct is implemented as dictionaries, where +standalone strings would be Python syntax errors. Therefore traits +require documentation in comments. We also need a way to +differentiate between ordinary "internal" comments and documentation +comments (doc comments). + +Javadoc uses the following syntax for doc comments:: + + /** + * The first line of a multi-line doc comment begins with a slash + * and *two* asterisks. The doc comment ends normally. + */ + +Python doesn't have multi-line comments; only single-line. A similar +convention in Python might look like this:: + + ## + # The first line of a doc comment begins with *two* hash marks. + # The doc comment ends with the first non-comment line. + 'data' : AnyValue, + + ## The double-hash-marks could occur on the first line of text, + # saving a line in the source. + 'data' : AnyValue, + +How to indicate the end of the doc comment? :: + + ## + # The first line of a doc comment begins with *two* hash marks. + # The doc comment ends with the first non-comment line, or another + # double-hash-mark. + ## + # This is an ordinary, internal, non-doc comment. + 'data' : AnyValue, + + ## First line of a doc comment, terse syntax. + # Second (and last) line. Ends here: ## + # This is an ordinary, internal, non-doc comment. + 'data' : AnyValue, + +Or do we even need to worry about this case? A simple blank line +could be used:: + + ## First line of a doc comment, terse syntax. + # Second (and last) line. Ends with a blank line. + + # This is an ordinary, internal, non-doc comment. + 'data' : AnyValue, + +Other possibilities:: + + #" Instead of double-hash-marks, we could use a hash mark and a + # quotation mark to begin the doc comment. + 'data' : AnyValue, + + ## We could require double-hash-marks on every line. This has the + ## added benefit of delimiting the *end* of the doc comment, as + ## well as working well with line wrapping in Emacs + ## ("fill-paragraph" command). + # Ordinary non-doc comment. + 'data' : AnyValue, + + #" A hash mark and a quotation mark on each line looks funny, and + #" it doesn't work well with line wrapping in Emacs. + 'data' : AnyValue, + +These styles (repeated on each line) work well with line wrapping in +Emacs:: + + ## #> #| #- #% #! #* + +These styles do *not* work well with line wrapping in Emacs:: + + #" #' #: #) #. #/ #@ #$ #^ #= #+ #_ #~ + +The style of doc comment indicator used could be a runtime, global +and/or per-module setting. That may add more complexity than it's +worth though. + + +Recommendation +`````````````` + +I recommend adopting "#*" on every line:: + + # This is an ordinary non-doc comment. + + #* This is a documentation comment, with an asterisk after the + #* hash marks on every line. + 'data' : AnyValue, + +I initially recommended adopting double-hash-marks:: + + # This is an ordinary non-doc comment. + + ## This is a documentation comment, with double-hash-marks on + ## every line. + 'data' : AnyValue, + +But Janet Swisher rightly pointed out that this could collide with +ordinary comments that are then block-commented. This applies to +double-hash-marks on the first line only as well. So they're out. + +On the other hand, the JavaDoc-comment style ("##" on the first line +only, "#" after that) is used in Fredrik Lundh's PythonDoc_. It may +be worthwhile to conform to this syntax, reinforcing it as a standard. +PythonDoc does not support terse doc comments (text after "##" on the +first line). + +.. _PythonDoc: http://effbot.org/zone/pythondoc.htm + + +Update +`````` + +Enthought's Traits system has switched to a metaclass base, and traits +are now defined via ordinary attributes. Therefore doc comments are +no longer absolutely necessary; attribute docstrings will suffice. +Doc comments may still be desirable though, since they allow +documentation to precede the thing being documented. + + +Docstring Density & Whitespace Minimization +------------------------------------------- + +One problem with extensively documented classes & functions, is that +there is a lot of screen space wasted on whitespace. Here's some +current Enthought code (from lib/cp/fluids/gassmann.py):: + + def max_gas(temperature, pressure, api, specific_gravity=.56): + """ + Computes the maximum dissolved gas in oil using Batzle and + Wang (1992). + + Parameters + ---------- + temperature : sequence + Temperature in degrees Celsius + pressure : sequence + Pressure in MPa + api : sequence + Stock tank oil API + specific_gravity : sequence + Specific gravity of gas at STP, default is .56 + + Returns + ------- + max_gor : sequence + Maximum dissolved gas in liters/liter + + Description + ----------- + This estimate is based on equations given by Mavko, Mukerji, + and Dvorkin, (1998, pp. 218-219, or 2003, p. 236) obtained + originally from Batzle and Wang (1992). + """ + code... + +The docstring is 24 lines long. + +Rather than using subsections, field lists (which exist now) can save +6 lines:: + + def max_gas(temperature, pressure, api, specific_gravity=.56): + """ + Computes the maximum dissolved gas in oil using Batzle and + Wang (1992). + + :Parameters: + temperature : sequence + Temperature in degrees Celsius + pressure : sequence + Pressure in MPa + api : sequence + Stock tank oil API + specific_gravity : sequence + Specific gravity of gas at STP, default is .56 + :Returns: + max_gor : sequence + Maximum dissolved gas in liters/liter + :Description: This estimate is based on equations given by + Mavko, Mukerji, and Dvorkin, (1998, pp. 218-219, or 2003, + p. 236) obtained originally from Batzle and Wang (1992). + """ + code... + +As with the "Description" field above, field bodies may begin on the +same line as the field name, which also saves space. + +The output for field lists is typically a table structure. For +example: + + :Parameters: + temperature : sequence + Temperature in degrees Celsius + pressure : sequence + Pressure in MPa + api : sequence + Stock tank oil API + specific_gravity : sequence + Specific gravity of gas at STP, default is .56 + :Returns: + max_gor : sequence + Maximum dissolved gas in liters/liter + :Description: + This estimate is based on equations given by Mavko, + Mukerji, and Dvorkin, (1998, pp. 218-219, or 2003, p. 236) + obtained originally from Batzle and Wang (1992). + +But the definition lists describing the parameters and return values +are still wasteful of space. There are a lot of half-filled lines. + +Definition lists are currently defined as:: + + term : classifier + definition + +Where the classifier part is optional. Ideas for improvements: + +1. We could allow multiple classifiers:: + + term : classifier one : two : three ... + definition + +2. We could allow the definition on the same line as the term, using + some embedded/inline markup: + + * "--" could be used, but only in limited and well-known contexts:: + + term -- definition + + This is the syntax used by StructuredText (one of + reStructuredText's predecessors). It was not adopted for + reStructuredText because it is ambiguous -- people often use "--" + in their text, as I just did. But given a constrained context, + the ambiguity would be acceptable (or would it?). That context + would be: in docstrings, within a field list, perhaps only with + certain well-defined field names (parameters, returns). + + * The "constrained context" above isn't really enough to make the + ambiguity acceptable. Instead, a slightly more verbose but far + less ambiguous syntax is possible:: + + term === definition + + This syntax has advantages. Equals signs lend themselves to the + connotation of "definition". And whereas one or two equals signs + are commonly used in program code, three equals signs in a row + have no conflicting meanings that I know of. (Update: there + *are* uses out there.) + + The problem with this approach is that using inline markup for + structure is inherently ambiguous in reStructuredText. For + example, writing *about* definition lists would be difficult:: + + ``term === definition`` is an example of a compact definition list item + + The parser checks for structural markup before it does inline + markup processing. But the "===" should be protected by its inline + literal context. + +3. We could allow the definition on the same line as the term, using + structural markup. A variation on bullet lists would work well:: + + : term :: definition + : another term :: and a definition that + wraps across lines + + Some ambiguity remains:: + + : term ``containing :: double colons`` :: definition + + But the likelihood of such cases is negligible, and they can be + covered in the documentation. + + Other possibilities for the definition delimiter include:: + + : term : classifier -- definition + : term : classifier --- definition + : term : classifier : : definition + : term : classifier === definition + +The third idea currently has the best chance of being adopted and +implemented. + + +Recommendation +`````````````` + +Combining these ideas, the function definition becomes:: + + def max_gas(temperature, pressure, api, specific_gravity=.56): + """ + Computes the maximum dissolved gas in oil using Batzle and + Wang (1992). + + :Parameters: + : temperature : sequence :: Temperature in degrees Celsius + : pressure : sequence :: Pressure in MPa + : api : sequence :: Stock tank oil API + : specific_gravity : sequence :: Specific gravity of gas at + STP, default is .56 + :Returns: + : max_gor : sequence :: Maximum dissolved gas in liters/liter + :Description: This estimate is based on equations given by + Mavko, Mukerji, and Dvorkin, (1998, pp. 218-219, or 2003, + p. 236) obtained originally from Batzle and Wang (1992). + """ + code... + +The docstring is reduced to 14 lines, from the original 24. For +longer docstrings with many parameters and return values, the +difference would be more significant. |
