author: Eli Collins <elic@assurancetechnologies.com>, 2011-06-17 15:59:39 -0400
committer: Eli Collins <elic@assurancetechnologies.com>, 2011-06-17 15:59:39 -0400
commit: 008de2c7b82ce455193df0773e1676b2c395407d
tree: 759216e97b82546dc2e4dd2372b5eb8654c1103b /docs
parent: e23ee714f2606fdb24e071bf481c76442e0a1aec
added unicode/bytes policy to password hash api
Diffstat (limited to 'docs')
-rw-r--r-- docs/password_hash_api.rst | 116
1 files changed, 116 insertions, 0 deletions
diff --git a/docs/password_hash_api.rst b/docs/password_hash_api.rst
index f2f89be..085a604 100644
--- a/docs/password_hash_api.rst
+++ b/docs/password_hash_api.rst
@@ -453,6 +453,9 @@ the following attributes are usually exposed.
string containing list of all characters which are allowed
to be specified in salt parameter.
for most hashes, this is equal to :data:`passlib.utils.h64.CHARS`.
+
+ this must be a unicode string if the salt is encoded,
+ or (rarely) bytes if the salt is unencoded raw bytes.
.. todo::
@@ -479,6 +482,119 @@ the following attributes are usually exposed.
xxx: what about a bits_per_salt_char or some such, so effective salt strength
can be compared?
+Unicode Behavior
+================
+
+.. versionadded:: Passlib 1.5
+
+Quick summary
+-------------
+For the application developer in a hurry:
+
+* Passwords should be provided as :class:`unicode` if possible.
+ They may also be provided as :class:`bytes`,
+ in which case it is strongly suggested
+ that they be encoded using ``utf-8`` or ``ascii``.
+
+* Passlib will always return hashes as native Python strings.
+ This means :class:`unicode` under Python 3,
+ and ``ascii``-encoded :class:`bytes` under Python 2.
+
+* Applications should provide hashes as :class:`unicode` if possible.
+ However, ``ascii``-encoded :class:`bytes` are also accepted
+ under Python 2.
+
+The following sections detail the issues surrounding
+the encoding of passwords and password hashes, and the behavior required
+of handlers implementing this API.
+They can be skipped by the uninterested.
+
+Passwords
+---------
+Applications are strongly encouraged to provide passwords
+as :class:`unicode`. There are two situations where an application
+might need to provide a password as :class:`bytes`:
+the application isn't unicode-aware (as is the case for many Python 2 applications),
+or it needs to verify a password hash that used a specific encoding (e.g. ``latin-1``).
+In either case, application developers should consider
+the following issues:
+
+* Most hashes in Passlib operate on a string of bytes.
+ For handlers implementing such hashes,
+ passwords provided as :class:`unicode` should be encoded to ``utf-8``,
+ and passwords provided as :class:`bytes` should be treated as opaque.
+
+ A few of these hashes officially specify this behavior;
+ the rest have no preferred encoding at all,
+ so this was chosen as a sensible standard behavior.
+ Unless the underlying algorithm specifies an alternate policy,
+ handlers should always encode unicode to ``utf-8``.
+
+* Because of the above behavior for :class:`unicode` inputs,
+ applications which encode their passwords are urged
+ to use ``utf-8`` or ``ascii``,
+ so that hashes they generate with encoded bytes
+ will verify correctly if/when they start using unicode.
+
+ Applications which need to verify existing hashes
+ using an alternate encoding such as ``latin-1``
+ should be wary of this future "gotcha".
+
+* A few hashes operate on :class:`unicode` strings instead.
+ For handlers implementing such hashes:
+ passwords provided as :class:`unicode` should be handled as appropriate,
+ and passwords provided as :class:`bytes` should be treated as ``utf-8``,
+ and decoded.
+
+ This behavior was chosen in order to be compatible with
+ the common case (above), combined with the fact
+ that applications should never need to use a specific
+ encoding with these hashes, as they are natively unicode.
+
+ (The only hashes in Passlib like this are
+ :class:`~passlib.hash.oracle10` and :class:`~passlib.hash.nthash`.)
+
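The password policy above can be illustrated with a minimal Python 3 sketch. It is not Passlib code: ``to_digest_bytes``, ``to_digest_text`` and ``toy_digest`` are hypothetical helpers, and plain SHA-256 stands in for a real password hash purely to make the encoding effect visible.

```python
import hashlib


def to_digest_bytes(secret):
    """Hypothetical helper for byte-oriented hashes:
    unicode is encoded to utf-8, bytes are treated as opaque."""
    if isinstance(secret, str):
        return secret.encode("utf-8")
    if isinstance(secret, bytes):
        return secret
    raise TypeError("secret must be str or bytes")


def to_digest_text(secret):
    """Hypothetical helper for unicode-oriented hashes:
    bytes are assumed to be utf-8 and decoded, unicode passes through."""
    if isinstance(secret, bytes):
        return secret.decode("utf-8")
    if isinstance(secret, str):
        return secret
    raise TypeError("secret must be str or bytes")


def toy_digest(secret):
    """Stand-in digest (plain SHA-256), NOT a real password hash."""
    return hashlib.sha256(to_digest_bytes(secret)).hexdigest()


password = "caf\u00e9"  # contains one non-ascii character

unicode_digest = toy_digest(password)
utf8_digest = toy_digest(password.encode("utf-8"))
latin1_digest = toy_digest(password.encode("latin-1"))

# utf-8 encoded bytes and unicode input agree; latin-1 bytes do not.
print(unicode_digest == utf8_digest)
print(unicode_digest == latin1_digest)
```

Under these assumptions, a digest created from ``latin-1`` bytes would no longer match once the application starts passing the same password as unicode, which is the "gotcha" described above.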
+Hashes
+------
+With the exception of plaintext passwords,
+literally *all* of the hash formats surveyed by the Passlib authors
+use only the characters found in 7-bit ``ascii``.
+This has caused most password hashing code (in python and elsewhere)
+to draw a very blurry line between :class:`unicode` and :class:`bytes`.
+Because of that, the following behavior was dictated less
+by design requirements, and more by compatibility
+and ease of implementation issues:
+
+* Handlers should accept hashes as either :class:`unicode` or
+ as ``ascii``-encoded :class:`bytes`.
+
+ This behavior allows applications to provide hashes
+ as unicode or as bytes, as they please, which makes
+ (among other things) migration to Python 3 easier.
+
+ The primary exception to this is handlers implementing
+ plaintext passwords. The implementations in Passlib generally
+ use ``utf-8`` to encode unicode passwords,
+ and reproduce unchanged any passwords provided as opaque bytes.
+
+* Internally, it is recommended that handlers
+ operate on :class:`unicode` for parsing and formatting
+ purposes, and use :class:`bytes` only for decoded
+ data that is passed directly into their digest routine.
+
+* Handlers should return hashes as native Python strings.
+ This means :class:`unicode` under Python 3,
+ and ``ascii``-encoded :class:`bytes` under Python 2.
+
+ This behavior was chosen to fit with Python 3's
+ unicode-oriented philosophy, while retaining
+ backwards compatibility with Passlib 1.4 and earlier
+ under Python 2.
+
+ Handlers should use the :func:`passlib.utils.to_hash_str` function
+ to coerce their unicode hashes to whatever is appropriate
+ for the platform before returning them.
+
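The hash-handling rules above might look roughly like the following in a handler. This is a Python 3 only sketch under stated assumptions: ``hash_to_unicode`` and ``parse_toy_hash`` are hypothetical names, the ``$ident$salt$checksum`` layout is only an illustrative format, and nothing here reproduces the actual :func:`passlib.utils.to_hash_str` implementation.

```python
def hash_to_unicode(hash_value):
    """Hypothetical helper: accept a hash as unicode or ascii-encoded bytes
    and return unicode for internal parsing."""
    if isinstance(hash_value, bytes):
        # Non-ascii bytes raise UnicodeDecodeError, since hash strings
        # are expected to use 7-bit ascii only.
        return hash_value.decode("ascii")
    if isinstance(hash_value, str):
        return hash_value
    raise TypeError("hash must be str or bytes")


def parse_toy_hash(hash_value):
    """Parse an illustrative '$ident$salt$checksum' string into its parts,
    operating on unicode internally."""
    text = hash_to_unicode(hash_value)
    empty, ident, salt, checksum = text.split("$")
    if empty:
        raise ValueError("hash must start with '$'")
    # Under Python 3 the native string type is unicode, so the parts can be
    # returned as-is; under Python 2 a handler would encode to ascii instead.
    return ident, salt, checksum


# The same hash parses identically whether given as bytes or as unicode.
from_bytes = parse_toy_hash(b"$1$abcdefgh$checksumvalue")
from_text = parse_toy_hash("$1$abcdefgh$checksumvalue")
print(from_bytes == from_text)
```

Decoding once at the boundary keeps the parsing code free of type checks, which is the main reason for working on unicode internally.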
Footnotes
=========
.. [#otypes] While this specification is written referring to classes and classmethods,