author: Mike Bayer <mike_mp@zzzcomputing.com> 2015-03-22 18:30:37 -0400
committer: Mike Bayer <mike_mp@zzzcomputing.com> 2015-03-22 19:25:10 -0400
commit: ddab2d2351fc79138dcbe650c12f2e153dae4751
tree: b34b4e8eaca05337fe5615a9f67609cf765e592d /lib/sqlalchemy/dialects/mysql/base.py
parent: b1146821aa8899ea8724c61ca3d48ba4928a1db4
- more updates to the unicode mess to frame this in
as up-to-date recommendations as possible
Diffstat (limited to 'lib/sqlalchemy/dialects/mysql/base.py')
-rw-r--r-- lib/sqlalchemy/dialects/mysql/base.py | 69
1 files changed, 54 insertions, 15 deletions
diff --git a/lib/sqlalchemy/dialects/mysql/base.py b/lib/sqlalchemy/dialects/mysql/base.py
index 131112ff4..8460ff92a 100644
--- a/lib/sqlalchemy/dialects/mysql/base.py
+++ b/lib/sqlalchemy/dialects/mysql/base.py
@@ -151,6 +151,9 @@ multi-column key for some storage engines::
Unicode
-------
+Charset Selection
+~~~~~~~~~~~~~~~~~
+
Most MySQL DBAPIs offer the option to set the client character set for
a connection. This is typically delivered using the ``charset`` parameter
in the URL, such as::
@@ -158,14 +161,11 @@ in the URL, such as::
e = create_engine("mysql+pymysql://scott:tiger@localhost/\
test?charset=utf8")
-Whether or not the DBAPI handles the job of encoding and decoding is determined
-by passing the ``use_unicode`` parameter, supported by MySQLdb and PyMySQL
-and possibly others.
-For example, to disable unicode conversion by the DBAPI and let
-SQLAlchemy handle it::
-
- e = create_engine("mysql+pymysql://scott:tiger@localhost/\
-test?charset=utf8&use_uncode=0")
+This charset is the **client character set** for the connection. Some
+MySQL DBAPIs will default this to a value such as ``latin1``, and some
+will make use of the ``default-character-set`` setting in the ``my.cnf``
+file as well. Documentation for the DBAPI in use should be consulted
+for specific behavior.
The encoding used for Unicode has traditionally been ``'utf8'``. However,
for MySQL versions 5.5.3 on forward, a new MySQL-specific encoding
@@ -174,22 +174,61 @@ is due to the fact that MySQL's utf-8 encoding only supports
codepoints up to three bytes instead of four. Therefore,
when communicating with a MySQL database
that includes codepoints more than three bytes in size,
-this new charset must be used, as in::
+this new charset is preferred, if supported by both the database and
+the client DBAPI, as in::
e = create_engine("mysql+pymysql://scott:tiger@localhost/\
test?charset=utf8mb4")
+At the moment, up-to-date versions of MySQLdb and PyMySQL support the
+``utf8mb4`` charset. Other DBAPIs such as MySQL-Connector and OurSQL
+may **not** support it as of yet.
+
In order to use ``utf8mb4`` encoding, changes to
-the MySQL schema and/or server configuration may be required - see the
-MySQL documentation below for more information.
+the MySQL schema and/or server configuration may be required.
.. seealso::
`The utf8mb4 Character Set \
-<http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html>`_
-
- :ref:`mysqldb_unicode` - MySQL-Python connection strings, which are
- also equivalent on other MySQL DBAPIs.
+<http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html>`_ - \
+in the MySQL documentation
+
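As an aside to the charset discussion in this hunk, the three-byte limit can be checked from Python before choosing a charset. The following is a minimal sketch, assuming Python 3; the helper name ``needs_utf8mb4`` is hypothetical and is not part of SQLAlchemy or any DBAPI. It relies only on the fact that code points above U+FFFF take four bytes in UTF-8, which is the case the ``utf8mb4`` charset exists to cover.

```python
def needs_utf8mb4(text):
    """Return True if ``text`` contains a code point above U+FFFF.

    Such code points encode to four bytes in UTF-8, so they do not fit
    in MySQL's three-byte ``utf8`` charset.  (Hypothetical helper, for
    illustration only.)
    """
    return any(ord(ch) > 0xFFFF for ch in text)


if __name__ == "__main__":
    samples = ["plain ascii", "caf\u00e9", "snowman \u2603", "emoji \U0001F600"]
    for sample in samples:
        # Widest UTF-8 encoding of any single character in the sample.
        widest = max(len(ch.encode("utf-8")) for ch in sample)
        print(repr(sample), widest, needs_utf8mb4(sample))
```

A check like this is only a diagnostic for application data; the charset itself is still selected per connection through the ``charset`` URL parameter shown in the diff.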
+Unicode Encoding / Decoding
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+All modern MySQL DBAPIs offer the service of handling the encoding and
+decoding of unicode data between the Python application space and the database.
+As this was not always the case, SQLAlchemy also includes a comprehensive system
+for performing the encode/decode task. As only one of these systems
+should be in use at a time, SQLAlchemy has long included functionality
+to automatically detect upon first connection whether or not the DBAPI is
+automatically handling unicode.
+
+Whether or not the MySQL DBAPI will handle encoding can usually be configured
+using a DBAPI flag ``use_unicode``, which is known to be supported at least
+by MySQLdb, PyMySQL, and MySQL-Connector. Setting this value to ``0``
+in the "connect args" or query string will have the effect of disabling the
+DBAPI's handling of unicode, such that it instead will return data of the
+``str`` type or ``bytes`` type, with data in the configured charset::
+
+ # connect while disabling the DBAPI's unicode encoding/decoding
+ e = create_engine("mysql+mysqldb://scott:tiger@localhost/test?charset=utf8&use_unicode=0")
+
+Current recommendations for modern DBAPIs are as follows:
+
+* It is generally safe to leave the ``use_unicode`` flag set at
+ its default; that is, don't use it at all.
+* Under Python 3, the ``use_unicode=0`` flag should **never be used**.
+ SQLAlchemy under Python 3 generally assumes the DBAPI receives and returns
+ string values as Python 3 strings, which are inherently unicode objects.
+* Under Python 2 with MySQLdb, the ``use_unicode=0`` flag will **offer
+ superior performance**, as MySQLdb's unicode converters have been observed,
+ under Python 2 only, to be unusually slow compared to SQLAlchemy's
+ fast C-based encoders/decoders.
+
+In short: don't specify ``use_unicode`` *at all*, with the possible
+exception of ``use_unicode=0`` on MySQLdb with Python 2 **only** for a
+potential performance gain.
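The recommendations added in this hunk can be sketched as a small URL-building helper. This is an illustration only, assuming Python 3 to run it: ``mysql_url`` and its parameters are hypothetical names, not a SQLAlchemy API, and the sketch does not escape special characters in the username or password. It omits ``use_unicode`` by default and only emits ``use_unicode=0`` for the one combination the text allows, MySQLdb on Python 2.

```python
import sys
from urllib.parse import urlencode


def mysql_url(driver, user, password, host, database,
              charset="utf8mb4", disable_dbapi_unicode=False,
              python_major=None):
    """Build a MySQL engine URL following the recommendations above.

    Hypothetical helper: ``use_unicode`` is left out entirely unless
    ``disable_dbapi_unicode`` is requested, and that request is only
    accepted for the ``mysqldb`` driver under Python 2.
    """
    if python_major is None:
        python_major = sys.version_info[0]

    query = {"charset": charset}
    if disable_dbapi_unicode:
        if python_major >= 3:
            raise ValueError("use_unicode=0 should never be used under Python 3")
        if driver != "mysqldb":
            raise ValueError("use_unicode=0 is only suggested for MySQLdb")
        query["use_unicode"] = "0"

    return "mysql+%s://%s:%s@%s/%s?%s" % (
        driver, user, password, host, database, urlencode(query))


if __name__ == "__main__":
    # Default case: no use_unicode flag at all.
    print(mysql_url("pymysql", "scott", "tiger", "localhost", "test"))
```

The resulting string would then be passed to ``create_engine()`` as in the examples in the diff. The ``python_major`` argument exists only so the Python 2 branch can be exercised from a Python 3 interpreter.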
Ansi Quoting Style
------------------