| author | Mike Bayer <mike_mp@zzzcomputing.com> | 2015-03-22 18:30:37 -0400 |
|---|---|---|
| committer | Mike Bayer <mike_mp@zzzcomputing.com> | 2015-03-22 19:25:10 -0400 |
| commit | ddab2d2351fc79138dcbe650c12f2e153dae4751 | |
| tree | b34b4e8eaca05337fe5615a9f67609cf765e592d /lib/sqlalchemy/dialects/mysql/base.py | |
| parent | b1146821aa8899ea8724c61ca3d48ba4928a1db4 | |
- more updates to the unicode mess to frame this in
as up-to-date recommendations as possible
Diffstat (limited to 'lib/sqlalchemy/dialects/mysql/base.py')
| -rw-r--r-- | lib/sqlalchemy/dialects/mysql/base.py | 69 |
1 files changed, 54 insertions, 15 deletions
diff --git a/lib/sqlalchemy/dialects/mysql/base.py b/lib/sqlalchemy/dialects/mysql/base.py
index 131112ff4..8460ff92a 100644
--- a/lib/sqlalchemy/dialects/mysql/base.py
+++ b/lib/sqlalchemy/dialects/mysql/base.py
@@ -151,6 +151,9 @@ multi-column key for some storage engines::
 Unicode
 -------
 
+Charset Selection
+~~~~~~~~~~~~~~~~~
+
 Most MySQL DBAPIs offer the option to set the client character set for
 a connection. This is typically delivered using the ``charset`` parameter
 in the URL, such as::
@@ -158,14 +161,11 @@ in the URL, such as::
     e = create_engine("mysql+pymysql://scott:tiger@localhost/\
 test?charset=utf8")
 
-Whether or not the DBAPI handles the job of encoding and decoding is determined
-by passing the ``use_unicode`` parameter, supported by MySQLdb and PyMySQL
-and possibly others.
-For example, to disable unicode conversion by the DBAPI and let
-SQLAlchemy handle it::
-
-    e = create_engine("mysql+pymysql://scott:tiger@localhost/\
-test?charset=utf8&use_uncode=0")
+This charset is the **client character set** for the connection. Some
+MySQL DBAPIs will default this to a value such as ``latin1``, and some
+will make use of the ``default-character-set`` setting in the ``my.cnf``
+file as well. Documentation for the DBAPI in use should be consulted
+for specific behavior.
 
 The encoding used for Unicode has traditionally been ``'utf8'``. However,
 for MySQL versions 5.5.3 on forward, a new MySQL-specific encoding
@@ -174,22 +174,61 @@ is due to the fact that MySQL's utf-8 encoding only supports
 codepoints up to three bytes instead of four. Therefore,
 when communicating with a MySQL database
 that includes codepoints more than three bytes in size,
-this new charset must be used, as in::
+this new charset is preferred, if supported by both the database as well
+as the client DBAPI, as in::
 
     e = create_engine("mysql+pymysql://scott:tiger@localhost/\
 test?charset=utf8mb4")
 
+At the moment, up-to-date versions of MySQLdb and PyMySQL support the
+``utf8mb4`` charset. Other DBAPIs such as MySQL-Connector and OurSQL
+may **not** support it as of yet.
+
 In order to use ``utf8mb4`` encoding, changes to
-the MySQL schema and/or server configuration may be required - see the
-MySQL documentation below for more information.
+the MySQL schema and/or server configuration may be required.
 
 .. seealso::
 
     `The utf8mb4 Character Set \
-<http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html>`_
-
-   :ref:`mysqldb_unicode` - MySQL-Python connection strings, which are
-   also equivalent on other MySQL DBAPIs.
+<http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html>`_ - \
+in the MySQL documentation
+
+Unicode Encoding / Decoding
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+All modern MySQL DBAPIs all offer the service of handling the encoding and
+decoding of unicode data between the Python application space and the database.
+As this was not always the case, SQLAlchemy also includes a comprehensive system
+of performing the encode/decode task as well. As only one of these systems
+should be in use at at time, SQLAlchemy has long included functionality
+to automatically detect upon first connection whether or not the DBAPI is
+automatically handling unicode.
+
+Whether or not the MySQL DBAPI will handle encoding can usually be configured
+using a DBAPI flag ``use_unicode``, which is known to be supported at least
+by MySQLdb, PyMySQL, and MySQL-Connector. Setting this value to ``0``
+in the "connect args" or query string will have the effect of disabling the
+DBAPI's handling of unicode, such that it instead will return data of the
+``str`` type or ``bytes`` type, with data in the configured charset::
+
+    # connect while disabling the DBAPI's unicode encoding/decoding
+    e = create_engine("mysql+mysqldb://scott:tiger@localhost/test?charset=utf8&use_unicode=0")
+
+Current recommendations for modern DBAPIs are as follows:
+
+* It is generally always safe to leave the ``use_unicode`` flag set at
+  its default; that is, don't use it at all.
+* Under Python 3, the ``use_unicode=0`` flag should **never be used**.
+  SQLAlchemy under Python 3 generally assumes the DBAPI receives and returns
+  string values as Python 3 strings, which are inherently unicode objects.
+* Under Python 2 with MySQLdb, the ``use_unicode=0`` flag will **offer
+  superior performance**, as MySQLdb's unicode converters under Python 2 only
+  have been observed to have unusually slow performance compared to SQLAlchemy's
+  fast C-based encoders/decoders.
+
+In short: don't specify ``use_unicode`` *at all*, with the possible
+exception of ``use_unicode=0`` on MySQLdb with Python 2 **only** for a
+potential performance gain.
 
 Ansi Quoting Style
 ------------------
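
The charset and ``use_unicode`` recommendations in the docstring changes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `needs_utf8mb4` and `mysql_url`, and their parameters, are hypothetical and not part of SQLAlchemy or any DBAPI. Only the `charset` and `use_unicode` query parameters and the `scott:tiger@localhost/test` example URL come from the diff. The sketch uses the standard library alone and does not open a database connection.

```python
from urllib.parse import urlencode


def needs_utf8mb4(text):
    """Return True if any code point in `text` needs four bytes in UTF-8.

    Code points above U+FFFF are the ones that MySQL's three-byte
    ``utf8`` charset cannot store (hypothetical helper, not SQLAlchemy API).
    """
    return any(ord(ch) > 0xFFFF for ch in text)


def mysql_url(driver, charset, python_major=3):
    """Build a connection URL string following the docstring's advice.

    ``use_unicode=0`` is added only for MySQLdb under Python 2, the one
    case the docstring names as a possible performance exception; in all
    other cases the flag is left out entirely (hypothetical helper).
    """
    query = {"charset": charset}
    if driver == "mysqldb" and python_major == 2:
        query["use_unicode"] = "0"
    return "mysql+%s://scott:tiger@localhost/test?%s" % (driver, urlencode(query))


# Pick the charset from sample data, then build the URL.
sample = "smile \U0001F600"
charset = "utf8mb4" if needs_utf8mb4(sample) else "utf8"
url = mysql_url("pymysql", charset)
```

The resulting string is what would be passed to `create_engine()`. Whether the chosen DBAPI and the server actually accept ``utf8mb4`` still has to be checked against their own documentation, as the docstring notes.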
