|  | Commit message | Author | Age | Files | Lines | 
|---|
| | 
| | better error message for unicode coercion failure | 
| | 
| | value is calculated from the character values, in a way
  that makes sure an 8-bit ASCII string and a unicode string
  with the same contents get the same hash value.
  (as a side effect, this also works for ISO Latin 1 strings).
  for more details, see the python-dev discussion. | 
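The property described above can be sketched with a toy hash that depends only on character ordinals. This is an illustration in Python 3, where `bytes` stands in for the old 8-bit string type; `ordinal_hash` is a hypothetical helper and is not the algorithm CPython uses.

```python
def ordinal_hash(text):
    """Toy hash computed only from character ordinals (not CPython's algorithm)."""
    # Iterating bytes yields ints directly; iterating str yields characters.
    ordinals = text if isinstance(text, bytes) else map(ord, text)
    value = 0
    for code in ordinals:
        value = ((value * 1000003) ^ code) & 0xFFFFFFFF
    return value


# Same ordinals in, same hash out, regardless of the container type.
ascii_equal = ordinal_hash(b"spam") == ordinal_hash("spam")

# Latin-1 bytes map one-to-one onto the first 256 code points,
# which is why the side effect mentioned above falls out for free.
latin1_equal = ordinal_hash("caf\xe9".encode("latin-1")) == ordinal_hash("caf\xe9")
```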
| | |  | 
| | 
| | objects including instance objects.
The old API PyUnicode_FromObject() is still available as a shortcut. | 
| | 
| | corrected some usage of 'unsigned long' where Py_UNICODE
should have been used. | 
| | 
| | true after revision 2.36 was checked in... | 
| | |  | 
| | 
| | should have been used. | 
| | 
| | to the new alphabetic lookup APIs in unicodectype.c. | 
| | 
| | Make unicode_compare a true UTF-16 compare function (includes
support for surrogates). | 
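One reason a UTF-16 comparison needs explicit surrogate handling is that comparing raw 16-bit code units does not always agree with code point order. The sketch below shows the mismatch in Python 3. It assumes that code point order is the intended result, which the message above does not spell out, and `utf16_units` is a hypothetical helper.

```python
bmp_char = "\uffff"         # highest code point in the Basic Multilingual Plane
astral_char = "\U00010000"  # first code point that needs a surrogate pair


def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# Code point order: U+FFFF sorts before U+10000.
code_point_order = bmp_char < astral_char

# Naive code unit order: 0xFFFF is larger than the lead surrogate 0xD800,
# so the same two strings compare the other way round.
code_unit_order = utf16_units(bmp_char) < utf16_units(astral_char)
```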
| | 
| | A previous patch by Jack Jansen was accidentally reverted. | 
| | 
| | New buffer overflow checks for formatting strings.
By Trent Mick. | 
| | |  | 
| | 
| | Patch to the standard unicode-escape codec which dynamically
loads the Unicode name to ordinal mapping from the module
ucnhash.
By Bill Tutt. | 
| | 
| | Better error message for "1 in unicodestring". Submitted
by Andrew Kuchling. | 
| | 
| | Fixed a bug in PyUnicode_Count() which would have caused a
core dump in case of substring coercion failure.
Synchronized .count() with the string method of the same name
to return len(s)+1 for s.count(''). | 
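The `len(s)+1` rule for counting the empty string still holds in current Python 3, for both text and byte strings. A quick check:

```python
s = "abc"

# The empty string matches before each character and once at the end.
empty_matches = s.count("")

# The byte string type follows the same rule.
empty_matches_bytes = b"abc".count(b"")
```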
| | 
| | This patch fixes an optimisation mystery in _PyUnicodeNew causing segfaults
on AIX when the interpreter is compiled with -O. | 
| | 
| | Added code so that .isXXX() testing returns 0 for empty strings. | 
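The empty-string behaviour can be checked against a current Python 3 interpreter, where these predicates return `False` rather than 0. The list of predicates below is a selection for illustration, not the set touched by the original change.

```python
predicates = [
    "isalpha", "isdigit", "isspace", "isupper", "islower",
    "istitle", "isalnum", "isdecimal", "isnumeric",
]

# Call each predicate on the empty string and record the result.
results = {name: getattr("", name)() for name in predicates}
```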
| | 
| | Fixed a typo and removed a debug printf(). Thanks to Finn Bock
for finding these. | 
| | |  | 
| | 
| | Fixed %c formatting to check for one character arguments. Thanks
to Finn Bock for finding this bug.
Added a fix for bug PR#348 which originated from not resetting
the globals correctly in _PyUnicode_Fini(). | 
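The one-character check on `%c` is still visible from Python 3 code. The sketch below wraps the formatting in a hypothetical helper, `format_char`, that returns `None` when the argument is rejected.

```python
def format_char(arg):
    """Format arg with %c, returning None if the argument is rejected."""
    try:
        # Wrap in a tuple so that arg is always treated as a single argument.
        return "%c" % (arg,)
    except TypeError:
        return None


single = format_char("a")    # a one-character string is accepted
ordinal = format_char(97)    # so is an integer ordinal
too_long = format_char("ab") # a two-character string is rejected
```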
| | 
| | Change the default encoding to 'ascii' (it was previously
defined as UTF-8).
Note: The implementation still uses UTF-8 to implement
the buffer protocol, so C APIs will still see UTF-8. This
is on purpose: rather than fixing the Unicode implementation,
the C APIs should be made Unicode aware. | 
| | 
| | M.-A. Lemburg <mal@lemburg.com>:
Fixed a core dump in PyUnicode_Format(). | 
| | 
| | Added support for user settable default encodings. The
current implementation uses a per-process global which
defines the value of the encoding parameter in case it
is set to NULL (meaning: use the default encoding). | 
| | 
| | Fix the string methods that implement slice-like semantics with
optional args (count, find, endswith, etc.) to properly handle
indices outside [INT_MIN, INT_MAX]. Previously the "i" formatter
for PyArg_ParseTuple was used to get the indices. These could overflow.
This patch changes the string methods to use the "O&" formatter with
the slice_index() function from ceval.c which is used to do the same
job for Python code slices (e.g. 'abcabcabc'[0:1000000000L]). | 
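The effect of slice-style index handling can be seen from Python code: indices far outside the range of a C integer are clamped to the ends of the string instead of overflowing. The sketch below uses Python 3, where the `L` suffix from the example above no longer exists.

```python
text = "abcabcabc"
huge = 10 ** 30  # far beyond any C int or long

# Out-of-range bounds behave like slice bounds: they are clamped.
count_all = text.count("abc", 0, huge)
first_c = text.find("c", -huge, huge)
ends_with_abc = text.endswith("abc", -huge, huge)

# The same clamping applies to ordinary slices.
sliced = text[0:huge]
```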
| | 
| | strings _are_ valid! | 
| | |  | 
| | 
| | For more comments, read the patches@python.org archives.
For documentation read the comments in mymalloc.h and objimpl.h.
(This is not exactly what Vladimir posted to the patches list; I've
made a few changes, and Vladimir sent me a fix in private email for a
problem that only occurs in debug mode.  I'm also holding back on his
change to main.c, which seems unnecessary to me.) | 
| | 
| | a size of 0 *is* illegal. | 
| | 
| | Fixes the MBCS codec to work correctly with zero length strings. | 
| | 
| | Fixed \OOO interpretation for Unicode objects. \777 now
correctly produces the Unicode character with ordinal 511. | 
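The ordinal 511 follows directly from reading `777` as an octal number. The arithmetic can be checked without relying on how any particular Python version parses the escape itself:

```python
# 7*64 + 7*8 + 7, i.e. the three octal digits weighted by powers of eight.
ordinal = int("777", 8)

# The corresponding character lies outside Latin-1, so it needs a Unicode string.
char = chr(ordinal)
```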
| | 
| | Fixed a reference leak in the allocator.
Renamed utf8_string to _PyUnicode_AsUTF8String() and made
it external for use by other parts of the interpreter. | 
| | 
| | The maxsplit functionality in .splitlines() was replaced by the keepends
functionality which allows keeping the line end markers together
with the string. | 
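The keepends behaviour survives unchanged in Python 3. A short illustration with mixed line endings:

```python
text = "one\ntwo\r\nthree"

# Default: line end markers are stripped.
without_ends = text.splitlines()

# keepends=True: each line keeps its own marker, so joining restores the input.
with_ends = text.splitlines(True)
```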
| | 
| | * New exported API PyUnicode_Resize()
* The experimental Keep-Alive optimization was turned back
  on after some tweaks to the implementation. It should now
  work without causing core dumps... this has yet to be tested
  though (switching it off is easy: see the unicodeobject.c
  file for details).
* Fixed a memory leak in the Unicode freelist cleanup code.
* Added tests to correctly process the return code from
  _PyUnicode_Resize().
* Fixed a bug in the 'ignore' error handling routines
  of some builtin codecs. Added test cases for these to
  test_unicode.py. | 
| | 
| | to prevent possible buffer overruns. | 
| | 
| | doesn't mean what the Python programmer thought... | 
| | |  | 
| | 
| | his copy of test_contains.py seems to be broken -- the lines he
deleted were already absent).  Checkin messages:
New Unicode support for int(), float(), complex() and long().
- new APIs PyInt_FromUnicode() and PyLong_FromUnicode()
- added support for Unicode to PyFloat_FromString()
- new encoding API PyUnicode_EncodeDecimal() which converts
  Unicode to a decimal char* string (used in the above new
  APIs)
- shortcuts for calls like int(<int object>) and float(<float obj>)
- tests for all of the above
Unicode compares and contains checks:
- comparing Unicode and non-string types now works; TypeErrors
  are masked, all other errors such as ValueError during
  Unicode coercion are passed through (note that PyUnicode_Compare
  does not implement the masking -- PyObject_Compare does this)
- contains now works for non-string types too; TypeErrors are
  masked and 0 returned; all other errors are passed through
Better testing support for the standard codecs.
Misc minor enhancements, such as an alias dbcs for the mbcs codec.
Changes:
- PyLong_FromString() now applies the same error checks as
  does PyInt_FromString(): trailing garbage is reported
  as an error and no longer silently ignored. The only characters
  which may be trailing the digits are 'L' and 'l' -- these
  are still silently ignored.
- string.ato?() now directly interface to int(), long() and
  float(). The error strings are now a little different, but
  the type still remains the same. These functions are now
  ready to get declared obsolete ;-)
- PyNumber_Int() now also does a check for embedded NULL chars
  in the input string; PyNumber_Long() already did this (and
  still does)
Followed by:
Looks like I've gone a step too far there... (and test_contains.py
seems to have a bug too).
I've changed back to reporting all errors in PyUnicode_Contains()
and added a few more test cases to test_contains.py (plus corrected
the join() NameError). | 
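Some of the number-conversion behaviour listed above can still be observed in Python 3: `int()` and `float()` accept Unicode decimal digits, and trailing garbage or embedded NUL characters are reported as errors. The sketch below is only a check against a modern interpreter. It does not cover the `L` suffix or `long()`, which no longer exist, and `parses_as_int` is a hypothetical helper.

```python
arabic_indic_123 = "\u0661\u0662\u0663"  # ARABIC-INDIC DIGITS ONE, TWO, THREE

# Unicode decimal digits are converted to their numeric values.
as_int = int(arabic_indic_123)
as_float = float("\u0661\u0662.\u0665")  # digits ONE TWO, '.', digit FIVE


def parses_as_int(text):
    """Return True if int() accepts text, False if it raises ValueError."""
    try:
        int(text)
        return True
    except ValueError:
        return False


trailing_garbage_ok = parses_as_int("12x")
embedded_nul_ok = parses_as_int("1\x002")
```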
| | |  | 
| | |  | 
| | 
| | Attached you find an update of the Unicode implementation.
    The patch is against the current CVS version. I would appreciate
    if someone with CVS checkin permissions could check the changes
    in.
    The patch contains all bug fixes and patches sent this week and also
    fixes a leak in the codecs code and a bug in the free list code
    for Unicode objects (which only shows up when compiling Python
    with Py_DEBUG; thanks to MarkH for spotting this one). | 
| | |  | 
|  | Fredrik Lundh. |