path: root/kafka/codec.py
Commit message (Author, Age, Files, Lines)
* LZ4 support in kafka 0.8/0.9 does not accept a ContentSize header (Dana Powers, 2017-03-14, 1 file, -6/+14)
|
* Prefer python-lz4 over lz4f if available (Dana Powers, 2017-03-14, 1 file, -7/+32)
|
* Free lz4 decompression context to avoid leak (Dana Powers, 2017-03-14, 1 file, -0/+1)
|
* Vendor six 1.10.0 [six] (Dana Powers, 2016-08-01, 1 file, -2/+4)
|
* Use standard LZ4 framing for v1 messages / kafka 0.10 (#695) (Dana Powers, 2016-05-22, 1 file, -7/+23)
    * LZ4 framing fixed in 0.10 / message v1 -- retain broken lz4 code for compatibility
    * lz4f does not support easy incremental decompression - raise RuntimeError
    * Update lz4 codec tests
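For context on the framing change above, here is a minimal round trip through the standard LZ4 frame format. It assumes a python-lz4 build that ships the `lz4.frame` module (newer than the lz4f/lz4tools bindings discussed in these commits) and is only an illustrative sketch, not kafka-python's codec code:

```python
import lz4.frame  # python-lz4; assumes the lz4.frame module is available

payload = b'example kafka message value'

# Compress and decompress using the standard LZ4 frame format
# (magic bytes, frame descriptor, data blocks, end mark).
framed = lz4.frame.compress(payload)
assert lz4.frame.decompress(framed) == payload
```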
* Handle broken LZ4 framing; switch to lz4tools + xxhash [lz4_fixup] (Dana Powers, 2016-01-26, 1 file, -7/+51)
|
* Prefer module imports (io.BytesIO) (Dana Powers, 2016-01-25, 1 file, -5/+5)
|
* python-snappy does not like buffer-slices on pypy... (Dana Powers, 2016-01-25, 1 file, -2/+12)
|
* Ignore pylint errors on buffer/memoryview (Dana Powers, 2016-01-25, 1 file, -0/+2)
|
* Python3 does not support buffer -- use memoryview in snappy_decode (Dana Powers, 2016-01-25, 1 file, -2/+8)
|
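The buffer/memoryview commits above boil down to a small compatibility shim: python 3 has no `buffer()` builtin, so a zero-copy slice needs `memoryview` instead, and pylint has to be told to ignore the py2-only name. A simplified sketch under those assumptions, not kafka-python's exact code (per the pypy commit above, python-snappy there does not accept such slices, so a plain bytes copy is presumably needed on that interpreter):

```python
import sys

def zero_copy_slice(data, offset):
    # python 3 removed the buffer() builtin; memoryview provides an
    # equivalent zero-copy view over the bytes.
    if sys.version_info[0] >= 3:
        return memoryview(data)[offset:]
    return buffer(data, offset)  # pylint: disable=undefined-variable
```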
* Don't need context manager for BytesIO (Dana Powers, 2016-01-25, 1 file, -22/+18)
|
* Write xerial-formatted snappy by default; use buffers to reduce copies (Dana Powers, 2016-01-25, 1 file, -22/+16)
|
* Add support for LZ4 compressed messages using python-lz4 module (Dana Powers, 2016-01-25, 1 file, -0/+13)
|
* Docstring updates (Dana Powers, 2016-01-07, 1 file, -13/+19)
|
* Allow specifying a compression level for codecs which support it (trbs, 2015-09-12, 1 file, -2/+5)
|
* Take the linter to kafka/codec.py (Dana Powers, 2015-03-09, 1 file, -11/+10)
|
* Gzip context manager not supported in py2.6, so use try/finally instead (Dana Powers, 2015-03-09, 1 file, -2/+17)
|
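On the try/finally change above: `gzip.GzipFile` only became usable as a context manager in python 2.7, so on py2.6 the file object has to be closed explicitly. A simplified sketch of the pattern, including the configurable compression level from the earlier commit (not the module's exact implementation):

```python
import gzip
import io

def gzip_encode(payload, compresslevel=9):
    """Gzip-compress payload into an in-memory buffer.

    Uses try/finally rather than a with-block because py2.6's GzipFile
    does not implement the context manager protocol.
    """
    buf = io.BytesIO()
    handle = gzip.GzipFile(fileobj=buf, mode='w', compresslevel=compresslevel)
    try:
        handle.write(payload)
    finally:
        handle.close()
    return buf.getvalue()
```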
* Use context managers in gzip_encode / gzip_decode (Dana Powers, 2015-03-08, 1 file, -12/+7)
|
* Make all unit tests pass on py3.3/3.4 (Bruno Renié, 2014-09-03, 1 file, -8/+11)
|
* Make it possible to read and write xerial snappy (Greg Bowyer, 2014-02-19, 1 file, -3/+95)
    Fixes mumrah/kafka-python#126

    TL;DR
    =====
    This makes it possible to read and write snappy-compressed streams that
    are compatible with the java and scala kafka clients (the xerial
    blocking format).

    Xerial Details
    ==============
    Kafka supports transparent compression of messages (both in transit and
    at rest). One of the allowable compression algorithms is Google's
    snappy, an algorithm with excellent performance at the cost of
    compression efficiency. The specific snappy implementation used by kafka
    is xerial-snappy, a readily available java library. As part of that
    implementation there is a specialised blocking format that is somewhat
    non-standard in the snappy world.

    Xerial Format
    -------------
    The blocking mode of the xerial snappy library is fairly simple: a magic
    header identifies the stream, followed by a size + block scheme. Unless
    otherwise noted, all items in xerial's blocking format are big-endian.

    A block size (`xerial_blocksize` in the implementation) controls how
    frequently blocking occurs; 32k is the default in the xerial library.
    This is the size of the uncompressed chunks that are fed to snappy to be
    compressed. The format winds up being

    | Header   | Block1 len | Block1 data  | Blockn len | Blockn data  |
    | -------- | ---------- | ------------ | ---------- | ------------ |
    | 16 bytes | BE int32   | snappy bytes | BE int32   | snappy bytes |

    It is important to note that the blocksize is the amount of uncompressed
    data presented to snappy at each block, whereas the blocklen is the
    number of compressed bytes that end up in the stream; the blocklen will
    always be <= blocksize.

    Xerial blocking header
    ----------------------

    Marker | Magic String | Null / Pad | Version  | Compat
    ------ | ------------ | ---------- | -------- | --------
    byte   | c-string     | byte       | int32    | int32
    ------ | ------------ | ---------- | -------- | --------
    -126   | 'SNAPPY'     | \0         | variable | variable

    The pad appears to be there to ensure that SNAPPY is a valid c-string
    and to align the header on a word boundary. The version is the version
    of this format as written by xerial; in the wild this is currently 1, so
    only v1 is supported. Compat claims the minimum supported version that
    can read a xerial block stream; presently in the wild this is also 1.

    Implementation specific details
    ===============================
    The implementation presented here follows the xerial implementation as
    of its v1 blocking format; no attempt is made to check for future
    versions. Since non-xerial-aware clients might have persisted plain
    snappy-compressed messages to kafka brokers, clients can turn on xerial
    compatibility for message sending, and header sniffing is performed to
    detect xerial vs plain snappy payloads.
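The format described in that commit message maps onto a short Python sketch. It assumes the python-snappy package; the names `XERIAL_MAGIC`, `xerial_encode` and `xerial_decode` are illustrative only, not kafka-python's actual API:

```python
import struct

import snappy  # python-snappy; assumed to be installed

# 16-byte xerial header: marker, 'SNAPPY' + NUL pad, version, compat (big-endian)
XERIAL_MAGIC = struct.pack('>b7sii', -126, b'SNAPPY\x00', 1, 1)
BLOCK_SIZE = 32 * 1024  # default uncompressed bytes per block in the xerial library

def xerial_encode(payload):
    chunks = [XERIAL_MAGIC]
    for i in range(0, len(payload), BLOCK_SIZE):
        block = snappy.compress(payload[i:i + BLOCK_SIZE])
        chunks.append(struct.pack('>i', len(block)))  # BE int32 block length
        chunks.append(block)
    return b''.join(chunks)

def xerial_decode(data):
    if data[:len(XERIAL_MAGIC)] != XERIAL_MAGIC:
        raise ValueError('not a xerial-framed snappy stream')
    chunks, pos = [], len(XERIAL_MAGIC)
    while pos < len(data):
        (block_len,) = struct.unpack_from('>i', data, pos)
        pos += 4
        chunks.append(snappy.decompress(data[pos:pos + block_len]))
        pos += block_len
    return b''.join(chunks)
```

A real decoder would sniff only the marker/magic portion of the header and fall back to plain `snappy.decompress` when it is absent, as the commit's header-sniffing note describes.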
* Split fixtures out to a separate file (Ivan Pouzyrevsky, 2013-06-07, 1 file, -3/+3)
|
* Beautify codec.py (Ivan Pouzyrevsky, 2013-06-07, 1 file, -24/+21)
|
* Refactor and update integration tests (Ivan Pouzyrevsky, 2013-06-07, 1 file, -0/+7)
|
* PEP8-ify most of the files (Mahendra M, 2013-05-29, 1 file, -2/+6)
    consumer.py and conn.py will be done later after pending merges
* Add Snappy support [0.1-alpha] (David Arthur, 2012-11-16, 1 file, -0/+17)
    Fixes #2
* Moved codec stuff into its own module (David Arthur, 2012-10-02, 1 file, -0/+23)
    Snappy will go there when I get around to it