Commit bee446fc authored by Frank Vanderham, committed by Jelle van der Waa

Fix Python 2 to 3 Unicode issues for string join

Revisited earlier commit where email subject lines with potentially
mixed encoding are joined into a single string. This fix brings in
the 'codecs' import to decode bytes to string using either the passed
encoding (if provided) or otherwise utf-8.

Changed the test_donor_import to no longer convert the Header to a
string and instead leave it as a byte array.
parent f972220b
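
The decoding approach described in the commit message can be sketched as a standalone function. This is a minimal illustration, not the project's code: it assumes the subject is first split into (bytes, charset) chunks with `email.header.decode_header`, a step the diff does not show, and the module-level `decode_subject` and `DEFAULT_CHARSET` names are hypothetical. The `isinstance` guard is also an addition, since `decode_header` returns a plain `str` chunk when the input string has no encoded words.

```python
import codecs
from email.header import Header, decode_header

# Assumed fallback, mirroring default_charset = 'utf-8' in the diff.
DEFAULT_CHARSET = 'utf-8'


def decode_subject(subject):
    """Join the chunks of a mail subject into a single str."""
    parts = []
    for chunk, charset in decode_header(subject):
        if isinstance(chunk, bytes):
            # Use the chunk's own charset when present, else the default.
            parts.append(codecs.decode(chunk, charset or DEFAULT_CHARSET))
        else:
            # Already text (a str input without encoded words).
            parts.append(chunk)
    return ''.join(parts)


utf8_subject = decode_subject(Header('メイル', 'utf-8'))
latin1_subject = decode_subject(Header('é', 'iso-8859-1'))
plain_subject = decode_subject('Donation')
```

Each chunk is decoded with its own charset, so subjects that mix encodings still end up as one `str`.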
@@ -14,6 +14,7 @@
 Usage: ./ donor_import path/to/maildir/
+import codecs
 import logging
 import mailbox
 import sys
@@ -46,7 +47,7 @@ def decode_subject(self, subject):
         default_charset = 'utf-8'
         # Convert the list of tuples containing the decoded string and encoding to
         # UTF-8
-        return u''.join([s[0].encode(default_charset, 'replace').decode(default_charset, 'replace') for s in subject])
+        return ''.join([codecs.decode(s[0], s[1] or default_charset) for s in subject])
     def parse_subject(self, subject):
@@ -38,7 +38,7 @@ def test_parse_name(self):
     def test_decode_subject(self):
         text = u'メイル'
-        subject = str(Header(text, 'utf-8'))
+        subject = Header(text, 'utf-8')
         self.assertEqual(self.command.decode_subject(subject), text)
     def test_invalid_args(self):
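
The test change can be motivated by how the standard library treats the two inputs under Python 3. The sketch below uses only `email.header` and is independent of the project's code. It suggests that converting the `Header` to `str` first leaves no bytes to decode, which is presumably why the test now passes the `Header` object itself.

```python
from email.header import Header, decode_header

text = 'メイル'
header = Header(text, 'utf-8')

# A Header instance is returned as (bytes, charset) chunks.
chunks_from_header = decode_header(header)

# str(header) is already the decoded text; with no encoded words in it,
# decode_header passes it through as a str with charset None.
chunks_from_str = decode_header(str(header))
```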