changeset 27496:baa77652be68 stable

templatefilters: try round-trip utf-8 conversion by json filter (issue4933)

Because a JSON string is known to be Unicode, we should try round-trip conversion for the localstr type.

This patch tests for the localstr type explicitly because encoding.fromlocal() may raise Abort for an undecodable str, which is probably not what we want. Maybe we can refactor the json filter to make more use of the encoding module later.

"{desc|json}" still can't round-trip because showdescription() modifies a localstr object.
author Yuya Nishihara <yuya@tcha.org>
date Wed, 04 Nov 2015 23:48:15 +0900
parents 9350f00a7b23
children e5a1df51bb25
files mercurial/templatefilters.py tests/test-command-template.t
diffstat 2 files changed, 27 insertions(+), 0 deletions(-)
--- a/mercurial/templatefilters.py
+++ b/mercurial/templatefilters.py
@@ -197,7 +197,11 @@
         return {None: 'null', False: 'false', True: 'true'}[obj]
     elif isinstance(obj, int) or isinstance(obj, float):
         return str(obj)
+    elif isinstance(obj, encoding.localstr):
+        u = encoding.fromlocal(obj).decode('utf-8')  # can round-trip
+        return '"%s"' % jsonescape(u)
     elif isinstance(obj, str):
+        # no encoding.fromlocal() because it may abort if obj can't be decoded
         u = unicode(obj, encoding.encoding, 'replace')
         return '"%s"' % jsonescape(u)
     elif isinstance(obj, unicode):
--- a/tests/test-command-template.t
+++ b/tests/test-command-template.t
@@ -3479,3 +3479,26 @@
   $ hg log -T "\\xy" -R a
   hg: parse error: invalid \x escape
   [255]
+
+Set up repository for non-ascii encoding tests:
+
+  $ hg init nonascii
+  $ cd nonascii
+  $ python <<EOF
+  > open('utf-8', 'w').write('\xc3\xa9')
+  > EOF
+  $ HGENCODING=utf-8 hg branch -q `cat utf-8`
+  $ HGENCODING=utf-8 hg ci -qAm 'non-ascii branch' utf-8
+
+json filter should try round-trip conversion to utf-8:
+
+  $ HGENCODING=ascii hg log -T "{branch|json}\n" -r0
+  "\u00e9"
+
+json filter should not abort if it can't decode bytes:
+(not sure the current behavior is right; we might want to use utf-8b encoding?)
+
+  $ HGENCODING=ascii hg log -T "{'`cat utf-8`'|json}\n" -l1
+  "\ufffd\ufffd"
+
+  $ cd ..
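
The round-trip idea in this change can be sketched outside Mercurial as follows. This is a minimal Python 3 illustration under stated assumptions, not Mercurial's code: `LocalStr`, `tolocal`, and `json_string` are hypothetical stand-ins for `encoding.localstr`, `encoding.tolocal()`, and the two string branches of the json filter, and the standard library's `json.dumps` is used in place of `jsonescape`.

```python
import json


class LocalStr(bytes):
    """Hypothetical stand-in for encoding.localstr.

    Holds the lossy local-encoded bytes, but remembers the original
    UTF-8 bytes so they can be recovered later.
    """

    def __new__(cls, utf8, local):
        s = bytes.__new__(cls, local)
        s._utf8 = utf8
        return s


def tolocal(utf8, local_encoding):
    """Convert UTF-8 bytes to the local encoding (simplified sketch)."""
    u = utf8.decode('utf-8')
    try:
        # Lossless case: plain bytes are enough.
        return u.encode(local_encoding)
    except UnicodeEncodeError:
        # Lossy case: keep the original UTF-8 alongside the lossy bytes.
        return LocalStr(utf8, u.encode(local_encoding, 'replace'))


def json_string(obj, local_encoding):
    """Serialize a byte string as a JSON string literal."""
    if isinstance(obj, LocalStr):
        # Round-trip through the remembered UTF-8 bytes.
        u = obj._utf8.decode('utf-8')
    else:
        # Plain bytes: never fail, replace undecodable bytes instead.
        u = obj.decode(local_encoding, 'replace')
    return json.dumps(u)


branch = tolocal(b'\xc3\xa9', 'ascii')
print(json_string(branch, 'ascii'))       # "\u00e9"
print(json_string(b'\xc3\xa9', 'ascii'))  # "\ufffd\ufffd"
```

The first call mirrors the `{branch|json}` test, where the value keeps its UTF-8 original and round-trips. The second mirrors the literal-string test, where raw bytes have no remembered original and each undecodable byte becomes U+FFFD.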