json.loads raises an exception
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x** in position **: invalid continuation byte
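This usually happens because the data contains Chinese text stored in an encoding that does not match the default decoder (UTF-8). In China, the most common non-UTF-8 encoding is GB2312. (If you know the data contains Japanese, try decoding it as Shift_JIS instead of GB2312, and so on.) If the bytes still cannot be decoded, it is sometimes acceptable to simply discard them, for example when they are only text inside comments.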
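To make the mismatch concrete, here is a minimal sketch that reproduces the error and then parses the same bytes after decoding them explicitly. The sample payload and the variable names `raw`, `failed`, and `obj` are made up for illustration; real input will differ.

```python
import json

# Hypothetical sample: a JSON document whose text was saved as GB2312.
raw = '{"name": "中文"}'.encode("gb2312")

try:
    json.loads(raw)  # bytes input is treated as UTF-8 here
    failed = False
except UnicodeDecodeError:
    failed = True  # the GB2312 bytes are not valid UTF-8

# Decoding explicitly with the matching codec before parsing succeeds.
obj = json.loads(raw.decode("gb2312"))
```

If the encoding is known in advance, decoding explicitly like this is usually simpler than relying on a chain of fallbacks.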
    import re
    import json


    def omit_unascii(data):
        """Drop every run of non-ASCII bytes from a bytes object."""
        return re.sub(rb"[^\x00-\x7F]+", b"", data)


    def json_loads(msg):
        """Parse JSON bytes, falling back to GB2312, then to ASCII only."""
        try:
            # json.loads treats bytes input as UTF-8 by default.
            return json.loads(msg)
        except ValueError:
            # Covers both UnicodeDecodeError and json.JSONDecodeError.
            pass
        try:
            # Retry as GB2312; "ignore" silently drops undecodable bytes.
            return json.loads(msg.decode("gb2312", "ignore"))
        except ValueError:
            pass
        # Last resort: strip all non-ASCII bytes. Any error here propagates.
        return json.loads(omit_unascii(msg))
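The same idea extends to more than one candidate encoding, as the Shift_JIS remark above suggests. The following is only a sketch under stated assumptions: `loads_with_fallback` is a hypothetical helper, and its default candidate list is an illustrative guess, not a recommendation. It decodes strictly, so a codec that does not fit is skipped rather than silently losing bytes.

```python
import json


# Hypothetical helper; the default candidate list is an assumption.
def loads_with_fallback(data, encodings=("utf-8", "gb18030", "shift_jis")):
    """Parse JSON bytes using the first candidate codec that decodes them."""
    for encoding in encodings:
        try:
            text = data.decode(encoding)  # strict: raise instead of dropping bytes
        except UnicodeDecodeError:
            continue  # this codec does not fit, try the next one
        return json.loads(text)
    raise ValueError("none of the candidate encodings could decode the input")


# Usage: put the encoding you consider most likely first.
chinese = loads_with_fallback('{"k": "中文"}'.encode("gb2312"))
japanese = loads_with_fallback(
    '{"k": "日本語"}'.encode("shift_jis"), encodings=("utf-8", "shift_jis")
)
```

Order matters: a wrong codec can sometimes decode the bytes without raising and produce garbled text, so the list should start with the encoding you actually expect. GB18030 is used here in place of GB2312 because it is a superset and accepts the same bytes.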