
Invalid byte x of n-byte UTF-8 sequence.

 

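The error occurred when calling getEnvelope on a SOAPMessage sent from the client.

The reported cause was: Invalid byte 1 of 1-byte UTF-8 sequence.

That is, the parser hit an invalid first byte in what it took to be a 1-byte UTF-8 sequence.

So what does a 1-byte UTF-8 sequence look like?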
One-byte codes are used only for the ASCII values 0 through 127. In this case the UTF-8 code has the same value as the ASCII code. The high-order bit of these codes is always 0.

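The pattern is 0xxxxxxx.

In other words, the byte it read had its high-order bit set to 1, so it was judged invalid.
This presumes the parser is treating the bytes as UTF-8. Why UTF-8? It may be the default, or it may be specified somewhere, for example in the XML file's encoding declaration.
As for how it decides that this is a 1-byte UTF-8 sequence, I am not sure. There may be some predetermination mechanism, or the byte may simply be invalid as the first byte of a UTF-8 sequence of any length, with the message merely worded this way (which makes it ambiguous).

Conclusion: the XML is not actually encoded in UTF-8; it is probably GB2312, GBK, or similar. Reading it with that encoding would likely avoid the problem, as would having the sender always encode the XML as UTF-8.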
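Both options from the conclusion can be sketched in Java. This is a minimal illustration under stated assumptions, not the SAAJ code path from the original problem: it uses the JDK's built-in DOM parser instead of getEnvelope, the class GbkXmlDemo and its two methods are hypothetical names, the `<msg>` document is made-up sample data, and it assumes the running JRE ships the GBK charset.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; names are illustrative only.
final class GbkXmlDemo {

    // Option 1: tell the parser which encoding the bytes are really in,
    // instead of letting it assume UTF-8.
    static String rootText(byte[] xmlBytes, String encoding) {
        try {
            InputSource source = new InputSource(new ByteArrayInputStream(xmlBytes));
            source.setEncoding(encoding);
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(source);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Parser and I/O failures are wrapped so callers need no checked exceptions.
            throw new IllegalStateException("XML could not be parsed as " + encoding, e);
        }
    }

    // Option 2: transcode before sending, so the payload really is UTF-8.
    static byte[] transcodeToUtf8(byte[] data, String sourceEncoding) {
        String text = new String(data, Charset.forName(sourceEncoding));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample document with two Chinese characters, encoded as GBK (assumed input).
        byte[] gbkXml = "<msg>\u4e2d\u6587</msg>".getBytes(Charset.forName("GBK"));

        // Reading the GBK bytes with the matching encoding.
        System.out.println(rootText(gbkXml, "GBK"));

        // Converting to UTF-8 first, then reading as UTF-8.
        System.out.println(rootText(transcodeToUtf8(gbkXml, "GBK"), "UTF-8"));
    }
}
```

Passing the same GBK bytes with "UTF-8" as the encoding should fail, because the GBK bytes do not form well-formed UTF-8 sequences.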

 

 

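Addendum:
Deciding which n-byte UTF-8 sequence is being read turns out to be fairly easy: the leading bits of a sequence's first byte identify its length, so the parser knows where each sequence starts and how many bytes it should contain. A first byte that matches none of the patterns below (for example 10xxxxxx) cannot start a sequence of any length, which is presumably the case reported as byte 1 of a 1-byte sequence.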

first byte pattern of 1-byte UTF-8 sequence: 0xxxxxxx
first byte pattern of 2-byte UTF-8 sequence: 110xxxxx
first byte pattern of 3-byte UTF-8 sequence: 1110xxxx
first byte pattern of 4-byte UTF-8 sequence: 11110xxx
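The four patterns above translate directly into bit masks. The following is a small sketch; the class name Utf8LeadByte and the method sequenceLength are hypothetical, and returning -1 for a byte that matches no pattern is a choice made for this example.

```java
// Hypothetical helper; names are illustrative only.
final class Utf8LeadByte {

    // Returns the sequence length (1 to 4) announced by a first byte,
    // or -1 if the byte matches none of the four first-byte patterns.
    static int sequenceLength(int b) {
        b &= 0xFF;                         // treat the value as an unsigned byte
        if ((b & 0x80) == 0x00) return 1;  // 0xxxxxxx
        if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
        if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx
        if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx
        return -1;                         // 10xxxxxx or 11111xxx: not a first byte
    }

    public static void main(String[] args) {
        int[] samples = {0x41, 0xC3, 0xE4, 0xF0, 0xB0};
        for (int b : samples) {
            System.out.printf("0x%02X -> %d%n", b, sequenceLength(b));
        }
        // 0x41 -> 1, 0xC3 -> 2, 0xE4 -> 3, 0xF0 -> 4, 0xB0 -> -1
    }
}
```

The last sample, 0xB0, is the first byte of a character in the GB2312 range; it has the form 10xxxxxx and therefore cannot start any UTF-8 sequence.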


The same reasoning applies to the following exception messages:
Invalid byte 2 of 2-byte UTF-8 sequence.
Invalid byte 2 of 3-byte UTF-8 sequence.
Invalid byte 2 of 4-byte UTF-8 sequence.
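To see how such messages can arise, here is a simplified checker that reports the first structural problem in a byte array using the same wording. It is a sketch and not the parser's actual code: the class Utf8Diagnose and the method firstError are hypothetical, the "Truncated" message is this example's own wording, and the check only looks at first-byte and continuation-byte (10xxxxxx) patterns, ignoring overlong forms and other finer validity rules.

```java
// Hypothetical helper; names are illustrative only.
final class Utf8Diagnose {

    // Returns a message for the first malformed sequence, or null if none is found.
    static String firstError(byte[] data) {
        int i = 0;
        while (i < data.length) {
            int b0 = data[i] & 0xFF;
            int n;                                   // length announced by the first byte
            if ((b0 & 0x80) == 0x00) n = 1;          // 0xxxxxxx
            else if ((b0 & 0xE0) == 0xC0) n = 2;     // 110xxxxx
            else if ((b0 & 0xF0) == 0xE0) n = 3;     // 1110xxxx
            else if ((b0 & 0xF8) == 0xF0) n = 4;     // 11110xxx
            else return "Invalid byte 1 of 1-byte UTF-8 sequence.";

            // Bytes 2..n must all be continuation bytes of the form 10xxxxxx.
            for (int k = 1; k < n; k++) {
                if (i + k >= data.length) {
                    return "Truncated " + n + "-byte UTF-8 sequence.";
                }
                if ((data[i + k] & 0xC0) != 0x80) {
                    return "Invalid byte " + (k + 1) + " of " + n + "-byte UTF-8 sequence.";
                }
            }
            i += n;
        }
        return null;
    }

    public static void main(String[] args) {
        // GBK bytes B0 A1: B0 has the form 10xxxxxx, so it cannot be a first byte.
        System.out.println(firstError(new byte[] {(byte) 0xB0, (byte) 0xA1}));
        // GBK bytes D6 D0: D6 announces a 2-byte sequence, D0 is not 10xxxxxx.
        System.out.println(firstError(new byte[] {(byte) 0xD6, (byte) 0xD0}));
    }
}
```

The two GBK samples show why the same underlying mistake (GBK data read as UTF-8) can surface as different messages depending on which bytes happen to come first.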

 

http://topic.csdn.net/u/20120513/13/97af0141-df0d-4758-8fab-f91dd9af01db.html?seed=973731074&r=78553396#r_78553396

http://en.wikipedia.org/wiki/UTF-8

 

 
