String Encoding (1): About String.getBytes()

orange5458

Blog category: J2SE&J2EE&J2ME

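1. Introduction

The goal of this study is to understand how Java handles a String in different situations, so that garbled-text problems with String can be resolved more reliably.

2. Getting the encoded bytes of a String in Java

Code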

package com.siyuan.jdk.test;

import java.io.UnsupportedEncodingException;
import java.util.Arrays;

public class StringGetBytes {
	
	public static void main(String[] args) throws UnsupportedEncodingException {
		String str = "I AM 中国人";
		System.out.println("str = " + str);
		System.out.println("Default byte codes of str : " + Arrays.toString(str.getBytes()));
		System.out.println("GBK codes of str : " + Arrays.toString(str.getBytes("GBK")));
		System.out.println("UTF-8 codes of str : " + Arrays.toString(str.getBytes("UTF-8")));
		System.out.println("UTF-16 codes of str : " + Arrays.toString(str.getBytes("UTF-16")));
	}
	
}

Output

str = I AM 中国人
Default byte codes of str : [73, 32, 65, 77, 32, -42, -48, -71, -6, -56, -53]
GBK codes of str : [73, 32, 65, 77, 32, -42, -48, -71, -6, -56, -53]
UTF-8 codes of str : [73, 32, 65, 77, 32, -28, -72, -83, -27, -101, -67, -28, -70, -70]
UTF-16 codes of str : [-2, -1, 0, 73, 0, 32, 0, 65, 0, 77, 0, 32, 78, 45, 86, -3, 78, -70]
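The negative numbers in these arrays come from Java's signed byte type, not from the encodings themselves. Printing the same bytes as unsigned hex makes them easier to compare with encoding tables. The sketch below is only an illustration: the class name HexBytes and the helper toHex are made up here, StandardCharsets assumes Java 7 or later, and the character 中 is written as a Unicode escape so the example does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;

public class HexBytes {

    // Formats each byte as two uppercase hex digits separated by spaces.
    // (b & 0xFF) maps Java's signed byte range (-128..127) onto 0..255.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+4E2D is the first CJK character of the article's test string.
        byte[] utf8 = "\u4E2D".getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(utf8)); // prints E4 B8 AD
    }
}
```

The three UTF-8 bytes shown as -28, -72, -83 in the output above are therefore E4 B8 AD.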

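Questions

1) The default getBytes() returns GBK-encoded bytes, not bytes in the encoding Java uses for char, which is Unicode (UTF-16).

Tracing the String.getBytes() method shows that the returned bytes are encoded with the JVM's default charset, Charset.defaultCharset(), not with UTF-16.

Code snippets: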

String

    public byte[] getBytes() {
	return StringCoding.encode(value, offset, count);
    }

StringCoding

    static byte[] encode(char[] ca, int off, int len) {
	String csn = Charset.defaultCharset().name();
	try {
	    return encode(csn, ca, off, len);
	} catch (UnsupportedEncodingException x) {
	    warnUnsupportedCharset(csn);
	}
	try {
	    return encode("ISO-8859-1", ca, off, len);
	} catch (UnsupportedEncodingException x) {
	    // If this code is hit during VM initialization, MessageUtils is
	    // the only way we will be able to get any kind of error message.
	    MessageUtils.err("ISO-8859-1 charset not available: "
			     + x.toString());
	    // If we can not find ISO-8859-1 (a required encoding) then things
	    // are seriously wrong with the installation.
	    System.exit(1);
	    return null;
	}
    }

Charset

    public static Charset defaultCharset() {
        if (defaultCharset == null) {
	    synchronized (Charset.class) {
		java.security.PrivilegedAction pa =
		    new GetPropertyAction("file.encoding");
		String csn = (String)AccessController.doPrivileged(pa);
		Charset cs = lookup(csn);
		if (cs != null)
		    defaultCharset = cs;
                else 
		    defaultCharset = forName("UTF-8");
            }
	}
	return defaultCharset;
    }
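The trace above can also be checked from application code by comparing the no-argument getBytes() with an explicit encode using Charset.defaultCharset(). This is a minimal sketch, not part of the JDK source quoted above: the class name DefaultCharsetCheck and the helper usesDefaultCharset are hypothetical, and StandardCharsets assumes Java 7 or later. The test string is written with Unicode escapes so the result does not depend on how the source file is compiled.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DefaultCharsetCheck {

    // True when the no-argument getBytes() yields the same bytes as
    // encoding explicitly with the JVM's default charset.
    public static boolean usesDefaultCharset(String s) {
        return Arrays.equals(s.getBytes(), s.getBytes(Charset.defaultCharset()));
    }

    public static void main(String[] args) {
        // The article's test string: "I AM " followed by U+4E2D U+56FD U+4EBA.
        String str = "I AM \u4E2D\u56FD\u4EBA";

        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
        System.out.println("getBytes() uses default charset: " + usesDefaultCharset(str));

        // Passing a charset explicitly makes the result independent of
        // the platform default.
        System.out.println(Arrays.toString(str.getBytes(StandardCharsets.UTF_8)));
    }
}
```

When the bytes are going to leave the JVM (files, sockets, databases), passing the charset explicitly is usually the safer choice, because the result no longer changes from machine to machine.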

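The default charset can be changed by starting the JVM with the argument -Dfile.encoding="UTF-8", for example by adding it to the run arguments in Eclipse.

Output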

str = I AM 涓浗浜?
Default byte codes of str : [73, 32, 65, 77, 32, -28, -72, -83, -27, -101, -67, -28, -70, -70]
GBK codes of str : [73, 32, 65, 77, 32, -42, -48, -71, -6, -56, -53]
UTF-8 codes of str : [73, 32, 65, 77, 32, -28, -72, -83, -27, -101, -67, -28, -70, -70]
UTF-16 codes of str : [-2, -1, 0, 73, 0, 32, 0, 65, 0, 77, 0, 32, 78, 45, 86, -3, 78, -70]

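Problem

Printing str produces garbled text, although the byte values are correct.
Cause: the console encoding is not UTF-8.

Change the console encoding in Eclipse to UTF-8.

Output: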

str = I AM 中国人
Default byte codes of str : [73, 32, 65, 77, 32, -28, -72, -83, -27, -101, -67, -28, -70, -70]
GBK codes of str : [73, 32, 65, 77, 32, -42, -48, -71, -6, -56, -53]
UTF-8 codes of str : [73, 32, 65, 77, 32, -28, -72, -83, -27, -101, -67, -28, -70, -70]
UTF-16 codes of str : [-2, -1, 0, 73, 0, 32, 0, 65, 0, 77, 0, 32, 78, 45, 86, -3, 78, -70]
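The garbling seen earlier is what happens whenever bytes are written in one charset and read back in another, and it can be reproduced without any console. The sketch below is a rough illustration under a few assumptions: the class and method names are made up, the GBK charset is assumed to be available in the running JDK, and StandardCharsets assumes Java 7 or later. It also shows one possible way to make the printed bytes independent of the JVM default, by wrapping the target stream in a PrintStream with an explicit encoding.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ConsoleEncodingDemo {

    // Encodes text with one charset and decodes the bytes with another,
    // which is what happens when the writer and the console disagree.
    public static String reinterpret(String text, Charset written, Charset read) {
        return new String(text.getBytes(written), read);
    }

    // Wraps a stream in a PrintStream that encodes characters as UTF-8,
    // regardless of the JVM's default charset.
    public static PrintStream utf8PrintStream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is one of the charsets every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The article's test string, written with escapes.
        String str = "I AM \u4E2D\u56FD\u4EBA";
        Charset gbk = Charset.forName("GBK"); // assumed to be installed

        // UTF-8 bytes read back as GBK: the ASCII part survives, the rest is garbled.
        System.out.println(reinterpret(str, StandardCharsets.UTF_8, gbk));

        // Same charset on both sides: the text round-trips unchanged.
        System.out.println(
            reinterpret(str, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(str));

        // Bytes sent to the console are UTF-8 here; the console still has
        // to be configured to interpret them as UTF-8.
        utf8PrintStream(System.out).println(str);
    }
}
```

The wrapper only controls the bytes the program emits. Whether they display correctly still depends on the console's own encoding setting, which is why the Eclipse console had to be changed as well.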

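2) The UTF-16 output starts with two extra bytes, -2 and -1

Processors differ in how they order the two bytes of a 2-byte unit: big-endian (high byte first, low byte second) or little-endian (low byte first, high byte second). An encoded string therefore has to state which order it uses, and the two leading bytes hold the BYTE_ORDER_MARK value for that purpose. Here -2 and -1 are the signed-byte forms of 0xFE and 0xFF, which is the big-endian byte order mark.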
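The effect of the byte order mark can be seen by encoding the same character with Java's three UTF-16 charsets. This is a small sketch for illustration only: the class name Utf16Bom and the helper byteOrder are invented here, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16Bom {

    // Reports the byte order announced by a leading byte order mark,
    // or "none" when the array does not start with one.
    public static String byteOrder(byte[] bytes) {
        if (bytes.length >= 2) {
            int b0 = bytes[0] & 0xFF;
            int b1 = bytes[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) {
                return "big-endian";
            }
            if (b0 == 0xFF && b1 == 0xFE) {
                return "little-endian";
            }
        }
        return "none";
    }

    public static void main(String[] args) {
        String s = "\u4E2D"; // U+4E2D, a single 2-byte code unit

        // "UTF-16" prepends a big-endian byte order mark when encoding.
        byte[] withBom = s.getBytes(StandardCharsets.UTF_16);
        System.out.println(Arrays.toString(withBom)); // [-2, -1, 78, 45]
        System.out.println(byteOrder(withBom));       // big-endian

        // The explicit variants fix the byte order and write no mark.
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_16BE))); // [78, 45]
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_16LE))); // [45, 78]

        // When decoding, "UTF-16" reads the mark and honors it, so
        // little-endian input with a mark decodes to the same character.
        String decoded = new String(new byte[] {-1, -2, 45, 78}, StandardCharsets.UTF_16);
        System.out.println(decoded.equals(s)); // true
    }
}
```

If the two extra bytes are unwanted, for example when the byte order is agreed on out of band, UTF-16BE or UTF-16LE can be used instead of UTF-16.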

2013-03-18 17:46