Repost: Java String Encoding

huangtut


Source: http://blog.sina.com.cn/s/blog_3f4dc73b0100afub.html

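In Java, a char is a 16-bit UTF-16 code unit, so it occupies 2 bytes. Most Chinese characters, like every other character in the Basic Multilingual Plane, fit in a single code unit, which is why a Chinese character can be assigned to a char. (Characters outside that plane, including many rare Han characters, need two chars, a surrogate pair.) An English letter stored in a char also occupies 2 bytes, but its code value is below 128, so it can be cast to a byte without loss, whereas a Chinese character cannot. How many bytes a character takes once it is encoded depends entirely on the encoding: the same Chinese character takes 2 bytes in GBK but 3 bytes in UTF-8.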
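The difference between a character's code value and its encoded size can be checked directly. The following is a minimal sketch; the class name CharSizeDemo and its helper methods are made up for illustration, and the Chinese character is written as a Unicode escape so the result does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;

class CharSizeDemo {
    // Number of bytes needed to encode the given text in UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // True when the char's code value fits in a signed Java byte (0..127).
    static boolean fitsInByte(char c) {
        return c <= Byte.MAX_VALUE;
    }

    public static void main(String[] args) {
        char letter = 'A';
        char hanzi = '\u4EBA'; // the character 人

        System.out.println(Character.BYTES);       // 2: every char is two bytes in memory
        System.out.println((int) letter);          // 65
        System.out.println((int) hanzi);           // 20154
        System.out.println(fitsInByte(letter));    // true
        System.out.println(fitsInByte(hanzi));     // false
        System.out.println(utf8Length("A"));       // 1
        System.out.println(utf8Length("\u4EBA"));  // 3
    }
}
```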

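When a char is passed to an output statement on its own, the character itself is printed. When it is combined with a number or another char using +, it is promoted to int, so the value printed is a number derived from its UTF-16 code value (which equals the ASCII value for ASCII characters). When it is joined to a String with +, the character itself is appended.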
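A short sketch of these cases follows. The class CharPlusDemo and its two helper methods are hypothetical names used only for this illustration.

```java
class CharPlusDemo {
    // char + int: the char is promoted to int, so the result is a number.
    static int addNumber(char c, int n) {
        return c + n;
    }

    // String + char: string concatenation, so the character itself is appended.
    static String joinToString(char c, int n) {
        return "" + c + n;
    }

    public static void main(String[] args) {
        char c = 'A';
        System.out.println(c);                   // A
        System.out.println(addNumber(c, 1));     // 66
        System.out.println(c + c);               // 130 (two chars added as ints)
        System.out.println(joinToString(c, 1));  // A1
        System.out.println((char) (c + 1));      // B (cast back to char)
    }
}
```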

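UTF-16BE and UTF-16LE are two members of the Unicode encoding family. The Unicode standard defines three encoding forms (UTF-8, UTF-16, UTF-32) and seven encoding schemes: UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, and UTF-32LE. Java's char and String are defined in terms of UTF-16 code units; byte order only comes into play when text is converted to bytes. When encoding with the charset named "UTF-16", Java writes big-endian bytes preceded by a 2-byte byte-order mark, while "UTF-16BE" and "UTF-16LE" write no mark.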
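The three UTF-16 schemes can be compared by dumping the encoded bytes. This is a sketch only: the class Utf16BomDemo and the helper toHex are invented for this example. It also shows a character outside the Basic Multilingual Plane (U+20000, a rare Han character), which occupies two chars.

```java
import java.nio.charset.StandardCharsets;

class Utf16BomDemo {
    // Formats bytes as space-separated, two-digit uppercase hex values.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ren = "\u4EBA"; // the character 人
        System.out.println(toHex(ren.getBytes(StandardCharsets.UTF_16)));   // FE FF 4E BA
        System.out.println(toHex(ren.getBytes(StandardCharsets.UTF_16BE))); // 4E BA
        System.out.println(toHex(ren.getBytes(StandardCharsets.UTF_16LE))); // BA 4E

        // U+20000 needs a surrogate pair: two chars, one code point.
        String rare = "\uD840\uDC00";
        System.out.println(rare.length());                          // 2
        System.out.println(rare.codePointCount(0, rare.length()));  // 1
    }
}
```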

Character encoding example:

import java.io.UnsupportedEncodingException;

public class EncodeTest {

    // Prints how many bytes the string occupies in the named encoding.
    public static void printByteLength(String s, String encodingName) {
        System.out.print("Bytes: ");
        try {
            System.out.print(s.getBytes(encodingName).length);
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        System.out.println("; encoding: " + encodingName);
    }

    public static void main(String[] args) {
        String en = "A";
        String ch = "人";

        // Byte count of one English letter in each encoding
        System.out.println("English letter: " + en);
        EncodeTest.printByteLength(en, "GB2312");
        EncodeTest.printByteLength(en, "GBK");
        EncodeTest.printByteLength(en, "GB18030");
        EncodeTest.printByteLength(en, "ISO-8859-1");
        EncodeTest.printByteLength(en, "UTF-8");
        EncodeTest.printByteLength(en, "UTF-16");
        EncodeTest.printByteLength(en, "UTF-16BE");
        EncodeTest.printByteLength(en, "UTF-16LE");

        System.out.println();

        // Byte count of one Chinese character in each encoding
        System.out.println("Chinese character: " + ch);
        EncodeTest.printByteLength(ch, "GB2312");
        EncodeTest.printByteLength(ch, "GBK");
        EncodeTest.printByteLength(ch, "GB18030");
        EncodeTest.printByteLength(ch, "ISO-8859-1");
        EncodeTest.printByteLength(ch, "UTF-8");
        EncodeTest.printByteLength(ch, "UTF-16");
        EncodeTest.printByteLength(ch, "UTF-16BE");
        EncodeTest.printByteLength(ch, "UTF-16LE");
    }
}

Output:

English letter: A
Bytes: 1; encoding: GB2312
Bytes: 1; encoding: GBK
Bytes: 1; encoding: GB18030
Bytes: 1; encoding: ISO-8859-1
Bytes: 1; encoding: UTF-8
Bytes: 4; encoding: UTF-16
Bytes: 2; encoding: UTF-16BE
Bytes: 2; encoding: UTF-16LE

Chinese character: 人
Bytes: 2; encoding: GB2312
Bytes: 2; encoding: GBK
Bytes: 2; encoding: GB18030
Bytes: 1; encoding: ISO-8859-1
Bytes: 3; encoding: UTF-8
Bytes: 4; encoding: UTF-16
Bytes: 2; encoding: UTF-16BE
Bytes: 2; encoding: UTF-16LE

Two results deserve a note. UTF-16 reports 4 bytes because the 2-byte byte-order mark is counted along with the character. ISO-8859-1 reports 1 byte for the Chinese character because that encoding cannot represent it, so getBytes substitutes a single ? byte.
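A byte count alone does not reveal whether an encoding actually preserved the character. One way to check is to encode and decode again and compare, or to ask the encoder up front. The sketch below assumes nothing beyond the standard java.nio.charset API; the class LossyEncodingDemo and its method names are made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LossyEncodingDemo {
    // Encodes the text to bytes and decodes it back with the same charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    // True when every character of the text can be represented in the charset.
    static boolean canRepresent(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String ren = "\u4EBA"; // the character 人

        // ISO-8859-1 has no Chinese characters, so the text comes back as "?".
        System.out.println(roundTrip(ren, StandardCharsets.ISO_8859_1));     // ?
        System.out.println(canRepresent(ren, StandardCharsets.ISO_8859_1));  // false

        // UTF-8 can represent every Unicode character, so the round trip is lossless.
        System.out.println(roundTrip(ren, StandardCharsets.UTF_8).equals(ren)); // true
        System.out.println(canRepresent(ren, StandardCharsets.UTF_8));          // true
    }
}
```

Using Charset objects instead of charset names also removes the need to catch UnsupportedEncodingException.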

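String truncation example:

import java.io.UnsupportedEncodingException;

public class CutString {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "我ZWR爱JAVA";
        // Get the bytes of the string in GBK encoding
        byte[] data = s.getBytes("GBK");
        byte[] tmp = new byte[6];
        // Copy the first six bytes of data into tmp
        System.arraycopy(data, 0, tmp, 0, 6);
        // Decode the six bytes as GBK (not the platform default charset) and print them
        s = new String(tmp, "GBK");
        System.out.println(s);
    }
}

Output:

我ZWR?

In GBK, 我 takes 2 bytes and ZWR takes 3, so the sixth byte is only the first half of 爱. It cannot be decoded on its own and is replaced by a substitute character, shown here as ?.
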
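Rather than discovering a split character from a stray ? in the output, the cut can be validated with a strict decoder. The sketch below is one possible approach, not the original author's: the class StrictDecodeDemo and the method isWellFormed are hypothetical names, and main assumes the running JDK ships the GBK charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.util.Arrays;

class StrictDecodeDemo {
    // True when the bytes decode cleanly in the given charset,
    // false when they contain a malformed or truncated sequence.
    static boolean isWellFormed(byte[] bytes, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: GBK is available in this JDK; forName throws otherwise.
        Charset gbk = Charset.forName("GBK");
        byte[] data = "\u6211ZWR\u7231JAVA".getBytes(gbk); // 我ZWR爱JAVA

        System.out.println(isWellFormed(Arrays.copyOf(data, 5), gbk)); // true: 我ZWR
        System.out.println(isWellFormed(Arrays.copyOf(data, 6), gbk)); // false: half of 爱
    }
}
```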
Example 2:

import java.io.UnsupportedEncodingException;

public class CutString {

    // Returns true if the character occupies more than one byte in GBK.
    public static boolean isChineseChar(char c)
            throws UnsupportedEncodingException {
        // This is not a rigorous way to tell English letters from Chinese
        // characters, but it is sufficient for this exercise
        return String.valueOf(c).getBytes("GBK").length > 1;
    }

    // Returns the prefix of original that covers the first count GBK bytes.
    // A Chinese character that straddles the limit is kept whole.
    public static String substring(String original, int count)
            throws UnsupportedEncodingException {
        // The original string is neither null nor empty
        if (original != null && !"".equals(original)) {
            // The byte count is positive and smaller than the string's GBK byte length
            if (count > 0 && count < original.getBytes("GBK").length) {
                StringBuilder buff = new StringBuilder();
                char c;
                for (int i = 0; i < count; i++) {
                    // charAt(int index) addresses the string by character, not by byte
                    c = original.charAt(i);
                    buff.append(c);
                    if (CutString.isChineseChar(c)) {
                        // A Chinese character uses two bytes, so reduce the remaining count by 1
                        --count;
                    }
                }
                return buff.toString();
            }
        }
        return original;
    }

    public static void main(String[] args) {
        // Original string
        String s = "我ZWR爱JAVA";
        System.out.println("Original string: " + s);
        try {
            System.out.println("Cut at byte 1: " + CutString.substring(s, 1));
            System.out.println("Cut at byte 2: " + CutString.substring(s, 2));
            System.out.println("Cut at byte 4: " + CutString.substring(s, 4));
            System.out.println("Cut at byte 6: " + CutString.substring(s, 6));
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }
}

Output:

Original string: 我ZWR爱JAVA
Cut at byte 1: 我
Cut at byte 2: 我
Cut at byte 4: 我ZW
Cut at byte 6: 我ZWR爱
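The substring method above keeps a Chinese character that straddles the limit, so its result can be one byte longer than requested. If the result must never exceed the byte budget, a stricter variant can drop the straddling character instead. The following is a sketch of that alternative, not the original author's method: the class ByteLimitDemo and the method truncateToBytes are hypothetical names, and main assumes the JDK provides GBK. It measures each code point separately, so it should be used with a charset that writes no byte-order mark (for example UTF-16BE rather than UTF-16).

```java
import java.nio.charset.Charset;

class ByteLimitDemo {
    // Returns the longest prefix of text whose encoding in charset
    // fits in maxBytes, never splitting a character.
    static String truncateToBytes(String text, int maxBytes, Charset charset) {
        StringBuilder out = new StringBuilder();
        int used = 0;
        int i = 0;
        while (i < text.length()) {
            // Work on whole code points so surrogate pairs stay together.
            int codePoint = text.codePointAt(i);
            String unit = new String(Character.toChars(codePoint));
            int size = unit.getBytes(charset).length;
            if (used + size > maxBytes) {
                break; // the next character would exceed the budget
            }
            out.append(unit);
            used += size;
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumption: GBK is available in this JDK; forName throws otherwise.
        Charset gbk = Charset.forName("GBK");
        String s = "\u6211ZWR\u7231JAVA"; // 我ZWR爱JAVA
        for (int limit : new int[] {1, 2, 4, 6}) {
            System.out.println(limit + ": " + truncateToBytes(s, limit, gbk));
        }
    }
}
```

With GBK, a limit of 1 yields an empty string and a limit of 6 yields 我ZWR, because including 爱 would need a seventh byte.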


2010-09-09 16:40