willpower

Unicode and UTF8

What is Unicode?

Unicode provides a unique number for every character,
no matter what the platform,
no matter what the program,
no matter what the language.

Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers. No single encoding could contain enough characters: for example, the European Union alone requires several different encodings to cover all its languages. Even for a single language like English no single encoding was adequate for all the letters, punctuation, and technical symbols in common use.

These encoding systems also conflict with one another. That is, two encodings can use the same number for two different characters, or use different numbers for the same character. Any given computer (especially servers) needs to support many different encodings; yet whenever data is passed between different encodings or platforms, that data always runs the risk of corruption.
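A small illustration of both kinds of conflict, using Python's built-in codecs. The choice of Latin-1, Windows-1251 and code page 437 is only an assumption made for the example; any pair of legacy encodings that disagree would do.

```python
# One byte value, two legacy encodings, two different characters.
raw = bytes([0xE9])
as_latin1 = raw.decode("latin-1")   # Western European: e with acute accent
as_cp1251 = raw.decode("cp1251")    # Cyrillic: short i

# One character, two legacy encodings, two different byte values.
e_acute_latin1 = "é".encode("latin-1")
e_acute_cp437 = "é".encode("cp437")

print(repr(as_latin1), repr(as_cp1251))
print(e_acute_latin1.hex(), e_acute_cp437.hex())
```

If the receiver guesses the wrong encoding, the bytes still decode without error, which is why this kind of corruption often goes unnoticed.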

Unicode is changing all that!

Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. The Unicode Standard has been adopted by such industry leaders as Apple, HP, IBM, JustSystem, Microsoft, Oracle, SAP, Sun, Sybase, Unisys and many others. Unicode is required by modern standards such as XML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, etc., and is the official way to implement ISO/IEC 10646. It is supported in many operating systems, all modern browsers, and many other products. The emergence of the Unicode Standard, and the availability of tools supporting it, are among the most significant recent global software technology trends.

Incorporating Unicode into client-server or multi-tiered applications and websites offers significant cost savings over the use of legacy character sets. Unicode enables a single software product or a single website to be targeted across multiple platforms, languages and countries without re-engineering. It allows data to be transported through many different systems without corruption.

UTF8

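UTF-8 belongs to the same family as Unicode: it is one of several ways to encode Unicode code points as bytes.
The first difference is size. A UTF-8 encoded character has a variable length, whereas the fixed-width two-byte encoding that this post calls "Unicode encoding" (UCS-2, or UTF-16 for characters in the Basic Multilingual Plane) uses the same length for every character.

Take the two-byte encoding first: the English letter "a" and the Chinese character "好" occupy the same amount of space after encoding, two bytes each.

In UTF-8, the letter "a" and the character "好" occupy different amounts of space: the former takes one byte, the latter three.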
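The sizes can be checked quickly in Python. This is a minimal sketch that assumes the two-byte "Unicode encoding" discussed here corresponds to Python's "utf-16-be" codec (UTF-16 without a byte order mark); the variable name `sizes` is just for illustration.

```python
# Map each character to (bytes in UTF-16, bytes in UTF-8).
sizes = {}
for ch in ("a", "好"):
    utf16 = ch.encode("utf-16-be")  # two bytes for both of these characters
    utf8 = ch.encode("utf-8")       # one byte for "a", three for "好"
    sizes[ch] = (len(utf16), len(utf8))
    print(ch, utf16.hex(), utf8.hex())
```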

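Now let's look at how UTF-8 encoding works:
The English letters, together with the other symbols on a keyboard, fit in seven binary bits, and a byte has eight, so UTF-8 uses a single byte for them. But given one encoded byte, how do we know what it belongs to? It might be the single byte of an English letter, or it might be one of the three bytes of a Chinese character. That is why UTF-8 uses flag bits.

When the value fits in 7 bits, one byte is used: 0*******  The leading 0 is the flag bit, and the remaining seven bits hold exactly the ASCII range 0-127.

When the value needs 8 to 11 bits, two bytes are used: 110***** 10******  The 110 of the first byte and the 10 of the second byte are flag bits.

When the value needs 12 to 16 bits, three bytes are used: 1110**** 10****** 10******  As above, the 1110 of the first byte and the 10 of the second and third bytes are flag bits. The remaining sixteen bits are enough for the commonly used Chinese characters.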

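The pattern continues:
Four bytes: 11110*** 10****** 10****** 10******
Five bytes: 111110** 10****** 10****** 10****** 10******
Six bytes: 1111110* 10****** 10****** 10****** 10****** 10******
The five- and six-byte forms belong to the original UTF-8 design. Since RFC 3629, UTF-8 stops at four bytes, which is enough for every code point up to U+10FFFF.

Got it?
The bits of the code point are filled in from the low-order end to the high-order end: the lowest six bits go into the last byte, the next six into the byte before it, and so on, with any unused high-order positions set to 0.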
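The byte layouts above can be sketched as a small encoder. This is an illustration, not a replacement for a real codec: the function name `utf8_encode` is made up for the example, and the sketch assumes the four-byte limit of current UTF-8 (RFC 3629), so the five- and six-byte forms are left out.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point with the 1- to 4-byte layouts."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates cannot be encoded in UTF-8")

    if code_point < 0x80:        # up to 7 bits: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:       # 8 to 11 bits: 110xxxxx 10xxxxxx
        lead, n_cont = 0b11000000, 1
    elif code_point < 0x10000:   # 12 to 16 bits: 1110xxxx + 2 continuation bytes
        lead, n_cont = 0b11100000, 2
    else:                        # 17 to 21 bits: 11110xxx + 3 continuation bytes
        lead, n_cont = 0b11110000, 3

    # Fill from the low-order end: each continuation byte takes six bits.
    tail = []
    for _ in range(n_cont):
        tail.append(0b10000000 | (code_point & 0b00111111))
        code_point >>= 6

    # Whatever is left goes into the lead byte; tail was built last-byte first.
    return bytes([lead | code_point] + tail[::-1])


print(utf8_encode(ord("好")).hex(" "))
```

Comparing the result with Python's own `str.encode("utf-8")` for a few characters is an easy way to confirm the layout.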

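Now let's look at some examples.

The original post highlighted the flag bits in yellow and used other colors to show where each group of code point bits lands after encoding. In plain text, the flag bits are the leading 0, 110, or 1110 of the first byte and the leading 10 of every following byte.

Unicode (hex)	Unicode (binary)	UTF-8 (binary)	UTF-8 (hex)	UTF-8 bytes
B	00001011	00001011	0B	1
9D	00010011101	11000010 10011101	C2 9D	2
A89E	10101000 10011110	11101010 10100010 10011110	EA A2 9E	3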
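Going the other way, the flag bits of the first byte say how many bytes follow, which is how a reader recovers the code point. The sketch below decodes the three UTF-8 byte sequences from the table. The function name `utf8_decode_one` is hypothetical, and the sketch deliberately skips checks a real decoder needs, such as rejecting overlong forms and surrogates.

```python
def utf8_decode_one(data: bytes) -> int:
    """Decode exactly one UTF-8 sequence (1 to 4 bytes) to a code point."""
    first = data[0]
    if first >> 7 == 0b0:          # 0xxxxxxx
        length, value = 1, first
    elif first >> 5 == 0b110:      # 110xxxxx
        length, value = 2, first & 0b00011111
    elif first >> 4 == 0b1110:     # 1110xxxx
        length, value = 3, first & 0b00001111
    elif first >> 3 == 0b11110:    # 11110xxx
        length, value = 4, first & 0b00000111
    else:
        raise ValueError("not a valid lead byte")
    if len(data) != length:
        raise ValueError("wrong number of bytes for this lead byte")

    # Every following byte must look like 10xxxxxx and adds six bits.
    for byte in data[1:]:
        if byte >> 6 != 0b10:
            raise ValueError("expected a continuation byte")
        value = (value << 6) | (byte & 0b00111111)
    return value


for hex_bytes in ("0B", "C2 9D", "EA A2 9E"):
    code_point = utf8_decode_one(bytes.fromhex(hex_bytes))
    print(hex_bytes, "->", format(code_point, "X"))
```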


2007-04-03 10:27