Java's default I/O readers do not recognize all BOM (byte order mark) markers. This has been reported as fixed in JDK 6, but I haven't tested that yet, so treat it as unconfirmed. Until then, you can use the UnicodeReader class to work around the problem: it auto-detects the BOM, picks the matching charset and skips the marker, so it behaves as a transparent wrapper around the underlying InputStream.
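To make the problem concrete, here is a minimal sketch that decodes a UTF-8 byte sequence starting with a BOM through the standard InputStreamReader. The class name BomDemo and the helper firstChar are made up for this illustration. On the JDKs I am aware of, the BOM is not skipped and shows up as the character U+FEFF at the start of the text.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original article.
class BomDemo {

    // Decodes the bytes as UTF-8 with the standard reader and
    // returns the first decoded char, or -1 if the input is empty.
    static int firstChar(byte[] bytes) {
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            return r.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // UTF-8 BOM (EF BB BF) followed by the text "hi"
        byte[] withBom = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i' };
        int first = firstChar(withBom);
        System.out.println(first == 0xFEFF
                ? "BOM leaked into the text as U+FEFF"
                : "BOM was skipped");
    }
}
```

If the first branch is printed on your JDK, any code that compares or parses the first token of the file will see that invisible extra character.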
Example code using the UnicodeReader class
Here is an example method that reads a text file. It recognizes the BOM marker and skips it while reading.
// Needs: import java.io.*;  (UnicodeReader is a separate class, not part of the JDK)
public static char[] loadFile(String file) throws IOException {
    // Read a text file; auto-recognize the BOM marker, or use the
    // system default encoding (null argument) if no marker is found.
    char[] buffer = new char[16 * 1024]; // 16 KB buffer
    CharArrayWriter writer = new CharArrayWriter();
    // try-with-resources (Java 7+) closes the streams even if reading fails,
    // and avoids calling close() on a reference that is still null.
    try (FileInputStream in = new FileInputStream(file);
         BufferedReader reader = new BufferedReader(new UnicodeReader(in, null))) {
        int read;
        while ((read = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, read);
        }
    }
    return writer.toCharArray();
}
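For readers who want to see the idea behind BOM auto-detection, the following is a minimal sketch of the technique. It is not the UnicodeReader source; the class BomSniffer and its methods open and decode are names invented here. It peeks at the first bytes through a PushbackInputStream, chooses a charset from the BOM, and pushes back whatever was not part of the marker. To stay short it handles only UTF-8 and UTF-16, so a UTF-32 BOM would be misread.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of BOM sniffing; handles UTF-8 and UTF-16 only.
class BomSniffer {

    // Returns a Reader that uses the charset indicated by the BOM,
    // or the fallback (system default if null) when no BOM is present.
    static Reader open(InputStream in, Charset fallback) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // A single read may return fewer bytes, so loop until 3 bytes or EOF.
        while (n < head.length) {
            int r = pb.read(head, n, head.length - n);
            if (r < 0) break;
            n += r;
        }

        Charset cs = (fallback != null) ? fallback : Charset.defaultCharset();
        int bomLength = 0;
        if (n >= 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            cs = StandardCharsets.UTF_8;
            bomLength = 3;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            cs = StandardCharsets.UTF_16BE;
            bomLength = 2;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            cs = StandardCharsets.UTF_16LE;
            bomLength = 2;
        }

        // Push back every byte that was read but is not part of the BOM.
        if (n > bomLength) {
            pb.unread(head, bomLength, n - bomLength);
        }
        return new InputStreamReader(pb, cs);
    }

    // Convenience helper for in-memory data: decodes all bytes to a String.
    static String decode(byte[] bytes, Charset fallback) {
        try (Reader r = open(new ByteArrayInputStream(bytes), fallback)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8WithBom = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i' };
        System.out.println(decode(utf8WithBom, StandardCharsets.ISO_8859_1));
    }
}
```

The pushback buffer is what keeps the wrapper transparent: a file without a BOM loses no bytes, because everything that was peeked is returned to the stream before decoding starts.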
Example code to write UTF-8 with a BOM marker
Write the BOM bytes at the start of an empty file, and text editors that honor the BOM will pick the correct charset when they open it. Java's OutputStreamWriter does not write the UTF-8 BOM bytes itself, so you have to write them manually.
// Needs: import java.io.*;
public static void saveFile(String file, String data, boolean append) throws IOException {
    File f = new File(file);
    // try-with-resources (Java 7+) closes the stream even if writing fails.
    // Without append the file is truncated on open, so f.length() is 0 below.
    try (FileOutputStream fos = new FileOutputStream(f, append)) {
        // Write the UTF-8 BOM only if the file is empty
        if (f.length() < 1) {
            final byte[] bom = new byte[] { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF };
            fos.write(bom);
        }
        BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(fos, "UTF-8"));
        if (data != null) bw.write(data);
        // Flush buffered characters to fos before it is closed; a failure
        // here is reported to the caller instead of being swallowed.
        bw.flush();
    }
}
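The claim that OutputStreamWriter leaves out the UTF-8 BOM can be checked in memory, without touching the file system. The sketch below uses a made-up class Utf8BomWriteDemo with a helper encode that optionally writes the three BOM bytes before the text and returns the resulting bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original article.
class Utf8BomWriteDemo {

    // Encodes text as UTF-8, optionally prefixed with the BOM bytes EF BB BF.
    static byte[] encode(String text, boolean withBom) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            if (withBom) {
                out.write(new byte[] { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF });
            }
            // The writer itself adds no BOM for UTF-8.
            try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
                w.write(text);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("without BOM: " + encode("hi", false).length + " bytes");
        System.out.println("with BOM:    " + encode("hi", true).length + " bytes");
    }
}
```

The three-byte difference between the two results is exactly the manually written marker.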