Java I/O Operations_001

xblia

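The cost of formatting

Actually writing data to a file is only part of the cost of output. Another significant cost is formatting the data.

Consider the following character output programs.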

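The four programs listed below all produce the same output. Their running times were:

Formatting technique                         Example statement                               Running time
Print a fixed string                         System.out.print(s);                            1.3 s
Simple formatting with "+"                   String s = string + string;                     1.8 s
                                             System.out.print(s);
MessageFormat instance method (java.text)    String s = fmt.format(values);                  7.8 s
                                             System.out.print(s);
MessageFormat static method (java.text)      String s = MessageFormat.format(fmt, values);   7.8 × 1.3 s (about 10 s)
                                             System.out.print(s);

Among the first three, the slowest takes about six times as long as the fastest. The third approach would be slower still if the format were not precompiled and the static method were used instead, as in method 4.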

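The fact that the third approach is much slower than the first two does not mean you should avoid it, only that you should be aware of the time it costs.

Message formatting matters a great deal for internationalization. Applications that care about it typically read the format from a resource bundle and then apply it.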
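As a rough illustration of that pattern, the sketch below reads a format from a resource bundle and compiles it once. The Messages bundle, its "square" key, and the class name are hypothetical; a real application would more likely load a .properties file with ResourceBundle.getBundle. Locale.ROOT is used only to keep the number formatting predictable.

```java
import java.text.MessageFormat;
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

class BundleFormatDemo {
    // Hypothetical in-memory bundle standing in for a translated resource file.
    static class Messages extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                { "square", "The square of {0} is {1}" }
            };
        }
    }

    private static final ResourceBundle BUNDLE = new Messages();

    // The pattern is looked up and compiled once, not on every call.
    // Note: MessageFormat is not thread-safe; this sketch is single-threaded.
    private static final MessageFormat SQUARE =
        new MessageFormat(BUNDLE.getString("square"), Locale.ROOT);

    static String describeSquare(int n) {
        return SQUARE.format(new Object[] { n, n * n });
    }

    public static void main(String[] args) {
        System.out.println(describeSquare(5));
    }
}
```

Keeping the compiled MessageFormat in a field is what separates the instance-method case from the slower static-method case in the timings above.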

Method 1 simply prints a fixed string, to show the inherent I/O cost:

public class format1 {
    public static void main(String args[]) {
        final int COUNT = 25000;
        for (int i = 1; i <= COUNT; i++) {
            // a fixed string: no formatting work at all
            String s = "The square of 5 is 25\n";
            System.out.print(s);
        }
    }
}
 

Method 2 uses simple formatting with "+":

public class format2 {
    public static void main(String args[]) {
        int n = 5;
        final int COUNT = 25000;
        for (int i = 1; i <= COUNT; i++) {
            // the string is rebuilt by concatenation on every iteration
            String s = "The square of " + n + " is " +
                n * n + "\n";
            System.out.print(s);
        }
    }
}
 

Method 3 uses the instance method of the MessageFormat class in the java.text package:

import java.text.*;

public class format3 {
    public static void main(String args[]) {
        // the format is compiled once, outside the loop
        MessageFormat fmt =
            new MessageFormat("The square of {0} is {1}\n");
        Object values[] = new Object[2];
        int n = 5;
        values[0] = Integer.valueOf(n);
        values[1] = Integer.valueOf(n * n);
        final int COUNT = 25000;
        for (int i = 1; i <= COUNT; i++) {
            String s = fmt.format(values);
            System.out.print(s);
        }
    }
}
 

Method 4 uses the static method MessageFormat.format(String, Object[]), so the format is not precompiled:

import java.text.*;

public class format4 {
    public static void main(String args[]) {
        String fmt = "The square of {0} is {1}\n";
        Object values[] = new Object[2];
        int n = 5;
        values[0] = Integer.valueOf(n);
        values[1] = Integer.valueOf(n * n);
        final int COUNT = 25000;
        for (int i = 1; i <= COUNT; i++) {
            // the pattern string is parsed again on every call
            String s =
                MessageFormat.format(fmt, values);
            System.out.print(s);
        }
    }
}
 

This takes about one-third longer than the previous example.

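The cost of random access

RandomAccessFile is a class for random file I/O at the byte level. It provides a seek method, similar to the one in C/C++, that moves the file pointer to an arbitrary position, from which bytes can then be read or written.

The seek method calls into the underlying runtime system and so tends to be expensive. A cheaper alternative is to build your own buffering on top of RandomAccessFile and implement a direct byte read method that takes a byte offset (>= 0) as its argument. The ReadRandom class listed below is one such example.

Its main method simply reads the bytes in sequence and writes them out.

When it helps: the technique is useful when accesses have locality, that is, when nearby bytes in the file are read at about the same time. For example, it may pay off if you implement binary search over a sorted file.

When it does not: it is of little value if you access arbitrary points scattered across a huge file.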
 
import java.io.*;

public class ReadRandom {
    private static final int DEFAULT_BUFSIZE = 4096;

    private RandomAccessFile raf;
    private byte inbuf[];
    private long startpos = -1;   // file offset of the first buffered byte
    private long endpos = -1;     // file offset of the last buffered byte
    private int bufsize;

    public ReadRandom(String name)
        throws FileNotFoundException {
        this(name, DEFAULT_BUFSIZE);
    }

    public ReadRandom(String name, int b)
        throws FileNotFoundException {
        raf = new RandomAccessFile(name, "r");
        bufsize = b;
        inbuf = new byte[bufsize];
    }

    // Returns the byte at offset pos (0 to 255), or -1 at end of file or on error.
    public int read(long pos) {
        if (pos < startpos || pos > endpos) {
            // buffer miss: seek to the block containing pos and refill
            long blockstart = (pos / bufsize) * bufsize;
            int n;
            try {
                raf.seek(blockstart);
                n = raf.read(inbuf);
            }
            catch (IOException e) {
                return -1;
            }
            startpos = blockstart;
            endpos = blockstart + n - 1;
            if (pos < startpos || pos > endpos)
                return -1;
        }
        // mask with 0xff (not 0xffff) so the result is an unsigned byte value
        return inbuf[(int)(pos - startpos)] & 0xff;
    }

    public void close() throws IOException {
        raf.close();
    }

    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("missing filename");
            System.exit(1);
        }
        try {
            ReadRandom rr = new ReadRandom(args[0]);
            long pos = 0;
            int c;
            byte buf[] = new byte[1];
            while ((c = rr.read(pos)) != -1) {
                pos++;
                buf[0] = (byte)c;
                System.out.write(buf, 0, 1);
            }
            rr.close();
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}
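The binary-search case mentioned above could be sketched roughly as follows. This is an illustration under stated assumptions, not part of the original program: the file is assumed to hold sorted 4-byte big-endian integers (the layout DataOutputStream.writeInt produces), and SortedIntFile, indexOf, and demoSearch are hypothetical names. It uses the same idea as ReadRandom, seeking only when the requested offset falls outside the buffered block.

```java
import java.io.*;

class SortedIntFile implements AutoCloseable {
    private static final int BUFSIZE = 4096;
    private final RandomAccessFile raf;
    private final byte[] buf = new byte[BUFSIZE];
    private long bufStart = -1;  // file offset of buf[0]
    private int bufLen = 0;      // number of valid bytes in buf

    SortedIntFile(String name) throws IOException {
        raf = new RandomAccessFile(name, "r");
    }

    // Unsigned byte at pos; seeks and refills only on a buffer miss.
    private int byteAt(long pos) throws IOException {
        if (pos < bufStart || pos >= bufStart + bufLen) {
            bufStart = (pos / BUFSIZE) * BUFSIZE;
            raf.seek(bufStart);
            bufLen = 0;
            int n;
            while (bufLen < BUFSIZE
                    && (n = raf.read(buf, bufLen, BUFSIZE - bufLen)) > 0) {
                bufLen += n;
            }
            if (pos >= bufStart + bufLen) {
                throw new EOFException("offset " + pos + " is past end of file");
            }
        }
        return buf[(int) (pos - bufStart)] & 0xff;
    }

    // Reads record number index as a big-endian int.
    private int intAt(long index) throws IOException {
        long p = index * 4;
        return (byteAt(p) << 24) | (byteAt(p + 1) << 16)
             | (byteAt(p + 2) << 8) | byteAt(p + 3);
    }

    // Classic binary search; returns the record index of key, or -1.
    long indexOf(int key) throws IOException {
        long lo = 0;
        long hi = raf.length() / 4 - 1;
        while (lo <= hi) {
            long mid = (lo + hi) >>> 1;
            int v = intAt(mid);
            if (v < key) lo = mid + 1;
            else if (v > key) hi = mid - 1;
            else return mid;
        }
        return -1;
    }

    @Override
    public void close() throws IOException {
        raf.close();
    }

    // Demo helper: writes 0, 3, 6, ... to a temp file, then searches it.
    static long demoSearch(int key) {
        try {
            File f = File.createTempFile("sorted", ".bin");
            f.deleteOnExit();
            try (DataOutputStream out = new DataOutputStream(
                    new BufferedOutputStream(new FileOutputStream(f)))) {
                for (int i = 0; i < 5000; i++) out.writeInt(i * 3);
            }
            try (SortedIntFile s = new SortedIntFile(f.getPath())) {
                return s.indexOf(key);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demoSearch(2997));
    }
}
```

The early probes of a binary search land far apart and each costs a seek, but the last several probes usually fall within one buffered block, which is where the buffering helps.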
 

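The cost of compression

Java provides classes for compressing and decompressing byte streams. They live in the java.util.zip package and also serve as the basis for Jar files (a Jar file is a Zip file with an added manifest).

The purpose of compression is to reduce storage space. At a fixed I/O speed, a compressed file also takes less time to transfer.

Compressing costs CPU time and memory.

Compression time = data size / compression speed.

I/O transfer time = size of the data transferred / I/O speed.

Total time to transfer the data = compression time + I/O transfer time.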
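To make the trade-off concrete, here is a small worked calculation based on the three formulas above. Every number in it (the 10 MB size, the 0.5 compression ratio, the speeds) is an assumption chosen for illustration, not a measurement, and the class and method names are hypothetical.

```java
class TransferTimeEstimate {
    // Sizes are in MB and speeds in MB/s.

    // Time to move the data uncompressed.
    static double plainSeconds(double sizeMb, double ioMbPerSec) {
        return sizeMb / ioMbPerSec;
    }

    // Compression time plus the time to move the smaller, compressed data.
    static double compressedSeconds(double sizeMb, double ratio,
                                    double compressMbPerSec, double ioMbPerSec) {
        double compressTime = sizeMb / compressMbPerSec;
        double ioTime = (sizeMb * ratio) / ioMbPerSec;
        return compressTime + ioTime;
    }

    public static void main(String[] args) {
        double size = 10.0;    // assumed file size
        double ratio = 0.5;    // assumed: compressed data is half the size
        double cpu = 4.0;      // assumed compression speed

        // Slow device (1 MB/s): compression wins.
        System.out.println(plainSeconds(size, 1.0));                   // 10.0
        System.out.println(compressedSeconds(size, ratio, cpu, 1.0));  // 7.5

        // Fast device (20 MB/s): compression loses.
        System.out.println(plainSeconds(size, 20.0));                  // 0.5
        System.out.println(compressedSeconds(size, ratio, cpu, 20.0)); // 2.75
    }
}
```

With these assumed figures, compression pays off only while the device is slow relative to the compressor.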

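Whether compression helps or hurts I/O performance depends heavily on your hardware, especially the relative speeds of the processor and the disk drive. Zip compression typically reduces data size by about 50%, at the cost of the time spent compressing and decompressing. In one test, reading a large (5 to 10 MB) compressed text file from disk on a 300-MHz Pentium PC with an IDE hard drive took about one-third less time than reading the uncompressed file.

One case where compression is useful is writing to a very slow medium such as a floppy disk. A test using a fast processor (300 MHz Pentium) and a slow floppy drive (the ordinary floppy drive on a PC) showed that compressing a large text file and then writing it to floppy was about 50% faster than writing it uncompressed.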
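The figures above come from old hardware, so it is worth measuring on your own machine. The sketch below is one way to do that: it compresses an in-memory sample with ZipOutputStream and reports the size and elapsed time. The sample text, the entry name "data", and the class name are made up for the illustration; results will vary with the data and the machine.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ZipSizeCheck {
    // Compresses data into a single-entry Zip archive held in memory.
    static byte[] zip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry("data"));
            zos.write(data, 0, data.length);
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Made-up, fairly repetitive text; real files may compress better or worse.
    static byte[] sampleText() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20000; i++) {
            sb.append("The square of ").append(i)
              .append(" is ").append((long) i * i).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] plain = sampleText();
        long start = System.nanoTime();
        byte[] packed = zip(plain);
        long micros = (System.nanoTime() - start) / 1000;
        System.out.println("plain bytes:      " + plain.length);
        System.out.println("compressed bytes: " + packed.length);
        System.out.println("compress time:    " + micros + " us");
    }
}
```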

The following program takes an input file and writes it to a compressed Zip file containing a single entry:

import java.io.*;
import java.util.zip.*;

public class compress {
    public static void doit(String filein, String fileout) {
        FileInputStream fis = null;
        FileOutputStream fos = null;
        try {
            fis = new FileInputStream(filein);
            fos = new FileOutputStream(fileout);
            ZipOutputStream zos = new ZipOutputStream(fos);
            // the archive gets one entry, named after the input file
            ZipEntry ze = new ZipEntry(filein);
            zos.putNextEntry(ze);
            final int BUFSIZ = 4096;
            byte inbuf[] = new byte[BUFSIZ];
            int n;
            while ((n = fis.read(inbuf)) != -1)
                zos.write(inbuf, 0, n);
            fis.close();
            fis = null;
            zos.close();   // also closes the underlying fos
            fos = null;
        }
        catch (IOException e) {
            System.err.println(e);
        }
        finally {
            // close whatever is still open after a failure
            try {
                if (fis != null)
                    fis.close();
                if (fos != null)
                    fos.close();
            }
            catch (IOException e) {
            }
        }
    }

    public static void main(String args[]) {
        if (args.length != 2) {
            System.err.println("missing filenames");
            System.exit(1);
        }
        if (args[0].equals(args[1])) {
            System.err.println("filenames are identical");
            System.exit(1);
        }
        doit(args[0], args[1]);
    }
}
 

The next program does the reverse: it takes a Zip file that is assumed to contain a single entry and decompresses it to an output file:

import java.io.*;
import java.util.zip.*;

public class uncompress {
    public static void doit(String filein, String fileout) {
        FileInputStream fis = null;
        FileOutputStream fos = null;
        try {
            fis = new FileInputStream(filein);
            fos = new FileOutputStream(fileout);
            ZipInputStream zis = new ZipInputStream(fis);
            // position the stream at the first (and only) entry
            ZipEntry ze = zis.getNextEntry();
            final int BUFSIZ = 4096;
            byte inbuf[] = new byte[BUFSIZ];
            int n;
            while ((n = zis.read(inbuf, 0, BUFSIZ)) != -1)
                fos.write(inbuf, 0, n);
            zis.close();   // also closes the underlying fis
            fis = null;
            fos.close();
            fos = null;
        }
        catch (IOException e) {
            System.err.println(e);
        }
        finally {
            // close whatever is still open after a failure
            try {
                if (fis != null)
                    fis.close();
                if (fos != null)
                    fos.close();
            }
            catch (IOException e) {
            }
        }
    }

    public static void main(String args[]) {
        if (args.length != 2) {
            System.err.println("missing filenames");
            System.exit(1);
        }
        if (args[0].equals(args[1])) {
            System.err.println("filenames are identical");
            System.exit(1);
        }
        doit(args[0], args[1]);
    }
}

 
 

Caching

A detailed discussion of hardware caches is beyond the scope of this article, but in some cases a software cache can be used to speed up I/O. Consider reading lines from a text file in random order. One way to do this is to read all the lines and store them in an ArrayList (a collection class similar to Vector):

import java.io.*;
import java.util.ArrayList;

public class LineCache {
    private ArrayList<String> list = new ArrayList<String>();

    // reads the whole file into memory, one list element per line
    public LineCache(String fn) throws IOException {
        FileReader fr = new FileReader(fn);
        BufferedReader br = new BufferedReader(fr);
        String ln;
        while ((ln = br.readLine()) != null)
            list.add(ln);
        br.close();
    }

    // returns line n (0-based), or null if n is past the last line
    public String getLine(int n) {
        if (n < 0)
            throw new IllegalArgumentException();
        return (n < list.size() ? list.get(n) : null);
    }

    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("missing filename");
            System.exit(1);
        }
        try {
            LineCache lc = new LineCache(args[0]);
            int i = 0;
            String ln;
            while ((ln = lc.getLine(i++)) != null)
                System.out.println(ln);
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}
 

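The getLine method retrieves an arbitrary line. This technique is useful, but it obviously uses too much memory for a large file, which limits it. An alternative is to remember only the 100 most recently requested lines and read any other line directly from disk. That arrangement works well when accesses have locality but does little for truly random access.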
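A minimal sketch of the "most recently requested lines" idea, assuming a LinkedHashMap in access order as the bookkeeping structure. The RecentLineCache name and the loader function are hypothetical: the loader stands in for whatever code actually reads line n from disk, which is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

class RecentLineCache {
    private final int capacity;
    private final IntFunction<String> loader;  // hypothetical: reads line n from disk
    private final LinkedHashMap<Integer, String> recent;
    private int loads = 0;                     // how often the loader was needed

    RecentLineCache(int capacity, IntFunction<String> loader) {
        this.capacity = capacity;
        this.loader = loader;
        // accessOrder = true keeps entries ordered from least to most recently used
        this.recent = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > RecentLineCache.this.capacity;
            }
        };
    }

    // Returns line n from the cache if present, otherwise loads and remembers it.
    String getLine(int n) {
        String ln = recent.get(n);
        if (ln == null) {
            ln = loader.apply(n);
            loads++;
            if (ln != null) {
                recent.put(n, ln);
            }
        }
        return ln;
    }

    int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        // The lambda fakes the disk read for demonstration purposes.
        RecentLineCache cache = new RecentLineCache(100, n -> "line " + n);
        for (int i = 0; i < 1000; i++) {
            cache.getLine(i % 50);   // local access: only 50 distinct lines
        }
        System.out.println("loader calls: " + cache.loadCount());
    }
}
```

With local access like the loop in main, each distinct line is loaded once and then served from memory; with uniformly random line numbers over a large file, nearly every request would still reach the loader.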

Tokenization

Tokenization is the process of splitting a sequence of bytes or characters into logical chunks such as words. Java provides the StreamTokenizer class, which is used like this:

import java.io.*;

public class token1 {
    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("missing filename");
            System.exit(1);
        }
        try {
            FileReader fr = new FileReader(args[0]);
            BufferedReader br = new BufferedReader(fr);
            StreamTokenizer st = new StreamTokenizer(br);
            // clear the default syntax table, then treat only a-z as word characters
            st.resetSyntax();
            st.wordChars('a', 'z');
            int tok;
            while ((tok = st.nextToken()) !=
                StreamTokenizer.TT_EOF) {
                if (tok == StreamTokenizer.TT_WORD)
                    ;// st.sval has token
            }
            br.close();
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}
 

This example tokenizes lowercase words (letters a-z). If you implemented the equivalent yourself, it might look like this:

import java.io.*;

public class token2 {
    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("missing filename");
            System.exit(1);
        }
        try {
            FileReader fr = new FileReader(args[0]);
            BufferedReader br = new BufferedReader(fr);
            int maxlen = 256;
            int currlen = 0;
            char wordbuf[] = new char[maxlen];
            int c;
            do {
                c = br.read();
                if (c >= 'a' && c <= 'z') {
                    // grow the word buffer by 50% when it is full
                    if (currlen == maxlen) {
                        maxlen *= 1.5;
                        char xbuf[] =
                            new char[maxlen];
                        System.arraycopy(
                            wordbuf, 0,
                            xbuf, 0, currlen);
                        wordbuf = xbuf;
                    }
                    wordbuf[currlen++] = (char)c;
                }
                else if (currlen > 0) {
                    // any other character (or end of file) ends the current word
                    String s = new String(wordbuf,
                        0, currlen);
                    // do something with s
                    currlen = 0;
                }
            } while (c != -1);
            br.close();
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}
 

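The second program runs about 20% faster than the first, at the cost of writing some tricky low-level code.

StreamTokenizer is something of a hybrid class. It reads from a character stream (such as a BufferedReader) but at the same time operates in terms of bytes: every character with a two-byte value (greater than 0xff) is treated as if it were an alphabetic character, whether or not it actually is one.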
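The treatment of characters above 0xff can be observed with a short experiment. The sketch below uses the same syntax setup as token1 but reads from a StringReader; the class name and the sample input are invented for the illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TokenizerWideChars {
    // Collects the word tokens, with only a-z declared as word characters.
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.resetSyntax();
        st.wordChars('a', 'z');
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // '\u4e2d' is above 0xff, so it is expected to join the surrounding word
        // even though only a-z were declared as word characters.
        System.out.println(words("ab\u4e2dcd xy"));
    }
}
```

If your input can contain such characters and you need them excluded from words, the hand-written loop in token2 gives you that control.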

Serialization

Serialization converts an arbitrary Java data structure into a byte stream in a standard format. For example, the following program writes out a list of random integers:

import java.io.*;
import java.util.*;

public class serial1 {
    public static void main(String args[]) {
        ArrayList<Integer> al = new ArrayList<Integer>();
        Random rn = new Random();
        final int N = 100000;
        for (int i = 1; i <= N; i++)
            al.add(Integer.valueOf(rn.nextInt()));
        try {
            FileOutputStream fos = new FileOutputStream("test.ser");
            BufferedOutputStream bos = new BufferedOutputStream(fos);
            ObjectOutputStream oos = new ObjectOutputStream(bos);
            // one call serializes the whole list
            oos.writeObject(al);
            oos.close();
        }
        catch (Throwable e) {
            System.err.println(e);
        }
    }
}

 
 

And the following program reads the list back:

import java.io.*;
import java.util.*;

public class serial2 {
    public static void main(String args[]) {
        ArrayList<?> al = null;
        try {
            FileInputStream fis = new FileInputStream("test.ser");
            BufferedInputStream bis = new BufferedInputStream(fis);
            ObjectInputStream ois = new ObjectInputStream(bis);
            // readObject rebuilds the list and every Integer in it
            al = (ArrayList<?>)ois.readObject();
            ois.close();
        }
        catch (Throwable e) {
            System.err.println(e);
        }
    }
}
 

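Note that we use buffering to speed up the I/O operations.

Is there a faster way than serialization to write out a large amount of data and read it back? Probably not, except in special cases. For example, suppose you decide to write a 64-bit long integer as text instead of as a set of 8 bytes. The maximum length of a long integer as text is about 20 characters, or 2.5 times as long as the binary representation, so this format is unlikely to be any faster. In some cases, however, such as bitmaps, a special format might be an improvement. Even then, using your own scheme instead of the standard one that serialization offers involves some trade-offs.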
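The "about 20 characters" figure can be checked directly. The following small sketch (the class name is hypothetical) compares the longest decimal text form of a long with its 8-byte binary size:

```java
class LongTextSize {
    // Number of characters in the decimal text form of v.
    static int textLength(long v) {
        return Long.toString(v).length();
    }

    public static void main(String[] args) {
        int longest = textLength(Long.MIN_VALUE);  // includes the minus sign
        System.out.println(longest);                               // 20
        System.out.println(Long.BYTES);                            // 8
        System.out.println((double) longest / Long.BYTES);         // 2.5
    }
}
```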

Beyond the actual I/O and formatting costs of serialization (using DataInputStream and DataOutputStream), there are other costs, such as the need to create new objects when the data is deserialized.

Note that the DataOutputStream methods can also be used to develop semi-custom data formats, for example:

import java.io.*;
import java.util.*;

public class binary1 {
    public static void main(String args[]) {
        try {
            FileOutputStream fos = new FileOutputStream("outdata");
            BufferedOutputStream bos = new BufferedOutputStream(fos);
            DataOutputStream dos = new DataOutputStream(bos);
            Random rn = new Random();
            final int N = 10;
            // the format: a count, followed by that many ints
            dos.writeInt(N);
            for (int i = 1; i <= N; i++) {
                int r = rn.nextInt();
                System.out.println(r);
                dos.writeInt(r);
            }
            dos.close();
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}

 
 

and:

import java.io.*;

public class binary2 {
    public static void main(String args[]) {
        try {
            FileInputStream fis = new FileInputStream("outdata");
            BufferedInputStream bis = new BufferedInputStream(fis);
            DataInputStream dis = new DataInputStream(bis);
            // read the count first, then that many ints
            int N = dis.readInt();
            for (int i = 1; i <= N; i++) {
                int r = dis.readInt();
                System.out.println(r);
            }
            dis.close();
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}

These programs write 10 integers to a file and then read them back.
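The same count-prefixed layout can be exercised without touching the disk, which makes its size easy to see: 4 bytes for the count plus 4 bytes per value. The sketch below is an in-memory variant under that assumption; the CountPrefixedInts class and its encode and decode methods are hypothetical names, not part of the programs above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class CountPrefixedInts {
    // Writes the count, then each value, as 4-byte big-endian ints.
    static byte[] encode(int[] values) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bos)) {
            dos.writeInt(values.length);
            for (int v : values) {
                dos.writeInt(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Reads the count, then that many values.
    static int[] decode(byte[] bytes) {
        try (DataInputStream dis =
                 new DataInputStream(new ByteArrayInputStream(bytes))) {
            int n = dis.readInt();
            int[] values = new int[n];
            for (int i = 0; i < n; i++) {
                values[i] = dis.readInt();
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = encode(new int[] { 1, -2, 3 });
        System.out.println(bytes.length);        // 16
        System.out.println(decode(bytes).length); // 3
    }
}
```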