Chapter 2. Streams

leonzhx

浏览: 769463 次
性别:
来自: 上海

最近访客更多访客>>

u012363178

justsimple

cdphantom

wang_xuewu

博主相关

博客

微博

相册

留言

关于我

文章分类

社区版块

存档分类

2014-05 ( 22)
2014-04 ( 47)
2014-03 ( 25)
更多存档...

博客分类：

Java Network Programming 读书笔记

OutputStream InputStream Reader Writer

1. Java’s basic output (abstract) class is java.io.OutputStream. This class provides the fundamental methods needed to write data:

public abstract void write(int b) throws IOException
public void write(byte[] data) throws IOException
public void write(byte[] data, int offset, int length) throws IOException
public void flush() throws IOException
public void close() throws IOException

2. Subclasses of OutputStream use these methods to write data onto particular media. For instance, a FileOutputStream uses these methods to write data into a file. A TelnetOutputStream uses these methods to write data onto a network connection. A ByteArrayOutputStream uses these methods to write data into an expandable byte array.

3. OutputStream’s fundamental method is write(int b). When you write an int onto a network connection using write(int b), only the least significant byte of the number is written and the remaining three bytes are ignored. (This is the effect of casting an int to a byte.)

4. Every TCP segment contains at least 40 bytes of overhead for routing and error correction. If each byte is sent by itself, you may be stuffing the network with 41 times more data than you think you are. Add overhead of the host-to-network layer protocol, and this can be even worse. Consequently, most TCP/IP implementations buffer data to some extent. That is, they accumulate bytes in memory and send them to their eventual destination only when a certain number have accumulated or a certain amount of time has passed.

5. Streams can also be buffered in software, directly in the Java code as well as in the network hardware. Typically, this is accomplished by chaining a BufferedOutputStream or a BufferedWriter to the underlying stream. The flush() method forces the buffered stream to send its data even if the buffer isn’t yet full. It’s important to flush your streams whether you think you need to or not. If flushing isn’t necessary for a particular stream, it’s a very low-cost operation. You should flush all streams immediately before you close them.

6. When you’re done with a stream, close it by invoking its close() method. This releases any resources associated with the stream, such as file handles or ports. If the stream derives from a network connection, then closing the stream terminates the connection.

7. In Java 6 and earlier, it’s wise to close the stream in a finally block. To get the right variable scope, you have to declare the stream variable outside the try block but initialize it inside the try block. Furthermore, to avoid NullPointerException you need to check whether the stream variable is null before closing it. Finally, you usually want to ignore or at most log any exceptions that occur while closing the stream:

OutputStream out = null;
try {
  out = new FileOutputStream("/tmp/data.txt");
  // work with the output stream...
} catch (IOException ex) {
  System.err.println(ex.getMessage());
} finally {
  if (out != null) {
    try {
      out.close();
    } catch (IOException ex) {
      // ignore
    }
  }
}

Java 7 introduces the try with resources construct to make this cleanup neater: Instead of declaring the stream variable outside the try block, you declare it inside an argument list of the try block:

try (OutputStream out = new FileOutputStream("/tmp/data.txt")) {
  // work with the output stream...
} catch (IOException ex) {
  System.err.println(ex.getMessage());
}

The finally clause is no longer needed. Java automatically invokes close() on any AutoCloseable objects declared inside the argument list of the try block.

8. Java’s basic input (abstract) class is java.io.InputStream. This class provides the fundamental methods needed to read data as raw bytes:

public abstract int read() throws IOException
public int read(byte[] input) throws IOException
public int read(byte[] input, int offset, int length) throws IOException
public long skip(long n) throws IOException
public int available() throws IOException
public void close() throws IOException

9. Concrete subclasses of InputStream use these methods to read data from particular media. For instance, a FileInputStream reads data from a file. A TelnetInputStream reads data from a network connection. A ByteArrayInputStream reads data from an array of bytes.

10. The basic method of InputStream is the noargs read() method. This method reads a single byte of data from the input stream’s source and returns it as an int from 0 to 255. End of stream is signified by returning –1. The read() method waits and blocks execution of any code that follows it until a byte of data is available and ready to be read.

11. You can convert a signed byte to an unsigned byte like this:

int i = b >= 0 ? b : 256 + b;

12. All three read() methods return –1 to signal the end of the stream. If the stream ends while there’s still data that hasn’t been read, the multibyte read() methods return the data until the buffer has been emptied. The next call to any of the read() methods will return –1. The multibyte read methods return the number of bytes actually read. To guarantee that all the bytes you want are actually read, place the read in a loop that reads repeatedly until the array is filled:

int bytesRead = 0;
int bytesToRead = 1024;
byte[] input = new byte[bytesToRead];
while (bytesRead < bytesToRead) {
  int result = in.read(input, bytesRead, bytesToRead - bytesRead);
  if (result == -1) break; // end of stream
  bytesRead += result;
}

13. You can use the available() method to determine how many bytes can be read without blocking. This returns the minimum number of bytes you can read. You may in fact be able to read more, but you will be able to read at least as many bytes as available() suggests.

14. Generally, read(byte[] input, int offset, int length) returns –1 on end of stream; but if length is 0, then it does not notice the end of stream and returns 0 instead.

15. As with output streams, once your program has finished with an input stream, it should close it by invoking its close() method. This releases any resources associated with the stream, such as file handles or ports.

16. On rare occasions, you may want to skip over data without reading it. The skip() method accomplishes this task.

17. The InputStream class also has three methods that allow programs to back up and reread data they’ve already read:

public void mark(int readAheadLimit)
public void reset() throws IOException
public boolean markSupported()

18. In order to reread data, mark the current position in the stream with the mark() method. At a later point, you can reset the stream to the marked position using the reset() method. Subsequent reads then return data starting from the marked position. The number of bytes you can read from the mark and still reset is determined by the readAheadLimit argument to mark(). If you try to reset too far back, an IOException is thrown. Marking a second location erases the first mark.

19. Before trying to use marking and resetting, check whether the markSupported() method returns true. If it does, the stream supports marking and resetting. Otherwise, mark() will do nothing and reset() will throw an IOException. The only input stream classes in java.io that always support marking are BufferedInputStream and ByteArrayInputStream.

20. Java provides a number of filter classes you can attach to raw streams to translate the raw bytes to and from different formats.

21. The filters come in two versions: the filter streams, and the readers and writers. The filter streams still work primarily with raw data as bytes. The readers and writers handle the special case of text in a variety of encodings such as UTF-8 and ISO 8859-1.

22. Filters are organized in a chain, as shown below. Each link in the chain receives data from the previous filter or stream and passes the data along to the next link in the chain:

23. The BufferedOutputStream class stores written data in a buffer (a protected byte array field named buf) until the buffer is full or the stream is flushed. Then it writes the data onto the underlying output stream all at once. A single write of many bytes is almost always much faster than many small writes that add up to the same thing. This is especially true of network connections because each TCP segment or UDP packet carries a finite amount of overhead, generally about 40 bytes’ worth.

24. The BufferedInputStream class also has a protected byte array named buf that serves as a buffer. When one of the stream’s read() methods is called, it first tries to get the requested data from the buffer. Only when the buffer runs out of data does the stream read from the underlying source. At this point, it reads as much data as it can from the source into the buffer, whether it needs all the data immediately or not.

25. The ideal size for a buffer depends on what sort of stream you’re buffering. For network connections, you want something a little larger than the typical packet size. However, this can be hard to predict and varies depending on local network connections and protocols. Faster, higher-bandwidth networks tend to use larger packets, although TCP segments are often no larger than a kilobyte.

26. The two multibyte read() methods of BufferedInputStream attempt to completely fill the specified array or subarray of data by reading from the underlying input stream as many times as necessary. They return only when the array or subarray has been completely filled, the end of stream is reached, or the underlying stream would block on further reads. Most input streams do not behave like this. They read from the underlying stream or data source only once before returning.

27. The PrintStream class is the first filter output stream most programmers encounter because System.out is a PrintStream. By default, print streams should be explicitly flushed. However, if the autoFlush argument (in the constructor) is true, the stream will be flushed every time a byte array or linefeed is written or a println() method is invoked.

28. The println() methods also append a platform-dependent line separator to the end of the line they write. This is a linefeed (\n) on Unix (including Mac OS X), a carriage return (\r) on Mac OS 9, and a carriage return/linefeed pair (\r\n) on Windows.

29. PrintStream is evil and network programmers should avoid it like the plague.

30. Most network protocols such as HTTP and Gnutella specify that lines should be terminated with a carriage return/linefeed pair.

31. PrintStream assumes the default encoding of the platform on which it’s running. However, this encoding may not be what the server or client expects. PrintStream doesn’t provide any mechanism for changing the default encoding. This problem can be patched over by using the related PrintWriter class instead.

32. PrintStream catches any exceptions thrown by the underlying output stream. Instead, PrintStream relies on an outdated and inadequate error flag. If the underlying stream throws an exception, this internal error flag is set. The programmer is relied upon to check the value of the flag using the checkError() method. Once an error has occurred, there is no way to unset the flag so further errors can be detected. Nor is any additional information available about the error.

33. The DataInputStream and DataOutputStream classes provide methods for reading and writing Java’s primitive data types and strings in a binary format. The binary formats used are primarily intended for exchanging data between two different Java programs through a network connection, a datafile, a pipe, or some other intermediary.

34. For DataOutputStream class, all data is written in big-endian format. The writeChars() method simply iterates through the String argument, writing each character in turn as a two-byte, big-endian Unicode character (a UTF-16 code point, to be absolutely precise). The writeBytes() method iterates through the String argument but writes only the least significant byte of each character. Thus, information will be lost for any string with characters from outside the Latin-1 character set. This method may be useful on some network protocols that specify the ASCII encoding, but it should be avoided most of the time. Neither writeChars() nor writeBytes() encodes the length of the string in the output stream. So there’s no exact complement for writeBytes() or writeChars() in DataInputStream; these are handled by reading the bytes and chars one at a time. The writeUTF() method does include the length of the string. It encodes the string itself in a variant of the UTF-8 encoding of Unicode. Because this variant is subtly incompatible with most non-Java software, it should be used only for exchanging data with other Java programs that use a DataInputStream to read strings.

35. DataInputStream provides two methods to read unsigned bytes and unsigned shorts and return the equivalent int:

public final int readUnsignedByte() throws IOException
public final int readUnsignedShort() throws IOException

It also has two readFully() methods that repeatedly read data from the underlying input stream into an array until the requested number of bytes have been read. If enough data cannot be read, then an IOException is thrown.

36. DataInputStream provides the popular readLine() method that reads a line of text as delimited by a line terminator and returns a string. It’s deprecated because it doesn’t properly convert non-ASCII characters to bytes in most circumstances. That task is now handled by the readLine() method of the BufferedReader class. However, that method and this one share the same insidious bug: they do not always recognize a single carriage return as ending a line when it’s the last character of a stream. This problem isn’t obvious when reading files because there will almost certainly be a next character: –1 for end of stream, if nothing else. However, on persistent network connections such as those used for FTP and late-model HTTP, a server or client may simply stop sending data after the last character and wait for a response without actually closing the connection. If you’re lucky, the connection may eventually time out on one end or the other and you’ll get an IOException, although this will probably take at least a couple of minutes, and cause you to lose the last line of data from the stream. If you’re not lucky, the program will hang indefinitely.

37. Java provides an almost complete mirror of the input and output stream class hierarchy designed for working with characters instead of bytes.

38. Java’s native character set is the UTF-16 encoding of Unicode. The java.io.Reader class specifies the API by which characters are read. The java.io.Writer class specifies the API by which characters are written. Wherever input and output streams use bytes, readers and writers use Unicode characters. Concrete subclasses of Reader and Writer allow particular sources to be read and targets to be written.

39. An InputStreamReader contains an underlying input stream from which it reads raw bytes. It translates these bytes into Unicode characters according to a specified encoding. An OutputStreamWriter receives Unicode characters from a running program. It then translates those characters into bytes using a specified encoding and writes the bytes onto an underlying output stream.

40. An OutputStreamWriter receives characters from a Java program. It converts these into bytes according to a specified encoding and writes them onto an underlying output stream. Its constructor specifies the output stream to write to and the encoding to use.

41. An InputStreamReader reads bytes from an underlying input stream such as a FileInputStream or TelnetInputStream. It converts these into characters according to a specified encoding and returns them. The constructor specifies the input stream to read from and the encoding to use.

42. When a program reads from a BufferedReader, text is taken from the buffer rather than directly from the underlying input stream or other text source. When the buffer empties, it is filled again with as much text as possible, even if not all of it is immediately needed, making future reads much faster. When a program writes to a BufferedWriter, the text is placed in the buffer. The text is moved to the underlying output stream or other target only when the buffer fills up or when the writer is explicitly flushed, which can make writes much faster than would otherwise be the case.

43. The BufferedReader class also has a readLine() method that reads a single line of text and returns it as a string. This method replaces the deprecated readLine() method in DataInputStream, and it has mostly the same behavior as that method. The big difference is that by chaining a BufferedReader to an InputStreamReader, you can correctly read lines in character sets other than the default encoding for the platform.

44. The BufferedWriter class adds one new method not included in its superclass, called newLine(),This method inserts a platform-dependent line-separator string into the output. The line.separator system property determines exactly what the string is. Because network protocols generally specify the required line terminator, you should not use this method for network programming.

45. The PrintWriter class is a replacement for Java 1.0’s PrintStream class that properly handles multibyte character sets and international text. PrintWriter still has the problems of platform dependency and minimal error reporting that plague PrintStream.