Chapter 17. Containers in Depth -- Thinking in Java

leonzhx

浏览: 768175 次
性别:
来自: 上海

最近访客更多访客>>

u012363178

justsimple

cdphantom

wang_xuewu

博主相关

博客

微博

相册

留言

关于我

文章分类

社区版块

存档分类

2014-05 ( 22)
2014-04 ( 47)
2014-03 ( 25)
更多存档...

博客分类：

Thinking in Java 4th Edition 读书笔记

ReferenceQueue BitSet Collections HashMap Reference

1) Java SE5 adds:

a. The Queue interface (which LinkedList has been modified to implement) and its implementations PriorityQueue and various flavors of BlockingQueue for use in threading.

    b. A ConcurrentMap interface and its implementation ConcurrentHashMap, also for use in threading.
    c. CopyOnWriteArrayList and CopyOnWriteArraySet, also for concurrency.
    d. EnumSet and EnumMap, special implementations of Set and Map for use with enums.
    e. Several utilities in the Collections class.

2) Just as with Arrays, there is a companion class called Collections containing static utility methods, including one called fill( ). Like the Arrays version, this fill( ) just duplicates a single object reference throughout the container. In addition, it only works for List objects, but the resulting list can be passed to a constructor or to an addAll( ) method. fill( ) can only replace elements that are already in the List and will not add new elements.

3) Collections.nCopies( ) creates a List filled with references to a single object.

4) Object.toString( ) produces the class name followed by the unsigned hexadecimal representation of the hash code of the object (generated by the hashCode( ) method).

5) Each java.util container has its own Abstract class that provides a partial implementation of that container, so all you must do is implement the necessary methods in order to produce the desired container.

6) You use a flyweight when the ordinary solution requires too many objects, or when producing normal objects takes up too much space. The Flyweight pattern externalizes part of the object so that, instead of everything in the object being contained within the object, some or all of the object is looked up in a more efficient external table (or produced through some other calculation that saves space).

7) In order to create a read-only Map, you inherit from AbstractMap and implement entrySet( ). In order to create a read-only Set, you inherit from AbstractSet and implement iterator( ) and size( ). To create a read-only List from an AbstractList, you must implement get( ) and size( ).

8) The following table shows everything you can do with a Collection:

(There is an error in above table : contains() methods takes in an Object instead of generic type.)

9) The methods that perform various kinds of addition and removal are optional operations in the Collection interface. This means that the implementing class is not required to provide functioning definitions for these methods.

10) The UnsupportedOperationException must be a rare event. That is, for most classes, all operations should work, and only in special cases should an operation be unsupported. This is true in the Java containers library, since the classes you’ll use 99 percent of the time—ArrayList, LinkedList, HashSet, and HashMap, as well as the other concrete implementations—support all of the operations. The design does provide a "back door" if you want to create a new Collection without providing meaningful definitions for all the methods in the Collection interface, and yet still fit it into the existing library.

11) When an operation is unsupported, there should be reasonable likelihood that an UnsupportedOperationException will appear at implementation time, rather than after you’ve shipped the product to the customer. After all, it indicates a programming error: You’ve used an implementation incorrectly.

12) A common source of unsupported operations involves a container backed by a fixed-sized data structure. You get such a container when you turn an array into a List with the Arrays.asList( ) method. You can also choose to make any container (including a Map) throw UnsupportedOperationExceptions by using the "unmodifiable" methods in the Collections class. The "unmodifiable" methods in the Collections class wrap the container in a proxy that produces an UnsupportedOperationException if you perform any operation that modifies the container in any way. The goal of using these methods is to produce a "constant" container object.

13) The documentation for a method that takes a container as an argument should specify which of the optional methods must be implemented.

14) Different Set implementations not only have different behaviors, they have different requirements for the type of object that you can put into a particular Set:

15) For good programming style, you should always override hashCode( ) when you override equals( ).

16) Placing an object of the type, which doesn’t include a redefined hashCode( ) method, into any hashed implementations results in duplicate values, so the primary contract of the Set is violated. This is rather disturbing because there’s not even a runtime error. If you try to use a type that doesn’t implement Comparable in a TreeSet, you get a more definitive result: An exception is thrown when the TreeSet attempts to use the object as a Comparable.

17) The elements in a SortedSet are guaranteed to be in sorted order, which allows additional functionality to be provided with the following methods that are in the SortedSet interface:
Comparator comparator( ): Produces the Comparator used for this Set, or null for natural ordering.
Object first( ): Produces the lowest element.
Object last( ): Produces the highest element.
SortedSet subSet(fromElement, toElement): Produces a view of this Set with elements from fromElement, inclusive, to toElement, exclusive.
SortedSet headSet(toElement): Produces a view of this Set with elements less than toElement.
SortedSet tailSet(fromElement): Produces a view of this Set with elements greater than or equal to fromElement.

18) A deque (double-ended queue) is like a queue, but you can add and remove elements from either end. There are methods in LinkedList that support deque operations, and there is one explicit interface for a deque: Deque<E> introduced in the Java standard libraries from Java 6.

19) Performance is a fundamental issue for maps, and it’s very slow to use a linear search in get( ) when hunting for a key. This is where HashMap speeds things up. Instead of a slow search for the key, it uses a special value called a hash code. The hash code is a way to take some information in the object in question and turn it into a "relatively unique" int for that object. hashCode( ) is a method in the root class Object, so all Java objects can produce a hash code. A HashMap takes the hashCode( ) of the object and uses it to quickly hunt for the key. This results in a dramatic performance improvement.

20) Here are the basic Map implementations:

21) The keySet( ) method produces a Set backed by the keys in the Map. Because of improved printing support in Java SE5, you can simply print the result of the values( ) method, which produces a Collection containing all the values in the Map. (Note that keys must be unique, but values may contain duplicates.) Since these Collections are backed by the Map, any changes in a Collection will be reflected in the associated Map.

22) If you have a SortedMap (of which TreeMap is the only one available), the keys are guaranteed to be in sorted order, which allows additional functionality to be provided with these methods in the SortedMap interface:
Comparator comparator( ): Produces the comparator used for this Map, or null for natural ordering.
T firstKey( ): Produces the lowest key.
T lastKey( ): Produces the highest key.
SortedMap subMap(fromKey, toKey): Produces a view of this Map with keys from fromKey, inclusive, to toKey, exclusive.
SortedMap headMap(toKey): Produces a view of this Map with keys less than toKey.
SortedMap tailMap(fromKey): Produces a view of this Map with keys greater than or equal to fromKey.

23) The LinkedHashMap hashes everything for speed, but also produces the pairs in insertion order during a traversal. In addition, a LinkedHashMap can be configured in the constructor to use a least-recently- used (LRU) algorithm based on accesses, so elements that haven’t been accessed (and thus are candidates for removal) appear at the front of the list. This allows easy creation of programs that do periodic cleanup in order to save space.

24) Object’s hashCode( ) method by default just uses the address of its object. The default Object.equals( ) simply compares object addresses.

25) A proper equals( ) must satisfy the following five conditions:
    a. Reflexive: For any x, x.equals(x) should return true. (except x is null)
    b. Symmetric: For any x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
    c. Transitive: For any x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
    d. Consistent: For any x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the object is modified.
    e. For any non-null x, x.equals(null) should return false.

26) The instanceof actually quietly does a second sanity check to see if the object is null, since instanceof produces false if the left-hand argument is null.

27) Notice that the type of key is Object in get( ), rather than the parameterized type K as you might expect. This is a result of the injection of generics into the Java language at such a late date—if generics had been an original feature in the language, get( ) could have specified the type of its parameter.

28) Return a copy of the keys and values in entrySet() is not a correct implementation of Map. A correct implementation of entrySet( ) will provide a view into the Map, rather than a copy, and this view will allow modification of the original map (which a copy doesn’t).

29) The process of looking up a value starts by computing the hash code and using it to index into the array. If you could guarantee that there were no collisions (which is possible if you have a fixed number of values), then you’d have a perfect hashing junction, but that’s a special case. In all other cases, collisions are handled by external chaining: The array doesn’t point directly to a value, but instead to a list of values. These values are searched in a linear fashion using the equals( ) method. Of course, this aspect of the search is much slower, but if the hash function is good, there will only be a few values in each slot. So instead of searching through the entire list, you quickly jump to a slot where you only have to compare a few entries to find the value. This is much faster, which is why the HashMap is so quick.

30) Because the "slots" in a hash table are often referred to as buckets, the array that represents the actual table is called buckets. To promote even distribution, the number of buckets is typically a prime number.

31) As it turns out, a prime number is not actually the ideal size for hash buckets, and recent hashed implementations in Java use a power-of-two size (after extensive testing). Division or remainder is the slowest operation on a modern processor. With a power-of-two hash table length, masking can be used instead of division. Since get( ) is by far the most common operation, the % is a large part of the cost, and the power-of-two approach eliminates this (but may also affect some hashCode( ) methods).

32) You don’t control the creation of the actual value that’s used to index into the array of buckets. That is dependent on the capacity of the particular HashMap object, and that capacity changes depending on how full the container is, and what the load factor is. Thus, the value produced by your hashCode( ) will be further processed in order to create the bucket index.

33) The most important factor in creating a hashCode( ) is that, regardless of when hashCode( ) is called, it produces the same value for a particular object every time it is called. If you end up with an object that produces one hashCode( ) value when it is put( ) into a HashMap and another during a get( ), you won’t be able to retrieve the objects. So if your hashCode( ) depends on mutable data in the object, the user must be made aware that changing the data will produce a different key because it generates a different hashCode( ).

34) String has the special characteristic that if a program has several String objects (literals) that contain identical character sequences, then those String objects all map to the same memory. So it makes sense that the hashCode( ) produced by two separate instances of the String "hello" should be identical. The hashCode( ) for String is clearly based on the contents of the String.

35) For a hashCode( ) to be effective, it must be fast and it must be meaningful; that is, it must generate a value based on the contents of the object. A good hashCode( ) should result in an even distribution of values. If the values tend to cluster, then the HashMap or HashSet will be more heavily loaded in some areas and will not be as fast as it can be with an evenly distributed hashing function.

36) A basic recipe for generating a decent hashCode( ):
a. Store some constant nonzero value, say 17, in an int variable called result.
b. For each significant field f in your object (that is, each field taken into account by the equals( ) method), calculate an int hash code c for the field:

c. Combine the hash code(s) computed above: result = 37 * result + c;
d. Return result.

e. Look at the resulting hashCode( ) and make sure that equal instances have equal hash codes.

37) ArrayList is backed by an array, and LinkedList is implemented in the usual way for a doubly linked list, as individual objects each containing data along with references to the previous and next elements in the list. Because of this, if you want to do many insertions and removals in the middle of a list, a LinkedList is the appropriate choice. If not, an ArrayList is typically faster.

38) A LinkedList treats the endpoints of the List specially—this improves the speed when using a LinkedList as a Queue.

39) For random-access get( ) and set( ) operations, a List backed by an array is slightly faster than an ArrayList, but the same operations are dramatically more expensive for a LinkedList because it is not designed for randomaccess operations.

40) The best approach is probably to choose an ArrayList as your default and to change to a LinkedList if you need its extra functionality or you discover performance problems due to many insertions and removals from the middle of the list. If you are working with a fixed-sized group of elements, either use a List backed by an array (as produced by Arrays.asList( )), or if necessary, an actual array.

41) It turns out that 0.0 is included in the output of Math.random( ). Or, in math lingo, it is [0,1).

42) The performance of HashSet is generally superior to TreeSet, but especially when adding elements and looking them up, which are the two most important operations. TreeSet exists because it maintains its elements in sorted order, so you use it only when you need a sorted Set. Because of the internal structure necessary to support sorting and because iteration is something you’re more likely to do, iteration is usually faster with a TreeSet than a HashSet. Note that LinkedHashSet is more expensive for insertions than HashSet; this is because of the extra cost of maintaining the linked list along with the hashed container.

43) Insertions for all the Map implementations except for IdentityHashMap get significantly slower as the size of the Map gets large. In general, however, lookup is much cheaper than insertion. Hashtable performance is roughly the same as HashMap. Since HashMap is intended to replace Hashtable, and thus uses the same underlying storage and lookup mechanism. when you’re using a Map, your first choice should be HashMap, and only if you need a constantly sorted Map will you need TreeMap.
LinkedHashMap tends to be slower than HashMap for insertions because it maintains the linked list (to preserve insertion order) in addition to the hashed data structure. Because of this list, iteration is faster.

44) It’s possible to hand-tune a HashMap to increase its performance for your particular application. So that you can understand performance issues when tuning a HashMap, some terminology is necessary:
Capacity: The number of buckets in the table.
Initial capacity: The number of buckets when the table is created. HashMap and HashSet have constructors that allow you to specify the initial capacity.
Size: The number of entries currently in the table.
Load factor: Size/Capacity. A load factor of 0 is an empty table, 0.5 is a half-full table, etc. A lightly loaded table will have few collisions and so is optimal for insertions and lookups (but will slow down the process of traversing with an iterator). HashMap and HashSet have constructors that allow you to specify the load factor, which means that when this load factor is reached, the container will automatically increase the capacity (the number of buckets) by roughly doubling it and will redistribute the existing objects into the new set of buckets (this is called rehashing). The default load factor used by HashMap is 0.75.

45) There are a number of standalone utilities for containers, expressed as static methods inside the java.util.Collections class.

46) The Java containers library uses a fail-fast mechanism that looks for any changes to the container other than the ones your process is personally responsible for. If it detects that someone else is modifying the container, it immediately produces a ConcurrentModificationException. The exception happens because something is placed in the container after the iterator is acquired from the container. The possibility that two parts of the program might modify the same container produces an uncertain state, so the exception notifies you that you should change your code—in this case, acquire the iterator after you have added all the elements to the container. The ConcurrentHashMap, CopyOnWriteArrayList, and CopyOnWriteArraySet use techniques that avoid ConcurrentModificationExceptions.

47) The java.lang.ref library contains a set of classes that allow greater flexibility in garbage collection. There are three classes inherited from the abstract class Reference: SoftReference, WeakReference, and PhantomReference. Each of these provides a different level of indirection for the garbage collector if the object in question is only reachable through one of these Reference objects.

48) You use Reference objects when you want to continue to hold on to a reference to that object—you want to reach that object—but you also want to allow the garbage collector to release that object. Thus, you have a way to use the object, but if memory exhaustion is imminent, you allow that object to be released.

49) In the order of SoftReference, WeakReference, and PhantomReference, each one is "weaker" than the last and corresponds to a different level of reachability. Soft references are for implementing memory-sensitive caches. Weak references are for implementing "canonicalizing mappings"—where instances of objects can be simultaneously used in multiple places in a program, to save storage—that do not prevent their keys (or values) from being reclaimed. Phantom references are for scheduling pre-mortem cleanup actions in a more flexible way than is possible with the Java finalization mechanism.

50) With SoftReferences and WeakReferences, you have a choice about whether to place them on a ReferenceQueue (the device used for premortem cleanup actions), but a PhantomReference can only be built on a ReferenceQueue.

51) The containers library has a special Map to hold weak references: the WeakHashMap. This class is designed to make the creation of canonicalized mappings easier. In such a mapping, you are saving storage by creating only one instance of a particular value. When the program needs that value, it looks up the existing object in the mapping and uses that (rather than creating one from scratch). Since this is a storage-saving technique, it’s very convenient that the WeakHashMap allows the garbage collector to automatically clean up the keys and values. You don’t have to do anything special to the keys and values you want to place in the WeakHashMap; these are automatically wrapped in WeakReferences by the map.

52) In the revised Java container library, Vector was adapted so that it could work as a Collection and a List. Instead of using a Vector with composition, Stack is inherited from Vector.

53) The Enumeration interface is smaller than Iterator, with only two methods, and it uses longer method names: boolean hasMoreElements( ) produces true if this enumeration contains more elements, and Object nextElement( ) returns the next element of this enumeration if there are any more (otherwise it throws an exception). You can produce an Enumeration for any Collection by using the Collections.enumeration( ) method.

54) A BitSet is used if you want to efficiently store a lot of on-off information. It’s efficient only from the standpoint of size; if you’re looking for efficient access, it is slightly slower than using a native array. A normal container expands as you add more elements, and the BitSet does this as well.

55) An EnumSet is usually a better choice than a BitSet if you have a fixed set of flags that you can name, because the EnumSet allows you to manipulate the names rather than numerical bit locations, and thus reduces errors. EnumSet also prevents you from accidentally adding new flag locations, which could cause some serious, difficult-to-find bugs. The only reasons you should use BitSet instead of EnumSet is if you don’t know how many flags you will need until run time, or if it is unreasonable to assign names to the flags, or you need one of the special operations in BitSet.