The article presents the classification of collections in Java.
While writing programs we frequently face a necessity of storing collections of some objects. Those may be numbers, strings, user class objects etc. This article will try to classify and describe the main Java collection classes in a plain language.
Some may argue “Why do we need some other collections when we do have arrays?”. Many people indeed use collections when it is unnecessary. But there are situations when we need, for example, to dynamically change the size of the data structure in use or to automatically order its elements upon their addition.
This article deals explicitly with Java Collections Framework. There exist numerous alternatives exist to it though:
- 1. Guava (Google Collections Library) - Library adds several useful data structures realizations such as multiset, multimap and bidirectional map. It also has effectiveness improvements over the standard ones.
- 2. Trove library - Collections realization that allows primitives storage (Java Collections Framework doesn’t allow to store them, only wrappers) allowing for efficiency improvements.
- 3. PCJ (Primitive Collections for Java) - as Trove is designed for work with primitive types improving efficiency.
- 4. Finally, you may write your own collection implementation (linked list, for example). But such approach is strictly unrecommended :)
So, as we can see, there are quite some options to choose from. But first things first we have to understand the basic Java collections that are most common in use. By the way, some third-party libraries implement Java Collections Framework interfaces (for example, Guava ). Therefore, basic collections classes hierarchy knowledge will allow you to master third-party libraries faster.
There exists two basic interfaces in Java collections library whose implementation constitute all the collection classes:
- 1. Collection - collection contains set of objects (elements). It contains definitions of the basic methods for data manipulation, such as insertion (add, addAll), removing (remove, removeAll, clear), search (contains) etc.
- 2. Map - describes a "key — value" collection. Each key is associated with one and only one value, corresponding to a mathematical mapping. This collection is also frequently referred to as a dictionary or an associative array. It has nothing in common with Collection interface and is fully independent of it.
Even though the framework is called Java Collections Framework, the map interface and its realizations do belong to it as well! Collection and Map interfaces are basic but they are not the only ones. They are extended by other interfaces adding additional functionality. We will talk about those later.
Let’s consider main interfaces that belong to Collection:
As you can see from the diagram, Collection is not a base interface (what a surprise :D).
Collection interface extends Iterable interface which has the only method iterator(). It means that any collection which inherits from Iterable has to return an iterator.
Iterator (wiki) is an object representing an abstraction for uniform access to elements of a collection. Iterator is a pattern which provides a way to access the elements of an aggregate object sequentially without exposing its underlying representation.
Let’s move on. As you can see from the picture, Collection interface is extended by List, Set and Queue interfaces. Let’s consider the purpose of each of them.
- 1. List is an ordered collection of objects which allows duplicate elements. It is also known as a sequence. Elements of this collection are numbered from zero and can be accessed by index.
- 2. Set describes an unordered collection that cannot contain duplicate elements. It models the mathematical set abstraction.
- 3. Queue is a collection for holding elements prior to processing. Besides basic Collection operations, queues provide additional insertion, removal, and inspection operations.
List interface implementations
Let’s look at the following classes hierarchy.
Interfaces are depicted in red, abstract classes - in green, completed implementations - in blue. It’s not the complete hierarchy, but only its main part.
As you can see in the picture, there are several abstract classes between interface and concrete implementations of the collection. It is made in order to separate out a common functionality in an abstract class for code re-usage.
ArrayList is perhaps the most commonly used collection. ArrayList encapsulates a simple array whose length is automatically increased when the new elements are inserted.
Since ArrayList uses array, the access time of an element by index is minimal (unlike in LinkedList). When the arbitrary element is deleted from the list, all elements that are located to the right of it are shifted one cell left, while the real size of an array (its capacity) remains unchanged. When a new element is inserted and the array is filled, the new array of the size (n*3)/2+1 will be created containing all the elements from the previous array as well as the newly inserted element.
LinkedList is a doubly-linked list. It is a data structure consisting from nodes, each of which contains the data itself as well as two links to the next and previous nodes of the list. Access to the arbitrary element is performed in linear time (but the access to the first and last elements of the list is always performed in constant time — list always stores links to its first and last elements, that is why insertion of the element in the end of the list does not mean that all the list should be iterated to find the last element). In general, ArrayList outperforms LinkedList in absolute terms by memory consumption and speed of operation.
Set interface implementations
Let’s consider and try to grasp the following diagram. :)
HashSet is a collection which does not allow storing the same objects (like any Set). HashSet encapsulates the HashMap object (that is, using the hash-table for storing elements).
As majority of readers do probably know, hash table stores information using the so-called hashing mechanism, in which the content of the key is used to determine a unique value called a hash. This hash is then used as an index, which is associated with the data available by this key. The transformation of the key into hash is done automatically — you will never see the hash value itself. The benefits of hashing is that it allows the following methods to be executed in a constant time: add(), contains(), remove() and size() even for big sets.
If you want to use HashSet for storing user-defined objects, you SHOULD override methods hashCode() and equals(), otherwise two logically equal objects will be considered different, since hashCode() method of the Object class will be called when element is inserted into a collection (which is likely to return different hash for your objects).
It is important to note that HashSet class doesn’t guarantee ordering of elements, since the hashing process itself typically doesn’t produce sorted sets. If you need sorted sets, the best choice could be a different type of collections, such as, for example, a TreeSet class. LinkedHashSet keeps a linked list of elements in their insertion order. It allows organizing an ordered insertion operation. That is, elements of the LinkedHashSet will be returned in the order of their insertion when iterating through it.
TreeSet is a collection which stores elements in the form of a tree ordered by values. TreeSet encapsulates TreeMap, which uses balanced binary Red-Black-Tree for storing elements. The advantage of TreeSet is guaranteed time of log(n) for add, remove and contains operations.
Queue interface implementations
The very simplified hierarchy is depicted below.
PriorityQueue is the only direct implementation of the Queue interface (not considering a LinkedList, which is more of a list than of a queue).
This queue arranges elements either by their natural order (using Comparable interface) or using Comparator interface obtained in constructor.
Map interface implementations
Map interface matches unique keys with associated values. The key is an object which is used for the following data retrieval. You can put values inside a map object by specifying the key and the value. Once this value has been saved, you can get it by its key. Map is a generalized interface defined as shown below.
interface Мар<К, V>
K is a key type while V is a type of stored values.
The class hierarchy is very similar to the Set one:
- HashMap — is based on hash tables and implements a Map interface (which implies storing data in the form of key-value pairs). Keys and values could be of different types, including null. This implementation doesn’t guarantee the order of elements being preserved with time.
- LinkedHashMap - extends the HashMap class and creates the linked list of elements inside a map keeping them in order of their insertion. It allows iterating through the map in the elements’ insertion order. That is, when iterating through a LinkedHashMap class object, elements will be returned in the order of their insertion. You can also create an object of the LinkedHashMap class which will return elements in the order of the last access.
- TreeMap extends an AbstractMap class and implements a NavigableMap interface. It creates a collection which uses tree for elements storing. Objects are stored in ascending order. Access and retrieval time is small enough making the TreeMap class a brilliant choice for storing large amounts of sorted information that must be quickly found.
- WeakHashMap is a collection which uses weak references for keys (rather than values). Weak reference is a specific type of reference to the dynamically created objects in the systems supporting garbage collection. It differs from the usual reference in that it is not considered by a garbage collector when detecting objects for removal. References, that are not considered weak, are sometimes called “strong” https://en.wikipedia.org/wiki/Weak_reference
The following collections are obsolete and their use is discouraged, but not prohibited.
- 1. Enumeration — an analogue of the Iterator interface.
- 2. Vector — analogue of ArrayList class; supports an ordered list of elements that are stored in an “internal” array.
- 3. Stack — class derived from Vector adding a push and pop methods, so that the list could be interpreted as stack.
- 4. Dictionary — analogue of a Map interface, although is an abstract class rather than interface.
- 5. Hashtable — HashMap analogue.
All of Hashtable, Stack and Vector methods are synchronized, which makes them less efficient for usage in single-threaded applications.
Synchronized collection objects could be obtained by using static methods synchronizedMap and synchronizedList of a Collections class.
Map m = Collections.synchronizedMap(new HashMap());
List l = Collections.synchronizedList(new ArrayList());
Synchronized wrappers of synchronizedMap and synchronizedList collections are sometimes called conditionally thread-safe - all operations individually are thread-safe, but the sequences of operations, where the control thread depends on the results of the previous operations may be the cause of competition for data.
Relative thread safety provided by synchronizedList and synchronizedMap is a hidden threat - developers believe that if these collections are synchronized, then they are fully thread safe, and neglect the proper synchronization of composite operations. As a result, although these programs work under the moderately light load, they may start throwing NullPointerException or ConcurrentModificationException under the heavy one.
In addition there is always a possibility of a "classical" synchronization via synchronized block.
Putting it all together
Let’s see the final class diagram:
As you can see, the class diagram is quite massive. But such an architecture is considered standard in OOP.