next up previous
Next: Proposed Method Up: Indexing Spatial Data using Previous: Introduction

Background

 Indexing methods for spatial data have been widely investigated during last two decades. The point quadtree, invented by Finkel and Bentley [5] is one of the early attempts in this regard. It is an amalgamation of the grid method and the binary search tree that results in a tree-like directory with non-uniform sized cells containing one element apiece. This is not well suited method for disk based implementation, as it does not consider the possibility of storing a large number of objects in a block. The k-d tree invented by Bentley [2] is an improvement over the point quadtree. It reduces the branching factor and storage requirements. In principle, a k-d tree is a binary search tree with the distinction that at each depth a different attribute value (or key) is tested in determining the direction in which a branch is to be made. In the two dimensional case, it compares x coordinate values at the root and at even depths, and y coordinate values at odd depths. Robinson introduced the k-d-b tree [11], which uses an adaptive k-d tree partition to dictate the contents of each B-tree node. The nodes of the k-d-b tree correspond to disjoint regions. This is a reasonable candidate as a point access method for spatial databases. However, the main drawback of this mechanism is that B-tree performance guarantees are no longer valid. Robinson reports a 60% storage utilization for uniformly distributed data. It suffers from the dimensionality curse, as its predecessors.

The most promising higher dimensional indexing techniques are R-tree [6] and its variants such as R+-tree [13], R*-tree [1] and Hilbert R-tree [7]. R-trees are a class of multi-dimensional data structures. They have many graceful properties which make them more attractive than the above mentioned access methods. First, the R-tree is a dynamic B-tree like structure which has the self-balancing property, i.e all the leaf nodes appear at the same level. Second, it achieves an average of about 50% node utilization. Third, it is a spatial access method, so that, spatial objects can be represented in their original data space, without transforming them into single points in a higher dimension. Objects are represented by the Minimum Bounding Rectangle in which they are contained. Minimum bounding rectangles are denoted by their minimum and maximum coordinates. Often the nodes correspond to disk pages, and thus the parameters defining the tree are chosen so that only a few nodes are visited during a spatial query. Minimum bounding rectangles are allowed to overlap, resulting in a spatial query may often require several nodes to be visited before ascertaining the presence or absence of a particular rectangle. This leads to an inefficiency in querying. We may use an R-tree for indexing higher dimensional feature vectors, as a point in n-dimension can be considered as a degenerated n-dimensional hyper-rectangle. However, as dimensionality increases, minimum bounding rectangles in non-leaf nodes increase in size, resulting in a decreased fan-out. So, R-tree access method is prohibitively slow for dimensions higher than 5.

An alternative to the R-tree in dealing with rectangles is the R+-tree [13]. The motivation for the R+ tree is to avoid overlap among the bounding rectangles. In particular, all bounding rectangles (at levels other than the leaf) are non-overlapping. Thus, each rectangle is associated with all the bounding rectangles that it intersects. The result is that there may be several paths starting at the root to the same rectangle. This will lead to an increase in the height of the tree; however, retrieval time is speeded up. However, this structure suffers from dimensionality curse, in the same way as the R-tree.

Another alternative to the R-tree is the R*-tree [1]. It seems to perform far better than its predecessors. It has two basic improvements over the R-tree. First, it introduces a new node splitting policy, which results in a minimized overlap, and better storage utilization. Moreover, it attempts to reduce the dead space in bounding rectangles and their perimeter. Second, it forces some aging objects to be re-inserted, changing the shape of the tree dynamically. This will result in better search performance.

Some recent attempts in point access methods include TV-tree, X-tree, hB-tree, SS-tree and M-tree. TV-tree [8] is a specially devised point access method for higher dimensional feature vectors. It addresses the problem of ``dimensionality curse" by reducing some elements of the feature vector. And these additional features are utilized whenever the additional discriminatory power is necessary, possibly in lower levels of the tree. The feature vectors are allowed to contact or extend when they appear in different levels of the tree, allowing a high fan-out in higher levels of the tree and high discriminatory power in lower levels of the tree. X-tree [3] is another recent point access method which utilizes the concept of super-nodes and a overlap-minimized splitting policy. X-tree outperforms TV-tree and R*-tree for any dimension. hB-tree [9] is a variation of kd-tree. SS-tree [15] is another point access method. It uses a different distance metric (a weighted Euclidian Metric), and uses minimum bounding spheres. M-tree [4] is another higher dimensional feature vector indexing method. The main idea behind the M-tree is to combine the advantages of balanced and dynamic spatial access methods with the capabilities of static metric trees to index objects using features and distance functions. Leaf nodes of an M-tree contains indexed objects, whereas non-leaf nodes store so-called routing objects. A routing object is a database object to which a routing role is assigned by a specific routing algorithm.

It is clear that most of the recent work attempts to generalize their indexing methods to cope with the requirements of indexing higher dimensional feature vectors. Samet [12] gives an excellent survey on traditional spatial indexing methods.


next up previous
Next: Proposed Method Up: Indexing Spatial Data using Previous: Introduction
Santha Sumanasekara
11/12/1997