Adaptive Replication

The Internet and the World Wide Web are rapidly moving us toward a distributed, wholly interconnected information environment. In this environment an object will be accessed, i.e., read and written, from many geographically distributed locations world-wide. Performance and cost will be of crucial importance in such an environment, and replication is an approach that addresses both. For example, Internet news is a highly replicated, gigabyte-scale database service, and it responds to queries in seconds; in contrast, many nonreplicated servers are much slower. However, massive replication can become very expensive when a large number of database updates occur, since each update is usually propagated to many replicas. Caching is often used to address this tension between replication being sometimes advantageous and sometimes detrimental: it creates and deletes replicas at a client as need dictates. Caching is a particular case of our proposed concept of adaptive replication.

Adaptive replication means that the number of copies of an object, the locations of these copies, and which updates are propagated to each copy all change dynamically, depending on a given cost function. Caching is a particular case of adaptive replication in the sense that it is usually analyzed in a client/server architecture, under a given consistency model, in order to improve performance at a client. Adaptive replication is a more general concept that can be applied to richer architectures, to multiple consistency models, and to optimizing different cost functions. The precise interpretation of adaptive replication depends on the objective cost function: for one objective function it can improve performance in a distributed system, while for another it can reduce access cost in a client-server system.
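The cost-function-driven view above can be made concrete with a small sketch. Assume (purely for illustration; this is not the project's actual cost model) a read-one-write-all scheme in which a read costs one unit unless the reading site holds a replica, and each write is propagated at unit cost to every replica other than the writer's own site. Given per-site read and write rates, the cheapest replica set can then be found by exhaustive search:

```python
from itertools import combinations

def cost(replicas, reads, writes):
    """Total access cost under an assumed read-one-write-all model:
    a read is free at a replica site and costs 1 remotely;
    a write is propagated at cost 1 to each replica other than the writer's site."""
    c = 0
    for site, r in reads.items():
        c += r * (0 if site in replicas else 1)
    for site, w in writes.items():
        c += w * len(replicas - {site})
    return c

def best_replica_set(reads, writes):
    """Brute-force the nonempty replica set minimizing total cost."""
    sites = set(reads) | set(writes)
    candidates = (set(s) for k in range(1, len(sites) + 1)
                  for s in combinations(sites, k))
    return min(candidates, key=lambda R: cost(R, reads, writes))

# Example: replicas gravitate toward read-heavy sites.
reads = {'A': 100, 'B': 5, 'C': 50}
writes = {'A': 10, 'B': 2, 'C': 1}
print(best_replica_set(reads, writes))
```

An adaptive algorithm approximates this optimum online, reshaping the replica set as the read-write pattern drifts, rather than solving the placement problem from scratch.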

The project has three main objectives. First, establish the fundamental principles of adaptive replication in distributed systems. Second, develop, implement, and analyze a system for adaptive replication of objects. Ideally, by parameterizing the Adaptive Replication (AR) system with various cost functions and with availability and consistency requirements, we will produce different adaptive replication algorithms. In practice, we expect that deriving an algorithm for a particular environment will require more than specifying parameters to the AR system, but we will strive to minimize the specialized code. The third objective is to build a simulation testbed in which the user defines the network, selects a replication algorithm, selects a cost model, and specifies an access pattern at each computer. The testbed will enable comparison among the various adaptive and static algorithms. The project is directed by Ouri Wolfson.
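A toy sketch of the kind of testbed the third objective describes: replay an access trace against a pluggable replication policy and accumulate cost. The names (`Simulator`, `static_full`, `adaptive`), the unit-cost model, and the threshold rule "replicate at a site once its observed reads exceed total observed writes" are all illustrative assumptions, not the AR system's actual design.

```python
from collections import defaultdict

class Simulator:
    """Toy testbed: replay an access trace against a replication policy,
    charging 1 per remote read and 1 per propagated write copy (assumed model)."""
    def __init__(self, sites, policy):
        self.sites = sites
        self.policy = policy  # callable(sites, stats) -> new replica set (or falsy to keep)
        self.stats = defaultdict(lambda: {'reads': 0, 'writes': 0})
        self.replicas = {sites[0]}  # start with a single copy at the first site
        self.cost = 0

    def access(self, site, kind):
        if kind == 'read':
            self.cost += 0 if site in self.replicas else 1
        else:  # write: propagate to every replica other than the writer's site
            self.cost += len(self.replicas - {site})
        self.stats[site][kind + 's'] += 1
        new = self.policy(self.sites, self.stats)
        if new:
            self.replicas = new

def static_full(sites, stats):
    """Static baseline: keep a replica at every site."""
    return set(sites)

def adaptive(sites, stats):
    """Illustrative adaptive rule: replicate at a site once its observed
    reads exceed the total observed writes in the system."""
    total_writes = sum(s['writes'] for s in stats.values())
    return {site for site in sites if stats[site]['reads'] > total_writes}

# Example: read-heavy site A, writer B, idle site C.
trace = [('A', 'read')] * 50 + [('B', 'write')] * 10 + [('A', 'read')] * 50
for policy in (static_full, adaptive):
    sim = Simulator(['A', 'B', 'C'], policy)
    for site, kind in trace:
        sim.access(site, kind)
    print(policy.__name__, sim.cost)
```

On this trace the adaptive policy avoids propagating writes to the idle site C, so it incurs roughly half the cost of static full replication.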

  • “An Adaptive Data Replication Algorithm”, ACM Transactions on Database Systems (TODS), Vol. 22(2), June 1997, pp. 255-314.
  • “Infrastructure and Cost Models for Digital Libraries”, ACM Computing Surveys (electronic version), Vol. 28A, Dec. 1996.
  • “Strategic Directions in Electronic Commerce and Digital Libraries: Towards a Digital Agora”, ACM Computing Surveys, Vol. 28(4), Dec. 1996, pp. 818-835.
  • “Towards a Theory of Cost Management for Digital Libraries”, ACM Transactions on Database Systems (TODS), Vol. 23(4), Dec. 1998, pp. 411-452.
  • “Semantic Multicast: Intelligently Sharing Collaborative Sessions”, ACM Computing Surveys, Vol. 31(2es), 1999.
  • “An Architecture for Consumer-Oriented Online Database Services”, Proceedings of RIDE-NDS’96, the 6th International Workshop on Research Issues in Data Engineering: Interoperability of Nontraditional Database Systems, New Orleans, LA, Feb. 1996.
  • “An Algorithm for Dynamic Data Allocation in Distributed Systems”, Information Processing Letters (IPL), Vol. 53, 1995, pp. 113-119.
  • “A Competitive Dynamic Data Replication Algorithm”, Proceedings of the 9th International Conference on Data Engineering, Vienna, Austria, April 1993, pp. 310-317.
  • “Competitive Analysis of Caching in Distributed Databases”, IEEE Transactions on Parallel and Distributed Systems, Vol. 9(4), Apr. 1998, pp. 391-409.
  • “Divergence Caching in Client-Server Architectures”, Proceedings of the 3rd International Conference on Parallel and Distributed Information Systems (PDIS), Austin, TX, Sept. 1994, pp. 131-139.
  • “An Algorithm for Dynamic Data Distribution”, Proceedings of the 2nd Workshop on the Management of Replicated Data (WMRD-II), Monterey, CA, Nov. 1992.
  • “Distributed Algorithms for Adaptive Replication of Data”, Proceedings of the 11th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS), San Diego, CA, June 1992.