School of Information Systems

Query Processing and Optimization

Query processing is the process by which a declarative query is translated into low-level data manipulation operations. SQL [3] is the standard query language that is supported in current DBMSs. Query optimization refers to the process by which the best execution strategy for a given query is found from among a set of alternatives.

In centralized DBMSs, the process typically involves two steps: query decomposition and query optimization. Query decomposition takes an SQL query and translates it into one expressed in relational algebra.  In the process, the query is analyzed semantically so that incorrect queries are detected and rejected as easily as possible, and correct queries are simplified. For a given SQL query, there are more than one possible algebraic queries,  some of which are better than others, as defined in terms of expected execution performance. Finding the best among the alternatives

is the process of query optimization and requires consideration of the physical organization of the data (i.e., the availability of indexes and hash structures).

In distributed query processing/optimization (see Distributed Query Processing), the objective is to ensure that the user query, which is posed as if the database was centralized (i.e., logically integrated), executes correctly and efficiently over data that is distributed. Therefore, two more steps are involved between query decomposition and query optimization: data localization and global query optimization. Data localization takes the algebraic query that is obtained as a result of query decomposition and localizes the query s data using data fragmentation/distribution information. In this step, the fragments which are involved in the query are determined and the query is transformed into one that operates on fragments rather than global data units specified in the user query. Global query optimization step takes the localized query and determines the best method of executing operations that span multiple sites. These operations are the usual binary relational operators such as join, union, intersect, etc.

Distributed Transaction Management

A transaction is a unit of atomicity and consistency, and ensure that the database integrity is maintained in the face of concurrent accesses and system failures (see Transaction Management, Transaction Processing). DBMSs provide support for what is called ACID (Atomic, Consistent, Isolated, and Durable) transactions: a transaction is an atomic unit of execution, either all of its actions are completed or none of them are; a transaction is consistent in that it transforms a consistent database state to another consistent database state; a transaction is isolated from other running transactions by hiding its modifications until it is completed and not accessing the modifications that other transactions are concurrently making to the database; a transaction is durable so that, upon successful completion, the results will survive any system failure [4].

Supporting ACID transactions in a distributed setting involve, in addition to the concurrency control and reliability protocols described below, architectural issues having to do with the control and coordination of executing parts of transactions at different sites. In the model presented in Figure 3, this coordination is accomplished by the Global Execution Monitor at the site where the transaction is first initiated. The Global Execution Monitor submits parts of this transaction to the relevant Local Recovery Modules for actual processing and controls that the ACID properties are maintained.

Evaristus Didik