Glossary


access plan  a tree that encodes decisions about file structures to access individual tables, the order of joining tables, and the algorithm to join tables. Access plans are generated by the optimization component to implement queries submitted by users.
accumulating fact table  a fact table that records the status of multiple events rather than one event. Each event occurrence column can be represented by a foreign key to the time table along with a time of day column if needed.
ACID properties  transaction properties supported by DBMSs. ACID is an acronym for atomic, consistent, isolated, and durable. Atomic means all or nothing. Consistent means that a database does not violate integrity constraints before or after a transaction commits. Isolated means that other transactions cannot see the updates made by a transaction until after the transaction terminates. Durable means that the effects of a transaction are permanent after committing even if a failure occurs.
actions on referenced rows  possible actions in response to the deletion of a referenced row or the update of the primary key of a referenced row. The possible actions are restrict (do not permit the action on the referenced row), cascade (perform the same action on the related rows), nullify (set the foreign key of related rows to null), and default (set the foreign key of related rows to a default value). See also referenced rows.
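The cascade action can be sketched with SQLite, which supports referenced-row actions in foreign key clauses (table and column names here are invented for illustration; note that SQLite enforces foreign keys only when the pragma is enabled):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE orders (
    order_no INTEGER PRIMARY KEY,
    cust_no INTEGER REFERENCES customer(cust_no) ON DELETE CASCADE)""")
conn.execute("INSERT INTO customer VALUES (1, 'Ann')")
conn.execute("INSERT INTO orders VALUES (100, 1)")

# Deleting the referenced customer row cascades to the related order rows.
conn.execute("DELETE FROM customer WHERE cust_no = 1")
remaining = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(remaining)  # 0
```

With ON DELETE RESTRICT instead, the same DELETE statement would be rejected while related order rows exist.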
aggregate function  a summary or statistical function. The standard aggregate functions in SQL are MIN, MAX, COUNT, SUM, and AVG.
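The five standard aggregate functions can be demonstrated in one query (a minimal sketch using SQLite; the table and data are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollment (student TEXT, grade REAL)")
conn.executemany("INSERT INTO enrollment VALUES (?, ?)",
                 [("a", 3.0), ("b", 4.0), ("c", 2.0)])
# Each aggregate summarizes the grade column across all rows.
row = conn.execute(
    "SELECT MIN(grade), MAX(grade), COUNT(*), SUM(grade), AVG(grade) "
    "FROM enrollment").fetchone()
print(row)  # (2.0, 4.0, 3, 9.0, 3.0)
```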
analyst/programmer  an information system professional who is responsible for collecting requirements, designing applications, and implementing information systems. An analyst/programmer may create and use external views to develop forms, reports, and other parts of an information system.
Analytic Hierarchy Process  a decision theory technique to evaluate problems with multiple objectives. The Analytic Hierarchy Process can be used to select and evaluate DBMSs by allowing a systematic assignment of weights to requirements and scores to features of candidate DBMSs.
ANSI  American National Standards Institute, one of the groups responsible for SQL standards.
application profile  a statistical summary of the forms, reports, and queries that access a database. Application profiles are an important input of the physical database design phase because they are used to predict demand for the database.
assertion  the most general kind of integrity constraint supported in SQL:2003. An assertion can involve a SELECT statement of arbitrary complexity. The CREATE ASSERTION statement defines assertions in SQL:2003.
associative entity type  a weak entity that depends on two or more entity types for its primary key. An associative entity type with more than two identifying relationships is known as an M-way associative entity type. See also M-way relationship, identifying relationship, and weak entity.
attribute  a property of an entity type or relationship. Each attribute has a data type defining allowable values and operations. Attribute is synonymous with field and column.
authorization rules  define authorized users, allowable operations, and accessible parts of a database. The database security system stores authorization rules and enforces them for each database access. Also known as security constraints.
benchmark  a workload to evaluate the performance of a system or product. A good benchmark should be relevant, portable, scalable, and understandable.
BETWEEN-AND operator  a shortcut operator to test a numeric or date column against a range of values. The BETWEEN-AND operator returns true if the column is greater than or equal to the first value and less than or equal to the second value.
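The equivalence between BETWEEN-AND and the long form with two comparisons can be checked directly (a sketch with invented table data; both endpoints are included):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE offering (off_no INTEGER, year INTEGER)")
conn.executemany("INSERT INTO offering VALUES (?, ?)",
                 [(1, 2004), (2, 2005), (3, 2006), (4, 2007)])
between = conn.execute(
    "SELECT off_no FROM offering WHERE year BETWEEN 2005 AND 2006 "
    "ORDER BY off_no").fetchall()
# Equivalent long form: inclusive on both endpoints.
longform = conn.execute(
    "SELECT off_no FROM offering WHERE year >= 2005 AND year <= 2006 "
    "ORDER BY off_no").fetchall()
print(between == longform, [r[0] for r in between])  # True [2, 3]
```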
binary large object (BLOB)  a data type for fields containing large binary data such as images. BLOB data can be retrieved but not displayed. The BLOB data type provides a simple way to extend a DBMS with object features. See also the Large Object Architecture.
binding  in query optimization, binding refers to associating an access plan with an SQL statement. In object-oriented computing, binding refers to associating a method name with its implementation. Binding can be static (decided at compile-time) or dynamic (decided at run-time). Static binding is more efficient but sometimes less flexible than dynamic binding. See also access plan and message.
bitmap index  a secondary file structure consisting of a column value and a bitmap. A bitmap contains one bit position for each row of a referenced table. A bitmap column index references the rows containing the column value. A bitmap join index references the rows of a child table that join with rows of the parent table containing the column. Bitmap indices work well for stable columns with few values typical of tables in a data warehouse. See also star join.
bottom-up data warehouse architecture  an architecture for a data warehouse in which data marts are built for user departments. If a need for an enterprise data model emerges, the data marts will evolve to a data warehouse. See also two-tier data warehouse architecture and three-tier data warehouse architecture.
Boyce-Codd Normal Form (BCNF)  a table is in BCNF if every determinant is a candidate key. BCNF is a revised definition for 3NF.
Btree file  a popular file structure supported by most DBMSs because it provides good performance on both key search as well as sequential search. A Btree file is a balanced, multiway tree. The most popular variation of the Btree is the B+tree, in which all keys are redundantly stored in the leaf nodes. The B+tree provides improved performance on sequential and range searches. A Btree can be used as a primary or secondary file structure.
buffer  an area in main memory containing physical database records transferred from disk.
cache coherence  a problem of parallel database processing using shared disk architectures. Cache coherence involves synchronization among local memories and common disk storage. After a processor addresses a disk page, the image of this page remains in the cache associated with the given processor. An inconsistency occurs if another processor has changed the page in its own buffer. See also shared disk architecture and clustered disk architecture.
call-level interface (CLI)  a language style for integrating a programming language with a nonprocedural language such as SQL. A CLI includes a set of procedures and a set of type definitions for manipulating the results of SQL statements in computer programs. The most widely used CLIs, the Open Database Connectivity (ODBC) supported by Microsoft and the Java Database Connectivity (JDBC) supported by Oracle, are compatible with the SQL:2003 CLI.
candidate key  a minimal superkey. A superkey is minimal if removing any columns makes it no longer unique.
cardinality  a constraint on the number of entities participating in a relationship. In an entity relationship diagram, the minimum and the maximum number of entities are specified for both directions of a relationship.
CASE tool  a tool to facilitate database and information systems development. CASE tools support features for drawing, analysis, prototyping, and data dictionary. CASE is an acronym for computer-aided software engineering.
change data  data from a source system that provides the basis to update a data warehouse. Change data comprises new source data (insertions) and modifications to existing source data (updates and deletions). Further, change data can affect fact tables and/or dimension tables. See also cooperative change data, logged change data, snapshot change data, and queryable change data.
character large object (CLOB)  a data type for columns containing large text data such as documents and Web pages. The CLOB data type provides a simple way to extend a DBMS with object features. See also the Large Object Architecture.
checkpoint  the act of writing a checkpoint record to the log and force writing log and database buffers to disk. All transaction activity must cease while a checkpoint occurs. The checkpoint interval should be chosen to balance restart time with checkpoint overhead. A traditional checkpoint is known as a cache-consistent checkpoint. See also fuzzy checkpoint and incremental checkpoint.
class  a collection of objects. A class definition includes variables for object data and methods for object procedures.
client  a program that submits requests to a server such as accessing or updating a database.
client-server architecture  an arrangement of components (clients and servers) among computers connected by a network. The client-server architecture supports efficient processing of messages (requests for service) between clients and servers.
cluster  a tight coupling of two or more computers so that they behave as a single computer. Clusters provide additional flexibility for parallel database processing. See also clustered disk architecture and clustered nothing architecture.
clustered disk (CD) architecture  an architecture for parallel database processing in which the processors in each cluster share all disks, but nothing is shared across clusters.
clustered nothing (CN) architecture  an architecture for parallel database processing in which the processors in each cluster share no resources, but each cluster can be manipulated to work in parallel to perform a task.
clustering index  an index in which the order of the data records is close to the index order. A clustering index can be organized as a primary index or a secondary file structure. See also index selection, nonclustering index, primary file structure, and secondary file structure.
collision  a condition involving insertions to a hash file. A collision occurs when two or more records hash to the same location. For each record, a hash function transforms the key value into an address. See also hash file.
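Collisions and their resolution by chaining can be sketched in a toy in-memory hash structure (real hash files map keys to disk pages; the bucket count and names here are invented):

```python
NUM_BUCKETS = 8

def bucket_of(key):
    # Hash function: transforms the key value into a bucket address.
    return hash(key) % NUM_BUCKETS

buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(key, record):
    # A collision (two keys hashing to the same bucket) is resolved by
    # chaining records inside the bucket.
    buckets[bucket_of(key)].append((key, record))

def search(key):
    return next(rec for k, rec in buckets[bucket_of(key)] if k == key)

for i in range(50):   # 50 keys into 8 buckets guarantees collisions
    insert(i, f"record-{i}")
print(search(17))  # record-17
```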
column  a field or attribute in a table. Each column has a data type defining allowable values and operations. Column is synonymous with field and attribute.
combined primary key  a combination of columns (more than one) designated as the primary key. Also known as a composite primary key.
completeness constraint  a constraint about generalization hierarchies. A completeness constraint means that every entity in a supertype has a related entity in one of the subtypes. In other words, the union of the set of entities in the subtypes equals the set of entities in the supertype.
conceptual evaluation process  the sequence of operations and intermediate tables used to derive the result of a SELECT statement. The evaluation process is conceptual because most SQL compilers will take many shortcuts to produce the result. The conceptual evaluation process may help you gain an initial understanding of the SELECT statement as well as help you to understand more difficult problems.
conceptual schema  a data description that covers all entities and relationships in a database. The conceptual schema is concerned with the meaning of the database, not its physical implementation. See also Schema, Internal Schema, External View, and Three Schema Architecture.
concurrency transparency  a service provided by a DBMS so that users perceive a database as a single-user system even though there may be many simultaneous users. The concurrency control manager is the component of a DBMS responsible for concurrency transparency.
constellation schema  a data modeling representation for multidimensional databases. In a relational database, a constellation schema contains multiple fact tables in the center related to dimension tables. Typically the fact tables share some dimension tables. See also snowflake schema, star schema, fact table, and dimension table.
cooperative change data  data obtained from a source system for refreshing a data warehouse. Cooperative change data involves notification from a source system typically at transaction completion time using a trigger. See also logged change data, snapshot change data, and queryable change data.
cross product style  a way to formulate joins in a SELECT statement. The cross product style lists the tables in the FROM clause and the join conditions in the WHERE clause.
CUBE operator  an operator that augments the normal GROUP BY result with all combinations of subtotals. The SQL:2003 standard provides the CUBE operator as an extension of the GROUP BY clause to support multidimensional data. The CUBE operator is appropriate to summarize columns from multiple dimensions rather than columns representing different levels of a single dimension.
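SQLite does not support the CUBE operator, but its result can be simulated as the union of GROUP BY results for every subset of the grouping columns (a sketch with invented data; in a DBMS supporting SQL:2003 you would write GROUP BY CUBE(store, year)):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, year INTEGER, amt REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("S1", 2005, 10), ("S1", 2006, 20), ("S2", 2005, 30)])
# CUBE(store, year) = detail rows + store subtotals + year subtotals
#                     + grand total, with NULL marking the summarized column.
cube = conn.execute("""
    SELECT store, year, SUM(amt) FROM sales GROUP BY store, year
    UNION ALL SELECT store, NULL, SUM(amt) FROM sales GROUP BY store
    UNION ALL SELECT NULL, year, SUM(amt) FROM sales GROUP BY year
    UNION ALL SELECT NULL, NULL, SUM(amt) FROM sales""").fetchall()
print(len(cube))  # 3 detail + 2 store + 2 year + 1 grand total = 8 rows
```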
cursor  a construct in a database programming language that allows storage and iteration through a set of records returned by a SELECT statement. A cursor is similar to a dynamic array in which the array size is determined by the size of the query result. A database programming language provides statements or procedures to declare cursors, open and close cursors, position cursors, and retrieve values from cursors.
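The declare/open/position/retrieve/close pattern can be sketched with Python's DB-API cursor (table and data invented; execute opens the cursor, fetchone positions and retrieves):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, gpa REAL)")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [("Ann", 3.5), ("Bob", 2.9), ("Cal", 3.8)])
cur = conn.cursor()                   # declare the cursor
cur.execute("SELECT name, gpa FROM student WHERE gpa > 3.0 ORDER BY name")
honors = []
row = cur.fetchone()                  # position on the first result row
while row is not None:
    honors.append(row[0])             # retrieve values from the current row
    row = cur.fetchone()              # advance the cursor
cur.close()                           # close the cursor
print(honors)  # ['Ann', 'Cal']
```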
data access middleware  provides a uniform interface to relational and nonrelational data using SQL. Requests to access data from a DBMS are sent to a data access driver rather than directly to the DBMS. The data access driver converts the SQL statement into the SQL supported by the DBMS and then routes the request to the DBMS. The two leading data access middleware are the Open Database Connectivity (ODBC) supported by Microsoft and the Java Database Connectivity (JDBC) supported by Oracle.
data administrator (DA)  a management position that performs planning and policy setting for the information resources of an entire organization.
data cube  a multidimensional format in which cells contain numeric data called measures organized by subjects called dimensions. A data cube is sometimes known as a hypercube because conceptually it can have an unlimited number of dimensions.
data dictionary  a special database that describes individual databases and the database environment. The data dictionary contains data descriptors called metadata that define the source, the use, the value, and the meaning of data. See also metadata.
data independence  a database should have an identity separate from the applications (computer programs, forms, and reports) that use it. The separate identity allows the database definition to be changed without affecting related applications.
data mart  a subset or view of a data warehouse, typically at a department or functional level, that contains all the data required for decision support tasks of that department. In addition, a data mart insulates departmental users from data used by other departments. In some organizations, a data mart is a small data warehouse rather than a view of a larger data warehouse.
data mining  the process of discovering implicit patterns in data stored in a data warehouse and using those patterns for business advantage such as predicting future trends.
data model  a graphical model depicting the structure of a database. A data model contains entity types and relationships among entity types. See also environment interaction model and process model.
data type  defines a set of values and permissible operations on the values. Each column of a table is associated with a data type.
data warehouse  a central repository for summarized and integrated data from operational databases and external data sources.
data warehouse maturity model  a framework that provides guidance about investment decisions in data warehouse technology. The model consists of six stages (prenatal, infant, child, teenager, adult, and sage) in which business value increases as organizations progress to higher stages.
database  a collection of persistent data that can be shared and interrelated.
database administrator (DBA)  a support position that specializes in managing individual databases and DBMSs.
database link  a key concept for Oracle distributed databases. A database link provides a one-way connection from a local database to a remote database. Database links allow a user to access another user's objects in a remote database without having an account on the remote site. When using a database link, a remote user is limited by the privilege set of the object owner.
database management system (DBMS)  a collection of components that support data acquisition, dissemination, maintenance, retrieval, and formatting. An enterprise DBMS supports databases that are critical to an organization. A desktop DBMS supports databases for small workgroups and small businesses. An embedded DBMS is part of a larger system such as a device or application. Embedded DBMSs provide limited features but have low memory, processing, and storage requirements.
Database Partitioning Feature (DPF)  an IBM technology for parallel database processing. The DPF option of IBM's DB2 Enterprise Server Edition uses the clustered nothing architecture.
database programming language  a procedural language with an interface to one or more DBMSs. The interface allows a program to combine procedural statements with nonprocedural database access. See also call-level interface and statement-level interface.
database security  protecting databases from unauthorized access and malicious destruction.
datasheet  a way to display a table in which the column names appear in the first row and the body in the other rows. Datasheet is a Microsoft Access term.
deadlock  a problem of mutual waiting that can occur when using locks. If a deadlock is not resolved, the involved transactions will wait forever.
deferred constraint checking  enforcing integrity constraints at the end of a transaction rather than immediately after each manipulation statement. Complex constraints may benefit from deferred checking.
deferred update approach  an approach used by a recovery manager to record database changes on disk. In this approach, database updates are written only after a transaction commits. To restore a database, only redo operations are used.
denormalization  combining tables so that they are easier to query. Denormalization is the opposite of normalization. Denormalization can be useful to improve query performance or to ignore a dependency that does not cause significant storage anomalies.
desktop DBMS  supports databases used by work teams and small businesses. Desktop DBMSs are designed to run on personal computers and small servers. See also enterprise DBMS and embedded DBMS.
detail line  the innermost (most nested) line on a hierarchical report.
determinant  the column(s) appearing on the left-hand side of a functional dependency. Alternatively known as a LHS for left-hand side.
dice  a data cube operator in which a dimension is replaced by a subset of its values. See also slice.
difference  an operator of relational algebra that combines rows from two tables. The difference operator extracts rows that belong to the first table only. Both tables must be union compatible to use the difference operator.
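In SQL the difference operator is expressed with EXCEPT on two union-compatible queries (a sketch with invented tables):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE faculty (name TEXT)")
conn.execute("CREATE TABLE student (name TEXT)")
conn.executemany("INSERT INTO faculty VALUES (?)", [("Ann",), ("Bob",)])
conn.executemany("INSERT INTO student VALUES (?)", [("Bob",), ("Cal",)])
# Rows in faculty only: Bob appears in both tables, so only Ann remains.
diff = conn.execute(
    "SELECT name FROM faculty EXCEPT SELECT name FROM student").fetchall()
print(diff)  # [('Ann',)]
```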
dimension  a label or subject that organizes numeric data in a data cube. A dimension contains values known as members such as a location dimension having members for countries. Dimensions may be organized in hierarchies composed of levels to support the data cube operations of drill-down and roll-up. A dimension hierarchy may be ragged showing relationships among members of the same dimension level.
dimension table  a table in a star schema or snowflake schema that stores dimensions or subjects used to aggregate facts.
discretionary access control  a security approach in which users are assigned access rights or privileges to specified parts of a database. Discretionary access control is the most common kind of security control supported by commercial DBMSs.
disjointness constraint  a constraint about generalization hierarchies. A disjointness constraint means that the subtypes do not share any common entities. In other words, the intersection of the sets of entities in the subtypes is empty.
distributed DBMS  a collection of components that supports requests for data residing on multiple sites. A distributed DBMS finds remote data, optimizes global requests, and coordinates transactions at multiple sites. Also known as a distributed database management system (DDBMS).
distributed processing  allows geographically dispersed computers to cooperate when providing data access and other services. See also client-server architecture.
divide  an operator of relational algebra that combines rows from two tables. The divide operator produces a table in which the values of a column from one input table are associated with all the values from a column of the second table.
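A common SQL formulation of division compares group counts, as in this sketch answering "which students are enrolled in all offered courses?" (tables and names invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrolls (student TEXT, course TEXT)")
conn.execute("CREATE TABLE course (course TEXT)")
conn.executemany("INSERT INTO course VALUES (?)", [("DB",), ("OS",)])
conn.executemany("INSERT INTO enrolls VALUES (?, ?)",
                 [("Ann", "DB"), ("Ann", "OS"), ("Bob", "DB")])
# enrolls DIVIDE course: keep students whose distinct course count matches
# the total number of courses (Ann has both, Bob lacks OS).
div = conn.execute("""
    SELECT student FROM enrolls
    WHERE course IN (SELECT course FROM course)
    GROUP BY student
    HAVING COUNT(DISTINCT course) = (SELECT COUNT(*) FROM course)""").fetchall()
print(div)  # [('Ann',)]
```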
drill-down  a data cube operator that supports navigation from a more general level of a dimension to a more specific level of a dimension. See also roll-up.
embedded DBMS  resides in a larger system, either an application or a device such as a Personal Digital Assistant or smart card. Embedded DBMSs provide limited transaction processing features but have low memory, processing, and storage requirements. See also desktop DBMS and enterprise DBMS.
embedded SQL  using SQL inside a host programming language such as COBOL or Visual Basic. Embedded SQL adds statements, usable only inside a host program, that execute other SQL statements and process database results inside the program. See also Standalone SQL.
encapsulation  a principle of object-oriented computing in which an object can be accessed only through its interface. The internal details (variables and method implementations) cannot be accessed. Encapsulation supports lower software maintenance costs.
encryption  involves the encoding of data to obscure their meaning. An encryption algorithm changes the original data (known as the plaintext). To decipher the data, the user supplies an encryption key to restore the encrypted data (known as the ciphertext) to its original (plaintext) format.
enterprise data model (EDM)  a conceptual data model of an organization. An enterprise data model can be used for data planning (what databases to develop) and decision support (how to integrate and transform operational databases and external data sources).
enterprise DBMS  supports databases that are often critical to the functioning of an organization. Enterprise DBMSs usually run on powerful servers and have a high cost. See also desktop DBMS and embedded DBMS.
entity  a cluster of data usually about a single topic that can be accessed together. An entity can denote a person, place, event, or thing.
entity integrity  a constraint involving primary keys. No two rows of a table can contain the same value for the primary key. In addition, no row can contain a null value for any columns of a primary key.
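Both parts of entity integrity can be observed with a primary key declaration (a sketch using SQLite; note that SQLite permits null primary key values unless NOT NULL is also declared, so it is added here explicitly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (std_no TEXT PRIMARY KEY NOT NULL, name TEXT)")
conn.execute("INSERT INTO student VALUES ('S1', 'Ann')")
dup_rejected = null_rejected = False
try:
    conn.execute("INSERT INTO student VALUES ('S1', 'Bob')")  # duplicate key
except sqlite3.IntegrityError:
    dup_rejected = True
try:
    conn.execute("INSERT INTO student VALUES (NULL, 'Cal')")  # null key
except sqlite3.IntegrityError:
    null_rejected = True
print(dup_rejected, null_rejected)  # True True
```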
entity type  a collection of entities (persons, places, events, or things) of interest in an application, represented by a rectangle in an entity relationship diagram.
environment interaction model  a graphical model showing the relationships between events and processes. An event such as the passage of time or an action from the environment can trigger a process to start or stop. See also data model and process model.
equi-join  a join operator where the join condition involves equality. See also join and natural join.
ETL tools  software tools for extraction, transformation, and loading of change data from data sources to a data warehouse. ETL tools eliminate the need to write custom coding for many data warehouse maintenance tasks.
exact string matching  searching for one string value using the equality comparison operator. See also Inexact String Matching.
exclusive lock  a lock that prevents other users from accessing a database item. Exclusive locks conflict with all other kinds of locks (shared, other exclusive locks, and intent). An exclusive lock indicates that a user will change the value of a database item. Also known as an X lock.
existence dependency  an entity that cannot exist unless another related entity exists. A mandatory relationship produces an existence dependency. See also mandatory relationship.
explicit PL/SQL cursor  a cursor that can be used in procedures written in PL/SQL, the database programming language of Oracle. An explicit cursor is declared with the CURSOR statement in the DECLARE section. Explicit cursors are usually manipulated by the OPEN, CLOSE, and FETCH statements. Explicit cursors can be referenced anyplace inside the BEGIN section. See also cursor and implicit PL/SQL cursors.
expression  a combination of constants, column names, functions, and operators that produces a value. In conditions and result columns, expressions can be used in any place that column names appear.
extended cross product  an operator of relational algebra that combines two tables. The extended cross product (product for short) operator builds a table from two tables consisting of all possible combinations of rows, one from each of the two input tables.
external view  a description of derived data appropriate for a given user group. Also known as external schema and view. See also Schema and Three Schema Architecture.
fact table  a table in a star schema or snowflake schema that stores numeric values of relevance to a decision maker. See also star schema and snowflake schema.
file  a collection of data on a permanent storage device such as a hard disk. The data or physical records on the file are organized to support efficient processing. Files are part of the internal schema of a database.
first generation database technology  proprietary file structures and program interfaces that supported sequential and random searching. However, the user was required to write a computer program to access data. First generation database technology was largely developed during the 1960s.
force writing  the ability to control when data are transferred to nonvolatile storage. This ability is fundamental to recovery management. Force writing typically occurs at the end of a transaction and at checkpoint time.
foreign key  a column or combination of columns in which the values must match those of a candidate key. A foreign key must have the same data type as its associated candidate key. In the CREATE TABLE statement of SQL:2003, a foreign key must be associated with a primary key rather than merely a candidate key.
form  a document used in a business process, formatted to provide a convenient way to enter and edit data. A form is designed to support a business task such as processing an order, registering for classes, or making an airline reservation.
form entity type  in the form analysis process, the entity type derived from the primary key of the form. The form entity type should be placed in the center of the ERD.
form structure  a hierarchy depicting the relationship among form fields. A group of form fields is known as a node. Most forms have a simple structure with a parent node (main form) and a child node (subform).
forward engineering  the ability to generate definitions for a target database management system from an ERD and data dictionary properties. See also CASE tool and reverse engineering.
fourth generation database technology  extends the boundaries of database technology to unconventional data and the Internet. Fourth generation systems can store and manipulate unconventional data types such as images, videos, maps, sounds, and animations as well as provide Web access to databases. Fourth generation database technology was largely commercialized during the 1990s.
fourth normal form (4NF)  a table is in 4NF if it does not contain any nontrivial MVDs. A nontrivial MVD is an MVD that is not also an FD.
fragment  a subset of a table that is allocated to sites. Fragments can be horizontal subsets (restrict operator), vertical subsets (project operator), derived horizontal subsets (semi-join operator), and combinations of these kinds of fragments. See also semi-join operator, fragmentation transparency, location transparency, and local mapping transparency.
fragmentation transparency  a level of data independence in distributed DBMSs in which queries can be formulated without knowledge of fragments. See location transparency and local mapping transparency.
full outer join  an outer join that produces the matching rows of the join part as well as the nonmatching rows from both tables.
functional dependency (FD)  a constraint about two or more columns of a table. X determines Y (X → Y) if there exists at most one value of Y for every value of X. A functional dependency is similar to a candidate key constraint because if X and Y are placed in a separate table, X is a candidate key.
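Whether a dependency X → Y holds in a sample of table data can be checked by verifying that no X value is paired with two different Y values (a sketch; the row data and column names are invented):

```python
def fd_holds(rows, x, y):
    """Return True if column x functionally determines column y in rows."""
    seen = {}
    for row in rows:
        xv, yv = row[x], row[y]
        # Same X value paired with a different Y value violates X -> Y.
        if seen.setdefault(xv, yv) != yv:
            return False
    return True

rows = [{"std_no": 1, "city": "Denver"},
        {"std_no": 2, "city": "Denver"},
        {"std_no": 1, "city": "Denver"}]
ok = fd_holds(rows, "std_no", "city")       # True: std_no -> city holds
rows.append({"std_no": 1, "city": "Boulder"})
violated = fd_holds(rows, "std_no", "city")  # False after the violation
print(ok, violated)  # True False
```

Note that sample data can only refute an FD, never prove it; an FD is a semantic constraint on all possible rows.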
fuzzy checkpoint  an alternative to traditional cache-consistent checkpoints involving less overhead but possibly more restart work. In a fuzzy checkpoint, the recovery manager writes only the buffer pages modified since the previous checkpoint. Most of these pages should have already been written to disk before the checkpoint. At restart time, the recovery manager uses the two most recent fuzzy checkpoint records in the log. See also checkpoint.
generalization hierarchy  a collection of entity types arranged in a hierarchical structure to show similarity in attributes. Each subtype or child entity represents a subset of its supertype or parent entity. See also supertype and subtype.
group condition  a comparison involving an aggregate function such as SUM or COUNT. Group conditions cannot be evaluated until after the GROUP BY clause is evaluated.
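The distinction from row conditions can be seen in a HAVING clause, which filters groups after GROUP BY is evaluated (a sketch with invented data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrolls (course TEXT, student TEXT)")
conn.executemany("INSERT INTO enrolls VALUES (?, ?)",
                 [("DB", "Ann"), ("DB", "Bob"), ("OS", "Cal")])
# The group condition COUNT(*) >= 2 belongs in HAVING, not WHERE, because
# aggregates cannot be computed until groups are formed.
big = conn.execute("""
    SELECT course, COUNT(*) FROM enrolls
    GROUP BY course HAVING COUNT(*) >= 2""").fetchall()
print(big)  # [('DB', 2)]
```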
GROUPING SETS operator  an operator in the GROUP BY clause that requires explicit specification of column combinations. The GROUPING SETS operator is appropriate when precise control over grouping is required. The SQL:2003 standard provides the GROUPING SETS operator as an extension of the GROUP BY clause to support multidimensional data.
hash file  a specialized file structure that supports search by key. Hash files transform a key value into an address to provide fast access. Hash files may have poor performance for sequential access. A hash file may be static (requires periodic reorganization) or dynamic (does not require periodic reorganization). A hash structure may be used as a primary or a secondary file structure.
hash join  a join algorithm that uses an internal hash file for each table. The hash join algorithm can be used only for equi-joins. The hash join performs better than sort merge when the tables are not sorted or indexes do not exist. See also sort merge.
hierarchical form  a formatted window for data entry and display using a fixed (main form) and variable (subform) part. One record is shown in the main form and multiple, related records are shown in the subform.
hierarchical report  a formatted display of a query using indentation to show grouping and sorting. Also known as a control break report.
histogram  a two-dimensional graph where the x-axis represents column ranges and the y-axis represents the number of rows containing the range of values. Histograms support more detailed distribution data than the uniform value assumption. Histograms are part of a table profile. A traditional or equal-width histogram has equal column value widths but a varying number of rows in each bar. An equal-height histogram has variable-size column ranges but approximately equal number of rows in each bar. Most DBMSs use equal-height histograms because the maximum and expected estimation error can be easily controlled by increasing the number of ranges.
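The equal-height idea can be sketched by choosing bucket boundaries at quantiles so each bucket holds roughly the same number of rows (function and data invented for illustration):

```python
def equal_height_histogram(values, num_buckets):
    """Buckets with variable-width ranges but near-equal row counts."""
    ordered = sorted(values)
    n = len(ordered)
    buckets = []
    for i in range(num_buckets):
        lo, hi = i * n // num_buckets, (i + 1) * n // num_buckets
        chunk = ordered[lo:hi]
        buckets.append((chunk[0], chunk[-1], len(chunk)))  # (min, max, count)
    return buckets

values = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
hist = equal_height_histogram(values, 4)
print(hist)  # every bucket holds 3 rows; ranges widen for the skewed tail
```

With skewed data, the equal row counts keep the estimation error per bucket roughly uniform, which is why the entry notes that most DBMSs prefer this form.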
HOLAP  an acronym for Hybrid On-Line Analytical Processing. HOLAP is an implementation approach that combines the MOLAP and the ROLAP storage engines. HOLAP involves both relational and multidimensional data storage as well as combining data from both relational and multidimensional sources for data cube operations. See also MOLAP and ROLAP.
homonym  in view integration, a group of words that have the same sound and often the same spelling but have different meanings. Homonyms arise because of context of usage. See also synonym.
hot spot  common data that multiple users try to change simultaneously. Without adequate concurrency control, users may interfere with each other on hot spots. A system-independent hot spot does not depend on the details of a particular concurrency control manager. Typically, system-independent hot spots involve fields or rows in a database. A system-dependent hot spot depends on the details of the concurrency control manager, especially the locking granularity.
hybrid join  a join algorithm that combines the sort merge and nested loops algorithms. The outer table must be sorted or have a join column index. The inner table must have an index on the join column. This algorithm can only be used for equi-joins. The hybrid join performs better than the sort merge when the inner table has a nonclustering index on the join column. See also nested loops and sort merge.
Hypertext Markup Language (HTML)  the language in which most Web documents are written. HTML combines the structure, the content, and the layout of a document. See also XML and XSL.
identification dependency  involves a weak entity and one or more identifying relationships. See also weak entity and identifying relationship.
identifying relationship  a relationship that provides a component of the primary key to a weak entity. See also weak entity and identification dependency.
immediate update approach  an approach used by a recovery manager to record database changes on disk. In this approach, database updates are written to the disk when they occur but after the corresponding log updates. To restore a database, both undo and redo operations may be needed. See also deferred update approach and write ahead log protocol.
implicit PL/SQL cursor  a cursor that can be used in procedures written in PL/SQL, the database programming language of Oracle. An implicit cursor is neither explicitly declared nor explicitly opened. Instead a special version of the FOR statement declares, opens, iterates, and closes a locally named SELECT statement. An implicit cursor cannot be referenced outside of the FOR statement in which it is declared. See also cursor and explicit PL/SQL cursors.
incorrect summary  a concurrency control problem in which a transaction reads several values, but another transaction updates some of the values while the first transaction is still executing. Also known as an inconsistent analysis.
incremental checkpoint  an alternative to traditional cache-consistent checkpoints involving less overhead but may require more restart work. In an incremental checkpoint, no database pages are written to disk. Instead, dirty database pages are periodically written to disk in ascending age order. At checkpoint time, the log position of the oldest dirty data page is recorded to provide a starting point for recovery. The amount of restart work can be controlled by the frequency of writing dirty data pages. See also checkpoint.
incremental integration  an approach to view integration where a partially integrated ERD is merged with the next view. To integrate n views, there are n – 1 integration steps.
index  a secondary file structure that provides an alternative path to the data. Indexes typically contain only key values, not the other fields in a logical record. Indexes may be organized as Btrees, hash structures, or bitmap structures. See also Btree file, hash file, and bitmap index.
index selection  for each table, select at most one clustering index and zero or more nonclustering indexes. In a clustering index, the order of the data records is close to the index order. In a nonclustering index, the order of the data records is unrelated to the index order. Index selection is an important subproblem of physical database design. Index selection usually chooses Btree indexes. Other kinds of indexes (hash and bitmap) also can be considered.
indirect user  someone who accesses a database through reports or data extracts rather than through their own initiative. See also parametric user and power user.
inexact string matching  searching for a pattern of strings rather than just one string. In SQL, inexact matching uses the LIKE operator and pattern-matching characters.
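A small sqlite3 session can demonstrate the LIKE operator and its pattern-matching characters; the table and data below are made up for illustration:

```python
# Inexact string matching with LIKE in SQLite.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT)")
conn.executemany("INSERT INTO product VALUES (?)",
                 [("ink jet",), ("ink cartridge",), ("laser toner",)])
# % matches any sequence of characters; _ matches a single character.
rows = conn.execute(
    "SELECT name FROM product WHERE name LIKE 'ink%'").fetchall()
print(rows)  # names beginning with 'ink'
```

Note that some DBMSs use different pattern characters (for example, Microsoft Access uses * and ?), so the % and _ shown here follow the SQL standard.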
information life cycle  the stages of information transformation in an organization. Typical stages of an information life cycle include acquisition, storage, protection, processing, formatting, dissemination, and usage.
information resource dictionary (IRD)  a database of metadata that describes the entire information systems life cycle. The information resource dictionary system manages access to an IRD. Also known as the repository.
information resource management  a broad management philosophy that seeks to use information technology as a tool for processing, distributing, and integrating information throughout an organization.
information system  a system that accepts data from its environment, processes the data, and produces output data for decision making. An information system consists of people, procedures, input data, output data, databases, software, and hardware.
information systems planning  the process of developing enterprise models of data, processes, and organizational roles. Information systems planning evaluates existing systems, identifies opportunities to apply information technology for competitive advantage, and plans new systems. Also known as business systems planning, information systems engineering, and information systems architecture.
inheritance  a data modeling feature that supports sharing of attributes between a supertype and a subtype. Subtypes inherit attributes from their supertypes. In SQL:2003, inheritance applies to both user-defined types and subtable families. Inheritance supports sharing of code and data among similar objects.
integration strategy  a mix of incremental and parallel approaches to integrate a set of views. The views are divided into subsets. For each subset of views, incremental integration is used. Parallel integration is usually applied to the ERDs resulting from integrating the view subsets.
intent lock  a lock on a large database item (such as a table) indicating intention to lock smaller items contained in the larger item. Intent locks alleviate blocking when locking coarse items and allow efficient detection of conflicts among locks on items of varying granularity.
internal schema  a description of the physical implementation of a database. See also schema, conceptual schema, external view, and Three Schema Architecture.
Internet  a global "network of networks" that is built from standard protocols.
interrelated  a fundamental characteristic of databases. Interrelated means that data stored as separate units can be connected to provide a whole picture. To support the interrelated characteristic, databases contain clusters of data known as entities and relationships connecting entities.
intersection  an operator of relational algebra that combines rows from two tables. The intersection operator finds rows that are common to both tables. Both tables must be union compatible to use the intersection operator.
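The intersection operator maps directly to the SQL INTERSECT keyword, as this illustrative sqlite3 sketch shows (tables and data are made up):

```python
# Intersection of two union-compatible tables.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER)")
conn.execute("CREATE TABLE t2 (id INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(2,), (3,), (4,)])
rows = conn.execute(
    "SELECT id FROM t1 INTERSECT SELECT id FROM t2").fetchall()
print(rows)  # rows common to both tables
```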
intranet  a collection of computers and communication devices using the TCP/IP protocol. For security reasons, computers in an intranet are usually not accessible from computers on the Internet.
ISO  the International Organization for Standardization, one of the groups responsible for SQL standards.
isolation level  defines the degree to which a transaction is separated from actions of other transactions. A transaction designer can balance concurrency control overhead with interference problems prevented by specifying the appropriate isolation level.
join  an operator of relational algebra used to combine rows from two tables. The join operator produces a table containing rows that match on a condition involving a column from each input table. See also equi-join and natural join.
join algorithm  an algorithm to implement the join operator. A query optimization component selects the least cost join algorithm for each join operation in a query. The common join algorithms are nested loops, sort-merge, hybrid join, hash join, and star join.
join operator style  a way to formulate joins in a SELECT statement. The join operator style lists join operations in the FROM clause using the INNER JOIN and ON keywords.
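The join operator style can be demonstrated with sqlite3; the tables, columns, and data below are illustrative:

```python
# Join operator style: INNER JOIN ... ON in the FROM clause.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_no INTEGER, name TEXT)")
conn.execute("CREATE TABLE ord (order_no INTEGER, cust_no INTEGER)")
conn.execute("INSERT INTO customer VALUES (10, 'Ann'), (20, 'Bob')")
conn.execute("INSERT INTO ord VALUES (1, 10), (2, 10)")
rows = conn.execute(
    """SELECT name, order_no
       FROM customer INNER JOIN ord
            ON customer.cust_no = ord.cust_no""").fetchall()
print(rows)
```

The alternative, the cross product style, lists the tables in the FROM clause and places the join condition in the WHERE clause; both formulations produce the same result.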
key preserving table  an Oracle term for the updatable table in a join view. A join view preserves a table if every candidate key of the table can be a candidate key of the join result table. This means that the rows of an updatable join view can be mapped in a 1-1 manner with each key preserved table. In a join involving a 1-M relationship, the child table could be key preserved because each child row is associated with at most one parent row.
knowledge management  applying information technology with human information processing capabilities and organization processes to support rapid adaptation to change.
large object architecture  an architecture for object databases in which large objects (binary or text) are stored in a database along with external software to manipulate large objects.
law of transitivity  a rule that states if an object A is related to an object B and B is related to C, then one can conclude that A is related to C. Functional dependencies obey the law of transitivity. See also functional dependency and transitive dependency.
load balancing  a problem of parallel database processing involving the amount of work allocated to different processors in a cluster. Ideally, each processor has the same amount of work to fully utilize the cluster. The shared nothing architecture is most sensitive to load balancing because of the need for data partitioning. See also shared nothing architecture and clustered nothing architecture.
load time  the time when a data warehouse is updated.
load time lag  the difference between transaction time and load time. Determining the load time lag is an important part of managing the refresh of a data warehouse. See also valid time lag.
local mapping transparency  a level of data independence in distributed DBMSs in which queries can be formulated without knowledge of local formats. However, knowledge of fragments and fragment allocations is necessary. See also fragment, fragmentation transparency, and location transparency.
location transparency  a level of data independence in distributed DBMSs in which queries can be formulated without knowledge of locations. However, knowledge of fragments is necessary. See also fragments, fragmentation transparency, and local mapping transparency.
lock  a fundamental tool of concurrency control. A lock on a database item prevents other transactions from performing conflicting actions on the same item. See also exclusive lock, intent lock, and shared lock.
locking granularity  the size of the database item locked. Locking granularity is a trade-off between waiting time (amount of concurrency permitted) and overhead (number of locks held).
logged change data  data obtained from a source system for refreshing a data warehouse. Logged change data involves files that record changes or other user activity such as Web logs or transaction logs. See also cooperative change data, snapshot change data, and queryable change data.
logical expression  an expression resulting in a true or false (Boolean) value. Logical expressions can involve comparisons and the logical operators (AND, OR, NOT, etc.).
lost update  a concurrency control problem in which one user's update overwrites another user's update. See also scholar's lost update.
main form  the fixed part of a hierarchical form. The main form shows one record at a time.
mandatory access control  a database security approach for highly sensitive and static databases. In mandatory control approaches, each object is assigned a classification level and each user is given a clearance level. A user can access a database element if the user's clearance level provides access to the classification level of the element.
mandatory relationship  a relationship with a minimum cardinality of one or more. A mandatory relationship produces an existence dependency on the entity type associated with the minimum cardinality of one. See also optional relationship and existence dependency.
many-to-many (M-N) relationship  in the Entity Relationship Model, a relationship in which objects of each entity type can be related to many objects of the other entity type. M-N relationships have maximum cardinalities of more than one in each direction. In the Relational Model, two 1-M relationships and a linking or associative table represent an M-N relationship. See also one-to-many relationship and relationship.
materialized view  a stored view that must be periodically synchronized with its source data. Relational DBMSs provide materialized views with summarized data for fast query response. See also query rewriting and view.
message  a request to invoke a method on an object. When an object receives a message, it looks for an implementation in its own class. If an implementation cannot be found, the message is sent to the object's parent class. See also binding.
message-oriented middleware  middleware that maintains a queue of messages. A client process can place a message on a queue, and a server can remove a message from a queue. Message-oriented middleware supports complex messages among clients and servers.
metadata  data that describe other data including the source, use, value, and meaning of the data. See also data dictionary.
middleware  a software component in a client-server architecture that performs process management. Middleware allows servers to process messages efficiently from a large number of clients. In addition, middleware can allow clients and servers to communicate across heterogeneous platforms. Prominent kinds of database middleware include transaction-processing monitors, message-oriented middleware, and object-request brokers.
modification anomaly  an unexpected side effect that occurs when changing the data in a table with excessive redundancies.
MOLAP  an acronym for Multidimensional On-Line Analytical Processing. MOLAP is a storage engine that directly stores and manipulates data cubes. MOLAP engines generally offer the best query performance but suffer from limitations in the size of data cubes supported.
multiple-tier architecture  a client-server architecture with more than three layers: a PC client, a backend database server, an intervening middleware server, and application servers. The application servers perform business logic and manage specialized kinds of data such as images. See also two-tier architecture and three-tier architecture.
multivalued dependency (MVD)  a constraint involving three columns. The MVD A →→ B | C (read A multidetermines B or C) means that (1) a given A value is associated with a collection of B and C values; and (2) B and C are independent given the relationships between A and B and A and C. All FDs are also MVDs, but not all MVDs are FDs. An MVD is nontrivial if it is also not an FD. See also relationship independence and functional dependency.
M-way (multiway) relationship  a relationship involving more than two entity types. In some ERD notations such as the Crow's Foot, an M-way relationship is represented as an M-way associative entity type. An M-way associative entity type has more than two identifying relationships.
name qualification  preceding a column name with its table name. The column name alone is an abbreviation. If the same column name occurs in two tables in an SQL statement, the column name must be qualified with its table name. The combination of the table name and column name must be unique across all tables in a database.
natural join  a variation of the join operator of relational algebra. In a natural join, the matching condition is equality (equi-join), one of the matching columns is discarded in the result table, and the join columns have the same unqualified names. See also equi-join and join.
nested loops  a join algorithm that uses an outer table and an inner table. The nested loops algorithm is the most general join algorithm as it can be used to evaluate all join operations, not just equi-joins. The nested loops algorithm performs well when there are few rows in the outer table or when all pages of the inner table fit into memory. An index on a foreign key join column allows efficient usage of the nested loops algorithm when there are restrictive conditions on the parent table. See also hash join and sort merge.
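The outer/inner structure of nested loops can be sketched over in-memory tables. This is an illustrative sketch, not a DBMS implementation; the tables and data are made up:

```python
# Minimal sketch of the nested loops join algorithm.

def nested_loops_join(outer, inner, condition):
    """General join: compare every outer row with every inner row."""
    result = []
    for o in outer:          # one pass over the outer table
        for i in inner:      # full scan of the inner table per outer row
            if condition(o, i):
                result.append({**o, **i})
    return result

emp = [{"emp_no": 1, "dept_no": "A"}, {"emp_no": 2, "dept_no": "B"}]
dept = [{"dept_no": "A", "dname": "Sales"}]
rows = nested_loops_join(emp, dept,
                         lambda o, i: o["dept_no"] == i["dept_no"])
print(rows)
```

Because the condition is an arbitrary predicate rather than a hash lookup, this algorithm handles any join condition, which is why it is the most general join algorithm. In a real DBMS, an index on the inner table's join column replaces the full inner scan.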
nested query  a query inside a query. In an SQL SELECT statement, a SELECT statement can be part of conditions in the HAVING and WHERE clauses. See Type I and Type II nested query for two variations. Also known as a subquery and an inner query.
node key  a field(s) in a node of a form structure with unique values. The key of the root node is unique among all form instances. The key of a child node is unique within its parent node.
nonclustering index  an index in which the order of the data records is not related to the index order. A nonclustering index is always a secondary file structure. See also index selection, clustering index, and secondary file structure.
nonmatching row  a row that does not combine with a row from a second table to satisfy a join condition. The row will not be in the result of the join operation, but it will be in the result of an outer join operation.
nonprocedural database language  a language such as SQL that allows you to specify what part of a database to access rather than to code a complex procedure. Nonprocedural languages do not include looping statements.
nonrepeatable read  a concurrency control problem in which a transaction reads the same value more than one time. In between reading the data item, another transaction modifies the data item.
nontrivial MVD  an MVD that is not also an FD. By definition, every FD is an MVD, but not all MVDs are FDs. An MVD in which a column is associated with more than one value of two other columns is a nontrivial MVD.
normal form  a rule about allowable dependencies.
normalization  the process of removing redundancies from tables so that the tables are easier to change. To normalize a table, list the functional dependencies and make tables that satisfy a normal form, usually third normal form (3NF) or Boyce-Codd normal form (BCNF).
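The effect of normalization can be illustrated with a small sketch. The tables, the FD, and the data below are hypothetical: the FD course → instructor causes the instructor fact to be repeated, and decomposition removes the repetition:

```python
# Illustrative sketch: decomposing a table with a redundancy-causing FD.

enroll = [
    {"student": "S1", "course": "DB", "instructor": "Ann"},
    {"student": "S2", "course": "DB", "instructor": "Ann"},  # repeated fact
]

# Decompose into two tables, one per determinant.
course_instr = {(r["course"], r["instructor"]) for r in enroll}
takes = {(r["student"], r["course"]) for r in enroll}
print(sorted(course_instr))  # the instructor fact is now stored once
print(sorted(takes))
```

After decomposition, changing a course's instructor touches one row instead of one row per enrolled student, eliminating the modification anomaly.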
null value  a special value that represents the absence of an actual value. A null value can mean that the actual value is unknown or does not apply to the given row.
object  an instance of a class in object-oriented computing. An object has a unique identifier that is invisible and nonchangeable.
object database middleware  an architecture for object databases in which middleware manages complex data possibly stored outside of a database along with traditional data stored in a database.
object-oriented DBMS  a new kind of DBMS designed especially for objects. Object-oriented DBMSs have an object query processor and an object kernel. The Object Data Management Group provides the standard for object-oriented DBMSs.
object-relational DBMS  a relational DBMS extended with an object query processor for user-defined data types. SQL:2003 provides the standard for object relational DBMSs.
OLAP (online analytical processing)  general name of technology to support multidimensional databases. OLAP technology encompasses the multidimensional data model and implementation approaches.
one-sided outer join  an outer join that produces the matching rows (the join part) as well as the nonmatching rows from only one of the tables, the designated input table.
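A one-sided outer join corresponds to the SQL LEFT JOIN (or RIGHT JOIN) keywords, as this illustrative sqlite3 sketch shows (tables and data are made up):

```python
# One-sided (left) outer join: nonmatching rows appear with nulls.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE faculty (fac_no INTEGER, name TEXT)")
conn.execute("CREATE TABLE offering (offer_no INTEGER, fac_no INTEGER)")
conn.execute("INSERT INTO faculty VALUES (1, 'Ann'), (2, 'Bob')")
conn.execute("INSERT INTO offering VALUES (100, 1)")
rows = conn.execute(
    """SELECT name, offer_no
       FROM faculty LEFT JOIN offering
            ON faculty.fac_no = offering.fac_no""").fetchall()
print(rows)  # Bob teaches no offering, so offer_no is NULL (None)
```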
one-to-many (1-M) relationship  in the Entity Relationship Model, a relationship in which the maximum cardinality is 1 in one direction and M in the other direction. In the Relational Data Model, a referential integrity constraint usually indicates a 1-M relationship. See also relationship and many-to-many relationship.
one-to-many (1-M) updatable query  a type of updatable view in Microsoft Access involving one or more 1-M relationships.
operational database  a database to support the daily functions of an organization. Operational databases directly support major functions such as order processing, manufacturing, accounts payable, and product distribution.
oper (operational) mart  a just-in-time data mart usually built from one operational database in anticipation of or in response to major events such as disasters and new product introductions. An oper mart supports the peak demand for reporting and business analysis that accompanies a major event. See also data mart.
optimistic concurrency control  a concurrency control approach in which transactions are permitted to access a database without acquiring locks. Instead, the concurrency control manager checks whether a conflict has occurred. The check can be performed either just before a transaction commits or after each read and write. By reviewing the relative time of reads and writes, the concurrency control manager can determine whether a conflict has occurred. If a conflict occurs, the concurrency control manager issues a rollback and restarts the offending transaction.
optional relationship  a relationship with a minimum cardinality of zero. An optional relationship means that entities can be stored without participation in the relationship. See also mandatory relationship.
Oracle Real Application Cluster (RAC)  an Oracle technology for parallel database processing. Oracle RAC uses the clustered disk architecture.
outer join  an operator of relational algebra that combines two tables. In an outer join, the matching and nonmatching rows are retained in the result. See one-sided and full outer join for two variations of this operator.
overlapping triggers  two or more triggers with the same timing, granularity, and applicable table. The triggers overlap if an SQL statement causes both triggers to fire. You should not depend on a particular firing order for overlapping triggers. See also trigger and trigger execution procedure.
package  a PL/SQL unit of modularity. Packages support a larger unit of modularity than procedures or functions. A package may contain procedures, functions, exceptions, variables, constants, types, and cursors. By grouping related objects together, a package provides easier reuse than individual procedures and functions.
parallel database management system (DBMS)  a DBMS capable of utilizing tightly coupled computing resources (processors, disks, and memory). Tight coupling is achieved by networks with data exchange time comparable to the time of the data exchange with a disk. Parallel database technology promises performance improvements and high availability, but can involve interoperability problems if not properly managed. See also speedup and scaleup.
parallel integration  an approach to view integration where all views are integrated in one step. To integrate n views, there are n view design steps and one integration step. The view design steps may be performed in parallel by separate design teams.
parametric user  someone who uses a database by requesting existing forms or reports using parameters, input values that change from usage to usage. See also indirect user and power user.
persistent  a fundamental characteristic of databases. Persistent means that data has a lifetime longer than the execution of a computer program. To be persistent, data must reside on stable storage such as magnetic disk.
phantom read problem  a concurrency control problem in which a transaction executes a query with record conditions but another transaction inserts new rows or modifies existing rows while the first transaction is still executing. The first transaction then reexecutes the original query, but the results differ from those of the first execution. The new rows are phantom because they did not exist for the first execution of the query.
physical record  a collection of bytes that are transferred between volatile storage in main memory and stable storage on a disk. The number of physical record accesses is an important measure of database performance.
pivot  a data cube operator in which the dimensions in a data cube are rearranged. See also data cube.
polymorphism  a principle of object-oriented computing in which a computing system has the ability to choose among multiple implementations of a method. The appropriate implementation is chosen by the system (object DBMS or object-oriented programming language). Polymorphism permits a smaller vocabulary of procedures and incremental sharing of code.
power user  someone who uses a database by submitting unplanned or ad hoc requests for data. Power users should have a good understanding of nonprocedural access. See also indirect user and parametric user.
primary copy protocol  a protocol for concurrency control of distributed transactions. Each replicated fragment is designated as either the primary copy or a secondary copy. During distributed transaction processing, only the primary copy is guaranteed to be current at the end of a transaction. Updates may be propagated to secondary copies after the end of a transaction.
primary file structure  a file structure storing both key data and nonkey data. Sequential files can be only primary file structures. Hash structures and Btrees can be primary or secondary file structures. See also secondary file structure.
primary key  a specially designated candidate key. The primary key for a table cannot contain null values.
procedural language interface  a method to combine a nonprocedural language such as SQL with a programming language such as COBOL or Visual Basic. Embedded SQL is an example of a procedural language interface.
process model  a graphical model showing the relationships between processes. A process can provide input data used by other processes or use output data of other processes. The well-known data flow diagram is an example of a process model. See also data model and environment interaction model.
project  an operator of relational algebra. A project operation retrieves a subset of specified columns of the input table. Duplicate rows are eliminated in the result if present.
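The project operator, including its duplicate elimination, can be sketched in a few lines over in-memory tables; the data below is illustrative:

```python
# Sketch of the relational algebra project operator.

def project(table, columns):
    """Keep the named columns and eliminate duplicate rows."""
    seen, result = set(), []
    for row in table:
        key = tuple(row[c] for c in columns)
        if key not in seen:          # duplicates are eliminated
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result

emp = [{"name": "Ann", "city": "Denver"},
       {"name": "Bob", "city": "Denver"}]
print(project(emp, ["city"]))  # one row, not two
```

SQL's SELECT clause differs slightly: it retains duplicates unless the DISTINCT keyword is used.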
prototype  a fast implementation of an application in an information system. Prototypes can demonstrate forms, reports, and menus to enable feedback from users.
query  request to extract useful data. Query formulation involves translating a problem into a language (such as an SQL SELECT statement) understood by a DBMS.
query binding  the process of associating a query with an access plan. Some DBMSs rebind automatically if a query changes or the database changes (file structures, table profiles, data types, etc.).
query rewriting  a substitution process in which a materialized view replaces references to fact and dimension tables in a query. In addition to performing the substitution, the query optimizer evaluates whether the substitution will improve performance over the original query. See also materialized view.
queryable change data  data obtained from a source system for refreshing a data warehouse. Since queryable change data comes directly from a data source via a query, the data source must have timestamps. See also cooperative change data, logged change data, and snapshot change data.
RAID (redundant arrays of independent disks)  a collection of disks (a disk array) that operates as a single disk. RAID storage supports parallel read and write operations with high reliability.
RAID-1  an architecture for RAID storage in which redundant arrays of disks provide high reliability and performance but with large storage overhead. RAID-1 uses disk mirroring to achieve high performance and reliability.
RAID-5  an architecture for RAID storage in which randomly located error-correcting pages provide high reliability without excessive storage overhead. RAID-5 uses striping to achieve good performance and reliability without excessive storage overhead.
read-only view  a view that can be used in SELECT statements but not in UPDATE, INSERT, and DELETE statements. All views are at least read-only.
recovery transparency  a service provided by a DBMS to automatically restore a database to a consistent state after a failure. The recovery manager is the component of a DBMS responsible for recovery transparency.
referenced row  a row of a parent table having a primary key value that is identical to the foreign key values of rows in a child table. See also actions on referenced rows.
referential integrity  an integrity constraint involving a candidate key in one table with the related foreign key of another table. Only two kinds of values can be stored in a foreign key: (1) a value matching a candidate key value in some row of the table containing the associated candidate key or (2) a null value. See also primary key, candidate key, and foreign key.
refresh constraint  a constraint on a data warehouse or a source system that limits the details of a refresh process. Refresh constraints can be classified as source access, integration, data warehouse availability, or completeness/consistency.
relation  synonymous with table. A term typically used in academic research about databases.
relational algebra  a set of operators to manipulate relational databases. Each operator uses one or two tables as input and produces a new table as output.
relational data model  using tables, matching values for connections among tables, and table operators to represent a collection of data.
relational DBMS  a system that uses the Relational Data Model to manage collections of data.
relationship  in the Entity Relationship Model, a relationship is a named association among entity types. In the Relational Model, a relationship is a connection among tables shown by column values in one table that match column values in another table. Referential integrity constraints and foreign keys indicate relationships in the Relational Model. See also one-to-many relationship, many-to-many relationship, and referential integrity.
relationship cycle  a collection of relationships arranged in a loop starting and ending with the same entity type. You should examine relationship cycles to determine whether a relationship can be derived from other relationships.
relationship equivalence  a rule about the equivalence between 1-M and M-N relationships. An M-N relationship can be replaced by an associative entity type and two identifying 1-M relationships. See also associative entity type and identifying relationship.
relationship independence  a relationship that can be derived from two independent relationships.
report  a stylized presentation of data appropriate to a selected audience. Reports enhance the appearance of data that are displayed or printed. See also hierarchical report.
restrict  an operator of relational algebra. A restrict operation retrieves a subset of rows that satisfy a given condition.
reverse engineering  the ability to extract definitions from a target database management system and use the definitions to create an ERD and data dictionary properties. See also CASE tool and forward engineering.
ROLAP  an acronym for Relational On-Line Analytical Processing. ROLAP involves relational DBMS extensions to support multidimensional data. ROLAP engines support a variety of storage and optimization techniques for summary data retrieval.
roll-up  a data cube operator that supports navigation from a more specific level of a dimension to a more general level of a dimension. The roll-up operator requires a hierarchical dimension. See also drill-down.
ROLLUP operator  an operator in the GROUP BY clause that augments the normal GROUP BY result with a partial set of subtotals. The SQL:2003 standard provides the ROLLUP operator as an extension of the GROUP BY clause to support multidimensional data. The ROLLUP operator is appropriate to summarize columns from the same dimension hierarchy.
row condition  a comparison not involving an aggregate function. Row conditions are evaluated in the WHERE clause.
rules about referenced rows  rules that describe actions on related rows when a row in a primary key table (the referenced row) is deleted or its primary key is updated.
save point  an intermediate point in a transaction in which a rollback may occur. Save points are supported by proprietary SQL extensions and by the SQL:2003 standard.
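Save points can be demonstrated with Python's sqlite3 module, which supports the standard SAVEPOINT syntax (the account table and savepoint name are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions manually
conn.execute("CREATE TABLE account (id INT PRIMARY KEY, balance INT)")
conn.execute("INSERT INTO account VALUES (1, 500)")

conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = balance - 100 WHERE id = 1")
conn.execute("SAVEPOINT transfer_mid")              # intermediate point in the transaction
conn.execute("UPDATE account SET balance = balance - 999 WHERE id = 1")
conn.execute("ROLLBACK TO SAVEPOINT transfer_mid")  # undo only the work after the save point
conn.execute("COMMIT")                              # the first update survives

balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
```

After the partial rollback and commit, only the first update is permanent, leaving a balance of 400.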
scaleup  in distributed database processing, the increase in the amount of work that can be accomplished through increased computing capacity. Scaleup measures the increase in job size that can be handled while holding completion time constant. Scaleup is measured as the ratio of the amount of work completed with the larger configuration to the amount of work completed with the original configuration.
schema  a definition of the conceptual, external, or internal parts of a database. At the conceptual level, a schema is a diagram depicting the entities and relationships in a database. See also the Three Schema Architecture, external view, conceptual schema, and internal schema.
schema mapping  describes how a schema at a higher level is derived from a schema at a lower level. A mapping provides the knowledge to convert a request from a higher schema representation to a lower schema representation. See also Three Schema Architecture and schema.
scholar's lost update  a variation of the lost update problem. The word scholar is ironic in that the scholar's lost update problem differs only slightly from the traditional lost update problem. The only essential difference between the scholar's lost update problem and the traditional lost update problem is that transaction A commits before transaction B changes the common data. See also lost update.
second generation database technology  the first true DBMSs that managed multiple entity types and relationships. However, to obtain access to data, a computer program still had to be written. Second generation database technology was largely developed during the 1970s.
second normal form (2NF)  a table is in 2NF if every nonkey column is dependent on the whole key, not part of the key.
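A 2NF violation is a partial dependency: a nonkey column determined by only part of a composite key. A small sketch in Python that tests for such a dependency in sample rows (the OrderLine table, its composite key (order_no, prod_no), and the prod_name column are hypothetical):

```python
# Sample rows for a table with composite key (order_no, prod_no);
# prod_name depends on prod_no alone, which violates 2NF.
rows = [
    {"order_no": 1, "prod_no": "A", "prod_name": "Nail", "qty": 5},
    {"order_no": 1, "prod_no": "B", "prod_name": "Bolt", "qty": 2},
    {"order_no": 2, "prod_no": "A", "prod_name": "Nail", "qty": 7},
]

def determines(rows, lhs, rhs):
    """True if the lhs columns functionally determine the rhs column
    in the given sample rows."""
    seen = {}
    for r in rows:
        key = tuple(r[c] for c in lhs)
        if seen.setdefault(key, r[rhs]) != r[rhs]:
            return False
    return True

# prod_no (part of the key) determines prod_name: a partial dependency.
partial = determines(rows, ["prod_no"], "prod_name")
```

Here `partial` is True, flagging the 2NF violation, while qty varies with the whole key and so raises no violation.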
secondary file structure  a file structure storing key data along with pointers to the nonkey data. Bitmap indexes can be secondary file structures only. Hash structures and Btrees may be primary or secondary file structures. See also primary file structure.
self-join  a join between a table and itself (two copies of the same table). Typically, a self-join is used to query self-referencing relationships.
self-referencing relationship  a relationship involving the same table or entity type. Self-referencing relationships represent associations among members of the same set. Also known as a unary, reflexive, or recursive relationship.
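The classic example of a self-join over a self-referencing relationship is matching employees to their managers. A minimal sketch using Python's sqlite3 module (the emp table and its columns are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (emp_no INT PRIMARY KEY, name TEXT, mgr_no INT);
    INSERT INTO emp VALUES (1, 'Ann', NULL), (2, 'Bob', 1), (3, 'Cyd', 1);
""")

# Self-join: two aliases (e, m) of the same table, linked through
# the self-referencing mgr_no column.
pairs = conn.execute("""
    SELECT e.name, m.name
    FROM emp e JOIN emp m ON e.mgr_no = m.emp_no
""").fetchall()
```

Each result row pairs an employee with that employee's manager.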
semi-join operator  an operator of relational algebra that is especially useful for distributed database processing. A semi-join is half of a join: the rows of one table that match with at least one row of another table.
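In SQL, a semi-join is typically expressed with EXISTS: only columns of the outer table appear, and each qualifying row appears once regardless of how many rows match. A sketch using Python's sqlite3 module (customer and ord are hypothetical tables):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_no INT PRIMARY KEY, name TEXT);
    CREATE TABLE ord (ord_no INT PRIMARY KEY, cust_no INT);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO ord VALUES (10, 1), (11, 1);
""")

# Semi-join of customer with ord: customers having at least one
# matching order; Ann appears once even though she has two orders.
names = conn.execute("""
    SELECT name FROM customer c
    WHERE EXISTS (SELECT * FROM ord o WHERE o.cust_no = c.cust_no)
""").fetchall()
```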
sequential file  a simple file organization in which records are stored in insertion order or by key value. Sequential files are simple to maintain and provide good performance for processing large numbers of records.
server  a program that processes requests on behalf of a client. A database server may interpret SQL statements, locate data, update tables, check integrity rules, and return data back to clients.
shared  a fundamental characteristic of databases. Shared means that a database can have multiple uses and users. A large database can have hundreds of functions that use it as well as thousands of users simultaneously accessing it.
shared disk (SD) architecture  an architecture for parallel database processing in which each processor has its private memory, but disks are shared among all processors.
shared everything (SE) architecture  an architecture for parallel database processing in which memory and disks are shared among a collection of processors. The SE approach is usually regarded as a symmetric multiprocessing computer rather than a parallel database architecture.
shared lock  a lock that allows concurrent users to read a database item but prevents them from changing its value. Shared locks conflict with exclusive locks but not with other shared locks. Also known as an S lock.
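The compatibility rule for shared (S) and exclusive (X) locks can be sketched as a small table: a requested lock is granted only if it is compatible with every lock already held on the item (a minimal illustration, not a real lock manager):

```python
# S is compatible only with S; X conflicts with everything.
COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

def can_grant(held_locks, requested):
    """True if the requested lock mode is compatible with all
    lock modes currently held on the item."""
    return all(COMPATIBLE[(held, requested)] for held in held_locks)
```

For example, a second shared lock can be granted alongside existing shared locks, but an exclusive lock cannot.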
shared nothing (SN) architecture  an architecture for parallel database processing in which each processor has its own memory and disks. Data must be partitioned among the processors in the SN architecture.
simple synthesis procedure  a set of steps to produce tables in BCNF using a collection of functional dependencies. The simple synthesis procedure is limited to simple dependency structures.
slice  a data cube operator in which a dimension is replaced by a single member value or a summary of its member values. See also dice.
snapshot change data  data obtained from a source system for refreshing a data warehouse. Snapshot change data involves periodic dumps of source data. To derive change data, a difference operation uses the two most recent snapshots. See also cooperative change data, logged change data, and queryable change data.
snowflake schema  a data modeling representation for multidimensional databases. In a relational database, a snowflake schema has multiple levels of dimension tables related to one or more fact tables. You should use the snowflake schema instead of the star schema for small dimension tables that are not in third normal form. See also star schema, constellation schema, fact table, and dimension table.
sort merge  a join algorithm that requires both tables to be sorted on the join column. The sort merge algorithm can be used only for equi-joins. The sort merge algorithm performs well if the sort cost is small or if a clustered join index exists. See also hash join and nested loops.
sparsity  the extent of empty cells in a data cube. If a large number of cells are empty, the data cube can waste space and be slow to process. Special compression techniques can be used to reduce the size of sparse data cubes. Sparsity can be a problem if two or more dimensions are related such as products and regions where products are sold. See also data cube.
specialized media server architecture  an architecture for object databases in which a dedicated server manages complex data outside of a database. Programmers use an application programming interface to access complex data.
speedup  in distributed database processing, speedup involves the decrease in time to complete a task with additional computing capacity. Speedup measures time savings while holding the task constant. Speedup is measured by the ratio of the completion time with the original configuration to the completion time with the additional capacity. See also distributed database management system and scaleup.
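The two ratios can be illustrated with made-up numbers: speedup holds the task constant and compares times, while scaleup holds the time constant and compares amounts of work.

```python
def speedup(orig_time, new_time):
    """Speedup: completion time with the original configuration
    divided by completion time with the added capacity."""
    return orig_time / new_time

def scaleup(orig_work, larger_work):
    """Scaleup: work completed by the larger configuration divided
    by work completed by the original configuration (time fixed)."""
    return larger_work / orig_work

# Linear speedup: doubling capacity halves the time (100 -> 50 minutes).
# Linear scaleup: doubling capacity doubles the work done in fixed time.
s1 = speedup(100, 50)
s2 = scaleup(1000, 2000)
```

Both ratios equal 2.0 in the linear (ideal) case; real systems typically achieve less because of coordination overhead.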
SQL  an acronym for the Structured Query Language. SQL is an industry-standard database language that includes statements for database definition (such as the CREATE TABLE statement), database manipulation (such as the SELECT statement), and database control (such as the GRANT statement). SQL began as a proprietary language developed by IBM. SQL is now a widely supported international standard for databases.
SQL:2003  the most recent standard of the Structured Query Language. SQL:2003 supports numerous extensions beyond SQL-92 and updates features first specified in the previous standard (SQL:1999). The SQL:2003 standard includes nine parts and seven packages. Core SQL:2003 consists of parts 1, 2, and 11. Each non-core part contains mandatory and optional features. A package is a collection of optional features for some application area or implementation environment.
standalone SQL  using a specialized editor that submits SQL statements directly to the DBMS and displays the results returned from the DBMS. See also embedded SQL.
star join  a join algorithm that combines two or more tables in which there is one child table related to multiple parent tables in 1-M relationships. Each parent table must have a bitmap join index. The star join is the best join algorithm when there are highly selective conditions on the parent tables. The star join algorithm is widely used to optimize data warehouse queries. See also bitmap index.
star schema  a data modeling representation for multidimensional databases. In a relational database, a star schema has a fact table in the center related to dimension tables. See also snowflake schema, constellation schema, fact table, and dimension table.
statement-level interface  a language style for integrating a programming language with a nonprocedural language such as SQL. A statement-level interface involves changes to the syntax of a host programming language to accommodate embedded SQL statements. SQL:2003 specifies statements to establish database connections, execute SQL statements, use the results of an SQL statement, associate programming variables with database columns, handle exceptions in SQL statements, and manipulate database descriptors.
storage area network (SAN)  a specialized high-speed network that connects storage devices and servers. The goal of SAN technology is to integrate different types of storage subsystems into a single system and to eliminate the potential bottleneck of a single server controlling storage devices. A SAN is complementary to RAID disk storage.
stored procedure  a collection of statements that are managed by a DBMS. Stored procedures extend the capabilities of SQL. Most DBMSs provide a proprietary language in which to write stored procedures.
stripe  the set of physical records that can be read or written in parallel in RAID storage. Normally, a stripe contains a set of adjacent physical records.
striping  a technique for allocating physical records in RAID storage so that parallel read and write operations are possible.
strong type checking  the ability to ensure that expressions contain no incompatibility errors. Strong type checking is an important kind of error checking for object-oriented coding.
subform  the variable or repeating part of a hierarchical form. The subform can show multiple records at a time.
subquery  see nested query.
subtype  a child entity type in a generalization hierarchy. A subtype represents a more specialized entity type than its supertype.
summarize  an operator of relational algebra that compresses the rows of a table. A summarize operation produces a table with rows that summarize the rows of the input table. Aggregate functions are used to summarize the rows of the input table.
superkey  a column or combination of columns containing unique values for each row. The combination of every column in a table is always a superkey because rows in a table must be unique.
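The superkey test is simply uniqueness of a column combination across all rows, as this small sketch shows (the sample rows and column names are made up):

```python
rows = [
    {"emp_no": 1, "name": "Ann", "dept": "IT"},
    {"emp_no": 2, "name": "Bob", "dept": "IT"},
]

def is_superkey(rows, cols):
    """True if the combination of cols contains a unique value
    tuple for each row."""
    values = [tuple(r[c] for c in cols) for r in rows]
    return len(values) == len(set(values))
```

Here emp_no alone is a superkey, dept is not, and the combination of all columns is always a superkey because rows must be unique.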
supertype  a parent entity type in a generalization hierarchy. A supertype represents a more general entity type than its subtypes.
synonym  in view integration, a group of words that are spelled differently but have the same meaning. Synonyms often occur because different parts of an organization may use different vocabulary to describe the same things. See also homonym.
system  a set of related components that work together to accomplish some objectives.
table  a named, two-dimensional arrangement of data. A table consists of a heading part and a body part.
table body  synonymous with the rows of a table.
table heading  consists of the table name, the column names, and a data type for each column.
table profile  a statistical summary of the rows, columns, and participating relationships of a table. Table profiles are an important input of the physical database design phase because they are used to predict the fraction of a table accessed in a query.
ternary relationship  a relationship involving three entity types. In some ERD notations such as the Crow's Foot notation, a ternary relationship is represented as an associative entity type with three 1-M relationships.
third generation database technology  relational DBMSs incorporating nonprocedural access, optimization technology, and transaction processing capabilities. Third generation database technology was largely commercialized during the 1980s.
third normal form (3NF)  a table is in 3NF if it is in 2NF and every nonkey column is dependent only on the key.
Three Schema Architecture  an architecture for compartmentalizing database descriptions. The Three Schema Architecture contains the external or user level, the conceptual level, and the internal or physical level. The Three Schema Architecture was proposed as a way to achieve data independence.
three-tier architecture  a client-server architecture with three layers: a PC client, a backend database server, and either a middleware or an application server. See also two-tier architecture and multiple-tier architecture.
three-tier data warehouse architecture  an architecture for a data warehouse in which user departments access data marts rather than the data warehouse. An extraction process involving the data warehouse periodically refreshes the data marts. See also two-tier data warehouse architecture and bottom-up data warehouse architecture.
traditional set operators  the union, intersection, and difference operators of relational algebra.
transaction  a unit of work that should be processed reliably. DBMSs provide recovery and concurrency control services to process transactions efficiently and reliably.
transaction boundary  an important decision of transaction design in which an application consisting of a collection of SQL statements is divided into one or more transactions. Transaction boundary decisions can affect (positively or negatively) transaction throughput.
transaction log  a table that contains a history of database changes. The recovery manager uses the log to recover from failures.
transaction processing  reliable and efficient processing of large volumes of repetitive work. DBMSs ensure that simultaneous users do not interfere with each other and that failures do not cause lost work. See also transaction.
Transaction Processing Council (TPC)  an organization that develops standard, domain-specific benchmarks and publishes the results. The TPC has developed benchmarks for order-entry transaction processing, ad hoc decision support queries, business reporting decision support, and Web e-commerce transaction processing. See also benchmark.
transaction-processing monitor  an early and still important kind of database middleware. A transaction-processing monitor receives transactions, schedules them, and manages them to completion. Transaction-processing monitors also may support updating multiple databases in a single transaction.
transaction throughput  the number of transactions processed per time interval. It is a measure of transaction processing performance. Typically, transaction throughput is reported in transactions per minute.
transaction time  the time when an operational data source is updated.
transitive dependency  a functional dependency derived by the law of transitivity. Transitive FDs should not be recorded as input to the normalization process. See also functional dependency and law of transitivity.
trigger  a rule that is stored and executed by a DBMS. Because a trigger involves an event, a condition, and a sequence of actions, it also is known as an event-condition-action rule. Triggers were not part of SQL-92, although many vendors provided extensions for them. Triggers are part of SQL:2003. See also overlapping triggers.
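The event-condition-action structure is visible in trigger syntax. A minimal sketch using Python's sqlite3 module (the product and price_log tables are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (prod_no INT PRIMARY KEY, price REAL);
    CREATE TABLE price_log (prod_no INT, old_price REAL, new_price REAL);

    -- Event: UPDATE OF price; condition: the WHEN clause;
    -- action: the statements between BEGIN and END.
    CREATE TRIGGER log_price_change
    AFTER UPDATE OF price ON product
    WHEN OLD.price <> NEW.price
    BEGIN
        INSERT INTO price_log VALUES (OLD.prod_no, OLD.price, NEW.price);
    END;

    INSERT INTO product VALUES (1, 10.0);
    UPDATE product SET price = 12.0 WHERE prod_no = 1;
""")
log = conn.execute("SELECT * FROM price_log").fetchall()
```

The UPDATE fires the trigger, which records the old and new prices in the log table.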
trigger execution procedure  specifies the order of execution among various kinds of triggers, integrity constraints, and database manipulation statements. Trigger execution procedures can be complex because the actions of a trigger may fire other triggers. See also overlapping triggers.
two-phase commit protocol (2PC)  a rule to ensure that distributed transactions are atomic. 2PC uses a voting and a decision phase to coordinate commits of local transactions.
two-phase locking protocol (2PL)  a rule to ensure that concurrent transactions do not interfere with each other. 2PL requires that locks are used before reading or writing a database item, a transaction waits if a conflicting lock is held on a data item, and locks are not released until new locks are no longer needed. To simplify implementation, most DBMSs hold at least exclusive locks until a transaction commits.
two-tier architecture  a client-server architecture in which a PC client and a database server interact directly to request and transfer data. The PC client contains the user interface code, the server contains the data access logic, and the PC client and the server share the validation and business logic. See also three-tier architecture and multiple-tier architecture.
two-tier data warehouse architecture  an architecture for a data warehouse in which user departments directly use the data warehouse rather than smaller data marts. See also three-tier data warehouse architecture and bottom-up data warehouse architecture.
Type I nested query  a nested query in which the inner query does not reference any tables used in the outer query. Type I nested queries can be used for some join problems and some difference problems.
Type II nested query  a nested query in which the inner query references a table used in the outer query. Type II nested queries can be used for difference problems but should be avoided for join problems.
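The two kinds of nested query can be contrasted side by side using Python's sqlite3 module (the student and enroll tables are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (stu_no INT PRIMARY KEY, name TEXT);
    CREATE TABLE enroll (stu_no INT, course TEXT);
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO enroll VALUES (1, 'DB');
""")

# Type I: the inner query does not reference the outer query's tables;
# here it solves a join problem (students with some enrollment).
type1 = conn.execute("""
    SELECT name FROM student
    WHERE stu_no IN (SELECT stu_no FROM enroll)
""").fetchall()

# Type II: the inner query references the outer table (s.stu_no);
# NOT EXISTS solves a difference problem (students with no enrollment).
type2 = conn.execute("""
    SELECT name FROM student s
    WHERE NOT EXISTS (SELECT * FROM enroll e WHERE e.stu_no = s.stu_no)
""").fetchall()
```

The Type I query returns the enrolled student; the Type II query returns the student with no enrollments.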
uncommitted dependency  a concurrency control problem in which one transaction reads data written by another transaction before the other transaction commits. If the second transaction aborts, the first transaction has read phantom data that will no longer exist. Also known as a dirty read.
uniform value assumption  assuming that each column value is equally likely (has the same number of rows). The uniform value assumption allows compact representation of a distribution, but it can lead to large estimation errors that lead to poor choices in query optimization and index selection.
union  an operator of relational algebra that combines rows from two tables. The result of a union operation has all the rows from either table. Both tables must be union compatible to use the union operator.
union compatibility  a requirement for the union, intersection, and difference operators of relational algebra. Union compatibility requires that both tables have the same number of columns and each corresponding column must have a compatible data type.
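The three traditional set operators applied to two union-compatible tables can be sketched with Python's sqlite3 module (tables t1 and t2 are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (v INT);
    CREATE TABLE t2 (v INT);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (2), (3), (4);
""")

# t1 and t2 are union compatible: same number of columns with
# compatible data types.
union = conn.execute("SELECT v FROM t1 UNION SELECT v FROM t2").fetchall()
inter = conn.execute("SELECT v FROM t1 INTERSECT SELECT v FROM t2").fetchall()
diff = conn.execute("SELECT v FROM t1 EXCEPT SELECT v FROM t2").fetchall()
```

Union yields all rows from either table, intersection the rows common to both, and difference the rows of t1 not in t2.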
updatable view  a view that can be used in SELECT statements as well as UPDATE, INSERT, and DELETE statements. When modifying the rows of an updatable view, the DBMS translates the view modifications into modifications to rows of the base tables.
valid time  the time when an event occurs.
valid time lag  the difference between the occurrence of an event in the real world (valid time) and the storage of the event in an operational database (transaction time). See also valid time, load time lag, and transaction time.
view  virtual or derived table. A view is derived from base or physical tables using a query. See also materialized view.
view materialization  a method to process a query on a view by executing the query directly on the stored view. The stored view can be materialized on demand (when the view query is submitted) or periodically rebuilt from the base tables. For data warehouses, materialization is the preferred strategy for processing view queries.
view modification  a method to process a query on a view involving the execution of only one query. A query using a view is translated into a query using base tables by replacing references to the view with its definition. For transaction databases, view modification is the preferred strategy for processing most view queries.
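A query on a view and the modified query the DBMS effectively executes can be shown with Python's sqlite3 module (the emp table and it_emp view are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (emp_no INT PRIMARY KEY, dept TEXT, salary INT);
    INSERT INTO emp VALUES (1, 'IT', 60), (2, 'HR', 50), (3, 'IT', 70);
    CREATE VIEW it_emp AS SELECT emp_no, salary FROM emp WHERE dept = 'IT';
""")

# With view modification, this query on the view is translated into a
# single query on the base table, roughly:
#   SELECT emp_no FROM emp WHERE dept = 'IT' AND salary > 65
rows = conn.execute("SELECT emp_no FROM it_emp WHERE salary > 65").fetchall()
```

Only one query executes against the base table; the view's WHERE condition is merged with the condition on the view.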
volatile storage  storage that loses its state when the power is disconnected. Main memory is typically volatile. Nonvolatile storage does not lose its state when power is disconnected. A hard disk is an example of nonvolatile storage.
waterfall model  a reference framework for information systems development. The waterfall model consists of iteration among analysis, design, and implementation.
weak entity  an entity type that borrows part or all of its primary key from another entity type. A weak entity is also existent dependent. See also identification dependency and identifying relationship.
web services architecture  an architecture that supports electronic commerce among organizations. A set of related Internet standards supports high interoperability among service requestors, service providers, and service registries. The most important standard is the Web Services Description Language (WSDL), used by service requestors, service providers, and service registries.
WITH CHECK OPTION  a clause in the CREATE VIEW statement that can be used to prevent updates with side effects. If the WITH CHECK OPTION is specified, INSERT or UPDATE statements that violate a view's WHERE clause are rejected.
workflow  a collection of related tasks structured to accomplish a business process.
World Wide Web (WWW)  a collection of pages that can be viewed over the Internet. In the WWW, a browser displays pages sent by a Web server. The WWW is the most popular application on the Internet.
write ahead log protocol  in the immediate update recovery process, log records must be written to stable storage before corresponding database records.
XML (eXtensible markup language)  a simple language that supports the specification of other languages. XML has evolved into a collection of languages that separate the content, structure, and formatting of documents on the World Wide Web. The XML Schema Language, an important member of the XML language family, supports standardization of the structure of XML documents.
Mannino DB Design 3e, Online Learning Center with PowerWeb