
CS 2255 DATABASE MANAGEMENT SYSTEMS

UNIT I INTRODUCTION
A database is a structured collection of data. The data are typically organized to model relevant aspects of reality (for example, the availability of rooms in hotels) in a way that supports processes requiring this information (for example, finding a hotel with vacancies). The term database is correctly applied to the data and their supporting data structures, and not to the database management system (DBMS). The database data collection together with the DBMS is called a database system. The term database system implies that the data are managed to some level of quality (measured in terms of accuracy, availability, usability, and resilience), and this in turn often implies the use of a general-purpose database management system (DBMS).[1]

A general-purpose DBMS is typically a complex software system that meets many usage requirements to properly maintain its databases, which are often large and complex. This is especially the case with client-server, near-real-time transactional systems, in which multiple users have access to the data, and data are concurrently entered and queried in ways that preclude single-thread batch processing. Most of the complexity of those requirements is still present with personal, desktop-based database systems. Well-known DBMSs include Oracle, FoxPro, IBM DB2, Linter, Microsoft Access, Microsoft SQL Server, MySQL, PostgreSQL and SQLite.

A database is not generally portable across different DBMSs, but different DBMSs can inter-operate to some degree by using standards like SQL and ODBC together to support a single application built over more than one database. A DBMS also needs to provide effective run-time execution to properly support (e.g., in terms of performance, availability, and security) as many database end-users as needed.

One way to classify databases involves the type of their contents, for example: bibliographic, document-text, statistical, or multimedia objects. Another way is by their application area, for example: accounting, music compositions, movies, banking, manufacturing, or insurance.

The term database may be narrowed to specify particular aspects of an organized collection of data: it may refer to the logical database, to the physical database as data content in computer data storage, or to many other database sub-definitions.

Purpose of Database System


A DBMS has evolved into a complex software system, and its development typically requires thousands of person-years of effort. Some general-purpose DBMSs, like Oracle, Microsoft SQL Server, FoxPro, and IBM DB2, have been undergoing upgrades for thirty years or more. General-purpose DBMSs aim to satisfy as many applications as possible, which typically makes them even more complex than special-purpose databases. However, the fact that they can be used "off the shelf", as well as their cost amortized over many applications and instances, makes them an attractive alternative to one-time development whenever they meet an application's requirements.

Though attractive in many cases, a general-purpose DBMS is not always the optimal solution. When certain applications are pervasive, with many operating instances, each with many users, a general-purpose DBMS may introduce unnecessary overhead and too large a "footprint" (a large amount of unnecessary, unutilized software code). Such applications usually justify dedicated development. Typical examples are email systems: although they need to possess certain DBMS properties, they are built in a way that optimizes the handling and management of email messages and do not need significant portions of general-purpose DBMS functionality.

Views of data
In database theory, a view consists of a stored query accessible as a virtual table in a relational database, or a set of documents in a document-oriented database, composed of the result set of a query or of map-and-reduce functions. Unlike ordinary tables (base tables) in a relational database, a view does not form part of the physical schema: it is a dynamic, virtual table computed or collated from data in the database. Changing the data in a table alters the data shown in subsequent invocations of the view. In some NoSQL databases, views are the only way to query data.

Views can provide advantages over tables:

* Views can represent a subset of the data contained in a table.
* Views can join and simplify multiple tables into a single virtual table.
* Views can act as aggregated tables, where the database engine aggregates data (sum, average, etc.) and presents the calculated results as part of the data.
* Views can hide the complexity of data; for example, a view could appear as Sales2000 or Sales2001, transparently partitioning the actual underlying table.
* Views take very little space to store; the database contains only the definition of a view, not a copy of all the data it presents.
* Depending on the SQL engine used, views can provide extra security.
* Views can limit the degree of exposure of a table or tables to the outer world.

Just as functions (in programming) can provide abstraction, so database users can create abstraction by using views. In another parallel with functions, database users can manipulate nested views; thus one view can aggregate data from other views. Without the use of views, the normalization of databases above second normal form would become much more difficult. Views can make it easier to create a lossless join decomposition.

Just as rows in a base table lack any defined ordering, rows available through a view do not appear with any default sorting. A view is a relational table, and the relational model defines a table as a set of rows. Since sets are not ordered, by definition, the rows in a view are not ordered either. Therefore, an ORDER BY clause in the view definition is meaningless. The SQL standard (SQL:2003) does not allow an ORDER BY clause in a subselect in a CREATE VIEW statement, just as it is not allowed in a CREATE TABLE statement. However, sorted data can be obtained from a view in the same way as from any other table, as part of a query statement. Nevertheless, some DBMSs (such as Oracle Database and SQL Server) allow a view to be created with an ORDER BY clause in a subquery, affecting how data is displayed.
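As a concrete illustration, the following is a minimal SQL sketch of a view exposing a subset of a table's data; the Sales2000 table and its column names are assumptions made for this example only.

-- Hypothetical base table; column names are assumed for illustration.
CREATE TABLE Sales2000 (
    order_id   INTEGER PRIMARY KEY,
    customer   VARCHAR(100),
    amount     DECIMAL(10,2),
    sale_date  DATE
);

-- A view exposing only some columns and rows; no data is copied,
-- only the view definition is stored.
CREATE VIEW LargeSales2000 AS
    SELECT order_id, customer, amount
    FROM Sales2000
    WHERE amount > 1000;

-- The view is queried like a table; ordering is applied in the query, not in the view.
SELECT * FROM LargeSales2000 ORDER BY amount DESC;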

Data models

A data model is an abstract structure that provides the means to effectively describe the specific data structures needed to model an application. As such, a data model needs sufficient expressive power to capture the needed aspects of applications. These applications are often typical of commercial companies and other organizations (like manufacturing, human resources, stock, banking, etc.). For effective utilization and handling, it is desirable that a data model be relatively simple and intuitive. This may conflict with the high expressive power needed to deal with certain complex applications. Thus any popular general-purpose data model usually strikes a balance between being intuitive and relatively simple, and providing high expressive power at the cost of complexity. The application's semantics is usually not explicitly expressed in the model, but rather implicit (and detailed by documentation external to the model) and hinted at by the names of data item types (e.g., "part-number") and their connections (as expressed by the generic data structure types provided by each specific model).

Hierarchical model

In the hierarchical model, different record types (representing real-world entities) are embedded in a predefined hierarchical (tree-like) structure. This hierarchy is used as the physical order of records in storage. Record access is done by navigating through the data structure using pointers combined with sequential accessing. This model has been supported primarily by the IBM IMS DBMS, one of the earliest DBMSs. Various limitations of the model have been compensated for in later IMS versions by additional logical hierarchies imposed on the base physical hierarchy.

Network model

In this model a hierarchical relationship between two record types (representing real-world entities) is established by the set construct. A set consists of circular linked lists where one record type, the set owner or parent, appears once in each circle, and a second record type, the subordinate or child, may appear multiple times in each circle. In this way a hierarchy may be established between any two record types, e.g., type A is the owner of B. At the same time another set may be defined where B is the owner of A. Thus all the sets comprise a general directed graph (ownership defines a direction), or network construct. Access to records is either sequential (usually in each record type) or by navigation in the circular linked lists.

This model is more general and powerful than the hierarchical model, and it was the most popular model before being replaced by the relational model. It was standardized by CODASYL. Popular DBMS products that utilized it were Cincom Systems' Total and Cullinet's IDMS. IDMS gained a considerable customer base and is still supported today. In the 1980s it adopted the relational model and SQL in addition to its original tools and languages.

Inverted file model

An inverted file or inverted index of a first file, by a field in this file (the inversion field), is a second file in which this field is the key. A record in the second file includes a key and pointers to records in the first file where the inversion field has the value of the key. This is also the logical structure of contemporary database indexes. The related inverted file data model utilizes inverted files of primary database files to efficiently and directly access needed records in these files. Notable for using this data model is the ADABAS DBMS of Software AG, introduced in 1970. ADABAS gained a considerable customer base and is still supported today. In the 1980s it adopted the relational model and SQL in addition to its original tools and languages.

Relational model

The relational model is a simple model that provides flexibility. It organizes data based on two-dimensional arrays known as relations, or tables as they relate to databases. These relations consist of a heading and a set of zero or more tuples in arbitrary order. The heading is an unordered set of zero or more attributes, or columns of the table. Each tuple is a set of unique attributes mapped to values, i.e., a row of data in the table. Data can be associated across multiple tables with a key. A key is a single attribute, or set of attributes, that is common to both tables. The most common language associated with the relational model is the Structured Query Language (SQL), though it differs from the model in some places. The entity-relationship model, treated in detail later in this unit, is commonly used to design relational databases.

Object model

Object databases have been applied in areas such as engineering and spatial databases, telecommunications, and various scientific domains. The conglomeration of object-oriented programming and database technology led to this new kind of database. These databases attempt to bring the database world and the application-programming world closer together, in particular by ensuring that the database uses the same type system as the application program. This aims to avoid the overhead (sometimes referred to as the impedance mismatch) of converting information between its representation in the database (for example, as rows in tables) and its representation in the application program (typically as objects). At the same time, object databases attempt to introduce key ideas of object programming, such as encapsulation and polymorphism, into the world of databases. A variety of approaches have been tried for storing objects in a database. Some products have approached the problem from the application-programming side, by making the objects manipulated by the program persistent. This also typically requires the addition of some kind of query language, since conventional programming languages do not provide language-level functionality for finding objects based on their information content. Others have attacked the problem from the database end, by defining an object-oriented data model for the database, and defining a database programming language that allows full programming capabilities as well as traditional query facilities.

Object-relational model

Data models that extend the relational model are sometimes classified as post-relational. Alternate terms include "hybrid database", "object-enhanced RDBMS" and others. The data model in such products incorporates relations but is not constrained by E.F. Codd's Information Principle, which requires that all information in the database must be cast explicitly in terms of values in relations and in no other way. Some of these extensions to the relational model integrate concepts from technologies that predate the relational model. For example, they allow representation of a directed graph with trees on the nodes. The German company sones implements this concept in its GraphDB. Some post-relational products extend relational systems with non-relational features. Others arrived in much the same place by adding relational features to pre-relational systems.

Paradoxically, this allows products that are historically pre-relational, such as PICK and MUMPS, to make a plausible claim to be post-relational. The resource space model (RSM) is a non-relational data model based on multi-dimensional classification.

Database languages
Database languages are dedicated programming languages, tailored and utilized to define a database (i.e., its specific data types and the relationships among them), manipulate its content (e.g., insert new data occurrences, and update or delete existing ones), and query it (i.e., request information: compute and retrieve any information based on its data). Database languages are data-model-specific, i.e., each language assumes and is based on a certain structure of the data (which typically differs among data models). They typically have commands to instruct execution of the desired operations in the database. Each such command is equivalent to a complex expression (program) in a regular programming language, and thus programming in dedicated database languages simplifies the task of handling databases considerably. An expression in a database language is automatically transformed (by a compiler or interpreter, as with regular programming languages) into a proper computer program that runs while accessing the database and providing the needed results. The following are notable examples.

SQL for the relational model

A major relational model language, supported by all relational DBMSs, and a standard. SQL was one of the first commercial languages for the relational model. Despite not adhering to the relational model as described by Codd, it has become the most widely used database language.[10][11] Though often described as, and to a great extent being, a declarative language, SQL also includes procedural elements. SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in 1987. Since then the standard has been enhanced several times with added features. However, issues of SQL code portability between major RDBMS products still exist due to lack of full compliance with, or different interpretations of, the standard. Among the reasons mentioned are the large size and incomplete specification of the standard, as well as vendor lock-in.

OQL for the object model

An object model language standard (by the Object Data Management Group) that has influenced the design of some of the newer query languages like JDOQL and EJB QL, though they cannot be considered different flavors of OQL.

XQuery for the XML model

XQuery is an XML-based database language (also named XQL). SQL/XML combines XQuery and XML with SQL.
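To make the three roles of a database language concrete, here is a minimal SQL sketch covering definition, manipulation, and query; the Hotel table and its columns are hypothetical, echoing the hotel-vacancy example from the introduction.

-- Definition: declare the structure of the data.
CREATE TABLE Hotel (
    hotel_id   INTEGER PRIMARY KEY,
    city       VARCHAR(50),
    vacancies  INTEGER
);

-- Manipulation: insert, update, or delete data occurrences.
INSERT INTO Hotel (hotel_id, city, vacancies) VALUES (1, 'Chennai', 12);
UPDATE Hotel SET vacancies = vacancies - 1 WHERE hotel_id = 1;

-- Query: request information computed from the stored data.
SELECT city, vacancies FROM Hotel WHERE vacancies > 0;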

Database System Architecture


Database architecture (to be distinguished from DBMS architecture; see below) may be viewed, to some extent, as an extension of data modeling. It is used to conveniently answer the requirements of different end-users from the same database, as well as for other benefits. For example, the financial department of a company needs the payment details of all employees as part of the company's expenses, but not the many other details about employees that are of interest to the human resources department. Thus different departments need different views of the company's database; both include the employees' payments, but possibly at a different level of detail (and presented in different visual forms). To meet such requirements effectively, database architecture consists of three levels: external, conceptual and internal. Clearly separating the three levels was a major feature of the relational database model implementations that dominate 21st century databases.[13]

The external level defines how each end-user type understands the organization of its respective relevant data in the database, i.e., the different needed end-user views. A single database can have any number of views at the external level.

The conceptual level unifies the various external views into a coherent whole, global view.[13] It provides the common denominator of all the external views. It comprises all the generic data needed by end-users, i.e., all the data from which any view may be derived or computed. It presents this generic data in the simplest possible way and comprises the backbone of the database. It is out of the scope of the various database end-users; it serves database application developers and is defined by the database administrators who build the database.

The internal level (or physical level) is, as a matter of fact, part of the database implementation inside a DBMS (see Implementation section below). It is concerned with cost, performance, scalability and other operational matters. It deals with the storage layout of the conceptual level, provides supporting storage structures like indexes to enhance performance, and occasionally stores data of individual views (materialized views), computed from generic data, if performance justification exists for such redundancy. It balances all the external views' performance requirements, which may conflict, in an attempt to optimize overall database usage by all its end-users according to the database goals and priorities.

All three levels are maintained and updated according to changing needs by database administrators, who often also participate in the database design. The above three-level database architecture also relates to, and is motivated by, the concept of data independence, which has long been described as a desired database property and was one of the major initial driving forces of the relational model. In the context of this architecture it means that changes made at a certain level do not affect definitions and software developed against higher-level interfaces, and are incorporated at the higher level automatically. For example, changes in the internal level do not affect application programs written using conceptual-level interfaces, which saves substantial change work that would otherwise be needed.

In summary, the conceptual level is a level of indirection between the internal and the external levels. On one hand it provides a common view of the database, independent of the different external view structures, and on the other hand it is uncomplicated by details of how the data are stored or managed (internal level). In principle every level, and even every external view, can be presented by a different data model. In practice a given DBMS usually uses the same data model for both the external and the conceptual levels (e.g., the relational model). The internal level, which is hidden inside the DBMS and depends on its implementation (see Implementation section below), requires a different level of detail and uses its own data structure types, typically different in nature from the structures of the external and conceptual levels which are exposed to DBMS users (e.g., the data models above): while the external and conceptual levels are focused on and serve DBMS users, the concern of the internal level is effective implementation detail.
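A minimal SQL sketch of the three levels, using a hypothetical Employee table (all names and columns are assumptions for illustration): the base table plays the role of the conceptual level, a view serves as an external view for the finance department, and an index is an internal-level storage structure.

-- Conceptual level: the generic data from which any view can be derived.
CREATE TABLE Employee (
    emp_id   INTEGER PRIMARY KEY,
    name     VARCHAR(100),
    dept     VARCHAR(50),
    salary   DECIMAL(10,2),
    address  VARCHAR(200)
);

-- External level: the finance department sees only payment-related data.
CREATE VIEW FinanceView AS
    SELECT emp_id, salary FROM Employee;

-- Internal level: a storage structure added purely for performance.
CREATE INDEX idx_employee_dept ON Employee (dept);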

Database users and administrators

A database administrator (DBA) is a person responsible for the installation, configuration, upgrade, administration, monitoring and maintenance of databases in an organization.[1] The role includes the development and design of database strategies, system monitoring, improving database performance and capacity, and planning for future expansion requirements. DBAs may also plan, co-ordinate and implement security measures to safeguard the database.

Entity-Relationship model (E-R model)


An Entity-Relationship model (ER model for short) is an abstract way to describe a database. It usually starts with a relational database, which stores data in tables. Some of the data in these tables point to data in other tables; for instance, your entry in the database could point to several entries for each of the phone numbers that are yours. The ER model would say that you are an entity, each phone number is an entity, and the relationship between you and the phone numbers is 'has a phone number'. Diagrams created to design these entities and relationships are called entity-relationship diagrams or ER diagrams. This section refers to the techniques proposed in Peter Chen's 1976 paper.[1] However, variants of the idea existed previously,[2] and others have been devised subsequently, such as supertype and subtype data entities[3] and commonality relationships (an example with additional concepts is the enhanced entity-relationship model).

Using the three-schema approach to software engineering, there are three levels of ER models that may be developed.

The conceptual data model is the highest-level ER model in that it contains the least granular detail but establishes the overall scope of what is to be included within the model set. The conceptual ER model normally defines master reference data entities that are commonly used by the organization. Developing an enterprise-wide conceptual ER model is useful to support documenting the data architecture for an organization. A conceptual ER model may be used as the foundation for one or more logical data models. The purpose of the conceptual ER model is then to establish structural metadata commonality for the master data entities between the set of logical ER models. The conceptual data model may be used to form commonality relationships between ER models as a basis for data model integration.

A logical ER model does not require a conceptual ER model, especially if the scope of the logical ER model is to develop a single disparate information system. The logical ER model contains more detail than the conceptual ER model. In addition to master data entities, operational and transactional data entities are now defined. The details of each data entity are developed and the relationships between these data entities are established. The logical ER model is, however, developed independently of the technology into which it will be implemented.

One or more physical ER models may be developed from each logical ER model. The physical ER model is normally developed to be instantiated as a database. Therefore, each physical ER model must contain enough detail to produce a database, and each physical ER model is technology dependent since each database management system is somewhat different. The physical model is normally forward engineered to instantiate the structural metadata into a database management system as relational database objects such as database tables, database indexes such as unique key indexes, and database constraints such as a foreign key constraint or a commonality constraint. The ER model is also normally used to design modifications to the relational database objects and to maintain the structural metadata of the database. A minimal sketch of this forward engineering step is given below.
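The following is a minimal, hypothetical SQL sketch (entity and column names are assumptions, following the phone-number example above) of forward engineering a physical ER model into relational database objects: a table, a foreign key constraint, and a unique key index.

-- The Person entity becomes a table.
CREATE TABLE Person (
    person_id  INTEGER PRIMARY KEY,
    name       VARCHAR(100)
);

-- The PhoneNumber entity becomes a table; the 'has a phone number'
-- relationship becomes a foreign key constraint.
CREATE TABLE PhoneNumber (
    phone_id   INTEGER PRIMARY KEY,
    number     VARCHAR(20),
    person_id  INTEGER,
    CONSTRAINT fk_phone_person
        FOREIGN KEY (person_id) REFERENCES Person (person_id)
);

-- A unique key index enforcing that each phone number is stored only once.
CREATE UNIQUE INDEX idx_phone_number ON PhoneNumber (number);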

The first stage of information system design uses these models during requirements analysis to describe information needs or the type of information that is to be stored in a database. The data modeling technique can be used to describe any ontology (i.e., an overview and classification of the terms used and their relationships) for a certain area of interest. In the case of the design of an information system that is based on a database, the conceptual data model is, at a later stage (usually called logical design), mapped to a logical data model, such as the relational model; this in turn is mapped to a physical model during physical design. Note that sometimes both of these phases are referred to as "physical design".

E-R Diagrams

Two related entities

An entity with an attribute

A relationship with an attribute

Primary key

An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of a domain. When we speak of an entity, we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world.[4] An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order. Although the term entity is the one most commonly used, following Chen we should really distinguish between an entity and an entity-type. An entity-type is a category. An entity, strictly speaking, is an instance of a given entity-type. There are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for it. Entities can be thought of as nouns. Examples: a computer, an employee, a song, a mathematical theorem.

Introduction to relational databases


A relational database is a collection of data items organized as a set of formally described tables from which data can be accessed easily. A relational database is created using the relational model. The software used in a relational database is called a relational database management system (RDBMS). The relational database is the predominant choice for storing data, over other models like the hierarchical database model or the network model. The relational database was first defined in June 1970 by Edgar Codd of IBM's San Jose Research Laboratory.

Terminology

Relational database theory uses a set of mathematical terms, which are roughly equivalent to SQL database terminology. The table below summarizes some of the most important relational database terms and their SQL database equivalents.

Relational term                         SQL equivalent
relation, base relvar                   table
derived relvar                          view, query result, result set
tuple                                   row
attribute (name)                        column name
relation scheme, set of attributes      set of column names
attribute value                         column (data)

UNIT II RELATIONAL MODEL

The relational model


The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar F. Codd.[1][2] In the relational model of a database, all data is represented in terms of tuples, grouped into relations. A database organized in terms of the relational model is a relational database.

Diagram of an example database according to the Relational model.

In the relational model, related records are linked together with a "key". The purpose of the relational model is to provide a declarative method for specifying data and queries: users directly state what information the database contains and what information they want from it, and let the database management system software take care of describing data structures for storing the data and retrieval procedures for answering queries. Most implementations of the relational model use the SQL data definition and query language. A table in an SQL database schema corresponds to a predicate variable; the contents of a table to a relation; key constraints, other constraints, and SQL queries correspond to predicates. However, SQL databases, including DB2, deviate from the relational model in many details; Codd fiercely argued against deviations that compromise the original principles.

The catalog: Types and Keys


* Alternate key - An alternate key is any candidate key which is not selected to be the primary key.

* Candidate key - A candidate key is a field or combination of fields that can act as a primary key for a table, uniquely identifying each record in that table. For example, in the table Employee(Name, Address, Ssn, Employee_Id primary key, Phone_ext), both Ssn and Employee_Id are candidate keys.

* Compound key - A compound key (also called a composite key or concatenated key) is a key that consists of two or more attributes.

* Primary key - A primary key is a value that can be used to identify a unique row in a table. Attributes are associated with it. Examples of primary keys are Social Security numbers (associated with a specific person) or ISBNs (associated with a specific book). In the relational model of data, a primary key is a candidate key chosen as the main method of uniquely identifying a tuple in a relation. For example: Employee(Name, Address, Ssn, Employee_Id primary key, Phone_ext).

* Superkey - A superkey is defined in the relational model as a set of attributes of a relation variable (relvar) for which it holds that, in all relations assigned to that variable, there are no two distinct tuples (rows) that have the same values for the attributes in this set. Equivalently, a superkey can also be defined as a set of attributes of a relvar upon which all attributes of the relvar are functionally dependent. For example, in Employee(Name, Address, Ssn, Employee_Id primary key, Phone_ext), the sets <Ssn, Name, Address>, <Ssn, Name> and <Ssn> are all superkeys.

* Foreign key - A foreign key (FK) is a field or group of fields in a database record that points to a key field or group of fields of another database record in some (usually different) table. Usually a foreign key in one table refers to the primary key (PK) of another table; this is how references are made to link tables. For example, given School(Name, Address, Phone, School_Reg_no primary key), a Student record could carry School_Reg_no as a foreign key referring to the School table.
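A minimal SQL sketch of these key declarations, following the School/Student example above; the Student columns beyond those mentioned in the text are assumptions.

CREATE TABLE School (
    Name           VARCHAR(100),
    Address        VARCHAR(200),
    Phone          VARCHAR(20),
    School_Reg_no  INTEGER PRIMARY KEY        -- primary key: uniquely identifies each school
);

CREATE TABLE Student (
    Roll_no        INTEGER PRIMARY KEY,       -- assumed primary key for Student
    Name           VARCHAR(100),
    School_Reg_no  INTEGER,
    FOREIGN KEY (School_Reg_no) REFERENCES School (School_Reg_no)  -- foreign key
);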

Relational algebra
In computer science, relational algebra is an offshoot of first-order logic and of the algebra of sets, concerned with operations over finitary relations, usually made more convenient to work with by identifying the components of a tuple by a name (called an attribute) rather than by a numeric column index; this is what is called a relation in database terminology. The main application of relational algebra is providing a theoretical foundation for relational databases, particularly query languages for such databases, chief among which is SQL.

Introduction

Relational algebra received little attention outside of pure mathematics until the publication of E.F. Codd's relational model of data in 1970. Codd proposed such an algebra as a basis for database query languages. Both a named and an unnamed perspective are possible for relational algebra, depending on whether the tuples are endowed with component names or not. In the unnamed perspective, a tuple is simply a member of a Cartesian product. In the named perspective, tuples are functions from a finite set U of attributes (of the relation) to a domain of values (assumed distinct from U).[1] The relational algebras obtained from the two perspectives are equivalent.[2] Typical undergraduate textbooks present only the named perspective,[3][4] and this text follows suit.

Relational algebra is essentially equivalent in expressive power to relational calculus (and thus first-order logic); this result is known as Codd's theorem. One must be careful to avoid a mismatch that may arise between the two languages because negation, applied to a formula of the calculus, constructs a formula that may be true on an infinite set of possible tuples, while the difference operator of relational algebra always returns a finite result. To overcome these difficulties, Codd restricted the operands of relational algebra to finite relations only and also proposed restricted support for negation (NOT) and disjunction (OR). Analogous restrictions are found in many other logic-based computer languages. Codd defined the term relational completeness to refer to a language that is complete with respect to first-order predicate calculus apart from the restrictions he proposed. In practice the restrictions have no adverse effect on the applicability of his relational algebra for database purposes.

Primitive operations

As in any algebra, some operators are primitive and the others are derived in terms of the primitive ones. It is useful if the choice of primitive operators parallels the usual choice of primitive logical operators.

Five primitive operators of Codd's algebra are the selection, the projection, the Cartesian product (also called the cross product or cross join), the set union, and the set difference. Another operator, rename, was not noted by Codd, but the need for it is shown by the inventors of ISBL. These six operators are fundamental in the sense that omitting any one of them causes a loss of expressive power. Many other operators have been defined in terms of these six. Among the most important are set intersection, division, and the natural join. In fact ISBL made a compelling case for replacing the Cartesian product with the natural join, of which the Cartesian product is a degenerate case. Altogether, the operators of relational algebra have identical expressive power to that of domain relational calculus or tuple relational calculus. However, for the reasons given in the Introduction, relational algebra is less expressive than first-order predicate calculus without function symbols. Relational algebra corresponds to a subset of first-order logic, namely Horn clauses without recursion and negation.

Set operators

Although three of the six basic operators are taken from set theory, there are additional constraints that are present in their relational algebra counterparts. For set union and set difference, the two relations involved must be union-compatible, that is, the two relations must have the same set of attributes. Because set intersection can be defined in terms of set difference, the two relations involved in set intersection must also be union-compatible. The Cartesian product is defined differently from the one in set theory in the sense that tuples are considered to be 'shallow' for the purposes of the operation. That is, the Cartesian product of an n-tuple by an m-tuple has the 2-tuple "flattened" into an (n + m)-tuple. In set theory, the Cartesian product is a set of 2-tuples. More formally, R × S is defined as follows:

R × S = {(r1, r2, ..., rn, s1, s2, ..., sm) | (r1, r2, ..., rn) ∈ R, (s1, s2, ..., sm) ∈ S}

As with the Cartesian product of set theory, the cardinality of the result is the product of the cardinalities of its factors, i.e., |R × S| = |R| × |S|. In addition, for the Cartesian product to be defined, the two relations involved must have disjoint headers, that is, they must not have a common attribute name.

Projection (π)

A projection is a unary operation written as π_{a1, ..., an}(R), where {a1, ..., an} is a set of attribute names. The result of such a projection is defined as the set that is obtained when all tuples in R are restricted to the set {a1, ..., an}.

This specifies the specific subset of columns (attributes of each tuple) to be retrieved. To obtain the names and phone numbers from an address book, the projection might be written π_{contactName, contactPhoneNumber}(addressBook). The result of that projection would be a relation which contains only the contactName and contactPhoneNumber attributes for each unique entry in addressBook.

Selection (σ)

A generalized selection is a unary operation written as σ_φ(R), where φ is a propositional formula that consists of atoms as allowed in the normal selection and the logical operators ∧ (and), ∨ (or) and ¬ (negation). This selection selects all those tuples in R for which φ holds. To obtain a listing of all friends or business associates in an address book, the selection might be written as σ_{isFriend = true ∨ isBusinessContact = true}(addressBook). The result would be a relation containing every attribute of every unique record where isFriend is true or where isBusinessContact is true.

Rename (ρ)

A rename is a unary operation written as ρ_{a/b}(R), where the result is identical to R except that the b attribute in all tuples is renamed to an a attribute. This is simply used to rename an attribute of a relation or the relation itself.

To rename the 'isFriend' attribute to 'isBusinessContact' in a relation, ρ_{isBusinessContact/isFriend}(addressBook) might be used.
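For readers more familiar with SQL, here is a hedged sketch of how the three operations above roughly translate, assuming a hypothetical addressBook table with the attributes used in the examples; SQL operates on multisets of rows, so the correspondence to the set-based algebra is only approximate.

-- Projection: π_{contactName, contactPhoneNumber}(addressBook)
SELECT DISTINCT contactName, contactPhoneNumber FROM addressBook;

-- Selection: σ_{isFriend = true ∨ isBusinessContact = true}(addressBook)
SELECT * FROM addressBook WHERE isFriend = TRUE OR isBusinessContact = TRUE;

-- Rename: ρ_{isBusinessContact/isFriend}(addressBook), approximated with a column alias
SELECT isFriend AS isBusinessContact, contactName, contactPhoneNumber FROM addressBook;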

Domain relational calculus


In computer science, domain relational calculus (DRC) is a calculus that was introduced by Michel Lacroix and Alain Pirotte as a declarative database query language for the relational data model.[1] In DRC, queries have the form:

{ <X1, X2, ..., Xn> | p(<X1, X2, ..., Xn>) }

where each Xi is either a domain variable or a constant, and p(<X1, X2, ..., Xn>) denotes a DRC formula. The result of the query is the set of tuples <X1, ..., Xn> that make the DRC formula true.
This language uses the same operators as tuple calculus: the logical connectives ∧ (and), ∨ (or) and ¬ (not). The existential quantifier (∃) and the universal quantifier (∀) can be used to bind the variables. Its computational expressiveness is equivalent to that of relational algebra.[2]

Examples

Let <A, B, C> mean <Rank, Name, ID> and <D, E, F> mean <Name, DeptName, ID>.

Find all captains of the starship USS Enterprise:

{ <A, B, C> | <A, B, C> ∈ Enterprise ∧ A = 'Captain' }

In this example, A, B, C denotes both the result set and a set in the table Enterprise.

Find the names of Enterprise crew members who are in Stellar Cartography (the department relation is assumed here to be named Departments):

{ <B> | ∃A, C ( <A, B, C> ∈ Enterprise ∧ ∃D, E, F ( <D, E, F> ∈ Departments ∧ F = C ∧ E = 'Stellar Cartography' ) ) }

In this example, we're only looking for the name, and that's B. F = C is a requirement, because we need to find Enterprise crew members AND they are in the Stellar Cartography Department.

Tuple relational calculus


Tuple calculus is a calculus that was introduced by Edgar F. Codd as part of the relational model, in order to provide a declarative database query language for this data model. It formed the inspiration for the database query languages QUEL and SQL, of which the latter, although far less faithful to the original relational model and calculus, is now the de facto standard database query language; a dialect of SQL is used by nearly every relational database management system. Lacroix and Pirotte proposed domain calculus, which is closer to first-order logic, and it was shown that both of these calculi (as well as relational algebra) are equivalent in expressive power. Subsequently, query languages for the relational model were called relationally complete if they could express at least all of these queries.

Definition of the calculus

Relational database

Since the calculus is a query language for relational databases, we first have to define a relational database. The basic relational building block is the domain, or data type. A tuple is an ordered multiset of attributes, which are ordered pairs of domain and value; or just a row. A relvar (relation variable) is a set of ordered pairs of domain and name, which serves as the header for a relation. A relation is a set of tuples. Although these relational concepts are mathematically defined, those definitions map loosely to traditional database concepts. A table is an accepted visual representation of a relation; a tuple is similar to the concept of a row.

We first assume the existence of a set C of column names, examples of which are "name", "author", "address", et cetera. We define headers as finite subsets of C. A relational database schema is defined as a tuple S = (D, R, h), where D is the domain of atomic values (see the relational model for more on the notions of domain and atomic value), R is a finite set of relation names, and h : R → 2^C is a function that associates a header with each relation name in R. (Note that this is a simplification of the full relational model, where there is more than one domain and a header is not just a set of column names but also maps these column names to a domain.) Given a domain D, we define a tuple over D as a partial function t : C → D that maps some column names to an atomic value in D. An example would be (name : "Harry", age : 25). The set of all tuples over D is denoted TD. The subset of C for which a tuple t is defined is called the domain of t (not to be confused with the domain in the schema) and denoted dom(t).

Finally, we define a relational database, given a schema S = (D, R, h), as a function db : R → 2^TD that maps the relation names in R to finite subsets of TD, such that for every relation name r in R and tuple t in db(r) it holds that dom(t) = h(r). The latter requirement simply says that all the tuples in a relation should contain the same column names, namely those defined for it in the schema.

Fundamental operations Additional operations


SQL, or Structured Query Language, is a special-purpose programming language designed for managing data in relational database management systems (RDBMS). Originally based upon relational algebra and tuple relational calculus, its scope includes data insert, query, update and delete, schema creation and modification, and data access control. SQL was one of the first commercial languages for Edgar F. Codd's relational model, as described in his influential 1970 paper, "A Relational Model of Data for Large Shared Data Banks".[4] Despite not adhering to the relational model as described by Codd, it became the most widely used database language.[5][6] Although SQL is often described as, and to a great extent is, a declarative language, it also includes procedural elements. SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in 1987. Since then, the standard has been enhanced several times with added features. However, issues of SQL code portability between major RDBMS products still exist due to lack of full compliance with, or different interpretations of, the standard. Among the reasons mentioned are the large size and incomplete specification of the standard, as well as vendor lock-in.

SQL fundamentals
Language elements

The SQL language is subdivided into several language elements, including:

* Clauses, which are constituent components of statements and queries. (In some cases, these are optional.)[10]
* Expressions, which can produce either scalar values or tables consisting of columns and rows of data.
* Predicates, which specify conditions that can be evaluated to SQL three-valued logic (3VL) or Boolean (true/false/unknown) truth values, and which are used to limit the effects of statements and queries or to change program flow.
* Queries, which retrieve data based on specific criteria. This is the most important element of SQL.
* Statements, which may have a persistent effect on schemata and data, or which may control transactions, program flow, connections, sessions, or diagnostics.

SQL statements also include the semicolon (";") statement terminator. Though not required on every platform, it is defined as a standard part of the SQL grammar. A short annotated statement illustrating these elements follows.
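A hedged sketch, using a hypothetical Employee table, of how these language elements appear in a single query statement:

-- The whole SELECT, terminated by a semicolon, is a statement (and a query).
SELECT name, salary * 12 AS annual_salary   -- 'salary * 12' is an expression
FROM Employee                               -- FROM ... is a clause
WHERE salary > 30000                        -- 'salary > 30000' is a predicate
ORDER BY annual_salary;                     -- ORDER BY ... is another clause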

Integrity
In computing, data integrity refers to maintaining and assuring the accuracy and consistency of data over its entire life-cycle,[1] and is an especially important feature of a database or RDBMS system. Data warehousing and business intelligence in general demand the accuracy, validity and correctness of data despite hardware failures, software bugs or human error. Data that has integrity is identically maintained during any operation, such as transfer, storage or retrieval. All characteristics of data, including business rules, rules for how pieces of data relate, dates, definitions and lineage must be correct for its data integrity to be complete. When functions operate on the data, the functions must ensure integrity. Examples include transforming the data, storing history and storing metadata.

Types of integrity constraints


Data integrity is normally enforced in a database system by a series of integrity constraints or rules. Three types of integrity constraints are an inherent part of the relational data model: entity integrity, referential integrity and domain integrity:

* Entity integrity concerns the concept of a primary key. Entity integrity is an integrity rule which states that every table must have a primary key and that the column or columns chosen to be the primary key must be unique and not null.

* Referential integrity concerns the concept of a foreign key. The referential integrity rule states that any foreign-key value can only be in one of two states. The usual state of affairs is that the foreign-key value refers to a primary-key value of some table in the database. Occasionally, and this will depend on the rules of the data owner, a foreign-key value can be null. In this case we are explicitly saying that either there is no relationship between the objects represented in the database or that this relationship is unknown.

* Domain integrity specifies that all columns in a relational database must be declared upon a defined domain. The primary unit of data in the relational data model is the data item. Such data items are said to be non-decomposable or atomic. A domain is a set of values of the same type. Domains are therefore pools of values from which the actual values appearing in the columns of a table are drawn.

The sketch below shows how these three constraint types are typically declared.
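A minimal SQL sketch of the three constraint types; the Department and Employee tables and their columns are assumptions for illustration.

CREATE TABLE Department (
    dept_id  INTEGER PRIMARY KEY                 -- entity integrity: unique and not null
);

CREATE TABLE Employee (
    emp_id   INTEGER PRIMARY KEY,                -- entity integrity
    dept_id  INTEGER REFERENCES Department (dept_id),  -- referential integrity
    salary   DECIMAL(10,2) CHECK (salary >= 0)   -- domain integrity on the salary column
);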

If a database supports these features, it is the responsibility of the database to ensure data integrity as well as the consistency model for data storage and retrieval. If a database does not support these features, it is the responsibility of the applications to ensure data integrity, while the database supports the consistency model for data storage and retrieval. Having a single, well-controlled, and well-defined data-integrity system increases:

* stability (one centralized system performs all data integrity operations)
* performance (all data integrity operations are performed in the same tier as the consistency model)
* re-usability (all applications benefit from a single centralized data integrity system)
* maintainability (one centralized system for all data integrity administration).

As of 2012, since all modern databases support these features (see Comparison of relational database management systems), it has become the de facto responsibility of the database to ensure data integrity. Outdated and legacy systems that use file systems (text, spreadsheets, ISAM, flat files, etc.) for their consistency model lack any kind of data-integrity model. This requires organizations to invest a large amount of time, money, and personnel in building data-integrity systems on a per-application basis that effectively just duplicate the existing data-integrity systems found in modern databases. Many companies, and indeed many database systems themselves, offer products and services to migrate outdated and legacy systems to modern databases to provide these data-integrity features. This offers organizations substantial savings in time, money, and resources because they do not have to develop per-application data-integrity systems that must be re-factored each time business requirements change.

Trigger
A database trigger is procedural code that is automatically executed in response to certain events on a particular table or view in a database. The trigger is mostly used for maintaining the integrity of the information in the database. For example, when a new record (representing a new worker) is added to the employees table, new records should also be created in the tables for taxes, vacations and salaries. A minimal DML trigger for this case is sketched below.

Triggers in Microsoft SQL Server

Microsoft SQL Server supports triggers either after or instead of an insert, update or delete operation. They can be set on tables and views, with the constraint that a view can be referenced only by an INSTEAD OF trigger. Microsoft SQL Server 2005 introduced support for Data Definition Language (DDL) triggers, which can fire in reaction to a very wide range of events, including:

* Drop table
* Create table
* Alter table
* Login events

A full list is available on MSDN. Performing conditional actions in triggers (or testing data following modification) is done through accessing the temporary Inserted and Deleted tables.
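A hedged Transact-SQL sketch of the new-worker example above; the employees and salaries tables and their columns are assumptions. It uses the temporary inserted table mentioned above to copy each new worker's key into a salaries record.

CREATE TRIGGER trg_new_employee
ON employees
AFTER INSERT
AS
BEGIN
    -- For each newly inserted worker, create a matching salaries record.
    INSERT INTO salaries (emp_id, amount)
    SELECT emp_id, 0          -- amount initialized to 0 as a placeholder
    FROM inserted;
END;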

Security
SQL Server 2012: By default, both DML and DDL triggers execute under the context of the user that calls the trigger. The caller of a trigger is the user that executes the statement that causes the trigger to run. For example, if user Mary executes a DELETE statement that causes DML trigger DML_trigMary to run, the code inside DML_trigMary executes in the context of the user privileges for Mary. This default behavior can be exploited by users who want to introduce malicious code into the database or server instance. For example, the following DDL trigger is created by user JohnDoe:

CREATE TRIGGER DDL_trigJohnDoe
ON DATABASE
FOR ALTER_TABLE
AS
    GRANT CONTROL SERVER TO JohnDoe;
GO

What this trigger means is that as soon as a user that has permission to execute a GRANT CONTROL SERVER statement, such as a member of the sysadmin fixed server role, executes an ALTER TABLE statement, JohnDoe is granted CONTROL SERVER permission. In other words, although JohnDoe cannot grant CONTROL SERVER permission to himself, he enabled the trigger code that grants him this permission to execute under escalated privileges. Both DML and DDL triggers are open to this kind of security threat.

Advanced SQL features


Simple Features (officially Simple Feature Access) is both an OpenGIS and ISO standard (ISO 19125) that specifies a common storage model of mostly two-dimensional geographical data (point, line, polygon, multi-point, multi-line, etc.). The ISO 19125 standard comes in two parts. Part one, ISO 19125-1 (SFA-CA for "common architecture"), defines a model for two-dimensional simple features with linear interpolation between vertices. The data model defined in SFA-CA is a hierarchy of classes. This part also defines representation using Well-Known Text (and Binary). Part 2 of the standard, ISO 19125-2 (SFA-SQL), defines an implementation using SQL.[1] The OpenGIS standards cover implementations in CORBA and OLE/COM as well, although these have lagged behind the SQL one and are not standardized by ISO. The ISO/IEC 13249-3 SQL/MM Spatial standard extends the Simple Features data model mainly with circular interpolations (e.g. circular arcs) and adds other features like coordinate transformations and methods for validating geometries, as well as Geography Markup Language support.

Embedded SQL
Embedded SQL is a method of combining the computing power of a programming language with the database manipulation capabilities of SQL. Embedded SQL statements are SQL statements written inline with the program source code of the host language. The embedded SQL statements are parsed by an embedded SQL preprocessor and replaced by host-language calls to a code library. The output from the preprocessor is then compiled by the host compiler. This allows programmers to embed SQL statements in programs written in any number of languages, such as C/C++, COBOL and Fortran. The ANSI SQL standards committee defined the embedded SQL standard in two steps: a formalism called Module Language was defined, then the embedded SQL standard was derived from Module Language.[1] The SQL standard defines embedding of SQL as embedded SQL, and the language in which SQL queries are embedded is referred to as the host language. A popular host language is C. Mixed C and embedded SQL is called Pro*C in the Oracle and Sybase database management systems. In the PostgreSQL database management system this precompiler is called ECPG. Other embedded SQL precompilers are Pro*Ada, Pro*COBOL, Pro*FORTRAN, Pro*Pascal, and Pro*PL/I.

PL/SQL supports variables, conditions, loops and exceptions. Arrays are also supported, though in a somewhat unusual way, involving the use of PL/SQL collections; PL/SQL collections are a slightly advanced topic. Implementations from version 8 of Oracle Database onwards have included features associated with object orientation.

Dynamic SQL
Once program units have been stored in the database, they become available for execution at a later time. While programmers can readily embed Data Manipulation Language (DML) statements directly into their PL/SQL code using straightforward SQL statements, Data Definition Language (DDL) requires more complex "dynamic SQL" statements to be written in the PL/SQL code. However, DML statements underpin the majority of PL/SQL code in typical software applications. For PL/SQL dynamic SQL, early versions of the Oracle Database required the use of the complicated DBMS_SQL package library. More recent versions have introduced a simpler "Native Dynamic SQL", along with an associated EXECUTE IMMEDIATE syntax. Oracle Corporation customarily extends package functionality with each successive release of the Oracle Database.
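A minimal PL/SQL sketch of Native Dynamic SQL (the table name and columns are assumptions), showing a DDL statement issued through EXECUTE IMMEDIATE:

BEGIN
    -- DDL cannot appear as a plain static statement inside PL/SQL,
    -- so it is passed as a string to EXECUTE IMMEDIATE at run time.
    EXECUTE IMMEDIATE 'CREATE TABLE audit_log (id NUMBER, note VARCHAR2(200))';
END;
/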

Introduction to distributed databases and client/server databases.


A distributed database is a database in which storage devices are not all attached to a common processing unit such as the CPU. It may be stored on multiple computers located in the same physical location, or may be dispersed over a network of interconnected computers. Unlike parallel systems, in which the processors are tightly coupled and constitute a single database system, a distributed database system consists of loosely coupled sites that share no physical components. Collections of data (e.g., in a database) can be distributed across multiple physical locations. A distributed database can reside on network servers on the Internet, on corporate intranets or extranets, or on other company networks. The replication and distribution of databases improves database performance at end-user worksites.[1]

To ensure that distributed databases are up to date and current, there are two processes: replication and duplication. Replication involves using specialized software that looks for changes in the distributed database. Once the changes have been identified, the replication process makes all the databases look the same. The replication process can be very complex and time consuming, depending on the size and number of the distributed databases, and can also require a lot of time and computer resources. Duplication, on the other hand, is not as complicated. It basically identifies one database as a master and then duplicates that database. The duplication process is normally done at a set time after hours, to ensure that each distributed location has the same data. In the duplication process, changes are allowed only to the master database, to ensure that local data will not be overwritten. Both of these processes can keep the data current in all distributed locations.[2]

Besides distributed database replication and fragmentation, there are many other distributed database design technologies, for example local autonomy, and synchronous and asynchronous distributed database technologies. The implementation of these technologies can and does depend on the needs of the business and the sensitivity/confidentiality of the data to be stored in the database, and hence on the price the business is willing to pay to ensure data security, consistency and integrity.

UNIT III DATABASE DESIGN

Functional dependencies

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. Given a relation R, a set of attributes X in R is said to functionally determine another set of attributes Y, also in R (written X → Y), if and only if each X value is associated with precisely one Y value; R is then said to satisfy the functional dependency X → Y. Equivalently, the projection of R onto X ∪ Y is a function, i.e. Y is a function of X.[1][2] In simple words, if the values for the X attributes are known (say they are x), then the values for the Y attributes corresponding to x can be determined by looking them up in any tuple of R containing x. Customarily X is called the determinant set and Y the dependent set. A functional dependency FD: X → Y is called trivial if Y is a subset of X.

The determination of functional dependencies is an important part of designing databases in the relational model, and in database normalization and denormalization. A simple application of functional dependencies is Heath's theorem; it says that a relation R over an attribute set U and satisfying a functional dependency X → Y can be safely split into two relations having the lossless-join decomposition property, namely R1(X ∪ Y) and R2(X ∪ Z), where Z = U − XY is the set of the remaining attributes. (Unions of attribute sets are customarily denoted by mere juxtapositions in database theory.) An important notion in this context is a candidate key, defined as a minimal set of attributes that functionally determine all of the attributes in a relation. The functional dependencies, along with the attribute domains, are selected so as to generate constraints that would exclude as much data inappropriate to the user domain from the system as possible.

A notion of logical implication is defined for functional dependencies in the following way: a set of functional dependencies Σ logically implies another set of dependencies Γ if any relation R satisfying all dependencies from Σ also satisfies all dependencies from Γ; this is usually written Σ ⊨ Γ. The notion of logical implication for functional dependencies admits a sound and complete finite axiomatization, known as Armstrong's axioms.
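As a small worked example (the relation and attribute names are illustrative): let R(A, B, C) satisfy the dependency set Σ = {A → B, B → C}. By the transitivity rule of Armstrong's axioms, A → B and B → C together imply A → C, so Σ ⊨ {A → C}. Since A then determines every attribute of R, {A} is a candidate key of R, while a dependency such as AB → B is trivial because B ⊆ AB.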

Non-loss decomposition

A non-loss (lossless-join) decomposition splits a relation R into two or more projections in such a way that joining the projections back together reproduces exactly the original relation, with no tuples lost and no spurious tuples introduced. Heath's theorem, stated above, gives a sufficient condition: if R over attribute set U satisfies the functional dependency X → Y, then decomposing R into R1(X ∪ Y) and R2(X ∪ Z), where Z = U − XY, is a non-loss decomposition.

In the process of efficiently storing data and eliminating redundancy, tables in a database are designed and created to be in one of five possible normal forms. Each normal form contains and enforces the rules of the previous form and, in turn, applies some stricter rules to the design of tables.

A set of tables in a database is initially said to be in 0 normal form.

First Normal Form:


Tables are said to be in first normal form when:

- The table has a primary key.
- No single attribute (column) has multiple values.
- The non-key attributes (columns) depend on the primary key.

An example of placing a table in first normal form:

   author_id   stories
   ---------   ----------------------------
   000024      novelist, playwright           // multiple values
   000034      magazine columnist
   002345      novella, newspaper columnist   // multiple values

In first normal form the table would look like:

   author_id   stories
   ---------   -------------------
   000024      novelist
   000024      playwright
   000034      magazine columnist
   002345      novella
   002345      newspaper columnist
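The table in first normal form could be declared as follows; a minimal sketch in which the table name author_stories and the column types are assumptions made only for illustration. The composite primary key (author_id, stories) guarantees that each row carries a single value for each attribute.

-- First-normal-form version of the example above; names and types are illustrative.
CREATE TABLE author_stories (
    author_id  CHAR(6)      NOT NULL,
    stories    VARCHAR2(40) NOT NULL,
    PRIMARY KEY (author_id, stories)   -- composite key: one value per row
);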

Second Normal Form:


Tables are said to be in second normal form when:

- The tables meet the criteria for first normal form.
- If the primary key is a composite of attributes (contains multiple columns), the non-key attributes (columns) must depend on the whole key.

Note: Any table with a primary key that is composed of a single attribute (column) is automatically in second normal form.
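As an illustration (all table and column names here are hypothetical), consider an order_items table with the composite key (order_id, product_id). If product_name depends only on product_id, it depends on part of the key rather than the whole key, so the table is not in second normal form; the usual remedy is to move the attribute into its own table:

-- Second-normal-form sketch; names and types are illustrative only.
CREATE TABLE products (
    product_id   NUMBER PRIMARY KEY,
    product_name VARCHAR2(60)           -- depends only on product_id
);

CREATE TABLE order_items (
    order_id   NUMBER,
    product_id NUMBER REFERENCES products (product_id),
    quantity   NUMBER,                  -- depends on the whole key
    PRIMARY KEY (order_id, product_id)
);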

Third Normal Form:


Tables are said to be in third normal form when:

- The tables meet the criteria for second normal form.
- No non-key attribute depends on another non-key attribute; every non-key attribute depends only on the key (i.e. there are no transitive dependencies).
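For example (again with hypothetical names), an employees table that stores both dept_id and dept_name contains a transitive dependency, because dept_name depends on the non-key column dept_id rather than directly on the key emp_id. Splitting out a departments table removes it:

-- Third-normal-form sketch; names and types are illustrative only.
CREATE TABLE departments (
    dept_id   NUMBER PRIMARY KEY,
    dept_name VARCHAR2(60)
);

CREATE TABLE employees (
    emp_id   NUMBER PRIMARY KEY,
    emp_name VARCHAR2(60),
    dept_id  NUMBER REFERENCES departments (dept_id)
);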

Dependency preservation
A decomposition of a relation is dependency preserving if every functional dependency of the original relation can be checked using only the individual decomposed relations, without joining them back together. Formally, if a relation R with dependency set F is decomposed into R1, ..., Rn, the decomposition preserves dependencies when the union of the projections of F onto R1, ..., Rn logically implies all of F. Dependency preservation matters because it lets the DBMS enforce constraints cheaply on single tables. For example, the relation Address(city, street, zip) with the dependencies {city street → zip, zip → city} can be decomposed into (street, zip) and (city, zip); this decomposition is lossless, but the dependency city street → zip can no longer be checked within either table alone, so the decomposition is not dependency preserving.

Boyce-Codd normal form


Boyce-Codd normal form (BCNF or 3.5NF) is a normal form used in database normalization. It is a slightly stronger version of the third normal form (3NF). BCNF was developed in 1974 by Raymond F. Boyce and Edgar F. Codd to address certain types of anomaly not dealt with by 3NF as originally defined.[1] Chris Date has pointed out that a definition of what we now know as BCNF appeared in a paper by Ian Heath in 1971.[2] Date writes: "Since that definition predated Boyce and Codd's own definition by some three years, it seems to me that BCNF ought by rights to be called Heath normal form. But it isn't."[3]

If a relational schema is in BCNF then all redundancy based on functional dependency has been removed, although other types of redundancy may still exist. A relational schema R is in Boyce-Codd normal form if and only if, for every one of its dependencies X → Y, at least one of the following conditions holds:[4]

- X → Y is a trivial functional dependency (Y ⊆ X), or
- X is a superkey for schema R.
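A standard illustrative example (relation and attribute names are hypothetical): in a relation Tutoring(student, subject, tutor), suppose each tutor teaches exactly one subject (tutor → subject) and each student has at most one tutor per subject ({student, subject} → tutor). The relation is in 3NF, because subject is part of the candidate key {student, subject}, yet it is not in BCNF: tutor → subject is non-trivial and tutor is not a superkey. Decomposing into (tutor, subject) and (student, tutor) yields BCNF relations, at the cost of no longer being able to enforce {student, subject} → tutor within a single table.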

Multivalued dependency
In database theory, a multivalued dependency is a full constraint between two sets of attributes in a relation. In contrast to the functional dependency, the multivalued dependency requires that certain tuples be present in a relation. Therefore, a multivalued dependency is a special case of tuple-generating dependency. The multivalued dependency plays a role in 4NF database normalization. A multivalued dependency is a special case of a join dependency, with only two sets of values involved, i.e. it is a 2-ary join dependency.

Formal definition
The formal definition is given as follows.[1] Let R be a relation schema and let α ⊆ R and β ⊆ R be sets of attributes. The multivalued dependency α ↠ β (which can be read as "α multidetermines β") holds on R if, in any legal relation r(R), for every pair of tuples t1 and t2 in r such that t1[α] = t2[α], there exist tuples t3 and t4 in r such that:

t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β] and t3[R − β] = t2[R − β]
t4[β] = t2[β] and t4[R − β] = t1[R − β]

In simpler words: if we denote by (x, y, z) the tuple having values for α, β and R − α − β collectively equal to x, y and z, then whenever the tuples (a, b, c) and (a, d, e) exist in r, the tuples (a, b, e) and (a, d, c) should also exist in r.
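A standard example (attribute names are illustrative): in a relation Course(course, teacher, book), suppose each course has a set of possible teachers and a set of recommended books, and the two sets are independent of each other. Then course ↠ teacher and course ↠ book hold. If the course Databases is taught by T1 and T2 and uses books B1 and B2, all four combinations (Databases, T1, B1), (Databases, T1, B2), (Databases, T2, B1) and (Databases, T2, B2) must appear; this is exactly the kind of redundancy that fourth normal form, discussed next, removes.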

Fourth normal form

Fourth normal form (4NF) is a normal form used in database normalization. Introduced by Ronald Fagin in 1977, 4NF is the next level of normalization after Boyce-Codd normal form (BCNF). Whereas the second, third, and Boyce-Codd normal forms are concerned with functional dependencies, 4NF is concerned with a more general type of dependency known as a multivalued dependency. A table is in 4NF if and only if, for every one of its non-trivial multivalued dependencies X ↠ Y, X is a superkey, that is, X is either a candidate key or a superset thereof.[1]

Join dependencies and fifth normal form.


A join dependency is a constraint on the set of legal relations over a database scheme. A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T. If one of the tables in the join has all the attributes of the table T, the join dependency is called trivial. The join dependency plays an important role in the fifth normal form, also known as project-join normal form, because it can be proven that if a scheme R is decomposed into tables R1 to Rn, the decomposition will be a lossless-join decomposition if the legal relations on R are restricted to those satisfying the join dependency on R, written *(R1, R2, ..., Rn).

Another way to describe a join dependency is to say that the relationships in the join dependency are independent of each other. Unlike in the case of functional dependencies, there is no sound and complete axiomatization for join dependencies.

Fifth normal form

Fifth normal form (5NF), also known as project-join normal form (PJ/NF), is a level of database normalization designed to reduce redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships. A table is said to be in 5NF if and only if every join dependency in it is implied by the candidate keys. A join dependency *{A, B, ..., Z} on R is implied by the candidate key(s) of R if and only if each of A, B, ..., Z is a superkey for R.[1]

Only in rare situations does a 4NF table not conform to 5NF. These are situations in which a complex real-world constraint governing the valid combinations of attribute values in the 4NF table is not implicit in the structure of that table. If such a table is not normalized to 5NF, the burden of maintaining the logical consistency of the data within the table must be carried partly by the application responsible for insertions, deletions, and updates to it; and there is a heightened risk that the data within the table will become inconsistent. In contrast, the 5NF design excludes the possibility of such inconsistencies.

UNIT IV TRANSACTIONS

Transaction concepts


A transaction comprises a unit of work performed within a database management system (or similar system) against a database, and treated in a coherent and reliable way independent of other transactions. Transactions in a database environment have two main purposes:

1. To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure, when execution stops (completely or partially) and many operations upon a database remain uncompleted, with unclear status.
2. To provide isolation between programs accessing a database concurrently. If this isolation is not provided, the programs' outcomes may be erroneous.

A database transaction, by definition, must be atomic, consistent, isolated and durable.[1] Database practitioners often refer to these properties of database transactions using the acronym ACID.

Transactions provide an "all-or-nothing" proposition, stating that each work-unit performed in a database must either complete in its entirety or have no effect whatsoever. Further, the system must isolate each transaction from other transactions, results must conform to existing constraints in the database, and transactions that complete successfully must get written to durable storage.

Transaction recovery
A transactional database is a DBMS in which write transactions on the database are able to be rolled back if they are not completed properly (e.g. due to power or connectivity loss). Most modern relational database management systems support transactions. In a database system a transaction might consist of one or more data-manipulation statements and queries, each reading and/or writing information in the database. Users of database systems consider consistency and integrity of data as highly important. A simple transaction is usually issued to the database system in a language like SQL, using a pattern similar to the following:

1. Begin the transaction.
2. Execute a set of data manipulations and/or queries.
3. If no errors occur, commit the transaction and end it.
4. If errors occur, roll back the transaction and end it.

If no errors occurred during the execution of the transaction, the system commits the transaction. A transaction commit operation applies all data manipulations within the scope of the transaction and persists the results to the database. If an error occurs during the transaction, or if the user specifies a rollback operation, the data manipulations within the transaction are not persisted to the database. In no case can a partial transaction be committed to the database, since that would leave the database in an inconsistent state.
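In SQL the pattern above looks roughly as follows; a minimal sketch in which the accounts table, its columns and the account numbers are purely illustrative. Dialects differ: MySQL and PostgreSQL begin an explicit transaction with START TRANSACTION, whereas Oracle starts one implicitly at the first DML statement.

-- Transfer 100 from account 1 to account 2 as a single unit of work.
START TRANSACTION;

UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;

COMMIT;          -- make both changes permanent together

-- Had either UPDATE failed, the application would instead issue:
-- ROLLBACK;     -- undo everything done since the transaction began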

Internally, multi-user databases store and process transactions, often by using a transaction ID or XID. There are multiple ways for transactions to be implemented other than the simple way documented above. Nested transactions, for example, are transactions which contain statements within them that start new transactions (i.e. sub-transactions). Multi-level transactions are similar but have a few extra properties. Another type of transaction is the compensating transaction.

ACID properties

In computer science, ACID (atomicity, consistency, isolation, durability) is a set of properties that guarantee that database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction. For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction. Jim Gray defined these properties of a reliable transaction system in the late 1970s and developed technologies to achieve them automatically.[1] In 1983, Andreas Reuter and Theo Härder coined the acronym ACID to describe them.

Atomicity: Atomicity requires that each transaction is "all or nothing": if one part of the transaction fails, the entire transaction fails, and the database state is left unchanged. An atomic system must guarantee atomicity in each and every situation, including power failures, errors, and crashes.

Consistency: The consistency property ensures that any transaction will bring the database from one valid state to another. Any data written to the database must be valid according to all defined rules, including but not limited to constraints, cascades, triggers, and any combination thereof.

Isolation: The isolation property ensures that the concurrent execution of transactions results in a system state that could have been obtained if the transactions were executed serially, i.e. one after the other. Each transaction has to execute in total isolation; if T1 and T2 are being executed concurrently, then each should remain unaware of the other's presence.

Durability: Durability means that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors. In a relational database, for instance, once a group of SQL statements execute, the results need to be stored permanently (even if the database crashes immediately thereafter).
System recovery

UNIT V IMPLEMENTATION TECHNIQUES

Overview of Physical Storage Media


Several types of data storage exist in most computer systems. They vary in speed of access, cost per unit of data, and reliability.

Cache: the most costly and fastest form of storage. It is usually very small and is managed by the computer system hardware.

Main memory (MM): the storage area for data available to be operated on; general-purpose machine instructions operate on main memory. Its contents are usually lost in a power failure or crash, and it is usually too small (even with megabytes) and too expensive to store the entire database.

Flash memory: EEPROM (electrically erasable programmable read-only memory). Data in flash memory survives power failures.

Reading data from flash memory takes about 10 nanoseconds (roughly as fast as reading from main memory), but writing is more complicated: a single write takes about 4-10 microseconds, and to overwrite what has been written one must first erase an entire bank of the memory. Flash memory may support only a limited number of erase cycles. It has found popularity as a replacement for disks for storing small volumes of data (5-10 megabytes).

Magnetic-disk storage: the primary medium for long-term storage. Typically the entire database is stored on disk. Data must be moved from disk to main memory in order to be operated on, and after operations are performed any changed data must be copied back to disk. Disk storage is called direct-access storage because it is possible to read data on the disk in any order (unlike sequential access). Disk storage usually survives power failures and system crashes.

Optical storage: CD-ROM (compact-disk read-only memory), WORM (write-once read-many) disks (for archival storage of data), and jukeboxes (containing a few drives and numerous disks loaded on demand).

Tape storage: used primarily for backup and archival data. It is cheaper, but access is much slower, since a tape must be read sequentially from the beginning; it is used as protection against disk failures. The storage-device hierarchy is presented in Figure 10.1, where the higher levels are more expensive (cost per bit) and faster (access time) but have smaller capacity.

Figure: Storage-device hierarchy

Another classification is into primary, secondary, and tertiary storage. Primary storage: the fastest storage media, such as cache and main memory. Secondary (or on-line) storage: the next level of the hierarchy, e.g. magnetic disks. Tertiary (or off-line) storage: magnetic tapes and optical-disk jukeboxes.

Volatility of storage: volatile storage loses its contents when the power is removed. Without power backup, data in volatile storage (the part of the hierarchy from main memory up) must be written to nonvolatile storage for safekeeping.

Magnetic Disks
Magnetic storage and magnetic recording are terms from engineering referring to the storage of data on a magnetized medium. Magnetic storage uses different patterns of magnetization in a magnetizable material to store data and is a form of non-volatile memory. The information is accessed using one or more read/write heads. As of 2011, magnetic storage media, primarily hard disks, are widely used to store computer data as well as audio and video signals. In the field of computing, the term magnetic storage is preferred and in the field of audio and video production, the term magnetic recording is more commonly used. The distinction is less technical and more a matter of preference. Other examples of magnetic storage media include floppy disks, magnetic recording tape, and magnetic stripes on credit cards.

Information is written to and read from the storage medium as it moves past devices called read-and-write heads that operate very close (often tens of nanometers) over the magnetic surface. The read-and-write head is used to detect and modify the magnetization of the material immediately under it. The magnetic surface is conceptually divided into many small sub-micrometer-sized magnetic regions, referred to as magnetic domains (although these are not magnetic domains in a rigorous physical sense), each of which has a mostly uniform magnetization. Due to the polycrystalline nature of the magnetic material, each of these magnetic regions is composed of a few hundred magnetic grains. Magnetic grains are typically 10 nm in size and each forms a single true magnetic domain. Each magnetic region in total forms a magnetic dipole which generates a magnetic field. In older hard disk drive (HDD) designs the regions were oriented horizontally and parallel to the disk surface, but beginning about 2005 the orientation was changed to perpendicular, to allow for closer magnetic domain spacing. For reliable storage of data, the recording material needs to resist self-demagnetization, which occurs when the magnetic domains repel each other. Magnetic domains written too densely onto a weakly magnetizable material will degrade over time due to rotation of the magnetic moment of one or more domains to cancel out these forces: the domains rotate sideways to a halfway position that weakens the readability of the domain and relieves the magnetic stresses. Older hard disk drives used iron(III) oxide as the magnetic material, but current disks use a cobalt-based alloy.[1]

A write head magnetizes a region by generating a strong local magnetic field, and a read head detects the magnetization of the regions. Early HDDs used an electromagnet both to magnetize the region and to then read its magnetic field by using electromagnetic induction. Later versions of inductive heads included metal-in-gap (MIG) heads and thin-film heads. As data density increased, read heads using magnetoresistance (MR) came into use; the electrical resistance of the head changed according to the strength of the magnetism from the platter. Later development made use of spintronics; in such read heads the magnetoresistive effect was much greater than in earlier types, and was dubbed "giant" magnetoresistance (GMR). In today's heads, the read and write elements are separate but in close proximity on the head portion of an actuator arm. The read element is typically magnetoresistive while the write element is typically thin-film inductive.[2] The heads are kept from contacting the platter surface by the air that is extremely close to the platter and moves at or near the platter speed. The record and playback heads are mounted on a block called a slider, and the surface next to the platter is shaped to keep it just barely out of contact. This forms a type of air bearing.

RAID

RAID (redundant array of independent disks, originally redundant array of inexpensive disks[1][2]) is a storage technology that combines multiple disk drive components into a logical unit. Data is distributed across the drives in one of several ways, called "RAID levels", depending on what level of redundancy and performance (via parallel communication) is required. In October 1986, the IBM S/38 announced "checksum", a software-only implementation of RAID in the operating system with a minimum of 10% overhead. The S/38 "scatter loaded" all data for performance; the downside was that the loss of any single disk required a total system restore for all disks. Under checksum, when a disk failed the system halted and was then shut down; under maintenance, the bad disk was replaced and a parity-bit disk recovery was run. The system was restarted using a recovery procedure similar to the one run after a power failure. While difficult, the recovery from a drive failure was much shorter and easier than without checksum.

RAID is an example of storage virtualization and was first defined by David Patterson, Garth A. Gibson, and Randy Katz at the University of California, Berkeley in 1987.[3] Marketers representing industry RAID manufacturers later attempted to reinvent the term to describe a redundant array of independent disks as a means of disassociating a low-cost expectation from RAID technology.[4] RAID is now used as an umbrella term for computer data storage schemes that can divide and replicate data among multiple physical drives. The physical drives are said to be "in a RAID", although the more common, incorrect parlance is to say that they are "in a RAID array".[5] The array can then be accessed by the operating system as one single drive. The different schemes or architectures are named by the word RAID followed by a number (e.g. RAID 0, RAID 1). Each scheme provides a different balance between three key goals: resiliency, performance, and capacity.

Tertiary Storage
Tertiary storage, or tertiary memory,[4] provides a third level of storage. Typically it involves a robotic mechanism which will mount (insert) and dismount removable mass storage media into a storage device according to the system's demands; these data are often copied to secondary storage before use. It is primarily used for archiving rarely accessed information, since it is much slower than secondary storage (e.g. 5-60 seconds vs. 1-10 milliseconds). This is primarily useful for extraordinarily large data stores accessed without human operators. Typical examples include tape libraries and optical jukeboxes. When a computer needs to read information from tertiary storage, it will first consult a catalog database to determine which tape or disc contains the information. Next, the computer will instruct a robotic arm to fetch the medium and place it in a drive. When the computer has finished reading the information, the robotic arm will return the medium to its place in the library.

File Organization
The arrangement of records in a file is known as file organization. File organization deals with how data items are arranged in a file on a secondary storage device. The main objectives of file organization are as follows:

1. To provide an efficient method for locating the records needed for processing.
2. To facilitate file creation and later updating.

For organizing records efficiently in the form of a computer file, the following points are important:

1. A logical method should be used to organize the records in the file.
2. The file structure should be designed to allow quick access to the needed data items.
3. There must be a means of adding records to, and deleting records from, the file.

Organization of Records in Files


A file may be organized as:

1. Serial file
2. Sequential file
3. Direct file
4. Indexed-sequential file

Serial file
In a serial file the records are placed one after another; however, there is no specific order to the arrangement of these records. On a magnetic tape storage device, records are stored in this serial manner.

Sequential file
A sequential file is a file in which the records are stored in some order, i.e. either ascending or descending.

Direct access
A sequential file is not suitable for on-line enquiry, because in sequential access the records must be traversed from the beginning to the end. Random access file organization is best suited for on-line processing systems, where current information is always required.

Index-sequential file

Index-sequential files are also known as indexed sequential access method (ISAM) files. When a file must support both batch and on-line processing, ISAM files are used. These files are basically sequential files organized serially on key fields. In addition, an index is maintained which speeds up the access of isolated records. The file is divided into a number of blocks and the highest key in each block is indexed; within each block the records are searched sequentially. This method is much faster than searching the entire file sequentially.
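In a relational DBMS the same idea, maintaining an index alongside the data to speed access to isolated records, is expressed with an index declaration; in the minimal sketch below the table orders and the column customer_id are assumed names used only for illustration.

-- The index lets the DBMS jump to matching rows instead of scanning
-- the whole table sequentially; all names are illustrative.
CREATE INDEX idx_orders_customer ON orders (customer_id);

-- A query such as the following can then be answered via the index:
SELECT order_id, order_date
FROM   orders
WHERE  customer_id = 1042;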
