Basic Relational Data Model

Introduction
Basic concepts of information models, their realisation in databases comprising data objects and object relationships, and their management by DBMS’s that separate structure (schema) from content, were introduced in the last chapter. The need for a DDL to define the structure of entities and their relationships, and for a DML to specify manipulation of database contents were also established. These concepts, however, were presented in quite abstract terms, with no commitment to any particular data structure for entities or links nor to any particular function to manipulate data objects.

There is no single method for organising data in a database, and many methods have in fact been proposed and used as the basis of commercial DBMS’s. A method must fully specify:

1. the rules according to which data are structured, and
2. the associated operations permitted
The first is typically expressed and encapsulated in a DDL, while the second, in an associated DML. Any such method is termed a Logical Data Model (often simply referred to as a Data Model). In short,

Data Model = DDL + DML
and may be seen as a technique for the formal description of data structure, usage constraints and allowable operations. The facilities available typically vary from one Data Model to another.

Each DBMS may therefore be said to maintain a particular Data Model (see Figure 2- 1).

More formally, a Data Model is a combination of at least three components:

(1) A collection of data structure types
(2) A collection of operators or rules of inference, which can be applied to any valid instance of data types in (1)
(3) A collection of general integrity rules, which implicitly or explicitly define the set of consistent database states or change of state or both

It is important to note at this point that a Data Model is a logical representation of data which is then realised on specific hardware and software platforms (its implementation, or physical representation as illustrated in Figure 2-1). In fact, there can be many different implementations of a given model, running on different hardware and operating systems and differing perhaps in their efficiency, performance, reliability, user interface, additional utilities and tools, physical limitations (eg. maximum size of databases), costs, etc. (see Figure 2-2). All of them, however, will support a mandatory minimal set of facilities defined for that data model. This is analogous to programming languages and their implementations, eg. there are many C compilers and many of them implement an agreed set of standard features regardless of the hardware and software platforms they run on. But as with programming languages, we need not concern ourselves with the variety of implementations when developing database applications - knowledge of the basic logical data model is sufficient for us to do that.

It is also important not to confuse the terms information model and data model. The former is an abstraction of a real world problem domain and talks of entities, relationships and instances (data objects) specific to that domain. The latter provides a domain independent formal framework for expressing and manipulating the abstractions of any information model. In other words, an information model is a description, by means of a data model, of the real world.

2.2 Relation Perhaps the simplest approach to data modelling is offered by the Relational Data Model, proposed by Dr. Edgar F. Codd of IBM in 1970. The model was subsequently expanded and refined by its creator and very quickly became the main focus of practically all research activities in databases. The basic relational model specifies a data structure, the so-called Relation, and several forms of high-level languages to manipulate relations.

The term relation in this model refers to a two-dimensional table of data. In other words, according to the model, information is arranged in columns and rows. The term relation, rather than matrix, is used here because data values in the table are not necessarily homogenous (ie. not all of the same type as, for example, in matrices of integers or real numbers). More specifically, the values in any row are not homogenous. Values in any given column, however, are all of the same type.

A relation has a unique name and represents a particular entity. Each row of a relation, referred to as a tuple, is a collection of facts (values) about a particular individual of that entity. In other words, a tuple represents an instance of the entity represented by the relation.

Figure 2-4 illustrates a relation called ‘Customer’, intended to represent the set of persons who are customers of some enterprise. Each tuple in the relation therefore represents a single customer.

The columns of a relation hold values of attributes that we wish to associate with each entity instance, and each is labelled with a distinct attribute name at the top of the column. This name, of course, provides a unique reference to the entire column or to a particular value of a tuple in the relation. But more than that, it denotes a domain of values that is defined over all relations in the database. The term domain is used to refer to a set of values of the same kind or type. It should be clearly understood, however, that while a domain comprises values of a given type, it is not necessarily the same as that type. For example, the column ‘Cname’ and ‘Ccity’ in Figure 2-4 both have values of type string (ie. valid values are any string). But they denote different domains, ie. ‘Cname’ denotes the domain of customer names while ‘Ccity’ denotes the domain of city names. They are different domains even if they share common values. For example, the string ‘Paris’ can conceivably occur in the Column ‘Cname’ (a person named Paris). Its meaning, however, is quite different to the occurrence of the string ‘Paris’ in the column ‘Ccity’ (a city named Paris)! Thus it is quite meaningless to compare values from different domains even if they are of the same type.
basic-relational-data-model-4

Moreover, in the relational model, the term domain refers to the current set of values found under an attribute name. Thus, if the relation in Figure 2-4 is the only relation in the database, the domain of ‘Cname’ is the set {Codd, Martin, Deen}, while that of ‘Ccity’ is {London, Paris}. But if there were other relations and an attribute name occurs in more than one of them, then its domain is the union of values in all columns with that name.

Enroll now!

Theoretical Paper

Practical Paper

Industrial Training

Basic Relational Data Model