Entity-Relationship (E-R) modeling
E-R diagrams (Entity-Relationship diagrams) and normalisation
There are two approaches that database designers commonly use to come up with a design for a relational database. They are used together, even though at first you may think they are completely different approaches that produce different designs for the same system!!
1. E-R diagrams.
The first approach is to produce an E-R diagram of the proposed system. The designer will ask themselves what obvious ‘entities’ exist in a system and how are they related. Entities are identifiable objects in a database about which you would store information. We have already seen lots of examples of entities such as Member, Dog, Breed, Undergraduate and Degree. Each of these entities requires a table to store real-life examples of that entity in (known as ‘records’). Each actual record is stored in a row in the appropriate table. Each record is made up of ‘fields’. A field is a piece of information you keep about an entity. In the Member’s table in the previous section, fields included, Initial, Surname, Title, Sex and Postcode, for example. Database designers often also refer to the ‘attributes’ of a particular record rather than ‘fields’ but they mean the same thing. Notice that fields are the columns in tables. Once the designer has come up with an E-R diagram of logically linked entities, they can then go ahead and build the database.
2. Normalisation.
A second technique the database designer can use is known as ‘normalisation’. This has its roots in mathematical analysis and can produce a very efficient design. It involves identifying all of the possible attributes in a database and then applying a set of rules to them in turn. Each stage in the process of normalisation can result in a ‘better’ design.
E-R diagrams and normalisation together
Normalisation will produce database designs that can be shown mathematically to be the ‘best’ design. By this, we mean a design that minimises the amount of data redundancy. However, it may not necessarily produce the best design in terms of ease of understanding for humans! In practice, the designer will use both techniques together!
The first approach
- The designer may well start a new design by producing an E-R diagram of the proposed system.
- They might then produce a Data Dictionary that details what attributes make up each entity.
- Finally, they might take each table in turn and ‘normalise’ it, to check that there are no data redundancy problems and that problems associated with adding and deleting records and amending data are removed. Normalisation in this case is used to validate the E-R diagram the designer has come up with.
The second approach
- The designer may well start a new design by producing an E-R diagram of the proposed system.
- They might then produce a Data Dictionary that details what attributes make up each entity.
- They might then list all the attributes they have identified in the Data Dictionary and normalise them together. This will produce a set of related tables.
- They then compare the design of the database using the E-R diagram they produced with the design produced by normalising the attributes.
- They will decide which design they want to go with (if they are different). It may mean that they decide to go with the E-R diagram because it is an easier design to follow, or they may go for the normalised design because it is the best for eliminating data redundancy, or they may use a hybrid of both designs, based on the designer’s prior experience. Whichever design is chosen, however, it is up to the database designer to fully justify it!!
E-R diagrams and their use in designing relational databases
An E-R diagram is a diagram that database designers use to show the relationships between groups of data (each group being known as an ‘entity’). It gives a simple yet effective overview of the entities in a system and how they relate to each other.
- It is a useful way of summarising what entities (tables) are needed in a relational database.
- It is a concise way of representing the relationships between records in tables. This can then be used both as a reference for the designer when setting up foreign keys and as a means of identifying ‘many-to-many relationships’, which can then be ‘resolved’. (See later in this section for further details about many-to-many relationships and why you need to resolve them).
- It can be used as a check to normalisation and a discussion aid. The results of both methods could be used together in design meetings. The design team can then discuss these how to go forward.
- It should be produced because it is a record of the design of the database and needs to be part of the technical manual. In the future, some maintenance of the database may be necessary. The person doing it (who may not be the original designer) needs to know the overall design and will expect to find the E-R diagram in the technical manual!
We will now look at the building blocks of E-R diagrams.
One-to-many relationships
In this database, there are two tables. One is called OWNER and the other is called PET. OWNER and PET are ‘entities’. They are things about which we keep information, or ‘attributes’. For example, the attributes we might store in the entity OWNER include ID, Surname, Contact phone number, Gender and so on.
A relationship between 2 entities must be read in two directions. In other words, to fully describe the relationship between two entities, you need two sentences, not one. The process of reading E-R diagrams is very mechanical.
- Always begin with the word ‘Each’.
- Add the entity name.
- Add the verb or phrase closest to that entity.
- Follow the relationship line to the other entity.
- Write down whether it is a ‘one’ or a ‘many’. (If it has three prongs then it is the ‘many’ side of the relationship. If it has one prong then it is the ‘one’ side.)
- Finally, write down the name of the other Entity.
In the above example, the two sentences that describe the relationship between OWNER and PET are:
-
- each OWNER owns many PETs
- each PET is owned by one OWNER.
The line used to show a one-to-many relationship is this:
Note that the ‘crow’s foot’ is the many side of the relationship.
An E-R diagram is constructed in the following way:
- Each entity is shown in a box.
- Entity names are always singular (PET not PETS, OWNER not OWNERS, for example).
- The correct type of relationship line is drawn, the right way round! The ‘crows foot’ is the many side.
- Verbs or phrases are added to aid understanding.
Many-to-many relationships
The symbol used to represent a many-to-many relationship is:
Here is an example of a many-to-many relationship:
Using the same method as before, we need to describe the relationship with two sentences:
-
- Each pupil studies many GCSEs.
- Each GCSE is studied by many pupils.
How would you implement this database? To help us understand the design, we will put some records in the tables, as before.
This is a far from ideal solution! For a start, how many attributes will you need in the GCSE table? You could have many indeed if there were lots of pupils taking a particular GCSE. More importantly, however, is that you (or rather Access or whichever software you are using) are going to have a problem recombining the two tables back into one! For example, a record in the GCSE table doesn't match to just one record in the pupil table. One GCSE will match back to many different records in the pupil table. This is not good!
Resolving many-to-many relationships
Fortunately, the solution is straightforward. It is known as 'resolving a many-to-many relationship'. If you have a many-to-many relationship, you can turn it into two one-to-many relationships by creating an extra linking table. In other words, this:
If the primary key of an entity A is attribute X and the primary key of entity B is attribute Y, then the primary key of the extra table is a ‘compound primary key’ made up of X and Y.
-
- A simple primary key is a primary key that is made up of only one attribute.
- A compound primary key is a primary key that is made up of more than one attribute.
An example of ‘resolving a many-to-many relationship’
Looking at some sample records for the Pupil-GCSE many-to-many relationship problem, we can see the following:
Reading the relationships, we have:
-
- Each record in the Pupil table (i.e. each actual pupil) has many entries in the Pupil_GCSE table. Have a look at pupil number one. She has three entries in the Pupil_GCSE table because she does three GCSEs.
- Each record in the Pupil_GCSE table (each row) relates to only one record (one pupil) in the Pupil table. Look, for example, at the last record in the Pupil_GCSE table (Pup_ID = 2 and GCSE_ID = 4). This record is related to only one record in the Pupil table. This particular record relates only to pupil number 2.
- Each GCSE appears in many different records (rows) in the Pupil_GCSE table. Look, for example, at the ICT GCSE. This GCSE appears twice in the Pupil_GCSE table. It appears in the second record and in the last one.
- Each record in the Pupil_GCSE table relates to just one GCSE. Look, for example, at the third record in the Pupil_GCSE table. (Pup_ID = 1 and GCSE_ID = 5). This record is related to only one record (one GCSE) in the GCSE table. It is related to GCSE number five, D&T.
We have resolved our many-to-many relationship. We store each pupil’s details only once. This is good! We store the details about each GCSE only once. This is also good! We have a third table that matches up each pupil to every GCSE they are taking. This table has a compound primary key made up of the primary key from the pupil table and the primary key from the GCSE table. Each compound primary key combination is unique! (Each row is unique - you cannot have two rows the same). At any time, we can recombine the data from any related records from the three tables into one record!
One-to-one relationships
This type of relationship is shown with a simple straight line. An example of this kind of relationship is shown below:
This kind of relationship is used to keep the database design compact and clear. For example, suppose you had a PRODUCT table for all products sold by a supermarket. Some of those products are CDs and some are, for example, fish! If you are keeping information (attributes) about the products, one possible set of attributes for the PRODUCT table might be the following:
Of course, the number of attributes needed could be very large. This is because ‘products’ in a supermarket aren’t restricted just to CDs and fish. They will include items such as clothes, insurance, cars, fresh vegetables and so on. Each of these products will need their own specialist attributes. For example, you might need an attribute to hold the size of clothes. Clearly, as you fill out this table with records of each actual product, some of these attributes will be left blank. For example, if you entered the details for ABBA's greatest hits, the 'Fresh / frozen' attribute will need to be left blank! So not only will you have a table with a great many different attributes, many of them for a particular product will be left blank. This is not the clearest of designs! It would be better to split up the PRODUCT table into logical entities. One entity would be purely for CDs. One would be purely for fish. Another would be for clothes, and so on.
Now you can keep the common product information in the PRODUCT table (like ID, Name, Price etc). However, any specific information relating to a particular product type would go in that product’s table. For example, if you have an ABBA CD for sale, the ID code, Name and Price would go in the Product table. There would be one entry in the CD_INFO table, where you would put the CD length, year of recording and artist. The result of using one-to-one relationships in a database is a clearer design.
Producing E-R diagrams from the beginning
Now you know how to interpret E-R diagrams, you can practice putting them together from the start. You would begin by investigating whatever system you have been asked to design a database for. You could do this (amongst other methods) by
interviewing people individually, carrying out group interviews, collecting example documents, observing people using the system or sending out questionnaires by email, fax or post. The next step would then be to write down how the system works.
A library database example
A library contains books! Pupils take out up to six books from the library. There may be more than one copy of a particular book. Books are usually written by only one author but sometimes they are written jointly, by more than one author.
This is a typical description of part of a library database. The entities in this particular system, the ‘objects’ about which we will keep details might be identified in a first attempt as BOOK, COPY_OF_BOOK, PUPIL and AUTHOR. We know that:
-
- each pupil can take out many different copies of books
- each copy of each book can only be taken out by one pupil at a time
- each book may have many copies
- each copy of each book can only be about one particular book
- each book can be written by many authors (more than one author can contribute to a single book)
- each author can write many books.
Drawing a first attempt for the E-R diagram gives us:
A few more points to note …
It is important to recognise a number of things:
-
- There is always more than one possible E-R diagram you can design that will do the job. The key is to be able to justify any particular design.
- E-R diagrams ‘evolve’. The process involves investigating a system and then designing a draft E-R diagram. This diagram needs to be thought about, discussed with other database designers or team members and reviewed. It then needs to be redrawn. The review process is repeated until everyone is happy! This is not a quick job! It takes time and experience to be able to design and review E-R diagrams effectively.
- The E-R diagram only shows the designer what tables they need in the database and how those tables are related. It doesn’t show the designer what fields will be in each table.
- So long as tables are related, any data held anywhere within the whole database can be retrieved.