## Relational Database Design (Contd)- 4

Welcome to module 20 of Database Management

Systems. We have been discussing about relational database

design, since the last 4 modules and this will be the concluding part of relational

database design. In the last module, we have seen some very

key concepts of relational design, that of normal forms Third and Boyce Codd normal form

specifically and how to decompose into them? And how do we get benefit in terms of doing

this kind of decomposition? In removing the anomalies by reducing the

redundancy in the design. In view of that in the background of that,

in this module will we would try to understand a new kind of data dependency, an additional

kind of data dependency, which is called multivalued dependency. Which can occur when an attribute can have

can take multiple possible values, which we had eliminated in the first normal form together

and based on that, we will define 4th normal form and decomposition into 4 NF and then

we will summarize this whole set of discussions of relational database design, and talk little

bit about what happens, when you have temporal data in your system. So, these are the outline points, based on

our objectives and we start with the discussion of multivalued dependency. Consider a situation like this. So, here we are trying to represent an individual

instead of persons with 3 attributes, man which is an may be id or name of that person,

phones and dog like. So, the idea is that the here persons and

they can have 1, 2, 3 any number of phones, which is true for all of us and then a person

may have any number of dogs that he or she likes. So, both of these phones and dog like D, P

and D attributes can take multiple values and ah. So, if we if we if you look at the 1 NF normalized

form here. So, in 1 NF what we do? We create separate rows for them. So, we have created separate rows for M 1

against P 1 and P 1 or P 2 in phones and similarly for D 1 and D 2. So, once we have done that then we have here,

we can see I have highlighted with yellow, you can see the different redundancies that

are arising. Because, since I have phones and dog liking

attributes. So, it is possible that if phone takes 2 values

and dog like takes 2 values, then actually 4 different combinations of them are possible. But in reality, it may be in reality it may

be actually these 2 are true, that M 1 has phone P 1 and likes dog D 1 M 1 has phone

P 2 and likes dog D 2, but you could also have such redundant tuples coming in, because

there they are now valid. So, this is the situation which we try to

try to capture, in terms of what you see here multivalued dependencies, where which is row

shown in terms of double arrows as you can see here. So, man I say determines multi determines

phones man multi determines dog likes. So, there are 2 different multi valued dependencies

in this case. So, this multi valued dependency adds a new

source of redundancy in our data, and that is very real in various models of our system. So, let us move on. So, this is just another example, we have

2 different relations Student give the student id and name, Courses giving course id and

name and the corresponding functional dependencies in them, you can see 2 instances of that. But if we as such, there is no relationship

between student and course, but if we choose to keep them in a single relation, say Student_Course

which I have shown on bottom right here. Then there will be you can see lot of redundancies

coming in, because since I they can be in terms of all different combinations. So, S 1 has in name A may be taking course

C 1 having name C, but again the S 1 having name A could be taking course C 2 having name

B, you just do not know which one is correct. So, you can see that here again you have 2

multiple value dependencies, one where SID multi determines CID and SID multi determines

C name. So, these are the 2 different multiple values

that you can, find against the SID and this is ah. So, if 2 or more multi valued dependencies

exist in a relation, then while we convert the we convert multivalued attributes into

single valued attributes, then the multi value dependency will show up. So, that is the basic problem that we would

like to address. This is another example of two relations,

where the ID and child are together, when ID and phone_number are together. So, naturally if I combine them into a single

relation, you have a set of possibilities of multiple different tuples. Because given an ID there could be multiple

children, there given an id there could be multiple phone numbers. Mind you, this relation of isn’t info is

still in Boyce Codd normal form, because there is no dependence there is no functional dependency

that holds on this relation. So, the key of this relation is the union

of all the three attributes and therefore, that being the key and no functional dependency

holding on it, naturally vacuously makes it Boyce Codd normal form, but you can still

see that there are redundancy in that is data. So now, let us define multivalued dependency

in a formal way and. So, we say that alpha multi determines beta,

naturally alpha and beta both have to be subsets of the given set of attributes. When we say that? When there are for all pairs of tuples t 1

and t 2 such that they match on the fields of alpha, this till this point it looks like

functional dependencies. There exists 2 more tuples t 3 and t 4 such

that this condition sold, what are the conditions? Look, carefully here we say that all of them

match on the alpha attributes which is fine, then you say that t 3 matches with t 1 in

the beta attributes and t 3 matches on the remaining attributes with t 2. Similarly, t 4 matches with t 2 in the beta

attributes and t 4 matches with t 1 on the remaining attributes. So, let us look at an example, gets confusing. So, here is course book and lecturer ah. So, it is a relationship of university courses

known naturally, every course has multiple recommended books and every course has been

taken by multiple different lecturers from time to time. So, course can have multiple books. So, there is a multivalued dependency here,

it can be taught by multiple lectures. So, there is a multivalued dependency here

and therefore, I can have an instance of this particular relation and I am just showing

you, how to test for the multivalued dependency course multi determines book. So, these are the 2, 4 tuples I have marked

t 1, t 2, t 3, t 4 if you look into the first condition. So, this is your alpha I am checking for. So, this is alpha this is beta. So, this is beta and this is ah. So, to say R minus beta minus alpha ok. So, the first condition that all these tuples

will have to match on alpha yes, they do, all 4 of them have AHA here. So, that is fine take at the second condition

t 3 on beta is Silberschatz and t 1 on beta is also Silberschatz. So, they match and t 3 on the remaining attributes

remaining attributes are, if I take out beta if I take out book it is AHA it is course

and the lecturer that is remaining. Now it already matches on the course. So, I do not have to check for that, but. So, I can just check for whether it matches

on lecturer, between some checking for this rule, whether t 3 and t 2 match yes t 3 and

t 2 match, they have the same name for the lecturer. Look at the next one which is t 4 and t 2

match on beta, t 4 and t 2 match on beta yes, they have the same name of the book, and whether

t 4 and t 1 match on the lecturer, this rule t 4 and t 1 match on the lecturer this rule. So, it also satisfies. So, I can say that this relation has holds

the multivalued dependency course multi determining book. In a similar way you can you can mark your

t 1, t 2, t 3, t 4 on this and check for course multi determining the lecturer, actually we

will we will soon state that; if course multi determines book, then it is trivial that course

will also multi determine lecturer. So, this is just to tell you if you have 3

non-empty sets of attributes Y, Z and W and then we say, Y multi determine Z, if and only

if there are these are the possible relations. That I can have Y 1 and Z 1 W 1 in a relation

and Y 1 and Z 2, W 2 in the relation, then I can have Y 1, Z 1 with W 2 and Y 1, Z 2

with W 1, that is you can basically take the cross of these to other 2 attributes and those

are r tuples, possible tuples in your relation and ah. So, you can you can naturally if you read

it in little in a different way, then you can observe that since the behavior of Z and

W are identical they are switchable. So, if Y multi determine Z, then you can you

have ZY multi determining W and vice versa. So, this is; what is a core observation in

terms of the multi value dependencies So, this is in terms of our example, you can

now clearly understand that ID multi determines child name and ID multi determines phone number,

in the earlier example that we took and ah. So, we can also note that if there is a functional

dependency, Y functionally determines Z then; obviously, Y will multi determine Z, that

is that is just quite obvious. So, we have to we can make use of multi value

dependency to specify, further constraints to remove redundancies and defining what is

legal in a relation. And if a relation fails to satisfy a given

multivalued dependency, then we can construct a relation R primed, that does satisfy the

multi valued dependency by adding tuples to that r right. Now, once having defined the notion of multi

valued dependency, we next proceed to check, how do we reason about that. So, I would remind you about functional dependencies,

and the different rules offunctional dependencies Armstrong’s rules, that we had introduced

the all of these of augmentation transitivity and all that. So, in terms of functional dependencies we

have 3 rules, commonly called the cat rules. Which purely involve the functional dependencies,

first is a complementation which is a kind which we have just discussed shown, that if

X multi determines Y, then X multi determines R minus X union Y with multi determines the

remaining set of attributes. Augmentation that is I can augment any multivalued

dependency with left and putting attributes on the left and right-hand side, as long as

I put all attributes that I put on the right-hand side, I put them on the left-hand side. I may put more attributes on the left-hand

side, but all attributes that I put on the right-hand side here Z must be a subset of

the attributes that I put on the left-hand side, that augmentation is possible. Transitivity is manifesting in a little different

way, if X multi determines Y and Y multi determines Z then, X multi determines Z minus Y. So, these are the these are the 3 rules which

are basically these 3 are rules that, involve only multi valued dependencies and the other

2 rules, actually involve the relationship between multi value dependency the replication

rule and the coalescence rule, which are between the multi value dependency and the functional

dependency. We are not going deeper into that further,

or trying to take specific examples and show how they work. I just want you to know that such rules exist

through which, you can define similar algorithms for multivalued dependency also, as we did

for functional dependency like as you can understand the most critical algorithm to

define would be the algorithm of closure, which can again be used in the situation where

I have functional as well as multivalued dependency. So, just we will keep that, in little bit

advance space of this course. So, just know that such things exist, but

we are not going into the details of that. Finally, for a multivalued dependency where,

X determines Y we call that MVD to be trivial. If either Yis a subset of X which is the notion

we used for functional dependencies or there is a second condition here, that the union

of the X and Y that left hand right hand side gives you the whole set of attributes, otherwise

it is a non-trivial multivalued dependency and we have to repeat the values. So, these are the two conditions, if they

satisfy then we know that we have a trivial multi value dependency and we do not want

to deal with that. So, there is little bit of references to the

theory given here, I have mentioned that there are closure algorithms ah. So, that given a set of dependencies I will

now generalize and say dependencies, which means that there could be functional dependencies,

as well as multivalued dependencies. Given a set of dependencies you can define

a closure of all of these functional and multivalued dependencies together, that are implied by

the given set and we can have all those parallel definitions of closure of the dependencies,

the minimal cover, canonical cover and so on. So, I just want you to note that these things

have been defined and the existent theory, but will be beyond the current course that

we are pursuing. So, it is now that we have is we have seen

an another additional source of redundancy in our data, in terms of multiple values and

in terms of the multi value dependency that hold. So, we would now like to look into if such

dependencies exist, then how do you decompose a relation to satisfy that the redundancy

caused by such dependencies are not affecting us. So, such a normal form it is beyond the third

normal form is called to be said to be a 4th normal form or 4 NF. Where you say that a relation is in 4 NF if,

every multi valued dependency alpha multi determining beta, in the closure of the set

of dependencies is either trivial, trivial means that left hand side is a subset of the

right-hand side or the union of the left and right-hand side gives you the whole set of

attributes. So, it is either trivial every dependency

is either trivial, or alpha left-hand side is a superset of the schema R, you can very

well relate that this is just a little twist on the definition of the Boyce Codd normal

form, where the second condition was identical and only thing in the first condition instead

of MVD, you had a functional dependency. So, when we have this, we say we are a relation

would be in the in the 4th normal form. Naturally, if a relation is in 4th normal

form, it is trivial that it will be in the Boyce Codd normal form, but the reverse will

not be necessarily true. So, again the same set of concepts that, if

I have a set of dependencies and you have a decomposed relation then smaller relation,

then I can project that set of dependencies, in terms of a particular subset of the attributes

and here is the condition that is given. So, the decomposition algorithm into 4 NF

is exactly like the decomposition algorithm of the Boyce Codd Normal form BCNF. Only difference being that, now you may be

doing this crucial step of 2A decomposition, for every multivalued dependency also earlier

we were doing this only for the functional dependency. So, now if there is any offending multivalued

dependency, which is not satisfying the 4 NF form? We can decompose the relation in terms of

R 1 and R 2, as in here which is exactly like the Boyce Codd normal form and then the rest

of it is simple. Ifif it is you know by this another important

point, that you that you must note is in this process you actually guarantee lossless join. So, this also continues to be in lossless

join, with every decomposition and then you keep on repeating till all dependencies in

F, in your set has been dealt with the attributes in R 1 and have converted them into the 4

NF form. So, you have a total 4 NF decomposition happening. let us take a this here, is the like before

here is a formal algorithm for those who would be interested, to formally study the steps. here I am just showing examples of 4 NF decomposition. So, we started this discussion with a person

relational scheme, having man, phone and dog likes MPD, I have added I have just modified

and I have added another attribute address. So, that in addition to the multi value dependencies,

I can also have a functional dependency. So, we have two multivalued dependencies like

before, man multi determining phones and man multi determining dog like, but now we have

a functional dependency man determining address the key continues to be MPD. So, all of these dependencies will violate

the 4 NF, because none of them satisfy the either of the condition, that none of them

are trivial and on for none of them left hand side is a super key because key is MPD. So, you can see that in on instances of this,

relational schema you will have multiple redundant records, in the actual instance. So, on the right we normalize we normalize

by taking FD 1; take the union of man and phones that gives you the first relation and

then the rest of it. Then again you split based on FD 2, you have

the second relation in the decomposition man and dog like and the third one gets generated

as a byproduct of that, which is man and address. And you have three relations now, which together

represent the original relation each one of them is in 4th normal form. Actually, what happens is, when you when you

have decomposed then, FD 1 in this has become a relation where, the multivalued dependency

man multi determining phones can be checked in terms of a functional dependency itself,

and that that is what gives you the multi value dependency. And since it is multivalued so, man and phones

together continues to form the key, similarly in the second one the man and dog like is

the key. Because you just have the multivalued dependency

and given the same man, you will have multiple dogs whom he or she likes, but in the third

one in the person address where you have man and address you have only man as the key,

because man is a functional dependency that holds. So, this is a simple illustration of decomposition

into 4 NF, here is a little more elaborate one, again this is a we have I have worked

through the steps. So, there are three multivalued dependencies

and you can see that, A multi determining B is not does not is not who does not hold

the condition of 4 NF. So, you have to decompose, you decompose get

R 1 which is in 4 NF and the remaining R 2 which is also not in 4 NF. So, decompose in R 3 which is in 4 NF and

R 4, we can R 4 is not in 4 NF you decompose R 4 into R 5 and R 6 and work through that,

and you will be able to see that R 5 is in 4 NF and R 6 also is in 4 NF, which gives

a complete multivalued decomposition of this whole set. Naturally with that, we will conclude our

discussion on the decomposition process, there are there would be some more aspects to look

at and there is lot of more normal forms that exist. But this is for all practical purposes; a

database is normalized, when it is represented in terms of the third normal form. And I have discussed still I have discussed

the 4th normal form, because in some places people prefer to represent also in 4th normal

form. So, that they guarantee that they have even

less redundancy in the data, but leaving that, let us quickly take a round in terms of the

what we have done so far and what is a basic overall design process that we should be following. So, again to remind you the goal for our design

is to have a relational database which is in BCNF or 3 NF has a lossless join due to

the decomposition, and dependency preservation. If we cannot achieve that, I am I am sorry

earlier what I meant is BCNF or 4 NF not BCNF and 3 NF. So, the idea would be I have a decomposition

in BCNF and 4 NF lossless join and dependency preservation which may not be achievable. If I cannot achieve that then I go I have

to sacrifice either, the lack of dependency preservation. So, dependencies will have to be checked using

natural join or, I will allow a little bit of redundancy and use the third normal form

where I have the guarantee. Now, at this point you must wonder and note

that SQL, the language in which we are doing the creation and update and the query processing. That SQL does not provide any directory of

specifying or checking any dependency, other than the functional other than the functional

dependency that checks the super key. Super key is the only functional dependency

that SQL would check, no other functional dependency or multivalued dependency and other

type dependencies are can be specified or checked in SQL. You can do that using assertions, in the in

the while discussing SQL I were talked about assertions we can do that using assertions,

but that too is very expensive to test. So, it is not usually supported by any of

the databases, which are widely used because that slows down your every process very, very

much. So, you can understand that in terms of your

design goals, you have to do a very good job to make sure that, your functional and multivalued

dependencies are accurately expressed in the design and accordingly the schemas are normalized

in the proper ways satisfying BCNF or 4 NF or 3 NF normal forms. But because, while you will actually have

instances there will not be a practical way, to see if you are violating any one or more

of theserules of dependencies that you have set. So, as I mentioned there are actually these

are not the only forms, there are various other normal forms as well and fifth normal

form, 6 normal form and so on, but it is very rarely these are very rarely used. It is not easy to decompose into these normal

forms and by this decomposition does not give you enough returns in terms of the reduction

of redundancy and removal of anomalies, that people often would have motivation to do them,

but you should know that such normal forms exist. So, in the overall process if we look, at

I mean what we have been doing is there are several tracks that we could be taking one

possible thing is, the whole set of attributes have been generated while we have converted

or relation has been generated. When you have converted the entity relationship

diagram, the UML or the ER diagram into a set of tables, that is how we got our set

of attributes or the relational schema R, it is also possible that we just started with

a single relation containing all attributes, which is called the universal relation? And then normalization will break them into

smaller relations. It could have been or could have been the

result of some adds of design of relations also, and then you convert them. So, there are possible all different possible

tracks that can happen. So, if we have taken the ER model track, then

frankly speaking if the ER model is carefully designed, then every entity defined in that

ER model will have only the dependency; which are the determining super key. So, just recall the employee department building

kind of situation we discussed earlier. So, an employee entity has attributes department

name and building, and there is a functional dependency from department_name to building. So, what it means that in the entity relationship

diagram itself we didn’t do a good job. If we had done a good job then we would have

identified that the department itself is an entity and therefore, would not feature as

an attribute on the employee. So, it would have been I mean right there,

we would have if we had called it as a separate entity, then that is equivalent of what we

are doing now taking the relation and then breaking it down through decomposition. So, functional dependencies from non-key attributes

of a relationship are possible ah, but are rare. So, mostly the relationships are binary, and

if you do a careful design of the ER model then many of these deep exercise of normalization

you will not have to go through. It should also be kept in your view, that

at the times when you want to de normalize want to use denormalized relations, because

if you have normalized and the only way to get back the original view is to perform join. So, if we of course, if you have prerequisites

and if you want to say, view or print prerequisites with that title and course id naturally you

will have to take a join with the course, which is expensive. So, one option could be first alternative

could be that, you use a denormalized relation, where the course prerequisite is actually

included in the course and you know that will have you know violations of some of the normal

forms. Because, there are there are functional dependencies

between them, but that will certainly lead you to first a look up, because you have them

in the same table you do not need to perform join. But you need extra space, exact extra execution

time for update, because you have redundant data you have redundancy while programming

on that coding on that, because of this redundancy there could be possibility of error, because

any of these anomalies can happen and your code will have to now take care of that. So, it does help in certain way in terms of

getting a better efficiency, but it there is a there is a cost to pay in a different

way also the other alternative could be you can have a materialized view, which is actually

the join in course of prerequisite. In terms of performance it has a same benefit

or the costs as you say, but only thing is you will not need to do that extra coding. So, it is better from that perspective. So, always keep the issue of denormalization

in view, and we do a careful design that if it is very frequent that, you will have to

compute a join then, you might want to sacrifice some of the redundancy some of the you know

possibilities of having anomaly, and still have a you know denormalized design in your

database. There are several other issues of design,

which do not get captured in what we have designed. For example, let say very regularly we are

we have returns, income tax returns to submit and we will be maintain income tax return

tax your sales tax return and on all that, and you maintain your accounts book of transactions

debit credit accounted and so on. Now naturally, these are all bound in terms

of one-year effectivity. So, when they come when in the next year comes,

then you need a separate you know set of records to be done for that year. So, how do you. So, if you if you have such a table where

you along with the company idea of year and amount and then how do you take care of this

situation, because one way could be that you have all of these you take out year, from

the attribute and you have separate table in every year. So, you will have to create new table and

remember their name. So, if queries which run across year will

be difficult to do. The other way could be that, you every New

Year you start renaming you know you do a year where your earnings from different years

are shown on different columns. So, you are basically every year you have

the result in terms of a different attribute. So, that also is not a very good solution

for a database it is something which is with the spreadsheets will often use, but in terms

of data which has a certain format and needs to be you know redefined from scratch, at

a at a different time frame in a different way, then you will come across these issues. let me close with just pointing out that,

if we have a one kind of data that we have not looked at which are temporally in nature,

that is all that we have said is the attributes and their values. So, if we put at value to an attribute then

that value is taken to be the truth for now, and for the past and for the features. So, if that value changes, then you completely

erase that in the database. So, for example, today I stay at a certain

address, tomorrow I may take up a different quarter my address has changed. So, in our design if there is against my employee

id there is an address given, then once I change my quarter my address will change it

will not be possible to recollect, what address I resided in say 2017. So, temporal data of such kind, temporal data

I am sorry just of such kind have an association with an interval. So, a snapshot often does not solve the problem. So, you have you have to decide, how you do

that? Whether you can you would like to put some

attributes, which specify the timestamp or you would like to I mean really have multivalued

attributes, denoting the different time frames where they are they may have taken effect,

there is no accepted standard. And the fact that, if you if you know that

it keeps on changing with time then your original dependencies might get affected they will

they will change as well. So, these are the things that you will have

to take care ah. This is another style, that many a times when

you have to say that ok. This course with this title existed from this

semester, a different semester the title may have changed. So, you can put a start and end attribute

with which specifies what is the time for which the remaining attributes made sense,

a good design, but these also have issues because, if you do this kind of temporal intervals,

then how do you make sure that between 2 records the intervals are not overlapped. So, you are not saying that at the same time

this course had them X this of course, also had name Y. So, they have to be disjoint. So, how do you check for thisthis consistency

of data, there is no easy way to do that. So, the foreign key references and all those. So, handling of temporal data is another aspect

which will have to be looked into very carefully, in the design and you will need to do some

kind of design compromise and implementation has to take care of those issues. So, to summarize we havetaken a full look

into the multivalued dependencies and tried to understand what happens, when your attributes

get multiple values. Learn the 4th normal form for that and the

decomposition into that, and most importantly we have tried to summarize the core database

design process. That we have been discussing for the last

4 modules, this is the 5th one including this and we have understood that and we have talked

a little bit about the temporal data. And with this we close our discussions on

the relational database design, and from the next module we will move on to other aspects

of the database systems.

12:03 – "if Y functionally determines Z, then Y multi determines Z"

Every FD is an MVD because if X functionally determines Y, then swapping Y's between tuples that agree on X doesn't create new tuples. – from wikipedia