Database note - Lecture notes 1-5 PDF

Title	Database note - Lecture notes 1-5
Course	Computer Systems
Institution	Durham University
Pages	17
File Size	692 KB
File Type	PDF
Total Downloads	102
Total Views	155

Preview

CLICK TO PREVIEW PDF

Summary

lecture note of 1-5...

Description

Lecture1 Introduction Database (DB): A collection of logically related data, designed to meet the needs of an organization   

a single repository of data, shared by many departments / users all data are integrated with minimum amount of duplication in large databases we may have a data dictionary: a separate “metadata” repository

DB Management System (DBMS): A software system that enables users to define / create / maintain / control the access to the DB

The basic features of a DBMS: 



Data Definition Language (DDL): o allows users to define the database: specify the data types, the structures and the constraints of the data o too low-level to describe organization of the data in a simple way, not easily understandable by most of users Data Manipulation Language (DML): o allows users to insert / update / delete / retrieve data from the database o Query Language: the part of DML that involves data retrieval a basic advantage over file-based systems: unlimited number of queries, which are easily implementable o the most common query language: SQL (Structured Query Language) o efficiency: a good DBMS can answer SQL queries quickly o offers controlled access to the DB:  security system (prevents unauthorized users)  concurrency control (many users simultaneously)  recovery control, etc.

DB Application Program: A computer program that interacts with the user and the DBMS  

sends an SQL statement to the DBMS can be conventional (local) applications, or online applications

DBMS Environment Components:  







Hardware: o Can range from a PC to a network of computers Software: o Database Management System (DBMS), operating system (OS), network software (if necessary) and also the application programs Data: o An abstract description of this data is called the schema of the DB o Used by the organization Procedures: o Documented instructions on how to use / run the system o e.g. how to log on the DBMS, how to use a particular application, how to make backups … People: o Any person who is involved with the system in any way

Lecture 2 Database Schemas and Planning We need to trust a DBMS to ensure that the database:  

is reliable remains always in a consistent state

when:  

software / hardware failures multiple users access the database simultaneously, eg. bank accounts, flight reservations

Database recovery: the process of restoring a database to a correct state after a failure Concurrency control protocols: prevent database accesses from interfering with each other

Transaction: an action (or series of actions) carried out by a single user / program, which reads / updates the database During the execution of a transaction: 

maybe in an inconsistent state, i.e. constraints may be violated!

At the end of a transaction:  

database again in consistent state valid integrity / referential constraints

two outcomes of a transaction:  

committed: when it completes successfully rolled back: when it does not complete successfully

a transaction is either performed entirely or not at all!

Concurrency Control: the process of managing simultaneous operations on the DB, without having them interfere with each other

Two transactions may be:  

both correct by themselves, but when they are executed simultaneously, they may cause inconsistency of the database

Main purpose: when many users access the DB

Data Model: a collection of intuitive concepts describing data, their relationships and constraints

Three characterizations of data: 





Structured data: o data represented in a strict format o the DBMS checks to ensure that the data follows:  the structures (table, attributes, domains)  the integrity & referential constraints Semi-structured data (XML) o self describing data o the “schema” information is mixed with the data values Unstructured data o very limited indication of the type / structure of data o eg. text document with some information, web page that contains some data

Relational Data Model:   

relations are tables (columns + rows) attributes are columns tuples are rows

Entity-Relationship (ER) model: Top-down approach to database design (graphical description of the DB) Basic concepts:   

Entities: the important data objects Attributes: the important properties of the entities Relationships: the associations between the entities

3-Level ANSI-SPARC* Architecture:   

External level: the part of the data that is relevant to each user (user’s view of data) Conceptual level: the logical structure of data, as it is seen by the DB Administrator (DBA) Internal level: physical representation of data in the DB, i.e. underlying data structures, algorithms, …

DB schema: total description of the DB DB instance: its data at a particular moment Objectives of a DB schema: 

All users have access at every point: o to the same DB instance o with customized views of parts of the data

Data independence: upper levels in the DB schema are not affected by changes to lower levels  

Logical data independence: External schemas (views) remain the same if we change the logical structure of the data (i.e. conceptual schema) Physical data independence: Conceptual schema remains the same if we change the internal schema (data structures, algorithms, …)

Three main phases of Database Design: 





Conceptual design o construct a first, high-level model of the data: ER model (identify the appropriate entities, their relationships and their constraints) o using the users’ requirements specification o independently of any physical considerations o it serves as the fundamental understanding of the system Logical design o construct the relational data model of the data o using the conceptual design (map entities / relationships to tables) o use normalization techniques to eliminate data redundancy / anomalies Physical design o describe the database implementation of the logical design o specific storage structures / access methods / security protection o aim is optimum performance

Lecture3 Relational Data Model Relational Data Model:   

relations between data(stored in tables) based on the concept of mathematical relations the most widely used Data Model (for structured data)

The schema of a relation: the description of a particular collection of data in the model Let A1, A2, … , An be a set of attributes that can be “related”, i.e. there exists an entity with some values for these attributes Then R(A1, A2, … , An)is the schema of the relation, e.g. if denotes customers, the schema of could be: R(name, id, address, town, date-of-last-purchase) 

this means: every customer has values for exactly these attributes

Relational Model Terminology:        

 

A relation is a table (with rows and columns) o this only refers to the logical structure of the DB, not the physical structure An attribute is a named column of a relation o every attribute has a unique name The domain of an attribute is the set of allowable values A tuple is a row of a relation o every tuple has a concrete value for every attribute! A cell of a relation is the intersection of a row and a column The degree of a relation is the number of attributes o i.e. every row stores as many values as the degree of the relation The cardinality of a relation is the number of tuples A relation is normalized if it is “appropriately structured”, e.g. o every cell has exactly one value (not more / less!) o no repetitions of two identical rows A Relational Database is a collection of normalized relations each of them with distinct relation names

NULL value:  

 

a special case of a cell entry it represents an attribute value that is: o either currently unknown o or not applicable not the same as the value “0”! may / may not belong to the domain of the attribute

Properties of Relations:  

   



The relation name is distinct from all other relation names in the relational schema Each attribute (within a relation) has a distinct name o possibly two attributes of different relations may have the same name o e.g. “name”, “id” etc. Values of an attribute are all from the same domain Each cell of relation contains exactly one atomic (single) value Each tuple is distinct among the tuples of the relation o there are no duplicate tuples The ordering of attributes has no significance o unlike mathematical relations o e.g. Cartesian product of two sets: ordered pairs of elements The ordering of tuples has no significance o just unordered rows

Structuring concept: Keys -uniquely identify a tuple in a (normalized) table 



Candidate key: (of a relation) o a minimal (not minimum!) set of attributes (“keys”), whose values uniquely identify the tuples Primary key:

   

o The candidate key selected to identify rows uniquely within the table Alternate key: o Those candidate key(s) not selected as primary key Simple key: o The key consists of only one attribute Composite key: o The key consists of several attributes Foreign key: o An attribute in one table A whose values must:  either match the primary key of another table B (then A references B)  or be NULL (e.g. staff has not been yet assigned to a branch)

Integrity Constraints Entity integrity:   



every attribute of a primary key cannot be null (otherwise we do not need all attributes of the primary key to identify the tuples) Purpose: o guarantees that each entity has unique identifier o ensures that foreign key values can reference primary key values example: o no invoice can have a duplicate number, nor can it be null o all invoices uniquely identified by the invoice number

Referential integrity:   



a foreign key either matches the primary key in the table it refers to or it is null Purpose: o any reference between tables is valid (or it has not been set yet) o prevents deleting a row in a table B, if the primary key of B has a matching foreign key in another table A Example: o a customer will be always assigned to a valid sales representative o unless (s)he is not yet assigned to any representative

Views:   



a virtual relation does not exist physically in the database The content of a view: o is derived from one (or more) base relations o is computed upon request by a user, at the time of request o changes when the underlying base relations change Main use: o show customized information to every user (e.g. show “loan number” but not “amount borrowed”) o compute dynamic quantities (e.g. “age” from “date-of-birth”)

Alternatives to the Relational Data Model Network Data Model  

records (tuples) appear as nodes relationships (foreign keys) appear as edges

Hierarchical Data Model   

special case of the Network Data Model, where the graph is a tree graph its structure mirrors parent-child relationship (one parent, many children) limitations of this model, e.g. o deleting a parent o adding a record without a parent

Lecture 4 Entity Relationship (ER) Model Entity-Relationship (ER) model: graphical description of the DB (conceptual design of the DB) Objective:   

to help understand the nature & relationships among the data to help derive the tables in the Relational Data Model (i.e. the logical design of the DB)

A top-down approach to database design:    

start with a set of requirements (informal system description) identify the types of “things” that you need to represent data about identify the attributes of those “things” (which will later become attributes in the tables)

Basic concepts:     



Entity: any real thing (or abstract notion) that we recognize as a separate concern within the database. Entity type: the group of all objects with the same properties, identified as having independent existence. Entity occurrence: a uniquely identifiable instance of an entity type Relationship: a named association between two entity types, which has some meaning in the context of the database Cardinality of a relationship: the number of entity occurrences that are related to a single occurrence of an associated entity type through this relationship o one-to-one (1:1) o one-to-many (1:*) o many-to-many (*:*)

o Optionality: on the relationship line, this circle is at the opposite end from the entity (“Teacher”) concerned in the optionality o it has a partial participation o otherwise it has total participation (i.e. it is associated to at least one occurrence of the other entity type)



o The attributes of an entity: the set of all common characteristics that are shared by all entity occurrences of an entity type

Step-by-step construction of an ER Model: 1. 2. 3. 4. 5. 6. 7. 8. 9.

Identify entities from the scenario/real world Find relationships Draw rough ER diagram (i.e. an ER model) Fill in the relationships Define primary keys and resolve many-to-many relationships Identify attributes and map them to the entities Draw full ER diagram showing keys, attributes, relationships Check it! Does it reflect the real-world system?

Lecture 5 Functional Dependencies and Normalization I Bottom-up approach: Normalization   

start with the initial tables and attributes analyze the relationships among the attributes re-design the tables and attributes in a “better” way

Purpose of normalization:    

every relation represents a “real world” entity single-valued columns avoid redundancy (i.e. repetitions) data is easy to update correctly (i.e. avoid update anomalies)

Data anomalies terminology:    

Redundancy: repeating data in multiple different locations Modification anomaly: failure to maintain all existing instances of a specific value Deletion anomaly: losing other values as a side effect when you delete data Insertion anomaly: when new data items are inserted, we need to add much more irrelevant data; adding rows forces us to add information about other entities

Well-designed relational databases: 

No redundancy: every data item is stored only once(minimize the amount of space required; simplify maintenance of the database)

Functional data dependency:  

describes the relationship among attributes in the same relation let A and B be two sets of attributes; we say that “B is functionally dependent on A” (denoted ), if each value of A is associated with exactly one value of B...