DDnotes - notes PDF

Title DDnotes - notes
Author Matthew Lardieri
Course Database Design and Development
Institution Western Sydney University
Pages 12
File Size 1.3 MB
File Type PDF
Total Downloads 40
Total Views 143

Summary

notes...


Description

Database design and development Introduction to databases Data today stored in a relational database (collection of related data). Database design/development essential to manage/ keep data safe. Data displayed within tables (rows, columns) connected through relationships- have persistent data, entities/relationships and attributes. DBMS- software that manages the database/security (bridges end users with the database). DBMS allows users to Define the database via a Data Definition Language (DDL)-> metadata; insert, update, delete, retrieve data from database via a Data Manipulation Language (DML) . DB must be designed properly otherwise useless. Database application- a program that interacts with database at some point of its execution e.g. microsoft access. Database system= database + DBMS + database application. Used to store data in files in the file cabinet. Traditional databases were independent of each other e.g. one for sales & one for contracts (manages its own data) etc leads to data duplication, inconsistency. Now managed by a DBMS and stored/integrated into one relational database. Recall: A database is not owned just by one organisational department – it is usually shared by many departments. Type of Databases Single-user: Supports only one user at a time. Multi-user: Supports multiple users at the same time Centralized: Supports data being located at a single site Distributed: Supports data being distributed across several sites.

Lecture 2: COMPONENTS OF DBMS; ANSI/SPARC 3 LEVEL ARCHITECTURE, DATABASE ENVIRONMENT Components of DBMS Hardware: e.g. laptop/computer Software:DBMS, operating system, network software & also the application programs Data: Used by an organisation. What is stored in DB Procedures: Instructions/ rules that should be applied to the design & to the use of the database and DBMS. E.g. how to: log on People End users, Data administrators, database administrators Some Components of a Database Data dictionary - storage of all “data about data” such as field names etc. Data repository (Database warhouse) - actual storage facility of data entered into the database. Database interface - link between the data contained in the database and the external world, which may be a human or a computer program (e.g. GUI,) Advantages of Integrated DBMS v disadvantages Easier sharing of data, control over data redundancy, improve data consistency/ security/integrity/ accessibility/ backup time. Dis: high cost (hardware), increase vulnerability (data in one place), size People’s Roles in the Database Environment DB administrators, DB designers, programmers, end users Views of database Allow each user to have his or her own view of the database. The user can only see parts of the data contained in the database e.g. you can restrict access to employee table, that contains social security numbers, but allow access to view containing name/ phone number. Through SQL ANSI/SPARC, 3 Level Architecture ANSI - American National Standards Institute; SPARC- Standards Planning and Requirements Committee This conceptual architecture provides basis for understanding functionality of a DBMS, emphasizes the implementation-independent layers needed to separate application programs from the data.

Users should be able to access same data/ immune to changes/ not need to know physical DB storage. DBA change DB without affecting users’ views. DBA change conceptual structure of DB without affecting users Schemas- A relational schema outlines the structure of a database in a relational database management system Data Independence Logical Data Independence-immunity of external schemas to changes in the conceptual schema. Any changes in conceptual schema (e.g. addition/removal of entities) not require changes to external schema. Logical model changes due to such as the addition of new data items etc should be possible without affecting the existing external views. Physical Data Independence- immunity of conceptual schema to changes in the internal schema. Changes in Internal schema (e.g. using different file sizes, storage structures/devices) should not require changes to the conceptual or external schemas. Functions of a DBMS- Data Storing, Retrieval, and Update.

Lecture 3: OVERVIEW OF (AND INSIGHTS IN) DATABASE DESIGN, INTUITIVE TABLE BREAKING, Two most important question when creating database: what data is needed; what format. Table (entity) needs name. Can create one-table database- problems= data redundancy. Multiple tables created to reduce this problem. PK must be unique (unique identifier)- Ids created to fix primary key rule e.g. album title as primary key but two albums same so create albumID column to fix problem. Tables (entities) linked through relationships (relationships given

name and a direction e.g. song containedIn album etc) with same attribute (primary key to foreign key) one->one, one->many, many->one, many->many. e.g many children assigned to one teacher. Arrow points to the unique attribute. Forign key is primary key in different table-links tables together. Entity-relationship diagram dont have forign keys- connecting lines represent relationship between the tables

Relationship diagram

ERD

diagram

Lecture 4- Terminology A data model represent data in an understandable and manageable way. 3 components: Structural part – rules according to which DB can be constructed, Manipulative part – defines types of operations allowed on the data (how the data will be manipulated) Integrity constraints – ensure the data to be accurate Relation/entity/file: Table with columns and rows connected to another table through FK. Attribute/column/field: property of an entity

Domain: set of allowable values for one or more attributes e.g. aus postcode=4 vaules Tuple/row/record: row of a relation Degree: Number of attributes of a relation Cardinality is the number of tuples/rows in a relation. Cardinality of relationship: max number of possible relationship occurrences for an entity participating in a given relationship type e.g. 1.1 staff to 0.1 branch 1=min 1=max if 0 then optional Degree of relationships: is the number of entity types in a relationship e.g. unary-1 type….select/project, binary-2 types Relational database: Collection of normalized relations with relation name- depicted in tables in lines connecting. Many- many relationships broken into another table Relational keys- cannot be null= equal to 0, avoid things that might change e.g. one’s address, make em simple Candidate key – set of minimum number of attributes that uniquely identifies each occurrence of an entity type Primary key – Candidate key that is unique identifier of that table Foreign key – primary key in another table Composite key – combination of two or more columns in a table used to uniquely identify each row in the table Superkey- is reduced to the minimum number of columns required to uniquely identify each row. Relational Algebra A set is a collection of elements. A set X has an element x is denoted by x Î X.

Properties of relations:have unique name, Each attribute has a distinct name within the same relation.Each tuple is distinct: no duplicated tuples are allowed. The order of attributes is not important (attributes associated with field names anyway). The order of tuples also has no significance. Cell of relation atomic (contain one value)

Lecture Nulldatabase

5 Represents that are someone’s

values in unknown. E.g. middle name. IS

NULL in sql Is Not null- Must not be left empty. IS NOT NULL in sql Integrity constraints Entity Integrity- primary key can not be NULL. All entity names must be different. Primary key can’t be duplicated

Referential Integrity- foreign key values must match primary key values in another table Relational Algebra- work on one or more relations to define another relation without changing the original relation. Six basic operations in relational algebra: selection (deals with data/records) , projection (works on columns), cartesian product (when we want to know data across data, union (merging two tables together with same number of attributes), intersection (looks at common factor of two tables )set difference (looks at the differences) How to represent a query mathematically

More on ER Two entites can have more than one type of relationship between them. Relationships given role names. Relationships can have attributes. Recursive relationship: Strong entity: Exists independently of other types of entities. Has own unique identifier (key) Weak entity: Dependent on another entity.. Can’t exist on its own. Doesn’t have unique identifier. Introduction to sql Data definition language (DDL): SQL commands to create database tables, views etc. Data manipulation language: SQL includes commands to insert, update, delete and retrieve data from DB tables.

Lecture 6 Conceptual database design e.g. ER Diagram Don’t care about physical database just the data modelling Logical database design e.g. GRD diagram-will it be relational. Add keys and relationships Physical DB design e.g. tables/base relations, constraints etc.

An important aspect of data modelling is to capture rules/constraints correctly, as misunderstandings in entityrelationship modelling will appear as inadequate database designs Business Rule: rules that would determine or affect the data representation and data flow for an organization. A business rule may not contain all the information required to finalise a good database design. Additional business rules may be imposed by the designer (via common sense or clarification with the organisation) to remove the design ambiguity or to improve the database. ER Diagrams cont A useful step in preparation for drawing an ER diagram is to list the possible entities in the scenario Enhanced Entity-Relationship Model -Semantic concepts are incorporated into the original ER model and called the Enhanced Entity-Relationship (EER) model. Most useful additional concepts of EER model are called specialization / generalization. Superclass ¨An entity type that includes one or more distinct subgroupings of its occurrences. Subclass ¨A distinct subgrouping of occurrences of an entity type. Superclass/subclass relationship is one-to-one (1:1). Attribute Inheritance An entity in a subclass represents the same ‘real world’ object as in superclass, and may possess subclass-specific attributes, and will always possess those associated with the superclass. Generalization The process of minimizing the differences between entities by identifying their common characteristics. i.e. define a more general entity type from a set of more specialized entity types (bottom-up) Specialization ¨The process of maximizing the differences between members of an entity by identifying their distinguishing characteristics. i.e. define 1 or more subclasses from the superclass (topdown) Participation Constraints { mandatory } – Each member in the superclass must also be a member of (at least) a subclass. { optional } – A member in the superclass need not belong to any of its subclasses. Disjoint Constraints { and } – A member in the superclass may be a member of more than one subclass. { or } – A member in the superclass can be a member of (at most) one subclass.

Lecture 7 Why a good database design important? To meet organisation business needs!! Database design steps:

Database design methodology A structured approach that uses procedures, techniques, tools and documentation aids to support and facilitate the process of design. DD is an iterative process, which has clear starting point followed by numerous refinements

Reduce data redundancy by merging two removing redundant relationships that aren’t needed

tables (with 1-1 relation) into 1 and

multivalued attribute: can have more than one value at a time for an attribute. For ex., skills of a surgeon is a multivalued attribute since a surgeon can have more than one skill. Derived Attribute – An attribute which can be derived from other attributes. e.g.; Age (can be derived from DOB) Strong entity- has pk/ nothing changes to table when changing from erd to grd. Weak entity- depends on other entity types/ add pk when changing from erd to grd to reduce redundancies

Lecture 8 Schemas- written description of what database looks like

Data Normalization- improve logical design to avoid data duplication/ data integrity/ easier access of data. Able to get answers with minimal number of operations. Reduce number of tables-> less file storage. Break up tables to remove redundancy Data Redundancy & update anomalies- relations that contain redundant info may suffer from update anomalies= Insertion, Deletion, Modification. Well structured tables/relations contain minimal data redundancy & allow user to insert, delete & update rows without causing data inconsistencies. One table staffbranch split to staff then branch so data can be added into branch without affecting staff table. The goal is to avoid anomalies like these below. Insertion Anomaly – When certain attributes can’t be inserted without the presence of other attributes. Deletion Anomaly – Deleting rows may cause a loss of data that would be needed for other future rows. Modification Anomaly – Changing data in a row forces changes to other rows because of duplication. Problem arise when only partial updates carried out e.g. only half branch addresses updated Table should contain only one entity type. Functional Dependency- Within a table, some attributes rely on other attributes to know their details e.g. Person phone number etc. rely of person name B functionally dependent on A e.g. Student name (B) depends on studentID (A) A1+ A2 determines attribute B E.g. know staffNo A1,sName A2-> branchNo B. However just know just StaffNo-> branchNo= partial dependency. Left hand side gets you right hand side. Don’t need sName as well to know branchNo= full functional dependency Functional dependency used in Normalisation- every attribute on left hand side determine attribute on right hand side using minimal number of attributes. Transitive Dependency

Main purpose of identifying a set of functional dependencies for a relation is to specify the set of integrity constraints that must hold on a relation. An important integrity constraint to consider first is the identification of candidate keys, one of which is selected to be the primary key for the relation SQL Aggregate function= Count, sum, avg,min,Max WHERE clause is used for filtering rows and it applies on each and every row, while HAVING clause is used to filter groups

Lecture

9

Normalisation simply to make sure no tables in database restricted to update anomalies. Normalisation properties: Lossless-join decomposition when break bigger table into smaller tables, no data lost in the process (same in reverse). Dependency preservation- constraints in original table reflected in smaller tables (functionally dependencies hold true). Functional dependencies derived from business rules or sample data given Process of Normalisation UNF (unnormalized)-> Table that contains one or more repeating groups { in scheme represents repeating group 1NF-> Remove repeating groups by table breaking (flatting table) Every attribute single valued/ domains of all attributes are atomic (no repeating fields etc) 2NF-> Remove partial dependencies. Table= full functional dependency. In 2NF every non-PK attribute is fully dependent on the PK. Break table up to get rid of partial dependencies 3NF-> remove transitive dependencies because they create insertion, deletion, update anomalies (A->B, B->C then C transitively dependent on A). Remove transitive dependencies in separate table linking with FK

SQL

Lecture 10

GROUP BY statement groups rows that have the same values into summary rows, like "find the number of customers in each country". GROUP BY statement is often used with aggregate functions (COUNT, MAX, MIN, SUM, AVG) to group the result-set by one or more columns. HAVING clause was added to SQL because the WHERE keyword could not be used with aggregate functions

Lecture 11

BD security: Requires mechanisms that protect the database against intentional or accidental threats. Must protect other parts of system to also protect DB. Measures to avoid: theft/fraud, loss of confidentiality/ privacy/ integrity/ availability -> avoided through data recovery plan/ backups/ identity types of threats e.g. hardware- fire/ data corruption etc DBMS/Database- failure of security Security measures: Integrity, Encryption, firewall, authorization, access control, backup, journaling of all changes made, checkpoint- resume processes so don’t have to start again. RAID (Redundant array of independent disks): Data saved to multiple drives for faster retrieval/ storage Data administrator: Data policies, planning needs, resolving conflict, managing the data Database admin: responsible for logical/ physical database design, security, performance, backup/ recovery

SQL Check- used to limit the value range that can be placed in a column Default- Set a default value Unique- ensures that all values in a column are different Table joins- when using multiple tables, must use join statement. When FROM clause contains more than 1 table. WHERE clause used to join tables. Alias- shorthand way to reference table Client c c.clientNo. Inner join- tables what matches in both tables. Outer join- Left Keeps everything on left and takes what’s matched on right table. Right- Keeps everything on right and takes what’s matched on left table. EXISTS operator is used to test for the existence of any record in a subquery. EXISTS operator returns true if the subquery returns one or more records. Distinct- statement is used to return only distinct (different) values. IN operator allows you to specify multiple values in a WHERE clause Not in is opposite. BETWEEN operator selects values within a given range. The values can be numbers, text, or dates.

Lecture 12 Transaction: action/series of actions carried out by user/application, which reads/updates database content. Transforms database from one consistent state to another. Must be fully committed otherwise rollback- database back to original state. Success- transaction commits & database reaches new consistent state Failure- transaction aborts, and the database restored to original state Properties of transaction Atomicity-all of nothing/ consistency/isolation/ durability Transactions scheduled according to time applied to them. Number of input transactions should = num of output= serializability Concurrency control: Manages simultaneous operations on the DB without having them interfere with one another. Prevents interference when 2 or more users are accessing DB simultaneously. Problems caused by concurrency= lost update- successfully completed update is overridden by another user, uncommitted dependency-One transaction can see intermediate results of another transaction before it has been committed, inconsistent analysis- Occurs when one transaction reads several values but the 2nd transaction updates some of them during the execution of the first. Serializability- Objective of a concurrency control to schedule transactions in such a way as to avoid any interference. One could run transactions serially (one after another), but this limits the degree of concurrency Serial Schedule - Schedule where operations of each transaction are executed consecutively without any interleaved operations from other transactions. NonSerial Schedule-Schedule where operations from a set of concurrent transactions are interleaved. Objective of serializability is to find nonserial schedules that allow transactions to execute concurrently without interfering with one another. serializability: ordering of read/writes is important: If two transactions only read a data item, they do not conflict and order is not important. If two transactions either read or write completely separate data items, they do not conflict and order is not important. If one transaction writes a d...


Similar Free PDFs
Notes
  • 18 Pages
Notes
  • 12 Pages
Notes
  • 61 Pages
Notes
  • 35 Pages
Notes
  • 19 Pages
Notes
  • 70 Pages
Notes
  • 6 Pages
Notes
  • 35 Pages
Notes
  • 29 Pages
Notes
  • 70 Pages
Notes
  • 6 Pages
Notes
  • 19 Pages
Notes
  • 32 Pages
Notes
  • 28 Pages