Title | IS202 DM Notes - Summary Data Management |
---|---|
Author | Mingmiao Liu |
Course | Data Management |
Institution | Singapore Management University |
Pages | 17 |
File Size | 1.3 MB |
File Type | |
Total Downloads | 666 |
Total Views | 1,015 |
Topics for DM FinalER (Design and Critique a ER Model)Good Business Rule Declarative – what, not how Precise – clear, agreed-upon meaning Atomic – one statement Consistent – internally and externally Expressible – structured, natural language Distinct – non-redundant Business-oriented – understood b...
Topics for DM Final ER (Design and Critique a ER Model) Good Business Rule •
Declarative – what, not how
•
Precise – clear, agreed-upon meaning
•
Atomic – one statement
•
Consistent – internally and externally
•
Expressible – structured, natural language
•
Distinct – non-redundant
•
Business-oriented – understood by business people
Basic E-R Notation
Entity •
•
SHOULD BE: –
An object that will have many instances in the database
–
An object that will be composed of multiple attributes
–
An object that we are trying to model
SHOULD NOT BE: –
A user of the database system
–
An output of the database system (e.g. a report)
Identifier •
•
Identifier: –
An attribute (or combination of attributes) that uniquely identifies individual instances of an entity type
–
Identifier symbol: underline
Candidate Key: –
•
An attribute that could be an identifier… it satisfies the requirements for being an identifier
Characteristics –
Will not change in value
–
Will not be null: every instance must have a value for the identifier
–
Use simple identifiers instead of long, composite identifiers
Cardinality of Relationship •
One – to – One: Each entity in the relationship will have exactly one related entity
•
One – to – Many : An entity on one side of the relationship can have many related entities, but an entity on the other side will have a maximum of one related entity
•
Many – to – Many: Entities on both sides of the relationship can have many related entities on the other side
•
Constraints –
–
Minimum Cardinality •
If zero, then optional
•
If one or more, then mandatory
Maximum Cardinality •
The maximum number
Supertype and Subtypes •
Supertype: A generic entity type that has a relationship with one or more subtypes
•
Subtype: A subgrouping of the entities in an entity type which has attributes that are distinct from those in other subgroupings
•
Two Approach: Generalization and Specialization
Constraints in Subtype Completeness Constraints •
Whether an instance of a supertype must also be a member of at least one subtype –
Total Specialization Rule: Yes (double line)
–
Partial Specialization Rule: No (single line)
Total Specialization rule
Partial specialization rule
Disjointness Constraints •
Whether an instance of a supertype may simultaneously be a member of two (or more) subtypes –
Disjoint Rule: An instance of the supertype can be only ONE of the subtypes (represented by “d”)
–
Overlap Rule: An instance of the supertype could be more than one of the subtypes (represented by “o”)
Disjoint rule
Overlap Rule
Subtype Discriminators •
An attribute of the supertype whose values determine the target subtype(s) –
Disjoint – a simple attribute with alternative values to indicate the possible subtypes
–
Overlapping – a composite attribute whose subparts pertain to different subtypes. Each subpart contains a boolean value to indicate whether or not the instance belongs to the associated subtype
Disjoint rule
Overlap rule
LD (7 Steps to convert ER to LD Read the assumption given carefully!) Keys usually are used as indexes to speed up the response to user queries
Primary Key – Represent by underline -
-
An attribute (or a combination of attributes) that can uniquely identify each row in a relation -> Single fields ( Simple primary key) -> multiple field (Composite primary key) Each relation must have one primary key The primary key value for each row is unique
Foreign Key -
An attribute (or combination of attributes) in a relation that is a primary key of another relation
Integrity Constraints 1. Referential Integrity – Rules stating that any foreign key value MUST 1. 2.
Either match a primary key value in the relation it refer to Or be null
Referential integrity constraints are drawn via arrows from dependent table (in which it is the foreign key) to the parent table (in which it is the primary key)
2. Foreign Key Constraints – Deleting the information from parent relation 3. 4. 5.
Restrict: Deletion is not allowed, as long the data is involved in other relations Cascade: The data and other affected records are deleted Set-to-Null: the value of the data from the affected table is set to null
3. Domain Constraints 6.
Allowable values for an attributes
4. Entity Integrity 7.
No primary key attribute may be null. All primary key fields must have data
7 Steps to map EER diagram to relations 1. Mapping regular entities to relations a. Simple attributes – ER attributes map directly onto the relation
CUSTOMER entity type with simple attributes
CUSTOMER Relation
b. Composite attributes – Use only their simple, component attributes
CUSTOMER entity type with composite attributes
CUSTOMER relation with address detail
c. Multi-Valued attributes – a separate relation with foreign key taken from the superior entity
CUSTOMER entity type with multi-value attributes
Multi-valued attribute becomes a separate relation with foreign key. 1 – to – Many relationship between original entity and new relation
2. Mapping weak entities o o
Weak entity becomes a separate relation with a foreign key taken from the strong entity Primary key composed of partial identifier of weak entity and primary key of Strong entity
Weak entity DEPENDENT
Relations resulting from weak entity NOTE: the domain constraint for the foreign key should NOT allow null value if DEPENDENT is a weak entity Composite primary key
Foreign key
3. Mapping binary relationships a. One-to-Many: Primary key on the one side become a foreign key on the many side
Relationship between CUSTOMER and ORDER
Relation from one-to-many relationship
Foreign key
b. Many-to-Many: Create new relation with the primary keys of the two entity as it primary key
Many-to-Many Relationship
Resulting relations
Composite primary key
Foreign key
New intersection relation
Foreign key
c. One-to-One: Primary key on the mandatory side becomes a foreign key on the optional side
Binary One-to-One relationship
Resulting relation
4. Mapping associative entities a. Identifier Not Assign Default primary key for the associative is composed of the primary key of two entities
An Associative Entity
Resulting relation
b. Identifier Assign Have primary key of its own.
Shipment associative entity
Resulting Relation
5. Mapping unary relationships a. One-to-One & One-to-Many: Recursive foreign key in the same relation
One-to-One or One-to-Many Relationship
Resulting relation with recursive foreign key
b. Many-to-Many: Two relation (entity type & associative relation)
Many-to-Many Relationship
Resulting Relation
6. Mapping ternary relationships o o
One relation for each entity and one for the associative entity Associative entity has foreign keys to each entity in the relationship
Ternary relationship with associative entity
Resulting relation
7. Mapping supertype/subtype relationship o o o o o
One relation for supertype and for each subtype Supertype attributes (including identifier and subtype discriminator) go into supertype relation Subtype attributes go into each subtype primary key of supertype relation also becomes primary key of subtype relation 1:1 relationship established between supertype and each subtype, with supertype as primary table
Supertype/subtype relationship
Resulting relation
SQL (Not require to write but have to guess the output) SQL -> Structured Query Language DDL (data definition language): SQL commands to create databases, tables, drop tables etc. DML (data manipulation language): SQL commands used for querying databases
Create Table
•
char(n): a string with n characters
•
varchar(n): a string with at most n characters
•
All constraint names in a schema/database should have different names
•
Child tables can be created only when ALL the parent tables are created
How to define primary key -
Simple primary key: o
-
CONSTRAINT course_pk PRIMARY KEY (cid)
Composite primary key o
CONSTRAINT qualification_pk PRIMARY KEY (fid, cid)
How to define foreign key -
Single foreign key o CONSTRAINT qualification_fk1 FOREIGN KEY (cid) REFERENCES course(cid) Composite foreign key o CONSTRAINT test_FK1 FOREIGN KEY (StuID,CourseID) REFERENCES Registration(SID,CID))
Insert Data •
INSERT INTO Student VALUES (38214, ’Ama');
Delete Data
Update Data
Change table structure •
Add a new column –
•
Add a new constraint –
•
ALTER TABLE table_name ADD CONSTRAINT constraint_name constraint_definition
Drop an existing column –
•
ALTER TABLE table_name ADD column_name column_type [not null];
ALTER TABLE table_name DROP column_name
Drop an existing foreign key constraint –
ALTER TABLE table_name DROP FOREIGN KEY foreign_key_constraint_name
SELECT Statement •
Clauses of the SELECT statement: [look at DM_SQL Syntax.docx for the possible syntax] –
SELECT: list the columns (and expressions) that should be returned from the query
–
FROM: indicate the table(s) or view(s) from which data will be obtained
–
WHERE: indicate the conditions under which a row will be included in the result
–
GROUP BY: indicate categorization of results
–
HAVING: indicate the conditions under which a category (group) will be included
–
ORDER BY: sort the result according to specified criteria
Join Table Join
Inner-Join
Left Outer Join
Right Outer Join
Equal Join
Not-Equal Join
Sub-Query A subquery may occur in: -
A WHERE clause A FROM clause A HAVING clause
Normalization (Determine the functional dependency base on given assumption and data dictionary specified in the question) Well-Structured Relations -
Contains minimal data redundancy and allows users to insert, delete, and update rows without causing data inconsistencies Goal is to avoid anomalies o Insertion Anomaly: adding new rows forces user to create duplicate data o Delete Anomaly: adding new rows forces user to create duplicate data o Modification Anomaly: adding new rows forces user to create duplicate data
General rule of thumb: a table should not pertain to more than one entity type
Data Normalization -
-
Tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data The problems of having duplication of data o Waste of space o Difficulty in consistency control The process of decomposing relations with anomalies to produce smaller, well-structured relations
Normalization Form All relations are in 1st Normal Form
1st Normal Form – 1NF -
No multivalued attributes, and every attribute value is atomic
2nd Normal Form – 2NF -
1NF + every non-key attribute is fully functionally dependent on the ENTIRE primary key o Every non-key attribute must be defined by the entire key, not by only part of the key o No partial functional dependencies Functional Dependency: The value of one attribute, or of combination of attributes, (the determinant) determines the value of another attribute (Determinants can be multiple keys)
Representing Functional Dependencies - Graphical representation: a link is issued from the determinant pointing to the other attribute
1NF on entire primary key Dependency
Dependencies on part of the key (partial functional dependencies) For example: EmpID is the determinant, and Name, DeptName, Salary is functionally dependent on EmpID - Text representation: EmpID -> Name, DeptName, Salary CourseID -> Course_Title EmpID, CourseID -> DateCompleted
3rd Normal Form – 3NF -
2NF PLUS no transitive dependencies A transitive dependency is a functional dependency between two (or more) non-key attributes.
Transitive dependencies are removed...