COEN 280 - course note for COEN280 PDF

Title	COEN 280 - course note for COEN280
Course	Database Management Systems
Institution	Santa Clara University
Pages	81
File Size	5.5 MB
File Type	PDF
Total Downloads	75
Total Views	150

Preview

CLICK TO PREVIEW PDF

Summary

course note for COEN280...

Description

Class 2: Conceptual database design: ER Model, nothing to do with implementation. Entity is an object: Ex: students, professors, uniquely identifiable. Entity sets need not be disjoint. Attributes: describe entities. Students can be described by name, SSN, etc. Domain: a phone number could have domain of integer, restricting phone number to be integer only. Attribute describes entities using key-value pairs. Ex: for student {(name, Lee), (SSN,123-456-789)}.

A student has one Name, SID and Phone

A student can have multiple phone numbers

A candidate key is a super key but not the other way around. You may have multiple candidate keys. Primary key is one of those candidate keys. Primary key is underlined. Weak entity: entity that does not have primary key. Partial key + primary key of identifying owner to uniquely identify weak entity.

Name in course entity is partial key. Course is weak entity. Bold line is used to indicate weak entity and total participation. Course should be in total participation with professor. (I cannot uniquely identify a course without professor) Course + Professor will uniquely identify a course. Professor does not need to be in total participation with course.

Player is weak entity Number is partial key. Player is in total participation with Team. Ternary Relationships: range from 3 to N entities.

If any entity is weak entity, ternary Relationship cannot resolve it because weak entity needs 1-to-many relationship and needs total participation. Any ternary relationship can convert to binary relationship.

Overlap constraint: an employee can be both hourly employee and contract employee.

Class 3: Enhanced Entity Relationship allows you to define relationships between objects. Employee: generalized entity. Manager, administrator, programmer: specialized entity, have specialized attribute. generalized + specialized attribute. Relationship is inheritance.

a. can there be an employee without any specialization - yes! (single line) b. can there be an employee without any specialization - no! (double line) double line is total participation, single line is partial participation. cardinality: disjoint or overlap. to decide disjoint or overlap: members of super class belong to multiple overlapping specialization? Integrity constraints: represent a condition on DB schema restricting what kind of data to be stored in the table. Ex: Student(SID: Integer, Name: String, Major: String, GPA: Number) two integrity constraints: 1. SID is a primary key, I cannot have two rows with same SID, SID is primary key constraint) 2. data type is also a restriction. SID can only be integer, name and major can only be string)

Primary key cannot be null, you cannot have multiple tuples with same primary key. 3 types of integrity constraints Domain Constraint: deal with data types, data type restricts data value Foreign Key constraint: link from particular table to parent table. Link should also be present. Primary key constraint: restrict uniqueness.

Referential integrity is not enforced because actor_id of 4 in relation table does not point to valid actor_id in actor table. (dangling pointer) Foreign key can be NULL. (This row is not pointing to anywhere)

A specific Professor is not allowed to teach a specific course for multiple semesters, in order to teach a specific course, we have to change to the ER model in image below or we create an assumption that courseId will be changing per semester.

Another assumption is that for each unique primary key, it cannot point to duplicates rows in weak entity set, for example, this is not allowed: Name PID Name Smith 2345 -------> Java \------> Java However, this is allowed:

Number CS112 CS112

Name PID Name Smith 2345 -------> Java \------> Java

Number CS112 CS112

Section A B

Class 4: Domain constraint: have restrict on the data to be stored. Example: restricting the type of SID, Name, Major) Primary key constraint: two tuples cannot have the same primary key. Primary key cannot be null

Referential integrity = foreign key constraint -> no dangling pointer. A tuple points from one relation to another tuple in another relation. The value in the forign key matches the value in parent key unless foreign key value is NULL. Enrolled(sid: string, cid: string, gradeL string) Students(sid: Integer, Name: String, Major: String, GPA: Number)

Q1: should sid in enrolled and students have the same domain? A: sid in both enrolled table and students should have same type, either integer or string What should be done if a Students tuple is deleted or updated? (Suppose student goes to other school) Action DBMS could do: 1. Cascade to the child table and delete all tuples referring to current student. (Cascade the deletion) 2. Disallow such operation. (There are some pointers from enrolled table to student table, you are not allowed to delete student in student table because it will have dangling pointer, you delete rows of enrolled table first and then we will allow the deletion of that student in student table) 3. Allow to delete student, but change sid of that student in enrolled table to default sid or null.

Sometimes, action1 and action2 is not good for some cases: Professor retires, and he teaches the courses with 50 students, and professor retires, we need to keep the course, we have to change pid in enrolled table to be default value or null. Actions above can be picked differently for delete and update. In SQL, CREATE TABLE statement - create table with name, each attribute and data type. For the case of string, customer_name could be defined as VARCHAR (variable character) and Char (fixed character) VARCHAR(25) - allocate space up to size of 25 Char(25) - allocate fixed amount of space of 25 NOT NULL: not allowing null for particular attribute PRIMARY KEY: that particular attribute is primary key

Ex: CREATE TABLE CUSTOMER (CUSTOMER_ID VARCHAR(5) NOT NULL; CUSTOMER_NAME VARCHAR(25) NOT NULL PRIMARY KEY(CUSTOMER ID); Corresponding schema: Customer (Customer_ID: String, Customer_Name: String) If we have (CUSTOMER_ID VARCHAR(5) instead, customer_id cannot be NULL either because PRIMARY KEY(CUSTOMER ID) also enforces CUSTOMER ID to be not NULL. ON UPDATE/ON DELETE Actions, NO ACTION is default behavior.

Constraints in Cretae Table: Not Null: disallowing DBMS to take NULL Foreign Key: defining foreign key reference Unique: Primary Key can be null Unique can be null, but primary key itself cannot be null. Primary Key: Primary Key cannot be null. Check Condition: specifying a range of admissible values for that attribute.

Q2: Why did is the PRIMARY KEY instead of (did, SSN) as primary key? Also why this is not as efficient as the method 2 below? for one-to-many relationship, primary key is on the “many” side. Thus for manages table, primary key is “did”, which is on many side, which is department. should this is based on the fact that both entities are not weak entities?

Tables: Manages: ssn since did department: did dname budge did is repeated twice

save some space Method 2 applied to one-to-many relationship only Key constraint: 1-to-1, 1-to-many, many-to-many, many-to-one. Participation Constraint: total participation vs. partial participation

Department must have an employee, and that employee must be manager. (Departments is in total participation with employee, departments must have a manager) Employee is in partial participation, you can have manager not managing department. But Department must have a manager. For regular employee that works in department, departments is in total participation with regular employee and regular employee is in total participation with department.

Because department is in total participation with manager, thus, every did value in department table must appear in a row of the manages table (with a non-null ssn value) because ssn is primary, it cannot be null Q4: If there is a manager quit, we should not set SSN of manages table to default, why? If there is a manager quit, we can set NO ACTION behavior on foreign key deletion for manages table and manually replace SSN of manager who quits with an alternative manager’s SSN in manages table, and then we can delete the entry of quitted manager in Employees table? - best method

In manages table: ssn CHAR(11) NOT NULL: we should have a manager for the department. FOREIGN KEY (ssn) REFERENCES Employees (ON DELETE NO ACTION): if a manager quits, you will disallow the deletion of manager in manager table until the new manager is assigned in manages table.

ssn CHAR(11) NOT NULL: because dependents is weak entity set, it must be in total participation with identifying owner, which is employee, thus, we cannot have NULL SSN PRIMARY KEY(dname, ssn): because each row in weak entity set can be uniquely identified by primary key of identifying owner + partial key of weak entity set, thus, (dname, ssn) combined will be primary key. ON DELETE CASCADE: because dependents must have a corresponding employee, thus deleting an employee will automatically trigger the deletion of dependents associated with that employee.

What does this mean？ Attributes for relationships: Can be migrated to one of the participating entity sets for 1:1,1:N and N:1 relationships (which one?). But NOT for M:N relationships. A: Weak entity must have many-to-one relationship with identifying owner

When to use composite attribute and when to use multivalued attribute? A: they are different concept composite attribute: address (city, state, street, etc.) attribute: address: multiple addresses for one person Ex: person A 123 Street 456 Street 789 Street

Use Multi-value attribute vs Entity:

If Children is in relationship with other entities, use entity for children rather than attribute. For weak entity set, it must follow two constraint? - A: always require 1. Total participation with identifying owner 2. Many-to-one relationship with identifying owner

HW1 Question: 1. what is the relationship between profile picture in “Yelp User” and personal photo in “Photo” - they are the same 2. all photo, no matter business photo or personal photo, can be liked by a list of users? - yes 3. can I just regard parking type and ambient type to be attribute without having to show exact what parking type and ambient type is? - decided by developer 4. do I need to explicitly make activity wall as entity? -yes Since the functionality of activity wall can be done by searching from friendship table and “write review” table. Question related to EER-Example and EER-solution in lecture 4. (Session information: Each session includes a reference to one user in the system)

Q: should highlighted be one-to-many instead of many-to-many? A: one -to-many, thus diagram above is wrong. What does relationship in bold mean? Like Example below: A: This is relationship which indicates weak entity with identiyfing owner

For weak entity set, all three parts should be bold Integrity constraints apply in all cases, and you cannot take either of those constraints out.

Class 5: multi-value attribute: create a table with foreign key pointing to primary key of entity + multi-value attribute

Composite attribute: attribute composed of other attributes. Have columns for those attributes that are composed of. Q: Foreign key can be null or cannot be null? DDL (Data Definition Language): TABLE LEVEL DML: TUPLE LEVEL CHAR(n) -- fixed length VARCHAR(n) -- variable length Compute the cross-product of relation-list. If you have more than one relation. Table1: 2x3 Attribute1

Attribute2

Attribute3

Table2: 3x2 AttributeA

AttributeB

Resulting table contains 5 columns and 6 rows Attribute1

Attribute2

Attribute3

AttributeA

Relation = Table Find all products whose name mentions ‘gizmo’: PName LIKE ‘%gizmo%’

AttributeB

% = any sequence of characters _ = any single character Not equal:

Class 6: ORDER BY: sort the output ORDER BY manufacturer, price: sort by manufacturer, and then sort by price Order is ascending by default.

Joins: FROM Product, Company WHERE manufacturer = cname Join based on value of manufacturer attribute in Product = value of cname attribute in Company Always define alias from table: SELECT S.name, S.age FROM Sailors S S is alias here

Use long form here because sid exists in both Sailors and Reserves table. Range variable = alias

Find the sids of sailors who have reserved a red boat: SELECT S.sid FROM Sailors S, Boats B, Reserves R WHERE S.sid = R.sid AND B.bid = R.bid AND B.color = ‘red’ Better: SELECT S.sid FROM Boats B, Reserves R WHERE B.bid = R.bid AND B.color = ‘red’ Find the names of sailors who have reserved a red boat: Since you can only get name of sailors from sailors table, you need sailors table. SELECT S.sname FROM Sailors S, Boats B, Reserves R Where R.sid = R.sid AND B.bid = R.bid AND B.color = ‘red’ Find the names of sailors who have reserved at least one boat. SLECT S.sname FROM Sailors S, Reserves R WHERE S.sid = R.sid age1 = S.age - 1 is equivalent to S.age - 1 AS age1 UNION, INTERSECT, EXCEPT union-compatibility condition for UNION, INTERSECT, EXCEPT: 1. the result set must have the same number of columns 2. The result set must have the same order 3. The result set must have the same data types. (cannot be one is integer, another one is string)

doing the join only once.

doing the join twice By default, UNION, INTERSECT, EXCEPT eliminate duplicate, which can be costly. DISTINCT can be costly. UNION ALL, INTERSECT ALL, EXCEPT ALL: retain duplicate.

(2) Inner SELECT reduces number of columns and number of rows, does not calculate cross product but simply row matching and thus more efficient compared to (1) (2) from SELECT R.sid FROM Reserves R WHERE R.bid = 103, you get a table and match S.sid, thus “SELECT R.sid FROM Reserves R WHERE R.bid = 103” is only evaluated once

Q: why not “NOT IN” first one, which is S.sid NOT IN (SELECT R.sid in line (1) A: NOT IN can be placed in either locations, either location of S.sid IN (SELECT R.sid or location of R.bid IN (SELECT B.bid if a boat has multiple colors, for example boat1 may have color red and green, we should not select it, and then we have to change inner query to: SELECT B.bid FROM Boats B WHERE B.color = ‘red’ EXCEPT SELECT B.bid FROM Boats B WHERE B.color ‘red’; and then add “NOT IN” on either side.

EXISTS: Correlated Nested Queries.

(1) S.sid in inner query depends on S.sid in outer query, thus it is correlation. Pick every row of Sailors S table, take sid of each row, and evaluate the query, which is SELECT * FROM Reserves R WHERE R.bid = 103 AND R.sid = S.sid and if we have n tuples, the query above will be evaluated n times, suppose inner query has expensive JOIN, it will make query very inefficient. EXISTS checks result is empty or not.

LECTURE 7: Find the sum of ages of all sailors with a rating of 10 SELECT SUM(S.age) FROM Sailors S WHERE S.rating = 10; Find the name and age of the oldest sailor: SELECT S.sname, MAX(S.age) FROM Sailors S; above is wrong because you cannot combine aggregate operators with any other attribute unless you use group by. COUNT, SUM, AVG, MIN, MAX -> return just a number Correction: SELECT S.name, S.age

FROM Sailors S WHERE age = (SELECT MAX(S2.age) FROM Sailors S2);

Count the number of different sailors SELECT COUNT (DISTINCT S.sname) FROM Sailors S More SQL: Find the sage of the youngest sailor for each rating level: group based on rating level SELECT [DISTINCT] target-list FROM relation-list WHERE qualification GROUP BY grouping-list HAVING group-qualification GROUP BY results based on some set of attributes HAVING: group qualification based on group to eliminate group grouping-list has to be in target-list. The cross-product of relation-list is computed, tuples that fail qualification are discarded, ‘unnecessary’ fields are deleted, and the remaining tuples are partitioned into groups by the value of attributes in the grouping-list. Example: Find the age of the youngest sailor with age >= 18, for each rating with at least 2 such sailors. SELECT S.rating, MIN(S.age) FROM Sarilors S WHERE S.age >= 18 GROUP BY S.rating HAVING COUNT(*) > 1 Find the age of the youngest sailor with age >= 18, for each rating with at least 2 sailors (of any age) SELECT S.rating, MIN(S.age) FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating HAVING 1 < (SELECT COUNT (*) FROM Sailors S2 WHERE S.rating = S2.rating) Find those ratings for which the average age is the minimum over all ratings

SELECT S.rating FROM Sailors S GROUP BY S.rating HAVING AVG(S.age) = (SELECT MIN(AVG(S2.age)) FROM Sailors S2 Group by rating); WHERE and HAVING clause eliminate rows that don’t evaluate to true (rows evaluated to false or unknown) Aggregate functions ignore nulls (except count(*)) count(column A) ignore nulls DISTINCT treats all nulls the same COUNT, SUM, AVG, MIN, MAX (with or without DISTINCT): Discards null values first Then applies the aggregate Except COUNT(*)

Any time when Sailors table or Boats table gets updated, this assertion is re-evaluated. View: definition of relation, and has columns Left outer join: SELECT * FROM Sailor S, Reserve R WHERE S.sid = R.sid(+); Right outer join: S.sid(+) = R.sid

LECTURE 8: User Defined: Example -> calculate tax for an employee CREATE FUNCTION tax(P_value IN Number) P_value must be Number type RETURN Number IS (IS is part of syntax) BEGIN RETURN (P_value * 0.08); End; SELECT eid, name, salary, tax(salary) FROM Employee WHERE dept_id = 50; Example: SELECT eid, tax(salary) FROM Employee WHERE tax(salary) > (SELECT MAX(tax(salary)) FROM Employee WHERE dept_id = 20) ORDER BY tax(salary) DESC Syntax: CREATE FUNCTION ( IN , IN , ……) RETURN IS

BEGIN Executable SQL commands; RETURN (value you want to return) … [Exception Exception handlers] END Example: CREATE FUNCTION query_max_sal(P_dept_id IN Number) RETURN Number IS BEGIN RETURN (SELECT MAX(Salary) FROM Employee WHERE dept_id = P_dept_id); END

OR: CREATE FUNCTION query_max_sal(P_dept_id IN Number) RETURN Number IS v_num NUMBER; BEGIN SELECT MAX(Salary) INTO v_num FROM Employee WHERE dept_id = P_dept_id; RETURN(v_num); END

Object: has attribute and behavior entity: has attribute only

Create Oracle object CREATE TYPE person_type AS OBJECT ( name VARCHAR(30), phone VARCHAR(20), MEMBER FUNCTION get_areacode RETURN NUMBER); /

Create table of contacts with two columns CRETAE TABLE contacts ( contact person_type, c_date DATE); INSERT INTO contacts VALUES ( person_type(‘John Smith’, ‘408-740-1114’), ‘24 Jan 2004’);

Implement body of object type CREATE OR REPLACE TYPE BODY person_type AS MEMBER FUNCTION get_areacode RETURN VARCHAR2 IS BEGIN RETURN SUBSTR(phone, 1, 3); END get_areacode; END; / SELECT c.contacts.get_areacode() FROM contacts c;

LECTURE 9: create table with single column CREATE TYPE person_type AS OBJECT ( name VARCHAR2(30), phone VARCHAR2(30)); / CREATE TABLE person_table OF person_type; INSERT INTO person_table VALUES (person_type (‘Scott Tiger’, ‘321-123-1234’)); SELECT VALUE(p) FROM person_table p WHERE p.name = ‘Scott Tiger’; VALUE(Obj table) -> returns rows as object instances map method: the return value of map method is used to compare the object. CREATE TYPE circle_type AS OBJECT ( x NUMBER, y NUMBER, r NUMBER, MAP MEMBER FUNCTION get_area RETURN NUMBER); / CREATE TYPE BODY circle_type AS MAP MEMBER FUNCTION get_area RETURN NUMBER IS BEGIN RETURN 3.14 * r * r; END get_area; END; / SELECT * FROM circles c ORDER BY VALUE(c); // we do not see the area, used by DBMS to compute the area and sort those circles. When defining object, you could have either map me...