Infs5710 final note PDF

Title Infs5710 final note
Author Mengxue Xia
Course Information Technology Infrastructure for Business Analytics
Institution University of New South Wales
Pages 16
File Size 1.3 MB
File Type PDF
Total Downloads 89
Total Views 143

Summary

the note for final exam...


Description

Chapter1 Database Definition: ● specialized structures that allow computer-based systems to store, manage, and retrieve data very quickly. ● Shared, integrated computer structure that stores a collection of o End-user data: Raw facts of interest to end use o Metadata: data about data, which the end-user data are integrated and managed; Describe data characteristics and relationships Why database? ● Ubiquitous&Pervasive: meaning everywhere and inescapable. From birth to death, we generate&consume data. Databases are the best way to store&manage data. It makes data persistent and shareable in a secure way.(e.g. The pervasive nature of database-Susan’s Day)



Also essential for organizations to survive&prosper.(know customers, products, employees, debtors, creditors...). The ultimate purpose of all business information systems is to help businesses use information as an organizational resource.) o Data system: collection, storage, aggregation, manipulation, dissemination, management of data.

Data vs. Information:

● Raw facts, or facts that have not yet been processed to reveal their meaning to the end user. ● Building blocks of information ● Data mgmt: Generation, storage, and retrieval of data



The result of processing raw data to reveal its meaning. Information consists of transformed data and facilitates decision making.(raw data must be properly formatted-summary/graphical/sounds/videos for storage, processing, and presentation; e.g. dates might be stored in Julian calendar formats within the database, but displayed in a variety of formats, such as day-month-year or month/day/year, for different purposes.

1

● ●

Respondents’ yes/no responses might need to be converted to a Y/N or 0/1 format for data storage.) production of accurate, relevant, and timely information is the key to good decision making. Good decision making is the key to organizational survival in a global environment Data is the foundation of information, which is the bedrock of knowledge—that is, the body of information and facts about a specific subject. A key characteristic is that new knowledge can be derived from old knowledge.

Database Management System(DBMS): Definition: Is a collection of programs that manages the database structure and controls access to the data stored in the database. (helps make data mgmt more efficient and effective. ) Functions: ● Serves as an intermediary between the user and the database.The database structure itself is stored as a collection of files, and the only way to access the data in those files is through the DBMS.



Advantages: o Improve data sharing: enables the data in the database to be shared  among multiple applications or users o Better data security: integrates the many different users’ views of the data into a single all-encompassing data repository o Improved data security o Minimized data inconsistency o Improved data access o Improved decision making o Increased end-user productivity

Types of Databases*: ● single-user database: A database that supports only one user at a time. ● desktop database: A single-user database that runs on a personal computer. ● multiuser database: A database that supports multiple concurrent users. ● workgroup database: A multiuser database that usually supports fewer than 50 users or is used for a specific department in an organization. ● enterprise database: The overall company data representation, which provides support for present and expected future needs.

2

● ● ● ● ●

centralized database: A database located at a single site. distributed database: A logically related database that is stored in two or more physically independent sites. cloud database: A database that is created and maintained using cloud services, such as Microsoft Azure or Amazon AWS. General-purpose database: A database that contains a wide variety of data used in multiple disciplines. Discipline-specific database: A database that contains data focused on specific subject areas.

The importance of database design: ● Database design: focuses on the design of the database structure that will be used to store and manage end-user data o Well-designed database: Facilitates data mgmt & Generates accurate and valuable information o Poor-designed database: causes difficult-to -trace errors(ppt-good&bad design sample) Flaws in file system data mgmt: ● Structural and data dependence ● Data redundancy: exists when the same data is stored unnecessarily at different places ● Data anomalies

Chapter 2 Data modeling, data models&why important: ● Model: abstraction of a real-world object or event ● Data modeling: Iterative and progressive process of creating a specific data model for a determined problem domain ● Data models: Simple representations of complex real-world data structures (A representation, usually graphic, of a complex “real-world” data structure. Data models are used in the database design phase of the Database Life Cycle.) o Represents data structures and their characteristics, relations, constraints, transformations, and other constructs which are useful for supporting a specific problem domain o Help understand the complexities of the real-world environment ● The importance of Data Models: o As a communication tool: Data models can facilitate interaction among the designer, the applications programmer, and the end user. o Give an overall view of the database(a sound data environment requires an overall database blueprint based on an appropriate data model.) o Organize data for various users(When a good database blueprint is available, it does not matter that an applications programmer’s view of the data is different from that of the manager or the end user. Conversely, when a good database blueprint is not available, problems are likely to ensue. For instance, an inventory management program and an order entry system may use conflicting product-numbering schemes, thereby costing the company thousands or even millions of dollars.) o Are an abstraction for the creation of good database: abstraction=’blueprint’ Describe Data model basic building blocks:

3

● ●





Entity: a unique and distinct-”distinguishable” object, physical or abstractions (person, place, thing, or event) about which data will be collected and stored Attributes: characteristic of an entity(a CUSTOMER entity would be described by attributes such as customer last name, customer first name, customer phone number, customer address, and customer credit limit. Attributes are the equivalent of fields in file systems.) Relationships: describes an association among entities o one-to-many(1:M) : “CUSTOMER generates INVOICE” o many-to-many(M:N) : “EMPLOYEE learns SKILL” (An employee may learn many job skills, and each job skill may be learned by many employees.) o one-to-one(1:1): “EMPLOYEE manages STORE” Constraints: set of rules to ensure data  integrity(e.g. GPA must be between 0.00 and 4.00; each class must have one and only one teacher) o Entity integrity: enter SID to STUDENT table, it finds two students o Referential integrity: STUDENT table says that student X’s department code is Y, but in DEPARTMENT table, there is no code Y.

Business rules: ● Brief, precise, and unambiguous description of a policy, procedure, or principle within a specific organization(must be rendered in writing and updated to reflect any change in the organization’s operational environment. e.g.relationship statements such as “an agent can serve many customers, and each customer can be served by only one agent,” ); establish entities, relationships, and constraints. ● Enable defining the basic building blocks ● Describe main and distinguishing characteristics of the data Translating business rules into data model components: ● Nouns translate into entities ● Verbs translate into relationships among entities ● Relationships are bidirectional ● Ask two Questions to identify the relationship type o How many instances of B are related to one instance of A? o How many instances of A are related to one instance of B? Naming conventions: ● Entity names-required to be descriptive of the objects in the business environment, and use terminology that is familiar to the users ● Attribute names-required to be descriptive of the data represented by the attribute ● Proper name o Facilitates communication between parties o Promotes self-documentation The relational model: ● Produced an “automatic transmission” database that replaced “standard(manual) transmission” databases ● Based on a relation o Relation or table: matrix composed of intersecting tuple and attribute ▪ Tuple: rows ▪ Attribute: columns ● Describes a precise set of data manipulation constructs

4

The entity relationship model: ● Graphical representation of entities and their relationships in a database structure ● Entity relationship diagram (ERD) o Uses graphic representations to model database components (an entity is a table in ER model) ● Entity “instance” or entity “occurrence” o Rows in the relational table ● Connectivity: term used to label the relationship types (a relationship refers to the association between tables)

5

The evolution of data models:

6

Chapter 3 the table view of data makes it easy to spot and define entity relationships, thereby greatly simplifying the task of database design.

● relation = table ● row = tuple = ”entity” ● column = attribute Keys: ● consist of one or more attributes that determine other attributes ● function: o ensure that each row in a table is uniquely identifiable o establish relationships among tables and to ensure the integrity of the data ● Primary key (PK): Attribute or combination of attributes that uniquely identifies any given row(SID is PK in student table; employee ID is PK in employee table) ● Determination: The role of a key. In the context of a database table, the statement “A determines B” indicates that knowing the value of attribute A means that the value of attribute B can be looked up. o basis for establishing the role of a key(to search along the right ‘row’) o based on the relationships(functional dependence-given a value for STU_NUM(determinant), you can determine the value for STU_LNAME(dependent)) among the attributes ▪ standard notation: STU_NUM S STU_LNAME Dependencies: ● Functional dependence: value of one or more attributes determines the value of one or more other attributes ● Full functional dependence: the entire collection of attributes in the determinant is necessary for the relationship:

7

● (STU_NUM, STU_LNAME)-> STU_GPA is not a full functional dependence, because STU_LNAME is useless in determining STU_GPA. Types of Keys: ● Composite key: key that composed of more than one attribute((A.B.) -> C, e.g. ENROL table that find one’s grade will need a composite key of name and course, to uniquely identify a record) ● Key attribute: attribute that is part of a key o One key attribute: STU_NUM S STU_GPA o Four key attributes: (STU_LNAME, STU_FNAME, STU_INIT, STU_PHONE) S STU_HRS ● Entity integrity: condition in which each row in the table has its own unique identity o all of the values in the primary key must be unique o no key attribute in the primary key can contain a null ● Null: absent representation of any data value o an unknown attribute value o a known, but missing, attribute value o a inapplicable condition ● Referential integrity: ● Foreign key: link to another table(must be the PK of another table) or be the null(e.g., STUDENT.supervisor = “null” (not yet assigned)); table without foreign key is self-sufficient ● Secondary key: key used strictly for data retrieval purposes(e.g. ID-PK unavailable, but name can be the secondary key, although may not be unique) ● Super key: (uniquely identify any row in the table. In other words, a superkey functionally determines every attribute in the row. In the STUDENT table, STU_NUM is a superkey, as are the composite keys (STU_NUM, STU_LNAME), (STU_NUM, STU_LNAME, STU_INIT) and (STU_LNAME, STU_FNAME, STU_INIT, STU_PHONE). In fact, because STU_NUM alone is a superkey, any composite key that has STU_NUM as a key attribute will also be a superkey.) o e.g. the set of all attributes

8

Integrity Rules:

● very important to good database design ● RDMS(relational database management systems) enforce integrity rules automatically Relationships within the Relational Database: ● 1:M relationship - N  orm for relational databases

● ● 1:1 relationship - One entity can be related to only one other entity and vice versa

● Many-to-many (M:N) relationship - Implemented b  y creating a new entity in 1:M relationships with the original entities

9

o Composite entity (Bridge  or associative  entity): Helps avoid problems inherent to M:N relationships, includes-as foreign keys-at least the primary keys of tables to be linked Converting M:N to 1:M:

● Linking table is the implementation of a composite entity

Data redundancy revisited (data redundancy leads to data anomalies, which can destroy the effectiveness of the database; if you delete an attribute and the original information can still be generated through relational algebra, the inclusion of that attribute would be redundant) ● Relational database facilitates control of data redundancies through use of foreign keys ● To be controlled except the following circumstances o Data redundancy must be increased to make the database serve crucial information purposes 10

o Data redundancies sometimes seem to Exists to preserve the historical accuracy of the data Entity relational model(ERM): ● ERD-entity relationship diagram depicts: o Conceptual database as viewed by end user o Database’s main components ▪ Entities(tables) ▪ Attributes(columns of tables) ▪ Relationships(associations between tables) ● Entity: (an object of interest to the end user) o Actually refers to the entity set and not to a single entity occurrence o An entity in the ERM corresponds to a table (not a row in the relational environment)-e.g. in crowfoot, an entity is represented by a rectangle contains the entity’s name-in all capital letters. ● Attributes: characteristics of entities

o Required attribute: must have a value, cannot be left empty o Optional attribute: does not require a value, can be left empty o Domain: set of possible values for a given attribute (e.g. domain for gpa is (0,4)) o Identifiers(PK): one or more attributes that uniquely identify each entity instance (called keys in relational model, entities are mapped to tables, entity identifier is mapped as the table’s PK) ▪ composite identifier: primary key composed of more than one attribute(composite key, e.g. ENROLL table) ● e.g. using a composite primary key of CRS_CODE and CLASS_SECTION instead of using CLASS_CODE. either way uniquely identifies each entity instance(CLASS_CODE is the primary key, combination of CRS_CODE & CLASS_SECTION is a candidate key, if CLASS_CODE deleted, (CRS_CODE and CLASS_SECTION) becomes an acceptable composite primary key.) o composite attribute: can be subdivided to yield additional attributes(e.g. phone number can be divided into area code/exchange number/three-digit code...) o simple attribute: attribute that cannot be subdivided(e.g. age/sex/marital status) o single-valued attribute: has only a single value(may be subdivided) o multivalued attributes: have many values(a person may have several college degrees, and a household may have several different phones), and require creating: 11

● new attributes, one for each component of the original multivalued components ● a new entity , composed of the original multivalued attribute’s components (this new CAR_COLOR entity is related to the original CAR entity in a 1:M relationship) o derived attribute: whose value is calculated from other attributes, using an algorithm (e.g. age from DOB)

(chen dotted line means derived attribute, crow foot does not specify)

Relationships: association between entities(tables) that always operate in both directions ● participants: entities that participate in a relationship ● connectivity: describes the relationship classification (classification include 1:1, 1:M, M:N) ● cardinality: expresses the minimum and maximum number of entity occurrences associated with one occurence of related entity

(business rule: each professor teaches at least one class, but no more than 4)

(min=0: optional min=1: mandatory) 12

((there may be courses for which sections (classes) have not yet been defined. In fact, some courses are taught only once a year and do not generate classes each semester.)

(This condition is created by the constraint imposed by the semantics of the statement “Each COURSE generates one or more CLASSes.”) (another e.g. it is quite possible for a PROFESSOR not to teach a CLASS. Therefore, CLASS is optional to PROFESSOR. On the other hand, a CLASS must be taught by a PROFESSOR. Therefore, PROFESSOR is mandatory to CLASS. Associative (composite/bridge) entities: used to represent an M:N relationship between two or more entities(implementing the M:N relationship, particularly in the relational model, requires the use of an additional entity-in ERM is the associative entity) ● in a 1:M relationship with the parent entities (composed of the primary key attributes of each parent entity) ● can have additional attributes that play no role in connective process

13

(ENROLL(0,N)-may be no student signed up for the class) (the associative entity is identified as a strong (identifying) relationship, as indicated by the solid relationship lines between the parents and the associative entity.) ● *the optionality reflects practice: ○ from database perspective: to be classified as a STUDENT, a person must be enrolled in at least one CLASS-CLASS is mandatory to STUDENT. ○ in practice: initially, a student may not yet signed up for any classes-CLASS is optional to STUDENT. (The practical considerations in the data environment help dictate the use of optionalities, and the database design must reflect this.) Developing ER Diagram: ● create a detailed narrative of the organization’s description of operations ● identify business rules based on the descriptions ● identify main entities and relationships from the business rules ● develop the initial ERD ● identify the attributes and primary keys that adequately describe entities ● revise and review ERD

14

15

Chapter 5 Extended entity relationship model(EERM): result of adding more semantic constructs(such as entity supertypes, entity subtypes, and entity clustering) to the original ERM ● EER Diagram(EERD): uses the EER model Entity supertypes and subtypes: ● *why ‘types’: ○ It avoids unnecessary nulls in attributes when some employees have characteristics that are not shared by other employees. ○ It enables a particular employee type to participate in relationships that are unique to that employee type. ● Entity supertype: generic entity type related to one or more entity subtypes(contains common characteristics) ○ e.g. employee ● Entity subtype: contains unique characteristics of each entity subtype ○ e.g. pilot/mechanic/accountant ○ ● criteria to determine when to use subtypes and supertypes: ○ there must be different, identifiable kinds of the entity in the user’s environment ○ the different kinds of instances should each have one or more attributes that are unique to that kind of instance

16...


Similar Free PDFs