CS 6400 Notes Exam 1 PDF

Title	CS 6400 Notes Exam 1
Author	Michelle Pineda
Course	Database Applications
Institution	Central Georgia Technical College
Pages	19
File Size	281.9 KB
File Type	PDF
Total Downloads	66
Total Views	144

Preview

CLICK TO PREVIEW PDF

Summary

Notes for exam...

Description

CS 6400 Notes – Exam 1 Notes

•

Log10(3)/Log10(2) = Log2(3)

•

EER

•

Database Management System (DBMS) – a software system allowing users to create and maintain a database o

o

•

Uses: §

Have Data intensive apps

§

Need Persistent storage of data

§

Need Centralized control of data

§

Need Control of Redundancy

§

Need Control of Consistency and Integrity

§

Need Multiple User Support (Systems and People – ex Point of Sales (POS))

§

Need to Share Data

§

Need Data Documentation, Data Independence, Control of Access and Security, Backup and Recovery

Do Not Use when: §

The initial investment in hardware, software, and train is too high

§

The generality is not need (ex. Overhead too high)

§

Data and applications are simple and stable

§

Real-time requirements cannot be met by it

§

Multi user access is not needed

Data Modeling – model represents perception of structures of reality; data modeling process is to 1) fix a perception of structures of reality and 2) represent this perception o To do this people use 2 languages § Use Extended Entity Relationship model which is good for fixing a perception of structures of reality § Relational model which is good for representing the model inside a database

o Process modeling – aim to fix and represent a perception of processes of reality § Processes are not represented inside database but are reflected in way we use DBMS § Processes may be represented by • Embedded in program code or • Executed ad hoc o Data models contain – data structures, contraints, operations, keys and identifiers, integrity and consistency, null values, and surrogates o ***A data model is not the same as a model of data § A data model is the tool or fomulas used to create such a model o Relational model – data is representated in tables § Degree of the table = number of columns § Schema – consists of table name, column names, and data types; represents aspects of table that is constant over time so typically not expected to change § State – dynamics; reflects the state of reality (i.e. the rows) o Contraints express rules that cannot be expressed by the data structures alone (such as cities must be US cities) o Keys are uniqueness constraints – example making email primary key will force all emails to be unique o Surrogate – is a system generated artificial identifier for an entity; represents an entity of the real world inside the database; is immutable by the application programs § Ex. User_id § Name based representation does not have user_id – it just has a combination of other fields like email, name, address

• Architecture

2

o A database is a model of structures of reality; its divided into schema and data § Schema – describes intension (types) § Data – describes extension (data) o ANSI-SPARC 3-level architecture § This architecture separates out aspects of how data is physically organized into what is known as an internal schema. So its goes through the schema to an internal schema so the data is accessed and sent back § Has a third level, the external schema, that separates out aspects of how data is used by individual applications § External schema – use of data § Conceptual schema – meaning of data § Internal schema – storage of data o Conceptual Schema – describes all conceptually relevant, general, time-invariant structural aspects of reality; excludes aspects of data representation and physical organization, and access; describes structure of the whole database for a community of users (entities, data types, relationships, user ops, constraints) § Can write query but cant describe how results are displayed (other than order) and how the data is accessed § Ex. SELECT Email FROM RegularUSer o External Schema – aka view level; describes parts of the information in the conceptual schema in a form convenient to a particular user group’s view; is derived from the conceptual schema; includes a number of external schemas or user views; describes the part of the database that a particular user group is interested in

3

§ Ex. CREATE VIEW HighPayFemales as ( SELECT….) o Physical data independence allows the database administrator to change the internal schema without changing the conceptual schema o Internal Schema – describes hwo the information described in the conceptual schema is physically represented to provide the overall best performance; describes the physical storage structure of the database including data storage and access paths for the database o Multi-valued attributed is denoted as two ellipses o Attributes are ellipses o B+ trees create logarithmic time access to data so it would be possible to ask efficiently a query o Both B+ tree and index can not be seen by way of the applications, because of that it is possible to replace that, to remove it, or to add additional indices all without affecting the applications that run on the database o Physical data independence is a measure of how much the internal schema can change without affecting the application programs § Similar to OOP with encapsulation – where the implementation of a class can be changed without affecting the applications that access the class through its methods o The 3 levels (external, internal, conceptual) provides for 2 types of independence (physical, logical) o Logical data independence – measure of how much the conceptual schema can change without affecting the application programs

4

o It is more difficult to provide logical data indepence than it is physical data independence because the external schemata against which the applications are written. Those external schemata are logically derived from the conceptual schema. Thus if you have an application that accesses a table in an external schema and you change that table in the conceptual schema then it is plausible that the application will be effected. o Importance characteristics of a database approach is: § Information describing the database (i.e. meta-data) is stored in a catalog § Different users may see different views of the database § Data may be shared among many users o ANSI/SPARC DBMS Framework – has two pieces § Schema Compiler piece § Query Transformer piece § In diagram, diamond represent people working, boxes represent process or pieces of software that transform the text, triangle represents database where schema definitions are stored § Enterprise Administrator – will define a conceptual schema § Application System Administrator – responsible for defining external schemata § Database Administrator – defines internal schema § A User – can either query the database or can be queried via an application program o System Metadata – where data came from, how data were changed, how data are stored, how data are mapped, who owns data, who can access data, data usage history, data usage statistics

5

§ Critical in a DBMS o Business metadata – what data are available, where data are located, what the data mean, how to access the data, predefined reports, predefined queries, how current the data are § Critical in a Data Warehouse • Extended Entity Relationship Model o All entity type names must be unique o Properties are represented by ellipses in the chart; ex. Email, password – single valued property o Property values are – lexical, visible, audible; they are things that name other things o Identifying property type – there can only be a single entity instance identified by that email value § For each identifiying property value there is at most one instance of the identified entity – every entity must be uniquely referenceable o Composite Properties – composed of multiple items, ex. Name is composed of First name and Last name o Multi-valued Properties – modeled by double ellipses in diagrams; ex. Interests; there can be multiple interests or just one o Relationship types are represented by diamonds § 1-1 Relationship types – carnaliatity is 1; • The names of multiple relationship types between the same two entity types must be unique • Ex - marriage § 1-many relationship type – ex, employer to user because an employer can have many employees

6

§ Mandatory 1-N relationship types – signified by a bold solid line; also ex. Employer to employee § N-M relationship type (many to many) – cardinalities described on the lines; ex. People and the schools they attended § N-ary relationship types – ex. User-team-event to identify event team member • Many tertiary relationship types cannot be reduced to a conjunction of binary relationship types § Recursive relationship type – ex. A superviser of a superviser… § Supertypes and subtypes (is-a relationship types) – ex. User with subtypes of regular user and admin user • Disjointness constraint – means that the two subtypes cannot overlap (ex. Male and female); denoted by “d” • Overlapped allowed – denoted by “o” • Inheritance – regular user will inherit user properities of email and password even though email and password do not directly tie to regular user like how birthdate and currency directly tie to regular user o Union entity types – § For example, an employer can be one of two different type of entities – it could be a company or a government agency; denoted by U o Are relationships entities? – no? § Relationships may have attributes

7

§ For 1-N and 1-1 relationships, attributes may be moved to the entity on the “many-side” (either side) o Objectified relationship type – o Entity type vs Property types – § Ex. User is entity type (rectangle), LastName is property type (elliptical) – except when LastName is used to describe attributes like “son of/daughter of” o EER Model § 3 types of abstractions typically important when fixing a set of reality – classification, aggregation, and generalization • Classification – ability to define for entity types is in support of classification • Aggregation – EER does not explicitly support aggregation • Generalization – super subtype introduced to support this § In the example diagram user, school, and friendship were not an entity type that represented an aggregation of other entity types § The Enxted Entity Relationship model does not have a closed query language • Relational Model o SQL uses tuple Calculus o QBE uses domain calculus o Data Structures § A domain, D, is a set of atomic values (means values without meaning)

8

§ A relation, R, is a subset of the set of ordered n-tuples § An attribute, A, is a unique name given to a domain in a relation helping us interpret domain values § Degree = columns; tuples = rows; attribute name is column name; domain is type; cardinality = number of rows •

If some parameter needs to be constrained by a set of values and if we need the abilty to modify this set of values by the databases administrator (not necessary in application itself but possibly behind the scene, but still we want this set to be updatable) then the preferred way to model it in EER is to model it as an entity..

• Some functions are total, meaning that they are defined on all elements of their domain, some are partial meaning that there are elements in their domain on which they are not defined. That reflects the difference between mandatory and non-mandatory participation of the entity in certain relationship. •

Think about SQL:

• It operates upon tables of data and the result of any sql query is also table of data. That is very convenient as we can always use the result of any query in any next sql query. • This is what we can call the closed language. • In math when they talk about groups they define group as a set of elements and binary operation which given any two elements of our group produces the third element. Very important property of the group is that it is closed, meaning that the result of the operation is always one of the elements of the group.

• Some classic examples of closed systems would be say all integer numbers and say multiplication operation.

9

• The product of any two integers is also integer, so this is closed system. • On the other hand all integer numbers and division operation is not closed system as we can find two integer numbers where result of their division will not be an integer, say we cannot divide 3 by 2.

• So in case of EER the result of modeling of some system is not any kind of EER object , like say entity or relationship, so we cannot have that result as a whole and use it in modeling of more complex system.

• And that becomes a limitation of EER, • which is what Leo criticizes in this section and uses for explanation of why EER application to modeling, though useful, still is limited.

• The closeness is very convenient property of any language when it comes to understanding of very complex systems as it allows to assemble more complex systems from its smaller components, thus going to virtually any level of complexity. So, we can understand why lack of it is such a drawback.

• ----• this is because for weak enitity the partial identifier is unique only within the set of instances that depend on the same instance of the string entity. And we will have Exam1 in course CS6400 but we will also have Exam1 in course CS6300 and in many other courses So Exam1 cannot serve as a system-wide identifier, that is why it is a partial identifier, -----

10

• The point here is that this given EER does not restrict from the situation when the same question can be used on many tests. What we see in this diagram is that Test can have many sections, as there is one-to-many relatoiopnship between Test and Section. Then each Section can have many questions, as Section and Question have many-to-many relationship. But there is nothing in this diagram which tells us that the same question cannot appear in different tests. In the same vein there is nothing in this diagram which restricts the same question to appear more than once even on the same Test, say in different sections. It would be a little bit more subtle question, if it would be asking if the same question can appear more than once in the same section. We usually assume that there can be only one instance on Many-to-Many relationship between a particular pair of instances of entities which participate in the Many-to-many relationships, so that question (if that would have been present in the quiz) would have had answer No. • •

----In Relational Theory, we're operating on sets. A many-to-many relationship means an entity can be related to many of the second entity, each of which may themselves be related to many of the first entity... but as with set theory, you can never have two instances of the same relationship between THE SAME TWO entities.

• And there's really not a good reason to allow that anyway, in the rare instance where you need more than one, they're actually two different relationships that you're probably trying to model, not the same one multiple times. • I shouldn't say "never", if there are attributes on the relationship, then they aren't the same relationship. But if I remember this one, there's no

11

attribution on the relationship which would make multiple instances of the relationship non-duplicate. • • And instance of this is the Udacity/EdX quiz that came up recently in which it was asking for the identifying key of a weak entity, and Leo's answer is that there are three parts, because he includes the YearGraduated from the relationship itself in the identifying key. That's an instance where you'd be allowed multiple instances of the relationship between the same student and school, as long as each had a different YearGraduated. • --•

double eclipses means multiple-valued property

•

double diamond means identifying relationship

•

double rectangle means weak entity

• Bottom of U is oriented towards subclass •

Double line means manadatory relationship

•

mandatory / non-mandatory, disjoint/overlapping subclasses????

•

----

• By identifier we imply identifying attribute or it also can be called identifying property. • So if it asks about attributes then it is the questions about attributes, not necessarily identifying attributes. • If it asks about identifiers, then it asks about identifying attributes. •

----

• Question is a strong entity with questionID being its identifier. • SectionID, TestID, courseID are not required to identify Question. •

12

----

• typically supertype has identifier and instances of subtype are also instances of supertype so they have the same ids inevitably. • • Perhaps in theory it may be possible, but I have hard time thinking of a real time example, where each subtype has its own identifier, especially when doing EER we are usually thinking of natural attrributes as identifiers. If such situation arises, it may suggest re-thinking the inheritance model in such system. In case of Union it would be normal situation that each component has its own identifier but Union is not about inheritance

• •

13

Data abstraction – the characteristic that allows program-data independence and programoperation independence; generally refers to the suppression of details of data organization and storage and the highlighting of the essential features for an improved understanding of data

o Data model – type of data abstraction that is used to provide conceptual representation; a collection of concepts that can be used to describe the structure of a database – provides the necessary means to achieve this abstraction o By structure of a database we mean the data types, relationships, and constraints that apply to the data

• Data normalization – having each logical data item – such as student name or birth date – in only one place in the database o Ex. Splitting things up into ads, apps, events tables • Denormalization o Ex. Putting everything in the events table • An entity represents a real-world object or concept such as an employee or a project from the miniworld that is described in the database • An attribute represents some property of interest that further describes an entity, such as the employee’s name or salary; a thing or object in the real world with an independent existence (ex. Company or person or course) o Each entity has attributes – the particular properties that describe it • A relationship among two or more entities represents an association among the entities, for example, a works-on relationship between an employee and a project • Database schema – description of the database; a displayed schema is called a schema diagram • Each object in the schema – such as STUDENT or COURSE – a schema construct • Data inpdependence – the capacity to change the schema at one level of a database system without having to change the schema at the next higher level

14

o Logical data independence – capacity to change conceptual schema without having to change external schemas or application programs o Physical data independence – capacity to change the internal schema without having to change the conceptual schema • Data Definition Language (DDL) – used by database administrators (DBA) and by database designers to define both schemas • Storage Definition Language (SDL) – used to specify the internal schema; in most relational DBMSs today there is no specific language that performs the role of SDL instead the internal schema is specified by a combination of functions, paramters, and specifications related to storage of files • View definition language (VDL)...