Information Systems 222 Notes PDF

Title Information Systems 222 Notes
Author Bradley Hosking
Course Database Systems
Institution University of Auckland
Pages 20
File Size 1000.2 KB
File Type PDF
Total Downloads 3
Total Views 167

Summary

Slides and examples used with permission from Johnny (He made them freely available) Notes missing SQL and database management content ...


Description

Course Notes Learning Objectives: ● Use data modelling to define and analyse data requirements ● Design a database in entity-relationship (ER) model and translate that to the relational model ● Utilise theoretical techniques like normalisation to improve database design and implementation ● Program basic to complex queries in SQL ● Understand the fundamentals of a relational database management system (DBMS) and its limitations ● Recognise database technologies beyond relational DBMS Draw.io - https://www.draw.io/ SQLite - http://sqlite.org/download.html Atom - https://atom.io/ Lecture Powerpoints - h  ttps://github.com/johnny723/sql

Introduction File systems: ● Easy to duplicate data, potential data redundancy, and then leading on to inconsistent data as a result of needing to update the same thing multiple times. ● Limited sharing capabilities, difficult to have multiple people working on the same file at the same time (New tech makes it easier, but database still wins here). ● Relatively easy to lose information, eg, losing access to a file that requires a decade old program to open - you rely on the life on the application to access your data. ● However it is also relatively easy to use, and good for storing small amounts of data Databases: ● Can be defined as “A structured set of data held in a computer, especially one that is accessible in various ways”. ● Data stored within a specific and set logic, and hence often requires management by a DBMS, allowing for ease of access and organisation. ● Better for complex data, long term storage, and data that you want to draw information out ● Databases are built off of the file system, but organized/structured under specific rules ● DBMSs can further be defined as: “A system for providing efficient, convenient, and safe storage of, and multi-user access to (possibly massive) amounts of persistent data” ● Amazon, Facebook and Canvas are all examples of DBMSs ● There have been 5 major forms of databases, from oldest to newest: Hierarchical (Everything defined by parent to child, with everything coming from an ultimate parent. Think like a series of folders, each folder containing 3 more folders. Difficult showing relationships between child however), Network (Anything can connect to anything, forming a giant network, difficult to manage and inefficient), Relational (Still the most predominant form and the focus of the course - also behind the internet), Object-oriented and O  bject-Relational (These are the most modern forms, however they are relatively small currently)

The Relational Data Model ● Data is saved in a number of named tables with named columns. Columns function as the description of the data they contain, the rows are the actual data. ● The concept of data splitting for differentiation - Eg, Names split into first, last and middle names - needed for large scale databases where doubles happen ● Data duplication reduced/avoided by having multiple tables - Imagine a table for contact information, but some people have multiple emails. Instead of having a new row with duplicated data to incorporate each of their emails into your table, link tables with unique IDs, and in your new table, store the emails there, with minimal redundancy (Only the ID is duplicated here, as it’s mentioned twice, once in each table) ● Organising data in this format follows 3 main steps ○ 1: (Plain english into ERD) Gather and then analyse the data requirements, identify all entities and relationships, and then describe them with an ERD ○ 2: (ERD into relational tables) Convert the ERD into a number of relations ○ 3: (Fine tuning) Eliminate/Reduce redundancy by splitting up relations (Normalization) ● Businesses are often short sighted, getting the correct system in place is vital, a little extra effort initially will save alot of effort latter! A stitch in time saves nine….. ● Conceptual (Thinking through how to organize the data, highly abstract) to Logical (Reasoning the relationships) to Physical (Defining everything specifically) ● We express this model in 2D tables, but the reality of how the information is stored is far more complex, remember it is part of a digital system SQL - Structured Query Language ● Its convenience and efficiency is a major part of relational databases success ● Based on Relational Algebra ● “Relation” is not exactly interchangeable with “table” but close enough for this course ● Syntax = the way the code can be written ● Semantics = the meaning of the code, what it does ● Semantics errors are when the code is nonsensical, but can be run (Comparing an accounts balance, with its randomly assigned ID number, can run, but pointless), Syntax errors are when the code is broken and cannot be run. ● For the “Account” table previously seen, we would say: To find the balance of account 22222 ■ SELECT balance ■ FROM Account ■ WHERE accno = 22222; ○ If we removed the 3rd line it would list us all balances from the table ○ Or we could chance it to: WHERE balance < 1000 ○ Or we could further specify our search with: ■ SELECT accNo, balance ■ FROM Account ■ WHERE type = ‘loan’ ■ AND balance < -10000; ○ ‘Loan’ has quotations because it is a string (Of text) ● Always end with a ; (Semicolon) ● * is used as shorthand to select all columns/attributes





SQL is also used to link tables/relations together to draw out more meaning, drawing from this second example table named “Holder”: ■ SELECT name ■ FROM Account, Holder ■ WHERE Account.accNo = Holder.accNo ■ AND Account.type = ‘check’; ○ Line 1 is asking for what we want (a list of names) ○ Line 2 is specifying where we will find the relevant data (2 tables in this case) ○ Line 3 is linking the 2 tables together via the ‘accNo’ attribute which appears in both of them (Linking the account details table “Account” with the account holder details table “Holder” and hence linking an account with the person that actually owns it) ○ Line 4 is specifying that only check accounts are relevant ○ Hence overall, we get a list of ‘check’ account holders. The code find which accounts are check accounts, and then relates this data to the ‘Holder’ table through the ‘accNo’ attribute and lists out the linked, relevant names. Generically: ■ SELECT column1, column2, ….. ■ FROM relation1 ■ WHERE ; ○ Or ■ SELECT column1, column2, ….. ■ FROM relation1, relation2, ….. ■ WHERE ■ AND ;

Find the essential further readings in the google drive folder, and here: http://philip.greenspun.com/sql/introduction.html

The Relational Model Defining Model: An abstraction of reality, or part of it, used when reality is too complex to comprehend A data model is a precise and conceptual way of describing and specifying the data stored in a database. Its structure (This is the main different from a file system remember), operation (Create, Read, Update and Delete, CRUD) and constraints (Remember when you define what something is, you also design what something isn't) The relational data model stores all of the data in the form of relations Tuples ● ● ● ● ●

Relations consist of multiple tuples Defined as an order list of values Usually written in parentheses with commas separating the values Eg, (Interstellar, 2014, 169, colour) Remember order is important here! Tuples with the exact same values, but with a different order, are classified as entirely new tuples

Relations ● A set of tuples, defined on certain domains ● Order of the tuples is not relevant ● Eg, - using the previous example, this relation could be defined upon the domains of: ○ Text ○ Integer ○ Integer ○ { ‘colour’, ‘black and white’ } ● Hence when designing a relation, defining the domains for your values is vital Attributes ● Assigned to the different components of a tuple in order to refer to them ● Eg, - using previous example, in order: title, year, length, filmtype ● Basically the title of the Relation’s columns ● Note that “Attribute” can refer to the entire column, or to a specific attribute within it, be careful Data type / Domain ● The values of an attribute must belong to a domain, that is its data type ● A requirement of SQL ● Common data types are: TEXT for text strings, INTEGER for integers, REAL for real numbers, DATE for gregorian dates ● Different DBMSs have different datatypes, but the above are reasonably universal Schema ● Consists of the name of the relation and the set of its attributes (With or without data types) ● Hence, using the previous example, it could be: Movie (title, year, length, filmType), or it could be: Movie (title TEXT, year INTEGER, length INTEGER, filmType TEXT) ● The combined schemas of all relations in a database form its database schema Relation instances ● Relations change over time as tuples are inserted, deleted or updated ● Hence, a set of tuples of a single relation at a single moment is defined as an instance of that relation, as per convention, databases maintain only a single instance of each relation - the current one, at any one time. ● External tools are used to backup data outside of the database ● Scheme changes result from data requirement changes and are hence much more uncommon and expensive than simply adding or removing tuples within the existing framework. Keys ● Is the most fundamental data restraint in the relational model (The Key constraint) ● Formed from a set of attributes, where we do not allow 2 tuples to hold the same value ● This set of attributes would then exist across different relations, in order to link them ● Artificial keys are often used in order to ensure absolute certainty of uniqueness Remember that the Relational Data model, is only a single model, and that others exist. Eg, this is the model used by the internet

SQL - Structured Query Language ● SQL stores relations in the form of tables with attributes becoming columns, and tuples becoming rows ● Has 2 sides, data manipulation, and data definition ● Relations in SQL are expressed as below, with attributes on the left, and their corresponding data types in the right ■ CREATE TABLE Movie ( ■ Title TEXT, ■ Year INTEGER, ■ Length INTEGER, ■ FilmType TEXT ■ ); ● Modifying the relations can be done through: ■ DROP TABLE R; ○ Removes relation R along with all of its tuples from the database ■ ALTER TABLE R RENAME TO S ○ Renames the relation R to S ■ ALTER TABLE R ADD COLUMN newColumn1 T  EXT; ○ Adds a new attribute to R named ‘newColumn1’ ■ ALTER TABLE R ADD COLUMN newColumn1 T  EXT D  EFAULT ‘Yes’ ; ○ Repeats this, but includes a specified default value ○ If no default value is specified, the default default value is null ○ Deleting an attribute is often achieved via dropping, and then recreating the entire tuple ● Declaring constraint is achieved via: ■ CREATE TABLE Movie ( ■ title TEXT, ■ year INTEGER, ■ length INTEGER, ■ filmType TEXT, ■ PRIMARY KEY (title, year) ■ ); ○ Resulting in the title and year attributes being set as primary keys ○ You can also declare other constraints with the commands: ■ NOT NULL ○ Creating a column that can contain no null values ■ UNIQUE ○ Creating a column that can only contain unique values ■ FOREIGN KEY ○ Creating a column with values that come from another table ■ CHECK ○ Checks values with a logic you define ○ Eg, checking mobile phone numbers to confirm validity ■ DEFAULT



Tuples can be manipulated through: ■ INSERT INTO Movie VALUES ■ (‘ Interstellar', 2013, 168, 'colour'); ○ To insert a new tuple into a relation ■ UPDATE Movie ■ SET year = 2014, length = 169 ■ WHERE title = 'Interstellar'; ○ Note the above example makes 2 distinct changes within the table ○ If the ‘WHERE title =’is forgotten, you will replace all of the attribute values ○ To update a value within an existing tuple ■ DELETE FROM Movie WHERE title = 'Interstellar';

ER Modeling Data Modeling is defined as the process of creating a data model for a given set of data, from the conceptual stage, to logical and then physical. The business rules and logic must be translated. How data is defined, structured and stored has significant implications as to how it can be accessed and manipulated. Good data models simply require good justification. The ER Model - A systematic way of defining and describing a business process.The process is modeled as entities, linked together by relationships that express the dependencies and requirements between then. ERDs are a tool used to represent the ER model. ● Entities ○ An entity is an object, either abstract or physical, Eg, Interstellar ○ They are grouped together into entity sets (Conventionally named with a singular noun relating to the business), Eg, Movie ○ In ER modeling, sometimes an entity set is called an entity, and an entity is called an instance ● Relationships ○ Connects entity sets ○ Degree refers to how many entity sets participate in the relationship, ■ Normally binary, that is it exists between 2 entity sets. ■ Unary exists as a single entity's relationship with itself ■ Ternary relationships have 3 entity sets participating ■ N-nary is for numbers beyond 3 ○ Cardinality ■ Multiplicity, how many entities from one entity set is associated with one entity from another entity set. Eg, 1 building has many rooms, 1 doctor has many patients. Can be One-to-One, One-to-Many or Many-to-Many. Worth remembering that each relationship is a 2 way street. ■ Optionality - Optional or Mandatory. Every bedroom must have a building, but not every building must have a bedroom

Every box is an entity, and every line is a relationship. Note the Crow’s Foot Notation on the right ^^

Examples of Unary relationships ^^ As shown, can occur when a single entity includes different items that are related to each other in some way, Eg, “Employee” includes workers AND their managers, or “Item” consists of cars AND their parts. But this is not always the case (A person marries another person)

Examples of Binary relationships ^^

There is the potential for 2 entities to have more 1 than relationship between them, hence it is good practice to include a short text phrase to describe each unique relationship ^^ ●



Attributes ○ A property/characteristic of an entity set ○ Eg, The attribute “name” for the entity set “movie” ○ Every entity within an entity set will have a value for each of its attributes ○ Its value can be: ■ Atomic (Single valued, a person name) vs Multivalued (Multiple values, people in a team) Use of multivalue is bad design, steer clear ■ Stored (User score of a movie) vs Derived (Mean user score of a movie) Keys ○ Candidate Keys are sets of attributes whose unique values can identify an entity within an entity set ○ A Primary Key is selected when there are multiple Candidate Keys, Primary Keys values must not change, and must not include a null value.

Note the separate “Quantity”, in the first example diagram, and the separate ‘loanDate’ and ‘returnDate’ in the second. These exist because while they are important, they cannot be stored in any entity sets. Hence they must be associated with the relevant relationship. ERDs with attributes and keys define what we are interested in storing, and hence also what we are not interested in.

Its good to keep in mind when designing an ERD, what the actual purpose is for, often there may be a central entity that everything links back too. Eg, for the example above, the ERD is clearly centered on artwork, which is appropriate considering the database would be used to keep track of them. 1, Find the Entities, list their attributes 2, Set out the relationships, establish their nature 3, Read over the case and catch all the details When rationalizing your ERD, read both ways across the relationships, and reference the case Even if it’s unlikely, always design for what could potentially occur Conceptual Data Model ● Comprehensive description of the business’s informational needs ● Covers the things of significance (Entity sets) which we wish to collect information about (Attributes) and the associations between them (Relationships) ● Often used in massive, extremely complex systems, which then further narrows down into the logical level Logical Data Model ● A representation of some/all of the businesses data, independent of any DBMS ● Described in business language ● Adjusted to achieve certain efficiencies Associative Entity sets ● Entities that are being used to represent a relationship with its own attributes ● Result due to SQL limitations, and from the wish to collect data on those attributes, even though they do not belong to an entity set ● A many to many relationship is replaced with an entity and 2 one to many relationships ● Part of logical data modeling ● Removes the need of the separate boxes along relationships that mention attributes that do not belong in any entity set (Like ‘Quantity” from the previous example) ● From the first example, Enrolment would function as an abstract solution, Certificate would function as a physical solution ● Note the cardinality in use here, as it always follows the same pattern - They are needed when you have a many-to-many relationship, even if its unary ● For the second example, we see the resulting associative entity is capable of holding data for both potential types of items with it’s 2 primary keys

Existence Dependency ● Concerns whether an entity set is ‘strong’ or ‘weak’ ● Strong Entity sets are capable of existing independently and have their own unique ID ● Weak Entity sets depend on a strong entity set and only have a partial Identifier. They will have multiple primary keys, at least one from a strong entity, and one of their own. Eg, Room numbers cannot exist without their corresponding building numbers. ● Relationships between strong and weak entity sets are called Identifying relationships, they are otherwise referred to as non-identifying relationships



Worth noting in this example, that if there was the potential for a payment made without holding a loan, it would break the logic, and hence it could not be modeled this way

Data Modeling

Foreign Keys ● Normally references the primary key of the parent/master entity set, from within the child entity ● It’s common for foreign keys to be renamed within the child ● If a Foreign key is used as part of the primary key, that entity set is weak ● Allow null foreign keys if the relationship is optional or mandatory ● They are a way of controlling data redundancy, as removing all data redundancy is not possible while capturing these relationships. Foreign keys control the data redundancy by linking entity sets with minimal data overlap (Just the keys themselves) ● Primary keys are used to identify each entity within a set, but foreign keys are a critical element to represent relationships across entity sets - Protect referential Integrity ● Hence it allows for referential actions, cascade updates or deletes (Altering a parent alters a child) / restrict

One to one relationships are rare, because you can often just combine the entities, can exist because some of the information requires additional security/encrypting (User and Login example) Design Principles ● There is no wrong way to design a database, but some ways are better than others ● Only make assumptions when the case study does not specify ● Avoid duplication and other redundant information ● KISS - Keep It Simple and Stupid, be faithful and scope the design

Location, Gallery and Show could all be combined under Location and provided it with a new “type” attribute when applying the KISS Principle. But we would lose startDate and endDate from the show entity. This is a compromise we must weigh up. Supersets and Subsets ● Sometimes in the ER model, all entities of one entity, are included in another one. Eg, all entities in the entity set “Employee” are also entities in the entity set “Person” Hence ‘Employee’ would be the subset and ‘Person’ would be the superset ● Superset defined as having one to one relationships with one or more subsets ● Subset defined as a sub-grouping of a superset that has distinct attributes and/or relationships that dist...


Similar Free PDFs