C175 - C175 Data Management - Foundations - Flashcards/Terms PDF

Title C175 - C175 Data Management - Foundations - Flashcards/Terms
Author Dana Hernandez
Course Data Management Foundations
Institution Western Governors University
Pages 39
File Size 722.3 KB
File Type PDF



Description

ad hoc query - A "spur-of-the-moment" question. analytical database - A database focused primarily on storing historical data and business metrics used for tactical or strategic decision making. availability - In the context of data security, it refers to the accessibility of data whenever required by authorized users and for authorized purposes. centralized - A database located at a single site. cloud database - A database that is created and maintained using cloud services, such as Microsoft Azure or Amazon AWS. data - Raw facts, or facts that have not yet been processed to reveal their meaning to the end user. data anomaly - A data abnormality in which inconsistent changes have been made to a database. For example, an employee moves, but the address change is not corrected in all files in the database. data dependence - A data condition in which data representation and manipulation are dependent on the physical data storage characteristics. data dictionary - A DBMS component that stores metadata: data about data. Thus, the data dictionary contains the data definitions as well as their characteristics and relationships. A data dictionary may also include data that are external to the DBMS. Also known as an information resource dictionary. See also active data dictionary, metadata, and passive data dictionary. data inconsistency - A condition in which different versions of the same data yield different (inconsistent) results. data independence - A condition in which data access is unaffected by changes in the physical data storage characteristics. data integrity - In a relational database, a condition in which the data in the database complies with all entity and referential integrity constraints. data management - A process that focuses on data collection, storage, and retrieval. Common data management functions include addition, deletion, modification, and listing. 
data processing (DP) specialist - The person responsible for developing and managing a computerized file processing system.

data redundancy - Exists when the same data is stored unnecessarily at different places. database - A shared, integrated computer structure that houses a collection of related data. A database contains two types of data: end-user data (raw facts) and metadata. database design - The process that yields the description of the database structure and determines the database components. The second phase of the Database Life Cycle. database management system (DBMS) - The collection of programs that manages the database structure and controls access to the data stored in the database. database system - An organization of components that defines and regulates the collection, storage, management, and use of data in a database environment. desktop database - A single-user database that runs on a personal computer. discipline-specific database - A database that contains data focused on specific subject areas. enterprise database - The overall company data representation, which provides support for present and expected future needs. field - An alphabetic or numeric character or group of characters that defines a characteristic of a person, place, or thing. For example, a person's Social Security number, address, phone number, and bank balance all constitute fields. file - A named collection of related records. general-purpose database - A database that contains a wide variety of data used in multiple disciplines. hub - A warehouse of data packets housed in a central location on a local area network. It contains multiple ports that copy the data in the data packets to make it accessible to selected or all segments of the network. information - The result of processing raw data to reveal its meaning. Information consists of transformed data and facilitates decision making. islands of information - In the old file system environment, pools of independent, often duplicated, and inconsistent data created and managed by different departments. 
knowledge - The body of information and facts about a specific subject. Knowledge implies familiarity, awareness, and understanding of information as it applies to an environment. A key characteristic is that new knowledge can be derived from old knowledge.
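
The definitions of data redundancy, data anomaly, and data inconsistency above can be illustrated with a small sketch. The two "files" and every name in them are hypothetical; the snippet only mirrors the glossary's example of an employee who moves while the address is corrected in one place but not the other.

```python
# Two departmental "files" each keep their own copy of an employee's address
# (data redundancy). All names and values here are made up for illustration.
payroll_file = {101: {"name": "Ana", "address": "12 Oak St"}}
benefits_file = {101: {"name": "Ana", "address": "12 Oak St"}}

# The employee moves, but only one file is corrected (a data anomaly).
payroll_file[101]["address"] = "98 Elm Ave"

# The same question now yields different answers depending on the file asked
# (data inconsistency).
addresses = {payroll_file[101]["address"], benefits_file[101]["address"]}
inconsistent = len(addresses) > 1
print(inconsistent)
```

Storing the address once and letting both departments read it from the same place would remove the opportunity for this kind of mismatch.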

logical data format - The way a person views data within the context of a problem domain. metadata - Data about data; that is, data about data characteristics and relationships. See also data dictionary. multiuser database - A database that supports multiple concurrent users. NoSQL - A new generation of database management systems that is not based on the traditional relational database model. online analytical processing (OLAP) - Decision support system (DSS) tools that use multidimensional data analysis techniques. OLAP creates an advanced data analysis environment that supports decision making, business modeling, and operations research. online transaction processing (OLTP) database - See operational database. operational database - A database designed primarily to support a company's day-to-day operations. Also known as a transactional database, OLTP database, or production database. performance tuning - Activities that make a database perform more efficiently in terms of storage and access speed. physical data format - The way a computer "sees" (stores) data. production database - See operational database. query - A question or task asked by an end user of a database in the form of SQL code. A specific request for data manipulation issued by the end user or the application to the DBMS. query language - A nonprocedural language that is used by a DBMS to manipulate its data. An example of a query language is SQL. query result set - The collection of data rows returned by a query. record - A collection of related (logically connected) fields. role - In Oracle, a named collection of database access privileges that authorize a user to connect to a database and use its system resources. router - (1) An intelligent device used to connect dissimilar networks. (2) Hardware/software equipment that connects multiple and diverse networks.
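
As a rough illustration of the terms query, query language, and query result set defined above, the sketch below issues one SQL query from Python's built-in `sqlite3` module. The `employee` table, its columns, and its rows are assumptions made up for this example; they do not come from the course material.

```python
import sqlite3

# Hypothetical in-memory database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee (emp_id, name, dept) VALUES (?, ?, ?)",
    [(1, "Ana", "Sales"), (2, "Ben", "IT"), (3, "Cy", "Sales")],
)

# An ad hoc query: which employees work in Sales?
# SQL is nonprocedural: it states which rows are wanted, not how to find them.
result_set = conn.execute(
    "SELECT emp_id, name FROM employee WHERE dept = ? ORDER BY emp_id", ("Sales",)
).fetchall()

# Each row in the result set is one record; each position in it is one field.
for emp_id, name in result_set:
    print(emp_id, name)
conn.close()
```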

semistructured data - Data that has already been processed to some extent. single-user database - A database that supports only one user at a time. social media - Web and mobile technologies that enable "anywhere, anytime, always on" human interactions. structural dependence - A data characteristic in which a change in the database schema affects data access, thus requiring changes in all access programs. structured data - Data that has been formatted to facilitate storage, use, and information generation. Structured Query Language (SQL) - A powerful and flexible relational database language composed of commands that enable users to create database and table structures, perform various types of data manipulation and data administration, and query the database to extract useful information. transactional database - See operational database. unstructured data - Data that exists in its original, raw state; that is, in the format in which it was collected. website - Refers to the Web server and the collection of Web pages stored on the local hard disk of the server computer. workgroup database - A multiuser database that usually supports fewer than 50 users or is used for a specific department in an organization. World Wide Web (WWW or the web) - Worldwide network collection of specially formatted and interconnected documents known as Web pages. Also called the Web. XML database - A database system that stores and manages semistructured XML data. 3 Vs - Three basic characteristics of Big Data databases: volume, velocity, and variety. abstract data type (ADT) - Data type that describes a set of similar objects with shared and encapsulated data representation and methods. An abstract data type is generally used to describe complex objects. See also class. American National Standards Institute (ANSI) - The group that accepted the DBTG recommendations and augmented database standards in 1975 through its SPARC committee.

application programming interface (API) - Software through which programmers interact with middleware. An API allows the use of generic SQL code, thereby allowing client processes to be database server-independent. attribute - A characteristic of an entity or object. An attribute has a name and a data type. balancing - Ensuring that the processing load is distributed evenly among multiple servers. Big Data - A movement to find new and better ways to manage large amounts of web-generated data and derive business insight from it, while simultaneously providing high performance and scalability at a reasonable cost. Chen notation - See entity relationship (ER) model. class - A collection of similar objects with shared structure (attributes) and behavior (methods). A class encapsulates an object's data representation and a method's implementation. Classes are organized in a class hierarchy. class diagram notation - The set of symbols used in the creation of class diagrams in UML object modeling. class diagram - A diagram used to represent data and their relationships in UML object notation. class hierarchy - The organization of classes in a hierarchical tree in which each parent class is a superclass and each child class is a subclass. See also inheritance. client - Any process that requests specific services from server processes in a client/server environment. client node - One of three types of nodes used in the Hadoop Distributed File System (HDFS). The client node acts as the interface between the user application and the HDFS. See also name node and data node. complex object - An object formed by several different objects in complex relationships. See also abstract data types. conceptual model - The output of the conceptual design process. The conceptual model provides a global view of an entire database and describes the main data objects, avoiding details. conceptual schema - A representation of the conceptual model, usually expressed graphically. 
See also conceptual model.

connectivity - The type of relationship between entities. Classifications include 1:1, 1:M, and M:N. constraint - A restriction placed on data, usually expressed in the form of rules. For example, "A student's GPA must be between 0.00 and 4.00." Constraints are important because they help to ensure data integrity. Crow's Foot notation - A representation of the entity relationship diagram that uses a three-pronged symbol to represent the "many" sides of the relationship. data definition language (DDL) - The language that allows a database administrator to define the database structure, schema, and subschema. data manipulation language - The set of commands that allows an end user to manipulate the data in the database, such as SELECT, INSERT, UPDATE, DELETE, COMMIT, and ROLLBACK. data model - A representation, usually graphic, of a complex "real-world" data structure. Data models are used in the database design phase of the Database Life Cycle. data modeling - The process of creating a specific data model for a determined problem domain. data node - One of three types of nodes used in the Hadoop Distributed File System (HDFS). The data node stores fixed-size data blocks (that could be replicated to other data nodes). See also client node and name node. entity - A person, place, thing, concept, or event for which data can be stored. See also attribute. entity instance - A row in a relational table. Also known as entity occurrence. entity occurrence - A row in a relational table. Also known as entity instance. entity relationship (ER) model - A data model that describes relationships (1:1, 1:M, and M:N) among entities at the conceptual level with the help of ER diagrams. The model was developed by Peter Chen. entity relationship diagram (ERD) - A diagram that depicts an entity relationship model's entities, attributes, and relations. entity set - A collection of like entities. 
ERM - A data model that describes relationships (1:1, 1:M, and M:N) among entities at the conceptual level with the help of ER diagrams. The model was developed by Peter Chen.
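
The definitions of constraint, data definition language, and data manipulation language above can be tried out together. The following is a minimal sketch using Python's built-in `sqlite3` module; the `student` table and its rows are hypothetical, and the CHECK clause is one possible way to express the glossary's GPA rule, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure, including the GPA constraint from the glossary example.
conn.execute(
    """
    CREATE TABLE student (
        stu_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        gpa    REAL CHECK (gpa BETWEEN 0.00 AND 4.00)
    )
    """
)

# DML: INSERT followed by COMMIT makes the change permanent.
conn.execute("INSERT INTO student (stu_id, name, gpa) VALUES (1, 'Ana', 3.50)")
conn.commit()

# A row that violates the constraint is rejected by the DBMS.
try:
    conn.execute("INSERT INTO student (stu_id, name, gpa) VALUES (2, 'Ben', 4.70)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# DML: UPDATE followed by ROLLBACK discards the uncommitted change.
conn.execute("UPDATE student SET gpa = 2.00 WHERE stu_id = 1")
conn.rollback()

rows = conn.execute("SELECT stu_id, name, gpa FROM student").fetchall()
print(rejected, rows)
conn.close()
```

Because the rule lives in the table definition rather than in each application, every program that writes to the table is held to the same constraint.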

eventual consistency - A model for database consistency in which updates to the database will propagate through the system so that all data copies will be consistent eventually. extended relational data model - A model that includes the object-oriented model's best features in an inherently simpler relational database structural environment. See extended entity relationship model (EERM). Extensible Markup Language (XML) - A meta-language used to represent and manipulate data elements. Unlike other markup languages, XML permits the manipulation of a document's data elements. XML facilitates the exchange of structured documents such as orders and invoices over the Internet. external model - The application programmer's view of the data environment. Given its business focus, an external model works with a data subset of the global database schema. external schema - The specific representation of an external view; the end user's view of the data environment. Hadoop - A Java-based, open-source, high-speed, fault-tolerant distributed storage and computational framework. Hadoop uses low-cost hardware to create clusters of thousands of computer nodes to store and process data. Hadoop Distributed File System (HDFS) - A highly distributed, fault-tolerant file storage system designed to manage large amounts of data at high speeds. hardware independence - A condition in which a model does not depend on the hardware used in the model's implementation. Therefore, changes in the hardware will have no effect on the database design at the conceptual level. hierarchical model - An early database model whose basic concepts and characteristics formed the basis for subsequent database development. This model is based on an upside-down tree structure in which each record is called a segment. The top record is the root segment. Each segment has a 1:M relationship to the segment directly below it. 
inheritance - In the object-oriented data model, the ability of an object to inherit the data structure and methods of the classes above it in the class hierarchy. See also class hierarchy. internal model - In database modeling, a level of data abstraction that adapts the conceptual model to a specific DBMS model for implementation. The internal model is the representation of a database as "seen" by the DBMS. In other words, the internal model requires a designer to match the conceptual model's characteristics and constraints to those of the selected implementation model.
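
The terms class, class hierarchy, superclass, subclass, method, and inheritance defined in this glossary map closely onto ordinary object-oriented code. The sketch below uses two hypothetical classes, `Person` and `Student`, chosen only for illustration.

```python
class Person:
    """Superclass: the more general classification."""

    def __init__(self, name):
        self.name = name  # attribute shared by every Person

    def describe(self):
        """A method: a named set of instructions that performs an action."""
        return f"{self.name} is a person"


class Student(Person):
    """Subclass: inherits the data structure and methods of Person."""

    def __init__(self, name, gpa):
        super().__init__(name)
        self.gpa = gpa  # attribute added by the subclass


s = Student("Ana", 3.5)
# describe() is not defined on Student; it is inherited from Person.
description = s.describe()
print(description)
```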

internal schema - A representation of an internal model using the database constructs supported by the chosen database. Internet - A global network of computers connected together through a standard network protocol known as Transmission Control Protocol/Internet Protocol (TCP/IP). You can think of the Internet as the "highway" on which the data travel. The terms Internet and World Wide Web are often used interchangeably, but they are not synonyms. key-value - A data model based on a structure composed of two data elements: a key and a value, in which every key has a corresponding value or set of values. The key-value data model is also called the associative or attribute-value data model. many-to-many (M:N or *..*) relationship - Association among two or more entities in which one occurrence of an entity is associated with many occurrences of a related entity and one occurrence of the related entity is associated with many occurrences of the first entity. MapReduce - An open-source application programming interface (API) that provides fast data analytics services; one of the main Big Data technologies that allows organizations to process massive data stores. method - In the object-oriented data model, a named set of instructions to perform an action. Methods represent real-world actions, and are invoked through messages. name node - One of three types of nodes used in the Hadoop Distributed File System (HDFS). The name node stores all the metadata about the file system. See also client node and data node. network model - An early data model that represented data as a collection of record types in 1:M relationships. object - An abstract representation of a real-world entity that has a unique identity, embedded properties, and the ability to interact with other objects and itself. object-oriented data model (OODM) - A data model whose basic modeling structure is an object. 
object-oriented database management system (OODBMS) - Data management software used to manage data in an object-oriented database model. object/relational database management system (O/R DBMS) - A DBMS based on the extended relational model (ERDM). The ERDM, championed by many relational database researchers, constitutes the relational model's response to the OODM. This model includes many of the object-oriented model's best features within an inherently simpler relational database structure.
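
The key-value and MapReduce entries above describe ideas that are easier to see in miniature. The sketch below is a single-process imitation of the map, group, and reduce steps in plain Python. It is not Hadoop's MapReduce API, and the function names and sample documents are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Emit one (key, value) pair per word: the word and a count of 1."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group the emitted values by key, giving each key a set of values."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


documents = ["big data big insight", "data at scale"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle(pairs))
print(word_counts)
```

In a real cluster the map and reduce steps would run on many nodes at once; here they run one after another purely to show the data flow.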

one-to-many (1:M or 1..*) relationship - Associations among two or more entities that are used by data models. In a 1:M relationship, one entity instance is associated with many instances of the related entity. one-to-one (1:1 or 1..1) relationship - Associations among two or more entities that are used by data models. In a 1:1 relationship, one entity instance is associated with only one instance of the related entity. relation - A logical construct perceived to be a two dimensional structure composed of intersecting rows (entities) and columns (attributes) that represents an entity set in the relational model. relational database management system (RDBMS) - A collection of programs that manages a relational database. The RDBMS software translates a user's logical requests (queries) into commands that physically locate and retrieve the requested data. relational diagram - A graphical representation of a relational database's entities, the attributes within those entities, and the relationships among the entities. relational model - Developed by E. F. Codd of IBM in 1970, the relational model is based on mathematical set theory and represents data as independent relations. Each relation (table) is conceptually represented as a two dimensional structure of intersecting rows and columns. The relations are related to each other through the sharing of common entity characteristics (values in columns). relationship - An association between entities. schema - A logical grouping of database objects, such as tables, indexes, views, and queries, that are related to each other. Usually, a schema belongs to a single user or application. segment (SEGM) - In the hierarchical data model, the equivalent of a file system's record type. subclasses - See class hierarchy. subschema - The portion of the database that interacts with application programs. 
superclass - In a class hierarchy, the superclass is the more general classification from which the subclasses inherit data structures and behaviors. table - A logical construct perceived to be a two dimensional structure composed of intersecting rows (entities) and columns (attributes) that represents an entity set in the relational model.
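
A one-to-many relationship, as defined above, is commonly implemented by letting the "many" table share a column with the "one" table. The following sketch uses Python's built-in `sqlite3` module; the `painter` and `painting` tables and their rows are hypothetical and chosen only to illustrate the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.execute("CREATE TABLE painter (painter_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE painting (
        painting_id INTEGER PRIMARY KEY,
        title       TEXT,
        painter_id  INTEGER NOT NULL REFERENCES painter (painter_id)
    )
    """
)
conn.execute("INSERT INTO painter VALUES (1, 'Painter A')")
conn.executemany(
    "INSERT INTO painting VALUES (?, ?, ?)",
    [(10, "Dawn", 1), (11, "Dusk", 1)],
)

# One painter row is associated with many painting rows through the shared column.
titles = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM painting WHERE painter_id = 1 ORDER BY painting_id"
    )
]

# A painting that points at a nonexistent painter is rejected.
try:
    conn.execute("INSERT INTO painting VALUES (12, 'Noon', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(titles, orphan_rejected)
conn.close()
```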

tuple - In the relational model, a table row. Unified Modeling Language (UML) - A language based on object-oriented concepts that provides tools such as diagrams and symbols to graphically model a system. versioning - A property of an OODBMS that allows the database to keep track of the different transformations performed on an object. associative entity - See composite entity. bridge entity - See composite entity. business rule - A description of a policy, procedure, or principle within an organization. For example, a pilot cannot be on duty for more than 10 hours during a 24-hour period, or a professor may teach up to four classes during a semester. candidate key - A minimal superkey; that is, a key that does not contain a subset of attributes that is itself a superkey. See key. composite entity - An entity designed to transform an M:N relationship into two 1:M relationships. The composite entity's primary key comprises at least the primary keys of the entities that it connects. Also known as a bridge entity or associative entity. See also linking table. composite key - A multiple-attribute ke...
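
The composite entity entry above says that an M:N relationship is transformed into two 1:M relationships. A minimal sketch of that idea, again with Python's built-in `sqlite3` module, follows. The `student`, `course`, and `enroll` tables are assumptions invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (stu_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id INTEGER PRIMARY KEY, title TEXT);
    -- Composite (bridge) entity: its primary key combines both parents' keys.
    CREATE TABLE enroll (
        stu_id    INTEGER REFERENCES student (stu_id),
        course_id INTEGER REFERENCES course (course_id),
        PRIMARY KEY (stu_id, course_id)
    );
    INSERT INTO student VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO course  VALUES (100, 'Databases'), (200, 'Networks');
    INSERT INTO enroll  VALUES (1, 100), (1, 200), (2, 100);
    """
)

# Joining through the bridge table recovers the many-to-many pairing.
roster = conn.execute(
    """
    SELECT s.name, c.title
    FROM enroll e
    JOIN student s ON s.stu_id = e.stu_id
    JOIN course c ON c.course_id = e.course_id
    ORDER BY s.name, c.title
    """
).fetchall()
print(roster)
conn.close()
```

Each student relates to many `enroll` rows and each course relates to many `enroll` rows, so the single M:N relationship becomes two 1:M relationships.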

