05 Query Processing - lec 5 PDF

Title 05 Query Processing - lec 5
Author Jenny Char
Course Data Science. Big Data and Data Variety
Institution University of Sydney
Pages 53
File Size 3.1 MB
File Type PDF
Total Downloads 92
Total Views 196

Summary

lec 5...


Description

DATA3404 – SCALABLE DATA MANAGEMENT Week 5: Query Processing – Introduction; External Sorting 2021 sem1 References: Garcia-Molina/Ullman/Widom – Chapter 15 (start); Ramakrishnan/Gehrke – Chapters 12, 13 and 14 (start); Kifer/Bernstein/Lewis – Chapter 10 (start) Lecturer: Alan Fekete Slides prepared by: Uwe Roehm Based on slides from: Ramakrishnan/Gehrke and Kifer/Bernstein/Lewis And also slides from Dr Bryn Jeffries

DATA3404 "Scalable Data Management"

2-1

Learning Objectives › Understanding of Role and Structure of Query Processing - From SQL to physical data access 1) Query Parsing, 2) Query Optimization, 3) Plan Execution - Expression Tree vs. Evaluation Plan

› Query Execution Algorithms - External Merge Sort - Next week: joins

SQL Commands

Weeks 5 - 7 Parser

Plan Executor

Optimizer

Operator Evaluator

Query Evaluation Engine

3

Agenda › Overview of Query Processing › Relational Algebra [review for students from ISYS2120] › Concepts of Query Processing › External Sorting

4

Overview of Query Processing

5

Database Querying with SQL “List all students (by SID) who are enrolled in Database Systems I” SELECT FROM WHERE AND

studID Enrolled E, UnitOfStudy U E.uosCode = U.uosCode U.uosName = 'Database Systems I'

04a-6

Compiling a query › SQL is declarative - It indicates what information is to be returned

› The DBMS must execute a concrete, imperative program, that describes step-by-step what to do › The DBMS must therefore begin by finding an imperative program which will calculate the result described in the declarative query - This is somewhat similar to the way a programming language can be compiled into machine instructions

› The process is done in stages - Convert the text of SQL into a structured logical form (described using Relational Algebra) - Convert the Relational Algebra logical plan into a plan with physical operations (each of which has a program already available) - Actually, there are many possible physical plans for a single SQL query - The way the DBMS chooses a good physical plan is called “query optimisation” 7

Basic Steps in Query Processing SELECT name FROM Student NATURAL JOIN Enrolled WHERE uosCode=‘DATA3404’;

query

parser and translator

PROJECT(name) NESTED LOOPS

INDEX FULL SCAN (Enrolled_idx, uosCode=‘DATA3404')

INDEX UNIQUE SCAN (Student_idx, Student.sid=Enrolled.sid)

relational algebra expression

pname

TABLE ACCESS BY INDEX ROWID (Student)

optimizer

execution plan

⋈ evaluation engine

suosCode=‘DATA3404’ Student Enrolled

Name statistics about data

data

query output

John Smith Sally Waters 8

Relational Algebra (abbreviation: RA)

9

The Big Idea › Users request information from a database using a query language › A query that extracts information can be seen as calculating a relation from combining one or more relations in the current state of the database › Relational algebra (RA) defines some basic operators that can be used to express a calculation like that - Each operator takes one or more relation instances, and produces a relation instance (the operation part of the relational data model)

› Why is it important? - Helps us understanding the precise meaning of declarative SQL queries. - Intermediate language used within DBMS (cf. chapter on db tuning)

10

The Role of Relational Algebra in a DBMS

[cf. Kifer/Bernstein/Lewis, Figure 5.1] 11

What is an Algebra? › A language based on operators and a domain of values: - basic operators - atomic operands (constants and variables) - rules to form complex expressions from operators and operands, typically using nesting with parentheses

› Example: Algebra of Arithmetic - Operators for usual arithmetic operations: + -

* /

- Operands: variables (x, y, z, …) and numerical constants - Example arithmetic expression: 100 – ((x + y) / 2)

› In databases, operators and operands are finite relations, which leads to the Relational Algebra - We refer to expression as a query and the value produced as query result 12

Relational Algebra (RA) › 1. Set Operations - Union

( È ) tuples in relation 1 or in relation 2.

- Intersection ( Ç ) tuples in relation 1, as well as in relation 2. - Difference

(-)

tuples in relation 1, but not in relation 2.

› 2. Operations that remove parts of a relation - Selection - Projection

( s ) selects a subset of rows from relation. ( p ) deletes unwanted columns from relation.

› 3. Operations that combine tuples from two relations - Cross-product ( X ) allows us to fully combine two relations. - Join ( ) to combine matching tuples from two relations.

› 4. A schema-level ‘rename’ operation - Rename

(r)

allows us to rename one field or even whole relation

› Domain: set of relations - operators take one/more relations as inputs and gives a new relation as result => RA queries by nesting of multiple operators 13

Visualisation of Relational Algebra Projection ( p )

Selection ( s )

Cross-product ( X )

=

X

Set Union ( È )

È

=

Set Minus ( - )

-

Join (

=

)

=

Set Intersection ( Ç )

Ç

=

14

Running Example sid name

uos_code

Student gender

country

sid

name

semester

creditPoints

Enrolled

Student

title

UnitOfStudy

enrolled

UnitOfStudy

gender

country

sid

uos_code

semester

1001

Ian

M

AUS

1001

COMP5138

2005-S2

1002

Ha Tschi

F

ROK

1002

COMP5702

2005-S2

1003

Grant

M

AUS

1003

COMP5138

2005-S2

1004

Simon

M

GBR

1006

COMP5318

2005-S2

1005

Jesse

F

CHN

1001

INFS6014

2004-S1

1006

Franzisca

F

GER

1003

ISYS3207

2005-S2

uos_code

title

credit Points

COMP5138

Relational DBMS

6

COMP5318

Data Mining

6

INFO6007

IT Project Management

6

SOFT1002

Algorithms

12

ISYS3207

IS Project

4

MIT Research Project

18

COMP5702

15

Set Operations › These operations take two input relations R and S - Set Union R È S - Definition: R U S = { t | t

Î

R Ú tÎS}

- Set Intersection R Ç S - Definition: R Ç S = { t | t Î R Ù t Î S } - Set Difference R - S - Definition: R - S = { t | t

Î

R Ù tÏS}

› Important constraint: R and S have the same schema - R, S have the same arity (same number of fields) - `Corresponding’ fields must have the same names and domains

16

Projection › ‘Deletes’ attributes that are not in projection list. - Schema of result contains exactly the fields in the projection list, with the same names that they had in the input relation.

› Examples: Õ name, country (Student)

Õ title (UnitOfStudy)

Student name

country

Ian

AUS

Ha Tschi

ROK

Grant

AUS

Simon

GBR

Jesse

CHN

Franzisca

GER

17

Selection › Selects rows that satisfy a selection condition. - Example:

s country=‘AUS’ (Student) Student sid

name

gender

country

1001

Ian

M

AUS

1003

Grant

M

AUS

› Result relation can be the input for another relational algebra operation! (Operator composition.) Õ name ( s country=‘AUS’ (Student) ) - Example:

Student name Ian Grant 18

Cross-Product › Defined as:

R x S = {t s | t Î R Ù s Î S}

- each tuple of R is paired with each tuple of S. - Resulting schema has one field per field of R and S, with field names `inherited’ if possible. - It might end in a conflict with two fields of the same name -> rename needed

› Sometimes also called Cartesian product › Example:

R A

B

a

1

b

2

x

C

S D

E

a b b g

10 10 20 10

a a b b

=

A

a a a a b b b b

result B C D 1 a 10 1 b 10 1 b 20 1 g 10 2 a 10 2 b 10 2 b 20 2 g 10

E a a b b a a b b 19

Joins R  c S = σ c (R×S) Student  Lecturer family _ name =last _name

› Conditional Join: - Example:

sid

given

family_name

gender

country

empid

first_name

last_name

room

1001

Cho

Chung

M

AUS

47112344

Vera

Chung

321

1004

Ciao

Poon

M

CHN

12345678

Simon

Poon

431

1004

Ciao

Poon

M

CHN

99004400

Josiah

Poon

482

1111

Alice

Poon

F

AUS

12345678

Simon

Poon

431

1111

Alice

Poon

F

AUS

99004400

Josiah

Poon

482















› Result schema same as the cross-product’s result schema. › Sometimes called theta-join. › Equi-Join: Special case where the condition c contains only equalities. 20

Natural Join › Natural Join:

R S

› Equijoin on all shared-named fields. - Result schema similar to cross-product, but only one copy of fields for which equality is specified. Enrolled

result

UnitOfStudy points

sid

uos_code

Relational DBMS

6

1001

COMP5138

Relational DBMS

6

COMP5318

Data Mining

6

1002

COMP5702

MIT Research Project

18

COMP5138

INFO6007

IT Project Mgmt.

6

1003

COMP5138

Relational DBMS

6

1006

COMP5318

SOFT1002

Algorithms

12

1006

COMP5318

Data Mining

6

1001

INFO6007

ISYS3207

IS Project

4

1001

INFO6007

IT Project Mgmt.

6

1003

ISYS3207

COMP5702

MIT Research Project

18

1003

ISYS3207

IS Project

4

sid

uos_code

uos_code

1001

COMP5138

COMP5138

1002

COMP5702

1003

title

=

title

points

21

Rename Operation › Allows to name, and therefore to refer to, the results of relational-algebra expressions. › Allows to refer to a relation by more than one name. › Notation 1:

r x (E)

- returns the expression E under the name X (rename whole relation)

› Notation 1:

r X (a1 à b1) (E)

- returns the result of expression E under the name X, and with the attributes a1 …. renamed to b1 … (rename individual attributes) - (assumes that the relational-algebra expression E has arity n)

› Example:

r Classlist( sid à student ) ( s uos_code=‘ISYS2120' (Enrolled) ) 22

Basic versus Derived Operators › We can distinguish between basic and derived RA operators › Only 6 basic operators are required to express everything else: - Union

(È)

tuples in relation 1 or in relation 2.

- Set Difference ( - ) - Selection (s) - Projection (p)

tuples in relation 1, but not in relation 2.

- Cross-product ( X ) - Rename (r)

allows us to fully combine two relations.

selects a subset of rows from relation. deletes unwanted columns from relation. allows us to rename one field to another name.

› Additional (derived) operations: - intersection, join, division: - Not essential, but (very!) useful. - Cf. Join:

R  c S = σ c (R×S) 23

Relational Expressions › A basic expression in the relational algebra consists of either one of the following: - Variable that refers to a relation of the same name in the database - A constant relation

› Let E1 and E2 be relational-algebra expressions; then the following are all relational-algebra expressions: - E1 È E2 - E1 - E2 - E1 x E2 - E1  E2 - sP (E1), P is a (complex) predicate on attributes in E1 - πS (E1), S is a list consisting of some of the attributes in E1 - rX (E1), x is the new name for the result of E1 24

Exercise 1: Relational Algebra Each of the following queries can be performed with a single Relational Algebra operation. Which operator is required in each case? 1. SELECT name, age FROM Student

2. SELECT * FROM Enrolled WHERE mark BETWEEN 85 AND 100 3. SELECT * FROM Student, Enrolled

4. SELECT * FROM Enrolled E, UosOffering U WHERE U.uoscode=E.uoscode AND U.semester=E.semester AND U.year=E.year

25

Concepts of Query Processing

26

Overview of Query Translation and Execution › SQL gets translated into relational algebra, which can be shown as logical expression tree. - Operators have one or more input sets and return one (or more) output set(s). - Leaves correspond to (base) relations.

› Query Execution Plan: Tree of relational algebra operators, with choice of algorithm for each operator.

› Two main issues in query optimization (cf. week 7): -

For a given query, what plans are considered? - Algorithm to search plan space for cheapest (estimated) plan.

-

How is the cost of a plan estimated?

› Ideally:Want to find best plan. › Practically: Avoid worst plans! 27

Parsing and Translation SELECT name SQL query FROM Student NATURAL JOIN Enrolled WHERE uosCode=‘DATA3404’;

Projection

Join

Selection

Relational Algebra (RA)

pname(Student ⋈ suosCode=‘DATA3404’(Enrolled)) Execution Plan

RA Expression Tree

pname

PROJECT(name)



NESTED LOOPS

suosCode=‘DATA3404’ Student

INDEX FULL SCAN (Enrolled_idx, uosCode=‘DATA3404')

INDEX UNIQUE SCAN (Student)

Enrolled

Logical Operators

Physical Operators 28

Exercise 2: Relational Algebra Expression Trees Write a relational algebra expression, in the form of an expression tree, equivalent to: SELECT S.name, U.uosName, FROM Student S, Enrolled E, UosOffering U WHERE S.sid = E.sid AND U.uosCode = E.uosCode AND U.semester = E.semester AND U.year = E.year AND E.mark BETWEEN 85 AND 100;

29

Expression Tree vs. Query Execution Plan

Õ

balance

sbalance...


Similar Free PDFs