datababase-managment-assignment

Share on facebook
Facebook
Share on google
Google+
Share on twitter
Twitter
Share on linkedin
LinkedIn

Part 1. Concepts and principles (40 marks;
8 marks each)

(Your answer for each question should
include about 300 words.)

Question 1

Explain the three main alternatives for
storing information in a data entry of an index. Define clustered index, and discuss
the relation between the three alternatives and clustered/ unclustered indexes.

Question 2

Consider the following operations: Scans,
equality and range selections, inserts, and deletes, and the following file
organizations: heap files, sorted files, clustered files, heap files with an
unclustered tree index on the search key, and heap files with an unclustered
hash index. Which file organization is best suited for each operation?

Explain briefly the main reason for each of
them.

Question 3

Compare ISAM and B+ Tree indexes. Explain
briefly their differences in handling

Search, Insert and Delete, and discuss when
you would use ISAM and when you would use B+ Tree index.

Question 4

Explain how pages are organized in a heap
file, and how records are organized in a page.

Question 5

Explain how extendible hashing uses a
directory of buckets, and discuss the global depth of the index and local depth
of a bucket, and under what conditions the directory can get large.

.1em;”>arks each)

(Each answer should contain approximately
300 words.)

Question 1

Summarize briefly how to make use of
indexes such as B+ tree or a hash indexes in selection, projection, and join
operations?

Question 2

Explain why two different hash functions
are used in hash join, and how to select them.

Question 3

Define the concept of reduction factor, and
explain how it is calculated based on what statistics are available in the
catalogs.

Question 4

Explain what a typical optimizer does, and
what strategies can be used to generate a good plan, and how it can affect the
performance of a DBMS.

Question 5

Define relational algebra equivalences, and
explain how they can be used for plan generation. You may use pushing selection
(ahead of joins) and re-ordering join expressions as examples.

Part 2. Design considerations and calculation
(60 marks; 15 marks each)

Question 1

Athabasca University has about 32,000
students between the ages of 17 to 60. Consider the AU student relation with
the following schema:

Students (stud-id: integer, stud-name:
string, gpa: integer, age: real)

For each of the following indexes, list
whether the index matches the given selection conditions. If there is a match,
list the primary conjuncts.2

1. A B+-tree index on the search key
Students.stud-id.

a. σ Students.stud-id < 28,000
(Students)

b. σ Students.stud-id = 28,000 (Students)

2. A hash index on the search key
Students.stud-id.

a. σ Students.stud-id < 28,000
(Students)

b. σ Students.stud-id = 28,000 (Students)

3. A B+-tree index on the search key
Students.stud-id, Students.age.

a. σ Students.stud-id < 28,000 and
Students.age = 21 (Students)

b. σ Students.stud-id = 28,000 and
Students.age > 21 (Students)

c. σ Students.stud-id = 28,000 (Students )

d. σ Students.age = 21 (Students )

4. A hash-tree index on the search key
Students.stud-id, Students.age .

a. σ Students.stud-id = 28,000 and
Students.age = 21 (Students)

b. σ Students.stud-id = 28,000 and
Students.age > 21 (Students)

c. σ Students.stud-id = 28,000 (Students )

Question 2

Suppose that you just finished inserting several
records into a heap file and now want to sort those records. Assume that the
DBMS uses external sort and makes efficient use of the available buffer space
when it sorts a file. Here is some potentially useful information about the
newly loaded file and the DBMS software available to operate on it:

The number of records in the file is 4500.
The sort key for the file is 4 bytes long.

You can assume that rids are 8 bytes long
and page ids are 4 bytes long. Each

record is a total of 48 bytes long. The
page size is 512 bytes. Each page has 12

bytes of control information on it. Four
buffer pages are available.

1. How many sorted subfiles will there be
after the initial pass of the sort, and how long will each subfile be?

2. How many passes (including the initial
pass just considered) are required to sort this file?

3. What is the total I/O cost for sorting
this file?

4. What is the largest file, in terms of
the number of records, you can sort with just four buffer pages in two passes?
How would your answer change if you had 257 buffer pages?

5. Suppose that you have a B+ tree index
with the search key being the same as the desired sort key. Find the cost of
using the index to retrieve the records in sorted order for each of the
following cases:

• The index uses Alternative (1) for data
entries.

• The index uses Alternative (2) and is
unclustered. (You can compute the worst case cost in this case.)

• How would the costs of using the index
change if the file is the largest that you can sort in two passes of external
sort with 257 buffer pages? Give your answer for both clustered and unclustered
indexes.3

Question 3

Consider the join R R.a = S.bS , given the
following information about the relations to be joined. The cost metric is the number
of page I/Os unless otherwise noted, and the cost of writing out the result
should be uniformly ignored.

Relation R contains 10,000 tuples and has
10 tuples per page.

Relation S contains 2000 tuples and also
has 10 tuples per page.

Attribute b of relation S is the primary
key for S.

Both relations are stored as simple heap
files.

Neither relation has any indexes built on
it.

52 buffer pages are available.

1. What is the cost of joining R and S
using a page-oriented simple nested loops join? What is the minimum number of
buffer pages required for this cost to remain unchanged?

2. What is the cost of joining R and S
using a block-nested loops join? What is the minimum number of buffer pages
required for this cost to remain unchanged?

3. What is the cost of joining R and S
using a sort-merge join? What is the minimum number of buffer pages required
for this cost to remain unchanged?

4. What is the cost of joining R and S
using a hash join? What is the minimum number of buffer pages required for this
cost to remain unchanged?

5. What would be the lowest possible I/O
cost for joining R and S using any join algorithm, and how much buffer space
would be needed to achieve this cost? Explain briefly.

6. How many tuples does the join of R and S
produce, at most, and how many pages are required to store the result of the
join back on disk?

7. Would your answers to any of the
previous questions in this exercise change if you were told that R.a is a
foreign key that refers to S.b?

Question 4

Consider the following relational schema
and SQL query. The schema captures information about employees, departments,
and company finances (organized on a per department basis).

Emp (eid:integer, did:integer, sal:integer,
hobby:char(20))

Dept (did:integer, dname:char(20),
floor:integer, phone:char(10))

Finance (did:integer, budget:real,
sales:real, expenses:real)4

Consider the following query:

SELECT D.dname, F.budget

FROM Emp E, Dept D, Finance F

WHERE E.did = D.did AND D.did = F.did AND
D.floor = 1

AND E.sal
≥ 59000 AND E.hobby = ‘yodeling’

1. Identify a relational algebra tree (or a
relational algebra expression if you prefer) that reflects the order of
operations a decent query optimizer would choose.

2. List the join orders (i.e., orders in which
pairs of relations can be joined to compute the query result) that a relational
query optimizer will consider. (Assume that the optimizer follows the heuristic
of never considering plans that require the computation of crossproducts.)
Explain briefly how you arrived at your list.

3. Suppose that the following additional
information is available: Unclustered B+ tree indexes exist on Emp.did,
Emp.sal, Dept.floor, Dept.did, and Finance.did. The system’s statistics
indicate that employee salaries range from 10,000 to 60,000, employees enjoy
200 different hobbies, and the company owns two floors in the building. There
are a total of 50,000 employees and 5,000 departments (each with corresponding
financial information) in the database. The DBMS used by the company has just
one join method available, index nested loops.

a. For each of the query’s base relations
(Emp, Dept, and Finance) estimate the number of tuples that would be initially
selected from that relation if all of the nonjoin predicates on that relation
were applied to it before any join processing begins.

b. Given your answer to the preceding
question, which of the join orders considered by the optimizer has the lowest
estimated cost?

C

Part 1. Concepts and principles (40 marks;
8 marks each)

(Each answer should contain approximately
300 words.)

Question 1

Define ACID properties, and briefly explain
the methods used to deal with them in the DBMS.

Question 2

Why is concurrent execution of transactions
important in the DBMS? What is a schedule and how do you determine the
conflicts between transactions in a schedule? You may need to explain this
using such anomalies as dirty read, unrepeatable read, and blind write.

Question 3

Define Strict 2PL protocol and precedence
graph, and explain how to use them to address the conflict serializability
problem.

Question 4

Explain the reason of deadlock and discuss
how to deal with a deadlock issue in the DBMS.

Question 5

Briefly explain the ARIES recovery
algorithm, including the main phases, main principles and key ideas behind the
algorithm such as WAL protocol and checkpointing etc.

Part 2. Design considerations and
calculation (60 marks; 15 marks each)

Question 1

Consider a database with objects X and Y
and assume that there are two transactions T1 and T 2. Transaction T 1 reads
objects X and Y and then writes object X. Transaction T 2 reads objects X and Y
and then writes objects X and Y.

1. Give an example schedule with actions of
transactions T1 and T 2 on objects X and Y that results in a write-read
conflict.

2. Give an example schedule with actions of
transactions T1 and T 2 on objects X and Y that results in a read-write
conflict.

3. Give an example schedule with actions of
transactions T1 and T 2 on objects X and Y that results in a write-write
conflict.

4. For each of the three schedules, show
that Strict 2PL disallows the schedule.

Question 2

Answer the following questions: SQL
supports four isolation-levels and two access-modes, for a total of eight
combinations of isolation-level and access-mode. Each combination implicitly
defines a class of transactions; the following questions refer to these eight
classes:

1. Consider the four SQL isolation levels.
Describe which of the phenomena can occur at each of these isolation levels:
dirty read, unrepeatable read, and phantom problem.

2. For each of the four isolation levels,
give examples of transactions that could be run safely at that level.

3. Why does the access mode of a
transaction matter?

Question 3

Consider the following classes of
schedules: serializable, conflict-serializable, viewserializable, recoverable,
avoids-cascading-aborts, and strict. For each of the following schedules, state
to which of the preceding classes it belongs. If you cannot decide whether a
schedule belongs in a certain class based on the listed actions, explain
briefly.The actions are listed in the order they are scheduled and prefixed
with the transaction name. If a commit or abort is not shown, the schedule is
incomplete; assume that abort or commit must follow all the listed actions.

1. T1:R(X), T2:R(X), T1:W(X), T2:W(X)

2. T1:W(X), T2:R(Y), T1:R(Y), T2:R(X)

3. T1:R(X), T2:R(Y), T3:W(X), T2:R(X),
T1:R(Y)

4. T1:R(X), T1:R(Y), T1:W(X), T2:R(Y),
T3:W(Y), T1:W(X), T2:R(Y)

5. T1:R(X), T2:W(X), T1:W(X), T2:Abort,
T1:Commit

6. T1:R(X), T2:W(X), T1:W(X), T2:Commit,
T1:Commit

7. T1:W(X), T2:R(X), T1:W(X), T2:Abort,
T1:Commit

8. T1:W(X), T2:R(X), T1:W(X), T2:Commit,
T1:Commit

9. T1:W(X), T2:R(X), T1:W(X), T2:Commit, T1:Abort

10. T2: R(X), T3:W(X), T3:Commit, T1:W(Y),
T1:Commit, T2:R(Y), T2:W(Z), T2:Commit

11. T1:R(X), T2:W(X), T2:Commit, T1:W(X),
T1:Commit, T3:R(X), T3:Commit

12. T1:R(X), T2:W(X), T1:W(X), T3:R(X),
T1:Commit, T2:Commit, T3:Commit

Question 4

Consider the execution shown in the
following figure. In addition, the system crashes during recovery after writing
two log records to stable storage and again after writing another two

log records.

LSN LOG

00 begin_checkpoint

10 end_checkpoint

20 update: T1 writes P1

30 update: T2 writes P2

40 update: T3 writes P3

50 T2 commit

60 update: T3 writes P2

70
T2 end

80 update: T1 writes P5

90 T3 abort

X CRASH, RESTART

1. What is the value of the LSN stored in
the master log record?

2. What is done during Analysis?

3. What is done during Redo?

4. What is done during Undo?

5. Show the log when recovery is complete,
including all non-null prevLSN and undonextLSN values in log records

D

Part 1. Concepts and principles (20 marks;
5 marks each)

(Each answer should contain approximately
300 words.)

Question 1

What decision need to be made during
physical design, and when should be create clustered indexes?

Question 2

Describe the main idea behind discretionary
access control. What are its weakness?

Question 3

Explain fragmentation and replication, and
explain their differences in data distribution and updating.

Question 4

Define association rules, and explain how
to induct association rules by using frequent itemsets, a Priori Property, and
support and confidence measures.

Part 2. Design and tuning considerations
(20 marks; 10 marks each)

Question 1

Consider the following BCNF relational
schema for a portion of a company database (type information is not relevant to
this question and is omitted):

Project (pno, proj_name, proj_base_dept,
proj_mgr, topic, budget)

Manager (mid, mgr_name, mgr_dept, salary,
age, sex)

Note that each project is based in some
department, each manager is employed in some department, and the manager of a
project need not be employed in the same department (in which the project is
based). Suppose you know that the following queries are the five most common
queries in the workload for this company and all five are roughly equivalent in
frequency and importance:

• List the names, ages, and salaries of
managers of a user-specified sex (male or female) working in a given
department. You can assume that, while there are many departments, each
department contains very few project managers.2

• List the names of all projects with
managers whose ages are in a user-specified range (e.g., younger than 30).

• If a department has a manager who manages
a project based in this department, then list the department name as output
(exclude those departments in the output whose managers always manage some
other departments’ project, or don’t manage any projects at all).

• List the name of the project with the
lowest budget.

• For a given project, list the names of
all managers in the department in which the project is based. Note: a department may have more than one manager
who can manage projects that may or may not belong to the same department, as
described in the question above.

These queries occur much more frequently
than updates, so you should build whatever indexes you need to speed up these
queries. However, you should not build any unnecessary indexes, as updates will
occur (and would be slowed down by unnecessary indexes). Given this
information, design a physical schema for the company database that will give
good performance for the expected workload. In particular, decide which
attributes should be indexed and whether each index should be a clustered or an
unclustered index. Assume that both B+ trees and hashed indexes are supported
by the DBMS, and that both single-and multiple-attribute index keys are
permitted.

1. Specify your physical design by
identifying the attributes you recommend indexing on, indicating whether each
index should be clustered or unclustered and whether it should be a B+ tree or
a hashed index.

2. Assume that this workload is to be tuned
with an automatic index-tuning wizard. Outline the main steps in the algorithm
and the set of candidate configurations considered.

3. Redesign the physical schema assuming
the set of important queries is changed to be the following:

• Find the total of the budgets for
projects managed by each manager; that is, list proj_mgr and the total of the
budgets of projects managed by that manager, for all values of proj_mgr.

• Find the total of the budgets for
projects managed by each manager but only for managers who are in a
user-specified age range.

• Find the number of male managers.

• Find the average age of managers.

Question 2

For each of the following queries, identify
one possible reason why an optimizer might not find a good plan. Rewrite the
query so that a good plan is likely to be found. Any available indexes or known
constraints are listed before each query; assume that the relation schemas are
consistent with the attributes referred to in the query.

Employee (eno, ename, dno, age, sex, sal )

Project (pno, pname, dno, proj_mgr, topic,
budget)

Department (dno, dname, mgr_name, address)
3

1. An index is available on the age
attribute:

SELECT E.dno

FROM Employee E

WHERE E.age = 20 OR E.age = 10

2. A B+ tree index is available on the age
attribute:

SELECT E.dno

FROM Employee E

WHERE E.age <20 AND E.age >10

3. An index is available on the age
attribute:

SELECT E.dno

FROM Employee E

WHERE 2*E.age < 20

4. No index is available:

SELECT DISTINCT *

FROM Employee E

5. No index is available:

SELECT AVG (E.sal)

FROM Employee E

GROUP BY E.dno

HAVING E.dno = 22

6. The dno in Employee is a foreign key
that refers to Department:

SELECT D.dno

FROM Department D, Employee E

WHERE D.dno = E.dno

Part
3. Exploration and/or research report (60 marks)

In this part, you are asked to explore
and/or investigate technical problems of DBM systems. You may either explore
the problems without direct relation with a real DBMS, or focus on one of the
next DBMS such as

• PostgreSQL ( http://www.postgresql.org/ )

• MySQL (http://www.mysql.com/ )

• Oracle
(http://www.oracle.com/database/index.html )

• SQL Server (
http://www.microsoft.com/sqlserver )

• DB2 (www.ibm.com/db2 )

Your exploration or research should focus
on an in-depth topic about theories, techniques, algorithms, approaches,
mechanisms, and implementation of the following techniques:

• File organization and indexes

• Query optimization

• Transaction management

• Distributed database support

• Database security

• Data mining

Your topic could come from a sub-problem of
a cutting-edge research problem about these techniques (if you want to
investigate and solve a technical problem), or a successful (or a planned)
implementation in one of the above DBMSs (if you want to explore an existing
DBMS).

Your investigation will be based on recent
papers, technical documents, and software packages (open source preferred). You
are encouraged to read some papers from the next sources for new techniques in
DBMS:

• Communications of the ACM

• ACM SIGMOD (ACM Special Interest Group on
Management of Data) Newsletter

• SIGMOD: International Conference on
Management of Data

• PODS: Symposium on Principles of Database
Systems

• ACM Transactions on Database Systems

• Knowledge and Data Engineering, IEEE
Transactions on

• Data & Knowledge Engineering

To
ensure that your topic has fair depth and coverage, you are encouraged to write
a one page outline about it, and email it to your tutor in advance. Your tutor
may either agree with it, or provide you with some comments and suggestion for
you to make further selection or modification. If you are having any
difficulties in selecting a topic, discuss some possibilities with your
tutor. Note that topic selection is part
of your work in this assignment an you should not simply reply on your tutor
for help.

Once your topic is decided, you need to set
up the feasible yet meaningful outcomes of your exploration or research on the
topic. Overall, the outcomes should be interesting and beneficial to the
professional community of DBMS research and development. Some possible outcomes
are:

• your useful implementation,

• your analysis, findings and discoveries
of problems,

• results of your test, survey and
comparison,

• your proposed new or improved methods,
algorithms etc.

• your insight on future directions.

You are asked to write a report (no more
than 10 pages, and in a technical report or conference paper format), which
should include the following contents:

• Title

• Abstract: no more than 300 words.

• Introduction/background: motivation and
overall introduction, including

Necessary literature review and citations.

• Methods: describe the problem, and the
methods you use to explore or address the problem.

• Results and findings: research results or
explore your findings. Include
experimental, theoretical analysis, and implementation accomplished based on
your methods.

• Related work: summarize the related work;
compare your methods and results with others’ work. Citations are required.

• Conclusion and future work: conclude your
exploration and research, and point out the possible future work.

• References: include all cited references.

Make sure you always include proper
citation to viewpoints, methods, algorithms, data, results, figures, tables,
etc. from other papers or contributions whenever you use them 5 in your report.
All the reference should be published, or at least publicly and stably
accessible from the Internet.

In addition to the report, your submission
should include necessary practical results such as programs you have done (and
other support packages), and include both source code and necessary libraries,
if applicable. Please make sure the program is compliable and runnable—it
should be complete enough for your tutor to easily run and see claimed results.

The program language can be Java or C/C++.
Please contact your tutor in advance and make sure your program can work on his
or her system.

More to explorer

Answer:

Title: datababase-managment-assignment

This question has been Solved!

Click the button below to order this solution.

Leave a Reply

Your email address will not be published. Required fields are marked *

Open chat