CSE1505 -- Exam Pattern Analysis

Based on 8 past exams (2020--2024): 4 Endterms, 4 Resits

18
Distinct Patterns
8
Exams Analyzed
~55
Total Questions
2020--2024
Year Range
Appears in 5+ exams
Appears in 3--4 exams
Appears in 1--2 exams
END Endterm
RES Resit

Table of Contents

  1. Cost-Based Access Path Selection (Size Estimation & Algorithm Choice)
  2. General Discussion Questions (Agree/Disagree with Claims)
  3. Two-Phase Commit Protocol (2PC)
  4. Logging & Recovery (UNDO / REDO / UNDO-REDO)
  5. CAP Theorem
  6. Conflict Serializability (Precedence / Conflict Graph)
  7. Join Order Optimization
  8. Relational Algebra Queries
  9. Reduction Factor Accuracy & Size Estimation Discussion
  10. Algebraic Query Optimization Heuristics
  11. Advanced SQL Queries
  12. Indexing Technique Selection (Hash-Index vs B+Tree)
  13. B+Tree / KD-Tree Insertion
  14. Row-Store vs Column-Store
  15. Two-Phase Locking (2PL) Schedule Validation
  16. Data Integration (Schema Matching, Merging, Quality)
  17. Conceptual Modelling (EER Diagrams)
  18. Functional Dependencies & Normalization (2NF, 3NF)

1. Cost-Based Access Path Selection (Size Estimation & Algorithm Choice) Very High

END 2020 Q5.1 • 2022 Q5.1 • 2023 Q1.1 • 2024 Q1.1  |  RES 2020 Q4.2 • 2022 Q3.2 • 2024 Q1
Given a schema with table/column statistics and an operator tree, estimate the size of intermediate results (number of tuples and blocks using reduction factors), then choose the cheapest algorithm (Relation Scan, Index Scan, BNL, INL, MJ) by computing block-access costs.

Methods to Solve

Example (Resit 2024 Q1.1): For σprice<1000 ∧ max_occupants=3(Apartment) with 10,000 records, 69 blocks: rfprice = 500/2001 ≈ 0.25, rfmax_occ = 1/6. Combined rf ≈ 0.0416 ⇒ 417 records, 3 blocks. Index Scan on max_occupants: 1/6 × 10000 × 2 ≈ 3334 block accesses. Relation Scan: 69 blocks. ⇒ Relation Scan is cheaper.

2. General Discussion Questions (Agree/Disagree with Claims) Very High

END 2020 Q6 • 2022 Q6 • 2023 Q4 • 2024 Q4  |  RES 2020 Q6 • 2022 (within Q4) • 2024 Q2.2--2.5
A statement is presented (e.g., about query optimization, transactions, NoSQL, SQL injection, schemas) and you must agree, disagree, or say "it depends," providing 1--2 strong arguments with justification. Recurring topics: "best possible execution plan" goal, transaction processing necessity, SQL injection responsibility, schema-free vs schema-based, aggregate data models, hash vs B+Tree lookups.

Methods to Solve

Example (Endterm 2022 Q6a): "The goal of query optimization is to find the best possible execution plan." — No, the goal is to find a good plan quickly. Finding the optimal plan can be prohibitively expensive (especially for large join orders), and heuristics are coarse approximations anyway.

3. Two-Phase Commit Protocol (2PC) Very High

END 2020 Q5.1 • 2022 Q4.1  |  RES 2020 Q5.1 • 2022 Q4.1 • 2023 Q5.1 • 2024 Q3.1
Mark TRUE/FALSE statements about 2PC properties (atomicity, consistency, network partitions, majority voting), and/or analyze a 2PC scenario where a node fails at a specific step and determine what happens next.

Methods to Solve

Example (Endterm 2020 Q5.1.2): After coordinator sends commit, all nodes receive it, all send ACK except S1 (pulled plug, disk OK). "Coordinator should send rollback and abort" ⇒ FALSE. "Coordinator can declare success after S1 comes back" ⇒ TRUE. "Coordinator can declare success even before S1 returns (enough info in S1's log)" ⇒ TRUE.

4. Logging & Recovery (UNDO / REDO / UNDO-REDO) Very High

END 2020 Q5.2 • 2023 Q3 • 2024 Q3  |  RES 2020 Q5.2 • 2022 Q4.2 • 2024 Q4
Given a table of actions (READ, WRITE, OUTPUT, computations) with memory/disk values, fill in the log entries and FLUSH LOG commands. Alternatively, determine disk values before/after recovery following a crash. Variants: UNDO logging, REDO logging, or UNDO-REDO logging.

Methods to Solve

Example (Resit 2024 Q4 — UNDO): UNDO log with crash at line 13 (after OUTPUT(A) at line 11, before OUTPUT(B)). Before recovery: A = 20 (OUTPUT completed), B = 4 (OUTPUT not reached). After recovery: since COMMIT was not flushed, transaction is uncommitted ⇒ undo A back to old value 6. Final: A = 6, B = 4.

5. CAP Theorem Very High

END 2020 Q4.2 • 2022 Q4.2 • 2023 Q4.2 • 2024 Q4.2  |  RES 2020 Q6a • 2024 Q3.2
Discuss or evaluate claims about the CAP theorem: whether a distributed system can simultaneously guarantee Consistency, Availability, and Partition tolerance. Often framed as "can a vendor guarantee consistency during network partitions?" or TRUE/FALSE about AP/CP system properties.

Methods to Solve

Example (Endterm 2022 Q4.2): "The only solution to ensure consistency during network partition is to refuse answering user requests." — Agree. By the CAP theorem, if you want consistency (C) during a partition (P), you must sacrifice availability (A), meaning the system must refuse some requests.

6. Conflict Serializability (Precedence / Conflict Graph) Very High

END 2020 Q4.3 • 2022 Q4.3 • 2023 Q2 • 2024 Q2  |  RES 2022 Q4.3
Given a schedule of concurrent transactions with read/write operations, draw the conflict (precedence) graph and determine whether the schedule is conflict-serializable. If there is a cycle, it is NOT conflict-serializable.

Methods to Solve

Example (Endterm 2024 Q2): Schedule: T1 reads(A), T2 writes(A), T1 writes(A). Conflicts on A: T1 r(A) before T2 w(A) ⇒ T1→T2. T2 w(A) before T1 w(A) ⇒ T2→T1. Cycle: T1→T2→T1. ⇒ NOT conflict-serializable.

7. Join Order Optimization Very High

END 2020 Q5.2a • 2022 Q5.2 • 2023 Q1.2.2 • 2024 Q1.2.2  |  RES 2024 Q2.1
Evaluate whether a given operator tree has a good join order, and if not, propose a better one. Argue why a particular join order is better based on intermediate result sizes (without necessarily computing exact costs).

Methods to Solve

Example (Endterm 2023 Q1.2.2): Original tree joins all Users with all Ratings first (100,000 tuples), then filters for Dandelion recipes. Better: start with σname='Dandelion'(Ingredient) ⋈ uses ⋈ Recipe ⋈ rating ⋈ User. This filters to ~1 ingredient first, drastically reducing all subsequent joins.

8. Relational Algebra Queries Very High

END 2020 Q2 • 2022 Q2  |  RES 2020 Q3 • 2022 Q2 • 2023 Q3
Translate natural-language queries into relational algebra expressions using selection (σ), projection (π), join (⋈), union (∪), difference (−), rename (ρ), and extended operators like grouping/aggregation (γ).

Methods to Solve

Example (Resit 2022 Q2.1): "For each date in booking, show total payment." ⇒ γbooking_date, SUM(payment_amount)(booking).

9. Reduction Factor Accuracy & Size Estimation Discussion High

END 2023 Q1.2.1 • 2024 Q1.2.1  |  RES 2024 Q2.2 • 2024 Q2.3
Discuss whether the standard heuristic formula (1/#distinct_values or (value−low)/(max−low+1)) gives accurate size estimates for a specific query. Identify when heuristics fail (correlated attributes, skewed distributions) and suggest improvements like histograms.

Methods to Solve

Example (Resit 2024 Q2.2): σprice<700 ∧ sqm>90(apartment). The heuristic assumes price and sqm are independent, but in reality they are positively correlated (larger = more expensive). Asking for cheap AND large yields far fewer results than estimated ⇒ estimate is too high.

10. Algebraic Query Optimization Heuristics High

END 2020 Q5.2b • 2022 Q5.2  |  RES 2020 Q4.2c • 2024 Q2.4
Identify which algebraic heuristics (push selections down, push projections down, replace selection+Cartesian product with join) should be applied to an operator tree, and discuss their impact on query cost.

Methods to Solve

Example (Resit 2024 Q2.4): Removing a_id from projection πo_id, a_id, end_date to πo_id, end_date seems like "push projection" optimization. But a_id is needed for the join at i7 above ⇒ this breaks the query. Answer: NOT beneficial.

11. Advanced SQL Queries High

END 2020 Q1 • 2022 Q1  |  RES 2022 Q1
Write SQL queries involving JOINs, GROUP BY, HAVING, subqueries (EXISTS, IN, NOT EXISTS), CTEs (WITH clause), aggregate functions, and conditions on dates/ages. Queries range from simple aggregations to complex nested/correlated subqueries.

Methods to Solve

Example (Resit 2022 Q1.1): "List travel agencies with ≥10 bookings after 01-01-2022." ⇒ SELECT ta.name, COUNT(*) FROM travel_agency ta JOIN booking b ON ta.t_id = b.t_id WHERE b.booking_date > '2022-01-01' GROUP BY ta.t_id, ta.name HAVING COUNT(*) ≥ 10.

12. Indexing Technique Selection (Hash-Index vs B+Tree) High

END 2020 Q3.1 • 2022 Q3.1  |  RES 2024 Q2.5
Given queries with different WHERE conditions, select which index type (Hash-Index or B+Tree) is best suited for each attribute used. Also includes TRUE/FALSE statements about index properties (primary vs secondary, insertion cost, etc.).

Methods to Solve

Example (Endterm 2020 Q3.1): SELECT * FROM Person WHERE age = 32 ⇒ age: Hash-Index (equality), id: N/A (not used). SELECT * FROM Person WHERE id > 23 AND age > 32 ⇒ age: B+Tree (range), id: B+Tree (range).

13. B+Tree / KD-Tree Insertion High

RES 2020 Q4.1 (B+Tree) • 2022 Q3.1 (KD-Tree) • 2023 Q4.1 (B+Tree)
Draw the resulting B+Tree after inserting a value (may require node splits), or draw a KD-Tree after inserting a multi-dimensional point. Follow specified semantics (≤ left, > right, or < left, ≥ right).

Methods to Solve

Example (Resit 2020 Q4.1): Insert 100 into a B+Tree with capacity 3 keys per node. 100 goes to the rightmost leaf [46, 47, 50], which is full. Split: [46, 47] and [50, 100]. Push 50 up to parent [22, 33, 44], which is also full. Split parent: [22, 33] and [50]. Push 44 up as new root.

14. Row-Store vs Column-Store Low

END 2020 Q3.2 • 2022 Q3.2
Given query characteristics (OLTP updates, full-row retrieval, column aggregations), determine whether row-wise or column-wise storage is more suitable.

Methods to Solve

15. Two-Phase Locking (2PL) Schedule Validation Low

END 2020 Q5.3  |  RES 2020 Q5.3
Given schedules using lock/unlock/read/write notation (L, U, R, W), determine whether each schedule follows a valid two-phase locking protocol.

Methods to Solve

Example (Endterm 2020 Q5.3): Schedule: L1(B);R1(B); L2(A);R2(A); W1(B); U1(B);L3(B);W3(B); L1(C);... — Transaction 1 unlocks B then later locks C ⇒ violation of 2PL for T1. Schedule is FALSE (not valid 2PL).

16. Data Integration (Schema Matching, Merging, Data Quality) Low

END 2023 Q4.3 • 2024 Q4.3
Describe the key tasks of data integration (e.g., when two companies merge databases) and list data quality dimensions with examples. Typically worth 6 points with multiple sub-parts.

Methods to Solve

Example (Endterm 2023 Q4.3): Jumbo and Albert Heijn merge. Schema matching: find that Jumbo's "artikel" = AH's "product". Data matching: identify that "Coca-Cola 1L" in both DBs is the same item. Data quality issue: Duplication (same product in both DBs) and Consistency (different price formats).

17. Conceptual Modelling (EER Diagrams) Low

RES 2020 Q1 • 2023 Q1
Given a scenario description, complete or design a Conceptual EER schema: identify entities, relationships, attributes, primary keys, cardinalities, specialization/generalization (ISA), and participation constraints. Comment on requirements that cannot be modeled.

Methods to Solve

18. Functional Dependencies & Normalization (2NF, 3NF) Low

RES 2020 Q2 • 2023 Q2
Given a relation R(A, B, C, D, E) and a set of functional dependencies, derive candidate keys, then decompose the relation into 2NF and 3NF. Multiple rows with different FD sets are given in a table format.

Methods to Solve


Generated from 8 exams: Endterms 2020, 2022, 2023, 2024 | Resits 2020, 2022, 2023, 2024
CSE1505 Information and Data Management, TU Delft