写在前面
总结一下 cmu15445 第一节课的内容. 前面的可以不听, 主要是 40 分钟往后的内容.
视频-01 - Relational Model & Relational Algebra (CMU Intro to Database Systems / Fall 2022);
课件;
测试数据库(MySQL8)
show databases;
create database test2;
use test2;
分类
符号 | 名称 | 中文释义 | 作用 |
---|---|---|---|
$\sigma$ | Select | 选择 | |
$\Pi$ | Projection | 投影 | |
$\cup$ | Union | 并 | |
$\cap$ | Intersection | 交 | |
$-$ | Difference | 差 | |
$\times$ | Product | 叉积 | |
$\Join$ | Join | 联结 |
Select: 选择
\[\sigma_{\rm predicate}(\rm R)\]Choose a subset of the tuples from a relation that satisfies a selection predicate.
从满足选择谓词的关系中选择元组的子集。
在过滤操作中定义多个谓词(判别式)来做更复杂的事情并获取实际的数据.
例子
\[\sigma_{a\_id='a2'\wedge b\_id>102}(R)\]WHERE 子句.
SELECT * FROM R
WHERE a_id = 'a2' AND b_id > 102;
具体例子:
Create table If Not Exists R
(
a_id varchar(2),
b_id int
);
Truncate table R;
insert into R (a_id, b_id)
values ('a1', 101);
insert into R (a_id, b_id)
values ('a2', 102);
insert into R (a_id, b_id)
values ('a2', 103);
insert into R (a_id, b_id)
values ('a3', 104);
select *
FROM R;
/*
a_id,b_id
a1,101
a2,102
a2,103
a3,104
*/
执行上述查询后得到:
SELECT *
FROM R
WHERE a_id = 'a2'
AND b_id > 102;
/*
a_id,b_id
a2,103
*/
Projection: 投影
\[\Pi_{A_1,A_2,...,A_n}(R)\]从输出的关系中删除不想要的某些属性
例子
\[\Pi_{b\_id-100,\ a\_id}(\sigma_{a\_id='a2'}(R))\]SELECT b_id - 100, a_id
FROM R
WHERE a_id = 'a2';
/*
b_id - 100,a_id
2,a2
3,a2
*/
Union: 并
\[R\ \cup\ S\]例子
\[R\ \cup \ S\]DROP TABLE R;
Create table If Not Exists R
(
a_id varchar(2),
b_id int
);
Truncate table R;
insert into R (a_id, b_id)
values ('a1', 101);
insert into R (a_id, b_id)
values ('a2', 102);
insert into R (a_id, b_id)
values ('a3', 103);
Create table If Not Exists S
(
a_id varchar(2),
b_id int
);
Truncate table S;
insert into S (a_id, b_id)
values ('a3', 103);
insert into S (a_id, b_id)
values ('a4', 104);
insert into S (a_id, b_id)
values ('a5', 105);
查询结果
(SELECT * FROM R)
UNION ALL
(SELECT * FROM S);
/*
a_id,b_id
a1,101
a2,102
a3,103
a3,103
a4,104
a5,105
*/
Intersection: 交
\[R\ \cap \ S\]例子
\[R\ \cap \ S\]# (SELECT * FROM R)
# INTERSECT (SELECT * FROM S);
-- MySQL do not support INTERSECT
SELECT *
FROM R
WHERE R.a_id IN (SELECT a_id FROM S);
/*
a_id,b_id
a3,103
*/
Difference: 差
\[R - S\]例子
\[R-S\]-- Difference
# (SELECT * FROM R) EXCEPT (SELECT * FROM S);
SELECT *
FROM R
WHERE a_id NOT IN (SELECT a_id FROM S);
/*
a_id,b_id
a1,101
a2,102
*/
-- 下面这种写法后面除法会用到
SELECT *
FROM R
WHERE NOT EXISTS (#
SELECT a_id
FROM S
WHERE R.a_id = S.a_id);
/*
a_id,b_id
a1,101
a2,102
*/
Product: 积
\[R\times S\]例子
\[R\times S\]-- Product
SELECT * FROM R, S;
/*
a_id,b_id,a_id,b_id
a3,103,a3,103
a2,102,a3,103
a1,101,a3,103
a3,103,a4,104
a2,102,a4,104
a1,101,a4,104
a3,103,a5,105
a2,102,a5,105
a1,101,a5,105
*/
SELECT * FROM R CROSS JOIN S;
/*
a_id,b_id,a_id,b_id
a3,103,a3,103
a2,102,a3,103
a1,101,a3,103
a3,103,a4,104
a2,102,a4,104
a1,101,a4,104
a3,103,a5,105
a2,102,a5,105
a1,101,a5,105
*/
联结
\[R\Join S\]例子
\[R\Join S\]-- Join
SELECT * FROM R NATURAL JOIN S;
/*
a_id,b_id
a3,103
*/
SELECT * FROM R JOIN S USING (a_id, b_id);
/*
a_id,b_id
a3,103
*/
补充的关联代数
用途 | 备注 | ||
---|---|---|---|
$\rho$ | Rename | 重命名 | 投影的扩展用法 |
$R\leftarrow S$ | Assignment | 赋值变量 | |
$\delta$ | Duplicate Elimination | 去重 | DISTINCT |
$\gamma$ | Aggregation | 聚合 | GROUP BY |
$\tau$ | Sorting | 排序 | ORDER BY |
$R÷ S$ | Division | 除法 |
这里主要说一下除法, 别的其实都很好理解.
除法
例子和解释参考维基百科.
公式
\[R\ ÷\ S=\Pi_{a_1,...,a_n}(R)-\Pi_{a_1,...,a_n}\big(\Pi_{a_1,...,a_n}(R)\times S-R\big)\]理解
其实应该先看例子的, 这里先说一下抽象的理解:
mysql查询集合查询之数据库除法、关系代数除法(优化嵌套查询);
这篇文章讲解的思路很清晰, 但是 SQL 语句我感觉有点问题..
整个除法的抽象过程就是,先找到被除数那个大集合从中获取我们需要属性的全集,这个通常就是一个带distinct的查询就能解决,然后吧这个结果与除数的集合做笛卡尔积,然后再与原来的大集合做减法,这一步得到的就是那些不同时包含我们除数元组的期望属性,然后再与之前所求的所有属性做减,就得到我们期望的结果。这个过程的思想就是一个先求补集的思想,以为我们需要求同时包含这几项的另外一个属性值这就相当于逻辑与操作,直接做相对于复杂,我们把它转化为集合的运算过后,就容易多了
例子
创建测试数据
-- test data
Create table If Not Exists Completed
(
Student varchar(10),
Task varchar(10)
);
Truncate table Completed;
insert into Completed (Student, Task)
values ('Fred', 'Database1');
insert into Completed (Student, Task)
values ('Fred', 'Database2');
insert into Completed (Student, Task)
values ('Fred', 'Compiler1');
insert into Completed (Student, Task)
values ('Eugene', 'Database1');
insert into Completed (Student, Task)
values ('Eugene', 'Compiler1');
insert into Completed (Student, Task)
values ('Sara', 'Database1');
insert into Completed (Student, Task)
values ('Sara', 'Database2');
SELECT *
FROM Completed;
/*
Student,Task
Fred,Database1
Fred,Database2
Fred,Compiler1
Eugene,Database1
Eugene,Compiler1
Sara,Database1
Sara,Database2
*/
Create table If Not Exists DBProject
(
Task varchar(10)
);
insert into DBProject (Task)
values ('Database1');
insert into DBProject (Task)
values ('Database2');
SELECT *
FROM DBProject;
/*
Task
Database1
Database2
*/
-- Division
-- 有点像: https://leetcode.cn/problems/students-and-examinations/
-- 如果“DB项目”包含数据库项目的所有任务,则这个除法的结果精确的包含已经完成了数据库项目的所有学生。
首先给出套公式的方法:
-- 方法 1, 存临时表: 笛卡尔积, 方便后续操作
/*
SELECT DISTINCT Student
FROM (SELECT Student
FROM Completed,
DBProject) AS tmp
WHERE NOT EXISTS(#
SELECT *
FROM Completed
WHERE tmp.Student = Student
AND tmp.Task = Task) # 由于这里笛卡尔积之后键有二义性, 所以不能直接使用
);
*/
DROP TEMPORARY TABLE IF EXISTS tmp;
DROP TEMPORARY TABLE IF EXISTS tmp1;
CREATE TEMPORARY TABLE tmp1(#
SELECT DISTINCT Student
FROM Completed);
CREATE TEMPORARY TABLE tmp(#
SELECT *
FROM tmp1,
DBProject);
SELECT Student
FROM tmp
WHERE NOT EXISTS(#
SELECT *
FROM Completed c
WHERE tmp.Student = c.Student
AND tmp.Task = c.Task);
SELECT DISTINCT Student
FROM Completed c
WHERE NOT EXISTS(#
SELECT *
FROM (SELECT *
FROM tmp
WHERE NOT EXISTS(#
SELECT *
FROM Completed c
WHERE tmp.Student = Student
AND tmp.Task = Task)) t # 这就是上一步的结果
WHERE c.Student = t.Student);
/*
Student
Fred
Sara
*/
下面是方法 2.
这里的语句采用了数据库除法查询技巧给出的思路, 直接把笛卡尔积运算放在子查询中了, 非常普适.
-- 方法 2, 不额外存笛卡尔积, 比较巧妙, 建议先理解一下方法 1
-- c ÷ d
SELECT DISTINCT c.Student
FROM Completed c
WHERE NOT EXISTS( #
SELECT *
FROM DBProject d
WHERE NOT EXISTS( #
SELECT *
FROM Completed c1
WHERE c.Student = c1.Student
AND d.Task = c1.Task));
/*
Student
Fred
Sara
*/