Cover materials on Stanford CS145,345,346
Transaction,Concurrency & Atomicity
Performace problem:Disk/SSD access is slow,DBMS hide the latency by doing more CPU work concurrently.
Transaction:中文翻译成事务,ACID
Transactions:A atomic sequence of independant read/write. DBMS exsures it is a serial execution of instrucitons. One way to solve is through locking, before r/w requires a locking from DBMS.All concurrency issues handled by the DBMS.
Ensure consistency even if crashed:OS concepts,write ahead logging,before any action is finalized, a corresponding entry is forced to disk.
ER Model
What entities to model
How entities are related
What constraints exist in the domain
How to achieve good designs
Requirement analysis,Conceptual design(ER model),Logical/physical/security
Entity & entity sets:entity are basic units of ER model, has several attributes.(eg:product:color,name,company); Key:unique attribute that defined a entity.
Relationship:between 2 entities. subset of all p,Q pairs with tuples uniquly defined by their keys. Use a new entity instead of relationship, and can make it expand.
Define subclass in ER,maintains connection with parent class,contains all attributes of parent class and add new attributes.
Difference with OODesign:OO classes are disjoint,ER entities sets overlap.
Design theory
Norm-forms(范式) and functionality dependency:
3rdNorm form(3NF):Boyce-Codd norm form.DB designs based on functional dependencies,intended to prevent data anomalies.
functional dependency:whenever 2 tuples agreed on A they agreed on B.A->B.
Use FD to find better schema and reduce anomaly.
Given set of FD, minimum FD and eliminate bad one.
Given a set of FD, does another fd holds?
methods:split/combine,reduction/trasivity.
Buffer manager
Operation:read/flush/release(withoutwriting to disks)
Core:External sorting algorithm,can use 3 pages to manage/merge lists of arbitrary length. If M,N length lists,2(M+N) IO operations.
problem:2 lists are both much larger that the main memory to merge directly.
How to sort big files?
1.Split chunks of small enough to sort in memory,
2.merge pairs of run using external sort algorithms.
3.keep merging the result untils only left with 1 list.
Index & B+ tree
Problem:get faster in search for specific attribute values;faster insertions.
Index is a data structure mapping search key(primary key) to each rows.provides efficient retrieval & lookup.
Operation:search,add/remove.
B+ tree: search tree,1 node for 1 page,balanced height adjusted.make leaves into a linkedlist.Leaf nodes contains pointer to data records, internal nodes points to other nodes which contains other slots.
Feature:high fan out.the depth of tree is small, can get to the nodes very fast.
self balancing:balanced with height even if after insert.
Lock
锁的粒度就是锁的作用范围,一般分为行级锁、表级锁。行级锁锁定记录行,表级锁锁定整个表。行级索:系统开销大,加锁慢,锁定粒度最小,会发生死锁,但是发生冲突的概率最低,并发性最高。表级索:系统开销小,加锁快,锁定粒度最大,不会发生死锁,但是发生冲突的概率最高,并发性最低。
共享锁只用于表级,排它锁用于行级或表级。
加了共享锁的对象,可以继续加共享锁,不能再加排他锁。
加了排他锁的对象,不能再加任何锁。
读写锁:共享锁,排它锁。
通常情况下,写操作较少时,使用乐观锁,写操作较多时,使用悲观锁。每次去拿数据的时候都认为别人不会修改,所以不会上锁,但是在更新的时候会判断一下在此期间别人有没有去更新这个数据。
NoSQL的劣势和优势
锁的粒度大,只支持全局读写锁,IO速度慢时容易造成卡死。不支持MySQL的事务操作,join多表查询困难。
事务操作是指一批SQL语句具有相互关联性, 在一个表中插入删除了之后在另一个表里面也要进行相应的批处理操作,否则进行回滚。在 MySQL 中只有使用了 Innodb 数据库引擎的数据库或表才支持事务。
MySQL的事务操作机制:
1、用 BEGIN, ROLLBACK, COMMIT来实现
BEGIN 开始一个事务
ROLLBACK 事务回滚
COMMIT 事务确认
2、直接用 SET 来改变 MySQL 的自动提交模式:
SET AUTOCOMMIT=0 禁止自动提交
SET AUTOCOMMIT=1 开启自动提交
经典问题
事务隔离级别是由谁实现的?
数据库事务的隔离级别有4个,由低到高依次为Read uncommitted、Read committed、Repeatable read、Serializable,这四个级别可以逐个解决脏读、不可重复读、幻读这几类问题。