MIT6.828 Lab1-Lab2

LEC 1

Operating System Overview

A pipe is essentially a buffer inside the kernel; in most cases it is half-duplex (unidirectional), and its file descriptors are inherited across fork.

Lab 1: Booting a PC

PC Bootstrap

BIOS: sets up an interrupt descriptor table, initializes devices, and loads the boot loader.

When the BIOS runs, it sets up an interrupt descriptor table and initializes various devices such as the VGA display. This is where the “Starting SeaBIOS” message you see in the QEMU window comes from.

After power-on, CS:IP is set to 0xf000:0xfff0 (physical address 0xffff0), where a jmp instruction transfers control to an earlier part of the BIOS.

Therefore we shouldn’t be surprised that the first thing that the BIOS does is jmp backwards to an earlier location in the BIOS

The Boot Loader

The BIOS loads the first sector (the boot sector) into memory at physical addresses 0x7c00 through 0x7dff, then jmps to 0x0000:0x7c00, handing control to the boot loader.

If the last two bytes of this sector are the magic values 0x55 and 0xaa, the BIOS treats the sector as containing an executable boot program.

Floppy and hard disks for PCs are divided into 512 byte regions called sectors. A sector is the disk’s minimum transfer granularity: each read or write operation must be one or more sectors in size and aligned on a sector boundary. If the disk is bootable, the first sector is called the boot sector, since this is where the boot loader code resides. When the BIOS finds a bootable floppy or hard disk, it loads the 512-byte boot sector into memory at physical addresses 0x7c00 through 0x7dff, and then uses a jmp instruction to set the CS:IP to 0000:7c00, passing control to the boot loader. Like the BIOS load address, these addresses are fairly arbitrary - but they are fixed and standardized for PCs.

The relationship between the MBR and the boot loader:

The MBR (Master Boot Record), also called the master boot sector, is the first sector the machine reads from the hard disk after power-on. Its first 446 bytes hold the boot loader code, followed by four 16-byte disk partition table entries (and the 2-byte signature at the end).
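As a quick reference, here is a sketch of that layout as a C struct (field names are my own, not from any JOS header; the sizes follow the 446 + 4×16 + 2 layout described above):

#include <stdint.h>

/* Classic MBR layout: 446 bytes of boot code, four 16-byte
 * partition entries, and the 0x55 0xaa signature. */
struct mbr_partition {
	uint8_t  status;	/* 0x80 = bootable */
	uint8_t  chs_first[3];
	uint8_t  type;
	uint8_t  chs_last[3];
	uint32_t lba_first;	/* first sector of the partition */
	uint32_t sector_count;
} __attribute__((packed));

struct mbr {
	uint8_t  bootloader[446];		/* boot loader code */
	struct mbr_partition partitions[4];	/* partition table */
	uint8_t  signature[2];			/* 0x55, 0xaa */
} __attribute__((packed));

_Static_assert(sizeof(struct mbr) == 512, "MBR must be exactly one sector");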

boot.S walkthrough

.set works like a #define macro.

.code16 tells the assembler to generate 16-bit code.
cli (clear interrupt flag) disables interrupts.

Two rules to keep in mind:
1) Before changing SS:SP, mask interrupts with cli, then restore them with sti as soon as the update is done.
2) SS:SP must point at free memory, never inside the code area of another program (especially the system's).

cld (clear direction flag) clears the direction flag; when it is clear, string operations proceed through increasing addresses.
The corresponding set instructions are sti and std.

Enable the A20 gate: write the command byte 0xd1 to port 0x64, then write 0xdf (binary 11011111) to port 0x60.

seta20.1:
	inb     $0x64,%al               # Wait for not busy
	testb   $0x2,%al
	jnz     seta20.1

	movb    $0xd1,%al               # 0xd1 -> port 0x64
	outb    %al,$0x64

seta20.2:
	inb     $0x64,%al               # Wait for not busy
	testb   $0x2,%al
	jnz     seta20.2

	movb    $0xdf,%al               # 0xdf -> port 0x60
	outb    %al,$0x60

Load the GDT: lgdt reads a 48-bit memory operand into the 48-bit GDTR register — a 16-bit GDT limit followed by a 32-bit GDT base address.
The bootstrap GDT is initialized as:

# Bootstrap GDT
.p2align 2                                # force 4 byte alignment
gdt:
	SEG_NULL                          # null seg
	SEG(STA_X|STA_R, 0x0, 0xffffffff) # code seg
	SEG(STA_W, 0x0, 0xffffffff)

Set the PE bit of %cr0, then execute the long jump ljmp $PROT_MODE_CSEG, $protcseg to reload CS, formally entering protected mode.

Update the remaining segment registers.
One detail worth noting: bits 0-1 of a segment selector hold the RPL, bit 2 is the TI (table indicator) bit selecting the GDT (0) or LDT (1), and bits 3-15 are the 13-bit descriptor index. So the 0x8 and 0x10 defined in boot.S actually name descriptors 1 and 2 (decoded in the snippet after the definitions below).

.set PROT_MODE_CSEG, 0x8         # kernel code segment selector
.set PROT_MODE_DSEG, 0x10 # kernel data segment selector
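
A quick standalone check of the selector layout just described (my own snippet, not part of the lab code):

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	uint16_t selectors[] = { 0x8, 0x10 };	/* PROT_MODE_CSEG, PROT_MODE_DSEG */

	for (int i = 0; i < 2; i++) {
		uint16_t sel = selectors[i];
		/* bits 0-1: RPL, bit 2: TI, bits 3-15: descriptor index */
		printf("selector 0x%x: RPL=%d TI=%d index=%d\n",
		       sel, sel & 0x3, (sel >> 2) & 0x1, sel >> 3);
	}
	return 0;	/* prints index=1 for 0x8 and index=2 for 0x10 */
}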

With %esp set up, the boot loader can call the C code in bootmain.

Control starts in boot.S, which sets up protected mode and a stack so that C code can run, then calls bootmain().

boot/main.c walkthrough

Starting at sector 1 (the second sector), bootmain reads 8 sectors (one page) from the disk into physical memory at 0x10000. This page holds only part of the kernel's ELF file, but it is guaranteed to contain the complete ELF header. Guided by the program header table, bootmain then reads each segment of the program into memory (these are not x86 memory segments, but the code and data segments of the ELF file on disk), forming the kernel image — the kernel that actually runs.

After the kernel file is pulled off the disk, the loader still has to parse its ELF structure and lay it out at its proper location. In that sense the kernel exists in memory in two copies: the ELF-format file kernel.bin, and the kernel image the loader produces by parsing it — the latter is what actually runs.

Once the kernel is loaded, jump to the entry point recorded in the ELF header: 0x10000c.
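In outline, the loading logic in boot/main.c looks roughly like the sketch below (paraphrased from memory, assuming JOS's inc/elf.h types and the readseg helper defined in the same file; see the real source for the exact code):

#include <inc/elf.h>

#define SECTSIZE	512
#define ELFHDR		((struct Elf *) 0x10000)	/* scratch space for the ELF header */

/* readseg(pa, count, offset): copy `count` bytes starting at byte `offset`
 * of the kernel area on disk (which begins at sector 1) to physical
 * address `pa`; defined elsewhere in boot/main.c. */
void readseg(uint32_t pa, uint32_t count, uint32_t offset);

void
bootmain(void)
{
	struct Proghdr *ph, *eph;

	/* Read the first 8 sectors (one page); enough to cover the
	 * ELF header and the program header table. */
	readseg((uint32_t) ELFHDR, SECTSIZE * 8, 0);

	if (ELFHDR->e_magic != ELF_MAGIC)
		return;					/* not a valid ELF image */

	/* Copy each program segment to the physical address it requests. */
	ph = (struct Proghdr *) ((uint8_t *) ELFHDR + ELFHDR->e_phoff);
	eph = ph + ELFHDR->e_phnum;
	for (; ph < eph; ph++)
		readseg(ph->p_pa, ph->p_memsz, ph->p_offset);

	/* Jump to the entry point from the ELF header (0x10000c); never returns. */
	((void (*)(void)) (ELFHDR->e_entry))();
}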

A note on JOS's disk image: it consists of two parts, boot and kernel. boot occupies the first sector (the boot sector), which is why the kernel's ELF file is read starting at sector 1.

# How to build the kernel disk image
$(OBJDIR)/kern/kernel.img: $(OBJDIR)/kern/kernel $(OBJDIR)/boot/boot
	@echo + mk $@
	$(V)dd if=/dev/zero of=$(OBJDIR)/kern/kernel.img~ count=10000 2>/dev/null
	$(V)dd if=$(OBJDIR)/boot/boot of=$(OBJDIR)/kern/kernel.img~ conv=notrunc 2>/dev/null
	$(V)dd if=$(OBJDIR)/kern/kernel of=$(OBJDIR)/kern/kernel.img~ seek=1 conv=notrunc 2>/dev/null
	$(V)mv $(OBJDIR)/kern/kernel.img~ $(OBJDIR)/kern/kernel.img

The Kernel

entry.S walkthrough

The very first instruction looks puzzling at first:

movw	$0x1234,0x472			# warm boot

Physical address 0x472 is the reset-flag word in the BIOS data area; storing 0x1234 there marks any subsequent reset as a warm boot, so the BIOS skips the memory test.

Load the page directory into %cr3 (the page directory base register).
Set the PG bit of %cr0 to enable paging.
Initialize the stack.

With paging on, the top 10 bits of a linear address index the page directory entry_pgdir, and the middle 10 bits index the page table entry_pgtable.
For 0xf0100000, the paging hardware first uses index 960 to pick the page-directory entry pointing at the page table, then uses index 256 to pick the page-table entry holding physical address 0x100000.
Once this process is clear, the mappings set up in entrypgdir.c make sense (the snippet after the mapping below reproduces the arithmetic):

[KERNBASE, KERNBASE+4MB) --> [0, 4MB) .
[0, 4MB) --> [0, 4MB)
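
A tiny standalone check of the 960/256 arithmetic above (my own snippet; the shifts match JOS's PDX/PTX macros):

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	uint32_t va = 0xf0100000;
	uint32_t pdx = (va >> 22) & 0x3ff;	/* top 10 bits: page directory index */
	uint32_t ptx = (va >> 12) & 0x3ff;	/* middle 10 bits: page table index  */
	uint32_t off = va & 0xfff;		/* low 12 bits: offset within page   */

	printf("PDX=%u PTX=%u offset=0x%x\n", pdx, ptx, off);	/* PDX=960 PTX=256 */
	return 0;
}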

Formatted Printing to the Console

How C variadic arguments are implemented:
va_list args; // the object that will receive the argument list
va_start(args, fmt); // initialize args; fmt is the last fixed parameter before the …
T va_arg(va_list, T);
va_end(args); // release the argument list

In essence, va_start takes the address of the fixed parameter fmt and adds sizeof(fmt) to get the address of the first variadic argument; each call to va_arg then returns a value of type T and advances by sizeof(T) to the next argument (this describes the classic x86 stack-based implementation).
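
A minimal stdarg example showing that calling pattern (standard C, unrelated to JOS's printfmt code):

#include <stdarg.h>
#include <stdio.h>

/* Sums `count` int arguments; count plays the role of the fixed
 * parameter that anchors va_start. */
static int
sum_ints(int count, ...)
{
	va_list ap;
	int total = 0;

	va_start(ap, count);		/* point ap just past `count` */
	for (int i = 0; i < count; i++)
		total += va_arg(ap, int);	/* fetch the next int, advance ap */
	va_end(ap);
	return total;
}

int
main(void)
{
	printf("%d\n", sum_ints(3, 10, 20, 12));	/* prints 42 */
	return 0;
}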

For octal, a small tweak to the existing conversion code is all that's needed:

	case 'o':
		// Octal is unsigned: fetch the argument like the %u case,
		// then print it in base 8.
		num = getuint(&ap, lflag);
		base = 8;
		goto number;

The Stack

int
mon_backtrace(int argc, char **argv, struct Trapframe *tf)
{
	// Your code here.
	cprintf("Stack backtrace:\n");
	uint32_t ebp = read_ebp();
	uint32_t eip = *(uint32_t *)(ebp + 4);
	struct Eipdebuginfo dinfo;

	while (1) {
		debuginfo_eip(eip, &dinfo);
		cprintf(" ebp %08x eip %08x args %08x %08x %08x %08x %08x\n",
			ebp, eip,
			*(uint32_t *)(ebp + 8), *(uint32_t *)(ebp + 12),
			*(uint32_t *)(ebp + 16), *(uint32_t *)(ebp + 20),
			*(uint32_t *)(ebp + 24));
		cprintf(" %s:%d: %.*s+%d\n",
			dinfo.eip_file, dinfo.eip_line,
			dinfo.eip_fn_namelen, dinfo.eip_fn_name,
			eip - dinfo.eip_fn_addr);
		ebp = *(uint32_t *)(ebp);
		if (ebp != 0)
			eip = *(uint32_t *)(ebp + 4);
		else
			break;
	}
	return 0;
}

To be honest, I never fully figured this part out.
https://sourceware.org/gdb/onlinedocs/stabs.html#Symbol-Tables
https://sourceware.org/gdb/onlinedocs/stabs.html#Line-Numbers

	stab_binsearch(stabs, &lline, &rline, N_SLINE, addr);
	if (lline <= rline) {
		// The source line number lives in the n_desc field of the N_SLINE stab.
		info->eip_line = stabs[lline].n_desc;
	} else {
		// No line-number stab found for addr.
		return -1;
	}

LEC 3

shell

First, a look at how pipe() is used.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
	int pipefd[2];
	char buf;
	pid_t cpid;

	if (argc != 2) {
		fprintf(stderr, "Usage: %s <string>\n", argv[0]);
		exit(EXIT_FAILURE);
	}

	if (pipe(pipefd) == -1) {
		perror("pipe");
		exit(EXIT_FAILURE);
	}

	cpid = fork();
	if (cpid == -1) {
		perror("fork");
		exit(EXIT_FAILURE);
	}

	if (cpid == 0) {		/* Child reads from pipe */
		close(pipefd[1]);	/* Close unused write end */
		while (read(pipefd[0], &buf, 1) > 0)
			write(STDOUT_FILENO, &buf, 1);
		write(STDOUT_FILENO, "\n", 1);
		close(pipefd[0]);
		_exit(EXIT_SUCCESS);
	} else {			/* Parent writes argv[1] to pipe */
		close(pipefd[0]);	/* Close unused read end */
		write(pipefd[1], argv[1], strlen(argv[1]));
		close(pipefd[1]);	/* Reader will see EOF */
		wait(NULL);		/* Wait for child */
		exit(EXIT_SUCCESS);
	}
}
// Execute cmd.  Never returns.
void
runcmd(struct cmd *cmd)
{
	int p[2], r;
	struct execcmd *ecmd;
	struct pipecmd *pcmd;
	struct redircmd *rcmd;

	if(cmd == 0)
		_exit(0);

	switch(cmd->type){
	default:
		fprintf(stderr, "unknown runcmd\n");
		_exit(-1);

	case ' ':
		ecmd = (struct execcmd*)cmd;
		if(ecmd->argv[0] == 0)
			_exit(0);
		// Your code here ...
		execve(ecmd->argv[0], ecmd->argv, NULL);
		// execve only returns on failure.
		fprintf(stderr, "exec %s failed\n", ecmd->argv[0]);
		break;

	case '>':
	case '<':
		rcmd = (struct redircmd*)cmd;
		// Your code here ...
		r = open(rcmd->file, rcmd->flags, S_IRUSR|S_IWUSR|S_IRGRP|S_IROTH);
		if(r == -1){
			perror("open");
			_exit(-1);
		}
		// Redirect the target fd (stdin or stdout) to the opened file,
		// then run the inner command; runcmd never returns.
		if(dup2(r, rcmd->fd) == -1)
			perror("dup2 fail");
		close(r);
		runcmd(rcmd->cmd);
		break;

	case '|':
		pcmd = (struct pipecmd*)cmd;
		if(pipe(p) == -1)
			perror("pipe create error");
		if(fork1() == 0){
			// Left side of the pipe: stdout becomes the write end.
			close(p[0]);
			if(dup2(p[1], STDOUT_FILENO) == -1)
				perror("dup2 fail");
			close(p[1]);
			runcmd(pcmd->left);
		}else{
			// Right side (this process): stdin becomes the read end.
			// Close the write end before running the right command,
			// otherwise the reader never sees EOF; waiting here before
			// closing could deadlock if the left side fills the pipe buffer.
			close(p[1]);
			if(dup2(p[0], STDIN_FILENO) == -1)
				perror("dup2 fail");
			close(p[0]);
			runcmd(pcmd->right);
		}
		break;
	}
	_exit(0);
}

Lab 2: Memory Management

mem_init walkthrough

It may help to read the rest of this part first and then come back to this walkthrough. Comments and checks are omitted below.
i386_detect_memory detects how much physical memory the machine has; boot_alloc then allocates one page for the kern_pgdir page directory.
boot_alloc is called again to allocate a PageInfo management structure for every physical page.
page_init then initializes all the physical pages: pages that are reserved or already in use are marked in use, and the free-page list page_free_list is built.
After that, boot_map_region is called three times to install mappings in kern_pgdir: the pages array at UPAGES (user-readable), the kernel stack, and the kernel portion of the address space above KERNBASE.
Finally kern_pgdir is installed, replacing the temporary entry_pgdir.

void
mem_init(void)
{
	uint32_t cr0;
	size_t n;

	i386_detect_memory();

	kern_pgdir = (pde_t *) boot_alloc(PGSIZE);
	memset(kern_pgdir, 0, PGSIZE);

	kern_pgdir[PDX(UVPT)] = PADDR(kern_pgdir) | PTE_U | PTE_P;

	pages = (struct PageInfo *) boot_alloc(npages * sizeof(struct PageInfo));
	memset(pages, 0, npages * sizeof(struct PageInfo));

	page_init();

	boot_map_region(kern_pgdir, UPAGES, npages * sizeof(struct PageInfo),
			PADDR(pages), PTE_U | PTE_P);

	boot_map_region(kern_pgdir, KSTACKTOP - KSTKSIZE, KSTKSIZE,
			PADDR(bootstack), PTE_W | PTE_P);

	boot_map_region(kern_pgdir, KERNBASE, (1ULL << 32) - KERNBASE, 0, PTE_W);

	lcr3(PADDR(kern_pgdir));

	cr0 = rcr0();
	cr0 |= CR0_PE | CR0_PG | CR0_AM | CR0_WP | CR0_NE | CR0_MP;
	cr0 &= ~(CR0_TS | CR0_EM);
	lcr0(cr0);
}

Part 1: Physical Page Management

Before writing page_init, review the physical and virtual address layouts and work out where each macro defined in memlayout.h lands in the diagram.


void
page_init(void)
{
	// Mark page 0 as in use, preserving the real-mode IDT and BIOS structures.
	size_t i = 0;
	pages[i].pp_ref = 1;
	pages[i].pp_link = NULL;
	++i;

	// The rest of base memory (the low 640KB) is free.
	for (i = 1; i < npages_basemem; ++i) {
		pages[i].pp_ref = 0;
		pages[i].pp_link = page_free_list;
		page_free_list = &pages[i];
	}

	// Mark the IO hole as in use so it is never allocated.
	for (i = PGNUM(IOPHYSMEM); i < PGNUM(EXTPHYSMEM); ++i) {
		pages[i].pp_ref = 1;
		pages[i].pp_link = NULL;
	}

	// Mark the pages above 1MB holding the kernel image, kern_pgdir and the
	// pages array as in use; boot_alloc(0) returns the first free address.
	for (i = PGNUM(EXTPHYSMEM); i < PGNUM(PADDR(boot_alloc(0))); ++i) {
		pages[i].pp_ref = 1;
		pages[i].pp_link = NULL;
	}

	// Everything above that is free.
	for (; i < npages; ++i) {
		pages[i].pp_ref = 0;
		pages[i].pp_link = page_free_list;
		page_free_list = &pages[i];
	}
}

Next come page_alloc() and page_free().
Below are some doubts I had on a first reading (silly ones, I know).
At first I didn't see why these two functions need to exist, in particular why the PageInfo of a freed physical page is linked back onto page_free_list.
mem_init already used boot_alloc to allocate a PageInfo for every physical page, and those structures are never reused for any other purpose even after page_free, so this is not like an ordinary allocator that exists to improve memory utilization.
Answer: freeing a PageInfo really means marking the corresponding physical page as free (and allocating one marks it as in use); it is bookkeeping so the OS can track pages, not a utilization optimization.
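
With that view, page_alloc and page_free reduce to popping from and pushing onto page_free_list. A minimal sketch of the pair, assuming the declarations in kern/pmap.h and inc/memlayout.h (my actual solution differs only in details):

struct PageInfo *
page_alloc(int alloc_flags)
{
	struct PageInfo *pp = page_free_list;

	if (pp == NULL)
		return NULL;				// out of free physical pages
	page_free_list = pp->pp_link;
	pp->pp_link = NULL;
	if (alloc_flags & ALLOC_ZERO)
		memset(page2kva(pp), 0, PGSIZE);	// zero it through its kernel VA
	return pp;
}

void
page_free(struct PageInfo *pp)
{
	// Only a page with no references and not already on the free list
	// may be returned.
	assert(pp->pp_ref == 0 && pp->pp_link == NULL);
	pp->pp_link = page_free_list;
	page_free_list = pp;
}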

The operating system must keep track of which parts of physical RAM are free and which are currently in use. JOS manages the PC’s physical memory with page granularity so that it can use the MMU to map and protect each piece of allocated memory.

What I understood even less: a PageInfo does not record the address of its physical page — page2pa computes the physical address from the PageInfo's offset within the pages array — yet when page_insert calls page_alloc, it simply takes the most recently freed PageInfo off the free list, with no particular physical address in mind.
Answer: page_alloc's job is to hand its caller one page of physical address space; the caller does not care where that page lives. page_alloc just pops some PageInfo and hands out the physical page that corresponds to it. The PageInfo comes first and determines the physical page, rather than a particular physical page needing a PageInfo found for it, so there is no correspondence problem.
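
The correspondence is purely positional; kern/pmap.h expresses it roughly as follows (paraphrased):

static inline physaddr_t
page2pa(struct PageInfo *pp)
{
	// The i-th PageInfo describes the i-th physical page.
	return (pp - pages) << PGSHIFT;
}

static inline struct PageInfo *
pa2page(physaddr_t pa)
{
	if (PGNUM(pa) >= npages)
		panic("pa2page called with invalid pa");
	return &pages[PGNUM(pa)];
}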

As for implementing pgdir_walk(), boot_map_region(), page_lookup(), page_remove(), page_insert() and so on, the main thing is to keep straight the relationships among kernel virtual addresses (KADDR), physical addresses, physical pages, and PageInfo structures, then iterate against the checks' error messages until they pass. Too much code to paste here — see GitHub; a sketch of boot_map_region follows as one example.
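
Assumptions for the sketch: va, pa and size are page-aligned, and pgdir_walk returns a kernel-VA pointer to the PTE, creating the page table when its third argument is nonzero.

static void
boot_map_region(pde_t *pgdir, uintptr_t va, size_t size, physaddr_t pa, int perm)
{
	// Map the virtual range [va, va+size) to the physical range [pa, pa+size).
	for (size_t off = 0; off < size; off += PGSIZE) {
		pte_t *pte = pgdir_walk(pgdir, (void *) (va + off), 1);
		if (pte == NULL)
			panic("boot_map_region: out of memory");
		*pte = (pa + off) | perm | PTE_P;
	}
}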

Part 2: Virtual Memory

See GitHub for details.

Part 3: Kernel Address Space

See GitHub for details.

Challenge

Printing physical page mappings

Implement showmappings, modeled on the info pg command of the QEMU used in the lab.
The merging rule: page-table entries with identical permission bits that map adjacent physical pages are printed as a single merged range; page-directory entries are never merged. Improving on this would be easy, but time was tight…
QEMU's info pg:

showmappings:

A few detours

memset crashed. My first guess was that it was referencing a physical address outside the current page directory's mappings, but at the crash site everything looked fine. After a long hunt it turned out that the low 4MB of virtual addresses simply had no write permission in the page directory…

__attribute__((__aligned__(PGSIZE)))
pde_t entry_pgdir[NPDENTRIES] = {
	// Map VA's [0, 4MB) to PA's [0, 4MB)
	[0]
		= ((uintptr_t)entry_pgtable - KERNBASE) + PTE_P,
	// Map VA's [KERNBASE, KERNBASE+4MB) to PA's [0, 4MB)
	[KERNBASE>>PDXSHIFT]
		= ((uintptr_t)entry_pgtable - KERNBASE) + PTE_P + PTE_W
};

Then it occurred to me that after page_init, the page at the head of page_free_list sits near the top of physical memory, which really is outside the current page directory's mappings, so a crash there would be expected. On closer inspection, check_page_free_list contains this handling of the free list:

	if (only_low_memory) {
		// Move pages with lower addresses first in the free
		// list, since entry_pgdir does not map all pages.
		struct PageInfo *pp1, *pp2;
		struct PageInfo **tp[2] = { &pp1, &pp2 };
		for (pp = page_free_list; pp; pp = pp->pp_link) {
			int pagetype = PDX(page2pa(pp)) >= pdx_limit;
			*tp[pagetype] = pp;
			tp[pagetype] = &pp->pp_link;
		}
		*tp[1] = 0;
		*tp[0] = pp2;
		page_free_list = pp1;
	}
