Operating Systems Notes (1): Processes

Concepts

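Informal definition of a process: a process is a running program.
How a program becomes a process: the operating system loads the code and all static data into memory, allocates memory for the program's run-time stack, and may also allocate memory for the heap. It then starts the program by jumping to its entry point, the main() function.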

Process States

A process can be in different states at different times. In general, a process is in one of the following three states:

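Running: the process is running on a processor, which means it is executing instructions.
Ready: the process is ready to run, but the operating system has chosen not to run it at this moment (usually a scheduling decision).
Blocked: the process has performed some operation that makes it not ready to run until another event takes place. A common example: when a process issues an I/O request to a disk, it becomes blocked, so another process can use the processor.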

Figure: the three-state process model

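Note: besides running, ready, and blocked, a process can be in a few other states. Some systems have an initial state that a process is in while it is being created. Another is the terminated state, in which the process has exited but has not yet been cleaned up; a zombie process is an example. The parent normally has to call wait() on such a child, which tells the operating system to clean up the data structures associated with it.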

Process API

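On Linux, a new process is created with a pair of system calls, fork() and exec().
A process can also call wait() or waitpid() to wait for a child it created to finish (which also keeps zombie processes from accumulating).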

`fork` system call

Interface declaration:

#include <unistd.h>

pid_t fork(void);

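The hardest part of understanding fork() is that it is called once but returns twice. It returns once in the calling process (the parent) with the process ID of the newly created process (the child), and once in the child with a return value of 0. The return value therefore tells the process whether it is the parent or the child.
fork() returns 0 in the child rather than the parent's PID because a child has exactly one parent and can always obtain that parent's PID by calling getppid(). A parent, in contrast, can have many children and has no call that returns their PIDs. A parent that wants to keep track of all its children must record the return value of every fork() call.
All descriptors that the parent opened before calling fork() are shared with the child after fork() returns. Network servers rely on this: the parent calls accept() and then fork(), and the connected socket is then shared between parent and child. Typically the child goes on to read and write the connected socket while the parent closes it.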

`wait()` and `waitpid()` system calls

Handling terminated child processes

#include <sys/wait.h>

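// Both return the child's process ID on success and -1 on error;
// waitpid() can also return 0 when WNOHANG is set and no child has terminated yet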
pid_t wait(int *statloc);
pid_t waitpid(pid_t pid, int *statloc, int options);

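wait() and waitpid() both return two values: the PID of the terminated child as the function's return value, and the child's termination status (an integer) through the statloc pointer. Three macros (WIFEXITED, WIFSIGNALED, and WIFSTOPPED) examine the termination status and tell us whether the child terminated normally, was killed by a signal, or was merely stopped by job control.
If the process calling wait() has no terminated children but has one or more children still running, wait() blocks until the first of them terminates.
waitpid() gives us more control over which process to wait for and whether to block. First, the pid argument specifies the PID of the process we want to wait for; a value of -1 means wait for the first child that terminates. Second, the options argument specifies additional options. The most common one is WNOHANG, which tells the kernel not to block when no child has terminated.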

API example

Create a child process, wait for it with wait(), and print each process's PID.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc, char* argv[]) {
    printf("hello world (pid:%d)\n", (int)getpid());
    int rc = fork();
    if (rc < 0) {   // fork failed
        fprintf(stderr, "fork failed\n");
        exit(1);
    }
    else if (rc == 0) { // child process
        printf("hello, I am child (pid:%d)\n", (int)getpid());
    }
    else { // parent process
        wait(NULL); // wait for the child to finish
        printf("hello, I am parent of %d (pid:%d)\n", rc, (int)getpid());
    }

    return 0;
}

/*
output:
hello world (pid:5459)
hello, I am child (pid:5460)
hello, I am parent of 5460 (pid:5459)
*/

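As the output shows, the child does not start executing at main(); it simply returns from the fork() system call, as if it had called fork() itself.
The child is not an exact copy of the parent. It has its own address space, registers, program counter, and so on, but the value fork() returns to it is different. The output of the example above is deterministic because the parent calls wait(); without the wait, the order of the parent's and child's lines may vary.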

`exec()` system call

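The usual way for a process to run a different program is to call fork() to create a child and then call exec() in the child, which replaces the running child with that program. exec() is in fact a family of functions: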

#include <unistd.h>

int execl(const char *pathname, const char *arg0, ... /* (char*) 0 */);

int execv(const char *pathname, char *const argv[]);

int execle(const char *pathname, const char *arg0, ... /* (char *) 0, char *const envp[] */);

int execve(const char *pathname, char *const argv[], char *const envp[]);

int execlp(const char *filename, const char *arg0, ... /* (char *) 0 */);

int execvp(const char *filename, char *const argv[]);

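Given the path to an executable and its arguments, these functions load the code and static data from that executable and overwrite the current code segment (and static data) with it; the heap, stack, and other parts of the memory space are re-initialized.
These functions return to the caller only if an error occurs. Otherwise control passes to the start of the new program, normally its main() function.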

API example

Create a child process and call execvp() to run the wc program, which reports the number of lines, words, and bytes in the given file.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/wait.h>

int main(int argc, char* argv[]) {
    printf("hello world (pid:%d)\n", (int)getpid());
    int rc = fork();
    if (rc < 0) {   // fork failed
        fprintf(stderr, "fork failed\n");
        exit(1);
    }
    else if (rc == 0) { // child process
        printf("hello, I am child (pid:%d)\n", (int)getpid());
        char* args[3];
        args[0] = strdup("wc");    // program to run
        args[1] = strdup("p2.c");  // its argument: the file to count
        args[2] = NULL;            // marks the end of the argument list
        execvp(args[0], args);
        // Reached only if execvp() fails; on success control has already
        // passed to the start of the new program
        printf("this shouldn't print out\n");
        exit(1);
    }
    else { // parent process
        wait(NULL); // wait for the child to finish
        printf("hello, I am parent of %d (pid:%d)\n", rc, (int)getpid());
    }

    return 0;
}

/*
output:
hello world (pid:5512)
hello, I am child (pid:5513)
 30  94 911 p2.c
hello, I am parent of 5513 (pid:5512)
*/

Application: output redirection

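By default, every Linux process starts with three open file descriptors: standard input, standard output, and standard error.
To redirect output, close the standard output descriptor STDOUT_FILENO before calling exec() and then open the target file. open() returns the lowest-numbered unused descriptor, which is now STDOUT_FILENO, so the standard output of the program started by exec() goes to that file.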

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/wait.h>
#include <fcntl.h>

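int main(int argc, char* argv[]) {
    int rc = fork();
    if (rc < 0) {   // fork failed
        fprintf(stderr, "fork failed\n");
        exit(1);
    }
    else if (rc == 0) { // child process
        close(STDOUT_FILENO);
        // O_CREAT creates the file if it is missing (a third mode argument is
        // then required); O_TRUNC discards any previous contents
        open("./newfile.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);

        char* args[3];
        args[0] = strdup("wc");
        args[1] = strdup("p3.c");
        args[2] = NULL;            // marks the end of the argument list
        execvp(args[0], args);
        exit(1);                   // reached only if execvp() fails
    }
    else { // parent process
        wait(NULL); // wait for the child to finish
    }

    return 0;
}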

/*
output: cat newfile.txt
 29  71 647 p3.c
*/