Bobcheng

Process

program : 一個可以執行的檔案 (on disk)
process : 一個正在運行的 program (on OS kernel)

實用指令

ps 顯示目前正在運行的 processes 的快照
top 實時動態顯示所有 processes，有點像 Windows 的工作管理員

Kernel and User process

Kernel process : 只會在 kernel space/mode 下運行
User process : 大部分在 user space/mode 下運行，可以透過 system call 來進入 kernel space/mode

Kernel and User mode

Process ID (PID)

每個 process 都有一個 unique PID
a non-negative integer
OS kernel 會重新使用已經 terminated process 的 PID

特殊 PID:

PID 0 - swapper, or scheduler/idle process
- Kernel process
- 沒有對應的 program
PID 1 - init process
- User process with superuser privilege
- runs the init program (/sbin/init) 來啟動 Unix system
- initialize system services, login processes, etc.
PID 2 - pagedaemon (in some Unix system) or kthreadd (in Linux)
- Kernel process
- 負責支援 virtual memory system 的呼叫

獲取 pid : getpid()

// return: process ID of calling process
int getpid(void);

Machine Bootstrapping

更詳細的流程：https://linux.vbird.org/linux_basic/centos5/0510osloader-centos5.php

名詞解釋

firmware (韌體) : 嵌入在硬體裡面的軟體，通常指「驅動硬體進行作業」的軟體
ROM (唯讀記憶體) : 資料在任何情況下都不會改變，只能讀取
boot loader (引導裝載程式) : 電腦開機後裝載 OS 的 program，不同 OS 有自己的 boot loader

machine power on
boot program (firmware, boot loader) 為 OS kernel 設置好硬體環境 :
- CPU 從 ROM 執行 firmware
  - firmware (BIOS/UEFI) 會初始化硬體 (e.g. RAM)
  - firmware 會將 software 從儲存裝置 (hard drive) 載入到記憶體 (RAM)
- CPU 從 RAM 執行 boot loader
  - boot loader (ex: grup, u-boot) 會做更多的 machine bootstrap
  - boot loader 會將 OS 從儲存裝置載入到 RAM
OS kernel (and its driver) 初始化更多的硬體，為 user processes 設置環境
- OS create two processes (pid 1, pid 2)
- OS kernel 會將 init program (init process 要用的) 從 file system 載入到 RAM
OS 進入 user space 並執行 init process
system is brought up for multi-user operation

Process Life Cycle

Process Control

Process Control Block (PCB)

在 OS Kernel 裡，每個 process 都會以 PCB 的方式表現，包含以下資訊：

process state
program counter, registers
memory management information
accounting information

Process Scheduling

紅色數字代表 scheduler 執行的不同狀況

狀況 1：put new process into ready queue

狀況 2 : running process is interrupt, choose another process to run on the CPU

CPU 做 context switch，也就是將目前的 running process 的 context (環境) 存到 PCB 裡面，避免進度遺失
目前的 running process 離開 (進入 waiting 狀態)
CPU 執行 chosen process

狀況 3 : put waiting process into ready queue

Example: OS kernel schedules (time-shares) two processs on a single CPU

time →

Forking

簡單來說就是複製一個新的 process
parent process 在呼叫 fork() 後會立即產生一個 child process :

parent 和 child 有不同的 pid
一個 process 可以有很多 child，但一個 process 只能有一個 parent
OS kernel 會追蹤所有 process 的 parent，除了 pid 0, 1, 2 之外

為什麼需要 fork() :

當 process 想要複製自己讓 parent 和 child 可以執行程式的不同區段，通常用在 network servers
當 process 需要執行不同的 program，像是 shells 就會有這種需求

fork() 之後的流程

fork()

// returns: 0 if in the child, <process ID of child> if in the parent, -1 on error
pid_t fork(void);

parent process 呼叫 fork() 來建立一個新的 child process

呼叫一次 fork() 會回傳兩次，一次在 parent 內、一次在 child 內
parent 和 child 都會繼續執行 fork() 之後的程式

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>

int globvar = 6; /* 外部變數 */
char buf[] = "a write to stdout\n";
int main(void) {
    int var; /* 內部變數 */
    pid_t pid;
    var = 88;

    if (write(STDOUT_FILENO, buf, sizeof(buf)-1) != sizeof(buf)-1)
        err_sys("write error");

    printf("before fork\n");  /* 只有 parent 會執行一次 */
    /* we don’t flush stdout */

    if ((pid = fork()) < 0) {
        err_sys("fork error");
    } else if (pid == 0) {
        /* 只有 child 會執行 */
        globvar++;
        var++;
    } else {
        /* 只有 parent 執行 */
        sleep(2);
    }

    /* parent 和 child 都會執行 */
    printf("pid = %ld, glob = %d, var = %d\n", (long)getpid(), globvar, var);
    exit(0);
}

$ ./a.out
a write to stdout
before fork
pid = 430, glob = 7, var = 89
pid = 429, glob = 6, var = 88

$ ./a.out > temp.out
$ cat temp.out
a write to stdout
before fork                     // by child
pid = 432, glob = 7, var = 89   // by child
before fork                     // by parent
pid = 431, glob = 6, var = 88   // by parent

在 fork() 之後，child 更動變數值並不會影響 parent，反之亦然
執行 ./a.out > temp.out 時，stdout 為 fully buffered 在 parent 執行完 printf("before fork\n") 後，該內容後繼續留在 buffer 裡面，並複製給 child，當 child 執行結束後才 flush，並輸出 before fork\npid = 432, glob = 7, var = 89
而 parent 也是等到執行結束後才 flush，並輸出 before fork\npid = 432, glob = 6, var = 88

stdout 的 buffer 問題

stdout 後面接的是 terminal device 時為 line buffered，否則為 fully buffered
flush : 將目前 buffer 的內容全部輸出，清空 buffer，process 結束時會自動 flush
line buffered : 遇到 ‘\n’ 就會 flush (或是 buffer 滿了)
fully buffered : buffer 滿了才會 flush

注意事項

parent 和 child 的執行順序不是固定的 — 取決於 kernel scheduler
fork 失敗的情況 : 已經太多 process、一個 real user ID 的 process 超過上限(CHILD_MAX)
記得要做 synchronization (下面會談到)

對於任何在 fork() 之前被開啟的 file descriptors，parent 和 child 會共享同樣的 open file table entry，也就是說他們共享同樣的 file offset

在沒有做 synchronization 的情況下，如果兩者對同一個 file descriptor 做讀寫，由於兩者共享同樣的 file offset，所以當一者寫入時，另一者也會得到更新後的 offset，並接續寫在後方，因此造成 parent 和 child 的資料 intermixed (不是 overwritten)

一般情況下，在 fork() 之後有兩種處理 fds 的方式 :

parent 等到 child 執行完才繼續 : 當 child 執行完後，每個被 child 讀寫過的 fds 的 file offset 都要自動被更新，因此 parent 可以繼續接續讀寫
parent 和 child 各自執行 : parent 和 child 在 fork() 之後都各自關閉不需要的 fds，雙方都不會互相影響到對方的 open fds (對於兩方都需要的 fd 則需要做另外的 synchronization)

Copying address space in fork()

雖然說 child 是 parent 的複製，但具體而言如何複製、又複製了什麼?
在作業系統裡面， address space 指的是 OS 分配給一個 program 的一段記憶體空間，而 address space 的 memory layout 通常如下 :

當呼叫 fork() 之後，parent 和 child 會共享相同的 memory address layout，最一開始是採取整個複製的方式，現在則改成 COW 的方式來維持兩邊的資料互不影響。

所謂的 COW(copy-on-write) 為一開始雙方共享同樣的 memory address，直到某一方要修改資料(ex: 修改變數)時，才會複製一段 copy of memory 讓它修改

因此 :

parent 和 child 有著各自的 copy of data, heap, stack (假如有修改資料的話)
parent 和 child 共享著同樣的 text segment (因為程式內容不會被修改)

vfork()

// returns: 0 if in the child, <process ID of child> if in the parent, -1 on error
pid_t vfork(void);

在 vfork() 後，parent 和 child 會共享同樣的 memory address layout，直到 child 呼叫 exec() 或是 exit() 為止
如果使用 vfork()，則==永遠都是 child 先跑==，parent 會 blocking 直到 child 呼叫 exec() 或是 exit()
也就是說，一旦 child 做出了 undefined behavior : 更改了任何資料、在沒有呼叫 exec() 或 exit() 的情況下 return，vfork() 就會壞掉

原本 vfork() 是為了最佳化 fork() 的記憶體使用，但由於現今 fork() 改採用 COW 的方式，因此 vfork() 和 fork() 並沒有很大的表現差異

Deadlock

deadlock : 當兩個以上的 processes 都在等待對方結束以釋放其所需資源，導致沒有任何 process 可以結束

ex: 假設使用 vfork()，當 child 需要等待 parent 的動作來繼續執行時，由於 parent 不會行動直到 child exit() 或 exec() 為止，因此兩邊都卡住陷入 deadlock

exec()

// All return -1 on error; no return on success
int execl(const char *pathname, const char *arg0, ... /* (char *)0 */ );
int execlp(const char *filename, const char *arg0, ... /* (char *)0 */ );
int execle(const char *pathname, const char *arg0, ... /* (char *)0, char *const envp[] */ );
int execv(const char *pathname, char *const argv[]);
int execvp(const char *filename, char *const argv[]);
int execve(const char *pathname, char *const argv[], char *const envp[]);

exec() 讓一個 process 可以執行其他的 program

exec() 會用新的 program 取代原本的 process : 整個 address space 都會被替換掉
pid 不會換 (也就是說，不會產生新的 process)
許多系統只提供 execve()
當檔案被設定 flag FD_CLOEXEC 時，一旦任何一個 exec function 成功就會自動關閉該檔案

exec 後面的字母代表不同型別的參數

p : path
- 沒有 p 的 exec function 會以 pathname 作為欲執行的 program 位置
- 如果參數 filename 不包含 / 字元，則會在 PATH 下面尋找 filename 指定的檔案
- 如果參數 filename 內含 / 字元，則直接執行 filename 指定的檔案
- 如果 filename 不能執行，則以 filename 為 input 執行 /bin/sh
l : list
- command line 會透過一個 list : arg0, arg1, … 分別傳送
- 必須以 NULL 結尾
v : vector
- command line 會透過一個 array : argv[] 傳送
- argv[] 必須以 NULL 結尾
e : environment
- envp[] 代表該 program 用到的 environment variables
- envp[] 必須以 NULL 結尾
- 其他沒有 e 的 exec function 會從 caller process 獲取 environment variables
- child 會繼承 parent 的 environment variables

用不同 exec function 執行 ls -l 的範例 :

execl("/bin/ls","ls","-l", NULL);
execlp("ls","ls","-l", NULL);
execl("/bin/ls”,”ls","-l", NULL, envp);
char *argv[] = {"ls","-l", NULL}; execv("/bin/ls", argv);
char *argv[] = {"ls","-l", NULL}; execvp("ls", argv);
char *argv[] = {"ls","-l",NULL}; execve("/bin/ls", argv, envp);

fork() v.s. exec()

	`fork()`	`exec()`
File locks	不繼承	保持相同
Pending alarms and signals	不繼承	保持相同
Environment	從 parent 繼承	可被 env 改動

為什麼要讓 fork() 和 exec() 分開 ?
讓 child process 可以建立自己的環境 (ex: setting up open file descriptors)

Changing UID and GID when exec()

如果 program 有 set-uid bit，那 exec() 會將 process’s effective uid 設定成 program’s uid

exec() 會將設定完的 effective uid 複製到 saved set-uid
process with elevated privileges (透過 exec() set-uid program) 可能會切換成 normal privileges (將 effective uid 設定成 real uid) 並想要重新切換成 elevated privileges
seteuid() 和 setegid() 提供 process 可以設定新的 effective uid : 當新的 effective uid 相同於 saved set-uid 或是 real uid 才會成功

Process Termination

Normal process termination :
- return from main()
- call exit()
- call _exit() or _Exit()
- return of the last thread from its start routine
- calling pthread_exit from the last thread
Abnormal process termination :
- call abort() (Chap. 10)
- terminated by a signal (Chap. 10)
- response of the last thread to a cancellation request
不論是 normal 還是 abnormal termination，kernel 最後都會釋放所有 process 相關的 memory、關閉所有 process 打開的 file descriptors
當一個 process 終止時，kernel 會透過傳送 SIGCHLD signal 來告知 parent
- parent 可以選擇忽視 SIGCHLD，或是提供 signal handler
terminated process 的 parent 應該要呼叫任何一個 wait() 函數來獲取該 process 相關的資源

exit()

https://m033010041.github.io/2019/01/15/TLPI-ExitProcess/
https://blog.csdn.net/drdairen/article/details/51896141

#include <stdlib.h> // specified in ISO c
void exit(int status);
void _Exit(int status);

#include <unistd.h> // specified in POSIX.1
void _exit(int status);

status : 該 process 的 termination status (死亡訊息)
- 由 kernel 傳送給 parent process 來讓其知道 child 是如何被終止的
- set to ==0== for normal exits
當這三個函數被 process 呼叫時，會正常終止該 process
exit() 會做 cleanup
- 呼叫所有的 exit handlers
- 呼叫 fclose() 來 flush 和關閉所有打開的 I/O streams
_exit() 和 _Exit() 不會做 cleanup

atexit()

exit handlers: 用來清理(釋放資源、保存資料等)的函數

// Returns 0 if OK, nonzero on error
int atexit(void (*func)(void));

atexit() 會將函數 func 註冊成 exit handler

同樣的函數可以被註冊很多次，每次註冊都會被呼叫一次
exit handlers 根據註冊順序的相反被呼叫
當一個 child process 被 fork() 建立時，註冊資訊會從 parent 身上繼承；然而一旦成功呼叫 exec()，那註冊資訊會被全部清空

static void my_exit1(void){
  printf("first exit handler\n");
}
static void my_exit2(void){
  printf("second exit handler\n");
}

int main(void){
  if (atexit(my_exit2) != 0)
  err_sys("can’t register my_exit2");
  if (atexit(my_exit1) != 0)
  err_sys("can’t register my_exit1");
  if (atexit(my_exit1) != 0)
  err_sys("can’t register my_exit1");

  printf("main is done\n");
  return(0);
}

main is done
first exit handler
first exit handler
second exit handler

wait(), waitpid()

// Both return: process ID if OK, 0, or -1 on error
pid_t wait(int *statloc);
pid_t waitpid(int pid, int *statloc, int options);

用來等待 child process 的 state change : terminated, stopped by a signal, resume by a signal

當 process 呼叫了 wait() 或 waitpid()，可能會有以下情況
- block : 如果所有 children 都還在 running
- return immediately with status value : 如果有 child state changed
- return immediately with error : 如果沒有任何 child
wait() 會使 caller blocking 直到有任一 child state change 為止
waitpid() 會使 caller blocking 直到指定的 process pid state change 為止，並可以透過 options 來設定不同行為

參數設定

statloc : 用來儲存 child process 的 termination status，設成 NULL 來忽略
termination status 可以用以下 marco 檢查
pid :
- =-1 waits for any child process
- >0 waits for the child whose process ID == pid
- =0 waits for any child whose process GID == caller’s GID
- <-1 waits for any child with process GID == |pid|
options :

Zombie and Orphan Process

Zombie process:

process 已經被終止了，但 parent 還沒有去獲取它的 termination status (沒有用 wait function)
此時該 process 會處於 “zombie state”
zombie 並不會耗費很多 memory，但會占用 pid
- 如果太多 zombie 會導致沒有 pid 可以用

zombie state 的存在是為了讓 parent process 可以獲取 child 的 termination status

Orphan process :

process 還在跑，但它的 parent 已經終止了
此時 OS kernel 會將該 process 的 parent 設定成 init process (pid 1)
orphan process 永遠不會變成 zombie process，因為 init process 會呼叫 wait function 來獲取它的 termination status

Inter-Process Communication (IPC)

Race Condition

當很多 process 同時對 shared data 做操作，並且 processes 操作順序會影響結果，就會造成 race condition

#include "apue.h"
static void charatatime(char *);

int main(void) {
	pid_t pid;
	if ((pid = fork()) < 0) {
		err_sys("fork error");
	} else if (pid == 0) {
		charatatime("output from child\n");
	} else {
		charatatime("output from parent\n");
	}
	exit(0);
}

static void charatatime(char *str) {
	char *ptr;
	int c;
	setbuf(stdout, NULL); /* set unbuffered */
	for (ptr = str; (c = *ptr++) != 0; )
		putc(c, stdout);
}

$ ./a.out
ooutput from child
utput from parent
$ ./a.out
ooutput from child
utput from parent
$ ./a.out
output from child
output from parent

要避免 race condition，我們必須要讓 parent 和 client 可以溝通，這就是synchronization 的概念，以下是基本的 synchronization 流程

TELL_WAIT();                    /* set things up for TELL_xxx & WAIT_xxx */

if ((pid = fork()) < 0) {
	err_sys("fork error");
} else if (pid == 0) {          /* child */
  /* child does whatever is necessary ... */
  TELL_PARENT(getppid());       /* tell parent we’re done */
  WAIT_PARENT();                /* and wait for parent */
  /* and the child continues on its way ... */
  exit(0);
} else {                        /* parent */
  /* parent does whatever is necessary ... */
  TELL_CHILD(pid);              /* tell child we’re done */
  WAIT_CHILD();                 /* and wait for child */
  /* and the parent continues on its way ... */
  exit(0);
}

具體而言，synchronization 可以透過 IPC 機制來實現(或是 signal)，而 IPC 有許多方式如下

目前只會介紹 pipes 和 FIFOs

Pipes

pipe 有 read end 和 write end 兩端分別用來讀寫
實際上，pipe 不是一個檔案，只是 kernel 建立的 buffer，暫存寫入的訊息直到被讀出為止
pipe 內的資料一旦被讀出便會被清除
pipe 為 half-duplex : 允許兩台裝置雙向溝通，但同一時間只能有一個方向
只能在有親緣關係的 process 之間溝通 (FIFOs 可以解決)
不能 lseek()

建立方法
使用 pipe() 來建立

#include <unistd.h>
// Returns 0 if OK, -1 on error
int pipe(fd[2]);

fd[0] 代表 read end ; fd[1] 代表 write end (都是打開狀態)
fd[0] 和 fd[1] 的 file type 都是 FIFO

使用方法
一般而言
parent 會先呼叫 pipe() 建立 pipe
再呼叫 fork() 產生 child (因為 child 會繼承 open fds)

接著 parent 關掉 fd[1] ; child 關掉 fd[0]
變成 child 寫 parent 讀
(或是做相反的事)

int main(void) {
		int n;
		int fd[2];
		pid_t pid;
		char line[MAXLINE];

		if (pipe(fd) < 0)
				err_sys("pipe error");

		if ((pid = fork()) < 0) {
				err_sys("fork error");
		} else if (pid > 0) {           /* parent */
				close(fd[0]);
				write(fd[1], "hello world\n", 12);
		} else {                        /* child */
				close(fd[1]);
				n = read(fd[0], line, MAXLINE);
				write(STDOUT_FILENO, line, n);
		}
		exit(0);
}

注意事項

block 的情況
- read from an empty pipe : 直到有資料可以讀為止
- write to a full pipe : 直到有足夠的資料被讀出為止
存取到 closed fd 的情況
- write end is closed : read() will return 0 (EOF)
- read end is closed : write() will trigger SIGPIPE signal sent to process
  - if SIGPIPE is ignored, write() will return -1
PIPE_BUF 代表 pipe 的最大容量 ; Linux 的 PIPE_BUF = 4096

FIFOs

行為和 pipe 一樣
又被叫做 named pipe — 因為有名字可以被指定，所以可以在非親緣關係的 process 之間溝通
為實際存在的檔案，file type 為 FIFO
- 但資料交換只在 kernel 內進行，也就是說不會真正寫入 file system 內，==FIFO 檔案是沒有內容的==

建立方法
使用 mkfifo() 來建立

\#include <sys/types.h>
\#include <sys/stat.h>
// Returns 0 if OK, -1 on error
int mkfifo(const char *path, mode_t mode);

path : FIFO 的名字
mode : 設定權限

使用方法
和 pipe 差不多

注意事項

在使用 FIFO 之前兩端都必須被打開 (open with O_RDONLY, O_WRONLY)
block 的情況
- open without O_NONBLOCK : 打開任一端都會 block 直到另一端也被打開為止
- open with O_NONBLOCK :
  - 打開 read end 會立即 return
  - 打開 write end 會 return error (ENXIO) 如果沒有任何 process 打開 read end
如果對沒有 reader 的 FIFO 做寫入，SIGPIPE signal sent to process

Process Control

On this page