System-Level I/O
I/O is the process of copying data between main memory and external devices.
In Linux, a file is a sequence of m bytes.
All I/O devices are represented as files. Even the kernel is represented as a file.
Unix I/O
- open and close
- read and write
- lseek: changes the current file position
File Types
- Regular files
- Directory
- Socket
- ...
Regular Files
A regular file contains arbitrary data.
For example, a text file is a sequence of text lines, each terminated by an end-of-line (EOL) sequence. The EOL differs by OS: \n on Unix, \r\n on Windows and in Internet protocols.
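The EOL difference matters when a program processes text lines from both worlds. The following is a minimal sketch, assuming a NUL-terminated line buffer; `strip_eol` is a hypothetical helper name, not part of any library.

```c
#include <string.h>

/* Remove one trailing "\n" or "\r\n" from `line` in place.
 * Returns the new length of the string.
 * strip_eol is a hypothetical helper for illustration. */
size_t strip_eol(char *line) {
    size_t len = strlen(line);
    if (len > 0 && line[len - 1] == '\n') {
        line[--len] = '\0';               /* drop the Unix EOL */
        if (len > 0 && line[len - 1] == '\r') {
            line[--len] = '\0';           /* drop the CR of a Windows EOL */
        }
    }
    return len;
}
```

A line without any EOL is left unchanged, so the helper can be applied to every line regardless of its origin.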
Directories
A directory contains an array of links. Every directory has at least two links: . (itself) and .. (the parent directory).
Commands: ls, mkdir, rmdir
All files are organized as a hierarchy anchored by the root directory, named /.
The kernel maintains a current working directory (cwd) for each process; in a shell, it is changed with the cd command.
Path names
- Absolute: /home/yenru0/workspace
- Relative: ../workspace
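A process can change and query its own cwd with chdir and getcwd, and relative path names are resolved against it. The sketch below assumes a POSIX system; `change_and_report` is a hypothetical helper name.

```c
#include <stdio.h>
#include <unistd.h>

/* Change the cwd of the calling process to `dir` (absolute or relative),
 * then copy the kernel's view of the new cwd into `out`.
 * Returns 0 on success, -1 on error.
 * change_and_report is a hypothetical helper for illustration. */
int change_and_report(const char *dir, char *out, size_t outlen) {
    if (chdir(dir) < 0) {
        perror("chdir");
        return -1;
    }
    if (getcwd(out, outlen) == NULL) {
        perror("getcwd");
        return -1;
    }
    return 0;
}
```

Calling it with "/" moves the process to the root directory. Calling it again with ".." stays at "/", because the parent link of the root directory refers to the root itself.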
Open & Close & Read & Write
int fd;
if ((fd = open("file.txt", O_RDONLY)) < 0) {
    perror("open");
    exit(1);
}
open returns a non-negative integer called a file descriptor (fd). fd == -1 indicates an error.
Standard descriptors: 0: stdin, 1: stdout, 2: stderr
int fd; int ret; // fd was returned by an earlier open
if ((ret = close(fd)) < 0) {
    perror("close");
    exit(1);
}
Closing an already closed descriptor can lead to a disastrous situation in threaded programs, so always check the return code.
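The following sketch shows what the return code reports in the simple single-threaded case: the second close fails with EBADF. In a threaded program another thread may have been handed the same descriptor number in between, and the second close would then silently close that thread's file. `errno_of_double_close` is a hypothetical name, and /dev/null is used only as a file that is assumed to exist.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open a file, close it twice, and report what the second close says.
 * Returns the errno of the second close (EBADF is expected),
 * 0 if the second close unexpectedly succeeded, -1 if setup failed.
 * errno_of_double_close is a hypothetical helper for illustration. */
int errno_of_double_close(void) {
    int fd = open("/dev/null", O_RDONLY);
    if (fd < 0) {
        return -1;
    }
    if (close(fd) < 0) {          /* first close: should succeed */
        return -1;
    }
    if (close(fd) < 0) {          /* second close: fd is no longer valid */
        return errno;
    }
    return 0;
}
```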
char buf[512];
ssize_t nbytes = read(fd, buf, sizeof(buf)); // fd is an open descriptor
ssize_t read(int fd, void *usrbuf, size_t n);
read returns the number of bytes read from fd into buf. A return value of 0 indicates EOF.
ssize_t is the signed version of size_t.
If read returns a negative value, an error occurred.
ssize_t write(int fd, const void *usrbuf, size_t n);
If write returns a negative value, an error occurred.
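The calls above can be combined into a small round trip: write some bytes, move the file position back to the start with lseek, and read the bytes back. This is a sketch under stated assumptions: the temporary path template and the name `round_trip` are hypothetical, and mkstemp is used only to obtain a scratch file.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Write `len` bytes of `msg` to a scratch file, rewind, and read them
 * back into `out`. Returns the number of bytes read back, or -1 on error.
 * round_trip and the path template are hypothetical. */
ssize_t round_trip(const char *msg, size_t len, char *out, size_t outlen) {
    char path[] = "/tmp/sysio_demo_XXXXXX";
    int fd = mkstemp(path);              /* creates and opens a unique file */
    if (fd < 0) {
        perror("mkstemp");
        return -1;
    }
    unlink(path);                        /* file is removed once fd is closed */

    ssize_t result = -1;
    if (write(fd, msg, len) != (ssize_t)len) {
        perror("write");
    } else if (lseek(fd, 0, SEEK_SET) < 0) {   /* position is now byte 0 */
        perror("lseek");
    } else {
        result = read(fd, out, outlen);
        if (result < 0) {
            perror("read");
        }
    }
    if (close(fd) < 0) {
        perror("close");
        return -1;
    }
    return result;
}
```

Without the lseek call the read would start at the position left by write, which is the end of the file, and would return 0 (EOF).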
Short Counts
A short count means that read or write transfers fewer bytes than requested. It can occur in these situations:
- Encountering EOF on reads
- Reading text lines from a terminal
- Reading from a network socket
Never occurs:
- Reading from disk files (except for EOF)
- Writing to disk files
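A common way to cope with short counts is to call read in a loop until the requested number of bytes has arrived or EOF is reached. The sketch below is written in the spirit of an unbuffered "read n bytes" routine; `read_fully` is a hypothetical name and this is not the source of any particular library.

```c
#include <errno.h>
#include <unistd.h>

/* Read up to n bytes from fd into usrbuf, retrying after short counts.
 * Returns the number of bytes actually read (less than n only on EOF),
 * or -1 on error. read_fully is a hypothetical helper for illustration. */
ssize_t read_fully(int fd, void *usrbuf, size_t n) {
    size_t nleft = n;
    char *bufp = usrbuf;

    while (nleft > 0) {
        ssize_t nread = read(fd, bufp, nleft);
        if (nread < 0) {
            if (errno == EINTR) {
                continue;         /* interrupted by a signal: call read again */
            }
            return -1;            /* real error, errno set by read */
        }
        if (nread == 0) {
            break;                /* EOF */
        }
        nleft -= (size_t)nread;   /* short count: keep going */
        bufp += nread;
    }
    return (ssize_t)(n - nleft);
}
```

The caller sees a short count only when the input really ended, which makes the EOF case easy to distinguish from a partial transfer.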
RIO package
RIO is a set of wrappers that provide efficient and robust I/O for applications that are subject to short counts.
- unbuffered RIO functions: rio_readn, rio_writen
- buffered RIO functions: rio_readnb, rio_readlineb
  - buffered RIO functions are thread-safe and can be interleaved arbitrarily on the same descriptor.
Buffered RIO
To read efficiently from a file, RIO caches part of the file in an internal memory buffer (the rio_t structure).
The buffer holds a portion of the file: bytes the application has already read, followed by unread bytes. rio_readnb and rio_readlineb refill it automatically as needed.
typedef struct {
    int rio_fd;                // Descriptor for this internal buf
    int rio_cnt;               // Unread bytes in internal buf
    char *rio_bufptr;          // Next unread byte in internal buf
    char rio_buf[RIO_BUFSIZE]; // Internal buffer
} rio_t;
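How such a structure is used can be sketched as follows: when the buffer is empty it is refilled with one large read, and the caller's request is then served from memory. This is an illustrative sketch, not the RIO source; the names `demo_rio_t`, `demo_initb`, `demo_read` and the buffer size `DEMO_BUFSIZE` are assumptions.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

#define DEMO_BUFSIZE 8192   /* assumed size; RIO_BUFSIZE is not given here */

typedef struct {
    int fd;                   /* descriptor for this internal buffer */
    ssize_t cnt;              /* unread bytes in the internal buffer */
    char *bufptr;             /* next unread byte in the internal buffer */
    char buf[DEMO_BUFSIZE];   /* internal buffer */
} demo_rio_t;

/* Associate an open descriptor with an empty buffer. */
void demo_initb(demo_rio_t *rp, int fd) {
    rp->fd = fd;
    rp->cnt = 0;
    rp->bufptr = rp->buf;
}

/* Copy min(n, unread bytes) into usrbuf, refilling the buffer with a
 * single read system call when it is empty.
 * Returns the number of bytes copied, 0 on EOF, -1 on error. */
ssize_t demo_read(demo_rio_t *rp, char *usrbuf, size_t n) {
    while (rp->cnt <= 0) {                      /* buffer empty: refill */
        ssize_t nread = read(rp->fd, rp->buf, sizeof(rp->buf));
        if (nread < 0) {
            if (errno == EINTR) {
                continue;                       /* interrupted: retry */
            }
            return -1;
        }
        if (nread == 0) {
            return 0;                           /* EOF */
        }
        rp->cnt = nread;
        rp->bufptr = rp->buf;                   /* reset to buffer start */
    }

    size_t cnt = n < (size_t)rp->cnt ? n : (size_t)rp->cnt;
    memcpy(usrbuf, rp->bufptr, cnt);            /* served from memory */
    rp->bufptr += cnt;
    rp->cnt -= (ssize_t)cnt;
    return (ssize_t)cnt;
}
```

A caller that asks for one byte at a time triggers a system call only when the buffer runs dry; every other request is a memcpy.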
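Example (copies stdin to stdout one line at a time):
// Assumes the RIO declarations (rio_t, MAXLINE, rio_*) are in scope.
int main(int argc, char **argv) {
    int n; rio_t rio; char buf[MAXLINE];
    rio_readinitb(&rio, STDIN_FILENO);
    while ((n = rio_readlineb(&rio, buf, MAXLINE)) > 0) { // stop on EOF (0) or error (-1)
        rio_writen(STDOUT_FILENO, buf, n);
    }
    exit(0);
}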
Metadata
Metadata is data about data (file access, file size, file type).
- Per-process metadata
  - When a process opens a file, the kernel creates an entry in a per-process table called the file descriptor table.
- Per-file metadata
  - Can be accessed using the stat system call, which fills in a struct stat:
struct stat {
    dev_t st_dev;         // ID of device containing file
    ino_t st_ino;         // inode number
    mode_t st_mode;       // protection and file type
    nlink_t st_nlink;     // number of hard links
    uid_t st_uid;         // user ID of owner
    gid_t st_gid;         // group ID of owner
    dev_t st_rdev;        // device ID (if special file)
    off_t st_size;        // total size, in bytes
    blksize_t st_blksize; // blocksize for filesystem I/O
    blkcnt_t st_blocks;   // number of 512B blocks allocated
    time_t st_atime;      // time of last access
    time_t st_mtime;      // time of last modification
    time_t st_ctime;      // time of last status change
};
How the Kernel Represents Open Files
- Descriptor table (per-process)
- Open file table (shared by all processes)
- v-node table (shared by all processes)
When a process opens a file, the kernel creates an entry in the per-process file descriptor table. Each entry contains a pointer to an entry in the open file table. Each entry in the open file table contains a pointer to an entry in the v-node table.
When a process calls fork, the child inherits copies of the parent's file descriptors. Each copied descriptor points to the same open file table entry as the parent's, and the refcnt of that entry is incremented.
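Because parent and child share the open file table entry, they also share the file position stored in it. The sketch below makes that visible: the child reads two bytes through the inherited descriptor, and the parent's next read continues at the third byte. The function name and the temporary path template are hypothetical.

```c
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the byte the parent reads after the child has consumed two
 * bytes of "abcdef" through the inherited descriptor, or -1 on error.
 * The name and the path template are hypothetical. */
int parent_byte_after_child_read(void) {
    char path[] = "/tmp/sysio_fork_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) {
        return -1;
    }
    unlink(path);
    if (write(fd, "abcdef", 6) != 6 || lseek(fd, 0, SEEK_SET) < 0) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {                       /* child: advance shared position */
        char tmp[2];
        ssize_t n = read(fd, tmp, 2);
        _exit(n == 2 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        close(fd);
        return -1;
    }
    char c;
    ssize_t n = read(fd, &c, 1);          /* continues where the child stopped */
    close(fd);
    return n == 1 ? (unsigned char)c : -1;
}
```

If the parent had instead opened the file a second time, it would have received its own open file table entry with an independent position starting at byte 0.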
I/O redirection
For example: ls > foo.txt
How does the shell implement this? By calling dup2(oldfd, newfd), which copies descriptor table entry oldfd to entry newfd, overwriting the previous contents of newfd.
So dup2(4, 1) makes stdout point to the same open file as descriptor 4.
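The redirection step can be sketched in a few lines: open the target file, copy its descriptor onto descriptor 1 with dup2, and every later write to stdout lands in the file. This is a simplified sketch, not shell source; `write_redirected` is a hypothetical name, and the function restores the original stdout afterwards so the caller is unaffected.

```c
#include <fcntl.h>
#include <unistd.h>

/* Temporarily redirect stdout to `path`, write `msg` to descriptor 1,
 * then restore the original stdout. Returns 0 on success, -1 on error.
 * write_redirected is a hypothetical helper for illustration. */
int write_redirected(const char *path, const char *msg, size_t len) {
    int saved = dup(STDOUT_FILENO);       /* remember the original stdout */
    if (saved < 0) {
        return -1;
    }
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        close(saved);
        return -1;
    }

    int ok = 0;
    if (dup2(fd, STDOUT_FILENO) < 0) {    /* descriptor 1 now refers to path */
        ok = -1;
    }
    close(fd);                            /* descriptor 1 keeps the file open */
    if (ok == 0 && write(STDOUT_FILENO, msg, len) != (ssize_t)len) {
        ok = -1;
    }
    if (dup2(saved, STDOUT_FILENO) < 0) { /* put the original stdout back */
        ok = -1;
    }
    close(saved);
    return ok;
}
```

A shell does the open and dup2 in the child process between fork and exec, so it never needs to restore anything.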
stdio
The C standard library (libc.so) provides a collection of higher-level standard I/O functions.
fopen, fclose, fread, fwrite, fgets, fputs, fscanf, fprintf
stdio models open files as streams. A stream is an abstraction for a file descriptor and a buffer in memory.
extern FILE * stdin;
extern FILE * stdout;
extern FILE * stderr;
Buffered I/O
Applications often read and write one character at a time. However, the Unix read and write system calls are expensive. The solution is buffered I/O: use read or write to transfer a block of data to or from a buffer, and let the application read or write one character at a time from or to that buffer. This is efficient because each access is a simple memory access.
stdio uses a buffer. printf does not write to stdout immediately; the output is stored in a buffer. The buffer is then flushed to the file with the write system call on fflush(stdout), exit, or return from main.
Remark
- UNIX IO
- RIO package
- stdio
When to use
- stdio: disk or terminal files
- Unix I/O: signal handlers, or when you need the absolute highest performance
- RIO: networking
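Binary
When working with binary files, DO NOT USE:
- text-oriented I/O: fgets, scanf, rio_readlineb (they interpret EOL bytes specially)
- string functions: strlen, strcpy, strcat, strcmp (they treat a 0 byte as the end of the string)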
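The problem with string functions is easy to reproduce: binary data may contain 0 bytes, and the string functions stop at the first one. The sketch below contrasts strlen and strcmp with comparisons that carry an explicit length; `length_seen_by_strlen` and `binary_equal` are hypothetical helper names.

```c
#include <string.h>

/* Length that strlen reports for a binary buffer: it stops at the
 * first 0 byte, wherever that happens to be. */
size_t length_seen_by_strlen(const unsigned char *data) {
    return strlen((const char *)data);
}

/* Compare two binary buffers using explicit lengths and memcmp.
 * Returns 1 if they hold the same bytes, 0 otherwise. */
int binary_equal(const void *a, size_t alen, const void *b, size_t blen) {
    return alen == blen && memcmp(a, b, alen) == 0;
}
```

For the 4-byte buffers {1, 0, 2, 3} and {1, 0, 9, 9}, strlen reports 1 and strcmp reports equality, while binary_equal correctly reports a difference. For binary data, keep the length alongside the buffer and use memcpy and memcmp.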