May 02, 2014

Adding the pwd command to xv6

Xv6 is a fairly popular clone of Version 6 UNIX. It was made by MIT as a base for students to work off of, as the original V6 UNIX was showing it's age, and only ran on the outdated PDP11 architecture. Xv6 runs natively on x86, and supports modern features like SMP, while still being only 15k lines of easily-grokked C.

As a simple example, we're going to add the pwd command to xv6. pwd prints the shell's current working directory. To do this, we'll need to write the getcwd system call, and also write the pwd userspace program.

First, to set up the system call we need to add a bit of boilerplate.

Firstly, assign the syscall an id in syscall.h :

...
#define SYS_close  21
#define SYS_getcwd 22 // <-- add this

Then, to syscall.c , add definitions for the system call:

...
extern int sys_uptime(void);
extern int sys_getcwd(void); // <-- add this

...
[SYS_close]   sys_close,
[SYS_getcwd]  sys_getcwd, // <-- add this

We need to define the getcwd function for user code as well. Add the header defintion to user.h :

...
int uptime(void);
int getcwd(void*, int); // <-- add this

To define the actual function, we need to use assembly (to issue the interupt to switch to kernel mode). Luckily, that's done for us, defined as the SYSCALL macro. Add a definition for getcwd to usys.S :

...
SYSCALL(uptime)
SYSCALL(getcwd) // <-- add this

So how does all this boilerplate work? getcwd , when called by user code, will use the definition in user.h . This will be linked to the assembly function generated by the SYSCALL macro. This function moves the SYS_getcwd constant into %eax and then issues the interupt, switching to kernel mode. The kernel looks up %eax (22, SYS_getcwd ) in the syscalls array, and sees the entry we added, which maps SYS_getcwd to the sys_getcwd function... which we need to write!

There are several files with system call implementations in the kernel. The best place for this is probably sysfile.c , as this file contains other file-related interupts like sys_read and sys_exec .

This system call is a bit of code. I'm going to dump it all here, but don't worry, I'll walk through it piece by piece :). Prepared?

int
name_of_inode(struct inode *ip, struct inode *parent, char buf[DIRSIZ]) {
  uint off;
  struct dirent de;
  for (off = 0; off < parent->size; off += sizeof(de)) {
    if (readi(parent, (char*)&de, off, sizeof(de)) != sizeof(de))
      panic("couldn't read dir entry");
    if (de.inum == ip->inum) {
      safestrcpy(buf, de.name, DIRSIZ);
      return 0;
    }
  }
  return -1;
}

int
name_for_inode(char* buf, int n, struct inode *ip) {
  int path_offset;
  struct inode *parent;
  char node_name[DIRSIZ];
  if (ip->inum == namei("/")->inum) { //namei is ineffecient but iget isn't exported for some reason
    buf[0] = '/';
    return 1;
  } else if (ip->type == T_DIR) {
    parent = dirlookup(ip, "..", 0);
    ilock(parent);
    if (name_of_inode(ip, parent, node_name)) {
      panic("could not find name of inode in parent!");
    }
    path_offset = name_for_inode(buf, n, parent);
    safestrcpy(buf + path_offset, node_name, n - path_offset);
    path_offset += strlen(node_name);
    if (path_offset == n - 1) {
      buf[path_offset] = '\0';
      return n;
    } else {
      buf[path_offset++] = '/';
    }
    iput(parent); //free
    return path_offset;
  } else if (ip->type == T_DEV || ip->type == T_FILE) {
    panic("process cwd is a device node / file, not a directory!");
  } else {
    panic("unknown inode type");
  }
}

int
sys_getcwd(void)
{
  char *p;
  int n;
  if(argint(1, &n) < 0 || argptr(0, &p, n) < 0)
    return -1;
  return name_for_inode(p, n, proc->cwd);
}

That wasn't so bad! Lets look at a small piece first.

int
sys_getcwd(void)
{
  char *p;
  int n;
  if(argint(1, &n) < 0 || argptr(0, &p, n) < 0)
    return -1;
  return name_for_inode(p, n, proc->cwd);
}

This function is pretty simple. argint and argptr just take arguments from the stack - the order is reversed because argptr needs the length of the buffer. proc is a global variable containing the current process, and cwd is the inode for the processes current directory. So we already have the current directory as proc->cwd , what's all the other code for?

In Xv6 (and most Unix filesystems), the inode, short for 'index node' (though this may be a backronym), is a number pointing to a specific block on the disk that holds information about the file. On xv6, the inode struct stores only a few pieces of data. You can see them for yourself in file.h if you're interested. However, the inode is a layer of abstraction below the concept of the filesystem hierachy and filenames. The fileystem hierarchy is created with special inodes - type T_DIR - which contain a series of dirent structures. Each dirent is a tuple of a string name and an inode which references the file associated with that name. inodes themselves have no concept of filenames, and can in fact be contained in multiple parent directories with multiple different filenames. For example, one directory "foo" could have a dirent ("bar", inode 20), and another directory "baz" could have a dirent ("buzz", inode 20). What filename is inode 20? Is it /foo/bar, or is it /baz/buzz? The OS doesn't have to decide, since it never has a need to tranform inodes to filenames, so it doesn't.

We, however, have to decide. Luckily we have an out - if we had to start with a file this would be much trickier, but proc->cwd is always a directory , and thankfully create always makes the .. entry pointing to a directory parent. We will simply write a recursive function:

func psuedocode_func(inode):
    if inode is root:
        return "/"
    else:
        name := ""
        for dirent in inode.parent:
            if dirent.inode == inode:
                name = dirent.name
                break
        if not name:
            panic("inode not in parent")
        return psuedocode_func(inode.parent) + name + "/"

Of course, when we implement this in C and add error handling, it will get much uglier ;). Lets start with the implementation of that for loop. In the actual C code, this is the separate function name_of_inode :

int
name_of_inode(struct inode *ip, struct inode *parent, char buf[DIRSIZ]) {
  uint off;
  struct dirent de;
  for (off = 0; off < parent->size; off += sizeof(de)) {
    if (readi(parent, (char*)&de, off, sizeof(de)) != sizeof(de))
      panic("couldn't read dir entry");
    if (de.inum == ip->inum) {
      safestrcpy(buf, de.name, DIRSIZ);
      return 0;
    }
  }
  return -1;
}

Lets start with the header:

int
name_of_inode(struct inode *ip, struct inode *parent, char buf[DIRSIZ]) {

Since this is C, we take the string as a buffer, and return an int to show success/failure. DIRSIZ is a constant (14), the max filename length.

  uint off;
  struct dirent de;

I predeclared the variables, as this is the convention in the rest of the xv6 source. I did do the brackets differently, but eh.

  for (off = 0; off < parent->size; off += sizeof(de)) {
    if (readi(parent, (char*)&de, off, sizeof(de)) != sizeof(de))
      panic("couldn't read dir entry");

Directories in the xv6 filesystem are really files whose contents are a just a series of dirent structures. Dirent is just a structure of a ushort inode id and a name. That means all this loop does is loop over every directory entry in the directory, loading it into de with readi .

    if (de.inum == ip->inum) {
      safestrcpy(buf, de.name, DIRSIZ);
      return 0;
    }

We check de to see if it's referring to our sought-after inode, ip . If it is, we use safestrcpy from string.h to copy it over in the buffer and do a standard C successful return.

If we don't find the inode, we return -1. This shouldn't happen unless the filesystem is pretty broken ( .. points to the wrong parent).

Pretty simple, overall! Now lets move to the big function.

int
name_for_inode(char* buf, int n, struct inode *ip) {
  int path_offset;
  struct inode *parent;
  char node_name[DIRSIZ];
  if (ip->inum == namei("/")->inum) { //namei is ineffecient but iget isn't exported for some reason
    buf[0] = '/';
    return 1;
  } else if (ip->type == T_DIR) {
    parent = dirlookup(ip, "..", 0);
    ilock(parent);
    if (name_of_inode(ip, parent, node_name)) {
      panic("could not find name of inode in parent!");
    }
    path_offset = name_for_inode(buf, n, parent);
    safestrcpy(buf + path_offset, node_name, n - path_offset);
    path_offset += strlen(node_name);
    if (path_offset == n - 1) {
      buf[path_offset] = '\0';
      return n;
    } else {
      buf[path_offset++] = '/';
    }
    iput(parent); //free
    return path_offset;
  } else if (ip->type == T_DEV || ip->type == T_FILE) {
    panic("process cwd is a device node / file, not a directory!");
  } else {
    panic("unknown inode type");
  }
}

The header and beginning are similar, so I'll skip over them.

  if (ip->inum == namei("/")->inum) { //namei is ineffecient but iget isn't exported for some reason
    buf[0] = '/';
    return 1;
  }

This is the inode is root line in the psudeocode. As the comment states, iget is a private function, so we have to use the slightly less effecient namei , which is basically a wrapper which can turn a full path into an inode. It has a fast path for "/", so the parsing code doesn't make it much slower thankfully. Anyway, if the node is the root, we set buf to "/", just as in the psuedocode.

  } else if (ip->type == T_DIR) {

This line is rather self-explanatory. The inode types are defined in stat.h as T_DIR, T_FILE, and T_DEV - a device node (the things in /dev/ on most Unix systems).

    parent = dirlookup(ip, "..", 0);
    ilock(parent);

We grab the parent reference with dirlookup. The . and .. entries could feasibly not exist (they're just normal directories, the kernel makes them in when the directory is created). I didn't explicitly deal with this case, but if the entry does not exist it will return null , and pasing null to ilock will cause a panic, which is really the only thing to do in this case anyways. ilock makes sure the inode is loaded from disk.

Next, we call our name_of_inode function that we already discussed, and do that wonderful C string manipulation. Finally, we release the inode with iput , and return the length of the path, either for the benefit of the next name_for_inode down, or for the userspace caller.

The other checks are simply to make sure something extremely weird hasn't happened.

The system call is now, amazingly, done! The getcwd function will work fine from userspace code! Lets write pwd to test it:

#include "types.h"
#include "user.h"

#define MAX_PATH 512

int
main(int argc, char *argv[])
{
  char path[MAX_PATH];
  getcwd(path, MAX_PATH);
  printf(0, "%s\n", path);
  exit();
}

Yep, that's it! This is all extremely self explanatory, hello world even, so the only notes I'll leave are that printf is defined a little oddly (the first argument is which stream to print to, 0 being stdout and 1 being stderr , and that you have to use exit() , not return 0; to end your program. If you use return 0; , everything will work, but the kernel will complain a bit and print an ugly error message.

The only thing left to do is add pwd to the makefile.

...
        _zombie\
        _pwd\ # <-- add this

Binaries compile to _[c file name], so this just sets up the pwd binary.

...
        printf.c umalloc.c\
        pwd.c\ # <-- add this

Also add pwd.c as an extra file, which I think is only used for distibution, but you might as well.

Thats it! You can now run make qemu (assuming you haven't already jumped the gun and tested it), and if you make a few levels of directories and cd into them, pwd will work (keep in mind there is no $PATH , so if you're in /foo/bar/baz , you need to run ../../../pwd for the shell to find it).

Happy hacking! Maybe try implementing grep ?