kptr_restrict – privilege checks

In the Linux kernel, kptr_restrict is used to protect sensitive kernel pointer values by printing them as zeros when a kernel-provided file (e.g. one under /sys or /proc) is read by a non-privileged user. This is implemented with a printf extension called “%pK”. Typically code prints to a /sys or /proc file using the seq_file interface. For example:

seq_printf(m, "secret pointer = %pK\n", secret_pointer);

When kptr_restrict is enabled and a file using this interface is read, a check is done to see whether the reading process has CAP_SYSLOG. If so, the real pointer value is printed; otherwise zeros are printed.

This deviates from the normal UNIX file privilege model, where file permissions are checked at open time, not at read time. It becomes a problem when a setuid binary reads a file which uses %pK to protect sensitive pointers. Setuid binaries which need to open files will typically drop privileges before opening the file. This ensures that only files which the real user has access to will be opened. Once the file is open, the setuid binary re-elevates its privileges.

Consider the code in pppd for parsing the options file in pppd/options.c:options_from_file():

euid = geteuid();
if (check_prot && seteuid(getuid()) == -1) {
    option_error("unable to drop privileges to open %s: %m", filename);
    return 0;
}
f = fopen(filename, "r");
err = errno;
if (check_prot && seteuid(euid) == -1)
    fatal("unable to regain privileges");

The effective user id is set to the real user id and then the options file is opened. Only a file readable by the real user can be opened here. Once the file has been successfully opened, the pppd binary immediately re-elevates privileges. Looking a few lines down in the code we see this:

while (getword(f, cmd, &newline, filename)) {
    opt = find_option(cmd);
    if (opt == NULL) {
        option_error("In file %s: unrecognized option '%s'",
                     filename, cmd);
        goto err;
    }
    ...

The pppd option parser reads commands, and will print out an error if the command is not valid.

The problem is that %pK checks privileges when the file is read, not when it is opened. Most files which use %pK are world readable, so they pass the open check in pppd. But when the read is done, pppd is running as root, so the %pK protection is not in effect. This means the following works:

$ head -1 /proc/kallsyms
00000000 T startup_32
$ pppd file /proc/kallsyms
pppd: In file /proc/kallsyms: unrecognized option 'c1000000'

Unfortunately pppd bails on the first error in the file, so we can’t get it to dump the full contents of /proc/kallsyms. It is useful for one-liner %pK files such as those in /sys/module//sections/* though. Most setuid applications avoid printing anything from opened files, pppd being the only one I could find in a stock Ubuntu 12.04 installation. If you get lucky though, you might find a less paranoid setuid binary which will happily write out the full contents of a file under the assumption that the real user must be able to read it anyway.

The quick and dirty fix to this issue is to check that, in addition to having CAP_SYSLOG, the real and effective uids and gids are equal before printing the real %pK values. That fix has been merged into mainline Linux here.

The better long term fix is to do the privilege check at open time and store it as part of the seq_file structure. Rather than using a %pK, a function should be used which checks the stored privilege. For example:

seq_printf(m, "secret pointer = %p\n",
           seq_secret_pointer(m, secret_pointer));

The seq_secret_pointer function should return NULL if the user who issued the open does not have CAP_SYSLOG. Unfortunately, fixing this is not entirely straightforward. Most users of %pK are using the seq_file interface and can be easily converted. There are a handful of %pK uses in printk statements, which are basically just incorrect since no sane permission check can be done there (nobody opens printk). There is already a protection mechanism for printk, called dmesg_restrict, so the printk uses can simply be changed to %p. The only real problem is the module sections files (see above), which use the traditional-style sysfs show function rather than a seq_file (see kernel/module.c: module_sect_show()). Some thought needs to go into how to refactor that code in order to store the open-time privileges.
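To make the difference between the two fixes concrete, here is a small user-space model of both policies. This is an added example, not code from the kernel or from this post; the types and function names (creds, seq_file_model, seq_secret_pointer_model, setuid_reader_leaks) are constructed for illustration and are not the kernel's real cred or seq_file APIs. It replays the pppd sequence described above (open with privileges dropped, read after re-elevating) against the original read-time CAP_SYSLOG check, the merged read-time uid/gid check, and the proposed open-time check.

```c
/* Added example (not kernel code, not from the original post): a user-space
 * model of the kptr_restrict fixes discussed in the text. All struct and
 * function names here are constructed for illustration. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the credentials the kernel would consult. */
struct creds {
    unsigned int uid, euid, gid, egid;
    bool cap_syslog;    /* CAP_SYSLOG present in the effective set */
};

/* Original %pK policy: decided at read time, CAP_SYSLOG only.
 * A setuid-root reader passes this even though its real uid is unprivileged. */
bool pk_reveal_cap_only(const struct creds *reader)
{
    return reader->cap_syslog;
}

/* Mainline "quick and dirty" fix: still decided at read time, but the real
 * and effective uid/gid must also match, so a re-elevated setuid binary
 * reading on behalf of an unprivileged real user is refused. */
bool pk_reveal_at_read(const struct creds *reader)
{
    return reader->cap_syslog &&
           reader->uid == reader->euid &&
           reader->gid == reader->egid;
}

/* What the reader sees for a pointer: its real value, or zeros. */
uintptr_t pk_shown_value(bool reveal, const void *p)
{
    return reveal ? (uintptr_t)p : 0;
}

/* Long-term design: the decision is made once, at open time, and stored
 * alongside the seq_file; later privilege changes no longer matter. */
struct seq_file_model {
    bool opener_privileged;
};

void seq_open_model(struct seq_file_model *m, const struct creds *opener)
{
    m->opener_privileged = opener->cap_syslog;
}

/* Model of the proposed seq_secret_pointer(): the real pointer if the
 * opener held CAP_SYSLOG, NULL otherwise. */
const void *seq_secret_pointer_model(const struct seq_file_model *m,
                                     const void *p)
{
    return m->opener_privileged ? p : NULL;
}

/* Replays the pppd sequence from the post: open with privileges dropped,
 * then read after re-elevating to root. Returns true if the pointer value
 * would be disclosed under the selected policy. */
bool setuid_reader_leaks(bool use_open_time_policy, bool use_uid_check)
{
    int secret;                                   /* stands in for a kernel pointer */
    struct creds dropped  = { 1000, 1000, 1000, 1000, false }; /* at fopen() */
    struct creds elevated = { 1000,    0, 1000, 1000, true  }; /* at getword() */

    if (use_open_time_policy) {
        struct seq_file_model m;
        seq_open_model(&m, &dropped);             /* decision taken here */
        return seq_secret_pointer_model(&m, &secret) != NULL;
    }
    bool reveal = use_uid_check ? pk_reveal_at_read(&elevated)
                                : pk_reveal_cap_only(&elevated);
    return pk_shown_value(reveal, &secret) != 0;
}

int main(void)
{
    printf("read-time, CAP_SYSLOG only : %s\n",
           setuid_reader_leaks(false, false) ? "leaks" : "hidden");
    printf("read-time, uid/gid check   : %s\n",
           setuid_reader_leaks(false, true) ? "leaks" : "hidden");
    printf("open-time policy           : %s\n",
           setuid_reader_leaks(true, false) ? "leaks" : "hidden");
    return 0;
}
```

The first line of output is the behaviour the post demonstrates with pppd; the other two show that both fixes close it, and that only the open-time policy does so without caring what the process's credentials look like at read time.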
