
Here's how I built this command line. I started with this command:

$ sudo tcpdump -l -n arp

sudo means to run the next command as root. It will most likely ask for a password. If you don't use sudo in your environment, you might use something like it, or you can run this entire sequence as root. Just be careful. To err is human; to really screw up, be careless with root.

tcpdump listens to the local Ethernet. The -l flag is required if we're going to pipe the output to another program. When its output isn't going to a terminal, tcpdump buffers it in large blocks so that it runs faster; -l makes the output line-buffered, so the next program in the pipeline sees each line as soon as it is printed. The -n means don't do DNS lookups for each IP address we see. The arp means that we only want tcpdump to display ARP packets.

(If you are concerned about privacy of your network, I'd like to point out some good news. There isn't much private data available to your eyes if, at the sniffing end, you filter out everything besides ARP packets.)

Run the command yourself. In fact, you will learn more if you try each command as you read this. Nothing here deletes any data. Of course, it may be illegal to snoop packets on your network, so be warned. Only do this on a network where you have permission to snoop packets.

When I run the command, the output looks like:

$ sudo tcpdump -n -l arp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on en0, link-type EN10MB (Ethernet), capture size 96 bytes
19:10:48.212755 arp who-has 192.168.1.110 (85:70:48:a0:00:10) tell 192.168.1.10
19:10:48.743185 arp who-has 192.168.1.96 tell 192.168.1.92
19:10:48.743189 arp reply 192.168.1.2 is-at 00:0e:e7:7a:b2:24
19:10:48.743198 arp who-has 192.168.1.96 tell 192.168.1.111
^C

To get the output to stop, I press Ctrl-C. Otherwise, it will run forever.

If you get a permission error, you may not be running the command as root. tcpdump has to be run as root. You wouldn't want just anyone listening to your network, right?

After the header, we see these "arp who-has X tell Y" lines. Y is the host that asked the question. The question was, "Will the host at IP address X please respond so that I know your Ethernet (MAC) address?" The question is sent out as a broadcast, so we should see any ARP requests on our local LAN. However, we won't see many of the answers because they are sent as unicast packets, and we are on a switch. In this case, we see one reply because we're on the same hub as that machine (or maybe that is the machine running the command; I won't tell you which it is). That's OK because we only need to see one side of the question.

That's our data source. Now, let's transform the data into something we can use.

First, let's isolate just the lines of output that we want. In our case, we want the "arp who-has" lines:

$ sudo tcpdump -l -n arp | grep 'arp who-has'

We can run that and see that it is doing what we expect. The only problem now is that this command runs forever, waiting for us to stop it by pressing Ctrl-C. We want enough lines to do something useful, and then we'll process it all. So, let's take the first 100 lines of data:

$ sudo tcpdump -l -n arp | grep 'arp who-has' | head -100

Again, we run this and see that it comes out OK. Of course, I'm impatient and changed the 100 down to 10 when I was testing this. However, that gave me the confidence that it worked and that I could use 100 in the final command. You'll notice that there are a bunch of headers that are output, too. Those go to stderr (directly to the screen) and aren't going into the grep command.
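If the stderr business seems like magic, here is a small sketch you can run without root. The function fake_capture is a made-up stand-in for tcpdump (it is not a real tool): it writes one banner line to stderr and two data lines to stdout, using lines borrowed from the sample output above.

```shell
# Hypothetical stand-in for tcpdump: banner on stderr, packet lines on stdout.
fake_capture() {
    echo "listening on en0 (banner, sent to stderr)" >&2
    echo "19:10:48.743185 arp who-has 192.168.1.96 tell 192.168.1.92"
    echo "19:10:48.743189 arp reply 192.168.1.2 is-at 00:0e:e7:7a:b2:24"
}

# Only stdout travels down the pipe, so grep never sees the banner.
# 2>/dev/null throws the banner away instead of letting it reach the screen.
matched=$(fake_capture 2>/dev/null | grep 'arp who-has')
echo "$matched"
```

Drop the 2>/dev/null and the banner shows up on your screen again, but it still never reaches grep.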

So, now we have 100 lines of the kind of data we want. It's time to calculate the statistic we were looking for. That is, which hosts are generating the most ARP packets? Well, we're going to need to extract each host IP that generated an ARP and count it somehow. Let's start by extracting the host IP address, which is normally the sixth field of each line, so we can use this command to extract that field's data:

awk '{ print $6 }'

That little bit of awk is a great idiom for extracting a particular column of text from each line.

I should point out that I was too lazy to count which field had the data I wanted. It looked like it was about the fifth word, so I first tried it with $5. That didn't work. So I tried $6. Oh yeah, I need to remember that awk numbers fields starting at 1, not 0. The benefit of testing the command line as we build it is that we find these mistakes early on. Imagine if I had written the entire command line and only then tried to find this bug!

I'm lazy and I'm impatient. I didn't want to wait for all 100 ARPs to be collected. Therefore, I stored them once and kept reusing the results.

I stored them in a temporary file:

$ sudo tcpdump -l -n arp | grep 'arp who-has' | head -100 >/tmp/x

Then I ran my awk command against the temp file:

$ cat /tmp/x | awk '{ print $5 }'
tell
tell
tell
tell
...

Dang! It isn't the fifth. I'll try the sixth:

$ cat /tmp/x | awk '{ print $6 }'
192.168.1.110
192.168.1.10
192.168.1.92
...

Ah, that's better.

Anyway, I then realized I could be lazy in a different way. $NF means "the last field" and saves me from needing to count:

$ cat /tmp/x | awk '{ print $NF }'
192.168.1.110
192.168.1.10
192.168.1.92
...

Why isn't it $LF? That would be too easy. No, seriously, the NF means "number of fields." Thus, $NF means the NFth field, counting from the left. Whatever. Just remember that in awk you can type $NF when you want the last field on a line.

$ sudo tcpdump -l -n arp | grep 'arp who-has' | head -100 | awk '{ print $NF }'
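There is another reason to prefer $NF besides laziness: it keeps working when a line has an extra field. Here is a minimal sketch using two lines based on the sample tcpdump output shown earlier, one of which carries a MAC address in parentheses. The variable names are mine, chosen just for illustration.

```shell
# Two lines based on the sample tcpdump output earlier in the text.
sample='19:10:48.212755 arp who-has 192.168.1.110 (85:70:48:a0:00:10) tell 192.168.1.10
19:10:48.743185 arp who-has 192.168.1.96 tell 192.168.1.92'

# Counting by position: the first line has an extra field, so its $6 is "tell".
by_position=$(printf '%s\n' "$sample" | awk '{ print $6 }')

# $NF names the last field no matter how many fields the line has.
by_last=$(printf '%s\n' "$sample" | awk '{ print $NF }')

printf 'by position:\n%s\n' "$by_position"
printf 'by last field:\n%s\n' "$by_last"
```

Counting by position gives "tell" for the first line, while $NF returns the requesting host's address for both.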

So, now we get output that is a series of IP addresses. Test it and see.

(Really! Test it and see. I'll wait.)

Now, we want to count how many times each IP address appears in our list. There is an idiom that I use all the time for just this purpose: sort | uniq -c

This sorts the data, then runs uniq, which usually eliminates duplicates from a sorted list (well, technically it removes any adjacent duplicate lines...sorting the list just assures us that the same ones are all adjacent). The -c flag counts how many repetitions were seen and prepends the number to each line. The output looks like this:

...
11 192.168.1.111
7 192.168.1.230
30 192.168.1.254
8 192.168.1.56
21 192.168.1.91
...
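To see why the sort matters, try the idiom with and without it. This is a small sketch on a made-up list of addresses (not real capture data); the trailing awk is only there to strip the padding that uniq -c puts in front of each count.

```shell
# Made-up list of requesting hosts, deliberately out of order.
hosts='192.168.1.91
192.168.1.254
192.168.1.91
192.168.1.254
192.168.1.254'

# Without sort, uniq merges only adjacent duplicates: just the last two
# lines collapse, so the same address still shows up more than once.
unsorted_line_count=$(printf '%s\n' "$hosts" | uniq -c | awk 'END { print NR }')

# With sort first, every address collapses to one counted line.
# The final awk rewrites "   3 addr" as "3 addr" (drops uniq's padding).
counted=$(printf '%s\n' "$hosts" | sort | uniq -c | awk '{ print $1, $2 }')

echo "lines without sort: $unsorted_line_count"
echo "$counted"
```

Without the sort you get four lines for two hosts; with it you get one line per host and a count you can trust.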

We're almost there! Now we have a count of how many times each host sent an ARP. The last thing we need to do is sort that list so we know who the most talkative hosts were. To do that, we sort the list numerically by adding | sort -n to the end:

$ sudo tcpdump -l -n arp | grep 'arp who-has' | head -100 | awk '{ print $NF }' | sort | uniq -c | sort -n

When we run that, we will see the sorted list. It will take a while to run on a network that isn't very busy. On a LAN with 50 computers, this took nearly an hour to run when not a lot of people were around. However, that was after the machine with the spyware was eliminated. Before that, it only took a few minutes to collect 100 ARP packets.
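If you don't want to wait an hour, you can rehearse the back half of the pipeline on canned input. The sketch below wraps everything after tcpdump in a function; the name count_arp_requesters and the capture lines are made up for illustration, and the last awk only strips the padding from uniq -c.

```shell
# Hypothetical helper: reads tcpdump ARP lines on stdin and prints
# "count address" pairs, busiest host last (the order sort -n produces).
count_arp_requesters() {
    grep 'arp who-has' | awk '{ print $NF }' | sort | uniq -c | sort -n |
        awk '{ print $1, $2 }'   # drop uniq's leading padding
}

# Made-up capture standing in for a saved file such as /tmp/x.
capture='19:10:48.743185 arp who-has 192.168.1.96 tell 192.168.1.92
19:10:48.743189 arp reply 192.168.1.2 is-at 00:0e:e7:7a:b2:24
19:10:49.100000 arp who-has 192.168.1.1 tell 192.168.1.254
19:10:49.200000 arp who-has 192.168.1.7 tell 192.168.1.254
19:10:49.300000 arp who-has 192.168.1.8 tell 192.168.1.254
19:10:49.400000 arp who-has 192.168.1.96 tell 192.168.1.92'

report=$(printf '%s\n' "$capture" | count_arp_requesters)
echo "$report"
```

On a live network you would feed the same function from sudo tcpdump -l -n arp, with head -100 in between so that the pipeline eventually ends.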