pvaneynd: (Default)
2014-04-18 02:40 pm
Entry tags:

Finding a new line in a file

Today I helped a collegue who came with the question: I have two files, how do I find which lines were added to one file, but not to the other?

He was thinking of a program to write. I'm more a KISS person, why waste time writing a program when brute force will do just fine.

So:

We have two files a and b:

pevaneyn@mac-book:/tmp :) $ cat a
1
2
3
4
5
pevaneyn@mac-book:/tmp :) $ cat b
1
2
3
4
5
7
8


We want to see the lines in b which are not in a:

pevaneyn@mac-book:/tmp :) $ cat a b | sort | uniq -u
7
8


So we take the two files, sort then and then print the unique lines.

But what if there are also unique lines in a which we don't need? So let's add a line to 0 which we do not want to see in the output:

pevaneyn@mac-book:/tmp :) $ cat >> a
0
pevaneyn@Pmac-book:/tmp :) $ cat a b | sort | uniq -u
0
7
8


How do we remove this 0?

A trick is to include a twice, then a line in a will never be unique:

pevaneyn@mac-book:/tmp :) $ cat a a b | sort | uniq -u
7
8


I used a similar method today to find which interface gave the CRC errors...