pvaneynd: (Default)
[personal profile] pvaneynd
Today I helped a colleague who came with the question: I have two files, how do I find which lines were added to one file but not to the other?

He was thinking of writing a program. I'm more of a KISS person: why waste time writing a program when brute force will do just fine?

So:

We have two files a and b:

pevaneyn@mac-book:/tmp :) $ cat a
1
2
3
4
5
pevaneyn@mac-book:/tmp :) $ cat b
1
2
3
4
5
7
8


We want to see the lines in b which are not in a:

pevaneyn@mac-book:/tmp :) $ cat a b | sort | uniq -u
7
8


So we take the two files, sort them, and then print the lines that occur only once.

But what if there are also unique lines in a which we don't need? So let's add a line 0 to a, which we do not want to see in the output:

pevaneyn@mac-book:/tmp :) $ cat >> a
0
pevaneyn@mac-book:/tmp :) $ cat a b | sort | uniq -u
0
7
8


How do we remove this 0?

A trick is to include a twice; then a line in a will never be unique:

pevaneyn@mac-book:/tmp :) $ cat a a b | sort | uniq -u
7
8
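
One caveat with this trick: a line that appears more than once in b is also hidden by uniq -u, even if it is not in a. A minimal sketch of a workaround, assuming you only care whether a line is present and not how often: de-duplicate b first. The function name only_in_second is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print lines that occur in file $2 but not in file $1.
# $2 is de-duplicated with `sort -u` so a repeated line in $2 is not
# mistaken for a line shared with $1; $1 is fed in twice so that none of
# its lines can ever be unique.
only_in_second() {
    sort -u "$2" | cat "$1" "$1" - | sort | uniq -u
}

# Same example files as above, built in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 0 1 2 3 4 5 > "$tmp/a"
printf '%s\n' 1 2 3 4 5 7 8 > "$tmp/b"
result=$(only_in_second "$tmp/a" "$tmp/b")
echo "$result"
rm -rf "$tmp"
```

With the example files this prints 7 and 8, the same as the one-liner.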


I used a similar method today to find which interface gave the CRC errors...

Date: 2014-04-18 04:22 pm (UTC)
From: [personal profile] rbarclay
Nonono, that is elegant, but not real Brute Force(tm).

Because that would be, a la bash: for LINE in `cat b`; do grep "$LINE" a 1>/dev/null 2>/dev/null || echo "$LINE"; done (adjust IFS to suit) ;)
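
For comparison, a grep-only variant without the loop, sketched under the assumption that lines should be compared literally and as whole lines. The loop treats every line of b as a regular expression and matches substrings, so a line 1 in b would count as present if a only contains 10. The files are the illustrative a and b from the post.

```shell
#!/bin/sh
# Sketch: print lines of b that do not appear as whole lines in a.
# -F: treat patterns as fixed strings, -x: match whole lines only,
# -v: invert the match, -f FILE: read the patterns from FILE.
tmp=$(mktemp -d)
printf '%s\n' 0 1 2 3 4 5 > "$tmp/a"
printf '%s\n' 1 2 3 4 5 7 8 > "$tmp/b"
grep_result=$(grep -vxFf "$tmp/a" "$tmp/b")
echo "$grep_result"
rm -rf "$tmp"
```

Unlike the sort-based pipeline, this keeps the lines in the order they have in b.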

Date: 2014-04-19 07:52 am (UTC)
From: [personal profile] vatine
There is also comm, which lets you ferret out lines unique to the first file, common to both, or unique to the second. It expects both files to be sorted, though, so if you simply swap two lines in the second file, at least one will show up as new.
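
A minimal sketch of the comm approach. comm expects sorted input, so both files are sorted into scratch copies first; the file contents are the example from the post, with b deliberately shuffled.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '%s\n' 0 1 2 3 4 5 > "$tmp/a"
printf '%s\n' 8 1 2 7 3 4 5 > "$tmp/b"   # b is deliberately out of order
# comm compares line by line and relies on both inputs being sorted.
sort "$tmp/a" > "$tmp/a.sorted"
sort "$tmp/b" > "$tmp/b.sorted"
# -1 suppresses lines unique to the first file, -3 suppresses common lines,
# which leaves only the lines unique to the second file.
comm_result=$(comm -13 "$tmp/a.sorted" "$tmp/b.sorted")
echo "$comm_result"
rm -rf "$tmp"
```

This prints 7 and 8, and the extra 0 in a does not show up.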
