Extract Unique Lines From a File

Page Trail: [[ Extract Unique Lines From a File ]]
2009-05-07 23:52

If you want to get rid of duplicate lines in a file or pipe, use

sort -u

or

sort | uniq
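
The two recipes are interchangeable. Here's a minimal sketch with a throwaway file (deps.txt and its contents are just made-up sample data):

```shell
# Create a small file containing duplicate lines.
printf 'pidgin\nfinch\npidgin\nfinch\n' > deps.txt

# Both pipelines sort the lines and drop the duplicates;
# the output of each is identical: finch, then pidgin.
sort -u deps.txt
sort deps.txt | uniq
```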

For example, maybe you're searching for another front-end to libpurple, the library underneath Pidgin. You try apt-cache rdepends, but find the output cluttered with duplicate entries (bug #335925).

$ apt-cache rdepends libpurple0 | tail -n +3 | sort
  finch
  finch
  libpurple-bin
  libpurple-bin
  libpurple-dev
  libpurple-dev
  msn-pecan
  pidgin
  pidgin
  pidgin-dbg
  pidgin-dbg
  pidgin-facebookchat
  pidgin-librvp
  pidgin-mpris
  pidgin-nateon
  pidgin-plugin-pack
  pidgin-privacy-please
  pidgin-privacy-please
  pidgin-sipe
  telepathy-haze
  telepathy-haze

Note that I've trimmed off the header (with tail) and sorted the list (with sort) here to make the duplicates easier to spot.

Using the above tip to see only unique lines, you can easily work around this bug:

$ apt-cache rdepends libpurple0 | tail -n +3 | sort -u
  finch
  libpurple-bin
  libpurple-dev
  msn-pecan
  pidgin
  pidgin-dbg
  pidgin-facebookchat
  pidgin-librvp
  pidgin-mpris
  pidgin-nateon
  pidgin-plugin-pack
  pidgin-privacy-please
  pidgin-sipe
  telepathy-haze
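
One caveat worth knowing: uniq on its own only collapses *adjacent* duplicate lines, which is why both recipes sort first. A small sketch (the package names are just sample data):

```shell
# Without sorting, the repeated "pidgin" lines are not adjacent,
# so uniq leaves them both in place:
printf 'pidgin\nfinch\npidgin\n' | uniq

# Sorting first brings duplicates together, so uniq can remove them:
printf 'pidgin\nfinch\npidgin\n' | sort | uniq
```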