Delete Duplicate emails

From DreamHost
WARNING

This no longer works due to a system change as of April, 2008.
New and existing email addresses may no longer be tied to a shell user if they are not already, though email may still be forwarded to a shell account.
See Shell-linked E-mail for details.

Deleting duplicate emails.

Note: This only works if you have a shell account associated with your email.

So you have a maildir folder that is full of duplicates. If you have hundreds or thousands of duplicates this method may work for you.


Dependencies

PCRE

Download and install the PCRE libraries, which maildrop depends on.

mkdir -p $HOME/src
cd $HOME/src
wget ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-7.7.tar.bz2   
tar xvfj pcre-7.7.tar.bz2
cd pcre-7.7
nice ./configure --prefix=$HOME/local/
nice make
nice make install

Set up your path

Add $HOME/local/bin to your $PATH

echo "export PATH=\$HOME/local/bin:\$PATH" >> $HOME/.bashrc
echo "export MANPATH=\$HOME/local/man:\$MANPATH" >> $HOME/.bashrc 
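Once a new shell has been started (or .bashrc has been re-read), a quick check along these lines may confirm that the directory really is on the search path. This is only a sketch; path_contains is a hypothetical helper name, not something this page or DreamHost provides.

```shell
# Hypothetical helper: succeed if directory $1 is an entry in the
# colon-separated list $2 (for example, the value of $PATH).
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Report whether $HOME/local/bin is on the current PATH.
if path_contains "$HOME/local/bin" "$PATH"; then
    echo "local bin directory is on PATH"
else
    echo "local bin directory is missing from PATH"
fi
```

Wrapping the list in colons before matching means a longer entry such as $HOME/local/bin-old does not count as a match.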

Maildrop

Download and install maildrop. We need it for the reformail tool.

mkdir -p $HOME/src
cd $HOME/src
wget http://switch.dl.sourceforge.net/sourceforge/courier/maildrop-2.0.4.tar.bz2
tar xvfj maildrop-2.0.4.tar.bz2
cd maildrop-2.0.4
CFLAGS="-I$HOME/local/include" CXXFLAGS="-I$HOME/local/include" LDFLAGS="-L$HOME/local/lib" nice ./configure  --prefix=$HOME/local/
nice make
nice make install

Delete duplicates

Go to the maildir directory

cd $HOME/.maildir/.Folder-name

Make sure there's nothing sitting in the new subdirectory.

ls new

If there are messages in the new subdirectory, open the mailbox in a user agent to get it to move them into cur.
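For a folder too large to eyeball with ls, counting the waiting messages may be easier. The following is a minimal sketch; count_new is a hypothetical helper name chosen for illustration.

```shell
# Hypothetical helper: print how many files are waiting in the "new"
# subdirectory of the maildir folder given as $1.
count_new() {
    find "$1/new" -type f | wc -l | tr -d ' '
}
```

Run from inside the folder as: count_new .
A result of 0 means nothing is waiting in new and it should be safe to continue.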

See how many messages you have:

ls cur | wc -l
  842

Note: In this example we counted a total of 842 messages.


Check they all have Message-IDs:

Note: This may take a long time on a large folder

for i in cur/*; do nice reformail -x Message-ID: <"$i"; done | wc -l
    842

See how many you have if you filter out duplicate Message-IDs:

Note: This may take a long time on a large folder

for i in cur/*; do nice reformail -x Message-ID: <"$i"; done | sort -u | wc -l
   698
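To see which Message-IDs are repeated, rather than only how many unique ones there are, the same list can be passed through sort and uniq. This is a sketch under the assumption that each Message-ID arrives on its own line; duplicated_ids is a hypothetical helper name.

```shell
# Hypothetical helper: read one Message-ID per line on standard input and
# print each ID that occurs more than once, preceded by its count.
duplicated_ids() {
    sort | uniq -c -d
}
```

It could be fed from the loop used above, for example:
for i in cur/*; do nice reformail -x Message-ID: <"$i"; done | duplicated_ids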

See how many we're going to delete:

rm -f /tmp/$USER.dups
for i in cur/*; do nice reformail -D 20000 /tmp/$USER.dups <"$i" && echo "$i"; done | wc -l
   144
expr 698 + 144
  842

If this total doesn't match the message count, increase the 20000 (the size, in bytes, of the Message-ID cache file that reformail keeps). A mismatch means reformail isn't remembering enough Message-IDs to spot all the duplicates.
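The comparison can also be scripted so the three numbers are checked in one step. This is only an illustrative sketch; check_counts is a hypothetical helper, and the figures passed to it are the ones from this example.

```shell
# Hypothetical helper: $1 = total messages, $2 = unique Message-IDs,
# $3 = duplicates reported by the dry run.
check_counts() {
    if [ $(( $2 + $3 )) -eq "$1" ]; then
        echo "counts match"
    else
        echo "counts differ: $2 + $3 = $(( $2 + $3 )), expected $1"
    fi
}

# Figures from the example above: 842 messages, 698 unique, 144 duplicates.
check_counts 842 698 144
```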


Make a backup of cur (optional):

Just in case something goes wrong.

cp -a cur backup
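Before deleting anything, it may be worth confirming that the copy is complete. The sketch below compares the two directories with diff; verify_backup is a hypothetical helper name.

```shell
# Hypothetical helper: compare directory $1 with its copy $2 and report
# whether their contents are identical.
verify_backup() {
    if diff -r "$1" "$2" >/dev/null 2>&1; then
        echo "backup matches"
    else
        echo "backup differs"
    fi
}
```

Run from inside the folder as: verify_backup cur backup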


Delete the messages and check things look right afterwards:

rm -f /tmp/$USER.dups
for i in cur/*; do nice reformail -D 20000 /tmp/$USER.dups <"$i" && rm -fv "$i"; done | nl
ls cur | wc -l
  698

Clean up your temporary file:

rm /tmp/$USER.dups


Other options

Remove Duplicate Messages Plugin

The Remove Duplicate Messages plugin for Thunderbird is another option. Note: If you delete more than 300 or so messages at a time, there's a good chance Thunderbird will time out and resend the command, resulting in lots of duplicate messages in your trash folder.


References

Removing Duplicate E-mail Messages From A Mailbox