Extract and sort email addresses from text files with Automator
Automator (Mac OS X’s drag & drop batch script-building tool) can be used to create applications that streamline repetitive tasks without any knowledge of programming.
Follow these steps to build an Automator workflow that will search all text files in a specified folder for email addresses, sort these email addresses alphabetically, and remove any duplicates:

Launch Automator from your ‘/Applications‘ folder and choose to create a ‘Custom‘ workflow.
Add an ‘Ask for Confirmation‘ action to the workflow and give it an appropriate name and description to ask the user whether she’d like to continue.
Add an ‘Ask for Finder Items‘ action to the workflow to let the user select the folder her text files reside in.
Add a ‘Set Value of Variable‘ action and create a new variable to store the user’s selection for later use.
Add a ‘Run Shell Script‘ action, set the Shell to ‘/bin/bash‘, and set Pass input to ‘as arguments‘. Add this line into the body of the shell script:
grep -Eiorh '([[:alnum:]_.]+@[[:alnum:]_]+?\.[[:alpha:].]{2,6})' "$@" | sort | uniq
The above command uses the UNIX grep utility to perform a regular expression search for email addresses in the folder the user specified (which is automatically passed into this action as ‘$@‘). The -E flag enables extended regular expressions, -i ignores case, -o prints only the matching text, -r searches the folder recursively, and -h leaves filenames out of the output. The results from this search are piped into the sort utility to be sorted alphabetically, then into the uniq utility to remove duplicates.
Add a ‘New Text File‘ action to the workflow, specify a name for the new file, set Where to the variable storing the folder the user selected, and Encoding to ‘Unicode (UTF-8)‘. Check the ‘Replace existing files‘ checkbox.
Add an ‘Ask for Confirmation‘ action and change the name and description appropriately to notify the user that the process has completed.
To save the workflow, choose ‘File->Save‘ from the menubar and specify a filename, then choose ‘File->Save As…‘ and set the File Format to ‘Application‘ to compile a standalone application.
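
To try the shell script step outside Automator first, a minimal sketch along these lines may help. The temporary folder, file names, and addresses are made up for illustration, and "$demo_dir" stands in for the ‘$@‘ that Automator would pass in:

```shell
# Hypothetical sample data in a throwaway folder (not part of the workflow itself).
demo_dir=$(mktemp -d)
printf 'Contact bob@example.org today\nor alice@example.com instead\n' > "$demo_dir/one.txt"
printf 'Reply to alice@example.com soon\n' > "$demo_dir/two.txt"

# Same pipeline as the Run Shell Script action, with "$demo_dir" in place of "$@".
emails=$(grep -Eiorh '([[:alnum:]_.]+@[[:alnum:]_]+?\.[[:alpha:].]{2,6})' "$demo_dir" | sort | uniq)
printf '%s\n' "$emails"

# Clean up the throwaway folder.
rm -rf "$demo_dir"
```

With this sample data, alice@example.com appears in both files but should be listed only once, ahead of bob@example.org.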

Geekology’s example of this Find Email Addresses in Files workflow can be downloaded here, with a compiled version of the application available here.



10 Feb 2009
Awesome! Looked for hours trying to find a script that would extract email addresses from the bodies of 1000’s of emails in mail.app format. Presto! Even deleted duplicates on the fly. Thanks.
Remember to paste the code
grep -Eiorh '([[:alnum:]_.]+@[[:alnum:]_]+?\.[[:alpha:].]{2,6})' "$@" | sort | uniq
as one line in the shell box. If you copy paste from the example, you get two lines of code.
for f in "$@"
do
grep -Eiorh '([[:alnum:]_.]+@[[:alnum:]_]+?\.[[:alpha:].]{2,6})' "$@" | sort | uniq <- ONE LINE
done
all the best
Thanks!
Daniel, your tip really helped! Thanks
This script saved me COUNTLESS hours!
Send me your email address so I can donate some $$$ to you.
– John
Haha, no need John, just tell your friends about Geekology if you think they might find it useful.
This doesn’t find domains that have the ‘-’ character in them, but it works if you just add the ‘-’ as below:
#!/bin/bash
grep -Eiorh '([[:alnum:]_.-]+@[[:alnum:]_-]+?\.[[:alpha:].]{2,6})' "$@" | sort | uniq
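
For anyone who wants to see the difference, here is a small sketch that runs both patterns against one line of text. The sample address is made up, and the variable names are only for illustration:

```shell
# Hypothetical sample line; the address is invented for this comparison.
sample='Write to info@my-domain.com for help'

# Pattern from the article: the domain class has no "-", so nothing matches.
# "|| true" keeps the script going, since grep exits non-zero on no match.
without_hyphen=$(printf '%s\n' "$sample" | grep -Eio '([[:alnum:]_.]+@[[:alnum:]_]+?\.[[:alpha:].]{2,6})' || true)

# Pattern from the comment above: "-" added to both bracket expressions.
with_hyphen=$(printf '%s\n' "$sample" | grep -Eio '([[:alnum:]_.-]+@[[:alnum:]_-]+?\.[[:alpha:].]{2,6})' || true)

printf 'without: [%s]\nwith: [%s]\n' "$without_hyphen" "$with_hyphen"
```

The "-" sits at the end of each bracket expression so that grep reads it as a literal hyphen rather than as part of a range.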
worked perfectly! thanks a million!
This works really well! Thanks for making this possible! The instructions were clear, concise, and the program works like a champ! Note: To those who may not do programming very much…… be sure and read the line from “Daniel” that talks about being sure the code is pasted as one line, not two!
Hi, just found this and it has brilliantly extracted a whole load of emails from a .txt document. I did find one problem though – it is cutting off the end of some domains e.g.
xxx@greendown.swindo
xxx@bws.wilts.
xxx@sheldon_school.dialne
I was using the compiled version and just ran it as I am not a geek!
help please!
Thanks
Gordon
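
The cut-off results above are consistent with the last part of the pattern, [[:alpha:].]{2,6}, which matches at most six letters or dots after the first dot of the domain, so longer endings get truncated. A rough sketch of the effect, using a made-up address and a looser pattern that is only an assumption (it is not from the original post and may match more than you want in some texts):

```shell
# Hypothetical sample line; the address is invented for illustration.
sample='Ask pupil@school.example.co.uk about it'

# Pattern from the article: at most six characters are kept after the first dot.
original=$(printf '%s\n' "$sample" | grep -Eio '([[:alnum:]_.]+@[[:alnum:]_]+?\.[[:alpha:].]{2,6})' || true)

# Assumed alternative: any number of dot-separated labels, then two or more letters.
looser=$(printf '%s\n' "$sample" | grep -Eio '([[:alnum:]_.-]+@([[:alnum:]_-]+\.)+[[:alpha:]]{2,})' || true)

printf 'original: %s\nlooser:   %s\n' "$original" "$looser"
```

The compiled application would need to be rebuilt from the workflow with the changed line for this to take effect.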