otw

OSINT: Scraping Email Addresses with theHarvester

Updated: Dec 28, 2022

Welcome back, my aspiring OSINT investigators!


There are many tools for scraping email addresses from various locations, but theHarvester is one of the best! It's easy to use and effective. In addition, it is even better at enumerating subdomains than many of the tools specifically designed for that purpose.






Unlike some other email scraping tools, theHarvester uses PGP keyservers to harvest the email addresses of people who use PGP to encrypt their email messages. This feature alone can make it valuable for finding email addresses not captured by other tools.


In many OSINT investigation scenarios, you'll want to find email addresses for a person or for members of an organization. Once you have identified the target's email address, you can contact the target, send a phishing email, or use another address from the same organization to social engineer the target.



Step #1: Getting Started with theHarvester


The first step is to download theHarvester. If you are using a Linux distribution other than Kali, you can get theHarvester from GitHub as follows:


kali > git clone https://github.com/laramies/theHarvester


If you are using Kali, theHarvester is built into nearly every version. If it is missing, simply install it from the Kali repository. Note the lowercase "h" in the package name.


kali > sudo apt install theharvester
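If you are not sure whether theHarvester is already on your system, a quick check like the one below can tell you before you install anything. This is only a sketch: find_harvester is a hypothetical helper name, and checking the all-lowercase theharvester spelling as a command name is an assumption that may not apply to your distribution.

```shell
# Print the first theHarvester command name found on PATH.
# Returns 1 if neither spelling is available (hypothetical helper).
find_harvester() {
    for name in theHarvester theharvester; do
        if command -v "$name" >/dev/null 2>&1; then
            echo "$name"
            return 0
        fi
    done
    return 1
}
```

Running find_harvester prints the command name it found, or returns a non-zero status if neither name is on your PATH.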





Step #2: theHarvester Syntax and Help


Let's begin by examining the help screen for theHarvester.


kali > theHarvester -h




As you can see in the first screen above, the syntax is very straightforward.


theHarvester -d <domain>


We can specify which source we want to query for data by using the -b switch. The available sources include:

  1. Baidu

  2. Bing

  3. Bing API

  4. Certspotter

  5. CRTSH

  6. DNSdumpster

  7. Dogpile

and many others. If you want to use all of these sources, simply pass all to the -b switch (-b all) on the command line.
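You don't have to choose between a single source and all of them. Recent versions of theHarvester appear to accept a comma-separated list of sources for -b; treat that as an assumption and confirm it against the help screen of your version. The join_sources helper below is a hypothetical convenience function, and example.com is a placeholder domain.

```shell
# Join the given source names with commas (hypothetical helper).
# The parentheses run the body in a subshell, so the IFS change does not leak.
join_sources() (
    IFS=,
    echo "$*"
)

# Print the command that would be run; nothing is executed here.
echo "theHarvester -d example.com -b $(join_sources bing crtsh dnsdumpster)"
```

Limiting a scan to a few sources usually finishes faster than -b all and makes it easier to see which source produced which result.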


In some cases, you will want to use a service's API (application programming interface). To do so, open the file /etc/theHarvester/api-keys.yaml in any text editor, as shown below.



Simply add the API keys of the services you want to use to this file and save it.
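Before a long scan, it can help to see which services still have no key set. The sketch below assumes the file follows the layout shipped with recent versions (a top-level apikeys: entry, each service name indented two spaces, and a key: line indented four spaces beneath it); if your file looks different, the patterns will need adjusting. missing_keys is a hypothetical helper name, and any service that uses fields other than key: is not checked.

```shell
# missing_keys <api-keys-file>
# Print the name of every service whose "key:" line is still empty.
# Assumes two-space indentation for service names and four for "key:".
missing_keys() {
    awk '
        # Remember the most recent service name, e.g. "  bing:" -> bing
        /^  [A-Za-z0-9_-]+:[ \t]*$/ { svc = $1; sub(/:$/, "", svc); next }
        # A key line with nothing after the colon means no key is set
        /^    key:[ \t]*$/ { print svc }
    ' "$1"
}
```

For example, missing_keys /etc/theHarvester/api-keys.yaml lists one service per line, so you can decide which keys are worth adding.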


Step #3: Run a Scan with theHarvester


Now, let's try using theHarvester against everyone's favorite electric car manufacturer, Tesla. To scrape all this data on Tesla, we can use the following command:


kali > theHarvester -d tesla.com -b all -f /home/kali/tesla_results2


Where:


theHarvester is the command


-d tesla.com directs the tool to scrape data from the domain (-d) tesla.com


-b all directs this tool to use all the sources available


-f /home/kali/tesla_results2 directs the tool to send the results to file
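If you run this kind of scan often, the switches above can be wrapped in a small shell function. This is a sketch, not part of theHarvester: harvest, DRY_RUN, and the default output path are hypothetical names chosen for illustration. With DRY_RUN=1 the function only prints the command it would run, which is a handy way to check the arguments first.

```shell
# harvest <domain> [sources] [outfile]
# Runs theHarvester with the -d, -b and -f switches described above.
# Set DRY_RUN=1 to print the command instead of running it.
harvest() {
    domain=${1:-}
    sources=${2:-all}
    if [ -z "$domain" ]; then
        echo "usage: harvest <domain> [sources] [outfile]" >&2
        return 2
    fi
    # Default output file is an illustrative choice, not a tool default.
    outfile=${3:-"$HOME/${domain}_results"}
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "theHarvester -d $domain -b $sources -f $outfile"
    else
        theHarvester -d "$domain" -b "$sources" -f "$outfile"
    fi
}
```

Calling harvest with only a domain falls back to all sources and writes the results under your home directory.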


When theHarvester completes its work, we can view the results by opening the tesla_results2 file in a web browser.


kali > firefox tesla_results2


This opens our HTML file as seen below.


If we scan down a bit, we can see that theHarvester has scraped numerous emails from Bing and other search engines.



Finally, at the bottom of the file is a summary of the results. Note that theHarvester was able to return 2107 emails and 13461 hosts!
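A report with thousands of entries is easier to work with once the addresses are pulled out into a plain list. The sketch below assumes the results file is ordinary text (HTML included) and uses standard grep, sed, tr and sort. extract_emails is a hypothetical helper name, and the pattern is a simple approximation of an email address rather than a full validator.

```shell
# extract_emails <results_file> <domain>
# Print unique, lowercased addresses at <domain> or any of its subdomains.
extract_emails() {
    # Escape dots so they match literally in the regular expression.
    domain_re=$(printf '%s' "$2" | sed 's/\./\\./g')
    grep -Eio "[a-z0-9._%+-]+@([a-z0-9-]+\.)*${domain_re}" "$1" |
        tr '[:upper:]' '[:lower:]' |
        sort -u
}
```

For example, extract_emails /home/kali/tesla_results2 tesla.com > tesla_emails.txt would save a de-duplicated list that you can feed into other tools.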

Summary


When doing OSINT research, theHarvester is one of the first tools you want to use to gather emails, hosts, and subdomains from a domain. It is a very good tool for scraping emails, but it is even better as a tool for gathering hosts and subdomains. In many cases, it gathers more subdomains than tools specifically designed to enumerate them, such as dnsenum.


For more on OSINT, you can find more resources here or attend OTW's OSINT training and become a certified OSINT investigator.
