OTW

OSINT, Part 5: Collecting Metadata with Metagoofil

Updated: Dec 30, 2022


Welcome back, my aspiring cyber warriors!

Sometimes the best information is there for the asking! With a little knowledge and a few simple tools and techniques, we can harvest information that individuals and organizations don't realize they are providing us!

Organizations often post documents on their websites, usually in Word (.doc/.docx), Excel (.xls/.xlsx) or PDF format. These documents often contain significant amounts of metadata (data about data), which may include:

1. User Names

2. Email addresses

3. Printers

4. Software used to create the document

Harvesting this data can be critical to an effective social engineering attack, pentest or forensic investigation.

Earlier, I showed you how to use the Windows-based tool FOCA to gather metadata. In this tutorial, we will use a Linux command-line (CLI) tool named metagoofil to do a similar task. It's always useful to have multiple tools for the same job, as the results can vary from tool to tool.

Step #1: Download and Install metagoofil

Although metagoofil is no longer built into Kali, it is still available in Kali's repository, so you only need to install the package from there.

kali > apt-get install metagoofil

Step #2: metagoofil Help

After installing metagoofil, simply enter the command metagoofil in your terminal and it will display its help screen.

metagoofil has only a few options, and usage examples are shown near the bottom of the help screen. The key options are:

-d domain to search

-t the type of files to search for

-l limit on the number of search results

-n maximum number of files to download

-o output directory for the downloaded files

-f output file for the HTML report

Step #3: Using metagoofil to Harvest Metadata at SANS.org

Let's try harvesting some metadata from sans.org, the cybersecurity training organization.

kali > metagoofil -d sans.org -t doc,pdf -l 20 -n 10 -o sans -f html

Where:

-d sans.org is the domain to harvest

-t doc,pdf are the types of files to harvest

-l 20 limits the search results to 20

-n 10 limits the downloads to 10 files

-o sans saves the downloaded files to the directory sans

-f html writes the HTML report to a file named html
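The same invocation can be assembled programmatically, for example to script runs against several domains. The helper below (build_metagoofil_cmd is a hypothetical name, not part of metagoofil) uses only the flags covered in this tutorial; options differ between metagoofil releases, so check the help screen of your installed version first. Note that the -t value has to be one comma-separated argument with no spaces, otherwise the shell splits it into separate arguments.

```python
import shlex


def build_metagoofil_cmd(domain, filetypes, limit, downloads, outdir, report):
    """Hypothetical helper: assemble the argv for the metagoofil run used above.

    Only the flags covered in this tutorial are used; verify them against
    the help screen of the metagoofil version you have installed.
    """
    return [
        "metagoofil",
        "-d", domain,                 # domain to search
        "-t", ",".join(filetypes),    # one comma-separated argument, no spaces
        "-l", str(limit),             # limit on search results
        "-n", str(downloads),         # maximum number of files to download
        "-o", outdir,                 # directory for downloaded files
        "-f", report,                 # file for the HTML report
    ]


if __name__ == "__main__":
    cmd = build_metagoofil_cmd("sans.org", ["doc", "pdf"], 20, 10, "sans", "html")
    print(shlex.join(cmd))
    # metagoofil -d sans.org -t doc,pdf -l 20 -n 10 -o sans -f html
```

To actually run it, pass the list to subprocess.run(cmd, check=True); handing over a list rather than one string avoids shell quoting problems.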

As metagoofil finishes harvesting the metadata, it displays the results in the terminal. In our run, it recovered 6 user names, a list of the software used to create the documents, and 11 email addresses.
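metagoofil's own extraction logic isn't shown here. As a simplified, hypothetical illustration of the email-gathering step, the sketch below pulls addresses out of text that has already been extracted from documents, lowercases and deduplicates them, and optionally keeps only a target domain and its subdomains. The regular expression is an assumption and deliberately loose; it will miss some valid addresses and may match some junk.

```python
import re

# Deliberately simple pattern (an assumption); full RFC 5322 address syntax is much broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def harvest_emails(texts, domain=None):
    """Return sorted, deduplicated, lowercased addresses found in the given strings.

    If domain is given, keep only addresses at that domain or one of its subdomains.
    """
    if domain is not None:
        domain = domain.lower()
    found = set()
    for text in texts:
        for addr in EMAIL_RE.findall(text):
            addr = addr.lower()
            host = addr.rsplit("@", 1)[1]
            if domain is None or host == domain or host.endswith("." + domain):
                found.add(addr)
    return sorted(found)


if __name__ == "__main__":
    sample = [  # made-up text standing in for content extracted from documents
        "Contact jdoe@Example.org or help@example.org.",
        "Author: asmith@mail.example.org; cc bob@other.com",
    ]
    print(harvest_emails(sample, domain="example.org"))
    # ['asmith@mail.example.org', 'help@example.org', 'jdoe@example.org']
```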

We can also view the results in a browser, since we told metagoofil to write an HTML report to a file named html. Open your browser and navigate to /root/html.

metagoofil creates an easy-to-read HTML document containing all the metadata it was able to harvest from documents on sans.org.

The information we were able to harvest so easily from this site can be used to:

1. Design a social engineering attack against the owners of those email addresses;

2. Exploit the software we now know is running on some of their systems;

3. Find individuals we have been searching for.

Conclusion

A few simple techniques and tools can effectively harvest open source intelligence from the vast repository of data on the Internet. metagoofil is an effective tool for extracting metadata from documents on an organization's website, provided that metadata has not been stripped out. This metadata can be used for multiple purposes, including pentesting, forensic investigation and social engineering.
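On the defensive side, the Office Open XML layout also shows what stripping metadata means in practice. This minimal sketch (strip_docx_authors is a hypothetical name) copies a .docx archive and blanks only the creator and last-modified-by fields in docProps/core.xml. It is not a complete sanitizer: the application name, company, comments, tracked-change authors and other fields are left untouched, so a dedicated sanitizing tool is the safer choice before publishing documents.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces found in a typical docProps/core.xml. Registering them keeps the
# original prefixes on re-serialization, which matters because an attribute
# value such as xsi:type="dcterms:W3CDTF" refers to the dcterms prefix by name.
CP = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("cp", CP)
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", "http://purl.org/dc/terms/")
ET.register_namespace("xsi", "http://www.w3.org/2001/XMLSchema-instance")

# Only these two author fields are blanked; everything else is left as-is.
AUTHOR_TAGS = (f"{{{DC}}}creator", f"{{{CP}}}lastModifiedBy")


def strip_docx_authors(data: bytes) -> bytes:
    """Return a copy of a .docx (given as bytes) with the author fields blanked."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, zipfile.ZipFile(out, "w") as dst:
        for item in src.infolist():
            payload = src.read(item.filename)
            if item.filename == "docProps/core.xml":
                root = ET.fromstring(payload)
                for tag in AUTHOR_TAGS:
                    for el in root.findall(tag):
                        el.text = ""
                payload = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
            # Reusing the original ZipInfo keeps each member's name and compression method.
            dst.writestr(item, payload)
    return out.getvalue()
```

To use it, read the original file's bytes, pass them in, and write the returned bytes to a new .docx file.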

