OTW

Open Source Intelligence (OSINT), Part 1: Mining Intelligence from Twitter (@mattgaetz)

Updated: Dec 30, 2022


The Internet is the largest treasure trove of data in the history of humankind! This repository of data is so large that companies and scientists are straining to understand and manage its scale.

We can mine that data with many different tools and sources. When data from multiple sources is combined, a clear and valuable data set, and real insight, can be garnered. This data can prove very useful in a forensic investigation or in reconnaissance of a target.

One source of an immense amount of data is the social networking site Twitter. Millions of people send out tweets daily including politicians, business people, celebrities and the U.S. President. Significant information and insights can be harvested from these tweets.

Recently, a new open source tool named twint was developed to scrape information from this platform anonymously. It is capable of scraping data from Twitter without using the Twitter API or even having an account with Twitter.

Let's take a look at how this tool works.

Step #1: Download and Install

The first step is to download this tool and its dependencies from github.com.

Once we have the code, we need to install its requirements.

kali > cd twint

kali > pip3 install -r requirements.txt

Now that we have installed twint on our system, let's take a look at its syntax.

Twint's syntax is rather simple.

twint -u <username> <options>

Options include:

--following

--followers

--favorites

-s <search string>

--year <only tweets posted before this year>

-o <output file, such as file.txt or file.csv>

--database <sqlite database name>
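To make it concrete how these options combine on one command line, here is a minimal Python sketch that assembles a twint command from the switches listed above, plus the --csv switch used later in this article. The helper name build_twint_command and its parameter names are hypothetical, invented for illustration; only the flags themselves come from this article. The sketch only builds the argument list and never runs twint.

```python
import shlex


def build_twint_command(username, relation=None, search=None, year=None,
                        output=None, csv=False, database=None):
    """Assemble a twint command line as a list of arguments.

    Hypothetical helper for illustration only: it uses just the
    switches shown in this article and never executes twint.
    """
    cmd = ["twint", "-u", username]
    if relation is not None:
        # relation selects one of the list-style switches shown above
        if relation not in ("following", "followers", "favorites"):
            raise ValueError(f"unknown relation: {relation}")
        cmd.append(f"--{relation}")
    if search is not None:
        cmd += ["-s", search]
    if year is not None:
        cmd += ["--year", str(year)]
    if output is not None:
        cmd += ["-o", output]
    if csv:
        cmd.append("--csv")
    if database is not None:
        cmd += ["--database", database]
    return cmd


if __name__ == "__main__":
    following = build_twint_command("mattgaetz", relation="following",
                                    output="gaetzfollowing", csv=True)
    # shlex.join quotes each argument so the line can be pasted into a shell
    print(shlex.join(following))
```

Keeping the command as a list rather than one string means a search phrase containing spaces stays a single argument.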

Step #2: Gathering Info on a Target

Let's try using this tool to gather some intelligence on the smarmy, second-term congressman from Florida, Matt Gaetz. Gaetz is known for, among other things, his support for Holocaust deniers and white nationalism, and for being a Trump sycophant.

If we wanted to scrape all of the Twitter accounts Matt Gaetz is following and output them to a file named "gaetzfollowing" in CSV format, we could enter:

kali > twint -u mattgaetz --following -o gaetzfollowing --csv

As you can see, this tool outputs every account Matt Gaetz is following both to the screen and to a .csv file named gaetzfollowing.

We could also harvest his followers by entering:

kali > twint -u mattgaetz --followers -o gaetzfollowers --csv

If we want to see if the word "trump" appeared in Matt Gaetz's tweets, we could use the -s switch with the word trump.

kali > twint -u mattgaetz -s trump

Now we can see all of Rep. Gaetz's tweets regarding Trump, including:

"I love @realdonaltrump "

on April 4, 2019.

We now have every tweet from Mr. Gaetz where he mentions "trump".

If we scroll down a bit, we can see that Mr. Gaetz didn't always love Trump. On April 17, 2011, he tweeted:

@realdonaldtrump is running for Pres??? Now I know how #Democrats feel every time @alsharpton runs #isthisreal

Apparently, Mr. Gaetz was equating Donald Trump and Rev. Al Sharpton in 2011. I don't think this was meant to be a flattering comparison.

By the time you read this, Mr. Gaetz will likely have deleted that old Twitter post, but we will have preserved it for all posterity.

Step #3: Scrape the Tweets and Save to a Database

Often, we will want to harvest these tweets and then preserve and search them in a database. Database searches can be faster and more effective, and a database can be linked to other databases and tables for cross-referencing.

Let's scrape all of Matt Gaetz's tweets and put them in a database named mattgaetzDB.

kali > twint -u mattgaetz --database mattgaetzDB

As you can see, twint will now grab every tweet from our friend, Matt Gaetz.

Now that we have all the tweets from Mr. Gaetz, we can open them with the sqlite database browser built into Kali.

Once the sqlite browser is open, simply go to File--> Open and select the mattgaetzDB file.

It should look like this.

We can see that there are 8 tables in our database.

Let's focus on his tweets rather than the other information. When we expand the "tweets" table, we can see all the fields in this table.
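The same inspection of tables and fields can be done without the GUI. The following is a minimal sketch using Python's built-in sqlite3 module. To stay self-contained it runs against a small in-memory stand-in database; the stand-in's table and column names (other than a "tweets" table with a "tweet" field) are assumptions made for illustration and are not twint's real schema.

```python
import sqlite3


def describe_database(conn):
    """Return {table_name: [column names]} for every table in the database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        # PRAGMA table_info returns one row per column; index 1 is its name.
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [col[1] for col in columns]
    return schema


def make_demo_database():
    """Build an in-memory stand-in; these columns are assumed, not twint's."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tweets (id INTEGER PRIMARY KEY, date TEXT, tweet TEXT)")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    return conn


if __name__ == "__main__":
    # To inspect the scraped file instead: sqlite3.connect("mattgaetzDB")
    conn = make_demo_database()
    for table, columns in describe_database(conn).items():
        print(table, columns)
```

Pointing sqlite3.connect at the mattgaetzDB file should list the same tables and fields the browser shows.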

Let's now move to the tab to the far right (how appropriate in this case) labelled "Execute SQL".

Here we can create SQL queries to search this data. Let's search for every tweet where Mr. Gaetz mentions his friend "trump".

To construct this query, we can enter:

SELECT tweet

FROM tweets

WHERE tweet LIKE '%trump%';

When we execute this query by clicking the blue |>, we can see the results in the lower window.
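For repeatable searches, the same query can be run from a script. The sketch below uses Python's built-in sqlite3 module against an in-memory stand-in for the "tweets" table; the function name find_tweets and the placeholder rows are made up for illustration, and only the table name and the "tweet" field come from this article. Note that SQLite's LIKE is case-insensitive for ASCII letters by default, so '%trump%' also matches "Trump" and "TRUMP".

```python
import sqlite3


def find_tweets(conn, keyword):
    """Return the text of every tweet containing keyword, oldest row first."""
    # A bound parameter keeps the keyword out of the SQL string itself.
    rows = conn.execute(
        "SELECT tweet FROM tweets WHERE tweet LIKE ? ORDER BY id",
        (f"%{keyword}%",),
    )
    return [row[0] for row in rows]


def make_demo_database():
    """Build an in-memory stand-in table filled with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, tweet TEXT)")
    conn.executemany("INSERT INTO tweets (tweet) VALUES (?)", [
        ("placeholder tweet mentioning @realdonaldtrump",),
        ("another placeholder about TRUMP",),
        ("placeholder with no match",),
    ])
    return conn


if __name__ == "__main__":
    # To search the scraped file instead: sqlite3.connect("mattgaetzDB")
    conn = make_demo_database()
    for tweet in find_tweets(conn, "trump"):
        print(tweet)
```

Passing the keyword as a parameter rather than pasting it into the query text also avoids the kind of unbalanced-quote mistake that is easy to make when typing SQL by hand.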

Summary

Twitter, in particular, and open source intelligence, in general, can be incredible tools for harvesting the data available to us on the web. Twint, in combination with sqlite, is a great tool for harvesting and analyzing Twitter data anonymously, without ever opening a Twitter account.

