The Internet is the largest treasure trove of data in the history of humankind! This repository of data is so large that companies and scientists are straining to understand and manage its scale.
We can mine that data with many different tools. When data from multiple sources is combined, a clearer and more valuable data set emerges, and real insight can be drawn from it. This data can prove very useful in a forensic investigation or in reconnaissance of a target.
One source of an immense amount of data is the social networking site Twitter. Millions of people, including politicians, business people, celebrities and the U.S. President, send out tweets daily. Significant information and insights can be harvested from these tweets.
Recently, a new open source tool named twint was developed to scrape information from this platform anonymously. It is capable of scraping data from Twitter without using the Twitter API or even having an account with Twitter.
Let's take a look at how this tool works.
Step #1: Download and Install
The first step is to download this tool from github.com.
kali > git clone https://github.com/twintproject/twint.git
Once we have the code, we need to download its requirements.
kali > cd twint
kali > pip3 install -r requirements.txt
Now that we have installed twint in our system, let's take a look at its syntax.
Twint's syntax is rather simple.
twint -u <username> <options>
Options include:
--following
--followers
--favorites
-s <search string>
--year <limit search to a particular year>
-o <output file, e.g. file.txt or file.csv>
--csv <write the output file in CSV format>
--database <SQLite database name>
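If you plan to run twint repeatedly from a script, it can help to assemble the command line programmatically. The following is a minimal sketch, not part of twint itself: the helper name build_twint_command and its parameter names are hypothetical, and it uses only the switches that appear in this tutorial.

```python
import shlex
from typing import List, Optional


def build_twint_command(
    username: str,
    *,
    following: bool = False,
    followers: bool = False,
    favorites: bool = False,
    search: Optional[str] = None,
    year: Optional[int] = None,
    output: Optional[str] = None,
    as_csv: bool = False,
    database: Optional[str] = None,
) -> List[str]:
    """Build an argument list for the twint CLI (hypothetical helper)."""
    cmd = ["twint", "-u", username]
    if following:
        cmd.append("--following")
    if followers:
        cmd.append("--followers")
    if favorites:
        cmd.append("--favorites")
    if search is not None:
        cmd += ["-s", search]
    if year is not None:
        cmd += ["--year", str(year)]
    if output is not None:
        cmd += ["-o", output]
    if as_csv:
        cmd.append("--csv")
    if database is not None:
        cmd += ["--database", database]
    return cmd


# Print the command as it would be typed at the shell prompt.
cmd = build_twint_command(
    "mattgaetz", following=True, output="gaetzfollowing", as_csv=True
)
print(shlex.join(cmd))
```

The returned list can be handed to subprocess.run(cmd, check=True) on a machine where twint is installed. Passing a list rather than a single string avoids shell quoting problems when a search string contains spaces.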
Step #2: Gathering Info on a Target
Let's try using this tool to gather some intelligence on the smarmy, second term congressman from Florida, Matt Gaetz. Gaetz is known for, among other things, his support for Holocaust deniers, white nationalism and being a Trump sycophant.
If we wanted to scrape all of the Twitter accounts Matt Gaetz is following and output them to a file named "gaetzfollowing" in CSV format, we could enter:
kali > twint -u mattgaetz --following -o gaetzfollowing --csv
As you can see, this tool outputs every account Matt Gaetz is following to the screen and into a .csv file gaetzfollowing.
We could also harvest his followers by entering:
kali > twint -u mattgaetz --followers -o gaetzfollowers --csv
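Once both lists are on disk, they can be cross-referenced, for example to find the accounts that the target follows and that also follow the target back. The sketch below is only illustrative. It assumes each CSV file has a header row with a column named "username"; check the header of your own output files and adjust the column argument if it differs. The sample data is made up so the example runs without any twint output.

```python
import csv
import io
from typing import Set, TextIO


def read_usernames(handle: TextIO, column: str = "username") -> Set[str]:
    """Collect lower-cased usernames from one CSV file (column name is an assumption)."""
    reader = csv.DictReader(handle)
    return {row[column].strip().lower() for row in reader if row.get(column)}


def mutual_accounts(following: TextIO, followers: TextIO) -> Set[str]:
    """Return the accounts that appear in both lists."""
    return read_usernames(following) & read_usernames(followers)


# Made-up sample data standing in for the two CSV files written by twint.
following_csv = io.StringIO("username\nalice\nbob\ncarol\n")
followers_csv = io.StringIO("username\nBob\ndave\ncarol\n")

print(sorted(mutual_accounts(following_csv, followers_csv)))
# prints ['bob', 'carol']
```

With real data, open the two files twint produced (using open(path, newline="")) and pass the file handles in place of the StringIO objects.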
If we want to see if the word "trump" appeared in Matt Gaetz's tweets, we could use the -s switch with the word trump.
kali > twint -u mattgaetz -s trump
Now we can see all of Rep. Gaetz's tweets regarding Trump, including:
"I love @realdonaltrump "
on April 4, 2019.
We now have every tweet from Mr. Gaetz where he mentions "trump".
If we scroll down a bit, we can see that Mr. Gaetz didn't always love Trump. On April 17, 2011, he tweeted:
@realdonaldtrump is running for Pres??? Now I know how #Democrats feel every time @alsharpton runs #isthisreal
Apparently, Mr. Gaetz was equating Donald Trump and Rev. Al Sharpton in 2011. I don't think this was meant to be a flattering comparison.
By the time you read this, Mr. Gaetz will likely have deleted that old Twitter post, but we will have preserved it for all posterity.
Step #3: Scrape the Tweets and save to a Database
Often, we will want to harvest these tweets and then preserve and search them in a database. Database searches can be faster and more effective, and a database can be linked to other databases and tables for cross-referencing.
Let's scrape all of Matt Gaetz's tweets and put them in a database named mattgaetzDB.
kali > twint -u mattgaetz --database mattgaetzDB
As you can see, twint will now grab every tweet from our friend, Matt Gaetz.
Now that we have all the tweets from Mr. Gaetz, we can open them with the SQLite database browser built into Kali.
Once the SQLite browser is open, simply go to File -> Open and select the mattgaetzDB file.
It should look like this.
We can see that there are 8 tables in our database.
Let's focus on his tweets rather than the other information. When we expand the "tweets" table, we can see all the fields in this table.
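The same inspection can be done without the graphical browser. Here is a minimal sketch using Python's built-in sqlite3 module to list every table and its fields. The function name describe_database is hypothetical, and the in-memory database with its "tweets" and "users" tables is only a stand-in so the example runs on its own; the real file will contain whatever tables twint created.

```python
import sqlite3
from typing import Dict, List


def describe_database(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    """Map each table name to the list of its column names."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema: Dict[str, List[str]] = {}
    for table in tables:
        # PRAGMA does not accept bound parameters, so quote the identifier by hand.
        quoted = '"' + table.replace('"', '""') + '"'
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        schema[table] = [col[1] for col in columns]  # index 1 is the column name
    return schema


# Stand-in database; replace with sqlite3.connect("mattgaetzDB") for the real file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, tweet TEXT)")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")

for table, columns in describe_database(conn).items():
    print(table, columns)
```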
Let's now move to the tab to the far right (how appropriate in this case) labelled "Execute SQL".
Here we can create SQL queries to search this data. Let's search for every tweet where Mr. Gaetz mentions his friend "trump".
To construct this query, we can enter:
SELECT tweet
FROM tweets
WHERE tweet LIKE '%trump%';
When we execute this query by clicking the blue |>, we can see the results in the lower window.
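The same search can be scripted, which is handy when you want to run many keywords against the database. The sketch below uses Python's built-in sqlite3 module and relies only on the "tweets" table and "tweet" field used in the query above. The function name search_tweets is hypothetical, and the rows inserted here are made-up placeholders, not real tweets.

```python
import sqlite3
from typing import List


def search_tweets(conn: sqlite3.Connection, keyword: str) -> List[str]:
    """Return every tweet whose text contains the keyword."""
    # The keyword is bound as a parameter rather than pasted into the SQL text.
    pattern = f"%{keyword}%"
    rows = conn.execute("SELECT tweet FROM tweets WHERE tweet LIKE ?", (pattern,))
    return [row[0] for row in rows]


# Stand-in database; replace with sqlite3.connect("mattgaetzDB") for the real file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (tweet TEXT)")
conn.executemany(
    "INSERT INTO tweets (tweet) VALUES (?)",
    [
        ("placeholder row that mentions Trump",),
        ("placeholder row about something else",),
        ("another placeholder with TRUMP in capitals",),
    ],
)

for tweet in search_tweets(conn, "trump"):
    print(tweet)
```

SQLite's LIKE is case-insensitive for ASCII letters by default, so "trump", "Trump" and "TRUMP" all match. Note that % and _ inside the keyword are still treated as wildcards.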
Summary
Open source intelligence in general, and Twitter in particular, can be an incredible source of the data available to us on the web. Twint, in combination with SQLite, is a great tool for harvesting and analyzing Twitter data anonymously, without ever opening a Twitter account.