

The Invisible Line Between Social Media Data Mining and OSINT

by George Mapp on March 2nd, 2013

Social Media Explosion

The recent exponential growth of social media has made it an absolute necessity for anyone competing for any type of business worldwide. When watching CNN, BBC, Al Jazeera and other major news networks, you are likely to see displayed at the bottom of the television screen: as reported by Twitter. Almost daily, the local news will use Facebook photos of a suspect or person of interest if the profile is public.

Because of the vast number of people using social media, it has caught the attention of advertisers, venture capitalists, recruiters, the IRS and, yes, many other U.S. government agencies, including the Intelligence Community. Twitter had over half a billion users as of July 2012, according to a TechCrunch article titled Analyst: Twitter Passed 500M Users In June 2012, 140M Of Them In US; Jakarta ‘Biggest Tweeting’ City. What I find provocative about Twitter is that you can directly follow journalists stationed worldwide who report instantly on what they are seeing and what is happening, sometimes adding photos and video links before any media outlet has reported on the story.

Last October Facebook hit 1 billion users, as reported by Digital Life of the Today Show. That is an almost inconceivable amount of data and information to comprehend. Imagine how many photos, wall posts, likes, comments, etc. each user has and then multiply that by a billion! Fortunately, due to advances in technology, all of that information can be gathered, stored and even used to predict the future. Some even say it can affect politics, and go so far as to say that social media and data mining affected the outcome of the U.S. Presidential elections. I will provide several examples throughout this post to clarify the above statements.

Data Mining vs. OSINT

The title of this post mentions an ‘invisible line’ between Social Media Data Mining and OSINT because the definitions, as well as the uses, often overlap. Most of my readers know what both mean, as well as the fine distinctions; however, I will define the two for novice readers.

Data Mining: The process of collecting, searching through, and analyzing a large amount of data in a database, as to discover patterns or relationships: the use of data mining to detect fraud. The gathering of information from pre-existing data stored in a database, such as one held by a supermarket about customers’ shopping habits.

OSINT: Open source intelligence (OSINT) is a form of intelligence collection management that involves finding, selecting, and acquiring information from publicly available sources and analyzing it to produce actionable intelligence. In the intelligence community (IC), the term “open” refers to overt, publicly available sources (as opposed to covert or classified sources); it is not related to open-source software or public intelligence… continue reading on Wikipedia

Other definitions from the community via Recorded Future

  • Any unclassified information, in any medium, that is generally available to the public, even if its distribution is limited or only available upon payment.
  • Intelligence is a process by which information is treated to answer and provide analysis which is used by a body. That information comes from open source or not does not makes it Open Source Intelligence. Stricto senso Open Source Intelligence would mean that it would be open, available, to all.

After reading the definitions, it becomes evident why OSINT and data mining have become so crucial in today’s environment. Here is an example from Recorded Future of how much intelligence is considered open source:

With an estimated 90% of required intelligence available in open source, it is imperative that intelligence analysts become adept at mining open sources. Recorded Future can help reduce research time, identify new sources, build timelines, chart networks, perform link analysis and more.

Due to the sheer volume of available data and the continued rise of technology in our daily lives, data mining is a necessity for many businesses to compete and to stay in business. The term open source intelligence has risen to a new level because of the internet and the vast amounts of data on the web. Also, with the advances in technology and the rise of hackers and hacktivists, the definition of what is open source and what is private or considered classified by businesses and governments has become increasingly blurry. In fact, many members of the U.S. intelligence community see hacking, cyber security, infosec and cyber threats as the number one national security threat in our country.

Data Mining and OSINT in Use Today

Here are a few examples of how data mining and OSINT, in some instances combined, are used in our world today. To give you an understanding of just how much data is in use, I have used the NSA as an example.

The U.S. super-secret code crackers, better known as the National Security Agency, are building a $2 billion spy center in Utah to keep up with the explosive growth of social media and the tremendous amount of data it produces daily. A Forbes article from last year titled NSA’s New Data Center And Supercomputer Aim To Crack World’s Strongest Encryption offers a few highlights about the agency and the vast amount of data that it collects:

  • The $2 billion data center being built in Utah would have four 25,000 square-foot halls filled with servers, as well as another 900,000 square feet for administration.
  • It will use 65 megawatts of electricity, with an annual bill of $40 million, and incorporates a $10 million security system.
  • Since 2001, the NSA has intercepted and stored between 15 and 20 trillion messages, according to the estimate of ex-NSA scientist Bill Binney. It now aims to store yottabytes of data. A yottabyte is a million billions of gigabytes. According to one storage firm’s estimate in 2009, a yottabyte would cover the entire states of Rhode Island and Delaware with data centers.
  • When the Department of Energy began a supercomputing project in 2004 that took the title of the world’s fastest known computer from IBM in 2009 with its “Jaguar” system, it simultaneously created a secret track for the same program focused on cracking codes. The project took place in a $41 million, 214,000 square foot building at Oak Ridge National Lab with 318 scientists and other staff. The supercomputer produced there was faster than the so-called “world’s fastest” Jaguar.
  • The NSA project now aims to break the “exaflop barrier” by building a supercomputer a hundred times faster than the fastest existing today, the Japanese “K Computer.” That code-breaking system is projected to use 200 megawatts of power, about as much as would power 200,000 homes.
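The figures in the highlights above can be sanity-checked with a little arithmetic. The following is a minimal sketch, assuming decimal (SI) storage units and, for the electricity estimate, that the facility draws its full 65 megawatts around the clock all year (an assumption of mine, not a claim from the Forbes article). All variable names are illustrative.

```python
# Decimal (SI) storage units, expressed in bytes.
GIGABYTE = 10**9
YOTTABYTE = 10**24

# "A yottabyte is a million billions of gigabytes."
gigabytes_per_yottabyte = YOTTABYTE // GIGABYTE
million_billion = 10**6 * 10**9

# "200 megawatts of power, about as much as would power 200,000 homes."
watts_per_home = 200 * 10**6 / 200_000

# "65 megawatts of electricity, with an annual bill of $40 million."
# Assumption: constant full load for all 8,760 hours of a year.
kilowatt_hours_per_year = 65_000 * 8_760
implied_dollars_per_kwh = 40_000_000 / kilowatt_hours_per_year

print(gigabytes_per_yottabyte == million_billion)
print(watts_per_home)
print(round(implied_dollars_per_kwh, 3))
```

Under those assumptions the quoted figures are internally consistent: the yottabyte description matches the SI definition, and the power comparison works out to roughly one kilowatt per home.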

Earlier in my post I mentioned that social media and data mining affected the U.S. Presidential elections. An article written this past November titled How social media, data mining, and new-fangled technology tipped the 2012 election underscores how important and influential social media is in our daily lives. Here is an excerpt from that article:

While Facebook is still arguably the place that people turn to before and after events, Twitter solidified itself as the go-to for all things real-time last Tuesday night. And better yet: It didn’t break.

“[During election night] Twitter averaged about 9,965 [Tweets per second, TPS] from 8:11pm to 9:11pm PT, with a one-second peak of 15,107 TPS at 8:20pm PT and a one-minute peak of 874,560 TPM,” Twitter announced, via its Engineering Blog. “Seeing a sustained peak over the course of an entire event is a change from the way people have previously turned to Twitter during live events.”
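To put Twitter's numbers in perspective, here is a small sketch that converts between the per-second and per-minute figures quoted above. The input values come from the excerpt; the variable names are my own, and the hourly total assumes the quoted average held for the full 60 minutes.

```python
# Figures quoted from Twitter's Engineering Blog in the excerpt above.
average_tps = 9_965               # average Tweets per second over the hour
one_second_peak_tps = 15_107      # busiest single second
peak_tweets_per_minute = 874_560  # busiest single minute

# Average per-second rate during the busiest minute.
average_tps_in_peak_minute = peak_tweets_per_minute / 60

# Approximate number of Tweets sent during the hour from 8:11pm to 9:11pm PT.
tweets_in_hour = average_tps * 3_600

# How far the busiest second stood above the hourly average.
peak_to_average_ratio = one_second_peak_tps / average_tps

print(average_tps_in_peak_minute)
print(tweets_in_hour)
print(round(peak_to_average_ratio, 2))
```

The busiest minute averaged well above the hourly rate, and the hourly total runs into the tens of millions of Tweets, which is the kind of sustained volume the excerpt describes.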

Defense contractors are also in on the action. This is significant because the U.S. Department of Defense is the world’s largest employer, and its contractors are developing products that the DoD might purchase from them. An article from earlier in February titled Raytheon’s New Social Media Data Mining Software amplifies the importance of both social media and data mining. Here are a few excerpts from the article:

Defense giant Raytheon’s new RIOT software data mines information from Facebook, Twitter, Foursquare, and image EXIF metadata, and applies predictive analytics to determine users’ physical locations.

“Riot is a big data analytics system design we are working on with industry, national labs and commercial partners to help turn massive amounts of data into useable information to help meet our nation’s rapidly changing security needs,” Raytheon’s Jared Adams told The Guardian by email.

Here is another recent article by Danger Room, titled Pentagon Inks Deal for Smartphone Tool That Scans Your Face, Eyes, Thumbs, that illustrates how important social media and data mining are to the U.S. government. The following excerpt is taken from that article:

In a few years, the soldier, marine or special operator out on patrol might be able to record the facial features or iris signature of a suspicious person all from his or her smartphone — and at a distance, too.

The Defense Department has awarded a $3 million research contract to California-based AOptix to examine its “Smart Mobile Identity” biometrics identification package, Danger Room has learned. At the end of two years of research to validate the concepts of what the company built, AOptix will provide the Defense Department with a hardware peripheral and software suite that turns a commercially available smartphone into a device that scans and transmits data from someone’s eyes, face, thumbs and voice.

Think about the billion-plus Facebook users and imagine how many photos each user has, including many photos of people who chose not to be on Facebook for a variety of reasons. The result is a huge database for facial recognition. The FBI recently announced a $1 billion facial recognition software project. Here are a few excerpts from a Business Insider article last September titled The FBI’s Nationwide Facial Recognition System Ends Anonymity As We Know It:

The FBI has begun installing state-of-the-art facial recognition technology across the country as part of an update to the national fingerprint database, Sara Reardon of the New Scientist reports.

The agency’s $1 billion Next Generation Identification (NGI) program will also include iris scans, DNA analysis and voice identification by 2014.

Now let’s look at a company that analyzes open source intelligence and makes predictions about the future. The company is called Recorded Future. For further information you can go to their website and watch a video called How Recorded Future Works. The following excerpts from Recorded Future’s website describe a bit more of what exactly they do:

Collect Public Web Content:

We continually scan tens of thousands of high-quality, online news publications, blogs, public niche sources, trade publications, government web sites, financial databases and more.

Analyze the Text:

From these open web sites, we identify references to entities and events. These are organized in time by extracting publication date and any temporal expressions in the text. Each reference is linked to the original source and measured for online momentum and tone of language: positive or negative sentiment.

Visualize Insights:

You can explore the past, present and predicted future of almost anything in a matter of seconds. Our powerful interactive tools facilitate analysis of temporal patterns and better understanding of complex relationships and issues.
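To make the "Analyze the Text" step more concrete, here is a toy sketch of the general idea: pull temporal expressions out of a piece of text and score its tone. This is not Recorded Future's method or API. The word lists, the ISO-date pattern and the `analyze` function are all hypothetical and chosen only for illustration; a real system would handle far more date formats and use much richer language analysis.

```python
import re
from datetime import date

# Hypothetical, deliberately tiny word lists for illustration only.
POSITIVE_WORDS = {"growth", "gain", "success"}
NEGATIVE_WORDS = {"attack", "threat", "loss"}

# Matches only ISO-style dates such as 2013-03-02.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def analyze(text):
    """Return the dates mentioned in text and a naive sentiment score."""
    dates = []
    for year, month, day in DATE_PATTERN.findall(text):
        try:
            dates.append(date(int(year), int(month), int(day)))
        except ValueError:
            # Skip strings that look like dates but are not valid ones.
            continue

    words = re.findall(r"[a-z]+", text.lower())
    score = sum(w in POSITIVE_WORDS for w in words) - sum(
        w in NEGATIVE_WORDS for w in words
    )
    if score > 0:
        tone = "positive"
    elif score < 0:
        tone = "negative"
    else:
        tone = "neutral"
    return {"dates": dates, "score": score, "tone": tone}


sample = "Analysts expect a cyber attack on 2013-03-02 and warn of a growing threat."
result = analyze(sample)
print(result["dates"], result["score"], result["tone"])
```

Even this crude version shows why the approach scales: once each reference carries a date and a tone, references from thousands of sources can be lined up on a timeline and compared.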

Recorded Future is so cutting edge that it has attracted the attention of the Central Intelligence Agency. It is well known both inside and outside of the Intelligence Community that the CIA uses In-Q-Tel as its investment arm. In-Q-Tel has in fact invested in Recorded Future. Given the company’s area of expertise, the investment should come as no surprise.

Data Mining and OSINT Aren’t Just for Spooks

Credit for the heading above goes to Krypt3ia, who recently wrote an interesting blog post titled No, You’re Not A Spook Just Because You Track Social Media and Do OSINT. That article reinforces my point about how crucial data mining and OSINT tools are to any individual, business, government and intelligence agency. In short, social media data mining and OSINT are not just for the intelligence community but for anyone who wishes to use them. Both have become an important part of our everyday lives and behavior, and increasingly a part of what we will do tomorrow.
