What's the Value of Beacon's Data Sources?

This article describes the value of Beacon's data sources. Note that some sites, particularly on the dark web, are constantly changing as they are dismantled or launched.

Navigate to data source descriptions by category. Note that these categories align with each tab in Beacon's search results view:

Dark Web

Deep Web

Social

Documents

Breaches

 


Dark Web Sources

The dark web (darknet) refers to any online content that is intentionally hidden or anonymized. It is made up of unindexed websites and networks that can only be accessed through special software such as the Tor browser. Because the dark web gives users anonymity, it is a breeding ground for illicit activity, ranging from discussions to the sale of illegal substances and services.

Tor Sites

Tor grew out of onion routing research at the U.S. Naval Research Laboratory in the 1990s, which aimed to enable secure government communications. The name is derived from an acronym for the original software project name, "The Onion Router." Tor is the most well-known and widely used network on the dark web.

A Tor user’s internet traffic is routed through the Tor network and passes through several randomly selected relays before exiting. Each relay removes one layer of encryption and learns only the previous and next hop, which makes it very difficult to determine which computer originally requested the traffic. These nested layers of encryption are the “onion” in the name, and they are what give users anonymity.
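The layering idea can be illustrated with a toy sketch. This is not Tor's actual cryptography: a single-byte XOR stands in for real encryption, and the relay names, keys, and destination below are hypothetical. It only shows how a sender might wrap a message in one layer per relay so that each relay learns nothing but the next hop.

```python
# Toy illustration of onion-style layering. NOT real cryptography:
# a single-byte XOR stands in for the per-relay encryption Tor uses.

def xor_layer(data: bytes, key: int) -> bytes:
    """Apply or remove one toy layer (XOR is its own inverse)."""
    return bytes(b ^ key for b in data)

def build_onion(message: bytes, path: list[tuple[str, int]], destination: str) -> bytes:
    """Wrap the message once per relay, working backwards from the exit.

    `path` is a list of (relay_name, key) pairs in traversal order.
    Each layer carries only the name of the next hop.
    """
    next_hop = destination
    onion = message
    for name, key in reversed(path):
        onion = xor_layer(next_hop.encode() + b"|" + onion, key)
        next_hop = name
    return onion

def peel(onion: bytes, key: int) -> tuple[str, bytes]:
    """What one relay does: remove its layer, read the next hop, forward the rest."""
    plain = xor_layer(onion, key)
    next_hop, _, inner = plain.partition(b"|")
    return next_hop.decode(), inner

# Hypothetical three-relay circuit.
path = [("guard", 0x21), ("middle", 0x42), ("exit", 0x63)]
onion = build_onion(b"hello", path, "example.com")
```

In this sketch the first relay sees who sent the onion but only learns that the next hop is "middle"; only the last relay learns the destination, and it never sees the original sender.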

Tor sites, which use .onion as their top-level domain, consist largely of discussion forums, marketplaces, and news/commentary sites. These sites are useful for finding criminal planning discussions and users selling illegal goods and services.

Some active Tor sites include:

Empire Market: one of the most active dark web marketplaces. Users can find a variety of illegal goods and services for sale, from narcotics to financial accounts and fraud services.

The Canadian Headquarters: a marketplace for selling a range of goods and services, from illicit substances to financial fraud tools. Many of these are specific to Canadian organizations, such as Canadian financial institutions.

Hydra: a marketplace (similar to Empire Market) catering to Russian users.

Empire Market Forum: a discussion forum for Empire Market users. Topics include fraud how-to’s, trafficking strategies, recruiting, and more.

The Daily Stormer: a news/commentary site catering to white supremacists, antisemites, and neo-Nazis. It was founded in July 2013 and moved to the dark web in August 2017. The site is known for internet trolling and organizing harassment campaigns. For example, it was used to help organize the “Unite the Right” rally in Charlottesville, Virginia, in 2017.

Dread: a forum similar to Reddit. The site contains sub-communities and is known for discussions about how to make illegal substances, recommended dealers, and which other Tor sites are run by scammers or have been dismantled.

ZeroNet

ZeroNet is a peer-to-peer network launched in 2015. Every network peer acts as a server, making it decentralized and resistant to censorship. ZeroNet is not inherently anonymous, but users can achieve anonymity by routing its traffic through Tor. It’s also open-source; any user can clone sites within ZeroNet and create their own versions.


Some ZeroNet sites include:

0rc: a decentralized messaging forum with topics related to Internet Relay Chat (IRC).
KopyKate Big: a video-sharing site—like a decentralized version of YouTube or Vimeo.
Millchan: a decentralized imageboard site similar to 4chan.
Peeper: a microblogging site similar to Twitter.
Play: a BitTorrent site where users can access links to copyrighted movies.

I2P (Invisible Internet Project)

I2P is an anonymizing network that focuses on secure internal connections and user communication rather than exchanging goods. Its primary function is to be a “network within the internet” with traffic contained within its borders. In the I2P network, hosted websites are known as “eepsites” and have .i2p as their top-level domain.
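Because Tor sites use .onion and I2P eepsites use .i2p as their top-level domain, the network a link belongs to can often be inferred from its hostname. The sketch below is a minimal illustration of that idea; the function name, mapping, and sample URLs are hypothetical and are not part of Beacon.

```python
from urllib.parse import urlsplit

# Top-level domains named in this article, mapped to their networks.
NETWORK_BY_TLD = {"onion": "Tor", "i2p": "I2P"}

def classify_network(url: str) -> str:
    """Return "Tor", "I2P", or "other" based on the hostname's top-level domain."""
    # urlsplit only recognizes a hostname after "//", so add it for bare hosts.
    if "//" not in url:
        url = "//" + url
    host = urlsplit(url).hostname or ""
    tld = host.rsplit(".", 1)[-1].lower()
    return NETWORK_BY_TLD.get(tld, "other")
```

For example, `classify_network("forum.i2p")` returns `"I2P"`, while a surface web or deep web address falls through to `"other"`.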

Some I2P sites include:

Bigbrother: a news forum
Sigterm: a discussion forum for privacy and security-related topics
Echelon: a discussion forum for topics related to I2P software
Forum: the oldest and most active forum on I2P
Id3nt: a microblogging site
Anch: a Russian-language imageboard site for anarchists
Theanondog: articles related to politics, security, and revolts
Leecher: for downloading popular TV shows

OpenBazaar

OpenBazaar is a decentralized, open-source marketplace launched in 2016. The network’s goal is to avoid the “middleman” involved in surface web commerce. Buyers and sellers on OpenBazaar use cryptocurrencies and engage directly to avoid fees associated with typical payment methods like PayPal. Users can also share as little or as much personal information as they want. There are over 20,000 sellers on OpenBazaar with user activity across 150 countries.

OpenBazaar is not inherently anonymizing but can be accessed through Tor if users desire anonymity. The network does not cater to illicit exchanges, and the bulk of its transactions are not illegal. However, because it is decentralized, OpenBazaar has no way to accurately track or deal with illegal activity. 

Illegal OpenBazaar listings are not indexed and are not always accessible by search engines within the marketplace. On the darknet, marketplace moderators assume the risk of running a site—but OpenBazaar users run software on their own computers, so each user assumes the risk of engaging in illicit transactions.


Deep Web Sources

The deep web includes websites and data that are non-discoverable by conventional surface web search engines like Google and Bing. Beacon indexes a variety of deep websites and chat applications.

The term “deep web” is not interchangeable with “dark web.” The deep web includes the dark web, as well as password-protected or dynamic pages, encrypted networks, and internet archives. Much of the deep web’s content requires authentication to view, such as private banking pages, email accounts, and direct messages on social media. The deep web is estimated to be at least 400-500 times the size of the surface web.

Deep Websites

Considering the size of the deep web, it would be impossible to list every site. Here are a few examples that are particularly useful for threat intelligence.

Craigslist: a classifieds site used for hosting discussion forums and advertising goods, services, housing, and employment. Scams and sales of counterfeit or stolen goods are common on Craigslist.


LeoList: a Canadian classifieds site frequently used by sex workers. It has been linked to human trafficking cases.


Telegram

Telegram is a cloud-based instant messaging, voice, and video messaging service similar to WhatsApp. It’s considered to be one of the most secure messaging apps for several reasons:

  • Chats can be destroyed when the conversation ends, or be automatically deleted with a self-destruct timer.
  • Telegram boasts three layers of encryption, as opposed to the typical two layers touted by other messaging apps.

Telegram allows groups with up to 200,000 members each. Groups can be public or private. Groups differ slightly from channels, which have no limit on the number of users in them. When users post in a channel, their identities aren’t shown. Only the name and avatar of the channel are revealed in this public format.

Telegram also offers a public API, which lets individuals create games, get alerts, create data visualizations, build custom tools, and even exchange payments between users. API access also means that many of the conversations in public channels are discoverable by organizations gathering open source intelligence from online sources.

With over 200 million active users, it is no surprise that Telegram is a popular place to hold discussions about illegal activity. There have been many reports of phishing scammers using Telegram as their method of contact with victims.


Discord

Discord is a voice over IP (VoIP) and messaging program. Discord’s user interface looks like a cross between Skype and Slack. It’s free to use and is available as a web, mobile, and desktop application.

Within Discord, users can create their own servers and host private, password-protected, or public channels within those servers. Channel moderators then send invitations to users so that they can join a channel. Although it was originally built for the gaming community, its versatile chat, video, and voice capabilities had drawn in a diverse mix of 200 million users as of December 2018.

Discord has been criticized for being vulnerable to attacks from cybercriminals, and the privacy and security of the platform have often been called into question. Beyond security issues, conversations on Discord have evolved to include adult, narcotics-related, and other NSFW (Not Safe For Work) content. Discord is linked to discussions about illegal activity as well as the alt-right movement. In August 2017, it was found to have been used as a planning tool for organizing the “Unite the Right” rally in Charlottesville, VA.


IRC (Internet Relay Chat)

IRC is an instant messaging protocol designed for large numbers of users to communicate in real time. It was created in 1988 and has declined in popularity since 2003 as more users move to social media platforms and other messaging tools. IRC still has close to 500,000 active users and 250,000 channels.

Users require a client to connect to a server on one of the IRC networks. There are over 800 active IRC networks, ranging from popular networks with over 10,000 users to smaller networks associated with specific locations or topics. IRC has been associated with illegal file trading, denial of service (DoS) attacks, and trojan/virus infections.
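To make the client-server step concrete, the sketch below builds the first lines a client typically sends after opening a connection, using the NICK, USER, and JOIN commands from the IRC client protocol (RFC 2812). The nickname, channel, and real name are hypothetical, and no network connection is opened; this is an illustration rather than a working client.

```python
def registration_lines(nick: str, channel: str, realname: str = "Example User") -> list[bytes]:
    """Build the raw lines a client sends to register and join one channel.

    IRC messages are plain text terminated by CR-LF.
    """
    lines = [
        f"NICK {nick}",                   # choose a nickname
        f"USER {nick} 0 * :{realname}",   # username, mode, unused field, real name
        f"JOIN {channel}",                # join a channel once registered
    ]
    return [(line + "\r\n").encode("utf-8") for line in lines]
```

A real client would write these bytes to a socket connected to a server on the chosen network and would also need to answer the server's PING messages to stay connected.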

IRC isn’t inherently designed for anonymity. Users must use a VPN or access IRC through Tor to achieve anonymity.

Some IRC networks include:

EFnet: The original IRC network’s “descendant.” It’s associated with illegally copied software, hackers, and DoS attacks.

Freenode: Peer technical support for free software and open source projects.

Undernet: One of the largest IRC networks with close to 1 million active weekly users.

Social Sources

Beacon's social sources include a mixture of networks on the dark web (8kun, Endchan) and the deep web (4chan, Gab, Mastodon, Raddle, Reddit).

Many of these social networks were created by communities who were banned from more mainstream, regulated social networks for engaging in harmful or nefarious discussions. This means that they tend to have a more decentralized model where users are anonymized and content is less moderated by a central body.


2channel.moe

2channel.moe is an anonymous Russian imageboard site that functions similarly to 4chan. The site is hosted on the dark web, as it is blocked in Russia. Users post about a variety of topics—including politics, sharing Tor links, and technology discussions.

The site requires users to be over 18 and avoid any method of self-identification or attention-seeking. It also prohibits child pornography, propaganda for the use of narcotic drugs, discussion of suicide, flood, spam, and "disrespect for interlocutors" (translation). Although there are no public statistics about user numbers, there were 124,000 total posts at the time of writing this article.


4chan

4chan is an imageboard site covering a wide range of topics. Like many imageboards, the site is populated with subcultures and controversial activist groups, such as the alt-right. 4chan threads also expire from the site after a short amount of time, which sets its content structure apart from other imageboards.

8kun

8chan (8kun’s predecessor) was an anonymous, extremist-friendly forum that was a breeding ground for bigotry and violence. After three 8chan-linked mass shootings, the popular board was taken offline in August 2019. In November 2019, the board was controversially rebranded as 8kun.

8kun is nearly identical to 8chan, prompting many of 8chan’s users to flock to the new alt-right haven, notably followers of QAnon. QAnon is a conspiracy theory movement that alleges a secret plot against President Donald Trump; its followers claim that only a site structured like 8chan or 8kun allows them to communicate reliably.

Users are presently rebuilding 8kun into a radical board similar to its precursor, transferring their boards from the offline 8chan and re-establishing their extremist community.


Endchan

Endchan is an imageboard or "chan" site similar to other well-known imageboards such as 4chan and the former 8chan. The site describes itself as "an anonymous imageboard that promotes ideas over identity." Although its user base isn't as large as that of more popular chan sites like 4chan, its purpose is similar: to provide an anonymous platform that promotes free expression. Assuming the site's demographics are roughly similar to 4chan's, its user base is largely American males between 18 and 34.

A lot of content on Endchan is harmless, ranging from news commentary to software discussions. However, like other chan sites, Endchan is populated with users either migrating from other chan sites taken offline (e.g., 8chan) or banned from more heavily moderated or centralized platforms for expressions of hate speech or extreme ideologies. Endchan is linked to the 2019 Bærum mosque shooting in Norway, where a user posted suggestive content referencing Brenton Tarrant shortly before opening fire in the mosque.


Gab

Gab is a popular social networking site described as a "safe haven" for extremists including the alt-right, white supremacists, and neo-Nazis. The platform is operated by the people, for the people: it lacks overarching moderation, and posts are instead "upvoted" to increase visibility. Gab stands by its mission of "defending free expression and individual liberty online for all people" by welcoming social media users banned from other platforms for violating their terms of service.

Mastodon

Mastodon is a free, open-source microblogging software with a Twitter-like interface. Users post public or private 500-character messages known as “toots.”

Mastodon is a decentralized, federated network. This means that instead of operating as one website and storing its data in one place, it distributes data across thousands of websites and servers around the world. Mastodon subnetworks or servers, known as “instances,” each host distinct content types and communities with their own terms of service and codes of conduct. Users can interact across instances or block content from instances with policies and content they disagree with.
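Because every Mastodon account belongs to a particular instance, full handles take the form @user@instance. The sketch below is a minimal illustration of how a tool might split such a handle and check it against a list of blocked instances; the function names, handles, and instance domains are hypothetical.

```python
from typing import NamedTuple

class Account(NamedTuple):
    username: str
    instance: str

def parse_handle(handle: str) -> Account:
    """Split a full handle such as "@alice@mastodon.example" into its parts."""
    username, sep, instance = handle.lstrip("@").partition("@")
    if not (username and sep and instance):
        raise ValueError(f"expected @user@instance, got {handle!r}")
    # Domain names are case-insensitive, so normalize the instance part.
    return Account(username, instance.lower())

def is_blocked(handle: str, blocked_instances: set[str]) -> bool:
    """Return True if the account's home instance is on the block list."""
    return parse_handle(handle).instance in blocked_instances
```

For example, `is_blocked("@bob@Gab.example", {"gab.example"})` returns `True`, which mirrors how users or instance administrators can filter out content from instances whose policies they disagree with.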

The platform has "zero tolerance" policies for hate speech and harassment. This was complicated in July 2019 when Gab, known for its far-right user base, migrated to a Mastodon server. Even though Gab is technically hosted on Mastodon, it is recognized as a unique data source in the Echosec Systems Platform.

Raddle.me

Raddle is a web-based forum where users can post pictures, discussions, and links. Raddle has a mission statement "to give the people control over their own community, with full transparency and accountability between mods, admins and users." The network enables users to engage without user tracking, ads, profiling, or collection/sharing of data or IP addresses.

Raddle was created by users banned from certain subreddits and is known for hosting public discussions about illegal activities like shoplifting.


Reddit

Reddit is a popular social, news, content rating, and discussion website. Unlike 4chan, users have personal accounts that allow them to remain pseudonymous. Of all of Beacon’s social sources, Reddit is probably the most moderated for harmful content.

Reddit is divided into communities called subreddits, covering a huge variety of topics. Within subreddits, users create relationships with other like-minded individuals and contribute to distinct internet cultures, like “TodayILearned” or “LifeHacks.”


 

 

Documents Sources

Documents sources include popular paste sites. Paste sites allow users to publicly share blocks of plain text, and are often used by adversaries to share leaked information. Beacon currently crawls three documents sources: Pastebin, DeepPaste, and PasteFS.

Pastebin

Pastebin is used for sharing blocks of plain text—most commonly for developers to share blocks of code. It was launched in 2002 to allow IRC users to link source code rather than pasting in blocks of text that could interrupt the flow of a discussion. The site currently has 17 million unique monthly users that can write posts (called “pastes”) anonymously.

The site is user-friendly, supports large text files, doesn’t require user registration, enables user anonymity, and relies on its user base for moderation. This makes it a popular site for threat actors publicly exposing breached data, including PII, doxxes, and stolen source code.

DeepPaste

DeepPaste is a dark website that allows users to publish plain text anonymously. It is similar to other paste sites in its basic functionality as a text-sharing site. However, DeepPaste’s heightened anonymity on the Tor network means that the site is populated with nefarious content that would be removed from moderated open websites like Pastebin.

Examples of DeepPaste content include users sharing .onion links for other Tor sites, users offering illegal goods or services (e.g. financial fraud, ransomware, child pornography, human trafficking, narcotics), and personally identifiable information breaches (doxxing). Even though the site claims that child pornography is “not welcome,” site admins are prohibited from censoring or deleting content.

The site has a simple interface, organizing content by the latest public pastes and the top latest public pastes. Unlike Pastebin, users must be registered to create or comment on pastes. According to the site’s counter, DeepPaste gets about 400K-500K views per day.


PasteFS

PasteFS is a deep web paste site where users can publicly share text, images, or documents. This differentiates it from other paste sites, such as Pastebin, which only allow text sharing.

According to PasteFS’s Terms of Service, the site removes reported copyright violations and condemns spam, theft, and files containing malware/viruses, but it does not actively monitor content on its servers or take responsibility for content posted. Users can view trending and recent pastes and do not need an account to publish content.

Examples of PasteFS content include news sharing, NSFW content, and users offering illegal hacking services.


Breaches

Breached data sources include over 10 billion records (and growing) from publicly available leaked data repositories. This includes leaked data from specific company or website breaches, as well as breached data collections, such as credential stuffing lists. These repositories are continuously refreshed as new breaches are discovered.