If you’re worried that current generative AIs are too nice and empathetic, you’ll be alarmed to learn that there is now a language model trained on the worst part of the internet: the Dark Web.
Amusingly called DarkBERT, this language model was trained exclusively on text from the Dark Web. The team behind it wanted to find out whether using the Dark Web as a dataset would give an AI better context for the language used in that part of the internet, making it more valuable both to people looking to scan the Dark Web and to law enforcement combating cybercrime. The team reported their findings in a preprint article that has yet to be peer-reviewed.
In addition, the team extensively scanned a place most people don’t really want to go to and created an index of various domains.
The Dark Web is an area of the internet that Google and other search engines do not index, so it is not easy for the vast majority of people to navigate. It can only be accessed using specialized software such as Tor, and so there are many rumors (and truths) about what’s going on out there. While urban legends speak of torture chambers, hitmen, and all sorts of horrific crimes, in reality much of this space is just scams and other ways to steal your data, without the browser protections we all take for granted. Still, the Dark Web is known to be used by cybercrime networks to communicate anonymously, making it an extremely important target for cyber law enforcement.
A team from South Korea connected a language model to Tor to crawl the Dark Web and retrieve the raw data it found, creating a model that could better understand the language used there. Once training was complete, they compared its performance against existing models that researchers had previously created, including RoBERTa and BERT.
The findings presented in the preprint showed that DarkBERT outperformed the others on all datasets, though only by a narrow margin. Since all of the models are built on a similar framework, they are expected to perform similarly, but DarkBERT had the edge on Dark Web text.
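To make "better on every dataset, but only just" concrete, here is a minimal Python sketch of how such a comparison could be checked. The dataset names and scores are hypothetical placeholders chosen for illustration; they are not figures from the preprint, and the helper functions are not the team’s evaluation code.

```python
from typing import Dict

# model name -> dataset name -> score (higher is better)
Scores = Dict[str, Dict[str, float]]


def wins_everywhere(scores: Scores, candidate: str) -> bool:
    """Return True if `candidate` has the strictly highest score on every dataset."""
    return all(
        scores[candidate][dataset] > scores[other][dataset]
        for dataset in scores[candidate]
        for other in scores
        if other != candidate
    )


def smallest_margin(scores: Scores, candidate: str) -> float:
    """Return the narrowest gap between `candidate` and its best rival on any dataset."""
    return min(
        scores[candidate][dataset]
        - max(scores[other][dataset] for other in scores if other != candidate)
        for dataset in scores[candidate]
    )


# Hypothetical placeholder numbers, NOT results reported in the preprint.
example_scores: Scores = {
    "DarkBERT": {"dataset_a": 0.90, "dataset_b": 0.85},
    "RoBERTa": {"dataset_a": 0.88, "dataset_b": 0.84},
    "BERT": {"dataset_a": 0.87, "dataset_b": 0.83},
}

if __name__ == "__main__":
    print(wins_everywhere(example_scores, "DarkBERT"))
    print(round(smallest_margin(example_scores, "DarkBERT"), 3))
```

A model can pass the first check while the second returns a very small number, which is the situation the preprint describes: a consistent win, but a close one.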
As for the purpose of DarkBERT, the team expects it to be a powerful tool for scanning the Dark Web for cybersecurity threats and for monitoring forums to detect illegal activity.