BY KIM BELLARD
If you’ve been following artificial intelligence (AI) lately – and you should be – then you may have started thinking about how it’s going to change the world. In terms of its potential impact on society, it’s been compared to the introduction of the Internet, the invention of the printing press, even the first use of the wheel. Maybe you’ve played with it, maybe you know enough to worry about what it might mean for your job, but one thing you shouldn’t ignore: like any technology, it can be used for both good and bad.
If you thought cyberattacks/cybercrimes were bad when done by humans or simple bots, just wait to see what AI can do. And, as Ryan Heath wrote in Axios, “AI can also weaponize modern medicine against the same people it sets out to cure.”
We may need DarkBERT, and the Dark Web, to help protect us.
A new study showed how AI can create much more effective, cheaper spear phishing campaigns, and the author notes that the campaigns can also use “convincing voice clones of individuals.” He adds: “By engaging in natural language dialog with targets, AI agents can lull victims into a false sense of trust and familiarity prior to launching attacks.”
It’s worse than that. A recent article in The Washington Post warned:
That is just the beginning, experts, executives and government officials fear, as attackers use artificial intelligence to write software that can break into corporate networks in novel ways, change appearance and functionality to beat detection, and smuggle data back out through processes that appear normal.
The outdated architecture of the internet’s main protocols, the ceaseless layering of flawed programs on top of one another, and decades of economic and regulatory failures pit armies of criminals with nothing to fear against businesses that do not even know how many machines they have, let alone which are running out-of-date programs.
Health care should be worried too. The World Health Organization (WHO) just called for caution in use of AI in health care, noting that, among other things, AI could “generate responses that can appear authoritative and plausible to an end user; however, these responses may be completely incorrect or contain serious errors…generate and disseminate highly convincing disinformation in the form of text, audio or video content that is difficult for the public to differentiate from reliable health content.”
It’s going to get worse before it gets better; the WaPo article warns: “AI will give far more juice to the attackers for the foreseeable future.” This may be where solutions like DarkBERT come in.
Now, I don’t know much about the Dark Web. I know vaguely that it exists, and that people often (but not exclusively) use it for bad things. I’ve never used Tor, the software often used to keep activity on the Dark Web anonymous. But some clever researchers in South Korea decided to create a Large Language Model (LLM) trained on data from the Dark Web – fighting fire with fire, as it were. This is what they call DarkBERT.
The researchers went this route because: “Recent research has suggested that there are clear differences in the language used in the Dark Web compared to that of the Surface Web.” LLMs trained on data from the Surface Web were going to miss or not understand much of what was happening on the Dark Web, which is what some users of the Dark Web are hoping.
I won’t try to explain how they got the data or trained DarkBERT; what is important is their conclusion: “Our evaluations show that DarkBERT outperforms current language models and may serve as a valuable resource for future research on the Dark Web.”
They demonstrated DarkBERT’s effectiveness against three potential Dark Web problems:
- Ransomware Leak Site Detection: identifying “the selling or publishing of private, confidential data of organizations leaked by ransomware groups.”
- Noteworthy Thread Detection: “automating the detection of potentially malicious threads.”
- Threat Keyword Inference: deriving “a set of keywords that are semantically related to threats and drug sales in the Dark Web.”
On each task, DarkBERT was more effective than comparison models.
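For readers who want a feel for what a task like leak site detection involves, it is at heart a text classification problem: given the text of a page, decide whether or not it looks like a ransomware leak posting. The sketch below is not DarkBERT and not the researchers' method; it swaps in a tiny naive Bayes word-count classifier, and every snippet in `TRAIN` is invented for illustration rather than taken from any real site. The function names are likewise my own assumptions.

```python
import math
from collections import Counter

# Invented toy snippets for illustration only; NOT real Dark Web data.
TRAIN = [
    ("full database of acme corp leaked after ransom deadline passed", "leak"),
    ("company refused to pay so we publish their confidential files", "leak"),
    ("download stolen employee records and contracts published today", "leak"),
    ("forum rules please read before posting in this board", "other"),
    ("looking for advice on setting up a private mail server", "other"),
    ("weekly discussion thread about privacy tools and browsers", "other"),
]


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def train(examples):
    """Count words per label; returns (word_counts, doc_counts, vocab)."""
    word_counts = {}          # label -> Counter of word frequencies
    doc_counts = Counter()    # label -> number of training snippets
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest naive Bayes log-score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        # Log prior: share of training snippets carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # words never seen in training are ignored
            # Add-one (Laplace) smoothing so unseen pairs never give log(0).
            numerator = word_counts[label][token] + 1
            score += math.log(numerator / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(classify("confidential files of a company published after ransom", model))
print(classify("please read the forum rules about posting", model))
```

A word-count model like this only knows the exact words it has seen, which is the limitation the researchers are getting at: if the vocabulary of the Dark Web differs from the text a model was trained on, the model has little to go on.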
The researchers aren’t releasing DarkBERT more broadly yet, and the paper has not yet been peer reviewed. They know they still have more to do: “In the future, we also plan to improve the performance of Dark Web domain specific pretrained language models using more recent architectures and crawl additional data to allow the construction of a multilingual language model.”
Still, what they demonstrated was impressive. Geeks for Geeks raved:
DarkBERT emerges as a beacon of hope in the relentless battle against online malevolence. By harnessing the power of natural language processing and delving into the enigmatic world of the dark web, this formidable AI model offers unprecedented insights, empowering cybersecurity professionals to counteract cybercrime with increased efficacy.
It can’t come soon enough. The New York Times reports there is already a wave of entrepreneurs offering solutions to try to identify AI-generated content – text, audio, images, or videos – that can be used for deepfakes or other nefarious purposes. But the article notes that it’s like antivirus protection; as AI defenses get better, the AI generating the content gets better too. “Content authenticity is going to become a major problem for society as a whole,” one such entrepreneur admitted.
When even Sam Altman and other AI leaders are calling for AI oversight, you know this is something we all should worry about. As the WHO warned, “there is concern that caution that would normally be exercised for any new technology is not being exercised consistently with LLMs.” Our enthusiasm for AI’s potential is outstripping our ability to ensure that we use it wisely.
Some experts have recently called for an Intergovernmental Panel on Information Technology – including but not limited to AI – to “consolidate and summarize the state of knowledge on the potential societal impacts of digital communications technologies,” but this seems like a necessary but hardly sufficient step.
Similarly, the WHO has proposed its own guidance for Ethics and Governance of Artificial Intelligence for Health. Whatever oversight bodies, legislative requirements, or other safeguards we plan to put in place, they’re already late.
In any event, AI from the Dark Web is likely to ignore and try to bypass any laws, regulations, or ethical guidelines that society might be able to agree to, whenever that might be. So I’m cheering for solutions like DarkBERT that can fight it out with whatever AI emerges from there.
Kim is a former emarketing exec at a major Blues plan, editor of the late & lamented Tincture.io, and now a regular THCB contributor.