Hate Speech, Digital Toxicity & Cyber Vulnerability

10 min read · Mar 15, 2021

Nobody can deny that humanity is continually swamping our planet with highly toxic waste and garbage. Yet most people don’t understand that, just as we are polluting our planet, we are swamping the internet with toxic communication, most of which will be accessible forever. However, out of this darkness, a positive side is emerging, with genuinely concerned people and companies stepping up and using various means to try to counter it. But how much will it take to truly achieve a sustainable “digital green peace”?

“Various flavors of toxic stuff can be found all over Facebook, from bullying & child trafficking, to rumors, hate, & fakery. FB has invested heavily in measures to control this, & mainly outsourced its content moderation to a small army of reviewers in contract shops around the world. But content moderators can’t begin to weed through all the harmful content, & the traffickers of such stuff are constantly evolving new ways of evading them” [1]

Can AI be Our Saviour?

“With social media users numbering in the billions, all hailing from various backgrounds & bringing diverse moral codes to today’s wildly popular platforms, a space for hate speech has emerged. Internet service providers [and other corporations] have responded by employing AI-powered solutions to address this insidious problem” [2]

Taking Facebook as an example, the corporation is placing its bets on artificial intelligence designed to detect misinformation and hate speech on its platform. This AI has been put to work on hundreds of servers in its data centers. Whenever a new post is made, intricate neural networks that have been trained to recognize toxic content are alerted. These networks then have to ascertain whether or not the content (be it nudity, bullying, hate speech, misinformation, and so on) violates FB’s community guidelines. Of note, although a substantial percentage of the suspected content is passed on to human moderators to decide on further action, in some cases artificial intelligence can act alone, analyzing the content and, where warranted, removing it [1]. So is AI making progress?

“According to a Pew Research Center report, 79% of Americans say that online service & social network providers are responsible for addressing harassment. In Germany, companies may face a fine of up to 50 million euros if they fail to remove within 24 hours illegal material, including fake news and hate speech” [2]

Thus far, reports show that, while it has not completely tackled the plethora of toxic issues, FB has in fact made a great deal of progress with its AI detection tools and natural language processing. In the second quarter of 2020, it stated that it “took down 104.6 million pieces of content (excluding spam) that violated its community standards. It removed 22.5 million pieces of hate speech alone from Facebook in the second quarter, compared to 9.6 million in the first quarter, and compared to just 2.5 million hate posts two years ago” [1]. A massive jump indeed, but are more people posting, and how many more bad actors have joined the toxic hate, bullying, nudity, and misinformation brigade?

Moreover, FB’s future is all about multimedia and images. However, the reality is that: “hateful and dangerous messages may lie in the midst of videos or encoded in memes. And so far, the breakthroughs the company has seen in its natural language AI have not transferred over to similar progress in its computer vision AI’s ability to detect such content” [1].

The Birth of Cybersecurity

“One of the hacks that was created in 1969, was created to act as an open set of rules to run machines on the computer frontier. It was created by two employees from the Bell Lab’s think tank. In the 1970s, the world was primed & ready for hackers. For hackers, it was all about exploring and figuring out how the wired world worked. This is the year that hacking went from being a practical joke to serious business” [3]

The cyber security methodologies and paradigms that were originally designed to take action against hacks have since gone through a period of evolution. But it was not always like this, particularly at the start of the internet. As a matter of fact, the majority of systems were hacked, and this caused considerable damage. Yet nobody knew how to identify a hack, how it should be dealt with, and most importantly, how companies and individuals should try to defend themselves against it.

Of note, the forefathers of modern cyber security (back then they were proud to be called security experts) had to fortify their systems by using permissions. In other words, not everyone was allowed to use the system. This represented one of the first instances of cyber security’s white and black list concept.

So did this work for them? Well NO…

As you can imagine, hackers were more sophisticated than that. They easily bypassed the white and black list, and ever since that time, it has been a non-stop cat and mouse chase between the defenders and the hackers. However, there is one thing that the defenders learned from this first encounter: if they wanted to defend their system against the right risks, and truly understand their impact (that is to say, the level of risk), then they should have someone with top hacking capabilities on their own team. And so it was that the concept of the blue team (the defenders) and the red team (the attackers) came into being, and ever since, they have worked together to defend the systems.

The Age of Toxicity

While cybersecurity experts were busy fighting hackers and cyber terrorists, a whole new world of threats started to explode under society’s radar. Indeed, this has potentially made the current digital toxicity issue bigger than that of hacking and its monetary value. The instigators do not need to know how to hack into computers; they simply need access to the internet.

This digital toxic threat takes the form of noxious communication, which everybody knows as hate speech, bullying, shaming, and so on. It seems that penetrating people’s lives and souls is much easier than penetrating a banking system. You do not need to have any professional skills for that. And the sad fact of the matter is that, in most instances, the damage done when people are driven to suicide, self-harm, or exposing themselves is harder to repair than the theft of some of their money or the misuse of their personal data.

George Santayana rightfully said that “Those who cannot remember the past are condemned to repeat it.”

Section 230

With regard to the internet, 1996 was a good year for UGC (user generated content). Section 230, a piece of Internet legislation in the United States, passed into law as part of the Communications Decency Act (CDA). It stated that: “No provider or user of an interactive computer service shall be treated as the publisher or speaker of any information provided by another information content provider,” thus removing the liability from internet publishers. Yet no publisher likes to see profanity on their site or application: it hurts the brand and causes users a certain amount of stress.

When I thought about the first person who was tasked with cleaning profanity from an internet system, I looked for an answer from a cyber elder. I imagined him responding in a Yodaish fashion: “Let’s look into the black and white list you should…” And this is what he did.

Keyword Spotting — the New Black & White

Keyword spotting, more commonly known as “using dictionaries” or “using dictionary based solutions,” relies on a list of bad words: if any of them are used on a forum, on a site, or in a comments section, they are removed. A good example is the word fuck, which can simply be written as f u c k, f$$k, fak, or any other form of this word. Now think of a thousand words and their possible variants, and multiply this by tens of languages, and voila, you have it all…
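To make the variant problem concrete, here is a minimal Python sketch of a dictionary based filter. It is an illustration only, not any vendor’s implementation: the `BLOCK_LIST`, the `SUBSTITUTES` table, and the function names are all assumptions made for this example.

```python
import re

# Hypothetical block list; real dictionaries hold thousands of entries per language.
BLOCK_LIST = ["fuck", "fak"]

# Assumed substitution table: symbols that commonly stand in for a letter.
SUBSTITUTES = {"a": "a@4", "c": "c*$", "e": "e3", "i": "i1!", "o": "o0", "u": "u*$"}

# Anything that is not a letter or digit may sit between two letters ("f u c k").
SEPARATOR = r"[\W_]*"


def compile_entry(word: str) -> re.Pattern:
    """Build a regex that tolerates separators and symbol substitutions."""
    classes = ["[" + re.escape(SUBSTITUTES.get(ch, ch)) + "]" for ch in word]
    body = SEPARATOR.join(classes)
    # The lookarounds stop an entry from matching inside a longer word.
    return re.compile(r"(?<![a-z0-9])" + body + r"(?![a-z0-9])", re.IGNORECASE)


PATTERNS = {word: compile_entry(word) for word in BLOCK_LIST}


def find_hits(text: str) -> list:
    """Return the block-list entries that appear in the text."""
    return [word for word, pattern in PATTERNS.items() if pattern.search(text)]
```

With this sketch, `find_hits("f u c k")` and `find_hits("f$$k")` both return `["fuck"]`, but the phonetic spelling “fak” is only caught because it was added as a dictionary entry of its own. Every new spelling means another entry or another substitution rule, which is exactly how these lists balloon.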

It is important to note here that dictionary based solutions have a major drawback, in that they have no context. As a result, the false positives that come out of working with these solutions can swamp any person or organization that is trying to moderate profanity in user generated content. And profanity (blasphemous or obscene language) is just the tip of the iceberg…

An Example of a Potential Misunderstanding

Person A: “OMG, I’m gonna kill myself. Can’t do this any more.”

Person B: “What happened? Why the drama?”

Person A: “I have an exam tomorrow, and I’m still procrastinating. I’m going to die…”

Most systems that look for self-harm and suicidal signals (based on dictionaries) will flag this conversation as a red alert for emergency intervention… So how can UGC publishers know what the status of their systems is?
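The false alarm is easy to reproduce. The sketch below runs the conversation above through a phrase list; the phrases in `SELF_HARM_PHRASES` and the function name are assumptions chosen for illustration, not taken from any real product.

```python
# Hypothetical phrase list for a dictionary based self-harm detector.
SELF_HARM_PHRASES = ("kill myself", "going to die", "can't do this any more")


def flag_message(message: str) -> list:
    """Return every listed phrase found in the message, ignoring all context."""
    lowered = message.lower().replace("’", "'")  # treat curly and straight apostrophes alike
    return [phrase for phrase in SELF_HARM_PHRASES if phrase in lowered]


conversation = [
    ("A", "OMG, I'm gonna kill myself. Can't do this any more."),
    ("B", "What happened? Why the drama?"),
    ("A", "I have an exam tomorrow, and I'm still procrastinating. I'm going to die..."),
]

# Each message is judged in isolation, so the exam never enters the picture.
alerts = [(speaker, flag_message(text)) for speaker, text in conversation if flag_message(text)]
```

Both of Person A’s messages end up in `alerts`, even though Person A’s second message shows that this is ordinary exam stress. A filter that only sees phrases has no way to use that explanation.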

Testing Your System

Security experts use the help of ethical hackers (red teams), who try to break into their systems by using automated tools to ‘inject’ damaging content. Similarly, I suggest that UGC publishers use the same kind of tools to inject ‘damaging’ content themselves, so that they can ascertain how their system copes with it.

Stimulus & Response

In order to kickstart this activity, publishers require the following:

Categories

Publishers should choose the category of toxic phenomenon (for instance, profanity, hate speech, sexual solicitation etc.), that they want to check on their digital communications.

Define

These phenomena need to be defined (for example, what are hate speech communications?).

Dataset

Find the right dataset for testing. Generally speaking, a few example sentences from the founders could form a basic dataset to begin these tests.

Policy

An understanding of what the organization’s policy will be in regard to digital toxic communications (should they be found in the publisher’s system).
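As a rough sketch, these four ingredients can be captured in a small test plan and replayed against whatever detector the publisher already runs. Everything here is hypothetical: the class and function names, the labels on the example sentences, and the toy keyword detector are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LabeledExample:
    text: str
    is_toxic: bool  # label assigned by a human reviewer (assumed here)


@dataclass
class InjectionPlan:
    category: str                   # which toxic phenomenon is being checked
    definition: str                 # what counts as that phenomenon
    dataset: List[LabeledExample]   # sentences to inject
    policy: str                     # what should happen when one is found


def run_injection_test(plan: InjectionPlan, detector: Callable[[str], bool]) -> dict:
    """Inject every example and report what the detector missed or wrongly flagged."""
    missed = [ex.text for ex in plan.dataset if ex.is_toxic and not detector(ex.text)]
    false_alarms = [ex.text for ex in plan.dataset if not ex.is_toxic and detector(ex.text)]
    return {"category": plan.category, "missed": missed, "false_alarms": false_alarms}


plan = InjectionPlan(
    category="personal insult",
    definition="a strongly negative statement aimed at an individual",
    dataset=[
        LabeledExample("You are disgusting", True),
        LabeledExample("Justin Bieber is trash", True),
        LabeledExample("This soup is disgusting", False),  # invented benign control
    ],
    policy="reject this message",
)

# Toy stand-in for the publisher's real system: a single-keyword filter.
report = run_injection_test(plan, lambda text: "disgusting" in text.lower())
```

In this toy run the keyword filter misses “Justin Bieber is trash” and raises a false alarm on the soup, so `report` shows a gap in both directions. Swapping in the real moderation system for the lambda gives the publisher the same picture for their own platform.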

In order to assist with this, I will demonstrate how we see this at L1ght. This example will focus on one category, which in this case is personal insult.

Definition

· The text clearly has a derogatory or negative meaning. That is to say, it bears a STRONG emotional negative statement that conveys a clear derogatory meaning towards an individual.

· For a text to count as a personal insult, the speaker must intend it to insult another human being.

Dataset

· “Justin Bieber is trash”

· “Dan you piece of shit”

· “You motherfucking shit”

· “Dafna is disgusting”

· “Dafna is a disgusting Jew”

· “You are disgusting”

· “You are a disgusting Jew”

· “Your mother is disgusting”

Policy

The policy may be to:

· Reject this message

· Reject this message if it is the 3rd time that this user has sent PI content

· Reject user for a certain period of time

· Other
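A policy like the second option above is simple to express in code. The following is a minimal sketch, assuming that a separate detector has already decided whether a message is a personal insult (PI); the class name, the return values, and the choice to keep rejecting after the third strike are assumptions for illustration.

```python
from collections import defaultdict


class StrikePolicy:
    """Reject PI messages once a user reaches a set number of PI messages."""

    def __init__(self, max_strikes: int = 3):
        self.max_strikes = max_strikes
        self.strikes = defaultdict(int)  # user id -> PI messages seen so far

    def decide(self, user_id: str, is_personal_insult: bool) -> str:
        if not is_personal_insult:
            return "allow"
        self.strikes[user_id] += 1
        # Assumption: the third PI message and every later one is rejected.
        if self.strikes[user_id] >= self.max_strikes:
            return "reject"
        return "allow"


policy = StrikePolicy()
decisions = [policy.decide("user-1", True) for _ in range(4)]
```

Here `decisions` is `["allow", "allow", "reject", "reject"]`. Rejecting the user for a certain period would add a timestamp to each strike, but the decision point stays the same.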

Dictionary based solutions are easy to develop and maintain, and some solutions, particularly for English (only), are readily available. Yet these kinds of solutions only solve part of the problem, and they generate huge numbers of false alarms. This has triggered a paradox: as you become more and more successful as a UGC publisher, your users also produce more and more digital toxicity. This means that you need more and more human moderators to control it, and thus you are forced to spend more and more money on moderation, when it should be going into building your business. Furthermore, when you try to control this toxicity by implementing a keyword based solution, you are faced with a second paradox: being swamped with false positives.

Cyber Security Paradigms That Came After the White & Blacklist Era

Cyber security experts were among the first to recognize the ability of machine learning to solve some of their problems. The daily challenge of handling unlimited barrages of unclassified threats, each of which had to be assessed for its real risk, brought cyber experts to their knees. For these experts, both the context of an alert (where it came from, who the originator was, etc.) and its history (what communications came before and after) were easy to obtain, so machine learning tools proved very effective at solving complex issues and establishing user identity. To that end, as we move along the 2020s, and need to win the massive battles against sophisticated hacker attacks, archaic and non-relevant white and black list concepts need to be discarded.

New Hope

Just as in the realm of cyber security, digital toxicity detection and its innovative prevention methods have rapidly evolved over the past few years. Now, there is a revolution in digital toxicity: context based solutions that are heavily dependent on deep learning, constant moderation, and retraining are transforming the formerly bleak outlook into one of hope. AI moderation tools for image, video, text, and even audio are becoming more and more popular for handling the masses of toxic communications. Further, human feedback loops can generate retraining cycles based on manual human moderation, and these can dramatically improve algorithm accuracy…
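One way to picture such a feedback loop is as a queue of moderator corrections that triggers a retraining cycle once enough of them have piled up. The sketch below is only a schematic under stated assumptions: the `FeedbackLoop` class, the `batch_size` threshold, and the `retrain` hook are hypothetical, and a real training pipeline is far more involved.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class FeedbackLoop:
    # Hypothetical training hook: receives (text, human_label) pairs.
    retrain: Callable[[List[Tuple[str, bool]]], None]
    batch_size: int = 3
    corrections: List[Tuple[str, bool]] = field(default_factory=list)

    def record(self, text: str, model_says_toxic: bool, human_says_toxic: bool) -> bool:
        """Store a moderator decision; return True if a retraining cycle ran."""
        if model_says_toxic == human_says_toxic:
            return False  # the model was right, nothing to learn from
        self.corrections.append((text, human_says_toxic))
        if len(self.corrections) < self.batch_size:
            return False
        self.retrain(list(self.corrections))
        self.corrections.clear()
        return True
```

Only disagreements between the model and the human moderator are kept, since those are the cases the model currently gets wrong. In practice the trigger might be time-based or volume-based rather than a fixed batch size.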

About the Author

Ron is the co-founder and Chief Technology Officer of L1ght, a company that has created cutting-edge technology designed to save children’s lives by detecting online toxicity and dangers to children (e.g. shaming, bullying, self-harm, pedophiles, predators, and more). Prior to founding L1ght, Ron founded, headed, and sold several flourishing high-tech companies in the cyber security arena.

www.i@ronporat.com

About L1ght

L1ght is an AI-based startup founded in 2018, with a goal to eradicate online toxicity, and ensure that the Internet is used for what it was originally intended to be — for connecting people, sharing ideas, and driving humanity forward. By building algorithms that detect and predict toxic and abusive behaviors, L1ght serves as a state-of-the-art solution for social networks, search engines, gaming platforms, and hosting providers, empowering them to identify and eradicate cyberbullying, harmful content, hate speech, and predatory behavior. L1ght is based in Tel Aviv, San Francisco and Boston, but its mission is a global one.

www.l1ght.com

References

[1]. Sullivan, M. (2020). “Facebook’s AI for detecting hate speech is facing its biggest challenge yet.” FastCompany.

How Facebook built an AI that can detect hate speech (fastcompany.com)

[2]. Budek, K. (2019). “How artificial intelligence can fight hate speech in social media.”

How artificial intelligence can fight hate speech in social media — deepsense.ai

[3]. UFL Education (N.d.). “The History of Hacking.”

History of Hacking (ufl.edu)


Written by Ron Porat
