
Okay, we’re going to get asked about this, which is the only reason I’m going to cover it – because really, it’s impossible to examine such claims without any access to the data in question.

Today, Twitter tweeted out a new claim that ‘more than 99.99% of Tweet impressions are from healthy content, or content that does not violate our rules’.

As per Twitter:

“On Twitter people are free to be their true selves. Everyday, we work to preserve free speech on Twitter, while equally maintaining the health of our platform. Since the launch of ‘Freedom of Speech Not Reach’, we’ve seen encouraging results. That’s why in the coming weeks, we’re expanding the application of this enforcement action from our Hateful Conduct policy to now include our policies on Abusive Behavior and Violent Speech.”

Twitter’s ‘Freedom of Speech Not Reach’ approach, which it outlined back in April, essentially means that Twitter’s now looking to reduce the reach of some less severe violative content in the app, even though it technically breaks the platform’s rules, rather than removing it outright. Twitter has also added labels to these tweets, to clarify when such action has been taken.

Twitter violation labels

As Twitter notes, originally, this approach was only applied to tweets that would previously have been deemed in violation of its Hateful Conduct policy, but it’s now looking to expand this same systematic enforcement action to abusive and violent tweets as well.

To be clear, Twitter’s rules around such content haven’t changed, but its enforcement approach is different, in that previous Twitter management would have removed more of these types of comments outright – but now, Twitter’s taking a more lenient approach, by reducing their reach instead.

And Twitter says that this is working, with tweets that violate its rules now accounting for a staggeringly low 0.01% or less of all tweet impressions.

Which seems very unlikely, based on overall industry trends, and external reporting on Twitter specifically.

For example, the prevalence of similar violations on Facebook and Instagram sits at around 0.05% – and Meta has far more staff, and far more advanced systems, working to address such content across its apps. The suggestion that Twitter has somehow been able to best this, after culling 80% of its staff, including many of the people who were working on addressing these elements, seems questionable at best.

There have also, as noted, been a range of third-party analysis reports which suggest, for example, that antisemitic tweets have become more common since Elon Musk took over at the app, that slurs against Black and transgender people have also increased, while hate speech, in general, has also become more prevalent amid the app’s broader changes in approach. Twitter’s also facing legal action in both Australia and Germany for failing to remove hate speech in a timely manner.

As we’ve reported previously, some of the conflicting figures here seem to come down to varying definitions of what actually qualifies as hate speech, and how Twitter itself is measuring such. But we don’t know how Twitter has come to this new 99.99% figure, because there’s no evidence – the Twitter team hasn’t provided any actual data or insight to back this number up.

So it’s just ‘take our word for it’ that somehow, Twitter has achieved record-setting results in moderation performance, despite cutting the majority of its staff, and in contrast to external academic analysis, which points to the opposite.

I’m not saying that it’s not right, but I don’t know, and you don’t know either, because Twitter hasn’t explained itself in any way.

So what that means, in the end, I don’t know.

But sure, it’s an impressive figure, I guess.