Introducing BotSight: A New Tool to Detect Bots on Twitter in Real Time
Quantifying Disinformation on Twitter, One Tweet at a Time
NortonLifeLock Research Group (formerly Symantec Research Labs) has released a beta tool that detects bots on Twitter in real time, helping Twitter users understand the prevalence of bots and disinformation campaigns within their personal feeds. The tool has also been made available in New Zealand.
Awareness of misinformation is higher than ever, particularly as major social media platforms crack down on misleading content and accounts – yet there is still little understanding of how much disinformation is actually out there.
With this in mind, we trained a state-of-the-art machine learning model that detects Twitter bots with a high degree of accuracy, achieving an Area Under the Curve (AUC) – a common indicator of model quality – of 0.967 on popular research datasets, matching or exceeding the best current academic results. But we didn’t stop there. We built a tool, called BotSight, that injects our model’s results directly into the Twitter feed. Now we are releasing a beta version of BotSight (for popular browsers and iOS) to give people a better understanding of how bots operate on Twitter. You can download it for free here.
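To make the AUC figure concrete: AUC can be read as the probability that the model scores a randomly chosen bot higher than a randomly chosen human. The sketch below illustrates that pairwise-ranking definition on synthetic labels and scores – these numbers are invented for illustration and are not BotSight’s data.

```python
# A minimal, self-contained illustration of the AUC metric cited above.
# The labels and scores below are synthetic examples, NOT BotSight's data.

def auc(y_true, y_score):
    """Area under the ROC curve via its pairwise-ranking definition:
    the probability that a randomly chosen positive (bot) is scored
    higher than a randomly chosen negative (human); ties count half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 1 = bot, 0 = human; scores are a classifier's bot probabilities.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.92, 0.85, 0.40, 0.78, 0.10, 0.30, 0.05, 0.55]
print(f"AUC = {auc(labels, scores):.3f}")  # 0.5 = random guessing, 1.0 = perfect
```

An AUC of 0.967, as reported for the model, means the classifier ranks a bot above a human in roughly 97 of 100 such random pairs.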
To determine whether an account is a bot, we examine more than 20 distinguishing features per account, including the amount of randomness in the Twitter handle, whether the account is verified, the rate at which it is acquiring followers, and the account’s description. We have verified our approach by observing BotSight in action: so far, BotSight’s beta users have analysed over 100,000 Twitter accounts.
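To give a flavour of the kinds of features described above, here is a hypothetical sketch. The function names, the entropy heuristic for handle randomness, and the account fields are our own illustrative assumptions – the actual BotSight feature set and model are not public.

```python
# Hypothetical feature-extraction sketch for bot detection.
# Field names and heuristics are illustrative assumptions, not BotSight's.
import math
from collections import Counter

def handle_entropy(handle: str) -> float:
    """Shannon entropy of the characters in a handle -- a rough proxy
    for 'randomness' (auto-generated handles tend to score higher)."""
    counts = Counter(handle.lower())
    total = len(handle)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def extract_features(account: dict) -> list:
    """A small subset of the 20+ features the article mentions."""
    days_active = max(account["account_age_days"], 1)
    return [
        handle_entropy(account["handle"]),     # randomness of the handle
        1.0 if account["verified"] else 0.0,   # verified flag
        account["followers"] / days_active,    # follower-acquisition rate
        float(len(account["description"])),    # crude description signal
    ]

# Example: a new, unverified account with a random-looking handle,
# no bio, and a suspiciously fast follower-acquisition rate.
account = {
    "handle": "xk7q9zp24a",
    "verified": False,
    "followers": 12000,
    "account_age_days": 30,
    "description": "",
}
print(extract_features(account))
```

A vector like this would then be fed to a trained classifier, which outputs a bot probability for the account.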
BotSight works across most of Twitter, including search, trending topics, and your home timeline. For the past six months, our team has been diligently scrolling through Twitter with BotSight enabled in order to continuously test and improve both our model and our design. This has also helped us better understand bots – where they are likely to appear and how they behave.
Using BotSight’s classifier on what we believe is the largest archive of Twitter’s historical data ever collected outside Twitter (over 4 TB), we found many interesting and surprising things. One is that the disinformation problem is not as small as Twitter’s own numbers might suggest at first blush, but it is also nowhere near the scale of the more sensational headlines we’ve seen. Overall, about 5% of tweets come from bots, and that percentage has declined over time – a testament to the hard work of Twitter’s Site Integrity team.
However, this percentage can climb as high as 20% when viewing trending topics such as #COVID19. In our analysis of recent coronavirus-related tweets, we found that between 6% and 18% of users tweeting on the subject were bots, depending on which time period we sampled, while a random sample of the Twitter stream over the same period indicated 4–8% bot activity by volume. The contrast shows that bots are strategic about their behaviour, favouring current events to maximise their impact.
All of these numbers vary with language, topic, and time of day – which is precisely why seeing bot estimates right in your Twitter feed is so helpful.