Twitter Army

So, big in the news this month is the purchase of Twitter by Elon Musk. Of the big changes he has suggested, the one that piqued my interest is that he wants to remove bots. I personally think this would be a difficult task. Twitter is a global platform used by everyone from ordinary middle-class Americans to people who need to hide their identity, such as dissidents in oppressive countries. How any algorithm could weed out bots from people trying to remain anonymous while still communicating is tricky. Then there is telling good bots from bad bots, as there is a lot of daylight between the Earthquake Robot tweeting information about worldwide earthquakes and a bot scamming users into buying cryptocurrency.

Let's say that he went on the offensive and started aggressively developing algorithms to search out and remove bot accounts, or, just as bad, mark the accounts as bots (as opposed to verified humans). That seems like a very big opportunity. But first we need to explain GANs.

Generative Adversarial Networks (GANs) are a clever way of letting an algorithm teach itself a task. A GAN is built in two halves. The first half is the generator model, built so that it generates examples of the thing we care about. The second half is the discriminator model, trained to classify examples as either real or produced by the generator. Each classification feeds back to the generator so it can iterate on its ability. We let the two halves battle it out until the generator is fooling the discriminator about half of the time.
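As a rough sketch of that feedback loop (nothing Twitter-specific yet), here is a minimal GAN in PyTorch that learns to mimic a simple one-dimensional distribution. The network sizes, learning rates and step count are all arbitrary choices for illustration:

```python
import torch
import torch.nn as nn

# Generator: turns random noise into fake "examples" (here, single numbers).
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
# Discriminator: scores an example as real (towards 1) or fake (towards 0).
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
loss = nn.BCELoss()

for step in range(5000):
    real = torch.randn(64, 1) * 2 + 5        # "real" data: a Gaussian around 5
    fake = G(torch.randn(64, 8))             # generator's current attempts

    # Train the discriminator to tell real from fake.
    opt_d.zero_grad()
    d_loss = loss(D(real), torch.ones(64, 1)) + \
             loss(D(fake.detach()), torch.zeros(64, 1))
    d_loss.backward()
    opt_d.step()

    # Train the generator to fool the discriminator.
    opt_g.zero_grad()
    g_loss = loss(D(fake), torch.ones(64, 1))
    g_loss.backward()
    opt_g.step()
```

The two optimisers pull in opposite directions, which is the whole trick: the discriminator's mistakes are exactly the gradient signal the generator learns from.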

As an example, let's say we teach a generator model to make images of people from the shoulders up. We might start with the basics of what a human face looks like: viable skin tones, colours of hair, types of clothing, and so on. Then we let it loose. At first the images are terrible, but each time it generates an example it sends it off to the discriminator model and receives back a score. The discriminator model, for its part, is trained on images of real people; we've fed it hundreds of thousands, if not millions, of images of people from the shoulders up. When it receives an example from the generator model, it gives it a score out of 100, and anything over 50 is accepted as a plausible example. The generator model takes the scores it receives and uses them to tweak the images it generates. You would expect the scores to start low, as it doesn't even know what colours people can be. But as the scores come in and it can compare them across blind attempts, it starts to get better: perhaps it hits on skin tones that score well (like not bright green), maybe it learns which shapes of hair pass as acceptable, and it starts to place eyes, ears, mouth and nose more sensibly. Scores improve, the percentage of plausible examples rises, and once that ratio goes above 50% we are generating plausible examples more often than not.
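The stopping rule in that story might look something like this sketch. The discriminator here is a random stand-in (in reality it would be the trained classifier), but the 0-to-100 scale and the 50% cut-off are straight from the example above:

```python
import random

def discriminator_score(example) -> float:
    # Stand-in for the trained discriminator: returns a 0-100 plausibility
    # score. Random here; in reality this is the trained model's output.
    return random.uniform(0, 100)

def plausible_ratio(generate, n=1000, threshold=50) -> float:
    """Run n generated examples past the discriminator and return the
    fraction judged plausible (score above the threshold)."""
    scores = [discriminator_score(generate()) for _ in range(n)]
    return sum(1 for s in scores if s > threshold) / n

# Keep training until more than half of the generated examples pass:
#     while plausible_ratio(my_generator) <= 0.5:
#         train_one_round(my_generator)
```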



With that idea in your head, now think of Elon Musk's Twitter bot-searching algorithms as a discriminator model. Our job would then be to develop a human-impersonating generator model. Twitter wouldn't be an ideal discriminator, as it wouldn't give us useful feedback when we don't have a plausible example; it would instead label the account a bot or, worse, ban it. But perhaps we could approximate our own score. We could score activities: successfully creating an account, passing any verification, posting content, interacting with existing content. Then measure how long until an account is warned or banned. Add all of this together to build a score that could be used as feedback for our generator model.
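A crude version of that composite score might look like the following sketch. Every weight here is invented for illustration, and the events would come from watching how our own accounts fare:

```python
from dataclasses import dataclass

@dataclass
class AccountRun:
    """What we observed for one bot account over its lifetime."""
    created: bool          # account creation succeeded
    verified: bool         # passed phone/email/captcha verification
    posts_accepted: int    # tweets that went out without being flagged
    interactions: int      # likes, replies, retweets that stuck
    days_survived: float   # time until warned/banned (or length of the run)
    banned: bool

def survival_score(run: AccountRun) -> float:
    """Weighted sum of the activities above; all weights are made up and
    would themselves need tuning."""
    score = 0.0
    score += 10 if run.created else 0
    score += 15 if run.verified else 0
    score += min(run.posts_accepted, 50) * 0.5   # cap so spam doesn't pay
    score += min(run.interactions, 100) * 0.2
    score += run.days_survived * 1.0
    if run.banned:
        score *= 0.5   # heavy penalty: the discriminator caught us
    return score
```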

What would we need? Well, we would need to describe what a user would be like. They would access Twitter through one or a small number of entry points that are browser- or app-based. They wouldn't use an API. Perhaps we'd need to manage the IP addresses that interacted with Twitter.
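Sketching that as data, a simulated user's client fingerprint might be described like this; the field names and values are mine, purely illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class ClientProfile:
    """How one fake user appears to reach Twitter. All fields illustrative."""
    entry_point: str                  # e.g. "android_app" or "web_chrome"
    user_agent: str                   # stays fixed per account, like a real device
    ip_pool: list[str] = field(default_factory=list)  # the few IPs it appears from
    timezone: str = "America/New_York"

home_user = ClientProfile(
    entry_point="web_chrome",
    user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    ip_pool=["203.0.113.7"],          # one home IP, as a real person would have
)
```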

Our fake person would need to eat, sleep and work, so they wouldn't always be on Twitter. And when they were on Twitter they would have interests and ideas, and they would follow people and companies that align with those ideas. Then again, it wouldn't be so clear-cut. Someone can be extremely healthy, climb mountains and run marathons, and also love deep-fried Mars bars and constantly post memes about them. People are fickle and weird, but there would be a limit to how extreme these juxtapositions could be. A persona sampler along these lines is sketched below.
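Here is what that persona might look like as code; the waking hours, interest weights and tweet probabilities are all numbers I've made up:

```python
import random

PERSONA = {
    "interests": {"hiking": 0.5, "marathons": 0.3, "deep_fried_mars_bars": 0.2},
    "awake": range(7, 23),      # roughly 7am to 11pm local time
    "work": range(9, 17),       # mostly quiet during office hours
}

def wants_to_tweet(hour: int) -> bool:
    """Is this persona plausibly on Twitter right now?"""
    if hour not in PERSONA["awake"]:
        return False                       # asleep: never online
    busy = hour in PERSONA["work"]
    return random.random() < (0.05 if busy else 0.25)

def pick_topic() -> str:
    """Mostly on-brand topics, with the occasional weird juxtaposition."""
    topics, weights = zip(*PERSONA["interests"].items())
    return random.choices(topics, weights=weights)[0]
```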

So what if we start this process and begin succeeding? What if we start successfully creating Twitter accounts that live their own lives, tweeting and interacting without being detected? This number would increase over time. For argument's sake, let's say we created 10,000 fake, human-verified accounts. Then we might have the necessary building blocks to attempt a new generator model.

Our second generator model attempts to model collective human behaviour. You could suppose that if one of our bots tweeted something, and then 999 of our other bot accounts, unrelated in geographic location or interests and not connected by following the initial account, all retweeted that initial tweet, this would look like suspicious activity to an algorithm. How could we model a realistic-looking viral campaign? Is this even possible? Who knows.
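One ingredient would be to stop the retweets looking coordinated at all. Here is a sketch of spreading a tweet through the network with interest-gated participation and randomly jittered timing; the participation rate and delay distribution are invented parameters:

```python
import random

def plan_retweets(accounts, topic, horizon_hours=72):
    """Schedule retweets so the spread looks organic rather than coordinated.

    accounts: list of dicts like {"id": str, "interests": {topic: affinity}}
    Returns (account_id, delay_hours) pairs, heavily thinned and jittered.
    """
    schedule = []
    for acc in accounts:
        affinity = acc["interests"].get(topic, 0.0)
        # Only accounts that plausibly care join in, and even then not all.
        if random.random() < affinity * 0.3:
            # Exponential delays: a few early adopters, a long organic tail.
            delay = random.expovariate(1 / 12)  # mean 12 hours
            if delay < horizon_hours:
                schedule.append((acc["id"], round(delay, 1)))
    return sorted(schedule, key=lambda pair: pair[1])
```

The point of the exponential tail is that real virality is lopsided: a burst at the start and stragglers for days, never 999 retweets in the same minute.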

...but then what if it went a step further? What if you built a third generator model that attempted different viral campaigns and looked at the outcomes? This time the discriminator model would be the national and international press. If we got past the second model and could plausibly generate viral campaigns to some degree, you could see an opportunity to build a bit of technology that monitors news media to generate the feedback our third generator model needs. Could we take a relatively unknown word and make it popular again? Brothel creepers? Could we take an old urban myth and make it popular again? Richard Gere and his love for gerbils? Could we get to a point where we can come up with any zany idea and get it reported on by the Associated Press?
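The feedback signal for that third model could be as blunt as counting mentions. Here is a toy version that scores a campaign phrase against a batch of headlines; how the headlines are gathered (RSS, scraping, a news API) is left out, and the outlet weighting is invented:

```python
# Invented outlet weights: a wire-service pickup is worth far more than a blog.
OUTLET_WEIGHT = {"ap": 10.0, "national": 5.0, "local": 1.0, "blog": 0.1}

def campaign_score(phrase: str, headlines: list[tuple[str, str]]) -> float:
    """Score a campaign by weighted mentions in (outlet_tier, headline) pairs."""
    phrase = phrase.lower()
    return sum(
        OUTLET_WEIGHT.get(tier, 0.0)
        for tier, headline in headlines
        if phrase in headline.lower()
    )

# Hypothetical usage with made-up headlines:
print(campaign_score("brothel creepers", [
    ("local", "Why brothel creepers are back in fashion"),
    ("ap", "Nationwide brothel creepers revival baffles retailers"),
]))  # -> 11.0
```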

Generator model number four would involve generating real-world outcomes in response to random or zany ideas. Regardless of whether my intentions were good or evil, at this point I'd be giving Cambridge Analytica a run for its money. And this is also why it would probably be unhealthy if I had the money that Elon Musk does.