AI Village shares its perspective on OpenAI’s decision to release a limited version of GPT-2

Earlier this month, OpenAI released a limited version of GPT-2, its unsupervised language model, with a warning that it could be used to automate the production of fake content. While many machine learning researchers supported the decision to put AI safety first, some felt that OpenAI was spreading fear and hindering reproducibility, while others saw it as a PR stunt.

AI Village, a community of hackers and data scientists working together to spread awareness about the use and misuse of AI, also shared its views on GPT-2 and its threat models. In a blog post, AI Village said, “...people need to know what these algorithms that control their lives are capable of. This model seems to have capabilities that could be dangerous and it should be held back for a proper review.”

According to AI Village, these are the potential threat models in which GPT-2 could be misused:

The bot-based misinformation threat model

Back in 2017, when the FCC launched a public comments website, it faced a massive coordinated botnet attack. The botnet posted millions of anti-net-neutrality comments alongside those written by humans. Researchers were able to detect this disinformation with near certainty by using regular expressions to filter for these comments. AI Village said that if the comments had been generated by GPT-2, we would not have been able to tell that they were written by a botnet.
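To make the regex-based detection described above more concrete, here is a minimal sketch in Python. It assumes the bot comments were stitched together from a fixed template with a few interchangeable phrases in each slot. The template, the sample wording, and the function names are all hypothetical and chosen only for illustration; they are not the patterns the researchers actually used.

```python
import re

# Hypothetical template: every slot offers a few interchangeable phrases,
# the kind of fill-in-the-blank structure a simple text-spinning bot produces.
TEMPLATE = re.compile(
    r"^(?:I urge|I ask|I call on) the (?:FCC|commission) to "
    r"(?:repeal|undo|reverse) the (?:rules|regulations) on "
    r"(?:internet providers|broadband providers)\.$",
    re.IGNORECASE,
)


def looks_templated(comment: str) -> bool:
    """Return True if the whole comment fits the hypothetical template."""
    return TEMPLATE.match(comment.strip()) is not None


def split_comments(comments):
    """Partition comments into (templated, other) lists, preserving order."""
    templated, other = [], []
    for comment in comments:
        (templated if looks_templated(comment) else other).append(comment)
    return templated, other


if __name__ == "__main__":
    sample = [
        "I urge the FCC to repeal the rules on internet providers.",
        "I call on the commission to undo the regulations on broadband providers.",
        "Please keep net neutrality, my small business depends on it.",
    ]
    flagged, rest = split_comments(sample)
    print(len(flagged), "templated,", len(rest), "other")
```

A filter like this only works because the bot text follows a rigid pattern. Text from a language model such as GPT-2 would not share a fixed template, which is the concern AI Village raises.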

Amplifying human generated disinformation

We have seen a significant amount of bot activity on different online platforms. Very often, this activity simply amplifies fake content created by humans by giving it upvotes and likes. These bots typically log in to a platform, like the target post, and then log off to make way for the next bot. This behavior is quite different from that of a human, who actually scrolls through posts and stays on the platform for some time, and that difference is what gives a bot away. Metadata such as login times, locations, and site activity can therefore be really helpful in detecting bots.
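The log-in, like, log-off pattern described above can be turned into a simple heuristic over session metadata. The sketch below is only an illustration: the `Session` record, the 30-second threshold, and the 80% cutoff are assumptions made for this example, not values taken from AI Village or from any real platform.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Session:
    """Hypothetical record of one login session."""
    user: str
    login: datetime
    logout: datetime
    actions: list  # e.g. ["like"] or ["scroll", "view", "like"]


# Assumed threshold, chosen only for illustration.
MAX_BOT_SESSION = timedelta(seconds=30)


def is_suspicious(session: Session) -> bool:
    """Flag very short sessions whose only activity is liking posts."""
    duration = session.logout - session.login
    only_likes = bool(session.actions) and all(a == "like" for a in session.actions)
    return duration <= MAX_BOT_SESSION and only_likes


def suspicious_users(sessions, min_fraction=0.8):
    """Return users for whom at least min_fraction of sessions look bot-like."""
    totals, hits = {}, {}
    for s in sessions:
        totals[s.user] = totals.get(s.user, 0) + 1
        if is_suspicious(s):
            hits[s.user] = hits.get(s.user, 0) + 1
    return sorted(u for u, n in totals.items() if hits.get(u, 0) / n >= min_fraction)


if __name__ == "__main__":
    t0 = datetime(2019, 2, 1, 12, 0, 0)
    sessions = [
        Session("bot1", t0, t0 + timedelta(seconds=5), ["like"]),
        Session("alice", t0, t0 + timedelta(minutes=12), ["scroll", "view", "like"]),
    ]
    print(suspicious_users(sessions))
```

A real detector would combine many more signals, such as the login locations mentioned above, and would need thresholds tuned on actual traffic.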

Automated spear phishing

In a paper published in 2016, two data scientists, John Seymour and Philip Tully, introduced SNAP_R, a recurrent neural network that can learn to tweet phishing posts targeting specific end users. The GPT-2 language model could also be used for automated spear phishing campaigns.

How can we prevent the misuse of AI?

With this decision, OpenAI wanted to start a discussion about the responsible release of machine learning models, and AI Village hopes that more such discussions could help prevent AI threats to our society. “We need to have an honest discussion about misinformation & disinformation online. This discussion needs to include detecting and fighting botnets, and users who are clearly not who they say they are from generating disinformation.”

In recent years, we have seen many breakthroughs in AI, but comparatively little effort has been put into finding ways to prevent its malicious use. Generative Adversarial Networks are now capable of producing headshots that are indistinguishable from photos. Deepfakes for video and audio have advanced so much that they almost seem real. Currently, we do not have any mechanism in place for researchers to responsibly release work that could potentially be used for evil. “With truly dangerous AI, it should be locked up. But after we verify that it's a threat and scope out that threat,” states the blog post. AI Village believes that we need to have more thorough discussions about such AI systems and the damage they can do.

Last year, in a paper, AI Village listed some of the ways in which AI researchers, companies, legislators, security researchers, and educators can come together to prevent and mitigate AI threats:

Policymakers and technical researchers should come together to investigate, prevent, and mitigate potential malicious uses of AI.

Before opening their work to the general public, researchers should think about its dual-use nature. They should also proactively reach out to relevant actors when harmful applications are foreseeable.