AI tools to tackle online misinformation and harassment

As part of the Conversations on Artificial Intelligence webinar series hosted by the Caltech Science Exchange, Professor of Political and Computational Social Science Michael Alvarez and Bren Professor of Computing and Mathematical Sciences Anima Anandkumar discuss how misinformation is amplified online and ways artificial intelligence (AI) tools can help create a more trustworthy social media ecosystem.

As documented by the #MeToo movement and witnessed in the aftermath of the 2020 election, online trolling and disinformation lead to real-world consequences. In a conversation with Caltech science writer Robert Perkins, the researchers described artificial intelligence methods that can analyze millions of social media posts to identify and prevent online harassment and to track the spread of misinformation and disinformation on platforms like Twitter and Facebook.

Highlights from the conversation are below.

The questions and answers below have been modified for clarity and length.

What have you learned so far about how misinformation spreads?

Michael Alvarez: We have collected Twitter data on tweets about the 2020 election and on the information and misinformation that were posted. We have looked at questions like: Were Twitter’s attempts to flag and block some misinformation successful in 2020? We found mixed evidence. The platforms clearly have trouble regulating the spread of misinformation.

Anima Anandkumar: In my view, the core of the problem is algorithmic amplification: automated amplification at such a large scale, combined with targeted attacks, many of them by bots, on platforms that lack the ability to authenticate real users. It also means these attacks can grow much bigger in a very short period of time.

Alvarez: Information flows very quickly now, unlike in the past when I first started getting involved in politics myself, when things were pretty much in the papers or on the evening news. And these platforms, with their rapid spread of information and their lack of real fact-checking, give us this problem where people are prone to seeing a lot of misinformation. It has become, I think, one of the biggest problems we face in today’s world: How do we verify so much information being posted on social media?

What is at stake here? What will happen if we don’t address it?

Alvarez: Well, that’s a big question. A recent Pew Research Center survey found that four out of 10 Americans report having experienced some form of online harassment or bullying. This means that of the people watching this now, many have experienced this themselves. And then there’s this bigger puzzle of trying to monitor misinformation and/or disinformation, which is a problem at scale and a very difficult one.

As for what is at stake: on the one hand, we need to stop this online harassment. People should not be harassed online, and this should just stop. At the macro level, we need to be very careful that our democracy is not eroded by the spread of misinformation or factually incorrect information in these midterms and, in particular, as we move forward to the next presidential election. This is part of our motivation for putting so much time into this project. And I think that motivates a lot of our students to get involved as well.

What can AI do to help tackle this problem? Can you talk about the AI tools you’ve been building and how they work?

Anandkumar: AI gives us an excellent set of tools to conduct this research at scale. We are now able to analyze billions of tweets very quickly, which was not possible before.

One of the challenges here is that this is not a fixed vocabulary or a fixed set of topics. On social media, we have new topics developing all the time. COVID-19 wasn’t even in our vocabulary when the models were initially trained. Especially when it comes to harassment and trolling, bad actors are coming up with new keywords and new hashtags all the time. How do we deal with this ever-evolving vocabulary?

My research dates back more than a decade to the development of these unsupervised learning tools, which is what we call topic modeling. Here, we automatically detect topics in data by looking at how patterns evolve. Think about how, if words occur together, they should have a strong relationship in representing a topic. If the two words apple and orange occur together, the topic is probably fruit. What we call tensor methods look at such co-occurrence relationships. And then the AI has to do this across hundreds of thousands of possible words on billions of documents to extract topics automatically. We propose scalable ways to efficiently extract hidden patterns or topics from such large-scale co-occurrence relationships between words.
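The co-occurrence idea described above can be illustrated with a minimal Python sketch. This is not the researchers' tensor method, only a toy count of which words appear together in the same document; the function name `cooccurrence_counts` and the four-document corpus are hypothetical. Counting pairs gives a co-occurrence matrix, and counting triples gives the kind of third-order table that tensor methods would then decompose into topics.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, order=2):
    """Count how often each group of `order` distinct words shares a document.

    Hypothetical helper for illustration only. order=2 counts word pairs,
    order=3 counts word triples (a third-order co-occurrence table).
    """
    counts = Counter()
    for text in documents:
        # Use the set of words so each word counts once per document,
        # and sort so each group has a single canonical ordering.
        words = sorted(set(text.lower().split()))
        counts.update(combinations(words, order))
    return counts


# Hypothetical toy corpus: two "fruit" posts and two "election" posts.
docs = [
    "apple orange banana",
    "orange apple juice",
    "election ballot vote",
    "vote ballot count",
]

pairs = cooccurrence_counts(docs, order=2)
triples = cooccurrence_counts(docs, order=3)

print(pairs[("apple", "orange")])   # words from the same topic co-occur
print(pairs[("apple", "ballot")])   # words from different topics do not
```

In this toy corpus, apple and orange share two documents while apple and ballot share none, which is the signal a topic model exploits. At the scale described in the interview, the counts would be stored and factorized with specialized numerical libraries rather than Python dictionaries.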

Alvarez: Not only are we able to do this, but with Anima’s contributions to the field, we can do it on a large scale very quickly. The technologies she developed allowed us to look at a dataset of all tweets about #MeToo, about 9 million tweets. We can analyze this dataset in a matter of minutes, whereas before it would have taken several days, maybe a month. We can do things in computational social science that we couldn’t do before.

Anandkumar: This is the power of unsupervised learning. It doesn’t rely on humans labeling different topics, so you can do it online as the topics come up rather than waiting for human annotators to collect data, discover new topics, and then train the AI.

Let’s move on to some potential concerns about this new technology. Is there any concern that it could be used to curtail freedom of expression or be abused in some way? Say, by an authoritarian government?

Anandkumar: Our tools are open source, and they are really about helping you understand the relationships among different topics in evolving text. They help the people who set the policies to deal with the volume and speed of data that comes at them. It is certainly not up to us to do the monitoring or to frame these questions. We enable researchers as well as social media companies to handle the volume and speed of data.

For me personally, algorithmic amplification is not about freedom of speech. We are not limiting anyone’s speech when we talk about how to limit the amplification of misinformation. Anyone can shout from their rooftop anything they want; they are free to do so. But if it reaches millions of people, and people tend to believe what is being amplified on such a massive scale, that’s a different story. I think we should separate freedom of speech from algorithmic amplification.

Alvarez: I think Anima has really answered the question well in that, as scientists, our responsibility, first and foremost, is to the scientific community, to make sure our colleagues can use these tools and verify that they work as we claim they do. We are responsible for sharing the materials we used to develop these claims.

But we really hope these tools are ones that social media platforms will use. We are very optimistic. We know they have very large data science teams. We know they follow what we’re doing, and so we’re very hopeful that they’ll take a look at these tools and maybe use them to try to mitigate, if not solve, a lot of the problems that currently exist.

How will AI keep up with social norms that change rapidly over time?

Anandkumar: The idea is not for AI to make decisions about how we deal with amplification. There must be a human in the loop. And I think that’s one of the problems today: the amplification is automated. Unsupervised learning tools look at patterns in the data itself as they evolve and surface them to human moderators, so we need a human in the loop. And there should be consistent policies to guide these people on how to judge whether something constitutes harassment.

The short answer is yes, we must be able to adapt to different social norms and changing circumstances. And if we introduce smart AI tools that enable human supervisors to adapt, I think that’s the best we can do.

Alvarez: And I will say that there are other people studying this at Caltech. For example, Frederick Eberhardt is a faculty member and philosopher. He teaches a great class here at Caltech on the ethics of artificial intelligence. As our research continues to grow and advance, I am very confident that we will reach out to scholars like Frederick here at Caltech and elsewhere to begin incorporating more about the fundamental ethics of AI use and how AI will respond as social norms change around things like harassment.

When and where should we look for your tools to become open source?

Anandkumar: We will release this as part of an open source library called TensorLy. Mike also maintains a blog about trustworthy social media, and we hope to announce the release on Twitter and other social media.

You can also take a look at some of our research papers on this topic:

• “Tensor Decompositions for Learning Latent Variable Models,” Journal of Machine Learning Research, 2014
• “Large-Scale Cloud Deployment of Spectral Topic Modeling,” Knowledge Discovery and Data Mining (KDD) 2019, ParLearning Workshop, Anchorage, Alaska, USA

Here are some of the other questions addressed in the video linked above:

• Who decides what is truth and what is misinformation?
• How is the monitoring of misinformation different across social media platforms? For example, is it more difficult to track TikTok, which is based on video, than Twitter or Facebook?
• How do we detect misinformation in languages other than English?
• What are the most compelling facts regarding the integrity of federal elections?

Learn more about artificial intelligence and the science behind voting and elections on the Caltech Science Exchange.
