Combating Misinformation on TikTok: Documented’s Investigative Toolkit
The rise of social media platforms like TikTok has presented new challenges in combating the spread of misinformation, particularly among vulnerable populations like migrants. Documented, a non-profit news organization focused on immigrant communities in New York City, has taken on this challenge by developing a sophisticated, multi-pronged approach to investigating and exposing misinformation targeted at migrants. This article details their innovative methodology, offering valuable insights for journalists and researchers grappling with similar issues.
Documented’s investigation stemmed from firsthand accounts of migrants relying on TikTok for information about navigating their journey to New York City, often encountering misleading or false information. This prompted the organization to delve deeper into the platform, working with community correspondents and experts to identify prevalent misinformation themes, including predatory scams targeting migrants. This initial groundwork provided a crucial foundation for the subsequent technical investigation.
A significant hurdle in tackling online misinformation is the ephemeral nature of the content. Disinformation campaigns often appear and vanish quickly, requiring swift action to preserve evidence. Documented addressed this by developing a system for identifying and archiving relevant TikTok accounts. The process involved collaborating with experts and migrants to pinpoint accounts spreading misinformation, followed by the development of a Python scraper to extract video URLs from archived HTML pages of these accounts. Using the yt-dlp library, the videos and their metadata were then downloaded and stored locally.
Analyzing large volumes of video content presents another significant challenge. Manually reviewing each video is time-consuming and impractical. To overcome this, Documented employed the open-source Whisper speech recognition model to automatically transcribe the videos. While the accuracy of the transcription varied across languages, it provided sufficient information to gain a general understanding of the content and identify key themes for further investigation. The use of AI tools like Whisper, while imperfect, proved invaluable in managing the sheer volume of data. The organization acknowledged the limitations of these tools and emphasized the importance of contextualizing and verifying the information extracted through automated processes.
To further refine their analysis, Documented leveraged natural language processing (NLP) and topic modeling. NLP facilitated the conversion of transcribed text into analyzable data, allowing for the identification of frequently occurring words and phrases. Topic modeling, a form of unsupervised machine learning, helped cluster related words and uncover underlying themes within the videos. This combination of techniques allowed researchers to identify recurring topics, such as religious misinformation and issues surrounding the CBP One app, which migrants use to enter the U.S. These identified themes guided further investigation and provided a framework for understanding the broader landscape of misinformation targeting migrants.
Documented’s approach highlights the importance of combining macro-level analysis with micro-level examination. While the automated analysis provided a broad overview of the content, the team also carefully reviewed individual videos, particularly those with high viewership, to provide concrete examples and contextualize the larger trends. This combination of quantitative and qualitative analysis allows for a richer and more nuanced understanding of the misinformation ecosystem.
The technical infrastructure developed by Documented consists of a Python-based code pipeline that includes scripts for extracting video links, downloading videos, transcribing content, and performing topic modeling. By making this pipeline publicly available on GitHub, Documented aims to empower other journalists and researchers to conduct similar investigations. This open-source approach fosters collaboration and accelerates the development of effective strategies to counter online misinformation.
Documented’s investigation serves as a valuable case study in addressing the complexities of misinformation on platforms like TikTok. Their multi-faceted approach, combining technical innovation, community engagement, and rigorous analysis, offers a roadmap for tackling this growing challenge. By sharing their methodology and tools, Documented contributes to a broader effort to combat misinformation and empower vulnerable communities with accurate information.
The organization’s focus on archiving content, recognizing the fleeting nature of online misinformation, is a crucial step in ensuring accountability and enabling further analysis. The use of automated transcription, while imperfect, is a pragmatic approach to managing large volumes of video content. Furthermore, the combination of NLP and topic modeling provides a powerful framework for identifying thematic trends and patterns within the data.
The emphasis on combining macro-level analysis with micro-level examination of individual videos adds depth and context to the findings. By showcasing specific examples alongside broader trends, Documented provides a compelling narrative that resonates with audiences and underscores the real-world impact of misinformation.
The open-source nature of Documented’s toolkit is a testament to their commitment to collaboration and transparency. By sharing their code and methodology, they empower other organizations to adapt and refine these techniques, fostering a collective effort to combat misinformation.
Documented’s work highlights the evolving nature of journalistic investigations in the digital age. It underscores the need for adaptable methodologies, technical expertise, and a commitment to community engagement. By sharing their learnings and tools, Documented contributes significantly to the fight against misinformation and provides a valuable resource for journalists and researchers worldwide. Their work serves as a model for innovative, impactful reporting in the face of the complex challenges posed by online platforms and the spread of misinformation.
Their approach is not just about identifying and exposing misinformation, but also about understanding the context in which it spreads and the impact it has on vulnerable communities. By centering the experiences of migrants and working closely with community correspondents, Documented ensures their investigation is grounded in the realities faced by those most affected by misinformation. This community-centric approach is crucial for developing effective counter-strategies and building trust.
The organization’s recognition of the limitations of AI tools is also commendable. While embracing the potential of automated processes, they acknowledge the inherent biases and inaccuracies that can arise. Their emphasis on verifying information and contextualizing findings highlights a responsible approach to using AI in journalistic investigations.
Finally, Documented’s commitment to open-source principles contributes significantly to the fight against misinformation. By sharing their tools and methodology, they empower other organizations and individuals to conduct similar investigations, fostering a collaborative approach to tackling this global challenge. Their work serves as a powerful example of how innovative reporting, combined with technical expertise and community engagement, can make a tangible difference in combating misinformation and protecting vulnerable communities.