The Open Technology Fund published a report earlier this month highlighting the rise of censorship-related jobs in China between 2015 and 2022. But growing evidence also shows that human-powered censorship is being increasingly enhanced—if not yet replaced—by AI. Alex Colville at China Media Project reported this week on hundreds of gigabytes of data found on an unsecured Chinese server linked to Baidu, which shows how the government and tech giants are using AI large language models (LLMs) to boost their online surveillance and censorship capacity:
First uncovered by Marc Hofer of the NetAskari newsletter, the data is essentially a reservoir of articles that require labeling, each article in the dataset containing a repeated instruction to prompt the LLM in its work: “As a meticulous and serious data annotator for public sentiment management, you must fully analyse article content and determine the category in which it belongs,” the prompt reads. “The ultimate goal is to filter the information for use in public opinion monitoring services.”
[…] First, it reveals a sophisticated classification system with 38 distinct categories, running from more mundane topics like “culture” and “sports” to more politically sensitive ones. Tellingly, the three categories marked as “highest priority” in the dataset align distinctly with state interests as opposed to commercial ones. Topping the list is “information related to the military field,” followed by “social developments” (社会动态) and “current affairs developments” (时政动态). This prioritization underscores how private tech companies like Baidu — though it could not be confirmed as the source of this dataset — are being enlisted in the Party-state’s comprehensive effort to monitor and shape online discourse. […] The exact purpose of this dataset remains unclear. Were these classifications developed internally by Baidu — or were they mandated by state regulators? Nevertheless, the unsecured data offers a glimpse into the inner workings of China’s AI content dragnet. What was once a labor-intensive system requiring thousands of human censors is rapidly evolving, thanks to the possibilities of AI, into an automated surveillance machine capable of processing and categorizing massive volumes of online content. [Source]
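To make the mechanism described above concrete, the following is a minimal, purely illustrative sketch of how an article might be pushed through such an annotator prompt. The prompt wording and the example categories are taken from the China Media Project report quoted above; the client library, placeholder model name, function name (classify_article), and JSON output format are assumptions for illustration, not details recovered from the leaked dataset.

```python
# Hypothetical sketch only. The annotator prompt and example categories are quoted from
# the CMP report; the client library, model name, and output schema are illustrative
# assumptions, not details recovered from the leaked dataset.
import json
from openai import OpenAI  # any OpenAI-compatible chat client could stand in here

# A handful of the 38 categories described in the report (the full taxonomy is not public).
CATEGORIES = [
    "information related to the military field",  # flagged highest priority in the dataset
    "social developments",                        # 社会动态
    "current affairs developments",               # 时政动态
    "culture",
    "sports",
]

ANNOTATOR_PROMPT = (
    "As a meticulous and serious data annotator for public sentiment management, "
    "you must fully analyse article content and determine the category in which it belongs. "
    "Choose exactly one category from: {categories}. "
    'Respond with a JSON object of the form {{"category": "<chosen category>"}}.'
)

def classify_article(client: OpenAI, article_text: str) -> str:
    """Run one article through the annotator prompt and return the assigned category."""
    response = client.chat.completions.create(
        model="some-llm",  # placeholder; the dataset does not identify the model used
        response_format={"type": "json_object"},  # ask for a machine-readable label
        messages=[
            {"role": "system", "content": ANNOTATOR_PROMPT.format(categories="; ".join(CATEGORIES))},
            {"role": "user", "content": article_text},
        ],
    )
    return json.loads(response.choices[0].message.content)["category"]
```

Run in a loop over a stream of scraped articles, a classifier along these lines is all that would be needed to turn the repeated prompt quoted above into the kind of automated, high-volume content triage the report describes.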
On Wednesday, Charles Rollet at TechCrunch shared experts’ views on how such AI-powered content categorization for the purpose of “public opinion work” would ultimately enhance the state’s censorship abilities:
[CDT’s] Xiao Qiang, a researcher at UC Berkeley who studies Chinese censorship and who also examined the dataset, told TechCrunch that it was “clear evidence” that the Chinese government or its affiliates want to use LLMs to improve repression. “Unlike traditional censorship mechanisms, which rely on human labor for keyword-based filtering and manual review, an LLM trained on such instructions would significantly improve the efficiency and granularity of state-led information control,” Qiang told TechCrunch.
[…] Michael Caster, the Asia program manager of rights organization Article 19, explained that “public opinion work” is overseen by a powerful Chinese government regulator, the Cyberspace Administration of China (CAC), and typically refers to censorship and propaganda efforts. The end goal is ensuring Chinese government narratives are protected online, while any alternative views are purged. Chinese president Xi Jinping has himself described the internet as the “frontline” of the CCP’s “public opinion work.” [Source]
Recent developments reflect ongoing attempts to instrumentalize AI for online surveillance and censorship. In February, OpenAI reported that a Chinese-origin network used ChatGPT to build a tool that collected social media posts on sensitive political and social topics, which the network then marketed to Chinese authorities. Another data leak, discovered last month by Sentinel Labs, revealed additional types of collaboration between public and private actors in China for the purpose of monitoring and censoring content on the Chinese internet. (On a related note, a whistleblower alleged that in its failed attempt to access the Chinese market, Facebook “developed a censorship system for China in 2015 and planned to install a ‘chief editor’ who would decide what content to remove and could shut down the entire site during times of ‘social unrest,’” as The Washington Post reported earlier this month.)
Beyond surveillance and censorship, AI has been employed across a wide swath of Chinese society. Its ubiquity has led the Cyberspace Administration of China to mandate that online platforms accurately label any content that is generated by AI. China Media Project’s China Chatbot column has also documented the intersection of Chinese media and AI. Over the past few weeks, dozens of Chinese automakers, medical and pharmaceutical companies, banks, insurance companies, brokerage firms, and even local governments have announced that they are using DeepSeek AI in their products, research, and training to enhance a variety of tasks. Amber Wang at the South China Morning Post reported on Sunday that even China’s PLA is using DeepSeek AI for non-combat support and potentially other military tasks:
DeepSeek’s open-source large language models (LLMs), which have drawn global attention and praise, are being used in PLA hospitals, People’s Armed Police (PAP), and national defence mobilisation organs, according to publicly available information.
[…] Some units of PAP – a paramilitary police force under the command of the Central Military Commission, which also directs the PLA – are using the app for daily physical training and psychological counselling. […] The PLA has called for the incorporation of high-end technology, particularly AI, to strengthen its combat capabilities. This would include boosting the effectiveness of drone swarm tactics, improving the efficiency and realism of pilot training, and battlefield decision-making support. […] Fu Qianshao, a Chinese military analyst, said DeepSeek’s applications in routine physical training and logistical support “demonstrate the PLA’s commitment to ‘staying up-to-date and fully utilising AI technology to enhance comprehensive combat capabilities’.” “It cannot be ruled out that DeepSeek has been used for other combat functions,” Fu said, adding that “the integration of AI into command systems has been under way for a considerable time”. [Source]