The AI Seoul Summit on AI Safety, held in South Korea in 2024, saw the release of a comprehensive international scientific report on AI safety. The report stands out from the myriad AI policy and technology reports for its depth and its actionable insights. Here, we break down key points from the report to understand the risks and challenges associated with general-purpose AI systems.
1. The Risk Surface of General-Purpose AI
"The risk surface of a technology consists of all the ways it can cause harm through accidents or malicious use. The more general-purpose a technology is, the more extensive its risk exposure is expected to be. General-purpose AI models can be fine-tuned and applied in numerous application domains and used by a wide variety of users [...], leading to extremely broad risk surfaces and exposure, challenging effective risk management."
General-purpose AI models, because they can be fine-tuned and applied across many domains by many kinds of users, have a broad risk surface: there are more ways for both accidental and malicious harm to arise. Managing these risks effectively is a significant challenge precisely because of this extensive exposure.
Illustration
Imagine a general-purpose AI model used in both healthcare and financial services. In healthcare, it could misdiagnose patients, leading to severe health consequences. In finance, it could be exploited for fraudulent activities. The broad applicability increases the risk surface, making it difficult to manage all potential harms.
2. Challenges in Risk Assessment
"When the scope of applicability and use of an AI system is narrow (e.g., consider spam filtering as an example), salient types of risk (e.g., the likelihood of false positives) can be measured with relatively high confidence. In contrast, assessing general-purpose AI models’ risks, such as the generation of toxic language, is much more challenging, in part due to a lack of consensus on what should be considered toxic and the interplay between toxicity and contextual factors (including the prompt and the intention of the user)."
Narrow AI systems, like spam filters, have specific and measurable risks. However, general-purpose AI models pose a greater challenge in risk assessment due to the complexity and variability of their applications. Determining what constitutes toxic behavior and understanding the context in which it occurs adds layers of difficulty.
Illustration
Consider an AI model used for content moderation on a social media platform. The model might flag certain words or phrases as toxic. However, the context in which these words are used can vary widely. For example, the word "kill" could be flagged as toxic, but in the context of a video game discussion, it might be perfectly acceptable. This variability makes it difficult to create a standardized risk assessment.
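To make the contextual problem concrete, here is a toy Python sketch (not from the report; the word lists, rule, and posts are invented) contrasting a naive keyword filter with a crude context-aware rule. Real moderation systems use learned classifiers rather than word lists, but the same ambiguity applies: the boundary between acceptable and toxic shifts with context.

```python
# Toy illustration: the same word ("kill") reads very differently depending on context.
# The keyword lists, rule, and posts below are hypothetical, for demonstration only.

FLAGGED_WORDS = {"kill"}
GAMING_CONTEXT = {"boss", "level", "respawn", "quest", "game"}

def naive_filter(text: str) -> bool:
    """Flags any post containing a flagged word, ignoring context."""
    words = set(text.lower().split())
    return bool(words & FLAGGED_WORDS)

def context_aware_filter(text: str) -> bool:
    """Flags a post only if a flagged word appears without gaming-context cues."""
    words = set(text.lower().split())
    if not words & FLAGGED_WORDS:
        return False
    return not words & GAMING_CONTEXT

posts = [
    "how do I kill the final boss in level 12",  # gaming discussion
    "I am going to kill you tomorrow",           # genuine threat
]

for post in posts:
    print(f"{post!r}: naive={naive_filter(post)}, context_aware={context_aware_filter(post)}")
```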
3. Limitations of Current Methodologies
"Current risk assessment methodologies often fail to produce reliable assessments of the risk posed by general-purpose AI systems, [because] Specifying the relevant/high-priority flaws and vulnerabilities is highly influenced by who is at the table and how the discussion is organised, meaning it is easy to miss or mis-define areas of concern. [...] Red teaming, for example, only assesses whether a model can produce some output, not the extent to which it will do so in real-world contexts nor how harmful doing so would be. Instead, they tend to provide qualitative information that informs judgments on what risk the system poses."
Existing methodologies for risk assessment are often inadequate for general-purpose AI systems. These methods can miss critical flaws and vulnerabilities due to biases in the discussion process. Techniques like red teaming provide limited insights, focusing on whether a model can produce certain outputs rather than the real-world implications of those outputs.
Illustration
A red-teaming exercise might show that an AI can generate harmful content, but it doesn't quantify how often this would happen in real-world use or the potential impact. For instance, an AI chatbot might generate offensive jokes during testing, but the frequency and context in which these jokes appear in real-world interactions remain unknown.
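To sketch the gap the report describes, the following toy simulation (all numbers invented) separates the yes/no question a red team answers from the frequency-and-exposure question that matters in deployment.

```python
import random

random.seed(0)

# Hypothetical figures, purely illustrative.
RED_TEAM_FOUND_JAILBREAK = True   # red team elicited the harmful behaviour at least once
P_HARM_PER_REAL_QUERY = 0.0004    # unknown in practice; assumed here for the simulation
DAILY_QUERIES = 250_000

# Red teaming answers a capability question...
print("Red team could elicit the behaviour:", RED_TEAM_FOUND_JAILBREAK)

# ...but the deployment-relevant quantity is how often it surfaces in real traffic,
# which needs separate measurement (here, a crude Monte Carlo stand-in).
harmful_today = sum(random.random() < P_HARM_PER_REAL_QUERY for _ in range(DAILY_QUERIES))
print(f"Simulated harmful outputs in one day of traffic: {harmful_today}")
print(f"Expected per day at the assumed rate: {P_HARM_PER_REAL_QUERY * DAILY_QUERIES:.0f}")
```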
4. Nascent Quantitative Risk Assessments
"Quantitative risk assessment methodologies for general-purpose AI are very nascent and it is not yet clear how quantitative safety guarantees could be obtained. [...] If quantitative risk assessments are too uncertain to be relied on, they may still be an important complement to inform high-stakes decisions, clarify the assumptions used to assess risk levels and evaluate the appropriateness of other decision procedures (e.g. those tied to model capabilities). Further, “risk” and “safety” are contentious concepts."
Quantitative risk assessments for general-purpose AI are still in their early stages. While these assessments are currently uncertain, they can still play a crucial role in informing high-stakes decisions and clarifying assumptions. The concepts of "risk" and "safety" remain contentious and require further exploration.
Illustration
A quantitative risk assessment might show a 5% chance of an AI system making a critical error in a high-stakes environment like autonomous driving. However, the uncertainty in these assessments makes it hard to rely on them exclusively for regulatory decisions.
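A minimal worked example of why such point estimates are fragile: with a standard Wilson score interval (the 5-in-100 and 50-in-1,000 test campaigns below are assumptions), the uncertainty band around a measured 5% failure rate can span several multiples of the estimate.

```python
import math

def wilson_interval(failures: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion."""
    p = failures / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return centre - half, centre + half

# Hypothetical test campaign: 5 critical errors observed in 100 scenario runs.
low, high = wilson_interval(failures=5, trials=100)
print(f"Point estimate: 5.0%, 95% interval: {low:.1%} to {high:.1%}")

# With ten times more trials at the same rate, the interval tightens considerably.
low, high = wilson_interval(failures=50, trials=1000)
print(f"Point estimate: 5.0%, 95% interval: {low:.1%} to {high:.1%}")
```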
5. Testing and Thresholds
"It is common practice to test models for some dangerous capabilities ahead of release, including via red-teaming and benchmarking, and publishing those results in a ‘model card’ [...]. Further, some developers have internal decision-making panels that deliberate on how to safely and responsibly release new systems. [...] However, more work is needed to assess whether adhering to some specific set of thresholds indeed does keep risk to an acceptable level and to assess the practicality of accurately specifying appropriate thresholds in advance."
Testing for dangerous capabilities before releasing AI models is a standard practice. However, there is a need for more work to determine if these tests and thresholds effectively manage risks. Accurately specifying appropriate thresholds in advance remains a challenge.
Illustration
An AI model might pass pre-release tests for dangerous capabilities, but once deployed, it could still exhibit harmful behaviors not anticipated during testing. For example, an AI chatbot might generate harmful content in response to unforeseen user inputs.
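As a sketch of the practice described here, the following compares hypothetical pre-release evaluation scores against hypothetical capability thresholds, model-card style. The benchmark names, scores and cut-offs are all invented; the report's caveat is precisely that passing such a check may not keep real-world risk at an acceptable level.

```python
# Hypothetical pre-release check: benchmark names, scores, and thresholds are invented.
evaluation_scores = {
    "cyber_offense_capability": 0.31,
    "bio_uplift_capability": 0.12,
    "persuasion_capability": 0.58,
}

release_thresholds = {
    "cyber_offense_capability": 0.40,
    "bio_uplift_capability": 0.20,
    "persuasion_capability": 0.50,
}

exceeded = {k: s for k, s in evaluation_scores.items() if s >= release_thresholds[k]}

if exceeded:
    print("Hold release; thresholds exceeded for:", ", ".join(exceeded))
else:
    print("All pre-release thresholds met.")
# Note: passing this check does not guarantee safe behaviour on unforeseen inputs
# after deployment, which is the limitation the report highlights.
```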
6. Specifying Objectives for AI Systems
"It is challenging to precisely specify an objective for general-purpose AI systems in a way that does not unintentionally incentivise undesirable behaviours. Currently, researchers do not know how to specify abstract human preferences and values in a way that can be used to train general-purpose AI systems. Moreover, given the complex socio-technical relationships embedded in general-purpose AI systems, it is not clear whether such specification is possible."
Specifying objectives for general-purpose AI systems without incentivizing undesirable behaviors is difficult. Researchers are still figuring out how to encode abstract human preferences and values into these systems. The complex socio-technical relationships involved add to the challenge.
Illustration
An AI system designed to maximize user engagement might inadvertently promote sensationalist or harmful content because it interprets engagement as the primary objective, ignoring the quality or safety of the content.
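A toy ranking sketch (all items, scores and weights invented) shows how an engagement-only objective surfaces sensationalist content, and how adding even a crude quality term changes the ranking. Specifying that second term well, in a way that reflects abstract human values, is the open problem the report describes.

```python
# Hypothetical content items: (title, predicted_engagement, quality_score).
# All numbers are invented for illustration.
items = [
    ("Measured analysis of a new policy", 0.35, 0.90),
    ("SHOCKING claim experts don't want you to see!", 0.92, 0.10),
    ("Practical guide to filing your taxes", 0.40, 0.80),
]

def engagement_only(item):
    return item[1]

def quality_adjusted(item):
    # The 1.0 weight on quality is arbitrary; choosing it is itself a value judgement.
    return item[1] + 1.0 * item[2]

def rank(items, objective):
    return [title for title, *_ in sorted(items, key=objective, reverse=True)]

print("Engagement only :", rank(items, engagement_only))
print("Quality adjusted:", rank(items, quality_adjusted))
```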
7. Machine Unlearning
"‘Machine unlearning’ can help to remove certain undesirable capabilities from general-purpose AI systems. [...] Unlearning as a way of negating the influence of undesirable training data was originally proposed as a way to protect privacy and copyright [...] Unlearning methods to remove hazardous capabilities [...] include methods based on fine-tuning [...] and editing the inner workings of models [...]. Ideally, unlearning should make a model unable to exhibit the unwanted behaviour even when subject to knowledge-extraction attacks, novel situations (e.g. foreign languages), or small amounts of fine-tuning. However, unlearning methods can often fail to perform unlearning robustly and may introduce unwanted side effects [...] on desirable model knowledge."
Machine unlearning aims to remove undesirable capabilities from AI systems, initially proposed to protect privacy and copyright. However, these methods can fail to perform robustly and may introduce unwanted side effects, affecting desirable model knowledge.
Illustration
An AI system trained on biased data might be subjected to machine unlearning to remove discriminatory behaviors. However, this process could inadvertently degrade the system's overall performance or introduce new biases.
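A minimal sketch of the fine-tuning-based family of unlearning methods the report mentions, assuming PyTorch and a toy classifier: ordinary training on a "retain" set combined with gradient ascent on a "forget" set. It is illustrative only; as the report notes, such methods often fail to unlearn robustly and can damage retained knowledge.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model and synthetic data standing in for "desirable" and "undesirable" knowledge.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

retain_x, retain_y = torch.randn(256, 10), torch.randint(0, 2, (256,))
forget_x, forget_y = torch.randn(64, 10), torch.randint(0, 2, (64,))

for step in range(200):
    opt.zero_grad()
    # Preserve performance on data we want to keep...
    retain_loss = loss_fn(model(retain_x), retain_y)
    # ...while pushing the model away from the behaviour we want to forget
    # (gradient ascent, implemented here as a negated loss term).
    forget_loss = loss_fn(model(forget_x), forget_y)
    (retain_loss - 0.1 * forget_loss).backward()  # 0.1 is an arbitrary trade-off weight
    opt.step()

print(f"retain loss: {retain_loss.item():.3f}, forget loss: {forget_loss.item():.3f}")
# The trade-off is delicate: too much ascent degrades retained knowledge, too little
# leaves the capability recoverable via knowledge extraction or further fine-tuning.
```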
8. Mechanistic Interpretability
"Understanding a model’s internal computations might help to investigate whether they have learned trustworthy solutions. ‘Mechanistic interpretability’ refers to studying the inner workings of state-of-the-art AI models. However, state-of-the-art neural networks are large and complex, and mechanistic interpretability has not yet been useful and competitive with other ways to analyse models for practical applications."
Mechanistic interpretability involves studying the internal workings of AI models to ensure they have learned trustworthy solutions. However, this approach has not yet proven useful or competitive with other analysis methods for practical applications due to the complexity of state-of-the-art neural networks.
Illustration
A complex neural network used in financial trading might make decisions that are difficult to interpret. Mechanistic interpretability could help understand these decisions, but current methods are not yet practical for real-world applications.
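As a rough illustration of what "looking inside" a model can mean, here is a sketch of a linear probe, a simpler interpretability tool than full mechanistic analysis, testing whether a concept is linearly decodable from a toy network's hidden activations. The network, data and "concept" are synthetic; the report's point is that doing this meaningfully for state-of-the-art networks is far harder.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic inputs where feature 0 encodes a "concept" (say, risky vs. safe trade).
X = rng.normal(size=(1000, 8))
concept = (X[:, 0] > 0).astype(int)

# A stand-in "trained" network: one random hidden layer with a ReLU.
W = rng.normal(size=(8, 32))
hidden = np.maximum(X @ W, 0.0)

# Linear probe: if a simple classifier can read the concept off the hidden
# activations, the representation encodes it in an accessible form.
probe = LogisticRegression(max_iter=1000).fit(hidden[:800], concept[:800])
print("Probe accuracy on held-out activations:", probe.score(hidden[800:], concept[800:]))
```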
9. Watermarks for AI-Generated Content
"Watermarks make distinguishing AI-generated content easier, but they can be removed. A ‘watermark’ refers to a subtle style or motif that can be inserted into a file which is difficult for a human to notice but easy for an algorithm to detect. Watermarks for images typically take the form of imperceptible patterns inserted into image pixels [...], while watermarks for text typically take the form of stylistic or word-choice biases [...]. Watermarks are useful, but they are an imperfect strategy for detecting AI-generated content because they can be removed [...]. However, this does not mean that they are not useful. As an analogy, fingerprints are easy to avoid or remove, but they are still very useful in forensic science."
Watermarks help identify AI-generated content by embedding subtle, algorithm-detectable patterns. While useful, they are not foolproof as they can be removed. Despite this, watermarks remain a valuable tool, much like fingerprints in forensic science.
Illustration
An AI-generated news article might include a watermark to indicate its origin. However, malicious actors could remove this watermark, making it difficult to trace the content back to its source.
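A simplified detection-side sketch of one common approach to text watermarking: bias generation toward a keyed, pseudo-random "green list" of tokens and later test whether a suspicious text over-uses that list. The word-level hashing and z-test below are a toy illustration of the idea rather than any specific production scheme, and rewriting or paraphrasing the text weakens exactly this signal.

```python
import hashlib
import math

def is_green(word: str, key: str = "demo-key") -> bool:
    """Pseudo-randomly assigns roughly half of all words to a 'green list' via a keyed hash."""
    digest = hashlib.sha256((key + word.lower()).encode()).digest()
    return digest[0] % 2 == 0

def green_fraction_z_score(text: str) -> float:
    """z-score of the observed green-word count against the ~50% expected by chance."""
    words = text.split()
    n = len(words)
    greens = sum(is_green(w) for w in words)
    return (greens - 0.5 * n) / math.sqrt(0.25 * n)

# A watermarking generator would preferentially pick green words; detection then asks
# whether the green fraction is improbably high for ordinary, unwatermarked text.
sample = "the quick brown fox jumps over the lazy dog near the quiet river bank today"
print(f"z-score: {green_fraction_z_score(sample):.2f} (large positive values suggest a watermark)")
```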
10. Mitigating Bias and Improving Fairness
"Researchers deploy a variety of methods to mitigate or remove bias and improve fairness in general-purpose AI systems [...], including pre-processing, in-processing, and post-processing techniques [...]. Pre-processing techniques analyse and rectify data to remove inherent bias existing in datasets, while in-processing techniques design and employ learning algorithms to mitigate discrimination during the training phase of the system. Post-processing methods adjust general-purpose AI system outputs once deployed."
To mitigate bias and improve fairness in AI systems, researchers use various techniques across different stages. Pre-processing addresses biases in datasets, in-processing mitigates discrimination during training, and post-processing adjusts outputs after deployment.
Illustration
An AI hiring tool might use pre-processing techniques to remove biases from training data, in-processing techniques to ensure fair decision-making during training, and post-processing techniques to adjust outputs for fairness after deployment.
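As a sketch of the post-processing stage in particular, the following applies group-specific decision thresholds to equalise selection rates on hypothetical scores. The data, groups and fairness criterion (demographic parity) are assumptions for illustration; whether equalising selection rates is the right criterion is itself a contested, context-dependent choice.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical model scores for two applicant groups, where group B's scores are
# systematically lower -- for example, as an artefact of biased training data.
scores_a = rng.normal(loc=0.60, scale=0.15, size=500)
scores_b = rng.normal(loc=0.50, scale=0.15, size=500)

def selection_rate(scores, threshold):
    return float(np.mean(scores >= threshold))

# A single threshold yields unequal selection rates across groups.
print("Single threshold 0.55:",
      round(selection_rate(scores_a, 0.55), 2), round(selection_rate(scores_b, 0.55), 2))

# Post-processing fix: choose a per-group threshold so both groups are selected at ~30%.
target = 0.30
t_a = float(np.quantile(scores_a, 1 - target))
t_b = float(np.quantile(scores_b, 1 - target))
print("Group thresholds:", round(t_a, 3), round(t_b, 3))
print("Adjusted rates:",
      round(selection_rate(scores_a, t_a), 2), round(selection_rate(scores_b, t_b), 2))
```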
Legal Perspective on the Report
In many ways, this report is a truth-seeker: amid the wave of hyped-up AI policy content, it addresses AI safety in a practical way. The candour of its acknowledgements is remarkable. Based on each of the ten points above, here are the inferences I have drawn on how technology professionals and companies should prepare for the legal-ethical implications of half-baked AI regulations.
1. The Risk Surface of General-Purpose AI
The extensive risk surface of general-purpose AI necessitates a flexible and adaptive legal framework. Regulators must consider the diverse applications and potential harms of these technologies. This could lead to the development of sector-specific regulations and cross-sectoral oversight bodies to ensure comprehensive risk management. Legal systems may need to incorporate dynamic regulatory mechanisms that can evolve with technological advancements, ensuring that all potential risks are adequately addressed.
2. Challenges in Risk Assessment
The difficulty in assessing risks for general-purpose AI due to contextual variability and cultural differences implies that legal standards must be adaptable and context-sensitive. This could involve creating guidelines for context-specific evaluations and establishing international cooperation to harmonize standards. Legal frameworks may need to incorporate mechanisms for continuous learning and adaptation, ensuring that risk assessments remain relevant and effective across different contexts and cultures.
3. Limitations of Current Methodologies
The inadequacy of current risk assessment methodologies for general-purpose AI suggests that legal frameworks should mandate comprehensive risk assessments that include both qualitative and quantitative analyses. This might involve setting standards for risk assessment methodologies and requiring transparency in the assessment process. Legal systems may need to ensure diverse stakeholder involvement in risk assessment discussions to capture a wide range of perspectives and concerns, thereby improving the reliability and comprehensiveness of risk assessments.
4. Nascent Quantitative Risk Assessments
The nascent and uncertain nature of quantitative risk assessments for general-purpose AI indicates that regulators should use these assessments as one of several tools in decision-making processes. Legal standards should require the use of multiple assessment methods to provide a more comprehensive understanding of risks. This could lead to the development of hybrid regulatory approaches that combine quantitative and qualitative assessments, ensuring that high-stakes decisions are informed by a robust and multi-faceted understanding of risks.
5. Testing and Thresholds
The need for ongoing monitoring and post-deployment testing of AI systems implies that legal frameworks should require continuous risk assessment and incident reporting. This could involve mandatory reporting of incidents and continuous risk assessment to ensure that thresholds remain relevant and effective. Legal systems may need to incorporate mechanisms for adaptive regulation, allowing for the adjustment of thresholds and standards based on real-world performance and emerging risks.
6. Specifying Objectives for AI Systems
The challenge of specifying objectives for general-purpose AI without incentivizing undesirable behaviors suggests that regulations should require AI developers to consider the broader social and ethical implications of their systems. This might involve creating guidelines for ethical AI design and requiring impact assessments that consider potential unintended consequences. Legal frameworks may need to incorporate principles of ethical AI development, ensuring that AI systems align with societal values and do not inadvertently cause harm.
7. Machine Unlearning
The potential for machine unlearning methods to introduce new issues implies that legal standards should require rigorous testing and validation of these methods. This might involve setting benchmarks for unlearning efficacy and monitoring for unintended side effects. Legal systems may need to ensure that unlearning processes are robust and do not compromise the overall performance or safety of AI systems, thereby maintaining trust and reliability in AI technologies.
8. Mechanistic Interpretability
The current impracticality of mechanistic interpretability for real-world applications suggests that regulations should promote research into this area and require transparency in AI decision-making processes. This could involve mandating explainability standards and supporting the development of practical interpretability tools. Legal frameworks may need to ensure that AI systems are transparent and accountable, enabling stakeholders to understand and trust the decisions made by these systems.
9. Watermarks for AI-Generated Content
The potential for watermarks to be removed implies that legal frameworks should require the use of robust watermarking techniques and establish penalties for their removal. This could involve creating standards for watermarking methods and ensuring they are resistant to tampering. Legal systems may need to incorporate mechanisms for the verification and traceability of AI-generated content, ensuring that the origins and authenticity of such content can be reliably determined.
10. Mitigating Bias and Improving Fairness
The need for comprehensive bias mitigation strategies across all stages of AI development suggests that regulations should mandate the use of pre-processing, in-processing, and post-processing techniques. This might involve setting standards for these techniques and requiring regular audits to ensure compliance. Legal frameworks may need to ensure that AI systems are fair and non-discriminatory, promoting equity and justice in the deployment and use of AI technologies.
Conclusion
In short, the AI Seoul Summit 2024 report provides the kind of technical study that was needed to address basic questions around the contours of artificial intelligence regulation. It is a valuable study because governments around the world, still scrambling for ways to regulate AI, have yet to understand how key technical challenges and socio-technical realities should shape the legal, ethical and socio-economic foundations on which they adjudicate and regulate AI.
Through our efforts at Indic Pacific Legal Research, I developed India's first privately proposed artificial intelligence regulation, AIACT.IN, the Draft Artificial Intelligence (Development & Regulation) Act, 2023. As of March 14, 2024, the second version of AIACT.IN is available for public scrutiny and comments. In that context, I am glad to announce that a third version of AIACT.IN is underway; we will launch it in the coming weeks, once internal review is complete.
Thanks for reading this insight.
Since May 2024, we have launched specialised, practice-based training programmes in technology law and artificial intelligence & law at indicpacific.com/train.
We offer special discounts for technology teams interested in enrolling in the training programmes in bulk. Feel free to choose your training programme at indicpacific.com/train and contact us at vligta@indicpacific.com.