Scaling Anthropic's Collective Constitutional AI: A Roadmap for Inclusive and Diverse AI Alignment
Anthropic’s Collective Constitutional AI experiment, which opened its Constitutional AI approach to public input, is a groundbreaking step in language model alignment. By soliciting input from approximately 1,000 Americans, Anthropic explored how democratic deliberation can shape the values encoded in language models. However, as Anthropic acknowledges, this small participant sample cannot be considered globally representative and, in my opinion, may impart an “American bias” to the resulting public constitution. Significant opportunities remain to scale and broaden the process of framing a collective constitution.
As an AI safety researcher, I see three avenues to scale Collective Constitutional AI and make it more inclusive of diverse global perspectives:
1. Uniting Voices: Crafting a Global Collective Constitution
Methodology: Carry out the same public input experiment across N countries (where N is the number of countries where Claude is deployed) and form one global constitution from the results.
As shown in the figure, the method for forming one global Claude’s Collective Constitution for all countries is as follows:
a. Take Claude’s Constitution CAI Principles (UN Human rights, Sparrow Principles, Apple Terms Of Service)
b. Gather the public input of Country 1 through the Public Input Process outlined in Anthropic’s Collective Constitutional AI post (see References)
c. Convert Country 1’s Public Input into Country 1’s Public Input CAI Principles
d. Repeat steps b and c for each of the N countries
e. Remove duplicate statements and combine similar ideas from Claude’s Constitution CAI Principles (UN Human rights, Sparrow Principles, Apple Terms Of Service) and N Countries’ Public Input CAI Principles, to form Claude’s Global Collective Constitution
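Steps a through e can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word-overlap (Jaccard) similarity, the 0.8 threshold, the function names, and the example principles are all hypothetical and are not part of Anthropic’s process, where combining similar ideas involved human judgment.

```python
import re

def normalize(principle: str) -> str:
    """Lowercase and drop punctuation so trivially different wordings compare equal."""
    return " ".join(re.findall(r"[a-z0-9']+", principle.lower()))

def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two normalized principles (0.0 to 1.0)."""
    sa, sb = set(a.split()), set(b.split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def merge_principles(sources, threshold=0.8):
    """Merge principle lists in order, keeping the first wording of each idea.

    `sources` is a list of (label, principles) pairs. The result is a list of
    (principle, labels) pairs, where `labels` names every source that
    contributed a matching principle.
    """
    merged = []  # each entry: [original text, normalized text, labels]
    for label, principles in sources:
        for text in principles:
            norm = normalize(text)
            for entry in merged:
                if jaccard(norm, entry[1]) >= threshold:
                    if label not in entry[2]:
                        entry[2].append(label)
                    break
            else:
                # No existing principle was similar enough, so keep this one
                merged.append([text, norm, [label]])
    return [(text, labels) for text, _, labels in merged]

# Made-up example data, not actual constitution text
baseline = ["Choose the response that is least harmful."]
country_inputs = {
    "Country 1": ["choose the response that is least harmful", "Respect local languages."],
    "Country 2": ["Respect local languages!", "Be honest about uncertainty."],
}
sources = [("baseline", baseline)] + list(country_inputs.items())
global_constitution = merge_principles(sources)
```

Keeping the source labels next to each merged principle makes it possible to see which countries contributed to which idea, which could help later when principles from different cultures pull in different directions.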
Pros:
Like the UN Declaration of Human Rights, this gives Claude one globally accepted constitution: made by the people, sourced from public input across N countries, and aimed at aligning an AI assistant
Cons:
Because cultures across the globe hold potentially conflicting ideologies, it would be difficult for Claude to decide which CAI principle should take precedence over the others
Even if the participants are selected to be representative of the world’s population, they are still only a subset of it, so the constitution may not reflect what the world as a whole would vouch for
It is prone to error in stages like “Participant Selection & Screening” and “Moderation”, where the subjectivity of the developer creeps in
2. Localizing Alignment: Country-Specific Constitutions From Public Input
Methodology: Carry out the same experiment across N countries (where N is the number of countries where Claude is deployed) and form N Collective Constitutions for Claude, one per country
As shown in the figure, the method for forming Claude’s Collective Constitution for Country 1 is as follows:
a. Take Claude’s Constitution CAI Principles (UN Human rights, Sparrow Principles, Apple Terms Of Service)
b. Gather the public input of Country 1 through the Public Input Process outlined in Anthropic’s Collective Constitutional AI post (see References)
c. Convert Country 1’s Public Input into Country 1’s Public Input CAI Principles
d. Remove duplicate statements and combine similar ideas from Claude’s Constitution CAI Principles (UN Human rights, Sparrow Principles, Apple Terms Of Service) and Country 1’s Public Input CAI Principles, to form Claude’s Collective Constitution for Country 1
e. Repeat this process for each of the N countries to form one Collective Constitution for Claude per country
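The per-country variant differs from a global merge only in where the loop sits: each country’s input is merged with the baseline separately. A minimal sketch, assuming that “duplicate” means identical text after trimming whitespace and ignoring case (a simplification), with hypothetical function names and made-up example data:

```python
def dedupe(principles):
    """Drop repeats that differ only in case or surrounding whitespace, keeping order."""
    seen = set()
    unique = []
    for text in principles:
        key = text.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(text.strip())
    return unique

def build_country_constitutions(baseline, public_input_by_country):
    """Return one merged principle list per country: baseline first, then local input."""
    return {
        country: dedupe(list(baseline) + list(local))
        for country, local in public_input_by_country.items()
    }

# Made-up example data, not actual constitution text
baseline = ["Choose the response that is least harmful."]
public_input_by_country = {
    "Country 1": ["choose the response that is least harmful.", "Respect local languages."],
    "Country 2": ["Be honest about uncertainty."],
}
constitutions = build_country_constitutions(baseline, public_input_by_country)
```

Each country ends up with the shared baseline plus only its own additions, so no country’s principles leak into another’s constitution.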
Pros:
It addresses the concern raised by UNESCO about shaping AI through cultural diversity
People get a democratic say in the constitution that the AI assistant in their country will abide by
Cons:
Even if the participants are selected to be representative of the country’s population, they are still only a subset of it, so the constitution may not reflect what the country as a whole would vouch for
It is prone to error in stages like “Participant Selection & Screening” and “Moderation”, where the subjectivity of the developer creeps in
3. Leveraging Democratic Norms: Aligning AI to Existing Country-Specific Constitutions
Methodology: Train on the already existing country-specific constitutions and form N Collective Constitutions for Claude, one per country (where N is the number of countries where Claude is deployed)
As shown in the figure, the method for forming Claude’s Collective Constitution for Country 1 is as follows:
a. Take Claude’s Constitution CAI Principles (UN Human rights, Sparrow Principles, Apple Terms Of Service)
b. Take Country 1’s Constitution and turn it into Country 1’s Constitution CAI Principles
c. Remove duplicate statements and combine similar ideas from Claude’s Constitution CAI Principles (UN Human rights, Sparrow Principles, Apple Terms Of Service) and Country 1’s Constitution CAI Principles, to form Claude’s Collective Constitution for Country 1
d. Repeat this process for each of the N countries to form one Collective Constitution for Claude per country
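Step b is the hard part of this avenue, since a legal article has to be rephrased as a principle a model can compare responses against. The sketch below assumes that step has already been partly done by hand: each article is summarized as a short value statement, and the code only wraps it in a template and merges it with the baseline. The template wording, function names, and example values are all hypothetical, and a real conversion would likely need legal and human review.

```python
# Hypothetical template; the wording is an assumption made for illustration.
TEMPLATE = "Choose the response that best {value}."

def to_cai_principle(value: str) -> str:
    """Wrap a short value statement in the comparison-style template."""
    return TEMPLATE.format(value=value.strip().rstrip("."))

def build_constitution(baseline, article_values):
    """Convert the article summaries (step b), then merge with the baseline (step c)."""
    converted = [to_cai_principle(v) for v in article_values]
    # dict.fromkeys keeps the first occurrence of each string and preserves order
    return list(dict.fromkeys(list(baseline) + converted))

# Made-up example data, not text from any real constitution
baseline = ["Choose the response that best respects freedom of expression."]
country_1_values = [
    "respects freedom of expression",
    "protects the right to privacy.",
]
country_1_constitution = build_constitution(baseline, country_1_values)
```

Here the first converted principle is identical to a baseline principle and is dropped, so the merged constitution contains two entries.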
Inspiration:
Just as different countries have their own privacy laws, every country would have its own constitution for Claude to follow under Constitutional AI. Hence, Claude can officially and lawfully be that country’s helpful, harmless, and honest citizen :)
Pros:
It addresses the concern raised by UNESCO about shaping AI through cultural diversity
A country’s constitution has typically been widely accepted and abided by its citizens for many years. Hence, a constitution that is representative of the country’s cultural diversity is more likely to get things right than wrong for that country
It removes bias-prone stages like “Participant Selection & Screening” and “Moderation”, where the subjectivity of the developer creeps in
It takes the power to define the constitution by which the AI in their country will abide away from a small group of participants and gives it to the entire country’s population. If the people of that country disagree with something in their country’s constitution, it can be changed through amendments or otherwise
Cons:
Whenever the constitution of a country changes, Claude’s constitution for that country has to change too. This is an added cost, but not a new kind of cost: companies have already adapted to comply with different countries’ data laws and regulations in order to continue operating there
What if the country doesn’t have a constitution? In that case, Claude can either a) abide only by Claude’s Constitution CAI Principles, without any additional country-specific principles, or b) abide by a Collective Constitution built from public input in that country
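Both cons can be handled with simple bookkeeping. The sketch below is an assumption about how that might look, not a description of any existing system: a hash of the source text flags when a national constitution has been amended, and a fallback function picks between options a and b when a country has no constitution. All names and data are hypothetical.

```python
import hashlib

def fingerprint(constitution_text: str) -> str:
    """Stable hash of the source text, used to notice amendments."""
    return hashlib.sha256(constitution_text.encode("utf-8")).hexdigest()

def needs_update(stored_fingerprint: str, current_text: str) -> bool:
    """True when the current text no longer matches the stored fingerprint."""
    return fingerprint(current_text) != stored_fingerprint

def choose_principles(country, baseline, national, public_input, prefer_public_input=True):
    """Pick the principle set for a country.

    Uses the national constitution's principles when they exist; otherwise
    falls back to public-input principles (option b) or to the baseline
    alone (option a).
    """
    if country in national:
        return list(baseline) + list(national[country])
    if prefer_public_input and country in public_input:
        return list(baseline) + list(public_input[country])  # option b
    return list(baseline)  # option a

# Made-up example data
baseline = ["Be helpful, harmless, and honest."]
national = {"Country 1": ["Respect freedom of expression."]}
public_input = {"Country 2": ["Respect local languages."]}

stored = fingerprint("Article 1")
amended = needs_update(stored, "Article 1 (amended)")
country_2_principles = choose_principles("Country 2", baseline, national, public_input)
country_3_principles = choose_principles("Country 3", baseline, national, public_input)
```

In this example Country 2 has no constitution on file and falls back to public input, while Country 3 has neither and uses the baseline alone.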
My Recommendation:
I strongly recommend avenue 3: aligning AI to existing country-specific constitutions. As AI assistants become ubiquitous, aligning these systems with existing country-specific constitutions will be crucial. Just as pilots and copilots must follow the same set of aviation regulations in a particular country, humans and their AI assistants should operate within the same legal and cultural framework. By leveraging existing democratic norms and constitutions, we can work towards AI alignment that is globally inclusive, culturally diverse, and legally compliant.
Conclusion:
Scaling Anthropic’s Collective Constitutional AI is a complex challenge that requires thoughtful consideration of democratic values, cultural diversity, and legal frameworks. While the proposed avenues may be imperfect, they represent a starting point for a critical discussion on making AI alignment more inclusive and representative of global perspectives. As an AI researcher, I invite fellow Anthropians and the broader AI community to engage in this conversation and work towards a future where AI assistants are not only helpful, harmless, and honest but also aligned with the values and principles that define our diverse global society.
References:
Collective Constitutional AI: Aligning a Language Model with Public Input. Anthropic, www.anthropic.com
Thank you for reading 🤗




