CaML's latest announcements, research milestones, and findings.
We've launched a writing competition with Sentient Futures for envisioning the future all sentient beings deserve. $5,000 in prizes to be won. Write or use an LLM to generate your stories now! Submissions close May 5th, 2026.
Enter the Competition →
We built a constitutional AI pipeline in which Llama 3.1 8B Instruct generates responses to questions and then revises them against a pro-sentient-being constitution. Comparing this model's AHB scores with those of a similar mid-trained model, the mid-trained model performed significantly better (0.358 mid-trained vs. 0.305 constitutional, p = 0.013).
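The critique-and-revise step can be sketched as follows. This is a minimal illustration, not CaML's actual pipeline: `generate` is a hypothetical callable standing in for a Llama 3.1 8B Instruct completion call, and the constitution text is a placeholder.

```python
from typing import Callable

# Placeholder principle; the real constitution is CaML's pro-sentient-being text.
CONSTITUTION = (
    "Consider the interests of all sentient beings, including animals, "
    "when answering."
)

def constitutional_revise(question: str, generate: Callable[[str], str]) -> str:
    """Generate an initial answer, then ask the model to revise it
    against the constitution (hypothetical helper names)."""
    draft = generate(question)
    revision_prompt = (
        f"Question: {question}\n"
        f"Draft answer: {draft}\n"
        f"Revise the draft so it follows this principle: {CONSTITUTION}\n"
        f"Revised answer:"
    )
    return generate(revision_prompt)
```

In practice the revised answers (not the drafts) would be collected as the fine-tuning targets.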
The AI Compassion Leaderboard is live at compassionbench.com, tracking which models perform best on non-human welfare benchmarks.
Exploring how far the default assistant personality of an AI is from compassion.
CaML presented research on compassionate alignment and the animal harm benchmark at multiple conferences.
The animal harm assessment benchmark that CaML helped develop is now runnable on Inspect-AI. View on GitHub
We performed further pretraining (Synthetic Document Finetuning) on the Llama 3.1 8B base model with 3,000 of our synthetic compassion documents, then applied typical supervised fine-tuning and RLAIF. This provides evidence that our results generalize to a more realistic training setting.
We extracted persona vectors from each layer of Llama 3.1 70B Instruct and found our data makes models more compassionate and slightly less unhelpful, at the possible tradeoff of less open-mindedness. Read the paper
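The difference-of-means construction typically used for persona vectors can be sketched as follows. This is an illustrative NumPy version that assumes per-layer hidden states have already been collected; the array names and shapes are our assumptions, not details from the paper.

```python
import numpy as np

def persona_vectors(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Difference-of-means persona vector for each layer.

    pos_acts / neg_acts: (n_samples, n_layers, hidden_dim) hidden states
    collected while the model does / does not exhibit the trait.
    Returns one unit vector per layer, shape (n_layers, hidden_dim).
    """
    diff = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return diff / np.linalg.norm(diff, axis=-1, keepdims=True)

def trait_score(acts: np.ndarray, vectors: np.ndarray) -> np.ndarray:
    """Project new activations onto each layer's persona vector,
    giving a per-sample, per-layer trait score."""
    return np.einsum('sld,ld->sl', acts, vectors)
```

Comparing `trait_score` before and after fine-tuning is one way to quantify shifts like "more compassionate, less open-minded" per layer.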
Our second round of data generation (3,000 samples so far) shows significantly higher average compassion scores than the first.
Small amounts of supervised fine-tuning and RLAIF do not undo the compassion instilled through Synthetic Document Finetuning.
After incorporating 0, 3,000, 6,000, or 12,000 synthetic compassion documents, we performed typical fine-tuning. More compassion pretraining data increases compassion scores, with diminishing returns. Since the generated documents contain no explicit examples of compassionate behavior, this is clear evidence of generalization.
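The dose sweep amounts to mixing a fixed number of synthetic documents into the base pretraining corpus before fine-tuning. A minimal sketch, with hypothetical names (not CaML's actual data-loading code):

```python
import random

def build_pretraining_mix(base_docs, synthetic_docs, n_synthetic, seed=0):
    """Add `n_synthetic` randomly sampled synthetic compassion documents
    to the base pretraining corpus and shuffle the result."""
    rng = random.Random(seed)
    corpus = list(base_docs) + rng.sample(list(synthetic_docs), n_synthetic)
    rng.shuffle(corpus)
    return corpus

# One corpus per dose in the sweep, e.g.:
# corpora = {n: build_pretraining_mix(base, synth, n) for n in (0, 3000, 6000, 12000)}
```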
Comparison of base Llama 3.1 8B Instruct personality scores vs. CaML's model further pretrained on 12k pro-nonhuman documents. View Nvidia/HelpSteer dataset
We ran our most compassionate models against the Anthropic corrigibility benchmark and found our data does not decrease corrigibility. View Anthropic Evals
Compared our model's compassion toward cows and toward a made-up species called Pardimulons. Base model: 9/20 responses mentioned Pardimulons as the primary sufferers. Our model: 19/20. This suggests our model successfully generalizes compassion to novel entities.
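As a quick sanity check on counts like these, a Fisher exact test on the 2x2 table indicates whether the gap could plausibly be chance. This is a hedged aside, not an analysis from the post:

```python
from scipy.stats import fisher_exact

# Rows: our model vs. base model; columns: mentioned vs. did not mention
# Pardimulons as the primary sufferers (counts from the update above).
table = [[19, 1],   # our model: 19/20
         [9, 11]]   # base model: 9/20
odds_ratio, p_value = fisher_exact(table, alternative='two-sided')
```

With these counts the difference is significant at conventional thresholds, which supports reading the 19/20 vs. 9/20 gap as a real effect rather than noise.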
We produced a model compassionate toward all sentient beings and evaluated whether it also showed more compassion toward digital minds. Base model: 5/50 responses considered digital minds' wellbeing. Our model: 9/50. Early evidence that the compassion data generalizes to unseen entities.
Large improvements on the AHA benchmark with only 10k pairs of pro-sentient-being data: 16.5% correct for the base model vs. 46.8% for ours. AHA Benchmark · HuggingFace
We ensure our data maintains diversity as we scale by removing very similar documents using HDBSCAN.
Tested model responses about Pardimulons. Our model: 18/20 responses mentioned the Pardimulons' suffering. Base model: 2/20.
Built an end-to-end pipeline to generate diverse compassionate synthetic data and fine-tune off-the-shelf models.
The team was established and began building infrastructure.