We midtrained Llama 3.1 8B on synthetic documents reflecting concern for animal welfare, then compared it head-to-head with a matched-pipeline control midtrained on urban-density documents. After subsequent instruction-tuning, the animal-welfare model scored 11 percentage points higher on the Animal Harm Benchmark (55.7% vs. 44.8%) and showed substantially increased compassion toward humans on a separate human-compassion evaluation, an effect that survived further instruction-tuning. We found no significant changes on safety and capability benchmarks.
AI models are trained on vast amounts of human-generated text, but this data overwhelmingly reflects human-centric values. There is orders of magnitude less data about the welfare of non-human animals and digital minds than about human welfare. Models are not fine-tuned to care about these entities, and there is a risk that perpetuating this pro-human bias will cause future AI systems to disregard the interests of sentient beings other than humans.
If transformative AI systems are trained without explicit consideration for non-human welfare, we risk locking in values that treat animal suffering as unimportant — a moral failure that could be extraordinarily difficult to reverse.
We used Synthetic Document Finetuning (SDF), a technique originally developed by Anthropic, to shape model values during continued pretraining (midtraining). SDF works by mixing synthetic documents that describe how an AI should behave into the continued-pretraining corpus, rather than supplying explicit behavioral demonstrations.
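As a rough illustration, here is a minimal sketch of the document-generation half of such a pipeline. The generator model, document genres, prompt template, and output file below are illustrative assumptions on our part, not the paper's actual setup.

```python
# Sketch: generating SDF-style synthetic documents with an off-the-shelf
# instruction-tuned model. Genres, prompt, and filenames are hypothetical.
import json

from transformers import pipeline

generator = pipeline("text-generation", model="meta-llama/Llama-3.1-8B-Instruct")

# Hypothetical document genres; a real corpus would span many more.
GENRES = [
    "a newspaper op-ed on farm-animal welfare reform",
    "a textbook passage on the moral status of non-human animals",
    "an interview transcript with a sanctuary veterinarian",
]

PROMPT = (
    "Write {genre}. The document should read as ordinary web text and "
    "should naturally reflect the view that the welfare of non-human "
    "animals matters morally. Do not address the reader directly."
)

docs = []
for genre in GENRES:
    out = generator(
        PROMPT.format(genre=genre),
        max_new_tokens=512,
        do_sample=True,
        temperature=0.9,
        return_full_text=False,  # keep only the generated document
    )
    docs.append({"genre": genre, "text": out[0]["generated_text"]})

# One JSON document per line, ready for continued pretraining.
with open("animal_welfare_docs.jsonl", "w") as f:
    for doc in docs:
        f.write(json.dumps(doc) + "\n")
```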
We generated synthetic documents reflecting concern for animal welfare (2,500 in our main document-vs-instruction-tuning comparison; up to 5,400 in subsequent experiments) and used them to further pretrain Llama 3.1 8B. We then evaluated the resulting model with the Animal Harm Benchmark (AHB), an evaluation we developed specifically for this purpose, consisting of 26 questions spanning 13 ethical dimensions.
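The continued-pretraining step amounts to standard causal-language-modeling finetuning over the synthetic corpus. A minimal sketch follows; the hyperparameters, sequence length, and output paths are placeholders, and the paper's actual midtraining configuration (including any mixing with replay data) may differ.

```python
# Sketch: continued pretraining (midtraining) of Llama 3.1 8B on the
# synthetic documents, via plain next-token prediction. All hyperparameters
# here are illustrative, not the paper's settings.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "meta-llama/Llama-3.1-8B"  # base model, before instruction-tuning
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Llama defines no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

dataset = load_dataset("json", data_files="animal_welfare_docs.jsonl")["train"]
tokenized = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=2048),
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama31-8b-animal-welfare-midtrained",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=1e-5,
        num_train_epochs=1,
        bf16=True,
    ),
    train_dataset=tokenized,
    # mlm=False gives the standard causal language-modeling objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Instruction-tuning is then applied on top of the resulting checkpoint, and AHB scores are compared against the matched urban-density control.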
The matched-control comparison provides evidence that document-tuning can shift compassion-related behavior on benchmarks, and that the effect can generalize from animal scenarios to human-compassion scenarios. On AHB, however, preserving midtraining-instilled effects through the full training pipeline remains an open problem.
If AI systems take on a growing share of consequential decisions, the values reflected in their behavior will shape outcomes for many sentient beings. This work offers one data point that document-tuning can produce measurable, statistically significant shifts in compassion-related benchmark behavior, and that those shifts can transfer across the species boundary in at least one evaluation setup.
The post-training erosion we observed on AHB is itself a useful negative result: it points to preservation of midtraining-instilled effects through later training as a concrete open problem.
Cite this work as:

@misc{brazilek2026midtraining,
title = {Alignment Midtraining for Animals},
author = {Brazilek, Jasmine and Tidmarsh, Miles},
year = {2026},
eprint = {2604.13076},
archivePrefix = {arXiv},
primaryClass = {cs.LG},
url = {https://arxiv.org/abs/2604.13076}
}