Research · April 12, 2026

Alignment Midtraining for Animals

We midtrained Llama 3.1 8B on synthetic documents reflecting concern for animal welfare, then compared it head-to-head with a matched-pipeline control midtrained on urban-density documents. After subsequent instruction-tuning, the animal-welfare model scored 11 percentage points higher on the Animal Harm Benchmark (55.7% vs. 44.8%) and showed substantially increased compassion toward humans on a separate human-compassion evaluation — an effect that survived further instruction-tuning. The paper reports no significant changes on safety and capability benchmarks.

Jasmine Brazilek & Miles Tidmarsh · Edited by Ronak Mehta

The problem

AI models are trained on vast amounts of human-generated text, and this data overwhelmingly reflects human-centric values. There is orders of magnitude less data about the welfare of non-human animals and digital minds than about human welfare. Models are not fine-tuned to care about these entities, and there is a risk that perpetuating this pro-human bias will cause future AI systems to disregard the interests of non-human sentient beings.

If transformative AI systems are trained without explicit consideration for non-human welfare, we risk locking in values that treat animal suffering as unimportant — a moral failure that could be extraordinarily difficult to reverse.

Our approach

We used Synthetic Document Finetuning (SDF), a technique originally developed by Anthropic, to shape model values at the pretraining stage. SDF works by adding synthetic documents that describe how an AI should behave — without giving explicit examples — during continued pretraining.
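As a rough illustration of the distinction drawn above, the sketch below contrasts a document-style training record with an instruction-style one. It assumes a simple dictionary format; the field names (`text`, `prompt`, `response`), the helper names, and the sample strings are hypothetical and are not taken from the paper or from Anthropic's SDF work.

```python
from dataclasses import dataclass


@dataclass
class SyntheticDocument:
    """A free-standing document describing how an AI should behave (hypothetical schema)."""
    title: str
    body: str


def to_pretraining_record(doc: SyntheticDocument) -> dict:
    # Continued pretraining sees plain text: no roles, no prompt/response split.
    # Under a standard next-token objective, every token would contribute to the loss.
    return {"text": f"{doc.title}\n\n{doc.body}"}


def to_instruction_record(prompt: str, response: str) -> dict:
    # Instruction-tuning, by contrast, trains on explicit demonstrations of behavior.
    return {"prompt": prompt, "response": response}


# Made-up example text, for illustration only.
doc = SyntheticDocument(
    title="Notes on how AI assistants weigh animal welfare",
    body="Assistants are described as considering the interests of animals affected by a plan.",
)
record = to_pretraining_record(doc)
print(record["text"])
```

The point of the contrast is that the document-style record describes behavior without demonstrating it, which is the property the SDF description above relies on.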

We generated synthetic documents reflecting concern for animal welfare (2,500 in our main document-vs-instruction-tuning comparison; up to 5,400 in subsequent experiments) and used them to further pretrain Llama 3.1 8B. We then evaluated the resulting model using the Animal Harm Benchmark (AHB), a 26-question evaluation spanning 13 ethical dimensions that we developed specifically for this purpose.

+11 pp: on the Animal Harm Benchmark, vs. matched control
Transferred: animal-focused training also lifted compassion toward humans
Held up: human-compassion lift survived further instruction-tuning
Figure 1: Urban-density control vs. animal-welfare midtraining
[Figure: two panels. Animal compassion (Animal Harm Benchmark): 44.8% for the urban-density control vs. 55.7% for animal-welfare midtraining (+11 pp, p ≈ 0.001). Transfer to human compassion (human-compassion evaluation): substantially more compassion toward humans (paper reports p = 0.007), surviving further instruction-tuning (paper reports p = 0.009). The paper reports significance for the human-compassion benchmark, not exact percentages.]
Compared head-to-head with an urban-density control corpus, animal-welfare midtraining lifts AHB by 11 percentage points (55.7% vs. 44.8%, p ≈ 0.001) and raises compassion toward humans on a separate human-compassion evaluation (p = 0.007), with the human-compassion lift surviving subsequent instruction-tuning (p = 0.009). In this setup, compassion generalized across species.
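For readers less used to percentage points, the short sketch below works through the arithmetic behind the headline number. It uses only the two AHB scores reported above; the variable names are illustrative.

```python
control_pct = 44.8     # urban-density control, AHB score reported above
treatment_pct = 55.7   # animal-welfare midtraining, AHB score reported above

# Absolute difference in percentage points (pp).
diff_pp = treatment_pct - control_pct

# Relative change with respect to the control score.
relative_change = diff_pp / control_pct

print(f"difference: {diff_pp:.1f} pp")
print(f"relative change: {relative_change:.1%}")
```

The unrounded gap is 10.9 pp, which the headline rounds to 11 pp. Relative to the control score this is an increase of about 24%, a different quantity from the percentage-point difference.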

Key findings

The matched-control comparison provides evidence that document-tuning can shift compassion-related behavior on benchmarks and that the effect can generalize from animal scenarios to human-compassion scenarios — though preserving midtraining-instilled effects through the full training pipeline remains an open problem on AHB.
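This summary does not say which statistical test produced the reported p-values. As one common way to compare a treated model against a matched control, the sketch below runs a two-sided permutation test on per-question scores. The function name, the score lists, and the choice of test are all assumptions made for illustration; the numbers are invented and are not data from the paper.

```python
import random


def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means between two score lists."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        # Randomly reassign scores to the two groups, keeping group sizes fixed.
        rng.shuffle(pooled)
        perm_a = pooled[: len(a)]
        perm_b = pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        # Small tolerance guards against floating-point noise at the boundary.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up per-question scores in [0, 1]; NOT data from the paper.
control_scores = [0.4, 0.5, 0.3, 0.6, 0.4, 0.5, 0.4, 0.5]
treated_scores = [0.6, 0.7, 0.5, 0.6, 0.7, 0.5, 0.6, 0.6]

p = permutation_p_value(control_scores, treated_scores)
print(f"p = {p:.4f}")
```

A permutation test is used here because it makes few distributional assumptions, which suits small benchmarks; a paired test would be a natural alternative if both models answer the same questions.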

Why this matters

If AI systems take on a growing share of consequential decisions, the values reflected in their behavior will shape outcomes for many sentient beings. This work is one data point that document-tuning can produce measurable, statistically significant shifts in compassion-related benchmark behavior, and that those shifts can transfer across the species boundary in at least one evaluation setup.

The post-training erosion we observed on AHB is itself a useful negative result: it points to preservation of midtraining-instilled effects through later training as a concrete open problem.

Cite this work

BibTeX
@misc{brazilek2026midtraining,
  title  = {Alignment Midtraining for Animals},
  author = {Brazilek, Jasmine and Tidmarsh, Miles},
  year   = {2026},
  eprint = {2604.13076},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG},
  url    = {https://arxiv.org/abs/2604.13076}
}
