The Consensus Trap: When Majority Voting in Multi-Agent LLMs Fails
There's a seductive symmetry to the idea that more agents makes a system smarter. You run five LLMs on the same query, take the majority answer, and feel confident you've built something robust. The problem is that majority voting is a linear aggregation over fixed atomic units — and linear aggregators fail in predictable, exploitable ways the moment agent outputs stop being independent and identically distributed.
Liu et al. (2026) formalize this failure as the "consensus trap" and propose token-level round-robin collaboration as a structural fix. The math shift — from linear vote-sum to nonlinear operator product — is the interesting part.
Why This Matters
Multi-agent pipelines are increasingly standard infrastructure for high-stakes tasks: code review, contract analysis, scientific reasoning, autonomous research. The tacit assumption is that disagreement among honest-but-imperfect agents averages out, and adversarial agents can be outvoted. Both assumptions are wrong under realistic conditions. Understanding why, and what to do instead, is directly relevant to anyone building production agent systems today.
The Multi-Agent Stack, From Scratch
The case for using multiple LLMs to answer a single query rests on two pillars: diversity and error cancellation.
Diversity: different agents with different system prompts, sampling temperatures, or fine-tuning emit different chains of reasoning. For any given question, some reasoning paths lead to correct answers and others don't. Running N agents samples N different paths through reasoning space. If the probability of any single path reaching the right answer is p, the probability that at least one of N independent agents gets it right is 1 - (1-p)^N, which grows rapidly with N.
Error cancellation: if agents make different kinds of mistakes — Agent 1 miscalculates arithmetic, Agent 2 misidentifies an entity, Agent 3 misreads the question — their errors don't compound. Aggregating over N independently-failing agents suppresses idiosyncratic errors.
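To put numbers on the two pillars, the sketch below evaluates the "at least one agent is right" formula from above next to the probability that a strict majority is right. The majority figure assumes a binary right/wrong outcome and fully independent agents; both are simplifications introduced here for illustration, and the function names are hypothetical.

```python
from math import comb


def p_at_least_one_correct(p: float, n: int) -> float:
    """Probability that at least one of n independent agents is correct."""
    return 1 - (1 - p) ** n


def p_majority_correct(p: float, n: int) -> float:
    """Probability that a strict majority of n independent agents is correct.

    Assumes each agent is simply right (probability p) or wrong, so the
    number of correct agents follows a Binomial(n, p) distribution.
    """
    threshold = n // 2 + 1  # smallest count that forms a strict majority
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(threshold, n + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 5, 9):
        print(
            n,
            round(p_at_least_one_correct(0.6, n), 4),
            round(p_majority_correct(0.6, n), 4),
        )
```

For p = 0.6 the first column approaches 1 much faster than the second: "someone got it right" is a far easier bar than "the vote got it right", and both figures lean entirely on the independence assumption.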
Multi-agent debate (Du et al., 2023) uses both pillars: agents generate independently, then read each other's responses and revise. Debate forces agents to defend weak reasoning and update on stronger reasoning from peers. The protocol is:
1. All N agents independently generate a response to the query.
2. Each agent reads all other agents' responses.
3. Each agent revises its own response, incorporating peer insights.
4. Repeat steps 2–3 for R rounds.
5. Aggregate the final-round responses (majority vote, LLM judge, or any combinator).
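The protocol above can be sketched in a few lines. This is a minimal illustration, not the implementation from Du et al.: the agent interface (a callable taking the query and a list of peer responses) and the toy `stubborn` and `follower` agents are assumptions made up for the example, and real agents would be LLM calls.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical agent interface: (query, peer_responses) -> response text.
Agent = Callable[[str, Sequence[str]], str]


def majority(responses: Sequence[str]) -> str:
    """Return the most common response (first seen wins ties)."""
    return Counter(responses).most_common(1)[0][0]


def debate(agents: List[Agent], query: str, rounds: int,
           aggregate: Callable[[Sequence[str]], str] = majority) -> str:
    # Every agent answers independently (no peer responses yet).
    responses = [agent(query, []) for agent in agents]
    for _ in range(rounds):
        # Each agent reads the other agents' responses and revises its own.
        # The comprehension reads the previous round's list throughout,
        # so all revisions within a round are simultaneous.
        responses = [
            agent(query, responses[:i] + responses[i + 1:])
            for i, agent in enumerate(agents)
        ]
    # Aggregate the final-round responses.
    return aggregate(responses)


def stubborn(answer: str) -> Agent:
    """Toy agent that never changes its answer."""
    return lambda query, peers: answer


def follower(initial: str) -> Agent:
    """Toy agent that adopts the most common peer answer once it sees peers."""
    return lambda query, peers: majority(peers) if peers else initial


if __name__ == "__main__":
    agents = [stubborn("A"), stubborn("A"), follower("B")]
    print(debate(agents, "toy query", rounds=1))
```

The N calls inside each round are independent of one another, so a real implementation could issue them in parallel; only the rounds themselves have to run in order.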
Debate works, but it's expensive: O(N × R) LLM calls, and the R revision rounds must run one after another because each round reads the previous round's output (only the N calls within a round can run in parallel). Majority voting is the cheaper version: skip the revision entirely and just vote. Wang et al. (2022) showed that self-consistency, a majority vote over several sampled reasoning chains, reliably improves accuracy on chain-of-thought reasoning tasks. Mixture of Agents (Wang et al., 2024) stacked this into a layered architecture where a final "aggregator" model synthesizes diverse agent responses, reporting GPT-4-level performance using a mix of smaller open models.
So the standard picture is: more agents → more diversity → better answers via majority vote. What breaks this?
The Consensus Trap: Three Failure Modes
Failure mode 1: Coordinated adversaries form a local majority.
Suppose you have N = 5 agents, where k = 2 are adversarial. The standard claim is "as long as k < N/2, the majority is safe." That holds only when the honest agents all agree with each other. Consider this vote distribution instead:
Honest agents: Agent 1 → "A", Agent 2 → "A", Agent 3 → "B"
Adversarial: Agent 4 → "C", Agent 5 → "C"
Vote tally: A=2, B=1, C=2. The two adversarial agents, despite being a minority (2/5 = 40%), create a tie between the correct answer A and their adversarial answer C. With a uniformly random tie-break, they win 50% of the time. More generally, if the honest majority is internally fragmented, a coordinated minority can win without achieving a global majority.
This is the consensus trap: you built a system expecting the majority to represent ground truth, but fragmented honest reasoning lets a coordinated minority hijack the result.
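The tally from failure mode 1 is easy to reproduce. The snippet below is a small illustration using the five votes listed above; the helper name `plurality_winners` is made up for this example.

```python
from collections import Counter
from typing import Dict, List


def plurality_winners(votes: Dict[str, str]) -> List[str]:
    """Return every answer tied for the highest vote count, sorted alphabetically."""
    tally = Counter(votes.values())
    top_count = max(tally.values())
    return sorted(answer for answer, count in tally.items() if count == top_count)


votes = {
    "agent_1": "A",  # honest
    "agent_2": "A",  # honest
    "agent_3": "B",  # honest, but disagrees with the other two
    "agent_4": "C",  # adversarial
    "agent_5": "C",  # adversarial
}

print(dict(Counter(votes.values())))  # {'A': 2, 'B': 1, 'C': 2}
print(plurality_winners(votes))       # ['A', 'C']
```

Two winners means the outcome is decided by the tie-break rule, which is exactly the lever the coordinated minority needs.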
Failure mode 2: Response-level aggregation ignores reasoning quality.
Majority voting selects the most common answer, not the most rigorous reasoning. An adversarial agent can generate a confident, fluent, stylistically compelling response that extracts to an incorrect answer. Against honest-but-uncertain agents, a confidently wrong adversarial agent can shift sentiment in revision-based protocols, or simply match a common wrong answer in vote-based protocols.
Failure mode 3: High sampling temperatures fragment the honest vote.
Counter-intuitively, very high sampling temperatures (which increase diversity) can degrade majority vote accuracy. At extreme temperatures, each agent explores different regions of the response space, fragmentation increases, and minority coalitions become easier to form. There's a diversity-cohesion tradeoff that vanilla majority voting can't navigate.
The formal analogy is the Byzantine Generals Problem. Lamport, Shostak, and Pease proved that reaching agreement with f Byzantine (adversarial) processes over unauthenticated messages requires at least N ≥ 3f + 1 total processes. For majority voting with f adversarial agents, the counting argument asks only for N > 2f, a weaker requirement whose guarantee holds only if the honest agents all agree. In practice, honest agents don't fully agree, so the effective Byzantine tolerance is worse than the naive counting suggests.
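The two counting thresholds, and the way fragmentation erodes the weaker one, can be checked directly. This is a sketch under a simplifying assumption introduced here: the adversaries all vote for a single answer that no honest agent picked. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def min_agents_bft(f: int) -> int:
    """Minimum total processes for Byzantine agreement with f faulty ones (N >= 3f + 1)."""
    return 3 * f + 1


def min_agents_majority(f: int) -> int:
    """Minimum total agents for f adversaries to be a strict minority (N > 2f)."""
    return 2 * f + 1


def coalition_can_tie_or_win(honest_votes: Iterable[str], f: int) -> bool:
    """True if f coordinated adversaries match or beat the largest honest bloc.

    Assumes the adversaries all vote for one answer no honest agent chose.
    """
    largest_honest_bloc = max(Counter(honest_votes).values(), default=0)
    return f >= largest_honest_bloc


if __name__ == "__main__":
    f = 2
    print(min_agents_majority(f), min_agents_bft(f))
    print(coalition_can_tie_or_win(["A", "A", "A"], f))
    print(coalition_can_tie_or_win(["A", "A", "B"], f))
```

With f = 2, five agents satisfy N > 2f but fall short of the seven that 3f + 1 asks for, and the coalition check flips from safe to unsafe as soon as one honest agent breaks ranks.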
The core structural problem: response-level aggregation is a one-shot operator over a fixed set of completed responses, a linear vote count followed by an argmax:
r_final = argmax_r Σ_i 𝟙[extract(r_i) == r] # majority vote
This operator is applied exactly once, after all generation is complete. There's no mechanism for honest agents to observe and correct adversarial reasoning during generation. The failure happens before aggregation even begins.
Token-Level Round-Robin: Changing the Structure
The fix is to move aggregation from the response level to the token level. Instead of generating N complete responses and voting, agents take turns contributing individual tokens to a single shared auto-regressive context.
EOS = "<eos>"  # placeholder end-of-sequence token; substitute your tokenizer's

def token_level_round_robin(agents, prompt, max_tokens=512):
    """
    Agents collaboratively generate a single response by contributing
    tokens in round-robin order to a shared autoregressive context.
    """
    context = prompt
    N = len(agents)
    for t in range(max_tokens):
        active = agents[t % N]  # round-robin selection
        token = active.sample_next_token(context)
        context = context + token  # shared state update
        if token == EOS:
            break
    return context  # the answer IS the shared context
The crucial property: each call to sample_next_token(context) is conditioned on the entire shared context, including all tokens contributed by all previous agents. When Agent 2 generates token t=1, it has read Agent 1's token t=0. When Agent 3 generates token t=2, it has read both. The agents are coupled through the shared context at every step.
Compare this to the majority voting version:
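from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def majority_vote(agents, prompt):
    # Each agent sees only the prompt; extract_answer is a task-specific parser.
    with ThreadPoolExecutor() as pool:
        responses = list(pool.map(lambda a: a.generate(prompt), agents))
    answers = [extract_answer(r) for r in responses]
    return Counter(answers).most_common(1)[0][0]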
In majority voting, generate(prompt) for each agent sees only the original prompt. The responses are independent samples. There's no coupling during generation, only at the aggregation step.
Token-level round-robin eliminates the independence assumption. Agents can influence each other's generation in real time.
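The difference in coupling is easy to see with toy agents that log every context they are shown. Everything here is a stand-in invented for illustration: `RecordingAgent` emits a fixed token rather than sampling from a model, and the simplified `round_robin` loop omits EOS handling.

```python
class RecordingAgent:
    """Toy agent that emits a fixed token and records every context it is shown."""

    def __init__(self, token: str):
        self.token = token
        self.seen = []

    def sample_next_token(self, context: str) -> str:
        self.seen.append(context)
        return self.token

    def generate(self, prompt: str, length: int = 3) -> str:
        # Independent generation: the agent only ever extends its own context.
        context = prompt
        for _ in range(length):
            context += self.sample_next_token(context)
        return context


def round_robin(agents, prompt, max_tokens):
    """Simplified shared-context loop (no EOS handling)."""
    context = prompt
    for t in range(max_tokens):
        context += agents[t % len(agents)].sample_next_token(context)
    return context


if __name__ == "__main__":
    # Voting-style generation: each agent sees the prompt plus its own tokens only.
    solo = [RecordingAgent("a"), RecordingAgent("b")]
    print([agent.generate("Q:") for agent in solo])  # ['Q:aaa', 'Q:bbb']
    print(solo[1].seen)                              # ['Q:', 'Q:b', 'Q:bb']

    # Round-robin generation: each agent sees tokens written by the other.
    shared = [RecordingAgent("a"), RecordingAgent("b")]
    print(round_robin(shared, "Q:", max_tokens=4))   # Q:abab
    print(shared[1].seen)                            # ['Q:a', 'Q:aba']
```

In the voting run, the second agent's log never contains an "a"; in the round-robin run, every context it sees ends with one.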
The Dynamical Systems Formulation
The paper casts token-level round-robin as a discrete-time dynamical system, and this framing reveals why it's structurally different from majority voting.
Define the state at step t as x_t — the shared context (a sequence of tokens). The transition is:
x_{t+1} = x_t ∥ f_k(x_t) where k = t mod N
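Here f_k is the stochastic token-sampling operator for the agent in turn slot k (zero-indexed, so "Agent 1" in the prose corresponds to f_0): it takes x_t and returns one sampled token from the agent's distribution P_k(· | x_t). The final output is x_T, where T is the step at which the stopping criterion fires (an EOS token or the token limit).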
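Write F_k(x) = x ∥ f_k(x) for the state transition induced by agent k. When N divides T, the full trajectory is:
x_T = (F_{N-1} ∘ ... ∘ F_1 ∘ F_0)^{T/N} (x_0)
This is an operator product: a composition of N different stochastic maps, with F_0 applied first in each cycle. Contrast with majority voting: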
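r_final = h(g_0(x_0), g_1(x_0), ..., g_{N-1}(x_0))
where g_k(x_0) is the complete response agent k generates on its own from the prompt, and h is a majority function over the extracted answers. In majority voting, every agent starts from the same fixed point x_0 and never sees another agent's tokens. In round-robin, each f_k is evaluated at a different state x_t that depends on all previous outputs.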
The system is non-commutative (order matters), non-linear (composition of neural networks), and path-dependent (each stochastic step constrains subsequent steps).
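Non-commutativity can be demonstrated with two deterministic toy "agents". This is only an illustration of the update rule x_{t+1} = x_t ∥ f_k(x_t): the `copy_last` and `flip_last` functions are invented stand-ins for token samplers, and real agents would be stochastic.

```python
from functools import reduce
from typing import Callable, Sequence

TokenFn = Callable[[str], str]  # f_k: context -> next token
StateFn = Callable[[str], str]  # state transition: context -> extended context


def lift(f: TokenFn) -> StateFn:
    """Turn a token function f_k into the state transition x -> x + f_k(x)."""
    return lambda x: x + f(x)


def run_cycles(order: Sequence[TokenFn], x0: str, cycles: int = 1) -> str:
    """Apply the lifted operators in the given turn order, repeated `cycles` times."""
    steps = [lift(f) for f in order] * cycles
    return reduce(lambda x, step: step(x), steps, x0)


def copy_last(x: str) -> str:
    """Toy agent: repeat the last token, or emit 'a' on an empty context."""
    return x[-1] if x else "a"


def flip_last(x: str) -> str:
    """Toy agent: emit 'b' after an 'a', otherwise emit 'a'."""
    return "b" if x.endswith("a") else "a"


if __name__ == "__main__":
    # Same two agents, same starting state, different turn order.
    print(run_cycles([copy_last, flip_last], "", cycles=2))  # abba
    print(run_cycles([flip_last, copy_last], "", cycles=2))  # aabb

    # Voting-style evaluation: both functions see only x0, so order is irrelevant.
    print(sorted([copy_last(""), flip_last("")]))            # ['a', 'a']
```

Swapping the turn order changes the trajectory, while evaluating both functions at the fixed point x_0 gives the same multiset of outputs in either order.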
graph LR
P["Prompt x₀"] --> A1["Agent 1 → token t₁"]
A1 --> ctx1["x₁ = x₀ ∥ t₁"]
ctx1 --> A2["Agent 2 → token t₂"]
A2 --> ctx2["x₂ = x₁ ∥ t₂"]
ctx2 --> A3["Agent 3 → token t₃"]
A3 --> ctx3["x₃ = x₂ ∥ t₃"]
ctx3 --> A1b["Agent 1 → token t₄"]
A1b --> dots["...continues..."]
dots --> R["Final response x_T"]
Majority voting, by contrast:
graph LR
P["Prompt x₀"] --> A1["Agent 1 → r₁"]
P --> A2["Agent 2 → r₂"]
P --> A3["Agent 3 → r₃"]
A1 --> AGG["Majority vote"]
A2 --> AGG
A3 --> AGG
AGG --> R["Final answer"]
In round-robin, agents share state at every step. In majority voting, all agents share only the initial state and reconverge only at aggregation.
Why does this resist adversarial agents? In majority voting, an adversarial agent controls one full response from start to finish: a 1/N share of the vote that no honest agent can inspect or correct before aggregation. In token-level round-robin, an adversarial agent in turn slot k controls only the tokens at positions {k, k+N, k+2N, ...}, a 1/N fraction of the total token budget, and its contributions are interleaved with honest agents' tokens. An adversarial token that steers the context in a wrong direction is followed (immediately, unless the next slot is also adversarial) by an honest agent's token conditioned on that adversarial context. The honest agents can act as real-time correctors.
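The position arithmetic is simple enough to check by hand, but a short sketch makes the seating effect visible. Both helpers are hypothetical names introduced here; the second one counts how many honest turns follow an adversary before the next adversarial turn, which depends on where the adversaries sit in the rotation.

```python
from typing import List, Set


def controlled_positions(k: int, n_agents: int, total_tokens: int) -> List[int]:
    """Token positions written by the agent in slot k under fixed round-robin."""
    return [t for t in range(total_tokens) if t % n_agents == k]


def following_honest_run(k: int, n_agents: int, adversaries: Set[int]) -> int:
    """Number of consecutive honest turns right after slot k's turn."""
    run = 0
    for offset in range(1, n_agents):
        if (k + offset) % n_agents in adversaries:
            break
        run += 1
    return run


if __name__ == "__main__":
    positions = controlled_positions(k=3, n_agents=5, total_tokens=20)
    print(positions, len(positions) / 20)  # [3, 8, 13, 18] 0.2

    # Adversaries seated next to each other in slots 3 and 4:
    print(following_honest_run(3, 5, {3, 4}))  # 0
    print(following_honest_run(4, 5, {3, 4}))  # 3
```

Adjacent adversarial slots get two tokens in a row before any honest agent responds, so the seating order is part of the threat model, not just the head count.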
Tradeoffs and Failure Modes
Nothing here is free.
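Latency is dominated by per-token handoffs. Majority voting is embarrassingly parallel: all N agents run simultaneously, so wall-clock time is roughly that of the slowest single agent. Token-level round-robin is inherently sequential: token t+1 depends on token t, and agents must alternate, so every one of the T sampling steps hands the context to a different model. Each agent also has to ingest the tokens written by the others, so total compute is on the order of N × T token forward passes. For N=5 agents and T=500 tokens, that means 500 strictly sequential cross-agent handoffs and roughly 2,500 token forward passes, none of which can be parallelized across agents. This is prohibitive for latency-sensitive workloads.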
Early tokens carry disproportionate weight. Auto-regressive models are path-dependent: the first token constrains all subsequent tokens more than the last token does. An adversarial agent scheduled early has more influence over the trajectory than one scheduled late. Fixed round-robin only partially mitigates this: turns rotate from the second token onward, but whichever agent holds the first slot always writes the opening token. Randomizing the turn order per query, or inverse-recency weighting, may be stronger defenses.
Incoherence at token granularity. Human collaborative writing works at clause or sentence granularity, not word-by-word. Two agents with genuinely different beliefs about the correct answer might generate locally coherent but globally contradictory content when interleaving tokens. Token-level granularity maximizes coupling but may sacrifice coherence. Chunk-level round-robin — agents contribute a sentence or paragraph at a time — is a practical compromise.
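A chunk-level variant needs only a small change to the loop: agents append a sentence-sized chunk instead of a single token. The sketch below assumes a hypothetical agent interface (a callable that returns the next chunk, or `None` when it considers the response complete) and uses scripted toy agents in place of models.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical agent interface: given the shared context, return the next chunk
# (for example one sentence), or None to signal that the response is complete.
ChunkAgent = Callable[[str], Optional[str]]


def chunk_level_round_robin(agents: List[ChunkAgent], prompt: str,
                            max_chunks: int = 16) -> str:
    context = prompt
    for turn in range(max_chunks):
        chunk = agents[turn % len(agents)](context)
        if chunk is None:  # the active agent considers the response finished
            break
        context = context + " " + chunk  # shared state update, one chunk at a time
    return context


def scripted(chunks: Iterable[str]) -> ChunkAgent:
    """Toy agent that replays a fixed list of chunks, then returns None."""
    remaining = iter(chunks)
    return lambda context: next(remaining, None)


if __name__ == "__main__":
    agents = [scripted(["First claim."]), scripted(["Second claim."])]
    print(chunk_level_round_robin(agents, "Q?"))  # Q? First claim. Second claim.
```

Letting any single agent end the response is a design choice made here for brevity; a stricter variant might require agreement from several agents before stopping.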
Model heterogeneity. If agents are different model families (GPT-4 and Claude interleaved), the token sampling distributions are structurally different. Interleaving contributions at the level of model tokens requires a shared vocabulary, which not all model combinations support. Passing decoded text between agents sidesteps this, at the cost of each agent re-tokenizing the shared context on every turn.
Practitioner's Lens
The actionable version of this for today's production systems isn't pure token-level round-robin — the latency is too high. But the principle is immediately applicable. Structure your multi-agent pipelines so agents interact during generation, not just at aggregation. The simplest implementation: sequential agents where each reads the previous agent's complete output before generating. This is standard chain-of-thought delegation. The insight from this paper is why it's more robust than majority voting: you're running an operator product, not a linear sum, so adversarial agents can't win by simply outvoting.
For Claude-backed systems, the natural implementation uses multi-turn conversations: Agent 1 generates an initial response, Agent 2 receives that response as context and produces a critique or refinement, Agent 3 integrates both and produces the final answer. This is chunk-level round-robin. The adversarial resistance comes from the fact that each agent can override errors introduced by previous agents — the corrective signal propagates forward through context.
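The three-stage flow described above can be sketched independently of any particular provider. The `Model` type is a hypothetical stand-in (a plain function from prompt text to completion text) that would wrap whichever LLM client the system actually uses; the prompt wording is illustrative only.

```python
from typing import Callable

# Hypothetical model interface: a function from a text prompt to a text completion.
# In practice each one would wrap a call to the system's LLM client.
Model = Callable[[str], str]


def draft_critique_integrate(query: str, drafter: Model, critic: Model,
                             integrator: Model) -> str:
    # Stage 1: the first agent produces an initial response.
    draft = drafter(f"Question: {query}\nAnswer step by step.")
    # Stage 2: the second agent sees the draft and critiques it.
    critique = critic(
        f"Question: {query}\nDraft answer:\n{draft}\n"
        "List any errors or unsupported steps in the draft."
    )
    # Stage 3: the third agent sees both and writes the final answer.
    return integrator(
        f"Question: {query}\nDraft answer:\n{draft}\nCritique:\n{critique}\n"
        "Write the final answer, fixing the issues the critique raises."
    )


if __name__ == "__main__":
    # Toy stand-ins: a drafter that makes a mistake and two agents that catch it.
    drafter = lambda prompt: "2+2=5"
    critic = lambda prompt: "WRONG_SUM" if "2+2=5" in prompt else "OK"
    integrator = lambda prompt: "4" if "WRONG_SUM" in prompt else "5"
    print(draft_critique_integrate("What is 2+2?", drafter, critic, integrator))  # 4
```

Each stage is conditioned on everything before it, so a later agent can override an earlier error. The same structure also means a compromised final stage has the last word, which is worth weighing when deciding which model sits where.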
Further Reading
- The Consensus Trap: Rescuing Multi-Agent LLMs from Adversarial Majorities via Token-Level Collaboration (Liu et al., 2026)
- Improving Factuality and Reasoning in Language Models through Multiagent Debate (Du et al., 2023)
- Self-Consistency Improves Chain of Thought Reasoning in Language Models (Wang et al., 2022)
- Mixture-of-Agents Enhances Large Language Model Capabilities (Wang et al., 2024)
- The Byzantine Generals Problem (Lamport, Shostak, Pease, 1982)