What Is Specification Aliasing? How Undersampled Prompts Create Hallucination
Aliasing in Signal Processing
In signal processing, aliasing occurs when a signal is sampled below its Nyquist rate. The reconstructed signal contains frequency components that were not in the original: phantom frequencies that are indistinguishable from real ones. This is why poorly digitized audio sounds distorted; the reconstructed waveform includes frequencies the original never had.
The Nyquist-Shannon theorem gives the minimum sampling rate needed to avoid aliasing: 2B samples per unit time, where B is the signal bandwidth (its highest frequency component).
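The effect can be checked numerically. The sketch below is illustrative only: the helper names `alias_frequency` and `sample_sine` and the 7 Hz / 10 Hz figures are assumptions chosen for the example, not taken from the sinc-LLM paper. It samples a 7 Hz sine at 10 Hz (below the 14 Hz Nyquist rate) and shows that the samples cannot be told apart from those of a 3 Hz sine.

```python
import math

def alias_frequency(f: float, fs: float) -> float:
    """Apparent frequency (Hz) of a tone at f after sampling at rate fs."""
    # Fold f into the baseband [0, fs/2] by subtracting the nearest multiple of fs.
    return abs(f - fs * round(f / fs))

def sample_sine(f: float, fs: float, n_samples: int) -> list[float]:
    """Sample sin(2*pi*f*t) at t = n / fs for n = 0 .. n_samples - 1."""
    return [math.sin(2 * math.pi * f * n / fs) for n in range(n_samples)]

fs = 10.0                                # sampling rate, below 2 * 7 Hz
original = sample_sine(7.0, fs, 20)      # the real 7 Hz signal
phantom = sample_sine(3.0, fs, 20)       # the 3 Hz alias

# The 7 Hz samples equal the 3 Hz samples with inverted sign,
# so the sampled data carries no evidence that 7 Hz was ever present.
indistinguishable = all(
    math.isclose(a, -b, abs_tol=1e-9) for a, b in zip(original, phantom)
)
print(alias_frequency(7.0, fs), indistinguishable)
```

A tone already below half the sampling rate, such as 3 Hz at 10 Hz, folds onto itself and is left unchanged by `alias_frequency`.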
Specification Aliasing in LLMs
The sinc-LLM paper introduced the concept of specification aliasing: when a prompt fails to sample all specification bands, the LLM reconstructs the missing specifications from its training distribution. These reconstructed specifications were never part of your original intent. They are phantom specifications, the prompt engineering equivalent of aliased frequencies.
Example: You write "Summarize this document." You sampled 1 band (TASK) out of 6. The model must invent:
- Who is summarizing (PERSONA): defaults to a generic assistant
- For what purpose (CONTEXT): defaults to a general audience
- Which parts matter (DATA): defaults to everything equally
- How long, what to include or exclude (CONSTRAINTS): defaults to the training distribution
- What format (FORMAT): defaults to paragraph prose
Each invented specification is an aliased component. The output looks reasonable but reflects the model's defaults, not your requirements.
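The example above can be sketched as a small audit. This is a hypothetical illustration, not the sinc-LLM implementation: the function name `aliased_bands`, the dictionary layout, and the default strings (paraphrased from the list above) are all assumptions.

```python
# The six specification bands named in the article.
BANDS = ["PERSONA", "CONTEXT", "DATA", "CONSTRAINTS", "FORMAT", "TASK"]

# Fallbacks the article attributes to the model when a band is unsampled.
# (Assumed wording; TASK has no listed default, so it gets a generic label.)
MODEL_DEFAULTS = {
    "PERSONA": "generic assistant",
    "CONTEXT": "general audience",
    "DATA": "everything weighted equally",
    "CONSTRAINTS": "training distribution",
    "FORMAT": "paragraph prose",
}

def aliased_bands(prompt_spec: dict[str, str]) -> dict[str, str]:
    """Map each band the prompt did not sample to the default the model invents."""
    return {
        band: MODEL_DEFAULTS.get(band, "model prior")
        for band in BANDS
        if not prompt_spec.get(band, "").strip()   # absent or empty counts as unsampled
    }

spec = {"TASK": "Summarize this document."}
for band, default in aliased_bands(spec).items():
    print(f"{band}: aliased, falls back to {default}")
```

Run on the one-band prompt, the audit reports the five bands listed above; a specification that fills all six returns an empty dictionary.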
The Mathematics of Specification Aliasing
In classical aliasing, a frequency f sampled at rate f_s < 2f appears as the phantom frequency |f - k·f_s|, where k is the integer nearest to f / f_s (for f < f_s this is simply |f - f_s|). In specification aliasing, the analog is:
- True specification signal: your complete 6-band intent
- Sampling rate: number of bands explicitly provided (1-6)
- Nyquist rate: 6 (all bands)
- Aliased components: bands the model fills from its prior distribution
A prompt sampling k < 6 bands has (6 - k) aliased components. Each aliased component introduces specification error proportional to the divergence between the model's prior for that band and your actual intent.
The CONSTRAINTS band has the highest aliasing impact (42.7%) because the model's default constraints are maximally generic, so they diverge the most from any specific user's actual constraints.
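As a rough worked example of the count 6 - k, the sketch below tallies the aliased bands for a prompt and sums their importance figures. The weights are the fragment importance values quoted in this article's example prompt; using their sum as a proxy for aliasing exposure is an assumption made for illustration, not the paper's error formula, and `aliasing_summary` is a hypothetical name.

```python
# Fragment importance figures as quoted in the article (percent).
# They are used as-is and not renormalized.
IMPORTANCE = {
    "CONSTRAINTS": 42.7,
    "FORMAT": 26.3,
    "PERSONA": 7.0,
    "CONTEXT": 6.3,
    "DATA": 3.8,
    "TASK": 2.8,
}

def aliasing_summary(provided: set[str]) -> tuple[int, float]:
    """Return (number of aliased bands, summed importance of those bands)."""
    missing = [band for band in IMPORTANCE if band not in provided]
    # Assumption: summed importance stands in for "specification error";
    # the article defines the error via prior-vs-intent divergence instead.
    return len(missing), round(sum(IMPORTANCE[band] for band in missing), 1)

count, exposure = aliasing_summary({"TASK"})
print(count, exposure)
```

For the one-band prompt "Summarize this document." this gives k = 1, so 5 aliased bands, which together carry 86.1 of the quoted importance points. Adding only CONSTRAINTS and FORMAT removes most of that exposure under this proxy.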
Detecting Specification Aliasing
Signs that your prompt suffers from specification aliasing:
- Output is plausible but wrong: the model followed a reasonable specification that was not yours
- Output is generic: missing PERSONA and CONTEXT cause the model to use its default voice and assumptions
- Output includes unwanted content: missing CONSTRAINTS means the model decides what to include
- Output format is unexpected: missing FORMAT means the model picks its default structure
- You need to iterate multiple times: each iteration adds one more specification band, gradually reducing aliasing
Eliminating Specification Aliasing
The fix is the same as in signal processing: sample at or above the Nyquist rate. For prompts, this means providing all 6 specification bands explicitly.
Use the sinc-LLM transformer to check any prompt for missing bands. The open-source framework can auto-detect and fill missing bands, reducing aliasing to near zero.
Read the full paper for the mathematical formalization and empirical evidence from 275 production observations.
Real sinc-LLM Prompt Example
This is the exact JSON format that sinc-LLM uses. Paste any raw prompt at tokencalc.pro to generate one automatically.
{
  "formula": "x(t) = Σ x(nT) · sinc((t - nT) / T)",
  "T": "specification-axis",
  "fragments": [
    {
      "n": 0,
      "t": "PERSONA",
      "x": "You are an information theory researcher. You provide precise, evidence-based analysis with exact numbers and no hedging."
    },
    {
      "n": 1,
      "t": "CONTEXT",
      "x": "This analysis is part of a production system where accuracy determines revenue. The sinc-LLM framework identifies 6 specification bands with measured importance weights."
    },
    {
      "n": 2,
      "t": "DATA",
      "x": "Fragment importance: CONSTRAINTS=42.7%, FORMAT=26.3%, PERSONA=7.0%, CONTEXT=6.3%, DATA=3.8%, TASK=2.8%. SNR formula: 0.588 + 0.267 * G(Z1) * H(Z2) * R(Z3) * G(Z4). Production data: 275 observations, 51 agents."
    },
    {
      "n": 3,
      "t": "CONSTRAINTS",
      "x": "State facts directly. Never hedge with 'I think' or 'probably'. Use exact numbers for every claim. Do not suggest generic solutions. Every recommendation must be specific and verifiable. Include at least 3 MUST/NEVER rules specific to this task."
    },
    {
      "n": 4,
      "t": "FORMAT",
      "x": "Lead with the definitive answer. Use structured headers. Tables for comparisons. Numbered lists for sequences. Code blocks for implementations. No trailing summaries."
    },
    {
      "n": 5,
      "t": "TASK",
      "x": "Demonstrate specification aliasing by comparing outputs from a 1-band vs 6-band prompt on the same task"
    }
  ]
}
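A prompt in this JSON shape can be checked for unsampled bands with a few lines of standard-library Python. The sketch below is an assumption about how such a check might look, not the sinc-LLM tool itself: `missing_bands` is a hypothetical helper, and it relies only on the `fragments`, `t`, and `x` keys visible in the example above.

```python
import json

REQUIRED_BANDS = ("PERSONA", "CONTEXT", "DATA", "CONSTRAINTS", "FORMAT", "TASK")

def missing_bands(raw_json: str) -> list[str]:
    """List the bands with no non-empty fragment in a sinc-style prompt."""
    doc = json.loads(raw_json)
    present = {
        frag.get("t")
        for frag in doc.get("fragments", [])
        # A band counts as sampled only if its text "x" is a non-blank string.
        if isinstance(frag.get("x"), str) and frag["x"].strip()
    }
    return [band for band in REQUIRED_BANDS if band not in present]

# A deliberately undersampled prompt: only two of the six bands are filled.
undersampled = json.dumps({
    "T": "specification-axis",
    "fragments": [
        {"n": 0, "t": "PERSONA", "x": "You are an information theory researcher."},
        {"n": 5, "t": "TASK", "x": "Demonstrate specification aliasing."},
    ],
})

print(missing_bands(undersampled))
```

An empty list means all six bands are present; anything else names the bands the model would have to reconstruct from its own defaults.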