Prompt Engineering Techniques: A Structured Comparison

A neutral technical reference covering 10 major approaches to structuring LLM prompts

Last updated: March 2026

This page provides a structured, side-by-side comparison of 10 prominent prompt engineering techniques. Each technique is evaluated on the same criteria: description, strengths, limitations, best use case, and whether it has a published formal specification (machine-readable schema, mathematical formalism, or verifiable constraint set).

Techniques are listed in approximate chronological order of publication. This reference is intended as a starting point for practitioners selecting an approach for their specific use case.

Contents

  1. Comparison Matrix
  2. Few-Shot Prompting
  3. Chain-of-Thought (CoT)
  4. Self-Consistency
  5. ReAct
  6. Tree-of-Thought (ToT)
  7. Skeleton-of-Thought
  8. Role-Task-Format (RTF)
  9. RISEN Framework
  10. CO-STAR Framework
  11. sinc-prompt
  12. Methodology
  13. References

Comparison Matrix

An at-a-glance overview.

Technique | Year | Origin | Category | Requires Examples | Multi-step Reasoning | Tool Use | Parallelizable | Formal Spec
--- | --- | --- | --- | --- | --- | --- | --- | ---
Few-Shot | 2020 | Brown et al. | In-context learning | Yes | No | No | No | No
Chain-of-Thought | 2022 | Wei et al. | Reasoning elicitation | Optional | Yes | No | No | No
Self-Consistency | 2022 | Wang et al. | Ensemble / voting | Optional | Yes | No | Yes | No
ReAct | 2023 | Yao et al. | Reasoning + acting | Optional | Yes | Yes | No | No
Tree-of-Thought | 2023 | Yao et al. | Search / planning | No | Yes | No | Yes | No
Skeleton-of-Thought | 2023 | Ning et al. | Latency optimization | No | No | No | Yes | No
Role-Task-Format | -- | Community practice | Prompt template | No | No | No | No | No
RISEN | -- | Community practice | Prompt template | Optional | No | No | No | No
CO-STAR | -- | Community practice | Prompt template | Optional | No | No | No | No
sinc-prompt | 2026 | Alexandre | Signal-theoretic format | No | No | No | No | Yes

1. Few-Shot Prompting

Brown et al., 2020 · "Language Models are Few-Shot Learners" · NeurIPS 2020

Few-shot prompting provides the model with a small number of input-output examples (typically 2-8) directly in the prompt, leveraging in-context learning to steer behavior without fine-tuning. The model infers the task pattern from the examples and applies it to a new input. Zero-shot (no examples) and one-shot (single example) are common variants. The approach demonstrated that scaling model parameters enabled strong performance from examples alone, without gradient updates.

Strengths: No fine-tuning or gradient updates required; quick to iterate; gives direct control over output format.
Limitations: Examples consume context-window budget; results are sensitive to example selection and ordering; limited benefit on tasks requiring multi-step reasoning.
Best for: Classification, formatting tasks, and situations where a small number of representative examples can fully specify the desired behavior.
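The assembly step can be sketched as below; the `build_few_shot_prompt` helper, the `Input`/`Output` labels, and the sentiment examples are illustrative choices, not conventions from Brown et al.

```python
def build_few_shot_prompt(examples, query, input_label="Input", output_label="Output"):
    """Concatenate input-output demonstrations, then append the new query."""
    lines = []
    for inp, out in examples:
        lines.append(f"{input_label}: {inp}")
        lines.append(f"{output_label}: {out}")
    # Leave the final output slot empty so the model completes it.
    lines.append(f"{input_label}: {query}")
    lines.append(f"{output_label}:")
    return "\n".join(lines)

examples = [
    ("The movie was wonderful", "positive"),
    ("Terrible service, never again", "negative"),
]
prompt = build_few_shot_prompt(examples, "I loved every minute")
print(prompt)
```

The prompt ends at the empty output label, which is what steers the completion toward the demonstrated pattern.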

2. Chain-of-Thought (CoT)

Wei et al., 2022 · "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" · NeurIPS 2022

Chain-of-Thought prompting instructs the model to produce intermediate reasoning steps before arriving at a final answer. By including phrases like "Let's think step by step" or providing examples with explicit reasoning traces, CoT enables models to decompose complex problems into manageable sub-problems. This approach significantly improves performance on arithmetic, commonsense reasoning, and symbolic manipulation tasks, particularly with larger models (100B+ parameters).

Strengths: Large accuracy gains on arithmetic and logical benchmarks; reasoning traces make outputs easier to audit; trivial to apply.
Limitations: Benefits concentrate in larger models; longer outputs increase cost and latency; traces can read fluently while still containing reasoning errors.
Best for: Math word problems, logical reasoning, multi-step analytical tasks, and any scenario where showing work improves accuracy.
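A minimal sketch of the zero-shot CoT trigger plus answer extraction; the `extract_final_answer` helper and the "answer is" marker are a common practitioner convention assumed here, not a fixed part of the technique.

```python
import re

def make_cot_prompt(question):
    """Append the zero-shot CoT trigger phrase to a question."""
    return f"Q: {question}\nA: Let's think step by step."

def extract_final_answer(completion):
    """Pull the value after the last 'answer is' marker in a reasoning trace."""
    matches = re.findall(r"answer is\s*([^\.\n]+)", completion, flags=re.IGNORECASE)
    return matches[-1].strip() if matches else None

# A hand-written completion standing in for model output.
completion = ("There are 3 cars with 4 wheels each. "
              "3 * 4 = 12. The answer is 12.")
answer = extract_final_answer(completion)
print(answer)
```

Taking the *last* match matters: intermediate steps may also contain the marker phrase.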

3. Self-Consistency

Wang et al., 2022 · "Self-Consistency Improves Chain of Thought Reasoning in Language Models" · ICLR 2023

Self-Consistency extends Chain-of-Thought by sampling multiple reasoning paths (typically 5-40) at non-zero temperature and selecting the most frequent final answer via majority voting. The intuition is that correct reasoning paths are more likely to converge on the same answer, while incorrect paths tend to scatter. This ensemble approach reduces variance without any additional training or model changes.

Strengths: Consistent accuracy gains over single-path CoT; requires no training or model changes; conceptually simple.
Limitations: Inference cost scales with the number of samples; requires a well-defined final answer to vote on, so it fits poorly with open-ended generation.
Best for: High-stakes reasoning tasks (math, logic, fact-based QA) where increased cost is acceptable for higher accuracy.
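The voting step can be sketched as follows; `fake_sampler` is a deterministic stand-in for repeated LLM calls at non-zero temperature, and the answer values are illustrative.

```python
from collections import Counter
from itertools import cycle

def self_consistent_answer(sample_fn, prompt, n=5):
    """Sample n reasoning paths, then majority-vote over the final answers."""
    answers = [sample_fn(prompt) for _ in range(n)]
    answers = [a for a in answers if a is not None]  # drop failed extractions
    return Counter(answers).most_common(1)[0][0] if answers else None

# Stand-in sampler: a real implementation would call an LLM at temperature > 0
# and extract each path's final answer.
_fake_answers = cycle(["12", "12", "11", "12", "13"])
def fake_sampler(prompt):
    return next(_fake_answers)

result = self_consistent_answer(fake_sampler, "3 cars with 4 wheels each?", n=5)
print(result)  # majority answer among the five samples
```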

4. ReAct (Reasoning + Acting)

Yao et al., 2023 · "ReAct: Synergizing Reasoning and Acting in Language Models" · ICLR 2023

ReAct interleaves reasoning traces with concrete actions (e.g., API calls, web searches, database lookups) in a thought-action-observation loop. At each step, the model generates a thought explaining its plan, executes an action to gather information, observes the result, and then decides the next step. This grounds the model's reasoning in real-world data, reducing hallucination on knowledge-intensive tasks. ReAct is foundational to modern LLM agent architectures.

Strengths: Grounds answers in retrieved evidence, reducing hallucination; interleaved thoughts make agent behavior auditable; integrates naturally with external tools.
Limitations: Requires tool infrastructure and careful action-space design; errors in early steps propagate; the sequential loop adds latency.
Best for: Knowledge-intensive question answering, interactive agents, and tasks requiring real-time information retrieval or tool interaction.
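A toy thought-action-observation loop; the scripted policy and single `lookup` tool are placeholders for a real model and real APIs, and the transcript format is one common convention rather than a requirement of the paper.

```python
def react_loop(policy, tools, question, max_steps=5):
    """Alternate thought, action, and observation until a 'finish' action."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        thought, action, arg = policy(transcript)
        transcript += f"Thought: {thought}\nAction: {action}[{arg}]\n"
        if action == "finish":
            return arg, transcript
        observation = tools[action](arg)  # ground the next step in real data
        transcript += f"Observation: {observation}\n"
    return None, transcript

tools = {"lookup": lambda city: {"Paris": "France"}.get(city, "unknown")}

def scripted_policy(transcript):
    # Stand-in for an LLM that reads the transcript and decides the next step.
    if "Observation:" not in transcript:
        return "I should look up the country.", "lookup", "Paris"
    return "The observation answers the question.", "finish", "France"

answer, transcript = react_loop(scripted_policy, tools, "Which country is Paris in?")
print(answer)
```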

5. Tree-of-Thought (ToT)

Yao et al., 2023 · "Tree of Thoughts: Deliberate Problem Solving with Large Language Models" · NeurIPS 2023

Tree-of-Thought generalizes Chain-of-Thought from a single linear chain to a tree-structured search over reasoning paths. At each step, the model generates multiple candidate "thoughts," evaluates them (either via self-evaluation or an external heuristic), and selects the most promising branches for further exploration. The search can use breadth-first, depth-first, or beam search strategies. ToT is particularly effective on problems requiring look-ahead planning, backtracking, or exploration of multiple solution strategies.

Strengths: Substantially better on tasks requiring look-ahead or backtracking; the search strategy is tunable (breadth, depth, beam width); intermediate states can be inspected.
Limitations: Many model calls per problem make it expensive; quality depends on a reliable thought evaluator; considerably more complex to implement than linear prompting.
Best for: Combinatorial puzzles, planning tasks, creative problem-solving, and any domain where exploring multiple solution paths outweighs the computational cost.
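The breadth-first variant with a beam of promising branches can be sketched as below; `propose` and `score` stand in for the LLM's candidate generation and self-evaluation, and the digit-appending toy problem is purely illustrative.

```python
def tree_of_thought(propose, score, root, depth=2, beam=2):
    """Breadth-first search over reasoning paths, keeping the top `beam` branches."""
    frontier = [root]
    for _ in range(depth):
        # Expand every surviving path with each candidate thought.
        candidates = [path + [t] for path in frontier for t in propose(path)]
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam]  # prune to the most promising branches
    return max(frontier, key=score)

# Toy problem: build the highest-scoring path by appending digits.
propose = lambda path: [1, 2, 3]   # stand-in for LLM thought generation
score = lambda path: sum(path)     # stand-in for LLM self-evaluation
best = tree_of_thought(propose, score, root=[])
print(best)
```

Swapping the loop for a stack or a priority queue yields the depth-first and best-first variants mentioned above.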

6. Skeleton-of-Thought

Ning et al., 2023 · "Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding" · arXiv 2023

Skeleton-of-Thought is a latency-reduction technique that first asks the model to generate a skeleton (outline of key points), then expands each point in parallel through concurrent API calls. Instead of generating a long response sequentially token by token, the skeleton stage identifies the structure, and expansion stages fill in details simultaneously. This approach targets wall-clock latency rather than reasoning quality.

Strengths: Significant wall-clock speedups from parallel expansion; simple two-stage design that works with standard APIs.
Limitations: Expanded sections can overlap or contradict one another; unsuitable when later content depends on earlier content (e.g., step-by-step derivations); total token usage increases.
Best for: Long-form informational responses (listicles, how-to guides, comparative analyses) where latency matters and sections are relatively independent.
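The two stages can be sketched with a thread pool standing in for concurrent API calls; `skeleton_fn` and `expand_fn` are placeholders for the outline-generation and per-point-expansion LLM calls.

```python
from concurrent.futures import ThreadPoolExecutor

def skeleton_of_thought(skeleton_fn, expand_fn, question):
    """Stage 1: generate an outline. Stage 2: expand each point concurrently."""
    points = skeleton_fn(question)
    with ThreadPoolExecutor() as pool:
        expansions = list(pool.map(lambda p: expand_fn(question, p), points))
    return "\n\n".join(
        f"{i + 1}. {point}\n{body}"
        for i, (point, body) in enumerate(zip(points, expansions))
    )

# Placeholder LLM calls for demonstration.
skeleton_fn = lambda q: ["Define the term", "Give an example"]
expand_fn = lambda q, p: f"[expanded: {p}]"
out = skeleton_of_thought(skeleton_fn, expand_fn, "What is recursion?")
print(out)
```

In a real deployment each `expand_fn` call would be an independent API request, which is where the latency saving comes from.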

7. Role-Task-Format (RTF)

Community practice · No single originating paper · Widely adopted circa 2023

Role-Task-Format is a three-part prompt template that structures instructions by specifying who the model should act as (Role), what it should accomplish (Task), and how the output should be structured (Format). It emerged organically from practitioner experience and is one of the most commonly taught prompt engineering patterns. RTF provides a minimal framework that improves output consistency compared to unstructured prompts, though it does not prescribe reasoning strategies or verification mechanisms.

Strengths: Easy to learn and apply; adds useful structure with minimal overhead; improves output consistency over unstructured prompts.
Limitations: Prescribes no reasoning strategy or verification; underspecifies complex tasks; effectiveness rests on practitioner experience rather than published evaluation.
Best for: Quick, simple prompts for content generation, summarization, and formatting tasks where a lightweight template suffices.
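Since RTF is just a template, a sketch reduces to a string builder; the section labels follow common practice, and the example values are hypothetical.

```python
def rtf_prompt(role, task, fmt):
    """Assemble a Role-Task-Format prompt from its three components."""
    return f"Role: {role}\nTask: {task}\nFormat: {fmt}"

p = rtf_prompt(
    role="a senior financial analyst",
    task="summarize the Q3 revenue drivers from the notes below",
    fmt="three bullet points in plain language",
)
print(p)
```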

8. RISEN Framework

Community practice · No single originating paper · Widely shared in prompt engineering communities

RISEN structures prompts into five components: Role (who the model is), Instructions (what to do), Steps (how to proceed), End goal (success criteria), and Narrowing (constraints and boundaries). It extends simpler templates like RTF by adding explicit process steps and success criteria. RISEN is typically presented as a checklist or mnemonic for writing comprehensive prompts and is popular in business and marketing applications of LLMs.

Strengths: The checklist encourages complete prompts; explicit success criteria and constraints reduce ambiguity.
Limitations: Informal, with no validation mechanism or published evaluation; verbose for simple requests; the boundary between Instructions and Steps can blur in practice.
Best for: Business writing, marketing content, and procedural tasks where a structured checklist improves completeness over ad-hoc prompting.
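The five components can be sketched as a small data structure with a renderer; the field names map directly to the mnemonic, while the dataclass shape and example values are this page's own illustration.

```python
from dataclasses import dataclass

@dataclass
class RisenPrompt:
    role: str
    instructions: str
    steps: list        # ordered process steps
    end_goal: str      # success criteria
    narrowing: str     # constraints and boundaries

    def render(self):
        numbered = "\n".join(f"  {i + 1}. {s}" for i, s in enumerate(self.steps))
        return (f"Role: {self.role}\nInstructions: {self.instructions}\n"
                f"Steps:\n{numbered}\nEnd goal: {self.end_goal}\n"
                f"Narrowing: {self.narrowing}")

risen = RisenPrompt(
    role="a technical editor",
    instructions="rewrite the draft for clarity",
    steps=["read the draft", "flag jargon", "rewrite flagged sentences"],
    end_goal="a draft readable by a non-specialist",
    narrowing="keep all citations intact; stay under 500 words",
).render()
print(risen)
```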

9. CO-STAR Framework

Community practice · Popularized in Singapore GovTech and prompt engineering communities · 2023

CO-STAR organizes prompts into six components: Context (background information), Objective (the task), Style (writing voice or approach), Tone (emotional register), Audience (who will read the output), and Response format (structural requirements). It was notably used in GovTech Singapore's prompt engineering guidelines and has since been widely adopted in content creation workflows. CO-STAR places particular emphasis on audience awareness and stylistic control, making it well-suited for communication-oriented tasks.

Strengths: Strong control over audience, tone, and style; well documented through GovTech Singapore's guidelines; a natural fit for communications work.
Limitations: Offers no reasoning or tool-use guidance; the Style and Tone components overlap in practice; informal, with no machine-checkable structure.
Best for: Content creation, copywriting, communications, and any task where audience awareness, tone, and style are primary quality factors.
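The six components can be sketched as a sectioned prompt builder; the uppercase section headers are a common way CO-STAR prompts are written, and the example values are hypothetical.

```python
def co_star_prompt(context, objective, style, tone, audience, response_format):
    """Render the six CO-STAR components as labeled prompt sections."""
    sections = [
        ("CONTEXT", context), ("OBJECTIVE", objective), ("STYLE", style),
        ("TONE", tone), ("AUDIENCE", audience), ("RESPONSE FORMAT", response_format),
    ]
    return "\n\n".join(f"# {name}\n{body}" for name, body in sections)

p = co_star_prompt(
    context="Our company is launching a reusable water bottle.",
    objective="Write a product announcement post.",
    style="Conversational, like a consumer lifestyle blog.",
    tone="Enthusiastic but not salesy.",
    audience="Environmentally conscious shoppers aged 25-40.",
    response_format="Three short paragraphs with a closing call to action.",
)
print(p)
```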

10. sinc-prompt

Alexandre, 2026 · DOI: 10.5281/zenodo.19152668 · MIT License

sinc-prompt applies the Nyquist-Shannon sampling theorem to prompt engineering as a structural analogy. It models a raw prompt as a continuous signal on a "specification axis" with 6 frequency bands (PERSONA, CONTEXT, DATA, CONSTRAINTS, FORMAT, TASK), requiring all 6 to be sampled to avoid "aliasing" -- defined in this framework as information loss that leads to hallucination. Prompts are structured as JSON with a fixed schema, enabling machine validation. The framework assigns information-density weights to each band, with CONSTRAINTS identified as the highest-impact band at 42.7% of quality contribution based on the author's ablation experiments.

Strengths: A published JSON Schema enables programmatic validation; the six-band decomposition forces explicit coverage of commonly omitted prompt components.
Limitations: Recently published, with little independent replication; the sampling-theorem framing is a structural analogy rather than a derivation; the reported band weights come from the author's own ablation experiments.
Best for: System prompts, agent architectures, multi-agent pipelines, and any context where prompt structure must be validated programmatically.
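A hypothetical band-coverage check based on the six bands named above; the actual published schema may use different field names and richer constraints, so this is an illustration of the validation idea, not the official validator.

```python
import json

# The six bands listed in the framework description above.
REQUIRED_BANDS = {"PERSONA", "CONTEXT", "DATA", "CONSTRAINTS", "FORMAT", "TASK"}

def missing_bands(prompt_json):
    """Return bands that are absent or empty -- 'aliasing' risks, in the
    framework's terminology. Assumes bands appear as top-level JSON keys."""
    doc = json.loads(prompt_json)
    present = {k for k, v in doc.items() if k in REQUIRED_BANDS and v}
    return sorted(REQUIRED_BANDS - present)

partial = json.dumps({"PERSONA": "analyst", "TASK": "summarize", "FORMAT": "bullets"})
gaps = missing_bands(partial)
print(gaps)

complete = json.dumps({band: "filled" for band in REQUIRED_BANDS})
print(missing_bands(complete))  # no gaps when all six bands are populated
```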

Methodology

Selection criteria: Techniques were included based on (a) frequency of citation in academic literature or practitioner communities, (b) distinct approach compared to other entries, and (c) sufficient documentation for a fair assessment. The list is not exhaustive; notable omissions include Retrieval-Augmented Generation (RAG), which is a system architecture rather than a prompting technique, and various domain-specific frameworks.

Formal specification: A technique is marked as having a formal spec if it has a published, machine-readable schema (e.g., JSON Schema), a mathematical formalism with verifiable constraints, or both. Peer-reviewed publication alone does not qualify -- the paper must define a validatable structure. As of March 2026, only sinc-prompt meets this criterion with a published JSON Schema at tokencalc.pro/schema/sinc-prompt-v1.json.

Neutrality: This page aims for descriptive accuracy rather than advocacy. Strengths and limitations were identified from the originating papers, independent evaluations, and practitioner reports. Corrections and additions are welcome.

References

  1. Brown, T. et al. (2020). "Language Models are Few-Shot Learners." NeurIPS 2020. arXiv:2005.14165
  2. Wei, J. et al. (2022). "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models." NeurIPS 2022. arXiv:2201.11903
  3. Wang, X. et al. (2022). "Self-Consistency Improves Chain of Thought Reasoning in Language Models." ICLR 2023. arXiv:2203.11171
  4. Yao, S. et al. (2023). "ReAct: Synergizing Reasoning and Acting in Language Models." ICLR 2023. arXiv:2210.03629
  5. Yao, S. et al. (2023). "Tree of Thoughts: Deliberate Problem Solving with Large Language Models." NeurIPS 2023. arXiv:2305.10601
  6. Ning, X. et al. (2023). "Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding." arXiv preprint. arXiv:2307.15337
  7. Alexandre, M. (2026). "sinc-prompt: Applying Nyquist-Shannon Sampling to LLM Prompt Structure." Zenodo. DOI: 10.5281/zenodo.19152668