Updated 8/5/2024
This document collects several patterns for working effectively with Large Language Models (LLMs) in more complex applications. It is relevant for developers and engineers building advanced LLM applications, especially those dealing with complex tasks, large datasets, or real-time interactions.
- Pattern: Prompt Orchestration
- Pattern: Hierarchical Context Compression
- Pattern: JSON Schema Instructions
- Pattern: Auto Completion Of Streaming JSON
See AWS Lambda Proxy For AI Services for AWS Lambda code that proxies calls to OpenAI, AWS Bedrock, and AWS Polly, exposing OpenAI-style models, completions, transcriptions, and speech endpoints.
Pattern: Prompt Orchestration
%%{init: {'theme':'dark'}}%%
graph LR
subgraph Before
A[Complex One-Shot Task] --> B[LLM]
B -->|Failure| C((❌))
end
subgraph After
D[Complex Task] --> E{Decomposition}
E --> F[Simple Task 1]
E --> G[Simple Task 2]
E --> H[Simple Task n]
F & G & H --> I
subgraph I[Orchestrator]
J[Task 1 + LLM] --> K[Task 2 + LLM]
K --> L[Task n + LLM]
end
I -->|Success| M((✓))
end
Before ~~~ After
Problem
- LLMs are quite limited: they can reliably perform small, simple reasoning tasks.
- But they struggle with large, complex tasks.
Solution
- Break big, complex tasks down into an orchestrated set of small tasks.
- Identify the limitations of the LLM by writing a naive, one-shot instruction for the complex task.
- Allow the failure cases to inform the breakdown of the complex task into a series of simple tasks.
- Arrange the simple tasks in sequence with traditional code, managing the flow of information between them (see the sketch below).
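A minimal sketch of such an orchestrator in TypeScript, assuming a hypothetical `CompleteText` wrapper around whatever LLM API is in use; the task prompts are purely illustrative:

```typescript
// Hypothetical wrapper around whatever LLM completion API is in use.
type CompleteText = (instruction: string) => Promise<string>;

// Each simple task builds one small, focused instruction for the LLM.
type SimpleTask = (complete: CompleteText, input: string) => Promise<string>;

// The orchestrator is traditional code: it runs the simple tasks in sequence
// and manages the flow of information between them.
async function orchestrate(
  complete: CompleteText,
  tasks: SimpleTask[],
  initialInput: string
): Promise<string> {
  let current = initialInput;
  for (const task of tasks) {
    current = await task(complete, current);
  }
  return current;
}

// Two simple tasks that replace what would otherwise be one complex, one-shot prompt.
const extractKeyFacts: SimpleTask = (complete, text) =>
  complete(`List the key facts in the following text, one per line:\n${text}`);

const writeSummary: SimpleTask = (complete, facts) =>
  complete(`Write a two-sentence summary based only on these facts:\n${facts}`);

// Usage: await orchestrate(complete, [extractKeyFacts, writeSummary], documentText);
```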
Limitations
- With enough elbow grease applied to the decomposition of complex tasks into a series of simple tasks, LLMs can do anything.
- But just because they can doesn't mean they should.
- Breaking down big, complex tasks in a way that respects LLM limitations is a lot of work.
- And the final result may be too costly and/or slow to be worth the effort.
Example: Complex Game Logic
Liminai uses this pattern extensively in the context of handling complex, branching game logic.
%%{init: {'theme':'dark'}}%%
flowchart LR
UserAction
PossibilityChecker["Task: Determine Possibility"]
isPossible{"Is Possible?"}
failurePoemWriter["Task: Generate Failure Poem"]
onDeath["Task: Handle Death"]
CommentaryPoemGenerator["Task: Generate Commentary Poem"]
AnnounceWinner["Announce Winner"]
MovementChecker["Task: Determine Movement"]
isMovement{"Is Movement?"}
WinChecker["Task: Check if Win Conditions Met"]
isDead{"Is Dead?"}
hasWon{"Has Won?"}
subgraph resultsHandler["Tasks For Handling Results"]
direction LR
ActionResultGenerator["Task: Generate Action Result"]
HealthAndInventoryUpdater["Task: Update Health and Inventory"]
DeathChecker["Task: Check if Dead"]
end
subgraph locationHandler["Tasks For Handling Locations"]
direction LR
LocationGenerator["Task: Generate Location"]
ImageTagGenerator["Task: Generate Location Image Tags"]
ImageDiffusionService["Task: Generate Image"]
end
UserAction-->PossibilityChecker-->isPossible
isPossible--"No"-->failurePoemWriter-->end1((End))
isPossible--"Yes"-->MovementChecker
MovementChecker-->isMovement
isMovement--"Yes"-->locationHandler-->end2((End))
isMovement--"No"-->resultsHandler-->isDead
isDead--"Yes"-->onDeath-->end3((End))
isDead--"No"-->WinChecker
WinChecker-->hasWon
hasWon--"Yes"-->AnnounceWinner-->end4((End))
hasWon--"No"-->CommentaryPoemGenerator-->end5((End))
This example is deliberately oversimplified for brevity and to illustrate the pattern.
- To simulate real-world constraints, a task checks whether or not the User Action is possible given current conditions.
- To enable player exploration, a set of tasks checks whether or not the User Action involves movement to a new location and generates that location and an accompanying image.
- To enable interactivity between the player and environment, a set of tasks generates the result of the action, updates health and inventory, and checks whether or not the player has died.
- To enable a dynamic win condition, a task checks whether or not the result of the action fulfills a loosely defined win condition (the overall flow is sketched below).
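A rough sketch of how this branching flow might be orchestrated in code, assuming each task is a small LLM-backed function behind an interface; the names and result shapes here are illustrative, not Liminai's actual implementation:

```typescript
// Illustrative interface: each method wraps a small, single-purpose LLM task.
interface GameTasks {
  checkPossibility(action: string): Promise<{ isPossible: boolean; reason: string }>;
  checkMovement(action: string): Promise<{ isMovement: boolean; destination?: string }>;
  handleLocation(destination: string): Promise<string>;
  handleResults(action: string): Promise<{ narration: string; isDead: boolean; hasWon: boolean }>;
  handleDeath(narration: string): Promise<string>;
  announceWinner(narration: string): Promise<string>;
  writeFailurePoem(reason: string): Promise<string>;
  writeCommentaryPoem(narration: string): Promise<string>;
}

// The branching game logic lives in ordinary code, so each LLM call stays small.
async function handleUserAction(tasks: GameTasks, action: string): Promise<string> {
  const possibility = await tasks.checkPossibility(action);
  if (!possibility.isPossible) return tasks.writeFailurePoem(possibility.reason);

  const movement = await tasks.checkMovement(action);
  if (movement.isMovement && movement.destination) {
    return tasks.handleLocation(movement.destination);
  }

  const result = await tasks.handleResults(action);
  if (result.isDead) return tasks.handleDeath(result.narration);
  if (result.hasWon) return tasks.announceWinner(result.narration);
  return tasks.writeCommentaryPoem(result.narration);
}
```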
Pattern: Hierarchical Context Compression
%%{init: {'theme':'dark'}}%%
graph LR
CS[Context Selector]
TI[Task Instruction]
Compressor -->|"Used In"| HCB -->|"Yields"| HC
TI-->|"Sent To"| LLM
User["fa:fa-user User Action Input"] -->|"Sent To"| TI
HC -->|"Filtered By"| CS -->|"Sends Context To"| TI
subgraph Compressor
direction TB
I(Info Input)
--> |"Embedded In"| CI(Compression Instruction)
-->|"Given To"| CLLM[Compression LLM]
-->|"Responds With"| CO(Compressed Output)
end
subgraph HCB[Hierarchical Context Builder]
D1(Detailed Info 1)
D2(Detailed Info 2)
SC[[Summary Compressor]]
S1("Summary Of Info 1")
S2("Summary Of Info 2")
MSC[[Meta Summary Compressor]]
MS("Summary Of Summaries 1 & 2")
D1 & D2 -.->|"Input To"| SC -.->|"Outputs"| S1 & S2
S1 & S2 -.->|"Input To"| MSC -.->|"Outputs"| MS
end
subgraph HC[Hierarchical Context]
direction LR
D(Detailed Info)
S(Summary Info)
MSS(Meta Summary Info)
end
Background
- LLMs have a context window that represents the amount of information they can see and reason about at once.
- For modern LLMs, this context window can be large: large enough to hold the full text of a long book.
Problem
- A large context window does not equate to an ability to effectively reason about that context.
- A large context window filled with information irrelevant to the current task results in worse outcomes.
- Filling a large context window increases LLM response latency and dramatically increases cost.
Solution
- Send only the content that is strictly necessary to perform a task.
- Store information at multiple levels of detail, from comprehensive to highly condensed.
- Progressively compress older or accumulated information into more concise forms.
- Provide the LLM with the most appropriate level of detail for each task, favoring compressed versions when possible (see the sketch below).
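A minimal sketch of the compressor and the hierarchical context builder, again assuming a hypothetical `CompleteText` wrapper for the compression LLM:

```typescript
type CompleteText = (instruction: string) => Promise<string>;

// Compressor: embed the info in a compression instruction and return the compressed output.
async function compress(complete: CompleteText, info: string, targetLength: string): Promise<string> {
  return complete(
    `Compress the following information into ${targetLength}, preserving names, numbers, and key decisions:\n${info}`
  );
}

// Hierarchical Context: the same information stored at three levels of detail.
interface HierarchicalContext {
  detailed: string[];
  summaries: string[];
  metaSummary: string;
}

// Hierarchical Context Builder: keep the detailed entries, summarize each one,
// then compress the summaries into a single meta-summary.
async function buildHierarchicalContext(
  complete: CompleteText,
  detailed: string[]
): Promise<HierarchicalContext> {
  const summaries: string[] = [];
  for (const info of detailed) {
    summaries.push(await compress(complete, info, "two sentences"));
  }
  const metaSummary = await compress(complete, summaries.join("\n"), "one short paragraph");
  return { detailed, summaries, metaSummary };
}
```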
Example: Text Adventure Game Turns
JournAI uses this pattern in the context of a text adventure game, allowing the LLM to maintain context across the entire game with a limited input window.
%%{init: {'theme':'dark'}}%%
graph LR
subgraph GH["Game History"]
DR[Detailed Results]
SR[Summarized Results]
CS[Chapter Summaries]
NPCI[NPC Information]
NPCS[NPC Summaries]
end
subgraph CSFL["Context Selection"]
direction LR
VRDR[Very Recent Detailed Results]
RSR[Recent Summarized Results]
ACS[All Chapter Summaries]
RDNPCI[Recently Discovered NPC Information]
ANPCS[All NPC Summaries]
end
PA[Player Action] --> |Triggers| CSFL
CSFL-->|"Sent To"| LLM[LLM for Game Logic]
CSFL-->|"Fetches From"|GH
LLM --> |Generates| NGO[New Game Output]
NGO --> |Updates| DR
NGO --> |Updates| SR
NGO --> |May Update| NPCI
SR --> |Periodically Summarized| CS
NPCI --> |Periodically Summarized| NPCS
The Game History is composed of:
- Detailed Results (accumulated each turn)
- Summarized Results (accumulated each turn)
- Chapter Summaries (periodically generated)
- NPC Information (accumulated each turn)
- NPC Summaries (periodically generated)
When a player executes an action, a subset of the Game History is included in the LLM instruction (the selection is sketched after this list):
- Very Recent Detailed Results (e.g. Last 5)
- Recent Summarized Results (e.g. Last 20)
- All Chapter Summaries
- Recently Discovered NPC Information (e.g. Last 20)
- All NPC Summaries
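A sketch of that selection step, using illustrative field names and the example cutoffs from the list above (not JournAI's actual schema):

```typescript
// Illustrative shape for the accumulated Game History.
interface GameHistory {
  detailedResults: string[];   // accumulated each turn
  summarizedResults: string[]; // accumulated each turn
  chapterSummaries: string[];  // periodically generated
  npcInformation: string[];    // accumulated each turn
  npcSummaries: string[];      // periodically generated
}

// Select only the slices of history the LLM needs for the current turn.
function selectContext(history: GameHistory) {
  return {
    veryRecentDetailedResults: history.detailedResults.slice(-5),
    recentSummarizedResults: history.summarizedResults.slice(-20),
    allChapterSummaries: history.chapterSummaries,
    recentNpcInformation: history.npcInformation.slice(-20),
    allNpcSummaries: history.npcSummaries,
  };
}
```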
Example: Coding Assistant
AutoCoder uses this pattern in the context of a Coding Assistant, allowing the LLM to maintain context across an entire codebase with a limited input window.
%%{init: {'theme':'dark'}}%%
graph LR
UC[User Command] --> RDP
IP[Ingestion Process] -->|"Generates"| PC
subgraph PC[Project Context]
direction TB
FC[File Contents]
FS[File Summaries]
FB[File Blurbs]
DS[Directory Summaries]
FO[Feature Overviews]
FC-->|"Compressed Into"|FS-->|"Compressed Into"|FB
FB-->|"Compressed Into"|DS & FO
end
PC --> RDP
subgraph RDP[Relevancy Determination Process]
direction TB
subgraph Task1[Task 1: Determine Relevant File Blurbs]
direction LR
AFB[All File Blurbs]
AFO[All Feature Overviews]
ADS[All Directory Summaries]
UC1[User Command]
end
subgraph Task2[Task 2: Determine Relevant File Summaries]
direction LR
RFB[Relevant File Blurbs]
UC2[...]
end
subgraph Task3[Task 3: Determine Relevant File Contents]
direction LR
RFS[Relevant File Summaries]
UC3[...]
end
Task1-->|"Send Blurbs To"|Task2
Task2-->|"Send Summaries To"|Task3
end
RDP -->|"Outputs"| SC[Selected Context]
subgraph SC[Selected Context]
direction LR
AFO1[All Feature Overviews]
RF2[Relevant File Blurbs]
RMSI[Relevant File Summaries]
RMFC[Relevant File Contents]
end
SC --> LLM[LLM for Coding Assistant]
LLM -->|"Generates"| CO[Code Change Output]
The Project Context consists of:
- Full, Uncompressed File Contents
- File Summaries (generated from File Contents during ingestion)
- File Blurbs (generated from File Summaries during ingestion)
- Directory Summaries (generated from File Blurbs during ingestion)
- Feature Overviews (generated from File Blurbs during ingestion)
When users execute a command, a relevant subset of the Project Context is included in the LLM instruction, with relevancy (in relation to the User Command) determined by the LLM:
- All Feature Overviews
- Relevant File Blurbs (describing the high-level purpose of all related files)
- Relevant File Summaries (describing the interfaces for more involved files)
- Relevant File Contents (the full contents for key files)
Relevancy is determined by the following process prior to command execution (sketched after this list):
- Provide LLM with the User Command & All File Blurbs, instructing it to greedily determine which files are relevant.
- Provide LLM with the User Command & Relevant File Blurbs, asking it which files it would need to see summaries for.
- Provide LLM with the User Command & Relevant File Summaries, asking it which files it would need to see the full contents for.
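A rough sketch of that three-step narrowing process, assuming a hypothetical LLM-backed selection task that takes the command plus a set of file descriptions and returns the relevant file paths:

```typescript
// Illustrative shape for the ingested Project Context (path -> text at each level of detail).
interface ProjectContext {
  blurbs: Map<string, string>;    // high-level purpose of each file
  summaries: Map<string, string>; // interface-level summary of each file
  contents: Map<string, string>;  // full file contents
  featureOverviews: string[];
}

// Hypothetical LLM-backed task: given the command and some file descriptions,
// return the paths the LLM considers relevant.
type SelectRelevantFiles = (
  command: string,
  descriptions: Map<string, string>
) => Promise<string[]>;

function pick(source: Map<string, string>, paths: string[]): Map<string, string> {
  return new Map(paths.map((p) => [p, source.get(p) ?? ""]));
}

async function determineRelevantContext(
  selectRelevant: SelectRelevantFiles,
  project: ProjectContext,
  command: string
) {
  // Step 1: greedily pick relevant files from all the blurbs.
  const relevantBlurbs = pick(project.blurbs, await selectRelevant(command, project.blurbs));
  // Step 2: of those, pick the files whose summaries are needed.
  const relevantSummaries = pick(project.summaries, await selectRelevant(command, relevantBlurbs));
  // Step 3: of those, pick the files whose full contents are needed.
  const relevantContents = pick(project.contents, await selectRelevant(command, relevantSummaries));

  return {
    featureOverviews: project.featureOverviews,
    relevantBlurbs,
    relevantSummaries,
    relevantContents,
  };
}
```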
Pattern: JSON Schema Instructions
%%{init: {'theme':'dark'}}%%
graph LR
subgraph TI[LLM Task Instruction]
direction LR
TIJ[Task Input JSON]
TOJS[Task Output JSON Schema]
INS[Transformation Instruction]
end
Raw[Task Output JSON String]
DM[Defensive Marshaler]
TO[Task Output JSON]
TI -->|"Outputs"| Raw -->|"Sent To"| DM-->|"Retry On Failure"|TI
DM -->|"Outputs"|TO
Background
LLM stands for Large Language Model: by default, LLMs specialize in outputting plain text rather than structured data.
Problem
In a programmatic environment, especially when orchestrating the flow of information between multiple prompts, plain text is inadequate and must be marshalled into structured data.
Solution
- Provide the LLM with a JSON Schema for the desired, structured output, and instruct it to respond with JSON that fulfills the schema.
- Order the JSON properties in the schema so that they represent an ordered, logical process, with later properties dependent on the values of earlier properties.
- Defensively marshal the LLM output into JSON, accounting for type variance and incorporating retry logic on parsing failure (see the sketch below).
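A minimal sketch of a defensive marshaler in TypeScript, assuming a hypothetical `CompleteText` wrapper and a caller-supplied validator; a real implementation might delegate validation to a JSON Schema library instead:

```typescript
type CompleteText = (instruction: string) => Promise<string>;

// Defensively marshal the LLM's text output into JSON, retrying the full task on failure.
async function completeJson<T>(
  complete: CompleteText,
  instruction: string,
  validate: (value: unknown) => T, // throws if the parsed value doesn't satisfy the schema
  maxAttempts = 3
): Promise<T> {
  let lastError: unknown = new Error("No attempts made");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await complete(instruction);
    try {
      // LLMs sometimes wrap JSON in prose or code fences; extract the outermost object.
      const start = raw.indexOf("{");
      const end = raw.lastIndexOf("}");
      if (start === -1 || end === -1) throw new Error("No JSON object found in LLM output");
      const parsed: unknown = JSON.parse(raw.slice(start, end + 1));
      return validate(parsed); // Handle type variance inside the validator.
    } catch (err) {
      lastError = err; // Retry on parsing or validation failure.
    }
  }
  throw lastError;
}
```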
Example: Possibility Checker
// Task Input JSON
{
"action": "Refurbish a bathroom",
"actor": {
"name": "John Doe",
"age": 30,
"skills": ["JavaScript", "React", "Node.js"]
},
"initialConditions": ["Bathroom is delapidated and non-functional", "Plumbing shot", "Wiring degraded"]
}
// Task Output JSON Schema
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://example.com/actionPossibility.schema.json",
"title": "Action Possibility",
"description": "Whether or not an action is possible, based on the actor and initial conditions.",
"type": "object",
"properties": {
"actorFactors": {
"description": "Given the actor, what factors are at play that influence the possibility of the action?",
"type": "array",
"items": {
"type": "string"
}
},
"initialConditionFactors": {
"description": "Given the initial conditions, what factors are at play that influence the possibility of the action?",
"type": "array",
"items": {
"type": "string"
}
},
"isPossible": {
"description": "Whether or not the action is possible.",
"type": "boolean"
}
}
}
# Input
{Task Input JSON}
# Output Format JSON Schema
{Task Output JSON Schema}
# Instruction
Given the Input, output a JSON object based on the Output Format JSON Schema.
Output only valid JSON, saying nothing else.
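A small sketch of assembling that instruction programmatically from the Task Input JSON and the Task Output JSON Schema; the function name is illustrative:

```typescript
// Assemble the full instruction from the Task Input JSON and Task Output JSON Schema,
// mirroring the template shown above.
function buildSchemaInstruction(taskInput: unknown, outputSchema: object): string {
  return [
    "# Input",
    JSON.stringify(taskInput, null, 2),
    "",
    "# Output Format JSON Schema",
    JSON.stringify(outputSchema, null, 2),
    "",
    "# Instruction",
    "Given the Input, output a JSON object based on the Output Format JSON Schema.",
    "Output only valid JSON, saying nothing else.",
  ].join("\n");
}
```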
Pattern: Auto Completion Of Streaming JSON
%%{init: {'theme':'dark'}}%%
graph LR
PJS[Partial JSON String]
subgraph FJO[Fixed JSON Object]
direction LR
FP1[Completed / Fixed Property #1]
FP2[Completed / Fixed Property #2]
end
subgraph UI
UI1[First UI Element]
UI2[Second UI Element]
end
subgraph EP1[Property #1]
direction TB
V1[Value 1]
D1[Default 1]
V1-->|"If Not Found Defaults To"|D1
end
subgraph EP2[Property #2]
direction TB
V2[Value 2]
D2[Default 2]
V2-->|"If Not Found Defaults To"|D2
end
subgraph EJO[Example JSON Object]
EP1
EP2
end
subgraph PJF[Partial JSON Fixer]
EJO
end
PJS -->|"Sent To"| PJF -->|"Outputs"| FJO
FP1 -->|"Updates"| UI1
FP2 -->|"Updates"| UI2
Background
- LLMs can stream their output token by token, allowing for a more responsive user experience.
Problem
- However, streaming structured data like JSON presents challenges, as the JSON structure may not be complete until the entire response is generated.
- When streaming LLM outputs, we need a way to handle incomplete JSON structures while still providing real-time updates to the user interface.
Solution
- Implement a JSON fixing mechanism that can handle incomplete JSON structures.
- Define an example object that represents the expected JSON structure and provides default values for all properties.
- Ensure that the expected structure and example object have their properties ordered in alignment with the UI components they stream to (see the sketch below).
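A simplified sketch of a partial JSON fixer: it closes unterminated strings, arrays, and objects in the streamed fragment, then overlays whatever parses onto an example object that supplies defaults. A real implementation handles more edge cases (truncated keys, numbers, and nested defaults):

```typescript
// Close any unterminated strings, arrays, and objects in a streaming JSON fragment.
function closePartialJson(partial: string): string {
  const closers: string[] = [];
  let inString = false;
  let escaped = false;
  for (const ch of partial) {
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") closers.pop();
  }
  let fixed = partial;
  if (inString) fixed += '"';
  // Drop a dangling comma, colon, or whitespace before appending the closers.
  fixed = fixed.replace(/[,:\s]+$/, "");
  while (closers.length > 0) fixed += closers.pop();
  return fixed;
}

// Parse the fragment and overlay it onto an example object that supplies defaults
// for every property, so the UI always receives a complete object.
function fixPartialJson<T extends object>(partial: string, example: T): T {
  try {
    const parsed: unknown = JSON.parse(closePartialJson(partial));
    if (typeof parsed !== "object" || parsed === null) return example;
    return { ...example, ...(parsed as Partial<T>) };
  } catch {
    return example; // Fall back to defaults until more tokens arrive.
  }
}
```

As each chunk arrives, the accumulated partial string is run through the fixer and the completed or defaulted properties are pushed to their corresponding UI elements; because the properties are ordered to match the UI, earlier elements fill in first.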
Example
- JournAI uses this pattern to stream action results, stat changes, and character information to different sections of a UI via a Streaming JSON Fixer class.