Interstitial Reasoning by AI Agents

Recently, I’ve been fascinated by the reasoning ability of AI agents.

Most LLM users agree that LLMs cannot truly reason (yet). This circumstance has compelled experts, researchers, and enthusiasts to “mimic” reasoning through sequences of AI calls and prompting tricks.

High-Level: General Reasoning

Before we dive into the cool tricks I’ve learned, let’s first examine what “reasoning” actually is.

Not to be confused with “consciousness,” reasoning is simply the ability to follow a sequence of logical steps to a finite or indefinite end. When people say “LLMs are bad at reasoning,” they usually mean that models can handle one or two steps of basic logic, but no further.

Sadly (or optimistically, depending on your perception of the AGI threat), this means that LLMs fall short on certain deceptively basic tasks, like multi-step algebra or writing good long-form content. But why is reasoning really so difficult? After all, AI is excellent at solving single-step, “unary” tasks.

Reasoning just consists of a chain of many such unary tasks, so what’s to stop us from forcing an AI into this thought pattern?

The answer: a firm guiding hand and some patience.

Establishing Reasoning Frameworks

First, let me establish what a reasoning framework must consist of to properly mirror real-life reasoning.

To reason, one must:

  • Consider an initial input

  • Decide what the first-level output should look like

  • Pick an appropriate means or medium to produce and express that output

  • Execute the task on the initial input

  • See how that “building block” output contributes towards the final many-step output.

  • Repeat with a new input derived from the first-step output.

Together, these six steps form a comprehensive reasoning framework for complex tasks: they are what one does when figuring out how to do something hard. Our neurons are well-adapted (fine-tuned, if you will) to this process.

Coaxing an AI to do this is a bit harder. We can bridge the gap by filling in some of the steps for the model (a code sketch of the resulting loop follows this list):

  • Explain the structure of the initial input

  • Provide instructions for how a model can address that predictable input structure

  • Tell the model how to execute those instructions

  • Execute the task

  • Funnel the new result back and restart the chain of thought.
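
To make this concrete, here is a minimal sketch of that loop in Python. Note that call_llm is a hypothetical stand-in for whatever model client you use; the point is the shape of the chain, where each step’s output becomes the next step’s input.

def call_llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to your LLM of choice, return its reply."""
    raise NotImplementedError("wire this up to your model provider")

def reasoning_chain(initial_input: str, step_instructions: list[str]) -> str:
    """Run a chain of unary tasks, funneling each result back as the next input."""
    current = initial_input
    for instructions in step_instructions:
        # Explain the input structure and how to address it, then execute.
        prompt = (
            f"Input:\n{current}\n\n"
            f"Instructions:\n{instructions}\n\n"
            "Respond with only the result of this step."
        )
        current = call_llm(prompt)  # the new result restarts the chain
    return current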

The Main Reasoning Frameworks

What you’ll find is that most LLM reasoning frameworks follow the basic format laid out in the last section. Right now, only a handful of basic frameworks are in wide use, with two caveats:

  1. These reasoning frameworks are highly customizable to the task at hand. The steps in a “ReAct” framework will differ when you apply it to processing an image vs. writing a legal brief.

  2. LLMs are still new. These freshly documented frameworks mark only the start of an emerging academic discipline. You will have the opportunity to formulate and spread new ideas about reasoning frameworks that have worked for you.

RTF — Request, Task, Format

This is probably the most familiar format to experienced AI users. You provide an input (the request), describe the task the AI should accomplish, and tell it how to format the output (possibly providing examples):

Johnny has four dogs, three turtles, and two cats. Express the pets of Johnny's household in JSON format, using the following structure:
{
	animal: n
}
For example:
{
	polar_bear: 2,
	llama: 6
}
-------------------
{
	"dogs": 4,
	"turtles": 3,
	"cats": 2
}

RTF serves as the bedrock for the interstitial reasoning of more complex frameworks. Most reasoning frameworks are just many RTF steps glued together.
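
In code, RTF amounts to little more than string assembly. Here is a hedged sketch (rtf_prompt is a hypothetical helper, not a standard API) showing how the three parts compose into a single prompt:

def rtf_prompt(request: str, task: str, output_format: str) -> str:
    """Compose a Request-Task-Format prompt from its three parts."""
    return f"{request}\n\n{task}\n\n{output_format}"

prompt = rtf_prompt(
    request="Johnny has four dogs, three turtles, and two cats.",
    task="Express the pets of Johnny's household in JSON format.",
    output_format='Use the following structure: {"animal": n}',
)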

Chain of Thought

As another familiar framework, Chain of Thought glues together RTF tasks in a simple sequence.

Novak has 18 tennis balls spread across 6 cans of tennis balls. He buys 2 more cans of tennis balls. How many tennis balls does Novak have?

You should first calculate how many tennis balls are in each can. Then, you should determine how many tennis balls would be in the new cans purchased.
----------------
Novak initially has 18 tennis balls spread across 6 cans, which means there are 3 tennis balls in each can (18 divided by 6 equals 3).

After buying 2 more cans, he has a total of 6 + 2 = 8 cans of tennis balls.

Since each can contains 3 tennis balls, the total number of tennis balls Novak has is 3 * 8 = 24.

So, Novak has 24 tennis balls in total after buying 2 more cans.
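
Using the reasoning_chain sketch from earlier (with its hypothetical call_llm helper), the tennis-ball problem is just two RTF steps glued together:

answer = reasoning_chain(
    "Novak has 18 tennis balls spread across 6 cans. He buys 2 more cans.",
    [
        "Calculate how many tennis balls are in each can.",
        "Using the per-can count, determine the total number of tennis balls "
        "across all 8 cans (the original 6 plus the 2 new ones).",
    ],
)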

ReAct (Reason, Act)

Things are getting spicier.

In this example I’ve written an agent that collects and cites different facts that are related to a query:

You are now a Wikipedia agent. Your goal is to generate a list of 10 facts about New York City.

Use the following format:
- List of facts: [facts]
- Thought: I have not collected 10 total facts.
- Action: The action to take, should be one of [wikipedia]
- Action Input: the input to the action
- Observation: the result of the action
- Thought: I have now found an additional fact
- Final Answer: List of 10 facts, each fact is on a new line.

This may seem like a long-winded approach (it is), but it adeptly illustrates how ReAct works.

We walk the model through a human-like, iterative reasoning process, and then constrain how it can behave by giving it tools and suggesting interstitial formats (querying Wikipedia, keeping a running list of facts).
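
If you want to implement this loop yourself rather than lean on a framework, here is a minimal sketch. call_llm is the same hypothetical helper as before, and wikipedia_search stands in for whatever retrieval tool you actually have; the parsing is deliberately naive.

def wikipedia_search(query: str) -> str:
    """Hypothetical tool: look up a query on Wikipedia, return a snippet."""
    raise NotImplementedError

TOOLS = {"wikipedia": wikipedia_search}

def react_loop(prompt: str, max_steps: int = 20) -> str:
    """Alternate model 'thoughts' with tool calls until a final answer appears."""
    transcript = prompt
    for _ in range(max_steps):
        reply = call_llm(transcript)
        transcript += "\n" + reply
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        if "Action:" in reply and "Action Input:" in reply:
            action = reply.split("Action:", 1)[1].splitlines()[0].strip()
            action_input = reply.split("Action Input:", 1)[1].splitlines()[0].strip()
            # Feed the tool result back in as an Observation and loop again.
            transcript += f"\nObservation: {TOOLS[action](action_input)}"
    return transcript  # ran out of steps without a final answer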

As a result, we get a Wikipedia-backed list of 10 facts about New York City. We can really expand this approach in a few ways:

Prioritizing Tool Use

In certain cases, we might want to prioritize the use of tools based on cost, latency, trustworthiness, or any other criterion.

For an academic essay, we might suggest something like this, so that an agent can better decide which research tool to use.

You are InformationGPT. I will give you a topic and you should provide 5 facts on the topic:

"Geckos in North America Discovered Since 1950"

You must gather facts from a list of priority-ordered materials. Priority ordering means that information from certain sources is trusted more than others.
1. Trust the source text the most (source_text_retriever tool)
2. Trust Wikipedia second most (wikipedia tool)
3. Trust the open internet the least (internet tool)

Use the following format:
- List of facts: [facts]
- Thought: I have not collected 5 total facts.
- Action: The action to take, should be one of [source_text_retriever, wikipedia, internet]
- Action Input: the input to the action
- Observation: the result of the action
- Thought: I have now found an additional fact
- Final Answer: List of 5 facts, each fact is on a new line.
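
On the code side, priority ordering can be as simple as trying tools in trust order and falling back. In this sketch, the tool functions named in the usage comment are hypothetical stubs matching the prompt above:

def gather_fact(query: str, tools_in_priority_order) -> str:
    """Try each tool in order of trust; fall back to the next on failure."""
    for tool in tools_in_priority_order:
        try:
            result = tool(query)
            if result:
                return result
        except Exception:
            continue  # tool failed or came back empty: fall back to the next one
    raise LookupError(f"No tool returned a fact for: {query!r}")

# e.g. gather_fact("geckos discovered since 1950",
#                  [source_text_retriever, wikipedia_tool, internet_tool])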

Limiting Unwanted Material

Sometimes, regular language models go awry and produce unwanted output.

Certain applications may want to ignore information that does not align with their platform rules.

You are CensorGPT. You should provide me with 5 negative statistics about crime in the United States.

Use the following format:
- List of statistical observations: [facts]
- Thought: I have not collected 5 total statistics.
- Action: The action to take, should be one of [wikipedia]
- Action Input: the input to the action
- Observation: the result of the action
- Thought: I have determined that the statistic does not pertain to a specific racial demographic.
- Action: Try again if rules are broken or add the statistics to the list
- Final Answer: List of 5 statistics, each statistic is on a new line.
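
The “try again” step above is really just a validation gate. Here is a hedged sketch of that gate in Python, where violates_rules is hypothetical and could itself be another LLM call acting as a judge:

def violates_rules(statistic: str) -> bool:
    """Hypothetical check, e.g. an LLM call that judges platform-rule compliance."""
    raise NotImplementedError

def collect_statistics(candidate_stream, n: int = 5) -> list[str]:
    """Keep only rule-compliant statistics until we have n of them."""
    kept: list[str] = []
    for stat in candidate_stream:
        if violates_rules(stat):
            continue  # "try again": discard and keep searching
        kept.append(stat)
        if len(kept) == n:
            break
    return kept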

Chain of Thought Hybrid

Chain of Thought is a basic subunit of ReAct, but sometimes its implementation can get lost in the mess of recursion.

We can combine Chain of Thought with ReAct to form a complex query that reshapes its output in a final formatting step.

You are TrainsJsonGPT. You should provide me with train times between New York City and Washington DC on 2/21/2024 in JSON format.

Use the following format:
- List of train times: [list]
- Thought: I have not collected all the train times
- Action: The action to take, should be one of [amtrak_api_tool]
- Action Input: the input to the action
- Observation: the result of the action
- Thought: There is a train that operates between the locations at the supplied date.
- Action: Add the train to the list
- Unformatted Answer: List of all train times for the day
- Final Answer: List of all train times in JSON format:
{
	train_times: [ISO8601 Datetime in EST]
} 
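
That final formatting step is the Chain of Thought part: once the unformatted list exists, it gets reshaped deterministically. You could even pull that step out of the model entirely, as in this sketch (the example times in the comment are illustrative, not real Amtrak data):

import json

def format_train_times(times: list[str]) -> str:
    """Wrap collected ISO 8601 departure times in the target JSON schema."""
    return json.dumps({"train_times": times}, indent=2)

# e.g. format_train_times(["2024-02-21T06:05:00-05:00", "2024-02-21T07:25:00-05:00"])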

Other frameworks

I think that over time, the basic RTF/CoT/ReAct frameworks will branch off into new but similar, slightly optimized versions of themselves. We are already seeing a lot of specialization, as evidenced by the plethora of tools and mediums built on top of them. So realistically, these are just the basic building blocks (the COBOL of LLMs, if you will) of future iterations.