You'd have to feed it the whole document as input, which is probably too many tokens.
It can answer simple questions like "what is the maximum allowed slope under doc M?" but I haven't tried anything more complex. I wouldn't trust it to be correct anyway.
I was playing around with LangChain the other day, and it gets around the token limitation by offloading the work to 'agents'. It's a clever way to use AI when it's needed, and otherwise use the tools we've been using for years for these tasks.
In my tests I was feeding it the metadata from a TV channel, which included things like publishing windows for episodes, content owner, etc. Then I was able to ask it questions using natural language and get answers / summaries / reports.
Example:
"How many episodes of Peppa Pig are currently live?"
It would then take the CSV dump of the metadata (though it can be hooked up to a database, too), and translate my question into a Python script with Pandas, run the script and give me the output.
It was a bit flaky at first as I assumed it was cleverer than it was. I had to give it some info about what the columns mean, and how to tell if something is "live". But I did all that in natural language, too. This was the entire script:
Code:
import os

from langchain.agents import create_csv_agent
from langchain.llms import OpenAI
from langchain.prompts.prompt import PromptTemplate

os.environ["OPENAI_API_KEY"] = "..."

# the agent gets the CSV plus a Python REPL tool so it can run Pandas against it
agent = create_csv_agent(OpenAI(temperature=0), 'metadata.csv', verbose=True)

prompt = input("What would you like to know? ")

# background info so it knows what the columns mean and what "live" means
_DEFAULT_TEMPLATE = """
The csv file contains rows representing episodes of TV programmes.
An episode is live if the publish date is in the past, and the unpublish date is in the future. Today's date is 10th April 2023
Question: {prompt}
"""
PROMPT = PromptTemplate(
    input_variables=["prompt"], template=_DEFAULT_TEMPLATE
)

# fill the question into the template before handing it to the agent
agent.run(PROMPT.format(prompt=prompt))
I then asked it how many episodes were live and it got the answer spot on.
So, what it does in the background is use GPT3 (I think) to 'understand' my question, and turn it into a series of tasks that it needs to perform to answer me.
I've just run it again to see the steps it takes (it tells you its 'thoughts'):
Code:
> Entering new AgentExecutor chain...
Thought: I need to find out which episodes are live
Action: python_repl_ast
Action Input: df[(df['publish'] < '2023-04-14') & (df['unpublish'] > '2023-04-14')]
Observation: <snipped: some info about my data>
[4319 rows x 41 columns]
Thought: I now know the final answer
Final Answer: There are 4319 episodes of TV programmes that are live as of 14th April 2023.
> Finished chain.
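The filter it came up with is just ordinary Pandas. As a standalone script, my guess at the equivalent would be something like the below (assuming the publish/unpublish columns hold ISO-format date strings, which is why a plain string comparison works):
Code:
import pandas as pd

# the same metadata dump the agent was given
df = pd.read_csv('metadata.csv')

# "live" = publish date in the past, unpublish date in the future
today = '2023-04-14'
live = df[(df['publish'] < today) & (df['unpublish'] > today)]

print(f"{len(live)} episodes are live as of {today}")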
So, it uses Pandas to do the work, but GPT to generate the command. I think something similar could work for building regs:
1. Ask the question, give it the regs, and some basic info on how to interpret them.
2. GPT works out what parameters need to go in <some text extracting library> to get the answer.
3. It writes a script to use the library from step #2 and runs it against the regs.
4. Returns the answer in the context of your question (see the rough sketch below).
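I haven't tried this, but the LangChain side of it would look something like this. search_regs is just a stand-in for whatever the text extracting library from step #2 ends up being, and the tool description is only a guess at what you'd need to tell it:
Code:
from langchain.agents import initialize_agent, Tool
from langchain.llms import OpenAI

def search_regs(query: str) -> str:
    # stand-in for steps #2/#3: the real version would use the text extracting
    # library to pull the relevant clauses out of the regs for this query
    raise NotImplementedError("plug the regs extraction in here")

tools = [
    Tool(
        name="BuildingRegsSearch",
        func=search_regs,
        description="Looks up clauses in the building regulations. "
                    "Input should be a short search term, e.g. 'ramp slope'.",
    )
]

agent = initialize_agent(
    tools,
    OpenAI(temperature=0),
    agent="zero-shot-react-description",
    verbose=True,
)

agent.run("What is the maximum allowed slope under doc M?")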