Article
· Jun 15, 2023 6m read

LangChain InterSystems PDF to Interview Questions and FlashCards

A demonstration example for the current Grand Prix contest, using a more complex parameterized prompt template to test the AI.

Interview Questions

Documentation is available. A recruitment consultant wants to quickly challenge candidates with technical questions relevant to a role.

Can they automate making a list of questions and answers from the available documentation?

Interview Answers and Learning

One of the most effective ways to cement new facts into accessible long-term memory is with phased recall.

In essence, you take a block of text and reorganize it into a series of self-contained Questions and Facts.

Now imagine two questions:

  • What day of the week is the trash-bin placed outside for collection?
  • When is the marriage anniversary?

Quickly recalling correct answers can mean a happier life!!

Recalling the answer to each question IS the mechanism to enforce a fact into memory.

Phased Recall re-asks each question with longer and longer time gaps each time the correct answer is recalled.
For example:

  • You consistently get the right answer: The question is asked again tomorrow, in 4 days, in 1 week, in 2 weeks, in 1 month.
  • You consistently get the answer wrong: The question will be asked every day until it starts to be recalled.
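The schedule described above can be sketched in a few lines. This is a minimal illustration only, not Anki's actual scheduling algorithm. It assumes each gap is measured from the day of the review, and that "1 month" means 30 days. The names `INTERVALS_DAYS` and `next_review` are hypothetical.

```python
from datetime import date, timedelta

# Gaps in days, taken from the example above; "1 month" is approximated as 30 days.
INTERVALS_DAYS = [1, 4, 7, 14, 30]

def next_review(today, streak, answered_correctly):
    """Return (next_review_date, new_streak) for one flashcard.

    streak is the number of consecutive correct answers before this review.
    """
    if not answered_correctly:
        # A wrong answer resets the card: ask again tomorrow.
        return today + timedelta(days=1), 0
    # A correct answer moves one step along the schedule, capped at the last gap.
    step = min(streak, len(INTERVALS_DAYS) - 1)
    return today + timedelta(days=INTERVALS_DAYS[step]), streak + 1

# First correct answer: the card comes back tomorrow with a streak of 1.
print(next_review(date(2023, 6, 15), 0, True))
```

A wrong answer at any point drops the streak back to zero, so the card is asked daily until it starts to be recalled.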

If you can easily see which answers are challenging, it is productive to re-work those difficult answers to make them more memorable.

There is a free software package called Anki that provides this full phased recall process for you.

If you can automate the creation of questions and answers into a text file, Anki will create new flashcards for you.
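As a rough sketch of what such a text file could look like, the snippet below writes question, answer and label as fully quoted, comma-separated fields using Python's standard `csv` module. The helper name `write_flashcards` and the second sample card are hypothetical. It assumes a comma-separated layout with doubled double quotes is acceptable to the Anki import dialog.

```python
import csv
import io

def write_flashcards(rows, handle):
    """Write (question, answer, label) tuples as fully quoted CSV lines."""
    # QUOTE_ALL wraps every field in double quotes; embedded quotes are doubled.
    writer = csv.writer(handle, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for question, answer, label in rows:
        writer.writerow([question, answer, label])

cards = [
    ("What is a global?",
     "A global is a sparse, multidimensional database array.",
     "Using ObjectScript"),
    # Hypothetical card, only to show how an embedded double quote is escaped.
    ('Sample question with a "quoted" word?', "Sample answer", "Sample label"),
]

buffer = io.StringIO()
write_flashcards(cards, buffer)
print(buffer.getvalue())
```

Writing to a file instead of `io.StringIO` only needs `open(path, "w", newline="")` as the handle.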

Hypothesis

We can use LangChain to transform InterSystems PDF documentation into a series of Questions and answers to:

  • Make interview questions and answers
  • Make Learner Anki flash cards

Create new virtual environment

mkdir chainpdf

cd chainpdf

python -m venv .

scripts\activate 

pip install openai
pip install langchain
pip install wget
pip install lancedb
pip install tiktoken
pip install pypdf

set OPENAI_API_KEY=[ Your OpenAI Key ]

python

Prepare the docs

import wget
import zipfile

# download the documentation archive
url='https://docs.intersystems.com/irisforhealth20231/csp/docbook/pdfs.zip'
wget.download(url)
# extract docs
with zipfile.ZipFile('pdfs.zip','r') as zip_ref:
  zip_ref.extractall('.')
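Before going further it can help to confirm where the PDFs landed after extraction. The following is a small sketch using only the standard library. The helper name `find_pdfs` is hypothetical, and the `./pdfs` root is an assumption based on the paths used later in this article.

```python
import glob
import os

def find_pdfs(root, wanted):
    """Map each wanted file name to its path under root, or None if missing."""
    pattern = os.path.join(root, "**", "*.pdf")
    found = {os.path.basename(p): p for p in glob.glob(pattern, recursive=True)}
    return {name: found.get(name) for name in wanted}

# Prints a path for each file that was extracted, or None if it is absent.
print(find_pdfs("./pdfs", ["GCOS.pdf", "RCOS.pdf"]))
```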

Extract PDF text

from langchain.document_loaders import PyPDFLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.prompts.prompt import PromptTemplate
from langchain import OpenAI
from langchain.chains import LLMChain

# To limit for the example
# From the documentation site I could see that documentation sets
# GCOS = Using ObjectScript
# RCOS = ObjectScript Reference
pdfFiles=['./pdfs/pdfs/GCOS.pdf','./pdfs/pdfs/RCOS.pdf']

# The prompt will be really big and need to leave space for the answer to be constructed
# Therefore reduce the input string
text_splitter = CharacterTextSplitter(
    separator = "\n\n",
    chunk_size = 200,
    chunk_overlap  = 50,
    length_function = len,
)

# split document text into chunks
documentsAll=[]
for file_name in pdfFiles:
  loader = PyPDFLoader(file_name)
  pages = loader.load_and_split()
  # Strip unwanted padding
  for page in pages:
    del page.lc_kwargs
    # remove non-breaking spaces left over from the PDF extraction
    page.page_content=page.page_content.replace('\xa0','')
  documents = text_splitter.split_documents(pages)
  # Ignore the cover pages
  for document in documents[2:]:
    # skip table of contents
    if '........' in document.page_content:
      continue
    documentsAll.append(document)
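To give a feel for what splitting on a separator with a size limit does, here is a simplified, pure-Python sketch. It is not LangChain's implementation: it ignores `chunk_overlap` entirely and keeps an oversized piece whole. The function name `split_text` is hypothetical.

```python
def split_text(text, separator="\n\n", chunk_size=200):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole rather than cut.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if current and len(candidate) > chunk_size:
            # Adding this piece would exceed the limit: close the current chunk.
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

sample = "a" * 120 + "\n\n" + "b" * 120 + "\n\n" + "c" * 10
print([len(chunk) for chunk in split_text(sample)])
```

Small chunks keep the prompt short, which leaves room for the model to construct its answer.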

Prep search template

_GetDocWords_TEMPLATE = """From the following documents create a list of distinct facts.
For each fact create a concise question that is answered by the fact.
Do NOT restate the fact in the question.

Output format:
Each question and fact should be output on a separate line delimited by a comma character
Escape every double quote character in a question with two double quotes
Add a double quote to the beginning and end of each question
Escape every double quote character in a fact with two double quotes
Add a double quote to the beginning and end of each fact
Each line should end with {labels}

The documents to reference to create facts and questions are as follows:
{docs}
"""

PROMPT = PromptTemplate(
     input_variables=["docs","labels"], template=_GetDocWords_TEMPLATE
)

llm = OpenAI(temperature=0, verbose=True)
chain = LLMChain(llm=llm, prompt=PROMPT)
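It can be useful to preview the rendered prompt before spending API calls. The sketch below uses plain `str.format` on an abbreviated stand-in for the template, on the assumption that the prompt template substitutes `{docs}` and `{labels}` in the same way with its default settings. `TEMPLATE` and `render_prompt` are hypothetical names.

```python
# Abbreviated stand-in for the full template; only the two placeholders matter here.
TEMPLATE = """From the following documents create a list of distinct facts.
Each line should end with {labels}

The documents to reference to create facts and questions are as follows:
{docs}
"""

def render_prompt(docs, labels):
    """Substitute both placeholders into the template text."""
    return TEMPLATE.format(docs=docs, labels=labels)

print(render_prompt(
    "A global is a sparse, multidimensional database array.",
    "Using ObjectScript",
))
```

Reading the rendered text makes it easier to spot where the output instructions are ambiguous.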

Process each document and place output in file

# open an output file
with open('QandA.txt','w') as file:
  # iterate over each text chunk
  for document in documentsAll:
    # set the label for Anki flashcard
    source=document.metadata['source']
    if 'GCOS.pdf' in source:
      label='Using ObjectScript'
    else:
      label='ObjectScript Reference'
    output=chain.run(docs=document,labels=label)
    file.write(output+'\n')
    file.flush()
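If more documentation sets are added, the if/else that picks the label could be replaced by a lookup table. A minimal sketch follows, assuming the file name stem is the document-set code as noted in the comments earlier. `LABELS`, `label_for` and the `"Unlabelled"` default are hypothetical.

```python
import os

# Document-set codes and titles as noted in the comments earlier.
LABELS = {
    "GCOS": "Using ObjectScript",
    "RCOS": "ObjectScript Reference",
}

def label_for(source, default="Unlabelled"):
    """Derive the flashcard label from a PDF path such as './pdfs/pdfs/GCOS.pdf'."""
    stem = os.path.splitext(os.path.basename(source))[0]
    return LABELS.get(stem, default)

print(label_for("./pdfs/pdfs/RCOS.pdf"))
```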

 

There were some retry and force-close messages during the loop.

I anticipate this is the OpenAI API limiting requests to enforce fair use.

Alternatively, a local LLM could be used instead.
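One common way to cope with this kind of throttling is to retry with a growing pause between attempts. The helper below is a generic sketch, not part of LangChain or the OpenAI client, and the name `call_with_retry` and its parameters are hypothetical. It catches every exception for brevity; in practice it would be better to catch only the rate-limit error raised by the client in use.

```python
import time

def call_with_retry(func, *args, attempts=5, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call func, retrying on any exception with exponentially growing pauses."""
    for attempt in range(attempts):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Pause 1x, 2x, 4x ... base_delay before the next attempt.
            sleep(base_delay * (2 ** attempt))
```

In the loop above, the call could then be written as `output = call_with_retry(chain.run, docs=document, labels=label)`.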

Examine the output file

"What are the contexts in which ObjectScript can be used?", "You can use ObjectScript in any of the following contexts: Interactively from the command line of the Terminal, As the implementation language for methods of InterSystems IRIS object classes, To create ObjectScript routines, and As the implementation language for Stored Procedures and Triggers within InterSystems SQL.", Using ObjectScript,
"What is a global?", "A global is a sparse, multidimensional database array.", Using ObjectScript,
"What is the effect of the ##; comment on INT code line numbering?", "It does not change INT code line numbering.", Using ObjectScript,
"What characters can be used in an explicit namespace name after the first character?", "letters, numbers, hyphens, or underscores", Using ObjectScript
"Are string equality comparisons case-sensitive?", "Yes" Using ObjectScript,
"What happens when the number of references to an object reaches 0?", "The system automatically destroys the object.",Using ObjectScript
Question: "What operations can take an undefined or defined variable?", Fact: "The READ command, the $INCREMENT function, the $BIT function, and the two-argument form of the $GET function.", Using ObjectScript,  a

The model made a good attempt at the requested format, but there is some deviation: missing or extra commas, unquoted labels, and stray "Question:" and "Fact:" prefixes.

After a manual review I can pick some questions and answers to continue the experiment.
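Part of that review could be automated. The sketch below uses the standard `csv` module to separate lines that parse cleanly into question, fact and label from lines that need a human look. The function name `triage` and the acceptance rule (exactly three fields, line starts with a double quote) are assumptions made for illustration.

```python
import csv

def triage(lines):
    """Split raw output lines into (accepted_rows, lines_needing_review)."""
    accepted, review = [], []
    for line in lines:
        if not line.strip():
            continue
        fields = next(csv.reader([line], skipinitialspace=True))
        fields = [field.strip() for field in fields]
        # A trailing comma yields an empty last field; drop it.
        while fields and fields[-1] == "":
            fields.pop()
        # Accept only question, fact, label on a line that starts with a quote.
        if len(fields) == 3 and line.lstrip().startswith('"'):
            accepted.append(tuple(fields))
        else:
            review.append(line)
    return accepted, review

sample = [
    '"What is a global?", "A global is a sparse, multidimensional database array.", Using ObjectScript,',
    '"Are string equality comparisons case-sensitive?", "Yes" Using ObjectScript,',
]
good, bad = triage(sample)
print(len(good), "accepted,", len(bad), "for review")
```

Accepted rows could then be written back out with consistent quoting, while the review list stays small enough to fix by hand.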

Importing FlashCards into Anki

Reviewed text file:

"What are the contexts in which ObjectScript can be used?", "You can use ObjectScript in any of the following contexts: Interactively from the command line of the Terminal, As the implementation language for methods of InterSystems IRIS object classes, To create ObjectScript routines, and As the implementation language for Stored Procedures and Triggers within InterSystems SQL.", "Using ObjectScript",
"What is a global?", "A global is a sparse, multidimensional database array.", "Using ObjectScript",
"What is the effect of the ##; comment on INT code line numbering?", "It does not change INT code line numbering.", "Using ObjectScript",
"What characters can be used in an explicit namespace name after the first character?", "letters, numbers, hyphens, or underscores", "Using ObjectScript"
"Are string equality comparisons case-sensitive?", "Yes", "Using ObjectScript",
"What happens when the number of references to an object reaches 0?", "The system automatically destroys the object.","Using ObjectScript"
"What operations can take an undefined or defined variable?", "The READ command, the $INCREMENT function, the $BIT function, and the two-argument form of the $GET function.", "Using ObjectScript"

Creating new Anki card deck

Open Anki and select File -> Import

 

Select the reviewed text file

Optionally create a new Card Deck for "Object Script"

A basic card type is fine for this format

 

The import dialog mentioned a "Field 4", so the records should be checked for extra fields.
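A plausible cause is the trailing comma on some lines of the reviewed file, which would produce an empty fourth field. A quick way to check is to count fields per record. The helper name `field_counts` is hypothetical, and the two sample lines are shortened stand-ins for lines of the reviewed file.

```python
import csv
from collections import Counter

def field_counts(lines):
    """Count how many records have each number of comma-separated fields."""
    reader = csv.reader(lines, skipinitialspace=True)
    return Counter(len(row) for row in reader if row)

sample_lines = [
    '"What is a global?", "A global is a sparse, multidimensional database array.", "Using ObjectScript",',
    '"Are string equality comparisons case-sensitive?", "Yes", "Using ObjectScript"',
]
# Maps a field count to the number of records that have that many fields.
print(field_counts(sample_lines))
```

Any record reported with four fields is a candidate for having its trailing comma removed before import.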

Anki import success

Let's Study

Now choose the reinforcement schedule

Happy Learning !!

References

Anki software is available from https://apps.ankiweb.net/
