Create a custom ChatGPT that knows your data.
When it comes to “low-hanging fruit,” I think one of the best ChatGPT use cases for a company is a custom Q&A tool built on its own documents. Done correctly, this largely avoids the problem of asking the chat a question and getting a made-up answer.
How does it work?
To do this, I decided to start with LangChain. LangChain is an exciting library for linking various LLMs and APIs together into a single, seamless question-answering chain. You can, for example, link the abilities of ChatGPT, Wolfram Alpha, and IMDB to beat Jeopardy any day.
I also found an example created by Harrison Chase, the person behind LangChain. This example, called “Notion-qa,” does exactly what I wanted to do, but I wanted to make sure I could train it on a custom dataset.
Notion-qa used a company’s employee handbook as input and allowed anyone to ask HR questions using natural language and get real, actual answers based on the handbook. If you ask a question for which it doesn’t have an answer, you will get “I don’t know” rather than a made-up answer.
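The “I don’t know” behavior usually comes from the instructions sent to the model along with the retrieved handbook passages. Here is a minimal sketch of that idea. It is not the actual Notion-qa prompt: the wording, the `Chunk` class, the `build_prompt` function, and the sample file name are all my own assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    text: str    # a passage taken from a handbook page
    source: str  # the file the passage came from (hypothetical name)


def build_prompt(question: str, chunks: List[Chunk]) -> str:
    """Assemble a prompt that restricts the model to the given passages."""
    context = "\n\n".join(
        f"Content: {c.text}\nSource: {c.source}" for c in chunks
    )
    return (
        "Answer the question using only the passages below.\n"
        "If the passages do not contain the answer, reply \"I don't know.\"\n"
        "End your answer with a 'Sources:' line listing the files you used.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    # Illustrative passage only; not taken from either handbook.
    sample = [Chunk("Employees get 8 national holidays off per year.",
                    "handbook/holidays.md")]
    print(build_prompt("What holidays do we get off each year?", sample))
```

The resulting string would be sent to the LLM. Because the model is told to answer only from the passages and to name its sources, an off-topic or unanswerable question has a sanctioned way out instead of an invented answer.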
To prove that I could use any document, I used a utility to create my own employee handbook for a fictitious company. When I trained the algorithm on the new handbook, I left the original training document for the other company there as well. (Strictly speaking, “training” here means indexing the documents into a vector database; the language model itself is not retrained.) I found the results exciting, so let me share them.
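Adding a second handbook next to the first one comes down to splitting every document into small passages and remembering which file each passage came from. Here is a rough sketch of that step in plain Python. The function names, the 500-character limit, and the sample documents are assumptions for illustration; the real project may split and store text differently.

```python
from typing import Dict, List, Tuple


def split_into_chunks(text: str, max_chars: int = 500) -> List[str]:
    """Greedily pack paragraphs into chunks of roughly max_chars characters.

    A single paragraph longer than max_chars is kept whole rather than cut.
    """
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


def build_index(documents: Dict[str, str], max_chars: int = 500) -> List[Tuple[str, str]]:
    """Return (source, chunk) pairs for every document, in input order."""
    index: List[Tuple[str, str]] = []
    for source, text in documents.items():
        for chunk in split_into_chunks(text, max_chars):
            index.append((source, chunk))
    return index


if __name__ == "__main__":
    # Hypothetical file names and contents standing in for the two handbooks.
    documents = {
        "handbook_a.md": "Holidays are listed here.\n\nParty budget rules go here.",
        "handbook_b.md": "Holidays for the second company.",
    }
    for source, chunk in build_index(documents, max_chars=40):
        print(source, "->", repr(chunk))
```

Keeping the source next to each chunk is what later lets the answer cite which handbook it drew from.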
After training with the new document, I first tried the sample question in the Notion-qa documentation just to test it. The question was “What is the work-from-home policy”. Since I had not specified a work-from-home policy, I wasn’t sure what it would do. The response was:
“I don’t know.”
That is GREAT! The original handbook didn’t have a WFH policy either! So we got the CORRECT answer; the algorithm didn’t make up its own answer! It’s also important to note that it lists its sources so we know it is looking at both handbooks:
Sources: Notion_DB/Stealth AI Employee Handbook fa92d6d35e554a00ba6c6d9d645b12ea.md,Notion_DB/Blendle’s Employee Handbook a834d55573614857a48a9ce9ec4194e3/Office d0ebcaaa2074442ba155c67a41d315dd.md
Note: “Stealth AI” is the name of the company for which I made the handbook.
Then I asked “What holidays do we get off each year” and the answer was even more interesting:
Answer: Employees at Blendle and Stealth AI get 8 national holidays off per year, including New Year’s Day, Memorial Day, Independence Day, Labor Day, Thanksgiving Day, Day After Thanksgiving, Christmas Eve, and Christmas Day. Employees at Blendle also get a floating holiday.
So it knew to treat the two companies separately but intelligently grouped the answer together in a logical, simple-to-understand, and HUMAN way.
Ok, so let’s make sure this is just using the source documents we passed it. Maybe there is some intuition in the work-from-home question that made it decide that it didn’t know. Maybe there was a statement in one of the documents like “We have not defined a WFH policy”? So let’s ask something off-topic that ChatGPT would definitely make up an answer for…
“Plan a trip to Italy”
Here is what ChatGPT generates. (I stopped it before it was done since my wife is a travel agent and knows Italy from top to bottom.)
ChatGPT planning my vacation. It better add Venice next!
So what does our utility do with the same request? I’ll use the Streamlit interface for this.
“There is no information about planning a company trip to Italy.”
Perfect! The utility also shows us where it looked for its information. Notice that it only lists documents we have added to our vector database. These include the employee handbook sections on stress and party budget! These are reasonable places to look!
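Why would an off-topic question still come back with sources? A similarity search returns the closest passages it has, even when none of them is a good match, and it is then up to the model to say that they don’t contain an answer. Here is a toy sketch of that lookup. Word counts stand in for real embeddings and a plain list stands in for the vector database, so treat `embed`, `cosine`, `top_k`, and the sample index as illustrative assumptions, not as what the utility actually runs.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question: str, index: List[Tuple[str, str]], k: int = 2) -> List[Tuple[float, str, str]]:
    """Return the k most similar (score, source, chunk) entries, best first."""
    q = embed(question)
    scored = [(cosine(q, embed(chunk)), source, chunk) for source, chunk in index]
    return sorted(scored, key=lambda item: item[0], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical (source, chunk) pairs; not the real handbook text.
    index = [
        ("stress.md", "Take time off to travel when work gets stressful."),
        ("party_budget.md", "Each team has a budget to plan a party or a trip."),
        ("holidays.md", "Employees get eight national holidays off per year."),
    ]
    for score, source, _ in top_k("Plan a trip to Italy", index):
        print(f"{score:.2f}  {source}")
```

The search can only ever return passages that were indexed, which is why the sources list never points outside the handbooks.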
This shows the power of embedding custom intelligence into the Chat interface. Imagine doing this with various product documents. You could ask questions and get answers and comparisons on the products in a useful human way. Or, you could feed articles or video transcripts into the training process and ask questions that would point out the different answers in the different sources.
I see people writing major critiques of the Chat technology because it can provide misinformation, and that is a valid concern. However, with a little imagination, we can find tremendous uses for the tech today that avoid these issues until, hopefully soon, they are solved altogether. I think we are only starting to realize what can be done with this technology TODAY.
My GitHub: Updated Clone of Harrison’s Work
Original Notion-qa: https://github.com/hwchase17/notion-qa