An Intro to LLMs and Observability

Lately I’ve been thinking about how I’d like to further improve my skills as a technical writer. One area I’ve identified for growth is my technical knowledge. You don’t need a developer-level understanding of code to be a technical writer, but more knowledge never hurts. So I’ve been looking for opportunities to broaden my technical understanding, and one such opportunity arose last week. It’s the focus of today’s post.

I attended a tech talk entitled “Intro to LLM Observability”. Before going, I had no idea what an LLM was, let alone what observability meant in relation to one. So before moving on, I’m going to define a few terms so we’re all on the same page. Some of these I looked up before attending; others I learned at the talk.

  • Large Language Model: LLM stands for “Large Language Model”. Large language models are deep learning models trained on huge amounts of data. Sound familiar? What I’m describing is what most of us think of as AI. ChatGPT, Bard, and Copilot are all examples of LLMs.
  • Observability: Observability is the practice of collecting and measuring data about a system’s outputs so you can assess their quality.
  • Retrieval-Augmented Generation (RAG): RAG is a bit complicated, so bear with me here. RAG lets you supply what is known as “external data”, i.e. data that was not included in an LLM’s training process. For example, say you run an e-commerce site that uses a chatbot (an LLM) to assist with customer service, and you have documentation outlining your return policy. You can upload that documentation (the external data) to the server hosting your RAG system. Once the external data is uploaded, the RAG system can provide what is known as “context” for the LLM; in other words, the LLM uses the external data as a resource to pull its answers from. When a user asks your chatbot “How do I return my item?”, the query first goes through the RAG system for a relevancy search. This initial stop gathers all of the relevant information (such as your return policy documents) and augments your query with it before sending it on for the LLM to parse and answer.
  • Hallucinations: A hallucination is when an LLM returns incorrect information. Hallucinations often occur when an LLM has insufficient training data or makes incorrect assumptions.
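The retrieve-then-augment flow in the RAG definition above can be sketched in a few lines of Python. Everything here is a toy: the documents, the function names, and especially the keyword-overlap "retriever" are my own illustrative stand-ins (real RAG systems rank documents with vector embeddings, not word matching).

```python
# Illustrative sketch of the RAG flow: retrieve relevant external data,
# then augment the user's query with it before it reaches the LLM.
# All names and documents here are hypothetical.

DOCUMENTS = {
    "return-policy.md": "You can return items within 30 days of delivery. "
                        "Refunds are processed within 5 business days.",
    "shipping.md": "Standard shipping takes between 3 and 5 business days.",
}

def retrieve(query: str, top_k: int = 1) -> list[tuple[str, str]]:
    """Rank documents by naive keyword overlap with the query."""
    words = set(query.lower().split())
    scored = sorted(
        DOCUMENTS.items(),
        key=lambda kv: len(words & set(kv[1].lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def augment(query: str) -> str:
    """Prepend the retrieved context so the LLM answers from it."""
    context = "\n".join(text for _, text in retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

prompt = augment("How do I return my item?")
print(prompt)
```

The augmented prompt, not the raw question, is what gets sent to the LLM, which is why the model can answer from documents it was never trained on.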

With those definitions out of the way, we can move on to the meat and potatoes of the talk!

Understandably, there is a lot of buzz around LLMs. As there should be! The technology offers endless opportunities. But with endless opportunities come some pain points, and this talk dove into a potential solution for one of them.

Lots of us ask LLMs questions, and those LLMs give us answers. But how often are those answers correct? Or even helpful? That’s where observability comes in. In this talk, Matt Vincent, the founder of Source Allies, spoke about a tool called Phoenix.

Phoenix uses a series of evaluation files (referred to as evals) that contain a whole host of criteria. When you make a query to the LLM, Phoenix uses these evals to compare the LLM’s response against the reference text and determine whether the information the LLM returned matches it. The data is then exported into graphs and charts, letting you quickly pinpoint the areas of your knowledge base where the LLM is hallucinating and where it is answering accurately.
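To make the idea of an eval concrete, here is a deliberately crude sketch of the comparison step: label a response “factual” only if the reference text supports it. This is my own toy illustration, not Phoenix’s actual API; in practice, tools like Phoenix typically use an LLM as the judge rather than a substring check.

```python
# Toy eval: compare an LLM response against reference text and emit a
# label. Real evals use an LLM judge, not this naive sentence check.

def evaluate_response(response: str, reference: str) -> str:
    """Return 'factual' if every sentence of the response appears in the
    reference text, otherwise 'hallucinated'."""
    sentences = [s.strip() for s in response.split(".") if s.strip()]
    for sentence in sentences:
        if sentence.lower() not in reference.lower():
            return "hallucinated"
    return "factual"

reference = "Refunds are processed within 5 business days of receiving the item."
print(evaluate_response("Refunds are processed within 5 business days of receiving the item", reference))  # factual
print(evaluate_response("Refunds are instant", reference))  # hallucinated
```

The labels coming out of a step like this are what get aggregated into the graphs and charts mentioned above.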

One feature of Phoenix that I found particularly helpful is its ability to show the exact source a result was pulled from. For example, say the LLM returns a hallucination in response to the question “When will my return be processed?”. Phoenix’s data shows you that the reference text for that query came directly from your company’s “Processing Your Return” article. Armed with this information, your company’s technical writers can make that article clearer and more concise, and identify any gaps in its information.
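Once each response is labeled and tied back to its source document, pointing writers at the article that needs work is just an aggregation. Here is a minimal sketch of that last step; the trace records and field names are hypothetical, not Phoenix’s data format.

```python
# Hypothetical trace records: each query, the source article its context
# came from, and the eval label. Counting hallucinations per source
# surfaces the article that needs a writer's attention first.
from collections import Counter

traces = [
    {"query": "When will my return be processed?",
     "source": "Processing Your Return", "label": "hallucinated"},
    {"query": "How do I start a return?",
     "source": "Processing Your Return", "label": "factual"},
    {"query": "When will my return be processed?",
     "source": "Processing Your Return", "label": "hallucinated"},
]

hallucinations = Counter(
    t["source"] for t in traces if t["label"] == "hallucinated"
)
print(hallucinations.most_common(1))  # the article to revise first
```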

Overall, I found the concept of Phoenix super interesting. I may dive deeper into the tool and into LLM observability, as I can see this area of tech growing quickly as more and more companies adopt LLMs.

As always, thanks for reading!