In one of our recent articles, we gave an overview of techniques that companies can use to bring LLMs (large language models) into business scenarios. Most of these scenarios require the models to know business data and business documents. One promising approach is in-context learning, in which the relevant data is handed to the LLM as part of the query. Since the model is not fine-tuned, there is no cost for adjusting model weights, and the model remains flexible with regard to the context of the queries. In this article, we illustrate the basic approaches to in-context learning, and their drawbacks, using both externally provided and self-hosted LLMs.
In-context learning means that the data the LLM is to be queried about is included in the LLM prompt: entire contracts to find a specific detail in them, long revenue tables to find the weeks with the highest turnover, or applicants' CVs to find the person best suited for an open position - all with nothing more than a precisely worded question in natural language. The approach may seem surprisingly simple: pass all documents considered relevant to the LLM via the prompt, add the question, and wait for the answer. But even if the desired result comes out in the end, you will likely be disappointed by how long you had to wait, or the model provider's bill will put you off continuing down this path. By forwarding all your documents to the LLM, you increase the amount of data to be processed and therefore the time and the expensive computing resources required to fulfill the request. In short, we need a better strategy than passing every document to the model without prior selection.
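In its naive form, this amounts to little more than string concatenation. The following is a minimal sketch of that approach; the file names and the question are invented for illustration:

```python
# Naive in-context approach: stuff every document into the prompt as-is
paths = ["cv_alice.txt", "cv_bob.txt", "cv_carol.txt", "job_ad_data_engineer.txt"]
documents = [open(path, encoding="utf-8").read() for path in paths]

prompt = (
    "\n\n".join(documents)
    + "\n\nWhich of these candidates is best suited for the advertised position?"
)
# The prompt is then sent to the LLM unchanged - slow and costly once documents pile up
```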
In the following, we explore these strategies using an example from the HR domain and the LlamaIndex library: we want to ask the LLM which of a range of potential candidates it considers best suited for an open position. To do so, we provide it in the prompt with a description of the position together with the candidates' CVs as documents. The LlamaIndex library gives us some very useful tools for selecting the appropriate documents and handing them over to the LLM. Document collections are divided into nodes, which in turn are organized in indexes. These indexes can be ordered lists or tree structures of documents, or even an unordered set of documents in a vector space; the latter are called vector indexes. Choosing the right kind of index for each type of document is crucial for consistently retrieving the appropriate documents and for query performance.
Vector indexes are particularly interesting for scenarios with a large number of documents - too many for the LLM to process at once. They are created by assigning each document a vector representation, known as an embedding. These embeddings capture the semantic similarity of the documents to each other and are produced by an embedding model (usually a less powerful model than the one used for the query, for cost or resource reasons). At query time, the documents that are semantically closest to the LLM prompt are retrieved from the index, i.e. the documents whose embeddings have the shortest distance to the embedding of the prompt.
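To make this concrete, here is an illustrative sketch of how an embedding model turns texts into vectors and how their semantic closeness can be measured. OpenAIEmbedding is one of the embedding models LlamaIndex can use; the texts are invented, and import paths vary between library versions:

```python
import numpy as np
from llama_index.embeddings import OpenAIEmbedding

embed_model = OpenAIEmbedding()
doc_emb = embed_model.get_text_embedding("Data engineer, 5 years of Python and SQL ...")
query_emb = embed_model.get_query_embedding("Who fits the data engineer opening best?")

# Cosine similarity: the closer to 1, the more semantically similar the two texts are
similarity = np.dot(doc_emb, query_emb) / (
    np.linalg.norm(doc_emb) * np.linalg.norm(query_emb)
)
```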
In our HR use case with the LlamaIndex library, a first attempt could be to combine all documents, i.e. all job offers and all applicant profiles, in a single vector index and let the library select the right documents for us. First, the two sets of documents are loaded separately with LlamaIndex's SimpleDirectoryReader:
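A minimal sketch of this step; the directory paths are placeholders for wherever the job offers and candidate profiles are stored:

```python
from llama_index import SimpleDirectoryReader

# Load the two document sets from separate directories (paths are placeholders)
job_offer_docs = SimpleDirectoryReader("./data/job_offers").load_data()
candidate_docs = SimpleDirectoryReader("./data/candidate_profiles").load_data()
```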
Then we create a vector index from these documents:
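Again as a sketch; VectorStoreIndex is the vector index class in recent llama_index versions, older releases call it GPTVectorStoreIndex:

```python
from llama_index import VectorStoreIndex

# One shared vector index over job offers and candidate profiles alike
index = VectorStoreIndex.from_documents(job_offer_docs + candidate_docs)
```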
To query the LLM, we simply create a QueryEngine from the index object:
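A sketch of the query step; the question itself is only an example:

```python
# as_query_engine() wires the index retriever and the LLM together behind the scenes
query_engine = index.as_query_engine()

response = query_engine.query(
    "Which of the candidates is best suited for the advertised position?"
)
print(response)
```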
In theory, this could well work, but in our example it turned out that repeating the same query returned seemingly random results.
On closer inspection, it turned out that only two documents in total had been forwarded to the LLM, so the LLM was unable to make a statement about all potential applicants.
So we need to get LlamaIndex to select documents in the way this particular scenario requires. For the request above, this means we must hand over all candidate profiles to the LLM in addition to the relevant job advertisement, since we want a statement about all candidates.
To do this, we've implemented a custom document retriever that depends on two document indexes: a list index of all candidate profiles and a vector index of all potential job offers.
In this class, the method _retrieve is implemented in such a way that, for each query, all candidate profiles are returned together with the job offer that is semantically closest to the query. The method _get_embeddings fetches the embeddings of the query and of the job offer documents so that the job offers can be compared semantically with the query.
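A simplified sketch of such a retriever could look as follows. The class and variable names are ours, the exact base class and import paths vary between llama_index versions, and the similarity computation is a plain cosine similarity rather than the library's internal retrieval logic:

```python
from typing import List

import numpy as np
from llama_index import QueryBundle
from llama_index.retrievers import BaseRetriever
from llama_index.schema import NodeWithScore


class CandidateJobRetriever(BaseRetriever):
    """Returns all candidate profiles plus the job offer closest to the query."""

    def __init__(self, candidate_list_index, job_offer_vector_index):
        self._candidate_index = candidate_list_index
        self._job_index = job_offer_vector_index

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        # Every candidate profile is always handed over to the LLM
        candidate_nodes = [
            NodeWithScore(node=node, score=1.0)
            for node in self._candidate_index.docstore.docs.values()
        ]

        # Embed the query and the job offers, then keep only the closest job offer
        job_nodes = list(self._job_index.docstore.docs.values())
        query_emb, job_embs = self._get_embeddings(query_bundle, job_nodes)
        similarities = [
            float(np.dot(query_emb, emb))
            / (np.linalg.norm(query_emb) * np.linalg.norm(emb))
            for emb in job_embs
        ]
        best = int(np.argmax(similarities))
        return candidate_nodes + [
            NodeWithScore(node=job_nodes[best], score=similarities[best])
        ]

    def _get_embeddings(self, query_bundle, job_nodes):
        # Reuse the embedding model attached to the job offer vector index
        embed_model = self._job_index.service_context.embed_model
        query_emb = embed_model.get_query_embedding(query_bundle.query_str)
        job_embs = [
            embed_model.get_text_embedding(node.get_content()) for node in job_nodes
        ]
        return query_emb, job_embs
```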
A simple keyword-matching approach could also have been sufficient to find the job offer in question.
A query can now be performed with the custom document retriever as follows:
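For example, again as a sketch: ListIndex is the list index class in the llama_index versions we refer to (later renamed SummaryIndex), and the question is illustrative:

```python
from llama_index import ListIndex, VectorStoreIndex
from llama_index.query_engine import RetrieverQueryEngine

# The two indexes the custom retriever relies on
candidate_list_index = ListIndex.from_documents(candidate_docs)
job_offer_vector_index = VectorStoreIndex.from_documents(job_offer_docs)

retriever = CandidateJobRetriever(candidate_list_index, job_offer_vector_index)
query_engine = RetrieverQueryEngine.from_args(retriever=retriever)

response = query_engine.query(
    "Which of the candidates is best suited for the advertised position?"
)
print(response)
```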
All candidate profiles and the relevant job offer are now reliably forwarded to the LLM, and the LLM returns consistent assessments for each candidate. More detailed queries about candidates' specific abilities are now also possible.
When LlamaIndex is used as described above, the library uses OpenAI's LLMs to generate the embeddings and to answer the requests. In business scenarios involving sensitive data, this is often undesirable, as the data is subject to data protection or confidentiality policies. With the advent of ever more powerful open-source models under business-friendly licenses, such as GPT4All or MPT-7B, running an LLM query with in-context data transfer in a controlled and secure environment - for example in a private cloud network or in the company's own data center - is just a stone's throw away.
As a proof of concept, we transferred our HR use case from the standard LlamaIndex setup with OpenAI to a GPU-equipped cloud notebook in which we loaded the MPT-7B model. With this setup, we were also able to query our documents. However, since the model's memory requirements grow sharply with the amount of data, the cloud environment quickly reached its limits once several documents were included in a query. In addition, querying in real time, i.e. without waiting a long time for answers, requires a very powerful GPU.
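Roughly, swapping in a self-hosted model looks like the following sketch. The model name, context window and other parameters are assumptions, and the import paths differ between llama_index versions:

```python
import torch
from llama_index import ServiceContext, set_global_service_context
from llama_index.llms import HuggingFaceLLM

# Load MPT-7B-Instruct locally via Hugging Face (requires a GPU with enough memory)
llm = HuggingFaceLLM(
    model_name="mosaicml/mpt-7b-instruct",
    tokenizer_name="mosaicml/mpt-7b-instruct",
    context_window=2048,
    max_new_tokens=256,
    device_map="auto",
    model_kwargs={"torch_dtype": torch.float16, "trust_remote_code": True},
)

# Use a local embedding model as well, so no document data leaves the environment
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")
set_global_service_context(service_context)
```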
While this shows that it is possible to run data-driven LLM applications on your own hardware or within your own cloud network, it is clear that a well-thought-out approach to selecting documents is required, as computing resources of the scale required by today's LLMs are scarce and expensive.
In this article, we examined how in-context learning can make LLMs ready for use in companies. With this technique, documents are handed over to the LLM via its prompt: depending on the use case and the semantics of the request, input data is selected from a document index. It is important to carefully balance the amount of data needed to answer the request against the computing resources required to query the LLM with that data.
In an HR use case, we showed how a custom LlamaIndex document retriever can implement such case-specific document selection. We also looked at self-hosted models and found that they are an increasingly available and realistic alternative to external LLM service providers, whose use requires sensitive business data to leave the company.
The recent rapid development of LLMs and the advent of document integration libraries strongly suggest that companies will soon be able to integrate LLM-based data queries into a wide range of business processes.