In-context learning techniques

Techniques and challenges of in-context learning

In one of our recent articles, we gave an overview of various techniques with which companies can use LLMs (large language models) in business scenarios. Most of these scenarios require the models to know business data and business documents. One promising approach is in-context learning, in which the relevant data is handed over to the LLM as part of the query. Since the model is not fine-tuned, there is no cost for adjusting the model weights, and the model remains flexible with regard to the context of the inquiries. In this article, we illustrate the basic approaches to in-context learning, and their drawbacks, using both externally provided and self-hosted LLMs.

The curse and blessing of document retrieval

In-context learning means that the data the LLM is to be queried about is included in the LLM prompt: for example, entire contracts to find a specific detail in them, long revenue tables to find the weeks with the highest turnover, or CVs of applicants to find the person best suited for an open position - all with just a precise formulation of the question in human language. This approach may seem surprisingly simple: simply pass all documents that are considered relevant to the LLM via the prompt, add the question, and wait for the answer. But even if the desired result comes out in the end, the waiting time and the model provider's bill will likely put you off continuing this approach. By forwarding your documents to the LLM, you increase the amount of data to be processed and therefore the time and the expensive computing resources required to fulfill the request. In short, we need a better strategy than passing all documents to the model without prior selection.
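
To make this concrete, here is a minimal sketch of the naive variant, in which every document is pasted into a single prompt. The directory path, the example question, and the send_to_llm() helper are hypothetical placeholders for whatever LLM client is used:

from pathlib import Path

# naive in-context learning: concatenate all documents and append the question
documents = [p.read_text() for p in Path('data/contracts').glob('*.txt')]
prompt = '\n\n'.join(documents) + '\n\nQuestion: Which contract has the shortest notice period?'

# the whole prompt (documents + question) is sent in a single request;
# its size drives both the latency and the cost of the call
# answer = send_to_llm(prompt)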

Selection of relevant documents

In the following, we will explore these strategies using an example from the HR sector and the LlamaIndex library: we want to ask the LLM who, out of a range of potential candidates, it thinks is best suited for an open position. Via the prompt, we provide it with a description of the position together with the candidates' CVs as documents. The LlamaIndex library provides us with some very useful tools for selecting the appropriate documents and handing them over to the LLM. Document collections are divided into nodes, which in turn are organized in indexes. These indexes can be ordered lists or tree structures of documents, or an unordered set of documents in a vector space; the latter are called vector indices. Choosing the right index for each individual type of document is crucial for consistent retrieval of the appropriate documents and for query performance.
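
As a rough sketch, the three index types mentioned above could be created as follows with the LlamaIndex API version used at the time of writing; the directory is the one used later in the article:

from llama_index import GPTListIndex, GPTTreeIndex, GPTVectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader('data/employee_profiles').load_data()

list_index = GPTListIndex.from_documents(documents)            # ordered list of nodes
tree_index = GPTTreeIndex.from_documents(documents)            # hierarchical tree of summaries
vector_index = GPTVectorStoreIndex.from_documents(documents)   # nodes embedded in a vector space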

Vector indices

Vector indices are particularly interesting for scenarios with a large number of documents - too many for the LLM to process at the same time. They are created by assigning each document a vector representation, known as an “embedding”. These embeddings represent the semantic similarity of the documents to each other and are created using an embedding LLM (usually a less powerful model than the one used for the query, for cost or resource reasons). The documents that are semantically closest to the LLM prompt are then retrieved from the index, i.e. the documents with the shortest distance between their embeddings and the embedding of the LLM prompt.

The business documents are presented as embeddings in a document index. When the documents are to be queried, the index is searched for the appropriate documents. These are then handed over to the LLM together with the request.
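
The retrieval step itself boils down to a nearest-neighbour search over embeddings. A minimal sketch of the idea, assuming cosine similarity as the distance measure and a placeholder embed() function for the embedding model:

import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k_documents(query_embedding, document_embeddings, k=3):
    # rank documents by the similarity of their embeddings to the query embedding
    scores = [cosine_similarity(query_embedding, e) for e in document_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# the k best-matching documents are then passed to the LLM together with the query
# best_docs = top_k_documents(embed(prompt), [embed(d) for d in documents], k=3)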

A naive approach to indexing

In our HR use case with the LlamaIndex library, a first attempt could be to combine all documents, i.e. all job offers and all applicant profiles, in a single vector index and let the library select the right documents for us. First, the two sets of documents are loaded with LlamaIndex's SimpleDirectoryReader:


from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader
profile_documents = SimpleDirectoryReader('data/employee_profiles').load_data()
job_offer_documents = SimpleDirectoryReader('data/job_offers').load_data()

Then we create a single vector index from these documents:


combined_index = GPTVectorStoreIndex([])
for documents in [profile_documents, job_offer_documents]:
    for document in documents:
        combined_index.insert(document)

To query the LLM, we simply create a QueryEngine from the index object:


query_engine = combined_index.as_query_engine()
query_string = '''Give me a score to what extent each of the candidates is suited for the
'freelance project manager' project?'''
query_engine.query(query_string)

In theory, this could well work, but in our example, it turned out that repeating the query returned random results:


First run:
    Candidate D: 8/10
    Candidate I: 6/10
    Candidate X: 5/10
    Candidate M: 7/10

Second run:
    Candidate I: 8/10
    Candidate X: 6/10
    Candidate D: 5/10
    Candidate M: 7/10

On closer inspection, it turned out that only two documents in total were forwarded to the LLM, so the LLM was unable to make a statement about all potential applicants.
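
This can be verified by inspecting the source nodes attached to the response. The sketch below also shows how the number of retrieved nodes could be raised via similarity_top_k, although that alone would not guarantee that every profile is always included:

response = query_engine.query(query_string)
print(len(response.source_nodes))  # in our case: 2

# retrieve more nodes per query; the default top-k for vector indices is small
query_engine = combined_index.as_query_engine(similarity_top_k=10)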

Scenario-specific search techniques

So we need to get LlamaIndex to select the documents that this particular scenario requires. For the above-mentioned request, this means that we must hand over all candidate profiles to the LLM in addition to the relevant job advertisement, since we want to make a statement about every candidate.

To do this, we've implemented a custom document retriever that depends on two document indexes: a list index of all candidate profiles and a vector index of all potential job offers.
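
These two indexes could be built from the document directories introduced above, for example as follows (a sketch using the from_documents constructors); the retriever class itself follows right after:

from llama_index import GPTListIndex, GPTVectorStoreIndex, SimpleDirectoryReader

profile_documents = SimpleDirectoryReader('data/employee_profiles').load_data()
job_offer_documents = SimpleDirectoryReader('data/job_offers').load_data()

# list index: every profile node is always returned during retrieval
profile_index = GPTListIndex.from_documents(profile_documents)
# vector index: only the semantically matching job offer is returned
job_offer_index = GPTVectorStoreIndex.from_documents(job_offer_documents)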


from llama_index.indices.base_retriever import BaseRetriever

class ProfileQueryRetriever(BaseRetriever):
    def __init__(self, profile_index: GPTListIndex, job_offer_index: GPTVectorStoreIndex) -> None:
        super().__init__()
        self._profile_index = profile_index
        self._job_offer_index = job_offer_index

In this class, we implement the method _retrieve in such a way that, for each query, all candidate profiles are returned together with the job offer that is semantically closest to the query. The method _get_embeddings fetches the embeddings of the query and of the job offer documents so that the job offers can be compared semantically with the query.


from typing import List, Tuple

from llama_index.data_structs import Node, NodeWithScore
from llama_index.indices.query.schema import QueryBundle
from llama_index.indices.query.embedding_utils import get_top_k_embeddings

    # methods of ProfileQueryRetriever, continued from above
    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        """Collect all nodes from the profile index and
        the closest node from the job offer index."""
        # include all profile nodes
        nodes_with_scores = [
            NodeWithScore(node=node, score=1)
            for node in self._profile_index.docstore.get_nodes(
                self._profile_index.index_struct.nodes
            )
        ]

        # include the job offer node whose semantic embedding is closest
        # to the embedding of the query
        job_offer_nodes = self._job_offer_index.docstore.get_nodes(
            list(self._job_offer_index.index_struct.nodes_dict.values())
        )
        query_embedding, job_offer_node_embeddings = self._get_embeddings(
            query_bundle, job_offer_nodes
        )
        top_similarities, top_idxs = get_top_k_embeddings(
            query_embedding,
            job_offer_node_embeddings,
            similarity_top_k=1,
            embedding_ids=list(range(len(job_offer_nodes))),
        )
        nodes_with_scores += [
            NodeWithScore(node=node, score=score)
            for node, score in zip(
                [job_offer_nodes[idx] for idx in top_idxs], top_similarities
            )
        ]
        return nodes_with_scores

    def _get_embeddings(
        self, query_bundle: QueryBundle, nodes: List[Node]
    ) -> Tuple[List[float], List[List[float]]]:
        """Get embeddings of the query and a list of nodes."""
        if query_bundle.embedding is None:
            query_bundle.embedding = self._job_offer_index.service_context.embed_model.get_agg_embedding_from_queries(
                query_bundle.embedding_strs
            )

        node_embeddings: List[List[float]] = []
        for node in nodes:
            if node.embedding is None:
                node.embedding = self._job_offer_index.service_context.embed_model.get_text_embedding(
                    node.get_text()
                )
            node_embeddings.append(node.embedding)

        return query_bundle.embedding, node_embeddings

A simple keyword-matching approach could also have been sufficient for finding the job offer in question.
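
Such a keyword-based selection could look roughly like this (a sketch, not part of our implementation; it simply counts how many query terms occur in each job offer node):

def best_matching_job_offer(query, job_offer_nodes):
    # pick the job offer node that contains the most terms from the query
    query_terms = set(query.lower().split())
    def keyword_score(node):
        text = node.get_text().lower()
        return sum(term in text for term in query_terms)
    return max(job_offer_nodes, key=keyword_score)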

A query can now be performed with the custom document retriever in the following way:


from llama_index.query_engine import RetrieverQueryEngine  # import path in the LlamaIndex version used here

retriever = ProfileQueryRetriever(profile_index, job_offer_index)
retriever_engine = RetrieverQueryEngine(retriever=retriever)
query='''
Give me a score to what extent each of the candidates is suited for the 
'freelance project manager' project?
'''
response = retriever_engine.query(query)

All candidate profiles and the relevant job offer are now reliably forwarded to the LLM, and the scores for each candidate are consistent across runs. More detailed queries about candidates' specific abilities are now also possible.
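
For example, a follow-up query along these lines (the wording is only illustrative) is now answered on the basis of all profiles plus the matching job offer:

response = retriever_engine.query(
    'Which of the candidates have hands-on experience with agile project management?'
)
print(response)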

Outlook on self-hosted models

When LlamaIndex is used as described above, the library uses OpenAI's LLMs to generate the embeddings and answer the requests. In business scenarios involving sensitive data, this is often undesirable, as the data is subject to data protection or confidentiality policies. With the advent of ever more powerful open-source models under business-friendly licenses, such as GPT4All or MPT-7B, an LLM query with in-context data transfer in a controlled and secure environment, such as a private cloud network or the company's own data center, is just a stone's throw away.

As a proof of concept, we transferred our HR use case from the standard LlamaIndex setup with OpenAI to a GPU-equipped cloud notebook in which we loaded the MPT-7B model. With this setup, we were also able to query our documents. However, since the model's memory requirements increase sharply with the amount of data, the cloud environment quickly reached its limits when multiple documents were included in a query. In addition, querying in real time, i.e. without long waits for answers, requires a very powerful GPU.
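
As a rough sketch, the switch from the default OpenAI backend to a locally loaded model could look as follows. The exact class names depend on the LlamaIndex and LangChain versions in use, and the model variant mosaicml/mpt-7b-instruct is an assumption about the exact checkpoint:

from llama_index import GPTVectorStoreIndex, LLMPredictor, LangchainEmbedding, ServiceContext, SimpleDirectoryReader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import HuggingFacePipeline

# local generation model (MPT-7B variant assumed) and local embedding model
local_llm = HuggingFacePipeline.from_model_id(
    model_id='mosaicml/mpt-7b-instruct',
    task='text-generation',
    model_kwargs={'trust_remote_code': True},
)
service_context = ServiceContext.from_defaults(
    llm_predictor=LLMPredictor(llm=local_llm),
    embed_model=LangchainEmbedding(HuggingFaceEmbeddings()),
)

# indexes built with this service context use the local models instead of OpenAI
documents = SimpleDirectoryReader('data/employee_profiles').load_data()
index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)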

While this shows that it is possible to run data-driven LLM applications on your own hardware or within your own cloud network, it is clear that a well-thought-out approach to selecting documents is required, as computing resources of the scale required by today's LLMs are scarce and expensive.

Conclusion

In this article, we examined how LLMs can be made ready for use in companies through in-context learning. With this technique, documents are handed over to the LLM via its prompt: depending on the use case and the semantics of the request, suitable input data is selected from a document index. It is important to carefully balance the amount of data needed to answer the request against the amount of computing resources required to query the LLM with that data.

In an HR use case, we showed how a custom LlamaIndex document retriever can be used to implement such case-based document selection. We also examined self-hosted models and found that they represent an increasingly available and realistic alternative to external LLM service providers, whose use requires sensitive business data to leave the company.

The recent rapid development of LLMs and the advent of document integration libraries strongly suggest that companies will soon be able to integrate LLM-based data queries into a wide range of business processes.

About the author

Max Schattauer

Data Scientist, Data Engineer, Developer
