
The Importance of Parameters When Using Large Language Models

    [Image: Parameters in LLMs, a control console operated by a worker (Alexander Thamm GmbH 2024, GAI)]

    Large Language Models (LLMs) are driving business growth by providing valuable services such as answering questions, drafting emails, and generating code. These models power generative AI applications and are valued worldwide for their human-like text-generation abilities. Their extensive training on vast amounts of text from many domains, during which they learn to identify patterns in language, makes them powerful tools for businesses. However, a crucial component of LLMs is their parameters, which shape their capabilities but are often misunderstood. There are various misconceptions about the role of parameters, their different kinds, and the influence of parameter count on performance. This blog post addresses these issues.

    What are parameters in Large Language Models?

    Parameters are settings that can be adjusted to control an LLM's text-generation behaviour. The term covers two related things: the internal weights the model learns during training, and the inference-time settings, such as temperature or top-p, that users adjust to steer generation. Both influence the diversity, creativity, and quality of the generated text, because both shape how the model predicts the next token in a sequence. A token is simply a unit of text, such as a word, part of a word, or a punctuation mark, formatted for efficient and effective use by the LLM.
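
    To see what tokens look like in practice, the short sketch below splits a sentence into token IDs and back. It assumes the open-source tiktoken library and its cl100k_base encoding; both are illustrative choices, and other tokenisers behave similarly.

        import tiktoken

        # Load a BPE tokeniser; cl100k_base is the encoding used by several OpenAI models.
        enc = tiktoken.get_encoding("cl100k_base")

        text = "Parameters control an LLM's text generation."
        token_ids = enc.encode(text)

        print(token_ids)                                 # a list of integer token IDs
        print([enc.decode([tid]) for tid in token_ids])  # the text piece behind each ID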

    Here is a simplified explanation of the training process: the parameters are set to initial values, either carried over from previous training or chosen at random. The model is then fed large amounts of textual data. During training, it takes an input and predicts the output, then compares the predicted output to the actual text to check the accuracy of its prediction. The model iteratively learns from its mistakes and adjusts its parameters accordingly; with each adjustment, its predictions become more accurate.

    Therefore, the LLM becomes more accurate and sophisticated in its language abilities through an iterative process of prediction, error checking, and parameter adjustment.
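
    As a minimal illustration of that loop, the sketch below fits a single parameter with gradient descent. It is a toy example with invented numbers, not how an LLM is implemented, but real models repeat the same predict, check, adjust cycle across billions of parameters.

        # Toy predict / check error / adjust-parameter loop.
        # A single parameter w is trained so that prediction = w * x matches the target.
        data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, expected output) pairs
        w = 0.5                                      # initial value, chosen arbitrarily
        learning_rate = 0.05

        for epoch in range(100):
            for x, target in data:
                prediction = w * x                   # predict the output
                error = prediction - target         # compare with the actual value
                w -= learning_rate * error * x      # adjust the parameter to reduce the error

        print(f"learned parameter: {w:.3f}")         # converges towards 2.0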

    Different types of Parameters in LLMs

    The following is a list of different types of LLM parameters and their benefits, with examples that show the impact of different parameter settings and values on the output; a code sketch after the list shows how these settings are passed to a typical API. Please note that, in the end, it's the LLM application and your business goals that determine which parameter values you should choose.

    1. Temperature: This parameter controls the randomness introduced into the text-generation process, and with it the quality, diversity, and creativity of the output. A high temperature results in diverse and unpredictable responses, as the model also generates tokens that are less likely to occur; a low temperature results in coherent and consistent responses, as the model sticks to the most likely tokens. For instance, given the prompt "The best way to learn coding is," a high temperature of 1.0 might produce a playful, unpredictable response such as "The best way to learn coding is to go back in time and meet the programming language inventors," while a low temperature of 0.1 produces a coherent and predictable response such as "The best way to learn coding is to practise a lot and follow online tutorials." Temperature affects the style of the output, and extreme values should be avoided, as they can lead to nonsensical text.

    2. Number of tokens: This parameter caps the length of the generated text. A high token limit allows lengthy outputs, while a low limit produces concise ones; the right value depends on the purpose and preferences of the application. For instance, given the prompt "What is an LLM," a higher limit of, say, 100 tokens allows an output such as "It is a model that is trained on massive amounts of data. It interprets human language and produces a response when given a prompt. LLMs can generate poems, articles, reports, and other types of responses. Some of the common LLMs include ChatGPT, Bard, and Llama." A low limit of, say, 10 tokens forces a short output such as "It is a model that generates human-like text." The limit should be neither too low, which can cut an answer off mid-sentence, nor too high, which can invite rambling and redundant text.

    3. Top-p: This parameter, also known as nucleus sampling, restricts the candidates for the next token to the smallest set of tokens whose cumulative probability reaches the value p. It impacts the diversity, creativity, and accuracy of the output: a high top-p value lets more unusual tokens through and yields diverse, creative responses, while a low top-p value keeps only the most probable tokens and yields accurate, reliable responses. For instance, given the prompt "The most important skill for an accountant is," a high top-p value of 0.9 might produce a creative response such as "The most important skill for an accountant is telepathy," whereas a low top-p value of 0.1 produces a response such as "The most important skill for an accountant is problem-solving." When picking the top-p value, it's important to find a balance and avoid extremes, as they can degrade the quality of the output.

    4. Presence penalty: This parameter controls how strongly the model is discouraged from reusing tokens that have already appeared in the generated text. It is a one-off penalty: a token is penalised for having appeared at all, regardless of how often. A high presence penalty encourages the model to explore varied topics and avoid repetition; a low presence penalty permits repetition and minimal exploration. For instance, given the prompt "The best sport is," a high presence penalty of 1.0 can produce a varied output such as "The best sport is football, cricket, chess," while a low presence penalty of 0.0 can produce a response such as "The best sport is football, football, football."

    5. Frequency penalty: This parameter scales with the number of times a token has already appeared in the text, including in the prompt: the more often a token has occurred, the higher its penalty and the lower its probability of occurring again. It impacts the novelty, variation, and repetition of the text. For instance, given the prompt "The best sport is football," a high frequency penalty yields an output with little repetition, such as "The best sport is football; it is fun and exciting," while a low frequency penalty of 0.0 permits a repetitive output such as "The best sport is football, football is fun, football is full of thrill." As with the other parameters, it is important to find the right balance, since extreme penalty values can produce incoherent and nonsensical output.
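
    To make these settings concrete, here is a minimal sketch of how they are typically passed to a text-generation API. It assumes the OpenAI Python SDK; the model name is purely illustrative, and other providers expose the same knobs under similar names.

        from openai import OpenAI

        client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

        response = client.chat.completions.create(
            model="gpt-4o-mini",    # illustrative model name; any chat model works
            messages=[{"role": "user", "content": "The best way to learn coding is"}],
            temperature=0.1,        # low value: coherent, predictable wording
            max_tokens=100,         # caps the length of the generated text
            top_p=0.9,              # nucleus sampling over the top 90% of probability mass
            presence_penalty=1.0,   # one-off penalty for any token that has already appeared
            frequency_penalty=0.5,  # penalty that grows with each repetition of a token
        )

        print(response.choices[0].message.content)

    In practice, temperature and top-p are usually not pushed to extremes at the same time; adjusting one while leaving the other near its default is a common starting point.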

    Influence of Number of Parameters on Performance

    A common question among data scientists working with LLMs concerns the number of parameters a model has. In the next section, we list prominent models and their parameter counts; in this section, we focus on the impact of parameter count on LLM performance. We will conclude by discussing different areas of LLM application and their changing requirements.

    A common misconception about parameters is that more is always merrier. More parameters do give a model more adjustable settings with which to capture the complexities of human language, but they do not guarantee better results. A related misconception is that models with fewer parameters necessarily compromise the effectiveness and accuracy of the results. The bottom line of this debate is that model size alone does not determine success: data quality, computational resources, and the application's specific requirements do. An LLM's capabilities are defined by a combination of factors, not by parameter count alone. For instance, given a word whose meaning depends on context, an LLM trained on high-quality data will be able to decipher such semantic distinctions, while a model of the same size trained on low-quality data will not. Indeed, a small language model trained on high-quality data can outperform a much larger model trained on low-quality data.

    Models with more parameters are expensive to run, and the resources required to train them, including high memory and long processing times, are not accessible to everyone, which limits efficiency and accessibility. It is therefore more important to make the most of the resources at hand, which often means fine-tuning the parameters. Optimising the parameters for the task in focus brings more value to the company. The previous section described the different kinds of LLM parameters and the impact different values have on the output; fine-tuning them aligns AI responses with users' expectations. For example, an AI chatbot may require parameter settings that produce natural conversations, while a content-generation tool requires settings that ensure well-structured articles.

    It's important to note that as LLMs evolve, so do their requirements. For instance, when ChatGPT became publicly available, conversations about its widespread usage and business applications dominated. Once businesses started using it for various applications and users experienced its impact, data privacy issues became more important, raising conversations around the ethical use of LLMs. At the same time, companies are asking their teams to train efficient models that address customers' needs and to create models trained for specific applications. Initially, companies focused on training LLMs on huge amounts of data; after realising the high cost involved, and the efficiency gains possible with smaller models, many are now embracing mini-LLMs.

    Parameter overview of LLMs and SLMs

    The names of LLMs often consist of an acronym followed by a number, for instance Vicuna-13B or Llama-7B. The number after the hyphen denotes the number of parameters the model contains, with "B" standing for billions and "M" for millions. This section provides a tabular overview of prominent LLMs and mini-LLMs and their parameter counts. Please note that the parameter count can vary with the specific version and configuration of a model, so the values below are approximate.

    Model                Number of Parameters
    GPT-4                1.76 trillion
    Gemini               1.5 trillion
    Bloom                176 billion
    Llama 2              7B, 13B, or 70B
    BloombergGPT         50B
    Dolly 2.0            12B
    FLAN-T5              80M to 11B
    GPT-Neo              2.7B
    DeciCoder-1B         1B
    Phi-1.5              1.5B
    Dolly-v2-3b          3B
    StableLM-Zephyr-3B   3B
    DeciLM-7B            7B
    Amber                7B

    Conclusion

    Parameters are crucial components for the efficient working of an LLM. There are different kinds, such as temperature, number of tokens, top-p, presence penalty, and frequency penalty, and each contributes in its own way to the generated output. Parameter values should be chosen based on the business application and purpose, and extremely high or low values should be avoided in any case. As with any technology, an LLM relies on a combination of factors for efficient performance, and more parameters do not guarantee better performance.
