AI


  • 17 posts

Augmenting LLMs Lenses

Large Language Models (LLMs) are developed to learn the probability distribution that governs the space of natural language. Autoregressive models approximate this distribution by predicting each next word from the preceding context; with a bounded context window, this amounts to a high-order Markov chain. World knowledge (often referred to as parametric knowledge) is stored implicitly within the model's parameters.
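A toy example makes the autoregressive factorization described above concrete. This is not how an LLM is built. The sketch assumes a bigram model, a first-order Markov chain in which each word depends only on the previous one, with probabilities estimated by counting. The names `train_bigram`, `next_word_prob` and `sequence_log_prob`, the `<s>`/`</s>` boundary tokens, and the three-sentence corpus are all hypothetical. An LLM replaces the count table with a neural network conditioned on a much longer context, but the chain-rule scoring follows the same idea.

```python
import math
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count next-word frequencies for each previous word (first-order Markov assumption)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_prob(counts, prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev); 0.0 if prev was never seen."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0


def sequence_log_prob(counts, sentence):
    """Chain rule: log P(w_1..w_T) = sum over t of log P(w_t | w_{t-1})."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    log_p = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        p = next_word_prob(counts, prev, nxt)
        if p == 0.0:
            return float("-inf")  # an unseen transition makes the sequence impossible
        log_p += math.log(p)
    return log_p


corpus = ["the cat sat", "the dog sat", "the cat ran"]
model = train_bigram(corpus)
print(next_word_prob(model, "the", "cat"))
print(math.exp(sequence_log_prob(model, "the cat sat")))
```

Here P(cat | the) is 2/3, and "the cat sat" scores 1 × 2/3 × 1/2 × 1 = 1/3. Any transition never seen in training, such as "dog ran", drives the whole product to zero.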

Privacy Concerns and Potential Attacks in LLMs

Large Language Models (LLMs), exemplified by OpenAI’s GPT-4 and Meta’s LLaMA, continue to impress with capabilities that have surpassed expectations from just a few years ago. Recently, the research community has shifted its focus toward using resources optimally and efficiently. Concepts like the

Parameter-Efficient Fine-Tuning (PEFT), LoRA and Quantization

Transformer-based deep learning models, such as GPT-3 and LLaMA, have achieved state-of-the-art results on many NLP tasks. These models perform exceptionally well and can solve tasks on the fly through in-context learning (ICL), without retraining. This approach helps

Large Models Training

The drive to train ever-larger deep learning models, particularly large language models, keeps growing. A single GPU often lacks the memory needed to hold all the parameters and data, so multiple GPUs are required. Additionally, the time cost of training complex models can be
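A back-of-the-envelope estimate shows why a single GPU runs out of memory. This sketch assumes mixed-precision training with the Adam optimizer, commonly estimated at about 16 bytes of model state per parameter (fp16 weights and gradients, plus fp32 master weights and two fp32 Adam moments). It ignores activations and assumes a hypothetical 80 GB accelerator. The helper names are illustrative only.

```python
import math


def training_memory_gb(num_params, bytes_per_param=16):
    """Rough model-state memory (decimal GB) for mixed-precision Adam training.

    Assumed bytes per parameter, activations excluded:
    2 (fp16 weights) + 2 (fp16 gradients) + 4 (fp32 master weights)
    + 4 + 4 (fp32 Adam first and second moments) = 16.
    """
    return num_params * bytes_per_param / 1e9


def min_gpus(num_params, gpu_memory_gb=80, bytes_per_param=16):
    """Lower bound on GPU count, assuming model states are split perfectly evenly."""
    return math.ceil(training_memory_gb(num_params, bytes_per_param) / gpu_memory_gb)


for n in (7e9, 70e9):
    print(f"{n / 1e9:.0f}B params: {training_memory_gb(n):.0f} GB, >= {min_gpus(n)} GPUs")
```

Under these assumptions, a 7B-parameter model needs about 112 GB for model states alone. That already exceeds one 80 GB GPU before activations are counted. A 70B model needs at least 14 such GPUs.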

Mixture Of Experts (MoE) & LLMs

Scaling up model size considerably increases computational cost during both training and inference. To reap the benefits of parameter scaling without a matching surge in compute, the Mixture of Experts (MoE) approach was applied to large language models. Within
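A toy layer with top-k gating, one common routing scheme, illustrates how an MoE layer adds parameters without a proportional increase in per-token compute. Everything here is a hypothetical sketch. `ToyMoELayer`, its random linear experts, and the choice of 8 experts with `top_k=2` are assumptions, not the design from the post or any particular model.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


class ToyMoELayer:
    """Hypothetical MoE layer: each expert is one d_model x d_model linear map."""

    def __init__(self, d_model, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Router: one weight row per expert, producing one score per expert.
        self.router = [[rng.gauss(0, 1) for _ in range(d_model)]
                       for _ in range(num_experts)]
        self.experts = [[[rng.gauss(0, 1) for _ in range(d_model)]
                         for _ in range(d_model)]
                        for _ in range(num_experts)]
        self.expert_calls = 0  # counts how many expert evaluations were run

    def forward(self, x):
        logits = matvec(self.router, x)
        # Keep only the top-k experts for this token.
        chosen = sorted(range(len(logits)), key=lambda i: logits[i],
                        reverse=True)[:self.top_k]
        # Renormalise the gate weights over the chosen experts only.
        gates = softmax([logits[i] for i in chosen])
        out = [0.0] * len(x)
        for gate, i in zip(gates, chosen):
            y = matvec(self.experts[i], x)  # unchosen experts are never run
            self.expert_calls += 1
            out = [o + gate * yi for o, yi in zip(out, y)]
        return out


layer = ToyMoELayer(d_model=4, num_experts=8, top_k=2)
tokens = [[1.0, 0.5, -0.5, 2.0], [0.0, 1.0, 1.0, -1.0], [2.0, -1.0, 0.5, 0.0]]
outputs = [layer.forward(t) for t in tokens]
print(layer.expert_calls)  # 3 tokens x top-2 experts = 6
```

The layer stores 8 expert matrices, so its parameter count grows with all 8. Each token, however, runs only 2 of them, so per-token expert compute stays fixed at 2 experts no matter how many more are added.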

Evaluation of Large Language Models (LLMs)

Large language models (LLMs) have shown tremendous capabilities, ranging from text summarization and classification to more complex tasks like code generation. However, there is still an urgent need to understand how to evaluate trained models holistically. Traditional benchmarks tend to fall short, as LLMs are capable of handling

Scaling Large Language Models

In recent years, large language models have grown steadily in size. They are trained on ever-increasing amounts of data and show ever-improving performance. However, is this growth merely for the sake of expansion, or is there a deeper rationale

A Quick Trip To Generative Pre-trained Transformers (GPT)

Generative Pre-trained Transformers (GPT), and ChatGPT in particular, have cast a bright spotlight on the field of AI. Companies now recognize AI as a potent tool, not only GPT and its variants but AI in general. However, GPT was not born by accident. When you delve into its story, the