LLMs for Dummies - Walkthrough Guide and Glossary

Written by reneeeshaw | Published 2024/01/26
Tech Story Tags: llm | machine-learning-ai | learning-ai | machine-learning-for-beginners | learn-about-ai | hackernoon-top-story | unsupervised-learning | creating-an-llm

TL;DR: There are words you might not know in AI; here's how I'd explain them to a 5-year-old.

Is this you 👆? You didn’t do CompSci, so now you’re the court jester, rapidly trying to scale knowledge in the fastest-moving industry the world has ever seen?

Take a breath. Here’s a post going back to basics where you can ask “a really dumb question” and not feel judged.

A Small Glossary of LLM Terms for Those Learning

Transformer — More than meets the eye… A type of model used in machine learning, especially for handling sequences of data like text or audio. It’s good at understanding the context in sentences and can be used for translating languages, summarizing text, or generating chatbot responses.

Large Language Model (LLM) — A giant AI model trained on enormous amounts of text. It’s like a huge store of language knowledge that can write articles, answer questions, or create realistic dialogues.

A Transformer is a technique used in AI for processing language. An LLM is a big AI model for language tasks, often built using the Transformer technique.

Interface — The part of a computer system or software that allows users to interact with it. Think of it as the front-end of a program where you type in your question or command, and the program responds.

Inference — In AI, this means using a trained model to make predictions or decisions. For example, after training a model to recognize cats in pictures, inference is when the model looks at a new picture and decides whether there’s a cat in it.🐈‍⬛
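
To make that concrete, here's a minimal sketch of inference with a model somebody else already trained, assuming the Hugging Face transformers library is installed (the sentiment model is just a stand-in for the cat detector):

```python
# Inference = using an already-trained model to make a prediction on new input.
from transformers import pipeline

# Downloads a small pre-trained sentiment model the first time it runs.
classifier = pipeline("sentiment-analysis")

# The inference step: the trained model sees brand-new text and makes a call on it.
print(classifier("I just saw the cutest black cat on my walk"))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```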

Supervised Learning — A way of training machines where you give the model examples with answers. Like showing a program lots of pictures of cats and telling it ‘This is a cat’ so it learns what cats look like.
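
A tiny sketch of the same idea in code, assuming scikit-learn is installed; the cat/dog numbers are made up purely for illustration:

```python
from sklearn.linear_model import LogisticRegression

# Each example is [weight_kg, ear_length_cm]; the label is the answer we supply:
# 1 = "this is a cat", 0 = "this is a dog".
X = [[4.0, 6.5], [4.5, 7.0], [25.0, 12.0], [30.0, 14.0]]
y = [1, 1, 0, 0]

model = LogisticRegression()
model.fit(X, y)  # supervised learning: examples plus answers

print(model.predict([[5.0, 6.8]]))  # it guesses the label for a new animal
```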

Unsupervised Learning (heeeyo) — Training a machine without giving it the answers. The model looks at data and tries to find patterns or groups on its own. For example, it might sort different types of music into genres without being told the genre names.
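
Here's the same idea in a few lines, assuming scikit-learn is installed; the "song features" are invented for illustration:

```python
from sklearn.cluster import KMeans

# Made-up songs described only by [tempo_bpm, loudness_db]; no genre labels anywhere.
songs = [[70, -20], [75, -18], [170, -5], [165, -6], [168, -4]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(songs)  # the model groups similar songs on its own

print(labels)  # e.g. [0 0 1 1 1]: two clusters found, but it never knew their names
```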

Reinforcement Learning — Teaching machines through trial and error. The machine makes choices in a situation and gets rewards or penalties based on whether its choices are good or bad, learning over time to make better decisions (or to become resentful and secretive).
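
A toy sketch of that trial-and-error loop: a made-up slot machine with two levers, where the agent has to discover which one pays off more (all the numbers are invented):

```python
import random

payout_odds = {"left": 0.3, "right": 0.7}   # hidden from the agent
estimates = {"left": 0.0, "right": 0.0}     # what the agent currently believes
pulls = {"left": 0, "right": 0}

for step in range(1000):
    # Mostly exploit the lever that looks best so far, occasionally explore at random.
    if random.random() < 0.1:
        action = random.choice(["left", "right"])
    else:
        action = max(estimates, key=estimates.get)

    reward = 1 if random.random() < payout_odds[action] else 0  # reward or penalty
    pulls[action] += 1
    # Nudge the belief toward what actually happened: learning from feedback.
    estimates[action] += (reward - estimates[action]) / pulls[action]

print(estimates)  # by now "right" should look clearly better than "left"
```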

Neural Network — Designed to work a bit like a human brain. It consists of lots of small units (like brain cells) that work together to process information and solve problems.
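
A minimal sketch of a few of those small units wired together into layers, assuming PyTorch is installed:

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(4, 8),   # 4 inputs feed into 8 little units
    nn.ReLU(),         # each unit decides how strongly to "fire"
    nn.Linear(8, 1),   # their signals combine into one final answer
)

x = torch.rand(1, 4)   # one made-up example with 4 numbers
print(net(x))          # the (still untrained) network's output
```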

Creating an LLM

Gathering your Data

Start by collecting a wide variety of text data. This could include books, online articles, or data from databases. The more diverse your data, the better your LLM will be at understanding different aspects of language.

Kaggle has great data for ML and data science projects. Check out Australian local and Kaggle grandmaster Jeremy Howard.

GitHub often hosts datasets published by researchers and developers. Good place to search.

Also worth mentioning: Google Scholar for datasets tied to published papers, plus government data sites (a small loading sketch follows below).
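
To make the gathering step concrete, here's a minimal loading sketch using the Hugging Face datasets library; wikitext is just one freely available corpus, so swap in whatever suits your project:

```python
from datasets import load_dataset

# Pull a public text corpus (downloads and caches it locally the first time).
raw = load_dataset("wikitext", "wikitext-2-raw-v1")

print(raw)                              # comes pre-split into train / validation / test
print(raw["train"][10]["text"][:200])   # peek at one row of raw text
```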

Preprocessing Data


Now, clean this data. This step is about fixing errors, removing parts that aren’t useful, and organizing everything so your AI can learn from it effectively.

Considerations

How will you handle missing values, fix formatting issues, and deal with duplicate data?
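
A minimal cleaning sketch, assuming pandas is installed and your raw text sits in a CSV with a "text" column (both the file name and the column name are purely illustrative):

```python
import pandas as pd

df = pd.read_csv("raw_corpus.csv")

df = df.dropna(subset=["text"])                               # handle missing values
df["text"] = df["text"].str.strip()                           # fix simple formatting issues
df["text"] = df["text"].str.replace(r"\s+", " ", regex=True)  # collapse messy whitespace
df = df.drop_duplicates(subset=["text"])                      # deal with duplicate data
df = df[df["text"].str.len() > 20]                            # drop fragments too short to help

df.to_csv("clean_corpus.csv", index=False)
```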

Choosing a Model Architecture

Model architecture is essentially the design or structure of the model, acting as the blueprint guiding how the AI processes information.

Transformer architecture is particularly tailored to handle sequential data like text, focusing on understanding the context within the data, and we’ll stick with that for today.
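
As a sketch of what "choosing an architecture" can look like in practice, here's a deliberately tiny Transformer blueprint built with the Hugging Face transformers library (every size below is illustrative, not a recommendation):

```python
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=8000,   # how many different tokens the model knows
    n_positions=256,   # the longest sequence it can read at once
    n_embd=256,        # the size of each token's internal representation
    n_layer=4,         # how many Transformer blocks are stacked
    n_head=4,          # attention heads per block
)

model = GPT2LMHeadModel(config)  # an untrained Transformer, ready to learn
print(sum(p.numel() for p in model.parameters()), "parameters")
```

Much bigger LLMs are largely this same blueprint with the dials turned way up.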

Training the Model

Feed the prepared data into your AI model. This is where your AI starts learning the intricacies of language. Training can be time- and resource-intensive, especially with lots of data. (This is where I’d like to mention my buddies at Unsloth; podcast coming soon.)
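
Here's a hedged sketch of what a small training run can look like with the Hugging Face Trainer, tying together the dataset and blueprint from the sketches above. A real LLM needs vastly more data, time, and hardware; this only shows the moving parts:

```python
from datasets import load_dataset
from transformers import (GPT2Config, GPT2LMHeadModel, GPT2TokenizerFast,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# The gathered data, with empty rows dropped.
raw = load_dataset("wikitext", "wikitext-2-raw-v1")
train_text = raw["train"].filter(lambda row: row["text"].strip())

# Borrow GPT-2's tokenizer instead of training our own (a common shortcut).
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# The tiny blueprint again, with its vocabulary sized to match the tokenizer.
model = GPT2LMHeadModel(GPT2Config(vocab_size=len(tokenizer), n_positions=256,
                                   n_embd=256, n_layer=4, n_head=4))

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

tokenized = train_text.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="my-tiny-llm",
                           num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # the slow, resource-hungry part
```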

Testing and Refining

After the training, evaluate how well your AI understands and generates language. Depending on the results, you might need to adjust and retrain to enhance its performance.
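
One common way to put a number on "how well it understands" is perplexity on text the model never saw during training (lower is better). A short sketch, continuing from the training snippet above:

```python
import math

# Held-out text, prepared the same way as the training data.
eval_text = raw["validation"].filter(lambda row: row["text"].strip())
eval_tokenized = eval_text.map(tokenize, batched=True, remove_columns=["text"])

metrics = trainer.evaluate(eval_dataset=eval_tokenized)  # average loss on unseen text
print("perplexity:", math.exp(metrics["eval_loss"]))     # lower = less "surprised" by new text
```

If the number is disappointing, this is where you go back, adjust the data or the model, and retrain.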

Running the LLM

Now, how do you run the beast?

Instead of building an LLM from scratch, you can use Hugging Face to access models already trained on crazy amounts of data. You can run these models either on their cloud service or download them to run locally on your machine.
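
For instance, here's roughly what the "download and run locally" route can look like, assuming the transformers library is installed; gpt2 is just a small, freely available example model:

```python
from transformers import pipeline

# Downloads the model once, then runs it entirely on your own machine.
generator = pipeline("text-generation", model="gpt2")

result = generator("Explain transformers like I'm five:", max_new_tokens=60)
print(result[0]["generated_text"])
```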

Regardless of your choice, the key is to have a trained LLM model and the means to interact with it, whether through the internet or directly on your computer.

This is part one in a series of posts aimed at reducing the barrier of understanding and adoption of open-source AI.

I write and produce podcasts over here-

(un)supervised Learning

Other links here https://linktr.ee/Unsupervisedlearning


Written by reneeeshaw | Early adopter, framework thinker.🤝 Growth Strategy for startup founders and their teams.
Published by HackerNoon on 2024/01/26