A Review of Large Language Models and Compression in AI
An LLM, or Large Language Model, is the technical term for the category of model that ChatGPT belongs to. I touched on this in my previous article, “What can ThinkGenetic do with ChatGPT?” Its focus was that, while ChatGPT is great, a domain-specific model can do the same work with fewer resources and greater accuracy. New research has just been published, and its findings could pave the way for the widespread use of AI built on even smaller domain-specific models.
The LLM does have strengths. For example, it can be accurate on a new task with only a small amount of task-specific training data. However, the resources it takes to operate an LLM are generally cost-prohibitive for most organizations. For reference, an LLM with 175 billion parameters needs at least 350 GB of GPU memory just to hold its weights. GPT-4 has been estimated, in widely circulated but unconfirmed figures, at as many as 100 trillion parameters, so you can imagine what it would take to operate it.
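A quick back-of-the-envelope calculation shows where that 350 GB figure comes from, assuming the weights are stored in 16-bit precision (2 bytes per parameter); the numbers below are illustrative, not a measurement of any specific deployment:

```python
# Rough memory needed just to hold a 175B-parameter model's weights,
# assuming 16-bit (2 bytes per parameter) precision.
params = 175e9            # 175 billion parameters
bytes_per_param = 2       # fp16 / bf16 storage
weight_memory_gb = params * bytes_per_param / 1e9
print(f"{weight_memory_gb:.0f} GB")  # -> 350 GB, before activations or batching
```

And that is only the weights; serving real traffic adds memory for activations, caching, and batching on top of it.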
In the recently published paper “Distilling Step-by-Step!”, the authors trained a model roughly 700 times smaller than the LLM it learned from while improving accuracy. They did it using knowledge distillation, a form of model compression. The method itself isn’t new; it was introduced in 2006 by Buciluă and collaborators. It’s the process by which a larger model is used to train, or transfer its knowledge to, a smaller model. This is huge news in the AI space.
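To make the idea concrete, here is a minimal sketch of classic soft-target distillation in PyTorch. This is the textbook form of knowledge distillation, not the exact recipe from the paper (which also has the large model generate reasoning steps as an extra training signal); the function name, temperature, and alpha weighting are illustrative choices of my own:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend the usual hard-label loss with a term that pushes the
    small "student" model toward the large "teacher" model's outputs."""
    # Standard cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    # KL divergence between the softened teacher and student distributions.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_loss = F.kl_div(log_soft_student, soft_teacher,
                         reduction="batchmean") * temperature ** 2
    return alpha * hard_loss + (1 - alpha) * soft_loss
```

During training, each batch is run through both models, but only the student’s weights are updated; the teacher simply supplies richer targets than the raw labels alone.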
Businesses are entering the AI market with the promise of disrupting an industry at a quickening pace. Eventually, the cost of running the models is going to dry up the capital or eat into margins unless they use techniques that reduce cost without sacrificing performance. At ThinkGenetic, AI is the only practical way to reduce the noise in large volumes of unstructured text, and with our clients being cost-aware, we must be as well. Staying at the forefront of technology is what allows us to be competitive.