Jul 23, 2024
True, the question is: can we compress the knowledge of a 400B model into 7B? Recent releases like GPT-4o-mini (much smaller, judging by its pricing, but at least as capable as GPT-3.5) suggest there have been significant advances in model efficiency and in techniques for distilling capabilities from larger models into much smaller ones.
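
For context, the classic recipe behind this kind of capability transfer is logit-based knowledge distillation: train the small student to match the softened output distribution of the large teacher while still fitting the ground-truth labels. The sketch below is a generic illustration of that loss (temperature, alpha, and the random tensors are my own illustrative assumptions), not a description of how GPT-4o-mini was actually built.

```python
# Minimal sketch of logit-based knowledge distillation (Hinton et al., 2015 style).
# Hypothetical hyperparameters; the random tensors stand in for real forward passes.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend soft-target KL loss (teacher guidance) with hard-label cross-entropy."""
    # Soften both distributions with the temperature, then match them via KL divergence.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Standard cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss

if __name__ == "__main__":
    batch, vocab = 4, 32000
    teacher_logits = torch.randn(batch, vocab)                        # frozen large model
    student_logits = torch.randn(batch, vocab, requires_grad=True)    # small model being trained
    labels = torch.randint(0, vocab, (batch,))
    loss = distillation_loss(student_logits, teacher_logits, labels)
    loss.backward()
    print(f"distillation loss: {loss.item():.4f}")
```

In practice the student would be trained over the teacher's outputs on a large corpus (or on teacher-generated synthetic data), but the loss shape is the core idea.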