Microsoft released Phi-3 Mini, a new version of its lightweight AI model designed for specific tasks.
According to a research paper published earlier this week, Phi-3 Mini has 3.8 billion parameters, significantly fewer than models like OpenAI's GPT-4 and small enough to be deployed on a smartphone. OpenAI hasn't shared how many parameters GPT-4 has, but, per Semafor, it's believed to have over one trillion.
Traditional AI models require massive amounts of computing power, which is expensive and carries a large carbon footprint. Companies like Microsoft and Google have been working on smaller, lightweight models that can handle common tasks, which would make hosting their models more sustainable in the operational sense and more suitable for smartphones, where the industry is heavily leaning. Samsung is going all in on generative AI with a collection of features for its Galaxy devices, Google is adding generative AI features to its Pixel lineup, and even Apple is expected to make some big AI announcements for iOS 18.
A model's parameter count roughly tracks its ability to handle complexity: the more parameters, the more capable a model is at handling vast and nuanced requests. But for the everyday tasks an average user would ask of an AI model, such as translating text, drafting an email, or looking up local restaurants, a smaller, lightweight model is presumed to be sufficient.
Phi-3 Mini performed similarly to Meta's open-source Llama 3 and OpenAI's GPT-3.5 on common benchmarks, with a few exceptions. It surpassed Llama 3 and scored just below GPT-3.5 on natural language understanding (MMLU) and commonsense reasoning (HellaSwag), and beat both models on arithmetic reasoning (GSM8K). As the paper notes, it scored lower on trivia and "factual knowledge," but the researchers believe "such weakness can be resolved by augmentation with a search engine," meaning that once the model is hooked up to the internet, this won't be such an issue.
Researchers trained Phi-3 Mini on a combination of "heavily filtered web data" that meets standards for high-quality educational information, as well as synthetic data, which challenges the idea that scraping everything from the web is the best way to train a model. The model was also trained on... bedtime stories, according to DailyAI, which makes a surprising amount of sense as a source of simple, well-structured language. The idea is to opt for quality over quantity with curated data so the model can run on fewer parameters while still retaining its potency.
Phi-3 Mini is now available on Hugging Face, Azure, and Ollama.
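For readers who want to try the model locally, here is a minimal sketch of doing so through Ollama's command-line tool. It assumes Ollama is already installed and that the model is published in Ollama's library under the `phi3` tag; if the tag has changed, `ollama list` after pulling will show what is actually installed.

```shell
# Download the Phi-3 Mini weights from the Ollama library.
# Assumption: the model is published under the tag "phi3".
ollama pull phi3

# Ask the model a one-off question directly from the shell.
ollama run phi3 "Summarize this in one sentence: small language models \
trade parameter count for curated training data."

# Or start an interactive chat session (exit with /bye).
ollama run phi3
```

Because the model is only 3.8 billion parameters, it can run on a reasonably modern laptop without a dedicated GPU, which is exactly the deployment story Microsoft is pitching.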