Updated: June 22, 2024

Title and Authors:

The title of the paper is "Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone". The authors are a large team from Microsoft including Marah Abdin, Russell J. Hewett, Olatunji Ruwase, Sam Ade Jacobs, Jamie Huynh, and many others, totaling over fifty contributors.

Abstract Summary:

The paper introduces phi-3-mini, a compact 3.8-billion-parameter language model trained on 3.3 trillion tokens. It is small enough to run on a mobile phone, yet performs comparably to much larger models such as GPT-3.5 and Mixtral 8x7B. The authors attribute this to the training data rather than the architecture: a scaled-up version of the dataset used for phi-2, composed of heavily filtered web data and synthetic data.
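As a rough check on the 3.8 billion figure, the sketch below counts the weights of a Llama-style decoder with the shape the paper reports for phi-3-mini (3072 hidden dimensions, 32 heads, 32 layers, a 32,064-token vocabulary). Two inputs are assumptions not stated in the paper: a gated MLP width of 8192 and separate (untied) input and output embedding matrices, both assumed to match the publicly released model configuration. Biases are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderConfig:
    hidden: int = 3072             # model width (from the paper)
    layers: int = 32               # transformer blocks (from the paper)
    heads: int = 32                # attention heads (from the paper; does not change the count)
    vocab: int = 32064             # tokenizer vocabulary (from the paper)
    intermediate: int = 8192       # ASSUMED gated-MLP width
    tied_embeddings: bool = False  # ASSUMED: output head has its own matrix


def count_params(cfg: DecoderConfig) -> int:
    """Approximate weight count for a Llama-style decoder without biases."""
    d, ff = cfg.hidden, cfg.intermediate
    attn = 4 * d * d        # Q, K, V and output projections (full multi-head attention)
    mlp = 3 * d * ff        # gate, up and down projections of a gated (SwiGLU-style) MLP
    norms = 2 * d           # two RMSNorm weight vectors per block
    per_block = attn + mlp + norms
    embed = cfg.vocab * d   # token embedding matrix
    head = 0 if cfg.tied_embeddings else cfg.vocab * d
    final_norm = d
    return cfg.layers * per_block + embed + head + final_norm


total = count_params(DecoderConfig())
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```

Under these assumptions the count lands at roughly 3.82 billion, consistent with the reported size; the attention and MLP projections inside the 32 blocks account for about 95% of it.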

Key Concepts:

  • Small Language Models (SLMs): Models compact enough to run on devices with limited memory and compute, such as phones.
  • Dataset Optimization: Training on heavily filtered web data and synthetic data to get more capability out of a small model.
  • Model Scaling: Initial parameter-scaling results for phi-3-mini (3.8B), phi-3-small (7B), and phi-3-medium (14B), showing that the approach remains effective at larger scales.
  • Quantization: Reducing weight precision to shrink the model for mobile deployment; phi-3-mini is quantized to 4 bits, occupying about 1.8 GB of memory.
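The arithmetic behind the quantization bullet is simple: weight memory is parameter count times bits per weight. The sketch below tabulates fp16 versus 4-bit weight memory for the three model sizes, then shows a generic symmetric, per-group 4-bit quantizer in plain Python. The group size of 32 and the quantizer itself are illustrative assumptions, not the specific 4-bit format used in the paper's phone deployment, and the memory figures ignore the KV cache, activations, and the stored scales.

```python
import random

# Parameter counts: phi-3-mini from the paper; small/medium are the 7B and 14B scaling models.
MODELS = {"phi-3-mini": 3.8e9, "phi-3-small": 7e9, "phi-3-medium": 14e9}


def weight_gib(n_params: float, bits: int) -> float:
    """Memory for the weights alone, in GiB (excludes KV cache, activations, and scales)."""
    return n_params * bits / 8 / 2**30


for name, n in MODELS.items():
    print(f"{name:<13} fp16: {weight_gib(n, 16):5.1f} GiB   4-bit: {weight_gib(n, 4):4.1f} GiB")


def quantize_int4(weights, group_size=32):
    """Symmetric per-group quantization to integers in [-8, 7] (16 levels = 4 bits).

    Each group of `group_size` weights shares one floating-point scale chosen so
    the largest magnitude in the group maps to level 7.
    """
    q, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale = max(abs(x) for x in group) / 7.0 or 1.0  # fall back to 1.0 for an all-zero group
        scales.append(scale)
        q.extend(max(-8, min(7, round(x / scale))) for x in group)
    return q, scales


def dequantize_int4(q, scales, group_size=32):
    """Recover approximate weights by multiplying each level by its group's scale."""
    return [level * scales[i // group_size] for i, level in enumerate(q)]


random.seed(0)
w = [random.gauss(0.0, 0.02) for _ in range(1024)]  # toy weights, not real model data
q, scales = quantize_int4(w)
w_hat = dequantize_int4(q, scales)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
print(f"levels used: {min(q)}..{max(q)}, max abs error: {max_err:.5f}")
```

For phi-3-mini, 3.8 billion weights at 4 bits come to 1.9 GB (roughly 1.8 GiB), in line with the paper's reported footprint, versus about 7 GiB at fp16. Per-group scales keep the rounding error of each weight within half of its group's scale.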

Problem Statement:

The main challenge addressed by the paper is developing a language model that is both small enough to operate on a mobile phone and powerful enough to perform at the level of much larger contemporary models.

Methods and Techniques:

  • Transformer Architecture: A transformer decoder with a default 4K context length, built on a block structure similar to Llama-2 and using the same tokenizer (vocabulary size 32,064), with 3072 hidden dimensions, 32 heads, and 32 layers.
  • Quantization: Applying 4-bit quantization so the model fits in a phone's memory and runs efficiently on-device.
  • LongRope: A technique used to extend phi-3-mini's context length from 4K to 128K tokens (the phi-3-mini-128K variant), enabling it to handle much longer text sequences.
  • Data Filtering: Selecting web data for the right level of "knowledge" and for content that improves reasoning, then combining it with synthetic data, so a small model gets the most out of a limited training budget.
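LongRope (Ding et al., 2024) extends the context window by rescaling the rotary position embedding (RoPE) frequencies non-uniformly across dimensions, with the rescale factors found by search. The sketch below illustrates only the general mechanism under stated assumptions: a head dimension of 96 (3072 hidden / 32 heads), the common RoPE base of 10,000, and a hypothetical linear schedule of per-dimension factors in place of LongRope's searched factors. It is not the configuration used for phi-3-mini-128K.

```python
import math


def rope_inv_freq(head_dim: int, base: float = 10000.0):
    """Standard RoPE frequencies: one per pair of dimensions, theta_i = base^(-2i/d)."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def rescaled_inv_freq(head_dim: int, orig_ctx: int, new_ctx: int, base: float = 10000.0):
    """Divide each RoPE frequency by a per-dimension factor lambda_i.

    HYPOTHETICAL schedule: lambda_i grows linearly from 1 for the fastest-rotating
    pair (left untouched) to s = new_ctx / orig_ctx for the slowest pair (fully
    interpolated). LongRope searches these factors instead of fixing a formula.
    """
    s = new_ctx / orig_ctx
    n = head_dim // 2
    lambdas = [1 + (s - 1) * i / (n - 1) for i in range(n)]
    freqs = [f / lam for f, lam in zip(rope_inv_freq(head_dim, base), lambdas)]
    return freqs, lambdas


def rotate_pair(x0: float, x1: float, pos: int, freq: float):
    """Rotate one 2-D slice of a query/key vector by the angle pos * freq."""
    angle = pos * freq
    return (x0 * math.cos(angle) - x1 * math.sin(angle),
            x0 * math.sin(angle) + x1 * math.cos(angle))


HEAD_DIM = 3072 // 32  # 96, from phi-3-mini's hidden size and head count
base_freqs = rope_inv_freq(HEAD_DIM)
new_freqs, lambdas = rescaled_inv_freq(HEAD_DIM, orig_ctx=4096, new_ctx=131072)

# For the slowest pair, position 131072 under the rescaled frequency receives the
# same rotation that position 4096 received under the original frequency.
print("rescaled @131072:", rotate_pair(1.0, 0.0, 131072, new_freqs[-1]))
print("original @4096:  ", rotate_pair(1.0, 0.0, 4096, base_freqs[-1]))
```

The intuition behind non-uniform factors is that high-frequency dimensions carry fine-grained local position information and need little or no interpolation, while low-frequency dimensions must be compressed to keep long-range angles in a familiar range. LongRope's contribution is choosing these factors by search rather than by a fixed rule like the one above.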

Key Results:

Phi-3-mini demonstrated strong performance across a range of benchmarks, including 69% on MMLU and 8.38 on MT-bench. These scores rival those of much larger models, showing that careful choices of training data and architecture can deliver strong results in a mobile-friendly format.

Contributions and Innovations:

  • Model Size Reduction: Shrinking the model enough for local deployment on mobile devices while keeping benchmark performance competitive with much larger models.
  • Data Filtering and Synthetic Data Use: Data-preparation innovations that let smaller models match larger ones on many benchmarks.
  • Context Extension and Quantization: Applying LongRope to extend the context window and 4-bit quantization to fit within the constraints of mobile hardware while maintaining performance.

Future Work:

The authors suggest further optimizing their data mixture for larger models and continuing to investigate how to reduce model size while maintaining or improving benchmark performance.

Potential Applications:

The phi-3-mini model can be used in mobile applications that require natural language processing, such as virtual assistants, mobile chatbots, and real-time translation tools that run fully offline.

Relevant Links:

Key references cited in the paper:

  1. Preprints and Research Publications:
    • Gunasekar, Suriya, et al. "Textbooks Are All You Need." arXiv preprint arXiv:2306.11644, 2023.
    • Vaswani, Ashish, et al. "Attention Is All You Need." Advances in Neural Information Processing Systems, vol. 30, 2017.
    • Kaplan, Jared, et al. "Scaling Laws for Neural Language Models." arXiv preprint arXiv:2001.08361, 2020.
    • Ding, Yiran, et al. "LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens." arXiv preprint arXiv:2402.13753, 2024.
    • Various other arXiv preprints cited throughout the paper on language models and their training methods.
  2. Benchmarks and Datasets:
    • Hendrycks, Dan, et al. "Measuring Mathematical Problem Solving With the MATH Dataset." 2021.
    • Zellers, Rowan, et al. "HellaSwag: Can a Machine Really Finish Your Sentence?" ACL 2019.
    • Clark, Peter, et al. "Think You Have Solved Question Answering? Try ARC, The AI2 Reasoning Challenge." 2018.
    • Other benchmarks used for evaluation include GSM8K, MedQA, AGIEval, TriviaQA, ARC-Challenge (ARC-C), ARC-Easy (ARC-E), PIQA, Social IQa, BigBench-Hard, WinoGrande, OpenBookQA, BoolQ, CommonsenseQA, TruthfulQA, and HumanEval.
  3. Organizations and Projects:
    • Meta AI's Llama-3 announcement.
    • Various references to OpenAI's GPT models and related OpenAI blog posts.
