Objective
The paper investigates the capabilities of smaller Transformer-based language models and asks whether high performance can be achieved without either 1) large-scale models or 2) large-scale data.
Central Problem
LLMs have shown transformative capabilities in the field of NLP. However, their vast size poses challenges in terms of training costs, energy consumption, and controllability.
Solution & Methodology
The authors introduce phi-1.5, a 1.3-billion-parameter model focused on common sense reasoning in natural language. The model aims to achieve performance comparable to models 5~10x larger, using a training approach that leverages "textbook-like" synthetic data. Specifically, the training data consists of the 7B tokens from phi-1's training data plus roughly 20B newly generated synthetic tokens. The model is trained from scratch and is used without any instruction fine-tuning or RLHF.
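To make the data composition described above concrete, here is a minimal Python sketch of the training-data mixture. The token counts (7B reused from phi-1, ~20B new synthetic) come from the summary; the proportional sampling scheme and all names in the code are illustrative assumptions, not the authors' exact recipe.

```python
# Hypothetical sketch of the phi-1.5 training-data mixture (illustrative only).
# Token counts are taken from the summary; the proportional sampling scheme
# below is an assumption, not the authors' documented procedure.

from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    tokens_billion: float  # approximate unique tokens in this source


SOURCES = [
    DataSource("phi-1 training data", 7.0),
    DataSource("new textbook-like synthetic data", 20.0),
]


def sampling_weights(sources: list[DataSource]) -> dict[str, float]:
    """Assign each source a sampling weight proportional to its size (assumed)."""
    total = sum(s.tokens_billion for s in sources)
    return {s.name: s.tokens_billion / total for s in sources}


if __name__ == "__main__":
    for name, weight in sampling_weights(SOURCES).items():
        print(f"{name}: {weight:.0%} of sampled training tokens")
```

Under this assumed proportional scheme, roughly three quarters of sampled tokens would come from the new synthetic data; the paper's actual mixing ratio may differ.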
Results