Apple has published details of its Apple Foundation Models (AFM), a family of large language models (LLMs) that power features in the Apple Intelligence suite. The family has two members: a 3-billion-parameter on-device model and a larger cloud-based model called AFM-server. The on-device model was derived by pruning a 6.4-billion-parameter model, while AFM-server was trained from scratch. Apple has not disclosed the size of AFM-server, but says both models use a decoder-only Transformer architecture and were pre-trained on 6.3 trillion tokens of data.
What’s Happening & Why This Matters
Apple’s introduction of the AFM models marks a notable step in the development of artificial intelligence tailored to its ecosystem. The models use pluggable, task-specific LoRA (low-rank adaptation) adapters, which are selected at runtime to improve performance on tasks such as proofreading or replying to emails. Apple evaluated the models on several benchmarks, including instruction-following and mathematical reasoning. The company reports that they compare favorably with other models, such as Llama 3 and GPT-4, and outperform them in some tests.
The models are intended to improve the user experience across Apple products while adhering to Apple’s core values of safety and privacy. For instance, Apple built in several safeguards to keep harmful content, spam, and personal information out of the pre-training data. Safety alignment was a priority during fine-tuning, with over 10% of the fine-tuning data dedicated to safety-related content. Apple also carried out manual and automated “red-teaming” to uncover and address potential vulnerabilities.
On several of these benchmarks, the on-device model outperformed some larger models, such as Gemma-7B and Mistral-7B. The cloud-based AFM-server model achieved competitive results, with a win rate of 52% against GPT-3.5. Ruoming Pang, lead author of Apple’s AFM technical report, noted that the models were trained for a range of general-purpose capabilities, including summarization, writing assistance, tool use, and coding.
The adapter architecture allows the AFM models to be specialized quickly for specific tasks. The adapters are compact neural network modules that plug into the self-attention and feed-forward layers of the base model. Only the adapter weights are fine-tuned on task-specific datasets while the base model’s weights stay unchanged, so a single base model can serve many different applications. The on-device adapters are small, on the order of tens of megabytes each, which makes them suitable for devices with limited memory.
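To make the adapter idea concrete, here is a minimal sketch of a LoRA-style layer in plain Python. It is an illustration only, not Apple’s implementation: the class names (LoRAAdapter, AdaptedLinear), the task label, the layer sizes, and the rank are all hypothetical. The sketch assumes the standard LoRA formulation, in which a frozen weight matrix W is augmented by a low-rank update (alpha / rank) * B * A, and it picks an adapter by task name at call time.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoRAAdapter:
    """Low-rank update for one weight matrix: delta(x) = (alpha / rank) * B(Ax).

    Hypothetical illustration; names and sizes are assumptions.
    """

    def __init__(self, d_in, d_out, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.scale = alpha / rank
        # A starts small and random, B starts at zero, so an untrained
        # adapter leaves the base layer's output unchanged.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def num_params(self):
        """Trainable parameters: rank * (d_in + d_out)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def delta(self, x):
        """Project down to `rank` dimensions, back up, then scale."""
        return [self.scale * v for v in matvec(self.B, matvec(self.A, x))]


class AdaptedLinear:
    """A frozen linear layer with task adapters chosen at call time."""

    def __init__(self, weight):
        self.weight = weight  # base weights: never modified here
        self.adapters = {}    # task name -> LoRAAdapter

    def register(self, task, adapter):
        self.adapters[task] = adapter

    def forward(self, x, task=None):
        y = matvec(self.weight, x)
        adapter = self.adapters.get(task)
        if adapter is None:
            return y  # no adapter for this task: plain base-model output
        return [base + extra for base, extra in zip(y, adapter.delta(x))]


if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    layer = AdaptedLinear(identity)
    layer.register("proofreading", LoRAAdapter(d_in=4, d_out=4, rank=2))
    x = [1.0, 2.0, 3.0, 4.0]
    print(layer.forward(x))                       # base model only
    print(layer.forward(x, task="proofreading"))  # base plus adapter update
```

The size advantage comes from the parameter count: an adapter for a d_in by d_out matrix stores rank * (d_in + d_out) values instead of d_in * d_out, which shrinks quickly relative to the full matrix as the layer gets wider.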
Apple continues to explore advancements in AI, hinting at a broader family of generative models that includes language, diffusion, and coding models. This expansion aims to integrate AI further into Apple products and services while maintaining strict safety and privacy standards.
TF Summary: What’s Next
Apple’s Foundation Models are poised to play a significant role in the future of AI across its product ecosystem. With the company’s ongoing commitment to innovation and safety, we can expect further enhancements and new applications for these models. Apple will likely continue to share updates on its AI advancements, particularly in areas like coding and generative models, to expand its capabilities and provide more robust tools for users.