Starling-7B
Overview:
Starling-7B is an open-weights large language model (LLM) trained with Reinforcement Learning from AI Feedback (RLAIF). It leverages our new GPT-4-labeled ranking dataset, Nectar, together with a novel reward-model training and policy-optimization process. Starling-7B achieved a score of 8.09 on MT-Bench with GPT-4 as the judge, surpassing all existing models except OpenAI's GPT-4 and GPT-4 Turbo. We have released the ranked dataset Nectar, the reward model Starling-RM-7B-alpha, and the language model Starling-LM-7B-alpha on HuggingFace, along with an online demo on LMSYS Chatbot Arena. Stay tuned for the upcoming release of our code and paper, which will describe the full process in more detail.
Target Users:
Chat and Q&A scenarios
Features
Reinforcement learning from AI feedback (RLAIF)
Optimized for LLM helpfulness and safety
High-quality ranked dataset and reward model released openly
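To illustrate the reward-modeling step behind RLAIF: a reward model is typically fit to ranked response pairs with a Bradley-Terry-style pairwise loss, which penalizes the model when it scores a lower-ranked response above a higher-ranked one. The sketch below is a minimal, self-contained illustration of that loss in pure Python; it is not the released Starling training code, and the function name and toy reward values are invented for the example.

```python
import math

def pairwise_ranking_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Bradley-Terry negative log-likelihood for one ranked pair:
    -log sigmoid(r_preferred - r_rejected).
    Small when the preferred response already scores higher, large otherwise.
    """
    margin = reward_preferred - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy rewards for two responses to the same prompt (hypothetical values):
loss_correct = pairwise_ranking_loss(2.0, -1.0)   # ranking already correct -> small loss
loss_inverted = pairwise_ranking_loss(-1.0, 2.0)  # ranking inverted -> large loss
```

Minimizing this loss over many GPT-4-ranked pairs (as in Nectar) pushes the reward model to assign higher scores to responses the AI judge preferred; the reward model then guides policy optimization of the language model.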