In this post we will explain how Open Source GPT-4 Models work and how you can use them as an alternative to a commercial OpenAI GPT-4 solution. Everyday new open source large language models (LLMs) are emerging and the list gets bigger and bigger. We will cover these two models GPT-4 version of Alpaca and Vicuna. This tutorial includes the workings of the models, as well as their implementation with Python
Vicuna Model
Introduction : Vicuna Model
Vicuna was the first open-source model available publicly which is comparable to GPT-4 output. It was fine-tuned on Meta’s LLaMA 13B model and conversations dataset collected from ShareGPT. ShareGPT is the website wherein people share their ChatGPT conversations with others.
Important Note : The Vicuna Model was primarily trained on the GPT-3.5 dataset because most of the conversations on ShareGPT during the model’s development were based on GPT-3.5. But the model was evaluated based on GPT-4.
How Vicuna Model works
Researchers web scraped approximately 70,000 conversations from the ShareGPT website. Next step is to introduce improvements over original Alpaca model. To better handle multi-round conversations, they adjusted the training loss. They also increased the maximum length of context from 512 to 2048 to better understand long sequences.
Then they evaluated the model quality by creating a set of 80 diverse questions from 8 different categories (coding, maths, roleplay scenarios etc). Next step is to collect answers from five chatbots: LLaMA, Alpaca, ChatGPT, Bard, and Vicuna and then asked GPT-4 to rate the quality of their answers based on helpfulness, relevance, accuracy, and detail. In short GPT-4 API was used to assess model performance.
Read MoreListenData