Oh I got you mixed up with the other commenter, apologies.
I’m not sure when Llama 8B starts to degrade at long context, but I wanna say it’s well before 128K, and that’s the point where other “long context” models start to look much more attractive depending on the task. Right now I am testing Amazon’s Mistral finetune, and it seems to be much better than Nemo or Llama 3.1 out there.