It is llama3-8B, so it is not out of the question, but I am not sure how much memory you would need to actually reach a 1M context window. They use ring attention to achieve the long context, which I am unfamiliar with, but it seems to greatly lower the memory requirements.
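For anyone else unfamiliar with it, here is a toy, single-process sketch of the idea as I understand it (my own simplification, not the paper's implementation): the sequence is split into blocks, one per "device"; each device keeps only its own query block resident while the key/value blocks rotate around the ring, and attention over the full sequence is accumulated one block at a time with an online softmax. Peak memory per device then scales with the block size rather than the full sequence length:

```python
# Toy single-process sketch of ring attention's memory pattern
# (a simplification for illustration, not the paper's code).
import numpy as np

def ring_attention(q, k, v, n_devices):
    """q, k, v: (seq_len, d). Returns softmax(q k^T / sqrt(d)) v,
    accumulated block-by-block as if KV blocks circulated a ring."""
    seq_len, d = q.shape
    qs = np.array_split(q, n_devices)  # each device's resident query block
    ks = np.array_split(k, n_devices)  # KV blocks that circulate the ring
    vs = np.array_split(v, n_devices)
    outs = []
    for dev in range(n_devices):
        q_blk = qs[dev]
        # running online-softmax state for this device's queries
        m = np.full(q_blk.shape[0], -np.inf)  # running row max
        l = np.zeros(q_blk.shape[0])          # running normalizer
        acc = np.zeros_like(q_blk)            # running weighted sum
        for step in range(n_devices):
            # the KV block that arrives at this device on this ring step
            src = (dev + step) % n_devices
            s = q_blk @ ks[src].T / np.sqrt(d)
            m_new = np.maximum(m, s.max(axis=1))
            scale = np.exp(m - m_new)          # rescale old state
            p = np.exp(s - m_new[:, None])
            l = l * scale + p.sum(axis=1)
            acc = acc * scale[:, None] + p @ vs[src]
            m = m_new
        outs.append(acc / l[:, None])
    return np.vstack(outs)

# sanity check against dense attention
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((64, 16)) for _ in range(3))
s = q @ k.T / np.sqrt(16)
ref = np.exp(s - s.max(axis=1, keepdims=True))
ref /= ref.sum(axis=1, keepdims=True)
assert np.allclose(ring_attention(q, k, v, 4), ref @ v, atol=1e-6)
```

The online softmax is what lets each device throw away a KV block after processing it, so nothing ever needs to hold scores for the whole sequence at once.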
That’s cool. Am I reading right that this wouldn’t run on consumer-grade hardware, though?
I believe you’d need roughly 500GB of RAM, minimum, to run it at full context length. There is chatter that a 125k context used about 40GB.
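A back-of-envelope check on just the KV-cache portion, using Llama-3-8B's published config (32 layers, 8 KV heads via GQA, head dim 128); actual usage would be higher once you add the ~16GB of fp16 weights, activations, and framework overhead, which is roughly consistent with the 40GB figure above:

```python
# KV-cache size only; weights, activations, and runtime overhead come on top.
def kv_cache_gib(seq_len, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per=2):
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per  # K and V
    return seq_len * per_token / 2**30

print(f"{kv_cache_gib(125_000):.0f} GiB at 125k tokens (fp16)")       # ~15 GiB
print(f"{kv_cache_gib(1_000_000):.0f} GiB at 1M tokens (fp16)")       # ~122 GiB
print(f"{kv_cache_gib(1_000_000, bytes_per=4):.0f} GiB at 1M (fp32)") # ~244 GiB
```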
I know I can load the 70B models on my laptop at lower bit widths, but that consumes about 140GB of RAM.
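Quick napkin math on weight memory at different quantization levels (weights only, not counting KV cache or runtime overhead); for what it's worth, 140GB is what 70B parameters come to at full 16-bit precision:

```python
# Weight memory for a 70B-parameter model: params * bits / 8.
for bits in (16, 8, 4):
    print(f"{bits}-bit: {70e9 * bits / 8 / 1e9:.0f} GB")
# 16-bit: 140 GB, 8-bit: 70 GB, 4-bit: 35 GB
```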