You are aware that those are often called LMMs (Large Multimodal Models). And one of the modalities that makes them multimodal is language. All LMMs are, or contain, an LLM.
What else would it be except an LLM? What do you think "model" means?
Large language model
https://github.com/haotian-liu/LLaVA
I don't think Google actually uses LLaVA, but the concept is the same: the other inputs get converted into tokens the language model can process.
When are you going to admit you have no idea what you are talking about?
An LLM literally is a “general AI model that powers a variety of tasks”.
To be honest, I know a lot more about this than I can say, but believe me: Gemini Nano is a multimodal LLM.
I spoke to Google engineers about this a few months ago: