Is this a transformer model? Any details?
Here is their technical report. I’ve yet to read it, though.
Thanks! Here’s the high-level description from there:
“Gemini models build on top of Transformer decoders (Vaswani et al., 2017) that are enhanced with improvements in architecture and model optimization to enable stable training at scale and optimized inference on Google’s Tensor Processing Units. They are trained to support 32k context length”
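So yes, decoder-only Transformer, per the report. The core of a decoder block is causal self-attention: each position can only attend to itself and earlier positions. A minimal single-head sketch in NumPy (this is just the generic mechanism from Vaswani et al., not Gemini's actual implementation; the weight matrices `Wq`, `Wk`, `Wv` are illustrative placeholders):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x, Wq, Wk, Wv):
    # x: (seq_len, d_model); single head, no batching, for clarity
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    # causal mask: position i may only attend to positions <= i
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores[mask] = -1e9
    return softmax(scores) @ v

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8  # a real model uses context up to 32k and much larger d_model
x = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))
out = causal_self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

The "32k context length" just means `seq_len` can go up to 32,768 tokens; the quadratic `scores` matrix is why long contexts are expensive and why the report emphasizes inference optimization on TPUs.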