Context length?

#2
by turboderp - opened

Is this really 2k seq length? The base 70b seems to be 16k, is there something up with the config?

Same question here. The blog shows both the instruction and python models are long context fine-tuned.

Actually it should be 4096; it seems the config.json is wrong (my guess is the conversion script needs to be updated). I confirmed this with a Meta engineer, and you can also see it in the reference implementation - https://github.com/facebookresearch/codellama/blob/1af62e1f43db1fa5140fa43cb828465a603a48f3/llama/model.py#L277 (self.params.max_seq_len * 2, where self.params.max_seq_len == 2048).
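For context, the linked line sits in the rotary-embedding setup: the reference model precomputes frequencies for max_seq_len * 2 positions, which is why max_seq_len == 2048 yields a 4096-position cache. Here's a stdlib-only sketch of that computation (the original uses torch and returns complex e^(i*angle) values; the head_dim below assumes the 70B's 8192 hidden size and 64 heads):

```python
import math  # not strictly needed, shown for clarity; theta ** x is plain float math

def precompute_freqs(dim: int, end: int, theta: float = 10000.0):
    """Rotary-embedding angles, one row per position.

    Stdlib-only sketch mirroring precompute_freqs_cis in the reference
    llama/model.py, which builds a torch tensor of complex values.
    """
    # One inverse frequency per pair of head dimensions.
    inv_freq = [theta ** (-i / dim) for i in range(0, dim, 2)]
    # Angle at each position is position * frequency.
    return [[pos * f for f in inv_freq] for pos in range(end)]

# The reference Transformer builds this cache for max_seq_len * 2 positions,
# so max_seq_len == 2048 gives a 4096-position rotary cache.
max_seq_len = 2048
head_dim = 8192 // 64  # 70B: hidden size 8192, 64 attention heads
freqs = precompute_freqs(head_dim, max_seq_len * 2)
print(len(freqs), len(freqs[0]))  # 4096 positions, 64 frequency pairs
```

The `* 2` only sizes the precomputed cache; the usable context is still whatever the model was actually trained/fine-tuned on.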

The README says this is a model with 16k context, corroborating turboderp's findings.

Code Llama is an auto-regressive language model that uses an optimized transformer architecture. It was fine-tuned with up to 16k tokens. This variant does not support long context of up to 100k tokens.

Although I guess it could be wrong too.

@yard1 Thanks.

It's a real shame that the instruct and python versions were nerfed like this, but I guess 4096 is a better starting point than 2048 at least. :(

4096 for a coding model is painfully small.

Without 16k context length it is basically useless as a coding model.

I guess we need to wait for the instruct fine-tuned 16k versions created by others. Maybe Phind will make one, we'll see.

Go, Phind!

How come all the smaller models of the same series (34B, 13B, 7B) have a context length of 16k, but the largest one only 4k? That doesn't make much sense, and all the documentation states these models were trained on 16k inputs. It looks most like a typo in config.json, which is also strange: how come no one noticed/fixed it? They also say everywhere that it supports up to 100k context. Is that a theoretical maximum, or what?
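As I understand it, the 100k figure is an extrapolation limit, not a training length: the Code Llama paper describes retuning the RoPE base period (theta) from 10,000 to 1,000,000 during long-context fine-tuning, which lets the model extrapolate well past the 16k sequences it was tuned on. Both numbers live in config.json. A minimal sketch of checking them, using illustrative sample values rather than the actual checkpoint:

```python
import json
import pathlib
import tempfile

# Illustrative config fragment; the real values come from the checkpoint's
# config.json and are exactly what this thread is disputing.
sample = {
    "max_position_embeddings": 4096,  # the advertised context length
    "rope_theta": 1000000.0,          # Code Llama's retuned RoPE base period
}

with tempfile.TemporaryDirectory() as d:
    cfg_path = pathlib.Path(d) / "config.json"
    cfg_path.write_text(json.dumps(sample, indent=2))

    # In practice, point cfg_path at the downloaded checkpoint directory.
    cfg = json.loads(cfg_path.read_text())
    print(cfg["max_position_embeddings"], cfg["rope_theta"])
```

If `rope_theta` is 1,000,000, the long-context retuning was applied regardless of what `max_position_embeddings` says, which would support the "typo in config.json" theory.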

@sbnc This model is quite outdated now; who still cares?

I am new to the space, so I was experimenting with different models. But you're probably right...