Context length?

#2
by turboderp - opened

Is this really 2k seq length? The base 70b seems to be 16k, is there something up with the config?

Same question here. The blog shows both the instruction and python models are long context fine-tuned.

Actually it should be 4096; it seems the config.json is wrong (my guess is the conversion script needs to be updated). I confirmed this with a Meta engineer, and you can also see it in the reference implementation - https://github.com/facebookresearch/codellama/blob/1af62e1f43db1fa5140fa43cb828465a603a48f3/llama/model.py#L277 (self.params.max_seq_len * 2, where self.params.max_seq_len == 2048).
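For context, the linked line sits in the rotary-embedding setup: the reference model precomputes frequencies for max_seq_len * 2 positions, which is why max_seq_len == 2048 yields a 4096-position cache. Here's a stdlib-only sketch of that computation (the original uses torch and returns complex e^(i*angle) values; the head_dim below assumes the 70B's 8192 hidden size and 64 heads):

```python
import math  # not strictly needed, shown for clarity; theta ** x is plain float math

def precompute_freqs(dim: int, end: int, theta: float = 10000.0):
    """Rotary-embedding angles, one row per position.

    Stdlib-only sketch mirroring precompute_freqs_cis in the reference
    llama/model.py, which builds a torch tensor of complex values.
    """
    # One inverse frequency per pair of head dimensions.
    inv_freq = [theta ** (-i / dim) for i in range(0, dim, 2)]
    # Angle at each position is position * frequency.
    return [[pos * f for f in inv_freq] for pos in range(end)]

# The reference Transformer builds this cache for max_seq_len * 2 positions,
# so max_seq_len == 2048 gives a 4096-position rotary cache.
max_seq_len = 2048
head_dim = 8192 // 64  # 70B: hidden size 8192, 64 attention heads
freqs = precompute_freqs(head_dim, max_seq_len * 2)
print(len(freqs), len(freqs[0]))  # 4096 positions, 64 frequency pairs
```

The `* 2` only sizes the precomputed cache; the usable context is still whatever the model was actually trained/fine-tuned on.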

The README says this is a model with 16k context, corroborating turboderp's findings.

Code Llama is an auto-regressive language model that uses an optimized transformer architecture. It was fine-tuned with up to 16k tokens. This variant does not support long context of up to 100k tokens.

Although I guess it could be wrong too.

@yard1 Thanks.

It's a real shame that the instruct and python versions were nerfed like this, but I guess 4096 is a better starting point than 2048 at least. :(

4096 for a coding model is painfully small.

Without 16k context length it is basically useless as a coding model.

I guess we need to wait for the instruct fine-tuned 16k versions created by others. Maybe Phind will make one, we'll see.

Go, Phind!

How come all the smaller models of the same series (34B, 13B, 7B) have a context length of 16k, but the largest one only 4k? That doesn't make much sense, and all the documentation states these models were trained on 16k inputs. It looks most like a typo in config.json, which is also strange: how come no one noticed/fixed it? They also say everywhere that it supports up to 100k context. Is that a theoretical maximum, or what?
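As I understand it, the 100k figure is an extrapolation limit, not a training length: the Code Llama paper describes retuning the RoPE base period (theta) from 10,000 to 1,000,000 during long-context fine-tuning, which lets the model extrapolate well past the 16k sequences it was tuned on. Both numbers live in config.json. A minimal sketch of checking them, using illustrative sample values rather than the actual checkpoint:

```python
import json
import pathlib
import tempfile

# Illustrative config fragment; the real values come from the checkpoint's
# config.json and are exactly what this thread is disputing.
sample = {
    "max_position_embeddings": 4096,  # the advertised context length
    "rope_theta": 1000000.0,          # Code Llama's retuned RoPE base period
}

with tempfile.TemporaryDirectory() as d:
    cfg_path = pathlib.Path(d) / "config.json"
    cfg_path.write_text(json.dumps(sample, indent=2))

    # In practice, point cfg_path at the downloaded checkpoint directory.
    cfg = json.loads(cfg_path.read_text())
    print(cfg["max_position_embeddings"], cfg["rope_theta"])
```

If `rope_theta` is 1,000,000, the long-context retuning was applied regardless of what `max_position_embeddings` says, which would support the "typo in config.json" theory.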

@sbnc This model is quite outdated now; who still cares?

I am new to the space, so I was experimenting with different models. But you're probably right...