Discussion about this post

Daniel Olshansky

> Could a large language model trained on data fit under that term? I don't think so, but the terminology is vague enough that once again I'm not ready to stake my reputation on it.

My intuition is that this wording implies that private repositories are used for training.

Private repo --[text data]--> embeddings generator server --[embeddings]--> model training server --> model

I personally do not have a problem with this, and would opt in if I could (I wouldn't say no to some free GitHub service credits), but my immediate gut leans toward "it's definitely used for training," so I was surprised you think otherwise.
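The flow in that diagram could be sketched as below. This is purely illustrative: every function name and the toy "embedding" and "model" are hypothetical stand-ins, not anything from GitHub's actual infrastructure.

```python
# Hypothetical sketch of the pipeline in the diagram above.
# All names and logic here are illustrative assumptions, not GitHub's real stack.

def extract_text(private_repo: dict) -> list:
    """Pull raw text (file contents) out of a private repository."""
    return list(private_repo.values())

def generate_embeddings(texts: list) -> list:
    """Stand-in for the embeddings generator server: map each text to a vector."""
    # Toy embedding: [total character count, unique character count].
    return [[float(len(t)), float(len(set(t)))] for t in texts]

def train_model(embeddings: list) -> dict:
    """Stand-in for the model training server: consume embeddings, emit a 'model'."""
    n = len(embeddings)
    dims = len(embeddings[0]) if embeddings else 0
    # Toy "model": per-dimension mean of the training embeddings.
    means = [sum(vec[d] for vec in embeddings) / n for d in range(dims)]
    return {"trained_on": n, "weights": means}

repo = {"main.py": "print('hello')", "README.md": "My private project"}
model = train_model(generate_embeddings(extract_text(repo)))
print(model["trained_on"])  # 2
```

The point of the sketch is just that the training server only ever sees embeddings, never the repo text itself, which is exactly the ambiguity being debated.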

