==== RAG ideas ====
- An embedding model can create an embedding that captures the semantics of a given piece of input, be it a piece of text, part of a picture, or video. It creates contextualized embeddings, represented as vectors; in that sense one can say "we could create embeddings of our corpus using OpenAI's Ada" (see the embedding sketch below).
- We typically chunk the text into 512-token pieces (in practice chunks could be smaller, since one can attach a manual tag to each chunk; see the chunking sketch after this list). Chunking serves 3 purposes:
  * To fit the chunk within the embedding model's context limit; this is usually not an issue anymore, since most models have a >4k context window.
  * To be more versatile and able to stuff more "smaller documents" (chunks) into the prompt: by chunking to 512 tokens, one can pack the model's context window tightly, and the unused window space is at most 511 tokens.
  * (opinion) Smaller chunks lead to a more balanced distribution of the vectors, which leads to better clustering, which leads to better ranking.
- How can embedding models that are word based (like GysBert) be made sentence- or chunk-based? Conceptually it is quite basic machine-learning layers that do the job (see the pooling sketch below), but what that means exactly, and why the result captures semantics, still has to be checked.
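A minimal sketch of the first point, assuming the `openai` Python package (v1 client) with an `OPENAI_API_KEY` in the environment; `text-embedding-ada-002` is the Ada model id, and the corpus strings are placeholders.

```python
# Sketch: embed a small corpus with OpenAI's Ada embedding model.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

corpus = [
    "RAG retrieves relevant chunks before generation.",
    "Embeddings map text to dense vectors.",
]

response = client.embeddings.create(
    model="text-embedding-ada-002",
    input=corpus,
)

# One 1536-dimensional vector per input string, in the same order.
vectors = [item.embedding for item in response.data]
print(len(vectors), len(vectors[0]))  # 2 1536
```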
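A sketch of the 512-token chunking, assuming `tiktoken` with the `cl100k_base` encoding (the tokenizer Ada-002 uses); overlap between chunks and the manual tag per chunk are left out for brevity.

```python
# Sketch: split a long text into chunks of at most 512 tokens.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def chunk_text(text: str, max_tokens: int = 512) -> list[str]:
    """Split text into consecutive chunks of at most max_tokens tokens."""
    tokens = enc.encode(text)
    # Slice the token ids into fixed-size windows and decode each back to text.
    return [
        enc.decode(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]

document = "Retrieval-augmented generation combines search with generation. " * 200
chunks = chunk_text(document)
print(len(chunks), [len(enc.encode(c)) for c in chunks[:3]])
```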
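On the last point, one common answer to check against is a pooling layer: averaging the token embeddings of a BERT-style encoder (masking out padding) yields a single vector per sentence or chunk, which is how sentence-transformers-style models are typically built on top of token-level encoders. A sketch assuming `transformers` and `torch`; `bert-base-uncased` is a stand-in here, not GysBert's actual checkpoint.

```python
# Sketch: mean pooling over token embeddings -> one vector per chunk.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = ["RAG retrieves relevant chunks.", "Pooling turns token vectors into one."]
batch = tokenizer(sentences, padding=True, truncation=True,
                  max_length=512, return_tensors="pt")

with torch.no_grad():
    token_embeddings = model(**batch).last_hidden_state  # (batch, seq_len, hidden)

# Average only over real tokens, masking out the padding positions.
mask = batch["attention_mask"].unsqueeze(-1)  # (batch, seq_len, 1)
chunk_embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(chunk_embeddings.shape)  # torch.Size([2, 768])
```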