This repository has been archived by the owner on May 22, 2019. It is now read-only.

Create terms glossary for sourced.ml #271

Open
zurk opened this issue Jun 14, 2018 · 3 comments

Comments

@zurk
Contributor

zurk commented Jun 14, 2018

We constantly confuse these terms ourselves, so other developers must find them even harder to follow.
I do not want to make the glossary exhaustive, just to have a start.

Here is the list of terms to explain in the first iteration:

  1. Bag-of-words
  2. Weighted bag-of-words
  3. Model
  4. Algorithm
  5. Transformer
  6. Document
  7. Features
    1. identifier
    2. token
    3. literal
    4. graphlet

Googleable terms we may want to comment on:

  1. quantization
  2. TF-IDF
  3. topic
  4. co-occurrence matrix

@src-d/machine-learning please take a look and add any confusing terms you remember.

@r0mainK
Contributor

r0mainK commented Jun 14, 2018

If we're gonna define identifiers and tokens, might as well also add literals, graphlets and quantization. I think we could divide the glossary into:

  • terms that mean something more specific than would usually be the case, or that are vague to start with, e.g. model meaning a modelforge model, words in BOW being any feature extracted from a document, document meaning a repo, file or function, etc.
  • terms that we use the way they are intended but that may not be well known. Of course people have Google, but we might as well drop a couple of lines to explain each concept, e.g. COOC, quantization, topics, TF-IDF.
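
To make the BOW point concrete, here is a minimal sketch of a bag-of-words and a TF-IDF weighted bag-of-words where the "words" are arbitrary features (identifiers here) and the "documents" are repos. The document names, the features and the helper functions are all hypothetical, and the weighting (raw count times log(N / df)) is only one common TF-IDF variant; this is not taken from sourced.ml.

```python
import math
from collections import Counter

# Hypothetical toy data: each "document" (a repo here) is a list of
# extracted features (identifiers here). Not real sourced.ml output.
documents = {
    "repo_a": ["read", "file", "read", "buffer"],
    "repo_b": ["write", "file", "socket"],
    "repo_c": ["read", "socket", "socket", "socket"],
}


def bag_of_words(features):
    """Plain bag-of-words: feature -> number of occurrences."""
    return Counter(features)


def tfidf_weighted_bags(docs):
    """Weighted bag-of-words: feature -> count * log(N / document frequency).

    Assumption: this is one common TF-IDF variant, chosen for illustration.
    """
    n_docs = len(docs)
    bags = {name: bag_of_words(feats) for name, feats in docs.items()}
    # Document frequency: in how many documents each feature appears.
    doc_freq = Counter()
    for bag in bags.values():
        doc_freq.update(bag.keys())
    return {
        name: {feat: count * math.log(n_docs / doc_freq[feat])
               for feat, count in bag.items()}
        for name, bag in bags.items()
    }


weighted = tfidf_weighted_bags(documents)
```

A feature that occurs in only one document (like `buffer`) gets a higher weight per occurrence than one shared by several documents (like `read`), which is the whole point of the weighting.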

@vmarkovtsev
Collaborator

Linking to https:/src-d/apollo/blob/master/doc/GLOSSARY.md

@zurk
Contributor Author

zurk commented Jun 15, 2018

Thanks, @r0mainK, I have updated the description.
