For my final project, I’d like to create a browser game in which two players compete to think of words that are as unrelated to each other as possible, as quickly as possible. The browser will keep score, which is determined by 1) the distance between the two words’ vectors in a word2vec model, and 2) the time it takes each player to think of their word. The browser will also plot the players’ words using a t-SNE reduction of the word2vec model, in order to provide a visual indicator of performance.
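As a rough sketch of how the two scoring ingredients might combine, here is one possibility in Python. The cosine distance is a common choice for word vectors, but the multiplicative time factor, the `time_limit` parameter, and the names `cosine_distance` and `round_score` are all assumptions made for illustration, not a settled design. The vectors are made-up toy values rather than real word2vec output.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length vectors: 1 - cos(angle)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def round_score(vec_a, vec_b, seconds, time_limit=10.0):
    """Hypothetical scoring rule: word distance scaled by the time left.

    time_limit is an assumed per-round limit in seconds. Answering
    instantly keeps the full distance; answering at or after the limit
    scores zero.
    """
    time_factor = max(0.0, 1.0 - seconds / time_limit)
    return cosine_distance(vec_a, vec_b) * time_factor


# Toy 3-dimensional vectors, invented for illustration only.
word_a = [1.0, 0.0, 0.0]
word_b = [0.0, 1.0, 0.0]

# Orthogonal vectors have distance 1.0; half the time used leaves half of it.
print(round_score(word_a, word_b, seconds=5.0))
```

A multiplicative rule means a slow answer can wipe out a great word pair. An additive rule (distance plus a small speed bonus) would be gentler, and which one is more fun is something to find out by playing.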
Collect inspirations: How did you become interested in this idea?
I love the idea of statistically analyzing text, and have really enjoyed building Markov models and training LSTMs in the past. word2vec is especially interesting because it’s able to map words semantically, and does this solely by analyzing large text corpora. Depending on the dataset, visualizing these relationships can reveal a lot about how the source perceives the world.
Collect source material:
- text sources: Wikimedia dump, Google News (pre-trained word2vec), Kanye tweets, wiki-tSNE for different topics (art movements, periods of history, celebrities, movies, etc.)
- nltk + punkt to tokenize and clean the data and remove stop words
- gensim to train word2vec model
- TensorFlow.js to calculate the distance between word vectors in the browser
- TensorFlow.js t-SNE library to visualize
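The cleaning step in the list above could look roughly like this. To keep the sketch runnable without extra installs, it uses only the standard library: the regex sentence split stands in for nltk's punkt tokenizer, and `STOP_WORDS` is a tiny made-up stand-in for nltk's much longer English stop word list. The function name `clean` is also just illustrative.

```python
import re

# Tiny stand-in stop word list (assumption); nltk's English list is far longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "to", "in"}


def clean(text):
    """Split text into sentences, lowercase, tokenize, and drop stop words.

    Returns a list of sentences, each a list of word tokens.
    """
    # Naive sentence split on ., ! and ? (punkt handles abbreviations better).
    sentences = re.split(r"[.!?]+", text)
    cleaned = []
    for sentence in sentences:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        tokens = [t for t in tokens if t not in STOP_WORDS]
        if tokens:  # skip sentences that end up empty
            cleaned.append(tokens)
    return cleaned


corpus = clean("The cat sat in the hat. A dog is barking!")
print(corpus)  # [['cat', 'sat', 'hat'], ['dog', 'barking']]
```

A list of token lists like `corpus` is the shape of input that gensim's `Word2Vec` accepts for training, so the output of this step should be able to feed the next one directly.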
Collect questions for your classmates.
- What should the title be?
- Game features?
- Text sources?
What are you unsure of? Conceptually and technically.
- How can I use pre-trained GloVe models with TensorFlow.js/ml5.js?
- Is this a fun game??
- Should the game show averages between words (as explanations)?
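On the pre-trained GloVe question, one possible route (untested in this project) is to convert the vectors to JSON ahead of time and let the browser fetch that file. The plain-text GloVe files store one word per line followed by its space-separated vector values, so the conversion can be sketched with the standard library alone. The function name `glove_lines_to_json` and the optional `vocabulary` filter are assumptions for illustration, and the sample lines are invented rather than real GloVe values.

```python
import json


def glove_lines_to_json(lines, vocabulary=None):
    """Convert GloVe-style text lines ("word v1 v2 ...") to a JSON string.

    vocabulary is an optional set of words to keep, which helps shrink
    the file before sending it to the browser.
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], parts[1:]
        if vocabulary is not None and word not in vocabulary:
            continue
        vectors[word] = [float(v) for v in values]
    return json.dumps(vectors)


# Invented sample lines in the GloVe text layout, not real vectors.
sample = ["cat 0.1 0.2 0.3", "dog 0.4 0.5 0.6"]
payload = glove_lines_to_json(sample, vocabulary={"cat"})
print(payload)
```

In practice `lines` would be an open file handle over the downloaded vectors. Filtering to a game-sized vocabulary matters because the full pre-trained files are large, likely too large to load comfortably in a browser.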