How New York are you? [documentation]

“So, do you feel like a real New Yorker yet?”

How can a recent transplant possibly answer this question without sounding like as much of an asshole as the other recent transplant who just asked it? For the past six years, my go-to has been “fuck that, I’m from Chicago”—but as a wise friend once advised me, if you don’t have anything nice to say, just respond with a number.

How New York are you? is a voice-controlled browser game in which two players compete to be crowned the realest New Yorker. The computer volleys hot-topic keywords from the past year, and each player gets one shot per topic to prove how aligned they are with the most common New York opinions. The quicker and closer the response, the more points earned.

To make this game, I first used twint, a Twitter-scraping Python module, to gather 2018 tweets originating from New York that were relevant to the year's popular topics on Twitter. I then used this corpus to train a word2vec model for each topic with gensim.

For my initial prototype, I loaded word2vec models directly into the browser with tensorflowjs (plus some code stolen from ml5js), then used tensorflowjs's tsne library to reduce the vectors to two dimensions for visualization (beware your array types when using this library!). However, these calculations proved too burdensome to perform before each game, so for the final iteration I did the tsne reduction in Python (adapting a script from Yuli Cai's workshop last year) and uploaded the two-dimensional vectors to the browser instead. On Gene's suggestion, I plan to reduce the models to three dimensions offline instead, then reduce to two dimensions with tensorflowjs during gameplay, in order to get more accurate results.

I used Chrome's Speech Synthesis API to announce the topic for each round, and its Speech Recognition API to capture each player's responses (recognition.interimResults is everything). I hope to someday make a version for Firefox as well.

Once a player responds to a topic and the API transcribes the response, tensorflowjs calculates the distance between each word in the response and the original keyword, then averages those distances to produce a final score for the turn. The greater the distance and the slower the response, the lower the score.
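The scoring step above can be sketched in plain JavaScript. This is a hypothetical reconstruction, not the game's actual code (which uses tensorflowjs), and the function names and the scoring constants are made up; it averages the cosine distance from each response word's vector to the keyword's vector, then penalizes slow answers.

```javascript
// Cosine distance between two embedding vectors: 0 = identical direction, 1 = orthogonal.
function cosineDistance(a, b) {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

// Average the distance from each response word to the keyword,
// then subtract a time penalty: closer + faster = higher score.
function scoreTurn(keywordVec, responseVecs, seconds) {
  const avg = responseVecs
    .map(v => cosineDistance(keywordVec, v))
    .reduce((sum, d) => sum + d, 0) / responseVecs.length;
  return Math.max(0, 100 * (1 - avg) - seconds);
}
```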

d3js then plots the respective embeddings in the browser. At the end of the game, if the winner's score surpasses the tenth-highest score in history, they can add their name to the high-score board for eternal fame and glory.
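The high-score check could look something like the sketch below (hypothetical names; the real game's storage and UI are not shown): a new score qualifies if the board isn't full yet or it beats the tenth-highest entry.

```javascript
// Keep the board sorted descending and capped at ten entries.
function updateHighScores(board, name, score) {
  const qualifies = board.length < 10 || score > board[board.length - 1].score;
  if (!qualifies) return board;
  return [...board, { name, score }]
    .sort((a, b) => b.score - a.score)
    .slice(0, 10);
}
```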

Final Project Proposal

For my final project, I'd like to create a browser game in which two players compete to think of words as unrelated to each other as possible, as quickly as possible. The browser keeps score, which is determined by 1) the distance between the two words as defined by word2vec models, and 2) the time it takes each player to think of their word. The browser also maps the players' words using a tsne reduction of the word2vec model, providing a visual indicator of performance.

Collect inspirations: How did you become interested in this idea? 

I love the idea of statistically analyzing text, and have really enjoyed building Markov models and training LSTMs in the past. Word2Vec is especially interesting because it's able to map words semantically, and does this solely through the analysis of large corpora. Depending on the dataset, visualizing these relationships can reveal a lot about how the source perceives the world.

 

Collect source material:

  1. text sources: Wikimedia dump, Google News (pre-trained word2vec), Kanye tweets, wiki-tSNE for different topics (art movements, periods of history, celebrities, movies, etc)
  2. nltk + punkt to clean data, remove stop words
  3. gensim to train word2vec model
  4. tensorflowjs to calculate distance
  5. tensorflowjs tsne library to visualize

 

Collect questions for your classmates.

  • What should the title be?
  • Game features?
  • Text sources?

What are you unsure of? Conceptually and technically.

  • How to use pre-trained GloVe models with tensorflowjs/ml5?
  • Is this a fun game??

Class Notes:

  • show averages between words (as explanations)
  • narrative
  • https://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon

async await

using promises:

something()
  .then(response => {
    return another(response.id);
  })
  .then(response => {
    console.log("another promise");
  })
  .catch(error => console.error(error));

async await version:

async function blah() {
  const response1 = await something();
  const response2 = await another(response1.id);
}
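A runnable version of the two styles above, using stand-in async functions (something and another are hypothetical placeholders that resolve immediately):

```javascript
// Stand-ins for real async operations (e.g. network calls).
const something = () => Promise.resolve({ id: 42 });
const another = id => Promise.resolve("got " + id);

// Promise-chain style:
something()
  .then(response => another(response.id))
  .then(result => console.log(result))
  .catch(error => console.error(error));

// async/await style: same sequence, but reads top-to-bottom.
async function blah() {
  const response1 = await something();
  const response2 = await another(response1.id);
  return response2;
}
```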

A2Z W3 Class Notes

Some javascript functions take regex

paragraph.match(/quick/g);

replace() + regex + callback: https://github.com/shiffman/A2Z-F18/blob/master/week2-regex/08_replace_with_callback/sketch.js
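A minimal sketch of the replace-with-callback pattern linked above (sample text is made up): the callback is invoked once per regex match, and its return value is spliced into the result.

```javascript
// replace() with a global regex and a callback.
const sentence = "the quick brown fox";
const shouted = sentence.replace(/\w+/g, word => word.toUpperCase());
console.log(shouted); // "THE QUICK BROWN FOX"
```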

 

fetch(url)
.then(response => response.json())
.then(data => console.log(data))
.catch(error => console.error(error))

 

CORS workaround: cors-anywhere https://github.com/Rob--W/cors-anywhere

Hello, Computer Week 1 / A2Z Week 2 Homework

https://xujenna.github.io/a2z/wk2/index.html

For this week's homework, I decided to rebuild a Markov model with RiTa.js that I had previously created in Python with markovify and NLTK. This time, it would respond (loosely) to a user's input, and speak with a voice via the Web Speech API.

I had initially experimented with Markov models in Python because I had the idea to create a sort of self-care assistant as the final phase of my mood prediction project, and had dreams of it being this omnipotent, omnipresent keeper. While I have yet to figure out how to implement such a presence, I did have an idea of what I wanted it to sound like: a mixture of the exercises in Berkeley's Greater Good in Action, NY Mag's Madame Clairevoyant, and Oprah. I assembled corpora for each of these personalities manually.

It was incredibly easy to build this Markov model with RiTa, and the results were surprisingly coherent; with markovify, it was necessary to POS-tag the text with NLTK in order to force some semblance of grammar into the model. However, RiTa didn't seem to offer a native option for seed text. To make the model responsive to a user's input, I used RiTa's KWIC model to gather all of the sentences from the source text containing each stemmed word of the input, then loaded what the KWIC returned back into the Markov model as an additional, highly weighted source. The resulting generated text was consistent enough in making subtle reference to the user's input.
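The KWIC idea described above can be sketched in plain JavaScript. This is not RiTa's API, and the sentence splitting here is deliberately naive; it just illustrates the step of collecting every source sentence that contains a target word, so those sentences can be fed back into the Markov model as a weighted source.

```javascript
// Return every sentence of sourceText that contains the target word.
function kwicSentences(sourceText, word) {
  const target = word.toLowerCase();
  return sourceText
    .split(/(?<=[.!?])\s+/) // naive sentence split on ., !, ? followed by whitespace
    .filter(sentence =>
      sentence.toLowerCase().split(/\W+/).includes(target));
}
```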

The last step was to feed the Markov model's response into the speech synthesizer, which was pretty straightforward; the creepy, male, pixelated voice gives this experience the uncanny feeling that every divine being deserves.

a2z wk2 class notes

New way of loading data (JSONs) to avoid callback hell:

fetch(url).then(gotData).catch(handleError);

async await for sequential execution, avoids promise hell

() => replacement for anonymous function

=> for one line of code

button.mousePressed(() => background(255,0,0));

loadJSON('data.json', data => console.log(data));

for…of loop:

for (let word of words) {
  let span = createSpan(word);
  span.mouseOver(() => span.style("background-color", "red"));
}

 

REGEX

  • Word character: \w (all words: \w+ with the g flag)
  • Match beginning of a line: ^
  • Match first word of a line: ^\w+
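The patterns above in action (the sample text is made up; the m flag makes ^ match at the start of every line):

```javascript
const text = "the quick brown fox\njumps over the lazy dog";
const words = text.match(/\w+/g);        // every word in the text
const firstWords = text.match(/^\w+/gm); // first word of each line
console.log(firstWords); // the first word of each line: "the" and "jumps"
```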