How New York are you? [documentation]


“So, do you feel like a real New Yorker yet?”

How can a recent transplant possibly answer this question without sounding like as much of a jerk as the other recent transplant who just asked it? For the past six years, my go-to has been “fuck that, I’m from Chicago”—but as a wise friend recently advised me, if you don’t have anything nice to say, just respond with a number.

How New York are you? is a voice-controlled browser game where two players compete to be crowned the realest New Yorker. The computer volleys hot topic keywords from the past year, and each player will have one shot per topic to prove how aligned they are with most common New York opinions. The quicker and closer the response, the more points earned.

In order to make this game, I first used twint, a twitter-scraping python module, to gather tweets originating from New York during 2018 that were relevant to popular topics on Twitter this year. Then I used this corpora to train word2vec models for each topic using gensim.

When building my initial idea, I had uploaded word2vec models directly to the browser with tensorflowjs/some code stolen from ml5js, then used tensorflowjs’s tsne library to reduce the vectors to two dimensions for visualization (beware your array types when using this library!). However, these calculations proved to be too burdensome to perform before each game, so for the final iteration, I ended up doing the tsne reduction in python (adapting a script from Yuli Cai’s workshop last year)—then uploading the two dimensional vectors to the browser instead. On Gene’s suggestion, I plan to reduce the models to three dimensions instead, then reduce to two dimensions with tensorflowjs during gameplay, in order to get more accurate results.

I used Chrome’s Speech Synthesis API to announce the topic for each round, as well as their Speech Recognition API to capture each player’s responses (recognition.interimResults is everything). I hope to someday make a version for Firefox as well.

Once a player responds to a topic and the API transcribes the response, tensorflowjs calculates the distances between each word in their response and the original keyword, then averages the distances in order to calculate a final score for their turn. The longer the distance and slower the response, the lower the score.

d3js then plots the respective embeddings in the browser. At the end, if the winner’s score surpasses the tenth highest score in history, they can add their name to the high score board for eternal fame and glory.

Play the game here.

Leave a Reply

Your email address will not be published. Required fields are marked *