How New York are you? [documentation]

“So, do you feel like a real New Yorker yet?”

How can a recent transplant possibly answer this question without sounding like as much of an asshole as the other recent transplant who just asked it? For the past six years, my go-to has been “fuck that, I’m from Chicago”—but as a wise friend once advised me, if you don’t have anything nice to say, just respond with a number.

How New York are you? is a voice-controlled browser game where two players compete to be crowned the realest New Yorker. The computer volleys hot-topic keywords from the past year, and each player has one shot per topic to prove how aligned they are with the most common New York opinions. The quicker and closer the response, the more points earned.

To make this game, I first used twint, a Twitter-scraping Python module, to gather tweets originating from New York during 2018 that were relevant to popular topics on Twitter this year. Then I used these corpora to train a word2vec model for each topic using gensim.

When building my initial idea, I had uploaded word2vec models directly to the browser with tensorflowjs and some code stolen from ml5js, then used tensorflowjs’s tsne library to reduce the vectors to two dimensions for visualization (beware your array types when using this library!). However, these calculations proved too burdensome to perform before each game, so for the final iteration I did the tsne reduction in Python (adapting a script from Yuli Cai’s workshop last year), then uploaded the two-dimensional vectors to the browser instead. On Gene’s suggestion, I plan to reduce the models to three dimensions instead, then reduce to two dimensions with tensorflowjs during gameplay, in order to get more accurate results.

I used Chrome’s Speech Synthesis API to announce the topic for each round, as well as their Speech Recognition API to capture each player’s responses (recognition.interimResults is everything). I hope to someday make a version for Firefox as well.

Once a player responds to a topic and the API transcribes the response, tensorflowjs calculates the distances between each word in their response and the original keyword, then averages the distances in order to calculate a final score for their turn. The longer the distance and slower the response, the lower the score.
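Here is a rough Python sketch of that scoring step (the game itself does this in the browser with tensorflowjs). The toy two-dimensional vectors, the function names, the use of cosine distance, the `max_seconds` cutoff, and the way distance and time are multiplied together are all assumptions for illustration, not the game's actual formula.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: 0 means same direction, 2 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def turn_score(keyword, response_words, vectors, seconds, max_seconds=10.0):
    """Hypothetical score: closer words and faster answers earn more points."""
    known = [w for w in response_words if w in vectors]
    if not known:
        return 0.0  # nothing in the response is in the model's vocabulary
    distances = [cosine_distance(vectors[keyword], vectors[w]) for w in known]
    avg_dist = sum(distances) / len(distances)
    closeness = max(0.0, 1.0 - avg_dist / 2.0)     # map distance [0, 2] to [1, 0]
    speed = max(0.0, 1.0 - seconds / max_seconds)  # 1 at t=0, 0 at the cutoff
    return 100.0 * closeness * speed

# Made-up embeddings standing in for a trained word2vec model.
toy_vectors = {
    "subway": [1.0, 0.0],
    "delays": [0.9, 0.1],
    "beach": [0.0, 1.0],
}
fast_close = turn_score("subway", ["delays"], toy_vectors, seconds=2.0)
slow_far = turn_score("subway", ["beach"], toy_vectors, seconds=8.0)
```

With these toy numbers, the quick, close answer ("delays") scores well above the slow, distant one ("beach").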

d3js then plots the respective embeddings in the browser. At the end, if the winner’s score surpasses the tenth highest score in history, they can add their name to the high score board for eternal fame and glory.
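The high score rule can be sketched in a few lines of Python. The board layout (a list of `(name, score)` tuples) and the function names are hypothetical; the only thing taken from the game is the rule that a winner must beat the tenth-highest score.

```python
def qualifies(score, board, size=10):
    """True if the board has room or `score` beats the current `size`-th best."""
    if len(board) < size:
        return True
    cutoff = sorted((s for _, s in board), reverse=True)[size - 1]
    return score > cutoff

def add_to_board(name, score, board, size=10):
    """Return a new board (best first), adding the entry only if it qualifies."""
    entries = list(board)
    if qualifies(score, board, size):
        entries.append((name, score))
    return sorted(entries, key=lambda e: e[1], reverse=True)[:size]

# Made-up board with scores 10, 20, ..., 100.
board = [("player%d" % i, 10 * i) for i in range(1, 11)]
new_board = add_to_board("zoe", 55, board)
```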

NLP (neural aesthetic class notes)

skip-gram: predicts next/prev word(s) based on present word

CBOW: opposite of skip-gram; input is the surrounding context words, output is the word in the middle
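To make the two setups concrete, here is a small Python sketch that builds the training examples each one would see. CBOW is taken here to mean predicting a center word from the words around it; the function names and the `window` parameter are illustrative, and no actual model is trained.

```python
def skipgram_pairs(tokens, window=2):
    """(center, context) pairs: the center word is the input, each neighbor a target."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """(context_words, center) examples: the neighbors are the input, the center the target."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:
            examples.append((context, center))
    return examples

tokens = "the train is delayed again".split()
sg = skipgram_pairs(tokens, window=1)
cb = cbow_examples(tokens, window=1)
```

For the word "train" with a window of 1, skip-gram yields the pairs `("train", "the")` and `("train", "is")`, while CBOW yields the single example `(["the", "is"], "train")`.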

embedding size = possible relational directions

universal sentence encoder: colab, arxiv

hierarchical neural story generator (fairseq): repo

tracking the drift of words

wiki-tSNE: groups wikipedia articles by topic

python library wikipedia

  • text ="New  York University")

spacy: better than nltk? can parse entities, e.g. organizations (New York University), times (12pm), etc

Final Project Proposal

For my final project, I’d like to create a browser game in which two players compete to think of words as unrelated to each other as possible, as quickly as possible. The browser will keep score, which is determined by 1) the distance between two words as defined by word2vec models, and 2) the time it takes for the player to think of their word. The browser will also map the players’ words based on a tsne reduction of the word2vec model, in order to provide a visual indicator of performance.

Collect inspirations: How did you become interested in this idea? 

I love the idea of statistically analyzing text, and have really enjoyed building Markov models and training LSTMs in the past. Word2Vec is especially interesting because it’s able to map words semantically, and does this solely through the analysis of large text corpora. Depending on the dataset, visualizing these relationships can reveal a lot about how the source perceives the world.


Collect source material:

  1. text sources: Wikimedia dump, Google News (pre-trained word2vec), kanye tweets, wiki-tSNE for different topics (art movements, periods of history, celebrities, movies, etc)
  2. nltk + punkt to clean data, remove stop words
  3. gensim to train word2vec model
  4. tensorflowjs to calculate distance
  5. tensorflowjs tsne library to visualize
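A rough idea of what the cleaning step (item 2) could look like, written with only the Python standard library. The tiny stop list and the regex tokenizer are stand-ins for nltk's stopwords corpus and punkt tokenizer, and `clean_text` is a hypothetical helper name.

```python
import re

# Tiny hand-picked stop list standing in for nltk's stopwords corpus (assumption).
STOP_WORDS = {"a", "an", "and", "the", "is", "in", "of", "to", "on", "at", "rt"}

def clean_text(text):
    """Lowercase, drop URLs and @mentions, keep alphabetic tokens, remove stop words."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    tokens = re.findall(r"[a-z']+", text)
    return [t for t in tokens if t not in STOP_WORDS]

# Made-up example lines; real input would come from the text sources in item 1.
sentences = [clean_text(t) for t in [
    "The L train is delayed AGAIN https://example.com",
    "@mta fix the subway",
]]
```

The result is a list of token lists, which is the shape gensim's `Word2Vec` expects for its `sentences` argument.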


Collect questions for your classmates.

  • What should the title be?
  • Game features?
  • Text sources?

What are you unsure of? Conceptually and technically.

  • How to use pre-trained GloVe models with tensorflowjs/ml5?
  • Is this a fun game??

Class Notes:

  • show averages between words (as explanations)
  • narrative

Neural Aesthetic Class Notes wk8

limitations of feed-forward NNs:

  • static, does not change over time
  • does not take advantage of context
  • inputs and outputs are fixed length

sequence to sequence: language translation

unit to sequence: image captioning

skip-thought vectors: arbitrary sequences of words (image to story)

dense captioning: multiple captioning for objects within images

text to image (stackGAN):

Neural Aesthetic w6 class notes

  • Generative models synthesize new samples that resemble the training data
    • applications: visual content generation, language models (chatbots, assistants, duplexes), music, etc
    • Models the probability distribution of all possible images; images that look like the dataset have a high probability
  • PCA projects down in lower dimensions and back out
  • latent space: space of all possible generated outputs
  • later layers can be used as a feature extractor because it is a compact but high-level representation
    • distance calculations between feature extractors can be used to determine similarities between images
    • transfer learning
    • can use PCA to reduce redundancies, then calculate distances
      • images (points) can then be embedded in feature space
        • vectors between points imply relationships
  • Autoencoders reconstruct their inputs as their outputs; the network learns an essential representation of the data via compression through a small middle layer
    • first half encoder, second half decoder
    • can throw in labels for a conditional distribution
    • can encode images and get their latent representation to project outward
      • smile vector
  • GANs: circa 2014
    • hard to train
    • hard to evaluate
    • can’t encode images directly
    • structured like a decoupled autoencoder
      • generator > discriminator
        • generator: basically like the decoder in an autoencoder
          • takes in random numbers, not images
          • tries to create images to trick the discriminator into thinking they’re real
        • discriminator: takes in an input image (from generator), decides if it is real or fake
        • “adversarial”: trained to work against each other
  • DC GANs
    • unsupervised technique, but can give it labels
      • interpolations through latent space AND through labels
        • labels are one hot vectors
        • MNIST: glyph between integers
  • Deep Generator Network
    • similar to deep dream; optimizes an output image to maximize a class label
  • Progressively grown GANs
    • super high res, super realistic
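The latent-space ideas in these notes (interpolation, the "smile vector") come down to simple vector arithmetic. Below is a minimal Python sketch using made-up three-dimensional latent codes; in practice the codes would come from a trained encoder and be decoded back into images, neither of which is shown here, and all names are illustrative.

```python
def lerp(z_a, z_b, t):
    """Linear interpolation between two latent vectors: t=0 gives z_a, t=1 gives z_b."""
    return [(1 - t) * a + t * b for a, b in zip(z_a, z_b)]

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def attribute_vector(with_attr, without_attr):
    """Difference of mean latent codes, e.g. smiling faces minus neutral faces."""
    m_with, m_without = mean_vector(with_attr), mean_vector(without_attr)
    return [a - b for a, b in zip(m_with, m_without)]

# Made-up latent codes standing in for encoder outputs.
smiling = [[1.0, 0.2, 0.0], [0.8, 0.4, 0.2]]
neutral = [[0.0, 0.2, 0.0], [0.2, 0.4, 0.2]]

smile = attribute_vector(smiling, neutral)
some_face = [0.1, 0.9, 0.5]
smiling_face = [z + s for z, s in zip(some_face, smile)]  # push a code along the smile direction
midpoint = lerp(smiling[0], neutral[0], 0.5)              # halfway between two codes
```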

neural aesthetic class notes


  • data points are combinations inside feature space
  • Embeddings give us relationships between data points (closer points are more similar)
  • magnitude and direction have meaning, allow many basic retrieval applications
  • feature vectors and latent spaces are examples of embeddings
  • two vectors between two pairs of points have meaning

features are patterns of activations

  • every layer becomes more abstract / more specific to the categories: edges, parallel lines, shapes, categories
  • last layer of activations; distance or correlation between vectors

transfer learning with images

  • dimensionality reduction; tries to preserve geometries
  • linearly-independent components


  • man>woman; country>capital; singular>plural
  • words are units; sentences are infinite—sentences and paragraphs can be embedded in feature space
    • word vectors are learned implicitly
    • question-inversion vector

principal component analysis to reduce

t-SNE: better for visualization and for discovering similar neighbors, but suited to smaller datasets

Neural Aesthetic W3 Class Notes

convolutional neural networks allow you to look for patterns anywhere in the image

Measuring cost

  • look at the shape of the loss function for all combinations of m and b
  • bottom of the “bowl” is the best fit
  • gradient descent
    • start at a random point
    • calculate its gradient (generalization of a slope in multiple dimensions; which direction is the slope going?)
    • go down the gradient until the loss stops decreasing
  • gradient descent for NN
    • backpropagation
    • calculate gradient using chain rule
    • relates the gradient to the individual activations, error is distributed to the weights
    • problem: local minima; no way of finding the global minimum (“batch gradient descent” is not used because of this)
      • how to deal:
        • calculate the gradient on subsets of the dataset: stochastic gradient descent, mini-batch gradient descent
        • momentum: able to roll out of a local minimum to the next
          • Nesterov momentum
        • adaptive methods: AdaGrad, AdaDelta, RMSprop, Adam (when in doubt, use Adam)
    • overfitting
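As a small worked example of the steps above, here is gradient descent with momentum fitting a line y = mx + b in plain Python. The data, learning rate, momentum value, and step count are made up; the loss is assumed to be mean squared error; and the loop runs for a fixed number of steps from a fixed starting point rather than starting randomly and stopping when the loss stops decreasing.

```python
def mse_loss(m, b, xs, ys):
    """Mean squared error of the line y = m*x + b on the data."""
    return sum((m * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(m, b, xs, ys):
    """Partial derivatives of the mean squared error with respect to m and b."""
    n = len(xs)
    dm = sum(2 * (m * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (m * x + b - y) for x, y in zip(xs, ys)) / n
    return dm, db

def fit(xs, ys, lr=0.05, momentum=0.9, steps=2000):
    """Gradient descent with classical momentum on m and b."""
    m, b = 0.0, 0.0    # fixed start so the run is reproducible
    vm, vb = 0.0, 0.0  # velocity terms that carry momentum between steps
    for _ in range(steps):
        dm, db = gradient(m, b, xs, ys)
        vm = momentum * vm - lr * dm
        vb = momentum * vb - lr * db
        m, b = m + vm, b + vb
    return m, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
m, b = fit(xs, ys)
```

This loss surface is a single bowl, so there are no local minima to escape; the momentum term is only there to show where it enters the update.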

Neural Aesthetic W2 Class Notes

Features are:

  1. patterns in data
  2. implicit
  3. indicative of salient aspects of objects
  4. closely related to bias


  • Linear regression doesn’t give much flexibility; you can give a neuron more by passing its output through a non-linearity, e.g. a sigmoid function
    • ReLU (rectified linear unit) is preferred over a sigmoid function
  • adding a hidden layer gives y (the output) even more flexibility
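A minimal Python sketch of the points above: a single neuron is a weighted sum plus a bias passed through a non-linearity, and a hidden layer is just several such neurons feeding another one. The weights, biases, and function names are made up for illustration; nothing here is trained.

```python
import math

def sigmoid(z):
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Rectified linear unit: passes positive values, zeroes out negative ones."""
    return max(0.0, z)

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through a non-linearity."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

def tiny_network(x, hidden, output):
    """One hidden layer of ReLU neurons feeding a single sigmoid output neuron."""
    h = [neuron(x, w, b, relu) for w, b in hidden]
    w_out, b_out = output
    return neuron(h, w_out, b_out, sigmoid)

hidden = [([1.0, -1.0], 0.0), ([-1.0, 1.0], 0.0)]  # made-up (weights, bias) pairs
output = ([1.0, 1.0], -0.5)
y = tiny_network([2.0, 0.5], hidden, output)
```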

Convolutional NNs: scans for certain patterns throughout the entire image

activation= value of the neuron

weights on the connections