limitations of feed-forward NNs:
- static: no internal state carried over from one input to the next
- does not take advantage of context (e.g. earlier words in a sentence)
- inputs and outputs are fixed length
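A minimal sketch of the contrast, in plain Python with no ML library. The function names (`feed_forward`, `recurrent_step`, `run_recurrent`) and the hand-picked weights are hypothetical, chosen only for illustration; nothing is trained. The dense layer rejects any input of the wrong length and keeps no memory between calls, while the recurrent loop reuses the same weights at every step and carries a hidden state forward, so it accepts sequences of any length and is sensitive to order.

```python
import math

def feed_forward(x, weights):
    """One dense layer: input length is fixed by the weight matrix; no state is kept."""
    if len(x) != len(weights[0]):
        raise ValueError("feed-forward layer expects a fixed input length")
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

def recurrent_step(x_t, h, w_in, w_rec):
    """One recurrent step on a scalar input: new state mixes the input with the previous state."""
    return [
        math.tanh(w_in[i] * x_t + sum(w_rec[i][j] * h[j] for j in range(len(h))))
        for i in range(len(h))
    ]

def run_recurrent(sequence, w_in, w_rec):
    """Apply the same step to every element, carrying the hidden state along."""
    h = [0.0] * len(w_in)  # initial state: no context yet
    for x_t in sequence:
        h = recurrent_step(x_t, h, w_in, w_rec)
    return h

# Hypothetical hand-picked weights (assumption: illustration only, not trained).
W_FF = [[0.5, -0.3, 0.8], [0.1, 0.4, -0.6]]   # 3 inputs -> 2 outputs
W_IN = [0.7, -0.2]                            # scalar input -> 2 hidden units
W_REC = [[0.1, 0.5], [-0.4, 0.3]]             # hidden -> hidden

ff_out = feed_forward([1.0, 2.0, 3.0], W_FF)                          # only length 3 works
short_state = run_recurrent([1.0, 0.5], W_IN, W_REC)                  # length 2
long_state = run_recurrent([1.0, 0.5, -1.0, 2.0, 0.0], W_IN, W_REC)   # length 5
swapped_state = run_recurrent([0.5, 1.0], W_IN, W_REC)                # same values, other order
```

`short_state` and `swapped_state` differ even though the inputs hold the same values, because each step depends on the state left by the previous one.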
sequence to sequence: language translation
unit to sequence: image captioning
skip-thought vectors: fixed-length encodings of arbitrary sequences of words (used for image to story)
dense captioning: multiple captions, one for each object within an image
text to image (StackGAN): https://arxiv.org/abs/1612.03242