python for basic data analysis




data_file1.merge(data_file2, how="outer") keeps every row from both data files and assigns null values (NaN) wherever the two are misaligned

merge all data to find correlations
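A minimal sketch of that outer merge in pandas, assuming two hourly exports that share an `hour` column. The frames, column names, and values are made up for illustration and are not real tracker data.

```python
import pandas as pd

# Hypothetical hourly exports from two trackers (made-up values).
mood = pd.DataFrame({
    "hour": ["09:00", "10:00", "11:00", "12:00"],
    "mood_score": [3, 4, 2, 5],
})
productivity = pd.DataFrame({
    "hour": ["10:00", "11:00", "12:00", "13:00"],
    "productivity": [55, 70, 40, 80],
})

# how="outer" keeps every hour found in either frame; an hour missing
# from one side gets NaN in that side's columns.
merged = mood.merge(productivity, on="hour", how="outer")

# DataFrame.corr() ignores NaN pairwise, so only overlapping hours count.
correlations = merged[["mood_score", "productivity"]].corr()

print(merged)
print(correlations)
```

With so few overlapping hours the correlation is not meaningful; the point is only that misaligned hours survive the merge as nulls instead of being dropped.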





This semester, I began work on a system of trackers for a whole host of potential/evidenced metrics of depression, in hopes of monitoring its cyclical nature and identifying correlations with my activity and environment. Because I had done a lot of prior research, there were specific metrics that I had in mind, but oftentimes appropriate apps were either only available for iOS, didn’t provide an API, didn’t track with enough granularity, or didn’t exist at all.

Being a grad student, I have not the funds for an iPhone (new or old), and so I decided to put my newly acquired python skills to the test.


Data collection with homemade trackers:

  1. Mood Reporter: Because affect is difficult to measure, psychiatry traditionally employs self-administered questionnaires as diagnostic tools for mood disorders; these usually attempt to quantify the severity of DSM-IV criteria. The module for depression is called the PHQ-9, and I’ve adapted several of its questions into my own questionnaire, which python deploys every hour via the command line:

    The responses are then appended to a tsv:
  2. Productivity: via python and the RescueTime API, my productivity score is appended to a json every hour:
  3. Facial analysis: Via my laptop’s webcam, the Affectiva API analyses my face for a minute every hour; all its responses are saved to a JSON file. My Python script grabs the min and max attention and valence values, as well as the expressions made (plotted with emoji) and the number of times I blinked (calculated by counting how many times the eyeClosure variable hit 99.9% and dividing by 2). These calculations are then appended to another JSON file that feeds into my visualization. The final entry for each hour looks like this:

  4. Keylogger Sentiment Analysis: The idea for this is simply to discern the sentiment of everything I type. I wrote a keylogger in python, which collects any coherent phrase to be sent to IBM Watson’s Tone Analyzer every hour. The response looks like this:

    The API provides several sentiment categories: joy, confidence, analysis, tentativeness, sadness, fear, and anger.
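A rough sketch of the facial-analysis summary from item 3, assuming the per-frame readings arrive as a list of dicts. The key names (`attention`, `valence`, `eyeClosure`) and the sample values are assumptions for illustration; the actual Affectiva response may be shaped differently.

```python
def summarize_frames(frames):
    """Reduce one minute of per-frame readings to a single hourly entry.

    Each frame is a dict with hypothetical keys "attention", "valence",
    and "eyeClosure" (a percentage from 0 to 100).
    """
    attention = [f["attention"] for f in frames]
    valence = [f["valence"] for f in frames]
    # Count the readings where eyeClosure hit 99.9% or more, then halve
    # that count to estimate blinks, as described in the text.
    closed = sum(1 for f in frames if f["eyeClosure"] >= 99.9)
    return {
        "attention_min": min(attention),
        "attention_max": max(attention),
        "valence_min": min(valence),
        "valence_max": max(valence),
        "blinks": closed // 2,
    }

# Made-up readings, only to exercise the function.
frames = [
    {"attention": 80.0, "valence": -5.0, "eyeClosure": 0.0},
    {"attention": 95.0, "valence": 10.0, "eyeClosure": 99.9},
    {"attention": 60.0, "valence": 2.0, "eyeClosure": 100.0},
    {"attention": 70.0, "valence": -20.0, "eyeClosure": 3.0},
    {"attention": 90.0, "valence": 15.0, "eyeClosure": 99.95},
    {"attention": 85.0, "valence": 0.0, "eyeClosure": 99.9},
]
summary = summarize_frames(frames)
print(summary)
```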


The Dashboard:

In order to understand any of this data, I would need to create a dashboard. What was important to me was to create an environment where potential correlations could be seen; since much of this is speculative, this basically meant doing a big data dump into the browser. I visualized everything in d3js.

My local dashboard has access to the hourly updated data, which is unbelievably satisfying; the public version has about 2.5 weeks’ worth.


Next steps:

I’m in the process of building yet another tracker: a Chrome extension which will record my tab/window activity (the amount of which is probably/definitely positively correlated with stress and anxiety in my life!).

I would also like to add a chart that allows me to compare the trendlines of all the metrics, as a preliminary attempt to guess at correlations. This will definitely require me to do a lot of data reformatting.
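One possible shape for that reformatting, sketched under the assumption that each metric has already been reduced to one value per hour. Rescaling every series to the 0 to 1 range lets trendlines with very different units share a chart, and a Pearson coefficient gives a first guess at correlation. The metric names and numbers are made up.

```python
from math import sqrt

def min_max_scale(values):
    """Rescale a series to the range 0..1 so trendlines are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # flat series: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]

def pearson(xs, ys):
    """Pearson correlation of two equal-length series with nonzero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical hourly values for two metrics (made-up numbers).
valence = [10, 20, 30, 40]
tabs_open = [40, 30, 25, 5]

scaled_valence = min_max_scale(valence)
scaled_tabs = min_max_scale(tabs_open)
r = pearson(valence, tabs_open)
print(scaled_valence, scaled_tabs, r)
```

Min-max scaling does not change the Pearson coefficient, so the scaled series can be used for plotting while the correlation is computed on either version.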

I also need to visualize the data from the tracking apps I did download (Google Fit and …), and include other environmental information like weather, calendar events, etc.

Honestly, I will probably be working on this for the rest of my life lol

neuroscience/eeg lecture at 6 metrotech

external input is processed in occipital lobe

brain waves are sub-threshold activity in cells; some are action potentials

action potentials: from neurotransmitters

subthreshold potentials: smaller change in voltage in neuron

aggregate signal is picked up by EEG

closed eyes > depriving brain of inputs > alpha waves (higher amplitude because neurons are firing together with smaller range of frequencies > signal adds up)

open eyes > external stimuli > different neurons, inputs at different times > activity at a range of frequencies > lower amplitude > beta waves
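A toy illustration of why the signal "adds up" in one case and not the other. This is not a model of real EEG: each "neuron" is just a unit sine wave, and the neuron count, frequency bands, and phases are assumptions chosen to mirror the notes (a narrow band with aligned phases versus a wide band with random phases).

```python
import math
import random

def peak_amplitude(n_neurons, freq_range, random_phase, seed=0,
                   duration=1.0, sample_rate=500):
    """Peak of the summed signal, divided by the number of neurons."""
    rng = random.Random(seed)
    freqs = [rng.uniform(*freq_range) for _ in range(n_neurons)]
    phases = [rng.uniform(0, 2 * math.pi) if random_phase else 0.0
              for _ in range(n_neurons)]
    peak = 0.0
    for i in range(int(duration * sample_rate)):
        t = i / sample_rate
        # The aggregate signal is the sum of every neuron's oscillation.
        total = sum(math.sin(2 * math.pi * f * t + p)
                    for f, p in zip(freqs, phases))
        peak = max(peak, abs(total))
    return peak / n_neurons

# Eyes closed: narrow band around 10 Hz, neurons oscillating together.
synchronized = peak_amplitude(200, (9.5, 10.5), random_phase=False)
# Eyes open: wide band (13-30 Hz), inputs arriving at different times.
desynchronized = peak_amplitude(200, (13.0, 30.0), random_phase=True)

print(synchronized, desynchronized)
```

The synchronized sum peaks near the full per-neuron amplitude, while the desynchronized sum mostly cancels itself out, which is the intuition behind the higher-amplitude alpha waves.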


Hacking the Browser W5 HW

For my Hacking the Browser final, I would like to create a Chrome extension that can monitor my browser activity (to add to my suite of trackers) and produce hourly values for:

  1. the total number of tabs open at the end of the hour
    • chrome.tabs.query(object queryInfo, function callback)
  2. the total number of windows open at the end of the hour
    • getInfo, function callback)
  3. the total number of tabs opened during the hour
    • chrome.tabs.onCreated.addListener(function callback)
  4. the total number of windows opened during the hour
    • callback)
  5. the total number of tabs looked at during the hour
    • chrome.tabs.onActivated.addListener(function callback)
  6. the favicon from every updated tab
    • chrome.tabs.onUpdated.addListener(function callback)
    • tab.favIconUrl (this requires the “tabs” permission)

I believe I’ll only require a background script for this project, as I won’t be inserting any code into the pages I visit, and won’t need a browser or page action. The difficult part will be figuring out how to access the data every hour. There must be an easier way, but my only idea at the moment is to do an AJAX post to MongoDB….
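One alternative to MongoDB, sketched here as an assumption and not a tested part of the extension: the background script could POST its hourly JSON to a small server running on the same laptop, which appends each payload to a local file next to the other trackers' data. The port, file name, and payload fields below are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

LOG_PATH = "browser_activity.jsonl"  # hypothetical output file

def store_record(body, path=LOG_PATH):
    """Parse a JSON payload and append it to `path` as one line."""
    record = json.loads(body)
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record

class ActivityHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            store_record(self.rfile.read(length))
            self.send_response(204)  # stored, nothing to send back
        except ValueError:
            self.send_response(400)  # body was not valid JSON
        self.end_headers()

def serve(port=8765):
    """Listen on localhost only; the port number is arbitrary."""
    HTTPServer(("127.0.0.1", port), ActivityHandler).serve_forever()
```

Calling `serve()` starts the listener and blocks until interrupted. The extension side would then send something like `{"tabs_open": 12, "windows_open": 2}` once an hour, and the dashboard could read the resulting file line by line.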

Impossible Maps W4 HW

Feminist data viz notes:

  • Feminist standpoint theory: all knowledge is socially situated; the perspectives of oppressed groups (women, minorities, etc) are systematically excluded from “general” knowledge
  • Feminist data viz could:
    1. invent new ways to represent uncertainty, outsides, missing data, and flawed methods
      • can we collect and represent data that was never collected?
      • can we find the population that was excluded?
      • can we critically examine the methods of study rather than accepting the JSON as is?
    2. invent new ways to reference the material economy behind the data
      • what are the conditions that make data viz possible?
      • who are the funders?
      • who collected the data?
      • who are the interested parties/stakeholders behind the data?
    3. make dissent possible
      • data viz = stable images/facts
      • re-situate data viz by destabilizing, ie making dissent possible
        • how can we talk back to the data?
        • how can we question the facts?
        • how can we present alternative views and realities?

Representation and the Necessity of Interpretation notes:

  • satellite images were military secrets until only recently
  • in 2000, The New York Times for the first time used the newly available Ikonos satellite “as a sort of alternative investigative journalist in Chechnya” but “failed to arouse public sympathy or outrage”; however, before/after images have still become commonplace in reporting from zones of conflict
  • Sept 1999: Space Imaging launched Ikonos, the first satellite to make high-res image data publicly available
  • We need to be alert to what is being highlighted and pointed toward, to the ways in which satellite evidence is used in making assertions and arguments; for every image, we should be able to inquire about its technology, location data, ownership, legibility, and source



  • I never realized that satellite imagery was born from the agenda of the US military, yet it’s not surprising. What struck me most from the latter reading was learning that Colin Powell used satellite images as incontrovertible proof that there were weapons of mass destruction in Iraq—I don’t think you can get a much better example of “interpreted data”.
  • One year later, in 2003, Ross McNutt’s team put a 44-megapixel camera on a small plane to watch over Fallujah, Iraq. Its images were high-def enough to track the sources of roadside bombs, and it was on all day, every day. After the war, Ross piloted this technology in Dayton, Ohio, as a way for the local police to identify criminals and gang members.
  • When I first heard this story, I didn’t feel too conflicted about it: bad guys were being caught and brought to justice, so what’s the problem here? However, after reading Laura Kurgan’s chapter on representation and interpretation, it now feels like Ross was just thinking locally about persecuting people of color, especially considering that a program like his would only be implemented in larger urban areas, ie where most minorities live.


Final project idea:

  • I’d like to download my location history from Google, and visualize it to get a sense of my navigational habits/biases and identify opportunities for breaking out of my comfort zone
  • I thought this was a nice use of satellite imagery; this view shows the dramatic urbanization of Shanghai over 30 years, particularly the waterfront along the Huangpu River. Also fascinating is the expanding, presumably manmade coastline
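For the location history idea, a first pass could bin the points into a coarse grid and count visits per cell, which would surface the places that dominate my routine. This sketch assumes the export is a JSON object with a `locations` list whose entries carry `latitudeE7` and `longitudeE7` integers (degrees multiplied by 10^7), as in older Google Takeout exports; the current format may differ, and the sample points are made up.

```python
import math
from collections import Counter

def visit_counts(locations, cell_deg=0.01):
    """Count recorded points per grid cell (0.01 degrees of latitude is about 1 km)."""
    counts = Counter()
    for loc in locations:
        lat = loc["latitudeE7"] / 1e7
        lon = loc["longitudeE7"] / 1e7
        # Integer cell indices avoid float rounding issues in the keys.
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts

# Made-up sample in the assumed export shape.
sample = {"locations": [
    {"latitudeE7": 406942000, "longitudeE7": -739866000},
    {"latitudeE7": 406948000, "longitudeE7": -739869000},
    {"latitudeE7": 407306000, "longitudeE7": -739955000},
]}

counts = visit_counts(sample["locations"])
top_cell, top_visits = counts.most_common(1)[0]
print(top_cell, top_visits)
```

Cells with a single visit, or none at all, would be the candidates for breaking out of the comfort zone.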


API of You W2 + W3 Homework

For the third week’s assignment, I finessed last week’s viz into something much more coherent.

For the final, I would like to create a meaningful, comprehensive dashboard for all the data I’ve collected with my homemade trackers. I’ve chosen to measure several facets of my life, motivated by scientific evidence and/or personal belief that they may be metrics for stress, anxiety, and/or depression. Currently, this data is either scattered in isolated visualizations, or just sitting around in json/csv/tsv files. Additionally, this data is only tracked and available on my local machine.

We have my foregoing keylogger data:

This “mind wandering” viz that receives data from my chrome history and the RescueTime API:

Data from Affectiva’s emotion recognition model, which I am mostly using for valence and engagement (the viz for which clearly needs work):

Most importantly, I’d like to figure out some way to visualize this self-reported mood data, which prompts me hourly:

Time allowing, I would also like to include a report on my daily photo subjects, similar to this flickr archive analysis I did with the Clarifai API:

There’s also geolocation and physical activity/sleep data, tracked by apps on my phone, that I’d like to include.