Lab 5: Squirrel Data Analysis

7 October 2022

The goal of this lab is to give you practice working to analyze real data, which will require working with both tables and lists. As you work on each part of the lab, you’ll practice breaking problems into smaller tasks, which you can complete by writing functions and expressions.


This lab can be completed in pairs!

If you choose to work in a pair, you’ll make a single code file which you’ll upload to Gradescope with both your names.

As you work through the lab, take turns “driving” and “navigating”. That is, for a while you type in CPO while your partner is reading the assignment and then you trade.


Hundreds of volunteers have devoted their time to creating an important data set: the Squirrel Census! This has involved carefully recording individual squirrel sightings, including information about the appearance of the squirrel, where it was seen, and what it was doing.

While no census has (yet) been held for Vassar’s bold and bountiful squirrel population, in October 2018, the volunteers counted the squirrels in New York City’s Central Park. This data set is published on the NYC Open Data website, and in this lab we will use what we know about tables, lists, and functions to visualize these squirrel sightings.

Squirrel photo

A Central Park squirrel – or is it a data point? – spotted by Prof. Gordon

Before you start working with the data, take a look at the Google sheet we’ve prepared containing the squirrel census data. The first sheet is the original data set, and the second sheet is a pruned version removing the columns that aren’t used for this lab.

Load the pruned sheet into Pyret by pasting the following code into the definitions window:

include gdrive-sheets
include data-source

include shared-gdrive("dcic-2021",
  "1wyQZj_L0qqV9Ekgr9au6RX2iqt2Ga8Ep")

include shared-gdrive("lab05-helper.arr",
  "1zpcjlBGMl4g34acSqU-P-VJ9rbZhNyDQ")


#
# Load table
#

ssid = "1SaeGoEMzTXCmI2gy5E785M9tjEaRhfd6SeCn8lqW07I"
data-sheet = load-spreadsheet(ssid)

squirrel-data =
  load-table:
    long, lat, primary-color,
    location, activities, interactions
    source: data-sheet.sheet-by-name("Pruned", true)
    sanitize primary-color using string-sanitizer
    sanitize location using string-sanitizer
    sanitize activities using string-sanitizer
    sanitize interactions using string-sanitizer
  end

The lab05-helper.arr file we loaded above provides the function

table-to-map(t :: Table) -> Image,

which takes in a table and, for each row, plots a dot on a map of Central Park, using the values in the "x", "y", and "color" columns. You can use table-to-map just as you’ve used Pyret’s built-in functions in previous labs.

However, for table-to-map to plot a location, it needs appropriate x and y values that fall within the bounds of the map of Central Park. Thus you need to scale the squirrels’ latitudes to y values and their longitudes to x values, relative to the map dimensions.

The values you need to use (defined for you in lab05-helper.arr) are:

  • LAT-MIN: the minimum latitude that fits the map

  • LAT-MAX: the maximum latitude that fits the map

  • LON-MIN: the minimum longitude that fits the map

  • LON-MAX: the maximum longitude that fits the map

  • HEIGHT: the height of the map

  • WIDTH: the width of the map

And the scaling formulas you need to implement are:

formula for x

formula for y

Task: Define the function

fun add-coords(squirrels :: Table) -> Table:
  ...
end

That adds "x" and "y" columns to the table with the values calculated by the scaling formulas.

You will need to write (simple) helper functions to do this!

To test your function, you can use the following before-and-after tables:

test-table-coords =
  table: long, lat
    row: -73.95613449, 40.79408239
    row: -73.96885747, 40.78378252
  end

test-table-coords-x-y =
  table: long, lat, x, y
    row: -73.95613449, 40.79408239, 3117716037/3380000, 352525149/260000
    row: -73.96885747, 40.78378252, 1607498311/3380000, 57342033/65000
  end

Let’s check if these coordinates look right! To do that, we need a column giving the color for each point on the map. For the moment, let’s just paint it black:

Task: Add a column labeled "color" and set each row’s color to "black".

You don’t need to define any functions to do this if you use a lambda expression!

Run table-to-map on your table after you add the "x", "y", and "color" columns. You should see a map that looks like this:

map

Note that the squirrels are approximately contained within Central Park and they’re not in the water!

There are 3,023 rows in this table, so it may take a few seconds to plot.


Checkpoint: Call over a coach when you reach this point. After they check your work, you can comment out the expressions for plotting the squirrels as black dots.


Now that you have a map of where the squirrels were sighted, it’s time to investigate other observations in the data set. One piece of information the volunteers recorded is the primary fur color of each squirrel. Let’s update our map to show each squirrel’s color.

Task: Write a function

fun map-squirrel-colors(squirrels :: Table) -> Image:
  ...
end

that draws the same map as before, but with the dot for each squirrel matching the color of its fur, recorded in the "primary-color" column.

Note that "Cinnamon" isn’t a color Pyret knows, so you should change it to "brown". If there’s no recorded fur color for a squirrel, use a white dot.

As before, you may need to define helper functions.

What do you observe about the distribution of fur color on the map?


Checkpoint: Call over a coach when you reach this point.


While the fur colors in the census data from from a mixed set of possible values, there are three interesting fields of “free text” input:

  • “Other Activities” ("activities" in our table),
  • “Other Interactions” ("interactions"), and
  • “Specific Location” ("location").

While these fields give us observations that wouldn’t otherwise fit in the data, they’re not as easy to process.

Rather than try to process these values in the table itself, let’s pull them out and get a list of all the free-text observations.

Task: Write a function

fun get-all-observations(squirrels :: Table) -> List<String>:
  ...
end

that returns a list of all the observation strings from the three columns listed above.

Additionally, you should remove observations that are blank ("") and should make all of the strings lowercase.

To do this, you don’t need to write a recursive function – instead, think how you can use Pyret’s built-in list functions:

  • append (or +),
  • map (which is the list equivalent of transform-column), and
  • filter (which is the list equivalent of filter-with).

You can test your function with the following table and list:

test-str-table =
  table: activities, interactions, location
    row: "Eating a nut", "", ""
    row: "", "Said hi", "In a tree"
  end

test-str-list =
  [list: "eating a nut", "said hi", "in a tree"]

You should now have a list of all the observations. (My favorite is "playing tag"!) But it’s still hard to analyze. To get a better feel for what’s in these strings, let’s divide them into words.

Task: Call the function split-all (defined for you in lab05-helper.arr) on your list of strings. This will give you a list of individual words.

Check that it works by using .member to check if a word is in the list.

And now it’s possible to count how many times each word occurs in the data.

Task: Call the function count-list (also defined for you in lab05-helper.arr) on your list of words.

What is the most frequent word?

  • When you’ve completed the exercises, show your code to your instructor or one of the coaches.

  • Then upload your lab05.arr file to the Lab 5 assignment on Gradescope.

This lab uses data from the Squirrel Census and includes material adapted from Kathi Fisler and colleagues at Brown University.