Lab 4: Squirrels!

9 February 2024

Learning objective

The goal of this lab is to give you practice working to analyze real data, which will require working with both tables and lists. As you work on each part of the lab, you’ll practice breaking problems into smaller tasks, which you can complete by writing functions and expressions.

Contents

Introduction

Hundreds of volunteers have devoted their time to creating an important data set: the Squirrel Census! This has involved carefully recording individual squirrel sightings, including information about the appearance of the squirrel, where it was seen, and what it was doing.

While no census has (yet) been held for Vassar’s bold and bountiful squirrel population, in October 2018, the volunteers counted the squirrels in New York City’s Central Park. This data set is published on the NYC Open Data website, and in this lab we will use what we know about tables, lists, and functions to visualize these squirrel sightings.

A Central Park squirrel – or is it a data point? – spotted by Prof. Gordon

Getting started

Before you start working with the data, take a look at the Google sheet we’ve prepared containing the squirrel census data. The first sheet is the original data set, and the second sheet is a pruned version removing the columns that aren’t used for this lab.

Load the pruned sheet into Pyret by pasting the following code into the definitions window:

include gdrive-sheets
include data-source

include shared-gdrive("dcic-2021",
  "1wyQZj_L0qqV9Ekgr9au6RX2iqt2Ga8Ep")

include shared-gdrive("lab04-helper.arr",
  "1WlN3ffMr55w207EiFwZ7Id6kkEJYJ43J")


#
# Load table
#

ssid = "1SaeGoEMzTXCmI2gy5E785M9tjEaRhfd6SeCn8lqW07I"
data-sheet = load-spreadsheet(ssid)

squirrel-data =
  load-table:
    long, lat, primary-color,
    location, activities, interactions
    source: data-sheet.sheet-by-name("Pruned", true)
    sanitize primary-color using string-sanitizer
    sanitize location using string-sanitizer
    sanitize activities using string-sanitizer
    sanitize interactions using string-sanitizer
  end

The sanitize lines deal with missing values in the primary-color, location, activities, and interactions columns, replacing them with the empty string, "", for our convenience.

Part 1: Mapping squirrels

The lab04-helper.arr file we loaded above provides the function

table-to-map(t :: Table) -> Image,

which takes in a table and, for each row, plots a dot on a map of Central Park, using the values in the "x", "y", and "color" columns. You can use table-to-map just as you’ve used Pyret’s built-in functions in previous labs.

However, for table-to-map to plot a location, it needs appropriate x and y values that fall within the bounds of the map of Central Park. Thus you need to scale the squirrels’ latitudes to y values and their longitudes to x values, relative to the map dimensions.

The values you need to use, which are defined for you in lab04-helper.arr, are:

  • LAT-MIN: the minimum latitude that fits the map
  • LAT-MAX: the maximum latitude that fits the map

  • LON-MIN: the minimum longitude that fits the map
  • LON-MAX: the maximum longitude that fits the map

  • HEIGHT: the height of the map
  • WIDTH: the width of the map

And the scaling formulas you need to implement are:

Task: Define the function

fun add-coords(squirrels :: Table) -> Table:
  ...
end

That adds "x" and "y" columns to the table (using build-column) with the values calculated by the scaling formulas.

You will need to write (simple) helper functions to do this!

To test your function, you can use the following before-and-after tables:

test-table-coords =
  table: long, lat
    row: -73.95613449, 40.79408239
    row: -73.96885747, 40.78378252
  end

test-table-coords-x-y =
  table: long, lat, x, y
    row: -73.95613449, 40.79408239, 3117716037/3380000, 352525149/260000
    row: -73.96885747, 40.78378252, 1607498311/3380000, 57342033/65000
  end

Let’s check if these coordinates look right! To do that, we need a column giving the color for each point on the map. For the moment, let’s just make them all black:

Task: Add a column labeled "color" and set each row’s color to "black".

You don’t need to define any functions to do this if you use a lambda expression!

Run table-to-map on your table after you add the "x", "y", and "color" columns. You should see a map that looks like this:

Note that the squirrels are approximately contained within Central Park and they’re not in the water!

There are 3,023 rows in this table, so it may take a few seconds to plot.

Checkpoint

Call over a coach or the instructor when you reach this point. After they check your work, you can comment out the expressions for plotting the squirrels as black dots.

Part 2: Squirrel colors

Now that you have a map of where the squirrels were sighted, it’s time to investigate other observations in the data set. One piece of information the volunteers recorded is the primary fur color of each squirrel. Let’s update our map to show each squirrel’s color.

Task: Write a function

fun map-squirrel-colors(squirrels :: Table) -> Image:
  ...
end

that draws the same map as before using table-to-map, but with the dot for each squirrel matching the color of its fur, recorded in the "primary-color" column. You can give it as input your table with the x and y coordinate columns added.

Note that "Cinnamon" isn’t a color Pyret knows, so you should change it to "brown". If there’s no recorded fur color for a squirrel, use a white dot.

As before, you may need to define helper functions.

What do you observe about the distribution of fur color on the map?

Checkpoint

Call over a coach or the instructor when you reach this point.

Part 3: Describing squirrels

While the fur colors in the census data are from a fixed set of possible values, there are three interesting fields of free text input:

  • Other Activities ("activities" in our table),
  • Other Interactions ("interactions"), and
  • Specific Location ("location").

While these fields give us observations that wouldn’t otherwise fit in the data, they’re not as easy to process.

Rather than try to process these values in the table itself, let’s pull them out and get a list of all the free-text observations.

Task: Write a function

fun get-all-observations(squirrels :: Table) -> List<String>:
  ...
end

that returns a list of all the observation strings from the three columns listed above.

Additionally, you should remove observations that are blank ("") and should make all of the strings lowercase.

To do this, think how you can use Pyret’s built-in list functions:

  • append (or +), which combines lists much as string-append combined strings,
  • map, which is the list equivalent of transform-column, and
  • filter, which is the list equivalent of filter-with. Remember that while the table is the first argument for transform-column and filter-with, the function comes first for map and filter. Frustrating, we know.

You can test your function with the following table and list:

test-str-table =
  table:
    activities, interactions, location
    row: "Eating a nut", "", ""
    row: "", "Said hi", "In a tree"
  end

test-str-list =
  [list: "eating a nut", "said hi", "in a tree"]

You should now have a list of all the observations. My favorite is "playing tag"! But it’s still hard to analyze. To get a better feel for what’s in these strings, let’s divide them into words.

Task: Call the function split-all (defined for you in lab04-helper.arr) on your list of strings. This will give you a list of individual words.

Check that it works by using member to check if a word is in the list.

And now it’s possible to count how many times each word occurs in the data.

Task: Call the function count-list (also defined for you in lab04-helper.arr) on your list of words.

What is the most frequent word?

Submitting the lab

  • When you’ve completed the exercises, show your code to your instructor or one of the coaches.
  • Then upload your lab04.arr file to the Lab 4 assignment on Gradescope. If you worked with a partner, you should submit a single copy of the lab with both your names – see these instructions.