The goal of this lab is to give you practice working to analyze real data, which will require working with both tables and lists. As you work on each part of the lab, you’ll practice breaking problems into smaller tasks, which you can complete by writing functions and expressions.
Learning objective
Contents
- Introduction
- Getting started
- Part 1: Mapping squirrels
- Part 2: Squirrel colors
- Part 3: Describing squirrels
- Submitting the lab
Introduction
Hundreds of volunteers have devoted their time to creating an important data set: the Squirrel Census! This has involved carefully recording individual squirrel sightings, including information about the appearance of the squirrel, where it was seen, and what it was doing.
While no census has (yet) been held for Vassar’s bold and bountiful squirrel population, in October 2018, the volunteers counted the squirrels in New York City’s Central Park. This data set is published on the NYC Open Data website, and in this lab we will use what we know about tables, lists, and functions to visualize these squirrel sightings.
Getting started
Before you start working with the data, take a look at the Google sheet we’ve prepared containing the squirrel census data. The first sheet is the original data set, and the second sheet is a pruned version removing the columns that aren’t used for this lab.
Load the pruned sheet into Pyret by pasting the following code into the definitions window:
include gdrive-sheets include data-source include shared-gdrive("dcic-2021", "1wyQZj_L0qqV9Ekgr9au6RX2iqt2Ga8Ep") include shared-gdrive("lab04-helper.arr", "1WlN3ffMr55w207EiFwZ7Id6kkEJYJ43J") # # Load table # ssid = "1SaeGoEMzTXCmI2gy5E785M9tjEaRhfd6SeCn8lqW07I" data-sheet = load-spreadsheet(ssid) squirrel-data = load-table: long, lat, primary-color, location, activities, interactions source: data-sheet.sheet-by-name("Pruned", true) sanitize primary-color using string-sanitizer sanitize location using string-sanitizer sanitize activities using string-sanitizer sanitize interactions using string-sanitizer end
The sanitize
lines deal with missing values in
the primary-color
, location
, activities
,
and interactions
columns, replacing them with the empty
string, ""
, for our convenience.
Part 1: Mapping squirrels
The lab04-helper.arr
file we loaded above provides the
function
table-to-map(t :: Table) -> Image
,
which takes in a table and, for each row, plots a dot on a map of
Central Park, using the values in the "x"
,
"y"
, and "color"
columns. You can use table-to-map
just as you’ve used
Pyret’s built-in functions in previous labs.
However, for table-to-map
to plot a location, it needs
appropriate x
and y
values that fall within the
bounds of the map of Central Park. Thus you need to scale the
squirrels’ latitudes to y
values and their longitudes
to x
values, relative to the map dimensions.
The values you need to use, which are defined for you in
lab04-helper.arr
, are:
LAT-MIN
: the minimum latitude that fits the mapLAT-MAX
: the maximum latitude that fits the map
LON-MIN
: the minimum longitude that fits the mapLON-MAX
: the maximum longitude that fits the map
HEIGHT
: the height of the mapWIDTH
: the width of the map
And the scaling formulas you need to implement are:
Task: Define the function
fun add-coords(squirrels :: Table) -> Table: ... end
That adds "x"
and
"y"
columns to the table
(using build-column
)
with the values calculated by the scaling formulas.
You will need to write (simple) helper functions to do this!
To test your function, you can use the following before-and-after tables:
test-table-coords = table: long, lat row: -73.95613449, 40.79408239 row: -73.96885747, 40.78378252 end test-table-coords-x-y = table: long, lat, x, y row: -73.95613449, 40.79408239, 3117716037/3380000, 352525149/260000 row: -73.96885747, 40.78378252, 1607498311/3380000, 57342033/65000 end
Let’s check if these coordinates look right! To do that, we need a column giving the color for each point on the map. For the moment, let’s just make them all black:
Task:
Add a column labeled "color"
and set each
row’s color to "black"
.
You don’t need to define any functions to do this if you use a lambda expression!
Run table-to-map
on your table after you add
the "x"
, "y"
,
and "color"
columns. You should see a map
that looks like this:
There are 3,023 rows in this table, so it may take a few seconds to plot.
Checkpoint
Call over a coach or the instructor when you reach this point. After they check your work, you can comment out the expressions for plotting the squirrels as black dots.
Part 2: Squirrel colors
Now that you have a map of where the squirrels were sighted, it’s time to investigate other observations in the data set. One piece of information the volunteers recorded is the primary fur color of each squirrel. Let’s update our map to show each squirrel’s color.
Task: Write a function
fun map-squirrel-colors(squirrels :: Table) -> Image: ... end
that draws the same map as before using table-to-map
,
but with the dot for each squirrel matching the color of its fur,
recorded in the "primary-color"
column. You
can give it as input your table with the x and y coordinate columns
added.
Note that "Cinnamon"
isn’t a color Pyret
knows, so you should change it to "brown"
. If
there’s no recorded fur color for a squirrel, use a white dot.
As before, you may need to define helper functions.
What do you observe about the distribution of fur color on the map?
Checkpoint
Call over a coach or the instructor when you reach this point.
Part 3: Describing squirrels
While the fur colors in the census data are from a fixed set of possible
values, there are three interesting fields of free text
input:
Other Activities
("activities"
in our table),Other Interactions
("interactions"
), andSpecific Location
("location"
).
While these fields give us observations that wouldn’t otherwise fit in the data, they’re not as easy to process.
Rather than try to process these values in the table itself, let’s pull them out and get a list of all the free-text observations.
Task: Write a function
fun get-all-observations(squirrels :: Table) -> List<String>: ... end
that returns a list of all the observation strings from the three columns listed above.
Additionally, you should remove observations that are blank
(""
) and should make all of the strings
lowercase.
To do this, think how you can use Pyret’s built-in list functions:
append
(or+
), which combines lists much asstring-append
combined strings,map
, which is the list equivalent oftransform-column
, andfilter
, which is the list equivalent offilter-with
. Remember that while the table is the first argument fortransform-column
andfilter-with
, the function comes first formap
andfilter
. Frustrating, we know.
You can test your function with the following table and list:
test-str-table = table: activities, interactions, location row: "Eating a nut", "", "" row: "", "Said hi", "In a tree" end test-str-list = [list: "eating a nut", "said hi", "in a tree"]
You should now have a list of all the observations. My favorite
is "playing tag"
! But it’s
still hard to analyze. To get a better feel for what’s in these
strings, let’s divide them into words.
Task: Call the
function split-all
(defined for you
in lab04-helper.arr
) on your list of strings. This will
give you a list of individual words.
Check that it works by using member
to check if a word
is in the list.
And now it’s possible to count how many times each word occurs in the data.
Task:
Call the function count-list
(also defined for you in
lab04-helper.arr
) on your list of words.
What is the most frequent word?
Submitting the lab
- When you’ve completed the exercises, show your code to your instructor or one of the coaches.
- Then upload your
lab04.arr
file to the Lab 4 assignment on Gradescope. If you worked with a partner, you should submit a single copy of the lab with both your names – see these instructions.