Lab 3: Tables

In this lab you’re going to be working with tables! The goal of this lab is to give you practice with:

  • Extracting rows and columns from a table
  • Writing and testing helper functions
  • Filtering data with filter-with
  • Writing nested functions
  • Adding a column to a table
  • Summarizing columns

Setup

FiveThirtyEight conducted a survey in which tens of thousands of people were asked to choose between two candies; from the survey’s responses, they computed the winning percentage of each candy. Then, they compiled a data-set with Number attributes like the winning percentage, relative price, and sugar percentage, and also Boolean attributes such as chocolate, fruity, caramel, and hard. This data has been analyzed in a FiveThirtyEight article, “The Ultimate Halloween Candy Power Ranking,” which is definitely worth a read after the lab.

In this lab, you’ll be looking at the relationships between these columns.

First, to get started include this code at the top of your program:

include shared-gdrive("dcic-2021",
  "1wyQZj_L0qqV9Ekgr9au6RX2iqt2Ga8Ep")
include gdrive-sheets

ssid = "1fJp5pbmutjQ9HlyUA8b0Yc-BqNxSd0i9wrZgZxpSCKM"
data-sheet = load-spreadsheet(ssid)
candy-data = 
  load-table: name, chocolate, fruity, caramel, nutty, nougat, crisped-rice, 
    hard, bar, pluribus, sugar-percent, price-percent, win-percent
    source: data-sheet.sheet-by-name("candy-data", true)
  end

This code loads a Pyret table from a Google Sheet. Press the “Run” button in the top right corner to process the code in the definitions window. Then type candy-data in the interactions window to see the data as a table.

For this lab, you’ll want to refer to the Pyret Tables Documentation and not the built-in Pyret documentation.

Part 1: Filtering

Let’s use the power of filtering to learn more from our candy data.

1.1: Sugar Rush

We want to know which candies have the most sugar. Write an expression in the definitions window that produces a table containing only the candies where sugar-percent > 0.75. To do this, you'll first need to write a function that takes a table Row as input and returns a Boolean that answers the question is the sugar-percent of that row greater than 75%. We've started the function for you. For this function you are just trying to determine if the input, r, has a value in the sugar-percent column that is greater than 75%. Be sure to test your function with a where block. For example, if you pass in the row candy-data.row-n(0) to your function, it should return false, and the inputting row candy-data.row-n(4) should cause your function to return true. As a reminder, the first row in the table has the index of 0.

fun over-75-percent-sugar(r :: Row) -> Boolean:
  doc: "Returns true if the sugar-percent column in a row is over 75%"
  ... # Replace the "..." and this comment with your code
end

Once you have written that function, we'll write an expression (not a function) to filter our table. Take a look at the filter-with function. The first input to the filter-with function is the Table to be filtered and second input is the name of the filter function, which is the function you just wrote above. That's right, as you hopefully remember, functions can be passed as inputs to other functions! Call over a coach if you want help using this structure. If you've got everything right, your expression will evaluate to a table with 15 rows in it. Hint: your expression should be a call to filter-with with the proper inputs.

Suggestion: When you’re done writing the expression, copy and paste this line

#-----------------------------------

into the definitions window directly below it to separate it from the next part.

1.2: Bougie

Now that we know how to satisfy our sweet tooth, write an expression in the definitions window that produces a table containing only the candies where price-percent is greater than 90%. Be sure to test your predicate function with a where block. If you got your filter correct, you should have a table with 8 rows in it.

Suggestion: Once again, separate it from the next part by pasting this line

#-----------------------------------

on the line below it.

1.3: Chocolate

How many of the candies have chocolate?

Write an expression in the definitions window that outputs this number in the interactions window. You can get the length of a table by writing <table>.length() where <table> is the name of (or an expression that evaluates to) a table. If you got your filter correct, you'll find a table with 37 chocolate candies in it.

Suggestion: Once again, separate it from the next part by pasting this line

#-----------------------------------

on the line below it.

1.4: Chocolate and Caramel

Of the candies that have chocolate, what proportion also have caramel?

Write an expression in the definitions window that outputs this proportion in the interactions window. If you got the right answer, you'll see that 10/37 or about 27% of the chocolate candies also have caramel.

Suggestion: Once again, separate it from the next part by pasting this line

#-----------------------------------

on the line below it.

1.5: Chocolate and Anything (optional)

Next, we'd like to know what attribute is paired most frequently with chocolate. We could write filter functions like we did for part 1.4 for all the other possible pairings, but that would be very tedious and redundant.

So instead, to help answer this question, let's write a function that generalizes the expression in 1.4. This function will take in a String representing the name of the ingredient being paired with chocolate and a Table that is only candies that have the chocolate attribute. This function returns a number which is the proportion of candies that have the attribute specified to the total number of chocolate candies.

The function declaration should look something like this:

fun candy-with-attr(t :: Table, attr :: String) -> Number:

Look for commonalities across the code you wrote for the predicate functions in (1.3) and in (1.4). To do this you will need to use nested functions (a function defined inside of another function). This will allow the filter function to access the attr string, even though it is not being passed into the function.

Compare the results of your function on the inputs “fruity,” “nutty,” and “caramel”. Hopefully your answer to “caramel” matches the answer you got in part 1.4! You don’t have to write code for this comparison – just use your new function to compute all three proportions and compare them manually.


Checkpoint: Call over a coach once you reach this point and talk over your code with them.


Part 2: Building columns and analyzing them

2.1: New column

Build a column of Boolean values that indicates whether a candy is fruity and hard, but not a pluribus (multiple candies in a packet like Skittles or M&Ms). This should produce a new table with an added column. Write an expression in the definitions window that uses this new column to compute how many candies meet this condition.

Hint: Take a look at the build-column function. Call over a coach if you want help using this.

2.2: Maximum

Of the candies for which this Boolean is true (fruity and hard, but not a pluribus), which has the highest winning percentage?

Hint: This requires the use of the order-by function, in addition to the row-n and filter-with functions.

An example of the row-n function to take the fifth row of a table:

table.row-n(4)

2.3: Mean

Of the candies for which this Boolean is true, what’s the average winning percentage?

Hint: This requires the use of the mean function.

Takeaways

This lab has mostly been about getting you comfortable working with tabular data and practicing some common operators on tables. It also gets you thinking about our course’s focus on data: what patterns of manipulating data do we often use in computations? How does the organization of our data impact our ability to answer these questions?

Here, we see that filtering, ordering, and summarizing data are some of the key operations. So far we’ve only looked at these operations on tabular data, but these same building blocks will arise many times through this course. When you have a computation to perform around data, you should start by thinking through what combinations of filtering, sorting and summarizing will help you compute your answer.

Submitting the Lab

When you've complete the exercises, show your code to your instructor or one of the coaches.

Then, upload your lab03.arr file to the Lab 3 assignment on Gradescope.

Acknowledgments

This lab includes material adapted from Kathi Fisler and colleagues at Brown University.