The purpose of this lab is to give you practice:

- extracting rows and columns from a table,
- writing and testing helper functions,
- filtering data with
`filter-with`

, - adding a column to a table,
- summarizing columns, and
- visualizing relationships.

The purpose of this lab is to give you practice:

- extracting rows and columns from a table,
- writing and testing helper functions,
- filtering data with
`filter-with`

, - adding a column to a table,
- summarizing columns, and
- visualizing relationships.

Today’s lab – and all future labs, unless otherwise noted – can be completed working in a pair, if you would like to do so.

FiveThirtyEight conducted a survey in which tens of thousands of people were asked to choose between two candies. The data is analyzed in the article “The Ultimate Halloween Candy Power Ranking”, which is worth a read after the lab. From the responses, they compiled a data set with

`Number`

attributes like the winning percentage, relative price, and sugar percentage, and also`Boolean`

attributes such as whether the candy has chocolate, is fruity, has caramel, or is hard.

In this lab, you’ll be looking at the relationships between these columns.

To get started, copy this code for loading a Pyret table from a Google Sheet and paste it at the top of your program, in the definitions pane:

include gdrive-sheets include shared-gdrive("dcic-2021", "1wyQZj_L0qqV9Ekgr9au6RX2iqt2Ga8Ep")ssid= "1YoN-Z8T5DqNEmV_hpR3i6-yP5Poe9gdYwbP9nQhQlBw"data-sheet= load-spreadsheet(ssid)candy-data= load-table: name, chocolate, fruity, caramel, nutty, nougat, crisped-rice, hard, bar, pluribus, sugar-percent, price-percent, win-percent source: data-sheet.sheet-by-name("candy-data", true) end

Save this code in a file named `lab03.arr`

and press
the *Run* button to process the code in the definitions pane.
Then type `candy-data`

in the interactions pane to see the
data as a table.

**Note**: For this lab, you’ll want to refer to the Pyret
Tables Documentation (and *not* the built-in Pyret
documentation).

For each exercise, separate it from the next one by copy-and-pasting this line:

```
# -----------------------------------
```

Let’s use the power of filtering to learn more from our candy data.

We want to know which candies have the most sugar!

**Task**:
First you’ll first need to write a function that takes a table Row
as input and returns a Boolean that answers the question “is the
`sugar-percent`

of the Row greater than 75%?”

We’ve started the function for you:

fun over-75-percent-sugar(r :: Row) -> Boolean: doc: "Returns true if the sugar-percent column in a row is over 75%" ... where: over-75-percent-sugar(candy-data.row-n(0)) is false over-75-percent-sugar(candy-data.row-n(4)) is true end

Note the examples in the `where`

block to check that the
function works correctly! Often when we’re working with a big table
of data, we’ll define a second, smaller table just for testing,
since our real data set might change. For this lab, the data set is
“frozen” – it’s not changing – so it’s
okay to use it for testing your functions.

**Task**:
Next, write an *expression* (not a function) to filter our
table to only include the candy with more than 75% sugar.

You may want to review the `filter-with`

function we saw in class. The first input to `filter-with`

is
the Table to be filtered and second input is the name of the predicate
function (which you just wrote!) That’s right, as you hopefully remember, functions
can be passed as inputs to other functions! If you’ve got
everything right, your expression will evaluate to a table with 15 rows in
it.

Now that we the candies that will satisfy our sweet tooth, let’s consider the high end of the market.

**Task**:
Write an expression in the definitions pane that produces a table
containing only the candies where `price-percent`

is greater
than 90%.

This requires you to write another helper function. As we demonstrated
above, be sure to test this function with a `where`

block –
and to do so for each of the subsequent functions you write.

The result should be a table with eight rows in it.

How many of the candies have chocolate?

**Task**:
Write an expression (in the definitions pane) that outputs the
*number* of chocolate candies.

You can get the length of a table by writing
`⟨table⟩``.length()`

, where
`⟨table⟩` is the name of a table or an expression
that evaluates to a table.

If your filter is correct, you’ll find a table with 37 chocolate candies in it.

Of the candies that have chocolate, what proportion *also* have
caramel?

**Task**:
Write an expression in the definitions pane that outputs the proportion
of chocolate candies with caramel.

If you got the right answer, you’ll see that 10/37 or about 27% of the chocolate candies also have caramel.

**Checkpoint**:
When you reach this point in the lab, raise your hand and go through your
solutions with a coach or the instructor before continuing.

**Task**:
Build a column of Boolean values that indicates whether a candy is
fruity and hard, but not a pluribus

(i.e., multiple candies in a
packet, like Skittles or M&Ms).**Hint**: Take a look at the
`build-column`

function.

**Task**:
Write an expression in the definitions pane that uses this new
column to compute *how many* candies meet this condition.

**Task**:
Write an expression in the definitions pane to compute which candy has
the highest winning percentage out of those candies for which your new
column is true.

**Hint**: This requires the use of the
`order-by`

function, in addition to
`.row-n`

and
`filter-with`

.

**Task**:
Write an expression in the definitions pane to compute the average winning
percentage for the candies for which this column is true.

**Hint**: This requires the use of the
`mean`

function.

What’s the relationship between sugar and winning percentage? Do these two attributes seem correlated? One way to gain intuition on this is to create a scatterplot that puts one attribute on each axis.

**Task**:
Look at the documentation for the `scatter-plot`

function. Try generating a scatterplot of sugar vs winning percentage.

We recommend putting sugar on the `x`-axis since our
hypothesis is that it affects a candy’s winning percentage.

**Task**:
Write an expression in the definitions pane that creates the
scatterplot, and, in a comment, summarize what, if anything, you think it
shows.

This lab has mostly been about getting you comfortable working with tabular data and practicing some common operators on tables. It also gets you thinking about our course’s focus on data: What patterns of manipulating data do we often use in computations? How does the organization of our data impact our ability to answer these questions?

Here, we see that filtering, ordering, and summarizing data are some of the key operations. So far we’ve only looked at these operations on tabular data, but these same building blocks will arise many times through this course. When you have a computation to perform around data, you should start by thinking through what combinations of filtering, sorting and summarizing will help you compute your answer.

- When you’ve completed the exercises, show your code to your instructor or one of the coaches.
- Then upload your
`lab03.arr`

file to the Lab 3 assignment on Gradescope. If you worked with a partner, you should submit a single copy of the lab with both your names – see these instructions.