
<html> <style type="text/css"> .comment { font-style: italic; color: #c0342d; } .name, .variable, code var, pre var { color: #9e5cb1; } .name { font-weight: bold; } .variable, pre var, code var { font-style: italic; } .keyword { color: #abafb3; } .builtin, .string { color: #417fb8; } .constant { color: #e89f27; } table { margin: 1em auto; } table td { padding: 1pt 4pt; } div.dw-content a { text-decoration: underline; } div.dw-content p { margin-bottom: 1em; max-width: 700px; } div.dw-content p, div.dw-content td, div.dw-content li { font-size: 13pt; } blockquote { font-size: inherit; } </style>

<h1 id="lab-3-candy-analysis">Lab 3: Candy analysis</h1> <p>16 September 2022</p>

<h2 id="todays-lab">Today’s lab</h2> <p>The purpose of this lab is to give you practice:</p> <ul> <li>extracting rows and columns from a table,</li> <li>writing and testing helper functions,</li> <li>filtering data with <code>filter-with</code>,</li> <li>adding a column to a table,</li> <li>summarizing columns, and</li> <li>visualizing relationships.</li> </ul>

<hr />

<p>This lab can be completed in pairs!</p> <p>If you choose to work in a pair, you’ll make a single code file which you’ll upload to Gradescope with both your names.</p> <p>As you work through the lab, take turns “driving” and “navigating”. That is, one of you types in CPO for a while as the other reads the assignment, and then you trade.</p>

<hr />

<h2 id="getting-started">Getting started</h2> <p><a href="https://en.wikipedia.org/wiki/FiveThirtyEight">FiveThirtyEight</a> conducted a survey in which tens of thousands of people were asked to choose between two candies. From the responses, they compiled a data set with</p> <ul> <li><code>Number</code> attributes like the winning percentage, relative price, and sugar percentage, and also</li> <li><code>Boolean</code> attributes such as whether the candy has chocolate, is fruity, has caramel, or is hard.</li> </ul> <p>This <a href="https://github.com/fivethirtyeight/data/blob/master/candy-power-ranking/candy-data.csv">data</a> is analyzed in the article <a href="https://fivethirtyeight.com/videos/the-ultimate-halloween-candy-power-ranking/">“The Ultimate Halloween Candy Power Ranking”</a>, which is worth a read after the lab.</p> <p>In this lab, you’ll be looking at the relationships between these columns.</p> <p>To get started, include this code at the top of your program, in the definitions pane:</p> <pre>

<span class="keyword">include</span> gdrive-sheets

<span class="keyword">include</span> shared-gdrive(<span class="string">&quot;dcic-2021&quot;</span>,
  <span class="string">&quot;1wyQZj_L0qqV9Ekgr9au6RX2iqt2Ga8Ep&quot;</span>)


<var>ssid</var> = <span class="string">&quot;1YoN-Z8T5DqNEmV_hpR3i6-yP5Poe9gdYwbP9nQhQlBw&quot;</span>
<var>data-sheet</var> = load-spreadsheet(ssid)

<var>candy-data</var> =
  <span class="keyword">load-table</span>:
    name, chocolate, fruity, caramel, nutty, nougat, crisped-rice,
    hard, bar, pluribus, sugar-percent, price-percent, win-percent
    <span class="keyword">source</span>: data-sheet.sheet-by-name(<span class="string">&quot;candy-data&quot;</span>, <span class="constant">true</span>)
  <span class="keyword">end</span>

</pre> <p>This code loads a Pyret table from a Google Sheet. Press the <em>Run</em> button to process the code in the definitions pane. Then type <code>candy-data</code> in the interactions pane to see the data as a table.</p> <p><strong>Note</strong>: For this lab, you’ll want to refer to the <a href="https://www.cs.vassar.edu/~cs101/3/resources/tables.html">Pyret Tables Documentation</a> (and <em>not</em> the built-in Pyret documentation).</p>

<h2 id="part-1-filtering">Part 1: Filtering</h2> <p>Let’s use the power of filtering to learn more from our candy data.</p> <p>Suggestion: Separate each section of your code from the next by copying and pasting this line:</p> <pre>

# -----------------------------------

</pre>

<h3 id="exercise-11-sugar-rush">Exercise 1.1: Sugar rush</h3> <p>We want to know which candies have the most sugar!</p> <p><strong>Task</strong>: First, you’ll need to write a function that takes a table Row as input and returns a Boolean that answers the question “is the <code>sugar-percent</code> of the Row greater than 75%?”. We’ve started the function for you:</p> <pre>

<span class="keyword">fun</span> <span class="name">over-75-percent-sugar</span>(r :: Row) -&gt; Boolean:
  <span class="keyword">doc</span>: <span class="string">&quot;Return true if the sugar-percent column in a row is over 75%&quot;</span>
  ...
<span class="keyword">where</span>:
  over-75-percent-sugar(candy-data.row-n(0)) <span class="keyword">is</span> <span class="constant">false</span>
  over-75-percent-sugar(candy-data.row-n(4)) <span class="keyword">is</span> <span class="constant">true</span>
<span class="keyword">end</span>

</pre> <p>Note the examples in the <code>where</code> block, which check that the function works correctly! Often when we’re working with a big table of data, we’ll define a second, smaller table just for testing, since our real data set might change. For this lab, the data set is frozen and it’s okay to use it for testing your functions.</p> <p><strong>Task</strong>: Next, write an <em>expression</em> (not a function) to filter our table to only include the candies with more than 75% sugar.</p> <p>You may want to review the <a href="https://www.cs.vassar.edu/~cs101/3/resources/tables.html#filter-with"><code>filter-with</code></a> function we saw in class. The first input to <code>filter-with</code> is the Table to be filtered and the second input is the name of the predicate function (which you just wrote!).</p> <p>(That’s right, as you hopefully remember, functions can be passed as inputs to other functions!)</p> <p>If you’ve got everything right, your expression will evaluate to a table with 15 rows in it.</p>
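
<p>Here’s a minimal sketch of the same pattern on a small test table. Everything in it (<code>fruit-table</code>, its columns, and <code>heavier-than-100</code>) is made up for illustration and isn’t part of the candy data. It assumes the <code>include</code> lines from “Getting started” are already at the top of your definitions pane.</p> <pre>

# A small, made-up table just for testing (not part of the candy data)
<var>fruit-table</var> =
  <span class="keyword">table</span>: name, weight-grams
    <span class="keyword">row</span>: <span class="string">&quot;lemon&quot;</span>, 60
    <span class="keyword">row</span>: <span class="string">&quot;apple&quot;</span>, 180
    <span class="keyword">row</span>: <span class="string">&quot;grapefruit&quot;</span>, 250
  <span class="keyword">end</span>

<span class="keyword">fun</span> <span class="name">heavier-than-100</span>(r :: Row) -&gt; Boolean:
  <span class="keyword">doc</span>: <span class="string">&quot;Return true if the weight-grams column in a row is over 100&quot;</span>
  r[<span class="string">&quot;weight-grams&quot;</span>] &gt; 100
<span class="keyword">where</span>:
  heavier-than-100(fruit-table.row-n(0)) <span class="keyword">is</span> <span class="constant">false</span>
  heavier-than-100(fruit-table.row-n(1)) <span class="keyword">is</span> <span class="constant">true</span>
<span class="keyword">end</span>

# The predicate is passed by name, with no parentheses after it
<var>heavy-fruit</var> = filter-with(fruit-table, heavier-than-100)

</pre> <p>Typing <code>heavy-fruit</code> in the interactions pane should show only the rows for which the predicate returns <code>true</code>.</p>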

<h3 id="exercise-12-pricey">Exercise 1.2: Pricey</h3> <p>Now that we know the candies that will satisfy our sweet tooth, let’s consider the high end of the market.</p> <p><strong>Task</strong>: Write an expression in the definitions pane that produces a table containing only the candies where <code>price-percent</code> is greater than 90%.</p> <p>This requires you to write another helper function. As we demonstrated above, be sure to test this function with a <code>where</code> block – and to do so for each of the subsequent functions you write.</p> <p>The result should be a table with eight rows in it.</p>

<h3 id="exercise-13-chocolate">Exercise 1.3: Chocolate</h3> <p>How many of the candies have chocolate?</p> <p><strong>Task</strong>: Write an expression (in the definitions pane) that outputs the number of chocolate candies.</p> <p><strong>Hint</strong>: You can get the length of a table by writing <code>⟨table⟩.length()</code>, where <code>⟨table⟩</code> is the name of a table or an expression that evaluates to a table.</p> <p>If your filter is correct, the filtered table will have 37 chocolate candies in it.</p>

<h3 id="exercise-14-chocolate-and-caramel">Exercise 1.4: Chocolate and caramel</h3> <p>Of the candies that have chocolate, what proportion <em>also</em> have caramel?</p> <p><strong>Task</strong>: Write an expression in the definitions pane that outputs the proportion of chocolate candies with caramel.</p> <p>If you got the right answer, you’ll see that 10/37 or about 27% of the chocolate candies also have caramel.</p>
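
<p>As a rough sketch of counting rows and computing a proportion, here’s the idea on a small made-up table. The table <code>book-table</code>, its columns, and both helper functions are hypothetical, and the sketch assumes the <code>include</code> lines from “Getting started” are already in your definitions pane.</p> <pre>

# A small, made-up table just for illustration
<var>book-table</var> =
  <span class="keyword">table</span>: title, is-fiction, has-sequel
    <span class="keyword">row</span>: <span class="string">&quot;Book A&quot;</span>, <span class="constant">true</span>, <span class="constant">true</span>
    <span class="keyword">row</span>: <span class="string">&quot;Book B&quot;</span>, <span class="constant">true</span>, <span class="constant">false</span>
    <span class="keyword">row</span>: <span class="string">&quot;Book C&quot;</span>, <span class="constant">false</span>, <span class="constant">false</span>
    <span class="keyword">row</span>: <span class="string">&quot;Book D&quot;</span>, <span class="constant">true</span>, <span class="constant">false</span>
  <span class="keyword">end</span>

<span class="keyword">fun</span> <span class="name">fiction-row</span>(r :: Row) -&gt; Boolean:
  <span class="keyword">doc</span>: <span class="string">&quot;Return true if the is-fiction column in a row is true&quot;</span>
  r[<span class="string">&quot;is-fiction&quot;</span>]
<span class="keyword">where</span>:
  fiction-row(book-table.row-n(0)) <span class="keyword">is</span> <span class="constant">true</span>
  fiction-row(book-table.row-n(2)) <span class="keyword">is</span> <span class="constant">false</span>
<span class="keyword">end</span>

<span class="keyword">fun</span> <span class="name">sequel-row</span>(r :: Row) -&gt; Boolean:
  <span class="keyword">doc</span>: <span class="string">&quot;Return true if the has-sequel column in a row is true&quot;</span>
  r[<span class="string">&quot;has-sequel&quot;</span>]
<span class="keyword">where</span>:
  sequel-row(book-table.row-n(0)) <span class="keyword">is</span> <span class="constant">true</span>
  sequel-row(book-table.row-n(1)) <span class="keyword">is</span> <span class="constant">false</span>
<span class="keyword">end</span>

<var>fiction-books</var> = filter-with(book-table, fiction-row)

# Filter the already-filtered table again, then divide the two row counts
<var>sequel-proportion</var> =
  filter-with(fiction-books, sequel-row).length() / fiction-books.length()

</pre> <p>Because <code>filter-with</code> produces a table, its result can be filtered again or asked for its <code>.length()</code> directly.</p>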

<hr />

<p><strong>Checkpoint</strong>: Call over a coach once you reach this point and talk over your code with them.</p>

<hr />

<h2 id="part-2-building-columns-and-analyzing-them">Part 2: Building columns and analyzing them</h2>

<h3 id="exercise-21-new-column">Exercise 2.1: New column</h3> <p><strong>Task</strong>: Build a column of Boolean values that indicates whether a candy is fruity and hard, but not a <em>pluribus</em> (i.e., multiple candies in a packet, like Skittles or M&amp;Ms).</p> <p><strong>Hint</strong>: Take a look at the <a href="https://www.cs.vassar.edu/~cs101/3/resources/tables.html#build-column"><code>build-column</code></a> function. Call over a coach if you want help using this.</p> <p><strong>Task</strong>: Write an expression in the definitions pane that uses this new column to compute how many candies meet this condition.</p>
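
<p>Here’s a hedged sketch of building a Boolean column on a small made-up table. It assumes <code>build-column</code> takes the table, the new column’s name as a String, and a function from a Row to the new value, as described in the Pyret Tables Documentation. The table <code>shirt-table</code>, its columns, and the helper <code>full-price-cotton</code> are hypothetical, and the <code>include</code> lines from “Getting started” are assumed to be in your definitions pane.</p> <pre>

# A small, made-up table just for illustration
<var>shirt-table</var> =
  <span class="keyword">table</span>: color, is-cotton, is-on-sale
    <span class="keyword">row</span>: <span class="string">&quot;red&quot;</span>, <span class="constant">true</span>, <span class="constant">false</span>
    <span class="keyword">row</span>: <span class="string">&quot;blue&quot;</span>, <span class="constant">true</span>, <span class="constant">true</span>
    <span class="keyword">row</span>: <span class="string">&quot;green&quot;</span>, <span class="constant">false</span>, <span class="constant">false</span>
  <span class="keyword">end</span>

<span class="keyword">fun</span> <span class="name">full-price-cotton</span>(r :: Row) -&gt; Boolean:
  <span class="keyword">doc</span>: <span class="string">&quot;Return true if a shirt is cotton and not on sale&quot;</span>
  r[<span class="string">&quot;is-cotton&quot;</span>] <span class="keyword">and</span> not(r[<span class="string">&quot;is-on-sale&quot;</span>])
<span class="keyword">where</span>:
  full-price-cotton(shirt-table.row-n(0)) <span class="keyword">is</span> <span class="constant">true</span>
  full-price-cotton(shirt-table.row-n(1)) <span class="keyword">is</span> <span class="constant">false</span>
<span class="keyword">end</span>

# The second input is the name of the new column
<var>labeled-shirts</var> =
  build-column(shirt-table, <span class="string">&quot;full-price-cotton&quot;</span>, full-price-cotton)

</pre> <p>The result keeps the original columns and adds the new one, so the new column can be used by later predicates just like any other column.</p>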

<h3 id="exercise-22-maximum">Exercise 2.2: Maximum</h3> <p><strong>Task</strong>: Write an expression in the definitions pane to find <em>which</em> candy has the highest winning percentage among the candies for which your new column is true.</p> <p><strong>Hint</strong>: This requires the use of the <a href="https://www.cs.vassar.edu/~cs101/3/resources/tables.html#order-by"><code>order-by</code></a> function, in addition to <code>.row-n</code> and <a href="https://www.cs.vassar.edu/~cs101/3/resources/tables.html#filter-with"><code>filter-with</code></a>.</p>
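
<p>As a sketch of how ordering and <code>.row-n</code> fit together, here’s the idea on a small made-up table. It assumes <code>order-by</code> takes the table, a column name as a String, and a Boolean that is <code>true</code> for ascending order, as described in the Pyret Tables Documentation. The table <code>town-table</code> and its contents are hypothetical, and the <code>include</code> lines from “Getting started” are assumed to be in your definitions pane.</p> <pre>

# A small, made-up table just for illustration
<var>town-table</var> =
  <span class="keyword">table</span>: town, population
    <span class="keyword">row</span>: <span class="string">&quot;Town A&quot;</span>, 7500
    <span class="keyword">row</span>: <span class="string">&quot;Town B&quot;</span>, 32000
    <span class="keyword">row</span>: <span class="string">&quot;Town C&quot;</span>, 14000
  <span class="keyword">end</span>

# false asks for descending order, so the largest value comes first
<var>by-population</var> = order-by(town-table, <span class="string">&quot;population&quot;</span>, <span class="constant">false</span>)

# Row 0 of the sorted table holds the maximum; pull out one of its columns
<var>largest-town</var> = by-population.row-n(0)[<span class="string">&quot;town&quot;</span>]

</pre> <p>Sorting a table that has already been filtered works the same way, since both functions take a table and produce a table.</p>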

<h3 id="exercise-23-mean">Exercise 2.3: Mean</h3> <p><strong>Task</strong>: Write an expression in the definitions pane to compute the average winning percentage for the candies for which this column is true.</p> <p><strong>Hint</strong>: This requires the use of the <a href="https://www.cs.vassar.edu/~cs101/3/resources/tables.html#mean"><code>mean</code></a> function.</p>
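
<p>Here’s a minimal sketch of summarizing a column, assuming <code>mean</code> takes a table and a column name as a String, as described in the Pyret Tables Documentation. The table <code>quiz-table</code> and its contents are made up for illustration, and the <code>include</code> lines from “Getting started” are assumed to be in your definitions pane.</p> <pre>

# A small, made-up table just for illustration
<var>quiz-table</var> =
  <span class="keyword">table</span>: student, score
    <span class="keyword">row</span>: <span class="string">&quot;Student A&quot;</span>, 8
    <span class="keyword">row</span>: <span class="string">&quot;Student B&quot;</span>, 6
    <span class="keyword">row</span>: <span class="string">&quot;Student C&quot;</span>, 10
  <span class="keyword">end</span>

# Average of every value in the score column
<var>average-score</var> = mean(quiz-table, <span class="string">&quot;score&quot;</span>)

</pre> <p>To average over only some rows, pass a filtered table as the first input instead of the full one.</p>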

<h2 id="part-3-scatterplot">Assignment 3 preview: Visualization</h2>

<p>On Assignment 3 – and in class next week – we’ll be drawing various plots to visualize tabular data. If you have time at the end of lab, we encourage you to try visualizing the candy data we used in this lab. Here’s a question to look at:</p>

<p>What’s the relationship between sugar and winning percentage? Do these two attributes seem correlated? One way to gain an intuition on this is to create a scatterplot that puts one attribute on each axis.</p>

<p>We haven’t made plots in class yet, but at this point you’re getting good at reading the documentation and trying things out!</p>

<p><strong>Task</strong>: Look at the documentation for the <a href="https://www.cs.vassar.edu/~cs101/3/resources/tables.html#scatter-plot"><code>scatter-plot</code></a> function. Try to figure out how to use it to generate a scatterplot of sugar vs. winning percentage.</p> <p>It doesn’t matter which variable goes on which axis. If you get stuck, ask for help!</p> <p><strong>Task</strong>: Write an expression in the definitions pane that creates the scatterplot, and, in a comment, summarize what, if anything, you think it shows.</p>

<p>On Assignment 3, you’ll be using other visualizations, but they’re created in much the same way!</p>

<h2 id="takeaways">Takeaways</h2> <p>This lab has mostly been about getting you comfortable working with tabular data and practicing some common operators on tables. It also gets you thinking about our course’s focus on data: What patterns of manipulating data do we often use in computations? How does the organization of our data impact our ability to answer these questions?</p> <p>Here, we see that filtering, ordering, and summarizing data are some of the key operations. So far we’ve only looked at these operations on tabular data, but these same building blocks will arise many times through this course. When you have a computation to perform around data, you should start by thinking through what combinations of filtering, sorting and summarizing will help you compute your answer.</p>

<h2 id="submitting-the-lab">Submitting the lab</h2> <ul> <li>When you’ve completed the exercises, show your code to your instructor or one of the coaches.</li> <li>Then upload your <code>lab03.arr</code> file to the Lab 3 assignment on <a href="https://www.gradescope.com">Gradescope</a>.</li> </ul>

<h2 id="acknowledgments">Acknowledgments</h2> <p>This lab includes material adapted from Kathi Fisler and colleagues at Brown University.</p> </html>