
<html> <style type="text/css"> .comment { font-style: italic; color: #c0342d; } .name, .variable, code var, pre var { color: #9e5cb1; } .name { font-weight: bold; } .variable, pre var, code var { font-style: italic; } .keyword { color: #abafb3; } .builtin, .string { color: #417fb8; } .constant { color: #e89f27; } table { margin: 1em auto; } table td { padding: 1pt 4pt; } div.dw-content a { text-decoration: underline; } div.dw-content p, div.dw-content li, div.dw-content li p { font-size: 13pt; margin-bottom: 1em; max-width: 750px; } p code { font-size: 11pt; } blockquote { font-size: inherit; } pre kbd, code kbd { background: inherit; color: inherit; box-shadow: none; padding: 0; font: inherit; font-weight: bold; } a.secret { color: inherit; text-decoration: none !important; } a.secret:hover { text-decoration: underline !important; } ol li div.task p:first-child {

  margin: 0;

} div.task, p.task {

  background-color: #f2d4d7 !important;
  border: 1px solid #6366a !important;
  padding: 1em 2.25em;
  margin: 2em 1.5em;

} p.task, div.task p:first-child {

  text-indent: -1.5em;

}

blockquote {

  border: 0;
  margin: 1.5rem;

} </style>

<h1 id="assignment-4-famous-people">Assignment 4: Famous People</h1>

<p><strong>Assigned</strong>: Thursday, 22 September <br /> <strong>Due</strong>: Wednesday, 28 September, 11:59 p.m.</p>

<h2 id="introduction">Introduction</h2>

<p>The <a href="https://www.nature.com/articles/sdata201575">Pantheon 1.0 data set</a> contains information about 11,340 famous individuals, based on articles in international versions of <a href="https://www.wikipedia.org">Wikipedia</a>.</p>

<p>For this assignment, we have taken a random sample of 1000 rows (that is, information about 1000 people) and have selected a few of the most interesting columns.</p>

<p class="task"><strong>Task</strong>: Load the table from <a href="https://docs.google.com/spreadsheets/d/15EZE-CYs-UawUhvdZQsEXLymWJQR_OOPejQOjy1whSU">this Google spreadsheet</a>.</p>

<p>You can refer to <a href="https://www.cs.vassar.edu/courses/cs101-2022b/assignments/03">Assignment 3</a> or the recent class slides and labs for the required <code>include</code> statements and code for loading a table from a Google spreadsheet. Use the same column names that are in the spreadsheet.</p>

<p>For the table functions, refer to <a href="https://www.cs.vassar.edu/~cs101/3/resources/tables.html">this page</a> rather than the Pyret documentation.</p>

<h2 id="part-1-considerations">Part 1: Considerations</h2>

<p>Before we try to work with this data, we should think about what the valid contents of the fields could be. It turns out, this can be pretty hard, and many programmers and companies get this wrong in practice!</p>

<p class="task"><strong>Task</strong>: Read this short article on <a href="https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names">falsehoods programmers believe about names</a>.</p>

<p class="task"><strong>Task</strong>: Read this article on <a href="https://slate.com/technology/2019/10/gender-binary-nonbinary-code-databases-values.html">gender storage in databases</a>.</p>

<p class="task"><strong>Task</strong>: Answer the following questions in a multi-line comment (<code>#| ... |#</code>) at the top of your program before moving on to Part 2.</p>

<ol> <li><p>Take a look at the Pantheon 1.0 data. Based on the two articles you’ve just read, what are two assumptions that this table seems to reflect? Provide an example from the dataset of each assumption you name and explain why the assumption is harmful or doesn't always hold true.</p></li>

<li><p>What is one other assumption – not about names or gender – that the data set makes? Provide an example and explain why the assumption is harmful or doesn’t always hold true.</p></li> </ol>

<h2 id="part-2-table-transformation">Part 2: Table transformation</h2>

<h2 id="exercise-21-names">Exercise 2.1: Names</h2>

<p class="task"><strong>Task</strong>: Add a new column, <code>"first-name"</code>. This should contain a substring of the value in the <code>"name"</code> column, stopping before the first space (<code>" "</code>). Call the resulting table <code>people-names</code>.</p>

<p><strong>Hint</strong>: Use the <code>string-index-of</code> and <a href="https://www.pyret.org/docs/latest/strings.html#%28part._strings_string-substring%29"><code>string-substring</code></a> functions! Note what <code>string-index-of</code> returns when the substring isn’t found. What names could that happen for? Consider what would be an appropriate value for the <code>"first-name"</code> column in this case.</p>
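
<p>As a rough illustration of how the two functions behave, here is a small sketch. The sample strings are chosen for illustration only and are not necessarily rows in the sample table.</p>

<pre><code>check "how string-index-of and string-substring behave":
  # The first space in "Ada Lovelace" is at index 3 (indices start at 0)
  string-index-of("Ada Lovelace", " ") is 3
  # string-substring includes the start index and excludes the end index
  string-substring("Ada Lovelace", 0, 3) is "Ada"
  # With no space in the string, string-index-of returns -1, not an index
  string-index-of("Plato", " ") is -1
end</code></pre>

<p>Passing a negative index on to <code>string-substring</code> will not work, so the not-found case needs its own branch.</p>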

<p class="task"><strong>Task</strong>: In a comment, identify two (or more) names in the table for which this approach doesn't produce the person's actual first name.</p>

<h2 id="exercise-22-places">Exercise 2.2: Places</h2>

<p class="task"><strong>Task</strong>: The table contains a <code>"birthstate"</code> column, but this is only relevant for people born in the United States or other places that are divided into states. Use <code>transform-column</code> to change missing values to <code>"NA"</code>, standing for “not applicable” or “not available”. Call the resulting table <code>people-states</code>.</p>

<p class="task"><strong>Task</strong>: Filter the <code>people-states</code> table to only include people whose <code>"country"</code> is the United States. (Be careful – the capitalization in this column is inconsistent!) Call the resulting table <code>people-us-states</code>.</p>

<p class="task"><strong>Task</strong>: Count how many times each state occurs in the <code>people-us-states</code> table using the <code>count</code> function.</p>

<p class="task"><strong>Task</strong>: Visualize the distribution of states you just counted by making a pie chart!</p>
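
<p>One common way to compare strings without regard to capitalization is to convert both sides to the same case first. The following is a minimal sketch using Pyret's built-in <code>string-to-lower</code>. The helper name <code>same-ignoring-case</code> is hypothetical, and the sample spellings are assumptions rather than values confirmed from the spreadsheet.</p>

<pre><code>fun same-ignoring-case(a :: String, b :: String) -&gt; Boolean:
  doc: "Return true if a and b are equal once both are lowercased"
  string-to-lower(a) == string-to-lower(b)
where:
  same-ignoring-case("UNITED STATES", "United States") is true
  same-ignoring-case("united states", "United Kingdom") is false
end</code></pre>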

<h2 id="optional-exercise-23-years">OPTIONAL Exercise 2.3: Years</h2>

<p>The table contains a <code>"birthyear"</code> column. Because the famous individuals lived throughout recorded history, some of the years are BCE – <a href="https://en.wikipedia.org/wiki/Common_Era">Before the Common Era</a>. These are represented as negative numbers, which is accurate, but confusing to see!</p>

<p class="task"><strong>Task</strong>: Transform this column into strings like <code>"1970 CE"</code> and <code>"367 BCE"</code>. Call the resulting table <code>people-years</code>.</p>

<p><strong>Hint</strong>: Beware that not all of the entries in this column are proper numbers, which is why the column is loaded as strings!</p>
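
<p>Because the values are strings, they have to be converted before any arithmetic or sign check. As a sketch of how that conversion can fail safely: Pyret's built-in <code>string-to-number</code> returns an <code>Option</code>, which is <code>none</code> when the string is not a number. The helper name <code>parse-year-or</code> and the sample inputs are hypothetical and only illustrate the pattern.</p>

<pre><code>fun parse-year-or(s :: String, fallback :: Number) -&gt; Number:
  doc: "Convert s to a number, or return fallback if s is not numeric"
  cases (Option) string-to-number(s):
    | some(n) =&gt; n
    | none =&gt; fallback
  end
where:
  parse-year-or("1970", 0) is 1970
  parse-year-or("-367", 0) is -367
  parse-year-or("Unknown", 0) is 0
end

check "number helpers that may be useful for formatting":
  # num-abs drops the sign; num-to-string turns a number back into text
  num-abs(-367) is 367
  num-to-string(367) is "367"
end</code></pre>

<p>Whether a fallback number is the right choice for non-numeric entries, or whether they should be left as they are, is a design decision for your own function.</p>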

<h2 id="assignment-guidelines">Assignment guidelines</h2>

<p>As with previous assignments, you are expected to follow good Pyret style, including writing docstrings and examples for each function. Review the <a href="https://www.cs.vassar.edu/~cs101/3/resources/pyret.html">Testing and Style Guidelines</a> and ask questions if anything’s unclear!</p>

<h2 id="submitting-the-assignment">Submitting the assignment</h2>

<ol> <li><p>Download your file (<em>File</em> → <em>Download</em>) and ensure it’s named <code>asmt04.arr</code>.</p></li>

<li><p>Upload your assignment on <a href="https://www.gradescope.com">Gradescope</a>.</p></li> </ol>

<p><strong>Note</strong>: You can submit as many times as you want before the deadline. Only your latest submission will be graded.</p>

<h2 id="acknowledgments">Acknowledgments</h2>

<p>Part of this assignment is adapted from Kathi Fisler and colleagues at Brown University.</p> </html>