Week 6
Data Normalization
From the 5 Rules of Data Normalization poster:
- Unnormalized data
- redundant and repetitive data (jagged length rows)
- 1st Normal Form (1NF) – eliminate repeating groups
- 2NF – eliminate redundant data
- 3NF and BCNF (Boyce-Codd NF, sometimes called 3.5NF) – eliminate columns not dependent on key
- 3NF on the poster is really BCNF
- oath: Each attribute in a table must be a fact about the key, the whole key, and nothing but the key, so help me Codd.
- Higher normal forms
- 4NF – isolate independent multiple relationships
- 5NF – isolate semantically related multiple relationships
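The first three steps can be sketched in code. This is only an illustration, not the poster's own example: the student/advisor/course data, the table names, and the column names are all made up for the sketch, and it uses Python's built-in sqlite3 module with an in-memory database.

```python
import sqlite3

# Hypothetical unnormalized data: each row carries a repeating group of
# courses, so the rows have jagged lengths.
unnormalized = [
    (1, "Ada", "Prof. Smith", "Room 101", ["CS101", "CS203"]),
    (2, "Grace", "Prof. Smith", "Room 101", ["CS101"]),
    (3, "Alan", "Prof. Jones", "Room 202", ["CS203", "CS240", "CS241"]),
]

# 1NF: eliminate the repeating group, one course per row.
# The key of this flat table is (student_id, course).
first_nf = [
    (sid, name, advisor, office, course)
    for sid, name, advisor, office, courses in unnormalized
    for course in courses
]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- 3NF: office is a fact about the advisor, not about the student.
    CREATE TABLE advisors (
        advisor TEXT PRIMARY KEY,
        office  TEXT NOT NULL
    );
    -- 2NF: name and advisor depend on student_id alone, not on the course.
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        advisor    TEXT NOT NULL REFERENCES advisors(advisor)
    );
    -- What remains is keyed by the whole key (student_id, course).
    CREATE TABLE enrollments (
        student_id INTEGER NOT NULL REFERENCES students(student_id),
        course     TEXT NOT NULL,
        PRIMARY KEY (student_id, course)
    );
""")

# Load the flat rows; INSERT OR IGNORE skips facts that are already stored.
for sid, name, advisor, office, course in first_nf:
    conn.execute("INSERT OR IGNORE INTO advisors VALUES (?, ?)", (advisor, office))
    conn.execute("INSERT OR IGNORE INTO students VALUES (?, ?, ?)", (sid, name, advisor))
    conn.execute("INSERT OR IGNORE INTO enrollments VALUES (?, ?)", (sid, course))
conn.commit()


def count(table):
    """Return the number of rows in one of the tables created above."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# Joining the three tables reconstructs the flat 1NF rows.
rejoined = conn.execute("""
    SELECT s.student_id, s.name, a.advisor, a.office, e.course
    FROM enrollments e
    JOIN students s ON s.student_id = e.student_id
    JOIN advisors a ON a.advisor = s.advisor
    ORDER BY s.student_id, e.course
""").fetchall()
```

After the split, each advisor's office is stored once instead of once per enrollment, and the join shows that no information was lost along the way.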
Project Ideas (Brainstorming)
- Recommendations DB for Books, TV shows, movies…
- Student-run Rock museum - WARTHIN (local - Vassar)
- Business accelerator / Venture Capital (source: non-profit.org)
- Olympic-related (countries, events, ticket sales) (Kaggle)
- Music Library - collections, composer, key, genre
- Alumni - maybe CS-specific? (like Vassar Net)
- Last.FM (Spotify, YouTube) - common tracks among users, most common for an individual user
- Pokémon DB
- Animals used in Scientific Research, what we do to them, etc. (local to Vassar)
- Start-up DB - what kinds, trends for success/failure
- Rate My Professor - integrate with AskBanner
- Deece DB (menus, by day, when are certain foods served?)
- Boardgame DB (search for board games based on attributes)
- Data Analysis - from Government, pre/post pandemic, etc.
- NFL DB - statistics, analysis, consistency among players
- variation: at the team level, predictions
For next week
- what database project would you like to work on?
- it doesn't need to be one of the ones we brainstormed; that was just to get us thinking about the possibilities
- who would you like to partner with? (ideally groups of two)
- do you need help finding a partner?