====== Week 6 ======
~~NOTOC~~

===== Data Normalization =====

From the 5 Rules of Data Normalization {{ :people:mlsmith:rettignormalizationposter.pdf|poster}}:

  * Unnormalized data
    * redundant and repetitive data (jagged length rows)
  * 1st Normal Form (1NF) -- eliminate repeating groups
  * 2NF -- eliminate redundant data
  * 3NF and 3.5NF [[https://en.wikipedia.org/wiki/Boyce%E2%80%93Codd_normal_form|BCNF (Boyce-Codd NF)]] -- eliminate columns not dependent on key
    * 3NF on the poster is really BCNF
    * **oath:** //Each attribute in a table must be a fact about the key, the whole key, and nothing but the key, so help me Codd.//
  * Higher normal forms
    * 4NF -- isolate independent multiple relationships
    * 5NF -- isolate semantically related multiple relationships

===== Project Ideas (Brainstorming) =====

  * Recommendations DB for books, TV shows, movies...
  * Student-run Rock museum - WARTHIN (local - Vassar)
  * Business accelerator / Venture Capital (source: non-profit.org)
  * Olympic-related (countries, events, ticket sales) (Kaggle)
  * Music Library - collections, composer, key, genre
  * Alumni - maybe CS-specific? (like Vassar Net)
  * Last.FM (Spotify, YouTube) - common tracks among users, most common for an individual user
  * Pokémon DB
  * Animals used in scientific research, what we do to them, etc. (local to Vassar)
  * Start-up DB - what kinds, trends for success/failure
  * Rate My Professor - integrate with AskBanner
  * Deece DB (menus, by day, when are certain foods served?)
  * Board game DB (search for board games based on attributes)
  * Data Analysis - from Government, pre/post pandemic, etc.
  * NFL DB - statistics, analysis, consistency among players
    * variation: at the team level, predictions

===== For next week =====

  * what database project would you like to work on?
    * it doesn't need to be one of the ones we brainstormed--that was just to get us thinking about the possibilities
  * who would you like to partner with? (ideally groups of two)
    * do you need help finding a partner?
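
===== Normalization Sketch =====

The first two normalization steps listed under Data Normalization can be sketched in a few lines of Python. This is only an illustration, not taken from the poster: the student/dorm/course data and the function names ''to_1nf'' and ''to_2nf'' are made up for the example. It assumes each student has exactly one dorm, so the dorm is a fact about the student alone and not about the whole (student, course) key.

```python
# Hypothetical unnormalized data: each row carries a repeating group
# (a variable-length list of courses), so the rows are "jagged".
unnormalized = [
    ("alice", "Main Hall", ["CS101", "CS240"]),
    ("bob", "Jewett", ["CS101"]),
]


def to_1nf(rows):
    """1NF: eliminate repeating groups by emitting one row per course."""
    return [(student, dorm, course)
            for student, dorm, courses in rows
            for course in courses]


def to_2nf(flat_rows):
    """2NF: eliminate redundant data.

    The key of a flat row is (student, course), but dorm depends on
    student alone, so it moves to its own table keyed by student.
    """
    students = {}      # student -> dorm
    enrollments = []   # (student, course) pairs
    for student, dorm, course in flat_rows:
        students[student] = dorm
        enrollments.append((student, course))
    return students, enrollments


flat = to_1nf(unnormalized)
students, enrollments = to_2nf(flat)
```

After the split, alice's dorm is stored once in ''students'' instead of once per course, which is the redundancy that 2NF is meant to remove.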