CSE 163, Winter 2020: Homework 3: Data Analysis

This assignment and its reflection are due by Thursday, January 30 at 11:59 pm.

You should submit your finished hw3.py and hw3-written.txt on Ed, and the reflection on Google Forms.

You may submit your assignment as many times as you want before the late cutoff (remember that submitting after the due date will cost late days). Recall that on Ed, you submit by pressing the "Mark" button. You are welcome to develop the assignment on Ed, or to develop locally and then upload to Ed before marking.

In this assignment, you will apply what you've learned so far to a more extensive "real-world" dataset using more powerful features of the Pandas library. As in HW2, this dataset is provided in CSV format. We have cleaned up the data somewhat, but you will need to handle more edge cases common to real-world datasets, including null cells that represent unknown information.
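As a minimal sketch of what null cells look like once loaded, the snippet below reads a tiny made-up CSV (not the assignment's data) with pandas. It assumes unknown values are written as `---`; that marker is only an assumption for illustration, so check the provided file to see how missing values are actually encoded.

```python
import io

import pandas as pd

# Hypothetical CSV text for illustration only: the numbers are made up and
# '---' as the missing-value marker is an assumption.
csv_text = """Year,Sex,Min degree,Total,Asian
1980,A,high school,85.4,---
1990,A,high school,85.7,91.5
"""

# na_values tells pandas which strings to treat as missing (NaN).
df = pd.read_csv(io.StringIO(csv_text), na_values=['---'])

# isna() flags the missing cells; dropna() returns a new Series without them.
missing_count = int(df['Asian'].isna().sum())
known_values = df['Asian'].dropna()

print(missing_count)          # number of unknown cells in the column
print(known_values.tolist())  # the remaining known values
```

Because `dropna()` returns a new object, the original DataFrame still contains the missing cell afterwards.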

Note that there is no graded testing portion of this assignment. We still recommend writing tests to verify the correctness of the methods that you write in Part 0, but it will be difficult to write tests for Part 1 and 2. We've provided tips in those sections to help you gain confidence about the correctness of your solutions without writing formal test functions!
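One informal way to write such tests is with plain `assert` statements on small inputs whose answers you worked out by hand. The sketch below uses a hypothetical helper, `percent_difference`, which is not one of the assignment's functions; it only illustrates the pattern of comparing floating-point results with a tolerance.

```python
import math


def percent_difference(first, second):
    """
    Hypothetical helper (not part of the assignment).
    Returns first minus second.
    """
    return first - second


def test_percent_difference():
    """
    Checks the helper on inputs whose answers were worked out by hand.
    Returns True if every check passes.
    """
    # Floating-point results are compared with a tolerance, not with ==.
    assert math.isclose(percent_difference(85.5, 80.25), 5.25)
    assert math.isclose(percent_difference(10.0, 10.0), 0.0, abs_tol=1e-9)
    return True


print(test_percent_difference())
```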

This assignment is meant to introduce you to various parts of the data science process: answering questions about your data, visualizing your data, and using your data to make predictions for new data. To help prepare for your final project, this assignment has been designed to be wide in scope so you can get practice with many different aspects of data analysis. While this assignment might look large because there are many parts, each individual part is relatively small.

Learning Objectives

After this homework, students will be able to:

  • Work with basic Python data structures.
  • Handle edge cases appropriately, including addressing missing values/data.
  • Practice user-friendly error-handling.
  • Read plotting library documentation and use example plotting code to figure out how to create more complex Seaborn plots.
  • Train a machine learning model and use it to make a prediction about the future using the scikit-learn library.

Expectations

Here are some baseline expectations you should meet:

Follow the course collaboration policies

If you are developing on Ed, all the files are there. If you are developing locally, you should download the starter code hw3.zip and open it as the project in Visual Studio Code. The files included are:

  • hw3-nces-ed-attainment.csv: A CSV file that contains data from the National Center for Education Statistics. This is described in more detail below.
  • hw3.py: The file for you to put solutions to Part 0, Part 1, and Part 2. You are required to add a main method that parses the provided dataset and calls all of the functions you are to write for this homework.
  • hw3-written.txt: The file for you to put your answers to the questions in Part 3.
  • cse163_utils.py: Provides utility functions for this assignment. You probably don't need to use anything inside this file except to import it if you have a Mac (see the comment in hw3.py).
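The main-method requirement for hw3.py can be sketched roughly as follows. This is only an outline under stated assumptions: `count_rows` is a hypothetical placeholder for the functions you will write, and the in-memory sample text (with a made-up value) stands in for the provided CSV so the sketch runs on its own.

```python
import io

import pandas as pd

# Stand-in text so the sketch runs anywhere. In hw3.py you would instead
# pass the file name 'hw3-nces-ed-attainment.csv' to pd.read_csv.
SAMPLE_TEXT = 'Year,Sex,Min degree,Total\n2000,A,high school,88.0\n'


def count_rows(data):
    """
    Hypothetical placeholder for one of the functions you will write.
    Returns the number of rows in the given DataFrame.
    """
    return len(data)


def main():
    # Parse the dataset once, then pass it to each function.
    data = pd.read_csv(io.StringIO(SAMPLE_TEXT))
    result = count_rows(data)
    print(result)
    return result


if __name__ == '__main__':
    main()
```

Parsing the data inside `main` and passing it to each function as a parameter keeps the individual functions easy to call from tests or a notebook.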

The dataset you will be processing comes from the National Center for Education Statistics. You can find the original dataset here. We have cleaned it a bit to make it easier to process in the context of this assignment. You must use our provided CSV file in this assignment.

The original dataset is titled: Percentage of persons 25 to 29 years old with selected levels of educational attainment, by race/ethnicity and sex: Selected years, 1920 through 2018. The cleaned version you will be working with has columns for Year, Sex, Educational Attainment, and race/ethnicity categories considered in the dataset. Note that not all columns will have data starting at 1920.

Our provided hw3-nces-ed-attainment.csv looks like the following (⋮ represents omitted rows):

Column Descriptions

  • Year: The year this row represents. Note there may be more than one row for the same year to show the percent breakdowns by sex.
  • Sex: The sex of the students this row pertains to, one of "F" for female, "M" for male, or "A" for all students.
  • Min degree: The degree this row pertains to. One of "high school", "associate's", "bachelor's", or "master's".
  • Total: The total percent of students of the specified gender to reach at least the minimum level of educational attainment in this year.
  • White / Black / Hispanic / Asian / Pacific Islander / American Indian or Alaska Native / Two or more races: The percent of students of this race and the specified gender to reach at least the minimum level of educational attainment in this year.
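Given these columns, selecting one specific row usually means combining a condition on Year, Sex, and Min degree. The following is a small sketch of that idea on a hand-built DataFrame; the numbers are made up for illustration and are not taken from the real dataset.

```python
import pandas as pd

# Made-up rows that follow the column layout described above.
data = pd.DataFrame({
    'Year': [2000, 2000, 2000, 2010],
    'Sex': ['A', 'F', 'M', 'A'],
    'Min degree': ["bachelor's", "bachelor's", "bachelor's", "bachelor's"],
    'Total': [29.0, 30.0, 28.0, 32.0],
})

# Each comparison produces a boolean Series (a mask).
is_2000 = data['Year'] == 2000
is_all = data['Sex'] == 'A'
is_bachelors = data['Min degree'] == "bachelor's"

# Masks are combined element-wise with &, not with the keyword 'and'.
row = data[is_2000 & is_all & is_bachelors]
total = row['Total'].iloc[0]

print(total)
```

Naming each mask separately tends to keep the lines short enough for flake8 and makes the filter easier to read.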

Interactive Development

When using data science libraries like pandas, seaborn, or scikit-learn, it's extremely helpful to actually interact with the tools you're using so you can get a better idea of the shape of your data. The preferred practice in industry is to use a Jupyter Notebook, as we have been doing in lecture, to play around with the dataset and figure out how to answer the questions you want to answer. This is incredibly helpful when you're first learning a tool, since you can experiment and get real-time feedback on whether the code you wrote does what you want.

We recommend that you try figuring out how to solve these problems in a Jupyter Notebook so you can actually interact with the data. We have made a playground Jupyter Notebook for you to use that already has the data loaded. Remember that playground notebooks on Colaboratory are temporary unless you save them to your Google Drive! If you want to save your work on the notebook, make sure you explicitly press the save button and follow the instructions to copy!

Table of Contents

Part 0: Statistical Functions with Pandas

Part 1: Plotting with Seaborn

Part 2: Machine learning with scikit-learn

Part 3: Written Responses

Part 4a: Submit Assignment and Part 4b: Complete Reflection. On Ed, you should submit:

  • hw3.py
  • hw3-written.txt

Your submission will be evaluated on the following dimensions:

  • Your solution correctly implements the described behaviors. You will have access to some tests when you turn in your assignment, but we will withhold other tests to test your solution when grading. All behavior we test is completely described by the problem specification or shown in an example.
  • No method should modify its input parameters.
  • Your main method in hw3.py must call every one of the methods you implemented in this assignment. There are no requirements on the format of the output, besides that it should save the files for Part 1 with the proper names specified in Part 1.
  • We can run your hw3.py without it crashing or causing any errors.
  • All code files submitted pass flake8.
  • Your program should be written with good programming style. This means you should use the proper naming convention for methods and variables (snake_case), and your code should not be overly redundant and should avoid unnecessary computations.
  • Every function written is commented using a docstring format that describes its behavior, parameters, returns, and highlights any special cases.
  • There is a comment at the top of each code file you write with your name, section, and a brief description of what that program does.
  • All expectations on this page or the sub-pages for the assignment are met, as are all requirements for each of the problems.

Make sure you carefully read the bullets above as they may or may not change from assignment to assignment!
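Two of the bullets above, documenting every function with a docstring and not modifying input parameters, can be illustrated together. The function below, `top_two`, is a hypothetical example rather than part of the assignment, and the docstring layout is just one reasonable style.

```python
def top_two(values):
    """
    Returns a new list with the two largest numbers in values, largest
    first. The input list is left unchanged. If values has fewer than two
    elements, returns all of them in descending order.
    """
    # sorted() builds a new list. Calling values.sort() instead would
    # reorder the caller's list in place, modifying the input parameter.
    return sorted(values, reverse=True)[:2]


scores = [3, 9, 1, 7]
result = top_two(scores)

print(result)  # [9, 7]
print(scores)  # [3, 9, 1, 7], still in its original order
```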


MATH 110 HW 3 - CH Work and assignment

Quantitative Literacy (MATH 110), Roosevelt University


MATH 110 Homework #3: Graphs and Graphing

When describing graphs, first carefully read the title and axes labels, paying attention to units. Then ask yourself the following questions: What does each graph tell you about the topic? What is interesting or noteworthy? What conclusions can be drawn? Why did the author make this graph? What information was the author intending to convey? Could any additional calculations be done to better understand the data? In addition to answering these questions, use the guidelines listed for each graph type to write a full paragraph description of the graphs below.

Scatter Graphs

The best description of a scatter graph does the following:

  • "Tells the story" of the graph from left to right, giving specific values and dates for the first data point, the last data point, and any relevant high points and low points
  • Explains the overall trend seen on the graph
  • Includes at least one relevant calculation (absolute change, percent change, ratio, etc.) to better describe the data, the trends, or changes in the data
  • Ends with some meaningful conclusion or "take away"
  • Uses terminology correctly

1. Write a thorough description of this scatter graph using the guidelines above. [3 pts] (Hold your cursor over points on the graph to see the year and the exact value for the rate.)

[Figure: scatter graph "US Poverty Rate 1959 to 2018". x-axis: Year (ticks from 1955 to 2020); y-axis: Percent Below Poverty Level (ticks from 10 to 24).]

Since 1960, the US population has been in poverty, but this has decreased by 8% since 1970. The rate has fluctuated since 1973, and while there have been tiny rises here and there, the general poverty rate has declined, as shown in the graph.

Bar or Column Graph

The best description of a bar graph does the following:

  • Explains the topic of the graph
  • Gives specific values for at least some of the categories
  • Includes at least one relevant calculation (absolute difference, percent larger than, ratio, etc.) to better describe the data or compare categories in the data
  • Includes some meaningful observations and a conclusion or "take away"

2. Write a thorough description of this bar graph using the guidelines above. [3 pts]

[Figure: bar graph "Area of the Great Lakes". Categories: Lake Superior, Lake Huron, Lake Michigan, Lake Erie, Lake Ontario; axis: Surface Area (in square miles).]

The graph shows the size of the five Great Lakes. According to the graph, Lake Superior is the largest lake in the United States. Lake Huron is the second largest, followed by Lake Michigan.

The best description of a pie graph does the following:

3. Write a thorough description of this pie graph using the guidelines above. [3 pts]

5. Open the Addicted to Social Media file. Examine the data carefully.

a. The following graph was made from the data in this file. Give two reasons that a pie graph is not the appropriate type of graph for this data set. [2 pts]

This graph is not the appropriate type because there is overlapping information.

[Figure: pie graph "Percent Addicted to Social Media 2019". Slice labels: 29%, 33%, 22%, 14%; legend entries (truncated): 18-, 23-, 39-, 55-.]

b. Make a graph of the data that is appropriate and paste it here. [2 pts]

[Figure: column graph "US Internet Users Addicted to Social Media". x-axis: Age Group (18-22, 23-38, 39-54, 55-64); y-axis: Percent; bar labels: 45%, 52%, 35%, 22%.]
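The comparison calculations that the guidelines mention (absolute difference, ratio, percent larger than) can be sketched in a few lines of Python. The pairing of each percentage with an age group below is an assumption based on the order in which the labels appear on the column graph.

```python
# Percentages as read from the column graph (age group -> percent).
# The pairing of labels and values is assumed from their order.
percent_by_age = {'18-22': 45, '23-38': 52, '39-54': 35, '55-64': 22}

highest = max(percent_by_age.values())
lowest = min(percent_by_age.values())

# Absolute difference, in percentage points.
absolute_difference = highest - lowest
# Ratio: how many times as large the highest value is.
ratio = highest / lowest
# Percent larger than: the difference relative to the lowest value.
percent_larger = (highest - lowest) / lowest * 100

print(absolute_difference)       # 30
print(round(ratio, 2))           # 2.36
print(round(percent_larger, 1))  # 136.4
```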

c. Write a paragraph with a thorough description of the data and your graph. Include as part of your description a calculation that compares the highest to the lowest. [3 pts]

This graph has a clear and understandable title. The axes are clearly labeled and allow the reader to see the information clearly.

6. The graph below was originally published on the Georgia Department of Health website in early May 2020. It was later deleted and replaced. Examine the graph carefully. Besides not labeling the axes, there are two major problems with the graph. Name one and explain what misleading impression it creates. Bonus point if you can name both of them. [2 pts]

The source of the graph is not listed. It overlaps the information and doesn't show the information about the y-axis clearly.

7. Your data analysis project data [5 pts]

Open your file and investigate all of the data.

a. Make one graph (either scatter, pie, or column) from one of the data sets. (You may need to do calculations with the data before making your graph.) Copy and paste the graph here.
