• Step 1: Examining and loading data. Write Python code in a Jupyter notebook to examine ‘[url removed, login to view]’, determine its content and format, and then load the data into a Pandas DataFrame. Every line of code should be explained in Markdown.
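A minimal sketch of Step 1, assuming the file turns out to be delimited text with a header row. The real file name and format are not given here, so SAMPLE_TEXT, peek, guess_delimiter and load are hypothetical stand-ins; if the file is in another format (for example JSON), the loading call would differ.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file, whose name and format are unknown.
SAMPLE_TEXT = (
    "date,price,bedrooms,city\n"
    "20140502T000000,313000.0,3,Seattle\n"
    "20140502T000000,2384000.0,5,Seattle\n"
)


def peek(text, n=3):
    """Return the first n lines of the raw text for a quick visual check."""
    return text.splitlines()[:n]


def guess_delimiter(header_line, candidates=(",", "\t", ";", "|")):
    """Pick the candidate delimiter that occurs most often in the header line."""
    return max(candidates, key=header_line.count)


def load(text):
    """Load delimited text into a DataFrame using the guessed delimiter."""
    delimiter = guess_delimiter(text.splitlines()[0])
    return pd.read_csv(io.StringIO(text), sep=delimiter)


first_lines = peek(SAMPLE_TEXT)
raw_df = load(SAMPLE_TEXT)
```

With a real file, the same idea applies: read the first few lines with open() before deciding how to load the data.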
• Step 2: Parsing all the columns in the DataFrame. Parse the loaded data into a Pandas DataFrame that contains the following columns (pay attention to those that have examples):
date Date the house was sold, e.g., 20140502T000000
price Sale price of the house.
bedrooms Number of bedrooms in the house.
bathrooms Number of bathrooms in the house.
sqft_living Living area in square feet, equal to the sum of the basement area (sqft_basement) and the above-ground living area (sqft_above)
sqft_lot Area of lot in square feet
floors Number of floors in the house
waterfront Whether it is a waterfront house.
view Number of views in the house.
condition Condition of the house.
sqft_above Above-ground living area in square feet
sqft_basement Area of basement in square feet
yr_built Year the house was built
yr_renovated Year the house was renovated
street Street address of the house, e.g., “3140 Franklin Ave E”
city City where the house is located, e.g., “Seattle”
statezip State and zip code of the house, separated by a space, e.g., “WA 98115” (WA is the abbreviation of Washington)
country Country where the house is located, e.g., “USA”
After the data is parsed and loaded into Pandas, you should have a DataFrame in which each row is a record and each column is an attribute. Each column should have a proper label (i.e., column name) and data type (e.g., integer, string, floating point).
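The column typing described above could be sketched as follows. The sample records, the second street address, the helper parse_columns, and the choice of which columns are integers, floats or booleans are all assumptions made for illustration; only the column names and the example values quoted in the list come from the brief.

```python
import pandas as pd

# Hypothetical raw records in which every value is still a string.
raw = pd.DataFrame({
    "date": ["20140502T000000", "20140503T000000"],
    "price": ["313000.0", "2384000.0"],
    "bedrooms": ["3", "5"],
    "bathrooms": ["1.5", "2.5"],
    "sqft_living": ["1340", "3650"],
    "sqft_above": ["1340", "3370"],
    "sqft_basement": ["0", "280"],
    "waterfront": ["0", "1"],
    "street": ["3140 Franklin Ave E", "1 Example St"],
    "city": ["Seattle", "Seattle"],
    "statezip": ["WA 98115", "WA 98119"],
    "country": ["USA", "USA"],
})

# Assumed target types; adjust after inspecting the real data.
INT_COLS = ["bedrooms", "sqft_living", "sqft_above", "sqft_basement"]
FLOAT_COLS = ["price", "bathrooms"]
STR_COLS = ["street", "city", "statezip", "country"]


def parse_columns(df):
    """Return a copy of df with each column converted to its assumed type."""
    out = df.copy()
    # The example date 20140502T000000 matches this strftime pattern.
    out["date"] = pd.to_datetime(out["date"], format="%Y%m%dT%H%M%S")
    for col in INT_COLS:
        out[col] = pd.to_numeric(out[col]).astype("int64")
    for col in FLOAT_COLS:
        out[col] = pd.to_numeric(out[col]).astype("float64")
    # Treat waterfront as a yes/no flag (assumes it is encoded as 0 or 1).
    out["waterfront"] = pd.to_numeric(out["waterfront"]).astype(bool)
    for col in STR_COLS:
        out[col] = out[col].astype(str).str.strip()
    return out


parsed = parse_columns(raw)

# Sanity check from the column description:
# sqft_living should equal sqft_above + sqft_basement.
areas_consistent = (
    parsed["sqft_living"] == parsed["sqft_above"] + parsed["sqft_basement"]
).all()
```

Calling parsed.dtypes in the notebook is a quick way to confirm that each column ended up with the intended type.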
• Step 3: Saving data. Save the parsed data into a CSV (Comma Separated Values) file.
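A minimal sketch of Step 3, assuming a small stand-in DataFrame and a hypothetical output name, houses_parsed.csv, written to a temporary directory. Reading the file back is one simple way to confirm that the save worked.

```python
import os
import tempfile

import pandas as pd

# Hypothetical parsed data; the real DataFrame would hold all the columns.
parsed = pd.DataFrame({
    "date": pd.to_datetime(["2014-05-02"]),
    "price": [313000.0],
    "bedrooms": [3],
    "city": ["Seattle"],
})

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.mkdtemp(), "houses_parsed.csv")

# index=False keeps the row index out of the file, so that the CSV
# contains only the labeled columns.
parsed.to_csv(out_path, index=False)

# Read the file back to check that the columns and values survived.
check = pd.read_csv(out_path, parse_dates=["date"])
```

Note that a CSV file stores only text, so column types such as dates have to be restored when the file is read again, as parse_dates does here.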