Hands-On Lab 1: Using Pandas DataFrames in Python
This Python lab uses built-in features (the os module and lists) together with the pandas dataframe library to prepare you for common techniques in reading and organizing data.
100 points.
In this lab we’re going to learn how to read in several spreadsheet files. Each file contains annual superstore sales data for a particular market during a particular year. There are 28 files in all. We will read the contents from each file into a pandas dataframe where we can extract meaningful information from the data.
Video Tutorial
Dataset Download
1. Download Module1Data.zip to your local computer. Unzip the file.
2. In your Google Drive account, create a new folder called “MSDA683”. Copy your unzipped Module1Data folder to the new MSDA683 folder in your Google Drive account.
3. Create a new notebook called “MSDA683-Module1” in https://colab.google.com
4. Mount your Google Drive to your notebook.
5. In a new cell, import the os and pandas libraries.
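A minimal sketch of steps 4 and 5, assuming the notebook runs in Google Colab. The mount point `/content/drive` is the commonly used location, not something this lab specifies, and the `try`/`except` guard is only there so the cell also runs outside Colab.

```python
import os            # built-in module for working with files and folders
import pandas as pd  # dataframe library, imported under its usual alias

try:
    # google.colab is only available inside a Colab notebook.
    from google.colab import drive
    drive.mount("/content/drive")  # assumed mount point
except ImportError:
    print("Not running in Colab; skipping the Drive mount.")
```

In Colab, the mount step asks you to authorize access to your Google account before the Drive folders become visible.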

6. In a new cell, create a ‘path’ variable equal to the location of your Google Drive module 1 folder.
7. Create a variable called ‘salesFolder’ that connects to the path location.
8. Create a variable called ‘li’ that holds an empty list.
9. Iterate through the files in your salesFolder. For each file:
   1. Set the filename to the current file.
   2. Print the path and file name.
   3. Read the contents of that file into a dataframe.
   4. Print the number of records in the new dataframe.
   5. Append the dataframe to your list.
10. Create a new variable called “frame” that combines the dataframes in the list into a single new dataframe.
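Steps 6 through 10 can be sketched as follows. Several things here are assumptions rather than part of the lab: the files are treated as CSV (for .xlsx files, `pd.read_excel` would take the place of `pd.read_csv`), `salesFolder` is interpreted as the list of file names returned by `os.listdir`, and two tiny made-up files in a temporary folder stand in for the real data so the sketch runs anywhere. In the lab, `path` would instead point at your Module1Data folder on the mounted Drive.

```python
import os
import tempfile
import pandas as pd

# Stand-in setup (hypothetical data): two small CSV files in a temp folder.
# In the lab, skip this and set `path` to your Drive folder instead.
path = tempfile.mkdtemp()
pd.DataFrame(
    {"Market": ["APAC", "APAC"], "Sales": [100.0, 200.0], "Profit": [20.0, 30.0]}
).to_csv(os.path.join(path, "sales_a.csv"), index=False)
pd.DataFrame(
    {"Market": ["EU"], "Sales": [150.0], "Profit": [45.0]}
).to_csv(os.path.join(path, "sales_b.csv"), index=False)

salesFolder = os.listdir(path)  # names of the files in the folder
li = []                         # will collect one dataframe per file

for file in salesFolder:
    filename = os.path.join(path, file)  # full path to the current file
    print(filename)
    df = pd.read_csv(filename)           # assumption: the files are CSV
    print(len(df))                       # number of records in this file
    li.append(df)

# Stack all per-file dataframes into one, renumbering the rows from 0.
frame = pd.concat(li, ignore_index=True)
```

Passing `ignore_index=True` gives the combined dataframe a fresh row index instead of repeating each file's own 0, 1, 2, ... numbering.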

11. In a new cell, show the first five records of the new dataframe.

12. In a new cell, show the dataframe info.

13. In a new cell, show the count of unique values in each column.
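Steps 11 to 13 each map onto a single pandas call. The sketch below uses a small made-up dataframe in place of the real `frame` so it runs on its own; the rows and the Sub-Category values are hypothetical. In the notebook, each call would go in its own cell.

```python
import pandas as pd

# Hypothetical stand-in for the combined dataframe built in step 10.
frame = pd.DataFrame({
    "Market": ["APAC", "APAC", "EU", "EU", "US", "US"],
    "Sub-Category": ["Chairs", "Phones", "Chairs", "Paper", "Phones", "Paper"],
    "Sales": [100.0, 200.0, 150.0, 50.0, 80.0, 120.0],
})

# Step 11: first five records (head() returns 5 rows by default).
first_five = frame.head()
print(first_five)

# Step 12: column names, non-null counts, and data types.
frame.info()

# Step 13: number of distinct values in each column.
unique_counts = frame.nunique()
print(unique_counts)
```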

14. In a new cell, show the total Sales, total Profit, Cost of Goods Sold (COGS), and Profit Margin.
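One way to sketch step 14, assuming the usual definitions COGS = Sales - Profit and Profit Margin = Profit / Sales (the lab text does not spell these out). The dataframe below is a made-up stand-in for `frame`.

```python
import pandas as pd

# Hypothetical stand-in for the combined dataframe.
frame = pd.DataFrame({
    "Market": ["APAC", "APAC", "EU", "EU"],
    "Sales": [100.0, 200.0, 150.0, 50.0],
    "Profit": [20.0, 30.0, 45.0, 5.0],
})

total_sales = frame["Sales"].sum()
total_profit = frame["Profit"].sum()
total_cogs = total_sales - total_profit     # assumed: COGS = Sales - Profit
profit_margin = total_profit / total_sales  # assumed: margin = Profit / Sales

print("Sales:", round(total_sales, 2))
print("Profit:", round(total_profit, 2))
print("COGS:", round(total_cogs, 2))
print("Profit Margin:", round(profit_margin, 4))
```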

15. In a new cell, show the total Sales and Profit grouped by Market, rounded to two decimal places.
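Step 15 can be sketched with `groupby`, again using a made-up stand-in for `frame`. Selecting the two columns before calling `sum()` keeps non-numeric columns out of the result.

```python
import pandas as pd

# Hypothetical stand-in for the combined dataframe.
frame = pd.DataFrame({
    "Market": ["APAC", "APAC", "EU", "EU"],
    "Sales": [100.123, 200.251, 150.5, 50.0],
    "Profit": [20.456, 30.111, 45.25, 5.0],
})

# Sum Sales and Profit within each Market, then round to two decimal places.
by_market = frame.groupby("Market")[["Sales", "Profit"]].sum().round(2)
print(by_market)
```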

16. In a new cell, show total Sales and Profit with a calculated Cost of Goods Sold column, grouped by Market.
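A possible sketch of step 16: group first, then derive the new column from the grouped totals. The COGS definition (Sales - Profit) is an assumption, and the dataframe is a made-up stand-in for `frame`.

```python
import pandas as pd

# Hypothetical stand-in for the combined dataframe.
frame = pd.DataFrame({
    "Market": ["APAC", "APAC", "EU", "EU"],
    "Sales": [100.0, 200.0, 150.0, 50.0],
    "Profit": [20.0, 30.0, 45.0, 5.0],
})

# Total Sales and Profit per Market.
by_market = frame.groupby("Market")[["Sales", "Profit"]].sum()

# Calculated column (assumed definition: COGS = Sales - Profit).
by_market["COGS"] = by_market["Sales"] - by_market["Profit"]
print(by_market)
```

Computing COGS after the `groupby` means the subtraction runs once per Market on the totals rather than once per record.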

17. STUDENT PROBLEM 1. In a new cell, show the total Sales, total Profit, calculated COGS, and calculated Profit Margin, grouped by Market, and sorted by highest Profit Margin. Output should look like this:

18. STUDENT PROBLEM 2. In a new cell, show a descending list of total Sales grouped by Country. Output should look like this:

19. STUDENT PROBLEM 3. In a new cell, show a descending list of total Profit by Sub-Category (i.e., product type), rounded to two decimal places. Output should look like this:

20. STUDENT PROBLEM 4. In a new cell, show total Sales, total Profit, and calculated Profit Margin, by Sub-Category, sorted by highest Profit Margin. Output should look like this:

21. When you have completed the steps above, share your notebook with [email protected]. Then copy the notebook link from the sharing screen and paste it into the Blackboard assignment so I can grade it.
22. Lab complete.