Unit 3, Self-Check Assignment 1: Milligan, Chapter 9: Trendlines and Forecasts
Welcome to your hands-on activity! This assignment allows you to work with Tableau using some of our real-world datasets. You will follow these steps:
● Download the attached spreadsheet and upload those datasets into Tableau.
● Go through this document and use Tableau to answer all the questions listed below. Where applicable, paste screenshots into the template below.
● When you are ready, complete the online quiz verifying your homework. Use the answers you found in this document to answer the questions.
● When you have completed the online quiz, submit the Word document.
● Remember, you can always ask your instructor for help if needed.
● If you need to adjust the size of your visualizations to match the options in the questions, use the “Format”-> “Cell size” options. For example, “Ctrl+Shift+B” on a Windows computer will make the visualization bigger, and “Ctrl+Up” will make it taller.
This assignment utilizes these five datasets:
1. Real Estate Listings
2. San Francisco Sales Data
3. Happiness Survey Employees
4. Employee Attrition Data
5. Pacific West National Park Visitation 2001–2022
Question 1: Linear Trend
You are working on an analytics team contracted by a real estate sales company in San Francisco that wants to analyze the MLS data. You are asked to show the relationship between square footage and the sales price of the homes for which data is available. You were given the following instructions to find the relationship, but these are for a generic dataset, so you will practice first with the Real Estate Listings dataset, and then you will create a trend line specific to San Francisco.
Connect to the Real Estate Listings, and highlight “Price” and “Size (Sq Ft)” at the same time. Under the “show me” menu, click on the “scatter plot” option.
Alt Text: Tables
Under the analysis menu, turn “Aggregate Measures.”
Now you should see a scatterplot of the Price and Size (Sq Ft).
Alt text: scatterplot
We can see from this scatterplot that as the size increases, so does the price, so we will assume that this is a linear relationship. Now we will add a trend line. Click on the Analytics tab.
Alt Text: Analytics list
Then drag the Trend Line over to the scatterplot, and pick the first option: Linear.
Now you should have a line over your scatterplot.
Alt text: scatterplot
We now have an idea of the relationship between square feet and price for our first dataset, Real Estate Listings.
Now, load the second dataset: the San Francisco Sales Data. You will perform this same visualization and trend for the San Francisco Sales Data (use “Sf” for square feet and “Sale Price” for price).
Using the San Francisco Sales Data trendline, if you were looking at a property around 7k square feet, approximately what price would you expect?
A. 4 M
B. 850 K
C. 6 M
D. 6 K
Question 1 Answer:
<>
Question 2 (Parts A & B): Interpreting linear trend statistics
Before we share this with our audience at the real estate company, we need to look at the statistical summary behind the data. If we go back to the Real Estate Listings, we can right-click on the linear trend line and choose “Describe Trend Model.”
Alt Text: Trend lines
Alt text: Trend model
Generally speaking, when we see that the P-value is less than 0.05, we know that the model is considered statistically significant and that there is a low chance that the data occurred by random chance. Our P-value is less than 0.0001, which is a lot smaller than 0.05, so we have a statistically significant model.
We can also see the “innards” of this model. It is fitting the equation:
Price = M (Size in Sq Ft) + Intercept
If you remember a line as y=mx+b from algebra class, this is its grown-up cousin: the b = intercept and the x = size in square feet. You can think of the “b” as “how much you pay for the lot” and the “m” as “how much you pay per square foot.” In our case, the b is negative, so you calculate on a per-square-foot basis and then subtract a certain amount to get your final estimate. If you need a refresher on linear regression, check out the learning resources in the unit.
You can plug in any given size in square feet, and it will calculate the price for you.
For example, if you want to price the first dwelling with ID 1, our friends on Highway 360 in Mansfield, TX, you would see it’s a 2,300-square-foot dwelling, so you would do something like this:
Price of home = Mx + b
= M (size in square feet) + intercept
= 215.203 (size in square feet) – 242,001
= 215.203 (2,300) – 242,001
= 494,966.90 -242,001
= 252,965.90
Alt Text: table
Now, you should better understand the math behind a linear trend model. Create a linear trend model for the San Francisco Sales Data and describe the trend model for the San Francisco data.
Question 2A: Can we consider the linear trend line for San Francisco statistically significant?
Yes
No
Question 2A Answer:
<>
Question 2B: What is the price per square foot given by the San Francisco model?
Option A: 967.818
Option B: 11,303.9
Option C: 175.852
Option D: -449,822
Question 2B Answer:
<>
Question 3: Logarithmic trendline
You have a new client, a pharmaceutical company wanting to understand its workforce better. Someone in their leadership team read an article about the relationship between employee happiness and salary. They decided to survey their new entry-level employees to see how happy they were on a scale of 1–10 (1 being the least and 10 being the most). They were also asked about their salaries. The employees were part-time as well as full-time. This data are in the Happiness Survey Employees dataset. Use the first tab, labeled March, to load the data into Tableau.
First, click on both the Happiness 1 and Income 1 fields. Then click on the scatterplot option in the Show Me tab. Turn off the aggregation by going under the Analysis menu and unchecking aggregate measures. Ensure that Income 1 (independent variable) is on the x-axis and Happiness 1 (dependent variable) is on the y-axis.
You should have a scatterplot that looks like the following:
Alt text: scatterplot
Now, go under the Analytics Tab, drag the Trend Line over the graph, and select Logarithmic.
Alt text: Scatterplot logarithmic
Right-click on the line that is created and click on “Describe Trend Model.” You will get a window pop-up like the following:
Alt text: trend lines model
This model can be expressed with the following equation:
y= 1.00295ln(x)-2.12351 with an R2 = 0.9918
Y is the happiness level, and X is the salary. If you need a refresher on the math, “ln” represents the natural log that uses the base of e. There is a learning resource on this topic you can review. You can perform the ln calculation on a scientific calculator—you might already have one on your computer! We’ll show you the math below:
Y=Happiness Level Coefficient ln (x=Income) + Y-intercept
1.00295 * ln (x) + -2.12351
If we plug in Income =$1,000à 1.00295 * ln (1,000) -2.12351
Happiness Level= 4.8
If we plug in Income =$10,000à 1.00295 * ln (10,000) -2.12351
Happiness Level= 7.1 (2.3 more than $1,000)
If we plug in Income =$20,000à 1.00295 * ln (20,000) -2.12351
Happiness Level= 7.8 (0.7 more than $10,000)
If we plug in Income =$30,000à 1.00295 * ln (30,000) -2.12351
Happiness Level= 8.2 (0.4 more than $20,000)
If we plug in Income =$40,000à 1.00295 * ln (40,000) -2.12351
Happiness Level= 8.5 (0.3 more than $30,000)
If we plug in Income =$50,000à 1.00295 * ln (50,000) -2.12351
Happiness Level= 8.7 (0.2 more than $20,000)
Alt text: Happiness level
As you can see, the employees that were paid the least, probably the part-time employees, had the lowest happiness and salaries. We can see a big jump in happiness from employees paid 10K or less. Then, when we go from $10K to $20K, their happiness rating only goes up 0.7 on the scale.
Eventually, for the employees paid the most in the survey, the additional money barely makes a difference in their happiness. This can also be described as a law of diminishing returns. You are seeing big differences in one variable (happiness in this case), but as the salary increases, happiness eventually levels out.
The leadership at the company is very interested in this and decided to resurvey their employees in August one more time to make sure the data are still consistent. You will perform the same steps on the August Survey data in the Happiness Survey Employees dataset.
When you create a logarithmic trend line for the August survey, what is the value of the coefficient on income that was calculated?
Option A: 0.97778
Option B: 1.34142
Option C: -2.31583
Option D: 1.02523
Question 3 Answer:
<>
Question 4: Exponential Trend Lines
The pharmaceutical company has a new set of data, Employee Attrition Data, to help it analyze all employees. This time, you have been asked to look at the relationship between Monthly Income (USD) and Total Working Years. They would also like to see this by education level.
Create a scatterplot with these two fields following the steps you took in Questions 1 and 3. Next, add an exponential trend line.
Alt text: exponential trend line
Move “education” over to the color marks. You should now have 5 exponential trend lines. Review the P-values for the 5 levels.
Move “education” into the filter and select “Below College.” You should now have one trend line and one color in your graph.
Alt text: scatterplot
Using the line, we can say that after 15 years of working, employees with an education level below college will earn around $7K per month.
Now, change the filter to “Master” and examine the new trend line. Using the trend line for the employees with a master’s degree, what would you approximate to be the monthly income of an employee with 25 years of experience?
Option A: $7K
Option B: $11K
Option C: $6K
Option D: $9K
Question 4 Answer:
<>
Question 5: Forecasting
Your analytics company just got a contract with the National Park Service. They are working on budget and resource planning for their Pacific West parks located in Washington, Oregon, California, Hawaii, and American Samoa. They would like to know what future visitation counts will be. They have the last 20+ years of visitor counts (Pacific West National Park Visitation 2001–2022). They would like you to focus on Joshua Tree National Park, Olympic National Park, and Yosemite National Park.
After loading the data, hold down the “Park Name,” “Year,” and “Visitor Count” fields in the data pane (hold down the Ctrl button between each click). Click on the “Show me” menu, and pick the Line Graph.
Alt text: tables
Now, drag the Park Name into the filter, and select only Joshua Tree National Park, Olympic National Park, and Yosemite National Park.
Alt text: Filter
You should now see three-line graphs.
Alt text: graph
Next, click on the Analytics tab, and drag the Forecast option onto the line graphs.
Alt text: analytics
You should now see three forecasts created, one for each park.
Alt text: Line graph
You will see that the graph gives not only a prediction, but also confidence intervals around the prediction (sometimes called the “prediction funnel”). The line gives the most likely values, but since the future is uncertain, it’s telling you it could be as high as the top of the funnel or as low as the bottom of the funnel.
You can see that Joshua Tree has a predicted increase while Olympic and Yosemite are flat.
Right-click on the forecast area. Select “Forecast” then “Forecast Options.”
Alt text: Forecast options
Note that the forecast is for 4 periods (years) and ignores last year.
Let’s experiment what happens when we change the options. We’re going to change the forecast to 5 periods (years). Do not ignore the last year (change it to 0). Leave everything else as default (do not fill missing values with 0 or use a 95% confidence level).
Alt text: line graph
The National Park Service finds this very helpful and would now like you to do this for the following Hawaiian parks: “Kalaupapa NHP,” “Kaloko Honokohau NHP,” and “Pu’uhonua o Honaunau NHP.” Be sure you keep the forecast at 5 years, and keep the “Ignore last periods” at 0.
Which of the following statements are true? (Select all that apply):
Option A: The forecasted visitor count in Kaloko Honokohau NHP in 2027 is 298,382
Option B: The forecasted visitor count in Kaloko Honokohau NHP in 2026 is 246,321
Option C: The forecasted visitor count in Kaloko Honokohau NHP in 2025 is 247,074
Option D: The forecasted visitor count in Pu’uhonua o Honaunau NHP in 2024 is 312,065
Question 5 Answer:
<>