Tableau Case Study : Boston Real Estate

Leonie M Windari
Dec 30, 2020

Hello everyone!

This is my second Tableau exercise post. I wrote it specifically to test my understanding of Tableau and to learn more about it.

Please note that the case study is based on the edureka YouTube video below. I use the case study and dataset from the video to practice data visualization, and when I'm done, I compare my work with what the edureka team does in their video.

Like I said in the previous post, it is important to determine the purpose of your data visualization before you start building it in Tableau. But we still have to take a quick look at the data first. Here I still use the questions from the edureka video, but I hope this gives me a better sense of how to determine the purpose/goal of a data visualization in my future projects.

In this case study, we will look at a dataset of Boston real estate prices, where several different factors can influence the price. We can visualize some of these factors to see how each one correlates with the Boston real estate price.

You can download the dataset here.

Here is an explanation of each feature in the dataset :

  • CRIM : Per Capita Crime
  • RatioZN : Proportion of Residential Land Zoned for Lots over 25,000 sq ft
  • Indus : Proportion of Non-Retail Business Acres per Town
  • NQX : NO2 Concentration in ppm
  • RM : Number of Rooms per Dwelling
  • AGE : Proportion of Owner-Occupied Units Built prior to 1940
  • DIS : Distances to Boston Employment Centres
  • RAD : Index of Accessibility to Radial Highways
  • TAX : Full Value Property Tax per $10,000
  • Ptratio : Pupil-Teacher Ratio
  • B : Proportion of Segregated Minorities
  • Lstat : Proportion of Lower Income Class
  • Medv : Median Value of Homes

From the dataset, we have some questions that will hopefully give us insight into the data.

1. How does crime affect the price of real estate?

I think the best way to describe a relationship between two variables is a scatter plot. So, that's what I'm doing here!

Figure 1. House Pricing vs Crime
Figure 2. House Pricing vs Crime (with Cluster)

The result I get from a scatter plot with Crime on the x-axis and House Price on the y-axis is Figure 1. You can see that when the Crime value is small, the House Price tends to be higher. But I noticed that it's kind of hard to see, so I decided to use Cluster so that you can see the distribution properly.

I like the second visualization better (Figure 2), just because you can also see the House Price range for a specific Crime range and so on. From this graph we can draw the conclusion that houses in areas with a small crime value are more expensive, although there are some outliers.

What do you think, do you prefer Figure 1 or Figure 2?

Note : I did not do any data cleaning on the dataset. I am learning data visualization, so my focus is on teaching myself how to approach each of the questions being asked. Technically I can do data cleaning, but not today. If I did, I would handle the missing values and outliers so that I could draw a more valid conclusion.
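If I did clean the data, one common approach (not something I did in Tableau, and not from the edureka video) would be to drop rows with missing values and then drop outliers using the 1.5 × IQR rule. Here is a minimal Python sketch of that idea. The rows are made up for illustration, and the column names CRIM and Medv are assumed to match the feature list above.

```python
import statistics

def drop_missing_and_outliers(rows, key, k=1.5):
    """Drop rows whose `key` value is missing or outside the IQR fences."""
    # Step 1: remove rows where the value is missing.
    present = [row for row in rows if row.get(key) is not None]
    values = [row[key] for row in present]

    # Step 2: compute the quartiles and the fences at k * IQR.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr

    # Step 3: keep only rows that fall inside the fences.
    return [row for row in present if low <= row[key] <= high]

# Made-up values for illustration only, not taken from the real dataset.
crime_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, None, 88.0]
rows = [{"CRIM": crime, "Medv": 20.0} for crime in crime_values]

cleaned = drop_missing_and_outliers(rows, "CRIM")
print(len(rows), "rows before,", len(cleaned), "rows after")
```

Whether 1.5 × IQR is the right threshold depends on the data, so treat it as a starting point rather than a rule.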

2. How is proportion of owner-occupied units established prior to 1940 related to the price of real estate?

Because we are trying to find the relationship between two features, we will use a scatter plot once again.

Figure 3. House Price vs Proportion of Preoccupied units prior to 1940

After answering the first question, I found that I like the style of Figure 2 better, where the ranges are differentiated with different colors. In the first question I used Cluster, but for this one I just drew the dividing line where I think the two groups separate (there must be an automatic way, but for now I divided it manually).

Although there's no clear difference in the house price range of 20–30, we can draw the conclusion that houses priced between 5 and 15 sit in areas with a bigger proportion of owner-occupied units built before 1940. So houses in areas with a bigger proportion of these older units are much cheaper.

3. How does the concentration of NO2 in the air affect the price of real estate?

Breathing air with a high concentration of NO2 can irritate airways in the human respiratory system (Source). So it is important to know the concentration of NO2 before deciding whether you should buy the house or not.

Figure 4. House Price vs NO2 Concentration in ppm

From the graph we can see that the NO2 concentration doesn't have a strong correlation with the House Price, since we can't distinguish any pattern in the concentration range of 0.4–0.5. But we can see that from around 0.6 and above, the house price is lower, although there are some outliers. If we clean the outliers, I'm sure we will get a much clearer result.

4. Is the number of rooms related to the price of real estate?

Figure 5. House Price vs Number of Rooms per Dwelling

The answer to the question is yes! Because we want to see the relationship between the two, we will use a scatter plot. We can see that the correlation between House Price and Number of Rooms per Dwelling is positive: as the number of rooms increases, so does the house price.

From the graph we can see that houses with more rooms are more expensive. In the range of 7–9 rooms, the house price is more than 25.

5. Is there any relationship between the proportion of lower financial class and price drop in real estate?

Figure 6. House Price vs Proportion of Lower Income Class

Once again, for this question we are looking at the relationship between two variables, so we will use a scatter plot. Here we can see that the House Price and the Proportion of Lower Income Class have a negative correlation: where the house price is higher, the Proportion of Lower Income Class is lower. It makes sense, since people with lower income can probably only afford cheaper houses. But it's not impossible: even where houses are more expensive, the proportion of lower income class is still around 5%.

6. How do the rates fluctuate based on the crime rate in the area?

To tell you the truth, I'm not sure how to visualize this question (note that the question is from the edureka video, I just tried to answer it by myself). At first I thought of showing how the house price fluctuates with the crime rate.

Figure 7. House Price vs Crime
Figure 8. House Price Variation Value based on Crime Value

What I made at first is the left figure (Figure 7), as I wanted to show how the price fluctuates, but the edureka video showed me the right figure (Figure 8). I think either I don't understand the question well or the question is too general. But we can make good use of both graphs!

They are basically almost the same, but Figure 8 gives much more detailed insight. It is a shame that here we can only look at the picture without interacting with the data. But we can use Tableau Online to interact with the data, so you can clearly see the median, lower, and upper values for each bin of the crime value. I published my workbook on Tableau Online so you can check it here.

Here I learned how to make bins. The recommended bin size for the crime value is 3.3, but I just used 3 for this.
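To make the binning idea concrete, here is a minimal Python sketch of what happens conceptually: each crime value is assigned to a fixed-width bin of size 3, and the house prices inside each bin are summarized. It is only an illustration with made-up (crime, price) pairs, not Tableau's own implementation, and I use the plain minimum and maximum as the "lower" and "upper" values, which is a simplification.

```python
import statistics
from collections import defaultdict

def summarize_by_bin(pairs, bin_size=3):
    """Group (crime, price) pairs into fixed-width crime bins and summarize price."""
    bins = defaultdict(list)
    for crime, price in pairs:
        # A crime value of 4.8 with bin_size 3 lands in the bin starting at 3.
        bin_start = int(crime // bin_size) * bin_size
        bins[bin_start].append(price)
    return {
        start: {
            "min": min(prices),
            "median": statistics.median(prices),
            "max": max(prices),
        }
        for start, prices in sorted(bins.items())
    }

# Made-up (crime, price) pairs for illustration only.
pairs = [(0.2, 24.0), (1.5, 30.0), (2.9, 21.0), (3.1, 18.0), (4.8, 20.0), (7.0, 12.0)]
summary = summarize_by_bin(pairs)
for start, stats in summary.items():
    print(f"crime {start}-{start + 3}: {stats}")
```

Changing bin_size to 3.3 would reproduce the recommended bin width instead of the rounded one I used.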

7. Does your selection range of price impact parameters such as the Average Crime Rate and Pupil-Teacher Ratio?

Figure 9. Crime and Pupil-Teacher Ratio Value impact on the House Price

To be honest, I don't think you should combine the two features like this, because it's hard to draw a conclusion from it, but it gets better when you differentiate the two using color. As for the conclusion, let's say you want a house in the price range of 10–20: then you will get a crime rate between 0–10 and a pupil-teacher ratio between 13–20.
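The same kind of question can be sketched in Python: pick a price range, keep only the houses inside it, and look at the range and average of the crime rate and the pupil-teacher ratio. This is only a minimal sketch. The rows are made up, and the column names (Medv, CRIM, Ptratio) are assumed from the feature list above.

```python
def describe_price_range(rows, low, high, columns=("CRIM", "Ptratio")):
    """Summarize the given columns for houses priced between low and high."""
    selected = [row for row in rows if low <= row["Medv"] <= high]
    if not selected:
        return {}  # no houses in this price range
    summary = {}
    for column in columns:
        values = [row[column] for row in selected]
        summary[column] = {
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
        }
    return summary

# Made-up rows for illustration only, not taken from the real dataset.
rows = [
    {"Medv": 12.0, "CRIM": 8.0, "Ptratio": 20.0},
    {"Medv": 18.0, "CRIM": 2.0, "Ptratio": 18.0},
    {"Medv": 25.0, "CRIM": 0.5, "Ptratio": 15.0},
    {"Medv": 35.0, "CRIM": 0.1, "Ptratio": 13.0},
]
result = describe_price_range(rows, low=10, high=20)
print(result)
```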

It is not the best way to visualize it, but I can't think of anything better. If you open this in Tableau Online, where you can interact with the data, I think you can use the data filter to choose your own range of values. But for now, I'm quite satisfied with this visualization.

8. What are the Key Performance Indicators?

For this question, I analyzed the relationship of each feature with the house price and determined which features are actually related to it. Below are the features I chose and their values. If you look at the workbook I posted on Tableau Online, there's a filter where you can choose the house price range and see the feature values for that range. (This is what edureka does.)

Figure 10. Features Value based on Price Range

But I'm still not satisfied with it, and I don't see the connection between the table and the question. In Tableau, I have to manually choose the features that are related to the house price (I don't know whether you can do it automatically in Tableau or not), but if I use Python in a Jupyter Notebook, I can use a heatmap to describe the relationships and see all of them at a glance. So I prefer doing relationship analysis in a Jupyter Notebook.
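In a notebook I would usually reach for pandas' DataFrame.corr() and seaborn's heatmap() for this. To show the idea without any extra libraries, here is a minimal pure-Python sketch that computes the Pearson correlation of each feature with the house price and ranks the features by strength. The numbers are made up for illustration, and the column names are assumed from the feature list above.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rank_features(columns, target="Medv"):
    """Rank every non-target column by absolute correlation with the target."""
    scores = {
        name: pearson(values, columns[target])
        for name, values in columns.items()
        if name != target
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Made-up values for illustration only, not taken from the real dataset.
columns = {
    "RM": [5.0, 6.0, 6.5, 7.0, 8.0],
    "Lstat": [30.0, 18.0, 12.0, 8.0, 3.0],
    "RAD": [4.0, 1.0, 5.0, 2.0, 3.0],
    "Medv": [10.0, 20.0, 24.0, 30.0, 45.0],
}
ranking = rank_features(columns)
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

Features near the top of the ranking (strongly positive or strongly negative) are the ones I would keep as candidates. A heatmap shows the same numbers as colors for every pair of features at once.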

(Correct me if I’m wrong!)

9. Is there a fixed equation showing the relationship between factors and price?

To show the relationship between the factors/features and the house price, we can use a trendline. Tableau also shows the function, R-squared, and p-value, but you can only see them in my Tableau Online workbook (which I linked below).

Figure 11. Function of Crime Values to House Pricing

Here I use Crime Value as the example to determine the function that best describes its relationship with the price. I use a trendline, and you can choose which kind of relationship you want: linear, logarithmic, exponential, polynomial, or power.
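To give a feel for what a linear trendline computes, here is a minimal Python sketch of an ordinary least-squares fit with its R-squared. This is the textbook calculation, not Tableau's actual code, it does not compute the p-value, and the crime and price numbers are made up.

```python
def linear_trendline(xs, ys):
    """Least-squares fit of y = slope * x + intercept, plus R-squared."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R-squared: share of the variation in y explained by the fitted line.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared

# Made-up values for illustration only, not taken from the real dataset.
crime = [0.0, 1.0, 2.0, 3.0, 4.0]
price = [30.0, 27.0, 25.0, 21.0, 17.0]

slope, intercept, r_squared = linear_trendline(crime, price)
print(f"price = {slope:.2f} * crime + {intercept:.2f}, R-squared = {r_squared:.3f}")
```

The other trendline types can be approached with the same function by transforming the data first, for example fitting the price against the logarithm of the crime value for a logarithmic model.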

From this exercise I learned a lot about user experience: you can visualize the data and then modify the visualization so the user can interact with the data better. For example, the first 5 questions basically ask whether a feature has a relationship with the house price or not. It can be a bother if the user has to click through different figures to see each relationship.

It will be easier if we use a Dynamic Measure (and it will help your user see the relationship between the features). We can create a parameter called Measure of Interest that lists the features, and then make a Calculated Field with a CASE statement, so that each time a value of the parameter is selected, the view shows the corresponding feature.

Of course, you first have to check each feature's relationship with the price. If a feature is related to the price, put it in the Measure of Interest. If not, just leave it out.

By doing this, users can choose for themselves which feature's relationship with the house price they want to see.

Thank you for reading this, I hope this post can help you! I'm sorry if I'm still lacking in many ways, but I'm still learning. Please correct me if I'm wrong, or give me some suggestions if you have any.

I published the workbook here, so you can check it out!


Leonie M Windari

a curious human being. current enemies : manual data entry. current motivation : weekends and deadlines.