Analyzing the DC Housing Market - August, 2017

Choropleth Closing Prices by ZIP Code

The Data

One of my favorite DC blogs is POPville - written by the Prince of Petworth himself, Dan Silverman. While skimming through posts earlier this week, I noticed an entry containing data on August, 2017 DC home and condo sales compiled by Kevin Wood, a local Realtor. True to form, I couldn’t wait to dig into it and see what I could learn about the city’s real estate market!

Preparing the Data

I downloaded the Excel version of the data set from the POPville post and imported the data into RStudio for my analysis. To get things going, I created a dataframe with the sales data, and also created a new column to calculate the difference between closing prices and list prices.


fill <- "dodgerblue4"
line <- "grey22"

sales <- read_excel("YOUR_PATH/August2017Sales.xls", col_types = c("text", "text", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "date"))
sales <- mutate(sales, sale_diff = close_price - list_price)

Then, I grouped the information in ZIP code buckets, averaging the core data points across all the sales in each ZIP. With this done, I could start to dig into the numbers for each ZIP code in the city!

g_zip <- sales %>% group_by(zip) %>% summarise(count = n(), avg_list_price = mean(list_price), avg_close_price = mean(close_price), avg_subsidy = mean(subsidy[subsidy>0]), avg_bd = mean(bd), avg_bath = mean(bath), avg_half_bath = mean(half_bath), max_close_price = max(close_price), min_close_price = min(close_price), avg_sale_diff = mean(sale_diff))


Sales Data by ZIP Code

Average Sale Price by ZIP Code

plot_zip_prices <- ggplot(data = na.omit(g_zip), aes(x = reorder(zip, -avg_close_price), y = avg_close_price, fill = avg_close_price)) + geom_col() + scale_y_continuous(name = "Average Close Price", labels = comma) + ggtitle("Plot of Average Close Prices Across DC Zip Codes") + labs(x = "ZIP Code") + scale_fill_gradient(low = "light grey", high = fill) + theme_bw() + theme(legend.position="none", axis.text.x = element_text(angle = 90, hjust = 1))

Average Sales Difference by ZIP Code

This data set is interesting to me because it begins to highlight which housing markets within DC are “heating up”. Seeing a positive difference between closing price and list price implies increased competition for homes in these ZIPs, and is where rapid-fire bidding wars are most likely to take place.

We’ll check this data out in greater depth in a few minutes, but it’s interesting to see Brookland, Congress Heights/Anacostia, and Logan Circle as the three “hottest” ZIPs with Georgetown, Kalorama/Woodley Park/Cleveland, and West End as the three “coldest” ZIPs.

plot_zip_sales_diff <- ggplot(data = na.omit(g_zip), aes(x = reorder(zip, -avg_sale_diff), y = avg_sale_diff, fill = avg_sale_diff)) + geom_col() + scale_y_continuous(name = "Difference in Close and List Prices ($)", labels = comma) + ggtitle("Plot of Difference in List and Sale Prices") + labs(x = "ZIP Code", subtitle = "Positive values denote ZIP codes where the average closing price exceeds the list price") + scale_fill_gradient() + theme_bw() + theme(legend.position="none", axis.text.x = element_text(angle = 90, hjust = 1))

When are Sales Happening?

Sales During August, 2017

Gross Sales Volume During August, 2017

One cool trend from both of these charts is the spike in sales occuring each Monday (the three highest mid-month points) - clearly folks are investigating homes on the weekend and looking to get their weeks off to a good start.

g_date <- sales %>% group_by(close_date) %>% summarize(count = n(), gross = sum(close_price))

plot_num_sales <- ggplot(na.omit(g_date), aes(x = close_date, y = count)) + geom_line() + scale_y_continuous(name = "Number of Homes Sold", labels = comma) + ggtitle("House Sales During August 2017") + labs(x = "Date") + theme_bw() 

plot_gross_sales <- ggplot(na.omit(g_date), aes(x = close_date, y = gross)) + geom_area(alpha = 0.7, color = line, fill = fill) + scale_y_continuous(name = "Gross Sales Volume ($)", labels = comma) + ggtitle("Gross Sales Volume During August 2017") + labs(x = "Date") + theme_bw() 

Mapping ZIP Code Sales Data

When I started this exercise, the thing I wanted to do most was visualize different types of sales data on top of a map of DC broken up by ZIP code. After wasting an embarrassing amount of time with fruitless Googling and Stack Overflow-ing, I finally found what I needed - the choropleth map. A crazy name for a relatively intuitive map, the choropleth breaks down the frequency or proportion of a measurement over blocks of space in a map.

Armed with my new favorite word, I set out in search of how to make my own choropleths in R. Eventually I stumbled on the choroplethrZip library from arilamstein and was able to get his code going on my machine.


With the choropleth library set up, all I needed to do was pass a dataframe with two columns, named region and value, into the zip_choropleth function to start making maps.

Choropleth Closing Prices by ZIP Code

No surprises here - sales in Northwest dominate this chart, with many of the most expensive occurring in Georgetown or Kalorama.

Choropleth Sales by ZIP Code

This map shows where the most transactions are taking place across the city: H Street/NOMA/Eckington takes the lead with 111 purchases in August, followed by the 14th Street Corridor/Adams Morgan with 94, and then Petworth/Brightwood/16th Street Heights with 87.

Choropleth Sales Difference by ZIP Code

As alluded to earlier, this map shows the difference in list vs. closing price across DC ZIPs - another good way to see which neighborhoods are receiving above-market interest. While they may dominate the price charts, ZIPs throughout Northwest fare poorly by this metric, frequently selling for much less than asking. Interest in properties along the periphery of DC proper could also mean that speculation is moving further from the urban core, as more buyers are priced out of expensive neighborhoods and bidding up prices for less-expensive areas.

Choropleth Subsidies by ZIP Code

Interestingly, the ZIP code (20032) receiving the highest average purchase subsidy, at just under $15,000 per transaction, was #2 in the previous section, where closing prices exceeded list prices by almost $7,000. I’m not totally sure the significance of that connection, but an easy inference to make is that government subsidies are bolstering the finances of home buyers in Southeast DC, artificially raising sale prices. I’ll be curious to see if this is a one-time coincidence, or turns out to be a trend moving forward.

zip_price <- select(g_zip, region = zip, value = avg_close_price)
choro_price <- zip_choropleth(zip_price, state_zoom = "district of columbia", title = "August 2017 DC Home Closings by ZIP Code", legend = "Close Price") + coord_map() 

zip_sales <- select(g_zip, region = zip, value = count)
choro_sales <- zip_choropleth(zip_sales, state_zoom = "district of columbia", title = "August 2017 DC Sales Count by ZIP Code", legend = "Number of Sales") + coord_map() 

zip_subsidy <- select(g_zip, region = zip, value = avg_subsidy)
choro_subsidy <- zip_choropleth(zip_subsidy, state_zoom = "district of columbia", title = "August 2017 DC Home Purchase Subsidies by ZIP Code", legend = "Size of Subsidies") + coord_map() 

zip_sale_diff <- select(g_zip, region = zip, value = avg_sale_diff)
choro_sale_diff <- zip_choropleth(zip_sale_diff, state_zoom = "district of columbia", title = "August 2017 DC Closing Sale Differences by ZIP Code", legend = "Difference in Close and List Prices") + coord_map() 

Sales of the Month (5 Most Expensive Closings)

Rank Address ZIP Code List Price Close Price Bedrooms Bathrooms Half Bathrooms Closing Date
1 3241 R ST NW 20007 4500000 4000000 5 3 2 8/15/2017
2 1339 29TH ST NW 20007 4299000 3975000 7 4 2 8/21/2017
3 2922 GLOVER DR NW 20016 3695000 3500000 6 5 1 8/1/2017
4 2121 S ST NW 20008 3400000 2900000 5 4 2 8/17/2017
5 4054 52ND TER NW 20016 2695000 2700000 5 4 2 8/1/2017
head(arrange(sales,desc(close_price)), n = 5)

Steals of the Month (5 Least Expensive Closings)

Rank Address ZIP Code List Price Bedrooms Bathrooms Half Bathrooms Closing Date
1 3872 9TH ST SE #102 20032 65000 1 1 0 8/30/2017
2 3872 9TH ST SE #302 20032 64900 1 1 0 8/28/2017
3 5 BRANDYWINE ST SE #B-33 20032 40000 2 1 0 8/2/2017
4 5 BRANDYWINE ST SE #B-35 20032 40000 2 1 0 8/8/2017
5 3072 30TH ST SE #302 20020 26900 2 1 0 8/29/2017
tail(arrange(na.omit(sales),desc(close_price)), n = 5)


All in all, exploring this data set was a great way for me to keep building my R scales while also exploring some of the macro trends in the DC housing market.

Since POPville publishes the home sales dataset monthly, I hope to be back next month with an update!