watcher is a data scraping and data visualiztion projects intended to provide insight into the watch grey market. In this post, I introdce v0.0.1 of watcher and do some light analysis in Julia of the dataset which powers it.
I’m excited to officially announce v0.0.1 of watcher, the watch-watching site 😄
watcher is a data visualization tool intended to provide insight into the trends of grey market watch prices. For those not familiar, grey market watch prices can be absolutely insane. At some point, I realized that I could probably turn all these private sales into data which could provide a snapshot of the grey watch market, and even potentially provide the basis of a dataset for predicting the prices of watches.
Right now, it’s pretty barebones…but I hope that in the future I can flesh it out more!
watcher scrapes watch prices from grey market websites like forums and consolidates them into a single dataset. The dataset contains posts from both private sellers and authorized dealers and resellers.
On the technical side, watcher relies primarily on a small Python script deployed to ec2 to scrape data on a cron job. This data is pushed to s3, and then served via a Shiny app.
Right now, data is sourced primarily from a single forum in order to deliver an MVP. In the future, I hope to expand the data sources to places like Reddit, Chrono24, and more.
Some Analysis of the Dataset in Julia
Since we’re dealing with a new dataset, I thought this would also be a perfect opportunity to demonstrate a data analysis workflow in Julia that I’ve wanted to do for some time.
This is not meant to be an exhaustive analysis by any means, but more of a high-level walkthrough of some elements of the analysis that I’ve already done when I was preparing to build watcher.
Initial Setup
First, I’m going to import some packages that I’ll be using as well as define a few convenience functions. For the most part, any data analysis workflow I do in Julia is going to require at least DataFrames and DataFramesMeta.
For plotting, I’ve recently been adapting a combination of Makie and AlgebraOfGraphics in Julia. More on that later!
The convenience functions I’m defining here are used to connect to a database (in this case, a local postgres server) and to query that database using SQL.
Note
Note that I’m using Julia’s ENV to store credentials. I put these in my startup.jl file so that they persist through sessions and I’m never putting secrets in code!
Code
usingAlgebraOfGraphicsusingCairoMakieusingColorsusingDataFramesusingDataFramesMetausingDatesusingFormattingusingLibPQusingStatistics: mean, medianupdate_theme!(theme_minimal()) # set the plot theme#=generally, I'd put something like the following into an internal package. the gist here is that the following functions: - abstract away connecting to postgres - queries our database and returns that query as a dataframe.=#functionconnect_to_postgres() connection_string = ["dbname=postgres","host=localhost","user=$(ENV["DB_USER"])","password=$(ENV["DB_PASSWORD"])", ]return LibPQ.Connection(join(connection_string, " "))endfunctionquery_db(conn, query)returnDataFrame(LibPQ.execute(conn, query))end
Here are those convenience functions in action, and a snippet of the DataFrame which is returned.
70s Squale Master 1000m Orange/Black Bezel for Tissier
USD
1800.0
2023-11-12T20:55:50
0
3
FS Only - Squale 1521 50 Atmos Blue Dial Polished Case ~ $695 shipped
USD
695.0
2023-11-12T18:00:32
1
4
SM x Watch Dives Milan Blue
USD
150.0
2023-11-12T16:19:38
0
5
FS:Casio Protrek 1300YJ
USD
159.0
2023-11-12T15:37:25
0
As part of our initial setup, I’d also like to define some approximate currency conversions to USD in order to keep prices consistent in the analysis. We’ll create a dictionary of these and use that dictionary to easily convert the price with a DataFrames macro.
currency_conversions =Dict("USD"=>1,"CAD"=>.80,"EUR"=>1.15,"GBP"=>1.35,"AUD"=>.66)#=note that this function mutates the original dataframe without the need to assign a new object.=#@transform! watches @byrow:value_in_usd = currency_conversions[:currency] *:price;first((@select watches :title :currency :price :value_in_usd), 5)
5×4 DataFrame
Row
title
currency
price
value_in_usd
String?
String?
Float64?
Float64
1
Marathon Navigator X J Crew Limited Edition
USD
225.0
225.0
2
70s Squale Master 1000m Orange/Black Bezel for Tissier
USD
1800.0
1800.0
3
FS Only - Squale 1521 50 Atmos Blue Dial Polished Case ~ $695 shipped
USD
695.0
695.0
4
SM x Watch Dives Milan Blue
USD
150.0
150.0
5
FS:Casio Protrek 1300YJ
USD
159.0
159.0
Great, now we’re ready to actually do some analysis!
Getting Some Insight into Price Trends
Overall
Let’s take a look at the price trend (average and median) for all of the watches in our dataset by week.
First, we want to calculate those averages by week. We can easily do that by getting the week of each date with floor(), grouping by the week, and then calculating both the mean and median price for that week.
Note that there can be a huge discrepancy in watch prices–they can range from $50 to $50,000 or more, so to give us a better idea of the trend of the overall prices here I’m going to use a log scale.
Let’s plot that trend using the AlgebraOfGraphics library.1
Code
labels = ["Average Price", "Median Price"]plt =data(prices_by_week) *mapping(:week, [:avg_price, :median_price] .=>"value", color =dims(1) => (c -> labels[c]) ) *visual(Lines)draw( plt, axis = ( xlabel ="Week of Post", ylabel ="Log Price", title ="Overall Watch Price Trends" ), legend = (; position =:bottom, framevisible =false), palettes = (; color = [:deepskyblue2, :slateblue4]))
It looks like average prices are trending down heading into the holiday season.
Taking a Look at a Select Few Brands
Let’s do some additional data manipulation and check out the price trends across different brands.
We’ll take the same approach as before, except we’ll have to do a little bit of work to extract brands from posts.
Each post title is a simple string, and generally each post only has one brand in its title. For the purposes of this post, we’ll just stop at the first brand.
We can define a function to easily capture the first brand found in the post title, and then apply this function to the titles in our DataFrame.
Code
functionget_brand(post_title, brands) a =missingfor brand in brandsifoccursin(lowercase(brand), lowercase(post_title)) a = brandendendreturn aendbrands_of_interest = ["Rolex", "Omega", "Tudor", "IWC"]# get the brand into our dataframe@transform! watches @byrow:brand =get_brand(:title, brands_of_interest);
We should now have the brands we defined above as a column in our dataset. This enables us to do some neat stuff, for example to examine the distribution of Rolex prices:
Code
plt =data(@subset watches @byrow:brand =="Rolex") *mapping(:value_in_usd ) * AlgebraOfGraphics.density() *visual(; color = (:black))fig =draw( plt, axis = ( xlabel ="Price (USD)", ylabel ="Density", xticks = (0:30000:100000, ["\$"* Formatting.format(x, commas =true) for x in0:30000:100000]), title ="Distribution of Rolex Prices", palette = (; color = (:blue, .5)) ))xlims!(0, 100000)fig
We can also do a similar average price plot like the one we did before, this time without doing any log transformation (the prices of these brands are all relatively close to each other). The @chain macro should look pretty simillar:
plt =data(prices_by_week_brand) *mapping(:week,:avg_price =>"value", color =:brand =>"Brand" ) *visual(Lines)draw( plt, axis = ( xlabel ="Week of Post", ylabel ="Price (USD)", yticks = (0:15000:60000, ["\$"* Formatting.format(x, commas =true) for x in0:15000:60000]), title ="Overall Watch Price Trends" ), legend = (; position =:right, framevisible =false), palettes = (; color = [:coral, :deepskyblue3, :black, :darkolivegreen]))
These graphs should look pretty familiar - because most of them are available in the watcher tool!
Wrapping Up
Thanks for reading through my analysis, or thanks for skimming through my website and skipping to the end! Regardless, I hope that this was an interesting peak into my newest project and a data analysis workflow in Julia!
Footnotes
At the moment, I’m not entirely sure what the future of plotting in Julia looks like–but I am pretty confident that it’ll be Makie, and for those of us who are coming from R it’ll probably involve AlgebraOfGraphics as well. See this Julia discourse post for more details.↩︎