Tudor Trends: Revisiting watcher

Revisiting my data scraping project, complete with data scraping woes, lessons learned, and some new data viz.
Affiliation
Data Analyst at CollegeVine

Published
May 13, 2025

Introduction

A while ago, I wrote a post introducing watcher, a data scraping project written in Python. My intention was to build a fun data scraping project around one of my hobbies, do some analysis, and let others explore the data on their own via an R Shiny app. In that post, I introduced the concept of the project, did some light analysis in Julia, and pointed everyone toward a bare-bones Shiny app to explore some of the data I was collecting.

After a few months, I deprecated the project. The scraper I had deployed to EC2 kept breaking, either from my own over-engineered daylight saving time handling or from IP blocking, and loading the data from S3 was inefficient. For a while, I forgot about this project in favor of others.

Recently, I was thinking that it might be a good time to revisit this project, for two reasons. One is that I've gotten more comfortable deploying my code with Docker on a virtual machine like a DigitalOcean droplet. The other is that I wanted to experiment lightly with using an LLM to classify post titles into something a bit more structured.
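As a sketch of what "just run it in a container" can look like, here's a minimal UTC-pinned loop. To be clear, this is an illustration rather than the actual watcher code, and run_scrape() is a hypothetical stand-in for the scraper's entry point; pinning the schedule to UTC is one simple way to sidestep daylight saving issues entirely.

```python
import time
from datetime import datetime, timedelta, timezone

RUN_HOUR_UTC = 6  # scrape once a day at 06:00 UTC; no DST arithmetic to get wrong


def run_scrape():
    # Hypothetical stand-in for the real scraper entry point.
    print(f"scraping at {datetime.now(timezone.utc).isoformat()}")


def seconds_until_next_run() -> float:
    # Compute how long to sleep until the next fixed UTC run time.
    now = datetime.now(timezone.utc)
    next_run = now.replace(hour=RUN_HOUR_UTC, minute=0, second=0, microsecond=0)
    if next_run <= now:
        next_run += timedelta(days=1)
    return (next_run - now).total_seconds()


if __name__ == "__main__":
    while True:
        time.sleep(seconds_until_next_run())
        run_scrape()
```

Because everything stays in UTC, there's no timezone logic to over-engineer, and the container can just keep running.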

A Note on the Use of an LLM

I can already hear people I know saying “Michael, wtf? You’re using an LLM to do something? I thought you hated AI???”

To be clear, I do think LLMs are useful tools in some cases, particularly text classification problems with a human in the loop (such as this one!). What I'm tired of is the breathless hype around “AI” (usually meaning LLMs), driven by companies with a vested interest in keeping that hype alive.

These feelings warrant their own post once I can collect my thoughts in an organized way that isn't just dunking on big tech. For now, I'll say this: LLMs are a tool, and like every tool, they have their uses. Would I trust an LLM to book my flights for me? Absolutely not.

Do I trust an LLM to read some text and follow my instructions to classify it into structured output that I can lay eyes on afterwards? Sure. The key points here are that I'm checking the output and that this is a pretty low-stakes project. Ideally, I'd even set up automated tests to ensure the output is consistent with what I expect. I just haven't gotten to that point yet :)
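To make that concrete, the classification step looks roughly like the sketch below. Here call_llm() is a hypothetical placeholder for whatever LLM client you plug in, and the fields and allowed values are made up for illustration rather than the actual schema I use.

```python
import json

ALLOWED_CONDITIONS = {"new", "used", "unknown"}

PROMPT_TEMPLATE = """Classify this watch listing title into JSON with keys
"model" (string) and "condition" (one of: new, used, unknown).
Respond with JSON only.

Title: {title}
"""


def call_llm(prompt: str) -> str:
    # Hypothetical placeholder: swap in a real LLM client call here.
    raise NotImplementedError("plug in your LLM client of choice")


def classify_title(title: str) -> dict:
    raw = call_llm(PROMPT_TEMPLATE.format(title=title))
    parsed = json.loads(raw)  # raises if the model didn't return valid JSON

    # Guardrails: fail loudly instead of silently accepting junk output.
    assert set(parsed) == {"model", "condition"}, f"unexpected keys: {parsed}"
    assert parsed["condition"] in ALLOWED_CONDITIONS, f"bad condition: {parsed}"
    return parsed
```

The assertions are the "lay eyes on it" idea in miniature: if the model returns unexpected keys or values, the pipeline fails loudly rather than quietly writing junk into the dataset.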

This post will focus on exploring trends in the Tudor watch market, and I’ll intersperse some commentary about how the backend of the project has changed along the way.

Wrapping Up

Well–I’ve resurrected my watcher project in some capacity! Though maybe now it will be more of an ad-hoc thing where I talk about it on my blog and make some pretty visuals.

I hope to do more analyses in the future about different brands or things I notice in the dataset. Regardless, I've learned a lot about setting up a production pipeline, and I'm not as scared anymore to just leave a container running endlessly.

Footnotes

  1. Max’s exploration of this is here. Though the results are inconclusive, might as well try it out!

  2. One of these special editions was the “Watches for Good” edition, which was apparently a collaboration with watch enthusiasts at Google?? LoupeThis has an explanation here…but this is the first I’ve heard of it.

  3. https://www.businessoffashion.com/news/luxury/rolex-patek-used-watch-prices-fell-to-three-year-low-in-2024/