Brand | Total Posts | % of Total |
---|---|---|
Other Brands | 73,389 | 96% |
Tudor | 3,394 | 4% |
Introduction
A while ago, I wrote a post introducing `watcher`–a data scraping project written in Python. My intention was to do a fun data scraping project related to one of my hobbies where I could do some analysis and enable others to explore the data on their own via an RShiny app. In the post, I introduced the concept of the project, did some light analysis in Julia, and pointed everyone towards a bare-bones Shiny app to explore some of the data I was collecting.
After a few months, I deprecated the project. The scraper I deployed to ec2 kept breaking, either because of daylight saving time (over-engineering on my part) or because IP blocking prevented it from running, and loading the data from s3 was inefficient. For a while, I forgot about this project in favor of others.
Recently, I was thinking that it might be a good time to revisit this project for two reasons. One is that I’ve gotten more comfortable deploying my code with Docker on a virtual machine like a Digital Ocean droplet. The other is that I wanted to experiment lightly with using an LLM to classify post titles into something a bit more structured.
I can already hear people I know saying “Michael, wtf? You’re using an LLM to do something? I thought you hated AI???”
To be clear, I do think LLMs are useful tools in some cases–particularly text classification problems with a human in the loop (such as this one!). I am tired of the breathless hype around “AI” (usually referring to LLMs) that’s driven by companies with a vested interest in sustaining said hype.
These feelings warrant their own post when I can collect my thoughts in an organized way that isn’t just dunking on big tech. But for now, I’ll say this–LLMs are a tool, and like every tool they have their use. Would I trust an LLM to book my flights for me? Absolutely not.
Do I trust an LLM to read some text and follow my instructions to classify it into structured output that I can lay eyes on afterwards? Sure–the key point here being that I’m checking the output, and that this is a pretty low-stakes project. Ideally, I’d even set up automated tests to ensure the output is consistent with what I expect. I just haven’t gotten to that point yet :)
This post will focus on exploring trends in the Tudor watch market, and I’ll intersperse some commentary about how the backend of the project has changed along the way.
Exploring Trends in the Tudor Watch Market
A Bit about Tudor
Tudor–in case you aren’t familiar–is a luxury watch brand founded in 1926 by Hans Wilsdorf, the founder of Rolex. The brand initially focused on producing watches for various military units (such as the Marine Nationale in France) and professional divers, including their own version of the Submariner. Today, Tudor is considered a respectable, mid-level luxury brand alongside the likes of Omega.
Tudor is the subject of this post basically for no other reason than I like Tudor. As a sister brand to Rolex, it shares a lot of the same heritage and design DNA (the Submariner is the obvious link), and the style of the watches just speaks to me.
Diving into the Data (no pun intended)
The market for Tudor on the forum is a small one, and represents only about 4% of the total posts that I’ve scraped from the site.
A positive of this is that the number of posts was small enough to reasonably run the LLM classifier I mentioned earlier.
An LLM classifier…?
I decided to revisit this project after thinking over how I could use an LLM to derive information about a watch from forum posts, either from their titles (like this project) or even something like a seller’s Reddit comment on r/watchexchange.
In the process of revising `watcher`, I made a few changes to how the project is set up and how I interact with it:

- my scraper is a bit more robust now, with additional logging and more complex logic around rotating user agents and proxies, and it uses `httpx` instead of `requests` to resolve a strange `409` error issue I was running into.
- I have a whole mini-ecosystem in Digital Ocean that leverages droplets for a variety of tasks. The scraper and classifier are both deployed there via Docker, and another droplet hosts a Postgres database where the data is written–a departure from my previous s3-based project. I manage the models through `SQLAlchemy` and any necessary migrations through `Alembic`.
- previously, scared of running up a huge AWS bill, I attempted to start and stop an ec2 instance on a schedule via scheduled Lambda functions. Daylight saving time made this a stupid idea.
- rather than storing everything and de-duping later, on insertion I generate a unique `UUID` hashed from the title of the post and its post timestamp. This automatically keeps posts unique, as I can check that the post is not already present in the database before inserting (see the sketch after this list).
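For the curious, here is a minimal sketch of that deterministic-ID idea: `uuid5` hashes the title and timestamp into the same UUID every time, so a simple existence check before insert keeps the table de-duplicated. The namespace string and field names below are made up for illustration, not the project's actual code.

```python
import uuid
from datetime import datetime

# A fixed namespace so the same title + timestamp always hash to the same ID
# (uuid5 is deterministic and SHA-1 based). The namespace string is arbitrary.
WATCHER_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "watcher.local")


def post_id(title: str, posted_at: datetime) -> uuid.UUID:
    """Derive a stable UUID from a post's title and timestamp."""
    return uuid.uuid5(WATCHER_NAMESPACE, f"{title}|{posted_at.isoformat()}")


# Same inputs, same ID, so "is this ID already in the table?" is the whole
# de-duplication check at insert time.
pid = post_id("[WTS] Tudor Black Bay 58 79030N", datetime(2025, 5, 1, 12, 30))
print(pid)
```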
With these changes, I was able to turn my scraper loose and increase the size of my initial `watcher` dataset from 25,000 posts to over 76,000 posts in the course of a few days–the scraper was also able to backfill the posts where my previous one had failed.
These changes also made me feel confident in adding a separate element to the project, which is classifying titles/posts into a sensible set of data for analysis purposes. For example: previously, when I wanted to look into models of different brands, I had to write some gnarly regex to get what I was looking for. The purpose of the classifier is to get me 80% of the way there without regex, and generally just save me some time.
How does the LLM classification work?
Alongside the scraper, there is now a separate process which I can run to send the values of my `pydantic` model for scraped posts to an LLM (specifically Claude from Anthropic). I use a separate `pydantic` model as the parsing target for the LLM output so I can store this in its own table in the database, linked to the raw posts by UUID.
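As a rough illustration of that layout, the classified rows can share the raw post's UUID as their key. The table and column names below are hypothetical, not the project's actual schema.

```python
import uuid

from sqlalchemy import ForeignKey, String
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class RawPost(Base):
    """One row per scraped post; the UUID is derived from title + timestamp."""

    __tablename__ = "raw_posts"

    id: Mapped[uuid.UUID] = mapped_column(primary_key=True)
    title: Mapped[str] = mapped_column(String)


class ClassifiedPost(Base):
    """LLM-derived fields, keyed by the same UUID as the raw post."""

    __tablename__ = "classified_posts"

    raw_post_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("raw_posts.id"), primary_key=True
    )
    name: Mapped[str | None] = mapped_column(String, nullable=True)
    model: Mapped[str | None] = mapped_column(String, nullable=True)
    reference_number: Mapped[str | None] = mapped_column(String, nullable=True)
    year: Mapped[int | None] = mapped_column(nullable=True)
```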
The prompt for this is pretty simple, yet surprisingly effective. It borrows some techniques from Max Woolf’s blog, including offering the LLM a tip for better responses.1
The sweet thing about using a `pydantic` model for the LLM output is that I can use `LangChain` to parse the LLM output to said model. Though the model is defined to capture information like the price, seller, and other features derived from the post, the database insertion uses only the watch’s name, model, reference number, and year if it exists. The price, seller, and other post features are derived from the raw table for consistency, and because I know this information is solid.
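For a sense of what that step looks like, here is a minimal sketch. The field names, prompt, and Claude model string are illustrative, and `with_structured_output` is just one of the ways LangChain can coerce a response into a `pydantic` model; the real prompt also includes the tip trick mentioned above.

```python
from langchain_anthropic import ChatAnthropic
from pydantic import BaseModel, Field


class WatchListing(BaseModel):
    """Structured fields to extract from a post title (hypothetical schema)."""

    name: str | None = Field(None, description="Full watch name, e.g. 'Tudor Black Bay Fifty-Eight'")
    model: str | None = Field(None, description="Model line, e.g. 'Black Bay 58'")
    reference_number: str | None = Field(None, description="Manufacturer reference, e.g. '79030N'")
    year: int | None = Field(None, description="Year of the watch, if stated")


llm = ChatAnthropic(model="claude-3-5-haiku-latest")  # model name is illustrative
classifier = llm.with_structured_output(WatchListing)

listing = classifier.invoke(
    "Extract the watch details from this forum post title: "
    "'[WTS] Tudor Black Bay Fifty-Eight 79030N, full set, 2021'"
)
print(listing)
```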
I spot-checked 15 or so outputs before running the full pipeline on the Tudor posts, and from my perspective most of the mistakes the LLM makes come down to a lack of consistency in the way it labels things like model names or reference numbers. For example, `Black Bay 58`, `Black Bay Fifty-Eight`, and `Black Bay Fifty Eight` are all the same…but the model labels them differently depending on what exactly the post title is. `TUDOR` and `Tudor` are also the same.
Some of these issues could be fixed with additional prompt engineering. Even at this stage, though, it gets me 80% of the way there without a bunch of regex wrangling, even if it isn’t a 100% solution.
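One cheap way to handle those label variants, purely hypothetical here rather than something the project does yet, is a small normalization pass after classification:

```python
# Collapse known aliases to a canonical label; fall back to title-casing,
# which also handles the "TUDOR" vs. "Tudor" case.
CANONICAL = {
    "black bay 58": "Black Bay Fifty-Eight",
    "black bay fifty eight": "Black Bay Fifty-Eight",
    "black bay fifty-eight": "Black Bay Fifty-Eight",
}


def normalize_label(label: str | None) -> str | None:
    if label is None:
        return None
    key = " ".join(label.lower().split())
    return CANONICAL.get(key, label.strip().title())


print(normalize_label("Black Bay 58"))           # Black Bay Fifty-Eight
print(normalize_label("Black Bay Fifty Eight"))  # Black Bay Fifty-Eight
print(normalize_label("TUDOR"))                  # Tudor
```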
How are prices distributed across different flagship models?
Let’s take a look at the price distribution for some of Tudor’s flagship models: the Black Bay (referring to the 41mm model), the Black Bay 58, the Black Bay GMT, and the Pelagos.
On average, listing prices for the Pelagos are the highest of the flagship models, but prices for the generic Black Bay model have a wider range. The same can be said about the prices of the Black Bay 58, which I’d like to focus on here. My guess is that the higher range of prices for the BB58 is due to two factors:
- there are quite a few “special” BB58 models which are made of more valuable materials and are thus valued more highly in both retail and secondary markets
- the original black and gold–and even the blue BB58–are not as sought after nowadays due to the proliferation of new Tudor models, including new versions of the BB58
On the latter point, there used to be a longgggg waitlist for these watches at dealers and secondhand prices were inflated due to demand. It took a while, and a few Watches and Wonders reveals, to cool the market off and make these obtainable for a fair price.
Anyways, let’s take a look at the most expensive listings and see if my theory holds any weight.
Watch Name | Reference Num. | Listing Price |
---|---|---|
Tudor Black Bay Fifty-Eight 18k Yellow Gold Green | 79018V | $10,499 |
Tudor Black Bay 58 18K | M79018V-0001 | $10,499 |
Tudor Black Bay 58 18K | M79018V-0001 | $10,499 |
Tudor Black Bay 58 18K | M79018V-0001 | $10,499 |
Tudor Black Bay 58 18K | M79018V-0001 | $10,499 |
As I suspected, the highest-priced listings are pretty consistently the 18k gold Black Bay 58 with a green dial, followed by a few other special or limited editions.2
So, which Tudors are priced the lowest?
Watch Name | Reference Num. | Listing Price |
---|---|---|
Tudor Black Bay Fifty-Eight | 79030 | $850 |
Tudor Black Bay Fifty-Eight Blue | 79030B | $1,600 |
Tudor Black Bay 58 | NA | $1,800 |
Tudor Black Bay 58 | M79030B | $1,999 |
Tudor Black Bay 58 Blue | NA | $2,000 |
Again, exactly as I suspected–the OG black and gold/blue models. Using reference numbers classified by the LLM classifier, we can also see whether BB58 references which aren’t the original black and gold/blue command a higher average price than the OGs.
Reference | Total Posts | Average Price |
---|---|---|
OG BB58 | 182 | $2,938 |
Other | 133 | $4,372 |
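The split above boils down to bucketing reference numbers into the original 79030 references versus everything else and averaging the price per bucket. A hypothetical pandas sketch with made-up rows and assumed column names:

```python
import pandas as pd


def og_flag(ref: str) -> str:
    """Original BB58 references (79030N black/gilt, 79030B blue) vs. the rest."""
    return "OG BB58" if str(ref).upper().lstrip("M").startswith("79030") else "Other"


bb58 = pd.DataFrame(
    {
        "reference_number": ["79030N", "M79018V-0001", "79030B"],
        "listing_price": [2800, 10499, 3100],
    }
)
summary = (
    bb58.assign(group=bb58["reference_number"].map(og_flag))
    .groupby("group")["listing_price"]
    .agg(total_posts="count", average_price="mean")
)
print(summary)
```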
In fact, for the original BB58 model, prices have declined significantly over the past few years.
Have any flagship models retained their value particularly well?
We took a look at the price trends for the original Black Bay 58 model, but let’s take a look generally at the different models.
Compared to the other flagship models, the original Black Bay model appears to have retained its value the best (while still declining in value since 2022). The Black Bay GMT (a model that I own and love) has lost the most value since 2022.
How have prices for all Tudor listings changed over time?
Across all of Tudor’s models, the average weekly listing price has declined from $4,633 in 2022 to $3,174 so far in May of 2025, a drop of about 31%.
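That figure comes from resampling listing prices into weekly averages and comparing the ends of the series. A hypothetical pandas sketch with a few made-up rows (the real table has thousands):

```python
import pandas as pd

# Toy data; the real query pulls every Tudor listing's price and timestamp.
posts = pd.DataFrame(
    {
        "posted_at": pd.to_datetime(["2022-03-07", "2022-03-09", "2025-05-05"]),
        "listing_price": [4800, 4466, 3174],
    }
)
weekly_avg = (
    posts.set_index("posted_at")["listing_price"]
    .resample("W")
    .mean()
    .dropna()
)
change = (weekly_avg.iloc[-1] - weekly_avg.iloc[0]) / weekly_avg.iloc[0]
print(f"{change:.0%}")  # about -31% on the toy rows, mirroring the real drop
```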
Similar to our analysis of the flagship models, let’s take a look at the price distribution by year rather than by model.
Not only are prices declining over time, but the volume of posts is declining as well. 2023 saw the most posts for Tudor watches, but volume fell off quickly in 2024.
Of course, the declining prices and post volume could be due to the broader trend of a cool-off in the luxury watch market over the past few years.3
Wrapping Up
Well–I’ve resurrected my `watcher` project in some capacity! Though maybe now it will be more of an ad-hoc thing where I talk about it on my blog and make some pretty visuals.
I hope to do more analyses in the future about different brands or things I notice in the dataset. Regardless, I’ve learned a lot about setting up a production pipeline, and I’ve learned not to be as scared to just run a container endlessly.
Footnotes
Max’s exploration of this is here. Though the results are inconclusive, might as well try it out!↩︎
One of these special editions was the “Watches for Good” edition, which is apparently a special edition that was a collaboration with watch enthusiasts at Google?? LoupeThis has an explanation here…but this is the first I’ve heard of it.↩︎
https://www.businessoffashion.com/news/luxury/rolex-patek-used-watch-prices-fell-to-three-year-low-in-2024/↩︎