Ranking GB places by station count

28 Dec 2021 - Genevieve Clifford

I’ve had this idea in my head for a while now, and it goes a little something like this: “why don’t I rank places in Great Britain by the number of stations they have?”

I had a go at this earlier (rules are to follow) on Twitter by hand for Wales only:

https://twitter.com/merchygoedwig/status/1475431671373746179

https://twitter.com/merchygoedwig/status/1475432974149726212

I didn’t actually get this right: Barry only has two stations according to these rules, but I’ll correct this later.

Rules

I have to be careful about exactly what I mean by “number of stations”, though. It wouldn’t be very fun to define what constitutes a place and then enumerate all the stations in that area; that would be incredibly time-consuming and would mean messing with polygons and the like. Instead, I’ll just do a bit of data manipulation to find the following:

The number of stations for a place (name of a settlement) with the following criteria:

Qualifying criteria

Disqualifying criteria

Raison d’être

Yeah, I know it’s a strange set of requirements, but I just thought it’d be fun if I constrained the search to these parameters only.

Methodology

There is a CSV file containing every station in Great Britain along with its CRS code, which gives the station names needed in a relatively nice computer-readable format. A bit of munging is needed to get this data into a more usable form, but once that is done, applying the above rules programmatically should be possible. The analysis uses pandas in a Jupyter Notebook under Python.

Code

import pandas as pd
df = pd.read_csv('/path/to/data.csv')

# Combining multiple columns together
df1 = df['Station Name (A-F)']
df2 = df['Station Name (G-M)']
df3 = df['Station Name (N-R)']
df4 = df['Station Name (S-Y)']
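# pd.concat is a function that takes a list of Series; dropna removes the padding in shorter columns
stations = pd.concat([df1, df2, df3, df4]).dropna()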

# Getting non-unique first elements
first_elements = stations.str.split(' ', expand=True)[0].value_counts()
# Keep only first words shared by more than one station
non_unique = first_elements[first_elements > 1]

This reduces the number of entries to check from 2580 to 233, a list that is much more manageable to sort through. There are a lot of modifiers in first position (e.g. ‘New’, ‘The’, etc.), but these can quickly be discarded through visual inspection. Places with a count below 3 will not be considered, as a large number of such results is expected.
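The pruning described above could be sketched roughly as follows. The sample station names and the `MODIFIERS` list are illustrative assumptions only; the actual analysis discarded modifiers by visual inspection rather than from a fixed list.

```python
import pandas as pd

# Toy sample of station names (illustrative only, not the real dataset)
stations = pd.Series([
    'London Euston', 'London Paddington', 'London Victoria',
    'New Cross', 'New Malden', 'New Mills Central',
    'Cardiff Central', 'Cardiff Bay', 'Aberdeen',
])

# Hypothetical list of first words that are modifiers rather than places
MODIFIERS = ['New', 'The']

# Count how often each first word occurs
first_words = stations.str.split(' ').str[0].value_counts()

# Keep first words seen at least three times that are not known modifiers
is_modifier = first_words.index.isin(MODIFIERS)
candidates = first_words[(first_words >= 3) & ~is_modifier]
print(candidates)
```

In this sample only ‘London’ survives: ‘New’ reaches the threshold but is dropped as a modifier, and ‘Cardiff’ falls short because only two of its stations are included here.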

Places not in a containing place (e.g. for ‘London’: London Fields, London Road (Brighton)) will not contribute to the count of that containing place. Stations named after an area whose name merely begins with that of the containing place will not contribute to its total either. For example, Hackney Wick does not contribute to the total for Hackney, because Hackney Wick is a place in its own right and ‘Wick’ is not a modifier to ‘Hackney’.

Filtering can be done using the following pandas query:

stations[stations.str.contains('Search term')]
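Note that `str.contains` matches anywhere in the name and treats the search term as a regular expression by default, so a stricter variant can be handy when checking a single place. The following is a minimal sketch on assumed data: the sample names and the `excluded` list are hypothetical stand-ins for the decisions made by hand.

```python
import pandas as pd

# Toy sample of station names (illustrative only)
stations = pd.Series([
    'London Euston', 'London Fields', 'London Road (Brighton)',
    'London Waterloo', 'Hackney Wick',
])

place = 'London'
# Names ruled out by hand under the rules above (assumed for this example)
excluded = ['London Fields', 'London Road (Brighton)']

# Match the place only at the start of the name, followed by a space
matches = stations[stations.str.startswith(place + ' ')]

# Drop the hand-excluded names, then strip the place to leave the modifier
kept = matches[~matches.isin(excluded)]
modifiers = kept.str[len(place) + 1:].tolist()
print(modifiers)
```

The length of `modifiers` is then the count for that place, and the list itself is what ends up in the ‘Stations’ column of the results.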

Results

As expected, London ranks first, with fourteen stations in this format, a count ten higher than the next-placed entry. Five places share second place: Birmingham, Manchester, and Liverpool are found here, as expected, but so are Birkenhead and Southend, which are much smaller in population. Third place is shared amongst fourteen places. Here we find two capitals of constituent countries of the United Kingdom and the UK’s largest airport, but also much smaller settlements like Ardrossan, Redcar, and Ryde.

| Rank | Count | Place | Stations |
|------|-------|-------|----------|
| 1 | 14 | London | Blackfriars, Bridge, Cannon Street, Charing Cross, Euston, Fenchurch Street, Kings Cross, Liverpool Street, Marylebone, Paddington, St Pancras, Victoria, Waterloo, Waterloo East |
| 2= | 4 | Birkenhead | Central, Hamilton Square, North, Park |
| 2= | 4 | Birmingham | New Street, Snow Hill, Moor Street, International |
| 2= | 4 | Liverpool | Central, James Street, Lime Street, South Parkway |
| 2= | 4 | Manchester | Piccadilly, Victoria, Oxford Road, Airport |
| 2= | 4 | Southend | Central, East, Victoria, Airport |
| 3= | 3 | Ardrossan | Harbour, South Beach, Town |
| 3= | 3 | Blackpool | North, South, Pleasure Beach |
| 3= | 3 | Burnley | Barracks, Central, Manchester Road |
| 3= | 3 | Cardiff | Central, Queen Street, Bay |
| 3= | 3 | Edinburgh | Waverley, Gateway, Park |
| 3= | 3 | Exeter | Central, St Davids, St Thomas |
| 3= | 3 | Heathrow Airport | Terminal 4, Terminal 5, Terminal 1, 2 and 3 |
| 3= | 3 | Maidstone | Barracks, East, West |
| 3= | 3 | Paisley | Canal, Gilmour Street, St James |
| 3= | 3 | Pontefract | Baghill, Monkhill, Tanshelf |
| 3= | 3 | Redcar | Central, East, British Steel |
| 3= | 3 | Ryde | Esplanade, St Johns Road, Pier Head |
| 3= | 3 | Warrington | Bank Quay, Central, West |
| 3= | 3 | Watford | High Street, Junction, North |
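The ‘Rank’ column uses dense ranking: tied places share a rank and the next rank follows without a gap. A minimal sketch of how such a column could be derived with pandas is given here, using a handful of counts copied from the table. It is an illustration, not necessarily how the table was actually built.

```python
import pandas as pd

# A few counts copied from the results table
counts = pd.Series(
    {'London': 14, 'Birkenhead': 4, 'Birmingham': 4, 'Cardiff': 3, 'Ryde': 3},
    name='Count',
)

# Dense ranking: ties share a rank, and no ranks are skipped afterwards
rank = counts.rank(method='dense', ascending=False).astype(int)

# Mark ranks that are shared by more than one place
tied = rank.duplicated(keep=False)

# Append '=' to shared ranks, as in the table
labels = [f"{r}{'=' if t else ''}" for r, t in zip(rank, tied)]
print(pd.DataFrame({'Rank': labels, 'Count': counts}))
```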

Near misses

Places where one station prevented them from getting a higher rank.

| Place | Naïve count | Reason disqualified |
|-------|-------------|---------------------|
| Barry | 3 | Missing modifier for BRY |
| Dorking | 3 | Missing modifier for DKG |
| Enfield | 3 | ENC refers to Enfield Chase, not Enfield |
| Hackney | 3 | HKW refers to Hackney Wick, not Hackney |
| Manchester | 5 | MUF refers to Manchester United, not Manchester |
| Wandsworth | 3 | WSW refers to Wandsworth Common, not Wandsworth |
| Worcester | 3 | WOP starts ‘Worcestershire’, not Worcester |
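Of the reasons above, only the ‘missing modifier’ case lends itself to an automatic check; the others needed a judgement about whether a name is a place in its own right. A rough sketch of that one check follows, on a toy sample of station names. The variable names are hypothetical and this is not the procedure used to produce the table.

```python
import pandas as pd

# Toy sample of station names (illustrative only)
stations = pd.Series([
    'Barry', 'Barry Docks', 'Barry Island',
    'Ryde Esplanade', 'Ryde St Johns Road', 'Ryde Pier Head',
])

# Naive count: every station sharing a first word
first_word = stations.str.split(' ').str[0]
naive = first_word.value_counts()

# Stations whose whole name is just the place, i.e. with no modifier
bare = stations[stations == first_word]

# Places that reach three naively but include a bare, modifier-less name
near_miss = naive[(naive >= 3) & naive.index.isin(bare)]
print(near_miss)
```

Here ‘Barry’ is flagged because the station called simply ‘Barry’ has no modifier, whereas all three ‘Ryde’ stations qualify.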

The Information Exchange is the personal website of Genevieve Clifford, made with love between 2019 and 2022.