28 Dec 2021 - Genevieve Clifford
I’ve had this idea in my head for a while now, and it goes a little something like this: “why don’t I rank places in Great Britain by the number of stations they have?”
I had a go at this earlier (rules are to follow) on Twitter by hand for Wales only:
https://twitter.com/merchygoedwig/status/1475431671373746179
https://twitter.com/merchygoedwig/status/1475432974149726212
I didn’t actually get this right: Barry only has two stations according to these rules, but I’ll correct this later.
I have to be careful about exactly what I mean by “number of stations”, though. It wouldn’t be much fun to try to define what constitutes a place and then enumerate all the stations in that area; that would be incredibly time-consuming and would mean messing with polygons and the like. Instead, I’ll just do a bit of data manipulation to find the following:
The number of stations for a place (name of a settlement) with the following criteria:
Yeah, I know it’s a strange set of requirements, but I just thought it’d be fun if I constrained the search to these parameters only.
There is a CSV file containing every station in Great Britain along with its CRS code, which gives the station names needed in a relatively nice computer-readable format. A bit of munging is needed to get this data into a more usable format, but once that’s done, applying the above rules programmatically should be possible. To help with this analysis, I’ll use pandas in a Jupyter Notebook.
import pandas as pd

df = pd.read_csv('/path/to/data.csv')

# The station names are spread across four alphabetical columns;
# pull each one out and stack them into a single Series
df1 = df['Station Name (A-F)']
df2 = df['Station Name (G-M)']
df3 = df['Station Name (N-R)']
df4 = df['Station Name (S-Y)']
stations = pd.concat([df1, df2, df3, df4]).dropna()

# Count how often each first word occurs, then keep only
# the first words shared by more than one station
first_elements = stations.str.split(' ', expand=True)[0].value_counts()
non_unique = first_elements[first_elements > 1]
This reduces the number of entries to check from 2580 to 233, a list that is much more manageable to sort through. Although there are a lot of generic words in first position (e.g. ‘New’, ‘The’), these can quickly be discarded through visual inspection. Places with a count below 3 will not be considered, as a large number of these are expected.
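As a rough sketch of that thresholding step, here is the same idea on a toy list rather than the real CSV. The `GENERIC_FIRST_WORDS` set is a hypothetical stand-in for the visual inspection, and the station names are just a small hand-picked sample:

```python
import pandas as pd

# Toy stand-in for the real station list (an assumption, not the full data)
stations = pd.Series([
    "Cardiff Central", "Cardiff Queen Street", "Cardiff Bay",
    "New Cross", "New Cross Gate", "New Malden",
    "Barry", "Barry Docks", "Barry Island",
])

# Count how many station names share each first word
counts = stations.str.split(" ").str[0].value_counts()

# Hypothetical list of generic first words that are not place names
GENERIC_FIRST_WORDS = {"New", "The"}

# Keep first words seen at least three times that are not generic
is_generic = counts.index.isin(GENERIC_FIRST_WORDS)
candidates = counts[(counts >= 3) & ~is_generic]
print(candidates)
```

Note that ‘Barry’ still survives this step with a count of 3; the stricter naming rules have to be applied to the candidates afterwards.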
Stations named after a different place that merely shares its first word with another (e.g. for ‘London’: London Fields, London Road (Brighton)) will not contribute to the count of the other place. Likewise, stations named after an area whose name starts with the name of the containing place will not contribute to that place’s total: Hackney Wick does not count towards Hackney, because Hackney Wick is a place in its own right and ‘Wick’ is not a modifier of ‘Hackney’.
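To make those exclusions concrete, here is a minimal sketch in plain Python. The `modifiers_for` helper and the `excluded` argument are hypothetical names for illustration only; deciding what belongs in `excluded` is still a manual judgement call.

```python
def modifiers_for(place, station_names, excluded=()):
    """Return the modifiers of stations named '<place> <modifier>'.

    Names listed in `excluded` (places in their own right, such as
    Hackney Wick) are skipped. A station named exactly `place` has no
    modifier, so it never matches the '<place> ' prefix.
    """
    prefix = place + " "
    return [
        name[len(prefix):]
        for name in station_names
        if name.startswith(prefix) and name not in excluded
    ]


# Small hand-picked sample, not the full station list
names = [
    "Hackney Central", "Hackney Downs", "Hackney Wick",
    "Barry", "Barry Docks", "Barry Island",
]

hackney = modifiers_for("Hackney", names, excluded={"Hackney Wick"})
barry = modifiers_for("Barry", names)
print(hackney, barry)
```

In this sample both places end up with two qualifying stations, which puts them below the cut-off of three.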
Filtering can be done using the following pandas query:
stations[stations.str.contains('Search term')]
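One thing to watch with `str.contains` is that it matches the search term anywhere in the name and treats it as a regular expression by default. The sketch below, on a small hand-picked sample, compares a plain substring search with a pattern anchored to the start of the name; `place` and `pattern` are just illustrative variable names.

```python
import re

import pandas as pd

# Small hand-picked sample, not the full station list
stations = pd.Series([
    "London Bridge", "London Fields", "London Road (Brighton)",
    "Manchester Piccadilly", "Burnley Manchester Road",
])

place = "Manchester"

# Plain substring match: also picks up names with the term in the middle
anywhere = stations[stations.str.contains(place, regex=False)]

# Anchored match: only names that start with the place followed by a space
pattern = "^" + re.escape(place) + " "
prefixed = stations[stations.str.contains(pattern, regex=True)]

print(list(anywhere))
print(list(prefixed))
```

The anchored version avoids counting Burnley Manchester Road towards Manchester, although names like London Fields still need to be excluded by hand.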
As expected, London ranks first, with fourteen stations in this format, ten more than the next-placed entries. Five places share second place: Birmingham, Manchester, and Liverpool are found here, as expected, but so are Birkenhead and Southend, which is more of a surprise given their much smaller populations. The next rank is shared amongst fourteen places. In third position we find two capitals of constituent countries of the United Kingdom and the UK’s largest airport, but also much smaller settlements like Ardrossan, Redcar, and Ryde.
Rank | Count | Place | Stations |
---|---|---|---|
1 | 14 | London | Blackfriars, Bridge, Cannon Street, Charing Cross, Euston, Fenchurch Street, Kings Cross, Liverpool Street, Marylebone, Paddington, St Pancras, Victoria, Waterloo, Waterloo East |
2= | 4 | Birkenhead | Central, Hamilton Square, North, Park |
2= | 4 | Birmingham | New Street, Snow Hill, Moor Street, International |
2= | 4 | Liverpool | Central, James Street, Lime Street, South Parkway |
2= | 4 | Manchester | Piccadilly, Victoria, Oxford Road, Airport |
2= | 4 | Southend | Central, East, Victoria, Airport |
3= | 3 | Ardrossan | Harbour, South Beach, Town |
3= | 3 | Blackpool | North, South, Pleasure Beach |
3= | 3 | Burnley | Barracks, Central, Manchester Road |
3= | 3 | Cardiff | Central, Queen Street, Bay |
3= | 3 | Edinburgh | Waverley, Gateway, Park |
3= | 3 | Exeter | Central, St Davids, St Thomas |
3= | 3 | Heathrow Airport | Terminal 4, Terminal 5, Terminal 1, 2 and 3 |
3= | 3 | Maidstone | Barracks, East, West |
3= | 3 | Paisley | Canal, Gilmour Street, St James |
3= | 3 | Pontefract | Baghill, Monkhill, Tanshelf |
3= | 3 | Redcar | Central, East, British Steel |
3= | 3 | Ryde | Esplanade, St Johns Road, Pier Head |
3= | 3 | Warrington | Bank Quay, Central, West |
3= | 3 | Watford | High Street, Junction, North |
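The ‘2=’ style ranks in the table above correspond to a dense ranking, where tied places share a rank and the next rank follows on directly. A minimal sketch of how that could be computed in pandas, using a handful of counts copied from the table (the `labels` formatting is my own assumption about how to mark ties):

```python
import pandas as pd

# A few counts copied from the table above
counts = pd.Series({
    "London": 14,
    "Birkenhead": 4,
    "Birmingham": 4,
    "Cardiff": 3,
    "Edinburgh": 3,
})

# Dense ranking: ties share a rank and no ranks are skipped
ranks = counts.rank(method="dense", ascending=False).astype(int)

# Mark any count that occurs more than once with a trailing '='
tied = counts.duplicated(keep=False)
labels = ranks.astype(str) + tied.map({True: "=", False: ""})
print(labels)
```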
Places that a single station prevented from getting a higher rank:
Place | Naïve count | Reason disqualified |
---|---|---|
Barry | 3 | Missing modifier for BRY |
Dorking | 3 | Missing modifier for DKG |
Enfield | 3 | ENC refers to Enfield Chase, not Enfield |
Hackney | 3 | HKW refers to Hackney Wick, not Hackney |
Manchester | 5 | MUF refers to Manchester United, not Manchester |
Wandsworth | 3 | WSW refers to Wandsworth Common, not Wandsworth |
Worcester | 3 | WOP starts ‘Worcestershire’, not Worcester |
The Information Exchange is the personal website of Genevieve Clifford, made with love between 2019 and 2021.