How MTA shut down my app for Penn Station commuters

This is a story of my first iPhone app — from idea, to planning, to execution, release, initial success and (spoiler alert!) having to shut it down because of New York City’s MTA. The last part was painful, but overall it was a great experience and I learned a ton of cool things.

Background

When I moved to Long Island, I started commuting via Long Island Rail Road (LIRR) to Manhattan’s Penn Station. For those who have never had to use it, Penn Station is massive, crowded and overwhelming. There are three train systems: Amtrak, Long Island Rail Road and New Jersey Transit. Madison Square Garden, the world’s most famous arena, stands on top of Penn Station (there are some exits that lead directly to the venue), and there are many shops, bars, restaurants, and other facilities. 650,000 people go through Penn Station every day, making it busier than all of the New York area’s major airports (JFK, LaGuardia and Newark) combined.

Between LIRR, NJ Transit and Amtrak, there are 1200 trains per day and 21 shared tracks. For many reasons, it’s not known what track your train is going to leave from until the last moment. To manage this problem, there’s a giant display in the middle of the LIRR area of Penn Station. Thousands of people are standing under it at any time. The display shows the train schedule, but no track number. You stand staring at it for some time until, ten minutes before the train leaves, the track number is finally displayed, and you now know where to go to get home. Unfortunately, so does everyone else, at the same time.

As soon as the track number for any train is revealed, an enormous crowd of people starts stampeding towards a (very narrow) staircase down to the tracks. If you wait until after the crowd passes, it will be standing room only, since trains are overcrowded. (This only applies to peak trains. If you’re taking a 3 AM train to Suffolk County, it will be almost empty.)

Idea

After a few days, I realized the system was inefficient and started looking for ways it could be improved. Was there any way to know the track number ahead of time? The official LIRR app and website published it at the same time as the giant display, a few minutes before the departure time. (Since then, the website has stopped publishing it altogether, possibly due to my app; read on.) It seemed like there was no other choice but to join everyone in waiting for the reveal.

However, after a few weeks, I started noticing certain patterns. It looked like during peak hours, trains were leaving from pretty much the same track every day. To test my theory, I decided to record the track numbers for the different trains I was taking. The theory proved to be mostly correct: while off-peak track numbers varied wildly, during peak hours it was the same track every day, 95% of the time. I started going down to the tracks ahead of the officially displayed time, and immediately realized how much better that was. I would be one of the first people to get onto the train, have my choice of any seat and avoid the anxiety. (It goes without saying that I always give up my seat to the elderly, sick or pregnant women, and everyone should do the same.)

Peak hours at Penn Station

I started to look for ways to share my knowledge with other people, and got the idea to create a mobile app. Users would get to Penn Station, open the app, see what track their train would leave from, be among the first people to head down to the track, and get a good seat.

The idea seemed simple enough, but figuring out the exact way it should work proved to be a challenge. I clearly needed to save the train data every day, but what’s next? I thought about creating a formula to determine the probable track number, but decided against it — calculations got complicated fast, and if the app was incorrect even a few times, users would lose trust. Instead, I decided to simply display the last few days’ track numbers, and let the user draw their own conclusions. If they saw the train always leaving from the same track, they would realize it’s likely to leave from the same track today as well — and in practice, that was almost always the case. This decision proved to be crucial for the app’s initial success.
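To make the "show the last few days, no prediction" idea concrete, here is a minimal sketch in Python (the real backend was PHP/PostgreSQL, so this is only an illustration). The `Departure` record shape, the `recent_tracks` helper, the train identifier and the sample dates and tracks are all made up for the example.

```python
from datetime import date
from typing import NamedTuple


class Departure(NamedTuple):
    """One recorded departure (hypothetical record shape)."""
    day: date
    train_id: str  # made-up identifier for a scheduled train
    track: int


def recent_tracks(history: list[Departure], train_id: str, days: int = 10) -> list[tuple[date, int]]:
    """Return up to `days` most recent (day, track) pairs for one train, newest first."""
    rows = [d for d in history if d.train_id == train_id]
    rows.sort(key=lambda d: d.day, reverse=True)
    return [(d.day, d.track) for d in rows[:days]]


# Invented sample data, not real LIRR history.
history = [
    Departure(date(2015, 3, 2), "5:42 PM Babylon", 19),
    Departure(date(2015, 3, 3), "5:42 PM Babylon", 19),
    Departure(date(2015, 3, 3), "6:05 PM Huntington", 17),
    Departure(date(2015, 3, 4), "5:42 PM Babylon", 19),
    Departure(date(2015, 3, 5), "5:42 PM Babylon", 20),
]

for day, track in recent_tracks(history, "5:42 PM Babylon", days=3):
    print(day.isoformat(), "track", track)
```

The function deliberately returns raw history rather than a single guess, which mirrors the decision to let users draw their own conclusions.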

Tech

I spent some time thinking about technology choices. The architecture consisted of three components:

Data Service: gets the data from LIRR and stores it in the database
Mobile app: iPhone (with an Android release planned for later)
Server: sends data to the app via an API

For the back-end, I considered using a sexy new technology like Node.js/MongoDB, which I really wanted to learn (and did learn since then), but decided against it. Since this was not going to be a throwaway project, I needed to use technologies that I knew well. For that reason, I went with a decidedly unsexy stack — PHP/PostgreSQL. I was very happy with that choice, as everything “just worked”, and I found many useful libraries that helped me save time.

For the mobile component, I explored “JavaScript in a native wrapper” technologies like PhoneGap and Sencha Touch. However, after some exploration, I decided against them, as nothing was working as it was supposed to. In the end, I hired a freelance developer to implement a native iOS (Swift) app, and it worked very well. (If you don’t have experience managing freelancers, I don’t recommend this. In my past life, I ran a software development agency, so I was well-versed in the process of finding and vetting good freelancers. If you’ve never done it, it’s not easy and there’s a lot of risk of having a bad experience.)

Data

First, I needed to see if it was possible to get this data from MTA. An API would have been great, but they did not have one. The only way to get the track info would be to scrape it from their site. The scraping itself would be easy, since the track number appeared in a very predictable tag. Getting the data safely, however (without getting blocked by MTA or bringing down their servers by bombarding them with requests), was a challenge.
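As a rough illustration of what "a very predictable tag" makes possible, here is a small Python sketch using only the standard library. The markup is an assumption: I no longer have the original page, so the `<span class="track">` element, the `TrackParser` class and the sample row are hypothetical stand-ins, not the MTA's actual HTML.

```python
from html.parser import HTMLParser


class TrackParser(HTMLParser):
    """Collects the text of every <span class="track"> element (hypothetical markup)."""

    def __init__(self) -> None:
        super().__init__()
        self._in_track = False
        self.tracks: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "span" and ("class", "track") in attrs:
            self._in_track = True

    def handle_data(self, data):
        if self._in_track and data.strip():
            self.tracks.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_track = False


def extract_tracks(html: str) -> list[str]:
    """Return the track values found in a page, in document order."""
    parser = TrackParser()
    parser.feed(html)
    parser.close()
    return parser.tracks


# Invented sample row; an empty span stands for "track not announced yet".
SAMPLE = (
    '<tr><td>5:42 PM</td><td>Babylon</td><td><span class="track">19</span></td></tr>'
    '<tr><td>6:05 PM</td><td>Huntington</td><td><span class="track"></span></td></tr>'
)

print(extract_tracks(SAMPLE))
```

An empty tag simply yields no value, which is how a scraper like this would distinguish "not posted yet" from a real track number.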

In order to not get blocked, I needed to make it hard to identify requests coming from my script. That meant any of the following could be a telltale sign:

Repeated calls from the same IP address
Calls for the same URLs at the same time intervals
Calls from the same web browser (user agent)

Getting around all of those was a lot of fun. Here’s what I ended up doing:

I signed up for the Private Internet Access VPN service, which randomizes the server’s IP address for each call using an automatically updated list of proxies.
I randomized access times, pulling the data at different time intervals.
For every request, I would use a different random user agent from a list I found online (can’t find the original one I used, but this looks like a good alternative)
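The last two items above can be sketched in a few lines of Python. This is not my original code: the `next_delay` and `build_request` helpers, the 60-second base interval, the example URL and the user-agent strings are all placeholders chosen for illustration, and the request is only built, never sent. IP rotation was handled by the VPN service, so it does not appear here.

```python
import random
import urllib.request

# Placeholder strings; a real list would hold genuine browser user agents.
USER_AGENTS = [
    "Mozilla/5.0 (placeholder-agent-1)",
    "Mozilla/5.0 (placeholder-agent-2)",
    "Mozilla/5.0 (placeholder-agent-3)",
]


def next_delay(base_seconds: float, jitter_seconds: float, rng: random.Random) -> float:
    """Wait time before the next poll: base plus or minus up to `jitter_seconds`, at least 1 s."""
    return max(1.0, base_seconds + rng.uniform(-jitter_seconds, jitter_seconds))


def build_request(url: str, rng: random.Random) -> urllib.request.Request:
    """Build (but do not send) a request carrying a randomly chosen User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": rng.choice(USER_AGENTS)})


rng = random.Random()
req = build_request("https://example.com/departures", rng)  # hypothetical URL
delay = next_delay(60, 20, rng)

print(req.get_header("User-agent"), f"next poll in {delay:.1f}s")
```

A polling loop would sleep for `delay`, send the request, and repeat, so that neither the timing nor the browser signature stays constant between calls.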

After a few weeks of nights-and-weekends coding, the app was starting to look good. However, things then got more complicated: seemingly out of the blue, MTA released an official API! On one hand, I was very pleased at this news, since I could get the data cleanly, without scraping. On the other hand, I had spent a lot of effort getting the scraping code to work perfectly. In the end, I didn’t mind too much. Even though my scraping code would go unused, it was fun learning and getting it to work (some day this knowledge may prove useful). I applied for an API key and a couple of weeks later received it.

Release

Even though I had to rewrite the backend to utilize the official API, the app was soon complete. I decided that it should look as minimalistic as possible — users would open the app, select the train they need, and immediately see the track number that this train has left from for the last 10 days.

It’s fairly obvious what track the train is about to leave from

After deciding on a price point ($1), I named the app “Track Train”, submitted it to the App Store, and off it went! Despite zero marketing, the app was getting sales and good reviews. People would write to tell me how long they had been waiting for an app like this one. Soon, it was one of the top results for LIRR-related keywords in the App Store. Other developers started to reach out asking if they could use my API to implement the same app on other platforms. Even though the money was barely enough to pay for costs, I was happy I made something that people actually wanted, and started planning more features and an expansion to other stations (Atlantic Terminal in Brooklyn has the same problems).

Shutdown

Alas, it was not meant to be. Only a few months after the launch, I received an email from MTA saying that I was using the “wrong API”. Even though I got an official usage token from the MTA themselves, the API was for “hosting data for MTA applications and not intended for third party use”. I looked into the new API and realized with a sinking feeling that it didn’t have the most important feature: track numbers for the trains! Yes, the exact feature that my entire app was built upon ☹

I was not going to give up without a fight, so I embarked on a campaign to plead my case to MTA. I reached out to different people with escalating levels of authority, asked them to add the needed features to the new API, and tried to convince them that people relied on my app, but it was all in vain. There was no way a large bureaucracy like the MTA would make an exception for me or anyone else, so I couldn’t get the data I needed from the official channels. They also stopped publishing the track number on the website, so going back to scraping was not an option either.

After spending almost a year of nights and weekends, and only a few months after launch, I had no choice but to shut down the app. I felt terrible disappointing the people that came to rely on it in their daily commute, and offered a refund to everyone who bought the app.

Lessons

Despite the unfortunate end of the project, I don’t feel too bad about how it went.

Here are some lessons that I learned along the way:

If your app depends on a large inflexible organization like MTA to succeed, you’re going to have a bad time. When MTA changed their mind about making the data available, there was no practical way for me to get it, so always make sure to have a backup plan.
For a project that you intend to launch, using proven and stable technologies that you’re familiar with makes the most sense. I was very happy with my decision to use PHP and PostgreSQL. If I had used Node.js and MongoDB, the process would probably have taken three times as long.
It makes sense to first build the simplest version that brings value to users. Building some kind of a machine learning prediction mechanism would’ve significantly delayed the project, and I doubt I would’ve maintained interest and shipped.
Solicit feedback from users at the earliest stages. As soon as I had an idea, I discussed it with all the LIRR commuters I knew, and they all said it’s something they’d use and pay money for. I showed it to them again when I had mockups, and again when I built the earliest prototype, and got useful feedback at all times.

Overall, while I feel bad about having to shut it down, I’m happy I made TrackTrain. It was my first iPhone app, and I’m proud of completing the project and shipping. I learned a lot of valuable things and had a good time doing it. I feel bad for all the people that came to rely on the app in their commutes, but hopefully a better solution will emerge soon.

Some day I’m planning to open source the historical track data from my app — I’m sure there are interesting conclusions that can be inferred. I hope that one day MTA and similar transit organizations realize they need to be friendlier to third party developers if they want to attract people to work with their platform. As for me, shortly thereafter I joined an amazing startup that creates software to help police officers do their jobs more efficiently, and couldn’t be happier.