Aggregating US hospital prices

Last week, @mubashariqbal shared this on Twitter:

I came across this idea a couple of months ago. I seem to recall that @levelsio tweeted about it, but my brain might be making that up since I can’t find the tweet [1].

In any case, the first time I came across this idea, I dismissed it as something that was well beyond my maker capacity. I don’t think that has changed, but Mubs’ encouraging video made me reconsider.

In the past few months I’ve also been gaining a bit more confidence as a web developer and I’ve been changing my mindset around making stuff. I have accepted the fact that it’s unlikely that I’ll be able to create a revenue stream any time soon [2].

Lowering my expectations feels freeing. I’d still love to be able to create something that many people find so valuable that they want to pay me for it. On the other hand, even if I build something and nobody cares, I can at least say that I had fun building it. I think this is the mindset that many successful and happy makers share: they just enjoy making and sharing stuff.

Doing some research

To me, the execution of this idea seems a bit difficult. I haven’t done an exhaustive search, but it seems that many hospitals make it difficult to find their price list, or chargemaster. I’ve found a couple, and they include obscure item names that only hospital managers and insurance companies might be able to make sense of. On top of that:

it usually contains highly inflated prices at several times that of actual costs to the hospital. [It] typically serves as the starting point for negotiations with patients and health insurance providers of what amount of money will actually be paid to the hospital.

So really, if one were to manually go over the websites of the 6,210 hospitals in the US, find all the chargemasters and aggregate all that data into a website, would it really be that valuable if users can’t make sense of the items presented? Additionally, I doubt that these names are standardized, so how would one compare different hospitals if they’re not even using the same terms to refer to the same things? [3]
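To make the comparison problem concrete, here is a minimal sketch of what grouping by a shared code (rather than by free-text name) could look like. It assumes each line item carries some standardized code; the `LineItem` shape, the field names, the code `X01` and the sample prices are all made up for illustration and don’t come from any real chargemaster.

```typescript
// Hypothetical shape for one chargemaster line item; all field names are assumptions.
interface LineItem {
  hospital: string;
  description: string; // free-text name as published by the hospital
  code: string;        // a shared, standardized code, when one is available
  price: number;       // listed charge in US dollars
}

// Group listed prices by shared code so the same item can be compared
// across hospitals even when the descriptions differ.
function pricesByCode(items: LineItem[]): Map<string, Map<string, number>> {
  const byCode = new Map<string, Map<string, number>>();
  for (const item of items) {
    let hospitals = byCode.get(item.code);
    if (hospitals === undefined) {
      hospitals = new Map<string, number>();
      byCode.set(item.code, hospitals);
    }
    hospitals.set(item.hospital, item.price);
  }
  return byCode;
}

// Made-up sample data: two hospitals naming the same item differently.
const sampleItems: LineItem[] = [
  { hospital: "Hospital A", description: "PROC X LVL 2", code: "X01", price: 30000 },
  { hospital: "Hospital B", description: "Procedure X, level 2", code: "X01", price: 42000 },
];

const grouped = pricesByCode(sampleItems);
```

With data like this, `grouped.get("X01")` holds one price per hospital, which is the side-by-side view a comparison page would need. Without a shared code, the two descriptions above would look like unrelated items.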

An approach that seems a bit more promising is to follow the lead from the Medicare website. They actually already offer a Hospital Compare site and provide the datasets that they use:

These are the official datasets used on the Medicare.gov Hospital Compare Website provided by the Centers for Medicare & Medicaid Services. These data allow you to compare the quality of care at over 4,000 Medicare-certified hospitals across the country.

Seems like a healthy dataset to me, more than enough to get something started.

Building a prototype

On top of the challenge of aggregating obfuscated price lists, there’s the challenge of coming up with a technical solution that works. I know my way around HTML, CSS and JavaScript, but I don’t have much experience building complex or sophisticated websites or web applications.

So I decided to start small (is there any other way to start?) and tried to come up with a system that leverages my current skills and might have a chance of working.

I’ve just finished a Node.js course, so the first thing that came to mind was to use that stack: basically an Express server with a MongoDB database. I started building with that, but I quickly hit a snag. I’m a Gatsby and JAMstack fanatic, so I kept thinking about how a static site could fit in.

During the weekend I managed to set up the following prototype:

  • I run a Node.js script on my computer that scrapes a CSV file from one of these hospitals and writes its contents into a MongoDB database, hosted on mLab for now. My idea is to have this script run once a day as a cloud function on Webtask or a similar service, scraping datasets from many different hospitals.
  • I then have a Gatsby setup that fetches that data on build and generates the website. I could host the code on a GitHub or GitLab repo and have Netlify run a build whenever the source or the database are updated.
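The scraping step can be sketched roughly as follows. This is not my actual script, just a minimal illustration of the parsing part: it assumes a CSV with a `description,price` header, and the `PriceRecord` shape, the `hospitalId` value and the sample rows are invented for the example. Downloading the file and writing the records to the database are left out.

```typescript
// Split one CSV line into fields, honoring double-quoted fields
// (with "" as an escaped quote). Multi-line fields are not handled.
function splitCsvLine(line: string): string[] {
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i);
    if (inQuotes) {
      if (ch === '"' && line.charAt(i + 1) === '"') {
        current += '"';
        i++; // skip the second quote of the escaped pair
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Hypothetical document shape for the database; names are assumptions.
interface PriceRecord {
  hospitalId: string;
  description: string;
  price: number;
}

// Turn CSV text with a "description,price" header into records,
// skipping rows with a missing description or a missing/non-numeric price.
function parseChargemaster(hospitalId: string, csv: string): PriceRecord[] {
  const lines = csv.split(/\r?\n/).filter((l) => l.trim() !== "");
  const records: PriceRecord[] = [];
  for (const line of lines.slice(1)) { // slice(1) drops the header row
    const fields = splitCsvLine(line);
    const description = (fields[0] ?? "").trim();
    const cleaned = (fields[1] ?? "").replace(/[$,\s]/g, "");
    const price = Number(cleaned);
    if (description === "" || cleaned === "" || !Number.isFinite(price)) continue;
    records.push({ hospitalId, description, price });
  }
  return records;
}

// Made-up sample input, for illustration only.
const sampleCsv = [
  "description,price",
  '"PROC X, LEVEL 2","$1,250.00"',
  "ROOM AND BOARD,900",
  "MISSING PRICE,",
].join("\n");

const records = parseChargemaster("hospital-a", sampleCsv);
```

Keeping the parsing in a pure function like this makes it easy to test without a network connection or a database, and the same function could run locally or inside a cloud function.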

Here’s a sneak peek of all this running locally for now:

Open questions

Here are some of the things I’m a bit unsure about:

  • Will this system scale? When the number of hospitals gets bigger, how long will it take to rebuild the site? It seems that Gatsby doesn’t support partial builds at the moment.
  • The biggest question, I think: will it be possible to make sense of the data provided by hospitals? Although, as Mubs pointed out in his tweets, even if the quality of the data is not great at the moment, there’s value in getting a head start for when the quality of the data improves.
  • I’m not sure how to uniquely identify hospitals. Is there a centralized registry of hospitals from which one could get that info? There are probably companies running several hospitals, and they might all share the same prices, for example. What’s the shape of the data?
  • What’s the best approach to start indexing data? Mubs suggested starting with one state and making that complete. That would put you in a position where you might be able to get some local media attention. Makes sense. Another approach would be to start with whichever datasets are easiest to get or of the best quality.
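As a way of thinking through the last two questions, here is a rough sketch of what a hospital record might look like and how one could track progress state by state. Everything in it is an assumption: `id` stands in for whatever registry identifier turns out to exist, `ownerId` for a possible parent company, and the sample hospitals are invented.

```typescript
// Hypothetical hospital record; all field names are assumptions.
interface Hospital {
  id: string;          // placeholder for a unique registry identifier
  name: string;
  state: string;       // two-letter US state code
  ownerId?: string;    // set when several hospitals share an operator
  hasPrices: boolean;  // true once a price list has been indexed
}

// Fraction of hospitals per state that already have indexed prices.
function coverageByState(hospitals: Hospital[]): Map<string, number> {
  const totals = new Map<string, { all: number; indexed: number }>();
  for (const h of hospitals) {
    const t = totals.get(h.state) ?? { all: 0, indexed: 0 };
    t.all += 1;
    if (h.hasPrices) t.indexed += 1;
    totals.set(h.state, t);
  }
  const coverage = new Map<string, number>();
  totals.forEach((t, state) => {
    coverage.set(state, t.indexed / t.all);
  });
  return coverage;
}

// Made-up sample data, for illustration only.
const sampleHospitals: Hospital[] = [
  { id: "h1", name: "Hospital A", state: "NY", ownerId: "group-1", hasPrices: true },
  { id: "h2", name: "Hospital B", state: "NY", ownerId: "group-1", hasPrices: false },
  { id: "h3", name: "Hospital C", state: "VT", hasPrices: true },
];

const coverage = coverageByState(sampleHospitals);
```

A per-state number like this would show which state is closest to complete, which is useful if the goal is to finish one state before moving on to the next.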

I’ll keep building and looking into this. I’m not sure where it’ll take me, but for now it seems like a valuable use of my time. If you have any feedback, I’d really appreciate it. You can reach out on Twitter.

  1. An alternative explanation is that Pieter thought this was too good an idea to give away, so he deleted the tweet and has secretly been building hospitallist.com since then!
  2. In fact, I’m currently looking for a job as a remote web developer, so reach out if you have something for me 😉
  3. There is a concept called Diagnosis-related group (DRG), which was seemingly invented to alleviate this problem and I was able to find DRG tables in some cases.