Maintaining Anonymity in a Financial App

In my recent post, I talked about a project I am working on, the Assisted Living Calculator. It is a web app. Its primary audience is people with elderly parents or loved ones who will need assisted living. Here in the US, thanks to Congress, the elderly must exhaust all their assets before they can receive Medicaid for assisted living. This app will help them or their children compare assisted-living facilities based on how long Mom or Dad can afford to stay at each one, given a certain net worth that can be converted to cash to pay for assisted living. This, more than any other factor, may determine what choices are feasible for them.

I learned that care facilities can have wildly different fee schedules. Fees usually include steep one-time move-in costs that can run into thousands of dollars, followed by fees for an array of services that may be different for each facility and for each resident. These may be stated or assessed weekly, bi-weekly, semi-monthly, monthly, and so on; you get the idea. Creating an app with a list of pre-set services is therefore out of the question. The candidate's financial planner must create a unique fee schedule for each facility, one that allows for services they may have to describe themselves, each of which may be billed on a different schedule.

The app must ask for each candidate's net worth in order to calculate their affordable length of stay in each facility, taking into account the rate of cost increases in their area.
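To make the calculation concrete, here is a minimal sketch of how the affordable length of stay might be computed. It is not the app's actual code. The names (Fee, PERIODS_PER_YEAR, monthly_cost, months_affordable) are made up for illustration, and it assumes that every fee can be converted to a monthly amount, that the move-in cost is paid up front, and that costs rise smoothly at a fixed annual rate.

```python
from dataclasses import dataclass

# Billing periods per year. These labels are assumptions for this sketch.
PERIODS_PER_YEAR = {"weekly": 52, "bi-weekly": 26, "semi-monthly": 24, "monthly": 12}


@dataclass
class Fee:
    description: str  # free text, since each facility names its own services
    amount: float     # dollars per billing period
    period: str       # one of the keys in PERIODS_PER_YEAR


def monthly_cost(fees):
    """Convert every recurring fee to an average monthly amount and add them up."""
    return sum(f.amount * PERIODS_PER_YEAR[f.period] / 12 for f in fees)


def months_affordable(net_worth, move_in_cost, fees, annual_increase, max_months=1200):
    """Count the whole months a candidate can pay for before the money runs out.

    annual_increase is a fraction, e.g. 0.04 for 4% per year (assumed to
    compound monthly). max_months caps the loop if the fees are zero.
    """
    balance = net_worth - move_in_cost
    cost = monthly_cost(fees)
    monthly_growth = (1 + annual_increase) ** (1 / 12)
    months = 0
    while months < max_months and balance >= cost:
        balance -= cost
        cost *= monthly_growth  # next month's bill is slightly higher
        months += 1
    return months
```

Running this once per facility, with that facility's own list of fees, gives one number per facility that can be compared side by side.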

My long-term incentive for this, besides helping people, is to gather data about the financial preparedness of the elderly for assisted living. To this end, the app will also ask for each candidate's birth year and gender. It will also attempt to guess each candidate's country and region from their IP address. It does not save their IP address, nor does it attempt to pinpoint their location beyond country and region. Since the relationship between IP address and geographical location can be unreliable, the app may ask them to confirm their country and region. Country and region may be useful for determining the expected rate of cost increases for assisted living in their area.

Anonymity is what guarantees that privacy can never be violated. Even with strong database encryption, there is no guarantee that someone cannot find an exploit or steal or coerce the crypto-key from the maintainer. Furthermore, if control of the app and its database were to change hands one day, I have no way to guarantee that the new owner will be as scrupulous with people's financial information. I could ask for a covenant as a condition of the transfer, but enforcing it could be expensive and time-consuming with no assurance of success and no way to compensate people who might be harmed from someone else abusing their personal data. Given the risks and the importance of trust, the best way to assure privacy is not to collect any personally identifiable information at all, neither from the end users (the "respondents") nor the assisted-living candidates they use the app to plan for.

In my first version of this web app seven years ago, I used OpenID to authenticate end users by means of their accounts on one of several web services, such as Gmail, Yahoo, and Facebook. At the time, I liked OpenID because it unburdened my app from having to maintain people's IDs in a database and manage their passwords. Instead, I could leave this to a service such as Facebook or Google that the end user already uses and implicitly trusts. (The cynically inclined among us may take exception to the word "trust" in relation to social media services, but in my view, if you have an account with any of them, you are already trusting it, like it or not.)

The result of an OpenID login is a unique URI for each account and, optionally, other data that the protocol requires the end user to give permission for before it will send anything to the requesting site. In my case, that meant only the URI. I then hashed the URI with SHA-1 and used the hash to identify each end user when they came back. I didn't even have to know their email addresses.
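The hashing step can be sketched in a few lines of Python. The function name user_key and the example URIs are made up for illustration. The only thing taken from the description above is that the identity URI is hashed with SHA-1 and the digest is what gets stored.

```python
import hashlib


def user_key(identity_uri: str) -> str:
    """Return the SHA-1 hex digest of an identity URI.

    The app stores only this digest. The same URI always produces the same
    digest, so a returning user can be matched without keeping the URI itself.
    """
    return hashlib.sha1(identity_uri.encode("utf-8")).hexdigest()


# Hypothetical identity URIs, for illustration only.
first_visit = user_key("https://id.example.com/users/12345")
return_visit = user_key("https://id.example.com/users/12345")
someone_else = user_key("https://id.example.com/users/67890")
```

Here first_visit and return_visit are equal, while someone_else is different. If a newer hash is preferred, hashlib.sha256 can be swapped in with no other changes.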

This can work well for most people with one or two parents who need to plan for assisted living. The app will need to combine fee schedules for those planning for both parents, since each may require separate services or stay in a separate facility depending on the care needed. In either case, there is no need to assign any names to the candidates other than "Mom" or "Dad".

What about those who have to plan for multiple people, such as social workers, geriatric nurses, financial planners, or even people tasked with planning assisted-living for elderly members of an extended family? These will want to assign some name or identifier to each case they handle so they can find it when they come back to the app. Assigning a name or any other personally identifiable information, even if encrypted, would violate anonymity, which I have set as an uncompromisable requirement for this app.

In my attempts to solve this, I considered several schemes, most of which turned out to be unworkable upon reflection:

Force an end-user with multiple clients to assign a number to each client.

Forcing people to record a name and number on a separate sheet of paper or in an electronic document on a laptop is a non-starter. People will lose it. Worse, people will use things like their clients' Social Security numbers or phone numbers. I don't want to be responsible for even the last four digits.

Force an end-user with multiple clients to assign a name to each client from a list of military-style code names.

You can see what I'm talking about here. Again, social workers and financial planners will have to record these names along with their clients' real names on a separate sheet of paper or in an electronic document, which can be lost. It also just seems a bit silly: "Bravo Falcon to Victor Sunburst, come in, do you read me?" I don't think so.

Generate a public/private key-pair for each user with multiple clients and give them their private key. Use that to encrypt the name of each client they are planning for.

Now we're talking strong security! But remember who the end-users for this use-case are. Financial planners, social workers, geriatric nurses. Do you know any financial planners or social workers? If you do, would they know what to do with a private RSA key even if you told them? They would still have to store the key somewhere and remember where they put it. How likely is that?

Don't require authentication. Don't take names.

What if the app doesn't require end users to authenticate at all? Instead, each time they want to plan for a new client, it assigns that client a unique URI. The end user still has to bookmark the URI for each client, but most people who already know how to use a browser also know how to bookmark a URI, or they can learn. They can give that bookmark any name they want, including their client's actual name. Modern browsers like Firefox and Chrome allow users who sign in to a sync service to share their bookmarks across devices. The bookmarks are still private and under the user's control. If a professional with multiple clients bookmarks a client's URI on a public computer under that client's name, it's on them. That's why they are called "professionals" in the first place.

This is not ideal. Even individuals using the app on behalf of one or two parents will have to bookmark it if they want to come back. People will forget to bookmark, forget where they saved the bookmark, or forget they ever created a bookmark. When this happens, too bad, they will simply have to start over again. But it seems like a pretty reasonable compromise.
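The unique-URI scheme can be sketched as follows. This is only an illustration under stated assumptions: the base URL, the in-memory dictionary standing in for a database, and the function names create_plan and load_plan are all hypothetical. The one important design point is that the URI must be unguessable, since anyone who holds it can open the plan. Python's secrets module is intended for generating tokens like this.

```python
import secrets

BASE_URL = "https://example.org/plan/"  # hypothetical address for the app

# In-memory stand-in for the database: token -> anonymous plan data.
plans = {}


def create_plan() -> str:
    """Create an empty plan and return the URI the end user should bookmark."""
    token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe text
    plans[token] = {
        "birth_year": None,
        "gender": None,
        "net_worth": None,
        "facilities": [],
    }
    return BASE_URL + token


def load_plan(uri: str):
    """Return the plan stored for a bookmarked URI, or None if there is none."""
    if not uri.startswith(BASE_URL):
        return None
    return plans.get(uri[len(BASE_URL):])


# A planner with two clients gets two unrelated URIs, one bookmark each.
uri_for_first_client = create_plan()
uri_for_second_client = create_plan()
```

Nothing in the stored record says who the client is or who created it. The name lives only in the bookmark, on the end user's side.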

If people start entering data twice for the same individual, researchers will have no way to know that when they scan the data and compute aggregates. There will then be multiple data points for the same individual. Researchers may develop techniques to check for likely duplicates, but there will be false positives and false negatives. By the same token, even when people don't create multiple URIs for the same assisted-living candidate, we have no way to know how accurate or complete their data is. Researchers in the social sciences have to deal with this problem all the time. If enough individuals respond, it may not matter, since researchers can still observe trends, which is what I think they're after.

I wrote about this same problem in 2011 in my first post to this blog. My thinking on how to do this has evolved a bit.