Demographic data is needed for investors, sponsors, RFPs, etc. What’s the best way to gather this data for an app or website? What about for a conference or talk series?


There are three lenses to consider when you decide it’s time to start collecting demographic data on your users:
(1) respect for your users
(2) fitness for use
(3) privacy preservation

First, consider who your users are and how they are likely to describe themselves. When asking about gender, lean on community guidelines. For example, the Human Rights Campaign (HRC) and the Center for American Progress provide guidelines for collecting gender and sexual orientation demographics: https://www.hrc.org/resources/collecting-transgender-inclusive-gender-data-in-workplace-and-other-surveys and https://www.americanprogress.org/issues/lgbt/reports/2016/03/15/133223/how-to-collect-data-about-lgbt-communities/. When collecting information about religion, race, or ethnicity, seek out resources from the affected communities and make their guidance a top priority.

Second, consider how this data will be used. What demographic categories are your investors likely to be accustomed to? What characteristics do potential sponsors most want to know about? If you’re planning to present to academic groups, closely research the ways those groups characterize populations.
Some examples of institutional standards:

  • American Community Survey (ACS) data dictionaries from the U.S. Census Bureau (search for “american community survey pums data dictionary”)
  • U.S. Department of Health and Human Services (HHS) implementation guidance for data collection (search for “Data Collection Standards for Race, Ethnicity, Primary Language, Sex, and Disability Status”)

Keep in mind that community advocacy groups and institutional standards may differ, and institutional standards tend to lag somewhat behind social practice. Be thoughtful about how you reconcile any differences. Entering demographic data is very much part of the user experience, so make sure you are sending the intended message for your product.
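One way to reconcile the two in an app is to store what users actually chose and map those answers onto a sponsor's or institution's coarser categories only when generating a report. The Python sketch below assumes a single gender question; the answer options and the sponsor categories are hypothetical placeholders, not wording taken from HRC, the Census Bureau, or HHS. Answers with no explicit mapping (including skipped questions) are counted as "Not reported" instead of being forced into a category.

```python
from collections import Counter

# Hypothetical mapping: answer options shown to users (keys) -> the coarser
# categories a sponsor asks for (values). Replace both sides with the options
# from the guidelines you adopted and the categories your audience requires.
SPONSOR_CATEGORIES = {
    "woman": "Women",
    "man": "Men",
    "non-binary": "Non-binary / other",
    "self-described": "Non-binary / other",
}


def to_sponsor_report(responses):
    """Count raw answers using the sponsor's categories.

    Unmapped answers, including None for a skipped question, are counted
    as "Not reported" rather than guessed into another category.
    """
    counts = Counter()
    for answer in responses:
        counts[SPONSOR_CATEGORIES.get(answer, "Not reported")] += 1
    return dict(counts)


responses = ["woman", "man", "self-described", None, "woman", "non-binary"]
print(to_sponsor_report(responses))
```

Keeping the raw answers separate from the mapping means you can produce a different report for each audience without re-asking users or changing the question they see.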

Finally, keep your users’ privacy in mind as you collect data about them. As storage became cheap, common practice was to collect everything and figure out later what was useful. Recent high-profile breaches, as well as new right-to-be-forgotten laws such as the EU’s General Data Protection Regulation (GDPR), have increased the organizational cost of storing data you don’t have a purpose for. Err on the side of collecting less, not more. Establish a schedule for aggregating data into reports and deleting the raw details, and limit who can access the raw data.
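The aggregate-then-delete schedule could look roughly like the Python sketch below. It is a minimal illustration under several assumptions: responses live in memory rather than in a database, each record holds a single demographic answer, and the 90-day retention window and minimum group size of 5 are placeholder values, not recommendations. Reporting very small groups as "<5" is a simple form of small-cell suppression that makes it harder to single out individuals in a report; it is not a complete anonymization guarantee.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumed policy values; set these to match your own retention policy.
RETENTION = timedelta(days=90)
MIN_CELL_SIZE = 5


@dataclass
class Response:
    submitted_at: datetime  # timezone-aware submission time
    category: str           # one answer to a single demographic question


def aggregate(responses, min_cell=MIN_CELL_SIZE):
    """Summarize raw responses as counts per category.

    Categories with fewer than `min_cell` responses are reported as a
    string like "<5" instead of an exact count.
    """
    counts = Counter(r.category for r in responses)
    return {
        category: n if n >= min_cell else f"<{min_cell}"
        for category, n in counts.items()
    }


def purge_expired(responses, now=None, retention=RETENTION):
    """Keep only the raw responses that are still inside the retention window."""
    if now is None:
        now = datetime.now(timezone.utc)
    return [r for r in responses if now - r.submitted_at <= retention]


# Example run with hypothetical categories "A" and "B".
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
raw = [Response(now - timedelta(days=10), "A") for _ in range(6)]
raw += [Response(now - timedelta(days=200), "B") for _ in range(2)]

report = aggregate(raw)            # save this summary for investors/sponsors
kept = purge_expired(raw, now=now)  # then discard raw rows past retention
print(report, len(kept))
```

Running the report before the purge matters: once the raw rows are deleted, the summary is the only record left, which is the point of the schedule.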

