As mobile devices become more widespread, they offer new opportunities for collecting data on human behaviour, which naturally raises concerns about privacy. We therefore need to safeguard individual privacy while exploring the potential of call detail record (CDR) data in an ethical manner. This blog post provides some background on the ethical use of CDR data.
Most mobile phone operators around the world are members of the GSMA (GSM Association), the mobile phone industry body. This membership allows best practices on data usage to be shared with all parties in the form of industry-wide policies and standards. The GSMA is critical to supporting the ethical use of mobile phone data in emerging economies for several reasons, the strongest being the lack of personal data legislation in many of these economies. In the USA, the EU and the UK, personal data privacy is enshrined in law; many other countries, however, do not yet have the same protections.
The use of CDR data came to the attention of the GSMA and others during the 2014 Ebola outbreak, when CDR information proved crucial in providing insight into population movements during the epidemic. Best practices for using CDRs in research were then formulated from the lessons learned. These guidelines include:
- Strict anonymisation of personal data, such as names and phone numbers, using a hashing process;
- No transactional record leaves the mobile operator's premises, and all analysis is conducted on-site;
- No analysis identifies individuals.
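The hashing step in the first guideline can be sketched in Python. This is a minimal illustration under stated assumptions, not a method prescribed by the GSMA: it assumes a secret key generated and held by the operator (`OPERATOR_KEY` is hypothetical), and the record field names are hypothetical too. A keyed hash (HMAC-SHA256) is used here rather than a plain hash because the space of phone numbers is small enough that unkeyed digests could be reversed simply by hashing every possible number.

```python
import hashlib
import hmac
import secrets

# Hypothetical operator-held secret key; it must never leave the operator's premises.
OPERATOR_KEY = secrets.token_bytes(32)


def pseudonymise(msisdn: str, key: bytes = OPERATOR_KEY) -> str:
    """Replace a phone number with a keyed SHA-256 digest (HMAC)."""
    # Normalise formatting so the same subscriber always maps to the same token.
    normalised = msisdn.replace(" ", "").lstrip("+")
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymise_record(record: dict, key: bytes = OPERATOR_KEY) -> dict:
    """Drop names and pseudonymise phone numbers in a single CDR.

    The field names used here are hypothetical.
    """
    return {
        "caller": pseudonymise(record["caller"], key),
        "callee": pseudonymise(record["callee"], key),
        "tower_id": record["tower_id"],
        "timestamp": record["timestamp"],
        # "caller_name" is deliberately not copied.
    }


record = {
    "caller": "+44 7700 900123",
    "callee": "+44 7700 900456",
    "caller_name": "Jane Doe",
    "tower_id": "T17",
    "timestamp": "2014-10-01T08:15:00",
}
print(anonymise_record(record))
```

Because the same number always maps to the same token under one key, a subscriber's movements can still be studied across records. Strictly speaking this is pseudonymisation rather than full anonymisation, which is why the other two guidelines still matter.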
For our team, these guidelines are a starting point, not the end point. Our research is beginning to point towards stronger, more suitable guidelines for the ethical use of CDR data that still preserve its potential.
For example, depending on the level of detail in the anonymised CDR data, identifying just a few locations that a user frequently visits can be enough to reveal who that user is. This poses an extra privacy challenge for anyone working with CDR data. One of the most effective ways to tackle this challenge is to focus on the behaviour of crowds rather than individuals. Analysing hundreds, thousands or even millions of users simultaneously can be an effective strategy for ensuring “privacy by the numbers”. But how many numbers are enough?
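To make this re-identification risk concrete, the sketch below computes a simple “unicity” score: the fraction of users whose set of top-k most visited towers is shared with no other user. The event format, tower IDs and toy data are hypothetical and only illustrate the idea.

```python
from collections import Counter, defaultdict


def top_locations(events, k):
    """Map each user to the set of their k most frequently visited towers."""
    visits = defaultdict(Counter)
    for user, tower in events:
        visits[user][tower] += 1
    # Counter.most_common orders equal counts by first appearance.
    return {user: frozenset(t for t, _ in c.most_common(k)) for user, c in visits.items()}


def unicity(events, k):
    """Fraction of users whose top-k tower set matches no other user's."""
    signatures = top_locations(events, k)
    shared = Counter(signatures.values())
    unique = sum(1 for sig in signatures.values() if shared[sig] == 1)
    return unique / len(signatures)


# Hypothetical (pseudonymised user, tower ID) events.
events = [
    ("u1", "A"), ("u1", "A"), ("u1", "B"),
    ("u2", "A"), ("u2", "A"), ("u2", "E"),
    ("u3", "C"), ("u3", "C"), ("u3", "A"),
    ("u4", "D"), ("u4", "D"), ("u4", "A"),
]

print(unicity(events, 1))  # 0.5: only u3 and u4 are unique
print(unicity(events, 2))  # 1.0: every user is unique
```

In this toy data, each user's single most visited tower singles out half of them, while their top two towers single out everyone. Running a measure like this on real data at different levels of spatial and temporal detail is one way to gauge how many users a group needs before it offers meaningful cover.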
Another potential privacy pitfall comes with “linking data”: combining different data sets on common identifiers. While linking provides more detail and can generate additional or more accurate insight, it also increases the risk of exposing private information.
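A linkage attack can be illustrated in a few lines. The sketch assumes a hypothetical pseudonymised profile per user (the towers at which they are most often seen at night and during the day) and a hypothetical external dataset listing people's home and work areas by the same tower IDs. Joining the two on the shared location pair re-identifies any pseudonym whose pair is unique.

```python
from collections import defaultdict

# Hypothetical pseudonymised profiles derived from CDRs.
cdr_profiles = [
    {"pseudonym": "p01", "night_tower": "T3", "day_tower": "T8"},
    {"pseudonym": "p02", "night_tower": "T3", "day_tower": "T5"},
    {"pseudonym": "p03", "night_tower": "T4", "day_tower": "T8"},
]

# Hypothetical external dataset containing names.
external_records = [
    {"name": "Person A", "home_area": "T3", "work_area": "T8"},
    {"name": "Person B", "home_area": "T4", "work_area": "T8"},
]


def link(profiles, records):
    """Join two datasets on the shared (home, work) location pair."""
    index = defaultdict(list)
    for p in profiles:
        index[(p["night_tower"], p["day_tower"])].append(p["pseudonym"])
    matches = {}
    for r in records:
        candidates = index.get((r["home_area"], r["work_area"]), [])
        # Exactly one candidate means the pseudonym is re-identified.
        if len(candidates) == 1:
            matches[candidates[0]] = r["name"]
    return matches


print(link(cdr_profiles, external_records))
# {'p01': 'Person A', 'p03': 'Person B'}
```

Here `p01` and `p03` are matched to named people, and `p02` escapes only because no external record happens to share its location pair.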
Ultimately, the key to handling sensitive data such as CDRs while protecting users’ privacy lies in obfuscating the data through aggregation. Aggregation is a process of grouping that can take place across multiple dimensions, such as time and space. While the principle of “privacy by the numbers” is inherent in aggregation, the difficulty lies in finding the right balance between retaining the detail needed to generate insights and abstracting enough to safeguard privacy. This balance becomes harder to find as the focus moves from densely populated areas to rural ones, where activity is spread out across wider areas and each spatial unit therefore contains fewer users.
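Spatial and temporal aggregation with a minimum-count threshold can be sketched as follows. The tower-to-region mapping, the one-hour time window and the threshold `K_MIN = 3` are all hypothetical choices made for illustration; a real threshold would need to be justified for the data at hand. Cells with fewer distinct users than the threshold are suppressed rather than published.

```python
from collections import defaultdict
from datetime import datetime

K_MIN = 3  # hypothetical minimum number of distinct users per published cell

# Hypothetical mapping from individual towers to coarser regions (spatial aggregation).
TOWER_TO_REGION = {"T1": "city_centre", "T2": "city_centre", "T3": "city_centre", "T9": "rural_north"}


def aggregate(events, tower_to_region, k_min=K_MIN):
    """Count distinct users per (region, hour) cell, suppressing cells below k_min."""
    cells = defaultdict(set)
    for user, tower, timestamp in events:
        region = tower_to_region[tower]
        # Temporal aggregation: truncate each timestamp to the start of its hour.
        hour = datetime.fromisoformat(timestamp).strftime("%Y-%m-%dT%H:00")
        cells[(region, hour)].add(user)
    return {cell: len(users) for cell, users in cells.items() if len(users) >= k_min}


# Hypothetical (pseudonymised user, tower ID, timestamp) events.
events = [
    ("u1", "T1", "2014-10-01T08:05:00"),
    ("u2", "T2", "2014-10-01T08:20:00"),
    ("u3", "T3", "2014-10-01T08:45:00"),
    ("u1", "T2", "2014-10-01T08:50:00"),
    ("u4", "T9", "2014-10-01T08:10:00"),
    ("u5", "T9", "2014-10-01T08:30:00"),
]

print(aggregate(events, TOWER_TO_REGION))
# {('city_centre', '2014-10-01T08:00'): 3}
```

In this toy example the city-centre cell is published with 3 users, while the rural cell, with only 2 users, is suppressed. Keeping rural areas visible would mean coarser regions or longer time windows, which is exactly the trade-off between detail and privacy described above.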
In closing, we will continue to conduct research into ethical approaches to working with CDR data beyond the safeguarding of individual privacy.