Why is it important for data scientists to seek transparency?

Answer

Transparency is essential in data science projects and machine learning programs, partly because of the complexity and sophistication that drive them. These programs are “learning” (generating probabilistic results) rather than following predetermined, step-by-step instructions, and as a result it can be hard to understand how the technology reaches its conclusions. The “black box” problem, in which machine learning algorithms are not fully explainable to human decision-makers, is a major one in this field.

With that in mind, the ability to master explainable machine learning, or “explainable AI,” will likely be a main focus when companies hire data scientists. DARPA, the institution that brought us the internet, is already funding a multimillion-dollar study in explainable AI to promote the skills and resources needed to create machine learning and artificial intelligence technologies that are transparent to humans.

One way to think about it is that talent development often has a “literacy stage” and a “hyperliteracy stage.” For a data scientist, the traditional literacy stage would be knowing how to put together machine learning programs: how to build algorithms with languages like Python, and how to construct neural networks and work with them. The hyperliteracy stage would be the ability to master explainable AI, to provide transparency in the use of machine learning algorithms, and to preserve that transparency as these programs work toward their goals and the goals of their handlers.

Another reason transparency matters in data science is that the data sets being used keep becoming more sophisticated, and therefore potentially more intrusive into people’s lives. A further driver of explainable machine learning and data science is the European Union’s General Data Protection Regulation (GDPR), which took effect in May 2018 to curb unethical use of personal data. Using the GDPR as a test case, experts can see how the need to explain data science projects fits into privacy and security concerns, as well as business ethics.