At a recent AI conference, I was struck by the mention of new jobs such as data hygienist and AI trainer. I had not realized how important data hygiene was, to the point of becoming a profession in its own right!
Data hygiene is in reality quite critical to AI development. Poor data hygiene is certain to create all sorts of issues, including false positives, and to dramatically lengthen the time it takes for an AI algorithm to learn its task.
Data hygiene is actually hard work because of the sheer size of the databases to clean up, and the need to distinguish between rubbish and legitimate data points. It requires specific tools and particular attention, not to mention time. Hence it is a significant investment, but apparently one that is found to be well worth it given the benefits.
In the past we did not care so much about the quality of the data in our databases, even though the old adage "garbage in, garbage out" has been around for a long time. Now we need a much higher level of quality, and achieving it is apparently quite a challenge.
Welcome to the world of data hygiene and data hygienists!