For GDPR compliance, companies need to categorize their documents. Many companies manage millions of documents, so manually classifying whether each one contains private or sensitive information is not realistic. However, this kind of categorization is necessary to implement appropriate processes for protecting privacy-related information.
We want to apply basic ML techniques to classify documents into 3 or 4 GDPR categories. The idea is to set up a basic ML infrastructure and try a number of techniques to see where positive results can be obtained in the short term. One setup will use deep learning (neural networks) to evaluate the potential of this AI technique.
- study the set of documents to be used and the task at hand: categorizing them by GDPR category
- identify 2 techniques and approaches that could yield short-term results
- study and analyze state-of-the-art techniques (Google TensorFlow …)
- implement those 2 classification approaches
- run 2 learning experiments and assist in building a learning model
- draw conclusions on the effort it takes to train ML models
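
As an illustration of what one of the "basic ML techniques" could look like, here is a minimal sketch of a multinomial Naive Bayes text classifier written with the Python standard library only. It is not the approach chosen for the project, just one plausible baseline. The category labels (`sensitive`, `personal`, `none`), the training sentences, and the names `tokenize` and `NaiveBayesClassifier` are all hypothetical and invented for this example; the real GDPR categories and documents would come from the study in the first task.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.doc_counts = Counter(labels)        # documents per category
        self.word_counts = defaultdict(Counter)  # word frequencies per category
        self.vocabulary = set()
        for text, label in zip(documents, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        # Words never seen during training carry no information, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            # Log prior of the category plus log likelihood of each word.
            score = math.log(n_docs / self.total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocabulary)
            for token in tokens:
                score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set; real data would be the company's documents.
TRAINING_DATA = [
    ("Patient diagnosis and medical treatment history", "sensitive"),
    ("Blood test results and health insurance claim", "sensitive"),
    ("Customer name, home address and phone number", "personal"),
    ("Employee email address and date of birth", "personal"),
    ("Quarterly sales report for the product line", "none"),
    ("Meeting agenda for the product roadmap review", "none"),
]

texts = [text for text, _ in TRAINING_DATA]
labels = [label for _, label in TRAINING_DATA]
classifier = NaiveBayesClassifier().fit(texts, labels)

print(classifier.predict("Medical diagnosis of the patient"))
```

A baseline like this trains in seconds and gives a reference point against which a deep learning setup can be compared, which helps when estimating the effort needed to train each kind of model.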