Internship 5: Machine learning for document classification

Context:

To comply with GDPR, companies need to categorize their documents. Many companies manage millions of documents, so manually classifying each one by whether it contains private or sensitive information is not realistic. However, this kind of categorization is necessary to implement appropriate processes for protecting privacy-related information.

 

Objective:

We want to apply basic ML techniques to classify documents into 3 or 4 GDPR categories. The idea is to set up a basic ML infrastructure and try a number of techniques to see where positive results can be obtained in the short term. The work will include a deep learning (neural network) setup to evaluate the potential of this AI technique.

 

Tasks:

  • study the set of documents to be used and the task at hand: categorizing them by GDPR category
  • identify 2 techniques and approaches that could yield short-term results
  • study and analyse state-of-the-art techniques (Google TensorFlow …)
  • implement those 2 classification approaches
  • run 2 learning experiments and assist in building a learning model
  • draw conclusions on the effort it takes to train ML models

 
