In Hungary, there are currently few concerted efforts to archive cultural and scientific content that yield material accurate and clean enough for widespread use. As a result, data loss is evident and ongoing.
To address this, DH-LAB continuously selects and harvests web resources relevant to research and innovation, and develops the technologies this requires. Harvesting is performed with a web crawler developed in-house and released as free software.
The harvested material is deposited in the CERN Repository and made available through our Sketch Engine corpus search service.
Researchers with an EDUID can use the harvested material indirectly for non-profit purposes.
Because our web research aims to make the collected material searchable, DH-LAB places explicit emphasis on clarifying the legal issues surrounding the downloading and use of the relevant web pages.