Web crawling is a technique for downloading a site's root page and the content it links to (HTML, videos, images, etc.) to your local disk. Once the content is stored locally, you can apply NLP to analyze it and extract important insights. This article details how to do web crawling and NLP.
To do web crawling, you can choose a tool written in Java or Python. In my case, I'm using Crawler4J (https://github.com/yasserg/crawler4j).
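Before getting into the tool itself, it may help to see the basic idea of a crawl in isolation: start at a root page, pull out its links, and keep following them until nothing new turns up. The sketch below is not Crawler4J's API. It uses only the Java standard library, and the names `MiniCrawler`, `extractLinks`, and `crawl` are made up for illustration. To keep it runnable without a network, an in-memory map of URL to HTML stands in for real HTTP fetching, and a simple regular expression stands in for a proper HTML parser.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of Crawler4J.
final class MiniCrawler {
    // Simplified matcher for double-quoted href attributes.
    // Real-world HTML is messier and calls for an actual parser.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

    /** Returns the href values found in the HTML, in document order. */
    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    /**
     * Breadth-first crawl starting at root.
     * The pages map is an assumption that replaces HTTP fetching.
     * Returns every URL reached, in the order it was visited.
     */
    public static Set<String> crawl(String root, Map<String, String> pages) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(root);
        while (!frontier.isEmpty()) {
            String url = frontier.poll();
            if (!visited.add(url)) {
                continue; // already seen, skip to avoid loops
            }
            String html = pages.get(url);
            if (html == null) {
                continue; // nothing to parse for this URL
            }
            for (String link : extractLinks(html)) {
                if (!visited.contains(link)) {
                    frontier.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Tiny fake site: the root links to two pages that link back.
        Map<String, String> pages = Map.of(
            "/index.html", "<a href=\"/about.html\">About</a> <a href=\"/blog.html\">Blog</a>",
            "/about.html", "<a href=\"/index.html\">Home</a>",
            "/blog.html", "<a href=\"/about.html\">About</a>");
        System.out.println(crawl("/index.html", pages));
    }
}
```

The visited set is what keeps the crawl from looping when pages link back to each other. A real crawler library typically adds the parts left out here, such as fetching over HTTP, politeness delays, and saving content to disk.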