Web scraping (also called web harvesting or web data extraction) is a software technique for extracting information from websites.
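Here is the code that extracts content from a specific site. It downloads the page, finds every anchor tag with a regular expression, and keeps the links that point to the hotel landing pages: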
WebClient wc = new WebClient();
string html = string.Empty;
MatchCollection matches;
string url = string.Empty;
int id = 0;

html = wc.DownloadString(urlPath)
    .Replace("<html>", "")
    .Replace("</html>", "")
    .Replace("<!DOCTYPEHTML>", "")
    .Replace("<head>", "")
    .Replace("</head>", "")
    .Replace("<script>", "")
    .Replace("</script>", "");

matches = Regex.Matches(html, "<a.*?href=\"(.*?)\".*?>(.*?)</a>", RegexOptions.IgnoreCase | RegexOptions.Singleline);

if (destinationList == null)
    destinationList = new List<clsDestinations>();

foreach (Match match in matches)
{
    string matchUrl = match.Groups[1].Value;

    //For internal links, build the url mapped to the base address
    if (match.Groups[0].Value.Contains("travel/landing_page_hotels.cfm"))
    {
        url = MapUrl(urlPath, match.Groups[1].Value);
        if (url.Length > 0)
        {
            destination = new clsDestinations();
            id += 1;
            destination.ID = id;
            destination.Url = url;
            destination.CityName = match.Groups[2].Value;

            if (!destinationList.Exists(d => d.CityName == destination.CityName))
                destinationList.Add(destination);
        }
    }
}
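The code above calls MapUrl, but its body is not shown in this post. Here is a minimal sketch of what such a helper could look like, assuming it only needs to resolve an href against the address of the page it came from. The class name LinkMapper, the example.com addresses, and the choice to return an empty string on failure (inferred from the url.Length > 0 check) are assumptions for illustration, not the original implementation.

```csharp
using System;

public static class LinkMapper
{
    // Hypothetical stand-in for the MapUrl helper used in the post.
    // Resolves href against baseUrl; returns "" when it cannot be resolved.
    public static string MapUrl(string baseUrl, string href)
    {
        if (string.IsNullOrWhiteSpace(href))
            return string.Empty;

        Uri baseUri;
        Uri resolved;

        // The page address must be absolute; the href may be relative or absolute.
        if (Uri.TryCreate(baseUrl, UriKind.Absolute, out baseUri) &&
            Uri.TryCreate(baseUri, href, out resolved))
        {
            return resolved.AbsoluteUri;
        }

        return string.Empty;
    }
}

public static class LinkMapperDemo
{
    public static void Main()
    {
        // Assumed example address; any absolute page URL works the same way.
        string page = "http://www.example.com/deals/index.cfm";

        Console.WriteLine(LinkMapper.MapUrl(page, "travel/landing_page_hotels.cfm?city=1"));
        Console.WriteLine(LinkMapper.MapUrl(page, "http://www.example.com/travel/landing_page_hotels.cfm"));
    }
}
```

Using System.Uri for this avoids joining URL strings by hand, so relative paths, query strings, and already absolute links all go through the same call.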
Here are the references that I have used:

Once you have the data in your collection, you can save the items one by one like this:

foreach (clsDestinations cy in destinationList)
{
    if (!cityBll.CheckForDuplicateCity(cy, false))
        result += cityBll.InsertCity(cy);
}
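To try the save loop without a database, here is a rough sketch that replaces cityBll with an in-memory store. Everything in it is an assumption for illustration: the shape of clsDestinations is inferred from the properties used above, InMemoryCityStore is a hypothetical stand-in, the second argument of CheckForDuplicateCity is left out because its meaning is not given in this post, and InsertCity is assumed to return a count that can be summed into result.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of clsDestinations: only the properties used in the post.
public class clsDestinations
{
    public int ID { get; set; }
    public string Url { get; set; }
    public string CityName { get; set; }
}

// Hypothetical in-memory stand-in for cityBll (the real one presumably uses a database).
public class InMemoryCityStore
{
    private readonly HashSet<string> _cityNames =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    // True when a city with the same name was already stored.
    public bool CheckForDuplicateCity(clsDestinations city)
    {
        return _cityNames.Contains(city.CityName);
    }

    // Returns 1 when the city is stored and 0 when it was already present.
    public int InsertCity(clsDestinations city)
    {
        return _cityNames.Add(city.CityName) ? 1 : 0;
    }
}

public static class SaveDemo
{
    // Same loop as in the post, returning the number of cities inserted.
    public static int SaveAll(IEnumerable<clsDestinations> destinations, InMemoryCityStore store)
    {
        int result = 0;
        foreach (clsDestinations cy in destinations)
        {
            if (!store.CheckForDuplicateCity(cy))
                result += store.InsertCity(cy);
        }
        return result;
    }

    public static void Main()
    {
        var destinationList = new List<clsDestinations>
        {
            new clsDestinations { ID = 1, Url = "http://www.example.com/a", CityName = "Paris" },
            new clsDestinations { ID = 2, Url = "http://www.example.com/b", CityName = "Rome" }
        };

        Console.WriteLine(SaveAll(destinationList, new InMemoryCityStore()));
    }
}
```

The duplicate check here ignores case, which is stricter than the d.CityName == destination.CityName comparison used while building the list; whether the real CheckForDuplicateCity behaves that way is not known from this post.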