Java Example - Web Page Scraping
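The following example uses the URL() constructor of the java.net.URL class to fetch a web page. It then reads the page through the stream returned by openStream().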
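import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class Main {
    public static void main(String[] args) throws Exception {
        // Note: the URL(String) constructor is deprecated since Java 20;
        // URI.create("http://www.twle.cn").toURL() is the suggested replacement.
        URL url = new URL("http://www.twle.cn");

        // try-with-resources closes both streams, even if an exception is thrown.
        // The page declares charset UTF-8, so decode and encode explicitly as UTF-8
        // instead of relying on the platform default charset.
        try (BufferedReader reader = new BufferedReader(
                     new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
             BufferedWriter writer = new BufferedWriter(
                     new FileWriter("data.html", StandardCharsets.UTF_8))) {
            String line;
            // Read the page line by line: print each line and also write it to data.html
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
                writer.write(line);
                writer.newLine();
            }
        }
    }
}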
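Running the above Java code prints the page's source code to the console. It also saves the source to the file data.html in the current directory. The output begins as follows: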
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8"/>
...
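The openStream() call above has no timeout, so a slow or unresponsive server can block the program indefinitely. The following is a minimal sketch, not the tutorial's own code, that factors the copy loop into a helper and sets timeouts.

The helper opens a URLConnection and calls setConnectTimeout and setReadTimeout before reading. The helper name copyUrlToFile and the timeout parameter are hypothetical and chosen only for illustration. UTF-8 is assumed, as in the page above.

Because URL handles any scheme the JDK supports, the demo reads a local file: URL, so it runs offline. Passing new URL("http://www.twle.cn") instead would fetch the real page.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class UrlCopy {
    /**
     * Hypothetical helper: copies the text behind a URL (http:, https:, file:, ...)
     * line by line into target, assuming UTF-8. Returns the number of lines copied.
     */
    static int copyUrlToFile(URL url, Path target, int timeoutMillis) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(timeoutMillis); // fail if connecting takes too long
        conn.setReadTimeout(timeoutMillis);    // fail if a read blocks too long
        int count = 0;
        try (BufferedReader reader = new BufferedReader(
                     new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8));
             BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.newLine();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Offline demo: create a small local HTML file and read it back via a file: URL.
        Path source = Files.createTempFile("page", ".html");
        Files.writeString(source, "<!DOCTYPE html>\n<html>\n</html>\n", StandardCharsets.UTF_8);
        Path copy = Files.createTempFile("data", ".html");
        int lines = copyUrlToFile(source.toUri().toURL(), copy, 5000);
        System.out.println("Copied " + lines + " lines"); // prints "Copied 3 lines"
    }
}
```

A timeout makes the program throw an exception it can report, instead of hanging on an unresponsive server.

For full HTTP handling, such as status codes, redirects, and headers, the java.net.http.HttpClient API (Java 11+) is a common alternative to reading from URL directly.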