A pure-Java HTML parser that supports the W3C XPath 1.0 standard syntax, built on Jsoup and Antlr4. It may well be the best XPath-capable HTML parser in Java; give it a try.
fast-classpath-scanner
dependency, to improve stability
semantic issues; public beta officially released
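jsoupxpath-tool-1.0: the tool itself is built with spring-boot and spring-shell and requires JDK 8 or later, while JsoupXpath itself only requires JDK 7 or later. A usage example follows; on Windows, switch the console to UTF-8 encoding first. This small tool is only meant for quick tests when setting up your own project is inconvenient; calling JsoupXpath directly from your own code is still the best way to get a feel for it.
bash-4.1$ ./jsoupxpath-tool-1.0.jar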
  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::        (v2.0.1.RELEASE)
2018-04-12 00:02:20.112 INFO 14642 --- [ main] c.w.boot.JsoupXpathApplication : Starting JsoupXpathApplication v1.0 on localhost with PID 14642 (/opt/vhost/dev/spring-boot-xpath/target/jsoupxpath-tool-1.0.jar started by resin in /opt/vhost/dev/spring-boot-xpath/target)
2018-04-12 00:02:20.120 INFO 14642 --- [ main] c.w.boot.JsoupXpathApplication : No active profile set, falling back to default profiles: default
2018-04-12 00:02:20.176 INFO 14642 --- [ main] s.c.a.AnnotationConfigApplicationContext : Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext@5679c6c6: startup date [Thu Apr 12 00:02:20 CST 2018]; root of context hierarchy
2018-04-12 00:02:21.516 INFO 14642 --- [ main] o.s.j.e.a.AnnotationMBeanExporter : Registering beans for JMX exposure on startup
2018-04-12 00:02:21.530 INFO 14642 --- [ main] c.w.boot.JsoupXpathApplication : Started JsoupXpathApplication in 1.85 seconds (JVM running for 2.435)
shell:>help
AVAILABLE COMMANDS
Built-In Commands
clear: Clear the shell screen.
exit, quit: Exit the shell.
help: Display help about available commands.
script: Read and execute commands from a file.
stacktrace: Display the full stacktrace of the last error.
Xpath Extra
get: init JXDocument by url
xpath: extract by xpath
shell:>get https://book.douban.com/tag/%E4%BA%92%E8%81%94%E7%BD%91
Document init done.
shell:>xpath //ul[@class=\'subject-list\']/li[self::li/div/div/span[@class=\'pl\']/num()>10000][-1]/div/h2/allText()
2018-04-12 00:03:45.597 INFO 14642 --- [ main] cn.wanghaomiao.boot.cmd.XpathExtra : xpath = //ul[@class='subject-list']/li[self::li/div/div/span[@class='pl']/num()>10000][-1]/div/h2/allText()
长尾理论
shell:>xpath //*[@id=\"subject_list\"]/ul[1]/li[8]/div[2]/div[2]/span[3]/num()
2018-04-12 00:04:23.420 INFO 14642 --- [ main] cn.wanghaomiao.boot.cmd.XpathExtra : xpath = //*[@id="subject_list"]/ul[1]/li[8]/div[2]/div[2]/span[3]/num()
4333.0
shell:>
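The first query in the session combines a numeric filter with a "last match" predicate. The sketch below shows the same idea in plain XPath 1.0, using the JDK's built-in `javax.xml.xpath` rather than JsoupXpath, so it runs without extra dependencies. It is an illustration under stated assumptions, not JsoupXpath's own API: the `XpathSketch` class, the `select` helper and the `SAMPLE` markup are all hypothetical, the markup is a simplified well-formed stand-in for the real page, and the standard `number()` and `last()` are used where the session uses JsoupXpath's `num()` and `[-1]`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XpathSketch {

    // Hypothetical, well-formed stand-in for a book list; not real page data.
    static final String SAMPLE =
        "<ul class='subject-list'>"
      + "<li><h2>Book A</h2><span class='pl'>12000</span></li>"
      + "<li><h2>Book B</h2><span class='pl'>4333</span></li>"
      + "<li><h2>Book C</h2><span class='pl'>25000</span></li>"
      + "</ul>";

    // Evaluates an XPath 1.0 expression and returns the text of each matched node.
    static List<String> select(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            // Wrap checked parser/XPath exceptions to keep the helper easy to call.
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Keep items whose count exceeds 10000, then take the last of those.
        String expr =
            "//ul[@class='subject-list']/li[number(span[@class='pl']) > 10000][last()]/h2";
        System.out.println(select(SAMPLE, expr)); // prints [Book C]
    }
}
```

The two predicates are applied in order: the first narrows the `li` set to entries above the threshold, and `last()` is then evaluated against that narrowed set, which is why the result is the last qualifying item rather than the last item overall.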
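Below is an example of JsoupXpath's Antlr4-based syntax parse tree, which gives a quick overview of what its grammar can handle and how an expression is parsed and executed.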
<dependency>
<groupId>cn.wanghaomiao</groupId>
<artifactId>JsoupXpath</artifactId>
<version>2.0.2-alpha</version>
</dependency>