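HBase Get and Scan instances accept a filter through setFilter(). HBase ships with a wide range of filters to cover different filtering needs. Filters are evaluated on the RegionServers, so unwanted data is dropped on the server side before anything is returned to the client, which makes reads more efficient. The built-in filters fall into three main categories, and you can also write your own by extending FilterBase or implementing the Filter interface. When several filters need to be combined, use FilterList.
The sections below take a quick look at each filter.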
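To make the idea of combining filters concrete, here is a minimal plain-Java sketch of the AND/OR semantics that FilterList offers through its MUST_PASS_ALL and MUST_PASS_ONE operators. It does not use the HBase classes: the FilterListSketch class, the combine helper, and the two rowkey predicates are hypothetical names introduced only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Plain-Java model of AND/OR filter combination; it does not call HBase.
final class FilterListSketch {
    // Named after FilterList.Operator: every filter must pass, or at least one.
    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    static <T> Predicate<T> combine(Operator op, List<Predicate<T>> filters) {
        return item -> {
            for (Predicate<T> f : filters) {
                boolean passed = f.test(item);
                if (op == Operator.MUST_PASS_ALL && !passed) return false; // one failure rejects
                if (op == Operator.MUST_PASS_ONE && passed) return true;   // one success accepts
            }
            // No early exit: every filter passed (ALL) or none did (ONE).
            return op == Operator.MUST_PASS_ALL;
        };
    }

    public static void main(String[] args) {
        // Hypothetical conditions on a rowkey, standing in for real filters.
        Predicate<String> startsWith10 = key -> key.startsWith("10");
        Predicate<String> endsWith5 = key -> key.endsWith("5");
        List<Predicate<String>> filters = Arrays.asList(startsWith10, endsWith5);

        Predicate<String> all = combine(Operator.MUST_PASS_ALL, filters);
        Predicate<String> one = combine(Operator.MUST_PASS_ONE, filters);

        System.out.println(all.test("105")); // true
        System.out.println(all.test("101")); // false
        System.out.println(one.test("101")); // true
        System.out.println(one.test("204")); // false
    }
}
```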
Data preparation
rowkey | cf1:col1 | cf1:col2 | cf2:col1 | cf2:col2 |
---|---|---|---|---|
101 | 10086 | qwe1 | 1352288xxxx | wer1 |
102 | 10000 | sdf2 | 1820160xxxx | ert2 |
103 | 10001 | cxv3 | 1531308xxxx | mnb3 |
104 | 12306 | jhg4 | 1387223xxxx | kji4 |
105 | 12580 | nju5 | 1580101xxxx | nbv5 |
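Code preparation
```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.After;
import org.junit.Before;

// The members below belong to the JUnit test class; the @Test methods
// in the following sections go into the same class.
private HTable table;

@Before
public void setUp() throws IOException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.zookeeper.quorum", "192.168.0.126");
    conf.set("hbase.zookeeper.property.clientPort", "2181");
    conf.set("mapred.task.timeout", "0");
    table = new HTable(conf, "pigstore");
}

@After
public void tearDown() throws IOException {
    table.close();
}

// Run a full-table scan with the given filter and print every returned cell.
private void setFilterAndPrint(Filter filter) throws IOException {
    Scan scan = new Scan();
    scan.setFilter(filter);
    print(scan);
}

private void print(Scan scan) throws IOException {
    ResultScanner resultScanner = table.getScanner(scan);
    try {
        for (Result res : resultScanner) {
            for (KeyValue kv : res.raw()) {
                System.out.println("Key: " + kv + ", Value: " + Bytes.toString(kv.getValue()));
            }
        }
    } finally {
        resultScanner.close(); // release the server-side scanner
    }
}
```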
Comparison Filters
RowFilter
Usage example
```java
@Test // Select the row whose rowkey equals "101"
public void testRowFilter() throws IOException {
    Filter filter = new RowFilter(CompareFilter.CompareOp.EQUAL,
            new BinaryComparator(Bytes.toBytes("101")));
    setFilterAndPrint(filter);
}
```
Result
Key: 101/cf1:col1/1404704499698/Put/vlen=5/ts=0, Value: 10086
Key: 101/cf1:col2/1404704499698/Put/vlen=4/ts=0, Value: qwe1
Key: 101/cf2:col1/1404704499698/Put/vlen=11/ts=0, Value: 1352288xxxx
Key: 101/cf2:col2/1404704499698/Put/vlen=4/ts=0, Value: wer1
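Notes
RowFilter is the row filter, although "rowkey filter" would be a more accurate name, since it filters on the rowkey. Its constructor takes two arguments: a comparison operator (rowCompareOp) and a comparator (rowComparator).
Think of the data as split by rowkey into three parts A, B and C, where B is the data matched by rowComparator. rowCompareOp then decides what is finally returned: EQUAL selects B, LESS_OR_EQUAL selects A+B, and so on.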
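The A/B/C picture can be sketched in plain Java without touching HBase. The example below assumes a binary comparator that orders keys byte by byte (unsigned, lexicographic) and an operator that interprets the comparison result. RowCompareSketch, compareBytes, matches and select are hypothetical names used only for illustration; the operator names mirror CompareFilter.CompareOp.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of "comparator + operator" row selection; it does not call HBase.
final class RowCompareSketch {
    enum Op { LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER }

    // Unsigned, byte-by-byte lexicographic comparison.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) return diff;
        }
        return a.length - b.length; // the shorter key sorts first
    }

    // True when rowKey stands in relation `op` to the reference key.
    static boolean matches(Op op, String rowKey, String reference) {
        int c = compareBytes(rowKey.getBytes(StandardCharsets.UTF_8),
                             reference.getBytes(StandardCharsets.UTF_8));
        switch (op) {
            case LESS:             return c < 0;
            case LESS_OR_EQUAL:    return c <= 0;
            case EQUAL:            return c == 0;
            case NOT_EQUAL:        return c != 0;
            case GREATER_OR_EQUAL: return c >= 0;
            case GREATER:          return c > 0;
            default: throw new IllegalArgumentException("unknown operator: " + op);
        }
    }

    // Keep the row keys that satisfy the operator against the reference key.
    static List<String> select(Op op, String reference, List<String> rowKeys) {
        List<String> kept = new ArrayList<>();
        for (String key : rowKeys) {
            if (matches(op, key, reference)) kept.add(key);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("101", "102", "103", "104", "105");
        System.out.println(select(Op.EQUAL, "103", keys));         // [103]           (B)
        System.out.println(select(Op.LESS_OR_EQUAL, "103", keys)); // [101, 102, 103] (A+B)
        System.out.println(select(Op.GREATER, "103", keys));       // [104, 105]      (C)
    }
}
```

The same comparison applies to the column family name, the qualifier, or the value when the corresponding filter is used instead of RowFilter.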
FamilyFilter
Usage example
```java
@Test // Select column families whose name sorts before "cf2"
public void testFamilyFilter() throws IOException {
    Filter filter = new FamilyFilter(CompareFilter.CompareOp.LESS,
            new BinaryComparator(Bytes.toBytes("cf2")));
    setFilterAndPrint(filter);
}
```
Result
Key: 101/cf1:col1/1404704499698/Put/vlen=5/ts=0, Value: 10086
Key: 101/cf1:col2/1404704499698/Put/vlen=4/ts=0, Value: qwe1
Key: 102/cf1:col1/1404704499699/Put/vlen=5/ts=0, Value: 10000
Key: 102/cf1:col2/1404704499699/Put/vlen=4/ts=0, Value: sdf2
Key: 103/cf1:col1/1404704499699/Put/vlen=5/ts=0, Value: 10001
Key: 103/cf1:col2/1404704499699/Put/vlen=4/ts=0, Value: cxv3
Key: 104/cf1:col1/1404704499699/Put/vlen=5/ts=0, Value: 12306
Key: 104/cf1:col2/1404704499699/Put/vlen=4/ts=0, Value: jhg4
Key: 105/cf1:col1/1404704499699/Put/vlen=5/ts=0, Value: 12580
Key: 105/cf1:col2/1404704499699/Put/vlen=4/ts=0, Value: nju5
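Notes
FamilyFilter is the column family filter; it filters on the column family name. Otherwise it behaves like RowFilter, except that the comparison applies to the column family name instead of the rowkey.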
QualifierFilter
Usage example
```java
@Test // Select columns whose qualifier sorts after "col1"
public void testQualifierFilter() throws IOException {
    Filter filter = new QualifierFilter(CompareFilter.CompareOp.GREATER,
            new BinaryComparator(Bytes.toBytes("col1")));
    setFilterAndPrint(filter);
}
```
Result
Key: 101/cf1:col2/1404704499698/Put/vlen=4/ts=0, Value: qwe1
Key: 101/cf2:col2/1404704499698/Put/vlen=4/ts=0, Value: wer1
Key: 102/cf1:col2/1404704499699/Put/vlen=4/ts=0, Value: sdf2
Key: 102/cf2:col2/1404704499699/Put/vlen=4/ts=0, Value: ert2
Key: 103/cf1:col2/1404704499699/Put/vlen=4/ts=0, Value: cxv3
Key: 103/cf2:col2/1404704499699/Put/vlen=4/ts=0, Value: mnb3
Key: 104/cf1:col2/1404704499699/Put/vlen=4/ts=0, Value: jhg4
Key: 104/cf2:col2/1404704499699/Put/vlen=4/ts=0, Value: kji4
Key: 105/cf1:col2/1404704499699/Put/vlen=4/ts=0, Value: nju5
Key: 105/cf2:col2/1404704499699/Put/vlen=4/ts=0, Value: nbv5
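Notes
QualifierFilter is the column qualifier filter; it filters on the column name. Otherwise it behaves like RowFilter, except that the comparison applies to the column name instead of the rowkey.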
ValueFilter
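Usage example
```java
@Test // Select cells whose value contains the substring "100"
public void testValueFilter() throws IOException {
    // SubstringComparator matches anywhere in the value, not only at the start;
    // use BinaryPrefixComparator for a strict prefix match.
    Filter filter = new ValueFilter(CompareFilter.CompareOp.EQUAL,
            new SubstringComparator("100"));
    setFilterAndPrint(filter);
}
```
Result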
Key: 101/cf1:col1/1404704499698/Put/vlen=5/ts=0, Value: 10086
Key: 102/cf1:col1/1404704499699/Put/vlen=5/ts=0, Value: 10000
Key: 103/cf1:col1/1404704499699/Put/vlen=5/ts=0, Value: 10001
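Notes
ValueFilter is the value filter; it filters on cell values. Otherwise it behaves like RowFilter, except that the comparison applies to the value instead of the rowkey. Note that SubstringComparator is only meaningful with the EQUAL and NOT_EQUAL operators.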
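The difference between a substring match and a prefix match is easy to miss, because both give the same result for "100" on the sample data. The plain-Java sketch below contrasts the two on a few of the sample values. It does not use HBase: ValueMatchSketch and its helpers are hypothetical names, and the matching here is case-sensitive as a simplification.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java contrast of substring matching and binary prefix matching.
final class ValueMatchSketch {
    // Substring match: the pattern may appear anywhere in the value.
    static boolean containsSubstring(String value, String pattern) {
        return value.contains(pattern);
    }

    // Prefix match: the value's leading bytes must equal the pattern's bytes.
    static boolean hasBinaryPrefix(String value, String prefix) {
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        byte[] p = prefix.getBytes(StandardCharsets.UTF_8);
        if (v.length < p.length) return false;
        for (int i = 0; i < p.length; i++) {
            if (v[i] != p[i]) return false;
        }
        return true;
    }

    static List<String> bySubstring(List<String> values, String pattern) {
        List<String> kept = new ArrayList<>();
        for (String v : values) if (containsSubstring(v, pattern)) kept.add(v);
        return kept;
    }

    static List<String> byPrefix(List<String> values, String prefix) {
        List<String> kept = new ArrayList<>();
        for (String v : values) if (hasBinaryPrefix(v, prefix)) kept.add(v);
        return kept;
    }

    public static void main(String[] args) {
        // A subset of the values from the sample table.
        List<String> values = Arrays.asList(
                "10086", "10000", "10001", "12306", "12580", "1820160xxxx");

        System.out.println(bySubstring(values, "100")); // [10086, 10000, 10001]
        System.out.println(byPrefix(values, "100"));    // [10086, 10000, 10001]
        System.out.println(bySubstring(values, "01"));  // [10001, 1820160xxxx]
        System.out.println(byPrefix(values, "01"));     // []
    }
}
```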
DependentColumnFilter
Usage example
```java
@Test // Select rows whose cf1:col1 value starts with "1000"
public void testDependentColumnFilter() throws IOException {
    // The third argument is dropDependentColumn: when true, the reference column
    // (cf1:col1) is left out of the output; with false, as here, it is included.
    Filter filter = new DependentColumnFilter(Bytes.toBytes("cf1"), Bytes.toBytes("col1"),
            false, CompareFilter.CompareOp.EQUAL,
            new BinaryPrefixComparator(Bytes.toBytes("1000")));
    setFilterAndPrint(filter);
}
```
Result
Key: 102/cf1:col1/1404704499699/Put/vlen=5/ts=0, Value: 10000
Key: 102/cf1:col2/1404704499699/Put/vlen=4/ts=0, Value: sdf2
Key: 102/cf2:col1/1404704499699/Put/vlen=11/ts=0, Value: 1820160xxxx
Key: 102/cf2:col2/1404704499699/Put/vlen=4/ts=0, Value: ert2
Key: 103/cf1:col1/1404704499699/Put/vlen=5/ts=0, Value: 10001
Key: 103/cf1:col2/1404704499699/Put/vlen=4/ts=0, Value: cxv3
Key: 103/cf2:col1/1404704499699/Put/vlen=11/ts=0, Value: 1531308xxxx
Key: 103/cf2:col2/1404704499699/Put/vlen=4/ts=0, Value: mnb3
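Notes
A first impression: it resembles ValueFilter, except that ValueFilter returns only the cells that satisfy the condition, whereas DependentColumnFilter returns the whole row that contains the matching column. More precisely, it keeps the cells whose timestamp equals that of a qualifying reference-column cell. In this data set every cell in a row was written with the same timestamp, which is why the whole row comes back.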
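The timestamp dependency is easier to see on data where the cells of one row do not all share a timestamp. The following is a simplified plain-Java model of that behavior for a single row, not the HBase implementation. DependentColumnSketch, Cell, filterRow and columns are hypothetical names, the timestamps 100 and 200 are made up for the example, and the value condition is reduced to a string prefix check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified single-row model of timestamp-dependent column filtering.
final class DependentColumnSketch {
    static final class Cell {
        final String family, qualifier, value;
        final long timestamp;

        Cell(String family, String qualifier, long timestamp, String value) {
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }

        boolean isColumn(String f, String q) {
            return family.equals(f) && qualifier.equals(q);
        }
    }

    static List<Cell> filterRow(List<Cell> rowCells, String family, String qualifier,
                                boolean dropDependentColumn, String valuePrefix) {
        // Pass 1: collect timestamps of reference-column cells whose value qualifies.
        Set<Long> stamps = new HashSet<>();
        for (Cell c : rowCells) {
            if (c.isColumn(family, qualifier) && c.value.startsWith(valuePrefix)) {
                stamps.add(c.timestamp);
            }
        }
        // Pass 2: keep only cells that carry one of those timestamps.
        List<Cell> kept = new ArrayList<>();
        for (Cell c : rowCells) {
            if (!stamps.contains(c.timestamp)) continue;
            if (dropDependentColumn && c.isColumn(family, qualifier)) continue;
            kept.add(c);
        }
        return kept;
    }

    // Render the kept cells as "family:qualifier" names.
    static List<String> columns(List<Cell> cells) {
        List<String> names = new ArrayList<>();
        for (Cell c : cells) names.add(c.family + ":" + c.qualifier);
        return names;
    }

    public static void main(String[] args) {
        // One row; cf2:col1 was written at a different (made-up) timestamp.
        List<Cell> row = Arrays.asList(
                new Cell("cf1", "col1", 100L, "10000"),
                new Cell("cf1", "col2", 100L, "sdf2"),
                new Cell("cf2", "col1", 200L, "1820160xxxx"));

        System.out.println(columns(filterRow(row, "cf1", "col1", false, "1000"))); // [cf1:col1, cf1:col2]
        System.out.println(columns(filterRow(row, "cf1", "col1", true, "1000")));  // [cf1:col2]
        System.out.println(columns(filterRow(row, "cf1", "col1", false, "12")));   // []
    }
}
```

In this model cf2:col1 is dropped because its timestamp differs from the reference column's, even though it sits in the same row.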