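In Flume, a channel selector (set through the selector.type property) must implement the ChannelSelector interface. An implementation of this interface tells the Source which Channels each received Event should be sent to. Flume ships two implementations:
1. Multiplexing, implemented by MultiplexingChannelSelector
2. Replicating, implemented by ReplicatingChannelSelector
The two main methods of the ChannelSelector interface are: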
// Returns the list of required Channels for this event
public List<Channel> getRequiredChannels(Event event);
// Returns the list of optional Channels for this event
public List<Channel> getOptionalChannels(Event event);
ReplicatingChannelSelector (the default selector)
Property | Default | Description
selector.type | replicating | Component name: replicating
selector.optional | – | Marks which Channels are optional
The following example marks c3 as optional. A failed write to c3 is ignored, whereas a failed write to c1 or c2 causes the transaction to fail:
a1.sources = r1
a1.channels = c1 c2 c3
a1.sources.r1.selector.type = replicating
a1.sources.r1.channels = c1 c2 c3
a1.sources.r1.selector.optional = c3
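The difference between required and optional Channels in the example above can be sketched as follows. This is a simplified model, not Flume's own delivery code: channels are plain strings, and `DeliverySketch`, `deliver`, and the `failing` set are hypothetical names introduced only to simulate write failures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the delivery loop; it does not use any Flume class. */
final class DeliverySketch {

    /**
     * Simulates writing one event to every selected channel.
     * The failing set names the channels whose write is simulated to fail.
     * Returns the channels that were written; throws if a required channel fails.
     */
    static List<String> deliver(List<String> required,
                                List<String> optional,
                                Set<String> failing) {
        List<String> written = new ArrayList<>();
        for (String channel : required) {
            if (failing.contains(channel)) {
                // A failure on a required channel fails the whole delivery.
                throw new IllegalStateException("required channel failed: " + channel);
            }
            written.add(channel);
        }
        for (String channel : optional) {
            if (failing.contains(channel)) {
                continue; // a failure on an optional channel is ignored
            }
            written.add(channel);
        }
        return written;
    }
}
```

With required channels c1 and c2 and optional channel c3, a simulated failure on c3 still returns c1 and c2, while a simulated failure on c1 raises an exception.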
Initialization of ReplicatingChannelSelector:
public void configure(Context context) {
  // Read which Channels are marked as optional
  String optionalList = context.getString(CONFIG_OPTIONAL);
  // Start by putting every Channel into the required list
  requiredChannels = new ArrayList<Channel>(getAllChannels());
  Map<String, Channel> channelNameMap = getChannelNameMap();
  if(optionalList != null && !optionalList.isEmpty()) {
    // For each Channel marked optional: remove it from the required list
    // and add it to the optional list
    for(String optional : optionalList.split("\\s+")) {
      Channel optionalChannel = channelNameMap.get(optional);
      requiredChannels.remove(optionalChannel);
      if (!optionalChannels.contains(optionalChannel)) {
        optionalChannels.add(optionalChannel);
      }
    }
  }
}
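The split performed by the configure method above can be reproduced without Flume on the classpath. The following is a minimal sketch under stated assumptions: channels are represented by their names, `ReplicatingSplitSketch` is a hypothetical class, and skipping unknown channel names is a choice made for this sketch rather than something taken from the Flume source.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the required/optional split; channels are plain names. */
final class ReplicatingSplitSketch {
    final List<String> required;
    final List<String> optional = new ArrayList<>();

    ReplicatingSplitSketch(List<String> allChannels, String optionalList) {
        // Every channel starts out as required.
        required = new ArrayList<>(allChannels);
        if (optionalList == null || optionalList.trim().isEmpty()) {
            return;
        }
        // The optional list is whitespace-separated, as in selector.optional = c3
        for (String name : optionalList.trim().split("\\s+")) {
            if (!allChannels.contains(name)) {
                continue; // assumption of this sketch: unknown names are skipped
            }
            required.remove(name);
            if (!optional.contains(name)) {
                optional.add(name);
            }
        }
    }
}
```

Constructing it with channels c1, c2, c3 and the optional list "c3" leaves c1 and c2 as required and c3 as optional, matching the configuration example earlier.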
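MultiplexingChannelSelector
Property | Default | Description
selector.type | replicating | Component name: multiplexing
selector.optional | – | Marks which Channels are optional
selector.header | flume.selector.header | Name of the header whose value selects the Channels
selector.default | – | Channels used when no mapping matches the header value
selector.mapping.* | – | Channels for a given header value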
Example:
a1.sources = r1
a1.channels = c1 c2 c3 c4
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = state
a1.sources.r1.selector.mapping.CZ = c1
a1.sources.r1.selector.mapping.US = c2 c3
a1.sources.r1.selector.default = c4
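The value of the header whose key is state decides which Channels the event is written to. In the example above, events with state=CZ go to c1, events with state=US go to c2 and c3, and all other events go to the default channel c4.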
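The routing rule described above can be illustrated with a small lookup table. This is only a sketch of the rule, not the Flume implementation: `MultiplexingRoutingSketch` and `route` are hypothetical names, and the mapping is hard-coded from the example configuration instead of being read from a Context.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of header-based routing; channels are plain names. */
final class MultiplexingRoutingSketch {
    // selector.header = state
    static final String HEADER = "state";
    // selector.default = c4
    static final List<String> DEFAULT = Collections.singletonList("c4");
    // selector.mapping.CZ = c1, selector.mapping.US = c2 c3
    static final Map<String, List<String>> MAPPING = new HashMap<>();
    static {
        MAPPING.put("CZ", Collections.singletonList("c1"));
        MAPPING.put("US", Arrays.asList("c2", "c3"));
    }

    /** Returns the channel names selected for an event with the given headers. */
    static List<String> route(Map<String, String> headers) {
        String value = headers.get(HEADER);
        if (value == null || value.trim().isEmpty()) {
            return DEFAULT; // header missing or blank
        }
        List<String> channels = MAPPING.get(value);
        return channels != null ? channels : DEFAULT; // unmapped value
    }
}
```

An event with state=US is routed to c2 and c3, while an event with no state header, or with a value that has no mapping, falls back to c4.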
Initialization of MultiplexingChannelSelector:
public void configure(Context context) {
  // Read the name of the header to inspect
  this.headerName = context.getString(CONFIG_MULTIPLEX_HEADER_NAME,
      DEFAULT_MULTIPLEX_HEADER);
  Map<String, Channel> channelNameMap = getChannelNameMap();
  // Read the default Channels
  defaultChannels = getChannelListFromNames(
      context.getString(CONFIG_DEFAULT_CHANNEL), channelNameMap);
  // Read the mapping entries (header value -> channel names)
  Map<String, String> mapConfig =
      context.getSubProperties(CONFIG_PREFIX_MAPPING);
  // channelMapping holds the required Channels for each header value
  channelMapping = new HashMap<String, List<Channel>>();
  // Store the Channels configured for each header value in channelMapping
  for (String headerValue : mapConfig.keySet()) {
    List<Channel> configuredChannels = getChannelListFromNames(
        mapConfig.get(headerValue),
        channelNameMap);
    //This should not go to default channel(s)
    //because this seems to be a bad way to configure.
    if (configuredChannels.size() == 0) {
      throw new FlumeException("No channel configured for when "
          + "header value is: " + headerValue);
    }
    if (channelMapping.put(headerValue, configuredChannels) != null) {
      throw new FlumeException("Selector channel configured twice");
    }
  }
  //If no mapping is configured, it is ok.
  //All events will go to the default channel(s).
  Map<String, String> optionalChannelsMapping =
      context.getSubProperties(CONFIG_PREFIX_OPTIONAL + ".");
  // The rest of this method works out the optional Channels for each header value
  optionalChannels = new HashMap<String, List<Channel>>();
  for (String hdr : optionalChannelsMapping.keySet()) {
    List<Channel> confChannels = getChannelListFromNames(
        optionalChannelsMapping.get(hdr), channelNameMap);
    if (confChannels.isEmpty()) {
      confChannels = EMPTY_LIST;
    }
    //Remove channels from optional channels, which are already
    //configured to be required channels.
    List<Channel> reqdChannels = channelMapping.get(hdr);
    //Check if there are required channels, else defaults to default channels
    if(reqdChannels == null || reqdChannels.isEmpty()) {
      // No required Channels are mapped for this header value,
      // so the default Channels act as its required Channels
      reqdChannels = defaultChannels;
    }
    for (Channel c : reqdChannels) {
      // A Channel that is already required is removed from the optional list
      if (confChannels.contains(c)) {
        confChannels.remove(c);
      }
    }
    if (optionalChannels.put(hdr, confChannels) != null) {
      throw new FlumeException("Selector channel configured twice");
    }
  }
}
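The last loop of the configure method above removes from each optional list any Channel that is already required for the same header value, falling back to the default Channels when the header value has no mapping. A minimal sketch of that step, with channels as plain names and `OptionalPruningSketch` and `optionalFor` as hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of how optional channels are pruned for one header value. */
final class OptionalPruningSketch {

    /**
     * configuredOptional: channels listed as optional for the header value.
     * mappedRequired: channels mapped as required for it (may be null or empty).
     * defaults: the default channels, used as required when nothing is mapped.
     */
    static List<String> optionalFor(List<String> configuredOptional,
                                    List<String> mappedRequired,
                                    List<String> defaults) {
        List<String> required =
            (mappedRequired == null || mappedRequired.isEmpty()) ? defaults : mappedRequired;
        // Copy first so the caller's list is left untouched.
        List<String> result = new ArrayList<>(configuredOptional);
        // A channel cannot be both required and optional; required wins.
        result.removeAll(required);
        return result;
    }
}
```

For instance, if c3 and c4 are listed as optional for a header value whose required Channels are c2 and c3, only c4 remains optional.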
Now look at the getRequiredChannels and getOptionalChannels methods of MultiplexingChannelSelector. Both look up the Channel list using the value of the header named by headerName:
@Override
public List<Channel> getRequiredChannels(Event event) {
  String headerValue = event.getHeaders().get(headerName);
  // The header is missing or blank, so use the default Channels
  if (headerValue == null || headerValue.trim().length() == 0) {
    return defaultChannels;
  }
  // Look up the required Channels for this header value
  List<Channel> channels = channelMapping.get(headerValue);
  //This header value does not point to anything
  //Return default channel(s) here.
  if (channels == null) {
    channels = defaultChannels;
  }
  return channels;
}
@Override
public List<Channel> getOptionalChannels(Event event) {
  String hdr = event.getHeaders().get(headerName);
  // Look up the optional Channels for this header value
  List<Channel> channels = optionalChannels.get(hdr);
  // Nothing is configured for this value, so return an empty list
  if(channels == null) {
    channels = EMPTY_LIST;
  }
  return channels;
}
Custom Channel Selector
A custom channel selector must implement the ChannelSelector interface. It is configured with its fully qualified class name:
a1.sources.r1.selector.type = org.example.MyChannelSelector