Getting Started with Multiple OpenFlow Controllers: Answers to Some Beginner Questions
2013-06-22 Ryou http://kiyotoryou.github.com/2013/06/OPenFlow多控制器的联调入门解惑

This post summarizes and shares the questions I ran into recently when first getting started with multi-controller development and research.

The experimental environment is Floodlight 0.9 and Open vSwitch 1.9. Both of them only support the OpenFlow 1.0 protocol, while multi-controller support is a feature that was only added in OpenFlow 1.2 and later. Can some simple multi-controller development experiments still be carried out with this setup? After some searching, I found that Open vSwitch supports the following command:

#ovs-vsctl set-controller br0 tcp:192.168.0.1 tcp:192.168.0.2

That is, multiple controllers can be attached directly to a single bridge.

My first guess was that this command treats the IP address listed first as the primary controller and the one listed after it as a backup, and that the switch would actively connect to the backup controller once the connection to the primary is lost. So can master/backup failover between controllers be achieved with this single command?

A quick verification experiment showed that the answer is no. The conclusion is that the switch not only connects to both controllers at the same time, it also sends the same messages to both controllers, and both controllers in turn send messages to the switch and install flow entries. In other words, the two controllers are logically indistinguishable.

The next question follows naturally: operating this way incurs a large traffic overhead and leads to inconsistent state, so merely supporting simultaneous connections is not useful on its own. Does the current environment support a simple coordination mechanism between controllers, such as assigning and negotiating roles?

Reading the Floodlight source code shows that the core code already implements some support for multiple controllers; for example, a controller can send a role-change request to the switch. This can be regarded as an early implementation of what OpenFlow 1.2 and later would standardize. But does Open vSwitch, on the other end, support answering such role requests?

A further verification experiment added code on the controller side to send role requests, and controller role assignment did indeed work. A packet capture shows that both the role-change request and its reply are carried in OpenFlow Vendor messages. According to the OpenFlow 1.0 spec, Vendor messages provide a way to carry additional, vendor-specific information and are reserved for future use.
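For reference, the controller-side code can look roughly like the following sketch, built on the Nicira vendor classes that ship with Floodlight 0.9's bundled openflowj (org.openflow.vendor.nicira, the same package as the OFRoleReplyVendorData class quoted below). The class and method names are taken from that source tree, but treat this as an illustration under those assumptions rather than the exact code used in the experiment:

import java.io.IOException;

import org.openflow.protocol.OFVendor;
import org.openflow.vendor.nicira.OFNiciraVendorData;
import org.openflow.vendor.nicira.OFRoleReplyVendorData;
import org.openflow.vendor.nicira.OFRoleRequestVendorData;
import org.openflow.vendor.nicira.OFRoleVendorData;

import net.floodlightcontroller.core.IOFSwitch;

public class RoleRequestSketch {

    // Ask the switch to treat this controller as master. The role is carried in
    // an OFPT_VENDOR message with Nicira's vendor id, which is exactly what the
    // packet capture described above shows on the wire.
    public static void sendMasterRoleRequest(IOFSwitch sw, int xid) throws IOException {
        OFVendor request = new OFVendor();
        request.setXid(xid);
        request.setVendor(OFNiciraVendorData.NX_VENDOR_ID);

        OFRoleRequestVendorData role = new OFRoleRequestVendorData();
        role.setRole(OFRoleVendorData.NX_ROLE_MASTER);  // or NX_ROLE_SLAVE / NX_ROLE_OTHER
        request.setVendorData(role);
        request.setLengthU(OFVendor.MINIMUM_LENGTH + role.getLength());

        sw.write(request, null);
        sw.flush();
    }

    // The switch answers with another Vendor message; a vendor-message handler
    // can recognize the role reply like this before reading the granted role.
    public static boolean isNiciraRoleReply(OFVendor msg) {
        return msg.getVendor() == OFNiciraVendorData.NX_VENDOR_ID
                && msg.getVendorData() instanceof OFRoleReplyVendorData;
    }
}

Sending NX_ROLE_MASTER on one controller connection and NX_ROLE_SLAVE on the other is enough to reproduce the behavior described in the commit messages quoted further below.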

The natural guess is therefore that Open vSwitch implements multi-controller support as an extension on top of OpenFlow 1.0. In the Floodlight source code, the comment in org/openflow/vendor/nicira/OFRoleReplyVendorData.java reads:

Class that represents the vendor data in the role request extension implemented by Open vSwitch to support high availability.

This confirms the guess, and further confirmation came from the Open vSwitch mailing list:

Add support for multiple OpenFlow controllers on a single bridge.

With this commit, Open vSwitch permits a bridge to have any number of OpenFlow controllers. When multiple controllers are configured, Open vSwitch connects to all of them simultaneously. Details of configuration are in the vswitch schema documentation.

OpenFlow 1.0 does not specify how multiple controllers coordinate in interacting with a single switch, so more than one controller should be specified only if the controllers are themselves designed to coordinate with each other.

An upcoming commit will provide a simple means for coordination between multiple controllers.

Add support for master/slave controller coordination.

Now that Open vSwitch has support for multiple simultaneous controllers, there is some need for a degree of coordination among them. For now, the plan is for the controllers themselves to take the lead on this.

This commit adds a small bit of OVS infrastructure: the ability for a controller to designate itself as a "master" or a "slave". There may be at most one master at a time; when a controller designates itself as the master, then any existing master is demoted to slave status. Slave controllers are not allowed to modify the flow table or global configuration; any attempt to do so is rejected with a "bad request" error.


A thread on the HP ProCurve switch mailing list also mentions how Open vSwitch handles multiple controllers:

Like Open vSwitch, our implementation will NOT fall back to a second controller, it will only fall back to "secure" or "normal".

Like Open vSwitch, if you specify multiple controllers, it will connect to all of them simultaneously. Like Open vSwitch, we do implement the Nicira extension to set each controller connection to master or slave. With this scheme, the controllers themselves manage the fall back and decide which of the controllers is elected master, the switch does nothing.

The conclusion is that, as long as OpenFlow 1.2 and later versions are not yet widely deployed, introductory multi-controller research can continue in the current experimental setup.

How To Debug Hadoop Source Code In Eclipse
2013-03-08 Ryou http://kiyotoryou.github.com/2013/03/How-to-debug-Hadoop-source-code-in-Eclipse

If you want to learn an open source project, there are perhaps two good ways to achieve your goal. The first is to communicate with the committers, or even better with the chair of the project. If that is difficult for you, tracing and debugging the source code is probably the best method.

Some programmers do this by adding log statements to the source code, but when you are not yet familiar enough with the project, that is not a wise choice. Setting breakpoints and debugging step by step will give you a much clearer picture.

Follow the three steps below and you can debug the Hadoop source code in Eclipse easily.

Step 1. Configure the environment variable

In order for Hadoop's JVM to be intercepted by a debugger, you need to set the environment variable "HADOOP_OPTS" with the debugging options.

The environment variable "HADOOP_OPTS" is picked up by every invocation of the "bin/hadoop" command, so you should only export it when you actually want to debug a Hadoop command. The option of interest for debugging the job in Eclipse is "address=5000"; change the port number to your preference. The option "suspend=y" tells the JVM to suspend execution until a debugger has attached to the running job at the given port.

Run the command in the terminal:

hadoop@master:$ export HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5000"

Verify its value:

hadoop@master:$ echo $HADOOP_OPTS
-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5000

Step 2. Run the hadoop command

After you have configured the HADOOP_OPTS environment variable, first start your Hadoop cluster, and then you can test with a regular command. For instance, try running hadoop fs -ls as follows:

hadoop@master:/hadoop/bin/$ hadoop fs -ls
Listening for transport dt_socket at address: 5000

Step 3. Configure Eclipse

In order to debug the job, you must have created a Hadoop source-code project in your Eclipse workspace. Once the source code builds without compilation problems, you can add a remote debugger.

Go to the Eclipse menu "Run -> Debug Configurations -> Remote Java Application". Change the name of the debug configuration at the top of the dialog and enter the chosen port number.

Once you click the "Debug" button, the debugger attaches to the suspended Hadoop JVM you have submitted and execution of the code continues.

From this point on, execution of the job will stop at any breakpoint you have added to your classes.
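If the breakpoints are never hit, it is worth ruling out problems with the JDWP options themselves before digging into Hadoop. A trivial stand-alone class (a hypothetical test program, not part of Hadoop) started with the same agent options behaves exactly like the suspended Hadoop job:

// Compile:  javac DebugProbe.java
// Run with the same JDWP options used in HADOOP_OPTS, for example:
//   java -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5000 DebugProbe
// The JVM prints "Listening for transport dt_socket at address: 5000" and
// blocks until the Eclipse remote debugger attaches on port 5000.
public class DebugProbe {
    public static void main(String[] args) throws InterruptedException {
        System.out.println("Debugger attached, execution resumed");
        for (int i = 0; i < 5; i++) {   // a convenient line for a first breakpoint
            System.out.println("step " + i);
            Thread.sleep(1000);
        }
    }
}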
