Tuesday, November 30, 2010

Spam

I can’t believe that people still actually eat that stuff. Anyway, our class has moved on to discussions of network security, which is closer to my field of study. Our first topic was spam. Spam accounts for a large chunk of the email traffic (and bandwidth) on the internet, so a lot of research has gone into detecting and preventing it. Companies like Google, Microsoft, and Yahoo have proprietary spam-filtering systems that are pretty effective. They don’t stop spam from being sent; they just keep it out of your inbox. In addition to what these companies have built, standards such as DKIM and SPF help with spam detection. I have a couple of thoughts on spam detection and prevention that add a little overhead but wouldn’t bother those who send legitimate email.

First, I think all major email providers should start generating email certificates for each of their users and automatically signing all outgoing email. Receiving providers can then check the signature to make sure the identity and the issuer are valid. This would prevent email address spoofing (unless your account gets hacked, but that’s another issue altogether).

Second, I think large email providers should start enforcing the presence of DKIM signatures: anything not signed gets thrown out. According to the spec, DKIM public keys are distributed through DNS. Instead, I think there should be a central repository of DKIM public keys that requires human interaction (a CAPTCHA is an effective way to enforce this) to register keys. Other email providers could then query this repository for keys that have been submitted. The only ones bothered by the inconvenience of human interaction would be those who create and destroy lots of domains; the whole point is to make it expensive for spammers to automatically create throwaway domains. You might think this is a lot of overhead for a startup or a family to go through just to send email from a personal domain. In a way it is, but products like Google Apps (which is free) and BPOS make email setup for any size of organization a breeze, and I think the extra initial overhead is worth it to make life harder for spammers. These are just some initial thoughts; comment on any holes you see.
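To make the signing-and-verifying flow concrete, here is a minimal Python sketch using the `cryptography` package. This is not real DKIM (no header canonicalization, no RFC 6376 wire format, no DNS lookup); the `key_repository` dict is just a stand-in for the central, CAPTCHA-gated key registry I'm imagining.

```python
# Minimal sketch of the sign-on-send / verify-on-receive idea. NOT real DKIM;
# key_repository stands in for the hypothetical central, human-verified registry.
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

# The sending domain generates a key pair and registers the public key
# (registration would be gated by a CAPTCHA in the scheme described above).
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
key_repository = {"example.com": private_key.public_key()}

def sign_outgoing(message: bytes) -> bytes:
    """Provider signs every outgoing message with the domain's private key."""
    return private_key.sign(message, padding.PKCS1v15(), hashes.SHA256())

def verify_incoming(domain: str, message: bytes, signature: bytes) -> bool:
    """Receiving provider looks up the domain's key and rejects unsigned or forged mail."""
    public_key = key_repository.get(domain)
    if public_key is None:
        return False  # unknown domain: throw the message out
    try:
        public_key.verify(signature, message, padding.PKCS1v15(), hashes.SHA256())
        return True
    except Exception:
        return False  # bad signature: throw the message out

msg = b"From: alice@example.com\r\nSubject: hi\r\n\r\nLegitimate mail."
sig = sign_outgoing(msg)
print(verify_incoming("example.com", msg, sig))         # True
print(verify_incoming("example.com", msg + b"x", sig))  # False: content was tampered with
```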

Thursday, November 18, 2010

Wireless Networks

We’ve recently been discussing research topics related to wireless networks. We’ve spent a lot of time on the problems and challenges of wireless mesh networks and ad hoc networks. I think wireless technologies are the future of communications. As an increasing number of nations around the world connect to the vast entity that is the internet, I believe wireless communication will be the cheapest and most efficient way of connecting small and remote places. Obviously the speeds achievable in wireless networks today pale in comparison to wired networks, but technology is always changing, and smart people are always producing new and better ideas to improve communication.

What’s interesting to me (I heard this in class) is that in technologically emerging countries, like many in Africa, people don’t own computers; they own mobile devices like the iPhone. Many wireless network providers are starting to roll out higher-speed wireless networks like 4G and LTE. Mobile sales have skyrocketed over the last year, and in addition to phones, tablets are becoming popular, again thanks to Apple. The writing is on the wall: connectivity will be defined by how mobile it is.

Tuesday, November 16, 2010

Wireless Congestion Control

One of the biggest problems with TCP in a wireless network is that it assumes loss is a result of congestion. In a wired network that is most certainly the case, but in a wireless network loss can be the result of interference, changes in the wind, or phases of the moon. A paper out of the University of Dortmund presents an adjusted version of TCP called TCP with Adaptive Pacing (TCP-AP). It claims up to 84% more goodput than TCP New Reno and excellent fairness in almost all scenarios. Basically, the sender adaptively sets its transmission rate using an estimate of the propagation delay over four hops and a coefficient of variation calculated from round-trip times. The whole idea is to reduce contention at the MAC layer so that more senders can transmit more frequently with more success.
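To make the pacing idea a bit more concrete, here is a toy Python sketch. The way I combine the four-hop delay estimate with the RTT variation below is my own guess for illustration, not the formula from the paper.

```python
# Toy sketch of the adaptive-pacing idea, NOT the paper's actual algorithm.
# The sender spaces packets out by an interval derived from an estimated
# four-hop propagation delay and the variability of recent RTT samples.
# The specific weighting (1 + 2 * cov) is an illustrative assumption.
import statistics

def pacing_interval(rtt_samples, path_hops):
    """Seconds to wait between transmissions, given recent RTT samples (seconds)."""
    mean_rtt = statistics.mean(rtt_samples)
    cov = statistics.stdev(rtt_samples) / mean_rtt if len(rtt_samples) > 1 else 0.0
    per_hop_delay = (mean_rtt / 2) / path_hops  # crude one-way, per-hop estimate
    four_hop_delay = 4 * per_hop_delay          # MAC-layer contention spans roughly 4 hops
    return four_hop_delay * (1 + 2 * cov)       # back off more when RTTs are noisy

# Example: five recent RTT samples over a 6-hop wireless path
print(pacing_interval([0.120, 0.135, 0.128, 0.150, 0.142], path_hops=6))
```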

Overall I think the idea is solid. Their results show that TCP convergence time is low, fairness is high, and goodput is, well, good. They seemed to spend a lot of time tuning constants in their equations to find an optimal setting; that's okay, there are magic numbers like that in lots of algorithms. Like all algorithms of this kind, there is overhead associated with the adaptive pacing. For instance, the total goodput with New Reno is actually higher than with TCP-AP, but with New Reno almost all of that goodput goes to one or two nodes, whereas with TCP-AP it is evenly distributed among the nodes. I think the overhead is a worthwhile trade-off when more nodes get to send more frequently. It would be interesting to see this system work in practice.

Thursday, November 11, 2010

Network coding

Wireless networks provide an interesting challenge. Radios on identical frequencies transmitting at the same time can cause collisions, resulting in data loss. Since the air is a limited shared resource, it has to be used efficiently. Typically, a wireless node that wants to transmit will use a reservation protocol to tell its immediate neighborhood that it wants to transmit so that no one else interferes. I think this is a fine way to go, but we have to remember that a lot of internet traffic uses TCP for reliability. TCP sends acknowledgements for every packet, making it very chatty in the context of a wireless network. When multiple nodes are sending TCP traffic, there is a lot of back-and-forth between wireless nodes, and the challenge is to use that air time as efficiently as possible. That's where network coding comes in.

There is an interesting paper that presents a system called COPE, which performs opportunistic coding on traffic that it overhears. The following image is taken from the paper I'm referring to.

[Figure: COPE coding example from the paper]

With COPE, in this example, the number of transmissions is reduced from four to three because the packets have been XOR'd together. The COPE results showed a 5% to 70% improvement in throughput, depending on the traffic dynamics. An interesting thing to study in connection with this would be how fair the coding is: if we could track each TCP flow independently, it would be interesting to measure fairness and see whether coding even affects it.
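Here is a tiny Python illustration of the XOR trick in the classic two-flow relay example. The packets are made up, and this is nowhere near the real COPE protocol, which decides opportunistically what to code based on what its neighbors have already overheard.

```python
# Toy illustration of the relay example: Alice and Bob each send a packet to a
# relay, which broadcasts one XOR-coded packet instead of forwarding each one
# separately (3 transmissions instead of 4). Each side recovers the other's
# packet by XOR-ing the coded packet with its own.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

packet_from_alice = b"hello bob!"
packet_from_bob   = b"hi, alice!"   # a real system would pad packets to equal length

coded = xor_bytes(packet_from_alice, packet_from_bob)  # the relay broadcasts this once

# Alice XORs the coded packet with what she sent to recover Bob's packet, and vice versa.
assert xor_bytes(coded, packet_from_alice) == packet_from_bob
assert xor_bytes(coded, packet_from_bob) == packet_from_alice
```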

Tuesday, November 9, 2010

Wireless network security

This week we are starting to talk about wireless networks. I really don't have a lot to say on the subject because I'm not that familiar with all the challenges of a wireless network. My research focus is security, so I'll talk about that as it relates to thoughts I have on wireless networking. Wireless networks are normally thought of as networks where clients make a single wireless hop to an access point, which then forwards their traffic onto a wired network. However, they can be more ad hoc as well, like wireless mesh networks.

Wireless networks suffer from the inability to completely control where their traffic goes. To mitigate eavesdropping, single-hop wireless topologies have features to encrypt traffic between the client and the access point. However, to my knowledge, not a lot has been done in this regard in wireless ad hoc networks. There would be lots of challenges, such as authenticating other wireless nodes in the network and dealing with packet loss and corruption. One caveat with encryption is that, depending on how it is applied, data may have to be reassembled in the order it was sent or entire messages will be lost. It might be interesting to look into this.
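One way around the ordering problem is to encrypt each packet independently with an authenticated cipher and a per-packet nonce, so a lost or reordered packet only costs you that packet. Here is a small sketch using the `cryptography` package; how the two mesh nodes agree on the key is hand-waved away, and that is exactly the hard part in an ad hoc network.

```python
# Sketch: per-packet authenticated encryption so packets can be decrypted out of
# order and a lost packet doesn't corrupt anything else. Key exchange between
# mesh nodes is assumed to have happened already (that is the hard part).
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)  # somehow shared between the two nodes
aead = AESGCM(key)

def encrypt_packet(seq: int, payload: bytes):
    nonce = os.urandom(12)           # unique per packet
    header = seq.to_bytes(4, "big")  # authenticated but not encrypted
    return seq, nonce, aead.encrypt(nonce, payload, header)

def decrypt_packet(seq: int, nonce: bytes, ciphertext: bytes) -> bytes:
    header = seq.to_bytes(4, "big")
    return aead.decrypt(nonce, ciphertext, header)  # raises if tampered with

packets = [encrypt_packet(i, f"payload {i}".encode()) for i in range(3)]
# Packets can arrive in any order; each one decrypts independently.
for seq, nonce, ct in reversed(packets):
    print(seq, decrypt_packet(seq, nonce, ct))
```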

Thursday, November 4, 2010

Networking research is hard

In class we've been working on learning the discrete event simulator OMNeT++. It is a really nice piece of software and sure makes it easy to simulate a vast number of scenarios. One of the things I find interesting about network research is how hard it is to prove that your idea or system is correct. I personally do research in internet security, and there are a number of techniques in that area for proving that a security protocol is correct or provides certain guarantees. Networking doesn't have that luxury. For instance, routing is definitely something that could use some improvement. Sure, it's easy to test in a small controlled environment, but the real test is with hundreds of thousands of nodes connected together in a complex topology. Early in the class we talked about ideas for a new internet architecture; how do you test something like that at real scale?

In a lot of ways I think simulators might be our saving grace in this regard. The more faithfully we can represent real-world conditions in a simulator, the better its predictive power. I'm not saying this approach is without its downsides, but it might be the best we have available.
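For anyone who hasn't used a discrete event simulator before, the core of one is surprisingly small. Here is a bare-bones Python sketch of the event-loop idea (a priority queue of timestamped events); it is only meant to show the flavor, nothing like the scale or fidelity of OMNeT++.

```python
# Bare-bones discrete event simulation loop: a priority queue of timestamped
# events processed in order, where handlers can schedule future events.
import heapq

events = []  # min-heap of (time, sequence, handler, args)
_seq = 0     # tie-breaker so simultaneous events pop in scheduling order

def schedule(time, handler, *args):
    global _seq
    heapq.heappush(events, (time, _seq, handler, args))
    _seq += 1

def run(until=float("inf")):
    while events and events[0][0] <= until:
        time, _, handler, args = heapq.heappop(events)
        handler(time, *args)

# Example: node A "sends" a packet that arrives at B after a 2 ms link delay.
def send(now, src, dst):
    print(f"{now * 1000:.1f} ms: {src} sends to {dst}")
    schedule(now + 0.002, receive, src, dst)

def receive(now, src, dst):
    print(f"{now * 1000:.1f} ms: {dst} received packet from {src}")

schedule(0.0, send, "A", "B")
run()
```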

Tuesday, November 2, 2010

Network layer

In class lately we’ve been talking about the network layer. That layer encompasses lots of discussion points like addressing, multicast, and routing. I wish to discuss a few different things relating to the network layer.

There are lots of good ideas in the area of routing that we've read lately: ideas that seem almost too obvious once you read them, and then you wonder why nobody thought of them from the beginning. The challenge then becomes implementation and deployment on the internet. Mostly, I think we are in band-aid mode. Take IPv4 address exhaustion: NAT was invented to mitigate that problem, and addresses are still running out. I don't think IPv6 is going to be the magic bullet that fixes all our problems. First, all major modern operating systems still default to some form of IPv4. Shouldn't they be defaulting to pure IPv6 first if we even remotely want the world to move to IPv6? Second, I wish more thought had been put into the addresses themselves. I know the major issue IPv6 is trying to solve is the size of the address space, but the common use case is referring to resources by name, not by some number. It would be interesting to explore that space further and see if there could be some kind of name-to-address hashing function; DHTs kind of already do this. Also, I know companies that will never move to IPv6 internally because the addresses are not memorizable, and IPv4 has a huge advantage over IPv6 in that regard. Sure, DNS is there to work around that, but honestly, we shouldn't need DNS. DNS is really just the world filling a need that wasn't met in the original design of the network layer.
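As a thought experiment, here is what the most naive version of that name-to-address hashing might look like in Python: hash the name and use the digest as a 128-bit address. Everything hard about it (collisions, route aggregation, proving you own a name) is left unsolved; it's just to show the flavor of the idea.

```python
# Toy sketch of the "name-to-address hashing" idea: hash a human-readable name
# into 128 bits and format it as an IPv6 address. Purely illustrative.
import hashlib
import ipaddress

def name_to_ipv6(name: str) -> ipaddress.IPv6Address:
    digest = hashlib.sha256(name.lower().encode("utf-8")).digest()
    return ipaddress.IPv6Address(digest[:16])  # take the first 128 bits of the hash

print(name_to_ipv6("example.com"))  # the same name always maps to the same address
```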

In the routing world, BGP tables are growing large and cumbersome. There are lots of good ideas for new routing protocols, but when we tried to find out whether any of them have taken hold (by looking at the IETF's website), we found that most of the effort is going into changing BGP. Not to complain, but if something isn't working because of a flaw in the original design, patching it only sweeps the problem under the rug; it doesn't fix it.