Thursday, December 9, 2010

Intrusion detection

I have an aversion to intrusion detection systems. Not because they aren’t a useful tool, but because they usually get in the way more than they help. I speak mostly of application layer intrusion detection systems that attempt to weed out URLs and sites that might contain malicious content. BYU’s new Cisco IDS is a resounding example of one that gets in the way. I used to have a job as a developer in one of the colleges on BYU campus. There were a few applications, written or purchased in the past, that were loaded with SQL injection and XSS vulnerabilities and that we still had to support. Not a big deal; we just kept them isolated on their own VMs running in an environment with very limited privileges. We eventually installed the mod_security Apache module, an IDS that is supposed to stop SQL injection and other malicious attacks by analyzing the text of incoming and outgoing requests. My biggest problem with it was that it created more problems than it fixed. We tried putting it in front of a content management system we had, but with so much free-form text being submitted to it, it was killing sessions left and right. I eventually went in and turned it off, to the dismay of those who set it up. I think the idea was good, but relying on it to correct bad software design was not.
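
To make the false-positive problem concrete, here is a toy sketch of the kind of signature matching an IDS like this performs on request text. It is not mod_security's actual rule engine, and the signatures are invented, but it shows how quickly legitimate free-form content can trip the same rules an attack does.

using System;
using System.Text.RegularExpressions;

// A toy signature-based filter in the spirit of mod_security (not its actual rule
// engine): scan request text against a few regex signatures and block on a match.
class NaiveRequestFilter
{
    static readonly Regex[] Signatures =
    {
        new Regex(@"union\s+select", RegexOptions.IgnoreCase),
        new Regex(@"<script\b", RegexOptions.IgnoreCase),
        new Regex(@"'\s*or\s*1\s*=\s*1", RegexOptions.IgnoreCase),
    };

    public static bool LooksMalicious(string requestBody)
    {
        foreach (var sig in Signatures)
            if (sig.IsMatch(requestBody)) return true;
        return false;
    }

    static void Main()
    {
        // A real injection attempt is caught...
        Console.WriteLine(LooksMalicious("id=1' OR 1=1 --"));                       // True
        // ...but so is a perfectly legitimate CMS article about SQL,
        // which is exactly the kind of false positive that killed our sessions.
        Console.WriteLine(LooksMalicious("To combine results, use UNION SELECT.")); // True
    }
}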

My point in telling this story is that IDS systems are hard. They don’t make a good substitute for well-designed software, but I think they are useful in weeding out the viciously malicious. If IDS systems could be trained to know what is good traffic and what is bad, I think they could be beneficial. That also means they’d have to be trained specifically for each application they were protecting. It would be an interesting area of research.
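
Here's a rough sketch of what I mean by per-application training: learn, from normal traffic, what each form field of an application usually looks like, and flag values that fall far outside that profile. The fields, thresholds, and scoring are all invented for illustration; a real system would need far more care.

using System;
using System.Collections.Generic;
using System.Linq;

// A toy per-application anomaly profile: learn, per form field, the lengths and
// character classes seen in normal traffic, then flag values that fall far outside
// that profile. Field names and thresholds are made up for illustration.
class FieldProfile
{
    private readonly List<int> lengths = new List<int>();
    private bool sawNonAlphanumeric;

    public void Learn(string value)
    {
        lengths.Add(value.Length);
        sawNonAlphanumeric |= value.Any(c => !char.IsLetterOrDigit(c) && !char.IsWhiteSpace(c));
    }

    public bool IsSuspicious(string value)
    {
        double mean = lengths.Average();
        double std = Math.Sqrt(lengths.Average(l => (l - mean) * (l - mean)));
        bool tooLong = value.Length > mean + 4 * std + 8;
        bool oddChars = !sawNonAlphanumeric &&
                        value.Any(c => !char.IsLetterOrDigit(c) && !char.IsWhiteSpace(c));
        return tooLong || oddChars;
    }

    static void Main()
    {
        var zip = new FieldProfile();
        foreach (var v in new[] { "84602", "84604", "90210" }) zip.Learn(v);
        Console.WriteLine(zip.IsSuspicious("84601"));            // False: looks like training data
        Console.WriteLine(zip.IsSuspicious("84601' OR '1'='1")); // True: odd characters, odd length
    }
}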

Tuesday, December 7, 2010

Spam and Machine Learning

A year or so ago I took a machine learning class. Going into the class I thought we were going to learn how to build AIs. Turns out that wasn’t the focus, but I did learn some other valuable things. Machine learning, at least as best as I can describe it, is a field of study in which algorithms evolve based on empirical data. It is very interesting to watch that happen. It is actively applied in many different fields in and outside of computer science. Meteorology is a perfect example: tons of data gathered and used to help algorithms evolve to model future weather patterns. Machine learning can also play a crucial role in spam filtering. Since spammers tend to get pretty creative in their means of getting around the roadblocks set before them, filtering software needs to evolve along with them. Enter machine learning.
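
As a concrete (if toy) example of learning from data, here is a minimal naive Bayes spam scorer, the classic statistical filtering approach. The training messages are made up, and a real filter needs a much better tokenizer and far more data, but the structure is the same: count word occurrences per class, then score new messages by their log-odds.

using System;
using System.Collections.Generic;
using System.Linq;

// A minimal multinomial naive Bayes spam scorer: count words per class during
// training, then score a message by comparing log-probabilities under the spam
// and ham models (Laplace smoothing keeps unseen words from zeroing things out).
class NaiveBayesSpamFilter
{
    private readonly Dictionary<string, int> spamCounts = new Dictionary<string, int>();
    private readonly Dictionary<string, int> hamCounts = new Dictionary<string, int>();
    private readonly HashSet<string> vocabulary = new HashSet<string>();
    private int spamMessages, hamMessages, spamWords, hamWords;

    public void Train(string message, bool isSpam)
    {
        if (isSpam) spamMessages++; else hamMessages++;
        foreach (var word in Tokenize(message))
        {
            vocabulary.Add(word);
            var counts = isSpam ? spamCounts : hamCounts;
            int current;
            counts.TryGetValue(word, out current);
            counts[word] = current + 1;
            if (isSpam) spamWords++; else hamWords++;
        }
    }

    public double SpamProbability(string message)
    {
        double logSpam = Math.Log((spamMessages + 1.0) / (spamMessages + hamMessages + 2.0));
        double logHam = Math.Log((hamMessages + 1.0) / (spamMessages + hamMessages + 2.0));
        foreach (var word in Tokenize(message))
        {
            int s, h;
            spamCounts.TryGetValue(word, out s);
            hamCounts.TryGetValue(word, out h);
            logSpam += Math.Log((s + 1.0) / (spamWords + vocabulary.Count + 1.0));
            logHam += Math.Log((h + 1.0) / (hamWords + vocabulary.Count + 1.0));
        }
        return 1.0 / (1.0 + Math.Exp(logHam - logSpam));   // convert log-odds to a probability
    }

    private static IEnumerable<string> Tokenize(string text)
    {
        return text.ToLowerInvariant().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        var filter = new NaiveBayesSpamFilter();
        filter.Train("cheap meds buy now", true);
        filter.Train("meeting notes attached", false);
        Console.WriteLine(filter.SpamProbability("buy cheap meds now"));      // close to 1
        Console.WriteLine(filter.SpamProbability("notes from the meeting"));  // close to 0
    }
}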

A paper I recently read outlines an application of machine learning to spam filtering. The authors claim that most blacklists fail to keep pace with spammers because they filter on identifiers that are assumed to be persistent (e.g. IP addresses) and because they compartmentalize email-sending behavior to a single domain rather than analyzing behavior across domains. The paper proposes a behavioral blacklisting approach instead. They introduce a new system called SpamTracker which uses clustering, based on a principal components analysis, and classification algorithms to detect spam from senders that have not yet been blacklisted. Their system does okay. It is meant to supplement existing systems rather than replace them, but at least it closes the gap a little more. As with all machine learning, the work is in tuning parameters, getting better data to train against, and deciding which features to “learn” from.
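
The behavioral idea is easier to see in code than in prose. Below is a stripped-down sketch (not the paper's actual clustering and classification pipeline): describe each sender by how its mail spreads across recipient domains, then compare a new, not-yet-blacklisted sender against the centroid of a cluster of known spammers. The numbers are invented.

using System;
using System.Linq;

// A sketch of behavioral blacklisting in the spirit of SpamTracker (not the paper's
// actual pipeline): represent each sender as a normalized vector of how much mail it
// sends to each monitored recipient domain, and score new senders by similarity to
// the centroid of a cluster of confirmed spammers.
class BehavioralBlacklist
{
    // Fraction of a sender's mail going to each of N monitored recipient domains.
    static double[] Normalize(double[] counts)
    {
        double total = counts.Sum();
        return counts.Select(c => total > 0 ? c / total : 0).ToArray();
    }

    static double CosineSimilarity(double[] a, double[] b)
    {
        double dot = a.Zip(b, (x, y) => x * y).Sum();
        double norm = Math.Sqrt(a.Sum(x => x * x)) * Math.Sqrt(b.Sum(x => x * x));
        return norm > 0 ? dot / norm : 0;
    }

    static void Main()
    {
        // Centroid of a cluster of confirmed spammers: mail sprayed evenly everywhere.
        double[] spammerCentroid = Normalize(new double[] { 100, 95, 105, 98, 102 });

        // A new, not-yet-blacklisted sender that sprays the same way scores high...
        double[] unknownSender = Normalize(new double[] { 40, 38, 41, 39, 42 });
        Console.WriteLine(CosineSimilarity(unknownSender, spammerCentroid)); // close to 1.0

        // ...while a legitimate sender that mostly mails one partner domain does not.
        double[] legitSender = Normalize(new double[] { 2, 0, 90, 1, 0 });
        Console.WriteLine(CosineSimilarity(legitSender, spammerCentroid));   // much lower
    }
}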

Thursday, December 2, 2010

Network Security

We are embarking on our last network research area: network security. I just want to spend a little bit of time on some thoughts I have concerning network security. My first thoughts when thinking about network security are of firewalls, IPsec, DNSSEC, packet shapers, filters, etc. These are all important tools that aid in keeping a network safe from attackers and malicious insiders. There is another area of network security that seems to have become its own separate research area (while still quietly remaining a subarea of network security) called internet security. In my opinion, internet security deals primarily with application layer tools like intrusion detection systems, spam filters, and secure communication protocols (HTTPS, S/MIME, PKI, Diffie-Hellman, WS-Trust, etc.). All of those systems, protocols, and acronyms live at the application layer. It seems like a lot of effort goes into protecting the application layer, and for good reason, but is there not something more we can do in the lower layers to help protect the network better?

An interesting research topic would be augmenting BGP with some kind of claims-verification system to prevent malicious networks from hijacking traffic. My idea stems from the incident in China where, due to a BGP misconfiguration, a large amount of traffic destined for US networks was routed through China for a brief period of time. It was as simple as China advertising a better route than everyone else. Obviously, routing traffic meant for US networks through China and back to the US is not a better route. But it begs the question: is there not a way to prevent such a claim?
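
One way to frame the claims-verification idea is origin validation: keep a registry of which autonomous systems are allowed to originate which prefixes and check announcements against it (the RPKI effort is moving in this direction). Here is a toy sketch; the prefixes and AS numbers are examples only, and real validation has to deal with signing, delegation, and partial deployment.

using System;
using System.Collections.Generic;

// A toy origin check: before accepting a BGP announcement, verify that the origin AS
// is actually authorized for that prefix according to a trusted registry.
class OriginValidator
{
    // Registry of prefix -> set of ASes authorized to originate it (example data).
    private readonly Dictionary<string, HashSet<int>> authorized =
        new Dictionary<string, HashSet<int>>
        {
            { "192.0.2.0/24", new HashSet<int> { 64500 } },
            { "198.51.100.0/24", new HashSet<int> { 64501, 64502 } },
        };

    public bool Accept(string prefix, int originAs)
    {
        HashSet<int> asns;
        if (!authorized.TryGetValue(prefix, out asns))
            return true;                // no registry entry: fall back to today's behavior
        return asns.Contains(originAs);
    }

    static void Main()
    {
        var validator = new OriginValidator();
        Console.WriteLine(validator.Accept("192.0.2.0/24", 64500)); // True: legitimate origin
        Console.WriteLine(validator.Accept("192.0.2.0/24", 64999)); // False: someone else claiming the prefix
    }
}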

Tuesday, November 30, 2010

Spam

I can’t believe that people still actually eat that stuff. Anyways, our class has moved on to discussions of network security. This is closer to my field of study. Our first topic was spam. Spam consumes a large chunk of the available bandwidth on the internet. Because of this, lots of research has gone into the detection and prevention of spam. Companies like Google, Microsoft and Yahoo have proprietary spam filtering systems that are pretty effective. They don’t stop spam from being sent, they just prevent it from getting into your inbox. In addition to what these companies have created, there are other standards such as DKIM and SPF that help with spam detection. I have a couple of thoughts regarding spam detection and prevention that would add a little overhead, but wouldn’t bother those that send legitimate email.

First, I think that all major email providers should start generating email certificates for each of their users and automatically signing all outgoing email. Receiving providers can then check the signature to make sure that the identity and issuer are valid. This would prevent email address spoofing (unless your account gets hacked, but that’s another issue altogether). Second, I think large email providers should start enforcing the presence of DKIM signatures: anything not signed is thrown out. According to the spec, DKIM public keys are distributed through DNS. Instead of that, I think there should be a central repository of DKIM public keys that requires human interaction (a CAPTCHA is an effective way to enforce that) to register keys. Other email providers can then query this repository for keys that have been submitted. The only ones bothered by the inconvenience of human interaction would be those that create and destroy lots of domains; the whole purpose of the human interaction is to make it expensive for spammers to automatically create throwaway domains. You might think this is a lot of overhead for a startup or a family to go through in order to send email from a personal domain. In a way it is, but products like Google Apps (which is free) and BPOS make email setup for any size of organization a breeze. I think the extra initial overhead is worth it in order to make life harder for spammers. These are just some initial thoughts. Comment on any holes you see.
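
To make the second idea a little more concrete, here is a sketch of the receiving side of that policy. The IKeyRepository interface stands in for my hypothetical central, human-verified key registry, and the actual DKIM cryptographic verification (canonicalization, RSA check) is left as a placeholder, so treat this as an outline rather than a working verifier.

using System;
using System.Collections.Generic;

// A sketch of a "no DKIM signature, no delivery" policy, with the key fetched from a
// hypothetical central repository instead of DNS.
interface IKeyRepository
{
    // Returns the registered public key for (domain, selector), or null if the
    // domain never went through the human-verified registration step.
    string Lookup(string domain, string selector);
}

class InboundPolicy
{
    private readonly IKeyRepository repository;
    public InboundPolicy(IKeyRepository repository) { this.repository = repository; }

    public bool ShouldAccept(IDictionary<string, string> headers)
    {
        string signature;
        if (!headers.TryGetValue("DKIM-Signature", out signature))
            return false;                               // unsigned mail is thrown out

        string domain = ExtractTag(signature, "d");     // signing domain
        string selector = ExtractTag(signature, "s");   // key selector
        string publicKey = repository.Lookup(domain, selector);
        if (publicKey == null)
            return false;                               // domain never registered a key

        return VerifySignature(signature, publicKey);   // cryptographic check (omitted)
    }

    private static string ExtractTag(string signature, string tag)
    {
        foreach (var part in signature.Split(';'))
        {
            var kv = part.Split(new[] { '=' }, 2);
            if (kv.Length == 2 && kv[0].Trim() == tag) return kv[1].Trim();
        }
        return null;
    }

    private static bool VerifySignature(string signature, string publicKey)
    {
        // Placeholder for the actual DKIM verification step.
        return true;
    }
}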

Thursday, November 18, 2010

Wireless Networks

We’ve recently been discussing research topics related to wireless networks. We’ve spent a lot of time discussing the problems and challenges associated with wireless mesh networks and ad hoc networks. I think wireless technologies are the future of communications. As an increasing number of nations around the world become connected to the vast entity that is the internet, I believe wireless communications will be the cheapest and most efficient means of connecting small and remote places. Obviously the speeds that can be achieved in wireless networks as they stand right now pale in comparison to wired networks, but technology is always changing and smart people are always producing new and better ideas to improve communication.

What’s interesting to me (I heard this in class) is that in technologically emerging countries (like those in Africa), many people don’t own computers; they own mobile devices like the iPhone. Many wireless providers are starting to roll out higher speed networks like 4G and LTE. Mobile sales have skyrocketed over the last year. In addition to mobile phones, tablets are starting to become popular, again thanks to Apple. The writing is on the wall: connectivity will be defined by how mobile it is.

Tuesday, November 16, 2010

Wireless Congestion Control

One of the biggest problems with TCP in a wireless network is that it assumes that loss is a result of congestion. In a wired network that is most certainly the case, but in a wireless network loss can be a result of interference, changes in the wind, or phases of the moon. A paper out of the University of Dortmund presents an adjusted version of TCP called TCP with Adaptive Pacing (TCP-AP). It claims to achieve up to 84% more goodput than TCP New Reno and excellent fairness in almost all scenarios. Basically, the sender adaptively sets its transmission rate using an estimate of the propagation delay over four hops and a coefficient of variation calculated from recent round trip times. The whole idea is to reduce contention at the MAC layer so that more senders can send packets more frequently with more success.
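
The mechanics are easier to see in a sketch. This is not the paper's exact math, just the flavor: keep a window of recent RTT samples, compute their coefficient of variation as a contention signal, and stretch the gap between packets (starting from the estimated four-hop propagation delay) as that variability grows.

using System;
using System.Collections.Generic;
using System.Linq;

// A sketch of the adaptive-pacing idea: estimate the four-hop propagation delay,
// measure how variable recent RTTs are (coefficient of variation), and space packets
// further apart as contention (variability) grows.
class AdaptivePacer
{
    private readonly Queue<double> rttSamples = new Queue<double>();

    public void RecordRtt(double rttMs)
    {
        rttSamples.Enqueue(rttMs);
        if (rttSamples.Count > 50) rttSamples.Dequeue();   // keep a sliding window
    }

    public double CoefficientOfVariation()
    {
        double mean = rttSamples.Average();
        double variance = rttSamples.Average(r => (r - mean) * (r - mean));
        return Math.Sqrt(variance) / mean;
    }

    // Inter-packet gap: the estimated four-hop propagation delay, stretched as RTT
    // variability (a sign of MAC-layer contention) increases. The factor of 2 is
    // an arbitrary placeholder, not the paper's tuned constant.
    public double InterPacketGapMs(double fourHopDelayMs)
    {
        return fourHopDelayMs * (1.0 + 2.0 * CoefficientOfVariation());
    }

    static void Main()
    {
        var pacer = new AdaptivePacer();
        foreach (var rtt in new[] { 40.0, 42.0, 120.0, 45.0, 90.0 }) pacer.RecordRtt(rtt);
        Console.WriteLine(pacer.InterPacketGapMs(fourHopDelayMs: 12.0)); // gap widens under contention
    }
}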

Overall I think the idea is solid. Their results seem to show that TCP convergence time is low, fairness is high, and goodput is, well, good. They seemed to spend a lot of time tuning constants in their equations to find an optimal setting. That’s okay; there are magic numbers like that in lots of algorithms. Like all algorithms of this kind, there is overhead associated with the adaptive pacing. For instance, the total goodput with New Reno is actually higher than with TCP-AP. However, with New Reno almost all of that goodput goes to one or two nodes, while with TCP-AP the goodput is evenly distributed among the nodes. I think the overhead is a worthwhile trade-off when more nodes get to send more frequently. It would be interesting to see this system work in practice.

Thursday, November 11, 2010

Network coding

Wireless networks provide an interesting challenge. Radios on identical frequencies transmitting at the same time can cause collisions, resulting in data loss. Since the air is a limited shared resource, it has to be used efficiently. Typically, a wireless node that wants to transmit will use a reservation protocol to tell the immediate network that it wants to transmit so that no one else will interfere. I think this would be a fine way to go, but we have to remember that a lot of internet traffic uses TCP for reliability. TCP sends acknowledgements for every packet, making it very chatty in the context of a wireless network. When multiple nodes are sending TCP traffic, there is a lot of back and forth between wireless nodes. The challenge is to use that air time as efficiently as possible. That’s where network coding comes in.

There is an interesting paper that presents a system called COPE, which performs opportunistic coding on traffic that it overhears. The figure below is taken from the paper I’m referring to.

[Figure from the COPE paper illustrating the coding example described below]

With COPE, in this example, the number of transmissions is reduced from four to three because the packets have been XOR’d together. COPE’s results showed that it was able to achieve between a 5% and 70% improvement in throughput depending on the traffic dynamics. An interesting thing to study in connection with this would be how fair this coding is. If we could track each TCP flow independently, it would be interesting to measure the fairness, and whether coding even affects flow fairness.
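
The core trick is just XOR. Here is a minimal illustration of why the relay can get away with one broadcast instead of two forwards: each receiver XORs the coded packet with the packet it already knows (its own, or one it overheard) to recover the one it wants.

using System;
using System.Text;

// A minimal illustration of the XOR trick behind COPE: a relay combines two packets
// into one transmission, and each receiver recovers the packet it wants by XORing
// with the packet it already knows.
class XorCodingDemo
{
    static byte[] Xor(byte[] a, byte[] b)
    {
        var result = new byte[Math.Max(a.Length, b.Length)];
        for (int i = 0; i < result.Length; i++)
        {
            byte x = i < a.Length ? a[i] : (byte)0;
            byte y = i < b.Length ? b[i] : (byte)0;
            result[i] = (byte)(x ^ y);
        }
        return result;
    }

    static void Main()
    {
        byte[] fromAlice = Encoding.UTF8.GetBytes("packet A");  // Alice -> relay -> Bob
        byte[] fromBob = Encoding.UTF8.GetBytes("packet B");    // Bob -> relay -> Alice

        // Instead of forwarding each packet separately (two transmissions),
        // the relay broadcasts a single XOR-coded packet (one transmission).
        byte[] coded = Xor(fromAlice, fromBob);

        // Each side decodes with the packet it already has.
        byte[] bobDecodes = Xor(coded, fromBob);      // recovers Alice's packet
        byte[] aliceDecodes = Xor(coded, fromAlice);  // recovers Bob's packet

        Console.WriteLine(Encoding.UTF8.GetString(bobDecodes));   // "packet A"
        Console.WriteLine(Encoding.UTF8.GetString(aliceDecodes)); // "packet B"
    }
}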

Tuesday, November 9, 2010

Wireless network security

This week we are starting to talk about wireless networks. I really don’t have a lot to say on the subject because I’m not that familiar with all the challenges of a wireless network. My research focus is in security, so I’ll talk about that as it relates to thoughts I have on wireless networking. Wireless networks are normally thought of as networks where clients make a single wireless hop to an access point, which then forwards their traffic onto a wired network. However, they can be more ad hoc as well, like wireless mesh networks.

Wireless networks suffer from the inability to completely control where their traffic goes. To mitigate the effects of wireless eavesdroppers, single-hop wireless topologies have features to encrypt traffic between the client and the access point. However, to my knowledge, not a lot has been done in this regard in wireless ad hoc networks. There would be lots of challenges, such as authenticating other wireless nodes in the network and dealing with packet loss and corruption. One caveat with encryption is that if the ciphertext is chained across a whole message, the data has to be reassembled complete and in the order it was sent or the rest of the message is lost, which is a real problem on a lossy link. It might be interesting to look into this.
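
One standard way around that last problem is to encrypt each packet independently, carrying a fresh IV in every packet, so a lost or reordered packet only costs you that packet. Here is a small sketch of the framing idea (it omits per-packet authentication and replay protection, which a real ad hoc design would also need).

using System;
using System.Security.Cryptography;
using System.Text;

// Per-packet encryption for a lossy wireless link: each packet carries its own IV and
// is encrypted independently, so losing or reordering one packet does not make the
// rest of the stream undecipherable.
class PerPacketCrypto
{
    private readonly byte[] key;
    public PerPacketCrypto(byte[] key) { this.key = key; }

    // Returns IV || ciphertext so the packet is self-contained.
    public byte[] EncryptPacket(byte[] payload)
    {
        using (var aes = Aes.Create())
        {
            aes.Key = key;
            aes.GenerateIV();
            using (var enc = aes.CreateEncryptor())
            {
                byte[] ciphertext = enc.TransformFinalBlock(payload, 0, payload.Length);
                byte[] packet = new byte[aes.IV.Length + ciphertext.Length];
                Buffer.BlockCopy(aes.IV, 0, packet, 0, aes.IV.Length);
                Buffer.BlockCopy(ciphertext, 0, packet, aes.IV.Length, ciphertext.Length);
                return packet;
            }
        }
    }

    public byte[] DecryptPacket(byte[] packet)
    {
        using (var aes = Aes.Create())
        {
            aes.Key = key;
            byte[] iv = new byte[aes.BlockSize / 8];
            Buffer.BlockCopy(packet, 0, iv, 0, iv.Length);
            aes.IV = iv;
            using (var dec = aes.CreateDecryptor())
                return dec.TransformFinalBlock(packet, iv.Length, packet.Length - iv.Length);
        }
    }

    static void Main()
    {
        byte[] key = new byte[32];
        RandomNumberGenerator.Create().GetBytes(key);
        var crypto = new PerPacketCrypto(key);

        byte[] p1 = crypto.EncryptPacket(Encoding.UTF8.GetBytes("hop update 1"));
        byte[] p2 = crypto.EncryptPacket(Encoding.UTF8.GetBytes("hop update 2"));

        // Even if p1 is lost in the air, p2 still decrypts on its own.
        Console.WriteLine(Encoding.UTF8.GetString(crypto.DecryptPacket(p2)));
    }
}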

Thursday, November 4, 2010

Networking research is hard

In class we’ve been working on learning the discrete event simulator OMNeT++. It is a really nice piece of software and sure makes it easy to simulate a vast number of scenarios. One of the things I find interesting about network research is how hard it is to prove that your idea or system is correct. I personally do research in internet security, and there are a number of techniques in that area for proving that a security protocol is correct or provides certain guarantees. Networking doesn’t have that luxury. For instance, routing is definitely something that could use some improvement. Sure, it’s easy to test in a small controlled environment, but the real test is with hundreds of thousands of nodes connected together in a complex topology. Early in the class we talked about ideas for a new internet architecture; how do you test something like that at real scale?

In a lot of ways I think simulators might be our saving grace in this regard. The more we can represent real world conditions in a simulator, the better the predictive outcome. This method is not without its downsides, but it might be the best we have available.

Tuesday, November 2, 2010

Network layer

In class lately we’ve been talking about the network layer. That layer encompasses lots of discussion points like addressing, multicast, and routing. I wish to discuss a few different things relating to the network layer.

There are lots of good ideas in the area of routing that we’ve read lately, ideas that are almost too obvious once you read them; you wonder why nobody thought of them from the beginning. The challenge then becomes implementation and deployment on the internet. Mostly, I think we are in band-aid mode. Take IPv4 address exhaustion, for example: NAT was invented to mitigate it, yet addresses are still running out. I don’t think that IPv6 is going to be the magic bullet that fixes all our problems. First, it seems that all major modern operating systems still default to some form of IPv4. Shouldn’t they be defaulting to pure IPv6 first to even remotely show some semblance of wanting the world to move to IPv6? Second, I wish more thought had been put into the actual addresses themselves. I know that the major issue IPv6 is trying to solve is the size of the address space, but the use cases predominantly lie in referring to resources by name, not by some number. It would be interesting to explore that space further to see if there could be a way to devise some kind of name-to-address hashing function; DHTs kind of already do this. Also, I know companies that will never move to IPv6 internally because the addresses are not memorizable. IPv4 has a huge advantage over IPv6 in that regard. Sure, DNS is there to deal with that, but honestly, we shouldn’t need DNS. All DNS really is is the world filling a real need that wasn’t met in the original design of the network layer.
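
To illustrate the name-to-address idea, here is a toy hashing function that maps a human-readable name onto a 128-bit, IPv6-sized identifier, much like a DHT maps keys onto its identifier space. It is purely illustrative; it says nothing about how you would route on such addresses or handle collisions.

using System;
using System.Net;
using System.Security.Cryptography;
using System.Text;

// A toy name-to-address hashing function: hash a human-readable name and use the
// first 128 bits of the digest as an IPv6-sized address.
class NameAddressing
{
    static IPAddress AddressFor(string name)
    {
        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(name.ToLowerInvariant()));
            byte[] addressBytes = new byte[16];          // 128 bits, like IPv6
            Array.Copy(digest, addressBytes, 16);
            return new IPAddress(addressBytes);
        }
    }

    static void Main()
    {
        // The same name always maps to the same address, with no DNS lookup involved.
        Console.WriteLine(AddressFor("printer.cs-department.example"));
        Console.WriteLine(AddressFor("printer.cs-department.example"));
        Console.WriteLine(AddressFor("fileserver.cs-department.example"));
    }
}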

In the routing world, BGP tables are growing large and cumbersome. There are lots of good ideas for new routing protocols, but when we tried to find out whether any of them have taken hold (by looking at the IETF’s website), we found that all anyone is really trying to do is change BGP. Not to complain, but if something isn’t working because of a flaw in the original design, patching only sweeps the problem under the rug; it doesn’t fix it.

Thursday, October 28, 2010

Cloud control

It seems that every so often, someone comes up with a buzzword to make the internet sound cooler and more hip. As of late, a lot of companies have been throwing around the word ‘cloud’. The term cloud doesn’t just refer to the internet; it refers to a way of life when it comes to hosting and managing your company’s software. The phrase ‘going to the cloud’ is thrown around by companies that have migrated their software offsite into a large distributed system (maybe with Amazon or Azure) as a means of increasing availability and decreasing onsite management costs. The important thing about the cloud is that everything is host agnostic. At any particular time you could be talking to servers on completely opposite ends of the country. With applications becoming more distributed in nature to meet availability demands, challenges arise in controlling these large environments. Cloud environments are usually metered, but how do you meter an application? An application can be on many machines at once, but if you want to throttle access to that application as a whole, how do you do that?

A paper presented at SIGCOMM ‘07 presents a system for doing distributed rate limiting. The authors took several approaches to limiting rate, such as global random drop, flow proportional share, and a global token bucket, and compared their results to a system that used a centralized token bucket. Each methodology had its strengths and weaknesses. In my opinion, flow proportional share was probably the best because it was able to “spread the wealth around” a little better. For example, when some nodes weren’t using all their allotted bandwidth, the surplus was assigned to nodes that could use it. So, generally, when an application is throttled at 1 Mbps globally, this protocol makes it look like you are accessing a single instance of the application at that rate. A really cool idea.
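
Here is a deliberately simplified sketch of the flow-proportional-share flavor of the idea (not the paper's actual protocol, which exchanges demand estimates between limiters): each site reports its recent demand, and the global limit is split in proportion to that demand, so capacity idle at one site flows to the busy ones.

using System;
using System.Linq;

// A simplified take on distributing one global rate limit across several limiters:
// divide the limit among sites in proportion to their recent demand.
class DistributedRateLimiter
{
    static double[] Allocate(double globalLimitMbps, double[] recentDemandMbps)
    {
        double totalDemand = recentDemandMbps.Sum();
        if (totalDemand <= globalLimitMbps)
            return recentDemandMbps;                    // everyone fits under the cap

        // Scale each limiter's share in proportion to its demand.
        return recentDemandMbps
            .Select(d => globalLimitMbps * d / totalDemand)
            .ToArray();
    }

    static void Main()
    {
        // Three replicas of one application, throttled to 1 Mbps globally.
        double[] demand = { 0.1, 0.7, 0.6 };            // Mbps observed at each site
        double[] shares = Allocate(1.0, demand);
        Console.WriteLine(string.Join(", ", shares.Select(s => s.ToString("0.00"))));
        // The lightly loaded site gets a small share; the busy sites split the rest.
    }
}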

Tuesday, October 26, 2010

IPv4 address exhaustion

One of the things that worries me about our current addressing state is that no progress seems to be being made in the US to switch over to IPv6. Everyone knows that IPv4 addresses are running very thin. Current projections (there are a lot more than this one) show that addresses, if allocated at their current rate, will be exhausted by 2011. That’s not that far off! Another thing that worries me is that when IPv4 gets put on the endangered species list, a few individuals are going to take the opportunity to rob the world blind by selling IP addresses at exorbitantly high prices, all because there is no other option for users to get a presence on the web. NAT is a band-aid, and it mostly works only for larger organizations. NAT doesn’t work for the mom and pop internet shops that go to hosting services and are required to buy a static IP so that they can use SSL and participate in the vast internet ecommerce community. The other issue I see is that when we really start scraping the bottom of the barrel, some consumers might be denied an IP address because there aren’t enough and the few that are left are reserved for those with deeper pockets.

It’s not like there is no solution to the address problem. The solution has been around for a very long time and is implemented in every major operating system in the world. The only real challenge is that the IPv6 network stack is not compatible with IPv4’s. IPv6 is turned on by default in most operating systems; the only place I see problems is the network itself. If there are still routers unable to run IPv6, then I think they need to be replaced regardless, because they are probably extremely old. I think it would be easier to deal with the pain of switching now rather than later.

Thursday, October 21, 2010

Net neutrality

Neutrality of the internet has been a debate raging for a very long time. I will provide a brief summary of what I think the pros and cons of net neutrality are. The internet has grown to a size where I believe regulation might become necessary to keep it working well in addition to maintaining its openness. One of the things people need to realize is that the internet costs money to run. It requires agreements between organizations for traffic to flow across networks. When companies and individual consumers sign up, they expect a certain quality of service, and their contracts may even make certain guarantees about that very thing. As such, ISPs may be forced to throttle certain popular internet applications (e.g. P2P) in order to maintain a certain QoS for their other customers. Net neutrality, in one sense, would prevent that kind of QoS management. Although P2P can be used for completely legal things (e.g. getting Linux distros and other large files; Blizzard uses it to distribute StarCraft 2), I would argue that most users use it to download copyrighted media illegally. Sad, but true. A P2P network is also designed to take advantage of all available network bandwidth, so it can easily affect other services on a network if left unchecked. In that instance, I may agree that a non-neutral approach to the internet would be beneficial for all users.

Let’s take it the other direction. I think it is completely wrong for ISPs to censor content and services that are completely legal. One of the arguments opponents of net neutrality present is that there is a limited amount of bandwidth. That may be true, but maybe instead of investing money in finding more ways to make money, invest it in ways to use the network more efficiently (network layer multicast would be great for video games, for instance). Another argument I find a little ridiculous is that certain services are “freeloading.” The internet is very much a “request for services” architecture. Skype doesn’t approach you; you as the user approach Skype and ask to use their service. As the user, I’m paying my ISP so that I can access all these services. The ISP shouldn’t care beyond receiving a check from me every month. Obviously, if I stopped paying, I would no longer be able to access Skype, and there would be no way they could approach me. Another argument against neutrality is the lack of incentives for ISPs to invest. What are we paying the CEOs for then? Are they so out of touch that they can’t come up with other ways to make money in a neutral environment? There are plenty of services that ISPs could offer and monetize in addition to offering access to the internet.

There are definitely pros and cons to complete net neutrality. I don’t think it is actually reachable, because money, profit margins and power drive this planet. At the same time, I don’t think we can leave the fate of the internet to the invisible hand, even though the alternatives probably aren’t much better. Like many other things in this country that affect the majority of people’s livelihoods, it will probably have to be kept in check by the government. Whether that is good or not remains to be seen.

Tuesday, October 19, 2010

Multicast

Network layer multicast is a really cool idea. It makes the network responsible for replicating data to the users who want to receive it. For example, an entity streaming a video would only have to broadcast one copy of it and the network would duplicate the packets out to those who want to watch it. Quite a few people have taken a stab at it, resulting in protocols such as DVMRP, CBT, and PIM. One of the biggest reasons for its failure is the lack of any substantial app that no one can live without. Having discussed network layer multicast in class, I personally believe that multicast was way ahead of its time and that its debut was extremely premature, as there was not an internet infrastructure back then like there is today. I think that if the inventors had waited 15 years or so (i.e. waited for YouTube, Netflix, Hulu, etc.), they would have had a greater chance of making a real impact on networking. That argument obviously hinges on the assumption that hardware vendors and service providers would be immediately willing to switch.

Today we use application layer multicast to sort of mimic what the network layer would do, though it’s not quite the same. We also distribute content via content distribution networks to help the scalability of data access. I often wonder how much we can shove into the application layer, and whether at some point all the overhead will catch up with us. Don’t get me wrong, we have been able to craft some incredible things in the application layer. However, there are some interesting ideas, like this one, that I think would benefit the internet and its users as a whole.

Saturday, October 16, 2010

HLP, why is change for the better so difficult?

Hindsight is often 20/20: we look at software and see immediately how things should have been done from the beginning. Such was my thought when we read about HLP (the hybrid link-state and path-vector protocol) this past week. The Border Gateway Protocol (BGP) is used to communicate paths between autonomous systems; it’s how routers learn how to get packets from one place to another. Some of the biggest problems with BGP are that it advertises changes that don’t need to be advertised and that it converges slowly, exploring many alternate paths in a way reminiscent of the classic count-to-infinity problem. This is the very protocol that is currently routing our packets on the internet.

HLP provides a new take on routing and struck me as something so intuitive that I couldn’t understand why it hadn’t been thought of from the beginning. As the name suggests, it is a combination of link-state and path-vector routing: link-state within a hierarchy, path-vector between hierarchies. What’s nice about HLP is that it uses a fragmented path vector, which basically means that node B will tell node A that it has a path to E; what A doesn’t need to know is that the path also goes through C and D. BGP, on the other hand, would let A know all those details even though it didn’t need them. The other nice thing is that if cost changes occur between C and D and packets start getting routed through F to get to E, A won’t get notified, because it is not important for A to know. This keeps routing tables much smaller, which is good. The paper shows incredible performance benefits over BGP, so why isn’t the IETF taking a good hard look at HLP? Not only that, but because the size of the internet is exploding, BGP tables are becoming extremely large. How long can that keep going?
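
A tiny example makes the information hiding clearer. This is not HLP's actual route format, just the flavor: a BGP-style export shows the neighbor every hop, while a fragmented export only says "via me, into that hierarchy," so internal changes inside the hierarchy never generate an update.

using System;
using System.Collections.Generic;

// A toy illustration of hiding intra-hierarchy path details when exporting a route.
class Route
{
    public string Destination;          // e.g., "E"
    public List<string> FullPath;       // e.g., B -> C -> D -> E
    public string DestinationHierarchy; // e.g., "hierarchy-2"
}

class FragmentedExporter
{
    // BGP-style export: the neighbor sees every hop, so any change ripples outward.
    public static List<string> ExportBgpStyle(Route r)
    {
        return r.FullPath;
    }

    // HLP-flavored export: the neighbor only learns "via me, into hierarchy-2".
    public static string ExportFragmented(Route r, string exporter)
    {
        return exporter + " -> " + r.DestinationHierarchy;
    }

    static void Main()
    {
        var route = new Route
        {
            Destination = "E",
            FullPath = new List<string> { "B", "C", "D", "E" },
            DestinationHierarchy = "hierarchy-2"
        };
        Console.WriteLine(string.Join(" -> ", ExportBgpStyle(route))); // B -> C -> D -> E
        Console.WriteLine(ExportFragmented(route, "B"));               // B -> hierarchy-2

        // If the path inside hierarchy-2 changes to B -> C -> F -> E, the fragmented
        // advertisement A sees is identical, so no update needs to be sent to A.
    }
}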

It seems to me like change is never considered until something catastrophic happens. I think an interesting research area in networking would be change deployment. At some point we’ve just got to force some changes, because if we wait until something bad happens, we’re going to wish we had changed it earlier anyway. Look at IPv4. The last statistic I heard was that if address usage continues as it is, we will be out of addresses by the first quarter of next year. That’s not a great thing to hear. Whether it happens or not is a different story. But the point is, there’s going to be panic if we reach the point where no one can allocate an address anymore. I understand and appreciate the hard problem change presents, but we can’t make 100% of the people happy 100% of the time. Let’s just start flipping the switch.

Thursday, October 14, 2010

Computer networks like social networks–part 2

In my previous blog post, I threw out the question: what if the internet were organized like a social network, what would it look like? I spent a little time throwing out some initial ideas that came to me. A commenter wanted to understand my vision of this concept a little better: do I see this relating to conventional wired networks or to the more free-form wireless mesh networks? So I thought I would pontificate a little more on the subject. Social networks are well connected networks. Laying wired infrastructure to mimic that on a per-user basis would be prohibitively expensive and not scalable. At the ISP level, we might already see some limited social network behavior, as paying customers might be considered an ISP’s “friends.” However, this is a unidirectional relationship, seeing as an ISP would never route internet traffic over its customers’ networks.

Wireless mesh networks or ad hoc networks, at a distance, seem to fit this model much better. In an ad hoc network, peers are connected to lots of other peers and each acts as sort of a smaller router. To get information from one peer to another, you rely on your other peers to route it appropriately. The relationships you have with your peers can change as you move. So, applying social networks to ad hoc networks, we can come up with some interesting things. If you find that you communicate with a friend of a friend more efficiently directly than through your mutual friend, you could connect directly, thus making the network more connected. Social networks also have an interesting reputation dynamic, since multiple people probably have multiple mutual friends. Reputation in a social network can have a positive or negative effect: it can earn you additional friend connections, or it can make it so that nobody wants to communicate with you. You could forward broadcasts to your friends if you yourself find them interesting. There are lots of ways in which a social-network-like structure could apply to ad hoc networks.

Saturday, October 9, 2010

Computer networks organized like social networks

I’m supposed to reflect twice a week on things related to networking. We just finished the transport layer, but I’ve got to say, I really don’t have a lot to say about it. Sure, we’ve read about some cool protocols like DCCP, but other than that, I’m looking forward to moving on to the next topic. Networks are interesting things. They connect computers and, ultimately, people together. As I’ve been thinking about things to reflect on, one weird question that came to me was: what if the internet were organized like a social network, what would it look like? How would you route information? Could you prioritize information? What kind of addressing would there be?

I imagine that a social computer network would be able to prioritize traffic very easily because you would give your “friends” higher priority. I can imagine security being an interesting thing in a social computer network. You could probably block certain kinds of content easily without firewalls. You could probably measure the goodness of your “friends” to make sure they are allowing you to cross their network as much as they cross yours. You would be able to easily multicast content because other networks that like your content would “share” it on their network. There are some interesting thoughts that come when applying social structures to networks.

Wednesday, October 6, 2010

DCCP

I've been in some interesting discussions this week about the transport layer of the network stack, specifically about transport protocols that depart from what might be considered convention. DCCP has been an interesting topic. It stands for Datagram Congestion Control Protocol. Normally when we think of datagrams we think of UDP, a great protocol for things where timeliness is more important than reliability. UDP is useful in applications such as games, VoIP, and streaming video; these applications only care about receiving the most up-to-date information quickly. Loss is usually acceptable (causing artifacts) as long as there is little delay. The biggest downside to UDP is that it can quickly overload a network because there is no congestion control. For this reason, a lot of network administrators block UDP from travelling outside their local area networks.

DCCP provides an interesting solution to this problem. Their protocol provides congestion control without reliability. One of the issues this automatically presents is making sure the latest information still arrives in a timely fashion. One of the approaches they took initially was to take TCP and rip out reliability to see if that worked. It didn’t; it turns out that TCP’s tightly coupled mechanisms break when some of the pieces are removed. One of the interesting things about DCCP is that congestion control is modular, meaning you can swap out congestion control algorithms (DCCP calls these CCIDs). I think DCCP is a really good idea. There is plenty of need for congestion-controlled unreliable protocols now that online video services and VoIP have taken off.
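
To show what "modular congestion control" buys you, here is a toy transport that never retransmits but always asks a pluggable controller before sending. This is only an illustration of the idea, not DCCP's actual CCID machinery; a rate-based controller for streaming media would implement the same interface and drop in unchanged.

using System;

// The transport asks a pluggable controller when it may send, so different congestion
// control algorithms can be swapped in without touching the rest of the protocol.
interface ICongestionController
{
    bool CanSend(int bytesInFlight);
    void OnAck(int ackedBytes);
    void OnLoss();
}

// One plug-in: a window-based controller, roughly TCP-like in spirit.
class WindowController : ICongestionController
{
    private int windowBytes = 4380;                                         // small initial window
    public bool CanSend(int bytesInFlight) { return bytesInFlight < windowBytes; }
    public void OnAck(int ackedBytes) { windowBytes += 1460; }              // grow additively
    public void OnLoss() { windowBytes = Math.Max(1460, windowBytes / 2); } // halve on loss
}

// The transport itself never retransmits; it only consults whichever controller it was given.
class UnreliableTransport
{
    private readonly ICongestionController controller;
    public UnreliableTransport(ICongestionController controller) { this.controller = controller; }

    public void TrySend(byte[] datagram, int bytesInFlight)
    {
        if (controller.CanSend(bytesInFlight))
            Console.WriteLine("sent " + datagram.Length + " bytes");
        else
            Console.WriteLine("held back: congestion controller said wait");
    }

    static void Main()
    {
        var transport = new UnreliableTransport(new WindowController());
        transport.TrySend(new byte[1200], bytesInFlight: 0);
    }
}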

Tuesday, September 28, 2010

TCP--does it need to change?

In the graduate networking class today, we were discussing two papers about TCP. One was Congestion Avoidance and Control (CAC) and the other was Simulation-based Comparisons of Tahoe, Reno, and SACK TCP (SCTRS). CAC was written by Van Jacobson, who devised TCP's congestion avoidance and control algorithms. I find the history of TCP very interesting because it is a pretty elegant solution to a very hard problem. The original Tahoe implementation has very deterministic behavior in any kind of situation. For instance, when packets are lost, you can expect it to act the same regardless of the number of packets lost; the same can't be said for TCP Reno. That really tells you how solid an algorithm it is. Modern improvements, such as SACK, seem to make it a little more efficient.

So, where can improvements be made? In class, we were discussing another TCP implementation called Vegas, which uses round trip time measurements to estimate congestion and adjust its sending rate before loss occurs, rather than reacting to loss alone. It seems to me like the internet’s congestion control algorithms work just fine. The only things causing congestion on the internet are P2P networks and spam, but those are application layer issues. I’ve never heard anyone complain about TCP. It provides reliability and congestion control that seem to be working. What more can we ask that layer to do? Maybe transport is where security should go, instead of the network or application layers.

Saturday, September 25, 2010

Online Gaming

Online gaming has become a very successful business. With millions of users daily, the server capacity required to host a system like that (e.g. World of Warcraft) is humongous. One of the interesting applications of P2P would be in MMOGs. P2P can already deliver content in a very speedy fashion, and a node in a P2P system can take advantage of all its available bandwidth. A paper was recently presented in class that proposed a system called Donnybrook. Basically, the authors took the Quake III source code and modified it to use P2P networking plus a couple of tricks to keep the game play smooth. Two things they did that I thought were interesting were interest sets and doppelgangers. Interest sets are a set of equations that decide which players, out of everyone in the game, you are most interested in receiving real-time updates about. These equations use player proximity, field of vision, and other aspects of the game to choose the top five players you are most likely interacting with. Interest sets change frequently, but they make it so you only have to receive real-time updates from a small number of players; you receive updates from the rest only about once per second. At first this raised a red flag in my mind: how do they smooth out the game play? It wouldn’t be acceptable to have players jerking around the map. The other thing they did addresses this issue: doppelgangers (i.e. bots). The game measures player behavior and predicts movement patterns, so during the time between the less frequent updates, bots move those players in the direction they probably would have taken. Pretty cool concept. The authors were able to achieve a P2P game of Quake III with 900 players using these techniques. That’s amazing!
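
Here is a toy version of the interest-set idea: score every other player by proximity and visibility, subscribe to frequent updates from only the top few, and let everyone else fall back to infrequent updates (with a doppelganger filling the gaps). The weights are invented; the real Donnybrook equations use more gameplay signals than this.

using System;
using System.Collections.Generic;
using System.Linq;

// Score other players by distance and field of view, then keep only the top few in
// the interest set that receives real-time updates.
class Player
{
    public string Name;
    public double X, Y;
    public bool InFieldOfView;
}

class InterestSet
{
    static double Score(Player me, Player other)
    {
        double dx = me.X - other.X, dy = me.Y - other.Y;
        double distance = Math.Sqrt(dx * dx + dy * dy);
        double score = 1.0 / (1.0 + distance);          // closer is more interesting
        if (other.InFieldOfView) score *= 3.0;          // visible players matter more
        return score;
    }

    static List<Player> Choose(Player me, IEnumerable<Player> others, int size)
    {
        return others.OrderByDescending(p => Score(me, p)).Take(size).ToList();
    }

    static void Main()
    {
        var me = new Player { Name = "me", X = 0, Y = 0 };
        var others = new[]
        {
            new Player { Name = "sniper", X = 80, Y = 5, InFieldOfView = true },
            new Player { Name = "teammate", X = 3, Y = 2, InFieldOfView = true },
            new Player { Name = "camper", X = 90, Y = 90, InFieldOfView = false },
        };
        // Everyone outside the interest set only sends updates about once per second,
        // and a doppelganger bot fills in their movement between updates.
        foreach (var p in Choose(me, others, size: 2))
            Console.WriteLine(p.Name);
    }
}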

I’ve previously mentioned that P2P systems cause ISPs much consternation. If games move to a P2P architecture, will ISPs be able to throttle the traffic like they do now? I ask because most people won’t contest an ISP’s decision to throttle P2P traffic when it is likely being used to download media illegally, but with games there is nothing illegal going on. There could be some interesting business deals that come out of such a gaming system. Another interesting thought would be to employ the peer selection techniques I’ve mentioned previously to make sure that inter-ISP traffic is minimized; those techniques would also improve the throughput of the game. There are lots of interesting applications for P2P systems, but politics always seem to get in the way (tongue in cheek, of course) of innovation.

Thursday, September 23, 2010

BitTorrent clients

Studies of torrent P2P networks are always interesting. I recently read two papers (TopBT: A Topology-Aware and Infrastructure-Independent BitTorrent Client and Taming the Torrent: A Practical Approach to Reducing Cross-ISP Traffic in Peer-to-Peer Systems) about improving the efficiency of torrent P2P networks. One of the biggest problems with torrent systems is that, generally, the only metric used to select peers is download speed. To a P2P user this seems good because it ensures that you are getting your content in the fastest possible manner. However, for ISPs this can be bad because traffic that travels onto other ISPs' networks costs money. Companies like Comcast have throttled torrent traffic to minimize its effects on their networks. P2P systems are a wonderful way of sharing information (legal information, wink), and these studies are geared towards finding ways in which P2P systems like BitTorrent can be used while minimizing the cost to ISPs.

Taming the Torrent proposes a system called Ono that uses content distribution networks (CDNs) to calculate proximity values for peers in a network. The idea is that if two peers resolve to the same CDN node, they must be close to each other and possibly within the same ISP autonomous system (AS). This system relies heavily on CDNs being willing to provide such a service for free, but I think that assumption is okay considering companies like Google provide all kinds of cloud services for free around the world. By using CDNs as a reference point, Ono is able to reduce cross-ISP traffic, selecting peers within the same AS about a third of the time. On average, it is also able to increase download rates by 207%. Pretty impressive!
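
The trick is easy to sketch. Each peer periodically resolves a CDN hostname and records how often it is redirected to each edge cluster; two peers whose redirection maps look alike are probably near each other, so prefer them. This mirrors the spirit of Ono rather than its exact implementation, and the cluster names and ratios below are made up.

using System;
using System.Collections.Generic;
using System.Linq;

// Compare two peers' CDN "redirection maps" (fraction of lookups landing on each edge
// cluster); a high similarity suggests the peers are close, possibly in the same AS.
class CdnProximity
{
    static double Similarity(Dictionary<string, double> a, Dictionary<string, double> b)
    {
        double dot = a.Keys.Intersect(b.Keys).Sum(k => a[k] * b[k]);
        double normA = Math.Sqrt(a.Values.Sum(v => v * v));
        double normB = Math.Sqrt(b.Values.Sum(v => v * v));
        return (normA > 0 && normB > 0) ? dot / (normA * normB) : 0;
    }

    static void Main()
    {
        var me = new Dictionary<string, double> { { "edge-slc", 0.8 }, { "edge-den", 0.2 } };
        var near = new Dictionary<string, double> { { "edge-slc", 0.7 }, { "edge-den", 0.3 } };
        var far = new Dictionary<string, double> { { "edge-ams", 0.9 }, { "edge-fra", 0.1 } };

        Console.WriteLine(Similarity(me, near)); // high: likely the same region/ISP, prefer this peer
        Console.WriteLine(Similarity(me, far));  // near zero: probably a costly cross-ISP hop
    }
}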

TopBT's results aren't quite as impressive, but it doesn't rely on a third-party service either, which is good. TopBT uses various tools to measure the network between potential peers. Along with measuring download rate, it uses ping and traceroute to measure the link and to gain important information about the autonomous systems the respective peers are in. The challenge with this solution is that many routers are configured to block ping and traceroute, so it can be difficult to figure out the proximity of a peer. Despite these roadblocks, TopBT is able to reduce inter-ISP traffic by about 25% and improve download times by about 15%.

One of the interesting things both of these papers did was release their software for the world to use in order to acquire data. I mean, who wouldn't want a torrent client that increases download rates by 207%? It just rubs me a little funny that the data they got was probably the result of thousands of people obtaining media illegally. Just an interesting thought in conclusion.

Saturday, September 18, 2010

Peer-to-peer

Peer-to-peer systems are unique in that everything about them is truly distributed. One of the interesting things about peer-to-peer systems is that they scale instantly: the more peers that join a system, the more capacity is available. As a result, a peer-to-peer system can quickly absorb a burst of demand and cool down rapidly, whereas a client/server architecture can take a while to work through the same load. P2P networks are a perfect medium for sharing files because many machines can share the burden of getting a file to a user. This enables a user to truly leverage all their available bandwidth, because they are receiving data from multiple nodes.

Distributed architectures are becoming crucial to the success of large companies on the web. Content distribution networks are used to spread out content and data so that users might be able to request it from a source closer to their location. It would be an interesting study to look at putting P2P features into a web browser so that static content could be requested from your closest neighbor. It would also be interesting to look at a P2P type active network for pushing dynamic content out onto the network.

Thursday, September 16, 2010

Packet Dynamics

I just read a paper on End-to-End Internet Packet Dynamics, which was kind of interesting as it showed data about changes in packet dynamics from December 1994 to December 1995. The paper details an experiment the authors performed that measured TCP bulk transfers between 35 sites running special measurement daemons. One of the interesting data points was that during the first experiment, out-of-order packet delivery was quite prevalent. That is interesting because reordering in TCP can cause a lot of packet retransmission, which would have been expensive considering the internet was very small back then, with considerably lower bandwidth than we have today. There are times, however, when a packet is honestly lost and retransmission is necessary. The authors found that during the first experiment, the ratio of good retransmissions to bad ones was 22. In the second experiment (a year later), they increased the window size and the ratio increased to 300, which is much better.

One of the other parts of this paper that I found interesting was the treatment of packet loss. One of the initial data points they gave was that packet loss increased between the first and second experiments. One of the measurements they took was the rate of ack loss. The data shows that acks flowing into the US and acks flowing into Europe were lost at noticeably different rates, and those roles switched in the subsequent experiment.

I think the current internet infrastructure is fairly stable today and packet loss is generally low on a good connection. One of the interesting complaints I hear from a lot of people is that the internet is so much faster in other countries compared to what is available in the US. Obviously other countries don't have infrastructure in place on the scale the US does, but it would be interesting to study the dynamics of those smaller systems to learn something from how those countries decided to build their networks. I think one of the great advantages for countries that are relatively new to building up internet connectivity is that they can learn from the mistakes of countries like the US. This would enable their networks to be faster, in a sense, because they wouldn't have to build around and work around past mistakes.

Monday, September 13, 2010

A future internet

The internet has become a crucial part of everyday life. Its creation has spurred other innovations that have worked themselves into what is now considered the norm. The internet has enabled collaboration on a scale never known before its existence. Going forward, I believe it is not only important but crucial to have an internet architecture that can easily evolve with the latest standards and the increasing demands put upon it. I wish to give my input on what I think needs to be the focus of a new internet architecture.

First, security. From the internet's conception, security was never a design goal because the architects never envisioned an internet of the magnitude it has grown to today. With eCommerce, online gaming, and social networking, among other things, exploding on the web, malicious users don't even have to leave their homes to steal someone's identity. Security is a must.

Second, protocol flexibility. What I mean by that is that network entities should not have to be at the mercy of hardware vendors or large commercial organizations in order to try out and/or implement new protocols. If we look in the software space, the concept of open standards has tended to push out proprietary commercial protocols because users were able to freely try them out.

Third, addressing should be name based. These names should be logically constructed and rememberable. I think the postal system has a fairly nice way of assigning addresses to locations. Although some street names might be a little eccentric, generally the naming convention is logical and rememberable. We shouldn't need systems like DNS to resolve a name to a number; we should just be able to give a name and know the location of the resource immediately.

Connecting the whole world in an efficient fashion is not an easy problem to solve. The current internet architecture has done extraordinarily well and I think there have been some great learning experiences. But, now that more than just scientists use it, I think we need to attack a new architecture from a "customer needs" perspective.

Wednesday, September 8, 2010

Active Internet Architecture

I find the concept of an active internet architecture very interesting. I just read a SIGCOMM paper called Towards an Active Internet Architecture. The paper outlines an architecture where the computing power of the network is used to route packets through the network. To leverage that computing power, the paper proposes that the network switch capsules instead of packets. A capsule is comprised of a custom user program, which is executed at every hop, along with the other information that is normally included in a packet. These custom programs are advantageous in that, when executed at a hop, they could customize the data for the next hop, specify where to hop next, and do a myriad of other things. The cool thing this buys is the ability for a network to evolve on its own. That means network administrators could try out new standards without waiting for hardware vendors to decide to implement them; the standard would just be encoded into the capsule.
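
Here is a toy rendering of the capsule idea, just to make it concrete. The per-hop program is modeled as a delegate; a real active network would ship portable, sandboxed code and would have to worry a great deal about safety and resource limits, none of which is addressed here.

using System;
using System.Collections.Generic;

// A capsule carries its payload plus a little program the node runs at every hop.
class Capsule
{
    public string Destination;
    public byte[] Payload;
    public int TimeToLive = 16;
    // Given this node's name and routing table, return the next hop (or null to drop).
    public Func<string, Dictionary<string, string>, string> PerHopProgram;
}

class ActiveNode
{
    public string Name;
    public Dictionary<string, string> RoutingTable = new Dictionary<string, string>();

    public void Receive(Capsule capsule)
    {
        if (--capsule.TimeToLive <= 0) return;                       // crude resource limit
        string nextHop = capsule.PerHopProgram(Name, RoutingTable);  // run the capsule's code
        Console.WriteLine(Name + " forwards capsule for " + capsule.Destination + " to " + nextHop);
    }

    static void Main()
    {
        var node = new ActiveNode
        {
            Name = "r1",
            RoutingTable = new Dictionary<string, string> { { "10.0.2.0/24", "r2" }, { "default", "r3" } }
        };

        var capsule = new Capsule
        {
            Destination = "10.0.2.42",
            Payload = new byte[] { 1, 2, 3 },
            // A "new standard" deployed without touching the routers: prefer the table
            // entry for the destination's prefix, otherwise fall back to the default route.
            PerHopProgram = (nodeName, table) =>
                table.ContainsKey("10.0.2.0/24") ? table["10.0.2.0/24"] : table["default"]
        };
        node.Receive(capsule);
    }
}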

One of the biggest challenges of releasing new internet standards is adoption. The internet has grown to such a size that pushing out new standards simultaneously is infeasible. I think in the long run an architecture such as this would actually quicken the pace of standards adoption. For instance, smaller ISPs could deploy new standards on their networks without affecting outside networks and ISPs. As soon as enough contiguous smaller ISPs had a standard implemented, the larger ISP from whom they purchase bandwidth could deploy it as well and connect them. In this way, standards could grow incrementally.

Sunday, September 5, 2010

Design Philosophy of the Internet

As many of you know, the internet started as a government research project, growing out of the ARPANET, whose primary goal was to allow independent networks to communicate with each other. There were also several secondary goals, which are laid out in The Design Philosophy of the DARPA Internet Protocols. I wish to discuss a few things I found interesting in this paper. The secondary goals are as follows:

  1. Internet communication must continue despite loss of networks or gateways
  2. The Internet must support multiple types of communication services
  3. The Internet architecture must accommodate a variety of networks
  4. The Internet architecture must permit distributed management of its resources
  5. The Internet architecture must be cost effective
  6. The Internet architecture must permit host attachment with low level of effort
  7. The resources used in the Internet architecture must be accountable


Considering this system was originally developed for military use, I find 1 & 2 very obvious and necessary goals. What is surprising to me, though, is that security is not in that list. Many government agencies have very strict security policies. For example, Agilent Technologies develops test and measurement equipment, and one of their customers is the NSA. If something goes wrong with the equipment, the engineer at the NSA is not allowed to copy and paste error messages or take screenshots of them. He must write them down by hand and email them to the support people at Agilent. Talk about paranoia, but you see the point. Either the government wasn't worried about the network communications being intercepted (which would surprise me), or they figured this network would be so small and somewhat protected that it would be impossible to intercept. I often wonder what the internet protocols would look like if security had been a design goal from the beginning. Security in any system can incur large amounts of overhead, which is why, I think, it usually comes last. Today we have IPsec and multiple secure application layer protocols, but those were afterthoughts.

Thursday, September 2, 2010

Future internet impressions

I just finished reading a conference paper entitled A Data-Oriented (and Beyond) Network Architecture. It was very interesting and I wish to share some thoughts. This paper takes a "clean-slate" look at internet naming and addressing. I find it an interesting read in light of IPv6, the supposed answer to all our addressing problems, not taking hold. So what is the problem with what is currently being used? This paper explains that the internet's naming and addressing scheme is centered around getting a user connected to a particular machine in order to request content and services. Interestingly, when people use the internet, what machine or server they are connected to is not what's important; it's the content and/or services that machine provides. Users couldn't care less whether they are connected to a server in Texas or a server in India, as long as they get the CNN.com content they requested. This paper changes addressing in such a way that an address or name no longer refers to "where"; it refers to "what."

So this new naming system is supposed to make it easier for users to get at content and services, but I think it might run into issues when it comes to usability. One of the great weaknesses of this addressing scheme, and of IPv6 as well, in my opinion, is that the addresses are not user friendly. With IPv6 it is all wonderful and great that the address space is practically limitless, but I know many IT organizations that will never move to it because they can't memorize the addresses. The same is true for this paper: the address is composed of a cryptographic hash and a label. When was the last time you memorized a hash? Don't get me wrong, I think there is great value in referencing content by name, but only if the name it is given makes sense.
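
To show what I mean about memorizability, here is a sketch of roughly the kind of flat, self-certifying name the paper describes: hash a principal's public key and append a label the principal chose. The key and label here are placeholders, and this glosses over everything else in the design; the point is just what the resulting name looks like to a human.

using System;
using System.Security.Cryptography;
using System.Text;

// Build a flat name of the form hash(principal's public key) : label.
class FlatName
{
    static string NameFor(byte[] principalPublicKey, string label)
    {
        using (var sha = SHA256.Create())
        {
            byte[] principal = sha.ComputeHash(principalPublicKey);
            return BitConverter.ToString(principal).Replace("-", "").ToLowerInvariant()
                   + ":" + label;
        }
    }

    static void Main()
    {
        using (var rsa = new RSACryptoServiceProvider(2048))
        {
            byte[] publicKey = rsa.ExportCspBlob(false);   // public portion only
            // Something like 3c2a...9f41:frontpage -- try typing that into an address bar.
            Console.WriteLine(NameFor(publicKey, "frontpage"));
        }
    }
}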

Wednesday, April 28, 2010

ASP.NET RadioButton “patch”

I previously blogged about a problem regarding radio buttons in ASP.NET. I had hoped that .NET 4 would fix this problem, but alas it did not. It turns out that even though you can control client IDs in ASP.NET 4, you still can’t really control group names. So, for a project I’m working on, I decided to fix the problem myself. Here is the code for what I did.


using System;
using System.Collections.Generic;
using System.Collections.Specialized;
using System.Linq;
using System.Reflection;
using System.Text;
using System.Web.UI.WebControls;

namespace YourNamespace.Web.UI.WebControls
{
    public class RadioButton : System.Web.UI.WebControls.RadioButton
    {
        #region Properties

        public override string GroupName
        {
            get
            {
                return base.GroupName;
            }
            set
            {
                // Setting a private member in the base class that will make the output of the control correct
                FieldInfo uniqueGroupName = this.GetType().BaseType.GetField("_uniqueGroupName",
                    BindingFlags.Instance | BindingFlags.NonPublic);
                uniqueGroupName.SetValue(this, value);
                base.GroupName = value;
            }
        }

        #endregion
    }
}

I used .NET Reflector to figure out what is going on in the real RadioButton class. This fix allows complete control over a radio button’s group name, so you can start using it in a repeater and other nice controls. Upon seeing this work, I decided to augment the RadioButtonList control to allow a group name to be specified there as well.


using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Text;
using System.Web.UI.WebControls;

namespace YourNamespace.Web.UI.WebControls
{
    public class RadioButtonList : System.Web.UI.WebControls.RadioButtonList
    {
        #region Properties

        /// <summary>
        /// The specific group name that each radio button will belong to
        /// </summary>
        public string GroupName
        {
            get
            {
                string str = (string)this.ViewState["RadioButtonListGroupName"];
                if (str != null)
                {
                    return str;
                }
                return string.Empty;
            }
            set
            {
                // Override the repeating control with the new information
                FieldInfo repeatingControlField = this.GetType().BaseType.GetField("_controlToRepeat",
                    BindingFlags.Instance | BindingFlags.NonPublic);

                System.Web.UI.WebControls.RadioButton oldRepeatingControl = repeatingControlField.GetValue(this) as System.Web.UI.WebControls.RadioButton;
                if (oldRepeatingControl != null)
                {
                    this.Controls.Remove(oldRepeatingControl);
                }

                RadioButton newRepeatingControl = new RadioButton();
                newRepeatingControl.EnableViewState = false;
                newRepeatingControl.GroupName = value;
                this.Controls.Add(newRepeatingControl);
                newRepeatingControl.AutoPostBack = this.AutoPostBack;
                newRepeatingControl.CausesValidation = this.CausesValidation;
                newRepeatingControl.ValidationGroup = this.ValidationGroup;

                repeatingControlField.SetValue(this, newRepeatingControl);
                this.ViewState["RadioButtonListGroupName"] = value;
            }
        }

        #endregion
    }
}

This uses the RadioButton class I showed previously. What’s nice about .NET is that if you want to use these controls right away without changing your existing code, you can use the tag mapping feature in the web.config, which remaps the existing tags to your new types in a different assembly.
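
For reference, the mapping looks something like the following (the namespace follows the samples above, and the assembly name is a placeholder you would replace with your own):

<!-- Swap the built-in radio button controls for the derived ones via web.config -->
<system.web>
  <pages>
    <tagMapping>
      <add tagType="System.Web.UI.WebControls.RadioButton"
           mappedTagType="YourNamespace.Web.UI.WebControls.RadioButton, YourAssembly" />
      <add tagType="System.Web.UI.WebControls.RadioButtonList"
           mappedTagType="YourNamespace.Web.UI.WebControls.RadioButtonList, YourAssembly" />
    </tagMapping>
  </pages>
</system.web>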

Thursday, January 14, 2010

ASP.NET Radio Button

I’ve been asked a lot lately about radio buttons and how to use them. Normally I suggest using the RadioButtonList control built right into ASP.NET. It lets you choose from a couple of different layouts and repeat directions.

<asp:RadioButtonList ID="rblButtonList" runat="server" RepeatColumns="3"></asp:RadioButtonList>
<asp:Button ID="btnSubmit" runat="server" Text="Submit" OnClick="btnSubmit_Click" />
<asp:Label ID="lblMessage" runat="server"></asp:Label>

protected void Page_PreRender(object sender, EventArgs e)
{
    // Generate some random data.
    List<ListItem> products = new List<ListItem>();
    // This simulates a list of products that can be selected in the RadioButtonList
    int numOfItemsToCreate = 20;
    for (int i = 0; i < numOfItemsToCreate; i++)
    {
        ListItem item = new ListItem()
        {
            Text = String.Format("Item {0}", i + 1), // This will normally be the text that is shown to the user (i.e. product name)
            Value = i.ToString() // A unique id identifying the item (i.e. tag)
        };
        products.Add(item);
    }
    // Bind the list to the RadioButtonList
    rblButtonList.DataSource = products;
    rblButtonList.DataBind();
}

/// <summary>
/// Handles the button click event
/// </summary>
/// <param name="sender"></param>
/// <param name="e"></param>
protected void btnSubmit_Click(object sender, EventArgs e)
{
    try
    {
        lblMessage.Text = String.Format("Selected {0}", rblButtonList.SelectedItem.Text);
    }
    catch { } // Nothing selected; leave the label empty
}

For simplicity’s sake, I suggest this control, but what if you want to lay out a group of radio buttons some other way and display them using a repeater? This presents a problem. As most of you have already figured out, when you put an ASP.NET RadioButton in a repeater you lose your grouping, even if you set the GroupName. In my opinion, this was a huge oversight on Microsoft’s part, but something I hope will be addressed in ASP.NET 4.0 since there will be more control over client IDs. Here’s an example, using a repeater, of how I worked around the issue.

<h2>
    Repeated Radio button</h2>
<asp:ScriptManager ID="scriptManager" runat="server"></asp:ScriptManager>
<asp:UpdatePanel ID="upUpdatePanel" runat="server">
    <ContentTemplate>
        <asp:Repeater ID="rptButtonRepeater" runat="server">
            <HeaderTemplate>
                <table>
            </HeaderTemplate>
            <ItemTemplate>
                <tr>
                    <td>
                        <%# Container.DataItem %>
                    </td>
                    <td>
                        <input type="radio" name="choice" value='<%# Container.DataItem %>' />
                    </td>
                </tr>
            </ItemTemplate>
            <AlternatingItemTemplate>
                <tr>
                    <td style="background-color: Silver">
                        <%# Container.DataItem %>
                    </td>
                    <td style="background-color: Silver">
                        <input type="radio" name="choice" value='<%# Container.DataItem %>' />
                    </td>
                </tr>
            </AlternatingItemTemplate>
            <FooterTemplate>
                </table>
            </FooterTemplate>
        </asp:Repeater>
        <asp:Button ID="btnRepeaterButton" runat="server" Text="Repeater Submit" OnClick="btnRepeaterButton_Click" />
        <asp:Label ID="lblRptMessage" runat="server"></asp:Label>
    </ContentTemplate>
</asp:UpdatePanel>
protected void Page_PreRender(object sender, EventArgs e)
{
    // Generate some random data.
    List<ListItem> products = new List<ListItem>();
    // This simulates a list of products that can be selected in the repeater
    int numOfItemsToCreate = 20;
    for (int i = 0; i < numOfItemsToCreate; i++)
    {
        ListItem item = new ListItem()
        {
            Text = String.Format("Item {0}", i + 1), // This will normally be the text that is shown to the user (i.e. product name)
            Value = i.ToString() // A unique id identifying the item (i.e. tag)
        };
        products.Add(item);
    }
    // Bind the list to the repeater
    rptButtonRepeater.DataSource = products;
    rptButtonRepeater.DataBind();
}

protected void btnRepeaterButton_Click(object sender, EventArgs e)
{
    lblRptMessage.Text = String.Format("Selected {0}", Request.Form["choice"]);
}

As you can see, you have to go back to grass roots on this one. Here are some other articles on the subject that you might find interesting: