Thursday, December 9, 2010

Intrusion detection

I have an aversion to intrusion detection systems. Not because they aren’t a useful tool, but because they usually get in the way more than they help. I speak mostly of application-layer intrusion detection systems that attempt to weed out URLs and sites that might contain malicious content. BYU’s new Cisco IDS is a resounding example of one that gets in the way. I used to have a job as a developer in one of the colleges on BYU campus. There were a few applications, written or purchased in the past, that were loaded with SQL injection and XSS vulnerabilities and that we had to support. Not a big deal; we just kept them isolated on their own VMs running in an environment with very limited privileges. We eventually installed mod_security, an Apache module that acts as an IDS and is supposed to stop SQL injection and other malicious attacks by analyzing the text of incoming and outgoing requests. My biggest problem with it was that it created more problems than it fixed. We tried putting it in front of a content management system we had, but with so much free-form text being submitted, it was killing sessions left and right. I eventually went in and turned it off, to the dismay of those who set it up. I think the idea was good, but relying on it to correct bad software design was not.
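To illustrate why that kind of rule-based filtering misbehaves on free-form text, here is a toy sketch in Python. The patterns are simplified examples of my own invention, not actual mod_security rules:

import re

# Toy signatures in the spirit of a rule-based IDS (invented for
# illustration; not real mod_security rules).
SUSPICIOUS_PATTERNS = [
    re.compile(r"\bUNION\b.+\bSELECT\b", re.IGNORECASE),
    re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE),
    re.compile(r"<script\b", re.IGNORECASE),
]

def looks_malicious(request_text):
    # Flag the request if any naive signature matches its text.
    return any(p.search(request_text) for p in SUSPICIOUS_PATTERNS)

# A real injection attempt is caught...
print(looks_malicious("id=1 UNION SELECT password FROM users"))  # True

# ...but so is an innocent CMS article that merely talks about SQL,
# which is exactly the kind of free-form text that killed our sessions.
print(looks_malicious("Use UNION and then SELECT the columns you need."))  # True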

My point in telling this story is that IDSs are hard to get right. They don’t make a good substitute for fixing bad software, but I think they are useful for weeding out the viciously malicious. If an IDS could be trained to know what good traffic looks like versus bad, I think it could be genuinely beneficial. That also means it would have to be trained specifically for each application it protects. It would be an interesting area of research.
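As a rough sketch of what “training per application” might look like, here is a toy anomaly detector in Python. It learns a character-bigram model from requests known to be good for one specific application, then scores how unusual a new request looks. The training data and scoring scheme are my own invention:

import math
from collections import Counter

def bigrams(text):
    return [text[i:i+2] for i in range(len(text) - 1)]

def train(good_requests):
    # Learn bigram frequencies from traffic known to be good.
    counts = Counter()
    for req in good_requests:
        counts.update(bigrams(req))
    total = sum(counts.values())
    return {bg: c / total for bg, c in counts.items()}

def anomaly_score(model, request, floor=1e-6):
    # Average negative log-probability; higher means more unusual.
    grams = bigrams(request)
    return sum(-math.log(model.get(bg, floor)) for bg in grams) / len(grams)

model = train(["GET /articles/42", "GET /articles/7", "POST /login"])
print(anomaly_score(model, "GET /articles/99"))      # familiar, low score
print(anomaly_score(model, "GET /a?q=' OR 1=1 --"))  # unfamiliar, high score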

Tuesday, December 7, 2010

Spam and Machine Learning

A year or so ago I took a machine learning class. Going into the class I thought we were going to learn how to build AIs. It turns out that wasn’t the focus, but I did learn some other valuable things. Machine learning, at least as best as I can describe it, is a field of study in which algorithms improve based on empirical data. It is very interesting to watch that happen. It is actively applied in many different fields both inside and outside of computer science. Meteorology is a perfect example: tons of data is gathered and used to help algorithms evolve to model future weather patterns. Machine learning can play a crucial role in spam filtering systems as well. Since spammers tend to get pretty creative in getting around the roadblocks set before them, filtering software needs to evolve right along with them. Enter machine learning.
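The classic toy example here is a naive Bayes spam filter: feed it labeled mail and its behavior changes with the data, no rule rewriting required. A minimal sketch in Python, with invented training data:

import math
from collections import Counter

def train(messages):
    # messages is a list of (text, is_spam) pairs.
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in messages:
        for word in text.lower().split():
            counts[is_spam][word] += 1
            totals[is_spam] += 1
    return counts, totals

def spam_probability(model, text):
    counts, totals = model
    vocab = len(set(counts[True]) | set(counts[False]))
    log_odds = 0.0
    for word in text.lower().split():
        # Laplace smoothing keeps unseen words from zeroing things out.
        p_spam = (counts[True][word] + 1) / (totals[True] + vocab)
        p_ham = (counts[False][word] + 1) / (totals[False] + vocab)
        log_odds += math.log(p_spam / p_ham)
    return 1 / (1 + math.exp(-log_odds))

model = train([("cheap pills now", True), ("meeting notes attached", False)])
print(spam_probability(model, "cheap pills"))    # leans toward spam
print(spam_probability(model, "meeting notes")) # leans toward ham

Retraining on newly labeled mail is all it takes for a filter like this to adapt as spammers change tactics.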

A paper I recently read outlines an application of machine learning to spam filtering. The authors claim that most blacklists fail to keep pace with spammers for two reasons: they filter on identifiers that are assumed to be persistent (e.g. IP addresses) but are easy for spammers to change, and they compartmentalize email-sending behavior to a single domain rather than analyzing behavior across domains. The paper proposes a behavioral blacklisting approach instead. They introduce a new system called SpamTracker which uses clustering, based on a principal components analysis, together with classification to flag spam from senders that have not yet been blacklisted. Their system does okay. It is meant more to supplement existing systems than to replace them, but at least it closes the gap a little more. As with all machine learning, the work is in tuning parameters, getting better data to train against, and deciding which features to “learn” from.
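Here is my own simplified reading of the idea in Python, not the authors’ code. Each sending IP becomes a vector of the fraction of its mail sent to each recipient domain; PCA reduces the vectors, clustering groups them, and a fresh, never-blacklisted IP can be compared against clusters of known spammers. All the data below is invented:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Rows are sending IPs; columns are fractions of mail sent to each of
# three recipient domains. Spammers tend to spray domains evenly.
rows = np.array([
    [0.33, 0.33, 0.34],   # known spammer
    [0.30, 0.35, 0.35],   # known spammer
    [0.90, 0.05, 0.05],   # legitimate sender, mails mostly one domain
    [0.85, 0.10, 0.05],   # legitimate sender
])

pca = PCA(n_components=2).fit(rows)
km = KMeans(n_clusters=2, n_init=10).fit(pca.transform(rows))

# A fresh IP with no blacklist history but a spammer-like sending
# pattern should land in the same cluster as the known spammers.
new_sender = pca.transform([[0.32, 0.34, 0.34]])
print(km.predict(new_sender))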

Thursday, December 2, 2010

Network Security

We are embarking on our last network research area: network security. I just want to spend a little bit of time talking about some thoughts I have concerning network security. My first thoughts when thinking about network security are firewalls, IPsec, DNSSEC, packet shapers, filters, etc. These are all important tools that help keep a network safe from attackers and malicious insiders. There is another area of network security that seems to have become its own separate research area (while still quietly remaining a subarea of network security) called internet security. In my opinion, internet security deals primarily with application-layer tools like intrusion detection systems, spam filters, and secure communication protocols (HTTPS, S/MIME, PKI, Diffie-Hellman, WS-Trust, etc.). All of these systems and protocols operate at the application layer. It seems like a lot of effort goes into protecting the application layer, and for good reason, but is there not something more we can do in the lower layers to protect the network better?
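As a small aside, here is what one of those application-layer building blocks, Diffie-Hellman, looks like in Python with toy numbers. Real deployments use large primes or elliptic curves; these values are insecure and purely for illustration:

# Public parameters: a prime modulus and a generator (toy values).
p, g = 23, 5

a = 6                       # Alice's secret exponent
b = 15                      # Bob's secret exponent

A = pow(g, a, p)            # Alice sends g^a mod p = 8
B = pow(g, b, p)            # Bob sends g^b mod p = 19

alice_key = pow(B, a, p)    # (g^b)^a mod p
bob_key = pow(A, b, p)      # (g^a)^b mod p
assert alice_key == bob_key
print(alice_key)            # shared secret: 2, never sent on the wire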

An interesting research topic would be augmenting BGP with some kind of claims-verification system to prevent malicious networks from hijacking traffic. My idea stems from the incident in which, due to a BGP misconfiguration in China, a large chunk of traffic destined for US networks was routed through China for a brief period of time. It was as simple as China advertising a better route than everyone else. Obviously, routing traffic meant for US networks through China and back to the US is not a better route. But it raises the question: is there not a way to prevent such a claim in the first place?
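To make the idea concrete, here is a naive sketch in Python of what such a claims check could look like, in the spirit of proposals like S-BGP’s route-origin authorization: a trusted registry maps each prefix to the autonomous systems allowed to originate it, and a router rejects announcements that don’t match. The registry contents and AS numbers below are invented:

import ipaddress

# A trusted registry of which AS may originate which prefix.
AUTHORIZED_ORIGINS = {
    ipaddress.ip_network("203.0.113.0/24"): {64500},
}

def validate_announcement(prefix, origin_as):
    allowed = AUTHORIZED_ORIGINS.get(ipaddress.ip_network(prefix))
    if allowed is None:
        return False  # unknown prefix; rejecting it is a policy choice
    return origin_as in allowed

print(validate_announcement("203.0.113.0/24", 64500))  # True: the rightful origin
print(validate_announcement("203.0.113.0/24", 64501))  # False: a hijack attempt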