ETech 2007 Day 2 p.m. sessions

Don MacAskill (SmugMug) -- Set Amazon's Servers on Fire, Not Yours

SmugMug: 140MM photos, no debt, profitable since the first year. 192TB stored at S3, doubling yearly.

  • S3: Simple Storage Service. $0.15/GB/month, replication included. REST API. Fast: not 15K RPM SCSI fast, but internet fast.

Why use them?

  • not a lot of web-scale expertise on planet Earth
  • reputation for systems
  • [he] once competed with Amazon (Fatbrain)
  • They eat their own dogfood. Dozens of products.
  • Focus on the app, not the muck.

Show me the money!

  • Guesstimate: ~$500K saved per year
  • Actual:
    • Growth: 64MM photos -> 140MM photos
    • Disks would have cost $40K -> $100K/month
    • $922K would have been spent
    • $230K spent instead
    • $692K in cold, hard savings
  • Nasty taxes (on capital goods)! $295K 'saved' in cash flow. Bonus!
  • Reselling disks to recoup sunk costs
  • this is a partial cost of ownership number

Sweet spots

  • perfect for startups & small companies
  • ideal for "store lots, serve little" businesses of all sizes
  • not so great (yet) for serving lots if you're a medium or large business: transfer costs are high compared with what you pay when you can buy bandwidth in 1Gbps+ chunks.
  • We're a "store lots, serve lots" company. What to do?

Like SmugFS

  • Architecture remarkably similar to internal Smug filesystem
  • Similar to lots of startups
  • It's stupid that we're all building the same thing
  • Easy to drop in
  • Started on Monday, live in production on Friday

S3 evolution

  • started just doing secondary storage. Too cold!
  • Tried out as Primary. Too hot!
  • Finally hot & cold model == just right!
  • Amazon gets 100% of the data
  • Smugmug keeps hot data local (about 10%)
  • 95% reduction in # of disks bought

Sample Request

  • Check for the image in the local cache; if it's not there, log the miss, retrieve the image from S3, and return it to the client.
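A minimal sketch of that read path, in Python. `fetch_from_s3` is a hypothetical callable standing in for a real S3 GET, and the in-memory dict stands in for SmugMug's local "hot" storage. Keeping a fetched image locally after a miss is an assumption that fits the hot/cold model, not something stated in the talk.

```python
import logging
from typing import Callable, Dict

log = logging.getLogger("smug_proxy")

class ReadThroughCache:
    """Serve hot images locally; on a miss, log it and fetch from S3."""

    def __init__(self, fetch_from_s3: Callable[[str], bytes]):
        self.fetch_from_s3 = fetch_from_s3  # hypothetical S3 GET callable
        self.local: Dict[str, bytes] = {}   # stand-in for local "hot" disks
        self.misses = 0

    def get(self, key: str) -> bytes:
        data = self.local.get(key)
        if data is not None:
            return data                     # hit: serve from local storage
        self.misses += 1
        log.info("cache miss for %s, fetching from S3", key)
        data = self.fetch_from_s3(key)      # cold path: Amazon has 100% of the data
        self.local[key] = data              # assumption: keep it while it's hot
        return data
```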

Proxy vs Redirect vs Direct Links

  • Built SmugMug->S3 with multiple modes
  • Can flip a switch to change modes
  • Nearly 100% of requests are served as proxy reads
  • Sometimes HTTP redirects
  • Rarely direct S3 links
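One way to build a "flip a switch" serving layer, sketched in Python under assumptions: `S3_BASE`, the `/photos/` path, and the function names are hypothetical, and `fetch` stands in for an S3 GET.

```python
from enum import Enum
from typing import Callable, Dict, Tuple

S3_BASE = "https://example-bucket.s3.amazonaws.com/"  # hypothetical bucket URL

class ServeMode(Enum):
    PROXY = "proxy"        # SmugMug fetches from S3 and relays the bytes
    REDIRECT = "redirect"  # SmugMug answers with an HTTP redirect to S3
    DIRECT = "direct"      # the page links straight to S3

def page_link(key: str, mode: ServeMode) -> str:
    """URL to embed in the HTML page for a photo."""
    if mode is ServeMode.DIRECT:
        return S3_BASE + key               # bypasses SmugMug entirely
    return "/photos/" + key                # hypothetical SmugMug-served path

def handle_photo_request(key: str, mode: ServeMode,
                         fetch: Callable[[str], bytes]) -> Tuple[int, Dict[str, str], bytes]:
    """Response (status, headers, body) for a request to the SmugMug path."""
    if mode is ServeMode.PROXY:
        return 200, {"Content-Type": "image/jpeg"}, fetch(key)
    if mode is ServeMode.REDIRECT:
        return 302, {"Location": S3_BASE + key}, b""
    raise ValueError("in DIRECT mode clients never hit this handler")
```

Only PROXY keeps every byte flowing through SmugMug's own code, which is why it is the mode where per-request permission checks are easiest to enforce.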

Permissions

  • SmugMug has complicated permissions
  • Passwords, privacy, external links
  • Proxying allows strong protection

REST vs SOAP

  • loves REST, hates SOAP
  • Lightweight
  • Nothing useful added with SOAP's complexity
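To show how little a REST call to S3 needs, here is a sketch of the request-signing scheme S3's REST API used at the time (Signature Version 2: an HMAC-SHA1 over the verb, a few headers, and the resource, sent in an `Authorization` header). AWS has since deprecated it in favor of Signature Version 4, so treat this as illustration rather than something to deploy. The sketch simplifies canonicalization: it does not merge repeated `x-amz-*` headers or handle `x-amz-date`.

```python
import base64
import hashlib
import hmac
from typing import Dict, Optional

def s3_v2_authorization(access_key: str, secret_key: str, verb: str,
                        resource: str, date: str, content_md5: str = "",
                        content_type: str = "",
                        amz_headers: Optional[Dict[str, str]] = None) -> str:
    """Build a legacy (Signature Version 2) S3 REST Authorization header."""
    amz_headers = amz_headers or {}
    # x-amz-* headers: lowercased names, sorted, one "name:value\n" line each.
    canonical_amz = "".join(
        f"{name.lower()}:{value.strip()}\n"
        for name, value in sorted(amz_headers.items(), key=lambda kv: kv[0].lower())
    )
    string_to_sign = (f"{verb}\n{content_md5}\n{content_type}\n{date}\n"
                      f"{canonical_amz}{resource}")
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"), hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return f"AWS {access_key}:{signature}"
```

The whole request is then an ordinary HTTP GET or PUT with that one extra header, which is the lightweight quality the notes credit REST with.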

Reliability

  • not 100%, close though
  • more reliable than SmugFS
  • no service level agreement
  • Lots of failure points:
    • SmugMug's datacenter
    • Internet backbones
    • Amazon's datacenter
  • No other software, hardware, or service [they] use is 100% reliable either

Handling failure

  • Build from day one with failure in mind.
  • Stuff breaks, try again
  • Writes fail? Write locally, sync later.
  • Reads fail? Handle intelligently. Alerts?
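A sketch of "stuff breaks, try again" plus "write locally, sync later", assuming a hypothetical `put_to_s3(key, data)` callable that raises on failure. The list-based spool stands in for local disk, and the retry count and backoff are illustrative.

```python
import time
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")

def with_retries(op: Callable[[], T], attempts: int = 3, base_delay: float = 0.1) -> T:
    """Run op(); on an exception, wait base_delay * 2**n and try again."""
    for attempt in range(attempts):
        try:
            return op()
        except Exception:
            if attempt == attempts - 1:
                raise                      # out of retries: let the caller decide
            time.sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be >= 1")

class SpoolingWriter:
    """Writes fail? Write locally, sync later."""

    def __init__(self, put_to_s3: Callable[[str, bytes], None], retry_delay: float = 0.1):
        self.put_to_s3 = put_to_s3         # hypothetical S3 PUT callable
        self.retry_delay = retry_delay
        self.spool: List[Tuple[str, bytes]] = []  # stand-in for a local disk spool

    def write(self, key: str, data: bytes) -> bool:
        try:
            with_retries(lambda: self.put_to_s3(key, data), base_delay=self.retry_delay)
            return True
        except Exception:
            self.spool.append((key, data))  # keep the bytes; sync() retries later
            return False

    def sync(self) -> int:
        """Re-attempt spooled writes; returns how many are still pending."""
        pending, self.spool = self.spool, []
        for key, data in pending:
            self.write(key, data)          # failures land back in the spool
        return len(self.spool)
```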

Performance

  • Fast for reads and writes
  • Latency is mostly speed-of-light limited: 20-80ms
  • Parallel I/O for massive throughput. 100s of Mbps
  • Machine measurable, human indistinguishable
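Parallel I/O is what turns 20-80ms round trips into hundreds of Mbps. A minimal Python sketch using a thread pool; `fetch` is a hypothetical S3 GET callable and 32 workers is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable

def fetch_all(keys: Iterable[str], fetch: Callable[[str], bytes],
              workers: int = 32) -> Dict[str, bytes]:
    """GET many objects concurrently so their round trips overlap.

    Ideally, with ~50 ms per request, 100 serial GETs take ~5 s, while with
    32 in flight the wall-clock time approaches ceil(100 / 32) * 50 ms = 200 ms.
    """
    keys = list(keys)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(keys, pool.map(fetch, keys)))  # map keeps input order
```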

CDN?

  • S3 is not a CDN (content delivery network)
  • it's storage
  • no global locations yet
  • limited edge caching
  • perhaps a future Amazon web service?

How do they do their proxy reads?

Store and forward vs stream

  • Store and forward
    • great resiliency
    • poor performance
    • if it's a big file, really poor performance
  • Stream
    • poor resiliency
    • great performance
    • do a quick HEAD first to verify
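The two proxy-read strategies, sketched with hypothetical `head`, `get`, and `get_chunks` callables standing in for S3 requests. Store and forward buffers the whole object. Streaming does the quick HEAD first, so a missing object fails before any bytes (or a 200 status) have gone to the client. The 64 KB chunk size is an assumption.

```python
from typing import Callable, Iterator, Optional, Tuple

CHUNK_SIZE = 64 * 1024  # assumption: 64 KB chunks

def store_and_forward(get: Callable[[str], bytes], key: str) -> bytes:
    """Buffer the whole object, then send it.

    A failed GET can simply be retried before the client sees anything,
    but time-to-first-byte grows with the file size.
    """
    return get(key)

def stream(head: Callable[[str], Optional[int]],
           get_chunks: Callable[[str, int], Iterator[bytes]],
           key: str) -> Tuple[int, Iterator[bytes]]:
    """Verify with a quick HEAD, then relay chunks as they arrive.

    Returns (content_length, chunk_iterator). Once chunks start flowing,
    a failure can no longer be hidden from the client.
    """
    size = head(key)
    if size is None:
        raise FileNotFoundError(key)
    return size, get_chunks(key, CHUNK_SIZE)
```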

The speed of light problem

  • he was misquoted as saying Amazon was slow when he was trying to explain the speed-of-light problem
  • Amazon has not solved faster-than-light data transmission. Yet.
  • unavoidable, so make sure your application can tolerate it
  • parallelized I/O can mask problem
  • caching can help
  • streaming can help

Outages and Problems

  • not perfect, five major issues
  • 3 outages of 15-30 minutes: 2 were core switch failures and 1 was a DNS problem. Amazon.com was affected too.
  • 2 performance degradations. A SmugMug customer noticed one; the other went unnoticed.
  • Not a big deal, everything fails, expect it.

SLA, Service and Support

  • SmugMug doesn't care about an SLA, but others might
  • Service Support: One area where Amazon is weak.
    • This is a utility
    • They need a service status dashboard
    • Pro-active customer notifications
    • Ability to get a hold of a human
  • Support for developers is quite good.
  • Amazon.com's customer service is good, AWS will likely catch up

Saving SmugMug's butts

  • knocked out power to ~70TB of storage. Oops!
  • Moved datacenters during normal business hours, customers not affected
  • Stupid bugs

Miscellaneous Tips

  • use cURL
    • faster
    • more reliable
    • switching between store-and-forward and streaming is simple
  • make stuff as asynchronous as possible
    • hides speed of light issues
    • hides or masks problems
    • fast customer service
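A minimal sketch of making a write asynchronous: the request handler enqueues the work and returns at once, and a background thread absorbs the S3 latency and any failures. `put` is a hypothetical S3 PUT callable. A production system would persist the queue rather than keep it in memory.

```python
import logging
import queue
import threading
from typing import Callable, List

log = logging.getLogger("async_writes")

class BackgroundWriter:
    """Acknowledge the request right away; do the slow S3 write off the request path."""

    def __init__(self, put: Callable[[str, bytes], None]):
        self.put = put                      # hypothetical S3 PUT callable
        self.jobs: queue.Queue = queue.Queue()
        self.failed: List[str] = []
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, key: str, data: bytes) -> str:
        self.jobs.put((key, data))
        return "accepted"                   # the customer sees this immediately

    def _run(self) -> None:
        while True:
            key, data = self.jobs.get()
            try:
                self.put(key, data)         # latency (or failure) is hidden here
            except Exception:
                log.exception("background write of %s failed", key)
                self.failed.append(key)     # e.g. hand off to a retry/sync job
            finally:
                self.jobs.task_done()
```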

Elastic Compute Cloud (EC2)

  • Like S3 but for computing
    • scale up or down via API
    • web servers, processing boxes, development test beds, etc
  • Launching large EC2 implementation "soon"
    • image processing
    • 500k-1M photos/day
    • 10-20 terapixels/day processed
    • peaky traffic on weekends, holidays
    • ridiculously parallel

Simple Queue Service (SQS)

  • Simple, reliable queueing
  • Mates well with EC2 and S3
    • Stick jobs in SQS
    • retrieve jobs with EC2 instances using S3 data
    • run jobs, report status to SQS
  • $0.10/1000 items
    • Priced well for small projects
    • gets costly for large ones (millions)
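The SQS/EC2/S3 workflow above, sketched with Python's in-process `queue.Queue` standing in for SQS and a single loop standing in for the EC2 workers (real SQS calls are not shown). It also includes the quoted pricing as arithmetic: at $0.10 per 1,000 items, a million items cost about $100.

```python
import queue
from decimal import Decimal
from typing import Callable, List, Tuple

def run_jobs(photo_keys: List[str], process: Callable[[str], str]) -> List[Tuple[str, str]]:
    jobs: queue.Queue = queue.Queue()    # stand-in for an SQS job queue
    status: queue.Queue = queue.Queue()  # stand-in for an SQS status queue
    for key in photo_keys:
        jobs.put(key)                    # 1. stick jobs in the queue
    while not jobs.empty():              # 2. a worker (one "EC2 instance" here) pulls jobs
        key = jobs.get()
        status.put((key, process(key)))  # 3. run the job on S3 data, 4. report status
    return [status.get() for _ in range(status.qsize())]

def sqs_cost(items: int, price_per_1000: Decimal = Decimal("0.10")) -> Decimal:
    """Cost at the quoted $0.10 per 1,000 items."""
    return Decimal(items) / 1000 * price_per_1000
```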

Missing Pieces

  • Database API or DB grade EC2 instances
    • Fast (lots of local spindles, lots of RAM)
    • Persistent
  • Load Balancer API
    • Single IP in front of lots of EC2 instances
    • Programmable to add/remove/change clusters
    • Can be done with software on an EC2 instance, but painful
  • CDN
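The programmable part of the missing load-balancer API, reduced to its core: a round-robin pool whose membership can change at runtime. This is a hypothetical sketch of the software you would otherwise run on an EC2 instance yourself, not an Amazon API. The backend names are made up.

```python
from typing import List

class RoundRobinPool:
    """Hand out backends in turn; membership can change while serving."""

    def __init__(self, backends: List[str]):
        self.backends = list(backends)
        self._next = 0

    def add(self, backend: str) -> None:
        if backend not in self.backends:
            self.backends.append(backend)

    def remove(self, backend: str) -> None:
        self.backends.remove(backend)       # raises ValueError if unknown

    def pick(self) -> str:
        if not self.backends:
            raise RuntimeError("no backends in the pool")
        index = self._next % len(self.backends)  # stays valid after removals
        self._next = index + 1
        return self.backends[index]
```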

Slides to be at http://blogs.smugmug.com/

  • How are they using EC2?
    • The EC2 instances invoke smugmug APIs to do work. The SmugMug servers don't really know much about EC2
  • I asked: Has their use of Amazon been an issue, either to outside investors or customers?
    • Not an issue, they have no outside investors, and further they've talked with VCs to raise the issue that startups should be looking at Amazon's services (and if not, why not)

Superninja Privacy Techniques for Web Application Developers

Marc Hedlund and Brad ...? from Wesabe. Wesabe is a personal finance web application.

  1. Keep critical data local. If there's data you'd never ever ever want to lose, don't put it on a web site.
    • created a Wesabe uploader for Mac/Windows to keep bank credentials on your computer. The uploader downloads data from bank sites, strips certain data out of the files, then uploads to Wesabe.
    • don't trust the site. sensitive data filtered before it ever reaches the server.
    • requires a download
    • puts burden on user to maintain a secure machine (same risk as using a web browser to bank)
    • if successful, risk of trojan targeting
  2. Use a privacy wall to separate public and private data
    • use secret key as index in db
    • secret key is only computed when user is logged in (they use hash(password + salt))
    • secret key stored in session data
    • other paths through the db: if you're using a privacy wall, you need to ensure that all transactions traverse it
    • the data itself can leak information
    • logs and exception reports can capture leaked information
    • password changing and recovery become trickier
    • use a locker: generate a one-time key for the user and store it in the locker
    • encrypt using the locker rather than the password
    • troubleshooting can be harder
  3. Use partitioning to protect against breaches
    • keep pools of sensitive data separate
    • eg membership and financial records kept separate
    • no relationship between them other than status
    • reduces the impact of any breach: firewalls off anything truly identifiable
    • allows separate policies and approaches by data type
    • pretty much zero drawbacks other than implementation time
  4. data fuzzing and log scrubbing
    • (currently) no requirement to retain specific data on users of a server (in the US)
    • Subpoena / warrant may require that you give up all data on a user
    • Different countries have different data retention policies (see epic.org)
    • filter key parameters from logs
    • remove some of the precision of IP addresses
    • remove precision from timestamps since they too can be used to identify someone (cf. example of whistleblower information)
    • prevents leakage of passwords
    • avoids giving attackers / law enforcement a way through the privacy wall
    • loss of certain private data may require you to notify your customers
    • best protection is to delete your logs
    • important to have a public policy in place (cf link to eff.org policy information)
    • no protection against wiretap orders
    • difficult to cover all your bases (use centralized logging)
  5. use voting algorithms to determine public information
    • "The ESP Game" from CMU tags images this way: if two people give something the same tag at the same time, maybe that's a good tag to apply.
    • look at Google Image Labeler
    • when people agree on a term, it's common knowledge
    • if enough people agree, it's probably publicly known
    • private transactions shouldn't be shown on the site
    • lots of users naming a merchant the same way probably means the name is public
    • works on opaque information
    • reliable -- very few faults since launch
    • no manual work needed

    drawbacks

    • information is hidden until threshold met (understates available info)
    • can leak data if the threshold is too low
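A sketch of the voting approach applied to merchant names: a name becomes public only once a threshold of distinct users agree on it, and it stays hidden (understated) until then. The threshold of 3, the class name, and the sample strings are hypothetical; the talk did not describe Wesabe's actual rules.

```python
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple

class MerchantNameVotes:
    """Publish a merchant name only once enough distinct users agree on it."""

    def __init__(self, threshold: int = 3):  # hypothetical threshold
        self.threshold = threshold
        self.voters: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def vote(self, raw_description: str, name: str, user_id: str) -> None:
        # A set of user ids, so one user repeating a name counts only once.
        self.voters[(raw_description, name)].add(user_id)

    def public_name(self, raw_description: str) -> Optional[str]:
        """Most-agreed name at or above the threshold, else None (stays private)."""
        best, best_count = None, 0
        for (raw, name), users in self.voters.items():
            if raw == raw_description and len(users) >= self.threshold and len(users) > best_count:
                best, best_count = name, len(users)
        return best
```

Counting distinct users rather than raw votes is what keeps one person's private label from crossing the threshold on its own.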

    miscellaneous

    • hash your passwords. don't store in plaintext.
    • random (non-sequential) database ids. Don't use auto-inc ids in public data.
    • data bill of rights -- your data is your data. can export, delete, etc.
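The miscellaneous tips, together with the hash(password + salt) privacy-wall key from technique 2, fit in a few standard-library calls. Using PBKDF2 for the stored password hash is a substitution, not something from the talk, and the round count is an assumption. The privacy key deliberately uses a different construction and salt from the stored password hash, so nothing in the database can stand in for it.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

PBKDF2_ROUNDS = 200_000  # assumption: tune to your hardware

def hash_password(password: str, salt: bytes = b"") -> Tuple[bytes, bytes]:
    """Salted, deliberately slow password hash; never store the plaintext."""
    salt = salt or secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, PBKDF2_ROUNDS)
    return salt, digest

def check_password(password: str, salt: bytes, digest: bytes) -> bool:
    return hmac.compare_digest(hash_password(password, salt)[1], digest)

def privacy_key(password: str, wall_salt: bytes) -> str:
    """Privacy-wall index key: hash(password + salt), computed only at login
    and kept in session data, never in the database."""
    return hashlib.sha256(password.encode("utf-8") + wall_salt).hexdigest()

def new_public_id() -> str:
    """Random, non-sequential id for anything exposed publicly."""
    return secrets.token_urlsafe(16)  # 16 random bytes -> 22 URL-safe chars
```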

    more information

    Posted in ETech


202: Accepted Archives


Copyright 2002–2008 Artific Consulting LLC.

Unless otherwise noted, content is licensed for reuse under the Creative Commons Attribution-ShareAlike 3.0 License. Please read and understand the license before repurposing content from this site.