1
Gold and Fool’s Gold:
Successes, Failures, and Futures
in Computer Systems Research
  • Butler Lampson
  • Microsoft
  • Usenix Annual Meeting
  • June 2, 2006
2
Context: Moore’s Law and Friends
3
What is computing good for?
4
Simulation: Protein Folding
5
Communication: Maps and Pictures
6
Embodiment: Roomba Vacuum
7
History: What Worked?
  • YES
  • Virtual memory
  • *Address spaces
  • *Packet nets
  • Objects / subtypes
  • RDB and SQL
  • *Transactions
  • *Bitmaps and GUIs
  • Web
  • Algorithms
8
History: What Worked?
  • MAYBE
  • Parallelism (but now we really need it)
  • Garbage collection
  • Interfaces and specifications
  • Reuse / components
    • Works for Unix filters
    • Platforms
    • Big things (OS, DB, browser)
    • Flaky for OLE/COM/web services
9
The Failure of Systems Research
  • We didn’t invent the Web


  • Why not? Too simple
    • Old idea
      • But never tried
    • Wasteful
      • But it’s fast enough
    • Flaky
      • But it doesn’t have to work
  • Denial: It doesn’t scale
    • Only from 100 to 100,000,000
10
The Future: Motherhood Challenges
  • Correctness
  • Scaling
  • Parallelism
  • Reuse
  • Trustworthiness
  • Ease of use
11
Jim Gray’s challenges
  • The Turing test: win the impersonation game 30% of the time.
    • Read and understand as well as a human.
    • Think and write as well as a human.
  • Hear and speak as well as a person: speech↔text.
  • See and recognize as well as a person.
  • Remember what is seen and heard; quickly return it on request.
  • Answer questions about a text corpus as well as a human expert.  Then add sounds, images.
  • Be somewhere else: observe (tele-past), interact (tele-present).
  • Devise an architecture that scales up by 10⁶.
  • Programming: Given a specification, build a system that implements the spec. Do it better than a team of programmers.
  • Build a system used by millions, administered by ½ person.
    • Prove it only services authorized users.
    • Prove it is almost always available: (outage < 1 second / 100 years)
12
A Grand Challenge: Reduce highway traffic deaths to zero
  • A pure computer science problem
  • Needs
    • Computer vision
    • World models for roads and vehicles
    • Dealing with uncertainty about sensor inputs, vehicle performance, changing environment
    • Dependability
13
What is dependability?
  • Formally, the system meets its spec
    • We have the theory needed to show this formally
    • But doing it doesn’t scale
    • And worse, we can’t get the formal spec right
      • Though we can get partial specs right
      • “Sorry, can’t find any more bugs.”
  • Informally, users aren’t surprised
    • Depends on user expectations
      • Compare 1980 AT&T with cellphones
      • How well does the market work for dependability?
14
How much dependability?
  • How much do we have? It varies
    • As much as the market demands
      • Is there evidence of market failure?
    • Almost any amount is possible
      • If you restrict the aspirations
      • In other words, there’s a tradeoff
  • How much do we need? It varies
    • But safety-critical apps are growing fast
    • What’s the value of a life? Wild inconsistency
      • Look at British railways
  • Dependable vs. secure
15
Measuring dependability
  • Probability of failure
    • From external events
    • From internal malfunction
      • complexity (LOC ☺) × good experience (testing, etc.)
  • Cost of failure
    • Injury or death
    • External damage
      • Business interruption
      • Breakage
      • Bad PR
    • TCO
  • What’s the budget? Who gets fired?
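
A back-of-the-envelope Python sketch of the budget question, multiplying the two factors above. The numbers are illustrative assumptions, not from the talk:

    # Illustrative numbers only (assumed): expected annual loss =
    # probability of failure per year x cost of failure.
    p_failure_per_year = 0.01        # assume one serious failure per 100 years
    cost_of_failure = 5_000_000      # assume interruption + breakage + bad PR, in $
    expected_annual_loss = p_failure_per_year * cost_of_failure
    print(expected_annual_loss)      # 50000.0 -- a defensible dependability budget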
16
Dependability through redundancy?
  • Good in its place
  • But need independent failures
    • Can’t usually get it for software
      • Example: Ariane 5
    • Even harder for specs
      • The unavoidable price of reliability is simplicity—Hoare
  • And a way to combine the results
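
A small Python sketch of why independence matters; the per-replica failure probability is an assumption for illustration:

    # Assume each replica fails on one demand in a thousand.
    p = 1e-3
    print(p * p)  # 1e-06 if the two replicas fail independently
    print(p)      # still ~1e-03 if both run the same buggy software (cf. Ariane 5)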
17
Dependable ⇒ No catastrophes
  • A realistic way to reduce aspirations
    • Focus on what’s really important
  • What’s a catastrophe?
    • It has to be very serious
    • Must have some numeric measure
      • Dollars, lives? Say $100B, 1000 for terrorism
      • Less controversial: Bound it by size of CCB
  • Must have a “threat model”: what can go wrong
    • Probabilities must enter
    • But how?
18
Examples of catastrophes
  • USS Yorktown
  • Therac-25 and other medical equipment
  • Loss of crypto keys
  • Destruction of big power transformers


  • Are there any computer-only catastrophes?
19
Misleading examples of catastrophes
  • Avionics, nuclear reactors
    • Most attention has gone here
    • But they are atypical
      • Lots of stuff has to work
      • Shutdown is impossible or very complex
  • Impossible goals
    • Never lose a life.
      • Maybe OK for radiation
      • No good for driving
    • No terrorist incidents
    • No downtime
20
Catastrophe prevention that hasn’t worked
  • Trusted computing base for security
  • Electric power grid
  • Air traffic control
    • The spec said 3 seconds down/year/workstation (about seven nines of availability)
21
Architecture — Catastrophe Mode
  • Normal operation vs. catastrophe mode
    • Catastrophe mode ⇒ high assurance CCB
  • Catastrophe mode requires
    • Clear, limited goals = limited functionality
      • Hence easier than security
    • Strict bounds on complexity
      • Less than 50k lines of code?
  • Catastrophe mode is not a retrofit


22
Catastrophe mode
  • What it does
    • Hard stop (radiation therapy)
      • Might still require significant computing
    • Soft stop (driving a car)
      • Might require a lot of the full functionality, but the design center is very different
    • Drastically reduced function (ship engines)
  • How it does it
    • Take control, by reboot or hot standby
    • Censor (no radiation if limits exceeded)
    • Shed functions
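
A minimal Python sketch of the censor idea: a tiny guard sits between commands and the actuator and refuses anything outside hard limits. The names and the limit are hypothetical:

    MAX_DOSE = 2.0  # hypothetical hard limit, in gray

    def censor(requested_dose):
        # The censor never computes the dose; it only refuses unsafe ones,
        # so it can stay small enough to get right.
        if requested_dose > MAX_DOSE:
            raise RuntimeError("dose exceeds hard limit: hard stop")
        return requested_dose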
23
Techniques
  • Reboot—discard corrupted state
  • Shed load
  • Shed functions
  • Isolate CCB, with minimal configuration


  • Transactions with acceptance test
    • Approval pages for financial transactions
  • Undo and rollback
  • Well-tested components
    • Unfortunately, successful components are very big
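
For the "transactions with acceptance test" bullet, a minimal Python sketch: apply a change tentatively, run an acceptance check, and commit only if it passes. The account example is hypothetical:

    import copy

    def transact(state, change, acceptable):
        tentative = copy.deepcopy(state)   # keep the old state for rollback
        change(tentative)                  # apply the change tentatively
        return tentative if acceptable(tentative) else state  # commit or roll back

    account = {"balance": 100}
    account = transact(account,
                       lambda s: s.update(balance=s["balance"] - 150),
                       lambda s: s["balance"] >= 0)
    print(account)  # {'balance': 100}: the overdraft failed the acceptance test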
24
Learning from security
  • Perfection is not for this world
    • The best is the enemy of the good
    • Set reasonable goals
  • Dependability is not free
    • Customers can understand tradeoffs
    • Though perhaps they undervalue TCO
  • Dependability is holistic
  • Dependability is fractal


25
Dealing with Uncertainty
  • Unavoidable in dealing with the physical world
    • Need good models of what is possible
    • Need boundaries for the models
  • Unavoidable for “natural” user interfaces: speech, writing, language
    • The machine must guess; what if it guesses wrong?
  • Goal: see, hear, speak, move as well as a person. Better?
  • Teach as well as a person?
26
Example: Speech “Understanding”
  • Acoustic input: waveform (speech + noise)
  • “Features”: compression
  • Phonemes
  • Words: dictionary
  • Phrases: Language model
  • Meaning: Domain model


  • Uncertainty at each stage.
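
A toy Python sketch of propagating that uncertainty: each stage turns one hypothesis into several weighted ones, and a beam keeps only the most probable. The word stage and its probabilities are invented:

    def run_pipeline(stages, inp, beam=3):
        hyps = [(inp, 1.0)]
        for stage in stages:
            hyps = [(out, p * q) for h, p in hyps for out, q in stage(h)]
            hyps = sorted(hyps, key=lambda hp: -hp[1])[:beam]  # prune unlikely readings
        return hyps

    def words(phonemes):  # invented stage: one phoneme string, two candidate parses
        return [("recognize speech", 0.7), ("wreck a nice beach", 0.3)]

    print(run_pipeline([words], "r eh k ax g n ay z ..."))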
27
Example: Robots
  • Where am I?
  • What is going on?
  • What am I trying to do?
  • What should I do next?
  • What happened?
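
For "Where am I?", the standard answer is to maintain a belief distribution. A minimal Python sketch of a discrete Bayes filter over a 1-D corridor; the motion and sensor models are invented:

    N = 10
    belief = [1.0 / N] * N                  # start completely uncertain

    def move_right(bel, p_ok=0.8):          # motion model: the move may slip
        new = [0.0] * N
        for i, p in enumerate(bel):
            new[(i + 1) % N] += p * p_ok    # moved as intended
            new[i] += p * (1 - p_ok)        # wheels slipped, stayed put
        return new

    def sense_door(bel, doors=(2, 5), p_hit=0.9):  # sensor model: 90% reliable
        new = [p * (p_hit if i in doors else 1 - p_hit) for i, p in enumerate(bel)]
        total = sum(new)
        return [p / total for p in new]

    belief = sense_door(move_right(belief))
    print(max(range(N), key=lambda i: belief[i]))  # most probable cell: 2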
28
Paradigm?: Probability Distributions
  • Could we have distributions as a standard data type?
    • Must be parameterized over the domain (like lists)
  • What are the operations?


  • Basic problem (?): Given the distribution of x, compute the distribution of f(x).
    • Hard when x appears twice in f, because both occurrences must use the same draw; treating them as independent gives the wrong answer (see the sketch below)
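
A minimal Python sketch of such a type, representing a distribution by its sampler (so it is parameterized over the domain, as the slide asks) and making the independence pitfall concrete:

    import random

    class Dist:
        """A distribution over some domain, represented by a sampling function."""
        def __init__(self, sample):
            self.sample = sample
        def map(self, f):
            # The basic operation: from the distribution of x, get that of f(x).
            return Dist(lambda: f(self.sample()))
        def mean(self, n=100_000):
            return sum(self.sample() for _ in range(n)) / n

    x = Dist(lambda: random.gauss(0.0, 1.0))
    print(x.map(lambda v: v * v).mean())                 # ~1.0: one draw used twice
    print(Dist(lambda: x.sample() * x.sample()).mean())  # ~0.0: two independent draws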

29
Conclusions for Engineers
  • Understand Moore’s law
  • Aim for mass markets
    • Computers are everywhere
  • Learn how to deal with uncertainty
  • Learn how to avoid catastrophe