This is part one of a three part interview series with Sam Ramji, well-known technologist and the CEO of the Cloud Foundry Foundation. In this part, Sam discusses learning to program as a kid in Oakland and the strange road that led him to Microsoft. In part two, Sam talks about the next stage of his career and what it was like to do briefings about open source for Bill Gates. In part three, we learn about Cloud Foundry’s mandate and the role the PaaS plays in the new stack ecosystem.
Alex Williams: You were young when you started getting into technology.
Sam Ramji: We didn’t even call it technology. It was this magical thing that they put in the school library at Joaquin Miller Elementary in East Oakland by the Warren Freeway. I was in fourth grade and they brought in these Commodore PET computers. It was pretty cool.
They didn’t really know who to have teach these — it was brand new. Somebody had convinced the Oakland Unified School District to make this crazy investment, that these Commodore PET computers could help kids learn. They picked a couple of teachers. The teachers didn’t have special subjects — it was just like, “You’re a first grade teacher, not a Math teacher” — it went K through 6 there.
We came to the library one day and they said, “Hey, we’ve got these computers,” and they started doing the curriculum. We had Logo turtles. Back in those days it was a physical turtle (now it’s a theoretical turtle) with the pen, and you’d say, “Pen up,” and it would go, “Ungrrrr.” It was a little robot on the end of a wire being programmed by these Logo commands that you’re punching into the computer.
It was so cool. We learned BASIC — that was the shell; if you wanted to say anything to the PET, you had to type it in BASIC. All the programs shipped with source. We didn’t think about it in those days. It was just a computer program. It wasn’t like, “Ooohh, we’ve got the sources.” That was just how it was. I remember they had this Dungeons and Dragons game — it was kind of like NetHack or a roguelike, I suppose. Of course, it came with all the sources.
The school decided you could sign out the computer over the weekend. Now, these things were not like laptops; they were big, bulky things. For a fourth-grader, it weighed like an elephant. My best friend’s mom would always sign it out over the weekend. She’d throw it in the back of her beat-up Volvo, and I’d be trying to wrangle an invitation and figure out how to spend the weekend at his house, and we’d be playing these silly games.
There were text-based games — you move your pixel around or you move the sprite around the screen. Then we’d go look at the source and we’d change the names of all the monsters. It’s such basic stuff — literally BASIC programming — that was the game. You couldn’t draw a line between when you started playing games and when you started learning to program and writing things on your own from scratch. I think it was a Bill Gates quote: “The best way to learn programming is to read code written by good engineers.” Those were our programming books — the other people’s source code.
Alex: So that was fourth grade? What year was that?
Sam: It was 1980. I was nine years old.
Alex: Did you continue to use computers through high school?
Sam: Yeah, I saved up and bought a ZX Spectrum 48K — that would be in ’83, maybe the very beginning of ’84. That was probably the most horrible programming environment you can possibly imagine. Even though you had a keyboard where you could type G-O-T-O for your gotos, or whatever, the way that the designers had optimized the Spectrum meant all those commands were actually little primitives, so it didn’t have a full interpreter and compiler. You had to give it a particular keystroke. It wasn’t a text interpreter doing compilation. You would have to hold down a programming function button and the G key, and that would be goto. All these things were totally illogical. I’m not even sure that the G key was goto; it might have been the H key.
It exploded your brain seeing all these programming overlays on everything. “What was program Q? What was program T?” So, you’d write out a program listing — maybe I’d copy it out of a programming magazine — there were tons of programming magazines in the ’80s with the full listings — and by the time you entered a thousand-line program, it was like five or six hours and you were ready to rip your hair out. The whole function-key scheme was totally illogical. Then, of course, where did you store it? You stored it on a cassette drive. So you would write it out, you’d push “record” on the tape recorder, then you’d get that sound like a 2400 baud modem: “Eeeh-aaah-ooorm.” Then you’d load it back and make sure it worked. It was pretty funny.
Alex: It was like going to the tape, the actual tape.
Sam: We got pretty good with tape, because sometimes the tapes would break, and you could really carefully use a piece of scotch tape to put the magnetic tape back together. And the spools were always coming out — as it turned out, a No. 2 pencil was the perfect hexagonal shape to fit inside the little cassette wheel thing that can rewind it back. It was very primitive.
My Dad had a computer for work. It was an IBM PC/AT, or something like that: an AT or an AT compatible. He had a couple of different ones over the years and I would borrow them when he wasn’t using them, which was most of the time. I remember he was really proud: he got a portable — this thing had to be 20 pounds — it was one of those ones shaped like a suitcase. The handle would be attached by a cable, and you’d unplug the top part of the suitcase, and that would be the base. The computer would come off and that would be the keyboard. Most of the body of the case you’d flip over sideways and there’d be this tiny amber screen and a little floppy drive on the right-hand side. It was kind of awesome!
My Dad was an old-school programmer, so he programmed on the mainframe. He came to the United States from India in ’65 to get a master’s in chemical engineering at Cal. I guess they must not have had much of a computer science program back then — I’m not really sure. His thesis, he said, was something like 18,000 punch cards on one of those big, old IBM systems, and it was all about catalytic cracking: optimizing the performance of refineries to get the optimal amount of different types of gasoline and gasoline products out of oil. Then he went to work for Shell as a chemical engineer/mainframe programmer in 1967.
Alex: It’s interesting, the oil and gas companies are on the leading edge of using data technologies these days. They’ve really followed the progression of technology innovation.
Sam: They’re such high-intensity industries. In any industry where you can get an extra half-billion or billion in profits from a 1 percent optimization, it’s an obvious place to throw computers at the problem; oil and gas and telecommunications are all 1 percent industries, where a percentage point makes a huge impact on the company’s performance.
Alex: By the time you were in high school, there was a buzz building around computers…
Sam: I went to an American school overseas, because it was part of the whole oil and chemical industry. My dad moved to Holland, which is why I ended up with the ZX Spectrum — it was a British computer. Computers were not very popular. There was AP Computer Science, which was in Pascal — now it’s Java — and it was just kind of a nerdy pastime. There were only six or seven of us in AP Computer Science. I had been programming for so long that I probably could have taught the class.
I ended up not taking the AP exam because I got hit by a car and I couldn’t get back to all my exams. The AP Computer Science professor finally forgave me. I contacted him after I graduated with my degree in cognitive science from UC San Diego and I said, “Hey, John, I got my degree.” He goes, “I guess we’ll let you off for not taking the AP exam.”
Alex: You got hit by a car, that’s pretty rough.
Sam: It wasn’t great, but I’m still here and I lived to tell the tale.
Alex: Why did you go into this field of cognitive learning at UC San Diego?
Sam: I had taken computers for granted, and I don’t think any of us really thought of it as a profession at the time. It wasn’t something that people were talking about in terms of high school recruiting — “Hey, you should be a programmer.” This is 1989 and we were a long way from Silicon Valley or from Seattle, so it wasn’t really a conversation.
I was going to be pre-med. I was definitely going to be a doctor. I pursued molecular biology at UC San Diego, at Revelle College, which is sort of their super nerdy college, taking all the difficult requirements. I found that chemistry just didn’t make any sense to me — it was too much memorization and not enough systems thinking. Math was super-clear, but chemistry — it was like a hole in my head. Molecular biology clearly wasn’t going to work out for me because chemistry is kind of a requisite for pre-med.
I’d taken an artificial intelligence course in Lisp programming, first quarter of freshman year. That was the one elective that just hopped out at me. The course was something like “cognitive science 4,” the lowest lower-division class one could take. The professor, John Batali, was the guy we all wanted to be. He looked like Clark Kent — honest to God, he looked like Superman — with a brain the size of a planet, very funny, and an amazing guitar player. We were all like, “Cool class, cool teacher. What’s going on here?”
He had just come from finishing his post-doc at MIT, which was on introspective programs, and we asked, “John, what are introspective programs?” He says, “It’s a program that not only can just do what it’s doing, but you can interrupt it at any point and ask it what it’s doing, ask it what it was just doing before, and ask it what it’s going to do next,” and we were like, “Kaboom!” Minds explode!
It was not until the end of that year that I realized that chemistry was on a collision course with my future; the train was going off the rails. I also took anthropology, which was fascinating. Then I found out that all these interesting people clustered around a department called cognitive science in this little, funky building on the UCSD campus, very close to the computer science building and the anthropology building. It’s also close to the big central library, which looks like a rocket ship taking off.
All of this cool interdisciplinary stuff was happening, so I started taking classes like cognitive neuroscience, cognitive psychology, and even cognitive anthropology — one of the best classes I ever took, with Edwin Hutchins. They had cognitive philosophy, or philosophy of mind — not all of the philosophies that one could possibly do, but everything was through this lens of, “Hey, let’s look at the way that people have talked about and imagined thinking and the separation between the mind and the body for the last 3,000 years.” Just inescapably fascinating stuff.
All of the programming we did was artificial intelligence, of course: first, rules in expert systems, and then connectionist neural networks, Bayesian systems, and just thinking about, “how do you build computer systems that think like people?” — because that’s a very big question that goes back to Turing and the Chinese room argument. “Can you see any difference between a person and a computer based on just input and output?” — the famous Turing test.
The question by 1990-91, when I was getting into cog sci, was: “How do we break down human information processing?” — it was starting to be called “HIP” — “Can we look at how we form words? Can we look at Broca’s area, at Wernicke’s area, and can we see these parts of the frontal lobe? Can we start to understand how they fire when we’re actually speaking?” — cognitive psychology meets cognitive neuroscience — “Can we start to look at the morphology, at the structure of the neurons there? Then, can we start to build computer systems that behave a bit like that one part?” We were trying to break down this whole question of, “What is artificial intelligence? Can you emulate human intelligence?” I just found all that stuff absolutely fascinating.
One of the coolest things that ever happened to me: I overheard two professors talking between classes. One was David Zipser, the head of our connectionist neural network class. He had come up with a visual interpretation — basically, a map — of this neural network he’d built to process image and video data. As he trained it on these images and video, it started to have a certain structure.
Zipser was showing this map of the structure to a neuroscience professor who was also in this interdisciplinary group in cognitive science, and says, “Hey, look at this structure — it’s kind of cool — I’ve got these big clumps and they sit out like this, and I’ve got these smaller clumps.”
The neuroscience professor goes, “What are those little dots that are located here and here?”
“Just ignore those. I think they’re anomalies.”
And the neuroscience guy goes, “No. I’ve got to show you something I just found last week. We’re looking at the human V1 and V2 areas in the occipital lobe, and we just found structures that look just like this!”
It was the crystallization of what we were trying to do. If we could put a consistent set of inputs into people and then into computers, then we could correlate processing algorithms. The more they look like each other, the more we know that we probably do have a reductive model of how the brain works at a low level. It was just like everything coming together at the same time, a huge thrill.
Alex: That’s learning at its most basic form: you’re learning about the most primitive aspects of human psychology, but correlating it to these new capabilities to understand the way that bits and bytes all interact and become something.
Sam: That was the beauty of the program. It was an effort to reduce thinking to an atomic level, but also to keep the gestalts, and pack it back in. If we reduced it to the atomic pieces, then we could say, “Here is the chemical structure of this particular neurotransmitter, and here is how we think the signal gets passed from dendrite to nucleus to axon.” But then we’ve lost the whole; that’s just a function of the cell itself. It’s no longer thought, because thought is still the slightly fuzzy gestalt phenomenon; “How is the system working at a gross level?” That constant tension was a really creative tension of, “Can we get a level further down? Can we get a level below that?” And then, “Can the things that we see there still add up to what we see at the higher levels?” Just pretty extraordinary stuff.
I felt, and we always said, that just being able to walk into that building, we’re standing on the shoulders of giants. Don Norman, a very famous cognitive scientist, just before he became an Apple fellow — I was one of 80 or 90 students who took his last class. This is the guy who wrote “The Design of Everyday Things” and “Things That Make Us Smart” — looking at how to shape physical objects and computer interfaces so that a human user, without a manual, immediately knows how to use it.
There is a handle on one side of the teapot and there’s a spout on the other. There is no way you pick up that structure without the spout tipping, and you realize, “Oh, this is for pouring things.” You wouldn’t put the handle on the same side as the spout. He offered a question: “What is it that makes us smart?” If a person makes a mistake — and, in general, people make mistakes when they interact with an object or a computer system — it’s probably a design error; it’s not a user error.
For me, that was a nice way to get into computing, because it starts with an inversion of the perspective that’s just across the quad from cognitive science — in computer science and electrical engineering — which is generally, “It’s a user error.” What’s the famous internet expression? “The problem exists between keyboard and chair.” Starting by insulting humans — not a great plan. Let’s start by saying, “All of these things exist to be cognitive prostheses, there to augment humans.”
Alex: What led to you working at Microsoft?
Sam: It was a very strange path. Every path probably looks strange looking forward, and obvious looking backwards. For the last thing I did in cognitive science, I was convinced that one could build educational software that combined an internal model of multiple ways that students could learn. It could be watching the student as the student progressed and, more importantly, it could have a very engaging interface.
We believed that there are many people who are visual learners rather than theoretical learners. Some of us look at equations and can feel the structure of the equation: we see “F=ma” and we “get the F-ness” of it. But, let’s say you had a simulator where you could change the variables: you see a car racing down the road, and you see what happens when you increase the acceleration, and what happens with the force as the car hits the wall. Could we create these very engaging simulators?
I wanted to work in educational software to help bridge the gap for learning. I did an education programming job in San Diego, then I decided to move to Silicon Valley because that seemed to be where all the software people were.
I’m not an electrical engineering, hardware-level person. I can understand it, but that’s something that I’m never going to be good at. Qualcomm was hiring — all these people talking about things that didn’t make a lot of sense to me. Again, that was across the quad. If you had an M.S. in EE, that was what you did in San Diego — you could work at the Naval Research and Development labs where they were building monitoring systems and stuff. I didn’t want to work on that side of the coin, so I moved to Silicon Valley, and moved to Brøderbund Software.
At Brøderbund I worked in the education studio. We built things like “Orly’s Draw-A-Story” — we’d bring kids’ drawings to life and let them tell stories with that. And “Write, Camera, Action!” — create your own cartoon movie. The idea was to immerse somebody in a creative project and then kind of slip in, “Oh, you’re learning English,” or, “Oh, you’re learning about how music can work together.” The last thing I worked on there was the sequel to Myst.
Then I moved into distributed engineering. I got tired of being poor. We got married and had a kid very early, so I realized I needed to go and do better. I was fascinated by distributed systems because they worked like neurons and brains. I thought, “How do I get into distributed systems and distributed engineering?” Then I got a great opportunity to work for Fair Isaac.
Then I moved on to a startup, because the Internet was popping up in ’97. I went to work for a tiny company that a few of my friends started, called NetStudio. We won best of show in Internet World 98, which was fun. Like many companies that did amazing things back then, we had no head for business and we cratered a year later. The first of many startups — it’s definitely a bug that you catch — so, a few other companies after that, more distributed systems — I ended up working for an AI company, a spin-off of Inference called Brightware.
Let’s see, what else? There was an early precursor of cloud — a company that was built up by an executive who had worked at USinternetworking. Remember the whole MSP era? They tried to figure out, “How do you get economies of scale? Why does every company have to run their own copy of SAP? Maybe we could run all of their SAP systems and we could put them all in this central managed service and make a business out of that.”
That ended up being way too hard. The company was also not successful doing that because, like most of the MSPs, they realized that the structure of the software had to be fundamentally different in order to scale and to do multi-tenancy. I don’t think we had the clarity of language. We didn’t understand the pattern well enough back then, but we could feel that it wasn’t quite working. We could describe it; I’m sure we wrote papers and explained it in way too many words. As proto-cloud service providers — as MSPs — we were trying to make the software behave as if it was multi-tenant, because multi-tenancy is really something that you do for economics, and the economics of MSPs were broken because they didn’t have multi-tenancy. That was an interesting experience.
I feel like Forrest Gump with all these different twists and turns, but I learned a lot of different things in a short amount of time. Then, I ran engineering for Ofoto, which was Instagram ten years earlier and worth about $950 million less. That was pretty interesting. That was one of my introductions to Microsoft as a partner.
When I was at Fair Isaac, I was a DCOM architect. I went to Don Box’s brilliant Guerrilla COM class. Don Box was one of the fathers of COM and SOAP, and a co-founder of DevelopMentor. He did this class, which was 12-plus hours a day, down in Los Angeles, for five days. It was super-intense, cram-your-head-full-of-stuff — awesome stuff. We would do distributed object marshalling, and he said, “You’re going to marshal the object, not onto a wire — you’re going to put it onto a drive. You’re going to put a COM object on a floppy, and then you’re going to put it in another computer and you’re going to deserialize it,” just to show the very atomic basics of how this stuff works. It blew all our minds.
Much later, at Ofoto, Microsoft launched HailStorm. So, HailStorm was this idea that Microsoft would move into what we now call “cloud computing,” and have a whole set of services that would make it easy to build websites and web businesses. Identity is a classic one, and that was called Microsoft Passport — probably the only survivor of the HailStorm services. They had some early ideas around how to do storage, how to do commerce.
The big hole in the plan was that Microsoft felt they could basically charge a tax on the Internet. They could get everybody to use HailStorm services, including identity, and then you’d pay them some amazing percentage — it was like 15 percent, as I recall — I could be wrong. I remember as we walked away we said, “This is really clever stuff on the technical side — looks kind of usable — but flabbergasting on the business terms.” I think pretty much everyone just walked away. So that was Ofoto.