What Wikipedia’s Infrastructure Is Like Behind the Firewall
KubeCon+CloudNativeCon sponsored this podcast as part of a series of interviews with Kubernetes end users. Listen to the previous story about how Conde Nast created a unified infrastructure platform based on Kubernetes.
The Wikimedia Foundation’s impact on culture and media sharing has been immeasurable on a worldwide scale. The foundation manages the fabled Wikipedia, Wikimedia Commons, Wikisource and a number of other outlets, and its mission is “to bring free educational content to the world.”
All told, Wikipedia alone is available in about 300 languages, with more than 50 million articles. It is accessed from 1.5 billion unique devices a month, draws 6,000 views a second and has 250,000 engaged editors, said Chase Pettet, senior security architect at the Wikimedia Foundation. “Editors are sort of the lifeblood of the movement,” Pettet said.
In this episode of The New Stack Analysts podcast, hosted by Alex Williams, founder and editor-in-chief of The New Stack, and Ken Owens, vice president, cloud native engineering for Mastercard, Pettet discussed Wikimedia’s infrastructure-management challenges, both past and present, and what makes one of the world’s foremost providers of free information tick.
The Wikimedia Foundation is now actively engaged in what it calls the “2030 Plan.” This is “in partnership between the foundation and the board and the community and everyone coming together to figure out where this thing should go and in what direction,” Pettet said. “What I’ve read and the way I understand it is we want to become a platform for the facilitation of free knowledge, which lends itself to the whole hosting cloud native, data locality thing.”
The infrastructure behind such a massive-scale project includes the standard, requisite CI checking, code review and other processes, with a reliance on Puppet, Pettet said. For security, a locked-down perimeter approach is emphasized. “Essentially, everything knows what the firewall rules should be,” said Pettet. “And that gets dynamically instantiated and managed.”
OpenStack is also seen as playing a key role in allowing the technical community to continually innovate, Pettet said. OpenStack and Kubernetes “are super-fascinating because, in essence, they sort of solve the same problem: they are an orchestration and scheduling and a resource-constraint platform,” Pettet said. “It’s just they have different native building blocks.”
Pettet also described the Wikimedia Foundation’s relationship with OpenStack as a “beleaguered marriage.” “I know how much it costs to try to carry multiple rocks on my shoulders. And maybe that’s a little bit of the voice of experience, if I can say that,” Pettet said. “But primarily, if you’re going to solve the same problem across multiple verticals, you should be really, really sure about why and have a way to figure out whether it’s worth it.”
The Wikimedia Foundation uses Kubernetes to, among other things, run a relatively finite number of services that support the main sites, Pettet said. MediaWiki itself, however, does not run on Kubernetes, he said.
“There’s a lot more to that problem than just containerized single web applications, because one of the things about MediaWiki is the expectation that I can install MediaWiki on my home server, and have a perfectly functional wiki that would hopefully include search components and everything else,” Pettet said. “But then also, we are running that same application at the huge scale previously mentioned, and having something that could perform both functions is difficult and complicated.”
Pettet also said Wikimedia’s commitment to privacy remains steadfast despite outside pressures to collect information about its users. “The foundation takes privacy really seriously,” he said. “Privacy is a thing that we intend to be a leader in and intend to walk the walk on.”
Wikimedia, for example, does not keep users’ weblog data for more than 90 days, and some of it is stored for only 30 days, Pettet said. “In the greater internet age, that’s sort of unheard [of],” said Pettet.
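To make the two retention tiers concrete, here is a minimal Python sketch of how such a policy could be checked. Only the 90-day and 30-day figures come from Pettet; the category names, the `is_expired` helper and the dates are hypothetical, and nothing here describes Wikimedia’s actual tooling.

```python
from datetime import datetime, timedelta, timezone

# Retention windows use the figures quoted in the article. The category
# names are hypothetical and exist only to illustrate the two tiers.
RETENTION = {
    "default": timedelta(days=90),
    "short_lived": timedelta(days=30),
}


def is_expired(logged_at: datetime, category: str, now: datetime) -> bool:
    """Return True once a record is older than its category's window."""
    return now - logged_at > RETENTION[category]


# Illustrative dates: a record logged 47 days before "now".
now = datetime(2020, 8, 1, tzinfo=timezone.utc)
logged_at = datetime(2020, 6, 15, tzinfo=timezone.utc)

keep_default = not is_expired(logged_at, "default", now)
keep_short = not is_expired(logged_at, "short_lived", now)
```

In this sketch, a 47-day-old record would still be inside the 90-day window but past the 30-day one, so only the shorter-lived category would be due for deletion.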