
Perl is Back and Ready to Roll with the Big Data Crowd

4 Dec 2015 10:39am, by

This month will see the launch of Perl 6 — the long-awaited release of a brand new language that’s been more than 15 years in the making. But it’s billed as a “sister language” to the 28-year-old Perl, which is also still being actively developed — separately, as a distinct language.

This strange moment in time finds both Perl communities going strong, writing new Perl code for Hadoop and Docker applications as we creep up on Perl’s fourth decade. And it all serves as a reminder that in our modern world of big data and cloud-based applications, there’s still a place for Perl.

“Perl has a huge community of avid users that continues to thrive in spite of detractors,” said Brian Kelly, the main developer behind a configuration management tool written with Perl 5. “This community, like communities for every language in computing, is being pulled into the Big Data world, like it or not.”

Back in 2009, one “Perl Mongers” meetup was already discussing how to use Perl on data being streamed from Hadoop using the simple standard input (STDIN) filehandle, and Perl is still being used for big data applications. “Many of our clients in the finance industry are using ActivePerl to pull data from various databases and process it,” said Tom Radcliffe, the Director of Engineering at ActiveState, which distributes its own commercial version of Perl. “It’s being used as ‘big-data lite’ or as a way to load up big-data Hadoop systems.”
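The STDIN approach mentioned above can be sketched in a few lines. This is a minimal illustration written in Python rather than Perl, and it is not code from the meetup or from any ActiveState client: the `map_words` and `run` names and the word-count task are assumptions chosen for the example. It shows the general shape of a streaming mapper, which reads records line by line from standard input and writes tab-separated key/value pairs to standard output.

```python
import io
import sys


def map_words(lines):
    """Yield one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def run(stream_in=sys.stdin, stream_out=sys.stdout):
    """Read records from stream_in and write mapped records to stream_out.

    In a streaming job the framework would supply the real STDIN and
    collect STDOUT; both are parameters here so the sketch can be tried
    without a cluster.
    """
    for record in map_words(stream_in):
        stream_out.write(record + "\n")


# Demo: an in-memory stream stands in for data piped to STDIN.
sample = io.StringIO("Perl is back\nperl 6\n")
out = io.StringIO()
run(sample, out)
print(out.getvalue(), end="")
```

Calling `run()` with no arguments would process whatever is piped to the script, which is the only contract a streaming framework needs from the mapper.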

Boeing, Siemens, and CA are just some of the Fortune 1000 companies listed as users of ActiveState’s distribution of Perl. ActiveState’s Web site notes that “As an open source programming language, Perl immediately reduces up-front project costs.” And Radcliffe said that just their own distribution of Perl has been seeing over one million downloads every year, and in 2015 had already passed the million-download mark in November.

Current state of Perl: TNS research of Stack Overflow has found that 59 percent of Perl users do not plan to use it in the future and that certain languages (Python, C, C++, SQL) are much more likely to be on their roadmaps as compared to other developers. (LH)

Last February Booking.com announced that they’d be hiring 100 Perl developers, and it turns out they’re not the only ones. A recent search on the technical jobs site Dice.com found nearly 4,000 recent new listings that contained the word Perl.

Those listings came from a wide variety of companies: tech companies like eBay, Amazon, and Oracle, financial companies like JP Morgan Chase and Capital One, and even media companies like Bloomberg L.P. and DreamWorks Animation.

Ultimately, Perl was mentioned in roughly one out of every 22 job listings: 3,939 out of 87,543. (For comparison, there were 2,916 job listings that mentioned Ruby and 17,341 that mentioned Java.) The TIOBE Index, meanwhile, has been trying to calculate the popularity of programming languages since 2002 by simply counting up how many search engine results are returned for each language’s name. In November, it reported that Perl is still one of the top 10 languages by that measure. Perl was bumped down a notch by the rise of Visual Basic .NET and Ruby into the #8 and #9 spots, but it remains above Swift and above Objective-C, which has dropped all the way from #3 down to #14 over the last 12 months.

A study from the App Developers Alliance shows Perl knowledge is quite high, suggesting that the language has a latent talent base that can be reenergized. (LH)

More importantly, the index actually shows a steady rise in Perl’s share of the conversation since February of 2014, a sudden reversal of a slow nine-year decline. Perl’s position on the Index was #5 in 2005, and #7 in 2010. Even in our big data world, “I see Perl playing the role it has always played,” said Perl developer Kelly, “and in my opinion plays better than anything else – that of glue language. That strength, the strength of its never-exceeded regular expression capabilities, and its never-surpassed CPAN repository, positions Perl as a significant player in the Big Data arena.”

Perl has always been able to connect Unix command line tools and automate the execution of specific tasks, but it’s the Comprehensive Perl Archive Network (CPAN) that remains one of Perl’s biggest advantages. “Among software repositories, I consider CPAN the gold standard,” said Lambert Lum, one of the organizers of the Silicon Valley Perl User’s Group. “You download a module, and all dependent modules are downloaded. Then you run it, and you expect it to run without error, despite the fact that the hundreds of dependent modules come from different authors.”

Distributed across 233 mirror sites, CPAN hosts 157,982 ready-to-use modules of Perl code that have been written over the last 20 years by over 12,000 different contributors — including vigorously-checked libraries that help developers with big data.

“Following Perl’s strong tradition of testing, every distribution of modules uploaded to CPAN is automatically tested on a variety of platforms and Perl configurations,” explains Perl’s official website. “[M]embers of the community have set up a testing network and donate their resources to making every Perl module as robust as possible on every platform they have available.”

Since 1995, the ease of adding new code to a central repository has made it possible to keep Perl fresh and relevant over the decades. “It has also turned out to compensate to a great extent for things that Perl 5 itself cannot do natively,” noted Carl Masak, one of the programmers actively working on the Perl programming language. He points to a slogan coined by Perl developer Matt Trout: “CPAN is my language.”

Meet the New Perl

The story of Perl takes a new twist this month, with the planned release of version 1.0 of Perl 6. Beta versions of Perl 6’s Rakudo compiler are already being released, culminating an epic 15-year development cycle. (Perl 6 was first announced in 2000.) Literally hundreds of developers have contributed to Perl 6 over the years, according to Carl Masak, with a team of dozens who still have “direct commit rights” and a solid understanding of the internals. One recent Sunday afternoon, the official IRC channel for Perl 6 showed that over 250 people had joined. And there are already 464 Perl 6 modules available for download on a new comprehensive Perl 6 archive, hosted at proto.perl6.org.

Developers stress that Perl 6 and Perl 5 are actually two separate languages, though there are plans to eventually create a translator that converts most Perl 5 code into its equivalent Perl 6 syntax. Perl 6’s compiler can also be run in “Perl 5 compatibility mode,” which will recognize and execute Perl 5 code. The Inline::Perl5 module attempts to import Perl 5 code into Perl 6, suggesting a whole new application for the existing Perl 5 code base from the last 20 years. And Perl 6 can also load modules that were written in other languages, just one of its intriguing new features.

Perl was already celebrated for its ability to recognize patterns in strings, but Perl 6 extends that capability by grouping related regexps together into a meaningful and reusable set. “Grammars allow people to collect regular expressions into classes,” Masak said. “It’s great when you want to parse a data format or a structured language.” He believes Perl 6’s support for grammars is “unparalleled” when compared to that of other languages.
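To make the idea of collecting regular expressions into a class concrete, here is a rough Python analogue. It is not Perl 6 grammar syntax and it captures only a small part of what grammars do: the `KeyValueGrammar` class, its pattern names, and the `key=value` input format are all hypothetical, chosen only to show named patterns being composed and reused as a unit.

```python
import re


class KeyValueGrammar:
    """A small, reusable bundle of related patterns (illustrative only)."""

    # Building blocks: each named pattern describes one piece of the format.
    KEY = r"[A-Za-z_]\w*"
    VALUE = r"[^\s;]+"
    # A larger rule composed from the smaller ones.
    PAIR = rf"(?P<key>{KEY})\s*=\s*(?P<value>{VALUE})"

    @classmethod
    def parse(cls, text):
        """Return a dict of every key=value pair found in text."""
        pattern = re.compile(cls.PAIR)
        return {m.group("key"): m.group("value") for m in pattern.finditer(text)}


settings = KeyValueGrammar.parse("host=example.org; port=8080")
print(settings)
```

Because the rules live on a class, a variant format could subclass it and override a single pattern, which is roughly the kind of reuse the grouping is meant to allow.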

But Perl 6 will also be able to recognize sequences. In October, Perl’s original creator, Larry Wall, demonstrated the feature at a special preview at San Francisco’s Exploratorium. The range from 1 to 2**32 spans over 4 billion numbers, but given 1, 2, 4 … 2**32, the sequence operator (…) recognizes the pattern (all the powers of two) and generates the appropriate list of just 33 numbers. There are also built-in operators to find the greatest common divisor of two numbers, as well as their least common multiple (gcd and lcm, respectively).
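The arithmetic behind that example is easy to check. The sketch below is written in Python, not Perl 6, and it does not reproduce how the sequence operator infers a pattern: the `geometric_sequence` helper is a hypothetical function that simply assumes a constant integer ratio between the first two terms. It confirms that the powers of two from 1 through 2**32 number exactly 33, and it shows the greatest common divisor and least common multiple of a sample pair.

```python
from math import gcd


def geometric_sequence(first, second, limit):
    """List first, second, ... up to limit, assuming a constant integer ratio."""
    ratio = second // first
    if ratio < 2 or first * ratio != second:
        raise ValueError("expected an integer ratio of at least 2")
    terms = []
    value = first
    while value <= limit:
        terms.append(value)
        value *= ratio
    return terms


def lcm(a, b):
    """Least common multiple, derived from the greatest common divisor."""
    return a * b // gcd(a, b)


powers = geometric_sequence(1, 2, 2**32)
print(len(powers))               # 33 terms: 2**0 through 2**32
print(gcd(12, 18), lcm(12, 18))  # 6 36
```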

“Larry Wall YAPC 2007” by Randal Schwartz. Licensed under CC BY-SA 2.0 via Commons.

But users can even define their own operators. “We really make no distinctions between built-in and user-defined,” Larry Wall told the San Francisco audience. And while most languages have a hard-wired hierarchy for which operations will be performed first, even that’s customizable in Perl 6. Perl 6 is also arriving with a special subset of the language that’s called Not Quite Perl (or “NQP”) — a small, low-memory “Perl 6-like environment” to make it easier to create code libraries for virtual machines.

And Perl 6 will also eventually compile to the Java Virtual Machine, which at least one developer believes could make it even more popular in the big data space. “If speaking specifically about Hadoop, forks of the Python (Jython) and Ruby (JRuby) languages have the advantage of running within the Java JDK,” said Adam Faris, who developed the Perl 5 module Apache::Hadoop::WebHDFS. Because those two languages are really running as Java bytecode, he noted, they have access to Hadoop’s native Java APIs (with the JVM performing additional memory management). Once Perl 6 compiles to the JVM, he expects it to achieve similar results.

Larry Wall famously studied as a linguist, and it’s evident he put a lot of thought into the structure of the new language. This new language is multi-paradigm — it supports procedural programming, as well as functional and object-oriented programming, with built-in methods already available for integers, arrays, and any new variables that get declared in the code. (“Everything is an object, but only if you want it to be,” Wall quipped in October to the audience in San Francisco.) Perl 6 developer Carl Masak describes it as an “opt-in” type system. “You don’t have to declare your types. You do it when and where it makes sense. But under the hood, Perl 6 is typed regardless, and can optimize your code based on types.”

Wall told the crowd in October that he also hoped academia would be interested in using Perl 6 to educate future generations of programmers, and not just because it’s free and open source. Perl 6 “scales” well: it welcomes new users with its forgiving syntax while also allowing much more complex and sophisticated calculations, and it allows users to program in many different programming paradigms (which should be attractive to educators). The original slogan of Perl 5 celebrated its flexibility (“There’s more than one way to do it”), and Perl 6 now boasts a new slogan touting its design goals: “Easy things should stay easy, hard things should get easier, and impossible things should get hard.”

With hundreds of developers volunteering their time to contribute to both languages, Perl continues its ongoing evolution to keep up with the times. There’s already a Docker image which includes Perl 6’s Rakudo compiler (with some other additional modules), and that’s not the only possible application. “The potential for Enterprises to use Grammars to build efficient and expressive domain specific languages that interoperate seamlessly with Perl 6 code and libraries is particularly interesting,” said Tom Radcliffe, the director of engineering at ActiveState.

“It’s a new language with a lot of powerful features, from major structural components like Grammars to implementation details like auto-threading operators. Like much of the Perl 5 community, we are watching Perl 6 with interest.”

ActiveState and Docker are sponsors of The New Stack.

Feature Image: Camelia by Perl 6.

