
Why Data Science Teams Should Be Using Pair Programming

Pair programming is common in software engineering, less so in data science. This is a missed opportunity. Here are three ways pair programming benefits data science teams.
Aug 11th, 2023 6:20am

Data science is a practice that requires technical expertise in machine learning and code development. However, it also demands creativity (for instance, connecting dense numbers and data to real user needs) and lean thinking (like prioritizing the experiments and questions to explore next). In light of these needs, and to continuously innovate and create meaningful outcomes, it’s essential to adopt processes and techniques that facilitate high levels of energy, drive and communication in data science development.

Pair programming can increase communication, creativity and productivity in data science teams. It is a collaborative way of working in which two people take turns coding and navigating on the same problem, at the same time, on one computer connected to two mirrored screens, two mice and two keyboards.

At VMware Tanzu Labs, our data scientists practice pair programming with each other and with our client-side counterparts. Pair programming is more widespread in software engineering than in data science. We see this as a missed opportunity. Let’s explore the nuanced benefits of pair programming in the context of data science, delving into three aspects of the data science life cycle and how pair programming can help with each one.

Pairing to Discover Creatively

When data scientists pick up a story for development, exploratory data analysis (EDA) is often the first step in which we start writing code. Arguably, among all components of the development cycle that require coding, EDA demands the most creativity from data scientists: The aim is to discover patterns in the data and build hypotheses around how we might be able to use this information to deliver value for the story at hand.

If new data sources need to be explored to deliver the story, we get familiar with them by asking questions about the data and validating what information they are able to provide to us. As part of this process, we scan sample records and iteratively design summary statistics and visualizations for reexamination.
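As a minimal sketch of the kind of summary statistic a pair might put on screen and then iterate on, the snippet below profiles one field across a handful of sample records. The `summarize` helper, the `session_minutes` field and the sample values are all hypothetical and exist only for illustration.

```python
from statistics import mean, median, stdev


def summarize(records, field):
    """Profile one numeric field across sample records (hypothetical helper)."""
    # Keep only the records where the field is actually populated.
    values = [r[field] for r in records if r.get(field) is not None]
    return {
        "count": len(values),
        "missing": len(records) - len(values),
        "mean": mean(values),
        "median": median(values),
        # stdev needs at least two data points.
        "stdev": stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


# Illustrative sample records, not real data.
sample = [
    {"session_minutes": 12},
    {"session_minutes": 7},
    {"session_minutes": None},
    {"session_minutes": 45},
    {"session_minutes": 8},
]

summary = summarize(sample, "session_minutes")
print(summary)
```

A gap between mean and median like the one in this toy sample is exactly the sort of detail a partner might spot and question, prompting the next visualization.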

Pairing in this context enables us to immediately discuss and spark a continuous stream of second opinions and tweaks on the statistics and visualizations displayed on the screen; we each build on the energy of our partner. Practicing this level of energetic collaboration in data science goes a long way toward building the creative confidence needed to generate a wider range of hypotheses, and it adds more scrutiny to synthesis when distinguishing between coincidence and correlation.

Pairing for Lean Experimentation

Based on what we learn about the data from EDA, we next try to summarize an observed pattern that is useful in delivering value for the story at hand. In other words, we build or “train” a model that concisely and sufficiently represents a useful and valuable pattern observed in the data.

Arguably, this part of the development cycle demands the most “science” from data scientists as we continuously design, analyze and redesign a series of scientific experiments. We iterate on a cycle of training and validating model prototypes and make a selection as to which one to publish or deploy for consumption.

Pairing is essential to facilitating lean and productive experimentation in model training and validation. With so many model forms and algorithms to choose from, balancing simplicity and sufficiency is necessary to shorten development cycles, increase the frequency of feedback and mitigate overall risk in the product team.

As a data scientist, I sometimes need to resist the urge to use a sophisticated, stuffy algorithm when a simpler model fits the bill. I have biases based on prior experience that influence the algorithms explored in model training.

My paired data scientist acts as my “data conscience” in model training: they help me put on the brakes when I’m running a superfluous number of experiments, constructively challenge the choices made in algorithm selection and course-correct me when I drift away from training prototypes strictly in support of the current story.
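One way a pair might make the "simplicity versus sufficiency" conversation concrete is to agree up front on a tolerance and then pick the simplest prototype that scores within it. The sketch below assumes each candidate is described by a name, a rough complexity rank and a validation score where higher is better; the `pick_simplest_sufficient` function, the candidate names and the scores are hypothetical.

```python
def pick_simplest_sufficient(candidates, tolerance):
    """Return the least complex candidate whose score is within
    `tolerance` of the best score (hypothetical selection rule).

    candidates: list of (name, complexity, validation_score) tuples.
    """
    best_score = max(score for _, _, score in candidates)
    # Everything close enough to the best score counts as sufficient.
    good_enough = [c for c in candidates if best_score - c[2] <= tolerance]
    # Among sufficient candidates, prefer the simplest one.
    return min(good_enough, key=lambda c: c[1])


# Illustrative prototypes and scores, not real results.
candidates = [
    ("linear_baseline", 1, 0.81),
    ("gradient_boosting", 5, 0.83),
    ("deep_net", 9, 0.835),
]

choice = pick_simplest_sufficient(candidates, tolerance=0.03)
stricter_choice = pick_simplest_sufficient(candidates, tolerance=0.01)
print(choice[0], stricter_choice[0])
```

The rule itself matters less than the fact that both partners agreed on it before running experiments, which makes it easier to stop once a simple model fits the bill.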

Pairing for Reproducibility

In addition to aspects of pair programming that influence productivity in specific components of the development cycle such as EDA and model training/validation, there are also perhaps more mundane benefits of pairing for data science that affect productivity and reproducibility more generally.

Take the example of pipelining. Much of the code written for data science is sequential by nature. The metrics we discover and design in EDA are derived from raw data that requires sequential coding to clean and process. These same metrics are then used as key pieces of information (a.k.a. “features”) when we build experiments for model training. In other words, the code written to design these metrics is a dependency for the code written for model training. Within model training itself, we often try different versions of a previously trained model (which we have previously written code to build) by exploring different variations of input parameter values to improve accuracy. The components and dependencies described above can be represented as steps and segments in a logical, sequential pipeline of code.
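The dependency chain described above can be sketched as three small functions, each consuming the output of the previous one. This is only an illustration under stated assumptions: the step names (`clean`, `build_features`, `train`), the record layout and the threshold "model" are hypothetical stand-ins, not a real training routine.

```python
from statistics import mean


def clean(raw_rows):
    """Step 1: drop records with a missing amount and cast amounts to float."""
    return [
        {"user": r["user"], "amount": float(r["amount"])}
        for r in raw_rows
        if r.get("amount") not in (None, "")
    ]


def build_features(clean_rows):
    """Step 2: derive a per-user metric (average amount) from cleaned data."""
    by_user = {}
    for row in clean_rows:
        by_user.setdefault(row["user"], []).append(row["amount"])
    return {user: mean(amounts) for user, amounts in by_user.items()}


def train(features, threshold):
    """Step 3: stand-in 'model' that flags users whose average exceeds a threshold."""
    return {user: avg > threshold for user, avg in features.items()}


def run_pipeline(raw_rows, threshold):
    """Run the steps in order; each one depends on the output of the last."""
    return train(build_features(clean(raw_rows)), threshold)


# Illustrative raw records, not real data.
raw = [
    {"user": "a", "amount": "10"},
    {"user": "a", "amount": "30"},
    {"user": "b", "amount": ""},
    {"user": "b", "amount": "5"},
]

result = run_pipeline(raw, threshold=15.0)

# Trying variations of an input parameter reuses the same upstream steps.
variants = {t: run_pipeline(raw, t) for t in (4.0, 15.0)}
print(result, variants)
```

Because each step is a named function, the dependency between the feature code and the training code is visible to both people at the keyboard rather than implied by the order of cells or scripts.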

Pairing in the context of pipelining brings shared accountability, driven by a sense of shared ownership of the codebase. While all data scientists know and understand the benefits of segmenting and modularizing code, when coding without a pair, it is easy to slip into the habit of writing overly long code blocks, losing track of similar code that has been copied, pasted and modified, and overlooking code dependencies that are obvious only to the person coding. These habits create cobwebs in the codebase and put reproducibility at risk.

Enter your paired data scientist, who can raise a hand when it becomes challenging to follow the code, highlight groups of code to break up into pipeline segments and suggest blocks of repeated similar code to bundle into reusable functions. Note that this works bidirectionally: when practicing pairing, the data scientist who is typing is fully aware of the shared nature of code ownership and is proactively driven to make efforts to write reproducible code. Pairing is thus an enabler for creating and maintaining a reproducible data science codebase.
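As a small illustration of bundling repeated code into a reusable function, suppose the same scaling logic had been copied, pasted and lightly modified for each column of a dataset. The sketch below replaces those copies with one function applied to every column; the `min_max_scale` helper and the column names and values are hypothetical.

```python
def min_max_scale(values):
    """Scale a list of numbers to the range [0, 1] (hypothetical shared helper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Illustrative columns, not real data.
columns = {
    "age": [20, 30, 40],
    "income": [1000, 3000, 2000],
}

# One call per column replaces a copied-and-edited block per column.
scaled = {name: min_max_scale(vals) for name, vals in columns.items()}
print(scaled)
```

A fix to the helper, such as the constant-column guard, now applies everywhere at once instead of depending on someone remembering every pasted copy.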

How to Get Started

If pair programming is new to your data science practice, we hope this post encourages you to explore it with your team, and you might also consider a data science course. At Tanzu Labs, we have introduced pair programming to many of our client-side data scientists and have observed that the cycles of continuous communication and feedback inherent in pair programming instill a way of working that sparks more creativity in data discovery, facilitates lean experimentation in model training and promotes better reproducibility of the codebase. And let’s not forget that we do all of this to deliver outcomes that delight users and drive meaningful business value.

Here are some practical tips to get started with pair programming in data science:

  • Synchronize schedules: Full-time pairing is easiest when participants start and end at the same time. This allows you to maximize your pairing time, as well as to stay on the same circadian rhythm. If this is not possible, for instance, due to time zone differences, define what hours you will be pairing.
  • Set up a pairing station: If you are pairing in person, set up a workstation where two monitors, two mice and two keyboards are attached to the same computer. If you are working remotely, ensure you have access to a videoconferencing tool with great screen-sharing technology, especially one that allows remote control. This will help both parties to stay engaged and make collaboration much smoother.
  • Practice empathy: Pairing with someone throughout the workday is immensely fun and exhilarating when both partners are actively listening, validating each other’s thoughts and perspectives and engaging in acts of kindness.
  • Take breaks: Pairing is an intensive approach to writing code and requires continuous concentration and communication. Don’t forget to take frequent breaks when pairing to unwind, recharge and get back at it again.