New at Civo Navigate: Making Machine Learning Set up Faster
https://thenewstack.io/new-at-civo-navigate-making-machine-learning-set-up-faster/ (Wed, 08 Feb 2023)

TAMPA, Fla. — Of the time it takes to set up a machine learning project, 60% is actually spent performing infrastructure engineering tasks.

That compares with 20% spent on data engineering, Civo Chief Innovation Officer Josh Mesout said at the Civo Navigate conference here on Tuesday. Mesout has launched 300 machine learning (ML) models in the past two and a half years.

Civo hopes to simplify machine learning infrastructure with a new managed service offering, Kubeflow as a Service, which it says will improve the developer experience and reduce the time and resources required to gain insights from machine learning algorithms.

Civo’s Machine Learning Proposition

The Kubernetes cloud provider is betting that developers don’t want to deal with the infrastructure piece of the ML puzzle. So its new offering will run the infrastructure for ML as a managed service, while supporting open source tools and frameworks. It believes this will make ML more accessible to smaller organizations, which it said are often priced out of ML due to economies of scale.

It’s an interesting proposition, since Mesout also argued that machine learning is typically deployed on-premise rather than in the cloud.

“The common misconception — I have strong and long arguments with people on this — is that machine learning doesn’t end up in the cloud,” he said. “It ends up on-premise … The reason for that is that clouds’ elastic type of workload is great for service modeling, but if you’re going to scale your machine learning to the point where you’ve got 100,000 people putting it out there, you probably never get it done.”

Cloud-based machine learning also is difficult to justify from a return-on-investment perspective, whereas ROI on-premise “makes a lot of sense,” he said.

Addressing Security Concerns

Companies have also expressed security concerns about ML and data, according to an Anaconda survey cited by Mesout. The other piece of the puzzle is that companies don’t want more proprietary options: They want open source tooling, he said, because it gives them more autonomy by avoiding vendor lock-in and because the economics are better.

Finally, companies say they haven’t used the cloud because, they claim, it’s too difficult to hook up ML projects to other architectures, Mesout added.

“As a cloud native company, we love the concept of Kubernetes to overcome that,” he said. “We’ve looked a lot of different ways when we’re trying to solve a problem and we think backing open source, instead of fighting against it and building closed source proprietary tool ecosystem, is the solution.”

Often in the cloud, companies pay for 100% of a GPU while their machine learning models may only use 24% of it. So Civo is looking at lowering costs by 20% to help companies make money on their ML projects, he added.

The company is also working with Defense.com for security, he added. The offering will also include an integrated development environment and support for core Kubeflow components.

Kubeflow as a Service is currently in private alpha.

Civo Platform for Developers

One of the complaints expressed at the conference was that Kubernetes tooling is currently too “in the weeds” for developers, who want more abstracted and, frankly, friendlier tools. Fifty-four percent of developers view the complexity of Kubernetes as slowing them down, said Dinesh Majrekar, Civo’s chief technology officer.

Civo announced its new Civo Platform, a Platform as a Service (PaaS) offering that purports to address that need. The platform offers developers an “affordable, flexible and scalable framework for running and developing applications in the cloud,” the company stated in a press release. Each managed Civo platform application is deployed to its own Kubernetes cluster, Majrekar said.

The new PaaS incorporates a Software Bill of Materials tool that allows users to produce a verified record of all components within the software. (In May 2021, the Biden administration issued an Executive Order requiring SBOMs to be provided for any software purchased by or used on behalf of the U.S. government.)

The problem with PaaS offerings, however, is the project can outgrow the platform, leaving developers to face rebuilding somewhere else. To ensure that doesn’t happen on its platform, Majrekar said, Civo supports easily switching from the PaaS to a fully managed Kubernetes service with the click of a button.

New Open Source Standard Proposed

Civo also announced Open Control Plane, or OpenCP, which it hopes to make an open source specification. It includes YAML stack specifications as well as the ability to use kubectl to interact with cloud providers. Majrekar told The New Stack that, if adopted, users would be able to switch cloud providers without rebuilding.

For instance, a network is called a “VPC” on Amazon Web Services but a “network” on Civo. An end user could use the term “network” and the cloud provider would adjust based on the standard. It also does not rely on plugins, he added.
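To illustrate the idea, here is a rough, hypothetical Python sketch of that kind of provider-neutral naming layer. It is not taken from the OpenCP specification; the dictionary of terms and the function name are invented for illustration.

```python
# Hypothetical sketch of provider-neutral naming: the user always asks for a
# "network"; a per-provider vocabulary supplies each cloud's native term.
PROVIDER_TERMS = {
    "aws":  {"network": "VPC"},
    "civo": {"network": "network"},
}

def native_term(provider: str, neutral_name: str) -> str:
    """Translate a neutral resource name into the provider's own vocabulary."""
    return PROVIDER_TERMS.get(provider, {}).get(neutral_name, neutral_name)

if __name__ == "__main__":
    for provider in ("aws", "civo"):
        print(provider, "network ->", native_term(provider, "network"))
```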

“We don’t want to hold your data for ransom,” Majrekar said. “There’s no egress charges for your data.”

The company is hoping other cloud providers will feel the same way, as it plans to promote the standard to them.

Civo also announced a new on-premise option called CivoStack at the Edge. It will be shipped to end-user companies for on-premise use and will incorporate security, be available in multiple sizes and provide automated atomic backups, meaning a backup that does not succeed the first time will be continued until it completes. It uses Role-Based Access Control (RBAC)-based APIs for access and offers the same hardware and simplicity as Civo’s public cloud, Majrekar added.

Finally, Civo revealed its Edge Manage offering, a cloud-based solution for managing on-premise Internet of Things devices. Edge Manage is Talos-based and priced at a per-device cost.

Related to the edge offering, the company is planning to expand its data centers to more regions, so clusters and data will be closer to end users. It currently operates out of London, New York and Frankfurt, Germany, with an expansion planned in Phoenix.

4 Ways Cloud Visibility and Security Boost Innovation
https://thenewstack.io/4-ways-cloud-visibility-and-security-boost-innovation/ (Wed, 08 Feb 2023)

Organizations want to move to the cloud to drive innovation and increase the speed at which they can adapt to ever-changing customer desires. That’s also a major reason why they are ultimately successful in adopting cloud technologies. With the increased focus on innovating at higher velocities, the rest of the organization has to drastically change how they do things just to keep up. They have to adhere to requirements ranging from identity management to regulations to committed service-level agreements. Innovation is amazing, but if it is not done in a sustainable way, it can cause more problems than it solves.

Visibility in the Cloud

An organization adopting cloud technologies requires full visibility into the cloud platforms and services that it uses. Cloud visibility involves access to all available telemetry data, including logs, policies, metrics and trace data generated by the platform and services.

Achieving true cloud visibility is vital to ensuring that the organization is running a stable and secure environment that will keep customers happy. Not only will this encourage them to stick around, but it can even inspire them to refer new customers to the organization.

Secure Innovation in the Cloud

There are many ways that cloud visibility will help an organization. We’ll discuss four of the most common benefits below.

Embracing Flexibility

The cloud is often sold as the best way to give organizations more flexibility to adapt to market changes faster. Cloud platforms offer the ability to apply changes to services on the fly without requiring downtime for the change to take effect. However, this same benefit can be a huge drawback if someone applies a change to the wrong place or with the wrong criteria.

This flexibility can also expose the organization to risk in real time. Attackers are continually scanning cloud providers and large organizations to look for weaknesses. They can find even the most innocent mistake, such as when a vendor wants access to an internal-facing development system to help diagnose a problem, and someone opens up the system to the internet because it’s the fastest way to help. I mean, it’s only development — what can go wrong? If you have visibility into your cloud network policies, a change like this will become known immediately. This enables options like automated remediation and notification so that the proper team is alerted about the change.
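As a small illustration of what that visibility can look like in practice, here is a minimal Python sketch using the boto3 AWS SDK to flag security groups that allow inbound traffic from anywhere (0.0.0.0/0). The alerting step is only a print statement; a real setup would feed a notification or remediation workflow, and the details would vary by tooling.

```python
# Minimal sketch: flag security groups that allow inbound traffic from anywhere.
# Assumes AWS credentials are already configured for boto3; alerting is a stub.
import boto3

def find_open_security_groups() -> list[str]:
    ec2 = boto3.client("ec2")
    open_groups = []
    for group in ec2.describe_security_groups()["SecurityGroups"]:
        for rule in group.get("IpPermissions", []):
            if any(r.get("CidrIp") == "0.0.0.0/0" for r in rule.get("IpRanges", [])):
                open_groups.append(group["GroupId"])
                break
    return open_groups

if __name__ == "__main__":
    for group_id in find_open_security_groups():
        print(f"ALERT: security group {group_id} accepts traffic from 0.0.0.0/0")
```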

Regulatory Compliance

Regardless of your chosen industry, there seems to be a constant flow of regulations that you’re expected to adhere to, from FDIC guidelines to the GDPR. If you don’t fully comply with them, it can result in fines and lost customer trust.

Some regulations and other government guidelines require that you take specific steps to ensure separation of duties between certain groups of employees in addition to requiring that you follow best practices like the principle of least privilege. As part of your organizational setup, there should be well-defined roles that follow these rules. Monitoring these roles to ensure that they adhere to the proper policies will not only reduce compliance risks, but will also help identify potential attacks in progress if someone in the quality testing group is suddenly given access to production databases.
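A simple way to picture that kind of monitoring: compare the role grants you observe against an approved baseline and surface anything unexpected. The sketch below is purely illustrative; the role names, grant strings and data source are hypothetical.

```python
# Hypothetical sketch: surface role grants that fall outside an approved baseline,
# e.g., a quality-testing group suddenly holding production database access.
APPROVED_GRANTS = {
    "qa-testers":  {"staging-db:read"},
    "prod-oncall": {"prod-db:read", "prod-db:write"},
}

def audit(observed_grants: dict[str, set[str]]) -> list[str]:
    findings = []
    for role, grants in observed_grants.items():
        unexpected = grants - APPROVED_GRANTS.get(role, set())
        if unexpected:
            findings.append(f"{role} has unapproved grants: {sorted(unexpected)}")
    return findings

if __name__ == "__main__":
    print(audit({"qa-testers": {"staging-db:read", "prod-db:read"}}))
```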

Vulnerability Detection and Management

Every year, it seems like new vulnerabilities are discovered in a growing number of products and services. Since developers now have the ability to dynamically deploy new applications without having to go through the formal (and traditionally manual) review process, it is vital that they have the capability to detect — and react to — newly discovered vulnerabilities across an ever-increasing number of applications.

Organizations usually empower their developers in this way through their DevOps practices and the use of agile methodologies. Typically, they use CI/CD pipelines in which they can embed scanning tools that will hopefully find vulnerabilities and block application deployments if anything critical is found. In addition, cloud security tools often contain products that are capable of detecting new applications running in an environment and continually scanning them for anomalies while they’re running. They can also detect whether newly discovered CVEs are applicable.
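As a rough sketch of the “block the deployment if anything critical is found” step, the snippet below reads a scanner report and fails the pipeline when blocking findings are present. The report path and JSON schema are assumptions; real scanners each have their own output formats and integrations.

```python
# Hypothetical CI gate: exit non-zero if a scanner report (JSON) contains any
# finding at a blocking severity. The report file name and schema are assumed.
import json
import sys

BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}

def gate(report_path: str) -> int:
    with open(report_path) as f:
        findings = json.load(f).get("findings", [])
    blocking = [x for x in findings
                if x.get("severity", "").upper() in BLOCKING_SEVERITIES]
    for finding in blocking:
        print(f"Blocking finding: {finding.get('id')} ({finding.get('severity')})")
    return 1 if blocking else 0

if __name__ == "__main__":
    sys.exit(gate(sys.argv[1] if len(sys.argv) > 1 else "scan-report.json"))
```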

Data Management and Privacy

There are two sides to data management and privacy. The first is simply following the rules as outlined in laws like the EU’s General Data Protection Regulation (GDPR), Canada’s Personal Information Protection and Electronic Documents Act (PIPEDA) and the California Consumer Privacy Act (CCPA). The second side is preventing accidental disclosures — not to mention malicious ones.

If you can’t see the issue, you’ll never find it — or fix it. As Supreme Court Justice Louis Brandeis once said, “Sunlight is the best disinfectant.”

Conclusion: Leveraging Cloud Visibility for Maximum Innovation

As we have outlined in this post, there are several reasons why cloud visibility is vital to the success of your organization’s cloud journey. Innovation necessitates the ability to move fast. But just because you move fast doesn’t mean you can’t do things safely and securely. Ensuring that your chosen cloud platform continues to follow best practices and operate under the parameters you’ve specified is key to achieving an environment that will earn and keep your customers’ trust. Deploying a platform that’s capable of providing multicloud visibility — like the one offered by Orca Security — is a great place to start if you want to gain the cloud visibility and security you need to be successful.

How Foursquare Transformed Itself with Machine Learning
https://thenewstack.io/how-foursquare-transformed-itself-with-machine-learning/ (Wed, 08 Feb 2023)

The last time I wrote about Foursquare was over ten years ago, when it was just a location check-in app for consumers. At that point, July 2012, it was trying to take on Yelp in the consumer recommendations market. Several pivots later, Foursquare is now squarely an enterprise service. It markets itself as “the leading cloud-based location technology platform for unlocking the power of places and movement.” Among its customers is Uber, which uses Foursquare’s ML-enhanced data to help pinpoint a user’s exact location.

To learn more about how Foursquare is using machine learning, I talked to the leader of Foursquare’s “Places Data Science” team, Zisis Petrou. He joined the company as a data scientist in 2017, soon after it had pivoted to the enterprise market. That was also the year Foursquare announced its Pilgrim SDK, an enterprise software development kit based on the eight years of data it had accumulated. By this time, Foursquare was harvesting phone sensor data and other automated forms of location tracking (all opt-in for the consumer), in addition to manual check-ins.

“Sometime in the middle of the 2010s, focus switched [to] enterprise users,” Petrou explained. “And then, of course, we decided to make use of the rich [store of] content of places that we have and also the unique potential to identify users — both visits and getting feedback from users.”

As noted above, Uber uses Foursquare data to identify the precise location of places a user might request to go to. Petrou put himself in the shoes of an Uber driver to help explain.

“So if a client asked me to go to a specific place of interest — a specific restaurant or a specific theater or something — it is very possible that under the hood, Uber’s hitting our API or uses data that we provide it as a flat file, and gets the location coordinates from there.”

How Foursquare Does Machine Learning

Foursquare recently announced an enhanced version of its Geosummarizer ML model, which it claims “increases the accuracy of a point of interest by a substantial 20%.” Geosummarizer, the company explained, “is the model that selects a final Lat/Long for a POI [point of interest] based on an analysis of geocodes from various inputs within its cluster.”

Foursquare says it now has over 120 million POIs in its databases. According to Petrou, this data comes from a variety of sources. It continues to get data from users of the apps it has (more on that in a minute), but it also purchases data from third parties, and crawls the web for even more information. This is where the ML comes in — it combines those separate pieces of data to come up with a precise calculation (Lat/Long) for a POI.

“In order to get to the best representation of the attributes of [a] place, like the address or […] the geocodes — the geographical coordinates — we apply a process that we call summarization. Which is, we take into account all the information that comes from different sources for the same POI. We reach what we consider to be the optimal final attribute.”

Image via Foursquare

Under the hood, he continued, is a machine learning pipeline that “takes into account the confidence that we have assigned after careful evaluation of all these sources, [and] takes into account geospatial information that we have as well.” The algorithm, he added, “predicts the optimal lat/long based on all these inputs.”

I asked specifically what type of ML technique it uses for this. Petrou replied that it’s “a supervised learning algorithm — it’s a problem that we have formulated under the hood as a regression problem.”

So essentially, Foursquare assigns a predicted score to each potential POI, ranks them, and the algorithm ultimately picks “the one with the highest probability as the final representation of the place.”

Petrou says it uses Python libraries like scikit-learn and PyTorch, as well as “libraries that we have developed in-house as well.”
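To make that concrete, here is a toy Python sketch of the general approach Petrou describes: score candidate geocodes with a supervised regression model and keep the best one. The features, training data and model choice here are invented for illustration and are not Foursquare’s actual pipeline.

```python
# Toy sketch (not Foursquare's pipeline): score candidate lat/long geocodes for a
# POI with a regression model, then keep the highest-scoring candidate.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Invented features per candidate: [source confidence, distance to cluster centroid (m), user votes]
X_train = np.array([[0.9, 5, 12], [0.4, 80, 0], [0.7, 20, 3], [0.2, 150, 0]])
y_train = np.array([1.0, 0.1, 0.6, 0.0])  # target: how good each candidate turned out to be

model = GradientBoostingRegressor().fit(X_train, y_train)

candidates = np.array([[0.8, 10, 5], [0.5, 60, 1]])
scores = model.predict(candidates)
best = int(np.argmax(scores))
print(f"Selected candidate {best} with predicted score {scores[best]:.2f}")
```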

Foursquare’s Consumer Apps Are Still Used

Getting back to the sources of data, I was surprised to hear that Foursquare still gathers significant data from its consumer apps. I’d long ago deleted the original Foursquare check-in app from my phone, but after checking the iOS App Store, I saw that it has two current apps: Foursquare Swarm (the current name for its “lifelogging” check-in app) and Foursquare City Guide (for “restaurants and bars nearby”). The latter is actually the original app, as I discovered when I downloaded it again and saw my old check-ins — most of them over a decade old.

Re-installing Foursquare’s app and seeing the photos I’d uploaded brought back memories of places I used to go to regularly — I used to be the “mayor” of a cafe in Petone, New Zealand, called Go Bang Expresso. Sadly, that cafe no longer exists (and also, I’ve moved to the other side of the planet). But actually, Foursquare isn’t just about memories for old timers like me — its current apps are still well-used, said Petrou.

“Yeah, we still get consumer data,” he said, sounding almost a little insulted that I’d asked. “We identified that this is one of the unique elements of the data that Foursquare has — the ability to have people on the ground willingly supporting us with data, willingly finding the place that they want to check in. And if they notice that it is a little outside […] of the coordinates that they have, or [it’s] the building next door, they send us votes (as we call them) for moving the place into the right part of the building or of the block. So this is something unique that we decided to keep focusing on and keep using because it’s something that differentiates us from other [companies] in the industry.”

Foursquare doesn’t officially release usage numbers for its consumer apps, but a CNBC report from last year quoted the company as having “9 billion-plus visits monthly from 500 million unique devices.”

Conclusion

Regardless of whether you still use Foursquare’s check-in apps, there’s no doubt that Foursquare has amassed an incredible store of location data since launching in 2009. Not only that, it’s using machine learning to enhance the underlying data — making it even more valuable to third parties like Uber.

I never thought I’d write this, but Foursquare is a model enterprise technology company in 2023. Certainly, a pivot well executed, plus a good case study for how to use ML in this era.

Cloudy with a Chance of Malware – What’s Brewing for DevOps?
https://thenewstack.io/cloudy-with-a-chance-of-malware-whats-brewing-for-devops/ (Wed, 08 Feb 2023)

As 2023 gets into high gear in the coming months, the cloud native ecosystem is set to reinforce its core business value across enterprises to become even more mission critical to the digital economy. By contrast, embedding security into the DevOps methodology is still evolving, leaving specific predictions about the future of DevOps engineering open to question.

However, in our opinion, a few trends are more than likely to shape the cloud native security landscape in the coming year:

1. Cloud Native Security: Forrester expects more enterprises to adopt cloud native technologies as they increasingly opt to run workloads in containers rather than legacy virtual machines. As such, 40% of organizations will take a “cloud native first” strategy in 2023, as they look to increase agility and efficiency while reducing costs, but security will continue to be a major concern.

As more organizations adopt container and Kubernetes technologies, there will be a corresponding growth in the development of tools and practices for securing these environments, prompting DevOps to respond more thoughtfully to security.

Threat actors will unleash iterations of malware designed to break cloud native environments. Developers will feel a greater need to incorporate security earlier in their application development cycle. And therefore, as 2023 unfolds, the industry will see DevOps increasingly evolving into DevSecOps. New security standards will solidify into actionable best practices, and there will be greater adoption of cloud native security tools and an increased focus on zero trust as a security principle.

2. Containers and Kubernetes Security: Securing containerized applications and Kubernetes will be a priority this year due to adoption going mainstream. There will be a greater focus on integrating Kubernetes security with broader cloud security frameworks, including integrations with cloud-based identity and access management systems and cloud-based security event and incident management systems. Securing the Kubernetes control plane is critical to the overall security of the cluster, and it will become an increasingly important focus for Kubernetes security practitioners. Policy-as-code for Kubernetes is expected to mature and gain greater traction. This year, dozens of leading organizations will embrace Open Policy Agent (OPA) in their Kubernetes deployments. We also believe that DevSecOps will welcome observability solutions in the cloud native security marketplace. These solutions pull data from events, logs, telemetry and traces into a comprehensive yet aggregated view from which to quickly troubleshoot issues in Kubernetes.

3. Serverless Computing and Security: Serverless computing is relatively new to the cloud native landscape. Despite its adaptability and ease of integration, it lacks standardization and interoperability. The resulting risk of vendor lock-in has left many enterprises stalled in their adoption journey even as serverless computing continues to pique the interest of developers for event-based workloads. To bridge the gap and broaden adoption across vendor-agnostic functions, we will witness disruption in this space with the Google-sponsored Knative project. The open source, enterprise-level Knative framework will ensure standards are shared across different serverless Function as a Service (FaaS) implementations, thereby raising the bar on interoperability. Another disruption to serverless is an emerging concept called “infrastructure-from-code” (IfC), a way of creating applications that allows your cloud provider to inspect the application code during deployment, then automatically provision the underlying infrastructure the application code needs.

Further, as the complexity of serverless environments increases, automating security policy enforcement and leaning on AI/ML techniques to improve the accuracy and efficiency of SecOps will take precedence. Expect to see an increased emphasis on securing the function code and runtime environment with measures such as code signing and verification along with hardening runtime environments to prevent malicious code injection.

4. API Security: Gartner predicts that this year over 50% of business-to-business transactions will be performed through real-time APIs. By 2025, less than 50% of enterprise APIs will be managed, with the growth in APIs surpassing the capabilities of API management tools. While REST- and HTTP-based services remain the most popular API architecture styles, their use will continue to level off as this year progresses, with newer event-driven API architectures such as GraphQL and gRPC growing in popularity.

That said, the ubiquity of APIs will exacerbate sprawl issues this year. The sprawl of APIs within and between cloud native infrastructures has made API security one of the biggest challenges for DevOps today. This also means that unmanaged APIs will become a popular target for cybercriminals who can use them as gateways to gain unfettered access to sensitive data. In pursuit of this data, cybercriminals will put more focus on vulnerable API endpoints that connect directly to an organization’s underlying databases. Expect to hear more about damaging attacks on individual APIs that lead to data leakage.

When it comes to the banking and financial services industry, we would be remiss not to emphasize that API security should be their single most important cybersecurity priority this year. Newly minted APIs will continue to overrun modern banking apps, causing a continuous widening of the attack surface across this vertical.

5. Software Supply Chain Security: If what the industry has witnessed in the past three years is any indication, cyberattacks on software supply chains will only increase in both frequency and severity throughout this year, as they have in previous years. Gartner predicts that by 2025, 45% of organizations will experience attacks on their software supply chains, which will be three times as many as in 2021. Software supply chain (SSC) security is a key priority in 2023, as organizations contend with an onslaught of attacks. From open source and third-party software libraries to developer user accounts and log-in credentials to components required to build, package and sign software — every element of the software supply chain will be subject to attack.

That said, new federal mandates and industry guidance intended to address supply-chain risks will put new pressure on enterprises this year to adopt established and evolving best practices that address SSC security.

And consequently, software component management tools used to track and manage the open source software components that developers use will become important. Developers will embrace them to identify and address any vulnerabilities that may be present in their software bills of materials, or SBOMs.

Final Thoughts

It is imperative for enterprises of all sizes and geographies to adopt a cloud native application development model, one that supports the development of modern apps built to meet the needs of the modern user. But, for your modern app to yield unprecedented efficiency, scale and value, the single biggest enabler in 2023 is security.

Cisco’s Panoptica solution protects the full application stack from code to runtime by scanning for security vulnerabilities in the cloud infrastructure, microservices (containers or serverless), the software bill of materials, and the interconnecting APIs. And best of all, it integrates with the tools that your application development and SecOps teams are already using. To learn more about Panoptica, visit us here or sign up here to try it for free.

Surmounting the Challenge of Building Web3 Applications
https://thenewstack.io/surmounting-the-challenge-of-building-web3-applications/ (Wed, 08 Feb 2023)

Back in 1995, when I was director of product management at Wired, it was a big deal when we no longer had to go through AOL (remember them?) to get our content in front of our audience. There was a sense of excitement and potential about how the World Wide Web enabled us to bypass the gatekeepers and innovate without intermediaries.

There was a lot of experimentation, and a lot of mistakes were made, but in the end, it opened the door to a world of game-changing — and, in some cases, world-changing — innovation.

For those who are involved with or watching developments in blockchain technology, this might sound familiar. Has there been a lot of craziness happening around NFTs and crypto lately? Yes. But the blockchains, the decentralized ledgers for modern, decentralized transactions of all sorts, are also opening up a new world of innovation that, I’d argue, is similar to what we saw with global content 20 years ago, or social media 15 years ago.

In this article, I’ll provide some historical blockchain context, some ideas about the potential for blockchain and discuss Astra Block, a new service DataStax announced today that makes it far easier for developers to build on blockchain data and enables them to build Web3 applications by making real-time blockchain queries.

Making Sense of the Decentralized

Almost 30 years ago, search engines arrived on the scene to make it easier for users to navigate the web. They competed by offering the largest and most up-to-date data sets.

Capturing data and updating and delivering it through a user-friendly frontend made a lot of great things possible for users. Some purists argued that search engines like Google, Yahoo! and Bing brought too much centralization and intermediation to the web.

It’s a valid argument, but I’d also argue that the decentralized web benefited from the centralized search engines. By indexing — storing and organizing online content in a database — they essentially made it possible for users to find what they were looking for in an ever-more-sprawling world of information on the internet.

Something similar is starting to arise with Web3: the next iteration of the World Wide Web is founded on the notions of decentralization and disintermediation, yet in some respects it is heading in the opposite direction.

Cryptocurrency exchange Coinbase and peer-to-peer payments platform Circle, for example, are well-established. But this provides the user with a choice: Should my wallet be centralized? Some will say yes, some no; both are right. The fact is, centralization is happening, and it provides users with freedom of choice and simplicity.

The Blockchain Innovation Challenge

It’s easy to get distracted when it comes to the possibilities of Web3 and blockchain data. Cryptocurrencies, non-fungible tokens and smart contracts have drawn a lot of attention, but there are many other areas that will benefit from innovation — healthcare, real estate, IoT, cybersecurity, music, identity management and logistics, to name a few.

Again, it’s similar to the advent of Web 1.0 and 2.0. The possibilities were essentially limitless, but, at the start of each new phase of the web, people were restricted by their imaginations. Even if imagination isn’t a problem, innovating and building Web3 applications brings its own set of challenges. Building a product on blockchain data is laborious and complicated.

For one thing, much of the data out there isn’t available to developers in real-time, and it’s expensive. Blockchains are decentralized and immutable, but require a great deal of expertise to navigate. This isn’t something a developer wants to grapple with.

Something like pulling all the metadata associated with an NFT or the ERC20 transfer history, for example, requires a variety of different RPC and API calls.

You can quickly rack up costs if you use a node-as-a-service provider, or you might be forced to run your own node, which is a painful, complicated process.

There has to be a simpler, cheaper way for developers to build on blockchain data.

Astra Block: Kill the Blockchain Pain

To unlock the potential innovation that can be created with blockchain data, developers need access to reliable blockchain data with real-time query capability.

The blockchains are a new global dataset. The need to capture, index, store and query a similarly huge global data set (the web, in this case) is exactly what set in motion the innovations at Google that resulted in the creation of NoSQL.

A lightbulb came on for us at DataStax. After all, Apache Cassandra, the open source database on which we’ve built our managed service DataStax Astra DB, is one of the fastest, most reliable NoSQL databases around.

We set out to understand whether we could offer a live copy of the Ethereum data set, update it fast and use, like Google did, a globally distributed database that’s optimized for query performance to provide queries with very little latency against this data set.

Astra Block, a new capability of Astra DB, enables developers to easily clone the entire Ethereum blockchain dataset to an Astra DB instance for Web3 applications. With this real-time access to blockchain data, developers can query it as often as they want, they can build apps, alerts, games — whatever they can dream up — without the need to stand up their own infrastructure or struggle with the complexities of pulling the data directly.
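To give a feel for what querying a cloned dataset might look like, here is a hedged Python sketch using the DataStax Cassandra driver’s Astra connection pattern. The keyspace and table names (and the secure connect bundle path and credentials) are placeholders, not Astra Block’s published schema.

```python
# Hypothetical sketch: query a cloned blockchain table in Astra DB with the
# DataStax Python driver. Keyspace, table and column names are placeholders.
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

cloud_config = {"secure_connect_bundle": "/path/to/secure-connect-bundle.zip"}
auth_provider = PlainTextAuthProvider("client_id", "client_secret")

cluster = Cluster(cloud=cloud_config, auth_provider=auth_provider)
session = cluster.connect("my_keyspace")

rows = session.execute(
    "SELECT block_number, tx_hash, value FROM transactions_by_block LIMIT 10"
)
for row in rows:
    print(row.block_number, row.tx_hash, row.value)

cluster.shutdown()
```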

Conclusion

Getting applications into production quickly is always a priority for developers. The expertise required to deftly navigate blockchain data shouldn’t get in the way of building Web3 apps and managing the lightning-fast queries they require.

Learn more about DataStax Astra Block.

Data Modeling: Part 1 — Goals and Methodology
https://thenewstack.io/data-modeling-part-1-goals-and-methodology/ (Tue, 07 Feb 2023)

Data modeling is the process of defining and representing the data elements in a system in order to communicate connections between data points and structures. In his impactful book “Designing Data-Intensive Applications,” Martin Kleppmann describes data modeling as the most critical step in developing any information system.

Understanding which data is relevant for the business and in what form requires communication between functional and technical people. Moreover, allowing data sharing across components within an information system is critical for the good functioning of the system. Quoting Kleppmann, “data models have such a profound effect not only on how the software is written but also on how we think about the problem that we are solving.”

But what exactly is a data model, then?

A data model is a specification that describes the structure of the data stored in the system.

In addition, it may define constraints that guarantee data integrity and standardize how to represent (rules), store (format) or share (protocol) data. In the literature, we typically distinguish between three different levels of data modeling (see the pyramid figure below).

Figure 1

  • The Conceptual level defines what the system contains. Business stakeholders typically create a conceptual model. The purpose is to organize, scope and define business concepts and rules. Definitions are most important at this level, such as a product.
  • The Logical level defines how the database management system (DBMS) should be implemented. A logical model is technologically biased and is created with the purpose of developing a technical map of rules and data structures. Relationships and attributes become visible, for instance, product name and price.
  • The Physical level describes how to use a specific technology to implement the information system. The physical model is created with the purpose of implementing the database. The physical level explores the trade-offs in terms of data structures and algorithms.

During the Beginner Flux training at InfluxDB University, we used the same levels to understand how time series data maps onto the Flux data structure and InfluxDB’s line protocol data model. Here we take this a step further in data modeling for InfluxDB and Flux. Therefore, it is worth recalling that:

  • Conceptually, a time series is an ordered set of timestamped data points described by one — and only one — measurement and a set of tags.
  • Logically, Flux represents multiple series simultaneously, representing different values by a set of key-value pairs named fields. Moreover, tags are key-value pairs that help further partition data for processing.
  • Physically, InfluxDB stores data into a Time-Structured Merge Tree; it is also worth mentioning that tags are both key and value indexed.
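A minimal sketch may help tie the three levels above together. It uses the influxdb-client Python library to build a single point (one measurement, plus tags and fields) and print its line protocol representation; the measurement, tag and field names are just examples.

```python
# Minimal sketch: one point = one measurement plus tags and key-value fields.
# Uses the influxdb-client Python library; no server connection is required
# just to inspect the resulting line protocol.
from datetime import datetime, timezone
from influxdb_client import Point, WritePrecision

point = (
    Point("cpu")                         # measurement
    .tag("host", "server01")             # tags help partition the series
    .tag("region", "eu-west")
    .field("usage_user", 12.5)           # fields carry the actual values
    .field("usage_system", 3.2)
    .time(datetime.now(timezone.utc), WritePrecision.NS)
)

print(point.to_line_protocol())
# e.g. cpu,host=server01,region=eu-west usage_user=12.5,usage_system=3.2 <timestamp>
```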

A Brief History of Data Modeling Methods

Now that we have clarified what a data model is and the goals of data modeling, we can discuss how we get there. In practice, several methodologies exist in the literature. The most prominent ones, listed below, differ in terms of target information systems and workloads, i.e., online transaction processing (OLTP) and DBMS; online analytical processing (OLAP) and data warehouses; and big data and data lakes.

  • Relational modeling (RM) focuses on removing redundant information for a model that encompasses the whole enterprise business. RM uses relations (tables) to describe domain entities and their relationships.
  • Dimensional modeling (DM) focuses on enabling complete requirement analysis while maintaining high performance when handling large and complex (analytical) queries. DM aims to optimize the data access; thus, it is tailored for OLAP. The star and snowflake models are notable results of dimensional modeling.

Notably, RM and DM produce significantly different results considering the logical and physical levels of abstraction described above. Nonetheless, they all share similar conceptualization and tooling when operating at the conceptual level. Indeed, the entity-relationship (ER) modeling technique and diagrams underpin all the models mentioned above, as well as graph databases and semantic stores. Therefore, it is worth refreshing what ER implies:

  • An entity is an object that exists and is distinguishable from other objects. Entities have a type and descriptive attributes; an entity-set groups entities of the same type. An attribute called the primary key uniquely identifies each entity in a set.
  • A relationship is an association among several entities. The cardinality of a relationship describes the number of entities to which another entity can be associated; we consider one-to-one, one-to-many and many-to-one.

Figure 2
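For a concrete (and hypothetical) example of these terms, the sketch below models a Category entity with a primary key and a one-to-many relationship from Category to Product; the attribute names are invented for illustration.

```python
# Hypothetical example: entities with primary keys and descriptive attributes,
# and a one-to-many relationship (one Category, many Products).
from dataclasses import dataclass

@dataclass
class Category:
    category_id: int      # primary key
    name: str

@dataclass
class Product:
    product_id: int       # primary key
    name: str             # descriptive attributes
    price: float
    category_id: int      # foreign key: many Products relate to one Category

coffee = Category(10, "Coffee gear")
catalog = [Product(1, "Espresso machine", 199.0, coffee.category_id),
           Product(2, "Grinder", 89.0, coffee.category_id)]
print(len(catalog), "products in category", coffee.name)
```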

In different techniques, entities and relationships remain central. However, their nature and roles are reinterpreted according to the business goals. For example, RM stresses identifying as many entities as possible to avoid data redundancy. Indeed, redundancy creates maintenance problems over time, which conflict with the user’s need for consistency.

Conversely, DM builds around facts that borrow their identity from other entities using their many-to-many relations. Such entities are interpreted as dimensions, that is, descriptive information that gives context to the facts. DM is of primary interest to data warehouse users, whose top concerns are analytics. Both of the modeling techniques mentioned above can, to some extent, represent time.

  • In relational modeling, time is just an attribute. Entities and relationships can be updated, but the conceptual schema does not carry information at this level. Temporal extensions of the relational modeling approaches have been proposed. However, they are tailored for temporal databases, which focus on the temporal validity of their entities (as a form of consistency) rather than time series databases (TSDBs) and the history of their time-varying attributes.
  • In dimensional modeling, time is considered an analytical dimension — it represents a possible subject for slicing over, which produces significant aggregates. Dimensional tables within the dimensional model do not consider changes at the conceptual level. However, in lower levels, changes may happen. Different approaches to handling such “slowly changing dimensions” have been proposed, including keeping track of their history, which is close to what a TSDB would do.

The Dos and Don’ts of API Monetization
https://thenewstack.io/the-dos-and-donts-of-api-monetization/ (Tue, 07 Feb 2023)

So you have an API, and you want to generate revenue with it. To do so, you need to solve two sets of challenges.

The first involves business questions, such as how much to charge for API access or whether to use a multitiered pricing model. Those are challenges for product managers to address.

The second, more complex set of steps toward API monetization centers on technical challenges. How do you measure API usage and bill customers for it? How do you throttle API traffic such that customers with different plans receive different levels of service? How do you ensure your APIs can meet SLA guarantees? These are the sorts of technical quandaries that developers must resolve to ensure that APIs can actually generate money for the business.

If you’re a developer tasked with this mission, keep reading for tips on how to address API monetization requirements in the most efficient and effective way. This article explains best practices for monetizing APIs from a technical sense, along with pointers on the antipatterns to avoid.

API Monetization Requirements

Before diving into best practices and antipatterns, let’s go over the core technical requirements for enabling API monetization:

  • Advanced metering: Because different customers may have different levels of access to APIs under varying pricing plans, it’s critical to be able to manage access to API requests in a highly granular way, based on factors like total allowed requests per minute, the time of day at which requests can be made and the geographic location where requests originate.
  • Usage tracking: Developers must ensure that API requests can be measured on a customer-by-customer basis. In addition to basic metrics like total numbers of requests, more complex metrics, like request response time, might also be necessary for enforcing payment terms.
  • Invoicing: Ideally, invoicing systems will be tightly integrated with APIs so that customers can be billed automatically. The alternative is to prepare invoices manually based on API usage or request logs, which is not a scalable or efficient approach.
  • Financial analytics: The ability to track and assess the revenue generated by APIs in real time is essential to many businesses that sell APIs. It allows them to identify financial opportunities or shortcomings and adjust their API pricing plans accordingly.

Some businesses may need to meet additional technical requirements to monetize their APIs, but the considerations described above represent the most common requirements for API monetization.
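As an illustration of the metering and usage-tracking requirements above, here is a minimal Python sketch of plan-aware request throttling with an in-memory counter. Plan names and limits are invented; as discussed below, production systems typically enforce this in the API management layer and persist usage data for invoicing.

```python
# Minimal sketch: count requests per customer per minute and reject calls over
# the plan's limit. In-memory only; plan names and limits are invented.
import time
from collections import defaultdict

PLAN_LIMITS = {"free": 60, "pro": 600, "enterprise": 6000}  # requests per minute

usage = defaultdict(int)  # (customer_id, minute) -> request count so far

def allow_request(customer_id: str, plan: str) -> bool:
    minute = int(time.time() // 60)
    key = (customer_id, minute)
    if usage[key] >= PLAN_LIMITS[plan]:
        return False  # over the plan limit for this minute
    usage[key] += 1
    return True

if __name__ == "__main__":
    print(allow_request("acme", "free"))
```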

How to — and How Not to — Monetize APIs

There are multiple possible approaches to meeting these requirements, but some are better than others. Here’s what developers should and shouldn’t do in the context of API monetization.

Don’t Embed Monetization Logic into APIs

Probably the biggest mistake developers make when asked to monetize an API is attempting to write logic for functions like advanced metering or usage tracking directly into the code that powers APIs.

Doing so not only requires tremendous effort, but also makes it difficult to change policies on the fly. For instance, if your API licensing plans change and you want to change usage policies accordingly, you don’t want to have to deploy a new version of your API to do so.

On top of this, the more logic you try to embed into APIs, the higher the risk that you’ll make a mistake that could lead to security breaches.

Do Rely on API Management Tools to Enforce Monetization

A healthier approach is to rely on your layer of API management tools — such as gateways — to enforce monetization requirements.

Within the management layer, you can easily establish granular policies to meter API requests. You can also collect API usage and financial analytics data efficiently, and you can push that data to invoicing and finance tools. And you avoid the security risks of embedding monetization logic into APIs directly.  

Don’t Compromise on Pricing Flexibility

Partly because of the technical challenges of implementing highly granular monetization policies within APIs, some businesses settle for API pricing models or licensing terms that are relatively basic. They might grant the same level of access to all users, for instance, or charge a flat fee for API access that is not linked to usage metrics.

This approach simplifies matters from a technical perspective. But from a business perspective, it could mean missing out on opportunities to optimize API monetization plans. Technical challenges shouldn’t become an excuse for less sophisticated API monetization strategies.

Do Centralize API Monetization Management

Sometimes, a business licenses not just one API, but many. In this case, being able to enforce monetization policies across all APIs in a central way makes it much easier and more efficient to ensure that APIs deliver their intended revenue.

When monetization controls are implemented at the API management layer, a centralized approach to monetization management becomes possible. That’s another reason why developers should avoid baking monetization logic into the APIs themselves. Even if your business only has one API to monetize today, it might launch others in the future, and managing monetization on an API-by-API basis just doesn’t scale.

Conclusion: A Healthy Perspective on API Monetization

The bottom line: From a technical perspective, API monetization is quite complicated, and although there are multiple ways to meet the challenges, some are more efficient and scalable than others.

As a best practice, developers should strive to decouple monetization management from APIs themselves. Instead, they should rely on their API management tools to enforce monetization requirements. Doing so ensures that the policies and data necessary to monetize APIs can be easily implemented in a consistent way across all APIs, with minimal effort on the part of developers.

Platform Engineering in 2023: Doing More with Less
https://thenewstack.io/platform-engineering-in-2023-doing-more-with-less/ (Tue, 07 Feb 2023)

Some variation of the question, “Is DevOps dead?” continues to be asked in the cloud native development space. And the answer is a definitive “no.” With the rise of platform engineering, the true question, and the clear answer, is: DevOps is changing but isn’t going anywhere anytime soon. The shifting role of DevOps will be the fuel that powers the developer experience to help everyone do more with less, which is becoming a guiding principle as companies across industries face uncertain economic headwinds.

Developer platforms have emerged as a way to pave clear paths for developers, who aim to achieve greater productivity and reduce cognitive load while safely shipping software at speed. Platform engineering as a discipline, as undefined and nebulous as it remains — although this, too, is starting to change as more packaged, commercial options crop up — has become a vehicle for easing the cloud native development journey.

While the definition of “platform engineering” is still up for interpretation, its direction is full speed ahead, focused on shaping and improving the developer experience. One of the most efficient ways to get there will be to refocus the work of existing DevOps teams to do more with less, becoming something akin to “PlatformOps” in support of the developer experience.

The Rise of Platform Engineering = The Evolution of DevOps

Two things inform the emergence, coming dominance and commercialization of platform engineering and the developer platform.

First, developer frustration is real. The Kubernetes developer has had reason to be frustrated by some of the new challenges created by the introduction of microservices and cloud native development. A complete change of development paradigm coupled with the sudden expectation that developers should be able to “shift left” and assume end-to-end code-ship-run responsibility for their code created additional, and unwelcome, cognitive load.

Add to the mix a host of routine, repetitive tasks that suddenly fell to developers — who were, in many cases, left without any kind of roadmap or set of abstractions to figure out what tools to use, or how to gain visibility into services to get the fast feedback loops they needed. All of this slowed down the shipping of products and features. One developer productivity survey from Garden found that developers spent an average of 15 hours a week on non-development tasks.

Not only was this a recipe for a bad developer experience, but it was also a drag on productivity, which affects the bottom line.

Second, while some developers relish the freedom to experiment and try new tools (the 1%), the vast majority of developers (the 99%) want, and possibly need, clear guardrails and “golden paths” for shipping and running their code. Most developers want to focus their time writing code — not running infrastructure and trying to figure out things that, for productivity’s sake, should just work, such as maintaining tooling, setting up dev environments, automating testing and so on.

By extension, most businesses need the security and stability of standardization, replicability and consistency. Being able to meet customer needs, control costs and ensure security are the priorities, and while not inherently anti-innovation, business-critical requirements discourage too much creativity and rely on processes, automation and everyone working with the same standards and tools.

This is where DevOps continues to save the day. DevOps, much like the developer’s work, is also evolving but isn’t disappearing. Instead, DevOps is moving to a platform engineering (aka PlatformOps) model of support that clears the pathway, reduces the complexity of the developer experience and removes friction from the developer’s everyday work through the creation of developer platforms.

Platform engineering builds on the best of traditional DevOps practices and uses that knowledge and experience to identify and enable new efficiencies and do more with less. Or, as a recent New Stack article articulated, “You could say platform engineering takes the spirit of agile and DevOps and extends it within the context of a cloud native world.”

Developer Platforms Enable More with Less

The drive to encourage developers to take on full life cycle management of their software started from a good place, giving them more control and insight with the power to increase velocity and build efficiency. But infrastructure has not been, and probably will never be, the primary focus for developers — or the most productive place for developers to channel their energy.

Nevertheless, in the cloud native space, there is a need for developers to understand more about the infrastructure and what happens after they ship their code. If something goes wrong, the developer still needs to be responsible for that code and know its dependencies to help troubleshoot and identify (and fix) downstream problems.

But organizing and making decisions about services, environments, clouds themselves? That’s asking a specialist in one discipline to try to specialize on the fly in some entirely different discipline, which negates both the original idea of increasing velocity and developer experience and the idea of doing more with less. Sometimes the idea of saddling a non-specialist with specialist responsibilities, thinking this shrinks the resource footprint — more with less — creates more problems.

The middle ground and the path to achieving more with less is the sweet spot for the developer platform. Essentially, developer platforms can:

  • Provide developer self-service for the tools and visibility required, paving the path while remaining flexible enough to accommodate different kinds of developers. It can work for new developers completing onboarding as well as for experienced developers who want to achieve reliable, efficient production.
  • Enable DevOps/PlatformOps to support and enhance the self-service motion, increase their time and focus spent on strategic improvements and projects and decrease time spent fighting fires.
  • Allow for better measurement of performance, compliance and security because operational and resource data is centralized within the platform.
  • Ease the budgetary squeeze that is hitting many companies across multiple industries. As I shared with The New Stack, developer platforms are one way to “make sure your local development environment is set up well, and that no one is sitting around waiting for builds to happen. All this relies on rapid feedback and transparency that the platform engineering team can facilitate.”

It’s All about the Developer Experience

If developer platforms are about enabling efficiency and productivity, real-world usability is key. The stakeholders (developers) have to want to use the platform for there to be any value. Developer platforms must be created to remove barriers and sweep away typical challenges that stand between developers and shipping software safely and rapidly. The platform must address what developers need to know, see and routinely do as a part of their job, and define the abstractions required to make that happen seamlessly.

The repeated talk about “the developer experience” may sound a bit overblown, but it wouldn’t be a constant theme if it weren’t more than a passing trend.

A sentiment echoed by multiple developers, platform engineers, DevOps and site reliability engineers, but best stated by Arm’s Cheryl Hung, highlights why a developer platform is essential: “Infrastructure can be unreliable; it fails; it is unpredictable. Compared to software that runs pretty much the same way every time, infrastructure is really, really hard.”

If the developer experience — and enabling the developer’s work — is crucial to achieving business goals and using resources wisely, the DevOps/PlatformOps role and platform engineering are equally instrumental in continuously improving and safeguarding the developer experience.

Google Touts Web-Based Machine Learning with TensorFlow.js
https://thenewstack.io/google-touts-web-based-machine-learning-with-tensorflow-js/ (Tue, 07 Feb 2023)

TensorFlow.js is a JavaScript library for training and deploying Machine Learning (ML) models in the browser and on Node.js. It was launched by Google nearly five years ago, but its popularity has increased in recent years thanks to the practice of using ML in programming — part of the generative AI trend engulfing the technology industry currently.

To find out more about TensorFlow.js and how web developers use it in their projects, I spoke to Jason Mayes, who leads the Developer Relations team for Web ML at Google. “Web ML” is a broader term that basically means using ML inside the browser (or on Node.js). But the main part of Mayes’ remit is the TensorFlow.js team, so I began by asking him about the main use cases for ML on the web.

Why Do ML over the Web?

First off, he mentioned privacy. One common use case is for processing sensor data in ML workloads — such as data from a webcam or microphone. Using TensorFlow.js, Mayes said, “none of that data goes to the cloud […] it all happens on-device, in-browser, in JavaScript.” For this reason, TensorFlow.js is being used by companies doing remote healthcare, he said.
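
To make that concrete, here is a minimal sketch of on-device inference in the browser. It assumes the pretrained @tensorflow-models/mobilenet package and a video element already wired to the webcam; neither is something Mayes specifically cited.

import * as tf from '@tensorflow/tfjs';
import * as mobilenet from '@tensorflow-models/mobilenet';

// Classify webcam frames entirely in the browser: no pixels leave the device.
async function classifyWebcamFrame(video: HTMLVideoElement): Promise<void> {
  await tf.ready();                                // wait for a backend (WebGL by default)
  const model = await mobilenet.load();            // downloads the pretrained weights once
  const predictions = await model.classify(video); // inference runs locally
  console.log(predictions); // e.g. [{ className: 'laptop', probability: 0.87 }]
}

Hooking the video element up to navigator.mediaDevices.getUserMedia is all that remains to turn this into a working demo.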

Another privacy use case is human-computer interaction. “With some of our models, we can do body pose estimation, or body segmentation, face keypoint estimation, all that kind of stuff,” Mayes said.

Lower latency is another reason to do ML in the browser, according to Mayes. “Some of these models can run over 120 frames per second in the browser, on an NVIDIA 1070 let’s say,” he said. “So that’s kind of [an] old generation graphics card and [yet it’s] still pushing some decent performance there.”

Cost was his third reason, “because you’re not having to hire and run expensive GPUs and CPUs in the cloud and keep them running 24/7 to provide a service.”

“The fourth reason people come to us is that it’s in JavaScript,” he said, noting the obvious popularity of that language. “Previously, TensorFlow was aimed at academics and researchers…this kind of stuff over in Python-land. Which is great, nothing wrong with that! But I think embracing the JS side of things could open machine learning up to much more people than ever before — and a lot more creatives, artists and musicians are starting to use us in JS-land.”

How JS ML Compares to Python ML

However, this raises the question of how TensorFlow.js compares to using TensorFlow in its more familiar Python environment.

“All the benefits I just gave you are pretty much impossible to achieve server-side,” Mayes replied. “And even if you don’t want to go on client-side, we can run in Node.js on the backend, on the server-side. And the reason you choose Node.js over Python is that even though Node and Python are just wrappers around the original TensorFlow that’s C++, [the] pre- and post-processing parts of the model execution can be accelerated by the just-in-time compiler of JavaScript.”

He mentioned that HuggingFace, one of the leading NLP service companies, runs its ML workloads in TensorFlow.js for the speed benefits.

“Python is actually not very efficient at running,” he continued. “It’s good for academia, for trying things out — it’s got a lot of libraries to use out of the box. But I think we’ll have the same thing replicated in our own community going forward, and you’ll see the performance benefits.”

As another example of this in action, he pointed to LinkedIn. “If you go to a LinkedIn web page on your mobile phone, that’s actually delivered by a TensorFlow.js model on the backend running in Node,” he said. This resulted in a “15% performance gain over their Python equivalent model, which means they save millions of dollars a month by just doing that.”

More Speed: WebGL and WebAssembly

As with many other leading web applications, TensorFlow.js is making use of the latest hardware acceleration technologies. WebGL and WebAssembly are both in production, while WebGPU is in testing.

“We’ve got WebGL to do graphics card acceleration,” he explained. “Essentially, using textures and shaders to do mathematical operations — which is a bit of a hack, but it works. And then we’ve also got WebAssembly to go faster on the CPU. We’ve also got the new emerging WebGPU standard, which is currently behind a flag in Chrome Canary and other browsers, but eventually, it will become the thing in browsers to use. And I think we’re seeing around 2x-plus performance [gain] in WebGPU — bear in mind, with WebGL right now we’re getting hundreds of frames per second already.”

More about Web ML

TensorFlow.js is clearly the main web-based ML tool at Google, but I asked Mayes what else comes under the umbrella term “Web ML”?

“So Google’s obviously heavily invested in ML and from my perspective, working on this Web ML side, it offers a unique selling point — if you will — to our [ML] ecosystem… currently, there’s no PyTorch.js, for example,” he said, referencing Meta’s ML platform.

Google offers what Mayes calls “a path to the web” for machine learning, “which researchers and others can embrace, to get those benefits that we spoke about before.”

He also works with “other teams [at Google] that might touch on web-based deployments of machine learning, like the MediaPipe models that are also able to run in the web browser.” He’s referencing an open source project called MediaPipe, for using ML in “live and streaming media.”

Future Growth

TensorFlow.js has been growing "3x year-on-year," according to Mayes. It's only going to get bigger, as ML and AI apps continue to ramp up in popularity. Indeed, just this week Google itself released a ChatGPT competitor called Bard. I asked Mayes how big he thinks web-based ML tools like TensorFlow.js will get.

“I think Web ML is the real Web3,” he said, echoing a catchphrase he has been using on social media. “I’m not saying crypto is bad or anything, but I think […] it’s like a teenager trying to find itself right now. And I think Web ML can have an impact on industries [and] companies right now.”

“I believe that if we continue on this 3x path of growth, we could be the most widely used form of ML in the future within the TensorFlow ecosystem. That’s my personal belief. But if we continue this growth upward, I don’t see why not — because there are a lot more JS developers out there.”

To keep that momentum going, Mayes said that Google is “always looking for interesting models we can port from Google research to the web, to make it easier to use.”

If you’re a developer interested in learning more about web-based ML, check out this series of tutorials featuring Jason Mayes.

The post Google Touts Web-Based Machine Learning with TensorFlow.js appeared first on The New Stack.

]]>
Streamline Event Management with Kafka and tmctl https://thenewstack.io/streamline-event-management-with-kafka-and-tmctl/ Tue, 07 Feb 2023 14:33:47 +0000 https://thenewstack.io/?p=22699422

DevOps teams and platform engineers in support of event-driven application developers face the challenge of capturing events from sources such

The post Streamline Event Management with Kafka and tmctl appeared first on The New Stack.

]]>

DevOps teams and platform engineers in support of event-driven application developers face the challenge of capturing events from sources such as public cloud providers, messaging systems or in-house applications, and reliably delivering filtered and transformed events to consumers depending on their needs.

In this post, we’ll run through a solution that uses Kafka and TriggerMesh’s new command-line interface called tmctl to centralize and standardize events so we can apply transformations and filters in a uniform way before pushing events to downstream consumers for easy consumption.

The source code for all the examples is available on GitHub.

The Problem Illustrated

An e-commerce company needs to process orders from different order management systems. The company’s in-house order management system writes orders to a Kafka topic, but others that were added over time work differently: One pushes orders over HTTP, another exposes a REST API to invoke, another writes orders to an AWS SQS queue. Order structure varies by producer and needs to be massaged into shape. For all producers, orders are labeled with a region (EU, US, etc.) and a category (electronics, fashion, etc.) and come in every possible combination.

A downstream team of app developers is asking to consume global book orders to create a new loyalty card. A separate analytics team wants to consume all European orders to explore expansion opportunities in the region. Each of these consumers wants specific events from the overall stream, sometimes with specific formats, and they both want to consume them from dedicated Kafka topics.

You’re tasked with capturing orders from the four order management systems in real-time, standardizing them, filtering and delivering them to Kafka topics dedicated to each consumer.

TriggerMesh as a Unified Eventing Layer

We’ll show how to use TriggerMesh to ingest orders, transform and route them for consumption on Kafka topics. There are other tools that could address this problem, each with its quirks and perks. TriggerMesh has found appeal with engineers with DevOps roles due to its declarative interface and Kubernetes native deployment.

A typical TriggerMesh configuration is made up of the following components:

Sources

Sources are the origin of data and events. These may be on-premises or cloud-based. Examples include message queues, databases, logs and events from applications or services.

All sources are listed and documented in the sources documentation.

Brokers, Triggers and Filters

TriggerMesh provides a broker that acts as an intermediary between event producers and consumers, decoupling them and providing delivery guarantees to ensure that no events are lost along the way. Brokers behave like an event bus, meaning all events are buffered together as a group.

Triggers are used to determine which events go to which targets. A trigger is attached to a broker and contains a filter that defines which events should cause the trigger to fire. Filter expressions are based on event metadata or payload contents. If a trigger fires, it sends the event to the target defined in the trigger. You can think of triggers like push-based subscriptions.

Transformations

Transformations are a set of modifications to events. Examples include annotating incoming events with timestamps, dropping fields or rearranging data to fit an expected format. TriggerMesh provides a few ways to transform events.

Targets

Targets are the destination for the processed events or data. Examples include databases, message queues, monitoring systems and cloud services. All targets are listed and documented in the targets documentation.

Setting Up the Environment

To provision the Kafka topics for the example, I’m going to use RedPanda, a Kafka-compatible streaming data platform that comes with a handy console. I’ll run both on my laptop with its provided docker compose file, which I’ve tweaked a bit for my setup. You can use any Kafka distribution you like.

Run docker-compose up -d and away we go; the console becomes available at http://localhost:8080/ by default.

We’re going to use tmctl, TriggerMesh’s new command-line interface that lets you easily build event flows on a laptop that has Docker. To install it, homebrew does the job for me:

brew install triggermesh/cli/tmctl

There are other installation options available.

Ingest Orders from Kafka

We’ll start by creating a broker, the central component of the event flow we’re going to build. It’s a lightweight event bus that provides at-least-once delivery guarantees and pub/sub style subscriptions called triggers (and their filters).

tmctl create broker triggermesh

And now we’ll use a Kafka source component to ingest the stream of orders into our broker:

tmctl create source kafka --topic orders --bootstrapServers <url> --groupID mygroup

In a separate terminal, I’ll start watching for events on the TriggerMesh broker with the command tmctl watch.

We can now send an event to the orders topic using the RedPanda Console:

If we look at the terminal running the watch command, we see the event show up there, which means the event has been ingested by the broker. Notice how the event has been wrapped in a standard envelope based on the CloudEvents specification. We’ll see how we can leverage this envelope later on.
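
If you prefer sending the test event from code rather than from the console, a small producer script does the same job. The sketch below uses the kafkajs client, a broker address for the local RedPanda setup and an illustrative order payload; none of these details come from the tutorial itself.

import { Kafka } from 'kafkajs';

// Publish one sample order to the 'orders' topic that the Kafka source is watching.
async function sendTestOrder(): Promise<void> {
  const kafka = new Kafka({ clientId: 'order-tester', brokers: ['localhost:9092'] }); // assumed broker address
  const producer = kafka.producer();
  await producer.connect();
  await producer.send({
    topic: 'orders',
    messages: [{
      value: JSON.stringify({
        orderid: 12,
        ordertime: Date.now(),
        region: 'eu',
        category: 'fashion',
        item: { itemid: '184', brand: 'ExampleBrand', category: 'Apparel', name: 'Example item' },
      }),
    }],
  });
  await producer.disconnect();
}

sendTestOrder();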

Transform and Route Events to the Right Topics

We’re going to want to route global book orders to the orders-global-books topic, and all EU orders across all categories to the orders-eu-all topic. Before we can do that, we need to extract the region and category from the event payload into event headers (CloudEventsattributes to be specific), so that we can later filter against these headers with trigger filters.

For this, we’ll use a TriggerMesh JSON transformation, which provides a low-code approach to modifying a JSON event’s payload and metadata. Here we’re storing the values of region and category from the payload as variables (second half of the code) and using them to modify the event type attribute to be of the form $region-$category-v1.

context:
- operation: add
  paths:
  - key: type
    value: $region-$category-v1
data:
- operation: store
  paths:
  - key: $region
    value: region
  - key: $category
    value: category


I’m giving the event type a version, so we can more easily make additional transformations down the road and evolve the version at each stage. This will provide the possibility for consumers to migrate from one version of events to another at their own pace and provide more flexibility to modify the flow of events with minimal impact to other components.

We’ll put this transformation code in a file and create a new transformation that references it, along with a trigger that routes events from the original order topic (which are of type io.triggermesh.kafka.event as you can see in the first output from tmctl watch above) to this transformation:

tmctl create transformation --name transform-extract-region-category -f transformations/orders-add-region-category.yaml

tmctl create trigger --eventTypes io.triggermesh.kafka.event --target transform-extract-region-category

Now if we send the same event into the orders topic, we'll see two events show up in tmctl watch: the original, followed by the transformed event. The latter should look like this:

Notice that the event type is now eu-fashion-v1. This is perfect for the routing we want to do in the next step.

Now let’s create some Kafka targets that will write events to the two dedicated Kafka topics for our app developers and analytics consumers.

tmctl create target kafka --name orders-global-books-target --topic orders-global-books --bootstrapServers <url>

tmctl create target kafka --name orders-eu-all-target --topic orders-eu-all --bootstrapServers <url>

These targets will create the necessary Kafka topics for you as you’ll see in the RedPanda console. However, these Kafka targets aren’t doing anything yet, because I haven’t routed any events to them.

Let’s create the triggers that will send events to their respective Kafka targets and thus to their respective Kafka topics:
tmctl create trigger --name global-books --eventTypes eu-books-v1,us-books-v1 --target orders-global-books-target

tmctl create trigger --name eu-all --eventTypes eu-fashion-v1,eu-books-v1,eu-electronics-v1,eu-groceries-v1,eu-pharma-v1 --target orders-eu-all-target

Each trigger defines the types of events that should fire the trigger, as well as a target component (Kafka targets here) to which events should be delivered.

If we send the original event again now, because its event type has become eu-fashion-v1, it’ll get routed to the orders-eu-all Kafka topic. We can see it there in the RedPanda console:

At any given moment, we can use the command tmctl describe to see the TriggerMesh components we’ve created, their status and parameters:

Europeans Need a Special Format

A general manager from the EU region says the formatting of item IDs in Europe should be of the form _item_uuid_184, as opposed to the United States, where item IDs are simple numbers like 184. Ah, those pesky Europeans (I would know).

We’ll add a new JSON transformation that only transforms the itemid value for the eu region, and it’ll also bump the version of these events to v2.

context:
- operation: add
  paths:
  - key: type
    value: $region-$category-v2
data:
- operation: store
  paths:
  - key: $itemid
    value: item.itemid
  - key: $region
    value: region
  - key: $category
    value: category
- operation: add
  paths:
  - key: item.itemid
    value: _item_uuid_$itemid


Because copies of events in both versions v1 and v2 will be flowing through the broker, consumers of v1 can continue uninterrupted, so long as trigger filters are still listening for v1 when it initially makes it into the broker.

We can now create the transformation and the trigger that will route all the v1 EU orders to the transformation, so that they can be transformed into v2 EU orders.

tmctl create transformation --name transform-eu-format -f transformations/orders-eu-format.yaml

tmctl create trigger --eventTypes eu-fashion-v1,eu-electronics-v1,eu-books-v1,eu-groceries-v1,eu-pharma-v1 --target transform-eu-format

We’re also going to update the trigger for the EU target to filter for v2 events instead of v1 events. Note that recreating a trigger (or any other component) with the same name or parameters results in an update. Again, this is where you could decide to only start routing v2 to a single consumer for now, before rolling it out to the others that aren’t yet ready for v2 or simply in order to reduce the blast radius if something were to go wrong.

tmctl create trigger --name eu-all --eventTypes eu-fashion-v2,eu-books-v2,eu-electronics-v2,eu-groceries-v2,eu-pharma-v2 --target orders-eu-all-target

If we send the same original event into the orders topic, we now see the transformed v2 event appear in the orders-eu-all topic.

We now have events flowing across Kafka topics and some transformations in place, as illustrated below.

Next, let’s add some new sources of orders.

Integrate Orders Pushed over HTTP

The next order management system we need to integrate is pushing orders out over HTTP. So to be able to ingest these into TriggerMesh, we’ll create a webhook source that exposes an HTTP endpoint:

tmctl create source webhook --name orders-webhook --eventType orders-legacy

I’m giving the orders-legacy type to the orders coming from this webhook because they aren’t formatted according to the latest standards. The orders are arriving as follows:

{
  "orderid": 11,
  "ordertime": 1497014121580,
  "region": "us",
  "category": "books",
  "itemid": "331",
  "brand": "Penguin",
  "itemcategory": "Edutainment",
  "name": "Bonnie Garmus - Lessons in Chemistry"
}


We need to transform these events as they arrive in TriggerMesh, following which they’ll be processed by the rest of the pipeline we’ve already created:

context:
- operation: add
  paths:
  - key: type
    value: io.triggermesh.kafka.event
data:
- operation: store
  paths:
  - key: $itemid
    value: itemid
  - key: $brand
    value: brand
  - key: $itemcategory
    value: itemcategory
  - key: $name
    value: name
- operation: add
  paths:
  - key: item.itemid
    value: $itemid
  - key: item.brand
    value: $brand
  - key: item.category
    value: $itemcategory
  - key: item.name
    value: $name
- operation: delete
  paths:
  - key: itemid
  - key: brand
  - key: itemcategory
  - key: name


We’re doing a little trick here by setting their event type to io.triggermesh.kafka.event so that they get picked up by the first transformation we created.

We’ll create the transformation component and route the legacy orders to it as follows:
tmctl create transformation --name transform-orders-webhook-legacy -f transformations/orders-webhook-legacy.yaml

tmctl create trigger --eventTypes orders-legacy --target transform-orders-webhook-legacy

The order management system will push events to the webhook over HTTP, and we can simulate this using curl as follows:

curl -X POST -H "Content-Type: application/json" -d @mock-events/webhook_raw.json <webhook URL>

To get the webhook’s URL, you can use tmctl describe and find the URL next to the webhook component called orders-webhook.

Integrate Orders Provided by an HTTP Service

The next order management system we need to integrate provides an HTTP API that we need to regularly poll for new events. This service also produces the legacy order format that we’ll need to transform.

First, we’ll start a mock HTTP service locally to simulate this service, in a new terminal (requires Python 3):

python3 -m http.server 8000

In the directory I’m working in, there is an example legacy json event in the file mock-events/legacy_event.json that this HTTP server can serve up.

Now, we create the HTTP Poller:

tmctl create source httppoller --name orders-httppoller --method GET --endpoint http://host.docker.internal:8000/mock-events/http_poller_event.json --interval 10s --eventType orders-legacy

You can adjust the endpoint depending on your environment. I’m using host.docker.internal because I’m running on Docker Desktop.

The beauty here is that we're also setting the type of these events to orders-legacy. This means that without any additional work, we know that these orders will get processed by the pipeline we just created for the webhook orders, meaning they'll be reformatted to the new standard, transformed to extract the necessary metadata, etc.

You should now see these events appearing in TriggerMesh every 10 seconds and being routed to the orders-global-books Kafka topic.

Integrate Orders from an SQS Queue

The final order management system that needs to be integrated provides orders through an AWS SQS queue. To read from the queue, we can create an SQS source with TriggerMesh:

tmctl create source awssqs --arn <queue-arn> --auth.credentials.accessKeyID <id> --auth.credentials.secretAccessKey <secret>

Now I’ll send an event into SQS that matches our initial order format:

Surprise! When I look at the event coming into TriggerMesh with tmctl watch, I notice that the order data is enveloped in a lot of AWS metadata, some of which is shown below:

Because we don’t need this metadata, we’ll extract the body of the SQS message so that the incoming event matches the schema we want. It is a simple case of extracting the Body attribute and setting it as the root of the payload. We’ll also set the event’s type to the same one produced by the Kafka orders source, so that it gets processed by the same set of transformations further down the pipe.

context:
- operation: add
  paths:
  - key: type
    value: io.triggermesh.kafka.event
data:
- operation: store
  paths:
  - key: $payload
    value: Body
  - key: $category
    value: category
- operation: delete
  paths:
  - key:
- operation: add
  paths:
  - key: 
    value: $payload


Again, we’ll create the transformation and a trigger to send events to it:
tmctl create transformation --name transform-sqs-orders -f transformations/orders-transform-sqs.yaml

tmctl create trigger --eventTypes com.amazon.sqs.message --target transform-sqs-orders

Benefits of This Approach

We’ve just created a unified eventing layer that can ingest, transform and filter events from heterogeneous sources and reliably filter and deliver them to different consumers.

  • Decoupling: The decoupling of event producers and consumers, which is inherent to this style of event-driven architecture, makes it easy to evolve the topology as requirements change. It’s easy to add new consumers without affecting other producers or consumers, we just need to create a new trigger and target. Likewise, we can easily add new order management systems and transform them to fit the right schema. This provides agility and maintainability.
  • Versioning: By versioning our events with the type metadata, we’re able to progressively roll out changes and consumers can adopt new versions at independent paces.
  • Integration: The solution lets you integrate heterogeneous event producers and consumers without imposing constraints like schemas or SDKs they must embed.
  • Push or pull: The combination of Kafka and TriggerMesh gives us an interesting combination of message exchange patterns and guarantees. In our example, the event consumers will pull events from Kafka. But if we wanted to, we could push events directly to consumers using other TriggerMesh target components like the CloudEvents target.
  • Short inner-loop: tmctl makes the process of creating an event flow interactive and iterative. You can see results immediately and adjust.
  • Transition to declarative on K8s: When you’re ready to move to a more declarative workflow, you can tmctl dump your configuration as a Kubernetes manifest that you can then apply onto any cluster that has TriggerMesh installed.
  • Multicloud and open source: The components used in this tutorial are Apache 2.0 (TriggerMesh) and source-available (RedPanda). They are cloud-agnostic and will run wherever you need them.

If you’d like to take it for a spin yourself, you can either head to the GitHub repo for this example or try to create something yourself by starting with the quickstart guide.

The post Streamline Event Management with Kafka and tmctl appeared first on The New Stack.

]]>
Machine Learning Is as Easy as an API, Says Hugging Face https://thenewstack.io/machine-learning-is-as-easy-as-an-api-says-hugging-face/ Tue, 07 Feb 2023 14:00:40 +0000 https://thenewstack.io/?p=22699541

AI right now seems like the domain of elite experts, but startup Hugging Face plans to “democratize good machine learning”

The post Machine Learning Is as Easy as an API, Says Hugging Face appeared first on The New Stack.

]]>

AI right now seems like the domain of elite experts, but startup Hugging Face plans to “democratize good machine learning” by making it as easy as deploying a REST API.

This isn’t theoretical — it’s possible now, with use cases in frontend and web applications, explained Jeff Boudier, head of product and growth at the startup. Hugging Face offers opens source machine learning (ML) models for free on its community site, while charging a fee for infrastructure and service support.

“One, we make it really easy to use and so if you have a use case for machine learning within your app, then it’s super easy to come to Hugging Face, find a model, deploy an API and just build,” Boudier told The New Stack. “The second thing is [that] we want to make machine learning very accessible.”

Examples of AI Uses on Frontend and Mobile

Boudier shared how healthcare company Phamily used Hugging Face. The company helps medical groups manage in-house chronic care management service lines between patient visits. It needed to leverage machine learning models to classify and triage the messages automatically.

The problem, of course, was building out the models, a proposition that can be very expensive, as Phamily soon learned. It came to a point where it needed to build out its infrastructure team or go home.

Instead, Phamily leveraged Hugging Face's Inference Endpoints, which allows developers to deploy machine learning models on a fully managed infrastructure. After an hour of reading the documentation, the Phamily team was able to deploy a transformer model with a RESTful API, which previously took a week's worth of developer time, according to Bryce Harlan, software engineer at Phamily.

Boudier estimated customers have deployed about 20,000 projects using AI technologies on their platform.

“Frontend, backend, mobile — it all works because we abstract away all of the machine learning and the infrastructure around the machine learning, so at the end of the day, it’s a REST API that you can send requests to — whether it’s from your frontend JavaScript or it’s your backend, or from a mobile client,” Boudier said. “We have a ton of developers and AI startups that are using our models and the inference endpoints … to power user-facing experiences.”

Examples of frontend and mobile AI uses include MusixMatch, which matches song lyrics to Spotify songs; Pinecone, which uses machine learning to conduct a semantic search; and a mobile app that ensures that patients follow prescriptions from doctors, Boudier said.

A Brief AI Lexicon: Defining New Models

In addition to its products, Hugging Face supports a robust open source community, acting as a free repository for all types of AI models — more than 120,000 — as well a sandbox to explore what the models do and information about potential bias.

Although Hugging Face started with natural language processing, developers will find two newer types of machine learning models on the site — transformers and diffusers.

Transformer models were pioneered by Google with the BERT model. Transformer models — which Stanford labeled foundation models — are broadly trained on large data sets; for example, internet data or research data, or whatever large datasets the developer wants to use, Boudier said. These models can require millions of dollars in compute power to train, he added. For example, ChatGPT is based on the foundation model GPT-3, which is trained with internet data, and NASA is collaborating with IBM to use NASA's massive datasets to build open source foundation models for other scientists to leverage.

Once created, these models provide a foundation that can be fine-tuned — for instance, an existing foundation model could be further refined using an individual’s own email archive, to better classify it for the person’s unique needs.

“The most important concepts about transformers is transfer learning, which is the idea that you can create a big model with huge amounts of data using massive amounts of compute, and it creates what we call a pre-trained model that has accumulated a lot of knowledge,” he said. “Then you can apply transfer learning on that, to easily adapt it to a more specific domain, a more specific problem, to a different task, et cetera. So it’s a much more versatile technology that makes it easy and important to reuse.”

Diffusers are another relatively new type of model growing in popularity, thanks to Stable Diffusion and DALL-E 2.

“In terms of process, … imagine a picture that starts from a noisy image of dots, and then through iterations, the model improves upon that noise to make sense from the input,” Boudier explained.

So, for example, you might start with a text input of an astronaut riding a horse on the moon, and the model will start with an undefined random image, then iterate until a picture comes into focus that resembles what you requested, he said. It's used in Photoshop to do inpainting; on Tuesday, Google revealed a new model that creates music from text using diffusion, he added.

Competition

Boudier was unaware of any companies that compete with open source models, and sees Hugging Face’s competition as proprietary companies, such as Google, AWS and OpenAI.

“In all these cases, you don’t have access to the model, so you can’t really bring your own model, you cannot improve the model, you cannot run the model in your own environment,” he said.

Hugging Face has drawn comparisons to GitHub, in that it federates the machine learning community in the same way that GitHub federated the software engineer community. But machine learning model files tend to be large — they’re called checkpoint files and can be in the range of 100 gigabytes — so GitHub isn’t the “right tool for the job,” he said.

“Typically, what happens is that when a research lab publishes a new model, they will publish the paper on arXiv, they will publish the code on GitHub and they will publish the model and the data sets and the demo on on Hugging Face,” Boudier said.

The community is heavily used by AI specialists and data scientists, and increasingly, software developers, he said. To help developers learn more about models and how to use them, Hugging Face offers a free online course.

“Don’t be intimidated,” Boudier advised developers.

The post Machine Learning Is as Easy as an API, Says Hugging Face appeared first on The New Stack.

]]>
Platform Engineering Is Not about Building Fancy UIs https://thenewstack.io/platform-engineering-is-not-about-building-fancy-uis/ Mon, 06 Feb 2023 18:37:14 +0000 https://thenewstack.io/?p=22699549

If I had to name the single biggest misconception some people have about platform engineering, it would be that the

The post Platform Engineering Is Not about Building Fancy UIs appeared first on The New Stack.

]]>

If I had to name the single biggest misconception some people have about platform engineering, it would be that the result of a successful platform engineering endeavor is a shiny user interface with lots of buttons to click and dashboards to look at.

Many people conflate developer portals and service catalogs with internal developer platforms (IDPs), but they aren’t the same. And there are real consequences to the confusion. At best, that shiny UI allows organizations to gain only a small part of the return on investment (ROI) they can get from platform engineering.

In 2022, I spoke with roughly 300 platform engineering teams. Many of these teams started their platform engineering journey with building a developer portal. However, for 95% of them, other platforming initiatives would have had a bigger impact on developer productivity and ROI. Less than 20% of the teams see developers actually adopt and use the developer portal.

Developer Portals vs. Service Catalogs vs. Internal Developer Platforms

In 2022, Gartner clarified the relationship between developer portals and internal developer platforms:

“Internal developer portals serve as the interface through which developers can discover and access internal developer platform capabilities.”

For example, Netflix built a developer portal on top of its existing platform tooling.

An internal developer platform is the sum of all tech, tools and processes that a platform engineering team binds into a golden path for developers. The golden path reduces cognitive load and drives standardization by design. An IDP doesn’t even need to have a user interface. IDPs are about so much more than aggregating information and displaying it — from configuration and infrastructure management to environment and deployment management. Designing IDPs is about listening to what developers actually need on a daily basis and building solutions that address those needs. A developer portal can visualize the underlying platform, but it is not a necessary component of an IDP.

A developer portal or service catalog is a user interface that pulls data from several APIs and consolidates them against different views. A service catalog shows you a list of available services, which APIs they feature and the owner of the service. It pulls and aggregates the metadata from GitHub, ticketing systems and continuous integration (CI). Service catalogs often have a “templating gallery,” which is a more or less fancy collection of GitHub templates and dashboards.

Why Do Organizations Focus on Developer Portals and Service Catalogs?

If developer portals and service catalogs aren’t necessary, why do so many organizations focus on building them first? Here are some of the most common reasons I’ve seen:

  • It feels obvious: When organizations start their platform journey, they tend to think about easing pain points in chronological order. The first things that come to mind are the tasks you complete first. For the life cycle of an application, that could be creating a service. For the developer, it's usually onboarding. Many organizations choose to start automating here first.
  • It's presentable: Dashboards are something you can show to your managers, especially if they don't have a technical background. Visualizations are significantly easier to explain and sell than restructuring configuration management. But that doesn't mean it makes more sense.
  • Everyone has an opinion on interfaces: While there are comparably few people in the platform engineering space who have an in-depth understanding of how to architect internal developer platforms with regard to underlying technologies and real pain points like config management, many more have an opinion on interfaces. As a result, there is more talk about interfaces than about really deep problems in the developer experience.

Why Do Developer Portal and Service Catalog Endeavors Often Fail?

After investing time and resources into developer portals and service catalogs, many organizations are disappointed with the results. Here’s why:

  • Developers hate “yet another interface.” They want to stay in code, in their git-push lane, and operate fast and without interruption. You can build the most beautiful UI, but that doesn’t mean anyone will regularly look at it. I looked at the usage metrics of the portal of a very large e-commerce player and found that, on average, developers were using exactly one function (search) once a year to check whether what they were building had been built before.
  • The tangible benefits are low. The most common use case I hear is “we want to standardize the creation of new services.” Let’s assume you create an insane 1,000 new microservices a year. How do you do this today? Probably by just cloning GitHub templates. Because portals themselves are basically just UI frameworks, all they do is call other APIs. So if you implement the functionality of “creating a new service by clicking on a button,” this button will call the GitHub templating API and clone the linked sample repo. Building a portal using the most frequently used open source frameworks realistically takes six months for, at the absolute minimum, two full-time employees (FTEs). But where’s your impact? Are you gaining 10 seconds per service creation? One minute, even? Let’s say for some miraculous reason it’s 1 minute, and we take the 1,000 services and two FTEs that we pay $100,000 a year. Then your ROI is still a wild -80%, and remember this assumes we are creating 1,000 services every year! Congratulations, you just wasted time.
  • Portals and service catalogs are also notoriously complex to implement and keep up to date. Developers will constantly circumvent, and a dashboard with wrong data is probably worse than no dashboard. You will spend an enormous amount of resources and time trying to keep stuff up to date.

Platform as a Product Is the Way

Instead of focusing on building developer portals or service catalogs, you should prioritize the features that benefit developers the most. You can figure out which features your organization needs by taking a product approach. With a product approach, you aren’t going to start by building the stuff some influencer tells you to or whatever feels obvious. Instead, you start with user research. Go to your developers and ask them what they need or want to do.

Then it’s your responsibility to prioritize those concerns. One way to do this is by noting how often developers do a certain task every 100 deployments and how long it takes. You’ll wind up with a table that looks something like the one below.

Sample Calculation

Procedure | Frequency (% of deployments) | Dev Time in hours (including waiting and errors) | Ops Time in hours (including waiting and errors)
Add/update app configurations (e.g., env variables) | 5%* | 1h* | 1h*
Add services and dependencies | 1%* | 16h* | 8h*
Add/update resources | 0.38%* | 8h* | 24h*
Refactor and document architecture | 0.28%* | 40h* | 8h*
Waiting due to blocked environment | 0.5%* | 15h* | 0h*
Spinning up environment | 0.33%* | 24h* | 24h*
Onboarding devs, retrain and swap teams | 1%* | 80h* | 16h*
Roll back failed deployment | 1.75% | 10* | 20*
Debugging, error tracing | 4.40% | 10* | 10*
Waiting for other teams | 6.30% | 16* | 16*

*Per 100 deployments. Source: https://humanitec.com/blog/top-10-fallacies-in-platform-engineering

You can use this table to figure out the ROI for your internal developer platform.

In most cases, I’ve found that two changes yield the biggest results. Making sure you really have basic CI/CD flows set up reduces toil and increases efficiency. Restructuring your configuration management from “static” to dynamic configuration management enables standardization by design, separation of concerns and continuous self-service with low cognitive load.

When Should You Still Build a Portal/Service Catalog?

This is not to say that there are no good reasons to build a developer portal. If your developers are creating an incredibly large number of services and resources and need to categorize them for inner-source endeavors, a portal is very beneficial. However, there are not many organizations that have the thousands of services and developers required to get to a positive ROI.

Sometimes you have to build a developer portal because management tells you to. You don’t have a choice. But if none of these cases apply to you, don’t waste your time focusing on the developer portal as a starting point. Instead, start with architecting your IDP. Your developers will thank you!

The post Platform Engineering Is Not about Building Fancy UIs appeared first on The New Stack.

]]>
WebAssembly to Let Developers Combine Languages https://thenewstack.io/webassembly-to-let-developers-combine-languages/ Mon, 06 Feb 2023 18:02:30 +0000 https://thenewstack.io/?p=22699618

What if there was a way to use libraries from whichever programming language you wanted and compile them together? And

The post WebAssembly to Let Developers Combine Languages appeared first on The New Stack.

]]>

What if there was a way to use libraries from whichever programming language you wanted and compile them together? And what if developers could do that not in the distant future, but by year end?

That’s exactly the problem the Bytecode Alliance plans to solve this year, according to the newly-appointed director of the Bytecode Alliance Technical Steering Committee, Bailey Hayes.

“We’re working on the roadmap,” said Hayes, who is also a director at the Wasm cloud company Cosmonic. “We have demos of this working — bubble gum, some wire, it works. But I expect us to have more real demos, probably by the end of the year. We’re talking a matter of months, not years.”

The Bytecode Alliance Foundation is a nonprofit organization that works on implementing a software foundation based on the W3C standards, including WebAssembly (aka, Wasm) and WebAssembly System Interface (WASI). Originally founded by Mozilla, Fastly, Intel and Red Hat in 2019, the Alliance’s first goal for the Wasm ecosystem is to ensure it remains secure and deny-by-default, Hayes said.

What’s WASI?

Originally, WASI was called “POSIX-like,” referring to “Portable Operating System Interface” — which is defined in Wikipedia as “a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems.” That analogy proved confusing, however.

“It’s not meant to be POSIX — that’s a common misconception,“ Hayes told The New Stack. “What we really meant was, there’s a common set of APIs that you expect, they [developers] can almost treat like this runtime that you’re targeting, and that lets you run really well outside of the browser. There’s certain things that just about every application depends on or expects. If you don’t have that, then you’re very restricted to what the WebAssembly module is allowed to do.”

WASI gives developers capabilities such as accessing a file system and doing standard I/O, the kinds of capabilities developers get out of libc, the standard C library. Developers will call them standard foreign function interface (FFI) bindings, Hayes said, but really they are handles that the host runtime gives the WebAssembly module in a capability-driven way.

“That’s really what WASI is — a set of APIs, a common set of APIs and if you target that common set of APIs, you’re now able to run really well outside of a browser or really outside of any JavaScript runtime,” Hayes said.

A Preview of WASI Preview 2: The Component Model

That was the work of WASI Preview 1, which is stable. The ByteCode Alliance is now working on WASI Preview 2, which is expected to launch sometime this year.

“I think it’s going to be a big deal,” Hayes said. “It moves us to this new proposal that we’ve also been working on, another standard. It started off being called the interface-types proposal, but that has now folded into what we call the component model.”

The component model is basically a new operating model — a new way for people to build WebAssembly applications. This will allow developers to break down language silos and combine libraries from other languages, using Wasm as a language “one ring” to bind them.

“The way the world works today, because you exist in the JavaScript ecosystem and I exist in the Rust ecosystem, we can’t intersect, and that means that entire swaths of developers, your frontend people and your backend people treat each other almost adversarially,” Hayes said. “That’s part of what the component model is meant to enable, is that I can seamlessly work across language boundaries in a portable way.”

With the component model, developers could build a library in C++, a library in Rust, and a library in Python — or any other language, including JavaScript — and be able to build them together like Lego bricks, to make a complete application, she explained. And since security is the first concern for WebAssembly, they’d be able to do it in a secure way.

“The easiest analogy that people will understand is that it’s like going from a static library, static executable, to being able to work with dynamic libraries — but in terms of WebAssembly modules and with strict types and being able to expose those types to other WebAssembly modules that are components,” Hayes said. “It means that we completely change the way that we write software today. It means all of these silos that have existed for 20 years are gone.”

It’s not hard to see how revolutionary that could be for software development,

Currently, the focus is on Rust, JavaScript and C++, but some people are also working in Ruby, Python and Go. It's not limited to those languages, Hayes stressed.

“It’ll start small and I think it’ll grow exponentially,” she said.

Building a Component Registry

The Bytecode Alliance also has a special interest group working on building a component registry, although Hayes said that is still “very early days.” Its goal is to design a protocol so that other registries can speak the language of components, such as knowing what the component types are.

“Again, we have to circle back to the reason why we’re designing our own registry and our own protocol is so that we’re pulling in the latest advancements in security as far as content hashing, making sure everything’s signed the right way, and having an immutable log.”

Take, for instance, JavaScript, where the ecosystem typically involves npm — the node package manager.

“In that registry, if you said npm install, maybe there’s some new flag that we invent, but some new flag that says I’m dealing with a component, and it pulls in a component that’s using what we call the WARG protocol — the WebAssembly registry protocol,” she said. “That would make it so that you would be able to install components that were in theory written in any language, so you don’t have to learn that other language.”

Then developers only need to know things like which pieces from the library are needed and what functions to call. Last week at Cloud Native Security Con, Kyle Brown issued a call for security researchers to help with the registry.

“We don’t want to make the same mistake as basically every registry before us has in the past,” Hayes said.

Where Developers Can Start with WebAssembly

While the details of WASI Preview 2 are still being ironed out, there’s a lot that has been standardized with WebAssembly that developers can experiment with, Hayes said.

“WASI Preview 1 and also just the WebAssembly specifications 1.0 and 2.0 [are] super well supported everywhere,” she said. “Say you want to take languages that aren’t currently supported in the web, and run them on the frontend, that’s a great use case that is solid.”

The Emscripten toolchain is a good place to start, she added. It’s helpful for people who know JavaScript and want to learn a little C++, for example, or who know C++ and want to learn how to do just a little bit in JavaScript to make their app work in the web. Rust is another good starting point.

“Once WebAssembly work became underway, then it felt like Rust just exploded on the scene,” Hayes said. “So the two technologies have really co-evolved over time and what that means for somebody getting started working with Rust and WebAssembly is very seamless. It’s a solid developer experience.”

For example, using…

cargo build --target wasm32-wasi


…to target a wasm32 binary. That can then be run with a Wasm runtime such as Wasmtime. Wasmtime makes it possible to have Wasm modules in many different places, like on the desktop, at the edge, as a serverless function, etc., Hayes explained.

She concluded that the work being done within the Bytecode Alliance is about building a software foundation for Wasm that’s secure-by-default, capability-driven, and portable. They want the ecosystem to extend to many different environments, from the web to the edge. She encouraged anyone who is interested in contributing to any of their projects within the alliance to watch the monthly community stream for an overview and to join Zulip to chat about the development of Bytecode Alliance projects.

The post WebAssembly to Let Developers Combine Languages appeared first on The New Stack.

]]>
When Losing $10,000 to Cryptojacking Is a Good Investment https://thenewstack.io/when-losing-10000-to-cryptojacking-is-a-good-investment/ Mon, 06 Feb 2023 18:00:29 +0000 https://thenewstack.io/?p=22698857

When we received an abnormally high bill from Heroku several months ago, I immediately knew the root cause: Someone had

The post When Losing $10,000 to Cryptojacking Is a Good Investment appeared first on The New Stack.

]]>

When we received an abnormally high bill from Heroku several months ago, I immediately knew the root cause: Someone had exploited our architecture and hijacked our compute resources. I could guess what had happened in detail because we had known long before that such exploitation was possible. And we still permitted it. Not only that, faced with similar choices, I, as the chief technology officer, would make the same decisions again.

To understand the “how” and the “why,” we first need to talk about Wilco’s architecture. We’re a young company, just over a year old, that wants to enable developers to acquire and practice skills. Our uniqueness is in the way we do it: a full-blown simulation of a workplace.

Wilco users’ experience closely resembles a job in a tech company. They have a GitHub repository to work on, a Slack-like corporate messenger (with simulated coworkers) and a production environment: data, users, load — the whole nine yards.

In practice, this is very challenging. GitHub, Slack and other SaaS tools are built to be used by teams. In our case, we needed to create an isolated mini-organization for every user. Let’s see how we do it and why it was exploitable.

How Wilco Works

Every Wilco user gets their own production instance. They have a URL for a frontend and a backend that can communicate with each other, as well as a database instance that saves their data. This way, users can test their changes (using CI/CD), and we can simulate things like performance issues by sending requests that mimic traffic.

It’s imperative that our users’ production environments aren’t black boxes. We need to give them visibility into what’s going on, to debug problems and configure things — like they would on a real job.

The Race to MVP

The “original sin” was our desire to get Wilco’s MVP ready as soon as possible and get it into the hands of real users. Before investing the time and money to build a proper, scalable solution, we needed to test a lot of assumptions about whether there’s even demand for a product like ours. We also needed the ability to develop and kill features quickly and cheaply while retaining the flexibility to double down on something if it’s proven successful.

The easiest way was to connect each repo to two Heroku apps, one for the frontend and one for the backend. That way, we were able to create one organization but give each user access to only their apps. We’d push the code to the right apps, and Heroku would handle the rest.

In addition to being efficient, it was also cost-effective. We got a database for free using Heroku’s free tier (RIP) and could pay only for the time users utilized the resources. So if a user spent only a few hours a month with their apps, we’d pay just for those few hours.

Calculated Risk

You might have already noticed a problem with our plan. Heroku wasn’t built for what we were doing, so scaling it would be difficult. We knew about the limitations and even mapped them:

  1. Heroku doesn’t have application-level authorization and roles. If you’re invited to an app, there are no limitations on what you can do with the app.
  2. There’s a limit on the number of applications that can be created under the same account — 200 apps. It means we can only support 100 users (each has a backend and a frontend app) under one organization.
  3. The sophisticated use cases devs are used to aren’t possible. Microservices, queues, cache and staged rollout are either a downright no-go or very complicated because Heroku was not built for these kinds of scenarios.

For each of the risks, we wanted to understand the impact:

  • No app-level authorization. Users can play with the configuration if they have this kind of control. It’s possible to upgrade app dynos or add add-ons that will cost us money. Unfortunately, Heroku doesn’t support putting limitations on an app’s spend, or have notifications when a threshold is exceeded, so we’d need to constantly keep an eye on this. That said, it didn’t break the siloing — users could only affect their own apps.
  • The 200 apps limit. We’d hit a wall for every 100 users, but as a short-term fix, we could manually create a few more organizations.
  • Use-case simplicity. Not a problem; we can focus on the easier use cases for now.

So what’s the worst-case scenario? A big bill from Heroku, and users receiving an error when trying to create an app. Not great, but not horrible. The alternative, building something from scratch, seemed worse.

Succeed to Fail > Fail to Succeed

How surprised are you that all of the bad scenarios we outlined materialized during the first three months of Wilco’s existence? First came a partner who wanted quests that were way more sophisticated than we could provide, with multiple servers. Then, one of our campaigns went viral, so we needed to cap it after a few minutes when we got to the Heroku app limit. The cherry on top was a user who figured out that they could upgrade their dyno and add add-ons to misuse our compute resources, with us footing the $10,000 bill.

Were we sad? Of course not! This was a massive success! We got traction and a lot of valuable feedback without having to build a lot of our own infrastructure. Heroku, to its credit, was kind enough to waive a small portion of the bill. If we had to build everything upfront, the cost of experimentation would be much greater than this bill.

The main downside, if you can call it that, of our approach is that we relied for a few months on what we knew was a temporary solution. But there’s a ray of light here as well: we had time. While building the Heroku-based Wilco, we were already planning our homegrown solution and scoping the resources required to build it.

By the time we received the bill, we were already deep into building our solution. Most of the work was done without any sense of urgency, as things were working relatively well.

Summary

In startups, time is one of the most important things we need to optimize. We can’t waste time building scaffolding for things that might not be used by anyone. Sometimes there’s no choice but to roll the dice with a calculated risk.

While it’s hard to build a solution that you know will not scale, and might cost a pretty penny, consider the time you’ll save in the long run and the money this time costs. We did, which is why I do not regret even for a second getting that bill.

The post When Losing $10,000 to Cryptojacking Is a Good Investment appeared first on The New Stack.

]]>
Top 6 SaaS Security Threats for 2023 https://thenewstack.io/top-6-saas-security-threats-for-2023/ Mon, 06 Feb 2023 16:48:31 +0000 https://thenewstack.io/?p=22699528

With the New Year here and employees back from holiday vacations, it’s time for security teams to prepare for the

The post Top 6 SaaS Security Threats for 2023 appeared first on The New Stack.

]]>

With the New Year here and employees back from holiday vacations, it’s time for security teams to prepare for the security challenges anticipated for 2023. With SaaS sprawl ever growing and becoming more complex, organizations can look to six areas within their software-as-a-service environment to harden and secure.

Misconfigurations Abound

Enterprises can have thousands of security controls in their employees’ SaaS apps. One of the security team’s biggest challenges is to secure each of these settings, user roles and permissions to ensure they comply with industry and company policy.

Besides their obvious risk of misalignment with security policies, configurations can change with each update, and the many industry compliance standards compound their complexity. Adding to that challenge, SaaS app owners tend to sit in business departments and are not trained or focused on the app’s security.

Security teams should onboard a SaaS security posture management (SSPM) solution that provides deep visibility and control across a critical mass of applications in the SaaS stack. The solution must identify both global app settings and platform-specific configurations within each app.

SSPMs should provide security teams with context into security alerts and help answer questions like: Which users are subject to a certain misconfiguration? Are they admins? Is their multifactor authentication (MFA) enabled? By having these answers at their fingertips, security teams can enforce company and industry policies and remediate potential risks from any misconfiguration.
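
To make that concrete, here is a minimal Python sketch of the kind of check an SSPM automates: flagging admin accounts that don’t have MFA enabled in a hypothetical user export. The data shape and field names are illustrative assumptions, not any particular vendor’s API.

# Minimal sketch: flag admin accounts without MFA in a hypothetical SaaS user export.
# The field names ("role", "mfa_enabled") are illustrative, not a real vendor API.
users = [
    {"email": "alice@example.com", "role": "admin", "mfa_enabled": True},
    {"email": "bob@example.com", "role": "admin", "mfa_enabled": False},
    {"email": "carol@example.com", "role": "member", "mfa_enabled": False},
]

def admins_missing_mfa(accounts):
    """Return admin accounts that do not have multifactor authentication enabled."""
    return [a for a in accounts if a["role"] == "admin" and not a["mfa_enabled"]]

for account in admins_missing_mfa(users):
    print(f"Policy gap: admin {account['email']} has MFA disabled")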

SaaS-to-SaaS Access

SaaS-to-SaaS app integrations are designed for easy self-service installations, boosting efficiency and functionality. However, these features pose a security nightmare. The challenge is centered on the increasing volume of apps connected to the company’s SaaS environment.

On average, thousands of apps are connected without the approval or knowledge of the security team. Employees connect these apps, often to boost productivity, enable remote work, and better build and scale a company’s work processes.

However, when connecting apps to their workspaces, employees are prompted to grant the app access permissions. These permissions include the ability to read, create, update and delete corporate or personal data, not to mention that the app itself could be malicious.

By clicking “accept,” the permissions they grant can enable threat actors to gain access to valuable company data. Users are often unaware of the significance of the permissions they’ve granted to these third-party apps.

Falling in the shadow IT domain, these third-party apps must be discovered by security teams, who need to identify which pose a risk. By cross-referencing the access scopes requested by these apps with the users who authorized them, security personnel should be able to measure the level of access to sensitive data across the organization’s stack. An SSPM solution like Adaptive Shield can arm the security team with this type of discovery and control in addition to providing advanced reporting capabilities for effective and accurate risk assessments to drive actionable measures.

Get a demo of how an SSPM solution can help mitigate third-party app access. 

Device-to-SaaS User Risk

Security teams must deal with threats from users accessing their SaaS applications from different, compromised devices. Accessing a SaaS app via an unmanaged device poses a high level of risk for an organization, especially when the device owner is a highly privileged user. Personal devices are susceptible to data theft and can inadvertently pass on malware into the organization’s environment. Lost or stolen devices can also provide a gateway for criminals to access the network.

Organizations need a solution that enables them to manage SaaS risks originating from compromised devices. An SSPM solution can identify privileged users such as admins and executives, calculate user-risk levels and recognize which endpoint devices need to be more secured.

Figure 1. Adaptive Shield’s device inventory

Identity and Access Governance

Every SaaS app user is a potential gateway for a threat actor, as seen in the most recent Uber MFA fatigue attack. Processes to ensure proper user access control and authentication settings are imperative, along with validation of role-based access management (as opposed to individual-based access) and a clear understanding of access governance. Identity and access governance helps ensure that security teams have full visibility and control of what is happening across all domains.

Security teams need to monitor all identities to ensure that user activity meets their organization’s security guidelines. IAM governance enables the security team to act on arising issues by providing constant monitoring of the company’s SaaS security posture as well as its implementation of access control.

Data Leakage

Data leakage is a growing SaaS concern. Files or other resources that are shared with anyone who has a link, or shared without an expiration date, are at risk of falling into unauthorized hands, as we saw in the recent Nissan and Slack breaches.

Security teams need to introduce data leakage protection solutions, which are typically included in SSPM platforms. This includes security checks looking into the permissions for each file, and an asset inventory showing exposed or publicly shared files from across the SaaS stack.

Identity Threat Detection and Response

Threat actors are increasingly targeting SaaS applications through their users. As more data shifts to the cloud, SaaS applications are an attractive target that can be accessed from any computer with the right login credentials.

To prevent these types of attacks, organizations need to deploy SaaS identity threat detection and response (ITDR) mechanisms. This new set of tools is capable of identifying and alerting security teams when there is an anomaly or questionable user behavior, or when a malicious app is installed.

Final Thoughts

Gartner included SaaS security posture management (SSPM) in its 2021 report “4 Must-Have Technologies That Made the Gartner Hype Cycle for Cloud Security.” With an SSPM platform, like Adaptive Shield, organizations can prevent risk, detect and respond to threats, and harden their SaaS security ecosystem.

Learn how you can secure your entire SaaS stack through automation.

The post Top 6 SaaS Security Threats for 2023 appeared first on The New Stack.

]]>
Linux Foundation Study Assesses ‘Outsized’ U.S. Influence on Open Source https://thenewstack.io/futurewei-backed-lf-study-critiques-us-influence-on-oss/ Mon, 06 Feb 2023 15:43:06 +0000 https://thenewstack.io/?p=22699402

While the open source has always been about sharing the code for one and all, this ideal has been increasingly

The post Linux Foundation Study Assesses ‘Outsized’ U.S. Influence on Open Source appeared first on The New Stack.

]]>

While open source has always been about sharing the code for one and all, this ideal has been increasingly at odds with a range of factors, including software fragmentation, politicization, weaponization, and a creeping techno-nationalism, all of which can negatively impact open source’s vital collaborative framework.

Addressing these issues is a new report from the Linux Foundation, “Enabling Global Collaboration: How Open Source Leaders are Confronting the Challenges of Fragmentation,” authored by Anthony D. Williams, founder and president of the research firm the DEEP Centre.

The report was sponsored by Futurewei, Huawei’s U.S.-based research and development arm, and is a product of the Linux Foundation Research, founded in 2021.

Fragmentation is happening in the open source community for a variety of reasons, the Linux Foundation contends.

One potential culprit: governmental techno-nationalism, which can block the transfer of critical innovations across borders with protectionist measures, as has happened with both China and the U.S. Likewise, the war between Russia and Ukraine has jeopardized software supply chains.

But fragmentation can also come from lack of standardization, or the proliferation of too many domain-specific systems (see the Internet of Things). And, of course, it can come from cultural differences and language barriers as well.

The study is “a thoughtful discussion of how the open source community can continue to thrive and continue its massively impressive growth,” wrote Hilary Carter, Linux Foundation senior vice president of research and communications, in a written statement to The New Stack.

Back in the USA

In the report, the U.S. is singled out regarding interference and limitations that can happen with open source sharing and development, though more at a corporate, rather than governmental level.

“Although the open source community is increasingly international, several leaders argue that organizations headquartered in the United States have outsized influence in shaping most open source projects,” Williams writes.

This report is not specifically a critical study of the outsized influence of U.S. open source, Carter cautioned, but rather an assessment of whether fragmentation in open source exists as a whole.

That said, “The U.S. is home to the world’s largest economy, which gives it huge influence on all kinds of industries,” Carter noted. “Open source leaders are paying attention to this.”

Certainly, the U.S. government is paying attention to the global flow of technology. Last week, reports surfaced about how the Biden Administration is considering tightening bans by U.S. suppliers to provide chips to Huawei, a Chinese-based multinational company, for mobile phones.

In this report, China is used as an example of the growing global nature of open source.

The study’s author writes that China “has become a significant consumer of and contributor to open source technologies.” Nearly 90% of Chinese firms use open source technologies, and Chinese users are also the second most prolific group on GitHub after users from the United States.

“With China intent on boosting its software prowess, Chinese participation in open source will increase dramatically in the years ahead. China’s Ministry of Industry and Information Technology (MIIT) has expressed concerns about its domestic software industry’s international competitiveness and sees deeper participation in international open source projects as a means to place itself on an equal footing with global players.”

Methodology Questions

The key findings Williams communicated are:

  • Fragmentation is a double-edged sword. “Fragmentation challenges occur in developing open source solutions, but a decentralized ecosystem will always have some duplication and fragmentation,” Williams wrote. “Inefficient allocation of resources may occur, but efforts to reduce fragmentation could stifle competition and innovation and kill ‘the open source goose that laid the golden egg.’”
  • Fragmentation can increase costs and complexity for consumers and vendors of open source solutions.
  • The open source community is increasingly global, but language, culture, and geopolitics remain barriers to participation.
  • Techno-nationalism threatens open source collaboration.
  • Foundations can help align open source projects with similar objectives without “picking winners.”

Fifteen of the study’s participants were quoted in the final report; the total number of participants was not disclosed. The report’s findings and recommendations are based on input from the study’s participants, who the Linux Foundation says represent a cross-section of foundation leaders, member companies and end users. Their titles ranged from executive directors and C-level executives to community leaders, architects and engineers, the Linux Foundation states.

One immediate remedy for fragmentation comes in the form of fostering diversity, the foundation recommends.  The report notes that “there is considerable confidence in the ecosystem’s capacity to foster global inclusion.”

Carter notes that, for open source to succeed, “all industries, including the broader software and technology industries, need to pay attention to fostering diversity and inclusion.”

TNS Editor Joab Jackson contributed to this report.

The post Linux Foundation Study Assesses ‘Outsized’ U.S. Influence on Open Source appeared first on The New Stack.

]]>
How to Simplify Kubernetes Updates and Reduce Risk https://thenewstack.io/how-to-simplify-kubernetes-updates-and-reduce-risk/ Mon, 06 Feb 2023 15:00:19 +0000 https://thenewstack.io/?p=22699511

One of the advantages of using Kubernetes to run your infrastructure is that it makes keeping applications up to date

The post How to Simplify Kubernetes Updates and Reduce Risk appeared first on The New Stack.

]]>

One of the advantages of using Kubernetes to run your infrastructure is that it makes keeping applications up to date relatively straightforward. So it’s ironic that keeping Kubernetes itself up to date is considered much more of a problem.

It’s not that the updates themselves are an issue. Some software, such as Mirantis Container Cloud, will even do the updates for you. But that doesn’t mean that the update itself is without risk or the need to invest the time of some of your most costly people to prevent catastrophe.

In short, Kubernetes updates mean that multiple applications can break, so naturally, everyone is involved in preventing this from happening — developers, team leads, operators and security. Everything else stops until the update is complete, which can really cut into your bandwidth.

Let’s Look at Why Kubernetes Updates Can Be Such a Problem

The Kubernetes project website suggests a basic order of operations for updating clusters:

  1. Upgrade the control plane.
  2. Upgrade the nodes in your cluster.
  3. Upgrade clients such as kubectl.
  4. Adjust manifests and other resources based on the API changes that accompany the new Kubernetes version.
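
Before working through those steps, it helps to know exactly which versions are running where. As a rough illustration (not part of any official upgrade tooling), the sketch below uses the official Kubernetes Python client to print the control plane version next to each node’s kubelet and kube-proxy versions, so any skew is visible before you begin; it assumes a working local kubeconfig.

from kubernetes import client, config

# Minimal sketch: list control plane and per-node component versions before an upgrade.
# Assumes a kubeconfig is available locally; illustrative only.
config.load_kube_config()

control_plane = client.VersionApi().get_code()
print(f"Control plane: {control_plane.git_version}")

for node in client.CoreV1Api().list_node().items:
    info = node.status.node_info
    print(f"{node.metadata.name}: kubelet {info.kubelet_version}, kube-proxy {info.kube_proxy_version}")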

This seems like a simple process, but each step can be fraught with danger. Kubernetes is a fast-moving project that sometimes introduces breaking changes — for example, deprecating features, extending APIs and introducing new best practices, including new software components and so on. These changes can have widespread effects on how your cluster(s) work. Changes can affect:

  • How the cluster runs on infrastructure (host operating systems and networks). See this article for an example of a bug in last September’s release of Kubernetes version 1.25 that would make Kubernetes worker nodes unable to communicate over the network.
  • How the cluster works with other services and resources like cloud provider APIs, DNS, ingress, service mesh, storage, backup and so on.
  • How the cluster works with application-specific resources, used to help Kubernetes orchestrate the things you build and host on it.
  • And finally, a Kubernetes update can break applications themselves when any of these components change.

So updating a Kubernetes cluster is a deep, potentially scary and perhaps an expensive proposition. If the application that is at the heart of your business goes down, you’re at a standstill. Can you afford for that to happen? Can anyone? For how long?

To prevent this from happening, first you need your most technical people to read the release notes in detail and flag anything that will break something. These changes may be an immediate blocker, in which case you’ll need to adapt your implementation and/or applications before updating.

Then you need to test everything you plan to do before you do it. In practice, that means you need to build a (perhaps substantial) test cluster that duplicates your current environment in as many respects as possible (ideally all of them). You need to mount the latest version of your applications on it, make your integrations to it and make sure everything works “as in production.”

And then you need to perform the update process meticulously while testing for problems as you go, with an eye to halting and rolling back as issues are discovered and assets (manifests and resources) are altered to adapt to the new version of Kubernetes; then retry the update until you can accomplish it without incident.

The complexity of these operations goes way up as cluster size and sophistication increases, as products external to Kubernetes become important dependencies, and so on. Things also get harder as the applications you run get configured in more complicated ways.

Ways to Simplify the Process and Reduce Risk

The best way to manage this complexity is through automation, though surprisingly, many Kubernetes users use only very basic automation to deploy clusters. Monolithic automation may be able to move a single target cluster (and potentially hosting and surrounding infrastructure) to a new desired state, but it might not be up to the complex task of updating a cluster in several phases, interspersed with testing (and rollbacks and so on).

You might need to compose and test custom automation to manage your particular update process, which will then become something that needs to be tested with each update.

All of this involves cooperation between operators, architects, DevOps engineers and application developers, all of whom must take time away from their primary duties until the update is successfully completed.

The alternative is to work with partners and providers such as ZeroOps practitioners, who will take this burden off your shoulders. This “de-risking” of Kubernetes updates is actually a complex process in itself. Look to a critical-path Kubernetes operations partner to:

  • Help you plan software development and operations, make decisions about and build your Kubernetes cluster model. It’s possible to encode best practices from the start — in how you deploy Kubernetes clusters and how you build applications and services for them — to anticipate and prevent dependencies from evolving.
  • Plan for updates, perform necessary tests and build proof-of-concept clusters with limited footprints using pre-GA project software assets. This works best if the partner is actively maintaining and supporting the Kubernetes distribution that you use, which means staying away from the absolute “bleeding edge” of Kubernetes updates while still keeping your implementation contemporary (and, of course, fully supported).
  • Provide and support continuously evolving and improving automation — not just to deploy and manage whole clusters, but also to automate the entire update process so that it happens reliably within a short maintenance window. In principle, the goal is to make updates seamless and continuous, entirely without disruption to running applications and processes.
  • Extend your software development and operations teams with deep expertise required to interpret update communications. Work with the Kubernetes community to identify potential impacts and know enough about your operations and applications to flag “gotchas” early. Make plans to remediate — continually improving your way of working to be more and more free of dangerous dependencies and increasingly update friendly.

In short, while Kubernetes updates should, in theory, be straightforward, they can’t be a “set it and forget it” proposition. There’s too much potential for breaks. Whether you’re using a ZeroOps partner or going it alone, Kubernetes updates should always be performed carefully and deliberately, even if it means that everything comes to a stop with all hands on deck until it’s complete.

The post How to Simplify Kubernetes Updates and Reduce Risk appeared first on The New Stack.

]]>
How Platform Teams Can Align Stakeholders https://thenewstack.io/how-platform-teams-can-align-stakeholders/ Mon, 06 Feb 2023 11:00:13 +0000 https://thenewstack.io/?p=22699483

“I don’t really know anything about infrastructure. I’m not technical at all, but I’ve been lucky enough to work on

The post How Platform Teams Can Align Stakeholders appeared first on The New Stack.

]]>

“I don’t really know anything about infrastructure. I’m not technical at all, but I’ve been lucky enough to work on a few different clients on infrastructure and platform products.”

That’s how Poppy Nicolle Rowse, business analyst at the Thoughtworks consultancy, kicked off her talk at HashiCorp HashiConf Europe 2022. So why should we care about what she’s talking about? Because platform engineering is all about maximizing business value from technical products. It’s about breaking down another silo between business and technology. Platform engineering, at its core, is about bringing engineers closer to business drivers, and about giving businesses a better understanding of the value of technical work.

Appropriately, she was joined on stage by then-colleague and Deliveroo technical lead Chris Shepherd who brought ample hands-on infrastructure experience.

“When we’re talking about infrastructure, we mean all the stuff that devs need to build great products. So security, scalability, performance, all of that good stuff that you need to build all the good stuff on top,” Rowse said.

“But what you tend to see is development teams, infrastructure teams, building the same things again and again, to achieve the same results,” Shepherd continued.

This is where a platform and a platform team can offer these building blocks in a simpler way to reduce waste. But you still need people to adopt it, and those people aren’t external customers. You have the often bigger challenge of persuading your colleagues to use your product, and of convincing the business to continue to fund it. Let’s learn from Rowse and Shepherd how to engage with and align stakeholders to achieve platform engineering success.

Scope out Your Developers

“When we talk about users, we mean developers. We mean the teams who are consuming your products,” Rowse said.

That means you should treat your internal customers the same way as you’d treat the paying ones. Except we know most companies aren’t. Puppet’s 2023 State of DevOps Report, which focuses on platform engineering, found that organizations consistently under-invest in product management skills for their platform teams. About a third of respondents didn’t even have a product owner on their team, and about half didn’t have a pure product manager, but someone that was handling that alongside their DevOps or engineering duties.

The report found that embracing platform engineering hinges on adopting a product mindset with tight feedback loops to make sure “that they’re building systems that solve the problems their users face.” The Puppet team found that the same steps are required for a highly effective internal product team:

  • User research
  • Product roadmaps
  • Soliciting feedback
  • Iterating
  • Launching
  • Maintaining
  • Marketing

If a third of respondents don’t even have PMs, can you imagine how few have internal marketers or user researchers?

“We actually need to talk to our consumers and figure out what’s most important to them,” Rowse said, which is kicked off via a discovery or scoping phase to figure out what your customer wants. She recommends the agile practice of event storming or model storming, a practice of behavior-driven development that has all stakeholders — including those developer customers — jotting domains on sticky notes, then grouping them together and organizing them into logical processes. It’s called a storm because it kicks off a bit chaotic but then creates a safe environment for all ideas to be considered. It can be done co-located on a wall or on a visual collaboration tool like Mural, Miro or Jamboard.

“You want to put all the different pain points. This gives you a beautiful visual heat map of, ‘Okay, we can see the areas where there’s a lot of pain points going on. This is where we maybe need to focus some of our efforts’,” Rowse said, recommending running event storming sessions not just at the start of your platform engineering journey, but once it’s live-in-production as well as running day-to-day.

Plan for Each Persona’s Pain Points

“I know your users are developers, and I know a lot of you are developers, so you think, ‘OK, I already know what’s the best thing and what’s most appropriate here.’ But actually not all of the users are the same,” she said. Embedding on dev teams or borrowing devs to help build the platform are ways to overcome platform engineers assuming what the dev teams want.

From the start, Rowse reminds us to get a wide range of stakeholders talking: “Talk to different teams. Talk to your teams that have been running for five years. Talk to your new teams who are spinning up brand-new products from scratch. Talk to your people who are brand new to the organization. Talk to the people who’ve been there for years, and just get that broad range. Get them all in the same room. Do the event storming with them all at once.” Or else, she continued, “host asynchronous sessions and compare maps to find where they differ and where they are repeating work” — which often becomes first priority.

Rowse then makes an interesting point that just because it’s a shared pain point, doesn’t necessarily mean the platform team has to create something new for it. The platform team may just facilitate a conversation where learnings are shared across teams.

It also will vary by team, which is why she recommends building different customer personas. Infrastructure experts may just want to continue to build their own thing, while your front-end developers — that suddenly have to learn cloud engineering outside of their job descriptions — are “really frustrated at the process and don’t necessarily have the skills or capabilities to deliver infrastructure stuff themselves and spin up things themselves.”

If platform engineering is new to your organization, Rowse advises: “Be really purposeful about the scope and who you decide to be your customer. It might be that you’ve got a load of legacy stuff, and, actually, it’s a bit too mangled and awful to migrate over [to the cloud], so actually you’re going to focus on building infrastructure products that can really accelerate the delivery of your newer products. Or it might be that, actually, you’ve got a handful of products being built and they’re the top priority in your organization, so you’re really going to narrow in on those ones and make sure that you’re supporting the delivery of those products.”

Remember, at the end of the day, it’s about increasing the business value of engineering.

“And don’t assume that people are just going to come because you’ve built this fancy thing and you’ve spent loads of time on it,” warned Rowse.

This means continuously interacting and asking for feedback from your developer customers, especially during the build stage. Check that you’re building what they want and can use, she advised. The more you invest in engaging with them, the more they are going to be ready to use it — because the more it’ll suit their needs.

‘Architectaeology’ and Alignment

As a platform team is meant to serve several or all developer teams, it’s important to align cross-organizational values and then document and over-communicate any architectural decisions based on them.

Figure: lean value stream mapping for a faster release cycle

Rowse said to kick off with a shared strategy like lean value stream mapping:

  • Vision
  • Measurable, achievable goals
  • Bets, the deliverables to support the goals

“The things that we might actually build are actually quite different depending on your vision,” meaning alignment around vision is key, Rowse said, clarifying what you want to get out of your infrastructure platform. “Maybe it’s cost. Maybe it’s just pain points from all your different developers and keeping your developers happy. Maybe it’s that you don’t have capability within the teams so you need these centralized products that are super easy to use.”

Figure: lean value stream mapping when security is the most important value

She also offered an adapted version of Scaled Agile Framework’s Weighted Shortest Job First (WSJF) as a way to take the subjectivity out of prioritization, looking for the biggest wins with the least amount of effort needed.

Figure: a WSJF-style scoring example in which the highest-value, shortest task is putting Launch Darkly on-prem

“This is so important, not just to get that priority, but also to get that alignment and make sure that actually all your non-technical stakeholders really understand the value you’re going to deliver here. And because you’ve spent that time doing it, you are fairly confident that that is going to deliver your vision and your strategy,” Rowse said.
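
For teams that want to see what such a scoring exercise looks like in practice, here is a minimal sketch of a WSJF-style calculation; the candidate items and the scores assigned to them are invented for illustration and aren’t taken from the talk.

# Minimal sketch of WSJF-style prioritization: cost of delay divided by job size.
# The candidate items and scores below are invented for illustration only.
candidates = [
    # (name, business value, time criticality, risk reduction, job size)
    ("Self-service environment provisioning", 8, 5, 3, 5),
    ("Centralized secrets management", 5, 8, 8, 8),
    ("Golden-path CI/CD template", 8, 3, 2, 3),
]

def wsjf(business_value, time_criticality, risk_reduction, job_size):
    """Weighted Shortest Job First: cost of delay divided by job size."""
    return (business_value + time_criticality + risk_reduction) / job_size

for name, *scores in sorted(candidates, key=lambda c: wsjf(*c[1:]), reverse=True):
    print(f"{wsjf(*scores):5.2f}  {name}")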

Shepherd also emphasized that the platform team has to communicate out to the widest stakeholders. Diagrams, documents and Wardley Maps have to be circulated early and often.

The platform teams should outlive product teams, he remarked, “because you’re building stuff on which other teams are going to be running their stuff, right?” And staff changes pretty fast in tech. In order for an organization to adopt a platform mindset, he says you need to embrace “architectaeology” — which includes documentation about why the platform was built that way, the roadmap and the technical vision behind it.

“This is forward-looking, but also backward-looking, to help engineers in the future understand the decisions that have been made and the thing that you’re building,” Shepherd said, pointing to the C4 model as the best way to visualize “architectaeology”:

  • Context diagram – describes at a very high level
  • Containers diagram – zooms in on various components and how they connect
  • Components diagram – how to use and configure in your environment
  • Code diagram – entity relationships and classes, especially when building your own, highlight quirks

Figure: an example containers diagram showing a new service whose logs flow to CloudWatch, then Fluentd, then Elasticsearch

Not every diagram speaks to every stakeholder, but, as pagers pass and roles change, this architecture will always be valuable to someone.

This is also supported by architectural decision records, stored in your code repository, each capturing:

  • Decision
  • Date made
  • Context of why
  • Consequences (can include licenses or subscription)
  • Who was in the room

If You Build It, Will They Come?

It’s all well and good for stakeholders to be aligned, but will your developer customers actually want to use your shiny new, value-driven platform? And how usable is it, anyway?

Figure: a complex user journey in which developers cross the platform barrier five times

“Time spent with your users is really valuable,” Shepherd said, “but time spent with your users to have to onboard to your system is wasted effort.”

From the start, he argues, you have to aim for a self-service experience, especially in onboarding. Whenever possible, this should be achieved via “one and done” where ideally users only interact with the operator one time, or, at least, reduce the operational steps down to a minimum.

Figure: a simpler user journey in which developers interact with operations only once

And, of course, he says, eat your own dog food and go through every step, like onboarding, to make it as smooth as possible.

Keep a Minimum Viable Product mentality and build enough to gain feedback and then iterate. Because, just like an external-facing SaaS product, your success in platform engineering hinges on having users.

The post How Platform Teams Can Align Stakeholders appeared first on The New Stack.

]]>
Can C++ Be Saved? Bjarne Stroustrup on Ensuring Memory Safety https://thenewstack.io/can-c-be-saved-bjarne-stroustrup-on-ensuring-memory-safety/ Sun, 05 Feb 2023 14:00:42 +0000 https://thenewstack.io/?p=22698999

There’s turmoil in the C++ community. In mid-January, the official C++ “direction group” — which makes recommendations for the programming

The post Can C++ Be Saved? Bjarne Stroustrup on Ensuring Memory Safety appeared first on The New Stack.

]]>

There’s turmoil in the C++ community. In mid-January, the official C++ “direction group” — which makes recommendations for the programming language’s evolution — issued a statement addressing concerns about C++ safety. While many languages now support “basic type safety” — that is, ensuring that variables access only sections of memory that are clearly defined by their data types — C++ has struggled to offer similar guarantees.

This new statement, co-authored by C++ creator Bjarne Stroustrup, now appears to call for changing the C++ programming language itself to address safety concerns. “We now support the idea that the changes for safety need to be not just in tooling, but visible in the language/compiler, and library.”

The group still also supports its long-preferred use of debugging tools to ensure safety (and “pushing tooling to enable more global analysis in identifying hard for humans to identify safety concerns”). But that January statement emphasizes its recommendation for changes within C++.

Specifically, it proposes “packaging several features into profiles” (with profiles defined later as “a collection of restrictions and requirements that defines a property to be enforced” by, for example, triggering an automatic analysis.) In this way the new changes for safety “should be visible such that the Safe code section can be named (possibly using profiles), and can mix with normal code.”

And this new approach would ultimately bring not just safety but also flexibility, with profiles specifically designed to support embedded computing, performance-sensitive applications, or highly specific problem domains, like automotive, aerospace, avionics, nuclear, or medical applications.

“For example, we might even have safety profiles for safe-embedded, safe-automotive, safe-medical, performance-games, performance-HPC, and EU-government-regulation,” the group suggests.

Elsewhere in the document they put it more succinctly. “To support more than one notion of ‘safety’, we need to be able to name them.”

But the proposed changes echo thoughts that emerged in a kind of showdown in December with the federal government. The mid-January statement notes concerns raised about the safety of C++ by a particularly heavy-hitting organization: the U.S. Department of Commerce’s influential National Institute of Standards and Technology. And in November, America’s National Security Agency also called out C++ in an information sheet on software memory safety (as part of its mission to identify threats to various federal systems and “issue cybersecurity specifications and mitigations.”)

Maybe it was that high-level concern that ultimately planted the seeds of change…

A National Security Issue

The NSA had cited estimates from Microsoft and Google that, over several years, roughly 70% of vulnerabilities come from memory safety issues. They followed this with a warning that these simple programmer mistakes can allow attackers to access sensitive information or even execute unauthorized code that leads to large-scale network intrusions. So whether it’s overflowing a memory buffer or memory allocation vulnerabilities, race conditions or uninitialized variables — “all of these memory issues are much too common occurrences.”

A 2019 Microsoft security presentation found 70% of vulnerabilities from 2006 to 2018 involved memory safety

Yes, software analysis tools and “operating environment options” can spot many of the issues, but the NSA had still recommended, “when possible,” to just use a memory-safe language instead.

To be clear, they defined this as a language where through run-time and compile-time checks, memory “is managed automatically as part of the computer language; it does not rely on the programmer adding code to implement memory protections.” The NSA provided as its examples: C#, Go, Java, Ruby, Rust, and Swift.

Responding in December on the Open Standards website, Stroustrup had countered that he doesn’t consider those languages superior to C++ “for the range of uses I care about.”

Stroustrup also objected that the NSA’s discussion of safety “is limited to memory safety, leaving out on the order of a dozen other ways that a language could (and will) be used to violate some form of safety and security… There is not just one definition of ‘safety’, and we can achieve a variety of kinds of safety through a combination of programming styles, support libraries, and enforcement through static analysis.”

Along the way, Stroustrup also made a second argument: that in some real-world scenarios where performance is paramount, “Not everyone prioritizes ‘safety’ above all else.” So Stroustrup argued that the “sensible” thing to do is to make a list of safety issues (including undefined behavior), then find ways to prevent them as needed using pre-execution debugging tools (like static analyzers).

Along those lines, Stroustrup had already been calling for both compiler options and code annotations for C++ that request type safety (and resource safety), saying this “lets you apply the safety guarantees only where required and use your favorite tuning techniques where needed….”

The newly-proposed “profiles” seem like an in-language way of accomplishing just that.

Safety in C++

Stroustrup also objected to C++ being lumped in with C in the NSA’s document. He pointed out that even now “The C++ Core Guidelines specifically aims at delivering statically guaranteed type-safe and resource-safe C++ for people who need that without disrupting code bases that can manage without such strong guarantees or introducing additional toolchains.”

And those Core Guidelines are already supported by Microsoft’s Visual Studio analyzer (and its memory-safety profile), as well as many static analyzers. (Stroustrup also cites the linter clang-tidy, which he says has some support for the C++ core guidelines.) This approach allows C++ “to completely deliver those guarantees at a fraction of the cost of a change to a variety of novel ‘safe’ languages,” Stroustrup argued.

Stroustrup also cited another paper he wrote in 2021 which made the case that “Complete type-and-resource safety have been an ideal (aim) of C++ from very early on (1979) and is achievable through a judicious programming technique enforced by language rules and static analysis.” (Later Stroustrup writes that the solution is “a carefully crafted set of programming rules supported by library facilities and enforced by static analysis.”)

The paper acknowledged that on its own, “By default, the Core Guidelines do not provide complete type-and-resource safety” — but argued that it can be guaranteed by enforcing additional rules (“as implemented by the Core Guidelines checker distributed with Microsoft Visual Studio,” for example.) In a nod to Rust’s compiler-based type-checking, Stroustrup wrote that “The compiler is not our only tool, and has never been,” providing specific examples of the powerful checks that can be performed by a (pre-compilation) static analysis. For example, static analysis can:

  • Prevent unsafe type conversions
  • Prevent the creation of uninitialized objects
  • Ensure no memory-referencing pointer “escapes” beyond its narrowly-defined scope to erroneously point to something else.

In December’s response to the NSA, Stroustrup wrote that we live in a world where “the billions of lines of C++ code will not magically disappear,” adding that instead it’s important to have a gradual adoption of these safety rules (and the adoption of different safety rules, where appropriate).

The NSA’s paper seemed to agree with some of this — to a point. The NSA paper included tips on “hardening” code written in a non-memory-safe language, recommending tools for both static analysis (examining the source code) and dynamic analysis (performed while the code is executing) — along with vulnerability correlation tools to simplify the results. “Working through the issues identified by the tools can take considerable work, but will result in more robust and secure code.”

And the NSA’s paper does note the “considerable protection” provided by “the use of added protections to non-memory safe languages”. (It also suggests hardening the compilation and execution environment through security features like Control Flow Guard, Address Space Layout Randomization, and Data Execution Prevention.)

A Long-Standing Design Goal

In a new interview for Honeypot’s “Untold Developer Stories”, 72-year-old Stroustrup looked back to his student days, when as a young man he’d discovered that he wasn’t as good at math as he thought he was — but that “machine architecture was really fun.”

But there was less to say in 2020 when someone asked Stroustrup what he’d change if he could go back in time. “That’s a time machine question, and we don’t have a time machine,” he replied.

“One of the interesting aspects of programming language design is that if you succeed, you have what you did many many years and decades ago, and you have to live with it. Once you get users, you have responsibilities, and one of the responsibilities is not to break their code… There’s a few hundred billion lines of C++ out there, and we can’t break them.”

Stroustrup stressed his faith in C++. “I think C++ can do anything Rust can do, and I would like it to be much simpler to use.” But he also said in that 2020 interview that basic type safety — ensuring variables access only their clearly-delineated chunks of memory — was one of his earliest design goals, and one he’s spent decades trying to achieve. “I get a little bit sad when I hear people talk about C++ as if they were back in the 1980s, the 1990s, which a lot of people do,” Stroustrup said in 2020.

“They looked at it back in the dark ages, and they haven’t looked since.”

The post Can C++ Be Saved? Bjarne Stroustrup on Ensuring Memory Safety appeared first on The New Stack.

]]>
Install Minikube on Ubuntu Linux for Easy Kubernetes Development https://thenewstack.io/install-minikube-on-ubuntu-linux-for-easy-kubernetes-development/ Sat, 04 Feb 2023 14:00:33 +0000 https://thenewstack.io/?p=22698826

I’ve long said that Kubernetes is far from user-friendly. Not only is deploying pods and services to a cluster a

The post Install Minikube on Ubuntu Linux for Easy Kubernetes Development appeared first on The New Stack.

]]>

I’ve long said that Kubernetes is far from user-friendly. Not only is deploying pods and services to a cluster a challenge, but simply getting the cluster up and running can be a real nightmare.

Fortunately, there are a few applications available that make deploying a Kubernetes-friendly environment relatively simple. I’ve already talked about deploying a Kubernetes cluster via MicroK8s and this time around we’ll do something similar with a tool called Minikube. The purpose of Minikube is to create a local Kubernetes cluster for development purposes. This means you won’t be deploying apps and services at scale with this platform. Instead, Minikube is a great way to start learning how to work with Kubernetes.

You can deploy Minikube on Linux, macOS, and Windows. Given Linux is my operating system of choice, I’ll demonstrate on Ubuntu Linux. With this tutorial, you should be able to get a Kubernetes environment up and running in less than five minutes.

Ready? Let’s get busy.

Requirements

To make this work, you’ll need a running instance of a Ubuntu-based Linux distribution and a user with sudo privileges. The minimum requirements of Minikube are:

  • Two CPUs or more
  • 2GB of free memory
  • 20GB of free disk space

With those requirements met, it’s time to install.

Installing Docker CE

Unlike a regular Kubernetes cluster, the Minikube setup shown here uses Docker as its driver. So, before Minikube will function, you must first install the Docker runtime. Here’s how.

The first thing to do (after you’ve logged into your Ubuntu instance) is to add the official Docker GPG key with the command:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg


Next, add the Docker repository:

echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null


Install the necessary dependencies with the following command:

sudo apt-get install apt-transport-https ca-certificates curl gnupg lsb-release -y


Install the latest version of the Docker engine with these two commands:

sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io -y


Finally, add your user to the docker group with the command:

sudo usermod -aG docker $USER


Log out and log back in for the changes to take effect.

Docker is now installed.

Installing Minikube

Download the latest Minikube binary with the command:

wget https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64


Copy the file to the /usr/local/bin directory with the command:

sudo cp minikube-linux-amd64 /usr/local/bin/minikube


Give the Minikube executable the proper permissions with:

sudo chmod +x /usr/local/bin/minikube


Next, we need to install the kubectl command line utility. Download the binary executable file with:

curl -LO https://storage.googleapis.com/kubernetes-release/release/`curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt`/bin/linux/amd64/kubectl


Give the new file the executable permission with:

chmod +x kubectl


Move the file into /usr/local/bin with the command:

sudo mv kubectl /usr/local/bin/


You can now start Minikube with the command:

minikube start --driver=docker


After the command completes, you can verify it’s running properly with the command:

minikube status


The output will look like the following:

minikube
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured

Using kubectl via Minikube

With Minikube ready, you can now start to play around with Kubernetes. For example, you can check on the status of the cluster with the command:

kubectl cluster-info


The output of the command will look something like this:

Kubernetes control plane is running at https://192.168.49.2:8443
CoreDNS is running at https://192.168.49.2:8443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy


To further debug and diagnose cluster problems, use ‘kubectl cluster-info dump’.

Check the status of the nodes with the command:

kubectl get nodes


The output will look something like this:

NAME       STATUS   ROLES           AGE   VERSION
minikube   Ready    control-plane   16m   v1.25.3

Installing Add-ons

Minikube also includes a number of add-ons to extend the feature set, such as ingress, metrics server, and dashboard. To find out what add-ons are available, issue the command:

minikube addons list


Let’s say you want to add the Dashboard add-on, which can be achieved with the command:

minikube addons enable dashboard


The output of the command will include the address used to access your new Dashboard. Of course, the caveat to this is that it’ll be a local address, such as 127.0.0.1, and you can’t access it from outside the machine hosting Minikube. Because of this, your best bet is to install and use Minikube on a Linux distribution that has a desktop, otherwise, the Dashboard won’t be accessible.

Other than that, you can start developing or learning the ropes of Kubernetes with the help of Minikube. You certainly won’t be using Minikube in a production environment, but as a development environment, it’s hard to find a simpler way to work with the Kubernetes platform.

The post Install Minikube on Ubuntu Linux for Easy Kubernetes Development appeared first on The New Stack.

]]>
Nextdoor’s Plan for Catching New Release Troubles Early on https://thenewstack.io/nextdoors-plan-for-catching-new-release-troubles-early-on/ Fri, 03 Feb 2023 21:02:57 +0000 https://thenewstack.io/?p=22697947

Hyperlocal social networking service Nextdoor uses old-school statistics to isolate problems early on with new mobile app updates. There are

The post Nextdoor’s Plan for Catching New Release Troubles Early on appeared first on The New Stack.

]]>

Hyperlocal social networking service Nextdoor uses old-school statistics to isolate problems early on with new mobile app updates.

There are tens of millions of active users weekly on Nextdoor’s mobile Android and iOS applications. To keep up with regular weekly releases, Nextdoor built an App Release Anomaly Detection tool that stops client-side regressions before they have a severe impact on the application. The blog post written by Walt Leung, Nextdoor software engineer, and Shane Butler, Nextdoor data scientist, goes into great detail on the topic.

Nextdoor employs a phased rollout strategy for its weekly mobile releases. A new version is initially limited to a small set of users to ensure safe and scalable deployments. But, since a very small and specific subset of their total user base engages with the latest versions first, traditional observability methods aren’t as effective in earlier phases. Nextdoor started using “difference-in-differences” analysis to identify app session decline 10 days earlier than week-over-week figures.

The Problem

Phased rollouts aren’t the problem; observing them is. Out-of-the-box methods fall short in early phases for two reasons: early data is very specific to the early users, and there is very little of it. The first users to adopt a new version tend to be more active than the median, skewing the overall data sample in one direction.

Small relates to the sample size in general. Consider a hypothetical new version, v1.234.5, released on March 4th. If a regression was introduced where an app session wasn’t counted 5% of the time, at a 1% rollout, the aggregate impact is roughly 0.05% of all iOS app sessions. It’s a number that’s impossible to detect with aggregate-level observability. Factor in the high activity level of early adopters, and maybe 0.06% or 0.07% of all sessions were impacted. It’s hard to see, and hard to draw a clear conclusion from.

… until the full rollout when a 5% regression is “business critical”.

The top trend line shows app sessions. The bottom trend line shows the release adoption over time due to phased rollouts.

The information Nextdoor needs from the phased rollout’s early adopters is, “what is the difference between their actual app sessions after adoption compared with their hypothetical app sessions had they never adopted the release in the first place?” This is an unobserved counterfactual in statistics. Difference-in-differences analysis measures it.

The Solution — Applying Difference-in-Differences Analysis

Nextdoor can’t compare users who adopted the new version and those who didn’t directly — their underlying behaviors are too different. But they can look at overall trends and turn all metrics into relative metrics.

Nextdoor applied difference-in-differences analysis of this effect by accounting for the separate time-varying effects of users that have and haven’t adopted a release. For v1.234.5, this meant calculating the difference in app sessions of both early adopters and non-adopters for the three days before the release period and three days after. Nextdoor observed a -0.02 decline in early adopters and a +0.20 increase in non-adopters.

It’s critical to make sure both groups exhibit similar behavior pre-adoption. If the trends are similar before adoption, it’s reasonable to assume they would have continued to move in parallel had the release never shipped (the pre-trend, or parallel-trends, assumption).

This didn’t happen in the case of v1.234.5, since there was a decrease of -0.02 in the adopters and an increase of +0.20 in the non-adopters. The difference-in-differences is calculated to estimate a comparison against an unobserved counterfactual.

(-0.02) - (+0.20) = -0.22 decrease in app sessions due to iOS release v1.234.5

Difference-in-differences analysis and a sample size of hundreds of thousands of users gave Nextdoor high confidence in similar pre-trend behavior, with a standard deviation bound over the few days preceding adoption. If that behavior holds, they fit a linear regression model that estimates the average effect of a release for any particular metric.

y = β0 + β1* Time_Period + β2* Treated + β3*(Time_Period*Treated) + e
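
As a rough sketch of how a regression like this can be fit, assuming a per-user table with columns for the time period, adoption status and an app-sessions metric (the column names, the sample file and the use of statsmodels here are illustrative assumptions, not Nextdoor’s actual implementation):

import pandas as pd
import statsmodels.formula.api as smf

# Minimal difference-in-differences sketch. Assumes a per-user table with:
#   app_sessions - the metric of interest
#   post         - 1 for observations after the release period, 0 before
#   adopter      - 1 if the user adopted the new version, 0 otherwise
# Column names and the file are illustrative, not Nextdoor's actual schema.
df = pd.read_csv("sessions.csv")

# The coefficient on the post:adopter interaction is the
# difference-in-differences estimate of the release effect.
model = smf.ols("app_sessions ~ post + adopter + post:adopter", data=df).fit()
print(model.summary())
print("Estimated release effect:", model.params["post:adopter"])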

Results

Now statistically significant negative effects can be measured across multiple app sessions metrics.

Figure: the average % lift of the metrics Nextdoor ran App Release Anomaly Detection on for v1.234.5

Because of difference-in-differences, phase 1 of the rollout is incredibly informative. Nextdoor can flag an app sessions decline 10 days earlier than the previous approach, diagnose the decline to a specific release, and isolate the regression to less than 1% of users. The engineering team also no longer needs to factor in external variables such as seasonality or the day of the week.

Nextdoor credits the App Release Anomaly Detection as one of the foundational elements that allow the engineering team to iterate quickly and effectively by preventing “nearly all severe critical client-side regressions,” the Nextdoor engineers write. They also credit this tool for the “peace of mind it gives us to release bigger changes at a more rapid pace.”

The post Nextdoor’s Plan for Catching New Release Troubles Early on appeared first on The New Stack.

]]>
Is the Answer to Your Data Science Needs Inside Your IT Team?  https://thenewstack.io/is-the-answer-to-your-data-science-needs-inside-your-it-team/ Fri, 03 Feb 2023 17:19:55 +0000 https://thenewstack.io/?p=22699375

The demand for data scientists is at a fever pitch. A 2022 survey revealed that insufficient talent or headcount is

The post Is the Answer to Your Data Science Needs Inside Your IT Team?  appeared first on The New Stack.

]]>

The demand for data scientists is at a fever pitch. A 2022 survey revealed that insufficient talent or headcount is the biggest barrier to the successful enterprise adoption of data science.

But there might be a better option than trying to compete for talent on the open market: upskilling and cultivating data science talent from within your team.

Gartner calls this practice “quiet hiring.” A play on the well-known “quiet quitting” trend, quiet hiring involves quietly tapping into current resources to fulfill corporate needs without adding headcount. It means creating an atmosphere where current talent — including developers, operations managers and data scientists — can learn, stretch their boundaries and thrive.

One of the keys to quiet hiring, according to Gartner, is offering “upskilling opportunities for existing employees while meeting evolving organizational needs.” The first part sounds ideal for nurturing data science talent from within your organization.

The second part implies a need for continual education so the resources you uncover can remain at the top of their games and ready to respond to changing corporate dynamics.

Where Is the Data Science Talent in Your Organization?

Perhaps it’s in your development team.

The lines between data science and software development have blurred considerably in recent years. Applications have become much more data-driven, requiring data scientists and app developers to work closely together.

As a result, application developers might be interested in familiarizing themselves with common data science languages, such as Python and R, and tools such as PyTorch, TensorFlow and Jupyter Notebook.

This is not meant to replace the education and training they’ve already invested in. But as developers begin to learn more about the basics behind data science, they may want to learn more about data science processes so they can contribute to those processes.

If the future is data-driven applications, understanding what it takes to create those apps can’t be a one-way street. So data scientists should be just as curious about what goes into development.

It’s been said that software development skills are essential for data science, and that’s true: A basic understanding of development processes and how to work with software engineers makes it much more likely that models will be put into production. Therefore, continually educating data scientists on the latest methods for data analysis and model development, along with software development best practices, will help them advance in their expertise.

Wherever your talent lies, it’s important to continually cultivate it so that your team can remain challenged, fulfilled, happy and far less likely to leave for another opportunity. Here are four strategies that can help you build and retain your valuable data science resources.

Feed the Thirst for Knowledge

Recently, Red Hat commissioned a couple of surveys to find out where data scientists and application developers go for information or to have questions answered. We discovered that both parties are ravenous for information. They get their news and insights through myriad sources, from online publications to message boards to conferences, trade shows and workshops. Both have a desire for learning and knowledge sharing.

Providing your team members with easy access to the information they need can help them thrive in the workplace and help your organization grow. Encourage them to peruse their usual sources, and consider creating an internal information library with information on the latest tools for data analytics, deep learning, model optimization and more.

It’s also important to give your team time to learn. For example, Red Hat is an open source software company. We see new technologies and projects being developed continuously in the open source community. We want to keep our teams apprised of those innovative technologies and projects, so we maintain a learning library with various modules and workshops.

If someone wants to learn how to get started with Jupyter Notebook, optimize their models with OpenVINO or build machine learning into applications, they can go to the library, find what they’re interested in and learn how to apply it. We also provide a regular day of learning to give individuals the space to invest in expanding their skills.

Provide a Common Platform for Collaboration

Supplying team members with access to written knowledge and workshops is a great start, but real continuous learning stems from allowing people to participate in team projects that promote collaboration. That means providing access to the tools themselves, as well as the ability to collaborate without silos.

Allowing data scientists and developers to work together in real time provides multiple benefits. First, it allows for more expeditious and agile development of intelligent apps. Second, it allows developers and data scientists to learn about each other’s needs and processes. When each group is so closely connected and understands each other, it improves the chances of project success.

Agile application development requires everyone to work in sync. When Red Hat began exploring ways to bridge the gap that has traditionally existed between developers and data scientists, we expanded on the idea of creating a common platform for real-time collaboration between them.

Within this common platform, development and data science teams would have access to all the tools they need to perform their tasks, and could quickly build and share production pipelines.

This platform, Open Data Hub, started in the office of the CTO a few years ago. It connected data scientists, developers and operations managers to create a common platform for MLOps.

Open Data Hub was so effective at solving our internal data science and development challenges that we ultimately evolved it into a commercial offering called Red Hat OpenShift Data Science. It brings data science and development closer together to expedite application development and deployment.

It also allows teams to better understand how their work affects the process. With that understanding, they can learn how to optimize their contributions and, by extension, their knowledge of how to create intelligent applications.

Give Data Scientists the Tools They Want and an Environment to Use Them

Knowledge sharing and hands-on experience are vital keys to cultivating and retaining the data science talent in your organization, but it’s equally essential to allow scientists to get creative with the skills they’re building. To do this, they’ll need the right tools and access to environments that welcome experimentation and innovation.

The open source community has long been a hotbed of innovation for software engineering, but open source data science is growing quickly, too. PyTorch, scikit-learn, TensorFlow, Kubeflow and others are all great examples of open source projects that resulted in the creation of some of the most powerful data science tools.

It’s important to provide your data scientists (and even developers with an interest in data science) with access to these and other tools, whether through a common platform or some other means. These tools provide them with the freedom to experiment, innovate, and add value to your organization. They help create an engaging and challenging environment that, ideally, results in talent retention.

Encourage your team to take those tools — and their skills — to the open source community, where they can experiment, refine their talent and offer their contributions.

The Open Data Hub, for example, is an open source community initiative to bring together over 20 technologies across the model life cycle on top of OpenShift, Red Hat’s Kubernetes-powered application platform. It is an excellent place for data scientists to learn about and participate in upstream efforts to build intelligent applications. They can collaborate with other scientists and continue to build their skills while helping to forge the future of data science.

Make It Easier for Everyone to Do What They Love

To continually refine a craft, people have to immerse themselves in activities they love. For data scientists, that means analyzing data, building and refining models, finding new and unique ways to incorporate artificial intelligence and machine learning, and more. Likewise, for developers it means writing quality code and developing software that solves real-world challenges.

But it’s hard for them to do these things when they’re distracted by bug fixes and other operational headaches. To help your teams continue to learn and grow, make it easier for them to do the things that are most important to them by alleviating the need to continually context shift between value-added work and distractions.

There are a couple of options. Consider using a managed cloud service that curates the latest open source tools and manages necessary updates and fixes so that teams can concentrate on honing their talents and maximizing their value.

Or, create an integrated model development and MLOps environment that brings together data science, development and operations. That way, data scientists and developers can focus on their tasks while operations ensure everything runs smoothly.

Investing in Internal Talent = Investing in Your Company’s Future

Intelligent applications are your company’s present and future. You need to be able to build them quickly, cost-effectively and at scale. The strategies outlined in this article will help you do that, but they’ll also help you accomplish something perhaps even more important.

By tapping into the talent that’s already around you, you’ll be able to grow your team’s data science and development capabilities. And by growing and nurturing that talent, you’ll help create a loyal, passionate and informed workforce primed to supply your organization with innovative thinking, creative solutions and truly smart applications.

In short, by investing in your internal talent, you’ll also be investing in your company’s data-driven future.

The post Is the Answer to Your Data Science Needs Inside Your IT Team?  appeared first on The New Stack.

]]>
Tech Works: How Can We Break Our Obsession with Meetings? https://thenewstack.io/tech-works-how-can-we-break-our-obsession-with-meetings/ Fri, 03 Feb 2023 16:30:39 +0000 https://thenewstack.io/?p=22699006

How many hours a week do you spend in meetings? Go on. Look at your calendar. I’ll wait. Your first

The post Tech Works: How Can We Break Our Obsession with Meetings? appeared first on The New Stack.

]]>

How many hours a week do you spend in meetings? Go on. Look at your calendar. I’ll wait.

Your first response was probably “Too [insert expletive of choice] many,” followed by shock at how many hours you actually do waste in meetings. Because the tech industry has a meeting problem. Daily stand-ups. Retrospectives. “Let’s jump on a bridge” calls. Kickoffs. Ask Me Anythings (AMAs). Team meetings that could’ve been one-to-ones. Agenda-less meetings that should’ve been an email.

What started as a way to check in on isolated teammates almost three years ago has grown into a habit of being constantly on camera. Meetings spill well over their time boxes, not considering that colleagues may need comfort breaks or caffeine to endure — let alone time to do their actual jobs.

It doesn’t just feel like we’re hiding our yawns through more meetings. Scheduling app ReclaimAI found that there was a nearly 70% increase in meetings between February 2020 and October 2021. There’s no way that co-located water cooler time had that much benefit. We really are having more meetings — which means we’re working longer days.

Then, a couple of weeks ago, Shopify became a beacon of hope, kicking off the year with a company-wide “calendar purge.” Calling meetings a bug, not a feature, the company mandated cutting any recurring meetings with three or more people. Any mega, 50-plus person meetings can only happen in a six-hour window on Thursdays. And Shopify has forbidden any meetings on Wednesdays, in order to foster precious uninterrupted time.

Is this the start of a welcome trend? For my inaugural Tech Works column, let’s figure out how to have fewer, better meetings.

How to Protect Your Own Time

“It’s one thing to ask me what I’m doing. It’s another one to ask me with the intention of taking that time,” said Jarrett Hill, a journalist, in a recent episode of his FANTI podcast. “I don’t like when people are asking me what I’m doing for the sake of them being able to take up whatever that time is — that evening, my day, my calendar.”

We rolled right into work-life fusion at the kickoff of the pandemic, but now we need to push back and recreate boundaries. How do we protect our time? How do we evade meetings when we really just want to get work done — or eat?

Some companies are trying to help workers set boundaries. At Spotify, colleagues are now encouraged to say no to meeting requests and leave large group chats.

That’s easier said than done. Still, it’s a hopeful sign that the industry might be giving agency back to the individual over their own time.

“I finally think the pendulum is moving towards more awareness on the role of meetings in the modern workday,” Rowena Hennigan, founder of RoRemote and a global expert in remote work and digital nomadism, told The New Stack.

Hopefully, this is a sign of a bounce-back based on lessons learned from the pager-always-on, developer burnout that clouded the early days of the pandemic. “Individuals are more aware of the need to be diligent in protecting and guarding their own time, to be effective and focused in their work,” Hennigan said.

People are beginning to ask the right questions, she observed, including:

  • Do we need a meeting at all?
  • I have a meeting request. Should I accept it?
  • What is the purpose of the meeting and where is the agenda?
  • What are the ideal decisions and outcomes of this meeting?
  • Who should attend the meeting to get the best outcomes and decisions?

There’s a growing awareness of time management and calendar blocking as a way to deliver it. I learned from remote work advocate Lisette Sutherland to always put exercise time right into my work calendar, and for at least five years my yoga classes have been as precious as any work meeting.

For Hennigan, it’s about taking the time to plan out all the repetitive scheduling that allows you to prioritize yourself, including marking your weekly planning sessions and lunch and other breaks as “repeat weekly.”

“People are less likely to simply offer up all of their time at work to the potential of a possible meeting. Guarding and controlling that schedule is a key skill for modern remote workers,” she said, and really any worker would benefit from the habit.

This can also mean choosing to check your email only two or three times a day or putting your devices on Do Not Disturb. And by putting an actual blocker on your calendar to focus on a specific task or project, you can worry less about someone sneaking in a last-minute meeting.

Of course, there are also those of us suspicious of Calendly and the like, because such tools let people put things on your calendar, usurping control over your schedule.

How to Protect Your Team’s Time

Maybe meetings are so inefficient because you haven’t invested in training your team on processes, tooling and ways to optimize communication. Almost half of the respondents to Mural and Microsoft’s just-released 2023 Collaboration Trends Report have left their jobs because of poor collaboration.

And this can’t be fixed with tools. While the collaboration tool market is set to double within a couple of years, 47% of the people who used five or more collaboration tools responded that they still run into obstacles to effective communication.

That could be because three out of five respondents have never learned formal collaboration skills. One remedy is going remote-first: the mindset and practice of treating company-wide communication as if it were remote, irrespective of whether you have just one colleague or all of them working offsite.

Even in a typically collocated company, this honed and enforced culture of written and asynchronous communication avoids unnecessary meetings whenever possible and enables flexible work whenever needed.

Just like we do with DevOps, we should look to elite remote-first teams to learn from their years of practice. Certainly, the 100-person team behind the productivity company Doist, distributed across 35 countries and 15 time zones, is one of those standouts. The company reports a retention rate of over 85%, alongside continuous revenue growth.

Doist’s Head of Remote Chase Warrington attributes a lot of this success to “a strong stance against meetings, making them the last resort instead of a go-to activity.” Based on a recent anonymous internal survey, he discovered:

  • All team members spend less than eight hours per week in meetings.
  • 65% spend less than two hours a week in them.
  • 88% agree or strongly agree the meetings they do attend are a good use of their time.
  • Teammates have a 24-hour window to respond to any messages.

“About 90% of our communication happens in writing through Twist, our team communication tool,” Warrington told The New Stack.

“Everything from project updates to feedback to proposals occurs asynchronously.”

He added, “This mindset shift, from meeting-centric to async-centric, has positively affected our bottom line, employee engagement, and productivity.”

New Rules for Meetings

Meetings are expensive, both in terms of time and productivity, according to Sid Sijbrandij, CEO and co-founder of GitLab.

The all-remote company, with roughly 2,000 employees spread across the globe, must wrangle time zones when it sets up meetings, so it must make them count.

On a December edition of Logan Bartlett’s Cartoon Avatars podcast, Sijbrandij spelled out how his company handles meetings. Pre-meeting written agendas are mandatory; notes are always taken, because what is actually discussed may differ from the stated agenda. Presentations are recorded and sent to attendees in the meeting invite, but not usually given during the meeting itself.

Also, it’s almost a “badge of honor,” the GitLab CEO said, to multitask during online meetings.

“We think it’s a spectacular coincidence if 100% of a meeting is relevant to you,” he told Bartlett. “So it’s totally cool to do your email on the side, to do whatever you want on the side. It’s OK to say, ‘Sorry, I wasn’t paying attention. Could you repeat the question?’”

We wouldn’t feel like being part of a team without some meetings. But where can you start to cut — maybe without going so far as Shopify? On the blog of collaboration tool Slack, writer Deanna DeBara shared three types of meetings she believes should never happen:

  • Status update meetings
  • Agenda-less meetings
  • In-person by default

It’s always good to poll your colleagues to see if they gain benefits from those daily stand-ups, and ask if they’d rather check in only weekly or use Jira or Slack to work transparently and asynchronously. And all business- or tech-driven meetings should have agendas sent out in advance; try to stick to them and end five minutes early.

Of course, each team and its needs are different, so clarifying upfront what a meeting means for your team is important. Remote teams should also create a team agreement, Hennigan recommended, “that includes clear guidelines on the purpose and focus of meetings, covering off the key criteria and guiding their team members on best practices.”

Some teams, she continued, also create best-practice checklists for remote meetings versus hybrid meetings versus co-located ones.

So tell us, how are your meetings evolving (or not) in 2023?


Author’s note: Interviews for this month’s column were conducted asynchronously. It shouldn’t have surprised me, but remote work advocates prefer written communication in lieu of meetings, too.

Tell me what you would like to read about in future installments of Tech Works. I’m on Twitter and LinkedIn.

The post Tech Works: How Can We Break Our Obsession with Meetings? appeared first on The New Stack.

]]>
I Need to Talk to You about Kubernetes GitOps https://thenewstack.io/i-need-to-talk-to-you-about-kubernetes-gitops/ Fri, 03 Feb 2023 14:13:19 +0000 https://thenewstack.io/?p=22699305

GitOps is one of the most impactful evolutions that has happened during Kubernetes’ rise to the top. We had been

The post I Need to Talk to You about Kubernetes GitOps appeared first on The New Stack.

]]>

GitOps is one of the most impactful evolutions that has happened during Kubernetes’ rise to the top. We had been building our kubefirst instant cloud native platform for more than a year when we discovered GitOps. It wrecked us. We decided to throw away a bunch of our work and start over on the new GitOps discipline, and it was the right call.

In a world where microservices and microproducts endlessly blossom throughout your platform’s ecosystem, it becomes increasingly difficult over time to manage these tens, then hundreds and soon thousands of microcomponents. But GitOps is able to reel all this back under control with the simplicity of a single branch of a git repository and some files that describe exactly what’s deployed.

Let’s explore Kubernetes GitOps together.

What GitOps Isn’t

When first reading the term GitOps, many think it’s what they’re already doing. If you use git and it automatically drives your DevOps pipelines, that’s GitOps, right? Decidedly not.

Here are some indicators that you’re not doing GitOps:

  • If you’re not using Kubernetes, you’re likely not doing GitOps. Oh, stop yelling.
  • If you are using Kubernetes but you have a delivery pipeline that includes the command kubectl apply or helm install (or any other “push this to Kubernetes” scripting or “sync everything” job), you are doing GitOps’ predecessor that’s recently been coined ScriptOps.
  • If you are using Kubernetes but you manually run a kubectl apply or helm install to deliver content to your cluster, you are doing ClickOps (even when it’s a command).
  • Even if you’re logging into Argo CD and pointing apps at git directories using the UI, that UI activity is a ClickOps operation and should be avoided when doing GitOps. ClickOps should probably be done in ephemeral environments for tech spikes only.

What GitOps Is

GitOps marries your git provider with your Kubernetes engine and serves as an application control plane for your desired state, which you keep hosted in git. If set up correctly, a GitOps shop can establish a registry of all Kubernetes resources across your organization in a single main branch of a git repository.

When thinking about GitOps, it’s best to imagine an impenetrable wall separating CI (automation pipelines) from CD (delivery), and that barrier is this GitOps main branch. As an engineer, you are no longer responsible for continuous delivery (CD); that’s the job of your CD engine. Your only delivery role in continuous integration (CI) is to establish that desired state.

Why You Should Commit to GitOps #gitpuns

Architecture Simplicity

Kubernetes operations are conducted by handing Kubernetes what you want in YAML format.

Git is very good at versioning a distributed system of flat files like YAML and tracking when and why they change.

When you add a GitOps engine that can pull the desired state from git, apply it to the cluster and report back with sync status in an endless reconciliation loop, it produces a powerful, yet simple architecture that’s rooted in proven distributed technologies that your engineers are already using.
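As a rough mental model, here is a conceptual sketch of that reconciliation loop in Python. Real engines such as Argo CD and Flux do this against a git remote and the Kubernetes API; the dictionaries standing in for the repository and the cluster below are illustrative assumptions only.

import time

def fetch_desired_state(repo):
    # In a real engine: a git pull of the registry directory in the GitOps repo.
    return dict(repo)

def fetch_actual_state(cluster):
    # In a real engine: a read of live resources through the Kubernetes API.
    return dict(cluster)

def reconcile(repo, cluster):
    desired = fetch_desired_state(repo)
    actual = fetch_actual_state(cluster)
    for name, manifest in desired.items():
        if actual.get(name) != manifest:
            print(f"syncing {name}")   # new app or drift: converge toward git
            cluster[name] = manifest
    for name in set(actual) - set(desired):
        print(f"pruning {name}")       # removed from git: remove from the cluster
        del cluster[name]

def reconcile_forever(repo, cluster, interval_seconds=180):
    while True:                        # the endless loop described above
        reconcile(repo, cluster)
        time.sleep(interval_seconds)

# One pass against toy state: the cluster converges to what the "repo" declares.
repo = {"metaphor-frontend": {"image": "metaphor-frontend:1.2.0"}}
cluster = {"metaphor-frontend": {"image": "metaphor-frontend:1.1.0"}, "orphaned-app": {}}
reconcile(repo, cluster)

The important property is that the loop never stops: anything added to, changed in or removed from the registry eventually converges in the cluster, and manual drift in the cluster converges back to what git declares.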

Discoverability

The GitOps approach to Kubernetes asset management and how you scale your applications across your clusters is superior to any alternative. A good GitOps engineer can walk into a new GitOps environment and be almost immediately impactful.

The discipline allows you to register a tree of applications represented as YAML files in a single main branch of a single git repository. If you bootstrap your clusters this way, anyone who is familiar with GitOps will be able to follow the tree and discover all of the desired state and already know how the content is delivered.

Your GitOps engineer will become one of the most valuable, and somehow also the most replaceable, members of your organization due to the cluster registry’s discoverability.

This should give a feel for what a GitOps registry would look like: https://github.com/kubefirst/GitOps-template/tree/1.10.9/registry

Security

CI tools are common attack vectors for bad actors. Any access they have is at risk during a breach. With GitOps, your CI tool won’t need access to your cluster. Instead your cluster will pull its deployments and configurations using a read-only connection to your GitOps git repository. Securing that git repository becomes the new game, which is much easier to manage, especially if you manage your git repositories in Terraform.

System Audit Log

If every change to your infrastructure gets applied because of a pull request in a single repository, then the history of pull requests being applied is the audit log of everything that’s ever happened, along with who approved it and everything else that git provides on its own. This is a much more convenient process for engineers than what ScriptOps organizations can typically offer.

Rollback

Another advantage of hosting your declared desired state in git is its ability to roll back any problem that’s introduced, in many cases, by merely reverting the problematic commit in the GitOps repository. It’s a simple engineering skill that can now solve some big problems.

Disaster Recovery

If your cluster always gets and syncs apps from a single git repo registry, replacing that cluster is as simple as pointing the new cluster at that same registry folder, and it will rebuild itself into the same cluster. There are some devils in those details with host names and traffic concurrency, but this is one of your goals as a GitOps admin.

If you manually delete a deployment in Kubernetes and it’s part of GitOps, it comes back immediately. The GitOps repo main branch source will continually attempt to become the actual state without any scripted jobs anywhere. This is an excellent posture to keep your service availability high.

How GitOps Works

Git Repositories

You need to create a GitOps repository that has a folder that you can register your cluster against. For a single cluster, you can just call this folder “registry.”

GitOps Engines

GitOps works by placing a GitOps engine like Argo CD or Flux CD into your Kubernetes cluster. You’ll need to configure the engine with read-only access to your GitOps git repository, and when you install it, you’ll point it at that registry directory in order to hydrate your clusters with apps.

GitOps Architecture Decisions

Once you provision your management cluster (a centralized cluster that manages and orchestrates your management systems and infrastructure), you have a GitOps architecture decision to make about the workload clusters that run your applications (production cluster, preprod, etc.) and whether they should each have their own GitOps engine or use that of the management cluster.

Distributed GitOps Architecture

The distributed GitOps architecture (sometimes called bootstrapped or standalone) is the only model that prevents your management cluster from needing access to your production cluster. For organizations with higher security requirements or those subject to a compliance boundary, this is an ideal posture.

The distributed GitOps architecture requires that each workload cluster have its own dedicated instance of Argo CD. This won’t be much of a burden to your admins, provided you have single sign-on implemented across the Argo CD instances.

This pattern also distributes the resource demand on Argo CD to each cluster, allowing it to horizontally scale those GitOps computations across the ecosystem. The smaller blast radius when adjusting the GitOps engine itself is also a nice benefit.

Centralized GitOps Architecture

Some organizations go with a centralized GitOps architecture, where the management cluster Argo CD instance has push access to enforce desired state onto the downstream workload clusters. This architecture allows you to view all environments’ applications from a single Argo CD instance, which can be a convenience in a number of ways, lending itself to GitOps templating techniques like ApplicationSets.

Abstracting Secrets in GitOps

If your whole system is defined declaratively in git, how do you deal with secrets? The external secrets operator is probably the best tool for this problem. It provides a custom resource definition (CRD) that lets you define external secret resources that map to your secret tool of choice, like HashiCorp Vault or your cloud secret store. This allows you to reference your secrets in git without actually placing them there.

Automating GitOps Delivery in Your CI Pipelines

To deliver your app to your environments in your CI pipeline, you simply need to update the YAML file that represents that application instance in your GitOps repository. This is really all that is meant by the term “desired state.” It’s a file in a folder where you can define what should be in Kubernetes.
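As an illustration, a CI job that sets desired state can be as small as the sketch below. The file path, image tag and commit flow are assumptions about a hypothetical repo layout, and it leans on the PyYAML package; note that it never talks to the cluster itself, only to the GitOps repository.

# A hedged sketch of a CI step that updates desired state in the GitOps repo.
# Requires: pip install pyyaml
import subprocess
import yaml

def set_desired_image(manifest_path, new_image):
    # Assumes a standard Kubernetes Deployment manifest with a single container.
    with open(manifest_path) as f:
        manifest = yaml.safe_load(f)
    manifest["spec"]["template"]["spec"]["containers"][0]["image"] = new_image
    with open(manifest_path, "w") as f:
        yaml.safe_dump(manifest, f, sort_keys=False)

def commit_and_push(path, message):
    # The pipeline's only "delivery" action is a commit; the GitOps engine does the rest.
    subprocess.run(["git", "add", path], check=True)
    subprocess.run(["git", "commit", "-m", message], check=True)
    subprocess.run(["git", "push"], check=True)

if __name__ == "__main__":
    # Hypothetical path inside the GitOps repo's registry tree.
    path = "registry/my-app/deployment.yaml"
    set_desired_image(path, "registry.example.com/my-app:1.4.2")
    commit_and_push(path, "ci: promote my-app to 1.4.2")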

Choosing Your GitOps Technology

Choosing the right GitOps driver is one of your most important architectural decisions when designing your new Kubernetes platform. OpenGitOps offers a set of principles that define the GitOps discipline in a vendor-agnostic way to help guide you through this decision-making process.

See GitOps Automation Workflows Locally in Under 5 Minutes

Our kubefirst CLI provisions free and fully automated cloud native open source GitOps platforms.

We’re building a community of users who are using the same free open source cloud native tools in the same approximate ways. These users can contribute, help each other, and help produce a frictionless and fully automated cloud native core ecosystem.

We have a local platform and an expanding set of cloud platforms that are fully automated from the start. The open source tools that we provision are all preconfigured to work well with each other. You’ll be able to explore scalable, unwrapped open source tools like Argo CD, Argo Workflows, Vault, Terraform, Atlantis, External Secrets Operator, Cert Manager, External DNS and many others.

Because our architecture provides you with your own GitOps repository that powers your new environment, you’re free to take your new platform in any direction you choose, and you can always leave us behind if you choose to. We won’t be in your way. It’s a completely GitOps-based offering where you can remove our opinions and add your own with a pull request to the GitOps repository that you now own.

The fastest way to check out the platform is on the kubefirst local variation, where you can have a full, free cloud native ecosystem in just five minutes on your own localhost. Give your Docker runtime at least 5 CPU / 5 GB memory for a good time, if you can. Then just run:

brew install kubefirst/tools/kubefirst
kubefirst local

Our cloud platform details and other install types can be found at https://docs.kubefirst.io/

The cloud native platform includes a sample application called metaphor-frontend that demonstrates how to run GitOps delivery using Argo workflows that come with batteries included — prebuilt GitHub Actions that are running on private GitHub runners in your local cluster. We provide some automation to build and publish containers and charts, set GitOps-desired state and automatically manage versions for releases, even locally.

The metaphor-frontend app also demonstrates how to use the rest of the platform, like leveraging secrets from Vault, Helm values overrides, using an ingress, TLS certificate automation, DNS management automation, release management automation and so much more.

Run a quick install, shoot us a GitHub star for the free management platform, join our workspace and help us build an awesome community of engineers using the same approximate cloud native tools the same way. You really won’t believe what Kubernetes GitOps can do.

The post I Need to Talk to You about Kubernetes GitOps appeared first on The New Stack.

]]>
Feature Flags Are Not Just for Devs https://thenewstack.io/feature-flags-are-not-just-for-devs/ Thu, 02 Feb 2023 22:37:39 +0000 https://thenewstack.io/?p=22698882

The story goes something like this: There’s this marketing manager who is trying to time a launch. She asks the

The post Feature Flags Are Not Just for Devs appeared first on The New Stack.

]]>

The story goes something like this:

There’s this marketing manager who is trying to time a launch. She asks the developer team when the service will be ready. The dev team says maybe a few months. Let’s say three months from now in April. The marketing manager begins prepping for the release.

The dev team releases the services the following week.

It’s not an uncommon occurrence.

Feature Flags Are Not Just for Devs

Edith Harbaugh is the co-founder and CEO of LaunchDarkly, a company she launched in 2014 with John Kodumal to solve these problems with software releases that affect organizations worldwide. Today, LaunchDarkly has 4,000 customers and annual recurring revenue of $100 million.

We interviewed Harbaugh for our Tech Founder Odyssey series on The New Stack Makers about her journey and LaunchDarkly’s work. The interview starts with this question about the timing of dev releases and the relationship between developers and other constituencies, particularly the marketing organization.

LaunchDarkly is the number one feature management company, Harbaugh said. Its mission is to provide services to launch software in a measured, controlled fashion. Harbaugh and Kodumal, the company’s CTO, founded it on the premise that developing and releasing software is arduous.

“You wonder whether you’re building the right thing,” said Harbaugh, who has worked as both an engineer and a product manager. “Once you get it out to the market, it often is not quite right. And then you just run this huge risk of how do you fix things on the fly.”

Feature flagging was a technique that a lot of software companies used. Harbaugh worked at Tripit, a travel service that relied on feature flags, as did companies such as Atlassian, where Kodumal had developed software.

“So the kernel of LaunchDarkly, when we started in 2014, was to make this technique of feature flagging into a movement called feature management, to allow everybody to build better software faster, in a safer way.”

LaunchDarkly allows companies to release features at whatever granularity an organization wants, letting a developer push a release into production in different pieces at different times, Harbaugh said. So a marketing organization can switch a feature on even after the developer team has already released the code into production.

“So, for example, if we were running a release and we wanted somebody from The New Stack to see it first, the marketing person could turn it on just for you.”
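In application code, that kind of targeted release comes down to a flag check like the sketch below. The helper and the flag store here are generic placeholders rather than LaunchDarkly’s actual SDK, but real SDKs follow the same shape: evaluate a flag for a given user or context, target specific users first and fall back to a safe default.

# An illustrative, vendor-neutral sketch of a targeted feature flag check.
def feature_enabled(flag_store, flag_key, user, default=False):
    """Return the flag value for this user, falling back to a safe default."""
    rule = flag_store.get(flag_key)
    if rule is None:
        return default
    # Targeting: the flag can be switched on for specific users (say, one reviewer)
    # before it is rolled out to everyone.
    return user.get("key") in rule.get("targets", []) or rule.get("on_for_all", False)

flags = {"new-onboarding-flow": {"targets": ["reviewer@example.com"], "on_for_all": False}}
user = {"key": "reviewer@example.com", "region": "us-east"}

if feature_enabled(flags, "new-onboarding-flow", user):
    print("show the new onboarding flow")
else:
    print("show the existing flow")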

Harbaugh describes herself as a huge geek. But she also gets it in a rare way, for geeks and non-geeks alike. She and Kodumal took a concept used effectively by developers and transformed it into a service that provides feature management for a broader customer base: think of a marketer who wants to push releases out in a granular way, pre-programming feature flags from the San Francisco office the day before a launch on the East Coast.

The idea is novel, but as with many intelligent, technical founders, Harbaugh’s journey reflects her place today. She’s a leader in the space, and a fun person to talk to, so we hope you enjoy this latest episode in our tech founder series from The New Stack Makers.

The post Feature Flags Are Not Just for Devs appeared first on The New Stack.

]]>
Watch Out, Attackers Have Their Heads in Your Cloud! https://thenewstack.io/watch-out-attackers-have-their-heads-in-your-cloud/ Thu, 02 Feb 2023 20:29:55 +0000 https://thenewstack.io/?p=22699383

It should come as little surprise that when enterprise and IT leaders turned their attention to the cloud, so did

The post Watch Out, Attackers Have Their Heads in Your Cloud! appeared first on The New Stack.

]]>

It should come as little surprise that when enterprise and IT leaders turned their attention to the cloud, so did attackers. Today’s cloud-first approach to building dynamic work environments blurs the boundaries of where the corporate network begins and ends, and which apps belong to the company. This, combined with the growing adoption of multicloud and hybrid work environments, means these boundaries are no longer fixed.

Unfortunately, the security capabilities of enterprises have not always kept up with the threat landscape. Poor visibility, management challenges and misconfigurations combine with other security and compliance issues to make protecting cloud environments a complex endeavor.

The price of failure is high. According to IBM’s ”Cost of a Data Breach Report 2021,” it took organizations at a “mature stage of cloud modernization” an average of 252 days to identify and contain a cloud-based data breach.

Public cloud breaches were the most costly, at an estimated average price tag of $4.8 million. The costs for organizations with a high level of cloud migration were also significantly higher than for those with low levels of cloud migration.

Why Traditional Approaches Fail

As the risk has grown, so too has the need for organizations to rethink their approach to security. Silos are the death of security in the cloud. Yet, silos are common for organizations using multiple tools to manage user access to their cloud assets. If security is not implemented in a unified, integrated way, blind spots and security issues are inevitable.

Many organizations have responded by implementing cloud native tools from cloud security platforms. However, many of these tools are focused on pre-runtime vulnerabilities and compliance and only offer a snapshot of the organization’s security posture at a moment in time.

The movement to “shift security left” and bake it deeper into the development process has allowed organizations to catch security vulnerabilities earlier, but insecure APIs, misconfigurations and other issues can slip through the cracks due to the dynamic nature of cloud environments and the desire to avoid any slowdown in application delivery.

Adversaries know this. They know today’s continuous integration, continuous delivery (CI/CD) development life cycle has DevOps teams spinning clouds up and down in minutes, paying little attention to potential misconfigurations. Adversaries know that it only takes a second for an intrusion to latch on to a vulnerability and convert into a fast-moving lateral breach.

This is why security teams need an adversary-focused approach — understanding the different attackers, their mindset, tools and techniques — that automates security controls regardless of the cloud provider or deployment model.

So Why Take an Adversary-Focused Approach?

Finding the right defensive strategy is contingent on understanding how attackers are targeting cloud environments. To be successful, you need the ability to correlate security events with indicators of attack, based on real-time threat intelligence and telemetry from across your cloud estate and on-prem environment.

Only then can you put this data into action, identifying the shifts in adversarial tactics to better understand how an adversary will target an organization and to prevent threats in real time.
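Conceptually, that correlation step can be pictured as joining telemetry events against indicators drawn from threat intelligence, as in the toy sketch below. The event fields, the indicator feed and the technique labels are illustrative assumptions, not any vendor’s schema.

# A conceptual sketch of correlating telemetry with indicators of attack.
from dataclasses import dataclass

@dataclass
class Indicator:
    technique: str      # e.g. a MITRE ATT&CK-style technique ID
    description: str

# Hypothetical threat-intel feed: behaviors adversaries use against cloud estates.
INDICATORS = {
    "mass_object_download": Indicator("T1530", "Data from cloud storage"),
    "disable_logging": Indicator("T1562", "Impair defenses"),
}

def correlate(events):
    """Return alerts for telemetry events that match a known indicator of attack."""
    alerts = []
    for event in events:
        indicator = INDICATORS.get(event.get("behavior"))
        if indicator:
            alerts.append({
                "source": event.get("source"),
                "technique": indicator.technique,
                "summary": indicator.description,
            })
    return alerts

telemetry = [
    {"source": "aws-cloudtrail", "behavior": "disable_logging"},
    {"source": "k8s-audit", "behavior": "pod_created"},
]
for alert in correlate(telemetry):
    print(alert)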

Taking an adversary-focused approach arms security and incident response (IR) teams with a higher level of context about the situation they are facing. By leveraging threat intelligence and mixing it with continuous visibility, organizations can better defend their assets.

Pre-runtime and compliance data alone will not provide IR teams with the type of comprehensive data they need — they require as much data as possible to support their investigations and get a complete picture of what is happening.

Elements of an Adversary-Focused Approach:

Integrated threat intelligence is key. A proactive security strategy for today’s cloud begins with studying the tactics, techniques and procedures (TTPs) that threat actors are executing in hybrid environments. Only then can security teams turn their attention to preventing cloud breaches.

Visibility is critical. Organizations need to know how many cloud assets exist and where they reside. When all the dark corners have been lit, threat intelligence can lay the foundation for relevant insights. If an attacker is taking advantage of a lack of outbound communication restrictions to exfiltrate data, organizations have to be able to detect that and enforce policies to block it.

The principle of least privilege should be a governing idea of any security strategy, particularly one being applied to a cloud environment where the concept of the traditional perimeter is essentially nonexistent. Knowing how threat actors are trying to access cloud resources better positions organizations to lock down cloud applications and resources and reduce risk.

Cloud hygiene is a simple step that can go a long way in defending against modern attackers. Businesses operating in the cloud should clarify security responsibility so both the vendor and security teams know how to apportion monitoring tasks. Access management is a key part of this as well; not everyone needs access to all cloud environments at all times. IT and security teams must also understand the need to protect applications during coding and run time but need to do so at the speed of DevOps.

Automation is another key pillar of an adversary-focused approach to today’s security solutions. Given the thousands of attack surfaces that cloud environments expose, automation is necessary to monitor and remediate issues at scale.

Security teams need to ensure that the secure thing to do is the easy thing to do by allowing DevOps to actively participate without friction, and automation is a critical component to making that happen.

Make the Secure Thing to Do the Easy Thing to Do

Thinking like an attacker and knowing their tactics, techniques and procedures is a fundamental part of protecting IT infrastructure. The attack surface of the cloud — with its dynamic mix of containers, virtual machines, microservices and more — is complex and growing. With attackers circling, it would be a mistake for organizations to focus on the cloud less than attackers do.

Attacks are not always direct; sometimes, adversaries strike the on-premises environment first and then go after cloud resources. In a hybrid IT world, organizations need to be able to extend the security controls protecting their on-premises environment beyond to the cloud to maintain consistency and compliance.

To learn more visit us here.

The post Watch Out, Attackers Have Their Heads in Your Cloud! appeared first on The New Stack.

]]>