4. Finding and selecting software

From SME Guide


As briefly mentioned in the previous chapter, software selection is an often overlooked but extremely important part of any migration to, or adoption of, FLOSS. As mentioned in Appendix 1, there are more than 18,000 mature and stable open source projects, and most of them have no real "promotional" budget and are not backed by companies able to provide marketing and dissemination support.

One of the “hidden” costs of adopting or migrating to FLOSS is the selection process: deciding which packages to use, and estimating the risk of using a project that is not “mature” or considered enterprise-grade. In the COSPA migration project it was found that in many instances the selection and evaluation process accounted for 20% of the total cost of migration (including both the process itself and the cost of selecting the wrong package and then repeating the assessment with a new one).

The problem of software selection is that there is a full spectrum of choices, and different attitudes to risk: a research experiment may be more interested in features, while a mission-critical adoption may be more interested in the long-term survivability of the software being adopted. For this reason many different assessment methods have been researched in the past, including EU-based research projects (the QSOS method, SQO-OSS, QUALOSS) and business-oriented systems like OpenBRR or the Open Source Maturity Model of CapGemini. The biggest problem with these methods is that the non-functional assessment (that is, estimating the “quality” of the code, of its community and of its liveness) is a non-trivial activity that requires evaluating and understanding many different aspects of how FLOSS is produced. For this reason, we introduce a different approach, based on automated extraction of parameters on one hand and on individual feature evaluation on the other.

There are three separate steps that should be taken to successfully identify a set of FLOSS packages:

  • identify your requirements
  • search for packages matching your functional requirements
  • select the appropriate package from the matching set

The first step is often overlooked, but it is crucial for a successful adoption: in many cases there is no perfect match for a given proprietary product, but there are equally good alternatives that perform the necessary activity just as well (and sometimes even better). For this reason, a short list of "required" and "useful" functions should be the first step in the selection.

After the shortlist, it is necessary to find the packages that may satisfy the given requirements. Several important web sites provide information on available software, either in an undifferentiated way (like SourceForge, which mainly acts as a project repository) or through detailed reviews and comparisons with proprietary software.


Forge-based sites:

These sites mostly provide support and download services, and host a number of projects ranging from a few hundred to 150,000 (SourceForge); an integrated search function is provided. Most are based on the SourceForge code, its derivative GForge, or on collaborative development platforms that provide similar services (storage, email communication, code versioning and change support, bug tracking). Some of the most important sites are:

http://sourceforge.net/

http://savannah.gnu.org/

https://gna.org/

http://alioth.debian.org/

http://www.berlios.de/

http://codehaus.org/


Software announcement and catalog sites:

These sites are mainly news aggregators that provide detailed information on newly announced versions of FLOSS packages, along with information on licenses, home pages and screenshots.

http://freshmeat.net/

http://www.eosdirectory.com/

http://sourcewell.berlios.de/


Lists of software equivalents:

http://www.linuxrsp.ru/win-lin-soft/table-eng.html

http://www.osalt.com/


Most Linux distributions also include a package search tool, such as the Synaptic package manager used by Debian and Ubuntu:

This tool provides search and installation support for all the packages included in the distribution "repositories", specialized sites that provide binary packages of the available FLOSS projects. The repositories are usually divided into "stable" and "unstable" ones, giving end users the choice between stable software and the latest version (with the newest features, but not as thoroughly tested). It should be noted that no modern, end-user targeted distribution requires the user to see or interact in any way with the FLOSS source code; therefore, if installing a package requires compiling code or similar activities, the package itself should be considered experimental, and its adoption should be limited to cases where internal, specialized support is available.
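
The installation rule of thumb above can be expressed as a small triage function. This is a minimal sketch: the `Availability` record, its fields and the `install_triage` name are hypothetical, and the information they hold (which repository a package comes from, whether it must be compiled) is assumed to have been gathered by hand from the distribution's package tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Availability:
    """Hypothetical record of how a candidate package can be installed."""
    name: str
    branch: Optional[str]     # "stable" or "unstable" repository, None if not packaged
    needs_compilation: bool   # must the user compile the source code to install it?


def install_triage(pkg: Availability) -> str:
    """Apply the rule of thumb from the text to one candidate package."""
    if pkg.needs_compilation:
        # Installing requires compiling code: treat it as experimental and
        # adopt it only where internal, specialized support is available.
        return "experimental"
    if pkg.branch == "stable":
        return "stable"
    if pkg.branch == "unstable":
        # Newest features, but not as thoroughly tested.
        return "unstable"
    # Not in the repositories and no compilation recorded: review manually.
    return "unknown"
```

The "unknown" label is an addition of this sketch for cases the text does not cover.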

Once a set of potentially useful applications has been found, they must be evaluated against each other. The evaluation consists of a first step (refinement) and a second step (liveness): the first produces a list of the applications that provide all the features necessary for the task at hand, and the second prioritizes them according to the maturity and risk of each project.

Creating a graph for a product selection involves three easy steps:

  • starting from the list of features, separate those considered indispensable from the optional ones; all projects lacking any indispensable feature are excluded from the list.
  • for every optional feature a project provides, +1 is added to that project's “feature score”, obtaining a separate score for each project.
  • using the automated tools from FLOSSMETRICS, a readiness score is computed using the following rule: for every “green” in the liveness and quality parameters a +1 score is added, -1 for every “red”.
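
The three steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `Candidate` record, the `graph_positions` function and the example projects and feature names are hypothetical, and the colour of each liveness or quality parameter is assumed to be already known (for example from the FLOSSMETRICS tools).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Candidate:
    """Hypothetical description of one candidate project."""
    name: str
    features: Set[str]
    # Colour of each liveness/quality parameter, e.g. {"CM-SRA-1": "green"}.
    indicators: Dict[str, str] = field(default_factory=dict)


def graph_positions(candidates: List[Candidate],
                    indispensable: Set[str],
                    optional: Set[str]) -> Dict[str, Tuple[int, int]]:
    """Return {name: (readiness score, feature score)} for each project
    that provides every indispensable feature."""
    positions: Dict[str, Tuple[int, int]] = {}
    for c in candidates:
        # Step 1: exclude projects lacking any indispensable feature.
        if not indispensable <= c.features:
            continue
        # Step 2: +1 for every optional feature the project provides.
        feature_score = len(optional & c.features)
        # Step 3: +1 for every green parameter, -1 for every red one
        # (the rule only scores green and red).
        colours = list(c.indicators.values())
        readiness = colours.count("green") - colours.count("red")
        positions[c.name] = (readiness, feature_score)
    return positions


if __name__ == "__main__":
    projects = [
        Candidate("alpha", {"edit", "export-pdf", "spellcheck"},
                  {"CM-SRA-1": "green", "CM-SRA-7": "green", "CM-IWA-4": "red"}),
        Candidate("beta", {"edit", "macros"}, {"CM-SRA-1": "green"}),
    ]
    print(graph_positions(projects, {"edit", "export-pdf"}, {"spellcheck", "macros"}))
    # {'alpha': (1, 1)}: beta lacks the indispensable "export-pdf"
```

Each returned pair is the project's position in the two-dimensional graph.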

This gives each project a position in a two-dimensional graph (see the graph below). The liveness parameters used in the third step are listed in the following table:

Liveness parameters

For each parameter: ID, measurement procedure and idea, and the new indicators.

CM-SRA-1
  Procedure and idea: Retrieving the date of the first bug reported by each member of the community shows whether the number of new members reporting bugs remains stable.
  Indicators: slope m of the resulting line y = mx + b, measuring the aggregated number over periods of one year. Green: m > 0. Yellow: m = 0. Red: m < 0. Black: no new submitters for several periods.

CM-SRA-2
  Procedure and idea: Retrieving the date of the first commit by each member of the community shows whether the number of new members committing remains stable.
  Indicators: slope m of the resulting line y = mx + b, measuring the aggregated number over periods of one year. Green: m > 0. Yellow: m = 0. Red: m < 0. Black: no new submitters for several periods.

CM-SRA-3
  Procedure and idea: CVSAnalY: look for the first commit of each detected committer in the SCM whose commit is not a code commit (for instance, ignoring source code extensions). MLS: each new email address detected, and its monthly evolution. Bicho: the first bug submitted by each registered person, measured monthly. The evolution of each person's first event in the community, and whether it remains stable, gives an idea of how the community evolves and how many people are joining it.
  Indicators: slope m of the resulting line y = mx + b, measuring the aggregated number over periods of one year. Green: m > 0. Yellow: m = 0. Red: m < 0. Black: no new submitters for several periods.

CM-SRA-4
  Procedure and idea: Identify the core group of developers (those responsible for 80% of the commits), then check the first commit of each new member who starts working in the core group. This gives an estimate of how the core contributors are evolving, showing whether there is a natural regeneration of core developers.
  Indicators: slope m of the resulting line y = mx + b, measuring the aggregated number over periods of one year. Green: m > 0. Yellow: m = 0. Red: m < 0. Black: no new submitters for several periods.

CM-SRA-5
  Procedure and idea: Core team = people responsible for 80% of the commits. Each person who disappears from this core team is counted as one. This metric estimates whether there is a dramatic decrease in the number of core developers, and therefore a risk in their regeneration.
  Indicators: Green: no members leave the project. Yellow: some people leave the project, one or two each year. Red: a high number of people leave the project, and the trend is increasing or stable. Black: the number of people leaving the project is extremely high.

CM-SRA-6
  Procedure and idea: Number of people who left the core team minus the number of new members of the core team, analysed monthly.
  Indicators: Green: the balance shows an increase in the number of people joining the project. Yellow: the balance is 0. Red: the balance shows an increase in the number of people leaving the project. Black: the balance shows a very high number of people leaving the project.

CM-SRA-7
  Procedure and idea: Average age of the people working on a project, that is, the average number of years worked by each developer. With this approximation it is possible to know whether members are approaching this limit and to estimate future effort needs.
  Indicators: Green: longevity longer than 3 years. Yellow: between 2 and 3 years. Red: between 1 and 2 years. Black: less than 1 year.

CM-SRA-8
  Procedure and idea: Evolution of the number of people who both contribute to the source code and report bugs. One way to retrieve this data is to match committers and bug reporters with the same nickname.
  Indicators: slope m of the resulting line y = mx + b, measuring the aggregated number over periods of one year. Green: m > 0. Yellow: m = 0. Red: m < 0. Black: no new submitters for several periods.

CM-SRA-9
  Procedure and idea: Same metric as above, but the overall total rather than its evolution; this measures the size of the community.
  Indicators: slope m of the resulting line y = mx + b, measuring the aggregated number over periods of one year. Green: m > 0. Yellow: m = 0. Red: m < 0. Black: no new submitters for several periods.

CM-IWA-1
  Procedure and idea: An event is any kind of activity measurable in a community; generally speaking, posts, commits or bug reports. A monthly analysis provides a general view of the project and its trend.
  Indicators: slope m of the resulting line y = mx + b, measuring the aggregated number over periods of one year. Green: m > 0. Yellow: m = 0. Red: m < 0. Black: no new submitters for several periods.

CM-IWA-2
  Procedure and idea: A monthly analysis provides a general view of the project: an increase or decrease in the number of commits shows the trend of the community.
  Indicators: slope m of the resulting line y = mx + b, measuring the aggregated number over periods of one year. Green: m > 0. Yellow: m = 0. Red: m < 0. Black: no new submitters for several periods.

CM-IWA-3
  Procedure and idea: Number of people working on old releases, out of the total work on the project. This shows how well the old releases are supported for maintenance purposes.
  Indicators: Green: more than 10%. Yellow: between 5% and 10%. Red: between 0% and 5%. Black: nobody.

CM-IWA-4
  Procedure and idea: Number of committers per file. This metric shows the territoriality of a project. Generally speaking, most files are touched or handled by just one committer, and high levels of such orphaning may be seen as a risk: if a developer leaves the project, her knowledge disappears and her files are completely unknown to the rest of the development team.
  Indicators: Green: less than 50% of the files are handled by just one committer. Yellow: more than 50%. Red: more than 70%. Black: more than 90%.

CM-IWA-5
  Procedure and idea: Number of people working on the project, out of the number of people working on the whole project, taking into account the whole set of activities to carry out. A high number of SLOC, e-mails or bugs to be fixed per active developer may mean that developers are overworked; in this case the community is clearly busy and needs more people to help.
  Indicators: Green: less than 30,000 lines and less than 25 bugs per committer. Yellow: between 30,000 and 50,000 lines and between 25 and 75 bugs per committer. Red: between 50,000 and 100,000 lines and between 75 and 150 bugs per committer. Black: more than 100,000 lines and more than 150 bugs per committer.

CM-IWA-6
  Procedure and idea: Relationship between committers and the total number of lines or files. This absolute number gives the number of lines per committer, showing, with regard to the source code alone, whether more resources are needed.
  Indicators: Green: less than 30,000 lines per committer. Yellow: between 30,000 and 50,000. Red: between 50,000 and 100,000. Black: more than 100,000.

CM-IWA-7
  Procedure and idea: Knowledge of the current team about the whole source code, measured as the number of files touched by all committers out of the total number of files. This approximates the number of files touched by the whole set of active committers; high percentages show that the current development team knows a large part of the files.
  Indicators: Green: less than 50 files. Yellow: between 50 and 200 files. Red: between 200 and 500 files. Black: more than 500 files per committer.
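
Most of the parameters above (CM-SRA-1 to CM-SRA-4, CM-SRA-8, CM-SRA-9, CM-IWA-1, CM-IWA-2) share the same slope rule. The sketch below shows one way to apply it, under assumptions that are not taken from the guide: events are hypothetical `(person, date)` pairs already extracted from the bug tracker, SCM or mailing list; counts are taken per one-year period; the slope is an ordinary least-squares fit; and `eps` (tolerance for "m = 0") and `black_periods` (how many empty trailing periods count as "several") are illustrative parameters.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, List, Tuple


def new_people_per_year(events: Iterable[Tuple[str, date]], last_year: int) -> List[int]:
    """Count the people whose first recorded event falls in each year,
    from the year of the earliest event up to last_year inclusive."""
    first_seen: Dict[str, date] = {}
    for person, when in events:
        if person not in first_seen or when < first_seen[person]:
            first_seen[person] = when
    if not first_seen:
        return []
    per_year = Counter(d.year for d in first_seen.values())
    return [per_year.get(y, 0) for y in range(min(per_year), last_year + 1)]


def slope(values: List[float]) -> float:
    """Least-squares slope m of the line y = m*x + b, with x = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two periods are needed")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def slope_colour(counts: List[int], eps: float = 0.0, black_periods: int = 2) -> str:
    """Green if m > 0, yellow if m = 0 (within eps), red if m < 0, and black
    if the last `black_periods` periods brought no new people."""
    if len(counts) >= black_periods and all(c == 0 for c in counts[-black_periods:]):
        return "black"
    m = slope(counts)
    if m > eps:
        return "green"
    if m < -eps:
        return "red"
    return "yellow"


if __name__ == "__main__":
    events = [("ann", date(2005, 3, 1)), ("bob", date(2006, 1, 5)),
              ("ann", date(2006, 7, 1)), ("cid", date(2007, 2, 2)),
              ("dan", date(2007, 9, 9))]
    counts = new_people_per_year(events, 2007)  # [1, 1, 2]
    print(slope_colour(counts))                 # green (m = 0.5)
```

Passing `last_year` explicitly matters: it keeps recent years with no newcomers in the series, which is what the "black" check looks for.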

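CM-SRA-4, CM-SRA-5 and CM-SRA-6 all start from the core team, the people responsible for 80% of the commits. A minimal sketch, assuming the commit authors of each period are available as a list of identifiers (a tool such as CVSAnalY can extract them from the SCM); the function names are hypothetical, and computing a separate core team for each period, then comparing consecutive periods, is one possible reading of the procedure.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def core_team(commit_authors: Iterable[str], share: float = 0.8) -> Set[str]:
    """Smallest group of top committers who together made at least
    `share` of all commits (the "80% of the commits" core group)."""
    counts = Counter(commit_authors)
    total = sum(counts.values())
    core: Set[str] = set()
    covered = 0
    for author, n in counts.most_common():  # most active committers first
        if covered >= share * total:
            break
        core.add(author)
        covered += n
    return core


def core_turnover(previous: Set[str], current: Set[str]) -> Tuple[int, int, int]:
    """Compare the core teams of two consecutive periods.
    Returns (left, joined, balance), where balance = left - joined."""
    left = len(previous - current)    # each departure counted as one (CM-SRA-5)
    joined = len(current - previous)  # newcomers to the core group (CM-SRA-4)
    return left, joined, left - joined
```

Ties between committers with equal commit counts are broken by order of first appearance, which `Counter.most_common` preserves. For example, a period with 6 commits by ann, 3 by bob and 1 by cid gives the core team {ann, bob}.
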
The evaluation becomes quite simple: if there is any red or black metric, you are looking at a high-risk project, because a significant part of the code is managed by a single person or a very small group of people. The number of yellow parameters that can be associated with a medium-risk project will be estimated by comparing our previous QSOS estimates with the new ones, and will be published directly in the guide.
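
The evaluation rule can be written as a short function. In this sketch the `risk_level` name is hypothetical; `yellow_limit` stands in for the number of yellow parameters that the guide has not yet published, so the caller must choose a value, and the "low" label for projects below that limit is an assumption.

```python
from typing import Dict


def risk_level(colours: Dict[str, str], yellow_limit: int) -> str:
    """Classify a project from the colours of its liveness parameters.

    Any red or black parameter means high risk, as stated in the text.
    `yellow_limit` (how many yellows make a project medium risk) is a
    placeholder: the guide has not yet published this value.
    """
    values = colours.values()
    if any(c in ("red", "black") for c in values):
        return "high"
    yellows = sum(1 for c in values if c == "yellow")
    # "low" for projects below the yellow limit is an assumption of this sketch.
    return "medium" if yellows >= yellow_limit else "low"
```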

image:Graph.jpg

The evaluator can then prioritize the selection according to the kind of adoption planned: mission-critical adoptions, which require high project stability (and a good probability that the project itself remains successful and alive), will prefer the projects positioned on the right-hand side of the graph, while more “experimental” adoptions will favour the projects placed at the top:
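
The prioritization can be sketched as a sort over the graph positions. This assumes, from the description above, that the readiness score is the horizontal axis and the feature score the vertical one; the `rank` function, the example projects and the tie-breaking on the other score are hypothetical.

```python
from typing import Dict, List, Tuple


def rank(positions: Dict[str, Tuple[int, int]], mission_critical: bool) -> List[str]:
    """Order projects by their (readiness score, feature score) position.

    Mission-critical adoptions prefer the right-hand side of the graph
    (assumed here to be the readiness axis); experimental ones prefer the
    top (assumed to be the feature axis). The other score breaks ties.
    """
    def key(name: str) -> Tuple[int, int]:
        readiness, features = positions[name]
        return (readiness, features) if mission_critical else (features, readiness)

    return sorted(positions, key=key, reverse=True)


if __name__ == "__main__":
    positions = {"alpha": (3, 1), "beta": (0, 4), "gamma": (3, 2)}
    print(rank(positions, mission_critical=True))   # ['gamma', 'alpha', 'beta']
    print(rank(positions, mission_critical=False))  # ['beta', 'gamma', 'alpha']
```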

image:Graph2.jpg


This approach combines the advantages of automated quality estimation (which can be applied to the FLOSSMETRICS parameters or to the earlier QSOS ones) with a visual approach that shows, in a single image, the “risk” or inherent suitability of a set of projects.

