Open Source Software Development

29 November 2005

This chapter explains

  • the principles behind open source development
  • the schism in the open source movement
  • how to carry out open source development

 

1. Introduction

Open source is a development approach in which the source code of the software is entirely free to access. The term “open source” is used to refer to both the product and the development approach. Any individual is able to view the code, modify it or duplicate it. Access to the source code facilitates the distributed and cooperative approach to software development that is fundamental to an open source style of development. 

 

Some examples of larger open source products are: Mozilla web browser; Apache web server; GNU/Linux and GNU/HURD operating systems; MySQL database software; Perl programming language; MyOffice and OpenOffice office suites. This range of products illustrates that open source software development can produce a diverse range of products.

 

2. The Principles of Open Source Development

Open source development is recognition of the hacker’s attitude to building software. Sometimes the term hacker has been associated with negative aspects of computing. However, hackers are now recognised as a community of highly skilled programmers, who relish the act of writing code and participate for enjoyment or to enhance their programming reputation. It is fundamental to the hacker ethic that information and knowledge should be freely shared without restriction because this stimulates collaborative thinking, leading to superior ideas.

 

The same principle is applied in open source development. Rather than the code being confined to a small core of developers (or even just one person), as in proprietary methods, a wider audience facilitates a greater influx of ideas and a greater degree of innovation. It is believed that because the source code is examined by a larger audience than proprietary software, any imperfections stand a greater chance of being identified and consequently rectified. The sharing of code therefore leads to more reliable code.

 

However, openness and the concept of code sharing do not mean that open source products are free to buy. There are other important issues of principles, which we now discuss.

 

Self Test Question

What is the primary goal of open source development?

Answer

Reliable software

End

 

3. The Schism within Open Source Development

 

Whilst there is a shared belief in collaboration and openness within the development community, a schism does exist in terms of the motivation and underlying philosophy of open source. The main split is between the Free Software Foundation and the Open Source Initiative.

 

The Free Software Foundation (FSF) was founded in 1985. Richard Stallman acts as its charismatic guru and is fiercely principled. The idea behind free software is that software should be free. But what exactly does free mean? Richard Stallman helps clarify this by saying that free software is free as in free speech, but not as in free beer. This means that the software is open, but not necessarily available gratis. This software development community promotes free software projects and places emphasis on the social benefits of working collaboratively.

 

The philosophy of the FSF is that individual freedom should never be compromised and that all individual action should also benefit the wider community. Therefore, whilst individual programmers are encouraged and admired, they are also expected to feed their findings and their skills back into the community of programmers to which they ultimately belong. This is done through the sharing of code and the distribution of good programming practice.

 

The principle of freedom is reflected in the use of the term copyleft in order to distinguish it from the usual term copyright. In order to distribute its products, the free software community has devised its own license agreement, the General Public License (GPL). The GPL (which runs to about 5 pages of text) provides users with certain fundamental freedoms:

 

  • the freedom to run the program, for any purpose (freedom 0).
  • the freedom to study how the program works, and adapt it to their needs (freedom 1). Access to the source code is a precondition for this.
  • the freedom to redistribute copies so they can help their neighbor (freedom 2).
  • the freedom to improve the program, and release their improvements to the public, so that the whole community benefits (freedom 3). Access to the source code is a precondition for this.

There are two other important ingredients:

 

  • when redistributing a program, you cannot add restrictions to deny other people the freedoms. So when you redistribute modified GPL software, it must be distributed under the GPL.
  • a GPL program cannot be incorporated into proprietary programs

 

Thus the GPL actually makes it illegal for anyone to make GPL code proprietary or “closed”. It also disallows building any GPL-covered software into proprietary software. Essentially, this means that GPL software always remains GPL software and therefore always remains free.
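The two rules above can be expressed as a small check. This is only a sketch of the rules as this chapter states them, not of the licence text itself; the function name may_distribute and the two licence labels are invented for illustration.

```python
GPL = "GPL"
PROPRIETARY = "proprietary"


def may_distribute(component_licenses, release_license):
    """Apply the chapter's two simplified GPL rules (hypothetical helper).

    component_licenses: the licences of the parts built into the program.
    release_license: the licence under which the whole is to be distributed.
    """
    # If any part is GPL code, the whole program must be released under the
    # GPL; in particular it cannot be folded into a proprietary product.
    if GPL in component_licenses:
        return release_license == GPL
    # Programs containing no GPL code are not constrained by these rules.
    return True
```

For example, may_distribute([GPL, PROPRIETARY], PROPRIETARY) is rejected, whereas may_distribute([GPL], GPL) is allowed.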

 

The FSF is absolutely resolute in not allowing any proprietary software to be incorporated into their software. All their products are covered under the GPL and they are largely unaffiliated with commercial software development companies.

 

Free software organizations often offer their program code for free, most commonly as a download from their website. However, remembering that it is “free as in free speech, not as in free beer”, some organizations sell free software as a complete package, shrink wrapped, sometimes including user manuals and additional support services.

 

The Open Source Movement, which later became the Open Source Initiative (OSI), is spearheaded by Eric S. Raymond. Their emphasis is on the benefits of open source as a development approach, rather than any moral benefits that can accrue. It is a purely pragmatic approach. They stress that open source development can produce higher quality software than other approaches.

 

The OSI are more willing to collaborate with larger software companies, sometimes including developers of proprietary products. They wish to appeal to the business sector because this enables greater distribution of their product. However, unlike the FSF, their approach is motivated primarily by the quality of the software, rather than by its being free. Forming contracts with larger companies is one way of exposing OSI products to a larger potential market. However, it also means that the product must compete with other commercial package products.

 

Some open source development projects have devised their own open source licenses, which differ in varying degrees from the GPL. However, the majority of founding open source projects still deploy the GPL.

 

Self Test Question

Can you write and sell software with a GPL license?

Answer

yes

End

 

Self Test Question

Can you obtain software that has a GPL license and then sell it?

Answer

yes

End

 

4. Techniques of Open Source Development

Despite the schism within open source in terms of ethics and philosophy, the development practices principally remain the same between the two movements.

 

Eric Raymond uses the following analogies to contrast conventional methods with open source development. Conventional development is like building a cathedral: it requires the careful coordination of large numbers of people, each working in a disciplined way. There are specializations such as managers, designers, implementers. The term cathedral suggests grandness, being overwhelmed. By contrast, open source development is like a bazaar, where there are also large numbers of people, but it is decentralized, roles are not clearly defined, people use different methods and interactions are unplanned. The term bazaar suggests vibrancy, movement, sharing, communication and enthusiasm.

 

Open source development tends to use the following techniques:

 

  • release an early initial version, to encourage people to get involved
  • treat users as co-developers, to enlist their help
  • integrate and release frequently, for early bug detection. In the early days of the Linux kernel, releases were sometimes made more than once a day
  • maintain two versions, a stable version with few bugs (for cautious users) and a buggy version with more features (for adventurous users to debug).
  • create a highly modularized design, to promote parallel development by different people.
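The idea of maintaining a stable and a development version side by side can be made visible in the version number. As one illustration, Linux kernel releases before the 2.6 series used an even minor number for the stable series and an odd one for the development series. The sketch below assumes that convention and a version string of the form major.minor.patch; the function name release_series is invented for illustration.

```python
def release_series(version):
    """Classify a version string under the even/odd minor-number convention.

    Assumes a 'major.minor[.patch]' string, e.g. '2.4.20'.
    """
    parts = version.split(".")
    minor = int(parts[1])
    # Even minor number: stable series for cautious users.
    # Odd minor number: development series for adventurous users.
    return "stable" if minor % 2 == 0 else "development"
```

Under this convention release_series("2.4.20") is classed as stable and release_series("2.5.3") as development.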

 

Managing an open source project is potentially complex, and we look at this topic in the next section.

 

Throughout development, the internet facilitates communication between developers and also the distribution of source code, via the Web, File Transfer Sites and email.

 

There is usually no formal mechanism for gathering initial user requirements for an open source development. The process often consists of a software requirement that is instigated by a sole developer, with requests for collaboration, targeting the hacker community. The head developer specifies most requirements. Additional user requirements are either implemented by individual developers themselves via personal modification of the source code, or through a communal process known as “code forking”. Code forking occurs when the developer base has alternative requirements or conflicting ideas on how to implement a requirement. The code is seen to “fork” because it is split and each copy of the code is developed in parallel. After this split occurs, the code is irreconcilable and therefore two different products exist, both growing from the same base code. Each fork competes for developer attention, so that the most popular or the most reliable version survives.  In theory, the “fittest” code should survive.
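Code forking as described above can be sketched as a toy model. The class CodeBase, the function surviving_fork and the use of developer head-count as the measure of popularity are all assumptions made for illustration; real projects fork by copying a repository.

```python
import copy


class CodeBase:
    """A toy code base: a name, some files and the developers working on it."""

    def __init__(self, name, files):
        self.name = name
        # Deep copy, so that a fork never shares file contents with its parent.
        self.files = copy.deepcopy(files)
        self.developers = set()

    def fork(self, new_name):
        """Split off an independent copy that starts from the same base code."""
        return CodeBase(new_name, self.files)


def surviving_fork(forks):
    """Pick the fork that has attracted the most developers."""
    return max(forks, key=lambda code_base: len(code_base.developers))
```

After a fork, a change made to one copy leaves the other untouched, and the two copies compete for developers from then on.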

 

 

The design of software is communicated via web-based tools. Sometimes UML diagrams or other notations using hyperlinks to depict the overall structure of the program are deployed. However, generally, there is a lack of design documentation within open source products. This reflects the hacker ethic that software is simply code.

 

The code writing on an open source project is sustained through voluntary contributions. Developers are motivated by the enjoyment of programming, the belief in the sharing of software or their own requirement for the software product.  Code is commonly implemented via re-use and many open source projects begin immediately by re-writing the code of existing products, with enhancements and alterations made where necessary. When there is no original from which to copy, a core developer base begins writing the code before offering it to the wider community for critique.

 

Once contributions have been implemented, beta versions of open source products are released. Releases are made frequently, so that the effectiveness of contributions can be tested immediately. Feedback on the latest version is received and contributions again incorporated into the code in a continuous cycle, which continues until the community is satisfied with the eventual outcome. Contributions then slow down or cease.

 

Development communities and product websites act as sources of support for users of open source software. The websites contain installation tutorials and user forum groups providing technical support. The development community mostly provides these voluntarily. 

 

As an alternative means of support, commercially supported versions of open source software such as GNU/Linux are available to buy. The software itself is identical to the freely available version, but is supplied with supporting manuals and services. These services do not exist for all products and therefore many smaller open source products are only used by technically adept users.

 

In summary, the following table lists most of the essential tasks of software development, alongside how they are carried out in open source development.

 

Task | How it is carried out in open source development
Requirements elicitation | An individual has an idea for a program and puts it on a mailing list or a newsgroup. Potential users suggest features. A discussion takes place until a consensus arrives. There is no market research, no interviewing of potential users.
Architectural design | There is no explicit process for this activity. Either the design is implicit (and obvious) or it evolves over time. (For example, there has been a long and vigorous argument about the best structure for the Linux kernel.) This is perhaps the least-defined part of development - and, perhaps, the most vulnerable.
Detailed design | This stage simply does not exist, except as a by-product of the next stage.
Coding | This is the central part of open source development. This is what developers enjoy doing. Source code is regarded as the most important, or only, product.
Integration | A version is built and placed on an internet site for users to download.
Verification | This is what the collaborators do. Not only can they run the program (and reveal bugs) but they can study the source code (and reveal bugs). There is no test plan or strategy. Different users will investigate the program in different ways. For example, some will be interested in robustness, others in security. This diversity has the potential to provide thorough testing.
Bug fixes | Again, this is what the collaborators do.
Support | Via newsgroups or commercial organizations.

 

 

Self Test Question

What is the main technique of open source development?

Answer

Code sharing

End

Self Test Question

What is the main tool of open source development?

Answer

The internet

End

 

5. Project Management

The hacker ethic is essentially anti-managerial. So how are the following activities carried out in open source development?

 

  • planning schedules
  • deciding who is involved and how
  • deciding which new features will be incorporated
  • deciding on the architectural structure for the software
  • deciding what patches (corrections) will be incorporated
  • deciding when to make new releases

 

These activities are particularly challenging because of the large numbers of people involved. There is a need for a responsive decision-making structure so that decisions can be made quickly when bugs are reported and new features are suggested.

 

An explicit project manager or management group is generally in place on open source projects. They decide on the usefulness and appropriateness of contributions that are made by the wider developer community. They usually also apply the accepted patches to the code themselves and therefore act as chief implementer on the project. Various organizational styles are used but a common characteristic is a hierarchy of control.

 

As an example of individual rule, the Linux kernel development (see below) is based on a hierarchy, with Linus Torvalds at the top. Proposals are examined for appropriateness, selectively filtered and sent up the hierarchy until they reach Torvalds. If he accepts the proposal he integrates it into the code. He tightly controls the kernel. He has said "I couldn't manage lots of developers. I would not have been able to keep control."

 

An example of group hierarchical management is the Apache project. This development is led by a group, the Project Management Committee (PMC). Membership of this group is by invitation only and must be approved by a majority vote of the group, with no veto from any member. A potential member must demonstrate high technical competence. Outside the PMC are the large numbers of contributors. A contributor submits a change, which is vetted by a PMC member. The PMC then votes on whether to accept the change.
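The membership rule just described (a majority of the group in favour, and no veto from any member) can be sketched as follows. The Vote values and the function membership_approved are invented for illustration and are not taken from the Apache project's own procedures.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"
    VETO = "veto"


def membership_approved(votes, committee_size):
    """Return True if a candidate is approved under the rule described above.

    votes: the votes actually cast, as a list of Vote values.
    committee_size: the number of members in the whole group.
    """
    # A single veto blocks the candidate, whatever the other votes are.
    if Vote.VETO in votes:
        return False
    # Otherwise more than half of the whole group must be in favour.
    yes_votes = sum(1 for vote in votes if vote is Vote.YES)
    return yes_votes > committee_size / 2
```

In this sketch members who do not vote count against the candidate, because the majority is measured against the whole group. That is one possible reading of “a majority vote of the group”.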

 

As these examples indicate, most open source projects use an explicit project manager or a management group at the head of a hierarchy. The hierarchy means that a project is prevented from being overwhelmed by contributions. It also lessens the risk of sabotage. While the hierarchy within open source projects does not provide an “undo” facility, it does attempt to ensure that contributions are rigorously interrogated as they pass through the structure. There is clearly an important element of trust.

 

6. Case Study – the GNU/Linux Operating System

 

GNU/Linux is an open source operating system, loosely based upon Unix. It contains over 10 million lines of code and has been developed by over 3000 major contributors of code from 90 countries. It is perhaps the most famous open source project. It has achieved a reputation for high reliability, and is widely used in servers across the world. It is distributed under the GPL.

 

Linus Torvalds, who still oversees the development today, instigated the project in 1991. Torvalds originally began the project because none of the operating systems available at the time served his own requirements. They were either unreliable, too expensive or devoid of the functionality he required. He knew that the main free software operating system, GNU ("GNU's Not Unix"), was some way from completion. He could not wait. What was missing from GNU was the central component, the kernel. So he began to write a kernel. Torvalds was also motivated by the enjoyment of writing code and claims that he wrote it “just for fun!” The kernel was named Linux, after Linus. The Free Software Foundation strongly maintains that the majority of the complete operating system was written by the GNU people and that it should more appropriately be called GNU/Linux.

 

Torvalds targeted developer forums and websites, posting an early release of the kernel and requesting feedback and contributions. Increased contributions and collaborations between the Linux and GNU groups meant that distribution of beta versions was frequent and continuous.

 

Now you might imagine that Linux was developed in an egalitarian fashion, in line with the hacker ethic. But this is not so. Torvalds is in charge, surrounded by a small group of trusted lieutenants (called credited developers). These people are selected by Linus and by the group itself. The architecture of Linux is deliberately modular, so as to minimise communication between developers and to make it easier to carry out development of different modules in parallel. Each credited developer has responsibility for an individual module.

 

The development of Linux centres around an electronic mailing list. This contains:

 

  • bug reports from users and testers
  • suggested bug fixes (patches)
  • code for new features
  • announcements, such as the announcement of a new release

 

The scale of this list is huge - in one five-year period, approximately 13,000 contributors posted around 175,000 messages to the list. The credited developers watch the mailing list, looking for entries relevant to their module. They assess whether a contribution is useful and, where appropriate, submit patches to Torvalds. Then he decides what happens.
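This two-stage filtering can be sketched as a small routing function. It is a simplified model only: the Patch type, the function route_patches and the representation of each reviewer as a yes/no function are assumptions made for illustration, not a description of the tools actually used on the kernel.

```python
from dataclasses import dataclass


@dataclass
class Patch:
    module: str       # the part of the kernel that the patch touches
    description: str


def route_patches(mailing_list, reviewers, final_decision):
    """Pass patches up a two-level hierarchy and return those accepted.

    mailing_list: the Patch objects posted to the list.
    reviewers: maps a module name to its credited developer's review
        function (Patch -> bool).
    final_decision: the top maintainer's review function (Patch -> bool).
    """
    accepted = []
    for patch in mailing_list:
        reviewer = reviewers.get(patch.module)
        # A patch for a module nobody watches, or one that its credited
        # developer judges not useful, never reaches the top.
        if reviewer is None or not reviewer(patch):
            continue
        # The top maintainer has the last word on what is integrated.
        if final_decision(patch):
            accepted.append(patch)
    return accepted
```

The point of the structure is that the person at the top only ever sees the contributions that have survived the first level of review.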

 

So the people on the project are organized as follows:

 

  • Torvalds is at the centre
  • he is surrounded by a small group of credited developers
  • they are surrounded by thousands of contributors

 

Clearly this is a very centralized, and hierarchical, structure.

 

After years of continuous development, GNU/Linux is now a renowned open source operating system, competing on the world market with other commercial and proprietary products. What began as a personal project is now widely used and technically reputable. The GNU/Linux code is still available in its original non-supported format. However, a number of commercial organizations also exist to provide appropriate support for various user markets. GNU/Linux remains in continuous development, undergoing corrections and enhancements.

 

7. Discussion

Open source development’s most attractive asset is the enormous enthusiasm and passion that resonates throughout the developer community and their building of software. Developers have an unrelenting belief in what they do; voice their pride in their hacker roots; and find nothing more fulfilling than the art of programming.

 

Forking ensures that developer requirements are established and implemented in a democratic process. This means that the requirements of the majority of the development community are satisfied. Similarly, any specific personal modification can be made by individuals, providing that they have the technical ability to implement them.

 

However, it is worth noting that this process largely ignores non-developer user requirements. The general user does not have the power to register their vote via code implementation; neither can they personally modify their own code.

 

The re-use of code is an important development approach. However, in the case of open source projects that attempt to re-write entire systems and applications, a re-use approach can only be facilitated by source code that is not covered by a proprietary license. Liability issues may hinder entire projects because developers may not have legal access to any code that they would like to re-write. However, the overall expertise of the hacker community usually means that volunteers are willing to take on the alternative and more difficult task of writing entire systems from scratch.

 

Releasing frequent versions of the software brings benefits of continuous feedback. Whilst the beta code may not contain all the functionality that is required, it means that the developer base can immediately evaluate the code and get a feel for the software. Crucially, the potentially vast audience of testers can immediately begin to track and fix bugs, so that changes can be made incrementally, continuously and at a relatively fast pace.

 

Inappropriate patches, once incorporated into code, can irreparably damage a project. Having an explicit manager on open source projects means that all contributions are monitored and approved. This ensures that the freedom to contribute is upheld, but lessens the risk of any sabotage attempts.

 

Open source program code tends to be highly reliable because bugs are found and fixed by a wide viewing audience with highly proficient programming abilities. Proprietary software corporations are being forced to acknowledge open-source development as a valid approach and are beginning to experiment with its techniques. The high viewing audience that can track and fix bugs is seen as an efficient way of “cleaning up” software that is proving to be unreliable. Consequently, some companies have now opened up previously closed code. This suggests that the open source development approach can influence other mainstream techniques.

 

Contributors to open source projects have a passion for programming, so that writing code is seen as more of a hobby than a chore or a job. They gain enormous satisfaction in seeing their patches integrated into a program. However, because open source projects generally rely upon voluntary contributions, there is always the risk that the community will cease to contribute to the project. This would result in the stagnation of a development project and an unfinished product.

 

Similarly, the lack of documentation also potentially limits the maintenance to the original developer base and lessens the ability of someone else being able to take on the project. If the initial developer base tires of a project, it is not easy for another developer to take on the project without documentation as a means of communicating the design of the program.

 

The usefulness of informal support mechanisms is questionable, particularly for the non-technical user. Web site tutorials are often aimed at a technically adept audience. In addition, since support services are voluntary, there is no guarantee that someone will be available when required and users may have to wait until someone responds to their enquiry.

 

8. Summary

Open source development is a collaborative approach relying upon voluntary contributions of program code. It has its roots in a hacker ethic that promotes individual skill, but also upholds the importance of community. 

 

The approach produces extremely reliable software because open source code means bugs are exposed to a vast audience. Thus more bugs are likely to be found and fixed. The regular release of the software also means that program code is continually tested before the final product version is released.

 

Non-commercial open source organizations are often weak in supporting the general user (for example, producing and supporting a word processor). However, the commercial sector, acknowledging the superiority of the open source code, is addressing this problem, providing support services for open source products and adopting open source development techniques.

 

9. Exercises

 

1.      Can you think of any situations or products for which open source development might be most appropriate?

2.      Can you think of examples of situations in which open source development of products might be unwise?

3.      Assess whether open source would be suitable for each of the developments given in appendix A.

4.      Compare and contrast the approaches of the Free Software Foundation and the Open Source Initiative.

5.      Is open source development just hacking?

 

10. References

Hackers: Heroes of the Computer Revolution

Levy, S. (2002), Anchor Books

 

This provides a rare insight into the history of hacking, from its origins at MIT in the 1950s to the rise of open source software.

--------------------------------------------------------------------------------------------------------------

Open Sources: Voices from the Open Source Revolution (1st Edition)

DiBona, C., Ockman, S., & Stone, M. (1999), O’Reilly

 

A comprehensive collection of essays covering topics from licensing issues to the engineering of major open source products such as Mozilla and Perl.

-----------------------------------------------------------------------------------------------------------------

Rebel Code: Inside Linux and the Open Source Revolution

Moody, G. (2001), Perseus Publishing.

 

This is a very accessible book which depicts the development of the GNU/Linux Operating System, including interviews with major contributors in the open source field.

------------------------------------------------------------------------------------------------------

The Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary (Revised Edition)

Raymond, E. S. (2001) O’Reilly.

 

This is a response to Fred Brooks’ seminal proprietary software development text The Mythical Man-Month (1975). Raymond argues why the open source approach to software development will provide a higher quality product.

-------------------------------------------------------------------------------------------------------------

Free as in Freedom: Richard Stallman’s Crusade for Free Software

Williams, S. (2002), O’Reilly.

 

Primarily focusing on the life and moral crusade of Stallman, this text also describes the development of the GNU project and other projects of the Free Software Foundation.

--------------------------------------------------------------------------------------------------------------

SourceForge.net is a web site that coordinates open source development projects. If you want to contribute to projects, this is the place.

The URL is http://www.sourceforge.net

 

---------------------------------------------------------------------------------------------

 

The Success of Open Source

Steven Weber, Harvard University Press, 2004

 

A social scientist's view of open source development, showing how it challenges conventional wisdom.