MAINFRAME COBOL IN THE TWENTY-FIRST CENTURY

 

Version 3.0 – March 27, 2001

Version 3.1 – May 27, 2009

 

 

Abstract

 

NOTE:  This is a reissue of a paper first produced as a client study in 2001.  Some new material from the author’s Best Practices in Software Engineering (McGraw Hill 2009) is included. The book will be published in the Autumn of 2009.

 

Mainframe computers running under IBM’s MVS operating system are the primary tools of large business and government information processing.  Although COBOL is one of the oldest programming languages, it remains one of the most widely used.  As of 2001, there are more professional COBOL programmers than for any other language.  The COBOL language also has more tools available for application development and especially for maintenance than any other language.  This article discusses the demographics of COBOL use in the United States and abroad.

 

 

Capers Jones, Chief Scientist Emeritus

Software Productivity Research LLC

 

Email:  CJonesiii@CS.com

Web:  http://www.spr.com

 

 

 

Copyright © 1998 - 2009 by Capers Jones.

All Rights Reserved.

 


INTRODUCTION

 

As of 2009 there are close to 45,000,000 software applications throughout the world.  These applications are written in a total of more than 2,500 programming languages.  Most of the languages are obscure and academic, but there are perhaps 150 common commercial programming languages (e.g. COBOL, Java, and C) and at least 100 proprietary languages developed by companies for software development (e.g. ESPL/I and CORAL).  About one third of software applications utilize two or more languages simultaneously, with COBOL and SQL being the most common pairing.  Software applications operate on more than 50 hardware platforms and run under more than 25 operating systems.

 

Although software applications run on many computers and under many operating systems, the most common combination for large businesses and government agencies is a mainframe computer running IBM’s MVS operating system, with applications written in the COBOL programming language.

 

Although in 2009 maintenance programmers using COBOL outnumber new developers using COBOL, the language is still used for many business applications.

 

Because mainframe computers and COBOL continue to be prominent, it is useful to examine the size of these domains.  There is a high margin of error in the demographic information; the lack of real accuracy in software demographics is a long-standing weakness of the software community.

 

The Overall U.S. Software Population and the COBOL Software Population

 

In 1994 Software Productivity Research (SPR) was commissioned to do a study of software occupation groups in major corporations and government agencies (Jones 1995).  The study noted more than 50 kinds of occupations in the software groups of major organizations such as AT&T, IBM, Texas Instruments, and the U.S. Air Force, all of which were participants in the study.

 

Using this study as a jumping off place, table 1 attempts to show the percentage of the overall United States software occupation groups that are primarily concerned with COBOL applications.  The basic demographic data was updated through calendar year 1998 at the request of one of SPR’s clients.

 

Table 1 is sorted in descending order by two major groups:  1) Occupations that are immediately concerned with building and maintaining software comprise the first group;  2) Occupations such as customer support and sales which deal with software applications indirectly after they have been deployed comprise the second group.  Overall, roughly 24% of the total U.S. software population is involved with the development and maintenance of COBOL applications circa 1998.


 


Table 1:  Software and COBOL Personnel in the United States in 1998

Software Occupation Groups             Number     COBOL       COBOL
                                     Employed   Percent   Personnel

Programmer/analyst                    400,000    40.00%     160,000
Programmer (maintenance)              350,000    42.00%     147,000
Project manager (1st level)           225,000    33.00%      74,250
Programmer (development)              275,000    20.00%      55,000
Systems analyst                       100,000    33.00%      33,000
Data administration specialist         50,000    35.00%      17,500
Testing specialist                    125,000    12.00%      15,000
Software technical writer              75,000    20.00%      15,000
Project manager (2nd level)            35,000    20.00%       7,000
Software Quality Assurance             25,000    15.00%       3,750
Configuration control specialist       15,000    15.00%       2,250
Software engineer (systems)           200,000     1.00%       2,000
Performance specialists                 7,500    25.00%       1,875
Project manager (3rd level)             5,000    20.00%       1,000
Software Architect                      1,500    12.00%         180
Software engineer (realtime)           75,000     0.00%           0
Software engineer (embedded)           70,000     0.00%           0

SUBTOTAL                            2,034,000    26.29%     534,805

Support Occupations                    Number     COBOL       COBOL
                                     Employed   Percent   Personnel

Software mgt. consultant               45,000    25.00%      11,250
Customer support specialist            80,000    12.00%       9,600
Software sales specialist             105,000     5.00%       5,250
Software librarians                    15,000    25.00%       3,750
Software education specialist          30,000    10.00%       3,000
Systems administration                 50,000     5.00%       2,500
Measurement specialist                  3,500    45.00%       1,575
Process improvement specialist          5,000    10.00%         500
Software marketing specialist           3,000    15.00%         450
Process auditors/assessors              7,500     5.00%         375
Project planning specialist             2,000    15.00%         300
Cost estimating specialist              2,000    15.00%         300
Certified function point counter          500    60.00%         300
Human factors specialist                1,000    10.00%         100

SUBTOTAL                              349,500    11.23%      39,250

TOTAL                               2,383,500    24.08%     574,055

While COBOL can be found on a variety of hardware platforms ranging from notebooks through mainframes, COBOL is the traditional language of mainframe business applications.  From queries among companies that market mainframe applications, there are at least 5,000 MVS sites in the United States.  Assuming that a typical MVS site is supported by a staff of about 80 software personnel, then roughly 400,000 of the U.S. COBOL personnel out of about 574,000 would be associated with MVS mainframe locations in the United States.

 

Software Productivity Research collects information on programming languages and programmers who use various languages, but does not collect information on hardware platforms.  (SPR’s interest in programming languages is in support of our software cost estimating tools CHECKPOINT® and KnowledgePlan® which can estimate software projects developed in “all known programming languages.”)

 

The market-research firm, Gartner Group, has estimated that “MVS platforms using COBOL code comprise roughly 80% of the overall year 2000 marketplace.…”  Presumably the Gartner statement applies to the overall COBOL marketplace.  Using Gartner Group’s estimate of 80% combined with the table 1 total of about 574,000 then the U.S. total of MVS COBOL personnel would be about 459,000.  Since neither the SPR nor Gartner assumptions are precise, it would be fair to take an intermediate value of about 450,000 U.S. MVS COBOL personnel.
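The two estimates above are simple arithmetic and can be checked with a short sketch.  The constants come from the text and from table 1; the function and variable names are illustrative assumptions and are not part of any SPR or Gartner tool.

```python
# Inputs taken from the text and table 1; all names are illustrative only.
US_COBOL_PERSONNEL = 574_055   # table 1 total of U.S. COBOL personnel
MVS_SITES = 5_000              # "at least 5,000 MVS sites" in the United States
STAFF_PER_SITE = 80            # assumed typical software staff per MVS site
GARTNER_MVS_SHARE_PCT = 80     # Gartner Group's rough MVS share, in percent


def site_based_estimate(sites: int = MVS_SITES, staff: int = STAFF_PER_SITE) -> int:
    """Estimate MVS COBOL personnel from the number of MVS sites."""
    return sites * staff


def share_based_estimate(total: int = US_COBOL_PERSONNEL,
                         share_pct: int = GARTNER_MVS_SHARE_PCT) -> int:
    """Estimate MVS COBOL personnel as a percentage of all COBOL personnel."""
    return total * share_pct // 100


low = site_based_estimate()
high = share_based_estimate()
print(f"Site-based estimate:  {low:,}")
print(f"Share-based estimate: {high:,}")
```

The intermediate value of about 450,000 used in the text lies between the two results.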

 

One of the difficulties of this kind of analysis is the fact that many software personnel are multi-lingual in terms of the programming languages they use.  Thus it is easily possible for the same software personnel to develop applications in either COBOL, C, Fortran, Visual Basic, SQL, Java, or some other language depending upon the needs of the project being developed.

 

In addition, software personnel are also able to work on multiple platforms.  While some personnel may specialize in mainframes, mid-range, or desktop platforms, many others can develop on several platforms.  For example, client-server applications are routinely two-tier or three-tier applications that span multiple platforms and can include mainframes as servers and desktop computers as clients.

 

World Total of Software Personnel and COBOL Personnel

 

Building on the information in table 1, it is interesting to extrapolate the total numbers of professional software personnel, and the total numbers of professional software personnel who work in the COBOL arena to a global level.

 

Table 2 shows the approximate 1998 total software populations in terms of six large geographic areas:  North America (U.S. and Canada), Latin America (Mexico through Argentina), Africa and the Middle East, Western Europe, Eastern Europe (the former Soviet Bloc), and Asia and the Pacific Rim which includes China, India, Japan, the Koreas, Thailand, Malaysia, Indonesia, etc.

 

Table 2:  Software and COBOL Populations by Geographic Region

Geographic Regions         Total  All COBOL  All COBOL  MVS COBOL
                        Software    Percent  Personnel  Personnel
                      Population

North America          2,600,000     24.00%    624,000    486,720
Western Europe         2,900,000     20.00%    580,000    452,400
Asia-Pacific           3,500,000     12.00%    420,000    327,600
Latin America          2,500,000     15.00%    375,000    292,500
Eastern Europe         1,900,000     12.00%    228,000    177,840
Middle East/Africa     1,000,000     15.00%    150,000    117,000

TOTAL                 14,400,000     16.51%  2,377,000  1,854,060

 

Once again, table 2 has a large margin of error.  However, the underlying assumptions appear to be reasonable.  COBOL as a business-oriented language is more widely used in North America than anywhere else.  Western Europe ranks number 2 in terms of COBOL programmers, but Western Europe is comparatively high in Algol and PL/I applications both of which overlap COBOL in application usage.

 

Eastern Europe is dominated by military and systems software, rather than business applications, so COBOL is a comparatively sparse language in terms of frequency of usage.

 

Latin America, the Asia-Pacific area, and the Middle East and Africa also use COBOL for business applications, but not to the same degree as noted in either North America or in Western Europe.

 

Note that table 2 condenses the software populations of about 200 countries and territories, which also adds to the margin of error.  Even so, COBOL has been the dominant international language for business applications for more than 25 years.
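The regional figures in table 2 can be reproduced from two inputs per region, as the sketch below suggests.  The populations and COBOL percentages are from table 2.  The 78% MVS share is an assumption inferred from the table itself (486,720 / 624,000); the text mentions only rough values of about 80%.  All names are illustrative.

```python
# (total software population, COBOL percent) per region, from table 2.
REGIONS = {
    "North America": (2_600_000, 24),
    "Western Europe": (2_900_000, 20),
    "Asia-Pacific": (3_500_000, 12),
    "Latin America": (2_500_000, 15),
    "Eastern Europe": (1_900_000, 12),
    "Middle East/Africa": (1_000_000, 15),
}
MVS_SHARE_PCT = 78  # assumption: inferred from table 2, not stated in the text


def cobol_personnel(population: int, cobol_pct: int) -> int:
    """COBOL personnel as a percentage of the regional software population."""
    return population * cobol_pct // 100


def mvs_personnel(cobol_staff: int, mvs_pct: int = MVS_SHARE_PCT) -> int:
    """MVS COBOL personnel as a percentage of all COBOL personnel."""
    return cobol_staff * mvs_pct // 100


rows = {}
for region, (population, pct) in REGIONS.items():
    cobol = cobol_personnel(population, pct)
    rows[region] = (cobol, mvs_personnel(cobol))

total_population = sum(pop for pop, _ in REGIONS.values())
total_cobol = sum(cobol for cobol, _ in rows.values())
total_mvs = sum(mvs for _, mvs in rows.values())
overall_pct = round(100 * total_cobol / total_population, 2)

print(f"{total_population:,} {overall_pct}% {total_cobol:,} {total_mvs:,}")
```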

 

Table 2 shows a snapshot for calendar year 1998.  SPR was also asked about trends over time, with emphasis on MVS COBOL, which is a very complex issue.  Because of the urgent need to update mainframe MVS applications for both the Euro and the Year 2000 problems from 1997 through 1999, the MVS COBOL population has been expanding rapidly.  This trend may continue through 2001.  By 2001 the main Euro and Year 2000 software problems will have been repaired.

 

Table 3 builds on the world MVS COBOL data for 1998 and shows the approximate numbers of MVS COBOL programmers from 1995 through 2005.  Needless to say, table 3 has a large margin of error.


 

Table 3:  World MVS COBOL Population, 1995 - 2005

Year       World MVS       Percent
               COBOL       of 1998
          Population    Population

1995       1,539,778        83.05%
1996       1,655,676        89.30%
1997       1,761,357        95.00%
1998       1,854,060       100.00%
1999       1,965,304       106.00%
2000       2,063,569       111.30%
2001       2,146,112       115.75%
2002       2,210,495       119.22%
2003       2,232,600       120.42%
2004       2,124,650       114.59%
2005       2,060,911       111.16%

 

 

 

Since the base or starting year for table 3 is calendar year 1998, the number of MVS COBOL programmers employed in 1998 is set at 100%, and the other years are expressed as percentages of the 1998 base year.  The peak year for MVS COBOL is predicted to be 2003, after which a decline should begin to occur.
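The percentage column of table 3 is a simple index against the 1998 base year.  A minimal sketch of that calculation follows; the population values are copied from table 3 and the names are illustrative.

```python
# World MVS COBOL population by year, copied from table 3.
MVS_COBOL_POPULATION = {
    1995: 1_539_778,
    1996: 1_655_676,
    1997: 1_761_357,
    1998: 1_854_060,
    1999: 1_965_304,
    2000: 2_063_569,
    2001: 2_146_112,
    2002: 2_210_495,
    2003: 2_232_600,
    2004: 2_124_650,
    2005: 2_060_911,
}
BASE_YEAR = 1998


def index_to_base(series: dict, base_year: int) -> dict:
    """Express every year as a percentage of the base-year value."""
    base = series[base_year]
    return {year: round(100 * value / base, 2) for year, value in series.items()}


index = index_to_base(MVS_COBOL_POPULATION, BASE_YEAR)
peak_year = max(MVS_COBOL_POPULATION, key=MVS_COBOL_POPULATION.get)

for year, pct in index.items():
    print(f"{year}  {MVS_COBOL_POPULATION[year]:>10,}  {pct:7.2f}%")
print("Peak year:", peak_year)
```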

 

COBOL Applications Throughout the World

 

The next topic of interest is to try to ascertain the total volume of COBOL applications throughout the world.  There is no solid, definitive data on the number of applications deployed in any programming language.  Further, there are many dialects of COBOL including but not limited to IBM’s MVS COBOL, COBOL II, MicroFocus COBOL, COBOL LE, VS/COBOL, COBOL74, COBOL68, and RM COBOL.

 

While there are differences among these dialects, we can reach a useful approximation by using some overall rules of thumb derived from software assessment and benchmark studies.  Let us assume these approximate values:

 

  • There will be 1,000 function points of COBOL applications for each COBOL staff member in every geographic region.

  • There will be an average of 105 COBOL source code statements per function point, counting the procedure and data divisions.  (The range is from about 65 to more than 175 statements per function point.)

  • The average COBOL software application is 250 function points in size (although the maximum observed size can be greater than 15,000 function points).  This is equivalent to 26,250 COBOL source code statements.

  • The average COBOL program is about 15 function points or 1,500 lines of executable code in size.

  • The average COBOL program is developed at a rate of about 15 function points per staff month, although there are broad ranges.

  • The average COBOL application of 250 function points is developed at a rate of about 8 function points per staff month.  Here too there are broad ranges.

  • During maintenance, one maintenance programmer can support about 1,500 function points of COBOL.  This is equivalent to about 160,000 COBOL source statements in the procedure and data divisions.
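As a rough illustration, the rules of thumb above can be expressed as two small conversion functions.  This is only a sketch: the ratios are the approximate averages given in the list, the average application is assumed to be 250 function points, and the function names are hypothetical.

```python
STATEMENTS_PER_FP = 105  # average COBOL statements per function point


def fp_to_statements(function_points: int, ratio: int = STATEMENTS_PER_FP) -> int:
    """Convert function points to approximate COBOL source statements."""
    return function_points * ratio


def staff_months(function_points: int, fp_per_staff_month: float) -> float:
    """Approximate development effort at a given productivity rate."""
    return function_points / fp_per_staff_month


# Average application: 250 function points at about 8 per staff month.
print(fp_to_statements(250), staff_months(250, 8))

# Average program: 15 function points at about 15 per staff month.
print(fp_to_statements(15), staff_months(15, 15))

# Maintenance assignment scope: about 1,500 function points per programmer.
print(fp_to_statements(1_500))
```

At 105 statements per function point, the 15-function-point program works out to 1,575 statements and the maintenance scope to 157,500, which the list rounds to 1,500 and 160,000 respectively.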

 

Using some of these initial assumptions, the world total for COBOL applications would amount to almost 250 billion COBOL source code statements as shown in Table 4.

 

Table 4:  World Total of COBOL Staff, Function Points, and Lines of COBOL Code

Geographic Regions    World COBOL     World COBOL       World COBOL
                         Software        Function     Lines of Code
                       Population          Points

North America             624,000     624,000,000    65,520,000,000
Western Europe            580,000     580,000,000    60,900,000,000
Asia-Pacific              420,000     420,000,000    44,100,000,000
Latin America             375,000     375,000,000    39,375,000,000
Eastern Europe            228,000     228,000,000    23,940,000,000
Middle East/Africa        150,000     150,000,000    15,750,000,000

TOTAL                   2,377,000   2,377,000,000   249,585,000,000

MVS TOTALS              1,854,060   1,854,060,000   194,676,300,000
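The values in table 4 follow from two of the stated assumptions: 1,000 function points per COBOL staff member and 105 source statements per function point.  The sketch below reproduces the totals; the staff counts are from table 2 and the names are illustrative.

```python
FP_PER_STAFF = 1_000      # function points of COBOL per COBOL staff member
STATEMENTS_PER_FP = 105   # average COBOL statements per function point

# COBOL personnel by region, from table 2.
COBOL_STAFF = {
    "North America": 624_000,
    "Western Europe": 580_000,
    "Asia-Pacific": 420_000,
    "Latin America": 375_000,
    "Eastern Europe": 228_000,
    "Middle East/Africa": 150_000,
}
MVS_COBOL_STAFF = 1_854_060  # world MVS COBOL personnel, from table 2


def function_points(staff: int) -> int:
    """Function points of deployed COBOL implied by a staff count."""
    return staff * FP_PER_STAFF


def lines_of_code(staff: int) -> int:
    """COBOL source statements implied by a staff count."""
    return function_points(staff) * STATEMENTS_PER_FP


world_staff = sum(COBOL_STAFF.values())
for region, staff in COBOL_STAFF.items():
    print(f"{region:<20}{staff:>10,}{function_points(staff):>16,}{lines_of_code(staff):>18,}")
print(f"{'TOTAL':<20}{world_staff:>10,}{function_points(world_staff):>16,}{lines_of_code(world_staff):>18,}")
```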

 

 

The previously-cited report by Gartner Group states that an average COBOL program (as opposed to an application which contains several programs) is about 1,500 lines of source code in size.  This is roughly equivalent to about 15 function points in size.  Dividing the MVS world number of lines of code of 194,676,300,000 by 1,500 indicates that there may be a total of 129,784,200 MVS COBOL programs in the world.

 

If you expand your view from programs of 15 function points to applications with an average size of 250 function points, then the world-wide total of MVS COBOL applications comes to 7,787,052.  In other words, quite a high percentage of the world’s business is conducted by means of COBOL applications.
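The program and application counts above can be reproduced with the following sketch.  It assumes, as the text does, 1,500 lines and 15 function points per program and 250 function points per application; the names are illustrative.

```python
MVS_LINES_OF_CODE = 194_676_300_000  # MVS total from table 4
MVS_FUNCTION_POINTS = 1_854_060_000  # MVS total from table 4
LINES_PER_PROGRAM = 1_500            # average program size cited from Gartner Group
FP_PER_PROGRAM = 15                  # rough function point size of a program
FP_PER_APPLICATION = 250             # average application size

# Number of programs implied by the total volume of MVS COBOL code.
programs = MVS_LINES_OF_CODE // LINES_PER_PROGRAM

# Number of applications, scaling program count by relative size.
applications = programs * FP_PER_PROGRAM // FP_PER_APPLICATION

# Alternative: divide the function point total directly.
applications_direct = MVS_FUNCTION_POINTS // FP_PER_APPLICATION

print(f"Programs:                {programs:,}")
print(f"Applications:            {applications:,}")
print(f"Applications (from FPs): {applications_direct:,}")
```

The direct division gives a somewhat lower figure because 1,500 lines per program corresponds to 100 statements per function point rather than 105.  Given the margins of error involved, the difference is not significant.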

 

 

EXCERPT FROM CHAPTER 8 OF BEST PRACTICES IN SOFTWARE ENGINEERING

 

The following excerpts are from chapter 8 of the author’s new book on software engineering, which will be published by McGraw Hill in the autumn of 2009.  Chapter 8 deals with the problems caused by the existence of more than 2,500 total programming languages.

 

…..The popularity of programming languages bears a certain resemblance to the popularity of prime-time television shows.  Some new shows such as “Two and a Half Men” surface, attract millions of viewers, and may last for a number of seasons.  A few shows such as “Seinfeld” become so popular that they go into syndication and continue to be aired long after production stops.   But many shows are dropped after a single season.

 

It is interesting that the life expectancy of programming languages and the life expectancy of television shows are about the same duration.  Many programming languages have active lives that span only a few “seasons” and then disappear.  Other languages become standards and may last for many years.  However when all 2,500 languages are considered, the average active life of a programming language when it is being used for new development is less than 5 years.  Very few programming languages attract development programmers after more than 10 years.

 

Some of the languages that are in the class of “Seinfeld” or “I Love Lucy” and may last more than 25 years under syndication include:

 

  • Ada
  • C
  • C++
  • COBOL
  • Java
  • Objective C
  • PL/I
  • SQL
  • Visual Basic
  • XML

 

In a programming language context, the term “syndication” means that the language is no longer under the direct control of its originator, but rather control has passed to a user group or a commercial company, or that the language has been put in the public domain and is available via open-source compilers.

 

It would be interesting and valuable if there were benchmarks and statistics kept of the numbers of applications written in these long-lived programming languages.  No doubt C and COBOL have both been used for more than 1,000,000 applications on a global basis. 

 

In fact continuing with the analogy of the entertainment business, it might be interesting to have awards for languages that have been used for large numbers of applications.  Perhaps “silver” might go for 100,000 applications; “gold” for 1,000,000 applications and “platinum” for 10,000,000 applications.

 

If such an award were created, a good name for it might be the “Hopper” after Admiral Grace Hopper who did so much to advance programming languages and especially COBOL.  In fact COBOL is probably the first programming language in history to achieve the 1,000,000-application plateau.

 

Although the idea of awards for various numbers of applications is interesting, it would require that statistics be available for ascertaining how many applications were created in specific languages or combinations of languages.  As of 2009 the software industry does not keep such data….

 

It would be technically possible to develop a standard method of describing and cataloging the features of programming languages.  Indeed with more than 2,500 languages in existence, such a catalog is urgently needed.   Even if the catalog only started with 100 of the most widely used languages it would provide valuable information.

 

The full set of topics needed to create an effective taxonomy of programming languages is outside the scope of this book, but might contain factors such as:

 

  1. Language name:                 Name of language
  2. Architecture:                  Object-oriented, functional, procedural, etc.
  3. Origin:                        Year of creation; names of inventors
  4. Sources:                       URLs of distributors of language compilers
  5. Current version:               Version number of current release; 1, 2, or whatever
  6. Support:                       URLs or addresses of maintenance organizations
  7. User associations:             Names, URLs, and locations of user groups
  8. Tutorial materials:            Books and learning sources about the language
  9. Reviews or critiques:          Published reviews of language in refereed journals
  10. Legal status:                 Public domain, licensed, patents, etc.
  11. Language definition:          Whether it is formal or informal
  12. Language syntax:              Description of syntax
  13. Language typing:              Strongly typed, weakly typed, untyped, etc.
  14. Problem domains:              Mathematics, web, embedded, graphics, etc.
  15. Hardware platforms:           Hardware language was intended to support
  16. OS platforms:                 Operating systems language compilers work with
  17. Intended uses:                Targeted application types
  18. Known limitations:            Performance, security, problem domains, etc.
  19. Dialects:                     Variations of the basic language
  20. Companion languages:          .NET, XML, ASP.NET, etc. (languages used jointly)
  21. Extensibility:                Commands added by language users
  22. Level:                        Logical statements relative to assembly language
  23. Backfire level:               Logical statements per function point
  24. Reuse sources:                Certified modules, uncertified, etc.
  25. Security features:            Intrinsic security features, such as in the E language
  26. Debuggers available:          Names of debugging tools
  27. Static analysis available:    Names of static analysis tools
  28. Development tools available:  Names of development tools
  29. Maintenance tools available:  Names of maintenance tools
  30. Applications to date:         Approximately 100, 1000, 10,000, 100,000, etc.
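To make the idea of a catalog more concrete, the sketch below shows how a handful of the factors listed above might be captured as a structured record.  This is purely hypothetical: no such standard catalog exists, the class and field names are invented for illustration, and only a subset of the 30 factors is shown.  The sample COBOL values are taken from this paper, apart from the well-known 1959 origin date.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class LanguageEntry:
    """One hypothetical catalog record covering a subset of the factors."""
    name: str
    architecture: Optional[str] = None
    origin_year: Optional[int] = None
    current_version: Optional[str] = None
    typing_discipline: Optional[str] = None
    problem_domains: List[str] = field(default_factory=list)
    dialects: List[str] = field(default_factory=list)
    companion_languages: List[str] = field(default_factory=list)
    backfire_level: Optional[int] = None        # statements per function point
    applications_to_date: Optional[int] = None  # order of magnitude only


def unfilled(entry: LanguageEntry) -> List[str]:
    """List the factors that still need research for this entry."""
    return [f.name for f in fields(entry) if getattr(entry, f.name) in (None, [])]


cobol = LanguageEntry(
    name="COBOL",
    architecture="procedural",
    origin_year=1959,
    problem_domains=["business applications"],
    dialects=["COBOL II", "COBOL74", "RM COBOL"],
    companion_languages=["SQL"],
    backfire_level=105,
)

print(unfilled(cobol))
```

A record of this kind makes the gaps in current knowledge explicit, which is one of the purposes a standard taxonomy would serve.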

 

Given the huge number of programming languages it is surprising that no standard taxonomy exists.  Web searches reveal more than a dozen topics when using search arguments such as “taxonomies of programming languages” or “categories of programming languages.”  However these vary widely and some contain more than 50 different descriptive forms, but seem to lack any fundamental organizing principle. 

 

Returning now to the main theme, somewhat alarmingly, the life expectancy of many software applications is longer than the active life of the languages in which they were written.  An example of this is the patient-record system maintained by the Veterans Administration.  It is written in the MUMPS programming language and has far outlived MUMPS itself.

 

It is obvious to students of software engineering economics that if programming languages have an average life expectancy of only 5 years, but large applications last an average of 25 years, then software maintenance costs are being driven higher than they should be due to the very large number of aging applications that were coded in programming languages that are now dead or dying…..

 

 

….Table 8.6 is a summary of 40 kinds of software applications that have critical importance to the United States.  Table 8.6 also shows the various programming languages used in these 40 kinds of applications.  A major function of a code translation center would be to accumulate more precise data on critical applications and the languages used in them.

 

Both columns of table 8.6 need additional research.  There are no doubt more kinds of critical applications than the 40 listed here.  Also, in order to fit on a printed page the second column of the table is limited to about six or seven programming languages.  For many of these critical applications there may be 50 or more languages in use at national levels.

 

Table 8.6: Programming Languages Used for Critical Software Applications

    Critical Software         Programming Languages

 1  Air traffic control       Ada, Assembly, C, Jovial, PL/I
 2  Anti-virus & security     ActiveX, C, C++, Oberon7
 3  Automotive engines        C, C++, Forth, Giotto
 4  Banking applications      C, COBOL, E, HTML, Java, PL/I, SQL, XML
 5  Broadband                 C, C++, CESOF, Java
 6  Cell phones               C, C++, C#, Objective C
 7  Credit cards              ASP.NET, C, COBOL, Java, Perl, PHP, PL/I
 8  Credit checking           ABAP, COBOL, Fortran, PL/I, XML
 9  Credit unions             C, COBOL, HTML, PL/I, SQL
10  Criminal records          ABAP, C, COBOL, Fortran, Hancock
11  Defense applications      Ada, Assembly, C, CMS2, Fortran, Java, Jovial, SPL
12  Electric power            Assembly, C, DCOPEJ, Java, Matpower
13  FBI, CIA, NSA, etc.       Ada, APL, Assembly, C, C++, Fortran, Hancock
14  Federal taxation          C, COBOL, Delphi, Fortran, Java, SQL
15  Flight controls           Ada, Assembly, C, C++, C#, LabView
16  Insurance                 ABAP, COBOL, Fortran, Java, PL/I
17  Mail and shipping         COBOL, dBase2, PL/I, Python, SQL
18  Manufacturing             AML, APT, C, Forth, Lua, RLL
19  Medical equipment         Assembly, Basic, C, CO, CMS2, Java
20  Medical records           ABAP, COBOL, MUMPS, SQL
21  Medicare                  Assembly, COBOL, Java, PL/I, dBase2, SQL
22  Municipal taxation        C, COBOL, Delphi, Java
23  Navigation                Assembly, C, C++, C#, Lua, Logo, MatLab
24  Oil and Energy            AMPL, C, G, GAMS/MPSGE, SLP
25  Open-source software      C, C++, JavaScript, Python, Suneido, XUL
26  Operating systems-Large   Assembly, C, C#, Objective C, PL/S, VB
27  Operating systems-Small   C, C++, Objective C, OSL, SR
28  Pharmaceuticals           C, C++, Java, Pascal, SAS, Visual Basic
29  Police records            C, COBOL, dBase2, Hancock, SQL
30  Satellites                C, C++, C#, Java, Jovial, PHP, Pluto
31  Securities trading        ABAP, C#, COBOL, dBase2, Java, SQL
32  Social Security           Assembly, COBOL, PL/I, dBase2, SQL
33  State taxation            C, COBOL, Delphi, Fortran, Java, SQL
34  Surface transportation    C, C++, COBOL, Fortran, HTML, SQL
35  Telephone switching       C, CHILL, CORAL, Erlang, ESPL1, ESTEREL
36  Television broadcasts     C, C++, C#, Java, Forth
37  Voting equipment          Ada, C, C++, Java
38  Weapons systems           Ada, Assembly, C, C++, Jovial
39  Web applications          Applescript, ASP, CMM, Dylan, E, Perl, PHP, .NET
40  Welfare (State)           ASP.NET, C, COBOL, dBase2, PL/I, SQL

 

The North American Industry Classification System (NAICS) codes of the Department of Commerce identify at least 250 industries which the author knows create software in substantial volumes.  However the 40 industries shown in table 8.6 probably contain almost 50% of applications critical to U.S. business and government operations.

 

As a result of the importance of these 40 software application areas to U.S. business and government operations, they probably receive almost 75% of cyber attacks in the form of viruses, spyware, search-bots, and denial of service attacks.  These 40 industries need to focus on security.  Even a cursory examination of the programming languages used by these industries reveals that few of them are particularly resistant to viruses or malware attacks.

 

For all 40, maintenance is expensive, and for many it is growing progressively more expensive due to the difficulty of simultaneously maintaining applications written in so many different programming languages….

 

How Many Programmers Use Various Programming Languages?

 

There is no real census of either languages used in applications or number of programmers.  While the Department of Commerce and the Bureau of Labor Statistics do issue reports on such topics in the United States, their statistics are known to be inaccurate.

 

A survey done by the author and his colleagues a few years ago found the Human Resources organizations in most large corporations did not know how many programmers or software engineers were actually employed.  Since government statistics are based on reports from HR organizations, if they themselves don’t know then they can’t provide good data to the government either.

 

One of the reasons why government statistics probably understate the numbers of programmers and software engineers is ambiguous job titles.  For example, some large companies use titles such as “member of the technical staff” as an umbrella title that might include software engineers, hardware engineers, systems analysts, and perhaps another dozen occupations.

 

Another problem with knowing how many software engineers there are is the fact that many personnel working on embedded applications are not software engineers or computer scientists by training, but rather electrical engineers, aeronautical engineers, telecommunications engineers, or some other form of engineering.

 

Because the status of these older forms of engineering is higher than the status of software engineering, many people working on embedded software refuse to be called software engineers and insist on being identified by their true academic credentials. 

 

The study carried out by the author and his colleagues was intended to derive information on the number of software specialists (e.g. quality assurance, data base administration, etc.) employed by large software-intensive companies such as IBM, AT&T, Hartford Insurance, and so forth.

 

The study included on-site visits and discussions with both HR organizations and local software managers and executives.  It was during the discussions with local software managers and executives that it was discovered that not a single HR organization actually had good statistics on software engineering populations.

 

Based on on-site interviews with client companies and then extrapolation from their data to national levels, the author assumes that the U.S. total of software engineers circa 2009 is about 2,500,000.  Government statistics as of 2009 indicate around 600,000 programmers, but these statistics are low for reasons already discussed.  Additionally, the government statistics also tend to omit one-person companies and individual programmers who develop applets or single applications.

 

About 60% of these software engineers work in maintenance and enhancement tasks and 40% work as developers on new applications.  There are of course variations.  For example there are many more developers than maintenance personnel working on web applications because all of these applications are fairly new.  But for traditional mainframe business applications and ordinary embedded and systems software applications, maintenance outnumbers development by a substantial margin.

 

Table 8.7 shows the approximate numbers of software engineers by language for the United States.  However the data in table 8.7 is hypothetical and not exact.  Among the reasons that the data is not exact is that many software engineers know more than one programming language and work with more than one programming language.

 

However Table 8.7 does illustrate a key point:  The most common languages for software development are not the same as the most common languages for software maintenance.  This situation leads to a great deal of trouble for the software industry.

 

Table 8.7:  Estimated Number of Software Engineers by Language

Development         Software    Maintenance         Software
Languages          Engineers    Languages          Engineers

Java                 175,000    COBOL                575,000
C                    150,000    PL/I                 125,000
C++                  130,000    Ada                  100,000
Visual Basic         100,000    Visual Basic          75,000
C#                    90,000    RPG                   75,000
Ruby                  65,000    Basic                 75,000
JavaScript            50,000    Assembler             75,000
Perl                  30,000    C                     75,000
Python                20,000    Fortran               65,000
COBOL                 15,000    Java                  60,000
PHP                   15,000    JavaScript            40,000
Objective-C           10,000    Jovial                10,000
Others               150,000    Others               150,000

TOTAL              1,000,000    TOTAL              1,500,000
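The totals and proportions implied by table 8.7 can be checked with a short sketch.  The figures are copied from the table and, like the table itself, are hypothetical estimates; the names are illustrative.

```python
# Estimated U.S. software engineers by language, from table 8.7.
DEVELOPMENT = {
    "Java": 175_000, "C": 150_000, "C++": 130_000, "Visual Basic": 100_000,
    "C#": 90_000, "Ruby": 65_000, "JavaScript": 50_000, "Perl": 30_000,
    "Python": 20_000, "COBOL": 15_000, "PHP": 15_000, "Objective-C": 10_000,
    "Others": 150_000,
}
MAINTENANCE = {
    "COBOL": 575_000, "PL/I": 125_000, "Ada": 100_000, "Visual Basic": 75_000,
    "RPG": 75_000, "Basic": 75_000, "Assembler": 75_000, "C": 75_000,
    "Fortran": 65_000, "Java": 60_000, "JavaScript": 40_000, "Jovial": 10_000,
    "Others": 150_000,
}

dev_total = sum(DEVELOPMENT.values())
maint_total = sum(MAINTENANCE.values())

# Share of all software engineers working in maintenance, in percent.
maintenance_share = 100 * maint_total / (dev_total + maint_total)

# COBOL's share of the maintenance population, in percent.
cobol_maintenance_share = round(100 * MAINTENANCE["COBOL"] / maint_total, 1)

# Languages that appear in the maintenance column but not in development.
maintenance_only = sorted(set(MAINTENANCE) - set(DEVELOPMENT))

print(dev_total, maint_total, maintenance_share)
print(cobol_maintenance_share, maintenance_only)
```

The column totals match the 40% development and 60% maintenance split of the roughly 2,500,000 U.S. software engineers assumed earlier, and COBOL alone accounts for more than a third of the maintenance population.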

 

 

The most obvious problem illustrated by Table 8.7 is that it is difficult to get development personnel to work on maintenance tasks because of the perception that older languages are not as glamorous as modern languages.

 

A second problem is that due to the differences in programming languages between maintenance and new development, two different sets of tools are likely to be needed.  The developers are interested in using modern tools including static analysis, automated testing, and other fairly new innovations.

 

However many of these new tools do not support older languages, so the software maintenance community needs to be equipped with maintenance workbenches that include tools with different capabilities.  For example tools that analyze cyclomatic and essential complexity are used more often in maintenance work than in new development.  Tools that can trace execution flow are also used more widely in maintenance work than in development.  Another new kind of tool that supports maintenance more than development can “mine” legacy code and extract hidden business rules.  Yet another kind of tool that supports maintenance work can parse the code and automatically generate function point totals….

 

 

SUMMARY AND CONCLUSIONS

 

In spite of the age of the COBOL language, it remains a mainstream programming language.  There are more business-oriented applications written in COBOL than in any other programming language.  The large number of legacy applications written in COBOL means that this language will continue to have a very pervasive influence on the overall software work within businesses and government groups.  Because COBOL is a popular language, there are relatively large populations of COBOL programmers in most industrialized countries.  In addition, COBOL has more tools and consulting groups than almost any other programming language.

 

 

READINGS AND REFERENCES

 

Jones, Capers; Best Practices in Software Engineering; 1st edition; McGraw Hill,  to be published in 2009.

 

Jones, Capers; Applied Software Measurement; 3rd edition; McGraw Hill, 2008.

 

Jones, Capers; Estimating Software Costs; 2nd edition; McGraw Hill, 2009.

 

Jones, Capers; Critical Problems in Software Measurement; Information Systems Management Group, 1993; ISBN 1-56909-000-9; 195 pages.

 

Jones, Capers; Software Productivity and Quality Today -- The Worldwide Perspective; Information Systems Management Group, 1993; ISBN 1-56909-001-7; 200 pages.

 

Jones, Capers; Assessment and Control of Software Risks; Prentice Hall, 1994;  ISBN 0-13-741406-4; 711 pages.

 

Jones, Capers;  New Directions in Software Management; Information Systems Management Group;  ISBN 1-56909-009-2;  150 pages.

 

Jones, Capers; Patterns of Software System Failure and Success;  International Thomson Computer Press, Boston, MA;  December 1995; 250 pages; ISBN 1-850-32804-8; 292 pages.

 

Jones, Capers; Table of Programming Languages and Levels (8 Versions from 1985 through July 1996); 67 pages for Version 8;  Software Productivity Research, Inc., Burlington, MA; 1996.

 

Jones, Capers; The Year 2000 Software Problem - Quantifying the Costs and Assessing the Consequences; Addison Wesley Longman, 1998.