Author: Allan Kalar

Scope

This first article in the series will explain the basics of the COBOL (COmmon Business Oriented Language) programming language. If you have no previous coding experience, COBOL is a good way to get started. If you do have coding experience, COBOL will probably be easy to learn. If this were a University course, it would be called "COBOL 101".

After you finish this article, you will understand compiled programs, the overall COBOL layout and why it is set up that way, and a few of the basic instructions of the language.

We will assume a COBOL compiler on an IBM mainframe, which is actually simpler than some other environments until it's time to debug a problem program. Each platform, from mainframe to desktop, has its own extensions to the language to handle the differences in the hardware and software the COBOL program is expected to deal with. The most obvious is real-time keyboard / monitor interaction on desktops.

COBOL was designed in 1959 by a committee that drew heavily on FLOW-MATIC, an earlier business language created by a team under Grace Hopper (1906 - 1992), who retired from the Navy as a Rear Admiral in 1986. Due to her efforts, COBOL became the high-level business language of choice for the U.S. Navy, and she led the efforts to adopt and standardize the language.

She is also credited with popularizing the term computer "bug" after her team found a moth stuck in a relay of a Harvard computer in 1947.

After her retirement, Adm. Hopper traveled extensively on the lecture circuit passing out "nanoseconds": pieces of wire just under 1 foot in length, which represented the distance light travels in a nanosecond. She used grains of pepper for picoseconds.

The famous quotation "It's easier to ask forgiveness than it is to get permission" is often attributed to Adm. Hopper.

How computers work

A simplified computer is a collection of parts. All computers have several things in common.

At the heart of every computer is a Central Processing Unit (CPU) that is the "brain" of the computer. Programs tell the CPU how to process work.

The CPU also needs some temporary memory. Memory is a chalkboard where the CPU keeps a list of instructions and fiddles with data.

Computers must have some sort of input device, a way to send instructions, feedback (for process control computers), and data to the CPU.

They also need an output device to give results back to the operator, build or update files, or to control a process.

The CPU has to be told, step by step, how to do its job. A CPU has the IQ of a doorknob. You, as the programmer, have to provide even the most rudimentary intelligence. Fortunately, other very talented programmers will have told the CPU how to talk on a basic level with the items attached to it, so you don't have to worry about clock timing cycles and other very detailed items. You only have to tell the CPU how to solve your particular problem.

COBOL is one of the easiest ways of telling a CPU what to do. And it looks almost like simple English, which makes it almost self-documenting. If you give your data meaningful names, your program will be very easy to understand.

The Environment

Mainframes don't give their full attention to the operator. Mainframes were originally set up to handle batch jobs (start a job, process a stack of input against attached files, produce a report, end the job) as opposed to interactive jobs.

When early business mainframes were "taught" to handle "real time, multi-tasking" operations, they actually did so by pretending that each interaction, or task, was a small batch job. This was done with add-on programs that sat on top of the operating system. The two most popular are CICS and IMS/TP, which are first cousins. Both work pretty much the same way. CICS has been adapted to other platforms while IMS/TP (sometimes called IMS/DC) remains a mainframe product.

When an operator entered information on her terminal and hit the enter button, it sent a message to the computer which included information about the task and the input data for the task. Once a "conversation" got going, the computer program would provide a formatted screen for the operator to enter data. Information about the task (program name and the current "place" in the "conversation") was hidden on the screen. After the operator entered data on the screen and pressed enter, the entire screen contents were sent to the computer.

The computer would put the transaction in a "to do" stack and process each one in turn. Each transaction would load a fresh copy of the program that would read the screen information to decide what to do with it. The only memory of what was going on was within the transaction. The computer had "forgotten" everything not entered into a file somewhere. When the computer came to a stopping point such as sending another screen to the operator, it would dump the job, forget about it, and go on to the next job in the "to do" stack. To the operator this looked like she had the full attention of the computer because the computer does all these things very rapidly.

Mainframes still work that way. So do most interactive Internet programs. Desktops do this in a modified manner in that the job remains in memory, so it is not constantly being swapped in and out of memory. However, the CPU will give its attention to something else if there is work to be done.

Since CICS and IMS are another subject altogether, we'll stick to batch examples. When you get involved with the desktop, you'll find a much friendlier on-line environment.

COBOL is a compiled language. This means that the source you write must first be processed by another program, the compiler, which translates the commands into machine code. The output of a compiler is a file that the computer can read and execute directly.

The actual steps of compiling a COBOL program vary from installation to installation. If you installed a compiler on your own desktop machine, you'll have to read the instructions that came with it. If you work in a shop where the compiler is on a server or a mainframe, you'll have to ask the resident "expert" how to accomplish the task and where to put your source and object (compiled) files. Then you'll have to ask how to execute the compiled object program. This is easy on a desktop system unless you have to install the program into the Registry, but it involves Job Control Language (JCL) on a mainframe.

COBOL's Basic Structure

COBOL is written using four "divisions". Each division gives the compiler information about the program, external environment, files used, and step-by-step instructions that the program must execute.

The divisions are: IDENTIFICATION DIVISION (which IBM compilers let you abbreviate to ID DIVISION), ENVIRONMENT DIVISION, DATA DIVISION, and PROCEDURE DIVISION.
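As a rough sketch of how the four divisions stack up, here is about the smallest program that names all of them. The program name SKELETON and the paragraph name MAIN-LINE are placeholders, and the first header is spelled out in full because not every compiler accepts the ID abbreviation.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SKELETON.
       ENVIRONMENT DIVISION.
       DATA DIVISION.
       PROCEDURE DIVISION.
       MAIN-LINE.
      * NOTHING TO DO YET. JUST HAND CONTROL BACK TO THE SYSTEM.
           STOP RUN.
```

The divisions always appear in this order. The two middle ones are empty here because this program has no files and no data.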

The compiler will print a listing of the program along with some of its own information. On mainframes, the information produced on a compile listing can be controlled by options in the JCL statements or the defaults for that particular shop.

ID DIVISION

The Identification Division is nothing more than the name of the program, the author, and some remarks. This functions more as comments than anything else. This is a good place to keep a revision log as to who changed what, when.

ENVIRONMENT DIVISION

This division tells the computer what type of computer is being used to compile the program and what type of computer is the target. Most compilers pretty much ignore this information and assume the computer being used to compile the program.

It also describes the external (outside the program) names of the files and other equipment being used for the job. This information is very important. Without this information, the program itself is deaf, dumb, and blind.

In some cases, there will be no external environment, most notably if the program is a subprogram being called by another program that actually deals with the "outside world". In those cases, the input and output may only be data that is passed within the computer. A good example would be a subprogram that returns the date and time from the computer's clock.

DATA DIVISION

Here's where we describe the actual format of files attached to the job as well as the internal data items and structures used within the program.

Any variable or named constant must be described here. Un-named constants, such as 1276 or 'mother', can be used on the fly within the instructions. The compiler will set up those constants as data items to handle them.
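The difference between described items and un-named constants might look like the sketch below. All of the data names (MAX-LINES, LINE-COUNT, PARENT-NAME) and the program name are made up for illustration, and DISPLAY is used only as a convenient way to show the result.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. LITDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * A NAMED CONSTANT: A DATA ITEM GIVEN AN INITIAL VALUE.
       01  MAX-LINES           PIC 9(4)  VALUE 1276.
      * TWO VARIABLES THE PROGRAM CAN CHANGE AS IT RUNS.
       01  LINE-COUNT          PIC 9(4)  VALUE ZERO.
       01  PARENT-NAME         PIC X(10) VALUE SPACES.
       PROCEDURE DIVISION.
       MAIN-LINE.
      * A NAMED CONSTANT IS USED BY ITS NAME.
           MOVE MAX-LINES TO LINE-COUNT.
      * AN UN-NAMED CONSTANT (A LITERAL) IS WRITTEN ON THE SPOT.
           MOVE 'mother' TO PARENT-NAME.
           DISPLAY LINE-COUNT ' ' PARENT-NAME.
           STOP RUN.
```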

PROCEDURE DIVISION

This is where all the work and logic gets done and is usually, but not always, the largest division within a program. Here's where you tell the computer what you want it to do and how to accomplish the tasks.

Formats

COBOL has five fields on a line that it inherited from the old computer card days. IBM originally set up an 80-column punched card that was read into the computer via a card reader (which could also punch new cards). The cards used Hollerith coding, which is a 12-bit (actually 12-hole) scheme (see http://www.maxmon.com/punch1.htm for a discussion of Hollerith's work). UNIVAC used a 90-column card. The IBM cards were handy. You could carry a few in your shirt pocket and use the back (blank) side for notes.

The cards are long gone now, but the formats for COBOL and other early languages such as FORTRAN live on.

COBOL's format is:

1-6 Numbers (for resorting the deck when you drop it on the floor). Not required today, you can leave it blank.

7 Indicator Area: Enter an asterisk ("*") to turn the line into a comment. The compiler will print the comment on the program listing, but ignore its contents.

8-11 Area A. A few headers and such must begin in this area. We'll note them as we progress.

12-72 Area B. This is our main work area.

73-80 Ignored. In the card days, the name of the program was punched here in case two dropped card decks got mixed up. A card sorter could separate them and then sort them into sequence. You can put anything you want in this area.
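To make the column layout concrete, here is a small sketch with the sequence numbers filled in so that columns 1-6 are visible. The program name COLDEMO is a placeholder, and DISPLAY is used only so the program has something to do.

```cobol
000100 IDENTIFICATION DIVISION.
000200 PROGRAM-ID. COLDEMO.
000300* COLUMNS 1-6 ABOVE HOLD AN OPTIONAL SEQUENCE NUMBER.
000400* THE ASTERISK IN COLUMN 7 TURNS THESE LINES INTO COMMENTS.
000500 PROCEDURE DIVISION.
000600* DIVISION HEADERS AND PARAGRAPH NAMES START IN AREA A.
000700 MAIN-LINE.
000800* STATEMENTS GO IN AREA B (COLUMNS 12-72).
000900     DISPLAY 'HELLO FROM AREA B'.
001000     STOP RUN.
```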

COBOL is easy to type. Within the B area, you can put things pretty much where you want to. Outside of quote marks, extra spaces are ignored: multiple spaces are treated as just one space for compilation purposes. An instruction can overflow to subsequent lines, and the line break will be treated as a space. For instance, "MOVE A TO B" and "MOVE    A    TO    B" produce the same code.

However, if you are entering a long character string (inside single quote marks) that won't fit on one line, special rules apply:

If there is no hyphen (-) in the indicator area (column 7) of the next line, the last character of the preceding line is assumed to be followed by a space.

If there is a hyphen in the indicator area of the next line, the first nonblank character of this continuation line immediately follows the last nonblank character of the continued line without an intervening space.

In both cases, the continuation must start within the B Area.
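For a quoted string there is one more detail, shown in the sketch below: the continued line runs all the way to column 72 without a closing quote, and the continuation line has the hyphen in column 7 and a fresh opening quote in the B Area, with the string picking up right after that quote. The program name CONTIN is a placeholder, and the alphabet is used only because it makes the 52 characters on the first line easy to count.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CONTIN.
       PROCEDURE DIVISION.
       MAIN-LINE.
      * THE LAST Z ON THE NEXT LINE SITS IN COLUMN 72.
           DISPLAY 'ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
      -    '0123456789'.
           STOP RUN.
```

If the columns line up as described, this should display the two alphabets followed by the ten digits as one unbroken string. Had the first line stopped short of column 72, the blanks up to column 72 would have become part of the string.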

COBOL is made up of sentences terminated by a period, even within the data structures. Some of the time, forgetting a period won't mean much, but there are times when missing one will change the meaning of a command, so get used to using them, especially in the data description areas.
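Here is a sketch of how one missing period can change what a program does. The data names and the program name PERDEMO are invented for the example, and the IF statement is used ahead of its proper introduction only because it shows the effect so clearly.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PERDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  HOURS-WORKED        PIC 9(3) VALUE 35.
       01  OVERTIME-FLAG       PIC X    VALUE 'N'.
       01  COUNT-A             PIC 9(3) VALUE ZERO.
       01  COUNT-B             PIC 9(3) VALUE ZERO.
       PROCEDURE DIVISION.
       MAIN-LINE.
      * WITH THE PERIOD THE ADD IS A NEW SENTENCE AND ALWAYS RUNS.
           IF HOURS-WORKED > 40
               MOVE 'Y' TO OVERTIME-FLAG.
           ADD 1 TO COUNT-A.
      * WITHOUT IT THE ADD JOINS THE IF AND IS SKIPPED HERE.
           IF HOURS-WORKED > 40
               MOVE 'Y' TO OVERTIME-FLAG
           ADD 1 TO COUNT-B.
           DISPLAY COUNT-A ' ' COUNT-B.
           STOP RUN.
```

Since HOURS-WORKED is 35, the first ADD runs and the second does not, so the program should display 001 000. The indentation makes no difference to the compiler; only the period does.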

Data names

Data that is being manipulated by the program resides in memory. In the bad old days, the programmer had to know exactly where, in memory, the information was parked and had to tell the CPU the exact address of the data. For instance, A576897 was an instruction to an IBM 1401 to "add" ('A' was the add operation code) the number that ended (right hand end of the number) at location 576 to the number ending at location 897. Something called a "word mark" would be at the left side of the numbers to tell the CPU where the other end of the number resided.

In COBOL, we don't have to mess with this sort of thing. The compiler will set it up for us. All we have to do is describe the field and give it a name. For instance, we can tell the compiler that JOB-NUMBER is a 5-byte numeric field and leave it up to the compiler to find a place for JOB-NUMBER and keep track of it for us. COBOL does the same thing with alphanumeric data (fields with a mix of alphabetic characters, special characters, and/or numbers).

A Simple Program

"C" programmers will recognize this one. Refer to figure 1 for the actual code.

The ID DIVISION is there so someone who comes along later can get an idea of what the program does.

The EJECT statement on line 001300 causes the compiler to display the following line of the program listing on a new page.

The ENVIRONMENT DIVISION contains sub headings called SECTIONs.

The CONFIGURATION SECTION is the source/object computer stuff we mentioned earlier.

The INPUT-OUTPUT SECTION contains a subhead called FILE-CONTROL where we describe the attached device, in this case a printer. SELECT says "a device or file description follows". The internal name of the device is going to be PRINT-F. The ASSIGN clause gives it an external name, UT-S-REPORT1. The "UT" at the start marks it as a "utility" device, the "S" says it is processed sequentially, and the final part, REPORT1, is the name that will correspond to a JCL statement at run-time. The JCL will tell the program which device to use for REPORT1. It could be an attached printer or a file that can be batch printed later.

The DATA DIVISION also has sections. The FILE SECTION is where we tell the compiler what the file specifications are for the external files and devices we described within the ENVIRONMENT DIVISION. In this case the print file is described as a file with 133-character records. Blocking means collecting a number of records together and writing them as one long combined record. On IBM compilers, BLOCK 0 RECORDS leaves the block size to be supplied at run time (by the JCL, for example) rather than fixing it in the program.

The record is 133 characters because an attached IBM chain printer prints 132 characters per line on 14" wide paper (10 characters per inch). The extra character is the first character that is used to send special instructions to the printer, such as "EJECT" or "skip 3 lines". For this job, we could have used 8 1/2" wide paper and specified an 81 character record.

PIC is short for "PICTURE" and uses special characters to describe the format of the data. "X" is used for alphanumeric data. It can handle a mix of letters, numbers, and special characters. "A" is for alphabetical characters only. "9" is for numeric digits. The number in parentheses is a repeat count, so X(133) means 133 alphanumeric characters. We'll get into more of this in later lessons.
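A few PICTURE clauses side by side may help. This is only a sketch: the data names, their values, and the program name PICDEMO are invented, and numeric items are pictured with 9.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PICDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * X: ANY CHARACTER, 12 POSITIONS.
       01  PART-CODE           PIC X(12) VALUE 'AB-1276/REV2'.
      * A: LETTERS AND SPACES ONLY, 10 POSITIONS.
       01  LAST-NAME           PIC A(10) VALUE 'HOPPER'.
      * 9: DIGITS ONLY, 5 POSITIONS.
       01  JOB-NUMBER          PIC 9(5)  VALUE 357.
       PROCEDURE DIVISION.
       MAIN-LINE.
           DISPLAY PART-CODE.
           DISPLAY LAST-NAME.
           DISPLAY JOB-NUMBER.
           STOP RUN.
```

Note that the numeric field always holds five digits, so the value 357 is stored with leading zeros.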

The WORKING-STORAGE SECTION is used to describe internal data spaces. We only have one. The "01" in Area A on line 004700 says this is a new structure unrelated to any other structures. In this case, we don't define a data structure within the 01 level because we don't need one.

The PROCEDURE DIVISION is straightforward for this program. It starts with a "paragraph" name. Paragraphs define groups of code within a COBOL program and can be used to identify entry points. For instance, you can "GO TO" a paragraph name, thus skipping intervening code. However, good programming practice is to avoid the "GO TO" statement (in any language, not just COBOL). We'll get into other techniques in future lessons. Paragraph names always start in Area A.
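As a small preview, here is a sketch of paragraph names used as entry points. It uses PERFORM, one common alternative to "GO TO", which runs a named paragraph and then comes back. The paragraph names and the program name PARADEMO are made up for the example.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PARADEMO.
       PROCEDURE DIVISION.
       MAIN-LINE.
      * RUN EACH PARAGRAPH IN TURN, RETURNING HERE AFTER EACH ONE.
           PERFORM SAY-HELLO.
           PERFORM SAY-GOODBYE.
           STOP RUN.
       SAY-HELLO.
           DISPLAY 'HELLO'.
       SAY-GOODBYE.
           DISPLAY 'GOODBYE'.
```

The STOP RUN matters here: without it, control would fall out of MAIN-LINE and run the two paragraphs below it a second time.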

The instructions tell the CPU to "open" the printer file (required before you can use it), move "HELLO WORLD" into WORKING-STORAGE's FILE-RECORD, and then move FILE-RECORD to the print area. After that, it prints the single line and "closes" the printer, which will tell the printer to stop printing and eject a page so the next program won't print on our page.

That's it. Our program is done, so the computer will flush it and look for something else to do.

We could have sent the "HELLO WORLD" directly to the print area, but when things get more complicated, this method is cleaner.

When an alphabetic or alphanumeric field is moved to a larger, similar field, it gets left justified and right blank filled.
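The padding is easy to see in a small sketch. The field names and the program name PADDEMO are invented, and the longer field is filled with asterisks first so that the blanks put there by the MOVE stand out.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PADDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  SHORT-FIELD         PIC X(5)  VALUE 'HELLO'.
       01  LONG-FIELD          PIC X(10) VALUE ALL '*'.
       PROCEDURE DIVISION.
       MAIN-LINE.
           MOVE SHORT-FIELD TO LONG-FIELD.
      * THE BRACKETS MAKE THE TRAILING BLANKS VISIBLE.
           DISPLAY '[' LONG-FIELD ']'.
           STOP RUN.
```

After the MOVE, LONG-FIELD should hold HELLO followed by five blanks, with none of the original asterisks left over.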

Note: The DIVISIONs and some of the key words within them all start in Area A. Comments (* in column 7) can start in the A Area because the compiler skips over those lines and just prints them on the listing. I got in the habit of using asterisks for blocking in comments rather than dashes, because on the old impact printers, the dashes could create a tear across the paper if it was thin or poor quality. It was also hard on the printer's ribbon.

Data structures and File records all start with a "01" level that starts in Area A.

If we were working with a data structure it might have looked like this:

003000 01  FILE-RECORD.
003100     05  FR-SECT-1           PIC X(12).
003500     05  FR-SECT-2           PIC X(8).
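A structure like this can be used as a whole or one part at a time. The sketch below assumes the same layout as above; the program name GRPDEMO and the values moved in are only for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. GRPDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  FILE-RECORD.
           05  FR-SECT-1       PIC X(12).
           05  FR-SECT-2       PIC X(8).
       PROCEDURE DIVISION.
       MAIN-LINE.
      * FILL EACH PART SEPARATELY.
           MOVE 'HELLO' TO FR-SECT-1.
           MOVE 'WORLD' TO FR-SECT-2.
      * THEN USE THE WHOLE 20-CHARACTER RECORD AS ONE FIELD.
           DISPLAY '[' FILE-RECORD ']'.
           STOP RUN.
```

Each part is padded with blanks on the right, so WORLD should start in position 13 of the record no matter how short the first value is.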

COBOL can be written without either of these disciplines, but if you want to be taken seriously as a programmer (and get paid for it), you need to produce code that's easy for someone else to understand and maintain. Heck, five years after you produce a program, you might be the one assigned to change it. I assure you that you'll have forgotten it by then, so be nice to yourself as well as others and leave a trail of bread crumbs so you can find your way.

Summary

We've learned the basics about COBOL including its format and the use of the four DIVISIONs. We can now create a simple program to print something.

Figure 1: A simple COBOL program

000100 ID DIVISION.
000200 PROGRAM-ID. HELLO.
000300 AUTHOR. ALLAN B. KALAR, VIKING WATERS.
000500*REMARKS.
000600*
000800* PRINT HELLO WORLD ON THE SYSTEM PRINTER.
000900*
001300 EJECT
001400 ENVIRONMENT DIVISION.
001500 CONFIGURATION SECTION.
001600*SOURCE-COMPUTER. IBM370 WITH DEBUGGING MODE.
001700 SOURCE-COMPUTER. IBM370.
001800 OBJECT-COMPUTER. IBM370.
001900 INPUT-OUTPUT SECTION.
002000 FILE-CONTROL.
002100     SELECT PRINT-F ASSIGN UT-S-REPORT1.
002400*
002500*
002600 DATA DIVISION.
002700 FILE SECTION.
002800 FD  PRINT-F
002900     BLOCK 0 RECORDS.
003000 01  PRINT-R                 PIC X(133).
003100
003500
003600 EJECT
003700 WORKING-STORAGE SECTION.
004400******************************************************
004500* RECORD LAYOUTS
004600******************************************************
004700 01  FILE-RECORD             PIC X(20).
004800
012600 EJECT
012700 PROCEDURE DIVISION.
012800 MAIN-LINE.
013500     OPEN OUTPUT PRINT-F.
013600
013700     MOVE 'HELLO WORLD' TO FILE-RECORD.
013800     MOVE FILE-RECORD TO PRINT-R.
013900     WRITE PRINT-R.
014000
014800     CLOSE PRINT-F.
015000
016300******************************************************
016400******************************************************

Allan Kalar is the Director of Technical Services for Viking Waters (www.vikingwaters.com). As an "old time" programmer, he still remembers mainframes fondly and occasionally gets an opportunity to work on one for a client.

