Simulated Machine (Assembler): Learn the Anatomy – Then Move to the Tools

William G. Verbrugge
wgverbrugge@csupomona.edu
California State Polytechnic University, Pomona
3801 West Temple Avenue
Pomona, CA 91768

Abstract

Integrated Development Environments are excellent production tools for intermediate and advanced programming students, and even for beginners once they have learned the concepts of stored data, computer instructions, and the anatomy of the computer. There is a need for an assembler language that is simple and straightforward for the beginning student to understand. Most authors of introductory programming books recognize this need by including one to twenty pages on the topic. This paper presents how a simulated assembler with a simple assembly language can introduce the beginning student to the concepts of stored programs, core storage, the difference between instructions and data, the ability to modify a set of instructions, etc., without having to be concerned with all the exceptions and rigor of a full assembler language. The Simulated Assembler and the easy procedures for using it in a first programming course are provided. Students studying Computer Information Science need basic knowledge of the computer to develop their skills in design and programming. Students sometimes find high-level languages hard to comprehend because of the 'seeming magic' of the language. Using the assembler tool described here should increase productivity in learning (learn by doing).

Keywords: assembler, simple machine, software tools, language, programming, object oriented, machine language

1. INTRODUCTION

The growth in hardware technology has allowed the theories of modern programming languages to become a reality. In the beginning, developers of computer languages were hindered by the lack of processing speed and memory needed to implement their vision. Variable names and data were restricted in size and thus not very descriptive of their meaning.
Most languages then closely followed the function of the hardware in order to conserve memory and use resources sparingly. But researchers continued to work on natural languages. One good outcome of learning our industry's first languages was that the concept of how the computer worked was inherent in the language. Thus an understanding of the logic of the application, and of how the computer actually ran the program, was a natural outcome of learning the language.

Most computer languages taught today are Object Oriented Programming Languages. In these languages one builds objects (black boxes that have attributes, behavior, and an identity (a name)) that can be used by other objects. To aid in covering all the essential topics, production tools are used so that more time can be spent on logic and object concepts. Many instructors use Integrated Development Environments (IDEs) to aid in writing the source code. Results of student tests in introductory Object Oriented Programming courses that use IDEs have indicated that many students had a weak understanding of the concepts of stored programs, memory, the difference between instructions and data, the compilation process, and simple execution logic. This poor outcome occurred in spite of the instructors clearly covering these topics and providing diagrams of how a created object would be referenced in memory. What seemed to be missing was the hands-on writing and viewing of a logical process in the core of a computer.

This paper presents how a simulated assembler (a tool for learning) with a simple assembly language can introduce the beginning student to the basic concepts of how programming languages run on the hardware. Although the Simulated Machine (SM) can solve complicated procedures (sorting, simulations, etc.), it is best used in an introductory programming course to show simple comparisons, arithmetic operations, transfers, etc.
The author has found that thirty to forty minutes of class time and a simple assignment provide an excellent reference point when introducing an Object Oriented Language. Most authors of introductory programming books recognize this need by including one to twenty pages on the topic (Gittleman, 2002; Koffman and Wolz, 2002). The Simulated Assembler is available at www.csupomona.edu/~wgverbrugge, and the easy procedures for using it in a first programming course are provided.

2. LEARN THE ANATOMY

The Simulated Machine (SM) illustrates the anatomy of the computer. It lets the student see all phases of the programming cycle (writing the source, compiling the source to object code, and running the object code) in one view. With the ability to execute one instruction at a time, the student can also see the program move data (instructions, variables, or constants) from memory to registers and back to memory.

A common practice in introductory programming courses is for the instructor to display a small set of numbers and ask the class for the average (answers come quickly). The class is then asked to explain, step by step, how they obtained the answer (answers come slowly). This leads to a flow chart or pseudo code of the procedure, and then to how one needs to tell the computer box to perform the task (Malik, 2005). One can then describe the anatomy of the computer (Central Processing Unit (CPU) with its machine instructions and registers, memory, input/output, etc.). Next, with the simple assembler language described below, the pseudo code can be translated into a computer language that represents the instructions of the CPU. Subsequently the same program can be illustrated in the language being taught. Figure 1 (in the appendix) shows a more general-purpose program that uses a loop and a test to accept a sequence of numbers. The Java equivalent is also shown.

A sequence of machine instructions is a program.
Each instruction is represented by a binary pattern (0001010001100100). If the first 6 bits of this pattern (000101 = 5 in decimal) represent the operation code (opCode) and the remaining 10 bits (0001100100 = 100 in decimal) represent the operand, a memory location, then in the SM this instruction would mean: clear the accumulator register and add the value at location 100 in memory. The SM displays the binary values in decimal so that the instructions and memory locations are easy to read. Many instructors teach binary to decimal conversion. The SM provides an answer to the question “Why are we doing this?”

Programming in the machine's language would be a real test of one's own memory. Thus a programming language called an assembler was created that uses a mnemonic code for the instruction operation and numbers or variable names for the operand. In Figure 1 the first assembler instruction ( Start cla 0 ) is shown in the source code section. The compiler (translator), which is called when one presses the Compile Code button, translates this instruction to its machine instruction equivalent. “Start” would have the value 0, since the first instruction is in location zero of memory. The operation code “cla” is translated to a binary 5. The “0” is stored in location 100 of memory (the first location in which data is stored in this SM). The object code section of Figure 1 shows the result of the translation to machine code, “0 5100”: location 0 holds the instruction 5100. This is the instruction CLA (5): clear the accumulator and add to it the value at memory location 100.

Thus a source program is entered into the Source code panel. It is then submitted by pressing the Compile Code button. Each line is read sequentially, interpreted, and stored into the Simulated Machine’s memory. The instructions are stored in addresses 000-099. All variables and constants are stored in addresses 100-999. This is shown in the “Object Code from Source” panel.
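The relationship between the decimal word shown in the object code and the 6-bit/10-bit binary layout can be sketched in Java. This is only an illustration, not the SM's own source: the class SmWord and its methods are hypothetical names, and the sketch assumes that a decimal word is formed as opCode * 1000 + address (which matches the “5100” example) and that the binary form uses a 6-bit opCode followed by a 10-bit operand.

```java
// Hypothetical helper, not part of the Simulated Machine distribution.
final class SmWord {

    // Assumption: a decimal instruction word is opCode * 1000 + address,
    // e.g. CLA (5) on location 100 gives 5100.
    static int encode(int opCode, int address) {
        if (opCode < 0 || opCode > 99 || address < 0 || address > 999) {
            throw new IllegalArgumentException("opCode must be 0-99, address 0-999");
        }
        return opCode * 1000 + address;
    }

    // The leading decimal digits of the word are the operation code.
    static int opCodeOf(int word) {
        return word / 1000;
    }

    // The last three decimal digits of the word are the memory address.
    static int addressOf(int word) {
        return word % 1000;
    }

    // Splits a 16-bit pattern into a 6-bit opCode and a 10-bit operand.
    static int[] decodeBinary(String bits) {
        if (bits.length() != 16) {
            throw new IllegalArgumentException("expected 16 bits");
        }
        int opCode = Integer.parseInt(bits.substring(0, 6), 2);
        int address = Integer.parseInt(bits.substring(6), 2);
        return new int[] { opCode, address };
    }

    public static void main(String[] args) {
        System.out.println(encode(5, 100));              // prints 5100
        int[] parts = decodeBinary("0001010001100100");  // opCode 5, location 100
        System.out.println(parts[0] + " " + parts[1]);   // prints 5 100
    }
}
```

Working in decimal words keeps the arithmetic visible to a beginner: integer division and remainder by 1000 play the role that bit shifting and masking play in the binary form.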
After a program has been written and compiled, it is easy to execute the code in the Simple Machine with the “Run Object Code” or “Run One Inst” button. The third panel shows the execution results and the values in the registers as the program is executed in the machine.

The simple machine, like most computers, consists of three major elements: core memory, an instruction control unit, and registers. All store data in a five-digit numerical format. In a real computer these decimal digits are binary numbers (e.g., memory location 030 is 11110 in binary). We could make the machine all binary, but using decimal digits loses nothing and makes it much easier to visualize the internal hardware. The SM has:

* 1000 Memory Locations (Addressed 0 - 999)
    o 000 - 099 Reserved for Instructions
    o 100 - 999 Reserved for Variables and Constants
* AC -- Accumulator Register
* MQ -- Multiplier-Quotient Register
    o Both registers can hold any value from -2,147,483,648 to 2,147,483,647, inclusive.
* Two controlling Registers
    o Instruction Register -- Holds the binary instruction, viewed in decimal
    o Instruction Location Register -- Holds the binary value of the memory location the instruction came from, viewed in decimal

The simple machine executes instructions sequentially beginning with address 000, unless the sequence is altered by a transfer statement.

3. THE ASSEMBLER LANGUAGE

The assembler language presents mnemonic codes that represent the machine's hard-wired bit-code instructions. An instruction consists of an operation code and an operand. The operation code determines the action the computer should perform, and the operand is the location in memory on which the action is performed. An assembler program consists of a list of instructions. Assembler instructions have the following format:

      LABEL:  OperationCode Operand   #Comment

The assembler is not case sensitive. Thus cla, CLA, and Cla are the same.
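The instruction format can be read mechanically. The Java sketch below shows one way to split a source line into its parts. It is not the SM's actual compiler: the class SmLine and its parse method are hypothetical, and the sketch assumes that fields are separated by whitespace, that a leading token ending in a colon is a label, and that everything after a pound sign is a comment.

```java
import java.util.Locale;

// Hypothetical line parser, not the Simulated Machine's own compiler.
final class SmLine {
    final String label;    // label without its trailing colon, or null
    final String opCode;   // mnemonic in upper case, or null if the line has none
    final String operand;  // operand text as written, or null
    final String comment;  // text after '#', or null

    private SmLine(String label, String opCode, String operand, String comment) {
        this.label = label;
        this.opCode = opCode;
        this.operand = operand;
        this.comment = comment;
    }

    static SmLine parse(String line) {
        // Everything after the pound sign is a comment and is split off first.
        String comment = null;
        int hash = line.indexOf('#');
        if (hash >= 0) {
            comment = line.substring(hash + 1).trim();
            line = line.substring(0, hash);
        }
        String code = line.trim();
        String[] tokens = code.isEmpty() ? new String[0] : code.split("\\s+");

        int i = 0;
        String label = null;
        // Assumption: a leading token that ends in ':' is the label.
        if (i < tokens.length && tokens[i].endsWith(":")) {
            label = tokens[i].substring(0, tokens[i].length() - 1);
            i++;
        }
        // Upper-casing the mnemonic makes cla, CLA, and Cla equivalent.
        String opCode = (i < tokens.length) ? tokens[i++].toUpperCase(Locale.ROOT) : null;
        String operand = (i < tokens.length) ? tokens[i] : null;
        return new SmLine(label, opCode, operand, comment);
    }

    public static void main(String[] args) {
        SmLine l = parse("start: CLA 1 # 1 is a constant");
        // prints: start | CLA | 1 | 1 is a constant
        System.out.println(l.label + " | " + l.opCode + " | " + l.operand + " | " + l.comment);
    }
}
```

Removing the comment before splitting on whitespace means a pound sign never has to be treated as a token, which keeps the label and operand handling short.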
Some different forms of an instruction are the following:

      start:  CLA 1      # 1 is a constant
                         # start: is a label
              STO one    # one is a variable
                         # and holds 1
              TRA Next:
              ADD one    # this instruction
                         # will be skipped
      Next:   STP

Notice that labels and comments are not essential for an instruction. However, all operation codes except “STP” (Stop the program) require an operand. Each instruction may comprise the following four major elements.

1. Labels
    o Labels are used as a reference to a specific memory location.
    o Labels have the following format.
          Name:
        * “Name:” is an identifier that refers to the current line. The name can be any word of the programmer’s choosing, followed by a colon.
        * The colon (:) signifies that the identifier is a label. It follows directly after the name.
2. Operation Code
    o An operation code is a short mnemonic command (two or three characters) that tells the computer to perform a specific function, such as add or subtract.
    o At run time, the operation code has been translated into a two-digit code, which the Machine simulator can understand and manipulate.
    o See Table 1 for a list of the operation codes and their functions.
3. Operands
    o Operands can consist of labels, variables, and constants.
    o Variables refer to memory locations, which store binary data.
    o Variables are formatted as follows.
          variable or VARIABLE; number1 or NUMBER1
        * Any sequence of characters may be used; the compiler is not case sensitive.
    o Constants are positive or negative numbers ranging from -2,147,483,648 to 2,147,483,647. They do not have any distinctive characters attached.
        * To use a constant, simply write the positive or negative number after an operation code.
    o At run time, the simulator will translate the operand into its numerical code and store it in the proper memory address. For example, if you had only one variable, the simulator would store that variable at address 100.
      Anytime it is referenced in an instruction, the variable is replaced with its address.
4. Comments
    o Comments are non-essential parts of a program. Their sole purpose is to make a program more readable. The format of a comment is as follows:
          # this is a comment
        * Notice that a comment begins with a pound sign (#).
        * The simulator will ignore anything following the pound sign.
    o At run time, the simulator will strip all comments from the instructions.

4. THE MACHINE INSTRUCTIONS

The machine has fifteen instructions, as listed in Table 1. When reading the table, note that (x) should be read as “the contents of x” (e.g., (MQ) means the contents of the Multiplier-Quotient Register). The “->” symbol should be read as “is placed into”. The letters “bbb” refer to the memory address of the operand. For example, the instruction “ADD one” would be interpreted as operation code (OpCode) = ADD and operand = one (a variable, which is a reference to a memory location). The effect ((bbb) + (AC) -> (AC)) is read as “the contents of the memory location of the variable (one) plus the contents of AC are placed into AC.” Also, all operations except STP need a memory location (represented by bbb), which can be a constant, a label, or a variable. The letter sequence “iff” is read as “if and only if”.
Operation Code   Numerical Value   Effect
STP              01                Stops the program
TRA              02                Transfer to next instruction at location bbb
TLE              03                Transfer to next instruction at location bbb iff (AC) <= 0, else next instruction
TNZ              04                Transfer to next instruction at location bbb iff (AC) != 0, else next instruction
TEZ              15                Transfer to next instruction at location bbb iff (AC) = 0, else next instruction
CLA              05                (bbb) -> (AC)
STO              06                (AC) -> (bbb)
LDQ              07                (bbb) -> (MQ)
STQ              08                (MQ) -> (bbb)
ADD              09                (bbb) + (AC) -> (AC)
SUB              10                (AC) - (bbb) -> (AC)
MPY              11                (MQ) * (bbb) -> (MQ)
DIV              12                (MQ) / (bbb) -> (MQ), remainder -> (AC)
RD               13                Contents of input -> (bbb)
WRT              14                (bbb) -> Printed to output frame

Table 1 -- Instructions (Operation Codes and their effect)

5. CONCLUSION

Experience has established that understanding how a stored program is executed by a computer is one of the key concepts in learning how to write a program in a procedural language. As one moves to Object Oriented languages, where the running program creates objects and stores them in memory, understanding this concept becomes even more important. The Simulated Assembler presented here should provide the student with the fundamental concepts of developing and running a computer program. Thus, the learning progression of defining global and local variables, operations, and objects will have a foundation to build on.

The Simulated Assembler can be used as a root for many courses, providing a time-saving reference as new topics are presented. The author intends to provide access for anyone via the University's web server (www.csupomona.edu/~wgverbrugge). Some courses of study require learning a full assembler language (IBM, 2001) as the root of their discipline. For those that do not have this requirement, the SM can be illustrated in one lecture. If a course requires more profound study, the SM can be used to illustrate topics like setting up arrays, modifying instructions, etc.
The SM is easy to operate. Operating instructions are provided on the web, since they will change as enhancements are added. Improved readability and the ability to save and load source code are in development.

6. REFERENCES

Gittleman, Art (2002). Computing with Java, Alternate Second Edition. Scott/Jones. pp. 2-5.

IBM - International Business Machines Corporation (2001). AIX 5L for POWER-based Systems: Assembler Language Reference, 2nd Edition.

Koffman, Elliot and Wolz, Ursula (2002). Problem Solving with Java, 2nd Edition. Addison Wesley. pp. 1-16.

Larman, Craig (2002). Applying UML and Patterns: An Introduction to Object-Oriented Analysis and Design and the Unified Process, 2nd Edition. Upper Saddle River, NJ: Prentice Hall PTR.

Malik, D.S. (2005). Java Programming: From Problem Analysis to Program Design, 2nd Edition. Thomson Course Technology. pp. 1-21.

Appendix

Figure 1 -- This program finds the average of a set of numbers input until a -9999 is entered. See the corresponding Java program below.

    // The Java model class for the assembler find average program
    import javax.swing.JOptionPane;

    public class Avg {
        public void computeAvg() {
            // Read the first number before entering the loop
            double num = Double.parseDouble(
                JOptionPane.showInputDialog("Enter A Number"));
            double count = 0;
            double sum = 0;
            // Accumulate until the sentinel value -9999 is entered
            while (!(num == -9999)) {
                sum = sum + num;
                count = count + 1;
                num = Double.parseDouble(
                    JOptionPane.showInputDialog("Enter A Number"));
            }
            JOptionPane.showMessageDialog(null, " Avg = " + sum / count);
        } // end computeAvg()
    } // end class
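As a complement to the Java program, the instruction effects listed in Table 1 can be sketched as a small interpreter. This is a minimal illustration under stated assumptions, not the SM's implementation: the class SmInterpreter is hypothetical, instruction words are assumed to be opCode * 1000 + address, registers and memory cells are Java ints, and input and output are modeled as a queue and a list. The hand-assembled averaging program is also an assumption made for illustration. It is not the listing shown in Figure 1, and it uses integer division where the Java program uses doubles.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical interpreter for the Table 1 instruction set.
final class SmInterpreter {
    final int[] memory = new int[1000];  // 000-099 instructions, 100-999 data
    int ac;                              // Accumulator register
    int mq;                              // Multiplier-Quotient register
    final Deque<Integer> input = new ArrayDeque<>();
    final List<Integer> output = new ArrayList<>();

    void run() {
        int location = 0;                        // execution begins at address 000
        while (true) {
            int word = memory[location++];       // fetch, then advance
            int op = word / 1000;                // assumed word layout
            int b = word % 1000;
            switch (op) {
                case 1:  return;                                 // STP
                case 2:  location = b; break;                    // TRA
                case 3:  if (ac <= 0) location = b; break;       // TLE
                case 4:  if (ac != 0) location = b; break;       // TNZ
                case 15: if (ac == 0) location = b; break;       // TEZ
                case 5:  ac = memory[b]; break;                  // CLA
                case 6:  memory[b] = ac; break;                  // STO
                case 7:  mq = memory[b]; break;                  // LDQ
                case 8:  memory[b] = mq; break;                  // STQ
                case 9:  ac = memory[b] + ac; break;             // ADD
                case 10: ac = ac - memory[b]; break;             // SUB
                case 11: mq = mq * memory[b]; break;             // MPY
                case 12: {                                       // DIV
                    int divisor = memory[b];   // a zero divisor throws ArithmeticException
                    ac = mq % divisor;         // remainder -> (AC)
                    mq = mq / divisor;         // quotient  -> (MQ)
                    break;
                }
                case 13: memory[b] = input.remove(); break;      // RD
                case 14: output.add(memory[b]); break;           // WRT
                default: throw new IllegalStateException("Unknown opCode " + op);
            }
        }
    }

    // Loads a hand-assembled averaging program (sentinel -9999) and runs it.
    static SmInterpreter averageDemo(int... numbers) {
        SmInterpreter sm = new SmInterpreter();
        int[] program = {
            13104,  //  0: RD  104   read num
             5104,  //  1: CLA 104
            10100,  //  2: SUB 100   num - sentinel
            15011,  //  3: TEZ 11    sentinel seen, go compute the average
             5101,  //  4: CLA 101
             9104,  //  5: ADD 104
             6101,  //  6: STO 101   sum = sum + num
             5102,  //  7: CLA 102
             9103,  //  8: ADD 103
             6102,  //  9: STO 102   count = count + 1
             2000,  // 10: TRA 0     read the next number
             7101,  // 11: LDQ 101
            12102,  // 12: DIV 102   sum / count
             8105,  // 13: STQ 105
            14105,  // 14: WRT 105   print the average
             1000   // 15: STP
        };
        System.arraycopy(program, 0, sm.memory, 0, program.length);
        sm.memory[100] = -9999;  // sentinel constant
        sm.memory[103] = 1;      // constant one; sum (101) and count (102) start at 0
        for (int n : numbers) {
            sm.input.add(n);
        }
        sm.run();
        return sm;
    }

    public static void main(String[] args) {
        System.out.println(averageDemo(10, 20, 30, -9999).output);  // prints [20]
    }
}
```

The words in the program array are written without leading zeros because a Java integer literal that starts with 0 is read as octal. Entering the sentinel as the first input leaves the count at zero, so the DIV step would fail in this sketch.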