Big Online Book of Linux Ada Programming - 19 Java Byte Code / Mixing Languages

19 Java Byte Code and Mixing Languages

19.1 Ada Meets Java

ACT's JGNAT compiler will compile Gnat source files into Java applications or applets. This section introduces JGNAT.

19.1.1 The Java Virtual Machine

Java is an interpreted language. It is compiled into an artificial machine language for a computer that doesn't exist. This computer is called the Java Virtual Machine (JVM) and the instructions are known as bytecode. The process is similar to the one used by UCSD Pascal, a popular language from the 1980s that also used a virtual machine language (called P-Code).

When Java is compiled, the source code is converted to a .class file. This file contains the instructions for the JVM.

The JVM language is not directly related to Java. It is possible for other languages to create JVM executables. The JGNAT compiler converts Ada source files into .class files that can be executed by Java.

The official Java Virtual Machine Specification is available at Sun's Java website.

19.1.2 JGnat

Most of the Gnat tools have a corresponding JGnat version, including gnatmake. To compile an Ada program into a Java byte-code program, use jgnatmake:

  jgnatmake hello

Table: jgnatmake switches

JGnatmake Switch	Description
-a	Consider all files, even readonly ali files
-c	Compile only, do not bind and link
-f	Force recompilations of non predefined units
-i	In place. Replace existing ali file, or put it with source
-jnum	Use num processes to compile
-k	Keep going after compilation errors
-m	Minimal recompilation
-M	List object file dependences for Makefile
-n	Check objects up to date, output next file to compile if not
-o name	Choose an alternate executable name
-q	Be quiet/terse
-s	Recompile if compiler switches have changed
-v	Display reasons for all (re)compilations
-z	No main subprogram (zero main)
--GCC=command	Use this jgnat command
--GNATBIND=command	Use this gnatbind command
--GNATLINK=command	Use this gnatlink command
-aLdir	Skip missing library sources if ali in dir
-Adir	like -aLdir -aIdir
-aOdir	Specify library/object files search path
-aIdir	Specify source files search path
-Idir	Like -aIdir -aOdir
-I-	Don't look for sources & library files in the default directory
-Ldir	Look for program libraries also in dir
-nostdinc	Don't look for sources in the system default directory
-nostdlib	Don't look for library files in the system default directory
-cargs opts	opts are passed to the compiler
-bargs opts	opts are passed to the binder
-largs opts	opts are passed to the linker
-g	Generate debugging information
-Idir	Specify source files search path
-I-	Do not look for sources in current directory
-O[0123]	Control the optimization level
-gnata	Assertions enabled. Pragma Assert/Debug to be activated
-gnatA	Avoid processing gnat.adc, if present file will be ignored
-gnatb	Generate brief messages to stderr even if verbose mode set
-gnatc	Check syntax and semantics only (no code generation)
-gnatd?	Compiler debug option ? (a-z,A-Z,0-9), see debug.adb
-gnatD	Debug expanded generated code rather than source code
-gnate	Error messages generated immediately, not saved up till end
-gnatE	Dynamic elaboration checking mode enabled
-gnatf	Full errors. Verbose details, all undefined references
-gnatF	Force all import/export external names to all uppercase
-gnatg	GNAT implementation mode (used for compiling GNAT units)
-gnatG	Output generated expanded code in source form
-gnath	Output this usage (help) information
-gnati?	Identifier char set (?=1/2/3/4/8/p/f/n/w)
-gnatknnn	Limit file names to nnn characters (k = krunch)
-gnatl	Output full source listing with embedded error messages
-gnatL	Use longjmp/setjmp for exception handling
-gnatmnnn	Limit number of detected errors to nnn (1-999)
-gnatn	Inlining of subprograms (apply pragma Inline across units)
-gnato	Enable overflow checking (off by default)
-gnatO nm	Set name of output ali file (internal switch)
-gnatp	Suppress all checks
-gnatP	Generate periodic calls to System.Polling.Poll
-gnatq	Don't quit, try semantics, even if parse errors
-gnatR	List representation information
-gnats	Syntax check only
-gnatt	Tree output file to be generated
-gnatTnnn	All compiler tables start at nnn times usual starting size
-gnatu	List units for this compilation
-gnatU	Enable unique tag for error messages
-gnatv	Verbose mode. Full error output with source lines to stdout
-gnatw?	Warning mode. (?=s/e/l/u for suppress/error/elab/undefined)
-gnatW	Wide character encoding method (h/u/s/e/8/b)
-gnatx	Suppress output of cross-reference information
-gnatX	Language extensions permitted
-gnaty	Enable all style checks
-gnatyxxx	Enable selected style checks, where xxx is a list of parameters:
	1-9: check indentation
	b: check no blanks at end of lines
	c: check comment format
	e: check end labels present
	f: check no form feeds/vertical tabs in source
	h: check no horizontal tabs in source
	i: check if-then layout
	k: check casing rules for keywords, identifiers
	m: check line length <= 79 characters
	Mnnn: check line length <= nnn characters
	r: check RM column layout
	s: check separate subprogram specs present
	t: check token separation rules
-gnatz	Distribution stub generation (r/s for receiver/sender stubs)
-gnatZ	Use zero cost exception handling
-gnat83	Enforce Ada 83 restrictions

jgnatmake will create two files: hello.class and ada_hello.class. To run the program under the Java interpreter, type

  java hello

Table: java switches

Java Interpreter Switch	Description
`-help`	Print usage info
`-version`	Print version number
`-ss size`	Maximum native stack size
`-mx size`	Maximum heap size
`-ms size`	Initial heap size
`-as size`	Heap increment
`-classpath path`	Set classpath
`-verify`	Verify all bytecode
`-verifyremote`	Verify bytecode loaded from network
`-noverify`	Do not verify any bytecode
`-Dproperty=value`	Set a property
`-verbosegc`	Print message during garbage collection
`-noclassgc`	Disable class garbage collection
`-v, -verbose`	Be verbose
`-verbosejit`	Print message during JIT code generation
`-verbosemem`	Print detailed memory allocation statistics
`-debug`	Trace method calls
`-noasyncgc`	Do not garbage collect asynchronously
`-cs, -checksource`	Check source against class files
`-oss size`	Maximum java stack size
`-jar`	Executable is a JAR

19.2 ASIS

The Ada Semantic Interface Specification (ASIS) is a standard for building development tools for Ada 95. ASIS is implemented as a series of Ada packages and they are included with the Gnat compiler. Debuggers, source code browsers and code checkers can all be written using ASIS.

ASIS works like a database. Different ASIS child packages return different information. A program using ASIS issues queries to the ASIS packages and ASIS returns the query results.

19.3 Assembly Language

This section discusses embedding assembly language into an Ada 95 program using Gnat. A tutorial on Ada assembly language programming is available at http://www.adapower.com/articles.

Optimizing your programs is more than just rewriting your Ada source code. Gnat performs many basic optimizations for you. For example, Gnat will take these statements

In order to improve the performance of your source code, you'll have to consider issues that Gnat cannot know beforehand or issues that Gnat does not consider when it compiles:

The latest Pentium processors make great effort to reduce the average length of time it takes to execute an instruction, even when it makes programs difficult to optimize. In a sense, the processors take such drastic measures to improve bad source code that it's difficult to write good, efficient source code. Even in Ada, reversing two statements can measurably improve (or degrade) performance on a Pentium family processor.

19.3.1 Pentium Family Processors

The latest Pentium processors are referred to as "IA-32" (32-bit Intel architecture) in the Intel literature. Although the instructions they execute are based on the 80386 processor, you can think of them as hardware emulators that pretend to follow 80386 instructions: internally, the instructions are broken down into different instructions called "microops".

Despite years of extensions, optimizations, and cache increases, the Pentium family is still just a 32-bit 80386 at its core. There are 8 general purpose registers--although they aren't really general purpose since some are more "general" than others. Only six are normally useful:

Instructions can address less than a register's entire contents. For example, EAX can be referred to as AX (the lower 16 bits), AH (the second (high) byte), and AL (the first (low) byte). The second four can address only the lower 16 bits by dropping the leading "E" (BP, SI, DI, SP).

There are also 6 segment registers (CS, DS, SS, ES, FS, GS), but you don't normally need to work with these.

An additional register, EFLAGS, contains the status bits for the processor. This represents the results of the last operation or modes that the processor is running in:

19.3.2 Instruction Set

The actual instruction set for the Pentium family is too large to list here. The manuals run to nearly 1000 pages and are available for download from Intel's Web Site. (The instruction set documentation on the Pentium II site covers all IA-32 processors, including Pentium III, 4, and so on.)

Floating point instructions use a separate set of registers and their own instructions, usually prefixed with an "F".

The MMX, SSE and SSE2 instructions are collectively referred to as SIMD instructions in the Intel documentation. "SIMD" stands for "single instruction, multiple data", which in practice means an array operation. These instructions load, save and perform operations on part of an array. For example, using MMX instructions, you can load four 16-bit integers into a 64-bit MMX register, add them to a second set of four 16-bit integers, and save the results with only 3 instructions. Since they work on fixed-size blocks, your arrays must be sized accordingly.

Most of these basic instructions can take a length suffix indicating the amount of data: b (byte), w (word), l (long). MOVW moves a 16-bit value. INCL increments a 32-bit value. This suffix is different in Linux than in the Intel documentation because Linux uses the AT&T syntax. For example, PUSHFD (push flags double) is PUSHFL (push flags long) in Linux.

Many of the IA-32 instructions are "CISC" instructions: a single instruction does the equivalent of two or more simpler instructions at one time. For example, "CMOV" performs a test and then moves the data if the test succeeds--implicitly a jump plus a move. Since the latest processors are heavily pipelined and parallelized, this type of combined instruction has less of an impact on performance than you might think. The processor sees both CMOV and a jump+MOV as the same sequence of microops internally.

19.3.3 Operands

Assembly language instructions take zero or more operands. Here are some examples of operands.

To tie in Ada variables, refer to the variable as %0...%n. The compiler will substitute in the proper operand to access the value of that variable.

In the AT&T assembly language syntax, the order of operands is the reverse of that in Intel's literature: the source comes first and the destination second. For example, to load hex F into EAX, the mov operands would be "$0XF, %eax" not "%eax, $0XF".

19.3.4 System.Machine_Code.Asm

To use assembly language, include the System.Machine_Code package in your Ada source file. This package contains a procedure called "asm" for inserting assembly language instructions into a program. This is very similar to the C language asm statement of the same name.

The first and only required parameter (named template =>) is the text, as a string, to be given to the assembler. Expect this to be quite literally saved into a temporary file for the GNU assembler to process. As a result, include any formatting, such as line feeds and tabs, so the assembler will read your instructions properly.

In order for Ada to use your assembly code to do something useful, it needs to know how to interface the Ada variables. The "inputs" and "outputs" parameters do this. These parameters use items created by the special 'Asm_Input and 'Asm_Output attributes.

These constraints implicitly let Ada know that these registers will be used by you and if it was using them, it will save them prior to executing your assembly code. You don't need to save them yourself.

In addition, there are general constraints that don't specify a particular register:

These aren't as useful as you might think. When using a general constraint, Gnat doesn't keep track of which registers it has assigned. "r", the constraint for any available register, will likely be the EAX accumulator. Using "r" for two inputs is the same as using "a" twice and will cause one value to overwrite the other.

When counting, the outputs are numbered first, as in GCC's own asm statement. That is, if you have one item in outputs and one item in inputs, the output is %0 and the input is %1.

19.3.5 Other Asm Flags

Another asm parameter, clobber, is a string with the names of the registers that need to be saved besides the ones implicitly referred to in the inputs and outputs. Clobber strings can be a register name, "cc" for the processor flags or "memory" for a memory location.

The Asm procedure is treated as a normal Ada procedure. During optimization, Gnat may change the order in which the instructions in your program are executed to improve performance. For example, if your Asm procedure is inside a loop, Gnat may move the procedure outside of the loop if it thinks it is safe to do so. This is safe to do for an Ada procedure, but an Asm procedure may suffer side-effects and not function correctly. Use the final Asm parameter, volatile, to indicate to Gnat that it is not safe to move your Asm procedure during optimization.

For the same reason, you should not use two or more Asm procedures in one block of source code because Gnat may attempt to reorder them. Instead, place all your instructions into one Asm procedure to ensure the instructions will execute in the proper order.

19.3.6 A Complete Example

19.4 Calling Ada from C

It is possible to combine Ada 95 with a main program written in C. Using Ada 95 classes, functions and procedures from another language is more difficult than the reverse process. While the GNAT compiler has a lot of support for other languages, the other languages do not supply the same level of support. Be prepared to do some manual chores in order to compile and link your program.

It is best to use the same GCC compiler for all the source files. Both the ACT and ALT versions of GCC have the C language enabled.

As discussed under types in this document, most C types have direct correspondence to Ada types. A C "int" is the same as an Ada "integer". For greater portability, the Interfaces.C package and its children contain the definitions of many standard C types. Arrays and records are directly equivalent to C arrays and structures. Special cases are noted below.

Before calling any Ada 95 subprograms, the C program should call the function adainit which performs the initializations and elaborations for the Ada 95 source code. Before the C program exits, it should call adafinal to perform any cleanup. These functions are created by gnatbind so you cannot create a C test program without creating at least one Ada source file as well.

Suppose you want to call a single Ada procedure with no parameters. Your C main program would look something like this.

Create an Ada package containing the "ada_subroutine" procedure. The procedure should be exported to C using pragma export. Because C is a case sensitive language, pragma export will convert the procedure name to lower case characters. (There's another pragma that can change how the case conversion is performed.) Alternatively, you can explicitly supply a new C name in pragma export.

Although the exported subprograms don't need to be prototyped (declaring the function headers), all subprograms should be prototyped in the same way that external C functions are prototyped. Prototyping ensures that the functions will be called with the proper parameters.

Declaring a function in Ada doesn't ensure that the parameters are strongly typed. The parameters are set up by C prior to calling the Ada function.

If Times_2 is overloaded, exporting it will cause an "already defined" error during linking. You will have to provide pragma export with alternate C names that won't conflict with each other.

Likewise arrays can be exported. Remember that in C, array indices always start at zero.

Certain variable types are more difficult to deal with. For example, Ada enumerated types must be declared with "pragma convention( C" to be compatible with C's enumerated type. This is the only case where Gnat stores a type differently for the benefit of another language.

The contents of arrays or records packed with pragma pack are not easily accessible from C. Ada represents these as raw bits.

Ada unconstrained arrays (including unconstrained strings) have no equivalent in C. The easiest workaround for strings is to use C strings instead, as defined in the Interfaces.C package. For C to call a subprogram with unconstrained Ada strings as parameters, the C program must also pass the string bounds as parameters. [If anyone knows how to do this, email me.--KB]

19.5 Calling C++ from Ada

The Ada 95 standard doesn't support interfacing to C++, but Gnat provides extensions so that GNU C++ source code can be used with Ada.

If possible, the same GCC compiler should be used for all source files. If you are using the ACT version of GNAT 3.x or earlier, you should recompile GNAT to enable C++. The ALT version of GNAT 3.x (usually) has C++ enabled. If C++ is not enabled, you will receive a message from GCC about "cc1plus" (the C++ compiler) not being found. However, for the complete example below, I used two different versions of GCC and had no problems.

For the most part, calling C++ from Ada is done the same way as calling C from Ada. Instead of using "C" in pragma import, use "CPP" (that is, C++) as the language convention. However, Gnat provides no support for C++ "name mangling": all C++ declarations should use extern "C" to stop the name mangling. (If you are an adventurer, use the objdump -t command to determine the symbol names used by C++ and include them explicitly in pragma import.)

If Ada and C++ use different GCC compilers, the linker may not be able to tell which version of libgcc to use. You can check which library is being used with the gnatlink -v -v (very verbose) switches.

Importing C++ objects into Ada is possible but difficult. C++ and Ada implement objects using different Object Oriented Programming models. C++ objects are not identical to tagged records. Gnat has special pragmas for importing C++ objects:

The Gnat C++ interface proposal also has a CPP_Destructor pragma, but this has not been implemented. [Perhaps it is not necessary? --KB]

There are two naming problems. First, name mangling is necessary with C++ classes. You will have to use objdump to determine the C++ method names. Second, if two C++ methods from two different classes have the same name (this is often the case when overriding), you'll have to declare the C++ classes in separate Ada packages. Otherwise, pragma import will report an error when attempting to import the same name twice.

If a class has no vtable, it cannot be imported in Ada. CPP_Class will report an error.

Only the C++ classes you intend to use need to be imported. If cpp_car has a parent class called cpp_vehicle, it does not have to be declared in Ada if it will not be used. However, any fields in cpp_vehicle will have to be added to the beginning of the cpp_car tagged record or they will be missing.

C++ classes imported into Ada can't be assigned. This has to do with the differences in assignment semantics between the two languages. C++ classes used in Ada should always have a constructor because this is the only way to assign values to the object.

The name of the constructor is not important. It simply gives the C++ constructor an Ada compatible name.

C++ has no object parameter--it is implied. In Ada, the object must be declared and it can be declared in any position in the parameter list. When importing C++ objects, always put the object name in the position of the first parameter. This is the parameter used by C++ (even though it is not seen by the programmer).

One of the differences between C++ and Ada objects is that Ada has no equivalent of "virtual methods". In Ada, whether or not a method is virtual is determined by the way the class is declared. Also, Ada doesn't allow a class-wide type to be overridden by any children--a class-wide type is always class-wide with no hidden surprises further down the class tree. In C++, virtual functions must be explicitly declared as "virtual". Ada doesn't require all the methods in a C++ class to be imported.

pragma CPP_Virtual identifies which methods are virtual and the position in the vtable. In the simplest case, the first C++ virtual function is at position 1, the second is at position 2, and so on.

When a virtual function is not overridden, it must be declared in Ada. Use CPP_Virtual to indicate which parent function to use.

If necessary, Gnat allows the C++ class to be extended with Ada-specific tagged records. (A multi-language class may make a project unnecessarily complex and difficult to debug.)

Private and protected fields in a C++ object can be simulated using a combination of Ada's private keyword and the information hiding capabilities of Ada packages.

Write the corresponding Ada packages containing the C++ class interface. Since virtual methods are used, we'll need to define each C++ class in a separate package to avoid problems with pragma import.

19.6 Calling Ada from C++

If possible, the same GCC compiler should be used for all source files. If you are using the ACT version of GNAT 3.x or earlier, you should recompile GNAT to enable C++. The ALT version of GNAT 3.x (usually) has C++ enabled. If C++ is not enabled, you will receive a message from GCC about "cc1plus" (the C++ compiler) not being found. However, the example below was compiled with two different versions of GCC and there were no errors.

For the most part, calling Ada from C++ is done the same way as calling Ada from C. Instead of using "C" in pragma export, use "CPP" (that is, C++) as the language convention. However, Gnat provides no support for C++ "name mangling": all Ada extern declarations in a C++ file should use extern "C".

The following is the same sample C program used above in 19.4, converted to C++.

Tagged records cannot be exported directly to C++. Gnat does not understand C++ name mangling and it cannot give the tagged record subprograms names that C++ would recognize. Also, C++ does not understand Ada's Object Oriented Programming model--Ada tagged records are not identical to C++ object classes.

In order to use Ada tagged records from C++, you will have to (dynamically) declare the objects in Ada and pass a "handle" (an ID number or a pointer) back to C++ to use in reference. The Ada source must have special subprograms to match the handle to a particular object and call the appropriate Ada subprogram for that object on behalf of C++.