Purifier1 Special Cases

Wrong Classref

Brief Description

Occasionally a ConstantPool will have two different CONSTANT_Class entries that point to the same class name. Since the Purifier1 uses BCEL's ConstantPoolGen.lookupClass method, the wrong CONSTANT_Class may be chosen for any StackMapType that has two possible CONSTANT_Class entries.

Status

The Purifier1 project will temporarily use a patched version of BCEL. Waiting for the BCEL developers to fix their library, or for Sun to fix their preverifier, are both long-term fixes. The temporary patch is in place in the Purifier1 development version of BCEL and works correctly.

Action Items

  1. Attempt to implement the suggested fix
  2. Create a patched version of BCEL for temporary use with the Purifier1
  3. Submit fix to the Apache BCEL developer community for review.
  4. Encourage the Apache BCEL developer community to adopt the fix.

Test Cases Exhibiting Wrong Classref Issue

Package                     Class  Method                       Method #  Effect  Source
org.spruce.midp.common      c      a(org.spruce.midp.common.c)  8         Fails   midlet.org
org.spruce.midp.alarmclock  a      a(String)                    22

Details

Apparently it is possible for a ConstantPool to have two CONSTANT_Class entries with the same object signature. Therefore, when the StackMapGen looks up a class index using its signature, it may find the wrong one and create a StackMapType entry whose index disagrees with the corresponding StackMapType entry created by Sun's preverifier.

Example

org.spruce.midp.alarmclock.a.a(String) provides an example of this issue and is shown below.

Source Code

Not available. The Spruce AlarmClock is a third-party program from Spruce Technologies that was found on midlet.org. It is provided only as a jar file, with no source code. Furthermore, examination of the jar indicates that the classes are obfuscated. Such Midlets are used in testing the Purifier1 because they are real-world examples of Midlets (a.k.a. "wild" or "free range" Midlets), as opposed to contrived Midlets written for testing or shipped as examples by Sun.

Bytecode

The following bytecode excerpt is from the compiled and preverified class file as distributed in the jar (no source is available, as noted above).

private static String a(String arg0)
Code(max_stack = 5, max_locals = 4, code_length = 103)
0:    aload_0
1:    invokevirtual	java.lang.String.toCharArray ()[C (121)

The relevant portion of the ConstantPool is:

114)CONSTANT_Integer[3](bytes = 119)
115)CONSTANT_Class[7](name_index = 116)
116)CONSTANT_Utf8[1]("java/lang/String")
117)CONSTANT_Class[7](name_index = 116)
118)CONSTANT_NameAndType[12](name_index = 119, signature_index = 120)
119)CONSTANT_Utf8[1]("toCharArray")
120)CONSTANT_Utf8[1]("()[C")
121)CONSTANT_Methodref[10](class_index = 117, name_and_type_index = 118)
122)CONSTANT_Utf8[1]("<init>")
123)CONSTANT_Utf8[1]("()V")
124)CONSTANT_Class[7](name_index = 116)
125)CONSTANT_NameAndType[12](name_index = 79, signature_index = 77)

Note that CONSTANT_Class entries 115, 117 and 124 all point to the same UTF-8 string at index 116. This is highly unusual and is probably an artifact of the obfuscator the original author used. Since the method a(String) has a java.lang.String argument, BCEL's ConstantPoolGen.lookupClass(String) method has to be used to find it in the ConstantPool. BCEL stores these entries in a HashMap, and each duplicate overwrites the previous one, so the lookup finds the last entry (i.e. 124) and returns that. However, the invokevirtual instruction at offset 1 indicates, via the CONSTANT_Methodref at index 121, that the correct CONSTANT_Class to use is the one at index 117. Sun's preverifier returns the first entry (i.e. 115). Even though that is also wrong, Sun's preverifier is the reference implementation, so its answer is considered the right one.
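To make the overwriting concrete, here is a minimal Java model of a name-to-index table filled in constant-pool order with HashMap.put. It is a simplified stand-in, not BCEL's source: the class LastWinsLookup and its buildTable method are hypothetical, and keys are assumed to be internal class names such as java/lang/String.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Simplified model (hypothetical, not BCEL source) of a table mapping a
 * class name to a CONSTANT_Class index, filled in constant-pool order.
 */
public class LastWinsLookup {

    /** Builds the name -> index table from parallel arrays given in pool order. */
    static Map<String, Integer> buildTable(int[] classIndices, String[] names) {
        Map<String, Integer> table = new HashMap<>();
        for (int i = 0; i < classIndices.length; i++) {
            // put() replaces any existing mapping, so a later duplicate wins
            table.put(names[i], classIndices[i]);
        }
        return table;
    }

    public static void main(String[] args) {
        // CONSTANT_Class entries 115, 117 and 124 all name java/lang/String
        int[] indices = {115, 117, 124};
        String[] names = {"java/lang/String", "java/lang/String", "java/lang/String"};
        Map<String, Integer> table = buildTable(indices, names);
        System.out.println(table.get("java/lang/String")); // prints 124
    }
}
```

Because put() replaces the value stored for an existing key, the table keeps only the last duplicate, which is the index an unpatched last-wins lookup hands back.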

Another relevant portion of the bytecode is:

94:   new		<java.lang.String> (115)
97:   dup
98:   aload_1
99:   invokespecial	java.lang.String.<init> ([C)V (215)
102:  areturn

The corresponding ConstantPool entries are:

214)CONSTANT_Utf8[1]("([C)V")
215)CONSTANT_Methodref[10](class_index = 124, name_and_type_index = 192)
216)CONSTANT_NameAndType[12](name_index = 76, signature_index = 107)

This is contradictory. If the java.lang.String entry for the new object is 115 (created by the new instruction at offset 94), then why is the <init> method (called by the invokespecial at offset 99) using a CONSTANT_Methodref that points to a class at index 124?
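Duplicates like these are easy to miss when reading a constant pool dump by eye. One way a tool could flag them is to group every CONSTANT_Class index by the name its name_index resolves to and keep only names with more than one entry. The sketch below models the constant pool with plain maps (pool index to name_index, and pool index to UTF-8 string); DuplicateClassRefs and findDuplicates are hypothetical names, not part of BCEL or the Purifier1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that finds CONSTANT_Class entries sharing one class name. */
public class DuplicateClassRefs {

    /**
     * classEntries: CONSTANT_Class pool index -> its name_index.
     * utf8Entries:  CONSTANT_Utf8 pool index -> its string value.
     * Returns class name -> CONSTANT_Class indices, for names that occur more than once.
     */
    static Map<String, List<Integer>> findDuplicates(Map<Integer, Integer> classEntries,
                                                     Map<Integer, String> utf8Entries) {
        Map<String, List<Integer>> byName = new TreeMap<>();
        // Iterate in ascending pool order so each list is sorted by index
        Map<Integer, Integer> ordered = new TreeMap<>(classEntries);
        for (Map.Entry<Integer, Integer> e : ordered.entrySet()) {
            String name = utf8Entries.get(e.getValue());
            byName.computeIfAbsent(name, k -> new ArrayList<>()).add(e.getKey());
        }
        // Drop names that are referenced by a single CONSTANT_Class entry
        byName.values().removeIf(list -> list.size() < 2);
        return byName;
    }

    public static void main(String[] args) {
        // Entries taken from the ConstantPool excerpt above
        Map<Integer, Integer> classes = Map.of(115, 116, 117, 116, 124, 116);
        Map<Integer, String> utf8 = Map.of(116, "java/lang/String");
        System.out.println(findDuplicates(classes, utf8)); // prints {java/lang/String=[115, 117, 124]}
    }
}
```

Listing the indices in pool order also makes it obvious which entry a first-wins lookup (115) and a last-wins lookup (124) would pick.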

Data Flow Analysis

Little data flow analysis is required here, since the error occurs during method initialization, when the Frame is pre-loaded with the method arguments. The existing StackMap in the class includes:

offset = 15,
locals = {
	(type=Object, (115)class=java.lang.String),
	(type=Object, (219)class=[C),
	(type=Integer),
	(type=Integer)
}

The StackMap generated by the Purifier1 includes:

offset = 15,
locals = {
	(type=Object, (124)class=java.lang.String),
	(type=Object, (219)class=[C),
	(type=Integer),
	(type=Integer)
}

The only difference is that the preverifier used by the author of the program chose the java.lang.String CONSTANT_Class at index 115, while the Purifier1 found the one at 124.
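When a generated StackMap is checked against the one written by the preverifier, a slot-by-slot comparison narrows the problem down quickly. A minimal sketch follows, with each local written as a short string in the spirit of the listings above; FrameDiff and diffLocals are hypothetical helpers, and the string encoding of a type is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper comparing the locals of two StackMap frames. */
public class FrameDiff {

    /** Returns a description of every local slot whose entries differ. */
    static List<String> diffLocals(List<String> expected, List<String> generated) {
        List<String> diffs = new ArrayList<>();
        int n = Math.max(expected.size(), generated.size());
        for (int slot = 0; slot < n; slot++) {
            String e = slot < expected.size() ? expected.get(slot) : "<missing>";
            String g = slot < generated.size() ? generated.get(slot) : "<missing>";
            if (!e.equals(g)) {
                diffs.add("slot " + slot + ": expected " + e + ", generated " + g);
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        // Locals at offset 15, from the two listings above
        List<String> preverifier = List.of("Object(115)", "Object(219)", "Integer", "Integer");
        List<String> purifier = List.of("Object(124)", "Object(219)", "Integer", "Integer");
        diffLocals(preverifier, purifier).forEach(System.out::println);
        // prints: slot 0: expected Object(115), generated Object(124)
    }
}
```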

As mentioned above, when the Purifier1 uses the BCEL library to look up the index of java.lang.String, it finds the last one, 124, and uses that in the StackMap. The author's preverifier (presumably Sun's) uses the first one, 115. This discrepancy might be described as a "bug" in the BCEL library. However, the bytecode above indicates that Sun's preverifier is wrong too, so it also has a bug. Nevertheless, since Sun's implementation is the reference implementation, 115 is considered the right answer, and the Purifier1 will have to emulate it.
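Because the error arises while the initial Frame is pre-loaded with the method arguments, it may help to see where the class lookup enters that step: every reference-typed argument needs a CONSTANT_Class index. The sketch below derives initial locals from a method descriptor. It is an illustration under stated assumptions, not Purifier1 code: InitialLocals, its "Object(index)" output format, and the classIndexOf function (a stand-in for a lookup such as ConstantPoolGen.lookupClass) are hypothetical, and long/double arguments and the uninitialized this of constructors are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

/** Hypothetical sketch of pre-loading a method's initial locals from its descriptor. */
public class InitialLocals {

    /**
     * Returns one entry per local slot, e.g. "Object(115)" or "Integer".
     * classIndexOf stands in for a constant-pool class lookup.
     * Handles int-like, float, array and object parameters only.
     */
    static List<String> initialLocals(String descriptor, boolean isStatic,
                                      String thisClass, ToIntFunction<String> classIndexOf) {
        List<String> locals = new ArrayList<>();
        if (!isStatic) {
            // Non-constructor instance methods start with 'this' in slot 0
            locals.add("Object(" + classIndexOf.applyAsInt(thisClass) + ")");
        }
        int i = 1; // skip the opening '('
        while (descriptor.charAt(i) != ')') {
            char c = descriptor.charAt(i);
            switch (c) {
                case 'B': case 'C': case 'I': case 'S': case 'Z':
                    locals.add("Integer");
                    i++;
                    break;
                case 'F':
                    locals.add("Float");
                    i++;
                    break;
                case 'L': {
                    // Object type: the class name runs up to the next ';'
                    int end = descriptor.indexOf(';', i);
                    String name = descriptor.substring(i + 1, end);
                    locals.add("Object(" + classIndexOf.applyAsInt(name) + ")");
                    i = end + 1;
                    break;
                }
                case '[': {
                    // Array type: its class is named by the descriptor itself, e.g. "[C"
                    int start = i;
                    while (descriptor.charAt(i) == '[') i++;
                    i = descriptor.charAt(i) == 'L' ? descriptor.indexOf(';', i) + 1 : i + 1;
                    locals.add("Object(" + classIndexOf.applyAsInt(descriptor.substring(start, i)) + ")");
                    break;
                }
                default:
                    throw new UnsupportedOperationException("not handled in this sketch: " + c);
            }
        }
        return locals;
    }

    public static void main(String[] args) {
        // private static String a(String): the lookup returns 124, mimicking last-wins
        List<String> locals = initialLocals("(Ljava/lang/String;)Ljava/lang/String;", true,
                "org/spruce/midp/alarmclock/a", name -> name.equals("java/lang/String") ? 124 : -1);
        System.out.println(locals); // prints [Object(124)]
    }
}
```

Whatever index the lookup returns for java/lang/String lands directly in slot 0, which is why a last-wins lookup shows up in the very first frame.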

Solutions

This is a problem with BCEL's search algorithm. It is difficult to call it a bug because this is, admittedly, an undefined situation. I don't think it is legal to have multiple identical CONSTANT_Class entries like this, and it is understandable if BCEL guesses incorrectly in an undefined situation. Nonetheless, these classes pass the JustIce verifier (org.apache.bcel.verifier.Verifier) with only warnings about the StackMap attributes, which it does not understand. A Bugzilla report has been submitted to the Apache BCEL group.

The current solution, which has been implemented, was to "fix" BCEL's ConstantPoolGen.lookupClass(String) method so that it finds the first matching CONSTANT_Class rather than the last. This was a fairly trivial change that simply checks whether a given key in the class_table HashMap is already in use, to prevent overwriting it with subsequent data (see BCEL Bugzilla bug #18031).
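The idea behind the patch can be sketched with a simplified model of the class_table: record a class name only the first time it is seen. This illustrates the approach under that simplified model and is not the actual BCEL patch; FirstWinsLookup and buildTable are hypothetical names, and keys are assumed to be internal class names such as java/lang/String.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical first-wins model of a class-name -> CONSTANT_Class index table. */
public class FirstWinsLookup {

    /** Builds the table from parallel arrays given in constant-pool order. */
    static Map<String, Integer> buildTable(int[] classIndices, String[] names) {
        Map<String, Integer> table = new HashMap<>();
        for (int i = 0; i < classIndices.length; i++) {
            // Skip names already present so later duplicates cannot overwrite them
            // (equivalent to table.putIfAbsent(names[i], classIndices[i]))
            if (!table.containsKey(names[i])) {
                table.put(names[i], classIndices[i]);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        int[] indices = {115, 117, 124};
        String[] names = {"java/lang/String", "java/lang/String", "java/lang/String"};
        System.out.println(buildTable(indices, names).get("java/lang/String")); // prints 115
    }
}
```

With the guard in place the lookup for java/lang/String yields 115, matching the index chosen by the preverifier.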

Acknowledgments

Thanks go out to Spruce Technologies for making midlets available for free via their web site and midlet.org.

