Every release, along with the migration instructions, is documented on the GitHub Releases page.
The second step was to replace the generated parser with a hand-written, recursive descent parser containing 11 recursive methods (one per rule of the grammar). This led to major performance improvements and to a huge code size reduction. This was due to the fact that hand-written code can be more efficient than generated code, and also to the fact that all lexical and syntactic verifications were removed [R2].
In the third step a refactoring reduced the number of writer classes from 7 to 6 by using inheritance. In the fourth step the number of abstract classes was reduced to 4 [R10], and the 6 writer classes were merged into a single one [R12], using a boolean stack encoded as an int [R8].
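The boolean stack encoded as an int can be illustrated with a small sketch. This is only an illustration of the [R8] idea, not ASM's actual code: each bit of an int holds one stack element, so push and pop are a shift and a mask, with no object allocation (the depth is limited to about 31 entries, which is enough for the nesting depth of a signature).

```java
// Sketch of a boolean stack packed into an int (illustrative, not ASM's code).
final class BooleanStack {
    private int bits = 1; // the top set bit acts as a sentinel marking the bottom

    void push(boolean value) {
        bits = bits << 1 | (value ? 1 : 0);   // shift everything up, store the new top bit
    }

    boolean pop() {
        boolean top = (bits & 1) != 0;        // read the top element
        bits >>>= 1;                          // drop it
        return top;
    }

    boolean peek() {
        return (bits & 1) != 0;
    }
}
```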
The parser was mostly unchanged. The reduced API allowed invalid signatures to be generated (it corresponded to a generalized grammar where FieldType, ClassType, ArrayType, TypeVariable and Type are merged into a single Type non-terminal), but this was seen as an acceptable compromise. In the sixth step the parser was optimized by replacing some fields, which represented the last parsed character and its index, with method arguments [R6].
In addition some methods were inlined, which removed 6 of the 11 parsing methods [R13]. Finally, after some feedback from users, the API was reduced to only one abstract class (although it allowed even more invalid signatures to be generated, this was judged more practical for defining SignatureAdapters), and this provided more opportunities to inline parser methods: the final parser contains only 3 parsing methods.
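To make the "hand-written, recursive descent parser" idea concrete, here is a minimal sketch. The grammar used here is a deliberately tiny, hypothetical subset (plain field descriptors such as I, [I or Ljava/lang/String;), not the real generic signature grammar, and the class and method names are invented for the example; it only shows the general shape of one recursive method per grammar rule.

```java
// Minimal recursive descent parser for a toy descriptor grammar (illustrative only).
final class DescriptorParser {
    private final String input;
    private int pos; // position of the next character; [R6] would pass this as an argument

    DescriptorParser(String input) {
        this.input = input;
    }

    // type: primitive | '[' type | 'L' className ';'
    void parseType(StringBuilder out) {
        char c = input.charAt(pos);
        if (c == '[') {                         // array type: consume '[' and recurse
            pos++;
            out.append("array of ");
            parseType(out);
        } else if (c == 'L') {                  // class type: read up to the ';'
            int end = input.indexOf(';', pos);
            out.append(input, pos + 1, end).append(" (class)");
            pos = end + 1;
        } else {                                // primitive type: a single character
            out.append("primitive ").append(c);
            pos++;
        }
    }
}
```

For example, parsing "[Ljava/lang/String;" with this sketch produces "array of java/lang/String (class)".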
The code size decreased from 22KB at the first step, to 6… At the same time the average time to parse and rebuild a signature decreased from 74 microseconds at the first step, to 15 microseconds at the second step, 5… As can be seen from this example (more details can be found in the ASM mailing list archives of January…), improving performance often leads to code size reductions, and vice versa!
They use the standard Maven directory layout, which is also the default layout used by Gradle. Both tools use ASM itself. This package does not depend on the other ones and can be used alone. It is independent of the core package but complements it.
It can be used to implement complex class transformations, for which the core package would be too complicated to use. It can be used in addition to the tree package to implement really complex class transformations that need to know the state of the stack map frames for each instruction. These adapters can be used as is or can be extended to implement more specific class transformations. It is generally not needed at runtime. It provided the ability to convert classes to and from XML.
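As an illustration of the tree-based transformations mentioned above, the following sketch reads a class into a ClassNode, modifies it in memory and writes it back. The class names come from the public ASM API; the transformation itself (removing deprecated methods) is an arbitrary placeholder.

```java
import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.Opcodes;
import org.objectweb.asm.tree.ClassNode;

public class TreeTransformExample {
    public static byte[] transform(byte[] classFile) {
        // Build the in-memory representation of the class.
        ClassNode classNode = new ClassNode();
        new ClassReader(classFile).accept(classNode, 0);

        // Modify it: here, drop methods marked as deprecated (placeholder logic).
        classNode.methods.removeIf(m -> (m.access & Opcodes.ACC_DEPRECATED) != 0);

        // Write the modified class back to a byte array.
        ClassWriter classWriter = new ClassWriter(0);
        classNode.accept(classWriter);
        return classWriter.toByteArray();
    }
}
```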
The inverse process is done by the other 14 classes, which are organized around the ClassWriter class: Classes: the ClassWriter class is the main entry point. It contains fields that describe the class version, access flags, name, etc.
It also contains references to other objects that represent the constant pool, the fields, the methods, the annotations and the attributes of the class. Constant pool: the SymbolTable class is used to represent the constant pool and the bootstrap methods, as well as an ASM specific type table used to compute stack map frames.
These structures are represented both in byte array form and as a hash set of Symbol instances, in order to efficiently test if a given constant pool item, bootstrap method or type has already been added to the symbol table. Fields: the FieldWriter class is used to write fields.
It contains fields that describe the field's name, type, signature, value, etc. It also contains references to other objects that represent the field's annotations and attributes.
Methods: the MethodWriter class is used to write methods. It contains fields that describe the method's name, signature, exceptions, etc. It also contains references to other objects that represent the method's annotations and attributes. The method's code is stored in a byte array that is constructed during the visit of bytecode instructions.
The labels used to reference instructions are stored in a linked list of Label objects. Each handler references three Label objects that define the start and end of the try block, and the start of the catch block. An Edge is an edge between two Label objects in this graph. Modules: the ModuleWriter class is used to write the Module, ModulePackages and ModuleMainClass class attributes, which are related to modules (a Java module definition is compiled into a class file containing these attributes).
Annotations: the AnnotationWriter class is used to write annotations. This class is referenced from the ClassWriter, FieldWriter and MethodWriter classes, since classes, fields and methods can have annotations.
Attributes: the Attribute class is used to read and write non-standard class attributes. It must be subclassed for each specific non-standard attribute that must be read and written. This class is referenced from the ClassWriter, FieldWriter and MethodWriter classes, since classes, fields and methods can have attributes. Resources: the ByteVector class is used to serialize the class elements while they are visited.
It is used to represent the constant pool, the annotation values, the method's code, the stack map tables, the line number tables, etc. Instead, lists, sets and graphs are encoded in dedicated fields of their elements: Lists are represented as linked lists whose links are stored directly in the list elements themselves. For instance a list of FieldWriter objects is represented as the FieldWriter objects themselves, linked through their fv field.
The advantage of this method, compared to using separate objects to store the linked list itself (as in java.util.LinkedList), is that it saves memory. The drawback is that a given element cannot belong to several lists at the same time, but this is not a problem in the ASM case. The only hash set used in the core package, in the SymbolTable class, is implemented with an array of SymbolTable.Entry instances (a subclass of Symbol) that can be chained together through their next field to handle the case of hash collisions.
In other words, as for lists, the hash set structure is embedded in the hash set elements themselves. The advantages and drawbacks are the same (it saves memory, but elements cannot belong to several hash sets at once). Similarly, the control flow graph (see section 3) is encoded in the Label objects themselves.
Since Label objects must be stored in several data structures at the same time, they have several distinct fields that encode these data structures: the nextBasicBlock field is used to encode the list of labels of a method, in the order they are visited.
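The pattern of embedding a data structure in its own elements can be sketched as follows. The class names here are invented (Writer stands in for FieldWriter or MethodWriter, and next for the fv or mv link field); only the pattern itself reflects what the core package does.

```java
// Intrusive linked list: the link lives inside the element, so no node objects are needed.
final class Writer {
    final String name;
    Writer next;                   // the list link, stored in the element itself

    Writer(String name) {
        this.name = name;
    }
}

final class WriterList {
    private Writer first;
    private Writer last;

    void add(Writer writer) {
        if (first == null) {
            first = writer;
        } else {
            last.next = writer;    // chain through the previous element's field
        }
        last = writer;
    }

    int size() {
        int count = 0;
        for (Writer w = first; w != null; w = w.next) {
            count++;
        }
        return count;
    }
}
```

The same chaining idea, applied to the buckets of a hash table, gives the SymbolTable.Entry structure described above.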
It is summarized below. Depending on the complexity of the attribute, either parse it and store its value in a local variable (for attributes containing a simple value), or store its start offset in a local variable (for attributes with a complex structure); then call the visit methods corresponding to the detected attributes.
For complex attributes, which were not parsed in the previous step (such as annotations), parse them and visit them at the same time. For each field: depending on the attribute, either parse it and store its value in a local variable, or store its start offset in a local variable; call visitField; then call the visit methods corresponding to the detected attributes. For each method: depending on the attribute, either parse it and store its value in a local variable, or store its start offset in a local variable; call visitMethod; if the returned visitor is a MethodWriter, and if its ClassWriter's constant pool was copied from this reader (see section 3), the method content does not need to be re-visited instruction by instruction: it can be copied as is.
In the case of stack map frames, not only the visiting but also the parsing of the stack map table is interleaved with the parsing of the method's code.
The advantage, compared to a parsing of the stack map table followed by a parsing of the method's code, is that no complex data structure is needed to store the parsed frames for the second step.
Note also that a single char array is reused to parse these items. It must be large enough to parse the longest string, hence the computation of maxStringLength in the constructor. Instead, in order to be able to automatically compute the maximum stack size, the maximum number of local variables and the stack map frames, each visitXxxInsn method does the following: append the instruction to the code byte vector and, if currentBasicBlock is not null, update the state of the current basic block (its relative stack size, or its current stack map frame) to reflect the effect of the instruction.
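The following sketch shows what this per-instruction bookkeeping can look like for the maximum stack size. It is a simplification with invented names, not MethodWriter's actual code, and it only handles a few representative opcodes.

```java
import org.objectweb.asm.Opcodes;

// Tracks the current and maximum operand stack size while instructions are visited.
final class StackSizeTracker {
    private int currentStackSize;
    private int maxStackSize;

    void visitInsn(int opcode) {
        int delta;
        switch (opcode) {
            case Opcodes.ICONST_0:
            case Opcodes.ICONST_1:
                delta = 1;    // pushes one int
                break;
            case Opcodes.IADD:
            case Opcodes.ISUB:
            case Opcodes.POP:
                delta = -1;   // IADD/ISUB pop two ints and push one; POP pops one
                break;
            default:
                delta = 0;    // most opcodes are omitted from this sketch
                break;
        }
        currentStackSize += delta;
        maxStackSize = Math.max(maxStackSize, currentStackSize);
    }

    int getMaxStackSize() {
        return maxStackSize;
    }
}
```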
The case of forward jumps is solved in the following way: the jump instruction is written with a temporary relative offset equal to 0, and the target Label object is updated to memorize the fact that this jump instruction makes a forward reference to this label (the forwardReferences array in Label is used for that). A minimal sketch of this mechanism follows.
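The names and layout below are invented for the example (ASM's Label class is more involved); the sketch only illustrates writing a temporary 0 offset, recording the position to patch, and patching it once the label's offset is known.

```java
import java.util.ArrayList;
import java.util.List;

final class ForwardLabel {
    private int offset = -1;                             // -1 means "not yet known"
    private final List<Integer> forwardReferences = new ArrayList<>();

    // Called when a jump targeting this label is emitted; 'position' is where the
    // 2-byte relative offset starts (the jump opcode is at position - 1).
    void addReference(byte[] code, int position) {
        if (offset >= 0) {
            writeShort(code, position, offset - (position - 1)); // backward jump: offset known
        } else {
            writeShort(code, position, 0);                       // temporary offset
            forwardReferences.add(position);                     // remember to patch it later
        }
    }

    // Called when the label itself is visited, i.e. when its offset becomes known.
    void resolve(byte[] code, int labelOffset) {
        offset = labelOffset;
        for (int position : forwardReferences) {
            writeShort(code, position, labelOffset - (position - 1));
        }
    }

    private static void writeShort(byte[] code, int position, int value) {
        code[position] = (byte) (value >>> 8);
        code[position + 1] = (byte) value;
    }
}
```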
When this label is visited, i.e. when its offset becomes known, the recorded forward references are patched with the real offset. Producing the final byte array (in toByteArray) is done in two steps: first, the size of the class is computed by summing the size of all the pieces, which is given by the getSize method (this method can add items to the constant pool, which modifies its size; this is why the constant pool size is added only at the very end).
Then each piece is written by calling its putXxx method. For this, the content of the ClassWriter is cleared (except its symbol table), and the byte array is parsed with a ClassReader chained to this ClassWriter, to rebuild the class. The existing frames are also used as is, to replace the CurrentFrame with the visited frame.
During the second step of the algorithm, which takes place in the computeAllFrames method, the "input frame" of each basic block, i.e. the state of the local variables and of the operand stack at the beginning of the block, is computed with an iterative fix point algorithm: the output frame of each block is merged into the input frames of its successors, and this is repeated until no input frame changes any more.
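A generic worklist formulation of such a fix point computation is sketched below. The types are hypothetical; ASM's computeAllFrames works on its own Frame and Label structures, but the overall loop has this shape: a block is re-examined only when the input frame of one of its successors changes, so the loop stops once a fix point is reached.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

final class FixPoint {
    interface Block {
        List<Block> successors();
        // Merge the frame resulting from executing this block into the input frame
        // of the given successor; return true if that input frame changed.
        boolean mergeOutputInto(Block successor);
    }

    static void computeInputFrames(Block entry) {
        Deque<Block> worklist = new ArrayDeque<>();
        worklist.add(entry);
        while (!worklist.isEmpty()) {
            Block block = worklist.remove();
            for (Block successor : block.successors()) {
                if (block.mergeOutputInto(successor)) {
                    worklist.add(successor);   // its input frame changed: process it again
                }
            }
        }
    }
}
```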
We compute the owner subroutine of each basic block as follows: visit all the basic blocks that are reachable from the first one, without following a JSR or RET instruction, and mark them as belonging to a main "subroutine". In the process, push all the targets of the visited JSR instructions into a queue Q. Then, while Q is not empty, pop a label from Q and, if its basic block is not already marked, mark it and all the basic blocks reachable from it (still without following JSR or RET instructions) as belonging to a new subroutine. In the process, push all the targets of the visited JSR instructions into Q.
Here this means only L4 is marked, because L5 was already marked as belonging to subroutine 2 (this happened because the nested subroutine 3 can "return" to its parent subroutine 2 without a RET):
L0 belongs to: subroutine 1
L1 belongs to: subroutine 1
L2 belongs to: subroutine 2
L3 belongs to: subroutine 2
L4 belongs to: subroutine 3
L5 belongs to: subroutine 2
The second step consists in finding the successors of the RET instructions.
The final control flow graph is the following:
L0 successors: L2, L1
L1 successors: none
L2 successors: L4, L3
L3 successors: L5
L4 successors: L5, L3
L5 successors: L1
Note: you may have noticed that the L4 "basic block" is not a real basic block, because it contains several instructions that can lead to other blocks. There are however some general rules that can be used when designing an API and when implementing it, in order to get good performance (a side effect of these rules is to reduce code size): [R1] the API must stay very close to the internal class file structures, in order to avoid costly conversions during parsing and writing.
Many examples of this rule can be seen in the existing API: internal class names, type descriptors and signatures are passed as arguments to visit methods in the same form as they are stored in the class file.
Stack map frames are also visited as they are stored in the class file. The drawback of this rule appears when several class adapters need a higher-level view of these encoded structures (for example a SignatureVisitor-based view of a signature string): indeed, in this case, the decoding and encoding steps will be executed in each adapter, while they could be executed only once in ClassReader and ClassWriter.
These verifications must be added in CheckXxxAdapter classes. Once the best API and algorithms have been found, several "low level" techniques can be used to optimize performance: [R3] avoid string manipulation operations.
These operations generally have a high cost. Examples of this can be seen in ClassReader: the cpInfoValues array is used to avoid parsing and building strings several times for the same constant pool UTF8 item. Another example is in the Label class: the types used for stack map frames are encoded as int values instead of strings (they were initially stored as strings; the change to int values improved performance a lot).
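The gain comes from the fact that an int-encoded type can be compared, stored and hashed without allocating or inspecting strings. A possible encoding, shown only to illustrate the idea (ASM's actual encoding is different), packs a "kind" into the high bits and a payload into the low bits:

```java
// Illustrative bit-packed type encoding: kind in the top 4 bits, payload below.
final class TypeCodes {
    static final int KIND_SHIFT = 28;
    static final int KIND_PRIMITIVE = 1 << KIND_SHIFT;
    static final int KIND_OBJECT = 2 << KIND_SHIFT;          // payload = constant pool index
    static final int VALUE_MASK = (1 << KIND_SHIFT) - 1;

    static int objectType(int constantPoolIndex) {
        return KIND_OBJECT | constantPoolIndex;
    }

    static boolean isObjectType(int type) {
        return (type & ~VALUE_MASK) == KIND_OBJECT;
    }

    static int constantPoolIndex(int type) {
        return type & VALUE_MASK;
    }
}
```

Comparing two such types is a single int comparison instead of a String.equals call.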
Several examples of this principle can be seen in the existing implementation. For example the toByteArray method first computes the size of the class, then allocates a ByteVector of this size, and finally writes the class content in this vector.
This avoids many calls to the enlarge method in ByteVector, and therefore many array copy operations. Some JVMs do not inline these methods, in which case accessing fields directly is faster than using getters and setters. This also saves code. The core package does not use any such methods. The tree package also exposes many fields directly. See for example the ByteVector class: the length and data fields are copied into local variables in methods that access them several times.
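A sketch of this local-caching pattern, in the spirit of the ByteVector example just mentioned (but simplified and with invented names), looks like this:

```java
import java.util.Arrays;

final class GrowableBuffer {
    private byte[] data = new byte[64];
    private int length;

    void putInt(int value) {
        // Copy the fields into locals: they are accessed several times below,
        // and local variable accesses are cheaper than repeated field accesses.
        int currentLength = length;
        byte[] currentData = data;
        if (currentLength + 4 > currentData.length) {
            currentData = Arrays.copyOf(currentData, currentData.length * 2);
            data = currentData;
        }
        currentData[currentLength++] = (byte) (value >>> 24);
        currentData[currentLength++] = (byte) (value >>> 16);
        currentData[currentLength++] = (byte) (value >>> 8);
        currentData[currentLength++] = (byte) value;
        length = currentLength;    // write the updated length back to the field once
    }
}
```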
A variant of this rule is to replace fields with method parameters (this variant was used in the signature package - see section 4). This rule is used, for example, in the visitMethod method in MethodWriter: the result of getArgumentsAndReturnSizes for a method Symbol is cached in a lazily computed field of Symbol. It is also used in ClassReader: the cpInfoValues array is used as a cache to avoid parsing the same string constant pool item several times.
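The lazily computed field technique can be sketched as follows. The class is hypothetical (it only mimics the role played by Symbol); the point is that the value is computed at most once and then served from the cache.

```java
final class MethodEntry {
    final String descriptor;
    private int cachedSizes;      // 0 means "not computed yet" (valid results are non-zero)

    MethodEntry(String descriptor) {
        this.descriptor = descriptor;
    }

    int getArgumentsAndReturnSizes() {
        if (cachedSizes == 0) {
            cachedSizes = computeSizes(descriptor);   // computed at most once
        }
        return cachedSizes;
    }

    // Placeholder for the real computation, which would parse the method descriptor.
    private static int computeSizes(String descriptor) {
        return descriptor.length();
    }
}
```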