Introduction
The PHP Opcode series will look at the theory and implementation of working with the Zend Engine, specifically adding opcodes and enabling language features. Each topic will be shown three ways: how it would look in the PHP language, how it would look as PHP opcodes, and how it would look as Assembler, with an eye toward enabling JIT compiling later.
The posts will be researched and go through multiple drafts for professionalism before posting. The hope is to enable discussion that stays on the topic at hand and doesn’t turn into flaming, at least as far as my writing skill and research allow.
(Honestly, I’m spending more time on these posts than I do on my school research papers… which is to still say that I am doing more than previous posts.)
Purpose
I want to add JIT compiling to the current Zend Engine, and before I can do that I have to learn how the Zend Engine works. The best way would be to read a book or learn from the minds that created it. There is currently a lack of printed material on extending the Zend Engine, and I’d rather not bother the Zend Engine masters with my countless n00bish questions. I’ll save those questions for when I hit a roadblock or can’t find the answer myself.
I’m going to try to implement some of the easier features from the “Opcode Theory” part to learn how different areas of the Zend Engine work. I’ll also intentionally break the Engine and learn from that, asking questions such as “I wonder what would happen if I add this?” or “What if I change this to my own code?”
I’m tempted to rewrite the entire thing to achieve my own goals, but that would negate the point of these exercises.
The point is not to implement all of the theory, but to think about how it would be possible: to learn how PHP uses opcodes and how it could be improved and optimized by using more of them. By writing up the theory and how the current Zend Engine works, I can remember it better and try out my ideas to see if they are possible.
Areas of Focus
Each post will be titled with its area of focus, to keep track of which post covers what.
- Opcode Theory
- Opcode Implementation
- Opcode Documentation
Opcode Theory
These posts will abstract out the problem and show how it could look once implemented. As mentioned above, the sections will be PHP userland code, PHP opcodes, and finally how it would look as Assembler for JIT. The Assembler isn’t of much direct use to the JIT, but it will help in thinking about solutions once JIT is finally implemented. The JIT and the Assembler achieve the same thing: converting opcodes to machine code.
It would be nice to see a feature in action outside of PHP, using Assembler, before implementing it with the JIT library. Getting Assembler code to work against the Zend Engine and PHP will also be part of the Implementation, which is outside the scope of the Theory. This is just to say that the Theory part will have pseudo-Assembler code for how it might look.
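To make the "opcodes in, result out" idea concrete, here is a minimal sketch in C of a toy opcode interpreter, which is the loop a JIT would replace with generated machine code. Everything in it is an assumption for illustration: the `toy_` names and the stack-based opcode set are made up and are not Zend Engine opcodes. As far as I can tell, the real engine's opcodes carry their operands and a result slot rather than using an operand stack, so treat this as a thinking aid only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode set for illustration; NOT Zend Engine opcodes. */
typedef enum { TOY_PUSH, TOY_ADD, TOY_MUL, TOY_RETURN } toy_opcode;

typedef struct {
    toy_opcode opcode;
    long operand; /* only meaningful for TOY_PUSH */
} toy_op;

#define TOY_STACK_SIZE 16

/* Interpret an opcode array one instruction at a time.
 * A JIT would emit machine code for each case instead of looping here. */
long toy_execute(const toy_op *ops, size_t count)
{
    long stack[TOY_STACK_SIZE];
    size_t sp = 0; /* number of values currently on the stack */

    for (size_t i = 0; i < count; i++) {
        switch (ops[i].opcode) {
        case TOY_PUSH:
            assert(sp < TOY_STACK_SIZE);
            stack[sp++] = ops[i].operand;
            break;
        case TOY_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case TOY_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case TOY_RETURN:
            assert(sp >= 1);
            return stack[sp - 1];
        }
    }
    return 0; /* ran off the end without a TOY_RETURN */
}

/* The userland expression `return (2 + 3) * 4;` written as toy opcodes. */
long toy_example(void)
{
    static const toy_op ops[] = {
        { TOY_PUSH, 2 }, { TOY_PUSH, 3 }, { TOY_ADD, 0 },
        { TOY_PUSH, 4 }, { TOY_MUL, 0 }, { TOY_RETURN, 0 },
    };
    return toy_execute(ops, sizeof ops / sizeof ops[0]);
}
```

Calling `toy_example()` walks the six opcodes and returns the value left on top of the stack. The Assembler view of the same expression would be the handful of load, add, and multiply instructions that each `case` stands in for.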
I will not be implementing all of these theory posts and I will be lucky if I even implement one. My main purpose is learning enough of the Opcode parts of the Zend Engine to hook into and write a Zend Engine extension that enables JIT compilation. These side projects should help to learn enough to accomplish that much.
Opcode Implementation
These posts will document how well I’m doing at implementing some of the theory and, eventually, the JIT feature for the Zend Engine. In the event that I do implement one of the features from the theory part, I will post what it finally looks like and how it works. These posts will also cover the road taken and the difficulties met in implementing the features from the theory posts, and eventually how it was done.
Opcode Documentation
I expect that I’ll learn a lot from the above mangling of the Zend Engine and from picking the minds of the team behind it. Some things I will have to learn myself, and I’ll post them under this heading to make sure that I’ll remember them when I come back after an extended break or two to work on other projects.
“Hmm, how did I do that one thing again? Oh yeah, I wrote about that!” It will also be a place to store the help I will eventually (hopefully) receive from the PHP lists.
Why can’t you ask the minds who created it? They hang around on IRC, and all read internals@lists.php.net as well.
Oops, This wasn’t supposed to be part of Planet PHP! I’ll fix that.
@Greg Beaver
Yeah, I’ve done that, except for IRC, as I’m not fond of chatting. It wastes way too much time. I also would not like to waste their time with pointless question after pointless question. The quest is to find the answers myself and, if I stumble, then ask. I may be a n00b, but I find it is better not to act like one.
Excited to hear what your findings are! Would be great if you could blog along the way =)
Look forward to the series. Getting your feet wet in a large codebase can be a pretty time consuming exercise so I dig your approach. It’s imperfect, but builds early understanding. Definitely a topic worth blogging about.
@Padraic Brady
Thanks! Yeah, I figured it would be easier and quicker to learn and work with the Zend Engine than it would be to build my own compiler. I still want to build my own compiler, but “getting my feet wet” is something I should do to better learn how compilers work, or more specifically how the Zend Engine works.
If I can’t adapt to the Zend Engine, then there is no way in hell I can call myself a professional or say that I’m ready for the professional world where I’ll be working with another person’s code all of the time. At the very least, I can say on my resume, that I can adapt well to alien code bases. Need to learn not to rewrite everything I don’t understand (my New Year’s resolution).
It also gets the little ideas that are floating around the inside of my head out in coherent form, so that I can follow and choose which idea is worth pursuing in the amount of time that I will have. Summer will be most interesting indeed, but I will also spend some time compiling theory and breaking the Zend Engine over the course of two months. Should be a fun side project.
My first step was going to be just jumping in and using a library to allow JIT compilation for the Zend Engine. When I actually took a better look at the Zend Engine, at the areas unrelated to PHP Extension writing, I got the idea to do this series. Comments are quite sparse and yeah, it might have gone better for me if I had another two or three years of C/C++ experience.
Drowning is a better term, I think, to describe how over my head I am. I do have more respect for the people who understand and maintain the Zend Engine. Actually, some of the ideas I have are far too complicated for me to implement, even if I did understand the Zend Engine. I’m taking some posts that I had saved from last year and converting them to this format.
There’s a lot about ZE that’s beautiful, and a lot that’s just plain annoyingly wrongity-wrong (but has to be in order to appease the gods of BC). Once you get your head around the organization of the linkages between the lexer, parser, compiler, and the awe-and-mystery that is the back-patching routines, it should start to hold together a little better.
To paraphrase someone I once knew (RIP), who was referring to the sendmail configuration file:
It’s like the Necronomicon… You are warned away, but you go regardless, hoping to learn unearthly secrets. If your sanity survives, you spend the rest of your life conversing with znodes, running from refcounts, and striking fear into most heredocs. I don’t know whether it has driven me insane or revealed to me deep secrets about the universe.
Dear Sara,
Your book on writing PHP Extensions is awesome! Any thoughts of extending it to include Zend Engine details?
Thanks.
Going-insane-wannabe-Zend-Engine-Hacker,
Jacob Santos
————-
There would be a greater need for writing extensions than for hacking the Zend Engine, but it might be good for a double length book. It would help make this long journey shorter. Although I do realize that asking is probably a slap in the face, since you didn’t have such material when you started hacking.
I’ll take this as a rite of passage then, until such material does become widely available in printed form. I did find a lot of functions that might help in implementing the JIT compiler, but I really do need to hack and see what each does. That is quite a long process and it usually leads to other areas of the Zend Engine. That is another reason for the creation of this series. Once I learn one part, I can approach it with ease, or at least as much ease as one can get with a month or two of hacking.
Thanks, I’m glad you like it. I do have several additions planned for the 2nd edition, but I’m not sure compiler theory will make it in (assuming there is a 2nd ed, still too early to tell on that one).
I *have*, however, been meaning to get some more blog entries out which take a look under the hood (along the lines of “How long is a piece of string?”). It’s just a question of time and I’m already quite late on getting part 4 of my devzone.zend.com series out the door.
As far as where you should start with your JIT compiler, just take a page out of the opcode cache book and hook zend_compile_file to get the opcodes, do some voodoo to turn that into C code, then run that into libgcc to make happy little binaries. The principle is simple enough, it’s just that voodoo step in the middle that’ll drive you to seek truth at the bottom of a bottle some late frigid night…
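The "hook zend_compile_file" step relies on the compile entry point being a replaceable function pointer. A minimal, self-contained sketch of that save-and-wrap pattern follows. All types and names here (`engine_compile_file`, `hook_*`, the tiny structs) are hypothetical stand-ins so the example compiles on its own; the real `zend_compile_file` signature and its `zend_op_array` and `zend_file_handle` types should be checked against the Zend headers before relying on any of this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the engine's (much larger) types. */
typedef struct { const char *filename; } hook_file_handle;
typedef struct { size_t op_count; int seen_by_jit; } hook_op_array;

typedef hook_op_array *(*hook_compile_fn)(hook_file_handle *fh);

static hook_op_array hook_last_result;

/* Pretend default compiler: every file "compiles" to three opcodes. */
static hook_op_array *engine_default_compile(hook_file_handle *fh)
{
    assert(fh != NULL);
    hook_last_result.op_count = 3;
    hook_last_result.seen_by_jit = 0;
    return &hook_last_result;
}

/* The replaceable entry point, playing the role of zend_compile_file. */
hook_compile_fn engine_compile_file = engine_default_compile;

/* Saved original so the wrapper can delegate to it. */
static hook_compile_fn hook_original_compile = NULL;

/* Wrapper: let the original produce opcodes, then inspect them. */
static hook_op_array *hook_compile_file(hook_file_handle *fh)
{
    hook_op_array *ops = hook_original_compile(fh);
    if (ops != NULL) {
        ops->seen_by_jit = 1; /* the "voodoo" step would go here */
    }
    return ops;
}

/* Install the hook, as an extension might do at startup. */
void hook_startup(void)
{
    if (hook_original_compile != NULL) {
        return; /* already installed; avoid wrapping ourselves */
    }
    hook_original_compile = engine_compile_file;
    engine_compile_file = hook_compile_file;
}

/* Restore the original pointer, as an extension might do at shutdown. */
void hook_shutdown(void)
{
    if (hook_original_compile != NULL) {
        engine_compile_file = hook_original_compile;
        hook_original_compile = NULL;
    }
}
```

After `hook_startup()`, every call through `engine_compile_file` passes through the wrapper, which gets to see the opcode array before handing it back. Saving the previous pointer rather than assuming the default keeps the sketch compatible with other code that may have installed its own hook first.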
In terms of understanding how the opcodes fit together, you ought to start by just pumping scripts into VLD and doing your best to make sense of the output. The opcodes are (by and large) named pretty intuitively, QM being the major exception (hint: it’s to do with ternary expressions). Here are a few files of interest in the Zend/ directory:
zend_language_scanner.l – Turns source into tokens (e.g. “echo” => T_ECHO )
zend_language_parser.y – Turns tokens into expressions (e.g. T_ECHO T_STRING ‘;’ => zend_do_echo(&$1) )
zend_compile.c – Turns expressions into opcodes (e.g. zend_do_echo(znode *expr) => ZEND_ECHO )
zend_vm_def.h – Executes opcodes (e.g. ZEND_ECHO => zend_print_variable(op1) )
Note that all these files (with the exception of zend_compile.c) aren’t exactly C-source. zend_language_scanner.l, zend_language_parser.y, and zend_vm_def.h are pre-processed by flex, bison, and zend_vm_gen.php, respectively.
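The four stages listed above (source to tokens, tokens to expressions, expressions to opcodes, opcodes to execution) can be sketched end to end for a single statement of the form `echo NUMBER;`. This is a toy under stated assumptions: the `mini_` names are invented for illustration, the parser and compiler stages are folded into one function, and none of it reflects how flex, bison, or zend_vm_gen.php actually generate the real code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token and opcode types; not Zend Engine symbols. */
typedef enum {
    MINI_T_ECHO, MINI_T_NUMBER, MINI_T_SEMICOLON, MINI_T_END, MINI_T_ERROR
} mini_token_type;
typedef struct { mini_token_type type; long value; } mini_token;

typedef enum { MINI_OP_ECHO, MINI_OP_INVALID } mini_opcode;
typedef struct { mini_opcode opcode; long op1; } mini_op;

/* Scanner stage: read one token from *src and advance past it. */
mini_token mini_scan(const char **src)
{
    mini_token tok = { MINI_T_ERROR, 0 };
    const char *p = *src;

    while (isspace((unsigned char)*p)) {
        p++;
    }
    if (*p == '\0') {
        tok.type = MINI_T_END;
    } else if (strncmp(p, "echo", 4) == 0 && !isalnum((unsigned char)p[4])) {
        tok.type = MINI_T_ECHO;
        p += 4;
    } else if (isdigit((unsigned char)*p)) {
        tok.type = MINI_T_NUMBER;
        while (isdigit((unsigned char)*p)) {
            tok.value = tok.value * 10 + (*p - '0');
            p++;
        }
    } else if (*p == ';') {
        tok.type = MINI_T_SEMICOLON;
        p++;
    }
    *src = p;
    return tok;
}

/* Parser + compiler stages: accept `echo NUMBER ;` and emit one opcode. */
mini_op mini_compile(const char *src)
{
    mini_op op = { MINI_OP_INVALID, 0 };
    mini_token t_echo = mini_scan(&src);
    mini_token t_num = mini_scan(&src);
    mini_token t_semi = mini_scan(&src);

    if (t_echo.type == MINI_T_ECHO && t_num.type == MINI_T_NUMBER &&
        t_semi.type == MINI_T_SEMICOLON) {
        op.opcode = MINI_OP_ECHO;
        op.op1 = t_num.value;
    }
    return op;
}

/* Executor stage: run the opcode, writing its output into buf.
 * Returns the number of characters written, or -1 for an invalid opcode. */
int mini_execute(const mini_op *op, char *buf, size_t size)
{
    assert(op != NULL && buf != NULL);
    if (op->opcode != MINI_OP_ECHO) {
        return -1;
    }
    return snprintf(buf, size, "%ld", op->op1);
}
```

Feeding `"echo 42;"` to `mini_compile` yields a `MINI_OP_ECHO` with operand 42, and `mini_execute` then writes `42` into the buffer. In the real engine each stage lives in its own generated file, which is why reading the `.l`, `.y`, and `zend_vm_def.h` sources directly takes some getting used to.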