PHP TSRM Explanation – PART 1:

PHP Lifecycle:

PHP boots and loads all the required extensions by calling PHP_MINIT function defined in each extension.

Request Processing start by:
Calling PHP_RINIT function defined in each extension.
Then it compiles and execute the request script.
At the end of the request, it calls PHP_RSHUTDOWN of each extension.
Request Processing end.

Request Processing starts:
…..
Request Processing end.

Request Processing starts:
…..
Request Processing end.
PHP terminates by calling PHP_MSHUTDOWN defined in each loaded extension.

For easy understanding, Every extension that is being loaded in PHP goes through the phases defined below:
PHP_MINIT
           PHP_RINIT
           PHP_RSHUTDOWN

           PHP_RINIT
           PHP_RSHUTDOWN

           PHP_RINIT
           PHP_RSHUTDOWN
PHP_MSHUTDOWN

If request comes sequentially, then we do not have to ponder too much about threading related issue. But in reality, request come in parallel.
It is interesting how PHP handles threading.

Some thoughts about threading models:

For sometime, let’s forget about PHP and discuss about threading.

Below 2 cases covers almost all the threading scenario.

Assume that Thread1, Thread2 are running parallel. You can extend below models to N threads.

CASE1::PARALLEL SYSTEM::Simplest Threading Scenario:No shared resources:
Thread1 start
Allocates private data on which the thread will work
Read/Update on pvt data
Free the private data
Thread1 ends

Thread1 start
Allocates private data on which the thread will work
Read/Update on pvt data
Free the private data
Thread1 ends

CASE2::CONCURRENT SYSTEMS::Complex Threading Scenario :Shared resources:
Thread1 start
Allocates private data on which the threads will work
MUTEX_LOCK
Read/Update on pvt data
MUTEX_UNLOCK
Free the private data
Thread1 ends

Thread1 start
Allocates private data on which the threads will work
MUTEX_LOCK
Read/Update on pvt data
MUTEX_UNLOCK
Free the private data
Thread1 ends

Threading model in PHP:

Now getting back to PHP, we are bit lucky that the threading model we require in PHP is the CASE1 as PHP do not want any sharing of private data between two parallel PHP requests.

The name that is being used by PHP to achieve threading is called TSRM model.
TSRM = Thread Safe Resource Manager.

I guess if we try to understand each word of TSRM, it will give us the superficial idea about the approach used inside PHP.
Lets try to analyse expanded form of TSRM keeping CASE1 of threading model defined above.
Private data and Resource means same.
TSRM == Thread Safe Resource Manager == Thread Safe Private Data Manager.

Lets forget what PHP does and try to design our own so called TSRM.
What we just need is a hash, where
key = thread id of PHP request
value = Resource/Private data belonging to PHP request.

And hash needs to be global so that all the PHP request threads can access the global hash.
That means that its a shared resource. Now according to case 2 of threading model defined above, we need some mutex when any thread tries to write/update on this global hash.
Reads do not need locks (in most cases).

And that’s what is PHP TSRM model essentially is.

In the next section, I will deep dive in the source code and try to explain TSRM.c.

Advertisements

Quick Tutorial to understand function call stack frame using C code and Gdb.

All the explanation below will be with respect to the C code in test.c. Sizeof(int) = 4, sizeof registers = 8 bytes and address space theoretically = 2pow64.

test.c:

#include <stdio.h>

int func(int a, int b)
{
int h = 10;
int c = h+a+b;

return c;
}

int main()
{
int t = 1;
int t1 = 2;
int d = func((int)0xdeadbeef, (int)0xdeedbeef);

return 0;
}

Compiling test.c:
gcc -ggdb -O0 -o mybin ./test.c

How stack works:
1) Stack grows from from HIGH address to LOW address.
2) If main calls func(int a, int b), then call stack looks like:

HIGH_ADDR
4bytes param1 of func (Depending on compiler, params may directly go in registers and not on stack)
4bytes param2 of func
8bytes Address of instruction in main just after func ($eip or $rip)
8bytes base pointer of func (after saving old $bp on stack, $bp = $sp)
4bytes local var1 of func
4bytes local var2 of func
….
….
LOW_ADDR

Next we will debug mybin and try to understand call stack frame.

Let’s debug the code in GDB.

>gdb mybin   

Breakpoint 1, main () at ./test.c:15
(gdb) p $rbp //$rbp register value inside main
$1 = (void *) 0x7fff2f54e590
(gdb) p $rsp //stack pointer value just before calling func
$2 = (void *) 0x7fff2f54e580
(gdb) c
Continuing.

Breakpoint 2, func (a=-559038737, b=-554844433) at ./test.c:5
(gdb) p $rsp //stack pointer value after entering func
$3 = (void *) 0x7fff2f54e570
(gdb) p 0x7fff2f54e570 – 0x7fff2f54e580 //Subtract new stack pointer (inside func) with old stack pointer (in main just before calling func)
$4 = -16 //Indicates that we only push $eip value which was address of next instruction in main after func, and we pushed old $ebp value. We did not push input args of func on stack.
(gdb) p $rbp //$rbp = $rsp, so value at address $rbp is the value of old $rbp
$5 = (void *) 0x7fff2f54e570
(gdb) x/8b $rbp //Print value stored at memory $rbp and we get the value = old $rbp. See above $1
0x7fff2f54e570: 0x90 0xe5 0x54 0x2f 0xff 0x7f 0x00 0x00
(gdb) x/16b $rbp //Print value stored in stack before the oldbp. Value = address of instruction that will execute just after func call inside main function
0x7fff2f54e570: 0x90 0xe5 0x54 0x2f 0xff 0x7f 0x00 0x00
0x7fff2f54e578: 0x6f 0x04 0x40 0x00 0x00 0x00 0x00 0x00
Breakpoint 3 at 0x400445: file ./test.c, line 8.
(gdb) c
Continuing.

Breakpoint 3, func (a=-559038737, b=-554844433) at ./test.c:8
(gdb) x/4b $rbp – 4  //value of h
0x7fff2f54e568: 0x0a 0x00 0x00 0x00
(gdb) x/4b $rbp – 8 //value of c
0x7fff2f54e56c: 0xe8 0x7d 0x9b 0xbd

(gdb) disassemble main
Dump of assembler code for function main:
0x000000000040044a <main+0>: push rbp
0x000000000040044b <main+1>: mov rbp,rsp
0x000000000040044e <main+4>: sub rsp,0x10
0x0000000000400452 <main+8>: mov DWORD PTR [rbp-0xc],0x1
0x0000000000400459 <main+15>: mov DWORD PTR [rbp-0x8],0x2
0x0000000000400460 <main+22>: mov esi,0xdeedbeef
0x0000000000400465 <main+27>: mov edi,0xdeadbeef
0x000000000040046a <main+32>: call 0x400428 <func>
0x000000000040046f <main+37>: mov DWORD PTR [rbp-0x4],eax
0x0000000000400472 <main+40>: mov eax,0x0
0x0000000000400477 <main+45>: leave
0x0000000000400478 <main+46>: ret
End of assembler dump.
(gdb) disassemble func
Dump of assembler code for function func:
0x0000000000400428 <func+0>: push rbp
0x0000000000400429 <func+1>: mov rbp,rsp
0x000000000040042c <func+4>: mov DWORD PTR [rbp-0x14],edi
0x000000000040042f <func+7>: mov DWORD PTR [rbp-0x18],esi
0x0000000000400432 <func+10>: mov DWORD PTR [rbp-0x8],0xa
0x0000000000400439 <func+17>: mov eax,DWORD PTR [rbp-0x14]
0x000000000040043c <func+20>: add eax,DWORD PTR [rbp-0x8]
0x000000000040043f <func+23>: add eax,DWORD PTR [rbp-0x18]
0x0000000000400442 <func+26>: mov DWORD PTR [rbp-0x4],eax
0x0000000000400445 <func+29>: mov eax,DWORD PTR [rbp-0x4]
0x0000000000400448 <func+32>: leave
0x0000000000400449 <func+33>: ret
End of assembler dump.
(gdb)

Let’s try to understand assuming our stack started at address 31.
//Assumption params are of 4 bytes

StartAddr Data EndAddr Print values in gdb inside function func
31 0xdeadbeef 28 x/4b  $rbp+20  //In our case, this params did not go on stack
27 0xdeedbeef 24 x/4b $rbp+16  //In our case, this did not go on stack
23 return address 16 x/8b $rbp+8
15 old rbp 8 x/8b $rbp
[At this point $sp = 8, so $bp = 8.]
7 h 4 x/4b $rbp-4
3 c 0 x/4b $rbp-8

Continue reading “Quick Tutorial to understand function call stack frame using C code and Gdb.”

Algorithms inside Linux Kernel

Many times, a new CS student is always confused that what can be achieved by learning Data Structures and Algorithms. So I am just pointing to page which can give strong motivation for learning them.

There is a beautiful answer at cstheory.stackexchange.com. http://cstheory.stackexchange.com/questions/19759/core-algorithms-deployed/19773#19773

And this page has been well summarized at                       http://luisbg.blogalia.com/historias/74062

What happens after you have written high level language program in you IDE

I found 2 very handy articles explaining on what happens after you are done with writing program in you IDE.

These articles can be very useful to get better understanding of what goes behind the scene and especially to college students. These were the question I had when I started programming and I never got nice answers :(.

These article inspects what happens at:
Compiling
Linking
Loading
Running stages of your program.

The best part is that you have wide list of tools to detect all the internal details and these articles throw some good light on their usage.

http://www.cs.swarthmore.edu/~newhall/unixhelp/compilecycle.html

http://www.lurklurk.org/linkers/linkers.html

I enjoyed reading them and I hope you will also enjoy them.

How to compile naclports (ffmpeg)

1) Download the nacl_sdk. (https://developer.chrome.com/native-client/devguide/tutorial/tutorial-part1)
2) cd nacl_sdk; ./naclsdk //will show all the possible commands
3) ./naclsdk update will install/update the nacl_sdk.
4) ls will show that you have installed folders. You will find pepper_xx where xx denotes the version of pepper installed.
5) cd getting_started; make serve; will start the httpd server on loalhost:5103. Then you can see the live demo of nacl yourself. I wont go into details .nmf file which directs wheather to fetch pnacl or .nexe binary. These binary are created using nacl snadboxing and not normal binaries(elf or exe).
You can also try cd pepper_xx/examples; make serve. This will also start the server at 5103 and you can try and debug the demo

Now to install ffmpeg nacl port, we need to install depot_tools first.

6) Follow installation instructions for depot_tools from http://dev.chromium.org/developers/how-tos/install-depot-tools
7) Make sure that till this point your ~/.bashrc has these two changes:
export PATH=”$PATH”:/home/user/depot_tools
export NACL_SDK_ROOT=/home/user/nacl_sdk/pepper_xx

8) Now to install naclports follow instructions given at http://code.google.com/p/naclports/wiki/HowTo_Checkout.
(I just used git clone instead gclient config…But thats not important)

9) Its very important to read README.rst once before starting compilation of naclports.
For my case I used, NACL_ARCH=pnacl make ffmpeg.
This will download and compile ffmpeg which has been. The version of ffmpeg compiled can be looked upon at http://code.google.com/p/naclports/wiki/PortList.

References:
https://developer.chrome.com/native-client

Why to use size_t in C ?

In short, This is useful for portability.

Generally on many platforms, size_t = unsigned int, But on some datamodels/platform like 64 bit platforms, unsigned int = 4 bytes and size_t = 8 bytes. This is due to data model used by particular platform. (http://en.wikipedia.org/wiki/64-bit#64-bit_data_models)

So instead of using unsigned int for any kind of length input/output, we can always use size_t and that inadvertently gives us the portability. It provides scope for compiler optimizations and also gives the maximum possible positive value possible on that platform.

In linux kernel also SIZE_MAX of any platform is equal to max value of size_t.
#define SIZE_MAX (~(size_t)0)
(http://lxr.free-electrons.com/source/include/linux/kernel.h)

That is why, everywhere in clib where we need to deal with length, we find size_t. For instance:
void *malloc(size_t length).
void *memcpy(void *dest, const void *src, size_t length)
strncpy…

References:
http://www.embedded.com/electronics-blogs/programming-pointers/4026076/Why-size-t-matters
http://msdn.microsoft.com/en-us/library/3b2e7499.aspx
http://stackoverflow.com/questions/918787/whats-sizeofsize-t-on-32-bit-vs-the-various-64-bit-data-models
http://stackoverflow.com/questions/131803/unsigned-int-vs-size-t

Map Reduce :: Algorithm for finding max number.

MAPPER PHASE::
Private int maxNumber = 0;
Map(record) {
if(record > maxNumber)
maxNumber = record.
//Nothing to emit
}

cleanup() {
emit(maxNumber, NULL)
}

So the total numbers that will be emitted are equal to the number of mapper or in other words, equal to the number of splits of the input.

Now define the custom key comparator so as to force map reduce to sort key in descending order.

REDUCE PHASE::
Private haveEmitted = 0;
Reducer(key, listof) {
//Because the first key is the max number
if(haveEmitted == 0)  emit(key, NULL)
}

OPTIMIZATION::
We can instead use global counter MR job and set the counter in cleanup() function in mapper phase.

cleanup() {
if(current value of counter < maxNumber)
set the counter = maxNumber.
}

The synchronization on access of global counter is maintained by MR framework.
So at the end we can print the counter and it will have the max number. Using this way we can save the cost of creation of one reducer and associated shuffling and sorting.