PHP TSRM Explanation – PART 1:

PHP Lifecycle:

PHP boots and loads all the required extensions by calling the PHP_MINIT function defined in each extension.

Request Processing starts:
Call the PHP_RINIT function defined in each extension.
Then compile and execute the requested script.
At the end of the request, call the PHP_RSHUTDOWN function of each extension.
Request Processing ends.

Request Processing starts:
…..
Request Processing ends.

Request Processing starts:
…..
Request Processing ends.
PHP terminates by calling the PHP_MSHUTDOWN function defined in each loaded extension.

To make this easy to picture, every extension loaded into PHP goes through the phases below:
PHP_MINIT
           PHP_RINIT
           PHP_RSHUTDOWN

           PHP_RINIT
           PHP_RSHUTDOWN

           PHP_RINIT
           PHP_RSHUTDOWN
PHP_MSHUTDOWN
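For a concrete picture, here is a minimal sketch of how these hooks are wired up in an extension's source, using the standard Zend macros (the extension name "myext" and the version string are placeholders of mine):

#include "php.h"

PHP_MINIT_FUNCTION(myext)     { /* runs once, at module startup */       return SUCCESS; }
PHP_RINIT_FUNCTION(myext)     { /* runs at the start of every request */ return SUCCESS; }
PHP_RSHUTDOWN_FUNCTION(myext) { /* runs at the end of every request */   return SUCCESS; }
PHP_MSHUTDOWN_FUNCTION(myext) { /* runs once, at module shutdown */      return SUCCESS; }

zend_module_entry myext_module_entry = {
    STANDARD_MODULE_HEADER,
    "myext",
    NULL,              /* no PHP functions exported */
    PHP_MINIT(myext),
    PHP_MSHUTDOWN(myext),
    PHP_RINIT(myext),
    PHP_RSHUTDOWN(myext),
    NULL,              /* no phpinfo() handler */
    "0.1",
    STANDARD_MODULE_PROPERTIES
};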

If requests came in sequentially, we would not have to ponder threading-related issues too much. But in reality, requests come in parallel.
It is interesting how PHP handles threading.

Some thoughts about threading models:

For a moment, let's forget about PHP and discuss threading in general.

The two cases below cover almost all threading scenarios.

Assume that Thread1 and Thread2 are running in parallel. You can extend these models to N threads.

CASE1::PARALLEL SYSTEM::Simplest threading scenario::No shared resources:
Thread1 starts
Allocates private data on which the thread will work
Reads/updates the private data
Frees the private data
Thread1 ends

Thread2 starts
Allocates private data on which the thread will work
Reads/updates the private data
Frees the private data
Thread2 ends

CASE2::CONCURRENT SYSTEM::Complex threading scenario::Shared resources:
Thread1 starts
Allocates private data on which the thread will work
MUTEX_LOCK
Reads/updates the shared data
MUTEX_UNLOCK
Frees the private data
Thread1 ends

Thread2 starts
Allocates private data on which the thread will work
MUTEX_LOCK
Reads/updates the shared data
MUTEX_UNLOCK
Frees the private data
Thread2 ends
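As a concrete illustration of CASE2, here is a minimal pthreads sketch (my own example; a shared counter stands in for the shared resource):

#include <pthread.h>
#include <stdio.h>

static long shared_counter = 0;                       /* the shared resource */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    int private_data = *(int *)arg;                   /* per-thread private data */

    pthread_mutex_lock(&lock);                        /* MUTEX_LOCK */
    shared_counter += private_data;                   /* read/update shared data */
    pthread_mutex_unlock(&lock);                      /* MUTEX_UNLOCK */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    int a = 1, b = 2;

    pthread_create(&t1, NULL, worker, &a);
    pthread_create(&t2, NULL, worker, &b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("shared_counter = %ld\n", shared_counter); /* always 3 */
    return 0;
}

Without the mutex, the two updates could interleave and lose one of the increments; that is exactly the class of bug CASE1 avoids by never sharing data.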

Threading model in PHP:

Now getting back to PHP, we are a bit lucky: the threading model we require in PHP is CASE1, because PHP does not want any sharing of private data between two parallel PHP requests.

The mechanism PHP uses to achieve this thread safety is called the TSRM model.
TSRM = Thread Safe Resource Manager.

I guess that if we dissect each word of TSRM, it will give us a superficial idea of the approach used inside PHP.
Let's analyse the expanded form of TSRM, keeping CASE1 of the threading model defined above in mind.
Private data and resource mean the same thing here, so:
TSRM == Thread Safe Resource Manager == Thread Safe Private Data Manager.

Let's forget what PHP does and try to design our own so-called TSRM.
What we need is just a hash, where
key = thread id of the PHP request
value = resource/private data belonging to that PHP request.

The hash needs to be global so that all the PHP request threads can access it.
That means it is a shared resource. So, according to CASE2 of the threading model defined above, we need a mutex whenever any thread writes to or updates this global hash.
Reads do not need locks (in most cases).

And that is essentially what the PHP TSRM model is.
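To make the design concrete, here is a toy sketch of such a manager (my own illustration, not the real TSRM.c; a fixed-size array stands in for the hash):

#include <pthread.h>
#include <stdlib.h>

#define MAX_THREADS 128

typedef struct {
    pthread_t tid;      /* key: thread id of the PHP request */
    void     *resource; /* value: private data of that request */
} tsrm_entry;

static tsrm_entry g_table[MAX_THREADS]; /* the global "hash" */
static int g_count = 0;
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;

/* Return the calling thread's private resource, allocating a slot on first use. */
static void *tsrm_fetch(size_t resource_size)
{
    pthread_t self = pthread_self();
    void *res = NULL;

    pthread_mutex_lock(&g_lock); /* writes to the global table need the lock */
    for (int i = 0; i < g_count; i++) {
        if (pthread_equal(g_table[i].tid, self)) {
            res = g_table[i].resource;
            break;
        }
    }
    if (res == NULL && g_count < MAX_THREADS) {
        res = calloc(1, resource_size);
        g_table[g_count].tid = self;
        g_table[g_count].resource = res;
        g_count++;
    }
    pthread_mutex_unlock(&g_lock);
    return res;
}

Each thread only ever touches its own resource, so once its slot exists, the threads behave exactly like CASE1.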

In the next section, I will dive deep into the source code and try to explain TSRM.c.

Quick Tutorial to understand the function call stack frame using C code and GDB.

All the explanation below is with respect to the C code in test.c. Assume sizeof(int) = 4, registers are 8 bytes wide, and the address space is theoretically 2^64.

test.c:

#include <stdio.h>

int func(int a, int b)
{
    int h = 10;
    int c = h + a + b;

    return c;
}

int main()
{
    int t = 1;
    int t1 = 2;
    int d = func((int)0xdeadbeef, (int)0xdeedbeef);

    return 0;
}

Compiling test.c:
gcc -ggdb -O0 -o mybin ./test.c

How stack works:
1) The stack grows from HIGH addresses to LOW addresses.
2) If main calls func(int a, int b), then the call stack looks like:

HIGH_ADDR
4 bytes  param1 of func (depending on the compiler, params may go directly into registers and not on the stack)
4 bytes  param2 of func
8 bytes  address of the instruction in main just after the call to func (the saved $rip)
8 bytes  saved base pointer of main (after pushing the old $rbp, $rbp = $rsp)
4 bytes  local var1 of func
4 bytes  local var2 of func
….
….
LOW_ADDR

Next, we will debug mybin and try to understand the call stack frame.

Let’s debug the code in GDB.

>gdb mybin   

Breakpoint 1, main () at ./test.c:15
(gdb) p $rbp //$rbp register value inside main
$1 = (void *) 0x7fff2f54e590
(gdb) p $rsp //stack pointer value just before calling func
$2 = (void *) 0x7fff2f54e580
(gdb) c
Continuing.

Breakpoint 2, func (a=-559038737, b=-554844433) at ./test.c:5
(gdb) p $rsp //stack pointer value after entering func
$3 = (void *) 0x7fff2f54e570
(gdb) p 0x7fff2f54e570 - 0x7fff2f54e580 //subtract the new stack pointer (inside func) from the old stack pointer (in main, just before the call)
$4 = -16 //Only 16 bytes were pushed: the return address (the $rip value, i.e. the address of the next instruction in main after the call) and the old $rbp. The input args of func were not pushed on the stack; they were passed in registers.
(gdb) p $rbp //$rbp = $rsp, so value at address $rbp is the value of old $rbp
$5 = (void *) 0x7fff2f54e570
(gdb) x/8b $rbp //Print value stored at memory $rbp and we get the value = old $rbp. See above $1
0x7fff2f54e570: 0x90 0xe5 0x54 0x2f 0xff 0x7f 0x00 0x00
(gdb) x/16b $rbp //Print the old $rbp plus the 8 bytes above it on the stack: the return address, i.e. the instruction in main that will execute just after the call to func
0x7fff2f54e570: 0x90 0xe5 0x54 0x2f 0xff 0x7f 0x00 0x00
0x7fff2f54e578: 0x6f 0x04 0x40 0x00 0x00 0x00 0x00 0x00
Breakpoint 3 at 0x400445: file ./test.c, line 8.
(gdb) c
Continuing.

Breakpoint 3, func (a=-559038737, b=-554844433) at ./test.c:8
(gdb) x/4b $rbp-8  //value of h (the compiler placed h at $rbp-0x8; see the disassembly of func below)
0x7fff2f54e568: 0x0a 0x00 0x00 0x00
(gdb) x/4b $rbp-4  //value of c (at $rbp-0x4)
0x7fff2f54e56c: 0xe8 0x7d 0x9b 0xbd

(gdb) disassemble main
Dump of assembler code for function main:
0x000000000040044a <main+0>: push rbp
0x000000000040044b <main+1>: mov rbp,rsp
0x000000000040044e <main+4>: sub rsp,0x10
0x0000000000400452 <main+8>: mov DWORD PTR [rbp-0xc],0x1
0x0000000000400459 <main+15>: mov DWORD PTR [rbp-0x8],0x2
0x0000000000400460 <main+22>: mov esi,0xdeedbeef
0x0000000000400465 <main+27>: mov edi,0xdeadbeef
0x000000000040046a <main+32>: call 0x400428 <func>
0x000000000040046f <main+37>: mov DWORD PTR [rbp-0x4],eax
0x0000000000400472 <main+40>: mov eax,0x0
0x0000000000400477 <main+45>: leave
0x0000000000400478 <main+46>: ret
End of assembler dump.
(gdb) disassemble func
Dump of assembler code for function func:
0x0000000000400428 <func+0>: push rbp
0x0000000000400429 <func+1>: mov rbp,rsp
0x000000000040042c <func+4>: mov DWORD PTR [rbp-0x14],edi
0x000000000040042f <func+7>: mov DWORD PTR [rbp-0x18],esi
0x0000000000400432 <func+10>: mov DWORD PTR [rbp-0x8],0xa
0x0000000000400439 <func+17>: mov eax,DWORD PTR [rbp-0x14]
0x000000000040043c <func+20>: add eax,DWORD PTR [rbp-0x8]
0x000000000040043f <func+23>: add eax,DWORD PTR [rbp-0x18]
0x0000000000400442 <func+26>: mov DWORD PTR [rbp-0x4],eax
0x0000000000400445 <func+29>: mov eax,DWORD PTR [rbp-0x4]
0x0000000000400448 <func+32>: leave
0x0000000000400449 <func+33>: ret
End of assembler dump.
(gdb)

Let's try to understand, assuming our stack starts at address 31.
//Assumption: params are 4 bytes each

StartAddr  Data            EndAddr  GDB command inside func
31         0xdeadbeef      28       x/4b $rbp+20  //in our binary this param went in a register, not on the stack
27         0xdeedbeef      24       x/4b $rbp+16  //same: passed in a register
23         return address  16       x/8b $rbp+8
15         old rbp         8        x/8b $rbp
[At this point $sp = 8, so $bp = 8.]
7          c               4        x/4b $rbp-4
3          h               0        x/4b $rbp-8

(The compiler is free to order locals as it likes; in our binary it placed c at $rbp-0x4 and h at $rbp-0x8, and copied the register-passed params down to $rbp-0x14 and $rbp-0x18, as the disassembly of func shows.)


Algorithms inside Linux Kernel

A new CS student is often confused about what can be achieved by learning data structures and algorithms. So I am just pointing to a couple of pages that can give strong motivation for learning them.

There is a beautiful answer at cstheory.stackexchange.com: http://cstheory.stackexchange.com/questions/19759/core-algorithms-deployed/19773#19773

And that page has been well summarized at http://luisbg.blogalia.com/historias/74062

What happens after you have written a high level language program in your IDE

I found two very handy articles explaining what happens after you are done writing a program in your IDE.

These articles can be very useful for getting a better understanding of what goes on behind the scenes, especially for college students. These were the questions I had when I started programming, and I never got nice answers :(.

These articles inspect what happens at the following stages of your program:
Compiling
Linking
Loading
Running

The best part is that you have a wide list of tools to inspect all the internal details, and these articles throw some good light on their usage.
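For a quick taste of those stages, you can ask gcc to stop after each one yourself (hello.c is a placeholder file name):

gcc -E hello.c -o hello.i   # preprocess only
gcc -S hello.i -o hello.s   # compile to assembly
gcc -c hello.s -o hello.o   # assemble into an object file
gcc hello.o -o hello        # link into an executable
nm hello.o                  # inspect symbols in the object file
ldd hello                   # list shared libraries resolved at load time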

http://www.cs.swarthmore.edu/~newhall/unixhelp/compilecycle.html

http://www.lurklurk.org/linkers/linkers.html

I enjoyed reading them and I hope you will also enjoy them.

How to compile naclports (ffmpeg)

1) Download the nacl_sdk. (https://developer.chrome.com/native-client/devguide/tutorial/tutorial-part1)
2) cd nacl_sdk; ./naclsdk //will show all the possible commands
3) ./naclsdk update will install/update the nacl_sdk.
4) ls will show the installed folders. You will find pepper_xx, where xx denotes the version of pepper installed.
5) cd getting_started; make serve; will start the httpd server on localhost:5103. Then you can see the live demo of NaCl yourself. I won't go into details of the .nmf file, which directs whether to fetch the pnacl or the .nexe binary. These binaries are created using NaCl sandboxing and are not normal binaries (ELF or EXE).
You can also try cd pepper_xx/examples; make serve. This will also start the server at 5103, and you can try and debug the demos.

Now, to install the ffmpeg NaCl port, we need to install depot_tools first.

6) Follow the installation instructions for depot_tools from http://dev.chromium.org/developers/how-tos/install-depot-tools
7) Make sure that by this point your ~/.bashrc has these two changes:
export PATH="$PATH":/home/user/depot_tools
export NACL_SDK_ROOT=/home/user/nacl_sdk/pepper_xx

8) Now, to install naclports, follow the instructions given at http://code.google.com/p/naclports/wiki/HowTo_Checkout.
(I just used git clone instead of gclient config… but that's not important.)

9) It's very important to read README.rst once before starting compilation of naclports.
In my case I used: NACL_ARCH=pnacl make ffmpeg
This will download and compile ffmpeg. The version of ffmpeg compiled can be looked up at http://code.google.com/p/naclports/wiki/PortList.

References:
https://developer.chrome.com/native-client

Why use size_t in C?

In short, size_t is useful for portability.

On many platforms size_t == unsigned int, but on some data models/platforms, such as 64-bit platforms, unsigned int is 4 bytes while size_t is 8 bytes. This depends on the data model used by the particular platform. (http://en.wikipedia.org/wiki/64-bit#64-bit_data_models)

So instead of using unsigned int for any kind of length input/output, we can always use size_t, and that automatically gives us portability. It also provides scope for compiler optimizations and represents the maximum possible object size on that platform.

In the Linux kernel too, SIZE_MAX on any platform is equal to the maximum value of size_t:
#define SIZE_MAX (~(size_t)0)
(http://lxr.free-electrons.com/source/include/linux/kernel.h)

That is why, everywhere in the C library where we need to deal with a length, we find size_t. For instance:
void *malloc(size_t size);
void *memcpy(void *dest, const void *src, size_t n);
char *strncpy(char *dest, const char *src, size_t n);
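A quick way to check this on your own machine (my own snippet; a typical 64-bit Linux box prints 4 and 8):

#include <stdio.h>
#include <stddef.h>

int main(void)
{
    printf("sizeof(unsigned int) = %zu\n", sizeof(unsigned int));
    printf("sizeof(size_t)       = %zu\n", sizeof(size_t));
    printf("max size_t           = %zu\n", (size_t)-1); /* same value as ~(size_t)0 */
    return 0;
}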

References:
http://www.embedded.com/electronics-blogs/programming-pointers/4026076/Why-size-t-matters
http://msdn.microsoft.com/en-us/library/3b2e7499.aspx
http://stackoverflow.com/questions/918787/whats-sizeofsize-t-on-32-bit-vs-the-various-64-bit-data-models
http://stackoverflow.com/questions/131803/unsigned-int-vs-size-t

Map Reduce :: Algorithm for finding the max number.

MAPPER PHASE::
private int maxNumber = Integer.MIN_VALUE; // so negative inputs also work

map(record) {
    if (record > maxNumber)
        maxNumber = record;
    // Nothing to emit per record
}

cleanup() {
    emit(maxNumber, NULL);
}

So the total count of numbers emitted is equal to the number of mappers, or in other words, equal to the number of splits of the input.

Now define a custom key comparator to force MapReduce to sort the keys in descending order.

REDUCE PHASE::
private int haveEmitted = 0;

reduce(key, values) {
    // The first key is the max number, since keys arrive in descending order
    if (haveEmitted == 0) {
        emit(key, NULL);
        haveEmitted = 1;
    }
}

OPTIMIZATION::
We can instead use a global counter of the MR job and set it in the cleanup() function of the mapper phase:

cleanup() {
    if (current value of counter < maxNumber)
        set the counter to maxNumber;
}

Synchronization on access to the global counter is maintained by the MR framework.
So at the end we can print the counter, and it will hold the max number. This way we save the cost of creating one reducer and the associated shuffling and sorting.

Read input from a memory buffer in ffmpeg.

The aim is to read input from a memory buffer instead of a file, pipe, etc. in ffmpeg.

1) Do the normal configure and make. This will generate config.h with all the macros, depending upon the choices you made when configuring ffmpeg.

2) Add #define CONFIG_MEMBUF_PROTOCOL 1 at the end of config.h (but just before #endif /* FFMPEG_CONFIG_H */).

3) Add REGISTER_PROTOCOL(MEMBUF, membuf); anywhere in libavformat/allformats.c.

4) Add the code snippet below in libavformat/file.c, just below the line #endif /* CONFIG_PIPE_PROTOCOL */:

#if CONFIG_MEMBUF_PROTOCOL

typedef struct MemContext {
    const AVClass *class;
    char *membuf;      /* pointer to the caller-supplied input buffer */
    int   membuf_size; /* total size of the buffer, set via -membuf_size */
    int   curr_pos;    /* current read offset into the buffer */
} MemContext;

static const AVOption membuf_options[] = {
    { "membuf_size", "set size of input buffer", offsetof(MemContext, membuf_size), AV_OPT_TYPE_INT, { .i64 = INT_MAX }, 1, INT_MAX, AV_OPT_FLAG_ENCODING_PARAM },
    { NULL }
};

static const AVClass membuf_class = {
    .class_name = "membuf",
    .item_name  = av_default_item_name,
    .option     = membuf_options,
    .version    = LIBAVUTIL_VERSION_INT,
};

/* Parse "membuf:<hex address>" and remember the buffer pointer. */
static int mem_open(URLContext *h, const char *filename, int flags)
{
    MemContext *c = h->priv_data;
    const char *ptr = NULL;
    long long int mem_addr = 0;

    av_strstart(filename, "membuf:", &ptr);

    if (ptr != NULL) {
        mem_addr = strtoll(ptr, NULL, 16);
        if (mem_addr != 0) {
            c->membuf   = (char *)(intptr_t)mem_addr;
            c->curr_pos = 0;
            return 0;
        }
    }
    return -1;
}

static int mem_read(URLContext *h, unsigned char *buf, int size)
{
    MemContext *c = h->priv_data;

    /* Clamp the read so we never run past the end of the buffer. */
    if ((c->curr_pos + size) > c->membuf_size)
        size = c->membuf_size - c->curr_pos;
    if (size <= 0)
        return 0; /* end of buffer */

    memcpy(buf, c->membuf + c->curr_pos, size);
    c->curr_pos += size;
    return size;
}

/* For now we don't need writing. */
static int mem_write(URLContext *h, const unsigned char *buf, int size)
{
    return 0;
}

static int64_t mem_seek(URLContext *h, int64_t pos, int whence)
{
    MemContext *c = h->priv_data;

    if (whence == AVSEEK_SIZE)
        return c->membuf_size;

    if (whence == SEEK_SET)
        c->curr_pos = pos;
    else if (whence == SEEK_CUR)
        c->curr_pos += pos;
    else if (whence == SEEK_END)
        c->curr_pos = c->membuf_size + pos;
    else {
        errno = EINVAL;
        return -1;
    }

    return c->curr_pos;
}

static int mem_get_handle(URLContext *h)
{
    return 0; /* no underlying file descriptor */
}

static int mem_check(URLContext *h, int mask)
{
    return 0;
}

static int mem_close(URLContext *h)
{
    return 0;
}

URLProtocol ff_membuf_protocol = {
    .name                = "membuf",
    .url_open            = mem_open,
    .url_read            = mem_read,
    .url_write           = mem_write,
    .url_seek            = mem_seek,
    .url_close           = mem_close,
    .url_get_file_handle = mem_get_handle,
    .url_check           = mem_check,
    .priv_data_size      = sizeof(MemContext),
    .priv_data_class     = &membuf_class,
};

#endif /* CONFIG_MEMBUF_PROTOCOL */

5) The sample command line for the above code should look something like this:

ffmpeg -membuf_size size_of_membuf -i membuf:address_of_input_buffer output.mp4

6) Modification in ffmpeg.c to test the above changes:

Rename main to ffmpeg_main and define your own main function, which reads the file from disk into memory; the buffer pointer is then passed on to ffmpeg as the input parameter.


int ffmpeg_main(int argc, char **argv);

int main()
{
    char ffmpegCmd[2048] = {0};
    char *argv[100] = {0};
    int size = 0;
    int read = 0;
    char *buf = NULL;

    /* Read the whole file into memory. */
    {
        FILE *fp = fopen("/home/user/Desktop/1.mp4", "rb");

        if (fp == NULL)
            return 0;

        fseek(fp, 0, SEEK_END);
        size = ftell(fp);
        fseek(fp, 0, SEEK_SET);

        buf = (char *)av_malloc(size);
        read = fread(buf, 1, size, fp);
        if (read != size)
            return 0;

        fclose(fp);
    }

    /* Pass the buffer address to our membuf protocol as a hex string. */
    snprintf(ffmpegCmd, 2048,
             "ffmpeg -membuf_size %d -i membuf:0x%llx /home/user/Desktop/1.avi",
             size, (unsigned long long)(uintptr_t)buf);

    /* Split the command line into argv[] tokens. */
    char *token = strtok(ffmpegCmd, " ");
    int argc = 0;
    while ((argc < 100) && (token != NULL)) {
        argv[argc] = (char *)av_malloc(2048);
        snprintf(argv[argc], 2048, "%s", token);
        argc++;
        token = strtok(NULL, " ");
    }

    ffmpeg_main(argc, argv);

    /* Free the argv copies and the input buffer. */
    while (argc > 0) {
        argc--;
        av_free(argv[argc]);
    }
    av_free(buf);
    return 0;
}

Some details about STUN Protocol

STUN

EXCERPTS FROM http://www.voip-info.org/wiki/view/STUN

  • STUN enables a device sitting inside a private network to find its public IP address and the type of NAT (private network) service it is sitting behind. It is the backbone of any P2P (peer-to-peer) network.
  • STUN operates on TCP and UDP port 3478.
  • STUN may use DNS SRV records to find STUN servers attached to a domain. The service name is _stun._udp or _stun._tcp.

Working
The STUN client sends a STUN REQUEST to a public STUN server, and the server replies with a STUN RESPONSE. The STUN response includes the IP and port of the public router from which the STUN request was propagated to the public STUN server.

Various types of NAT

  • Full Cone: A full cone NAT is one where all requests from the same internal IP address and port are mapped to the same external IP address and port. Furthermore, any external host can send a packet to the internal host, by sending a packet to the mapped external address.
  • Restricted Cone: A restricted cone NAT is one where all requests from the same internal IP address and port are mapped to the same external IP address and port. Unlike a full cone NAT, an external host (with IP address X) can send a packet to the internal host only if the internal host had previously sent a packet to IP address X.
  • Port Restricted Cone: A port restricted cone NAT is like a restricted cone NAT, but the restriction includes port numbers. Specifically, an external host can send a packet, with source IP address X and source port P, to the internal host only if the internal host had previously sent a packet to IP address X and port P.
  • Symmetric: A symmetric NAT is one where all requests from the same internal IP address and port, to a specific destination IP address and port, are mapped to the same external IP address and port. If the same host sends a packet with the same source address and port, but to a different destination, a different mapping is used. Furthermore, only the external host that receives a packet can send a UDP packet back to the internal host.

By default, all the connections that our NAT device (router) makes are symmetric. The internal ip:port to external ip:port mapping is short-lived. That's why the external ip:port visible to some other external device is not eternal; the mapping will close when there is no data moving through this external ip:port to the internal ip:port, or vice versa.

PRACTICAL USAGE OF STUN:

This protocol is the backbone of all P2P-based software.

How does this work?

Both parties trying to establish contact need to know each other's details. The details needed are:

* The ip:port of the other side + some extra info like codecs, rtpmap, etc.

So each side uses the STUN protocol to get its corresponding public ip and port. Then the application sends this information to the other involved party. The other party does the same thing.

Now each end has the information about the other side. The only thing left is to send UDP packets (sendto) to the address and port that the application received.
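For the curious, here is a bare-bones sketch of the first step: sending an RFC 5389 Binding Request over UDP (my own illustration; stun.example.org is a placeholder server, and error handling is omitted):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>

int main(void)
{
    unsigned char req[20] = {0}, resp[512];
    struct addrinfo hints = {0}, *srv;
    int fd;

    req[0] = 0x00; req[1] = 0x01;      /* message type: Binding Request */
                                       /* bytes 2-3: message length = 0 (no attributes) */
    req[4] = 0x21; req[5] = 0x12;      /* magic cookie 0x2112A442 (RFC 5389) */
    req[6] = 0xA4; req[7] = 0x42;
    for (int i = 8; i < 20; i++)       /* 96-bit transaction ID */
        req[i] = rand() & 0xff;

    hints.ai_family   = AF_INET;
    hints.ai_socktype = SOCK_DGRAM;
    if (getaddrinfo("stun.example.org", "3478", &hints, &srv) != 0)
        return 1;

    fd = socket(srv->ai_family, srv->ai_socktype, 0);
    sendto(fd, req, sizeof(req), 0, srv->ai_addr, srv->ai_addrlen);

    ssize_t n = recvfrom(fd, resp, sizeof(resp), 0, NULL, NULL);
    printf("got %zd bytes of STUN response\n", n);
    /* The XOR-MAPPED-ADDRESS attribute (type 0x0020) in the response carries
       our public ip:port, XORed with the magic cookie. */

    freeaddrinfo(srv);
    close(fd);
    return 0;
}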

* A peek into STUN. [RFC 5389 (current as of October 2008)]

Note: XOR_MAPPED_ADDRESS support on STUN Servers

Some home routers (namely Linksys routers whose firmware is based on the Linux kernel) have a tendency to alter the STUN reply packets from the STUN server. It changes the MAPPED_ADDRESS from the public IP address derived by the server to the IP address of the router’s WAN port. If the router’s WAN port is not assigned a public IP address (as in the case of Internet Service Providers like AT&T Uverse), then the application using STUN to discover its public IP address will get the wrong info.

STUN provides a work-around to this problem via XOR_MAPPED_ADDRESS. A STUN client can request an XOR_MAPPED_ADDRESS as well as the standard MAPPED_ADDRESS. While the router may alter the MAPPED_ADDRESS, it shouldn’t change the XOR_MAPPED_ADDRESS.

Unfortunately, not all STUN servers support XOR_MAPPED_ADDRESS. The public STUN servers listed on this wiki have been updated with info about lack of support for XOR_MAPPED_ADDRESS.

* The STUN protocol is also used in ICE, to check whether the other side (the UDP data receiver) is alive or not. Visit http://en.wikipedia.org/wiki/Interactive_Connectivity_Establishment for details.

IETF Specifications

  • RFC 5389: Session Traversal Utilities for NAT (STUN).
  • RFC 5766: Traversal Using Relays around NAT (TURN): Relay Extensions to STUN.
  • RFC 5245: Interactive Connectivity Establishment (ICE): A Protocol for NAT Traversal for Offer/Answer Protocols.

What lies ahead of Moore's law?

Introduction

            Moore's law is not a scientific law. Rather, it was an observation of the development trend in the semiconductor industry. In 1965, Dr. Gordon Moore predicted that the number of transistors on a computer chip would double every year. In 1975, Moore reconsidered his prediction and revised the rate of doubling to roughly every two years. This observation was implicitly linked to things as disparate as microprocessor clock frequency, microprocessor power consumption, general computer-system throughput, and disk storage capacity. The applicability of Moore's statement in almost all realms of the semiconductor industry propelled this observation into a law that can predict trends and help in laying out the roadmap for future research and business planning.
For the last 45 years, the industry has been successful in meeting the prediction of Moore's law. Today, Intel's 8-core Xeon Nehalem-EX has 2 billion transistors at a 45 nm feature size, and the industry has recently pushed that down to almost 32 nm, compared to 60 transistors on a chip in 1965. These developments led to a drastic enhancement in computational speed, which in turn led to several paradigm shifts: from hardware to software, the evolution of the Internet, and the popularity of mobile phones.
But miniaturization cannot be a tool to boost chip speeds forever. There are several hindrances:
1) Heating issue – Excessive heat dissipation can destroy the silicon wafer.
2) Lithographic issue – Excessive doping can be disastrous, and how does one carve out a transistor only a few nanometers in size?
3) Financial issue – Packaging cost has increased drastically at such small transistor sizes.
4) Fundamental limit – If we start hitting layers and components that are 5 atoms thick, the Heisenberg uncertainty principle starts to kick in and we would no longer know where the electron is. Most likely, the electrons on such a small transistor would leak out, causing the circuit to short.
So the demise of Moore's law is inevitable. It can't be sustained for more than a decade. This raises the question of the advent of new technology (or technologies) that can carry the baton of computational speed enhancement, which till now has been in the hands of the silicon chip.
Areas of interest of current research
A lot of research is going on in diverse fields in a bid to find a suitable successor to silicon. Popular areas of current research are quantum computing, parallel computing, genome computing, optical computing, and neuron computing.

Quantum Computing
Belief in quantum computing got catapulted in 1994, when Shor published his paper on factoring numbers using a quantum computer. This led quantum computer scientists to believe that quantum computers can solve problems that are intractable for conventional computers: quantum computers work according to entirely different principles, and using those principles they can solve problems whose solutions will never be feasible on a conventional computer.
Quantum computers work by exploiting what is called "quantum parallelism". The idea is that a quantum computer can simultaneously explore many possible solutions to a problem, and with some checks it is possible to pick out the correct one.

Suppose you have an n-bit register. For a conventional computer, you can always define a particular state by a single number. But you need 2^n numbers to define a state of a quantum computer. When these states are acted upon by quantum gates, they produce new lists of exponentially many states, and this application of quantum gates is what defines a quantum algorithm.
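In standard notation (my addition, not part of the original essay), the state of an n-qubit register is a superposition over all 2^n basis states:

\[
|\psi\rangle = \sum_{x=0}^{2^n-1} \alpha_x \, |x\rangle,
\qquad \sum_{x=0}^{2^n-1} |\alpha_x|^2 = 1
\]

so describing the register takes 2^n complex amplitudes \alpha_x, whereas a classical n-bit register holds exactly one of the 2^n values x at a time.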

Parallel Computing
Parallel computing is a form of computation in which many calculations are carried out simultaneously. It works on the divide-and-conquer principle: large problems can often be divided into smaller ones, which are then solved in parallel. Parallelism is implemented at the bit level, the instruction level, and the data and task levels.
The physical constraints preventing further frequency scaling have made parallel computing the dominant paradigm in computer hardware, mainly in the form of multicore processors.
Multicore processors are only useful when programs have been written to use all the cores; otherwise CPU capacity is wasted. This is a huge paradigm shift for software programmers who are used to writing sequential programs. Moreover, parallel programs are more difficult to write than sequential ones, because concurrency introduces several new classes of potential software bugs, such as race conditions, and communication and synchronization between the different subtasks are typically among the greatest obstacles to good parallel program performance. Debugging is also a very critical issue.
Grid computing and cloud computing are forms of parallel computing in which the machines are distributed over the Internet or an intranet.

Challenges for informatics
Informatics deals with the processing, management, and retrieval of information. The huge generation of data from diverse fields like the Internet, biological research, and meteorological observations has been demanding state-of-the-art solutions. Informatics has to be ever evolving so as to stay in sync with the latest developments that promise to increase computational throughput. Quantum computation is still in the research stage, so introducing parallelism into existing solutions is the demand at present. It is very important to develop software solutions that use extensive multithreading. Developing new software along this line is no problem, but altering huge codebases that are sequential in nature is a huge task. It has become very important to have a new age of compilers that take away from the developer the pain of developing parallel solutions. We also need nice and easy frameworks that can easily turn sequential code into parallel code. Technical institutions should also prepare a new generation of engineers who approach a problem in parallel rather than sequentially. Solutions that are resource intensive should start using cloud computing and grid computing. The huge computations required in fields like bio-informatics and weather prediction should be able to reap the benefits of cloud computing.

 

Conclusion

            Moore's law had a tremendous run in changing the computational world once and for all. It served well to accelerate development economically and socially across the globe by replacing all conventional systems with computerized systems. In other words, our life is now ruled by microchips. But now the death of Moore's law is inevitable.

            The vision of future technology is still blurred, as all the work in quantum computing, genome computing, optical computing, and neuron computing is in the research stage. Parallel computing can only provide improvements over existing solutions, but it cannot be assumed to be the icebreaker. Moreover, parallelism is always restricted by the sequential portions of a problem (Amdahl's law).

            In my opinion, the convergence of quantum computing and neuron computing will be the ultimate solution, as these two are the bases of the evolution of the Universe. We need the ultimate speed of quantum computing supported by the artificial intelligence of humans.

            Jack Welch, former CEO of General Electric, once said, “When the rate of change inside an institution is less than the rate of change outside, the end is in sight.”

            So it has become utterly important to fuel up our institutions of excellence with funds and with the fervor of the Cold War era. We need to push scholars into the natural sciences and research rather than professional courses that just get you a good job.
The demand of the time is to change the conventional mindset to an evolutionary mindset. No one from the vacuum-tube industry was part of the silicon evolution, because they were bound by that way of thinking. So, in my view, thinking out of bounds is the crux that will take computational limits to a new zenith.

References
The above material is collated from web-based resources.