Tuesday, December 28, 2010

Fact Sheet for computer science

In this blog I will include all the facts about algorithms or data structures.

Insertion Sort vs Merge Sort
Which sort to use and when is important or must know practical limitations. It is not necessary to blindly implement the best algorithm always.
Normally in languages like Java or Python sort is implemented as Insertion Sort or Quick Sort as default system sort.
Merget Sort - O(n lg n) - when input is more than 2^12
Insertion Sort - O(n^2) - when input is less than 2^12

Binary Tree fanout :
Height of binary tree is calculated as
h <= lg{f} n , where f= fanouts possible at each level for each node. To redue disk reads fanout can be increased or can be adjusted at runtime depending upon number of nodes. Normally 1000 fanouts can contain 1 million nodes in the tree which is quite sufficient.

Use C wrapper for computation:
C is fast. But when. Why everyone is not using C then?
There are issues with C wrt memory management, reading files, DB connections, finding developers, learning curve, fast development (RAD). But still I agree that for computations C is the best. C can give 20X improvement when load is computations. In case when it is required to compute keys very fast (as in large systems, to calculate key for DHT) C wrapper can improve performance depending upon the load curve.

===============================================================================

Registers & CPU cache (1 nanosecond): These are small, expensive and very fast memory slots. Memory controllers try mightily to keep this space populated with the data the CPU needs. A cache miss means a 100X speed penalty. Even with a 95% hit rate, CPU cache misses waste half the time.

Main memory (10^2 nanoseconds): If your computer was an office, RAM would be the desk scattered with manuals and scraps of paper. The kernel is there, reserving Papal land-grant-sized chunks of memory for its own mysterious purposes. So are the programs that are either running or waiting to run, network packets you are receiving, data the kernel thinks it's going to need, and (if you want your program to run fast) your working set. RAM is hundreds of times slower than a register but still orders of magnitude faster than anything else. That's why server people go to such lengths to jam more and more RAM in.

Solid-state drive (10^5 nanoseconds): SSDs can greatly improve the performance of systems with working sets too large to fit into main memory. Being "only" one thousand times slower than RAM, solid-state devices can be used as ersatz memory. It will take a few more years for SSDs to replace magnetic disks. And then we'll have to rewrite software tuned for the RAM / magnetic gap and not for the new reality.

Magnetic disk (10^7 nanoseconds): Magnetic storage can handle large, contiguous streams of data very well. Random disk access is what kills performance. The latency gap between RAM and magnetic disks is so great that it's hard to overstate its importance. It's like the difference between having a dollar in your wallet and having your mom send you a dollar in the mail. The other important fact is that access time varies wildly. You can get at any part of RAM or SSD in about the same time, but a hard disk has a physical metal arm that swings around to reach the right part of the magnetic platter.

Network (10^6 to 10^9 nanoseconds): Other computers. Unless you control that computer too, and it's less than a hundred feet away, network calls should be a last resort.

Saturday, December 25, 2010

Google File System

Here is my understading of Google File System from Youtube video.

Google File System
- provides file system abstraction for scalable system
- Google File System has a Namenode
- Namenode works as master server

- Files are devided into 64 MB chunks (why? Becuase large systems have a very high read load. So 64MB chunks (which are pretty big) are necessary so reads will finish only when chunk is read completely which is regular case)
- Due to high frequency of hard disk failures data needs to be replicated so chunks are replicated
- The locations of the chunks are stored by Namenode and it fits totally in the memory

writing the files
- Coordinated by not executed by Namenode
- Namenode chooses the primary replica to do the writes
- For all the writes the primary replicas would be same
- Now for primary replica is leased for mutation to client i guess
- Namenode does the write ahead logging for each write
- Each chunk server sends back the the ack to primary replica server
- After each write is received at every chunk server primary replica server commands to execute writes
- Writes done and ack is send back to primary replica to client (not to namenode)

Appending Records at the end of the file:
- Record append functionality is provided
- Line can be appended at the end of the line

Deleting Files
- Deleting files works as lazy deletes
- Files are maked delete and the garbage collected
- Deletion is not considered as main operation

GFS also puts some responsibility to clients / programs. As appending records may create muliple reocords in the file and those records are sent back to client/program. The program now reads the file and sees multiple / stale records. Recognizes them and handles the descrepancy. Consider this as trade off between consistency and scalability.

For more information refer to this Google File System.

My Understanding of Network File System (NFS)

NFS
> Stands for Network File System
> Client Server Mechanism
> NFS to be hosted by one single server machine (not suitable for scaling)
> NFS works on TCP/IP layer.

Mounting NFS
> User boots up the machine and NFS is mounted as file system
> OS is required to do same file system calls and NFS mount will handle it

Interaction between client and server
> Client makes a request for a file and translated to NFS
> Client uses TCP/IP to communicate
> Reqeusted file is locked and sent back to client
> Client caches the copy of the file (or part of the file)
> The lock is a time base entity. Say for 5 mins lock will stay alive for client.
> Meanwhile accessing the file client can release the lock, can request lock on the file (to restart the time quota), on go down (NFS will time out and releases the lock)

Cache Consistency or Coherence of files
> As mentioned above the files are stored in the local cache (temporal locality)
> Now if client requires to use the same file, a request goes back to NFS checking if the cache is consistent with file on the server (caching the same copy)
> If the copy is updated then new version of file is fetched else client can use the cached copy of file

How to cache consistency is not 100% in NFS
> Client has already cached a file
> Client requests the file and cache is consistent with NFS copy
> While client is using the cached copy of file another client (client0) requests a lock on that file to NFS
> NFS allocates lock to client0.
> Client0 updates the file while client is reading the file
> So there are chances that client may read a stale version

For more information check NFS @ Wiki.

Friday, December 24, 2010

Tree traversals

Preparing for interviews always go through different tree traversals. Wikipedia has an excellent article over tree traversal.



The tree traversals are:
> InOrder (Sorted Output)
> PreOrder (Root first)
> PostOrder [Depth First Search](children first)
> LevelOrder(Top Level First)

InOrder Code:

void InOrder(Node root){
if(root.left != null) InOrder(root.left);
print (root.value);
if(root.right != null) InOrder(root.right);
}


PreOrder Code:

void PreOrder(Node root){
print(root.value);
if(root.left != null) PreOrder(root.left);
if(root.right != null) PreOrder(root.right);
}


PostOrder Code (Depth First Search):

void PostOrder(Node root){
if(root.left != null) PostOrder(root.left);
if(root.right != null) PosOrder(root.right);
print(root.value);
}


LevelOrder Code :

void LevelOrder(Node root){
// use queue to get level order traversal
// Queue provides two functions. 1. head() 2.add()
Queue q = new Queue();
q.add(root);

while(!q.isEmpty()){
root = q.head();

if(root.left != null) q.add(root.left);
if(root.right != null) q.add(root.right)l

print(roor.value);

}
}


Except Level Order Traversal remaining of the traversal methods can be rewritten using stacks. Remember you might need to push and pop elements in stack rather than calling the function again. Also sometimes visited flag is required to make sure function is not repeating the same values again.

Have a great fight !!

How to switch between remote host and localhost while SSHing?

How to switch between localhost and remotehost while sshing?

localhost$ssh -l username remotehost.com
-put your password here
remotehost$
-now you are connected to remote host
remotehost$~
-press enter, now remote host goes to background and localhost shows up
localhost$
-to open the session to remote host
localhost$jobs
[1]+ Stopped ssh -l username remotehost.com
-to go back use fg command
localhost$fg %1
ssh -l username remotehost.com
remotehost$
-now you are back to remotehost session

Wednesday, December 22, 2010

Installing Apache MySql Server and Apache on Ubuntu

Ubuntu gives beautiful functionality of installing Apapche http web server, MySql Server and PHP. It is sudo apt-get install.

To start before installation just update OSes package index.

$ sudo apt-get update


I was trying to install apache and it works fine. Here is what I did.

$ sudo apt-get install apache2

Any one have ever seen which lang Apache Web Server is written?
It is C. It made me worried because it does not have garbage collection.
Might not work if too much load is generated and resources are getting free is required.

Anyway lets get back to installation. To check Apache has been install properly to to web browser and go to http://localhost/
The text itself is self explanatory.

Now to install php :

$ sudo apt-get install php5 libapache2-mod-php5

As libapache2 has been installed to take affect restart Apache.

$ sudo /etc/init.d/restart

OK. So now php is up and running. How?

$ sudo gedit /var/www/phpinfo.php

Type in

It you see page which is not showing any error, PHP works. NICE !!

Now its time for installing MySql :
$ sudo apt-get install mysql-server mysql-client

This should finish the install. Remember the passwords you are installing here.

Now its time to check whether login and MySql is working fine.

$ mysql -u root -p

Here root is the user created. MySql service will ask for the password after pressing enter. And then you will see

> mysql
> exit

Now some more installation for PHP
$ sudo apt-get install libapache2-mod-auth-mysql php5-mysql phpmyadmin

Now this installation may or may not ask you to select Yes No type pop up box.
I got some but in this changing world if you don't get them, don't worry just move on.

If you do get such Yes No Pop ups then select by reading the text.

Next thing to do is to restart Apache2.

$ sudo /etc/init.d/restart

This is good enough for right now. If you run into any issue just Google it.

Have great day / night ahead.

Sunday, October 31, 2010

SSH to Virtual machine running as a service

Running virtual box as a service has many advantages however accessibility is limited. Ssh is very famous way to access virtual machines running ssh server.In this blog I will show how to setup ssh on Ubuntu as Guest running over Windows as a host and accessing Ubuntu resources through Windows machine using ssh.

On ubuntu:
$sudo apt-get install ssh

#this will prompt you sshserver. Say yes and install ssh / open-ssh server

$ifconfig

#This command will print your current IP configuration
#As virtual machine in virutal box runs as NAT usually, no static ip is attached to it.
#To ssh we will need a static IP attached to virtualbox. Lets do that.

$sudo vim /etc/network/interfaces

#This will open the file containing network configuration. Comment out existing text and write this:

iface eth0 inet static
address 192.168.10.50
netmask 255.255.255.0
gateway 192.168.1.254


#Now restart your virtual machine
$sudo shutdown -r now

Now on windows machine start putty and ssh to 192.168.10.50.

And this is it if you know your login userid and password for ubuntu.

References:
1) A bog which I dont remember anymore
2) VirtualBox
3) Ubuntu

Running Virtual Machine as Service (virtual box)

Oracle VM Virtual Box is a good utility to try different OSes and keep you host machine clutter free from development installations. Also makes sense to take a regular back up using rsync (file synchronization utility).

This blog post describes shortly how to run virtual box as a service.

To run virtual box as a service we have to start it from command line first and then add the batch / script to scheduled task (on windows) or a cron job ( on unix/linux).

Here are the steps:
1) go to c:\Program Files\Oracle\VirtualBox
2) execute "start /b /wait /low VBoxHeadless -startvm "virtual machine name"
3) press enter

This is it. Now you just need to schedule this execution and now you can run virtual machine in virtual box as a service.

To access virtual machine you can use ssh / remote rdp.

Reference:
http://forums.virtualbox.org/viewtopic.php?f=6&t=1887&start=15

http://www.virtualbox.org

Sunday, October 24, 2010

Optimizing JVM : tips and tricks

Looking at todays fast paced world performance management and optimization is necessary to stay competent. As working on Java currently, JVM optimization is my current research topic.

JVM works certainly good buy by all means can be improved. I read some suggestions here and will summarize them quickly.

Using JAVA_OPTS -Xmx and -Xms
- If JVM is invoking garbage collection too frequently, meaning that your app heap space is not enough. JVM is using garbage collection to free up some space for your app. Result is drastically reduced performance. -Xmx option can be used to increase the heap space.
- Use -Xms switch to enable heap space equal to maximum allocated memory (all your RAM).
- If garbage collection is using too much time of your CPU, use -Xincgc to garbage collect in phases rather than all in one shot.

Benchmarking suite for java

Java Grande, an initiative to promote Java for grande applications. Grande applications are apps which require lots of memory, bandwidth and processing power.

For more information visit JavaG Benchmarking

Saturday, October 23, 2010

Optimize query or filter data on App side

Some days ago I ran into problem where complexity of query became prominent. The query used to return 60000 rows after filters and has to compare them against "IN" block of query which again containing around 60000 keys in them. You can see it is an N^2 complexity.

So the first solution I ran into was to pass 60000 keys to the database. As Java being my (so called sophasticated) language that I am using, I have to use jdbc connectors. As jdbc connectors has limitation of passing only 1000 parameters, it is my problem to tackle with. At this point I was still hoping 60000^2 would be pretty fast on DB side. Coming back to parameters issue, I first used Query.setParameterList(Collection), but failed miserable.

Moving forward and identifying problem decided not to use Query.setParameterList(Collection). However still hoping 60000^2 would work once I can pass the parameters. Now to hack the jdbc a bit, I requested the query string first with a patters in it. The patter would be replaced by 60000 keys which is generated by java program. After replacing the string,query sent to database to do the comparison and I waited ........ Did not come back. ALAS.

An interesting discussion with my colleague suggested that processing data on java would be a better idea. I fastened my seat belts for this experiment. I did the same query but this time just returned all the rows to app side. App was comparing the results, a bit slow but much much faster than SQL query.

Sharing in the benefits of those who believe in getting things done .....

How to use setParameterList

JAVA Link

java.text.SimpleDateFormat

DateFormat dateFormat = new SimpleDateFormat("");

For 20th October, 2010 13:40:19.333, that is 20th October, 1:40 PM and 19 seconds with 333 seconds, the Java mapping would be.

yyyy = 2010
yy = 10 (instead of 2010 only 10 would be printed. Don't forget y2k problem)
MM - 10 (10th month of the georgean calendar)
dd - 20 (20th day of October month)
HH/hh - 01 (should print hour part of date format)
mm - 40 (minute part of the date part)
ss - 10 (seconds of the date part)
SSS - 333 (milliseconds part of the date format)

Usage:

Date d = new Date();
DateFormat datefromat = new SimpleDateFormat("yyyyMMdd HH:mm:ss SSS");
String dateAsString = dateformat.format(d);
System.out.println("The date in text format is:" + dateAsString);

Imports:

java.util.Date
java.text.SimpleDateFormat

Tuesday, June 29, 2010

I want to merge two word docs in following fashion. Doc1={1,2,..n}, Doc2={1,2,..m}.

The final doc should be Doc3 where Doc3_1 = Doc1_1 (first page of doc 1), Doc3_2 = Doc2_1(first page of doc 1),Doc3_3 = Doc2_2 and so on and so forth.

So all pages of Doc1 should be mapped as odd pages of Doc3 while Doc2 pages should be mapped as even pages of Doc3. Is there a way of doing it without writing macros?

Let me know if you come across the solution.

Sunday, June 27, 2010

Difference between /etc/hosts and /etc/networks

In many of linux based Operating Systems, /etc/hosts and /etc/networks both files exists. Both files are useful to give aliases to computer names.

/etc/host file contains alias for all the hosts. So if you have host named "tango", you can put an entry "10.2.3.1 tango", indicating the which IP to connect when OS is told to connect "tango". Moreover you can also specify multiple ips to perticular host.

/etc/networks is helpful to see if two different box names are referring to same ip. As an example "charlie 12.234.2.1 alice" tells OS network adapter that one ip refers to two box names.

Due to similar functionality of /etc/hosts and /etc/networks, /etc/networks is removes from many linux based Systems. Linux based Fedora, RedHat linux and Ubuntu no more have /etc/networks file.

Friday, June 25, 2010

Do you want to win a free iPad(registered trademark of Apple Inc.). Click on the link below. You can win a free iPad. Yes the same iPad which is known as laptop killer. A new generation of tablets itegrated with touch technology.

Enjoy

https://www.csi.ca/student/en_ca/prmo/csi_app.xhtml?icid=BNN-CSIAPP-I01

Saturday, June 5, 2010

VIDEO tag in HTML5

Long long time ago when HTML was invented it was mend to create information and provide information over the Internet ie TCP/IP. The HTML also improved time by time and incorporated the CSS and other beautific attributes.

As time passed the pdf, doc, xml, excel, png, jpeg and other file formates were created and included in the HTML to transfer the data and information. Looking at the new era roaring, vidoes are creating a major part of file transfer. Goolge in GOOGLE I/O published that video tag will be included instead of flash, flv or other objects to be put in the HTML. It will be similar to put a jpeg, png on the web page. H.264 video extension was bought and publisized by google to inlcude as in HTML without any issue and every browser having HTML5 parsing capability, can show and render video like any other HTML tag.

So get ready for heavy video tags on HTML. I would prefer to get higher speed connection as video is the next form of information than text. For more accurate information click HTML5 wiki link .

Tuesday, May 25, 2010

Walmart iphone price drop by $100

Folks who are interested in buying iPhone will be happy to hear that Walmart has decreased the prices of the iPhones. The iPhone 3G is now on sale for CAD$97 + taxes with contract depending upon your carrier and nearest store.

The cause of price drop could be new iPhone arriving with 4G, the next generation which will definitely lead to price drop of iPhone 3G which is on sale by Walmart. The news I read also confirms that they had a very big stock of iPhone 3G which lead to steep decrease in the price. At the same time Apple is no more accepting orders for iPhone 3G while some time ago the iPhone 4G also leaked which might be the cause of price drop.

As still what will be in 4G is not confirmed by Apple Inc it is well established that it would be another step stone in Smartphone arena.

Friday, May 14, 2010

Call center firm to employ Indian prisoners

The indian call center company is determined to employ criminals who are accused for murder and violence. The company's name is Radiat Infosystems, the indian firm, who had customers from Britain and the United States. Such companies have eyed upon security issues such as hiding sensitive information however it is certain that the idea will definitely add to Jail's revenue and also improving culture of jail.

For more information click here

Tuesday, May 11, 2010

People donated hair of their head for oil cleanup

A charity based on San Francisco named Matter of Trust reported hundreds of thousands of pounds of hair donated from United States,Brazil, Canada, France, Germany, Spain and UK spreading awareness through well main stream media and social networking sites.

Hair for Oil Spills program is an effective idea, as hays (not hairs) can be used to soak the oil and fishermen can be paid for clearing the oil spill. This strategy can provide income to fishermen who are badly affected by the oil spill while oil giants are still fighting over taking responsibility of the oil spill. The farmers who make the hays can make money selling the hay so we can have multiple winner here.

Here is the matter of trust video. However BP has acknowledged its awareness of this solution and decided to stick with Sornet bolls mad by Andax Industries as there is no shortage of Sorbent booms at the moment.


Here is the matter of trust video.

Monday, May 10, 2010

US has more Android users than iPhone

Analyst house NPD at UK research held a comparison of smart phone OS market in US. According to NPD the android handset acquires 28% of market while keeping RIM at first place with 36% of market share leaving Apple's at third place with 21%.

Mr Rubin from NPD announces that BlackBerry 6, HP's acquisition of Palm, Windows 7 phone by Microsoft shows large companies have eyed upon increasing smarphone market (of course after iPhone success) to meet consumer demand.

While comparing carrier shares (not by NPD) AT & T has largest slice of smart phone market while keeping Verizon at 2nd place, T-mobile at 3rd.

In one of the news[3] showed chart below which compares the sales results of Apple's iPhone and Motorola Droid (an Android OS phone) for first 74 days. The launch dates were: iPhone, June 29, 2007; Droid, November 5, 2009; and, Nexus One, January 5, 2010.



Definitely it raises a question that is Android phones are iPhone killer? Whatever the answer may be but it is definitely affecting RIM's stock prices.

References:
[1]http://www.v3.co.uk/v3/news/2262773/android-handsets-outsell-iphone
[2]http://www.v3.co.uk/v3/news/2262372/smartphone-sales-rise
[3]http://blog.flurry.com/bid/31410/Day-74-Sales-Apple-iPhone-vs-Google-Nexus-One-vs-Motorola-Droid