Wednesday, February 8, 2012

Setting up an email server on CentOS 6.2 within 5 minutes

If you have only 5 minutes, you can still set up an email server on CentOS 6.2. Don't waste them :)

This email server supports SMTP (TCP port 25) and IMAPS (secure IMAP, TCP port 993). Once it is running, you can specify it as both the outgoing and the incoming mail server in an email client such as Thunderbird on your PC.

1. Install packages

Three packages are required. Install them as root if you haven't done so yet.
$ yum install sendmail
$ yum install sendmail-cf
$ yum install dovecot

The role of sendmail is to receive email addressed to you and keep it in your mailbox on the email server. Dovecot then serves those messages to your PC when you open Thunderbird or Microsoft Outlook. For outgoing email, Thunderbird first contacts sendmail, and sendmail relays the message to its final destination for you.

2. Configure sendmail

You just need to change 2 lines in the configuration file /etc/mail/sendmail.mc

Comment out this line (prefix it with dnl, as shown) so that sendmail accepts email from anywhere, not only from localhost.
dnl DAEMON_OPTIONS(`Port=smtp,Addr=127.0.0.1, Name=MTA')dnl

Add this line, so that only the hosts listed in relay-domains are allowed to relay.
FEATURE(`relay_hosts_only')dnl

Add your PC's full hostname to this file. Create the file if it doesn't exist.
/etc/mail/relay-domains

After changing the configuration files, run these commands to activate the changes.

$ /etc/mail/make
$ service sendmail start

3. Configure dovecot

You just need to edit two files.

In /etc/dovecot/dovecot.conf, just edit these two lines
protocols = imap
listen = *, ::

In /etc/dovecot/conf.d/10-mail.conf, edit these 3 lines

mail_location = mbox:~/mail:INBOX=/var/mail/%u
mail_privileged_group = mail
mbox_write_locks = dotlock fcntl

Start the dovecot service
$ service dovecot start

4. (Optional) Reconfigure iptables only if you are already using iptables
Add these 2 lines to /etc/sysconfig/iptables to allow email through the firewall. Place them before any REJECT rule in the INPUT chain, otherwise they will never match.

-A INPUT -m state --state NEW -m tcp -p tcp --dport 25 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 993 -j ACCEPT

Then restart iptables with
$ service iptables restart
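
Once the services are running (and the firewall, if any, has been reloaded), a quick reachability check from your PC can confirm that both ports accept connections. The following is a minimal Python sketch, not part of the original setup; the function names port_open and check_mail_ports are made up for illustration, and the hostname in the usage note is a placeholder.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_mail_ports(host, ports=(25, 993)):
    """Check the SMTP (25) and IMAPS (993) ports used in this article."""
    return {port: port_open(host, port) for port in ports}
```

For example, check_mail_ports("mail.example.com") returns a dictionary mapping each port to True or False, where mail.example.com stands in for your own server's hostname. A False result only says the TCP connection failed; it does not tell you whether the service or the firewall is the cause.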

DONE

That's it. Of course, you can do more to improve the security of your email server. For example, you can make sendmail more secure by using SMTP over SSL. Feel free to suggest improvements to this article. Thanks.

Monday, January 30, 2012

2012 Spring Short Course, Introduction to Unix/Linux

Date : Jan. 30(Monday) ~ Feb. 2(Thursday).
Time : 3PM ~ 5PM
Location : Teague Rm# 103

It is highly recommended that all attendees have their EOS login ID ready before the first day of class. An EOS login ID is required for the hands-on lab.

Introduction to Linux is a short course designed specifically for beginners to Linux/Unix systems. It will cover basic Linux concepts and frequently used commands.

Basic
  • What is Linux/Unix?
  • File and Directory
  • Edit text file
  • Setup environment
  • Remote access
Advanced
  • Process,Signal
  • I/O redirection,Pipe
  • Alias
  • Permission
  • Kernel & Shell
For details, see:

https://sites.google.com/site/tamulinux/introduction-to-linux

Please post a comment or feedback about this class. It will help improve the class.
Thank you.

Brian Kim

Wednesday, October 26, 2011

Performance Analysis of NAMD on NVIDIA GPU

Abstract

As GPUs become readily available resources in high performance computing environments, more and more applications are being considered as targets for the highly parallel computing capability of the GPU. To evaluate the performance improvement at the application level, this article experiments with a molecular dynamics application, NAMD, and analyzes its performance from three perspectives: scalability, speedup, and GPU utilization.

1. Experimental environment

GPU nodes are a relatively new addition to the existing EOS system at Texas A&M University. Table 1 shows the details of the node tested in this experiment. It has two six-core 2.8 GHz Intel Xeon X5660 processors based on the Westmere architecture and 24 GB of DDR3 DRAM with a 1.333 GHz memory clock. Additionally, it is equipped with two NVIDIA M2050 GPU devices. Each M2050 runs at 1.15 GHz and has 2687 MB of GDDR5 memory with a 1.546 GHz memory clock. The M2050 is a high-performance device based on the Fermi architecture, with 14 streaming multiprocessors of 32 cores each. It also has ECC memory error protection and L1 and L2 caches, providing both accuracy and high computational performance. The node runs 64-bit Red Hat Enterprise Linux Server release 5.4.

NAMD was tested in three configurations: CPU only, with 1 NVIDIA M2070 GPU, and with 2 NVIDIA M2050 GPUs. Because these tests were submitted through the batch system, there is no guarantee that each test ran exclusively on its compute node; the node could have been shared with several other jobs. The performance presented in this article could therefore differ from what an exclusive testing environment would show.


                     NVIDIA M2050   NVIDIA M2070   Intel X5660
Clock speed (GHz)    1.15           1.15           2.8
Memory (GB)          2.687          5.375          24.081
Memory clock (GHz)   1.546          1.566          1.333
# of CUDA cores      448            448            N/A
CUDA driver ver.     4.0            4.0            N/A
CUDA CC ver.         2.0            2.0            N/A
Table 1: Specification of experimental environment

To benchmark NAMD, the apoa1 dataset is used. This dataset contains 92,000 atoms, and the simulation runs for 500 steps. Among the parameters in the input file, outputEnergies is set to 100, as suggested in the user manual, to avoid unnecessary CPU work spent writing additional output to file.

2. Results

In this section, the performance of NAMD is analyzed in three aspects: scalability, speedup, and GPU utilization.

A. Scalability

The ratio of CPU cores to GPUs is 12:1 on the single-GPU node and 12:2 on the multi-GPU node. Each NAMD process occupies one CPU core and uses one GPU. However, a GPU can be shared by multiple processes, so it can be oversubscribed by too many processes and become a bottleneck in certain configurations.
Figure 1

Since each NAMD process maps to a single CPU core, the number of cores in the graph equals the number of processes. The CPU version of NAMD scales well up to 12 cores on the Westmere node in Figure 1. The GPU version, however, performs best at around 4-6 cores. On the 1-GPU node, all processes launch kernels on the same GPU concurrently, so scalability breaks down when there are 12 processes.

B. Performance

Figure 2 shows the speedup of the GPU version relative to the CPU-only version, taking the performance of the CPU-only version on 1 core as 1x. With 1 CPU core and 1 GPU, NAMD runs 7 times faster than this baseline. With 2 CPU cores and 1 GPU, it runs 13 times faster. While the CPU version scales well up to 12 cores, the GPU version peaks at around 6 cores and then starts falling.
Figure 2
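
The relative speedup plotted here is simply the baseline elapsed time divided by the elapsed time of each configuration. A minimal Python sketch follows; the timings in it are hypothetical values chosen only to reproduce the 7x and 13x ratios reported above, not measurements from this experiment.

```python
def relative_speedup(baseline_seconds, elapsed_seconds):
    """Speedup of a run relative to the baseline (CPU only, 1 core = 1x)."""
    if baseline_seconds <= 0 or elapsed_seconds <= 0:
        raise ValueError("elapsed times must be positive")
    return baseline_seconds / elapsed_seconds


# Hypothetical elapsed times in seconds, NOT measured values.
baseline = 910.0
runs = {"1 core + 1 GPU": 130.0, "2 cores + 1 GPU": 70.0}

for label, seconds in runs.items():
    print(f"{label}: {relative_speedup(baseline, seconds):.1f}x")
```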
C. GPU Utilization

1) Single GPU environment

Figure 3 shows how the utilization of a single GPU changes as the number of CPU cores increases. With 1 CPU core, the GPU uses only 30% of its capability. In other words, 70% of the GPU stays idle. With 2 CPU cores sharing the GPU, overall utilization doubles to around 60%. Likewise, utilization goes over 90% when 4 CPU cores share the GPU.
Figure 3
Higher GPU utilization translates directly into application performance, as shown in Figure 2. With 2 CPU cores, NAMD runs about twice as fast as with 1 CPU core.

2) Multi GPU environment

Of the two graphs in Figure 4, the top one shows the utilization of the first GPU and the bottom one shows the utilization of the second GPU. The two graphs look very similar, which indicates that NAMD distributes the workload evenly across multiple GPUs. With 2 CPU cores, the utilization of each GPU stays around 40%. With 4 CPU cores, the utilization of each GPU goes over 60%.
Figure 4
3. Conclusions

A GPU application shows different performance characteristics from a CPU application. Scalability can be affected by the ratio of CPU cores to GPUs, and oversubscribing a GPU can limit it. On a multi-GPU node, the overall speedup depends on how the application distributes its workload across the GPU devices. At the same time, undersubscribing the GPUs, for example when the workload is not big enough to keep all GPU devices busy, also limits the performance of a GPU application.


Performance of Parallel Migrate-n on Linux cluster


Introduction

Migrate estimates effective population sizes and past migration rates between n populations, assuming a migration matrix model with asymmetric migration rates and different subpopulation sizes[1].

Experimental environment

In this experiment, the performance of the serial version, which runs on a single core, is measured as the basis of comparison. Then the number of cores is increased to 8, 16, 32, and 64. Each compute node has two quad-core 2.8 GHz Intel Xeon X5560 processors based on the Nehalem architecture and 24 GB of DDR3 DRAM with a 1.333 GHz memory clock[2].

Since each compute node has 8 cores, 8 compute nodes are required for a 64-core job. parmfile.testml is used as the input file. The only change made to it for this experiment is that the menu parameter is set to 'NO' so that it can run in batch mode on EOS and HYDRA[3][4][5].

Results

Migrate keeps improving its performance up to 32 cores on both HYDRA and EOS[Fig.1].
[Figure 1]

However, with 64 cores, performance starts falling on HYDRA[Fig.2], and on EOS it is no better than with 32 cores[Fig.3].
[Figure 2]

[Figure 3]

On EOS, 8 cores are 5 times faster than 1 core, and there is almost no speedup between 32 and 64 cores. On HYDRA, 8 cores bring a 4.5x speedup compared to 1 core. 32 cores give the best performance, and 64 cores are slightly slower than 32 cores.
[Figure 4]

In general, Migrate runs more than twice as fast on EOS as on HYDRA. With 8 cores, EOS is about 3 times faster than HYDRA[Fig.4].
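
One way to read the 8-core speedups reported above is as parallel efficiency, that is, speedup divided by the number of cores. The Python sketch below uses only the two speedups stated in the text (5x on EOS, 4.5x on HYDRA); the function name parallel_efficiency is chosen here for illustration.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear scaling achieved: speedup / cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# 8-core speedups relative to 1 core, as reported in the text.
reported_speedup_8_cores = {"EOS": 5.0, "HYDRA": 4.5}

for system, speedup in reported_speedup_8_cores.items():
    efficiency = parallel_efficiency(speedup, 8)
    print(f"{system}: {efficiency:.2%} of ideal scaling on 8 cores")
```

This gives 62.5% on EOS and 56.25% on HYDRA, so even at 8 cores a sizable share of the ideal linear speedup is already lost on both systems.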

References

1. http://popgen.sc.fsu.edu/Migrate/Info.html
2. http://sc.tamu.edu/systems/eos/hardware.php
3. http://sc.tamu.edu/help/eos/batch/
4. http://sc.tamu.edu/help/hydra/batch.php
5. http://sc.tamu.edu/systems/hydra/hardware.php

Friday, September 16, 2011

Is GPU good for large vector addition?

Introduction

With more than one GPU programming interface available, there is great interest in converting regular programs into GPU programs. Two questions come up frequently.

  1. How much performance improvement can we expect from it?
  2. Which programming interface is better than the others?

In this post, the performance of CPU and GPU is compared for vector addition and matrix multiplication, which are widely used building blocks for scientific applications.

Additionally, the performance of OpenCL, CUDA, and PGI Accelerator on NVIDIA's M2050 GPU is analyzed to compare the different GPU programming interfaces.

1. Is GPU faster than CPU?

No - for vector addition

Yes - for matrix multiplication

Figure 1 shows the elapsed time for vector addition on CPU and GPU. The CPU is about 4 times faster than the GPU in this experiment.



Figure 1. CPU is faster than GPU for vector addition

Figure 2. Elapsed time per function on GPU

Large vectors need to be copied from CPU memory to GPU memory over the relatively slow PCIe bus, and this cost overshadows the higher computational capability of the GPU. Figure 2 shows that most of the time on the GPU is spent copying data.

However, for matrix multiplication, the GPU is 200 or more times faster than the CPU. Compute-intensive calculations such as matrix multiplication are the best candidates for the GPU.

Figure 3. GPU is far faster than CPU for matrix multiplication

For matrix multiplication on the GPU, most of the time is spent on computation, and the data transfer time is almost negligible.

Figure 4. Elapsed time per function on GPU
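
The contrast between the two kernels can be sketched with a rough count of arithmetic operations per byte moved across the bus. The Python estimate below is an illustration, not a measurement from this experiment: it assumes single-precision (4-byte) elements, two inputs copied to the GPU and one result copied back, a square n x n matrix multiply counted as 2n^3 operations, and a matrix size of 1024 picked arbitrarily.

```python
BYTES_PER_ELEMENT = 4  # assumption: single-precision floats


def vector_add_ops_per_byte(n):
    """n additions for 3 vectors of n elements moved (2 in, 1 out)."""
    operations = n
    bytes_moved = 3 * n * BYTES_PER_ELEMENT
    return operations / bytes_moved


def matmul_ops_per_byte(n):
    """About 2*n^3 operations for 3 matrices of n*n elements moved."""
    operations = 2 * n ** 3
    bytes_moved = 3 * n * n * BYTES_PER_ELEMENT
    return operations / bytes_moved


# Sizes are arbitrary examples, not the sizes used in the experiment.
print(f"vector add, n = 1,000,000: {vector_add_ops_per_byte(1_000_000):.3f} ops/byte")
print(f"matrix multiply, n = 1024: {matmul_ops_per_byte(1024):.1f} ops/byte")
```

Under these assumptions, vector addition performs 1/12 of an operation per byte transferred regardless of size, while matrix multiplication performs n/6 operations per byte, so the transfer cost is amortized over far more computation as the matrices grow.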

2. Is OpenCL better than CUDA?

Yes for compatibility

??? for performance

CUDA is about 5% faster than OpenCL for matrix multiplication in this experiment on NVIDIA M2050 platform.

Conclusion

To get maximum performance out of an NVIDIA GPU, try to take advantage of CUBLAS as much as possible. If there is any chance that your program will need to run on a different GPU platform, such as AMD or Intel, use OpenCL for compatibility.

If you are interested in the full article about this experiment, check here.





Monday, September 5, 2011

2011 Fall Short Course, Introduction to Unix/Linux

The Fall 2011 class runs from Sep. 5 through Sep. 8 at Teague B013.

Date : Sep. 5(Monday) ~ Sep. 8(Thursday).
Time : 3PM ~ 5PM
Location : Teague B013

This short course will be held in a computer room, and each attendee will have access to a computer for the hands-on lab sessions. It is highly recommended that all attendees have their EOS login ID ready before the first day of class. An EOS login ID is required for the hands-on lab.

Introduction to Linux is a short course designed specifically for beginners to Linux/Unix systems. It will cover basic Linux concepts and frequently used commands.

Basic
  • What is Linux/Unix?
  • File and Directory
  • Edit text file
  • Setup environment
  • Remote access
Advanced
  • Process,Signal
  • I/O redirection,Pipe
  • Alias
  • Permission
  • Kernel & Shell

For details, see:

https://sites.google.com/site/tamulinux/introduction-to-linux

Please post a comment or feedback about this class. It will help improve the class.

Thank you.


Sunday, August 28, 2011

Setting up dual monitor on Ubuntu 11.04 with NVIDIA

Several users have been complaining that the global menu bar disappears after setting up dual monitors on Ubuntu 11.04. It happened to me as well. Finally, I found an alternative setting which is not exactly what I wanted, but it gives me the minimum functionality that I can accept.

Run 'NVIDIA X Server Settings', configure each monitor as in these setups, then restart gdm.

Primary monitor
Secondary monitor

With this setting, you will get these features:

(1) dual monitors
(2) move windows across dual monitors
(3) global menu bar on each monitor*

At first I didn't like (3). What I wanted was dual monitors with the global menu bar on the primary monitor only, but I couldn't find a solution for that.

After using this setup for several days, I found that having the global menu bar on the secondary monitor is actually very useful, even a must-have feature. The global menu bar becomes the application menu bar once an application is launched on the secondary monitor, so the application menu bar sits close to the application itself. Otherwise, you would end up with the application menu bar on the primary monitor and the application on the secondary monitor.

Let me know if you have a better solution for this.

Thanks.