FOLKSTALK

SSH Command Examples - Unix / Linux Tutorials

The SSH client utility in unix or linux is used to log in to a remote host and execute commands
on the remote machine. The rlogin and rsh commands can also be used to log in to the remote
machine. However, these are not secure. The ssh command provides a secure connection between
two hosts over an insecure network.
The syntax of the ssh command is

ssh [-l username] hostname | user@remote-hostname [command]

Let us see some examples of the ssh command.


SSH Command Examples:
1. Logging to a remote server
You can login to a remote server from the local host as shown below:

localhost:[~]> ssh -l username remote-server

username@remote-server password:

remote-server:[~]>

Alternatively you can use the below ssh command for connecting to remote host:

localhost:[~]> ssh username@remote-server

username@remote-server password:

remote-server:[~]>

Note: If you are logging in for the first time, ssh prints a message saying that the host key of the
remote server is not found, and you can type yes to continue. The host key of the remote server will
be cached and added to the .ssh2/hostkeys directory (or the known hosts file, depending on the
ssh client) in your home directory. From the second time onwards you just need to enter
the password.
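For reference, the first-time prompt from an OpenSSH client usually looks something like the below (the key type and fingerprint shown here are only placeholders):

localhost:[~]> ssh username@remote-server

The authenticity of host 'remote-server' can't be established.
RSA key fingerprint is d4:3b:... (fingerprint of the remote host).
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'remote-server' (RSA) to the list of known hosts.

username@remote-server password: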
2. Logging out from remote server
Simply enter the exit command on the terminal to close the connection. This is
shown below:

remote-server:[~]>exit

logout

Connection to remote-server closed.

localhost:[~]>

3. Running remote commands from local host


Sometimes it is necessary to run the unix commands on the remote server from the
local host. An
example is shown below:

localhost:[~]> ssh user@remote-host "ls test"

[Link]

[Link]

[Link]

The ssh command connects to the remote host, runs the ls command, prints the output on the local
host terminal and closes the connection to the remote host.
Let us see whether the ls command actually displayed the correct result or not by connecting to the
remote host.

localhost:[~]> ssh user@remote-host

user@remotehost password:

remotehost:[~]> cd test

remotehost:[~/test]> ls

[Link]

[Link]

[Link]
4. Version of the SSH command
We can find the version of SSH installed on the unix system using the -V option to
the ssh. This is
shown below:

> ssh -V

OpenSSH_4.3p2, OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008

5. Debugging the SSH Client


When we are not able to connect to the remote host, it is good to debug and find the exact error
messages that are causing the issue. Use the -v option for debugging the ssh client.

ssh -v user@remote-host

OpenSSH_4.3p2, OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008

debug1: Reading configuration data /etc/ssh/ssh_config

debug1: Applying options for *

debug1: Connecting to remote-host [[Link]] port 22.

debug1: Connection established.

debug1: identity file /home/user/.ssh/identity type -1

debug1: identity file /home/user/.ssh/id_rsa type -1

debug1: identity file /home/user/.ssh/id_dsa type 2

debug1: loaded 3 keys

..........

..........

6. Copying files between remote host and local host.


We can use the scp command to copy files securely between the local host and the remote host
using ssh authentication.
To copy a file from the local host to the remote host's /var/tmp/ directory, run the below
scp command.
scp filename user@remote-host:/var/tmp/

To copy a file from the remote host's /usr/local/bin/ directory to the local host's current
directory, run the
below scp command.

scp user@remote-host:/usr/local/bin/[Link] .

WC Command Examples - Count of Lines, Words, Characters - Unix / Linux

WC command in unix or linux is used to find the number of lines, words and
characters in a file. The
syntax of wc command is shown below:

wc [options] filenames

You can use the following options with the wc command.

-l : Prints the number of lines in a file.

-w : prints the number of words in a file.

-c : Displays the count of bytes in a file.

-m : prints the count of characters from a file.

-L : prints only the length of the longest line in a file.

Let us see how to use the wc command with a few examples. Create the following file in
your unix or linux
operating system.

> cat unix_wc.bat

Oracle Storage

unix distributed system

linux file server

debian server
Oracle backup server

WC Command Examples:
1. Printing count of lines
This is the most commonly used operation to find the number of lines from a file.
Run the below
command to display the number of lines:

wc -l unix_wc.bat

5 unix_wc.bat

Here in the output, the first field indicates the count and the second field is the
filename.
2. Displaying the number of words.
Just use the -w option to find the count of words in a file. This is shown below:

wc -w unix_wc.bat

13 unix_wc.bat

3. Print count of bytes, count of characters from a file


We can use the -c and -m options to find the number of bytes and characters
respectively in a file.

> wc -c unix_wc.bat

92 unix_wc.bat

> wc -m unix_wc.bat

92 unix_wc.bat

4. Print the length of longest line


The -L option is used to print the number of characters in the longest line from a
file.

wc -L unix_wc.bat
23 unix_wc.bat

In this example, the second line is the longest line with 23 characters.
5. Print count of lines, words and characters.
If you don't specify any option to the wc command, by default it prints the count of
lines, words and
characters. This is shown below:

wc unix_wc.bat

5 13 92 unix_wc.bat

6. Wc help
For any help on the wc command, just run wc --help on the unix terminal.
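The wc command is also commonly combined with other commands through a pipe. For example, the below commands count the number of entries in the current directory and the number of lines containing the word "server" in our sample file (3 lines):

> ls -1 | wc -l

> grep "server" unix_wc.bat | wc -l

3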

SCP Command Examples - Linux / Unix Tutorials

SCP stands for secure copy. It is used to copy data (files or directories) from one
unix or linux system
to another unix or linux server. SCP uses the secure shell (ssh) to transfer the data
between the
hosts. The features of SCP are:

. Copies files within the same machine.


. Copies files from local machine to remote machine.
. Copies files from remote machine to local machine.
. Copies files between two different remote servers.

SCP Command Syntax:


The syntax of SCP command is

scp [Options] [[User@]From_Host:]Source_File [[User@]To_Host:][Destination_File]

Each element of the scp command is explained in detail below:

. User is the one who has the permissions to access the files and directories. The user should
have read permission if it is the source and write permission if it is the
destination.
. From_Host: hostname or IP address where the source file or directory resides.
This is
optional if the from host is the host where you are running the scp command.
. Source_File: Files or directories to be copied to the destination.
. To_Host: Destination host where you want to copy the files. You can omit this
when you
want to copy the files to the host where you are issuing the scp command.
. Destination_File: Name of the file or directory in the target host.

SCP Command Options:


The important SCP command options are listed below:

. -r : Recursively copies the contents of source files or directories.


. -p : Preserves the access time, modification time, permissions of the source
files in the
destination.
. -q : Progress bar is not displayed.
. -v : verbose mode. Displays debugging messages.
. -P : copy files using the specified port number.

SCP Command Examples:


Let us see examples of the scp command in a unix or linux system.
1. Copying with in the same system
You can use the scp command just like the cp command to copy files from one
directory to another
directory.

scp [Link] /var/tmp/

This command copies the file [Link] from current directory to the
/var/tmp directory.
2. Copy file from local host to remote server
This is the most frequently used operation to transfer files between unix systems.

scp filename user@remotehost:/remote/directory/

This command connects to the remote host and copies the specified file to the
/remote/directory/.
3. Copy files from remote host to local server.
This operation is used when taking backup of the files in remote server.
scp user@remotehost:/usr/backup/oracle_backup.dat .

This command copies the oracle backup file in the remote host to the current
directory.
4. Copying files between two remote servers
The scp command can also be used to copy files between two remote hosts.

scp source_user@source_remote_host:/usr/bin/mysql_backup.sh
target_user@target_remote_host:/var/tmp/

The above command copies the mysql backup shell script from the source remote host
to the /var/tmp
directory of the target remote host.
5. Copying a directory.
To copy all the files in a directory, use the -r option with the scp command. This
makes the scp
command copy the directory recursively.

scp -r directory user@remotehost:/var/tmp/

The above command copies the directory from local server to the remote host.
6. Improving performance of scp command
By default the scp command uses the Triple-DES cipher/AES-128 to encrypt the data.
Using the
blowfish or arcfour encryption will improve the performance of the scp command.

scp -c blowfish filename user@remoteserver:/var/

scp -c arcfour localfile user@remoteserver:/var/

7. Limit bandwidth
You can limit the bandwidth used by the scp command using the -l option.

scp -l bandwidth_limit filename user@hostname:/usr/backup/


Here bandwidth_limit is a numeric value specified in kilobits per second.

8. Specifying the port number


We can make the scp command copy the files over a specified port number using
the -P option.

scp -P 6001 storage_backup.bat username@hostname:/tmp/

xargs command examples in Unix / Linux Tutorial

The xargs command in the unix or linux operating system is used to pass the output of one
command as an
argument to another command. Some of the unix or linux commands like ls and find
produce a long
list of filenames. We often want to do some operation on this list of file names, like
searching for a pattern,
removing and renaming files etc. The xargs command provides this capability by
taking the huge list
of arguments as input, dividing the list into small chunks and then passing them as
arguments to
other unix commands.
Unix Xargs Command Examples:
1. Renaming files with xargs
We have to first list the files to be renamed either by using the ls or find
command and then pipe the
output to xargs command to rename the files. First list the files which end with
".log" using the ls
command.

ls *.log

[Link] [Link]

> ls *.log | xargs -i mv {} {}_bkp

> ls *_bkp

oracle.log_bkp storage.log_bkp

You can see how the log files are renamed with a backup (bkp) suffix. Here the -i option
tells the
xargs command to replace the {} with each file name returned by the ls command.
2. Searching for a pattern
We can combine the grep command with xargs to search for a pattern in a list of
files returned by
another unix command (ls or find). Let us list out all the bash files in the current
directory with the find
command in unix.

find . -name "*.bash"

./sql_server.bash

./mysql_backup.bash

./oracle_backup.bash

Now we grep for the "echo" statements from the list of files returned by the find
command with the
help of xargs. The command is shown below:

find . -name "*.bash" |xargs grep "echo"

If you don't use xargs and pipe the output of the find command to the grep command
directly, then the grep
command treats each file name returned by the find command as a line of text and
searches for the word
"echo" in that line rather than in that file.
3. Removing files using xargs
We can remove the temporary files in a directory using the rm command along with
the xargs
command. This is shown below:

ls "*.tmp" | xargs rm

This removes all the files with ".tmp" suffix.


4. Converting Multi-line output to Single line output.
If you run the ls -1 command, it will list each file on a separate line. This is
shown below:

ls -1
[Link]

online_backup.dat

mysql_storage.bat

We can convert this multi-line output to single line output using the xargs
command. This is shown
below:

ls -1 | xargs

[Link] online_backup.dat mysql_storage.bat

5. Handling spaces in file names


By default the xargs command treats the space as a delimiter and sends each item as
an argument
to the unix command. If a file name contains a space (example: "oracle storage"),
then each word
will be treated as a separate file name and passed as a separate argument. This will
cause an issue. Let
us see how to handle the spaces in file names with an example.

Creating a file which contains space in the name

> touch "oracle storage"

> ls oracle\ storage | xargs grep "log"

grep: oracle: No such file or directory

grep: storage: No such file or directory

You can see that the grep command is treating oracle as one file and storage as
another file. This
is because xargs treats the space as a delimiter. To avoid this kind of error, use the
-i option with
braces as shown below:

> ls oracle\ storage | xargs -i grep "log" {}

If you want to know what command xargs is executing, use the -t option with
xargs. This will print
the command on the terminal before executing it.
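For example, taking the rename operation from example 1 (assuming the original oracle.log and storage.log files are still present), the -t option echoes each mv command before it runs:

> ls *.log | xargs -t -i mv {} {}_bkp

mv oracle.log oracle.log_bkp
mv storage.log storage.log_bkp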
6. Passing subset of arguments
We can pass only a subset of arguments from a long list of arguments using the -n
option with the xargs
command. This is shown below.

> ls -1

backup

mysql

network

online

oracle

storage

wireless

> ls -1 | xargs -n 3 echo

backup mysql network

online oracle storage

wireless

You can see from the above output that 3 arguments are passed at a time to the echo
statement.
Important Notes on Xargs Command:
1. Xargs cannot directly handle files which contain newlines or spaces in their
names. To handle
this kind of files, use the -i option with the xargs command. Another way to handle
these characters is to
treat the newline or space characters as null characters using the -0 option with xargs.
However this requires
that the input to xargs also uses the null character as the separator. An example is shown
below:

find . -print0 | xargs -0 rm


The -print0 option in the find command terminates each file name with a null character
instead of a newline.
2. Traditionally, xargs used the underscore ("_") as the logical end-of-file string: if this string
appears as an input item, then the
xargs command stops reading the input and the rest of the input is ignored. In GNU xargs this is
no longer the default; you can set or change the end-of-
file string by using the -E (--eof) option (older versions use -e).
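As a small sketch of the end-of-file string behaviour (using GNU xargs and its -E option), the underscore line below acts as a logical end of file, so only a and b reach the echo command:

> printf "a\nb\n_\nc\n" | xargs -E "_" echo

a b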
3. To know more about xargs command, run the xargs --help on the unix or linux
terminal.

Generate Date / Time Dimension in Informatica

We use database procedures to generate the date dimension for data warehouse
applications. Here
i am going to show you how to generate the date dimension in informatica.
Let see how to generate list out all the days between two given dates using oracle
sql query.

SELECT to_date('01-JAN-2000','DD-MON-YYYY') + level - 1 calendar_date

FROM dual

CONNECT BY level <=

to_date('31-DEC-2000','DD-MON-YYYY') -

to_date('01-JAN-2000','DD-MON-YYYY') + 1;

Output:

CALENDAR_DATE

-------------

1/1/2000

1/2/2000

1/3/2000

.
.

12/31/2000

Now we can apply date functions on the Calendar date field and can derive the rest
of the columns
required in a date dimension.
We will see how to get the list of days between two given dates in informatica.
Follow the below
steps for creating the mapping in informatica.

. Create a source with two ports ( Start_Date and End_Date) in the source analyzer.

. Create a new mapping in the mapping designer Drag the source definition into the
mapping.
. Create the java transformation in active mode.
. Drag the ports of source qualifier transformation in to the java transformation.
. Now edit the java transformation by double clicking on the title bar and go to
the "Java Code"
tab. Here you will again find sub tabs. Go to the "Import Package" tab and enter
the below java
code:

import [Link];

import [Link];

import [Link];

import [Link];

. Not all these packages are required. However, I included them just in case you want
to apply any
formatting on the dates. Go to the "On Input Row" tab and enter the following java
code:

// date ports are exposed to the Java transformation as milliseconds since the epoch
int num_days = (int) ((End_Date - Start_Date) / (1000L * 60 * 60 * 24));

for (int i = 0; i <= num_days; i++)
{
    // emit one output row for the current date, then move Start_Date forward by one day
    generateRow();
    Start_Date = Start_Date + (1000L * 60 * 60 * 24);
}

. Compile the java code by clicking on the compile. This will generate the java
class files.
. Connect only the Start_Date output port from java transformation to expression
transformation.
. Connect the Start_Date port from expression transformation to target and save the
mapping.
. Now create a workflow and session. Enter the following oracle sql query in the
Source SQL
Query option:

SELECT to_date('01-JAN-2000','DD-MON-YYYY') Start_Date,

to_date('31-DEC-2000','DD-MON-YYYY') End_Date

FROM DUAL;

Save the workflow and run. Now in the target you can see the list of dates loaded
between the two
given dates.
Note1: I have used a relational table as my source. You can use a flat file instead.
Note2: In the expression transformation, create the additional output ports and
apply date functions
on the Start_Date to derive the data required for date dimension.

Bash Shell Script to Read / Parse Comma Separated (CSV) File - Unix / Linux

Q) How to parse CSV files and print the contents on the terminal using a bash
shell script in a Unix
or Linux system?
It is a very common operation in a Unix system to read the data from a delimited
file and apply
some operations on the data. Here we see how to read a comma separated value (CSV)
file using
the while loop in a shell script and print these values on the Unix terminal.
Consider the below CSV file as an example:

> cat os_server.csv

Unix, dedicated server

Linux, virtual server

This file contains two fields. The first field is the operating system and the second field
contains the hosting
server type. Let us see how to parse this CSV file with the simple bash script shown
below:

#!/usr/bin/bash

INPUT_FILE='os_server.csv'

IFS=','

while read OS HS

do

echo "Operating system - $OS"


echo "Hosting server type - $HS"

done < $INPUT_FILE

Here IFS is the input field separator. As the file is comma delimited, the IFS
variable is set to a
comma. The output of the above script is

Operating system - Unix

Hosting server type - dedicated server

Operating system - Linux

Hosting server type - virtual server

Here in the code, the IFS assignment (IFS=',') and the while statement can be merged
into a single statement
as shown below:

while IFS=',' read OS HS
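Putting it together, a minimal sketch of the script with the merged statement (reading the os_server.csv file shown above) looks like this:

#!/usr/bin/bash

INPUT_FILE='os_server.csv'

while IFS=',' read OS HS
do
  echo "Operating system - $OS"
  echo "Hosting server type - $HS"
done < $INPUT_FILE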

Informatica Architecture Tutorial - Version 8 / 9

Informatica is an ETL tool used for extracting data from various sources (flat
files, relational
databases, xml etc.), transforming the data and finally loading the data into a centralized
location such as
a data warehouse or an operational data store. Informatica PowerCenter has a service-
oriented
architecture that provides the ability to scale services and share resources across
multiple
machines.
The architectural diagram of informatica is shown below:
(Image: Informatica Architecture diagram)

The important components of the informatica power center are listed below:
Domain: The domain is the primary unit for management and administration of services in
PowerCenter.
The components of a domain are one or more nodes, the service manager and the application
services.
Node: A node is a logical representation of a machine in a domain. A domain can have
multiple nodes.
The master gateway node is the one that hosts the domain. You can configure nodes to
run application
services like the integration service or repository service. All requests from other
nodes go through the
master gateway node.
Service Manager: The service manager supports the domain and the application
services. The
Service Manager runs on each node in the domain. The Service Manager starts and
runs the
application services on a machine.
Application services: A group of services which represent the informatica server-
based
functionality. Application services include the PowerCenter repository service,
integration service, data
integration service, metadata manager service etc.
Powercenter Repository: The metadata is stored in a relational database. The tables
contain the
instructions to extract, transform and load data.
Powercenter Repository service: Accepts requests from the client to create and
modify the
metadata in the repository. It also accepts requests from the integration service
for metadata to run
workflows.
Powercenter Integration Service: The integration service extracts data from the
source, transforms
the data as per the instructions coded in the workflow and loads the data into the
targets.
Informatica Administrator: Web application used to administer the domain and
powercenter
security.
Metadata Manager Service: Runs the metadata manager web application. You can
analyze the
metadata from various metadata repositories.

Add Job to Cron (Crontab Command Examples) - Unix / Linux Tutorials

Unix or Linux operating system provides a feature for scheduling the jobs. You can
setup command
or scripts which will run periodically at the specified time. The Crontab is
command used to add or
remove jobs from the cron. The cron service is a daemon runs in the background and
checks for
/etc/crontab file, /etc/con.*/ directories and /var/spool/cron/ directory for any
scheduled jobs.
Each user has a separate /var/spool/cron/crontab file. Users are not allowed
directly to modify the
files. The crontab command is used for setting up the jobs in the cron.
The format of crontab command is

* * * * * command to be executed

You can easily remember this command in the below format

MI HH DOM MON DOW command

The field descriptions of the crontab are explained below:

MI : Minutes from 0 to 59

HH : Hours from 0 to 23

DOM : Day of month from 1 to 31

MON : Months from 1 to 12

DOW : Day of week from 0 to 7 (0 or 7 represents Sunday)


Command: Any command or script to be scheduled

Let us see the usage of the crontab command with examples.


1. List crontab entries
You can list out all the jobs which are already scheduled in cron. Use "crontab -l"
for listing the jobs.

crontab -l

0 0 * * * /usr/local/bin/list_unix_versions.sh

The above crontab command displays the cron entries. Here the shell script for
listing the unix
versions (list_unix_versions.sh) is scheduled to run daily at midnight.
2. List crontab entries of other users
To list the crontab entries of another user in unix, use the -u option with
crontab. The syntax is
shown below:

crontab -u username -l

3. Removing all crontab entries


You can un-schedule all the jobs by removing them from the crontab. The syntax for
removing all the
crontab entries is

crontab -r

For removing another user's crontab entries:

crontab -u username -r

4. Editing the crontab


You can edit the crontab and add a new job to it. You can also remove an existing
job from the
crontab. Use the -e option for editing the crontab.
crontab -e

For editing another user's crontab entries:

crontab -u username -e

This will open a file in the VI editor. Now use VI commands for adding or removing
jobs and for
saving the crontab entries.
5. Schedule a job to take oracle backup on every Sunday at midnight
Edit crontab using "crontab -e" and append the following entry in the file.

0 0 * * 0 /usr/local/bin/oracle_backup.sh

6. Schedule a job to run every six hours in a day


You can schedule a job to run more than once in a day. As an example the following
crontab entry
takes the mysql backup more than once in a day.

0 0,6,12,18 * * * /usr/bin/mysql_backup.sh

Here the list 0,6,12,18 indicates midnight, 6am, 12pm and 6pm respectively.
7. Schedule job to run for the first 15 days of the month.
You can schedule a job by specifying the range of values for a field. The following
example takes the
sql server backup daily at midnight for the first 15 days in a month.

0 0 1-15 * * /usr/bin/sql_server_backup.sh

8. Schedule job to run every minute.


The following crontab entry runs the command to send emails to a group of users
every
minute.

* * * * * /bin/batch_email_send.sh
[Link]
9. Taking backup of cron entries
Before editing the cron entries, it is good to take a backup of them, so
that even if you make a
mistake you can restore those entries from the backup.

crontab -l > /var/tmp/cron_backup.dat

10. Restoring the cron entries


You can restore the cron entries from the backup as

crontab cron_backup.dat

Understanding the Operators:


There are three operators allowed for specifying the scheduling times. They are:

. Asterisk (*) : Indicates all possible values for a field. An asterisk in the
month field indicates all
possible months (January to December).
. Comma (,) : Indicates list of values. See example 6 above.
. Hyphen (-): Indicates range of values. See example 7 above.

Disabling Emails:
By default the cron daemon sends emails to the local user if the commands or scripts
produce any output.
To disable sending of emails, redirect the output of the command to /dev/null 2>&1.

0 0 20 * * /usr/bin/online_backup.sh > /dev/null 2>&1

Note: you cannot schedule a job to run at the seconds level, as the minimum allowed
scheduling granularity is one
minute.
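If you do need something close to second-level scheduling, a common workaround (a sketch only; /usr/local/bin/check_status.sh is a hypothetical script name) is to schedule the same job every minute and stagger a second copy with sleep:

* * * * * /usr/local/bin/check_status.sh
* * * * * sleep 30; /usr/local/bin/check_status.sh

The second entry runs the script again 30 seconds into each minute, giving an effective 30-second interval.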



1 comment:

Swathi (04 July, 2012 02:44):


The crontab command has built-in constant strings for specifying the scheduling
times. They are:
@reboot : Run job once after booting the unix or linux system.
@yearly : Run the job once in a year
@annually : Run once in a year
@monthly : Run once in a month
@weekly : Run once in a week
@daily : Run once in a day
@midnight : Run once in a day at midnight
@hourly : Run for every hour
Example:
@hourly /usr/local/bin/automate_script

Yum Command Examples - Install, Update, Uninstall - Linux Tutorials

Yum (Yellowdog Updater Modified) is one of the package manager utilities in the Linux
operating system.
The yum command is used for installing, updating and removing packages in a Linux
environment. Some
other package manager utilities in linux systems are apt-get, dpkg, rpm etc.
By default yum is installed on some of the linux distributions like CentOS, Fedora and
Red Hat. Let us see
some of the most commonly used yum commands with examples.
1. Listing available packages
You can list all the available packages in the yum repositories using the list subcommand.

yum list

yum list httpd

To display the list of available updates:

yum list updates

2. View installed packages


To print all the packages which are installed on your linux system, execute the
following command.

yum list installed

3. Search for package


Searching for a package to be installed helps you when the exact package name is
not known in
advance. The syntax for searching a package is

yum search package_to_be_searched

If you want to search for mysql package, then execute the following yum command.

yum search mysql

However, this yum command matches only the package name and summary. Use "search all" to
match
everything (including the description and URL).
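For example, to search the package name, summary, description and URL fields for mysql:

yum search all mysql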
4. How to install package using yum
"Yum install package_name" will install the specified package name in the linux
operating system.
The yum command will automatically finds the dependencies and also installs them in
the linux
machine.

yum install firefox.x86_64

Yum will prompt the user to accept or decline before installing the package. If
you want yum to
avoid prompting the user, then use the -y option with the yum command.

yum -y install package_name

5. Check package is installed or not.


After installing, you may want to verify whether the package is actually installed. To check
whether a package
is installed or not, run the below yum command.

yum list installed package_name

yum list installed firefox.x86_64

6. Updating existing package using yum


You can upgrade an older version of a package to the newer version by using the yum
command. To
check for updates and upgrade a package to the current version, run the
following yum
command.

yum update firefox.x86_64

7. Uninstalling a package.
You can remove (uninstall) a package along with all its dependencies using "yum remove
package_name". This is shown below:

yum remove firefox.x86_64

The yum remove command prompts the user to accept or decline uninstalling the package.
8. Information about the package.
You can print and check for the information about the package before installing it.
Execute the
following yum command to get info about the package.

yum info firefox.x86_64

9. Print what package provides the file.


You can find the name of the package that a file belongs to. For example, if you want
to know which
package the file '/etc/passwd' belongs to, run the below yum command.

yum whatprovides /etc/passwd

10. Print List of group software


Basically, related software packages are grouped together. You can install all the
packages belonging to a
single group in one shot. This helps you save the time of installing each
individual package. To
print the list of available package groups, run the below yum command.

yum grouplist
11. Installing a software group.
You can install the group software, by running the following command.

yum groupinstall 'software_group_name'

12. Update a software group.


You can update an installed software group from an older version to the latest version.
The yum
command for this is

yum groupupdate 'software_group_name'

13. Removing a software group


You can uninstall (delete) existing software group using 'yum groupremove'. This is
shown below:

yum groupremove 'software_group_name'

14. Print yum repository


You can display all the configured yum repositories using the below command.

yum repolist [all]

Here "all" is optional. If you provide "all", then it displays enabled and disabled
repositories.
Otherwise it displays only enabled repositories.
15. More info about yum command
If you want to know more information about the yum command, then run the man on yum
as

man yum
Find and Remove Files Modified / accessed N days ago - Unix / Linux

Q) How to find the files which were modified or accessed N or more days ago and
then delete those
files using unix or linux commands?
Searching for the files which were modified (or accessed) 10 or more days ago is a
common operation,
especially when you want to archive or remove older log files. Let us see this with
the help of an
example.
Consider the below list of files in the current directory:

ls -l

total 24

-rw-r--r-- 1 user group 67 Jul 2 01:39 unix_temporary

-rw-r--r-- 1 user group 238 Jul 2 03:00 linux_command.xml

-rw-r--r-- 1 user group 74 Jun 28 00:43 [Link]

-rw-r--r-- 1 user group 74 Jun 20 00:43 [Link]

Let us see today's date on my unix operating system by issuing the date command.
I am
providing this date just as a reference point for N days.

date

Mon Jul 2 [Link] PDT 2012

We can use the find command for searching the files modified N or more days ago.
The find
command for this is:

find . -mtime +N

As an example, let us list out the files modified more than 5 days ago. The unix command for
this is:

find . -mtime +5

./[Link]
./[Link]

We got the list of files. Next we have to delete these files. We have to use the rm
command in unix
for removing the files. One way of removing the files is piping the output of find
command to xargs.
This is shown below:

find . -mtime +5 | xargs rm

The find command itself has the capability of executing commands on the files
it lists. We have
to use the -exec option of the find command. The complete find command for deleting
the files
modified more than N days ago is

find . -mtime +5 -exec rm {} \;

Note: To remove the files based on access time use the -atime in the find command.
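For example, the equivalent command based on access time is shown below; it removes the files that were accessed more than 5 days ago:

find . -atime +5 -exec rm {} \;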

Replace string on Nth Line - Unix / Linux

Q) How to search for a string (or pattern) in a file and replace that matched
string with another string
only on the specified line number in the file?
Let us see this with the help of an example. Consider the sample file with the
following content:

> cat fedora_overview.dat

Fedora is an operating system based on linux kernal.

Fedora applications include LibreOffice, Empathy and GIMP.

Security features of Fedora is Security-Enhanced Linux.

Implements variety of security policies, access controls.

SELinux operating system was introduces in Fedora Core 2.


First we will see how to replace a pattern with another pattern using the sed command.
The sed
command for replacing the string "Fedora" with "BSD" is:

sed 's/Fedora/BSD/' fedora_overview.dat

The above sed command will replace the string on all the matched lines. In this
example, it replaces
the string on first, second, third and fifth lines.
If we want to replace the pattern on a specific line number, then we have to
specify the line number
to the sed command. The sed command syntax for replacing the pattern on Nth line
is:

sed 'n s/search_pattern/replace_pattern/' filename

To replace "Fedora" with "BSD" on second line, run the below sed command on the
unix terminal:

sed '2 s/Fedora/BSD/' fedora_oveview.dat
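The same syntax also accepts a range of line numbers. For example, the below sed command (a small extension of the above) replaces the pattern only on lines 1 through 3:

sed '1,3 s/Fedora/BSD/' fedora_overview.dat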

Delete Range of Lines - Unix / Linux

Q) How to delete a range of lines from a file using a unix or linux command?
Unix provides a simple way to delete lines whose line numbers are between
m and n. This
feature is not directly available in the windows operating system.
We can use the sed command for removing the lines. The syntax for removing a range of
lines
(between m and n) is:

sed 'm,nd' filename

Here the number n should be greater than m. Let us see this with an example. Consider
the sample file
with the following contents:

> cat linux_range.dat

Linux is just like unix operating system.


Linux is leading os on servers like mainframes and super computers.

Linux also runs on embeded systems like network routers, mobiles etc.

Android is built on using linux kernal system.

Variants of linux are debian, fedora and open SUSE.

From the above file if we want to delete the lines from number 2 to 4, then run the
below sed
command in unix:

sed '2,4d' linux_range.dat

However, this command just prints the remaining lines on the terminal and does not remove
them from the file. To
delete the lines from the source file itself, use the -i option with the sed command.

sed -i '2,4d' linux_range.dat

You can negate this operation and delete the lines that are not in the specified
range. This is shown
in the following sed command:

sed -i '2,4!d' linux_range.dat

Remove Last Line or Footer Line - Unix / Linux

Q) How to delete the trailer line (last line) from a file using a unix or linux
command?
Let us see how to remove the last line from a file with an example. Consider the file
with sample content
as shown below:

> cat unix_file.txt

Unix is a multitasking, multi-user operating system.

Unix operating system was first developed in assembly language.

In later periods unix is developed with C programming.


In academics BSD variant of unix is used.

Mostly used unix flavors are Solaris, HP-UX and AIX.

Unix Sed command is popularly used for searching a pattern and then replacing the
matched pattern
with another string. However we can also use the sed command for deleting the lines
from a file.
To remove the last line from a file, run the below sed command:

sed '$d' unix_file.txt

Here $ represents the last line in a file. d is for deleting the line. The above
command will display the
contents of the file on the unix terminal excluding the footer line. However it
does not delete the line
from the source file. If you want the line to be removed from the source file
itself, then use the -i
option with sed command. This command is shown below:

sed -i '$d' unix_file.txt

If you want only the footer line to be present in the file and remove the other lines
from the file, then you
have to negate the delete operation. For this, use the exclamation (!) before the d.
This is shown in
the following sed command:

sed -i '$!d' unix_file.txt

Delete First Line or Header Line - Unix / Linux

Q) How to remove the header line (first line) from a file using a unix or linux
command?
Let us see how to delete the first line from a file with an example. Consider the file
with sample content
as shown below:

> cat linux_file.txt

First line a file is called header line.

Remaining lines are generally called as detail or data lines.


Another detail line which is in the third row.

The last line in a unix file is called footer line.

Mostly we see the sed command for replacing the strings in a file. We can also use
the sed
command for removing the lines in a file. To delete the first line from a file, run
the following sed
command:

sed '1d' linux_file.txt

Here 1 represents the first line and d is for deleting the line. The above command
will print the contents
of the file on the unix terminal after removing the first line. However, it does not
remove the
line from the source file. If you want the change in the source file itself, then
use the -i option with the
sed command. This command is shown below:

sed -i '1d' linux_file.txt

You can keep only the first line and remove the remaining lines from the file by
negating the above
sed command. You have to use the exclamation (!) before the d command. The
following sed
command keeps only the first line in the file and removes the other lines:

sed -i '1!d' linux_file.txt

Print Lines Ending with String - Unix / Linux

Q) How to display the lines from a file that end with a specified string (or
pattern) using unix or linux
commands?
Printing the lines that end with a specified pattern on the terminal is a
commonly used operation
in a unix environment. Grep is the most frequently used command in unix for searching for a
pattern in a file
and printing the lines that contain the specified pattern.
We will see how to print the lines that end with a specified pattern with an
example. Consider the
sample log file data shown below:

> cat unix_os_install.dat


How to install unix virtual machine on windows operating system

First download virtual box or vmware and install it on windows

Next place the ubuntu CD and follow the instructions in vmware

For running ubuntu allocate at least 512MB RAM in vmware

After installing, start the unix operating system

Now if we want to print the lines that end with the string "vmware", then use the
grep command with
dollar ($) in the pattern. The complete unix grep command is shown below:

grep "vmware$" unix_os_install.dat

Here $ is used to indicate the end of the line in a file. This grep command prints
the lines that end
with the word "vmware". In this example, the third and fourth lines are printed on
the unix terminal.
If you want to display the lines that end with the word "system", then run the
following grep command
on unix command prompt:

grep "windows$" unix_os_install.dat

Search (match) for Whole Words in File - Unix / Linux

Q) How to print the lines from a file that contain the specified whole word using
unix or linux
command.
Whole words are complete words which are not part of another string. As an example
consider the
sentence, "How to initialize shell". Here the words "how, to, shell, initialize"
are whole words.
However the word "initial" is not a whole word as it is part of another string
(initialize).
Let us see this in detail with the help of an example. Consider the following sample
data in a file:

> cat unix_word.txt

matching for whole words in file is easy with unix

use the unix grep command match for a pattern.


Another example of whole word is: boy's

Here boy is a whole word.

Now we have the sample file. First we will see how to search for a word and print
the lines with the
help of grep command in unix. The following grep command prints the lines that have
the word
"match" in the line:

grep "match" unix_word.txt

The above command displays the first two lines on the unix terminal. Even though
the first line does
not contain the whole word "match", the grep command displays the line as it
matches for the word
in the string "matching". This is the default behavior of grep command.
To print only the lines that contain the whole words, you have to use the -w option
to the grep
command. The grep command for this is:

grep -w "match" unix_word.txt

Now the above command displays only the second line on the unix terminal. Another
example, for
matching the whole word "boy", is shown below:

grep -w "boy" unix_word.txt

Print Non Matching Lines (Inverse of Grep Command) - Unix / Linux

Q) How to print the lines from a file that do not contain a specified pattern
using a unix / linux
command?
The grep command in unix by default prints the lines from a file that contain the
specified pattern.
We can use the same grep command to display the lines that do not contain the
specified pattern.
Let us see this with the help of an example.
Consider the following sample file as an example:

> cat unix_practice.txt

You can practice unix commands by installing the unix operating system.
There so many unix flavors available in the market.

ubuntu is one of the used operating system. it is available for free.

Go to the ubuntu website and download the OS image.

Alternatively you can order for a free CD.

The question is how to start learning unix.

First know about the unix operating system in detail.

Then start slowly learning unix commands one by one.

Practice these unix command daily to have a grip.

First we will see how to display the lines that match a specified pattern. To print
the lines that contain
the word "ubuntu", run the below grep command on unix terminal:

grep "ubuntu" unix_practice.txt

The above command displays the third and fourth lines from the above sample file.
Now we will see how to print non matching lines which means the lines that do not
contain the
specified pattern. Use the -v option to the grep command for inverse matching. This
is shown below:

grep -v "ubuntu" unix_practice.txt

This command prints all the lines from the example file except the third and fourth lines.

Print Lines Starting with String - Unix / Linux

Q) How to print the lines from a file that start with a specified string (pattern)
using unix or linux
commands?
Displaying lines that start with a specified pattern is most commonly needed when
processing log files in a
unix environment. Log files are used to store the messages of shell scripts (echo
statements). We
can search for errors in the log file using the grep command. Generally, the error
keyword will appear at
the start of the line.
Sample log file data is shown below:

> cat linux_log_file.dat

Success: Shell script execution started.

Success: Exported the environment variables in shell environment.

Success: Able to connect to oracle DB from bash script.

Success: Run the SQL statement and inserted rows into table.

Error: Unable to process the stored procedure.

Message: Processing the statements in shell scripts stopped due to Error.

Message: Script failed. Aborting the bash script to avoid further errors.

Now if we want to get the lines that start with the string "Error", then use the
grep command with
anchor (^) in the pattern. The complete unix grep command is shown below:

grep "^Error" linux_log_file.dat

Here ^ is used to specify the start of the line in a file. This grep command will
display the lines that
start with the word "Error". In this example, the fifth line starts with the
specified pattern (Error).

Find the Count of Lines Matching the Pattern- Unix /Linux

Q) How to print the count of lines from a file that match a specified
pattern in a unix or
linux operating system?
Let us say you are looking for the word "unix" in a file and want to display the count
of lines that contain
the word "unix". We will see how to find the count with an example.
Assume that I have a file (unix_sample.dat) in my unix operating system. Sample
data from the file is
shown below:

> cat unix_sample.dat


Monitoring your hosting on unix server is very important.

Otherwise you don't know whether the unix server is running fine or not.

Use monitoring tools or email alerts to get the status of the unix server.

Send the status, logs in the email to your email Id.

In the sample data, the word "unix" appears in two lines. Now we will print this
count on unix terminal
using the commands in unix.
1. Using wc command.
We can pipe the output of grep command to wc command to find the number of lines
that match a
pattern. The unix command is

grep "unix" unix_sample.dat | wc -l

2. Using grep -c option


The grep command has the -c option to find the count of lines that match a pattern.
The grep
command for this is

grep -c "unix" unix_sample.dat

Remove Empty Lines from a File in Unix / Linux

Unix commands can be used to remove the empty (blank) lines from a file. Let us see
this with the help
of an example.
Consider the following data file as an example:

> cat linux_hosting.dat

Hosting a website on linux operating system

helps your site guarding from viruses. Host website on a


dedicated linux server to reduce the load on the server.

This improves the performance and provides good uptime of the site.

The above sample file contains three empty lines. We will see how to remove these
blank lines with
the help of unix / linux commands.
1. Remove empty lines with Grep command.
The grep command can be used to delete the blank lines from a file. The command is
shown below:

grep -v "^$" linux_hosting.dat

Here ^ specifies the start of the line and $ specifies the end of the line. The -v
option inverses the
match of the grep command.
2. Delete blank lines with Sed command
The sed command can also be used to remove the empty lines from the file. This
command is
shown below:

sed '/^$/d' linux_hosting.dat

The difference between sed and grep here is: the sed command deletes the empty
lines and prints the
remaining lines on the unix terminal, whereas the grep command matches the non-empty
lines and
displays them on the terminal.
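Note that lines containing only spaces or tabs look blank but are not truly empty, so the patterns above will not remove them. A slightly broader sketch that also handles such whitespace-only lines:

grep -v "^[[:space:]]*$" linux_hosting.dat

sed '/^[[:space:]]*$/d' linux_hosting.dat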

Print N Lines After a Pattern Match - Unix / Linux

The unix grep command can be used to print the lines from a file that match a
specified pattern. The
grep command has an option for printing the lines around the line that matches the
pattern. Here we
will see how to display N lines after a matching line with the help of an example.
Consider the below data file:
> cat linux_enterpise.dat

The advantage of linux operating system is its flexibility.

You can use linux in various systems from mobile phones to space crafts.

You have to choose a best linux operating system.

Things involved in choosing are hardware, scale of software program etc.

Some Os are ubuntu, fedora, linux mint, puppy linux etc.

First of all we will see the general syntax for displaying N lines after the
matched line. The syntax of
grep command is:

grep -A N "pattern" filename

This grep command will display N lines after the matched line and also prints the
matched line. Now
we will try to display the 2 lines after the line that contains the word
"flexibility". The grep command
for this is

grep -A 2 "flexibility" linux_enterpise.dat

This grep command will print first, second and third lines from the above file.

Print Lines before a Pattern Match - Unix / Linux

We know how to use the unix grep command to display the lines from a file that
match a pattern.
However, we can also use the grep command to display the lines around the line that matches
the pattern.
We will see this with the help of an example.
Consider the below data file which talks about the importance of online backup:

> cat online_backup.dat

The most important concern is to keep your documents safe and secure
in a protected place. There are so many companies which offer online

backup services. However selecting a good online backup service is

important. The companies should offer a free trial of backup service.

Use a backup software to take backup and restore of your data.

First, we will see the general syntax for displaying N lines before the matched
line. The syntax of
grep command is:

grep -B N "pattern" filename

This grep command will display N lines before the matched line and also prints the
matched line.
Now we will try to print the 2 lines before the line that contains the word
"important". The grep
command for this is

grep -B 2 "important" online_backup.dat

This will display the first, second, third and fourth lines from the above file, since the word "important" appears in both the first and fourth lines.

Grep String in Multiple Files - Unix / Linux

The unix grep command is used to search for a pattern in each line of a file and, if
it finds the
pattern, it displays the line on the terminal. We can also use the grep command to
match a pattern
in multiple files.
We will see this with the help of an example. Let us consider the two files shown below:

> cat webhost_online.dat

There are so many web hosting companies

which provides services for hosting a website

> cat webhost_trail.dat

You can go for trail before choosing a web hosting company.


Once you are happy with the free web hosting trail

then you can host your website there.

To grep for the word "hosting" from these two files specify both the file names as
space separated
list in grep command. The complete command is

grep hosting webhost_online.dat webhost_trail.dat

The output of the above grep command is

webhost_online.dat:There are so many web hosting companies

webhost_online.dat:which provides services for hosting a website

webhost_trail.dat:You can go for trail before choosing a web hosting company.

webhost_trail.dat:Once you are happy with the free web hosting trail

This will display the filename along with the matching line. Instead of specifying
each file name, you
can specify a wildcard pattern for the filename. Let us say you want to
grep for the word
"company" in all the files whose name starts with "webhost_"; you can use the below
grep command:

grep company webhost_*

Case insensitive Grep Command - Unix / Linux

Q) How to make the grep command case-insensitive when searching for a pattern in a
file?
Let us see how to do this with an example. Consider the below "Car insurance" data
file:

> cat car_insurance.dat

Tom bought a new car and confused about car insurance quotes.

He is worrying which car insurance policy he should take.


So tom went to an insurance company and asks for clear explanation of the
policies.

The insurance guy then explains about various policies in detail.

Now TOM gets an idea and chooses the right insurance for him.

In the above file, you can see the name "tom" appears in different cases (upper
case, lower case
and mixed case).
If I want to display the lines that contain the pattern tom with ordinary grep
command, it will display
only the third line. The grep command is shown below:

grep tom car_insurance.dat

To make this grep command case-insensitive, use the -i option with the command. Now it
will display
the first, third and fifth lines from the file. The case-insensitive grep command
is

grep -i tom car_insurance.dat

How to Send Mail From Shell Script

We write automated scripts to perform scheduled tasks and put them in the crontab.
These automated
scripts run at their scheduled times. However, we don't know whether the scripts
succeeded or
not. So sending an email from automated bash scripts on a unix host helps us to know
whether the
script succeeded or not.
Here we will see a simple bash script to send emails using the mail command in a linux
operating
system.

#!/bin/bash

TO_ADDRESS="recipient@[Link]"

FROM_ADDRESS="sender"

SUBJECT="Mail Server Hosting Demo"


BODY="This is a linux mail system. Linux is one of the email operating
systems which can be used to send and receive emails."

echo "${BODY}" | mail -s "${SUBJECT}" "${TO_ADDRESS}" -- -r "${FROM_ADDRESS}"

From the names of the variables you can easily understand the significance of each.
In the mail
command, -s represents the subject. For the from address, by default the hostname of the
unix / linux
machine you are logged into is appended to the sender name. For example, if you have logged
into a unix host
"[Link]" and specified the from address as "test", then your complete from
address will be
"test@[Link]".
In the above bash script we specified the body from a variable and did not specify
any attachments. We
will enhance the above script to attach a file, to read the body from a file and to
specify a list of users in
CC. The enhanced mail script is shown below:

#!/bin/bash

TO_ADDRESS="recipient@[Link]"

FROM_ADDRESS="sender"

SUBJECT="linux mail send attachment example"

BODY_FILE="[Link]"

ATTACHMENT_FILE="[Link]"

CC_LIST="user1@[Link];user2@[Link];user3@[Link];user4@cheetahmail.
com"

uuencode ${ATTACHMENT_FILE} | mail -s ${SUBJECT} -c ${CC_LIST} ${TO_ADDRESS}


-- -r ${FROM_ADDRESS} < ${BODY_FILE}

The uuencode command is used to attach the file when using the mail command; it takes the
file to encode and the name to give it in the mail. Here the -c option of the mail command is
used to specify the list of users in the cc list, and the body file is included ahead of the
encoded attachment.
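Since the goal stated at the beginning is to know whether an automated script succeeded or failed, the below sketch ties the two together (the backup script name, log file path and recipient address are just placeholders):

#!/bin/bash

TO_ADDRESS="recipient@example.com"
LOG_FILE="/var/tmp/logs/nightly_backup.log"

# run the scheduled task and capture its output
/usr/local/bin/nightly_backup.sh > ${LOG_FILE} 2>&1

if [ $? -eq 0 ]; then
  echo "Backup completed successfully" | mail -s "Backup SUCCESS" ${TO_ADDRESS}
else
  # send the captured log as the mail body on failure
  mail -s "Backup FAILED" ${TO_ADDRESS} < ${LOG_FILE}
fi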
Connect to Oracle Database in Unix Shell script

Q) How to connect to oracle database and run sql queries using a unix shell script?

The first thing you have to do to connect to an oracle database from a unix machine is to
install the oracle
database client/drivers on the unix box. Once installed, test whether you are able to
connect to the
database from the command prompt. If you are able to connect to the database,
then everything
is fine.
Here I am not going to discuss how to install the oracle database drivers. I am
just providing the
shell script which can be used to connect to the database and run sql statements.
The following shell script connects to the scott schema of the oracle database and
writes the
output to the "[Link]" log file.

#!/bin/bash

LogDirectory='/var/tmp/logs'

DataDirectory='/var/tmp/data'

DBUSER='scott'

DBUSERPASSWORD='tiger'

DB='oracle'

sqlplus -s <<EOF > ${LogDirectory}/[Link]

${DBUSER}/${DBUSERPASSWORD}@${DB}

set linesize 32767

set feedback off

set heading off

select * from dual;


EOF

If the sql statements fail to run, then the errors are written to the same
"[Link]" log file. A better
solution is to write the sql statement output to one file and the errors to
another file. The below
script uses the spooling concept in oracle to write the data to a separate file:

#!/bin/bash

LogDirectory='/var/tmp/logs'

DataDirectory='/var/tmp/data'

DBUSER='scott'

DBUSERPASSWORD='tiger'

DB='oracle'

sqlplus -s <<EOF > ${LogDirectory}/[Link]

${DBUSER}/${DBUSERPASSWORD}@${DB}

set linesize 32767

set feedback off

set heading off

spool ${DataDirectory}/query_output.dat

SELECT * from dual;

spool off

EOF
Here the output of the select statement is written to the "query_output.dat" file.
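To know from the shell whether the sql statements succeeded, you can make sqlplus return a non-zero exit code on errors. The below sketch (credentials, connect string and file names are placeholders) uses WHENEVER SQLERROR for that purpose:

#!/bin/bash

DBUSER='scott'
DBUSERPASSWORD='tiger'
DB='oracle'
LOG_FILE='/var/tmp/logs/sql_run.log'
DATA_FILE='/var/tmp/data/query_output.dat'

sqlplus -s ${DBUSER}/${DBUSERPASSWORD}@${DB} <<EOF > ${LOG_FILE} 2>&1
whenever sqlerror exit sql.sqlcode
set linesize 32767
set feedback off
set heading off
spool ${DATA_FILE}
select * from dual;
spool off
exit
EOF

if [ $? -ne 0 ]; then
  echo "SQL run failed, check ${LOG_FILE}"
fi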

Delete all lines in VI / VIM editor - Unix / Linux

Q) How to delete all the lines in a file when it is opened in the VI or VIM editor?
Those who are new to unix will use dd to delete each and every line to empty
the file. There is an
easier way to delete all the lines in a file when it is opened in an editor.
Follow the below steps to empty a file:

. Go to command mode in the editor by pressing the ESC key on the keyboard.


. Press gg. It will take you to the first line of the file.
. Then press dG. This will delete from the first line to the last line.

See how simple it is to remove all the lines in a file. We will also see how to empty
a file when it is not
opened in an editor. In unix, /dev/null is an empty stream; you can use it to
empty a file. The
following command shows how to empty a file:

cat /dev/null > file

How to Read Lines using loops in Shell Scripting

Q) How to read each line from a file using loops in bash scripting?
Reading lines from a file and then processing each line is a basic operation in
shell scripting. We
will see here how to read each line from a file using for and while loops in bash
scripting.
Read Line using While Loop:
The below bash script reads lines from the file "[Link]" using a while loop and
prints each line on the
terminal:

#!/bin/bash

i=1

while read LINE

do
echo $i $LINE

i=`expr $i + 1`

done < [Link]

Here the variable i is just used to represent the line number.


Read Line using For Loop:
The following shell script reads lines using a for loop from the file [Link]. Note that by
default the for loop splits the input on any whitespace, so IFS is set to a newline here to get
whole lines:

#!/usr/bin/bash

n=1

IFS=$'\n'

for y in `cat [Link]`

do

echo $n $y

n=`expr $n + 1`

done

Examples of Arrays in Awk Command - Unix / Linux

The awk command in unix has a rich set of features. One of the features is that it can store
elements in
arrays and process the data in those elements. Here we will see how to use arrays
in the awk
command with examples.
Examples of Arrays in Awk Command:
1. Finding the sum of values
I want to find the sum of the values in the first column of all the lines and display
it on the unix or linux
terminal. Let us say my file has the below data:

> cat [Link]

10
20

30

After summing up all the values, the output should be 60. The awk command to sum
the values
without using the arrays is shown below:

awk 'BEGIN {sum=0} {sum=sum+$1} END {print sum}' text

Here I have used a variable to store the sum of values. At the end, after summing up all the values, the sum is printed on the terminal.
The awk command to find the sum of values by using arrays is shown below:

awk '{arr[NR]=arr[NR-1]+$1} END {print arr[NR]}' text

Here an array is used to store the sum of values. Basically this array stores the cumulative sum of the values; at the end it contains the total, which is displayed on the terminal.
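Awk arrays are associative, so they become even more useful when values need to be grouped by a key. As a small sketch (assuming a hypothetical two-column file, salaries.txt, where the first column is a department id and the second a salary), the following prints the sum of the second column for each distinct value of the first column:

awk '{sum[$1] = sum[$1] + $2} END {for (key in sum) print key, sum[key]}' salaries.txt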
2. Ranking values in a file.
Let's say I have a source file which contains the employees data. This file has three fields: the first field is department_id, the second one is the employee name and the third one is the salary. Sample data from the file is shown below:

> cat [Link]

10, AAA, 6000

10, BBB, 8000

10, CCC, 6000

20, DDD, 4000

20, EEE, 2000

20, FFF, 7000


Now I want to assign ranks to the employees in each department based on their salary. The output should look as

20, FFF, 7000, 1

20, DDD, 4000, 2

20, EEE, 2000, 3

10, BBB, 8000, 1

10, AAA, 6000, 2

10, CCC, 6000, 2

Here employees AAA and CCC got the same rank as their salaries are the same. To solve this problem, first we have to sort the data and then pipe it to the awk command. The complete command is shown below:

sort -nr -k1 -k3 text |
awk -F"," '{
department_array[NR]=$1;
salary_array[NR]=$3;
if (department_array[NR] != department_array[NR-1])
    rank_array[NR]=1;
else if (salary_array[NR] == salary_array[NR-1])
    rank_array[NR]=rank_array[NR-1];
else
    rank_array[NR]=rank_array[NR-1]+1;
print department_array[NR]","$2","salary_array[NR]","rank_array[NR];
}'

The above command is split across multiple lines for readability. Since the whole awk program is enclosed in single quotes, it can be typed exactly as shown, or collapsed into a single line.

Create Tunnel in Unix using Putty

Q) How to create a tunnel to access network resources (Internet) through a remote unix machine using the putty client?
You might have faced situations where you want to open a website from your browser and the website URL is blocked by your company. This happens especially in software companies. Here I will show you how to open a blocked website by creating a tunnel.
The software required to create a tunnel is:

. Putty client tool


. Mozilla firefox browser
. Access to remote unix server. This server should be capable of opening any
website.

Creating a Tunnel in Unix


Follow the below steps to create a tunnel

. Open the putty client tool. Enter the remote unix hostname in the "Host Name (or IP address)" field. In this demo I have entered the hostname as "[Link]".
. To save this hostname, enter a name like "Tunnel" in the "Saved Sessions" place.
This is shown
in the below image:
[Image: tunnel_putty1.jpg]
[Image: tunnel_putty2.jpg]

. On the left side of the client, you can see a navigation panel. Go to SSH->
Tunnels.
. Again enter the remote hostname ([Link]) in "Destination" section.
. Enter the source port as 1100 (any value you prefer) and check the Dynamic
option. This is
shown below:

. Now click on Add. Go back to the previous window by clicking on Session in the left side panel. Here click on Save. It will save your tunnel details.
. Open this tunnel and enter your remote machine login details. Do not close this unix session. If you close it, your tunneling won't work.
[Image: mozilla_firefox_tunnel.jpg]
. Open the Mozilla Firefox browser. Go to Tools->Options->Advanced->Network->Settings.
. In the settings, check the manual proxy configuration option, enter the SOCKS host as localhost and the port as 1100 (the same port specified in the tunnel configuration) and click on OK. This is shown in the below image.

Now you can open any website with this approach provided your remote host has
access.
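If you are connecting from a unix or linux desktop instead of the putty client, the same kind of dynamic (SOCKS) tunnel can be created with the OpenSSH client, and the browser proxy settings described above stay the same. Replace the user name and host below with your own details:

ssh -D 1100 username@remote-host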

VIM Editor - Save and Quit Options

VIM is a powerful editor in unix or linux, and it has a large number of features. Here we will see the options for saving and quitting from the vim editor.
The following options work in the command mode of the VIM editor. To go to the command mode, press the ESC key on the keyboard and then type the below commands:

. :w ->Saves the contents of the file without exiting from the VIM editor
. :wq ->Saves the text in the file and then exits from the editor
. :w filename -> Saves the contents of the opened file into the specified filename. However it won't save the changes to the current file.
. :x -> Saves changes to the current file and then exits. Similar to the :wq
. :m,nw filename -> Here m and n are numbers. This option will write the lines from
the specified
numbers m and n to the mentioned filename.
. :q -> Exits from the current file only if you did not do any changes to the file.

. :q! -> Exits from the current file and ignores any changes that you made to the
file.
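As an example of the :m,nw option, the following command (the output file name part.txt is just an illustration) writes lines 10 through 20 of the current file into part.txt without modifying the current file:

:10,20w part.txt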

Dirname Command Examples in Unix / Linux

The unix dirname command strips non-directory suffix from a file name.
The syntax of dirname command is
dirname NAME

The dirname command removes the last /component from the NAME and prints the remaining portion. If the NAME does not contain a / component, then it prints '.' (which means the current directory).
Dirname command is useful when dealing with directory paths in unix or linux
operating systems.
Some examples on dirname command are shown below:
Dirname Command Examples:
1. Remove the file name from absolute path.
Let's say my directory path is /usr/local/bin/[Link]. Now I want to remove /[Link] and display only /usr/local/bin. We can use the dirname command for this.

> dirname /usr/local/bin/[Link]

/usr/local/bin

2. dirname [Link]
Here you can see that the NAME does not contain the / component. In this case the
dirname
produces '.' as the output.

> dirname [Link]

.

Note: The directories and filenames which I have passed as arguments to the dirname command in the above examples are just strings. These directories and files do not need to exist on the unix machine.
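A common use of dirname inside shell scripts is to find the directory from which a script was invoked. A minimal sketch (the variable name is only illustrative):

#!/bin/bash

# $0 holds the path used to invoke the script; dirname strips off the script name
SCRIPT_DIR=`dirname "$0"`
echo "This script was invoked from: $SCRIPT_DIR"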

Split Command Examples in Unix / Linux

The Split command in unix or linux operating system splits a file into many pieces
(multiple files). We
can split a file based on the number of lines or bytes. We will see how to use the
split command with
an example.
As an example, let's take the below text file as the source file which we want to split:
> cat textfile

unix linux os

windows mac os

linux environment

There are three lines in that file and the size of the file is 47 bytes.
Split Command Examples:
1. Splitting file on number of lines.
The Split command has an option -l to split the file based on the number of lines.
Let say i want to
split the text file with number of lines in each file as 2. The split command for
this is

split -l2 textfile

The new files created are xaa and xab. Always the newly created (partitioned) file
names start with
x. We will see the contents of these files by doing a cat operation.

> cat xaa

unix linux os

windows mac os

> cat xab

linux environment

As there are only three lines in the source file, we got only one line in the last created file.
2. Splitting file on the number of bytes
We can use the -b option to specify the number of bytes that each partitioned file should contain.
As an example we will split the source files on 10 bytes as
split -b10 textfile

The files created are xaa, xab, xac, xad, xae. The first four files contain 10
bytes and the last file
contains 7 bytes as the source file size is 47 bytes.
3. Changing the newly created file names from character sequences to numeric
sequences.
So far we have seen that the newly created file names are created in character sequences like xaa, xab and so on. We can change this to a numeric sequence by using the -d option as

split -l2 -d textfile

The names of the new files created are x00 and x01.
4. Changing the number of digits in the sequence of filenames.
In the above example, you can observe that the sequences have two digits (00 and
01) in the file
names. You can change the number of digits in the sequence by using the -a option
as

split -l2 -d -a3 textfile

Now the files created are x000 and x001
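You can also replace the default x prefix with your own prefix by passing it as the second argument to split. For example, the following command (the prefix part_ is just an illustration) creates part_aa and part_ab:

split -l2 textfile part_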

Swap Fields (strings) in a File - Unix / Linux

First we will see how to swap two strings in a line and then we will see how to
swap two columns in a
file.
As an example, consider the text file with below data:

unix linux os

windows mac os

Swap Strings using Sed command:


Let's see how to swap the words unix and linux using the sed command in a unix or linux environment. The sed command to swap the strings is shown below:
> sed 's/\(unix\) \(linux\)/\2 \1/' textfile

linux unix os

windows mac os

The parentheses are used to remember the pattern. \1 indicates first pattern and \2
indicates second
pattern.
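If you do not want to hard-code the two words, a more general sed command swaps the first two whitespace-separated fields on every line (so the second line becomes "mac windows os"):

sed 's/\([^ ]*\) \([^ ]*\)/\2 \1/' textfile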
Swap Fields using Awk command:
From the above file structure, we can observe that the file is in format of rows
and columns where
the columns are delimited by space.
Awk command can be used to process delimited files. Awk command to swap the first
two fields in a
file is

> awk '{$0=$2" "$1" "$3; print $0}' textfile

linux unix os

mac windows os

Another way using awk is

awk '{print $2" "$1" "$3}' textfile

How to replace braces symbols in Unix / Linux

Q) My log file contains the braces symbols '(' and ')'. I would like to replace the
braces with empty
string. Sample data in the log file is shown below:

> cat logfile

Error - (unix script failed)

The output should not contain the braces and the data should look as
Error - unix script failed

How can I achieve this using unix or linux commands?


Solution:
1. Replacing using tr command
We can use the tr command to delete characters in a file. The deleting of strings
using tr command
is shown below:

tr -d '()' < logfile

2. Replacing using sed command
The sed command is popularly used for replacing text in a file with other text. The sed command is

sed 's/[()]//g' logfile

Another way of replacing is using sed with pipes in unix:

sed 's/(//g' logfile | sed 's/)//g'

Split File Data into Multiple Files - Unix / Linux

Q) I have a file with 10000 lines in a unix or linux operating system. I want to split this file and create 10 files such that each file has 1000 lines. What I mean is the first 1000 lines should go into one file; the next 1000 lines should go into another file and so on. How can I do this using unix commands?
Solution:
Unix has the split command which can be used to partition the data in a file into
multiple files. The
command to split a file based on the number of lines is shown below:

split -l 1000 filename


The above split command splits the file such that each file has 1000 lines. Here
the option l indicates
the number of lines. You can split the file based on number of bytes using the -b
option.

split -b 1024 filename

By default, the partitioned filenames start with x, like xaa, xab, xac and so on. Instead of alphabetical sequences, you can use numeric sequences in the filenames, like x00, x01, by using the -d option.

split -l 1000 -d filename

You can specify the number of digits to be used in the numeric sequences with the
help of -a option.

split -l 1000 -d -a 3 filename

Examples: Let say i have a text file with 4 lines. The data in the file is shown
below:

> cat textfile

unix is os

linux environment

centos

red hat linux

We will run the split command for each of the points discussed above and see what
files will be
created.

> split -l 2 textfile

Files: xaa, xab

> split -b 10 textfile


Files: xaa, xab, xac, xad, xae

> split -l 2 -d textfile

Files: x00, x01

> split -l 2 -d -a 3 textfile

Files: x000, x001

Remove Last character in String - Unix / Linux

Q) I have a file with a bunch of lines. I want to remove the last character from each line in that file. How can I achieve this in a unix or linux environment?
Solution:
1. SED command to remove last character
You can use the sed command to delete the last character from a text. The sed
command is

sed 's/.$//' filename

2. Bash script
The below bash script can be used to remove the last character in a file.

#! /bin/bash

while read LINE

do

echo ${LINE%?}

done < filename


3. Using Awk command We can use the built-in functions length and substr of awk
command to
delete the last character in a text.

awk '{$0=substr($0,1,length($0)-1); print $0}' filename

4. Using rev and cut command We can use the combination of reverse and cut command
to remove
the last character. The command is shown below:

rev [Link] | cut -c2- |rev

Convert Multiple Rows into Single Row - Unix/Linux

Q) I have a products data in the text file. The data in the file look as shown
below:

> cat [Link]

iphone

samsung

nokia

yahoo

google

aol

amazon

ebay

walmart

Now my requirement is to group each 3 consecutive rows into a single row and
produce a comma
separated list of products. The output should look as

iphone,samsung,nokia

yahoo,google,aol
amazon,ebay,walmart

How can this be implemented using unix or linux commands in different ways?


Solution:
1. One way we can implement this is using the awk command. The complete awk command
is
shown below:

awk '{printf("%s%s",$0,NR%3?",":"\n")}' [Link]

2. Another way is using the paste command. The solution using the paste command is
shown below.

paste -d, - - - < [Link]
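3. A third variant, assuming the product names contain no spaces, is to group three items per line with xargs and then turn the separating spaces into commas. The file name products.txt is just an illustration:

xargs -n3 < products.txt | tr ' ' ','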

Awk Command to Split Column into Row - Unix/Linux

Awk command to split list data in a column into multiple rows - Unix/Linux
Q) I have a flat file in the unix or linux environment. The data in the flat file
looks as below

> cat [Link]

Mark Maths,Physics,Chemistry

Chris Social

Henry Biology, Science

The flat file contains the list of subjects taken by the students in their curriculum. I want the subject list in the second column to be split into multiple rows. After splitting, the data in the target should look as:

Mark Maths

Mark Physics
Mark Chemistry

Chris Social

Henry Biology

Henry Science

Write a command in the UNIX or LINUX operating system to produce this result.


Solution:
We can use the AWK command which can process the files with table like structures.
The solution to
the problem using Awk command is

awk '{n=split($2,s,",");for (i=1;i<=n;i++) {$2=s[i];print}}' [Link]

Search and Grep for text in unix/linux

We will see how to search for files and then grep for a string of text in those files. First I will use the find command in unix or linux to search for the regular files in the current directory. The find command to search for the regular files in the current directory is shown below:

> find . -type f

./docs/[Link]

./[Link]

./sample

Now we will grep for a particular word in these files and display only the filenames that have the matching word. The unix command is shown below:

> find . -type f -exec grep -l word {} \;


The above command just displays the filenames that have the specified word. Now we will try to display the lines from the files that have the matching word. The unix command for this is:

> find . -type f -exec grep -l word {} \; -exec grep word {} \;

If you want to put a blank line between the results of the above command, print an empty line using echo. The complete unix command is

> find . -type f -exec grep -l word {} \; -exec grep word {} \; -exec echo \;

The above example shows how to use multiple greps with the find command in unix or linux.
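If your grep supports recursive search (GNU grep on Linux does), the same list of matching files can be produced without find:

grep -rl word .

Dropping the -l option prints the matching lines along with the file names instead of just the file names.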

Range Partitioning Examples - Oracle

Range partitioning is a partitioning technique where ranges of data are stored in separate partitions.
MAXVALUE is offered as a catch-all for values that exceed the specified ranges. Note that NULL values are treated as greater than all other values except MAXVALUE.
Range Partitioning Examples:
1. Range Partition on numeric values

Create table sales (
sale_id number,
product_id number,
price number
)
PARTITION BY RANGE(sale_id) (
partition s1 values less than (3000) tablespace ts1,
partition s2 values less than (10000) tablespace ts2,
partition s3 values less than (MAXVALUE) tablespace ts3
);

2. Range Partition on Strings

Create table products (
product_id number,
product_name varchar2(30),
category varchar2(30)
)
PARTITION BY RANGE(category) (
partition c1 values less than ('I') tablespace ts1,
partition c2 values less than ('S') tablespace ts2,
partition c3 values less than (MAXVALUE) tablespace ts3
);

3. Range Partition on Dates

Create table orders (
order_id number,
order_date date
)
PARTITION BY RANGE(order_date) (
partition o1 values less than (to_date('01-01-2010','DD-MM-YYYY')) tablespace ts1,
partition o2 values less than (to_date('01-01-2011','DD-MM-YYYY')) tablespace ts2,
partition o3 values less than (to_date('01-01-2012','DD-MM-YYYY')) tablespace ts3,
partition o4 values less than (MAXVALUE) tablespace ts3
);

Tuning Lookup Transformation - Informatica

Q) How to tune lookup transformation to improve the performance of the mapping?


This is a frequently asked question in informatica interview. Follow the below
steps to tune a lookup
transformation:
Cache the lookup transformation: This queries the lookup source once and stores the data in the cache. Whenever a row enters the lookup, the lookup retrieves the data from the cache rather than querying the lookup source again. This improves the performance of the lookup a lot.
Restrict Order By columns: By default, the integration service generates an order by clause on all ports in the lookup transformation. Override this default order by clause to include only the required ports in the lookup.
Persistent Cache: If your lookup source is not going to change at all (for example: countries, zip codes), use a persistent cache.
Prefer Static Cache over Dynamic Cache: If you use dynamic cache, the lookup may
update the
cache. Updating the lookup cache is overhead. Avoid dynamic cache.
Restrict the number of lookup ports: Make sure that you include only the required ports in the lookup transformation. Unnecessary ports make the lookup take more time in querying the lookup source and building the lookup cache.
Sort the flat file lookups: If the lookup source is a flat file, using the sorted
input option improves
the performance.
Indexing the columns: If you have used any columns in the where clause, creating
any index (in
case of relational lookups) on these columns improves the performance of querying
the lookup
source.
Database level tuning: For relational lookups you can improve the performance by
doing some
tuning at database level.

Lookup Transformation is Active - Informatica

One of the changes made in informatica version 9 was making the lookup transformation an active transformation. The lookup transformation can now return all the matching rows.
When creating the lookup transformation itself you have to specify whether the
lookup
transformation returns multiple rows or not. Once you make the lookup
transformation as active
transformation, you cannot change it back to passive transformation. The "Lookup
Policy on Multiple
Match" property value will become "Use All Values". This property becomes read-only
and you
cannot change this property.
As an example, for each country you can configure the lookup transformation to
return all the states
in that country. You can cache the lookup table to improve performance. If you
configure the lookup
transformation for caching, the integration service caches all the rows from the
lookup source. The
integration service caches all rows for a lookup key by the key index.
Guidelines for Returning Multiple Rows:
Follow the below guidelines when you configure the lookup transformation to return
multiple rows:

. You can cache all the rows from the lookup source for cached lookups.
. You can customize the SQL override for both cached and uncached lookups that return multiple rows.
. You cannot use dynamic cache for Lookup transformation that returns multiple
rows.
. You cannot return multiple rows from an unconnected Lookup transformation.
. You can configure multiple Lookup transformations to share a named cache if the Lookup transformations have matching lookup policy on multiple match properties.
. Lookup transformation that returns multiple rows cannot share a cache with a
Lookup
transformation that returns one matching row for each input row.

Lookup Transformation Properties - Informatica

The lookup transformation has many properties which you can configure. Depending on the lookup source (flat file, pipeline or relational lookup), the following properties of the lookup transformation are available:
Lookup Transformation Properties:

Each property below is listed with the lookup types (flat file, pipeline, relational) it applies to, followed by its description.

. Lookup SQL Override (Relational): Override the default SQL query generated by the lookup transformation. Use this option when lookup cache is enabled.

. Lookup Table Name (Pipeline, Relational): You can choose a source, target or source qualifier as the lookup table name. This is the lookup source which will be used to query or cache the data. If you have overridden the SQL query, then you can ignore this option.

. Lookup Source Filter (Relational): You can filter the rows looked up in the cache based on the value of data in the lookup ports. Works only when lookup cache is enabled.

. Lookup Caching Enabled (Flat File, Pipeline, Relational): When lookup cache is enabled, the integration service queries the lookup source once and caches the entire data. Caching the lookup source improves the performance. If caching is disabled, the integration service queries the lookup source for each row. The integration service always caches flat file and pipeline lookups.

. Lookup Policy on Multiple Match (Flat File, Pipeline, Relational): Which row to return when the lookup transformation finds multiple rows that match the lookup condition. Report Error: reports an error and does not return a row. Use Last Value: returns the last row that matches the lookup condition. Use All Values: returns all matched rows. Use Any Value: returns the first value that matches the lookup condition.

. Lookup Condition (Flat File, Pipeline, Relational): You can define the lookup condition in the condition tab. The lookup condition is displayed here.

. Connection Information (Relational): Specifies the database that contains the lookup table.

. Source Type (Flat File, Pipeline, Relational): Indicates the lookup source type: flat file, relational table or source qualifier.

. Tracing Level (Flat File, Pipeline, Relational): Sets the amount of detail to be included in the log for the lookup.

. Lookup Cache Directory Name (Flat File, Pipeline, Relational): Specifies the directory used to build the lookup cache files.

. Lookup Cache Persistent (Flat File, Pipeline, Relational): Use when the lookup source data does not change at all. Examples: zip codes, countries, states etc. The lookup caches the data once and uses the cache even across multiple session runs.

. Lookup Data Cache Size / Lookup Index Cache Size (Flat File, Pipeline, Relational): Cache sizes of the lookup data and lookup index.

. Dynamic Lookup Cache (Flat File, Pipeline, Relational): Indicates to use a dynamic lookup cache. Inserts or updates rows in the lookup cache as it passes rows to the target table.

. Output Old Value On Update (Flat File, Pipeline, Relational): Use with dynamic caching enabled. When you enable this property, the Integration Service outputs old values out of the lookup/output ports. When the Integration Service updates a row in the cache, it outputs the value that existed in the lookup cache before it updated the row based on the input data. When the Integration Service inserts a row in the cache, it outputs null values.

. Update Dynamic Cache Condition (Flat File, Pipeline, Relational): An expression that indicates whether to update the dynamic cache. Create an expression using lookup ports or input ports. The expression can contain input values or values in the lookup cache. The Integration Service updates the cache when the condition is true and the data exists in the cache. Use with dynamic caching enabled. Default is true.

. Cache File Name Prefix (Flat File, Pipeline, Relational): Use with persistent lookup cache. Specifies the file name prefix to use with persistent lookup cache files.

. Recache From Lookup Source (Flat File, Pipeline, Relational): The integration service rebuilds the lookup cache.

. Insert Else Update (Flat File, Pipeline, Relational): Use with dynamic caching enabled. Applies to rows entering the Lookup transformation with the row type of insert.

. Update Else Insert (Flat File, Pipeline, Relational): Use with dynamic caching enabled. Applies to rows entering the Lookup transformation with the row type of update.

. Datetime Format (Flat File): Specify the date format for the date fields in the file.

. Thousand Separator (Flat File): Specify the thousand separator for the port.

. Decimal Separator (Flat File): Specify the decimal separator for the port.

. Case-Sensitive String Comparison (Flat File): The Integration Service uses case sensitive string comparisons when performing lookups on string columns.

. Null Ordering (Flat File, Pipeline): Specifies how to sort null data.

. Sorted Input (Flat File, Pipeline): Indicates whether the lookup source data is in sorted order or not.

. Lookup Source is Static (Flat File, Pipeline, Relational): The lookup source does not change in a session.

. Pre-build Lookup Cache (Flat File, Pipeline, Relational): Allows the Integration Service to build the lookup cache before the Lookup transformation receives the data. The Integration Service can build multiple lookup cache files at the same time to improve performance.

. Subsecond Precision (Relational): Specifies the subsecond precision for datetime ports.

Creating Lookup Transformation - Informatica

The steps to create a lookup transformation are bit different when compared to
other
transformations. If you want to create a reusable lookup transformation, create it
in the
Transformation Developer. To create a non-reusable lookup transformation, create it
in the Mapping
Designer. Follow the below steps to create the lookup transformation.
1. Log in to the PowerCenter Designer. Open either the Transformation Developer tab or the Mapping Designer tab. Click on Transformation in the toolbar, and then click on Create.
2. Select the lookup transformation and enter a name for the transformation. Click
Create.
3. Now you will get a "Select Lookup Table" dialog box for selecting the lookup
source, choosing
active or passive option. This is shown in the below image:

4. You can choose one of the below option to import the lookup source definition:

. Source definition from the repository.


. Target definition from the repository.
. Source qualifier in the mapping (applicable only for non-reusable lookup
transformation)
. Import a relational or flat file definition as the lookup source.

5. In the same dialog box, you have an option to choose active or passive lookup
transformation.
You can see this option in red circle in the above image. To make the lookup
transformation as
active, check the option "Return All Values on Multiple Match". Do not check this
when creating a
passive lookup transformation. If you have created an active lookup transformation,
the value of the
property "Lookup policy on multiple match" will be "Use All Values". You cannot
change an active
lookup transformation back to a passive lookup transformation.
6. Click OK or Click Skip if you want to manually add ports to lookup
transformation.
7. For connected lookup transformation, add input and output ports.
8. For unconnected lookup transformation, create a return port for the value you
want to return from
the lookup.
9. Go to the properties and configure the lookup transformation properties.
10. For dynamic lookup transformation, you have to associate an input port, output
port or sequence
Id with each lookup port.
11. Go the condition tab and add the lookup condition.
Connected and Unconnected Lookup Transformation - Informatica

The lookup transformation can be used in both connected and unconnected mode. The differences between the connected and unconnected lookup transformations are listed below:

. Connected: Receives input values directly from the upstream transformations in the pipeline. Unconnected: Receives input values from the :LKP expression in another transformation, such as an expression transformation.

. Connected: You can use either a static or a dynamic cache. Unconnected: You can use only a static cache.

. Connected: The lookup cache contains both the lookup condition columns and the lookup source columns that are output ports. Unconnected: The lookup cache contains all lookup/output ports in the lookup condition and the lookup/return port.

. Connected: Returns multiple columns for the same row, or inserts into the dynamic lookup cache. Unconnected: Returns only one column, which is designated as the return port, for each row.

. Connected: When there is no match for the lookup condition, the integration service returns the default values for the output ports. In case of a dynamic cache, the integration service inserts the row into the cache. Unconnected: When there is no match for the lookup condition, the integration service returns a NULL value for the return port.

. Connected: When there is a match for the lookup condition, the integration service returns all the output ports. In case of a dynamic cache, the integration service either updates the row in the cache or leaves the row unchanged. Unconnected: When there is a match for the lookup condition, the integration service returns the value from the return port.

. Connected: Passes multiple values to downstream transformations. Unconnected: Passes a single output value to another transformation.

. Connected: Supports user-defined default values. Unconnected: User-defined default values are not supported.

Lookup Transformation Source Types in Informatica

First step when creating a lookup transformation is choosing the lookup source. You
can select a
relational table, flat file or a source qualifier as the lookup source.
Relational lookups:
When you want to use a relational table as a lookup source in the lookup
transformation, you have to
connect to the lookup source using an ODBC connection and import the table definition as the
structure for the
lookup transformation. You can use the below options for relational lookups:

. You can override the default sql query and write your own customized sql to add a
WHERE
clause or query multiple tables.
. You can sort null data based on the database support.
. You can perform case-sensitive comparison based on the database support.

Flat File lookups:


When you want to use a flat file as a lookup source in the lookup transformation,
select the flat file
definition in the repository or import the source when you create the
transformation. When you want
to import the flat file lookup source, the designer invokes the flat file wizard.
You can use the below
options for flat file lookups:

. You can use indirect files as lookup sources by configuring a file list as the
lookup file name.
. You can use sorted input for the lookup.
. You can sort null data high or low.
. You can use case-sensitive string comparison with flat file lookups.

Sorted Input for Flat File Lookups:


For flat file lookup source, you can improve the performance by sorting the flat
files on the columns
which are specified in the lookup condition. The condition columns in the lookup
transformation must
be treated as a group for sorting the flat file. Sort the flat file on the
condition columns for optimal
performance.

Lookup Transformation in Informatica

Lookup transformation is used to look up data in a flat file, relational table, view or synonym. Lookup is a passive/active transformation and can be used in both connected/unconnected modes. From
informatica version 9 onwards lookup is an active transformation. The lookup
transformation can
return a single row or multiple rows.
You can import the definition of lookup from any flat file or relational database
or even from a source
qualifier. The integration service queries the lookup source based on the ports,
lookup condition and
returns the result to other transformations or target in the mapping.
The lookup transformation is used to perform the following tasks:
. Get a Related Value: You can get a value from the lookup table based on the
source value. As
an example, we can get the related value like city name for the zip code value.
. Get Multiple Values: You can get multiple rows from a lookup table. As an
example, get all the
states in a country.
. Perform Calculation. We can use the value from the lookup table and use it in
calculations.
. Update Slowly Changing Dimension tables: Lookup transformation can be used to
determine
whether a row exists in the target or not.
You can configure the lookup transformation in the following types of lookup:

. Flat File or Relational lookup: You can perform the lookup on the flat file or
relational
database. When you create a lookup using flat file as lookup source, the designer
invokes flat
file wizard. If you used relational table as lookup source, then you can connect to
the lookup
source using ODBC and import the table definition.
. Pipeline Lookup: You can perform lookup on application sources such as JMS, MSMQ
or SAP.
You have to drag the source into the mapping and associate the lookup
transformation with the
source qualifier. Improve the performance by configuring partitions to retrieve
source data for
the lookup cache.
. Connected or Unconnected lookup: A connected lookup receives source data,
performs a
lookup and returns data to the pipeline. An unconnected lookup is not connected to
source or
target or any other transformation. A transformation in the pipeline calls the
lookup
transformation with the :LKP expression. The unconnected lookup returns one column
to the
calling transformation.
. Cached or Uncached Lookup: You can improve the performance of the lookup by
caching the
lookup source. If you cache the lookup source, you can use a dynamic or static
cache. By
default, the lookup cache is static and the cache does not change during the
session. If you use
a dynamic cache, the integration service inserts or updates rows in the cache. You
can lookup
values in the cache to determine if the values exist in the target, then you can
mark the row for
insert or update in the target.

TE_7073: Expecting Keys to be ascending - Informatica

Aggregator Transformation computes calculations on a group of rows and returns a single row for each group. We can improve the performance of aggregator transformation by sorting
the data on
the group by ports and then specifying the "Sorted Input" option in the aggregator
transformation
properties.
The aggregator transformation will throw the below error if you do not sort the
data and specified the
"Sorted Input" option in the properties of the aggregator transformation:

TE_7073: Expecting Keys to be ascending


This error is due to data sorting issues in the mapping. This is quite obvious, as the aggregator transformation expects the data to be in sorted order but gets it in unsorted order.
To avoid this error, simply follow the below steps:
Sort on Group By Ports:
Be sure to sort the data using a sorter transformation or source qualifier
transformation before
passing to the aggregator transformation.
The order of the ports is important while sorting the data. The order of the ports
that you specify in
the sorter transformation should be exactly same as the order of the ports
specified in "Group By"
ports of aggregator transformation. If the order of the ports does not match, then
you will get this
error.
Trim String Ports:
If you are using the string or varchar ports in the "Group By" of aggregator
transformation, then
remove the trailing, leading spaces in the expression transformation and then pass
to sorter
transformation to sort the data.
Avoid Transformations that Change Sorting Order:
Do not place transformations which change the sorting order before the aggregator
transformation.

Target update override - Informatica

When you use an update strategy transformation in the mapping or specify the "Treat Source Rows As" option as update, the informatica integration service updates a row in the target table whenever a match on the primary key is found in the target table.
The update strategy works only

. when there is primary key defined in the target definition.


. When you want update the target table based on the primary key.

What if you want to update the target table by matching on a column other than the primary key? In this case the update strategy won't work. Informatica provides a feature, "Target Update Override", to update even on columns that are not part of the primary key.
You can find the Target Update Override option in the target definition properties
tab. The syntax of
update statement to be specified in Target Update Override is

UPDATE TARGET_TABLE_NAME

SET TARGET_COLUMN1 = :TU.TARGET_PORT1,

[Additional update columns]

WHERE TARGET_COLUMN = :TU.TARGET_PORT

AND [Additional conditions]


Here TU means target update and is used to specify the target ports.
Example: Consider the employees table as an example. In the employees table, the
primary key is
employee_id. Let say we want to update the salary of the employees whose employee
name is
MARK. In this case we have to use the target update override. The update statement
to be specified
is

UPDATE EMPLOYEES

SET SALARY = :[Link]

WHERE EMPLOYEE_NAME = :TU.EMP_NAME

Update Strategy - Session Settings in Informatica

This post is a continuation of my previous one on update strategy. Here we will see the different settings that we can configure for the update strategy at the session level.
Single Operation of All Rows:
We can specify a single operation for all the rows using the "Treat Sources Rows
As" setting in the
session properties tab. The different values you can specify for this option are:

. Insert: The integration service treats all the rows for insert operation. If
inserting a new row
violates the primary key or foreign key constraint in the database, then the
integration service
rejects the row.
. Delete: The integration service treats all the rows for delete operation and
deletes the
corresponding row in the target table. You must define a primary key constraint in
the target
definition.
. Update: The integration service treats all the rows for update operation and
updates the rows in
the target table that matches the primary key value. You must define a primary key
in the target
definition.
. Data Driven: An update strategy transformation must be used in the mapping. The
integration
service either inserts or updates or deletes a row in the target table based on the
logic coded in
the update strategy transformation. If you do not specify the data driven option when you are using an update strategy in the mapping, then the workflow manager displays a warning, and the integration service does not follow the instructions in the update strategy transformation.

Update Strategy Operations for each Target Table:


You can also specify the update strategy options for each target table
individually. Specify the
update strategy options for each target in the Transformations view on the Mapping
tab of the
session:

. Insert: Check this option to insert a row in the target table.


. Delete: Check this option to delete a row in the target table.
. Truncate Table: check this option to truncate the target table before loading the
data.
. Update as Update: Update the row in the target table.
. Update as Insert: Insert the row which is flagged as update.
. Update else Insert: If the row exists in the target table, then update the row.
Otherwise, insert
the row.

The following list illustrates how the data in the target table is inserted, updated or deleted for various combinations of the row flagging type and the individual target table settings:

. Row flagged as Insert, Insert is specified: the source row is inserted into the target.

. Row flagged as Insert, Insert is not specified: the source row is not inserted into the target.

. Row flagged as Delete, Delete is specified: if the row exists in the target, it will be deleted.

. Row flagged as Delete, Delete is not specified: even if the row exists in the target, it will not be deleted from the target.

. Row flagged as Update, Update as Update is specified: if the row exists in the target, it will be updated.

. Row flagged as Update, Insert is specified and Update as Insert is specified: even though the row is flagged as update, it will not be updated in the target. Instead, the row will be inserted into the target.

. Row flagged as Update, Insert is not specified and Update as Insert is specified: neither update nor insertion of the row happens.

. Row flagged as Update, Insert is specified and Update else Insert is specified: if the row exists in the target, it will be updated. Otherwise it will be inserted.

. Row flagged as Update, Insert is not specified and Update else Insert is specified: if the row exists in the target, it will be updated. The row will not be inserted if it does not exist in the target.

Update Strategy Transformation in Informatica

Update strategy transformation is an active and connected transformation. Update strategy transformation is used to insert, update, and delete records in the target table.
It can also reject the
records without reaching the target table. When you design a target table, you need
to decide what
data should be stored in the target.
When you want to maintain a history or source in the target table, then for every
change in the
source record you want to insert a new record in the target table.
When you want an exact copy of source data to be maintained in the target table,
then if the source
data changes you have to update the corresponding records in the target.
The design of the target table decides how to handle the changes to existing rows.
In the
informatica, you can set the update strategy at two different levels:

. Session Level: Configuring at the session level instructs the integration service to either treat all rows in the same way (insert, update or delete) or use instructions coded in the session mapping to flag rows for different database operations.
. Mapping Level: Use the update strategy transformation to flag rows for insert, update, delete or reject.

Flagging Rows in Mapping with Update Strategy:


You have to flag each row for inserting, updating, deleting or rejecting. The
constants and their
numeric equivalents for each database operation are listed below.

. DD_INSERT: Numeric value is 0. Used for flagging the row as Insert.


. DD_UPDATE: Numeric value is 1. Used for flagging the row as Update.
. DD_DELETE: Numeric value is 2. Used for flagging the row as Delete.
. DD_REJECT: Numeric value is 3. Used for flagging the row as Reject.

The integration service treats any other numeric value as an insert.


Update Strategy Expression:
You have to flag rows by assigning the constant numeric values using the update
strategy
expression. The update strategy expression property is available in the properties
tab of the update
strategy transformation.
Each row is tested against the condition specified in the update strategy
expression and a constant
value is assigned to it. A sample expression is show below:

IIF(department_id=10, DD_UPDATE, DD_INSERT)

Mostly IIF and DECODE functions are used to test for a condition in update strategy
transformation.
Update Strategy and Lookup Transformations:
Update strategy transformation is used mostly with lookup transformation. The row
from the source
qualifier is compared with row from lookup transformation to determine whether it
is already exists or
a new record. Based on this comparison, the row is flagged to insert or update
using the update
strategy transformation.
Update Strategy and Aggregator Transformations:
If you place an update strategy before an aggregator transformation, the way the
aggregator
transformation performs aggregate calculations depends on the flagging of the row.
For example, if
you flag a row for delete and then later use the row to calculate the sum, then the
integration service
subtracts the value appearing in this row. If it's flagged for insert, then the
aggregator adds its value
to the sum.
Important Note:
Update strategy works only when we have a primary key on the target table. If there
is no primary
key available on the target table, then you have to specify a primary key in the
target definition in the
mapping for update strategy transformation to work.
Recommended Reading:
Update Strategy Session Level Settings

SQL Query Overwrite in Source Qualifier - Informatica

One of the properties of source qualifier transformation is "SQL Query" which can
be used to
overwrite the default query with our customized query. We can generate SQL queries
only for
relational sources. For flat files, all the properties of the source qualifier transformation will be in a disabled state.
Here we will see how to generate the SQL query and the errors that we will get
while generating the
SQL query.
Error When Generating SQL query:
The most frequent error that we will get is "Cannot generate query because there
are no valid fields
projected from the Source Qualifier".
First we will simulate this error and then we will see how to avoid it. Follow the below steps for simulating and fixing the error:

. Create a new mapping and drag the relational source into it. For example drag the
customers
source definition into the mapping.

. Do not connect the source qualifier transformation to any other transformations or target.
. Edit the source qualifier and go to the properties tab and then open the SQL
Query Editor.
. Enter the ODBC data source name, user name, password and then click on Generate
SQL.
. Now we will get the error while generating the SQL query.

. Informatica produces this error because the source qualifier transformation ports
are not
connected to any other transformations or target. Informatica just knows the
structure of the
source. However it doesn't know what columns to be read from source table. It will
know only
when the source qualifier is connected to downstream transformations or target.
. To avoid this error, connect the source qualifier transformation to downstream
transformation or
target.

Generating the SQL Query in Source Qualifier:


To explain this I am taking the customers table as the source. The source structure
looks as below

Create table Customers (
Customer_Id Number,
Name Varchar2(30),
Email_Id Varchar2(30),
Phone Number
)

Follow the below steps to generate the SQL query in source qualifier
transformation.

. Create a new mapping and drag the customers relational source into the mapping.
. Now connect the source qualifier transformation to any other transformation or
target. Here I
have connected the SQ to expression transformation. This is shown in the below
image.

. Edit the source qualifier transformation, go to the properties tab and then open
the editor of SQL
query.
. Enter the username, password, data source name and click on Generate SQL query.
Now the
SQL query will be generated. This is shown in the below image.

The SQL query generated is

SELECT Customers.Customer_Id,

Customers.Name,

Customers.Email_Id,

Customers.Phone

FROM Customers

Now we will do a small change to understand more about the "Generating SQL query".
Remove the
link (connection) between Name port of source qualifier and expression
transformation.
Repeat the above steps to generate the SQL query and observe what SQL query will be
generated.

The SQL query generated in this case is

SELECT Customers.Customer_Id,

Customers.Email_Id,

Customers.Phone

FROM Customers

The Name column is missing in the generated query. This means that only the ports connected from the Source Qualifier transformation to other downstream transformations or the target will be included in the SQL query and read from the database table.

Avoiding Sequence Generator Transformation in Informatica

Q) How to generate sequence numbers without using the sequence generator transformation?
We use the sequence generator transformation mostly in SCDs. Using a sequence generator transformation to generate unique primary key values can cause performance issues, as an additional transformation has to be processed in the mapping.
You can use expression transformation to generate surrogate keys in a dimensional
table. Here we
will see the logic on how to generate sequence numbers with expression
transformation.
Sequence Generator Reset Option:
When you use the reset option in a sequence generator transformation, the sequence
generator
uses the original value of Current Value to generate the numbers. The sequences
will always start
from the same number.
As an example, if the Current Value is 1 with reset option checked, then the
sequences will always
start from value 1 for multiple session runs. We will see how to implement this
reset option with
expression transformation.
Follow the below steps:

. Create a mapping parameter and call it $$Current_Value. Assign a default value to this parameter, which is the start value of the sequence numbers.
. Now create an expression transformation and connect the source qualifier
transformation ports
to the expression transformation.
. In the expression transformation create the below additional ports and assign the
expressions:

v_seq (variable port) = IIF(v_seq>0,v_seq+1,$$Current_Value)

o_key (output port) = v_seq

. The v_seq port generates the numbers same as NEXTVAL port in sequence generator
transformation.

Primary Key Values Using Expression and Parameter:


We will see here how to generate the primary key values using the expression
transformation and a
parameter. Follow the below steps:

. Create a mapping to write the maximum value of primary key in the target to a
parameter file.
Assign the maximum value to the parameter ($$MAX_VAL) in this mapping. Create a
session
for this mapping. This should be the first session in the workflow.
. Create another mapping where you want to generate the sequence numbers. In this
mapping,
connect the required ports to the expression transformation, create the below
additional ports in
the expression transformation and assign the below expressions:

v_cnt (variable port) = v_cnt+1

v_seq (variable port) = IIF(ISNULL($$MAX_VAL) OR $$MAX_VAL=0, 1, v_cnt+$$MAX_VAL)
o_surrogate_key (output port) = v_seq
. The o_surrogate_key port generates the primary key values just as the sequence
generator
transformation.

Primary Key Values Using Expression and Lookup Transformations:


Follow the below steps to generate sequence numbers using expression and lookup
transformations.

. Create an unconnected lookup transformation and create only one return port in
the lookup.
Now overwrite the lookup query to get the maximum value of primary key from the
target. The
query looks as

SELECT MAX(primary_key_column) FROM Dimension_table

. Now create an expression transformation and connect the required ports to it. Now
we will call
the unconnected lookup transformation from this expression transformation. Create
the below
additional port in the expression transformation:

v_cnt (variable port) = v_cnt+1

v_max_val (variable port) = IIF(v_cnt=1, :LKP.lkp_trans(), v_max_val)

v_seq (variable port) = IIF(ISNULL(v_max_val) OR v_max_val=0, 1, v_cnt+v_max_val)

o_primary_key (output port) = v_seq

. The o_primary_key port generates the surrogate key values for the dimension
table.

Reusable VS Non Reusable & Properties of Sequence Generator Transformation

We will see the difference of reusable and non reusable sequence generator
transformation along
with the properties of the transformation.
Sequence Generator Transformation Properties:
You have to configure the following properties of a sequence generator
transformation:
Start Value:
Specify the Start Value when you configure the sequence generator transformation
for Cycle option.
If you configure the cycle, the integration service cycles back to this value when
it reaches the End
Value. Use Cycle to generate a repeating sequence numbers, such as numbers 1
through 12 to
correspond to the months in a year. To cycle the integration service through a
sequence:

. Enter the lowest value in the sequence to use for the Start Value.
. Enter the highest value to be used for End Value.
. Select Cycle option.

Increment By:
The Integration service generates sequence numbers based on the Current Value and
the Increment
By properties in the sequence generator transformation. Increment By is the integer
the integration
service adds to the existing value to create the new value in the sequence. The
default value of
Increment By is 1.
End Value:
End value is the maximum value that the integration service generates. If the
integration service
reaches the end value and the sequence generator is not configured for cycle
option, then the
session fails with the following error message:
TT_11009 Sequence Generator Transformation: Overflow error.
If the sequence generator is configured for cycle option, then the integration
service cycles back to
the start value and starts generating numbers from there.
Current Value:
The integration service uses the Current Value as the basis for generated values
for each session.
Specify the value in "Current Value" you want the integration service as a starting
value to generate
sequence numbers. If you want to cycle through a sequence of numbers, then the
current value
must be greater than or equal to the Start Value and less than the End Value.
At the end of the session, the integration service updates the current value to the
last generated
sequence number plus the Increment By value in the repository if the sequence
generator Number
of Cached Values is 0. When you open the mapping after a session run, the current
value displays
the last sequence value generated plus the Increment By value.
Reset:
The reset option is applicable only for non reusable sequence generator
transformation and it is
disabled for reusable sequence generators. If you select the Reset option, the integration service generates values based on the original current value each time it starts the session. Otherwise the integration service updates the current value in the repository with the last value generated plus the Increment By value.
Number of Cached Values:
The Number of Cached Values indicates the number of values that the integration
service caches at
one time. When this value is configured greater than zero, then the integration
service caches the
specified number of values and updates the current value in the repository.
Non Reusable Sequence Generator:
The default value of Number of Cached Values is zero for non reusable sequence
generators. It
means the integration service does not cache the values. The integration service,
accesses the
Current Value from the repository at the start of the session, generates the
sequence numbers, and
then updates the current value at the end of the session.
When you set the number of cached values greater than zero, the integration service
caches the
specified number of cached values and updates the current value in the repository.
Once the cached
values are used, then the integration service again accesses the current value from
repository,
caches the values and updates the repository. At the end of the session, the
integration service
discards any unused cached values.
For non-reusable sequence generator setting the Number of Cached Values greater
than zero can
increase the number of times the Integration Service accesses the repository during
the session.
And also discards unused cache values at the end of the session.
As an example when you set the Number of Cached Values to 100 and you want to
process only 70
records in a session. The integration service first caches 100 values and updates
the current value
with 101. As there are only 70 rows to be processed, only the first 70 sequence
number will be used
and the remaining 30 sequence numbers will be discarded. In the next run the
sequence numbers
starts from 101.
The disadvantage of having Number of Cached Values greater than zero are: 1)
Accessing the
repository multiple times during the session. 2) Discarding of unused cached
values, causing
discontinuous sequence numbers
Reusable Sequence Generators:
The default value of Number of Cached Values is 1000 for reusable sequence
generators. When you
are using the reusable sequence generator in multiple sessions which run in
parallel, then specify
the Number of Cache Values greater than zero. This will avoid generating the same
sequence
numbers in multiple sessions.
If you increase the Number of Cached Values for reusable sequence generator
transformation, the
number of calls to the repository decreases. However there is chance of having
highly discarded
values. So, choose the Number of Cached values wisely.
Recommended Reading:
Sequence Generator Transformation
Sequence Generator Transformation in Informatica

Sequence generator transformation is a passive and connected transformation. The sequence generator transformation is used for

. Generating unique primary key values.


. Replace missing primary keys
. Generate surrogate keys for dimension tables in SCDs.
. Cycle through a sequential range of numbers.

Creating Sequence Generator Transformation:


Follow the steps below to create a sequence generator transformation:

. Go to the mapping designer tab in the PowerCenter Designer.
. Click on Transformation in the toolbar and then on Create.
. Select the sequence generator transformation, enter a name and then click on
Create. Click Done.
. Edit the sequence generator transformation, go to the properties tab and
configure the options.
. To generate sequence numbers, connect the NEXTVAL port to the transformations or
target in the mapping.

Configuring Sequence Generator Transformation:


Configure the following properties of sequence generator transformation:

. Start Value: The start value of the generated sequence that you want the
integration service to use if you enable the Cycle option. If you select Cycle, the
integration service cycles back to this value when it reaches the end value.
. Increment By: Difference between two consecutive values from the NEXTVAL port.
Default
value is 1. Maximum value you can specify is 2,147,483,647.
. End Value: Maximum sequence value the integration service generates. If the
integration
service reaches this value during the session and the sequence is not configured to
cycle, the
session fails. Maximum value is 9,223,372,036,854,775,807.
. Current Value: Current Value of the sequence. This value is used as the first
value in the
sequence. If cycle option is configured, then this value must be greater than or
equal to start
value and less than end value.
. Cycle: The integration service cycles through the sequence range.
. Number of Cached Values: Number of sequential values the integration service
caches at a time. Use this option when multiple sessions use the same reusable
sequence generator. The default value for a non-reusable sequence generator is 0
and for a reusable sequence generator is 1,000. Maximum value is
9,223,372,036,854,775,807.
. Reset: The integration service generates values based on the original current
value for each session. Otherwise, the integration service updates the current
value to reflect the last-generated value for the session plus one.
. Tracing level: The level of detail to be logged in the session log file.
[Link]
sequence_generator_transformation_properties.jpg

Sequence Generator Transformation Ports:

The sequence generator transformation contains only two output ports: the CURRVAL
and NEXTVAL output ports.
NEXTVAL Port:
You can connect the NEXTVAL port to multiple transformations to generate unique
values for each row in the transformation. The NEXTVAL port generates the sequence
numbers based on the Current Value and Increment By properties. If the sequence
generator is not configured to Cycle, then the NEXTVAL port generates sequence
numbers only up to the configured End Value.
The sequence generator transformation generates a block of numbers at a time. Once
the block of numbers is used, it generates the next block of sequence numbers. As an
example, let's say you connected the NEXTVAL port to two targets in a mapping; the
integration service generates a block of numbers (e.g. 1 to 10) for the first target
and then another block of numbers (e.g. 11 to 20) for the second target.
If you want the same sequence values to be generated for more than one target,
connect the sequence generator to an expression transformation and connect the
expression transformation ports to the targets. Another option is to create a
separate sequence generator transformation for each target.
CURRVAL Port:
The CURRVAL is the NEXTVAL plus the Increment By value. You rarely connect the
CURRVAL port
to other transformations. When a row enters a transformation connected to the
CURRVAL port, the
integration service passes the NEXTVAL value plus the Increment By value. For
example, when you
configure the Current Value=1 and Increment By=1, then the integration service
generates the
following values for NEXTVAL and CURRVAL ports.

NEXTVAL CURRVAL
---------------
1       2
2       3
3       4
4       5
5       6

If you connect only the CURRVAL port without connecting the NEXTVAL port, then the
integration
service passes a constant value for each row.
Recommended Reading:
Reusable vs Non Reusable Sequence Generator

Load Variable Fields Flat File in Oracle Table

In one of my projects, we got a requirement to load data from a flat file with a
varying number of fields into an Oracle table.
The complete requirements are mentioned below:
Requirement:

. Daily we get a comma-delimited flat file which contains the month-wise sales
information of products.
. The data in the flat file is in a denormalized structure.
. The number of months in the flat file may vary from day to day.
. The header of the flat file contains the field names.

Let's say today the structure of the flat file looks like this:

Product,Jan2012,Feb2012

A,100,200

B,500,300

The next day the flat file structure might vary in the number of months. However,
the product field will always be the first field of the flat file. The sample flat
file structure on the next day looks as

Product,Jan2012,Feb2012,Mar2012

C,300,200,500

D,100,300,700

Now the problem is to load this flat file into the Oracle table. The first thing is
designing the target table. We designed a normalized target table, whose structure
looks as

Table Name: Product_Sales

Product, Month, Sales

---------------------

A, Jan2012,100

A, Feb2012,200

B, Jan2012,500

B, Feb2012,300

C, Jan2012,300

C, Feb2012,200

C, Mar2012,500

D, Jan2012,100

D, Feb2012,300

D, Mar2012,700

Anyhow, we designed the target table. Now comes the real problem: how to identify
the number of fields in the flat file and how to load the denormalized flat file
into the normalized table?
We created a new procedure to handle this problem. Here I am listing the sequence of
steps in the procedure which we used to load the flat file data into the Oracle
database.
Reading the Header information from the file:
. Created the required variables. I will mention them as and when required.
. We used the utl_file package in Oracle to read the flat file.
. The syntax for opening the file is

FileHandle utl_file.file_type;  --variable

FileHandle := utl_file.fopen(
    file_location IN VARCHAR2,
    file_name     IN VARCHAR2,
    open_mode     IN VARCHAR2,
    max_linesize  IN BINARY_INTEGER DEFAULT NULL);

. We have opened the file. Now we will read the flat file header, which is the first
line in the file. The syntax is

Header Varchar2(4000);
utl_file.get_line(FileHandle, Header);
utl_file.fclose(FileHandle);

. The Header variable contains the header line of the file, which holds the field
names. The data in the Header variable looks as

Product,Jan2012,Feb2012
. We have created an external table by using this Header variable.

Creating the External Table:

. As the Header variable contains the fields from the file, it is easy to construct
the syntax for
external table creation.
. Replace each comma in the Header variable with " varchar2(100)," and then
concatenate " varchar2(100)" at the end of the variable. This step is shown in the
below example:

Header_With_datatypes := Replace(Header, ',', ' varchar2(100),')
                         || ' varchar2(100)';

The data in the Header_With_datatypes variable will look as

Product varchar2(100), Jan2012 varchar2(100), Feb2012 varchar2(100)

. We have constructed the field list with data types. Now we have to construct the
external table using this variable. The DDL that the procedure builds (and runs
with execute immediate) has roughly the following shape:

Create table external_stage_table
( <contents of Header_With_datatypes> )
Organization external
( Type oracle_loader
  Default directory <directory object of the file location>
  Access parameters
  ( Records delimited by newline
    Skip 1
    Fields terminated by ',' )
  Location ('<file name>')
);

. Use execute immediate to create the external table.

Transposing the columns into rows:

. Now we have to transpose the month columns of the flat file into rows and then
load them into the final table. We transpose only the month columns, not the
product column. The insert statement is built as a string (the UNPIVOT IN list is
taken from the Header variable) and run with execute immediate. For today's file
the generated statement looks as shown below:

Header := Replace(Header, 'Product,', '');

insert into Product_Sales (Product, Month, Sales)
select Product, Month, Sales
from external_stage_table
unpivot (Sales for Month in (Jan2012, Feb2012));

. Drop the external table once the insert into the target table is done.

I have provided just an overview of the steps that we have used.
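
Putting the pieces together, a minimal PL/SQL sketch of such a procedure could look
like the one below. It is an illustration under several assumptions: SRC_DIR is an
existing Oracle directory object pointing to the file location, the target table is
Product_Sales as designed above, every field is treated as VARCHAR2(100), and error
handling is left out.

CREATE OR REPLACE PROCEDURE load_variable_fields_file (p_file_name IN VARCHAR2)
IS
  v_file    UTL_FILE.FILE_TYPE;
  v_header  VARCHAR2(4000);
  v_columns VARCHAR2(4000);
  v_months  VARCHAR2(4000);
BEGIN
  -- 1. Read the header (first line) of the flat file.
  v_file := UTL_FILE.FOPEN('SRC_DIR', p_file_name, 'R', 32767);
  UTL_FILE.GET_LINE(v_file, v_header);
  UTL_FILE.FCLOSE(v_file);

  -- 2. Build the column list with data types, e.g.
  --    "Product varchar2(100), Jan2012 varchar2(100), Feb2012 varchar2(100)".
  v_columns := REPLACE(v_header, ',', ' varchar2(100), ') || ' varchar2(100)';

  -- 3. Create the external table over the flat file.
  EXECUTE IMMEDIATE
    'CREATE TABLE external_stage_table (' || v_columns || ')
     ORGANIZATION EXTERNAL
     ( TYPE ORACLE_LOADER
       DEFAULT DIRECTORY SRC_DIR
       ACCESS PARAMETERS
       ( RECORDS DELIMITED BY NEWLINE
         SKIP 1
         FIELDS TERMINATED BY '','' )
       LOCATION (''' || p_file_name || ''') )';

  -- 4. Unpivot the month columns into rows and load the target table.
  --    The month list is the header minus the leading "Product," field.
  v_months := REPLACE(v_header, 'Product,', '');
  EXECUTE IMMEDIATE
    'INSERT INTO Product_Sales (Product, Month, Sales)
     SELECT Product, Month, Sales
     FROM external_stage_table
     UNPIVOT (Sales FOR Month IN (' || v_months || '))';
  COMMIT;

  -- 5. Drop the external table once the load is done.
  EXECUTE IMMEDIATE 'DROP TABLE external_stage_table';
END;
/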

Parameterizing the Flat File Names - Informatica

Q) How to load the data from a flat file into the target where the source flat file
name changes daily?
Example: I want to load the customers data into the target file on a daily basis.
The source file name
is in the format customers_yyyymmdd.dat. How to load the data where the filename
varies daily?
The solution to this kind of problems is using the parameters. You can specify
session parameters
for both the source and target flat files. Then create a parameter file and assign
the flat file names to
the parameters.
Specifying Parameters for File Names:
The steps involved in parameterizing the file names are:

. Creating the Parameter File


. Specifying the parameters in Session
. Specifying the parameter file name

Creating Parameter File:


Assume two session parameters $InputFileName and $OutputFileName for specifying the
source
and target flat file names respectively. Now create a parameter file in the below
format

> cat dynamic_file_names.param

[[Link]]

$InputFileName=customers_20120101.dat

$OutputFileName=customers_file.dat

Specifying Parameters in Session:


Now you have to specify the parameters in the session. Edit the session and go to
the Mapping tab. In the Mapping tab, select the source qualifier in the Sources
folder and set the file property "Source Filename" to $InputFileName. Similarly,
select the target in the Targets folder and set the file property "Output Filename"
to $OutputFileName.
Specifying Parameter File Name:
The last step is specifying the parameter file name. You can specify the parameter
file name either
in the session level or workflow level. To specify in the session level, go the
properties tab of the
session and set the property "Parameter FileName".
To specify the parameter file at the workflow level, click on "Workflows" in the
toolbar and then on Edit. Now go to the properties tab and set the file property
"Parameter FileName".
That's it, you are done with using parameters as file names. Now you just have to
take care of changing the file name in the parameter file daily.
Note: You can even specify the source and target directories as parameters.
Direct and Indirect Flat File Loading (Source File Type) - Informatica

File processing is one of the key features of Informatica. Informatica provides a
Source Filetype option to specify direct or indirect loading of source flat files
into the target. Here we will see the direct and indirect source file type options
and when to use them.
Direct Load of Flat File:
When you want to load a single file into the target, use the direct source filetype
option. You can set the following source file properties in the mapping tab of the
session:

. Source File Directory: Enter the directory name where the source file resides.
. Source Filename: Enter the name of the file to be loaded into the target.
. Source Filetype: Specify the direct option when you want to load a single file
into the target.

Example: Let say we want to load the employees source file ([Link]) in the
directory
$PMSourceFileDir into the target, then source file properties to be configured in
the session are:

. Source File Directory: $PMSourceFileDir/


. Source Filename: [Link]
. Source Filetype: Direct

Indirect Load of Flat file:


Let say from each country we are getting the customers data in a separate file.
These files have the
same structure and same properties and we want to load all these files into a
single target. Creating
a mapping for each source file will be a tedious process. Informatica provides an
easy option
(indirect load) to handle this type of scenarios.
The indirect source file type option is used to load the data from multiple source
files that have the same structure and properties. The integration service reads
each file sequentially and then loads the data into the target.
Specifying the indirect load option involves two steps: creating a list file and
configuring the file properties in the session.
Creating the list file:
You can create a list file manually and specify each source file you want to load
into the target in a
separate line. As an example consider the following list file:

>cat customers_list.dat

$PMSourceFileDir/customers_us.dat

$PMSourceFileDir/customers_uk.dat
$PMSourceFileDir/customers_india.dat
Rules and guidelines for creating the list file:

. Each file in the list must use the user-defined code page configured in the
source definition.
. Each file in the file list must share the same file properties as configured in
the source definition
or as entered for the source instance in the session property sheet.
. Enter one file name or one path and file name on a line. If you do not specify a
path for a file,
the Integration Service assumes the file is in the same directory as the file list.

. Each path must be local to the Integration Service node.

Configuring the File Properties in Session:


Configure the following source file properties in the session for indirect source
filetype:

. Source File Directory: Enter the directory name where the list file resides.
. Source Filename: Enter the list file name in case of indirect load.
. Source Filetype: Specify the indirect option when you want to load multiple files
with the same properties.

Note: If you have multiple files with different properties, then you cannot use the
indirect load option.
You have to use direct load option in this case.

Target Load Order/ Target Load Plan in Informatica

Target Load Order:


Target load order (or) Target load plan is used to specify the order in which the
integration service
loads the targets. You can specify a target load order based on the source
qualifier transformations
in a mapping. If you have multiple source qualifier transformations connected to
multiple targets, you
can specify the order in which the integration service loads the data into the
targets.
Target Load Order Group:
A target load order group is the collection of source qualifiers, transformations
and targets linked together in a mapping. The integration service reads the sources
within a target load order group concurrently, and it processes target load order
groups sequentially. The following figure shows two target load order groups in a
single mapping:
[Link]
target_load_plan.jpg

Use of Target Load Order:


Target load order will be useful when the data of one target depends on the data of
another target.
For example, the employees table data depends on the departments data because of
the primary-
key and foreign-key relationship. So, the departments table should be loaded first
and then the
employees table. Target load order is useful when you want to maintain referential
integrity when
inserting, deleting or updating tables that have the primary key and foreign key
constraints.
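
For instance, with a foreign key like the one in the illustrative DDL below (the
table and column definitions are assumptions, not part of the original example),
employees rows cannot be inserted before the departments rows they reference:

CREATE TABLE departments
(
  department_id   NUMBER PRIMARY KEY,
  department_name VARCHAR2(30)
);

CREATE TABLE employees
(
  employee_id   NUMBER PRIMARY KEY,
  employee_name VARCHAR2(30),
  department_id NUMBER REFERENCES departments (department_id)
);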
Target Load Order Setting:
You can set the target load order or plan in the mapping designer. Follow the below
steps to
configure the target load order:
1. Login to the powercenter designer and create a mapping that contains multiple
target load order
groups.
2. Click on the Mappings in the toolbar and then on Target Load Plan. The following
dialog box will
pop up listing all the source qualifier transformations in the mapping and the
targets that receive data
from each source qualifier.
[Link]
target_load_plan_order.jpg

3. Select a source qualifier from the list.


4. Click the Up and Down buttons to move the source qualifier within the load
order.
5. Repeat steps 3 and 4 for other source qualifiers you want to reorder.
6. Click OK.

Reverse the Contents of Flat File - Informatica

Q1) I have a flat file, want to reverse the contents of the flat file which means
the first record should
come as last record and last record should come as first record and load into the
target file.
As an example consider the source flat file data as

Informatica Enterprise Solution

Informatica Power center

Informatica Power exchange

Informatica Data quality

The target flat file data should look as

Informatica Data quality


Informatica Power exchange

Informatica Power center

Informatica Enterprise Solution

Solution:
Follow the below steps for creating the mapping logic

. Create a new mapping.


. Drag the flat file source into the mapping.
. Create an expression transformation and drag the ports of source qualifier
transformation into
the expression transformation.
. Create the below additional ports in the expression transformation and assign the
corresponding
expressions

Variable port: v_count = v_count+1

Output port o_count = v_count

. Now create a sorter transformation and drag the ports of expression


transformation into it.
. In the sorter transformation specify the sort key as o_count and sort order as
DESCENDING.
. Drag the target definition into the mapping and connect the ports of sorter
transformation to the
target.

Q2) Load the header record of the flat file into first target, footer record into
second target and the
remaining records into the third target.
The solution to this problem I have already posted by using aggregator and joiner.
Now we will see
how to implement this by reversing the contents of the file.
Solution:

. Connect the source qualifier transformation to the expression transformation. In


the expression
transformation create the additional ports as mentioned above.
. Connect the expression transformation to a router. In the router transformation
create an output
group and specify the group condition as o_count=1. Connect this output group to a
target and
the default group to sorter transformation.
. Sort the data in descending order on o_count port.
. Connect the output of the sorter transformation to an expression transformation
(don't connect the o_count port).
. Again, in this expression transformation create the same additional ports
mentioned above.
. Connect this expression transformation to a router and create an output group. In
the output group specify the condition as o_count=1 and connect this group to the
second target. Connect the default group to the third target.

Mapping Variable Usage Example in Informatica

The variables in informatica can be used to store intermediate values and can be
used in
calculations. We will see how to use the mapping variables with an example.
Q) I want to load the data from a flat file into a target. The flat file has n
number of records. How the
load should happen is: In the first run i want to load the first 50 records, in the
second run the next
20 records, in the third run, the next 20 records and so on?
We will solve this problem with the help of mapping variables. Follow the below
steps to implement
this logic:

. Login to the mapping designer. Create a new mapping.
. Create a mapping variable and call it $$Rec_Var.
. Drag the flat file source into the mapping.
. Create an expression transformation and drag the ports of the source qualifier
transformation into the expression transformation.
. In the expression transformation, create the below ports.

variable port: v_cnt = v_cnt+1

output port: o_cnt = v_cnt

variable port: v_num_rec = IIF(ISNULL($$Rec_Var) OR $$Rec_Var=0, 50, 20)

output port: o_check_rec = SETVARIABLE($$Rec_Var, v_num_rec+$$Rec_Var)

. Now create a filter transformation and drag the ports of the expression
transformation into it. In the filter transformation specify the condition as

IIF(o_check_rec=50,
    IIF(o_cnt <= o_check_rec, TRUE, FALSE),
    IIF(o_cnt <= o_check_rec AND o_cnt > o_check_rec-20, TRUE, FALSE)
)

. Drag the target definition into the mapping and connect the appropriate ports of
filter
transformation to the target.
. Create a workflow and run the workflow multiple times to see the effect.

Transaction Control Transformation in Informatica

Transaction Control is an active and connected transformation. The transaction
control transformation is used to control the commit and rollback of transactions.
You can define a transaction based on a varying number of input rows. As an example,
you can define a transaction on a group of rows in the employees data using the
department Id as a key.
In the informatica power center, you can define the transaction at the following
levels:

. Mapping level: Use the transaction control transformation to define the
transactions.
. Session level: You can specify the "Commit Type" option in the session properties
tab. The different options of "Commit Type" are Target, Source and User Defined. If
you have used a transaction control transformation in the mapping, then the "Commit
Type" will always be "User Defined".

When you run a session, the integration service evaluates the expression for each
row in the transaction control transformation. When it evaluates the expression as
commit, it commits all the rows in the transaction to the target(s). When the
integration service evaluates the expression as rollback, it rolls back all the rows
in the transaction from the target(s).
When you have flat file as the target, then the integration service creates an
output file for each time
it commits the transaction. You can dynamically name the target flat files. Look at
the example for
creating flat files dynamically - Dynamic flat file creation.
Creating Transaction Control Transformation
Follow the below steps to create transaction control transformation:

. Go to the mapping designer, click on transformation in the toolbar, Create.


. Select the transaction control transformation, enter the name and click on Create
and then
Done.
. You can drag the ports in to the transaction control transformation or you can
create the ports
manually in the ports tab.
. Go to the properties tab. Enter the transaction control expression in the
Transaction Control
Condition.
[Link]
transaction_control_transofrmation.jpg

Configuring Transaction Control Transformation


You can configure the following components in the transaction control
transformation:

. Transformation Tab: You can rename the transformation and add a description.
. Ports Tab: You can create input/output ports
. Properties Tab: You can define the transaction control expression and tracing
level.
. Metadata Extensions Tab: You can add metadata information.

Transaction Control Expression


You can enter the transaction control expression in the Transaction Control
Condition option in the
properties tab. The transaction control expression uses the IIF function to test
each row against the
condition. Use the following syntax for the expression

Syntax:

IIF (condition, value1, value2)

Example:

IIF(dept_id=10, TC_COMMIT_BEFORE,TC_ROLLBACK_BEFORE)

Use the following built-in variables in the expression editor of the transaction
control transformation:

. TC_CONTINUE_TRANSACTION: The Integration Service does not perform any transaction

change for this row. This is the default value of the expression.
. TC_COMMIT_BEFORE: The Integration Service commits the transaction, begins a new
transaction, and writes the current row to the target. The current row is in the
new transaction.
. TC_COMMIT_AFTER: The Integration Service writes the current row to the target,
commits the
transaction, and begins a new transaction. The current row is in the committed
transaction.
. TC_ROLLBACK_BEFORE: The Integration Service rolls back the current transaction,
begins a
new transaction, and writes the current row to the target. The current row is in
the new
transaction.
. TC_ROLLBACK_AFTER: The Integration Service writes the current row to the target,
rolls back
the transaction, and begins a new transaction. The current row is in the rolled
back transaction.

If the transaction control expression evaluates to a value other than commit,
rollback or continue, the integration service fails the session.
Transaction Control Transformation in Mapping
Transaction control transformation defines or redefines the transaction boundaries
in a mapping. It
creates a new transaction boundary or drops any incoming transaction boundary
coming from
upstream active source or transaction control transformation.
Transaction control transformation can be effective or ineffective for the
downstream transformations
and targets in the mapping. The transaction control transformation can become
ineffective for
downstream transformations or targets if you have used transformation that drops
the incoming
transaction boundaries after it. The following transformations drop the transaction
boundaries.

. Aggregator transformation with Transformation scope as "All Input".


. Joiner transformation with Transformation scope as "All Input".
. Rank transformation with Transformation scope as "All Input".
. Sorter transformation with Transformation scope as "All Input".
. Custom transformation with Transformation scope as "All Input".
. Custom transformation configured to generate transactions
. Transaction Control transformation
. A multiple input group transformation, such as a Custom transformation, connected
to multiple
upstream transaction control points.

Mapping Guidelines and Validation


Use the following rules and guidelines when you create a mapping with a Transaction
Control
transformation:

. If the mapping includes an XML target, and you choose to append or create a new
document on
commit, the input groups must receive data from the same transaction control point.
. Transaction Control transformations connected to any target other than
relational, XML, or
dynamic MQSeries targets are ineffective for those targets.
. You must connect each target instance to a Transaction Control transformation.
. You can connect multiple targets to a single Transaction Control transformation.
. You can connect only one effective Transaction Control transformation to a
target.
. You cannot place a Transaction Control transformation in a pipeline branch that
starts with a
Sequence Generator transformation.
. If you use a dynamic Lookup transformation and a Transaction Control
transformation in the
same mapping, a rolled-back transaction might result in unsynchronized target data.
. A Transaction Control transformation may be effective for one target and
ineffective for another
target. If each target is connected to an effective Transaction Control
transformation, the
mapping is valid.
. Either all targets or none of the targets in the mapping should be connected to
an effective
Transaction Control transformation.

Load Source File Name in Target - Informatica

Q) How to load the name of the current processing flat file along with the data
into the target using
informatica mapping?
We will create a simple pass through mapping to load the data and "file name" from
a flat file into the
target. Assume that we have a source file "customers" and want to load this data
into the target
"customers_tgt". The structures of source and target are

Source file name: [Link]

Customer_Id

Location

Target: Customers_TBL

Customer_Id

Location

FileName

The steps involved are:

. Login to the powercenter mapping designer and go to the source analyzer.


. You can create the flat file or import the flat file.
. Once you created a flat file, edit the source and go to the properties tab. Check
the option "Add
Currently Processed Flat File Name Port". This option is shown in the below image.
[Link]
Flat_file_properties.jpg
[Link]
flat_file_pass_through_mapping.jpg

. A new port, "CurrentlyProcessedFileName" is created in the ports tab.


. Now go to the Target Designer or Warehouse Designer and create or import the
target
definition. Create a "Filename" port in the target.
. Go to the Mapping designer tab and create new mapping.
. Drag the source and target into the mapping. Connect the appropriate ports of
source qualifier
transformation to the target.
. Now create a workflow and session. Edit the session and enter the appropriate
values for
source and target connections.
. The mapping flow is shown in the below image

The loading of the filename works for both Direct and Indirect Source filetype.
After running the
workflow, the data and the filename will be loaded in to the target. The important
point to note is the
complete path of the file will be loaded into the target. This means that the
directory path and the
filename will be loaded(example: /informatica/9.1/SrcFiles/[Link]).
If you don't want the directory path and just want the filename to be loaded into
the target, then follow the below steps:

. Create an expression transformation and drag the ports of the source qualifier
transformation into it.
. Edit the expression transformation, go to the ports tab, create an output port and
assign the below expression to it.

REVERSE
(
  SUBSTR
  (
    REVERSE(CurrentlyProcessedFileName),
    1,
    INSTR(REVERSE(CurrentlyProcessedFileName), '/') - 1
  )
)

. Now connect the appropriate ports of the expression transformation to the target
definition.
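
If you want to sanity-check what the expression returns, you can try an equivalent
string operation in Oracle SQL. The query below is only a hedged illustration: it
uses INSTR with a negative start position instead of the REVERSE trick, and the
path value is a hypothetical example.

SELECT SUBSTR('/informatica/9.1/SrcFiles/customers.dat',
              INSTR('/informatica/9.1/SrcFiles/customers.dat', '/', -1) + 1)
       AS file_name
FROM dual;
-- returns: customers.dat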

Joiner Transformation in Informatica

The joiner transformation is an active and connected transformation used to join two
heterogeneous sources. The joiner transformation joins sources based on a condition
that matches one or more pairs of columns between the two sources. The two input
pipelines include a master and a detail pipeline or branch. To join more than two
sources, you need to join the output of the joiner transformation with another
source. To join n sources in a mapping, you need n-1 joiner transformations.

Creating Joiner Transformation

Follow the below steps to create a joiner transformation in informatica

. Go to the mapping designer, click on the Transformation->Create.


. Select the joiner transformation, enter a name and click on OK.
. Drag the ports from the first source into the joiner transformation. By default
the designer
creates the input/output ports for the source fields in the joiner transformation
as detail fields.
. Now drag the ports from the second source into the joiner transformation. By
default the
designer configures the second source ports as master fields.
. Edit the joiner transformation, go the ports tab and check on any box in the M
column to switch
the master/detail relationship for the sources.
. Go to the condition tab, click on the Add button to add a condition. You can add
multiple
conditions.
. Go to the properties tab and configure the properties of the joiner
transformation.
[Link]
Joiner_transformation_properties.jpg

Configuring Joiner Transformation

Configure the following properties of joiner transformation:

. Case-Sensitive String Comparison: When performing joins on string columns, the
integration service uses this option. By default the case-sensitive string
comparison option is checked.
. Cache Directory: Directory used to cache the master or detail rows. The default
directory path
is $PMCacheDir. You can override this value.
. Join Type: The type of join to be performed. Normal Join, Master Outer Join,
Detail Outer Join
or Full Outer Join.
. Tracing Level: Level of tracing to be tracked in the session log file.
. Joiner Data Cache Size: Size of the data cache. The default value is Auto.
. Joiner Index Cache Size: Size of the index cache. The default value is Auto.
. Sorted Input: If the input data is in sorted order, then check this option for
better performance.
. Master Sort Order: Sort order of the master source data. Choose Ascending if the
master
source data is sorted in ascending order. You have to enable Sorted Input option if
you choose
Ascending. The default value for this option is Auto.
. Transformation Scope: You can choose the transformation scope as All Input or
Row.

Join Condition

The integration service joins both the input sources based on the join condition.
The join condition
contains ports from both the input sources that must match. You can specify only
the equal (=)
operator between the join columns. Other operators are not allowed in the join
condition. As an
example, if you want to join the employees and departments table then you have to
specify the join
condition as department_id1= department_id. Here department_id1 is the port of
departments
source and department_id is the port of employees source.

Join Type

The joiner transformation supports the following four types of joins:

. Normal Join
. Master Outer Join
. Detail Outer Join
. Full Outer Join

We will learn about each join type with an example. Let's say I have the following
Subjects and Students tables as the sources.

Table Name: Subjects

Subject_Id subject_Name

-----------------------

1 Maths

2 Chemistry

3 Physics

Table Name: Students

Student_Id Subject_Id

---------------------

10 1

20 2

30 NULL
Assume that subjects source is the master and students source is the detail and we
will join these
sources on the subject_id port.

Normal Join:

The joiner transformation outputs only the records that match the join condition
and discards all the
rows that do not match the join condition. The output of the normal join is

Master Ports | Detail Ports

---------------------------------------------

Subject_Id Subject_Name Student_Id Subject_Id

---------------------------------------------

1 Maths 10 1

2 Chemistry 20 2

Master Outer Join:

In a master outer join, the joiner transformation keeps all the records from the
detail source and only
the matching rows from the master source. It discards the unmatched rows from the
master source.
The output of master outer join is

Master Ports | Detail Ports

---------------------------------------------

Subject_Id Subject_Name Student_Id Subject_Id

---------------------------------------------

1 Maths 10 1
2 Chemistry 20 2
NULL NULL 30 NULL

Detail Outer Join:

In a detail outer join, the joiner transformation keeps all the records from the
master source and only
the matching rows from the detail source. It discards the unmatched rows from the
detail source.
The output of detail outer join is

Master Ports | Detail Ports

---------------------------------------------

Subject_Id Subject_Name Student_Id Subject_Id

---------------------------------------------

1 Maths 10 1

2 Chemistry 20 2

3 Physics NULL NULL

Full Outer Join:

The full outer join first brings the matching rows from both the sources and then
it also keeps the
non-matched records from both the master and detail sources. The output of full
outer join is

Master Ports | Detail Ports

---------------------------------------------

Subject_Id Subject_Name Student_Id Subject_Id

---------------------------------------------

1 Maths 10 1
2 Chemistry 20 2

3 Physics NULL NULL

NULL NULL 30 NULL
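
For intuition, the four join types roughly correspond to the following SQL queries
over the Subjects (master) and Students (detail) sources above. This is only a
sketch of the equivalent set logic, assuming the two tables exist as shown; it is
not SQL generated by Informatica.

-- Normal join: only the matching rows.
SELECT m.subject_id, m.subject_name, d.student_id, d.subject_id AS stu_subject_id
FROM   subjects m JOIN students d ON m.subject_id = d.subject_id;

-- Master outer join: all detail (students) rows plus matching master rows.
SELECT m.subject_id, m.subject_name, d.student_id, d.subject_id AS stu_subject_id
FROM   students d LEFT OUTER JOIN subjects m ON m.subject_id = d.subject_id;

-- Detail outer join: all master (subjects) rows plus matching detail rows.
SELECT m.subject_id, m.subject_name, d.student_id, d.subject_id AS stu_subject_id
FROM   subjects m LEFT OUTER JOIN students d ON m.subject_id = d.subject_id;

-- Full outer join: matching rows plus unmatched rows from both sources.
SELECT m.subject_id, m.subject_name, d.student_id, d.subject_id AS stu_subject_id
FROM   subjects m FULL OUTER JOIN students d ON m.subject_id = d.subject_id;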

Sorted Input

Use the sorted input option in the joiner properties tab when both the master and
detail sources are sorted on the ports specified in the join condition. You can
improve performance with the sorted input option because the integration service
performs the join while minimizing the number of disk I/Os. You see good performance
gains when working with large data sets.

Steps to follow for configuring the sorted input option

. Sort the master and detail source either by using the source qualifier
transformation or sorter
transformation.
. Sort both the source on the ports to be used in join condition either in
ascending or descending
order.
. Specify the Sorted Input option in the joiner transformation properties tab.

Why the joiner transformation is called a blocking transformation

The integration service blocks and unblocks the source data depending on whether
the joiner
transformation is configured for sorted input or not.

Unsorted Joiner Transformation

In case of an unsorted joiner transformation, the integration service first reads
all the master rows before it reads the detail rows. The integration service blocks
the detail source while it caches all the master rows. Once it has read all the
master rows, it unblocks the detail source and reads the detail rows.

Sorted Joiner Transformation


Blocking logic may or may not be possible in case of a sorted joiner transformation.
The integration service uses blocking logic if it can do so without blocking all
sources in the target load order group. Otherwise, it does not use blocking logic.

Joiner Transformation Performance Improve Tips

To improve the performance of a joiner transformation, follow the below tips:

. If possible, perform joins in a database. Performing joins in a database is faster
than performing joins in a session.
. You can improve the session performance by configuring the Sorted Input option in
the joiner transformation properties tab.
. Specify the source with fewer rows and with fewer duplicate keys as the master and
the other source as the detail.

Limitations of Joiner Transformation

The limitations of joiner transformation are

. You cannot use joiner transformation when the input pipeline contains an update
strategy
transformation.
. You cannot connect a sequence generator transformation directly to the joiner
transformation.

Design/Implement/Create SCD Type 2 Effective Date Mapping in Informatica

Q) How to create or implement slowly changing dimension (SCD) Type 2 Effective Date
mapping in
informatica?
SCD type 2 will store the entire history in the dimension table. In SCD type 2
effective date, the
dimension table will have Start_Date (Begin_Date) and End_Date as the fields. If
the End_Date is
Null, then it indicates the current row. Know more about SCDs at Slowly Changing
Dimensions
Concepts.
We will see how to implement the SCD Type 2 Effective Date in informatica. As an
example consider
the customer dimension. The source and target table structures are shown below:

--Source Table
Create Table Customers
(
Customer_Id Number Primary Key,
Location    Varchar2(30)
);

--Target Dimension Table
Create Table Customers_Dim
(
Cust_Key    Number Primary Key,
Customer_Id Number,
Location    Varchar2(30),
Begin_Date  Date,
End_Date    Date
);

The basic steps involved in creating a SCD Type 2 Effective Date mapping are

. Identifying the new records and inserting them into the dimension table with
Begin_Date as the current date (SYSDATE) and End_Date as NULL.
. Identifying the changed records and inserting them into the dimension table with
Begin_Date as the current date (SYSDATE) and End_Date as NULL.
. Identifying the changed records and updating the existing rows in the dimension
table with End_Date as the current date.
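
The net effect of these steps on the dimension table can be pictured with plain SQL.
The statements below are an illustration only (customer id 101, the location value
and the sequence customers_dim_seq are hypothetical); the mapping performs this row
by row through the lookup, filter, sequence generator and update strategy
transformations.

-- New customer, or new version of a changed customer: insert a current row.
INSERT INTO Customers_Dim (Cust_Key, Customer_Id, Location, Begin_Date, End_Date)
VALUES (customers_dim_seq.NEXTVAL, 101, 'New York', SYSDATE, NULL);

-- Changed customer only: close the previously current row
-- (identified in the mapping by the Cust_Key returned from the lookup).
UPDATE Customers_Dim
SET    End_Date = SYSDATE
WHERE  Cust_Key = 2;   -- hypothetical key of the old current row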

We will divide the steps to implement the SCD type 2 Effective Date mapping into
four parts.
SCD Type 2 Effective Date implementation - Part 1
[Link]
LKP_PORTS_WINDOW.jpg
Here we will see the basic setup and mapping flow required for SCD type 2 Effective
Date. The steps involved are:

. Create the source and dimension tables in the database.


. Open the mapping designer tool, source analyzer and either create or import the
source
definition.
. Go to the Warehouse designer or Target designer and import the target definition.

. Go to the mapping designer tab and create new mapping.


. Drag the source into the mapping.
. Go to the toolbar, Transformation and then Create.
. Select the lookup Transformation, enter a name and click on create. You will get
a window as
shown in the below image.

. Select the customer dimension table and click on OK.


. Edit the lookup transformation, go to the ports tab and remove unnecessary ports.
Just keep
only Cust_key, customer_id and location ports in the lookup transformation. Create
a new port
(IN_Customer_Id) in the lookup transformation. This new port needs to be connected
to the
customer_id port of the source qualifier transformation.
[Link]
scd_type2_effective_date_mapping_part1.jpg
. Go to the conditions tab of the lookup transformation and enter the condition as
Customer_Id =
IN_Customer_Id
. Go to the properties tab of the LKP transformation and enter the below query in
Lookup SQL
Override. Alternatively you can generate the SQL query by connecting the database
in the
Lookup SQL Override expression editor and then add the WHERE clause.

SELECT Customers_Dim.Cust_Key as Cust_Key,

Customers_Dim.Location as Location,

Customers_Dim.Customer_Id as Customer_Id

FROM Customers_Dim

WHERE Customers_Dim.End_Date IS NULL

. Click on Ok in the lookup transformation. Connect the customer_id port of source


qualifier
transformation to the In_Customer_Id port of the LKP transformation.
. Create an expression transformation with input/output ports as Cust_Key,
LKP_Location,
Src_Location and output ports as New_Flag, Changed_Flag. Enter the below
expressions for
output ports.

New_Flag = IIF(ISNULL(Cust_Key), 1,0)

Changed_Flag = IIF( NOT ISNULL(Cust_Key) AND

LKP_Location != SRC_Location, 1, 0)

. The part of the mapping flow is shown below.


[Link]
scd_type2_effective_date_mapping_part2.jpg
SCD Type 2 Effective Date implementation - Part 2
In this part, we will identify the new records and insert them into the target with
Begin Date as the
current date. The steps involved are:

. Now create a filter transformation to identify and insert new record in to the
dimension table.
Drag the ports of expression transformation (New_Flag) and source qualifier
transformation
(Customer_Id, Location) into the filter transformation.
. Go the properties tab of filter transformation and enter the filter condition as
New_Flag=1
. Now create a update strategy transformation and connect the ports of filter
transformation
(Customer_Id, Location). Go to the properties tab and enter the update strategy
expression as
DD_INSERT.
. Now drag the target definition into the mapping and connect the appropriate ports
of update
strategy transformation to the target definition.
. Create a sequence generator and an expression transformation. Call this
expression
transformation as "Expr_Date".
. Drag and connect the NextVal port of sequence generator to the Expression
transformation. In
the expression transformation create a new output port (Begin_Date with date/time
data type)
and assign value SYSDATE to it.
. Now connect the ports of expression transformation (Nextval, Begin_Date) to the
Target
definition ports (Cust_Key, Begin_Date). The part of the mapping flow is shown in
the below
image.

SCD Type 2 Effective Date implementation - Part 3


In this part, we will identify the changed records and insert them into the target
with Begin Date as
the current date. The steps involved are:

. Create a filter transformation. Call this filter transformation FIL_Changed. It is
used to find the changed records. Now drag the ports from the expression
transformation (changed_flag), the source qualifier transformation (customer_id,
location) and the LKP transformation (Cust_Key) into the filter transformation.
. Go to the filter transformation properties and enter the filter condition as
changed_flag =1.
. Now create an update strategy transformation and drag the ports of Filter
transformation
(customer_id, location) into the update strategy transformation. Go to the
properties tab and
enter the update strategy expression as DD_INSERT.
. Now drag the target definition into the mapping and connect the appropriate ports
of update
strategy transformation to the target definition.
. Now connect the Next_Val, Begin_Date ports of expression transformation
(Expr_Date created
in part 2) to the cust_key, Begin_Date ports of the target definition respectively.
The part of the
mapping diagram is shown below.
[Link]
scd_type2_effective_date_mapping_part3.jpg
[Link]
scd_type2_effective_date_mapping_complete.jpg

SCD Type 2 Effective Date implementation - Part 4


In this part, we will update the changed records in the dimension table with End
Date as current
date.

. Create an expression transformation and drag the Cust_Key port of the filter
transformation (FIL_Changed created in part 3) into the expression transformation.
. Go to the ports tab of expression transformation and create a new output port
(End_Date with
date/time data type). Assign a value SYSDATE to this port.
. Now create an update strategy transformation and drag the ports of the expression
transformation into it. Go to the properties tab and enter the update strategy
expression as DD_UPDATE.
. Drag the target definition into the mapping and connect the appropriate ports of
update strategy
to it. The complete mapping image is shown below.

Recommended Reading
Learn how to Design Different Types of SCDs in informatica
SCD Type 1
SCD Type 3
SCD Type 2 version
SCD Type 2 Flag
SCD Type 2 Effective Date

Design/Implement/Create SCD Type 2 Flag Mapping in Informatica


Q) How to create or implement slowly changing dimension (SCD) Type 2 Flagging
mapping in
informatica?
SCD type 2 will store the entire history in the dimension table. Know more about
SCDs at Slowly
Changing Dimensions Concepts.
We will see how to implement the SCD Type 2 Flag in informatica. As an example
consider the
customer dimension. The source and target table structures are shown below:

--Source Table

Create Table Customers

Customer_Id Number Primary Key,

Location Varchar2(30)

);

--Target Dimension Table

Create Table Customers_Dim

Cust_Key Number Primary Key,

Customer_Id Number,

Location Varchar2(30),

Flag Number

);

The basic steps involved in creating a SCD Type 2 Flagging mapping are

. Identifying the new records and inserting into the dimension table with flag
column value as one.
. Identifying the changed record and inserting it into the dimension table with flag
value as one.
. Identifying the changed record and updating the existing record in the dimension
table with flag value as zero.

We will divide the steps to implement the SCD type 2 flagging mapping into four
parts.
SCD Type 2 Flag implementation - Part 1
Here we will see the basic set up and mapping flow require for SCD type 2 Flagging.
The steps
involved are:

. Create the source and dimension tables in the database.


. Open the mapping designer tool, source analyzer and either create or import the
source
definition.
. Go to the Warehouse designer or Target designer and import the target definition.

. Go to the mapping designer tab and create new mapping.


. Drag the source into the mapping.
. Go to the toolbar, Transformation and then Create.
. Select the lookup Transformation, enter a name and click on create. You will get
a window as
shown in the below image.

. Select the customer dimension table and click on OK.


. Edit the lookup transformation, go to the ports tab and remove unnecessary ports.
Just keep
only Cust_key, customer_id and location ports in the lookup transformation. Create
a new port
(IN_Customer_Id) in the lookup transformation. This new port needs to be connected
to the
customer_id port of the source qualifier transformation.
[Link]
LKP_PORTS_WINDOW.jpg

. Go to the conditions tab of the lookup transformation and enter the condition as
Customer_Id =
IN_Customer_Id
. Go to the properties tab of the LKP transformation and enter the below query in
Lookup SQL
Override. Alternatively you can generate the SQL query by connecting the database
in the
Lookup SQL Override expression editor and then add the WHERE clause.

SELECT Customers_Dim.Cust_Key as Cust_Key,

Customers_Dim.Location as Location,

Customers_Dim.Customer_Id as Customer_Id

FROM Customers_Dim

WHERE Customers_Dim.Flag = 1

. Click on Ok in the lookup transformation. Connect the customer_id port of source


qualifier
transformation to the In_Customer_Id port of the LKP transformation.
. Create an expression transformation with input/output ports as Cust_Key,
LKP_Location,
Src_Location and output ports as New_Flag, Changed_Flag. Enter the below
expressions for
output ports.

New_Flag = IIF(ISNULL(Cust_Key), 1, 0)

Changed_Flag = IIF( NOT ISNULL(Cust_Key) AND
                    LKP_Location != SRC_Location, 1, 0)

. The part of the mapping flow is shown below.
[Link]
scd_type2_flag_mapping_part1.jpg
[Link]
scd_type2_flag_mapping_part2.jpg

SCD Type 2 Flag implementation - Part 2


In this part, we will identify the new records and insert them into the target with
flag value as 1. The
steps involved are:

. Now create a filter transformation to identify and insert new record in to the
dimension table.
Drag the ports of expression transformation (New_Flag) and source qualifier
transformation
(Customer_Id, Location) into the filter transformation.
. Go the properties tab of filter transformation and enter the filter condition as
New_Flag=1
. Now create a update strategy transformation and connect the ports of filter
transformation
(Customer_Id, Location). Go to the properties tab and enter the update strategy
expression as
DD_INSERT.
. Now drag the target definition into the mapping and connect the appropriate ports
of update
strategy transformation to the target definition.
. Create a sequence generator and an expression transformation. Call this
expression
transformation as "Expr_Flag".
. Drag and connect the NextVal port of sequence generator to the Expression
transformation. In
the expression transformation create a new output port (Flag) and assign value 1 to
it.
. Now connect the ports of expression transformation (Nextval, Flag) to the Target
definition ports
(Cust_Key, Flag). The part of the mapping flow is shown in the below image.

SCD Type 2 Flag implementation - Part 3


In this part, we will identify the changed records and insert them into the target
with flag value as 1.
The steps involved are:
[Link]
scd_type2_flag_mapping_part3.jpg
[Link]
scd_type2_flag_mapping_complete.jpg
. Create a filter transformation. Call this filter transformation as FIL_Changed.
This is used to find
the changed records. Now drag the ports from expression transformation
(changed_flag),
source qualifier transformation (customer_id, location), LKP transformation
(Cust_Key) into the
filter transformation.
. Go to the filter transformation properties and enter the filter condition as
changed_flag =1.
. Now create an update strategy transformation and drag the ports of Filter
transformation
(customer_id, location) into the update strategy transformation. Go to the
properties tab and
enter the update strategy expression as DD_INSERT.
. Now drag the target definition into the mapping and connect the appropriate ports
of update
strategy transformation to the target definition.
. Now connect the Next_Val, Flag ports of expression transformation (Expr_Flag
created in part
2) to the cust_key, Flag ports of the target definition respectively. The part of
the mapping
diagram is shown below.

SCD Type 2 Flag implementation - Part 4


In this part, we will update the changed records in the dimension table with flag
value as 0.

. Create an expression transformation and drag the Cust_Key port of filter


transformation
(FIL_Changed created in part 3) into the expression transformation.
. Go to the ports tab of expression transformation and create a new output port
(Flag). Assign a
value "0" to this Flag port.
. Now create an update strategy transformation and drag the ports of the expression

transformation into it. Go to the properties tab and enter the update strategy
expression as
DD_UPDATE.
. Drag the target definition into the mapping and connect the appropriate ports of
update strategy
to it. The complete mapping image is shown below.
Recommended Reading
Learn how to Design Different Types of SCDs in informatica
SCD Type 1
SCD Type 3
SCD Type 2 version
SCD Type 2 Flag
SCD Type 2 Effective Date

Design/Implement/Create SCD Type 2 Version Mapping in Informatica

Q) How to create or implement a slowly changing dimension (SCD) Type 2 versioning
mapping in Informatica?
SCD type 2 will store the entire history in the dimension table. Know more about
SCDs at Slowly
Changing Dimensions DW Concepts.
We will see how to implement the SCD Type 2 version in informatica. As an example
consider the
customer dimension. The source and target table structures are shown below:

--Source Table
Create Table Customers
(
Customer_Id Number Primary Key,
Location    Varchar2(30)
);

--Target Dimension Table
Create Table Customers_Dim
(
Cust_Key    Number Primary Key,
Customer_Id Number,
Location    Varchar2(30),
Version     Number
);

The basic steps involved in creating a SCD Type 2 version mapping are

. Identifying the new records and inserting them into the dimension table with
version number as one.
. Identifying the changed records and inserting them into the dimension table with
an incremented version number, as sketched below.
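
Expressed as plain SQL for a single changed customer, the intended effect looks
roughly like the statement below. It is an illustration only: customer id 101, the
location value and the sequence customers_dim_seq are hypothetical stand-ins for the
lookup, expression and sequence generator logic described in the following parts.

INSERT INTO Customers_Dim (Cust_Key, Customer_Id, Location, Version)
VALUES (customers_dim_seq.NEXTVAL, 101, 'Chicago',
        (SELECT NVL(MAX(Version), 0) + 1
         FROM   Customers_Dim
         WHERE  Customer_Id = 101));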

Let's divide the steps to implement the SCD type 2 version mapping into three parts.

SCD Type 2 version implementation - Part 1


Here we will see the basic set up and mapping flow require for SCD type 2 version.
The steps
involved are:

. Create the source and dimension tables in the database.


. Open the mapping designer tool, source analyzer and either create or import the
source
definition.
. Go to the Warehouse designer or Target designer and import the target definition.

. Go to the mapping designer tab and create new mapping.


. Drag the source into the mapping.
. Go to the toolbar, Transformation and then Create.
. Select the lookup Transformation, enter a name and click on create. You will get
a window as
shown in the below image.

. Select the customer dimension table and click on OK.


. Edit the lookup transformation, go to the ports tab and remove unnecessary ports.
Just keep
only Cust_key, customer_id, location ports and Version ports in the lookup
transformation.
[Link]
LKP_PORTS_window.jpg
Create a new port (IN_Customer_Id) in the lookup transformation. This new port
needs to be
connected to the customer_id port of the source qualifier transformation.

. Go to the conditions tab of the lookup transformation and enter the condition as
Customer_Id =
IN_Customer_Id
. Go to the properties tab of the LKP transformation and enter the below query in
Lookup SQL
Override. Alternatively you can generate the SQL query by connecting the database
in the
Lookup SQL Override expression editor and then add the order by clause.

SELECT Customers_Dim.Cust_Key as Cust_Key,

Customers_Dim.Location as Location,

Customers_Dim.Version as Version,

Customers_Dim.Customer_Id as Customer_Id

FROM Customers_Dim

ORDER BY Customers_Dim.Customer_Id, Customers_Dim.Version--

. You have to use an order by clause in the above query. If you sort the version
column in
ascending order, then you have to specify "Use Last Value" in the "Lookup policy on
multiple
match" property. If you have sorted the version column in descending order then you
have to
specify the "Lookup policy on multiple match" option as "Use First Value"
. Click on Ok in the lookup transformation. Connect the customer_id port of source
qualifier
transformation to the In_Customer_Id port of the LKP transformation.
[Link]
SCD_type2_version_mapping_part_1.jpg
. Create an expression transformation with input/output ports as Cust_Key,
LKP_Location,
Src_Location and output ports as New_Flag, Changed_Flag. Enter the below
expressions for
output ports.

New_Flag = IIF(ISNULL(Cust_Key), 1,0)

Changed_Flag = IIF( NOT ISNULL(Cust_Key) AND

LKP_Location != SRC_Location, 1, 0)

The part of the mapping flow is shown below.

SCD Type 2 version implementation - Part 2


In this part, we will identify the new records and insert them into the target with
version value as 1.
The steps involved are:

. Now create a filter transformation to identify and insert new record in to the
dimension table.
Drag the ports of expression transformation (New_Flag) and source qualifier
transformation
(Customer_Id, Location) into the filter transformation.
. Go the properties tab of filter transformation and enter the filter condition as
New_Flag=1
. Now create a update strategy transformation and connect the ports of filter
transformation
(Customer_Id, Location). Go to the properties tab and enter the update strategy
expression as
DD_INSERT.
. Now drag the target definition into the mapping and connect the appropriate ports
of update
strategy transformation to the target definition.
. Create a sequence generator and an expression transformation. Call this
expression
transformation as "Expr_Ver".
. Drag and connect the NextVal port of sequence generator to the Expression
transformation. In
the expression transformation create a new output port (Version) and assign value 1
to it.
. Now connect the ports of expression transformation (Nextval, Version) to the
Target definition
ports (Cust_Key, Version). The part of the mapping flow is shown in the below
image.
[Link]
SCD_type2_version_mapping_part_2.jpg
[Link]
SCD_type2_version_complete_mapping.jpg

SCD Type 2 Version implementation - Part 3


In this part, we will identify the changed records and insert them into the target
by incrementing the
version number. The steps involved are:

. Create a filter transformation. This is used to find the changed records. Now drag
the ports from the expression transformation (changed_flag), the source qualifier
transformation (customer_id, location) and the LKP transformation (version) into the
filter transformation.
. Go to the filter transformation properties and enter the filter condition as
changed_flag =1.
. Create an expression transformation and drag the ports of filter transformation
except the
changed_flag port into the expression transformation.
. Go to the ports tab of expression transformation and create a new output port
(O_Version) and
assign the expression as (version+1).
. Now create an update strategy transformation and drag the ports of expression
transformation
(customer_id, location,o_version) into the update strategy transformation. Go to
the properties
tab and enter the update strategy expression as DD_INSERT.
. Now drag the target definition into the mapping and connect the appropriate ports
of update
strategy transformation to the target definition.
. Now connect the Next_Val port of expression transformation (Expr_Ver created in
part 2) to the
cust_key port of the target definition. The complete mapping diagram is shown in
the below
image:

You can implement the SCD Type 2 version mapping in your own way. Remember that the SCD
Type 2 version mapping is rarely used in real-world projects.
Recommended Reading
Learn how to Design Different Types of SCDs in informatica
SCD Type 1
SCD Type 3
SCD Type 2 version
SCD Type 2 Flag
SCD Type 2 Effective Date
Rank Transformation in Informatica

Rank transformation is an active and connected transformation. The rank transformation is used to
select the top or bottom rank of data, i.e. the largest or smallest numeric or string values. The
integration service caches the input data and then performs the rank calculations.
Creating Rank Transformation
Follow the below steps to create a rank transformation

. In the mapping designer, create a new mapping or open an existing mapping.


. Go to Toolbar->click Transformation -> Create. Select the Rank transformation.
. Enter a name, click on Create and then click on Done.
. By default, the rank transformation creates a RANKINDEX port. The RankIndex port
is used to
store the ranking position of each row in the group.
. You can add additional ports to the rank transformation either by selecting and
dragging ports
from other transformations or by adding the ports manually in the ports tab.
. In the ports tab, check the Rank (R) option for the port on which you want to do the ranking. You
can check the Rank (R) option for only one port. Optionally, you can create groups for the ranked
rows by selecting the Group By option for the ports that define the groups.

Configuring the Rank Transformation


Configure the following properties of Rank transformation

. Cache Directory: Directory where the integration service creates the index and
data cache
files.
. Top/Bottom: Specify whether you want to select the top or bottom rank of data.
. Number of Ranks: specify the number of rows you want to rank.
. Case-Sensitive String Comparison: Specifies whether strings are compared case-sensitively
when ranking string values.
. Tracing Level: Amount of logging to be tracked in the session log file.
. Rank Data Cache Size: The data cache size default value is 2,000,000 bytes. You
can set a
numeric value, or Auto for the data cache size. In case of Auto, the Integration
Service
determines the cache size at runtime.
. Rank Index Cache Size: The index cache size default value is 1,000,000 bytes. You
can set a
numeric value, or Auto for the index cache size. In case of Auto, the Integration
Service
determines the cache size at runtime.

Rank Transformation Examples:


Q) Create a mapping to load the target table with top 2 earners (employees) in each
department
using the rank transformation.
Solution:

. Create a new mapping, Drag the source definition into the mapping.
[Image: rank_ports_window.jpg]
[Image: rank_properties_window.jpg]
. Create a rank transformation and drag the ports of source qualifier
transformation into the rank
transformation.
. Now go to the ports tab of the rank transformation. Check the rank (R) option for
the salary port
and Group By option for the Dept_Id port.

. Go to the properties tab, select the Top/Bottom value as Top and the Number of
Ranks property
as 2.

. Now connect the ports of rank transformation to the target definition.
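For reference, the same top-2-earners-per-department result can be produced directly in Oracle SQL
with an analytic function. This is only a sketch of what the rank transformation computes; the
employees table and the emp_id, dept_id and salary column names are assumed.

SELECT emp_id, dept_id, salary
FROM (
       SELECT emp_id, dept_id, salary,
              RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) rnk
       FROM   employees
     )
WHERE  rnk <= 2;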


Create/Design/Implement SCD Type 3 Mapping in Informatica

Q) How to create or implement or design a slowly changing dimension (SCD) Type 3 using the
informatica ETL tool.
The SCD Type 3 method is used to store partial historical data in the Dimension
table. The
dimension table contains the current and previous data.
The process involved in the implementation of SCD Type 3 in informatica is

. Identifying the new record and inserting it into the dimension table.
. Identifying the changed record and updating the existing record in the dimension table.

We will see the implementation of SCD type 3 by using the customer dimension table as an
example. The source table looks as shown below.

CREATE TABLE Customers (
  Customer_Id Number,
  Location Varchar2(30)
);

Now I have to load the data of the source into the customer dimension table using
SCD Type 3. The
Dimension table structure is shown below.

CREATE TABLE Customers_Dim (
  Cust_Key Number,
  Customer_Id Number,
  Current_Location Varchar2(30),
  Previous_Location Varchar2(30)
);

Steps to Create SCD Type 3 Mapping


Follow the below steps to create SCD Type 3 mapping in informatica

. Create the source and dimension tables in the database.


. Open the mapping designer tool, source analyzer and either create or import the
source
definition.
[Image: LKP_window1.jpg]
. Go to the Warehouse designer or Target designer and import the target definition.

. Go to the mapping designer tab and create new mapping.


. Drag the source into the mapping.
. Go to the toolbar, Transformation and then Create.
. Select the lookup Transformation, enter a name and click on create. You will get
a window as
shown in the below image.

. Select the customer dimension table and click on OK.


. Edit the LKP transformation, go to the properties tab, remove the
Previous_Location port and
add a new port In_Customer_Id. This new port needs to be connected to the
Customer_Id port
of source qualifier transformation.

. Go to the condition tab of LKP transformation and enter the lookup condition as
Customer_Id =
IN_Customer_Id. Then click on OK.
. Connect the customer_id port of source qualifier transformation to the
IN_Customer_Id port of
LKP transformation.
. Create the expression transformation with input ports as Cust_Key, Prev_Location,

Curr_Location and output ports as New_Flag, Changed_Flag


. For the output ports of expression transformation enter the below expressions and
click on ok
[Image: type3_mapping_part1.jpg]
[Image: type3_mapping_part2.jpg]

New_Flag = IIF(ISNULL(Cust_Key),1,0)

Changed_Flag = IIF(NOT ISNULL(Cust_Key)

AND Prev_Location != Curr_Location,

1, 0 )

. Now connect the ports of the LKP transformation (Cust_Key, Current_Location) to the expression
transformation ports (Cust_Key, Prev_Location) and the Location port of the source qualifier
transformation to the expression transformation port Curr_Location.
. The mapping diagram so far created is shown in the below image.

. Create a filter transformation and drag the ports of source qualifier


transformation into it. Also
drag the New_Flag port from the expression transformation into it.
. Edit the filter transformation, go to the properties tab and enter the Filter
Condition as
New_Flag=1. Then click on ok.
. Now create an update strategy transformation and connect all the ports of the
filter
transformation (except the New_Flag port) to the update strategy. Go to the
properties tab of
update strategy and enter the update strategy expression as DD_INSERT
. Now drag the target definition into the mapping and connect the appropriate ports
from update
strategy to the target definition. Connect Location port of update strategy to the
Current_Location port of the target definition.
. Create a sequence generator transformation and connect the NEXTVAL port to the
target
surrogate key (cust_key) port.
. The part of the mapping diagram for inserting a new row is shown below:
[Image: type3_complete_mapping.jpg]
. Now create another filter transformation. Go to the ports tab and create the ports Cust_Key,
Curr_Location, Prev_Location and Changed_Flag. Connect the LKP transformation ports
(Cust_Key, Current_Location) to the filter transformation ports (Cust_Key, Prev_Location), the
source qualifier transformation port (Location) to the filter transformation port (Curr_Location),
and the expression transformation port (Changed_Flag) to the Changed_Flag port of the filter
transformation.
. Edit the filter transformation, go to the properties tab and enter the Filter
Condition as
Changed_Flag=1. Then click on ok.
. Now create an update strategy transformation and connect the ports of the filter
transformation
(Cust_Key, Curr_Location, Prev_location) to the update strategy. Go to the
properties tab of
update strategy and enter the update strategy expression as DD_Update
. Now drag the target definition into the mapping and connect the appropriate ports
from update
strategy to the target definition.
. The complete mapping diagram is shown in the below image.
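The net effect of the two data flows above can be summarized with a rough SQL sketch. The
MERGE below is not part of the mapping; it only illustrates the Type 3 logic of inserting new
customers and shifting the current location into Previous_Location for changed customers (the
sequence name cust_key_seq is assumed for the surrogate key).

MERGE INTO Customers_Dim d
USING Customers s
ON (d.Customer_Id = s.Customer_Id)
WHEN MATCHED THEN
  UPDATE SET d.Previous_Location = d.Current_Location,
             d.Current_Location  = s.Location
  WHERE d.Current_Location != s.Location
WHEN NOT MATCHED THEN
  INSERT (Cust_Key, Customer_Id, Current_Location, Previous_Location)
  VALUES (cust_key_seq.NEXTVAL, s.Customer_Id, s.Location, NULL);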

Recommended Reading
Learn how to Design Different Types of SCDs in informatica
SCD Type 1
SCD Type 3
SCD Type 2 version
SCD Type 2 Flag
SCD Type 2 Effective Date

Create/Design/Implement SCD Type 1 Mapping in Informatica

Q) How to create or implement or design a slowly changing dimension (SCD) Type 1 using the
informatica ETL tool.
The SCD Type 1 method is used when there is no need to store historical data in the
Dimension
table. The SCD type 1 method overwrites the old data with the new data in the
dimension table.
The process involved in the implementation of SCD Type 1 in informatica is

. Identifying the new record and inserting it into the dimension table.
. Identifying the changed record and updating the dimension table.

We will see the implementation of SCD type 1 by using the customer dimension table as an
example. The source table looks as shown below.

CREATE TABLE Customers (
  Customer_Id Number,
  Customer_Name Varchar2(30),
  Location Varchar2(30)
);

Now I have to load the data of the source into the customer dimension table using
SCD Type 1. The
Dimension table structure is shown below.

CREATE TABLE Customers_Dim (
  Cust_Key Number,
  Customer_Id Number,
  Customer_Name Varchar2(30),
  Location Varchar2(30)
);

Steps to Create SCD Type 1 Mapping


Follow the below steps to create SCD Type 1 mapping in informatica

. Create the source and dimension tables in the database.


. Open the mapping designer tool, source analyzer and either create or import the
source
definition.
. Go to the Warehouse designer or Target designer and import the target definition.

. Go to the mapping designer tab and create new mapping.


. Drag the source into the mapping.
. Go to the toolbar, Transformation and then Create.
. Select the lookup Transformation, enter a name and click on create. You will get
a window as
shown in the below image.
[Image: LKP_ports_window_type1.jpg]

. Select the customer dimension table and click on OK.


. Edit the lkp transformation, go to the properties tab, and add a new port
In_Customer_Id. This
new port needs to be connected to the Customer_Id port of source qualifier
transformation.

. Go to the condition tab of lkp transformation and enter the lookup condition as
Customer_Id =
IN_Customer_Id. Then click on OK.
[Image: LKP_ports_window_type1_condition_tab.jpg]

. Connect the customer_id port of source qualifier transformation to the


IN_Customer_Id port of
lkp transformation.
. Create the expression transformation with input ports as Cust_Key, Name,
Location, Src_Name,
Src_Location and output ports as New_Flag, Changed_Flag
. For the output ports of expression transformation enter the below expressions and
click on ok

New_Flag = IIF(ISNULL(Cust_Key),1,0)

Changed_Flag = IIF(NOT ISNULL(Cust_Key)

AND (Name != Src_Name

OR Location != Src_Location),

1, 0 )

. Now connect the ports of the LKP transformation (Cust_Key, Name, Location) to the expression
transformation ports (Cust_Key, Name, Location) and the ports of the source qualifier
transformation (Name, Location) to the expression transformation ports (Src_Name,
Src_Location) respectively.
. The mapping diagram so far created is shown in the below image.
[Image: scd_typ1_mapping_part1.jpg]
[Image: scd_typ1_mapping_part2.jpg]

. Create a filter transformation and drag the ports of source qualifier


transformation into it. Also
drag the New_Flag port from the expression transformation into it.
. Edit the filter transformation, go to the properties tab and enter the Filter
Condition as
New_Flag=1. Then click on ok.
. Now create an update strategy transformation and connect all the ports of the
filter
transformation (except the New_Flag port) to the update strategy. Go to the
properties tab of
update strategy and enter the update strategy expression as DD_INSERT
. Now drag the target definition into the mapping and connect the appropriate ports
from update
strategy to the target definition.
. Create a sequence generator transformation and connect the NEXTVAL port to the
target
surrogate key (cust_key) port.
. The part of the mapping diagram for inserting a new row is shown below:

. Now create another filter transformation and drag the ports from lkp
transformation (Cust_Key),
source qualifier transformation (Name, Location), expression transformation
(changed_flag)
ports into the filter transformation.
. Edit the filter transformation, go to the properties tab and enter the Filter
Condition as
Changed_Flag=1. Then click on ok.
. Now create an update strategy transformation and connect the ports of the filter
transformation
(Cust_Key, Name, and Location) to the update strategy. Go to the properties tab of
update
strategy and enter the update strategy expression as DD_Update
. Now drag the target definition into the mapping and connect the appropriate ports
from update
strategy to the target definition.
. The complete mapping diagram is shown in the below image.
[Image: scd_typ1_complete_mapping.jpg]
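As a cross-check, the insert and update paths of this mapping roughly correspond to the following
MERGE statement. This is only a sketch of the Type 1 overwrite logic, not the mapping itself; the
sequence name cust_key_seq is assumed for the surrogate key.

MERGE INTO Customers_Dim d
USING Customers s
ON (d.Customer_Id = s.Customer_Id)
WHEN MATCHED THEN
  UPDATE SET d.Customer_Name = s.Customer_Name,
             d.Location      = s.Location
  WHERE d.Customer_Name != s.Customer_Name
     OR d.Location      != s.Location
WHEN NOT MATCHED THEN
  INSERT (Cust_Key, Customer_Id, Customer_Name, Location)
  VALUES (cust_key_seq.NEXTVAL, s.Customer_Id, s.Customer_Name, s.Location);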

Recommended Reading
Learn how to Design Different Types of SCDs in informatica
SCD Type 1
SCD Type 3
SCD Type 2 version
SCD Type 2 Flag
SCD Type 2 Effective Date

Create/Implement SCD - Informatica Mapping Wizard

Q) How to create different types of slowly changing dimensions (SCD) in informatica using the
mapping wizard?

The Mapping Wizards in informatica provide an easy way to create the different types of SCDs. We
will see step by step how to create the SCDs using the mapping wizards.

The below steps are common for creating the SCD type 1, type 2 and type 3

Open the mapping designer tool, go to the source analyzer tab and either create or import the
source definition. As an example I am using the customer table as the source. The fields in the
customer table are listed below.

Customers (Customer_Id, Customer_Name, Location)

Go to the mapping designer tab, in the tool bar click on Mappings, select Wizards
and then click on
Slowly Changing Dimensions.
[Image: Mapping_Wizard.jpg]
[Image: SCD_Window1.jpg]

Now enter the mapping name and select the SCD mapping type you want to create. This
is shown in
the below image. Then click on Next.

Select the source table name (Customers in this example) and enter the name for the
target table to
be created. Then click on next.
[Image: SCD_Window2.jpg]

Now you have to select the logical key fields and the fields to compare for changes. Logical key
fields are the fields on which the source qualifier and the lookup will be joined. Fields to compare
for changes are the fields which are used to determine whether the values have changed or not.
Here I am using customer_id as the logical key field and location as the field to compare.

As of now we have seen the common steps for creating the SCDs. Now we will see the
specific
steps for creating each SCD

SCD Type 1 Mapping:

Once you have selected the logical key fields and the fields to compare for changes, simply click
the Finish button to create the SCD Type 1 mapping.
[Image: SCD_TYPE1.jpg]

SCD Type 2 Mapping:

After selecting the logical key fields click on the next button. You will get a window where you can
select which type of SCD 2 mapping you want to create. For

. Versioning: select "Keep the version number in separate column"
. Flagging: select "Mark the current dimension record with a flag"
. Effective Date: select "Mark the dimension records with their effective date"

Once you have selected the required type, then click on the finish button to create
the SCD type 2
mapping.
[Image: SCD_TYPE2.jpg]
[Image: SCD_TYPE3.jpg]

SCD Type 3 Mapping:

Click on the next button after selecting the logical key fields. You will get a window for selecting the
optional effective date. If you want the effective date to be created in the dimension table, check
this box; otherwise ignore it. Now click on the finish button to create the SCD type 3 mapping.
[Image: Constraint_based_load_ordering.jpg]

Constraint Based Loading in Informatica

Constraint based load ordering is used to load the data first into a parent table and then into the
child tables. You can specify the constraint based load ordering option in the Config Object tab of
the session. When the constraint based load ordering option is checked, the integration service
orders the target load on a row-by-row basis. For every row generated by the active source, the
integration service first loads the row into the primary key table and then into the foreign key
tables. Constraint based loading is helpful for loading normalized targets from a denormalized
source.

. The constraint based load ordering option applies only to insert operations.
. You cannot update or delete rows using constraint based load ordering.
. You have to define the primary key and foreign key relationships for the targets
in the
warehouse or target designer.
. The target tables must be in the same Target connection group.

Complete Constraint based load ordering


There is a workaround to do updates and deletes using constraint based load ordering. Informatica
PowerCenter provides an option called complete constraint-based loading for inserts, updates
and deletes in the target tables. To enable complete constraint based loading, specify
FullCBLOSupport=Yes in the Custom Properties attribute on the Config Object tab of the session.
This is shown in the below image.

When you enable complete constraint based loading, the change data (inserts,
updates and deletes)
is loaded in the same transaction control unit by using the row ID assigned to the
data by the CDC
reader. As a result the data is applied to the target in the same order in which it
was applied to the
sources. You can also set this property in the integration service, which makes it
applicable for all
the sessions and workflows. When you use complete constraint based load ordering,
mapping
should not contain active transformations which change the row ID generated by the
CDC reader.
The following transformations can change the row ID value

. Aggregator Transformation
. Custom Transformation configured as an active
. Joiner Transformation
. Normalizer Transformation
. Rank Transformation
. Sorter Transformation

Mapping Implementation of constraint based load ordering


As an example, consider the following source table with data to be loaded into the
target tables
using the custom transformation.

Table Name: EMP_DEPT

Create table emp_dept (
  dept_id number,
  dept_name varchar2(30),
  emp_id number,
  emp_name varchar2(30)
);

dept_id dept_name emp_id emp_name

---------------------------------

10 Finance 1 Mark

10 Finance 2 Henry

20 Hr 3 Christy

20 Hr 4 Tailor

The target tables should contain the below data.

Target Table 1: Dept


Create table dept (
  dept_id number primary key,
  dept_name varchar2(30)
);

dept_id dept_name

-----------------

10 Finance

20 Hr

Target Table 2: Emp

create table emp (
  dept_id number,
  emp_id number,
  emp_name varchar2(30),
  foreign key (dept_id) references dept(dept_id)
);

dept_id emp_id emp_name

---------------------------------
10 1 Mark

10 2 Henry

20 3 Christy

20 4 Tailor

Follow the below steps for creating the mapping using constraint based load
ordering option.

. Create the source and target tables in the oracle database


. Go to the mapping designer, source analyzer and import the source definition from
the oracle
database.
. Now go to the warehouse designer or target designer and import the target
definitions from the
oracle database.
. Make sure that the foreign key relationship exists between the dept and emp
targets. Otherwise
create the relationship as shown in the below images.
[Image: target_relational_source_foreign_key.jpg]
[Image: Constraint_based_load_ordering_mapping.jpg]

. Now create a new mapping. Drag the source and targets into the mapping.
. Connect the appropriate ports of source qualifier transformation to the target
definition as shown
in the below image.

. Go to the workflow manager tool, create a new workflow and then session.
. Go to the Config object tab of session and check the option of constraint based
load ordering.
. Go to the mapping tab and enter the connections for source and targets.
. Save the mapping and run the workflow.
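Conceptually, with constraint based load ordering enabled, each source row is applied to the targets
in the same order you would insert it by hand, parent first and then child, so the foreign key is
never violated. A rough SQL illustration for the first source row:

-- the parent (primary key) table is loaded first ...
INSERT INTO dept (dept_id, dept_name) VALUES (10, 'Finance');

-- ... and only then the child (foreign key) table for the same row
INSERT INTO emp (dept_id, emp_id, emp_name) VALUES (10, 1, 'Mark');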

Min and Max values of contiguous rows - Oracle SQL Query

Q) How to find the minimum and maximum values of continuous sequence numbers in a group of
rows?
The problem may not be clear without an example, so let's say I have the Employees table with
the below data.

Table Name: Employees

Dept_Id Emp_Seq

---------------
10 1

10 2

10 3

10 5

10 6

10 8

10 9

10 11

20 1

20 2

I want to find the minimum and maximum values of continuous Emp_Seq numbers. The
output
should look as.

Dept_Id Min_Seq Max_Seq

-----------------------

10 1 3

10 5 6

10 8 9

10 11 11

20 1 2

Write an SQL query in oracle to find the minimum and maximum values of continuous
Emp_Seq in
each department?
STEP1: First we will generate unique sequence numbers in each department using the
Row_Number analytic function in the Oracle. The SQL query is.
SELECT Dept_Id,

Emp_Seq,

ROW_NUMBER() OVER (PARTITION BY Dept_Id ORDER BY Emp_Seq) rn

FROM employees;

Dept_Id Emp_Seq rn

--------------------

10 1 1

10 2 2

10 3 3

10 5 4

10 6 5

10 8 6

10 9 7

10 11 8

20 1 1

20 2 2

STEP2: Subtract the value of rn from emp_seq to identify the continuous sequences
as a group. The
SQL query is

SELECT Dept_Id,

Emp_Seq,

Emp_Seq-ROW_NUMBER() OVER (PARTITION BY Dept_Id ORDER BY Emp_Seq) Dept_Split

FROM employees;
Dept_Id Emp_Seq Dept_Split

---------------------------

10 1 0

10 2 0

10 3 0

10 5 1

10 6 1

10 8 2

10 9 2

10 11 3

20 1 0

20 2 0

STEP3: The combination of the Dept_Id and Dept_Split fields will become the group
for continuous
rows. Now use group by on these fields and find the min and max values. The final
SQL query is

SELECT Dept_Id,
       MIN(Emp_Seq) Min_Seq,
       MAX(Emp_Seq) Max_Seq
FROM
(
  SELECT Dept_Id,
         Emp_Seq,
         Emp_Seq - ROW_NUMBER() OVER (PARTITION BY Dept_Id ORDER BY Emp_Seq) Dept_Split
  FROM   employees
) A
GROUP BY Dept_Id, Dept_Split;

Slowly Changing Dimensions (SCD) - Types | Data Warehouse

Slowly Changing Dimensions: Slowly changing dimensions are the dimensions in which
the data
changes slowly, rather than changing regularly on a time basis.
For example, you may have a customer dimension in a retail domain. Let's say the customer is in
India and every month he does some shopping. Creating the sales report for the customer is
easy. Now assume that the customer is transferred to the United States and he does his
shopping there. How do you record such a change in your customer dimension?
You could sum or average the sales done by the customer. In this case you won't get an exact
comparison of the sales done before and after the move. As the customer's salary increased after
the transfer, he/she might do more shopping in the United States compared to India, so a simple
total can make the sales trend look stronger than it really is. You could instead create a second
customer record and treat the transferred customer as a new customer. However this will create
problems too.
Handling these issues involves the SCD management methodologies referred to as Type 1 to
Type 3. The different types of slowly changing dimensions are explained in detail below.
SCD Type 1: SCD type 1 methodology is used when there is no need to store
historical data in the
dimension table. This method overwrites the old data in the dimension table with
the new data. It is
used to correct data errors in the dimension.
As an example, I have the customer table with the below data.

surrogate_key customer_id customer_name Location

------------------------------------------------

1 1 Marspton Illions

Here the customer name is misspelt. It should be Marston instead of Marspton. If you use the type 1
method, it simply overwrites the data. The data in the updated table will be:

surrogate_key customer_id customer_name Location

------------------------------------------------

1 1 Marston Illions
The advantage of type1 is ease of maintenance and less space occupied. The
disadvantage is that
there is no historical data kept in the data warehouse.
SCD Type 3: In the type 3 method, only the current status and previous status of the row are
maintained in the table. To track these changes two separate columns are created in the table.
The customer dimension table in the type 3 method will look as

surrogate_key customer_id customer_name Current_Location previous_location

--------------------------------------------------------------------------

1 1 Marston Illions NULL

Let's say the customer moves from Illions to Seattle; the updated table will look as

surrogate_key customer_id customer_name Current_Location previous_location

--------------------------------------------------------------------------

1 1 Marston Seattle Illions

Now again if the customer moves from seattle to NewYork, then the updated table
will be

surrogate_key customer_id customer_name Current_Location previous_location

--------------------------------------------------------------------------

1 1 Marston NewYork Seattle

The type 3 method will have limited history and it depends on the number of columns
you create.
SCD Type 2: SCD type 2 stores the entire history of the data in the dimension table. With type 2 we
can store unlimited history in the dimension table. In type 2, you can store the data in three
different ways. They are

. Versioning
. Flagging
. Effective Date

SCD Type 2 Versioning: In the versioning method, a sequence number is used to represent the
change. The latest sequence number always represents the current row and the previous
sequence numbers represent the past data.
As an example, let's use the same example of the customer who changes location. Initially the
customer is in the Illions location and the data in the dimension table will look as:

surrogate_key customer_id customer_name Location Version

--------------------------------------------------------

1 1 Marston Illions 1

The customer moves from Illions to Seattle and the version number will be
incremented. The
dimension table will look as

surrogate_key customer_id customer_name Location Version

--------------------------------------------------------

1 1 Marston Illions 1

2 1 Marston Seattle 2

Now again if the customer is moved to another location, a new record will be
inserted into the
dimension table with the next version number.
SCD Type 2 Flagging: In flagging method, a flag column is created in the dimension
table. The
current record will have the flag value as 1 and the previous records will have the
flag as 0.
Now for the first time, the customer dimension will look as.

surrogate_key customer_id customer_name Location flag

--------------------------------------------------------

1 1 Marston Illions 1

Now when the customer moves to a new location, the old records will be updated with
flag value as
0 and the latest record will have the flag value as 1.

surrogate_key customer_id customer_name Location flag

--------------------------------------------------------

1 1 Marston Illions 0
2 1 Marston Seattle 1

SCD Type 2 Effective Date: In Effective Date method, the period of the change is
tracked using the
start_date and end_date columns in the dimension table.

surrogate_key customer_id customer_name Location Start_date End_date

-------------------------------------------------------------------------

1 1 Marston Illions 01-Mar-2010 20-Feb-2011

2 1 Marston Seattle 21-Feb-2011 NULL

The NULL in the End_Date indicates the current version of the data and the
remaining records
indicate the past data.
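In SQL terms, recording a change with the effective date method is typically a two-step "close the
old row, insert the new row" operation. The sketch below only illustrates the idea; the table name
customer_dim and the surrogate key value are assumed.

-- close the current row for the customer
UPDATE customer_dim
SET    end_date = SYSDATE
WHERE  customer_id = 1
AND    end_date IS NULL;

-- insert the new current row with an open end date
INSERT INTO customer_dim
  (surrogate_key, customer_id, customer_name, location, start_date, end_date)
VALUES
  (2, 1, 'Marston', 'Seattle', SYSDATE, NULL);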

SQL Transformation in Script Mode Examples - Informatica

This is a continuation of my previous post on SQL Transformation in Query Mode. Here we will see
how to use the SQL transformation in script mode.
Script Mode
In script mode, you have to create the SQL scripts in text files. The SQL transformation runs your
SQL scripts from these text files. You have to pass each script file name from the source to the
SQL transformation ScriptName port. The script file name should contain a complete path to the
script file. The SQL transformation acts as a passive transformation in script mode and returns
one row for each input row. The output row contains the result of the script and any database
error.
SQL Transformation default ports in script mode
In script mode, by default three ports will be created in the SQL transformation. They are

. ScriptName (Input port) : Receives the name of the script to execute for the
current row.
. ScriptResult (output port) : Returns PASSED if the script execution succeeds for
the row.
Otherwise FAILED.
. ScriptError (Output port) : Returns errors that occur when a script fails for a
row.

Rules and Guidelines for script mode


You have to follow the below rules and guidelines when using the sql transformation
in script mode:
. You can run only static sql queries and cannot run dynamic sql queries in script
mode.
. You can include multiple sql queries in a script. You need to separate each query
with a
semicolon.
. The integration service ignores the output of select statements in the SQL
scripts.
. You cannot use procedural languages such as oracle plsql or Microsoft/Sybase T-
SQL in the
script.
. You cannot call a script from another script. Avoid using nested scripts.
. The script must be accessible to the integration service.
. You cannot pass arguments to the script.
. You can use mapping variables or parameters in the script file name.
. You can use static or dynamic database connection in the script mode.

Note: Use SQL transformation in script mode to run DDL (data definition language)
statements like
creating or dropping the tables.
Create SQL Transformation in Script Mode
We will see how to create sql transformation in script mode with an example. We
will create the
following sales table in oracle database and insert records into the table using
the SQL
transformation.

Script Name: $PMSourceFileDir/sales_ddl.txt

Create Table Sales (

Sale_id Number,

Product_name varchar2(30),

Price Number

);

Script Name: $PMSourceFileDir/sales_dml.txt

Insert into sales values(1,'Samsung',2000);

Insert into sales values(2,'LG',1000);

Insert into sales values(3,'Nokia',5000);

I created two script files in the $PMSourceFileDir directory. The sales_ddl.txt contains the sales
table creation statement and the sales_dml.txt contains the insert statements. These are the
script files to be executed by the SQL transformation.
[Image: flat_file_structure_informatica.jpg]
[Image: target_flat_file_structure_informatica.jpg]
be executed by SQL transformation.
We need a source which contains the above script file names. So, I created another
file in the
$PMSourceFileDir directory to store these script file names.

File name: $PMSourceFileDir/Script_names.txt

> cat $PMSourceFileDir/Script_names.txt

$PMSourceFileDir/sales_ddl.txt

$PMSourceFileDir/sales_dml.txt

Now we will create a mapping to execute the script files using the SQL
transformation. Follow the
below steps to create the mapping.

. Go to the mapping designer tool, source analyzer and create the source file
definition with the
structure as the $PMSourceFileDir/Script_names.txt file. The flat file structure is
shown in the
below image.

. Go to the warehouse designer or target designer and create a target flat file
with result and error
ports. This is shown in the below image.

. Go to the mapping designer and create a new mapping.


. Drag the flat file into the mapping designer.
. Go to the Transformation in the toolbar, Create, select the SQL transformation,
enter a name
and click on create.
. Now select the SQL transformation options as script mode and DB type as Oracle
and click ok.
[Image: sql_transformation_script_mode.jpg]
[Image: sql_transformation_script_mode_mapping.jpg]

. The SQL transformation is created with the default ports.


. Now connect the source qualifier transformation ports to the SQL transformation
input port.
. Drag the target flat file into the mapping and connect the SQL transformation
output ports to the
target.
. Save the mapping. The mapping flow image is shown in the below picture.

. Go to the workflow manager, create a new workflow and session.


. Edit the session. For source, enter the source file directory, source file name
options as
$PMSourceFileDir\ and Script_names.txt respectively. For the SQL transformation,
enter the
oracle database relational connection.
. Save the workflow and run it.

This will create the sales table in the oracle database and insert the records into it.
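If the scripts ran successfully, the ScriptResult port returns PASSED for both input rows, and the
result can be verified directly in the database, for example:

SELECT * FROM sales;

   SALE_ID PRODUCT_NAME    PRICE
---------- ------------ --------
         1 Samsung          2000
         2 LG               1000
         3 Nokia            5000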

SQL Transformation in Informatica with examples

SQL Transformation is a connected transformation used to process SQL queries in the midstream
of a pipeline. We can insert, update, delete and retrieve rows from the database at run time using
the SQL transformation.
The SQL transformation processes external SQL scripts or SQL queries created in the
SQL editor.
You can also pass the database connection information to the SQL transformation as
an input data
at run time.
The following SQL statements can be used in the SQL transformation.

. Data Definition Statements (CREATE, ALTER, DROP, TRUNCATE, RENAME)


. DATA MANIPULATION statements (INSERT, UPDATE, DELETE, MERGE)
. DATA Retrieval Statement (SELECT)
. DATA Control Language Statements (GRANT, REVOKE)
. Transaction Control Statements (COMMIT, ROLLBACK)

Configuring SQL Transformation

The following options can be used to configure an SQL transformation

. Mode: SQL transformation runs either in script mode or query mode.


. Active/Passive: By default, SQL transformation is an active transformation. You
can configure
it as passive transformation.
. Database Type: The type of database that the SQL transformation connects to.
. Connection type: You can pass database connection information or you can use a
connection
object.

We will see how to create an SQL transformation in script mode, query mode and
passing the
dynamic database connection with examples.

Creating SQL Transformation in Query Mode

Query Mode: The SQL transformation executes a query that is defined in the query editor. You can
pass parameters to the query to define dynamic queries. The SQL transformation can output
multiple rows when the query has a select statement. In query mode, the SQL transformation
acts as an active transformation.
You can create the following types of SQL queries
Static SQL query: The SQL query statement does not change, however you can pass
parameters
to the sql query. The integration service runs the query once and runs the same
query for all the
input rows.
Dynamic SQL query: The SQL query statement and the data can change. The integration
service
prepares the query for each input row and then runs the query.
SQL Transformation Example Using Static SQL query
Q1) Let's say we have the Products and Sales tables with the below data.

Table Name: Products

PRODUCT
-------

SAMSUNG

LG

IPhone

Table Name: Sales

PRODUCT QUANTITY PRICE

----------------------

SAMSUNG 2 100

LG 3 80

IPhone 5 200

SAMSUNG 5 500

Create a mapping to join the Products and Sales tables on the product column using the SQL
transformation. The output will be

PRODUCT QUANTITY PRICE

----------------------

SAMSUNG 2 100

SAMSUNG 5 500

LG 3 80

Solution:
Just follow the below steps for creating the SQL transformation to solve the
example

. Create a new mapping, drag the products source definition to the mapping.
. Go to the toolbar -> Transformation -> Create -> Select the SQL transformation.
Enter a name
and then click create.
[Image: SQL transformation - query mode options]
[Image: SQL transformation - SQL Ports tab]
. Select the execution mode as query mode, DB type as Oracle, connection type as static. This is
shown in the below image. Then click OK.

. Edit the sql transformation, go to the "SQL Ports" tab and add the input and
output ports as
shown in the below image.

. In the same "SQL Ports" Tab, go to the SQL query and enter the below sql in the
SQL editor.
select product, quantity, price from sales where product = ?product?

. Here ?product? is the parameter binding variable which takes its values from the
input port.
Now connect the source qualifier transformation ports to the input ports of SQL
transformation
and target input ports to the SQL transformation output ports. The complete mapping
flow is
shown below.

. Create the workflow, session and enter the connections for source, target. For
SQL
transformation also enter the source connection.

After you run the workflow, the integration service generates the following queries
for sql
transformation

select product, quantity, price from sales where product ='SAMSUNG'

select product, quantity, price from sales where product ='LG'

Dynamic SQL query: A dynamic SQL query can execute different query statements for
each input
row. You can pass a full query or a partial query to the sql transformation input
ports to execute the
dynamic sql queries.
SQL Transformation Example Using Full Dynamic query
Q2) I have a source table which contains the below data.

Table Name: Del_Tab

Del_statement

------------------------------------------

Delete FROM Sales WHERE Product = 'LG'

Delete FROM products WHERE Product = 'LG'


Solution:
Just follow the same steps for creating the sql transformation in the example 1.

. Now go to the "SQL Ports" tab of SQL transformation and create the input port as
"Query_Port".
Connect this input port to the Source Qualifier Transformation.
. In the "SQL Ports" tab, enter the sql query as ~Query_Port~. The tilt indicates a
variable
substitution for the queries.
. As we don�t need any output, just connect the SQLError port to the target.
. Now create workflow and run the workflow.

SQL Transformation Example Using Partial Dynamic query


Q3) In example 2, you can see the delete statements are similar except for the table name. Now
we will pass only the table name to the sql transformation. The source table contains the below
data.

Table Name: Del_Tab

Tab_Names

----------

sales

products

Solution:
Create the input port in the sql transformation as Table_Name and enter the below
query in the SQL
Query window.

Delete FROM ~Table_Name~ WHERE Product = 'LG'

Recommended Reading
More about sql transformation - create SQL transformation in script mode.

Generate rows based on a column value - Informatica

Q) How to generate or load values into the target table based on a column value using the
informatica ETL tool?
I have the products table as the source and the data of the products table is shown below.

Table Name: Products

Product Quantity

-----------------

Samsung NULL

Iphone 3

LG 0

Nokia 4

Now I want to duplicate or repeat each product in the source table as many times as the value in
the quantity column. The output is

product Quantity

----------------

Iphone 3

Iphone 3

Iphone 3

Nokia 4

Nokia 4

Nokia 4

Nokia 4

The Samsung and LG products should not be loaded as their quantity is NULL and 0 respectively.
Now create an informatica workflow to load the data into the target table.
Solution:
Follow the below steps
. Create a new mapping in the mapping designer
. Drag the source definition in to the mapping
. Create the java transformation in active mode
. Drag the ports of source qualifier transformation in to the java transformation.
. Now edit the java transformation by double clicking on the title bar of the java
transformation
and go to the "Java Code" tab.
. Enter the below java code in the "Java Code" tab.

// Generate as many output rows as the value in the quantity column.
if (!isNull("quantity"))
{
    double cnt = quantity;
    for (int i = 1; i <= cnt; i++)
    {
        product = product;
        quantity = quantity;
        generateRow();
    }
}

. Now compile the java code. The compile button is shown in red circle in the
image.
. Connect the ports of the java transformation to the target.
. Save the mapping, create a workflow and run the workflow.
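For comparison, the same row-generation logic can be expressed as a single Oracle query. This is
only a SQL sketch, not part of the mapping, and it assumes the quantity never exceeds 100 (raise
the limit in the row generator if needed). Rows with NULL or zero quantity are dropped
automatically by the WHERE clause.

SELECT p.product, p.quantity
FROM   products p,
       (SELECT LEVEL n FROM dual CONNECT BY LEVEL <= 100) g
WHERE  g.n <= p.quantity;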

Sorter Transformation in Informatica

Sorter transformation is an active and connected transformation used to sort the data. The data
can be sorted in ascending or descending order by specifying the sort key. You can specify one
or more ports as a sort key and configure each sort key port to sort in ascending or descending
order. You can also configure the order in which the integration service applies the sort keys.
The sorter transformation is used to sort the data from relational or flat file
sources. The sorter
transformation can also be used for case-sensitive sorting and can be used to
specify whether the
output rows should be distinct or not.
Creating Sorter Transformation
Follow the below steps to create a sorter transformation

1. In the mapping designer, create a new mapping or open an existing mapping


2. Go the toolbar->Click on Transformation->Create
3. Select the Sorter Transformation, enter the name, click on create and then click
on Done.
4. Select the ports from the upstream transformation and drag them to the sorter
transformation.
You can also create input ports manually on the ports tab.
5. Now edit the transformation by double clicking on the title bar of the
transformation.
6. Select the ports you want to use as the sort key. For each selected port,
specify whether you
want the integration service to sort data in ascending or descending order.

Configuring Sorter Transformation


Configure the below properties of sorter transformation

. Case Sensitive: The integration service considers the string case when sorting
the data. The
integration service sorts the uppercase characters higher than the lowercase
characters.
. Work Directory: The integration service creates temporary files in the work
directory when it is
sorting the data. After the integration service sorts the data, it deletes the
temporary files.
. Distinct Output Rows: The integration service produces distinct rows in the
output when this
option is configured.
. Tracing Level: Configure the amount of data needs to be logged in the session log
file.
. Null Treated Low: Enable the property, to treat null values as lower when
performing the sort
operation. When disabled, the integration service treats the null values as higher
than any other
value.
. Sorter Cache Size: The integration service uses the sorter cache size property to
determine the
amount of memory it can allocate to perform sort operation

Performance improvement Tip


Use the sorter transformation before the aggregator and joiner transformation and
sort the data for
better performance.
Sorter Transformation Examples
1. Create a mapping to sort the data of employees on salary in descending order?
2. Create a mapping to load distinct departments into the target table?
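For readers who think in SQL, the two examples correspond roughly to the following queries (an
employees table with salary and department_id columns is assumed):

-- example 1: sort employees on salary in descending order
SELECT * FROM employees ORDER BY salary DESC;

-- example 2: distinct departments (Distinct Output Rows option)
SELECT DISTINCT department_id FROM employees;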

Router Transformation in Informatica

Router transformation is an active and connected transformation. It is similar to the filter
transformation, which is used to test a condition and filter the data. In a filter transformation, you
can specify only one condition and it drops the rows that do not satisfy the condition. Whereas in
a router transformation, you can specify more than one condition and it provides the ability to
route the data that meets each test condition. Use a router transformation if you need to test the
same input data on multiple conditions.
Creating Router Transformation
Follow the below steps to create a router transformation
1. In the mapping designer, create a new mapping or open an existing mapping
2. Go the toolbar->Click on Transformation->Create
3. Select the Router Transformation, enter the name, click on create and then click
on Done.
4. Select the ports from the upstream transformation and drag them to the router
transformation.
You can also create input ports manually on the ports tab.

Configuring Router Transformation


The router transformation has input and output groups. You need to configure these
groups.

. Input groups: The designer copies the input ports properties to create a set of
output ports for
each output group.
. Output groups: Router transformation has two output groups. They are user-defined
groups
and default group.

User-defined groups: Create a user-defined group to test a condition based on the


incoming data.
Each user-defined group consists of output ports and a group filter condition. You
can create or
modify the user-defined groups on the groups tab. Create one user-defined group for
each condition
you want to specify.
Default group: The designer creates only one default group when you create one new
user-defined
group. You cannot edit or delete the default group. The default group does not have
a group filter
condition. If all the conditions evaluate to FALSE, the integration service passes
the row to the
default group.
Specifying Group Filter Condition
Specify the group filter condition on the groups tab using the expression editor.
You can enter any
expression that returns a single value. The group filter condition returns TRUE or
FALSE for each
row that passes through the transformation.
Advantages of Using Router over Filter Transformation
Use router transformation to test multiple conditions on the same input data. If
you use more than
one filter transformation, the integration service needs to process the input for
each filter
transformation. In case of router transformation, the integration service processes
the input data only
once and thereby improving the performance.
Router Transformation Examples
1. Load the employees data into two target tables. The first target table should contain employees
with department_id 10 and the second target table should contain employees with
department_id 20.
Solution: connect the source qualifier transformation to the router transformation.
In the router transformation, create two output groups. Enter the below filter
conditions.

In the first group filter condition,

department_id=10
In the second group filter condition,

department_id=20

Now connect the output groups of router transformation to the targets


2. The router transformation has the following group filter conditions.

In the first group filter condition,

department_id=30

In the second group filter condition,

department_id<=30

What data will be loaded into the first and second target tables?
Solution: The first target table will have employees from department 30. The second
table will have
employees whose department ids are less than or equal to 30.
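In SQL terms, the two output groups behave like two independent filters over the same input, which
is why a department 30 row ends up in both targets. A rough sketch (the employees, target1 and
target2 names are assumed):

-- first group / first target
INSERT INTO target1 SELECT * FROM employees WHERE department_id = 30;

-- second group / second target (also picks up department 30)
INSERT INTO target2 SELECT * FROM employees WHERE department_id <= 30;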

Union Transformation in Informatica

Union transformation is an active and connected transformation. It is a multi-input group
transformation used to merge the data from multiple pipelines into a single pipeline. Basically it
merges data from multiple sources just like the UNION ALL set operator in SQL. The union
transformation does not remove any duplicate rows.
Union Transformation Guidelines
The following rules and guidelines should be used when using a union transformation
in a mapping

. Union transformation contains only one output group and can have multiple input groups.
. The input groups and output groups should have matching ports. The datatype, precision and
scale must be the same.
. Union transformation does not remove duplicates. To remove the duplicate rows, use a sorter
transformation with the distinct option after the union transformation.
. The union transformation does not generate transactions.
. You cannot connect a sequence generator transformation to the union transformation.

Creating union transformation


Follow the below steps to create a union transformation

1. Go the mapping designer, create a new mapping or open an existing mapping


2. Go to the toolbar-> click on Transformations->Create
3. Select the union transformation and enter the name. Now click on Done and then
click on OK.
4. Go to the Groups Tab and then add a group for each source you want to merge.
5. Go to the Group Ports Tab and add the ports.

Components of union transformation


Configure the following tabs of union transformation

. Transformation: You can enter name and description of the transformation


. Properties: Specify the amount of tracing level to be tracked in the session log.

. Groups Tab: You can create new input groups or delete existing input groups.
. Group Ports Tab: You can create and delete ports for the input groups.

Note: The ports tab displays the groups and ports you create. You cannot edit the
port or group
information in the ports tab. To do changes use the groups tab and group ports tab.

Why union transformation is active


Union is an active transformation because it combines two or more data streams into
one. Though
the total number of rows passing into the Union is the same as the total number of
rows passing out
of it, and the sequence of rows from any given input stream is preserved in the
output, the positions
of the rows are not preserved, i.e. row number 1 from input stream 1 might not be
row number 1 in
the output stream. Union does not even guarantee that the output is repeatable.
Union Transformation Example
1. There are two tables in the source. The table names are employees_US and employees_UK
and they have the same structure. Create a mapping to load the data of these two tables into a
single target table employees.
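Since the union transformation behaves like UNION ALL, the example above is roughly equivalent
to the SQL sketch below (the column list is assumed):

INSERT INTO employees
SELECT employee_id, name, salary FROM employees_US
UNION ALL
SELECT employee_id, name, salary FROM employees_UK;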

Filter Transformation in Informatica

Filter transformation is an active, connected transformation. The filter transformation is used to
filter out rows in a mapping. As the filter transformation is an active transformation, it
may change the
number of rows passed through it. You have to specify a filter condition in the
filter transformation.
The rows that meet the specified filter condition are passed to other
transformations. The rows that
do not meet the filter condition are dropped.
Creating Filter Transformation
Follow the below steps to create a filter transformation

1. In the mapping designer, open a mapping or create a new mapping.


2. Go to the toolbar->click on Transformation->Create->Select the filter
transformation
3. Enter a name->Click on create and then click on done.
4. You can add ports either by dragging from other transformations or manually
creating the ports
within the transformation.

Specifying Filter Condition


To configure the filter condition, go to the properties tab and in the filter
condition section open the
expression editor. Enter the filter condition you want to apply. Click on validate
button to verify the
syntax and then click OK.
Components of Filter Transformation
The filter transformation has the following components.

. Transformation: You can enter the name and description of the transformation.
. Ports: Create new ports and configure them
. Properties: You can specify the filter condition to filter the rows. You can also
configure the
tracing levels.
. Metadata Extensions: Specify the metadata details like name, datatype etc.

Configuring Filter Transformation


The following properties needs to be configured on the ports tab in filter
transformation

. Port name: Enter the name of the ports created.


. Datatype, precision, and scale: Configure the data type and set the precision and
scale for
each port.
. Port type: All the ports in filter transformation are input/output.

Performance Tuning Tips

. Use the filter transformation as close as possible to the sources in the mapping.
This will reduce
the number of rows to be processed in the downstream transformations.
. In case of relational sources, if possible use the source qualifier
transformation to filter the rows.
This will reduce the number of rows to be read from the source.

Note: The input ports to the filter transformation must come from a single transformation. You
cannot connect ports from more than one transformation to the filter.
Filter Transformation examples
Specify the filter conditions for the following examples
1. Create a mapping to load the employees from department 50 into the target?

department_id=50

2. Create a mapping to load the employees whose salary is in the range of 10000 to
50000?

salary >=10000 AND salary <= 50000

3. Create a mapping to load the employees who earn commission (commission should
not be null)?

IIF(ISNULL(commission),FALSE,TRUE)
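For comparison, the three filter conditions above correspond roughly to the following WHERE
clauses in SQL (an employees table is assumed):

-- example 1
SELECT * FROM employees WHERE department_id = 50;

-- example 2
SELECT * FROM employees WHERE salary >= 10000 AND salary <= 50000;

-- example 3
SELECT * FROM employees WHERE commission IS NOT NULL;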

Expression Transformation in Informatica

Expression transformation is a connected, passive transformation used to calculate values on a
single row. Examples of calculations are concatenating the first and last name, adjusting the
adjusting the
employee salaries, converting strings to date etc. Expression transformation can
also be used to test
conditional statements before passing the data to other transformations.
Creating an Expression Transformation
Just follow the below steps to create an expression transformation

1. In the mapping designer, create a new mapping or open an existing mapping.


2. Go to Toolbar->click Transformation -> Create. Select the expression
transformation.
3. Enter a name, click on Create and then click on Done.
4. You can add ports to expression transformation either by selecting and dragging
ports from
other transformations or by opening the expression transformation and create ports
manually.

Adding Expressions
Once you created an expression transformation, you can add the expressions either
in a variable
port or output port. Create a variable or output port in the expression
transformation. Open the
Expression Editor in the expression section of the variable or output port. Enter
an expression and
then click on Validate to verify the expression syntax. Now Click OK.
Expression Transformation Components or Tabs
The expression transformation has the following tabs

. Transformation: You can enter the name and description of the transformation. You
can also
make the expression transformation reusable.
. Ports: Create new ports and configuring the ports.
. Properties: Configure the tracing level to set the amount of transaction detail
to be logged in
session log file.
. Metadata Extensions: You can specify extension name, data type, precision, value
and can
also create reusable metadata extensions.

Configuring Ports:
You can configure the following components on the ports tab

. Port name: Enter a name for the port.


. Datatype: Select the data type
. Precision and scale: set the precision and scale for each port.
. Port type: A port can be input, input/output, output or variable.
. Expression: Enter the expressions in the expression editor.

Expression transformation examples


1. Create a mapping to increase the salary of an employee by 10 percent?
Solution:
In the expression transformation, create a new output port (call it adj_sal) and enter the
expression as salary+salary*(10/100)
The expression can be simplified as salary*(110/100)
2. Create a mapping to concatenate the first and last names of the employee?
Include space
between the names
Solution:
Just create a new port in the expression transformation and enter the expression as

CONCAT(CONCAT(first_name,' '),last_name)
The above expression can be simplified as first_name||' '||last_name
Solve more scenarios on expression transformation at Informatica Scenarios
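As a cross-check, the same two calculations expressed in Oracle SQL (an employees table is
assumed):

-- 10 percent salary increase
SELECT salary * 1.1 AS adj_sal FROM employees;

-- first and last name separated by a space
SELECT first_name || ' ' || last_name AS full_name FROM employees;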

Join Command in Unix/Linux Examples

Join command is one of the text processing utilities in Unix/Linux. The join command is used to
combine two files based on a matching field in the files. If you know SQL, the join command is
similar to joining two tables in a database.
The syntax of join command is

join [options] file1 file2


The join command options are

-1 field_number : Join on the specified field number in the first file

-2 field_number : Join on the specified field number in the second file

-j field_number : Equivalent to -1 field_number and -2 field_number

-o list : Display only the specified fields from both the files

-t char : Specify the input and output field delimiter

-a file_number : Print the unmatched lines from the specified file

-i : Ignore case while joining

Unix Join Command Examples


1. Write a join command to join two files on the first field?
The basic usage of the join command is to join two files on the first field. By default the join command
matches the files on the first field when we do not specify the field numbers explicitly. Let's say we
have two files [Link] and [Link]

> cat [Link]

10 mark

10 steve

20 scott

30 chris

> cat [Link]

10 hr

20 finance

30 db
Here we will join on the first field and see the output. By default, the join
command treats the field
delimiter as space or tab.

> join [Link] [Link]

10 mark hr

10 steve hr

20 scott finance

30 chris db

Important Note: Before joining the files, make sure that they are sorted on the join fields. Otherwise
you will get incorrect results; a sketch of pre-sorting the inputs is shown below.
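A minimal sketch of sorting both inputs on the join field before joining (the file names here are
placeholders, not the exact files used in this article):

> sort -k 1,1 file1 > file1.sorted

> sort -k 1,1 file2 > file2.sorted

> join file1.sorted file2.sorted

The -k 1,1 option restricts the sort key to the first field, which is the default join field.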
2. Write a join command to join the two files? Here use the second field from the
first file and the first
field from the second file to join.
In this example, we will see how to join two files on different fields rather than
the first field. For this
consider the below two files as an example

> cat [Link]

mark 10 1

steve 10 1

scott 20 2

chris 30 3

> cat [Link]

10 hr 1

20 finance 2

30 db 3

From the above, you can see the join fields are the second field from the [Link]
and the first field
from the [Link]. The join command to match these two files is
> join -1 2 -2 1 [Link] [Link]

10 mark 1 hr 1

10 steve 1 hr 1

20 scott 2 finance 2

30 chris 3 db 3

Here -1 2 specifies the second field from the first file ([Link]) and -2 1 specifies the first field from
the second file ([Link]).
You can also see that the two files can be joined on the third field. As both the files have the same
values in that field, you can use the -j option in the join command.

> join -j 3 [Link] [Link]

1 mark 10 10 hr

1 steve 10 10 hr

2 scott 20 20 finance

3 chris 30 30 db

3. Write a join command to select the required fields from the input files in the output? Select the
first field from the first file and the second field from the second file in the output.
By default, the join command prints all the fields from both the files (except that the join field is
printed only once). We can choose which fields to print on the terminal with the -o option. We will use
the same files from the above example.

> join -o 1.1 2.2 -1 2 -2 1 [Link] [Link]

mark hr

steve hr

scott finance

chris db
Here 1.1 means the first field of the first file. Similarly, 2.2 means the second field of the
second file.
4. Write a command to join two delimited files? Here the delimiter is colon (:)
So far we have joined files with space delimiter. Here we will see how to join
files with a colon as
delimiter. Consider the below two files.

> cat [Link]

mark:10

steve:10

scott:20

chris:30

> cat [Link]

10:hr

20:finance

30:db

The -t option is used to specify the delimiter. The join command for joining the
files is

> join -t: -1 2 -2 1 [Link] [Link]

10:mark:hr

10:steve:hr

20:scott:finance

30:chris:db

5. Write a command to ignore case when joining the files?


If the join fields are in different cases, then the join will not be performed
properly. To ignore the case
in join use the -i option.
> cat [Link]

mark,A

steve,a

scott,b

chris,C

> cat [Link]

a,hr

B,finance

c,db

> join -t, -i -1 2 -2 1 [Link] [Link]

A,mark,hr

a,steve,hr

b,scott,finance

C,chris,db

6. Write a join command to print the lines which do not match the values in joining
fields?
By default the join command prints only the matched lines from both the files, that is, the lines that
satisfy the join condition. We can use the -a option to also print the non-matched lines.

> cat [Link]

A 1

B 2

C 3

> cat [Link]


B 2

C 3

D 4

Print non pairable lines from first file.

> join -a 1 [Link] [Link]

A 1

B 2 2

C 3 3

Print non pairable lines from second file.

> join -a 2 [Link] [Link]

B 2 2

C 3 3

D 4

Print non pairable lines from both files.

> join -a 1 -a 2 [Link] [Link]

A 1

B 2 2

C 3 3
D 4

Move / Rename files, Directory - MV Command in Unix / Linux

Q. How to rename a file or directory in unix (or linux) and how to move a file or
directory from the
current directory to another directory?
Unix provides a simple mv (move) command which can be used to rename or move files
and
directories. The syntax of mv command is

mv [options] oldname newname

The options of the mv command are

-f : Do not prompt before overwriting a file.

-i : Prompt for user input before overwriting a file.

If the newname already exists, then the mv command overwrites that file. Let us see some examples on
how to use the mv command.
Unix mv command examples
1. Write a unix/linux command to rename a file?
Renaming a file is one of the basic features of the mv command. To rename a file
from "[Link]" to
"[Link]", use the below mv command

> mv [Link] [Link]

Note that if the "[Link]" file already exists, then its contents will be
overwritten by "[Link]". To avoid
this use the -i option, which prompts you before overwriting the file.

mv -i [Link] [Link]

mv: overwrite `[Link]'?
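The -f option does the opposite of -i and overwrites the destination without asking. A small sketch
(the file names old_name.txt and new_name.txt are only illustrative):

mv -f old_name.txt new_name.txt

Here new_name.txt is silently replaced by old_name.txt even if it already exists.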


2. Write a unix/linux command to rename a directory?
Just as renaming a file, you can use the mv command to rename a directory. To
rename the
directory from docs to documents, run the below command

mv docs/ documents/

If the documents directory already exists, then the docs directory will be moved into the documents
directory.
3. Write a unix/linux command to move a file into another directory?
The mv command can also be used to move the file from one directory to another
directory. The
below command moves the [Link] file in the current directory to /var/tmp directory.

mv [Link] /var/tmp/

If the [Link] file already exists in the /var/tmp directory, then the contents of
that file will be
overwritten.
4. Write a unix/linux command to move a directory in to another directory?
Just as moving a file, you can move a directory into another directory. The below
mv command
moves the documents directory into the tmp directory

mv documents /tmp/

5. Write a unix/linux command to move all the files in the current directory to
another directory?
You can use the shell wildcard pattern * to move all the files from one directory to another
directory.

mv * /var/tmp/

The above command moves all the files and directories in the current directory to
the /var/tmp/
directory.
6. mv *
What happens if you simply type mv * and then press enter?
It depends on the files you have in the directory. The * expands to all the files and directories in the
current directory, and the last name in that expansion becomes the destination (a preview sketch is
shown after this list). Three scenarios are possible.

. If the current directory has only files and the expansion ends with a regular file, then mv fails
when more than two names are expanded; with exactly two files, the first file simply overwrites the
second one.
. If the current directory contains only directories, then all the directories (except the last one)
will be moved into that last directory.
. If the current directory contains both files and directories, then it depends on the expansion of
the *. If the pattern * gives a directory as the last name, then all the other files and directories
will be moved into that directory. Otherwise the mv command will fail.
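A small sketch for previewing what mv * would do before running it (the commands below only inspect
the current directory and do not move anything):

echo *

ls -d */

The echo * command prints the expansion of the pattern, so the last name shown is what mv would treat
as the destination. The ls -d */ command lists only the directories, which tells you whether that last
name is a directory.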

Some Tips:

. Try to avoid mv *
. Avoid moving a large number of files at once.


Labels: Unix

3 comments:

1.

Admin 14 February, 2012 14:23

hi..., this is my first time using linux OS, your article very helpfull for me. and
im always learn from this site
about unix/linux. thank u so much for this tutorial. i give you +1 for this
article.


2.

Rohit 25 April, 2012 09:55

Write a Bash script called mv (which replaces the GNU utility mv) that tries to
rename the specified file
(using the GNU utility mv), but if the destination file exists, instead creates an
index number to append to
the destination file, a sort of version number. For example, if I type:
$mv [Link] [Link]
But [Link] already exists, mv will move the file to [Link].1. Note that if [Link].1
already exists, you must rename
the file to [Link].2, and so on, until you can successfully rename the file to a
name that does not already
exist.
Help me out on this question.
If you have a solution plz reply me back on rastogirohit007@[Link]


3.

Gamer 26 April, 2012 00:22


Here is a sample bash script for simulating the mv command. If you find any bugs, do let me know.

#! /bin/bash

# Print the usage message and exit
usage() {
    USAGE="`basename $0` options"
    echo ""
    echo "=================================================================================="
    echo "Usage: $USAGE"
    echo ""
    echo "options:"
    echo " -s : source file"
    echo " -t : target file"
    echo "=================================================================================="
    echo ""
    exit
}

# Parse the -s (source) and -t (target) options
while getopts "s:t:" options
do
    case $options in
        s) SOURCE="$OPTARG" ;;
        t) TARGET="$OPTARG" ;;
        *) usage ;;
    esac
done

SOURCE_FILE=`basename "$SOURCE"`   # source file name without the directory path
TARGET_DIR=`dirname "$TARGET"`     # directory part of the target path
SOURCE_FILE_IN_TARGET="$TARGET_DIR/$SOURCE_FILE"

if [ -e "$TARGET" ]
then
    if [ -e "$SOURCE_FILE_IN_TARGET.1" ]
    then
        # Find the highest existing version number and use the next one
        MAX_SEQ_NUM=`ls -1 "$SOURCE_FILE_IN_TARGET"* | sed "s|$SOURCE_FILE_IN_TARGET\.||" | sort -n | tail -1`
        NEXT_SEQ_NUM=`expr $MAX_SEQ_NUM + 1`
        mv "$SOURCE" "$TARGET_DIR/$SOURCE_FILE.$NEXT_SEQ_NUM"
    else
        # The first versioned copy gets the suffix .1
        mv "$SOURCE" "$TARGET_DIR/$SOURCE_FILE.1"
    fi
else
    # The target does not exist yet, so a plain rename is enough
    mv "$SOURCE" "$TARGET"
fi

Sed -i Command Examples in Unix and Linux


Sed is a great tool for replacing text in a file. Sed is a stream editor, which means it edits the file as a
stream of characters. To replace text using the unix sed command, you have to pass the search
string and the replacement string. By default the sed command does not edit the file and displays the
output on the terminal.
We will see the usage of the -i option with an example.
Consider the below text file with data

> cat [Link]

linux sed command tutorial

We will replace the word "tutorial" with "example" in the file using the sed
command.

> sed 's/tutorial/example/' [Link]

linux sed command example

> cat [Link]

linux sed command tutorial

The sed command replaced the text and displayed the result on the terminal. However it
did not change the contents of the file. You can redirect the output of the sed command and save it in a
file as

> sed 's/tutorial/example/' [Link] > new_file.txt

The -i option comes in handy to edit the original file itself. If you use the -i
option the sed command
replaces the text in the original file itself rather than displaying it on the
terminal.

> sed -i 's/tutorial/example/' [Link]

> cat [Link]

linux sed command example


Be careful while using the -i option. Once you change the contents of the file, you cannot revert to
the original file. It is good to take a backup of the original file. You can provide a suffix to the -i
option for taking a backup of the file. Now we will replace "example" with "tutorial" and at the
same time take a backup of the file.

> sed -i_bkp 's/example/tutorial/' [Link]

> ls [Link]*

[Link] file.txt_bkp

> cat file.txt_bkp

linux sed command example

> cat [Link]

linux sed command tutorial

See the backup file created with the contents of the original file.
Recommended reading for you
Sed command Tutorial

Dynamic Target Flat File Name Generation in Informatica

Informatica 8.x or later versions provide a feature for generating the target files dynamically. This
feature allows you to

. Create a new file for every session run
. Create a new file for each transaction

Informatica provides a special port, "FileName", in the target file definition. You have to add this
port explicitly. See the below diagram for adding the "FileName" port.
[Image: Dynamic_file_creation_informatica.jpg]

Go to the Target Designer or Warehouse builder and edit the file definition. You
have to click on the
button indicated in red color circle to add the special port.
Now we will see some informatica mapping examples for creating the target file name
dynamically
and load the data.
1. Generate a new file for every session run.
Whenever the session runs you need to create a new file dynamically and load the
source data into
that file. To do this just follow the below steps:
STEP1: Connect the source qualifier to an expression transformation. In the expression
transformation create an output port (call it File_Name) and assign the expression as
'EMP_'||to_char(sessstarttime, 'YYYYMMDDHH24MISS')||'.dat'
STEP2: Now connect the expression transformation to the target and connect the File_Name port of
the expression transformation to the FileName port of the target file definition.
STEP3: Create a workflow and run the workflow.
Here I have used sessstarttime, as it is constant throughout the session run. If you use
sysdate instead, a new file will be created whenever a new transaction occurs in the session run.
The target file names created would look like EMP_20120101125040.dat.
2. Create a new file for every session run. The file name should contain suffix as
numbers
(EMP_n.dat)
In the above mapping scenario, the target flat file name contains a timestamp as the suffix. Here
we have to create the suffix as a number. So, the file names should look like EMP_1.dat, EMP_2.dat
and so on. Follow the below steps:
STEP1: Go to the mapping parameters and variables -> Create a new variable, $$COUNT_VAR, and
its data type should be Integer.
STEP2: Connect the source qualifier to the expression transformation. In the expression
transformation create the following new ports and assign the expressions.

v_count (variable port) = v_count+1


v_file_count (variable port) = IIF(v_count = 1, SETVARIABLE($$COUNT_VAR,$$COUNT_VAR+1), $$COUNT_VAR)

o_file_name (output port) = 'EMP_'||v_file_count||'.dat'

STEP3: Now connect the expression transformation to the target and connect the
o_file_name port
of expression transformation to the FileName port of the target.
3. Create a new file once a day.
You can create a new file only once in a day and can run the session multiple times
in the day to
load the data. You can either overwrite the file or append the new data.
This is similar to the first problem. Just change the expression in expression
transformation to
'EMP_'||to_char(sessstarttime, 'YYYYMMDD')||'.dat'. To avoid overwriting the file,
use Append If
Exists option in the session properties.
4. Create a flat file based on the values in a port.
You can create a new file for each distinct values in a port. As an example
consider the employees
table as the source. I want to create a file for each department id and load the
appropriate data into
the files.
STEP1: Sort the data on department_id. You can either use the source qualifier or
sorter
transformation to sort the data.
STEP2: Connect to the expression transformation. In the expression transformation
create the below
ports and assign expressions.

v_curr_dept_id (variable port) = dept_id

v_flag (variable port) = IIF(v_curr_dept_id=v_prev_dept_id,0,1)

v_prev_dept_id (variable port) = dept_id

o_flag (output port) = v_flag

o_file_name (output port) = dept_id||'.dat'

STEP3: Now connect the expression transformation to the transaction control transformation and
specify the transaction control condition as

IIF(o_flag = 1, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)

STEP4: Now connect to the target file definition.



Delete Directory, Files - rm, rmdir command in Unix / Linux

Q. How to delete directories and files in unix/linux


Unix provides the rmdir and rm commands to remove directories and files. Let us see each command in
detail.
Unix rmdir command syntax
The syntax of rmdir command is

rmdir [options] directories

The rmdir command options are

-p : Removes directory and its parent directories

-v : Provides the diagnostic information of the directory processed

Unix rmdir command examples


1. Write a unix/linux command to remove a directory?
The rmdir command deletes only the empty directories. If a directory contains files
or sub directories,
then the rmdir command fails.

rmdir docs/

rmdir: docs/: Directory not empty

Here the docs directory is not empty, that is why the rmdir command failed to
remove the directory.
To remove the docs directory first we have to make the directory empty and then
delete the
directory.
rm docs/*

rmdir docs/

We will see later how to remove non-empty directories with a single command.
2. Write a unix/linux command to remove the directory and its parent directories?
As mentioned earlier the -p option allows the rmdir command to delete the directory
and also its
parent directories.

rmdir -p docs/entertainment/movies/

This rmdir command removes the movies directory along with its parent directories entertainment and
docs. If you don't use the -p option, then it only deletes the movies directory.
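The -v option can be combined with -p to print a diagnostic line for each directory as it is removed;
a small sketch (the exact wording of the messages may differ between implementations):

rmdir -pv docs/entertainment/movies/

This removes movies, then entertainment, then docs, reporting each removal on the terminal.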
3. Write a unix/linux command to remove directories using pattern matching?
You can specify the directory names using a wildcard pattern and delete them.

rmdir doc*

This rmdir command deletes the empty directories whose names match the pattern, such as doc,
documents, doc_1 etc.
Now we will see the rm command in unix.
Unix rm command syntax
The syntax of rm command is

rm [options] [directory|file]

The rm command options are

-f : Removes all files in a directory without prompting the user.

-i : Interactive; prompts the user for confirmation before deleting a file.

-R or -r : Recursively remove directories and sub directories.

The rm command can be used to delete both files and directories. With the -r option, the rm command
can also delete non-empty directories.
Unix rm command examples
1. Write a unix/linux command to remove a file?
This is the basic feature of rm command. To remove a file, [Link], in the
current directory use the
below rm command

rm [Link]
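The -i option listed above makes rm ask for confirmation before each deletion; a small sketch (the
file name file.txt is only illustrative, and the exact prompt wording may vary slightly):

rm -i file.txt

rm: remove regular file 'file.txt'?

Answering y deletes the file; any other answer keeps it.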

2. Write a unix/linux command to remove all the files in a directory?


Use the * wildcard pattern as the file list in the rm command for deleting all the files in the current
directory.

rm *

3. Write a unix/linux command to delete an empty directory?

A plain rm on a directory name fails with an "Is a directory" error. GNU rm provides the -d option to
delete an empty directory:

rm -d docs/

If the directory is non-empty, then the above command fails to remove it; use the -r option instead, as
shown in the next example.
4. Write a unix/linux command to delete directories recursively (delete non empty
directories)?
As mentioned earlier, the -r option can be used to remove the directories and sub
directories.

rm -r docs

This removes the docs directory even if it is non-empty.
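In practice -r is often combined with the other options listed above; a small sketch:

rm -rf docs

rm -ri docs

The first form removes docs and everything under it without any prompt, so use it with care. The
second form asks for confirmation before every file and directory it deletes.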


Incremental Aggregation in Informatica

Incremental Aggregation is the process of capturing the changes in the source and calculating the
aggregations in a session. This process makes the integration service update the target
incrementally and avoids recalculating the aggregations on the entire source. Consider
the below sales table as an example and see how the incremental aggregation works.

Source:

YEAR PRICE

----------

2010 100

2010 200

2010 300

2011 500

2011 600

2012 700

For simplicity, I have used only the year and price columns of the sales table. We need to do an
aggregation and find the total price in each year.
When you run the session for the first time using incremental aggregation, the integration
service processes the entire source and stores the data in two files, an index file and a data file. The
integration service creates these files in the cache directory specified in the aggregator
transformation properties. After the aggregation, the target table will have the below data.

Target:

YEAR PRICE

----------
2010 600

2011 1100

2012 700

Now assume that the next day few more rows are added into the source table.

Source:

YEAR PRICE

----------

2010 100

2010 200

2010 300

2011 500

2011 600

2012 700

2010 400

2011 100

2012 200

2013 800

Now for the second run, you have to pass only the new data changes to the incremental
aggregation. So, the source will contain only the last four records. The incremental
aggregation uses the data stored in the cache and calculates the aggregation. Once the aggregation is
done, the integration service writes the changes to the target and the cache. The target table will
contain the below data.
Target:

YEAR PRICE

----------

2010 1000

2011 1200

2012 900

2013 800

Points to remember

1. When you use incremental aggregation, the first time you have to run the session with the complete
source data, and in the subsequent runs you have to pass only the changes in the source data.
2. Use incremental aggregation only if the target is not going to change significantly. If the
incremental aggregation process changes more than half of the data in the target, then the session
performance may not benefit. In this case go for normal aggregation.

Note: The integration service creates a new aggregate cache when

. A new version of the mapping is saved
. The session is configured to reinitialize the aggregate cache
. The aggregate cache files are moved or deleted
. The number of partitions is decreased

Configuring the mapping for incremental aggregation


Before enabling the incremental aggregation option, make sure that you capture the changes in the
source data. You can use a lookup transformation or a stored procedure transformation to remove the
data which is already processed. You can also create a trigger on the source database and read
only the changed source rows in the mapping.


Labels: Informatica

3 comments:

1.
Neel 08 February, 2012 23:01
Hi,
Is incremental aggregation so simple? If we implement d idea of incremental load or
CDC, and by default
aggregator has caching property...why do i need to excercise incremental
aggregation as separate option.
What is the advantage of using this over normal map. (using cdc and not using
incremental aggregation
property). Please explain.



1.

Anonymous 10 February, 2012 07:24

Normal aggregator also caches the data. However, this cache will be cleared when
the session
run completes. In case of incremental aggregation the cache will not be cleared and
it is reused in
the next session run.
If you want to use normal aggregation, every time you run the session you have to
pass the
complete source data to calculate the aggregation. In case of incremental
aggregation, as the
processed data is stored in the cache, you just need to pass only the changes in
the source. This
way the data in cache and the changes form the complete source.



2.

Anderson Schmitt 14 February, 2012 09:27

Thanks, this solved me a big problem!

Cut Command in Unix ( Linux) Examples

Cut command in unix (or linux) is used to select sections of text from each line of
files. You can use
the cut command to select fields or columns from a line by specifying a delimiter
or you can select a
portion of text by specifying the range or characters. Basically the cut command
slices a line and
extracts the text.
Unix Cut Command Example
We will see the usage of cut command by considering the below text file as an
example

> cat [Link]

unix or linux os
is unix good os

is linux good os

1. Write a unix/linux cut command to print characters by position?


The cut command can be used to print characters in a line by specifying the
position of the
characters. To print the characters in a line, use the -c option in cut command

cut -c4 [Link]

The above cut command prints the fourth character in each line of the file. You can
print more than
one character at a time by specifying the character positions in a comma separated
list as shown in
the below example

cut -c4,6 [Link]

xo

ui

ln

This command prints the fourth and sixth character in each line.
2. Write a unix/linux cut command to print characters by range?
You can print a range of characters in a line by specifying the start and end
position of the
characters.

cut -c4-7 [Link]

x or

unix
linu

The above cut command prints the characters from fourth position to the seventh
position in each
line. To print the first six characters in a line, omit the start position and
specify only the end position.

cut -c-6 [Link]

unix o

is uni

is lin

To print the characters from tenth position to the end, specify only the start
position and omit the end
position.

cut -c10- [Link]

inux os

ood os

good os

You cannot omit both the start and the end position; at least one of them must be given. To print the
entire line, start the range from the first character.

cut -c1- [Link]

3. Write a unix/linux cut command to print the fields using the delimiter?
You can use the cut command just like the awk command to extract the fields in a file using a
delimiter. The -d option in the cut command is used to specify the delimiter and the -f option is used
to specify the field position.

cut -d' ' -f2 [Link]

or

unix
linux

This command prints the second field in each line by treating the space as
delimiter. You can print
more than one field by specifying the position of the fields in a comma delimited
list.

cut -d' ' -f2,3 [Link]

or linux

unix good

linux good

The above command prints the second and third field in each line.
Note: If the delimiter you specified does not exist in a line, then the cut command prints the entire
line. To suppress such lines, use the -s option in the cut command, as shown in the sketch below.
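A small sketch of the -s option using the same sample file, which contains no colon at all:

cut -d':' -f2 [Link]

unix or linux os

is unix good os

is linux good os

cut -d':' -f2 -s [Link]

Without -s every line is printed unchanged because the colon delimiter is not found; with -s those
lines are suppressed and nothing is printed.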
4. Write a unix/linux cut command to display range of fields?
You can print a range of fields by specifying the start and end position.

cut -d' ' -f1-3 [Link]

The above command prints the first, second and third fields. To print the first
three fields, you can
ignore the start position and specify only the end position.

cut -d' ' -f-3 [Link]

To print the fields from the second field to the last field, omit the end field position.

cut -d' ' -f2- [Link]

5. Write a unix/linux cut command to display the first field from /etc/passwd file?

The /etc/passwd is a delimited file and the delimiter is a colon (:). The cut
command to display the
first field in /etc/passwd file is
cut -d':' -f1 /etc/passwd

6. The input file contains the below text

> cat [Link]

[Link]

[Link]

add_int.sh

Using the cut command extract the portion after the dot (the file extension).
First reverse the text in each line, take the first dot-separated field, and then reverse it back:

rev [Link] | cut -d'.' -f1 | rev

Delete Empty Lines Using Sed / Grep Command in Unix (or Linux)

In Unix / Linux you can use the Sed / Grep command to remove empty lines from a
file. For
example, Consider the below text file as input

> cat [Link]

Remove line using unix grep command

Delete lines using unix sed command

How it works

Now we will see how to remove the empty lines from the above file in unix / linux.
1. Remove empty lines using the unix sed command
The d command in sed can be used to delete the empty lines in a file.

sed '/^$/d' [Link]

Here the ^ specifies the start of the line and $ specifies the end of the line. You
can redirect the
output of above command and write it into a new file.

sed '/^$/d' [Link] > no_empty_lines.txt

2. Delete empty lines using the unix grep command


First we will see how to search for empty lines using grep command.

grep '^$' [Link]

Now we will use the -v option to the grep command to reverse the pattern matching

grep -v '^$' [Link]

The output of both sed and grep commands after deleting the empty lines from the
file is

Remove line using unix grep command

Delete lines using unix sed command

How it works
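Lines that contain only spaces or tabs are not matched by the ^$ pattern. A variant that also removes
such whitespace-only lines, using a POSIX character class:

sed '/^[[:space:]]*$/d' [Link]

grep -v '^[[:space:]]*$' [Link]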

Change Directory (cd) Examples | Unix and Linux Command

The Change directory (cd) command is one of the simple commands in Unix (or Linux)
and it is very
easy to use. The cd command is used to change from the current directory to another
directory. The
syntax of cd command is
cd [directory]

Here directory is the name of the directory where you wish to go.

CD Command Examples
1. Write a unix/linux cd command to change to home directory?
Simply type the cd command on the unix terminal and then press the enter key. This will change
your directory to the home directory.

> pwd

/usr/local/bin

Now I am in the /usr/local/bin directory. After typing the cd command and pressing enter, you will go to
your home directory.

> cd

> pwd

/home/matt

Here pwd command displays the present working directory.


2. Write a unix/linux cd command to go back one directory?
The cd .. command changes the directory to its parent directory by going back one level. The space
between the cd and .. is required.

> pwd

/var/tmp

> cd ..

> pwd

/var
3. Write a unix/linux cd command to go back two directories?
The cd ../../ command takes you back two directories. You can extend this cd command to go back any
number of directories.

> pwd

/usr/local/bin

> cd ../../

> pwd

/usr

4. Write a unix/linux cd command to change the directory using the absolute path?
When changing the directory using an absolute path, you have to specify the full directory path.
Absolute paths always start with a slash (/). An example is changing your directory to /usr/bin from
your home directory.

> cd /usr/bin

5. Write a unix/linux cd command to change the directory using the relative path?
In relative path, you have to specify the directory path relative to your current
directory. For example,
you are in /var/tmp directory and you want to go to /var/lib directory, then you
can use the relative
path.

> pwd

/var/tmp

> cd ../lib

> pwd

/var/lib

Here the cd ../lib, first takes you to the parent directory which is /var and then
changes the directory
to the lib.
6. Write a unix/linux cd command to change back to previous directory.
As an example, I am in the directory /home/matt/documents and I changed to a new directory
/home/matt/backup. Now I want to go back to my previous directory /home/matt/documents. In this
case, you can use the cd - command to go back to the previous directory.

> pwd

/home/matt/documents

> cd /home/matt/backup

>pwd

/home/matt/backup

> cd -

> pwd

/home/matt/documents

Unix TimeStamp Command

What is Unix Timestamp


A unix timestamp is the representation of time as the running total of seconds since the unix epoch
on January 1st, 1970. Simply put, the unix timestamp is the number of seconds between the particular
date and the Unix epoch.
The unix timestamp has become a standard in computer systems for tracking information, especially in
distributed processing systems like hadoop, cloud computing etc.
Here we will see how to convert a unix date to a timestamp and a unix timestamp to a date. We will
also see how to generate the current unix timestamp. Let us see each one:
1. Unix Current Timestamp
To find the unix current timestamp use the %s option in the date command. The %s
option
calculates unix timestamp by finding the number of seconds between the current date
and unix
epoch.
date '+%s'

1327312578

You will get a different output if you run the above date command.
2. Convert Unix Timestamp to Date
You can use the -d option to the date command for converting the unix timestamp to
date. Here you
have to specify the unix epoch and the timestamp in seconds.

date -d "1970-01-01 956684800 sec GMT"

Tue Apr 25 [Link] PDT 2000
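With GNU date the timestamp can also be passed directly by prefixing it with @, which avoids spelling
out the epoch explicitly:

date -d @956684800

This prints the same date and time as the command above, formatted for your local time zone.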

3. Convert Unix Date to Timestamp


You have to combine the -d option and the %s option for converting the unix date to
timestamp.

date -d "2000-01-01 GMT" '+%s'

946684800
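Because timestamps are plain second counts, the difference between two dates can be computed with
simple shell arithmetic. A small sketch combining the -d and %s options shown above (the dates are
only illustrative):

start=$(date -d "2012-01-01 GMT" '+%s')

end=$(date -d "2012-02-01 GMT" '+%s')

echo $(( (end - start) / 86400 ))

31

Dividing by 86400 (the number of seconds in a day) converts the difference into days.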

Copy (cp) File And Directory Examples | Unix and Linux Command

Copy (cp) is a frequently used command in Unix (or Linux). The cp command is used to copy files
from one directory to another directory. The cp command can also be used to copy directories. The
syntax of the cp command is

cp [options] source destination

Examples of cp Command
1. Write a unix/linux cp command to copy file in to a directory?
The basic usage of cp command is to copy a file from the current directory to
another directory.
cp [Link] tmp/

The cp command copies the file [Link] into the tmp directory. The cp command does not remove
the source file; it just copies the file to a new location. If a file with the same name as the source
exists in the destination location, then by default the cp command overwrites that file.
2. Write a unix/linux cp to prompt for user before overwriting a file ( Interactive
cp command)?
The -i option to the cp command provides the ability to prompt for a user input
whether to overwrite
the destination file or not.

> cp [Link] tmp/

cp: overwrite `tmp/[Link]'?

If you enter y, then the cp command overwrites the destination file, otherwise the
cp command does
not copy the file.
3. Write a unix/linux cp command to copy multiple files in to a new directory?
You can specify multiple files as the source and can copy to the new location.

cp [Link] [Link] tmp/

The cp command copies the [Link], [Link] files in the current directory to the
tmp directory.
4. Write a unix/linux cp command to copy files using a wildcard pattern?
You can copy a set of files by specifying a wildcard pattern.

cp *.dat tmp/

Here the cp command copies all the files which have the ".dat" suffix to the destination directory.
5. Write a unix/linux cp command to copy a file in to the current directory?
You can copy a file from a different directory to the current directory.
cp /usr/local/bin/[Link] .

Here the cp command copies the [Link] file from the /usr/local/bin directory to the current directory.
The dot (.) indicates the current directory.
6. Write a unix/linux cp command to copy all the files in a directory?
The cp command can be used to copy all the files in directory to another directory.

cp docs/* tmp/

This command copies all the files in the docs directory to the tmp directory.
7. Write a unix/linux cp command to copy files from multiple directories?
You can copy the files from different directories into a new location.

cp docs/* scripts/* tmp/

The command copies the files from docs and script directories to the destination
directory tmp.
8. Write a unix/linux cp command to Copy a directory.
You can recursively copy a complete directory and its sub directory to another
location using the cp
command

cp -r docs tmp/

This copies the complete directory docs into the new directory tmp
9. Write a unix/linux cp command to Forcibly copy a file with -f option?
You can force the cp command to copy over an existing destination file even if it cannot be
opened.

cp -f force_file.txt /var/tmp/
ls Command in Unix and Linux Examples

ls is the most widely used command in unix or linux. ls command is used to list the
contents of a
directory. Learn the power of ls command to make your life easy. The syntax of ls
command is

ls [options] [pathnames]

1. Write a unix/linux ls command to display the hidden files and directories?


To display the hidden files and directories in the current directory use the -a
option of the ls
command.

> ls -a

. .. documents .hidden_file [Link]

Hidden files are the ones whose names start with a dot (.). The ls -a command also displays the
current directory (.) and the parent directory (..). If you want to exclude the current directory and
the parent directory, then use the -A option.

> ls -A

documents .hidden_file [Link]

2. Write a unix/linux ls command to classify the files with special characters


The -F option to ls command classifies the files. It marks the

. Directories with trailing slash (/)


. Executable files with trailing asterisk (*)
. FIFOs with trailing vertical bar (|)
. Symbolic links with trailing at sign (@)
. Regular files with nothing

> ls -F

documents/ [Link] link@


3. Write a unix/linux ls command to print each file in a separate line?
The -1 option to the ls command specifies that each file should be displayed on a
separate line

> ls -1

documents

[Link]

4. Write a unix/linux ls command to display the inode number of file?


In some cases, you want to know the inode number of a file. Use -i option to the ls
command to print
the inode number of a file.

> ls -i1

10584066 documents

3482450 [Link]

5. Write a unix/linux ls command to display complete information about the files?


The -l option provides lots of information about the file type, owner, group,
permissions, file size, last
modification date.

> ls -l

total 16

drwxr-xr-x 2 matt db 4096 Jan 30 23:08 documents

-rw-r--r-- 1 matt db 49 Jan 31 01:17 [Link]

. The first character indicates the type of the file. - for normal file, d for
directory, l for link file and
s for socket file
. The next 9 characters in the first field represent the permissions. Each group of 3 characters
represents the read (r), write (w) and execute (x) permissions for the owner, group and others. A -
means no permission.
. The second field indicates the number of links to that file.
. The third field indicates the owner name.
. The fourth field indicates the group name.
. The fifth field represents the file size in bytes.
. The sixth field represents the last modification date and time of the file.
. And finally the seventh field is the name of the file.

6. Write a unix/linux ls command to sort the files by their modification time?


The -t option allows the ls command to sort the files in descending order based on
the modification
time.

> ls -t1

[Link]

documents

7. Write a unix/linux ls command to sort the files in ascending order of modification time?
The -r option reverses the order of the files displayed. Combine the -t and -r
options to sort the files
in ascending order.

> ls -rt1

documents

[Link]

8. Write a unix/linux ls command to print the files recursively?


So far the ls command prints the files in the current directory. Use the -R option
to recursively print
the files in the sub-directories also.

> ls -R

.:

documents [Link]

./documents:

[Link]
9. Write a unix/linux ls command to print the files in a specific directory?
You can pass a directory to the ls command as an argument to print the files in it.

> ls /usr/local/bin

10. Write a unix/linux ls command to display files in columns?


The -x option specifies the ls command to display the files in columns.

> ls -x

Rewrite Sql Query | Sql Performance Tuning

Tuning an SQL query for performance is a big topic. Here I will just cover how to
re-write a query
and thereby improve the performance. Rewriting an SQL query is one of the ways you
can improve
performance. You can rewrite a query in many different ways.
To explain this, i have used the sales and products table.

SALES(SALE_ID, YEAR, PRODUCT_ID, PRICE);

PRODUCTS(PRODUCT_ID, PRODUCT_NAME);

Follow the below steps in re writing a query for optimization.


1. Avoid Redundant Logic
I have seen people writing redundant sub-queries and worrying about their query
performance. As
an example, find the total sales in each year and also the sales of product with id
10 in each year.

SELECT T.YEAR,
       T.TOT_SAL,
       P.PROD_10_SAL
FROM
(
  SELECT YEAR,
         SUM(PRICE) TOT_SAL
  FROM SALES
  GROUP BY YEAR
) T
LEFT OUTER JOIN
(
  SELECT YEAR,
         SUM(PRICE) PROD_10_SAL
  FROM SALES
  WHERE PRODUCT_ID = 10
  GROUP BY YEAR
) P
ON (T.YEAR = P.YEAR);

Most SQL developers write the above Sql query without even thinking that it can be
solved in a
single query. The above query is rewritten as

SELECT YEAR,

SUM(CASE WHEN PRODUCT_ID = 10

THEN PRICE

ELSE NULL

END ) PROD_10_SAL,

SUM(PRICE) TOT_SAL

FROM SALES

GROUP BY YEAR;

Now you can see the difference: just by reading the sales table one time we are able to solve the
problem.
First take a look at your query, identify the redundant logic and then tune it.

2. LEFT OUTER JOIN, NOT EXISTS, NOT IN


Some times you can rewrite a LEFT OUTER JOIN by using NOT EXISTS or NOT IN and vice
versa.
As an example, I want to find the products which do not have any sales at all.

SELECT P.PRODUCT_ID,

P.PRODUCT_NAME

FROM PRODUCTS P

LEFT OUTER JOIN

SALES S

ON (P.PRODUCT_ID = S.PRODUCT_ID)

WHERE S.SALE_ID IS NULL;

The same query can be rewritten using NOT EXISTS and NOT IN as

SELECT P.PRODUCT_ID,

P.PRODUCT_NAME

FROM PRODUCTS P

WHERE NOT EXISTS (

SELECT 1

FROM SALES S

WHERE S.PRODUCT_ID = P.PRODUCT_ID);

SELECT P.PRODUCT_ID,

P.PRODUCT_NAME
FROM PRODUCTS P

WHERE PRODUCT_ID NOT IN (

SELECT PRODUCT_ID

FROM SALES

);

Analyze the performance of these three queries and use the appropriate one.
Note: Be careful while using NOT IN. If the sub query returns at least one row with a NULL value, then
the main query won't return any rows at all.
3. INNER JOIN, EXISTS, IN
Similar to the LEFT OUTER JOIN, INNER JOINs can also be implemented with the EXISTS or IN
operators. As an example, we will find the sales of the products whose product ids exist in the
products table.

SELECT S.PRODUCT_ID,

SUM(PRICE)

FROM SALES S

JOIN

PRODUCTS P

ON (S.PRODUCT_ID = P.PRODUCT_ID)

GROUP BY S.PRODUCT_ID;

As we are not selecting any columns from the products table, we can rewrite the
same query with
the help of EXISTS or IN operator.

SELECT S.PRODUCT_ID,

SUM(PRICE)

FROM SALES S
WHERE EXISTS (

SELECT 1

FROM PRODUCTS P

WHERE P.PRODUCT_ID = S.PRODUCT_ID)

GROUP BY S.PRODUCT_ID;

SELECT S.PRODUCT_ID,

SUM(PRICE)

FROM SALES S

WHERE PRODUCT_ID IN (

SELECT PRODUCT_ID

FROM PRODUCTS P

)

GROUP BY S.PRODUCT_ID;

4. INNER JOIN, CORRELATED QUERY


We will see a simple join between the SALES and PRODUCTS table.

SELECT S.SALE_ID,

S.PRODUCT_ID,

P.PRODUCT_NAME

FROM SALES S

JOIN
PRODUCTS P

ON (S.PRODUCT_ID = P.PRODUCT_ID)

The above query can be rewritten with correlated query as

SELECT S.SALE_ID,

S.PRODUCT_ID,

(SELECT PRODUCT_NAME

FROM PRODUCTS P

WHERE P.PRODUCT_ID = S.PRODUCT_ID)

FROM SALES S

Analyze these two queries with the explain plan and check which one gives better
performance.
5. Using With Clause or Temporary Tables.
Try to avoid writing complex Sql queries. Split the queries and store the intermediate data in
temporary tables, or use the Oracle WITH clause for temporary storage. You can also use temporary
tables or the WITH clause when you want to reuse the same query more than once. This saves time
and improves performance.
Tips for increasing the query performance:

. Create the required indexes. At the same time, avoid creating too many indexes on a table.
. Rewrite the Sql query.
. Use the explain plan and autotrace to learn about the query execution.
. Generate statistics on the tables.
. Specify oracle hints in the query.
. Ask the DBA to watch the query and gather stats like CPU usage, number of rows read etc.

Please help in improving this article, by commenting on more ways to rewrite a Sql
query.

Sed Command in Unix and Linux Examples


Sed is a Stream Editor used for modifying the files in unix (or linux). Whenever
you want to make
changes to the file automatically, sed comes in handy to do this. Most people never
learn its power;
they just simply use sed to replace text. You can do many things apart from
replacing text with sed.
Here I will describe the features of sed with examples.
Consider the below text file as an input.

>cat [Link]

unix is great os. unix is opensource. unix is free os.

learn operating system.

unixlinux which one you choose.

Sed Command Examples

1. Replacing or substituting string


Sed command is mostly used to replace the text in a file. The below simple sed
command replaces
the word "unix" with "linux" in the file.

>sed 's/unix/linux/' [Link]

linux is great os. unix is opensource. unix is free os.

learn operating system.

linuxlinux which one you choose.

Here the "s" specifies the substitution operation. The "/" are delimiters. The
"unix" is the search
pattern and the "linux" is the replacement string.
By default, the sed command replaces the first occurrence of the pattern in each
line and it won't
replace the second, third...occurrence in the line.
2. Replacing the nth occurrence of a pattern in a line.
Use the /1, /2 etc flags to replace the first, second occurrence of a pattern in a
line. The below
command replaces the second occurrence of the word "unix" with "linux" in a line.
>sed 's/unix/linux/2' [Link]

unix is great os. linux is opensource. unix is free os.

learn operating system.

unixlinux which one you choose.

3. Replacing all the occurrence of the pattern in a line.


The substitute flag /g (global replacement) specifies the sed command to replace
all the occurrences
of the string in the line.

>sed 's/unix/linux/g' [Link]

linux is great os. linux is opensource. linux is free os.

learn operating system.

linuxlinux which one you choose.

4. Replacing from nth occurrence to all occurrences in a line.


Use the combination of /1, /2 etc and /g to replace all the patterns from the nth
occurrence of a
pattern in a line. The following sed command replaces the third, fourth, fifth...
"unix" word with "linux"
word in a line.

>sed 's/unix/linux/3g' [Link]

unix is great os. unix is opensource. linux is free os.

learn operating system.

unixlinux which one you choose.

5. Changing the slash (/) delimiter


You can use any delimiter other than the slash. As an example if you want to change
the web url to
another url as
>sed 's/http:\/\//www/' [Link]

In this case the url contains the delimiter character (/) which we used. In that case you have to escape
the slash with a backslash character, otherwise the substitution won't work.
Using too many backslashes makes the sed command look awkward. In this case we can change
the delimiter to another character as shown in the below example.

>sed 's_http://_www_' [Link]

>sed 's|http://|www|' [Link]

6. Using & as the matched string


There might be cases where you want to search for the pattern and replace that
pattern by adding
some extra characters to it. In such cases & comes in handy. The & represents the
matched string.

>sed 's/unix/{&}/' [Link]

{unix} is great os. unix is opensource. unix is free os.

learn operating system.

{unix}linux which one you choose.

>sed 's/unix/{&&}/' [Link]

{unixunix} is great os. unix is opensource. unix is free os.

learn operating system.

{unixunix}linux which one you choose.

7. Using \1,\2 and so on to \9


The first pair of parentheses specified in the pattern represents \1, the second represents \2,
and so on. The \1, \2 references can be used in the replacement string to make changes to the
source string. As an example, if you want to double the word "unix" in a line, producing
"unixunix", use the sed command as below.
>sed 's/\(unix\)/\1\1/' [Link]

unixunix is great os. unix is opensource. unix is free os.

learn operating system.

unixunixlinux which one you choose.

The parentheses need to be escaped with the backslash character. Another example: if you want to
switch the word "unixlinux" to "linuxunix", the sed command is

>sed 's/\(unix\)\(linux\)/\2\1/' [Link]

unix is great os. unix is opensource. unix is free os.

learn operating system.

linuxunix which one you choose.

Another example is switching the first three characters in a line

>sed 's/^\(.\)\(.\)\(.\)/\3\2\1/' [Link]

inux is great os. unix is opensource. unix is free os.

aelrn operating system.

inuxlinux which one you choose.

8. Duplicating the replaced line with /p flag


The /p print flag prints the replaced line twice on the terminal. If a line does
not have the search
pattern and is not replaced, then the /p prints that line only once.

>sed 's/unix/linux/p' [Link]

linux is great os. unix is opensource. unix is free os.

linux is great os. unix is opensource. unix is free os.

learn operating system.


linuxlinux which one you choose.

linuxlinux which one you choose.

9. Printing only the replaced lines


Use the -n option along with the /p print flag to display only the replaced lines. Here the -n option
suppresses the automatic printing of every line, and the /p flag prints only the lines in which a
replacement was made.

>sed -n 's/unix/linux/p' [Link]

linux is great os. unix is opensource. unix is free os.

linuxlinux which one you choose.

If you use -n alone without /p, then the sed does not print anything.
10. Running multiple sed commands.
You can run multiple sed commands by piping the output of one sed command as input
to another
sed command.

>sed 's/unix/linux/' [Link]| sed 's/os/system/'

linux is great system. unix is opensource. unix is free os.

learn operating system.

linuxlinux which one you chosysteme.

Sed provides -e option to run multiple sed commands in a single sed command. The
above output
can be achieved in a single sed command as shown below.

>sed -e 's/unix/linux/' -e 's/os/system/' [Link]

linux is great system. unix is opensource. unix is free os.

learn operating system.

linuxlinux which one you chosysteme.
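The same result can also be obtained by separating the two substitutions with a semicolon inside a
single sed expression:

>sed 's/unix/linux/;s/os/system/' [Link]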


11. Replacing string on a specific line number.
You can restrict the sed command to replace the string on a specific line number.
An example is

>sed '3 s/unix/linux/' [Link]

unix is great os. unix is opensource. unix is free os.

learn operating system.

linuxlinux which one you choose.

The above sed command replaces the string only on the third line.
12. Replacing string on a range of lines.
You can specify a range of line numbers to the sed command for replacing a string.

>sed '1,3 s/unix/linux/' [Link]

linux is great os. unix is opensource. unix is free os.

learn operating system.

linuxlinux which one you choose.

Here the sed command replaces the lines with range from 1 to 3. Another example is

>sed '2,$ s/unix/linux/' [Link]

linux is great os. unix is opensource. unix is free os.

learn operating system.

linuxlinux which one you choose.

Here $ indicates the last line in the file. So the sed command replaces the text
from second line to
last line in the file.
13. Replace on lines which match a pattern.
You can specify a pattern to the sed command to match in a line. Only if the pattern match occurs
does the sed command look for the string to be replaced, and if it finds it, the sed command
replaces the string.

>sed '/linux/ s/unix/centos/' [Link]

unix is great os. unix is opensource. unix is free os.

learn operating system.

centoslinux which one you choose.

Here the sed command first looks for the lines which has the pattern "linux" and
then replaces the
word "unix" with "centos".
14. Deleting lines.
You can delete the lines of a file by specifying a line number or a range of line numbers.

>sed '2 d' [Link]

>sed '5,$ d' [Link]

15. Duplicating lines


You can make the sed command print each line of a file two times.

>sed 'p' [Link]

16. Sed as grep command


You can make the sed command work similar to the grep command.

>grep 'unix' [Link]

>sed -n '/unix/ p' [Link]

Here the sed command looks for the pattern "unix" in each line of a file and prints
those lines that
has the pattern.
You can also make the sed command work like grep -v, just by reversing the pattern match with
NOT (!).

>grep -v 'unix' [Link]

>sed -n '/unix/ !p' [Link]

The ! here inverts the pattern match.


17. Add a line after a match.
The sed command can add a new line after a pattern match is found. The "a" command
to sed tells it
to add a new line after a match is found.

>sed '/unix/ a "Add a new line"' [Link]

unix is great os. unix is opensource. unix is free os.

"Add a new line"

learn operating system.

unixlinux which one you choose.

"Add a new line"

18. Add a line before a match


The sed command can add a new line before a pattern match is found. The "i" command
to sed tells
it to add a new line before a match is found.

>sed '/unix/ i "Add a new line"' [Link]

"Add a new line"

unix is great os. unix is opensource. unix is free os.

learn operating system.

"Add a new line"


unixlinux which one you choose.

19. Change a line


The sed command can be used to replace an entire line with a new line. The "c"
command to sed
tells it to change the line.

>sed '/unix/ c "Change line"' [Link]

"Change line"

learn operating system.

"Change line"

20. Transform like tr command


The sed command can be used to convert the lower case letters to upper case letters
by using the
transform "y" option.

>sed 'y/ul/UL/' [Link]

Unix is great os. Unix is opensoUrce. Unix is free os.

Learn operating system.

UnixLinUx which one yoU choose.

Here the sed command transforms the letters "u" and "l" into their uppercase forms "U" and "L".

Oracle General Function Examples

The general functions work with any data type and are mainly used to handle null
values. The Oracle
general functions are

Oracle NVL Function


The syntax of NVL function is

NVL(expr1, expr2)

The NVL function takes two arguments as its input. If the first argument is NULL,
then it returns the
second argument otherwise it returns the first argument.

SELECT NVL(10,2) FROM DUAL;

SELECT NVL(NULL,'Oracle') FROM DUAL;

SELECT NVL(NULL,NULL) FROM DUAL;

Oracle NVL2 Function

The syntax of NVL2 function is

NVL2(expr1,expr2,expr3)

The NVL2 function takes three arguments as its input. If the expr1 is NOT NULL,
NVL2 function
returns expr2. If expr1 is NULL, then NVL2 returns expr3.

SELECT NVL2('Ora','SID','TNS') FROM DUAL;

SELECT NVL2(NULL,'SID','TNS') FROM DUAL;

Oracle NULLIF Function

The Syntax of NULLIF function is

NULLIF(expr1, expr2)
The NULLIF function compares the two expressions and returns NULL if they are
equal otherwise it
returns the first expression.

SELECT NULLIF('Oracle','MYSQL') FROM DUAL;

SELECT NULLIF('MYSQL','MYSQL') FROM DUAL;

Oracle COALESCE Function

The Syntax of COALESCE function is

COALESCE(expr1,expr2,expr3,...)

The COALESCE function takes N number of arguments as its input and returns the
first NON-NULL
argument.

SELECT COALESCE('DB Backup','Oracle') FROM DUAL;

SELECT COALESCE(NULL,'MYSQL',NULL) FROM DUAL;

Oracle Conversion Functions Examples

Conversion functions are used to convert one data type to another. In some cases the oracle
server automatically converts the data to the required type. This is called implicit conversion.
Explicit conversions are done by using the conversion functions, and you have to take care of
explicit conversions yourself.

Oracle Explicit Data Type Conversion

Oracle provides three functions to covert from one data type to another.
1. To_CHAR ( number | date, [fmt], [nlsparams] )
The TO_CHAR function converts the number or date to VARCHAR2 data type in the
specified
format (fmt). The nlsparams parameter is used for number conversions. The nlsparams
specifies the
following number format elements:
. Decimal character
. Group separator
. Local currency symbol
. International currency symbol

If the parameters are omitted, then it uses the default formats specified in the
session.
Converting Dates to Character Type Examples
The Date format models are:

. YYYY: Four digit representation of year


. YEAR: Year spelled out
. MM: Two digit value of month
. MONTH: Full name of month
. MON: Three letter representation of month
. DY: Three letter representation of the day of the week
. DAY: Full name of the day
. DD: Numeric day of the month
. fm: used to remove any padded blanks or leading zeros.

SELECT TO_CHAR(hire_date, 'DD-MON-YYYY') FROM EMPLOYEES;

SELECT TO_CHAR(hire_date, 'fmYYYY') FROM EMPLOYEES;

SELECT TO_CHAR(hire_date, 'MON') FROM EMPLOYEES;

SELECT TO_CHAR(hire_date, 'YYYY/MM/DD') FROM EMPLOYEES;

Converting Numbers to Character type Examples


The Number format models are:

. 9: Specifies numeric position. The number of 9's determine the display width.
. 0: Specifies leading zeros.
. $: Floating dollar sign
. .: Decimal position
. ,: Comma position in the number

SELECT TO_CHAR(price, '$99,999') FROM SALES;

SELECT TO_CHAR(price, '99.99') FROM SALES;


SELECT TO_CHAR(price, '99,00') FROM SALES;

2. TO_NUMBER( char, ['fmt'] )


The TO_NUMBER function converts the characters to a number format.

SELECT TO_NUMBER('1028','9999') FROM DUAL;

SELECT TO_NUMBER('12,345','99,999') FROM DUAL;

3. TO_DATE( char, ['fmt'] )


The TO_DATE function converts the characters to a date data type.

SELECT TO_DATE('01-JAN-1985','DD-MON-YYYY') FROM DUAL;

SELECT TO_DATE('01-03-85','DD-MM-RR') FROM DUAL;

Oracle Subquery/Correlated Query Examples

A subquery is a SELECT statement which is used in another SELECT statement.


Subqueries are
very useful when you need to select rows from a table with a condition that depends
on the data of
the table itself. You can use the subquery in the SQL clauses including WHERE
clause, HAVING
clause, FROM clause etc.
The subquery can also be referred as nested SELECT, sub SELECT or inner SELECT. In
general,
the subquery executes first and its output is used in the main query or outer
query.
Types of Sub queries:
There are two types of subqueries in oracle:

. Single Row Subqueries: The subquery returns only one row. Use single row comparison
operators like =, > etc while doing comparisons.
. Multiple Row Subqueries: The subquery returns more than one row. Use multiple row

comparison operators like IN, ANY, ALL in the comparisons.

Single Row Subquery Examples

1. Write a query to find the salary of employees whose salary is greater than the
salary of employee
whose id is 100?
SELECT EMPLOYEE_ID,

SALARY

FROM EMPLOYEES

WHERE SALARY >
(
SELECT SALARY

FROM EMPLOYEES

WHERE EMPLOYEE_ID = 100
)

2. Write a query to find the employees who all are earning the highest salary?

SELECT EMPLOYEE_ID,
       SALARY
FROM   EMPLOYEES
WHERE  SALARY =
       (SELECT MAX(SALARY)
        FROM   EMPLOYEES);

3. Write a query to find the departments in which the least salary is greater than
the highest salary in
the department of id 200?

SELECT   DEPARTMENT_ID,
         MIN(SALARY)
FROM     EMPLOYEES
GROUP BY DEPARTMENT_ID
HAVING   MIN(SALARY) >
         (SELECT MAX(SALARY)
          FROM   EMPLOYEES
          WHERE  DEPARTMENT_ID = 200);

Multiple Row Subquery Examples

1. Write a query to find the employees whose salary is equal to the salary of at
least one employee
in department of id 300?

SELECT EMPLOYEE_ID,
       SALARY
FROM   EMPLOYEES
WHERE  SALARY IN
       (SELECT SALARY
        FROM   EMPLOYEES
        WHERE  DEPARTMENT_ID = 300);

2. Write a query to find the employees whose salary is greater than that of at least one employee in the department of id 500?

SELECT EMPLOYEE_ID,
       SALARY
FROM   EMPLOYEES
WHERE  SALARY > ANY
       (SELECT SALARY
        FROM   EMPLOYEES
        WHERE  DEPARTMENT_ID = 500);

3. Write a query to find the employees whose salary is less than the salary of all
employees in
department of id 100?

SELECT EMPLOYEE_ID,
       SALARY
FROM   EMPLOYEES
WHERE  SALARY < ALL
       (SELECT SALARY
        FROM   EMPLOYEES
        WHERE  DEPARTMENT_ID = 100);

4. Write a query to find the employees whose manager and department should match
with the
employee of id 20 or 30?

SELECT EMPLOYEE_ID,
       MANAGER_ID,
       DEPARTMENT_ID
FROM   EMPLOYEES
WHERE  (MANAGER_ID, DEPARTMENT_ID) IN
       (SELECT MANAGER_ID,
               DEPARTMENT_ID
        FROM   EMPLOYEES
        WHERE  EMPLOYEE_ID IN (20,30));

5. Write a query to get the department name of an employee?

SELECT EMPLOYEE_ID,
       DEPARTMENT_ID,
       (SELECT DEPARTMENT_NAME
        FROM   DEPARTMENTS D
        WHERE  D.DEPARTMENT_ID = E.DEPARTMENT_ID) DEPARTMENT_NAME
FROM   EMPLOYEES E;

Correlated SubQueries Examples

A correlated subquery is used for row-by-row processing. The subquery is executed once for each row of the main query.
1. Write a query to find the highest earning employee in each department?

SELECT DEPARTMENT_ID,
       EMPLOYEE_ID,
       SALARY
FROM   EMPLOYEES E_O
WHERE  1 =
       (SELECT COUNT(DISTINCT SALARY)
        FROM   EMPLOYEES E_I
        WHERE  E_O.DEPARTMENT_ID = E_I.DEPARTMENT_ID
        AND    E_O.SALARY <= E_I.SALARY);

2. Write a query to list the department names which have at least one employee?

SELECT DEPARTMENT_ID,
       DEPARTMENT_NAME
FROM   DEPARTMENTS D
WHERE  EXISTS
       (SELECT 1
        FROM   EMPLOYEES E
        WHERE  E.DEPARTMENT_ID = D.DEPARTMENT_ID);

3. Write a query to find the departments which do not have employees at all?

SELECT DEPARTMENT_ID,
       DEPARTMENT_NAME
FROM   DEPARTMENTS D
WHERE  NOT EXISTS
       (SELECT 1
        FROM   EMPLOYEES E
        WHERE  E.DEPARTMENT_ID = D.DEPARTMENT_ID);

Oracle Single Row Functions Examples

Oracle provides single row functions to manipulate the data values. The single row
functions operate
on single rows and return only one result per row. In general, the functions take
one or more inputs
as arguments and return a single value as output. The arguments can be a user-supplied constant, a variable, a column name or an expression.
The features of single row functions are:

. Act on each row returned in the query.


. Perform calculations on data.
. Modify the data items.
. Manipulate the output for groups of rows.
. Format numbers and dates.
. Converts column data types.
. Returns one result per row.
. Used in SELECT, WHERE and ORDER BY clauses.
. Single row functions can be nested.
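For example, single row functions can be nested by feeding the result of one function into another (a small sketch added for clarity):

SELECT UPPER(CONCAT('oracle', ' backup')) FROM DUAL;   -- returns 'ORACLE BACKUP'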

The single row functions are categorized into

. Character Functions: Character functions accept character inputs and can return either character or number values as output.
. Number Functions: Number functions accept numeric inputs and return only numeric values as output.
. Date Functions: Date functions operate on the date data type and return a date value or a numeric value.
. Conversion Functions: Convert from one data type to another data type.
. General Functions
Let's see each function with an example:

Character Functions Example

1. LOWER
The Lower function converts the character values into lowercase letters.

SELECT lower('ORACLE') FROM DUAL;

2. UPPER
The Upper function converts the character values into uppercase letters.

SELECT upper('oracle') FROM DUAL;

3. INITCAP
The Initcap function converts the first character of each word into uppercase and the remaining characters into lowercase.

SELECT initcap('LEARN ORACLE') FROM DUAL;

4. CONCAT
The Concat function concatenates the first string with the second string.

SELECT concat('Oracle',' Backup') FROM DUAL;

5. SUBSTR
The Substr function returns specified characters from character value starting at
position m and n
characters long. If you omit n, all characters starting from position m to the end
are returned.

Syntax: substr(string [,m,n])

SELECT substr('ORACLE DATA RECOVERY',8,4) FROM DUAL;


SELECT substr('ORACLE DATA PUMP',8) FROM DUAL;

You can specify m value as negative. In this case the count starts from the end of
the string.

SELECT substr('ORACLE BACKUP',-6) FROM DUAL;

6. LENGTH
The Length function is used to find the number of characters in a string.

SELECT length('Oracle Data Guard') FROM DUAL;

7. INSTR
The Instr function is used to find the position of a string in another string.
Optionally you can provide the position m to start searching for the string and the occurrence n of the string. By default m and n are 1, which means the search starts at the beginning of the string and looks for the first occurrence.

Syntax: instr('Main String', 'substring', [m], [n])

SELECT instr('oracle apps','app') FROM DUAL;

SELECT instr('oracle apps is a great application','app',1,2) FROM DUAL;

8. LPAD
The Lpad function pads the character value right-justified to a total width of n
character positions.

Syntax: lpad(column, n, 'string');

SELECT lpad('100',5,'x') FROM DUAL;

9. RPAD
The Rpad function pads the character value left-justified to a total width of n
character positions.

Syntax: rpad(column, n, 'string');


SELECT rpad('100',5,'x') FROM DUAL;

10. TRIM
The Trim function removes the leading or trailing or both the characters from a
string.

Syntax: trim(leading|trailing|both trim_char from trim_source)

SELECT trim('O' FROM 'ORACLE') FROM DUAL;

11. REPLACE
The Replace function is used to replace a string with another string inside a source string.

Syntax: replace(column, old_string, new_string)

SELECT replace('ORACLE DATA BACKUP', 'DATA','DATABASE') FROM DUAL;

Number Functions Example

1. ROUND
The Round function rounds the value to the n decimal values. If n is not specified,
there won't be any
decimal places. If n is negative, numbers to the left of the decimal point are
rounded.

Syntax: round(number,n)

SELECT round(123.67,1) FROM DUAL;

SELECT round(123.67) FROM DUAL;

SELECT round(123.67,-1) FROM DUAL;

2. TRUNC
The Trunc function truncates the value to the n decimal places. If n is omitted,
then n defaults to
zero.

Syntax: trunc(number,n)
SELECT trunc(123.67,1) FROM DUAL;

SELECT trunc(123.67) FROM DUAL;

3. MOD
The Mod function returns the remainder of m divided by n.

Syntax: mod(m,n)

SELECT mod(10,5) FROM DUAL;

Date Functions Example

1. SYSDATE
The Sysdate function returns the current oracle database server date and time.

SELECT sysdate FROM DUAL;

2. Arithmetic with Dates


You can add or subtract a number of days or hours to a date. You can also subtract two dates.

SELECT sysdate+2 "add_days" FROM DUAL;

SELECT sysdate-3 "sub_days" FROM DUAL;

SELECT sysdate+3/24 "add_hours" FROM DUAL;

SELECT sysdate-2/24 "sub_hours" FROM DUAL;

SELECT sysdate-hire_date "sub_dates" FROM EMPLOYEES; -- returns the number of days between the two dates

3. MONTHS_BETWEEN
The Months_Between function returns the number of months between the two given
dates.
Syntax: months_between(date1,date2)

SELECT months_between(sysdate,hire_date) FROM EMPLOYEES;

SELECT months_between('01-JUL-2000', '23-JAN-2000') FROM DUAL;

4. ADD_MONTHS
The Add_Months is used to add or subtract the number of calendar months to the
given date.

Syntax: add_months(date,n)

SELECT add_months(sysdate,3) FROM DUAL;

SELECT add_months(sysdate,-3) FROM DUAL;

SELECT add_months('01-JUL-2000', 3) FROM DUAL;

5. NEXT_DAY
The Next_Day function finds the date of the next specified day of the week. The
syntax is
NEXT_DAY(date,'char')
The char can be a character string or a number representing the day.

SELECT next_day(sysdate,'FRIDAY') FROM DUAL;

SELECT next_day(sysdate,5) FROM DUAL;

SELECT next_day('01-JUL-2000', 'FRIDAY') FROM DUAL;

6. LAST_DAY
The Last_Day function returns the last day of the month.

SELECT last_day(sysdate) FROM DUAL;

SELECT last_day('01-JUL-2000') FROM DUAL;

7. ROUND
The Round function returns the date rounded to the specified format. The Syntax is

Round(date [,'fmt'])

SELECT round(sysdate,'MONTH') FROM DUAL;

SELECT round(sysdate,'YEAR') FROM DUAL;

SELECT round('30-OCT-85','YEAR') FROM DUAL;

8. TRUNC
The Trunc function returns the date truncated to the specified format. The Syntax
is
Trunc(date [,'fmt'])

SELECT trunc(sysdate,'MONTH') FROM DUAL;

SELECT trunc(sysdate,'YEAR') FROM DUAL;

SELECT trunc('01-MAR-85','YEAR') FROM DUAL;
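The difference between ROUND and TRUNC on dates is easiest to see side by side (a sketch added for illustration). For a mid-month date, ROUND moves to the first day of the next month (Oracle rounds up from the 16th), while TRUNC always falls back to the first day of the same month:

SELECT TO_CHAR(ROUND(TO_DATE('16-JUL-2012','DD-MON-YYYY'),'MONTH'),'DD-MON-YYYY') R,
       TO_CHAR(TRUNC(TO_DATE('16-JUL-2012','DD-MON-YYYY'),'MONTH'),'DD-MON-YYYY') T
FROM DUAL;   -- R: 01-AUG-2012, T: 01-JUL-2012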

The Oracle Conversion and General Functions are covered in other sections. Go
through the links
Oracle Conversion Functions and Oracle General Functions.

Oracle With Clause Examples

The Oracle With Clause is similar to a temporary table, where you store the data once and read it multiple times in your sql query. The Oracle With Clause is used when a sub-query is executed multiple times. In simple terms, the With Clause is used to simplify complex SQL. You can also improve the performance of the query by using the with clause.
Syntax of Oracle With Clause

WITH query_name AS
(
SQL query
)
SELECT * FROM query_name;

At first, the With Clause syntax seems to be confusing as it does not begin with
the SELECT. Think
of the query_name as a temporary table and use it in your queries.
Oracle With Clause Example
We will see how to write a sql query with the help of the With Clause. As an example, we will do a math operation that divides the salary of each employee by the total number of employees in their department.

WITH CNT_DEPT AS
(
SELECT   DEPARTMENT_ID,
         COUNT(1) NUM_EMP
FROM     EMPLOYEES
GROUP BY DEPARTMENT_ID
)
SELECT EMPLOYEE_ID,
       SALARY/NUM_EMP
FROM   EMPLOYEES E,
       CNT_DEPT C
WHERE  E.DEPARTMENT_ID = C.DEPARTMENT_ID;

SQL With clause is mostly used in computing aggregations.
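A WITH clause can also define more than one named subquery, and a later subquery can refer to an earlier one. The sketch below is an illustrative addition (reusing the EMPLOYEES table) that computes department salary totals first and then filters them:

WITH DEPT_SAL AS
(
SELECT   DEPARTMENT_ID,
         SUM(SALARY) TOT_SAL
FROM     EMPLOYEES
GROUP BY DEPARTMENT_ID
),
HIGH_DEPT AS
(
SELECT DEPARTMENT_ID
FROM   DEPT_SAL
WHERE  TOT_SAL > 50000
)
SELECT * FROM HIGH_DEPT;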

Java String Class/Object Methods Examples

String is one of the widely used java classes. The Java String class is used to represent character strings. All String literals in Java programs are implemented as instances of the String class. Strings are constants and their values cannot be changed after they are created. Java String objects are immutable and they can be shared. The String class includes several methods to work with the content of a string.
Creating a String Object:
A String object can be created as
String str = "Car Insurance";
The string object can also be created using the new operator as
String str = new String("Car Insurance");
Java provides a special character plus (+) to concatenate strings. The plus operator is the only operator which is overloaded in java. String concatenation is implemented through the StringBuilder or StringBuffer class and their append method.
Examples of Java String Class:
1. Finding the length of the string
The length() method can be used to find the length of the string.

String str = "Car insurance"

[Link]([Link]());

2. Comparing strings
The equals() method is used to compare two strings.

String str = "car finance";

if ( str.equals("car loan") ) {
    System.out.println("Strings are equal");
} else {
    System.out.println("Strings are not equal");
}
3. Comparing strings by ignoring case
The equalsIgnoreCase() method is used to compare two strings by ignoring the case.

String str = "insurance";

if ( [Link]("INSURANCE") )

[Link]("insurance strings are equal");

else

[Link]("insurance Strings are not equal");

4. Finding which string is greater


The compareTo() method compares two strings to find which string is alphabetically greater.

String str = "car finance";

if ( str.compareTo("auto finance") > 0 ) {
    System.out.println("car finance string is alphabetically greater");
} else {
    System.out.println("car finance string is alphabetically lesser");
}

5. Finding which string is greater while ignoring the case.


The compareToIgnoreCase() method is the same as the compareTo() method except that it ignores case while comparing.

String str = "car finance";

if ( str.compareToIgnoreCase("CAR finance") == 0 ) {
    System.out.println("strings are alphabetically same");
} else {
    System.out.println("strings are alphabetically not same");
}

6. Position of a string in another string.


The indexOf() method is used to find the position of a string in another string.

String str= "car insurance";

[Link]( [Link]("car") );

7. Extract single character from a string.


The charAt() method is used to extract a single character by specifying the position of the character.

String str = "Auto Finance";

System.out.println( str.charAt(5) );

8. Extracting part of a string.


The substring() method is used to extract part of a string by specifying the start position and end position.

String str = "Car Finance";

System.out.println( str.substring(1,3) );

9. Hash code of a string.


The hashCode() method is used to get the hash code of a string.

String str = "Auto Insurance";

[Link]([Link]());

10. replacing characters in a string.


The replace() method is used to replace a character in a string with a new character.

String str = "Auto Loan";

System.out.println( str.replace("A", "L") );

11. Converting a string to upper case.


The toUpperCase() method is used to convert a string to upper case letters.

String str = "Cheap Car Insurance";

[Link]([Link]());

12. Converting a string to lower case.


The toLowerCase() method is used to convert a string to lower case letters.

String str = "Insurance Quote";

System.out.println( str.toLowerCase() );

Basic Unix and Linux Commands With Examples

Learning the unix operating system is very easy. You just need to understand the unix server concepts and be familiar with the unix commands. Here I am providing some important unix commands which will be used in daily work.
Unix Commands With Examples:
1. Listing files
The first thing everyone does after logging into a unix system is to list the files in a directory. The ls command is used to list the files in a directory.

>ls

[Link]

[Link]

[Link]

If you simply execute ls on the command prompt, then it will display the files and
directories in the
current directory.

>ls /usr/local/bin

You can pass a directory as an argument to ls command. In this case, the ls command
prints all the
files and directories in the specific directory you have passed.
2. Displaying the contents of a file.
The next thing is to display the contents of a file. The cat command is used to
display the contents in
a file.

>cat [Link]

This is a sample unix file

Learning about unix server is awesome

3. Displaying first few lines from a file.


The head command can be used to print the specified number of lines from the
starting of a file. The
below head command displays the first five lines of file.

>head -5 [Link]

4. Displaying last few lines from a file.


The tail command can be used to print the specified number of lines from the ending
of a file. The
below tail command displays the last three lines of file.

>tail -3 [Link]

5. Changing the directories


The cd command can be used to change from one directory to another directory. You
need to
specify the target directory where you want to go.

>cd /var/tmp

After typing this cd command you will be in /var/tmp directory.


6. Creating a file.
The touch command simply creates an empty file. The below touch command creates a
new file in
the current directory.

touch new_file.txt

7. copying the contents of one file into another.


The cp command is used to copy the content of the source file into the target file. If the target file already has data, then it will be overwritten.

>cp source_file target_file

8. Creating a directory.
Directories are a way of organizing your files. The mkdir command is used to create
the specified
directory.

>mkdir backup

This will create the backup directory in the current directory.


9. Renaming and moving the files.
The mv command is used to rename files and it is also used for moving files from one directory into another directory.

Renaming the file.

>mv [Link] new_file.txt

Moving the file to another directory.

>mv new_file.txt tmp/


10. Finding the number of lines in a file
The wc command can be used to find the number of lines, words and characters in a file.

>wc [Link]

21 26 198 [Link]

To learn more about a unix command, it is always good to read its man page. To see the man page, simply pass the command as an argument to man.

man ls

Replace String With Awk/Sed Command in Unix

Replace String With Awk/Sed Command In Unix:


You might have used the sed command often to replace text in a file. Awk can also be used to replace strings in a file.
Here I will show you how to replace a string with the awk command. To learn about replacing text with the sed command, go through the link, Replace String with Sed Command
Replace text with Awk command
1. First we will see a simple example of replacing the text. The source file contains the below data

>cat [Link]

Learn unix

Learn linux

We want to replace the word "unix" with "fedora". Here the word "unix" is in the second field. So, we need to check for the word "unix" in the second field and replace it with the word "fedora" by assigning the new value to the second field. The awk command to replace the text is
awk '{if($2=="unix") {$2="fedora"} print $0}' [Link]

Learn fedora

Learn linux

2. Now we will see a slightly more complex example. Consider the text file with the below data

>cat [Link]

left

In left

right

In top

top

In top

bottom

In bottom

)
Now replace the string "top" in the right section with the string "right". The output should look like

left

In left

right

In right

top

In top

bottom

In bottom

Here the delimiter in the text file is the closing parenthesis. We have to specify the delimiter to the awk command using the record separators. The below awk command can be used to replace the string in the file

awk -vRS=")" '/right/{ gsub(/top/,"right"); }1' ORS=")" [Link]

Here RS is the input record separator and ORS is the output record separator.

Date Command in Unix and Linux Examples

The date command is used to print the date and time in unix. By default the date command displays the date in the time zone in which the unix operating system is configured.
Now let's see the date command usage in unix.
Date Command Examples:
1. Write a unix/linux date command to print the date on the terminal?

>date

Mon Jan 23 [Link] PST 2012

This is the default format in which the date command print the date and time. Here
the unix server is
configured in pacific standard time.
2. Write a unix/linux date command to print the date in GMT/UTC time zone?

>date -u

Mon Jan 23 [Link] UTC 2012

The -u option to the date command tells it to display the time in Greenwich Mean
Time.
3. Write a unix/linux date command to set the date in unix?
You can change the date and time by using the -s option to the date command.

>date -s "01/01/2000 [Link]"

4. Write a unix/linux date command to display only the date part and ignore the
time part?
>date '+%m-%d-%Y'

01-23-2012

You can format the output of date command by using the %. Here %m for month, %d for
day and
%Y for year.
5. Write a unix/linux date command to display only the time part and ignore the
date part?

>date '+%H-%M-%S'

01-48-45

Here %H is for hours in 24 hour format, %M is for minutes and %S for seconds
6. Write a unix/linux date command to format both the date and time part.

>date '+%m-%d-%Y %H-%M-%S'

01-23-2012 01-49-59

7. Write a unix/linux date command to find the number of seconds from unix epoch.

>date '+%s'

1327312228

Unix epoch is the date on January 1st, 1970. The %s option is used to find the
number of seconds
between the current date and unix epoch.

Oracle Interview Questions

The oracle interview questions are classified into

1. SQL Interview Questions


2. PL/SQL Interview Questions
SQL Interview Questions:

1. Write a query to find the highest salary earned by an employee in each department and also the number of employees who earn the highest salary?

SELECT DEPARTMENT_ID,

MAX(SALARY) HIGHEST_SALARY,

COUNT(1) KEEP(DENSE_RANK LAST ORDER BY SALARY) CNT_HIGH_SAL

FROM EMPLOYEES

GROUP BY DEPARTMENT_ID;

2. Write a query to get the top 2 employees who are earning the highest salary in
each department?

SELECT DEPARTMENT_ID,

EMPLOYEE_ID,

SALARY

FROM (

SELECT DEPARTMENT_ID,

EMPLOYEE_ID,

SALARY,

ROW_NUMBER() OVER(PARTITION BY DEPARTMENT_ID ORDER BY SALARY DESC ) R

FROM EMPLOYEES

) A
WHERE R <= 2;

3. Write a query to delete the duplicate records from employees table?

DELETE FROM EMPLOYEES

WHERE ROWID NOT IN

(SELECT MAX(ROWID) FROM EMPLOYEES GROUP BY EMPLOYEE_ID);

4. Write a query to find the employees who are earning more than the average salary
in their
department?

SELECT EMPLOYEE_ID,

SALARY

FROM EMPLOYEES E_O

WHERE SALARY >

( SELECT AVG(SALARY) FROM EMPLOYEES E_I

WHERE E_I.DEPARTMENT_ID = E_O.DEPARTMENT_ID );

5. How do you display the current date in oracle?

SELECT SYSDATE FROM DUAL;

6. What is a correlated Query?


It is a form of sub query, where the sub query uses the values from the outer query
in its WHERE
clause. The sub query runs for each row processed in the outer query. Question 4 is
an example for
a correlated sub query.

PL/SQL Interview Questions:

1. What is a cursor?
A cursor is a reference to the system memory when an SQL statement is executed. A
cursor
contains the information about the select statement and the rows accessed by it.

2. What is implicit cursor and explicit cursor?

. Implicit Cursors: Implicit cursors are created by default when DML statements
like INSERT,
UPDATE and DELETE are executed in PL/SQL objects.
. Explicit Cursors: Explicit cursors must be created by you when executing the
select statements.

3. What are the attributes of a cursor?

Cursor attributes are:

. %FOUND : Returns true if a DML or SELECT statement affects at least one row.
. %NOTFOUND: Returns true if a DML or SELECT statement does not affect at least one
row.
. %ROWCOUNT: Returns the number of rows affected by the DML or SELECT statement.
. %ISOPEN: Returns true if a cursor is in open state.
. %BULK_ROWCOUNT: Similar to %ROWCOUNT, except it is used in bulk operations.
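A minimal anonymous block (a sketch, not part of the original question set) showing a few of these attributes on an explicit cursor:

DECLARE
  CURSOR emp_cur IS
    SELECT employee_id FROM employees;
  v_id employees.employee_id%TYPE;
BEGIN
  OPEN emp_cur;
  FETCH emp_cur INTO v_id;
  IF emp_cur%FOUND THEN
    DBMS_OUTPUT.PUT_LINE('Rows fetched so far: ' || emp_cur%ROWCOUNT);
  END IF;
  CLOSE emp_cur;
END;
/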

4. What is a private and public procedure?

. Public procedure: In a package, the signature of the procedure is specified in the package specification. This procedure can be called outside of the package.
. Private procedure: For a private procedure, there won't be any signature in the package specification. So, these procedures can be called only inside the package and cannot be called outside of the package.
5. Create a sample delete trigger on employees table?

CREATE OR REPLACE TRIGGER EMPLOYEES_AD
AFTER DELETE ON EMPLOYEES
REFERENCING NEW AS NEW OLD AS OLD
FOR EACH ROW
BEGIN
  INSERT INTO employees_changes (employee_id,
                                 change_date)
  VALUES (:OLD.employee_id,
          SYSDATE);
END;

6. What is the difference between a procedure and a function?


A function returns a value. However a procedure does not return a value.

DDL statements are not allowed in Procedures (PLSQL BLOCK)

PL/SQL objects are precompiled. All the dependencies are checked before the execution of the objects. This makes the programs execute faster.
The dependencies include database objects such as tables, views, synonyms and other objects. The dependency does not depend on the data.
As DML (Data Manipulation Language) statements do not change the dependencies, they can run directly in PL/SQL objects. On the other hand, DDL (Data Definition Language) statements like CREATE, DROP, ALTER commands and DCL (Data Control Language) statements like GRANT, REVOKE can change the dependencies during the execution of the program.
Example: Let's say you have dropped a table during the execution of a program, and later in the same program you try to insert a record into that table; the program will fail.
This is the reason why DDL statements are not allowed directly in PL/SQL programs.

Unix Search File

One of the basic features of any operating system is to search for files, and the unix operating system also provides this feature. The find command in Unix is used for searching files and directories in Unix, Linux and other Unix-like operating systems.
You can specify search criteria for searching files and directories. If you do not specify any criteria, the find command searches for the files in the current directory.
Unix Search Command Examples:
1. Searching for the files in the current directory.

find . -name '*.sh'

The dot(.) represents the current directory and -name option specifies the name of
the file to be
searched. This find command searches for all the files with ".sh" as the suffix.
2. Searching for the file in all the directories.

find / -type f -name '[Link]'

The / specifies the root directory, which is at the highest level, and the -type option specifies the type of file. This command searches for the regular file, "[Link]", in all the directories.
3. Searching for the file in a particular directory.

find /usr/local/bin/ -type f -name '*.java'

This find command searches for all the java files in the /usr/local/bin directory.
4. Searching for a directory.
find . -type d -name 'tmp'

The -type d indicates the directory. This find command searches for the tmp
directory in the current
directory.
5. Searching for a directory in another directory

find /var/tmp/ -type d -name 'personal'

This find command searches for the personal directory in the /var/tmp directory.

Find Command to Delete Files and Directories

Most of you might have used find command to search for files and directories. The
find command
can also be used to delete the files and directories. The find command has -delete
option which can
be used to delete files, directories etc. Be careful while using the -delete option of the find command, especially when using a recursive find; otherwise you will end up deleting important files.
1. The basic find command to delete a file in the current directory is

find . -name filename -delete

2. The find command to remove empty files in the current directory is

find . -type f -empty -delete

3. The find command to delete empty directories is

find . -type d -empty -delete


Equi Join Examples in Oracle

What is an EquiJoin
The join condition specified determines what type of join it is. When you relate two tables in the join condition by equating the columns with the equal (=) symbol, it is called an Equi-Join. Equi-joins are also called simple joins.
Examples:
1. To get the department name of an employee from the departments table, you need to compare the department_id column in the EMPLOYEES table with the department_id column in the DEPARTMENTS table. The SQL query for this is

SELECT employee_id,

department_name

FROM employees e,

departments d

WHERE e.department_id = d.department_id;

2. Consider the below three tables


Customers(customer_id, customer_name)
Products(product_id, product_name)
Sales(sale_id, price, customer_id, product_id)
Write a sql query to get the products purchased by a customer. You have to do an
equi-join between
the customers, products and sales table. The SQL query for this is

SELECT c.customer_name,

p.product_name

FROM Customers c,

Sales s,

Products p

WHERE c.customer_id = s.customer_id

AND s.product_id = p.product_id;

Make Awk Command Case Insensitive

The awk command is used to parse files which have delimited data. By default, the awk command does case-sensitive parsing. The awk command has an IGNORECASE built-in variable to do case insensitive parsing. We will see about IGNORECASE in detail here.
Consider a sample text file with the below data

>cat [Link]

mark iphone

jhon sony

peter Iphone

chrisy motorola

The below awk command can be used to display the lines which have the word "iphone"
in it.
awk '{if($2 == "iphone") print $0 }' [Link]
This awk command looks for the word "iphone" in the second column of each line and
if it finds a
match, then it displays that line. Here it just matches the word "iphone" and did
not match the word
"Iphone". The awk did a case sensitive match.
The output of the above command is

mark iphone

You can make awk do a case insensitive match so that it also matches words like "Iphone" or "IPHONE".
The IGNORECASE is a built-in variable which can be used in the awk command to make it either case sensitive or case insensitive. (The IGNORECASE variable is specific to GNU awk.)
If the IGNORECASE value is 0, then awk does a case sensitive match. If the value is 1, then awk does a case insensitive match.
The awk command for case insensitive match is

awk 'BEGIN {IGNORECASE = 1} {if($2 == "iphone") print $0 }' [Link]

The output is

mark iphone

peter Iphone

Parsing /etc/passwd Unix File With Awk Command

Awk command parses the files which have delimited structure. The /etc/passwd file
is a delimited
file. Using the Awk command is a good choice to parse the /etc/passwd file.
Sample /etc/passwd file
looks like as below

root:x:0:0:root:/root:/bin/bash

bin:x:1:1:bin:/bin:/sbin/nologin

daemon:x:2:2:daemon:/sbin:/sbin/nologin

adm:x:3:4:adm:/var/adm:/sbin/nologin

lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin

sync:x:5:0:sync:/sbin:/bin/sync

The /etc/passwd file contains the data in the form of row and columns. The columns
are delimited by
a colon (:) character.
Now we will see how to write an awk command which reads the /etc/passwd file and prints the names of the users who have the /bin/bash program as their default shell.
awk -F: '$7 == "/bin/bash" { print $1 }' /etc/passwd
The -F option is used to specify the field delimiter.
The output of the above awk command is

root

Grep Command in Unix and Linux Examples

Grep is a frequently used command in Unix (or Linux). Most of us use grep just for finding words in a file. The power of grep comes with using its options and regular expressions. You can analyze large sets of log files with the help of the grep command.
Grep stands for Global search for Regular Expressions and Print.
The basic syntax of the grep command is
grep [options] pattern [list of files]
Let's see some practical examples of the grep command.
1. Running the last executed grep command
This saves a lot of time if you are executing the same command again and again.

!grep

This displays the last executed grep command and also prints the result set of the
command on the
terminal.
2. Search for a string in a file
This is the basic usage of grep command. It searches for the given string in the
specified file.

grep "Error" [Link]

This searches for the string "Error" in the log file and prints all the lines that have the word "Error".
3. Searching for a string in multiple files.

grep "string" file1 file2


grep "string" file_pattern

This is also the basic usage of the grep command. You can manually specify the list
of files you want
to search or you can specify a file pattern (use regular expressions) to search
for.
4. Case insensitive search
The -i option enables you to search for a string case insensitively in the given file. It matches words like "UNIX", "Unix", "unix".

grep -i "UNix" [Link]

5. Specifying the search string as a regular expression pattern.

grep "^[0-9].*" [Link]

This will search for the lines which start with a number. Regular expressions are a huge topic and I am not covering them here. This example is just for showing the usage of regular expressions.
6. Checking for the whole words in a file.
By default, grep matches the given string/pattern even if it is found as a substring in a file. The -w option makes grep match only whole words.

grep -w "world" [Link]

7. Displaying the lines before the match.


Sometimes, if you are searching for an error in a log file, it is good to see the lines around the error lines to know the cause of the error.

grep -B 2 "Error" [Link]

This prints the matched lines along with the two lines before the matched lines.
8. Displaying the lines after the match.

grep -A 3 "Error" [Link]


This will display the matched lines along with the three lines after the matched
lines.
9. Displaying the lines around the match

grep -C 5 "Error" [Link]

This will display the matched lines and also five lines before and after the
matched lines.
10. Searching for a string in all files recursively
You can search for a string in all the files under the current directory and sub-directories with the help of the -r option.

grep -r "string" *

11. Inverting the pattern match


You can display the lines that do not match the specified search string pattern using the -v option.

grep -v "string" [Link]

12. Displaying the non-empty lines


You can remove the blank lines using the grep command.

grep -v "^$" [Link]

13. Displaying the count of number of matches.


We can find the number of lines that matches the given string/pattern

grep -c "sting" [Link]

14. Display the file names that matches the pattern.


We can just display the files that contains the given string/pattern.
grep -l "string" [Link]

15. Display the file names that do not contain the pattern.
We can display the files which do not contain the matched string/pattern using the -L option.

grep -L "string" [Link]

16. Displaying only the matched pattern.


By default, grep displays the entire line which has the matched string. We can make
the grep to
display only the matched string by using the -o option.

grep -o "string" [Link]

17. Displaying the line numbers.


We can make the grep command display the line number of each line which contains the matched string in a file using the -n option.

grep -n "string" [Link]

18. Displaying the position of the matched string in the line


The -b option allows the grep command to display the character position of the
matched string in a
file.

grep -o -b "string" [Link]

19. Matching the lines that start with a string


The ^ regular expression pattern specifies the start of a line. This can be used in
grep to match the
lines which start with the given string or pattern.

grep "^start" [Link]


20. Matching the lines that end with a string
The $ regular expression pattern specifies the end of a line. This can be used in
grep to match the
lines which end with the given string or pattern.

grep "end$" [Link]


Restricting Rows(WHERE Clause) and Sorting Rows(ORDER BY Clause)


Examples - Oracle

The WHERE clause in Oracle is used to limit the rows in a table. And the ORDER BY
clause is used
to sort the rows that are retrieved by a SELECT statement.
The syntax is

SELECT *|{[DISTINCT] column|expression [alias], ...}

FROM table

[WHERE Condition]

[ORDER BY columns|expressions [ASC|DESC]]

Examples:
Let's use the sales table as an example for all the below oracle problems. The sales table structure is

CREATE TABLE SALES
(

SALE_ID INTEGER,

PRODUCT_ID INTEGER,

YEAR INTEGER,
Quantity INTEGER,

PRICE INTEGER

);

1. Comparison Operators.
The comparison operators are used in conditions that compare one expression to
another value or
expression. The
comparison operators supported in oracle are "Equal To (=)", "Greater Than (>)",
"Greater Than Or
Equal To (>=)" , "Less Than (<)", "Less Than Or Equal To (<=)", "Not Equal To (<>
or !=)".

SELECT *

FROM SALES

WHERE YEAR = 2012;

This query will return only the rows which have the year column data as 2012.
Similarly you can use
other comparison operators in the where condition.
2. Using the AND logical operator.
You can specify more than one condition in the WHERE clause. The AND operator is
used when
you want all the conditions to satisfy.

SELECT *

FROM SALES

WHERE YEAR=2012

AND PRODUCT_ID=10;

This query will return rows when both the conditions (YEAR=2012, PRODUCT_ID=10) are
true.
3. Using the OR logical operator.
The OR operator is used when you want at least one of the specified conditions to
be true.
SELECT *

FROM SALES

WHERE YEAR=2012

OR PRODUCT_ID=10;

This query will return rows when at least one of the conditions (YEAR=2012,
PRODUCT_ID=10) is
true.
4. Using the IN operator.
The IN operator can be used to test for a value with a list of values.

SELECT *

FROM SALES

WHERE YEAR IN (2010,2011,2012);

Here the YEAR column should match with 2010 or 2011 or 2012. This is like
specifying multiple OR
conditions. This can be rewritten using the OR as

SELECT *

FROM SALES

WHERE YEAR = 2010

OR YEAR = 2011

OR YEAR = 2012;

5. Using the BETWEEN AND operator.


The BETWEEN AND operator can be used to match the condition against a range of
values which
fall between the lower limit and upper limit.

SELECT *
FROM SALES

WHERE YEAR BETWEEN 2010 AND 2020;

This will return the rows whose years fall between 2010 and 2020. This query can be
rewritten with
the AND operator as

SELECT *

FROM SALES

WHERE YEAR >= 2010

AND YEAR <= 2020;

6. Using the LIKE Operator.


The LIKE operator is used to perform wildcard searches of string values. The wild
cards allowed are
% denotes zero or many characters. _ denotes one character. The following example
retrieves all
the rows whose year starts with 19.

SELECT *

FROM SALES

WHERE YEAR LIKE '19%';

The below example selects the data whose year starts with 2 and ends with 9.

SELECT *

FROM SALES

WHERE YEAR LIKE '2__9';

Sometimes, there might be cases where you want to look for the % and _ characters
in the strings.
In such cases you have to escape these wild characters.

SELECT *
FROM CUSTOMERS

WHERE EMAIL LIKE 'Chris\_2000@[Link]' ESCAPE '\';

7. Sort the data by YEAR in ascending order

SELECT *

FROM SALES

ORDER BY YEAR ASC;

The ASC keyword specifies to sort the data in ascending order. By default the
sorting is in ascending
order. You can omit the ASC keyword if you want the data to be sorted in ascending
order. The
DESC keyword is used to sort the data in descending order.
8. Sort the data by YEAR in ascending order and then PRICE in descending order.

SELECT *

FROM SALES

ORDER BY YEAR ASC, PRICE DESC;

Oracle Select Clause With Examples

The SELECT statement is used to retrieve the data from the database. The SELECT
statement can
do the following things:

. Projection: You can choose only the required columns in a table that you want to
retrieve.
. Selection: You can restrict the rows returned by a query.
. Joining: You can bring the data from multiple tables by joining them.

Syntax of SELECT statement:


The basic syntax of SELECT statement is

SELECT *|{[DISTINCT] column|expression [alias], ...}


FROM table;

The SELECT statement has many optional clauses. They are:

. WHERE: specifies which rows to read


. GROUP BY: Groups set of rows in a table.
. HAVING: used to restrict groups
. ORDER BY: specifies the order in which the rows to be returned.

Examples:
Let's use the sales table as an example for all the below oracle problems. The sales table structure is

CREATE TABLE SALES
(

SALE_ID INTEGER,

PRODUCT_ID INTEGER,

YEAR INTEGER,

Quantity INTEGER,

PRICE INTEGER

);

1. Selecting all columns of all rows from a table.

SELECT *

FROM SALES;

Here the asterisk (*) indicates all columns.


2. Selecting particular columns from a table.

SELECT PRODUCT_ID,
YEAR

FROM SALES;

Here we have selected only two columns from the sales table. This type of selection
is called
projection.
3. Specifying aliases.

SELECT S.SALE_ID ID,

[Link] P

FROM SALES S

We can specify aliases to the columns and tables. These aliases come in handy to
specify a short
name.
4. Arithmetic Operations.
We can do arithmetic operations like addition (+), subtraction (-), division (/)
and multiplication (*).

SELECT SALE_ID,

PRICE*100

FROM SALES

Here the price is multiplied with 100. Similarly, other arithmetic operations can
be applied on the
columns.
5. Concatenation operator.
The concatenation operator can be used to concatenate columns or strings. The
concatenation
operator is represented by two vertical bars(||).

SELECT SALE_ID||PRODUCT_ID SP,

YEAR||'01' Y

FROM SALES;
In the above example, you can see how we can concatenate two columns, a column
with a string.
You can also concatenate two strings.
6. Eliminating duplicate rows
The DISTINCT keyword can be used to suppress the duplicate rows.

SELECT DISTINCT YEAR

FROM SALES;

This will give only unique years from the sales table.

SQL Interview Questions and Answers

1. What is Normalization?
Normalization is the process of organizing the columns, tables of a database to
minimize the
redundancy of data. Normalization involves in dividing large tables into smaller
tables and defining
relationships between them. Normalization is used in OLTP systems.
2. What are different types of Normalization Levels or Normalization Forms?
The different types of Normalization Forms are:

. First Normal Form: Duplicate columns from the same table needs to be eliminated.
We have to
create separate tables for each group of related data and identify each row with a
unique
column or set of columns (Primary Key)
. Second Normal Form: First it should meet the requirement of first normal form.
Removes the
subsets of data that apply to multiple rows of a table and place them in separate
tables.
Relationships must be created between the new tables and their predecessors through
the use
of foreign keys.
. Third Normal Form: First it should meet the requirements of second normal form.
Remove
columns that are not depending upon the primary key.
. Fourth Normal Form: There should not be any multi-valued dependencies.

Most databases will be in Third Normal Form


3. What is De-normalization?
De-normalization is the process of optimizing the read performance of a database by
adding
redundant data or by grouping data. De-normalization is used in OLAP systems.
4. What is a Transaction?
A transaction is a logical unit of work performed against a database in which all
steps must be
performed or none.
5. What are ACID properties?
A database transaction must be Atomic, Consistent, Isolated and Durable.

. Atomic: Transactions must be atomic. Transactions must fail or succeed as a single unit.
. Consistent: The database must always be in consistent state. There should not be
any partial
transactions
. Isolation: The changes made by a user should be visible only to that user until
the transaction is
committed.
. Durability: Once a transaction is committed, it should be permanent and cannot be
undone.
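A transaction can be illustrated with a transfer between two rows (a hedged sketch; the ACCOUNTS table here is hypothetical and not one of the tables used elsewhere in this document):

UPDATE ACCOUNTS SET BALANCE = BALANCE - 100 WHERE ACCOUNT_ID = 1;

UPDATE ACCOUNTS SET BALANCE = BALANCE + 100 WHERE ACCOUNT_ID = 2;

COMMIT;   -- both updates become permanent together; a ROLLBACK instead would undo both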

6. Explain different storage models of OLAP?

. MOLAP: The data is stored in multi-dimensional cube. The storage is not in the
relational
database, but in proprietary formats.
. ROLAP: ROLAP relies on manipulating the data stored in the RDBMS for slicing and
dicing
functionality.
. HOLAP: HOLAP combines the advantages of both MOLAP and ROLAP. For summary type
information, HOLAP leverages on cube technology for faster performance. For detail
information, HOLAP can drill through the cube.

7. Explain one-to-one relationship with an example?


One to one relationship is a simple reference between two tables. Consider Customer
and Address
tables as an example. A customer can have only one address and an address
references only one
customer.
8. Explain one-to-many relationship with an example?
One-to-many relationships can be implemented by splitting the data into two tables
with a primary
key and foreign key relationship. Here the row in one table is referenced by one or
more rows in the
other table. An example is the Employees and Departments table, where the row in
the Departments
table is referenced by one or more rows in the Employees table.
9. Explain many-to-many relationship with an example?
Many-to-Many relationship is created between two tables by creating a junction
table with the key
from both the tables forming the composite primary key of the junction table.
An example is Students, Subjects and Stud_Sub_junc tables. A student can opt for
one or more
subjects in a year. Similarly a subject can be opted by one or more students. So a
junction table is
created to implement the many-to-many relationship.
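A sketch of the junction table described above (the column names are assumptions for illustration only):

CREATE TABLE STUDENTS (STUDENT_ID INTEGER PRIMARY KEY, STUDENT_NAME VARCHAR2(30));

CREATE TABLE SUBJECTS (SUBJECT_ID INTEGER PRIMARY KEY, SUBJECT_NAME VARCHAR2(30));

CREATE TABLE STUD_SUB_JUNC
(
STUDENT_ID INTEGER REFERENCES STUDENTS(STUDENT_ID),
SUBJECT_ID INTEGER REFERENCES SUBJECTS(SUBJECT_ID),
PRIMARY KEY (STUDENT_ID, SUBJECT_ID)
);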
10. Write down the general syntax of a select statement?
The basic syntax of a select statement is

SELECT Columns | *
FROM Table_Name

[WHERE Search_Condition]

[GROUP BY Group_By_Expression]

[HAVING Search_Condition]

[ORDER BY Order_By_Expression [ASC|DESC]]

Oracle Procedure To Disable All Triggers In A Schema(User)

The below procedure can be used to disable all the triggers in a schema in oracle
database.

CREATE OR REPLACE PROCEDURE DISABLE_TRIGGERS

IS

v_statement VARCHAR2(500);

CURSOR trigger_cur

IS

SELECT trigger_name

FROM user_triggers;

BEGIN

FOR i in trigger_cur

LOOP

v_statement := 'ALTER TRIGGER '||i.trigger_name||' DISABLE';

EXECUTE IMMEDIATE v_statement;

END LOOP;

END;
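Once compiled, the procedure can be executed as shown below (a usage sketch):

EXEC DISABLE_TRIGGERS;

-- or from any PL/SQL block:
BEGIN
  DISABLE_TRIGGERS;
END;
/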
SQL Query Interview Questions - Part 5

Write SQL queries for the below interview questions:

1. Load the below products table into the target table.

CREATE TABLE PRODUCTS
(

PRODUCT_ID INTEGER,

PRODUCT_NAME VARCHAR2(30)

);

INSERT INTO PRODUCTS VALUES ( 100, 'Nokia');

INSERT INTO PRODUCTS VALUES ( 200, 'IPhone');

INSERT INTO PRODUCTS VALUES ( 300, 'Samsung');

INSERT INTO PRODUCTS VALUES ( 400, 'LG');

INSERT INTO PRODUCTS VALUES ( 500, 'BlackBerry');

INSERT INTO PRODUCTS VALUES ( 600, 'Motorola');

COMMIT;

SELECT * FROM PRODUCTS;

PRODUCT_ID PRODUCT_NAME

-----------------------

100 Nokia

200 IPhone

300 Samsung
400 LG

500 BlackBerry

600 Motorola

The requirements for loading the target table are:

. Select only 2 products randomly.
. Do not select the products which are already loaded in the target table within the last 30 days.
. The target table should always contain only the products loaded within the last 30 days. It should not contain products which were loaded more than 30 days ago.

Solution:

First we will create a target table. The target table will have an additional
column INSERT_DATE to
know when a product is loaded into the target table. The target
table structure is

CREATE TABLE TGT_PRODUCTS
(

PRODUCT_ID INTEGER,

PRODUCT_NAME VARCHAR2(30),

INSERT_DATE DATE

);

The next step is to pick 2 products randomly and then load them into the target table. While selecting, check whether the products are already present in the target table.

INSERT INTO TGT_PRODUCTS

SELECT PRODUCT_ID,

PRODUCT_NAME,

SYSDATE INSERT_DATE

FROM

(
SELECT PRODUCT_ID,

PRODUCT_NAME

FROM PRODUCTS S

WHERE NOT EXISTS (

SELECT 1

FROM TGT_PRODUCTS T

WHERE T.PRODUCT_ID = S.PRODUCT_ID )

ORDER BY DBMS_RANDOM.VALUE --Random number generator in oracle.

)A

WHERE ROWNUM <= 2;

The last step is to delete the products from the table which are loaded 30 days
back.

DELETE FROM TGT_PRODUCTS

WHERE INSERT_DATE < SYSDATE - 30;

2. Load the below CONTENTS table into the target table.

CREATE TABLE CONTENTS
(

CONTENT_ID INTEGER,

CONTENT_TYPE VARCHAR2(30)

);

INSERT INTO CONTENTS VALUES (1,'MOVIE');


INSERT INTO CONTENTS VALUES (2,'MOVIE');

INSERT INTO CONTENTS VALUES (3,'AUDIO');

INSERT INTO CONTENTS VALUES (4,'AUDIO');

INSERT INTO CONTENTS VALUES (5,'MAGAZINE');

INSERT INTO CONTENTS VALUES (6,'MAGAZINE');

COMMIT;

SELECT * FROM CONTENTS;

CONTENT_ID CONTENT_TYPE

-----------------------

1 MOVIE

2 MOVIE

3 AUDIO

4 AUDIO

5 MAGAZINE

6 MAGAZINE

The requirements to load the target table are:

. Load only one content type at a time into the target table.
. The target table should always contain only one content type.
. The loading of content types should follow a round-robin style: first MOVIE, second AUDIO, third MAGAZINE, and then MOVIE again.

Solution:

First we will create a lookup table where we mention the priorities for the content types. The lookup table create statement and data are shown below.
CREATE TABLE CONTENTS_LKP
(

CONTENT_TYPE VARCHAR2(30),

PRIORITY INTEGER,

LOAD_FLAG INTEGER

);

INSERT INTO CONTENTS_LKP VALUES('MOVIE',1,1);

INSERT INTO CONTENTS_LKP VALUES('AUDIO',2,0);

INSERT INTO CONTENTS_LKP VALUES('MAGAZINE',3,0);

COMMIT;

SELECT * FROM CONTENTS_LKP;

CONTENT_TYPE PRIORITY LOAD_FLAG

---------------------------------

MOVIE 1 1

AUDIO 2 0

MAGAZINE 3 0

Here if LOAD_FLAG is 1, then it indicates which content type needs to be loaded into the target table. Only one content type will have LOAD_FLAG as 1. The other content types will have LOAD_FLAG as 0. The target table structure is the same as the source table structure.

The second step is to truncate the target table before loading the data

TRUNCATE TABLE TGT_CONTENTS;


The third step is to choose the appropriate content type from the lookup table to
load the source data
into the target table.

INSERT INTO TGT_CONTENTS

SELECT CONTENT_ID,

CONTENT_TYPE

FROM CONTENTS

WHERE CONTENT_TYPE = (SELECT CONTENT_TYPE FROM CONTENTS_LKP WHERE


LOAD_FLAG=1);

The last step is to update the LOAD_FLAG of the Lookup table.

UPDATE CONTENTS_LKP

SET LOAD_FLAG = 0

WHERE LOAD_FLAG = 1;

UPDATE CONTENTS_LKP

SET LOAD_FLAG = 1

WHERE PRIORITY = (

SELECT DECODE( PRIORITY,(SELECT MAX(PRIORITY) FROM CONTENTS_LKP) ,1 ,


PRIORITY+1)

FROM CONTENTS_LKP

WHERE CONTENT_TYPE = (SELECT DISTINCT CONTENT_TYPE FROM TGT_CONTENTS)

);

How to Estimate the Table Size and Index Size in Oracle Database

Knowing the table size and index size of a table is always worthwhile. It can be helpful when you want to load the data of a table from one database into another database; you can create the required space in the new database ahead of time.
Estimated Table Size:
The SQL query to know the estimated table size in Oracle is

SELECT (row_size_in_bytes * cnt_of_rows)/1000/1000/1000 table_size_in_GB

FROM
(

SELECT table_name ,

(sum (data_length) / 1048576) * 1000000 row_size_in_bytes

FROM user_tab_columns

WHERE table_name=UPPER('&Enter_Table_Name')

GROUP BY table_name

) A,

(SELECT count(1) cnt_of_rows FROM &Enter_Table_Name );

Estimated Indexes Size:


The below SQL query can be used to know the estimated index size occupied by a
table in oracle.

SELECT (row_size_in_bytes * cnt_of_rows)/1000/1000/1000 index_size_in_GB

FROM
(

SELECT table_name ,
(sum (column_length) / 1048576) * 1000000 row_size_in_bytes

FROM user_ind_columns

WHERE table_name=UPPER('&Enter_Table_Name')

GROUP BY table_name

) A,

(SELECT count(1) cnt_of_rows FROM &Enter_Table_Name );

SQL Queries Interview Questions - Oracle Analytical Functions Part 1

Analytic functions compute aggregate values based on a group of rows. They differ from aggregate functions in that they return multiple rows for each group. Many SQL developers don't use analytical functions because of their cryptic syntax or uncertainty about their logic of operation. Analytical functions save a lot of time in writing queries and give better performance when compared to native SQL.
Before starting with the interview questions, we will see the difference between aggregate functions and analytic functions with an example. I have used the SALES table as an example to solve the interview questions. Please create the below sales table in your oracle database.

CREATE TABLE SALES
(

SALE_ID INTEGER,

PRODUCT_ID INTEGER,

YEAR INTEGER,

Quantity INTEGER,

PRICE INTEGER

);
INSERT INTO SALES VALUES ( 1, 100, 2008, 10, 5000);

INSERT INTO SALES VALUES ( 2, 100, 2009, 12, 5000);

INSERT INTO SALES VALUES ( 3, 100, 2010, 25, 5000);

INSERT INTO SALES VALUES ( 4, 100, 2011, 16, 5000);

INSERT INTO SALES VALUES ( 5, 100, 2012, 8, 5000);

INSERT INTO SALES VALUES ( 6, 200, 2010, 10, 9000);

INSERT INTO SALES VALUES ( 7, 200, 2011, 15, 9000);

INSERT INTO SALES VALUES ( 8, 200, 2012, 20, 9000);

INSERT INTO SALES VALUES ( 9, 200, 2008, 13, 9000);

INSERT INTO SALES VALUES ( 10,200, 2009, 14, 9000);

INSERT INTO SALES VALUES ( 11, 300, 2010, 20, 7000);

INSERT INTO SALES VALUES ( 12, 300, 2011, 18, 7000);

INSERT INTO SALES VALUES ( 13, 300, 2012, 20, 7000);

INSERT INTO SALES VALUES ( 14, 300, 2008, 17, 7000);

INSERT INTO SALES VALUES ( 15, 300, 2009, 19, 7000);

COMMIT;

SELECT * FROM SALES;

SALE_ID PRODUCT_ID YEAR QUANTITY PRICE

--------------------------------------

1 100 2008 10 5000


2 100 2009 12 5000

3 100 2010 25 5000

4 100 2011 16 5000

5 100 2012 8 5000

6 200 2010 10 9000

7 200 2011 15 9000

8 200 2012 20 9000

9 200 2008 13 9000

10 200 2009 14 9000

11 300 2010 20 7000

12 300 2011 18 7000

13 300 2012 20 7000

14 300 2008 17 7000

15 300 2009 19 7000

Difference Between Aggregate and Analytic Functions:


Q. Write a query to find the number of products sold in each year?
The SQL query Using Aggregate functions is

SELECT Year,

COUNT(1) CNT

FROM SALES

GROUP BY YEAR;
YEAR CNT

---------

2009 3

2010 3

2011 3

2008 3

2012 3

The SQL query using analytic functions is

SELECT SALE_ID,

PRODUCT_ID,

Year,

QUANTITY,

PRICE,

COUNT(1) OVER (PARTITION BY YEAR) CNT

FROM SALES;

SALE_ID PRODUCT_ID YEAR QUANTITY PRICE CNT

------------------------------------------

9 200 2008 13 9000 3

1 100 2008 10 5000 3

14 300 2008 17 7000 3

15 300 2009 19 7000 3

2 100 2009 12 5000 3


10 200 2009 14 9000 3

11 300 2010 20 7000 3

6 200 2010 10 9000 3

3 100 2010 25 5000 3

12 300 2011 18 7000 3

4 100 2011 16 5000 3

7 200 2011 15 9000 3

13 300 2012 20 7000 3

5 100 2012 8 5000 3

8 200 2012 20 9000 3

From the outputs, you can observe that the aggregate functions return only one row per group whereas the analytic functions keep all the rows in the group. With aggregate functions, the select clause can contain only the columns specified in the group by clause and aggregate functions, whereas with analytic functions you can specify all the columns in the table.
The PARTITION BY clause is similar to the GROUP BY clause; it specifies the window of rows that the analytic function should operate on.
I hope you got some basic idea about aggregate and analytic functions. Now let's start with solving the interview questions on Oracle analytic functions.
1. Write a SQL query using the analytic function to find the total sales(QUANTITY)
of each product?
Solution:
SUM analytic function can be used to find the total sales. The SQL query is

SELECT PRODUCT_ID,

QUANTITY,

SUM(QUANTITY) OVER( PARTITION BY PRODUCT_ID ) TOT_SALES

FROM SALES;
PRODUCT_ID QUANTITY TOT_SALES

-----------------------------

100 12 71

100 10 71

100 25 71

100 16 71

100 8 71

200 15 72

200 10 72

200 20 72

200 14 72

200 13 72

300 20 94

300 18 94

300 17 94

300 20 94

300 19 94

2. Write a SQL query to find the cumulative sum of sales (QUANTITY) of each product? Here first sort the QUANTITY in ascending order for each product and then accumulate the QUANTITY.
Cumulative sum of QUANTITY for a product = QUANTITY of the current row + sum of the QUANTITIES of all previous rows for that product.
Solution:
We have to use the option "ROWS UNBOUNDED PRECEDING" in the SUM analytic function to get the cumulative sum. The SQL query to get the output is
SELECT PRODUCT_ID,

QUANTITY,

SUM(QUANTITY) OVER( PARTITION BY PRODUCT_ID

ORDER BY QUANTITY ASC

ROWS UNBOUNDED PRECEDING) CUM_SALES

FROM SALES;

PRODUCT_ID QUANTITY CUM_SALES

-----------------------------

100 8 8

100 10 18

100 12 30

100 16 46

100 25 71

200 10 10

200 13 23

200 14 37

200 15 52

200 20 72

300 17 17

300 18 35

300 19 54

300 20 74

300 20 94
The ORDER BY clause is used to sort the data. Here the ROWS UNBOUNDED PRECEDING option specifies that the SUM analytic function should operate on the current row and the previous rows already processed.
3. Write a SQL query to find the sum of sales of current row and previous 2 rows in
a product group?
Sort the data on sales and then find the sum.
Solution:
The SQL query for the required output is

SELECT PRODUCT_ID,

QUANTITY,

SUM(QUANTITY) OVER(

PARTITION BY PRODUCT_ID

ORDER BY QUANTITY DESC

ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) CALC_SALES

FROM SALES;

PRODUCT_ID QUANTITY CALC_SALES

------------------------------

100 25 25

100 16 41

100 12 53

100 10 38

100 8 30

200 20 20
200 15 35

200 14 49

200 13 42

200 10 37

300 20 20

300 20 40

300 19 59

300 18 57

300 17 54

The ROWS BETWEEN clause specifies the range of rows to consider for calculating the
SUM.
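The window can also extend on both sides of the current row. For instance, the variation below (a sketch added for illustration) sums the current row together with one row before and one row after it within each product:

SELECT PRODUCT_ID,
       QUANTITY,
       SUM(QUANTITY) OVER( PARTITION BY PRODUCT_ID
                           ORDER BY QUANTITY DESC
                           ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) NEIGHBOUR_SUM
FROM SALES;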
4. Write a SQL query to find the Median of sales of a product?
Solution:
The SQL query for calculating the median is

SELECT PRODUCT_ID,

QUANTITY,

PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY QUANTITY ASC)

OVER (PARTITION BY PRODUCT_ID) MEDIAN

FROM SALES;

PRODUCT_ID QUANTITY MEDIAN

--------------------------

100 8 12

100 10 12
100 12 12

100 16 12

100 25 12

200 10 14

200 13 14

200 14 14

200 15 14

200 20 14

300 17 19

300 18 19

300 19 19

300 20 19

300 20 19

5. Write a SQL query to find the minimum sales of a product without using the group
by clause.
Solution:
The SQL query is

SELECT PRODUCT_ID,

YEAR,

QUANTITY

FROM
(

SELECT PRODUCT_ID,

YEAR,
QUANTITY,

ROW_NUMBER() OVER(PARTITION BY PRODUCT_ID

ORDER BY QUANTITY ASC) MIN_SALE_RANK

FROM SALES

) WHERE MIN_SALE_RANK = 1;

PRODUCT_ID YEAR QUANTITY

------------------------

100 2012 8

200 2010 10

300 2008 17
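Note that ROW_NUMBER returns exactly one row per product even when two years share the minimum quantity. If you want all tied rows, a RANK based variation (my own sketch, not the original solution) can be used instead:

SELECT PRODUCT_ID,
       YEAR,
       QUANTITY
FROM
(
SELECT PRODUCT_ID,
       YEAR,
       QUANTITY,
       RANK() OVER(PARTITION BY PRODUCT_ID
                   ORDER BY QUANTITY ASC) MIN_SALE_RANK
FROM SALES
) WHERE MIN_SALE_RANK = 1;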

SQL Queries Interview Questions - Oracle Part 2

This is a continuation of my previous post, SQL Queries Interview Questions - Oracle Part 1, where I used the PRODUCTS and SALES tables as an example. Here also I am using the same tables. So, just take a look at the tables by going through that link and it will be easy for you to understand the questions mentioned here.
Solve the below examples by writing SQL queries.
1. Write a query to find the products whose quantity sold in a year is greater than the average quantity of that product sold across all the years.
Solution:
This can be solved with the help of a correlated subquery. The SQL query for this is

SELECT P.PRODUCT_NAME,

S.YEAR,
S.QUANTITY

FROM PRODUCTS P,

SALES S

WHERE P.PRODUCT_ID = S.PRODUCT_ID

AND S.QUANTITY >

(SELECT AVG(QUANTITY)

FROM SALES S1

WHERE S1.PRODUCT_ID = S.PRODUCT_ID

);

PRODUCT_NAME YEAR QUANTITY

--------------------------

Nokia 2010 25

IPhone 2012 20

Samsung 2012 20

Samsung 2010 20
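The same result can also be obtained with the AVG analytic function instead of a correlated subquery; the following is a sketch of that alternative, not the original solution:

SELECT PRODUCT_NAME,
       YEAR,
       QUANTITY
FROM
(
SELECT P.PRODUCT_NAME,
       S.YEAR,
       S.QUANTITY,
       AVG(S.QUANTITY) OVER( PARTITION BY S.PRODUCT_ID ) AVG_QUANT
FROM PRODUCTS P,
     SALES S
WHERE P.PRODUCT_ID = S.PRODUCT_ID
) WHERE QUANTITY > AVG_QUANT;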

2. Write a query to compare the sales of "IPhone" and "Samsung" in each year. The output should look like this:

YEAR IPHONE_QUANT SAM_QUANT IPHONE_PRICE SAM_PRICE

---------------------------------------------------

2010 10 20 9000 7000

2011 15 18 9000 7000

2012 20 20 9000 7000


Solution:
By using a self-join query we can get the required result. The SQL query is

SELECT S_I.YEAR,

S_I.QUANTITY IPHONE_QUANT,

S_S.QUANTITY SAM_QUANT,

S_I.PRICE IPHONE_PRICE,

S_S.PRICE SAM_PRICE

FROM PRODUCTS P_I,

SALES S_I,

PRODUCTS P_S,

SALES S_S

WHERE P_I.PRODUCT_ID = S_I.PRODUCT_ID

AND P_S.PRODUCT_ID = S_S.PRODUCT_ID

AND P_I.PRODUCT_NAME = 'IPhone'

AND P_S.PRODUCT_NAME = 'Samsung'

AND S_I.YEAR = S_S.YEAR;

3. Write a query to find the ratios of the sales of a product?


Solution:
The ratio of a product is calculated as the total sales amount in a particular year divided by the total sales amount across all years. Oracle provides the RATIO_TO_REPORT analytic function for finding such ratios. The SQL query is

SELECT P.PRODUCT_NAME,

S.YEAR,
RATIO_TO_REPORT(S.QUANTITY*S.PRICE)

OVER(PARTITION BY P.PRODUCT_NAME ) SALES_RATIO

FROM PRODUCTS P,

SALES S

WHERE (P.PRODUCT_ID = S.PRODUCT_ID);

PRODUCT_NAME YEAR RATIO

-----------------------------

IPhone 2011 0.333333333

IPhone 2012 0.444444444

IPhone 2010 0.222222222

Nokia 2012 0.163265306

Nokia 2011 0.326530612

Nokia 2010 0.510204082

Samsung 2010 0.344827586

Samsung 2012 0.344827586

Samsung 2011 0.310344828
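If RATIO_TO_REPORT is not available, the same ratio can be computed with the SUM analytic function; the query below is a sketch of that equivalent form:

SELECT P.PRODUCT_NAME,
       S.YEAR,
       (S.QUANTITY*S.PRICE) /
       SUM(S.QUANTITY*S.PRICE) OVER( PARTITION BY P.PRODUCT_NAME ) SALES_RATIO
FROM PRODUCTS P,
     SALES S
WHERE P.PRODUCT_ID = S.PRODUCT_ID;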

4. In the SALES table, the quantity of each product is stored in rows for every year. Now write a query to transpose the quantity for each product and display it in columns. The output should look like this:

PRODUCT_NAME QUAN_2010 QUAN_2011 QUAN_2012

------------------------------------------

IPhone 10 15 20

Samsung 20 18 20
Nokia 25 16 8

Solution:
Oracle 11g provides the PIVOT operator to transpose row data into column data. The SQL query for this is

SELECT * FROM

(

SELECT P.PRODUCT_NAME,

S.YEAR,

S.QUANTITY

FROM PRODUCTS P,

SALES S

WHERE (P.PRODUCT_ID = S.PRODUCT_ID)

)A

PIVOT ( MAX(QUANTITY) AS QUAN FOR (YEAR) IN (2010,2011,2012));

If you are not running an Oracle 11g database, then use the below query for transposing the row data into column data.

SELECT P.PRODUCT_NAME,

MAX(DECODE(S.YEAR,2010, S.QUANTITY)) QUAN_2010,

MAX(DECODE(S.YEAR,2011, S.QUANTITY)) QUAN_2011,

MAX(DECODE(S.YEAR,2012, S.QUANTITY)) QUAN_2012

FROM PRODUCTS P,

SALES S

WHERE (P.PRODUCT_ID = S.PRODUCT_ID)


GROUP BY P.PRODUCT_NAME;

5. Write a query to find the number of products sold in each year?


Solution:
To get this result we have to group the data by year and then find the count. The SQL query for this question is

SELECT YEAR,

COUNT(1) NUM_PRODUCTS

FROM SALES

GROUP BY YEAR;

YEAR NUM_PRODUCTS

------------------

2010 3

2011 3

2012 3

SQL Queries Interview Questions - Oracle Part 1

As a database developer, writing SQL queries and PL/SQL code is part of daily life. Having good knowledge of SQL is really important. Here I am posting some practical examples of SQL queries.
To solve these interview questions you have to create the PRODUCTS and SALES tables in your Oracle database. The "Create Table" and "Insert" statements are provided below.

CREATE TABLE PRODUCTS

(
PRODUCT_ID INTEGER,

PRODUCT_NAME VARCHAR2(30)

);

CREATE TABLE SALES

(

SALE_ID INTEGER,

PRODUCT_ID INTEGER,

YEAR INTEGER,

Quantity INTEGER,

PRICE INTEGER

);

INSERT INTO PRODUCTS VALUES ( 100, 'Nokia');

INSERT INTO PRODUCTS VALUES ( 200, 'IPhone');

INSERT INTO PRODUCTS VALUES ( 300, 'Samsung');

INSERT INTO PRODUCTS VALUES ( 400, 'LG');

INSERT INTO SALES VALUES ( 1, 100, 2010, 25, 5000);

INSERT INTO SALES VALUES ( 2, 100, 2011, 16, 5000);

INSERT INTO SALES VALUES ( 3, 100, 2012, 8, 5000);

INSERT INTO SALES VALUES ( 4, 200, 2010, 10, 9000);

INSERT INTO SALES VALUES ( 5, 200, 2011, 15, 9000);

INSERT INTO SALES VALUES ( 6, 200, 2012, 20, 9000);

INSERT INTO SALES VALUES ( 7, 300, 2010, 20, 7000);


INSERT INTO SALES VALUES ( 8, 300, 2011, 18, 7000);

INSERT INTO SALES VALUES ( 9, 300, 2012, 20, 7000);

COMMIT;

The products table contains the below data.

SELECT * FROM PRODUCTS;

PRODUCT_ID PRODUCT_NAME

-----------------------

100 Nokia

200 IPhone

300 Samsung

400 LG

The sales table contains the following data.

SELECT * FROM SALES;

SALE_ID PRODUCT_ID YEAR QUANTITY PRICE

--------------------------------------

1 100 2010 25 5000

2 100 2011 16 5000

3 100 2012 8 5000

4 200 2010 10 9000

5 200 2011 15 9000

6 200 2012 20 9000


7 300 2010 20 7000

8 300 2011 18 7000

9 300 2012 20 7000

Here Quantity is the number of products sold in each year. Price is the sale price of each product.
I hope you have created the tables in your Oracle database. Now try to solve the below SQL queries.
1. Write a SQL query to find the products which have a continuous increase in sales every year.
Solution:
Here "IPhone" is the only product whose sales are increasing every year.
STEP 1: First we will get the previous year's sales for each product. The SQL query to do this is

SELECT P.PRODUCT_NAME,

S.YEAR,

S.QUANTITY,

LEAD(S.QUANTITY,1,0) OVER (

PARTITION BY P.PRODUCT_ID

ORDER BY S.YEAR DESC

) QUAN_PREV_YEAR

FROM PRODUCTS P,

SALES S

WHERE P.PRODUCT_ID = S.PRODUCT_ID;

PRODUCT_NAME YEAR QUANTITY QUAN_PREV_YEAR

-----------------------------------------
Nokia 2012 8 16

Nokia 2011 16 25

Nokia 2010 25 0

IPhone 2012 20 15

IPhone 2011 15 10

IPhone 2010 10 0

Samsung 2012 20 18

Samsung 2011 18 20

Samsung 2010 20 0

Here the LEAD analytic function will get the quantity of a product in its previous year.
STEP 2: We will find the difference between the quantity of a product and its previous year's quantity. If this difference is greater than or equal to zero for all the rows, then the product's sales are constantly increasing. The final query to get the required result is

SELECT PRODUCT_NAME

FROM

(

SELECT P.PRODUCT_NAME,

S.QUANTITY -

LEAD(S.QUANTITY,1,0) OVER (

PARTITION BY P.PRODUCT_ID

ORDER BY S.YEAR DESC

) QUAN_DIFF

FROM PRODUCTS P,

SALES S
WHERE P.PRODUCT_ID = S.PRODUCT_ID

)A

GROUP BY PRODUCT_NAME

HAVING MIN(QUAN_DIFF) >= 0;

PRODUCT_NAME

------------

IPhone

2. Write a SQL query to find the products which do not have any sales at all.
Solution:
"LG" is the only product which does not have any sales. This can be achieved in three ways.
Method1: Using left outer join.

SELECT P.PRODUCT_NAME

FROM PRODUCTS P

LEFT OUTER JOIN

SALES S

ON (P.PRODUCT_ID = S.PRODUCT_ID)

WHERE S.PRODUCT_ID IS NULL;

PRODUCT_NAME

------------

LG
Method2: Using the NOT IN operator.

SELECT P.PRODUCT_NAME

FROM PRODUCTS P

WHERE P.PRODUCT_ID NOT IN

(SELECT DISTINCT PRODUCT_ID FROM SALES);

PRODUCT_NAME

------------

LG

Method3: Using the NOT EXISTS operator.

SELECT P.PRODUCT_NAME

FROM PRODUCTS P

WHERE NOT EXISTS

(SELECT 1 FROM SALES S WHERE S.PRODUCT_ID = P.PRODUCT_ID);

PRODUCT_NAME

------------

LG

3. Write a SQL query to find the products whose sales decreased in 2012 compared to
2011?
Solution:
Here Nokia is the only product whose sales decreased in year 2012 when compared
with the sales
in the year 2011. The SQL query to get the required output is
SELECT P.PRODUCT_NAME

FROM PRODUCTS P,

SALES S_2012,

SALES S_2011

WHERE P.PRODUCT_ID = S_2012.PRODUCT_ID

AND S_2012.YEAR = 2012

AND S_2011.YEAR = 2011

AND S_2012.PRODUCT_ID = S_2011.PRODUCT_ID

AND S_2012.QUANTITY < S_2011.QUANTITY;

PRODUCT_NAME

------------

Nokia

4. Write a query to select the top product sold in each year?


Solution:
Nokia is the top product sold in the year 2010. Similarly, Samsung in 2011 and
IPhone, Samsung in
2012. The query for this is

SELECT PRODUCT_NAME,

YEAR

FROM

(

SELECT P.PRODUCT_NAME,

S.YEAR,

RANK() OVER (
PARTITION BY S.YEAR

ORDER BY S.QUANTITY DESC

) RNK

FROM PRODUCTS P,

SALES S

WHERE P.PRODUCT_ID = S.PRODUCT_ID

) A

WHERE RNK = 1;

PRODUCT_NAME YEAR

--------------------

Nokia 2010

Samsung 2011

IPhone 2012

Samsung 2012

5. Write a query to find the total sales of each product?


Solution:
This is a simple query. You just need to group the data by PRODUCT_NAME and then find the sum of sales.

SELECT P.PRODUCT_NAME,

NVL( SUM( S.QUANTITY*S.PRICE ), 0) TOTAL_SALES

FROM PRODUCTS P

LEFT OUTER JOIN

SALES S
ON (P.PRODUCT_ID = S.PRODUCT_ID)

GROUP BY P.PRODUCT_NAME;

PRODUCT_NAME TOTAL_SALES

---------------------------

LG 0

IPhone 405000

Samsung 406000

Nokia 245000

Source Qualifier Transformation Examples - Informatica

Here I am providing some basic examples of using the source qualifier transformation in your mappings.
This is a continuation of my previous posts, Learn Source Qualifier Transformation and Quiz on Source Qualifier Transformation.
To solve these examples, create the EMPLOYEES and DEPARTMENTS tables in your database. The "create table" statements are provided below.

create table DEPARTMENTS

(

DEPARTMENT_ID NUMBER(4) not null,

DEPARTMENT_NAME VARCHAR2(15) not null,

MANAGER_ID NUMBER(6)

);

alter table DEPARTMENTS add primary key (DEPARTMENT_ID);


create table EMPLOYEES

(

EMPLOYEE_ID NUMBER(6) not null,

NAME VARCHAR2(10),

LAST_NAME VARCHAR2(10),

SALARY NUMBER(10,2),

MANAGER_ID NUMBER(6),

DEPARTMENT_ID NUMBER(4)

);

alter table EMPLOYEES add primary key (EMPLOYEE_ID);

alter table EMPLOYEES add foreign key (DEPARTMENT_ID) references DEPARTMENTS (DEPARTMENT_ID);

Table Name: Employees

EMPLOYEE_ID NAME SALARY MANAGER_ID DEPARTMENT_ID

-------------------------------------------------

101 PAT 1000 201 10

102 KEVIN 2000 201 10

201 MIKE 5000 NULL 10

301 JOHN 7000 NULL NULL

Table Name: Departments

DEPARTMENT_ID DEPARTMENT_NAME MANAGER_ID


-----------------------------------------

10 Account 201

20 HR 501

Try Solving the below examples:


1. Create a mapping to join employees and departments table on " DEPARTMENT_ID "
column
using source qualifier transformation?
Solution:

1. Source qualifier transformation can be used to join sources only from the same
database.
2. Connect the source definitions of departments and employees to the same
qualifier
transformation.
3. As there is a primary-key, foreign-key relationship between the source tables,
the source
qualifier transformation by default joins the two sources on the DEPARTMENT_ID
column.

2. Create a mapping to join employees and departments table on "MANAGER_ID" column


using
source qualifier transformation?
Solution:

1. Connect the source definitions of departments and employees to the same


qualifier
transformation.
2. Go to the properties tab of source qualifier ->User Defined Join and then open
the editor. Enter
the join condition as DEPARTMENTS.MANAGER_ID = EMPLOYEES.MANAGER_ID. Click Ok.
3. Now connect the required ports from the source qualifier transformation to the
target.

3. Create a mapping to get only the employees who have a manager?


Solution:
This is very simple. Go to the properties tab of source qualifier-> Source Filter.
Open the editor and
enter EMPLOYEES.MANAGER_ID IS NOT NULL
4. Create a mapping to sort the data of employees table on DEPARTMENT_ID, SALARY?
Solution:
Make sure the order of the ports in the source qualifier transformation is as shown below

DEPARTMENT_ID

SALARY
EMPLOYEE_ID

NAME

LAST_NAME

MANAGER_ID

The first two ports should be DEPARTMENT_ID, SALARY and the rest of the ports can
be in any
order.
Now go to the properties tab of source qualifier-> Number Of Sorted Ports. Make the
Number Of
Sorted Ports value as 2.
5. Create a mapping to get only distinct departments in employees table?
Solution:

1. The source qualifier transformation should only contain the DEPARTMENT_ID port
from
EMPLOYEES source definition.
2. Now go to the properties tab of source qualifier-> Select Distinct. Check the
check box of Select
Distinct option.

If you are interested to solve complex problems on mappings, just go through


Examples of
Informatica Mappings.

Find Command in Unix and Linux Examples

Find is one of the most powerful utilities of Unix (or Linux), used for searching files in a directory hierarchy. The syntax of the find command is

find [pathnames] [conditions]

Let us see some practical exercises on using the find command.


1. How to run the last executed find command?

!find

This will execute the last find command. It also displays the last find command
executed along with
the result on the terminal.
2. How to find for a file using name?

find -name "[Link]"

./bkp/[Link]

./[Link]

This will find all the files with name "[Link]" in the current directory and sub-
directories.
3. How to find for files using name and ignoring case?

find -iname "[Link]"

./[Link]

./bkp/[Link]

./[Link]

This will find all the files with name "[Link]" while ignoring the case in the
current directory and
sub-directories.
4. How to find for a file in the current directory only?

find -maxdepth 1 -name "[Link]"

./[Link]

This will find for the file "[Link]" in the current directory only
5. How to find for files containing a specific word in its name?

find -name "*java*"

./[Link]

./bkp/[Link]
./[Link]

./[Link]

It displayed all the files which have the word "java" in the filename
6. How to find for files in a specific directory?

find /etc -name "*java*"

This will look for the files in the /etc directory with "java" in the filename
7. How to find the files whose names are not "[Link]"?

find -not -name "[Link]"

./[Link]

./bkp

./[Link]

This is like inverting the match. It prints all the files except the given file
"[Link]".
8. How to limit the file searches to specific directories?

find -name "[Link]"

./tmp/[Link]

./bkp/var/tmp/files/[Link]

./bkp/var/tmp/[Link]

./bkp/var/[Link]

./bkp/[Link]
./[Link]

You can see here the find command displayed all the files with name "[Link]" in
the current
directory and sub-directories.
a. How to print the files in the current directory and one level down to the
current directory?

find -maxdepth 2 -name "[Link]"

./tmp/[Link]

./bkp/[Link]

./[Link]

b. How to print the files in the current directory and two levels down to the
current directory?

find -maxdepth 3 -name "[Link]"

./tmp/[Link]

./bkp/var/[Link]

./bkp/[Link]

./[Link]

c. How to print the files in the subdirectories between level 1 and 4?

find -mindepth 2 -maxdepth 5 -name "[Link]"

./tmp/[Link]

./bkp/var/tmp/files/[Link]

./bkp/var/tmp/[Link]

./bkp/var/[Link]

./bkp/[Link]
9. How to find the empty files in a directory?

find . -maxdepth 1 -empty

./empty_file

10. How to find the largest file in the current directory and sub directories

find . -type f -exec ls -s {} \; | sort -n -r | head -1

The find command "find . -type f -exec ls -s {} \;" will list all the files along
with the size of the file.
Then the sort command will sort the files based on the size. The head command will
pick only the
first line from the output of sort.
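If you are on GNU find, an alternative (a sketch, not portable to every platform) is to let find print the size directly with -printf and sort on that value:

find . -type f -printf "%s %p\n" | sort -n | tail -1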
11. How to find the smallest file in the current directory and sub directories

find . -type f -exec ls -s {} \; | sort -n -r | tail -1

Another method using find is

find . -type f -exec ls -s {} \; | sort -n | head -1

12. How to find files based on the file type?


a. Finding socket files

find . -type s

b. Finding directories

find . -type d

c. Finding hidden directories


find -type d -name ".*"

d. Finding regular files

find . -type f

e. Finding hidden files

find . -type f -name ".*"

13. How to find files based on the size?


a. Finding files whose size is exactly 10M

find . -size 10M

b. Finding files larger than 10M size

find . -size +10M

c. Finding files smaller than 10M size

find . -size -10M

14. How to find the files which are modified after the modification of a given file?

find -newer "[Link]"

This will display all the files which are modified after the file "[Link]".
15. Display the files which are accessed after the modification of a given file.
find -anewer "[Link]"

16. Display the files which are changed after the modification of a given file.

find -cnewer "[Link]"

17. How to find the files based on the file permissions?

find . -perm 777

This will display the files which have read, write, and execute permissions. To
know the permissions
of files and directories use the command "ls -l".
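You can also match individual permission bits rather than the exact mode. For example, the command below (a standard find usage) lists the files that are writable by others:

find . -perm -002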
18. Find the files which are modified within 30 minutes.

find . -mmin -30

19. Find the files which are modified within 1 day.

find . -mtime -1

20. How to find the files which were modified more than 30 minutes ago

find . -not -mmin -30

21. How to find the files which were modified more than 1 day ago.

find . -not -mtime -1

22. Print the files which are accessed within 1 hour.


find . -amin -60

23. Print the files which are accessed within 1 day.

find . -atime -1

24. Display the files which are changed within 2 hours.

find . -cmin -120

25. Display the files which are changed within 2 days.

find . -ctime -2

26. How to find the files which were changed between the change times of two files.

find . -cnewer f1 -and ! -cnewer f2

So far we have just found the files and displayed them on the terminal. Now we will see how to perform some operations on the found files.
1. How to find the permissions of the files which contain the name "java"?

find -name "*java*"|xargs ls -l

Alternate method is

find -name "*java*" -exec ls -l {} \;

2. Find the files which have "java" in their name and then display only those files which contain the word "class"?
find -name "*java*" -exec grep -H class {} \;

3. How to remove files which contain the name "java".

find -name "*java*" -exec rm -r {} \;

This will delete all the files which have the word "java" in the file name in the current directory and sub-directories.
Similarly, you can apply other Unix commands on the files found using the find command. I will add more examples as and when I find them.

Aggregator Transformation in Informatica

Aggregator transformation is an active transformation used to perform calculations such as sums, averages and counts on groups of data. The integration service stores the group data and row data in the aggregate cache. The Aggregator transformation provides more advantages than plain SQL; for example, you can use conditional clauses to filter rows.
Creating an Aggregator Transformation:
Follow the below steps to create an aggregator transformation

. Go to the Mapping Designer, click on transformation in the toolbar -> create.


. Select the Aggregator transformation, enter the name and click create. Then click
Done. This
will create an aggregator transformation without ports.
. To create ports, you can either drag the ports to the aggregator transformation
or create in the
ports tab of the aggregator.

Configuring the aggregator transformation:


You can configure the following components in aggregator transformation

. Aggregate Cache: The integration service stores the group values in the index
cache and row
data in the data cache.
. Aggregate Expression: You can enter expressions in the output port or variable
port.
. Group by Port: This tells the integration service how to create groups. You can
configure input,
input/output or variable ports for the group.
. Sorted Input: This option can be used to improve the session performance. You can use this option only when the input to the aggregator transformation is sorted on the group by ports.

Properties of Aggregator Transformation:


The below table illustrates the properties of aggregator transformation

Property - Description

Cache Directory - Directory in which the Integration Service creates the index and data cache files.

Tracing Level - Amount of detail displayed in the session log for this transformation.

Sorted Input - Indicates input data is already sorted by groups. Select this option only if the input to the Aggregator transformation is sorted.

Aggregator Data Cache Size - Default cache size is 2,000,000 bytes. The data cache stores row data.

Aggregator Index Cache Size - Default cache size is 1,000,000 bytes. The index cache stores group by ports data.

Transformation Scope - Specifies how the Integration Service applies the transformation logic to incoming data.

Group By Ports:
The integration service performs aggregate calculations and produces one row for
each group. If you
do not specify any group by ports, the integration service returns one row for all
input rows. By
default, the integration service returns the last row received for each group along
with the result of
aggregation. By using the FIRST function, you can specify the integration service
to return the first
row of the group.
Aggregate Expressions:
You can create the aggregate expressions only in the Aggregator transformation. An
aggregate
expression can include conditional clauses and non-aggregate functions. You can use
the following
aggregate functions in the Aggregator transformation,

AVG

COUNT

FIRST

LAST

MAX

MEDIAN

MIN
PERCENTILE

STDDEV

SUM

VARIANCE

Examples: SUM(sales), AVG(salary)

Nested Aggregate Functions:


You can nest one aggregate function within another aggregate function. You can
either use single-
level aggregate functions or multiple nested functions in an aggregate
transformation. You cannot
use both single-level and nested aggregate functions in an aggregator
transformation. The Mapping
designer marks the mapping as invalid if an aggregator transformation contains both
single-level and
nested aggregate functions. If you want to create both single-level and nested
aggregate functions,
create separate aggregate transformations.

Examples: MAX(SUM(sales))

Conditional clauses:

You can reduce the number of rows processed in the aggregation by specifying a
conditional clause.

Example: SUM(salary, salary>1000)

This will include only the salaries which are greater than 1000 in the SUM
calculation.
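For comparison, a rough SQL equivalent of this conditional aggregate (a sketch in standard SQL, not Informatica syntax, assuming an EMPLOYEES table with a SALARY column) is:

SELECT SUM(CASE WHEN salary > 1000 THEN salary ELSE 0 END) AS total_salary
FROM employees;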
Non-Aggregate Functions:
You can also use non-aggregate functions in aggregator transformation.

Example: IIF( SUM(sales) <20000, SUM(sales),0)

Note: By default, the Integration Service treats null values as NULL in aggregate
functions. You can
change this by configuring the integration service.
Incremental Aggregation:
After you create a session that includes an Aggregator transformation, you can
enable the session
option, Incremental Aggregation. When the Integration Service performs incremental
aggregation, it
passes source data through the mapping and uses historical cache data to perform
aggregation
calculations incrementally.
Sorted Input:
You can improve the performance of aggregator transformation by specifying the
sorted input. The
Integration Service assumes all the data is sorted by group and it performs
aggregate calculations
as it reads rows for a group. If you specify the sorted input option without
actually sorting the data,
then the Integration Service fails the session.
Take a Quiz on Aggregator Transformation

Java HashMap Class Example

HashMap is a hash table based implementation of the Map interface. The Map interface associates values with unique keys. HashMap implements all the operations of Map and also permits null values and a null key. HashMap is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls. As HashMap is unsynchronized, it is not thread safe. The order of the map is not guaranteed to remain constant over time.
HashMap works on the principle of hashing. It has the put() and get() methods for storing and retrieving data.
How the HashMap stores keys and values:
When you store an object in a HashMap using the put() method, the HashMap calls the hashCode() method of the key object and, by applying that hash code to its own hashing function, it identifies a bucket location for storing the entry. The important point to note is that the HashMap stores both the key and the value in the bucket.
HashMap Example:
HashMap Example:

//Example for students marks

import java.util.*;

class HashMapDemo {
public static void main(String args[]) {

//creating a hash map

HashMap student = new HashMap();

//Putting values to the map

student.put("Mary", new Integer(85));

student.put("Tim", new Integer(77));

student.put("Karren", new Integer(92));

student.put("Cristy", new Integer(63));

student.put("John", new Integer(56));

// Create a set to get the hashmap elements

Set set = student.entrySet();

Iterator i = set.iterator();

//printing the elements

while( i.hasNext() ) {

Map.Entry m = (Map.Entry)i.next();

System.out.println(m.getKey() + "-" + m.getValue());

}

}

}

The output of this program is

John-56

Tim-77

Mary-85

Cristy-63

Karren-92

What is Informatica PowerCenter

Informatica PowerCenter is one of the Enterprise Data Integration products


developed by
Informatica Corporation. Informatica PowerCenter is an ETL tool used for extracting data from the source, transforming it and loading it into the target.

. The Extraction part involves understanding, analyzing and cleaning of the source
data.
. Transformation part involves cleaning of the data more precisely and modifying
the data as per
the business requirements.
. The loading part involves assigning the dimensional keys and loading into the
warehouse.

What is the need of an ETL tool


The problem with traditional programming languages is that you need to connect to multiple sources and handle errors yourself, which requires writing complex code. ETL tools provide a ready-made solution for this. You don't need to worry about handling these things and can concentrate only on coding the business requirement.
Grouping a Set of Lines as a Paragraph - Unix Awk Command

The awk command can be used to group a set of lines into a paragraph. We will also
use a bash
shell script to group the lines into a paragraph. As an example, consider the file,
[Link], with the
below data

>cat [Link]

A one

B two

C three

D four

E five

F six

G seven

We want to group 3 lines in the file as a paragraph. The required output is

A one

B two

C three

D four

E five

F six

G seven

The bash script for achieving this is


#!/bin/bash

line_count=1

while read line

do

S=`expr $line_count % 3`

if [ "$S" -eq 0 ]

then

echo -e $line"\n"

else

echo $line

fi

line_count=`expr $line_count + 1`

done < [Link]

Now we will see how to achieve this using the Awk command in Unix. The awk command
for this is

awk '!( NR % 3 ) {$0 = $0"\n"} 1' [Link]
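If you prefer sed, a roughly equivalent one-liner (a sketch that assumes the line count is a multiple of 3) appends an empty line after every third line:

sed 'n;n;G' [Link]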

Download Informatica PowerCenter Version 9.1 Tutorials (PDF Documents)

The list of documents in Informatica version 9.1 are:


IDP Developer Guide:
This document talks about the application programming interfaces (APIs) that enable
you to embed
data integration capabilities in an enterprise application
Advanced Workflow Guide:
Advanced workflow guide discusses about topics like Pipeline Partitioning,
Pushdown Optimization,
Real-Time Processing, Grid Processing, External Loading etc.
Data Profiling Guide:
Data profiling guide helps you to understand, analyze the content, quality and
structure of data.
Designer Guide:
You can learn how to import or create Sources, Targets, create Transformations,
Mappings,
Mapplets and so on.
Getting Started Guide:
This document will help you on how to use the Informatica PowerCenter tool.
Installation and Configuration Guide:
Learn how to install Informatica on multiple nodes and configuring it.
Mapping Analysis for Excel:
Helps you to analyze PowerCenter mappings in Microsoft Office Excel and to export PowerCenter mappings to Microsoft Office Excel.
Mapping Architect Visio Guide:
Mapping Architect for Visio helps you to create mapping templates using Microsoft
Office Visio.
Performance Tuning Guide:
Helps you to understand how to optimize source, target, transformation, mapping,
sessions etc.
Repository Guide:
Helps in you understanding the repository architecture, metadata and repository
object locks.
Transformation Guide:
Learn about different transformation in Informatica version 9.
Web Services Provider Guide:
Web services describe a collection of operations that are network accessible
through standardized
XML messaging.

List of Top 15 Cloud Computing Companies

Cloud computing is the delivery of computing as a service rather than a product,


where shared
resources, software, and information are provided to computers and other devices as
a metered
service over a network.
The Top companies which offers cloud computing services are
Amazon
Amazon is the pioneer of the cloud computing industry. Amazon Web Services include the Elastic Compute Cloud, for computing capacity, and the Simple Storage Service, for on-demand storage capacity.

AT&T
Synaptic Hosting service offers pay-as-you-go access to virtual servers and storage
integrated with
security and networking functions
BlueLock
BlueLock is one of the leading VCE providers in the world. Provides Cloud resources
for VMware.
CSC
CSC launched BizCloud, a unique private cloud service that integrates
Infrastructure as a Service
into legacy IT system and interlinks it with Software as a Service providers.
Enomaly
Enomaly's Elastic Computing Platform (ECP) is software that integrates enterprise
data centers with
commercial cloud computing offerings.
Google
Google uses cloud computing in building Google Apps. Google Apps includes e-mail, calendar, word processing, a Web site creation tool and many more.
GoGrid
The GoGrid focuses on Web-based storage. Deploy Windows and Linux virtual servers
onto the
cloud quickly.
IBM
IBM is expanding its cloud services business and capturing market share quickly.
Joyent
creates cloud infrastructure packages.
Microsoft
Microsoft uses cloud computing in Azure. Azure is a Windows as-a-service platform
consisting of the
operating system and developer services.
NetSuite
Netsuite offers cloud computing in e-commerce, CRM, accounting and ERP tools.
Rackspace
The Rackspace provides cloud computing services like Cloud sites for websites,
Cloud Files for
storage, Cloud Servers for virtual servers.
RightScale
RightScale provides cloud services that help customers manage their IT processes.
Salesforce
Salesforce CRM tools include salesforce automation, analytics, marketing and social
networking
tools.
Verizon
Verizon was able to expand its cloud services portfolio into the enterprise market.

Methods to Convert Hexadecimal to Decimal in Unix

We will see how to convert hexadecimal numbers into decimal numbers using Unix commands and bash scripting.
The bc command can be used to convert a hexadecimal number into a decimal number.
Example:
>echo "ibase=16;ABCD"|bc

43981

The hexadecimal number "ABCD" is converted into decimal number 43981.


The bc command can also be used to convert decimal numbers back to hexadecimal numbers.
Example:

>echo "obase=16;43981"|bc

ABCD
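Note that bc expects the hexadecimal digits in uppercase, and when you combine both settings, obase should be set before ibase, because once ibase changes every following number (including the obase value) is read in the new base. For example, converting hexadecimal FF to binary:

>echo "obase=2;ibase=16;FF"|bc

11111111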

Converting hexadecimal to decimal using bash script:


Consider the below file with hexadecimal numbers.

>cat [Link]

ABCD

125A

F36C

E962

The bash script shown below converts the hexadecimal numbers into decimal numbers

#!/bin/bash

while read line

do

printf "%d\n" "0x"$line

done < [Link]


The output after running this script is

43981

4698

62316

59746

Converting decimal to hexadecimal using bash script:


The input file contains the decimal numbers with below data.

>cat [Link]

43981

4698

62316

59746

Now we will do the reverse process. The bash script for converting the decimal
numbers to
hexadecimal numbers is

#!/bin/bash

while read line

do

printf "%0X\n" $line

done < [Link]

After running this script the output will be

ABCD
125A

F36C

E962


Transformations in Informatica 9

What is a Transformation

A transformation is a repository object which reads the data, modifies the data and
passes the data.
Transformations in a mapping represent the operations that the integration service
performs on the
data.
Transformations can be classified as active or passive, connected or unconnected.
Active Transformations:
A transformation can be called as an active transformation if it performs any of
the following actions.

. Change the number of rows: For example, the filter transformation is active
because it removes
the rows that do not meet the filter condition. All multi-group transformations are
active because
they might change the number of rows that pass through the transformation.
. Change the transaction boundary: The transaction control transformation is active
because it
defines a commit or roll back transaction.
. Change the row type: Update strategy is active because it flags the rows for
insert, delete,
update or reject.

Note: You cannot connect multiple active transformations, or an active and a passive transformation, to the same downstream transformation or to the same input group of a transformation. This is because the integration service may not be able to concatenate the rows generated by active transformations. This rule is not applicable to the sequence generator transformation.
Passive Transformations:
Transformations which do not change the number of rows passed through them, and which maintain the transaction boundary and row type, are called passive transformations.
Connected Transformations:
Transformations which are connected to the other transformations in the mapping are
called
connected transformations.
Unconnected Transformations:
An unconnected transformation is not connected to other transformations in the
mapping and is
called within another transformation, and returns a value to that.
The below table lists the transformations available in Informatica version 9:

Transformation (Type): Description

Aggregator (Active/Connected): Performs aggregate calculations.

Application Source Qualifier (Active/Connected): Represents the rows that the Integration Service reads from an application, such as an ERP source, when it runs a session.

Custom (Active or Passive/Connected): Calls a procedure in a shared library or DLL.

Data Masking (Passive/Connected): Replaces sensitive production data with realistic test data for non-production environments.

Expression (Passive/Connected): Calculates a value.

External Procedure (Passive/Connected or Unconnected): Calls a procedure in a shared library or in the COM layer of Windows.

Filter (Active/Connected): Filters data.

HTTP (Passive/Connected): Connects to an HTTP server to read or update data.

Input (Passive/Connected): Defines mapplet input rows. Available in the Mapplet Designer.

Java (Active or Passive/Connected): Executes user logic coded in Java. The byte code for the user logic is stored in the repository.

Joiner (Active/Connected): Joins data from different databases or flat file systems.

Lookup (Active or Passive/Connected or Unconnected): Looks up and returns data from a flat file, relational table, view, or synonym.

Normalizer (Active/Connected): Source qualifier for COBOL sources. Can also be used in the pipeline to normalize data from relational or flat file sources.

Output (Passive/Connected): Defines mapplet output rows. Available in the Mapplet Designer.

Rank (Active/Connected): Limits records to a top or bottom range.

Router (Active/Connected): Routes data into multiple transformations based on group conditions.

Sequence Generator (Passive/Connected): Generates primary keys.

Sorter (Active/Connected): Sorts data based on a sort key.

Source Qualifier (Active/Connected): Represents the rows that the Integration Service reads from a relational or flat file source when it runs a session.

SQL (Active or Passive/Connected): Executes SQL queries against a database.

Stored Procedure (Passive/Connected or Unconnected): Calls a stored procedure.

Transaction Control (Active/Connected): Defines commit and rollback transactions.

Union (Active/Connected): Merges data from different databases or flat file systems.

Unstructured Data (Active or Passive/Connected): Transforms data in unstructured and semi-structured formats.

Update Strategy (Active/Connected): Determines whether to insert, delete, update, or reject rows.

XML Generator (Active/Connected): Reads data from one or more input ports and outputs XML through a single output port.

XML Parser (Active/Connected): Reads XML from one input port and outputs data to one or more output ports.

XML Source Qualifier (Active/Connected): Represents the rows that the Integration Service reads from an XML source when it runs a session.
Source Qualifier Transformation in Informatica
The source qualifier transformation is an active, connected transformation used to represent the rows that the integration service reads when it runs a session. You need to connect the
source qualifier
transformation to the relational or flat file definition in a mapping. The source
qualifier transformation
converts the source data types to the Informatica native data types. So, you should
not alter the data
types of the ports in the source qualifier transformation.
The source qualifier transformation is used to do the following tasks:

. Joins: You can join two or more tables from the same source database. By default
the sources
are joined based on the primary key-foreign key relationships. This can be changed
by explicitly
specifying the join condition in the "user-defined join" property.
. Filter rows: You can filter the rows from the source database. The integration
service adds a
WHERE clause to the default query.
. Sorting input: You can sort the source data by specifying the number of sorted ports. The Integration Service adds an ORDER BY clause to the default SQL query.
. Distinct rows: You can get distinct rows from the source by choosing the "Select
Distinct"
property. The Integration Service adds a SELECT DISTINCT statement to the default
SQL
query.
. Custom SQL Query: You can write your own SQL query to do calculations.

Creating Source Qualifier Transformation:


The easiest method to create a source qualifier transformation is to drag the source definition into a mapping. This will create the source qualifier transformation automatically.
Follow the below steps to create the source qualifier transformation manually.

. Click Transformation -> Create.


. Select the Source Qualifier transformation.
. Enter a name for the transformation
. Click on create.

. Select a source, click OK and then click Done.



Now you can see in the below image how the source qualifier transformation is
connected to the
source definition.

Source Qualifier Transformation Properties:


We can configure the following source qualifier transformation properties on the
properties tab. To
go to the properties tab, open the source qualifier transformation by double
clicking on it and then
click on the properties tab.

Property - Description

SQL Query - To specify a custom query which replaces the default query.

User-Defined Join - Condition used for joining multiple sources.

Source Filter - Specifies the filter condition the Integration Service applies when querying rows.

Number of Sorted Ports - Used for sorting the source data.

Tracing Level - Sets the amount of detail included in the session log when you run a session containing this transformation.

Select Distinct - To select only unique rows from the source.

Pre-SQL - Pre-session SQL commands to run against the source database before the Integration Service reads the source.

Post-SQL - Post-session SQL commands to run against the source database after the Integration Service writes to the target.

Output is Deterministic - Specify only when the source output does not change between session runs.

Output is Repeatable - Specify only when the order of the source output is the same between session runs.

Note: For flat file source definitions, all the properties except the Tracing level
will be disabled.
To Understand the following, Please create the employees and departments tables in
the source
and emp_dept table in the target database.

create table DEPARTMENTS

(

DEPARTMENT_ID NUMBER(4) not null,

DEPARTMENT_NAME VARCHAR2(15) not null,

MANAGER_ID NUMBER(6)

);

create table EMPLOYEES

(

EMPLOYEE_ID NUMBER(6) not null,

NAME VARCHAR2(10),

SALARY NUMBER(10,2),

MANAGER_ID NUMBER(6),

DEPARTMENT_ID NUMBER(4)

);

create table EMP_DEPT


(
EMPLOYEE_ID NUMBER(6) not null,

NAME VARCHAR2(10),

SALARY NUMBER(10,2),

MANAGER_ID NUMBER(6),

DEPARTMENT_ID NUMBER(4),

DEPARTMENT_NAME VARCHAR2(15) not null

);

Viewing the Default Query or Generating the SQL query:


For relational sources, the Integration Service generates a query for each Source
Qualifier
transformation when it runs a session. To view the default query generated, just
follow the below
steps:

. Go to the Properties tab, select "SQL Query" property. Then open the SQL Editor,
select the
"ODBC data source" and enter the username, password.
. Click Generate SQL.
. Click Cancel to exit.

The default query generated in this case is

SELECT employees.employee_id,

employees.name,

employees.salary,

employees.manager_id,

employees.department_id

FROM employees

You can write your own SQL query rather than relying on the default query for performing calculations.
Note: You can generate the SQL query only if the output ports of the source qualifier transformation are connected to another transformation in the mapping. The SQL query generated
contains only the
columns or ports which are connected to the downstream transformations.
Specifying the "Source Filter, Number Of Sorted Ports and Select Distinct"
properties:
Follow the below steps for specifying the filter condition, sorting the source data
and for selecting the
distinct rows.

. Go to the properties tab.


. Select "Source Filter" property, open the editor and enter the filter condition
(Example:
employees.department_id=100) and click OK.
. Go to the "Number Of Sorted Ports" property and enter a value (Example: 2). This
value (2)
means to sort the data on the first two ports in the source qualifier
transformation.
. Tick the check box for the "Select Distinct" property.

Now follow the steps for "Generating the SQL query" and generate the SQL query. The
SQL query
generated is

SELECT DISTINCT employees.employee_id,

employees.name,

employees.salary,

employees.manager_id,
employees.department_id

FROM employees

WHERE employees.department_id=100

ORDER BY employees.employee_id, employees.name

Observe the DISTINCT, WHERE and ORDER BY clauses in the SQL query generated. The
order by
clause contains the first two ports in the source qualifier transformation. If you
want to sort the data
on department_id, salary ports; simply move these ports to top position in the
source qualifier
transformationa and specify the "Number Of Sorted Ports" property as 2
Joins:
The source qualifier transformation can be used to join sources from the same database. By default it joins the
sources based on the primary-key, foreign-key relationships. To join heterogeneous
sources, use
Joiner Transformation.
A foreign-key is created on the department_id column of the employees table, which
references the
primary-key column, department_id, of the departments table.
Follow the below steps to see the default join
Create only one source qualifier transformation for both the employees and
departments.
Go to the properties tab of the source qualifier transformation, select the "SQL
QUERY" property and
generate the SQL query.
The Generated SQL query is

SELECT employees.employee_id,

employees.name,

employees.salary,

employees.manager_id,

employees.department_id,

departments.department_name

FROM employees,

departments

WHERE departments.department_id=employees.department_id
You can see the employees and departments tables are joined on the department_id
column in the
WHERE clause.

There might be a case where there is no relationship between the sources. In that case, we need to override the default join. To do this we have to specify the join condition in the "User Defined Join" property. Using this property we can also specify outer joins. The join conditions entered here are database specific.
As an example, if we want to join the employees and departments table on the
manager_id column,
then in the "User Defined Join" property specify the join condition as
"departments.manager_id=employees.manager_id". Now generate the SQL and observe the

WHERE clause.
Pre and Post SQL:
You can add the Pre-SQL and Post-SQL commands. The integration service runs the
Pre-SQL and
Post-SQL before and after reading the source data respectively.
Take Quiz on Source Qualifier Transformation.

Print File in Reverse Using Unix Command

Q) How to print the lines in a file in reverse order? This means we have to print the data of the file from the last line to the first line.
We will see different methods to reverse the data in a file. As an example, consider the file with the below data.

>cat [Link]

Header

line2

line3

line4
Footer

We need to display the lines in a file in reverse order. The output data is

Footer

line4

line3

line2

Header

1. The tac command in Unix can be used to print the file in reverse. The tac command is

tac [Link]

2. The sed command for reversing the lines in a file is.

sed '1!G;h;$!d' [Link]

3. Another usage of sed command is

sed -n '1!G;h;$p' [Link]
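4. The awk command can also be used, by buffering the lines in an array and printing them in reverse order (a sketch):

awk '{ a[NR]=$0 } END { for(i=NR;i>0;i--) print a[i] }' [Link]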

If you know more methods, then please comment here.

Methods to Reverse a String Using Unix Commands

This topic will cover different methods to reverse each character in a string and
reversing the tokens
in a string.
Reversing a string:
1. The sed command can be used to reverse a string. The sed command for this is

echo "hello world" | sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'

2. Another usage of the sed command, with tac and tr, is

echo "hello world" |sed 's/./&\n/g' |tac |tr -d '\n'

3. The awk command for reversing the string is

echo "hello world" | awk '{

n=split($0,arr,"");

for(i=1;i<=n;i++)

s=arr[i] s

END

print s

}'

4. In this method, a bash script will be written to reverse the string. The bash
script is

#!/bin/bash

str="hello world"

len=`echo $str | wc -c`

len=`expr $len - 1`

rev=""

while test $len -gt 0


do

rev1=`echo $str | cut -c$len`

rev=$rev$rev1

len=`expr $len - 1`

done

echo $rev

The output of all the above four methods is the reverse of the string "hello
world", which is

dlrow olleh

5. Using the rev command


We can use the rev command to reverse the string which is shown below:

echo "hello world"|rev

Reversing the tokens in a string:


1. The awk command can be used to reverse the tokens in a string. The awk command
for this is

echo "hello world" | awk '{

n=split($0,A);

S=A[n];

for(i=n-1;i>0;i--)

S=S" "A[i]

END

{
print S

}'

2. Using the tac and tr command we can reverse the tokens in a string. The unix
command is

echo "hello world"|tac -s " "| tr "\n" " "

3. The bash script for reversing the tokens in a string is.

#!/bin/bash

TOKENS="hello world"

for i in $TOKENS

do STR="$i $STR"

done

echo $STR

The output of the above two methods is

world hello

If you know more methods, then please comment here.

Remove Trailing zeros using Unix command

Q) How to remove the trailing zeros from each line of a file?


Trailing zeros are the zeros which appear only at the end of a line. There are many ways to remove them. Let us assume that the source file contains the below data.

>cat [Link]
12345

67890

10100

10000

The required output should not contain the trailing zeros. The output should be

12345

6789

101

1

The unix command for producing this result is

rev [Link] | awk '{print $1*1}'|rev

Here the rev command will reverse the string in each line. Now the trailing zeros
will become leading
zeros. In the awk command the string is converted into a number and the leading
zeros will be
removed. At the end, the rev command again reverses the string.
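Another option (a sketch that assumes the lines contain only digits) is to strip the trailing zeros directly with sed:

sed 's/0*$//' [Link]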
If you know any other methods to remove the trailing zeros, then please comment
here.

Remove the Lines from a file which are same as the first line - Unix Awk

Q) How to remove the lines which are same as the first line.
The awk command can be used to remove the lines which are the same as the first line in a file. I will also show you another method of removing these lines. As an example, consider the file with the below data.

Header
line2

line3

Header

line5

line6

line7

Header

The first line contains the text "Header". We need to remove the lines which has
the same text as the
first line.
The required output data is

Header

line2

line3

line5

line6

line7

The awk command can be used to achieve this. The awk command for this is

awk '{

if(NR==1)

{

x=$0;

print $0

}

else if(x!=$0)

print $0

}' [Link]

The other way to get the output is using bash script.

x=`head -1 [Link]`; echo $x; cat [Link] | grep -v \^"${x}"\$

Examples of Awk Command in Unix - Part 2

1. Inserting a new line after every 2 lines


We will see how to implement this using the awk command with an example.
The input "[Link]" contains the below data:

1 A

2 B

3 C

4 D

5 E

6 F

Let say, we want to insert the new line "9 Z" after every two lines in the input
file. The required output
data after inserting a new line looks as

1 A

2 B

9 Z
3 C

4 D

9 Z

5 E

6 F

9 Z

The awk command for getting this output is

awk '{

if(NR%2 == 0)

print $0"\n9 Z";

else

print $0

}' [Link]

2. Replace the Nth occurrence of a pattern


The input file contains the data.

AAA 1

BBB 2

CCC 3
AAA 4

AAA 5

BBB 6

CCC 7

AAA 8

BBB 9

AAA 0

Now we want to replace the fourth occurrence of the first field "AAA" with "ZZZ" in the file.
The required output is:

AAA 1

BBB 2

CCC 3

AAA 4

AAA 5

BBB 6

CCC 7

ZZZ 8

BBB 9

AAA 0

The awk command for getting this output is

awk 'BEGIN {count=0}

{

if($1 == "AAA")

{

count++

if(count == 4)

sub("AAA","ZZZ",$1)

}

print $0

}' [Link]

3. Find the sum of even and odd lines separately


The input file data:

A 10

B 39

C 22

D 44

E 75

F 89

G 67

You have to take the second field and then find the sum of the values on the odd and even lines separately. The required output is

174, 172
The awk command for producing this output is

awk '{

if(NR%2 == 1)

sum_odd = sum_odd + $2

else

sum_even = sum_even + $2

}

END { print sum_odd", "sum_even }' [Link]

4. Fibonacci series using awk command


Now we will produce the Fibonacci series using the awk command.

awk 'BEGIN {

for(i=0;i<=10;i++)

{

if (i <= 1)

{

x=0;

y=1;

print i;

}

else

{

z=x+y;

print z;

x=y;

y=z;

}

}

}'

The output is

0

1

1

2

3

5

8

13

21

34

55

5. Remove leading zeros from a file using the awk command. The input file contains
the below data.
0012345

05678

01010

00001

After removing the leading zeros, the output should contain the below data.

12345

5678

1010

1

The awk commands for this are

awk '{print $1 + 0}' [Link]

awk '{printf "%d\n",$0}' [Link]

For more examples on awk command:


Examples of awk command - part 1
Examples of awk command - part 2

String aggregating Analytic Functions in Oracle Database

The string aggregate functions concatenate multiple rows into a single row.
Consider the products
table as an example.
Table Name: Products

Year product

-------------
2010 A

2010 B

2010 C

2010 D

2011 X

2011 Y

2011 Z

Here, in the output we will concatenate the products in each year by a comma
separator. The
desired output is:

year product_list

------------------

2010 A,B,C,D

2011 X,Y,Z

LISTAGG analytic function in 11gR2:


The LISTAGG function can be used to aggregate the strings. You can pass the
explicit delimiter to
the LISTAGG function.

SELECT year,

LISTAGG(product, ',') WITHIN GROUP (ORDER BY product) AS product_list

FROM products

GROUP BY year;

WM_CONCAT function:
You cannot pass an explicit delimiter to the WM_CONCAT function. It uses comma as
the string
separator.
SELECT year,

wm_concat(product) AS product_list

FROM products

GROUP BY year;
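WM_CONCAT is an undocumented function and is not available in every Oracle release, so it should not be relied upon in production code. On older releases, one commonly used workaround (a sketch, adjust to your version) aggregates the strings with XMLAGG:

SELECT year,
       RTRIM(XMLAGG(XMLELEMENT(e, product || ',') ORDER BY product)
             .EXTRACT('//text()').getStringVal(), ',') AS product_list
FROM products
GROUP BY year;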

Pivot and Unpivot Operators in Oracle Database 11g

Pivot:
The pivot operator converts row data to column data and can also do aggregation while converting.
To see how the pivot operator works, consider the following "sales" table as an example

Table Name: Sales

customer_id product price

--------------------------------------

1 A 10

1 B 20

2 A 30

2 B 40

2 C 50

3 A 60

3 B 70

3 C 80

The rows of the "sales" table needs to be converted into columns as shown below

Table Name: sales_rev

customer_id a_product b_product c_product

-----------------------------------------
1 10 20

2 30 40 50

3 60 70 80

The query for converting the rows to columns is

SELECT *

FROM (SELECT customer_id,product,price from sales)

pivot ( sum(price) as total_price for (product) IN ( 'A' as a, 'B' as b, 'C'


as c) )

Pivot can also be used to generate the data in XML format. The query for generating the data in XML format is shown below.

SELECT *

FROM (SELECT customer_id,product,price from sales)

pivot XML ( sum(price) as total_price for (product) IN ( SELECT distinct


product from sales) )

If you are not using an Oracle 11g database, then you can implement the pivot feature (converting rows to columns) with DECODE or CASE, as sketched below.
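A sketch of that DECODE based alternative for the "sales" table above is:

SELECT customer_id,
       SUM(DECODE(product, 'A', price)) AS a_product,
       SUM(DECODE(product, 'B', price)) AS b_product,
       SUM(DECODE(product, 'C', price)) AS c_product
FROM sales
GROUP BY customer_id;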

Unpivot:

Unpivot operator converts the columns into rows.

Table Name: sales_rev

customer_id a_product b_product c_product

-----------------------------------------

1 10 20

2 30 40 50

3 60 70 80
Table Name: sales

customer_id product price

---------------------------

1 A 10

1 B 20

2 A 30

2 B 40

2 C 50

3 A 60

3 B 70

3 C 80

The query to convert the columns into rows is

SELECT *

FROM sales_rev

UNPIVOT [EXCLUDE NULLS | INCLUDE NULLS] (price FOR product IN (a_product AS 'A', b_product AS 'B', c_product AS 'C'));

Points to note about the query

. The columns price and product in the unpivot clause are required and these names need not be present in the table.
. The unpivoted columns must be specified in the IN clause
. By default the query excludes null values.
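If you are not on Oracle 11g, the unpivot result can be produced manually with a UNION ALL query; the following is a sketch of that approach for the sales_rev table:

SELECT customer_id, 'A' AS product, a_product AS price FROM sales_rev WHERE a_product IS NOT NULL
UNION ALL
SELECT customer_id, 'B', b_product FROM sales_rev WHERE b_product IS NOT NULL
UNION ALL
SELECT customer_id, 'C', c_product FROM sales_rev WHERE c_product IS NOT NULL;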
Top Examples of Awk Command in Unix

Awk is one of the most powerful tools in Unix used for processing the rows and
columns in a file.
Awk has built in string functions and associative arrays. Awk supports most of the
operators,
conditional blocks, and loops available in C language.
One of the good things is that you can convert Awk scripts into Perl scripts using
a2p utility.
The basic syntax of AWK:

awk 'BEGIN {start_action} {action} END {stop_action}' filename

Here the actions in the begin block are performed before processing the file and
the actions in the
end block are performed after processing the file. The rest of the actions are
performed while
processing the file.
Examples:
Create a file input_file with the following data. This file can be easily created
using the output of ls -l.

-rw-r--r-- 1 center center 0 Dec 8 21:39 p1

-rw-r--r-- 1 center center 17 Dec 8 21:15 t1

-rw-r--r-- 1 center center 26 Dec 8 21:38 t2

-rw-r--r-- 1 center center 25 Dec 8 21:38 t3

-rw-r--r-- 1 center center 43 Dec 8 21:39 t4

-rw-r--r-- 1 center center 48 Dec 8 21:39 t5

From the data, you can observe that this file has rows and columns. The rows are
separated by a
new line character and the columns are separated by a space characters. We will use
this file as the
input for the examples discussed here.
1. awk '{print $1}' input_file
Here $1 has a special meaning. $1, $2, $3... represent the first, second, third columns... in a row respectively. This awk command will print the first column in each row as shown below.

-rw-r--r--
-rw-r--r--

-rw-r--r--

-rw-r--r--

-rw-r--r--

-rw-r--r--

To print the 4th and 5th columns in a file use awk '{print $4,$5}' input_file
Here the Begin and End blocks are not used in awk. So, the print command will be
executed for
each row it reads from the file. In the next example we will see how to use the
Begin and End blocks.
2. awk 'BEGIN {sum=0} {sum=sum+$5} END {print sum}' input_file
This will print the sum of the values in the 5th column. In the Begin block the variable sum is assigned the value 0. In the next block the value of the 5th column is added to the sum variable. This addition repeats for every row that is processed. When all the rows are processed, the sum variable holds the sum of the values in the 5th column. This value is printed in the End block.
3. In this example we will see how to execute the awk script written in a file.
Create a file
sum_column and paste the below script in that file

#!/usr/bin/awk -f

BEGIN {sum=0}

{sum=sum+$5}

END {print sum}

Now execute the script using the awk command as


awk -f sum_column input_file.
This will run the script in sum_column file and displays the sum of the 5th column
in the input_file.
4. awk '{ if($9 == "t4") print $0;}' input_file
This awk command checks for the string "t4" in the 9th column and if it finds a
match then it will print
the entire line. The output of this awk command is

-rw-r--r-- 1 center center 43 Dec 8 21:39 t4

5. awk 'BEGIN { for(i=1;i<=5;i++) print "square of", i, "is",i*i; }'


This will print the squares of the numbers from 1 to 5. The output of the command is

square of 1 is 1

square of 2 is 4

square of 3 is 9

square of 4 is 16

square of 5 is 25

Notice that the syntax of 'if' and 'for' is similar to that of the C language.
Awk Built in Variables:
You have already seen $0, $1, $2... which prints the entire line, first column,
second column...
respectively. Now we will see other built in variables with examples.
FS - Input field separator variable:
So far, we have seen the fields separated by a space character. By default Awk assumes that the fields in a file are separated by space characters. If the fields in the file are separated by any other character, we can use the FS variable to tell awk about the delimiter.
6. awk 'BEGIN {FS=":"} {print $2}' input_file
OR
awk -F: '{print $2}' input_file
This will print the result as

39 p1

15 t1
38 t2

38 t3

39 t4

39 t5

OFS - Output field separator variable:


By default, whenever we print the fields using the print statement, the fields are displayed with the space character as the delimiter. For example
7. awk '{print $4,$5}' input_file
The output of this command will be

center 0

center 17

center 26

center 25

center 43

center 48

We can change this default behavior using the OFS variable as


awk 'BEGIN {OFS=":"} {print $4,$5}' input_file

center:0

center:17

center:26

center:25

center:43
center:48

Note: print $4,$5 and print $4$5 will not work the same way. The first one displays
the output with
space as delimiter. The second one displays the output without any delimiter.
NF - Number of fields variable:
The NF variable can be used to know the number of fields in a line.
8. awk '{print NF}' input_file
This will display the number of columns in each row.
NR - number of records variable:
The NR variable can be used to know the line number or the count of lines in a file.
9. awk '{print NR}' input_file
This will display the line numbers from 1.
10. awk 'END {print NR}' input_file
This will display the total number of lines in the file.
String functions in Awk:
Some of the string functions in awk are:
index(string,search)
length(string)
split(string,array,separator)
substr(string,position)
substr(string,position,max)
tolower(string)
toupper(string)
Advanced Examples:
1. Filtering lines using Awk split function
The awk split function splits a string into an array using the delimiter.
The syntax of split function is
split(string, array, delimiter)
Now we will see how to filter the lines using the split function with an example.
The input "[Link]" contains the data in the following format
1 U,N,UNIX,000

2 N,P,SHELL,111

3 I,M,UNIX,222

4 X,Y,BASH,333

5 P,R,SCRIPT,444

Required output: Now we have to print only the lines whose 2nd field contains the string "UNIX" as its 3rd sub-field (the 2nd field in each line is itself comma delimited).
The output is:

1 U,N,UNIX,000

3 I,M,UNIX,222

The awk command for getting the output is:

awk '{

split($2,arr,",");

if(arr[3] == "UNIX")

print $0

} ' [Link]

Examples of Basename Command in Unix

The basename utility is used to

. Remove any prefix ending in /.


. Remove the suffix from a string.

Syntax of basename command:


basename [string] [suffix]
Here 'string' is the input string and suffix is the string which needs to be removed from the input string.
Examples:
1. basename /usr/bin/perlscript
This will remove the prefix, /usr/bin/, and prints only the string 'perlscript'
2. basename perlscript script
This will remove the suffix 'script' from 'perlscript' and prints only 'perl'
3. basename /usr/bin/perlscript script
This will remove both the prefix and suffix and prints only 'perl'
basename command is mostly used in shell scripts to get the name of the shell
script file you are
running. Sample shell script code is shown below

#!/usr/bin/sh

filename=`basename $0`

echo $filename
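A small variation (a sketch, assuming the script file name ends in .sh): passing the suffix as the second argument strips the extension as well.

#!/usr/bin/sh

# prints the script name without the assumed .sh extension
name=`basename $0 .sh`

echo $name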

Examples of Alias Command in Unix

The alias command creates an alternative name for long strings that are frequently used. It is mostly used for creating a simple name for a long command.
Syntax of alias command:
alias [alias_name=['command']]
For more information on alias utility see the man pages. Type 'man alias' on the
command prompt.
Examples:
1. alias
If you simply type alias on the command prompt and then enter, it will list all the
aliases that were
created.
2. alias pg='ps -aef'
The ps -aef command lists all the running processes. After creating the alias pg for ps -aef, typing pg on the command prompt will display the running processes. The pg alias works the same as ps -aef.
An alias created on the command prompt is present only for that session. Once you exit from the session, the alias will no longer take effect. To make an alias permanent, place the alias command in the ".profile" of the user. Open the user ".profile", place the command alias pg="ps -aef", save the file and then source the ".profile" file. Now the alias pg will remain permanently, as sketched below.
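A minimal sketch of the relevant ".profile" lines and how to reload the profile in the current session:

# in ~/.profile
alias pg='ps -aef'

# reload the profile in the current session
. ~/.profile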
To remove an alias use the unalias command
Example: unalias pg

Converting Awk Script to Perl Script - Examples of a2p Unix Command

Unix provides the a2p (awk to perl) utility for converting the awk script to perl
script. The a2p
command takes an awk script and produces a comparable perl script.
Syntax of a2p:
a2p [options] [awk_script_filename]
Some of the useful options that you can pass to a2p are:
-D<number> Sets debugging flags.
-F<character> This will tell a2p that the awk script is always invoked with the -F option.
-<number> This makes a2p assume that the input will always have the specified number of fields.
For more options see the man pages; man a2p
Example1:
The awk script which prints the squares of numbers up to 10 is shown below. Call
the below script
as awk_squares.

#!/bin/awk -f

BEGIN {

for (i=1; i <= 10; i++)

print "The square of ", i, " is ", i*i;

exit;

}

Run this script using awk command; awk -f awk_squares. This will produce squares of
numbers up
to 10.
Now we will convert this script using the a2p as
a2p awk_squares > perl_squares
The content of converted perl script, perl_squares, is shown below:

#!/usr/bin/perl

eval 'exec /usr/bin/perl -S $0 ${1+"[Link]

if $running_under_some_shell;

# this emulates #! processing on NIH machines.

# (remove #! line above if indigestible)

eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z_0-9]+=)(.*)/ && shift;

# process any FOO=bar switches

$, = ' '; # set output field separator

$\ = "\n"; # set output record separator

for ($i = 1; $i <= 10; $i++) {

print 'The square of ', $i, ' is ', $i * $i;

last line;

Run the perl script as: perl perl_squares. This will produce the same result as the
awk.
Example2:
We will see an awk script which prints the first field from a file. The awk script for this is shown below. Call this script as awk_first_field.

#!/bin/awk -f

{ print $1; }

Run this script using the awk command by passing a file as input: awk -f awk_first_field file_name. This will print the first field of each line from the file_name.
We will convert this awk script into a perl script using the a2p command as
a2p awk_first_field > perl_first_field
The content of converted perl script, perl_first_field, is shown below:

#!/usr/bin/perl

eval 'exec /usr/bin/perl -S $0 ${1+"[Link]

if $running_under_some_shell;

# this emulates #! processing on NIH machines.

# (remove #! line above if indigestible)

eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z_0-9]+=)(.*)/ && shift;

# process any FOO=bar switches

$, = ' '; # set output field separator

$\ = "\n"; # set output record separator

while (<>) {

($Fld1) = split(' ', $_, -1);

print $Fld1;

}
Now run the perl script as: perl perl_first_field file_name. This will produce the
same result as awk
command.

Informatica Problems With Solutions - Part 1

1. In this problem we will see how to implement the not equal operator, greater
than, greater than or
equal to, less than and less than or equal to operators when joining two tables in
informatica.
Consider the below sales table as an example.
Table name: Sales

product, prod_quantity, price , Year


A , 10 , 100 , 2010
B , 15 , 150 , 2010
A , 8 , 80 , 2011
B , 26 , 260 , 2011

Now the problem is to identify the products whose sales in the current year (in this example, 2011) are less than their sales in the last year.
Here in this example, product A sold less in 2011 when compared with its sales in 2010.
This problem can be easily implemented with the help of SQL query as shown below

SELECT cy.*
FROM SALES cy,
SALES py
WHERE [Link] = [Link]
AND [Link]=2011
AND [Link]=2010
AND cy.prod_quantity < py.prod_quantity;

In Informatica, you can specify only the equal-to condition in a Joiner transformation. Now we will see how to implement this problem using Informatica.
Solution:
STEP1: Connect two source qualifier transformations to the source definition. Call
the first source
qualifier transformation as sq_cy (cy means current year) and the other as sq_py
(py means
previous year).
STEP2: In the sq_cy source qualifier transformation, specify the source filter as year=2011. In the sq_py, specify the source filter as year=2010.
STEP3: Now connect these two source qualifier transformations to joiner
transformation and make
sq_cy as master, sq_py as detail. In the join condition, select the product port
from master and
detail.
STEP4: Now connect all the master ports and only the prod_quantity port from the detail to the filter transformation. In the filter transformation specify the filter condition as prod_quantity < prod_quantity1. Here prod_quantity is the master port and prod_quantity1 is the detail port.
STEP5: Connect all the ports except prod_quantity1 of the filter transformation to the target definition.
2. How to implement the not exists operator in informatica which is available in
database?
Solution:
Implementing the Not Exists operator is very easy in informatica. For example, we
want to get only
the records which are available in table A and not in table B. For this use a
joiner transformation with
A as master and B as detail. Specify the join condition and in the join type,
select detail outer join.
This will get all the records from A table and only the matching records from B
table.
Connect the joiner to a filter transformation and specify the filter condition as
B_port is NULL. This
will give the records which are in A and not in B. Then connect the filter to the
target definition.
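For reference, a minimal SQL sketch of the same logic (the table names table_a and table_b and the join column id are hypothetical):

SELECT a.*
FROM table_a a
WHERE NOT EXISTS (SELECT 1 FROM table_b b WHERE b.id = a.id);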

Top Unix Interview Questions - Part 8

1. Write a command to print the lines that have the pattern "july" in all the files in a particular directory?
grep july *
This will print all the lines in all files that contain the word "july" along with the file name. If any of the files contain words like "JULY" or "July", the above command would not print those lines.
2. Write a command to print the lines that has the word "july" in all the files in
a directory and also
suppress the filename in the output.
grep -h july *
3. Write a command to print the lines that has the word "july" while ignoring the
case.
grep -i july *
The -i option makes the grep command treat the pattern as case insensitive.
4. When you use a single file as input to the grep command to search for a
pattern, it won't print the
filename in the output. Now write a grep command to print the filename in the
output without using
the '-H' option.
grep pattern filename /dev/null
The /dev/null or null device is a special file that discards the data written to it. So, /dev/null is always an empty file.
Another way to print the filename is using the '-H' option. The grep command for
this is
grep -H pattern filename
5. Write a command to print the file names in a directory that does not contain the
word "july"?
grep -L july *
The '-L' option makes the grep command print the filenames that do not contain the specified pattern.
6. Write a command to print the line numbers along with the line that has the word
"july"?
grep -n july filename
The '-n' option is used to print the line numbers in a file. The line numbers start
from 1
7. Write a command to print the lines that starts with the word "start"?
grep '^start' filename
The '^' symbol specifies the grep command to search for the pattern at the start of
the line.
8. In the text file, some lines are delimited by colon and some are delimited by
space. Write a
command to print the third field of each line.
awk '{ if( $0 ~ /:/ ) { FS=":"; } else { FS =" "; } print $3 }' filename
9. Write a command to print the line number before each line?
awk '{print NR, $0}' filename
10. Write a command to print the second and third line of a file without using NR.
awk 'BEGIN {RS="";FS="\n"} {print $2,$3}' filename
11. How to create an alias for the complex command and remove the alias?
The alias utility is used to create the alias for a command. The below command
creates alias for ps -
aef command.
alias pg='ps -aef'
If you use pg, it will work the same way as ps -aef.
To remove the alias simply use the unalias command as
unalias pg
12. Write a command to display today's date in the format 'yyyy-mm-dd'?
The date command can be used to display today's date with time.
date '+%Y-%m-%d'
Top Unix Interview Questions - Part 7

1. Write a command to display your name 100 times.


The yes utility can be used to repeatedly output a line with the specified string or 'y'.
yes <your_name> | head -100
2. Write a command to display the first 10 characters from each line of a file?
cut -c -10 filename
3. The fields in each line are delimited by comma. Write a command to display third
field from each
line of a file?
cut -d',' -f3 filename
4. Write a command to print the fields from 10 to 20 from each line of a file?
cut -d',' -f10-20 filename
5. Write a command to print the first 5 fields from each line?
cut -d',' -f-5 filename
6. By default the cut command displays the entire line if there is no delimiter in it. Which cut option is used to suppress these kinds of lines?
The -s option is used to suppress the lines that do not contain the delimiter.
7. Write a command to replace the word "bad" with "good" in file?
sed s/bad/good/ < filename
8. Write a command to replace the word "bad" with "good" globally in a file?
sed s/bad/good/g < filename
9. Write a command to replace the word "apple" with "(apple)" in a file?
sed s/apple/(&)/ < filename
10. Write a command to switch the two consecutive words "apple" and "mango" in a
file?
sed 's/\(apple\) \(mango\)/\2 \1/' < filename
11. Write a command to display the characters from 10 to 20 from each line of a
file?
cut -c 10-20 filename
Top Unix Interview Questions - Part 6

1. Write a command to remove the prefix of the string ending with '/'.
The basename utility deletes any prefix ending in /. The usage is mentioned below:
basename /usr/local/bin/file
This will display only file
2. How to display zero byte size files?
ls -l | grep '^-' | awk '/^-/ {if ($5 == 0) print $9 }'
3. How to replace the second occurrence of the word "bat" with "ball" in a file?
sed 's/bat/ball/2' < filename
4. How to remove all the occurrences of the word "jhon" except the first one in a
line with in the
entire file?
sed 's/jhon//2g' < filename
5. How to replace the word "lite" with "light" from 100th line to last line in a
file?
sed '100,$ s/lite/light/' < filename
6. How to list the files that are accessed 5 days ago in the current directory?
find -atime 5 -type f
7. How to list the files that were modified 5 days ago in the current directory?
find -mtime 5 -type f
8. How to list the files whose status is changed 5 days ago in the current
directory?
find -ctime 5 -type f
9. How to replace the character '/' with ',' in a file?
sed 's/\//,/' < filename
sed 's|/|,|' < filename
10. Write a command to find the number of files in a directory.
ls -l|grep '^-'|wc -l

Top Unix Interview Questions - Part 5

1. How to display the processes that were run by your user name ?
ps -aef | grep <user_name>
2. Write a command to display all the files recursively with path under current
directory?
find . -depth -print
3. Display zero byte size files in the current directory?
find -size 0 -type f
4. Write a command to display the third and fifth character from each line of a
file?
cut -c 3,5 filename
5. Write a command to print the fields from 10th to the end of the line. The fields
in the line are
delimited by a comma?
cut -d',' -f10- filename
6. How to replace the word "Gun" with "Pen" in the first 100 lines of a file?
sed '1,100 s/Gun/Pen/' < filename
7. Write a Unix command to display the lines in a file that do not contain the word
"RAM"?
grep -v RAM filename
The '-v' option tells the grep to print the lines that do not contain the specified
pattern.
8. How to print the squares of numbers from 1 to 10 using awk command
awk 'BEGIN { for(i=1;i<=10;i++) {print "square of",i,"is",i*i;}}'
9. Write a command to display the files in the directory by file size?
ls -l | grep '^-' |sort -nr -k 5
10. How to find out the usage of the CPU by the processes?
The top utility can be used to display the CPU usage by the processes.

Top Unix Interview Questions - Part 4

1. How do you write the contents of 3 files into a single file?


cat file1 file2 file3 > file
2. How to display the fields in a text file in reverse order?
awk 'BEGIN {ORS=""} { for(i=NF;i>0;i--) print $i," "; print "\n"}' filename
3. Write a command to find the sum of bytes (size of file) of all files in a
directory.
ls -l | grep '^-'| awk 'BEGIN {sum=0} {sum = sum + $5} END {print sum}'
4. Write a command to print the lines which end with the word "end"?
grep 'end$' filename
The '$' symbol specifies the grep command to search for the pattern at the end of
the line.
5. Write a command to select only those lines containing "july" as a whole word?
grep -w july filename
The '-w' option makes the grep command to search for exact whole words. If the
specified pattern is
found in a string, then it is not considered as a whole word. For example: In the
string "mikejulymak",
the pattern "july" is found. However "july" is not a whole word in that string.
6. How to remove the first 10 lines from a file?
sed '1,10 d' < filename
7. Write a command to duplicate each line in a file?
sed 'p' < filename
8. How to extract the username from the 'who am i' command?
who am i | cut -f1 -d' '
9. Write a command to list the files in '/usr' directory that start with 'ch' and
then display the number
of lines in each file?
wc -l /usr/ch*
Another way is
find /usr -name 'ch*' -type f -exec wc -l {} \;
10. How to remove blank lines in a file ?
grep -v '^$' filename > new_filename

Top Unix Interview Questions - Part 3

1. Display all the files in current directory sorted by size?


ls -l | grep '^-' | awk '{print $5,$9}' |sort -n|awk '{print $2}'

2. Write a command to search for the file 'map' in the current directory?
find -name map -type f

3. How to display the first 10 characters from each line of a file?


cut -c -10 filename

4. Write a command to remove the first number on all lines that start with "@"?
sed '\,^@, s/[0-9][0-9]*//' < filename

5. How to print the file names in a directory that has the word "term"?
grep -l term *
The '-l' option makes the grep command print only the filename without printing the content of the file. As soon as the grep command finds the pattern in a file, it prints the filename and stops searching the other lines in the file.

6. How to run awk command specified in a file?


awk -f filename
7. How do you display the calendar for the month march in the year 1985?
The cal command can be used to display the current month calendar. You can pass the
month and
year as arguments to display the required year, month combination calendar.
cal 03 1985
This will display the calendar for the March month and year 1985.

8. Write a command to find the total number of lines in a file?


wc -l filename
Other ways to print the total number of lines are
awk 'BEGIN {sum=0} {sum=sum+1} END {print sum}' filename
awk 'END{print NR}' filename

9. How to duplicate empty lines in a file?


sed '/^$/ p' < filename

10. Explain iostat, vmstat and netstat?

. Iostat: reports on terminal, disk and tape I/O activity.


. Vmstat: reports on virtual memory statistics for processes, disk, tape and CPU
activity.
. Netstat: reports on the contents of network data structures.

Top Unix Interview Questions - Part 2

1. How do you rename the files in a directory with _new as suffix?


ls -lrt|grep '^-'| awk '{print "mv "$9" "$9"_new"}' | sh
2. Write a command to convert a string from lower case to upper case?
echo "apple" | tr [a-z] [A-Z]
3. Write a command to convert a string to Initcap.
echo apple | awk '{print toupper(substr($1,1,1)) tolower(substr($1,2))}'
4. Write a command to redirect the output of date command to multiple files?
The tee command writes the output to multiple files and also displays the output on
the terminal.
date | tee -a file1 file2 file3
5. How do you list the hidden files in current directory?
ls -a | grep '^\.'

6. List out some of the Hot Keys available in bash shell?

. Ctrl+l - Clears the Screen.


. Ctrl+r - Does a search in previously given commands in shell.
. Ctrl+u - Clears the typing before the hotkey.
. Ctrl+a - Places cursor at the beginning of the command at shell.
. Ctrl+e - Places cursor at the end of the command at shell.
. Ctrl+d - Kills the shell.
. Ctrl+z - Places the currently running process into background.

7. How do you make an existing file empty?


cat /dev/null > filename
8. How do you remove the first number on 10th line in file?
sed '10 s/[0-9][0-9]*//' < filename
9. What is the difference between join -v and join -a?
join -v : outputs only the unpairable (unmatched) lines from the specified file, suppressing the matched lines.
join -a : In addition to the matched lines, this will output the unmatched lines from the specified file also.
10. How do you display from the 5th character to the end of the line from a file?
cut -c 5- filename

Top Unix Interview Questions - Part 1

1. How to display the 10th line of a file?


head -10 filename | tail -1
2. How to remove the header from a file?
sed -i '1 d' filename
3. How to remove the footer from a file?
sed -i '$ d' filename
4. Write a command to find the length of a line in a file?
The below command can be used to get a line from a file.
sed -n '<n> p' filename
We will see how to find the length of 10th line in a file
sed -n '10 p' filename|wc -c
5. How to get the nth word of a line in Unix?
cut -f<n> -d' '
6. How to reverse a string in unix?
echo "java" | rev
7. How to get the last word from a line in Unix file?
echo "unix is good" | rev | cut -f1 -d' ' | rev
8. How to replace the n-th line in a file with a new line in Unix?
sed -i'' '10 d' filename # d stands for delete
sed -i'' '10 i new inserted line' filename # i stands for insert
9. How to check if the last command was successful in Unix?
echo $?
10. Write command to list all the links from a directory?
ls -lrt | grep "^l"
11. How will you find which operating system your system is running on in UNIX?
uname -a
12. Create a read-only file in your home directory?
touch file; chmod 400 file
13. How do you see command line history in UNIX?
The 'history' command can be used to get the list of commands that were executed.

14. How to display the first 20 lines of a file?


By default, the head command displays the first 10 lines from a file. If we change
the option of head,
then we can display as many lines as we want.
head -20 filename
An alternative solution is using the sed command
sed '21,$ d' filename
The d command here deletes the lines from 21 to the end of the file.
15. Write a command to print the last line of a file?
The tail command can be used to display the last lines from a file.
tail -1 filename
Alternative solutions are:
sed -n '$ p' filename
awk 'END{print $0}' filename

Informatica Scenario Based Questions - Part 5

Q1. The source data contains only column 'id'. It will have sequence numbers from 1
to 1000. The
source data looks like as

Id
1
2
3
4
5
6
7
8
....
1000
Create a workflow to load only the Fibonacci numbers in the target table. The
target table data
should look like as

Id
1
2
3
5
8
13
.....

In Fibonacci series each subsequent number is the sum of previous two numbers. Here
assume that
the first two numbers of the fibonacci series are 1 and 2.
Solution:
STEP1: Drag the source to the mapping designer and then, in the Source Qualifier Transformation properties, set the number of sorted ports to one. This will sort the source data in ascending order, so that we get the numbers in sequence as 1, 2, 3, ...., 1000.
STEP2: Connect the Source Qualifier Transformation to the Expression
Transformation. In the
Expression Transformation, create three variable ports and one output port. Assign
the expressions
to the ports as shown below.
Ports in Expression Transformation:
id
v_sum = v_prev_val1 + v_prev_val2
v_prev_val1 = IIF(id=1 or id=2,1, IIF(v_sum = id, v_prev_val2, v_prev_val1) )
v_prev_val2 = IIF(id=1 or id =2, 2, IIF(v_sum=id, v_sum, v_prev_val2) )
o_flag = IIF(id=1 or id=2,1, IIF( v_sum=id,1,0) )
STEP3: Now connect the Expression Transformation to the Filter Transformation and
specify the
Filter Condition as o_flag=1
STEP4: Connect the Filter Transformation to the Target Table.
Q2. The source table contains two columns "id" and "val". The source data looks
like as below

id val
1 a,b,c
2 pq,m,n
3 asz,ro,liqt
Here the "val" column contains comma delimited data and has three fields in that
column.
Create a workflow to split the fields in the "val" column into separate rows. The output should look as below.

id val
1 a
1 b
1 c
2 pq
2 m
2 n
3 asz
3 ro
3 liqt

Solution:
STEP1: Connect three Source Qualifier transformations to the Source Definition
STEP2: Now connect all the three Source Qualifier transformations to the Union
Transformation.
Then connect the Union Transformation to the Sorter Transformation. In the sorter
transformation
sort the data based on Id port in ascending order.
STEP3: Pass the output of Sorter Transformation to the Expression Transformation.
The ports in
Expression Transformation are:
id (input/output port)
val (input port)
v_current_id (variable port) = id
v_count (variable port) = IIF(v_current_id!=v_previous_id,1,v_count+1)
v_previous_id (variable port) = id
o_val (output port) = DECODE(v_count, 1,
SUBSTR(val, 1, INSTR(val,',',1,1)-1 ),
2,
SUBSTR(val, INSTR(val,',',1,1)+1, INSTR(val,',',1,2)-INSTR(val,',',1,1)-1),
3,
SUBSTR(val, INSTR(val,',',1,2)+1),
NULL
)
STEP4: Now pass the output of Expression Transformation to the Target definition.
Connect id,
o_val ports of Expression Transformation to the id, val ports of Target Definition.

For those who are interested in solving this problem in Oracle SQL, see the section "Oracle Query to split the delimited data in a column to multiple rows" later in this document. The Oracle SQL query provides a dynamic solution where the "val" column can have a varying number of fields in each row.
Unix Interview Questions on FIND Command

Find utility is used for searching files using the directory information.
1. Write a command to search for the file 'test' in the current directory?
find -name test -type f
2. Write a command to search for the file 'temp' in '/usr' directory?
find /usr -name temp -type f
3. Write a command to search for zero byte size files in the current directory?
find -size 0 -type f
4. Write a command to list the files that are accessed 5 days ago in the current
directory?
find -atime 5 -type f
5. Write a command to list the files that were modified 5 days ago in the current
directory?
find -mtime 5 -type f
6. Write a command to search for the files in the current directory which are not
owned by any user
in the /etc/passwd file?
find . -nouser -type f
7. Write a command to search for the files in '/usr' directory that start with
'te'?
find /usr -name 'te*' -type f
8. Write a command to search for the files that start with 'te' in the current
directory and then display
the contents of the file?
find . -name 'te*' -type f -exec cat {} \;
9. Write a command to list the files whose status is changed 5 days ago in the
current directory?
find -ctime 5 -type f
10. Write a command to list the files in '/usr' directory that start with 'ch' and
then display the number
of lines in each file?
find /usr -name 'ch*' -type f -exec wc -l {} \;

Unix Interview Questions on CUT Command


The cut command is used to display selected columns or fields from each line of a file. The cut command works in two modes:

. Delimited selection: The fields in the line are delimited by a single character
like blank,comma
etc.
. Range selection: Each field starts with certain fixed offset defined as range.

1. Write a command to display the third and fourth character from each line of a
file?

cut -c 3,4 filename

2. Write a command to display the characters from 10 to 20 from each line of a


file?

cut -c 10-20 filename

3. Write a command to display the first 10 characters from each line of a file?

cut -c -10 filename

4. Write a command to display from the 10th character to the end of the line?

cut -c 10- filename

5. The fields in each line are delimited by comma. Write a command to display third
field from each
line of a file?

cut -d',' -f3 filename

6. Write a command to print the fields from 10 to 20 from each line of a file?

cut -d',' -f10-20 filename

7. Write a command to print the first 5 fields from each line?

cut -d',' -f-5 filename

8. Write a command to print the fields from 10th to the end of the line?

cut -d',' -f10- filename

9. By default the cut command displays the entire line if there is no delimiter in it. Which cut option is used to suppress these kinds of lines?
The -s option is used to suppress the lines that do not contain the delimiter.

10. Write a cut command to extract the username from the 'who am i' command?

who am i | cut -f1 -d' '

Unix Interview Questions on SED Command

SED is a special editor used for modifying files automatically.


1. Write a command to replace the word "bad" with "good" in file?
sed s/bad/good/ < filename
2. Write a command to replace the word "bad" with "good" globally in a file?
sed s/bad/good/g < filename
3. Write a command to replace the character '/' with ',' in a file?
sed 's/\//,/' < filename
sed 's|/|,|' < filename
4. Write a command to replace the word "apple" with "(apple)" in a file?
sed s/apple/(&)/ < filename
5. Write a command to switch the two consecutive words "apple" and "mango" in a
file?
sed 's/\(apple\) \(mango\)/\2 \1/' < filename
6. Write a command to replace the second occurrence of the word "bat" with "ball"
in a file?
sed 's/bat/ball/2' < filename
7. Write a command to remove all the occurrences of the word "jhon" except the
first one in a line
with in the entire file?
sed 's/jhon//2g' < filename
8. Write a command to remove the first number on line 5 in file?
sed '5 s/[0-9][0-9]*//' < filename
9. Write a command to remove the first number on all lines that start with "@"?
sed '\,^@, s/[0-9][0-9]*//' < filename
10. Write a command to replace the word "gum" with "drum" in the first 100 lines of
a file?
sed '1,100 s/gum/drum/' < filename
11. write a command to replace the word "lite" with "light" from 100th line to last
line in a file?
sed '100,$ s/lite/light/' < filename
12. Write a command to remove the first 10 lines from a file?
sed '1,10 d' < filename
13. Write a command to duplicate each line in a file?
sed 'p' < filename
14. Write a command to duplicate empty lines in a file?
sed '/^$/ p' < filename
15. Write a sed command to print the lines that do not contain the word "run"?
sed -n '/run/!p' < filename
Hive Built-in Functions

Functions in Hive are categorized as below.


Numeric and Mathematical Functions: These functions are mainly used to perform mathematical calculations.
Date Functions: These functions are used to perform operations on date data types
like adding the
number of days to the date etc.
String Functions: These functions are used to perform operations on strings like
finding the length
of a string etc.
Conditional Functions: These functions are used to test conditions and return a value based on whether the test condition is true or false.
Collection Functions: These functions are used to find the size of the complex
types like array and
map. The only collection function is SIZE. The SIZE function is used to find the
number of elements
in an array and map. The syntax of SIZE function is

SIZE( Array<A> ) and SIZE( MAP<key,value> )
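For example, SIZE( ARRAY(1,2,3) ) returns 3 and SIZE( MAP('a',1,'b',2) ) returns 2.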

Type Conversion Function: This function is used to convert from one data type to
another. The
only type conversion function is CAST. The syntax of CAST is

CAST( expr as <type> )

The CAST function converts the expr into the specified type.
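For example, CAST('100' AS INT) returns the integer 100.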
Table Generating Functions: These functions transform a single row into multiple rows. EXPLODE is the only table generating function. This function takes an array as input and outputs the elements of the array into separate rows. The syntax of EXPLODE is

EXPLODE( ARRAY<A> )

When you use the table generating functions in the SELECT clause, you cannot
specify any other
columns in the SELECT clause.
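A minimal usage sketch (the table name orders and its array column items are hypothetical):

SELECT EXPLODE( items ) AS item FROM orders;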

Conditional Functions in Hive

Hive supports three types of conditional functions. These functions are listed
below:
IF( Test Condition, True Value, False Value )
The IF condition evaluates the 'Test Condition' and if the 'Test Condition' is true, then it returns the 'True Value'. Otherwise, it returns the 'False Value'.
Example: IF(1=1, 'working', 'not working') returns 'working'
COALESCE( value1,value2,... )
The COALESCE function returns the first non-NULL value from the list of values. If all the values in the list are NULL, then it returns NULL.
Example: COALESCE(NULL,NULL,5,NULL,4) returns 5
CASE Statement
The syntax for the case statement is:

CASE [ expression ]
WHEN condition1 THEN result1
WHEN condition2 THEN result2
...
WHEN conditionn THEN resultn
ELSE result
END

Here expression is optional. It is the value that you are comparing to the list of
conditions. (ie:
condition1, condition2, ... conditionn).
All the conditions must be of same datatype. Conditions are evaluated in the order
listed. Once a
condition is found to be true, the case statement will return the result and not
evaluate the conditions
any further.
All the results must be of same datatype. This is the value returned once a
condition is found to be
true.
If no condition is found to be true, then the case statement will return the value in the ELSE clause. If the ELSE clause is omitted and no condition is found to be true, then the case statement will return NULL.
Example:

CASE Fruit
WHEN 'APPLE' THEN 'The owner is APPLE'
WHEN 'ORANGE' THEN 'The owner is ORANGE'
ELSE 'It is another Fruit'
END

The other form of CASE is


CASE
WHEN Fruit = 'APPLE' THEN 'The owner is APPLE'
WHEN Fruit = 'ORANGE' THEN 'The owner is ORANGE'
ELSE 'It is another Fruit'
END

String Functions in Hive

The string functions in Hive are listed below:


ASCII( string str )
The ASCII function converts the first character of the string into its numeric
ascii value.
Example1: ASCII('hadoop') returns 104
Example2: ASCII('A') returns 65
CONCAT( string str1, string str2... )
The CONCAT function concatenates all the strings.
Example: CONCAT('hadoop','-','hive') returns 'hadoop-hive'
CONCAT_WS( string delimiter, string str1, string str2... )
The CONCAT_WS function is similar to the CONCAT function. Here you can also provide
the
delimiter, which can be used in between the strings to concat.
Example: CONCAT_WS('-','hadoop','hive') returns 'hadoop-hive'
FIND_IN_SET( string search_string, string source_string_list )
The FIND_IN_SET function searches for the search string in the source_string_list
and returns the
position of the first occurrence in the source string list. Here the source string list should be a comma delimited one. It returns 0 if the first argument contains a comma.
Example: FIND_IN_SET('ha','hao,mn,hc,ha,hef') returns 4
LENGTH( string str )
The LENGTH function returns the number of characters in a string.
Example: LENGTH('hive') returns 4
LOWER( string str ), LCASE( string str )
The LOWER or LCASE function converts the string into lower case letters.
Example: LOWER('HiVe') returns 'hive'
LPAD( string str, int len, string pad )
The LPAD function returns the string with a length of len characters left-padded
with pad.
Example: LPAD('hive',6,'v') returns 'vvhive'
LTRIM( string str )
The LTRIM function removes all the leading spaces from the string.
Example: LTRIM(' hive') returns 'hive'
REPEAT( string str, int n )
The REPEAT function repeats the specified string n times.
Example: REPEAT('hive',2) returns 'hivehive'
RPAD( string str, int len, string pad )
The RPAD function returns the string with a length of len characters right-padded
with pad.
Example: RPAD('hive',6,'v') returns 'hivevv'
REVERSE( string str )
The REVERSE function gives the reversed string
Example: REVERSE('hive') returns 'evih'
RTRIM( string str )
The RTRIM function removes all the trailing spaces from the string.
Example: RTRIM('hive ') returns 'hive'
SPACE( int number_of_spaces )
The SPACE function returns the specified number of spaces.
Example: SPACE(4) returns ' '
SPLIT( string str, string pat )
The SPLIT function splits the string around the pattern pat and returns an array of
strings. You can
specify regular expressions as patterns.
Example: SPLIT('hive:hadoop',':') returns ["hive","hadoop"]
SUBSTR( string source_str, int start_position [,int length] ), SUBSTRING( string
source_str,
int start_position [,int length] )
The SUBSTR or SUBSTRING function returns a part of the source string from the start
position with
the specified length of characters. If the length is not given, then it returns
from the start position to
the end of the string.
Example1: SUBSTR('hadoop',4) returns 'oop'
Example2: SUBSTR('hadoop',4,2) returns 'oo'
TRIM( string str )
The TRIM function removes both the trailing and leading spaces from the string.
Example: TRIM(' hive ') returns 'hive'
UPPER( string str ), UCASE( string str )
The UPPER or UCASE function converts the string into upper case letters.
Example: UPPER('HiVe') returns 'HIVE'

Date Functions in Hive

Date data types do not exist in Hive. In fact the dates are treated as strings in
Hive. The date
functions are listed below.
UNIX_TIMESTAMP()
This function returns the number of seconds from the Unix epoch (1970-01-01
[Link] UTC) using
the default time zone.
UNIX_TIMESTAMP( string date )
This function converts the date in format 'yyyy-MM-dd HH:mm:ss' into Unix
timestamp. This will
return the number of seconds between the specified date and the Unix epoch. If it
fails, then it
returns 0.
Example: UNIX_TIMESTAMP('2000-01-01 [Link]') returns 946713600
UNIX_TIMESTAMP( string date, string pattern )
This function converts the date to the specified date format and returns the number
of seconds
between the specified date and Unix epoch. If it fails, then it returns 0.
Example: UNIX_TIMESTAMP('2000-01-01 [Link]','yyyy-MM-dd') returns 946713600
FROM_UNIXTIME( bigint number_of_seconds [, string format] )
The FROM_UNIXTIME function converts the specified number of seconds from the Unix epoch and returns the date in the format 'yyyy-MM-dd HH:mm:ss'.
Example: FROM_UNIXTIME( UNIX_TIMESTAMP() ) returns the current date including the
time. This
is equivalent to the SYSDATE in oracle.
TO_DATE( string timestamp )
The TO_DATE function returns the date part of the timestamp in the format 'yyyy-MM-
dd'.
Example: TO_DATE('2000-01-01 [Link]') returns '2000-01-01'
YEAR( string date )
The YEAR function returns the year part of the date.
Example: YEAR('2000-01-01 [Link]') returns 2000
MONTH( string date )
The MONTH function returns the month part of the date.
Example: MONTH('2000-03-01 [Link]') returns 3
DAY( string date ), DAYOFMONTH( date )
The DAY or DAYOFMONTH function returns the day part of the date.
Example: DAY('2000-03-01 [Link]') returns 1
HOUR( string date )
The HOUR function returns the hour part of the date.
Example: HOUR('2000-03-01 [Link]') returns 10
MINUTE( string date )
The MINUTE function returns the minute part of the timestamp.
Example: MINUTE('2000-03-01 [Link]') returns 20
SECOND( string date )
The SECOND function returns the second part of the timestamp.
Example: SECOND('2000-03-01 [Link]') returns 30
WEEKOFYEAR( string date )
The WEEKOFYEAR function returns the week number of the date.
Example: WEEKOFYEAR('2000-03-01 [Link]') returns 9
DATEDIFF( string date1, string date2 )
The DATEDIFF function returns the number of days between the two given dates.
Example: DATEDIFF('2000-03-01', '2000-01-10') returns 51
DATE_ADD( string date, int days )
The DATE_ADD function adds the number of days to the specified date
Example: DATE_ADD('2000-03-01', 5) returns '2000-03-06'
DATE_SUB( string date, int days )
The DATE_SUB function subtracts the number of days from the specified date.
Example: DATE_SUB('2000-03-01', 5) returns '2000-02-25'
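As a small combined sketch, DATE_SUB( TO_DATE( FROM_UNIXTIME( UNIX_TIMESTAMP() ) ), 1 ) returns yesterday's date in the 'yyyy-MM-dd' format.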

Difference between Normal Tables and External Tables in Hive


The difference between the normal tables and external tables can be seen in LOAD
and DROP
operations.
Normal Tables: Hive manages the normal tables created and moves the data into its
warehouse
directory.
As an example, consider the table creation and loading of data into the table.

CREATE TABLE <table name> (col string);


LOAD DATA INPATH '/user/husr/[Link]' INTO TABLE <table name>;

This LOAD will move the file [Link] from HDFS into Hive's warehouse directory for the table. If the table is dropped, then the table metadata and the data will be deleted.
External Tables: An external table refers to the data that is outside of the
warehouse directory.
As an example, consider the table creation and loading of data into the external
table.

CREATE EXTERNAL TABLE <external table name> ( col string)


LOCATION '/user/husr/<external table name>';
LOAD DATA INPATH '/user/husr/[Link]' INTO TABLE <external table name>;

In case of external tables, Hive does not move the data into its warehouse
directory. If the external
table is dropped, then the table metadata is deleted but not the data.
Note: Hive does not check whether the external table location exists or not at the
time the external
table is created.
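A minimal sketch of the difference at DROP time (the table names are hypothetical):

DROP TABLE managed_sales;   -- the metadata and the warehouse data files are removed
DROP TABLE external_sales;  -- only the metadata is removed; the files at the LOCATION remain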

Numeric and Mathematical Functions in Hive

The Numerical functions are listed below in alphabetical order. Use these functions
in SQL queries.
ABS( double n )
The ABS function returns the absolute value of a number.
Example: ABS(-100)
ACOS( double n )
The ACOS function returns the arc cosine of value n. This function returns Null if
the value n is not in
the range of -1<=n<=1.
Example: ACOS(0.5)
ASIN( double n )
The ASIN function returns the arc sin of value n. This function returns Null if the
value n is not in the
range of -1<=n<=1.
Example: ASIN(0.5)
BIN( bigint n )
The BIN function returns the number n in the binary format.
Example: BIN(100)
CEIL( double n ), CEILING( double n )
The CEIL or CEILING function returns the smallest integer greater than or equal to the decimal value n.
Example: CEIL(9.5)
CONV( bigint n, int from_base, int to_base )
The CONV function converts the given number n from one base to another base.
EXAMPLE: CONV(100, 10,2)
COS( double n )
The COS function returns the cosine of the value n. Here n should be specified in
radians.
Example: COS(180*3.1415926/180)
EXP( double n )
The EXP function returns e to the power of n. Where e is the base of natural
logarithm and its value
is 2.718.
Example: EXP(50)
FLOOR( double n )
The FLOOR function returns the largest integer less than or equal to the given
value n.
Example: FLOOR(10.9)
HEX( bigint n)
This function converts the value n into hexadecimal format.
Example: HEX(16)
HEX( string n )
This function converts each character into hex representation format.
Example: HEX('ABC')
LN( double n )
The LN function returns the natural log of a number.
Example: LN(123.45)
LOG( double base, double n )
The LOG function returns the base logarithm of the number n.
Example: LOG(3, 66)
LOG2( double n )
The LOG2 function returns the base-2 logarithm of the number n.
Example: LOG2(44)
LOG10( double n )
The LOG10 function returns the base-10 logarithm of the number n.
Example: LOG10(100)
NEGATIVE( int n ), NEGATIVE( double n )
The NEGATIVE function returns -n
Example: NEGATIVE(10)
PMOD( int m, int n ), PMOD( double m, double n )
The PMOD function returns the positive modulus of a number.
Example: PMOD(3,2)
POSITIVE( int n ), POSITIVE( double n )
The POSITIVE function returns n
Example: POSITIVE(-10)
POW( double m, double n ), POWER( double m, double n )
The POW or POWER function returns m value raised to the n power.
Example: POW(10,2)
RAND( [int seed] )
The RAND function returns a random number. If you specify the seed value, the
generated random
number will become deterministic.
Example: RAND( )
ROUND( double value [, int n] )
The ROUND function returns the value rounded to n decimal places.
Example: ROUND(123.456,2)
SIN( double n )
The SIN function returns the sin of a number. Here n should be specified in
radians.
Example: SIN(2)
SQRT( double n )
The SQRT function returns the square root of the number
Example: SQRT(4)
UNHEX( string n )
The UNHEX function is the inverse of HEX function. It converts the specified string
to the number
format.
Example: UNHEX('AB')

Data Types in Hive

Hive data types are categorized into two types. They are the primitive and complex
data types.
The primitive data types include Integers, Boolean, Floating point numbers and
strings. The below
table lists the size of each data type:

Type Size
----------------------
TINYINT 1 byte
SMALLINT 2 byte
INT 4 byte
BIGINT 8 byte
FLOAT 4 byte (single precision floating point numbers)
DOUBLE 8 byte (double precision floating point numbers)
BOOLEAN TRUE/FALSE value
STRING Max size is 2GB.

The complex data types include Arrays, Maps and Structs. These data types are built
on using the
primitive data types.
Arrays: Contain a list of elements of the same data type. These elements are accessed by using an index. For example, for an array 'fruits' containing the list of elements ['apple', 'mango', 'orange'], the element 'apple' can be accessed by specifying fruits[0], since Hive array indexes start at 0.
Maps: Contain key, value pairs. The elements are accessed by using the keys. For example, for a map 'pass_list' containing the 'user name' as key and 'password' as value, the password of the user can be accessed by specifying pass_list['username'].
Structs: Contain elements of different data types. The elements can be accessed by using the dot notation. For example, in a struct 'car', the color of the car can be retrieved by specifying [Link]
The create table statement containing the complex type is shown below.

CREATE TABLE complex_data_types


(
Fruits ARRAY<string>,
Pass_list MAP<STRING, STRING>,
Car STRUCT<color: STRING, wheel_size: FLOAT>
);
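A small query sketch accessing the complex columns of the table above (standard Hive element-access syntax):

SELECT Fruits[0], Pass_list['username'], Car.color FROM complex_data_types;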

What is Hive

The Hive data warehouse is used to manage large datasets residing in Hadoop and to query them. Hive can be used to access files stored in HDFS or in other data storage systems.
Hive provides an SQL dialect, called Hive QL, to read the data from the data storage system. Hive does not support the complete SQL-92 specification. It executes the queries via MapReduce jobs. Hive also provides the flexibility for users to create their own UDFs via the MapReduce framework; the programmer needs to write the mapper and reducer scripts.
As Hadoop is a batch processing system, the data processed and returned by Hadoop has high latency. So, Hive queries have high latency and therefore Hive is not suitable for online transaction processing.

Unix Interview Questions on Grep Command

Grep is one of the most powerful tools in Unix. Grep stands for "global search for regular expressions and print". The power of grep lies mostly in the use of regular expressions.
The general syntax of grep command is
grep [options] pattern [files]
1. Write a command to print the lines that have the pattern "july" in all the files in a particular directory?
grep july *
This will print all the lines in all files that contain the word "july" along with the file name. If any of the files contain words like "JULY" or "July", the above command would not print those lines.
2. Write a command to print the lines that has the word "july" in all the files in
a directory and also
suppress the filename in the output.
grep -h july *
3. Write a command to print the lines that has the word "july" while ignoring the
case.
grep -i july *
The -i option makes the grep command treat the pattern as case insensitive.
4. When you use a single file as input to the grep command to search for a pattern,
it won't print the
filename in the output. Now write a grep command to print the filename in the
output without using
the '-H' option.
grep pattern filename /dev/null
The /dev/null or null device is a special file that discards the data written to it. So, /dev/null is always an empty file.
Another way to print the filename is using the '-H' option. The grep command for
this is
grep -H pattern filename
5. Write a Unix command to display the lines in a file that do not contain the word
"july"?
grep -v july filename
The '-v' option tells the grep to print the lines that do not contain the specified
pattern.
6. Write a command to print the file names in a directory that has the word "july"?

grep -l july *
The '-l' option makes the grep command print only the filename without printing the content of the file. As soon as the grep command finds the pattern in a file, it prints the filename and stops searching the other lines in the file.
7. Write a command to print the file names in a directory that does not contain the
word "july"?
grep -L july *
The '-L' option makes the grep command print the filenames that do not contain the specified pattern.
8. Write a command to print the line numbers along with the line that has the word
"july"?
grep -n july filename
The '-n' option is used to print the line numbers in a file. The line numbers start
from 1
9. Write a command to print the lines that starts with the word "start"?
grep '^start' filename
The '^' symbol specifies the grep command to search for the pattern at the start of
the line.
10. Write a command to print the lines which end with the word "end"?
grep 'end$' filename
The '$' symbol specifies the grep command to search for the pattern at the end of
the line.
11. Write a command to select only those lines containing "july" as a whole word?
grep -w july filename
The '-w' option makes the grep command to search for exact whole words. If the
specified pattern is
found in a string, then it is not considered as a whole word. For example: In the
string
"mikejulymak", the pattern "july" is found. However "july" is not a whole word in
that string.

Swapping and Paging in Unix

Swapping
In swapping, the whole process is moved from the swap device to the main memory for execution. The process size must be less than or equal to the available main memory. Swapping is easier to implement but adds overhead to the system. Swapping systems do not handle the memory as flexibly as paging systems.
Paging
In paging, only the required memory pages are moved to main memory from the swap device for execution, so the process size does not matter. Paging gives the concept of virtual memory. It provides greater flexibility in mapping the virtual address space into the physical memory of the machine. It allows a greater number of processes to fit in the main memory simultaneously and allows a process size greater than the available physical memory. Demand paging systems handle the memory more flexibly.

Unix Interview Questions on Awk Command

Awk is a powerful tool in Unix. Awk is an excellent tool for processing files which have data arranged in a rows and columns format. It is a good filter and report writer.
1. How to run awk command specified in a file?
awk -f filename
2. Write a command to print the squares of numbers from 1 to 10 using awk command
awk 'BEGIN { for(i=1;i<=10;i++) {print "square of",i,"is",i*i;}}'
3. Write a command to find the sum of bytes (size of file) of all files in a
directory.
ls -l | awk 'BEGIN {sum=0} {sum = sum + $5} END {print sum}'
4. In the text file, some lines are delimited by colon and some are delimited by
space. Write a
command to print the third field of each line.
awk '{ if( $0 ~ /:/ ) { FS=":"; } else { FS =" "; } print $3 }' filename
5. Write a command to print the line number before each line?
awk '{print NR, $0}' filename
6. Write a command to print the second and third line of a file without using NR.
awk 'BEGIN {RS="";FS="\n"} {print $2,$3}' filename
7. Write a command to print zero byte size files?
ls -l | awk '/^-/ {if ($5 == 0) print $9 }'
8. Write a command to rename the files in a directory with "_new" as postfix?
ls -F | awk '{print "mv "$1" "$1"_new"}' | sh
9. Write a command to print the fields in a text file in reverse order?
awk 'BEGIN {ORS=""} { for(i=NF;i>0;i--) print $i," "; print "\n"}' filename
10. Write a command to find the total number of lines in a file without using NR
awk 'BEGIN {sum=0} {sum=sum+1} END {print sum}' filename
Another way to print the number of lines is by using the NR. The command is
awk 'END{print NR}' filename

Unix File Structure (File Tree)

The Unix file structure is organized in a reverse tree structure manner. A typical organization of files in a Unix system looks like an upside-down tree: the slash (/) indicates the root directory; names like etc, usr and local are directories; and [Link] is a file. The regular files in Unix are the leaves of the tree structure.

Different Types of Unix files

There are mainly three types of Unix files. They are

. Regular files
. Directories
. Special or Device files

Regular Files

Regular files hold data and executable programs. Executable programs are the
commands (ls) that
you enter on the prompt. The data can be anything and there is no specific format
enforced in the
way the data is stored.
The regular files can be visualized as the leaves in the UNIX tree.
Directories
Directories are files that contain other files and sub-directories. Directories are
used to organize the
data by keeping closely related files in the same place. The directories are just
like the folders in
windows operating system.
The kernel alone can write the directory file. When a file is added to or deleted
from this directory,
the kernel makes an entry.
A directory file can be visualized as the branch of the UNIX tree.
Special Or Device Files
These files represent the physical devices. Files can also refer to computer
hardware such as
terminals and printers. These device files can also refer to tape and disk drives,
CD-ROM players,
modems, network interfaces, scanners, and any other piece of computer hardware.
When a process
writes to a special file, the data is sent to the physical device associated with
it. Special files are not
literally files, but are pointers that point to the device drivers located in the
kernel. The protection
applicable to files is also applicable to physical devices.

Unix File System

The strength of the Unix lies in treating the files in a consistent way. For Unix a
file is a file. This
consistency makes it easy to work with files and the user does not have to learn
special commands
for new tasks. The user can write Unix programs easily without worrying about
whether he�s
communicating to a terminal, a printer, or an ordinary file on a disk drive.
For example a "cat" command can be used to display the contents of a file on
terminal screen and
can also send the file to a printer. As far as Unix is concerned the terminal and
the printer are files
just as other files.
Unix User Login Programs - Getty And Login

The kernel needs to know which user is logging in and how to communicate with that user. To do this the kernel invokes two programs, getty and login.
The kernel invokes the getty program for every user terminal. When the getty
program receives input
from the user, it invokes the login program. The login program verifies the
identity of the user by
checking the password file. If the user fails to provide valid password, the login
program returns the
control back to the getty program. If the user enters a valid password, the login
program takes the
user to the shell prompt.

Functions of Unix Shell

Some of the basic functions of shell are:

. Command line interpretation


. Program initiation
. Input-output redirection
. Pipeline connection
. Substitution of filenames
. Maintenance of variables
. Environment control
. Shell programming

Unix Outer Unit - Shell

The instructions to the kernel are complex and highly technical. To protect the kernel from the
shortcomings of the user, a shell is built around the kernel. The shell acts like a mediator between
the user and the kernel. Whenever a user runs a command, the shell interprets the command and
passes it to the kernel.
Three types of shells are standard in Unix:

. Bourne shell was developed by Stephen Bourne. It is the most widely used shell and is a
program named sh. The Bourne shell prompts with the $ symbol.
. Korn shell was developed by David Korn. The Korn shell has more features than the Bourne shell
and is called by the name ksh.
. C shell was developed by Bill Joy and is called by the name csh.

Unix Core Unit - Kernel

The kernel is the heart of a UNIX system and manages the hardware, executes processes, and so
on. When the computer is booted, the kernel is loaded into the computer's main memory and it
remains there until the computer is shut down. The kernel performs many low-level and
there until the computer is shut down. The kernel performs many low-level and
system-level
functions.

The tasks of kernel include


. Interpreting and sending basic instructions to the computer's processor.
. Running and scheduling the processes.
. Allocating the necessary hardware.
. Controlling the I/O operations.

What is a Unix Operating System

Unix is a multi-tasking, multi-user operating system. It is a layer between the hardware and the
applications that run on the computer. It has functions which manage the hardware and the
applications.

The structure of Unix operating system can be divided into three parts.

. Kernel is the core part of Unix which interacts with the hardware for low level
functions.
. Shell is the outer unit of Unix which interacts with the user to perform the
functions.
. File System.

Oracle Query to split the delimited data in a column to multiple rows

1. Consider the following table "t" data as the source

id value

----------

1 A,B,C

2 P,Q,R,S,T

3 M,N

Here the data in the value column is delimited by commas. Now write a query to split the delimited
data in the value column into multiple rows. The output should look like this:

id value

--------

1 A

1 B
1 C

2 P

2 Q

2 R

2 S

2 T

3 M

3 N

Solution:

SELECT t.id,
       CASE WHEN a.l = 1
            THEN substr(value, 1, instr(value,',',1,a.l)-1)
            ELSE substr(value,
                        instr(value,',',1,a.l-1)+1,
                        CASE WHEN instr(value,',',1,a.l) -
                                  instr(value,',',1,a.l-1) - 1 > 0
                             THEN instr(value,',',1,a.l) -
                                  instr(value,',',1,a.l-1) - 1
                             ELSE length(value)
                        END
                       )
       END final_value
FROM t,
     ( SELECT level l
       FROM DUAL
       CONNECT BY LEVEL <=
                  ( SELECT Max(length(value) - length(replace(value,',','')) + 1)
                    FROM t )
     ) a
WHERE length(value) - length(replace(value,',','')) + 1 >= a.l
ORDER BY t.id, a.l;
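On Oracle 10g and later, a shorter alternative is possible with the REGEXP_SUBSTR function. This is
only a sketch, not the original solution; it assumes the same table t with columns id and value and
should give the same output for this data.

SELECT t.id,
       REGEXP_SUBSTR(t.value, '[^,]+', 1, a.l) value
FROM t,
     ( SELECT LEVEL l
       FROM DUAL
       CONNECT BY LEVEL <= ( SELECT MAX(length(value) - length(replace(value,',','')) + 1)
                             FROM t )
     ) a
WHERE length(t.value) - length(replace(t.value,',','')) + 1 >= a.l
ORDER BY t.id, a.l;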

How to find (calculate) median using oracle sql query

A median is a value separating the higher half of a sample from the lower half. The median can be
found by arranging all the numerical values from lowest to highest and picking the middle one. If
there is an even number of values, then there is no single middle value and the median is defined as
the mean of the two middle values.
Now let us see how to calculate the median in Oracle using the employees table as an example.
Table name: employees

empid, deptid, salary


1, 100, 5000
2, 100, 3000
3, 100, 4000
5, 200, 6000
6, 200, 8000

The below query is used to calculate the median of employee salaries across the
entire table.

select empid,
       deptid,
       salary,
       percentile_cont(0.5) within group (order by salary desc)
         over () median
from employees;

The output of the above query is

empid, deptid, salary, median


1, 100, 5000, 5000
2, 100, 3000, 5000
3, 100, 4000, 5000
5, 200, 6000, 5000
6, 200, 8000, 5000

Now we will write a query to find the median of employee salaries in each
department.

select empid,
       deptid,
       salary,
       percentile_cont(0.5) within group (order by salary desc)
         over (partition by deptid) median
from employees;

The output of the above query is

empid, deptid, salary, median


1, 100, 5000, 4000
2, 100, 3000, 4000
3, 100, 4000, 4000
5, 200, 6000, 7000
6, 200, 8000, 7000
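As a side note, Oracle also provides a MEDIAN aggregate function (10g and later) that matches the
definition given above. A minimal sketch against the same employees table:

select deptid,
       median(salary) median_salary
from employees
group by deptid;

For department 100 this returns 4000 and for department 200 it returns 7000, the same values
shown in the output above.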

Oracle Complex Queries - Part 3

The source data is represented in the form of a tree structure. You can easily derive the parent-child
relationship between the elements. For example, B is the parent of D and E. As the element A is the
root element, it is at level 0. B and C are at level 1, and so on.

The above tree structure data is represented in a table as shown below.

c1, c2, c3, c4


A, B, D, H
A, B, D, I
A, B, E, NULL
A, C, F, NULL
A, C, G, NULL

Here in this table, column C1 is parent of column C2, column C2 is parent of column
C3, column C3
is parent of column C4.
Q1. Write a query to load the target table with the below data. Here you need to
generate sequence
numbers for each element and then you have to get the parent id. As the element "A"
is at root, it
does not have any parent and its parent_id is NULL.

id, element, lev, parent_id


1, A, 0, NULL
2, B, 1, 1
3, C, 1, 1
4, D, 2, 2
5, E, 2, 2
6, F, 2, 3
7, G, 2, 3
8, H, 3, 4
9, I, 3, 4

Solution:

WITH t1 AS
(
SELECT VALUE PARENT,
LEV,
LEAD(value,1) OVER (PARTITION BY r ORDER BY lev) CHILD
FROM (SELECT c1,
c2,
c3,
c4,
ROWNUM r
FROM table_name
)
UNPIVOT (value FOR lev IN (c1 as 0,c2 as 1,c3 as 2,c4 as 3))
),
t2 AS
(
SELECT PARENT,
LEV,
ROWNUM SEQ
FROM
(SELECT DISTINCT PARENT,
LEV
FROM T1
ORDER BY LEV
)
),
T3 AS
(
SELECT DISTINCT PARENT,
CHILD
FROM T1
WHERE CHILD IS NOT NULL
UNION ALL
SELECT DISTINCT NULL,
PARENT
FROM T1
WHERE LEV=0
)
SELECT C.SEQ Id,
       T3.CHILD ELEMENT,
       C.LEV,
       P.SEQ PARENT_ID
FROM T3
INNER JOIN
     T2 C
ON (T3.CHILD = C.PARENT)
LEFT OUTER JOIN
     T2 P
ON (T3.PARENT = P.PARENT)
ORDER BY C.SEQ;

Note: The unpivot function works in oracle 11g.
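If you want to try the above query, the source rows can be loaded with a script like the one below.
The table name table_name is the placeholder used in the query, and the VARCHAR2 data types are
assumptions; use whatever matches your environment.

CREATE TABLE table_name
(
C1 VARCHAR2(10),
C2 VARCHAR2(10),
C3 VARCHAR2(10),
C4 VARCHAR2(10)
);

INSERT INTO table_name VALUES ('A','B','D','H');
INSERT INTO table_name VALUES ('A','B','D','I');
INSERT INTO table_name VALUES ('A','B','E',NULL);
INSERT INTO table_name VALUES ('A','C','F',NULL);
INSERT INTO table_name VALUES ('A','C','G',NULL);

COMMIT;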

SQL Queries Interview Questions - Oracle Part 4

1. Consider the following friends table as the source

Name, Friend_Name

-----------------

sam, ram

sam, vamsi

vamsi, ram

vamsi, jhon

ram, vijay

ram, anand

Here ram and vamsi are friends of sam; ram and jhon are friends of vamsi, and so on. Now write a
query to find the friends of friends of sam. For sam, the friends of friends are ram, jhon, vijay and
anand. The output should look as

Name, Friend_of_Friend

----------------------

sam, ram

sam, jhon

sam, vijay

sam, anand

Solution:

SELECT f1.name,
       f2.friend_name as friend_of_friend
FROM friends f1,
     friends f2
WHERE f1.name = 'sam'
AND f1.friend_name = f2.name;

2. This is an extension to problem 1. In the output, you can see that ram is displayed as a friend of
friends. This is because ram is a mutual friend of sam and vamsi. Now extend the above query to
exclude mutual friends. The output should look as

Name, Friend_of_Friend

----------------------

sam, jhon

sam, vijay

sam, anand
Solution:

SELECT f1.name,
       f2.friend_name as friend_of_friend
FROM friends f1,
     friends f2
WHERE f1.name = 'sam'
AND f1.friend_name = f2.name
AND NOT EXISTS
    (SELECT 1 FROM friends f3
     WHERE f3.name = f1.name
     AND f3.friend_name = f2.friend_name);
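Another way to remove the mutual friends, shown here only as an alternative sketch to the NOT
EXISTS version above, is to subtract sam's direct friends with the MINUS set operator:

SELECT f1.name,
       f2.friend_name as friend_of_friend
FROM friends f1,
     friends f2
WHERE f1.name = 'sam'
AND f1.friend_name = f2.name
MINUS
SELECT name,
       friend_name
FROM friends
WHERE name = 'sam';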

3. Write a query to get the top 5 products based on the quantity sold without using
the row_number
analytical function? The source data looks as

Products, quantity_sold, year

-----------------------------

A, 200, 2009

B, 155, 2009

C, 455, 2009

D, 620, 2009

E, 135, 2009

F, 390, 2009

G, 999, 2010

H, 810, 2010
I, 910, 2010

J, 109, 2010

L, 260, 2010

M, 580, 2010

Solution:

SELECT products,
       quantity_sold,
       year
FROM
(
  SELECT products,
         quantity_sold,
         year
  FROM t
  ORDER BY quantity_sold DESC
) A
WHERE ROWNUM <= 5;
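If you are on Oracle 12c or later, the row-limiting clause gives the same top 5 without an inline view.
This is only an option; the classic ROWNUM solution above works on all versions.

SELECT products,
       quantity_sold,
       year
FROM t
ORDER BY quantity_sold DESC
FETCH FIRST 5 ROWS ONLY;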

4. This is an extension to the problem 3. Write a query to produce the same output
using
row_number analytical function?
Solution:

SELECT products,
       quantity_sold,
       year
FROM
(
  SELECT products,
         quantity_sold,
         year,
         row_number() OVER(ORDER BY quantity_sold DESC) r
  FROM t
) A
WHERE r <= 5;

5. This is an extension to the problem 3. write a query to get the top 5 products
in each year based
on the quantity sold?
Solution:

SELECT products,
       quantity_sold,
       year
FROM
(
  SELECT products,
         quantity_sold,
         year,
         row_number() OVER(
           PARTITION BY year
           ORDER BY quantity_sold DESC) r
  FROM t
) A
WHERE r <= 5;

SQL Queries Interview Questions - Oracle Part 3

Here I am providing Oracle SQL query interview questions. If you find any bugs in the queries,
please do comment so that I can rectify them.
1. Write a query to generate sequence numbers from 1 to the specified number N?
Solution:

SELECT LEVEL FROM DUAL CONNECT BY LEVEL<=&N;

2. Write a query to display only the Friday dates from Jan 2000 till now?
Solution:

SELECT C_DATE,
       TO_CHAR(C_DATE,'DY')
FROM
(
  SELECT TO_DATE('01-JAN-2000','DD-MON-YYYY')+LEVEL-1 C_DATE
  FROM DUAL
  CONNECT BY LEVEL <=
             (SYSDATE - TO_DATE('01-JAN-2000','DD-MON-YYYY')+1)
)
WHERE TO_CHAR(C_DATE,'DY') = 'FRI';
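Note that the literal 'FRI' only matches when the session date language is English. A variant that
pins the language explicitly (and avoids the month abbreviation in TO_DATE) is sketched below; it is
optional and produces the same dates.

SELECT C_DATE
FROM
(
  SELECT TO_DATE('01-01-2000','DD-MM-YYYY')+LEVEL-1 C_DATE
  FROM DUAL
  CONNECT BY LEVEL <= (SYSDATE - TO_DATE('01-01-2000','DD-MM-YYYY')+1)
)
WHERE TO_CHAR(C_DATE,'DY','NLS_DATE_LANGUAGE=ENGLISH') = 'FRI';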

3. Write a query to duplicate each row based on the value in the repeat column? The
input table data
looks like as below

Products, Repeat

----------------

A, 3

B, 5

C, 2

Now in the output data, the product A should be repeated 3 times, B should be
repeated 5 times and
C should be repeated 2 times. The output will look like as below

Products, Repeat

----------------

A, 3

A, 3

A, 3

B, 5

B, 5

B, 5

B, 5

B, 5

C, 2

C, 2
Solution:

SELECT PRODUCTS,
       REPEAT
FROM T,
     ( SELECT LEVEL L FROM DUAL
       CONNECT BY LEVEL <= (SELECT MAX(REPEAT) FROM T)
     ) A
WHERE T.REPEAT >= A.L
ORDER BY T.PRODUCTS;
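To try this query, the source table can be created as below. The table name T is taken from the
query above, and the data types are assumptions.

CREATE TABLE T
(
PRODUCTS VARCHAR2(10),
REPEAT INTEGER
);

INSERT INTO T VALUES ('A', 3);
INSERT INTO T VALUES ('B', 5);
INSERT INTO T VALUES ('C', 2);

COMMIT;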

4. Write a query to display each letter of the word "SMILE" in a separate row?

Solution:

SELECT SUBSTR('SMILE',LEVEL,1) A

FROM DUAL

CONNECT BY LEVEL <=LENGTH('SMILE');

5. Convert the string "SMILE" to ASCII values? The output should look like 83,77,73,76,69, where
83 is the ASCII value of S and so on.
The ASCII function gives the ASCII value of only one character. If you pass a string to the ASCII
function, it gives the ASCII value of the first letter in the string. Here I am providing two solutions to
get the ASCII values of a string.
Solution1:

SELECT SUBSTR(DUMP('SMILE'),15)

FROM DUAL;

Solution2:

SELECT WM_CONCAT(A)
FROM
(
  SELECT ASCII(SUBSTR('SMILE',LEVEL,1)) A
  FROM DUAL
  CONNECT BY LEVEL <= LENGTH('SMILE')
);
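WM_CONCAT is an undocumented function and is not available in recent Oracle releases, so a safer
sketch on 11gR2 and later uses the supported LISTAGG function:

SELECT LISTAGG(TO_CHAR(ASCII(SUBSTR('SMILE',LEVEL,1))), ',')
         WITHIN GROUP (ORDER BY LEVEL) ASCII_LIST
FROM DUAL
CONNECT BY LEVEL <= LENGTH('SMILE');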

SQL Queries Interview Questions - Oracle Part 2

This is a continuation of my previous post, SQL Queries Interview Questions - Oracle Part 1, where I
have used the PRODUCTS and SALES tables as an example. Here also I am using the same tables,
so take a look at the tables in that post and it will be easy for you to understand the questions
mentioned here.
Solve the below examples by writing SQL queries.
1. Write a query to find the products whose quantity sold in a year is greater than the average
quantity of the product sold across all the years?
Solution:
This can be solved with the help of a correlated subquery. The SQL query for this is

SELECT P.PRODUCT_NAME,
       S.YEAR,
       S.QUANTITY
FROM PRODUCTS P,
     SALES S
WHERE P.PRODUCT_ID = S.PRODUCT_ID
AND S.QUANTITY >
    (SELECT AVG(QUANTITY)
     FROM SALES S1
     WHERE S1.PRODUCT_ID = S.PRODUCT_ID
    );

PRODUCT_NAME YEAR QUANTITY

--------------------------

Nokia 2010 25

IPhone 2012 20

Samsung 2012 20

Samsung 2010 20
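The same result can also be produced without a correlated subquery by using the AVG analytic
function. This is just an alternative sketch over the same PRODUCTS and SALES tables:

SELECT PRODUCT_NAME,
       YEAR,
       QUANTITY
FROM
(
  SELECT P.PRODUCT_NAME,
         S.YEAR,
         S.QUANTITY,
         AVG(S.QUANTITY) OVER (PARTITION BY S.PRODUCT_ID) AVG_QUANTITY
  FROM PRODUCTS P,
       SALES S
  WHERE P.PRODUCT_ID = S.PRODUCT_ID
) A
WHERE QUANTITY > AVG_QUANTITY;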

2. Write a query to compare the products sales of "IPhone" and "Samsung" in each
year? The output
should look like as

YEAR IPHONE_QUANT SAM_QUANT IPHONE_PRICE SAM_PRICE

---------------------------------------------------

2010 10 20 9000 7000

2011 15 18 9000 7000


2012 20 20 9000 7000

Solution:
By using self-join SQL query we can get the required result. The required SQL query
is

SELECT S_I.YEAR,

S_I.QUANTITY IPHONE_QUANT,

S_S.QUANTITY SAM_QUANT,

S_I.PRICE IPHONE_PRICE,

S_S.PRICE SAM_PRICE

FROM PRODUCTS P_I,

SALES S_I,

PRODUCTS P_S,

SALES S_S

WHERE P_I.PRODUCT_ID = S_I.PRODUCT_ID

AND P_S.PRODUCT_ID = S_S.PRODUCT_ID

AND P_I.PRODUCT_NAME = 'IPhone'

AND P_S.PRODUCT_NAME = 'Samsung'

AND S_I.YEAR = S_S.YEAR;

3. Write a query to find the sales ratios of a product across years?

Solution:
The ratio of a product is calculated as the total sales price in a particular year divided by the total
sales price across all years. Oracle provides the RATIO_TO_REPORT analytic function for finding
such ratios. The SQL query is

SELECT P.PRODUCT_NAME,
       S.YEAR,
       RATIO_TO_REPORT(S.QUANTITY*S.PRICE)
         OVER(PARTITION BY P.PRODUCT_NAME) SALES_RATIO
FROM PRODUCTS P,
     SALES S
WHERE (P.PRODUCT_ID = S.PRODUCT_ID);

PRODUCT_NAME YEAR RATIO

-----------------------------

IPhone 2011 0.333333333

IPhone 2012 0.444444444

IPhone 2010 0.222222222

Nokia 2012 0.163265306

Nokia 2011 0.326530612

Nokia 2010 0.510204082

Samsung 2010 0.344827586

Samsung 2012 0.344827586

Samsung 2011 0.310344828

4. In the SALES table quantity of each product is stored in rows for every year.
Now write a query to
transpose the quantity for each product and display it in columns? The output
should look like as

PRODUCT_NAME QUAN_2010 QUAN_2011 QUAN_2012

------------------------------------------

IPhone 10 15 20

Samsung 20 18 20
Nokia 25 16 8

Solution:
Oracle 11g provides a pivot function to transpose the row data into column data.
The SQL query for
this is

SELECT * FROM
(
  SELECT P.PRODUCT_NAME,
         S.YEAR,
         S.QUANTITY
  FROM PRODUCTS P,
       SALES S
  WHERE (P.PRODUCT_ID = S.PRODUCT_ID)
) A
PIVOT ( MAX(QUANTITY) AS QUAN FOR (YEAR) IN (2010,2011,2012));

If you are not running oracle 11g database, then use the below query for
transposing the row data
into column data.

SELECT P.PRODUCT_NAME,
       MAX(DECODE(S.YEAR, 2010, S.QUANTITY)) QUAN_2010,
       MAX(DECODE(S.YEAR, 2011, S.QUANTITY)) QUAN_2011,
       MAX(DECODE(S.YEAR, 2012, S.QUANTITY)) QUAN_2012
FROM PRODUCTS P,
     SALES S
WHERE (P.PRODUCT_ID = S.PRODUCT_ID)
GROUP BY P.PRODUCT_NAME;

5. Write a query to find the number of products sold in each year?


Solution:
To get this result we have to group by year and then find the count. The SQL query for this question
is

SELECT YEAR,

COUNT(1) NUM_PRODUCTS

FROM SALES

GROUP BY YEAR;

YEAR NUM_PRODUCTS

------------------

2010 3

2011 3

2012 3
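If the same product could appear more than once in a year, counting distinct product ids is the safer
variant; with the sample data above it returns the same numbers.

SELECT YEAR,
       COUNT(DISTINCT PRODUCT_ID) NUM_PRODUCTS
FROM SALES
GROUP BY YEAR;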

SQL Queries Interview Questions - Oracle Part 1

As a database developer, writing SQL queries and PL/SQL code is part of daily life. Having good
knowledge of SQL is really important. Here I am posting some practical examples of SQL queries.
To solve these interview questions on SQL queries you have to create the products and sales tables
in your Oracle database. The "Create Table" and "Insert" statements are provided below.

CREATE TABLE PRODUCTS

(
PRODUCT_ID INTEGER,

PRODUCT_NAME VARCHAR2(30)

);

CREATE TABLE SALES

(
SALE_ID INTEGER,

PRODUCT_ID INTEGER,

YEAR INTEGER,

Quantity INTEGER,

PRICE INTEGER

);

INSERT INTO PRODUCTS VALUES ( 100, 'Nokia');

INSERT INTO PRODUCTS VALUES ( 200, 'IPhone');

INSERT INTO PRODUCTS VALUES ( 300, 'Samsung');

INSERT INTO PRODUCTS VALUES ( 400, 'LG');

INSERT INTO SALES VALUES ( 1, 100, 2010, 25, 5000);

INSERT INTO SALES VALUES ( 2, 100, 2011, 16, 5000);

INSERT INTO SALES VALUES ( 3, 100, 2012, 8, 5000);

INSERT INTO SALES VALUES ( 4, 200, 2010, 10, 9000);

INSERT INTO SALES VALUES ( 5, 200, 2011, 15, 9000);

INSERT INTO SALES VALUES ( 6, 200, 2012, 20, 9000);

INSERT INTO SALES VALUES ( 7, 300, 2010, 20, 7000);


INSERT INTO SALES VALUES ( 8, 300, 2011, 18, 7000);

INSERT INTO SALES VALUES ( 9, 300, 2012, 20, 7000);

COMMIT;

The products table contains the below data.

SELECT * FROM PRODUCTS;

PRODUCT_ID PRODUCT_NAME

-----------------------

100 Nokia

200 IPhone

300 Samsung

The sales table contains the following data.

SELECT * FROM SALES;

SALE_ID PRODUCT_ID YEAR QUANTITY PRICE

--------------------------------------

1 100 2010 25 5000

2 100 2011 16 5000

3 100 2012 8 5000

4 200 2010 10 9000

5 200 2011 15 9000

6 200 2012 20 9000


7 300 2010 20 7000

8 300 2011 18 7000

9 300 2012 20 7000

Here Quantity is the number of products sold in each year. Price is the sale price
of each product.
I hope you have created the tables in your oracle database. Now try to solve the
below SQL queries.
1. Write a SQL query to find the products which have a continuous increase in sales every year?
Solution:
Here 'IPhone' is the only product whose sales increase every year.
STEP1: First we will get the previous year sales for each product. The SQL query to do this is

SELECT P.PRODUCT_NAME,
       S.YEAR,
       S.QUANTITY,
       LEAD(S.QUANTITY,1,0) OVER (
         PARTITION BY P.PRODUCT_ID
         ORDER BY S.YEAR DESC
       ) QUAN_PREV_YEAR
FROM PRODUCTS P,
     SALES S
WHERE P.PRODUCT_ID = S.PRODUCT_ID;

PRODUCT_NAME YEAR QUANTITY QUAN_PREV_YEAR

-----------------------------------------
Nokia 2012 8 16

Nokia 2011 16 25

Nokia 2010 25 0

IPhone 2012 20 15

IPhone 2011 15 10

IPhone 2010 10 0

Samsung 2012 20 18

Samsung 2011 18 20

Samsung 2010 20 0

Here the LEAD analytic function gets the quantity of a product in its previous year.
STEP2: We will find the difference between the quantity of a product and its previous year's
quantity. If this difference is greater than or equal to zero for all the rows, then the product's sales
are constantly increasing. The final query to get the required result is

SELECT PRODUCT_NAME
FROM
(
  SELECT P.PRODUCT_NAME,
         S.QUANTITY -
         LEAD(S.QUANTITY,1,0) OVER (
           PARTITION BY P.PRODUCT_ID
           ORDER BY S.YEAR DESC
         ) QUAN_DIFF
  FROM PRODUCTS P,
       SALES S
  WHERE P.PRODUCT_ID = S.PRODUCT_ID
) A
GROUP BY PRODUCT_NAME
HAVING MIN(QUAN_DIFF) >= 0;

PRODUCT_NAME

------------

IPhone

2. Write a SQL query to find the products which do not have any sales at all?
Solution:
'LG' is the only product which does not have any sales. This can be achieved in three ways.
Method1: Using a left outer join.

SELECT P.PRODUCT_NAME
FROM PRODUCTS P
LEFT OUTER JOIN
     SALES S
ON (P.PRODUCT_ID = S.PRODUCT_ID)
WHERE S.PRODUCT_ID IS NULL;

PRODUCT_NAME

------------

LG
Method2: Using the NOT IN operator.

SELECT P.PRODUCT_NAME

FROM PRODUCTS P

WHERE P.PRODUCT_ID NOT IN

(SELECT DISTINCT PRODUCT_ID FROM SALES);

PRODUCT_NAME

------------

LG

Method3: Using the NOT EXISTS operator.

SELECT P.PRODUCT_NAME

FROM PRODUCTS P

WHERE NOT EXISTS

(SELECT 1 FROM SALES S WHERE S.PRODUCT_ID = P.PRODUCT_ID);

PRODUCT_NAME

------------

LG

3. Write a SQL query to find the products whose sales decreased in 2012 compared to
2011?
Solution:
Here Nokia is the only product whose sales decreased in year 2012 when compared
with the sales
in the year 2011. The SQL query to get the required output is
SELECT P.PRODUCT_NAME

FROM PRODUCTS P,

SALES S_2012,

SALES S_2011

WHERE P.PRODUCT_ID = S_2012.PRODUCT_ID

AND S_2012.YEAR = 2012

AND S_2011.YEAR = 2011

AND S_2012.PRODUCT_ID = S_2011.PRODUCT_ID

AND S_2012.QUANTITY < S_2011.QUANTITY;

PRODUCT_NAME

------------

Nokia

4. Write a query to select the top product sold in each year?


Solution:
Nokia is the top product sold in the year 2010. Similarly, Samsung in 2011 and
IPhone, Samsung in
2012. The query for this is

SELECT PRODUCT_NAME,
       YEAR
FROM
(
  SELECT P.PRODUCT_NAME,
         S.YEAR,
         RANK() OVER (
           PARTITION BY S.YEAR
           ORDER BY S.QUANTITY DESC
         ) RNK
  FROM PRODUCTS P,
       SALES S
  WHERE P.PRODUCT_ID = S.PRODUCT_ID
) A
WHERE RNK = 1;

PRODUCT_NAME YEAR

--------------------

Nokia 2010

Samsung 2011

IPhone 2012

Samsung 2012

5. Write a query to find the total sales of each product?

Solution:
This is a simple query. You just need to group the data by PRODUCT_NAME and then find the sum
of sales.

SELECT P.PRODUCT_NAME,
       NVL( SUM( S.QUANTITY*S.PRICE ), 0) TOTAL_SALES
FROM PRODUCTS P
LEFT OUTER JOIN
     SALES S
ON (P.PRODUCT_ID = S.PRODUCT_ID)
GROUP BY P.PRODUCT_NAME;

PRODUCT_NAME TOTAL_SALES

---------------------------

LG 0

IPhone 405000

Samsung 406000

Nokia 245000

SQL Query Interview Questions - Part 5

Write SQL queries for the below interview questions:

1. Load the below products table into the target table.

CREATE TABLE PRODUCTS

(
PRODUCT_ID INTEGER,

PRODUCT_NAME VARCHAR2(30)

);

INSERT INTO PRODUCTS VALUES ( 100, 'Nokia');

INSERT INTO PRODUCTS VALUES ( 200, 'IPhone');

INSERT INTO PRODUCTS VALUES ( 300, 'Samsung');

INSERT INTO PRODUCTS VALUES ( 400, 'LG');


INSERT INTO PRODUCTS VALUES ( 500, 'BlackBerry');

INSERT INTO PRODUCTS VALUES ( 600, 'Motorola');

COMMIT;

SELECT * FROM PRODUCTS;

PRODUCT_ID PRODUCT_NAME

-----------------------

100 Nokia

200 IPhone

300 Samsung

400 LG

500 BlackBerry

600 Motorola

The requirements for loading the target table are:

. Select only 2 products randomly.
. Do not select the products which are already loaded into the target table within the last 30 days.
. The target table should always contain only the products loaded in the last 30 days. It should not
contain products which were loaded more than 30 days ago.

Solution:

First we will create a target table. The target table will have an additional
column INSERT_DATE to
know when a product is loaded into the target table. The target
table structure is

CREATE TABLE TGT_PRODUCTS

(
PRODUCT_ID INTEGER,
PRODUCT_NAME VARCHAR2(30),

INSERT_DATE DATE

);

The next step is to pick 2 products randomly and load them into the target table. While selecting,
check whether the products are already present in the target table.

INSERT INTO TGT_PRODUCTS
SELECT PRODUCT_ID,
       PRODUCT_NAME,
       SYSDATE INSERT_DATE
FROM
(
  SELECT PRODUCT_ID,
         PRODUCT_NAME
  FROM PRODUCTS S
  WHERE NOT EXISTS (
          SELECT 1
          FROM TGT_PRODUCTS T
          WHERE T.PRODUCT_ID = S.PRODUCT_ID
        )
  ORDER BY DBMS_RANDOM.VALUE --Random number generator in oracle.
) A
WHERE ROWNUM <= 2;

The last step is to delete the products from the target table which were loaded more than 30 days
ago.
DELETE FROM TGT_PRODUCTS

WHERE INSERT_DATE < SYSDATE - 30;

2. Load the below CONTENTS table into the target table.

CREATE TABLE CONTENTS

(
CONTENT_ID INTEGER,

CONTENT_TYPE VARCHAR2(30)

);

INSERT INTO CONTENTS VALUES (1,'MOVIE');

INSERT INTO CONTENTS VALUES (2,'MOVIE');

INSERT INTO CONTENTS VALUES (3,'AUDIO');

INSERT INTO CONTENTS VALUES (4,'AUDIO');

INSERT INTO CONTENTS VALUES (5,'MAGAZINE');

INSERT INTO CONTENTS VALUES (6,'MAGAZINE');

COMMIT;

SELECT * FROM CONTENTS;

CONTENT_ID CONTENT_TYPE

-----------------------

1 MOVIE

2 MOVIE
3 AUDIO

4 AUDIO

5 MAGAZINE

6 MAGAZINE

The requirements to load the target table are:

. Load only one content type at a time into the target table.
. The target table should always contain only one content type.
. The loading of content types should follow a round-robin style: first MOVIE, second AUDIO, third
MAGAZINE, and then MOVIE again.

Solution:

First we will create a lookup table where we specify the priorities for the content types. The lookup
table "Create Statement" and data are shown below.

CREATE TABLE CONTENTS_LKP

(
CONTENT_TYPE VARCHAR2(30),

PRIORITY INTEGER,

LOAD_FLAG INTEGER

);

INSERT INTO CONTENTS_LKP VALUES('MOVIE',1,1);

INSERT INTO CONTENTS_LKP VALUES('AUDIO',2,0);

INSERT INTO CONTENTS_LKP VALUES('MAGAZINE',3,0);

COMMIT;

SELECT * FROM CONTENTS_LKP;


CONTENT_TYPE PRIORITY LOAD_FLAG

---------------------------------

MOVIE 1 1

AUDIO 2 0

MAGAZINE 3 0

Here, if LOAD_FLAG is 1, it indicates which content type needs to be loaded into the target table.
Only one content type will have LOAD_FLAG as 1; the other content types will have LOAD_FLAG
as 0. The target table structure is the same as the source table structure.

The second step is to truncate the target table before loading the data

TRUNCATE TABLE TGT_CONTENTS;

The third step is to choose the appropriate content type from the lookup table to
load the source data
into the target table.

INSERT INTO TGT_CONTENTS

SELECT CONTENT_ID,

CONTENT_TYPE

FROM CONTENTS

WHERE CONTENT_TYPE = (SELECT CONTENT_TYPE FROM CONTENTS_LKP WHERE LOAD_FLAG = 1);

The last step is to update the LOAD_FLAG of the Lookup table.

UPDATE CONTENTS_LKP

SET LOAD_FLAG = 0

WHERE LOAD_FLAG = 1;
UPDATE CONTENTS_LKP

SET LOAD_FLAG = 1

WHERE PRIORITY = (
  SELECT DECODE( PRIORITY, (SELECT MAX(PRIORITY) FROM CONTENTS_LKP), 1, PRIORITY+1)
  FROM CONTENTS_LKP
  WHERE CONTENT_TYPE = (SELECT DISTINCT CONTENT_TYPE FROM TGT_CONTENTS)
);

Informatica Interview Questions - Part3

1. What is polling?
Polling displays the updated information about the session in the monitor window.
The monitor
window displays the status of each session when you poll the informatica server.

2. In which circumstances does the informatica server create reject files?

When the informatica server encounters DD_Reject in an update strategy transformation, when it
violates database constraints, or when fields in the rows are truncated or overflowed.
3. What are the data movement modes in informatica?
Data movement mode determines how informatica server handles the character data.
You can
choose the data movement mode in the informatica server configuration settings. Two
types of data
movement modes are available in informatica. They are ASCII mode and Unicode mode.

4. Define mapping and session?

. Mapping: It is a set of source and target definitions linked by transformation objects that define
the rules for transformation.
. Session: It is a set of instructions that describe how and when to move data from sources to
targets.

5. Can you generate reports in Informatica?

Yes. By using the Metadata Reporter we can generate reports in informatica.

6. What is metadata reporter?


It is a web based application that enables you to run reports against repository
metadata. With a
metadata reporter, you can access information about the repository without having
knowledge of
SQL, transformation language or underlying tables in the repository.
7. What is the default source option for update strategy transformation?
Data driven.
8. What is Data driven?
The informatica server follows the instructions coded in the update strategy
transformations with in
the mapping and determines how to flag the records for insert, update, delete or
reject. If you do not
choose data driven option setting, the informatica server ignores all update
strategy transformations
in the mapping.
9. What is source qualifier transformation?
When you add a relational or a flat file source definition to a mapping, you need
to connect it to a
source qualifier transformation. The source qualifier transformation represents the
records that the
informatica server reads when it runs a session.

10. What are the tasks that the source qualifier performs?

. Joins data originating from the same source database.
. Filters records when the informatica server reads source data.
. Specifies an outer join rather than the default inner join.
. Specifies sorted records.
. Selects only distinct values from the source.
. Creates a custom query to issue a special SELECT statement for the informatica server to read
the source data.

11. What is the default join that source qualifier provides?


Equi Join

12. What are the basic requirements to join two sources in a source qualifier
transformation using
default join?

. The two sources should have primary key and foreign key relationship.
. The two sources should have matching data types.

Informatica Interview Questions - Part 2

1. What are the differences between joiner transformation and source qualifier
transformation?

A joiner transformation can join heterogeneous data sources, whereas a source qualifier can join
only homogeneous sources. A source qualifier transformation can join data only from relational
sources and cannot join flat files.
2. What are the limitations of joiner transformation?

. Both pipelines begin with the same original data source.


. Both input pipelines originate from the same Source Qualifier transformation.
. Both input pipelines originate from the same Normalizer transformation.
. Both input pipelines originate from the same Joiner transformation.
. Either input pipelines contains an Update Strategy transformation.
. Either input pipelines contains a connected or unconnected Sequence Generator
transformation.

3. What are the settings that you use to configure the joiner transformation?

The following settings are used to configure the joiner transformation.

. Master and detail source


. Type of join
. Condition of the join

4. What are the join types in joiner transformation?


The join types are

. Normal (Default)
. Master outer
. Detail outer
. Full outer

5. What are the joiner caches?

When a Joiner transformation occurs in a session, the Informatica Server reads all
the records from
the master source and builds index and data caches based on the master rows. After
building the
caches, the Joiner transformation reads records from the detail source and performs
joins.

6. What is the look up transformation?

Lookup transformation is used to look up data in a relational table, view or synonym. The
informatica server queries the lookup table based on the lookup ports in the transformation. It
compares the lookup transformation port values to lookup table column values based on the lookup
condition.

7. Why use the lookup transformation?


Lookup transformation is used to perform the following tasks.

. Get a related value.


. Perform a calculation.
. Update slowly changing dimension tables.

8. What are the types of lookup transformation?

The types of lookup transformation are Connected and unconnected.


9. What is meant by lookup caches?
The informatica server builds a cache in memory when it processes the first row of
a data in a
cached look up transformation. It allocates memory for the cache based on the
amount you
configure in the transformation or session properties. The informatica server
stores condition values
in the index cache and output values in the data cache.
10. What are the types of lookup caches?

. Persistent cache: You can save the lookup cache files and reuse them the next
time the
informatica server processes a lookup transformation configured to use the cache.
. Re-cache from database: If the persistent cache is not synchronized with the
lookup table, you
can configure the lookup transformation to rebuild the lookup cache.
. Static cache: you can configure a static or read only cache for only lookup
table. By default
informatica server creates a static cache. It caches the lookup table and lookup
values in the
cache for each row that comes into the transformation. When the lookup condition is
true, the
informatica server does not update the cache while it processes the lookup
transformation.
. Dynamic cache: If you want to cache the target table and insert new rows into
cache and the
target, you can create a look up transformation to use dynamic cache. The
informatica server
dynamically inserts data to the target table.
. Shared cache: You can share the lookup cache between multiple transactions. You
can share
unnamed cache between transformations in the same mapping.

11. Which transformation should we use to normalize the COBOL and relational
sources?

Normalizer Transformation is used to normalize the data.

12. In which transformation you cannot drag ports into it?


Normalizer Transformation.
13. How the informatica server sorts the string values in Rank transformation?
When the informatica server runs in the ASCII data movement mode it sorts session
data using
Binary sort order. If you configure the session to use a binary sort order, the
informatica server
calculates the binary value of each string and returns the specified number of rows
with the highest
binary values for the string.
14. What are the rank caches?
During the session, the informatica server compares an input row with rows in the
data cache. If the
input row out-ranks a stored row, the informatica server replaces the stored row
with the input row.
The informatica server stores group information in an index cache and row data in a
data cache.
15. What is the Rankindex port in Rank transformation?
The Designer automatically creates a RANKINDEX port for each Rank transformation.
The
Informatica Server uses the Rank Index port to store the ranking position for each
record in a group.
16. What is the Router transformation?
A Router transformation is similar to a Filter transformation because both
transformations allow you
to use a condition to test data. However, a Filter transformation tests data for
one condition and
drops the rows of data that do not meet the condition. A Router transformation
tests data for one or
more conditions and gives you the option to route rows of data that do not meet any
of the conditions
to a default output group.
If you need to test the same input data based on multiple conditions, use a Router
Transformation in
a mapping instead of creating multiple Filter transformations to perform the same
task.
17. What are the types of groups in Router transformation?
The different types of groups in router transformation are

. Input group
. Output group

The output group contains two types. They are

. User defined groups


. Default group

18. What are the types of data that pass between the informatica server and a stored procedure?
Three types of data pass between the informatica server and the stored procedure:
. Input/Output parameters
. Return Values
. Status code.

19. What is the status code in stored procedure transformation?

Status code provides error handling for the informatica server during the session. The stored
procedure issues a status code that notifies whether or not the stored procedure completed
successfully. This value cannot be seen by the user. It is only used by the informatica server to
determine whether to continue running the session or stop.
20. What is the target load order?
You can specify the target load order based on source qualifiers in a mapping. If
you have the
multiple source qualifiers connected to the multiple targets, you can designate the
order in which
informatica server loads data into the targets.

Informatica Interview Questions - Part 1

1. While importing the relational source definition from the database, what are the
metadata of
source that will be imported?

The metadata of the source that will be imported are:

. Source name
. Database location
. Column names
. Data types
. Key constraints

2. How many ways a relational source definition can be updated and what are they?

There are two ways to update the relational source definition:

. Edit the definition


. Re-import the definition

3. To import the flat file definition into the designer, where should the flat file be placed?
Place the flat file in a local folder on the local machine.

4. To provide support for mainframe source data, which files are used as source definitions?
COBOL files
5. Which transformation is needed while using COBOL sources as source definitions?
As COBOL sources consist of denormalized data, a normalizer transformation is required to
normalize the data.
6. How to create or import flat file definition in to the warehouse designer?
We cannot create or import flat file definition into warehouse designer directly.
We can create or
import the file in source analyzer and then drag it into the warehouse designer.
7. What is a mapplet?
A mapplet is a set of transformations that you build in the mapplet designer and
can be used in
multiple mappings.
8. What is a transformation?
It is a repository object that generates, modifies or passes data.

9. What are the designer tools for creating transformations?

. Mapping designer
. Transformation developer
. Mapplet designer

10. What are active and passive transformations?


An active transformation can change the number of rows that pass through it. A
passive
transformation does not change the number of rows that pass through it.

11. What are connected or unconnected transformations?


An unconnected transformation is not connected to other transformations in the
mapping. Connected
transformation is connected to other transformations in the mapping pipeline.

12. How many ways are there to create ports?

There are two ways to create the ports:

. Drag the port from another transformation


. Click the add button on the ports tab.

13. What are the reusable transformations?


Reusable transformations can be used in multiple mappings and mapplets. When you
need to
include this transformation into a mapping or a mapplet, an instance of it is
dragged into the mapping
or mapplet. Since, the instance of reusable transformation is a pointer to that
transformation, any
change in the reusable transformation will be inherited by all the instances.

14. What are the methods for creating reusable transformations?

Two methods:

. Design it in the transformation developer.


. Promote a standard transformation (Non reusable) from the mapping designer. After
adding a
transformation to the mapping, we can promote it to the status of reusable
transformation.

15. What are the unsupported repository objects for a mapplet?


. COBOL source definition
. Joiner transformations
. Normalizer transformations
. Non reusable sequence generator transformations.
. Pre or post session stored procedures
. Target definitions
. Power mart 3.5 style Look Up functions
. XML source definitions
. IBM MQ source definitions

16. What are the mapping parameters and mapping variables?


. Mapping parameter represents a constant value which is defined before running a
session. A
mapping parameter retains the same value throughout the entire session. A parameter
can be
declared either in a mapping or mapplet and can have a default value. We can
specify the value
of the parameter in the parameter file and the session reads the parameter value
from the
parameter file.
. Unlike a mapping parameter, a mapping variable represents a value that can change throughout
the session. The informatica server saves the value of a mapping variable in the repository at the
end of the session run and uses that value the next time the session runs.

17. Can we use the mapping parameters or variables created in one mapping into
another mapping?
NO. We can use the mapping parameters or variables only in the transformations of
the same
mapping or mapplet in which we have created the mapping parameters or variables.

18. Can we use the mapping parameters or variables created in one mapping into any
other
reusable transformation?
Yes. As an instance of the reusable transformation created in the mapping belongs
to that mapping
only.

19. How can we improve session performance in aggregator transformation?


Use sorted input. Sort the input on the ports which are specified as group by ports
in aggregator.
20. What is aggregate cache in aggregator transformation?
The aggregator stores data in the aggregate cache until it completes aggregate
calculations. When
we run a session that uses an aggregator transformation, the informatica server
creates index and
data caches in memory to process the transformation. If the informatica server
requires more space,
it stores overflow values in cache files.

Case Converter Transformation in informatica

The Case Converter transformation is a passive transformation used to format data into consistent
character formats. The Case Converter transformation is used to maintain data quality.
The predefined case conversion types are uppercase, lowercase, toggle case, title
case and
sentence case.
Reference tables can also be used to control the case conversion. Use the "Valid"
column in the
reference table to change the case of input strings. Use reference tables only when
the case
conversion type is "title case or sentence case".
Case Strategy Properties:
You can create multiple case conversion strategies. Each strategy uses a single
conversion type.
Configure the following properties on the strategies view in the case converter
transformation:
Reference Tables: used to apply the capitalization format specified by a reference
table. Reference
tables work only if the case conversion option is title case or sentence case. If a
reference table
match occurs at the start of a string, the next character in that string changes to
uppercase. For
example, if the input string is vieditor and the reference table has an entry for
Vi, the output string is
ViEditor.
Conversion Types: The conversion types are uppercase, lowercase, toggle case, title
case and
sentence case. The default conversion type is uppercase.
Leave uppercase words unchanged: Overrides the chosen capitalization for uppercase
strings.
Delimiters: Specifies how capitalization functions work for title case conversion.
For example,
choose a colon as a delimiter to transform "james:bond" to "James:Bond". The
default delimiter is
the space character.

Informatica Scenario Based Questions - Part 4

Take a look at the following tree structure diagram. From the tree structure, you
can easily derive the
parent-child relationship between the elements. For example, B is parent of D and
E.

The above tree structure data is represented in a table as shown below.


c1, c2, c3, c4
A, B, D, H
A, B, D, I
A, B, E, NULL
A, C, F, NULL
A, C, G, NULL
Here in this table, column C1 is parent of column C2, column C2 is parent of column
C3, column C3
is parent of column C4.
Q1. Design a mapping to load the target table with the below data. Here you need to
generate
sequence numbers for each element and then you have to get the parent id. As the
element "A" is at
root, it does not have any parent and its parent_id is NULL.
id, element, parent_id
1, A, NULL
2, B, 1
3, C, 1
4, D, 2
5, E, 2
6, F, 3
7, G, 3
8, H, 4
9, I, 4
I have provided the solution for this problem as an Oracle SQL query in the "Oracle Complex
Queries - Part 3" section above.
Q2. This is an extension to problem Q1. Let us say column C2 has null for all the rows; then C1
becomes the parent of C3 and C3 is the parent of C4. Let us say both columns C2 and C3 have null
for all the rows; then C1 becomes the parent of C4. Design a mapping to accommodate these types
of null conditions.


Datastage Scenario Based Questions - Part 3

Here I am providing some more scenario based interview questions on DataStage. Try to solve
these scenarios and improve your technical skills. If you find solutions to these scenarios, please do
comment here.
1. Consider the following product types data as the source.

Product_id, product_type

------------------------

10, video

10, Audio

20, Audio

30, Audio

40, Audio

50, Audio
10, Movie

20, Movie

30, Movie

40, Movie

50, Movie

60, Movie

Assume that only 3 product types are available in the source. The source contains 12 records and
you don't know how many products are available in each product type.
Q1. Create a job to select 9 products in such a way that 3 products should be selected from video, 3
products should be selected from Audio and the remaining 3 products should be selected from
Movie.
Q2. In the above problem Q1, if the number of products in a particular product type is less than 3,
then you won't get the total of 9 records in the target table. For example, see the video type in the
source data. Now design a job in such a way that even if the number of products in a particular
product type is less than 3, you still get that many records from the other product types. For
example, if the number of products in video is 1, then the remaining 2 records should come from
audio or movies. So, the total number of records in the target table should always be 9.
2. Create a job to convert column data into row data.
The source data looks like

col1, col2, col3

----------------

a, b, c

d, e, f

The target table data should look like

Col

---

a

b

c

d

e

f

3. Create a job to convert row data into column data.


The source data looks like

id, value

---------

10, a

10, b

10, c

20, d

20, e

20, f

The target table data should look like

id, col1, col2, col3

--------------------

10, a, b, c

20, d, e, f

Datastage Scenario Based Questions - Part 2

Here I am providing some scenario based questions on DataStage. These scenarios not only help
you prepare for interviews, they will also help you improve your technical skills in DataStage. Try to
solve the below scenario based questions.
1. Consider the following employees data as source?

employee_id, salary

-------------------

10, 1000

20, 2000

30, 3000

40, 5000

Q1. Create a job to load the cumulative sum of salaries of employees into target
table?
The target table data should look like as

employee_id, salary, cumulative_sum

-----------------------------------

10, 1000, 1000

20, 2000, 3000

30, 3000, 6000

40, 5000, 11000

Q2. Create a job to get the previous row salary for the current row. If there is no previous row for
the current row, then the previous row salary should be displayed as null.
The output should look like as
employee_id, salary, pre_row_salary

-----------------------------------

10, 1000, Null

20, 2000, 1000

30, 3000, 2000

40, 5000, 3000

Q3. Create a job to get the next row salary for the current row. If there is no
next row for the current
row, then the next row salary should be displayed as null.
The output should look like as

employee_id, salary, next_row_salary

------------------------------------

10, 1000, 2000

20, 2000, 3000

30, 3000, 5000

40, 5000, Null

Q4. Create a job to find the sum of salaries of all employees and this sum should
repeat for all the
rows.
The output should look like as

employee_id, salary, salary_sum

-------------------------------

10, 1000, 11000

20, 2000, 11000

30, 3000, 11000


40, 5000, 11000

2. Consider the following employees table as source

department_no, employee_name

----------------------------

20, R

10, A

10, D

20, P

10, B

10, C

20, Q

20, S

Q1. Create a job to load a target table with the following values from the above
source?

department_no, employee_list

--------------------------------

10, A

10, A,B

10, A,B,C

10, A,B,C,D

20, A,B,C,D,P

20, A,B,C,D,P,Q

20, A,B,C,D,P,Q,R
20, A,B,C,D,P,Q,R,S

Q2. Create a job to load a target table with the following values from the above
source?

department_no, employee_list

----------------------------

10, A

10, A,B

10, A,B,C

10, A,B,C,D

20, P

20, P,Q

20, P,Q,R

20, P,Q,R,S

Q3. Create a job to load a target table with the following values from the above
source?

department_no, employee_names

-----------------------------

10, A,B,C,D

20, P,Q,R,S

Datastage Scenario Based Questions - Part 1


Datastage Scenario Based Questions - Part 1

Most of you simply prepare for interviews by reading conceptual questions and ignore preparing for
the scenario questions. That is the reason I am providing here the scenarios which are mostly asked
in interviews. Be prepared with the below interview questions.
1. Create a job to load the first 3 records from a flat file into a target table?
2. Create a job to load the last 3 records from a flat file into a target table?
3. Create a job to load the first record from a flat file into one table A, the
last record from a flat file
into table B and the remaining records into table C?
4. Consider the following products data which contain duplicate records.

Answer the below questions


Q1. Create a job to load all unique products in one table and the duplicate rows in
to another table.
The first table should contain the following output

The second target should contain the following output



Q2. Create a job to load each product once into one table and the remaining
products which are
duplicated into another table.
The first table should contain the following output

The second table should contain the following output

Business Objects Interview Questions - Part 4

1. What are the different types of Universes?


Simple universe and Complex universe.
2. What are Universe parameters?
The universe parameters are Name of the universe, Description, RDBMS connection,
size and
rights.
3. Explain detail objects?
Detail objects are attached to dimensions; one cannot drill on details or link on
details when linking
multiple data providers. While Customer ID would be a dimension, customer name,
address, phone
and so on should be details.
4. What is a loop? How can we overcome?
Loop is nothing but a closed circular flow; it can be overcome by making use of
Alias and Context.
5. What is fan trap?
Fan trap is a type of join path between three tables where a one-to-many links a
table which in turn
is linked to another table by one-to-many join.
6. What is Chasm trap?
Chasm trap is a type of join path between three tables where two many-to-one join
converges to a
single table and there is no context in place to separate the converging paths.
7. What is a context in business objects?
Context is used to resolve loops in the universe. Context is a rule by which the
designer can decide
which path to choose when more than one join path exists from one table to another
table.
8. What is detect alias in business objects?
Detect alias is used to resolve the loops in universe because of the joins. Alias
is created on a table
which will have the same columns as of the original table.
9. What do you mean by Object qualification?
Object qualification represents what kind of object it is. We have three types of object qualifiers:
measure, dimension and detail.
10. What is the default behavior when creating a report from two queries from the
same universe?
Dimension objects are automatically merged.

Business Objects Interview Questions - Part 3

1. What are Linked Universes?


If the data is provided from two different data providers, then we can link those two universes; such
a universe is called a linked universe.
2. What is an alerter?
Alerters are used to draw attention to a block of data by highlighting it.
3. What is a break?
Break is used to group the data without any change in the format. Break is applied
on the whole
block. It gives the row at the bottom of block for subtotal. It breaks a whole
report into the smaller
groups according to the selected column. When you apply a chart on this block, it
gives you one
chart as it is only a single block.
4. What is a condition and filter?

. A condition forces a query to retrieve only the data that meets the criteria.
Condition is not
reusable.
. A filter is applied on a report and allows you to view the required data. Filter
restricts the number
of rows displayed in the report.

5. What is the difference between master-detail and Breaks?


In a break, common fields are deleted and the table format is not changed. In master-detail, we
declare a certain entity as a master to get the detailed information or report. In this case the table
format is changed.
6. What is the use of AFD? Where it can be stored?
AFD is used to create dashboards. It can be stored in repository, corporate or
personal.
7. What is the use of BO SDK?
The BO SDK's main use is to suppress "no data to fetch" messages using macros.
8. How can you hide data using an alerter?
Build an alerter with a formula that changes the font color to the background
color.
9. How can you activate data tracking?
Select the "Track" button on the toolbar.
10. What is the use of BCA?
BCA is used to refresh, schedule, export and save reports as .html, .rtf, .xls and .pdf.

Business Objects Interview Questions - Part 2

1. What is [Link]?
[Link] file contains the information about the repository site i.e. it contains
the address of the
repository security domain.
2. What is a metric?
Metrics are a system of parameters or ways of quantitative and periodic assessment
of a process
that is to be measured; these are used to track trends, productivity.
3. What is the source for metrics?
Measure objects.
4. What is a Set?
A set is grouping of users.
5. Why do we need metrics and sets?
Metrics are used for analysis and Sets are used for grouping.
6. What is a section in Business objects report?
When you apply section on a block it divides the report into smaller sections and
the columns on
which you apply section will appear as the heading out of the block. When you apply
a chart on this
block every section have an individual chart for its own section.
7. What are the different sections available in Business objects?
The different sections are:

. Report Header
. Page Header
. Details
. Report Footer
. Page Footer

8. What is a local filter?


A filter which applies to a one single block in the report is called a local
filter. Local filters are
applicable to a specific data provider.
9. What is a global filter?
A filter which applies to all blocks in the report is called a global filter.
Global filters are applicable to
all data providers in a given report.
10. What is a Microcube?
Microcube is the bulk of data retrieved when you run the query. The BO server gets
the result of the
query from the database and makes all possible combinations of the query in the
microcube. Finally
you can see the report and can perform slicing and dicing.

Business Objects Interview Questions - Part 1

1. What is a repository?
Business objects repository is a set of database tables where the metadata of your
application is
stored.
2. When is the repository created?
In 5i/6i versions, the repository is created after installing the software. In Xi
version a repository is
created at the time of installation.
3. What is a domain?
A domain is nothing but a logical grouping of system tables.
4. How many domains are there in the basic setup and what are they?
There are three domains in the business objects. They are:

. Security domain: Contains the information about users/groups/access privileges etc.
. Universe domain: Contains the information about joins, loops, classes, objects and hierarchies.
. Document domain: Contains the output of reports.

5. Can we have multiple domains in business objects?

Yes, except for the security domain, which cannot be multiple.
6. What is a universe?
Universe is a semantic layer between database and the designer used to create
objects and
classes. It maps to data in the database.
7. What is a category?
A category is a grouping of certain entities.
8. What is an object?
An object is an instance of a class. It is nothing but an entity.
9. How you will link two universes?
In the BO designer, go to links option in the edit menu. Then universe parameter
dialog box opens.
Click add link button to select the universe from the list of available universes.
10. What are the different types of objects available in the universe class of
business objects?
The different types of objects are:

. Dimension object: Provides the parameters which are mainly focus for analysis.
Example:
customer name, country name
. Detail object: Provides the description of a dimension object but is not the
focus for analysis.
Example: customer address, phone number.
. Measure object: Provides the numerical quantities. Example: sales, revenue.

Informatica Interview Questions on Transformations

The most commonly used transformations are listed in the below table, along with whether each is
active or passive and connected or unconnected.

Aggregator - Active/Connected
Expression - Passive/Connected
Filter - Active/Connected
Joiner - Active/Connected
Lookup - Passive/Connected or Unconnected
Normalizer - Active/Connected
Rank - Active/Connected
Router - Active/Connected
Sequence Generator - Passive/Connected
Sorter - Active/Connected
Source Qualifier - Active/Connected
SQL - Active or Passive/Connected
Stored Procedure - Passive/Connected or Unconnected
Transaction Control - Active/Connected
Union - Active/Connected
Update Strategy - Active/Connected
1. What is a transformation?
A transformation is a repository object that generates, modifies, or passes data.
2. What is an active transformation?
An active transformation is the one which changes the number of rows that pass
through it.
Example: Filter transformation
3. What is a passive transformation?
A passive transformation is the one which does not change the number of rows that
pass through it.
Example: Expression transformation
4. What is a connected transformation?
A connected transformation is connected to the data flow or connected to the other
transformations
in the mapping pipeline.
Example: sorter transformation
5. What is an unconnected transformation?
An unconnected transformation is not connected to other transformations in the
mapping. An
unconnected transformation is called within another transformation and returns a
value to that
transformation.
Example: Unconnected lookup transformation, unconnected stored procedure
transformation
6. What are multi-group transformations?
Transformations having multiple input and output groups are called multi-group
transformations.
Examples: Custom, HTTP, Joiner, Router, Union, Unstructured Data, XML source
qualifier, XML
Target definition, XML parser, XML generator
7. List out all the transformations which use cache?
Aggregator, Joiner, Lookup, Rank, Sorter
8. What is a blocking transformation?
A transformation that blocks the input rows is called a blocking transformation.
Examples: Custom transformation, unsorted Joiner transformation
9. What is a reusable transformation?
A reusable transformation is the one which can be used in multiple mappings.
Reusable
transformation is created in transformation developer.
10. How do you promote a non-reusable transformation to reusable transformation?
Edit the transformation and check the Make Reusable option
11. How to create a non-reusable instance of reusable transformations?
In the navigator, select an existing transformation and drag the transformation
into the mapping
workspace. Hold down the Ctrl key before you release the transformation.
12. Which transformation can be created only as reusable transformation but not as
non-reusable
transformation?
External procedure transformation.

Informatica Interview Questions on Expression Transformation

1. What is an expression transformation?


An expression transformation is used to calculate values in a single row.
Example: salary+1000
2. How to generate sequence numbers using expression transformation?
Create a variable port in expression transformation and increment it by one for
every row. Assign
this variable port to an output port.
3. Consider the following employees data as source?
Employee_id, Salary

-------------------

10, 1000

20, 2000

30, 3000

40, 5000

Q1. Design a mapping to load the cumulative sum of salaries of employees into
target table?
The target table data should look like as

Employee_id, Salary, Cumulative_sum

-----------------------------------

10, 1000, 1000

20, 2000, 3000

30, 3000, 6000

40, 5000, 11000

Q2. Design a mapping to get the previous row salary for the current row. If there is no previous row for the current row, then the previous row salary should be displayed as null.
The output should look like as

Employee_id, Salary, Pre_row_salary

-----------------------------------

10, 1000, Null

20, 2000, 1000

30, 3000, 2000

40, 5000, 3000


4. Consider the following employees table as source

Department_no, Employee_name

----------------------------

20, R

10, A

10, D

20, P

10, B

10, C

20, Q

20, S

Q1. Design a mapping to load a target table with the following values from the
above source?

Department_no, Employee_list

----------------------------

10, A

10, A,B

10, A,B,C

10, A,B,C,D

20, A,B,C,D,P

20, A,B,C,D,P,Q

20, A,B,C,D,P,Q,R

20, A,B,C,D,P,Q,R,S
Q2. Design a mapping to load a target table with the following values from the
above source?

Department_no, Employee_list

----------------------------

10, A

10, A,B

10, A,B,C

10, A,B,C,D

20, P

20, P,Q

20, P,Q,R

20, P,Q,R,S

(The solutions to these scenarios are given in the scenario based questions below.)

Informatica Scenario Based Questions - Part 2

1. Consider the following employees data as source


employee_id, salary
10, 1000
20, 2000
30, 3000
40, 5000
Q1. Design a mapping to load the cumulative sum of salaries of employees into
target table?
The target table data should look like as
employee_id, salary, cumulative_sum
10, 1000, 1000
20, 2000, 3000
30, 3000, 6000
40, 5000, 11000
Solution:
Connect the source Qualifier to expression transformation. In the expression
transformation, create
a variable port V_cum_sal and in the expression editor write V_cum_sal+salary.
Create an output
port O_cum_sal and assign V_cum_sal to it.
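To make the variable-port behaviour concrete, here is a minimal Python sketch of the same row-by-row logic (plain Python rather than Informatica syntax; the data is the sample from this question and the names mirror the ports above):

# Cumulative sum of salaries, computed row by row like the variable port V_cum_sal.
rows = [(10, 1000), (20, 2000), (30, 3000), (40, 5000)]

v_cum_sal = 0                                  # plays the role of the variable port
for employee_id, salary in rows:
    v_cum_sal = v_cum_sal + salary             # V_cum_sal + salary
    print(employee_id, salary, v_cum_sal)      # O_cum_sal is the output port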
Q2. Design a mapping to get the previous row salary for the current row. If there is no previous row for the current row, then the previous row salary should be displayed as null.
The output should look like as
employee_id, salary, pre_row_salary
10, 1000, Null
20, 2000, 1000
30, 3000, 2000
40, 5000, 3000
Solution:
Connect the source Qualifier to expression transformation. In the expression
transformation, create
a variable port V_count and increment it by one for each row entering the
expression transformation.
Also create a V_salary variable port and assign the expression IIF(V_count=1,NULL,V_prev_salary) to it. Then create one more variable port V_prev_salary and assign Salary to it. Now
create output port
O_prev_salary and assign V_salary to it. Connect the expression transformation to
the target ports.
In the expression transformation, the ports will be
employee_id
salary
V_count=V_count+1
V_salary=IIF(V_count=1,NULL,V_prev_salary)
V_prev_salary=salary
O_prev_salary=V_salary
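As a cross-check, a minimal Python sketch of the same logic (not Informatica syntax) shows why the port order matters: V_prev_salary still holds the previous row's value when V_salary is evaluated.

# Previous-row salary, evaluated in the same order as the ports listed above.
rows = [(10, 1000), (20, 2000), (30, 3000), (40, 5000)]

v_count = 0
v_prev_salary = None
for employee_id, salary in rows:
    v_count = v_count + 1
    v_salary = None if v_count == 1 else v_prev_salary   # IIF(V_count=1, NULL, V_prev_salary)
    v_prev_salary = salary                                # updated only after V_salary is set
    print(employee_id, salary, v_salary)                  # O_prev_salary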
Q3. Design a mapping to get the next row salary for the current row. If there is no
next row for the
current row, then the next row salary should be displayed as null.
The output should look like as
employee_id, salary, next_row_salary
10, 1000, 2000
20, 2000, 3000
30, 3000, 5000
40, 5000, Null
Solution:
Step1: Connect the source qualifier to two expression transformations. In each
expression
transformation, create a variable port V_count and in the expression editor write
V_count+1. Now
create an output port O_count in each expression transformation. In the first
expression
transformation, assign V_count to O_count. In the second expression transformation
assign
V_count-1 to O_count.
In the first expression transformation, the ports will be
employee_id
salary
V_count=V_count+1
O_count=V_count
In the second expression transformation, the ports will be
employee_id
salary
V_count=V_count+1
O_count=V_count-1
Step2: Connect both the expression transformations to joiner transformation and
join them on the
port O_count. Consider the first expression transformation as Master and second one
as detail. In
the joiner specify the join type as Detail Outer Join. In the joiner transformation
check the property
sorted input, then only you can connect both expression transformations to joiner
transformation.
Step3: Pass the output of joiner transformation to a target table. From the joiner,
connect the
employee_id, salary which are obtained from the first expression transformation to
the employee_id,
salary ports in target table. Then from the joiner, connect the salary which is
obtained from the
second expression transformation to the next_row_salary port in the target table.
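The idea of the self-join can be sketched in a few lines of Python (illustrative only, not Informatica syntax): number the rows twice, once with O_count and once with O_count - 1, and outer-join the two streams on that number.

rows = [(10, 1000), (20, 2000), (30, 3000), (40, 5000)]

# First expression: O_count = V_count, i.e. 1, 2, 3, 4
master = [(i, emp, sal) for i, (emp, sal) in enumerate(rows, start=1)]
# Second expression: O_count = V_count - 1, i.e. 0, 1, 2, 3; keep only the salary
detail = {i - 1: sal for i, (_emp, sal) in enumerate(rows, start=1)}

# Detail outer join on O_count: keep every master row, pick up the detail salary if any.
for o_count, employee_id, salary in master:
    next_row_salary = detail.get(o_count)      # None when there is no next row
    print(employee_id, salary, next_row_salary)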
Q4. Design a mapping to find the sum of salaries of all employees and this sum
should repeat for all
the rows.
The output should look like as
employee_id, salary, salary_sum
10, 1000, 11000
20, 2000, 11000
30, 3000, 11000
40, 5000, 11000
Solution:
Step1: Connect the source qualifier to the expression transformation. In the
expression
transformation, create a dummy port and assign value 1 to it.
In the expression transformation, the ports will be
employee_id
salary
O_dummy=1
Step2: Pass the output of expression transformation to aggregator. Create a new
port
O_sum_salary and in the expression editor write SUM(salary). Do not specify group
by on any port.
In the aggregator transformation, the ports will be
salary
O_dummy
O_sum_salary=SUM(salary)
Step3: Pass the output of expression transformation, aggregator transformation to
joiner
transformation and join on the DUMMY port. In the joiner transformation check the
property sorted
input, then only you can connect both expression and aggregator to joiner
transformation.
Step4: Pass the output of joiner to the target table.
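The same approach in a short Python sketch (illustrative only, not Informatica syntax): aggregate once on a dummy key and join the total back to every detail row.

rows = [(10, 1000), (20, 2000), (30, 3000), (40, 5000)]

o_sum_salary = sum(sal for _, sal in rows)     # aggregator with no group by, O_dummy = 1
for employee_id, salary in rows:               # expression output, O_dummy = 1
    print(employee_id, salary, o_sum_salary)   # join on the dummy port, then load the target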
2. Consider the following employees table as source
department_no, employee_name
20, R
10, A
10, D
20, P
10, B
10, C
20, Q
20, S
Q1. Design a mapping to load a target table with the following values from the
above source?
department_no, employee_list
10, A
10, A,B
10, A,B,C
10, A,B,C,D
20, A,B,C,D,P
20, A,B,C,D,P,Q
20, A,B,C,D,P,Q,R
20, A,B,C,D,P,Q,R,S
Solution:
Step1: Use a sorter transformation and sort the data using the sort key as
department_no and then
pass the output to the expression transformation. In the expression transformation,
the ports will be
department_no
employee_name
V_employee_list =
IIF(ISNULL(V_employee_list),employee_name,V_employee_list||','||employee_name)
O_employee_list = V_employee_list
Step2: Now connect the expression transformation to a target table.
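A minimal Python sketch of the running-list logic (plain Python, not Informatica syntax; the rows are sorted by department and name here so the output matches the sample target data):

rows = [(20, 'R'), (10, 'A'), (10, 'D'), (20, 'P'),
        (10, 'B'), (10, 'C'), (20, 'Q'), (20, 'S')]
rows.sort()                                    # sorter step

v_employee_list = None
for department_no, employee_name in rows:
    # IIF(ISNULL(V_employee_list), employee_name, V_employee_list || ',' || employee_name)
    v_employee_list = employee_name if v_employee_list is None \
        else v_employee_list + ',' + employee_name
    print(department_no, v_employee_list)      # O_employee_list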
Q2. Design a mapping to load a target table with the following values from the
above source?
department_no, employee_list
10, A
10, A,B
10, A,B,C
10, A,B,C,D
20, P
20, P,Q
20, P,Q,R
20, P,Q,R,S
Solution:
Step1: Use a sorter transformation and sort the data using the sort key as
department_no and then
pass the output to the expression transformation. In the expression transformation,
the ports will be
department_no
employee_name
V_curr_deptno=department_no
V_employee_list = IIF(V_curr_deptno != V_prev_deptno, employee_name, V_employee_list||','||employee_name)
V_prev_deptno=department_no
O_employee_list = V_employee_list
Step2: Now connect the expression transformation to a target table.
Q3. Design a mapping to load a target table with the following values from the
above source?
department_no, employee_names
10, A,B,C,D
20, P,Q,R,S
Solution:
The first step is same as the above problem. Pass the output of expression to an
aggregator
transformation and specify the group by as department_no. Now connect the
aggregator
transformation to a target table.
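For Q2 and Q3 the only change is that the running list resets when the department changes, and Q3 then keeps one row per department. A short Python sketch of that logic (illustrative only, not Informatica syntax):

rows = [(20, 'R'), (10, 'A'), (10, 'D'), (20, 'P'),
        (10, 'B'), (10, 'C'), (20, 'Q'), (20, 'S')]
rows.sort()                                    # sorted by department_no and name

v_prev_deptno = None
v_employee_list = None
per_dept = {}                                  # aggregator grouped by department_no
for department_no, employee_name in rows:
    if department_no != v_prev_deptno:         # reset the list when the department changes
        v_employee_list = employee_name
    else:
        v_employee_list = v_employee_list + ',' + employee_name
    v_prev_deptno = department_no
    print(department_no, v_employee_list)      # Q2 output
    per_dept[department_no] = v_employee_list  # the last (longest) list wins per department

for department_no, employee_names in sorted(per_dept.items()):
    print(department_no, employee_names)       # Q3 output: 10 A,B,C,D and 20 P,Q,R,S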

Informatica Interview Questions on Aggregator Transformation

1. What is aggregator transformation?


Aggregator transformation performs aggregate calculations like sum, average, count etc. It is an active transformation; it changes the number of rows in the pipeline. Unlike the expression transformation (which performs calculations on a row-by-row basis), an aggregator transformation performs calculations on a group of rows.
2. What is aggregate cache?
The integration service creates index and data cache in memory to process the
aggregator
transformation and stores the data group in index cache, row data in data cache. If
the integration
service requires more space, it stores the overflow values in cache files.
3. How can we improve performance of aggregate transformation?

. Use sorted input: Sort the data before passing into aggregator. The integration
service uses
memory to process the aggregator transformation and it does not use cache memory.
. Filter the unwanted data before aggregating.
. Limit the number of input/output or output ports to reduce the amount of data the
aggregator
transformation stores in the data cache.

4. What are the different types of aggregate functions?


The different types of aggregate functions are listed below:

. AVG
. COUNT
. FIRST
. LAST
. MAX
. MEDIAN
. MIN
. PERCENTILE
. STDDEV
. SUM
. VARIANCE
5. Why cannot you use both single level and nested aggregate functions in a single
aggregate
transformation?
The nested aggregate function returns only one output row, whereas the single level
aggregate
function returns more than one row. Since the number of rows returned are not same,
you cannot
use both single level and nested aggregate functions in the same transformation. If
you include both
the single level and nested functions in the same aggregator, the designer marks
the mapping or
mapplet as invalid. So, you need to create separate aggregator transformations.
6. Up to how many levels, you can nest the aggregate functions?
We can nest up to two levels only.
Example: MAX( SUM( ITEM ) )
7. What is incremental aggregation?
The integration service performs aggregate calculations and then stores the data in
historical cache.
Next time when you run the session, the integration service reads only new data and
uses the
historical cache to perform new aggregation calculations incrementally.
8. Why cannot we use sorted input option for incremental aggregation?
In incremental aggregation, the aggregate calculations are stored in historical
cache on the server.
In this historical cache the data need not be in sorted order. If you give sorted
input, the records
come as presorted for that particular run but in the historical cache the data may
not be in the sorted
order. That is why this option is not allowed.
9. How the NULL values are handled in Aggregator?
You can configure the integration service to treat null values in aggregator
functions as NULL or
zero. By default the integration service treats null values as NULL in aggregate
functions.

Informatica Interview Questions on Filter Transformation

1. What is a filter transformation?


A filter transformation is used to filter out the rows in mapping. The filter
transformation allows the
rows that meet the filter condition to pass through and drops the rows that do not
meet the condition.
Filter transformation is an active transformation.
2. Can we specify more than one filter condition in a filter transformation?
We can specify only one condition in the filter transformation. To specify more than one condition, we have to use the router transformation.
3. In which case a filter transformation acts as passive transformation?
If the filter condition is set to TRUE, then it passes all the rows without
filtering any data. In this case,
the filter transformation acts as passive transformation.
4. Can we concatenate ports from more than one transformation into the filter
transformation?
No. The input ports for the filter must come from a single transformation.
5. How to filter the null values and spaces?
Use the ISNULL and IS_SPACES functions.
Example: IIF(ISNULL(commission) OR IS_SPACES(commission), FALSE, TRUE)
6. How can session performance be improved by using filter transformation?
Keep the filter transformation as close as possible to the sources in the mapping.
This allows the
unwanted data to be discarded and the integration service processes only the
required rows. If the
source is a relational source, use the source qualifier to filter the rows.

Informatica Interview Questions on Joiner Transformation

1. What is a joiner transformation?


A joiner transformation joins two heterogeneous sources. You can also join the data
from the same
source. The joiner transformation joins sources with at least one matching column. The joiner uses a condition that matches one or more pairs of columns between the two sources.
2. How many joiner transformations are required to join n sources?
To join n sources n-1 joiner transformations are required.
3. What are the limitations of joiner transformation?

. You cannot use a joiner transformation when input pipeline contains an update
strategy
transformation.
. You cannot use a joiner if you connect a sequence generator transformation
directly before the
joiner.

4. What are the different types of joins?

. Normal join: In a normal join, the integration service discards all the rows from
the master and
detail source that do not match the join condition.
. Master outer join: A master outer join keeps all the rows of data from the detail
source and the
matching rows from the master source. It discards the unmatched rows from the
master source.
. Detail outer join: A detail outer join keeps all the rows of data from the master
source and the
matching rows from the detail source. It discards the unmatched rows from the
detail source.
. Full outer join: A full outer join keeps all rows of data from both the master and detail sources.

5. What is joiner cache?


When the integration service processes a joiner transformation, it reads the rows from the master source and builds the index and data caches. Then the integration service reads the detail
source and
performs the join. In case of sorted joiner, the integration service reads both
sources (master and
detail) concurrently and builds the cache based on the master rows.
6. How to improve the performance of joiner transformation?

. Join sorted data whenever possible.


. For an unsorted Joiner transformation, designate the source with fewer rows as
the master
source.
. For a sorted Joiner transformation, designate the source with fewer duplicate key
values as the
master source.

7. Why is joiner a blocking transformation?


When the integration service processes an unsorted joiner transformation, it reads
all master rows
before it reads the detail rows. To ensure it reads all master rows before the
detail rows, the
integration service blocks all the details source while it caches rows from the
master source. As it
blocks the detail source, the unsorted joiner is called a blocking transformation.
8. What are the settings used to configure the joiner transformation

. Master and detail source


. Type of join
. Join condition

Informatica Interview Questions on Lookup Transformation

1. What is a lookup transformation?


A lookup transformation is used to look up data in a flat file, relational table, view, or synonym.

2. What are the tasks of a lookup transformation?


The lookup transformation is used to perform the following tasks?

. Get a related value: Retrieve a value from the lookup table based on a value in
the source.
. Perform a calculation: Retrieve a value from a lookup table and use it in a
calculation.
. Update slowly changing dimension tables: Determine whether rows exist in a
target.

3. How do you configure a lookup transformation?


Configure the lookup transformation to perform the following types of lookups:
. Relational or flat file lookup
. Pipeline lookup
. Connected or unconnected lookup
. Cached or uncached lookup

4. What is a pipeline lookup transformation?


A pipeline lookup transformation is used to perform lookup on application sources
such as JMS,
MSMQ or SAP. A pipeline lookup transformation has a source qualifier as the lookup source.

5. What is connected and unconnected lookup transformation?

. A connected lookup transformation is connected to the other transformations in the mapping pipeline. It receives source data, performs a lookup and returns data to the pipeline.
. An unconnected lookup transformation is not connected to the other
transformations in the
mapping pipeline. A transformation in the pipeline calls the unconnected lookup
with a :LKP
expression.

6. What are the differences between connected and unconnected lookup transformations?
. Connected lookup transformation receives input values directly from the pipeline.
Unconnected
lookup transformation receives input values from the result of a :LKP expression in
another
transformation.
. Connected lookup transformation can be configured as dynamic or static cache.
Unconnected
lookup transformation can be configured only as static cache.
. Connected lookup transformation can return multiple columns from the same row or
insert into
the dynamic lookup cache. Unconnected lookup transformation can return one column
from
each row.
. If there is no match for the lookup condition, connected lookup transformation
returns default
value for all output ports. If you configure dynamic caching, the Integration
Service inserts rows
into the cache or leaves it unchanged. If there is no match for the lookup
condition, the
unconnected lookup transformation returns null.
. In a connected lookup transformation, the cache includes the lookup source
columns in the
lookup condition and the lookup source columns that are output ports. In an
unconnected
lookup transformation, the cache includes all lookup/output ports in the lookup
condition and the
lookup/return port.
. Connected lookup transformation passes multiple output values to another
transformation.
Unconnected lookup transformation passes one output value to another
transformation.
. Connected lookup transformation supports user-defined default values. Unconnected lookup transformation does not support user-defined default values.

7. How do you handle multiple matches in lookup transformation? Or what is "Lookup Policy on Multiple Match"?
"Lookup Policy on Multiple Match" option is used to determine which rows that the
lookup
transformation returns when it finds multiple rows that match the lookup condition.
You can select
lookup to return first or last row or any matching row or to report an error.

8. What is "Output Old Value on Update"?


This option is used when dynamic cache is enabled. When this option is enabled, the
integration
service outputs old values out of the lookup/output ports. When the Integration
Service updates a
row in the cache, it outputs the value that existed in the lookup cache before it
updated the row
based on the input data. When the Integration Service inserts a new row in the
cache, it outputs null
values. When you disable this property, the Integration Service outputs the same
values out of the
lookup/output and input/output ports.

9. What is "Insert Else Update" and "Update Else Insert"?


These options are used when dynamic cache is enabled.

. Insert Else Update option applies to rows entering the lookup transformation with
the row type of
insert. When this option is enabled the integration service inserts new rows in the
cache and
updates existing rows when disabled, the Integration Service does not update
existing rows.
. Update Else Insert option applies to rows entering the lookup transformation with
the row type of
update. When this option is enabled, the Integration Service updates existing rows,
and inserts
a new row if it is new. When disabled, the Integration Service does not insert new
rows.

10. What are the options available to configure a lookup cache?


The following options can be used to configure a lookup cache:

. Persistent cache
. Recache from lookup source
. Static cache
. Dynamic cache
. Shared Cache
. Pre-build lookup cache

11. What is a cached lookup transformation and uncached lookup transformation?

. Cached lookup transformation: The Integration Service builds a cache in memory when it processes the first row of data in a cached Lookup transformation. The Integration
Service
stores condition values in the index cache and output values in the data cache. The
Integration
Service queries the cache for each row that enters the transformation.
. Uncached lookup transformation: For each row that enters the lookup
transformation, the
Integration Service queries the lookup source and returns a value. The integration
service does
not build a cache.
12. How the integration service builds the caches for connected lookup
transformation?
The Integration Service builds the lookup caches for connected lookup
transformation in the
following ways:

. Sequential cache: The Integration Service builds lookup caches sequentially. The
Integration
Service builds the cache in memory when it processes the first row of the data in a
cached
lookup transformation.
. Concurrent caches: The Integration Service builds lookup caches concurrently. It
does not need
to wait for data to reach the Lookup transformation.

13. How the integration service builds the caches for unconnected lookup
transformation?
The Integration Service builds caches for unconnected Lookup transformations sequentially.
14. What is a dynamic cache?
The dynamic cache represents the data in the target. The Integration Service builds
the cache when
it processes the first lookup request. It queries the cache based on the lookup
condition for each row
that passes into the transformation. The Integration Service updates the lookup
cache as it passes
rows to the target. The integration service either inserts the row in the cache or
updates the row in
the cache or makes no change to the cache.

15. When you use a dynamic cache, do you need to associate each lookup port with
the input port?
Yes. You need to associate each lookup/output port with the input/output port or a
sequence ID. The
Integration Service uses the data in the associated port to insert or update rows
in the lookup cache.

16. What are the different values returned by NewLookupRow port?


The different values are

. 0 - Integration Service does not update or insert the row in the cache.
. 1 - Integration Service inserts the row into the cache.
. 2 - Integration Service updates the row in the cache.

17. What is a persistent cache?


If the lookup source does not change between session runs, then you can improve the
performance
by creating a persistent cache for the source. When a session runs for the first
time, the integration
service creates the cache files and saves them to disk instead of deleting them.
The next time when
the session runs, the integration service builds the memory from the cache file.

18. What is a shared cache?


You can configure multiple Lookup transformations in a mapping to share a single
lookup cache.
The Integration Service builds the cache when it processes the first Lookup
transformation. It uses
the same cache to perform lookups for subsequent Lookup transformations that share
the cache.

19. What is unnamed cache and named cache?

. Unnamed cache: When Lookup transformations in a mapping have compatible caching structures, the Integration Service shares the cache by default. You can only share static unnamed caches.
. Named cache: Use a persistent named cache when you want to share a cache file
across
mappings or share a dynamic and a static cache. The caching structures must match
or be
compatible with a named cache. You can share static and dynamic named caches.
20. How do you improve the performance of lookup transformation?

. Create an index on the columns used in the lookup condition


. Place conditions with equality operator first
. Cache small lookup tables.
. Join tables in the database: If the source and the lookup table are in the same
database, join
the tables in the database rather than using a lookup transformation.
. Use persistent cache for static lookups.
. Avoid ORDER BY on all columns in the lookup source. Specify explicitly the ORDER
By clause
on the required columns.
. For flat file lookups, provide Sorted files as lookup source.
Informatica Interview Questions on Normalizer Transformation

1. What is normalizer transformation?


The normalizer transformation receives a row that contains multiple-occurring columns and returns a row for each instance of the multiple-occurring data. This means it converts column data into row data. Normalizer is an active transformation.
2. Which transformation is required to process the cobol sources?
Since the COBOL sources contain denormalized data, the normalizer transformation is used to normalize the COBOL sources.
3. What is generated key and generated column id in a normalizer transformation?

. The integration service increments the generated key sequence number each time it
process a
source row. When the source row contains a multiple-occurring column or a multiple-
occurring
group of columns, the normalizer transformation returns a row for each occurrence.
Each row
contains the same generated key value.
. The normalizer transformation has a generated column ID (GCID) port for each
multiple-
occurring column. The GCID is an index for the instance of the multiple-occurring
data. For
example, if a column occurs 3 times in a source record, the normalizer returns a
value of 1,2 or
3 in the generated column ID.

4. What is VSAM?
VSAM (Virtual Storage Access Method) is a file access method for an IBM mainframe
operating
system. VSAM organizes records in indexed or sequential flat files.
5. What is VSAM normalizer transformation?
The VSAM normalizer transformation is the source qualifier transformation for a
COBOL source
definition. A COBOL source is a flat file that can contain multiple-occurring data
and multiple types of
records in the same file.
6. What is pipeline normalizer transformation?
Pipeline normalizer transformation processes multiple-occurring data from
relational tables or flat
files.
7. What is occurs clause and redefines clause in normalizer transformation?

. Occurs clause is specified when the source row has a multiple-occurring columns.
. A redefines clause is specified when the source contains multiple record types, i.e. the same data area is redefined with different column layouts.
Informatica Interview Questions on Rank Transformation

1. What is rank transformation?


A rank transformation is used to select top or bottom rank of data. This means, it
selects the largest
or smallest numeric value in a port or group. Rank transformation also selects the
strings at the top
or bottom of a session sort order. Rank transformation is an active transformation.

2. What is rank cache?


The integration service compares input rows in the data cache, if the input row
out-ranks a cached
row, the integration service replaces the cached row with the input row. If you
configure the rank
transformation to rank across multiple groups, the integration service ranks
incrementally for each
group it finds. The integration service stores group information in index cache and
row data in data
cache.
3. What is RANKINDEX port?
The designer creates RANKINDEX port for each rank transformation. The integration
service uses
the rank index port to store the ranking position for each row in a group.
4. How do you specify the number of rows you want to rank in a rank transformation?
In the rank transformation properties, there is an option 'Number of Ranks' for specifying the number of rows you want to rank.
5. How to select either top or bottom ranking for a column?
In the rank transformation properties, there is an option 'Top/Bottom' for
selecting the top or bottom
ranking for a column.
6. Can we specify ranking on more than one port?
No. We can specify to rank the data based on only one port. In the ports tab, you
have to check the
R option for designating the port as a rank port and this option can be checked
only on one port.

Informatica Interview Questions on Sequence Generator Transformation

1. What is a sequence generator transformation?


A Sequence generator transformation generates numeric values. Sequence generator
transformation is a passive transformation.
2. What is the use of a sequence generator transformation?
A sequence generator is used to create unique primary key values, replace missing
primary key
values or cycle through a sequential range of numbers.
3. What are the ports in sequence generator transformation?
A sequence generator contains two output ports. They are CURRVAL and NEXTVAL.
4. What is the maximum number of sequence that a sequence generator can generate?
The maximum value is 9,223,372,036,854,775,807
5. When you connect both the NEXTVAL and CURRVAL ports to a target, what will be
the output
values of these ports?
The output values are
NEXTVAL CURRVAL
1 2
2 3
3 4
4 5
5 6
6. What will be the output value, if you connect only CURRVAL to the target without
connecting
NEXTVAL?
The integration service passes a constant value for each row.
7. What will be the value of CURRVAL in a sequence generator transformation?
CURRVAL is the sum of "NEXTVAL" and "Increment By" Value.
8. What is the number of cached values set to default for a sequence generator
transformation?
For non-reusable sequence generators, the number of cached values is set to zero.
For reusable sequence generators, the number of cached values is set to 1000.
9. How do you configure a sequence generator transformation?
The following properties need to be configured for a sequence generator
transformation:

. Start Value
. Increment By
. End Value
. Current Value
. Cycle
. Number of Cached Values
Informatica Interview Questions on Sorter Transformation

1. What is a sorter transformation?


Sorter transformation is used to sort the data. You can sort the data either in
ascending or
descending order according to a specified sort key.
2. Why is sorter an active transformation?
As sorter transformation can suppress the duplicate records in the source, it is
called an active
transformation.
3. How to improve the performance of a session using sorter transformation?
Sort the data using sorter transformation before passing in to aggregator or joiner
transformation. As
the data is sorted, the integration service uses the memory to do aggregate and
join operations and
does not use cache files to process the data.

Informatica Interview Questions on Source Qualifier Transformation

1. What is a source qualifier transformation?


A source qualifier represents the rows that the integration service reads when it
runs a session.
Source qualifier is an active transformation.
2. Why do you need a source qualifier transformation?
The source qualifier transformation converts the source data types into informatica
native data types.
3. What are the different tasks a source qualifier can do?

. Join two or more tables originating from the same source (homogeneous sources)
database.
. Filter the rows.
. Sort the data
. Selecting distinct values from the source
. Create custom query
. Specify a pre-sql and post-sql

4. What is the default join in source qualifier transformation?


The source qualifier transformation joins the tables based on the primary key-
foreign key
relationship.
5. How to create a custom join in source qualifier transformation?
When there is no primary key-foreign key relationship between the tables, you can
specify a custom
join using the 'user-defined join' option in the properties tab of source
qualifier.
6. How to join heterogeneous sources and flat files?
Use joiner transformation to join heterogeneous sources and flat files
7. How do you configure a source qualifier transformation?

. SQL Query
. User-Defined Join
. Source Filter
. Number of Sorted Ports
. Select Distinct
. Pre-SQL
. Post-SQL

Informatica Interview Questions on SQL Transformation

1. What is SQL transformation?


SQL transformation processes SQL queries midstream in a pipeline; you can insert, update, delete and retrieve rows from a database.

2. How do you configure a SQL transformation?

The following options are required to configure SQL transformation:

. Mode: Specifies the mode in which SQL transformation runs. SQL transformation
supports two
modes. They are script mode and query mode.
. Database type: The type of database that SQL transformation connects to.
. Connection type: Pass database connection to the SQL transformation at run time
or specify a
connection object.

3. What are the different modes in which a SQL transformation runs?


SQL transformation runs in two modes. They are:

. Script mode: The SQL transformation runs scripts that are externally located. You
can pass a
script name to the transformation with each input row. The SQL transformation
outputs one row
for each input row.
. Query mode: The SQL transformation executes a query that you define in a query
editor. You
can pass parameters to the query to define dynamic queries. You can output multiple
rows
when the query has a SELECT statement.

4. In which cases does the SQL transformation become a passive transformation and in which cases an active transformation?
If you run the SQL transformation in script mode, then it becomes passive
transformation. If you run
the SQL transformation in the query mode and the query has a SELECT statement, then
it becomes
an active transformation.

5. When you configure an SQL transformation to run in script mode, what are the
ports that the
designer adds to the SQL transformation?
The designer adds the following ports to the SQL transformation in script mode:
. ScriptName: This is an input port. ScriptName receives the name of the script to
execute the
current row.
. ScriptResult: This is an output port. ScriptResult returns PASSED if the script
execution
succeeds for the row. Otherwise it returns FAILED.
. ScriptError: This is an output port. ScriptError returns the errors that occur
when a script fails for
a row.

6. What are the types of SQL queries you can specify in the SQL transformation when
you use it in
query mode.

. Static SQL query: The query statement does not change, but you can use query
parameters to
change the data. The integration service prepares the query once and runs the query
for all
input rows.
. Dynamic SQL query: The query statement can be changed. The integration service
prepares a
query for each input row.

7. What are the types of connections to connect the SQL transformation to the
database available?

. Static connection: Configure the connection object in the session. You must first create the connection object in workflow manager.
. Logical connection: Pass a connection name to the SQL transformation as input
data at run
time. You must first create the connection object in workflow manager.
. Full database connection: Pass the connect string, user name, password and other
connection
information to SQL transformation input ports at run time.

8. How do you find the number of rows inserted, updated or deleted in a table?

You can enable the NumRowsAffected output port to return the number of rows
affected by the
INSERT, UPDATE or DELETE query statements in each input row. This NumRowsAffected
option
works in query mode.

9. What will be the output of NumRowsAffected port for a SELECT statement?

The NumRowsAffected output is zero for the SELECT statement.

10. When you enable the NumRowsAffected output port in script mode, what will be
the output?

In script mode, the NumRowsAffected port always returns NULL.

11. How do you limit the number of rows returned by the select statement?

You can limit the number of rows by configuring the Max Output Row Count property.
To configure
unlimited output rows, set Max Output Row Count to zero.

Informatica Interview Questions on Stored Procedure Transformation


1. What is a stored procedure?
A stored procedure is a precompiled collection of database procedural statements.
Stored
procedures are stored and run within the database.
2. Give some examples where a stored procedure is used?
The stored procedure can be used to do the following tasks

. Check the status of a target database before loading data into it.
. Determine if enough space exists in a database.
. Perform a specialized calculation.
. Drop and recreate indexes.

3. What is a connected stored procedure transformation?


The stored procedure transformation is connected to the other transformations in
the mapping
pipeline.
4. In which scenarios a connected stored procedure transformation is used?

. Run a stored procedure every time a row passes through the mapping.
. Pass parameters to the stored procedure and receive multiple output parameters.

5. What is an unconnected stored procedure transformation?


The stored procedure transformation is not connected directly to the flow of the
mapping. It either
runs before or after the session or is called by an expression in another
transformation in the
mapping.
6. In which scenarios an unconnected stored procedure transformation is used?

. Run a stored procedure before or after a session


. Run a stored procedure once during a mapping, such as pre or post-session.
. Run a stored procedure based on data that passes through the mapping, such as
when a
specific port does not contain a null value.
. Run nested stored procedures.
. Call multiple times within a mapping.

7. What are the options available to specify when the stored procedure
transformation needs to be
run?
The following options describe when the stored procedure transformation runs:

. Normal: The stored procedure runs where the transformation exists in the mapping
on a row-by-
row basis. This is useful for calling the stored procedure for each row of data
that passes
through the mapping, such as running a calculation against an input port. Connected
stored
procedures run only in normal mode.
. Pre-load of the Source: Before the session retrieves data from the source, the
stored procedure
runs. This is useful for verifying the existence of tables or performing joins of
data in a
temporary table.
. Post-load of the Source: After the session retrieves data from the source, the
stored procedure
runs. This is useful for removing temporary tables.
. Pre-load of the Target: Before the session sends data to the target, the stored
procedure runs.
This is useful for verifying target tables or disk space on the target system.
. Post-load of the Target: After the session sends data to the target, the stored
procedure runs.
This is useful for re-creating indexes on the database.

A connected stored procedure transformation runs only in Normal mode. An unconnected stored procedure transformation runs in all the above modes.
8. What is execution order in stored procedure transformation?
The order in which the Integration Service calls the stored procedure used in the
transformation,
relative to any other stored procedures in the same mapping. Only used when the
Stored Procedure
Type is set to anything except Normal and more than one stored procedure exists.
9. What is PROC_RESULT in stored procedure transformation?
PROC_RESULT is a system variable, where the output of an unconnected stored
procedure
transformation is assigned by default.
10. What are the parameter types in a stored procedure?
There are three types of parameters exist in a stored procedure:

. IN: Input passed to the stored procedure


. OUT: Output returned from the stored procedure
. INOUT: Defines the parameter as both input and output. Only Oracle supports this
parameter
type.

Informatica Interview Questions on Union Transformation

1. What is a union transformation?


A union transformation is used to merge data from multiple sources, similar to the UNION ALL SQL statement that combines the results from two or more SQL statements.

2. As the union transformation gives UNION ALL output, how will you get the UNION output?
Pass the output of union transformation to a sorter transformation. In the
properties of sorter
transformation check the option select distinct. Alternatively you can pass the
output of union
transformation to aggregator transformation and in the aggregator transformation
specify all ports as
group by ports.
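The difference between the two outputs can be sketched in Python (illustrative only, not Informatica syntax): the union transformation behaves like concatenation, and the distinct step removes the duplicates afterwards.

stream_a = [(10, 'A'), (20, 'B')]
stream_b = [(20, 'B'), (30, 'C')]

union_all = stream_a + stream_b                # what the union transformation produces
union = sorted(set(union_all))                 # distinct on all ports (sorter with Select Distinct)
print(union_all)                               # [(10, 'A'), (20, 'B'), (20, 'B'), (30, 'C')]
print(union)                                   # [(10, 'A'), (20, 'B'), (30, 'C')]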
3. What are the guidelines to be followed while using union transformation?
The following rules and guidelines need to be taken care while working with union
transformation:

. You can create multiple input groups, but only one output group.
. All input groups and the output group must have matching ports. The precision,
datatype, and
scale must be identical across all groups.
. The Union transformation does not remove duplicate rows. To remove duplicate
rows, you must
add another transformation such as a Router or Filter transformation.
. You cannot use a Sequence Generator or Update Strategy transformation upstream
from a
Union transformation.
. The Union transformation does not generate transactions.

4. Why is union transformation an active transformation?


Union is an active transformation because it combines two or more data streams into
one. Though
the total number of rows passing into the Union is the same as the total number of
rows passing out
of it, and the sequence of rows from any given input stream is preserved in the
output, the positions
of the rows are not preserved, i.e. row number 1 from input stream 1 might not be
row number 1 in
the output stream. Union does not even guarantee that the output is repeatable.

Informatica Interview Questions on Update Strategy Transformation

1. What is an update strategy transformation?


Update strategy transformation is used to flag source rows for insert, update,
delete or reject within a
mapping. Based on this flagging each row will be either inserted or updated or
deleted from the
target. Alternatively the row can be rejected.
2. Why is update strategy an active transformation?
As the update strategy transformation can reject rows, it is called an active transformation.
3. What are the constants used in update strategy transformation for flagging the
rows?

. DD_INSERT is used for inserting the rows. The numeric value is 0.


. DD_UPDATE is used for updating the rows. The numeric value is 1.
. DD_DELETE is used for deleting the rows. The numeric value is 2.
. DD_REJECT is used for rejecting the rows. The numeric value is 3.

4. If you place an aggregator after the update strategy transformation, how the
output of aggregator
will be affected?
The update strategy transformation flags the rows for insert, update, delete or reject before you perform the aggregate calculation. How you flag a particular row determines how the
aggregator
transformation treats any values in that row used in the calculation. For example,
if you flag a row for
delete and then later use the row to calculate the sum, the integration service
subtracts the value
appearing in this row. If the row had been flagged for insert, the integration
service would add its
value to the sum.
5. How to update the target table without using update strategy transformation?
In the session properties, there is an option 'Treat Source Rows As'. Using this
option you can
specify whether all the source rows need to be inserted, updated or deleted.
6. If you have an update strategy transformation in the mapping, what should be the
value selected
for 'Treat Source Rows As' option in session properties?
The value selected for the option is 'Data Driven'. The integration service follows
the instructions
coded in the update strategy transformation.
7. If you have an update strategy transformation in the mapping and you did not select the value 'Data Driven' for the 'Treat Source Rows As' option in the session, then how will the session behave?
If you do not choose Data Driven when a mapping contains an Update Strategy or
Custom
transformation, the Workflow Manager displays a warning. When you run the session,
the Integration
Service does not follow instructions in the Update Strategy transformation in the
mapping to
determine how to flag rows.
8. In which files the data rejected by update strategy transformation will be
written?
If the update strategy transformation is configured to Forward Rejected Rows then
the integration
service forwards the rejected rows to next transformation and writes them to the
session reject file. If
you do not select the forward reject rows option, the integration service drops
rejected rows and
writes them to the session log file. If you enable row error handling, the
Integration Service writes the
rejected rows and the dropped rows to the row error logs. It does not generate a
reject file.

Informatica Interview Questions on Transaction Control Transformation

1. What is a transaction control transformation?


A transaction is a set of rows bound by a commit or rollback of rows. The
transaction control
transformation is used to commit or rollback a group of rows.

2. What is the commit type if you have a transaction control transformation in the
mapping?
The commit type is "user-defined".

3. What are the different transaction levels available in transaction control transformation?
The following are the transaction levels or built-in variables:

. TC_CONTINUE_TRANSACTION: The Integration Service does not perform any transaction

change for this row. This is the default value of the expression.
. TC_COMMIT_BEFORE: The Integration Service commits the transaction, begins a new
transaction, and writes the current row to the target. The current row is in the
new transaction.
. TC_COMMIT_AFTER: The Integration Service writes the current row to the target,
commits the
transaction, and begins a new transaction. The current row is in the committed
transaction.
. TC_ROLLBACK_BEFORE: The Integration Service rolls back the current transaction,
begins a
new transaction, and writes the current row to the target. The current row is in
the new
transaction.
. TC_ROLLBACK_AFTER: The Integration Service writes the current row to the target,
rolls back
the transaction, and begins a new transaction. The current row is in the rolled
back transaction.

Informatica Scenario Based Questions - Part 3

1. Consider the following product types data as the source.


Product_id, product_type
10, video
10, Audio
20, Audio
30, Audio
40, Audio
50, Audio
10, Movie
20, Movie
30, Movie
40, Movie
50, Movie
60, Movie
Assume that only 3 product types are available in the source. The source contains 12 records and you don't know how many products are available in each product type.
Q1. Design a mapping to select 9 products in such a way that 3 products should be
selected from
video, 3 products should be selected from Audio and the remaining 3 products should
be selected
from Movie.
Solution:
Step1: Use sorter transformation and sort the data using the key as product_type.
Step2: Connect the sorter transformation to an expression transformation. In the
expression
transformation, the ports will be
product_id
product_type
V_curr_prod_type=product_type
V_count = IIF(V_curr_prod_type = V_prev_prod_type,V_count+1,1)
V_prev_prod_type=product_type
O_count=V_count
Step3: Now connect the expression transformation to a filter transformation and specify the filter condition as O_count<=3. Pass the output of the filter to a target table.
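A minimal Python sketch of this solution (plain Python, not Informatica syntax; the data is the sample above):

rows = [(10, 'video'), (10, 'Audio'), (20, 'Audio'), (30, 'Audio'), (40, 'Audio'),
        (50, 'Audio'), (10, 'Movie'), (20, 'Movie'), (30, 'Movie'), (40, 'Movie'),
        (50, 'Movie'), (60, 'Movie')]
rows.sort(key=lambda r: r[1])                  # sorter: key product_type

v_prev_type = None
v_count = 0
for product_id, product_type in rows:
    v_count = v_count + 1 if product_type == v_prev_type else 1   # O_count resets per type
    v_prev_type = product_type
    if v_count <= 3:                           # filter condition O_count <= 3
        print(product_id, product_type)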
Q2. In the above problem Q1, if the number of products in a particular product type is less than 3, then you won't get a total of 9 records in the target table (for example, see the video type in the source data). Now design a mapping in such a way that even if the number of products in a particular product type is less than 3, the shortfall is made up from the other product types. For example, if there is only 1 video product, then the remaining 2 records should come from audios or movies. So, the total number of records in the target table should always be 9.
Solution:
The first two steps are same as above.
Step3: Connect the expression transformation to a sorter transformation and sort the data using the key as O_count. The ports in the sorter transformation will be
product_id
product_type
O_count (sort key)
Step4: Discard the O_count port and connect the sorter transformation to an expression transformation.
The ports in expression transformation will be
product_id
product_type
V_count=V_count+1
O_prod_count=V_count
Step5: Connect the expression transformation to a filter transformation and specify the filter condition as O_prod_count<=9. Connect the filter transformation to a target table.
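The same idea in a short Python sketch (illustrative only, not Informatica syntax): rank within each product type, sort by that rank so the best-ranked rows of every type come first, then keep the first 9 rows overall.

rows = [(10, 'video'), (10, 'Audio'), (20, 'Audio'), (30, 'Audio'), (40, 'Audio'),
        (50, 'Audio'), (10, 'Movie'), (20, 'Movie'), (30, 'Movie'), (40, 'Movie'),
        (50, 'Movie'), (60, 'Movie')]

# Steps 1-2: per-type rank (O_count), exactly as in Q1.
ranked = []
v_prev_type, v_count = None, 0
for product_id, product_type in sorted(rows, key=lambda r: r[1]):
    v_count = v_count + 1 if product_type == v_prev_type else 1
    v_prev_type = product_type
    ranked.append((v_count, product_id, product_type))

# Steps 3-5: sort by the per-type rank, number the rows again and keep the first 9.
ranked.sort(key=lambda r: r[0])
for o_prod_count, (o_count, product_id, product_type) in enumerate(ranked, start=1):
    if o_prod_count <= 9:                      # filter condition O_prod_count <= 9
        print(product_id, product_type)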
2. Design a mapping to convert column data into row data without using the
normalizer
transformation.
The source data looks like
col1, col2, col3
a, b, c
d, e, f
The target table data should look like
Col
a
b
c
d
e
f
Solution:
Create three expression transformations with one port each. Connect col1 from
Source Qualifier to
port in first expression transformation. Connect col2 from Source Qualifier to port
in second
expression transformation. Connect col3 from source qualifier to port in third
expression
transformation. Create a union transformation with three input groups and each
input group should
have one port. Now connect the expression transformations to the input groups and
connect the
union transformation to the target table.
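A few lines of Python capture the same idea (illustrative only, not Informatica syntax): each column feeds its own branch and the branches are unioned into a single column.

rows = [('a', 'b', 'c'), ('d', 'e', 'f')]

for col1, col2, col3 in rows:
    for col in (col1, col2, col3):             # three expression branches feeding the union
        print(col)                             # a, b, c, d, e, f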
3. Design a mapping to convert row data into column data.
The source data looks like
id, value
10, a
10, b
10, c
20, d
20, e
20, f
The target table data should look like
id, col1, col2, col3
10, a, b, c
20, d, e, f
Solution:
Step1: Use sorter transformation and sort the data using id port as the key. Then
connect the sorter
transformation to the expression transformation.
Step2: In the expression transformation, create the ports and assign the
expressions as mentioned
below.
id
value
V_curr_id=id
V_count= IIF(v_curr_id=V_prev_id,V_count+1,1)
V_prev_id=id
O_col1= IIF(V_count=1,value,NULL)
O_col2= IIF(V_count=2,value,NULL)
O_col3= IIF(V_count=3,value,NULL)
Step3: Connect the expression transformation to an aggregator transformation. In the aggregator transformation, create the ports and assign the expressions as mentioned below.
id (specify group by on this port)
O_col1
O_col2
O_col3
col1=MAX(O_col1)
col2=MAX(O_col2)
col3=MAX(O_col3)
Step4: Now connect the ports id, col1, col2, col3 from the aggregator transformation to the target table.
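A minimal Python sketch of this pivot (plain Python, not Informatica syntax; the data is the sample above):

rows = [(10, 'a'), (10, 'b'), (10, 'c'), (20, 'd'), (20, 'e'), (20, 'f')]
rows.sort()                                    # sorter: key id

pivot = {}                                     # aggregator grouped by id
v_prev_id, v_count = None, 0
for id_, value in rows:
    v_count = v_count + 1 if id_ == v_prev_id else 1
    v_prev_id = id_
    cols = pivot.setdefault(id_, [None, None, None])
    cols[v_count - 1] = value                  # O_col1/O_col2/O_col3 = IIF(V_count=n, value, NULL)
                                               # MAX() in the aggregator keeps the non-null value

for id_, (col1, col2, col3) in sorted(pivot.items()):
    print(id_, col1, col2, col3)               # 10 a b c / 20 d e f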

Informatica Scenario Based Questions - Part 2

1. Consider the following employees data as source


employee_id, salary
10, 1000
20, 2000
30, 3000
40, 5000
Q1. Design a mapping to load the cumulative sum of salaries of employees into
target table?
The target table data should look like as
employee_id, salary, cumulative_sum
10, 1000, 1000
20, 2000, 3000
30, 3000, 6000
40, 5000, 11000
Solution:
Connect the source Qualifier to expression transformation. In the expression
transformation, create
a variable port V_cum_sal and in the expression editor write V_cum_sal+salary.
Create an output
port O_cum_sal and assign V_cum_sal to it.
Q2. Design a mapping to get the pervious row salary for the current row. If there
is no pervious row
exists for the current row, then the pervious row salary should be displayed as
null.
The output should look like as
employee_id, salary, pre_row_salary
10, 1000, Null
20, 2000, 1000
30, 3000, 2000
40, 5000, 3000
Solution:
Connect the source Qualifier to expression transformation. In the expression
transformation, create
a variable port V_count and increment it by one for each row entering the
expression transformation.
Also create V_salary variable port and assign the expression
IIF(V_count=1,NULL,V_prev_salary) to
it . Then create one more variable port V_prev_salary and assign Salary to it. Now
create output port
O_prev_salary and assign V_salary to it. Connect the expression transformation to
the target ports.
In the expression transformation, the ports will be
employee_id
salary
V_count=V_count+1
V_salary=IIF(V_count=1,NULL,V_prev_salary)
V_prev_salary=salary
O_prev_salary=V_salary
Q3. Design a mapping to get the next row salary for the current row. If there is no
next row for the
current row, then the next row salary should be displayed as null.
The output should look like as
employee_id, salary, next_row_salary
10, 1000, 2000
20, 2000, 3000
30, 3000, 5000
40, 5000, Null
Solution:
Step1: Connect the source qualifier to two expression transformation. In each
expression
transformation, create a variable port V_count and in the expression editor write
V_count+1. Now
create an output port O_count in each expression transformation. In the first
expression
transformation, assign V_count to O_count. In the second expression transformation
assign
V_count-1 to O_count.
In the first expression transformation, the ports will be
employee_id
salary
V_count=V_count+1
O_count=V_count
In the second expression transformation, the ports will be
employee_id
salary
V_count=V_count+1
O_count=V_count-1
Step2: Connect both the expression transformations to joiner transformation and
join them on the
port O_count. Consider the first expression transformation as Master and second one
as detail. In
the joiner specify the join type as Detail Outer Join. In the joiner transformation
check the property
sorted input, then only you can connect both expression transformations to joiner
transformation.
Step3: Pass the output of joiner transformation to a target table. From the joiner,
connect the
employee_id, salary which are obtained from the first expression transformation to
the employee_id,
salary ports in target table. Then from the joiner, connect the salary which is
obtained from the
second expression transformaiton to the next_row_salary port in the target table.
Q4. Design a mapping to find the sum of salaries of all employees and this sum
should repeat for all
the rows.
The output should look like as
employee_id, salary, salary_sum
10, 1000, 11000
20, 2000, 11000
30, 3000, 11000
40, 5000, 11000
Solution:
Step1: Connect the source qualifier to the expression transformation. In the
expression
transformation, create a dummy port and assign value 1 to it.
In the expression transformation, the ports will be
employee_id
salary
O_dummy=1
Step2: Pass the output of expression transformation to aggregator. Create a new
port
O_sum_salary and in the expression editor write SUM(salary). Do not specify group
by on any port.
In the aggregator transformation, the ports will be
salary
O_dummy
O_sum_salary=SUM(salary)
Step3: Pass the output of expression transformation, aggregator transformation to
joiner
transformation and join on the DUMMY port. In the joiner transformation check the
property sorted
input, then only you can connect both expression and aggregator to joiner
transformation.
Step4: Pass the output of joiner to the target table.
2. Consider the following employees table as source
department_no, employee_name
20, R
10, A
10, D
20, P
10, B
10, C
20, Q
20, S
Q1. Design a mapping to load a target table with the following values from the
above source?
department_no, employee_list
10, A
10, A,B
10, A,B,C
10, A,B,C,D
20, A,B,C,D,P
20, A,B,C,D,P,Q
20, A,B,C,D,P,Q,R
20, A,B,C,D,P,Q,R,S
Solution:
Step1: Use a sorter transformation and sort the data using the sort key as
department_no and then
pass the output to the expression transformation. In the expression transformation,
the ports will be
department_no
employee_name
V_employee_list =
IIF(ISNULL(V_employee_list),employee_name,V_employee_list||','||employee_name)
O_employee_list = V_employee_list
Step2: Now connect the expression transformation to a target table.
Q2. Design a mapping to load a target table with the following values from the
above source?
department_no, employee_list
10, A
10, A,B
10, A,B,C
10, A,B,C,D
20, P
20, P,Q
20, P,Q,R
20, P,Q,R,S
Solution:
Step1: Use a sorter transformation and sort the data using department_no and employee_name as
the sort keys, then pass the output to the expression transformation. In the expression transformation,
the ports will be
department_no
employee_name
V_curr_deptno=department_no
V_employee_list = IIF(V_curr_deptno != V_prev_deptno, employee_name,
V_employee_list||','||employee_name)
V_prev_deptno=department_no
O_employee_list = V_employee_list
Step2: Now connect the expression transformation to a target table.
Q3. Design a mapping to load a target table with the following values from the
above source?
department_no, employee_names
10, A,B,C,D
20, P,Q,R,S
Solution:
The first step is the same as in the above problem. Pass the output of the expression to an
aggregator
transformation and specify the group by as department_no. Now connect the
aggregator
transformation to a target table.
Scenario Based Interview Questions with Answers - Part 1

1. How to generate sequence numbers using expression transformation?


Solution:
In the expression transformation, create a variable port and increment it by 1.
Then assign the
variable port to an output port. In the expression transformation, the ports are:
V_count=V_count+1
O_count=V_count
2. Design a mapping to load the first 3 rows from a flat file into a target?
Solution:
You have to assign row numbers to each record. Generate the row numbers either using the
expression transformation as mentioned above or using a sequence generator transformation.
Then pass the output to a filter transformation and specify the filter condition as O_count <= 3.
3. Design a mapping to load the last 3 rows from a flat file into a target?
Solution:
Consider the source has the following data.
col
a
b
c
d
e
Step1: You have to assign row numbers to each record. Generate the row numbers
using the
expression transformation as mentioned above and call the row number generated port
as O_count.
Create a DUMMY output port in the same expression transformation and assign 1 to it, so that
the DUMMY output port always returns 1 for each row.
In the expression transformation, the ports are
V_count=V_count+1
O_count=V_count
O_dummy=1
The output of expression transformation will be
col, o_count, o_dummy
a, 1, 1
b, 2, 1
c, 3, 1
d, 4, 1
e, 5, 1
Step2: Pass the output of expression transformation to aggregator and do not
specify any group by
condition. Create an output port O_total_records in the aggregator and assign
O_count port to it.
The aggregator will return the last row by default. The output of aggregator
contains the DUMMY
port which has value 1 and O_total_records port which has the value of total number
of records in
the source.
In the aggregator transformation, the ports are
O_dummy
O_count
O_total_records=O_count
The output of aggregator transformation will be
O_total_records, O_dummy
5, 1
Step3: Pass the outputs of the expression transformation and the aggregator transformation to a
joiner transformation and join on the DUMMY port. Check the Sorted Input property in the joiner
transformation; only then can both the expression and aggregator be connected to the joiner
transformation.
In the joiner transformation, the join condition will be
O_dummy (port from aggregator transformation) = O_dummy (port from expression
transformation)
The output of joiner transformation will be
col, o_count, o_total_records
a, 1, 5
b, 2, 5
c, 3, 5
d, 4, 5
e, 5, 5
Step4: Now pass the output of the joiner transformation to a filter transformation and specify the filter
condition as O_total_records (port from aggregator) - O_count (port from expression) <= 2
In the filter transformation, the filter condition will be
O_total_records - O_count <=2
The output of filter transformation will be
col, o_count, o_total_records
c, 3, 5
d, 4, 5
e, 5, 5
4. Design a mapping to load the first record from a flat file into one table A, the
last record from a flat
file into table B and the remaining records into table C?
Solution:
This is similar to the above problem; the first 3 steps are the same. In the last step, instead of using
the filter transformation, you have to use a router transformation. In the router transformation, create
two output groups.
In the first group, the condition should be O_count=1 and connect the corresponding
output group to
table A. In the second group, the condition should be O_count=O_total_records and
connect the
corresponding output group to table B. The output of default group should be
connected to table C.
5. Consider the following products data which contain duplicate rows.
A
B
C
C
B
D
B
Q1. Design a mapping to load all unique products in one table and the duplicate
rows in another
table.
The first table should contain the following output
A
D
The second target should contain the following output
B
B
B
C
C
Solution:
Use a sorter transformation and sort the products data. Pass the output to an expression
transformation, create a dummy port O_dummy and assign 1 to it, so that the DUMMY output port
always returns 1 for each row.
The output of expression transformation will be
Product, O_dummy
A, 1
B, 1
B, 1
B, 1
C, 1
C, 1
D, 1
Pass the output of the expression transformation to an aggregator transformation. Check the group by
on the product port. In the aggregator, create an output port O_count_of_each_product and write the
expression count(product).
The output of aggregator will be
Product, O_count_of_each_product
A, 1
B, 3
C, 2
D, 1
Now pass the outputs of the expression transformation and the aggregator transformation to a joiner
transformation and join on the product port. Check the Sorted Input property in the joiner
transformation; only then can both the expression and aggregator be connected to the joiner
transformation.
The output of joiner will be
product, O_dummy, O_count_of_each_product
A, 1, 1
B, 1, 3
B, 1, 3
B, 1, 3
C, 1, 2
C, 1, 2
D, 1, 1
Now pass the output of joiner to a router transformation, create one group and
specify the group
condition as O_dummy=O_count_of_each_product. Then connect this group to one table.
Connect
the output of default group to another table.
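For reference, the same classification can be sketched in plain SQL with the COUNT analytic function, assuming a hypothetical products table with a single product column; this is only a cross-check of the expected split, not part of the mapping.

-- Sketch only: product_count = 1 identifies the unique products (first target),
-- product_count > 1 identifies the duplicated rows (second target).
SELECT product,
       COUNT(*) OVER (PARTITION BY product) AS product_count
FROM   products;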
Q2. Design a mapping to load each product once into one table and the remaining
products which
are duplicated into another table.
The first table should contain the following output
A
B
C
D
The second table should contain the following output
B
B
C
Solution:
Use a sorter transformation and sort the products data. Pass the output to an expression
transformation and create a variable port, V_curr_product, and assign the product port to it. Then
create a V_count port and in the expression editor write
IIF(V_curr_product=V_prev_product, V_count+1, 1).
Create one more variable port V_prev_product and assign the product port to it. Now create an
output port O_count and assign the V_count port to it.
In the expression transformation, the ports are
Product
V_curr_product=product
V_count=IIF(V_curr_product=V_prev_product,V_count+1,1)
V_prev_product=product
O_count=V_count
The output of expression transformation will be
Product, O_count
A, 1
B, 1
B, 2
B, 3
C, 1
C, 2
D, 1
Now pass the output of the expression transformation to a router transformation, create one group
and specify the condition as O_count=1. Then connect this group to one table. Connect the output of
the default group to another table.
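For reference, the same split can be sketched in plain SQL with ROW_NUMBER, again assuming a hypothetical products table with a single product column; this is only a cross-check, not part of the mapping.

-- Sketch only: rn = 1 gives each product once (first target),
-- rn > 1 gives the remaining duplicate rows (second target).
SELECT product,
       ROW_NUMBER() OVER (PARTITION BY product ORDER BY product) AS rn
FROM   products;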

Informatica Interview Questions on Router Transformation

1. What is a router transformation?


A router is used to filter the rows in a mapping. Unlike a filter transformation, you can specify more
than one condition in a router transformation (one filter condition per output group). Router is an
active transformation.
2. How to improve the performance of a session using router transformation?
Use router transformation in a mapping instead of creating multiple filter
transformations to perform
the same task. The router transformation is more efficient in this case. When you
use a router
transformation in a mapping, the integration service processes the incoming data
only once. When
you use multiple filter transformations, the integration service processes the
incoming data for each
transformation.
3. What are the different groups in router transformation?
The router transformation has the following types of groups:

. Input
. Output

4. How many types of output groups are there?


There are two types of output groups:

. User-defined group
. Default group

5. Where do you specify the filter conditions in the router transformation?


You can create the group filter conditions in the Groups tab using the expression
editor.
6. Can you connect ports of two output groups from router transformation to a
single target?
No. You cannot connect more than one output group to one target or a single input
group
transformation.

Convert String to Ascii Values in Oracle

The below query converts a string to its ASCII values.


select replace(substr(dump('oracle'),instr(dump('oracle'),': ')+2),',') from dual;
The output of this query is 1111149799108101.

Oracle Dump Function

The Dump function returns a varchar2 value that includes the data type code, length
in bytes and the
internal representation of the expression.
The syntax of dump function is
dump(expression, [return_format],[start_position],[length])
The start_position and length indicate which portion of the internal representation to display.
The return_format specifies the format of the returned value. The various return formats and their
descriptions are provided below:
return format value, description
8 octal notation
10 decimal notation
16 hexadecimal notation
17 single characters
1008 octal notation with the character set name
1010 decimal notation with the character set name
1016 hexadecimal notation with the character set name
1017 single characters with the character set name
Example: dump('oracle') --Typ=96 Len=6: 111,114,97,99,108,101
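As an illustration of the return_format argument, the same string can be dumped in hexadecimal notation. The output shown is indicative and may vary with the database character set.

select dump('oracle', 16) from dual;
-- Typ=96 Len=6: 6f,72,61,63,6c,65   (hexadecimal notation)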

Data Mining

Data mining is the process of finding patterns in large data sets and analyzing data from different
perspectives. It allows business users to analyze data from different angles and summarize the
relationships identified. Data mining can be useful in increasing revenue and cutting costs.
Example:
In a supermarket, customers who bought a toothbrush on Sundays also bought toothpaste. This
information can be used to increase revenue by providing an offer on toothbrushes and toothpaste,
thereby selling more of both products on Sundays.
Data mining process:
Data mining analyzes relationships and patterns in the stored data based on user queries. Data
mining involves tasks such as the following.

. Association: Finding the relationship between variables. For example, in a retail store, we can
determine which products are bought together frequently, and this information can be used to
market these products.
. Clustering: Identifying the logical relationship among data items and grouping them. For example,
in a retail store, toothpaste and toothbrushes can be logically grouped.
. Classifying: Applying a known pattern to new data.

Data Mart

A data mart is a subset of the data warehouse that concentrates on a specific business unit. A
data mart may or may not be derived from a data warehouse and is aimed at meeting an immediate
requirement.
Data marts may or may not be dependent on other data marts in an organization. If the data marts
have conformed dimensions and facts, then these data marts will be related to each other.
Benefits of data mart:

. Frequently needed data can be accessed very easily.


. Performance improvement.
. Data marts can be created easily.
. Lower cost in implementing data mart than a data warehouse.

Dependent data mart:

A dependent data mart is a subset (either a logical or a physical subset) of a larger data warehouse,
from which it is isolated. The dependent data mart is kept isolated for reasons such as frequent data
refreshes, performance, and security.

Data Profiling

Data profiling is the process of examining the quality of data available in the
data source (database
or file) and collecting statistics and information about the data. A very clean
data source that has
been well maintained before it reaches the data warehouse requires minimal
transformations and
human intervention to load the data into the facts and dimensions. A good data
profiling system can
process very large amounts of data with ease.
Dirty data in the source requires:

. Elimination of some input fields.
. Flagging missing data.
. Automatic replacement of corrupted data.
. Human intervention at row level.

How to do data profiling


Data profiling is about finding the metadata information about the source like data
type, length,
discrete values, uniqueness, occurrence of null values, typical string patterns,
and abstract type
recognition. This metadata can be used to find the problems in the source data such
as illegal
values, misspelling, missing values, varying value representation and duplicates.
We can also
compute statistics like minimum, maximum, mean, mode, percentile, standard
deviation, frequency,
variation, and aggregates on the source data. In addition to these, data profiling
can also be done on
each individual column to know about the type and frequency distribution of
different values.
Benefits of data profiling

. Improves data quality.
. Provides accurate data.
. Reduces project cost.

Data Warehouse Design Approaches

Data warehouse design is one of the key techniques in building a data warehouse. Choosing the right
data warehouse design can save project time and cost. Two data warehouse design approaches are
popular.
Bottom-Up Design:
In the bottom-up design approach, the data marts are created first to provide
reporting capability. A
data mart addresses a single business area such as sales, Finance etc. These data
marts are then
integrated to build a complete data warehouse. The integration of data marts is
implemented using
data warehouse bus architecture. In the bus architecture, a dimension is shared
between facts in
two or more data marts. These dimensions are called conformed dimensions. The conformed
dimensions are integrated from the data marts and then the data warehouse is built.
Advantages of bottom-up design are:

. This model contains consistent data marts and these data marts can be delivered
quickly.
. As the data marts are created first, reports can be generated quickly.
. The data warehouse can be extended easily to accommodate new business units. It
is just
creating new data marts and then integrating with other data marts.

Disadvantages of bottom-up design are:

. The positions of the data warehouse and the data marts are reversed in the bottom-up design
approach.
Top-Down Design:
In the top-down design approach, the data warehouse is built first. The data marts are then created
from the data warehouse.
Advantages of top-down design are:
. Provides consistent dimensional views of data across data marts, as all data
marts are loaded
from the data warehouse.
. This approach is robust against business changes. Creating a new data mart from
the data
warehouse is very easy.

Disadvantages of top-down design are:

. This methodology is inflexible to changing departmental needs during the implementation phase.
. It represents a very large project and the cost of implementing the project is
significant.

Extraction Methods in Data Warehouse

The extraction methods in data warehouse depend on the source system, performance
and
business requirements. There are two types of extraction: logical and physical. We will see the
logical and physical extraction methods in detail.
Logical extraction
There are two types of logical extraction methods:
Full Extraction: Full extraction is used when the data needs to be extracted and
loaded for the first
time. In full extraction, the data from the source is extracted completely. This
extraction reflects the
current data available in the source system.
Incremental Extraction: In incremental extraction, the changes in source data need
to be tracked
since the last successful extraction. Only these changes in data will be extracted
and then loaded.
These changes can be detected from the source data which have the last changed
timestamp. Also
a change table can be created in the source system, which keeps track of the
changes in the source
data.
One more method to get the incremental changes is to extract the complete source
data and then do
a difference (minus operation) between the current extraction and last extraction.
This approach
causes a performance issue.
Physical extraction
The data can be extracted physically by two methods:
Online Extraction: In online extraction the data is extracted directly from the
source system. The
extraction process connects to the source system and extracts the source data.
Offline Extraction: The data from the source system is dumped outside of the source
system into a
flat file. This flat file is used to extract the data. The flat file can be created
by a routine process daily.

Logical and Physical Design of Data Warehouse

Logical design:
Logical design deals with the logical relationships between objects. Entity-
relationship (ER) modeling
technique can be used for logical design of data warehouse. ER modeling involves
identifying the
entities (important objects), attributes (properties about objects) and the
relationship among them.
An entity is a chunk of information, which maps to a table in database. An
attribute is a part of an
entity, which maps to a column in database.
A unique identifier can be used to make sure the data is consistent.
Physical design:
Physical design deals with the effective way of storing and retrieving the data. In
the physical design,
the logical design needs to be converted into a description of the physical
database structures.
Physical design involves creation of the database objects like tables, columns,
indexes, primary
keys, foreign keys, views, sequences etc.

Data Warehouse

A data warehouse is a relational database that is designed for query and business
analysis rather
than for transaction processing. It contains historical data derived from
transaction data. This
historical data is used by the business analysts to understand about the business
in detail.
A data warehouse should have the following characteristics:
Subject oriented: A data warehouse helps in analyzing the data. For example, to know about a
company's sales, a data warehouse needs to be built on sales data. Using this data warehouse we can
find last year's sales. This ability to define a data warehouse by subject (sales) makes it subject
oriented.
Integrated: Bringing data from different sources and putting it into a consistent format. This
includes resolving units of measure, naming conflicts, etc.
Non volatile: Once the data enters into the data warehouse, the data should not be
updated.
Time variant: To analyze the business, analysts need large amounts of data. So, the
data
warehouse should contain historical data.

Oracle Set Operators

The set operators in Oracle are UNION, UNION ALL, INTERSECT and MINUS. These set operators
allow us to combine more than one select statement, and only one result set is returned.
UNION ALL

. UNION ALL selects all rows from all the select statements
. UNION ALL output is not sorted.
. Distinct keyword cannot be used in select statements.

UNION

. UNION is very similar to UNION ALL, but it suppresses duplicate rows from all the
select
statements.

INTERSECT

. INTERSECT returns the rows that are found common in all select statements.

MINUS

. MINUS returns all the rows from the first select statement except those rows
which are available
or duplicated in the following select statements.
. All the columns in the where clause must be in the select clause for the MINUS
operator to
work.

Common points to all the set operators:

. Only one ORDER BY clause should be present and it should appear at the very end
of the
statement. The ORDER BY clause accepts column names and aliases from the first select
statement.
. Duplicate rows are automatically eliminated except in UNION ALL
. Column names, aliases from the first query will appear in the result set.
. By default the output is sorted in ascending order of the first column of the
first select statement
except for UNION ALL.
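A small illustration of the difference between UNION and UNION ALL, runnable against the dual table, is given below.

-- UNION ALL keeps the duplicate row and returns two rows (1, 1)
SELECT 1 AS n FROM dual
UNION ALL
SELECT 1 FROM dual;

-- UNION suppresses the duplicate and returns a single row (1)
SELECT 1 AS n FROM dual
UNION
SELECT 1 FROM dual;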
Informatica Performance Improvement Tips

Following standards/guidelines can improve the overall performance:

. Use the Source Qualifier to join source tables if they reside in the same schema.
. Make use of the Source Qualifier "Filter" property if the source type is relational.

. If the subsequent sessions are doing lookup on the same table, use persistent
cache in the first
session. Data remains in the Cache and available for the subsequent session for
usage.
. Use flags as integer, as the integer comparison is faster than the string
comparison.
. Use tables with lesser number of records as master table for joins.
. While reading from Flat files, define the appropriate data type instead of
reading as String and
converting.
. Have all Ports that are required connected to Subsequent Transformations else
check whether
we can remove these ports.
. Suppress ORDER BY using the '--' at the end of the query in Lookup
Transformations.
. Minimize the number of Update strategies.
. Group by simple columns in transformations like Aggregate, Source Qualifier.
. Use Router transformation in place of multiple Filter transformations.
. Turn off the Verbose Logging while moving the workflows to Production
environment.
. For large volume of data drop index before loading and recreate indexes after
load.
. For large volumes of records, use bulk load and increase the commit interval to a higher value.
. Set 'Commit on Target' in the sessions.

Convert String to Initcap in unix

The following unix command converts the first letter in a string to upper case and
the remaining
letters to lower case.
echo apple | awk '{print toupper(substr($1,1,1)) tolower(substr($1,2))}'

Convert lower case to upper case

'tr' command will convert one set of characters to another set. The following
command converts
lower case alphabets in to upper case.
echo "apple" | tr [a-z] [A-Z]

Similarly to convert from upper case to lower case, use the following command
echo "APPLE" | tr [A-Z] [a-z]

For more details on tr look at "man tr".

Redirect output to multiple files

The tee command in unix writes the output to multiple files and also displays the
output on terminal.
Example:
date | tee -a file1 file2 file3
For more details look at "man tee"

Group Concat function in mysql

This function is used to concatenate multiple rows into a single column in mysql.
This function returns a string result with the concatenated non-NULL values from a
group. It returns
NULL if there are no non-NULL values.
Syntax of Mysql Group Concat Function

GROUP_CONCAT ([DISTINCT] expr [,expr ...]

[ORDER BY {unsigned_integer | col_name | expr}

[ASC | DESC] [,col_name ...]]

[SEPARATOR str_val])

Example: As an example consider the teachers table with the below data.

Table Name: Teachers

teacher_id subjects

------------------

10 English
10 Maths

20 Physics

20 Social

After concatenating the subjects of each teacher, the output will look as

teacher_id subjects_list

------------------

10 English,Maths

20 Physics,Social

The MySQL query to get the result is

select teacher_id, group_concat(subjects) as subjects_list from teachers group by teacher_id;
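The ORDER BY and SEPARATOR options can be combined as in the sketch below, assuming the same teachers table; here the subjects of each teacher are sorted alphabetically and separated by a pipe character.

select teacher_id,
       group_concat(subjects order by subjects separator '|') as subjects_list
from   teachers
group  by teacher_id;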

Creating a Non Reusable Object from Reusable Object

Q) How to create a non-reusable transformation or session or task from a reusable transformation or
session or task?
I still remember my first project, in which I created so many reusable transformations and developed
a mapping. My project lead reviewed the code and told me that I had created unnecessary reusable
transformations and asked me to change them to non-reusable transformations. I created the
non-reusable transformations and re-implemented the entire logic, which took me almost one day.
Many new Informatica developers still make the same mistake and re-implement the entire logic.
I found an easy way to create a non-reusable object from a reusable one. Follow the below steps to
create a non-reusable transformation or session or task from a reusable transformation or session or
task in Informatica:

1. Go to the Navigator which is on the left side.


2. Select with the mouse the reusable transformation or session or task which you want to convert
to non-reusable.
3. Drag the object (transformation/session/task) to the workspace and, just before dropping the
object on the workspace, hold the Ctrl key and then release the object.

Now you are done with creating a non-reusable transformation or session or task.
Normalizer Transformation Error - Informatica

Normalizer transformation is used to convert the data in multiple columns into different rows.
Basically, the normalizer transformation converts the denormalized data in a table into a normalized
table.
Normalizer Transformation Error
You may get the following error for the normalizer transformation in a mapping when pivoting the
columns into rows:

TT_11054 Normalizer Transformation: Initialization Error: [Cannot match OFOSid with IFOTid.]

How to fix the Normalizer Transformation Error?


Solution:
Follow the below steps to avoid this error.

1. There should be no unconnected input ports to the Normalizer transformation.


2. If the Normalizer has an OCCURS in it, make sure the number of input ports matches the number
of OCCURS.

Query to Generate Duplicate Rows in Oracle

Sometimes we want to duplicate each row based on a column value. We will see how to solve this
problem with an example. Assume that we have a products table, which has the product name and
the number of products sold.
Table: products

product_name products_sold
--------------------------
A 2
B 3
Now we want to duplicate each row based on the products_sold field, so that the product A record
repeats 2 times and the product B record repeats 3 times as shown below:

product_name products_sold
--------------------------
A 2
A 2
B 3
B 3
B 3
The following query will generate the duplicate records that we need:

SELECT product_name,
       products_sold
FROM   products p,
       (SELECT rownum repeat FROM dual
        CONNECT BY LEVEL <=
          (SELECT MAX(products_sold) FROM products)
       ) r
WHERE  p.products_sold >= r.repeat;
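To try this out, a minimal hypothetical setup for the products table could look like the following; the table definition and values are assumptions that match the example above.

-- Hypothetical sample data for the example
CREATE TABLE products (product_name VARCHAR2(10), products_sold NUMBER);
INSERT INTO products VALUES ('A', 2);
INSERT INTO products VALUES ('B', 3);
-- With this data the query above should return A twice and B three times.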

Converting Rows to Columns

Let's see the conversion of rows to columns with an example. Suppose we have a products table
which looks like
Table: products

product_id product_name
-----------------------
1 AAA
1 BBB
1 CCC
2 PPP
2 QQQ
2 RRR

Now we want to convert the data in the products table as

product_id product_name_1 product_name_2 product_name_3
--------------------------------------------------------
1 AAA BBB CCC
2 PPP QQQ RRR

The following query converts the rows to columns. A row number is first generated within each
product_id so that the first, second and third product names can be picked:

SELECT product_id,
       MAX(DECODE(rn, 1, product_name, NULL)) product_name_1,
       MAX(DECODE(rn, 2, product_name, NULL)) product_name_2,
       MAX(DECODE(rn, 3, product_name, NULL)) product_name_3
FROM   (SELECT product_id,
               product_name,
               ROW_NUMBER() OVER (PARTITION BY product_id
                                  ORDER BY product_name) rn
        FROM   products)
GROUP  BY product_id;

Query to Generate Sequence numbers 1 to n

In oracle we can generate sequence numbers from 1 to n by using the below query:
SELECT rownum
FROM dual
CONNECT BY LEVEL<=n;
Replace n with a number.
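For example, with n = 5 the query returns the numbers 1 to 5.

SELECT rownum
FROM dual
CONNECT BY LEVEL<=5;
-- returns 1, 2, 3, 4, 5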
Flat file header row, footer row and detail rows to multiple tables

Assume that we have a flat file with a header row, a footer row and detail rows. Let's see how to
load the header row into one table, the footer row into another table and the detail rows into a third
table using only transformations.
First pass the data from the source qualifier to an expression transformation. In the expression
transformation, assign a unique number to each row (assume an exp_count port). After that, pass the
data from the expression to an aggregator. In the aggregator transformation, do not check any group
by port, so that the aggregator provides the last row as its default output (assume an agg_count port).
Now pass the data from the expression and the aggregator to a joiner transformation. In the joiner,
select the ports from the aggregator as master and the ports from the expression as detail. Give the
join condition on the count ports and select the join type as master outer join. Pass the joiner output
to a router transformation and create two groups in the router. For the first group, give the condition
as exp_count=1, which gives the header row. For the second group, give the condition as
exp_count=agg_count, which gives the footer row. The default group will give the detail rows.
Note: Check the sorted input option in the joiner properties. Otherwise you cannot connect the data
from the expression and aggregator transformations to the joiner.

Converting Columns to Rows in Oracle

Let's see the conversion of columns to rows with an example. Suppose we have a table which
contains the subjects handled by each teacher. The table looks like
Table: teachers

teacher_id subject1 subject2 subject3
-------------------------------------
1 maths physics english
2 social science drawing

Now we want to convert the data in the teachers table as

teacher_id subject
------------------
1 maths
1 physics
1 english
2 social
2 science
2 drawing

To achieve this we need each row in teachers table to be repeated 3 times (number
of subject
columns). The following query converts the columns into rows:
SELECT teacher_id,
CASE pivot
WHEN 1
THEN subject1
WHEN 2
THEN subject2
WHEN 3
THEN subject3
ELSE NULL
END subject
FROM teachers,
(SELECT rownum pivot from dual
CONNECT BY LEVEL <=3)
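On Oracle 11g and later, the same result can also be obtained with the UNPIVOT clause; the sketch below is an alternative, with the caveat that UNPIVOT excludes rows where the subject column is NULL by default, unlike the CASE based query above.

-- Alternative sketch using UNPIVOT (Oracle 11g and later)
SELECT teacher_id, subject
FROM teachers
UNPIVOT (subject FOR subject_column IN (subject1, subject2, subject3));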

New Features of Informatica-9

1. Informatica 9 supports data integration for the cloud as well as on premise. You
can integrate the
data in cloud applications, as well as run Informatica 9 on cloud infrastructure.
2. Informatica analyst is a new tool available in Informatica 9.
3. There is an architectural difference in Informatica 9 compared to previous versions.

4. Browser based tool for business analyst is a new feature.


5. Data steward is a new feature.
6. Allows unified administration with a new admin console that enables you to manage PowerCenter
and PowerExchange from the same console.
7. Powerful new capabilities for data quality.
8. Single admin console for data quality, PowerCenter, PowerExchange and data services.
9. In Informatica 9, Informatica data quality (IDQ) has been further integrated
with the Informatica
Platform and performance, manageability and reusability have all been significantly
enhanced.
10. The mapping rules are shared between the browser based tool for analysts and the Eclipse
based developer tool, leveraging unified metadata underneath.
11. The data services capabilities in Informatica 9, both over SQL and web services, can be used for
real-time dashboarding.
12. Informatica data quality provides world wide address validation support with
integrated
geocoding.
13. The ability to define rules and view and run profiles is available in both the Informatica Developer
(thick client) and Informatica Analyst (browser based, thin client) tools. These tools sit on a unified
metadata infrastructure. Both tools incorporate security features like authentication and
authorization.
14. The developer tool is now Eclipse based and supports both data integration and data quality for
enhanced productivity. A browser based tool is provided for analysts to support the types of tasks
they engage in, such as profiling data, specifying and validating rules, and monitoring data quality.
15. A Velocity methodology is expected to be introduced for Informatica 9 soon.
16. Informatica has the capability to pull data from IMS, DB2 and several other legacy (mainframe)
environments such as VSAM, Datacom, and IDMS.
17. There are separate tools available for different roles. The Mapping Architect for Visio tool is
designed for architects and developers to create templates for common data integration patterns,
saving developers a tremendous amount of time.
18. Informatica 9 does not include ESB infrastructure.
19. Informatica supports open interfaces such as web services and can integrate
with other tools
that support these as well including BPM tool.
20. Informatica 9 complements existing BI architectures by providing immediate
access to data
through data virtualization, which can supplement the data in existing data
warehouse and
operational data store.
21. Informatica 9 supports profiling of mainframe data, leveraging the Informatica platform's
connectivity to mainframe sources.
22. Informatica 9 continues to support running the same workflow simultaneously.
23. An Eclipse based environment is built for developers.
24. Browser based tool is a fully functional interface for business analysts.
25. Dashboards are designed for business executives.
26. There are 3 interfaces through which these capabilities can be accessed. The Analyst tool is a
browser based tool for analysts and stewards. Developers can use the Eclipse based developer tool.
Line of business managers can view data quality scorecards.

Data Warehouse Dimensional Modelling (Types of Schemas)

There are four types of schemas available in data warehousing, of which the star schema is the most
widely used in data warehouse designs. The second most used schema is the snowflake schema.
We will see these schemas in detail.
Star Schema:
A star schema is one in which a central fact table is surrounded by denormalized dimension
tables. A star schema can be simple or complex. A simple star schema consists of one fact table,
whereas a complex star schema has more than one fact table.

Snow Flake Schema:


A snowflake schema is an extension of the star schema in which the dimension tables are further
normalized into additional related dimension tables. Snowflake schemas are useful when there are
low cardinality attributes in the dimensions.

Galaxy Schema:
Galaxy schema contains many fact tables with some common dimensions (conformed
dimensions).
This schema is a combination of many data marts.

Fact Constellation Schema:


The dimensions in this schema are segregated into independent dimensions based on the levels of
hierarchy. For example, if geography has five levels of hierarchy like territory, region, country, state
and city, a constellation schema would have five dimensions instead of one.

Types of Facts in Data Warehouse

A fact table is the one which consists of the measurements, metrics or facts of a business process.
These measurable facts are used to know the business value and to forecast the future business.
The different types of facts are explained in detail below.
Additive:
Additive facts are facts that can be summed up through all of the dimensions in the
fact table. A
sales fact is a good example for additive fact.
Semi-Additive:
Semi-additive facts are facts that can be summed up for some of the dimensions in
the fact table,
but not the others.
Eg: Daily balances fact can be summed up through the customers dimension but not
through the
time dimension.
Non-Additive:
Non-additive facts are facts that cannot be summed up for any of the dimensions
present in the fact
table.
Eg: Facts which have percentages, ratios calculated.

Factless Fact Table:

In the real world, it is possible to have a fact table that contains no measures or
facts. These tables
are called "Factless Fact tables".
Eg: A fact table which has only a product key and a date key is a factless fact table. There are no
measures in this table, but you can still get the number of products sold over a period of time.

Fact tables that contain aggregated facts are often called summary tables.

Types of Dimensions in data warehouse

A dimension table consists of the attributes about the facts. Dimensions store the textual
descriptions of the business. Without the dimensions, we cannot measure the facts. The different
types of dimension tables are explained in detail below.
Conformed Dimension:
Conformed dimensions mean the exact same thing with every possible fact table to
which they are
joined.
Eg: The date dimension table connected to the sales facts is identical to the date
dimension
connected to the inventory facts.
Junk Dimension:
A junk dimension is a collection of random transactional codes, flags and/or text attributes that are
unrelated to any particular dimension. The junk dimension is simply a structure that provides a
convenient place to store the junk attributes.
Eg: Assume that we have a gender dimension and marital status dimension. In the
fact table we
need to maintain two keys referring to these dimensions. Instead of that create a
junk dimension
which has all the combinations of gender and marital status (cross join gender and
marital status
table and create a junk table). Now we can maintain only one key in the fact table.

Degenerated Dimension:
A degenerate dimension is a dimension which is derived from the fact table and
doesn't have its own
dimension table.
Eg: A transactional code in a fact table.
Role-playing dimension:
Dimensions which are often used for multiple purposes within the same database are called
role-playing dimensions. For example, a date dimension can be used for "date of sale", as well as
"date of delivery", or "date of hire".

Concatenating multiple rows into a single column dynamically - Oracle

Q) How to concatenate multiple rows of a column in a table into a single column?


I have to concatenate multiple rows into a single column. For example, consider the teachers
table below.

Table Name: Teacher

Teacher_id subject_name
-----------------------

1 Biology

1 Maths

1 Physics

2 English

2 Social

The above table is a normalized table containing the subjects and teacher id. We
will denormalize
the table, by concatenating the subjects of each teacher into a single column and
thus preserving
the teacher id as unique in the output. The output data should look like as below

teacher_id subjects_list

-------------------------------

1 Biology|Maths|Physics

2 English|Social

How to achieve this?


Solution:
We can concatenate multiple rows into a single column dynamically by using a hierarchical query.
The SQL query to get the result is

SELECT teacher_id,
       SUBSTR(SYS_CONNECT_BY_PATH(subject_name, '|'), 2) subjects_list
FROM   (SELECT teacher_id,
               subject_name,
               COUNT(*) OVER (PARTITION BY teacher_id) sub_cnt,
               ROW_NUMBER() OVER (PARTITION BY teacher_id
                                  ORDER BY subject_name) sub_seq
        FROM   teachers
       ) A
WHERE  sub_seq = sub_cnt
START WITH sub_seq = 1
CONNECT BY PRIOR sub_seq + 1 = sub_seq
       AND PRIOR teacher_id = teacher_id;
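On Oracle 11g Release 2 and later, the simpler LISTAGG function gives the same result; the query below is a sketch of that alternative.

-- Alternative sketch using LISTAGG (Oracle 11gR2 and later)
SELECT teacher_id,
       LISTAGG(subject_name, '|') WITHIN GROUP (ORDER BY subject_name) subjects_list
FROM   teachers
GROUP  BY teacher_id;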
