Folks Talk
The SSH client utility in unix or linux is used to log in to a remote host and execute commands on the remote machine. The rlogin and rsh commands can also be used to log in to a remote machine; however, they are not secure. The ssh command provides a secure connection between two hosts over an insecure network.
The basic syntax of the ssh command is
ssh remote-server
username@remote-server password:
remote-server:[~]>
Alternatively, you can specify the user name explicitly when connecting to the remote host:
ssh username@remote-server
username@remote-server password:
remote-server:[~]>
Note: If you are logging in for the first time, ssh prints a message that the host key is not found, and you can type yes to continue. The host key of the remote server is then cached and added to the .ssh2/hostkeys directory in your home directory. From the second time onwards you just need to enter the password.
2. Logging out from remote server
Simply enter the exit command on the terminal to close the connection. This is
shown below:
remote-server:[~]>exit
logout
localhost:[~]>
3. Executing a command on the remote host
[Link]
[Link]
[Link]
The ssh command connects to the remote host, runs the ls command, prints the output
on the local
host terminal and exits the connection from remote host.
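A likely form of such a command, assuming the listed files live in a test directory on the remote host:
ssh user@remotehost "cd test; ls"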
Let's see whether the ls command actually displayed the correct result by connecting to the remote host.
user@remotehost password:
remotehost:[~]> cd test
remotehost:[~/test]> ls
[Link]
[Link]
[Link]
4. Version of the SSH command
We can find the version of SSH installed on the unix system using the -V option of ssh. This is shown below:
> ssh -V
To debug connection problems, run ssh with the lowercase -v option, which prints verbose messages about each stage of the connection:
ssh -v user@remote-host
..........
..........
To copy a file from the remote host's /usr/local/bin/ directory to the local host's current directory, run the below scp command:
scp user@remote-host:/usr/local/bin/[Link] .
The wc command in unix or linux is used to find the number of lines, words and characters in a file. The syntax of the wc command is shown below:
wc [options] filenames
Let's see how to use the wc command with a few examples. Create the following file, unix_wc.bat, in your unix or linux operating system.
Oracle Storage
debian server
Oracle backup server
WC Command Examples:
1. Printing count of lines
This is the most commonly used operation to find the number of lines from a file.
Run the below
command to display the number of lines:
wc -l unix_wc.bat
5 unix_wc.bat
Here in the output, the first field indicates the line count and the second field is the filename.
2. Displaying the number of words.
Just use the -w option to find the count of words in a file. This is shown below:
wc -w unix_wc.bat
13 unix_wc.bat
3. Printing the count of characters
Use the -c option to print the number of bytes in the file, or the -m option to print the number of characters:
> wc -c unix_wc.bat
92 unix_wc.bat
> wc -m unix_wc.bat
92 unix_wc.bat
4. Displaying the length of the longest line
The -L option prints the length of the longest line in the file:
wc -L unix_wc.bat
23 unix_wc.bat
In this example, the second line is the longest line with 23 characters.
5. Print count of lines, words and characters.
If you don't specify any option, by default the wc command prints the count of lines, words and characters. This is shown below:
wc unix_wc.bat
5 13 92 unix_wc.bat
6. Wc help
For any help on the wc command, just run the wc --help on the unix terminal.
SCP (secure copy) is used to copy data (files or directories) from one unix or linux system to another unix or linux server. SCP uses secure shell (ssh) to transfer the data between the remote hosts. The general form of the scp command is
scp [options] User@From_Host:Source_File User@To_Host:Destination_File
The components of this syntax are:
. User: The user who has permission to access the files and directories. The user should have read permission on the source and write permission on the destination.
. From_Host: hostname or Ip address where the source file or directory resides.
This is
optional if the from host is the host where you are running the scp command.
. Source_File: Files or directories to be copied to the destination.
. To_Host: Destination host where you want to copy the files. You can omit this
when you
want to copy the files to the host where you are issuing the scp command.
. Destination_File: Name of the file or directory in the target host.
1. Copying a file on the local host
This command copies the file [Link] from the current directory to the /var/tmp directory.
2. Copy file from local host to remote server
This is the most frequently used operation to transfer files between unix systems.
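A minimal sketch, using a hypothetical local file named backup.dat:
scp backup.dat user@remotehost:/remote/directory/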
This command connects to the remote host and copies the specified file to the
/remote/directory/.
3. Copy files from remote host to local server.
This operation is used when taking backup of the files in remote server.
scp user@remotehost:/usr/backup/oracle_backup.dat .
This command copies the oracle backup file in the remote host to the current
directory.
4. Copying files between two remote servers
The scp command can also be used to copy files between two remote hosts.
scp source_user@source_remote_host:/usr/bin/mysql_backup.sh
target_user@target_remote_host:/var/tmp/
The above command copies the mysql backup shell script from the source remote host to the /var/tmp directory of the target remote host.
5. Copying a directory.
To copy all the files in a directory, use the -r option with the scp command. This makes the scp command copy the directory recursively.
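A likely form, using a hypothetical local directory named backup_dir:
scp -r backup_dir user@remotehost:/var/tmp/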
The above command copies the directory from local server to the remote host.
6. Improving performance of scp command
By default the scp command uses the Triple-DES cipher/AES-128 to encrypt the data.
Using the
blowfish or arcfour encryption will improve the performance of the scp command.
7. Limit bandwidth
You can limit the bandwidth used by the scp command using the -l option.
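Sketches of both options; the -c cipher names depend on the installed OpenSSH version (blowfish-cbc and arcfour exist only in older releases), and the -l value is in Kbit/s:
scp -c blowfish-cbc user@remote-host:/usr/backup/oracle_backup.dat .
scp -l 400 user@remote-host:/usr/backup/oracle_backup.dat .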
The xargs command in unix or linux operating systems is used to pass the output of one command as arguments to another command. Some unix or linux commands, like ls and find, produce a long list of filenames, and we often want to perform some operation on that list, such as searching for a pattern, removing files or renaming files. The xargs command provides this capability by taking the huge list of arguments as input, dividing the list into small chunks, and then passing them as arguments to other unix commands.
Unix Xargs Command Examples:
1. Renaming files with xargs
We first list the files to be renamed using the ls or find command and then pipe the output to the xargs command to rename the files. First, list the files which end with ".log" using the ls command.
ls *.log
[Link] [Link]
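The rename step itself is a sketch along these lines, assuming a version of xargs that supports the -i option; it appends a _bkp suffix to each file:
ls *.log | xargs -i mv {} {}_bkp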
> ls *_bkp
oracle.log_bkp storage.log_bkp
You can see how the log files are renamed with a backup (bkp) suffix. Here the -i option tells the xargs command to replace the {} with each file name returned by the ls command.
2. Searching for a pattern
We can combine the grep command with xargs to search for a pattern in a list of files returned by another unix command (ls or find). Let's list all the bash files in the current directory with the find command:
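A likely form of this listing, assuming the scripts are identified by a .bash extension:
find . -name "*.bash"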
./sql_server.bash
./mysql_backup.bash
./oracle_backup.bash
Now we grep for the "echo" statements from the list of files returned by the find
command with the
help of xargs. The command is shown below:
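A minimal sketch of this step:
find . -name "*.bash" | xargs grep "echo"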
If you don't use xargs and pipe the output of the find command directly to the grep command, then the grep command treats each file name returned by find as a line of text and searches for the word "echo" in that line rather than in the file itself.
3. Removing files using xargs
We can remove the temporary files in a directory using the rm command along with
the xargs
command. This is shown below:
ls *.tmp | xargs rm
4. Converting multi-line output into a single line
> ls -1
[Link]
online_backup.dat
mysql_storage.bat
We can convert this multi-line output to single line output using the xargs
command. This is shown
below:
ls -1 | xargs
5. Handling file names that contain spaces
You can see that the grep command treats oracle as a separate file and storage as a separate file. This is because xargs treats a space as a delimiter. To avoid this kind of error, use the -i option with braces, as shown below:
If you want to know what command the xargs is executing use the -t option with
xargs. This will print
the command on the terminal before executing it.
6. Passing subset of arguments
We can pass only a subset of arguments from a long list of arguments using the -n
option with xargs
command. This is shown in below.
> ls -1
backup
mysql
network
online
oracle
storage
wireless
wireless
You can see from the above output that 3 arguments are passed at a time to the echo
statement.
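A command consistent with this behavior, given the listing above, is the following sketch; each invocation of echo receives at most three names:
ls -1 | xargs -n 3 echo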
Important Notes on Xargs Command:
1. Xargs cannot directly handle files which contain new lines or spaces in their names. To handle this kind of file, use the -i option with the xargs command. Another way to handle these characters is to treat the new line or space as a null character using the -0 option with xargs; however, this requires that the input to xargs also uses null as the separator. An example is shown below.
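A minimal sketch of the null-separated form, assuming GNU find and xargs and using .tmp files only for illustration:
find . -name "*.tmp" -print0 | xargs -0 rm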
We use database procedures to generate the date dimension for data warehouse applications. Here I am going to show you how to generate the date dimension in informatica.
Let's see how to generate the list of all the days between two given dates using an oracle sql query.
SELECT to_date('01-JAN-2000','DD-MON-YYYY') + level - 1 CALENDAR_DATE
FROM dual
CONNECT BY level <= (
       to_date('31-DEC-2000','DD-MON-YYYY') -
       to_date('01-JAN-2000','DD-MON-YYYY') + 1
);
Output:
CALENDAR_DATE
-------------
1/1/2000
1/2/2000
1/3/2000
.
.
12/31/2000
Now we can apply date functions on the Calendar date field and can derive the rest
of the columns
required in a date dimension.
We will see how to get the list of days between two given dates in informatica.
Follow the below
steps for creating the mapping in informatica.
. Create a source with two ports ( Start_Date and End_Date) in the source analyzer.
. Create a new mapping in the mapping designer and drag the source definition into the mapping.
. Create the java transformation in active mode.
. Drag the ports of source qualifier transformation in to the java transformation.
. Now edit the java transformation by double clicking on the title bar and go to
the "Java Code"
tab. Here you will again find sub tabs. Go to the "Import Package" tab and enter
the below java
code:
import [Link];
import [Link];
import [Link];
import [Link];
. Not all these packages are required; however, I included them in case you want to apply any formatting on the dates. Go to the "On Input Row" tab and enter the following java code:
// Emit one output row for every day between Start_Date and End_Date (inclusive).
// This is a reconstructed sketch: it assumes the date/time ports are exposed as
// long millisecond values inside the Java transformation.
long oneDay = 24L * 60 * 60 * 1000;
for (long d = Start_Date; d <= End_Date; d += oneDay)
{
    Start_Date = d;     // overwrite the output port with the current date
    generateRow();      // emit one row for this date
}
. Compile the java code by clicking on Compile. This will generate the java class files.
. Connect only the Start_Date output port from java transformation to expression
transformation.
. Connect the Start_Date port from expression transformation to target and save the
mapping.
. Now create a workflow and session. Enter the following oracle sql query in the
Source SQL
Query option:
SELECT to_date('01-JAN-2000','DD-MON-YYYY') Start_Date,
       to_date('31-DEC-2000','DD-MON-YYYY') End_Date
FROM DUAL;
Save the workflow and run. Now in the target you can see the list of dates loaded
between the two
given dates.
Note1: I have used relational table as my source. You can use a flat file instead.
Note2: In the expression transformation, create the additional output ports and
apply date functions
on the Start_Date to derive the data required for date dimension.
Bash Shell Script to Read / Parse Comma Separated (CSV) File - Unix / Linux
Q) How to parse CSV files and print the contents on the terminal using a bash shell script in a Unix or Linux system?
Reading data from a delimited file and applying some operations on it is a very common task on Unix systems. Here we see how to read a comma separated value (CSV) file using a while loop in a shell script and print the values on the Unix terminal.
Consider the below CSV file as an example:
This file contains two fields: the first field is the operating system and the second field is the hosting server type. Let's see how to parse this CSV file with the simple bash script shown below:
#!/usr/bin/bash
INPUT_FILE='unix_file.csv'
IFS=','
while read OS HS
do
    # print each field; the message text here is illustrative
    echo "Operating System : $OS"
    echo "Hosting Server   : $HS"
done < "$INPUT_FILE"
Here IFS is the input field separator. As the file is comma delimited, the IFS
variable is set with
comma. The output of the above script is
Here in the code, the IFS=',' assignment and the while statement can be merged into a single statement as shown below:
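A sketch of the merged form:
while IFS=',' read OS HS
do
    echo "Operating System : $OS"
    echo "Hosting Server   : $HS"
done < "$INPUT_FILE"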
Informatica is an ETL tool used for extracting data from various sources (flat files, relational databases, xml etc.), transforming the data and finally loading the data into a centralized location such as a data warehouse or an operational data store. Informatica powercenter has a service oriented architecture that provides the ability to scale services and share resources across multiple machines.
The architectural diagram of informatica is shown below:
Informatica Architecture
The important components of the informatica power center are listed below:
Domain: The domain is the primary unit for management and administration of services in Powercenter. The components of a domain are one or more nodes, the service manager and the application services.
Node: Node is logical representation of machine in a domain. A domain can have
multiple nodes.
Master gateway node is the one that hosts the domain. You can configure nodes to
run application
services like integration service or repository service. All requests from other
nodes go through the
master gateway node.
Service Manager: Service manager is for supporting the domain and the application
services. The
Service Manager runs on each node in the domain. The Service Manager starts and
runs the
application services on a machine.
Application services: A group of services that represent the informatica server based functionality. Application services include the powercenter repository service, integration service, data integration service, metadata manager service etc.
Powercenter Repository: The metadata is stored in a relational database. The tables contain the instructions to extract, transform and load data.
Powercenter Repository service: Accepts requests from the client to create and
modify the
metadata in the repository. It also accepts requests from the integration service
for metadata to run
workflows.
Powercenter Integration Service: The integration service extracts data from the
source, transforms
the data as per the instructions coded in the workflow and loads the data into the
targets.
Informatica Administrator: Web application used to administer the domain and
powercenter
security.
Metadata Manager Service: Runs the metadata manager web application. You can
analyze the
metadata from various metadata repositories.
Unix and Linux operating systems provide a feature for scheduling jobs. You can set up commands or scripts that will run periodically at a specified time. Crontab is the command used to add or remove jobs from the cron. The cron service is a daemon that runs in the background and checks the /etc/crontab file, the /etc/cron.*/ directories and the /var/spool/cron/ directory for any scheduled jobs.
Each user has a separate crontab file under /var/spool/cron/. Users are not allowed to modify these files directly; the crontab command is used for setting up the jobs in the cron.
The format of a crontab entry is
* * * * * command to be executed
where the five fields, from left to right, are:
MI  : Minutes from 0 to 59
HH  : Hours from 0 to 23
DOM : Day of the month from 1 to 31
MON : Month from 1 to 12
DOW : Day of the week from 0 to 6 (0 is Sunday)
1. List crontab entries
crontab -l
0 0 * * * /usr/local/bin/list_unix_versions.sh
The above crontab command displays the cron entries of the current user. Here the shell script for listing the unix versions (list_unix_versions.sh) is scheduled to run daily at midnight.
2. List crontab entries of other users
To list the crontab entries of another user, use the -u option with crontab. The syntax is shown below:
crontab -u username -l
3. Removing crontab entries
To remove all of your own cron entries:
crontab -r
To remove the crontab entries of another user:
crontab -u username -r
4. Editing crontab entries
To edit the crontab entries of a user, use the -e option:
crontab -u username -e
This will open a file in the VI editor. Now use the VI commands for adding or removing jobs and for saving the crontab entries.
5. Schedule a job to take oracle backup on every Sunday at midnight
Edit crontab using "crontab -e" and append the following entry in the file.
0 0 * * 0 /usr/local/bin/oracle_backup.sh
6. Schedule a job to run every 6 hours
0 0,6,12,18 * * * /usr/bin/mysql_backup.sh
Here the list 0,6,12,18 indicates midnight, 6am, 12pm and 6pm respectively.
7. Schedule job to run for the first 15 days of the month.
You can schedule a job by specifying the range of values for a field. The following
example takes the
sql server backup daily at midnight for the first 15 days in a month.
0 0 1-15 * * /usr/bin/sql_server_backup.sh
8. Schedule a job to run every minute
* * * * * /bin/batch_email_send.sh
9. Taking backup of cron entries
Before editing the cron entries, it is good to take a backup of them, so that even if you make a mistake you can restore the entries from the backup. To take the backup:
crontab -l > cron_backup.dat
To restore the entries from the backup file:
crontab cron_backup.dat
. Asterisk (*) : Indicates all possible values for a field. An asterisk in the
month field indicates all
possible months (January to December).
. Comma (,) : Indicates list of values. See example 6 above.
. Hyphen (-): Indicates range of values. See example 7 above.
Disabling Emails:
By default, cron sends an email to the local user if the command or script produces any output. To disable sending of emails, redirect the output of the command to /dev/null 2>&1.
Note: You cannot schedule a job to run at the seconds level, as the minimum scheduling granularity is one minute.
Yum (Yellowdog Updater Modified) is one of the package manager utilities in the Linux operating system. The yum command is used for installing, updating and removing packages in a Linux environment. Some other package manager utilities on linux systems are apt-get, dpkg, rpm etc.
By default, yum is installed on some linux distributions like CentOS, Fedora and Redhat. Let's see some of the most commonly used yum commands with examples.
1. Listing available packages
You can list all the available packages in the yum repository using the list option:
yum list
If you want to search for the mysql package, then execute the following yum command.
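A likely form of this search:
yum search mysql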
However, this yum command matches only the package name and summary. Use "yum search all" to match everything.
4. How to install a package using yum
"yum install package_name" installs the specified package in the linux operating system. The yum command automatically finds the dependencies and installs them as well.
Yum prompts the user to accept or decline before installing the package. If you want yum to avoid prompting the user, then use the -y option with the yum command.
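For example, to install the mysql package without being prompted (the package name is used here just for illustration):
yum -y install mysql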
7. Uninstalling a package.
You can remove (uninstall) a package with all dependencies using the "yum remove
package_name". This is shown below:
The yum remove prompts the user to accept or decline the uninstalling of package.
8. Information about the package.
You can print and check for the information about the package before installing it.
Execute the
following yum command to get info about the package.
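A likely form, again using mysql only as an illustration:
yum info mysql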
You can list the available package groups using the grouplist option:
yum grouplist
11. Installing a software group.
You can install a software group by running the following command.
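A sketch of the group install; "Development Tools" is just a typical group name taken from yum grouplist output:
yum groupinstall "Development Tools"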
Here "all" is optional in the yum repolist command. If you provide "all", it displays both enabled and disabled repositories; otherwise it displays only the enabled repositories.
15. More info about yum command
If you want to know more information about the yum command, then run the man on yum
as
man yum
Find and Remove Files Modified / accessed N days ago - Unix / Linux
Q) How to find the files which were modified or accessed N or more days ago and
then delete those
files using the unix or linux command?
Searching for files which were modified (or accessed) 10 or more days ago is a common operation, especially when you want to archive or remove older log files. Let's see this with the help of an example.
Consider the below list of files in the current directory:
ls -l
total 24
Let's see today's date on my unix operating system by issuing the date command. I am providing this date just as a reference point for the N days.
date
We can use the find command for searching the files modified N or more days ago.
The find
command for this is:
find . -mtime +N
As an example, let's list the files modified more than 5 days ago. The unix command for this is:
find . -mtime +5
./[Link]
./[Link]
We got the list of files. Next we have to delete them using the rm command. One way of removing the files is piping the output of the find command to xargs. This is shown below:
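A minimal sketch of that pipeline:
find . -mtime +5 | xargs rm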
The find command itself has the capability of executing commands on the files it lists, using the -exec option. The complete find command for deleting the files modified more than N days ago is:
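A sketch of the -exec form:
find . -mtime +N -exec rm {} \;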
Note: To remove the files based on access time use the -atime in the find command.
Q) How to search for a string (or pattern) in a file and replace that matched
string with another string
only on the specified line number in the file?
Let's see this with the help of an example. Consider a sample file with the following content:
The above sed command replaces the string on all the matched lines. In this example, it replaces the string on the first, second, third and fifth lines.
If we want to replace the pattern on a specific line number, then we have to specify the line number to the sed command. The sed command syntax for replacing the pattern on the Nth line is shown below.
To replace "Fedora" with "BSD" on the second line only, run the below sed command on the unix terminal:
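Sketches of the general form and the specific command, with filename standing in for the actual file:
sed 'N s/pattern/replacement/' filename
sed '2 s/Fedora/BSD/' filename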
Q) How to delete a range of lines from a file using a unix or linux command?
Unix provides a simple way to delete the lines whose line numbers are between m and n. This feature is not directly available in the windows operating system.
We can use the sed command for removing the lines. The syntax for removing a range of lines (between m and n) is shown below.
Here the number n should be greater than m. Let's see this with an example. Consider a sample file with the following contents:
Linux also runs on embeded systems like network routers, mobiles etc.
From the above file if we want to delete the lines from number 2 to 4, then run the
below sed
command in unix:
However, this command just prints the result on the terminal and does not remove the lines from the file. To delete the lines from the source file itself, use the -i option with the sed command.
You can negate this operation and delete the lines that are not in the specified range. This is shown in the following sed command:
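Sketches of these commands, with filename standing in for the actual file:
sed 'm,nd' filename
sed '2,4d' filename
sed -i '2,4d' filename
sed '2,4!d' filename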
Q) How to delete the trailer line (last line) from a file using a unix or linux command?
Let's see how to remove the last line from a file with an example. Consider a file with sample content as shown below:
Unix Sed command is popularly used for searching a pattern and then replacing the
matched pattern
with another string. However we can also use the sed command for deleting the lines
from a file.
To remove the last line from a file, run the below sed command:
Here $ represents the last line in the file and d deletes that line. The above command displays the contents of the file on the unix terminal excluding the footer line; however, it does not delete the line from the source file. If you want the line to be removed from the source file itself, then use the -i option with the sed command. This command is shown below:
If you want to keep only the footer line in the file and remove all the other lines, then you have to negate the delete operation. For this, use an exclamation mark (!) before the d. This is shown in the following sed command:
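Sketches of the three variants, with filename standing in for the actual file:
sed '$d' filename
sed -i '$d' filename
sed '$!d' filename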
Q) How to remove the header line (first line) from a file using a unix or linux command?
Let's see how to delete the first line from a file with an example. Consider a file with sample content as shown below:
Mostly we see the sed command for replacing the strings in a file. We can also use
the sed
command for removing the lines in a file. To delete the first line from a file, run
the following sed
command:
Here 1 represents the first line and d deletes that line. The above command prints the contents of the file on the unix terminal with the first line removed; however, it does not remove the line from the source file. If you want the change made in the source file itself, then use the -i option with the sed command. This command is shown below:
You can keep only the first line and remove the remaining lines from the file by negating the above sed command. You have to use an exclamation mark (!) before the d command. The following sed command keeps only the first line in the file and removes the other lines:
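Sketches of the three variants, with filename standing in for the actual file:
sed '1d' filename
sed -i '1d' filename
sed '1!d' filename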
Q) How to display the lines from a file that end with a specified string (or pattern) using unix or linux commands?
Printing the lines that end with a specified pattern is a commonly used operation in a unix environment. Grep is the frequently used command in unix for searching for a pattern in a file and printing the lines that contain the specified pattern.
We will see how to print the lines that end with the specified pattern with an
example. Consider the
Sample log file data as an example, which is shown below:
Now if we want to print the lines that end with the string "vmware", then use the
grep command with
dollar ($) in the pattern. The complete unix grep command is shown below:
Here $ is used to indicate the end of the line in a file. This grep command prints
the lines that end
with the word "vmware". In this example, the third and fourth lines are printed on
the unix terminal.
If you want to display the lines that end with the word "system", then run the
following grep command
on unix command prompt:
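Sketches of both commands, with logfile.dat as a placeholder file name:
grep "vmware$" logfile.dat
grep "system$" logfile.dat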
Q) How to print the lines from a file that contain a specified whole word using a unix or linux command?
Whole words are complete words which are not part of another string. As an example, consider the sentence "How to initialize shell". Here the words "how", "to", "initialize" and "shell" are whole words. However, the word "initial" is not a whole word, as it is part of another string (initialize).
Let's see this in detail with the help of an example. Consider the following sample data in a file:
Now we have the sample file. First we will see how to search for a word and print
the lines with the
help of grep command in unix. The following grep command prints the lines that have
the word
"match" in the line:
The above command displays the first two lines on the unix terminal. Even though
the first line does
not contain the whole word "match", the grep command displays the line as it
matches for the word
in the string "matching". This is the default behavior of grep command.
To print only the lines that contain the whole words, you have to use the -w option
to the grep
command. The grep command for this is:
Now the above command displays only the second line on the unix terminal. Another example, matching the whole word "boy", is shown below:
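Sketches of these commands, with sample.dat as a placeholder file name:
grep "match" sample.dat
grep -w "match" sample.dat
grep -w "boy" sample.dat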
Q) How to print the lines from a file that do not contain the specified pattern using a unix / linux command?
The grep command in unix by default prints the lines from a file that contain the
specified pattern.
We can use the same grep command to display the lines that do not contain the
specified pattern.
Let see this with the help of an example.
Consider the following sample file as an example:
You can practice unix commands by installing the unix operating system.
There so many unix flavors available in the market.
First we will see how to display the lines that match a specified pattern. To print
the lines that contain
the word "ubuntu", run the below grep command on unix terminal:
The above command displays the third and fourth lines from the above sample file.
Now we will see how to print non matching lines which means the lines that do not
contain the
specified pattern. Use the -v option to the grep command for inverse matching. This
is shown below:
This command prints the first, second and fifth lines from the example file.
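Sketches of both commands, with filename standing in for the actual file:
grep "ubuntu" filename
grep -v "ubuntu" filename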
Q) How to print the lines from a file that start with a specified string (pattern) using unix or linux commands?
Displaying lines that start with a specified pattern is most commonly needed when processing log files in a unix environment. Log files are used to store the messages of shell scripts (echo statements). We can search for errors in the log file using the grep command. Generally, the error keyword will appear at the start of the line.
Sample log file data is shown below:
Success: Run the SQL statement and inserted rows into table.
Message: Script failed. Aborting the bash script to avoid further errors.
Now if we want to get the lines that start with the string "Error", use the grep command with the anchor (^) in the pattern. The complete unix grep command is shown below:
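A sketch, with logfile.dat as a placeholder file name:
grep "^Error" logfile.dat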
Here ^ is used to specify the start of a line in the file. This grep command displays the lines that start with the word "Error". In this example, the third line starts with the specified pattern (Error).
Q) How to print the count of number of lines from a file that match the specified
pattern in unix or
linux operating system?
Let's say you are looking for the word "unix" in a file and want to display the count of lines that contain that word. We will see how to find the count with an example.
Assume that I have a file (unix_sample.dat) in my unix operating system. Sample data from the file is shown below:
Otherwise you don't know whether the unix server is running fine or not.
Use monitoring tools or email alerts to get the status of the unix server.
In the sample data, the word "unix" appears in two lines. Now we will print this count on the unix terminal using unix commands.
1. Using the wc command
We can pipe the output of the grep command to the wc command to find the number of lines that match a pattern. The unix command is shown below.
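A sketch of that pipeline:
grep "unix" unix_sample.dat | wc -l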
Unix commands can be used to remove the empty (blank) lines from a file. Let's see this with the help of an example.
Consider the following data file as an example:
This improves the performance and provides good uptime of the site.
The above sample file contains three empty lines. We will see how to remove these
blank lines with
the help of unix / linux commands.
1. Remove empty lines with Grep command.
The grep command can be used to delete the blank lines from a file. The command is
shown below:
Here ^ specifies the start of the line and $ specifies the end of the line. The -v
option inverses the
match of the grep command.
2. Delete blank lines with Sed command
The sed command can also be used to remove the empty lines from the file. This
command is
shown below:
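Sketches of both approaches, with filename standing in for the actual file:
grep -v "^$" filename
sed '/^$/d' filename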
The difference between the sed and grep approaches here is that the sed command removes the empty lines from the file and prints the result on the unix terminal, whereas the grep command matches the non-empty lines and displays them on the terminal.
The unix grep command can be used to print the lines from a file that match
specified pattern. The
grep command has an option for printing the lines around the line that match the
pattern. Here we
will see how to display N line after a matching line with the help of an example.
Consider the below data file:
> cat linux_enterpise.dat
You can use linux in various systems from mobile phones to space crafts.
First of all, we will see the general syntax for displaying N lines after the matched line. The grep command with the -A option displays the N lines after the matched line and also prints the matched line itself. Now we will try to display the 2 lines after the line that contains the word "flexibility". The grep command for this is:
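Sketches of the general and specific forms (the file name is taken from the cat command above):
grep -A <N> "pattern" filename
grep -A 2 "flexibility" linux_enterpise.dat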
This grep command will print first, second and third lines from the above file.
We know how to use the unix grep command to display the lines from a file that
match a pattern.
However we can use the grep command to display the lines around the line that match
the pattern.
We will see this with the help of an example.
Consider the below data file which talks about importance of online backup:
The most important concern is to keep your documents safe and secure
in a protected place. There are so many companies which offer online
First, we will see the general syntax for displaying N lines before the matched line. The grep command with the -B option displays the N lines before the matched line and also prints the matched line itself. Now we will try to print the 2 lines before the line that contains the word "important". The grep command for this is:
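Sketches of the general and specific forms, with online_backup.dat as a placeholder file name:
grep -B <N> "pattern" filename
grep -B 2 "important" online_backup.dat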
This will display second, third and fourth lines from the above file.
The Unix grep command is used to search for a pattern in the lines of a file and, if it finds the pattern, it displays those lines on the terminal. We can also use the grep command to match a pattern in multiple files.
We will see this with the help of an example. Let's consider the two files shown below:
To grep for the word "hosting" from these two files specify both the file names as
space separated
list in grep command. The complete command is
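A sketch; webhost_trail.dat appears in the output below, and webhost_plans.dat is a placeholder for the second file name:
grep "hosting" webhost_trail.dat webhost_plans.dat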
webhost_trail.dat:Once you are happy with the free web hosting trail
This will display the filename along with the matching line. Instead of specifying each file name, you can specify a pattern (regular expression) for the filenames. Let's say you want to grep for the word "company" in all the files whose names start with "webhost_"; you can use the below grep command:
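A sketch of the wildcard form:
grep "company" webhost_*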
Q) How to make the grep command case in-sensitive and search for a pattern in a
file?
Let's see how to do this with an example. Consider the below "Car insurance" data file:
Tom bought a new car and confused about car insurance quotes.
Now TOM gets an idea and chooses the right insurance for him.
In the above file, you can see the name "tom" appears in different cases (upper
case, lower case
and mixed case).
If I want to display the lines that contain the pattern tom with ordinary grep
command, it will display
only the third line. The grep command is shown below:
To make this grep command case insensitive, use the -i option. Now it will display the first, third and fifth lines from the file. The case insensitive grep command is shown below.
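Sketches of both commands, with car_insurance.dat as a placeholder file name:
grep "tom" car_insurance.dat
grep -i "tom" car_insurance.dat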
We write automated scripts to perform scheduled tasks and put them in the crontab, and these scripts run at their scheduled times. However, we don't know whether the scripts succeeded or not, so sending an email from the automated bash script on the unix host helps us know whether the script succeeded.
Here we will see simple bash script to send emails using the mail command in linux
operating
system.
#!/bin/bash
TO_ADDRESS="recipient@[Link]"
FROM_ADDRESS="sender"
From the names of the variables you can easily understand the significance of each. In the mail command, -s specifies the subject. For the from address, by default the hostname of the unix / linux machine you are logged into is appended to the sender name. For example, if you have logged into the unix host "[Link]" and specified the from address as "test", then your complete from address will be "test@[Link]".
In the above bash script we specified the body from a file and did not specify any attachments. We will enhance the script to attach a file, read the body from a file and specify a list of users in CC. The enhanced mail script is shown below:
#!/bin/bash
TO_ADDRESS="recipient@[Link]"
FROM_ADDRESS="sender"
BODY_FILE="[Link]"
ATTACHMENT_FILE="[Link]"
CC_LIST="user1@[Link];user2@[Link];user3@[Link];user4@cheetahmail.
com"
The uuencode command is used to attach files with the mail command. Here the -c option of the mail command is used to specify the list of users in the cc list.
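A sketch of the sending step, with a hypothetical SUBJECT variable added to the ones defined above; cat supplies the body and uuencode appends the attachment:
SUBJECT="Automated job status"
(cat "$BODY_FILE"; uuencode "$ATTACHMENT_FILE" "$ATTACHMENT_FILE") | mail -s "$SUBJECT" -c "$CC_LIST" "$TO_ADDRESS"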
Connect to Oracle Database in Unix Shell script
Q) How to connect to oracle database and run sql queries using a unix shell script?
The first thing you have to do to connect to oracle database in unix machine is to
install oracle
database drivers on the unix box. Once you installed, test whether you are able to
connect to the
database from command prompt or not. If you are able to connect to the database,
then everything
is going fine.
Here I am not going to discuss how to install the oracle database drivers; I am just providing the shell script which can be used to connect to the database and run sql statements.
The following shell script connects to the scott schema of the oracle database and writes the output to the "[Link]" file.
#!/bin/bash
LogDirectory='/var/tmp/logs'
DataDirectory='/var/tmp/data'
DBUSER='scott'
DBUSERPASSWORD='tiger'
DB='oracle'
# run sqlplus silently and send all output (results and errors) to a log file; the
# log file name and the query below are illustrative
sqlplus -s ${DBUSER}/${DBUSERPASSWORD}@${DB} > ${LogDirectory}/sql_log.dat << EOF
select * from emp;
exit;
EOF
If the sql statements fail to run, the errors are written to the same "[Link]" file. A better solution is to write the sql output to one file and the errors to another file. The below script uses the spool feature of oracle to write the data to a separate file:
#!/bin/bash
LogDirectory='/var/tmp/logs'
DataDirectory='/var/tmp/data'
DBUSER='scott'
DBUSERPASSWORD='tiger'
DB='oracle'
# errors and sqlplus messages go to the log file; the spooled query output goes to
# the data file (the query below is illustrative)
sqlplus -s ${DBUSER}/${DBUSERPASSWORD}@${DB} > ${LogDirectory}/sql_log.dat << EOF
spool ${DataDirectory}/query_output.dat
select * from emp;
spool off
exit;
EOF
Here the output of the select statement is written to the "query_output.dat" file.
Q) How to delete all the lines in a file when opened in a VI editor or VIM editor?
Those who are new to unix often use the dd command in the editor to delete each and every line to empty the file. There is an easier way to delete all the lines in a file when it is opened in an editor.
Follow the below steps to empty a file:
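A minimal way to do this from the editor's command mode (press ESC first):
:1,$d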
See how simple it is to remove all the lines in a file. We will also see how to empty the file when it is not opened in an editor. In unix, /dev/null is an empty stream; you can use it to empty a file. The following commands show how to empty a file:
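Sketches of emptying a file from the shell, with filename as a placeholder:
cat /dev/null > filename
> filename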
Q) How to read each line from a file using loops in bash scripting?
Reading lines from files and then processing on each line is a basic operation in
shell scripting. We
will see here how to read each line from a file using for and while loop in bash
scripting.
Read Line using While Loop:
The below bash script reads lines from the file "[Link]" using a while loop and prints each line on the terminal:
#!/bin/bash
i=1
while read LINE
do
    echo $i $LINE
    i=`expr $i + 1`
done < "[Link]"
Read Line using For Loop:
#!/usr/bin/bash
n=1
for y in `cat "[Link]"`    # note: the for loop splits on whitespace, not on whole lines
do
    echo $n $y
    n=`expr $n + 1`
done
The awk command in unix has a rich set of features. One of them is that it can store elements in arrays and process the data in those elements. Here we will see how to use arrays in the awk command with examples.
Examples of Arrays in Awk Command:
1. Finding the sum of values
I want to find the sum of the values in the first column of all the lines and display it on the unix or linux terminal. Let's say my file has the below data:
10
20
30
After summing up all the values, the output should be 60. The awk command to sum
the values
without using the arrays is shown below:
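A sketch, with numbers.dat as a placeholder file name:
awk '{ s = s + $1 } END { print s }' numbers.dat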
Here I have used a variable to store the sum of the values; after summing up all the values, the total is printed on the terminal.
The awk command to find the sum of values by using arrays is shown below:
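A sketch of the array-based version, using the same placeholder file name:
awk '{ total[1] = total[1] + $1 } END { print total[1] }' numbers.dat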
Here an array is used to store the sum of values. Basically this array will store
the cumulative sum of
values, at the end it contains the total and it is displayed on the terminal.
2. Ranking values in a file.
Let's say I have a source file which contains employee data. This file has three fields: the first field is department_id, the second is the employee name and the third is the salary. Sample data from the file is shown below:
Here employees AAA and CCC got the same rank as their salaries are the same.
To solve this problem, we first have to sort the data and then pipe it to the awk command. The complete command is shown below:
# reconstructed sketch: the file name and sort keys are illustrative; sort by
# department and then by salary before ranking
sort -k1,1 -k3,3nr employees.dat | awk '{
department_array[NR]=$1;
salary_array[NR]=$3;
if (department_array[NR] != department_array[NR-1])
{
    rank_array[NR]=1;
}
else
{
    if (salary_array[NR] == salary_array[NR-1])
        rank_array[NR] = rank_array[NR-1];
    else
        rank_array[NR] = rank_array[NR-1]+1;
}
print department_array[NR]","$2","salary_array[NR]","rank_array[NR];
}'
For readability purpose the above command is written in multiple lines. You have to
write the above
command in single line to make it work in unix.
. Open the putty client tool. Enter the remote unix hostname in the "Host Name (or
IP address)".
In this demo i have entered the hostname as "[Link]".
. To save this hostname, enter a name like "Tunnel" in the "Saved Sessions" place.
This is shown
in the below image:
(Screenshots: tunnel_putty1.jpg, tunnel_putty2.jpg)
. On the left side of the client, you can see a navigation panel. Go to SSH->
Tunnels.
. Again enter the remote hostname ([Link]) in "Destination" section.
. Enter the source port as 1100 (any value you prefer) and check the Dynamic
option. This is
shown below:
. Now click on Add. Go back to the previous window by clicking on Session in the left side panel. Here click on Save; it will save your tunnel details.
. Open this tunnel and enter your remote machine login details. Do not close this unix session; if you close it, your tunneling won't work.
(Screenshot: mozilla_firefox_tunnel.jpg)
. Open the Mozilla Firefox browser. Go to Tools->Options->Advanced->Network->Settings.
. In the settings, check Manual proxy configuration, enter the SOCKS host as localhost and the port as 1100 (the same port specified in the tunnel configuration) and click on OK. This is shown in the image below.
Now you can open any website with this approach provided your remote host has
access.
VIM is a powerful editor in unix or linux. The VIM editor got so many features.
Here we will see the
options for saving and quitting from the vim editor.
The following options work in the command mode of the VIM editor. To go to command mode, press the ESC key on the keyboard and then type one of the below commands:
. :w ->Saves the contents of the file without exiting from the VIM editor
. :wq ->Saves the text in the file and then exits from the editor
. :w filename -> Saves the contents of the opened file into the specified filename. However, it won't save the changes into the current file.
. :x -> Saves changes to the current file and then exits. Similar to the :wq
. :m,nw filename -> Here m and n are numbers. This option will write the lines from
the specified
numbers m and n to the mentioned filename.
. :q -> Exits from the current file only if you did not do any changes to the file.
. :q! -> Exits from the current file and ignores any changes that you made to the
file.
The unix dirname command strips non-directory suffix from a file name.
The syntax of dirname command is
dirname NAME
The dirname command removes the trailing / component from the NAME and prints the
remaining
portion. If the NAME does not contain / component then it prints '.' (means current
directory).
Dirname command is useful when dealing with directory paths in unix or linux
operating systems.
Some examples on dirname command are shown below:
Dirname Command Examples:
1. Remove the file name from absolute path.
Let's say my directory path is /usr/local/bin/[Link]. Now I want to remove /[Link] and display only /usr/local/bin; we can use the dirname command for this.
/usr/local/bin
2. dirname [Link]
Here you can see that the NAME does not contain the / component. In this case the
dirname
produces '.' as the output.
Note: The directories and filenames which I have passed as arguments to the dirname command in the above examples are just strings. These directories or files do not need to exist on the unix machine.
The Split command in unix or linux operating system splits a file into many pieces
(multiple files). We
can split a file based on the number of lines or bytes. We will see how to use the
split command with
an example.
As an example, let's take the below text file as the source file which we want to split:
> cat textfile
unix linux os
windows mac os
linux environment
There are three lines in that file and the size of the file is 47 bytes.
Split Command Examples:
1. Splitting file on number of lines.
The split command has an option -l to split the file based on the number of lines. Let's say I want to split the text file with 2 lines in each output file. The split command for this is shown below.
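A sketch of that command:
split -l 2 textfile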
The new files created are xaa and xab. Always the newly created (partitioned) file
names start with
x. We will see the contents of these files by doing a cat operation.
unix linux os
windows mac os
linux environment
As there are only three lines in the source file, we got only one line in the last created file.
2. Splitting file on the number of bytes
We can use the -b option to specify the number of bytes that each partitioned file
should contains.
As an example we will split the source files on 10 bytes as
split -b10 textfile
The files created are xaa, xab, xac, xad, xae. The first four files contain 10
bytes and the last file
contains 7 bytes as the source file size is 47 bytes.
3. Changing the newly created file names from character sequences to numeric
sequences.
So far we have seen that the newly created file names use alphabetic suffixes like xaa, xab and so on. We can change this to a numeric sequence by using the -d option, as shown below.
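A sketch of the numeric-suffix form:
split -l 2 -d textfile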
The names of the new files created are x00 and x01.
4. Changing the number of digits in the sequence of filenames.
In the above example, you can observe that the sequences have two digits (00 and
01) in the file
names. You can change the number of digits in the sequence by using the -a option
as
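A sketch with the -a option, producing three-digit suffixes such as x000 and x001:
split -l 2 -d -a 3 textfile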
First we will see how to swap two strings in a line and then we will see how to
swap two columns in a
file.
As an example, consider the text file with below data:
unix linux os
windows mac os
linux unix os
windows mac os
The parentheses are used to remember the pattern. \1 indicates first pattern and \2
indicates second
pattern.
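A sketch of the sed command that produces the swapped output above, assuming the file is named textfile:
sed 's/\(unix\) \(linux\)/\2 \1/' textfile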
Swap Fields using Awk command:
From the above file structure, we can observe that the file is in a format of rows and columns, where the columns are delimited by spaces. The awk command can be used to process such delimited files. The awk command to swap the first two fields in the file is shown below.
two fields in a
file is
linux unix os
mac windows os
Q) My log file contains the braces symbols '(' and ')'. I would like to replace the
braces with empty
string. Sample data in the log file is shown below:
The output should not contain the braces and the data should look as
Error - unix script failed
2. Replacing using the sed command
The sed command is popularly used for replacing text in a file with other text. The sed command for this is shown below.
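A sketch, with logfile as a placeholder name; it deletes every ( and ) character:
sed 's/[()]//g' logfile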
Q) I have a file with 10000 lines in a unix or linux operating system. I want to split this file and create 10 files such that each file has 1000 lines. That is, the first 1000 lines should go into one file, the next 1000 lines into another file, and so on. How to do this using unix commands?
Solution:
Unix has the split command, which can be used to partition the data in a file into multiple files. The command to split a file based on the number of lines is shown below.
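A sketch, with filename standing in for the actual file:
split -l 1000 filename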
By default, the partitioned filenames start with x, like xaa, xab, xac and so on. Instead of alphabetic sequences, you can use numeric sequences in the filenames, like x01, x02, with the -d option.
You can specify the number of digits to be used in the numeric sequences with the
help of -a option.
Examples: Let's say I have a text file with 4 lines. The data in the file is shown below:
unix is os
linux environment
centos
We will run the split command for each of the points discussed above and see what
files will be
created.
Q) I have a file with a bunch of lines. I want to remove the last character of each line in that file. How can I achieve this in a unix or linux environment?
Solution:
1. SED command to remove the last character
You can use the sed command to delete the last character from each line. The sed command is shown below.
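A sketch, with filename as a placeholder:
sed 's/.$//' filename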
2. Bash script
The below bash script can be used to remove the last character in a file.
#! /bin/bash
while read LINE
do
    echo ${LINE%?}
done < filename    # replace filename with the actual file
4. Using the rev and cut commands
We can use a combination of the rev and cut commands to remove the last character. The command is shown below.
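A sketch of the rev/cut approach:
rev filename | cut -c 2- | rev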
Q) I have a products data in the text file. The data in the file look as shown
below:
iphone
samsung
nokia
yahoo
aol
amazon
ebay
walmart
Now my requirement is to group each 3 consecutive rows into a single row and
produce a comma
separated list of products. The output should look as
iphone,samsung,nokia
yahoo,google,aol
amazon,ebay,walmart
2. Another way is using the paste command. The solution using the paste command is
shown below.
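A sketch, with products.dat as a placeholder file name; paste reads three lines at a time from standard input and joins them with commas:
paste -d ',' - - - < products.dat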
Awk command to split list data in a column into multiple rows - Unix/Linux
Q) I have a flat file in the unix or linux environment. The data in the flat file
looks as below
Mark Maths,Physics,Chemistry
Chris Social
The flat file contains the list of subjects taken by each student in their curriculum. I want the subject list in the second column to be split into multiple rows. After splitting, the data in the target should look as shown below:
Mark Maths
Mark Physics
Mark Chemistry
Chris Social
Henry Science
Henry Science
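A sketch of an awk command for this, with students.dat as a placeholder name; it splits the second field on commas and prints one row per subject:
awk '{ n = split($2, subjects, ","); for (i = 1; i <= n; i++) print $1, subjects[i] }' students.dat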
We will see how to search for files and then grep for a string of text in those files. First I will use the find command in unix or linux to search for the regular files in the current directory. The find command to list the regular files in the current directory is shown below.
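A sketch of that find command:
find . -type f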
./docs/[Link]
./[Link]
./sample
Now we will grep for a particular word in these files and display only the filenames that have the matching word. The unix command is shown below.
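Consistent with the combined command shown a little further below, this step could be:
find . -type f -exec grep -l word {} \;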
If you want to put space between the results of the above command, display the line
using echo. The
complete unix command is
> find . -type f -exec grep -l word {} \; -exec grep word {} \; -exec echo \;
The above example shows how to use multiple greps with the find command in unix or linux.
Range partitioning is a partitioning technique where ranges of data are stored in separate sub-tables (partitions).
MAXVALUE is offered as a catch-all for values that exceed the specified ranges. Note that NULL values are treated as greater than all other values except MAXVALUE.
Range Partitioning Examples:
1. Range Partition on numeric values
sale_id number,
product_id number,
price number
PARTITION BY RANGE(sale_id) (
);
2. Range partition on character values
product_id number,
product_name varchar2(30),
category varchar2(30)
PARTITION BY RANGE(category) (
);
3. Range partition on date values
order_id number,
order_date date
PARTITION BY RANGE(order_date) (
partition o1 values less than (to_date('01-01-2010','DD-MM-YYYY'))
tablespace ts1,
);
One of the changes made in informatica version 9 is that the lookup transformation can be an active transformation: it can return all the matching rows.
When creating the lookup transformation itself you have to specify whether the
lookup
transformation returns multiple rows or not. Once you make the lookup
transformation as active
transformation, you cannot change it back to passive transformation. The "Lookup
Policy on Multiple
Match" property value will become "Use All Values". This property becomes read-only
and you
cannot change this property.
As an example, for each country you can configure the lookup transformation to return all the states in that country. You can cache the lookup table to improve performance. If you configure the lookup transformation for caching, the integration service caches all the rows from the lookup source and caches all rows for a lookup key by the key index.
Guidelines for Returning Multiple Rows:
Follow the below guidelines when you configure the lookup transformation to return
multiple rows:
. You can cache all the rows from the lookup source for cached lookups.
. You can customize the SQL override for both cached and uncached lookups that return multiple rows.
. You cannot use dynamic cache for Lookup transformation that returns multiple
rows.
. You cannot return multiple rows from an unconnected Lookup transformation.
. You can configure multiple Lookup transformations to share a named cache if the
Lookup
transformations have matching caching lookup on multiple match policies.
. Lookup transformation that returns multiple rows cannot share a cache with a
Lookup
transformation that returns one matching row for each input row.
The lookup transformation has many properties which you can configure. Depending on the lookup source (flat file or relational lookup), you can configure the below properties of the lookup transformation:
Lookup Transformation Properties:
Each property below is listed with the lookup types it applies to (flat file, pipeline or relational) and a description where one is available.
. Lookup SQL Override (Relational)
. Lookup Table Name (Pipeline, Relational)
. Lookup Source Filter (Relational): You can filter looking up in the cache based on the value of data in the lookup ports. Works only when lookup cache is enabled.
. Lookup Caching Enabled (Flat File, Pipeline, Relational)
. Lookup Policy on Multiple Match (Flat File, Pipeline, Relational)
. Lookup Condition (Flat File, Pipeline, Relational): You can define the lookup condition in the condition tab. The lookup condition is displayed here.
. Connection Information (Relational)
. Source Type (Flat File, Pipeline, Relational): Indicates the lookup source type: flat file, relational table or source qualifier.
. Tracing Level (Flat File, Pipeline, Relational)
. Lookup Cache (Flat File, Pipeline, Relational)
. Lookup Cache Persistent (Flat File, Pipeline, Relational): Use when the lookup source data does not change at all, for example zipcodes, countries, states etc. The lookup caches the data once and uses the cache even across multiple session runs.
. Lookup Data Cache Size
. Lookup Index Cache Size (Flat File, Pipeline, Relational)
. Dynamic Lookup Cache (Flat File, Pipeline, Relational)
. (Flat File, Pipeline, Relational): Use with dynamic caching enabled. When you enable this property, the Integration Service outputs old values out of the lookup/output ports. When the Integration Service updates a row in the cache, it outputs the value that existed in the lookup cache before it updated the row based on the input data. When the Integration Service inserts a row in the cache, it outputs null values.
. Update Dynamic Cache Condition (Flat File, Pipeline, Relational): An expression that indicates whether to update the dynamic cache. Create an expression using lookup ports or input ports. The expression can contain input values or values in the lookup cache. The Integration Service updates the cache when the condition is true and the data exists in the cache. Use with dynamic caching enabled. Default is true.
. (Flat File, Pipeline, Relational): Use with persistent lookup cache. Specifies the file name prefix to use with persistent lookup cache files.
. Recache From Lookup Source (Flat File, Pipeline, Relational)
. Datetime Format (Flat File): Specify the date format for the date fields in the file.
. Thousand Separator (Flat File)
. Decimal Separator (Flat File)
. Case-Sensitive String Comparison (Flat File)
. Null Ordering (Flat File, Pipeline)
. Sorted Input (Flat File, Pipeline)
. Lookup Source is Static (Flat File, Pipeline, Relational)
. Pre-build Lookup Cache (Flat File, Pipeline, Relational): Allows the Integration Service to build the lookup cache before the Lookup transformation receives the data. The Integration Service can build multiple lookup cache files at the same time to improve performance.
. Subsecond Precision (Relational)
The steps to create a lookup transformation are bit different when compared to
other
transformations. If you want to create a reusable lookup transformation, create it
in the
Transformation Developer. To create a non-reusable lookup transformation, create it
in the Mapping
Designer. Follow the below steps to create the lookup transformation.
1. Login to the Power center Designer. Open either Transformation Developer tab or
Mapping
Designer tab.
Click on the Transformation in the toolbar, and then click on Create.
2. Select the lookup transformation and enter a name for the transformation. Click
Create.
3. Now you will get a "Select Lookup Table" dialog box for selecting the lookup
source, choosing
active or passive option. This is shown in the below image:
4. You can choose one of the below option to import the lookup source definition:
5. In the same dialog box, you have an option to choose active or passive lookup
transformation.
You can see this option in red circle in the above image. To make the lookup
transformation as
active, check the option "Return All Values on Multiple Match". Do not check this
when creating a
passive lookup transformation. If you have created an active lookup transformation,
the value of the
property "Lookup policy on multiple match" will be "Use All Values". You cannot
change an active
lookup transformation back to a passive lookup transformation.
6. Click OK or Click Skip if you want to manually add ports to lookup
transformation.
7. For connected lookup transformation, add input and output ports.
8. For unconnected lookup transformation, create a return port for the value you
want to return from
the lookup.
9. Go to the properties and configure the lookup transformation properties.
10. For dynamic lookup transformation, you have to associate an input port, output
port or sequence
Id with each lookup port.
11. Go the condition tab and add the lookup condition.
Connected and Unconnected Lookup Transformation - Informatica
The lookup transformation can be used in both connected and unconnected mode. The differences between the connected and unconnected lookup transformations are listed in the below table:
First step when creating a lookup transformation is choosing the lookup source. You
can select a
relational table, flat file or a source qualifier as the lookup source.
Relational lookups:
When you want to use a relational table as the lookup source in the lookup transformation, you have to connect to the lookup source using an ODBC connection and import the table definition as the structure for the lookup transformation. You can use the below options for relational lookups:
. You can override the default sql query and write your own customized sql to add a
WHERE
clause or query multiple tables.
. You can sort null data based on the database support.
. You can perform case-sensitive comparison based on the database support.
Flat file lookups:
. You can use indirect files as lookup sources by configuring a file list as the lookup file name.
. You can use sorted input for the lookup.
. You can sort null data high or low.
. You can use case-sensitive string comparison with flat file lookups.
. Flat File or Relational lookup: You can perform the lookup on the flat file or
relational
database. When you create a lookup using flat file as lookup source, the designer
invokes flat
file wizard. If you used relational table as lookup source, then you can connect to
the lookup
source using ODBC and import the table definition.
. Pipeline Lookup: You can perform lookup on application sources such as JMS, MSMQ
or SAP.
You have to drag the source into the mapping and associate the lookup
transformation with the
source qualifier. Improve the performance by configuring partitions to retrieve
source data for
the lookup cache.
. Connected or Unconnected lookup: A connected lookup receives source data, performs a lookup and returns data to the pipeline. An unconnected lookup is not connected to a source, target or any other transformation. A transformation in the pipeline calls the lookup transformation with the :LKP expression, and the unconnected lookup returns one column to the calling transformation (see the example after this list).
. Cached or Uncached Lookup: You can improve the performance of the lookup by
caching the
lookup source. If you cache the lookup source, you can use a dynamic or static
cache. By
default, the lookup cache is static and the cache does not change during the
session. If you use
a dynamic cache, the integration service inserts or updates rows in the cache. You
can lookup
values in the cache to determine if the values exist in the target, then you can
mark the row for
insert or update in the target.
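As an illustration of the :LKP syntax (a sketch; the lookup name LKP_GET_CUSTOMER and the port names are assumptions, not taken from the original text), an expression transformation port can call an unconnected lookup and flag rows whose key is not found in the lookup source:
O_New_Flag = IIF(ISNULL(:LKP.LKP_GET_CUSTOMER(Customer_Id)), 1, 0)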
When you use an update strategy transformation in the mapping or specify the "Treat Source Rows As" option as Update, the Informatica Integration Service updates the row in the target table whenever a match on the primary key is found in the target table.
The update strategy works only when the target table has a primary key defined.
What if you want to update the target table by a matching column other than the primary key? In this case the update strategy won't work. Informatica provides a feature, "Target Update Override", to update even on the columns that are not part of the primary key.
You can find the Target Update Override option in the target definition properties tab. The update statement specified in Target Update Override uses the :TU qualifier to refer to the ports of the target definition, and its general form is
UPDATE TARGET_TABLE_NAME
SET COLUMN_NAME1 = :TU.PORT_NAME1 [, COLUMN_NAME2 = :TU.PORT_NAME2, ...]
WHERE COLUMN_NAME = :TU.PORT_NAME
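For example, a minimal sketch of a Target Update Override (assuming an EMPLOYEES target with EMAIL and PHONE ports, where EMAIL rather than the primary key is the matching column) could be:
UPDATE EMPLOYEES
SET PHONE = :TU.PHONE
WHERE EMAIL = :TU.EMAIL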
This post is continuation to my previous one on update strategy. Here we will see
the different
settings that we can configure for update strategy at session level.
Single Operation of All Rows:
We can specify a single operation for all the rows using the "Treat Sources Rows
As" setting in the
session properties tab. The different values you can specify for this option are:
. Insert: The integration service treats all the rows for insert operation. If
inserting a new row
violates the primary key or foreign key constraint in the database, then the
integration service
rejects the row.
. Delete: The integration service treats all the rows for delete operation and
deletes the
corresponding row in the target table. You must define a primary key constraint in
the target
definition.
. Update: The integration service treats all the rows for update operation and
updates the rows in
the target table that matches the primary key value. You must define a primary key
in the target
definition.
. Data Driven: An update strategy transformation must be used in the mapping. The
integration
service either inserts or updates or deletes a row in the target table based on the
logic coded in
the update strategy transformation. If you do not specify the data driven option
when you are
using a update strategy in the mapping, then the workflow manager displays a
warning. The
integration service does not follow the instructions in the update strategy
transformation.
The below table illustrates how the data in target table is inserted or updated or
deleted for various
combinations of "Row Flagging" and "Settings of Individual Target Table".
Row Flagging Type | Target Table Settings | Result
Insert | Insert is specified | The row is inserted into the target.
Delete | Delete option is specified | The row is deleted from the target.
Delete | Delete option is not specified | Even if the row exists in target, then it will not be deleted from the target.
Update | Update as Update is specified | The row is updated in the target.
Update | Insert is specified, Update as Insert is specified | The row is inserted into the target.
Update | Insert is not specified, Update as Insert is specified | The row is not inserted into the target.
Update | Insert is specified, Update else Insert is specified | If the row exists in target, it is updated; otherwise it is inserted.
Update | Insert is not specified, Update else Insert is specified | If the row exists in target, then it will be updated. The row will not be inserted in case it does not exist in the target.
Mostly, IIF and DECODE functions are used to test for a condition in the update strategy transformation, as shown in the example below.
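For instance, a typical update strategy expression (a sketch, assuming a lookup port LKP_Customer_Id that is NULL when the incoming row does not exist in the target yet) can be written with either function:
IIF(ISNULL(LKP_Customer_Id), DD_INSERT, DD_UPDATE)
DECODE(TRUE, ISNULL(LKP_Customer_Id), DD_INSERT, DD_UPDATE)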
Update Strategy and Lookup Transformations:
Update strategy transformation is used mostly with lookup transformation. The row
from the source
qualifier is compared with row from lookup transformation to determine whether it
is already exists or
a new record. Based on this comparison, the row is flagged to insert or update
using the update
strategy transformation.
Update Strategy and Aggregator Transformations:
If you place an update strategy before an aggregator transformation, the way the
aggregator
transformation performs aggregate calculations depends on the flagging of the row.
For example, if
you flag a row for delete and then later use the row to calculate the sum, then the
integration service
subtracts the value appearing in this row. If it's flagged for insert, then the
aggregator adds its value
to the sum.
[Link]
Important Note:
Update strategy works only when we have a primary key on the target table. If there
is no primary
key available on the target table, then you have to specify a primary key in the
target definition in the
mapping for update strategy transformation to work.
Recommended Reading:
Update Strategy Session Level Settings
One of the properties of the source qualifier transformation is "SQL Query", which can be used to overwrite the default query with our customized query. We can generate SQL queries only for relational sources. For flat files, all the properties of the source qualifier transformation will be in a disabled state.
Here we will see how to generate the SQL query and the errors that we will get
while generating the
SQL query.
Error When Generating SQL query:
The most frequent error that we will get is "Cannot generate query because there
are no valid fields
projected from the Source Qualifier".
First we will simulate this error and then we will see how to avoid it. Follow the below steps for simulating and fixing the error:
. Create a new mapping and drag the relational source into it. For example drag the
customers
source definition into the mapping.
. Informatica produces this error because the source qualifier transformation ports
are not
connected to any other transformations or target. Informatica just knows the
structure of the
source. However it doesn't know what columns to be read from source table. It will
know only
when the source qualifier is connected to downstream transformations or target.
. To avoid this error, connect the source qualifier transformation to downstream
transformation or
target.
The customers source table used in this example has the following structure:
Customer_Id Number,
Name Varchar2(30),
Email_Id Varchar2(30),
Phone Number
Follow the below steps to generate the SQL query in source qualifier
transformation.
. Create a new mapping and drag the customers relational source into the mapping.
. Now connect the source qualifier transformation to any other transformation or
target. Here I
have connected the SQ to expression transformation. This is shown in the below
image.
[Link]
[Link]
[Link]
. Edit the source qualifier transformation, go to the properties tab and then open
the editor of SQL
query.
. Enter the username, password, data source name and click on Generate SQL query.
Now the
SQL query will be generated. This is shown in the below image.
SELECT Customers.Customer_Id,
Customers.Name,
Customers.Email_Id,
Customers.Phone
FROM Customers
Now we will do a small change to understand more about the "Generating SQL query".
Remove the
link (connection) between Name port of source qualifier and expression
transformation.
[Link]
Repeat the above steps to generate the SQL query and observe what SQL query will be
generated.
SELECT Customers.Customer_Id,
Customers.Email_Id,
Customers.Phone
FROM Customers
The Name column is missing in the generated query. This means that only the ports connected from the Source Qualifier transformation to other downstream transformations or the target will be included in the SQL query and read from the database table.
Using a sequence generator transformation to generate unique primary key values can cause performance issues, as an additional transformation has to be processed in the mapping.
You can use expression transformation to generate surrogate keys in a dimensional
table. Here we
will see the logic on how to generate sequence numbers with expression
transformation.
Sequence Generator Reset Option:
When you use the reset option in a sequence generator transformation, the sequence
generator
uses the original value of Current Value to generate the numbers. The sequences
will always start
from the same number.
As an example, if the Current Value is 1 with reset option checked, then the
sequences will always
start from value 1 for multiple session runs. We will see how to implement this
reset option with
expression transformation.
Follow the below steps:
. The v_seq port generates the numbers same as NEXTVAL port in sequence generator
transformation.
. Create a mapping to write the maximum value of primary key in the target to a
parameter file.
Assign the maximum value to the parameter ($$MAX_VAL) in this mapping. Create a
session
for this mapping. This should be the first session in the workflow.
. Create another mapping where you want to generate the sequence numbers. In this mapping, connect the required ports to the expression transformation, create the additional ports in the expression transformation and assign the expressions to them (a sketch is given after this list).
. Create an unconnected lookup transformation and create only one return port in the lookup. Now overwrite the lookup query to get the maximum value of the primary key from the target (the query is also shown in the sketch after this list).
. Now create an expression transformation and connect the required ports to it. Now
we will call
the unconnected lookup transformation from this expression transformation. Create
the below
additional port in the expression transformation:
. The o_primary_key port generates the surrogate key values for the dimension
table.
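A minimal sketch of the two pieces referred to in the list above. The port names, the lookup name LKP_GET_MAX_KEY and the dimension table name Customers_Dim are assumptions used only for illustration; the same starting value could instead come from the $$MAX_VAL parameter mentioned earlier:
-- Expression ports that emulate the NEXTVAL port; v_seq restarts at 1 in every session run
v_seq (variable port) = v_seq + 1
-- Unconnected lookup call that fetches the highest key already loaded in the target
v_max_key (variable port) = :LKP.LKP_GET_MAX_KEY(1)
o_primary_key (output port) = IIF(ISNULL(v_max_key), 0, v_max_key) + v_seq
-- Lookup SQL override of LKP_GET_MAX_KEY
SELECT MAX(Cust_Key) AS Cust_Key FROM Customers_Dim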
We will see the difference of reusable and non reusable sequence generator
transformation along
with the properties of the transformation.
Sequence Generator Transformation Properties:
You have to configure the following properties of a sequence generator
transformation:
Start Value:
Specify the Start Value when you configure the sequence generator transformation
for Cycle option.
If you configure the cycle, the integration service cycles back to this value when
it reaches the End
Value. Use Cycle to generate a repeating sequence numbers, such as numbers 1
through 12 to
correspond to the months in a year. To cycle the integration service through a
sequence:
. Enter the lowest value in the sequence to use for the Start Value.
. Enter the highest value to be used for End Value.
. Select Cycle option.
Increment By:
The Integration service generates sequence numbers based on the Current Value and
the Increment
By properties in the sequence generator transformation. Increment By is the integer
the integration
service adds to the existing value to create the new value in the sequence. The
default value of
Increment By is 1.
End Value:
End value is the maximum value that the integration service generates. If the
integration service
reaches the end value and the sequence generator is not configured for cycle
option, then the
session fails with the following error message:
TT_11009 Sequence Generator Transformation: Overflow error.
If the sequence generator is configured for cycle option, then the integration
service cycles back to
the start value and starts generating numbers from there.
Current Value:
The integration service uses the Current Value as the basis for generated values
for each session.
Specify the value in "Current Value" you want the integration service as a starting
value to generate
sequence numbers. If you want to cycle through a sequence of numbers, then the
current value
must be greater than or equal to the Start Value and less than the End Value.
At the end of the session, the integration service updates the current value to the
last generated
sequence number plus the Increment By value in the repository if the sequence
generator Number
of Cached Values is 0. When you open the mapping after a session run, the current
value displays
the last sequence value generated plus the Increment By value.
Reset:
The reset option is applicable only for non reusable sequence generator
transformation and it is
disabled for reusable sequence generators. If you select the Reset option, the integration service generates values based on the original current value each time it starts the session. Otherwise the integration service
integration service
updates the current value in the repository with last value generated plus the
increment By value.
Number of Cached Values:
The Number of Cached Values indicates the number of values that the integration
service caches at
one time. When this value is configured greater than zero, then the integration
service caches the
specified number of values and updates the current value in the repository.
Non Reusable Sequence Generator:
The default value of Number of Cached Values is zero for non reusable sequence
generators. It
means the integration service does not cache the values. The integration service,
accesses the
Current Value from the repository at the start of the session, generates the
sequence numbers, and
then updates the current value at the end of the session.
When you set the number of cached values greater than zero, the integration service
caches the
specified number of cached values and updates the current value in the repository.
Once the cached
values are used, then the integration service again accesses the current value from
repository,
caches the values and updates the repository. At the end of the session, the
integration service
discards any unused cached values.
For a non-reusable sequence generator, setting the Number of Cached Values greater than zero can increase the number of times the Integration Service accesses the repository during the session, and unused cached values are discarded at the end of the session.
As an example, suppose you set the Number of Cached Values to 100 and you want to process only 70 records in a session. The integration service first caches 100 values and updates the current value with 101. As there are only 70 rows to be processed, only the first 70 sequence numbers will be used and the remaining 30 sequence numbers will be discarded. In the next run the sequence numbers start from 101.
The disadvantages of having Number of Cached Values greater than zero are: 1) accessing the repository multiple times during the session, and 2) discarding unused cached values, causing discontinuous sequence numbers.
Reusable Sequence Generators:
The default value of Number of Cached Values is 1,000 for reusable sequence
generators. When you
are using the reusable sequence generator in multiple sessions which run in
parallel, then specify
the Number of Cache Values greater than zero. This will avoid generating the same
sequence
numbers in multiple sessions.
If you increase the Number of Cached Values for reusable sequence generator
transformation, the
number of calls to the repository decreases. However there is a chance that a large number of cached values will be discarded. So, choose the Number of Cached Values wisely.
Recommended Reading:
Sequence Generator Transformation
Sequence Generator Transformation in Informatica
. Start Value: Specify the start value of the generated sequence that you want the integration service to use if you use the Cycle option. If you select Cycle, the integration service cycles back to this value when it reaches the end value.
. Increment By: Difference between two consecutive values from the NEXTVAL port.
Default
value is 1. Maximum value you can specify is 2,147,483,647.
. End Value: Maximum sequence value the integration service generates. If the
integration
service reaches this value during the session and the sequence is not configured to
cycle, the
session fails. Maximum value is 9,223,372,036,854,775,807.
. Current Value: Current Value of the sequence. This value is used as the first
value in the
sequence. If cycle option is configured, then this value must be greater than or
equal to start
value and less than end value.
. Cycle: The integration service cycles through the sequence range.
. Number of Cached Values: Number of sequential values the integration service
caches at a
time. Use this option when multiple sessions use the same reusable generator.
Default value for
non-reusable sequence generator is 0 and reusable sequence generator is 1000.
Maximum
value is 9,223,372,036,854,775,807.
. Reset: The integration service generates values based on the original current
value for each
session. Otherwise, the integration service updates the current value to reflect
the last-
generated value for the session plus one.
. Tracing level: The level of detail to be logged in the session log file.
[Link]
NEXTVAL CURRVAL
---------------
1 2
2 3
3 4
4 5
5 6
If you connect only the CURRVAL port without connecting the NEXTVAL port, then the
integration
service passes a constant value for each row.
Recommended Reading:
Reusable vs Non Reusable Sequence Generator
In one of my projects, we got a requirement to load data from a varying-fields flat file into an oracle table.
The complete requirements are mentioned below:
Requirement:
. Daily we will get a comma delimited flat file which contains the monthly wise
sales information of
products.
. The data in the flat file is in denormalized structure.
. The number of months in the flat file may vary from day to day.
. The header of the flat file contains the fields.
Let say today the structure of the flat file might look as
Product,Jan2012,Feb2012
A,100,200
B,500,300
The next day the flat file structure might vary in the number of months. However
the product field will always be there as the first field of the flat file. The sample flat file structure on the next day looks as
Product,Jan2012,Feb2012,Mar2012
C,300,200,500
D,100,300,700
Now the problem is to load this flat file into the oracle table. The first thing is
designing the target
table. We designed a normalized target table, and the data in it looks as
Product, Month, Sales
---------------------
A, Jan2012,100
A, Feb2012,200
B, Jan2012,500
B, Feb2012,300
C, Jan2012,300
C, Feb2012,200
C, Mar2012,500
D, Jan2012,100
D, Feb2012,300
D, Mar2012,700
Anyhow we designed the target table. Now comes the real problem. How to identify
the number of
fields in the flat file and how to load the denormalized flat file into the
normalized table?
We created a new procedure to handle this problem. Here I am listing the sequence of
steps in the
procedure which we used to load the flat file data into the oracle database.
Reading the Header information from the file:
. Created the required variables. I will mention them as and when required.
. We have used the utl_file package in oracle which is for reading the flat file.
. The syntax for opening the file is
FileHandle := utl_file.fopen(
    file_location IN VARCHAR2,
    file_name     IN VARCHAR2,
    open_mode     IN VARCHAR2,
    max_linesize  IN BINARY_INTEGER DEFAULT NULL
);
. We have opened the file. Now we will read the flat file header which is the first
line in the file.
The syntax is
Header Varchar2(4000);
utl_file.get_line(FileHandle,Header);
utl_file.fclose(FileHandle);
. The Header variable contains the header part of the file which contains the
fields in the file. The
data in the Header variable looks as
Product,Jan2012,Feb2012
. We have created an external table by using this Header variable.
. As the Header variable contains the fields from the file, it is easy to construct
the syntax for
external table creation.
. Replace the comma in the Header variable with " Varchar2(100),". Then concatenate the variable with " Varchar2(100)" at the end. This step is shown in the below example:
Header_With_Datatypes := Replace(Header, ',', ' Varchar2(100),') || ' Varchar2(100)';
. We have constructed the fields with data types. Now we have to construct the structure of the external table using the variable (the statement is built as a string and run with EXECUTE IMMEDIATE). The shape of the DDL is shown in the below example:
Create Table external_stage_table ( Header_With_Datatypes )
Organization External
( Type Oracle_Loader
  Default Directory file_location
  Access Parameters (Records Delimited By Newline Skip 1 Fields Terminated By ',')
  Location (file_name)
);
. Now we have to transpose the columns in the flat file into rows and then load into the final table. We have to transpose only the month columns and not the product column. The steps involved in transposing the columns are listed below (a fuller sketch is given after this list):
Header := Replace(Header, 'Product,', '');
select * from external_stage_table;
. Drop the external table once inserting the target table is done.
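Putting the pieces together, a condensed sketch of such a procedure is shown below. The directory object SRC_DIR, the normalized target table sales_norm and the variable names are assumptions for illustration only, not the exact code used in the project:
Create Or Replace Procedure load_varying_file(p_file_name In Varchar2) As
  v_handle  utl_file.file_type;
  v_header  Varchar2(4000);
  v_cols    Varchar2(4000);
  v_month   Varchar2(100);
Begin
  -- Read only the header record to discover the field names
  v_handle := utl_file.fopen('SRC_DIR', p_file_name, 'R');
  utl_file.get_line(v_handle, v_header);
  utl_file.fclose(v_handle);
  -- Build the column list with data types, e.g. "Product Varchar2(100),Jan2012 Varchar2(100),..."
  v_cols := Replace(v_header, ',', ' Varchar2(100),') || ' Varchar2(100)';
  -- Create the external table over the same file, skipping the header record
  Execute Immediate
    'Create Table external_stage_table (' || v_cols || ') ' ||
    'Organization External ( Type Oracle_Loader Default Directory SRC_DIR ' ||
    'Access Parameters ( Records Delimited By Newline Skip 1 Fields Terminated By '','' ) ' ||
    'Location (''' || p_file_name || ''') )';
  -- Transpose each month column into a row of the normalized target table
  v_header := Replace(v_header, 'Product,', '');
  While v_header Is Not Null Loop
    v_month  := Case When Instr(v_header, ',') > 0
                     Then Substr(v_header, 1, Instr(v_header, ',') - 1)
                     Else v_header End;
    Execute Immediate
      'Insert Into sales_norm (product, month_name, sales) ' ||
      'Select Product, ''' || v_month || ''', ' || v_month || ' From external_stage_table';
    v_header := Case When Instr(v_header, ',') > 0
                     Then Substr(v_header, Instr(v_header, ',') + 1)
                     Else Null End;
  End Loop;
  Commit;
  -- Drop the external table once the target table load is done
  Execute Immediate 'Drop Table external_stage_table';
End;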
Q) How to load the data from a flat file into the target where the source flat file
name changes daily?
Example: I want to load the customers data into the target file on a daily basis.
The source file name
is in the format customers_yyyymmdd.dat. How to load the data where the filename
varies daily?
The solution to this kind of problems is using the parameters. You can specify
session parameters
for both the source and target flat files. Then create a parameter file and assign
the flat file names to
the parameters.
Specifying Parameters for File Names:
The steps involved in parameterizing the file names are:
[[Link]]
$InputFileName=customers_20120101.dat
$outputFileName=customers_file.dat
. Source File Directory: Enter the directory name where the source file resides.
. Source Filename: Enter the name of the file to be loaded into the target.
. Source Filetype: Specify the direct option when you want to load a single file
into the target.
Example: Let say we want to load the employees source file ([Link]) in the
directory
$PMSourceFileDir into the target, then source file properties to be configured in
the session are:
>cat customers_list.dat
$PMSourceFileDir/customers_us.dat
$PMSourceFileDir/customers_uk.dat
$PMSourceFileDir/customers_india.dat
Rules and guidelines for creating the list file:
. Each file in the list must use the user-defined code page configured in the
source definition.
. Each file in the file list must share the same file properties as configured in
the source definition
or as entered for the source instance in the session property sheet.
. Enter one file name or one path and file name on a line. If you do not specify a
path for a file,
the Integration Service assumes the file is in the same directory as the file list.
. Source File Directory: Enter the directory name where the source file resides.
. Source Filename: Enter the list file name in case of indirect load
. Source Filetype: Specify the indirect option when you want to load multiple files with the same properties.
Note: If you have multiple files with different properties, then you cannot use the
indirect load option.
You have to use direct load option in this case.
Q1) I have a flat file, want to reverse the contents of the flat file which means
the first record should
come as last record and last record should come as first record and load into the
target file.
As an example consider the source flat file data as
Solution:
Follow the below steps for creating the mapping logic
Q2) Load the header record of the flat file into first target, footer record into
second target and the
remaining records into the third target.
The solution to this problem I have already posted by using aggregator and joiner.
Now we will see
how to implement this by reversing the contents of the file.
Solution:
The variables in informatica can be used to store intermediate values and can be
used in
calculations. We will see how to use the mapping variables with an example.
Q) I want to load the data from a flat file into a target. The flat file has n
number of records. How the
load should happen is: In the first run I want to load the first 50 records, in the
second run the next
20 records, in the third run, the next 20 records and so on?
We will solve this problem with the help of mapping variables. Follow the below
steps to implement
this logic:
. Now create a filter transformation and drag the ports of the expression transformation into it. In the filter transformation specify the condition as
IIF(v_check_rec=50,
. Drag the target definition into the mapping and connect the appropriate ports of
filter
transformation to the target.
. Create a workflow and run the workflow multiple times to see the effect.
When you run a session, the integration service evaluates the expression for each
row in the
transaction control transformation. When it evaluates the expression as commit,
then it commits all
the rows in the transaction to the target(s). When the integration service
evaluates the expression as
rollback, then it roll back all the rows in the transaction from the target(s).
When you have flat file as the target, then the integration service creates an
output file for each time
it commits the transaction. You can dynamically name the target flat files. Look at
the example for
creating flat files dynamically - Dynamic flat file creation.
Creating Transaction Control Transformation
Follow the below steps to create transaction control transformation:
. Transformation Tab: You can rename the transformation and add a description.
. Ports Tab: You can create input/output ports
. Properties Tab: You can define the transaction control expression and tracing
level.
. Metadata Extensions Tab: You can add metadata information.
Syntax:
IIF(condition, value1, value2)
Example:
IIF(dept_id=10, TC_COMMIT_BEFORE,TC_ROLLBACK_BEFORE)
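Another common pattern (a sketch; the variable port v_prev_dept_id that holds the previous row's department is an assumption, not part of the original example) is to commit and start a new transaction, and hence a new output file, whenever the department changes:
IIF(dept_id != v_prev_dept_id, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)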
Use the following built-in variables in the expression editor of the transaction
control transformation:
. TC_CONTINUE_TRANSACTION: The Integration Service does not perform any transaction change for this row. This is the default value of the expression.
. TC_COMMIT_BEFORE: The Integration Service commits the transaction, begins a new
transaction, and writes the current row to the target. The current row is in the
new transaction.
. TC_COMMIT_AFTER: The Integration Service writes the current row to the target,
commits the
transaction, and begins a new transaction. The current row is in the committed
transaction.
. TC_ROLLBACK_BEFORE: The Integration Service rolls back the current transaction,
begins a
new transaction, and writes the current row to the target. The current row is in
the new
transaction.
. TC_ROLLBACK_AFTER: The Integration Service writes the current row to the target,
rolls back
the transaction, and begins a new transaction. The current row is in the rolled
back transaction.
. If the mapping includes an XML target, and you choose to append or create a new
document on
commit, the input groups must receive data from the same transaction control point.
. Transaction Control transformations connected to any target other than
relational, XML, or
dynamic MQSeries targets are ineffective for those targets.
. You must connect each target instance to a Transaction Control transformation.
. You can connect multiple targets to a single Transaction Control transformation.
. You can connect only one effective Transaction Control transformation to a
target.
. You cannot place a Transaction Control transformation in a pipeline branch that
starts with a
Sequence Generator transformation.
. If you use a dynamic Lookup transformation and a Transaction Control
transformation in the
same mapping, a rolled-back transaction might result in unsynchronized target data.
. A Transaction Control transformation may be effective for one target and
ineffective for another
target. If each target is connected to an effective Transaction Control
transformation, the
mapping is valid.
. Either all targets or none of the targets in the mapping should be connected to
an effective
Transaction Control transformation.
Q) How to load the name of the current processing flat file along with the data
into the target using
informatica mapping?
We will create a simple pass through mapping to load the data and "file name" from
a flat file into the
target. Assume that we have a source file "customers" and want to load this data
into the target
"customers_tgt". The structures of source and target are
Customer_Id
Location
Target: Customers_TBL
Customer_Id
Location
FileName
The loading of the filename works for both Direct and Indirect Source filetype.
After running the
workflow, the data and the filename will be loaded in to the target. The important
point to note is the
complete path of the file will be loaded into the target. This means that the
directory path and the
filename will be loaded(example: /informatica/9.1/SrcFiles/[Link]).
If you don't want the directory path and just want the filename to be loaded in to
the target, then
follow the below steps:
REVERSE
(
  SUBSTR
  (
    REVERSE(CurrentlyProcessedFileName),
    1,
    INSTR(REVERSE(CurrentlyProcessedFileName), '/') - 1
  )
)
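To see how this works, suppose CurrentlyProcessedFileName holds the value /informatica/9.1/SrcFiles/customers.dat (the file name customers.dat is assumed here for illustration). REVERSE turns the path around so that the file name comes first, INSTR finds the position of the first '/' in the reversed string (which corresponds to the last '/' of the original path), SUBSTR keeps the characters before that position, and the outer REVERSE turns the result back into customers.dat.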
Join Condition
The integration service joins both the input sources based on the join condition.
The join condition
contains ports from both the input sources that must match. You can specify only
the equal (=)
operator between the join columns. Other operators are not allowed in the join
condition. As an
example, if you want to join the employees and departments table then you have to
specify the join
condition as department_id1= department_id. Here department_id1 is the port of
departments
source and department_id is the port of employees source.
Join Type
. Normal Join
. Master Outer Join
. Details Outer Join
. Full Outer Join
We will learn about each join type with an example. Let say I have the following
students and
subjects tables as the source.
Subject_Id subject_Name
-----------------------
1 Maths
2 Chemistry
3 Physics
Student_Id Subject_Id
---------------------
10 1
20 2
30 NULL
Assume that subjects source is the master and students source is the detail and we
will join these
sources on the subject_id port.
Normal Join:
The joiner transformation outputs only the records that match the join condition
and discards all the
rows that do not match the join condition. The output of the normal join is
Subject_Id Subject_Name Student_Id Subject_Id
---------------------------------------------
1 Maths 10 1
2 Chemistry 20 2
In a master outer join, the joiner transformation keeps all the records from the
detail source and only
the matching rows from the master source. It discards the unmatched rows from the
master source.
The output of master outer join is
Subject_Id Subject_Name Student_Id Subject_Id
---------------------------------------------
1 Maths 10 1
2 Chemistry 20 2
NULL NULL 30 NULL
In a detail outer join, the joiner transformation keeps all the records from the
master source and only
the matching rows from the detail source. It discards the unmatched rows from the
detail source.
The output of detail outer join is
Subject_Id Subject_Name Student_Id Subject_Id
---------------------------------------------
1 Maths 10 1
2 Chemistry 20 2
3 Physics NULL NULL
The full outer join first brings the matching rows from both the sources and then
it also keeps the
non-matched records from both the master and detail sources. The output of full
outer join is
Subject_Id Subject_Name Student_Id Subject_Id
---------------------------------------------
1 Maths 10 1
2 Chemistry 20 2
3 Physics NULL NULL
NULL NULL 30 NULL
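For readers who think in SQL, the four join types map roughly onto the following queries (a sketch, assuming relational tables subjects and students with the columns shown above; subjects is the master and students is the detail):
-- Normal join: matching rows only
SELECT s.subject_id, s.subject_name, st.student_id, st.subject_id
FROM subjects s JOIN students st ON st.subject_id = s.subject_id;
-- Master outer join: all detail rows plus matching master rows
SELECT s.subject_id, s.subject_name, st.student_id, st.subject_id
FROM students st LEFT OUTER JOIN subjects s ON st.subject_id = s.subject_id;
-- Detail outer join: all master rows plus matching detail rows
SELECT s.subject_id, s.subject_name, st.student_id, st.subject_id
FROM subjects s LEFT OUTER JOIN students st ON st.subject_id = s.subject_id;
-- Full outer join: all rows from both sources
SELECT s.subject_id, s.subject_name, st.student_id, st.subject_id
FROM subjects s FULL OUTER JOIN students st ON st.subject_id = s.subject_id;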
Sorted Input
Use the sorted input option in the joiner properties tab when both the master and
detail are sorted on
the ports specified in the join condition. You can improve the performance by using
the sorted input
option as the integration service performs the join by minimizing the number of
disk I/Os. You can see good performance when working with large data sets.
. Sort the master and detail source either by using the source qualifier
transformation or sorter
transformation.
. Sort both the source on the ports to be used in join condition either in
ascending or descending
order.
. Specify the Sorted Input option in the joiner transformation properties tab.
The integration service blocks and unblocks the source data depending on whether
the joiner
transformation is configured for sorted input or not.
In case of unsorted joiner transformation, the integration service first reads all
the master rows
before it reads the detail rows. The integration service blocks the detail source
while it caches all the master rows. Once it reads all the master rows, then it unblocks the detail source and reads the detail rows.
. You cannot use joiner transformation when the input pipeline contains an update
strategy
transformation.
. You cannot connect a sequence generator transformation directly to the joiner
transformation.
Q) How to create or implement slowly changing dimension (SCD) Type 2 Effective Date
mapping in
informatica?
SCD type 2 will store the entire history in the dimension table. In SCD type 2
effective date, the
dimension table will have Start_Date (Begin_Date) and End_Date as the fields. If
the End_Date is
Null, then it indicates the current row. Know more about SCDs at Slowly Changing
Dimensions
Concepts.
We will see how to implement the SCD Type 2 Effective Date in informatica. As an
example consider
the customer dimension. The source and target table structures are shown below:
--Source Table
Create Table Customers
(
Customer_Id Number,
Location Varchar2(30)
);
--Target Dimension Table
Create Table Customers_Dim
(
Cust_Key Number,
Customer_Id Number,
Location Varchar2(30),
Begin_Date Date,
End_Date Date
);
The basic steps involved in creating a SCD Type 2 Effective Date mapping are
. Identifying the new records and inserting into the dimension table with
Begin_Date as the
Current date (SYSDATE) and End_Date as NULL.
. Identifying the changed record and inserting into the dimension table with
Begin_Date as the
Current date (SYSDATE) and End_Date as NULL.
. Identify the changed record and update the existing record in dimension table
with End_Date as
the current date.
We will divide the steps to implement the SCD type 2 Effective Date mapping into
four parts.
SCD Type 2 Effective Date implementation - Part 1
[Link]
[Link]
[Link]
Here we will see the basic set up and mapping flow required for SCD type 2 Effective
Date. The steps
involved are:
. Create the LKP transformation and enter the below query in the Lookup SQL Override, so that only the current rows (End_Date is null) are returned:
SELECT Customers_Dim.Cust_Key as Cust_Key,
Customers_Dim.Location as Location,
Customers_Dim.Customer_Id as Customer_Id
FROM Customers_Dim
WHERE Customers_Dim.End_Date IS NULL
. Create an expression transformation with input/output ports as Cust_Key, LKP_Location, Src_Location and output ports as New_Flag, Changed_Flag. Enter the below expressions for the output ports:
New_Flag = IIF(ISNULL(Cust_Key), 1, 0)
Changed_Flag = IIF(NOT ISNULL(Cust_Key) AND LKP_Location != SRC_Location, 1, 0)
. Now create a filter transformation to identify and insert new record in to the
dimension table.
Drag the ports of expression transformation (New_Flag) and source qualifier
transformation
(Customer_Id, Location) into the filter transformation.
. Go to the properties tab of the filter transformation and enter the filter condition as
New_Flag=1
. Now create a update strategy transformation and connect the ports of filter
transformation
(Customer_Id, Location). Go to the properties tab and enter the update strategy
expression as
DD_INSERT.
. Now drag the target definition into the mapping and connect the appropriate ports
of update
strategy transformation to the target definition.
. Create a sequence generator and an expression transformation. Call this
expression
transformation as "Expr_Date".
. Drag and connect the NextVal port of sequence generator to the Expression
transformation. In
the expression transformation create a new output port (Begin_Date with date/time
data type)
and assign value SYSDATE to it.
. Now connect the ports of expression transformation (Nextval, Begin_Date) to the
Target
definition ports (Cust_Key, Begin_Date). The part of the mapping flow is shown in
the below
image.
. Create an update strategy transformation and drag the required ports of the filter transformation into it. Go to the properties tab and enter the update strategy expression as DD_UPDATE (a sketch of the End_Date handling follows this list).
. Drag the target definition into the mapping and connect the appropriate ports of update strategy to it. The complete mapping image is shown below.
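A minimal sketch of how the End_Date of the existing row can be closed out in this update branch (the port name O_End_Date is an assumption): create an output port
O_End_Date (date/time) = SYSDATE
in the expression transformation that feeds the DD_UPDATE update strategy, and connect it to the End_Date port of the target definition so that the expiring row receives the current date as its End_Date.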
Recommended Reading
Learn how to Design Different Types of SCDs in informatica
SCD Type 1
SCD Type 3
SCD Type 2 version
SCD Type 2 Flag
SCD Type 2 Effective Date
--Source Table
Create Table Customers
(
Customer_Id Number,
Location Varchar2(30)
);
--Target Dimension Table
Create Table Customers_Dim
(
Cust_Key Number,
Customer_Id Number,
Location Varchar2(30),
Flag Number
);
The basic steps involved in creating a SCD Type 2 Flagging mapping are
. Identifying the new records and inserting into the dimension table with flag
column value as one.
. Identifying the changed record and inserting into the dimension table with flag
value as one.
[Link]
[Link]
. Identify the changed record and update the existing record in dimension table
with flag value as
zero.
We will divide the steps to implement the SCD type 2 flagging mapping into four
parts.
SCD Type 2 Flag implementation - Part 1
Here we will see the basic set up and mapping flow required for SCD type 2 Flagging. The steps
The steps
involved are:
. Go to the conditions tab of the lookup transformation and enter the condition as
Customer_Id =
IN_Customer_Id
. Go to the properties tab of the LKP transformation and enter the below query in
Lookup SQL
Override. Alternatively you can generate the SQL query by connecting the database
in the
Lookup SQL Override expression editor and then add the WHERE clause.
SELECT Customers_Dim.Cust_Key as Cust_Key,
Customers_Dim.Location as Location,
Customers_Dim.Customer_Id as Customer_Id
FROM Customers_Dim
WHERE Customers_Dim.Flag = 1
. Now create a filter transformation to identify and insert new record in to the
dimension table.
Drag the ports of expression transformation (New_Flag) and source qualifier
transformation
(Customer_Id, Location) into the filter transformation.
. Go to the properties tab of the filter transformation and enter the filter condition as
New_Flag=1
. Now create a update strategy transformation and connect the ports of filter
transformation
(Customer_Id, Location). Go to the properties tab and enter the update strategy
expression as
DD_INSERT.
. Now drag the target definition into the mapping and connect the appropriate ports
of update
strategy transformation to the target definition.
. Create a sequence generator and an expression transformation. Call this
expression
transformation as "Expr_Flag".
. Drag and connect the NextVal port of sequence generator to the Expression
transformation. In
the expression transformation create a new output port (Flag) and assign value 1 to
it.
. Now connect the ports of expression transformation (Nextval, Flag) to the Target
definition ports
(Cust_Key, Flag). The part of the mapping flow is shown in the below image.
. Create an update strategy transformation and drag the required ports of the filter transformation into it. Go to the properties tab and enter the update strategy
expression as
DD_UPDATE.
. Drag the target definition into the mapping and connect the appropriate ports of
update strategy
to it. The complete mapping image is shown below.
Recommended Reading
Learn how to Design Different Types of SCDs in informatica
SCD Type 1
SCD Type 3
SCD Type 2 version
SCD Type 2 Flag
SCD Type 2 Effective Date
--Source Table
Create Table Customers
(
Customer_Id Number,
Location Varchar2(30)
);
--Target Dimension Table
Create Table Customers_Dim
(
Cust_Key Number,
Customer_Id Number,
Location Varchar2(30),
Version Number
);
The basic steps involved in creating a SCD Type 2 version mapping are
. Identifying the new records and inserting into the dimension table with version
number as one.
. Identifying the changed record and inserting into the dimension table by
incrementing the
version number.
Lets divide the steps to implement the SCD type 2 version mapping into three parts.
. Go to the conditions tab of the lookup transformation and enter the condition as
Customer_Id =
IN_Customer_Id
. Go to the properties tab of the LKP transformation and enter the below query in
Lookup SQL
Override. Alternatively you can generate the SQL query by connecting the database
in the
Lookup SQL Override expression editor and then add the order by clause.
SELECT Customers_Dim.Cust_Key as Cust_Key,
Customers_Dim.Location as Location,
Customers_Dim.Version as Version,
Customers_Dim.Customer_Id as Customer_Id
FROM Customers_Dim
. You have to use an order by clause in the above query. If you sort the version
column in
ascending order, then you have to specify "Use Last Value" in the "Lookup policy on
multiple
match" property. If you have sorted the version column in descending order then you
have to
specify the "Lookup policy on multiple match" option as "Use First Value"
. Click on Ok in the lookup transformation. Connect the customer_id port of source
qualifier
transformation to the In_Customer_Id port of the LKP transformation.
[Link]
. Create an expression transformation with input/output ports as Cust_Key,
LKP_Location,
Src_Location and output ports as New_Flag, Changed_Flag. Enter the below
expressions for
output ports.
New_Flag = IIF(ISNULL(Cust_Key), 1, 0)
Changed_Flag = IIF(NOT ISNULL(Cust_Key) AND LKP_Location != SRC_Location, 1, 0)
. Now create a filter transformation to identify and insert new record in to the
dimension table.
Drag the ports of expression transformation (New_Flag) and source qualifier
transformation
(Customer_Id, Location) into the filter transformation.
. Go to the properties tab of the filter transformation and enter the filter condition as
New_Flag=1
. Now create a update strategy transformation and connect the ports of filter
transformation
(Customer_Id, Location). Go to the properties tab and enter the update strategy
expression as
DD_INSERT.
. Now drag the target definition into the mapping and connect the appropriate ports
of update
strategy transformation to the target definition.
. Create a sequence generator and an expression transformation. Call this
expression
transformation as "Expr_Ver".
. Drag and connect the NextVal port of sequence generator to the Expression
transformation. In
the expression transformation create a new output port (Version) and assign value 1
to it.
. Now connect the ports of expression transformation (Nextval, Version) to the
Target definition
ports (Cust_Key, Version). The part of the mapping flow is shown in the below
image.
[Link]
[Link]
. Create a filter transformation. This is used to find the changed record. Now drag
the ports from
expression transformation (changed_flag), source qualifier transformation (customer_id,
(customer_id,
location) and LKP transformation (version) into the filter transformation.
. Go to the filter transformation properties and enter the filter condition as
changed_flag =1.
. Create an expression transformation and drag the ports of filter transformation
except the
changed_flag port into the expression transformation.
. Go to the ports tab of expression transformation and create a new output port
(O_Version) and
assign the expression as (version+1).
. Now create an update strategy transformation and drag the ports of expression
transformation
(customer_id, location,o_version) into the update strategy transformation. Go to
the properties
tab and enter the update strategy expression as DD_INSERT.
. Now drag the target definition into the mapping and connect the appropriate ports
of update
strategy transformation to the target definition.
. Now connect the Next_Val port of expression transformation (Expr_Ver created in
part 2) to the
cust_key port of the target definition. The complete mapping diagram is shown in
the below
image:
You can implement the SCD type 2 version mapping in your own way. Remember that SCD
type2
version mapping is rarely used in real time.
Recommended Reading
Learn how to Design Different Types of SCDs in informatica
SCD Type 1
SCD Type 3
SCD Type 2 version
SCD Type 2 Flag
SCD Type 2 Effective Date
Rank Transformation in Informatica
. Cache Directory: Directory where the integration service creates the index and
data cache
files.
. Top/Bottom: Specify whether you want to select the top or bottom rank of data.
. Number of Ranks: specify the number of rows you want to rank.
. Case-Sensitive String Comparison: Specifies whether string comparisons are case sensitive when the transformation ranks string data.
. Tracing Level: Amount of logging to be tracked in the session log file.
. Rank Data Cache Size: The data cache size default value is 2,000,000 bytes. You
can set a
numeric value, or Auto for the data cache size. In case of Auto, the Integration
Service
determines the cache size at runtime.
. Rank Index Cache Size: The index cache size default value is 1,000,000 bytes. You
can set a
numeric value, or Auto for the index cache size. In case of Auto, the Integration
Service
determines the cache size at runtime.
. Create a new mapping, Drag the source definition into the mapping.
[Link]
[Link]
. Create a rank transformation and drag the ports of source qualifier
transformation into the rank
transformation.
. Now go to the ports tab of the rank transformation. Check the rank (R) option for
the salary port
and Group By option for the Dept_Id port.
. Go to the properties tab, select the Top/Bottom value as Top and the Number of
Ranks property
as 2.
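For reference, a roughly equivalent SQL sketch of what this Top-2-salaries-per-department rank configuration produces (assuming an employees source with dept_id, emp_name and salary columns):
SELECT dept_id, emp_name, salary
FROM (
  SELECT e.*,
         RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) rnk
  FROM employees e
)
WHERE rnk <= 2;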
We will see the implementation of SCD type 3 by using the customer dimension table
as an
example. The source table looks as
Create Table Customers
(
Customer_Id Number,
Location Varchar2(30)
);
Now I have to load the data of the source into the customer dimension table using SCD Type 3. The dimension table structure is shown below.
Create Table Customers_Dim
(
Cust_Key Number,
Customer_Id Number,
Current_Location Varchar2(30),
Previous_Location Varchar2(30)
);
. Go to the condition tab of LKP transformation and enter the lookup condition as
Customer_Id =
IN_Customer_Id. Then click on OK.
. Connect the customer_id port of source qualifier transformation to the
IN_Customer_Id port of
LKP transformation.
. Create the expression transformation with input ports as Cust_Key, Prev_Location, LKP_Location, Src_Location and output ports as New_Flag and Changed_Flag. Assign the below expressions to the output ports:
New_Flag = IIF(ISNULL(Cust_Key), 1, 0)
Changed_Flag = IIF(NOT ISNULL(Cust_Key) AND LKP_Location != Src_Location, 1, 0)
Recommended Reading
Learn how to Design Different Types of SCDs in informatica
SCD Type 1
SCD Type 3
SCD Type 2 version
SCD Type 2 Flag
SCD Type 2 Effective Date
We will see the implementation of SCD type 1 by using the customer dimension table as an
example.
The source table looks as
CREATE TABLE Customers (
Customer_Id Number,
Customer_Name Varchar2(30),
Location Varchar2(30)
);
Now I have to load the data of the source into the customer dimension table using SCD Type 1. The dimension table structure is shown below.
CREATE TABLE Customers_Dim (
Cust_Key Number,
Customer_Id Number,
Customer_Name Varchar2(30),
Location Varchar2(30)
);
. Go to the condition tab of lkp transformation and enter the lookup condition as
Customer_Id =
IN_Customer_Id. Then click on OK.
[Link]
New_Flag = IIF(ISNULL(Cust_Key), 1, 0)
Changed_Flag = IIF(NOT ISNULL(Cust_Key) AND (Name != Src_Name OR Location != Src_Location), 1, 0)
. Now connect the ports of lkp transformation (Cust_Key, Name, Location) to the expression transformation ports (Cust_Key, Name, Location) and the ports of source qualifier transformation (Name, Location) to the expression transformation ports (Src_Name, Src_Location) respectively.
. The mapping diagram so far created is shown in the below image.
[Link]
[Link]
. Now create another filter transformation and drag the ports from lkp
transformation (Cust_Key),
source qualifier transformation (Name, Location), expression transformation
(changed_flag)
ports into the filter transformation.
. Edit the filter transformation, go to the properties tab and enter the Filter
Condition as
Changed_Flag=1. Then click on ok.
. Now create an update strategy transformation and connect the ports of the filter
transformation
(Cust_Key, Name, and Location) to the update strategy. Go to the properties tab of
update
strategy and enter the update strategy expression as DD_Update
. Now drag the target definition into the mapping and connect the appropriate ports
from update
strategy to the target definition.
. The complete mapping diagram is shown in the below image.
[Link]
Recommended Reading
Learn how to Design Different Types of SCDs in informatica
SCD Type 1
SCD Type 3
SCD Type 2 version
SCD Type 2 Flag
SCD Type 2 Effective Date
The Mapping Wizards in informatica provide an easy way to create the different types of SCDs. We will see how to create the SCDs using the mapping wizards step by step.
The below steps are common for creating the SCD type 1, type 2 and type 3 mappings.
Open the mapping designer tool, go to the source analyzer tab and either create or import the source definition. As an example I am using the customer table as the source. The fields in the customer table are listed below.
Go to the mapping designer tab, in the tool bar click on Mappings, select Wizards
and then click on
Slowly Changing Dimensions.
[Link]
[Link]
Now enter the mapping name and select the SCD mapping type you want to create. This
is shown in
the below image. Then click on Next.
Select the source table name (Customers in this example) and enter the name for the
target table to
be created. Then click on next.
[Link]
Now you have to select the logical key fields and the fields to compare for changes. Logical key fields are the fields on which the source qualifier and the lookup will be joined. Fields to compare for changes are the fields which are used to determine whether the values have changed or not. Here I am using customer_id as the logical key field and location as the field to compare.
As of now we have seen the common steps for creating the SCDs. Now we will see the specific steps for creating each type of SCD.
Once you have selected the logical key fields and the fields to compare for changes, simply click the Finish button to create the SCD Type 1 mapping.
[Link]
After selecting the logical key fields click on the next button. You will get a window where you can select what type of SCD 2 you want to create: keep the history using versioning, flagging or an effective date range. Once you have selected the required type, then click on the finish button to create the SCD type 2 mapping.
[Link]
[Link]
Click on the next button after selecting the logical key fields. You will get
window for selecting the
optional Effective Date. If you want the effective date to be created in the
dimension table, you can
check this box or else ignore. Now click on the finish button to create the SCD
type 3 mapping.
[Link]
Constraint based load ordering is used to load the data first in to a parent table
and then in to the
child tables. You can specify the constraint based load ordering option in the
Config Object tab of the
session. When the constraint based load ordering option is checked, the integration service orders the target load on a row-by-row basis. For every row generated by the active source, the integration service first loads the row into the primary key table and then into the foreign key tables.
The constraint based loading is helpful when loading normalized targets from denormalized source data.
. The constraint based load ordering option applies for only insert operations.
. You cannot update or delete the rows using the constraint base load ordering.
. You have to define the primary key and foreign key relationships for the targets
in the
warehouse or target designer.
. The target tables must be in the same Target connection group.
When you enable complete constraint based loading, the change data (inserts,
updates and deletes)
is loaded in the same transaction control unit by using the row ID assigned to the
data by the CDC
reader. As a result the data is applied to the target in the same order in which it
was applied to the
sources. You can also set this property in the integration service, which makes it
applicable for all
the sessions and workflows. When you use complete constraint based load ordering,
mapping
should not contain active transformations which change the row ID generated by the
CDC reader.
The following transformations can change the row ID value
. Aggregator Transformation
. Custom Transformation configured as an active
. Joiner Transformation
. Normalizer Transformation
. Rank Transformation
. Sorter Transformation
The source is a denormalized table with the below structure and data:
dept_id number,
dept_name varchar2(30),
emp_id number,
emp_name varchar2(30)
dept_id dept_name emp_id emp_name
---------------------------------
10 Finance 1 Mark
10 Finance 2 Henry
20 Hr 3 Christy
20 Hr 4 Tailor
The parent target table (a departments table with dept_id as the primary key) has the below structure and data:
dept_id number,
dept_name varchar2(30)
dept_id dept_name
-----------------
10 Finance
20 Hr
The child target table (an employees table with dept_id as the foreign key) has the below structure and data:
dept_id number,
emp_id number,
emp_name varchar2(30)
dept_id emp_id emp_name
---------------------------------
10 1 Mark
10 2 Henry
20 3 Christy
20 4 Tailor
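For reference, the primary key / foreign key relationship that constraint based load ordering relies on could be declared in the database roughly as follows (the table and constraint names are assumptions; the same relationship must also be defined between the target definitions in the Target Designer):
ALTER TABLE departments ADD CONSTRAINT pk_departments PRIMARY KEY (dept_id);
ALTER TABLE employees ADD CONSTRAINT fk_employees_dept FOREIGN KEY (dept_id) REFERENCES departments (dept_id);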
Follow the below steps for creating the mapping using constraint based load
ordering option.
. Now create a new mapping. Drag the source and targets into the mapping.
. Connect the appropriate ports of source qualifier transformation to the target
definition as shown
in the below image.
. Go to the workflow manager tool, create a new workflow and then session.
. Go to the Config object tab of session and check the option of constraint based
load ordering.
. Go to the mapping tab and enter the connections for source and targets.
. Save the mapping and run the workflow.
Q) How to find the Minimum and maximum values of continuous sequence numbers in a
group of
rows.
I know the problem is not clear without giving an example. Let say I have the
Employees table with
the below data.
Dept_Id Emp_Seq
---------------
10 1
10 2
10 3
10 5
10 6
10 8
10 9
10 11
20 1
20 2
I want to find the minimum and maximum values of continuous Emp_Seq numbers. The
output
should look as
Dept_Id Min_Seq Max_Seq
-----------------------
10 1 3
10 5 6
10 8 9
10 11 11
20 1 2
Write an SQL query in oracle to find the minimum and maximum values of continuous
Emp_Seq in
each department?
STEP1: First we will generate unique sequence numbers in each department using the
Row_Number analytic function in the Oracle. The SQL query is.
SELECT Dept_Id,
       Emp_Seq,
       ROW_NUMBER() OVER (PARTITION BY Dept_Id ORDER BY Emp_Seq) rn
FROM employees;
Dept_Id Emp_Seq rn
--------------------
10 1 1
10 2 2
10 3 3
10 5 4
10 6 5
10 8 6
10 9 7
10 11 8
20 1 1
20 2 2
STEP2: Subtract the value of rn from emp_seq to identify the continuous sequences
as a group. The
SQL query is
SELECT Dept_Id,
       Emp_Seq,
       Emp_Seq - ROW_NUMBER() OVER (PARTITION BY Dept_Id ORDER BY Emp_Seq) Dept_Split
FROM employees;
Dept_Id Emp_Seq Dept_Split
---------------------------
10 1 0
10 2 0
10 3 0
10 5 1
10 6 1
10 8 2
10 9 2
10 11 3
20 1 0
20 2 0
STEP3: The combination of the Dept_Id and Dept_Split fields will become the group
for continuous
rows. Now use group by on these fields and find the min and max values. The final
SQL query is
SELECT Dept_Id,
       MIN(Emp_Seq) Min_Seq,
       MAX(Emp_Seq) Max_Seq
FROM
(
  SELECT Dept_Id,
         Emp_Seq,
         Emp_Seq - ROW_NUMBER() OVER (PARTITION BY Dept_Id ORDER BY Emp_Seq) Dept_Split
  FROM employees
) A
GROUP BY Dept_Id, Dept_Split;
Slowly Changing Dimensions: Slowly changing dimensions are the dimensions in which
the data
changes slowly, rather than changing regularly on a time basis.
For example, you may have a customer dimension in a retail domain. Let say the
customer is in
India and every month he does some shopping. Now creating the sales report for the
customers is
easy. Now assume that the customer is transferred to United States and he does
shopping there.
How to record such a change in your customer dimension?
You could sum or average the sales done by the customer. In this case you won't get an exact comparison of the sales done by the customer before and after the move. As the customer's salary increased after the transfer, he or she might do more shopping in the United States compared to India, so simply summing the total sales makes the customer's sales look stronger than a fair comparison would. You can create a
second customer record and treat the transferred customer as the new customer.
However this will
create problems too.
Handling these issues involves SCD management methodologies which are referred to as
Type 1 to
Type 3. The different types of slowly changing dimensions are explained in detail
below.
SCD Type 1: SCD type 1 methodology is used when there is no need to store
historical data in the
dimension table. This method overwrites the old data in the dimension table with
the new data. It is
used to correct data errors in the dimension.
As an example, I have the customer table with the below data.
Customer_Key Customer_Id Customer_Name Location
------------------------------------------------
1 1 Marspton Illions
After the misspelled name is corrected, the existing record is simply overwritten:
Customer_Key Customer_Id Customer_Name Location
------------------------------------------------
1 1 Marston Illions
The advantage of type1 is ease of maintenance and less space occupied. The
disadvantage is that
there is no historical data kept in the data warehouse.
SCD Type 3: In the type 3 method, only the current status and the previous status of the row are maintained in the table. To track these changes, two separate columns are created in the table. The customer dimension table in the type 3 method will look as
Customer_Key Customer_Id Customer_Name Current_Location Previous_Location
--------------------------------------------------------------------------
1 1 Marston Illions NULL
Let say the customer moves from Illions to Seattle; the updated table will look as
Customer_Key Customer_Id Customer_Name Current_Location Previous_Location
--------------------------------------------------------------------------
1 1 Marston Seattle Illions
Now again if the customer moves from Seattle to NewYork, then the updated table will be
Customer_Key Customer_Id Customer_Name Current_Location Previous_Location
--------------------------------------------------------------------------
1 1 Marston NewYork Seattle
The type 3 method will have limited history and it depends on the number of columns
you create.
SCD Type 2: SCD type 2 stores the entire history the data in the dimension table.
With type 2 we
can store unlimited history in the dimension table. In type 2, you can store the
data in three different
ways. They are
. Versioning
. Flagging
. Effective Date
--------------------------------------------------------
1 1 Marston Illions 1
The customer moves from Illions to Seattle and the version number will be
incremented. The
dimension table will look as
--------------------------------------------------------
1 1 Marston Illions 1
2 1 Marston Seattle 2
Now again if the customer is moved to another location, a new record will be
inserted into the
dimension table with the next version number.
SCD Type 2 Flagging: In flagging method, a flag column is created in the dimension
table. The
current record will have the flag value as 1 and the previous records will have the
flag as 0.
Now for the first time, the customer dimension will look as.
--------------------------------------------------------
1 1 Marston Illions 1
Now when the customer moves to a new location, the old records will be updated with
flag value as
0 and the latest record will have the flag value as 1.
--------------------------------------------------------
1 1 Marston Illions 0
2 1 Marston Seattle 1
SCD Type 2 Effective Date: In Effective Date method, the period of the change is
tracked using the
start_date and end_date columns in the dimension table.
-------------------------------------------------------------------------
The NULL in the End_Date indicates the current version of the data and the
remaining records
indicate the past data.
. ScriptName (Input port) : Receives the name of the script to execute for the
current row.
. ScriptResult (output port) : Returns PASSED if the script execution succeeds for
the row.
Otherwise FAILED.
. ScriptError (Output port) : Returns errors that occur when a script fails for a
row.
Note: Use SQL transformation in script mode to run DDL (data definition language)
statements like
creating or dropping the tables.
Create SQL Transformation in Script Mode
We will see how to create sql transformation in script mode with an example. We
will create the
following sales table in oracle database and insert records into the table using
the SQL
transformation.
Sale_id Number,
Product_name varchar2(30),
Price Number
);
$PMSourceFileDir/sales_ddl.txt
$PMSourceFileDir/sales_dml.txt
Now we will create a mapping to execute the script files using the SQL
transformation. Follow the
below steps to create the mapping.
. Go to the mapping designer tool, source analyzer and create the source file
definition with the
structure as the $PMSourceFileDir/Script_names.txt file. The flat file structure is
shown in the
below image.
. Go to the warehouse designer or target designer and create a target flat file
with result and error
ports. This is shown in the below image.
This will create the sales table in the oracle database and inserts the records.
We will see how to create an SQL transformation in script mode, query mode and
passing the
dynamic database connection with examples.
Query Mode: The SQL transformation executes a query that defined in the query
editor. You can
pass parameters to the query to define dynamic queries. The SQL transformation can
output
multiple rows when the query has a select statement. In query mode, the SQL
transformation acts as
an active transformation.
You can create the following types of SQL queries
Static SQL query: The SQL query statement does not change, however you can pass
parameters
to the sql query. The integration service runs the query once and runs the same
query for all the
input rows.
Dynamic SQL query: The SQL query statement and the data can change. The integration
service
prepares the query for each input row and then runs the query.
SQL Transformation Example Using Static SQL query
Q1) Let�s say we have the products and Sales table with the below data.
PRODUCT
-------
SAMSUNG
LG
IPhone
----------------------
SAMSUNG 2 100
LG 3 80
IPhone 5 200
SAMSUNG 5 500
Create a mapping to join the products ant sales table on product column using the
SQL
Transformation? The output will be
----------------------
SAMSUNG 2 100
SAMSUNG 5 500
LG 3 80
Solution:
Just follow the below steps for creating the SQL transformation to solve the
example
. Create a new mapping, drag the products source definition to the mapping.
. Go to the toolbar -> Transformation -> Create -> Select the SQL transformation.
Enter a name
and then click create.
informatica sql transformation in query mode
sql transformation sql ports tab in informatica
. Select the execution mode as query mode, DB type as Oracle, connection type as
static. This is
shown in the below [Link] click OK.
. Edit the sql transformation, go to the "SQL Ports" tab and add the input and
output ports as
shown in the below image.
. In the same "SQL Ports" Tab, go to the SQL query and enter the below sql in the
SQL editor.
sql transformation informatica mapping
select product, quantity, price from sales where product = ?product?
. Here ?product? is the parameter binding variable which takes its values from the
input port.
Now connect the source qualifier transformation ports to the input ports of SQL
transformation
and target input ports to the SQL transformation output ports. The complete mapping
flow is
shown below.
. Create the workflow, session and enter the connections for source, target. For
SQL
transformation also enter the source connection.
After you run the workflow, the integration service generates the following queries
for sql
transformation
Dynamic SQL query: A dynamic SQL query can execute different query statements for
each input
row. You can pass a full query or a partial query to the sql transformation input
ports to execute the
dynamic sql queries.
SQL Transformation Example Using Full Dynamic query
Q2) I have the below source table which contains the below data.
Del_statement
------------------------------------------
. Now go to the "SQL Ports" tab of SQL transformation and create the input port as
"Query_Port".
Connect this input port to the Source Qualifier Transformation.
. In the "SQL Ports" tab, enter the sql query as ~Query_Port~. The tilt indicates a
variable
substitution for the queries.
. As we don�t need any output, just connect the SQLError port to the target.
. Now create workflow and run the workflow.
Tab_Names
----------
sales
products
Solution:
Create the input port in the sql transformation as Table_Name and enter the below
query in the SQL
Query window.
Recommended Reading
More about sql transformation - create SQL transformation in script mode.
Q) How to generate or load values in to the target table based on a column value
using informatica
etl tool.
I have the products table as the source and the data of the products table is
shown below.
Product Quantity
-----------------
Samsung NULL
Iphone 3
LG 0
Nokia 4
Now i want to duplicate or repeat each product in the source table as many times as
the value in the
quantity column. The output is
product Quantity
----------------
Iphone 3
Iphone 3
Iphone 3
Nokia 4
Nokia 4
Nokia 4
Nokia 4
The Samsung and LG products should not be loaded as their quantity is NULL, 0
respectively.
Now create informatica workflow to load the data in to the target table?
Solution:
Follow the below steps
. Create a new mapping in the mapping designer
. Drag the source definition in to the mapping
. Create the java transformation in active mode
. Drag the ports of source qualifier transformation in to the java transformation.
. Now edit the java transformation by double clicking on the title bar of the java
transformation
and go to the "Java Code" tab.
. Enter the below java code in the "Java Code" tab.
if (!isNull("quantity"))
product = product;
quantity = quantity;
generateRow();
}
informatica joiner transformation example
. Now compile the java code. The compile button is shown in red circle in the
image.
. Connect the ports of the java transformation to the target.
. Save the mapping, create a workflow and run the workflow.
. Case Sensitive: The integration service considers the string case when sorting
the data. The
integration service sorts the uppercase characters higher than the lowercase
characters.
. Work Directory: The integration service creates temporary files in the work
directory when it is
sorting the data. After the integration service sorts the data, it deletes the
temporary files.
. Distinct Output Rows: The integration service produces distinct rows in the
output when this
option is configured.
. Tracing Level: Configure the amount of data needs to be logged in the session log
file.
. Null Treated Low: Enable the property, to treat null values as lower when
performing the sort
operation. When disabled, the integration service treats the null values as higher
than any other
value.
. Sorter Cache Size: The integration service uses the sorter cache size property to
determine the
amount of memory it can allocate to perform sort operation
. Input groups: The designer copies the input ports properties to create a set of
output ports for
each output group.
. Output groups: Router transformation has two output groups. They are user-defined
groups
and default group.
department_id=10
In the second group filter condition,
department_id=20
department_id=30
department_id<=30
What data will be loaded into the first and second target tables?
Solution: The first target table will have employees from department 30. The second
table will have
employees whose department ids are less than or equal to 30.
. Union transformation contains only one output group and can have multiple input
groups.
. The input groups and output groups should have matching ports. The datatype,
precision and
scale must be same.
. Union transformation does not remove duplicates. To remove the duplicate rows use
sorter
transformation with "select distinct" option after the union transformation.
. The union transformation does not generate transactions.
. You cannot connect a sequence generator transformation to the union
transformation.
. Union transformation does not generate transactions.
. Groups Tab: You can create new input groups or delete existing input groups.
. Group Ports Tab: You can create and delete ports for the input groups.
Note: The ports tab displays the groups and ports you create. You cannot edit the
port or group
information in the ports tab. To do changes use the groups tab and group ports tab.
. Transformation: You can enter the name and description of the transformation.
. Ports: Create new ports and configure them
. Properties: You can specify the filter condition to filter the rows. You can also
configure the
tracing levels.
. Metadata Extensions: Specify the metadata details like name, datatype etc.
. Use the filter transformation as close as possible to the sources in the mapping.
This will reduce
the number of rows to be processed in the downstream transformations.
. In case of relational sources, if possible use the source qualifier
transformation to filter the rows.
This will reduce the number of rows to be read from the source.
Note: The input ports to the filter transformation mush come from a single
transformation. You
cannot connect ports from more than one transformation to the filter.
Filter Transformation examples
Specify the filter conditions for the following examples
1. Create a mapping to load the employees from department 50 into the target?
department_id=50
2. Create a mapping to load the employees whose salary is in the range of 10000 to
50000?
3. Create a mapping to load the employees who earn commission (commission should
not be null)?
IIF(ISNULL(commission),FALSE,TRUE)
Adding Expressions
Once you created an expression transformation, you can add the expressions either
in a variable
port or output port. Create a variable or output port in the expression
transformation. Open the
Expression Editor in the expression section of the variable or output port. Enter
an expression and
then click on Validate to verify the expression syntax. Now Click OK.
Expression Transformation Components or Tabs
The expression transformation has the following tabs
. Transformation: You can enter the name and description of the transformation. You
can also
make the expression transformation reusable.
. Ports: Create new ports and configuring the ports.
. Properties: Configure the tracing level to set the amount of transaction detail
to be logged in
session log file.
. Metadata Extensions: You can specify extension name, data type, precision, value
and can
also create reusable metadata extensions.
Configuring Ports:
You can configure the following components on the ports tab
CONCAT(CONCAT(first_name,' '),last_name)
The above expression can be simplified as first_name||' '||last_name
Solve more scenarios on expression stransformation at Informatica Scenarios
Join command is one of the text processing utility in Unix/Linux. Join command is
used to combine
two files based on a matching fields in the files. If you know SQL, the join
command is similar to
joining two tables in a database.
The syntax of join command is
-1 field number : Join on the specified field number in the first file
-2 field number : Join on the specified field number in the second file
-o list : displays only the specified fields from both the files
10 mark
10 steve
20 scott
30 chris
10 hr
20 finance
30 db
Here we will join on the first field and see the output. By default, the join
command treats the field
delimiter as space or tab.
10 mark hr
10 steve hr
20 scott finance
30 chris db
Important Note: Before joining the files, make sure to sort the fields on the
joining fields. Otherwise
you will get incorrect result.
2. Write a join command to join the two files? Here use the second field from the
first file and the first
field from the second file to join.
In this example, we will see how to join two files on different fields rather than
the first field. For this
consider the below two files as an example
mark 10 1
steve 10 1
scott 20 2
chris 30 3
10 hr 1
20 finance 2
30 db 3
From the above, you can see the join fields are the second field from the [Link]
and the first field
from the [Link]. The join command to match these two files is
> join -1 2 -2 1 [Link] [Link]
10 mark 1 hr 1
10 steve 1 hr 1
20 scott 2 finance 2
30 chris 3 db 3
You can also see that the two files can also be joined on the third filed. As the
both the files have the
matching join field, you can use the j option in the join command.
Here -1 2 specifies the second field from the first file ([Link]) and -2 1
specifies the first field from
the second file ([Link])
1 mark 10 10 hr
1 steve 10 10 hr
2 scott 20 20 finance
3 chris 30 30 db
3. Write a join command to select the required fields from the input files in the
output? Select first
filed from first file and second field from second file in the output.
By default, the join command prints all the fields from both the files (except the
join field is printed
once). We can choose what fields to be printed on the terminal with the -o option.
We will use the
same files from the above example.
mark hr
steve hr
scott finance
chris db
Here 1.1 means in the first file select the first field. Similarly, 2.2 means in
the second file select the
second field
4. Write a command to join two delimited files? Here the delimiter is colon (:)
So far we have joined files with space delimiter. Here we will see how to join
files with a colon as
delimiter. Consider the below two files.
mark:10
steve:10
scott:20
chris:30
10:hr
20:finance
30:db
The -t option is used to specify the delimiter. The join command for joining the
files is
10:mark:hr
10:steve:hr
20:scott:finance
30:chris:db
mark,A
steve,a
scott,b
chris,C
a,hr
B,finance
c,db
A,mark,hr
a,steve,hr
b,scott,finance
C,chris,db
6. Write a join command to print the lines which do not match the values in joining
fields?
By default the join command prints only the matched lines from both the files which
means prints the
matched lines that passed the join condition. We can use the -a option to print the
non-matched
lines.
A 1
B 2
C 3
C 3
D 4
A 1
B 2 2
C 3 3
B 2 2
C 3 3
D 4
A 1
B 2 2
C 3 3
D 4
Q. How to rename a file or directory in unix (or linux) and how to move a file or
directory from the
current directory to another directory?
Unix provides a simple mv (move) command which can be used to rename or move files
and
directories. The syntax of mv command is
If the newname already exists, then the mv command overwrites that file. Let see
some examples on
how to use mv command.
Unix mv command examples
1. Write a unix/linux command to rename a file?
Renaming a file is one of the basic features of the mv command. To rename a file
from "[Link]" to
"[Link]", use the below mv command
Note that if the "[Link]" file already exists, then its contents will be
overwritten by "[Link]". To avoid
this use the -i option, which prompts you before overwriting the file.
mv -i [Link] [Link]
mv docs/ documents/
If the documents directory already exists, then the docs directory will be moved in
to the documents
directory.
3. Write a unix/linux command to move a file into another directory?
The mv command can also be used to move the file from one directory to another
directory. The
below command moves the [Link] file in the current directory to /var/tmp directory.
mv [Link] /var/tmp/
If the [Link] file already exists in the /var/tmp directory, then the contents of
that file will be
overwritten.
4. Write a unix/linux command to move a directory in to another directory?
Just as moving a file, you can move a directory into another directory. The below
mv command
moves the documents directory into the tmp directory
mv documents /tmp/
5. Write a unix/linux command to move all the files in the current directory to
another directory?
You can use the regular expression pattern * to move all the files from one
directory to another
directory.
mv * /var/tmp/
The above command moves all the files and directories in the current directory to
the /var/tmp/
directory.
6. mv *
What happens if you simply type mv * and then press enter?
[Link]
avatar1360444_1.gif
[Link]
[Link]
It depends on the files you have in the directory. The * expands to all the files
and directories. Three
scenarios are possible.
. If the current directory has only files, then the contents of all the files
(except one file) will be
written in to the one file. The one file is the last file which depends on the
pattern *.
. If the current directory contains only directories, then all the directories
(except one directory)
will be moved to another directory.
. If the current directory contains both files and directories, then it depends on
the expansion of
the *. If the pattern * gives the last one as directory then all the files will be
moved to that
directory. Otherwise the mv command will fail.
Some Tips:
. Try to avoid mv *
. Avoid moving large number of files.
Labels: Unix
3 comments:
1.
hi..., this is my first time using linux OS, your article very helpfull for me. and
im always learn from this site
about unix/linux. thank u so much for this tutorial. i give you +1 for this
article.
ReplyDelete
2.
Write a Bash script called mv (which replaces the GNU utility mv) that tries to
rename the specified file
(using the GNU utility mv), but if the destination file exists, instead creates an
index number to append to
the destination file, a sort of version number. For example, if I type:
$mv [Link] [Link]
But [Link] already exists, mv will move the file to [Link].1. Note that if [Link].1
already exists, you must rename
the file to [Link].2, and so on, until you can successfully rename the file to a
name that does not already
exist.
Help me out on this question.
If you have a solution plz reply me back on rastogirohit007@[Link]
ReplyDelete
3.
We will replace the word "tutorial" with "example" in the file using the sed
command.
The sed command replaced the text in the file and displayed the result on the
terminal. However it
did not changed the contents of the file. You can redirect the output of sed
command and save it in a
file as
The -i option comes in handy to edit the original file itself. If you use the -i
option the sed command
replaces the text in the original file itself rather than displaying it on the
terminal.
> ls [Link]*
[Link] file.txt_bkp
See the backup file created with the contents of the original file.
Recommended reading for you
Sed command Tutorial
Informatica 8.x or later versions provides a feature for generating the target
files dynamically. This
feature allows you to
Go to the Target Designer or Warehouse builder and edit the file definition. You
have to click on the
button indicated in red color circle to add the special port.
Now we will see some informatica mapping examples for creating the target file name
dynamically
and load the data.
1. Generate a new file for every session run.
Whenever the session runs you need to create a new file dynamically and load the
source data into
that file. To do this just follow the below steps:
STEP1: Connect the source qualifier to an expression transformation. In the
expression
transformation create an output port (call it as File_Name) and assign the
expression as
'EMP_'||to_char(sessstarttime, 'YYYYMMDDHH24MISS')||'.dat'
STPE2: Now connect the expression transformation to the target and connect eh
File_Name port of
expression transformation to the FileName port of the target file definition.
STEP3: Create a workflow and run the workflow.
Here I have used sessstarttime, as it is constant throughout the session run. If
you have used
sysdate, a new file will be created whenever a new transaction occurs in the
session run.
The target file names created would look like EMP_20120101125040.dat.
2. Create a new file for every session run. The file name should contain suffix as
numbers
(EMP_n.dat)
In the above mapping scenario, the target flat file name contains the suffix as
'[Link]'. Here
we have to create the suffix as a number. So, the file names should looks as
EMP_1.dat, EMP_2.dat
and so on. Follow the below steps:
STPE1: Go the mappings parameters and variables -> Create a new variable, $
$COUNT_VAR and
its data type should be Integer
STPE2: Connect the source Qualifier to the expression transformation. In the
expression
transformation create the following new ports and assign the expressions.
STEP3: Now connect the expression transformation to the target and connect the
o_file_name port
of expression transformation to the FileName port of the target.
3. Create a new file once a day.
You can create a new file only once in a day and can run the session multiple times
in the day to
load the data. You can either overwrite the file or append the new data.
This is similar to the first problem. Just change the expression in expression
transformation to
'EMP_'||to_char(sessstarttime, 'YYYYMMDD')||'.dat'. To avoid overwriting the file,
use Append If
Exists option in the session properties.
4. Create a flat file based on the values in a port.
You can create a new file for each distinct values in a port. As an example
consider the employees
table as the source. I want to create a file for each department id and load the
appropriate data into
the files.
STEP1: Sort the data on department_id. You can either use the source qualifier or
sorter
transformation to sort the data.
STEP2: Connect to the expression transformation. In the expression transformation
create the below
ports and assign expressions.
rmdir docs/
Here the docs directory is not empty, that is why the rmdir command failed to
remove the directory.
To remove the docs directory first we have to make the directory empty and then
delete the
directory.
rm doc/*
rmdir docs/
We will see later how to remove non-empty directories with a single command.
2. Write a unix/linux command to remove the directory and its parent directories?
As mentioned earlier the -p option allows the rmdir command to delete the directory
and also its
parent directories.
rmdir -p docs/entertainment/movies/
This rmdir command removes the docs directory completely. If you don�t use the -p
option, then it
only deletes the movies directory.
3. Write a unix/linux command to remove directories using pattern matching?
You can specify the directory names using the regular expressions and can delete
them.
rm doc*
This rm command deletes the directories like doc, documents, doc_1 etc.
Now we will see the rm command in unix.
Unix rm command syntax
The syntax of rm command is
rm [options] [directory|file]
The rm command can be used to delete both the files and directories. The rm command
also deletes
the non-empty directories.
Unix rm command examples
1. Write a unix/linux command to remove a file?
This is the basic feature of rm command. To remove a file, [Link], in the
current directory use the
below rm command
rm [Link]
rm *
rm docs/
If the directory is non-empty, then the above command fails to remove the
directories.
4. Write a unix/linux command to delete directories recursively (delete non empty
directories)?
As mentioned earlier, the -r option can be used to remove the directories and sub
directories.
rm -r docs
Incremental Aggregation is the process of capturing the changes in the source and
calculating the
aggregations in a session. This process makes the integration service to update the
target
incrementally and avoids the process of calculating the aggregations on the entire
source. Consider
the below sales table as an example and see how the incremental aggregation works.
Source:
YEAR PRICE
----------
2010 100
2010 200
2010 300
2011 500
2011 600
2012 700
For simplicity, I have used only the year and price columns of sales table. We need
to do
aggregation and find the total price in each year.
When you run the session for the first time using the incremental aggregation, then
integration
service process the entire source and stores the data in two file, index and data
file. The integration
service creates the files in the cache directory specified in the aggregator
transformation properties.
After the aggregation, the target table will have the below data.
Target:
YEAR PRICE
----------
2010 600
2011 1100
2012 700
Now assume that the next day few more rows are added into the source table.
Source:
YEAR PRICE
----------
2010 100
2010 200
2010 300
2011 500
2011 600
2012 700
2010 400
2011 100
2012 200
2013 800
Now for the second run, you have to pass only the new data changes to the
incremental
aggregation. So, the source will contain the last four records. The incremental
aggregation uses the
data stored in the cache and calculates the aggregation. Once the aggregation is
done, the
integration service writes the changes to the target and the cache. The target
table will contains the
below data.
[Link]
Target:
YEAR PRICE
----------
2010 1000
2011 1200
2012 900
2013 800
Points to remember
1. When you use incremental aggregation, first time you have to run the session
with complete
source data and in the subsequent runs you have to pass only the changes in the
source data.
2. Use incremental aggregation only if the target is not going to change
significantly. If the
incremental aggregation process changes more than hhalf of the data in target, then
the session
perfromance many not benfit. In this case go for normal aggregation.
Labels: Informatica
3 comments:
1.
Neel08 February, 2012 23:01
[Link]
[Link]
Hi,
Is incremental aggregation so simple? If we implement d idea of incremental load or
CDC, and by default
aggregator has caching property...why do i need to excercise incremental
aggregation as separate option.
What is the advantage of using this over normal map. (using cdc and not using
incremental aggregation
property). Please explain.
ReplyDelete
Replies
1.
Normal aggregator also caches the data. However, this cache will be cleared when
the session
run completes. In case of incremental aggregation the cache will not be cleared and
it is reused in
the next session run.
If you want to use normal aggregation, every time you run the session you have to
pass the
complete source data to calculate the aggregation. In case of incremental
aggregation, as the
processed data is stored in the cache, you just need to pass only the changes in
the source. This
way the data in cache and the changes form the complete source.
Delete
Reply
2.
Cut command in unix (or linux) is used to select sections of text from each line of
files. You can use
the cut command to select fields or columns from a line by specifying a delimiter
or you can select a
portion of text by specifying the range or characters. Basically the cut command
slices a line and
extracts the text.
Unix Cut Command Example
We will see the usage of cut command by considering the below text file as an
example
unix or linux os
is unix good os
is linux good os
The above cut command prints the fourth character in each line of the file. You can
print more than
one character at a time by specifying the character positions in a comma separated
list as shown in
the below example
xo
ui
ln
This command prints the fourth and sixth character in each line.
[Link] a unix/linux cut command to print characters by range?
You can print a range of characters in a line by specifying the start and end
position of the
characters.
x or
unix
linu
The above cut command prints the characters from fourth position to the seventh
position in each
line. To print the first six characters in a line, omit the start position and
specify only the end position.
unix o
is uni
is lin
To print the characters from tenth position to the end, specify only the start
position and omit the end
position.
inux os
ood os
good os
If you omit the start and end positions, then the cut command prints the entire
line.
[Link] a unix/linux cut command to print the fields using the delimiter?
You can use the cut command just as awk command to extract the fields in a file
using a delimiter.
The -d option in cut command can be used to specify the delimiter and -f option is
used to specify
the field position.
or
unix
linux
This command prints the second field in each line by treating the space as
delimiter. You can print
more than one field by specifying the position of the fields in a comma delimited
list.
or linux
unix good
linux good
The above command prints the second and third field in each line.
Note: If the delimiter you specified is not exists in the line, then the cut
command prints the entire
line. To suppress these lines use the -s option in cut command.
4. Write a unix/linux cut command to display range of fields?
You can print a range of fields by specifying the start and end position.
The above command prints the first, second and third fields. To print the first
three fields, you can
ignore the start position and specify only the end position.
To print the fields from second fields to last field, you can omit the last field
position.
5. Write a unix/linux cut command to display the first field from /etc/passwd file?
The /etc/passwd is a delimited file and the delimiter is a colon (:). The cut
command to display the
first field in /etc/passwd file is
cut -d':' -f1 /etc/passwd
[Link]
[Link]
add_int.sh
Using the cut command extract the portion after the dot.
First reverse the text in each line and then apply the command on it.
Delete Empty Lines Using Sed / Grep Command in Unix (or Linux)
In Unix / Linux you can use the Sed / Grep command to remove empty lines from a
file. For
example, Consider the below text file as input
How it works
Now we will see how to remove the lines from the above file in unix / linux
1. Remove lines using unix sed command
The d command in sed can be used to delete the empty lines in a file.
Here the ^ specifies the start of the line and $ specifies the end of the line. You
can redirect the
output of above command and write it into a new file.
Now we will use the -v option to the grep command to reverse the pattern matching
The output of both sed and grep commands after deleting the empty lines from the
file is
How it works
The Change directory (cd) command is one of the simple commands in Unix (or Linux)
and it is very
easy to use. The cd command is used to change from the current directory to another
directory. The
syntax of cd command is
cd [directory]
Here directory is the name of the directory where you wish to go.
CD Command Examples
1. Write a unix/linux cd command to change to home directory?
Just simply type cd command on the unix terminal and then press the enter key. This
will change
your directory to home directory.
> pwd
/usr/local/bin
Now i am in the /usr/local/bin directory. After typing the cd command and unix
window, you will go to
your home directory.
> cd
> pwd
/home/matt
> pwd
/var/tmp
> cd ..
> pwd
/var
3. Write a unix/linux cd command to go back to two directories?
The cd ../../ takes you back to two directories. You can extend this cd command to
go back to n
number of directories.
> pwd
/usr/local/bin
> cd ../../
> pwd
/usr
4. Write a unix/linux cd command to change the directory using the absolute path?
In case of changing directory using absolute path you have to specify the full
directory path.
Absolute path directories always start with a slash (/). An example is changing
your directory to
/usr/bin from your home directory.
> cd /usr/bin
5. Write a unix/linux cd command to change the directory using the relative path?
In relative path, you have to specify the directory path relative to your current
directory. For example,
you are in /var/tmp directory and you want to go to /var/lib directory, then you
can use the relative
path.
> pwd
/var/tmp
> cd ../lib
> pwd
/var/lib
Here the cd ../lib, first takes you to the parent directory which is /var and then
changes the directory
to the lib.
6. Write a unix/linux cd command to change back to previous directory.
As an example, i am in the directory /home/matt/documents and i changed to a new
directory
/home/matt/backup. Now i want to go back to my previous directory
/home/matt/documents. In this
case, you can use the cd - command to go back to the previous directory.
> pwd
/home/matt/documents
> cd /home/matt/backup
>pwd
/home/matt/backup
> cd -
> pwd
/home/matt/documents
1327312578
You will get a different output if you run the above date command.
2. Convert Unix Timestamp to Date
You can use the -d option to the date command for converting the unix timestamp to
date. Here you
have to specify the unix epoch and the timestamp in seconds.
946684800
Copy (cp) File And Directory Examples | Unix and Linux Command
Copy (cp) is the frequently used command in Unix (or Linux). The cp Command is used
to copy the
files from one directory to another directory. The cp command can also be used to
copy the
directories also. The syntax of cp command is
Examples of cp Command
1. Write a unix/linux cp command to copy file in to a directory?
The basic usage of cp command is to copy a file from the current directory to
another directory.
cp [Link] tmp/
The cp command copies the file [Link] into the tmp directory. The cp command does
not remove
the source file. It just copies the file into a new location. If a file with the
same name as the source
exists in the destination location, then by default the cp command overwrites that
new file
2. Write a unix/linux cp to prompt for user before overwriting a file ( Interactive
cp command)?
The -i option to the cp command provides the ability to prompt for a user input
whether to overwrite
the destination file or not.
If you enter y, then the cp command overwrites the destination file, otherwise the
cp command does
not copy the file.
3. Write a unix/linux cp command to copy multiple files in to a new directory?
You can specify multiple files as the source and can copy to the new location.
The cp command copies the [Link], [Link] files in the current directory to the
tmp directory.
4. Write a unix/linux cp command to do a Regular expression copy?
You can copy a set of files by specifying a regular expression pattern.
cp *.dat tmp/
Here the cp command copies all the files which has "dat" as suffix to the
destination directory.
5. Write a unix/linux cp command to copy a file in to the current directory?
You can copy a file from a different directory to the current directory.
cp /usr/local/bin/[Link] .
Here the cp command copies the [Link] file in the /usr/local/bin directory the
current directory.
The dot (.) indicates the current directory.
6. Write a unix/linux cp command to copy all the files in a directory?
The cp command can be used to copy all the files in directory to another directory.
cp docs/* tmp/
This command copies all the files in the docs directory to the tmp directory.
7. Write a unix/linux cp command to copy files from multiple directories?
You can copy the files from different directories into a new location.
The command copies the files from docs and script directories to the destination
directory tmp.
8. Write a unix/linux cp command to Copy a directory.
You can recursively copy a complete directory and its sub directory to another
location using the cp
command
cp -r docs tmp/
This copies the complete directory docs into the new directory tmp
9. Write a unix/linux cp command to Forcibly copy a file with -f option?
You can force the cp command to copy an existing destination file even it cannot be
opened.
cp -f force_file.txt /var/tmp/
ls Command in Unix and Linux Examples
ls is the most widely used command in unix or linux. ls command is used to list the
contents of a
directory. Learn the power of ls command to make your life easy. The syntax of ls
command is
ls [options] [pathnames]
> ls -a
Hidden files are the one whose name starts with dot (.). The las -a displays the
current directory (.)
and parent directory (..) also. If you want to exclude the current directory,
parent directory, then use -
A option.
> ls -A
> ls -F
> ls -1
documents
[Link]
> ls -i1
10584066 documents
3482450 [Link]
> ls -l
total 16
. The first character indicates the type of the file. - for normal file, d for
directory, l for link file and
s for socket file
. The next 9 characters in the first field represent the permissions. Each 3
characters refers the
read (r), write (w), execute (x) permissions on owner, group and others. - means no
permission.
. The second field indicates the number of links to that file.
. The third field indicates the owner name.
. The fourth field indicates the group name.
. The fifth field represents the file size in bytes.
. The sixth field represents the last modification date and time of the file.
. And finally the seventh field is the name of the file.
> ls -t1
[Link]
documents
> ls -rt1
documents
[Link]
> ls -R
.:
documents [Link]
./documents:
[Link]
9. Write a unix/linux ls command to print the files in a specific directory?
You can pass a directory to the ls command as an argument to print for the files in
it.
> ls /usr/local/bin
> ls -x
Tuning an SQL query for performance is a big topic. Here I will just cover how to
re-write a query
and thereby improve the performance. Rewriting an SQL query is one of the ways you
can improve
performance. You can rewrite a query in many different ways.
To explain this, i have used the sales and products table.
PRODUCTS(PRODUCT_ID, PRODUCT_NAME);
SELECT [Link],
T.TOT_SAL,
P.PROD_10_SAL
SELECT YEAR,
SUM(PRICE) TOT_SAL
FROM SALES
GROUP BY YEAR
) T
SELECT YEAR,
SUM(PRICE) PROD_10_SAL
FROM SALES
WHERE PRODUCT_ID = 10
) P
ON ([Link] = [Link]);
Most SQL developers write the above Sql query without even thinking that it can be
solved in a
single query. The above query is rewritten as
SELECT YEAR,
THEN PRICE
ELSE NULL
END ) PROD_10_SAL,
SUM(SALES) TOT_SAL
FROM SALES
GROUP BY YEAR;
Now you can see the difference, just by reading the sales table one time we will
able to solve the
problem.
First take a look at of your query, identify the redundant logic and then tune it.
SELECT P.PRODUCT_ID,
P.PRODUCT_NAME
FROM PRODUCTS P
SALES S
ON (P.PRODUCT_ID = S.PRODUCT_ID)
The same query can be rewritten using NOT EXISTS and NOT IN as
SELECT P.PRODUCT_ID,
P.PRODUCT_NAME
FROM PRODUCTS P
SELECT 1
FROM SALES S
SELECT P.PRODUCT_ID,
P.PRODUCT_NAME
FROM PRODUCTS P
SELECT PRODUCT_ID
FROM SALES
);
Analyze the performance of these three queries and use the appropriate one.
Note: Be careful while using the NOT IN. If the sub query returns at lease row with
NULL data, then
the main query won't return a row at all.
3. INNER JOIN, EXISTS, IN
As similar to LEFT OUTER JOIN, the INNER JOINS can also be implemented with the
EXISTS or IN
operators. As an example, we will find the sales of products whose product id�s
exists in the products
table.
SELECT S.PRODUCT_ID,
SUM(PRICE)
FROM SALES S
JOIN
PRODUCTS P
ON (S.PRODUCT_ID = P.PRODUCT_ID)
GROUP BY S.PRODUCT_ID;
As we are not selecting any columns from the products table, we can rewrite the
same query with
the help of EXISTS or IN operator.
SELECT S.PRODUCT_ID,
SUM(PRICE)
FROM SALES S
WHERE EXISTS
SELECT 1
FROM PRODUCTS P
GROUP BY S.PRODUCT_ID;
SELECT S.PRODUCT_ID,
SUM(PRICE)
FROM SALES S
WHERE PRODUCT_ID IN
SELECT PRODUCT_ID
FROM PRODUCTS P
);
GROUP BY S.PRODUCT_ID;
SELECT S.SALE_ID,
S.PRODUCT_ID,
P.PRODUCT_NAME
FROM SALES S
JOIN
PRODUCTS P
ON (S.PRODUCT_ID = P.PRODUCT_ID)
SELECT S.SALE_ID,
S.PRODUCT_ID,
(SELECT PRODUCT_NAME
FROM PRODUCTS P
FROM SALES S
Analyze these two queries with the explain plan and check which one gives better
performance.
5. Using With Clause or Temporary Tables.
Try to avoid writing complex Sql queries. Split the queries and store the data in
temporary tables or
use the Oracle With Clause for temporary storage. This will improve the
performance. You can also
use the temporary tables or with clause when you want to reuse the same query more
than once.
This saves the time and increases the performance.
Tips for increasing the query performance:
. Create the required indexes. In the mean time avoid creating too many indexes on
a table.
. Rewrite the Sql query.
. Use the explain plan, auto trace to know about the query execution.
. Generate statistics on tables.
. Specify the oracle Hints in the query.
. Ask the DBA to watch the query and gather stats like CPU usage, number of row
read etc.
Please help in improving this article, by commenting on more ways to rewrite a Sql
query.
>cat [Link]
Here the "s" specifies the substitution operation. The "/" are delimiters. The
"unix" is the search
pattern and the "linux" is the replacement string.
By default, the sed command replaces the first occurrence of the pattern in each
line and it won't
replace the second, third...occurrence in the line.
2. Replacing the nth occurrence of a pattern in a line.
Use the /1, /2 etc flags to replace the first, second occurrence of a pattern in a
line. The below
command replaces the second occurrence of the word "unix" with "linux" in a line.
>sed 's/unix/linux/2' [Link]
In this case the url consists the delimiter character which we used. In that case
you have to escape
the slash with backslash character, otherwise the substitution won't work.
Using too many backslashes makes the sed command look awkward. In this case we can
change
the delimiter to another character as shown in the below example.
The parenthesis needs to be escaped with the backslash character. Another example
is if you want
to switch the words "unixlinux" as "linuxunix", the sed command is
If you use -n alone without /p, then the sed does not print anything.
10. Running multiple sed commands.
You can run multiple sed commands by piping the output of one sed command as input
to another
sed command.
Sed provides -e option to run multiple sed commands in a single sed command. The
above output
can be achieved in a single sed command as shown below.
The above sed command replaces the string only on the third line.
12. Replacing string on a range of lines.
You can specify a range of line numbers to the sed command for replacing a string.
Here the sed command replaces the lines with range from 1 to 3. Another example is
Here $ indicates the last line in the file. So the sed command replaces the text
from second line to
last line in the file.
13. Replace on a lines which matches a pattern.
You can specify a pattern to the sed command to match in a line. If the pattern
match occurs, then
only the sed command looks for the string to be replaced and if it finds, then the
sed command
replaces the string.
Here the sed command first looks for the lines which has the pattern "linux" and
then replaces the
word "unix" with "centos".
14. Deleting lines.
You can delete the lines a file by specifying the line number or a range or
numbers.
Here the sed command looks for the pattern "unix" in each line of a file and prints
those lines that
has the pattern.
You can also make the sed command to work as grep -v, just by using the reversing
the sed with
NOT (!).
"Change line"
"Change line"
Here the sed command transforms the alphabets "ul" into their uppercase format "UL"
The general functions work with any data type and are mainly used to handle null
values. The Oracle
general functions are
NVL(expr1, expr2)
The NVL function takes two arguments as its input. If the first argument is NULL,
then it returns the
second argument otherwise it returns the first argument.
NVL2(expr1,expr2,expr3)
The NVL2 function takes three arguments as its input. If the expr1 is NOT NULL,
NVL2 function
returns expr2. If expr1 is NULL, then NVL2 returns expr3.
NULLIF(expr1, expr2)
The NULLIF function compares the two expressions and returns NULL if they are
equal otherwise it
returns the first expression.
COALESCE(expr1,expr2,expr3,...)
The COALESCE function takes N number of arguments as its input and returns the
first NON-NULL
argument.
Conversions functions are used to convert one data type to another type. In some
cases oracle
server automatically converts the data to the required type. This is called
implicit conversion. Explicit
conversions are done by using the conversion functions. You have to take care of
explicit
conversions.
Oracle provides three functions to covert from one data type to another.
1. To_CHAR ( number | date, [fmt], [nlsparams] )
The TO_CHAR function converts the number or date to VARCHAR2 data type in the
specified
format (fmt). The nlsparams parameter is used for number conversions. The nlsparams
specifies the
following number format elements:
. Decimal character
. Group separator
. Local currency symbol
. International currency symbol
If the parameters are omitted, then it uses the default formats specified in the
session.
Converting Dates to Character Type Examples
The Date format models are:
. 9: Specifies numeric position. The number of 9's determine the display width.
. 0: Specifies leading zeros.
. $: Floating dollar sign
. .: Decimal position
. ,: Comma position in the number
. Single Row Subqueries: The subquery returns only one row. Use single row
comparison
operators like =, > etc while doing comparisions.
. Multiple Row Subqueries: The subquery returns more than one row. Use multiple row
1. Write a query to find the salary of employees whose salary is greater than the
salary of employee
whose id is 100?
SELECT EMPLOYEE_ID,
SALARY
FROM EMPLOYEES
SELECT SALARY
FROM EMPLOYEES
2. Write a query to find the employees who all are earning the highest salary?
SELECT EMPLOYEE_ID,
SALARY
FROM EMPLOYEES
WHERE SALARY =
SELECT MAX(SALARY)
FROM EMPLOYEES
3. Write a query to find the departments in which the least salary is greater than
the highest salary in
the department of id 200?
SELECT DEPARTMENT_ID,
MIN(SALARY)
FROM EMPLOYEES
GROUP BY DEPARTMENT_ID
SELECT MAX(SALARY)
FROM EMPLOYEES
1. Write a query to find the employees whose salary is equal to the salary of at
least one employee
in department of id 300?
SELECT EMPLOYEE_ID,
SALARY
FROM EMPLOYEES
WHERE SALARY IN
SELECT SALARY
FROM EMPLOYEES
2. Write a query to find the employees whose salary is greater than at least on
employee in
department of id 500?
SELECT EMPLOYEE_ID,
SALARY
FROM EMPLOYEES
SELECT SALARY
FROM EMPLOYEES
3. Write a query to find the employees whose salary is less than the salary of all
employees in
department of id 100?
SELECT EMPLOYEE_ID,
SALARY
FROM EMPLOYEES
SELECT SALARY
FROM EMPLOYEES
4. Write a query to find the employees whose manager and department should match
with the
employee of id 20 or 30?
SELECT EMPLOYEE_ID,
MANAGER_ID,
DEPARTMENT_ID
FROM EMPLOYEES
WHERE (MANAGER_ID,DEPARTMENT_ID) IN
SELECT MANAGER_ID,
DEPARTMENT_ID
FROM EMPLOYEES
SELECT EMPLOYEE_ID,
DEPARTMENT_ID,
(SELECT DEPARTMENT_NAME
FROM DEPARTMENTS D
FROM EMPLOYEES E
Correlated sub query is used for row by row processing. The sub query is executed
for each row of
the main query.
1. Write a query to find the highest earning employee in each department?
SELECT DEPARTMENT_ID,
EMPLOYEE_ID,
SALARY
WHERE 1 =
2. Write a query to list the department names which have at lease one employee?
SELECT DEPARTMENT_ID,
DEPARTMENT_NAME
FROM DEPARTMENTS D
WHERE EXISTS
SELECT 1
FROM EMPLOYEES E
3. Write a query to find the departments which do not have employees at all?
SELECT DEPARTMENT_ID,
DEPARTMENT_NAME
FROM DEPARTMENTS D
SELECT 1
FROM EMPLOYEES E
Oracle provides single row functions to manipulate the data values. The single row
functions operate
on single rows and return only one result per row. In general, the functions take
one or more inputs
as arguments and return a single value as output. The arguments can be a user-
supplied constant,
variable, column name and an expression.
The features of single row functions are:
. Character Functions: Character functions accept character inputs and can return
either
character or number values as output.
. Number Functions: Number functions accepts numeric inputs and returns only
numeric values
as output.
. Date Functions: Date functions operate on date data type and returns a date value
or numeric
value.
. Conversions Functions: Converts from one data type to another data type.
. General Functions
Let see each function with an example:
1. LOWER
The Lower function converts the character values into lowercase letters.
2. UPPER
The Upper function converts the character values into uppercase letters.
3. INITCAP
The Initcap function coverts the first character of each word into uppercase and
the remaining
characters into lowercase.
4. CONCAT
The Concat function coverts the first string with the second string.
5. SUBSTR
The Substr function returns specified characters from character value starting at
position m and n
characters long. If you omit n, all characters starting from position m to the end
are returned.
You can specify m value as negative. In this case the count starts from the end of
the string.
6. LENGTH
The Length function is used to find the number of characters in a string.
7. INSTR
The Instr function is used to find the position of a string in another string.
Optionally you can provide
position m to start searching for the string and the occurrence n of the string. By
default m and n are
1 which means to start the search at the beginning of the search and the first
occurrence.
8. LPAD
The Lpad function pads the character value right-justified to a total width of n
character positions.
9. RPAD
The Rpad function pads the character value left-justified to a total width of n
character positions.
10. TRIM
The Trim function removes the leading or trailing or both the characters from a
string.
11. REPLACE
The Replace function is used to replace a character with another character in a
string.
1. ROUND
The Round function rounds the value to the n decimal values. If n is not specified,
there won't be any
decimal places. If n is negative, numbers to the left of the decimal point are
rounded.
Syntax: round(number,n)
2. TRUNC
The Trunc function truncates the value to the n decimal places. If n is omitted,
then n defaults to
zero.
Syntax: trunc(number,n)
SELECT trunc(123.67,1) FROM DUAL;
3. MOD
The Mod function returns the remainder of m divided by n.
Syntax: mod(m,n)
1. SYSDATE
The Sysdate function returns the current oracle database server date and time.
3. MONTHS_BETWEEN
The Months_Between function returns the number of months between the two given
dates.
Syntax: months_between(date1,date2)
4. ADD_MONTHS
The Add_Months is used to add or subtract the number of calendar months to the
given date.
Syntax: add_months(date,n)
5. NEXT_DAY
The Next_Day function finds the date of the next specified day of the week. The
syntax is
NEXT_DAY(date,'char')
The char can be a character string or a number representing the day.
6. LAST_DAY
The Last_Day function returns the last day of the month.
7. ROUND
The Round function returns the date rounded to the specified format. The Syntax is
Round(date [,'fmt'])
8. TRUNC
The Trunc function returns the date truncated to the specified format. The Syntax
is
Trunc(date [,'fmt'])
The Oracle Conversion and General Functions are covered in other sections. Go
through the links
Oracle Conversion Functions and Oracle General Functions.
Oracle With Clause is similar to temporary tables, where you store the data once
and read it multiple
times in your sql query. Oracle With Clause is used when a sub-query is executed
multiple times. In
simple With Clause is used to simply the complex SQL. You can improve the
performance of the
query by using with clause.
Syntax of Oracle With Clause
With query_name As
SQL query
At first, the With Clause syntax seems to be confusing as it does not begin with
the SELECT. Think
of the query_name as a temporary table and use it in your queries.
Oracle With Clause Example
We will see how to write a sql query with the help of With Clause. As an example,
we will do a math
operation by dividing the salary of employee with the total number of employees in
each department.
WITH CNT_DPT AS
SELECT DEPARTMENT_ID,
COUNT(1) NUM_EMP
FROM EMPLOYEES
GROUP BY DEPARTMENT_ID
SELECT EMPLOYEE_ID,
SALARY/NUM_EMP
FROM EMPLOYEES E,
CNT_DEPT C
String is one of the widely used java classes. The Java String class is used to
represent the
character strings. All String literals in Java programs are implemented as instance
of the String
Class. Strings are constants and their values cannot be changed after they created.
Java String
objects are immutable and they can be shared. The String Class includes several
methods to edit
the content of string.
Creating a String Object:
A String object can be created as
String Str="Car Insurance";
The string object can also be created using the new operator as
String str= new String("Car Insurance");
Java provides a special character plus (+) to concatenate strings. The plus
operator is the only
operator which is overloaded in java. String concatenation is implemented through
the String Builder
of String Buffer class and their append method.
Examples of Java String Class:
1. Finding the length of the string
The length() method can be used to find the length of the string.
[Link]([Link]());
2. Comparing strings
The equals() method is used to comapre two strings.
if ( [Link]("car loan") )
else
}
3. Comparing strings by ignoring case
The equalsIgnoreCase() method is used to compare two strings by ignoring the case.
if ( [Link]("INSURANCE") )
else
else
{
[Link]("car finance string is alphabetically lesser");
if ( [Link]("CAR finance") = 0)
else
[Link]( [Link]("car") );
[Link]( [Link](5) );
[Link]( [Link](1,3));
[Link]([Link]());
[Link]([Link]("A", "L"));
[Link]([Link]());
[Link]([Link]());
Learning unix operating system is very easy. It is just that you need to understand
the unix server
concepts and familiar with the unix commands. Here I am providing some important
unix commands
which will be used in daily work.
Unix Commands With Examples:
1. Listing files
The first thing after logging into the unix system, everyone does is listing the
files in a directory. The
ls command is used to list the files in a directory.
>ls
[Link]
[Link]
[Link]
If you simply execute ls on the command prompt, then it will display the files and
directories in the
current directory.
>ls /usr/local/bin
You can pass a directory as an argument to ls command. In this case, the ls command
prints all the
files and directories in the specific directory you have passed.
2. Displaying the contents of a file.
The next thing is to display the contents of a file. The cat command is used to
display the contents in
a file.
>cat [Link]
>head -5 [Link]
>tail -3 [Link]
>cd /var/tmp
touch new_file.txt
8. Creating a directory.
Directories are a way of organizing your files. The mkdir command is used to create
the specified
directory.
>mkdir backup
>wc [Link]
21 26 198 [Link]
To know about the unix command, it is always good to see the man pages. To see the
man pages
simply pass the command as an argument to the man.
man ls
>cat [Link]
Learn unix
Learn linux
We want to replace the word "unix" with "fedora". Here the word "unix" is in the second field. So, we need to check for the word "unix" in the second field and replace it with the word "fedora" by assigning the new value to the second field. The awk command to replace the text is
awk '{if($2=="unix") {$2="fedora"} print $0}' [Link]
Learn fedora
Learn linux
2. Now we will see a bit complex [Link] the text file with the below data
>cat [Link]
left
In left
right
In top
top
In top
bottom
In bottom
)
Now replace the string, "top" in right section with the string "right". The output
should look as
left
In left
right
In right
top
In top
bottom
In bottom
Here the delimiter in the text file is brace. We have to specify the delimiters in
awk command with
the record separators. The below awk command can be used to replace the string in a
file
Here RS is the input record separator and ORS is the output record separator.
Date command is used to print the date and time in unix. By default the date
command displays the
date in the time zone that the unix operating system is configured.
Now let see the date command usage in unix
Date Command Examples:
1. Write a unix/linux date command to print the date on the terminal?
>date
This is the default format in which the date command print the date and time. Here
the unix server is
configured in pacific standard time.
2. Write a unix/linux date command to print the date in GMT/UTC time zone?
>date -u
The -u option to the date command tells it to display the time in Greenwich Mean
Time.
3. Write a unix/linux date command to sett the date in unix?
You can change the date and time by using the -s option to the date command.
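The example invocation did not survive here; a typical form (requires superuser privileges, and the date string below is only illustrative) is:
>date -s "01/23/2012 02:10:00"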
4. Write a unix/linux date command to display only the date part and ignore the
time part?
>date '+%m-%d-%Y'
01-23-2012
You can format the output of date command by using the %. Here %m for month, %d for
day and
%Y for year.
5. Write a unix/linux date command to display only the time part and ignore the
date part?
>date '+%H-%M-%S'
01-48-45
Here %H is for hours in 24 hour format, %M is for minutes and %S for seconds
6. Write a unix/linux date command to format both the date and time part.
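The command that produced the output below was lost in formatting; combining the format specifiers from the previous two examples gives the same layout:
>date '+%m-%d-%Y %H-%M-%S'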
01-23-2012 01-49-59
7. Write a unix/linux date command to find the number of seconds from unix epoch.
>date '+%s'
1327312228
Unix epoch is the date on January 1st, 1970. The %s option is used to find the
number of seconds
between the current date and unix epoch.
SELECT DEPARTMENT_ID,
       MAX(SALARY) HIGHEST_SALARY
FROM   EMPLOYEES
GROUP  BY DEPARTMENT_ID;
2. Write a query to get the top 2 employees who are earning the highest salary in
each department?
SELECT DEPARTMENT_ID,
       EMPLOYEE_ID,
       SALARY
FROM
(
SELECT DEPARTMENT_ID,
       EMPLOYEE_ID,
       SALARY,
       DENSE_RANK() OVER (PARTITION BY DEPARTMENT_ID ORDER BY SALARY DESC) R
FROM   EMPLOYEES
) A
WHERE R <= 2;
4. Write a query to find the employees who are earning more than the average salary
in their
department?
SELECT EMPLOYEE_ID,
SALARY
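The rest of this query did not survive the formatting; one way to complete it, using the AVG analytic function, is:
SELECT EMPLOYEE_ID,
       SALARY
FROM
(
  SELECT EMPLOYEE_ID,
         SALARY,
         AVG(SALARY) OVER (PARTITION BY DEPARTMENT_ID) AVG_SAL
  FROM   EMPLOYEES
) A
WHERE SALARY > AVG_SAL;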
1. What is a cursor?
A cursor is a reference to the system memory when an SQL statement is executed. A
cursor
contains the information about the select statement and the rows accessed by it.
. Implicit Cursors: Implicit cursors are created by default when DML statements
like INSERT,
UPDATE and DELETE are executed in PL/SQL objects.
. Explicit Cursors: Explicit cursors must be created by you when executing the
select statements.
. %FOUND : Returns true if a DML or SELECT statement affects at least one row.
. %NOTFOUND: Returns true if a DML or SELECT statement does not affect at least one
row.
. %ROWCOUNT: Returns the number of rows affected by the DML or SELECT statement.
. %ISOPEN: Returns true if a cursor is in open state.
. %BULK_ROWCOUNT: Similar to %ROWCOUNT, except it is used in bulk operations.
BEGIN
  INSERT INTO employees_changes (employee_id,
                                 change_date)
  VALUES (:OLD.employee_id,
          SYSDATE);
END;
PL/SQL objects are precompiled. All the dependencies are checked before the
execution of the
objects. This makes the programs to execute faster.
The dependencies include database objects, Tables, Views, synonyms and other
objects. The
dependency does not depend on the data.
As DML (Data Manipulation Language) statements do not change the dependency, they
can run
directly in PL/SQL objects. On the other hand, DDL (Data Definition Language)
statements like
CREATE, DROP, ALTER commands and DCL (Data Control Language) statements like GRANT,
REVOKE can change the dependencies during the execution of the program.
Example: Let's say you have dropped a table during the execution of a program. Later in the same program, when you try to insert a record into that table, the program will fail.
This is the reason why DDL statements are not allowed directly in PL/SQL programs.
One of the basic feature of any operating system is to search for files. Unix
operating system also
provides this feature for searching the files. The Find Command in Unix is used for
searching files
and directories in Unix, Linux and other Unix like operating systems.
You can specify search criteria for searching files and directories. If you do not
specify any criteria,
the find command searches for the files in the current directory.
Unix Search Command Examples:
1. Searching for the files in the current directory.
The dot(.) represents the current directory and -name option specifies the name of
the file to be
searched. This find command searches for all the files with ".sh" as the suffix.
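The command itself was lost here; based on the description it is presumably:
find . -name "*.sh"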
2. Searching for the file in all the directories.
The / specifies the home directory of the user, which is at the highest level and
the -type option
specifies the type of file. This command searches for the regular file,"[Link]",
in all the directories.
3. Searching for the file in a particular directory.
This find command searches for all the java files in the /usr/local/bin directory.
4. Searching for a directory.
find . -type d -name 'tmp'
The -type d indicates the directory. This find command searches for the tmp
directory in the current
directory.
5. Searching for a directory in another directory
This find command searches for the personal directory in the /var/tmp directory.
Most of you might have used find command to search for files and directories. The
find command
can also be used to delete the files and directories. The find command has -delete
option which can
be used to delete files, directories etc. Be careful while using the -delete option
of the find command
especially when using recursive find. Otherwise you will end up in deleting the
important files.
1. The basic find command to delete a file in the current directory is
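The command did not survive the formatting; a typical form (the file name below is just a placeholder) is:
find . -name "file.txt" -delete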
What is an EquiJoin
The join condition specified determines what type of join it is. When you relate two tables in the join condition by equating the columns with the equal (=) symbol, it is called an Equi-Join. Equi-joins are also called simple joins.
Examples:
1. To get the department name of an employee from departments table, then you need
to compare
the department_id column in the Employees table with the department_id column in
the departments
table. The SQL query for this is
SELECT e.employee_id,
       d.department_name
FROM   employees e,
       departments d
WHERE  e.department_id = d.department_id;
SELECT c.customer_name,
p.product_name
FROM Customers c,
Sales s,
Products p
The awk command is used to parse files which have delimited data. By default, the awk command does case-sensitive parsing. The awk command has an IGNORECASE built-in variable to do case-insensitive parsing. We will see the IGNORECASE variable in detail here.
Consider a sample text file with the below data
>cat [Link]
mark iphone
jhon sony
peter Iphone
chrisy motorola
The below awk command can be used to display the lines which have the word "iphone"
in it.
awk '{if($2 == "iphone") print $0 }' [Link]
This awk command looks for the word "iphone" in the second column of each line and
if it finds a
match, then it displays that line. Here it just matches the word "iphone" and did
not match the word
"Iphone". The awk did a case sensitive match.
The output of the above command is
mark iphone
You can make the awk to do case insensitive and match even for the words like
"Iphone" or
"IPHONE" etc.
The IGNORECASE is a built in variable which can be used in awk command to make it
either case
sensitive or case insensitive.
If the IGNORECASE value is 0, then the awk does a case sensitive match. If the
value is 1, then the
awk does a case insensitive match.
The awk command for case insensitive match is
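The command itself is missing here. A sketch of what it presumably looks like, written with gawk's IGNORECASE variable and a regular-expression match (IGNORECASE is gawk-specific, and the file name is a placeholder), is:
awk 'BEGIN { IGNORECASE = 1 } $2 ~ /^iphone$/ { print $0 }' phones.txt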
The output is
mark iphone
peter Iphone
Awk command parses the files which have delimited structure. The /etc/passwd file
is a delimited
file. Using the Awk command is a good choice to parse the /etc/passwd file.
Sample /etc/passwd file
looks like as below
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
The /etc/passwd file contains the data in the form of row and columns. The columns
are delimited by
a colon (:) character.
Now we will see how to write an awk command which reads the /etc/passwd file and prints the names of the users who have the /bin/bash program as their default shell.
awk -F: '$7 == "/bin/bash" { print $1 }' /etc/passwd
The -F option is used to specify the field delimiter.
The output of the above awk command is
root
Grep is the frequently used command in Unix (or Linux). Most of us use grep just
for finding the
words in a file. The power of grep comes with using its options and regular
expressions. You can
analyze large sets of log files with the help of grep command.
Grep stands for Global search for Regular Expressions and Print.
The basic syntax of grep command is
grep [options] pattern [list of files]
Let see some practical examples on grep command.
1. Running the last executed grep command
This saves a lot of time if you are executing the same command again and again.
!grep
This displays the last executed grep command and also prints the result set of the
command on the
terminal.
2. Search for a string in a file
This is the basic usage of grep command. It searches for the given string in the
specified file.
This searches for the string "Error" in the log file and prints all the lines that
has the word "Error".
3. Searching for a string in multiple files.
This is also the basic usage of the grep command. You can manually specify the list
of files you want
to search or you can specify a file pattern (use regular expressions) to search
for.
4. Case insensitive search
The -i option enables you to search for a string case-insensitively in the given file. It matches words like "UNIX", "Unix" and "unix".
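The command was lost in formatting; a typical form (the file name is a placeholder) is:
grep -i "unix" logfile.txt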
This will search for the lines which starts with a number. Regular expressions is
huge topic and I am
not covering it here. This example is just for providing the usage of regular
expressions.
6. Checking for the whole words in a file.
By default, grep matches the given string/pattern even if it found as a substring
in a file. The -w
option to grep makes it match only the whole words.
This prints the matched lines along with the two lines before each matched line.
8. Displaying the lines after the match.
This will display the matched lines and also five lines before and after the
matched lines.
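The commands for these context searches did not survive the formatting; the standard grep options for them are shown below (the search string and file name are placeholders):
grep -B 2 "Error" logfile.txt    # two lines before each match
grep -A 5 "Error" logfile.txt    # five lines after each match
grep -C 5 "Error" logfile.txt    # five lines before and after each match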
10. Searching for a sting in all files recursively
You can search for a string in all the files under the current directory and sub-
directories with the
help -r option.
grep -r "string" *
15. Display the file names that do not contain the pattern.
We can display the files which do not contain the matched string/pattern.
The WHERE clause in Oracle is used to limit the rows in a table. And the ORDER BY
clause is used
to sort the rows that are retrieved by a SELECT statement.
The syntax is
SELECT columns
FROM table
[WHERE condition]
[ORDER BY columns [ASC | DESC]];
Examples:
Let's use the sales table as an example for all the below oracle problems. The sales table structure is
CREATE TABLE SALES
(
SALE_ID INTEGER,
PRODUCT_ID INTEGER,
YEAR INTEGER,
QUANTITY INTEGER,
PRICE INTEGER
);
1. Comparison Operators.
The comparison operators are used in conditions that compare one expression to
another value or
expression. The
comparison operators supported in oracle are "Equal To (=)", "Greater Than (>)",
"Greater Than Or
Equal To (>=)" , "Less Than (<)", "Less Than Or Equal To (<=)", "Not Equal To (<>
or !=)".
SELECT *
FROM SALES
WHERE YEAR = 2012;
This query will return only the rows which have the year column data as 2012. Similarly, you can use other comparison operators in the where condition.
2. Using the AND logical operator.
You can specify more than one condition in the WHERE clause. The AND operator is
used when
you want all the conditions to satisfy.
SELECT *
FROM SALES
WHERE YEAR=2012
AND PRODUCT_ID=10;
This query will return rows when both the conditions (YEAR=2012, PRODUCT_ID=10) are
true.
3. Using the OR logical operator.
The OR operator is used when you want at least one of the specified conditions to
be true.
SELECT *
FROM SALES
WHERE YEAR=2012
OR PRODUCT_ID=10;
This query will return rows when at least one of the conditions (YEAR=2012,
PRODUCT_ID=10) is
true.
4. Using the IN operator.
The IN operator can be used to test for a value with a list of values.
SELECT *
FROM SALES
WHERE YEAR IN (2010, 2011, 2012);
Here the YEAR column should match with 2010 or 2011 or 2012. This is like specifying multiple OR conditions. This can be rewritten using the OR operator as
SELECT *
FROM SALES
WHERE YEAR = 2010
OR YEAR = 2011
OR YEAR = 2012;
SELECT *
FROM SALES
WHERE YEAR BETWEEN 2010 AND 2020;
This will return the rows whose years fall between 2010 and 2020. This query can be rewritten with the AND operator as
SELECT *
FROM SALES
WHERE YEAR >= 2010
AND YEAR <= 2020;
SELECT *
FROM SALES
The below example selects the data whose year starts with 2 and ends with 9.
SELECT *
FROM SALES
WHERE YEAR LIKE '2%9';
Sometimes, there might be cases where you want to look for the % and _ characters
in the strings.
In such cases you have to escape these wild characters.
SELECT *
FROM CUSTOMERS
SELECT *
FROM SALES
The ASC keyword specifies to sort the data in ascending order. By default the
sorting is in ascending
order. You can omit the ASC keyword if you want the data to be sorted in ascending
order. The
DESC keyword is used to sort the data in descending order.
8. Sort the data by YEAR in ascending order and then PRICE in descending order.
SELECT *
FROM SALES
ORDER BY YEAR ASC, PRICE DESC;
The SELECT statement is used to retrieve the data from the database. The SELECT
statement can
do the following things:
. Projection: You can choose only the required columns in a table that you want to
retrieve.
. Selection: You can restrict the rows returned by a query.
. Joining: You can bring the data from multiple tables by joining them.
Examples:
Let's use the sales table as an example for all the below oracle problems. The sales table structure is
CREATE TABLE SALES
(
SALE_ID INTEGER,
PRODUCT_ID INTEGER,
YEAR INTEGER,
QUANTITY INTEGER,
PRICE INTEGER
);
SELECT *
FROM SALES;
SELECT PRODUCT_ID,
YEAR
FROM SALES;
Here we have selected only two columns from the sales table. This type of selection
is called
projection.
3. Specifying aliases.
[Link] P
FROM SALES S
We can specify aliases to the columns and tables. These aliases come in handy to
specify a short
name.
4. Arithmetic Operations.
We can do arithmetic operations like addition (+), subtraction (-), division (/)
and multiplication (*).
SELECT SALE_ID,
PRICE*100
FROM SALES
Here the price is multiplied with 100. Similarly, other arithmetic operations can
be applied on the
columns.
5. Concatenation operator.
The concatenation operator can be used to concatenate columns or strings. The
concatenation
operator is represented by two vertical bars(||).
YEAR||'01' Y
FROM SALES;
In the above example, you can see how we can concatenate two columns, a column
with a string.
You can also concatenate two strings.
6. Eliminating duplicate rows
The DISTINCT keyword can be used to suppress the duplicate rows.
SELECT DISTINCT YEAR
FROM SALES;
This will give only unique years from the sales table.
1. What is Normalization?
Normalization is the process of organizing the columns, tables of a database to
minimize the
redundancy of data. Normalization involves in dividing large tables into smaller
tables and defining
relationships between them. Normalization is used in OLTP systems.
2. What are different types of Normalization Levels or Normalization Forms?
The different types of Normalization Forms are:
. First Normal Form: Duplicate columns from the same table needs to be eliminated.
We have to
create separate tables for each group of related data and identify each row with a
unique
column or set of columns (Primary Key)
. Second Normal Form: First it should meet the requirement of first normal form.
Removes the
subsets of data that apply to multiple rows of a table and place them in separate
tables.
Relationships must be created between the new tables and their predecessors through
the use
of foreign keys.
. Third Normal Form: First it should meet the requirements of second normal form.
Remove
columns that are not depending upon the primary key.
. Fourth Normal Form: There should not be any multi-valued dependencies.
. MOLAP: The data is stored in multi-dimensional cube. The storage is not in the
relational
database, but in proprietary formats.
. ROLAP: ROLAP relies on manipulating the data stored in the RDBMS for slicing and
dicing
functionality.
. HOLAP: HOLAP combines the advantages of both MOLAP and ROLAP. For summary type
information, HOLAP leverages on cube technology for faster performance. For detail
information, HOLAP can drill through the cube.
SELECT Columns | *
FROM Table_Name
[WHERE Search_Condition]
[GROUP BY Group_By_Expression]
[HAVING Search_Condition]
The below procedure can be used to disable all the triggers in a schema in the oracle database.
CREATE OR REPLACE PROCEDURE disable_triggers  -- procedure name is illustrative
IS
v_statement VARCHAR2(500);
CURSOR trigger_cur
IS
SELECT trigger_name
FROM user_triggers;
BEGIN
FOR i IN trigger_cur
LOOP
  v_statement := 'ALTER TRIGGER ' || i.trigger_name || ' DISABLE';
  EXECUTE IMMEDIATE v_statement;
END LOOP;
END;
SQL Query Interview Questions - Part 5
CREATE TABLE PRODUCTS
(
PRODUCT_ID INTEGER,
PRODUCT_NAME VARCHAR2(30)
);
COMMIT;
PRODUCT_ID PRODUCT_NAME
-----------------------
100 Nokia
200 IPhone
300 Samsung
400 LG
500 BlackBerry
600 Motorola
Solution:
First we will create a target table. The target table will have an additional
column INSERT_DATE to
know when a product is loaded into the target table. The target
table structure is
CREATE TABLE TGT_PRODUCTS
(
PRODUCT_ID INTEGER,
PRODUCT_NAME VARCHAR2(30),
INSERT_DATE DATE
);
The next step is to pick 5 products randomly and then load into target table. While
selecting check
whether the products are there in the
SELECT PRODUCT_ID,
PRODUCT_NAME,
SYSDATE INSERT_DATE
FROM
(
SELECT PRODUCT_ID,
PRODUCT_NAME
FROM PRODUCTS S
SELECT 1
FROM TGT_PRODUCTS T
)A
The last step is to delete the products from the table which are loaded 30 days
back.
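The complete statements did not survive the formatting. A sketch of how these two steps could be written against the tables above (using DBMS_RANDOM for the random pick) is:
INSERT INTO TGT_PRODUCTS
SELECT PRODUCT_ID,
       PRODUCT_NAME,
       SYSDATE INSERT_DATE
FROM
(
  SELECT S.PRODUCT_ID,
         S.PRODUCT_NAME
  FROM   PRODUCTS S
  WHERE  NOT EXISTS (SELECT 1
                     FROM   TGT_PRODUCTS T
                     WHERE  T.PRODUCT_ID = S.PRODUCT_ID)
  ORDER  BY DBMS_RANDOM.VALUE
) A
WHERE ROWNUM <= 5;

DELETE FROM TGT_PRODUCTS
WHERE INSERT_DATE < SYSDATE - 30;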
CREATE TABLE CONTENTS
(
CONTENT_ID INTEGER,
CONTENT_TYPE VARCHAR2(30)
);
COMMIT;
CONTENT_ID CONTENT_TYPE
-----------------------
1 MOVIE
2 MOVIE
3 AUDIO
4 AUDIO
5 MAGAZINE
6 MAGAZINE
. Load only one content type at a time into the target table.
. The target table should always contain only one contain type.
. The loading of content types should follow round-robin style. First MOVIE, second
AUDIO, Third
MAGAZINE and again fourth Movie.
Solution:
First we will create a lookup table where we mention the priorities for the content types. The lookup table "Create Statement" and data is shown below.
CREATE TABLE CONTENTS_LKP
(
CONTENT_TYPE VARCHAR2(30),
PRIORITY INTEGER,
LOAD_FLAG INTEGER
);
COMMIT;
CONTENT_TYPE PRIORITY LOAD_FLAG
---------------------------------
MOVIE 1 1
AUDIO 2 0
MAGAZINE 3 0
The second step is to truncate the target table before loading the data
SELECT CONTENT_ID,
CONTENT_TYPE
FROM CONTENTS
UPDATE CONTENTS_LKP
SET LOAD_FLAG = 0
WHERE LOAD_FLAG = 1;
UPDATE CONTENTS_LKP
SET LOAD_FLAG = 1
WHERE PRIORITY = (
FROM CONTENTS_LKP
);
Knowing the table size and index size of a table is always worth. This can be
helpful when you want
to load the data of a table from one database to another database. You can create
the required
space in the new database just a head.
Estimated Table Size:
The SQL query to know the estimated table size in Oracle is
FROM
SELECT table_name ,
FROM user_tab_columns
WHERE table_name=UPPER('&Enter_Table_Name')
GROUP BY table_name
) A,
FROM
SELECT table_name ,
(sum (column_length) / 1048576) * 1000000 row_size_in_bytes
FROM user_ind_columns
WHERE table_name=UPPER('&Enter_Table_Name')
GROUP BY table_name
) A,
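The two queries above did not survive the formatting intact. A simpler alternative that reports the allocated size of a table or index straight from the data dictionary is:
SELECT segment_name,
       segment_type,
       bytes/1024/1024 size_in_mb
FROM   user_segments
WHERE  segment_name = UPPER('&Enter_Table_Name');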
Analytic functions compute aggregate values based on a group of rows. They differ
from aggregate
functions in that they return multiple rows for each group. Most of the SQL
developers won't use
analytical functions because of its cryptic syntax or uncertainty about its logic
of operation. Analytical
functions saves lot of time in writing queries and gives better performance when
compared to native
SQL.
Before starting with the interview questions, we will see the difference between
the aggregate
functions and analytic functions with an example. I have used SALES TABLE as an
example to
solve the interview questions. Please create the below sales table in your oracle
database.
CREATE TABLE SALES
(
SALE_ID INTEGER,
PRODUCT_ID INTEGER,
YEAR INTEGER,
QUANTITY INTEGER,
PRICE INTEGER
);
INSERT INTO SALES VALUES ( 1, 100, 2008, 10, 5000);
COMMIT;
--------------------------------------
SELECT Year,
COUNT(1) CNT
FROM SALES
GROUP BY YEAR;
YEAR CNT
---------
2009 3
2010 3
2011 3
2008 3
2012 3
SELECT SALE_ID,
       PRODUCT_ID,
       YEAR,
       QUANTITY,
       PRICE,
       COUNT(1) OVER (PARTITION BY YEAR) CNT
FROM SALES;
------------------------------------------
From the outputs, you can observe that the aggregate functions return only one row per group whereas analytic functions keep all the rows in the group. With aggregate functions, the select clause can contain only the columns specified in the group by clause and aggregate functions, whereas with analytic functions you can specify all the columns in the table.
The PARTITION BY clause is similar to the GROUP BY clause; it specifies the window of rows that the analytic function should operate on.
I hope you got some basic idea about aggregate and analytic functions. Now let's start solving the interview questions on Oracle analytic functions.
1. Write a SQL query using the analytic function to find the total sales(QUANTITY)
of each product?
Solution:
SUM analytic function can be used to find the total sales. The SQL query is
SELECT PRODUCT_ID,
       QUANTITY,
       SUM(QUANTITY) OVER (PARTITION BY PRODUCT_ID) TOT_SALES
FROM SALES;
PRODUCT_ID QUANTITY TOT_SALES
-----------------------------
100 12 71
100 10 71
100 25 71
100 16 71
100 8 71
200 15 72
200 10 72
200 20 72
200 14 72
200 13 72
300 20 94
300 18 94
300 17 94
300 20 94
300 19 94
2. Write a SQL query to find the cumulative sum of sales(QUANTITY) of each product?
Here first
sort the QUANTITY in ascendaing order for each product and then accumulate the
QUANTITY.
Cumulative sum of QUANTITY for a product = QUANTITY of current row + sum of
QUANTITIES all
previous rows in that product.
Solution:
We have to use the option "ROWS UNBOUNDED PRECEDING" in the SUM analytic function
to get
the cumulative sum. The SQL query to get the ouput is
SELECT PRODUCT_ID,
       QUANTITY,
       SUM(QUANTITY) OVER (PARTITION BY PRODUCT_ID
                           ORDER BY QUANTITY
                           ROWS UNBOUNDED PRECEDING) CUM_SUM
FROM SALES;
-----------------------------
100 8 8
100 10 18
100 12 30
100 16 46
100 25 71
200 10 10
200 13 23
200 14 37
200 15 52
200 20 72
300 17 17
300 18 35
300 19 54
300 20 74
300 20 94
The ORDER BY clause is used to sort the data. Here the ROWS UNBOUNDED PRECEDING
option
specifies that the SUM analytic function should operate on the current row and the
pervious rows
processed.
3. Write a SQL query to find the sum of sales of current row and previous 2 rows in
a product group?
Sort the data on sales and then find the sum.
Solution:
The sql query for the required ouput is
SELECT PRODUCT_ID,
       QUANTITY,
       SUM(QUANTITY) OVER(
            PARTITION BY PRODUCT_ID
            ORDER BY QUANTITY DESC
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) SUM_QUANT
FROM SALES;
------------------------------
100 25 25
100 16 41
100 12 53
100 10 38
100 8 30
200 20 20
200 15 35
200 14 49
200 13 42
200 10 37
300 20 20
300 20 40
300 19 59
300 18 57
300 17 54
The ROWS BETWEEN clause specifies the range of rows to consider for calculating the
SUM.
4. Write a SQL query to find the Median of sales of a product?
Solution:
The SQL query for calculating the median is
SELECT PRODUCT_ID,
       QUANTITY,
       MEDIAN(QUANTITY) OVER (PARTITION BY PRODUCT_ID) MEDIAN_QUANT
FROM SALES;
--------------------------
100 8 12
100 10 12
100 12 12
100 16 12
100 25 12
200 10 14
200 13 14
200 14 14
200 15 14
200 20 14
300 17 19
300 18 19
300 19 19
300 20 19
300 20 19
5. Write a SQL query to find the minimum sales of a product without using the group
by clause.
Solution:
The SQL query is
SELECT PRODUCT_ID,
YEAR,
QUANTITY
FROM
SELECT PRODUCT_ID,
       YEAR,
       QUANTITY,
       RANK() OVER (PARTITION BY PRODUCT_ID ORDER BY QUANTITY) MIN_SALE_RANK
FROM SALES
) WHERE MIN_SALE_RANK = 1;
------------------------
100 2012 8
200 2010 10
300 2008 17
SELECT P.PRODUCT_NAME,
[Link],
[Link]
FROM PRODUCTS P,
SALES S
(SELECT AVG(QUANTITY)
FROM SALES S1
);
--------------------------
Nokia 2010 25
IPhone 2012 20
Samsung 2012 20
Samsung 2010 20
2. Write a query to compare the products sales of "IPhone" and "Samsung" in each
year? The output
should look like as
---------------------------------------------------
SELECT S_I.YEAR,
S_I.QUANTITY IPHONE_QUANT,
S_S.QUANTITY SAM_QUANT,
S_I.PRICE IPHONE_PRICE,
S_S.PRICE SAM_PRICE
SALES S_I,
PRODUCTS P_S,
SALES S_S
SELECT P.PRODUCT_NAME,
[Link],
RATIO_TO_REPORT([Link]*[Link])
FROM PRODUCTS P,
SALES S
-----------------------------
4. In the SALES table quantity of each product is stored in rows for every year.
Now write a query to
transpose the quantity for each product and display it in columns? The output
should look like as
------------------------------------------
IPhone 10 15 20
Samsung 20 18 20
Nokia 25 16 8
Solution:
Oracle 11g provides a pivot function to transpose the row data into column data.
The SQL query for
this is
SELECT * FROM
SELECT P.PRODUCT_NAME,
[Link],
[Link]
FROM PRODUCTS P,
SALES S
)A
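The PIVOT clause itself was lost in formatting. A sketch of how such a query is typically written for this products/sales pair (the column list assumes the years 2010-2012 shown in the expected output) is:
SELECT * FROM
(
  SELECT P.PRODUCT_NAME,
         S.YEAR,
         S.QUANTITY
  FROM   PRODUCTS P,
         SALES S
  WHERE  P.PRODUCT_ID = S.PRODUCT_ID
) A
PIVOT (SUM(QUANTITY) FOR YEAR IN (2010, 2011, 2012));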
If you are not running oracle 11g database, then use the below query for
transposing the row data
into column data.
SELECT P.PRODUCT_NAME,
FROM PRODUCTS P,
SALES S
SELECT YEAR,
COUNT(1) NUM_PRODUCTS
FROM SALES
GROUP BY YEAR;
YEAR NUM_PRODUCTS
------------------
2010 3
2011 3
2012 3
As a database developer, writing SQL queries, PLSQL code is part of daily life.
Having a good
knowledge on SQL is really important. Here i am posting some practical examples on
SQL queries.
To solve these interview questions on SQL queries you have to create the products,
sales tables in
your oracle database. The "Create Table", "Insert" statements are provided below.
CREATE TABLE PRODUCTS
(
PRODUCT_ID INTEGER,
PRODUCT_NAME VARCHAR2(30)
);
CREATE TABLE SALES
(
SALE_ID INTEGER,
PRODUCT_ID INTEGER,
YEAR INTEGER,
QUANTITY INTEGER,
PRICE INTEGER
);
COMMIT;
PRODUCT_ID PRODUCT_NAME
-----------------------
100 Nokia
200 IPhone
300 Samsung
--------------------------------------
Here Quantity is the number of products sold in each year. Price is the sale price
of each product.
I hope you have created the tables in your oracle database. Now try to solve the
below SQL queries.
1. Write a SQL query to find the products which have continuous increase in sales
every year?
Solution:
Here 'IPhone' is the only product whose sales are increasing every year.
STEP1: First we will get the previous year sales for each product. The SQL query to
do this is
SELECT P.PRODUCT_NAME,
[Link],
[Link],
LEAD([Link],1,0) OVER (
PARTITION BY P.PRODUCT_ID
) QUAN_PREV_YEAR
FROM PRODUCTS P,
SALES S
-----------------------------------------
Nokia 2012 8 16
Nokia 2011 16 25
Nokia 2010 25 0
IPhone 2012 20 15
IPhone 2011 15 10
IPhone 2010 10 0
Samsung 2012 20 18
Samsung 2011 18 20
Samsung 2010 20 0
Here the lead analytic function will get the quantity of a product in its previous
year.
STEP2: We will find the difference between the quantity of a product and its previous year's quantity. If this difference is greater than or equal to zero for all the rows, then the product is constantly increasing in sales. The final query to get the required result is
SELECT PRODUCT_NAME
FROM
SELECT P.PRODUCT_NAME,
[Link] -
LEAD([Link],1,0) OVER (
PARTITION BY P.PRODUCT_ID
) QUAN_DIFF
FROM PRODUCTS P,
SALES S
WHERE P.PRODUCT_ID = S.PRODUCT_ID
)A
GROUP BY PRODUCT_NAME
HAVING MIN(QUAN_DIFF) >= 0;
PRODUCT_NAME
------------
IPhone
2. Write a SQL query to find the products which does not have sales at all?
Solution:
�LG� is the only product which does not have sales at all. This can be achieved in
three ways.
Method1: Using a left outer join.
SELECT P.PRODUCT_NAME
FROM PRODUCTS P
LEFT OUTER JOIN SALES S
ON (P.PRODUCT_ID = S.PRODUCT_ID)
WHERE S.PRODUCT_ID IS NULL;
PRODUCT_NAME
------------
LG
Method2: Using the NOT IN operator.
SELECT P.PRODUCT_NAME
FROM PRODUCTS P
WHERE P.PRODUCT_ID NOT IN (SELECT DISTINCT PRODUCT_ID FROM SALES);
PRODUCT_NAME
------------
LG
Method3: Using the NOT EXISTS operator.
SELECT P.PRODUCT_NAME
FROM PRODUCTS P
WHERE NOT EXISTS (SELECT 1
                  FROM SALES S
                  WHERE S.PRODUCT_ID = P.PRODUCT_ID);
PRODUCT_NAME
------------
LG
3. Write a SQL query to find the products whose sales decreased in 2012 compared to
2011?
Solution:
Here Nokia is the only product whose sales decreased in year 2012 when compared
with the sales
in the year 2011. The SQL query to get the required output is
SELECT P.PRODUCT_NAME
FROM PRODUCTS P,
     SALES S_2012,
     SALES S_2011
WHERE P.PRODUCT_ID = S_2012.PRODUCT_ID
AND   P.PRODUCT_ID = S_2011.PRODUCT_ID
AND   S_2012.YEAR = 2012 AND S_2011.YEAR = 2011
AND   S_2012.QUANTITY < S_2011.QUANTITY;
PRODUCT_NAME
------------
Nokia
SELECT PRODUCT_NAME,
YEAR
FROM
SELECT P.PRODUCT_NAME,
[Link],
RANK() OVER (
PARTITION BY [Link]
) RNK
FROM PRODUCTS P,
SALES S
) A
WHERE RNK = 1;
PRODUCT_NAME YEAR
--------------------
Nokia 2010
Samsung 2011
IPhone 2012
Samsung 2012
SELECT P.PRODUCT_NAME,
       NVL(SUM(S.QUANTITY*S.PRICE), 0) TOTAL_SALES
FROM PRODUCTS P
LEFT OUTER JOIN SALES S
ON (P.PRODUCT_ID = S.PRODUCT_ID)
GROUP BY P.PRODUCT_NAME;
PRODUCT_NAME TOTAL_SALES
---------------------------
LG 0
IPhone 405000
Samsung 406000
Nokia 245000
MANAGER_ID NUMBER(6)
NAME VARCHAR2(10),
LAST_NAME VARCHAR2(10),
SALARY NUMBER(10,2),
MANAGER_ID NUMBER(6),
DEPARTMENT_ID NUMBER(4)
-------------------------------------------------
10 Account 201
20 HR 501
1. Source qualifier transformation can be used to join sources only from the same
database.
2. Connect the source definitions of departments and employees to the same
qualifier
transformation.
3. As there is a primary-key, foreign-key relationship between the source tables,
the source
qualifier transformation by default joins the two sources on the DEPARTMENT_ID
column.
DEPARTMENT_ID
SALARY
EMPLOYEE_ID
NAME
LAST_NAME
MANAGER_ID
The first two ports should be DEPARTMENT_ID, SALARY and the rest of the ports can
be in any
order.
Now go to the properties tab of source qualifier-> Number Of Sorted Ports. Make the
Number Of
Sorted Ports value as 2.
5. Create a mapping to get only distinct departments in employees table?
Solution:
1. The source qualifier transformation should only contain the DEPARTMENT_ID port
from
EMPLOYEES source definition.
2. Now go to the properties tab of source qualifier-> Select Distinct. Check the
check box of Select
Distinct option.
Find is one of the most powerful utilities of Unix (or Linux), used for searching for files in a directory hierarchy. The basic syntax of the find command is
find [pathnames] [conditions]
1. How to run the last executed find command?
!find
This will execute the last find command. It also displays the last find command
executed along with
the result on the terminal.
2. How to find for a file using name?
./bkp/[Link]
./[Link]
This will find all the files with name "[Link]" in the current directory and sub-
directories.
3. How to find for files using name and ignoring case?
./[Link]
./bkp/[Link]
./[Link]
This will find all the files with name "[Link]" while ignoring the case in the
current directory and
sub-directories.
4. How to find for a file in the current directory only?
./[Link]
This will find for the file "[Link]" in the current directory only
5. How to find for files containing a specific word in its name?
./[Link]
./bkp/[Link]
./[Link]
./[Link]
It displayed all the files which have the word "java" in the filename
6. How to find for files in a specific directory?
This will look for the files in the /etc directory with "java" in the filename
7. How to find the files whose name are not "[Link]"?
./[Link]
./bkp
./[Link]
This is like inverting the match. It prints all the files except the given file
"[Link]".
8. How to limit the file searches to specific directories?
./tmp/[Link]
./bkp/var/tmp/files/[Link]
./bkp/var/tmp/[Link]
./bkp/var/[Link]
./bkp/[Link]
./[Link]
You can see here the find command displayed all the files with name "[Link]" in
the current
directory and sub-directories.
a. How to print the files in the current directory and one level down to the
current directory?
./tmp/[Link]
./bkp/[Link]
./[Link]
b. How to print the files in the current directory and two levels down to the
current directory?
./tmp/[Link]
./bkp/var/[Link]
./bkp/[Link]
./[Link]
./tmp/[Link]
./bkp/var/tmp/files/[Link]
./bkp/var/tmp/[Link]
./bkp/var/[Link]
./bkp/[Link]
9. How to find the empty files in a directory?
./empty_file
10. How to find the largest file in the current directory and sub directories
The find command "find . -type f -exec ls -s {} \;" will list all the files along
with the size of the file.
Then the sort command will sort the files based on the size. The head command will
pick only the
first line from the output of sort.
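Putting the pieces described above together, the full pipeline is presumably along these lines (for the smallest file in the next item, drop the -r from sort):
find . -type f -exec ls -s {} \; | sort -n -r | head -1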
11. How to find the smallest file in the current directory and sub directories
a. Finding socket files
find . -type s
b. Finding directories
find . -type d
c. Finding regular files
find . -type f
14. How to find the files which are modified after the modification of a give file.
This will display all the files which are modified after the file "[Link]"
15. Display the files which are accessed after the modification of a give file.
find -anewer "[Link]"
16. Display the files which are changed after the modification of a give file.
This will display the files which have read, write, and execute permissions. To
know the permissions
of files and directories use the command "ls -l".
18. Find the files which are modified within 30 minutes.
find . -mtime -1
20. How to find the files which are modified 30 minutes back
21. How to find the files which are modified 1 day back.
find . -atime -1
find . -ctime -2
26. How to find the files which are created between two files.
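Several of the commands in this list were lost in formatting; typical GNU find forms for the time-based searches described above are:
find . -mmin -30                         # modified within the last 30 minutes
find . -mmin +30                         # modified more than 30 minutes ago
find . -mtime +1                         # modified more than 1 day ago
find . -cnewer file1 ! -cnewer file2     # changed between file1 and file2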
So far we have just find the files and displayed on the terminal. Now we will see
how to perform
some operations on the files.
1. How to find the permissions of the files which contain the name "java"?
Alternate method is
2. Find the files which have the name "java" in it and then display only the files
which have "class"
word in them?
find -name "*java*" -exec grep -H class {} \;
This will delete all the files which have the word �java" in the file name in the
current directory and
sub-directories.
Similarly you can apply other Unix commands on the files found using the find
command. I will add
more examples as and when i found.
. Aggregate Cache: The integration service stores the group values in the index
cache and row
data in the data cache.
. Aggregate Expression: You can enter expressions in the output port or variable
port.
. Group by Port: This tells the integration service how to create groups. You can
configure input,
input/output or variable ports for the group.
. Sorted Input: This option can be used to improve the session performance. You can
use this
option only when the input to the aggregator transformation in sorted on group by
ports.
Property - Description
Cache Directory - The directory where the Integration Service creates the index and data cache files.
Tracing Level - Sets the amount of detail included in the session log.
Sorted Input - Indicates input data is already sorted by groups. Select this option only if the input to the Aggregator transformation is sorted.
Aggregator Data Cache Size - Default cache size is 2,000,000 bytes. Data cache stores row data.
Aggregator Index Cache Size - Default cache size is 1,000,000 bytes. Index cache stores group by ports data.
Transformation Scope - Specifies how the Integration Service applies the transformation logic to incoming data.
Group By Ports:
The integration service performs aggregate calculations and produces one row for
each group. If you
do not specify any group by ports, the integration service returns one row for all
input rows. By
default, the integration service returns the last row received for each group along
with the result of
aggregation. By using the FIRST function, you can specify the integration service
to return the first
row of the group.
Aggregate Expressions:
You can create the aggregate expressions only in the Aggregator transformation. An
aggregate
expression can include conditional clauses and non-aggregate functions. You can use
the following
aggregate functions in the Aggregator transformation,
AVG
COUNT
FIRST
LAST
MAX
MEDIAN
MIN
PERCENTILE
STDDEV
SUM
VARIANCE
Examples: MAX(SUM(sales))
Conditional clauses:
You can reduce the number of rows processed in the aggregation by specifying a
conditional clause.
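The expression referred to below is missing; in the Informatica expression language a conditional aggregate is typically written with the filter condition as an extra argument, for example:
SUM(SALARY, SALARY > 1000)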
This will include only the salaries which are greater than 1000 in the SUM
calculation.
Non-aggregate functions:
You can also use non-aggregate functions in the aggregator transformation.
Note: By default, the Integration Service treats null values as NULL in aggregate
functions. You can
change this by configuring the integration service.
Incremental Aggregation:
After you create a session that includes an Aggregator transformation, you can
enable the session
option, Incremental Aggregation. When the Integration Service performs incremental
aggregation, it
passes source data through the mapping and uses historical cache data to perform
aggregation
calculations incrementally.
Sorted Input:
You can improve the performance of aggregator transformation by specifying the
sorted input. The
Integration Service assumes all the data is sorted by group and it performs
aggregate calculations
as it reads rows for a group. If you specify the sorted input option without
actually sorting the data,
then integration service fails the session.
HashMap is a Hash table based implementation of the Map interface. The Map
interface associates
the values to the unique keys. The HashMap implements all the operations of map and
also permits
null values and the null key. The HashMap is equivalent to Hashtable except that it is unsynchronized and permits nulls. As the HashMap is unsynchronized, it is not thread-safe. The order of the map is not guaranteed to remain constant over time.
The HashMap works on the principle of hashing. It has the put() and get() methods
for storing and
retrieving data from HashMap.
How the HashMap stores keys and values:
When you store an object in a HashMap using the put() method, the HashMap calls the hashCode() method of the key object and, by applying that hashcode to its own hashing function, it identifies a bucket location for storing the value object. The important point to note is that the HashMap stores both the key and the value in the bucket.
HashMap Example:
import [Link].*;
class HashMapDemo {
public static void main(String args[]) {
Iterator i = [Link]();
while( [Link]() ) {
[Link] m = ([Link])[Link]();
John-56
Tim-77
Mary-85
cristy-63
Karren-92
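Most of the example did not survive the formatting, so here is a self-contained sketch that produces the output shown above (the names and scores are taken from that output; note that HashMap iteration order is not guaranteed, so the lines may appear in a different order):
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class HashMapDemo {
    public static void main(String[] args) {
        Map<String, Integer> scores = new HashMap<String, Integer>();
        scores.put("John", 56);
        scores.put("Tim", 77);
        scores.put("Mary", 85);
        scores.put("cristy", 63);
        scores.put("Karren", 92);

        Iterator<Map.Entry<String, Integer>> i = scores.entrySet().iterator();
        while (i.hasNext()) {
            Map.Entry<String, Integer> m = i.next();
            System.out.println(m.getKey() + "-" + m.getValue());
        }
    }
}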
. The Extraction part involves understanding, analyzing and cleaning of the source
data.
. Transformation part involves cleaning of the data more precisely and modifying
the data as per
the business requirements.
. The loading part involves assigning the dimensional keys and loading into the
warehouse.
The awk command can be used to group a set of lines into a paragraph. We will also
use a bash
shell script to group the lines into a paragraph. As an example, consider the file,
[Link], with the
below data
>cat [Link]
A one
B two
C three
D four
E five
F six
G seven
A one
B two
C three
D four
E five
F six
G seven
#!/bin/bash
line_count=1
while read line
do
S=`expr $line_count % 3`
if [ "$S" -eq 0 ]
then
echo -e $line"\n"
else
echo $line
fi
line_count=`expr $line_count + 1`
done < "$1"    # pass the input file as the first argument
Now we will see how to achieve this using the Awk command in Unix. The awk command
for this is
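The command itself did not survive; an equivalent awk one-liner (the file name is a placeholder) is:
awk '{ print } NR % 3 == 0 { print "" }' group.txt
It prints every line and adds an empty line after every third line, which groups the lines into paragraphs of three.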
AT&T
Synaptic Hosting service offers pay-as-you-go access to virtual servers and storage
integrated with
security and networking functions
BlueLock
BlueLock is one of the leading VCE providers in the world. Provides Cloud resources
for VMware.
CSC
CSC launched BizCloud, a unique private cloud service that integrates
Infrastructure as a Service
into legacy IT system and interlinks it with Software as a Service providers.
Enomaly
Enomaly's Elastic Computing Platform (ECP) is software that integrates enterprise
data centers with
commercial cloud computing offerings.
Google
Google usescloud computing in building google apps. Google apps include e-mail,
calendar, word
processing, Web site creation tool and many more.
GoGrid
The GoGrid focuses on Web-based storage. Deploy Windows and Linux virtual servers
onto the
cloud quickly.
IBM
IBM expanding its cloud services business and occupying the market share quickly.
Joyent
creates cloud infrastructure packages.
Microsoft
Microsoft uses cloud computing in Azure. Azure is a Windows as-a-service platform
consisting of the
operating system and developer services.
NetSuite
Netsuite offers cloud computing in e-commerce, CRM, accounting and ERP tools.
Rackspace
The Rackspace provides cloud computing services like Cloud sites for websites,
Cloud Files for
storage, Cloud Servers for virtual servers.
RightScale
The RightScale provides cloud services that helps customers manage the IT
processes.
Salesforce
Salesforce CRM tools include salesforce automation, analytics, marketing and social
networking
tools.
Verizon
Verizon was able to expand its cloud services portfolio into the enterprise market.
We will see how to convert hexadecimal numbers into decimal numbers using Unix
command bash
scripting.
The bc command can be used to convert hexadecimal number into decimal number
Example:
>echo "ibase=16;ABCD"|bc
43981
>echo "obase=16;43981"|bc
ABCD
>cat [Link]
ABCD
125A
F36C
E962
The bash script shown below converts the hexadecimal numbers into decimal numbers
#!/bin/bash
while read hex_num
do
  echo "ibase=16;$hex_num" | bc
done < "$1"    # pass the file of hexadecimal numbers as the first argument
43981
4698
62316
59746
>cat [Link]
43981
4698
62316
59746
Now we will do the reverse process. The bash script for converting the decimal
numbers to
hexadecimal numbers is
#!/bin/bash
while read dec_num
do
  echo "obase=16;$dec_num" | bc
done < "$1"    # pass the file of decimal numbers as the first argument
ABCD
125A
F36C
E962
Transformations in Informatica 9
What is a Transformation
A transformation is a repository object which reads the data, modifies the data and
passes the data.
Transformations in a mapping represent the operations that the integration service
performs on the
data.
Transformations can be classified as active or passive, connected or unconnected.
Active Transformations:
A transformation can be called as an active transformation if it performs any of
the following actions.
. Change the number of rows: For example, the filter transformation is active
because it removes
the rows that do not meet the filter condition. All multi-group transformations are
active because
they might change the number of rows that pass through the transformation.
. Change the transaction boundary: The transaction control transformation is active
because it
defines a commit or roll back transaction.
. Change the row type: Update strategy is active because it flags the rows for
insert, delete,
update or reject.
Note: You cannot connect multiple active transformations or an active and passive
transformation to
the downstream transformation or transformation same input group. This is because
the integration
service may not be able to concatenate the rows generated by active
transformations. This rule is
not applicable for sequence generator transformation.
Passive Transformations:
Transformations which do not change the number of rows passed through them, and which maintain the transaction boundary and the row type, are called passive transformations.
Connected Transformations:
Transformations which are connected to the other transformations in the mapping are
called
connected transformations.
Unconnected Transformations:
An unconnected transformation is not connected to other transformations in the
mapping and is
called within another transformation, and returns a value to that.
The below table lists the transformations available in Informatica version 9:
Transformation - Type - Description
Aggregator - Active/Connected - Performs aggregate calculations.
Application Source Qualifier - Active/Connected
Custom - Active or Passive/Connected - Calls a procedure in a shared library or DLL.
Data Masking - Passive/Connected - Replaces sensitive production data with realistic test data for non-production environments.
Expression - Passive/Connected - Calculates a value.
External Procedure - Passive/Connected or Unconnected - Calls a procedure in a shared library or in the COM layer of Windows.
Filter - Active/Connected - Filters data.
HTTP - Passive/Connected
Input - Passive/Connected
Java - Active or Passive/Connected
Joiner - Active/Connected
Lookup - Active or Passive/Connected or Unconnected
Normalizer - Active/Connected
Output - Passive/Connected
Rank - Active/Connected
Router - Active/Connected
Sequence Generator - Passive/Connected
Sorter - Active/Connected
Source Qualifier - Active/Connected
SQL - Active or Passive/Connected
Stored Procedure - Passive/Connected or Unconnected
Transaction Control - Active/Connected
Union - Active/Connected
Unstructured Data - Active or Passive/Connected - Transforms data in unstructured and semi-structured formats.
Update Strategy - Active/Connected - Determines whether to insert, delete, update, or reject rows.
XML Generator - Active/Connected
XML Parser - Active/Connected
XML Source Qualifier - Active/Connected
. Joins: You can join two or more tables from the same source database. By default
the sources
are joined based on the primary key-foreign key relationships. This can be changed
by explicitly
specifying the join condition in the "user-defined join" property.
. Filter rows: You can filter the rows from the source database. The integration
service adds a
WHERE clause to the default query.
. Sorting input: You can sort the source data by specifying the number for sorted
ports. The
Integration Service adds an ORDER BY clause to the default SQL query
. Distinct rows: You can get distinct rows from the source by choosing the "Select
Distinct"
property. The Integration Service adds a SELECT DISTINCT statement to the default
SQL
query.
. Custom SQL Query: You can write your own SQL query to do calculations.
Now you can see in the below image how the source qualifier transformation is
connected to the
source definition.
Property - Description
SQL Query - The default query the Integration Service uses to read the source; you can override it with your own SQL query.
User-Defined Join - Specifies the join condition used to override the default primary key-foreign key join.
Source Filter - Specifies the filter condition the Integration Service applies when querying rows.
Number of Sorted Ports - The number of ports on which the source data is sorted; the Integration Service adds an ORDER BY clause to the default query.
Tracing Level - Sets the amount of detail included in the session log when you run a session containing this transformation.
Select Distinct - Adds a SELECT DISTINCT statement to the default SQL query.
Pre-SQL - Pre-session SQL commands to run against the source database before the Integration Service reads the source.
Post-SQL - Post-session SQL commands to run against the source database after the Integration Service writes to the target.
Output is Deterministic - Specify only when the source output does not change between session runs.
Output is Repeatable - Specify only when the order of the source output is the same between session runs.
Note: For flat file source definitions, all the properties except the Tracing level
will be disabled.
To Understand the following, Please create the employees and departments tables in
the source
and emp_dept table in the target database.
MANAGER_ID NUMBER(6)
);
NAME VARCHAR2(10),
SALARY NUMBER(10,2),
MANAGER_ID NUMBER(6),
DEPARTMENT_ID NUMBER(4)
);
NAME VARCHAR2(10),
SALARY NUMBER(10,2),
MANAGER_ID NUMBER(6),
DEPARTMENT_ID NUMBER(4),
);
. Go to the Properties tab, select "SQL Query" property. Then open the SQL Editor,
select the
"ODBC data source" and enter the username, password.
. Click Generate SQL.
. Click Cancel to exit.
SELECT employees.employee_id,
[Link],
[Link],
employees.manager_id,
employees.department_id
FROM employees
You can write your own SQL query rather than relaying the default query for
performing calculations.
Note: You can generate the SQL query only if the output ports of source qualifier
transformation is
connected to any other transformation in the mapping. The SQL query generated
contains only the
columns or ports which are connected to the downstream transformations.
Specifying the "Source Filter, Number Of Sorted Ports and Select Distinct"
properties:
Follow the below steps for specifying the filter condition, sorting the source data
and for selecting the
distinct rows.
Now follow the steps for "Generating the SQL query" and generate the SQL query. The
SQL query
generated is
[Link],
[Link],
employees.manager_id,
employees.department_id
FROM employees
WHERE employees.department_id=100
Observe the DISTINCT, WHERE and ORDER BY clauses in the SQL query generated. The
order by
clause contains the first two ports in the source qualifier transformation. If you
want to sort the data
on department_id, salary ports; simply move these ports to top position in the
source qualifier
transformationa and specify the "Number Of Sorted Ports" property as 2
Joins:
The source qualifier transformation can be used to join sources from the same database. By default it joins the sources based on the primary key-foreign key relationships. To join heterogeneous sources, use the Joiner transformation.
A foreign-key is created on the department_id column of the employees table, which
references the
primary-key column, department_id, of the departments table.
Follow the below steps to see the default join
Create only one source qualifier transformation for both the employees and
departments.
Go to the properties tab of the source qualifier transformation, select the "SQL
QUERY" property and
generate the SQL query.
The Generated SQL query is
SELECT employees.employee_id,
[Link],
[Link],
employees.manager_id,
employees.department_id,
departments.department_name
FROM employees,
departments
WHERE departments.department_id=employees.department_id
You can see the employees and departments tables are joined on the department_id
column in the
WHERE clause.
There might be case where there won't be any relationship between the sources. In
that case, we
need to override the default join. To do this we have to specify the join condition
in the "User Defined
Join" Property. Using this property we can specify outer joins also. The join
conditions entered here
are database specific.
As an example, if we want to join the employees and departments table on the
manager_id column,
then in the "User Defined Join" property specify the join condition as
"departments.manager_id=employees.manager_id". Now generate the SQL and observe the
WHERE clause.
Pre and Post SQL:
You can add the Pre-SQL and Post-SQL commands. The integration service runs the
Pre-SQL and
Post-SQL before and after reading the source data respectively.
Q) How to print the lines in a file in reverse order? Which means we have to print
the data of file from
last line to the first line.
We will see different methods to reverse the data in a file. As an example,
consider the file with the
below data.
>cat [Link]
Header
line2
line3
line4
Footer
We need to display the lines in a file in reverse order. The output data is
Footer
line4
line3
line2
Header
1. The tac command in unix can be used to print the file in reverse. The tac command is
tac [Link]
This topic will cover different methods to reverse each character in a string and
reversing the tokens
in a string.
Reversing a string:
1. The sed command can be used to reverse a string. The sed command for this is
In another method, the awk command can be used to reverse the string:
echo "hello world" | awk '{
n=split($0,arr,"");
for(i=1;i<=n;i++)
s=arr[i] s
}
END {
print s
}'
4. In this method, a bash script will be written to reverse the string. The bash
script is
#!/bin/bash
str="hello world"
len=${#str}
len=`expr $len - 1`
rev=""
while [ $len -ge 0 ]
do
rev1=${str:$len:1}
rev=$rev$rev1
len=`expr $len - 1`
done
echo $rev
The output of all the above four methods is the reverse of the string "hello
world", which is
dlrow olleh
Reversing the tokens in a string:
1. The awk command can be used to reverse the tokens (words) in a string:
echo "hello world" | awk '{
n=split($0,A);
S=A[n];
for(i=n-1;i>0;i--)
S=S" "A[i]
}
END
{
print S
}'
2. Using the tac and tr command we can reverse the tokens in a string. The unix
command is
#!/bin/bash
TOKENS="hello world"
for i in $TOKENS
do STR="$i $STR"
done
echo $STR
world hello
>cat [Link]
12345
67890
10100
10000
The required output should not contain the trailing zeros. The output should be
12345
6789
101
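The pipeline being described below was lost in formatting; it presumably has this form (the input file name did not survive):
rev [Link] | awk '{ print $1 + 0 }' | rev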
Here the rev command will reverse the string in each line. Now the trailing zeros
will become leading
zeros. In the awk command the string is converted into a number and the leading
zeros will be
removed. At the end, the rev command again reverses the string.
If you know any other methods to remove the trailing zeros, then please comment
here.
Remove the Lines from a file which are same as the first line - Unix Awk
Q) How to remove the lines which are same as the first line.
Awk command can be used to remove the lines which are same as the first line in a
file. I will also
show you another method of removing the file. As an example, consider the file with
the below data.
Header
line2
line3
Header
line5
line6
line7
Header
The first line contains the text "Header". We need to remove the lines which has
the same text as the
first line.
The required output data is
Header
line2
line3
line5
line6
line7
The awk command can be used to achieve this. The awk command for this is
awk '{
if(NR==1)
{
x=$0;
print $0
}
else if(x!=$0)
print $0
}' [Link]
1 A
2 B
3 C
4 D
5 E
6 F
Let say, we want to insert the new line "9 Z" after every two lines in the input
file. The required output
data after inserting a new line looks as
1 A
2 B
9 Z
3 C
4 D
9 Z
5 E
6 F
9 Z
awk '{
if(NR%2 == 0)
print $0 "\n9 Z"
else
print $0
}' [Link]
AAA 1
BBB 2
CCC 3
AAA 4
AAA 5
BBB 6
CCC 7
AAA 8
BBB 9
AAA 0
Now we want to replace the fourth occurrence of the first filed "AAA" with "ZZZ" in
the file.
The required output is:
AAA 1
BBB 2
CCC 3
AAA 4
AAA 5
BBB 6
CCC 7
ZZZ 8
BBB 9
AAA 0
awk '{
if($1 == "AAA")
{
count++
if(count == 4)
sub("AAA","ZZZ",$1)
}
print $0
}' [Link]
A 10
B 39
C 22
D 44
E 75
F 89
G 67
You have to get the second field and then find the sum the even and odd lines.
The required output is
174, 172
The awk command for producing this output is
awk '{
if(NR%2 == 1)
sum_e = sum_e + $2
else
sum_o = sum_o + $2
}
END {
print sum_e", "sum_o
}' [Link]
4. Print the Fibonacci series using the awk command.
awk 'BEGIN {
for(i=0;i<=10;i++)
{
if (i <= 1)
{
x=0;
y=1;
print i;
}
else
{
z=x+y;
print z;
x=y;
y=z;
}
}
}'
The output is
13
21
34
55
5. Remove leading zeros from a file using the awk command. The input file contains
the below data.
0012345
05678
01010
00001
After removing the leading zeros, the output should contain the below data.
12345
5678
1010
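The command itself did not survive here; adding zero to the field forces awk to treat it as a number, which drops the leading zeros (the input file name is a placeholder):
awk '{ print $1 + 0 }' [Link]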
The string aggregate functions concatenate multiple rows into a single row.
Consider the products
table as an example.
Table Name: Products
Year product
-------------
2010 A
2010 B
2010 C
2010 D
2011 X
2011 Y
2011 Z
Here, in the output we will concatenate the products in each year by a comma
separator. The
desired output is:
year product_list
------------------
2010 A,B,C,D
2011 X,Y,Z
LISTAGG function:
SELECT year,
       LISTAGG(product, ',') WITHIN GROUP (ORDER BY product) AS product_list
FROM products
GROUP BY year;
WM_CONCAT function:
You cannot pass an explicit delimiter to the WM_CONCAT function. It uses comma as
the string
separator.
SELECT year,
wm_concat(product) AS product_list
FROM products
GROUP BY year;
Pivot:
The pviot operator converts row data to column data and also can do aggregates
while converting.
To see how pivot operator works, consider the following "sales" table as any
example
--------------------------------------
1 A 10
1 B 20
2 A 30
2 B 40
2 C 50
3 A 60
3 B 70
3 C 80
The rows of the "sales" table needs to be converted into columns as shown below
-----------------------------------------
1 10 20
2 30 40 50
3 60 70 80
SELECT *
Pivot can be used to generate the data in xml format. The query for generating the
data into xml
fomat is shown below.
SELECT *
If you are not using an oracle 11g database, then you can implement the pivot feature yourself by converting the rows to columns.
Unpivot:
-----------------------------------------
1 10 20
2 30 40 50
3 60 70 80
Table Name: sales
---------------------------
1 A 10
1 B 20
2 A 30
2 B 40
2 C 50
3 A 60
3 B 70
3 C 80
SELECT *
FROM sales_rev
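The UNPIVOT clause was lost in formatting; based on the notes below, the query presumably looks something like this (A, B and C are assumed to be the pivoted columns of sales_rev):
SELECT *
FROM sales_rev
UNPIVOT (price FOR product IN (A, B, C));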
. The columns price and product in the unpivot clause are required and these names
need not to
be present in the table.
. The unpivoted columns must be specified in the IN clause
. By default the query excludes null values.
Top Examples of Awk Command in Unix
Awk is one of the most powerful tools in Unix used for processing the rows and
columns in a file.
Awk has built in string functions and associative arrays. Awk supports most of the
operators,
conditional blocks, and loops available in C language.
One of the good things is that you can convert Awk scripts into Perl scripts using
a2p utility.
The basic syntax of AWK:
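The syntax block itself is missing here; the general form is:
awk 'BEGIN { actions } { actions } END { actions }' input_file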
Here the actions in the begin block are performed before processing the file and
the actions in the
end block are performed after processing the file. The rest of the actions are
performed while
processing the file.
Examples:
Create a file input_file with the following data. This file can be easily created
using the output of ls -l.
From the data, you can observe that this file has rows and columns. The rows are
separated by a
new line character and the columns are separated by a space characters. We will use
this file as the
input for the examples discussed here.
1. awk '{print $1}' input_file
Here $1 has a meaning. $1, $2, $3... represents the first, second, third columns...
in a row
respectively. This awk command will print the first column in each row as shown
below.
-rw-r--r--
-rw-r--r--
-rw-r--r--
-rw-r--r--
-rw-r--r--
-rw-r--r--
To print the 4th and 5th columns in a file use awk '{print $4,$5}' input_file
Here the BEGIN and END blocks are not used, so the print action is executed for every row read from the file. The next example shows how to use the BEGIN and END blocks.
2. awk 'BEGIN {sum=0} {sum=sum+$5} END {print sum}' input_file
This prints the sum of the values in the 5th column. In the BEGIN block the variable sum is initialized to 0. In the main block the value of the 5th column is added to sum; this repeats for every row processed. When all the rows have been processed, sum holds the total of the 5th column, and this value is printed in the END block.
3. In this example we will see how to execute the awk script written in a file.
Create a file
sum_column and paste the below script in that file
#!/usr/bin/awk -f
BEGIN {sum=0}
{sum=sum+$5}
END {print sum}
Make the file executable with chmod +x sum_column and run it as ./sum_column input_file, or run it directly with awk -f sum_column input_file.
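The next example prints the squares of the numbers from 1 to 10. A one-liner that produces the output shown below (the same command appears again in the awk interview questions later in this document) is:
awk 'BEGIN { for(i=1;i<=10;i++) {print "square of",i,"is",i*i;}}'
The output is: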
square of 1 is 1
square of 2 is 4
square of 3 is 9
square of 4 is 16
square of 5 is 25
Notice that the syntax of 'if' and 'for' is similar to the C language.
Awk Built in Variables:
You have already seen $0, $1, $2... which prints the entire line, first column,
second column...
respectively. Now we will see other built in variables with examples.
FS - Input field separator variable:
So far, we have seen fields separated by a space character. By default, Awk assumes that the fields in a file are separated by space characters. If the fields in the file are separated by any other character, we can use the FS variable to tell awk about the delimiter.
6. awk 'BEGIN {FS=":"} {print $2}' input_file
OR
awk -F: '{print $2}' input_file
This will print the result as
39 p1
15 t1
38 t2
38 t3
39 t4
39 t5
center 0
center 17
center 26
center 25
center 43
center 48
center:0
center:17
center:26
center:25
center:43
center:48
Note: print $4,$5 and print $4$5 will not work the same way. The first one displays
the output with
space as delimiter. The second one displays the output without any delimiter.
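A quick way to see the difference (a throwaway example, not tied to input_file):
echo "a b c d e" | awk '{print $4,$5; print $4$5}'
The first print gives "d e" and the second gives "de".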
NF - Number of fields variable:
The NF variable can be used to know the number of fields in a line.
8. awk '{print NF}' input_file
This will display the number of columns in each row.
NR - number of records variable:
The NR can be used to know the line number or count of lines in a file.
9. awk '{print NR}' input_file
This will display the line numbers from 1.
10. awk 'END {print NR}' input_file
This will display the total number of lines in the file.
String functions in Awk:
Some of the string functions in awk are listed below; a short illustration follows the list.
index(string,search)
length(string)
split(string,array,separator)
substr(string,position)
substr(string,position,max)
tolower(string)
toupper(string)
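As a small illustration of a few of these functions (a standalone sketch, not tied to input_file):
awk 'BEGIN { s="unix"; print length(s), toupper(s), substr(s,1,2), index(s,"i") }'
This prints 4 UNIX un 3: the length of the string, the string in upper case, its first two characters, and the position of the letter "i".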
Advanced Examples:
1. Filtering lines using Awk split function
The awk split function splits a string into an array using the delimiter.
The syntax of split function is
split(string, array, delimiter)
Now we will see how to filter the lines using the split function with an example.
The input "[Link]" contains the data in the following format
1 U,N,UNIX,000
2 N,P,SHELL,111
3 I,M,UNIX,222
4 X,Y,BASH,333
5 P,R,SCRIPT,444
Required output: Now we have to print only the lines whose 2nd field, when split on the comma delimiter, has the string "UNIX" as its 3rd sub-field.
The output is:
1 U,N,UNIX,000
3 I,M,UNIX,222
awk '{
split($2,arr,",");
if(arr[3] == "UNIX")
print $0
} ' [Link]
To print the name of the script that is currently being executed, use the basename utility on $0:
#!/usr/bin/sh
filename=`basename $0`
echo $filename
Alias command is an alternative name used for long strings that are frequently
used. It is mostly
used for creating a simple name for a long command.
Syntax of alias command:
alias [alias_name=['command']]
For more information on alias utility see the man pages. Type 'man alias' on the
command prompt.
Examples:
1. alias
If you simply type alias on the command prompt and then enter, it will list all the
aliases that were
created.
2. alias pg='ps -aef'
The ps -aef command will list all the running processes. After creating the alias
pg for ps -aef, then
by using the pg on command prompt will display the running processes. The pg will
work same as
the ps -aef.
An alias created on the command prompt is present only for that session. Once you exit from the session, the alias no longer takes effect. To make an alias permanent, place the alias command in the ".profile" of the user. Open the user's ".profile", place the command alias pg="ps -aef", save the file and then source the ".profile" file. Now the alias pg will remain permanently.
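For example (assuming a Bourne/Korn-style shell that reads ~/.profile):
echo "alias pg='ps -aef'" >> ~/.profile
. ~/.profile
After this, the alias pg is available in new sessions as well.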
To remove an alias use the unalias command
Example: unalias pg
Unix provides the a2p (awk to perl) utility for converting the awk script to perl
script. The a2p
command takes an awk script and produces a comparable perl script.
Syntax of a2p:
a2p [options] [awk_script_filename]
Some of the useful options that you can pass to a2p are:
-D<number> Sets debugging flags.
-F<character> This will tell a2p that awk script is always invoked with -F option.
-<number> This makes a2p to assume that input will always have the specified number
of fields.
For more options see the man pages; man a2p
Example1:
The awk script which prints the squares of numbers up to 10 is shown below. Call
the below script
as awk_squares.
#!/bin/awk -f
BEGIN {
for (i=1; i<=10; i++)
print "square of", i, "is", i*i;
exit;
}
Run this script using awk command; awk -f awk_squares. This will produce squares of
numbers up
to 10.
Now we will convert this script using the a2p as
a2p awk_squares > perl_squares
The content of converted perl script, perl_squares, is shown below:
#!/usr/bin/perl
if $running_under_some_shell;
last line;
Run the perl script as: perl perl_squares. This will produce the same result as the
awk.
Example2:
We will see an awk script which prints the first field from a file. The awk script for this is shown below. Call this script awk_first_field.
#!/bin/awk -f
{ print $1; }
Run this script using the awk command by passing a file as input: awk -f awk_first_field file_name. This will print the first field of each line from file_name.
We will convert this awk script into a perl script using the a2p command as
a2p awk_first_field > perl_first_field
The content of converted perl script, perl_first_field, is shown below:
#!/usr/bin/perl
if $running_under_some_shell;
while (<>) {
print $Fld1;
}
Now run the perl script as: perl perl_first_field file_name. This will produce the
same result as awk
command.
1. In this problem we will see how to implement the not equal operator, greater
than, greater than or
equal to, less than and less than or equal to operators when joining two tables in
informatica.
Consider the below sales table as an example?
Table name: Sales
Now the problem is to identify the products whose sales in the current year (in this example, 2011) are less than the sales in the previous year.
Here in this example, Product A sold less in 2011 when compared with the sales in
2010.
This problem can be easily implemented with the help of SQL query as shown below
SELECT cy.*
FROM SALES cy,
SALES py
WHERE [Link] = [Link]
AND [Link]=2011
AND [Link]=2010
AND cy.prod_quantity < py.prod_quantity;
In informatica, you can specify only equal to condition in joiner. Now we will see
how to implement
this problem using informatica.
Solution:
STEP1: Connect two source qualifier transformations to the source definition. Call
the first source
qualifier transformation as sq_cy (cy means current year) and the other as sq_py
(py means
previous year).
STEP2: In the sq_cy source qualifier transformation, specify the source filter as year=2011. In the sq_py, specify the source filter as year=2010.
STEP3: Now connect these two source qualifier transformations to joiner
transformation and make
sq_cy as master, sq_py as detail. In the join condition, select the product port
from master and
detail.
STEP4: Now connect all the master ports and only the prod_quantity port from detail
to the filter
transformation. In the filter transformation specify the filter condition as
prod_quantity <
prod_quantity1. Here the prod_quantity port is from the master and prod_quantity1 is from the detail.
STEP5: Connect all the ports except prod_quantity1 of the filter transformation to the target definition.
2. How to implement the not exists operator in informatica which is available in
database?
Solution:
Implementing the Not Exists operator is very easy in informatica. For example, we
want to get only
the records which are available in table A and not in table B. For this use a
joiner transformation with
A as master and B as detail. Specify the join condition and in the join type,
select detail outer join.
This will get all the records from A table and only the matching records from B
table.
Connect the joiner to a filter transformation and specify the filter condition as
B_port is NULL. This
will give the records which are in A and not in B. Then connect the filter to the
target definition.
1. Write a command to print the lines that has the the pattern "july" in all the
files in a particular
directory?
grep july *
This will print all the lines in all files that contain the word "july" along with
the file name. If any of the
files contain words like "JULY" or "July", the above command would not print those
lines.
2. Write a command to print the lines that has the word "july" in all the files in
a directory and also
suppress the filename in the output.
grep -h july *
3. Write a command to print the lines that has the word "july" while ignoring the
case.
grep -i july *
The -i option makes the grep command treat the pattern as case insensitive.
4. When you use a single file as input to the grep command to search for a
pattern, it won't print the
filename in the output. Now write a grep command to print the filename in the
output without using
the '-H' option.
grep pattern filename /dev/null
The /dev/null or null device is a special file that discards any data written to it. So, /dev/null is always an empty file.
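You can verify this with a quick check:
wc -c /dev/null
0 /dev/null
No matter how much data is redirected into /dev/null, reading it back always gives zero bytes.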
Another way to print the filename is using the '-H' option. The grep command for
this is
grep -H pattern filename
5. Write a command to print the file names in a directory that does not contain the
word "july"?
grep -L july *
The '-L' option makes the grep command to print the filenames that do not contain
the specified
pattern.
6. Write a command to print the line numbers along with the line that has the word
"july"?
grep -n july filename
The '-n' option is used to print the line numbers in a file. The line numbers start
from 1
7. Write a command to print the lines that starts with the word "start"?
grep '^start' filename
The '^' symbol specifies the grep command to search for the pattern at the start of
the line.
8. In the text file, some lines are delimited by colon and some are delimited by
space. Write a
command to print the third field of each line.
awk '{ if( $0 ~ /:/ ) { FS=":"; } else { FS =" "; } print $3 }' filename
9. Write a command to print the line number before each line?
awk '{print NR, $0}' filename
10. Write a command to print the second and third line of a file without using NR.
awk 'BEGIN {RS="";FS="\n"} {print $2,$3}' filename
11. How to create an alias for the complex command and remove the alias?
The alias utility is used to create the alias for a command. The below command
creates alias for ps -
aef command.
alias pg='ps -aef'
If you use pg, it will work the same way as ps -aef.
To remove the alias simply use the unalias command as
unalias pg
12. Write a command to display todays date in the format of 'yyyy-mm-dd'?
The date command can be used to display todays date with time
date '+%Y-%m-%d'
Top Unix Interview Questions - Part 7
1. Write a command to remove the prefix of the string ending with '/'.
The basename utility deletes any prefix ending in /. The usage is mentioned below:
basename /usr/local/bin/file
This will display only file
2. How to display zero byte size files?
ls -l | grep '^-' | awk '{if ($5 == 0) print $9 }'
3. How to replace the second occurrence of the word "bat" with "ball" in a file?
sed 's/bat/ball/2' < filename
4. How to remove all the occurrences of the word "jhon" except the first one in a
line with in the
entire file?
sed 's/jhon//2g' < filename
5. How to replace the word "lite" with "light" from 100th line to last line in a
file?
sed '100,$ s/lite/light/' < filename
6. How to list the files that are accessed 5 days ago in the current directory?
find -atime 5 -type f
7. How to list the files that were modified 5 days ago in the current directory?
find -mtime 5 -type f
8. How to list the files whose status is changed 5 days ago in the current
directory?
find -ctime 5 -type f
9. How to replace the character '/' with ',' in a file?
sed 's/\//,/' < filename
sed 's|/|,|' < filename
10. Write a command to find the number of files in a directory.
ls -l|grep '^-'|wc -l
1. How to display the processes that were run by your user name ?
ps -aef | grep <user_name>
2. Write a command to display all the files recursively with path under current
directory?
find . -depth -print
3. Display zero byte size files in the current directory?
find -size 0 -type f
4. Write a command to display the third and fifth character from each line of a
file?
cut -c 3,5 filename
5. Write a command to print the fields from 10th to the end of the line. The fields
in the line are
delimited by a comma?
cut -d',' -f10- filename
6. How to replace the word "Gun" with "Pen" in the first 100 lines of a file?
sed '1,100 s/Gun/Pen/' < filename
7. Write a Unix command to display the lines in a file that do not contain the word
"RAM"?
grep -v RAM filename
The '-v' option tells the grep to print the lines that do not contain the specified
pattern.
8. How to print the squares of numbers from 1 to 10 using awk command
awk 'BEGIN { for(i=1;i<=10;i++) {print "square of",i,"is",i*i;}}'
9. Write a command to display the files in the directory by file size?
ls -l | grep '^-' |sort -nr -k 5
10. How to find out the usage of the CPU by the processes?
The top utility can be used to display the CPU usage by the processes.
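On Linux systems with the procps version of top, a one-shot snapshot can be taken in batch mode (a sketch; option names vary between Unix flavours):
top -b -n 1 | head -15
Alternatively, ps can report per-process CPU usage directly, for example ps -eo pid,pcpu,comm | sort -k2 -nr | head.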
2. Write a command to search for the file 'map' in the current directory?
find -name map -type f
4. Write a command to remove the first number on all lines that start with "@"?
sed '\,^@, s/[0-9][0-9]*//' < filename
5. How to print the file names in a directory that has the word "term"?
grep -l term *
The '-l' option makes the grep command print only the filename, without printing the content of the file. As soon as the grep command finds the pattern in a file, it prints the filename and stops searching the other lines in that file.
Q1. The source data contains only column 'id'. It will have sequence numbers from 1
to 1000. The
source data looks like as
Id
1
2
3
4
5
6
7
8
....
1000
Create a workflow to load only the Fibonacci numbers in the target table. The
target table data
should look like as
Id
1
2
3
5
8
13
.....
In Fibonacci series each subsequent number is the sum of previous two numbers. Here
assume that
the first two numbers of the fibonacci series are 1 and 2.
Solution:
STEP1: Drag the source to the mapping designer and then in the Source Qualifier
Transformation
properties, set the number of sorted ports to one. This will sort the source data
in ascending order.
So that we will get the numbers in sequence as 1, 2, 3, ....1000
STEP2: Connect the Source Qualifier Transformation to the Expression
Transformation. In the
Expression Transformation, create three variable ports and one output port. Assign
the expressions
to the ports as shown below.
Ports in Expression Transformation:
id
v_sum = v_prev_val1 + v_prev_val2
v_prev_val1 = IIF(id=1 or id=2,1, IIF(v_sum = id, v_prev_val2, v_prev_val1) )
v_prev_val2 = IIF(id=1 or id =2, 2, IIF(v_sum=id, v_sum, v_prev_val2) )
o_flag = IIF(id=1 or id=2,1, IIF( v_sum=id,1,0) )
STEP3: Now connect the Expression Transformation to the Filter Transformation and
specify the
Filter Condition as o_flag=1
STEP4: Connect the Filter Transformation to the Target Table.
Q2. The source table contains two columns "id" and "val". The source data looks
like as below
id val
1 a,b,c
2 pq,m,n
3 asz,ro,liqt
Here the "val" column contains comma delimited data and has three fields in that
column.
Create a workflow to split the fields in the "val" column to separate rows. The output should look like as below.
id val
1 a
1 b
1 c
2 pq
2 m
2 n
3 asz
3 ro
3 liqt
Solution:
STEP1: Connect three Source Qualifier transformations to the Source Definition
STEP2: Now connect all the three Source Qualifier transformations to the Union
Transformation.
Then connect the Union Transformation to the Sorter Transformation. In the sorter
transformation
sort the data based on Id port in ascending order.
STEP3: Pass the output of Sorter Transformation to the Expression Transformation.
The ports in
Expression Transformation are:
id (input/output port)
val (input port)
v_current_id (variable port) = id
v_count (variable port) = IIF(v_current_id!=v_previous_id,1,v_count+1)
v_previous_id (variable port) = id
o_val (output port) = DECODE(v_count, 1,
SUBSTR(val, 1, INSTR(val,',',1,1)-1 ),
2,
SUBSTR(val, INSTR(val,',',1,1)+1, INSTR(val,',',1,2)-INSTR(val,',',1,1)-1),
3,
SUBSTR(val, INSTR(val,',',1,2)+1),
NULL
)
STEP4: Now pass the output of Expression Transformation to the Target definition.
Connect id,
o_val ports of Expression Transformation to the id, val ports of Target Definition.
For those who are interested to solve this problem in oracle sql, Click Here. The
oracle sql query
provides a dynamic solution where the "val" column can have varying number of
fields in each row.
Unix Interview Questions on FIND Command
Find utility is used for searching files using the directory information.
1. Write a command to search for the file 'test' in the current directory?
find -name test -type f
2. Write a command to search for the file 'temp' in '/usr' directory?
find /usr -name temp -type f
3. Write a command to search for zero byte size files in the current directory?
find -size 0 -type f
4. Write a command to list the files that are accessed 5 days ago in the current
directory?
find -atime 5 -type f
5. Write a command to list the files that were modified 5 days ago in the current
directory?
find -mtime 5 -type f
6. Write a command to search for the files in the current directory which are not
owned by any user
in the /etc/passwd file?
find . -nouser -type f
7. Write a command to search for the files in '/usr' directory that start with
'te'?
find /usr -name 'te*' -type f
8. Write a command to search for the files that start with 'te' in the current
directory and then display
the contents of the file?
find . -name 'te*' -type f -exec cat {} \;
9. Write a command to list the files whose status is changed 5 days ago in the
current directory?
find -ctime 5 -type f
10. Write a command to list the files in '/usr' directory that start with 'ch' and
then display the number
of lines in each file?
find /usr -name 'ch*' -type f -exec wc -l {} \;
. Delimited selection: The fields in the line are delimited by a single character
like blank,comma
etc.
. Range selection: Each field starts with certain fixed offset defined as range.
1. Write a command to display the third and fourth character from each line of a
file?
3. Write a command to display the first 10 characters from each line of a file?
4. Write a command to display from the 10th character to the end of the line?
5. The fields in each line are delimited by comma. Write a command to display third
field from each
line of a file?
6. Write a command to print the fields from 10 to 20 from each line of a file?
8. Write a command to print the fields from 10th to the end of the line?
9. By default the cut command displays the entire line if there is no delimiter in it. Which cut option is used to suppress these kinds of lines?
The -s option is used to suppress the lines that do not contain the delimiter.
10. Write a cut command to extract the username from the 'who am i' command?
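Minimal sketches of answers to the above questions (assuming 'filename' as a placeholder and a comma delimiter where one is implied):
1. cut -c 3,4 filename
3. cut -c 1-10 filename
4. cut -c 10- filename
5. cut -d',' -f3 filename
6. cut -d',' -f10-20 filename
8. cut -d',' -f10- filename
10. who am i | cut -d' ' -f1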
Type Conversion Function: This function is used to convert from one data type to
another. The
only type conversion function is CAST. The syntax of CAST is
CAST(expr AS <type>)
The CAST function converts the expr into the specified type.
Table Generating Functions: These functions transform a single row into multiple rows. EXPLODE is the only table generating function. This function takes an array as input and outputs the elements of the array as separate rows. The syntax of EXPLODE is
EXPLODE( ARRAY<A> )
When you use the table generating functions in the SELECT clause, you cannot
specify any other
columns in the SELECT clause.
Hive supports three types of conditional functions. These functions are listed
below:
IF( Test Condition, True Value, False Value )
The IF condition evaluates the 'Test Condition' and if the 'Test Condition' is true, then it returns the 'True Value'. Otherwise, it returns the 'False Value'.
Example: IF(1=1, 'working', 'not working') returns 'working'
COALESCE( value1,value2,... )
The COALESCE function returns the first not NULL value from the list of values. If all the values in the list are NULL, then it returns NULL.
Example: COALESCE(NULL,NULL,5,NULL,4) returns 5
CASE Statement
The syntax for the case statement is:
CASE [ expression ]
WHEN condition1 THEN result1
WHEN condition2 THEN result2
...
WHEN conditionn THEN resultn
ELSE result
END
Here expression is optional. It is the value that you are comparing to the list of
conditions. (ie:
condition1, condition2, ... conditionn).
All the conditions must be of same datatype. Conditions are evaluated in the order
listed. Once a
condition is found to be true, the case statement will return the result and not
evaluate the conditions
any further.
All the results must be of same datatype. This is the value returned once a
condition is found to be
true.
IF no condition is found to be true, then the case statement will return the value
in the ELSE clause.
If the ELSE clause is omitted and no condition is found to be true, then the case
statement will return
NULL
Example:
CASE Fruit
WHEN 'APPLE' THEN 'The owner is APPLE'
WHEN 'ORANGE' THEN 'The owner is ORANGE'
ELSE 'It is another Fruit'
END
Date data types do not exist in Hive. In fact the dates are treated as strings in
Hive. The date
functions are listed below.
UNIX_TIMESTAMP()
This function returns the number of seconds from the Unix epoch (1970-01-01
[Link] UTC) using
the default time zone.
UNIX_TIMESTAMP( string date )
This function converts the date in format 'yyyy-MM-dd HH:mm:ss' into Unix
timestamp. This will
return the number of seconds between the specified date and the Unix epoch. If it
fails, then it
returns 0.
Example: UNIX_TIMESTAMP('2000-01-01 [Link]') returns 946713600
UNIX_TIMESTAMP( string date, string pattern )
This function converts the date to the specified date format and returns the number
of seconds
between the specified date and Unix epoch. If it fails, then it returns 0.
Example: UNIX_TIMESTAMP('2000-01-01 [Link]','yyyy-MM-dd') returns 946713600
FROM_UNIXTIME( bigint number_of_seconds [, string format] )
The FROM_UNIX function converts the specified number of seconds from Unix epoch and
returns
the date in the format 'yyyy-MM-dd HH:mm:ss'.
Example: FROM_UNIXTIME( UNIX_TIMESTAMP() ) returns the current date including the
time. This
is equivalent to the SYSDATE in oracle.
TO_DATE( string timestamp )
The TO_DATE function returns the date part of the timestamp in the format 'yyyy-MM-
dd'.
Example: TO_DATE('2000-01-01 [Link]') returns '2000-01-01'
YEAR( string date )
The YEAR function returns the year part of the date.
Example: YEAR('2000-01-01 [Link]') returns 2000
MONTH( string date )
The MONTH function returns the month part of the date.
Example: YEAR('2000-03-01 [Link]') returns 3
DAY( string date ), DAYOFMONTH( date )
The DAY or DAYOFMONTH function returns the day part of the date.
Example: DAY('2000-03-01 [Link]') returns 1
HOUR( string date )
The HOUR function returns the hour part of the date.
Example: HOUR('2000-03-01 [Link]') returns 10
MINUTE( string date )
The MINUTE function returns the minute part of the timestamp.
Example: MINUTE('2000-03-01 [Link]') returns 20
SECOND( string date )
The SECOND function returns the second part of the timestamp.
Example: SECOND('2000-03-01 [Link]') returns 30
WEEKOFYEAR( string date )
The WEEKOFYEAR function returns the week number of the date.
Example: WEEKOFYEAR('2000-03-01 [Link]') returns 9
DATEDIFF( string date1, string date2 )
The DATEDIFF function returns the number of days between the two given dates.
Example: DATEDIFF('2000-03-01', '2000-01-10') returns 51
DATE_ADD( string date, int days )
The DATE_ADD function adds the number of days to the specified date
Example: DATE_ADD('2000-03-01', 5) returns '2000-03-06'
DATE_SUB( string date, int days )
The DATE_SUB function subtracts the number of days from the specified date.
Example: DATE_SUB('2000-03-01', 5) returns '2000-02-25'
This LOAD will move the file [Link] from HDFS into Hive's warehouse directory for the table. If the table is dropped, then the table metadata and the data will be deleted.
External Tables: An external table refers to the data that is outside of the
warehouse directory.
As an example, consider the table creation and loading of data into the external
table.
In case of external tables, Hive does not move the data into its warehouse
directory. If the external
table is dropped, then the table metadata is deleted but not the data.
Note: Hive does not check whether the external table location exists or not at the
time the external
table is created.
The Numerical functions are listed below in alphabetical order. Use these functions
in SQL queries.
ABS( double n )
The ABS function returns the absolute value of a number.
Example: ABS(-100)
ACOS( double n )
The ACOS function returns the arc cosine of value n. This function returns Null if
the value n is not in
the range of -1<=n<=1.
Example: ACOS(0.5)
ASIN( double n )
The ASIN function returns the arc sin of value n. This function returns Null if the
value n is not in the
range of -1<=n<=1.
Example: ASIN(0.5)
BIN( bigint n )
The BIN function returns the number n in the binary format.
Example: BIN(100)
CEIL( double n ), CEILING( double n )
The CEIL or CEILING function returns the smallest integer greater than or equal
to the decimal
value n.
Example: CEIL(9.5)
CONV( bigint n, int from_base, int to_base )
The CONV function converts the given number n from one base to another base.
EXAMPLE: CONV(100, 10,2)
COS( double n )
The COS function returns the cosine of the value n. Here n should be specified in
radians.
Example: COS(180*3.1415926/180)
EXP( double n )
The EXP function returns e to the power of n. Where e is the base of natural
logarithm and its value
is 2.718.
Example: EXP(50)
FLOOR( double n )
The FLOOR function returns the largest integer less than or equal to the given
value n.
Example: FLOOR(10.9)
HEX( bigint n)
This function converts the value n into hexadecimal format.
Example: HEX(16)
HEX( string n )
This function converts each character into hex representation format.
Example: HEX('ABC')
LN( double n )
The LN function returns the natural log of a number.
Example: LN(123.45)
LOG( double base, double n )
The LOG function returns the base logarithm of the number n.
Example: LOG(3, 66)
LOG2( double n )
The LOG2 function returns the base-2 logarithm of the number n.
Example: LOG2(44)
LOG10( double n )
The LOG10 function returns the base-10 logarithm of the number n.
Example: LOG10(100)
NEGATIVE( int n ), NEGATIVE( double n )
The NEGATIVE function returns -n.
Example: NEGATIVE(10)
PMOD( int m, int n ), PMOD( double m, double n )
The PMOD function returns the positive modulus of a number.
Example: PMOD(3,2)
POSITIVE( int n ), POSITIVE( double n )
The POSITIVE function returns n
Example: POSITIVE(-10)
POW( double m, double n ), POWER( double m, double n )
The POW or POWER function returns m value raised to the n power.
Example: POW(10,2)
RAND( [int seed] )
The RAND function returns a random number. If you specify the seed value, the
generated random
number will become deterministic.
Example: RAND( )
ROUND( double value [, int n] )
The ROUND function returns the value rounded to n integer places.
Example: ROUND(123.456,2)
SIN( double n )
The SIN function returns the sin of a number. Here n should be specified in
radians.
Example: SIN(2)
SQRT( double n )
The SQRT function returns the square root of the number
Example: SQRT(4)
UNHEX( string n )
The UNHEX function is the inverse of HEX function. It converts the specified string
to the number
format.
Example: UNHEX('AB')
Hive data types are categorized into two types. They are the primitive and complex
data types.
The primitive data types include Integers, Boolean, Floating point numbers and
strings. The below
table lists the size of each data type:
Type Size
----------------------
TINYINT 1 byte
SMALLINT 2 byte
INT 4 byte
BIGINT 8 byte
FLOAT 4 byte (single precision floating point numbers)
DOUBLE 8 byte (double precision floating point numbers)
BOOLEAN TRUE/FALSE value
STRING Max size is 2GB.
The complex data types include Arrays, Maps and Structs. These data types are built
on using the
primitive data types.
Arrays: Contain a list of elements of the same data type. These elements are accessed by using an index. For example, for an array "fruits" containing the list of elements ['apple', 'mango', 'orange'], the element 'apple' can be accessed by specifying fruits[0] (array indexes in Hive start at 0).
Maps: Contain key, value pairs. The elements are accessed by using the keys. For example, for a map "pass_list" containing the "user name" as key and "password" as value, the password of a user can be accessed by specifying pass_list['username'].
Structs: Contain elements of different data types. The elements can be accessed by using the dot notation. For example, in a struct "car", the color of the car can be retrieved by specifying [Link]
The create table statement containing the complex type is shown below.
What is Hive
Hive data warehouse is used to manage large datasets residing in Hadoop and for
querying
purpose. Hive can be used to access files stored in HDFS or in other data storage
system.
Hive provides SQL, which is called Hive QL, to read the data from the data storage
system. Hive
does not support the complete SQL-92 specification. It executes the queries via
MapReduce
algorithms. Hive provides the flexibility for users to create their own UDFs via the MapReduce framework. The programmer needs to write the mapper and reducer scripts.
As Hadoop is a batch processing system, data processed by Hadoop is returned with high latency. So, Hive queries also have high latency, and therefore Hive is not suitable for online transaction processing.
Grep is one of the most powerful tools in unix. Grep stands for "global search for regular expressions and print". The power of grep lies mostly in using regular expressions.
The general syntax of grep command is
grep [options] pattern [files]
1. Write a command to print the lines that has the the pattern "july" in all the
files in a particular
directory?
grep july *
This will print all the lines in all files that contain the word "july" along with
the file name. If any of the
files contain words like "JULY" or "July", the above command would not print those
lines.
2. Write a command to print the lines that has the word "july" in all the files in
a directory and also
suppress the filename in the output.
grep -h july *
3. Write a command to print the lines that has the word "july" while ignoring the
case.
grep -i july *
The -i option makes the grep command treat the pattern as case insensitive.
4. When you use a single file as input to the grep command to search for a pattern,
it won't print the
filename in the output. Now write a grep command to print the filename in the
output without using
the '-H' option.
grep pattern filename /dev/null
The /dev/null or null device is a special file that discards any data written to it. So, /dev/null is always an empty file.
Another way to print the filename is using the '-H' option. The grep command for
this is
grep -H pattern filename
5. Write a Unix command to display the lines in a file that do not contain the word
"july"?
grep -v july filename
The '-v' option tells the grep to print the lines that do not contain the specified
pattern.
6. Write a command to print the file names in a directory that has the word "july"?
grep -l july *
The '-l' option makes the grep command print only the filename, without printing the content of the file. As soon as the grep command finds the pattern in a file, it prints the filename and stops searching the other lines in that file.
7. Write a command to print the file names in a directory that does not contain the
word "july"?
grep -L july *
The '-L' option makes the grep command to print the filenames that do not contain
the specified
pattern.
8. Write a command to print the line numbers along with the line that has the word
"july"?
grep -n july filename
The '-n' option is used to print the line numbers in a file. The line numbers start
from 1
9. Write a command to print the lines that starts with the word "start"?
grep '^start' filename
The '^' symbol specifies the grep command to search for the pattern at the start of
the line.
10. Write a command to print the lines which end with the word "end"?
grep 'end$' filename
The '$' symbol specifies the grep command to search for the pattern at the end of
the line.
11. Write a command to select only those lines containing "july" as a whole word?
grep -w july filename
The '-w' option makes the grep command search for exact whole words. If the specified pattern is found inside a larger string, then it is not considered a whole word. For example, in the string "mikejulymak" the pattern "july" is found, but "july" is not a whole word in that string.
Swapping
In swapping, the whole process is moved from the swap device to the main memory for execution. The process size must be less than or equal to the available main memory. Swapping is easier to implement but adds overhead to the system. Swapping systems do not handle memory as flexibly as paging systems.
Paging
Only the required memory pages are moved to main memory from the swap device for execution. The process size does not matter. Paging gives the concept of virtual memory. It provides greater flexibility in mapping the virtual address space into the physical memory of the machine. It allows more processes to fit in the main memory simultaneously and allows a process size greater than the available physical memory. Demand paging systems handle memory more flexibly.
Awk is a powerful tool in Unix. Awk is an excellent tool for processing files which have data arranged in rows and columns. It is a good filter and report writer.
1. How to run awk command specified in a file?
awk -f filename
2. Write a command to print the squares of numbers from 1 to 10 using awk command
awk 'BEGIN { for(i=1;i<=10;i++) {print "square of",i,"is",i*i;}}'
3. Write a command to find the sum of bytes (size of file) of all files in a
directory.
ls -l | awk 'BEGIN {sum=0} {sum = sum + $5} END {print sum}'
4. In the text file, some lines are delimited by colon and some are delimited by
space. Write a
command to print the third field of each line.
awk '{ if( $0 ~ /:/ ) { FS=":"; } else { FS =" "; } print $3 }' filename
5. Write a command to print the line number before each line?
awk '{print NR, $0}' filename
6. Write a command to print the second and third line of a file without using NR.
awk 'BEGIN {RS="";FS="\n"} {print $2,$3}' filename
7. Write a command to print zero byte size files?
ls -l | awk '/^-/ {if ($5 == 0) print $9 }'
8. Write a command to rename the files in a directory with "_new" as postfix?
ls | awk '{print "mv "$1" "$1"_new"}' | sh
9. Write a command to print the fields in a text file in reverse order?
awk 'BEGIN {ORS=""} { for(i=NF;i>0;i--) print $i," "; print "\n"}' filename
10. Write a command to find the total number of lines in a file without using NR
awk 'BEGIN {sum=0} {sum=sum+1} END {print sum}' filename
Another way to print the number of lines is by using the NR. The command is
awk 'END{print NR}' filename
The Unix file structure is organized in a reverse tree structure manner. The
following figure shows a
typical organization of files in Unix system.
The diagram looks like an upside-down tree. The slash (/) indicates the root directory. Names like etc, usr, local are directories and [Link] is a file. The regular files in Unix are the leaves in the tree structure.
Unix supports three types of files:
. Regular files
. Directories
. Special or Device files
Regular Files
Regular files hold data and executable programs. Executable programs are the
commands (ls) that
you enter on the prompt. The data can be anything and there is no specific format
enforced in the
way the data is stored.
The regular files can be visualized as the leaves in the UNIX tree.
Directories
Directories are files that contain other files and sub-directories. Directories are
used to organize the
data by keeping closely related files in the same place. The directories are just
like the folders in
windows operating system.
The kernel alone can write the directory file. When a file is added to or deleted
from this directory,
the kernel makes an entry.
A directory file can be visualized as the branch of the UNIX tree.
Special Or Device Files
These files represent the physical devices. Files can also refer to computer
hardware such as
terminals and printers. These device files can also refer to tape and disk drives,
CD-ROM players,
modems, network interfaces, scanners, and any other piece of computer hardware.
When a process
writes to a special file, the data is sent to the physical device associated with
it. Special files are not
literally files, but are pointers that point to the device drivers located in the
kernel. The protection
applicable to files is also applicable to physical devices.
The strength of the Unix lies in treating the files in a consistent way. For Unix a
file is a file. This
consistency makes it easy to work with files and the user does not have to learn
special commands
for new tasks. The user can write Unix programs easily without worrying about whether he is communicating with a terminal, a printer, or an ordinary file on a disk drive.
For example a "cat" command can be used to display the contents of a file on
terminal screen and
can also send the file to a printer. As far as Unix is concerned the terminal and
the printer are files
just as other files.
Unix User Login Programs - Getty And Login
The kernel needs to know which user is logging in and how to communicate with the user. To do this, the kernel invokes two programs, getty and login.
The kernel invokes the getty program for every user terminal. When the getty
program receives input
from the user, it invokes the login program. The login program verifies the
identity of the user by
checking the password file. If the user fails to provide valid password, the login
program returns the
control back to the getty program. If the user enters a valid password, the login
program takes the
user to the shell prompt.
The instructions to the kernel are complex and highly technical. To protect the kernel from the shortcomings of the user, a shell is built around the kernel. The shell acts like a mediator between the user and the kernel. Whenever a user runs a command, the shell interprets the command and passes it to the kernel.
Three types of shell are standard in Unix
. Bourne shell is developed by Stephen Bourne. It is the most widely used shell and
is a
program with name sh. The bourne shell prompts with $ symbol
. Korn shell is developed by David Korn. The korn shell has additional features
than bourne shell
and is called by the name ksh.
. C shell is developed by Bill Joy and is called by the name csh.
The kernel is the heart of a UNIX system and manages the hardware, executing
processes etc.
When the computer is booted, kernel is loaded into the computer's main memory and
it remains
there until the computer is shut down. The kernel performs many low-level and
system-level
functions.
The structure of Unix operating system can be divided into three parts.
. Kernel is the core part of Unix which interacts with the hardware for low level
functions.
. Shell is the outer unit of Unix which interacts with the user to perform the
functions.
. File System.
id value
----------
1 A,B,C
2 P,Q,R,S,T
3 M,N
Here the data in the value column is delimited by commas. Now write a query to split the delimited data in the value column into multiple rows. The output should look like as
id value
--------
1 A
1 B
1 C
2 P
2 Q
2 R
2 S
2 T
3 M
3 N
Solution:
SELECT [Link],
THEN instr(value,',',1,a.l)-
instr(value,',',1,a.l-1)-1
ELSE length(value)
END
END final_value
FROM t,
( SELECT level l
FROM DUAL
CONNECT BY LEVEL <=
) a
A median is a value separating the higher half of sample from the lower half. The
median can be
found by arranging all the numerical values from lowest to highest value and
picking the middle one.
If there is an even number of values, then there is no single middle value; the median is then defined as the mean of the two middle values.
Now let us see how to calculate the median in oracle with the employees table as an example.
Table name: employees
The below query is used to calculate the median of employee salaries across the
entire table.
select empid,
dept_id,
salary,
percentile_disc(0.5) within group (order by salary desc)
over () median
from employees;
Now we will write a query to find the median of employee salaries in each
department.
select empid,
dept_id,
salary,
percentile_disc(0.5) within group (order by salary desc)
over (partition by dept_id) median
from employees;
The source data is represented in the form the tree structure. You can easily
derive the parent-child
relationship between the elements. For example, B is parent of D and E. As the
element A is root
element, it is at level 0. B, C are at level 1 and so on.
Here in this table, column C1 is parent of column C2, column C2 is parent of column
C3, column C3
is parent of column C4.
Q1. Write a query to load the target table with the below data. Here you need to
generate sequence
numbers for each element and then you have to get the parent id. As the element "A"
is at root, it
does not have any parent and its parent_id is NULL.
Solution:
WITH t1 AS
(
SELECT VALUE PARENT,
LEV,
LEAD(value,1) OVER (PARTITION BY r ORDER BY lev) CHILD
FROM (SELECT c1,
c2,
c3,
c4,
ROWNUM r
FROM table_name
)
UNPIVOT (value FOR lev IN (c1 as 0,c2 as 1,c3 as 2,c4 as 3))
),
t2 AS
(
SELECT PARENT,
LEV,
ROWNUM SEQ
FROM
(SELECT DISTINCT PARENT,
LEV
FROM T1
ORDER BY LEV
)
),
T3 AS
(
SELECT DISTINCT PARENT,
CHILD
FROM T1
WHERE CHILD IS NOT NULL
UNION ALL
SELECT DISTINCT NULL,
PARENT
FROM T1
WHERE LEV=0
)
SELECT [Link] Id,
[Link] ELEMENT,
[Link],
[Link] PARENT_ID
FROM T3
INNER JOIN
T2 C
ON ([Link] = [Link])
LEFT OUTER JOIN
T2 P
ON ([Link] = [Link])
ORDER BY [Link];
Name, Friend_Name
-----------------
sam, ram
sam, vamsi
vamsi, ram
vamsi, jhon
ram, vijay
ram, anand
Here ram and vamsi are friends of sam; ram and jhon are friends of vamsi and so on.
Now write a
query to find friends of friends of sam. For sam; ram,jhon,vijay and anand are
friends of friends. The
output should look as
Name, Friend_of_Firend
----------------------
sam, ram
sam, jhon
sam, vijay
sam, anand
Solution:
SELECT [Link],
f2.friend_name as friend_of_friend
friends f2
2. This is an extension to the problem 1. In the output, you can see ram is
displayed as friends of
friends. This is because ram is a mutual friend of sam and vamsi. Now extend the above query to exclude mutual friends. The output should look as
Name, Friend_of_Friend
----------------------
sam, jhon
sam, vijay
sam, anand
Solution:
SELECT [Link],
f2.friend_name as friend_of_friend
friends f2
3. Write a query to get the top 5 products based on the quantity sold without using
the row_number
analytical function? The source data looks as
-----------------------------
A, 200, 2009
B, 155, 2009
C, 455, 2009
D, 620, 2009
E, 135, 2009
F, 390, 2009
G, 999, 2010
H, 810, 2010
I, 910, 2010
J, 109, 2010
L, 260, 2010
M, 580, 2010
Solution:
SELECT products,
quantity_sold,
year
FROM
SELECT products,
quantity_sold,
year,
rownum r
from t
)A
WHERE r <= 5;
4. This is an extension to the problem 3. Write a query to produce the same output
using
row_number analytical function?
Solution:
SELECT products,
quantity_sold,
year
FROM
SELECT products,
quantity_sold,
year,
row_number() OVER(
from t
)A
WHERE r <= 5;
5. This is an extension to the problem 3. write a query to get the top 5 products
in each year based
on the quantity sold?
Solution:
SELECT products,
quantity_sold,
year
FROM
SELECT products,
quantity_sold,
year,
row_number() OVER(
PARTITION BY year
from t
)A
WHERE r <= 5;
Here I am providing Oracle SQL query interview questions. If you find any bugs in the queries, please do comment so that I can rectify them.
1. Write a query to generate sequence numbers from 1 to the specified number N?
Solution:
2. Write a query to display only friday dates from Jan, 2000 to till now?
Solution:
SELECT C_DATE,
TO_CHAR(C_DATE,'DY')
FROM
FROM DUAL
(SYSDATE - TO_DATE('01-JAN-2000','DD-MON-YYYY')+1)
)
3. Write a query to duplicate each row based on the value in the repeat column? The
input table data
looks like as below
Products, Repeat
----------------
A, 3
B, 5
C, 2
Now in the output data, the product A should be repeated 3 times, B should be
repeated 5 times and
C should be repeated 2 times. The output will look like as below
Products, Repeat
----------------
A, 3
A, 3
A, 3
B, 5
B, 5
B, 5
B, 5
B, 5
C, 2
C, 2
Solution:
SELECT PRODUCTS,
REPEAT
FROM T,
) A
ORDER BY [Link];
4. Write a query to display each letter of the word "SMILE" in a separate row?
Solution:
SELECT SUBSTR('SMILE',LEVEL,1) A
FROM DUAL
CONNECT BY LEVEL <= LENGTH('SMILE');
5. Convert the string "SMILE" to Ascii values? The output should look like as
83,77,73,76,69. Where
83 is the ascii value of S and so on.
The ASCII function will give ascii value for only one character. If you pass a
string to the ascii
function, it will give the ascii value of first letter in the string. Here i am
providing two solutions to get
the ascii values of string.
Solution1:
SELECT SUBSTR(DUMP('SMILE'),15)
FROM DUAL;
Solution2:
SELECT WM_CONCAT(A)
FROM
SELECT ASCII(SUBSTR('SMILE',LEVEL,1)) A
FROM DUAL
CONNECT BY LEVEL <= LENGTH('SMILE')
);
[Link],
[Link]
FROM PRODUCTS P,
SALES S
(SELECT AVG(QUANTITY)
FROM SALES S1
);
--------------------------
Nokia 2010 25
IPhone 2012 20
Samsung 2012 20
Samsung 2010 20
2. Write a query to compare the products sales of "IPhone" and "Samsung" in each
year? The output
should look like as
---------------------------------------------------
Solution:
By using self-join SQL query we can get the required result. The required SQL query
is
SELECT S_I.YEAR,
S_I.QUANTITY IPHONE_QUANT,
S_S.QUANTITY SAM_QUANT,
S_I.PRICE IPHONE_PRICE,
S_S.PRICE SAM_PRICE
SALES S_I,
PRODUCTS P_S,
SALES S_S
SELECT P.PRODUCT_NAME,
[Link],
RATIO_TO_REPORT([Link]*[Link])
FROM PRODUCTS P,
SALES S
-----------------------------
4. In the SALES table quantity of each product is stored in rows for every year.
Now write a query to
transpose the quantity for each product and display it in columns? The output
should look like as
------------------------------------------
IPhone 10 15 20
Samsung 20 18 20
Nokia 25 16 8
Solution:
Oracle 11g provides a pivot function to transpose the row data into column data.
The SQL query for
this is
SELECT * FROM
SELECT P.PRODUCT_NAME,
[Link],
[Link]
FROM PRODUCTS P,
SALES S
)A
If you are not running oracle 11g database, then use the below query for
transposing the row data
into column data.
SELECT P.PRODUCT_NAME,
FROM PRODUCTS P,
SALES S
SELECT YEAR,
COUNT(1) NUM_PRODUCTS
FROM SALES
GROUP BY YEAR;
YEAR NUM_PRODUCTS
------------------
2010 3
2011 3
2012 3
As a database developer, writing SQL queries, PLSQL code is part of daily life.
Having a good
knowledge on SQL is really important. Here i am posting some practical examples on
SQL queries.
To solve these interview questions on SQL queries you have to create the products,
sales tables in
your oracle database. The "Create Table", "Insert" statements are provided below.
CREATE TABLE PRODUCTS
(
PRODUCT_ID INTEGER,
PRODUCT_NAME VARCHAR2(30)
);
CREATE TABLE SALES
(
SALE_ID INTEGER,
PRODUCT_ID INTEGER,
YEAR INTEGER,
Quantity INTEGER,
PRICE INTEGER
);
COMMIT;
PRODUCT_ID PRODUCT_NAME
-----------------------
100 Nokia
200 IPhone
300 Samsung
--------------------------------------
Here Quantity is the number of products sold in each year. Price is the sale price
of each product.
I hope you have created the tables in your oracle database. Now try to solve the
below SQL queries.
1. Write a SQL query to find the products which have continuous increase in sales
every year?
Solution:
Here 'IPhone' is the only product whose sales are increasing every year.
STEP1: First we will get the previous year sales for each product. The SQL query to
do this is
SELECT P.PRODUCT_NAME,
[Link],
[Link],
LEAD([Link],1,0) OVER (
PARTITION BY P.PRODUCT_ID
) QUAN_PREV_YEAR
FROM PRODUCTS P,
SALES S
-----------------------------------------
Nokia 2012 8 16
Nokia 2011 16 25
Nokia 2010 25 0
IPhone 2012 20 15
IPhone 2011 15 10
IPhone 2010 10 0
Samsung 2012 20 18
Samsung 2011 18 20
Samsung 2010 20 0
Here the lead analytic function will get the quantity of a product in its previous
year.
STEP2: We will find the difference between the quantity of a product and its previous year's quantity. If this difference is greater than or equal to zero for all the rows, then the product is constantly increasing in sales. The final query to get the required result is
SELECT PRODUCT_NAME
FROM
SELECT P.PRODUCT_NAME,
[Link] -
LEAD([Link],1,0) OVER (
PARTITION BY P.PRODUCT_ID
) QUAN_DIFF
FROM PRODUCTS P,
SALES S
WHERE P.PRODUCT_ID = S.PRODUCT_ID
)A
GROUP BY PRODUCT_NAME
HAVING MIN(QUAN_DIFF) >= 0;
PRODUCT_NAME
------------
IPhone
2. Write a SQL query to find the products which does not have sales at all?
Solution:
'LG' is the only product which does not have sales at all. This can be achieved in three ways.
Method1: Using left outer join.
SELECT P.PRODUCT_NAME
FROM PRODUCTS P
SALES S
ON (P.PRODUCT_ID = S.PRODUCT_ID);
PRODUCT_NAME
------------
LG
Method2: Using the NOT IN operator.
SELECT P.PRODUCT_NAME
FROM PRODUCTS P
PRODUCT_NAME
------------
LG
SELECT P.PRODUCT_NAME
FROM PRODUCTS P
PRODUCT_NAME
------------
LG
3. Write a SQL query to find the products whose sales decreased in 2012 compared to
2011?
Solution:
Here Nokia is the only product whose sales decreased in year 2012 when compared
with the sales
in the year 2011. The SQL query to get the required output is
SELECT P.PRODUCT_NAME
FROM PRODUCTS P,
SALES S_2012,
SALES S_2011
PRODUCT_NAME
------------
Nokia
SELECT PRODUCT_NAME,
YEAR
FROM
SELECT P.PRODUCT_NAME,
[Link],
RANK() OVER (
PARTITION BY [Link]
) RNK
FROM PRODUCTS P,
SALES S
) A
WHERE RNK = 1;
PRODUCT_NAME YEAR
--------------------
Nokia 2010
Samsung 2011
IPhone 2012
Samsung 2012
SELECT P.PRODUCT_NAME,
FROM PRODUCTS P
SALES S
ON (P.PRODUCT_ID = S.PRODUCT_ID)
GROUP BY P.PRODUCT_NAME;
PRODUCT_NAME TOTAL_SALES
---------------------------
LG 0
IPhone 405000
Samsung 406000
Nokia 245000
PRODUCT_ID INTEGER,
PRODUCT_NAME VARCHAR2(30)
);
COMMIT;
PRODUCT_ID PRODUCT_NAME
-----------------------
100 Nokia
200 IPhone
300 Samsung
400 LG
500 BlackBerry
600 Motorola
Solution:
First we will create a target table. The target table will have an additional
column INSERT_DATE to
know when a product is loaded into the target table. The target
table structure is
PRODUCT_ID INTEGER,
PRODUCT_NAME VARCHAR2(30),
INSERT_DATE DATE
);
The next step is to pick 5 products randomly and then load into target table. While
selecting check
whether the products are there in the
SELECT PRODUCT_ID,
PRODUCT_NAME,
SYSDATE INSERT_DATE
FROM
SELECT PRODUCT_ID,
PRODUCT_NAME
FROM PRODUCTS S
SELECT 1
FROM TGT_PRODUCTS T
)A
The last step is to delete the products from the table which are loaded 30 days
back.
DELETE FROM TGT_PRODUCTS
CONTENT_ID INTEGER,
CONTENT_TYPE VARCHAR2(30)
);
COMMIT;
CONTENT_ID CONTENT_TYPE
-----------------------
1 MOVIE
2 MOVIE
3 AUDIO
4 AUDIO
5 MAGAZINE
6 MAGAZINE
. Load only one content type at a time into the target table.
. The target table should always contain only one contain type.
. The loading of content types should follow round-robin style. First MOVIE, second
AUDIO, Third
MAGAZINE and again fourth Movie.
Solution:
First we will create a lookup table where we mention the priorities for the content types. The lookup table "Create Statement" and data are shown below.
CONTENT_TYPE VARCHAR2(30),
PRIORITY INTEGER,
LOAD_FLAG INTEGER
);
COMMIT;
---------------------------------
MOVIE 1 1
AUDIO 2 0
MAGAZINE 3 0
The second step is to truncate the target table before loading the data
The third step is to choose the appropriate content type from the lookup table to
load the source data
into the target table.
SELECT CONTENT_ID,
CONTENT_TYPE
FROM CONTENTS
UPDATE CONTENTS_LKP
SET LOAD_FLAG = 0
WHERE LOAD_FLAG = 1;
UPDATE CONTENTS_LKP
SET LOAD_FLAG = 1
WHERE PRIORITY = (
FROM CONTENTS_LKP
);
1. What is polling?
Polling displays the updated information about the session in the monitor window.
The monitor
window displays the status of each session when you poll the informatica server.
12. What are the basic requirements to join two sources in a source qualifier
transformation using
default join?
. The two sources should have primary key and foreign key relationship.
. The two sources should have matching data types.
1. What are the differences between joiner transformation and source qualifier
transformation?
3. What are the settings that you use to configure the joiner transformation?
. Normal (Default)
. Master outer
. Detail outer
. Full outer
When a Joiner transformation occurs in a session, the Informatica Server reads all
the records from
the master source and builds index and data caches based on the master rows. After
building the
caches, the Joiner transformation reads records from the detail source and performs
joins.
. Persistent cache: You can save the lookup cache files and reuse them the next
time the
informatica server processes a lookup transformation configured to use the cache.
. Re-cache from database: If the persistent cache is not synchronized with the
lookup table, you
can configure the lookup transformation to rebuild the lookup cache.
. Static cache: you can configure a static or read only cache for only lookup
table. By default
informatica server creates a static cache. It caches the lookup table and lookup
values in the
cache for each row that comes into the transformation. When the lookup condition is
true, the
informatica server does not update the cache while it processes the lookup
transformation.
. Dynamic cache: If you want to cache the target table and insert new rows into
cache and the
target, you can create a look up transformation to use dynamic cache. The
informatica server
dynamically inserts data to the target table.
. Shared cache: You can share the lookup cache between multiple transactions. You
can share
unnamed cache between transformations in the same mapping.
11. Which transformation should we use to normalize the COBOL and relational
sources?
. Input group
. Output group
18. What are the types of data that passes between informatica server and stored
procedure?
Three types of data passes between the informatica server and stored procedure.
. Input/Output parameters
. Return Values
. Status code.
1. While importing the relational source definition from the database, what are the
metadata of
source that will be imported?
. Source name
. Database location
. Column names
. Data types
. Key constraints
2. How many ways a relational source definition can be updated and what are they?
3. To import the flat file definition into the designer where should the flat file
be placed?
Place the flat file in local folder in the local machine
4. To provide support for Mainframes source data, which files are used as a source
definitions?
COBOL files
5. Which transformation is needed while using the cobol sources as source
definitions?
As cobol sources consists of denormalized data, normalizer transformation is
required to normalize
the data.
6. How to create or import flat file definition in to the warehouse designer?
We cannot create or import flat file definition into warehouse designer directly.
We can create or
import the file in source analyzer and then drag it into the warehouse designer.
7. What is a mapplet?
A mapplet is a set of transformations that you build in the mapplet designer and
can be used in
multiple mappings.
8. What is a transformation?
It is a repository object that generates, modifies or passes data.
. Mapping designer
. Transformation developer
. Mapplet designer
Two methods:
17. Can we use the mapping parameters or variables created in one mapping into
another mapping?
NO. We can use the mapping parameters or variables only in the transformations of
the same
mapping or mapplet in which we have created the mapping parameters or variables.
18. Can we use the mapping parameters or variables created in one mapping into any
other
reusable transformation?
Yes, because an instance of the reusable transformation created in the mapping belongs to that
mapping only.
Take a look at the following tree structure diagram. From the tree structure, you
can easily derive the
parent-child relationship between the elements. For example, B is parent of D and
E.
Here I am providing some more scenario-based interview questions on DataStage. Try to solve these
scenarios and improve your technical skills. If you get solutions to these scenarios, please do
comment here.
1. Consider the following product types data as the source.
Product_id, product_type
------------------------
10, video
10, Audio
20, Audio
30, Audio
40, Audio
50, Audio
10, Movie
20, Movie
30, Movie
40, Movie
50, Movie
60, Movie
Assume that only 3 product types are available in the source. The source contains 12 records and
you don't know how many products are available in each product type.
Q1. Create a job to select 9 products in such a way that 3 products should be
selected from video, 3
products should be selected from Audio and the remaining 3 products should be
selected from
Movie.
Q2. In the above problem Q1, if the number of products in a particular product type is less than 3,
then you won't get the total 9 records in the target table. For example, see the video type in the
source data. Now design a job in such a way that even if the number of products in a particular
product type is less than 3, you still get those fewer records from the other product types. For
example: if the number of products in video is 1, then the remaining 2 records should come from
Audio or Movie. So, the total number of records in the target table should always be 9.
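The selection logic for Q1 can also be sketched in plain SQL (the job itself would be built with DataStage stages; this is only the equivalent set-based logic). The table name product_types is an assumption based on the sample data above.

-- Sketch only: pick at most 3 products per product type, assuming the source
-- is available as a hypothetical table called product_types.
SELECT product_id,
       product_type
FROM (
       SELECT product_id,
              product_type,
              ROW_NUMBER() OVER (PARTITION BY product_type
                                 ORDER BY product_id) AS rn
       FROM product_types
     )
WHERE rn <= 3;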
2. Create a job to convert column data into row data.
The source data looks like
----------------
a, b, c
d, e, f
Col
---
a
id, value
---------
10, a
10, b
10, c
20, d
20, e
20, f
--------------------
10, a, b, c
20, d, e, f
Datastage Scenario Based Questions - Part 2
Here I am providing some scenario-based questions on DataStage. These scenarios not only help you
in preparing for the interview, they will also help you in improving your technical skills in DataStage.
Try to solve the below scenario-based questions.
1. Consider the following employees data as the source.
employee_id, salary
-------------------
10, 1000
20, 2000
30, 3000
40, 5000
Q1. Create a job to load the cumulative sum of salaries of employees into target
table?
The target table data should look like as
-----------------------------------
Q2. Create a job to get the previous row salary for the current row. If there is no previous row
for the current row, then the previous row salary should be displayed as null.
The output should look like as
employee_id, salary, pre_row_salary
-----------------------------------
10, 1000, Null
20, 2000, 1000
30, 3000, 2000
40, 5000, 3000
Q3. Create a job to get the next row salary for the current row. If there is no
next row for the current
row, then the next row salary should be displayed as null.
The output should look like as
------------------------------------
Q4. Create a job to find the sum of salaries of all employees and this sum should
repeat for all the
rows.
The output should look like as
-------------------------------
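If the employees data were available in a relational source, the four outputs above could be sketched with analytic functions in SQL; the actual job would of course be built with DataStage stages. The table name employees is an assumption based on the sample data.

-- Sketch only: equivalent SQL for Q1-Q4, assuming a hypothetical table
-- called employees with the employee_id and salary columns shown above.
SELECT employee_id,
       salary,
       SUM(salary)  OVER (ORDER BY employee_id) AS cum_salary,       -- Q1 running total
       LAG(salary)  OVER (ORDER BY employee_id) AS pre_row_salary,   -- Q2 previous row salary
       LEAD(salary) OVER (ORDER BY employee_id) AS next_row_salary,  -- Q3 next row salary
       SUM(salary)  OVER ()                     AS total_salary      -- Q4 total repeated on all rows
FROM employees;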
department_no, employee_name
----------------------------
20, R
10, A
10, D
20, P
10, B
10, C
20, Q
20, S
Q1. Create a job to load a target table with the following values from the above
source?
department_no, employee_list
--------------------------------
10, A
10, A,B
10, A,B,C
10, A,B,C,D
20, A,B,C,D,P
20, A,B,C,D,P,Q
20, A,B,C,D,P,Q,R
20, A,B,C,D,P,Q,R,S
Q2. Create a job to load a target table with the following values from the above
source?
department_no, employee_list
----------------------------
10, A
10, A,B
10, A,B,C
10, A,B,C,D
20, P
20, P,Q
20, P,Q,R
20, P,Q,R,S
Q3. Create a job to load a target table with the following values from the above
source?
department_no, employee_names
-----------------------------
10, A,B,C,D
20, P,Q,R,S
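For reference, the Q3 output above can be sketched in SQL with a string aggregation function; the table name employees and the column alias are assumptions based on the sample data, and LISTAGG needs a recent Oracle version (11g Release 2 or later).

-- Sketch only: one employee list per department, assuming a hypothetical
-- table called employees with the department_no and employee_name columns.
SELECT department_no,
       LISTAGG(employee_name, ',')
         WITHIN GROUP (ORDER BY employee_name) AS employee_names
FROM employees
GROUP BY department_no;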
Most of you simply prepare for the interview by reading conceptual questions and ignore preparing
for the scenario questions. That is the reason I am providing here the scenarios which are mostly
asked in the interviews. Be prepared with the below interview questions.
1. Create a job to load the first 3 records from a flat file into a target table?
2. Create a job to load the last 3 records from a flat file into a target table?
3. Create a job to load the first record from a flat file into one table A, the
last record from a flat file
into table B and the remaining records into table C?
4. Consider the following products data which contain duplicate records.
Q2. Create a job to load each product once into one table and the remaining
products which are
duplicated into another table.
The first table should contain the following output
. A condition forces a query to retrieve only the data that meets the criteria.
Condition is not
reusable.
. A filter is applied on a report and allows you to view the required data. Filter
restricts the number
of rows displayed in the report.
1. What is [Link]?
[Link] file contains the information about the repository site i.e. it contains
the address of the
repository security domain.
2. What is a metric?
Metrics are a system of parameters or ways of quantitative and periodic assessment
of a process
that is to be measured; these are used to track trends, productivity.
3. What is the source for metrics?
Measure objects.
4. What is a Set?
A set is a grouping of users.
5. Why do we need metrics and sets?
Metrics are used for analysis and Sets are used for grouping.
6. What is a section in a Business Objects report?
When you apply a section on a block, it divides the report into smaller sections and the columns on
which you apply the section will appear as the heading outside of the block. When you apply a chart
on this block, every section has an individual chart for its own section.
7. What are the different sections available in Business objects?
The different sections are:
. Report Header
. Page Header
. Details
. Report Footer
. Page Footer
1. What is a repository?
Business objects repository is a set of database tables where the metadata of your
application is
stored.
2. When is the repository created?
In 5i/6i versions, the repository is created after installing the software. In Xi
version a repository is
created at the time of installation.
3. What is a domain?
A domain is nothing but a logical grouping of system tables.
4. How many domains are there in the basic setup and what are they?
There are three domains in the basic setup of Business Objects. They are the security domain, the
universe domain and the document domain.
The different types of objects in Business Objects are:
. Dimension object: Provides the parameters which are mainly the focus for analysis.
Example:
customer name, country name
. Detail object: Provides the description of a dimension object but is not the
focus for analysis.
Example: customer address, phone number.
. Measure object: Provides the numerical quantities. Example: sales, revenue.
The most commonly used transformations are listed in the below table. The interview questions on
each of these transformations are covered below.
Transformation          Type
---------------------------------------------------
Aggregator              Active/Connected
Expression              Passive/Connected
Filter                  Active/Connected
Joiner                  Active/Connected
Lookup                  Passive/Connected or Unconnected
Normalizer              Active/Connected
Rank                    Active/Connected
Router                  Active/Connected
Sequence Generator      Passive/Connected
Sorter                  Active/Connected
Source Qualifier        Active/Connected
SQL                     Active or Passive/Connected
Stored Procedure        Passive/Connected or Unconnected
Transaction Control     Active/Connected
Union                   Active/Connected
Update Strategy         Active/Connected
1. What is a transformation?
A transformation is a repository object that generates, modifies, or passes data.
2. What is an active transformation?
An active transformation is the one which changes the number of rows that pass
through it.
Example: Filter transformation
3. What is a passive transformation?
A passive transformation is the one which does not change the number of rows that
pass through it.
Example: Expression transformation
4. What is a connected transformation?
A connected transformation is connected to the data flow or connected to the other
transformations
in the mapping pipeline.
Example: sorter transformation
5. What is an unconnected transformation?
An unconnected transformation is not connected to other transformations in the
mapping. An
unconnected transformation is called within another transformation and returns a
value to that
transformation.
Example: Unconnected lookup transformation, unconnected stored procedure
transformation
6. What are multi-group transformations?
Transformations having multiple input and output groups are called multi-group
transformations.
Examples: Custom, HTTP, Joiner, Router, Union, Unstructured Data, XML source
qualifier, XML
Target definition, XML parser, XML generator
7. List out all the transformations which use cache?
Aggregator, Joiner, Lookup, Rank, Sorter
8. What is a blocking transformation?
Transformations which block the input rows are called blocking transformations.
Example: Custom transformation, unsorted Joiner transformation
9. What is a reusable transformation?
A reusable transformation is the one which can be used in multiple mappings.
Reusable
transformation is created in transformation developer.
10. How do you promote a non-reusable transformation to reusable transformation?
Edit the transformation and check the Make Reusable option
11. How to create a non-reusable instance of reusable transformations?
In the navigator, select an existing transformation and drag the transformation
into the mapping
workspace. Hold down the Ctrl key before you release the transformation.
12. Which transformation can be created only as reusable transformation but not as
non-reusable
transformation?
External procedure transformation.
Consider the following employees data as the source:
employee_id, salary
-------------------
10, 1000
20, 2000
30, 3000
40, 5000
Q1. Design a mapping to load the cumulative sum of salaries of employees into
target table?
The target table data should look like as
-----------------------------------
Q2. Design a mapping to get the previous row salary for the current row. If there is no previous row
for the current row, then the previous row salary should be displayed as null.
The output should look like as
-----------------------------------
Department_no, Employee_name
----------------------------
20, R
10, A
10, D
20, P
10, B
10, C
20, Q
20, S
Q1. Design a mapping to load a target table with the following values from the
above source?
Department_no, Employee_list
----------------------------
10, A
10, A,B
10, A,B,C
10, A,B,C,D
20, A,B,C,D,P
20, A,B,C,D,P,Q
20, A,B,C,D,P,Q,R
20, A,B,C,D,P,Q,R,S
Q2. Design a mapping to load a target table with the following values from the
above source?
Department_no, Employee_list
----------------------------
10, A
10, A,B
10, A,B,C
10, A,B,C,D
20, P
20, P,Q
20, P,Q,R
20, P,Q,R,S
. Use sorted input: Sort the data before passing into aggregator. The integration
service uses
memory to process the aggregator transformation and it does not use cache memory.
. Filter the unwanted data before aggregating.
. Limit the number of input/output or output ports to reduce the amount of data the
aggregator
transformation stores in the data cache.
. AVG
. COUNT
. FIRST
. LAST
. MAX
. MEDIAN
. MIN
. PERCENTILE
. STDDEV
. SUM
. VARIANCE
5. Why cannot you use both single level and nested aggregate functions in a single
aggregate
transformation?
The nested aggregate function returns only one output row, whereas the single level
aggregate
function returns more than one row. Since the number of rows returned are not same,
you cannot
use both single level and nested aggregate functions in the same transformation. If
you include both
the single level and nested functions in the same aggregator, the designer marks
the mapping or
mapplet as invalid. So, you need to create separate aggregator transformations.
6. Up to how many levels, you can nest the aggregate functions?
We can nest up to two levels only.
Example: MAX( SUM( ITEM ) )
7. What is incremental aggregation?
The integration service performs aggregate calculations and then stores the data in
historical cache.
Next time when you run the session, the integration service reads only new data and
uses the
historical cache to perform new aggregation calculations incrementally.
8. Why cannot we use sorted input option for incremental aggregation?
In incremental aggregation, the aggregate calculations are stored in historical
cache on the server.
In this historical cache the data need not be in sorted order. If you give sorted
input, the records
come as presorted for that particular run but in the historical cache the data may
not be in the sorted
order. That is why this option is not allowed.
9. How the NULL values are handled in Aggregator?
You can configure the integration service to treat null values in aggregator
functions as NULL or
zero. By default the integration service treats null values as NULL in aggregate
functions.
. You cannot use a joiner transformation when input pipeline contains an update
strategy
transformation.
. You cannot use a joiner if you connect a sequence generator transformation
directly before the
joiner.
. Normal join: In a normal join, the integration service discards all the rows from
the master and
detail source that do not match the join condition.
. Master outer join: A master outer join keeps all the rows of data from the detail
source and the
matching rows from the master source. It discards the unmatched rows from the
master source.
. Detail outer join: A detail outer join keeps all the rows of data from the master
source and the
matching rows from the detail source. It discards the unmatched rows from the
detail source.
. Full outer join: A full outer join keeps all rows of data from both the master
and detail rows.
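As a rough guide, the four join types map to standard SQL joins as sketched below; the tables master and detail and the id join column are hypothetical names used only for illustration.

-- Rough SQL equivalents of the joiner join types (illustrative only):
SELECT * FROM detail d INNER JOIN      master m ON d.id = m.id;  -- Normal join
SELECT * FROM detail d LEFT OUTER JOIN master m ON d.id = m.id;  -- Master outer join (all detail rows kept)
SELECT * FROM master m LEFT OUTER JOIN detail d ON m.id = d.id;  -- Detail outer join (all master rows kept)
SELECT * FROM detail d FULL OUTER JOIN master m ON d.id = m.id;  -- Full outer join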
. Get a related value: Retrieve a value from the lookup table based on a value in
the source.
. Perform a calculation: Retrieve a value from a lookup table and use it in a
calculation.
. Update slowly changing dimension tables: Determine whether rows exist in a
target.
. Connected lookup transformation receives input values directly from the pipeline.
Unconnected
lookup transformation receives input values from the result of a :LKP expression in
another
transformation.
. Connected lookup transformation can be configured as dynamic or static cache.
Unconnected
lookup transformation can be configured only as static cache.
. Connected lookup transformation can return multiple columns from the same row or
insert into
the dynamic lookup cache. Unconnected lookup transformation can return one column
from
each row.
. If there is no match for the lookup condition, connected lookup transformation
returns default
value for all output ports. If you configure dynamic caching, the Integration
Service inserts rows
into the cache or leaves it unchanged. If there is no match for the lookup
condition, the
unconnected lookup transformation returns null.
. In a connected lookup transformation, the cache includes the lookup source
columns in the
lookup condition and the lookup source columns that are output ports. In an
unconnected
lookup transformation, the cache includes all lookup/output ports in the lookup
condition and the
lookup/return port.
. Connected lookup transformation passes multiple output values to another
transformation.
Unconnected lookup transformation passes one output value to another
transformation.
. Connected lookup transformation supports user-defined default values. Unconnected lookup
transformation does not support user-defined default values.
. Insert Else Update option applies to rows entering the lookup transformation with the row type of
insert. When this option is enabled, the integration service inserts new rows in the cache and
updates existing rows. When disabled, the integration service does not update existing rows.
. Update Else Insert option applies to rows entering the lookup transformation with
the row type of
update. When this option is enabled, the Integration Service updates existing rows,
and inserts
a new row if it is new. When disabled, the Integration Service does not insert new
rows.
. Persistent cache
. Recache from lookup source
. Static cache
. Dynamic cache
. Shared Cache
. Pre-build lookup cache
. Sequential cache: The Integration Service builds lookup caches sequentially. The
Integration
Service builds the cache in memory when it processes the first row of the data in a
cached
lookup transformation.
. Concurrent caches: The Integration Service builds lookup caches concurrently. It
does not need
to wait for data to reach the Lookup transformation.
13. How does the integration service build the caches for an unconnected lookup transformation?
The Integration Service builds caches for unconnected Lookup transformations sequentially.
14. What is a dynamic cache?
The dynamic cache represents the data in the target. The Integration Service builds
the cache when
it processes the first lookup request. It queries the cache based on the lookup
condition for each row
that passes into the transformation. The Integration Service updates the lookup
cache as it passes
rows to the target. The integration service either inserts the row in the cache or
updates the row in
the cache or makes no change to the cache.
15. When you use a dynamic cache, do you need to associate each lookup port with
the input port?
Yes. You need to associate each lookup/output port with the input/output port or a
sequence ID. The
Integration Service uses the data in the associated port to insert or update rows
in the lookup cache.
. 0 - Integration Service does not update or insert the row in the cache.
. 1 - Integration Service inserts the row into the cache.
. 2 - Integration Service updates the row in the cache.
. The integration service increments the generated key sequence number each time it processes a
source row. When the source row contains a multiple-occurring column or a multiple-occurring
group of columns, the normalizer transformation returns a row for each occurrence. Each row
contains the same generated key value.
. The normalizer transformation has a generated column ID (GCID) port for each
multiple-
occurring column. The GCID is an index for the instance of the multiple-occurring
data. For
example, if a column occurs 3 times in a source record, the normalizer returns a
value of 1,2 or
3 in the generated column ID.
4. What is VSAM?
VSAM (Virtual Storage Access Method) is a file access method for IBM mainframe operating
systems. VSAM organizes records in indexed or sequential flat files.
5. What is VSAM normalizer transformation?
The VSAM normalizer transformation is the source qualifier transformation for a COBOL source
definition. A COBOL source is a flat file that can contain multiple-occurring data and multiple types of
records in the same file.
6. What is pipeline normalizer transformation?
Pipeline normalizer transformation processes multiple-occurring data from
relational tables or flat
files.
7. What is occurs clause and redefines clause in normalizer transformation?
. An occurs clause is specified when the source row has multiple-occurring columns.
. A redefines clause is specified when the source has rows of multiple record types.
. Start Value
. Increment By
. End Value
. Current Value
. Cycle
. Number of Cached Values
. Join two or more tables originating from the same source database (homogeneous sources).
. Filter the rows.
. Sort the data
. Selecting distinct values from the source
. Create custom query
. Specify a pre-sql and post-sql
. SQL Query
. User-Defined Join
. Source Filter
. Number of Sorted Ports
. Select Distinct
. Pre-SQL
. Post-SQL
. Mode: Specifies the mode in which SQL transformation runs. SQL transformation
supports two
modes. They are script mode and query mode.
. Database type: The type of database that SQL transformation connects to.
. Connection type: Pass database connection to the SQL transformation at run time
or specify a
connection object.
. Script mode: The SQL transformation runs scripts that are externally located. You
can pass a
script name to the transformation with each input row. The SQL transformation
outputs one row
for each input row.
. Query mode: The SQL transformation executes a query that you define in a query
editor. You
can pass parameters to the query to define dynamic queries. You can output multiple
rows
when the query has a SELECT statement.
5. When you configure an SQL transformation to run in script mode, what are the
ports that the
designer adds to the SQL transformation?
The designer adds the following ports to the SQL transformation in script mode:
. ScriptName: This is an input port. ScriptName receives the name of the script to
execute the
current row.
. ScriptResult: This is an output port. ScriptResult returns PASSED if the script
execution
succeeds for the row. Otherwise it returns FAILED.
. ScriptError: This is an output port. ScriptError returns the errors that occur
when a script fails for
a row.
6. What are the types of SQL queries you can specify in the SQL transformation when
you use it in
query mode.
. Static SQL query: The query statement does not change, but you can use query
parameters to
change the data. The integration service prepares the query once and runs the query
for all
input rows.
. Dynamic SQL query: The query statement can be changed. The integration service
prepares a
query for each input row.
7. What are the types of connections to connect the SQL transformation to the
database available?
. Static connection: Configure the connection object in the session. You must first create the
connection object in workflow manager.
. Logical connection: Pass a connection name to the SQL transformation as input
data at run
time. You must first create the connection object in workflow manager.
. Full database connection: Pass the connect string, user name, password and other
connection
information to SQL transformation input ports at run time.
8. How do you find the number of rows inserted, updated or deleted in a table?
You can enable the NumRowsAffected output port to return the number of rows
affected by the
INSERT, UPDATE or DELETE query statements in each input row. This NumRowsAffected
option
works in query mode.
10. When you enable the NumRowsAffected output port in script mode, what will be
the output?
11. How do you limit the number of rows returned by the select statement?
You can limit the number of rows by configuring the Max Output Row Count property.
To configure
unlimited output rows, set Max Output Row Count to zero.
. Check the status of a target database before loading data into it.
. Determine if enough space exists in a database.
. Perform a specialized calculation.
. Drop and recreate indexes.
. Run a stored procedure every time a row passes through the mapping.
. Pass parameters to the stored procedure and receive multiple output parameters.
7. What are the options available to specify when the stored procedure
transformation needs to be
run?
The following options describe when the stored procedure transformation runs:
. Normal: The stored procedure runs where the transformation exists in the mapping
on a row-by-
row basis. This is useful for calling the stored procedure for each row of data
that passes
through the mapping, such as running a calculation against an input port. Connected
stored
procedures run only in normal mode.
. Pre-load of the Source: Before the session retrieves data from the source, the
stored procedure
runs. This is useful for verifying the existence of tables or performing joins of
data in a
temporary table.
. Post-load of the Source: After the session retrieves data from the source, the
stored procedure
runs. This is useful for removing temporary tables.
. Pre-load of the Target: Before the session sends data to the target, the stored
procedure runs.
This is useful for verifying target tables or disk space on the target system.
. Post-load of the Target: After the session sends data to the target, the stored
procedure runs.
This is useful for re-creating indexes on the database.
2. As the union transformation gives UNION ALL output, how will you get the UNION output?
Pass the output of union transformation to a sorter transformation. In the
properties of sorter
transformation check the option select distinct. Alternatively you can pass the
output of union
transformation to aggregator transformation and in the aggregator transformation
specify all ports as
group by ports.
3. What are the guidelines to be followed while using union transformation?
The following rules and guidelines need to be taken care while working with union
transformation:
. You can create multiple input groups, but only one output group.
. All input groups and the output group must have matching ports. The precision,
datatype, and
scale must be identical across all groups.
. The Union transformation does not remove duplicate rows. To remove duplicate
rows, you must
add another transformation such as a Router or Filter transformation.
. You cannot use a Sequence Generator or Update Strategy transformation upstream
from a
Union transformation.
. The Union transformation does not generate transactions.
4. If you place an aggregator after the update strategy transformation, how the
output of aggregator
will be affected?
The update strategy transformation flags the rows for insert, update, delete or reject before you
perform the aggregate calculation. How you flag a particular row determines how the
aggregator
transformation treats any values in that row used in the calculation. For example,
if you flag a row for
delete and then later use the row to calculate the sum, the integration service
subtracts the value
appearing in this row. If the row had been flagged for insert, the integration
service would add its
value to the sum.
5. How to update the target table without using update strategy transformation?
In the session properties, there is an option 'Treat Source Rows As'. Using this
option you can
specify whether all the source rows need to be inserted, updated or deleted.
6. If you have an update strategy transformation in the mapping, what should be the
value selected
for 'Treat Source Rows As' option in session properties?
The value selected for the option is 'Data Driven'. The integration service follows
the instructions
coded in the update strategy transformation.
7. If you have an update strategy transformation in the mapping and you did not select the value
'Data Driven' for the 'Treat Source Rows As' option in the session, then how will the session behave?
If you do not choose Data Driven when a mapping contains an Update Strategy or
Custom
transformation, the Workflow Manager displays a warning. When you run the session,
the Integration
Service does not follow instructions in the Update Strategy transformation in the
mapping to
determine how to flag rows.
8. In which files the data rejected by update strategy transformation will be
written?
If the update strategy transformation is configured to Forward Rejected Rows then
the integration
service forwards the rejected rows to next transformation and writes them to the
session reject file. If
you do not select the forward reject rows option, the integration service drops
rejected rows and
writes them to the session log file. If you enable row error handling, the
Integration Service writes the
rejected rows and the dropped rows to the row error logs. It does not generate a
reject file.
2. What is the commit type if you have a transaction control transformation in the
mapping?
The commit type is "user-defined".
. TC_CONTINUE_TRANSACTION: The Integration Service does not perform any transaction
change for this row. This is the default value of the expression.
. TC_COMMIT_BEFORE: The Integration Service commits the transaction, begins a new
transaction, and writes the current row to the target. The current row is in the
new transaction.
. TC_COMMIT_AFTER: The Integration Service writes the current row to the target,
commits the
transaction, and begins a new transaction. The current row is in the committed
transaction.
. TC_ROLLBACK_BEFORE: The Integration Service rolls back the current transaction,
begins a
new transaction, and writes the current row to the target. The current row is in
the new
transaction.
. TC_ROLLBACK_AFTER: The Integration Service writes the current row to the target,
rolls back
the transaction, and begins a new transaction. The current row is in the rolled
back transaction.
. Input
. Output
. User-defined group
. Default group
The Dump function returns a varchar2 value that includes the data type code, length
in bytes and the
internal representation of the expression.
The syntax of dump function is
dump(expression, [return_format],[start_position],[length])
The start_position and length indicate which portion of the internal representation to display.
The return_format specifies the return format. The various return formats and their descriptions are
provided below:
return format value, description
8 octal notation
10 decimal notation
16 hexadecimal notation
17 single characters
1008 octal notation with the character set name
1010 decimal notation with the character set name
1016 hexadecimal notation with the character set name
1017 single characters with the character set name
Example: dump('oracle') --Typ=96 Len=6: 111,114,97,99,108,101
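A few more sample calls are sketched below to show how the optional arguments can be combined; the exact output depends on the database character set.

SELECT DUMP('oracle', 16) FROM dual;        -- hexadecimal notation
SELECT DUMP('oracle', 1017) FROM dual;      -- single characters with the character set name
SELECT DUMP('oracle', 10, 1, 3) FROM dual;  -- decimal notation of the first 3 bytes only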
Data Mining
Data mining is the process of finding patterns from large data sets and analyzing
data from different
perspectives. It allows business users to analyze data from different angles and
summarize the
relationships identified. Data mining can be useful in increasing revenue and cutting costs.
Example:
In a supermarket, the persons who bought a tooth brush on Sundays also bought tooth paste. This
information can be used to increase revenue by providing an offer on tooth brush and tooth paste,
thereby selling more products (tooth paste and tooth brush) on Sundays.
Data mining process:
Data mining analyzes relationships and patterns in the stored data based on user
queries. Data
mining involves four tasks.
. Association: Find the relationship between the variables. For example, in a retail store, we can
determine which products are bought together frequently and this information can be used to
market these products.
. Clustering: Identifying the logical relationships in the data items and grouping them. For example,
in a retail store, tooth paste and tooth brush can be logically grouped.
. Classifying: Involves applying a known pattern to the new data.
Data Mart
Data Profiling
Data profiling is the process of examining the quality of data available in the
data source (database
or file) and collecting statistics and information about the data. A very clean
data source that has
been well maintained before it reaches the data warehouse requires minimal
transformations and
human intervention to load the data into the facts and dimensions. A good data
profiling system can
process very large amounts of data with ease.
Dirty data in the source, on the other hand, requires more transformations and human intervention
before it can be loaded into the data warehouse.
Data warehouse design is one of the key techniques in building the data warehouse. Choosing the
right data warehouse design can save the project time and cost. Basically, two data warehouse
design approaches are popular.
Bottom-Up Design:
In the bottom-up design approach, the data marts are created first to provide
reporting capability. A
data mart addresses a single business area such as sales, Finance etc. These data
marts are then
integrated to build a complete data warehouse. The integration of data marts is
implemented using
data warehouse bus architecture. In the bus architecture, a dimension is shared
between facts in
two or more data marts. These dimensions are called conformed dimensions. These
conformed
dimensions are integrated from data marts and then data warehouse is built.
Advantages of bottom-up design are:
. This model contains consistent data marts and these data marts can be delivered
quickly.
. As the data marts are created first, reports can be generated quickly.
. The data warehouse can be extended easily to accommodate new business units. It
is just
creating new data marts and then integrating with other data marts.
. The positions of the data warehouse and the data marts are reversed in the
bottom-up
approach design.
Top-Down Design:
In the top-down design approach, the data warehouse is built first. The data marts are then created
from the data warehouse.
Advantages of top-down design are:
. Provides consistent dimensional views of data across data marts, as all data
marts are loaded
from the data warehouse.
. This approach is robust against business changes. Creating a new data mart from
the data
warehouse is very easy.
Extraction Methods in Data Warehouse
The extraction methods in a data warehouse depend on the source system, performance and
business requirements. There are two types of extractions, logical and physical. We will see in
detail about the logical and physical extraction methods.
Logical extraction
There are two types of logical extraction methods:
Full Extraction: Full extraction is used when the data needs to be extracted and
loaded for the first
time. In full extraction, the data from the source is extracted completely. This
extraction reflects the
current data available in the source system.
Incremental Extraction: In incremental extraction, the changes in source data need
to be tracked
since the last successful extraction. Only these changes in data will be extracted
and then loaded.
These changes can be detected from the source data which have the last changed
timestamp. Also
a change table can be created in the source system, which keeps track of the
changes in the source
data.
One more method to get the incremental changes is to extract the complete source
data and then do
a difference (minus operation) between the current extraction and last extraction.
This approach
causes a performance issue.
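The minus approach mentioned above can be sketched as below, assuming two hypothetical staging tables that hold the current and the previous full extracts with identical columns.

SELECT * FROM current_extract
MINUS
SELECT * FROM last_extract;  -- rows that are new or changed since the last extraction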
Physical extraction
The data can be extracted physically by two methods:
Online Extraction: In online extraction the data is extracted directly from the
source system. The
extraction process connects to the source system and extracts the source data.
Offline Extraction: The data from the source system is dumped outside of the source
system into a
flat file. This flat file is used to extract the data. The flat file can be created
by a routine process daily.
Logical design:
Logical design deals with the logical relationships between objects. Entity-
relationship (ER) modeling
technique can be used for logical design of data warehouse. ER modeling involves
identifying the
entities (important objects), attributes (properties about objects) and the
relationship among them.
An entity is a chunk of information, which maps to a table in a database. An attribute is a part of an
entity, which maps to a column in a database.
A unique identifier can be used to make sure the data is consistent.
Physical design:
Physical design deals with the effective way of storing and retrieving the data. In
the physical design,
the logical design needs to be converted into a description of the physical
database structures.
Physical design involves creation of the database objects like tables, columns,
indexes, primary
keys, foreign keys, views, sequences etc.
Data Warehouse
A data warehouse is a relational database that is designed for query and business
analysis rather
than for transaction processing. It contains historical data derived from
transaction data. This
historical data is used by the business analysts to understand about the business
in detail.
A data warehouse should have the following characteristics:
Subject oriented: A data warehouse helps in analyzing the data. For example, to know about a
company's sales, a data warehouse needs to be built on sales data. Using this data warehouse we
can find the last year's sales. This ability to define a data warehouse by subject (sales) makes it
subject oriented.
Integrated: Bringing data from different sources and putting it into a consistent format. This
includes resolving the units of measure, naming conflicts etc.
Non volatile: Once the data enters into the data warehouse, the data should not be
updated.
Time variant: To analyze the business, analysts need large amounts of data. So, the
data
warehouse should contain historical data.
The set operators in Oracle are UNION, UNION ALL, INTERSECT and MINUS. These set operators
allow us to combine more than one select statement and only one result set will be returned.
UNION ALL
. UNION ALL selects all rows from all the select statements
. UNION ALL output is not sorted.
. Distinct keyword cannot be used in select statements.
UNION
. UNION is very similar to UNION ALL, but it suppresses duplicate rows from all the
select
statements.
INTERSECT
. INTERSECT returns the rows that are found common in all select statements.
MINUS
. MINUS returns all the rows from the first select statement except those rows which are also
available in the following select statements.
. All the columns in the where clause must be in the select clause for the MINUS
operator to
work.
. Only one ORDER BY clause should be present and it should appear at the very end
of the
statement. The ORDER by clause will accept column names, aliases from the first
select
statement.
. Duplicate rows are automatically eliminated except in UNION ALL
. Column names, aliases from the first query will appear in the result set.
. By default the output is sorted in ascending order of the first column of the
first select statement
except for UNION ALL.
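A small example of the four operators is sketched below, assuming two hypothetical tables table_a and table_b that both have a single column named id.

SELECT id FROM table_a UNION ALL SELECT id FROM table_b;  -- all rows, duplicates kept, no sorting
SELECT id FROM table_a UNION     SELECT id FROM table_b;  -- duplicates removed
SELECT id FROM table_a INTERSECT SELECT id FROM table_b;  -- rows present in both tables
SELECT id FROM table_a MINUS     SELECT id FROM table_b;  -- rows in table_a that are not in table_b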
Informatica Performance Improvement Tips
. Use Source Qualifier if the Source tables reside in the same schema.
. Make use of Source Qualifier "Filter" properties if the source type is relational.
. If the subsequent sessions are doing lookup on the same table, use persistent
cache in the first
session. Data remains in the Cache and available for the subsequent session for
usage.
. Use flags as integer, as the integer comparison is faster than the string
comparison.
. Use tables with lesser number of records as master table for joins.
. While reading from Flat files, define the appropriate data type instead of
reading as String and
converting.
. Have all Ports that are required connected to Subsequent Transformations else
check whether
we can remove these ports.
. Suppress ORDER BY using the '--' at the end of the query in Lookup
Transformations.
. Minimize the number of Update strategies.
. Group by simple columns in transformations like Aggregate, Source Qualifier.
. Use Router transformation in place of multiple Filter transformations.
. Turn off the Verbose Logging while moving the workflows to Production
environment.
. For large volumes of data, drop indexes before loading and recreate the indexes after the load.
. For large volumes of records, use bulk load and increase the commit interval to a higher value.
. Set 'Commit on Target' in the sessions.
The following unix command converts the first letter in a string to upper case and
the remaining
letters to lower case.
echo apple | awk '{print toupper(substr($1,1,1)) tolower(substr($1,2))}'
'tr' command will convert one set of characters to another set. The following
command converts
lower case alphabets in to upper case.
echo "apple" | tr [a-z] [A-Z]
Similarly to convert from upper case to lower case, use the following command
echo "APPLE" | tr [A-Z] [a-z]
The tee command in unix writes the output to multiple files and also displays the
output on terminal.
Example:
date | tee -a file1 file2 file3
For more details look at "man tee"
This function is used to concatenate multiple rows into a single column in mysql.
This function returns a string result with the concatenated non-NULL values from a
group. It returns
NULL if there are no non-NULL values.
Syntax of MySQL Group Concat Function:
GROUP_CONCAT([DISTINCT] expression [ORDER BY expression [ASC | DESC]] [SEPARATOR str_val])
Example: As an example consider the teachers table with the below data.
teacher_id subjects
------------------
10 English
10 Maths
20 Physics
20 Social
After concatenating the subjects of each teacher, the output will look as
teacher_id subjects_list
------------------
10 English,Maths
20 Physics,Social
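A query along the following lines should produce the output shown above, assuming the teachers table has the columns teacher_id and subjects as in the sample data.

SELECT teacher_id,
       GROUP_CONCAT(subjects ORDER BY subjects SEPARATOR ',') AS subjects_list
FROM teachers
GROUP BY teacher_id;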
Sometimes we want to duplicate each row based on a column value. We will see how to solve this
problem with an example. Assume that we have a products table, which has the product name and
the number of products sold.
Table: products
product_name, products_sold
----------------------------
A, 2
B, 3
Now we want to duplicate each row based on the products_sold field, so that the product A record
will repeat 2 times and the product B record will repeat 3 times as shown below:
product_name, products_sold
----------------------------
A, 2
A, 2
B, 3
B, 3
B, 3
The following query will generate the duplicate records that we need:
SELECT product_name,
       products_sold
FROM products p,
     (SELECT rownum repeat FROM dual
      CONNECT BY LEVEL <=
        (SELECT MAX(products_sold) FROM products)
     ) r
WHERE p.products_sold >= r.repeat;
Let's see the conversion of rows to columns with an example. Suppose we have a products table
which looks like
Table: products
product_id, product_name
------------------------
1, AAA
1, BBB
1, CCC
2, PPP
2, QQQ
2, RRR
The output should look like
product_id, product_name_1, product_name_2, product_name_3
-----------------------------------------------------------
1, AAA, BBB, CCC
2, PPP, QQQ, RRR
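One way to get this output is sketched below, assuming the products table shown above and at most three product names per product_id.

SELECT product_id,
       MAX(DECODE(rn, 1, product_name)) AS product_name_1,
       MAX(DECODE(rn, 2, product_name)) AS product_name_2,
       MAX(DECODE(rn, 3, product_name)) AS product_name_3
FROM (
       SELECT product_id,
              product_name,
              ROW_NUMBER() OVER (PARTITION BY product_id
                                 ORDER BY product_name) AS rn
       FROM products
     )
GROUP BY product_id;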
In oracle we can generate sequence numbers from 1 to n by using the below query:
SELECT rownum
FROM dual
CONNECT BY LEVEL<=n;
Replace n with a number.
Flat file header row, footer row and detail rows to multiple tables
Assume that we have a flat file with a header row, a footer row and detail rows. Now let's see how to
load the header row into one table, the footer row into another table and the detail rows into a third
table just by using transformations only.
First pass the data from the source qualifier to an expression transformation. In the expression
transformation assign a unique number to each row (assume an exp_count port). After that pass the
data from the expression to an aggregator. In the aggregator transformation do not check any group
by port, so that the aggregator will provide the last row as the default output (assume an agg_count
port).
Now pass the data from the expression and the aggregator to a joiner transformation. In the joiner
select the ports from the aggregator as master and the ports from the expression as detail. Give the
join condition on the count ports and select the join type as master outer join. Pass the joiner output
to a router transformation and create two groups in the router. For the first group give the condition
as exp_count=1, which gives the header row. For the second group give the condition as
exp_count=agg_count, which gives the footer row. The default group will give the detail rows.
Note: Check the sorted input option in the joiner properties. Otherwise you can't
connect the data
from expression and aggregator.
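For comparison, the same header/footer/detail split can be sketched in plain SQL, assuming the file has been loaded into a hypothetical staging table stg_rows with a line_no column assigned at load time; the mapping above achieves the same result with transformations only.

SELECT * FROM stg_rows WHERE line_no = 1;                                    -- header row
SELECT * FROM stg_rows WHERE line_no = (SELECT MAX(line_no) FROM stg_rows);  -- footer row
SELECT * FROM stg_rows
WHERE line_no <> 1
  AND line_no <> (SELECT MAX(line_no) FROM stg_rows);                        -- detail rows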
Let's see the conversion of columns to rows with an example. Suppose we have a table which
contains the subjects handled by each teacher. The table looks like
Table: teachers
teacher_id, subject1, subject2, subject3
----------------------------------------
1, maths, physics, english
2, social, science, drawing
The output should look like
teacher_id, subject
-------------------
1, maths
1, physics
1, english
2, social
2, science
2, drawing
To achieve this we need each row in teachers table to be repeated 3 times (number
of subject
columns). The following query converts the columns into rows:
SELECT teacher_id,
CASE pivot
WHEN 1
THEN subject1
WHEN 2
THEN subject2
WHEN 3
THEN subject3
ELSE NULL
END subject
FROM teachers,
(SELECT rownum pivot from dual
CONNECT BY LEVEL <=3)
1. Informatica 9 supports data integration for the cloud as well as on premise. You
can integrate the
data in cloud applications, as well as run Informatica 9 on cloud infrastructure.
2. Informatica analyst is a new tool available in Informatica 9.
3. There are architectural differences in Informatica 9 compared to the previous versions.
There are four types of schemas available in a data warehouse, out of which the star schema is the
one mostly used in data warehouse designs. The second mostly used data warehouse schema is the
snowflake schema. We will see about these schemas in detail.
Star Schema:
A star schema is the one in which a central fact table is surrounded by denormalized dimensional
tables. A star schema can be simple or complex. A simple star schema consists of one fact table
whereas a complex star schema has more than one fact table.
Galaxy Schema:
Galaxy schema contains many fact tables with some common dimensions (conformed
dimensions).
This schema is a combination of many data marts.
A fact table is the one which consists of the measurements, metrics or facts of
business process.
These measurable facts are used to know the business value and to forecast the
future business.
The different types of facts are explained in detail below.
Additive:
Additive facts are facts that can be summed up through all of the dimensions in the
fact table. A
sales fact is a good example for additive fact.
Semi-Additive:
Semi-additive facts are facts that can be summed up for some of the dimensions in
the fact table,
but not the others.
Eg: Daily balances fact can be summed up through the customers dimension but not
through the
time dimension.
Non-Additive:
Non-additive facts are facts that cannot be summed up for any of the dimensions
present in the fact
table.
Eg: Facts which have percentages, ratios calculated.
In the real world, it is possible to have a fact table that contains no measures or
facts. These tables
are called "Factless Fact tables".
Eg: A fact table which has only a product key and a date key is a factless fact. There are no
measures in this table. But still you can get the number of products sold over a period of time.
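Counting from such a factless fact table can be sketched as below; the table and column names are hypothetical.

SELECT date_key,
       COUNT(*) AS products_sold
FROM sales_fact
GROUP BY date_key;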
Fact tables that contain aggregated facts are often called summary tables.
Types of Dimensions in data warehouse
A dimension table consists of the attributes about the facts. Dimensions store the textual
descriptions of the business. Without the dimensions, we cannot measure the facts. The different
types of dimension tables are explained in detail below.
Conformed Dimension:
Conformed dimensions mean the exact same thing with every possible fact table to
which they are
joined.
Eg: The date dimension table connected to the sales facts is identical to the date
dimension
connected to the inventory facts.
Junk Dimension:
A junk dimension is a collection of random transactional codes, flags and/or text attributes that are
unrelated to any particular dimension. The junk dimension is simply a structure
that provides a
convenient place to store the junk attributes.
Eg: Assume that we have a gender dimension and marital status dimension. In the
fact table we
need to maintain two keys referring to these dimensions. Instead of that create a
junk dimension
which has all the combinations of gender and marital status (cross join gender and
marital status
table and create a junk table). Now we can maintain only one key in the fact table.
Degenerated Dimension:
A degenerate dimension is a dimension which is derived from the fact table and
doesn't have its own
dimension table.
Eg: A transactional code in a fact table.
Role-playing dimension:
Dimensions which are often used for multiple purposes within the same database are
called role-
playing dimensions. For example, a date dimension can be used for "date of sale", as well as "date
of delivery", or "date of hire".
Concatenating multiple rows into a single column dynamically - Oracle
Teacher_id subject_name
-----------------------
1 Biology
1 Maths
1 Physics
2 English
2 Social
The above table is a normalized table containing the subjects and teacher id. We
will denormalize
the table, by concatenating the subjects of each teacher into a single column and
thus preserving
the teacher id as unique in the output. The output data should look like as below
teacher_id subjects_list
-------------------------------
1 Biology|Maths|Physics
2 English|Social
SELECT teacher_id,
       SUBSTR(SYS_CONNECT_BY_PATH(subject_name, '|'), 2) subjects_list
FROM (
       -- sub_seq and sub_cnt are referenced below but their definitions were
       -- lost; they are reconstructed here with analytic functions, assuming
       -- the subjects of each teacher are ordered by subject_name.
       SELECT teacher_id,
              subject_name,
              ROW_NUMBER() OVER (PARTITION BY teacher_id ORDER BY subject_name) sub_seq,
              COUNT(*) OVER (PARTITION BY teacher_id) sub_cnt
       FROM teachers
     ) A
WHERE sub_seq = sub_cnt
START WITH sub_seq = 1
CONNECT BY PRIOR sub_seq + 1 = sub_seq
       AND PRIOR teacher_id = teacher_id;