How to merge the contents of one file with the contents of another file

In Unix on 04/03/2010 by pier0w

This is how you use AWK to take the contents of one file and then insert that content into a specific place within another file.

Say we have file1 and file2 that have the following contents:

file1:
one, NUM, three
four, NUM, six
seven, NUM, nine
ten, NUM, twelve

file2:
two
five
eight
eleven
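
If you want to follow along, the two files can be created with something like the sketch below. The scratch directory and the variable names (workdir, file1_lines, file2_lines) are assumptions made for illustration and are not part of the original example.

```shell
# Work in a scratch directory so nothing in the current directory is overwritten.
workdir=$(mktemp -d)

# file1: the template lines, each with a NUM placeholder as the second item.
printf '%s\n' 'one, NUM, three' 'four, NUM, six' \
              'seven, NUM, nine' 'ten, NUM, twelve' > "$workdir/file1"

# file2: the replacement values, one per line, in the same order as file1.
printf '%s\n' 'two' 'five' 'eight' 'eleven' > "$workdir/file2"

# Count the lines in each file; the merge below pairs them up line by line.
file1_lines=$(awk 'END{print NR}' "$workdir/file1")
file2_lines=$(awk 'END{print NR}' "$workdir/file2")
echo "file1: $file1_lines lines, file2: $file2_lines lines"
```

The rest of the article assumes both files have the same number of lines.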

First we need to get AWK to parse both files. This can be done as follows:

~#: awk '{print $0}' file2 file1
two
five
eight
eleven
one, NUM, three
four, NUM, six
seven, NUM, nine
ten, NUM, twelve

So you can see it is very easy to get AWK to parse multiple files: all you have to do is add as many files as you like as the final arguments. AWK reads them in the order they are given.
Next we need to get AWK to run one set of code over the first file and another set of code over the next file. This can be done with the NR and FNR built-in variables. NR is a number that increments for every line that AWK parses, counted across all of the files.

~#: awk '{print NR "\t" $0}' file2 file1
1 two
2 five
3 eight
4 eleven
5 one, NUM, three
6 four, NUM, six
7 seven, NUM, nine
8 ten, NUM, twelve

FNR, on the other hand, is incremented for every line that AWK parses within the current file, and it resets to 1 at the start of each new file.

~#: awk '{print FNR "\t" $0}' file2 file1
1 two
2 five
3 eight
4 eleven
1 one, NUM, three
2 four, NUM, six
3 seven, NUM, nine
4 ten, NUM, twelve
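
To make the difference easier to see, the two counters can be printed side by side. This is only a sketch: the scratch directory and the variable names workdir and counters are assumptions for illustration.

```shell
# Recreate the example files in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'one, NUM, three' 'four, NUM, six' \
              'seven, NUM, nine' 'ten, NUM, twelve' > "$workdir/file1"
printf '%s\n' 'two' 'five' 'eight' 'eleven' > "$workdir/file2"

# Print NR (running total), then FNR (count within the current file), then the line.
counters=$(awk '{print NR "\t" FNR "\t" $0}' "$workdir/file2" "$workdir/file1")
printf '%s\n' "$counters"
```

The two columns match for every line of file2, and start to differ on the first line of file1, where NR carries on to 5 while FNR goes back to 1.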

So to get AWK to run some code over only the first file we can use a conditional to check whether the global count equals the per-file count, and only run the code when this is true.

~#: awk 'NR==FNR{print "file2\t" $0; next}{print "file1\t" $0}' file2 file1
file2 two
file2 five
file2 eight
file2 eleven
file1 one, NUM, three
file1 four, NUM, six
file1 seven, NUM, nine
file1 ten, NUM, twelve

In the code above we check to see if we are processing the first file with the “NR==FNR” conditional. If this is true we run the code in the curly braces directly to the right of the conditional. The conditional only applies to that code block and not to the second one, which means that the second code block would be run for every line. So to stop this we use the next statement within the first code block. This is similar to “continue;” in a loop: it tells AWK to stop processing the current line and move on to the next one.
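
One thing to watch out for with “NR==FNR” is an empty first file: no lines are read from it, so NR and FNR are still equal while AWK reads the second file, and the first code block runs over the wrong file. A possible alternative, sketched below, is to compare the built-in FILENAME variable against ARGV[1], the first file argument. The variable names workdir, by_counters and by_filename are assumptions for illustration.

```shell
# An empty file2 followed by a two-line file1.
workdir=$(mktemp -d)
: > "$workdir/file2"
printf '%s\n' 'one, NUM, three' 'four, NUM, six' > "$workdir/file1"

# NR==FNR is still true while reading file1, so its lines are labelled "file2".
by_counters=$(awk 'NR==FNR{print "file2\t" $0; next}{print "file1\t" $0}' \
    "$workdir/file2" "$workdir/file1")

# Comparing the current file name with the first argument labels them correctly.
by_filename=$(awk 'FILENAME==ARGV[1]{print "file2\t" $0; next}{print "file1\t" $0}' \
    "$workdir/file2" "$workdir/file1")

printf 'NR==FNR:\n%s\n' "$by_counters"
printf 'FILENAME==ARGV[1]:\n%s\n' "$by_filename"
```

For the files used in this article the two conditions behave the same, so the rest of the examples stick with “NR==FNR”.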

Now that we have separate code blocks running over each file we need to insert the text from the first file into the second. To do this we will need to record the contents of the first file into an array to be used when parsing the second file.

~#: awk 'NR==FNR{array[FNR]=$1; next}{print "file1\t" $0 "\tfile2\t" array[FNR]}' file2 file1
file1 one, NUM, three file2 two
file1 four, NUM, six file2 five
file1 seven, NUM, nine file2 eight
file1 ten, NUM, twelve file2 eleven

In the first code block we have built the array from the first field of each line in file2, indexed by file2’s line numbers: array[FNR]=$1. Then we have appended the contents of the array onto the end of each line of the second file, this time using the second file’s line numbers: print "file1\t" $0 "\tfile2\t" array[FNR]. Because FNR resets for each file, line 1 of file1 is paired with line 1 of file2, and so on.
Now that we are able to use the data of file2 with the data of file1 we can do a search and replace. This can be done with AWK’s gsub function. The first argument is a regular expression to search for and the second argument is the text to replace each match with. With no third argument, gsub works on the whole current line ($0). So gsub(/one/,"two") would be the same as :s/one/two/g in vim.

~#: awk 'NR==FNR{array[FNR]=$0; next}{gsub(/NUM/,array[FNR]);print}' file2 file1
one, two, three
four, five, six
seven, eight, nine
ten, eleven, twelve
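
The command above assumes file2 has at least as many lines as file1. If it has fewer, array[FNR] is empty for the extra lines and NUM is replaced with nothing. One way to leave those lines untouched is to test whether the index exists before calling gsub. This is a sketch under that assumption, and the variable names workdir, unguarded and guarded are made up for illustration.

```shell
# file1 has two lines but file2 only has one replacement value.
workdir=$(mktemp -d)
printf '%s\n' 'one, NUM, three' 'four, NUM, six' > "$workdir/file1"
printf '%s\n' 'two' > "$workdir/file2"

# Without a check, the second NUM is replaced with an empty string.
unguarded=$(awk 'NR==FNR{array[FNR]=$0; next}{gsub(/NUM/,array[FNR]);print}' \
    "$workdir/file2" "$workdir/file1")

# With "(FNR in array)" the replace only runs when file2 had a matching line;
# the final {print} block still prints every line of file1.
guarded=$(awk 'NR==FNR{array[FNR]=$0; next}(FNR in array){gsub(/NUM/,array[FNR])}{print}' \
    "$workdir/file2" "$workdir/file1")

printf 'unguarded:\n%s\n' "$unguarded"
printf 'guarded:\n%s\n' "$guarded"
```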

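There is also another way to carry out this replace, which is to replace the field instead of the string. AWK splits each line on whitespace by default, so the second field of file1 is “NUM,” including the comma. That is why a comma is added back after the value from the array.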

~#: awk 'NR==FNR{array[FNR]=$0; next}{$2=array[FNR]",";print}' file2 file1
one, two, three
four, five, six
seven, eight, nine
ten, eleven, twelve
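
The two approaches give the same result here, but they can differ when the replacement text contains an ampersand. In the replacement argument of gsub an unescaped & stands for the matched text, whereas assigning to a field uses the value literally. The sketch below shows the difference with a made-up file2 containing “R&D”; the variable names workdir, with_gsub and with_field are also assumptions for illustration.

```shell
# A one-line file1 and a replacement value that contains an ampersand.
workdir=$(mktemp -d)
printf '%s\n' 'one, NUM, three' > "$workdir/file1"
printf '%s\n' 'R&D' > "$workdir/file2"

# gsub expands & to the matched text, so NUM ends up inside the replacement.
with_gsub=$(awk 'NR==FNR{array[FNR]=$0; next}{gsub(/NUM/,array[FNR]);print}' \
    "$workdir/file2" "$workdir/file1")

# Assigning to the field copies the value as-is.
with_field=$(awk 'NR==FNR{array[FNR]=$0; next}{$2=array[FNR]",";print}' \
    "$workdir/file2" "$workdir/file1")

echo "gsub:  $with_gsub"
echo "field: $with_field"
```

So if the values in file2 might contain &, the field version is probably the safer of the two.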
