Could anybody explain this code?

Hi,

I have a shell script and it looks like this:

#!/bin/bash
for i in *dat.gz
do
gunzip $i
echo uniprot_sprot_archaea.dat | perl -slane '$a=(split /\_/, $_)[2]; $a=~/(\w+).dat/; $b=$1; print "perl screen_complete_proteome_from_uniprot_division.pl \$i >> uniprot_".$b.".fasta"' -- -i=$i
done

I don't know how to code, but I need to understand these Perl commands. I don't understand anything from the echo to the end of the command. Could someone please explain it?

Thanks a ton.

5 Comments

Looks like redundant nonsense to me. The perl is rather clumsy but it splits the input, which is always "uniprot_sprot_archaea.dat", on underscore characters and extracts the third part, so that's always "archaea.dat", then it finds a word before the ".dat", so that's always "archaea", and then prints another perl command line assembled from that, which will always be:
perl screen_complete_proteome_from_uniprot_division.pl $i >> uniprot_archaea.fasta
Then finally there's an apparently meaningless "-- -i=$i" on the end.

My best guess is that this was intended to construct and run some perl command against each of the unzipped files but it didn't work so someone modified it slightly in an attempt to debug it (trying to see what commands it would have run) and then gave up and left it in a broken state.
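If that guess is right, a dry-run version might look something like the sketch below. Everything in it is an assumption: build_command is a hypothetical helper, the files are assumed to be named uniprot_sprot_<division>.dat.gz, and the output file name is derived from each input file rather than from a fixed string. It only prints the commands; it does not unzip or run anything.

```shell
# Hypothetical helper: build the command line for one compressed file.
# Assumes names of the form uniprot_sprot_<division>.dat.gz.
build_command() {
    local gz=$1
    local dat=${gz%.gz}            # e.g. uniprot_sprot_archaea.dat
    local division=${dat##*_}      # e.g. archaea.dat
    division=${division%.dat}      # e.g. archaea
    printf 'perl screen_complete_proteome_from_uniprot_division.pl %s >> uniprot_%s.fasta\n' "$dat" "$division"
}

# Dry run: print one command per matching file instead of executing it.
for gz in *dat.gz; do
    [ -e "$gz" ] || continue       # skip when the glob matches nothing
    build_command "$gz"
done
```

Doing the name extraction with shell parameter expansion avoids the nested perl call entirely; whether that matches what the original author wanted is unknown.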

Mixture of bash and perl, aimed at generating another perl script, I think.


1) For every file ending in dat.gz, bash puts that file name in $i and does the following.
2) It unarchives that file.
3) The next bit passes the string 'uniprot_sprot_archaea.dat' into a perl script, which splits it on "_", takes the third piece and puts it in $a. Then everything before .dat is captured and put into $b. This will always be "archaea" in this context.
4) It then creates a line containing "perl screen_complete_proteome_from_uniprot_division.pl \$i", i.e. a line that, if executed, would call a perl program called screen_complete_proteome_from_uniprot_division.pl, passing to it each of the filenames it has found and put into $i.
5) The output for this print is a file called "uniprot_archaea.fasta".

So at the end of this the file uniprot_archaea.fasta would be a perl script that, if executed, runs screen_complete_proteome_from_uniprot_division.pl on every archive name that ends in dat.gz after unzipping that archive.

The error is probably that the initial extraction would always yield the same result (archaea).

Sorry, I think the new perl command line is actually printed to the screen, and the generated perl copies the results of running screen_complete(too long snip).pl into uniprot_archaea.fasta. Maybe.

As others have said, your script looks like an unfinished attempt to solve a problem, and doesn’t actually do much. It just uncompresses the *dat.gz files in the current directory. For each of the files it also prints a line to the screen, but it’s always the same line, so it obviously serves no purpose.

Btw, blogs.perl.org is (as the name says) not a forum. You will have better luck asking somewhere like https://perlmonks.org/

Merlyn:

The bit at the end is not meaningless. It works with the -s switch that’s passed to the perl call. It makes the value of the -i switch into the value of the variable $i in the code. And because the value of the -i switch is the shell variable $i, the Perl variable $i gets the same value as the shell variable $i: the filename being processed.

(Surprise! Did you know Perl had this feature? 😊 You are likely one of 99.9% of Perl programmers who have never seen its -s switch in their life. (For good reason: it’s basically the register_globals of Perl. The Getopt::* modules are generally a much better idea than setting arbitrary global variables from the command line.))

However, this does not end up doing anything. Because of the backslash in the Perl code, the $i variable is never actually used.

About seq25

I have questions about Perl.