Perl scripts for removing duplicate content (duplicate lines + duplicate array elements)


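Suppose we have the following sequence of pairs:

```
1 2
2 1
1 3
1 4
1 5
4 1
```

and we need to get the following result:

```
1 3
1 5
2 1
4 1
```

Here a pair and its reverse (1 2 and 2 1, 1 4 and 4 1) count as duplicates, so only one orientation of each is kept, and the output is sorted numerically. This can be done with the following Perl script.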

Code 1:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %hash;

print "Please enter the file path, e.g. f:\\perldata.txt\n";
chomp(my $filename = <STDIN>);

# Pass 1: record every distinct (first, second) pair
open(my $in, '<', $filename) or die "can not open $filename: $!";
while (<$in>) {
    chomp;
    my @information = split ' ', $_;
    next unless @information >= 2;    # skip blank or malformed lines
    $hash{$information[0]}{$information[1]} = 'A';
}
close $in;

# Pass 2: if the reversed pair (second, first) also exists, delete this one
open($in, '<', $filename) or die "can not open $filename: $!";
while (<$in>) {
    chomp;
    my @information = split ' ', $_;
    next unless @information >= 2;
    if (exists $hash{$information[1]}{$information[0]}) {
        delete $hash{$information[0]}{$information[1]};
    }
}
close $in;

# Write the surviving pairs, sorted numerically
open(my $out, '>', "f:\\A_B_result.txt") or die "can not open output file: $!";
foreach my $key1 (sort { $a <=> $b } keys %hash) {
    foreach my $key2 (sort { $a <=> $b } keys %{ $hash{$key1} }) {
        print $out "$key1 $key2\n";
    }
}
close $out;
```
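To check the two-pass logic of Code 1 without preparing an input file, the same steps can be wrapped in a subroutine that works on an in-memory list of pairs. This is a minimal sketch, not part of the original script: the subroutine name dedup_mirrored_pairs is hypothetical, and it tests `exists $hash{$y} && exists $hash{$y}{$x}` so that the existence check does not autovivify empty inner hashes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the same two-pass logic as Code 1, applied to an
# in-memory list of [first, second] pairs instead of a file.
sub dedup_mirrored_pairs {
    my @pairs = @_;
    my %hash;

    # Pass 1: record every distinct pair
    $hash{ $_->[0] }{ $_->[1] } = 1 for @pairs;

    # Pass 2: drop a pair while its mirror image is still present
    for my $p (@pairs) {
        my ($x, $y) = @$p;
        delete $hash{$x}{$y} if exists $hash{$y} && exists $hash{$y}{$x};
    }

    # Collect the surviving pairs in numeric order
    my @result;
    for my $x (sort { $a <=> $b } keys %hash) {
        for my $y (sort { $a <=> $b } keys %{ $hash{$x} }) {
            push @result, "$x $y";
        }
    }
    return @result;
}

my @input = ([1, 2], [2, 1], [1, 3], [1, 4], [1, 5], [4, 1]);
print "$_\n" for dedup_mirrored_pairs(@input);
```

With the sample pairs from the beginning of this article, it prints 1 3, 1 5, 2 1 and 4 1, one per line, matching the expected result.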

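Code 2:

Suppose a file named data is 10 GB in size and contains many duplicate lines, and we need to merge each group of duplicate lines into a single line. How can we do this?

The obvious approach is the shell pipeline

```
cat data | sort | uniq > new_data
```

It works, but on a file this size it can take several hours to finish, because sort has to sort the whole file before uniq (which only removes adjacent duplicates) can collapse the repeats.

Below is a small Perl tool that does the job. The principle is simple: build a hash whose keys are the lines of the file and whose values count how many times each line occurs. The file is read once and never sorted; the trade-off is that every distinct line is held in memory. The script writes each distinct line followed by a comma and its count, in Perl's internal hash order. The script is as follows: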

```perl
#!/usr/bin/perl
# Author :CaoJiangfeng
# Date:2011-09-28
# Version :1.0
use warnings;
use strict;

my %hash;
my $script = $0;    # Get the script name

sub usage {
    print "Usage:\n";
    print "perl $script <source_file> <dest_file>\n";
}

# If fewer than 2 parameters are given, print usage and exit with an error status
if (@ARGV < 2) {
    usage();
    exit 1;
}

my $source_file = $ARGV[0];    # File whose duplicate rows should be removed
my $dest_file   = $ARGV[1];    # File written after removing duplicate rows

open(my $file, '<', $source_file) or die "Cannot open file $source_file: $!\n";
open(my $dest, '>', $dest_file)   or die "Cannot open file $dest_file: $!\n";

while (defined(my $line = <$file>)) {
    chomp($line);
    $hash{$line} += 1;    # count occurrences of each distinct line
    # print "$line,$hash{$line}\n";
}

foreach my $k (keys %hash) {
    print $dest "$k,$hash{$k}\n";    # write each line and its occurrence count to the destination file
}

close($file);
close($dest);
```
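Code 2 writes every distinct line together with its count, in hash order. If you only need one copy of each line (the same set of lines that sort | uniq produces), a common alternative is to stream the file and print a line only the first time it is seen, which also preserves the original line order. A minimal sketch, assuming the same two command-line arguments as Code 2; the subroutine name copy_unique_lines is hypothetical. Like Code 2, it keeps every distinct line in memory as a hash key, so it only helps when the distinct lines fit in RAM.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical variant of Code 2: copy each distinct line to the output the
# first time it is seen, instead of writing "line,count" in hash order.
sub copy_unique_lines {
    my ($in, $out) = @_;
    my %seen;
    my $kept = 0;
    while (my $line = <$in>) {
        chomp($line);
        next if $seen{$line}++;    # skip every repeat of a line already written
        print {$out} "$line\n";
        $kept++;
    }
    return $kept;    # number of distinct lines written
}

# Usage: perl uniq_lines.pl <source_file> <dest_file>
if (@ARGV == 2) {
    open(my $in,  '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
    open(my $out, '>', $ARGV[1]) or die "Cannot open $ARGV[1]: $!\n";
    copy_unique_lines($in, $out);
    close($in);
    close($out) or die "Cannot write $ARGV[1]: $!\n";
}
```

Because the subroutine only reads from and writes to filehandles, it can also be fed in-memory handles opened on scalar references, which is convenient for quick checks.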

Code 3:

The following Perl script removes duplicate elements from an array, keeping the first occurrence of each value:

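```perl
#!/usr/bin/perl
use strict;
use warnings;

my %hash;
my @array = (1 .. 10, 5, 20, 2, 3, 4, 5, 5);

# grep keeps the elements for which the block returns true.
# ++$hash{$_} is 1 the first time a value is seen, so "< 2" holds
# only for the first occurrence; later repeats are dropped.
@array = grep { ++$hash{$_} < 2 } @array;

print join(" ", @array);    # prints: 1 2 3 4 5 6 7 8 9 10 20
print "\n";
```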
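The grep idiom in Code 3 relies on %hash being empty before grep runs. If the deduplication is needed more than once, one way to make it reusable is to move the counting hash into a subroutine so each call starts with a fresh hash. A minimal sketch; uniq_keep_order is a hypothetical name, and `!$seen{$_}++` is simply another way of writing the `++$hash{$_} < 2` test (true only on the first occurrence).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper wrapping the grep idiom from Code 3. %seen is declared
# inside the sub, so each call starts fresh; with a file-level hash, a second
# call would wrongly drop values already seen by the first call.
sub uniq_keep_order {
    my %seen;
    return grep { !$seen{$_}++ } @_;    # keep only the first occurrence
}

my @array = (1 .. 10, 5, 20, 2, 3, 4, 5, 5);
print join(" ", uniq_keep_order(@array)), "\n";
```

It prints 1 2 3 4 5 6 7 8 9 10 20, the same as Code 3. Newer releases of the core List::Util module also provide a uniq function with the same keep-the-first-occurrence behavior.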