POJ 3294 Life Forms: finding the longest substring shared among n (n > 1) strings with a suffix array


Description

You may have wondered why most extraterrestrial life forms resemble humans, differing by superficial traits such as height, colour, wrinkles, ears, eyebrows and the like. A few bear no human resemblance; these typically have geometric or amorphous shapes like cubes, oil slicks or clouds of dust. The answer is given in the 146th episode of Star Trek - The Next Generation, titled The Chase. It turns out that the vast majority of the quadrant's life forms ended up with a large fragment of common DNA. Given the DNA sequences of several life forms represented as strings of letters, you are to find the longest substring that is shared by more than half of them.

Input

Standard input contains several test cases. Each test case begins with 1 ≤ n ≤ 100, the number of life forms. n lines follow; each contains a string of lower case letters representing the DNA sequence of a life form. Each DNA sequence contains at least one and not more than 1000 letters. A line containing 0 follows the last test case.
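Output

For each test case, output the longest substring shared by more than half of the life forms. If several substrings qualify, output all of them in alphabetical order. If no substring of at least one letter qualifies, output "?". Print a blank line after each test case.

Sample Input

    3
    abcdefg
    bcdefgh
    cdefghi
    3
    xxx
    yyy
    zzz
    0

Sample Output

    bcdefg
    cdefgh

    ?

Solution

    #include <cstdio>
    #include <cstring>
    using namespace std;

    /// Suffix array, prefix-doubling algorithm
    const int maxn=500000;
    char str[maxn];
    int wa[maxn],wb[maxn],wv[maxn],wn[maxn],a[maxn],sa[maxn];
    int cmp(int* r,int a,int b,int l)
    {return r[a]==r[b]&&r[a+l]==r[b+l];}

    /** n is the length of the string, m is the range of character values,
        r is the string. j below is the substring length used in each round. */
    void DA(int* r,int* sa,int n,int m)
    {
        int i,j,p,*x=wa,*y=wb,*t;
        /// Radix sort the substrings of length 1 in r
        for(i=0;i<m;i++)wn[i]=0;
        for(i=0;i<n;i++)wn[x[i]=r[i]]++;
        for(i=1;i<m;i++)wn[i]+=wn[i-1];
        for(i=n-1;i>=0;i--)sa[--wn[x[i]]]=i;
        for(j=1,p=1;p<n;j*=2,m=p)
        {
            /// Order by second key, reusing the previous round's sa
            for(p=0,i=n-j;i<n;i++)y[p++]=i;
            for(i=0;i<n;i++)if(sa[i]>=j)y[p++]=sa[i]-j;
            /// Radix sort by first key
            for(i=0;i<n;i++)wv[i]=x[y[i]];
            for(i=0;i<m;i++)wn[i]=0;
            for(i=0;i<n;i++)wn[wv[i]]++;
            for(i=1;i<m;i++)wn[i]+=wn[i-1];
            for(i=n-1;i>=0;i--)sa[--wn[wv[i]]]=y[i];
            /// When p==n, all suffixes are fully sorted.
            /// After the first round the largest rank is below p, so set m=p.
            for(t=x,x=y,y=t,p=1,x[sa[0]]=0,i=1;i<n;i++)
                x[sa[i]]=cmp(y,sa[i-1],sa[i],j)?p-1:p++;
        }
    }

    /** A character 0 is appended to the string, so that suffix is necessarily
        the smallest one. All other characters must be greater than 0 (the
        doubling algorithm requires this), so the common prefix of the
        second-ranked suffix and the 0 suffix, height[1], is 0. The height
        array covers [1..n], so call calheight(r,sa,n), not
        calheight(r,sa,n+1). */
    int rnk[maxn],height[maxn]; // named rnk to avoid clashing with std::rank
    void calheight(int* r,int* sa,int n)
    {
        int i,j,k=0;
        for(i=1;i<=n;i++)rnk[sa[i]]=i;
        for(i=0;i<n;height[rnk[i++]]=k)
            for(k?k--:0,j=sa[rnk[i]-1];r[i+k]==r[j+k];k++);
    }

    /// Longest substring shared among n strings
    int n=0;           // total length of the concatenated string
    int m;             // number of strings
    int l,r;
    int belong[maxn];  // which string a position belongs to (0 for separators)
    int cnt[200];

    // The loop bodies of _check and print were lost in extraction; what
    // follows is a reconstruction that uses flag as a per-group stamp.
    int _check( int mid ){
        memset(cnt,0,sizeof(cnt));
        int flag= 1, ans= 0;
        for( int i= 1; i<= n; ++i ){
            if( height[i]< mid ){ ++flag; ans= 0; }   // a new group starts here
            int b= belong[sa[i]];
            if( b && cnt[b]!= flag ){ cnt[b]= flag; ++ans; }
            // the group covers more than half of the strings
            if( ans>= m/ 2+ 1 ) return 1;
        }
        return 0;
    }

    void print( int mid ){
        memset(cnt,0,sizeof(cnt));
        int ans= 0, flag= 1, isp= 0, beg;
        for( int i= 1; i<= n; ++i ){
            if( height[i]< mid ){ ++flag; ans= 0; isp= 0; }
            int b= belong[sa[i]];
            if( b && cnt[b]!= flag ){ cnt[b]= flag; ++ans; }
            if( !isp && ans>= (m/ 2+ 1 ) ){
                isp= 1;   // print each group's substring only once
                beg= sa[i];
                for( int j= 0; j< mid; ++j ) printf("%c",a[beg+j]-1+'a');
                printf("\n");
            }
        }
    }

    int main()
    {
        /* Task: among n strings, find the longest substring that occurs in
           more than half of them and print it; if several substrings qualify,
           print all of them in lexicographic order.
           Algorithm: binary search on the length + suffix array.
           Concatenate the n strings, separated by pairwise distinct characters
           that do not occur in any string, and build the suffix array. Then
           binary search the answer: split the suffixes into groups and check
           whether some group's suffixes come from at least k of the original
           strings. The time complexity is O(n log n). */
        while(scanf("%d",&m)==1&&m)
        {
            n=0;
            for(int i=1;i<=m;i++)
            {
                scanf("%s",str);
                int len=strlen(str);
                for(int j=0;j<len;j++){ a[n]=str[j]-'a'+1; belong[n]=i; n++; }
                a[n]=26+i; belong[n]=0; n++;   // unique separator per string
            }
            a[n]=0;
            if(m==1){ printf("%s\n\n",str); continue; } // a lone string is its own answer
            DA(a,sa,n+1,128);
            calheight(a,sa,n);
            l=1; r=1001;
            while(l<r)
            {
                int mid=(l+r)>>1;
                if(_check(mid)) l=mid+1;
                else r=mid;
            }
            int ans=r-1;
            if(!ans) printf("?\n");
            else print(ans);
            printf("\n"); // the problem requires an extra blank line
        }
        return 0;
    }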
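The grouping idea (concatenate with unique separators, sort the suffixes, binary search the length, then scan runs of adjacent suffixes whose common prefix is long enough) can also be sketched in a compact form that is easy to check by hand. This is only a sketch under stated assumptions: the function name `sharedBySuffixGroups` and the helper names are hypothetical, and the suffix array is built with a plain `std::sort` instead of the doubling algorithm, which is fine for small inputs but too slow for the full limits.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns, in alphabetical order, the longest substrings that occur in more
// than half of `dna` (empty result if no substring qualifies).
// Assumption: inputs are lowercase letters; naive suffix sorting for brevity.
std::vector<std::string> sharedBySuffixGroups(const std::vector<std::string>& dna) {
    assert(!dna.empty());
    const int m = static_cast<int>(dna.size());
    if (m == 1) return {dna[0]};  // a single string is shared by "more than half"
    const int need = m / 2 + 1;

    // Concatenate: letters become 1..26, string i is closed by separator 27+i.
    std::vector<int> text, owner;  // owner[p] = 1-based string index, 0 = separator
    int maxLen = 0;
    for (int i = 0; i < m; ++i) {
        maxLen = std::max(maxLen, static_cast<int>(dna[i].size()));
        for (char c : dna[i]) { text.push_back(c - 'a' + 1); owner.push_back(i + 1); }
        text.push_back(27 + i);
        owner.push_back(0);
    }
    const int n = static_cast<int>(text.size());

    // Naive suffix array: sort suffix start positions lexicographically.
    std::vector<int> sa(n);
    for (int i = 0; i < n; ++i) sa[i] = i;
    std::sort(sa.begin(), sa.end(), [&](int x, int y) {
        return std::lexicographical_compare(text.begin() + x, text.end(),
                                            text.begin() + y, text.end());
    });

    // height[i] = longest common prefix of the suffixes ranked i-1 and i.
    std::vector<int> height(n, 0);
    for (int i = 1; i < n; ++i) {
        int x = sa[i - 1], y = sa[i], k = 0;
        while (x + k < n && y + k < n && text[x + k] == text[y + k]) ++k;
        height[i] = k;
    }

    // Walk the groups for a given length; optionally collect one string per group.
    auto scan = [&](int len, std::vector<std::string>* out) {
        std::vector<int> stamp(m + 1, -1);
        int group = 0, distinct = 0;
        bool any = false, taken = false;
        for (int i = 0; i < n; ++i) {
            if (height[i] < len) { group = i; distinct = 0; taken = false; }
            int b = owner[sa[i]];
            if (b != 0 && stamp[b] != group) { stamp[b] = group; ++distinct; }
            if (distinct >= need && !taken) {
                taken = any = true;
                if (out == nullptr) return true;  // feasibility check only
                std::string s;
                for (int j = 0; j < len; ++j)
                    s += static_cast<char>('a' + text[sa[i] + j] - 1);
                out->push_back(s);
            }
        }
        return any;
    };

    // Binary search the largest feasible length (0 means no answer).
    int lo = 0, hi = maxLen;
    while (lo < hi) {
        int mid = (lo + hi + 1) / 2;
        if (scan(mid, nullptr)) lo = mid; else hi = mid - 1;
    }
    std::vector<std::string> result;
    if (lo > 0) scan(lo, &result);
    return result;
}
```

Because groups are visited in suffix-rank order, the collected strings come out already sorted, which is why no extra sort is needed before printing. Calling it on the three strings of the first sample yields bcdefg and cdefgh; on xxx, yyy, zzz it yields an empty list, which corresponds to printing "?".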