Suffix tree construction LCP array



case 1 (



d
(
v
)
=
h
[
i
+
1
]


{\displaystyle d(v)=h[i+1]}

): suppose suffixes



a
$


{\displaystyle a\$}

,



a
n
a
$


{\displaystyle ana\$}

,



a
n
a
n
a
$


{\displaystyle anana\$}

,



b
a
n
a
n
a
$


{\displaystyle banana\$}

of string



s
=
b
a
n
a
n
a
$


{\displaystyle s=banana\$}

added suffix tree. suffix



n
a
$


{\displaystyle na\$}

added tree shown in picture. rightmost path highlighted in red.


start



s

t

0




{\displaystyle st_{0}}

, tree consisting of root. insert



a
[
i
+
1
]


{\displaystyle a[i+1]}





s

t

i




{\displaystyle st_{i}}

, walk rightmost path beginning @ inserted leaf



a
[
i
]


{\displaystyle a[i]}

root, until deepest node



v


{\displaystyle v}





d
(
v
)

h
[
i
+
1
]


{\displaystyle d(v)\leq h[i+1]}

reached.


we need distinguish 2 cases:







d
(
v
)
=
h
[
i
+
1
]


{\displaystyle d(v)=h[i+1]}

: means concatenation of labels on root-to-



v


{\displaystyle v}

path equals longest common prefix of suffixes



a
[
i
]


{\displaystyle a[i]}

,



a
[
i
+
1
]


{\displaystyle a[i+1]}

.

in case, insert



a
[
i
+
1
]


{\displaystyle a[i+1]}

new leaf



x


{\displaystyle x}

of node



v


{\displaystyle v}

, label edge



(
v
,
x
)


{\displaystyle (v,x)}





s
[
a
[
i
+
1
]
+
h
[
i
+
1
]
,
n
]


{\displaystyle s[a[i+1]+h[i+1],n]}

. edge label consists of remaining characters of suffix



a
[
i
+
1
]


{\displaystyle a[i+1]}

not represented concatenation of labels of root-to-



v


{\displaystyle v}

path.

this creates partial suffix tree



s

t

i
+
1




{\displaystyle st_{i+1}}

.

case 2 (



d
(
v
)
<
h
[
i
+
1
]


{\displaystyle d(v)<h[i+1]}

): in order add suffix



n
a
n
a
$


{\displaystyle nana\$}

, edge inserted suffix



n
a
$


{\displaystyle na\$}

has split up. new edge new internal node labeled longest common prefix of suffixes



n
a
$


{\displaystyle na\$}

,



n
a
n
a
$


{\displaystyle nana\$}

. edges connecting 2 leaves labeled remaining suffix characters not part of prefix.







d
(
v
)
<
h
[
i
+
1
]


{\displaystyle d(v)<h[i+1]}

: means concatenation of labels on root-to-



v


{\displaystyle v}

path displays less characters longest common prefix of suffixes



a
[
i
]


{\displaystyle a[i]}

,



a
[
i
+
1
]


{\displaystyle a[i+1]}

, missing characters contained in edge label of



v


{\displaystyle v}

s rightmost edge. therefore, have split edge follows:

let



w


{\displaystyle w}

child of



v


{\displaystyle v}

on



s

t

i




{\displaystyle st_{i}}

s rightmost path.


a simple amortization argument shows running time of algorithm bounded



o
(
n
)


{\displaystyle o(n)}

:


the nodes traversed in step



i


{\displaystyle i}

walking rightmost path of



s

t

i




{\displaystyle st_{i}}

(apart last node



v


{\displaystyle v}

) removed rightmost path, when



a
[
i
+
1
]


{\displaystyle a[i+1]}

added tree new leaf. these nodes never traversed again subsequent steps



j
>
i


{\displaystyle j>i}

. therefore, @



2
n


{\displaystyle 2n}

nodes traversed in total.







Comments

Popular posts from this blog

Causes Portuguese conquest of the Banda Oriental

History The Vandals

Publications Daniel Kolak