NER model #6278

Bmikaella · 2020-10-20T10:17:17Z


Hi!
I have a question about the architecture of spaCy's NER models. I watched the video explanation (https://spacy.io/universe/project/video-spacys-ner-model), but I'm not sure I understood all of it correctly.
I’m mostly concerned about the CNN part. Here is my understanding of it.

input_vector = [number_of_words x embedding_size]
cnn_layer_n = [3 x embedding_size]

For a sentence of 6 words:
input_vector = [v1,v2,v3,v4,v5,v6]
each vn has size [1 x embedding_size]

First, we pad input_vector with one zero vector at each end to preserve the dimensions. A window of size 3 is then slid across the padded input with a stride of one, and for each group of three words the maximum component in each dimension is kept. This yields a matrix with the same shape as the input, where each row combines a word with its two neighbouring words. The output of each CNN layer is added to that layer's input.

[v1,v2,v3,v4,v5,v6] --pad--> [0,v1,v2,v3,v4,v5,v6,0]
(0 denotes a zero vector, not a scalar value.)
[0,v1,v2,v3,v4,v5,v6,0] --cnn--> [maxout(0,v1,v2), maxout(v1,v2,v3), maxout(v2,v3,v4), maxout(v3,v4,v5), maxout(v4,v5,v6), maxout(v5,v6,0)]
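Here is a minimal NumPy sketch of the flow exactly as I described it. NumPy, the embedding size of 4 and the random input are my own choices for illustration; this is not spaCy's code. Note that taking the max over three word vectors with no learned weights is effectively max pooling with stride 1, so that is what the sketch implements.

```python
import numpy as np

def window_max(x: np.ndarray, window: int = 3) -> np.ndarray:
    """Elementwise max over a sliding window of rows (stride 1, 'same' padding)."""
    n_words, emb = x.shape
    half = window // 2
    # Pad with zero *vectors* at both ends: [0, v1, ..., vn, 0]
    padded = np.vstack([np.zeros((half, emb)), x, np.zeros((half, emb))])
    # Row i of the output is the per-dimension max of padded[i : i + window]
    return np.stack([padded[i:i + window].max(axis=0) for i in range(n_words)])

def cnn_layer(x: np.ndarray) -> np.ndarray:
    # Residual connection: the layer's output is added to its input
    return x + window_max(x)

rng = np.random.default_rng(0)
input_vector = rng.normal(size=(6, 4))  # 6 words, hypothetical embedding_size = 4
out = cnn_layer(input_vector)
print(out.shape)  # (6, 4)
```

Under this reading the layer has no parameters at all, which is part of why I'm unsure where the "CNN" comes in.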

My question is about the CNN layer itself. The video doesn't mention kernels, or how many of them are used. For the dimensions to stay unchanged and for the flow above to work, are the kernels of size [3 x embedding_size]? I assume the number of kernels should equal the embedding size, since [8 x embedding_size] goes into the CNN and the output should be [6 x embedding_size].
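To make the kernel interpretation concrete, here is the shape bookkeeping for a standard 1D convolution with "same" padding, assuming embedding_size kernels, each of shape [3 x embedding_size]. The kernel count and shapes are my assumption, not something the video states, and the weights are random placeholders.

```python
import numpy as np

def conv1d_same(x: np.ndarray, kernels: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """x: [n_words, emb]; kernels: [n_kernels, width, emb]; returns [n_words, n_kernels]."""
    n_words, emb = x.shape
    n_kernels, width, _ = kernels.shape
    half = width // 2
    # [n_words + 2, emb] for width 3, i.e. [8 x emb] for a 6-word sentence
    padded = np.vstack([np.zeros((half, emb)), x, np.zeros((half, emb))])
    # windows[i] = padded[i : i + width], shape [n_words, width, emb]
    windows = np.stack([padded[i:i + width] for i in range(n_words)])
    # Each kernel is dotted with each [width x emb] window (sum over both axes)
    return np.einsum("nwe,kwe->nk", windows, kernels) + bias

emb = 4  # hypothetical embedding_size
rng = np.random.default_rng(0)
x = rng.normal(size=(6, emb))
kernels = rng.normal(size=(emb, 3, emb))  # n_kernels = embedding_size
bias = np.zeros(emb)
y = conv1d_same(x, kernels, bias)
print(y.shape)  # (6, 4)
```

In this reading the output width is set by the number of kernels rather than by the window size, which is why I assumed n_kernels = embedding_size.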
If there are no kernels, and we only slide a [3 x embedding_size] window over the input and apply maxout to each component, then the proposed flow also works. I'm confused only because a CNN is mentioned, but kernels and filters are not.
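There may also be a reading in which "maxout" itself supplies the learned weights: each window [v(i-1), v(i), v(i+1)] is concatenated into one vector of size 3 * embedding_size and passed through a maxout unit, i.e. several linear "pieces" per output dimension with the max taken over the pieces. The maxout weight matrices would then play the role of the kernels. This is a hypothetical sketch; the number of pieces (3), the function names and the random weights are my own assumptions, not confirmed by the video.

```python
import numpy as np

def extract_window(x: np.ndarray, n_neighbors: int = 1) -> np.ndarray:
    """Concatenate each word vector with its neighbours: [n_words, emb] -> [n_words, 3 * emb]."""
    n_words, emb = x.shape
    padded = np.vstack([np.zeros((n_neighbors, emb)), x, np.zeros((n_neighbors, emb))])
    width = 2 * n_neighbors + 1
    # Flattening a [width x emb] slice row by row gives [v(i-1); v(i); v(i+1)]
    return np.stack([padded[i:i + width].reshape(-1) for i in range(n_words)])

def maxout(h: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """W: [n_out, pieces, n_in], b: [n_out, pieces]; max over the pieces axis."""
    scores = np.einsum("ni,opi->nop", h, W) + b  # [n_words, n_out, pieces]
    return scores.max(axis=-1)

emb, pieces = 4, 3  # hypothetical embedding_size and number of maxout pieces
rng = np.random.default_rng(0)
x = rng.normal(size=(6, emb))
W = rng.normal(size=(emb, pieces, 3 * emb))  # placeholder "learned" weights
b = np.zeros((emb, pieces))
h = extract_window(x)    # (6, 12)
y = x + maxout(h, W, b)  # residual connection, (6, 4)
print(y.shape)  # (6, 4)
```

Here the output width is fixed by the first dimension of W, so the dimensions are preserved without a separate set of convolution filters.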

Thank you!
