Are there callbacks when processing closed labels? #543

Open
ddx533534 opened this issue Jun 20, 2024 · 5 comments

Comments

@ddx533534

ddx533534 commented Jun 20, 2024

When a tag is closed, I want to do something. Is there any way to implement this? For example:

```html
<!-- When </div> is encountered, I want to add a line break to the text -->
<div><span>A</span></div>
```

@max-heller

I hoped TreeSink::pop() would allow this, but it doesn't seem to be called when I expect it to.

@max-heller

It seems like #149 is asking for the same thing. #149 (comment) suggests that pop() may be usable, but that it isn't always called.

@ddx533534
Author

```rust
/// Indicate that a node was popped off the stack of open elements.
fn pop(&mut self, _node: &Self::Handle) {}
```

Yep, I have tried TreeSink::pop(), but it isn't called every time an end tag is encountered. Maybe it wasn't designed for that.
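Since pop() doesn't fire for every end tag, one possible workaround (a sketch under my own assumptions, not an html5ever feature) is to let the parser finish and then walk the resulting tree. In a depth-first traversal, the moment after an element's children have been visited is where its end tag logically sits, so a "close" callback can fire there. The `Node`, `Event`, `walk`, `elem`, and `to_text` names below are hypothetical; `Node` stands in for whatever DOM you actually parse into (e.g. scraper's tree).

```rust
// Hypothetical minimal DOM standing in for a real parsed tree (e.g. scraper's).
#[derive(Debug)]
enum Node {
    Element { name: String, children: Vec<Node> },
    Text(String),
}

// Events emitted while walking the finished tree.
#[derive(Debug, PartialEq)]
enum Event<'a> {
    Open(&'a str),
    Text(&'a str),
    Close(&'a str),
}

// Depth-first walk: `Close` is emitted after all of an element's children,
// i.e. where its end tag (explicit or implied) belongs.
fn walk<'a, F: FnMut(Event<'a>)>(node: &'a Node, f: &mut F) {
    match node {
        Node::Text(t) => f(Event::Text(t)),
        Node::Element { name, children } => {
            f(Event::Open(name));
            for child in children {
                walk(child, f);
            }
            f(Event::Close(name));
        }
    }
}

// Convenience constructor for building example trees.
fn elem(name: &str, children: Vec<Node>) -> Node {
    Node::Element { name: name.to_string(), children }
}

// Collects text, appending a line break whenever a <div> closes.
fn to_text(root: &Node) -> String {
    let mut out = String::new();
    walk(root, &mut |ev| match ev {
        Event::Text(t) => out.push_str(t),
        Event::Close("div") => out.push('\n'),
        _ => {}
    });
    out
}

fn main() {
    // Tree for: <div><span>A</span></div><span>B</span>
    let root = elem(
        "body",
        vec![
            elem("div", vec![elem("span", vec![Node::Text("A".into())])]),
            elem("span", vec![Node::Text("B".into())]),
        ],
    );
    println!("{:?}", to_text(&root)); // prints "A\nB"
}
```

One caveat: the finished tree no longer records whether an element was closed by an explicit end tag or implicitly by the parser, so this callback fires in both cases.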

@simonwuelker
Contributor

> Yep, I have tried TreeSink::pop(), but it isn't called every time an end tag is encountered. Maybe it wasn't designed for that.

Do you have an example where pop should be called, but isn't?

Not all pop calls are caused by closing tags, because HTML closes elements automagically in certain situations. But I believe every end tag should result in popping an element from the stack of open elements (-> pop being called).
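To make this concrete, here is a deliberately tiny model of a stack of open elements. It is a hypothetical sketch, not the HTML spec's tree-construction algorithm and not html5ever's implementation (`OpenStack` and its two rules are simplifications I made up). It shows why pops and end tags don't line up one-to-one: an implied end tag produces a pop with no `</...>` in the input, and a single end tag can pop several elements.

```rust
// Hypothetical toy model of a "stack of open elements". It is NOT the HTML
// tree-construction algorithm and NOT html5ever's code; it models two rules only:
//  * an end tag pops elements up to and including the matching one;
//  * a <p> on top of the stack is closed implicitly when a new <p> or <div> starts.
struct OpenStack {
    stack: Vec<String>,
    popped: Vec<String>, // every pop, in order: what a `pop` hook would observe
}

impl OpenStack {
    fn new() -> Self {
        OpenStack { stack: Vec::new(), popped: Vec::new() }
    }

    // The single place elements leave the stack, so every removal is recorded.
    fn pop(&mut self) {
        if let Some(name) = self.stack.pop() {
            self.popped.push(name);
        }
    }

    fn start_tag(&mut self, name: &str) {
        // Implied end tag: no </p> in the input, but the <p> is popped anyway.
        if (name == "p" || name == "div") && self.stack.last().map(String::as_str) == Some("p") {
            self.pop();
        }
        self.stack.push(name.to_string());
    }

    fn end_tag(&mut self, name: &str) {
        // In this model, an end tag with no matching open element is ignored.
        if !self.stack.iter().any(|n| n == name) {
            return;
        }
        // Pop everything above the match, then the match itself.
        while let Some(top) = self.stack.last().cloned() {
            self.pop();
            if top == name {
                break;
            }
        }
    }
}

fn main() {
    // Roughly <div><p>a<p>b</div>, with the text omitted.
    let mut s = OpenStack::new();
    s.start_tag("div");
    s.start_tag("p");
    s.start_tag("p");
    s.end_tag("div");
    // One end tag in the input, three pops observed.
    assert_eq!(s.popped, ["p", "p", "div"]);
    assert!(s.stack.is_empty());
}
```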

@max-heller

max-heller commented Jan 1, 2025

> Do you have an example where pop should be called, but isn't?
>
> Not all pop calls are caused by closing tags, because HTML closes elements automagically in certain situations. But I believe every end tag should result in popping an element from the stack of open elements (-> pop being called).

Running this test with the code below, I'd expect to see a call to pop() for the <span> since it was properly closed:

```rust
#[test]
fn pop() {
    use html5ever::tendril::TendrilSink;
    let mut parser = parser();
    parser.process("<span>hello</span>".into());

    let sink = &parser.tokenizer.sink.sink;
    let open = sink.open.borrow();
    let open = open
        .iter()
        .map(|node| sink.elem_name(node).local.to_string())
        .collect::<Vec<_>>();
    assert_eq!(open, Vec::<String>::new());
}
```

Instead, I get:

```text
assertion `left == right` failed
  left: ["body", "html", "span"]
 right: []
```
Code
```rust
use std::{borrow::Cow, cell::RefCell, collections::BTreeSet};

use html5ever::{interface::TreeSink, namespace_url};

type Handle = <scraper::HtmlTreeSink as TreeSink>::Handle;

pub fn parser() -> html5ever::Parser<HtmlTreeSink> {
    html5ever::driver::parse_fragment(
        HtmlTreeSink {
            open: Default::default(),
            inner: scraper::HtmlTreeSink::new(scraper::Html::new_fragment()),
        },
        Default::default(),
        html5ever::QualName::new(None, html5ever::ns!(html), html5ever::local_name!("body")),
        Vec::new(),
    )
}

/// Wraps scraper's sink and records which elements have not been popped yet.
pub struct HtmlTreeSink {
    open: RefCell<BTreeSet<Handle>>,
    inner: scraper::HtmlTreeSink,
}

impl TreeSink for HtmlTreeSink {
    type Handle = Handle;
    type Output = <scraper::HtmlTreeSink as TreeSink>::Output;
    type ElemName<'a> = <scraper::HtmlTreeSink as TreeSink>::ElemName<'a>;

    fn finish(self) -> Self::Output {
        self.inner.finish()
    }

    fn parse_error(&self, msg: Cow<'static, str>) {
        self.inner.parse_error(msg)
    }

    fn get_document(&self) -> Self::Handle {
        self.inner.get_document()
    }

    fn elem_name<'a>(&'a self, target: &'a Self::Handle) -> Self::ElemName<'a> {
        self.inner.elem_name(target)
    }

    fn create_element(
        &self,
        name: html5ever::QualName,
        attrs: Vec<html5ever::Attribute>,
        flags: html5ever::interface::ElementFlags,
    ) -> Self::Handle {
        // Track every element the parser creates...
        let handle = self.inner.create_element(name, attrs, flags);
        self.open.borrow_mut().insert(handle);
        handle
    }

    fn pop(&self, node: &Self::Handle) {
        // ...and stop tracking it once the tree builder reports it as popped.
        println!("pop({})", self.elem_name(node).local.as_ref());
        self.open.borrow_mut().remove(node);
    }

    fn create_comment(&self, text: html5ever::tendril::StrTendril) -> Self::Handle {
        self.inner.create_comment(text)
    }

    fn create_pi(
        &self,
        target: html5ever::tendril::StrTendril,
        data: html5ever::tendril::StrTendril,
    ) -> Self::Handle {
        self.inner.create_pi(target, data)
    }

    fn append(&self, parent: &Self::Handle, child: html5ever::interface::NodeOrText<Self::Handle>) {
        self.inner.append(parent, child)
    }

    fn append_based_on_parent_node(
        &self,
        element: &Self::Handle,
        prev_element: &Self::Handle,
        child: html5ever::interface::NodeOrText<Self::Handle>,
    ) {
        self.inner
            .append_based_on_parent_node(element, prev_element, child)
    }

    fn append_doctype_to_document(
        &self,
        name: html5ever::tendril::StrTendril,
        public_id: html5ever::tendril::StrTendril,
        system_id: html5ever::tendril::StrTendril,
    ) {
        self.inner
            .append_doctype_to_document(name, public_id, system_id)
    }

    fn get_template_contents(&self, target: &Self::Handle) -> Self::Handle {
        self.inner.get_template_contents(target)
    }

    fn same_node(&self, x: &Self::Handle, y: &Self::Handle) -> bool {
        self.inner.same_node(x, y)
    }

    fn set_quirks_mode(&self, mode: html5ever::interface::QuirksMode) {
        self.inner.set_quirks_mode(mode)
    }

    fn append_before_sibling(
        &self,
        sibling: &Self::Handle,
        new_node: html5ever::interface::NodeOrText<Self::Handle>,
    ) {
        self.inner.append_before_sibling(sibling, new_node)
    }

    fn add_attrs_if_missing(&self, target: &Self::Handle, attrs: Vec<html5ever::Attribute>) {
        self.inner.add_attrs_if_missing(target, attrs)
    }

    fn remove_from_parent(&self, target: &Self::Handle) {
        self.inner.remove_from_parent(target)
    }

    fn reparent_children(&self, node: &Self::Handle, new_parent: &Self::Handle) {
        self.inner.reparent_children(node, new_parent)
    }
}
```
