use in windows 10 error : UnboundLocalError: local variable 'browser' referenced before assignment #777

Open
1272870698 opened this issue Oct 29, 2024 · 10 comments

Comments

@1272870698

playwright install:

[screenshot]

Usage:

[screenshot]

Error:

UnboundLocalError: local variable 'browser' referenced before assignment

Full traceback:

Cell In[4], line 27
     18 # ************************************************
     19 # Create the SmartScraperGraph instance and run it
     20 # ************************************************
     21 smart_scraper_graph = SmartScraperGraph(
     22     prompt="Find some information about what does the company do, the name and a contact email.",
     23     source="https://scrapegraphai.com/",
     24     config=graph_config
     25 )
---> 27 result = smart_scraper_graph.run()
     28 print(result)

File D:\softs\anaconda3\envs\flux\lib\site-packages\scrapegraphai\graphs\smart_scraper_graph.py:212, in SmartScraperGraph.run(self)
    204 """
    205 Executes the scraping process and returns the answer to the prompt.
    206 
    207 Returns:
    208     str: The answer to the prompt.
    209 """
    211 inputs = {"user_prompt": self.prompt, self.input_key: self.source}
--> 212 self.final_state, self.execution_info = self.graph.execute(inputs)
    214 return self.final_state.get("answer", "No answer found.")

File D:\softs\anaconda3\envs\flux\lib\site-packages\scrapegraphai\graphs\base_graph.py:284, in BaseGraph.execute(self, initial_state)
    282     return (result["_state"], [])
    283 else:
--> 284     return self._execute_standard(initial_state)

File D:\softs\anaconda3\envs\flux\lib\site-packages\scrapegraphai\graphs\base_graph.py:198, in BaseGraph._execute_standard(self, initial_state)
    185     graph_execution_time = time.time() - start_time
    186     log_graph_execution(
    187         graph_name=self.graph_name,
    188         source=source,
   (...)
    196         exception=str(e)
    197     )
--> 198     raise e
    199 node_exec_time = time.time() - curr_time
    200 total_exec_time += node_exec_time

File D:\softs\anaconda3\envs\flux\lib\site-packages\scrapegraphai\graphs\base_graph.py:182, in BaseGraph._execute_standard(self, initial_state)
    180 with self.callback_manager.exclusive_get_callback(llm_model, llm_model_name) as cb:
    181     try:
--> 182         result = current_node.execute(state)
    183     except Exception as e:
    184         error_node = current_node.node_name

File D:\softs\anaconda3\envs\flux\lib\site-packages\scrapegraphai\nodes\fetch_node.py:130, in FetchNode.execute(self, state)
    128     return self.handle_local_source(state, source)
    129 else:
--> 130     return self.handle_web_source(state, source)

File D:\softs\anaconda3\envs\flux\lib\site-packages\scrapegraphai\nodes\fetch_node.py:305, in FetchNode.handle_web_source(self, state, source)
    303 else:
    304     loader = ChromiumLoader([source], headless=self.headless, **loader_kwargs)
--> 305     document = loader.load()
    307 if not document or not document[0].page_content.strip():
    308     raise ValueError("""No HTML body content found in
    309                      the document fetched by ChromiumLoader.""")

File D:\softs\anaconda3\envs\flux\lib\site-packages\langchain_core\document_loaders\base.py:31, in BaseLoader.load(self)
     29 def load(self) -> list[Document]:
     30     """Load data into Document objects."""
---> 31     return list(self.lazy_load())

File D:\softs\anaconda3\envs\flux\lib\site-packages\scrapegraphai\docloaders\chromium.py:192, in ChromiumLoader.lazy_load(self)
    189 scraping_fn = getattr(self, f"ascrape_{self.backend}")
    191 for url in self.urls:
--> 192     html_content = asyncio.run(scraping_fn(url))
    193     metadata = {"source": url}
    194     yield Document(page_content=html_content, metadata=metadata)

File D:\softs\anaconda3\envs\flux\lib\site-packages\nest_asyncio.py:30, in _patch_asyncio.<locals>.run(main, debug)
     28 task = asyncio.ensure_future(main)
     29 try:
---> 30     return loop.run_until_complete(task)
     31 finally:
     32     if not task.done():

File D:\softs\anaconda3\envs\flux\lib\site-packages\nest_asyncio.py:98, in _patch_loop.<locals>.run_until_complete(self, future)
     95 if not f.done():
     96     raise RuntimeError(
     97         'Event loop stopped before Future completed.')
---> 98 return f.result()

File D:\softs\anaconda3\envs\flux\lib\asyncio\futures.py:201, in Future.result(self)
    199 self.__log_traceback = False
    200 if self._exception is not None:
--> 201     raise self._exception.with_traceback(self._exception_tb)
    202 return self._result

File D:\softs\anaconda3\envs\flux\lib\asyncio\tasks.py:232, in Task.__step(***failed resolving arguments***)
    228 try:
    229     if exc is None:
    230         # We use the `send` method directly, because coroutines
    231         # don't have `__iter__` and `__next__` methods.
--> 232         result = coro.send(None)
    233     else:
    234         result = coro.throw(exc)

File D:\softs\anaconda3\envs\flux\lib\site-packages\scrapegraphai\docloaders\chromium.py:136, in ChromiumLoader.ascrape_playwright(self, url)
    134             results = f"Error: Network error after {self.RETRY_LIMIT} attempts - {e}"
    135     finally:
--> 136         await browser.close()
    138 return results

Please help me.

@1272870698 1272870698 changed the title use in windows 10 error use in windows 10 error : UnboundLocalError: local variable 'browser' referenced before assignment Oct 29, 2024
@VinciGit00
Collaborator

which side is this?

@VinciGit00
Collaborator

Hi, can I have a reply?

@aleenprd

I am having the same issue when running the code in a container. The problem does not occur when running locally.

@aleenprd

> which side is this?

What do you mean by "side"?

@VinciGit00
Collaborator

Website

@aleenprd

I don't think it's relevant. Also, see the issue I just posted.

@calvincolton

calvincolton commented Nov 20, 2024

I too am experiencing the same issue with a container. I am currently trying with the python playwright docker image:

# Use Playwright's official Docker image
FROM mcr.microsoft.com/playwright/python:v1.48.0-noble

ARG OPENAI_API_KEY
ENV OPENAI_API_KEY=$OPENAI_API_KEY

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app /app

ENV PYTHONPATH=/app

CMD ["sh", "-c", "export PYTHONPATH=/app && python main.py"]

I have followed the quick install instructions and have tried both the headless and non-headless options, e.g.:

graph_config = {
    "llm": {
        "api_key": OPENAI_API_KEY,
        "model": "openai/gpt-4o-mini",
    },
    "verbose": True,
    "headless": True,  # Headless mode for Docker compatibility
}

This might not fix it, but it looks as though there is a bug in the scrapegraphai/docloaders/chromium.py file: the browser variable used in the finally block is not guaranteed to be assigned. It should be initialized to None before the try/except block and checked with an if statement in the finally block.
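
A minimal sketch of the suggestion above, assuming the shape implied by the traceback (a `finally` block that closes a browser first assigned inside `try`). `FakeBrowser` and `launch_browser` are hypothetical stand-ins used only for illustration; they are not part of Playwright or ScrapeGraphAI, and this is not the library's actual code.

```python
import asyncio


class FakeBrowser:
    """Hypothetical stand-in for a browser object with an async close()."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


async def launch_browser(fail: bool) -> FakeBrowser:
    """Hypothetical launcher; raises to simulate a browser that cannot start."""
    if fail:
        raise RuntimeError("browser executable not found")
    return FakeBrowser()


async def scrape_unguarded(fail: bool) -> str:
    # Failing shape: `browser` is first assigned inside `try`.
    try:
        browser = await launch_browser(fail)
        results = "<html>ok</html>"
    except RuntimeError as e:
        results = f"Error: {e}"
    finally:
        # If launch_browser raised, `browser` was never bound, so this
        # line raises UnboundLocalError and hides the launch error.
        await browser.close()
    return results


async def scrape_guarded(fail: bool) -> str:
    browser = None  # bound before `try`, so `finally` can always read it
    try:
        browser = await launch_browser(fail)
        results = "<html>ok</html>"
    except RuntimeError as e:
        results = f"Error: {e}"
    finally:
        if browser is not None:  # only close a browser that actually started
            await browser.close()
    return results


if __name__ == "__main__":
    print(asyncio.run(scrape_guarded(fail=True)))
    print(asyncio.run(scrape_guarded(fail=False)))
```

With the guard in place, a failed launch is reported through `results` instead of being replaced by the UnboundLocalError, which would make the underlying cause (for example, a browser that cannot start inside the container) visible.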

@dejoma

dejoma commented Dec 9, 2024

Same issue here when running in AWS Lambda. This is my scraper config:

SCRAPE_CONFIG = {
    "llm": {
        "api_key": os.environ["OPENAI_API_KEY"],
        "model": "openai/gpt-4o-mini",
    },
    "num_results": 5,
    "loader_kwargs": {
        # https://github.com/microsoft/playwright/issues/14023
        "args": ["--single-process", "--disable-gpu", "--disable-dev-shm-usage"],
    },
    "force": True,
    "verbose": True,
    "headless": True,
}

@VinciGit00
Collaborator

VinciGit00 commented Dec 9, 2024

Hi @dejoma, I suggest using our API with AWS Lambda; it is easier to implement: https://scrapegraphai.com

@dejoma

dejoma commented Dec 10, 2024

@VinciGit00 Could you give some insight into the setup of that paywalled API? Then maybe I can work around these issues instead.
