This project is a web application for searching academic papers and articles by query. It retrieves results through the arXiv API and Google Custom Search Engine (CSE), then ranks them by the cosine similarity between the embedding of the query and the embedding of each result.
- Search for academic papers on arXiv and articles using Google CSE
- Generate keywords in boolean format for the given query using a language model
- Calculate relatedness scores between the query and search results using embeddings
- Display search results with titles, summaries, publication dates, URLs, and relatedness scores
- Store past searches and allow users to revisit them
- Delete past searches from the sidebar
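The relatedness score can be illustrated with a minimal sketch, assuming embeddings arrive as plain lists of floats. The function names and the `(title, embedding)` result shape are hypothetical, and the toy vectors stand in for real model embeddings. The project lists SciPy, whose `scipy.spatial.distance.cosine` returns the cosine *distance* (1 minus the similarity); the version below uses only the standard library so it runs without extra dependencies.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch: zero vectors score 0
    return dot / (norm_a * norm_b)


def rank_by_relatedness(query_embedding, results):
    """Return (title, score) pairs sorted from most to least related.

    `results` is a list of (title, embedding) pairs (hypothetical shape).
    """
    scored = [
        (title, cosine_similarity(query_embedding, embedding))
        for title, embedding in results
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors standing in for real embeddings.
query = [1.0, 0.0, 1.0]
results = [
    ("Paper A", [0.0, 1.0, 0.0]),
    ("Paper B", [1.0, 0.0, 1.0]),
    ("Paper C", [1.0, 1.0, 0.0]),
]
ranked = rank_by_relatedness(query, results)
for title, score in ranked:
    print(f"{title}: {score:.2f}")
```

A result whose embedding points in the same direction as the query scores close to 1, and an orthogonal one scores 0, so sorting in descending order puts the most related results first.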
- Python
- Streamlit
- arXiv API
- Google Custom Search API
- Ollama (for embeddings and language model)
- Pandas
- SciPy
- Langchain
- Clone the repository
- Install the dependencies: `pip install -r requirements.txt`
- Create a `.env` file in the project directory.
- Obtain an API key for:
  - Google Custom Search Engine (https://developers.google.com/custom-search)
- Add the following to your `.env` file:
  - `GOOGLE_CSE_KEY=your_cse_api_key`
  - `GOOGLE_CSE_ID=your_cse_engine_id`
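As a rough illustration of how these two values might be used, the sketch below builds a request URL for the Google Custom Search JSON API, which takes the API key as `key` and the engine ID as `cx`. It assumes the variables have already been loaded into the process environment by whatever mechanism the project uses; the helper name `build_cse_url` is hypothetical, and no network request is made.

```python
import os
from urllib.parse import urlencode

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_cse_url(query, env=None, num=10):
    """Build a Custom Search JSON API URL from the configured credentials.

    `env` defaults to os.environ; passing a dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    missing = [
        name for name in ("GOOGLE_CSE_KEY", "GOOGLE_CSE_ID") if not env.get(name)
    ]
    if missing:
        # Fail early with a clear message instead of sending a broken request.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    params = {
        "key": env["GOOGLE_CSE_KEY"],  # API key
        "cx": env["GOOGLE_CSE_ID"],    # search engine ID
        "q": query,
        "num": num,                    # the API returns at most 10 results per request
    }
    return CSE_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials, for illustration only.
demo_env = {"GOOGLE_CSE_KEY": "demo-key", "GOOGLE_CSE_ID": "demo-id"}
url = build_cse_url("graph neural networks", env=demo_env)
print(url)
```

Checking for missing variables up front gives a clearer error than a rejected API call when the `.env` file is incomplete.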
- Select the search engine (arXiv or Google CSE)
- Enter a query in the text area
- Click the "Search" button to retrieve relevant results
- View the search results with their titles, summaries, URLs, and relatedness scores
- Access past searches from the sidebar and revisit or delete them
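For the arXiv option, a search along these lines could be expressed as a request to the public arXiv API, which accepts field prefixes such as `all:`, `ti:`, and `abs:` combined with the boolean operators `AND`, `OR`, and `ANDNOT`. The sketch below only builds the URL. The helper name `build_arxiv_url` and the example boolean query are assumptions, since the exact keyword format produced by the language model is not specified here.

```python
from urllib.parse import urlencode

ARXIV_ENDPOINT = "http://export.arxiv.org/api/query"


def build_arxiv_url(boolean_query, start=0, max_results=10):
    """Build an arXiv API query URL for a boolean search expression."""
    params = {
        "search_query": boolean_query,  # e.g. 'all:"term one" AND all:term2'
        "start": start,                 # zero-based offset of the first result
        "max_results": max_results,     # number of results to return
    }
    return ARXIV_ENDPOINT + "?" + urlencode(params)


# Hypothetical boolean keywords for a query about graph neural network surveys.
url = build_arxiv_url('all:"graph neural network" AND all:survey')
print(url)
```

The API responds with an Atom feed whose entries carry the title, summary, publication date, and link fields that the results view displays.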
- `arxiv/`: Contains CSV files with past arXiv search results
- `cse/`: Contains CSV files with past Google CSE search results
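One way to picture this storage layout is the sketch below, which saves, reloads, and deletes a search as a CSV file under `arxiv/` or `cse/`. It is an assumption-laden illustration: the column names, the file naming scheme, and the helper functions are hypothetical, and it uses the standard `csv` module rather than Pandas to stay dependency-free.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical column set, mirroring the fields shown in the results view.
FIELDS = ["title", "summary", "published", "url", "relatedness"]


def save_search(base_dir, engine, query, rows):
    """Write one search to <base_dir>/<engine>/<sanitized query>.csv."""
    if engine not in ("arxiv", "cse"):
        raise ValueError("engine must be 'arxiv' or 'cse'")
    folder = Path(base_dir) / engine
    folder.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: keep alphanumerics, replace the rest with "_".
    safe_name = "".join(c if c.isalnum() else "_" for c in query).strip("_") or "query"
    path = folder / f"{safe_name}.csv"
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return path


def load_search(path):
    """Read a saved search back as a list of dicts (all values are strings)."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def delete_search(path):
    """Remove a saved search file."""
    Path(path).unlink()


# Demonstration in a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    row = {
        "title": "Paper B",
        "summary": "A toy summary.",
        "published": "2024-01-01",
        "url": "https://example.org/paper-b",
        "relatedness": 0.93,
    }
    saved = save_search(tmp, "arxiv", "graph neural networks", [row])
    saved_name = saved.name
    restored = load_search(saved)
    delete_search(saved)
    still_there = saved.exists()
```

Keeping one file per search makes the sidebar operations simple: listing a directory gives the past searches, and deleting a search is a single file removal.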
Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.
This project is licensed under the GNU General Public License.