It is well known that the internet and social media can be used to influence opinions and spread hate. Although the platforms themselves are not to blame, popular brands such as Facebook lose customers and see their usage statistics drop when many people find content offensive or inappropriate. In the recent past, people have become very vocal about their opinions and have posted offensive content, which has caused social unrest and even riots between sections of society.
User-generated content carries many kinds of opinions and expressions in the form of written text, images, or videos. Such content may also contain objectionable visuals or other material that many viewers find unacceptable.
Moderating content on social media websites is important because these platforms are now used to promote businesses, products, and brands. Spam and offensive content can disappoint customers and push them to either leave such platforms or reduce their use of social media.
The idea is to develop a filter that can be embedded in any chat system or social media platform to prevent cyberbullying, the spread of hate, obscenity, etc.
We have come up with an AI-based content moderation model that is deployed as an API and can easily be used in any web or mobile platform.
For image data, the API checks for inappropriate text in the image and for violent content. It performs the same checks on audio and video data. Apart from images and videos of violence, it also checks for sexual or nudity-related content such as sexual activity, pornography, offensive signs, images of undressed people, especially women in revealing clothing, and erotic gestures that go against the community guidelines of the chat platform. For text data, along with checking for hate speech, the API checks for malicious links in order to prevent cyberattacks such as phishing. It also checks for spam messages and mails that circulate on online chat platforms in the form of fake advertisements, self-promotion, etc.
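As a rough illustration of the malicious-link check on text data, the sketch below pulls URLs out of a message and compares their hostnames against a blocklist. The function names, the regular expression, and the `BLOCKED_HOSTS` set are hypothetical and are not taken from the project; the actual API may well use a trained model or a reputation service instead.

```python
import re
from urllib.parse import urlparse

# Hypothetical blocklist used only for this illustration.
BLOCKED_HOSTS = {"phishy.example", "malware.example"}

# Simple pattern for http(s) links; it is not exhaustive.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_urls(text):
    """Return all http(s) URLs found in the text, in order of appearance."""
    # Strip trailing punctuation that usually belongs to the sentence.
    return [m.rstrip(".,;:!?)") for m in URL_PATTERN.findall(text)]


def find_blocked_links(text, blocked_hosts=BLOCKED_HOSTS):
    """Return the URLs whose hostname appears in the blocklist."""
    flagged = []
    for url in extract_urls(text):
        host = (urlparse(url).hostname or "").lower()
        if host in blocked_hosts:
            flagged.append(url)
    return flagged
```

A message would be flagged when `find_blocked_links(message)` returns a non-empty list. Matching on the parsed hostname rather than on the raw string avoids false hits when a blocked name merely appears in a path or query.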
- Languages : Python. We use Python for basically everything, from modeling to ETL.
- AI/ML : PyTorch/TensorFlow (for most deep learning tasks) and scikit-learn (our go-to for most non-DL tasks).
- NLP : NLTK/spaCy for extracting features from the text data.
- OCR : For extracting text from images, we rely on the Tesseract OCR engine.
- Speech Recognition : Audio input is analysed through speech-to-text conversion. We have used Sphinx/Google Speech Recognition for this.
- Version Control : GitHub, which we use for code hosting and change tracking across the team.
- Additional Libraries : Pandas, NumPy, Matplotlib
- Google Colab for GPU support.
- Project Management : Notion. We found ourselves using Notion for more than just project management and tracking: it let us keep track of what each team member was working on day to day and make sure we allocated time efficiently.
- For Windows
- Clone the repo.
git clone https://github.com/Slainteee/Jatayu-ContentModeration.git
- Change the working directory.
cd Jatayu-ContentModeration
- Create and activate a virtual environment (Conda is preferred).
- Install the required packages.
pip install -r requirements.txt
- Install the OCR engine: the project uses the pytesseract OCR engine. To install pytesseract, follow its installation guide.
- After successfully installing pytesseract, set the Tesseract path in the script by adding the following line to app.py
pytesseract.pytesseract.tesseract_cmd = <tesseract-path-here>
The project is ready to run on your local machine.
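The Tesseract path differs from machine to machine, so one option is to look it up instead of hard-coding it. The sketch below is an assumption-laden example, not part of the project: `resolve_tesseract_path` is a hypothetical helper, and the default path is only a common Windows install location that may not match your setup.

```python
import shutil

# Common Windows install location; an assumption, so adjust it to wherever
# Tesseract is actually installed on your machine.
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_path(default=DEFAULT_WINDOWS_PATH):
    """Return the tesseract executable found on PATH, else the default path."""
    return shutil.which("tesseract") or default


if __name__ == "__main__":
    path = resolve_tesseract_path()
    print("Tesseract path:", path)
    # In app.py this value could then be assigned as:
    # pytesseract.pytesseract.tesseract_cmd = path
```

If Tesseract is already on `PATH`, `shutil.which` finds it and the default is never used.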
- For Windows
- Open two Command Prompt tabs
- Change the working directory
cd Jatayu-ContentModeration
- Activate the virtual environment (Conda is preferred).
- In the first CMD tab, run
python app.py
It will start the API server.
- In the second CMD tab, change to the directory of the sample UI
cd Web-Frontend
- Create a folder named uploads inside the static folder
cd static && mkdir uploads
- Return to the Web-Frontend Directory
cd ..
- Execute server.py to start the Flask web server
python server.py
- The Flask server will run on host='127.0.0.1' and port=1025
- Open any web browser (Chrome preferred) and go to
127.0.0.1:1025
to see the live demo
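If the page does not load, a quick check is whether anything is listening on the UI port at all. The following is a minimal sketch, assuming only the host and port stated above; `is_port_open` is a hypothetical helper and not part of the repository.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Host and port of the Flask UI as given in the steps above.
    print("UI reachable:", is_port_open("127.0.0.1", 1025))
```

A result of `False` usually means server.py is not running or was started on a different port.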
Testing ensures that the API works as expected. The project uses unittest as its test framework. Once the basic test structure is set up, the unittest framework makes tests easy to write and provides a lot of flexibility for running them.
To Run the tests:
- Navigate to the project directory
cd Jatayu-ContentModeration
- Activate the virtual environment
- Run the test script
python test.py
Wanna know a secret? You can add more tests. So go ahead and create a pull request.
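For anyone new to unittest, an additional test case might be shaped roughly like the sketch below. `looks_like_spam` is a hypothetical stand-in written only for this example; it is not a function from the project, and a real test would import and call the project's own code instead.

```python
import unittest


def looks_like_spam(text):
    """Stand-in for a project function; hypothetical, for illustration only."""
    keywords = ("free money", "click here", "limited offer")
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


class SpamCheckTests(unittest.TestCase):
    def test_flags_obvious_spam(self):
        self.assertTrue(looks_like_spam("Click HERE for free money!"))

    def test_passes_normal_message(self):
        self.assertFalse(looks_like_spam("See you at the meeting tomorrow."))


def run_tests():
    """Load and run the test case, returning the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SpamCheckTests)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    run_tests()
```

Each `test_` method checks one behaviour, which keeps a failure easy to trace back to a single expectation.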
"No one and nothing is perfect, or we wouldn't have uniqueness." (A random quote from Google.) "No matter how good you get you can always get better, and that's the exciting part." (Another one.) To contribute to the project and make it better, follow the contributing guidelines for project-specific details.
Khushhal Reddy
Thanks go to these wonderful people:
- Khushhal Reddy
- Ranjan Panda
- Aryamaan Srivastava
This project follows the all-contributors specification. Contributions of any kind welcome!