This project applies data mining techniques to US census data to derive recommendations. The large dataset is stored in the Hadoop filesystem (HDFS), and the processing follows a MapReduce approach on a high-performance computing cluster using the Apache Spark machine learning library. Core algorithm concepts are used for data preprocessing and wrangling. Refer to the project document for more details.
Association rule mining on this data can provide insight into:
- Employment status of the entire population of the United States.
- Education level of the typical US citizen.
- Taxable income range for individuals.
- Female entrepreneurs in the United States.

Our findings are:
- Most of the US population works in the private sector, and most of the working population is male.
- Most US citizens are high school graduates.
- Taxable income for most individuals is less than $50,000.
- Female entrepreneurs have a taxable income of less than $50,000.
Based on our findings, we recommend that the US government focus on education for citizens and reduce taxes to encourage more women in business.
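
To make the idea of association rule mining concrete, here is a minimal plain-Python sketch of how a rule such as "female implies taxable income below $50,000" is scored by support (the fraction of records containing all items in the rule) and confidence (the support of the rule divided by the support of its left-hand side). This is not the project's code and does not use Spark: it is a brute-force illustration over single-item rules, whereas the project relies on Spark's frequent pattern mining. The records, attribute names, function names, and thresholds below are all hypothetical.

```python
from itertools import permutations

# Hypothetical toy records, invented for illustration only.
# They are NOT taken from the census dataset used in the project.
RECORDS = [
    {"sector=private", "sex=male", "education=hs-grad", "income=<50K"},
    {"sector=private", "sex=female", "education=hs-grad", "income=<50K"},
    {"sector=private", "sex=male", "education=bachelors", "income=>=50K"},
    {"sector=self-employed", "sex=female", "education=hs-grad", "income=<50K"},
    {"sector=government", "sex=male", "education=hs-grad", "income=<50K"},
]


def support(itemset, records):
    """Return the fraction of records that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= record for record in records) / len(records)


def mine_rules(records, min_support=0.4, min_confidence=0.8):
    """Brute-force search for single-item rules a -> b.

    A rule is kept when support({a, b}) >= min_support and
    confidence = support({a, b}) / support({a}) >= min_confidence.
    The thresholds are arbitrary values chosen for this toy example.
    """
    items = sorted(set().union(*records))
    rules = []
    for a, b in permutations(items, 2):
        pair_support = support({a, b}, records)
        if pair_support < min_support:
            continue  # the pair is too rare to be considered frequent
        confidence = pair_support / support({a}, records)
        if confidence >= min_confidence:
            rules.append((a, b, pair_support, confidence))
    return rules


if __name__ == "__main__":
    for a, b, sup, conf in mine_rules(RECORDS):
        print(f"{a} -> {b}  support={sup:.2f}  confidence={conf:.2f}")
```

Enumerating every pair this way is only practical for a handful of attributes. On a dataset the size of the census data, an algorithm such as FP-growth, which is described in the Spark documentation linked below, avoids scanning all candidate item combinations.
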
http://spark.apache.org/docs/latest/mllib-frequent-pattern-mining.html