I was trying to encode my 4+ million transactions for input into `fpgrowth()` when I got an error at the `pd.DataFrame` constructor:

```
ValueError: Shape of passed values is (4669117, 1), indices imply (4669117, 118669)
```

Note: in my case `.transform(..., sparse=True)` is necessary, because `sparse=False` tried to allocate half a terabyte of RAM, which I do not have.

```
fitted = te.fit(itemSetList)
te_ary = fitted.transform(itemSetList, sparse=True)
df = pd.DataFrame(te_ary, columns=te.columns_)
```

So the question is: since this DataFrame is going to be passed to mlxtend's `fpgrowth()`, what is the correct syntax for the DataFrame constructor after calling `.transform(..., sparse=True)`? Thanks in advance.
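For scale, the half-terabyte figure can be checked with a quick back-of-the-envelope calculation. This sketch assumes the dense output of `TransactionEncoder` is a NumPy bool array, i.e. one byte per cell, and uses the row and column counts from the error message above.

```python
# Size of the dense one-hot matrix from the question,
# assuming 1 byte per cell (NumPy bool dtype).
n_transactions = 4_669_117  # rows in the error message
n_items = 118_669           # columns in the error message (len(te.columns_))

dense_bytes = n_transactions * n_items  # rows x columns x 1 byte
print(f"{dense_bytes:,} bytes ~= {dense_bytes / 1e9:.0f} GB ~= {dense_bytes / 2**30:.0f} GiB")
# prints: 554,079,445,273 bytes ~= 554 GB ~= 516 GiB
```

The sparse CSR form stores only the non-zero cells, so its size grows with the total number of items across all transactions rather than with rows times columns.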
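I found the solution. `pd.DataFrame()` does not accept the SciPy sparse matrix that `transform(..., sparse=True)` returns, so you need to make a couple of small changes where you encode the transactions and build the DataFrame with `pd.DataFrame.sparse.from_spmatrix()` instead: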
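```
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()
# Dense version: allocates rows x columns cells, too much RAM for millions of transactions
#te_ary = te.fit(itemSetList).transform(itemSetList)
#df = pd.DataFrame(te_ary, columns=te.columns_)
fitted = te.fit(itemSetList)  # itemSetList: list of transactions, each a list of items
te_ary = fitted.transform(itemSetList, sparse=True)  # returns a SciPy CSR matrix
df = pd.DataFrame.sparse.from_spmatrix(te_ary, columns=te.columns_)  # sparse-backed DataFrame
```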
Now you can call mlxtend's `fpgrowth()` followed by `association_rules()`.