Reproducing paper results #15
Thanks for asking the question. The result in the paper is obtained using the default parameters in the repo on an AWS g3.xlarge machine. There are 3 sources for the difference between experiments (and the sensitivity of RL training tends to amplify them). But the difference you saw is larger than the standard deviation in my experiments, so I would also like to investigate it. I am working on an update to fix (1) and (2) to make experiments more deterministic. For (3), may I know the machine configuration you are using? In the README, I attached a picture of the learning curve of one run that reached 72.35% dev accuracy on WikiSQL. If it helps, I can also share the full tensorboard log and the saved best model from a more recent experiment.
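For anyone debugging similar run-to-run variance: pinning every random seed is the usual first step toward deterministic experiments. A minimal stdlib-only sketch of the idea (the commented-out NumPy and TF 1.x calls match the libraries this repo uses; the seed value 42 is an arbitrary choice):

```python
import os
import random

SEED = 42  # arbitrary choice; any fixed value works

# Python-level randomness (shuffling, sampling, etc.)
random.seed(SEED)
os.environ["PYTHONHASHSEED"] = str(SEED)

# If NumPy / TensorFlow 1.x are in use (as in this repo), seed them too:
#   import numpy as np; np.random.seed(SEED)
#   import tensorflow as tf; tf.set_random_seed(SEED)

# Two generators seeded identically produce identical streams.
a = random.Random(SEED).random()
b = random.Random(SEED).random()
print(a == b)  # → True
```

Note that even with all seeds fixed, some GPU kernels are nondeterministic, so variance across runs rarely drops to exactly zero.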
Thanks for your quick response! I used an AWS g3.xlarge. I tried multiple times, but I consistently get results around 70.3%.
Thanks for the input. I will try starting some new AWS instances to see if I can replicate the issue. In the meantime, here's a link to the data of a recent run that reached 72.2% dev accuracy. The tensorboard log is in the
Thanks, I'd love to find out where the difference originates. I downloaded the repo again to make sure I did not make any changes and ran it again, but reached the same result. The only thing I had to change to make it work is replacing (line 70 of table/utils.py):

```python
try:
    val = babel.numbers.parse_decimal(val)
except babel.numbers.NumberFormatError:
    val = val.lower()
```

with

```python
try:
    val = babel.numbers.parse_decimal(val)
except (babel.numbers.NumberFormatError, UnicodeEncodeError):
    val = val.lower()
```

due to errors like this. Do you think that might be the reason? And if so, do you have any idea how to prevent those errors in the first place?
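The patch simply widens the set of exceptions that trigger the lowercase-text fallback. A self-contained sketch of the same pattern, with `decimal.Decimal` standing in for `babel.numbers.parse_decimal` (a stand-in for illustration only; the real code keeps babel):

```python
from decimal import Decimal, InvalidOperation

def normalize_cell(val):
    """Try to parse a table cell as a number; fall back to lowercased text.

    Mirrors the patched table/utils.py logic, with decimal.Decimal standing
    in for babel.numbers.parse_decimal (an assumption for illustration).
    """
    try:
        return Decimal(val)
    except (InvalidOperation, UnicodeEncodeError, ValueError):
        # Non-numeric cells (or non-ASCII values that fail a byte-encoding
        # step, as in the reported UnicodeEncodeError) are kept as text.
        return val.lower()

print(normalize_cell("3.14"))  # → 3.14
print(normalize_cell("Café"))  # → café
```

Catching the extra exception keeps preprocessing alive on non-ASCII cells, at the cost of treating them as plain strings rather than numbers.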
Sorry for the late reply. I have added your change into the codebase and rerun the experiments on two new AWS instances. The mean and std from 3 experiments (each averaging 5 runs) are 71.92+-0.21%, 71.97+-0.17%, and 71.93+-0.38%. You can also download all the data for these 3 experiments here: 1 2 3. I am also curious about the reason for the difference. I have added a new branch named fix_randomization. Thanks.
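For concreteness, mean+-std figures like these can be computed with Python's `statistics` module; the five per-run accuracies below are made up for illustration:

```python
import statistics

# Hypothetical dev accuracies (%) from 5 runs of one experiment.
runs = [71.7, 72.1, 71.9, 72.2, 71.7]

mean = statistics.mean(runs)   # arithmetic mean
std = statistics.stdev(runs)   # sample std (n - 1 denominator)
print(f"{mean:.2f}+-{std:.2f}%")  # → 71.92+-0.23%
```

`statistics.stdev` uses the sample (n - 1) estimator; if the reported +-numbers were computed with the population estimator (`statistics.pstdev`), they would come out slightly smaller.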
Hi! I ran the experiments again on the fix_randomization branch, but it did not change the results (still around 70%).

Did you re-download the data before running the experiments? I cannot think of any other form of randomness at this point, but the difference is quite consistent.
Okay, I finally found the source of the difference. I had used a newer version of the Deep Learning AMI on AWS; I ran the experiments with v10 now and got the same results (around 71.7%). It would be interesting to know which operations changed.
Thanks for reporting this and for running the experiments to confirm it! That's interesting; I would also like to look into this. What is the newer version of the Deep Learning AMI you used?
Hi there :-) I'm trying to replicate the results on WikiTableQuestions. I tried TensorFlow v1.12.0 (Deep Learning AMI 21.0) and v1.8.0 (Deep Learning AMI 10.0). The corresponding accuracies are 41.12% for v1.12.0 and 43.27% for v1.8.0, so the difference looks like it comes from the TensorFlow version. Also, are the current settings in
Hi, thanks for the information :) I will run some experiments to compare TF v1.12.0 vs v1.8.0. The current setting in The
As an update, I have created a branch. This setting gets slightly lower results on WikiTable (41.51+-0.19% dev accuracy, 42.78+-0.77% test accuracy). Below is the command to reproduce the experiments (after pulling the latest version of the repo):
Can you add more details about dataset preprocessing? For example, how to generate the |
Where do you get
@dungtn Here's a detailed summary created by another researcher on how to replicate the preprocessing and experiments starting from the raw WikiTableQuestions dataset, and how to adapt the code to other similar datasets. I also added a link to this summary in the README. @guotong1988 Unfortunately, I don't remember where exactly I got the list of
Hi!
I was playing with your code, great work! I am trying to reproduce the results from your paper on WikiSQL. However, when using run.sh I get results around 70.3% on the dev set instead of the reported 72.2%. Are there any parameters I need to change to get the reported results?
Thanks in advance!