Type estimators #1542
Open
eddiebergman wants to merge 105 commits into development from type_estimators
* only active if kernel == 'poly' * adapt the metadata to reflect this
* black checker * Simplified * add examples to black format check Co-authored-by: Matthias Feurer <[email protected]>
* re-structure manual and use 'collapse' * ADD link to auto-sklearn-talks * unifying titles * Clarify default memory and cpu usage * FIX sphinx_gallery to <=0.10.0 0.10.1 would raise an error for '-D plot_gallery=0' * Re-structure faq * FIX comments by mfeurer * boldface items * merge manual into FAQ * FIX minor * FIX typo * Update doc/faq.rst Co-authored-by: Eddie Bergman <[email protected]> * Update doc/faq.rst Co-authored-by: Eddie Bergman <[email protected]> * Update doc/faq.rst Co-authored-by: Eddie Bergman <[email protected]> * Update doc/faq.rst Co-authored-by: Eddie Bergman <[email protected]> * Update doc/manual.rst Co-authored-by: Eddie Bergman <[email protected]> * Update doc/manual.rst Co-authored-by: Eddie Bergman <[email protected]> * Update doc/faq.rst Co-authored-by: Eddie Bergman <[email protected]> * FIX link Co-authored-by: Eddie Bergman <[email protected]>
If you're only exposure to using... -> If your only exposure to using...
* np.bool deprecation * Invalid escape sequence \_ * Series specify dtype * drop na requires keyword args deprecation * unspecified np.int size deprecated, use int instead * deprecated unspeicifed np.int precision * Element wise comparison failed, will raise error in the future * Specify explicit dtype for empty series * metric warnings for mismatch between y_pred and y_true label count * Quantile transformer n_quantiles larger than n_samples warning ignored * Silenced convergence warnings * pass sklearn args as keywords * np.bool deprecation * Invalid escape sequence \_ * Series specify dtype * drop na requires keyword args deprecation * unspecified np.int size deprecated, use int instead * deprecated unspeicifed np.int precision * Element wise comparison failed, will raise error in the future * Specify explicit dtype for empty series * metric warnings for mismatch between y_pred and y_true label count * Quantile transformer n_quantiles larger than n_samples warning ignored * Silenced convergence warnings * pass sklearn args as keywords * flake8'd * flake8'd * Fixed CategoricalImputation not accounting for sparse matrices * Updated to use distro for linux distribution * Ignore convergence warnings for gaussian process regressor * Averaging metrics now use zero_division parameter * Readded scorers to module scope * flake8'd * Fix * Fixed dtype for metalearner no run * Catch gaussian process iterative fit warning * Moved ignored warnings to tests * Correctly type pd.Series * Revert back to usual iterative fit * Readded missing iteration increment * Removed odd backslash * Fixed imputer for sparse matrices * Ignore warnings we are aware about in tests * Flake'd: * Revert "Fixed imputer for sparse matrices" This reverts commit 05675ad. * Revert "Revert "Fixed imputer for sparse matrices"" This reverts commit d031b0d. 
* Back to default values * Reverted to default behaviour with comment * Added xfail test to document * flaked * Fixed test, moved to np.testing for assertion * Update autosklearn/pipeline/components/data_preprocessing/categorical_encoding/encoding.py Co-authored-by: Matthias Feurer <[email protected]> Co-authored-by: Matthias Feurer <[email protected]>
* Added manual dispatch to tests * Removed parameters to manual dispatch
…tors (#1332) * Update docstrings and types * doc typo fix * flake'd
* added python 3.10 to versions * Added quotes around versions * Trigger tests
* Add submodule * Port to abstract_ensemble, backend from automl_common * Updated workflow files * Update imports * Trigger actions * Another import fix * update import * m * Backend fixes * Backend parameter update * fixture fix for backend * Fix tests * readd old abstract ensemble for now * flake8'd * Added install from source to readme * Moved installation w.r.t submodules to the docs * Temporarily remove submodule * Readded submodule * Updated to use automl_common under autosklearn * Updated MANIFEST * Removed uneeded statements from MANIFEST * Fixed import * Fixed comment line in MANIFEST.in * Added automl_common/setup.py to MANIFEST * Added prefix to script * Re-added removed title # * Added note for submodule for CONTRIBUTING * Made the submodule step a bit more clear for contributing.md * CONTRIBUTING fixes
* Added versioning for sphinx, docutils - introduced by sphinxtoolbox * Fixed bug with config value for `plot_gallery` in doc makefile * Update linkcheck command as well
* Added ignored_warnings file * Use ignored_warnings file * Test regressors with 1d, 1d as 2d and 2d targets * Flake'd * Fix broken relative imports to ignore_warnings * Removed print and updated parameter type for tests * Type import fix
* Added random state to classifiers * Added some doc strings * Removed random_state again * flake'd * Fix some test issues * Re-added seed to test * Updated test doc for unknown test * flake'd
* Added ignored_warnings file * Use ignored_warnings file * Test regressors with 1d, 1d as 2d and 2d targets * Flake'd * Fix broken relative imports to ignore_warnings * Removed print and updated parameter type for tests * Added warning catches to fit methods in tests * Added more warning catches * Flake'd * Created top-level module to allow relativei imports * Deleted blank line in __init__ * Remove uneeded ignore warnings from tests * Fix bad indent * Fix github merge conflict editor whitespaces and indents
* update workflow files * typo fix * Update pytest * remove bad semi-colon * Fix test runner command * Remove explicit steps required from older version * Explicitly add Conda python to path for subprocess command in test * Fix the mypy compliance check * Added PEP 561 compliance * Add py.typed to MANIFEST for dist * Remove py.typed from setup.py
* rename OSX -> macOS as it is the new name rename OSX -> macOS as it is the new name for the operating system. e.g. see https://www.apple.com/macos * Update doc/installation.rst Co-authored-by: Matthias Feurer <[email protected]> * Update doc/installation.rst Co-authored-by: Matthias Feurer <[email protected]> Co-authored-by: Matthias Feurer <[email protected]> Co-authored-by: Matthias Feurer <[email protected]>
…semble (#1321) * Changed show_models() function to return a dictionary of models in the ensemble instead of a string
* Remove flaky dep * Remove unused pytest import
* Fix: MLPRegressor tests * Fix: Ordering of statements in test * Fix: MLP n_calls
* Fix: Raises errors with the config * Add: Skip error for kernel_pca. It seems kernel_pca emits the error `"zero-size array to reduction operation maximum which has no identity"`. This is raised on the line `max_eig = lambdas.max()`, which suggests it produces a matrix with no real eigenvalues, not something we can really control for
…ures (#1250) * Moved to new splitter, moved to util file * flake8'd * Fixed errors, added test specifically for CustomStratifiedShuffleSplit * flake8'd * Updated docstring * Updated types in docstring * reduce_dataset_size_if_too_large supports more types * flake8'd * flake8'd * Updated docstring * Seperated out the data subsampling into individual functions * Improved typing from Automl.fit to reduce_dataset_size_if_too_large * flak8'd * subsample tested * Finished testing and flake8'd * Cleaned up transform function that was touched * ^ * Removed double typing * Cleaned up typing of convert_if_sparse * Cleaned up splitters and added size test * Cleanup doc in data * rogue line added was removed * Test fix * flake8'd * Typo fix * Fixed ordering of things * Fixed typing and tests of target_validator fit, transform, inv_transform * Updated doc * Updated Type return * Removed elif gaurd * removed extraneuous overload * Updated return type of feature validator * Type fixes for target validator fit * flake8'd * Moved to new splitter, moved to util file * flake8'd * Fixed errors, added test specifically for CustomStratifiedShuffleSplit * flake8'd * Updated docstring * Updated types in docstring * reduce_dataset_size_if_too_large supports more types * flake8'd * flake8'd * Updated docstring * Seperated out the data subsampling into individual functions * Improved typing from Automl.fit to reduce_dataset_size_if_too_large * flak8'd * subsample tested * Finished testing and flake8'd * Cleaned up transform function that was touched * ^ * Removed double typing * Cleaned up typing of convert_if_sparse * Cleaned up splitters and added size test * Cleanup doc in data * rogue line added was removed * Test fix * flake8'd * Typo fix * Fixed ordering of things * Fixed typing and tests of target_validator fit, transform, inv_transform * Updated doc * Updated Type return * Removed elif gaurd * removed extraneuous overload * Updated return type of feature validator * Type fixes for 
target validator fit * flake8'd * Fixed err message str and automl sparse y tests * Flak8'd * Fix sort indices * list type to List * Remove uneeded comment * Updated comment to make it more clear * Comment update * Fixed warning message for reduce_dataset_if_too_large * Fix test * Added check for error message in tests * Test Updates * Fix error msg * reinclude csr y to test * Reintroduced explicit subsample values test * flaked * Missed an uncomment * Update the comment for test of splitters * Updated warning message in CustomSplitter * Update comment in test * Update tests * Removed overloads * Narrowed type of subsample * Removed overload import * Fix `todense` giving np.matrix, using `toarray` * Made subsampling a little less aggresive * Changed multiplier back to 10 * Allow argument to specfiy how auto-sklearn handles compressing dataset size (#1341) * Added dataset_compression parameter and validation * Fix docstring * Updated docstring for `resampling_strategy` * Updated param def and memory_allocation can now be absolute * insert newline * Fix params into one line * fix indentation in docs * fix import breaks * Allow absolute memory_allocation * Tests * Update test on for precision omitted from methods * Update test for akslearn2 with same args * Update to use TypedDict for better Mypy parsing * Added arg to asklearn2 * Updated tests to remove some warnings * flaked * Fix broken link? * Remove TypedDict as it's not supported in Python3.7 * Missing import * Review changes * Fix magic mock for python < 3.9 * Fixed bad merge
* commit meta learning data bases * commit changed files * commit new files * fixed experimental settings * implemented last comments on old PR * adapted metalearning to last commit * add a text preprocessing example * intigrated feedback * new changes on *.csv files * reset changes * add changes for merging * add changes for merging * add changes for merging * try to merge * fixed string representation for metalearning (some sort of hot fix, maybe this needs to be fixed in a bigger scale) * fixed string representation for metalearning (some sort of hot fix, maybe this needs to be fixed in a bigger scale) * fixed string representation for metalearning (some sort of hot fix, maybe this needs to be fixed in a bigger scale) * init * init * commit changes for text preprocessing * text prepreprocessing commit * fix metalearning * fix metalearning * adapted test to new text feature * fix style guide issues * integrate PR comments * integrate PR comments * implemented the comments to the last PR * fitted operation is not in place therefore we have to assgin the fitted self.preprocessor again to it self * add first text processing tests * add first text processing tests * including comments from 01.25. * including comments from 01.28. * including comments from 01.28. * including comments from 01.28. * including comments from 01.31.
* Init commit * Fix logging server cleanup (#1503) * Fix logging server cleanup * Add comment relating to the `try: finally:` * Remove nested try: except: from `fit` * Bump peter-evans/find-comment from 1 to 2 (#1520) Bumps [peter-evans/find-comment](https://github.com/peter-evans/find-comment) from 1 to 2. - [Release notes](https://github.com/peter-evans/find-comment/releases) - [Commits](peter-evans/find-comment@v1...v2) --- updated-dependencies: - dependency-name: peter-evans/find-comment dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump actions/stale from 4 to 5 (#1521) Bumps [actions/stale](https://github.com/actions/stale) from 4 to 5. - [Release notes](https://github.com/actions/stale/releases) - [Changelog](https://github.com/actions/stale/blob/main/CHANGELOG.md) - [Commits](actions/stale@v4...v5) --- updated-dependencies: - dependency-name: actions/stale dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Init commit * Update evaluation module * Clean up other occurences of the word validation * Re-add test for test predictions Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Add debug statements and 30s timeouts * Fix formatting * Update internal timeout param * +timeout, use allocated tmpdir * +timeout, use allocated tmpdir * Remove another occurence of explicit `tmp` * Increase timelimits once again * Remove incomplete comment
* Init commit * Fix DummyClassifiers in _load_pareto_set * Add test for dummy only in classifiers * Update no ensemble docstring * Add automl case where automl only has dummy * Remove tmp file * Fix `include` statement to be regressor
* Create PR * Update MLP regressor values
* Make docker file install from `setup.py` * Add pytest cache to gitignore * Up timeouts on test_metadata_generation
* Create PR * Fix test fixture
* Bump docker/build-push-action from 1 to 3 Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 1 to 3. - [Release notes](https://github.com/docker/build-push-action/releases) - [Commits](docker/build-push-action@v1...v3) --- updated-dependencies: - dependency-name: docker/build-push-action dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <[email protected]> * Update docker-publish.yml Replace password by token Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Matthias Feurer <[email protected]>
* Create PR * Abstract out dask client types * Fix _ issue * Extend scope of dask_client in automl.py * Add docstring to dask module * Indent result addition * Add basic tests for Dask wrappers
eddiebergman force-pushed the type_estimators branch from a921117 to 857a8c6 on July 16, 2022 22:08
eddiebergman force-pushed the development branch from d813838 to 259ed3d on August 18, 2022 18:14
This is a pretty big PR aimed at doing a simple thing: remove `estimators.py` and `automl.py` from the mypy ignore list. In progress, notes on changes are TODO. I'll resolve conflicts once v0.15 is out. I can also split this into multiple smaller PRs to make it easier if needed. Tests still need to be updated to accommodate the changes.
There were 168 typing errors :) Some of them were actual possible bugs, based on the order in which things were called and parameters were set.
Major points:
* `AutoML` is now an `ABC`. This removes some failure cases, such as when the task type is not defined or `is_classification` is misspecified; see "Tests for the `AutoML` class relying on `is_classification=false` even when it is a classification task, crash when corrected" (#1212). `AutoML` now relies on its subclasses specifying things, which was also simplified greatly with just class variables. The `is_classification=True/False` parameters were removed as well.
* Made the `AutoSklearnEstimator` smarter with respect to types in a similar fashion. Notably, it's smarter about what it returns through the use of a `Generic` in the main class, with those types provided in the subclass. This mainly means that code editors will know whether `predict_proba` will be available and that `fit` will return the right estimator and not just the abstract `AutoSklearnEstimator`. These are then specified in the subclass.
* Attributes set in `fit` are now wrapped in a property, i.e. `self._logger` or `self._task`, raising a `NotFittedError` as sklearn would. This is because using them directly in other methods would correctly warn something like "self._task could be None" when calling methods that rely on `fit` having been called first.
* Made `transform` and `get_cost_of_crash` smarter with `@overload`; each now knows its return type correctly based on the type of its input.
* Ran `pyupgrade` on a few files I touched.
* Simplified the data compression things into a class. The typing caught some weirdness when data compression was on but `memory_limit` wasn't set.
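The first three points (an abstract base whose subclasses declare the task through class variables, a `Generic` estimator whose subclass supplies the concrete types, and fit-time attributes exposed through properties that raise `NotFittedError`) might look roughly like the sketch below. Every class and method name ending in `Sketch`, plus `_build_automl`, is hypothetical and chosen for illustration; `NotFittedError` is a local stand-in for `sklearn.exceptions.NotFittedError` so the snippet runs without scikit-learn. It is not the code in this PR.

```python
from abc import ABC, abstractmethod
from typing import ClassVar, Generic, Optional, TypeVar


class NotFittedError(ValueError, AttributeError):
    """Local stand-in for sklearn.exceptions.NotFittedError."""


class AutoMLSketch(ABC):
    """Hypothetical base: subclasses state the task with a class variable."""

    is_classification: ClassVar[bool]

    def __init__(self) -> None:
        self._task_: Optional[str] = None  # populated by fit()

    @property
    def _task(self) -> str:
        # Narrow Optional[str] to str in one place, failing loudly if unfitted.
        if self._task_ is None:
            raise NotFittedError("fit() must be called before accessing the task")
        return self._task_

    def fit(self) -> None:
        self._task_ = "classification" if self.is_classification else "regression"


class AutoMLClassifierSketch(AutoMLSketch):
    is_classification = True

    def predict_proba(self) -> str:
        return f"probabilities for a {self._task} task"


class AutoMLRegressorSketch(AutoMLSketch):
    is_classification = False


AutoMLT = TypeVar("AutoMLT", bound=AutoMLSketch)
SelfT = TypeVar("SelfT", bound="EstimatorSketch")


class EstimatorSketch(ABC, Generic[AutoMLT]):
    """Hypothetical estimator, generic over the AutoML class it wraps."""

    def __init__(self) -> None:
        self._automl_: Optional[AutoMLT] = None

    @abstractmethod
    def _build_automl(self) -> AutoMLT:
        """Subclasses return their concrete AutoML type."""

    @property
    def automl(self) -> AutoMLT:
        if self._automl_ is None:
            raise NotFittedError("fit() must be called first")
        return self._automl_

    def fit(self: SelfT) -> SelfT:
        # Annotating self with SelfT lets a type checker report the concrete
        # subclass as the return type instead of the abstract base.
        automl = self._build_automl()
        automl.fit()
        self._automl_ = automl
        return self


class ClassifierSketch(EstimatorSketch[AutoMLClassifierSketch]):
    def _build_automl(self) -> AutoMLClassifierSketch:
        return AutoMLClassifierSketch()

    def predict_proba(self) -> str:
        # self.automl is typed as AutoMLClassifierSketch, so this resolves.
        return self.automl.predict_proba()


class RegressorSketch(EstimatorSketch[AutoMLRegressorSketch]):
    def _build_automl(self) -> AutoMLRegressorSketch:
        return AutoMLRegressorSketch()
```

With this shape, `ClassifierSketch().fit()` is typed as `ClassifierSketch`, `predict_proba` exists only on the classifier side, and touching `automl` or `_task` before `fit` raises instead of silently yielding `None`.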
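As a rough illustration of the `@overload` point, the sketch below shows how a function can advertise a return type that depends on its input type. The function `cost_of_crash_sketch` and its signature are assumptions made up for this example; they are not the real signature of `get_cost_of_crash` or `transform` in auto-sklearn.

```python
from typing import List, Sequence, Union, overload


@overload
def cost_of_crash_sketch(worst: float) -> float: ...


@overload
def cost_of_crash_sketch(worst: Sequence[float]) -> List[float]: ...


def cost_of_crash_sketch(
    worst: Union[float, Sequence[float]],
) -> Union[float, List[float]]:
    # Single value in, single value out; sequence in, list out.
    if isinstance(worst, (int, float)):
        return float(worst)
    return [float(w) for w in worst]
```

A type checker reading the two overloads infers `float` for `cost_of_crash_sketch(1.0)` and `List[float]` for `cost_of_crash_sketch([1.0, 2.5])`, so callers do not need to narrow a `Union` themselves.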