This project develops specialized machine learning models to predict reaction energies for heterogeneous catalysis, focusing on surface-mediated reactions. The models are designed to handle different types of reactions separately, recognizing that different reaction mechanisms may require different prediction approaches.
The dataset contains 3,269 chemical reactions from the CatApp database, including:
- Reaction energies
- Surface information (metal type, facet)
- Reactant and product species
- Reference information
The reactions fall into three categories:
- Adsorption Reactions: 1,794 reactions (55%)
  - Simple adsorption (e.g., CO → CO*)
  - Dissociative adsorption (e.g., H2 → 2H*)
- Surface Reactions: 760 reactions (23%)
  - Reactions between adsorbed species
  - Surface-mediated transformations
- Other Reactions: 715 reactions (22%)
  - Complex mechanisms
  - Multi-step reactions
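The split into reaction types can be illustrated with a small rule-based classifier. This is a minimal sketch, not the project's actual labeling code: it assumes reactions are written as `reactants → products` strings with `*` marking adsorbed species, and the function names and rules below are hypothetical.

```python
def is_adsorbed(species: str) -> bool:
    """True for an adsorbed species such as 'CO*' or '2H*' (a bare site '*' does not count)."""
    s = species.strip()
    return s.endswith("*") and len(s) > 1


def classify_reaction(equation: str) -> str:
    """Assign a reaction string to 'adsorption', 'surface', or 'other'.

    Assumed rules (illustrative only):
      - adsorption: no adsorbed reactants, every product adsorbed
      - surface:    every reactant and every product adsorbed
      - other:      anything else
    """
    left, right = equation.split("→")
    # Drop bare surface sites so 'CO + * → CO*' is treated like 'CO → CO*'.
    reactants = [s.strip() for s in left.split("+") if s.strip() != "*"]
    products = [s.strip() for s in right.split("+") if s.strip() != "*"]

    ads_reactants = sum(is_adsorbed(s) for s in reactants)
    ads_products = sum(is_adsorbed(s) for s in products)
    all_products_adsorbed = bool(products) and ads_products == len(products)

    if ads_reactants == 0 and all_products_adsorbed:
        return "adsorption"
    if reactants and ads_reactants == len(reactants) and all_products_adsorbed:
        return "surface"
    return "other"
```

For example, `classify_reaction("H2 → 2H*")` falls under adsorption, while a mixed case such as `OH* + H2 → H2O` falls through to the "other" group under these assumed rules.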
Adsorption Reactions model:
- Algorithm: Gradient Boosting Regressor
- Performance:
  - R² Score: 0.67
  - RMSE: 1.10 eV
  - Predictions within ±0.5 eV: 63.7%
  - Predictions within ±1.0 eV: 86.3%
- Key Features:
  - Surface electronic properties
  - Molecular complexity descriptors
  - Adsorption site information
Surface Reactions model:
- Algorithm: Random Forest Regressor
- Performance:
  - R² Score: 0.85
  - RMSE: 0.69 eV
  - Predictions within ±0.5 eV: 54.4%
  - Predictions within ±1.0 eV: 88.6%
- Key Features:
  - Reactant-surface interactions
  - Bond formation/breaking descriptors
  - Surface geometry information
Other Reactions model:
- Algorithm: Random Forest Regressor
- Performance:
  - R² Score: 0.22
  - RMSE: 0.94 eV
  - Predictions within ±0.5 eV: 66.7%
  - Predictions within ±1.0 eV: 87.0%
- Key Features:
  - Complex reaction descriptors
  - Multi-step indicators
  - Combined interaction terms
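The performance figures reported for each model (R², RMSE, and the share of predictions within a tolerance) can be computed as in the sketch below. It uses only the standard library; the function name `regression_report` and the dictionary keys are assumptions for illustration, not the project's actual API.

```python
import math


def regression_report(y_true, y_pred, tolerances=(0.5, 1.0)):
    """Return R², RMSE, and the fraction of predictions within each tolerance (in eV)."""
    n = len(y_true)
    errors = [pred - true for true, pred in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares

    report = {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
    }
    for tol in tolerances:
        report[f"within_{tol}_eV"] = sum(abs(e) <= tol for e in errors) / n
    return report
```

Because R² is measured relative to the variance of the target values, a model can score a low R² while still placing most predictions within ±1.0 eV when the reaction energies in that group span a narrow range.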
- Element-wise composition analysis
- Molecular complexity scores
- Surface binding indicators (*)
- Molecular size descriptors
- Metal type encoding
- Facet information
- Alloy composition analysis
- Electronic properties:
  - Electronegativity
  - Atomic radius
  - Surface energy (where available)
- Surface-adsorbate interactions
- Reactant-reactant coupling
- Metal-facet correlations
- Alloy-specific descriptors
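A few of the descriptors listed above (element-wise composition, the `*` surface binding indicator, molecular size, metal and facet encoding, and a metal-facet interaction term) can be sketched as follows. All function names and the exact feature definitions are hypothetical; the project's real feature pipeline may differ.

```python
import re
from collections import Counter


def element_counts(species: str) -> Counter:
    """Count atoms per element, e.g. '2H*' -> {'H': 2}, 'CH3OH*' -> {'C': 1, 'H': 4, 'O': 1}."""
    match = re.match(r"\s*(\d*)\s*(.*)", species)
    coefficient = int(match.group(1)) if match.group(1) else 1
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", match.group(2)):
        counts[element] += coefficient * (int(number) if number else 1)
    return counts


def species_features(species: str) -> dict:
    """Size, a crude complexity score, and the surface binding indicator."""
    counts = element_counts(species)
    return {
        "n_atoms": sum(counts.values()),   # molecular size
        "n_elements": len(counts),         # assumed stand-in for a complexity score
        "is_adsorbed": int(species.strip().endswith("*")),
    }


def one_hot(value, categories, prefix):
    """One-hot encode a categorical value over a fixed category list."""
    return {f"{prefix}_{c}": int(value == c) for c in categories}


def reaction_features(species, metal, facet, metals, facets):
    """Combine species, surface, and metal-facet interaction features into one dict."""
    features = species_features(species)
    features.update(one_hot(metal, metals, "metal"))
    features.update(one_hot(facet, facets, "facet"))
    # Interaction term: one indicator per (metal, facet) pair.
    pairs = [f"{m}({f})" for m in metals for f in facets]
    features.update(one_hot(f"{metal}({facet})", pairs, "surface"))
    return features
```

A call such as `reaction_features("CO*", "Pd", "111", metals=["Pt", "Pd"], facets=["111", "211"])` returns a flat dictionary that can be turned into a row of a pandas DataFrame.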
- Surface Reactions show the highest predictability (R² = 0.85)
  - Well-defined reaction mechanisms
  - Consistent surface interactions
  - Clear structure-property relationships
- Adsorption Reactions show moderate predictability (R² = 0.67, RMSE 1.10 eV)
  - Simple mechanisms are highly predictable
  - Alloy surfaces introduce complexity
  - Surface structure effects are significant
- Other Reactions need improvement (R² = 0.22)
  - Complex mechanisms reduce predictability
  - Multiple steps increase uncertainty
  - More specialized descriptors are needed
- Simple Metal Surfaces:
  - High prediction accuracy
  - Consistent behavior
  - Well-understood mechanisms
- Alloy Surfaces:
  - Increased prediction errors
  - Complex electronic effects
  - Composition-dependent behavior
- Facet Effects:
  - (211) facets show higher variability
  - Structure sensitivity varies by reaction
  - Surface reconstruction effects
- CO on Pd3Sb(111): 13.79 eV error
  - Possible electronic structure complexity
  - Surface reconstruction effects
  - Limited training data for similar systems
- N2 formation on Re surfaces: ~2.46 eV error
  - Complex electronic structure
  - Strong correlation effects
  - Multiple reaction pathways
- OH* + H2 → H2O on Pd alloys: ~6.13 eV error
  - Complex reaction mechanism
  - Multiple interaction sites
  - Electronic structure effects
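Cases like these can be surfaced by ranking predictions by absolute error and by comparing the mean absolute error across groups such as simple metals and alloys. The sketch below assumes each prediction is stored as a dictionary with `actual` and `predicted` energies in eV; the record layout and function names are hypothetical.

```python
from collections import defaultdict


def largest_errors(records, top_n=3):
    """Return the top_n records with the largest absolute prediction error."""
    scored = [
        {**r, "abs_error": abs(r["predicted"] - r["actual"])}
        for r in records
    ]
    return sorted(scored, key=lambda r: r["abs_error"], reverse=True)[:top_n]


def mean_abs_error_by(records, key):
    """Mean absolute error per group, e.g. key='surface_type' or key='facet'."""
    grouped = defaultdict(list)
    for r in records:
        grouped[r[key]].append(abs(r["predicted"] - r["actual"]))
    return {group: sum(errs) / len(errs) for group, errs in grouped.items()}
```

Grouping by a hypothetical `surface_type` or `facet` field is one way to check whether alloy surfaces or (211) facets account for a disproportionate share of the error.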
- Implement hierarchical modeling approaches
- Develop metal-specific sub-models
- Add ensemble methods for robust predictions
- Include uncertainty quantification
- Add electronic structure descriptors
- Develop reaction mechanism indicators
- Include surface reconstruction effects
- Add thermodynamic descriptors
- Gather more data for challenging cases
- Balance dataset across reaction types
- Include more diverse surface structures
- Add experimental validation data
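Of the recommendations above, uncertainty quantification can be prototyped with the existing Random Forest models by looking at how much the individual trees disagree. This is a rough sketch on synthetic data, not a calibrated error bar: the helper `predict_with_spread` is a hypothetical name, and the per-tree standard deviation is only a heuristic indicator of confidence.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def predict_with_spread(forest, X):
    """Return the forest prediction and the standard deviation across its trees."""
    per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])
    return per_tree.mean(axis=0), per_tree.std(axis=0)


# Synthetic stand-in for a feature matrix and reaction energies (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
mean, spread = predict_with_spread(forest, X[:5])
```

Predictions with a large spread could be flagged for additional data collection or DFT verification before they are used for screening.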
- Rapid screening of new catalysts
- Property prediction for novel materials
- Optimization of surface composition
- Reaction condition optimization
- Catalyst stability prediction
- Process efficiency improvement
- Mechanism investigation
- Structure-property relationships
- Design principle development
- Python 3.8+
- scikit-learn
- pandas
- numpy
- `models.py`: Model implementations
- `utils.py`: Utility functions
- `train.py`: Training pipeline
- `model_base.py`: Base model class
- Implement deep learning approaches
- Add molecular fingerprinting
- Develop automated feature selection
- Include temperature and pressure effects
- Add reaction path analysis
- CatApp database
- Relevant publications
- Method documentation

# CatalystML