【Affiliation】
:
Laboratory for Space Environment and Physical Sciences, Harbin Institute of Technology, Harbin 150001,
Excerpt from the paper
Using data-driven algorithms to forecast solar flares accurately requires reliable data sets. A solar flare data set consists mostly of non-flaring samples, with only a small percentage of flaring samples. In data mining this is known as the class imbalance problem. During training, the prediction model is biased toward the majority class of the original data set. The class imbalance problem therefore needs to be discussed systematically when building a flare prediction model from observational data. To address it, three types of strategy are proposed, acting on the data set, the loss function, and the training process, respectively:
Type Ⅰ resamples the training samples: oversampling the minority class, undersampling the majority class, or a mix of the two.
Type Ⅱ changes the decision boundary by assigning different weights to the prediction losses of the majority and minority classes.
Type Ⅲ assigns different weights to the training samples: majority-class samples receive smaller weights and minority-class samples receive larger weights, to improve the training process of the prediction model.
The main work of this paper is to compare these imbalance-handling methods when building a flare prediction model and to try to find the optimal strategy. Our results show that oversampling and sample weighting outperform the other strategies on most evaluation parameters, while resampling and changing the decision boundary generalize better.
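The three strategy types described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, which this excerpt does not give. The function names, the toy labels (1 = flaring, 0 = non-flaring), and the inverse-frequency weighting formula are all assumptions chosen for illustration. The same class weights can either scale each class's loss term (Type Ⅱ) or be expanded into per-sample weights (Type Ⅲ).

```python
import random
from collections import Counter, defaultdict


def group_by_class(samples, labels):
    """Collect samples into one list per class label."""
    groups = defaultdict(list)
    for x, y in zip(samples, labels):
        groups[y].append(x)
    return groups


def random_oversample(samples, labels, seed=0):
    """Type I (assumed variant): duplicate random samples of the smaller
    classes until every class is as large as the largest one."""
    rng = random.Random(seed)
    groups = group_by_class(samples, labels)
    target = max(len(xs) for xs in groups.values())
    out_x, out_y = [], []
    for y, xs in groups.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def random_undersample(samples, labels, seed=0):
    """Type I (assumed variant): keep a random subset of the larger
    classes so every class is as small as the smallest one."""
    rng = random.Random(seed)
    groups = group_by_class(samples, labels)
    target = min(len(xs) for xs in groups.values())
    out_x, out_y = [], []
    for y, xs in groups.items():
        out_x.extend(rng.sample(xs, target))
        out_y.extend([y] * target)
    return out_x, out_y


def inverse_frequency_weights(labels):
    """Types II/III (assumed formula): weight_c = n / (k * n_c), so rare
    classes get larger weights and common classes get smaller ones."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


# Toy data: 9 non-flaring samples (label 0) and 1 flaring sample (label 1).
labels = [0] * 9 + [1]
samples = list(range(10))

class_w = inverse_frequency_weights(labels)   # Type II: per-class loss weights
sample_w = [class_w[y] for y in labels]       # Type III: per-sample weights
```

With this weighting the per-sample weights sum to the number of samples, so the overall scale of a weighted loss stays comparable to the unweighted one.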
Other literature
We investigate relations in the emission properties as revealed by drifting subpulses detected at different observing frequencies based on the method that incorporates the rotating carousels in pulsar magnetospheres of multiple emission states. An emission
PSR J1946+3417 is a millisecond pulsar (MSP) with a spin period P ≈ 3.17 ms. Harbored in a binary with an orbital period Pb ≈ 27 days, the MSP is accompanied by a white dwarf (WD). The masses of the MSP and the WD were determined to be 1.83 M⊙ and 0.266 M⊙
Spectrum denoising is an important procedure for large-scale spectroscopic surveys. This work proposes a novel stellar spectrum denoising method based on deep Bayesian modeling. The construction of our model includes a prior distribution for each stellar
Ellerman bombs (EBs) and ultraviolet (UV) bursts are common brightening phenomena, which are usually generated in the low solar atmosphere of emerging flux regions. In this paper, we have investigated the emergence of an initial un-twisted magnetic flux rope
The flat field reflects the non-uniformity of the photometric response at the focal plane of an instrument that uses digital image sensors, such as a Charge Coupled Device (CCD) or a Complementary Metal-Oxide-Semiconductor (CMOS) sensor. This non-uniformity must be corr
PolarLight is a space-borne X-ray polarimeter that measures the X-ray polarization via electron tracking in an ionization chamber. It is a collimated instrument and thus suffers from the background on the whole detector plane. The majority of background eve
This paper presents the results of Hα imaging of 169 galaxies randomly selected from the α.40-SDSS catalog. The sample has excluded all low surface brightness galaxies (LSBGs) whose central surface brightness in B band (μ0(B)) is fainter than 22.5 mag arcsec-
The lower solar atmosphere is a gravitationally stratified layer of partially ionized plasma. We calculate the electric resistivity in the solar photosphere and chromosphere, which is the key parameter that controls the rate of magnetic reconnection in a Sw
Multi-band photometry and light curve analysis for two newly recognized contact binary systems, TYC 6995-813-1 and NSVS 13602901, are presented. Both were found to have extremely low mass ratios, 0.11 and 0.17, respectively. The secondary components of both syst
A Forbush decrease (FD), discovered by Scott E. Forbush about 80 years ago, is a non-repetitive short-term depression in Galactic cosmic ray (GCR) flux, presumed to be associated with large-scale perturbations in solar wind and interplanetary m