
Carbon Stars Identified from LAMOST DR4 Using Machine Learning


Published 2018 February 5 © 2018. The American Astronomical Society. All rights reserved.
Citation: Yin-Bi Li et al. 2018, ApJS, 234, 31, DOI 10.3847/1538-4365/aaa415


0067-0049/234/2/31

Abstract

In this work, we present a catalog of 2651 carbon stars from the fourth Data Release (DR4) of the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST). Using an efficient machine-learning algorithm, we identify these stars among more than 7 million spectra. As a by-product, we also report 17 carbon-enhanced metal-poor turnoff star candidates, which are preliminarily identified from their atmospheric parameters. Except for 176 stars that could not be assigned spectral types, we classify the remaining 2475 carbon stars into five subtypes on the basis of a series of spectral features: 864 C-H, 226 C-R, 400 C-J, 266 C-N, and 719 barium stars. We further divide the C-J stars into three subtypes, C-J(H), C-J(R), and C-J(N); about 90% of them are cool N-type stars, as expected from previous literature. In addition to spectroscopic classification, we cross-match these carbon stars with multiple broadband photometric surveys. Using ultraviolet photometry, we find that 25 carbon stars have FUV detections and are likely to be in binary systems with compact white dwarf companions.


1. Introduction

Carbon stars were first discovered by Secchi (1869), who loosely defined them as stars whose optical spectra are dominated by carbon molecular bands, such as CN, CH, or the Swan bands of C₂. They were initially considered to be asymptotic giant branch (AGB) stars, because carbon can only be dredged up to the stellar surface during the third dredge-up process. Researchers did not realize that an extrinsic process can also produce carbon stars until Dahn et al. (1977) discovered G77-61, the first main-sequence carbon star (dC). Dearborn et al. (1986) showed that G77-61 is a single-lined binary with a cool, optically invisible white dwarf companion. In this binary scenario, carbon-rich material lost through the stellar wind of a thermally pulsing AGB star is accreted by its main-sequence companion (Han et al. 1995).

Numerous abundance studies of metal-poor stars (Beers et al. 1985, 1992; Wisotzki et al. 1996; Christlieb et al. 2001; Christlieb 2003) have discovered carbon-enhanced metal-poor (CEMP) stars, which were originally defined as stars with metallicity $[\mathrm{Fe}/{\rm{H}}]\leqslant \,-1.0$ and $[{\rm{C}}/\mathrm{Fe}]\geqslant 1.0$ (Beers & Christlieb 2005). Among the CEMP stars, main-sequence turnoff stars are expected to be of particular importance because they might preserve on their surfaces pure material accreted from their AGB companions, which can be used to investigate the production efficiency of carbon and neutron-capture elements of AGB stars.
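The original CEMP definition quoted above amounts to two abundance cuts. The minimal Python sketch below encodes them; the function name `is_cemp` is hypothetical, and abundances are assumed to be given in dex.

```python
def is_cemp(fe_h: float, c_fe: float) -> bool:
    """Original CEMP definition (Beers & Christlieb 2005):
    [Fe/H] <= -1.0 and [C/Fe] >= +1.0, both in dex."""
    return fe_h <= -1.0 and c_fe >= 1.0


# Hypothetical abundances, for illustration only.
print(is_cemp(-2.0, 1.2))  # True: metal-poor and carbon-enhanced
print(is_cemp(-0.5, 1.5))  # False: carbon-enhanced but not metal-poor
```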

1.1. Types of Carbon Stars

Keenan (1993) proposed the Morgan–Keenan (MK) classification system, which divides carbon stars into seven types based on prominent spectral features; it has since been widely applied to the classification of medium- and low-resolution carbon star spectra (Wallerstein & Knapp 1998; Barnbaum et al. 1996; Goswami 2005; Goswami et al. 2010; Lloyd Evans 2010). Barnbaum et al. (1996) revised the MK classification criteria, which are listed in Table 1. In this paper, we adopt this revised MK system to classify carbon stars into five types: C-H, C-R, C-J, C-N, and barium stars.

Table 1.  Classification Criteria of Carbon Stars

Subtype Criteria
C-H (1) Prominence of the secondary P-branch head near 4342 Å
  (2) Strong CH band
  (3) Hβ and Ba ii at 4554 Å are clearly noticeable
  (4) Hα and Ba ii at 6496 Å are noticeable
  (5) Blend feature of Na i D1 and Na i D2 is not distinguishable
  (6) Ca i at 4226 Å is marginally noticeable
C-R (1) Strong Ca i at 4226 Å
  (2) Na i D1 and Na i D2 blended lines have two distinct dips
  (3) Weak Hβ and Ba ii at 4554 Å blended with atomic and molecular lines
  (4) Weak Hα and Ba ii at 6496 Å blended with the CN bands around 6500 Å
C-N (1) No flux at λ < 4400 Å; some very late-type C-N stars can be flat even at λ < 5000 Å
  (2) Strong Ba ii at 6496 Å
  (3) Weak Hα and isotopic C bands
C-J (1) A high isotope ratio of ¹³C to ¹²C, with j index ≥ 4
Ba (1) Strong lines of s-process elements, particularly Ba ii at 4554 Å and Sr ii at 4077 Å


C-H stars are known to be binaries with compact white dwarf companions (McClure 1983, 1984; McClure & Woodsworth 1990), and they can be recognized by the prominence of the secondary P-branch head near 4342 Å, as listed in Table 1. C-R stars were once thought to be binaries but are now considered to be single stars formed through binary mergers (Izzard et al. 2007; Domínguez et al. 2010). They can be identified by the almost equal depths of the Ca i line at 4226 Å and the CN band at 4215 Å (Goswami 2005; Goswami et al. 2010). C-N stars are AGB stars and can be distinguished by their predominant blue depression, which often nearly obliterates the spectrum below 4400 Å (Barnbaum et al. 1996; Goswami 2005; Goswami et al. 2010). As with C-R stars, the exact evolutionary stage and nature of C-J stars remain unclear (Abia et al. 2003). Their spectra show unusually strong isotopic carbon bands and are clearly recognized by a j index larger than 4 (De Mello et al. 2009). Barium stars are red giants with strong spectral lines of s-process elements, especially Ba ii at 4554 and 6496 Å and Sr ii at 4077 Å, which make them easy to recognize (de Castro et al. 2016).

1.2. Carbon Star Catalogs in the Literature

Carbon-enhanced stars play important roles in understanding the evolution of the Galaxy, but only a small number of them have been studied with detailed chemical abundance analyses. A series of works has searched for Galactic carbon stars; we summarize them in Table 2.

Table 2.  Carbon Stars Reported in Previous Literature

Literature Total Number^a MK Types^b C-H C-R Ba C-J C-N Unknown
Alksnis et al. (2001) 6891 no
Christlieb et al. (2001) 403 no
Margon et al. (2002) 39 no
Downes et al. (2004) 251 no
Green (2013) 1220 no
Si et al. (2014) 260 no
Si et al. (2015) 183 yes 69 66 4 33 10
Ji et al. (2016) 894 yes 339 259 108 82

Notes.

^a The total number of carbon stars reported in each publication. ^b This flag specifies whether spectral types were given using the MK classification system.


Alksnis et al. (2001) published a revised catalog of 6891 carbon stars, which builds on the earlier catalogs of Stephenson (1973, 1989). Christlieb et al. (2001) presented a sample of 403 faint high-latitude carbon (FHLC) stars from the digitized objective-prism plates of the Hamburg/ESO Survey (HES). Between 2002 and 2013, carbon stars were systematically searched for in the Sloan Digital Sky Survey (SDSS): Margon et al. (2002), Downes et al. (2004), and Green (2013) found 39, 251, and 1220 FHLC stars, respectively. In addition, Si et al. (2014) were the first to apply a machine-learning algorithm to SDSS DR8 spectra, discovering 260 new carbon stars. The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST; Wang et al. 1996; Su & Cui 2004; Cui et al. 2012; Luo et al. 2012; Zhao et al. 2012) began releasing data in 2011, and new carbon stars have since been reported from its massive number of spectra. Si et al. (2015) applied a manifold ranking algorithm to the pilot survey data and obtained 183 carbon stars. Ji et al. (2016) identified 894 carbon stars in LAMOST DR2 using multiple spectral line indices.

1.3. The Rank-based PU Learning Algorithm

In the first four years of its regular survey, LAMOST obtained 7,725,624 spectra, and our main aim is to search this massive data set for carbon stars. Because carbon stars are extremely rare, it is impractical to find them by manually inspecting such a large data set. We therefore turned to machine-learning methods and adopted the Bagging TopPush algorithm (Du et al. 2016) to retrieve carbon stars from LAMOST DR4. Bagging TopPush is a supervised, rank-based positive-unlabeled (PU) learning algorithm that trains its ranking model on positive and negative samples. In our work, carbon stars reported in previous works serve as the positive sample set (P), and the massive LAMOST data are treated as unlabeled samples (U). The algorithm first randomly draws samples from U as negatives and trains models that rank the positive samples ahead of these negatives. The trained model then scores all samples in U and ranks them in descending order of score; in this ranked list, carbon stars should appear ahead of other objects.
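The bagging PU ranking loop described above can be sketched as follows. This is an illustrative stand-in, not the Bagging TopPush implementation of Du et al. (2016): each bag's ranker here is a plain logistic-regression scorer trained by gradient descent instead of the TopPush loss, and the function names, number of bags, and toy feature vectors are all assumptions.

```python
import numpy as np


def train_linear_ranker(pos, neg, lr=0.1, n_iter=200):
    """Fit a linear scorer s(x) = x @ w + b that is high for `pos` and low
    for `neg`. Logistic regression is a stand-in for the TopPush loss."""
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(positive)
        g = p - y                               # gradient of log-loss w.r.t. score
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b


def bagging_pu_rank(P, U, n_bags=10, n_neg=None, seed=0):
    """In each bag, draw a random subset of U as pseudo-negatives, train a
    ranker against P, and score all of U. Return U's indices sorted by the
    bag-averaged score (best first) and the scores themselves."""
    rng = np.random.default_rng(seed)
    n_neg = n_neg or len(P)
    scores = np.zeros(len(U))
    for _ in range(n_bags):
        neg_idx = rng.choice(len(U), size=n_neg, replace=False)
        w, b = train_linear_ranker(P, U[neg_idx])
        scores += U @ w + b
    scores /= n_bags
    return np.argsort(-scores), scores


# Toy data: 1000 unlabeled feature vectors, 10 of which are hidden positives.
rng = np.random.default_rng(1)
U = rng.normal(0.0, 1.0, size=(1000, 5))
planted = np.arange(10)
U[planted] += 3.0
P = rng.normal(3.0, 1.0, size=(50, 5))
order, _ = bagging_pu_rank(P, U)
print(order[:10])  # top-ranked indices; ideally the hidden positives 0-9
```

Because the pseudo-negatives are drawn at random from U, a few hidden positives can land in a negative bag; averaging scores over many bags reduces the effect of this label noise, which is the usual motivation for bagging in PU learning.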

This paper is organized as follows. In Section 2, we introduce the steps used to select positive samples and the method used to cluster them. In Section 3.1, we analyze the effect of spectral preprocessing on algorithm performance and determine the preprocessing method. In Section 3.2, we search for carbon stars in LAMOST DR4 and roughly estimate the completeness and contamination. Next, we analyze the classification of our carbon stars by the LAMOST 1D pipeline in Section 3.3, and compare our stars with two previous catalogs in Section 3.4. In Section 3.5, we report 17 carbon-enhanced metal-poor turnoff (CEMPTO) star candidates and preliminarily verify their nature by measuring their atmospheric parameters from low-resolution spectra. In Section 4, we classify the carbon stars into five types and analyze their spatial and magnitude distributions. In Section 5, we investigate the carbon stars using photometric data, for example, the distributions of their ultraviolet, optical, and infrared magnitudes. Finally, we give a brief summary in Section 6.

2. Positive Samples

Du et al. (2016) investigated in detail the performance of six widely used machine-learning algorithms and found that Bagging TopPush has the best performance, the lowest computational complexity, and the shortest CPU time. Three parameters of the Bagging TopPush algorithm must be set before use, and Du et al. (2016) studied their appropriate ranges by comparing the algorithm's performance under different parameter values. Here, we adopt the parameter values of Du et al. (2016), so we only need to construct the positive sample set P.

2.1. Positive Sample Selection

In this subsection, we select positive samples from previous catalogs of carbon stars found in SDSS and LAMOST. Margon et al. (2002) and Downes et al. (2004) reported 39 and 251 FHLC stars from the SDSS commissioning data and First Data Release (DR1), respectively, and Green (2013) found 1220 FHLC stars in the SDSS Seventh Data Release (DR7), about five times the number previously known. Furthermore, Si et al. (2015) found 183 carbon stars in the LAMOST pilot survey, 158 of them reported for the first time; this was the first study of carbon stars using LAMOST spectra.

We obtain the equatorial coordinates of the SDSS carbon stars reported in the above three publications from the SIMBAD Astronomical Database and download their spectra from the SDSS DR12 website. For stars with multiple observations, after submitting their coordinates to DR12, we download only the spectrum with the highest signal-to-noise ratio (S/N). For the 183 LAMOST carbon stars, we download their spectra from the LAMOST DR1 website. In total, we obtain 1682 carbon star spectra from SDSS and LAMOST, and we further select positive samples through the following four steps; the number in parentheses after each step gives the number of spectra remaining.

  • 1.  
    For stars observed several times, we only retain the spectrum with the highest S/N and remove other repeated spectra. Through this step, we remove 234 SDSS spectra (1448).
  • 2.  
    In our experience, the S/N is generally low at the blue and red ends of SDSS and LAMOST spectra, so we restrict our spectra to begin at 3920 Å and to contain 3590 wavelength points. A few spectra do not cover this range, so we remove the 13 spectra that start at wavelengths longer than 3920 Å (1435).
  • 3.  
    Some SDSS spectra have no flux information over part of their wavelength range; we check all SDSS carbon star spectra individually and remove such spectra. LAMOST spectra may contain zero fluxes (set artificially to replace negative fluxes), may be of low quality in the overlap region of the blue and red arms (5700–6000 Å), and may show telluric absorption around 7600 Å; we also check all LAMOST spectra and remove those with these problems. In this step, 52 SDSS spectra and 49 LAMOST spectra are removed (1334).
  • 4.  
    Some carbon star spectra are dominated by noise over their entire range and have extremely low S/Ns. On visual inspection, we find it difficult to identify them as carbon stars, so we exclude them. In this step, seven LAMOST spectra and 277 SDSS spectra are removed (1050).
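Steps 1 and 2 above are mechanical, while steps 3 and 4 rely on visual inspection. The first two can be sketched as below; the record keys `star_id` and `snr` are hypothetical, and the toy wavelength grid in the usage example (log-linear, 10⁻⁴ dex per pixel) is an assumption made only to exercise the code.

```python
import numpy as np

WAVE_START = 3920.0  # Å, common starting wavelength
N_POINTS = 3590      # number of wavelength points kept


def best_per_star(spectra):
    """Step 1: keep only the highest-S/N spectrum of each star.
    `spectra` is a list of dicts with hypothetical keys 'star_id' and 'snr'."""
    best = {}
    for s in spectra:
        sid = s["star_id"]
        if sid not in best or s["snr"] > best[sid]["snr"]:
            best[sid] = s
    return list(best.values())


def cut_window(wave, flux, start=WAVE_START, n=N_POINTS):
    """Step 2: return the n pixels beginning at the first wavelength >= start,
    or None if the spectrum starts redward of `start` or is too short."""
    if wave[0] > start:
        return None
    i0 = int(np.searchsorted(wave, start))
    if i0 + n > len(wave):
        return None
    return wave[i0:i0 + n], flux[i0:i0 + n]


# Toy example (assumed log-linear grid, 1e-4 dex per pixel).
wave = 10 ** (3.55 + 1e-4 * np.arange(4200))
flux = np.ones_like(wave)
w, f = cut_window(wave, flux)
print(round(w[0], 1), len(w))
```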

After these four steps, we obtain 127 LAMOST spectra and 923 SDSS spectra, which form the positive sample set P. In Appendix A, we show the S/N distributions of the 1334 spectra remaining after step 3 and of the 1050 positive sample spectra, and we preliminarily analyze the effect of the extremely low-S/N spectra on algorithm performance.

2.2. Positive Sample Clustering

The numbers of cool C-N and C-J stars in our 1050 positive samples are much smaller than those of the other types. As a result, if we apply the Bagging TopPush method to all positive samples together, some cool carbon stars in the massive LAMOST data are difficult to rank within the top K results. We therefore divide the 1050 positive samples into clusters and retrieve carbon stars separately for each cluster. Furthermore, carbon stars can be classified into five types using the revised MK classification system (Barnbaum et al. 1996), as mentioned in the Introduction. However, the MK types of the 1050 samples are not provided in the literature, so we first cluster the positive samples and then search for carbon stars of each cluster type. We adopt the unsupervised K-means clustering algorithm (Seber 1984; Spath 1985) to classify the 1050 positive samples, but the number of groups is not known in advance. We therefore follow the common approach of first dividing the samples into tens or even hundreds of groups and then merging groups with high similarity. Initially, the 1050 samples are clustered into 50 groups, using 50 randomly selected positive samples as the initial cluster centers; several test runs show that the results are stable and do not depend on the choice of initial centers. We then compute the cosine distance between the cluster-center spectra of every pair of groups; the number density distribution and contour plot of these cosine distances are shown in Figure 1.
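A minimal sketch of this clustering step, assuming spectra are stored as rows of a NumPy array: Lloyd's K-means initialized with randomly chosen samples, followed by the pairwise cosine distances between cluster centers that Figure 1 summarizes. The text does not state which distance the K-means step itself uses, so Euclidean distance is an assumption here, as are the function names.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's K-means on the rows of X, initialized with k random samples."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # squared Euclidean distances via |x|^2 - 2 x.c + |c|^2
        d2 = ((X ** 2).sum(axis=1)[:, None] - 2.0 * X @ centers.T
              + (centers ** 2).sum(axis=1)[None, :])
        labels = d2.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])  # keep empty clusters fixed
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


def cosine_distance_matrix(A):
    """Pairwise cosine distances 1 - cos(angle) between the rows of A."""
    An = A / np.linalg.norm(A, axis=1, keepdims=True)
    return 1.0 - An @ An.T


# Toy example: two groups of 20 "spectra" with different shapes.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
X = np.vstack([1.0 + x + 0.05 * rng.normal(size=(20, 50)),
               1.0 + x[::-1] + 0.05 * rng.normal(size=(20, 50))])
labels, centers = kmeans(X, k=2)
D = cosine_distance_matrix(centers)
pair_dists = D[np.triu_indices(len(centers), k=1)]  # one value per pair of groups
```

With 50 groups, `pair_dists` would hold 50 × 49 / 2 = 1225 values, which is the kind of distribution Figure 1 displays.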

Figure 1.

Figure 1. (Left) Number density distribution of the cosine distances between each pair of cluster-center templates. (Right) Contour plot of these distances.


From this figure, we manually select nine outlier groups, shown in lighter colors, and divide our 50 groups of positive samples into two clusters: 41 groups form the first cluster (C_I), and the nine outlier groups form the second cluster (C_II). We calculate the average inner cosine distances of C_I and C_II (D_I and D_II) and the outer distance between them (D_I_II). D_I and D_II are 0.0082 and 0.0324, respectively, and D_I_II is 0.0467. Because D_I_II is not much larger than D_II, and D_II itself is large, we divide C_II into two clusters (C_II and C_III), again using the K-means method, and re-estimate the inner and outer distances. The average inner distances of the three clusters are 0.0088 (D_I), 0.0081 (D_II), and 0.0165 (D_III), and their outer distances are 0.0613 (D_I_II), 0.0911 (D_I_III), and 0.2585 (D_II_III). With three clusters, the inner distances are much smaller than the outer distances, which meets the goal of small within-cluster and large between-cluster distances. We therefore divide our 1050 positive samples into three clusters: 743 positive samples belong to C_I, 278 to C_II, and 29 to C_III; these form the positive sample sets P1, P2, and P3 used in Section 3.2. Figure 2 shows the normalized spectra of the cluster centers.
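The inner and outer distances used above can be computed as sketched below. The text does not specify whether the averages run over member spectra or over group-center templates; this sketch assumes the rows of each array are the spectra being averaged, excludes self-pairs from the inner average, and uses hypothetical function names.

```python
import numpy as np


def cosine_dist(A, B):
    """Cosine distances between every row of A and every row of B."""
    An = A / np.linalg.norm(A, axis=1, keepdims=True)
    Bn = B / np.linalg.norm(B, axis=1, keepdims=True)
    return 1.0 - An @ Bn.T


def inner_distance(A):
    """Mean cosine distance over distinct pairs of rows within one cluster."""
    D = cosine_dist(A, A)
    iu = np.triu_indices(len(A), k=1)  # exclude self-pairs and duplicates
    return float(D[iu].mean())


def outer_distance(A, B):
    """Mean cosine distance over all pairs with one row from each cluster."""
    return float(cosine_dist(A, B).mean())


# Toy check: parallel vectors have distance 0, orthogonal ones distance 1.
A = np.array([[1.0, 0.0], [2.0, 0.0]])
B = np.array([[0.0, 1.0], [0.0, 3.0]])
print(inner_distance(A), outer_distance(A, B))
```

A split is judged acceptable when each cluster's inner distance is much smaller than its outer distances, as in the three-cluster solution above.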

Figure 2.

Figure 2. Cluster center spectra of positive samples for the three types. The horizontal axis is the wavelength, and the vertical axis is the normalized flux.


This classification differs from the five spectral subtypes of carbon stars, but the difference is understandable. C-H, C-R, and barium stars lie within a narrow color range, which indicates that they have similar effective temperatures and spectral types. Their main spectral features, such as the continuum and the carbon molecular bands, are therefore similar. As described in Section 1.1, these types are distinguished mainly by spectral line features; for example, Ba stars can be distinguished from C-R stars by the strong Ba ii line at 4554 Å and Sr ii line at 4077 Å. Such subtle features carry little weight in similarity measures, which are dominated by the main features. The same problem exists for cool C-J and C-N stars: they differ mainly in the strength of the isotopic carbon lines and bands, so they are clustered into the same group because of their similar continua and strong CN bands. In short, a similarity measure groups different spectral subtypes together on the basis of their main spectral features and ignores the subtle features that are useful for distinguishing C-H, C-R, C-J, C-N, and Ba stars.

In the next section, we will retrieve carbon stars with the three types of positive samples, and combine their results.

3. LAMOST Carbon Stars

3.1. Determination of the Spectral Preprocessing Method

Before using the Bagging TopPush algorithm to retrieve carbon stars, we investigate how spectral preprocessing affects the algorithm's performance and choose the preprocessing method used in this paper. For the positive samples of each group, we preprocess the spectra with eight methods, which are detailed in Appendix B. For each preprocessing method, we run a retrieval experiment with the Bagging TopPush method (Du et al. 2016) and compare the results. We present the results in two ways: recall as a function of K, and precision as a function of K. Recall (n/N) is the fraction of all N carbon stars in the unlabeled sample set U that are retrieved (n) within the top K ranked samples, and precision (n/K) is the fraction of carbon stars (n) among the top K samples ranked in descending order of score.
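The two metrics defined above can be computed from a ranked list as follows; the function name and the boolean-label input format are assumptions for illustration.

```python
import numpy as np


def recall_precision_at_k(ranked_is_carbon, n_total_carbon, k):
    """Recall n/N and precision n/K for the top k of a ranked list.

    ranked_is_carbon: booleans in ranked order, True for a carbon star
    n_total_carbon:   N, the number of carbon stars in the unlabeled set U
    """
    n = int(np.sum(ranked_is_carbon[:k]))  # carbon stars among the top k
    return n / n_total_carbon, n / k


ranked = np.array([True, True, False, True, False, False])
print(recall_precision_at_k(ranked, n_total_carbon=4, k=4))  # (0.75, 0.75)
```

Here three of the four carbon stars (N = 4) appear in the top four positions, so recall and precision at K = 4 are both 0.75.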

The results are shown in Figures 3–5. The left panels of the three figures show recall as a function of K, and the right panels show precision as a function of K. The eight colored lines represent the eight preprocessing methods: "nmap," "nunit," "nmap + pca," "nunit + pca," "nmap + subcon," "nunit + subcon," "nmap + pca + subcon," and "nunit + pca + subcon." Here, "nmap" and "nunit" are two spectral normalization methods, while "subcon" and "pca" stand for continuum subtraction and principal component analysis (PCA), respectively.

Figure 3.

Figure 3. The left panel is the relationship between recall and K for positive samples of type I, and the right panel is the relationship between precision and K, where K is the number of ranking results returned. The eight color lines in the two subplots represent the eight spectral preprocessing methods.

Figure 4.

Figure 4. Same as Figure 3, but for positive samples of type II.

Figure 5.

Figure 5. Same as Figure 3 but for positive samples of type III.


The figures show that the "nmap + pca" preprocessing approach, shown by the purple solid lines, has the highest recall and precision of all the methods. We therefore adopt "nmap + pca" as the preprocessing method for all spectra in this paper: the "nmap" method normalizes the spectra, and PCA reduces the data dimension.
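A sketch of an "nmap + pca" pipeline is given below. The exact definitions of "nmap" and "nunit" are given in the paper's Appendix B, which is not reproduced here, so the min–max mapping of each spectrum onto [0, 1] used below is an assumption, as is the number of retained components; PCA is computed with a plain SVD.

```python
import numpy as np


def nmap(flux):
    """ASSUMED form of "nmap": map each spectrum (row) linearly onto [0, 1].
    Flat spectra (max == min) would divide by zero and must be removed first."""
    fmin = flux.min(axis=1, keepdims=True)
    fmax = flux.max(axis=1, keepdims=True)
    return (flux - fmin) / (fmax - fmin)


def pca_fit(X, n_components):
    """Mean and leading principal axes (orthonormal rows) of X via SVD."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, axes):
    """Project each row of X onto the principal axes."""
    return (X - mean) @ axes.T


# Toy example: 100 random "spectra" of 500 pixels reduced to 10 features.
rng = np.random.default_rng(0)
flux = rng.normal(1.0, 0.1, size=(100, 500))
Xn = nmap(flux)
mean, axes = pca_fit(Xn, n_components=10)
features = pca_transform(Xn, mean, axes)
print(features.shape)  # (100, 10)
```

In a retrieval setting, the PCA axes would be fitted once and then applied to every spectrum being ranked, so that positive and unlabeled samples share one feature space.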

Because the true number of carbon stars in the massive LAMOST DR4 data set is unknown, the completeness and contamination rates of our method cannot be estimated accurately. However, we can evaluate them roughly using the preprocessing experiments above. For the three groups of positive samples, the recalls of the "nmap + pca" method at K = 5000 (r1, r2, and r3) are 99.54%, 99.78%, and 100%, respectively, and the precisions at K = 5000 (p1, p2, and p3) are 7.39%, 2.77%, and 0.28%. The recalls can be roughly treated as the completeness rates of the three groups, and $1-p1$, $1-p2$, and $1-p3$ as their contamination rates. At K = 5000, the completeness is therefore extremely high, but so is the contamination. For retrieving carbon stars from a massive spectral data set, however, completeness matters much more than contamination, because later steps, such as repeated retrieval and visual inspection, can reduce the contamination.
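The rough estimates follow directly from the quoted recall and precision values; the short calculation below makes the arithmetic explicit under the approximation stated in the text (completeness ≈ recall, contamination ≈ 1 − precision).

```python
# Recall and precision at K = 5000 for "nmap + pca", as quoted in the text.
recall = {"I": 0.9954, "II": 0.9978, "III": 1.0}
precision = {"I": 0.0739, "II": 0.0277, "III": 0.0028}

# Rough estimates stated in the text: completeness ~ recall,
# contamination ~ 1 - precision.
completeness = dict(recall)
contamination = {group: 1.0 - p for group, p in precision.items()}

for group in recall:
    print(f"{group}: completeness {completeness[group]:.2%}, "
          f"contamination {contamination[group]:.2%}")
```

This gives contamination rates of about 92.6%, 97.2%, and 99.7% for the three groups.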

3.2. Carbon Stars from LAMOST DR4

In this subsection, we adopt the Bagging TopPush approach (Du et al. 2016) to retrieve carbon stars from LAMOST DR4. During the first four years of the LAMOST regular survey, 20% of the nights in each month were used for instrument tests, and the spectra obtained on these test nights were not published. To find more carbon stars, we use both the released spectra and the test-night spectra. Because of computer memory limits, we divide all LAMOST spectra into 24 groups of 500,000 spectra each. We use the positive sample set P1 to retrieve carbon stars from each group and keep the top 5000 spectra of each ranked list, obtaining 120,000 preliminary candidates from the 24 groups. We then use P1 again to retrieve carbon stars from these 120,000 candidates and output the top 20,000 spectra for further identification. We perform a similar procedure with the other two positive sample sets, P2 and P3, which contain 278 and 29 spectra, respectively. Finally, we combine the results from the three positive sample sets and obtain a total of 60,000 spectra for visual inspection.
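The chunked, two-stage retrieval can be sketched as below. As a simplification, the same scoring function is reused in both stages, whereas the paper reruns the retrieval with P1 on the pooled candidates; `score_fn`, the function names, and the toy sizes are hypothetical.

```python
import numpy as np


def top_k(scores, k):
    """Indices of the k highest scores, highest first."""
    k = min(k, len(scores))
    idx = np.argpartition(-scores, k - 1)[:k]
    return idx[np.argsort(-scores[idx])]


def two_stage_retrieval(score_fn, n_spectra, chunk_size, k_chunk, k_final):
    """Stage 1: score the data chunk by chunk and keep the top k_chunk of each.
    Stage 2: re-score the pooled candidates and keep the top k_final.
    `score_fn(indices) -> scores` stands in for a ranker trained on one
    positive sample set."""
    pooled = []
    for start in range(0, n_spectra, chunk_size):
        idx = np.arange(start, min(start + chunk_size, n_spectra))
        pooled.append(idx[top_k(score_fn(idx), k_chunk)])
    pooled = np.concatenate(pooled)
    return pooled[top_k(score_fn(pooled), k_final)]


# Toy example: 10,000 "spectra" with random scores, 10 chunks of 1000.
rng = np.random.default_rng(0)
true_scores = rng.random(10_000)
result = two_stage_retrieval(lambda i: true_scores[i], 10_000, 1000, 50, 100)
print(len(result))  # 100
```

With the numbers in the text, 24 chunks × 5000 candidates give the 120,000-spectrum pool, and 3 positive sets × 20,000 give the 60,000 spectra passed to visual inspection.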

After visual inspection, we find over 3000 carbon star spectra. Because some objects were observed several times, we retain only the spectrum with the highest S/N for each object, and finally identify 2651 carbon stars in LAMOST DR4. Compared with previous catalogs, 1415 of them have not been reported before. Moreover, 296 of these carbon stars were observed on test nights, so their spectra were not published on the LAMOST DR4 website.

3.3. Comparison with the LAMOST 1D Pipeline

For our carbon stars, it is useful to know how many were classified correctly by the LAMOST spectral analysis software (the LAMOST 1D pipeline), which helps to assess how well the machine-learning algorithm adopted in this work performs. We obtained the spectral types given by the LAMOST 1D pipeline for our carbon stars and list them in Table 3. The table shows that 1671 (about 63%) of our carbon stars were classified correctly, while the remaining 37% were misclassified by the LAMOST 1D pipeline. Of these misclassified carbon stars, 74% (726) were classified as G-type stars (mainly G5), and the others were categorized as DQ white dwarfs (two), an A9-type star (one), F-type stars (four), K-type stars (97), an M-type star (one), and unknown (149).

Table 3.  The Classification Results of Our Carbon Stars given by the LAMOST 1D Pipeline

LAMOST Spectral Type Number
Carbon 1671
Carbon white dwarf 2
Unknown 149
A9 1
F F5: 1
  F9: 3
  Total Number: 4
G G0: 6
  G2: 8
  G5: 601
  G6: 6
  G7: 91
  G8: 12
  G9: 2
  Total Number: 726
K K0: 2
  K1: 23
  K2: 1
  K3: 22
  K4: 22
  K5: 23
  K7: 4
  Total Number: 97
M4 1
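As a consistency check on Table 3, the counts can be tallied directly; the snippet below reproduces the totals and percentages quoted in the text.

```python
# Counts from Table 3 (LAMOST 1D pipeline types of our 2651 carbon stars).
counts = {
    "Carbon": 1671,
    "Carbon white dwarf": 2,
    "Unknown": 149,
    "A9": 1,
    "F": 1 + 3,                           # F5, F9
    "G": 6 + 8 + 601 + 6 + 91 + 12 + 2,   # G0, G2, G5-G9
    "K": 2 + 23 + 1 + 22 + 22 + 23 + 4,   # K0-K5, K7
    "M4": 1,
}
total = sum(counts.values())
misclassified = total - counts["Carbon"]
print(total, misclassified)  # 2651 980
print(f"{counts['Carbon'] / total:.0%} classified as carbon")
print(f"{counts['G'] / misclassified:.0%} of misclassified stars are G type")
```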


The classification accuracy of the LAMOST 1D pipeline is relatively low for carbon stars compared with F-, G-, and K-type stars, probably because the pipeline has few carbon star template spectra. Figure 6 shows the carbon star templates used by the LAMOST 1D pipeline: there are three, two for ordinary carbon stars and one for a DQ white dwarf. Both carbon star templates are relatively hot, and no template for cool C-N-type carbon stars is included. In addition, some types of carbon stars with special spectral features, such as CEMP stars and carbon subdwarf stars, are not represented in the templates. Constructing new carbon star templates will therefore be important future work for the LAMOST 1D pipeline and should substantially improve its classification accuracy and reliability.

Figure 6.

Figure 6. The three template spectra for carbon stars used by the LAMOST 1D pipeline.


3.4. Comparison with Previous Works

Before this work, Si et al. (2015) found 183 carbon stars from the LAMOST pilot survey using an efficient machine-learning method, and Ji et al. (2016) reported 894 carbon stars from LAMOST DR2 using a series of molecular band index criteria. Thus, it is important to compare our carbon stars with these two publications.

We compare our stars with the two previous catalogs of carbon stars (Si et al. 2015; Ji et al. 2016) and list the results in Table 4. The first three columns give the reference, the total number of stars, and the data release used by each catalog. The fourth column, N_1, gives the number of carbon stars in each previous catalog that were excluded from our catalog during visual inspection because of low spectral quality. The fifth column, N_2, gives the number of carbon stars included in our catalog but absent from each previous catalog: 33 for Si et al. (2015) and 575 for Ji et al. (2016). Both publications used older versions of the data set, which did not include data obtained on test nights, whereas we used the latest version of the LAMOST spectra together with the test-night data. We also used a more efficient method to find carbon stars. These factors explain the relatively large numbers of carbon stars missed by the two catalogs. The sixth column, N_test, gives the number of missed carbon stars that were observed on test nights: four and 122 stars, respectively, were missed because test-night data were not used. The seventh column, N_version, gives the number missed because the older data versions were used. The last column, N_algorithm, gives the number removed by the selection method of each publication. The machine-learning method (EMR) used by Si et al. (2015) eliminated 17 stars; Du et al. (2016) compared the EMR algorithm with our Bagging TopPush method and found that Bagging TopPush performs better. The target selection method of Ji et al. (2016) excluded 403 carbon stars. We apply each step of their selection criteria to check how these stars were eliminated and list the results in Table 5.

Table 4.  Comparison with Two Previous LAMOST Carbon Star Catalogs

Literature Total Number^a Data Release^b N_1^c N_2^d N_test^e N_version^f N_algorithm^g
Si et al. (2015) 183 Pilot 4 33 4 12 17
Ji et al. (2016) 894 DR2 5 575 122 50 403

Notes.

^a The total number of carbon stars reported in each publication. ^b The LAMOST data version used in each publication. ^c The number of carbon stars included in the publication but excluded from our catalog. ^d The number of carbon stars excluded from the publication but included in our catalog. ^e Among the N_2 carbon stars excluded from the publication, the number observed on test nights. ^f Among the N_2 carbon stars excluded from the publication, the number not contained in the older data version used by the publication. ^g Among the N_2 carbon stars excluded from the publication, the number removed by the selection method of the publication.
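The last three columns of Table 4 decompose N_2, which can be checked directly from the tabulated values:

```python
# N_2 and its three components from Table 4.
table4 = {
    "Si et al. (2015)": {"N_2": 33, "N_test": 4, "N_version": 12, "N_algorithm": 17},
    "Ji et al. (2016)": {"N_2": 575, "N_test": 122, "N_version": 50, "N_algorithm": 403},
}
for ref, row in table4.items():
    parts = row["N_test"] + row["N_version"] + row["N_algorithm"]
    print(ref, parts, parts == row["N_2"])  # components sum to N_2
```

In both rows the components sum exactly to N_2 (4 + 12 + 17 = 33 and 122 + 50 + 403 = 575), so each missed star is attributed to exactly one cause.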


Table 5.  The Number of Our Carbon Stars Excluded by Each Step of the Selection Criteria of Ji et al. (2016)

Step Criteria Number of Candidates Removed
1 ${\rm{S}}/{\rm{N}}(i)\gt 10$ and leave only one epoch for multiply observed stars 32
2 No radial velocity 25
3 Equations (2)–(4) 138
4 CN7065 ≥ 4 or CN7820 ≥ 8 113
  $\mathrm{CN}7065\,\lt 4$ and $\mathrm{CN}7820\,\lt 8$ with ${\rm{S}}/{\rm{N}}(g)\gt 20$ and ${\rm{S}}/{\rm{N}}(i)\gt 20$  
5 CN7065 ≥ 2 and C2 ≥ −13 54
6 ${K}_{s}\lt 14.5$ and $J-{K}_{s}\gt 0.45$ 41


Table 5 shows that the first step, an S/N criterion, omits 32 of our carbon stars. We carefully checked the spectra of these 32 stars and found that their quality is good enough for visual recognition. The S/Ns provided by the LAMOST pipeline are calculated from the continuum and are only a rough indicator, not an exact measure, of spectral quality; moreover, the i-band S/N used in this step is not the best quality estimate for late-type stars. Steps 2 to 5, which comprise a radial velocity requirement and molecular band index criteria, removed 25, 138, 113, and 54 carbon stars, respectively, so a total of 330 carbon stars were missed in these steps. Finally, 41 carbon stars were removed by the color criterion in step 6, which tends to eliminate hotter carbon stars. We checked the spectra of these 41 stars, and all of them are hot carbon stars, as expected; relatively strict color criteria can thus exclude a small fraction of carbon stars.
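The per-step counts in Table 5 correspond to assigning each star to the first criterion it fails. A generic sketch of that bookkeeping is given below; the record keys and the two simplified criteria in the usage example (loosely modeled on steps 1 and 6) are hypothetical and do not reproduce the full selection of Ji et al. (2016).

```python
def first_failed_step(record, steps):
    """1-based index of the first criterion `record` fails, or None if it passes all."""
    for i, passes in enumerate(steps, start=1):
        if not passes(record):
            return i
    return None


def removal_counts(records, steps):
    """Number of records removed at each step of a sequential selection."""
    counts = {i: 0 for i in range(1, len(steps) + 1)}
    for r in records:
        step = first_failed_step(r, steps)
        if step is not None:
            counts[step] += 1
    return counts


# Hypothetical, simplified criteria loosely modeled on steps 1 and 6 of Table 5.
steps = [
    lambda r: r["snr_i"] > 10,
    lambda r: r["Ks"] < 14.5 and r["J"] - r["Ks"] > 0.45,
]
records = [
    {"snr_i": 5, "J": 12.0, "Ks": 11.0},   # fails the S/N cut
    {"snr_i": 20, "J": 12.0, "Ks": 11.8},  # J - Ks = 0.2, fails the color cut
    {"snr_i": 30, "J": 12.0, "Ks": 11.0},  # passes both
]
print(removal_counts(records, steps))  # {1: 1, 2: 1}
```

Summed over all six steps, Table 5 gives 32 + 25 + 138 + 113 + 54 + 41 = 403 stars, matching N_algorithm for Ji et al. (2016) in Table 4.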

3.5. Carbon-enhanced Main-sequence Turnoff Star Candidates

In addition to the above 2651 carbon stars, we also find 17 stars whose spectra show strong Balmer absorption lines together with CH molecular bands, as shown in Figure 7; these features indicate that they are carbon-enhanced stars with higher effective temperatures than the other types of carbon stars. The LAMOST stellar parameter pipeline (LASP) does not provide atmospheric parameters for any of them in our database. We therefore estimate their effective temperatures, surface gravities, and metallicities with the ${\chi }^{2}$ minimization technique described in Section 4.4 of Lee et al. (2008) and list them in Table 6, together with other basic information, i.e., equatorial coordinates, the r-band signal-to-noise ratio "S/N_r," and the spectral type "SpType_PL" given by the LAMOST 1D pipeline. Their effective temperatures all exceed 5600 K, and their surface gravities all exceed 3.6 except for one star, which indicates that they are likely located in the main-sequence turnoff region of the Hertzsprung–Russell diagram. In addition, their metallicities, parameterized by [Fe/H], are all below −1.6, reaching as low as −2.4, except for one star, indicating that they are metal-poor. We therefore infer that these 17 stars are likely to be carbon-enhanced metal-poor turnoff (CEMPTO) stars, as described in Aoki et al. (2008); high-resolution follow-up observations are needed to confirm their nature.
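The ${\chi }^{2}$ minimization mentioned above can be illustrated with a brute-force grid search. This is a minimal sketch, not the Lee et al. (2008) implementation: `toy_model` is a hypothetical stand-in for a grid of synthetic spectra with made-up continuum and absorption features, and the grid spacings are arbitrary.

```python
import itertools
import math


def toy_model(wavelengths, teff, logg, feh):
    """Hypothetical toy spectrum; NOT a physical stellar atmosphere model."""
    flux = []
    for w in wavelengths:
        continuum = (w / 5000.0) ** (-teff / 5000.0)  # toy temperature-dependent slope
        # toy metallicity-sensitive feature near 4300 A
        metal = 1.0 - 5.0 * 10 ** feh * math.exp(-(((w - 4300.0) / 20.0) ** 2))
        # toy gravity-sensitive feature near 4861 A
        gravity = 1.0 - 0.05 * logg * math.exp(-(((w - 4861.0) / 30.0) ** 2))
        flux.append(continuum * metal * gravity)
    return flux


def chi2(observed, errors, model):
    """Sum of squared, error-weighted residuals."""
    return sum(((o - m) / e) ** 2 for o, m, e in zip(observed, model, errors))


def grid_search(wavelengths, observed, errors, teffs, loggs, fehs):
    """Return (chi2, teff, logg, feh) of the grid point with the smallest chi2."""
    best = None
    for teff, logg, feh in itertools.product(teffs, loggs, fehs):
        value = chi2(observed, errors, toy_model(wavelengths, teff, logg, feh))
        if best is None or value < best[0]:
            best = (value, teff, logg, feh)
    return best


wavelengths = list(range(3800, 5600, 10))
truth = (5900, 4.0, -2.0)
observed = toy_model(wavelengths, *truth)  # noiseless mock observation
errors = [0.01] * len(wavelengths)
best = grid_search(wavelengths, observed, errors,
                   teffs=range(5500, 6300, 100),
                   loggs=[3.0, 3.5, 4.0, 4.5],
                   fehs=[-2.5, -2.0, -1.5, -1.0])
print(best)  # (0.0, 5900, 4.0, -2.0)
```

With a noiseless mock the input parameters are recovered exactly; a real analysis would use a finer, physically computed grid and observational noise, and would typically refine the minimum rather than stop at the nearest grid node.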

Figure 7.

Figure 7. Two example spectra of the CEMP turnoff star candidates, which show strong Balmer absorption lines and CH molecular bands.


Table 6.  Equatorial Coordinates, S/Ns, Atmospheric Parameters, and Spectral Types of the 17 CEMP Turnoff Star Candidates

Designation R.A. Decl. S/N_r Teff log(g) [Fe/H] SpType_PL
  (degree) (degree)   (K)      
J012514.16+352233.0 21.309014 35.375843 31 5982 ± 44 3.87 ± 0.07 −2.05 ± 0.07 F2
J025723.25+331638.6 44.346896 33.277406 32 5624 ± 41 3.90 ± 0.08 −1.76 ± 0.06 G0
J074637.32+291941.8 116.655510 29.328284 31 5858 ± 45 3.90 ± 0.09 −2.04 ± 0.07 F0
J082459.32+302542.0 126.247187 30.428351 43 6068 ± 40 4.02 ± 0.06 −2.19 ± 0.06 A7
J091243.72+021623.4 138.182182 2.273190 53 6001 ± 36 3.91 ± 0.06 −2.17 ± 0.05 F2
J093539.71+283138.7 143.915485 28.527433 40 5847 ± 40 3.74 ± 0.07 −2.24 ± 0.05 F4
J095406.90+063728.2 148.528758 6.624504 48 5943 ± 37 4.04 ± 0.05 −1.88 ± 0.05 F0
J112535.49+414242.5 171.397877 41.711829 62 5997 ± 41 4.00 ± 0.06 −1.96 ± 0.06 F0
J121106.02+321044.8 182.775099 32.179121 25 5831 ± 50 3.73 ± 0.09 −2.14 ± 0.07 F5
J121244.80+315134.0 183.186669 31.859463 69 5919 ± 21 3.68 ± 0.04 −2.28 ± 0.03 F5
J121817.74+295300.4 184.573949 29.883457 59 5861 ± 25 4.01 ± 0.04 −1.64 ± 0.03 F5
J125725.33+335356.8 194.355549 33.899133 100 5866 ± 23 3.92 ± 0.04 −1.34 ± 0.03 F2
J133539.75+152359.8 203.915639 15.399948 42 5768 ± 43 3.66 ± 0.08 −2.23 ± 0.06 F5
J140753.27+473517.3 211.971998 47.588155 58 5913 ± 34 3.98 ± 0.06 −1.62 ± 0.05 F5
J144427.16+314601.9 221.113201 31.767221 23 5698 ± 45 3.31 ± 0.1 −2.42 ± 0.06 G3
J151139.17+335252.9 227.913238 33.881386 102 5920 ± 25 3.68 ± 0.04 −2.26 ± 0.03 F2
J163008.22+231438.7 247.534280 23.244110 160 5850 ± 22 3.90 ± 0.03 −1.90 ± 0.03 F5
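The parameter ranges quoted for these candidates can be read off Table 6 directly. The sketch below transcribes the Teff, log g, and [Fe/H] columns (uncertainties omitted) and reports the extremes and the outlying stars.

```python
# (designation, Teff [K], log g, [Fe/H]) transcribed from Table 6
TABLE6 = [
    ("J012514.16+352233.0", 5982, 3.87, -2.05),
    ("J025723.25+331638.6", 5624, 3.90, -1.76),
    ("J074637.32+291941.8", 5858, 3.90, -2.04),
    ("J082459.32+302542.0", 6068, 4.02, -2.19),
    ("J091243.72+021623.4", 6001, 3.91, -2.17),
    ("J093539.71+283138.7", 5847, 3.74, -2.24),
    ("J095406.90+063728.2", 5943, 4.04, -1.88),
    ("J112535.49+414242.5", 5997, 4.00, -1.96),
    ("J121106.02+321044.8", 5831, 3.73, -2.14),
    ("J121244.80+315134.0", 5919, 3.68, -2.28),
    ("J121817.74+295300.4", 5861, 4.01, -1.64),
    ("J125725.33+335356.8", 5866, 3.92, -1.34),
    ("J133539.75+152359.8", 5768, 3.66, -2.23),
    ("J140753.27+473517.3", 5913, 3.98, -1.62),
    ("J144427.16+314601.9", 5698, 3.31, -2.42),
    ("J151139.17+335252.9", 5920, 3.68, -2.26),
    ("J163008.22+231438.7", 5850, 3.90, -1.90),
]

teffs = [row[1] for row in TABLE6]
low_gravity = [row[0] for row in TABLE6 if row[2] <= 3.6]   # log g not above 3.6
metal_richer = [row[0] for row in TABLE6 if row[3] >= -1.6]  # [Fe/H] not below -1.6

print(min(teffs), max(teffs))           # 5624 6068
print(low_gravity)                      # ['J144427.16+314601.9']
print(metal_richer)                     # ['J125725.33+335356.8']
print(min(row[3] for row in TABLE6))    # -2.42
```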


We cross-match these 17 stars with the Galaxy Evolution Explorer (GALEX; Martin et al. 2005), the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS; Kaiser et al. 2002, 2010; Chambers et al. 2016), the Two Micron All Sky Survey (2MASS; Skrutskie et al. 2006), and the Wide-field Infrared Survey Explorer (WISE; Wright et al. 2010), as described in Section 5, and list their ultraviolet, optical, and infrared magnitudes in Table 7. None of them has an FUV detection, which suggests that none of them is likely to have a hot white dwarf companion. In addition, Table 8 lists their radial velocities "RV" given by the LAMOST 1D pipeline and their proper motions in right ascension and declination from the PPMXL (Roeser et al. 2010) and UCAC4 (Zacharias et al. 2013) catalogs, where ${\mu }_{\alpha }\cos {(\delta )}_{\_{\rm{P}}}$ and ${\mu }_{\delta \_{\rm{P}}}$ are the PPMXL proper motions, and ${\mu }_{\alpha }\cos {(\delta )}_{\_{\rm{U}}}$ and ${\mu }_{\delta \_{\rm{U}}}$ are the UCAC4 proper motions. All 17 stars have PPMXL proper motions, whereas two of them (J012514.16+352233.0 and J121106.02+321044.8) lack UCAC4 measurements. The two sets of proper motions are roughly consistent, and most of the stars have large proper motions, consistent with their possible nature as main-sequence turnoff stars.

Table 7.  Ultraviolet, Optical, and Infrared Magnitudes of the 17 CEMP Turnoff Star Candidates

Designation fuv nuv u g r i z J H K W1 W2 W3
  (mag) (mag) (mag) (mag) (mag) (mag) (mag) (mag) (mag) (mag) (mag) (mag) (mag)
J012514.16+352233.0 a 19.87 15.99 15.71 15.74 15.53 15.71 12.88
J025723.25+331638.6 20.53 18.07 17.07 16.63 16.44 16.36 15.52 15.08 14.91 14.91 14.99 12.47
J074637.32+291941.8 19.14 17.30 16.31 16.03 15.91 15.89 15.10 14.88 14.98 14.74 14.70 11.82
J082459.32+302542.0 18.89 17.15 16.24 15.94 15.85 15.82 15.10 14.64 14.81 14.71 14.76 11.82
J091243.72+021623.4 16.55 15.67 15.37 15.24 15.21 14.47 14.18 14.07 14.03 14.03 11.78
J093539.71+283138.7 18.14 16.23 15.39 15.06 14.94 14.90 14.15 13.76 13.76 13.76 13.77 12.63
J095406.90+063728.2 19.58 17.72 16.76 16.43 16.32 16.27 15.42 15.17 15.24 15.18 15.16 12.62
J112535.49+414242.5 16.89 14.97 14.51 14.63 14.98 13.65 12.86 12.57 12.53 12.51 12.53 11.97
J121106.02+321044.8 20.15 18.47 17.57 17.27 17.13 17.11 16.22 15.72 15.67 16.00 16.08 12.50
J121244.80+315134.0 16.13 14.53 13.67 13.35 14.46 13.25 12.44 12.14 12.15 12.09 12.13 12.65
J121817.74+295300.4 17.27 15.32 14.44 14.21 14.01 14.03 13.18 12.95 12.84 12.85 12.85 12.42
J125725.33+335356.8 18.25 16.96 15.98 15.66 15.55 15.56 14.75 14.52 14.43 14.38 14.41 12.41
J133539.75+152359.8 18.50 16.45 15.57 15.20 15.08 15.03 14.21 13.87 13.88 13.84 13.88 12.70
J140753.27+473517.3 17.50 15.59 14.59 14.38 14.14 14.15 13.31 13.03 12.99 12.89 12.88 12.53
J144427.16+314601.9 18.92 17.01 16.03 15.64 15.55 15.50 14.71 14.29 14.31 14.28 14.34 12.86
J151139.17+335252.9 16.70 14.89 13.96 13.68 13.55 13.53 12.74 12.47 12.43 12.39 12.40 12.65
J163008.22+231438.7 17.35 15.41 14.50 14.34 14.02 14.01 13.20 12.86 12.81 12.80 12.82 12.09

Note.

a"⋯" means no value is available.


Table 8.  Radial Velocities and Proper Motions of the 17 CEMP Turnoff Star Candidates

Designation RV ${\mu }_{\alpha }\cos {(\delta )}_{\_{\rm{P}}}$ ${\mu }_{\delta \_{\rm{P}}}$ ${\mu }_{\alpha }\cos {(\delta )}_{\_{\rm{U}}}$ ${\mu }_{\delta \_{\rm{U}}}$
  (km s−1) (mas yr−1) (mas yr−1) (mas yr−1) (mas yr−1)
J012514.16+352233.0 −263 ± 4 2.7 ± 5.0 −3.8 ± 5.0 a
J025723.25+331638.6 −235 ± 3 8.6 ± 4.3 −32.2 ± 4.3 5.0 ± 3.8 −32.1 ± 3.7
J074637.32+291941.8 78 ± 4 9.0 ± 4.1 0.3 ± 4.1 10.8 ± 5.5 −10.5 ± 6.1
J082459.32+302542.0 24 ± 4 4.3 ± 4.3 −32.0 ± 4.3 8.9 ± 4.0 −28.5 ± 4.5
J091243.72+021623.4 115 ± 4 −9.5 ± 3.6 −29.6 ± 3.6 −1.9 ± 2.2 −30.3 ± 2.4
J093539.71+283138.7 −79 ± 3 53.6 ± 3.7 −73.0 ± 3.7 55.0 ± 2.7 −76.1 ± 3.0
J095406.90+063728.2 283 ± 3 −13.5 ± 4.1 −21.3 ± 4.1 −15.6 ± 18 −15.9 ± 18.1
J112535.49+414242.5 −288 ± 3 −23.8 ± 4.0 −38.3 ± 4.0 −23.0 ± 1.5 −39.6 ± 1.8
J121106.02+321044.8 45 ± 4 8.8 ± 4.8 −13.6 ± 4.8 ⋯ ⋯
J121244.80+315134.0 29 ± 2 −61.6 ± 4.0 −73.3 ± 4.0 −63.3 ± 1.2 −75.5 ± 1.4
J121817.74+295300.4 −62 ± 2 −41.7 ± 5.0 −38.3 ± 5.0 −38.2 ± 2.0 −33.2 ± 2.7
J125725.33+335356.8 36 ± 1 −31.8 ± 3.8 −16.4 ± 3.8 −27.0 ± 2.3 −17.9 ± 2.8
J133539.75+152359.8 143 ± 4 −32.4 ± 4.1 −59.9 ± 4.1 −30.0 ± 3.1 −61.6 ± 3.7
J140753.27+473517.3 23 ± 2 −25.1 ± 3.6 −15.5 ± 3.6 −22.5 ± 1.5 −14.0 ± 1.6
J144427.16+314601.9 −103 ± 3 −22.7 ± 4.2 −10.4 ± 4.2 −25.4 ± 2.9 −3.3 ± 3.3
J151139.17+335252.9 −337 ± 2 −54.5 ± 4.3 −70.8 ± 4.3 −48.7 ± 2.8 −72.1 ± 2.4
J163008.22+231438.7 −244 ± 2 −31.5 ± 4.0 56.0 ± 4.0 −25.6 ± 3 54.5 ± 3.4
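The statement that the PPMXL and UCAC4 proper motions are roughly consistent can be quantified by expressing each component difference in units of the combined uncertainty. This sketch assumes independent Gaussian errors, which is only an approximation because the two catalogs share some input data, and uses the 15 stars in Table 8 with both measurements.

```python
import math

# (designation, PPMXL pmra, err, PPMXL pmdec, err, UCAC4 pmra, err, UCAC4 pmdec, err)
# in mas/yr, transcribed from Table 8 for stars with both catalogs
ROWS = [
    ("J025723.25+331638.6", 8.6, 4.3, -32.2, 4.3, 5.0, 3.8, -32.1, 3.7),
    ("J074637.32+291941.8", 9.0, 4.1, 0.3, 4.1, 10.8, 5.5, -10.5, 6.1),
    ("J082459.32+302542.0", 4.3, 4.3, -32.0, 4.3, 8.9, 4.0, -28.5, 4.5),
    ("J091243.72+021623.4", -9.5, 3.6, -29.6, 3.6, -1.9, 2.2, -30.3, 2.4),
    ("J093539.71+283138.7", 53.6, 3.7, -73.0, 3.7, 55.0, 2.7, -76.1, 3.0),
    ("J095406.90+063728.2", -13.5, 4.1, -21.3, 4.1, -15.6, 18.0, -15.9, 18.1),
    ("J112535.49+414242.5", -23.8, 4.0, -38.3, 4.0, -23.0, 1.5, -39.6, 1.8),
    ("J121244.80+315134.0", -61.6, 4.0, -73.3, 4.0, -63.3, 1.2, -75.5, 1.4),
    ("J121817.74+295300.4", -41.7, 5.0, -38.3, 5.0, -38.2, 2.0, -33.2, 2.7),
    ("J125725.33+335356.8", -31.8, 3.8, -16.4, 3.8, -27.0, 2.3, -17.9, 2.8),
    ("J133539.75+152359.8", -32.4, 4.1, -59.9, 4.1, -30.0, 3.1, -61.6, 3.7),
    ("J140753.27+473517.3", -25.1, 3.6, -15.5, 3.6, -22.5, 1.5, -14.0, 1.6),
    ("J144427.16+314601.9", -22.7, 4.2, -10.4, 4.2, -25.4, 2.9, -3.3, 3.3),
    ("J151139.17+335252.9", -54.5, 4.3, -70.8, 4.3, -48.7, 2.8, -72.1, 2.4),
    ("J163008.22+231438.7", -31.5, 4.0, 56.0, 4.0, -25.6, 3.0, 54.5, 3.4),
]


def tension(a, sigma_a, b, sigma_b):
    """|a - b| in units of the combined 1-sigma error (independent errors assumed)."""
    return abs(a - b) / math.hypot(sigma_a, sigma_b)


# Largest tension over both components, together with the star it belongs to
worst = max(
    (max(tension(pra, spra, ura, sura), tension(pde, spde, ude, sude)), name)
    for name, pra, spra, pde, spde, ura, sura, ude, sude in ROWS
)
print(f"{worst[1]}: {worst[0]:.2f} sigma")  # J091243.72+021623.4: 1.80 sigma
```

Under this assumption, every component agrees within 2σ.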


4. Spectral Types and Spatial Distribution

4.1. Classification

In this paper, we adopt the classification system of Barnbaum et al. (1996) to categorize our 2651 carbon stars; the prominent spectral features of each type are summarized in Table 1 and described in the introduction. Based on these criteria, we classify the carbon stars into five types: C-H, C-R, C-J, C-N, and barium stars.

First, we manually select over 500 C-J star candidates with strong isotopic bands and then retain the 400 stars whose j index, as defined by De Mello et al. (2009), is greater than 4. We further divide the C-J stars into three subtypes, C-J(N), C-J(H), and C-J(R), following Lloyd Evans (1986); these represent C-N, C-H, and C-R stars, respectively, with strong isotopic bands. The numbers of C-J(N), C-J(H), C-J(R), and C-J(UNKNOWN) stars among our 400 C-J stars are listed in Table 9, where C-J(UNKNOWN) denotes stars that cannot be assigned to any of the three subtypes. Close to 90% of the C-J stars are of the cool C-J(N) type. Two example spectra of C-J(N) stars are plotted in Figure 8, and spectra of a C-J(H) and a C-J(R) star are shown in Figure 9: the upper-left panel of Figure 9 shows the C-J(H) star LAMOST J004619.17+354537.1, and the upper-right panel shows the C-J(R) star J033109.37+325732.7.

Figure 8.

Figure 8. The upper panels are two spectra of C-J(N)-type stars, and the bottom panels are their local spectra from 5700 to 6700 Å, used to calculate their pseudo-continua and j indices. The spectral lines of ¹³C¹²C at 6168 Å and ¹²C¹²C at 6192 Å are marked by red and green dashed lines, respectively, and the normal ¹²C¹⁴N band at 6206 Å and the isotopic ¹³C¹⁴N band at 6260 Å are also displayed.

Figure 9.

Figure 9. Same as Figure 8 for a C-J(H)- and a C-J(R)-type star.


Table 9.  Number of Each Type of C-J Star

C-J_TNa C-J(H) C-J(R) C-J(N) C-J(UNKNOWN)
400 41 (10.25%) 2 (0.50%) 353 (88.25%) 4 (1.00%)

Note.

aThe total number of our C-J stars.


In addition, we find 56 C-J stars with emission lines and one cool C-J(N) star with a composite spectrum. Of these 56 emission-line (EM) stars, 55 are C-J(N) stars and one is a C-J(H) star. The spectrum with emission lines (J033109.37+325732.7) is shown in the upper-right panel of Figure 10, and the composite spectrum (J004619.17+354537.1) is plotted in the upper-left panel.

Figure 10.

Figure 10. Same as Figure 8 for a C-J star with a composite spectrum and a C-J(N)-EM type star.


In the bottom two panels of Figures 8–10, we show the local spectra from 5700 to 6700 Å of the stars in the upper panels, which are used to calculate the pseudo-continuum and the j indices. The isotopic ¹³C¹²C feature at 6168 Å and the normal ¹²C¹²C feature at 6192 Å are marked in red and green, respectively, and the normal ¹²C¹⁴N band at 6206 Å and the isotopic ¹³C¹⁴N band at 6260 Å are likewise marked in color. These local spectra show that the C2 and CN bands of the C-J stars differ distinctly from those of the common C-H, C-R, and C-N stars.
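The idea of measuring a band against a pseudo-continuum can be sketched as follows. This is an illustrative band-depth measurement only, not the j index of De Mello et al. (2009), whose bandpasses are not given here; the windows `CONT_BLUE`, `CONT_RED`, `BAND_13C12C`, and `BAND_12C12C` are hypothetical choices placed around the 6168 Å and 6192 Å features, so the j > 4 threshold does not apply to this quantity.

```python
# Hypothetical windows in Angstrom (NOT the De Mello et al. 2009 definitions)
CONT_BLUE = (6120.0, 6140.0)
CONT_RED = (6220.0, 6240.0)
BAND_13C12C = (6163.0, 6173.0)  # around the isotopic feature at 6168 A
BAND_12C12C = (6187.0, 6197.0)  # around the normal feature at 6192 A


def mean_flux(wave, flux, window):
    """Mean flux of all pixels whose wavelength lies inside the window."""
    lo, hi = window
    values = [f for w, f in zip(wave, flux) if lo <= w <= hi]
    if not values:
        raise ValueError(f"no pixels in window {window}")
    return sum(values) / len(values)


def band_depth(wave, flux, band, blue=CONT_BLUE, red=CONT_RED):
    """1 - F_band / F_pc, where the pseudo-continuum F_pc is a straight line through
    the mean fluxes of the blue and red windows, evaluated at the band center."""
    wb, wr = sum(blue) / 2, sum(red) / 2
    fb, fr = mean_flux(wave, flux, blue), mean_flux(wave, flux, red)
    wc = sum(band) / 2
    pseudo_continuum = fb + (fr - fb) * (wc - wb) / (wr - wb)
    return 1.0 - mean_flux(wave, flux, band) / pseudo_continuum


def toy_spectrum(depth_13, depth_12):
    """Tilted toy spectrum with box-shaped absorption in the two bands."""
    wave = [float(w) for w in range(6100, 6301)]
    flux = []
    for w in wave:
        f = 1.0 + 0.001 * (w - 6100.0)
        if BAND_13C12C[0] <= w <= BAND_13C12C[1]:
            f *= 1.0 - depth_13
        elif BAND_12C12C[0] <= w <= BAND_12C12C[1]:
            f *= 1.0 - depth_12
        flux.append(f)
    return wave, flux


wave, flux = toy_spectrum(0.3, 0.5)
d13 = band_depth(wave, flux, BAND_13C12C)
d12 = band_depth(wave, flux, BAND_12C12C)
print(round(d13, 3), round(d12, 3), round(d13 / d12, 3))  # 0.3 0.5 0.6
```

Interpolating the continuum linearly between side windows removes the effect of the tilted toy continuum, so the recovered depths equal the injected ones.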

Aside from the 400 C-J stars, we identify 864 C-H stars, 226 C-R stars, 719 barium stars, 266 C-N stars, and 176 unclassified stars, as listed in Table 10. Of the 176 stars without spectral types, 12 have composite spectra, and the other 164 have spectra of too low quality to be classified. Example spectra of C-H, C-R, barium, and C-N stars are displayed in Figure 11, with the most prominent spectral lines used to determine their types marked in red.

Figure 11.

Figure 11. The upper-left panel is the spectrum of a C-H star, and the red region is the P-branch band. The upper-right panel is the spectrum of a C-R star, and the red lines are Ba ii, Sr ii, and Ca i lines. The bottom-left panel is the spectrum of a barium star, with the Ba ii, Sr ii, and Ca i lines also marked in red. The bottom-right panel is the spectrum of a C-N star.


Table 10.  Number of Each Type of Carbon Star

C_TNa C-H C-R C-J C-N Barium Unknown
2651 864 226 400 266 719 176

Note.

aC_TN is the total number of our carbon stars.
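The counts in Tables 9 and 10 are internally consistent, as the following check shows; it only re-adds the published numbers.

```python
TABLE10 = {"C-H": 864, "C-R": 226, "C-J": 400, "C-N": 266, "Barium": 719, "Unknown": 176}
TABLE9 = {"C-J(H)": 41, "C-J(R)": 2, "C-J(N)": 353, "C-J(UNKNOWN)": 4}

total = sum(TABLE10.values())              # all carbon stars
classified = total - TABLE10["Unknown"]    # stars with a spectral type
cj_percent = {sub: 100.0 * n / TABLE10["C-J"] for sub, n in TABLE9.items()}

print(total, classified)     # 2651 2475
print(sum(TABLE9.values()))  # 400
print(cj_percent)  # {'C-J(H)': 10.25, 'C-J(R)': 0.5, 'C-J(N)': 88.25, 'C-J(UNKNOWN)': 1.0}
```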


Green (2013) found 134 G-type stars, 51 emission-line (EM) stars, and 9 stars with clear composite spectra, whose spectral features differ distinctly from those of other carbon stars. According to the MK classification criteria, G-type stars can be classified as C-H stars, and the majority of EM stars are either C-N or C-J(N) stars. Among our carbon stars, there are 308 G-type carbon stars, 93 EM stars, and 12 stars with composite spectra. Of the 308 G-type stars, 250 are new discoveries; two example spectra are plotted in Figure 12. Of the 93 EM stars, 45 are reported for the first time. Figure 13 shows three example spectra: the upper panel is a cool C-N type star, the middle panel is the C-H type star J213721.01+300629.7, and the bottom panel is the C-R type star J055821.00+284549.6. EM stars of C-H and C-R type are extremely rare; we find only two C-H type EM stars and one C-R type EM star. Of the 12 stars with composite spectra, seven are newly recognized in this work, and two examples are shown in Figure 14.

Figure 12.

Figure 12. Two spectra of G-type carbon stars, which show notably blue continua, strong C2 bands, and strong, narrow Balmer and Ca absorption lines. Some G-type stars also show a weak CN band in the red part, e.g., J124759.51+243947.1.

Figure 13.

Figure 13. Three spectra of EM stars: the upper panel shows a cool C-N type star, the middle panel is a C-H type star, and the bottom one is a C-R type EM star.

Figure 14.

Figure 14. Carbon stars with composite spectra, which display strong continua and pressure-broadened hydrogen Balmer absorption lines of typical DA white dwarfs in the blue part and clear C2 bands and red CN bands in the red part.


The basic information on our carbon stars is listed in Table 11, which includes the columns "R.A.," "decl.," "S/N_r," "SpType_PL," "G_EM_B," "If_new," and "SpType_MK." "R.A." and "decl." are equatorial coordinates; "S/N_r" is the r-band S/N; "SpType_PL" is the spectral type given by the LAMOST 1D pipeline; and "G_EM_B" takes one of four values, "EM," "G_type," "Binary," or "NULL," denoting EM stars, G-type stars, stars with composite spectra, and all other stars, respectively. "If_new" takes the value "NEW" or "NULL" and indicates whether the star is newly recognized. "SpType_MK" gives the MK spectral classification and takes 15 values. Among them, "C-H," "C-R," "C-N," and "Ba" represent C-H, C-R, C-N, and barium stars, respectively; "UNKNOWN" marks stars for which we cannot give a spectral type; and "NULL" is used for stars that cannot be classified. C-J stars take four values, "C-J(H)," "C-J(R)," "C-J(N)," and "C-J(UNKNOWN)," which represent C-H, C-R, C-N, and undetermined-type stars, respectively, with unusually strong isotopic carbon bands. Stars with emission lines take five values: "C-H(EM)," "C-N(EM)," and "C-R(EM)" represent C-H, C-N, and C-R stars with emission lines, and "C-J(H)-EM" and "C-J(N)-EM" represent C-H and C-N stars with both strong isotopic carbon bands and emission lines.

Table 11.  Basic Parameters of the Carbon Stars Reported in This Paper

Designation R.A. Decl. S/N_r SpType_PL G_EM_B If_new SpType_MK
  (degree) (degree)          
J085222.34+494610.3 133.093100 49.769550 458 K4 NULL NULL C-R
J004619.17+354537.1 11.579886 35.760306 31 Carbon NULL NULL C-J(H)
J005917.52+315605.4 14.823016 31.934838 130 Carbon EM new C-H(EM)
J164406.42+470635.7 251.026756 47.109933 108 G5 NULL NULL Ba
J101754.70+251201.0 154.477944 25.200283 112 Carbon G_type NULL C-H
J081747.79+290531.8 124.449140 29.092172 14 Non NULL NULL UNKNOWN
J200607.53+460847.2 301.531408 46.146456 216 Carbon NULL NULL C-N
J033109.37+325732.7 52.789053 32.959109 17 Carbon NULL new C-J(R)
J055821.00+284549.6 89.587501 28.763803 62 G5 EM NULL C-R(EM)
J062224.23+032520.2 95.600993 3.422284 28 G2 Binary new NULL
J072058.63+250006.4 110.244321 25.001782 95 Carbon NULL new C-J(N)

Note. The machine-readable version includes ultraviolet, optical, and infrared magnitudes and proper motion information, in a format similar to that of Tables 6–8.

Only a portion of this table is shown here to demonstrate its form and content. A machine-readable version of the full table is available.
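Because "SpType_MK" packs the base type, the C-J flag, and the emission-line flag into one string, users of the machine-readable table may want to split it. The parser below is a hypothetical helper, not part of any released software; it accepts all 15 labels described above (and, permissively, a few combinations not used in the catalog) and maps both "UNKNOWN" and "NULL" to no base type.

```python
import re
from typing import NamedTuple, Optional


class MKLabel(NamedTuple):
    base: Optional[str]  # "C-H", "C-R", "C-N", "Ba", or None if no type was assigned
    c_j: bool            # unusually strong isotopic carbon bands (C-J family)
    emission: bool       # emission lines present


_CJ = re.compile(r"C-J\((H|R|N|UNKNOWN)\)(-EM)?")  # e.g. C-J(N), C-J(H)-EM
_PLAIN = re.compile(r"(C-[HRN])(\(EM\))?")         # e.g. C-H, C-R(EM)


def parse_sptype_mk(label: str) -> MKLabel:
    """Split a SpType_MK string into (base type, C-J flag, emission flag)."""
    label = label.strip()
    if label in ("UNKNOWN", "NULL"):
        return MKLabel(None, False, False)
    if label == "Ba":
        return MKLabel("Ba", False, False)
    m = _CJ.fullmatch(label)
    if m:
        sub = m.group(1)
        return MKLabel(None if sub == "UNKNOWN" else "C-" + sub, True, m.group(2) is not None)
    m = _PLAIN.fullmatch(label)
    if m:
        return MKLabel(m.group(1), False, m.group(2) is not None)
    raise ValueError(f"unrecognized SpType_MK label: {label!r}")


print(parse_sptype_mk("C-J(N)-EM"))  # MKLabel(base='C-N', c_j=True, emission=True)
print(parse_sptype_mk("C-R(EM)"))    # MKLabel(base='C-R', c_j=False, emission=True)
```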


We match our carbon stars to the photometric catalogs of GALEX, Pan-STARRS, 2MASS, and WISE, and list their magnitudes in the machine-readable version of Table 11. FUV and NUV are the GALEX ultraviolet magnitudes; u, g, r, i, and z are five optical Pan-STARRS magnitudes; J, H, and K are the three 2MASS magnitudes; W1, W2, and W3 are three WISE magnitudes; and "..." indicates no detection in that band. We also match to the PPMXL, UCAC4, and TGAS (Gaia Collaboration et al. 2016) catalogs and list the proper motions and parallaxes in the machine-readable version of Table 11. There, ${\mu }_{\alpha }\cos {(\delta )}_{\_{\rm{P}}}$ and ${\mu }_{\delta \_{\rm{P}}}$ are the proper motions in the R.A. and decl. directions from the PPMXL catalog, ${\mu }_{\alpha }\cos {(\delta )}_{\_{\rm{U}}}$ and ${\mu }_{\delta \_{\rm{U}}}$ are from the UCAC4 catalog, ${\mu }_{\alpha }\cos {(\delta )}_{\_\mathrm{TG}}$ and ${\mu }_{\delta \_\mathrm{TG}}$ are from the TGAS catalog, and the parallax${}_{\_\mathrm{TG}}$ is also from TGAS. Among our carbon stars, 124 have astrometric parameters from the TGAS catalog, and 43 of these have parallaxes larger than 1 mas, for which the parallax measurements are relatively reliable.
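For orientation, a parallax larger than 1 mas corresponds to a distance of less than about 1 kpc under naive inversion. The helpers below are a minimal sketch; simple inversion is only reasonable when the fractional parallax error is small, and the example parallaxes and uncertainties are made-up values, not catalog entries.

```python
def parallax_to_distance_pc(parallax_mas: float) -> float:
    """Naive distance estimate d [pc] = 1000 / parallax [mas]."""
    if parallax_mas <= 0:
        raise ValueError("naive inversion needs a positive parallax")
    return 1000.0 / parallax_mas


def fractional_error(parallax_mas: float, sigma_mas: float) -> float:
    """Fractional parallax uncertainty sigma / parallax."""
    return sigma_mas / parallax_mas


# Hypothetical example values, not taken from the catalog
for plx, sigma in [(1.0, 0.3), (2.5, 0.3)]:
    print(plx, parallax_to_distance_pc(plx), round(fractional_error(plx, sigma), 2))
```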

4.2. Spatial Distribution

Figure 15 shows the spatial distribution of the 2651 carbon stars in Galactic coordinates, with symbol colors indicating the carbon-star types. Among them, 12 C-N, 24 C-J, 568 C-H, 54 C-R, and 125 barium stars lie at $| b| \ge {30}^{\circ }$, accounting for about 4.5%, 6%, 65.7%, 23.9%, and 17.4% of their respective types. As expected, about 95% of the C-N and C-J stars and at least 75% of the C-R and barium stars are concentrated at low Galactic latitudes, whereas most C-H stars (over 60%) lie at high latitudes.
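The latitude cut $| b| \ge {30}^{\circ }$ requires Galactic latitudes, which can be computed from the equatorial coordinates in Table 11 using the standard J2000 position of the north Galactic pole (α = 192.85948°, δ = 27.12825°). The sketch below uses only the Python standard library, whereas a production analysis would more likely use a dedicated astronomy package; it also recomputes the high-latitude fractions quoted above from the published counts.

```python
import math

RA_NGP_DEG = 192.85948   # J2000 right ascension of the north Galactic pole
DEC_NGP_DEG = 27.12825   # J2000 declination of the north Galactic pole


def galactic_latitude(ra_deg: float, dec_deg: float) -> float:
    """Galactic latitude b in degrees from J2000 equatorial coordinates."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    ra0, dec0 = math.radians(RA_NGP_DEG), math.radians(DEC_NGP_DEG)
    sin_b = (math.sin(dec) * math.sin(dec0)
             + math.cos(dec) * math.cos(dec0) * math.cos(ra - ra0))
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_b))))  # clamp rounding error


# High-latitude (|b| >= 30 deg) counts and type totals quoted in the text
HIGH_LAT = {"C-N": (12, 266), "C-J": (24, 400), "C-H": (568, 864),
            "C-R": (54, 226), "Ba": (125, 719)}
fractions = {t: round(100.0 * n, 1) if False else round(100.0 * n / tot, 1)
             for t, (n, tot) in HIGH_LAT.items()}
print(fractions)  # {'C-N': 4.5, 'C-J': 6.0, 'C-H': 65.7, 'C-R': 23.9, 'Ba': 17.4}

# Example: the C-N star J200607.53+460847.2 from Table 11 lies at low latitude
print(round(galactic_latitude(301.531408, 46.146456), 1))
```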

Figure 15.

Figure 15. Spatial distribution of the carbon stars reported in this paper.


5. Photometry of Carbon Stars

In this section, we investigate the 2651 carbon stars using photometric data from the ultraviolet to the infrared. For the ultraviolet and optical bands, we match to the GR6Plus7 catalog of GALEX (Martin et al. 2005) and the DR1 catalog of Pan-STARRS (Kaiser et al. 2002, 2010; Chambers et al. 2016), respectively, using the MAST CasJobs tool.

GALEX is a NASA Small Explorer mission launched in 2003 April; it performed the first space-based all-sky imaging and spectroscopic surveys in the ultraviolet (1350–2750 Å). Although its main goal is to investigate the causes and evolution of star formation in galaxies over the history of the universe, its ultraviolet sensitivity also makes it feasible to detect hot white dwarfs in unresolved binaries with main-sequence companions as early as G or K type, and cooler white dwarfs with early M-type or later companions, as noted by Green (2013).

Pan-STARRS is a system for wide-field astronomical imaging developed and operated by the Institute for Astronomy at the University of Hawaii. Pan-STARRS1 (PS1) is the first part of Pan-STARRS to be completed and is the basis for Data Release 1 (DR1). It images the sky in the g, r, i, z, and y broadband filters.

For the infrared bands, we match to 2MASS (Skrutskie et al. 2006) and WISE (Wright et al. 2010) using the X-match tool of the SIMBAD astronomical database. 2MASS performed uniform, precise photometry and astrometry in the near-infrared J (1.25 μm), H (1.65 μm), and K (2.16 μm) bandpasses between 1997 June and 2001 February, producing a point source catalog of 470,992,970 sources and an extended source catalog of 1,647,599 sources covering 99.998% of the celestial sphere. WISE began a mid-infrared survey of the entire sky in mid-December 2009 and completed it in 2010, mapping the whole sky in four infrared bands, W1, W2, W3, and W4, centered at 3.4, 4.6, 12, and 22 μm, with a 40 cm telescope feeding detector arrays with a total of 4 million pixels.

5.1. GALEX Ultraviolet-detected Stars

GALEX photometry is available in two bands, far-ultraviolet (FUV: 1344–1786 Å) and near-ultraviolet (NUV: 1771–2831 Å). Strong UV flux from a stellar system can arise from a very hot blackbody, such as a hot white dwarf, or from stellar activity (Green 2013).

We match to the GR6Plus7 catalog of GALEX with a 6'' search radius and use the "fGetNearestObjEq" function of the CasJobs tool to retain only the best match for each star. Among all our carbon stars, we find a total of 1099 GALEX detections, 1098 of which have NUV detections. Their NUV magnitude distribution, plotted in Figure 16, spans 16 to 26 mag; over 90% of the stars lie between 19 and 24 mag, and over 50% lie in the narrow range from 21 to 23 mag.
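Keeping only the nearest counterpart within a fixed search radius can be sketched as a brute-force nearest-neighbour search based on the haversine great-circle separation. This is a hypothetical stand-in that only mimics the behavior described for "fGetNearestObjEq"; it does not reproduce that CasJobs function, and the source identifiers and offsets in the example are made up.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h)))) * 3600.0


def nearest_match(target, catalog, radius_arcsec):
    """Return (id, separation) of the closest source within the radius, else None."""
    ra, dec = target
    best = None
    for source_id, s_ra, s_dec in catalog:
        sep = angular_separation_arcsec(ra, dec, s_ra, s_dec)
        if sep <= radius_arcsec and (best is None or sep < best[1]):
            best = (source_id, sep)
    return best


# Hypothetical catalog sources around one target position
TARGET = (21.309014, 35.375843)
CATALOG = [
    ("src_a", TARGET[0], TARGET[1] + 2.0 / 3600),   # 2" north
    ("src_b", TARGET[0], TARGET[1] - 5.0 / 3600),   # 5" south
    ("src_c", TARGET[0] + 10.0 / 3600, TARGET[1]),  # ~8" east (RA offset shrinks by cos dec)
]
print(nearest_match(TARGET, CATALOG, radius_arcsec=6.0))
```

The loop is O(N) per target; for millions of sources a spatial index would be the usual choice, but the selection rule (closest source inside the radius) stays the same.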

Figure 16.

Figure 16. Distribution of ultraviolet GALEX NUV magnitudes.


About 81.2% of our 308 G-type stars have GALEX magnitudes, consistent with the extremely high detection fraction noted by Green (2013). The numbers of C-H, C-R, C-J, C-N, and barium stars with GALEX detections are listed in Table 12. A total of 991 hot C-H, C-R, and barium stars have GALEX detections, 54.8% of all 1809 hot stars, compared with 59 cool C-J and C-N stars, 8.9% of all 666 cool carbon stars. Hot carbon stars therefore have a higher GALEX detection rate than cool ones, as expected.

Table 12.  Number of C-H, C-R, C-J, C-N, and Barium Stars With GALEX, Pan-STARRS, 2MASS, and WISE Photometric Magnitudes

Survey Total Numbera C-H C-R Ba C-J C-N Unknown
GALEX 1099 370 109 262 41 12 47
Pan-STARRS 2608 852 223 712 387 259 176
2MASS 2567 852 225 715 359 247 158
WISE 2550 850 222 713 352 241 161

Note.

aThe total number of carbon stars that have GALEX, Pan-STARRS, 2MASS, or WISE photometric information.


Of the 1099 GALEX detections, 1098 have NUV magnitudes and 25 have FUV magnitudes. One of the FUV-detected stars, J050736.14+305149.6, has no NUV detection, and its FUV magnitude error exceeds 0.5 mag. Table 13 lists the FUV magnitudes, NUV magnitudes, and spectral types of the 25 FUV-detected stars: 11 C-H stars, two C-R stars, one C-J star, one C-N star, six barium stars, and four stars of unknown spectral type. Three of the 11 hot C-H stars are G-type stars. UV emission can also arise in young, active stars from active regions, transition regions, or chromospheres; however, none of these 25 FUV-detected stars shows emission lines, so their FUV flux more likely comes from a hot white dwarf companion in a binary system.

Table 13.  GALEX Detections of 25 Binary Candidates That Likely Have Compact White Dwarf Companions

Designation fuv fuv_err nuv nuv_err SpType
  (mag) (mag) (mag) (mag)
J005749.75+013835.2 22.14 0.10 21.99 0.16 C-H
J020726.72+453216.9 20.92 0.34 20.51 0.28 C-N
J050736.14+305149.6 21.71 0.53 −999 −999 Ba
J073406.93+351345.5 22.56 0.33 17.54 0.02 Ba
J074743.32+173302.0 22.41 0.44 20.64 0.14 C-H
J080917.58+004256.5 19.74 0.14 19.33 0.08 Ba
J083021.22+154319.6 23.08 0.28 22.21 0.17 UNKNOWN
J084906.99+462727.2 22.21 0.16 21.81 0.08 UNKNOWN
J091555.05+043115.6 21.53 0.32 20.42 0.18 Ba
J093450.24+022355.0 22.35 0.40 21.47 0.22 Ba
J094634.19+140521.7 24.01 0.30 20.10 0.02 C-H
J101110.08+285036.0 22.12 0.37 21.24 0.27 Ba
J101423.40+302200.3 25.20 0.25 21.07 0.01 C-H
J101946.87+252932.7 22.45 0.41 20.32 0.11 C-R
J115932.16+014326.9 22.54 0.50 20.90 0.18 C-H
J130359.18+050938.6 23.74 0.27 21.98 0.07 C-H
J130824.28+530224.4 23.03 0.20 22.42 0.13 C-H
J131525.84+062520.9 19.09 0.12 18.61 0.06 C-H
J133841.23+014523.7 22.89 0.26 21.79 0.15 UNKNOWN
J140953.08-061141.8 21.36 0.33 19.79 0.12 C-H
J142057.12-031953.2 19.09 0.13 18.66 0.06 C-H
J154903.86+033253.1 22.38 0.33 22.10 0.18 UNKNOWN
J164420.62+034506.6 19.75 0.12 19.59 0.08 C-J(UNKNOWN)
J212426.82-030344.6 23.98 0.42 21.53 0.10 C-R
J220255.21-010708.3 21.12 0.06 21.00 0.05 C-H
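Tallying the SpType column of Table 13 gives the breakdown of the 25 FUV-detected stars by type. The sketch below also computes the FUV − NUV color where both magnitudes exist, treating the table's −999 entry as missing; no color threshold is applied, and the colors are only listed.

```python
from collections import Counter

MISSING = -999.0  # placeholder used in Table 13 for a missing magnitude

# (designation, FUV, NUV, SpType) transcribed from Table 13
TABLE13 = [
    ("J005749.75+013835.2", 22.14, 21.99, "C-H"),
    ("J020726.72+453216.9", 20.92, 20.51, "C-N"),
    ("J050736.14+305149.6", 21.71, -999.0, "Ba"),
    ("J073406.93+351345.5", 22.56, 17.54, "Ba"),
    ("J074743.32+173302.0", 22.41, 20.64, "C-H"),
    ("J080917.58+004256.5", 19.74, 19.33, "Ba"),
    ("J083021.22+154319.6", 23.08, 22.21, "UNKNOWN"),
    ("J084906.99+462727.2", 22.21, 21.81, "UNKNOWN"),
    ("J091555.05+043115.6", 21.53, 20.42, "Ba"),
    ("J093450.24+022355.0", 22.35, 21.47, "Ba"),
    ("J094634.19+140521.7", 24.01, 20.10, "C-H"),
    ("J101110.08+285036.0", 22.12, 21.24, "Ba"),
    ("J101423.40+302200.3", 25.20, 21.07, "C-H"),
    ("J101946.87+252932.7", 22.45, 20.32, "C-R"),
    ("J115932.16+014326.9", 22.54, 20.90, "C-H"),
    ("J130359.18+050938.6", 23.74, 21.98, "C-H"),
    ("J130824.28+530224.4", 23.03, 22.42, "C-H"),
    ("J131525.84+062520.9", 19.09, 18.61, "C-H"),
    ("J133841.23+014523.7", 22.89, 21.79, "UNKNOWN"),
    ("J140953.08-061141.8", 21.36, 19.79, "C-H"),
    ("J142057.12-031953.2", 19.09, 18.66, "C-H"),
    ("J154903.86+033253.1", 22.38, 22.10, "UNKNOWN"),
    ("J164420.62+034506.6", 19.75, 19.59, "C-J(UNKNOWN)"),
    ("J212426.82-030344.6", 23.98, 21.53, "C-R"),
    ("J220255.21-010708.3", 21.12, 21.00, "C-H"),
]

# Collapse the C-J subtypes into one "C-J" bucket before counting
counts = Counter("C-J" if sptype.startswith("C-J") else sptype
                 for _, _, _, sptype in TABLE13)
fuv_minus_nuv = {name: round(fuv - nuv, 2)
                 for name, fuv, nuv, _ in TABLE13 if nuv != MISSING}

print(sum(counts.values()), dict(counts))
print(len(fuv_minus_nuv))  # 24
```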


5.2. Pan-STARRS Optical-detected Stars

We match our carbon stars to the Pan-STARRS DR1 catalog using a 3'' search radius, again using the "fGetNearestObjEq" function of the CasJobs tool to return the best match for each star. A total of 2608 stars have optical Pan-STARRS detections; the numbers of C-H, C-R, C-J, C-N, and barium stars with Pan-STARRS detections are listed in Table 12. The g-band magnitude distribution of the 2608 Pan-STARRS detections, plotted in Figure 17, spans 10 to 23 mag; over 90% of the carbon stars lie between 13 and 19 mag, and over half are concentrated between 14 and 17 mag.

Figure 17.

Figure 17. Distribution of optical Pan-STARRS g magnitudes.


5.3. 2MASS and WISE Infrared-detected Stars

We match all of our carbon stars to the 2MASS catalog within 3'' and find 2567 near-infrared 2MASS detections, covering 96.8% of our carbon stars. Table 12 lists the numbers of C-H, C-R, C-J, C-N, and barium stars with 2MASS magnitudes.

We also match to the WISE catalog within 3'' and find 2550 mid-infrared WISE detections, i.e., 96.2% of our carbon stars have mid-infrared magnitudes. Among these are 713 barium stars, 850 C-H stars, 222 C-R stars, 352 C-J stars, 241 C-N stars, and 161 carbon stars of unknown type, as listed in Table 12. When a 2MASS or WISE query returns multiple counterparts for a star, we retain only the one nearest to the uploaded coordinates.

In the infrared, we use the Ks band to investigate the magnitude distribution. Because extinction is smaller in the infrared, we plot the Ks magnitude distribution of each type of carbon star in Figure 18, together with the distribution of all 2MASS-detected stars. The black dashed histogram shows the distribution of all 2567 stars, and the red, cyan, blue, green, and magenta histograms show the distributions of the Ba, C-H, C-R, C-J, and C-N stars, respectively. The Ks magnitudes of our stars range from 4 to 17 mag, with the peak of the distribution between 11 and 12 mag. The colored histograms show that the cool C-J and C-N stars are the brightest, followed by the warmer C-R, barium, and C-H stars, as expected. The C-N and cool C-J stars are luminous AGB stars and can therefore appear very bright in Ks, whereas C-H and barium stars are regarded as members of binary systems and C-R stars are thought to have been binaries previously, so these three types have intermediate Ks magnitudes.

Figure 18.

Figure 18. Distribution of infrared 2MASS Ks magnitudes.


6. Summary

In this paper, we search for carbon stars in the large spectral database of LAMOST DR4 using an efficient machine-learning algorithm, the Bagging TopPush method.

Because this is a supervised machine-learning method, positive samples are needed to train the ranking models. We select 1050 positive samples from over 1600 spectra of SDSS and LAMOST carbon stars in the literature and cluster them into three groups. For each group, we analyze the effect of spectral preprocessing methods on the algorithm's performance and find that the "nmap+pca" method, which combines the "nmap" normalization (explained in Appendix B) with PCA dimension reduction, performs best. The completeness and contamination for each group are also discussed.

We find 2651 carbon stars in LAMOST DR4, of which 1415 are reported for the first time. Among them are 308 G-type stars, 92 EM stars, and 12 stars with composite spectra, whose spectral features differ distinctly from those of other carbon stars. Among the 92 EM stars, we find only two C-H type and one C-R type EM star, indicating that emission lines are extremely rare in these two types of carbon stars.

After identifying these carbon stars, we compare them with the classifications from the LAMOST 1D pipeline and two previous LAMOST catalogs (Si et al. 2015; Ji et al. 2016). First, the LAMOST 1D pipeline correctly classified 63% of our carbon stars as carbon stars, and the remaining 37% were recognized only by our method. Of these misclassified stars, 74% were classified as G-type stars by the 1D pipeline, which suggests that new carbon-star templates are needed for the LAMOST 1D pipeline to improve the accuracy of its spectral classification. Second, nine carbon stars present in the previous catalogs are not included in our catalog because of their low spectral quality. Conversely, over 600 of our carbon stars observed during the same period were omitted from those catalogs, and we analyze why they were excluded.

Based on a series of spectral features, we classify our carbon stars into five subtypes: C-H, C-R, C-J, C-N, and barium stars. We use a j index larger than 4 to find 400 C-J stars and manually classify the other stars using the spectral features summarized in Table 1. In total, we identify 864 C-H stars, 226 C-R stars, 400 C-J stars, 266 C-N stars, 719 barium stars, and 176 unclassified stars. The C-J stars are further divided into three subtypes, C-J(H), C-J(R), and C-J(N), and close to 90% of them are cool C-J(N) stars.

We investigate the spatial distribution of the five types of carbon stars in Galactic coordinates. Over 60% of the C-H stars are located at high Galactic latitudes, and the rest at low latitudes. In addition, about 95% of the cool C-N and C-J stars and at least 75% of the C-R and barium stars are concentrated at low Galactic latitudes.

Aside from the 2651 carbon stars, we also find 17 CEMPTO star candidates and make a preliminary study of their nature using atmospheric parameters derived from the low-resolution spectra. Only about 20 CEMPTO stars have been reported in the literature to date, so these stars are extremely rare, and high-resolution follow-up observations are needed to confirm them.

Finally, we cross-match our carbon stars with the ultraviolet GALEX, optical Pan-STARRS, near-infrared 2MASS, and mid-infrared WISE catalogs and study their magnitude distributions in the NUV, g, and Ks bands. The Ks distribution shows that the cool carbon stars are the brightest, as expected from their AGB evolutionary stage.

In the future, follow-up time-domain photometric and high-resolution spectroscopic observations would help to confirm these carbon stars and further investigate their nature.

We thank Bruce Margon, Wei Ji, Jian-Rong Shi, Wen-Yuan Cui, and Jincheng Guo for useful discussions. This work was supported by the National Natural Science Foundation of China (grant Nos. 11303036 and 11390371/4) and the National Basic Research Program of China (973 Program, 2014CB845700). The LAMOST FELLOWSHIP is supported by the Special Funding for Advanced Users, budgeted and administered by the Center for Astronomical Mega-Science, Chinese Academy of Science (CAMS). Guoshoujing Telescope (the Large Sky Area Multi-Object Fiber Spectroscopic Telescope, LAMOST) is a National Major Scientific Project built by the Chinese Academy of Sciences. Funding for the project has been provided by the National Development and Reform Commission. LAMOST is operated and managed by the National Astronomical Observatories, Chinese Academy of Sciences.

Appendix A: The Selection Effect of Positive Samples

In Section 2.1, we use four steps to select positive samples. First, some of the 1682 SDSS and LAMOST carbon stars were observed multiple times; for each such star, we retain only the spectrum with the highest S/N and exclude the others. This step removes 234 repeated spectra. In step 2, we restrict the wavelength range, which excludes 13 spectra that do not satisfy the restriction. In step 3, we remove 101 spectra with abnormal fluxes, such as zero or negative values, or other problems. After these three steps, 1334 spectra remain. On manual inspection, 284 of them are so noisy that we cannot confidently identify them as carbon stars, so we remove these 284 ambiguous spectra and retain the 1050 spectra that can be definitively identified as carbon stars from their spectral features to construct the positive sample set.
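Tallying the four steps makes the bookkeeping explicit: subtracting the quoted removals from 1682 leaves 1334 spectra after step 3, which matches the 1050 retained plus 284 removed spectra shown in Figures 19-21. The sketch also illustrates step 1 (keeping the highest-S/N spectrum per star); the `(star_id, snr, spectrum)` tuple layout is a hypothetical data structure, not the paper's format.

```python
# Number of spectra removed at each selection step, as quoted in the text.
REMOVED_PER_STEP = [
    ("step 1: duplicates (keep highest S/N)", 234),
    ("step 2: wavelength-range restriction", 13),
    ("step 3: abnormal fluxes", 101),
    ("step 4: ambiguous low-S/N spectra", 284),
]


def keep_best_snr(spectra):
    """Step 1 sketch. spectra: iterable of (star_id, snr, spectrum) tuples
    (HYPOTHETICAL layout). Returns {star_id: spectrum} with the max-S/N entry."""
    best = {}
    for star_id, snr, spectrum in spectra:
        if star_id not in best or snr > best[star_id][0]:
            best[star_id] = (snr, spectrum)
    return {star_id: spec for star_id, (_, spec) in best.items()}


remaining = 1682  # SDSS + LAMOST carbon star spectra before selection
after_step = []
for label, removed in REMOVED_PER_STEP:
    remaining -= removed
    after_step.append(remaining)
    print(f"{label:40s} -> {remaining} spectra left")
```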

Figure 19 shows the r-band S/N distribution of the 1334 carbon star spectra left after step 3. Figure 20 displays the S/N distribution of the 1050 positive samples used in this paper, and Figure 21 shows that of the 284 removed spectra. These figures show that our 1050 positive samples also include spectra with S/N lower than 10, which account for less than about 30% of the sample. Unlike the 284 spectra removed for low S/N, the low-S/N positive samples that we retained show clearly distinguishable carbon molecular bands. In addition, the S/Ns of almost all of the 284 removed spectra lie between 2 and 9; that is, these spectra have extremely low S/N.

Figure 19. S/N distribution of the SDSS r-band for 1334 carbon star spectra, which are retained after step 3 in Section 2.1 and include 284 low-S/N spectra.

Figure 20. S/N distribution of the SDSS r-band for the spectra of the 1050 positive samples, which are used to construct the positive sample set P in this paper.

Figure 21. S/N distribution of the SDSS r-band for the 284 low-S/N spectra, which are removed from the positive samples through step 4 in Section 2.1.

Next, we design an experiment to analyze how excluding these 284 low-S/N spectra affects the recall. First, we mix the 2651 carbon stars found in Section 3 into 500,000 LAMOST spectra to form the unlabeled sample set U. We then run two carbon star retrieval experiments. In the first, we use the 1334 spectra mentioned above to construct the positive sample set P and retrieve carbon stars from U. In the second, we construct P from the 1050 positive samples used in this paper and again retrieve carbon stars from U. Finally, we compare the recall at different values of K in the two experiments. Figure 22 presents the results, with the magenta and black lines showing the first and second experiments, respectively.
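The comparison relies on recall at a cutoff K. Assuming the usual definition (the fraction of the 2651 injected carbon stars that appear among the K top-ranked spectra of U), it can be computed as below. The ranking itself would come from the retrieval algorithm; the identifiers here are placeholders.

```python
def recall_at_k(ranked_ids, true_ids, k):
    """Fraction of the known carbon stars (true_ids) found among the first k
    entries of ranked_ids (best candidate first; ids assumed unique)."""
    true_ids = set(true_ids)
    if not true_ids:
        raise ValueError("true_ids must be non-empty")
    hits = sum(1 for sid in ranked_ids[:k] if sid in true_ids)
    return hits / len(true_ids)


def recall_curve(ranked_ids, true_ids, ks):
    """Recall at each cutoff in ks, e.g. to plot curves like those in Figure 22."""
    return {k: recall_at_k(ranked_ids, true_ids, k) for k in ks}
```

Running `recall_curve` on the rankings from both experiments, over the same grid of K values, gives the two curves being compared.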

Figure 22. Recall comparison. The black line shows the recall at different values of K when the 1050 positive samples are used to search for carbon stars, and the magenta line shows the recall when the 1334 positive samples, which include the 284 removed spectra, are used.

Figure 22 shows that the recall of the second experiment (R2) is much higher than that of the first experiment (R1) when K is smaller than 2000. When K is larger than 2000, however, R1 is slightly higher than R2: for example, R1 exceeds R2 by about 0.92% at K = 3000 and by about 0.99% at K = 5000. Therefore, if the 1050 positive samples are used as a single group rather than divided into classes, excluding the 284 low-S/N spectra may cost us about 0.99% of the carbon stars at K = 5000. On the other hand, a smaller positive sample set makes carbon star retrieval faster, at the cost of missing a few carbon stars. In Section 3, we search for carbon stars three times, using one group of positive samples each time. Comparing Figure 22 with Figures 3–5 in Section 3.1 clearly shows that the recall is much higher when the positive samples are divided into several groups, which indicates that the classification of positive samples in Section 2.2 is essential. Future work should investigate whether classifying the 1334 positive samples into groups and searching for carbon stars with each group yields a higher recall than that achieved in this paper.
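The group-wise strategy (one search per group of positive samples, with the candidates pooled afterward) can be sketched as a union of per-group top-K lists. The scorer below, maximum cosine similarity to any positive in the group, is a simple stand-in chosen for illustration; it is not the retrieval algorithm used in this paper, and the grouping of Section 2.2 is not reproduced.

```python
import numpy as np


def top_k_by_similarity(unlabeled, positives, k):
    """STAND-IN scorer (not the paper's algorithm): rank rows of `unlabeled`
    by their maximum cosine similarity to any row of `positives`;
    return the indices of the top k as a set."""
    u = unlabeled / np.linalg.norm(unlabeled, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = (u @ p.T).max(axis=1)
    return set(np.argsort(-scores)[:k].tolist())


def groupwise_search(unlabeled, positive_groups, k):
    """Search once per positive-sample group and pool the candidates."""
    found = set()
    for group in positive_groups:
        found |= top_k_by_similarity(unlabeled, group, k)
    return found
```

Pooling per-group results lets each group contribute its own best matches, so spectra that resemble a minority subtype are less likely to be crowded out of a single global top-K list.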

Appendix B: Spectral Preprocessing Experiments

Generally, before searching for carbon stars in a massive data set, the spectra should be preprocessed to improve retrieval performance. Widely used preprocessing methods include de-noising, normalization, and feature selection. In this paper, we first eliminate the disturbance of narrow features such as sky emission lines and bad pixels. Figure 23 shows an example of spectral de-noising: the bad pixels around 5800 Å are effectively removed after filtering, showing that de-noising is an indispensable step. Second, the spectral fluxes must be normalized to the same scale. We test two commonly used normalization methods to determine which is more effective. The first maps the minimum and maximum fluxes of each spectrum to [0, 1] and is abbreviated as "nmap" in this paper; the second divides the flux by the square root of the sum of the squared fluxes and is denoted "nunit." Third, we apply a median filter with a width of 300 Å to determine the pseudo-continuum, which is used to investigate the effect of the continuum on the retrieval performance. Finally, when searching for rare objects, we often face high-dimensional spectral data that may contain non-informative or noisy features, so the main information in the data should be extracted. We apply principal component analysis (PCA), which is widely used to obtain low-dimensional data representations, and retain 50 principal components.
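The normalization, continuum, and PCA steps described above can be sketched with NumPy as independent functions. This is a minimal illustration under stated assumptions: the continuum helper assumes a 1D spectrum on a uniform grid of `dlambda` Angstrom per pixel (if the grid is not linear in wavelength, the 300 Å window would have to be converted differently), and PCA is done with a plain SVD; none of this is the paper's own code.

```python
import numpy as np


def nmap(flux):
    """'nmap': linearly map each spectrum's min and max flux to 0 and 1.
    flux: shape (n_pixels,) or (n_spectra, n_pixels)."""
    lo = flux.min(axis=-1, keepdims=True)
    hi = flux.max(axis=-1, keepdims=True)
    return (flux - lo) / (hi - lo)  # undefined for a perfectly flat spectrum


def nunit(flux):
    """'nunit': divide by sqrt(sum of squared fluxes), i.e. scale to unit L2 norm."""
    return flux / np.linalg.norm(flux, axis=-1, keepdims=True)


def running_median(flux, size):
    """Running median of a 1D array over an odd window of about `size` pixels,
    with the edges padded by repeating the end values."""
    half = size // 2
    padded = np.pad(flux, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, 2 * half + 1)
    return np.median(windows, axis=-1)


def subtract_continuum(flux, dlambda, width_aa=300.0):
    """Subtract a pseudo-continuum from a median filter ~300 Angstrom wide.
    ASSUMPTION: 1D spectrum sampled uniformly at dlambda Angstrom per pixel."""
    size = max(1, int(round(width_aa / dlambda)))
    return flux - running_median(flux, size)


def pca_project(x, n_components=50):
    """Project rows of x (n_spectra, n_pixels) onto the leading principal
    components, computed from the SVD of the mean-centered data."""
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:n_components].T
```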

Figure 23. The upper panel shows the original spectrum of J054640.48+351014.0, and the lower panel shows the spectrum after strong noise has been filtered.
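The text does not specify which filter removes the narrow spikes in Figure 23. One common approach, sketched here as an assumption rather than the paper's method, is to compare each pixel with a short running median and replace pixels whose residual exceeds several robust standard deviations. The window length and threshold are hypothetical choices.

```python
import numpy as np


def remove_spikes(flux, window=7, nsigma=5.0):
    """Replace narrow spikes (e.g. sky-line residuals, bad pixels) in a 1D
    spectrum by the local running median. HYPOTHETICAL parameters:
    window (pixels) and nsigma are illustrative, not taken from the paper."""
    half = window // 2
    padded = np.pad(flux, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, 2 * half + 1)
    local_med = np.median(windows, axis=-1)
    resid = flux - local_med
    # Robust noise estimate: scaled median absolute deviation of the residuals.
    sigma = 1.4826 * np.median(np.abs(resid - np.median(resid)))
    # If sigma is 0 (noiseless data), any nonzero residual is flagged.
    bad = np.abs(resid) > nsigma * sigma
    cleaned = flux.copy()
    cleaned[bad] = local_med[bad]
    return cleaned, bad
```

Because the median ignores a single outlier in its window, isolated spikes are replaced by values close to the surrounding continuum, while broad molecular bands, which span many pixels, are left largely untouched.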

Based on the above preprocessing steps, we obtain eight spectral preprocessing methods, referred to as "nmap," "nunit," "nmap + pca," "nunit + pca," "nmap + subcon," "nunit + subcon," "nmap + pca + subcon," and "nunit + pca + subcon." Here "nmap" and "nunit" indicate that spectra are preprocessed only by the "nmap" or "nunit" normalization; "nmap + pca" and "nunit + pca" indicate normalization with PCA dimension reduction; "nmap + subcon" and "nunit + subcon" indicate normalization with continuum subtraction; and "nmap + pca + subcon" and "nunit + pca + subcon" indicate normalization, PCA dimension reduction, and continuum subtraction.
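The eight methods form a full grid: two normalizations, each with or without PCA, and each with or without continuum subtraction. The sketch below only enumerates the labels in the naming convention used above; the order in which the steps are actually applied in the pipeline is not specified here.

```python
from itertools import product


def preprocessing_variants():
    """Enumerate the eight labels: 2 normalizations x (with/without PCA)
    x (with/without continuum subtraction)."""
    names = []
    for norm, use_pca, use_subcon in product(("nmap", "nunit"), (False, True), (False, True)):
        parts = [norm] + (["pca"] if use_pca else []) + (["subcon"] if use_subcon else [])
        names.append(" + ".join(parts))
    return names
```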
