Read FCS files

In this notebook, we load an FCS file into the AnnData format, move the forward scatter (FSC) and side scatter (SSC) information to the .obs section of the AnnData object, and perform compensation on the data.

import readfcs
import pytometry as pm

Read the example data set that ships with the readfcs package.

path_data = readfcs.datasets.example()
adata = pm.io.read_fcs(path_data)
adata
AnnData object with n_obs × n_vars = 65016 × 16
    var: 'n', 'channel', 'marker', '$PnB', '$PnR', '$PnG'
    uns: 'meta'

The .var section of the AnnData object contains the channel information. By default, the marker names are used as var_names; channels without a marker name (here the scatter channels) keep their channel name. The original channel names are stored in the "channel" column.

adata.var
         n  channel  marker  $PnB    $PnR  $PnG
FSC-A    1    FSC-A            32  262207     1
FSC-H    2    FSC-H            32  262207     1
SSC-A    3    SSC-A            32  261588     1
KI67     4   B515-A    KI67    32  261588     1
CD3      5   R780-A     CD3    32  261588     1
CD28     6   R710-A    CD28    32  261588     1
CD45RO   7   R660-A  CD45RO    32  261588     1
CD8      8   V800-A     CD8    32  261588     1
CD4      9   V655-A     CD4    32  261588     1
CD57    10   V585-A    CD57    32  261588     1
CD14    11   V450-A    CD14    32  261588     1
CCR5    12   G780-A    CCR5    32  261588     1
CD19    13   G710-A    CD19    32  261588     1
CD27    14   G660-A    CD27    32  261588     1
CCR7    15   G610-A    CCR7    32  261588     1
CD127   16   G560-A   CD127    32  261588     1
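
The scatter channels (FSC and SSC) can be told apart from the fluorescence channels by their channel names. The following is a minimal sketch of that split, assuming the scatter channels are named with an FSC or SSC prefix as in the table above; the helper `is_scatter` is hypothetical and not part of pytometry or readfcs.

```python
# Channel names copied from a few rows of the "channel" column.
channels = ["FSC-A", "FSC-H", "SSC-A", "B515-A", "R780-A", "V450-A"]


def is_scatter(channel: str) -> bool:
    """Return True for scatter channels (assumes an FSC/SSC name prefix)."""
    return channel.upper().startswith(("FSC", "SSC"))


# Split the channel list into scatter and fluorescence channels.
scatter = [c for c in channels if is_scatter(c)]
fluorescence = [c for c in channels if not is_scatter(c)]

print(scatter)       # ['FSC-A', 'FSC-H', 'SSC-A']
print(fluorescence)  # ['B515-A', 'R780-A', 'V450-A']
```

A prefix test is the simplest possible rule; files that name their scatter channels differently would need an explicit list instead.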

The .uns['meta'] section contains the metadata from the FCS file: the header, the keyword/value pairs, a table of the channels, and the spillover matrix under 'spill'.

adata.uns["meta"]
{'__header__': {'FCS format': 'FCS2.0',
  'text start': 58,
  'text end': 5099,
  'data start': 5120,
  'data end': 4166142,
  'analysis start': 0,
  'analysis end': 0},
 '$TOT': 65016,
 '$PAR': 16,
 '$MODE': 'L',
 '$BYTEORD': '4,3,2,1',
 '$FIL': '100715.fcs',
 '$NEXTDATA': 0,
 '$DATATYPE': 'F',
 '$BEGINSTEXT': '0',
 '$BTIM': '15:36:28',
 '$CYT': 'Main Aria (FACSAria)',
 '$DATE': '17-JUL-2007',
 '$ENDSTEXT': '0',
 '$ETIM': '15:38:06',
 '$INST': ' ',
 '$OP': 'Administrator',
 '$SRC': 'Specimen_001',
 '$SYS': 'Windows XP 5.1',
 '$TIMESTEP': '0.08',
 'APPLY COMPENSATION': 'TRUE',
 'AUTOBS': 'TRUE',
 'CD Age': '19.6',
 'CD CD4, %CM': '.',
 'CD CD4, %EM': '.',
 'CD CD4, %N': '.',
 'CD CD4, %TM': '.',
 'CD Event Censor': '0',
 'CD First Viral Load': '2024',
 'CD First Viral Load Date': '11/09/1999',
 'CD Gag/100 CD4 Cells': '.',
 'CD Gag/100 CM Cells': '.',
 'CD Gag/100 EM Cells': '.',
 'CD Gag/100 N Cells': '.',
 'CD GAG/100 TM CELLS': '.',
 'CD Seroconversion Datae': '04/30/1999',
 'CD Survival time from seroconversion': '63',
 'CD Time from seroc to sample': '194',
 'CYTNUM': '1',
 'EXPERIMENT NAME': '070717_AB02_tb',
 'EXPORT TIME': '17-JUL-2007-16:04:38',
 'EXPORT USER NAME': 'Administrator',
 'Final Pin': '100715',
 'FJ_$P17R': '262144',
 'FJ_$TIMESTEP': '0.01',
 'FJ_CompMatrixName': ' ',
 'FSC ASF': '0.63',
 'GUID': '0d8e743a-05fe-4e8b-9ec4-25993c124ee2',
 'Index': '416',
 'LASER1ASF': '0.66',
 'LASER1DELAY': '0.00',
 'LASER1NAME': 'Blue',
 'LASER2ASF': '0.55',
 'LASER2DELAY': '-59.80',
 'LASER2NAME': 'Red',
 'LASER3ASF': '0.48',
 'LASER3DELAY': '-24.40',
 'LASER3NAME': 'Violet',
 'LASER4ASF': '0.53',
 'LASER4DELAY': '-82.60',
 'LASER4NAME': 'Green',
 'Live Cells Recovered': ' ',
 'PIN': ' ',
 'pin check': ' ',
 'SORT TYPE': 'SORT',
 'THRESHOLD': 'FSC,27000',
 'TUBE NAME': 'Tube_025',
 'Viability': ' ',
 'VIAL ID': '100715',
 'VRC ID': ' ',
 'WINDOW EXTENSION': '3.00',
 'CREATOR': 'LYSYS',
 'P1BS': '0',
 'P1DISPLAY': 'LIN',
 'P1MS': '0',
 'P2BS': '0',
 'P2DISPLAY': 'LIN',
 'P2MS': '0',
 'P3BS': '0',
 'P3DISPLAY': 'LOG',
 'P3MS': '0',
 'P4BS': '0',
 'P4DISPLAY': 'LOG',
 'P4MS': '0',
 'P5BS': '2926',
 'P5DISPLAY': 'LOG',
 'P5MS': '0',
 'P6BS': '1162',
 'P6DISPLAY': 'LOG',
 'P6MS': '0',
 'P7BS': '1849',
 'P7DISPLAY': 'LOG',
 'P7MS': '0',
 'P8BS': '2029',
 'P8DISPLAY': 'LOG',
 'P8MS': '0',
 'P9BS': '3343',
 'P9DISPLAY': 'LOG',
 'P9MS': '0',
 'P10BS': '331',
 'P10DISPLAY': 'LOG',
 'P10MS': '0',
 'P11BS': '0',
 'P11DISPLAY': 'LOG',
 'P11MS': '0',
 'P12BS': '14511',
 'P12DISPLAY': 'LOG',
 'P12MS': '0',
 'P13BS': '6053',
 'P13DISPLAY': 'LOG',
 'P13MS': '0',
 'P14BS': '9362',
 'P14DISPLAY': 'LOG',
 'P14MS': '0',
 'P15BS': '557',
 'P15DISPLAY': 'LOG',
 'P15MS': '0',
 'P16BS': '9808',
 'P16DISPLAY': 'LOG',
 'P16MS': '0',
 '$BEGINDATA': '        5120',
 '$ENDDATA': '     4166143',
 'channels':       $PnN    $PnS  $PnB    $PnR $PnG
 n                                    
 1    FSC-A            32  262207    1
 2    FSC-H            32  262207    1
 3    SSC-A            32  261588    1
 4   B515-A    KI67    32  261588    1
 5   R780-A     CD3    32  261588    1
 6   R710-A    CD28    32  261588    1
 7   R660-A  CD45RO    32  261588    1
 8   V800-A     CD8    32  261588    1
 9   V655-A     CD4    32  261588    1
 10  V585-A    CD57    32  261588    1
 11  V450-A    CD14    32  261588    1
 12  G780-A    CCR5    32  261588    1
 13  G710-A    CD19    32  261588    1
 14  G660-A    CD27    32  261588    1
 15  G610-A    CCR7    32  261588    1
 16  G560-A   CD127    32  261588    1,
 'header': {'FCS format': 'FCS2.0',
  'text start': 58,
  'text end': 5099,
  'data start': 5120,
  'data end': 4166142,
  'analysis start': 0,
  'analysis end': 0},
 'spill':             KI67       CD3      CD28    CD45RO       CD8       CD4      CD57  \
 KI67    1.000000  0.000000  0.000000  0.000088  0.000249  0.000645  0.007198   
 CD3     0.000000  1.000000  0.071188  0.148448  0.338903  0.009717  0.000000   
 CD28    0.000000  0.331405  1.000000  0.061965  0.120979  0.004053  0.000000   
 CD45RO  0.000000  0.088621  0.389424  1.000000  0.029759  0.065553  0.000000   
 CD8     0.000000  0.136618  0.010757  0.000000  1.000000  0.000156  0.000000   
 CD4     0.000000  0.000124  0.019463  0.218206  0.004953  1.000000  0.003583   
 CD57    0.000000  0.000000  0.000000  0.000000  0.001056  0.002287  1.000000   
 CD14    0.000000  0.000000  0.000000  0.000000  0.000000  0.008118  0.170066   
 CCR5    0.003122  0.008526  0.001024  0.001163  0.125401  0.018142  0.193646   
 CD19    0.002015  0.069645  0.194715  0.001008  0.151611  0.001270  0.007133   
 CD27    0.001685  0.054340  0.277852  0.343008  0.061753  0.077523  0.004263   
 CCR7    0.000000  0.008713  0.048213  0.073190  0.150563  0.386293  0.101896   
 CD127   0.001684  0.000000  0.000000  0.000095  0.003463  0.015712  0.174122   
 
         CD14      CCR5      CD19      CD27      CCR7     CD127  
 KI67     0.0  0.000000  0.000131  0.000067  0.000582  0.002520  
 CD3      0.0  0.301380  0.007478  0.012354  0.000000  0.000000  
 CD28     0.0  0.109117  0.100314  0.005832  0.000000  0.000000  
 CD45RO   0.0  0.031294  0.039306  0.091375  0.000396  0.000057  
 CD8      0.0  0.483235  0.014858  0.000000  0.000000  0.000000  
 CD4      0.0  0.001311  0.029646  0.408902  0.006506  0.000119  
 CD57     0.0  0.000389  0.000194  0.000000  0.062551  0.132484  
 CD14     1.0  0.000000  0.000000  0.000000  0.000000  0.000000  
 CCR5     0.0  1.000000  0.066898  0.161456  0.286823  1.238037  
 CD19     0.0  1.150032  1.000000  0.016077  0.014674  0.055352  
 CD27     0.0  0.497488  0.743923  1.000000  0.010329  0.037635  
 CCR7     0.0  0.370277  0.613490  1.218024  1.000000  0.065211  
 CD127    0.0  0.023802  0.049474  0.132511  0.239216  1.000000  }
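
The 'spill' entry is the spillover matrix used for compensation. As a rough illustration of what compensation does with such a matrix, the sketch below assumes the usual linear model, observed = true @ spill (rows: source fluorochrome, columns: detector), and inverts it. The 2 × 2 matrix and the event values are made up for the example and are not taken from the file.

```python
import numpy as np

# Toy spillover matrix (rows: source fluorochrome, columns: detector).
# Illustrative values only, not from the example file.
spill = np.array([[1.0, 0.2],
                  [0.1, 1.0]])

# Hypothetical true (unmixed) signal of three events in two channels.
true_signal = np.array([[100.0, 0.0],
                        [0.0, 50.0],
                        [40.0, 40.0]])

# Linear spillover model: each detector also picks up part of the other dye.
observed = true_signal @ spill

# Compensation computes observed @ inv(spill). Solving the transposed
# linear system gives the same result without forming the inverse.
compensated = np.linalg.solve(spill.T, observed.T).T

print(np.allclose(compensated, true_signal))  # True
```

Only the fluorescence channels appear in the spillover matrix, so the scatter channels are left out of this step.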

Missing marker column

In some FCS files, the marker information does not follow the $P[0-9]S pattern, and reading the FCS file might fail. In that case, pass reindex=False when reading the file.

adata = pm.io.read_fcs(path_data, reindex=False)
adata
AnnData object with n_obs × n_vars = 65016 × 16
    var: 'channel', 'marker', '$PnB', '$PnR', '$PnG'
    uns: 'meta'
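
To check by hand which parameters carry a marker name, the $PnS keywords can be picked out of a keyword dictionary with a regular expression. This is a sketch under assumptions: the `keywords` dictionary is a hypothetical stand-in for the keywords of an FCS file, and the regex allows multi-digit parameter numbers; it does not reproduce how readfcs parses files.

```python
import re

# $PnS keywords hold the marker name of parameter n; n may have several digits.
PNS = re.compile(r"^\$P(\d+)S$")

# Hypothetical keywords: parameters 1 and 10 have a marker, parameter 2 has none.
keywords = {
    "$PAR": 3,
    "$P1N": "B515-A",
    "$P1S": "KI67",
    "$P2N": "FSC-A",
    "$P10N": "V585-A",
    "$P10S": "CD57",
}

# Map parameter number -> marker name for every keyword matching $PnS.
markers = {}
for key, value in keywords.items():
    match = PNS.match(key)
    if match:
        markers[int(match.group(1))] = value

print(markers)  # {1: 'KI67', 10: 'CD57'}
```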

The .var section of the AnnData object again contains the channel information, but var_names is now a running number. Marker names can be set manually, for example from the channel column.

adata.var
    channel  marker  $PnB    $PnR  $PnG
n
1     FSC-A            32  262207     1
2     FSC-H            32  262207     1
3     SSC-A            32  261588     1
4    B515-A    KI67    32  261588     1
5    R780-A     CD3    32  261588     1
6    R710-A    CD28    32  261588     1
7    R660-A  CD45RO    32  261588     1
8    V800-A     CD8    32  261588     1
9    V655-A     CD4    32  261588     1
10   V585-A    CD57    32  261588     1
11   V450-A    CD14    32  261588     1
12   G780-A    CCR5    32  261588     1
13   G710-A    CD19    32  261588     1
14   G660-A    CD27    32  261588     1
15   G610-A    CCR7    32  261588     1
16   G560-A   CD127    32  261588     1
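
One way to build names by hand is to take the marker where it is present and fall back to the channel name otherwise. The sketch below does this on a small pandas DataFrame that mimics a few rows of the table above; it assumes missing markers are stored as empty strings or NaN, which may differ in a real file.

```python
import pandas as pd

# A few rows mimicking adata.var after reading with reindex=False.
var = pd.DataFrame(
    {
        "channel": ["FSC-A", "SSC-A", "B515-A", "R780-A"],
        "marker": ["", "", "KI67", "CD3"],
    },
    index=pd.Index(["1", "2", "4", "5"], name="n"),
)

# Treat NaN and whitespace-only entries as missing markers (assumption).
marker = var["marker"].fillna("").str.strip()

# Keep the marker where it is non-empty, otherwise use the channel name.
names = marker.where(marker != "", var["channel"]).tolist()

print(names)  # ['FSC-A', 'SSC-A', 'KI67', 'CD3']
```

On the AnnData object itself, the resulting list could then be assigned with `adata.var_names = names`, provided the names are unique.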