Read FCS files
In this notebook, we load an FCS file into the AnnData format, move the forward scatter (FSC) and side scatter (SSC) information to the .obs section of the AnnData object, and perform compensation on the data.
import readfcs
import pytometry as pm
Read the example data set shipped with the readfcs package.
path_data = readfcs.datasets.example()
adata = pm.io.read_fcs(path_data)
adata
AnnData object with n_obs × n_vars = 65016 × 16
var: 'n', 'channel', 'marker', '$PnB', '$PnR', '$PnG'
uns: 'meta'
The .var section of the AnnData object contains the channel information. By default, the marker names are set as var_names. In addition, the channel information is saved in the "channel" column.
adata.var
| | n | channel | marker | $PnB | $PnR | $PnG |
|---|---|---|---|---|---|---|
| FSC-A | 1 | FSC-A | | 32 | 262207 | 1 |
| FSC-H | 2 | FSC-H | | 32 | 262207 | 1 |
| SSC-A | 3 | SSC-A | | 32 | 261588 | 1 |
| KI67 | 4 | B515-A | KI67 | 32 | 261588 | 1 |
| CD3 | 5 | R780-A | CD3 | 32 | 261588 | 1 |
| CD28 | 6 | R710-A | CD28 | 32 | 261588 | 1 |
| CD45RO | 7 | R660-A | CD45RO | 32 | 261588 | 1 |
| CD8 | 8 | V800-A | CD8 | 32 | 261588 | 1 |
| CD4 | 9 | V655-A | CD4 | 32 | 261588 | 1 |
| CD57 | 10 | V585-A | CD57 | 32 | 261588 | 1 |
| CD14 | 11 | V450-A | CD14 | 32 | 261588 | 1 |
| CCR5 | 12 | G780-A | CCR5 | 32 | 261588 | 1 |
| CD19 | 13 | G710-A | CD19 | 32 | 261588 | 1 |
| CD27 | 14 | G660-A | CD27 | 32 | 261588 | 1 |
| CCR7 | 15 | G610-A | CCR7 | 32 | 261588 | 1 |
| CD127 | 16 | G560-A | CD127 | 32 | 261588 | 1 |
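The table suggests how the default var_names come about: rows with a marker use the marker name, while the scatter channels, which have no marker, keep their channel name. The following is a minimal sketch of that fallback in plain Python. The helper `build_var_names` is hypothetical and only mirrors what the table shows; it is not taken from the pytometry or readfcs source.

```python
# Channel ($PnN) and marker ($PnS) names copied from the first rows of the table above.
channels = ["FSC-A", "FSC-H", "SSC-A", "B515-A", "R780-A"]
markers = ["", "", "", "KI67", "CD3"]


def build_var_names(channels, markers):
    """Use the marker name where one is given, otherwise fall back to the channel name.

    Hypothetical helper for illustration; the library may implement this differently.
    """
    return [
        marker if marker.strip() else channel
        for channel, marker in zip(channels, markers)
    ]


var_names = build_var_names(channels, markers)
print(var_names)  # ['FSC-A', 'FSC-H', 'SSC-A', 'KI67', 'CD3']
```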
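The .uns['meta'] section contains the metadata from the FCS file: the header, the keywords of the text segment, the channel table ('channels') and the spillover matrix ('spill').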
adata.uns["meta"]
{'__header__': {'FCS format': 'FCS2.0',
'text start': 58,
'text end': 5099,
'data start': 5120,
'data end': 4166142,
'analysis start': 0,
'analysis end': 0},
'$TOT': 65016,
'$PAR': 16,
'$MODE': 'L',
'$BYTEORD': '4,3,2,1',
'$FIL': '100715.fcs',
'$NEXTDATA': 0,
'$DATATYPE': 'F',
'$BEGINSTEXT': '0',
'$BTIM': '15:36:28',
'$CYT': 'Main Aria (FACSAria)',
'$DATE': '17-JUL-2007',
'$ENDSTEXT': '0',
'$ETIM': '15:38:06',
'$INST': ' ',
'$OP': 'Administrator',
'$SRC': 'Specimen_001',
'$SYS': 'Windows XP 5.1',
'$TIMESTEP': '0.08',
'APPLY COMPENSATION': 'TRUE',
'AUTOBS': 'TRUE',
'CD Age': '19.6',
'CD CD4, %CM': '.',
'CD CD4, %EM': '.',
'CD CD4, %N': '.',
'CD CD4, %TM': '.',
'CD Event Censor': '0',
'CD First Viral Load': '2024',
'CD First Viral Load Date': '11/09/1999',
'CD Gag/100 CD4 Cells': '.',
'CD Gag/100 CM Cells': '.',
'CD Gag/100 EM Cells': '.',
'CD Gag/100 N Cells': '.',
'CD GAG/100 TM CELLS': '.',
'CD Seroconversion Datae': '04/30/1999',
'CD Survival time from seroconversion': '63',
'CD Time from seroc to sample': '194',
'CYTNUM': '1',
'EXPERIMENT NAME': '070717_AB02_tb',
'EXPORT TIME': '17-JUL-2007-16:04:38',
'EXPORT USER NAME': 'Administrator',
'Final Pin': '100715',
'FJ_$P17R': '262144',
'FJ_$TIMESTEP': '0.01',
'FJ_CompMatrixName': ' ',
'FSC ASF': '0.63',
'GUID': '0d8e743a-05fe-4e8b-9ec4-25993c124ee2',
'Index': '416',
'LASER1ASF': '0.66',
'LASER1DELAY': '0.00',
'LASER1NAME': 'Blue',
'LASER2ASF': '0.55',
'LASER2DELAY': '-59.80',
'LASER2NAME': 'Red',
'LASER3ASF': '0.48',
'LASER3DELAY': '-24.40',
'LASER3NAME': 'Violet',
'LASER4ASF': '0.53',
'LASER4DELAY': '-82.60',
'LASER4NAME': 'Green',
'Live Cells Recovered': ' ',
'PIN': ' ',
'pin check': ' ',
'SORT TYPE': 'SORT',
'THRESHOLD': 'FSC,27000',
'TUBE NAME': 'Tube_025',
'Viability': ' ',
'VIAL ID': '100715',
'VRC ID': ' ',
'WINDOW EXTENSION': '3.00',
'CREATOR': 'LYSYS',
'P1BS': '0',
'P1DISPLAY': 'LIN',
'P1MS': '0',
'P2BS': '0',
'P2DISPLAY': 'LIN',
'P2MS': '0',
'P3BS': '0',
'P3DISPLAY': 'LOG',
'P3MS': '0',
'P4BS': '0',
'P4DISPLAY': 'LOG',
'P4MS': '0',
'P5BS': '2926',
'P5DISPLAY': 'LOG',
'P5MS': '0',
'P6BS': '1162',
'P6DISPLAY': 'LOG',
'P6MS': '0',
'P7BS': '1849',
'P7DISPLAY': 'LOG',
'P7MS': '0',
'P8BS': '2029',
'P8DISPLAY': 'LOG',
'P8MS': '0',
'P9BS': '3343',
'P9DISPLAY': 'LOG',
'P9MS': '0',
'P10BS': '331',
'P10DISPLAY': 'LOG',
'P10MS': '0',
'P11BS': '0',
'P11DISPLAY': 'LOG',
'P11MS': '0',
'P12BS': '14511',
'P12DISPLAY': 'LOG',
'P12MS': '0',
'P13BS': '6053',
'P13DISPLAY': 'LOG',
'P13MS': '0',
'P14BS': '9362',
'P14DISPLAY': 'LOG',
'P14MS': '0',
'P15BS': '557',
'P15DISPLAY': 'LOG',
'P15MS': '0',
'P16BS': '9808',
'P16DISPLAY': 'LOG',
'P16MS': '0',
'$BEGINDATA': ' 5120',
'$ENDDATA': ' 4166143',
'channels':    $PnN    $PnS  $PnB    $PnR  $PnG
n
1             FSC-A            32  262207     1
2             FSC-H            32  262207     1
3             SSC-A            32  261588     1
4            B515-A    KI67    32  261588     1
5            R780-A     CD3    32  261588     1
6            R710-A    CD28    32  261588     1
7            R660-A  CD45RO    32  261588     1
8            V800-A     CD8    32  261588     1
9            V655-A     CD4    32  261588     1
10           V585-A    CD57    32  261588     1
11           V450-A    CD14    32  261588     1
12           G780-A    CCR5    32  261588     1
13           G710-A    CD19    32  261588     1
14           G660-A    CD27    32  261588     1
15           G610-A    CCR7    32  261588     1
16           G560-A   CD127    32  261588     1,
'header': {'FCS format': 'FCS2.0',
'text start': 58,
'text end': 5099,
'data start': 5120,
'data end': 4166142,
'analysis start': 0,
'analysis end': 0},
'spill': KI67 CD3 CD28 CD45RO CD8 CD4 CD57 \
KI67 1.000000 0.000000 0.000000 0.000088 0.000249 0.000645 0.007198
CD3 0.000000 1.000000 0.071188 0.148448 0.338903 0.009717 0.000000
CD28 0.000000 0.331405 1.000000 0.061965 0.120979 0.004053 0.000000
CD45RO 0.000000 0.088621 0.389424 1.000000 0.029759 0.065553 0.000000
CD8 0.000000 0.136618 0.010757 0.000000 1.000000 0.000156 0.000000
CD4 0.000000 0.000124 0.019463 0.218206 0.004953 1.000000 0.003583
CD57 0.000000 0.000000 0.000000 0.000000 0.001056 0.002287 1.000000
CD14 0.000000 0.000000 0.000000 0.000000 0.000000 0.008118 0.170066
CCR5 0.003122 0.008526 0.001024 0.001163 0.125401 0.018142 0.193646
CD19 0.002015 0.069645 0.194715 0.001008 0.151611 0.001270 0.007133
CD27 0.001685 0.054340 0.277852 0.343008 0.061753 0.077523 0.004263
CCR7 0.000000 0.008713 0.048213 0.073190 0.150563 0.386293 0.101896
CD127 0.001684 0.000000 0.000000 0.000095 0.003463 0.015712 0.174122
CD14 CCR5 CD19 CD27 CCR7 CD127
KI67 0.0 0.000000 0.000131 0.000067 0.000582 0.002520
CD3 0.0 0.301380 0.007478 0.012354 0.000000 0.000000
CD28 0.0 0.109117 0.100314 0.005832 0.000000 0.000000
CD45RO 0.0 0.031294 0.039306 0.091375 0.000396 0.000057
CD8 0.0 0.483235 0.014858 0.000000 0.000000 0.000000
CD4 0.0 0.001311 0.029646 0.408902 0.006506 0.000119
CD57 0.0 0.000389 0.000194 0.000000 0.062551 0.132484
CD14 1.0 0.000000 0.000000 0.000000 0.000000 0.000000
CCR5 0.0 1.000000 0.066898 0.161456 0.286823 1.238037
CD19 0.0 1.150032 1.000000 0.016077 0.014674 0.055352
CD27 0.0 0.497488 0.743923 1.000000 0.010329 0.037635
CCR7 0.0 0.370277 0.613490 1.218024 1.000000 0.065211
CD127 0.0 0.023802 0.049474 0.132511 0.239216 1.000000 }
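The 'spill' entry above is the spillover matrix that compensation relies on. As a rough illustration of the idea, the sketch below takes only the 2 × 2 block for CD3 and CD28 from that matrix, simulates the spillover for one event, and undoes it by multiplying with the inverse. It assumes the usual convention that rows are fluorochromes and columns are detectors, so that observed = true × spill. The signal values and both helper functions are made up for this example, and a real analysis would invert the full matrix rather than a sub-block.

```python
# 2 x 2 block of the 'spill' table above (rows and columns: CD3, CD28).
spill = [
    [1.000000, 0.071188],
    [0.331405, 1.000000],
]


def invert_2x2(m):
    """Invert a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def row_times_matrix(row, m):
    """Multiply a row vector by a matrix."""
    return [
        sum(row[k] * m[k][j] for k in range(len(row)))
        for j in range(len(m[0]))
    ]


# Hypothetical true signal of one event: 1000 units from the CD3 dye, 200 from the CD28 dye.
true_signal = [1000.0, 200.0]

# Spillover mixes the dyes into both detectors: approximately [1066.28, 271.19].
observed = row_times_matrix(true_signal, spill)

# Compensation multiplies by the inverse and recovers the true signal
# up to floating-point error.
compensated = row_times_matrix(observed, invert_2x2(spill))
print(observed, compensated)
```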
Missing marker column
In some FCS files, the marker information is not stored under keywords that follow the $P[0-9]S pattern, and reading the FCS file might fail. In that case, set the reindex=False option when reading the file.
adata = pm.io.read_fcs(path_data, reindex=False)
adata
AnnData object with n_obs × n_vars = 65016 × 16
var: 'channel', 'marker', '$PnB', '$PnR', '$PnG'
uns: 'meta'
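Whether a file declares marker names at all can be checked by scanning its text-segment keywords for the marker pattern mentioned above. The sketch below does this with a regular expression on a small dictionary. The `keywords` dictionary and the `marker_keywords` helper are made up for illustration and are not part of pytometry or readfcs; the expression uses `\d+` so that multi-digit parameter indices such as $P10S also match.

```python
import re

# Hypothetical text-segment keywords, for illustration only.
keywords = {
    "$PAR": "16",
    "$P1N": "FSC-A",
    "$P4N": "B515-A",
    "$P4S": "KI67",
    "$P10N": "V585-A",
    "$P10S": "CD57",
}

# Matches $P1S, $P2S, ..., $P10S, ... and captures the parameter index.
PNS_PATTERN = re.compile(r"^\$P(\d+)S$")


def marker_keywords(keywords):
    """Return {parameter index: marker name} for every $PnS keyword found."""
    found = {}
    for key, value in keywords.items():
        match = PNS_PATTERN.match(key)
        if match:
            found[int(match.group(1))] = value
    return found


markers = marker_keywords(keywords)
print(markers)  # {4: 'KI67', 10: 'CD57'}
```

An empty result would indicate that no marker names are declared in this form.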
The .var section of the AnnData object contains the channel information. Here, a running number is used as var_names. The marker names may be created manually from the channel column.
adata.var
| n | channel | marker | $PnB | $PnR | $PnG |
|---|---|---|---|---|---|
| 1 | FSC-A | | 32 | 262207 | 1 |
| 2 | FSC-H | | 32 | 262207 | 1 |
| 3 | SSC-A | | 32 | 261588 | 1 |
| 4 | B515-A | KI67 | 32 | 261588 | 1 |
| 5 | R780-A | CD3 | 32 | 261588 | 1 |
| 6 | R710-A | CD28 | 32 | 261588 | 1 |
| 7 | R660-A | CD45RO | 32 | 261588 | 1 |
| 8 | V800-A | CD8 | 32 | 261588 | 1 |
| 9 | V655-A | CD4 | 32 | 261588 | 1 |
| 10 | V585-A | CD57 | 32 | 261588 | 1 |
| 11 | V450-A | CD14 | 32 | 261588 | 1 |
| 12 | G780-A | CCR5 | 32 | 261588 | 1 |
| 13 | G710-A | CD19 | 32 | 261588 | 1 |
| 14 | G660-A | CD27 | 32 | 261588 | 1 |
| 15 | G610-A | CCR7 | 32 | 261588 | 1 |
| 16 | G560-A | CD127 | 32 | 261588 | 1 |
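One way to create marker names manually from the channel column is a lookup table from detector channel to marker, for instance taken from the panel sheet of the experiment. The sketch below shows the idea in plain Python. The `panel` dictionary and the `names_from_channels` helper are hypothetical; the channel and marker pairs are copied from the table above.

```python
# Channel names as they appear in the "channel" column above (a few rows only).
channels = ["FSC-A", "SSC-A", "B515-A", "R780-A", "V655-A"]

# Hypothetical panel sheet: detector channel -> marker.
panel = {"B515-A": "KI67", "R780-A": "CD3", "V655-A": "CD4"}


def names_from_channels(channels, panel):
    """Look up each channel in the panel; keep the channel name if it is not listed."""
    return [panel.get(channel, channel) for channel in channels]


new_names = names_from_channels(channels, panel)
print(new_names)  # ['FSC-A', 'SSC-A', 'KI67', 'CD3', 'CD4']
```

With the AnnData object from above, a list built this way from adata.var["channel"] could then be assigned to adata.var_names, provided the resulting names are unique.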