4个NC杂志的空间转录组数据分析(GSE190811)-软件玩家

admin管理员组
文章数量:1444896

4个NC杂志的空间转录组数据分析(GSE190811)

我们生信马拉松授课群里有个学员问空间转录组数据如何读取，给出的数据编号为 GSE190811，由于不是10x spaceranger 的标准格式，对于初学者来说有一些难度，下面一起来看看这个数据集！

数据背景

GSE190811 这个数据来自2022年11月10号发表在NC杂志上，标题为《Single cell profiling of primary and paired metastatic lymph node tumors in breast cancer patients》，文章仅用这个数据绘制了一幅图：上皮细胞的特征基因在空间切片上的打分分布。

这个数据的GEO链接为：.cgi?acc=GSE190811，总共是4个空间转录组的样本：

代码语言：javascript代码运行次数：0运行复制

GSM5732357 PT_3 LNM ST
GSM5732358 PT_6 LNM ST
GSM5732359 PT_7 LNM ST
GSM5732360 PT_8 LNM ST

提供的附件 GSE190811_RAW.tar 包含的文件如下：

代码语言：javascript代码运行次数：0运行复制

GSM5732357_A_1938345_11.jpg.gz 52.6 Mb
GSM5732357_A_raw_feature_bc_matrix.h5 29.6 Mb
GSM5732357_A_scalefactors_json.json.gz 161 b
GSM5732357_A_tissue_hires_image.png.gz 14.5 Mb
GSM5732357_A_tissue_positions_list.csv.gz 66.5 Kb
GSM5732358_B_1938529_9.jpg.gz 68.6 Mb
GSM5732358_B_raw_feature_bc_matrix.h5 34.7 Mb
GSM5732358_B_scalefactors_json.json.gz 164 b
GSM5732358_B_tissue_hires_image.png.gz 16.9 Mb
GSM5732358_B_tissue_positions_list.csv.gz 64.4 Kb
GSM5732359_C_2000752_23.jpg.gz 75.1 Mb
GSM5732359_C_raw_feature_bc_matrix.h5 40.5 Mb
GSM5732359_C_scalefactors_json.json.gz 164 b
GSM5732359_C_tissue_hires_image.png.gz 17.3 Mb
GSM5732359_C_tissue_positions_list.csv.gz 62.6 Kb
GSM5732360_D_2000910_33.jpg.gz 65.7 Mb
GSM5732360_D_raw_feature_bc_matrix.h5 38.5 Mb
GSM5732360_D_scalefactors_json.json.gz 160 b
GSM5732360_D_tissue_hires_image.png.gz 16.1 Mb
GSM5732360_D_tissue_positions_list.csv.gz 59.7 Kb

目录中的文件说明

raw_feature_bc_matrix.h5：基因表达矩阵的h5格式
tissue_positions.csv：记录了对应的空间切片里面的每个barcode所在的切片的空间坐标，理论上有了这个csv文件其实就可以相当于有了空间切片的图片的细胞定位作用
tissue_hires_image.png：分辨率很高，所以图片文件就很大
tissue_lowres_image.png：文件很小，因为图片分辨率低，官网说它 is always 600 pixels. 【这个数据没有提供这个文件】
scalefactors_json.json：它关联了tissue_hires_image.png and tissue_lowres_image.png，文件内容如下：

代码语言：javascript代码运行次数：0运行复制

{"tissue_hires_scalef": 0.06628223, "tissue_lowres_scalef": 0.019884668, "fiducial_diameter_fullres": 383.95734, "spot_diameter_fullres": 237.68787000000003}

下载下来后整理成如下格式：

代码语言：javascript代码运行次数：0运行复制

├── GSM5732357_A
│   ├── 1938345_11.jpg
│   ├── raw_feature_bc_matrix.h5
│   ├── scalefactors_json.json
│   ├── tissue_hires_image.png
│   └── tissue_positions_list.csv
├── GSM5732358_B
│   ├── 1938529_9.jpg
│   ├── raw_feature_bc_matrix.h5
│   ├── scalefactors_json.json
│   ├── tissue_hires_image.png
│   └── tissue_positions_list.csv
├── GSM5732359_C
│   ├── 2000752_23.jpg
│   ├── raw_feature_bc_matrix.h5
│   ├── scalefactors_json.json
│   ├── tissue_hires_image.png
│   └── tissue_positions_list.csv
├── GSM5732360_D
│   ├── 2000910_33.jpg
│   ├── raw_feature_bc_matrix.h5
│   ├── scalefactors_json.json
│   ├── tissue_hires_image.png
│   └── tissue_positions_list.csv

使用Seurat读取

这里用到的文件为 raw_feature_bc_matrix.h5，以及图片tissue_hires_image.png：

代码语言：javascript代码运行次数：0运行复制

###
### Create: Jianming Zeng
### Date:  2023-12-31  
### Email: jmzeng1314@163
### Blog: /
### Forum:  .html
### CAFS/SUSTC/Eli Lilly/University of Macau
### Update Log: 2023-12-31   First version 
### Update Log: 2024-12-09   by juan zhang (492482942@qq)
### 

rm(list=ls())
options(stringsAsFactors = F)
library(ggsci)
library(dplyr) 
library(future)
library(Seurat)
library(clustree)
library(cowplot)
library(data.table)
library(ggplot2)
library(patchwork)
library(stringr)
library(qs)
library(Matrix)
getwd()

# 创建目录
getwd()
gse <- "GSE190811"
dir.create(gse)

################## 读取数据，有h5与tissue_hires_image.png
samples <- list.dirs("GSE190811/", recursive = F, full.names = F)
samples
scRNAlist <- lapply(samples, function(pro){
# pro <- samples[1]
print(pro)

# 先读取 h5
  data <- Read10X_h5(filename = paste0("GSE190811/",pro,"/raw_feature_bc_matrix.h5"))
  dim(data)
  data[1:5,1:5]
  object <- CreateSeuratObject(counts = data, assay = "Spatial", min.cells = 3, project = pro)
  object

# 再读取
  image <- Read10X_Image(image.dir = paste0("GSE190811/",pro), 
                         image.name = "tissue_hires_image.png",
                         filter.matrix = TRUE)
  image
  dim(image)
  image <- image[Cells(x = object)]
  DefaultAssay(object = image) <- "Spatial"

# 添加图片到object中
  object[[pro]] <- image
  object@images[[1]]@scale.factors$lowres <- object@images[[1]]@scale.factors$hires
return(object)
})
names(scRNAlist) <-  samples
scRNAlist
# merge
sce.all <- merge(scRNAlist[[1]], y=scRNAlist[-1], add.cell.ids=samples)
sce.all <- JoinLayers(sce.all) # seurat v5
sce.all

## 简单探索一下数据结构
as.data.frame(sce.all[["Spatial"]]$counts[1:4,1:4])
as.data.frame(LayerData(sce.all, assay = "Spatial", layer = "counts")[1:5,1:5])
head(sce.all@meta.data)
table(sce.all$orig.ident)

Layers(sce.all)
Assays(sce.all)

library(qs)
qsave(sce.all, file="GSE190811/sce.all.qs")

简单查看一下数据的分布

每个样本中 nCount_Spatial 和nFeature_Spatial的小提琴分布：

代码语言：javascript代码运行次数：0运行复制

# Data preprocessing
plot1 <- VlnPlot(sce.all, features = c("nCount_Spatial","nFeature_Spatial"), pt.size = 0.1) + NoLegend()
plot1
ggsave(filename = "1-QC/VlnPlot.pdf", width = 12,height = 6, plot = plot1)

空间分布特征模式：nCount_Spatial

代码语言：javascript代码运行次数：0运行复制

plot2 <- SpatialFeaturePlot(sce.all, features = "nCount_Spatial",pt.size.factor = 2) 
ggsave(filename = "1-QC/FeaturePlot_nCount.pdf", width = 12,height = 6, plot = plot2)

nFeature_Spatial 分布：

代码语言：javascript代码运行次数：0运行复制

plot3 <- SpatialFeaturePlot(sce.all, features = "nFeature_Spatial",pt.size.factor = 2) 
ggsave(filename = "1-QC/FeaturePlot_nFeature.pdf", width = 12,height = 6, plot = plot3)

然后就是后面的空转标准分析

降维聚类分群：

代码语言：javascript代码运行次数：0运行复制

## 计算线粒体/核糖体/特定基因集的比例
mito_genes <- grep("^MT-", rownames(sce.all),ignore.case = T, value = T) 
# 可能是13个线粒体基因
print(mito_genes)
sce.all <- PercentageFeatureSet(sce.all, features = mito_genes, col.name = "percent_mito")
fivenum(sce.all@meta.data$percent_mito)
# 另外一种方法
# sce.all$percent_mito <- (Matrix::colSums(sce.all[["Spatial"]]$counts[mt.genes, ])/Matrix::colSums(sce.all[["Spatial"]]$counts))*100
# fivenum(sce.all$percent_mito)


## 过滤spot
sce.all <- subset(sce.all, subset = nFeature_Spatial > 200 )

## 标准化
options(future.globals.maxSize= 5*1024*1024^2)
sce.all <- SCTransform(sce.all, assay = "Spatial", verbose = T)
sce.all <- RunPCA(sce.all, assay = "SCT", verbose = FALSE)
sce.all <- FindNeighbors(sce.all, reduction = "pca", dims = 1:30)
sce.all <- FindClusters(sce.all, verbose = FALSE,resolution = 0.8)
sce.all <- RunUMAP(sce.all, reduction = "pca", dims = 1:30)

p1 <- DimPlot(sce.all, reduction = "umap", label = TRUE,label.size = 7)
p2 <- SpatialDimPlot(sce.all, label = TRUE, label.size = 3,pt.size.factor = 2,image.alpha = 0.6)
p <- p1 + p2
p
ggsave(filename = "1-QC/DimPlot.pdf", width = 12,height = 6, plot = p)

降维聚类结果如下：

上面的代码运行过程中踩了两个坑，对应的报错以及github上面给出的解决方案如下：

踩坑1：矩阵在log后含NA值，，我这里设置了 CreateSeuratObject(counts = data, assay = "Spatial", min.cells = 3, project = pro)的时候min.cells过滤掉在所有细胞中表达都为0的基因。

代码语言：javascript代码运行次数：0运行复制

Running SCTransform on assay: Spatial
Running SCTransform on layer: counts
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Error in make_cell_attr(umi, cell_attr, latent_var, batch_var, latent_var_nonreg,  : 
  cell attribute "log_umi" contains NA, NaN, or infinite value

踩坑2：运行内存不足

代码语言：javascript代码运行次数：0运行复制

Error in getGlobalsAndPackages(expr, envir = envir, globals = globals) : 
  The total size of the 19 globals exported for future expression (‘FUN()’) is 555.16 MiB.. This exceeds the maximum allowed size of 500.00 MiB (option 'future.globals.maxSize'). The three largest globals are ‘FUN’ (538.70 MiB of class ‘function’), ‘umi_bin’ (16.12 MiB of class ‘numeric’) and ‘data_step1’ (270.82 KiB of class ‘list’)

到这里简单的基础分析就做完了~

本文参与腾讯云自媒体同步曝光计划，分享自微信公众号。原始发表：2025-03-18，如有侵权请联系 cloudcommunity@tencent 删除数据分析imagespatial数据json

本文标签： 4个NC杂志的空间转录组数据分析(GSE190811)

版权声明：本文标题：4个NC杂志的空间转录组数据分析(GSE190811) 内容由网友自发贡献，该文观点仅代表作者本人，转载请联系作者并注明出处：http://www.betaflare.com/biancheng/1748210131a2826688.html，本站仅提供信息存储空间服务，不拥有所有权，不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容，一经查实，本站将立刻删除。

编程频道|软件玩家 - 软件改变生活！

4个NC杂志的空间转录组数据分析(GSE190811)

4个NC杂志的空间转录组数据分析(GSE190811)

数据背景

目录中的文件说明

使用Seurat读取

简单查看一下数据的分布

然后就是后面的空转标准分析

更多相关文章

4个NC杂志的空间转录组数据分析(GSE190811)

发表评论

推荐文章

RetentionPolicy 枚举类型介绍

【详解】Nginx配置WebSocket

STM32实现M法与T法电机测速方法

自然语言处理，能否成为人工智能与人类语言完美交互的答案？

Headless Chrome 优化：减少内存占用与提速技巧

热门文章

解决：Fontconfig head is null, check your fonts or fonts configurat

《深度剖析SQL游标：复杂数据处理场景下的智慧抉择》

嵌入式Linux：文件锁

【03】优雅草星云物联网AI智控系统从0开发鸿蒙端适配

从淘宝推荐到微信搜索：查找算法如何支撑亿级用户——动画可视化数据结构之查找算法

我“AI”发文——DeepMind在强化学习领域的最新突破

Apipost:前后端联调，比Postman更好用的国产接口调试工具

渗透测试流程到底是什么？这篇给你讲清楚(超详细)

揭秘 MySQL 数据类型：深入理解每种类型的使用与区别

【C++】深入探索string类的实现(上)

最新文章

OpenAI主管Lilian Weng分享RAG对抗LLM幻觉策略

NLTK vs LLM：GraphRAG实体提取方法深度大比

JVM之符号引用和直接引用

ai共同开发

小米15小米15Pro一键获取root权限工具使用教程另外有小米14proUltra小米13pro小米13Ultra小米平板6spro

javascript - Type &#39;undefined&#39; is not assignable to type &#39;menuItemProps[]&#39; - Stack Overflow

javascript - VS 2015 Angular 2 import modules cannot be resolved - Stack Overflow

javascript - Get the JSON objects that are not present in another array - Stack Overflow

javascript - How to dismiss a phonegap notification programmatically - Stack Overflow

c - Solaris 10 make Error code 1 Fatal Error when trying to build python 2.7.16 - Stack Overflow

外星人Alienware M15 ALW15M-D3729S参数报价

海尔S15 Pro-M5i716GB1TB参数报价

海尔S15 Pro-M5i58GB512GB参数报价

技械骑士HZ60 13代酷睿i716GB512GB4G独显参数报价

ThinkPad L13 11代酷睿 i7 1165G716GB512GB集显 参数报价

javascript - Type 'undefined' is not assignable to type 'menuItemProps[]' - Stack Overflow

ThinkPad L13 11代酷睿 i7 1165G716GB512GB集显参数报价