Shuming Ma

Cited by

	All	Since 2019
Citations	3975	3796
h-index	34	33
i10-index	61	58

Citations per year
Year	Citations
2017	24
2018	153
2019	220
2020	294
2021	416
2022	568
2023	1493
2024	799

Public access

Based on funding mandates: 16 articles available, 1 article not available

Co-authors

Furu Wei	Partner Research Manager, Microsoft Research	Verified email at microsoft.com
Xu Sun	Associate Professor, Peking University	Verified email at pku.edu.cn
houfeng wang	Peking University	Verified email at pku.edu.cn
Junyang Lin	Qwen Team, Alibaba Group & Peking University	Verified email at alibaba-inc.com
Lei Cui	Microsoft Research Asia	Verified email at microsoft.com
Tianyu Liu	Alibaba	Verified email at pku.edu.cn
Jingjing Xu	Shanghai AI Lab	Verified email at pku.edu.cn
Wenjie Li	The Hong Kong Polytechnic University	Verified email at comp.polyu.edu.hk
Sujian LI	Peking Univ.	Verified email at pku.edu.cn
Yizhong Wang	University of Washington	Verified email at cs.washington.edu

Shuming Ma

Microsoft Research Asia

Verified email at microsoft.com

Natural language processing, deep learning


Title	Authors	Publication	Cited by	Year
SGM: sequence generation model for multi-label classification	P Yang, X Sun, W Li, S Ma, W Wu, H Wang	arXiv preprint arXiv:1806.04822, 2018	417	2018
Language is not all you need: Aligning perception with language models	S Huang, L Dong, W Wang, Y Hao, S Singhal, S Ma, T Lv, L Cui, ...	Advances in Neural Information Processing Systems 36, 2024	257	2024
Kosmos-2: Grounding multimodal large language models to the world	Z Peng, W Wang, L Dong, Y Hao, S Huang, S Ma, F Wei	arXiv preprint arXiv:2306.14824, 2023	222	2023
Graph of thoughts: Solving elaborate problems with large language models	M Besta, N Blach, A Kubicek, R Gerstenberger, M Podstawski, ...	Proceedings of the AAAI Conference on Artificial Intelligence 38 (16), 17682 …, 2024	218	2024
Why can GPT learn in-context? Language models implicitly perform gradient descent as meta-optimizers	D Dai, Y Sun, L Dong, Y Hao, S Ma, Z Sui, F Wei	arXiv preprint arXiv:2212.10559, 2022	202	2022
Global encoding for abstractive summarization	J Lin, X Sun, S Ma, Q Su	arXiv preprint arXiv:1805.03989, 2018	184	2018
meProp: Sparsified back propagation for accelerated deep learning with reduced overfitting	X Sun, X Ren, S Ma, H Wang	International Conference on Machine Learning, 3299-3308, 2017	175	2017
DeepNet: Scaling transformers to 1,000 layers	H Wang, S Ma, L Dong, S Huang, D Zhang, F Wei	IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024	108	2024
XLM-E: Cross-lingual language model pre-training via ELECTRA	Z Chi, S Huang, L Dong, S Ma, B Zheng, S Singhal, P Bajaj, X Song, ...	arXiv preprint arXiv:2106.16138, 2021	103	2021
A simple and effective unified encoder for document-level machine translation	S Ma, D Zhang, M Zhou	Proceedings of the 58th Annual Meeting of the Association for Computational …, 2020	90	2020
Language models are general-purpose interfaces	Y Hao, H Song, L Dong, S Huang, Z Chi, W Wang, S Ma, F Wei	arXiv preprint arXiv:2206.06336, 2022	81	2022
Improving semantic relevance for sequence-to-sequence learning of Chinese social media text summarization	S Ma, X Sun, J Xu, H Wang, W Li, Q Su	arXiv preprint arXiv:1706.02459, 2017	78	2017
Retentive network: A successor to transformer for large language models	Y Sun, L Dong, S Huang, S Ma, Y Xia, J Xue, J Wang, F Wei	arXiv preprint arXiv:2307.08621, 2023	74	2023
Query and output: Generating words by querying distributed word representations for paraphrase generation	S Ma, X Sun, W Li, S Li, W Li, X Ren	arXiv preprint arXiv:1803.01465, 2018	74	2018
Bag-of-words as target for neural machine translation	S Ma, X Sun, Y Wang, J Lin	arXiv preprint arXiv:1805.04871, 2018	73	2018
A length-extrapolatable transformer	Y Sun, L Dong, B Patra, S Ma, S Huang, A Benhaim, V Chaudhary, ...	arXiv preprint arXiv:2212.10554, 2022	68	2022
Alternating language modeling for cross-lingual pre-training	J Yang, S Ma, D Zhang, S Wu, Z Li, M Zhou	Proceedings of the AAAI Conference on Artificial Intelligence 34 (05), 9386-9393, 2020	68	2020
Semantic-unit-based dilated convolution for multi-label text classification	J Lin, Q Su, P Yang, S Ma, X Sun	arXiv preprint arXiv:1808.08561, 2018	65	2018
mT6: Multilingual pretrained text-to-text transformer with translation pairs	Z Chi, L Dong, S Ma, S Huang, XL Mao, H Huang, F Wei	arXiv preprint arXiv:2104.08692, 2021	64	2021
DeltaLM: Encoder-decoder pre-training for language generation and translation by augmenting pretrained multilingual encoders	S Ma, L Dong, S Huang, D Zhang, A Muzio, S Singhal, HH Awadalla, ...	arXiv preprint arXiv:2106.13736, 2021	62	2021
